
2025-04-21

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis | Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna | 2025-04-21 | Download | The rapid advancements in AI, scientific computing, and high-performance computing (HPC) have driven the need for versatile and efficient hardware accelerators. |
| Computing with Printed and Flexible Electronics | Mehdi B. Tahoori, Emre Ozer, Georgios Zervakis, Konstantinos Balaskas, Priyanjana Pal | 2025-04-21 | Download | Printed and flexible electronics (PFE) have emerged as the ubiquitous solution for application domains at the extreme edge, where the demands for low manufacturing and operational cost cannot be met b... |
| ForgeBench: A Machine Learning Benchmark Suite and Auto-Generation Framework for Next-Generation HLS Tools | Andy Wanna, Hanqiu Chen, Cong Hao | 2025-04-21 | Download | Although High-Level Synthesis (HLS) has attracted considerable interest in hardware design, it has not yet become mainstream due to two primary challenges. |
| Advancing AI-assisted Hardware Design with Hierarchical Decentralized Training and Personalized Inference-Time Optimization | Hao Mark Chen, Zehuan Zhang, Wanru Zhao, Nicholas Lane, Hongxiang Fan | 2025-04-21 | Download | Recent years have witnessed a significant increase in the adoption of AI techniques to enhance electronic design automation. In particular, the emergence of Large Language Models (LLMs) has sparked si... |
| Considerations on the Design of Transceivers for Ambient Internet of Things | Yuxiao Zhao, Zhen Shen, Shiyu Li, Jing Feng, Hao Min | 2025-04-21 | Download | The Ambient IoT (A-IoT) will introduce trillions of connections and enable low-cost battery-less devices. The A-IoT nodes can achieve low cost ($\sim$ 0. |
| Hardware-based Heterogeneous Memory Management for Large Language Model Inference | Soojin Hwang, Jungwoo Kim, Sanghyeon Lee, Hongbeen Kim, Jaehyuk Huh | 2025-04-21 | Download | A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferen... |
| GainSight: A Unified Framework for Data Lifetime Profiling and Heterogeneous Memory Composition | Peijing Li, Matthew Hung, Yiming Tan, Konstantin Hoßfeld, Jake Cheng Jiajun, Shuhan Liu, Lixian Yan, Xinxin Wang, Philip Levis, H. -S. Philip Wong, Thierry Tambe | 2025-04-21 | Download | As AI workloads drive increasing memory requirements, domain-specific accelerators need higher-density on-chip memory beyond what current SRAM scaling trends can provide. |
| Ultra-Low-Power Spiking Neurons in 7 nm FinFET Technology: A Comparative Analysis of Leaky Integrate-and-Fire, Morris-Lecar, and Axon-Hillock Architectures | Logan Larsh, Raiyan Siddique, Sarah Sharif, Yaser Mike Banad | 2025-04-21 | Download | Neuromorphic computing aims to replicate the brain's remarkable energy efficiency and parallel processing capabilities for large-scale artificial intelligence applications. |
| Splitwiser: Efficient LM inference with constrained resources | Asad Aali, Adney Cardoza, Melissa Capo | 2025-04-21 | Download | Efficient inference of LLMs remains a crucial challenge, with two main phases: a compute-intensive prompt computation and a memory-intensive token generation. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Tracing Cross-chain Transactions between EVM-based Blockchains: An Analysis of Ethereum-Polygon Bridges | Tao Yan, Chuanshan Huang, Claudio J. Tessone | 2025-04-21 | Download | Ethereum's scalability has been a major concern due to its limited transaction throughput and high fees. To address these limitations, Polygon has emerged as a sidechain solution that facilitates asse... |
| FedFetch: Faster Federated Learning with Adaptive Downstream Prefetching | Qifan Yan, Andrew Liu, Shiqi He, Mathias Lécuyer, Ivan Beschastnikh | 2025-04-21 | Download | Federated learning (FL) is a machine learning paradigm that facilitates massively distributed model training with end-user data on edge devices directed by a central server. |
| Advancing AI-assisted Hardware Design with Hierarchical Decentralized Training and Personalized Inference-Time Optimization | Hao Mark Chen, Zehuan Zhang, Wanru Zhao, Nicholas Lane, Hongxiang Fan | 2025-04-21 | Download | Recent years have witnessed a significant increase in the adoption of AI techniques to enhance electronic design automation. In particular, the emergence of Large Language Models (LLMs) has sparked si... |
| To Offload or Not To Offload: Model-driven Comparison of Edge-native and On-device Processing In the Era of Accelerators | Nathan Ng, David Irwin, Ananthram Swami, Don Towsley, Prashant Shenoy | 2025-04-21 | Download | Computational offloading is a promising approach for overcoming resource constraints on client devices by moving some or all of an application's computations to remote servers. |
| Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments? | Xinglei Dou, Lei Liu, Limin Xiao | 2025-04-21 | Download | Making it intelligent is a promising way in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services. |
| SLO-Aware Scheduling for Large Language Model Inferences | Jinqi Huang, Yi Xiong, Xuebing Yu, Wenjie Huang, Entong Li, Li Zeng, Xin Chen | 2025-04-21 | Download | Large language models (LLMs) have revolutionized applications such as code completion, chatbots, and online classification. To elevate user experiences, service level objectives (SLOs) serve as crucia... |
| MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core | Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, Vijay Korthikanti, Evan Wu, Shiqing Fan, Gao Deng, Hongxiao Bai, Jianbin Chang, Ashwath Aithal, Michael Andersch, Mohammad Shoeybi, Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, June Yang | 2025-04-21 | Download | Mixture of Experts (MoE) models enhance neural network scalability by dynamically selecting relevant experts per input token, enabling larger model sizes while maintaining manageable computation costs... |
| WindVE: Collaborative CPU-NPU Vector Embedding | Jinqi Huang, Xuebing Yu, Yi Xiong, Wenjie Huang, Entong Li, Li Zeng, Xin Chen | 2025-04-21 | Download | Retrieval-Augmented Generation is a technology that enhances large language models by integrating information retrieval. In the industry, inference services based on LLMs are highly sensitive to cost-... |
| ReCraft: Self-Contained Split, Merge, and Membership Change of Raft Protocol | Kezhi Xiong, Soonwon Moon, Joshua Kang, Bryant Curto, Jieung Kim, Ji-Yong Shin | 2025-04-21 | Download | Designing reconfiguration schemes for consensus protocols is challenging because subtle corner cases during reconfiguration could invalidate the correctness of the protocol. |
| Cultivating Multidisciplinary Research and Education on GPU Infrastructure for Mid-South Institutions at the University of Memphis: Practice and Challenge | Mayira Sharif, Guangzeng Han, Weisi Liu, Xiaolei Huang | 2025-04-21 | Download | To support rapid scientific advancement and promote access to large-scale computing resources for under-resourced institutions at the Mid-South region, the University of Memphis (UofM) established the... |
| Splitwiser: Efficient LM inference with constrained resources | Asad Aali, Adney Cardoza, Melissa Capo | 2025-04-21 | Download | Efficient inference of LLMs remains a crucial challenge, with two main phases: a compute-intensive prompt computation and a memory-intensive token generation. |
| gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling | Tianyu Guo, Xianwei Zhang, Jiangsu Du, Zhiguang Chen, Nong Xiao, Yutong Lu | 2025-04-21 | Download | Pipeline parallelism has emerged as a predominant approach for deploying large language models (LLMs) across distributed nodes, owing to its lower communication overhead compared to tensor parallelism... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Generative Artificial Intelligence for Beamforming in Low-Altitude Economy | Geng Sun, Jia Qi, Chuang Zhang, Xuejie Liu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu, Dong In Kim | 2025-04-21 | Download | The growth of low-altitude economy (LAE) has driven a rising demand for efficient and secure communication. However, conventional beamforming optimization techniques struggle in the complex LAE enviro... |
| Direct Search Algorithm for Clock Skew Compensation Immune to Floating-Point Precision Loss | Kyeong Soo Kim | 2025-04-21 | Download | We have been investigating clock skew compensation immune to floating-point precision loss by taking into account the discrete nature of clocks in digital communication systems; extending Bresenham's ... |
| NetCloak: Dynamic Topology Expansion for Secure and Scalable Configuration Sharing | Qianye Wang, Yuejie Wang, Yongting Chen, Guyue Liu | 2025-04-21 | Download | As modern networks continue to grow in both scale and complexity, sharing real-world device configurations poses significant privacy risks, especially when adversaries can infer organizational size or... |
| IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification | Fengyuan Nie, Guangjie Liu, Weiwei Liu, Jianan Huang, Bo Gao | 2025-04-21 | Download | Traffic classification is crucial for securing Internet of Things (IoT) networks. Deep learning-based methods can autonomously extract latent patterns from massive network traffic, demonstrating signi... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| LithOS: An Operating System for Efficient Machine Learning on GPUs | Patrick H. Coppock, Brian Zhang, Eliot H. Solomon, Vasilis Kypriotis, Leon Yang, Bikash Sharma, Dan Schatzberg, Todd C. Mowry, Dimitrios Skarlatos | 2025-04-21 | Download | The surging demand for GPUs in datacenters for machine learning (ML) has made efficient GPU utilization crucial. However, meeting the diverse needs of ML models while optimizing resource usage is chal... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis | Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna | 2025-04-21 | Download | The rapid advancements in AI, scientific computing, and high-performance computing (HPC) have driven the need for versatile and efficient hardware accelerators. |
| Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning | Yassir Benhammou, Alessandro Tiberio, Gabriel Trautmann, Suman Kalyan | 2025-04-21 | Download | MILS (Multimodal Iterative LLM Solver) is a recently published framework that claims "LLMs can see and hear without any training" by leveraging an iterative, LLM-CLIP based approach for zero-shot imag... |
| Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments? | Xinglei Dou, Lei Liu, Limin Xiao | 2025-04-21 | Download | Making it intelligent is a promising way in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services. |
