2025-04-21

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis	Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna	2025-04-21	Download	The rapid advancements in AI, scientific computing, and high-performance computing (HPC) have driven the need for versatile and efficient hardware accelerators.
Computing with Printed and Flexible Electronics	Mehdi B. Tahoori, Emre Ozer, Georgios Zervakis, Konstantinos Balaskas, Priyanjana Pal	2025-04-21	Download	Printed and flexible electronics (PFE) have emerged as the ubiquitous solution for application domains at the extreme edge, where the demands for low manufacturing and operational cost cannot be met b...
ForgeBench: A Machine Learning Benchmark Suite and Auto-Generation Framework for Next-Generation HLS Tools	Andy Wanna, Hanqiu Chen, Cong Hao	2025-04-21	Download	Although High-Level Synthesis (HLS) has attracted considerable interest in hardware design, it has not yet become mainstream due to two primary challenges.
Advancing AI-assisted Hardware Design with Hierarchical Decentralized Training and Personalized Inference-Time Optimization	Hao Mark Chen, Zehuan Zhang, Wanru Zhao, Nicholas Lane, Hongxiang Fan	2025-04-21	Download	Recent years have witnessed a significant increase in the adoption of AI techniques to enhance electronic design automation. In particular, the emergence of Large Language Models (LLMs) has sparked si...
Considerations on the Design of Transceivers for Ambient Internet of Things	Yuxiao Zhao, Zhen Shen, Shiyu Li, Jing Feng, Hao Min	2025-04-21	Download	The Ambient IoT (A-IoT) will introduce trillions of connections and enable low-cost battery-less devices. The A-IoT nodes can achieve low cost ($\sim$ 0.
Hardware-based Heterogeneous Memory Management for Large Language Model Inference	Soojin Hwang, Jungwoo Kim, Sanghyeon Lee, Hongbeen Kim, Jaehyuk Huh	2025-04-21	Download	A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferen...
GainSight: A Unified Framework for Data Lifetime Profiling and Heterogeneous Memory Composition	Peijing Li, Matthew Hung, Yiming Tan, Konstantin Hoßfeld, Jake Cheng Jiajun, Shuhan Liu, Lixian Yan, Xinxin Wang, Philip Levis, H.-S. Philip Wong, Thierry Tambe	2025-04-21	Download	As AI workloads drive increasing memory requirements, domain-specific accelerators need higher-density on-chip memory beyond what current SRAM scaling trends can provide.
Ultra-Low-Power Spiking Neurons in 7 nm FinFET Technology: A Comparative Analysis of Leaky Integrate-and-Fire, Morris-Lecar, and Axon-Hillock Architectures	Logan Larsh, Raiyan Siddique, Sarah Sharif, Yaser Mike Banad	2025-04-21	Download	Neuromorphic computing aims to replicate the brain's remarkable energy efficiency and parallel processing capabilities for large-scale artificial intelligence applications.
Splitwiser: Efficient LM inference with constrained resources	Asad Aali, Adney Cardoza, Melissa Capo	2025-04-21	Download	Efficient inference of LLMs remains a crucial challenge, with two main phases: a compute-intensive prompt computation and a memory-intensive token generation.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Tracing Cross-chain Transactions between EVM-based Blockchains: An Analysis of Ethereum-Polygon Bridges	Tao Yan, Chuanshan Huang, Claudio J. Tessone	2025-04-21	Download	Ethereum's scalability has been a major concern due to its limited transaction throughput and high fees. To address these limitations, Polygon has emerged as a sidechain solution that facilitates asse...
FedFetch: Faster Federated Learning with Adaptive Downstream Prefetching	Qifan Yan, Andrew Liu, Shiqi He, Mathias Lécuyer, Ivan Beschastnikh	2025-04-21	Download	Federated learning (FL) is a machine learning paradigm that facilitates massively distributed model training with end-user data on edge devices directed by a central server.
Advancing AI-assisted Hardware Design with Hierarchical Decentralized Training and Personalized Inference-Time Optimization	Hao Mark Chen, Zehuan Zhang, Wanru Zhao, Nicholas Lane, Hongxiang Fan	2025-04-21	Download	Recent years have witnessed a significant increase in the adoption of AI techniques to enhance electronic design automation. In particular, the emergence of Large Language Models (LLMs) has sparked si...
To Offload or Not To Offload: Model-driven Comparison of Edge-native and On-device Processing In the Era of Accelerators	Nathan Ng, David Irwin, Ananthram Swami, Don Towsley, Prashant Shenoy	2025-04-21	Download	Computational offloading is a promising approach for overcoming resource constraints on client devices by moving some or all of an application's computations to remote servers.
Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments?	Xinglei Dou, Lei Liu, Limin Xiao	2025-04-21	Download	Making it intelligent is a promising way in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services.
SLO-Aware Scheduling for Large Language Model Inferences	Jinqi Huang, Yi Xiong, Xuebing Yu, Wenjie Huang, Entong Li, Li Zeng, Xin Chen	2025-04-21	Download	Large language models (LLMs) have revolutionized applications such as code completion, chatbots, and online classification. To elevate user experiences, service level objectives (SLOs) serve as crucia...
MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core	Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, Vijay Korthikanti, Evan Wu, Shiqing Fan, Gao Deng, Hongxiao Bai, Jianbin Chang, Ashwath Aithal, Michael Andersch, Mohammad Shoeybi, Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, June Yang	2025-04-21	Download	Mixture of Experts (MoE) models enhance neural network scalability by dynamically selecting relevant experts per input token, enabling larger model sizes while maintaining manageable computation costs...
WindVE: Collaborative CPU-NPU Vector Embedding	Jinqi Huang, Xuebing Yu, Yi Xiong, Wenjie Huang, Entong Li, Li Zeng, Xin Chen	2025-04-21	Download	Retrieval-Augmented Generation is a technology that enhances large language models by integrating information retrieval. In the industry, inference services based on LLMs are highly sensitive to cost-...
ReCraft: Self-Contained Split, Merge, and Membership Change of Raft Protocol	Kezhi Xiong, Soonwon Moon, Joshua Kang, Bryant Curto, Jieung Kim, Ji-Yong Shin	2025-04-21	Download	Designing reconfiguration schemes for consensus protocols is challenging because subtle corner cases during reconfiguration could invalidate the correctness of the protocol.
Cultivating Multidisciplinary Research and Education on GPU Infrastructure for Mid-South Institutions at the University of Memphis: Practice and Challenge	Mayira Sharif, Guangzeng Han, Weisi Liu, Xiaolei Huang	2025-04-21	Download	To support rapid scientific advancement and promote access to large-scale computing resources for under-resourced institutions at the Mid-South region, the University of Memphis (UofM) established the...
Splitwiser: Efficient LM inference with constrained resources	Asad Aali, Adney Cardoza, Melissa Capo	2025-04-21	Download	Efficient inference of LLMs remains a crucial challenge, with two main phases: a compute-intensive prompt computation and a memory-intensive token generation.
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling	Tianyu Guo, Xianwei Zhang, Jiangsu Du, Zhiguang Chen, Nong Xiao, Yutong Lu	2025-04-21	Download	Pipeline parallelism has emerged as a predominant approach for deploying large language models (LLMs) across distributed nodes, owing to its lower communication overhead compared to tensor parallelism...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Generative Artificial Intelligence for Beamforming in Low-Altitude Economy	Geng Sun, Jia Qi, Chuang Zhang, Xuejie Liu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu, Dong In Kim	2025-04-21	Download	The growth of low-altitude economy (LAE) has driven a rising demand for efficient and secure communication. However, conventional beamforming optimization techniques struggle in the complex LAE enviro...
Direct Search Algorithm for Clock Skew Compensation Immune to Floating-Point Precision Loss	Kyeong Soo Kim	2025-04-21	Download	We have been investigating clock skew compensation immune to floating-point precision loss by taking into account the discrete nature of clocks in digital communication systems; extending Bresenham's ...
NetCloak: Dynamic Topology Expansion for Secure and Scalable Configuration Sharing	Qianye Wang, Yuejie Wang, Yongting Chen, Guyue Liu	2025-04-21	Download	As modern networks continue to grow in both scale and complexity, sharing real-world device configurations poses significant privacy risks, especially when adversaries can infer organizational size or...
IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification	Fengyuan Nie, Guangjie Liu, Weiwei Liu, Jianan Huang, Bo Gao	2025-04-21	Download	Traffic classification is crucial for securing Internet of Things (IoT) networks. Deep learning-based methods can autonomously extract latent patterns from massive network traffic, demonstrating signi...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
LithOS: An Operating System for Efficient Machine Learning on GPUs	Patrick H. Coppock, Brian Zhang, Eliot H. Solomon, Vasilis Kypriotis, Leon Yang, Bikash Sharma, Dan Schatzberg, Todd C. Mowry, Dimitrios Skarlatos	2025-04-21	Download	The surging demand for GPUs in datacenters for machine learning (ML) has made efficient GPU utilization crucial. However, meeting the diverse needs of ML models while optimizing resource usage is chal...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis	Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna	2025-04-21	Download	The rapid advancements in AI, scientific computing, and high-performance computing (HPC) have driven the need for versatile and efficient hardware accelerators.
Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning	Yassir Benhammou, Alessandro Tiberio, Gabriel Trautmann, Suman Kalyan	2025-04-21	Download	MILS (Multimodal Iterative LLM Solver) is a recently published framework that claims "LLMs can see and hear without any training" by leveraging an iterative, LLM-CLIP based approach for zero-shot imag...
Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments?	Xinglei Dou, Lei Liu, Limin Xiao	2025-04-21	Download	Making it intelligent is a promising way in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services.