2025-04-28

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
3D MPSoC with On-Chip Cache Support -- Design and Exploitation	Rodrigo Cataldo, Cesar Marcon, Debora Matos	2025-04-28	Download	The increasing density of transistors in Integrated Circuits (ICs) has enabled the development of highly integrated Systems-on-Chip (SoCs) and, more recently, Multiprocessor Systems-on-Chip (MPSoCs).
From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification	Junhao Ye, Yuchen Hu, Ke Xu, Dingrong Pan, Qichun Chen, Jie Zhou, Shuai Zhao, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang	2025-04-28	Download	Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used...
FoldedHexaTorus: An Inter-Chiplet Interconnect Topology for Chiplet-based Systems using Organic and Glass Substrates	Patrick Iff, Maciej Besta, Torsten Hoefler	2025-04-28	Download	Chiplet-based systems are rapidly gaining traction in the market. Two packaging options for such systems are the established organic substrates and the emerging glass substrates.
Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs	Gang Mao, Tousif Rahman, Sidharth Maheshwari, Bob Pattison, Zhuang Shao, Rishad Shafik, Alex Yakovlev	2025-04-28	Download	The increased demand for data privacy and security in machine learning (ML) applications has put impetus on effective edge training on Internet-of-Things (IoT) nodes.
FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs	Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lin Sun, Shuai Zheng, Xiangrong Xu	2025-04-28	Download	Large language models (LLMs) have significantly advanced the natural language processing paradigm but impose substantial demands on memory and computational resources.
Hardware/Software Co-Design of RISC-V Extensions for Accelerating Sparse DNNs on FPGAs	Muhammad Sabih, Abrarul Karim, Jakob Wittmann, Frank Hannig, Jürgen Teich	2025-04-28	Download	The customizability of RISC-V makes it an attractive choice for accelerating deep neural networks (DNNs). It can be achieved through instruction set extensions and corresponding custom functional unit...
Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models	Lei Xu, Shanshan Wang, Emmanuel Casseau, Chenglong Xiao	2025-04-28	Download	High-Level Synthesis (HLS) Design Space Exploration (DSE) is essential for generating hardware designs that balance performance, power, and area (PPA).
ChipletQuake: On-die Digital Impedance Sensing for Chiplet and Interposer Verification	Saleh Khalaj Monfared, Maryam Saadat Safa, Shahin Tajik	2025-04-28	Download	The increasing complexity and cost of manufacturing monolithic chips have driven the semiconductor industry toward chiplet-based designs, where smaller and modular chiplets are integrated onto a singl...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
SoK: A Survey of Mixing Techniques and Mixers for Cryptocurrencies	Juraj Mariani, Ivan Homoliak	2025-04-28	Download	Blockchain technologies have overturned the digital finance industry by introducing a decentralized pseudonymous means of monetary transfer. The pseudonymous nature introduced privacy concerns, enabli...
Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems	Alireza Furutanpey, Carmen Walser, Philipp Raith, Pantelis A. Frangoudis, Schahram Dustdar	2025-04-28	Download	This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and pra...
Cosmos: A Cost Model for Serverless Workflows in the 3D Compute Continuum	Cynthia Marcelino, Sebastian Gollhofer-Berger, Thomas Pusztai, Stefan Nastic	2025-04-28	Download	Due to the high scalability, infrastructure management, and pay-per-use pricing model, serverless computing has been adopted in a wide range of applications such as real-time data processing, IoT, and...
Network-Aware Scheduling for Remote Gate Execution in Quantum Data Centers	Shahrooz Pouryousef, Reza Nejabati, Don Towsley, Ramana Kompella, Eneet Kaur	2025-04-28	Download	Modular quantum computing provides a scalable approach to overcome the limitations of monolithic quantum architectures by interconnecting multiple Quantum Processing Units (QPUs) through a quantum net...
SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling	Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis	2025-04-28	Download	Mixture-of-Experts (MoE) models have become a widely-adopted solution to continue scaling model sizes without a corresponding linear increase in compute.
semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage	Ke Hong, Lufang Chen, Zhong Wang, Xiuhong Li, Qiuli Mao, Jianping Ma, Chao Xiong, Guanyu Wu, Buhe Han, Guohao Dai, Yun Liang, Yu Wang	2025-04-28	Download	Existing large language model (LLM) serving systems fall into two categories: 1) a unified system where prefill phase and decode phase are co-located on the same GPU, sharing the unified computational...
Taming the Titans: A Survey of Efficient LLM Inference Serving	Ranran Zhen, Juntao Li, Yixin Ji, Zhenlin Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zhefeng Wang, Baoxing Huai, Min Zhang	2025-04-28	Download	Large Language Models (LLMs) for Generative AI have achieved remarkable progress, evolving into sophisticated and versatile tools widely adopted across various domains and applications.
Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering	Ke Hong, Xiuhong Li, Minxu Liu, Qiuli Mao, Tianqi Wu, Zixiao Huang, Lufang Chen, Zhong Wang, Yichong Zhang, Zhenhua Zhu, Guohao Dai, Yu Wang	2025-04-28	Download	Generative models have achieved remarkable success across various applications, driving the demand for multi-GPU computing. Inter-GPU communication becomes a bottleneck in multi-GPU computing systems,...
Boosting LLM Serving through Spatial-Temporal GPU Resource Sharing	Zejia Lin, Hongxin Xu, Guanyi Chen, Zhiguang Chen, Yutong Lu, Xianwei Zhang	2025-04-28	Download	Modern LLM serving systems confront inefficient GPU utilization due to the fundamental mismatch between compute-intensive prefill and memory-bound decode phases.
Adjusted Objects: An Efficient and Principled Approach to Scalable Programming (Extended Version)	Boubacar Kane, Pierre Sutra	2025-04-28	Download	Parallel programs require software support to coordinate access to shared data. For this purpose, modern programming languages provide strongly-consistent shared objects.
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler	Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Tianqi Li, Haojie Duanmu, Renze Chen, Ruifan Xu, Yifan Guo, Ningxin Zheng, Ziheng Jiang, Xinyi Di, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Liqiang Lu, Yun Liang, Jidong Zhai, Xin Liu	2025-04-28	Download	In this report, we propose Triton-distributed, an extension of existing Triton compiler, to overcome the programming challenges in distributed AI systems.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Virtual Cybersecurity Department for Securing Digital Twins in Water Distribution Systems	Mohammadhossein Homaei, Agustin Di Bartolo, Oscar Mogollon-Gutierrez, Fernando Broncano Morgado, Pablo Garcia Rodriguez	2025-04-28	Download	Digital twins (DTs) help improve real-time monitoring and decision-making in water distribution systems. However, their connectivity makes them easy targets for cyberattacks such as scanning, denial-o...
Tree embedding based mapping system for low-latency mobile applications in multi-access networks	Yu Mi, Randeep Bhatia, Fang Hao, An Wang, Steve Benno, Tv Lakshman	2025-04-28	Download	Low-latency applications like AR/VR and online gaming need fast, stable connections. New technologies such as V2X, LEO satellites, and 6G bring unique challenges in mobility management.
Network-Aware Scheduling for Remote Gate Execution in Quantum Data Centers	Shahrooz Pouryousef, Reza Nejabati, Don Towsley, Ramana Kompella, Eneet Kaur	2025-04-28	Download	Modular quantum computing provides a scalable approach to overcome the limitations of monolithic quantum architectures by interconnecting multiple Quantum Processing Units (QPUs) through a quantum net...
Mixture of Experts for Decentralized Generative AI and Reinforcement Learning in Wireless Networks: A Comprehensive Survey	Yunting Xu, Jiacheng Wang, Ruichen Zhang, Changyuan Zhao, Dusit Niyato, Jiawen Kang, Zehui Xiong, Bo Qian, Haibo Zhou, Shiwen Mao, Abbas Jamalipour, Xuemin Shen, Dong In Kim	2025-04-28	Download	Mixture of Experts (MoE) has emerged as a promising paradigm for scaling model capacity while preserving computational efficiency, particularly in large-scale machine learning architectures such as la...
Automatic Configuration Protocols for Optical Quantum Networks	Amin Taherkhani, Andrew Todd, Kentaro Teramoto, Rodney Van Meter, Shota Nagayama	2025-04-28	Download	Before quantum networks can scale up to practical sizes, there are many deployment and configuration tasks that must be automated. Currently, quantum networking testbeds are largely manually configure...
Lifecycle Management of Optical Networks with Dynamic-Updating Digital Twin: A Hybrid Data-Driven and Physics-Informed Approach	Yuchen Song, Min Zhang, Yao Zhang, Yan Shi, Shikui Shen, Xiongyan Tang, Shanguo Huang, Danshi Wang	2025-04-28	Download	Digital twin (DT) techniques have been proposed for the autonomous operation and lifecycle management of next-generation optical networks. To fully utilize potential capacity and accommodate dynamic s...
Graph Reinforcement Learning for QoS-Aware Load Balancing in Open Radio Access Networks	Omid Semiari, Hosein Nikopour, Shilpa Talwar	2025-04-28	Download	Next-generation wireless cellular networks are expected to provide unparalleled Quality-of-Service (QoS) for emerging wireless applications, necessitating strict performance guarantees, e.g.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Ariel OS: An Embedded Rust Operating System for Networked Sensors & Multi-Core Microcontrollers	Elena Frank, Kaspar Schleiser, Romain Fouquet, Koen Zandberg, Christian Amsüss, Emmanuel Baccelli	2025-04-28	Download	Large swaths of low-level system software building blocks originally implemented in C/C++ are currently being swapped for equivalent rewrites in Rust, a relatively more secure and dependable programmi...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Network-Aware Scheduling for Remote Gate Execution in Quantum Data Centers	Shahrooz Pouryousef, Reza Nejabati, Don Towsley, Ramana Kompella, Eneet Kaur	2025-04-28	Download	Modular quantum computing provides a scalable approach to overcome the limitations of monolithic quantum architectures by interconnecting multiple Quantum Processing Units (QPUs) through a quantum net...