
2025-04-28

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| 3D MPSoC with On-Chip Cache Support -- Design and Exploitation | Rodrigo Cataldo, Cesar Marcon, Debora Matos | 2025-04-28 | Download | The increasing density of transistors in Integrated Circuits (ICs) has enabled the development of highly integrated Systems-on-Chip (SoCs) and, more recently, Multiprocessor Systems-on-Chip (MPSoCs). |
| From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification | Junhao Ye, Yuchen Hu, Ke Xu, Dingrong Pan, Qichun Chen, Jie Zhou, Shuai Zhao, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang | 2025-04-28 | Download | Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of the total development effort. While the Universal Verification Methodology (UVM) is widely used... |
| FoldedHexaTorus: An Inter-Chiplet Interconnect Topology for Chiplet-based Systems using Organic and Glass Substrates | Patrick Iff, Maciej Besta, Torsten Hoefler | 2025-04-28 | Download | Chiplet-based systems are rapidly gaining traction in the market. Two packaging options for such systems are the established organic substrates and the emerging glass substrates. |
| Dynamic Tsetlin Machine Accelerators for On-Chip Training at the Edge using FPGAs | Gang Mao, Tousif Rahman, Sidharth Maheshwari, Bob Pattison, Zhuang Shao, Rishad Shafik, Alex Yakovlev | 2025-04-28 | Download | The increased demand for data privacy and security in machine learning (ML) applications has put impetus on effective edge training on Internet-of-Things (IoT) nodes. |
| FineQ: Software-Hardware Co-Design for Low-Bit Fine-Grained Mixed-Precision Quantization of LLMs | Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lin Sun, Shuai Zheng, Xiangrong Xu | 2025-04-28 | Download | Large language models (LLMs) have significantly advanced the natural language processing paradigm but impose substantial demands on memory and computational resources. |
| Hardware/Software Co-Design of RISC-V Extensions for Accelerating Sparse DNNs on FPGAs | Muhammad Sabih, Abrarul Karim, Jakob Wittmann, Frank Hannig, Jürgen Teich | 2025-04-28 | Download | The customizability of RISC-V makes it an attractive choice for accelerating deep neural networks (DNNs). It can be achieved through instruction set extensions and corresponding custom functional unit... |
| Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models | Lei Xu, Shanshan Wang, Emmanuel Casseau, Chenglong Xiao | 2025-04-28 | Download | High-Level Synthesis (HLS) Design Space Exploration (DSE) is essential for generating hardware designs that balance performance, power, and area (PPA). |
| ChipletQuake: On-die Digital Impedance Sensing for Chiplet and Interposer Verification | Saleh Khalaj Monfared, Maryam Saadat Safa, Shahin Tajik | 2025-04-28 | Download | The increasing complexity and cost of manufacturing monolithic chips have driven the semiconductor industry toward chiplet-based designs, where smaller and modular chiplets are integrated onto a singl... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SoK: A Survey of Mixing Techniques and Mixers for Cryptocurrencies | Juraj Mariani, Ivan Homoliak | 2025-04-28 | Download | Blockchain technologies have overturned the digital finance industry by introducing a decentralized pseudonymous means of monetary transfer. The pseudonymous nature introduced privacy concerns, enabli... |
| Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems | Alireza Furutanpey, Carmen Walser, Philipp Raith, Pantelis A. Frangoudis, Schahram Dustdar | 2025-04-28 | Download | This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and pra... |
| Cosmos: A Cost Model for Serverless Workflows in the 3D Compute Continuum | Cynthia Marcelino, Sebastian Gollhofer-Berger, Thomas Pusztai, Stefan Nastic | 2025-04-28 | Download | Due to the high scalability, infrastructure management, and pay-per-use pricing model, serverless computing has been adopted in a wide range of applications such as real-time data processing, IoT, and... |
| Network-Aware Scheduling for Remote Gate Execution in Quantum Data Centers | Shahrooz Pouryousef, Reza Nejabati, Don Towsley, Ramana Kompella, Eneet Kaur | 2025-04-28 | Download | Modular quantum computing provides a scalable approach to overcome the limitations of monolithic quantum architectures by interconnecting multiple Quantum Processing Units (QPUs) through a quantum net... |
| SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling | Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis | 2025-04-28 | Download | Mixture-of-Experts (MoE) models have become a widely-adopted solution to continue scaling model sizes without a corresponding linear increase in compute. |
| semi-PD: Towards Efficient LLM Serving via Phase-Wise Disaggregated Computation and Unified Storage | Ke Hong, Lufang Chen, Zhong Wang, Xiuhong Li, Qiuli Mao, Jianping Ma, Chao Xiong, Guanyu Wu, Buhe Han, Guohao Dai, Yun Liang, Yu Wang | 2025-04-28 | Download | Existing large language model (LLM) serving systems fall into two categories: 1) a unified system where prefill phase and decode phase are co-located on the same GPU, sharing the unified computational... |
| Taming the Titans: A Survey of Efficient LLM Inference Serving | Ranran Zhen, Juntao Li, Yixin Ji, Zhenlin Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zhefeng Wang, Baoxing Huai, Min Zhang | 2025-04-28 | Download | Large Language Models (LLMs) for Generative AI have achieved remarkable progress, evolving into sophisticated and versatile tools widely adopted across various domains and applications. |
| Efficient and Adaptable Overlapping for Computation and Communication via Signaling and Reordering | Ke Hong, Xiuhong Li, Minxu Liu, Qiuli Mao, Tianqi Wu, Zixiao Huang, Lufang Chen, Zhong Wang, Yichong Zhang, Zhenhua Zhu, Guohao Dai, Yu Wang | 2025-04-28 | Download | Generative models have achieved remarkable success across various applications, driving the demand for multi-GPU computing. Inter-GPU communication becomes a bottleneck in multi-GPU computing systems,... |
| Boosting LLM Serving through Spatial-Temporal GPU Resource Sharing | Zejia Lin, Hongxin Xu, Guanyi Chen, Zhiguang Chen, Yutong Lu, Xianwei Zhang | 2025-04-28 | Download | Modern LLM serving systems confront inefficient GPU utilization due to the fundamental mismatch between compute-intensive prefill and memory-bound decode phases. |
| Adjusted Objects: An Efficient and Principled Approach to Scalable Programming (Extended Version) | Boubacar Kane, Pierre Sutra | 2025-04-28 | Download | Parallel programs require software support to coordinate access to shared data. For this purpose, modern programming languages provide strongly-consistent shared objects. |
| Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler | Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Tianqi Li, Haojie Duanmu, Renze Chen, Ruifan Xu, Yifan Guo, Ningxin Zheng, Ziheng Jiang, Xinyi Di, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Liqiang Lu, Yun Liang, Jidong Zhai, Xin Liu | 2025-04-28 | Download | In this report, we propose Triton-distributed, an extension of existing Triton compiler, to overcome the programming challenges in distributed AI systems. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Virtual Cybersecurity Department for Securing Digital Twins in Water Distribution Systems | Mohammadhossein Homaei, Agustin Di Bartolo, Oscar Mogollon-Gutierrez, Fernando Broncano Morgado, Pablo Garcia Rodriguez | 2025-04-28 | Download | Digital twins (DTs) help improve real-time monitoring and decision-making in water distribution systems. However, their connectivity makes them easy targets for cyberattacks such as scanning, denial-o... |
| Tree embedding based mapping system for low-latency mobile applications in multi-access networks | Yu Mi, Randeep Bhatia, Fang Hao, An Wang, Steve Benno, Tv Lakshman | 2025-04-28 | Download | Low-latency applications like AR/VR and online gaming need fast, stable connections. New technologies such as V2X, LEO satellites, and 6G bring unique challenges in mobility management. |
| Network-Aware Scheduling for Remote Gate Execution in Quantum Data Centers | Shahrooz Pouryousef, Reza Nejabati, Don Towsley, Ramana Kompella, Eneet Kaur | 2025-04-28 | Download | Modular quantum computing provides a scalable approach to overcome the limitations of monolithic quantum architectures by interconnecting multiple Quantum Processing Units (QPUs) through a quantum net... |
| Mixture of Experts for Decentralized Generative AI and Reinforcement Learning in Wireless Networks: A Comprehensive Survey | Yunting Xu, Jiacheng Wang, Ruichen Zhang, Changyuan Zhao, Dusit Niyato, Jiawen Kang, Zehui Xiong, Bo Qian, Haibo Zhou, Shiwen Mao, Abbas Jamalipour, Xuemin Shen, Dong In Kim | 2025-04-28 | Download | Mixture of Experts (MoE) has emerged as a promising paradigm for scaling model capacity while preserving computational efficiency, particularly in large-scale machine learning architectures such as la... |
| Automatic Configuration Protocols for Optical Quantum Networks | Amin Taherkhani, Andrew Todd, Kentaro Teramoto, Rodney Van Meter, Shota Nagayama | 2025-04-28 | Download | Before quantum networks can scale up to practical sizes, there are many deployment and configuration tasks that must be automated. Currently, quantum networking testbeds are largely manually configure... |
| Lifecycle Management of Optical Networks with Dynamic-Updating Digital Twin: A Hybrid Data-Driven and Physics-Informed Approach | Yuchen Song, Min Zhang, Yao Zhang, Yan Shi, Shikui Shen, Xiongyan Tang, Shanguo Huang, Danshi Wang | 2025-04-28 | Download | Digital twin (DT) techniques have been proposed for the autonomous operation and lifecycle management of next-generation optical networks. To fully utilize potential capacity and accommodate dynamic s... |
| Graph Reinforcement Learning for QoS-Aware Load Balancing in Open Radio Access Networks | Omid Semiari, Hosein Nikopour, Shilpa Talwar | 2025-04-28 | Download | Next-generation wireless cellular networks are expected to provide unparalleled Quality-of-Service (QoS) for emerging wireless applications, necessitating strict performance guarantees, e.g. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Ariel OS: An Embedded Rust Operating System for Networked Sensors & Multi-Core Microcontrollers | Elena Frank, Kaspar Schleiser, Romain Fouquet, Koen Zandberg, Christian Amsüss, Emmanuel Baccelli | 2025-04-28 | Download | Large swaths of low-level system software building blocks originally implemented in C/C++ are currently being swapped for equivalent rewrites in Rust, a relatively more secure and dependable programmi... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Network-Aware Scheduling for Remote Gate Execution in Quantum Data Centers | Shahrooz Pouryousef, Reza Nejabati, Don Towsley, Ramana Kompella, Eneet Kaur | 2025-04-28 | Download | Modular quantum computing provides a scalable approach to overcome the limitations of monolithic quantum architectures by interconnecting multiple Quantum Processing Units (QPUs) through a quantum net... |
