2025-07-03

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification	Deepak Narayan Gadde, Keerthan Kopparam Radhakrishna, Vaisakh Naduvodi Viswambharan, Aman Kumar, Djones Lettnin, Wolfgang Kunz, Sebastian Simon	2025-07-03	Download	Modern Integrated Circuits (ICs) are becoming increasingly complex, and so is their development process. Hardware design verification entails a methodical and disciplined approach to the planning, dev...
Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure	Rui Xie, Asad Ul Haq, Yunhua Fang, Linsen Ma, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang	2025-07-03	Download	High-Bandwidth Memory (HBM) delivers exceptional bandwidth and energy efficiency for AI workloads, but its high cost per bit, driven in part by stringent on-die reliability requirements, poses a growi...
AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models	Chenhao Xue, Kezhi Li, Jiaxing Zhang, Yi Ren, Zhengyuan Shi, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun	2025-07-03	Download	Arithmetic circuits, such as adders and multipliers, are fundamental components of digital systems, directly impacting the performance, power efficiency, and area footprint.
System-performance and cost modeling of Large Language Model training and inference	Wenzhe Guo, Joyjit Kundu, Uras Tos, Weijiang Kong, Giuliano Sisto, Timon Evenblij, Manu Perumkunnil	2025-07-03	Download	Large language models (LLMs), based on transformer architectures, have revolutionized numerous domains within artificial intelligence, science, and engineering due to their exceptional scalability and...
DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs	Mohammad Akyash, Kimia Azar, Hadi Kamali	2025-07-03	Download	As one of their many applications, large language models (LLMs) have recently shown promise in automating register transfer level (RTL) code generation.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Symbiosis: Multi-Adapter Inference and Fine-Tuning	Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman	2025-07-03	Download	Parameter-efficient fine-tuning (PEFT) allows model builders to capture the task-specific parameters into adapters, which are a fraction of the size of the original base model.
Collective Communication Profiling of Modern-day Machine Learning Workloads	Jit Gupta, Andrew Li, Tarun Banka, Ariel Cohen, T. Sridhar, Raj Yavatkar	2025-07-03	Download	Machine Learning jobs, carried out on large number of distributed high performance systems, involve periodic communication using operations like AllReduce, AllGather, and Broadcast.
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers	Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwasniewski, Yi Zhu, Haruto Fujii, Sakina Fatima, Maciej Besta, Kentaro Katayama, Takumi Honda, Yusuke Nagasaka, Torsten Hoefler	2025-07-03	Download	The energy consumption of large-scale ML models is dominated by data movement, shuffling billions of parameters across memory hierarchies and data centers.
Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications	Seonho Lee, Jihwan Oh, Junkyum Kim, Seokjin Go, Jongse Park, Divya Mahajan	2025-07-03	Download	This paper provides an in-depth characterization of GPU-accelerated systems, to understand the interplay between overlapping computation and communication which is commonly employed in distributed tra...
FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference	Xing Liu, Lizhuo Luo, Ming Tang, Chao Huang, Xu Chen	2025-07-03	Download	Distributed inference serves as a promising approach to enabling the inference of large language models (LLMs) at the network edge. It distributes the inference process to multiple devices to ensure t...
MULTI-SCOUT: Multistatic Integrated Sensing and Communications in 5G and Beyond for Moving Target Detection, Positioning, and Tracking	Yalin E. Sagduyu, Kemal Davaslioglu, Tugba Erpek, Sastry Kompella, Gustave Anderson, Jonathan Ashdown	2025-07-03	Download	This paper presents a complete signal-processing chain for multistatic integrated sensing and communications (ISAC) using 5G Positioning Reference Signal (PRS).
Analysing semantic data storage in Distributed Ledger Technologies for Data Spaces	Juan Cano-Benito, Andrea Cimmino, Sven Hertling, Heiko Paulheim, Raúl García-Castro	2025-07-03	Download	Data spaces are emerging as decentralised infrastructures that enable sovereign, secure, and trustworthy data exchange among multiple participants.
Resolving CAP Through Automata-Theoretic Economic Design: A Unified Mathematical Framework for Real-Time Partition-Tolerant Systems	Craig S Wright	2025-07-03	Download	The CAP theorem asserts a trilemma between consistency, availability, and partition tolerance. This paper introduces a rigorous automata-theoretic and economically grounded framework that reframes the...
Red grape detection with accelerated artificial neural networks in the FPGA's programmable logic	Sandro Costa Magalhães, Marco Almeida, Filipe Neves dos Santos, António Paulo Moreira, Jorge Dias	2025-07-03	Download	Robots usually slow down for canning to detect objects while moving. Additionally, the robot's camera is configured with a low framerate to track the velocity of the detection algorithms.
Alps, a versatile research infrastructure	Maxime Martinasso, Mark Klein, Thomas C. Schulthess	2025-07-03	Download	The Swiss National Supercomputing Centre (CSCS) has a long-standing tradition of delivering top-tier high-performance computing systems, exemplified by the Piz Daint supercomputer.
On the Inference (In-)Security of Vertical Federated Learning: Efficient Auditing against Inference Tampering Attack	Chung-ju Huang, Ziqi Zhang, Yinggui Wang, Binghui Wang, Tao Wei, Leye Wang	2025-07-03	Download	Vertical Federated Learning (VFL) is an emerging distributed learning paradigm for cross-silo collaboration without accessing participants' data.
Flotilla: A scalable, modular and resilient federated learning framework for heterogeneous resources	Roopkatha Banerjee, Prince Modi, Jinal Vyas, Chunduru Sri Abhijit, Tejus Chandrashekar, Harsha Varun Marisetty, Manik Gupta, Yogesh Simmhan	2025-07-03	Download	With the recent improvements in mobile and edge computing and rising concerns of data privacy, Federated Learning (FL) has rapidly gained popularity as a privacy-preserving, distributed machine learnin...
Domain-Adversarial Transfer Learning for Fault Root Cause Identification in Cloud Computing Systems	Bruce Fang, Danyi Gao	2025-07-03	Download	This paper addresses the challenge of fault root cause identification in cloud computing environments. The difficulty arises from complex system structures, dense service coupling, and limited fault i...
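
The Collective Communication Profiling entry above names three collective operations: AllReduce, AllGather, and Broadcast. Their end-state semantics can be sketched as a single-process Python simulation, assuming each list stands in for one rank's buffer and element-wise sum is the reduction operator. The function names are hypothetical, and this is not how NCCL or MPI implement these collectives; real libraries typically use ring- or tree-based algorithms over the network, but every rank ends in the same state shown here.

```python
from typing import List

# Assumption: each inner list is one rank's local buffer (e.g. a gradient shard).
Buffers = List[List[float]]


def all_reduce(buffers: Buffers) -> Buffers:
    """Every rank ends up with the element-wise sum of all ranks' buffers."""
    total = [sum(vals) for vals in zip(*buffers)]
    return [list(total) for _ in buffers]


def all_gather(buffers: Buffers) -> Buffers:
    """Every rank ends up with all ranks' buffers concatenated in rank order."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]


def broadcast(buffers: Buffers, root: int = 0) -> Buffers:
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]


if __name__ == "__main__":
    # Three simulated ranks, each holding a buffer of length 2.
    ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(all_reduce(ranks))  # [[9.0, 12.0], [9.0, 12.0], [9.0, 12.0]]
    print(all_gather(ranks))  # every rank holds [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(broadcast(ranks))   # [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
```

In data-parallel training, AllReduce is the operation that sums gradients across workers after each step, which is why it tends to dominate the periodic communication such profiling studies measure.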

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models	Alexander Shan, Jasleen Kaur, Rahul Singh, Tarun Banka, Raj Yavatkar, T. Sridhar	2025-07-03	Download	Ensuring the reliability and availability of complex networked services demands effective root cause analysis (RCA) across cloud environments, data centers, and on-premises networks.
Collective Communication Profiling of Modern-day Machine Learning Workloads	Jit Gupta, Andrew Li, Tarun Banka, Ariel Cohen, T. Sridhar, Raj Yavatkar	2025-07-03	Download	Machine Learning jobs, carried out on large number of distributed high performance systems, involve periodic communication using operations like AllReduce, AllGather, and Broadcast.
An End-to-End Assurance Framework for AI/ML Workloads in Datacenters	Jit Gupta, Tarun Banka, Rahul Gupta, Mithun Dharmaraj, Jasleen Kaur	2025-07-03	Download	Modern machine learning workloads such as large language model training, fine-tuning jobs are highly distributed and span across hundreds of systems with multiple GPUs.
DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift	Po-Heng Chou, Ching-Wen Chen, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang	2025-07-03	Download	In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths.
On the Architectural Split and Radio Intelligence Controller Placement in Integrated O-RAN-enabled Non-Terrestrial Networks	Jorge Baranda, Marius Caus, Luis Blanco, Cristian J. Vaca-Rubio, Engin Zeydan, Kapal Dev, Zheng Li, Tomaso DeCola	2025-07-03	Download	The integration of Terrestrial Networks (TNs) with Non-Terrestrial Networks (NTNs) poses unique architectural and functional challenges due to heterogeneous propagation conditions, dynamic topologies ...
MULTI-SCOUT: Multistatic Integrated Sensing and Communications in 5G and Beyond for Moving Target Detection, Positioning, and Tracking	Yalin E. Sagduyu, Kemal Davaslioglu, Tugba Erpek, Sastry Kompella, Gustave Anderson, Jonathan Ashdown	2025-07-03	Download	This paper presents a complete signal-processing chain for multistatic integrated sensing and communications (ISAC) using 5G Positioning Reference Signal (PRS).

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Access Control Threatened by Quantum Entanglement	Zhicheng Zhang, Mingsheng Ying	2025-07-03	Download	Access control is a cornerstone of computer security that prevents unauthorised access to resources. In this paper, we study access control in quantum computer systems.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing	Liangyu Wang, Huanyi Xie, Di Wang	2025-07-03	Download	Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. While zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating backward passe...
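
The DistZO2 entry above notes that zeroth-order (ZO) optimization eliminates backward passes. One common way to do this is a two-point, random-direction (SPSA-style) gradient estimate built from two forward evaluations of the loss. The sketch below assumes that estimator and a toy quadratic loss standing in for an LLM; the abstract does not say which estimator DistZO2 uses, and nothing here reflects its distributed design. The names `zo_step` and `quadratic` are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def zo_step(loss: Callable[[Vector], float], theta: Vector, lr: float,
            eps: float, rng: random.Random) -> Vector:
    """One ZO-SGD step from a two-point random-direction gradient estimate.

    Only two forward evaluations of `loss` are used; no backward pass.
    (Assumption: SPSA-style estimator with Gaussian directions.)
    """
    # Sample a random perturbation direction z ~ N(0, I).
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Central difference: estimate of the directional derivative along z.
    proj = (loss(plus) - loss(minus)) / (2.0 * eps)
    # The gradient estimate is proj * z; take a plain SGD step along it.
    return [t - lr * proj * zi for t, zi in zip(theta, z)]


def quadratic(theta: Vector) -> float:
    # Toy loss with its minimum at (3, -1); stands in for a model's forward pass.
    return (theta[0] - 3.0) ** 2 + (theta[1] + 1.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    for _ in range(2000):
        theta = zo_step(quadratic, theta, lr=0.01, eps=1e-3, rng=rng)
    print(theta, quadratic(theta))  # theta approaches [3.0, -1.0]
```

The memory saving comes from never storing activations or gradients for a backward pass. The cost is that each estimate is noisy along a single random direction, so ZO methods typically need many more steps than backpropagation to reach a comparable loss.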