2025-07-03

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Hey AI, Generate Me a Hardware Code! Agentic AI-based Hardware Design & Verification | Deepak Narayan Gadde, Keerthan Kopparam Radhakrishna, Vaisakh Naduvodi Viswambharan, Aman Kumar, Djones Lettnin, Wolfgang Kunz, Sebastian Simon | 2025-07-03 | Download | Modern Integrated Circuits (ICs) are becoming increasingly complex, and so is their development process. Hardware design verification entails a methodical and disciplined approach to the planning, dev... |
| Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure | Rui Xie, Asad Ul Haq, Yunhua Fang, Linsen Ma, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang | 2025-07-03 | Download | High-Bandwidth Memory (HBM) delivers exceptional bandwidth and energy efficiency for AI workloads, but its high cost per bit, driven in part by stringent on-die reliability requirements, poses a growi... |
| AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models | Chenhao Xue, Kezhi Li, Jiaxing Zhang, Yi Ren, Zhengyuan Shi, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun | 2025-07-03 | Download | Arithmetic circuits, such as adders and multipliers, are fundamental components of digital systems, directly impacting the performance, power efficiency, and area footprint. |
| System-performance and cost modeling of Large Language Model training and inference | Wenzhe Guo, Joyjit Kundu, Uras Tos, Weijiang Kong, Giuliano Sisto, Timon Evenblij, Manu Perumkunnil | 2025-07-03 | Download | Large language models (LLMs), based on transformer architectures, have revolutionized numerous domains within artificial intelligence, science, and engineering due to their exceptional scalability and... |
| DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs | Mohammad Akyash, Kimia Azar, Hadi Kamali | 2025-07-03 | Download | As one of their many applications, large language models (LLMs) have recently shown promise in automating register transfer level (RTL) code generation. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Symbiosis: Multi-Adapter Inference and Fine-Tuning | Saransh Gupta, Umesh Deshpande, Travis Janssen, Swami Sundararaman | 2025-07-03 | Download | Parameter-efficient fine-tuning (PEFT) allows model builders to capture the task-specific parameters into adapters, which are a fraction of the size of the original base model. |
| Collective Communication Profiling of Modern-day Machine Learning Workloads | Jit Gupta, Andrew Li, Tarun Banka, Ariel Cohen, T. Sridhar, Raj Yavatkar | 2025-07-03 | Download | Machine Learning jobs, carried out on large number of distributed high performance systems, involve periodic communication using operations like AllReduce, AllGather, and Broadcast. |
| BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers | Patrik Okanovic, Sameer Deshmukh, Grzegorz Kwasniewski, Yi Zhu, Haruto Fujii, Sakina Fatima, Maciej Besta, Kentaro Katayama, Takumi Honda, Yusuke Nagasaka, Torsten Hoefler | 2025-07-03 | Download | The energy consumption of large-scale ML models is dominated by data movement, shuffling billions of parameters across memory hierarchies and data centers. |
| Characterizing Compute-Communication Overlap in GPU-Accelerated Distributed Deep Learning: Performance and Power Implications | Seonho Lee, Jihwan Oh, Junkyum Kim, Seokjin Go, Jongse Park, Divya Mahajan | 2025-07-03 | Download | This paper provides an in-depth characterization of GPU-accelerated systems, to understand the interplay between overlapping computation and communication which is commonly employed in distributed tra... |
| FlowSpec: Continuous Pipelined Speculative Decoding for Efficient Distributed LLM Inference | Xing Liu, Lizhuo Luo, Ming Tang, Chao Huang, Xu Chen | 2025-07-03 | Download | Distributed inference serves as a promising approach to enabling the inference of large language models (LLMs) at the network edge. It distributes the inference process to multiple devices to ensure t... |
| MULTI-SCOUT: Multistatic Integrated Sensing and Communications in 5G and Beyond for Moving Target Detection, Positioning, and Tracking | Yalin E. Sagduyu, Kemal Davaslioglu, Tugba Erpek, Sastry Kompella, Gustave Anderson, Jonathan Ashdown | 2025-07-03 | Download | This paper presents a complete signal-processing chain for multistatic integrated sensing and communications (ISAC) using 5G Positioning Reference Signal (PRS). |
| Analysing semantic data storage in Distributed Ledger Technologies for Data Spaces | Juan Cano-Benito, Andrea Cimmino, Sven Hertling, Heiko Paulheim, Raúl García-Castro | 2025-07-03 | Download | Data spaces are emerging as decentralised infrastructures that enable sovereign, secure, and trustworthy data exchange among multiple participants. |
| Resolving CAP Through Automata-Theoretic Economic Design: A Unified Mathematical Framework for Real-Time Partition-Tolerant Systems | Craig S Wright | 2025-07-03 | Download | The CAP theorem asserts a trilemma between consistency, availability, and partition tolerance. This paper introduces a rigorous automata-theoretic and economically grounded framework that reframes the... |
| Red grape detection with accelerated artificial neural networks in the FPGA's programmable logic | Sandro Costa Magalhães, Marco Almeida, Filipe Neves dos Santos, António Paulo Moreira, Jorge Dias | 2025-07-03 | Download | Robots usually slow down for canning to detect objects while moving. Additionally, the robot's camera is configured with a low framerate to track the velocity of the detection algorithms. |
| Alps, a versatile research infrastructure | Maxime Martinasso, Mark Klein, Thomas C. Schulthess | 2025-07-03 | Download | The Swiss National Supercomputing Centre (CSCS) has a long-standing tradition of delivering top-tier high-performance computing systems, exemplified by the Piz Daint supercomputer. |
| On the Inference (In-)Security of Vertical Federated Learning: Efficient Auditing against Inference Tampering Attack | Chung-ju Huang, Ziqi Zhang, Yinggui Wang, Binghui Wang, Tao Wei, Leye Wang | 2025-07-03 | Download | Vertical Federated Learning (VFL) is an emerging distributed learning paradigm for cross-silo collaboration without accessing participants' data. |
| Flotilla: A scalable, modular and resilient federated learning framework for heterogeneous resources | Roopkatha Banerjee, Prince Modi, Jinal Vyas, Chunduru Sri Abhijit, Tejus Chandrashekar, Harsha Varun Marisetty, Manik Gupta, Yogesh Simmhan | 2025-07-03 | Download | With the recent improvements in mobile and edge computing and rising concerns of data privacy, Federated Learning(FL) has rapidly gained popularity as a privacy-preserving, distributed machine learnin... |
| Domain-Adversarial Transfer Learning for Fault Root Cause Identification in Cloud Computing Systems | Bruce Fang, Danyi Gao | 2025-07-03 | Download | This paper addresses the challenge of fault root cause identification in cloud computing environments. The difficulty arises from complex system structures, dense service coupling, and limited fault i... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| RCA Copilot: Transforming Network Data into Actionable Insights via Large Language Models | Alexander Shan, Jasleen Kaur, Rahul Singh, Tarun Banka, Raj Yavatkar, T. Sridhar | 2025-07-03 | Download | Ensuring the reliability and availability of complex networked services demands effective root cause analysis (RCA) across cloud environments, data centers, and on-premises networks. |
| Collective Communication Profiling of Modern-day Machine Learning Workloads | Jit Gupta, Andrew Li, Tarun Banka, Ariel Cohen, T. Sridhar, Raj Yavatkar | 2025-07-03 | Download | Machine Learning jobs, carried out on large number of distributed high performance systems, involve periodic communication using operations like AllReduce, AllGather, and Broadcast. |
| An End-to-End Assurance Framework for AI/ML Workloads in Datacenters | Jit Gupta, Tarun Banka, Rahul Gupta, Mithun Dharmaraj, Jasleen Kaur | 2025-07-03 | Download | Modern machine learning workloads such as large language model training, fine-tuning jobs are highly distributed and span across hundreds of systems with multiple GPUs. |
| DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift | Po-Heng Chou, Ching-Wen Chen, Wan-Jen Huang, Walid Saad, Yu Tsao, Ronald Y. Chang | 2025-07-03 | Download | In this paper, the precoding design is investigated for maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths. |
| On the Architectural Split and Radio Intelligence Controller Placement in Integrated O-RAN-enabled Non-Terrestrial Networks | Jorge Baranda, Marius Caus, Luis Blanco, Cristian J. Vaca-Rubio, Engin Zeydan, Kapal Dev, Zheng Li, Tomaso DeCola | 2025-07-03 | Download | The integration of Terrestrial Networks (TNs) with Non-Terrestrial Networks (NTNs) poses unique architectural and functional challenges due to heterogeneous propagation conditions, dynamic topologies ... |
| MULTI-SCOUT: Multistatic Integrated Sensing and Communications in 5G and Beyond for Moving Target Detection, Positioning, and Tracking | Yalin E. Sagduyu, Kemal Davaslioglu, Tugba Erpek, Sastry Kompella, Gustave Anderson, Jonathan Ashdown | 2025-07-03 | Download | This paper presents a complete signal-processing chain for multistatic integrated sensing and communications (ISAC) using 5G Positioning Reference Signal (PRS). |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Access Control Threatened by Quantum Entanglement | Zhicheng Zhang, Mingsheng Ying | 2025-07-03 | Download | Access control is a cornerstone of computer security that prevents unauthorised access to resources. In this paper, we study access control in quantum computer systems. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing | Liangyu Wang, Huanyi Xie, Di Wang | 2025-07-03 | Download | Fine-tuning large language models (LLMs) remains resource-intensive due to their sheer scale. While zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating backward passe... |
