2025-09-05

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Comparing Methods for the Cross-Level Verification of SystemC Peripherals with Symbolic Execution	Karl Aaron Rudkowski, Sallar Ahmadi-Pour, Rolf Drechsler	2025-09-05	Download	Virtual Prototypes (VPs) are important tools in modern hardware development. At high abstractions, they are often implemented in SystemC and offer early analysis of increasingly complex designs.
Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device	Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang	2025-09-05	Download	Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications.
Distributed-HISQ: A Distributed Quantum Control Architecture	Yilun Zhao, Kangding Zhao, Peng Zhou, Dingdong Liu, Tingyu Luo, Yuzhen Zheng, Peng Luo, Shun Hu, Jin Lin, Cheng Guo, Yinhe Han, Ying Wang, Mingtang Deng, Junjie Wu, X. Fu	2025-09-05	Download	The design of a scalable Quantum Control Architecture (QCA) faces two primary challenges. First, the continuous growth in qubit counts has rendered distributed QCA inevitable, yet the nondeterministic...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology	Dhanya R Mathews, Mudit Verma, Pooja Aggarwal, J. Lakshmi	2025-09-05	Download	Cloud application services are distributed in nature and have components across the stack working together to deliver the experience to end users.
veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD	Youjie Li, Cheng Wan, Zhiqi Lin, Hongyu Zhu, Jiacheng Yang, Ziang Song, Xinyi Di, Jiawei Wu, Huiyao Shu, Wenlei Bao, Yanghua Peng, Haibin Lin, Li-Wen Chang	2025-09-05	Download	Large Language Models (LLMs) have scaled rapidly in size and complexity, requiring increasingly intricate parallelism for distributed training, such as 3D parallelism.
On Using Large-Batches in Federated Learning	Sahil Tyagi	2025-09-05	Download	Efficient Federated learning (FL) is crucial for training deep networks over devices with limited compute resources and bounded networks. With the advent of big data, devices either generate or collec...
Scaling Performance of Large Language Model Pretraining	Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther	2025-09-05	Download	Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; front...
Dynamic reconfiguration for malleable applications using RMA	Iker Martín-Álvarez, José I. Aliaga, Maribel Castillo	2025-09-05	Download	This paper investigates the novel one-sided communication methods based on remote memory access (RMA) operations in MPI for dynamic resizing of malleable applications, enabling data redistribution wit...
Toward Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization	Mengjiao Han, Andres Sewell, Joseph Insley, Janet Knowles, Victor A. Mateevitsi, Michael E. Papka, Steve Petruzza, Silvio Rizzi	2025-09-05	Download	We present a multi-GPU extension of the 3D Gaussian Splatting (3D-GS) pipeline for scientific visualization. Building on previous work that demonstrated high-fidelity isosurface reconstruction using G...
An Efficient Subspace Algorithm for Federated Learning on Heterogeneous Data	Jiaojiao Zhang, Yuqi Xu, Kun Yuan	2025-09-05	Download	This work addresses the key challenges of applying federated learning to large-scale deep neural networks, particularly the issue of client drift due to data heterogeneity across clients and the high ...
Discovering Software Parallelization Points Using Deep Neural Networks	Izavan dos S. Correia, Henrique C. T. Santos, Tiago A. E. Ferreira	2025-09-05	Download	This study proposes a deep learning-based approach for discovering loops in programming code according to their potential for parallelization.
Ratio1 -- AI meta-OS	Andrei Damian, Petrica Butusina, Alessandro De Franceschi, Vitalii Toderian, Marius Grigoras, Cristian Bleotiu	2025-09-05	Download	We propose the Ratio1 AI meta-operating system (meta-OS), a decentralized MLOps protocol that unifies AI model development, deployment, and inference across heterogeneous edge devices.
VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving	Jiahuan Yu, Aryan Taneja, Junfeng Lin, Minjia Zhang	2025-09-05	Download	Modern Large Language Model (LLM) serving systems increasingly support interactive applications, like real-time chat assistants, code generation tools, and agentic workflows.
Accelerating Privacy-Preserving Federated Learning in Large-Scale LEO Satellite Systems	Binquan Guo, Junteng Cao, Marie Siew, Binbin Chen, Tony Q. S. Quek, Zhu Han	2025-09-05	Download	Large-scale low-Earth-orbit (LEO) satellite systems are increasingly valued for their ability to enable rapid and wide-area data exchange, thereby facilitating the collaborative training of artificial...
STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs	Han Liang, Jiahui Zhou, Zicheng Zhou, Xiaoxi Zhang, Xu Chen	2025-09-05	Download	The escalating adoption of diffusion models for applications such as image generation demands efficient parallel inference techniques to manage their substantial computational cost.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Joint Routing, Resource Allocation, and Energy Optimization for Integrated Access and Backhaul with Open RAN	Reshma Prasad, Maxime Elkael, Gabriele Gemmi, Osama M. Bushnaq, Debashisha Mishra, Prasanna Raut, Jennifer Simonjan, Michele Polese, Tommaso Melodia	2025-09-05	Download	As networks evolve towards 6G, Mobile Network Operators (MNOs) must accommodate diverse requirements and at the same time manage rising energy consumption.
Distributed Link Sparsification for Scalable Scheduling Using Graph Neural Networks (Journal Version)	Zhongyuan Zhao, Gunjan Verma, Ananthram Swami, Santiago Segarra	2025-09-05	Download	In wireless networks characterized by dense connectivity, the significant signaling overhead generated by distributed link scheduling algorithms can exacerbate issues like congestion, energy consumpti...
Pair-Bid Auction Model for Optimized Network Slicing in 5G RAN	Mengyao Li, Sebastian Troia, Yingqian Zhang, Guido Maier	2025-09-05	Download	Network slicing is a key 5G technology that enables multiple virtual networks to share physical infrastructure, optimizing flexibility and resource allocation.
A Federated Fine-Tuning Paradigm of Foundation Models in Heterogenous Wireless Networks	Jingyi Wang, Zhongyuan Zhao, Qingtian Wang, Zexu Li, Yue Wang, Tony Q. S. Quek	2025-09-05	Download	Edge intelligence has emerged as a promising strategy to deliver low-latency and ubiquitous services for mobile devices. Recent advances in fine-tuning mechanisms of foundation models have enabled edg...
Where Have All the Firewalls Gone? Security Consequences of Residential IPv6 Transition	Erik Rye, Dave Levin, Robert Beverly	2025-09-05	Download	IPv4 NAT has limited the spread of IoT botnets considerably by default-denying bots' incoming connection requests to in-home devices unless the owner has explicitly allowed them.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
MambaLite-Micro: Memory-Optimized Mamba Inference on MCUs	Hongjun Xu, Junxi Xia, Weisi Yang, Yueyuan Sui, Stephen Xia	2025-09-05	Download	Deploying Mamba models on microcontrollers (MCUs) remains challenging due to limited memory, the lack of native operator support, and the absence of embedded-friendly toolchains.
Ratio1 -- AI meta-OS	Andrei Damian, Petrica Butusina, Alessandro De Franceschi, Vitalii Toderian, Marius Grigoras, Cristian Bleotiu	2025-09-05	Download	We propose the Ratio1 AI meta-operating system (meta-OS), a decentralized MLOps protocol that unifies AI model development, deployment, and inference across heterogeneous edge devices.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology	Dhanya R Mathews, Mudit Verma, Pooja Aggarwal, J. Lakshmi	2025-09-05	Download	Cloud application services are distributed in nature and have components across the stack working together to deliver the experience to end users.
Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device	Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang	2025-09-05	Download	Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications.
Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks	Jason Gardner, Ayan Dutta, Swapnoneel Roy, O. Patrick Kreidl, Ladislau Boloni	2025-09-05	Download	The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models.
L1RA: Dynamic Rank Assignment in LoRA Fine-Tuning	Raul Singh, Nicolo Brunello, Vincenzo Scotti, Mark James Carman	2025-09-05	Download	The ability of Large Language Models (LLMs) to solve complex tasks has made them crucial in the development of AI-based applications. However, the high computational requirements to fine-tune these LL...