2025-09-05

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Comparing Methods for the Cross-Level Verification of SystemC Peripherals with Symbolic Execution | Karl Aaron Rudkowski, Sallar Ahmadi-Pour, Rolf Drechsler | 2025-09-05 | Virtual Prototypes (VPs) are important tools in modern hardware development. At high abstractions, they are often implemented in SystemC and offer early analysis of increasingly complex designs. |
| Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device | Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang | 2025-09-05 | Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. |
| Distributed-HISQ: A Distributed Quantum Control Architecture | Yilun Zhao, Kangding Zhao, Peng Zhou, Dingdong Liu, Tingyu Luo, Yuzhen Zheng, Peng Luo, Shun Hu, Jin Lin, Cheng Guo, Yinhe Han, Ying Wang, Mingtang Deng, Junjie Wu, X. Fu | 2025-09-05 | The design of a scalable Quantum Control Architecture (QCA) faces two primary challenges. First, the continuous growth in qubit counts has rendered distributed QCA inevitable, yet the nondeterministic... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology | Dhanya R Mathews, Mudit Verma, Pooja Aggarwal, J. Lakshmi | 2025-09-05 | Cloud application services are distributed in nature and have components across the stack working together to deliver the experience to end users. |
| veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD | Youjie Li, Cheng Wan, Zhiqi Lin, Hongyu Zhu, Jiacheng Yang, Ziang Song, Xinyi Di, Jiawei Wu, Huiyao Shu, Wenlei Bao, Yanghua Peng, Haibin Lin, Li-Wen Chang | 2025-09-05 | Large Language Models (LLMs) have scaled rapidly in size and complexity, requiring increasingly intricate parallelism for distributed training, such as 3D parallelism. |
| On Using Large-Batches in Federated Learning | Sahil Tyagi | 2025-09-05 | Efficient Federated learning (FL) is crucial for training deep networks over devices with limited compute resources and bounded networks. With the advent of big data, devices either generate or collec... |
| Scaling Performance of Large Language Model Pretraining | Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther | 2025-09-05 | Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; front... |
| Dynamic reconfiguration for malleable applications using RMA | Iker Martín-Álvarez, José I. Aliaga, Maribel Castillo | 2025-09-05 | This paper investigates the novel one-sided communication methods based on remote memory access (RMA) operations in MPI for dynamic resizing of malleable applications, enabling data redistribution wit... |
| Toward Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization | Mengjiao Han, Andres Sewell, Joseph Insley, Janet Knowles, Victor A. Mateevitsi, Michael E. Papka, Steve Petruzza, Silvio Rizzi | 2025-09-05 | We present a multi-GPU extension of the 3D Gaussian Splatting (3D-GS) pipeline for scientific visualization. Building on previous work that demonstrated high-fidelity isosurface reconstruction using G... |
| An Efficient Subspace Algorithm for Federated Learning on Heterogeneous Data | Jiaojiao Zhang, Yuqi Xu, Kun Yuan | 2025-09-05 | This work addresses the key challenges of applying federated learning to large-scale deep neural networks, particularly the issue of client drift due to data heterogeneity across clients and the high ... |
| Discovering Software Parallelization Points Using Deep Neural Networks | Izavan dos S. Correia, Henrique C. T. Santos, Tiago A. E. Ferreira | 2025-09-05 | This study proposes a deep learning-based approach for discovering loops in programming code according to their potential for parallelization. |
| Ratio1 -- AI meta-OS | Andrei Damian, Petrica Butusina, Alessandro De Franceschi, Vitalii Toderian, Marius Grigoras, Cristian Bleotiu | 2025-09-05 | We propose the Ratio1 AI meta-operating system (meta-OS), a decentralized MLOps protocol that unifies AI model development, deployment, and inference across heterogeneous edge devices. |
| VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving | Jiahuan Yu, Aryan Taneja, Junfeng Lin, Minjia Zhang | 2025-09-05 | Modern Large Language Model (LLM) serving systems increasingly support interactive applications, like real-time chat assistants, code generation tools, and agentic workflows. |
| Accelerating Privacy-Preserving Federated Learning in Large-Scale LEO Satellite Systems | Binquan Guo, Junteng Cao, Marie Siew, Binbin Chen, Tony Q. S. Quek, Zhu Han | 2025-09-05 | Large-scale low-Earth-orbit (LEO) satellite systems are increasingly valued for their ability to enable rapid and wide-area data exchange, thereby facilitating the collaborative training of artificial... |
| STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs | Han Liang, Jiahui Zhou, Zicheng Zhou, Xiaoxi Zhang, Xu Chen | 2025-09-05 | The escalating adoption of diffusion models for applications such as image generation demands efficient parallel inference techniques to manage their substantial computational cost. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Joint Routing, Resource Allocation, and Energy Optimization for Integrated Access and Backhaul with Open RAN | Reshma Prasad, Maxime Elkael, Gabriele Gemmi, Osama M. Bushnaq, Debashisha Mishra, Prasanna Raut, Jennifer Simonjan, Michele Polese, Tommaso Melodia | 2025-09-05 | As networks evolve towards 6G, Mobile Network Operators (MNOs) must accommodate diverse requirements and at the same time manage rising energy consumption. |
| Distributed Link Sparsification for Scalable Scheduling Using Graph Neural Networks (Journal Version) | Zhongyuan Zhao, Gunjan Verma, Ananthram Swami, Santiago Segarra | 2025-09-05 | In wireless networks characterized by dense connectivity, the significant signaling overhead generated by distributed link scheduling algorithms can exacerbate issues like congestion, energy consumpti... |
| Pair-Bid Auction Model for Optimized Network Slicing in 5G RAN | Mengyao Li, Sebastian Troia, Yingqian Zhang, Guido Maier | 2025-09-05 | Network slicing is a key 5G technology that enables multiple virtual networks to share physical infrastructure, optimizing flexibility and resource allocation. |
| A Federated Fine-Tuning Paradigm of Foundation Models in Heterogenous Wireless Networks | Jingyi Wang, Zhongyuan Zhao, Qingtian Wang, Zexu Li, Yue Wang, Tony Q. S. Quek | 2025-09-05 | Edge intelligence has emerged as a promising strategy to deliver low-latency and ubiquitous services for mobile devices. Recent advances in fine-tuning mechanisms of foundation models have enabled edg... |
| Where Have All the Firewalls Gone? Security Consequences of Residential IPv6 Transition | Erik Rye, Dave Levin, Robert Beverly | 2025-09-05 | IPv4 NAT has limited the spread of IoT botnets considerably by default-denying bots' incoming connection requests to in-home devices unless the owner has explicitly allowed them. |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| MambaLite-Micro: Memory-Optimized Mamba Inference on MCUs | Hongjun Xu, Junxi Xia, Weisi Yang, Yueyuan Sui, Stephen Xia | 2025-09-05 | Deploying Mamba models on microcontrollers (MCUs) remains challenging due to limited memory, the lack of native operator support, and the absence of embedded-friendly toolchains. |
| Ratio1 -- AI meta-OS | Andrei Damian, Petrica Butusina, Alessandro De Franceschi, Vitalii Toderian, Marius Grigoras, Cristian Bleotiu | 2025-09-05 | We propose the Ratio1 AI meta-operating system (meta-OS), a decentralized MLOps protocol that unifies AI model development, deployment, and inference across heterogeneous edge devices. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology | Dhanya R Mathews, Mudit Verma, Pooja Aggarwal, J. Lakshmi | 2025-09-05 | Cloud application services are distributed in nature and have components across the stack working together to deliver the experience to end users. |
| Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device | Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang | 2025-09-05 | Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. |
| Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks | Jason Gardner, Ayan Dutta, Swapnoneel Roy, O. Patrick Kreidl, Ladislau Boloni | 2025-09-05 | The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. |
| L1RA: Dynamic Rank Assignment in LoRA Fine-Tuning | Raul Singh, Nicolo Brunello, Vincenzo Scotti, Mark James Carman | 2025-09-05 | The ability of Large Language Models (LLMs) to solve complex tasks has made them crucial in the development of AI-based applications. However, the high computational requirements to fine-tune these LL... |
