2025-09-05
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Comparing Methods for the Cross-Level Verification of SystemC Peripherals with Symbolic Execution | Karl Aaron Rudkowski, Sallar Ahmadi-Pour, Rolf Drechsler | 2025-09-05 | Download | Virtual Prototypes (VPs) are important tools in modern hardware development. At high abstractions, they are often implemented in SystemC and offer early analysis of increasingly complex designs. |
| Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device | Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang | 2025-09-05 | Download | Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. |
| Distributed-HISQ: A Distributed Quantum Control Architecture | Yilun Zhao, Kangding Zhao, Peng Zhou, Dingdong Liu, Tingyu Luo, Yuzhen Zheng, Peng Luo, Shun Hu, Jin Lin, Cheng Guo, Yinhe Han, Ying Wang, Mingtang Deng, Junjie Wu, X. Fu | 2025-09-05 | Download | The design of a scalable Quantum Control Architecture (QCA) faces two primary challenges. First, the continuous growth in qubit counts has rendered distributed QCA inevitable, yet the nondeterministic... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology | Dhanya R Mathews, Mudit Verma, Pooja Aggarwal, J. Lakshmi | 2025-09-05 | Download | Cloud application services are distributed in nature and have components across the stack working together to deliver the experience to end users. |
| veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD | Youjie Li, Cheng Wan, Zhiqi Lin, Hongyu Zhu, Jiacheng Yang, Ziang Song, Xinyi Di, Jiawei Wu, Huiyao Shu, Wenlei Bao, Yanghua Peng, Haibin Lin, Li-Wen Chang | 2025-09-05 | Download | Large Language Models (LLMs) have scaled rapidly in size and complexity, requiring increasingly intricate parallelism for distributed training, such as 3D parallelism. |
| On Using Large-Batches in Federated Learning | Sahil Tyagi | 2025-09-05 | Download | Efficient Federated learning (FL) is crucial for training deep networks over devices with limited compute resources and bounded networks. With the advent of big data, devices either generate or collec... |
| Scaling Performance of Large Language Model Pretraining | Alexander Interrante-Grant, Carla Varela-Rosa, Suhaas Narayan, Chris Connelly, Albert Reuther | 2025-09-05 | Download | Large language models (LLMs) show best-in-class performance across a wide range of natural language processing applications. Training these models is an extremely computationally expensive task; front... |
| Dynamic reconfiguration for malleable applications using RMA | Iker Martín-Álvarez, José I. Aliaga, Maribel Castillo | 2025-09-05 | Download | This paper investigates the novel one-sided communication methods based on remote memory access (RMA) operations in MPI for dynamic resizing of malleable applications, enabling data redistribution wit... |
| Toward Distributed 3D Gaussian Splatting for High-Resolution Isosurface Visualization | Mengjiao Han, Andres Sewell, Joseph Insley, Janet Knowles, Victor A. Mateevitsi, Michael E. Papka, Steve Petruzza, Silvio Rizzi | 2025-09-05 | Download | We present a multi-GPU extension of the 3D Gaussian Splatting (3D-GS) pipeline for scientific visualization. Building on previous work that demonstrated high-fidelity isosurface reconstruction using G... |
| An Efficient Subspace Algorithm for Federated Learning on Heterogeneous Data | Jiaojiao Zhang, Yuqi Xu, Kun Yuan | 2025-09-05 | Download | This work addresses the key challenges of applying federated learning to large-scale deep neural networks, particularly the issue of client drift due to data heterogeneity across clients and the high ... |
| Discovering Software Parallelization Points Using Deep Neural Networks | Izavan dos S. Correia, Henrique C. T. Santos, Tiago A. E. Ferreira | 2025-09-05 | Download | This study proposes a deep learning-based approach for discovering loops in programming code according to their potential for parallelization. |
| Ratio1 -- AI meta-OS | Andrei Damian, Petrica Butusina, Alessandro De Franceschi, Vitalii Toderian, Marius Grigoras, Cristian Bleotiu | 2025-09-05 | Download | We propose the Ratio1 AI meta-operating system (meta-OS), a decentralized MLOps protocol that unifies AI model development, deployment, and inference across heterogeneous edge devices. |
| VoltanaLLM: Feedback-Driven Frequency Control and State-Space Routing for Energy-Efficient LLM Serving | Jiahuan Yu, Aryan Taneja, Junfeng Lin, Minjia Zhang | 2025-09-05 | Download | Modern Large Language Model (LLM) serving systems increasingly support interactive applications, like real-time chat assistants, code generation tools, and agentic workflows. |
| Accelerating Privacy-Preserving Federated Learning in Large-Scale LEO Satellite Systems | Binquan Guo, Junteng Cao, Marie Siew, Binbin Chen, Tony Q. S. Quek, Zhu Han | 2025-09-05 | Download | Large-scale low-Earth-orbit (LEO) satellite systems are increasingly valued for their ability to enable rapid and wide-area data exchange, thereby facilitating the collaborative training of artificial... |
| STADI: Fine-Grained Step-Patch Diffusion Parallelism for Heterogeneous GPUs | Han Liang, Jiahui Zhou, Zicheng Zhou, Xiaoxi Zhang, Xu Chen | 2025-09-05 | Download | The escalating adoption of diffusion models for applications such as image generation demands efficient parallel inference techniques to manage their substantial computational cost. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Joint Routing, Resource Allocation, and Energy Optimization for Integrated Access and Backhaul with Open RAN | Reshma Prasad, Maxime Elkael, Gabriele Gemmi, Osama M. Bushnaq, Debashisha Mishra, Prasanna Raut, Jennifer Simonjan, Michele Polese, Tommaso Melodia | 2025-09-05 | Download | As networks evolve towards 6G, Mobile Network Operators (MNOs) must accommodate diverse requirements and at the same time manage rising energy consumption. |
| Distributed Link Sparsification for Scalable Scheduling Using Graph Neural Networks (Journal Version) | Zhongyuan Zhao, Gunjan Verma, Ananthram Swami, Santiago Segarra | 2025-09-05 | Download | In wireless networks characterized by dense connectivity, the significant signaling overhead generated by distributed link scheduling algorithms can exacerbate issues like congestion, energy consumpti... |
| Pair-Bid Auction Model for Optimized Network Slicing in 5G RAN | Mengyao Li, Sebastian Troia, Yingqian Zhang, Guido Maier | 2025-09-05 | Download | Network slicing is a key 5G technology that enables multiple virtual networks to share physical infrastructure, optimizing flexibility and resource allocation. |
| A Federated Fine-Tuning Paradigm of Foundation Models in Heterogenous Wireless Networks | Jingyi Wang, Zhongyuan Zhao, Qingtian Wang, Zexu Li, Yue Wang, Tony Q. S. Quek | 2025-09-05 | Download | Edge intelligence has emerged as a promising strategy to deliver low-latency and ubiquitous services for mobile devices. Recent advances in fine-tuning mechanisms of foundation models have enabled edg... |
| Where Have All the Firewalls Gone? Security Consequences of Residential IPv6 Transition | Erik Rye, Dave Levin, Robert Beverly | 2025-09-05 | Download | IPv4 NAT has limited the spread of IoT botnets considerably by default-denying bots' incoming connection requests to in-home devices unless the owner has explicitly allowed them. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| MambaLite-Micro: Memory-Optimized Mamba Inference on MCUs | Hongjun Xu, Junxi Xia, Weisi Yang, Yueyuan Sui, Stephen Xia | 2025-09-05 | Download | Deploying Mamba models on microcontrollers (MCUs) remains challenging due to limited memory, the lack of native operator support, and the absence of embedded-friendly toolchains. |
| Ratio1 -- AI meta-OS | Andrei Damian, Petrica Butusina, Alessandro De Franceschi, Vitalii Toderian, Marius Grigoras, Cristian Bleotiu | 2025-09-05 | Download | We propose the Ratio1 AI meta-operating system (meta-OS), a decentralized MLOps protocol that unifies AI model development, deployment, and inference across heterogeneous edge devices. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology | Dhanya R Mathews, Mudit Verma, Pooja Aggarwal, J. Lakshmi | 2025-09-05 | Download | Cloud application services are distributed in nature and have components across the stack working together to deliver the experience to end users. |
| Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device | Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang | 2025-09-05 | Download | Compute-in-SRAM architectures offer a promising approach to achieving higher performance and energy efficiency across a range of data-intensive applications. |
| Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks | Jason Gardner, Ayan Dutta, Swapnoneel Roy, O. Patrick Kreidl, Ladislau Boloni | 2025-09-05 | Download | The growing computational demands of deep reinforcement learning (DRL) have raised concerns about the environmental and economic costs of training large-scale models. |
| L1RA: Dynamic Rank Assignment in LoRA Fine-Tuning | Raul Singh, Nicolo Brunello, Vincenzo Scotti, Mark James Carman | 2025-09-05 | Download | The ability of Large Language Models (LLMs) to solve complex tasks has made them crucial in the development of AI-based applications. However, the high computational requirements to fine-tune these LL... |