2025-12-10

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
A Vertically Integrated Framework for Templatized Chip Design	Jeongeun Kim, Christopher Torng	2025-12-10	Download	Developers who primarily engage with software often struggle to incorporate custom hardware into their applications, even though specialized silicon can provide substantial benefits to machine learnin...
Algorithm-Driven On-Chip Integration for High Density and Low Cost	Jeongeun Kim, Sabrina Yarzada, Paul Chen, Christopher Torng	2025-12-10	Download	Growing interest in semiconductor workforce development has generated demand for platforms capable of supporting large numbers of independent hardware designs for research and training without imposin...
Pinball: A Cryogenic Predecoder for Surface Code Decoding Under Circuit-Level Noise	Alexander Knapen, Guanchen Tao, Jacob Mack, Tomas Bruno, Mehdi Saligane, Dennis Sylvester, Qirui Zhang, Gokul Subramanian Ravi	2025-12-10	Download	Scaling fault tolerant quantum computers, especially cryogenic systems based on the surface code, to millions of qubits is challenging due to poorly-scaling data processing and power consumption overh...
ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators	Guoqiang Zou, Wanyu Wang, Hao Zheng, Longxiang Yin, Yinhe Han	2025-12-10	Download	Existing memory management techniques severely hinder efficient Large Language Model serving on accelerators constrained by poor random-access bandwidth.
RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference	Siyuan Ma, Jiajun Hu, Jeeho Ryoo, Aman Arora, Lizy Kurian John	2025-12-10	Download	In-DRAM Processing-In-Memory (DRAM-PIM) has emerged as a promising approach to accelerate memory-intensive workloads by mitigating data transfer overhead between DRAM and the host processor.
Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens	Yanpeng Yu, Haiyue Ma, Krish Agarwal, Nicolai Oswald, Qijing Huang, Hugo Linsenmaier, Chunhui Mei, Ritchie Zhao, Ritika Borkar, Bita Darvish Rouhani, David Nellans, Ronny Krashinsky, Anurag Khandelwal	2025-12-10	Download	Expert Parallelism (EP) permits Mixture of Experts (MoE) models to scale beyond a single GPU. To address load imbalance across GPUs in EP, existing approaches aim to balance the number of tokens each ...
Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers	Jinming Lu, Jiayi Tian, Yequan Zhao, Hai Li, Zheng Zhang	2025-12-10	Download	Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
A Comparative Analysis of zk-SNARKs and zk-STARKs: Theory and Practice	Ayush Nainwal, Atharva Kamble, Nitin Awathare	2025-12-10	Download	Zero-knowledge proofs (ZKPs) are central to secure and privacy-preserving computation, with zk-SNARKs and zk-STARKs emerging as leading frameworks offering distinct trade-offs in efficiency, scalabili...
Link-Sharing Backpressure Routing In Wireless Multi-Hop Networks	Zhongyuan Zhao, Yujun Ming, Ananthram Swami, Kevin Chan, Fikadu Dagefu, Santiago Segarra	2025-12-10	Download	Backpressure (BP) routing and scheduling is an established resource allocation method for wireless multi-hop networks, noted for its fully distributed operation and maximum queue stability.
Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers	Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli	2025-12-10	Download	Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are more and...
Recoverable Lock-Free Locks	Hagit Attiya, Panagiota Fatourou, Eleftherios Kosmas, Yuanhao Wei	2025-12-10	Download	This paper presents the first transformation that introduces both lock-freedom and recoverability. Our transformation starts with a lock-based implementation, and provides a recoverable, lock-free sub...
Straggler Tolerant and Resilient DL Training on Homogeneous GPUs	Zeyu Zhang, Haiying Shen	2025-12-10	Download	Despite the popularity of homogeneous GPU-based deep learning (DL) training, the prevalence, causes and impact of stragglers and the effectiveness of existing straggler mitigation approaches are still...
SynthPix: A lightspeed PIV images generator	Antonio Terpin, Alan Bonomi, Francesco Banelli, Raffaello D'Andrea	2025-12-10	Download	We describe SynthPix, a synthetic image generator for Particle Image Velocimetry (PIV) with a focus on performance and parallelism on accelerators, implemented in JAX.
PHWSOA: A Pareto-based Hybrid Whale-Seagull Scheduling for Multi-Objective Tasks in Cloud Computing	Zhi Zhao, Hang Xiao, Wei Rang	2025-12-10	Download	Task scheduling is a critical research challenge in cloud computing, a transformative technology widely adopted across industries. Although numerous scheduling solutions exist, they predominantly opti...
Scalable Construction of Spiking Neural Networks using up to thousands of GPUs	Bruno Golosio, Gianmarco Tiddia, José Villamar, Luca Pontisso, Luca Sergi, Francesco Simula, Pooja Babu, Elena Pastorelli, Abigail Morrison, Markus Diesmann, Alessandro Lonardo, Pier Stanislao Paolucci, Johanna Senk	2025-12-10	Download	Diverse scientific and engineering research areas deal with discrete, time-stamped changes in large systems of interacting delay differential equations.
WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving	Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Bingyang Liu, Xuanzhe Liu, Xin Jin	2025-12-10	Download	Deploying multiple models within shared GPU clusters is promising for improving resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems optimize GPU utilization a...
PerCache: Predictive Hierarchical Cache for RAG Applications on Mobile Devices	Kaiwei Liu, Liekang Zeng, Lilin Xu, Bufang Yang, Zhenyu Yan	2025-12-10	Download	Retrieval-augmented generation (RAG) has been extensively used as a de facto paradigm in various large language model (LLM)-driven applications on mobile devices, such as mobile assistants leveraging ...
Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN	Nam Anh Dang, Ben Landrum, Ken Birman	2025-12-10	Download	Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images.
A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge	Zihao Ding, Mufeng Zhu, Zhongze Tang, Sheng Wei, Yao Liu	2025-12-10	Download	Nowadays, visual intelligence tools have become ubiquitous, offering all kinds of convenience and possibilities. However, these tools have high computational requirements that exceed the capabilities ...
SHARe-KAN: Holographic Vector Quantization for Memory-Bound Inference	Jeff Smith	2025-12-10	Download	Kolmogorov-Arnold Networks (KANs) face a fundamental memory wall: their learned basis functions create parameter counts that impose extreme bandwidth demands, hindering deployment in memory-constraine...
GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference	Phuong Tran, Tzu-Hao Liu, Long Tan Le, Tung-Anh Nguyen, Van Quan La, Eason Yu, Han Shu, Choong Seon Hong, Nguyen H. Tran	2025-12-10	Download	Large language models (LLMs) have revolutionized natural language processing, yet their high computational demands pose significant challenges for real-time inference, especially in multi-user server ...
Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens	Yanpeng Yu, Haiyue Ma, Krish Agarwal, Nicolai Oswald, Qijing Huang, Hugo Linsenmaier, Chunhui Mei, Ritchie Zhao, Ritika Borkar, Bita Darvish Rouhani, David Nellans, Ronny Krashinsky, Anurag Khandelwal	2025-12-10	Download	Expert Parallelism (EP) permits Mixture of Experts (MoE) models to scale beyond a single GPU. To address load imbalance across GPUs in EP, existing approaches aim to balance the number of tokens each ...
TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0	Jinyu Chen, Long Shi, Taotao Wang, Jiaheng Wang, Wei Zhang	2025-12-10	Download	The rapid growth of Web3.0 is transforming the Internet from a centralized structure to decentralized, which empowers users with unprecedented self-sovereignty over their own data.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Lightweight Security for Private Networks: Real-World Evaluation of WireGuard	Hubert Djuitcheu, Andrew Sergeev, Khurshid Alam, Danny Santhosh, Achim Autenrieth, Jochen Seitz	2025-12-10	Download	This paper explores WireGuard as a lightweight alternative to IPsec for securing the user plane as well as the control plane in an industrial Open RAN deployment at the Adtran Terafactory in Meiningen...
Network Traffic Analysis with Process Mining: The UPSIDE Case Study	Francesco Vitale, Paolo Palmiero, Massimiliano Rak, Nicola Mazzocca	2025-12-10	Download	Online gaming is a popular activity involving the adoption of complex systems and network infrastructures. The relevance of gaming, which generates large amounts of market revenue, drove research in m...
Link-Sharing Backpressure Routing In Wireless Multi-Hop Networks	Zhongyuan Zhao, Yujun Ming, Ananthram Swami, Kevin Chan, Fikadu Dagefu, Santiago Segarra	2025-12-10	Download	Backpressure (BP) routing and scheduling is an established resource allocation method for wireless multi-hop networks, noted for its fully distributed operation and maximum queue stability.
Towards Practical and Usable In-network Classification	Di Zhu, Jianxi Chen, Hyojoon Kim	2025-12-10	Download	In-network machine learning enables real-time classification directly on network hardware, offering consistently low inference latency. However, current solutions are limited by strict hardware constr...
M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks	Blessed Guda, Carlee Joe-Wong	2025-12-10	Download	The rise of 5G/6G network technologies promises to enable applications like autonomous vehicles and virtual reality, resulting in a significant increase in connected devices and necessarily complicati...
Graph-Based Bayesian Optimization for Quantum Circuit Architecture Search with Uncertainty Calibrated Surrogates	Prashant Kumar Choudhary, Nouhaila Innan, Muhammad Shafique, Rajeev Singh	2025-12-10	Download	Quantum circuit design is a key bottleneck for practical quantum machine learning on complex, real-world data. We present an automated framework that discovers and refines variational quantum circuits...
BlockFLEX: An Adaptive and Survivable Architecture with Hierarchical Routing for LEO Satellite Networks	Xiangtong Wang	2025-12-10	Download	This paper presents BlockFLEX, an adaptive and survivable architecture with a hierarchical routing scheme for Low Earth Orbit satellite networks, designed to address dynamic topology changes ...
Eunomia: A Multicontroller Domain Partitioning Framework in Hierarchical Satellite Network	Qi Zhang, Kun Qiu, Zhe Chen, Wenjun Zhu, Xiaofan Xu, Ping Du, Yue Gao	2025-12-10	Download	With the rise of mega-satellite constellations, the integration of hierarchical non-terrestrial and terrestrial networks has become a cornerstone of 6G coverage enhancements.
Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping	Ziheng Yang, Kun Qiu, Zhe Chen, Wenjun Zhu, Yue Gao	2025-12-10	Download	High-Throughput Satellites (HTS) use beam hopping to handle non-uniform and time-varying ground traffic demand. A significant technical challenge in beam hopping is the computation of effective illumi...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
ZeroOS: A Universal Modular Library OS for zkVMs	Guangxian Zou, Isaac Zhang, Ryan Zarick, Kelvin Wong, Thomas Kim, Daniel L. -K. Wong, Saeid Yazdinejad, Dan Boneh	2025-12-10	Download	zkVMs promise general-purpose verifiable computation through ISA-level compatibility with modern programs and toolchains. However, compatibility extends further than just the ISA; modern programs ofte...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers	Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli	2025-12-10	Download	Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are more and...
TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers	Zhaolan Huang, Emmanuel Baccelli	2025-12-10	Download	Always-on sensors are increasingly expected to embark a variety of tiny neural networks and to continuously perform inference on time-series of the data they sense.