2025-12-10

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Vertically Integrated Framework for Templatized Chip Design | Jeongeun Kim, Christopher Torng | 2025-12-10 | Download | Developers who primarily engage with software often struggle to incorporate custom hardware into their applications, even though specialized silicon can provide substantial benefits to machine learnin... |
| Algorithm-Driven On-Chip Integration for High Density and Low Cost | Jeongeun Kim, Sabrina Yarzada, Paul Chen, Christopher Torng | 2025-12-10 | Download | Growing interest in semiconductor workforce development has generated demand for platforms capable of supporting large numbers of independent hardware designs for research and training without imposin... |
| Pinball: A Cryogenic Predecoder for Surface Code Decoding Under Circuit-Level Noise | Alexander Knapen, Guanchen Tao, Jacob Mack, Tomas Bruno, Mehdi Saligane, Dennis Sylvester, Qirui Zhang, Gokul Subramanian Ravi | 2025-12-10 | Download | Scaling fault tolerant quantum computers, especially cryogenic systems based on the surface code, to millions of qubits is challenging due to poorly-scaling data processing and power consumption overh... |
| ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators | Guoqiang Zou, Wanyu Wang, Hao Zheng, Longxiang Yin, Yinhe Han | 2025-12-10 | Download | Existing memory management techniques severely hinder efficient Large Language Model serving on accelerators constrained by poor random-access bandwidth. |
| RACAM: Enhancing DRAM with Reuse-Aware Computation and Automated Mapping for ML Inference | Siyuan Ma, Jiajun Hu, Jeeho Ryoo, Aman Arora, Lizy Kurian John | 2025-12-10 | Download | In-DRAM Processing-In-Memory (DRAM-PIM) has emerged as a promising approach to accelerate memory-intensive workloads by mitigating data transfer overhead between DRAM and the host processor. |
| Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens | Yanpeng Yu, Haiyue Ma, Krish Agarwal, Nicolai Oswald, Qijing Huang, Hugo Linsenmaier, Chunhui Mei, Ritchie Zhao, Ritika Borkar, Bita Darvish Rouhani, David Nellans, Ronny Krashinsky, Anurag Khandelwal | 2025-12-10 | Download | Expert Parallelism (EP) permits Mixture of Experts (MoE) models to scale beyond a single GPU. To address load imbalance across GPUs in EP, existing approaches aim to balance the number of tokens each ... |
| Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers | Jinming Lu, Jiayi Tian, Yequan Zhao, Hai Li, Zheng Zhang | 2025-12-10 | Download | Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Comparative Analysis of zk-SNARKs and zk-STARKs: Theory and Practice | Ayush Nainwal, Atharva Kamble, Nitin Awathare | 2025-12-10 | Download | Zero-knowledge proofs (ZKPs) are central to secure and privacy-preserving computation, with zk-SNARKs and zk-STARKs emerging as leading frameworks offering distinct trade-offs in efficiency, scalabili... |
| Link-Sharing Backpressure Routing In Wireless Multi-Hop Networks | Zhongyuan Zhao, Yujun Ming, Ananthram Swami, Kevin Chan, Fikadu Dagefu, Santiago Segarra | 2025-12-10 | Download | Backpressure (BP) routing and scheduling is an established resource allocation method for wireless multi-hop networks, noted for its fully distributed operation and maximum queue stability. |
| Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers | Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli | 2025-12-10 | Download | Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are more and... |
| Recoverable Lock-Free Locks | Hagit Attiya, Panagiota Fatourou, Eleftherios Kosmas, Yuanhao Wei | 2025-12-10 | Download | This paper presents the first transformation that introduces both lock-freedom and recoverability. Our transformation starts with a lock-based implementation, and provides a recoverable, lock-free sub... |
| Straggler Tolerant and Resilient DL Training on Homogeneous GPUs | Zeyu Zhang, Haiying Shen | 2025-12-10 | Download | Despite the popularity of homogeneous GPU-based deep learning (DL) training, the prevalence, causes and impact of stragglers and the effectiveness of existing straggler mitigation approaches are still... |
| SynthPix: A lightspeed PIV images generator | Antonio Terpin, Alan Bonomi, Francesco Banelli, Raffaello D'Andrea | 2025-12-10 | Download | We describe SynthPix, a synthetic image generator for Particle Image Velocimetry (PIV) with a focus on performance and parallelism on accelerators, implemented in JAX. |
| PHWSOA: A Pareto-based Hybrid Whale-Seagull Scheduling for Multi-Objective Tasks in Cloud Computing | Zhi Zhao, Hang Xiao, Wei Rang | 2025-12-10 | Download | Task scheduling is a critical research challenge in cloud computing, a transformative technology widely adopted across industries. Although numerous scheduling solutions exist, they predominantly opti... |
| Scalable Construction of Spiking Neural Networks using up to thousands of GPUs | Bruno Golosio, Gianmarco Tiddia, José Villamar, Luca Pontisso, Luca Sergi, Francesco Simula, Pooja Babu, Elena Pastorelli, Abigail Morrison, Markus Diesmann, Alessandro Lonardo, Pier Stanislao Paolucci, Johanna Senk | 2025-12-10 | Download | Diverse scientific and engineering research areas deal with discrete, time-stamped changes in large systems of interacting delay differential equations. |
| WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving | Chiheng Lou, Sheng Qi, Rui Kang, Yong Zhang, Chen Sun, Pengcheng Wang, Bingyang Liu, Xuanzhe Liu, Xin Jin | 2025-12-10 | Download | Deploying multiple models within shared GPU clusters is promising for improving resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems optimize GPU utilization a... |
| PerCache: Predictive Hierarchical Cache for RAG Applications on Mobile Devices | Kaiwei Liu, Liekang Zeng, Lilin Xu, Bufang Yang, Zhenyu Yan | 2025-12-10 | Download | Retrieval-augmented generation (RAG) has been extensively used as a de facto paradigm in various large language model (LLM)-driven applications on mobile devices, such as mobile assistants leveraging ... |
| Passing the Baton: High Throughput Distributed Disk-Based Vector Search with BatANN | Nam Anh Dang, Ben Landrum, Ken Birman | 2025-12-10 | Download | Vector search underpins modern information-retrieval systems, including retrieval-augmented generation (RAG) pipelines and search engines over unstructured text and images. |
| A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge | Zihao Ding, Mufeng Zhu, Zhongze Tang, Sheng Wei, Yao Liu | 2025-12-10 | Download | Nowadays, visual intelligence tools have become ubiquitous, offering all kinds of convenience and possibilities. However, these tools have high computational requirements that exceed the capabilities ... |
| SHARe-KAN: Holographic Vector Quantization for Memory-Bound Inference | Jeff Smith | 2025-12-10 | Download | Kolmogorov-Arnold Networks (KANs) face a fundamental memory wall: their learned basis functions create parameter counts that impose extreme bandwidth demands, hindering deployment in memory-constraine... |
| GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference | Phuong Tran, Tzu-Hao Liu, Long Tan Le, Tung-Anh Nguyen, Van Quan La, Eason Yu, Han Shu, Choong Seon Hong, Nguyen H. Tran | 2025-12-10 | Download | Large language models (LLMs) have revolutionized natural language processing, yet their high computational demands pose significant challenges for real-time inference, especially in multi-user server ... |
| Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens | Yanpeng Yu, Haiyue Ma, Krish Agarwal, Nicolai Oswald, Qijing Huang, Hugo Linsenmaier, Chunhui Mei, Ritchie Zhao, Ritika Borkar, Bita Darvish Rouhani, David Nellans, Ronny Krashinsky, Anurag Khandelwal | 2025-12-10 | Download | Expert Parallelism (EP) permits Mixture of Experts (MoE) models to scale beyond a single GPU. To address load imbalance across GPUs in EP, existing approaches aim to balance the number of tokens each ... |
| TDC-Cache: A Trustworthy Decentralized Cooperative Caching Framework for Web3.0 | Jinyu Chen, Long Shi, Taotao Wang, Jiaheng Wang, Wei Zhang | 2025-12-10 | Download | The rapid growth of Web3.0 is transforming the Internet from a centralized structure to decentralized, which empowers users with unprecedented self-sovereignty over their own data. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Lightweight Security for Private Networks: Real-World Evaluation of WireGuard | Hubert Djuitcheu, Andrew Sergeev, Khurshid Alam, Danny Santhosh, Achim Autenrieth, Jochen Seitz | 2025-12-10 | Download | This paper explores WireGuard as a lightweight alternative to IPsec for securing the user plane as well as the control plane in an industrial Open RAN deployment at the Adtran Terafactory in Meiningen... |
| Network Traffic Analysis with Process Mining: The UPSIDE Case Study | Francesco Vitale, Paolo Palmiero, Massimiliano Rak, Nicola Mazzocca | 2025-12-10 | Download | Online gaming is a popular activity involving the adoption of complex systems and network infrastructures. The relevance of gaming, which generates large amounts of market revenue, drove research in m... |
| Link-Sharing Backpressure Routing In Wireless Multi-Hop Networks | Zhongyuan Zhao, Yujun Ming, Ananthram Swami, Kevin Chan, Fikadu Dagefu, Santiago Segarra | 2025-12-10 | Download | Backpressure (BP) routing and scheduling is an established resource allocation method for wireless multi-hop networks, noted for its fully distributed operation and maximum queue stability. |
| Towards Practical and Usable In-network Classification | Di Zhu, Jianxi Chen, Hyojoon Kim | 2025-12-10 | Download | In-network machine learning enables real-time classification directly on network hardware, offering consistently low inference latency. However, current solutions are limited by strict hardware constr... |
| M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks | Blessed Guda, Carlee Joe-Wong | 2025-12-10 | Download | The rise of 5G/6G network technologies promises to enable applications like autonomous vehicles and virtual reality, resulting in a significant increase in connected devices and necessarily complicati... |
| Graph-Based Bayesian Optimization for Quantum Circuit Architecture Search with Uncertainty Calibrated Surrogates | Prashant Kumar Choudhary, Nouhaila Innan, Muhammad Shafique, Rajeev Singh | 2025-12-10 | Download | Quantum circuit design is a key bottleneck for practical quantum machine learning on complex, real-world data. We present an automated framework that discovers and refines variational quantum circuits... |
| BlockFLEX: An Adaptive and Survivable Architecture with Hierarchical Routing for LEO Satellite Networks | Xiangtong Wang | 2025-12-10 | Download | This paper presents BlockFLEX, an adaptive and survivable architecture with a hierarchical routing scheme for Low Earth Orbit satellite networks, designed to address dynamic topology changes ... |
| Eunomia: A Multicontroller Domain Partitioning Framework in Hierarchical Satellite Network | Qi Zhang, Kun Qiu, Zhe Chen, Wenjun Zhu, Xiaofan Xu, Ping Du, Yue Gao | 2025-12-10 | Download | With the rise of mega-satellite constellations, the integration of hierarchical non-terrestrial and terrestrial networks has become a cornerstone of 6G coverage enhancements. |
| Tyche: A Hybrid Computation Framework of Illumination Pattern for Satellite Beam Hopping | Ziheng Yang, Kun Qiu, Zhe Chen, Wenjun Zhu, Yue Gao | 2025-12-10 | Download | High-Throughput Satellites (HTS) use beam hopping to handle non-uniform and time-varying ground traffic demand. A significant technical challenge in beam hopping is the computation of effective illumi... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ZeroOS: A Universal Modular Library OS for zkVMs | Guangxian Zou, Isaac Zhang, Ryan Zarick, Kelvin Wong, Thomas Kim, Daniel L. -K. Wong, Saeid Yazdinejad, Dan Boneh | 2025-12-10 | Download | zkVMs promise general-purpose verifiable computation through ISA-level compatibility with modern programs and toolchains. However, compatibility extends further than just the ISA; modern programs ofte... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers | Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli | 2025-12-10 | Download | Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are more and... |
| TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers | Zhaolan Huang, Emmanuel Baccelli | 2025-12-10 | Download | Always-on sensors are increasingly expected to embark a variety of tiny neural networks and to continuously perform inference on time-series of the data they sense. |