2025-08-22

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates	Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Siddharth Garg, Brandon Reagen	2025-08-22	Download	Zero-Knowledge Proofs (ZKPs) have emerged as a powerful tool for secure and privacy-preserving computation. ZKPs enable one party to convince another of a statement's validity without revealing anythi...
Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective	Tianyao Shi, Yi Ding	2025-08-22	Download	Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for e...
RIROS: A Parallel RTL Fault SImulation FRamework with TwO-Dimensional Parallelism and Unified Schedule	Jiaping Tang, Jianan Mu, Zizhen Liu, Ge Yu, Tenghui Hua, Bin Sun, Silin Liu, Jing Ye, Huawei Li	2025-08-22	Download	With the rapid development of safety-critical applications such as autonomous driving and embodied intelligence, the functional safety of the corresponding electronic chips becomes more critical.
Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates	Yang Liu, Yi Chen, Yongwei Zhao, Yifan Hao, Zifu Zheng, Weihao Kong, Zhangmai Li, Dongchen Jiang, Ruiyang Xia, Zhihong Ma, Zisheng Liu, Zhaoyong Wan, Yunqi Lu, Ximing Liu, Hongrui Guo, Zhihao Yang, Zhe Wang, Tianrui Ma, Mo Zou, Rui Zhang, Ling Li, Xing Hu, Zidong Du, Zhiwei Xu, Qi Guo, Tianshi Chen, Yunji Chen	2025-08-22	Download	The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving the demand for specialized Language Processing Units (LPUs) tailor...
Bare-Metal RISC-V + NVDLA SoC for Efficient Deep Learning Inference	Vineet Kumar, Ajay Kumar M, Yike Li, Shreejith Shanker, Deepu John	2025-08-22	Download	This paper presents a novel System-on-Chip (SoC) architecture for accelerating complex deep learning models for edge computing applications through a combination of hardware and software optimisations...
GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model	Deepak Kumar, Divakar Yadav, Yash Patel	2025-08-22	Download	We present a single-GPU (H100, bf16) evaluation of GPT-OSS-20B (Mixture-of-Experts; 20.9B total, approx. 3.61B active) against dense baselines Qwen3-32B and Yi-34B across multiple dimensions.
HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator	Chukwufumnanya Ogbogu, Gaurav Narang, Biresh Kumar Joardar, Janardhan Rao Doppa, Krishnendu Chakrabarty, Partha Pratim Pande	2025-08-22	Download	Processing-In-Memory (PIM) architectures offer a promising approach to accelerate Graph Neural Network (GNN) training and inference. However, various PIM devices such as ReRAM, FeFET, PCM, MRAM, and S...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
A User-centric Kubernetes-based Architecture for Green Cloud Computing	Matteo Zanotto, Leonardo Vicentini, Redi Vreto, Francesco Lumpp, Diego Braga, Sandro Fiore	2025-08-22	Download	To meet the increasing demand for cloud computing services, the scale and number of data centers keep increasing worldwide. This growth comes at the cost of increased electricity consumption, which d...
PICO: Performance Insights for Collective Operations	Saverio Pasqualoni, Lorenzo Piarulli, Daniele De Sensi	2025-08-22	Download	Collective operations are cornerstones of both HPC applications and large-scale AI training and inference. Yet, comprehensive, systematic and reproducible performance evaluation and benchmarking of sai...
Neuromorphic Simulation of Drosophila Melanogaster Brain Connectome on Loihi 2	Felix Wang, Bradley H. Theilman, Fred Rothganger, William Severa, Craig M. Vineyard, James B. Aimone	2025-08-22	Download	We demonstrate the first-ever nontrivial, biologically realistic connectome simulated on neuromorphic computing hardware. Specifically, we implement the whole-brain connectome of the adult Drosophila ...
On the Duality of Task and Actor Programming Models	Rohan Yadav, Joseph Guman, Sean Treichler, Michael Garland, Alex Aiken, Fredrik Kjolstad, Michael Bauer	2025-08-22	Download	Programming models for distributed and heterogeneous machines are rapidly growing in popularity to meet the demands of modern workloads. Task and actor models are common choices that offer different t...
Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective	Tianyao Shi, Yi Ding	2025-08-22	Download	Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for e...
Generalizing Brooks' theorem via Partial Coloring is Hard Classically and Locally	Jan Bok, Avinandan Das, Anna Gujgiczer, Nikola Jedličková	2025-08-22	Download	We investigate the classical and distributed complexity of $k$-partial $c$-coloring where $c=k$, a natural generalization of Brooks' theorem where each vertex should be colored from the palette...
Scalable hybrid quantum Monte Carlo simulation of U(1) gauge field coupled to fermions on GPU	Kexin Feng, Chuang Chen, Zi Yang Meng	2025-08-22	Download	We develop a GPU-accelerated hybrid quantum Monte Carlo (QMC) algorithm to solve the fundamental yet difficult problem of $U(1)$ gauge field coupled to fermions, which gives rise to a $U(1)$ Dirac spi...
Hybrid Classical-Quantum Supercomputing: A demonstration of a multi-user, multi-QPU and multi-GPU environment	Mateusz Slysz, Piotr Rydlichowski, Krzysztof Kurowski, Omar Bacarreza, Esperanza Cuenca Gomez, Zohim Chandani, Bettina Heim, Pradnya Khalate, William R. Clements, James Fletcher	2025-08-22	Download	Achieving a practical quantum advantage for near-term applications is widely expected to rely on hybrid classical-quantum algorithms. To deliver this practical advantage to users, high performance com...
Self-Healing Network of Interconnected Edge Devices Empowered by Infrastructure-as-Code and LoRa Communication	Rob Carson, Mohamed Chahine Ghanem, Feriel Bouakkaz	2025-08-22	Download	This paper proposes a self-healing, automated network of Raspberry Pi devices designed for deployment in scenarios where traditional networking is unavailable.
GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model	Deepak Kumar, Divakar Yadav, Yash Patel	2025-08-22	Download	We present a single-GPU (H100, bf16) evaluation of GPT-OSS-20B (Mixture-of-Experts; 20.9B total, approx. 3.61B active) against dense baselines Qwen3-32B and Yi-34B across multiple dimensions.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
QoS-based Intelligent multi-connectivity for B5G networks	Ali Parsa, Neda Moghim, Sachin Shetty	2025-08-22	Download	The rapid advancement of communication technologies has established cellular networks as the backbone for diverse applications, each with distinct quality-of-service requirements.
Self-Healing Network of Interconnected Edge Devices Empowered by Infrastructure-as-Code and LoRa Communication	Rob Carson, Mohamed Chahine Ghanem, Feriel Bouakkaz	2025-08-22	Download	This paper proposes a self-healing, automated network of Raspberry Pi devices designed for deployment in scenarios where traditional networking is unavailable.
Set Transformer Architectures and Synthetic Data Generation for Flow-Guided Nanoscale Localization	Mika Leo Hube, Filip Lemic, Ethungshan Shitiri, Gerard Calvo Bartra, Sergi Abadal, Xavier Costa Pérez	2025-08-22	Download	Flow-guided Localization (FGL) enables the identification of spatial regions within the human body that contain an event of diagnostic interest.
Joint Cache Placement and Routing in Satellite-Terrestrial Edge Computing Network: A GNN-Enabled DRL Approach	Yuhao Zheng, Ting You, Kejia Peng, Chang Liu	2025-08-22	Download	In this letter, we investigate the problem of joint content caching and routing in satellite-terrestrial edge computing networks (STECNs) to improve caching service for geographically distributed user...
ANSC: Probabilistic Capacity Health Scoring for Datacenter-Scale Reliability	Madhava Gaikwad, Abhishek Gandhi	2025-08-22	Download	We present ANSC, a probabilistic capacity health scoring framework for hyperscale datacenter fabrics. While existing alerting systems detect individual device or link failures, they do not capture the...
A Survey of Post-Quantum Cryptography Support in Cryptographic Libraries	Nadeem Ahmed, Lei Zhang, Aryya Gangopadhyay	2025-08-22	Download	The rapid advancement of quantum computing poses a significant threat to modern cryptographic systems, necessitating the transition to Post-Quantum Cryptography (PQC).
Congestion Control System Optimization with Large Language Models	Zhiyuan He, Aashish Gottipati, Lili Qiu, Yuqing Yang, Francis Y. Yan	2025-08-22	Download	Congestion control is a fundamental component of Internet infrastructure, and researchers have dedicated considerable effort to developing improved congestion control algorithms.
Time Series Based Network Intrusion Detection using MTF-Aided Transformer	Poorvi Joshi, Mohan Gurusamy	2025-08-22	Download	This paper introduces a novel approach to time series classification using a Markov Transition Field (MTF)-aided Transformer model, specifically designed for Software-Defined Networks (SDNs).
CoVeRaP: Cooperative Vehicular Perception through mmWave FMCW Radars	Jinyue Song, Hansol Ku, Jayneel Vora, Nelson Lee, Ahmad Kamari, Prasant Mohapatra, Parth Pathak	2025-08-22	Download	Automotive FMCW radars remain reliable in rain and glare, yet their sparse, noisy point clouds constrain 3-D object detection. We therefore release CoVeRaP, a 21k-frame cooperative dataset that time-...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
PICO: Performance Insights for Collective Operations	Saverio Pasqualoni, Lorenzo Piarulli, Daniele De Sensi	2025-08-22	Download	Collective operations are cornerstones of both HPC applications and large-scale AI training and inference. Yet, comprehensive, systematic and reproducible performance evaluation and benchmarking of sai...
GreenLLM: SLO-Aware Dynamic Frequency Scaling for Energy-Efficient LLM Serving	Qunyou Liu, Darong Huang, Marina Zapater, David Atienza	2025-08-22	Download	Large Language Models (LLMs) are becoming the backbone of modern cloud services, yet their inference costs are dominated by GPU energy. Unlike traditional GPU workloads, LLM inference has two stages w...
Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective	Tianyao Shi, Yi Ding	2025-08-22	Download	Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for e...
Two-Timescale Dynamic Service Deployment and Task Scheduling with Spatiotemporal Collaboration in Mobile Edge Networks	Yang Li, Xing Zhang, Yunji Zhao, Wenbo Wang	2025-08-22	Download	Collaborative edge computing addresses the resource constraints of individual edge nodes by enabling resource sharing and task co-processing across multiple nodes.
ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference	Wangsong Yin, Daliang Xu, Mengwei Xu, Gang Huang, Xuanzhe Liu	2025-08-22	Download	Running Large Language Models (LLMs) on device is now a critical enabler for preserving user privacy. We observe that the attention operator falls back from the special-purpose NPU to the gen...
GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model	Deepak Kumar, Divakar Yadav, Yash Patel	2025-08-22	Download	We present a single-GPU (H100, bf16) evaluation of GPT-OSS-20B (Mixture-of-Experts; 20.9B total, approx. 3.61B active) against dense baselines Qwen3-32B and Yi-34B across multiple dimensions.