
2025-08-22

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates | Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Siddharth Garg, Brandon Reagen | 2025-08-22 | Download | Zero-Knowledge Proofs (ZKPs) have emerged as a powerful tool for secure and privacy-preserving computation. ZKPs enable one party to convince another of a statement's validity without revealing anythi... |
| Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective | Tianyao Shi, Yi Ding | 2025-08-22 | Download | Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for e... |
| RIROS: A Parallel RTL Fault SImulation FRamework with TwO-Dimensional Parallelism and Unified Schedule | Jiaping Tang, Jianan Mu, Zizhen Liu, Ge Yu, Tenghui Hua, Bin Sun, Silin Liu, Jing Ye, Huawei Li | 2025-08-22 | Download | With the rapid development of safety-critical applications such as autonomous driving and embodied intelligence, the functional safety of the corresponding electronic chips becomes more critical. |
| Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates | Yang Liu, Yi Chen, Yongwei Zhao, Yifan Hao, Zifu Zheng, Weihao Kong, Zhangmai Li, Dongchen Jiang, Ruiyang Xia, Zhihong Ma, Zisheng Liu, Zhaoyong Wan, Yunqi Lu, Ximing Liu, Hongrui Guo, Zhihao Yang, Zhe Wang, Tianrui Ma, Mo Zou, Rui Zhang, Ling Li, Xing Hu, Zidong Du, Zhiwei Xu, Qi Guo, Tianshi Chen, Yunji Chen | 2025-08-22 | Download | The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving the demand for specialized Language Processing Units (LPUs) tailor... |
| Bare-Metal RISC-V + NVDLA SoC for Efficient Deep Learning Inference | Vineet Kumar, Ajay Kumar M, Yike Li, Shreejith Shanker, Deepu John | 2025-08-22 | Download | This paper presents a novel System-on-Chip (SoC) architecture for accelerating complex deep learning models for edge computing applications through a combination of hardware and software optimisations... |
| GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model | Deepak Kumar, Divakar Yadav, Yash Patel | 2025-08-22 | Download | We present a single-GPU (H100, bf16) evaluation of GPT-OSS-20B (Mixture-of-Experts; 20.9B total, approx. 3.61B active) against dense baselines Qwen3-32B and Yi-34B across multiple dimensions. |
| HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator | Chukwufumnanya Ogbogu, Gaurav Narang, Biresh Kumar Joardar, Janardhan Rao Doppa, Krishnendu Chakrabarty, Partha Pratim Pande | 2025-08-22 | Download | Processing-In-Memory (PIM) architectures offer a promising approach to accelerate Graph Neural Network (GNN) training and inference. However, various PIM devices such as ReRAM, FeFET, PCM, MRAM, and S... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| A User-centric Kubernetes-based Architecture for Green Cloud Computing | Matteo Zanotto, Leonardo Vicentini, Redi Vreto, Francesco Lumpp, Diego Braga, Sandro Fiore | 2025-08-22 | Download | To meet the increasing demand for cloud computing services, the scale and number of data centers keep increasing worldwide. This growth comes at the cost of increased electricity consumption, which d... |
| PICO: Performance Insights for Collective Operations | Saverio Pasqualoni, Lorenzo Piarulli, Daniele De Sensi | 2025-08-22 | Download | Collective operations are cornerstones of both HPC applications and large-scale AI training and inference. Yet, comprehensive, systematic and reproducible performance evaluation and benchmarking of sai... |
| Neuromorphic Simulation of Drosophila Melanogaster Brain Connectome on Loihi 2 | Felix Wang, Bradley H. Theilman, Fred Rothganger, William Severa, Craig M. Vineyard, James B. Aimone | 2025-08-22 | Download | We demonstrate the first-ever nontrivial, biologically realistic connectome simulated on neuromorphic computing hardware. Specifically, we implement the whole-brain connectome of the adult Drosophila ... |
| On the Duality of Task and Actor Programming Models | Rohan Yadav, Joseph Guman, Sean Treichler, Michael Garland, Alex Aiken, Fredrik Kjolstad, Michael Bauer | 2025-08-22 | Download | Programming models for distributed and heterogeneous machines are rapidly growing in popularity to meet the demands of modern workloads. Task and actor models are common choices that offer different t... |
| Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective | Tianyao Shi, Yi Ding | 2025-08-22 | Download | Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for e... |
| Generalizing Brooks' theorem via Partial Coloring is Hard Classically and Locally | Jan Bok, Avinandan Das, Anna Gujgiczer, Nikola Jedličková | 2025-08-22 | Download | We investigate the classical and distributed complexity of $k$-partial $c$-coloring where $c=k$, a natural generalization of Brooks' theorem where each vertex should be colored from the palette... |
| Scalable hybrid quantum Monte Carlo simulation of U(1) gauge field coupled to fermions on GPU | Kexin Feng, Chuang Chen, Zi Yang Meng | 2025-08-22 | Download | We develop a GPU-accelerated hybrid quantum Monte Carlo (QMC) algorithm to solve the fundamental yet difficult problem of $U(1)$ gauge field coupled to fermions, which gives rise to a $U(1)$ Dirac spi... |
| Hybrid Classical-Quantum Supercomputing: A demonstration of a multi-user, multi-QPU and multi-GPU environment | Mateusz Slysz, Piotr Rydlichowski, Krzysztof Kurowski, Omar Bacarreza, Esperanza Cuenca Gomez, Zohim Chandani, Bettina Heim, Pradnya Khalate, William R. Clements, James Fletcher | 2025-08-22 | Download | Achieving a practical quantum advantage for near-term applications is widely expected to rely on hybrid classical-quantum algorithms. To deliver this practical advantage to users, high performance com... |
| Self-Healing Network of Interconnected Edge Devices Empowered by Infrastructure-as-Code and LoRa Communication | Rob Carson, Mohamed Chahine Ghanem, Feriel Bouakkaz | 2025-08-22 | Download | This paper proposes a self-healing, automated network of Raspberry Pi devices designed for deployment in scenarios where traditional networking is unavailable. |
| GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model | Deepak Kumar, Divakar Yadav, Yash Patel | 2025-08-22 | Download | We present a single-GPU (H100, bf16) evaluation of GPT-OSS-20B (Mixture-of-Experts; 20.9B total, approx. 3.61B active) against dense baselines Qwen3-32B and Yi-34B across multiple dimensions. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| QoS-based Intelligent multi-connectivity for B5G networks | Ali Parsa, Neda Moghim, Sachin Shetty | 2025-08-22 | Download | The rapid advancement of communication technologies has established cellular networks as the backbone for diverse applications, each with distinct quality of service requirements. |
| Self-Healing Network of Interconnected Edge Devices Empowered by Infrastructure-as-Code and LoRa Communication | Rob Carson, Mohamed Chahine Ghanem, Feriel Bouakkaz | 2025-08-22 | Download | This paper proposes a self-healing, automated network of Raspberry Pi devices designed for deployment in scenarios where traditional networking is unavailable. |
| Set Transformer Architectures and Synthetic Data Generation for Flow-Guided Nanoscale Localization | Mika Leo Hube, Filip Lemic, Ethungshan Shitiri, Gerard Calvo Bartra, Sergi Abadal, Xavier Costa Pérez | 2025-08-22 | Download | Flow-guided Localization (FGL) enables the identification of spatial regions within the human body that contain an event of diagnostic interest. |
| Joint Cache Placement and Routing in Satellite-Terrestrial Edge Computing Network: A GNN-Enabled DRL Approach | Yuhao Zheng, Ting You, Kejia Peng, Chang Liu | 2025-08-22 | Download | In this letter, we investigate the problem of joint content caching and routing in satellite-terrestrial edge computing networks (STECNs) to improve caching service for geographically distributed user... |
| ANSC: Probabilistic Capacity Health Scoring for Datacenter-Scale Reliability | Madhava Gaikwad, Abhishek Gandhi | 2025-08-22 | Download | We present ANSC, a probabilistic capacity health scoring framework for hyperscale datacenter fabrics. While existing alerting systems detect individual device or link failures, they do not capture the... |
| A Survey of Post-Quantum Cryptography Support in Cryptographic Libraries | Nadeem Ahmed, Lei Zhang, Aryya Gangopadhyay | 2025-08-22 | Download | The rapid advancement of quantum computing poses a significant threat to modern cryptographic systems, necessitating the transition to Post-Quantum Cryptography (PQC). |
| Congestion Control System Optimization with Large Language Models | Zhiyuan He, Aashish Gottipati, Lili Qiu, Yuqing Yang, Francis Y. Yan | 2025-08-22 | Download | Congestion control is a fundamental component of Internet infrastructure, and researchers have dedicated considerable effort to developing improved congestion control algorithms. |
| Time Series Based Network Intrusion Detection using MTF-Aided Transformer | Poorvi Joshi, Mohan Gurusamy | 2025-08-22 | Download | This paper introduces a novel approach to time series classification using a Markov Transition Field (MTF)-aided Transformer model, specifically designed for Software-Defined Networks (SDNs). |
| CoVeRaP: Cooperative Vehicular Perception through mmWave FMCW Radars | Jinyue Song, Hansol Ku, Jayneel Vora, Nelson Lee, Ahmad Kamari, Prasant Mohapatra, Parth Pathak | 2025-08-22 | Download | Automotive FMCW radars remain reliable in rain and glare, yet their sparse, noisy point clouds constrain 3-D object detection. We therefore release CoVeRaP, a 21 k-frame cooperative dataset that time-... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| PICO: Performance Insights for Collective Operations | Saverio Pasqualoni, Lorenzo Piarulli, Daniele De Sensi | 2025-08-22 | Download | Collective operations are cornerstones of both HPC applications and large-scale AI training and inference. Yet, comprehensive, systematic and reproducible performance evaluation and benchmarking of sai... |
| GreenLLM: SLO-Aware Dynamic Frequency Scaling for Energy-Efficient LLM Serving | Qunyou Liu, Darong Huang, Marina Zapater, David Atienza | 2025-08-22 | Download | Large Language Models (LLMs) are becoming the backbone of modern cloud services, yet their inference costs are dominated by GPU energy. Unlike traditional GPU workloads, LLM inference has two stages w... |
| Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective | Tianyao Shi, Yi Ding | 2025-08-22 | Download | Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization (reducing precision to lower-bit formats) critical for e... |
| Two-Timescale Dynamic Service Deployment and Task Scheduling with Spatiotemporal Collaboration in Mobile Edge Networks | Yang Li, Xing Zhang, Yunji Zhao, Wenbo Wang | 2025-08-22 | Download | Collaborative edge computing addresses the resource constraints of individual edge nodes by enabling resource sharing and task co-processing across multiple nodes. |
| ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference | Wangsong Yin, Daliang Xu, Mengwei Xu, Gang Huang, Xuanzhe Liu | 2025-08-22 | Download | Running Large Language Models (LLMs) on-device is now a critical enabler for preserving user privacy. We observe that the attention operator falls back from the special-purpose NPU to the gen... |
| GPT-OSS-20B: A Comprehensive Deployment-Centric Analysis of OpenAI's Open-Weight Mixture of Experts Model | Deepak Kumar, Divakar Yadav, Yash Patel | 2025-08-22 | Download | We present a single-GPU (H100, bf16) evaluation of GPT-OSS-20B (Mixture-of-Experts; 20.9B total, approx. 3.61B active) against dense baselines Qwen3-32B and Yi-34B across multiple dimensions. |
