2025-12-15

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Pipeline Stage Resolved Timing Characterization of FPGA and ASIC Implementations of a RISC V Processor	Mostafa Darvishi	2025-12-15	Download	This paper presents a pipeline stage resolved timing characterization of a 32-bit RISC V processor implemented on a 20 nm FPGA and a 7 nm FinFET ASIC platform.
Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing	Juncheng Huo, Yunfan Gao, Xinxin Liu, Sa Wang, Yungang Bao, Xitong Gao, Kan Shi	2025-12-15	Download	As processor designs grow more complex, verification remains bottlenecked by slow software simulation and low-quality random test stimuli. Recent research has applied software fuzzers to hardware veri...
Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators	Aofeng Shen, Chi Zhang, Yakup Budanaz, Alexandru Calotoiu, Torsten Hoefler, Luca Benini	2025-12-15	Download	Tile-based many-Processing Element (PE) accelerators can achieve competitive performance on General Matrix Multiplication (GEMM), but they are extremely hard to program, as their optimal software mapp...
Toward Reproducible and Standardized Computer Architecture Simulation with gem5	Kunal Pai, Harshil Patel, Erin Le, Noah Krim, Mahyar Samani, Bobby R. Bruce, Jason Lowe-Power	2025-12-15	Download	Reproducibility in simulation-based computer architecture research requires coordinating artifacts like disk images, kernels, and benchmarks, but existing workflows are inconsistent.
Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs	Endri Taka, Andre Roesti, Joseph Melber, Pranathi Vasireddy, Kristof Denolf, Diana Marculescu	2025-12-15	Download	The high computational and memory demands of modern deep learning (DL) workloads have led to the development of specialized hardware devices from cloud to edge, such as AMD's Ryzen AI XDNA NPUs.
Noise-Resilient Quantum Aggregation on NISQ for Federated ADAS Learning	Chethana Prasad Kabgere, Sudarshan T S B	2025-12-15	Download	Advanced Driver Assistance Systems (ADAS) increasingly employ Federated Learning (FL) to collaboratively train models across distributed vehicular nodes while preserving data privacy.
An Optimal Alignment-Driven Iterative Closed-Loop Convergence Framework for High-Performance Ultra-Large Scale Layout Pattern Clustering	Shuo Liu	2025-12-15	Download	With the aggressive scaling of VLSI technology, the explosion of layout patterns creates a critical bottleneck for DFM applications like OPC. Pattern clustering is essential to reduce data complexity,...
SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference	Yuseon Choi, Sangjin Kim, Jungjun Oh, Gwangtae Park, Byeongcheol Kim, Hoi-Jun Yoo	2025-12-15	Download	MoE models offer efficient scaling through conditional computation, but their large parameter size and expensive expert offloading make on-device deployment challenging.
SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision	Yuseon Choi, Sangjin Kim, Jungjun Oh, Byeongcheol Kim, Hoi-Jun Yoo	2025-12-15	Download	Low-bit quantization is a promising technique for efficient transformer inference by reducing computational and memory overhead. However, aggressive bitwidth reduction remains challenging due to activ...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Q-IRIS: The Evolution of the IRIS Task-Based Runtime to Enable Classical-Quantum Workflows	Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Elaine Wong, Vicente Leyton-Ortega, Jeffrey S. Vetter, Seth R. Johnson, Travis S. Humble	2025-12-15	Download	Emerging HPC systems with extreme heterogeneity are starting to include quantum accelerators, motivating runtimes that can coordinate between classical and quantum workloads.
SEDULity: A Proof-of-Learning Framework for Distributed and Secure Blockchains with Efficient Useful Work	Weihang Cao, Mustafa Doger, Sennur Ulukus	2025-12-15	Download	The security and decentralization of Proof-of-Work (PoW) have been well-tested in existing blockchain systems. However, its tremendous energy waste has raised concerns about sustainability.
Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators	Aofeng Shen, Chi Zhang, Yakup Budanaz, Alexandru Calotoiu, Torsten Hoefler, Luca Benini	2025-12-15	Download	Tile-based many-Processing Element (PE) accelerators can achieve competitive performance on General Matrix Multiplication (GEMM), but they are extremely hard to program, as their optimal software mapp...
astroCAMP: A Community Benchmark and Co-Design Framework for Sustainable SKA-Scale Radio Imaging	Denisa-Andreea Constantinescu, Rubén Rodríguez Álvarez, Jacques Morin, Etienne Orliac, Mickaël Dardaillon, Sunrise Wang, Hugo Miomandre, Miguel Peón-Quirós, Jean-François Nezan, David Atienza	2025-12-15	Download	The Square Kilometre Array (SKA) will operate one of the world's largest continuous scientific data systems, sustaining petascale imaging under strict power envelopes.
Janus: Disaggregating Attention and Experts for Scalable MoE Inference	Zhexiang Zhang, Ye Wang, Xiangyu Wang, Yumiao Zhao, Jingzhe Jiang, Qizhen Weng, Shaohuai Shi, Yin Chen, Minchen Yu	2025-12-15	Download	Large Mixture-of-Experts (MoE) model inference is challenging due to high resource demands and dynamic workloads. Existing solutions often deploy the entire model as a single monolithic unit, which ap...
SIGMA: An AI-Empowered Training Stack on Early-Life Hardware	Lei Qu, Lianhai Ren, Peng Cheng, Rui Gao, Ruizhe Wang, Tianyu Chen, Xiao Liu, Xingjian Zhang, Yeyun Gong, Yifan Xiong, Yucheng Ding, Yuting Jiang, Zhenghao Lin, Zhongxin Guo, Ziyue Yang	2025-12-15	Download	An increasing variety of AI accelerators is being considered for large-scale training. However, enabling large-scale training on early-life AI accelerators faces three core challenges: frequent system...
Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation	Hassan Razavi, Ángel F. García-Fernández, Simo Särkkä	2025-12-15	Download	This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs)...
SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling	Muhammad Alfian Amrizal, Raka Satya Prasasta, Santana Yuda Pradata, Kadek Gemilang Santiyuda, Reza Pulungan, Hiroyuki Takizawa	2025-12-15	Download	High-performance computing (HPC) clusters consume enormous amounts of energy, with idle nodes as a major source of waste. Powering down unused nodes can mitigate this problem, but poorly timed transit...
Towards Secure Decentralized Applications and Consensus Protocols in Blockchains (on Selfish Mining, Undercutting Attacks, DAG-Based Blockchains, E-Voting, Cryptocurrency Wallets, Secure-Logging, and CBDC)	Ivan Homoliak	2025-12-15	Download	With the rise of cryptocurrencies, many new applications built on decentralized blockchains have emerged. Blockchains are full-stack distributed systems where multiple sub-systems interact.
Adaptive GPU Resource Allocation for Multi-Agent Collaborative Reasoning in Serverless Environments	Guilin Zhang, Wulan Guo, Ziqi Tan	2025-12-15	Download	Multi-agent systems powered by large language models have emerged as a promising paradigm for solving complex reasoning tasks through collaborative intelligence.
Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures	Mohammad Walid Charrwi, Zaid Hussain	2025-12-15	Download	We investigate adaptive minimal routing in 2D torus networks-on-chip (NoCs) under node fault conditions, comparing a reinforcement learning (RL) based strategy to an adaptive routing baseline. A torus topol...
GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs	Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong	2025-12-15	Download	In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and e...
FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection	Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng	2025-12-15	Download	The scaling of computation throughput continues to outpace improvements in memory bandwidth, making many deep learning workloads memory-bound.
PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving	Weizhe Huang, Tao Peng, Tongxuan Liu, Donghe Jin, Xianzhe Dong, Ke Zhang	2025-12-15	Download	The widespread deployment of large language models (LLMs) for interactive applications necessitates serving systems that can handle thousands of concurrent requests with diverse Service Level Objectiv...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Assessing Resilience in Authoritative DNS Infrastructure Supporting Government Services	Agung Septiadi, Minzhao Lyu, Hassan Habibi Gharakheili, Vijay Sivaraman	2025-12-15	Download	Online government services are increasingly regarded as critical national infrastructure. Because these services directly influence public trust, any disruption can have significant societal and polit...
Energy-Efficient Multi-Radio Microwave and IAB-Based Fixed Wireless Access for Rural Areas	Anselme Ndikumana, Kim Khoa Nguyen, Adel Larabi, Mohamed Cheriet	2025-12-15	Download	Deploying fiber optics as a last-mile solution in rural areas is not economically viable due to low population density. Nevertheless, providing high-speed internet access in these regions is essential...
A Fair, Flexible, Zero-Waste Digital Electricity Market: A First-Principles Approach Combining Automatic Market Making, Holarchic Architectures and Shapley Theory	Shaun Sweeney, Robert Shorten, Mark O'Malley	2025-12-15	Download	This thesis presents a fundamental rethink of electricity market design at the wholesale and balancing layers. Rather than treating markets as static spot clearing mechanisms, it reframes them as a co...
A Secure Edge Gateway Architecture for Wi-Fi-Enabled IoT	Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl	2025-12-15	Download	This paper presents a Secure Edge Gateway Architecture for Wi-Fi-Enabled IoT designed to strengthen local network protection without altering existing infrastructure.
Link-Aware Energy-Frugal Continual Learning for Fault Detection in IoT Networks	Henrik C. M. Frederiksen, Junya Shiraishi, Cedomir Stefanovic, Hei Victor Cheng, Shashi Raj Pandey	2025-12-15	Download	The use of lightweight machine learning (ML) models in internet of things (IoT) networks enables resource constrained IoT devices to perform on-device inference for several critical applications.
Resource Orchestration and Optimization in 6G Extreme-edge Scenario	Manuel A. Jimenez, Sarang Kahvazadeh, Ignacio Labrador, Josep Mangues-Bafalluy	2025-12-15	Download	6G networks envision a pervasive service infrastructure spanning from centralized cloud to distributed edge and highly dynamic extreme-edge domains.
Low-Complexity Monitoring and Compensation of Transceiver IQ Imbalance by Multi-dimensional Architecture for Dual-Polarization 16 Quadrature Amplitude Modulation	Yukun Zhang, Xiaoxue Gong, Xu Zhang, Lei Guo	2025-12-15	Download	In this paper, a low-complexity multi-dimensional architecture for IQ imbalance compensation is proposed, which reduces the effects of in-phase (I) and quadrature (Q) imbalance.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC	Qingyuan Liu, Mo Zou, Hengbin Zhang, Dong Du, Yubin Xia, Haibo Chen	2025-12-15	Download	File systems are critical OS components that require constant evolution to support new hardware and emerging application needs. However, the traditional paradigm of developing features, fixing bugs, a...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction	Mohammad Mozaffari, Samuel Kushnir, Maryam Mehri Dehnavi, Amir Yazdanbakhsh	2025-12-15	Download	Post-training model pruning is a promising solution, yet it faces a trade-off: simple heuristics that zero weights are fast but degrade accuracy, while principled joint optimization methods recover ac...
astroCAMP: A Community Benchmark and Co-Design Framework for Sustainable SKA-Scale Radio Imaging	Denisa-Andreea Constantinescu, Rubén Rodríguez Álvarez, Jacques Morin, Etienne Orliac, Mickaël Dardaillon, Sunrise Wang, Hugo Miomandre, Miguel Peón-Quirós, Jean-François Nezan, David Atienza	2025-12-15	Download	The Square Kilometre Array (SKA) will operate one of the world's largest continuous scientific data systems, sustaining petascale imaging under strict power envelopes.
EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC	Siyuan Shen, Mikhail Khalilov, Lukas Gianinazzi, Timo Schneider, Marcin Chrapek, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler	2025-12-15	Download	Resource disaggregation is a promising technique for improving the efficiency of large-scale computing systems. However, this comes at the cost of increased memory access latency due to the need to re...
GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs	Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong	2025-12-15	Download	In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and e...