
2025-12-15

cs.AR - Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Pipeline Stage Resolved Timing Characterization of FPGA and ASIC Implementations of a RISC V Processor | Mostafa Darvishi | 2025-12-15 | This paper presents a pipeline stage resolved timing characterization of a 32-bit RISC V processor implemented on a 20 nm FPGA and a 7 nm FinFET ASIC platform. |
| Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing | Juncheng Huo, Yunfan Gao, Xinxin Liu, Sa Wang, Yungang Bao, Xitong Gao, Kan Shi | 2025-12-15 | As processor designs grow more complex, verification remains bottlenecked by slow software simulation and low-quality random test stimuli. Recent research has applied software fuzzers to hardware veri... |
| Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators | Aofeng Shen, Chi Zhang, Yakup Budanaz, Alexandru Calotoiu, Torsten Hoefler, Luca Benini | 2025-12-15 | Tile-based many-Processing Element (PE) accelerators can achieve competitive performance on General Matrix Multiplication (GEMM), but they are extremely hard to program, as their optimal software mapp... |
| Toward Reproducible and Standardized Computer Architecture Simulation with gem5 | Kunal Pai, Harshil Patel, Erin Le, Noah Krim, Mahyar Samani, Bobby R. Bruce, Jason Lowe-Power | 2025-12-15 | Reproducibility in simulation-based computer architecture research requires coordinating artifacts like disk images, kernels, and benchmarks, but existing workflows are inconsistent. |
| Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs | Endri Taka, Andre Roesti, Joseph Melber, Pranathi Vasireddy, Kristof Denolf, Diana Marculescu | 2025-12-15 | The high computational and memory demands of modern deep learning (DL) workloads have led to the development of specialized hardware devices from cloud to edge, such as AMD's Ryzen AI XDNA NPUs. |
| Noise-Resilient Quantum Aggregation on NISQ for Federated ADAS Learning | Chethana Prasad Kabgere, Sudarshan T S B | 2025-12-15 | Advanced Driver Assistance Systems (ADAS) increasingly employ Federated Learning (FL) to collaboratively train models across distributed vehicular nodes while preserving data privacy. |
| An Optimal Alignment-Driven Iterative Closed-Loop Convergence Framework for High-Performance Ultra-Large Scale Layout Pattern Clustering | Shuo Liu | 2025-12-15 | With the aggressive scaling of VLSI technology, the explosion of layout patterns creates a critical bottleneck for DFM applications like OPC. Pattern clustering is essential to reduce data complexity,... |
| SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference | Yuseon Choi, Sangjin Kim, Jungjun Oh, Gwangtae Park, Byeongcheol Kim, Hoi-Jun Yoo | 2025-12-15 | MoE models offer efficient scaling through conditional computation, but their large parameter size and expensive expert offloading make on-device deployment challenging. |
| SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision | Yuseon Choi, Sangjin Kim, Jungjun Oh, Byeongcheol Kim, Hoi-Jun Yoo | 2025-12-15 | Low-bit quantization is a promising technique for efficient transformer inference by reducing computational and memory overhead. However, aggressive bitwidth reduction remains challenging due to activ... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Q-IRIS: The Evolution of the IRIS Task-Based Runtime to Enable Classical-Quantum Workflows | Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Elaine Wong, Vicente Leyton-Ortega, Jeffrey S. Vetter, Seth R. Johnson, Travis S. Humble | 2025-12-15 | Extreme heterogeneity in emerging HPC systems is starting to include quantum accelerators, motivating runtimes that can coordinate between classical and quantum workloads. |
| SEDULity: A Proof-of-Learning Framework for Distributed and Secure Blockchains with Efficient Useful Work | Weihang Cao, Mustafa Doger, Sennur Ulukus | 2025-12-15 | The security and decentralization of Proof-of-Work (PoW) have been well-tested in existing blockchain systems. However, its tremendous energy waste has raised concerns about sustainability. |
| Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators | Aofeng Shen, Chi Zhang, Yakup Budanaz, Alexandru Calotoiu, Torsten Hoefler, Luca Benini | 2025-12-15 | Tile-based many-Processing Element (PE) accelerators can achieve competitive performance on General Matrix Multiplication (GEMM), but they are extremely hard to program, as their optimal software mapp... |
| astroCAMP: A Community Benchmark and Co-Design Framework for Sustainable SKA-Scale Radio Imaging | Denisa-Andreea Constantinescu, Rubén Rodríguez Álvarez, Jacques Morin, Etienne Orliac, Mickaël Dardaillon, Sunrise Wang, Hugo Miomandre, Miguel Peón-Quirós, Jean-François Nezan, David Atienza | 2025-12-15 | The Square Kilometre Array (SKA) will operate one of the world's largest continuous scientific data systems, sustaining petascale imaging under strict power envelopes. |
| Janus: Disaggregating Attention and Experts for Scalable MoE Inference | Zhexiang Zhang, Ye Wang, Xiangyu Wang, Yumiao Zhao, Jingzhe Jiang, Qizhen Weng, Shaohuai Shi, Yin Chen, Minchen Yu | 2025-12-15 | Large Mixture-of-Experts (MoE) model inference is challenging due to high resource demands and dynamic workloads. Existing solutions often deploy the entire model as a single monolithic unit, which ap... |
| SIGMA: An AI-Empowered Training Stack on Early-Life Hardware | Lei Qu, Lianhai Ren, Peng Cheng, Rui Gao, Ruizhe Wang, Tianyu Chen, Xiao Liu, Xingjian Zhang, Yeyun Gong, Yifan Xiong, Yucheng Ding, Yuting Jiang, Zhenghao Lin, Zhongxin Guo, Ziyue Yang | 2025-12-15 | An increasing variety of AI accelerators is being considered for large-scale training. However, enabling large-scale training on early-life AI accelerators faces three core challenges: frequent system... |
| Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation | Hassan Razavi, Ángel F. García-Fernández, Simo Särkkä | 2025-12-15 | This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs)... |
| SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling | Muhammad Alfian Amrizal, Raka Satya Prasasta, Santana Yuda Pradata, Kadek Gemilang Santiyuda, Reza Pulungan, Hiroyuki Takizawa | 2025-12-15 | High-performance computing (HPC) clusters consume enormous amounts of energy, with idle nodes as a major source of waste. Powering down unused nodes can mitigate this problem, but poorly timed transit... |
| Towards Secure Decentralized Applications and Consensus Protocols in Blockchains (on Selfish Mining, Undercutting Attacks, DAG-Based Blockchains, E-Voting, Cryptocurrency Wallets, Secure-Logging, and CBDC) | Ivan Homoliak | 2025-12-15 | With the rise of cryptocurrencies, many new applications built on decentralized blockchains have emerged. Blockchains are full-stack distributed systems where multiple sub-systems interact. |
| Adaptive GPU Resource Allocation for Multi-Agent Collaborative Reasoning in Serverless Environments | Guilin Zhang, Wulan Guo, Ziqi Tan | 2025-12-15 | Multi-agent systems powered by large language models have emerged as a promising paradigm for solving complex reasoning tasks through collaborative intelligence. |
| Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures | Mohammad Walid Charrwi, Zaid Hussain | 2025-12-15 | We investigate adaptive minimal routing in 2D torus networks-on-chip (NoCs) under node-fault conditions, comparing a reinforcement learning (RL)-based strategy to an adaptive routing baseline. A torus topol... |
| GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs | Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong | 2025-12-15 | In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and e... |
| FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection | Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng | 2025-12-15 | The scaling of computation throughput continues to outpace improvements in memory bandwidth, making many deep learning workloads memory-bound. |
| PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving | Weizhe Huang, Tao Peng, Tongxuan Liu, Donghe Jin, Xianzhe Dong, Ke Zhang | 2025-12-15 | The widespread deployment of large language models (LLMs) for interactive applications necessitates serving systems that can handle thousands of concurrent requests with diverse Service Level Objectiv... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Assessing Resilience in Authoritative DNS Infrastructure Supporting Government Services | Agung Septiadi, Minzhao Lyu, Hassan Habibi Gharakheili, Vijay Sivaraman | 2025-12-15 | Online government services are increasingly regarded as critical national infrastructure. Because these services directly influence public trust, any disruption can have significant societal and polit... |
| Energy-Efficient Multi-Radio Microwave and IAB-Based Fixed Wireless Access for Rural Areas | Anselme Ndikumana, Kim Khoa Nguyen, Adel Larabi, Mohamed Cheriet | 2025-12-15 | Deploying fiber optics as a last-mile solution in rural areas is not economically viable due to low population density. Nevertheless, providing high-speed internet access in these regions is essential... |
| A Fair, Flexible, Zero-Waste Digital Electricity Market: A First-Principles Approach Combining Automatic Market Making, Holarchic Architectures and Shapley Theory | Shaun Sweeney, Robert Shorten, Mark O'Malley | 2025-12-15 | This thesis presents a fundamental rethink of electricity market design at the wholesale and balancing layers. Rather than treating markets as static spot clearing mechanisms, it reframes them as a co... |
| A Secure Edge Gateway Architecture for Wi-Fi-Enabled IoT | Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl | 2025-12-15 | This paper presents a Secure Edge Gateway Architecture for Wi-Fi-Enabled IoT designed to strengthen local network protection without altering existing infrastructure. |
| Link-Aware Energy-Frugal Continual Learning for Fault Detection in IoT Networks | Henrik C. M. Frederiksen, Junya Shiraishi, Cedomir Stefanovic, Hei Victor Cheng, Shashi Raj Pandey | 2025-12-15 | The use of lightweight machine learning (ML) models in internet of things (IoT) networks enables resource-constrained IoT devices to perform on-device inference for several critical applications. |
| Resource Orchestration and Optimization in 6G Extreme-edge Scenario | Manuel A. Jimenez, Sarang Kahvazadeh, Ignacio Labrador, Josep Mangues-Bafalluy | 2025-12-15 | 6G networks envision a pervasive service infrastructure spanning from centralized cloud to distributed edge and highly dynamic extreme-edge domains. |
| Low-Complexity Monitoring and Compensation of Transceiver IQ Imbalance by Multi-dimensional Architecture for Dual-Polarization 16 Quadrature Amplitude Modulation | Yukun Zhang, Xiaoxue Gong, Xu Zhang, Lei Guo | 2025-12-15 | In this paper, a low-complexity multi-dimensional architecture for IQ imbalance compensation is proposed, which reduces the effects of in-phase (I) and quadrature (Q) imbalance. |

cs.OS - Operating Systems

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC | Qingyuan Liu, Mo Zou, Hengbin Zhang, Dong Du, Yubin Xia, Haibo Chen | 2025-12-15 | File systems are critical OS components that require constant evolution to support new hardware and emerging application needs. However, the traditional paradigm of developing features, fixing bugs, a... |

cs.PF - Performance

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction | Mohammad Mozaffari, Samuel Kushnir, Maryam Mehri Dehnavi, Amir Yazdanbakhsh | 2025-12-15 | Post-training model pruning is a promising solution, yet it faces a trade-off: simple heuristics that zero weights are fast but degrade accuracy, while principled joint optimization methods recover ac... |
| astroCAMP: A Community Benchmark and Co-Design Framework for Sustainable SKA-Scale Radio Imaging | Denisa-Andreea Constantinescu, Rubén Rodríguez Álvarez, Jacques Morin, Etienne Orliac, Mickaël Dardaillon, Sunrise Wang, Hugo Miomandre, Miguel Peón-Quirós, Jean-François Nezan, David Atienza | 2025-12-15 | The Square Kilometre Array (SKA) will operate one of the world's largest continuous scientific data systems, sustaining petascale imaging under strict power envelopes. |
| EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC | Siyuan Shen, Mikhail Khalilov, Lukas Gianinazzi, Timo Schneider, Marcin Chrapek, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler | 2025-12-15 | Resource disaggregation is a promising technique for improving the efficiency of large-scale computing systems. However, this comes at the cost of increased memory access latency due to the need to re... |
| GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs | Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong | 2025-12-15 | In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and e... |
