2025-12-15
cs.AR - Architecture
| Title | Authors | Date | Link | Abstract |
|---|---|---|---|---|
| Pipeline Stage Resolved Timing Characterization of FPGA and ASIC Implementations of a RISC V Processor | Mostafa Darvishi | 2025-12-15 | Download | This paper presents a pipeline stage resolved timing characterization of a 32-bit RISC V processor implemented on a 20 nm FPGA and a 7 nm FinFET ASIC platform. |
| Lyra: A Hardware-Accelerated RISC-V Verification Framework with Generative Model-Based Processor Fuzzing | Juncheng Huo, Yunfan Gao, Xinxin Liu, Sa Wang, Yungang Bao, Xitong Gao, Kan Shi | 2025-12-15 | Download | As processor designs grow more complex, verification remains bottlenecked by slow software simulation and low-quality random test stimuli. Recent research has applied software fuzzers to hardware veri... |
| Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators | Aofeng Shen, Chi Zhang, Yakup Budanaz, Alexandru Calotoiu, Torsten Hoefler, Luca Benini | 2025-12-15 | Download | Tile-based many-Processing Element (PE) accelerators can achieve competitive performance on General Matrix Multiplication (GEMM), but they are extremely hard to program, as their optimal software mapp... |
| Toward Reproducible and Standardized Computer Architecture Simulation with gem5 | Kunal Pai, Harshil Patel, Erin Le, Noah Krim, Mahyar Samani, Bobby R. Bruce, Jason Lowe-Power | 2025-12-15 | Download | Reproducibility in simulation-based computer architecture research requires coordinating artifacts like disk images, kernels, and benchmarks, but existing workflows are inconsistent. |
| Striking the Balance: GEMM Performance Optimization Across Generations of Ryzen AI NPUs | Endri Taka, Andre Roesti, Joseph Melber, Pranathi Vasireddy, Kristof Denolf, Diana Marculescu | 2025-12-15 | Download | The high computational and memory demands of modern deep learning (DL) workloads have led to the development of specialized hardware devices from cloud to edge, such as AMD's Ryzen AI XDNA NPUs. |
| Noise-Resilient Quantum Aggregation on NISQ for Federated ADAS Learning | Chethana Prasad Kabgere, Sudarshan T S B | 2025-12-15 | Download | Advanced Driver Assistance Systems (ADAS) increasingly employ Federated Learning (FL) to collaboratively train models across distributed vehicular nodes while preserving data privacy. |
| An Optimal Alignment-Driven Iterative Closed-Loop Convergence Framework for High-Performance Ultra-Large Scale Layout Pattern Clustering | Shuo Liu | 2025-12-15 | Download | With the aggressive scaling of VLSI technology, the explosion of layout patterns creates a critical bottleneck for DFM applications like OPC. Pattern clustering is essential to reduce data complexity,... |
| SliceMoE: Bit-Sliced Expert Caching under Miss-Rate Constraints for Efficient MoE Inference | Yuseon Choi, Sangjin Kim, Jungjun Oh, Gwangtae Park, Byeongcheol Kim, Hoi-Jun Yoo | 2025-12-15 | Download | MoE models offer efficient scaling through conditional computation, but their large parameter size and expensive expert offloading make on-device deployment challenging. |
| SeVeDo: A Heterogeneous Transformer Accelerator for Low-Bit Inference via Hierarchical Group Quantization and SVD-Guided Mixed Precision | Yuseon Choi, Sangjin Kim, Jungjun Oh, Byeongcheol Kim, Hoi-Jun Yoo | 2025-12-15 | Download | Low-bit quantization is a promising technique for efficient transformer inference by reducing computational and memory overhead. However, aggressive bitwidth reduction remains challenging due to activ... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Date | Link | Abstract |
|---|---|---|---|---|
| Q-IRIS: The Evolution of the IRIS Task-Based Runtime to Enable Classical-Quantum Workflows | Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Elaine Wong, Vicente Leyton-Ortega, Jeffrey S. Vetter, Seth R. Johnson, Travis S. Humble | 2025-12-15 | Download | Extreme heterogeneity in emerging HPC systems is starting to include quantum accelerators, motivating runtimes that can coordinate between classical and quantum workloads. |
| SEDULity: A Proof-of-Learning Framework for Distributed and Secure Blockchains with Efficient Useful Work | Weihang Cao, Mustafa Doger, Sennur Ulukus | 2025-12-15 | Download | The security and decentralization of Proof-of-Work (PoW) have been well-tested in existing blockchain systems. However, its tremendous energy waste has raised concerns about sustainability. |
| Design in Tiles: Automating GEMM Deployment on Tile-Based Many-PE Accelerators | Aofeng Shen, Chi Zhang, Yakup Budanaz, Alexandru Calotoiu, Torsten Hoefler, Luca Benini | 2025-12-15 | Download | Tile-based many-Processing Element (PE) accelerators can achieve competitive performance on General Matrix Multiplication (GEMM), but they are extremely hard to program, as their optimal software mapp... |
| astroCAMP: A Community Benchmark and Co-Design Framework for Sustainable SKA-Scale Radio Imaging | Denisa-Andreea Constantinescu, Rubén Rodríguez Álvarez, Jacques Morin, Etienne Orliac, Mickaël Dardaillon, Sunrise Wang, Hugo Miomandre, Miguel Peón-Quirós, Jean-François Nezan, David Atienza | 2025-12-15 | Download | The Square Kilometre Array (SKA) will operate one of the world's largest continuous scientific data systems, sustaining petascale imaging under strict power envelopes. |
| Janus: Disaggregating Attention and Experts for Scalable MoE Inference | Zhexiang Zhang, Ye Wang, Xiangyu Wang, Yumiao Zhao, Jingzhe Jiang, Qizhen Weng, Shaohuai Shi, Yin Chen, Minchen Yu | 2025-12-15 | Download | Large Mixture-of-Experts (MoE) model inference is challenging due to high resource demands and dynamic workloads. Existing solutions often deploy the entire model as a single monolithic unit, which ap... |
| SIGMA: An AI-Empowered Training Stack on Early-Life Hardware | Lei Qu, Lianhai Ren, Peng Cheng, Rui Gao, Ruizhe Wang, Tianyu Chen, Xiao Liu, Xingjian Zhang, Yeyun Gong, Yifan Xiong, Yucheng Ding, Yuting Jiang, Zhenghao Lin, Zhongxin Guo, Ziyue Yang | 2025-12-15 | Download | An increasing variety of AI accelerators is being considered for large-scale training. However, enabling large-scale training on early-life AI accelerators faces three core challenges: frequent system... |
| Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation | Hassan Razavi, Ángel F. García-Fernández, Simo Särkkä | 2025-12-15 | Download | This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs)... |
| SPARS: A Reinforcement Learning-Enabled Simulator for Power Management in HPC Job Scheduling | Muhammad Alfian Amrizal, Raka Satya Prasasta, Santana Yuda Pradata, Kadek Gemilang Santiyuda, Reza Pulungan, Hiroyuki Takizawa | 2025-12-15 | Download | High-performance computing (HPC) clusters consume enormous amounts of energy, with idle nodes as a major source of waste. Powering down unused nodes can mitigate this problem, but poorly timed transit... |
| Towards Secure Decentralized Applications and Consensus Protocols in Blockchains (on Selfish Mining, Undercutting Attacks, DAG-Based Blockchains, E-Voting, Cryptocurrency Wallets, Secure-Logging, and CBDC) | Ivan Homoliak | 2025-12-15 | Download | With the rise of cryptocurrencies, many new applications built on decentralized blockchains have emerged. Blockchains are full-stack distributed systems where multiple sub-systems interact. |
| Adaptive GPU Resource Allocation for Multi-Agent Collaborative Reasoning in Serverless Environments | Guilin Zhang, Wulan Guo, Ziqi Tan | 2025-12-15 | Download | Multi-agent systems powered by large language models have emerged as a promising paradigm for solving complex reasoning tasks through collaborative intelligence. |
| Toward Self-Healing Networks-on-Chip: RL-Driven Routing in 2D Torus Architectures | Mohammad Walid Charrwi, Zaid Hussain | 2025-12-15 | Download | We investigate adaptive minimal routing in 2D torus networks-on-chip (NoCs) under node-fault conditions, comparing a reinforcement learning (RL)-based strategy to an adaptive routing baseline. A torus topol... |
| GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs | Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong | 2025-12-15 | Download | In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and e... |
| FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection | Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng | 2025-12-15 | Download | The scaling of computation throughput continues to outpace improvements in memory bandwidth, making many deep learning workloads memory-bound. |
| PROSERVE: Unified Multi-Priority Request Scheduling for LLM Serving | Weizhe Huang, Tao Peng, Tongxuan Liu, Donghe Jin, Xianzhe Dong, Ke Zhang | 2025-12-15 | Download | The widespread deployment of large language models (LLMs) for interactive applications necessitates serving systems that can handle thousands of concurrent requests with diverse Service Level Objectiv... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Date | Link | Abstract |
|---|---|---|---|---|
| Assessing Resilience in Authoritative DNS Infrastructure Supporting Government Services | Agung Septiadi, Minzhao Lyu, Hassan Habibi Gharakheili, Vijay Sivaraman | 2025-12-15 | Download | Online government services are increasingly regarded as critical national infrastructure. Because these services directly influence public trust, any disruption can have significant societal and polit... |
| Energy-Efficient Multi-Radio Microwave and IAB-Based Fixed Wireless Access for Rural Areas | Anselme Ndikumana, Kim Khoa Nguyen, Adel Larabi, Mohamed Cheriet | 2025-12-15 | Download | Deploying fiber optics as a last-mile solution in rural areas is not economically viable due to low population density. Nevertheless, providing high-speed internet access in these regions is essential... |
| A Fair, Flexible, Zero-Waste Digital Electricity Market: A First-Principles Approach Combining Automatic Market Making, Holarchic Architectures and Shapley Theory | Shaun Sweeney, Robert Shorten, Mark O'Malley | 2025-12-15 | Download | This thesis presents a fundamental rethink of electricity market design at the wholesale and balancing layers. Rather than treating markets as static spot clearing mechanisms, it reframes them as a co... |
| A Secure Edge Gateway Architecture for Wi-Fi-Enabled IoT | Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl | 2025-12-15 | Download | This paper presents a Secure Edge Gateway Architecture for Wi-Fi-Enabled IoT designed to strengthen local network protection without altering existing infrastructure. |
| Link-Aware Energy-Frugal Continual Learning for Fault Detection in IoT Networks | Henrik C. M. Frederiksen, Junya Shiraishi, Cedomir Stefanovic, Hei Victor Cheng, Shashi Raj Pandey | 2025-12-15 | Download | The use of lightweight machine learning (ML) models in internet of things (IoT) networks enables resource constrained IoT devices to perform on-device inference for several critical applications. |
| Resource Orchestration and Optimization in 6G Extreme-edge Scenario | Manuel A. Jimenez, Sarang Kahvazadeh, Ignacio Labrador, Josep Mangues-Bafalluy | 2025-12-15 | Download | 6G networks envision a pervasive service infrastructure spanning from centralized cloud to distributed edge and highly dynamic extreme-edge domains. |
| Low-Complexity Monitoring and Compensation of Transceiver IQ Imbalance by Multi-dimensional Architecture for Dual-Polarization 16 Quadrature Amplitude Modulation | Yukun Zhang, Xiaoxue Gong, Xu Zhang, Lei Guo | 2025-12-15 | Download | In this paper, a low-complexity multi-dimensional architecture for IQ imbalance compensation is proposed, which reduces the effects of in-phase (I) and quadrature (Q) imbalance. |
cs.OS - Operating Systems
| Title | Authors | Date | Link | Abstract |
|---|---|---|---|---|
| Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC | Qingyuan Liu, Mo Zou, Hengbin Zhang, Dong Du, Yubin Xia, Haibo Chen | 2025-12-15 | Download | File systems are critical OS components that require constant evolution to support new hardware and emerging application needs. However, the traditional paradigm of developing features, fixing bugs, a... |
cs.PF - Performance
| Title | Authors | Date | Link | Abstract |
|---|---|---|---|---|
| OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction | Mohammad Mozaffari, Samuel Kushnir, Maryam Mehri Dehnavi, Amir Yazdanbakhsh | 2025-12-15 | Download | Post-training model pruning is a promising solution, yet it faces a trade-off: simple heuristics that zero weights are fast but degrade accuracy, while principled joint optimization methods recover ac... |
| astroCAMP: A Community Benchmark and Co-Design Framework for Sustainable SKA-Scale Radio Imaging | Denisa-Andreea Constantinescu, Rubén Rodríguez Álvarez, Jacques Morin, Etienne Orliac, Mickaël Dardaillon, Sunrise Wang, Hugo Miomandre, Miguel Peón-Quirós, Jean-François Nezan, David Atienza | 2025-12-15 | Download | The Square Kilometre Array (SKA) will operate one of the world's largest continuous scientific data systems, sustaining petascale imaging under strict power envelopes. |
| EDAN: Towards Understanding Memory Parallelism and Latency Sensitivity in HPC | Siyuan Shen, Mikhail Khalilov, Lukas Gianinazzi, Timo Schneider, Marcin Chrapek, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler | 2025-12-15 | Download | Resource disaggregation is a promising technique for improving the efficiency of large-scale computing systems. However, this comes at the cost of increased memory access latency due to the need to re... |
| GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs | Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong | 2025-12-15 | Download | In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and e... |