2026-04-06

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Direct Integer Division in RNS and its Hardware Solutions	Eric B. Olsen	2026-04-06	Download	Residue Number Systems (RNS) offer efficient modular arithmetic and natural parallelism, but direct integer division in RNS remains a difficult and comparatively underdeveloped operation.
Comparative Characterization of KV Cache Management Strategies for LLM Inference	Oteo Mamo, Olga Kogiou, Hyunjin Yi, Weikuan Yu	2026-04-06	Download	Efficient inference with Large Language Models (LLMs) increasingly relies on Key-Value (KV) caches to store previously computed key and value vectors at each layer.
GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference	Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao, Congming Gao, Chengying Huan, Mingzhe Zhang, Jie Zhang	2026-04-06	Download	Deploying large language models (LLMs) as cloud services raises privacy concerns as inference may leak sensitive data. Fully Homomorphic Encryption (FHE) allows computation on encrypted data, but curr...
A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM	Siddhartha Raman Sundara Raman, Siyuan Ma, Lizy Kurian John	2026-04-06	Download	Compute-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency.
DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators	Zhiwen Mo, Guoyu Li, Hao Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan	2026-04-06	Download	Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity.
Neuromorphic Computing for Low-Power Artificial Intelligence	Keshava Katti, Pratik Chaudhari, Deep Jariwala	2026-04-06	Download	Classical computing is beginning to encounter fundamental limits of energy efficiency. This presents a challenge that can no longer be solved by strategies such as increasing circuit density or refini...
GPIR: Enabling Practical Private Information Retrieval with GPUs	Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi, G. Edward Suh, Jung Ho Ahn	2026-04-06	Download	Private information retrieval (PIR) allows private database queries but is hindered by intense server-side computation and memory traffic. Modern lattice-based PIR protocols typically involve three ph...
Mestra: Exploring Migration on Virtualized CGRAs	Agamemnon Kyriazis, Panagiotis Miliadis, Dimitris Theodoropoulos, Nectarios Koziris, Dionisios Pnevmatikatos	2026-04-06	Download	As modern Coarse Grain Reconfigurable Arrays (CGRAs) grow in size, efficient utilization of the available fabric by a single application becomes increasingly difficult.
LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM	Junguk Hong, Changmin Shin, Sukjin Kim, Si Ung Noh, Taehee Kwon, Seongyeon Park, Hanjun Kim, Youngsok Kim, Jinho Lee	2026-04-06	Download	Lookup tables (LUTs) have recently gained attention as an alternative compute mechanism that maps input operands to precomputed results, eliminating the need for arithmetic logic.
DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration	Shubham Kumar, Vijay Pratap Sharma, Vaibhav Neema, Santosh Kumar Vishvakarma	2026-04-06	Download	The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) unit...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Analyzing Persistent Alltoallv RMA Implementations for High-Performance MPI Communication	Evelyn Namugwanya	2026-04-06	Download	Collective communication operations such as MPI_Alltoallv are central to many HPC applications, particularly those with irregular message sizes.
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU	Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye	2026-04-06	Download	We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU.
Towards Policy-Enabled Multi-Hop Routing for Cross-Chain Message Delivery	Amin Rezaei, Solomon L. Davidson, Bernard Wong	2026-04-06	Download	Blockchain ecosystems face a significant issue with liquidity fragmentation, as applications and assets are distributed across many public chains, each accessible only by a subset of users.
DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators	Zhiwen Mo, Guoyu Li, Hao Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan	2026-04-06	Download	Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity.
RegGuard: Legitimacy and Fairness Enforcement for Optimistic Rollups	Zhenhang Shang, Yingzhe Yu, Kani Chen	2026-04-06	Download	Optimistic rollups provide scalable smart-contract execution but remain unsuitable for regulated financial applications due to three structural gaps: semantic legitimacy, cross-layer state consistency...
The Energy Cost of Execution-Idle in GPU Clusters	Yiran Lei, Jared Fernandez, Vasilis Kypriotis, Dimitrios Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler	2026-04-06	Download	GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle.
Sampling Parallelism for Fast and Efficient Bayesian Learning	Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz, Charlotte Debus	2026-04-06	Download	Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantifi...
Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks	Songge Zhang, Wen Wu, Liang Li, Ye Wang, Xuemin Shen	2026-04-06	Download	Low Earth orbit (LEO) satellites play an essential role in intelligent Earth observation by leveraging artificial intelligence models. However, limited onboard memory and excessive inference delay pre...
Edge-Oriented Orchestration of Energy Services Using Graph-Driven Swarm Intelligence	Liana Toderean, Dragos Lazea, Vasile Ofrim, Stefania Dumbrava, Anca Hangan, Tudor Cioara	2026-04-06	Download	As smart grids increasingly depend on IoT devices and distributed energy management, they require decentralized, low-latency orchestration of energy services.
Tight Bounds on Window Size and Time for Single-Agent Graph Exploration under T-Interval Connectivity	Yuichi Sudo, Naoki Kitamura, Masahiro Shibata, Junya Nakamura, Sébastien Tixeuil, Toshimitsu Masuzawa, Koichi Wada	2026-04-06	Download	We study deterministic exploration by a single agent in $T$-interval-connected graphs, a standard model of dynamic networks in which, for every time window of length $T$, the intersection of the graph...
LP-GEMM: Integrating Layout Propagation into GEMM Operations	César Guedes Carneiro, Lucas Alvarenga, Guido Araujo, Sandro Rigo	2026-04-06	Download	In Scientific Computing and modern Machine Learning (ML) workloads, sequences of dependent General Matrix Multiplications (GEMMs) often dominate execution time.
NBI-Slurm: Simplified submission of Slurm jobs with energy saving mode	Andrea Telatin	2026-04-06	Download	NBI-Slurm is a Perl package that provides a simplified, user-friendly interface for submitting and managing jobs on SLURM high-performance computing (HPC) clusters.
An experimental evaluation of satellite constellation emulators	Victor Cionca, Ferenc Szabo, Stanimir Vasilev, Dylan Smyth	2026-04-06	Download	Satellite emulation software is essential for research due to the lack of access to physical testbeds. To be useful, emulators must generate observations that are well-aligned with real-world ones, an...
GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads	Fanjiang Ye, Zhangke Li, Xinrui Zhong, Ethan Ma, Russell Chen, Kaijian Wang, Jingwei Zuo, Desen Sun, Ye Cao, Triston Cao, Myungjin Lee, Arvind Krishnamurthy, Yuke Wang	2026-04-06	Download	Diffusion models have emerged as the prevailing approach for text-to-image (T2I) and text-to-video (T2V) generation, yet production platforms must increasingly serve both modalities on shared GPU clus...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Analyzing Symbolic Properties for DRL Agents in Systems and Networking	Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba	2026-04-06	Download	Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congesti...
Towards Policy-Enabled Multi-Hop Routing for Cross-Chain Message Delivery	Amin Rezaei, Solomon L. Davidson, Bernard Wong	2026-04-06	Download	Blockchain ecosystems face a significant issue with liquidity fragmentation, as applications and assets are distributed across many public chains, each accessible only by a subset of users.
ACHEM: A Real-Time Digital Twin Framework with Channel and Radio Emulation	Anil Gurses, Mihail L. Sichitiu	2026-04-06	Download	Digital twins are becoming an important tool for designing, developing, testing, and optimizing next-generation wireless communication systems.
nascTime: A Full-Stack 5G-TSN Bridge Simulation Framework with SDAP-Based QoS Mapping and IEEE 802.1AS Transparent Clock	Mohamed Seliem, Utz Roedig, Cormac Sreenan, Dirk Pesch	2026-04-06	Download	The integration of 5G with IEEE 802.1 Time-Sensitive Networking (TSN) is essential for enabling flexible and mobile deterministic communication in industrial automation.
Comprehensive Analysis of Cellular Uplink Performance in a Dense Stadium Deployment	S. M. Haider Ali Shuvo, Hardani Ismu Nabil, Joshua Roy Palathinkal, Muhammad I. Rochman, Monisha Ghosh	2026-04-06	Download	Uplink performance remains a critical limitation in modern 5G networks, where UEs have to balance limited transmission power against propagation challenges.
OrbitTransit: Traffic Delivery and Diffusion for Earth Observation via Satellite Mobility	Haoyuan Zhao, Long Chen, Yi Ching Chou, Hao Fang, Jiangchuan Liu	2026-04-06	Download	The emerging demand for Earth observation (EO) to address environmental challenges has driven unprecedented growth in its primary carrier, Low Earth Orbit satellites, in recent years.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU	Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye	2026-04-06	Download	We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels	Yifan Zhu, Yekai Pan, Yanghui Wu, Chen Ding	2026-04-06	Download	Data movement is the primary bottleneck in modern computing systems. For loop-based programs common in high-performance computing (HPC) and AI workloads, including matrix multiplication, tensor contra...
Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices	Alexis Burgon, Berkman Sahiner, Nicholas A Petrick, Gene Pennello, Ravi K Samala	2026-04-06	Download	This work addresses challenges in evaluating adaptive artificial intelligence (AI) models for medical devices, where iterative updates to both models and evaluation datasets complicate performance ass...
The Energy Cost of Execution-Idle in GPU Clusters	Yiran Lei, Jared Fernandez, Vasilis Kypriotis, Dimitrios Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler	2026-04-06	Download	GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle.
An experimental evaluation of satellite constellation emulators	Victor Cionca, Ferenc Szabo, Stanimir Vasilev, Dylan Smyth	2026-04-06	Download	Satellite emulation software is essential for research due to the lack of access to physical testbeds. To be useful, emulators must generate observations that are well-aligned with real-world ones, an...
Modeling and Analysis of Air-to-Ground Cellular KPIs in a 5G Testbed using Android Smartphones	Simran Singh, Anıl Gürses, Özgür Özdemir, Ram Asokan, Mihail L. Sichitiu, İsmail Güvenç, Rudra Dutta, Magreth Mushi	2026-04-06	Download	The integration of cellular communication with Unmanned Aerial Vehicles (UAVs) extends the range of command and control and payload communications of autonomous UAV applications.
Training Transformers in Cosine Coefficient Space	Mohamed Amine Bergach	2026-04-06	Download	We parameterize the weight matrices of a transformer in the two-dimensional discrete cosine transform (DCT) domain, retaining only the lowest-frequency coefficients.
REAM: Merging Improves Pruning of Experts in LLMs	Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev	2026-04-06	Download	Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges ...