2026-04-06
cs.AR - Architecture
| Title | Authors | Date | Abstract |
|---|---|---|---|
| Direct Integer Division in RNS and its Hardware Solutions | Eric B. Olsen | 2026-04-06 | Residue Number Systems (RNS) offer efficient modular arithmetic and natural parallelism, but direct integer division in RNS remains a difficult and comparatively underdeveloped operation. |
| Comparative Characterization of KV Cache Management Strategies for LLM Inference | Oteo Mamo, Olga Kogiou, Hyunjin Yi, Weikuan Yu | 2026-04-06 | Efficient inference with Large Language Models (LLMs) increasingly relies on Key-Value (KV) caches to store previously computed key and value vectors at each layer. |
| GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference | Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao, Congming Gao, Chengying Huan, Mingzhe Zhang, Jie Zhang | 2026-04-06 | Deploying large language models (LLMs) as cloud services raises privacy concerns as inference may leak sensitive data. Fully Homomorphic Encryption (FHE) allows computation on encrypted data, but curr... |
| A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM | Siddhartha Raman Sundara Raman, Siyuan Ma, Lizy Kurian John | 2026-04-06 | Compute-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency. |
| DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators | Zhiwen Mo, Guoyu Li, Hao Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan | 2026-04-06 | Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. |
| Neuromorphic Computing for Low-Power Artificial Intelligence | Keshava Katti, Pratik Chaudhari, Deep Jariwala | 2026-04-06 | Classical computing is beginning to encounter fundamental limits of energy efficiency. This presents a challenge that can no longer be solved by strategies such as increasing circuit density or refini... |
| GPIR: Enabling Practical Private Information Retrieval with GPUs | Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi, G. Edward Suh, Jung Ho Ahn | 2026-04-06 | Private information retrieval (PIR) allows private database queries but is hindered by intense server-side computation and memory traffic. Modern lattice-based PIR protocols typically involve three ph... |
| Mestra: Exploring Migration on Virtualized CGRAs | Agamemnon Kyriazis, Panagiotis Miliadis, Dimitris Theodoropoulos, Nectarios Koziris, Dionisios Pnevmatikatos | 2026-04-06 | As modern Coarse Grain Reconfigurable Arrays (CGRAs) grow in size, efficient utilization of the available fabric by a single application becomes increasingly difficult. |
| LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM | Junguk Hong, Changmin Shin, Sukjin Kim, Si Ung Noh, Taehee Kwon, Seongyeon Park, Hanjun Kim, Youngsok Kim, Jinho Lee | 2026-04-06 | Lookup tables (LUTs) have recently gained attention as an alternative compute mechanism that maps input operands to precomputed results, eliminating the need for arithmetic logic. |
| DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration | Shubham Kumar, Vijay Pratap Sharma, Vaibhav Neema, Santosh Kumar Vishvakarma | 2026-04-06 | The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) unit... |
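
To make the RNS difficulty from the first abstract concrete: an integer is represented by its residues modulo a set of pairwise coprime moduli, so addition and multiplication run independently in each residue channel, but ordinary (floor) integer division has no such channel-wise form. The minimal Python sketch below uses hypothetical moduli (3, 5, 7) and hypothetical helper names; it illustrates only this background and is not the division method or hardware proposed in the Olsen paper.

```python
from math import prod

# Hypothetical moduli chosen only for illustration: small and pairwise coprime,
# giving a dynamic range of 3 * 5 * 7 = 105.
MODULI = (3, 5, 7)


def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer x < prod(moduli) as its residues."""
    return tuple(x % m for m in moduli)


def from_rns(residues, moduli=MODULI):
    """Decode residues back to an integer using the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        mi = big_m // m
        # pow(mi, -1, m) is the modular inverse of mi modulo m (Python 3.8+).
        x += r * mi * pow(mi, -1, m)
    return x % big_m


def rns_add(a, b, moduli=MODULI):
    # Each residue channel is independent: no carries cross between moduli.
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def rns_mul_by_inverse(a, d, moduli=MODULI):
    """Channel-wise multiplication by d^-1 (requires gcd(d, m) == 1 for every m).

    Returns the y with d * y == x (mod prod(moduli)); this equals x // d
    only when d divides x exactly.
    """
    return tuple((x * pow(d, -1, m)) % m for x, m in zip(a, moduli))


if __name__ == "__main__":
    print(from_rns(rns_add(to_rns(40), to_rns(50))))              # 90
    print(from_rns(rns_mul_by_inverse(to_rns(84), 4)))            # 21 (exact: 84 = 4 * 21)
    print(from_rns(rns_mul_by_inverse(to_rns(10), 4)), 10 // 4)   # 55 2
```

Multiplying by a modular inverse recovers the quotient only when the division is exact: 10 // 4 is 2, but the channel-wise result decodes to 55 because 4 × 55 = 220 ≡ 10 (mod 105). Obtaining floor quotients generally requires magnitude information (for example through conversion out of RNS or base extension) that the residues do not expose directly.
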
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Date | Abstract |
|---|---|---|---|
| Analyzing Persistent Alltoallv RMA Implementations for High-Performance MPI Communication | Evelyn Namugwanya | 2026-04-06 | Collective communication operations such as MPI_Alltoallv are central to many HPC applications, particularly those with irregular message sizes. |
| MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU | Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye | 2026-04-06 | We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. |
| Towards Policy-Enabled Multi-Hop Routing for Cross-Chain Message Delivery | Amin Rezaei, Solomon L. Davidson, Bernard Wong | 2026-04-06 | Blockchain ecosystems face a significant issue with liquidity fragmentation, as applications and assets are distributed across many public chains, each accessible only by a subset of users. |
| DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators | Zhiwen Mo, Guoyu Li, Hao Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan | 2026-04-06 | Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. |
| RegGuard: Legitimacy and Fairness Enforcement for Optimistic Rollups | Zhenhang Shang, Yingzhe Yu, Kani Chen | 2026-04-06 | Optimistic rollups provide scalable smart-contract execution but remain unsuitable for regulated financial applications due to three structural gaps: semantic legitimacy, cross-layer state consistency... |
| The Energy Cost of Execution-Idle in GPU Clusters | Yiran Lei, Jared Fernandez, Vasilis Kypriotis, Dimitrios Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler | 2026-04-06 | GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. |
| Sampling Parallelism for Fast and Efficient Bayesian Learning | Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz, Charlotte Debus | 2026-04-06 | Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantifi... |
| Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks | Songge Zhang, Wen Wu, Liang Li, Ye Wang, Xuemin Shen | 2026-04-06 | Low Earth orbit (LEO) satellites play an essential role in intelligent Earth observation by leveraging artificial intelligence models. However, limited onboard memory and excessive inference delay pre... |
| Edge-Oriented Orchestration of Energy Services Using Graph-Driven Swarm Intelligence | Liana Toderean, Dragos Lazea, Vasile Ofrim, Stefania Dumbrava, Anca Hangan, Tudor Cioara | 2026-04-06 | As smart grids increasingly depend on IoT devices and distributed energy management, they require decentralized, low-latency orchestration of energy services. |
| Tight Bounds on Window Size and Time for Single-Agent Graph Exploration under T-Interval Connectivity | Yuichi Sudo, Naoki Kitamura, Masahiro Shibata, Junya Nakamura, Sébastien Tixeuil, Toshimitsu Masuzawa, Koichi Wada | 2026-04-06 | We study deterministic exploration by a single agent in T-interval-connected graphs, a standard model of dynamic networks in which, for every time window of length T, the intersection of the graph... |
| LP-GEMM: Integrating Layout Propagation into GEMM Operations | César Guedes Carneiro, Lucas Alvarenga, Guido Araujo, Sandro Rigo | 2026-04-06 | In Scientific Computing and modern Machine Learning (ML) workloads, sequences of dependent General Matrix Multiplications (GEMMs) often dominate execution time. |
| NBI-Slurm: Simplified submission of Slurm jobs with energy saving mode | Andrea Telatin | 2026-04-06 | NBI-Slurm is a Perl package that provides a simplified, user-friendly interface for submitting and managing jobs on SLURM high-performance computing (HPC) clusters. |
| An experimental evaluation of satellite constellation emulators | Victor Cionca, Ferenc Szabo, Stanimir Vasilev, Dylan Smyth | 2026-04-06 | Satellite emulation software is essential for research due to the lack of access to physical testbeds. To be useful, emulators must generate observations that are well-aligned with real-world ones, an... |
| GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads | Fanjiang Ye, Zhangke Li, Xinrui Zhong, Ethan Ma, Russell Chen, Kaijian Wang, Jingwei Zuo, Desen Sun, Ye Cao, Triston Cao, Myungjin Lee, Arvind Krishnamurthy, Yuke Wang | 2026-04-06 | Diffusion models have emerged as the prevailing approach for text-to-image (T2I) and text-to-video (T2V) generation, yet production platforms must increasingly serve both modalities on shared GPU clus... |
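
The dynamic-network model in the T-interval connectivity abstract can be made concrete. Under the usual definition, the node set is fixed and, for every window of T consecutive rounds, the edges present throughout that window must form a connected spanning subgraph. The Python sketch below checks this property on a finite, hypothetical sequence of rounds; the function names are illustrative assumptions, and the sketch says nothing about the exploration algorithm or the bounds in the Sudo et al. paper.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def _normalize(edges: Iterable[Edge]) -> Set[Edge]:
    """Treat edges as undirected by storing each as (smaller, larger)."""
    return {(min(u, v), max(u, v)) for u, v in edges}


def _is_connected(n: int, edges: Set[Edge]) -> bool:
    """Depth-first search over nodes 0..n-1; True if every node is reached."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def is_t_interval_connected(n: int, rounds: List[List[Edge]], t: int) -> bool:
    """Check a finite prefix of a dynamic graph on the fixed node set 0..n-1.

    rounds[i] is the edge set present in round i. For every window of t
    consecutive rounds, the edges present in all of them must connect the graph.
    """
    if n < 1 or t < 1 or len(rounds) < t:
        raise ValueError("need n >= 1, t >= 1 and at least t rounds")
    for start in range(len(rounds) - t + 1):
        stable = _normalize(rounds[start])
        for edges in rounds[start + 1 : start + t]:
            stable &= _normalize(edges)  # keep only edges present in every round so far
        if not _is_connected(n, stable):
            return False
    return True


if __name__ == "__main__":
    # Connected in every single round, but no edge set survives two rounds intact.
    shifting = [[(0, 1), (1, 2)], [(0, 1), (0, 2)], [(0, 2), (1, 2)]]
    print(is_t_interval_connected(3, shifting, 1))  # True
    print(is_t_interval_connected(3, shifting, 2))  # False
```

Every 1-interval-connected sequence is merely connected round by round; larger T demands a spanning structure that stays stable for longer, which is why the window size is a meaningful parameter for exploration.
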
cs.NI - Networking and Internet Architecture
| Title | Authors | Date | Abstract |
|---|---|---|---|
| Analyzing Symbolic Properties for DRL Agents in Systems and Networking | Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba | 2026-04-06 | Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congesti... |
| Towards Policy-Enabled Multi-Hop Routing for Cross-Chain Message Delivery | Amin Rezaei, Solomon L. Davidson, Bernard Wong | 2026-04-06 | Blockchain ecosystems face a significant issue with liquidity fragmentation, as applications and assets are distributed across many public chains, each accessible only by a subset of users. |
| ACHEM: A Real-Time Digital Twin Framework with Channel and Radio Emulation | Anil Gurses, Mihail L. Sichitiu | 2026-04-06 | Digital twins are becoming an important tool for designing, developing, testing, and optimizing next-generation wireless communication systems. |
| nascTime: A Full-Stack 5G-TSN Bridge Simulation Framework with SDAP-Based QoS Mapping and IEEE 802.1AS Transparent Clock | Mohamed Seliem, Utz Roedig, Cormac Sreenan, Dirk Pesch | 2026-04-06 | The integration of 5G with IEEE 802.1 Time-Sensitive Networking (TSN) is essential for enabling flexible and mobile deterministic communication in industrial automation. |
| Comprehensive Analysis of Cellular Uplink Performance in a Dense Stadium Deployment | S. M. Haider Ali Shuvo, Hardani Ismu Nabil, Joshua Roy Palathinkal, Muhammad I. Rochman, Monisha Ghosh | 2026-04-06 | Uplink performance remains a critical limitation in modern 5G networks, where UEs have to balance limited transmission power against propagation challenges. |
| OrbitTransit: Traffic Delivery and Diffusion for Earth Observation via Satellite Mobility | Haoyuan Zhao, Long Chen, Yi Ching Chou, Hao Fang, Jiangchuan Liu | 2026-04-06 | The emerging demand for Earth observation (EO) to address environmental challenges has driven unprecedented growth in its primary carrier, Low Earth Orbit satellites, in recent years. |
cs.OS - Operating Systems
| Title | Authors | Date | Abstract |
|---|---|---|---|
| MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU | Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye | 2026-04-06 | We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. |
cs.PF - Performance
| Title | Authors | Date | Abstract |
|---|---|---|---|
| AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels | Yifan Zhu, Yekai Pan, Yanghui Wu, Chen Ding | 2026-04-06 | Data movement is the primary bottleneck in modern computing systems. For loop-based programs common in high-performance computing (HPC) and AI workloads, including matrix multiplication, tensor contra... |
| Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices | Alexis Burgon, Berkman Sahiner, Nicholas A Petrick, Gene Pennello, Ravi K Samala | 2026-04-06 | This work addresses challenges in evaluating adaptive artificial intelligence (AI) models for medical devices, where iterative updates to both models and evaluation datasets complicate performance ass... |
| The Energy Cost of Execution-Idle in GPU Clusters | Yiran Lei, Jared Fernandez, Vasilis Kypriotis, Dimitrios Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler | 2026-04-06 | GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. |
| An experimental evaluation of satellite constellation emulators | Victor Cionca, Ferenc Szabo, Stanimir Vasilev, Dylan Smyth | 2026-04-06 | Satellite emulation software is essential for research due to the lack of access to physical testbeds. To be useful, emulators must generate observations that are well-aligned with real-world ones, an... |
| Modeling and Analysis of Air-to-Ground Cellular KPIs in a 5G Testbed using Android Smartphones | Simran Singh, Anıl Gürses, Özgür Özdemir, Ram Asokan, Mihail L. Sichitiu, İsmail Güvenç, Rudra Dutta, Magreth Mushi | 2026-04-06 | The integration of cellular communication with Unmanned Aerial Vehicles (UAVs) extends the range of command and control and payload communications of autonomous UAV applications. |
| Training Transformers in Cosine Coefficient Space | Mohamed Amine Bergach | 2026-04-06 | We parameterize the weight matrices of a transformer in the two-dimensional discrete cosine transform (DCT) domain, retaining only the lowest-frequency coefficients. |
| REAM: Merging Improves Pruning of Experts in LLMs | Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev | 2026-04-06 | Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges ... |
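
The DCT-domain parameterization described in "Training Transformers in Cosine Coefficient Space" can be illustrated with an inverse 2D DCT: a weight matrix is stored as a small block of low-frequency coefficients and expanded to full size on demand. The pure-Python sketch below assumes the orthonormal DCT-II and a rectangular k_r × k_c block of kept coefficients; the function names are hypothetical, and this is not the paper's training code.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def weight_from_low_freq(theta, rows, cols):
    """Expand a k_r x k_c block of low-frequency coefficients into a rows x cols matrix.

    Equivalent to an inverse 2D DCT with every higher-frequency coefficient set
    to zero: W = C_rows[:k_r]^T @ theta @ C_cols[:k_c].
    """
    k_r, k_c = len(theta), len(theta[0])
    basis_r = dct_matrix(rows)[:k_r]  # k_r x rows
    basis_c = dct_matrix(cols)[:k_c]  # k_c x cols
    return matmul(matmul(transpose(basis_r), theta), basis_c)


def dct2(w):
    """Forward orthonormal 2D DCT-II: X = C_rows @ W @ C_cols^T."""
    return matmul(matmul(dct_matrix(len(w)), w), transpose(dct_matrix(len(w[0]))))


if __name__ == "__main__":
    theta = [[0.5, -1.0], [2.0, 0.25]]     # 4 stored (trainable) values
    w = weight_from_low_freq(theta, 4, 6)  # expanded 4 x 6 weight matrix
    print(len(w), len(w[0]))               # 4 6
    print(round(dct2(w)[1][0], 6))         # 2.0
```

With a 2 × 2 block, the 4 × 6 matrix is described by 4 numbers instead of 24. Because the DCT-II matrices are orthonormal, the forward transform of the expanded matrix returns exactly the stored coefficients in the low-frequency corner and zeros elsewhere, so gradients can in principle be taken with respect to the coefficients directly.
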