2026-04-06

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Direct Integer Division in RNS and its Hardware Solutions | Eric B. Olsen | 2026-04-06 | Download | Residue Number Systems (RNS) offer efficient modular arithmetic and natural parallelism, but direct integer division in RNS remains a difficult and comparatively underdeveloped operation. |
| Comparative Characterization of KV Cache Management Strategies for LLM Inference | Oteo Mamo, Olga Kogiou, Hyunjin Yi, Weikuan Yu | 2026-04-06 | Download | Efficient inference with Large Language Models (LLMs) increasingly relies on Key-Value (KV) caches to store previously computed key and value vectors at each layer. |
| GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference | Guoci Chen, Xiurui Pan, Qiao Li, Bo Mao, Congming Gao, Chengying Huan, Mingzhe Zhang, Jie Zhang | 2026-04-06 | Download | Deploying large language models (LLMs) as cloud services raises privacy concerns as inference may leak sensitive data. Fully Homomorphic Encryption (FHE) allows computation on encrypted data, but curr... |
| A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM | Siddhartha Raman Sundara Raman, Siyuan Ma, Lizy Kurian John | 2026-04-06 | Download | Compute-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency. |
| DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators | Zhiwen Mo, Guoyu Li, Hao, Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan | 2026-04-06 | Download | Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. |
| Neuromorphic Computing for Low-Power Artificial Intelligence | Keshava Katti, Pratik Chaudhari, Deep Jariwala | 2026-04-06 | Download | Classical computing is beginning to encounter fundamental limits of energy efficiency. This presents a challenge that can no longer be solved by strategies such as increasing circuit density or refini... |
| GPIR: Enabling Practical Private Information Retrieval with GPUs | Hyesung Ji, Hyunah Yu, Jongmin Kim, Wonseok Choi, G. Edward Suh, Jung Ho Ahn | 2026-04-06 | Download | Private information retrieval (PIR) allows private database queries but is hindered by intense server-side computation and memory traffic. Modern lattice-based PIR protocols typically involve three ph... |
| Mestra: Exploring Migration on Virtualized CGRAs | Agamemnon Kyriazis, Panagiotis Miliadis, Dimitris Theodoropoulos, Nectarios Koziris, Dionisios Pnevmatikatos | 2026-04-06 | Download | As modern Coarse Grain Reconfigurable Arrays (CGRAs) grow in size, efficient utilization of the available fabric by a single application becomes increasingly difficult. |
| LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM | Junguk Hong, Changmin Shin, Sukjin Kim, Si Ung Noh, Taehee Kwon, Seongyeon Park, Hanjun Kim, Youngsok Kim, Jinho Lee | 2026-04-06 | Download | Lookup tables (LUTs) have recently gained attention as an alternative compute mechanism that maps input operands to precomputed results, eliminating the need for arithmetic logic. |
| DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration | Shubham Kumar, Vijay Pratap Sharma, Vaibhav Neema, Santosh Kumar Vishvakarma | 2026-04-06 | Download | The rapid adoption of low-precision arithmetic in artificial intelligence and edge computing has created a strong demand for energy-efficient and flexible floating-point multiply-accumulate (MAC) unit... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Analyzing Persistent Alltoallv RMA Implementations for High-Performance MPI Communication | Evelyn Namugwanya | 2026-04-06 | Download | Collective communication operations such as MPI_Alltoallv are central to many HPC applications, particularly those with irregular message sizes. |
| MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU | Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye | 2026-04-06 | Download | We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. |
| Towards Policy-Enabled Multi-Hop Routing for Cross-Chain Message Delivery | Amin Rezaei, Solomon L. Davidson, Bernard Wong | 2026-04-06 | Download | Blockchain ecosystems face a significant issue with liquidity fragmentation, as applications and assets are distributed across many public chains with each only accessible by a subset of users. |
| DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators | Zhiwen Mo, Guoyu Li, Hao, Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan | 2026-04-06 | Download | Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. |
| RegGuard: Legitimacy and Fairness Enforcement for Optimistic Rollups | Zhenhang Shang, Yingzhe Yu, Kani Chen | 2026-04-06 | Download | Optimistic rollups provide scalable smart-contract execution but remain unsuitable for regulated financial applications due to three structural gaps: semantic legitimacy, cross-layer state consistency... |
| The Energy Cost of Execution-Idle in GPU Clusters | Yiran Lei, Jared Fernandez, Vasilis Kypriotis, Dimitrios Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler | 2026-04-06 | Download | GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. |
| Sampling Parallelism for Fast and Efficient Bayesian Learning | Asena Karolin Özdemir, Lars H. Heyen, Arvid Weyrauch, Achim Streit, Markus Götz, Charlotte Debus | 2026-04-06 | Download | Machine learning models, and deep neural networks in particular, are increasingly deployed in risk-sensitive domains such as healthcare, environmental forecasting, and finance, where reliable quantifi... |
| Communication-Efficient Collaborative LLM Inference over LEO Satellite Networks | Songge Zhang, Wen Wu, Liang Li, Ye Wang, Xuemin, Shen | 2026-04-06 | Download | Low Earth orbit (LEO) satellites play an essential role in intelligent Earth observation by leveraging artificial intelligence models. However, limited onboard memory and excessive inference delay pre... |
| Edge-Oriented Orchestration of Energy Services Using Graph-Driven Swarm Intelligence | Liana Toderean, Dragos Lazea, Vasile Ofrim, Stefania Dumbrava, Anca Hangan, Tudor Cioara | 2026-04-06 | Download | As smart grids increasingly depend on IoT devices and distributed energy management, they require decentralized, low latency orchestration of energy services. |
| Tight Bounds on Window Size and Time for Single-Agent Graph Exploration under T-Interval Connectivity | Yuichi Sudo, Naoki Kitamura, Masahiro Shibata, Junya Nakamura, Sébastien Tixeuil, Toshimitsu Masuzawa, Koichi Wada | 2026-04-06 | Download | We study deterministic exploration by a single agent in T-interval-connected graphs, a standard model of dynamic networks in which, for every time window of length T, the intersection of the graph... |
| LP-GEMM: Integrating Layout Propagation into GEMM Operations | César Guedes Carneiro, Lucas Alvarenga, Guido Araujo, Sandro Rigo | 2026-04-06 | Download | In Scientific Computing and modern Machine Learning (ML) workloads, sequences of dependent General Matrix Multiplications (GEMMs) often dominate execution time. |
| NBI-Slurm: Simplified submission of Slurm jobs with energy saving mode | Andrea Telatin | 2026-04-06 | Download | NBI-Slurm is a Perl package that provides a simplified, user-friendly interface for submitting and managing jobs on SLURM high-performance computing (HPC) clusters. |
| An experimental evaluation of satellite constellation emulators | Victor Cionca, Ferenc Szabo, Stanimir Vasilev, Dylan Smyth | 2026-04-06 | Download | Satellite emulation software is essential for research due to the lack of access to physical testbeds. To be useful, emulators must generate observations that are well-aligned with real-world ones, an... |
| GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads | Fanjiang Ye, Zhangke Li, Xinrui Zhong, Ethan Ma, Russell Chen, Kaijian Wang, Jingwei Zuo, Desen Sun, Ye Cao, Triston Cao, Myungjin Lee, Arvind Krishnamurthy, Yuke Wang | 2026-04-06 | Download | Diffusion models have emerged as the prevailing approach for text-to-image (T2I) and text-to-video (T2V) generation, yet production platforms must increasingly serve both modalities on shared GPU clus... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Analyzing Symbolic Properties for DRL Agents in Systems and Networking | Mohammad Zangooei, Jannis Weil, Amr Rizk, Mina Tahmasbi Arashloo, Raouf Boutaba | 2026-04-06 | Download | Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congesti... |
| Towards Policy-Enabled Multi-Hop Routing for Cross-Chain Message Delivery | Amin Rezaei, Solomon L. Davidson, Bernard Wong | 2026-04-06 | Download | Blockchain ecosystems face a significant issue with liquidity fragmentation, as applications and assets are distributed across many public chains with each only accessible by a subset of users. |
| ACHEM: A Real-Time Digital Twin Framework with Channel and Radio Emulation | Anil Gurses, Mihail L. Sichitiu | 2026-04-06 | Download | Digital twins are becoming an important tool for designing, developing, testing, and optimizing next-generation wireless communication systems. |
| nascTime: A Full-Stack 5G-TSN Bridge Simulation Framework with SDAP-Based QoS Mapping and IEEE 802.1AS Transparent Clock | Mohamed Seliem, Utz Roedig, Cormac Sreenan, Dirk Pesch | 2026-04-06 | Download | The integration of 5G with IEEE 802.1 Time-Sensitive Networking (TSN) is essential for enabling flexible and mobile deterministic communication in industrial automation. |
| Comprehensive Analysis of Cellular Uplink Performance in a Dense Stadium Deployment | S. M. Haider Ali Shuvo, Hardani Ismu Nabil, Joshua Roy Palathinkal, Muhammad I. Rochman, Monisha Ghosh | 2026-04-06 | Download | Uplink performance remains a critical limitation in modern 5G networks, where UEs have to balance limited transmission power against propagation challenges. |
| OrbitTransit: Traffic Delivery and Diffusion for Earth Observation via Satellite Mobility | Haoyuan Zhao, Long Chen, Yi Ching Chou, Hao Fang, Jiangchuan Liu | 2026-04-06 | Download | The emerging demand for Earth observation (EO) to address environmental challenges has driven unprecedented growth in its primary carrier, Low Earth Orbit satellites, in recent years. |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU | Zhengqing Yuan, Hanchi Sun, Lichao Sun, Yanfang Ye | 2026-04-06 | Download | We present MegaTrain, a memory-centric system that efficiently trains 100B+ parameter large language models at full precision on a single GPU. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels | Yifan Zhu, Yekai Pan, Yanghui Wu, Chen Ding | 2026-04-06 | Download | Data movement is the primary bottleneck in modern computing systems. For loop-based programs common in high-performance computing (HPC) and AI workloads, including matrix multiplication, tensor contra... |
| Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices | Alexis Burgon, Berkman Sahiner, Nicholas A Petrick, Gene Pennello, Ravi K Samala | 2026-04-06 | Download | This work addresses challenges in evaluating adaptive artificial intelligence (AI) models for medical devices, where iterative updates to both models and evaluation datasets complicate performance ass... |
| The Energy Cost of Execution-Idle in GPU Clusters | Yiran Lei, Jared Fernandez, Vasilis Kypriotis, Dimitrios Skarlatos, Emma Strubell, Justine Sherry, Daniel Vosler | 2026-04-06 | Download | GPUs are becoming a major contributor to data center power, yet unlike CPUs, they can remain at high power even when visible activity is near zero. We call this state execution-idle. |
| An experimental evaluation of satellite constellation emulators | Victor Cionca, Ferenc Szabo, Stanimir Vasilev, Dylan Smyth | 2026-04-06 | Download | Satellite emulation software is essential for research due to the lack of access to physical testbeds. To be useful, emulators must generate observations that are well-aligned with real-world ones, an... |
| Modeling and Analysis of Air-to-Ground Cellular KPIs in a 5G Testbed using Android Smartphones | Simran Singh, Anıl Gürses, Özgür Özdemir, Ram Asokan, Mihail L. Sichitiu, İsmail Güvenç, Rudra Dutta, Magreth Mushi | 2026-04-06 | Download | The integration of cellular communication with Unmanned Aerial Vehicles (UAVs) extends the range of command and control and payload communications of autonomous UAV applications. |
| Training Transformers in Cosine Coefficient Space | Mohamed Amine Bergach | 2026-04-06 | Download | We parameterize the weight matrices of a transformer in the two-dimensional discrete cosine transform (DCT) domain, retaining only the lowest-frequency coefficients. |
| REAM: Merging Improves Pruning of Experts in LLMs | Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev | 2026-04-06 | Download | Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges ... |
