2026-04-02

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Fast NF4 Dequantization Kernels for Large Language Model Inference	Xiangbo Qi, Chaoyi Jiang, Murali Annavaram	2026-04-02	Download	Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment.
Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods	Amel Fatima, Tuan Ta, Bradford M. Beckmann	2026-04-02	Download	Distributed ML workloads rely heavily on collective communication across multi-GPU, multi-node systems. Emerging scale-up fabrics, such as NVLink and UALink, enable direct memory access across nodes b...
InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard	Ray Zeyao Chen, Christan Grant	2026-04-02	Download	Modern machine learning systems deployed in safety-critical domains require visibility not only into aggregate performance but also into how training dynamics affect subgroup fairness over time.
Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference	Dimitrios Danopoulos, Enrico Lupi, Michael Kagan, Maurizio Pierini	2026-04-02	Download	Softmax can become a computational bottleneck in the Transformer model's Multi-Head Attention (MHA) block, particularly in small models under low-precision inference, where exponentiation and normaliz...
TensorPool: A 3D-Stacked 8.4TFLOPS/4.3W Many-Core Domain-Specific Processor for AI-Native Radio Access Networks	Marco Bertuletti, Yichao Zhang, Diyou Shen, Alessandro Vanelli-Coralli, Frank K. Gürkaynak, Luca Benini	2026-04-02	Download	The upcoming integration of AI in the physical layer (PHY) of 6G radio access networks (RAN) will enable a higher quality of service in challenging transmission scenarios.
GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending	Haomin Li, Bowen Zhu, Fangxin Liu, Zongwu Wang, Xinran Liang, Li Jiang, Haibing Guan	2026-04-02	Download	Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency via its point-sampling design.
FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators	Chi Zhang, Luca Colagrande, Renzo Andri, Luca Benini	2026-04-02	Download	Attention accounts for an increasingly dominant fraction of total computation during inference for mixture-of-experts (MoE) models, making efficient acceleration critical.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods	Amel Fatima, Tuan Ta, Bradford M. Beckmann	2026-04-02	Download	Distributed ML workloads rely heavily on collective communication across multi-GPU, multi-node systems. Emerging scale-up fabrics, such as NVLink and UALink, enable direct memory access across nodes b...
What can be computed in average anonymous networks?	Joel Rybicki, Oleg Verbitsky, Maksim Zhukovskii	2026-04-02	Download	We study what deterministic distributed algorithms can compute on random input graphs in extremely weak models of distributed computing: all nodes are anonymous, and in each communication round, nodes...
A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems	Beste Oztop, Dhruva Kulkarni, Zhengji Zhao, Ayse Kivilcim Coskun, Kadidia Konate	2026-04-02	Download	Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory util...
Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization	Heet Nagoriya, Komal Rohit	2026-04-02	Download	Cloud computing allows scalable resource provisioning, but dynamic workload changes often lead to higher costs due to over-provisioning. Machine learning (ML) approaches, such as Long Short-Term Memor...
Optimization Opportunities for Cloud-Based Data Pipeline Infrastructures	Johannes Jablonski, Georg-Daniel Schwarz, Philip Heltweg, Dirk Riehle	2026-04-02	Download	Cloud infrastructure supports the efficient operation of data pipelines regarding requirements like cost, speed, and resource utilization. We present an integrated view of optimization opportunities f...
GPU-RMQ: Accelerating Range Minimum Queries on Modern GPUs	Lara Kreis, Justus Henneberg, Valentin Henkys, Felix Schuhknecht, Bertil Schmidt	2026-04-02	Download	Range minimum queries are frequently used in string processing and database applications including biological sequence analysis, document retrieval, and web search.
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models	Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang	2026-04-02	Download	Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets.
DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72	Wanqian Li, Jintao Peng, Zongfei Jing, Tianyu Zhang, Ze Long, Xianjie Qiao, Xiaoming Chen, Dongxu Yang, Kefeng Duan, June Yang	2026-04-02	Download	Large language model (LLM) inference increasingly depends on multi-GPU execution, yet existing inference parallelization strategies require layer-wise inter-rank synchronization, making end-to-end per...
ModTrans: Translating Real-world Models for Distributed Training Simulator	Yi Lyu	2026-04-02	Download	Large-scale distributed training has been a research hot spot in machine learning systems for industry and academia in recent years. However, conducting experiments without physical machines and corre...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
RL-Loop: Reinforcement Learning-Driven Real-Time 5G Slice Control for Connected and Autonomous Mobility Services	Lara Tarkh, Ali Chouman, Hanan Lutfiyya, Abdallah Shami	2026-04-02	Download	Smart and connected mobility systems rely on 5G edge infrastructure to support real-time communication, control, and service differentiation. Achieving this requires adaptive resource management mecha...
CIVIC: Cooperative Immersion Via Intelligent Credit-sharing in DRL-Powered Metaverse	Amr Aboeleneen, Mohamed Abdallah, Aiman Erbad, Amr Salem	2026-04-02	Download	The Metaverse faces complex resource allocation challenges due to diverse Virtual Environments (VEs), Digital Twins (DTs), dynamic user demands, and strict immersion needs.
Real-Time and Scalable Zak-OTFS Receiver Processing on GPUs	Junyao Zheng, Chung-Hsuan Tung, Yuncheng Yao, Nishant Mehrotra, Sandesh Mattu, Zhenzhou Qi, Danyang Zhuo, Robert Calderbank, Tingjun Chen	2026-04-02	Download	Orthogonal time frequency space (OTFS) modulation offers superior robustness to high-mobility channels compared to conventional orthogonal frequency-division multiplexing (OFDM) waveforms.
Computing the Exact Pareto Front in Average-Cost Multi-Objective Markov Decision Processes	Jiping Luo, Nikolaos Pappas	2026-04-02	Download	Many communication and control problems are cast as multi-objective Markov decision processes (MOMDPs). The complete solution to an MOMDP is the Pareto front.
Q2NS Demo: A Quantum Network Simulator Based on ns-3	Francesco Mazza, Adam Pearson, Marcello Caleffi, Angela Sara Cacciapuoti	2026-04-02	Download	Q2NS is an open-source quantum network simulator built on ns-3, the de facto standard for classical network simulation. By inheriting ns-3's mature classical stack and event-driven execution model, Q2...
Physics-Informed Transformer for Multi-Band Channel Frequency Response Reconstruction	Anatolij Zubow, Joana Angjo, Sigrid Dimce, Falko Dressler	2026-04-02	Download	Wideband channel frequency response (CFR) estimation is challenging in multi-band wireless systems, especially when one or more sub-bands are temporarily blocked by co-channel interference.
Quantum Networking Fundamentals: From Physical Protocols to Network Engineering	Athanasios Gkelias, Felix T. A. Burt, Kin K. Leung	2026-04-02	Download	The realization of the Quantum Internet promises transformative capabilities in secure communication, distributed quantum computing, and high-precision metrology.
Air-to-Air Channel Characterization for UAV Communications at 3.4 GHz	Anıl Gürses, John Kesler, Mihail L. Sichitiu	2026-04-02	Download	Uncrewed Aerial Vehicle (UAV) networks require accurate Air-to-Air (A2A) channel models, but most existing work focuses on Air-to-Ground links and leaves the sub-6 GHz A2A channel poorly characterized...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
WIO: Upload-Enabled Computational Storage on CXL SSDs	Yiwei Yang, Yanpeng Hu, Yusheng Zheng, Estabon Ramos, Jianchang Su, Andi Quinn, Wei Zhang	2026-04-02	Download	The widening gap between processor speed and storage latency has made data movement a dominant bottleneck in modern systems. Two lines of storage-layer innovation attempted to close this gap: persiste...
HACache: Leveraging Read Performance with Cache in a Heterogeneous Array	Jialin Liu, Liang Shi, Dingcui Yu	2026-04-02	Download	In cost-sensitive deployments, RAID arrays may combine SSDs with different performance levels. Such heterogeneity arises when aging SSDs degrade yet remain usable, or when failed drives are replaced w...
DAXFS: A Lock-Free Shared Filesystem for CXL Disaggregated Memory	Cong Wang, Yiwei Yang, Yusheng Zheng	2026-04-02	Download	CXL (Compute Express Link) enables multiple hosts to share byte-addressable memory with hardware cache coherence, but no existing filesystem exploits this for lock-free multi-host coordination.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Fast NF4 Dequantization Kernels for Large Language Model Inference	Xiangbo Qi, Chaoyi Jiang, Murali Annavaram	2026-04-02	Download	Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment.
A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems	Beste Oztop, Dhruva Kulkarni, Zhengji Zhao, Ayse Kivilcim Coskun, Kadidia Konate	2026-04-02	Download	Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory util...
Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization	Heet Nagoriya, Komal Rohit	2026-04-02	Download	Cloud computing allows scalable resource provisioning, but dynamic workload changes often lead to higher costs due to over-provisioning. Machine learning (ML) approaches, such as Long Short-Term Memor...