2026-04-02
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Fast NF4 Dequantization Kernels for Large Language Model Inference | Xiangbo Qi, Chaoyi Jiang, Murali Annavaram | 2026-04-02 | Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment. |
| Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods | Amel Fatima, Tuan Ta, Bradford M. Beckmann | 2026-04-02 | Distributed ML workloads rely heavily on collective communication across multi-GPU, multi-node systems. Emerging scale-up fabrics, such as NVLink and UALink, enable direct memory access across nodes b... |
| InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard | Ray Zeyao Chen, Christan Grant | 2026-04-02 | Modern machine learning systems deployed in safety-critical domains require visibility not only into aggregate performance but also into how training dynamics affect subgroup fairness over time. |
| Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference | Dimitrios Danopoulos, Enrico Lupi, Michael Kagan, Maurizio Pierini | 2026-04-02 | Softmax can become a computational bottleneck in the Transformer model's Multi-Head Attention (MHA) block, particularly in small models under low-precision inference, where exponentiation and normaliz... |
| TensorPool: A 3D-Stacked 8.4TFLOPS/4.3W Many-Core Domain-Specific Processor for AI-Native Radio Access Networks | Marco Bertuletti, Yichao Zhang, Diyou Shen, Alessandro Vanelli-Coralli, Frank K. Gürkaynak, Luca Benini | 2026-04-02 | The upcoming integration of AI in the physical layer (PHY) of 6G radio access networks (RAN) will enable a higher quality of service in challenging transmission scenarios. |
| GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending | Haomin Li, Bowen Zhu, Fangxin Liu, Zongwu Wang, Xinran Liang, Li Jiang, Haibing Guan | 2026-04-02 | Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency via its point-sampling design. |
| FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators | Chi Zhang, Luca Colagrande, Renzo Andri, Luca Benini | 2026-04-02 | Attention accounts for an increasingly dominant fraction of total computation during inference for mixture-of-experts (MoE) models, making efficient acceleration critical. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods | Amel Fatima, Tuan Ta, Bradford M. Beckmann | 2026-04-02 | Distributed ML workloads rely heavily on collective communication across multi-GPU, multi-node systems. Emerging scale-up fabrics, such as NVLink and UALink, enable direct memory access across nodes b... |
| What can be computed in average anonymous networks? | Joel Rybicki, Oleg Verbitsky, Maksim Zhukovskii | 2026-04-02 | We study what deterministic distributed algorithms can compute on random input graphs in extremely weak models of distributed computing: all nodes are anonymous, and in each communication round, nodes... |
| A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems | Beste Oztop, Dhruva Kulkarni, Zhengji Zhao, Ayse Kivilcim Coskun, Kadidia Konate | 2026-04-02 | Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory util... |
| Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization | Heet Nagoriya, Komal Rohit | 2026-04-02 | Cloud computing allows scalable resource provisioning, but dynamic workload changes often lead to higher costs due to over-provisioning. Machine learning (ML) approaches, such as Long Short-Term Memor... |
| Optimization Opportunities for Cloud-Based Data Pipeline Infrastructures | Johannes Jablonski, Georg-Daniel Schwarz, Philip Heltweg, Dirk Riehle | 2026-04-02 | Cloud infrastructure supports the efficient operation of data pipelines regarding requirements like cost, speed, and resource utilization. We present an integrated view of optimization opportunities f... |
| GPU-RMQ: Accelerating Range Minimum Queries on Modern GPUs | Lara Kreis, Justus Henneberg, Valentin Henkys, Felix Schuhknecht, Bertil Schmidt | 2026-04-02 | Range minimum queries are frequently used in string processing and database applications including biological sequence analysis, document retrieval, and web search. |
| FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models | Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang | 2026-04-02 | Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. |
| DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72 | Wanqian Li, Jintao Peng, Zongfei Jing, Tianyu Zhang, Ze Long, Xianjie Qiao, Xiaoming Chen, Dongxu Yang, Kefeng Duan, June Yang | 2026-04-02 | Large language model (LLM) inference increasingly depends on multi-GPU execution, yet existing inference parallelization strategies require layer-wise inter-rank synchronization, making end-to-end per... |
| ModTrans: Translating Real-world Models for Distributed Training Simulator | Yi Lyu | 2026-04-02 | Large-scale distributed training has been a research hot spot in machine learning systems for industry and academia in recent years. However, conducting experiments without physical machines and corre... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| RL-Loop: Reinforcement Learning-Driven Real-Time 5G Slice Control for Connected and Autonomous Mobility Services | Lara Tarkh, Ali Chouman, Hanan Lutfiyya, Abdallah Shami | 2026-04-02 | Smart and connected mobility systems rely on 5G edge infrastructure to support real-time communication, control, and service differentiation. Achieving this requires adaptive resource management mecha... |
| CIVIC: Cooperative Immersion Via Intelligent Credit-sharing in DRL-Powered Metaverse | Amr Aboeleneen, Mohamed Abdallah, Aiman Erbad, Amr Salem | 2026-04-02 | The Metaverse faces complex resource allocation challenges due to diverse Virtual Environments (VEs), Digital Twins (DTs), dynamic user demands, and strict immersion needs. |
| Real-Time and Scalable Zak-OTFS Receiver Processing on GPUs | Junyao Zheng, Chung-Hsuan Tung, Yuncheng Yao, Nishant Mehrotra, Sandesh Mattu, Zhenzhou Qi, Danyang Zhuo, Robert Calderbank, Tingjun Chen | 2026-04-02 | Orthogonal time frequency space (OTFS) modulation offers superior robustness to high-mobility channels compared to conventional orthogonal frequency-division multiplexing (OFDM) waveforms. |
| Computing the Exact Pareto Front in Average-Cost Multi-Objective Markov Decision Processes | Jiping Luo, Nikolaos Pappas | 2026-04-02 | Many communication and control problems are cast as multi-objective Markov decision processes (MOMDPs). The complete solution to an MOMDP is the Pareto front. |
| Q2NS Demo: A Quantum Network Simulator Based on ns-3 | Francesco Mazza, Adam Pearson, Marcello Caleffi, Angela Sara Cacciapuoti | 2026-04-02 | Q2NS is an open-source quantum network simulator built on ns-3, the de facto standard for classical network simulation. By inheriting ns-3's mature classical stack and event-driven execution model, Q2... |
| Physics-Informed Transformer for Multi-Band Channel Frequency Response Reconstruction | Anatolij Zubow, Joana Angjo, Sigrid Dimce, Falko Dressler | 2026-04-02 | Wideband channel frequency response (CFR) estimation is challenging in multi-band wireless systems, especially when one or more sub-bands are temporarily blocked by co-channel interference. |
| Quantum Networking Fundamentals: From Physical Protocols to Network Engineering | Athanasios Gkelias, Felix T. A. Burt, Kin K. Leung | 2026-04-02 | The realization of the Quantum Internet promises transformative capabilities in secure communication, distributed quantum computing, and high-precision metrology. |
| Air-to-Air Channel Characterization for UAV Communications at 3.4 GHz | Anıl Gürses, John Kesler, Mihail L. Sichitiu | 2026-04-02 | Uncrewed Aerial Vehicle (UAV) networks require accurate Air-to-Air (A2A) channel models, but most existing work focuses on Air-to-Ground links and leaves the sub-6 GHz A2A channel poorly characterized... |
cs.OS - Operating Systems
| Title | Authors | Published | Abstract |
|---|---|---|---|
| WIO: Upload-Enabled Computational Storage on CXL SSDs | Yiwei Yang, Yanpeng Hu, Yusheng Zheng, Estabon Ramos, Jianchang Su, Andi Quinn, Wei Zhang | 2026-04-02 | The widening gap between processor speed and storage latency has made data movement a dominant bottleneck in modern systems. Two lines of storage-layer innovation attempted to close this gap: persiste... |
| HACache: Leveraging Read Performance with Cache in a Heterogeneous Array | Jialin Liu, Liang Shi, Dingcui Yu | 2026-04-02 | In cost-sensitive deployments, RAID arrays may combine SSDs with different performance levels. Such heterogeneity arises when aging SSDs degrade yet remain usable, or when failed drives are replaced w... |
| DAXFS: A Lock-Free Shared Filesystem for CXL Disaggregated Memory | Cong Wang, Yiwei Yang, Yusheng Zheng | 2026-04-02 | CXL (Compute Express Link) enables multiple hosts to share byte-addressable memory with hardware cache coherence, but no existing filesystem exploits this for lock-free multi-host coordination. |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Fast NF4 Dequantization Kernels for Large Language Model Inference | Xiangbo Qi, Chaoyi Jiang, Murali Annavaram | 2026-04-02 | Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment. |
| A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems | Beste Oztop, Dhruva Kulkarni, Zhengji Zhao, Ayse Kivilcim Coskun, Kadidia Konate | 2026-04-02 | Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory util... |
| Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization | Heet Nagoriya, Komal Rohit | 2026-04-02 | Cloud computing allows scalable resource provisioning, but dynamic workload changes often lead to higher costs due to over-provisioning. Machine learning (ML) approaches, such as Long Short-Term Memor... |