
2026-04-02

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Fast NF4 Dequantization Kernels for Large Language Model Inference | Xiangbo Qi, Chaoyi Jiang, Murali Annavaram | 2026-04-02 | Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment. |
| Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods | Amel Fatima, Tuan Ta, Bradford M. Beckmann | 2026-04-02 | Distributed ML workloads rely heavily on collective communication across multi-GPU, multi-node systems. Emerging scale-up fabrics, such as NVLink and UALink, enable direct memory access across nodes b... |
| InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard | Ray Zeyao Chen, Christan Grant | 2026-04-02 | Modern machine learning systems deployed in safety-critical domains require visibility not only into aggregate performance but also into how training dynamics affect subgroup fairness over time. |
| Taming the Exponential: A Fast Softmax Surrogate for Integer-Native Edge Inference | Dimitrios Danopoulos, Enrico Lupi, Michael Kagan, Maurizio Pierini | 2026-04-02 | Softmax can become a computational bottleneck in the Transformer model's Multi-Head Attention (MHA) block, particularly in small models under low-precision inference, where exponentiation and normaliz... |
| TensorPool: A 3D-Stacked 8.4TFLOPS/4.3W Many-Core Domain-Specific Processor for AI-Native Radio Access Networks | Marco Bertuletti, Yichao Zhang, Diyou Shen, Alessandro Vanelli-Coralli, Frank K. Gürkaynak, Luca Benini | 2026-04-02 | The upcoming integration of AI in the physical layer (PHY) of 6G radio access networks (RAN) will enable a higher quality of service in challenging transmission scenarios. |
| GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending | Haomin Li, Bowen Zhu, Fangxin Liu, Zongwu Wang, Xinran Liang, Li Jiang, Haibing Guan | 2026-04-02 | Neural Radiance Fields (NeRF) enables 3D scene reconstruction from several 2D images but incurs high rendering latency via its point-sampling design. |
| FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Large Attention-Based Model Inference on Tile-Based Accelerators | Chi Zhang, Luca Colagrande, Renzo Andri, Luca Benini | 2026-04-02 | Attention accounts for an increasingly dominant fraction of total computation during inference for mixture-of-experts (MoE) models, making efficient acceleration critical. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods | Amel Fatima, Tuan Ta, Bradford M. Beckmann | 2026-04-02 | Distributed ML workloads rely heavily on collective communication across multi-GPU, multi-node systems. Emerging scale-up fabrics, such as NVLink and UALink, enable direct memory access across nodes b... |
| What can be computed in average anonymous networks? | Joel Rybicki, Oleg Verbitsky, Maksim Zhukovskii | 2026-04-02 | We study what deterministic distributed algorithms can compute on random input graphs in extremely weak models of distributed computing: all nodes are anonymous, and in each communication round, nodes... |
| A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems | Beste Oztop, Dhruva Kulkarni, Zhengji Zhao, Ayse Kivilcim Coskun, Kadidia Konate | 2026-04-02 | Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory util... |
| Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization | Heet Nagoriya, Komal Rohit | 2026-04-02 | Cloud computing allows scalable resource provisioning, but dynamic workload changes often lead to higher costs due to over-provisioning. Machine learning (ML) approaches, such as Long Short-Term Memor... |
| Optimization Opportunities for Cloud-Based Data Pipeline Infrastructures | Johannes Jablonski, Georg-Daniel Schwarz, Philip Heltweg, Dirk Riehle | 2026-04-02 | Cloud infrastructure supports the efficient operation of data pipelines regarding requirements like cost, speed, and resource utilization. We present an integrated view of optimization opportunities f... |
| GPU-RMQ: Accelerating Range Minimum Queries on Modern GPUs | Lara Kreis, Justus Henneberg, Valentin Henkys, Felix Schuhknecht, Bertil Schmidt | 2026-04-02 | Range minimum queries are frequently used in string processing and database applications including biological sequence analysis, document retrieval, and web search. |
| FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models | Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang | 2026-04-02 | Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. |
| DWDP: Distributed Weight Data Parallelism for High-Performance LLM Inference on NVL72 | Wanqian Li, Jintao Peng, Zongfei Jing, Tianyu Zhang, Ze Long, Xianjie Qiao, Xiaoming Chen, Dongxu Yang, Kefeng Duan, June Yang | 2026-04-02 | Large language model (LLM) inference increasingly depends on multi-GPU execution, yet existing inference parallelization strategies require layer-wise inter-rank synchronization, making end-to-end per... |
| ModTrans: Translating Real-world Models for Distributed Training Simulator | Yi Lyu | 2026-04-02 | Large-scale distributed training has been a research hot spot in machine learning systems for industry and academia in recent years. However, conducting experiments without physical machines and corre... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| RL-Loop: Reinforcement Learning-Driven Real-Time 5G Slice Control for Connected and Autonomous Mobility Services | Lara Tarkh, Ali Chouman, Hanan Lutfiyya, Abdallah Shami | 2026-04-02 | Smart and connected mobility systems rely on 5G edge infrastructure to support real-time communication, control, and service differentiation. Achieving this requires adaptive resource management mecha... |
| CIVIC: Cooperative Immersion Via Intelligent Credit-sharing in DRL-Powered Metaverse | Amr Aboeleneen, Mohamed Abdallah, Aiman Erbad, Amr Salem | 2026-04-02 | The Metaverse faces complex resource allocation challenges due to diverse Virtual Environments (VEs), Digital Twins (DTs), dynamic user demands, and strict immersion needs. |
| Real-Time and Scalable Zak-OTFS Receiver Processing on GPUs | Junyao Zheng, Chung-Hsuan Tung, Yuncheng Yao, Nishant Mehrotra, Sandesh Mattu, Zhenzhou Qi, Danyang Zhuo, Robert Calderbank, Tingjun Chen | 2026-04-02 | Orthogonal time frequency space (OTFS) modulation offers superior robustness to high-mobility channels compared to conventional orthogonal frequency-division multiplexing (OFDM) waveforms. |
| Computing the Exact Pareto Front in Average-Cost Multi-Objective Markov Decision Processes | Jiping Luo, Nikolaos Pappas | 2026-04-02 | Many communication and control problems are cast as multi-objective Markov decision processes (MOMDPs). The complete solution to an MOMDP is the Pareto front. |
| Q2NS Demo: A Quantum Network Simulator Based on ns-3 | Francesco Mazza, Adam Pearson, Marcello Caleffi, Angela Sara Cacciapuoti | 2026-04-02 | Q2NS is an open-source quantum network simulator built on ns-3, the de facto standard for classical network simulation. By inheriting ns-3's mature classical stack and event-driven execution model, Q2... |
| Physics-Informed Transformer for Multi-Band Channel Frequency Response Reconstruction | Anatolij Zubow, Joana Angjo, Sigrid Dimce, Falko Dressler | 2026-04-02 | Wideband channel frequency response (CFR) estimation is challenging in multi-band wireless systems, especially when one or more sub-bands are temporarily blocked by co-channel interference. |
| Quantum Networking Fundamentals: From Physical Protocols to Network Engineering | Athanasios Gkelias, Felix T. A. Burt, Kin K. Leung | 2026-04-02 | The realization of the Quantum Internet promises transformative capabilities in secure communication, distributed quantum computing, and high-precision metrology. |
| Air-to-Air Channel Characterization for UAV Communications at 3.4 GHz | Anıl Gürses, John Kesler, Mihail L. Sichitiu | 2026-04-02 | Uncrewed Aerial Vehicle (UAV) networks require accurate Air-to-Air (A2A) channel models, but most existing work focuses on Air-to-Ground links and leaves the sub-6 GHz A2A channel poorly characterized... |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| WIO: Upload-Enabled Computational Storage on CXL SSDs | Yiwei Yang, Yanpeng Hu, Yusheng Zheng, Estabon Ramos, Jianchang Su, Andi Quinn, Wei Zhang | 2026-04-02 | The widening gap between processor speed and storage latency has made data movement a dominant bottleneck in modern systems. Two lines of storage-layer innovation attempted to close this gap: persiste... |
| HACache: Leveraging Read Performance with Cache in a Heterogeneous Array | Jialin Liu, Liang Shi, Dingcui Yu | 2026-04-02 | In cost-sensitive deployments, RAID arrays may combine SSDs with different performance levels. Such heterogeneity arises when aging SSDs degrade yet remain usable, or when failed drives are replaced w... |
| DAXFS: A Lock-Free Shared Filesystem for CXL Disaggregated Memory | Cong Wang, Yiwei Yang, Yusheng Zheng | 2026-04-02 | CXL (Compute Express Link) enables multiple hosts to share byte-addressable memory with hardware cache coherence, but no existing filesystem exploits this for lock-free multi-host coordination. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Fast NF4 Dequantization Kernels for Large Language Model Inference | Xiangbo Qi, Chaoyi Jiang, Murali Annavaram | 2026-04-02 | Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment. |
| A Practical Two-Stage Framework for GPU Resource and Power Prediction in Heterogeneous HPC Systems | Beste Oztop, Dhruva Kulkarni, Zhengji Zhao, Ayse Kivilcim Coskun, Kadidia Konate | 2026-04-02 | Efficient utilization of GPU resources and power has become critical with the growing demand for GPUs in high-performance computing (HPC). In this paper, we analyze GPU utilization and GPU memory util... |
| Intelligent Cloud Orchestration: A Hybrid Predictive and Heuristic Framework for Cost Optimization | Heet Nagoriya, Komal Rohit | 2026-04-02 | Cloud computing allows scalable resource provisioning, but dynamic workload changes often lead to higher costs due to over-provisioning. Machine learning (ML) approaches, such as Long Short-Term Memor... |
