2025-05-16

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training	Jintao Zhang, Jia Wei, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Haoxu Wang, Kai Jiang, Jianfei Chen, Jun Zhu	2025-05-16	Download	The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cores in Blac...
ForgetMeNot: Understanding and Modeling the Impact of Forever Chemicals Toward Sustainable Large-Scale Computing	Rohan Basu Roy, Raghavendra Kanakagiri, Yankai Jiang, Devesh Tiwari	2025-05-16	Download	Fluorinated compounds, often referred to as forever chemicals, are critical in various steps of semiconductor fabrication like lithography, etching, chamber cleaning, and others.
Assessing the Performance of Analog Training for Transfer Learning	Omobayode Fagbohungbe, Corey Lammie, Malte J. Rasch, Takashi Ando, Tayfun Gokmen, Vijay Narayanan	2025-05-16	Download	Analog in-memory computing is a next-generation computing paradigm that promises fast, parallel, and energy-efficient deep learning training and transfer learning (TL).
Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks	Chiyue Wei, Bowen Duan, Cong Guo, Jingyang Zhang, Qingyue Song, Hai "Helen" Li, Yiran Chen	2025-05-16	Download	Spiking Neural Networks (SNNs) are gaining attention for their energy efficiency and biological plausibility, utilizing 0-1 activation sparsity through spike-driven computation.
Cell Library Characterization for Composite Current Source Models Based on Gaussian Process Regression and Active Learning	Tao Bai, Junzhuo Zhou, Zeyuan Deng, Ting-Jung Lin, Wei Xing, Peng Cao, Lei He	2025-05-16	Download	The composite current source (CCS) model has been adopted as an advanced timing model that represents the current behavior of cells for improved accuracy and better capability than traditional non-lin...
EdgeMM: Multi-Core CPU with Heterogeneous AI-Extension and Activation-aware Weight Pruning for Multimodal LLMs at Edge	Kangbo Bai, Le Ye, Ru Huang, Tianyu Jia	2025-05-16	Download	Emerging multimodal LLMs (MLLMs) exhibit strong cross-modality perception and reasoning capabilities and hold great potential for various applications at edge.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Topology-Aware Knowledge Propagation in Decentralized Learning	Mansi Sakarvadia, Nathaniel Hudson, Tian Li, Ian Foster, Kyle Chard	2025-05-16	Download	Decentralized learning enables collaborative training of models across naturally distributed data without centralized coordination or maintenance of a global model.
Cloud-Based AI Systems: Leveraging Large Language Models for Intelligent Fault Detection and Autonomous Self-Healing	Cheng Ji, Huaiying Luo	2025-05-16	Download	With the rapid development of cloud computing systems and the increasing complexity of their infrastructure, intelligent mechanisms to detect and mitigate failures in real time are becoming increasing...
FAIR Ecosystems for Science at Scale	Sean R. Wilkinson, Patrick Widener	2025-05-16	Download	High Performance Computing (HPC) centers provide resources to users who require greater scale to "get science done". They deploy infrastructure with singular hardware architectures, cutting-edge softw...
SpecMemo: Speculative Decoding is in Your Pocket	Selin Yildirim, Deming Chen	2025-05-16	Download	Recent advancements in speculative decoding have demonstrated considerable speedup across a wide array of large language model (LLM) tasks. Speculative decoding inherently relies on sacrificing extra ...
Bridging Global Frameworks: Governance Strategies Behind Cisco Common Control Framework v4.0 for Scalable Cloud Compliance	Nishant Sonkar	2025-05-16	Download	CCF v4.0 provides a standard way to ensure that Cisco's cloud products comply with the many quickly evolving requirements worldwide. To cope with increasing demands brought by ISO 27001, SOC 2, NIST, ...
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production	Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, Xi Wang, Cong Xie, Qi Huang, Wen Heng, Yiyuan Ma, Wenlei Bao, Size Zheng, Yanghua Peng, Haibin Lin, Xuanzhe Liu, Xin Jin, Xin Liu	2025-05-16	Download	We present MegaScale-MoE, a production system tailored for the efficient training of large-scale mixture-of-experts (MoE) models. MoE emerges as a promising architecture to scale large language models...
Computing in a Faulty Congested Clique	Keren Censor-Hillel, Pedro Soto	2025-05-16	Download	We study a Faulty Congested Clique model, in which an adversary may fail nodes in the network throughout the computation. We show that any task of $O(n\log{n})$-bit input per node can be solved in rou...
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems	Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai	2025-05-16	Download	The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources.
Adaptive and Robust Image Processing on CubeSats	Robert Bayer, Julian Priest, Daniel Kjellberg, Jeppe Lindhard, Nikolaj Sørenesen, Nicolaj Valsted, Ívar Óli, Pınar Tözün	2025-05-16	Download	CubeSats offer a low-cost platform for space research, particularly for Earth observation. However, their resource-constrained nature and being in space, challenge the flexibility and complexity of th...
Palladium: A DPU-enabled Multi-Tenant Serverless Cloud over Zero-copy Multi-node RDMA Fabrics	Shixiong Qi, Songyu Zhang, K. K. Ramakrishnan, Diman Z. Tootaghaj, Hardik Soni, Puneet Sharma	2025-05-16	Download	Serverless computing promises enhanced resource efficiency and lower user costs, yet is burdened by a heavyweight, CPU-bound data plane. Prior efforts exploiting shared memory reduce overhead locally ...
TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference	Raja Gond, Nipun Kwatra, Ramachandran Ramjee	2025-05-16	Download	Distributed inference of large language models (LLMs) can introduce overheads of up to 20% even over GPUs connected via high-speed interconnects such as NVLink.
SCAREY: Location-Aware Service Lifecycle Management	Kurt Horvath, Dragi Kimovski, Radu Prodan	2025-05-16	Download	Scheduling services within the computing continuum is complex due to the dynamic interplay of the Edge, Fog, and Cloud resources, each offering distinct computational and networking advantages.
A Review of Tools and Techniques for Optimization of Workload Mapping and Scheduling in Heterogeneous HPC System	Aasish Kumar Sharma, Julian Kunkel	2025-05-16	Download	This paper presents a systematic review of mapping and scheduling strategies within the High-Performance Computing (HPC) compute continuum, with a particular emphasis on heterogeneous systems.
ForgetMeNot: Understanding and Modeling the Impact of Forever Chemicals Toward Sustainable Large-Scale Computing	Rohan Basu Roy, Raghavendra Kanakagiri, Yankai Jiang, Devesh Tiwari	2025-05-16	Download	Fluorinated compounds, often referred to as forever chemicals, are critical in various steps of semiconductor fabrication like lithography, etching, chamber cleaning, and others.
Assessing the Performance of Analog Training for Transfer Learning	Omobayode Fagbohungbe, Corey Lammie, Malte J. Rasch, Takashi Ando, Tayfun Gokmen, Vijay Narayanan	2025-05-16	Download	Analog in-memory computing is a next-generation computing paradigm that promises fast, parallel, and energy-efficient deep learning training and transfer learning (TL).
Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism	Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang	2025-05-16	Download	Speculative decoding (SD) has emerged as a promising technique to accelerate LLM inference by employing a small draft model to propose draft tokens in advance, and validating them in parallel with the...
Enhancing Secrecy Energy Efficiency in RIS-Aided Aerial Mobile Edge Computing Networks: A Deep Reinforcement Learning Approach	Aly Sabri Abdalla, Vuk Marojevic	2025-05-16	Download	This paper studies the problem of securing task offloading transmissions from ground users against ground eavesdropping threats. Our study introduces a reconfigurable intelligent surface (RIS)-aided u...
RapidGNN: Communication Efficient Large-Scale Distributed Training of Graph Neural Networks	Arefin Niam, M S Q Zulkar Nine	2025-05-16	Download	Graph Neural Networks (GNNs) have achieved state-of-the-art (SOTA) performance in diverse domains. However, training GNNs on large-scale graphs poses significant challenges due to high memory demands ...
Random Client Selection on Contrastive Federated Learning for Tabular Data	Achmad Ginanjar, Xue Li, Priyanka Singh, Wen Hua	2025-05-16	Download	Vertical Federated Learning (VFL) has revolutionised collaborative machine learning by enabling privacy-preserving model training across multiple parties.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Transforming Decoder-Only Transformers for Accurate WiFi-Telemetry Based Indoor Localization	Nayan Sanjay Bhatia, Katia Obraczka	2025-05-16	Download	Wireless Fidelity (WiFi) based indoor positioning is a widely researched area for determining the position of devices within a wireless network.
Palladium: A DPU-enabled Multi-Tenant Serverless Cloud over Zero-copy Multi-node RDMA Fabrics	Shixiong Qi, Songyu Zhang, K. K. Ramakrishnan, Diman Z. Tootaghaj, Hardik Soni, Puneet Sharma	2025-05-16	Download	Serverless computing promises enhanced resource efficiency and lower user costs, yet is burdened by a heavyweight, CPU-bound data plane. Prior efforts exploiting shared memory reduce overhead locally ...
MM-INT: Telemetry in Programmable Switches with Multiple Queues using Source-based Multipath Routing	Mateus N. Bragatto, João Paulo M. Clevelares, Cristina K. Dominicini, Rodolfo S. Villaça, Fábio L. Verdi	2025-05-16	Download	This article emphasizes the importance of queues associated with the ports of switches in network monitoring. Traditionally, data collection about these queues is done using programmable data planes a...
ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks	Feiran You, Hongyang Du	2025-05-16	Download	Heterogeneous Networks (HetNets) pose critical challenges for intelligent management due to the diverse user requirements and time-varying wireless conditions.
mmMirror: Device Free mmWave Indoor NLoS Localization Using Van-Atta-Array IRS	Yihe Yan, Zhenguo Shi, Yanxiang Wang, Cheng Jiang, Chun Tung Chou, Wen Hu	2025-05-16	Download	Industry 4.0 is transforming manufacturing and logistics by integrating robots into shared human environments, such as factories, warehouses, and healthcare facilities.
Characterization of Using Hybrid Beamforming in mmWave Virtual Reality	Nasim Alikhani, Abbas Mohammadi	2025-05-16	Download	Wireless Virtual Reality (VR) is increasingly in demand in Wireless LANs (WLANs). In this paper, a utility function for resource management in wireless VR is proposed.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training	Jintao Zhang, Jia Wei, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Haoxu Wang, Kai Jiang, Jianfei Chen, Jun Zhu	2025-05-16	Download	The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cores in Blac...
msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML	Zhaolan Huang, Emmanuel Baccelli	2025-05-16	Download	AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget e.g.
SuperCoder: Assembly Program Superoptimization with Large Language Models	Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken	2025-05-16	Download	Superoptimization is the task of transforming a program into a faster one while preserving its input-output behavior. In this work, we investigate whether large language models (LLMs) can serve as sup...