2025-05-16
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training | Jintao Zhang, Jia Wei, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Haoxu Wang, Kai Jiang, Jianfei Chen, Jun Zhu | 2025-05-16 | Download | The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cores in Blac... |
| ForgetMeNot: Understanding and Modeling the Impact of Forever Chemicals Toward Sustainable Large-Scale Computing | Rohan Basu Roy, Raghavendra Kanakagiri, Yankai Jiang, Devesh Tiwari | 2025-05-16 | Download | Fluorinated compounds, often referred to as forever chemicals, are critical in various steps of semiconductor fabrication like lithography, etching, chamber cleaning, and others. |
| Assessing the Performance of Analog Training for Transfer Learning | Omobayode Fagbohungbe, Corey Lammie, Malte J. Rasch, Takashi Ando, Tayfun Gokmen, Vijay Narayanan | 2025-05-16 | Download | Analog in-memory computing is a next-generation computing paradigm that promises fast, parallel, and energy-efficient deep learning training and transfer learning (TL). |
| Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks | Chiyue Wei, Bowen Duan, Cong Guo, Jingyang Zhang, Qingyue Song, Hai "Helen" Li, Yiran Chen | 2025-05-16 | Download | Spiking Neural Networks (SNNs) are gaining attention for their energy efficiency and biological plausibility, utilizing 0-1 activation sparsity through spike-driven computation. |
| Cell Library Characterization for Composite Current Source Models Based on Gaussian Process Regression and Active Learning | Tao Bai, Junzhuo Zhou, Zeyuan Deng, Ting-Jung Lin, Wei Xing, Peng Cao, Lei He | 2025-05-16 | Download | The composite current source (CCS) model has been adopted as an advanced timing model that represents the current behavior of cells for improved accuracy and better capability than traditional non-lin... |
| EdgeMM: Multi-Core CPU with Heterogeneous AI-Extension and Activation-aware Weight Pruning for Multimodal LLMs at Edge | Kangbo Bai, Le Ye, Ru Huang, Tianyu Jia | 2025-05-16 | Download | Emerging multimodal LLMs (MLLMs) exhibit strong cross-modality perception and reasoning capabilities and hold great potential for various applications at edge. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Topology-Aware Knowledge Propagation in Decentralized Learning | Mansi Sakarvadia, Nathaniel Hudson, Tian Li, Ian Foster, Kyle Chard | 2025-05-16 | Download | Decentralized learning enables collaborative training of models across naturally distributed data without centralized coordination or maintenance of a global model. |
| Cloud-Based AI Systems: Leveraging Large Language Models for Intelligent Fault Detection and Autonomous Self-Healing | Cheng Ji, Huaiying Luo | 2025-05-16 | Download | With the rapid development of cloud computing systems and the increasing complexity of their infrastructure, intelligent mechanisms to detect and mitigate failures in real time are becoming increasing... |
| FAIR Ecosystems for Science at Scale | Sean R. Wilkinson, Patrick Widener | 2025-05-16 | Download | High Performance Computing (HPC) centers provide resources to users who require greater scale to "get science done". They deploy infrastructure with singular hardware architectures, cutting-edge softw... |
| SpecMemo: Speculative Decoding is in Your Pocket | Selin Yildirim, Deming Chen | 2025-05-16 | Download | Recent advancements in speculative decoding have demonstrated considerable speedup across a wide array of large language model (LLM) tasks. Speculative decoding inherently relies on sacrificing extra ... |
| Bridging Global Frameworks: Governance Strategies Behind Cisco Common Control Framework v4.0 for Scalable Cloud Compliance | Nishant Sonkar | 2025-05-16 | Download | CCF v4.0 provides a standard way to ensure that Cisco's cloud products comply with the many quickly evolving requirements worldwide. To cope with increasing demands brought by ISO 27001, SOC 2, NIST, ... |
| MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, Xi Wang, Cong Xie, Qi Huang, Wen Heng, Yiyuan Ma, Wenlei Bao, Size Zheng, Yanghua Peng, Haibin Lin, Xuanzhe Liu, Xin Jin, Xin Liu | 2025-05-16 | Download | We present MegaScale-MoE, a production system tailored for the efficient training of large-scale mixture-of-experts (MoE) models. MoE emerges as a promising architecture to scale large language models... |
| Computing in a Faulty Congested Clique | Keren Censor-Hillel, Pedro Soto | 2025-05-16 | Download | We study a Faulty Congested Clique model, in which an adversary may fail nodes in the network throughout the computation. We show that any task of -bit input per node can be solved in rou... |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai | 2025-05-16 | Download | The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. |
| Adaptive and Robust Image Processing on CubeSats | Robert Bayer, Julian Priest, Daniel Kjellberg, Jeppe Lindhard, Nikolaj Sørenesen, Nicolaj Valsted, Ívar Óli, Pınar Tözün | 2025-05-16 | Download | CubeSats offer a low-cost platform for space research, particularly for Earth observation. However, their resource-constrained nature and being in space, challenge the flexibility and complexity of th... |
| Palladium: A DPU-enabled Multi-Tenant Serverless Cloud over Zero-copy Multi-node RDMA Fabrics | Shixiong Qi, Songyu Zhang, K. K. Ramakrishnan, Diman Z. Tootaghaj, Hardik Soni, Puneet Sharma | 2025-05-16 | Download | Serverless computing promises enhanced resource efficiency and lower user costs, yet is burdened by a heavyweight, CPU-bound data plane. Prior efforts exploiting shared memory reduce overhead locally ... |
| TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference | Raja Gond, Nipun Kwatra, Ramachandran Ramjee | 2025-05-16 | Download | Distributed inference of large language models (LLMs) can introduce overheads of up to 20% even over GPUs connected via high-speed interconnects such as NVLink. |
| SCAREY: Location-Aware Service Lifecycle Management | Kurt Horvath, Dragi Kimovski, Radu Prodan | 2025-05-16 | Download | Scheduling services within the computing continuum is complex due to the dynamic interplay of the Edge, Fog, and Cloud resources, each offering distinct computational and networking advantages. |
| A Review of Tools and Techniques for Optimization of Workload Mapping and Scheduling in Heterogeneous HPC System | Aasish Kumar Sharma, Julian Kunkel | 2025-05-16 | Download | This paper presents a systematic review of mapping and scheduling strategies within the High-Performance Computing (HPC) compute continuum, with a particular emphasis on heterogeneous systems. |
| ForgetMeNot: Understanding and Modeling the Impact of Forever Chemicals Toward Sustainable Large-Scale Computing | Rohan Basu Roy, Raghavendra Kanakagiri, Yankai Jiang, Devesh Tiwari | 2025-05-16 | Download | Fluorinated compounds, often referred to as forever chemicals, are critical in various steps of semiconductor fabrication like lithography, etching, chamber cleaning, and others. |
| Assessing the Performance of Analog Training for Transfer Learning | Omobayode Fagbohungbe, Corey Lammie, Malte J. Rasch, Takashi Ando, Tayfun Gokmen, Vijay Narayanan | 2025-05-16 | Download | Analog in-memory computing is a next-generation computing paradigm that promises fast, parallel, and energy-efficient deep learning training and transfer learning (TL). |
| Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism | Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang | 2025-05-16 | Download | Speculative decoding (SD) has emerged as a promising technique to accelerate LLM inference by employing a small draft model to propose draft tokens in advance, and validating them in parallel with the... |
| Enhancing Secrecy Energy Efficiency in RIS-Aided Aerial Mobile Edge Computing Networks: A Deep Reinforcement Learning Approach | Aly Sabri Abdalla, Vuk Marojevic | 2025-05-16 | Download | This paper studies the problem of securing task offloading transmissions from ground users against ground eavesdropping threats. Our study introduces a reconfigurable intelligent surface (RIS)-aided u... |
| RapidGNN: Communication Efficient Large-Scale Distributed Training of Graph Neural Networks | Arefin Niam, M S Q Zulkar Nine | 2025-05-16 | Download | Graph Neural Networks (GNNs) have achieved state-of-the-art (SOTA) performance in diverse domains. However, training GNNs on large-scale graphs poses significant challenges due to high memory demands ... |
| Random Client Selection on Contrastive Federated Learning for Tabular Data | Achmad Ginanjar, Xue Li, Priyanka Singh, Wen Hua | 2025-05-16 | Download | Vertical Federated Learning (VFL) has revolutionised collaborative machine learning by enabling privacy-preserving model training across multiple parties. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Transforming Decoder-Only Transformers for Accurate WiFi-Telemetry Based Indoor Localization | Nayan Sanjay Bhatia, Katia Obraczka | 2025-05-16 | Download | Wireless Fidelity (WiFi) based indoor positioning is a widely researched area for determining the position of devices within a wireless network. |
| Palladium: A DPU-enabled Multi-Tenant Serverless Cloud over Zero-copy Multi-node RDMA Fabrics | Shixiong Qi, Songyu Zhang, K. K. Ramakrishnan, Diman Z. Tootaghaj, Hardik Soni, Puneet Sharma | 2025-05-16 | Download | Serverless computing promises enhanced resource efficiency and lower user costs, yet is burdened by a heavyweight, CPU-bound data plane. Prior efforts exploiting shared memory reduce overhead locally ... |
| MM-INT: Telemetry in Programmable Switches with Multiple Queues using Source-based Multipath Routing | Mateus N. Bragatto, João Paulo M. Clevelares, Cristina K. Dominicini, Rodolfo S. Villaça, Fábio L. Verdi | 2025-05-16 | Download | This article emphasizes the importance of queues associated with the ports of switches in network monitoring. Traditionally, data collection about these queues is done using programmable data planes a... |
| ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks | Feiran You, Hongyang Du | 2025-05-16 | Download | Heterogeneous Networks (HetNets) pose critical challenges for intelligent management due to the diverse user requirements and time-varying wireless conditions. |
| mmMirror: Device Free mmWave Indoor NLoS Localization Using Van-Atta-Array IRS | Yihe Yan, Zhenguo Shi, Yanxiang Wang, Cheng Jiang, Chun Tung Chou, Wen Hu | 2025-05-16 | Download | Industry 4.0 is transforming manufacturing and logistics by integrating robots into shared human environments, such as factories, warehouses, and healthcare facilities. |
| Characterization of Using Hybrid Beamforming in mmWave Virtual Reality | Nasim Alikhani, Abbas Mohammadi | 2025-05-16 | Download | Wireless Virtual Reality (VR) is increasingly in demand in Wireless LANs (WLANs). In this paper, a utility function for resource management in wireless VR is proposed. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training | Jintao Zhang, Jia Wei, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Haoxu Wang, Kai Jiang, Jianfei Chen, Jun Zhu | 2025-05-16 | Download | The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cores in Blac... |
| msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML | Zhaolan Huang, Emmanuel Baccelli | 2025-05-16 | Download | AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget e.g. |
| SuperCoder: Assembly Program Superoptimization with Large Language Models | Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken | 2025-05-16 | Download | Superoptimization is the task of transforming a program into a faster one while preserving its input-output behavior. In this work, we investigate whether large language models (LLMs) can serve as sup... |