2025-05-16

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training | Jintao Zhang, Jia Wei, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Haoxu Wang, Kai Jiang, Jianfei Chen, Jun Zhu | 2025-05-16 | Download | The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cores in Blac... |
| ForgetMeNot: Understanding and Modeling the Impact of Forever Chemicals Toward Sustainable Large-Scale Computing | Rohan Basu Roy, Raghavendra Kanakagiri, Yankai Jiang, Devesh Tiwari | 2025-05-16 | Download | Fluorinated compounds, often referred to as forever chemicals, are critical in various steps of semiconductor fabrication like lithography, etching, chamber cleaning, and others. |
| Assessing the Performance of Analog Training for Transfer Learning | Omobayode Fagbohungbe, Corey Lammie, Malte J. Rasch, Takashi Ando, Tayfun Gokmen, Vijay Narayanan | 2025-05-16 | Download | Analog in-memory computing is a next-generation computing paradigm that promises fast, parallel, and energy-efficient deep learning training and transfer learning (TL). |
| Phi: Leveraging Pattern-based Hierarchical Sparsity for High-Efficiency Spiking Neural Networks | Chiyue Wei, Bowen Duan, Cong Guo, Jingyang Zhang, Qingyue Song, Hai "Helen" Li, Yiran Chen | 2025-05-16 | Download | Spiking Neural Networks (SNNs) are gaining attention for their energy efficiency and biological plausibility, utilizing 0-1 activation sparsity through spike-driven computation. |
| Cell Library Characterization for Composite Current Source Models Based on Gaussian Process Regression and Active Learning | Tao Bai, Junzhuo Zhou, Zeyuan Deng, Ting-Jung Lin, Wei Xing, Peng Cao, Lei He | 2025-05-16 | Download | The composite current source (CCS) model has been adopted as an advanced timing model that represents the current behavior of cells for improved accuracy and better capability than traditional non-lin... |
| EdgeMM: Multi-Core CPU with Heterogeneous AI-Extension and Activation-aware Weight Pruning for Multimodal LLMs at Edge | Kangbo Bai, Le Ye, Ru Huang, Tianyu Jia | 2025-05-16 | Download | Emerging multimodal LLMs (MLLMs) exhibit strong cross-modality perception and reasoning capabilities and hold great potential for various applications at edge. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Topology-Aware Knowledge Propagation in Decentralized Learning | Mansi Sakarvadia, Nathaniel Hudson, Tian Li, Ian Foster, Kyle Chard | 2025-05-16 | Download | Decentralized learning enables collaborative training of models across naturally distributed data without centralized coordination or maintenance of a global model. |
| Cloud-Based AI Systems: Leveraging Large Language Models for Intelligent Fault Detection and Autonomous Self-Healing | Cheng Ji, Huaiying Luo | 2025-05-16 | Download | With the rapid development of cloud computing systems and the increasing complexity of their infrastructure, intelligent mechanisms to detect and mitigate failures in real time are becoming increasing... |
| FAIR Ecosystems for Science at Scale | Sean R. Wilkinson, Patrick Widener | 2025-05-16 | Download | High Performance Computing (HPC) centers provide resources to users who require greater scale to "get science done". They deploy infrastructure with singular hardware architectures, cutting-edge softw... |
| SpecMemo: Speculative Decoding is in Your Pocket | Selin Yildirim, Deming Chen | 2025-05-16 | Download | Recent advancements in speculative decoding have demonstrated considerable speedup across a wide array of large language model (LLM) tasks. Speculative decoding inherently relies on sacrificing extra ... |
| Bridging Global Frameworks: Governance Strategies Behind Cisco Common Control Framework v4.0 for Scalable Cloud Compliance | Nishant Sonkar | 2025-05-16 | Download | CCF v4.0 provides a standard way to ensure that Cisco's cloud products comply with the many quickly evolving requirements worldwide. To cope with increasing demands brought by ISO 27001, SOC 2, NIST, ... |
| MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, Xi Wang, Cong Xie, Qi Huang, Wen Heng, Yiyuan Ma, Wenlei Bao, Size Zheng, Yanghua Peng, Haibin Lin, Xuanzhe Liu, Xin Jin, Xin Liu | 2025-05-16 | Download | We present MegaScale-MoE, a production system tailored for the efficient training of large-scale mixture-of-experts (MoE) models. MoE emerges as a promising architecture to scale large language models... |
| Computing in a Faulty Congested Clique | Keren Censor-Hillel, Pedro Soto | 2025-05-16 | Download | We study a Faulty Congested Clique model, in which an adversary may fail nodes in the network throughout the computation. We show that any task of O(n log n)-bit input per node can be solved in rou... |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai | 2025-05-16 | Download | The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. |
| Adaptive and Robust Image Processing on CubeSats | Robert Bayer, Julian Priest, Daniel Kjellberg, Jeppe Lindhard, Nikolaj Sørenesen, Nicolaj Valsted, Ívar Óli, Pınar Tözün | 2025-05-16 | Download | CubeSats offer a low-cost platform for space research, particularly for Earth observation. However, their resource-constrained nature and being in space, challenge the flexibility and complexity of th... |
| Palladium: A DPU-enabled Multi-Tenant Serverless Cloud over Zero-copy Multi-node RDMA Fabrics | Shixiong Qi, Songyu Zhang, K. K. Ramakrishnan, Diman Z. Tootaghaj, Hardik Soni, Puneet Sharma | 2025-05-16 | Download | Serverless computing promises enhanced resource efficiency and lower user costs, yet is burdened by a heavyweight, CPU-bound data plane. Prior efforts exploiting shared memory reduce overhead locally ... |
| TokenWeave: Efficient Compute-Communication Overlap for Distributed LLM Inference | Raja Gond, Nipun Kwatra, Ramachandran Ramjee | 2025-05-16 | Download | Distributed inference of large language models (LLMs) can introduce overheads of up to 20% even over GPUs connected via high-speed interconnects such as NVLink. |
| SCAREY: Location-Aware Service Lifecycle Management | Kurt Horvath, Dragi Kimovski, Radu Prodan | 2025-05-16 | Download | Scheduling services within the computing continuum is complex due to the dynamic interplay of the Edge, Fog, and Cloud resources, each offering distinct computational and networking advantages. |
| A Review of Tools and Techniques for Optimization of Workload Mapping and Scheduling in Heterogeneous HPC System | Aasish Kumar Sharma, Julian Kunkel | 2025-05-16 | Download | This paper presents a systematic review of mapping and scheduling strategies within the High-Performance Computing (HPC) compute continuum, with a particular emphasis on heterogeneous systems. |
| ForgetMeNot: Understanding and Modeling the Impact of Forever Chemicals Toward Sustainable Large-Scale Computing | Rohan Basu Roy, Raghavendra Kanakagiri, Yankai Jiang, Devesh Tiwari | 2025-05-16 | Download | Fluorinated compounds, often referred to as forever chemicals, are critical in various steps of semiconductor fabrication like lithography, etching, chamber cleaning, and others. |
| Assessing the Performance of Analog Training for Transfer Learning | Omobayode Fagbohungbe, Corey Lammie, Malte J. Rasch, Takashi Ando, Tayfun Gokmen, Vijay Narayanan | 2025-05-16 | Download | Analog in-memory computing is a next-generation computing paradigm that promises fast, parallel, and energy-efficient deep learning training and transfer learning (TL). |
| Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism | Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang | 2025-05-16 | Download | Speculative decoding (SD) has emerged as a promising technique to accelerate LLM inference by employing a small draft model to propose draft tokens in advance, and validating them in parallel with the... |
| Enhancing Secrecy Energy Efficiency in RIS-Aided Aerial Mobile Edge Computing Networks: A Deep Reinforcement Learning Approach | Aly Sabri Abdalla, Vuk Marojevic | 2025-05-16 | Download | This paper studies the problem of securing task offloading transmissions from ground users against ground eavesdropping threats. Our study introduces a reconfigurable intelligent surface (RIS)-aided u... |
| RapidGNN: Communication Efficient Large-Scale Distributed Training of Graph Neural Networks | Arefin Niam, M S Q Zulkar Nine | 2025-05-16 | Download | Graph Neural Networks (GNNs) have achieved state-of-the-art (SOTA) performance in diverse domains. However, training GNNs on large-scale graphs poses significant challenges due to high memory demands ... |
| Random Client Selection on Contrastive Federated Learning for Tabular Data | Achmad Ginanjar, Xue Li, Priyanka Singh, Wen Hua | 2025-05-16 | Download | Vertical Federated Learning (VFL) has revolutionised collaborative machine learning by enabling privacy-preserving model training across multiple parties. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Transforming Decoder-Only Transformers for Accurate WiFi-Telemetry Based Indoor Localization | Nayan Sanjay Bhatia, Katia Obraczka | 2025-05-16 | Download | Wireless Fidelity (WiFi) based indoor positioning is a widely researched area for determining the position of devices within a wireless network. |
| Palladium: A DPU-enabled Multi-Tenant Serverless Cloud over Zero-copy Multi-node RDMA Fabrics | Shixiong Qi, Songyu Zhang, K. K. Ramakrishnan, Diman Z. Tootaghaj, Hardik Soni, Puneet Sharma | 2025-05-16 | Download | Serverless computing promises enhanced resource efficiency and lower user costs, yet is burdened by a heavyweight, CPU-bound data plane. Prior efforts exploiting shared memory reduce overhead locally ... |
| MM-INT: Telemetry in Programmable Switches with Multiple Queues using Source-based Multipath Routing | Mateus N. Bragatto, João Paulo M. Clevelares, Cristina K. Dominicini, Rodolfo S. Villaça, Fábio L. Verdi | 2025-05-16 | Download | This article emphasizes the importance of queues associated with the ports of switches in network monitoring. Traditionally, data collection about these queues is done using programmable data planes a... |
| ReaCritic: Reasoning Transformer-based DRL Critic-model Scaling For Wireless Networks | Feiran You, Hongyang Du | 2025-05-16 | Download | Heterogeneous Networks (HetNets) pose critical challenges for intelligent management due to the diverse user requirements and time-varying wireless conditions. |
| mmMirror: Device Free mmWave Indoor NLoS Localization Using Van-Atta-Array IRS | Yihe Yan, Zhenguo Shi, Yanxiang Wang, Cheng Jiang, Chun Tung Chou, Wen Hu | 2025-05-16 | Download | Industry 4.0 is transforming manufacturing and logistics by integrating robots into shared human environments, such as factories, warehouses, and healthcare facilities. |
| Characterization of Using Hybrid Beamforming in mmWave Virtual Reality | Nasim Alikhani, Abbas Mohammadi | 2025-05-16 | Download | Wireless Virtual Reality (VR) is increasingly in demand in Wireless LANs (WLANs). In this paper, a utility function for resource management in wireless VR is proposed. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training | Jintao Zhang, Jia Wei, Pengle Zhang, Xiaoming Xu, Haofeng Huang, Haoxu Wang, Kai Jiang, Jianfei Chen, Jun Zhu | 2025-05-16 | Download | The efficiency of attention is important due to its quadratic time complexity. We enhance the efficiency of attention through two key contributions: First, we leverage the new FP4 Tensor Cores in Blac... |
| msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML | Zhaolan Huang, Emmanuel Baccelli | 2025-05-16 | Download | AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget e.g. |
| SuperCoder: Assembly Program Superoptimization with Large Language Models | Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken | 2025-05-16 | Download | Superoptimization is the task of transforming a program into a faster one while preserving its input-output behavior. In this work, we investigate whether large language models (LLMs) can serve as sup... |