2025-03-01

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs	Zhantong Zhu, Hongou Li, Wenjie Ren, Meng Wu, Le Ye, Ru Huang, Tianyu Jia	2025-03-01	Download	With the rapid advent of generative models, efficiently deploying these models on specialized hardware has become critical. Tensor Processing Units (TPUs) are designed to accelerate AI workloads, but ...
T-REX: A 68-567 μs/token, 0.41-3.95 μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET	Seunghyun Moon, Mao Li, Gregory Chen, Phil Knag, Ram Krishnamurthy, Mingoo Seok	2025-03-01	Download	This work introduces novel training and post-training compression schemes to reduce external memory access during transformer model inference.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization	Jake B. Perazzone, Shiqiang Wang, Mingyue Ji, Kevin Chan	2025-03-01	Download	Federated learning (FL) is a useful tool that enables the training of machine learning models over distributed data without having to collect data centrally.
A Modern Approach to Real-Time Air Traffic Management System	Priyank Vaidya, Vedansh Kamdar	2025-03-01	Download	Air traffic analytics systems are pivotal for ensuring safety, efficiency, and predictability in air travel. However, traditional systems struggle to handle the increasing volume and complexity of air...
Performance-Driven Optimization of Parallel Breadth-First Search	Marati Bhaskar, Raghavendra Kanakagiri	2025-03-01	Download	Breadth-first search (BFS) is a fundamental graph algorithm that presents significant challenges for parallel implementation due to irregular memory access patterns, load imbalance and synchronization...
Asynchronous Personalized Federated Learning through Global Memorization	Fan Wan, Yuchen Li, Xueqi Qiu, Rui Sun, Leyuan Zhang, Xingyu Miao, Tianyu Zhang, Haoran Duan, Yang Long	2025-03-01	Download	The proliferation of Internet of Things devices and advances in communication technology have unleashed an explosion of personal data, amplifying privacy concerns amid stringent regulations like GDPR ...
Conditioning on Local Statistics for Scalable Heterogeneous Federated Learning	Rickard Brännvall	2025-03-01	Download	Federated learning is a distributed machine learning approach where multiple clients collaboratively train a model without sharing their local data, which contributes to preserving privacy.
Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving	Zhibin Wang, Shipeng Li, Xue Li, Yuhang Zhou, Zhonghui Zhang, Zibo Wang, Rong Gu, Chen Tian, Kun Yang, Sheng Zhong	2025-03-01	Download	Large language models have been widely deployed in various applications, encompassing both interactive online tasks and batched offline tasks.
FLStore: Efficient Federated Learning Storage for non-training workloads	Ahmad Faraz Khan, Samuel Fountain, Ahmed M. Abdelmoniem, Ali R. Butt, Ali Anwar	2025-03-01	Download	Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection.
WgPy: GPU-accelerated NumPy-like array library for web browsers	Masatoshi Hidaka, Tatsuya Harada	2025-03-01	Download	To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With the recent advancements in web technologies, interfaces like WebGL and WebGPU,...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
CATS: A framework for Cooperative Autonomy Trust & Security	Namo Asavisanu, Tina Khezresmaeilzadeh, Rohan Sequeira, Hang Qiu, Fawad Ahmad, Konstantinos Psounis, Ramesh Govindan	2025-03-01	Download	With cooperative perception, autonomous vehicles can wirelessly share sensor data and representations to overcome sensor occlusions, improving situational awareness.
Uncoordinated Access to Serverless Computing in MEC Systems for IoT	Claudio Cicconetti, Marco Conti, Andrea Passarella	2025-03-01	Download	Edge computing is a promising solution to enable low-latency IoT applications, by shifting computation from remote data centers to local devices, less powerful but closer to the end user devices.
QaSAL: QoS-aware State-Augmented Learnable Algorithms for Coexistence of 5G NR-U/Wi-Fi	Mohammad Reza Fasihi, Brian L. Mark	2025-03-01	Download	With the increasing demand for wireless connectivity, ensuring the efficient coexistence of multiple radio access technologies in shared unlicensed spectrum has become an important issue.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Breaking the Loop: Detecting and Mitigating Denial-of-Service Vulnerabilities in Large Language Models	Junzhe Yu, Yi Liu, Huijia Sun, Ling Shi, Yuqi Chen	2025-03-01	Download	Large Language Models (LLMs) have significantly advanced text understanding and generation, becoming integral to applications across education, software development, healthcare, entertainment, and leg...
A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading	Mohammad Atif, Tianle Wang, Zhihua Dong, Charles Leggett, Meifeng Lin	2025-03-01	Download	We present a framework based on Catch2 to evaluate performance of OpenMP's target offload model via micro-benchmarks. The compilers supporting OpenMP's target offload model for heterogeneous architect...