2025-03-01

cs.AR - Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs | Zhantong Zhu, Hongou Li, Wenjie Ren, Meng Wu, Le Ye, Ru Huang, Tianyu Jia | 2025-03-01 | With the rapid advent of generative models, efficiently deploying these models on specialized hardware has become critical. Tensor Processing Units (TPUs) are designed to accelerate AI workloads, but ... |
| T-REX: A 68-567 μs/token, 0.41-3.95 μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET | Seunghyun Moon, Mao Li, Gregory Chen, Phil Knag, Ram Krishnamurthy, Mingoo Seok | 2025-03-01 | This work introduces novel training and post-training compression schemes to reduce external memory access during transformer model inference. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization | Jake B. Perazzone, Shiqiang Wang, Mingyue Ji, Kevin Chan | 2025-03-01 | Federated learning (FL) is a useful tool that enables the training of machine learning models over distributed data without having to collect data centrally. |
| A Modern Approach to Real-Time Air Traffic Management System | Priyank Vaidya, Vedansh Kamdar | 2025-03-01 | Air traffic analytics systems are pivotal for ensuring safety, efficiency, and predictability in air travel. However, traditional systems struggle to handle the increasing volume and complexity of air... |
| Performance-Driven Optimization of Parallel Breadth-First Search | Marati Bhaskar, Raghavendra Kanakagiri | 2025-03-01 | Breadth-first search (BFS) is a fundamental graph algorithm that presents significant challenges for parallel implementation due to irregular memory access patterns, load imbalance and synchronization... |
| Asynchronous Personalized Federated Learning through Global Memorization | Fan Wan, Yuchen Li, Xueqi Qiu, Rui Sun, Leyuan Zhang, Xingyu Miao, Tianyu Zhang, Haoran Duan, Yang Long | 2025-03-01 | The proliferation of Internet of Things devices and advances in communication technology have unleashed an explosion of personal data, amplifying privacy concerns amid stringent regulations like GDPR ... |
| Conditioning on Local Statistics for Scalable Heterogeneous Federated Learning | Rickard Brännvall | 2025-03-01 | Federated learning is a distributed machine learning approach where multiple clients collaboratively train a model without sharing their local data, which contributes to preserving privacy. |
| Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving | Zhibin Wang, Shipeng Li, Xue Li, Yuhang Zhou, Zhonghui Zhang, Zibo Wang, Rong Gu, Chen Tian, Kun Yang, Sheng Zhong | 2025-03-01 | Large language models have been widely deployed in various applications, encompassing both interactive online tasks and batched offline tasks. |
| FLStore: Efficient Federated Learning Storage for non-training workloads | Ahmad Faraz Khan, Samuel Fountain, Ahmed M. Abdelmoniem, Ali R. Butt, Ali Anwar | 2025-03-01 | Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection. |
| WgPy: GPU-accelerated NumPy-like array library for web browsers | Masatoshi Hidaka, Tatsuya Harada | 2025-03-01 | To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With the recent advancements in web technologies, interfaces like WebGL and WebGPU,... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| CATS: A framework for Cooperative Autonomy Trust & Security | Namo Asavisanu, Tina Khezresmaeilzadeh, Rohan Sequeira, Hang Qiu, Fawad Ahmad, Konstantinos Psounis, Ramesh Govindan | 2025-03-01 | With cooperative perception, autonomous vehicles can wirelessly share sensor data and representations to overcome sensor occlusions, improving situational awareness. |
| Uncoordinated Access to Serverless Computing in MEC Systems for IoT | Claudio Cicconetti, Marco Conti, Andrea Passarella | 2025-03-01 | Edge computing is a promising solution to enable low-latency IoT applications, by shifting computation from remote data centers to local devices, less powerful but closer to the end user devices. |
| QaSAL: QoS-aware State-Augmented Learnable Algorithms for Coexistence of 5G NR-U/Wi-Fi | Mohammad Reza Fasihi, Brian L. Mark | 2025-03-01 | With the increasing demand for wireless connectivity, ensuring the efficient coexistence of multiple radio access technologies in shared unlicensed spectrum has become an important issue. |

cs.PF - Performance

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Breaking the Loop: Detecting and Mitigating Denial-of-Service Vulnerabilities in Large Language Models | Junzhe Yu, Yi Liu, Huijia Sun, Ling Shi, Yuqi Chen | 2025-03-01 | Large Language Models (LLMs) have significantly advanced text understanding and generation, becoming integral to applications across education, software development, healthcare, entertainment, and leg... |
| A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading | Mohammad Atif, Tianle Wang, Zhihua Dong, Charles Leggett, Meifeng Lin | 2025-03-01 | We present a framework based on Catch2 to evaluate performance of OpenMP's target offload model via micro-benchmarks. The compilers supporting OpenMP's target offload model for heterogeneous architect... |
