2025-03-01
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs | Zhantong Zhu, Hongou Li, Wenjie Ren, Meng Wu, Le Ye, Ru Huang, Tianyu Jia | 2025-03-01 | Download | With the rapid advent of generative models, efficiently deploying these models on specialized hardware has become critical. Tensor Processing Units (TPUs) are designed to accelerate AI workloads, but ... |
| T-REX: A 68-567 μs/token, 0.41-3.95 μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET | Seunghyun Moon, Mao Li, Gregory Chen, Phil Knag, Ram Krishnamurthy, Mingoo Seok | 2025-03-01 | Download | This work introduces novel training and post-training compression schemes to reduce external memory access during transformer model inference. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Communication-Efficient Device Scheduling for Federated Learning Using Lyapunov Optimization | Jake B. Perazzone, Shiqiang Wang, Mingyue Ji, Kevin Chan | 2025-03-01 | Download | Federated learning (FL) is a useful tool that enables the training of machine learning models over distributed data without having to collect data centrally. |
| A Modern Approach to Real-Time Air Traffic Management System | Priyank Vaidya, Vedansh Kamdar | 2025-03-01 | Download | Air traffic analytics systems are pivotal for ensuring safety, efficiency, and predictability in air travel. However, traditional systems struggle to handle the increasing volume and complexity of air... |
| Performance-Driven Optimization of Parallel Breadth-First Search | Marati Bhaskar, Raghavendra Kanakagiri | 2025-03-01 | Download | Breadth-first search (BFS) is a fundamental graph algorithm that presents significant challenges for parallel implementation due to irregular memory access patterns, load imbalance and synchronization... |
| Asynchronous Personalized Federated Learning through Global Memorization | Fan Wan, Yuchen Li, Xueqi Qiu, Rui Sun, Leyuan Zhang, Xingyu Miao, Tianyu Zhang, Haoran Duan, Yang Long | 2025-03-01 | Download | The proliferation of Internet of Things devices and advances in communication technology have unleashed an explosion of personal data, amplifying privacy concerns amid stringent regulations like GDPR ... |
| Conditioning on Local Statistics for Scalable Heterogeneous Federated Learning | Rickard Brännvall | 2025-03-01 | Download | Federated learning is a distributed machine learning approach where multiple clients collaboratively train a model without sharing their local data, which contributes to preserving privacy. |
| Echo: Efficient Co-Scheduling of Hybrid Online-Offline Tasks for Large Language Model Serving | Zhibin Wang, Shipeng Li, Xue Li, Yuhang Zhou, Zhonghui Zhang, Zibo Wang, Rong Gu, Chen Tian, Kun Yang, Sheng Zhong | 2025-03-01 | Download | Large language models have been widely deployed in various applications, encompassing both interactive online tasks and batched offline tasks. |
| FLStore: Efficient Federated Learning Storage for non-training workloads | Ahmad Faraz Khan, Samuel Fountain, Ahmed M. Abdelmoniem, Ali R. Butt, Ali Anwar | 2025-03-01 | Download | Federated Learning (FL) is an approach for privacy-preserving Machine Learning (ML), enabling model training across multiple clients without centralized data collection. |
| WgPy: GPU-accelerated NumPy-like array library for web browsers | Masatoshi Hidaka, Tatsuya Harada | 2025-03-01 | Download | To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With the recent advancements in web technologies, interfaces like WebGL and WebGPU,... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| CATS: A framework for Cooperative Autonomy Trust & Security | Namo Asavisanu, Tina Khezresmaeilzadeh, Rohan Sequeira, Hang Qiu, Fawad Ahmad, Konstantinos Psounis, Ramesh Govindan | 2025-03-01 | Download | With cooperative perception, autonomous vehicles can wirelessly share sensor data and representations to overcome sensor occlusions, improving situational awareness. |
| Uncoordinated Access to Serverless Computing in MEC Systems for IoT | Claudio Cicconetti, Marco Conti, Andrea Passarella | 2025-03-01 | Download | Edge computing is a promising solution to enable low-latency IoT applications, by shifting computation from remote data centers to local devices, less powerful but closer to the end user devices. |
| QaSAL: QoS-aware State-Augmented Learnable Algorithms for Coexistence of 5G NR-U/Wi-Fi | Mohammad Reza Fasihi, Brian L. Mark | 2025-03-01 | Download | With the increasing demand for wireless connectivity, ensuring the efficient coexistence of multiple radio access technologies in shared unlicensed spectrum has become an important issue. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Breaking the Loop: Detecting and Mitigating Denial-of-Service Vulnerabilities in Large Language Models | Junzhe Yu, Yi Liu, Huijia Sun, Ling Shi, Yuqi Chen | 2025-03-01 | Download | Large Language Models (LLMs) have significantly advanced text understanding and generation, becoming integral to applications across education, software development, healthcare, entertainment, and leg... |
| A Microbenchmark Framework for Performance Evaluation of OpenMP Target Offloading | Mohammad Atif, Tianle Wang, Zhihua Dong, Charles Leggett, Meifeng Lin | 2025-03-01 | Download | We present a framework based on Catch2 to evaluate performance of OpenMP's target offload model via micro-benchmarks. The compilers supporting OpenMP's target offload model for heterogeneous architect... |