2024-06-05
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Llumnix: Dynamic Scheduling for Large Language Model Serving | Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin | 2024-06-05 | Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are i... |
| Soft GPGPU versus IP cores: Quantifying and Reducing the Performance Gap | Martin Langhammer, George A. Constantinides | 2024-06-05 | eGPU, a recently-reported soft GPGPU for FPGAs, has demonstrated very high clock frequencies (more than 750 MHz) and small footprint. This means that for the first time, commercial soft processors may... |
| Floorplanning with I/O assignment via feasibility-seeking and superiorization methods | Shan Yu, Yair Censor, Guojie Luo | 2024-06-05 | The feasibility-seeking approach offers a systematic framework for managing and resolving intricate constraints in continuous problems, making it a promising avenue to explore in the context of floorp... |
| HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator | Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao | 2024-06-05 | Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, thes... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Queue management for slo-oriented large language model serving | Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer | 2024-06-05 | Large language model (LLM) serving is becoming an increasingly critical workload for cloud providers. Existing LLM serving systems focus on interactive requests, such as chatbots and coding assistants... |
| FedPylot: Navigating Federated Learning for Real-Time Object Detection in Internet of Vehicles | Cyprien Quéméneur, Soumaya Cherkaoui | 2024-06-05 | The Internet of Vehicles (IoV) emerges as a pivotal component for autonomous driving and intelligent transportation systems (ITS), by enabling low-latency big data processing in a dense interconnected... |
| Fantastyc: Blockchain-based Federated Learning Made Secure and Practical | William Boitier, Antonella Del Pozzo, Álvaro García-Pérez, Stephane Gazut, Pierre Jobic, Alexis Lemaire, Erwan Mahe, Aurelien Mayoue, Maxence Perion, Tuanir Franca Rezende, Deepika Singh, Sara Tucci-Piergiovanni | 2024-06-05 | Federated Learning is a decentralized framework that enables multiple clients to collaboratively train a machine learning model under the orchestration of a central server without sharing their local ... |
| Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, Maosong Sun | 2024-06-05 | The emergence of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role. |
| Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning | Saber Malekmohammadi, Yaoliang Yu, Yang Cao | 2024-06-05 | High utility and rigorous data privacy are of the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients. |
| Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers | Thomas Bouvier, Bogdan Nicolae, Hugo Chaugier, Alexandru Costan, Ian Foster, Gabriel Antoniu | 2024-06-05 | Deep learning has emerged as a powerful method for extracting valuable information from large volumes of data. However, when new training data arrives continuously (i.e. ... |
| Llumnix: Dynamic Scheduling for Large Language Model Serving | Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin | 2024-06-05 | Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are i... |
| Brief Announcement: Distributed Unconstrained Local Search for Multilevel Graph Partitioning | Peter Sanders, Daniel Seemaier | 2024-06-05 | Partitioning a graph into blocks of roughly equal weight while cutting only few edges is a fundamental problem in computer science with numerous practical applications. |
| Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading | Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao | 2024-06-05 | Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. |
| Detrimental task execution patterns in mainstream OpenMP runtimes | Adam S. Tuft, Tobias Weinzierl, Michael Klemm | 2024-06-05 | The OpenMP API offers both task-based and data-parallel concepts to scientific computing. While it provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how t... |
| PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs | Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar | 2024-06-05 | On-device training is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most u... |
| Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes | Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu | 2024-06-05 | In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes. |
| FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting | Jeffrey Ma, Alan Tu, Yiling Chen, Vijay Janapa Reddi | 2024-06-05 | Federated Learning (FL) endeavors to harness decentralized data while preserving privacy, facing challenges of performance, scalability, and collaboration. |
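The Sanders and Seemaier entry defines graph partitioning as splitting a graph into blocks of roughly equal weight while cutting only few edges. A minimal Python sketch of the two quantities that objective trades off is given here. The function names `edge_cut` and `imbalance`, and the imbalance formula (heaviest block weight divided by the average block weight, minus one), are illustrative assumptions rather than definitions from the paper, and the sketch only evaluates a given partition; it does not compute one.

```python
from collections import defaultdict

def edge_cut(edges, block):
    """Sum the weights of edges whose endpoints are assigned to different blocks."""
    return sum(w for u, v, w in edges if block[u] != block[v])

def imbalance(node_weight, block, k):
    """Heaviest block weight relative to the average, minus 1 (0.0 = perfectly balanced).

    Hypothetical, illustrative metric; balance constraints are stated in several forms.
    """
    load = defaultdict(int)
    for node, b in block.items():
        load[b] += node_weight[node]
    average = sum(node_weight.values()) / k
    return max(load[b] for b in range(k)) / average - 1

# Toy graph: two unit-weight triangles joined by the single edge (2, 3).
EDGES = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
WEIGHTS = {n: 1 for n in range(6)}

good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # one triangle per block
bad = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}   # same block sizes, nodes scattered

print(edge_cut(EDGES, good), imbalance(WEIGHTS, good, 2))  # 1 0.0
print(edge_cut(EDGES, bad), imbalance(WEIGHTS, bad, 2))    # 5 0.0
```

Both partitions are perfectly balanced, but keeping each triangle in one block cuts a single edge instead of five; lowering the cut while keeping blocks balanced is the goal the abstract describes.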
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Active ML for 6G: Towards Efficient Data Generation, Acquisition, and Annotation | Omar Alhussein, Ning Zhang, Sami Muhaidat, Weihua Zhuang | 2024-06-05 | This paper explores the integration of active machine learning (ML) for 6G networks, an area that remains under-explored yet holds potential. Unlike passive ML systems, active ML can be made to intera... |
| Optimization of Energy Consumption in Delay-Tolerant Networks | Junran Wang, Milena Radenkovic | 2024-06-05 | Delay tolerant network is a network architecture and protocol suite specifically designed to handle challenging communications environments, such as deep space communications, disaster response, and r... |
| Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading | Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao | 2024-06-05 | Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. |
| To Sense or Not To Sense: A Delay Perspective (full version) | Xinran Zhao, Lin Dai | 2024-06-05 | With the ever-growing demand for low-latency services in machine-to-machine (M2M) communications, the delay performance of random access networks has become a primary concern, which critically depends... |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead | Amir Zandieh, Majid Daliri, Insu Han | 2024-06-05 | Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length. |
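The QJL abstract notes that the KV cache grows with sequence length. As a rough illustration, this sketch computes the cache size for a standard decoder-only transformer that stores one Key and one Value vector per token, per KV head, per layer. The function name `kv_cache_bytes` and the model dimensions (32 layers, 32 KV heads, head dimension 128, 16-bit elements) are hypothetical values chosen for round numbers. The formula also ignores allocator overhead, paging, and any quantization.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache Keys and Values for `seq_len` tokens (hypothetical helper).

    The leading factor 2 counts the Key and the Value vector stored for each
    token, per KV head, per layer. bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Hypothetical model shape: 32 layers, 32 KV heads, head_dim 128, 16-bit values.
for seq_len in (1024, 4096, 32768):
    gib = kv_cache_bytes(32, 32, 128, seq_len) / 2**30
    print(f"{seq_len:>6} tokens -> {gib:.2f} GiB")
```

With these assumed dimensions each token costs 512 KiB, so the loop prints 0.50 GiB, 2.00 GiB, and 16.00 GiB. Memory grows linearly with sequence length, and the per-element byte count is the factor that KV cache quantization targets.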