2024-06-05

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Llumnix: Dynamic Scheduling for Large Language Model Serving	Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin	2024-06-05	Download	Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are i...
Soft GPGPU versus IP cores: Quantifying and Reducing the Performance Gap	Martin Langhammer, George A. Constantinides	2024-06-05	Download	eGPU, a recently-reported soft GPGPU for FPGAs, has demonstrated very high clock frequencies (more than 750 MHz) and small footprint. This means that for the first time, commercial soft processors may...
Floorplanning with I/O assignment via feasibility-seeking and superiorization methods	Shan Yu, Yair Censor, Guojie Luo	2024-06-05	Download	The feasibility-seeking approach offers a systematic framework for managing and resolving intricate constraints in continuous problems, making it a promising avenue to explore in the context of floorp...
HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator	Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao	2024-06-05	Download	Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, thes...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Queue Management for SLO-Oriented Large Language Model Serving	Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer	2024-06-05	Download	Large language model (LLM) serving is becoming an increasingly critical workload for cloud providers. Existing LLM serving systems focus on interactive requests, such as chatbots and coding assistants...
FedPylot: Navigating Federated Learning for Real-Time Object Detection in Internet of Vehicles	Cyprien Quéméneur, Soumaya Cherkaoui	2024-06-05	Download	The Internet of Vehicles (IoV) emerges as a pivotal component for autonomous driving and intelligent transportation systems (ITS), by enabling low-latency big data processing in a dense interconnected...
Fantastyc: Blockchain-based Federated Learning Made Secure and Practical	William Boitier, Antonella Del Pozzo, Álvaro García-Pérez, Stephane Gazut, Pierre Jobic, Alexis Lemaire, Erwan Mahe, Aurelien Mayoue, Maxence Perion, Tuanir Franca Rezende, Deepika Singh, Sara Tucci-Piergiovanni	2024-06-05	Download	Federated Learning is a decentralized framework that enables multiple clients to collaboratively train a machine learning model under the orchestration of a central server without sharing their local ...
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training	Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, Maosong Sun	2024-06-05	Download	The emergence of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role.
Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning	Saber Malekmohammadi, Yaoliang Yu, Yang Cao	2024-06-05	Download	High utility and rigorous data privacy are of the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients.
Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers	Thomas Bouvier, Bogdan Nicolae, Hugo Chaugier, Alexandru Costan, Ian Foster, Gabriel Antoniu	2024-06-05	Download	Deep learning has emerged as a powerful method for extracting valuable information from large volumes of data. However, when new training data arrives continuously (i.e.
Llumnix: Dynamic Scheduling for Large Language Model Serving	Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin	2024-06-05	Download	Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are i...
Brief Announcement: Distributed Unconstrained Local Search for Multilevel Graph Partitioning	Peter Sanders, Daniel Seemaier	2024-06-05	Download	Partitioning a graph into blocks of roughly equal weight while cutting only few edges is a fundamental problem in computer science with numerous practical applications.
Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading	Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao	2024-06-05	Download	Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field.
Detrimental task execution patterns in mainstream OpenMP runtimes	Adam S. Tuft, Tobias Weinzierl, Michael Klemm	2024-06-05	Download	The OpenMP API offers both task-based and data-parallel concepts to scientific computing. While it provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how t...
PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs	Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar	2024-06-05	Download	On-device training is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most u...
Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes	Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu	2024-06-05	Download	In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes.
FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting	Jeffrey Ma, Alan Tu, Yiling Chen, Vijay Janapa Reddi	2024-06-05	Download	Federated Learning (FL) endeavors to harness decentralized data while preserving privacy, facing challenges of performance, scalability, and collaboration.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Active ML for 6G: Towards Efficient Data Generation, Acquisition, and Annotation	Omar Alhussein, Ning Zhang, Sami Muhaidat, Weihua Zhuang	2024-06-05	Download	This paper explores the integration of active machine learning (ML) for 6G networks, an area that remains under-explored yet holds potential. Unlike passive ML systems, active ML can be made to intera...
Optimization of Energy Consumption in Delay-Tolerant Networks	Junran Wang, Milena Radenkovic	2024-06-05	Download	Delay tolerant network is a network architecture and protocol suite specifically designed to handle challenging communications environments, such as deep space communications, disaster response, and r...
Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading	Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao	2024-06-05	Download	Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field.
To Sense or Not To Sense: A Delay Perspective (full version)	Xinran Zhao, Lin Dai	2024-06-05	Download	With the ever-growing demand for low-latency services in machine-to-machine (M2M) communications, the delay performance of random access networks has become a primary concern, which critically depends...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead	Amir Zandieh, Majid Daliri, Insu Han	2024-06-05	Download	Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length.