
2024-06-05

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Llumnix: Dynamic Scheduling for Large Language Model Serving | Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin | 2024-06-05 | Download | Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are i... |
| Soft GPGPU versus IP cores: Quantifying and Reducing the Performance Gap | Martin Langhammer, George A. Constantinides | 2024-06-05 | Download | eGPU, a recently-reported soft GPGPU for FPGAs, has demonstrated very high clock frequencies (more than 750 MHz) and small footprint. This means that for the first time, commercial soft processors may... |
| Floorplanning with I/O assignment via feasibility-seeking and superiorization methods | Shan Yu, Yair Censor, Guojie Luo | 2024-06-05 | Download | The feasibility-seeking approach offers a systematic framework for managing and resolving intricate constraints in continuous problems, making it a promising avenue to explore in the context of floorp... |
| HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator | Zhewen Yu, Sudarshan Sreeram, Krish Agrawal, Junyi Wu, Alexander Montgomerie-Corcoran, Cheng Zhang, Jianyi Cheng, Christos-Savvas Bouganis, Yiren Zhao | 2024-06-05 | Download | Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, thes... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Queue management for slo-oriented large language model serving | Archit Patke, Dhemath Reddy, Saurabh Jha, Haoran Qiu, Christian Pinto, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer | 2024-06-05 | Download | Large language model (LLM) serving is becoming an increasingly critical workload for cloud providers. Existing LLM serving systems focus on interactive requests, such as chatbots and coding assistants... |
| FedPylot: Navigating Federated Learning for Real-Time Object Detection in Internet of Vehicles | Cyprien Quéméneur, Soumaya Cherkaoui | 2024-06-05 | Download | The Internet of Vehicles (IoV) emerges as a pivotal component for autonomous driving and intelligent transportation systems (ITS), by enabling low-latency big data processing in a dense interconnected... |
| Fantastyc: Blockchain-based Federated Learning Made Secure and Practical | William Boitier, Antonella Del Pozzo, Álvaro García-Pérez, Stephane Gazut, Pierre Jobic, Alexis Lemaire, Erwan Mahe, Aurelien Mayoue, Maxence Perion, Tuanir Franca Rezende, Deepika Singh, Sara Tucci-Piergiovanni | 2024-06-05 | Download | Federated Learning is a decentralized framework that enables multiple clients to collaboratively train a machine learning model under the orchestration of a central server without sharing their local ... |
| Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Xinrong Zhang, Zhiyuan Liu, Chuan Shi, Maosong Sun | 2024-06-05 | Download | The emergence of large language models (LLMs) relies heavily on distributed training strategies, among which pipeline parallelism plays a crucial role. |
| Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning | Saber Malekmohammadi, Yaoliang Yu, Yang Cao | 2024-06-05 | Download | High utility and rigorous data privacy are of the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients. |
| Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers | Thomas Bouvier, Bogdan Nicolae, Hugo Chaugier, Alexandru Costan, Ian Foster, Gabriel Antoniu | 2024-06-05 | Download | Deep learning has emerged as a powerful method for extracting valuable information from large volumes of data. However, when new training data arrives continuously (i.e. |
| Llumnix: Dynamic Scheduling for Large Language Model Serving | Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin | 2024-06-05 | Download | Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are i... |
| Brief Announcement: Distributed Unconstrained Local Search for Multilevel Graph Partitioning | Peter Sanders, Daniel Seemaier | 2024-06-05 | Download | Partitioning a graph into blocks of roughly equal weight while cutting only few edges is a fundamental problem in computer science with numerous practical applications. |
| Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading | Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao | 2024-06-05 | Download | Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. |
| Detrimental task execution patterns in mainstream OpenMP runtimes | Adam S. Tuft, Tobias Weinzierl, Michael Klemm | 2024-06-05 | Download | The OpenMP API offers both task-based and data-parallel concepts to scientific computing. While it provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how t... |
| PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs | Charlie Hou, Akshat Shrivastava, Hongyuan Zhan, Rylan Conway, Trang Le, Adithya Sagar, Giulia Fanti, Daniel Lazar | 2024-06-05 | Download | On-device training is currently the most common approach for training machine learning (ML) models on private, distributed user data. Despite this, on-device training has several drawbacks: (1) most u... |
| Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes | Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu | 2024-06-05 | Download | In this paper, we show that applying adaptive methods directly to distributed minimax problems can result in non-convergence due to inconsistency in locally computed adaptive stepsizes. |
| FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting | Jeffrey Ma, Alan Tu, Yiling Chen, Vijay Janapa Reddi | 2024-06-05 | Download | Federated Learning (FL) endeavors to harness decentralized data while preserving privacy, facing challenges of performance, scalability, and collaboration. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Active ML for 6G: Towards Efficient Data Generation, Acquisition, and Annotation | Omar Alhussein, Ning Zhang, Sami Muhaidat, Weihua Zhuang | 2024-06-05 | Download | This paper explores the integration of active machine learning (ML) for 6G networks, an area that remains under-explored yet holds potential. Unlike passive ML systems, active ML can be made to intera... |
| Optimization of Energy Consumption in Delay-Tolerant Networks | Junran Wang, Milena Radenkovic | 2024-06-05 | Download | Delay tolerant network is a network architecture and protocol suite specifically designed to handle challenging communications environments, such as deep space communications, disaster response, and r... |
| Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading | Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao | 2024-06-05 | Download | Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. |
| To Sense or Not To Sense: A Delay Perspective (full version) | Xinran Zhao, Lin Dai | 2024-06-05 | Download | With the ever-growing demand for low-latency services in machine-to-machine (M2M) communications, the delay performance of random access networks has become a primary concern, which critically depends... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead | Amir Zandieh, Majid Daliri, Insu Han | 2024-06-05 | Download | Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length. |
