2024-01-05

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE	Ikumi Okubo, Keisuke Sugiura, Hiroki Matsutani	2024-01-05	Download	Transformer has been adopted to image recognition tasks and shown to outperform CNNs and RNNs while it suffers from high training cost and computational complexity.
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache	Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin	2024-01-05	Download	Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressiv...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Energy-efficient Decentralized Learning via Graph Sparsification	Xusheng Zhang, Cho-Chun Chiu, Ting He	2024-01-05	Download	This work aims at improving the energy efficiency of decentralized learning by optimizing the mixing matrix, which controls the communication demands during the learning process.
AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis	Kebin Wu, Wenbin Li, Xiaofei Xiao	2024-01-05	Download	Traffic accident analysis is pivotal for enhancing public safety and developing road regulations. Traditional approaches, although widely used, are often constrained by manual analysis processes, subj...
Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition	Adnan Hoque, Less Wright, Chih-Chieh Yang, Mudhakar Srivatsa, Raghu Ganti	2024-01-05	Download	We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposit...
Analytically-Driven Resource Management for Cloud-Native Microservices	Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, Christina Delimitrou	2024-01-05	Download	Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such...
An Analysis of Avalanche Consensus	Ignacio Amores-Sesar, Christian Cachin, Philipp Schneider	2024-01-05	Download	A family of leaderless, decentralized consensus protocols, called Snow consensus was introduced in a recent whitepaper by Yin et al. These protocols address limitations of existing consensus methods, ...
Experimental Evaluation of the PHP's cURL Library Performance	Yordan Kalmukov	2024-01-05	Download	cURL (libcurl) is a popular and widely used library distributed with the php interpreter. It allows php applications to connect to and communicate with external resources (servers) by using wide varie...
Lock-free de Bruijn graph	Daniel Górniak, Robert Nowak	2024-01-05	Download	De Bruijn graph is one of the most important data structures used in de-novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms du...
Fairness-Aware Job Scheduling for Multi-Job Federated Learning	Yuxin Shi, Han Yu	2024-01-05	Download	Federated learning (FL) enables multiple data owners (a.k.a. FL clients) to collaboratively train machine learning models without disclosing sensitive private data.
FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning	Jian Li, Yong Liu, Wei Wang, Haoran Wu, Weiping Wang	2024-01-05	Download	Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the communication rounds. However, communicating Hessian matrices is often unfeasible due to their...
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC	Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin	2024-01-05	Download	Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU p...
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache	Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin	2024-01-05	Download	Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressiv...
Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence	Ning Chen, Zhipeng Cheng, Xuwei Fan, Xiaoyu Xia, Lianfen Huang	2024-01-05	Download	The high-performance generative artificial intelligence (GAI) represents the latest evolution of computational intelligence, while the blessing of future 6G networks also makes edge intelligence (EI) ...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Scheduling for Downlink OFDMA With IRS Reconfiguration Constraints	Alberto Rech, Leonardo Badia, Stefano Tomasin	2024-01-05	Download	The technical limitations of the intelligent reflecting surface (IRS) (re)configurations in terms of both communication overhead and energy efficiency must be considered when IRSs are used in cellular...
Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit Approach	Omid Semiari, Hosein Nikopour, Shilpa Talwar	2024-01-05	Download	Ultra-reliable low-latency communication (URLLC) is the cornerstone for a broad range of emerging services in next-generation wireless networks.
Credence: Augmenting Datacenter Switch Buffer Sharing with ML Predictions	Vamsi Addanki, Maciej Pacut, Stefan Schmid	2024-01-05	Download	Packet buffers in datacenter switches are shared across all the switch ports in order to improve the overall throughput. The trend of shrinking buffer sizes in datacenter switches makes buffer sharing...
LMaaS: Exploring Pricing Strategy of Large Model as a Service for Communication	Panlong Wu, Qi Liu, Yanjie Dong, Fangxin Wang	2024-01-05	Download	The next generation of communication is envisioned to be intelligent communication, that can replace traditional symbolic communication, where highly condensed semantic information considering both so...
GainNet: Coordinates the Odd Couple of Generative AI and 6G Networks	Ning Chen, Jie Yang, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani	2024-01-05	Download	The rapid expansion of AI-generated content (AIGC) reflects the iteration from assistive AI towards generative AI (GAI) with creativity. Meanwhile, the 6G networks will also evolve from the Internet-o...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC	Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin	2024-01-05	Download	Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU p...