2024-01-05

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE | Ikumi Okubo, Keisuke Sugiura, Hiroki Matsutani | 2024-01-05 | Download | Transformer has been adopted to image recognition tasks and shown to outperform CNNs and RNNs while it suffers from high training cost and computational complexity. |
| Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache | Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin | 2024-01-05 | Download | Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressiv... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Energy-efficient Decentralized Learning via Graph Sparsification | Xusheng Zhang, Cho-Chun Chiu, Ting He | 2024-01-05 | Download | This work aims at improving the energy efficiency of decentralized learning by optimizing the mixing matrix, which controls the communication demands during the learning process. |
| AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis | Kebin Wu, Wenbin Li, Xiaofei Xiao | 2024-01-05 | Download | Traffic accident analysis is pivotal for enhancing public safety and developing road regulations. Traditional approaches, although widely used, are often constrained by manual analysis processes, subj... |
| Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition | Adnan Hoque, Less Wright, Chih-Chieh Yang, Mudhakar Srivatsa, Raghu Ganti | 2024-01-05 | Download | We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposit... |
| Analytically-Driven Resource Management for Cloud-Native Microservices | Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, Christina Delimitrou | 2024-01-05 | Download | Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such... |
| An Analysis of Avalanche Consensus | Ignacio Amores-Sesar, Christian Cachin, Philipp Schneider | 2024-01-05 | Download | A family of leaderless, decentralized consensus protocols, called Snow consensus was introduced in a recent whitepaper by Yin et al. These protocols address limitations of existing consensus methods, ... |
| Experimental Evaluation of the PHP's cURL Library Performance | Yordan Kalmukov | 2024-01-05 | Download | cURL (libcurl) is a popular and widely used library distributed with the php interpreter. It allows php applications to connect to and communicate with external resources (servers) by using wide varie... |
| Lock-free de Bruijn graph | Daniel Górniak, Robert Nowak | 2024-01-05 | Download | De Bruijn graph is one of the most important data structures used in de-novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms du... |
| Fairness-Aware Job Scheduling for Multi-Job Federated Learning | Yuxin Shi, Han Yu | 2024-01-05 | Download | Federated learning (FL) enables multiple data owners (a.k.a. FL clients) to collaboratively train machine learning models without disclosing sensitive private data. |
| FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning | Jian Li, Yong Liu, Wei Wang, Haoran Wu, Weiping Wang | 2024-01-05 | Download | Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the communication rounds. However, communicating Hessian matrices is often unfeasible due to their... |
| Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC | Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin | 2024-01-05 | Download | Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU p... |
| Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache | Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin | 2024-01-05 | Download | Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressiv... |
| Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence | Ning Chen, Zhipeng Cheng, Xuwei Fan, Xiaoyu Xia, Lianfen Huang | 2024-01-05 | Download | The high-performance generative artificial intelligence (GAI) represents the latest evolution of computational intelligence, while the blessing of future 6G networks also makes edge intelligence (EI) ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scheduling for Downlink OFDMA With IRS Reconfiguration Constraints | Alberto Rech, Leonardo Badia, Stefano Tomasin | 2024-01-05 | Download | The technical limitations of the intelligent reflecting surface (IRS) (re)configurations in terms of both communication overhead and energy efficiency must be considered when IRSs are used in cellular... |
| Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit Approach | Omid Semiari, Hosein Nikopour, Shilpa Talwar | 2024-01-05 | Download | Ultra-reliable low-latency communication (URLLC) is the cornerstone for a broad range of emerging services in next-generation wireless networks. |
| Credence: Augmenting Datacenter Switch Buffer Sharing with ML Predictions | Vamsi Addanki, Maciej Pacut, Stefan Schmid | 2024-01-05 | Download | Packet buffers in datacenter switches are shared across all the switch ports in order to improve the overall throughput. The trend of shrinking buffer sizes in datacenter switches makes buffer sharing... |
| LMaaS: Exploring Pricing Strategy of Large Model as a Service for Communication | Panlong Wu, Qi Liu, Yanjie Dong, Fangxin Wang | 2024-01-05 | Download | The next generation of communication is envisioned to be intelligent communication, that can replace traditional symbolic communication, where highly condensed semantic information considering both so... |
| GainNet: Coordinates the Odd Couple of Generative AI and 6G Networks | Ning Chen, Jie Yang, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani | 2024-01-05 | Download | The rapid expansion of AI-generated content (AIGC) reflects the iteration from assistive AI towards generative AI (GAI) with creativity. Meanwhile, the 6G networks will also evolve from the Internet-o... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC | Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin | 2024-01-05 | Download | Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU p... |