2024-01-05
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE | Ikumi Okubo, Keisuke Sugiura, Hiroki Matsutani | 2024-01-05 | Download | Transformer has been adopted to image recognition tasks and shown to outperform CNNs and RNNs while it suffers from high training cost and computational complexity. |
| Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache | Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin | 2024-01-05 | Download | Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressiv... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Energy-efficient Decentralized Learning via Graph Sparsification | Xusheng Zhang, Cho-Chun Chiu, Ting He | 2024-01-05 | Download | This work aims at improving the energy efficiency of decentralized learning by optimizing the mixing matrix, which controls the communication demands during the learning process. |
| AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident Analysis | Kebin Wu, Wenbin Li, Xiaofei Xiao | 2024-01-05 | Download | Traffic accident analysis is pivotal for enhancing public safety and developing road regulations. Traditional approaches, although widely used, are often constrained by manual analysis processes, subj... |
| Accelerating a Triton Fused Kernel for W4A16 Quantized Inference with SplitK work decomposition | Adnan Hoque, Less Wright, Chih-Chieh Yang, Mudhakar Srivatsa, Raghu Ganti | 2024-01-05 | Download | We propose an implementation of an efficient fused matrix multiplication kernel for W4A16 quantized inference, where we perform dequantization and GEMM in a fused kernel using a SplitK work decomposit... |
| Analytically-Driven Resource Management for Cloud-Native Microservices | Yanqi Zhang, Zhuangzhuang Zhou, Sameh Elnikety, Christina Delimitrou | 2024-01-05 | Download | Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such... |
| An Analysis of Avalanche Consensus | Ignacio Amores-Sesar, Christian Cachin, Philipp Schneider | 2024-01-05 | Download | A family of leaderless, decentralized consensus protocols, called Snow consensus was introduced in a recent whitepaper by Yin et al. These protocols address limitations of existing consensus methods, ... |
| Experimental Evaluation of the PHP's cURL Library Performance | Yordan Kalmukov | 2024-01-05 | Download | cURL (libcurl) is a popular and widely used library distributed with the php interpreter. It allows php applications to connect to and communicate with external resources (servers) by using wide varie... |
| Lock-free de Bruijn graph | Daniel Górniak, Robert Nowak | 2024-01-05 | Download | De Bruijn graph is one of the most important data structures used in de-novo genome assembly algorithms, especially for NGS data. There is a growing need for parallel data structures and algorithms du... |
| Fairness-Aware Job Scheduling for Multi-Job Federated Learning | Yuxin Shi, Han Yu | 2024-01-05 | Download | Federated learning (FL) enables multiple data owners (a.k.a. FL clients) to collaboratively train machine learning models without disclosing sensitive private data. |
| FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning | Jian Li, Yong Liu, Wei Wang, Haoran Wu, Weiping Wang | 2024-01-05 | Download | Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the communication rounds. However, communicating Hessian matrices is often unfeasible due to their... |
| Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC | Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin | 2024-01-05 | Download | Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU p... |
| Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache | Bin Lin, Chen Zhang, Tao Peng, Hanyu Zhao, Wencong Xiao, Minmin Sun, Anmin Liu, Zhipeng Zhang, Lanbo Li, Xiafei Qiu, Shen Li, Zhigang Ji, Tao Xie, Yong Li, Wei Lin | 2024-01-05 | Download | Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressiv... |
| Towards Integrated Fine-tuning and Inference when Generative AI meets Edge Intelligence | Ning Chen, Zhipeng Cheng, Xuwei Fan, Xiaoyu Xia, Lianfen Huang | 2024-01-05 | Download | The high-performance generative artificial intelligence (GAI) represents the latest evolution of computational intelligence, while the blessing of future 6G networks also makes edge intelligence (EI) ... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Scheduling for Downlink OFDMA With IRS Reconfiguration Constraints | Alberto Rech, Leonardo Badia, Stefano Tomasin | 2024-01-05 | Download | The technical limitations of the intelligent reflecting surface (IRS) (re)configurations in terms of both communication overhead and energy efficiency must be considered when IRSs are used in cellular... |
| Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit Approach | Omid Semiari, Hosein Nikopour, Shilpa Talwar | 2024-01-05 | Download | Ultra-reliable low-latency communication (URLLC) is the cornerstone for a broad range of emerging services in next-generation wireless networks. |
| Credence: Augmenting Datacenter Switch Buffer Sharing with ML Predictions | Vamsi Addanki, Maciej Pacut, Stefan Schmid | 2024-01-05 | Download | Packet buffers in datacenter switches are shared across all the switch ports in order to improve the overall throughput. The trend of shrinking buffer sizes in datacenter switches makes buffer sharing... |
| LMaaS: Exploring Pricing Strategy of Large Model as a Service for Communication | Panlong Wu, Qi Liu, Yanjie Dong, Fangxin Wang | 2024-01-05 | Download | The next generation of communication is envisioned to be intelligent communication, that can replace traditional symbolic communication, where highly condensed semantic information considering both so... |
| GainNet: Coordinates the Odd Couple of Generative AI and 6G Networks | Ning Chen, Jie Yang, Zhipeng Cheng, Xuwei Fan, Zhang Liu, Bangzhen Huang, Yifeng Zhao, Lianfen Huang, Xiaojiang Du, Mohsen Guizani | 2024-01-05 | Download | The rapid expansion of AI-generated content (AIGC) reflects the iteration from assistive AI towards generative AI (GAI) with creativity. Meanwhile, the 6G networks will also evolve from the Internet-o... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC | Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin | 2024-01-05 | Download | Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU p... |