2024-04-15
cs.AR - Architecture

| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Field-Programmable Gate Array Architecture for Deep Learning: Survey & Future Directions | Andrew Boutros, Aman Arora, Vaughn Betz | 2024-04-15 | Download | Deep learning (DL) is becoming the cornerstone of numerous applications both in datacenters and at the edge. Specialized hardware is often necessary to meet the performance requirements of state-of-th... |
| Error Detection and Correction Codes for Safe In-Memory Computations | Luca Parrini, Taha Soliman, Benjamin Hettwer, Jan Micha Borrmann, Simranjeet Singh, Ankit Bende, Vikas Rana, Farhad Merchant, Norbert Wehn | 2024-04-15 | Download | In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. |
| Towards a high-performance AI compiler with upstream MLIR | Renato Golin, Lorenzo Chelini, Adam Siemieniuk, Kavitha Madhu, Niranjan Hasabnis, Hans Pabst, Evangelos Georganas, Alexander Heinecke | 2024-04-15 | Download | This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction. |
| Efficient and accurate neural field reconstruction using resistive memory | Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu | 2024-04-15 | Download | Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency. |
| Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity | Cenlin Duan, Jianlei Yang, Yiou Wang, Yikun Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao | 2024-04-15 | Download | Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency. |
| LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization | Rishov Sarkar, Rachel Paul, Cong Hao | 2024-04-15 | Download | High-Level Synthesis (HLS) enables rapid prototyping of complex hardware designs by translating C or C++ code to low-level RTL code. However, the testing and evaluation of HLS designs still typically ... |

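
The SRAM-PIM entry above states that eliminating computations on randomly distributed zero bits boosts efficiency. As a minimal illustration of that general idea (not the architecture proposed in that paper), a bit-serial shift-and-add multiply only has to do work for the 1-bits of the weight, so the number of additions equals the weight's popcount rather than its bit width. The function name `shift_add_multiply` is hypothetical, and the sketch assumes non-negative integer operands.

```python
def shift_add_multiply(activation: int, weight: int) -> tuple[int, int]:
    """Multiply two non-negative integers by shift-and-add over the set bits
    of `weight`, returning (product, number_of_additions).

    Zero bits of the weight contribute nothing, so they are skipped;
    the work is proportional to the number of 1-bits, not the bit width.
    """
    if activation < 0 or weight < 0:
        raise ValueError("this sketch handles non-negative integers only")
    product = 0
    additions = 0
    bit_position = 0
    w = weight
    while w:
        if w & 1:  # only nonzero bits trigger a shifted addition
            product += activation << bit_position
            additions += 1
        w >>= 1
        bit_position += 1
    return product, additions


# The 8-bit weight 0b01000001 (= 65) has only two set bits.
product, adds = shift_add_multiply(13, 0b01000001)
print(product, adds)  # 845 2
```

A dense bit-serial unit would step through all eight bit positions of this weight; the sparse version performs two additions.
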
cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Distributing Context-Aware Shared Memory Data Structures: A Case Study on Singly-Linked Lists | Raaghav Ravishankar, Sandeep Kulkarni, Sathya Peri, Gokarna Sharma | 2024-04-15 | Download | In this paper, we study the partitioning of a context-aware shared memory data structure so that it can be implemented as a distributed data structure running on multiple machines. |
| [FEDSTR: Money-In AI-Out \| A Decentralized Marketplace for Federated Learning and LLM Training on the NOSTR Protocol](https://arxiv.org/abs/2404.15834v2) | Konstantinos E. Nikolakakis, George Chantzialexiou, Dionysis Kalogerias | 2024-04-15 | Download | |
| Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning | Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao | 2024-04-15 | Download | E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained by Artificial Intelligence (AI) technologies to help doctors make diagnosis. |
| Design and Implementation of a Java-Based Client-Server Application | Omkar Patil, Aarya Shirbhate | 2024-04-15 | Download | This report details the development of a networked distributed system named Group Communication System (GCS), implemented in Java to exemplify socket programming and communication protocols. |
| Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics | Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su | 2024-04-15 | Download | Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of cl... |
| cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores | Zixuan Li, Mingxing Duan, Huizhang Luo, Wangdong Yang, Kenli Li, Keqin Li | 2024-04-15 | Download | Sparse tensors are prevalent in real-world applications, often characterized by their large-scale, high-order, and high-dimensional nature. Directly handling raw tensors is impractical due to the sign... |
| ChainScience 2024, Conference Proceedings | Nicolò Vallarano, Claudio J. Tessone | 2024-04-15 | Download | ChainScience 2024, the second edition of the interdisciplinary conference, brought together academics, practitioners, and industry experts to explore novel developments in the realm of distributed led... |
| AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster | Siyuan Li, Youshao Xiao, Fanzhuang Meng, Lin Ju, Lei Liang, Lin Wang, Jun Zhou | 2024-04-15 | Download | Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and compl... |
| AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes | Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou | 2024-04-15 | Download | Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features. |
| Towards a high-performance AI compiler with upstream MLIR | Renato Golin, Lorenzo Chelini, Adam Siemieniuk, Kavitha Madhu, Niranjan Hasabnis, Hans Pabst, Evangelos Georganas, Alexander Heinecke | 2024-04-15 | Download | This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction. |
| Centralization in Proof-of-Stake Blockchains: A Game-Theoretic Analysis of Bootstrapping Protocols | Varul Srivastava, Sankarshan Damle, Sujit Gujar | 2024-04-15 | Download | Proof-of-stake (PoS) has emerged as a natural alternative to the resource-intensive Proof-of-Work (PoW) blockchain, as was recently seen with the Ethereum Merge. |
| Characterization and Mitigation of Insufficiencies in Automated Driving Systems | Yuting Fu, Jochen Seemann, Caspar Hanselaar, Tim Beurskens, Andrei Terechko, Emilia Silvas, Maurice Heemels | 2024-04-15 | Download | Automated Driving (AD) systems have the potential to increase safety, comfort and energy efficiency. Recently, major automotive companies have started testing and validating AD systems (ADS) on public... |
| Noiseless Privacy-Preserving Decentralized Learning | Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos | 2024-04-15 | Download | Decentralized learning (DL) enables collaborative learning without a server and without training data leaving the users' devices. However, the models shared in DL can still be used to infer training d... |
| LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism | Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin | 2024-04-15 | Download | The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same reques... |
| On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up | Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin | 2024-04-15 | Download | A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers. |
| Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network | Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng | 2024-04-15 | Download | Federated learning is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. |

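
Several federated learning entries above describe a parameter server that trains a shared model from many clients without seeing their raw data. A minimal sketch of the server-side step, assuming standard FedAvg-style aggregation (a sample-weighted average of client parameter vectors, as in McMahan et al.), is shown below. It is not the method of any paper listed here, and the function name `federated_average` is hypothetical.

```python
from typing import Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_num_samples: Sequence[int]) -> list[float]:
    """FedAvg-style aggregation: average client parameter vectors, weighting
    each client by its number of local training samples. Only parameters
    reach the server; the clients' raw data never leaves them."""
    if not client_weights or len(client_weights) != len(client_num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must send parameter vectors of equal length")

    global_model = [0.0] * dim
    for w, n in zip(client_weights, client_num_samples):
        share = n / total  # this client's weight in the average
        for i in range(dim):
            global_model[i] += share * w[i]
    return global_model


# Client A holds 1 sample, client B holds 3, so B counts three times as much.
print(federated_average([[1.0, 2.0], [3.0, 6.0]], [1, 3]))  # [2.5, 5.0]
```

Weighting by sample count keeps the aggregate equivalent to averaging over all local examples when every client reports its exact sample count. Real systems layer client sampling, compression, or privacy mechanisms on top of this step.
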
cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Quality of Experience Oriented Cross-layer Optimization for Real-time XR Video Transmission | Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun | 2024-04-15 | Download | Extended reality (XR) is one of the most important applications of beyond 5G and 6G networks. Real-time XR video transmission presents challenges in terms of data rate and delay. |
| Decentralized Multi-Party Multi-Network AI for Global Deployment of 6G Wireless Systems | Merim Dzaferagic, Marco Ruffini, Nina Slamnik-Krijestorac, Joao F. Santos, Johann Marquez-Barja, Christos Tranoris, Spyros Denazis, Thomas Kyriakakis, Panagiotis Karafotis, Luiz DaSilva, Shashi Raj Pandey, Junya Shiraishi, Petar Popovski, Soren Kejser Jensen, Christian Thomsen, Torben Bach Pedersen, Holger Claussen, Jinfeng Du, Gil Zussman, Tingjun Chen, Yiran Chen, Seshu Tirupathi, Ivan Seskar, Daniel Kilper | 2024-04-15 | Download | Multiple visions of 6G networks elicit Artificial Intelligence (AI) as a central, native element. When 6G systems are deployed at a large scale, end-to-end AI-based solutions will necessarily have to ... |
| Improved methodology for longitudinal Web analytics using Common Crawl | Henry S. Thompson | 2024-04-15 | Download | Common Crawl is a multi-petabyte longitudinal dataset containing over 100 billion web pages which is widely used as a source of language data for sequence model training and in web science research. |
| Demonstration of a Networked Music Performance Experience with MEVO | Leonardo Severi, Matteo Sacchetto, Andrea Bianco, Cristina Rottondi, Aleksandra Knapinska, Piotr Lechowicz | 2024-04-15 | Download | In this paper we present a Networked Music Performance system currently under development at Politecnico di Torino. We demonstrate its use in a distributed concert held in June 2023, which featured th... |
| OpenAirLink: Reproducible Wireless Channel Emulation using Software Defined Radios | Yash Deshpande, Xianglong Wang, Wolfgang Kellerer | 2024-04-15 | Download | This paper presents OpenAirLink(OAL), an open-source channel emulator for reproducible testing of wireless scenarios. OAL is implemented on off-the-shelf software-defined radios (SDR) and presents a s... |
| LR-FHSS-Sim: A Discrete-Event Simulator for LR-FHSS Networks | Jean Michel de Souza Sant Ana, Arliones Hoeller, Hirley Alves, Richard Demo Souza | 2024-04-15 | Download | This work presents the LR-FHSS-Sim, a free and open-source discrete-event simulator for LR-FHSS networks. We highlight the importance of network modeling for IoT coverage, especially when it is needed... |
| A Distributed Scalable Cross-chain State Channel Scheme Based on Recursive State Synchronization | Xinyu Liang, Ruiying Du, Jing Chen, Yu Zhang, Meng Jia, Shuangxi Cao, Yufeng Wei, Shixiong Yao | 2024-04-15 | Download | As cross-chain technology continues to advance, the scale of cross-chain transactions is experiencing significant expansion. To improve scalability, researchers have turned to the study of cross-chain... |
| An Autoencoder-Based Constellation Design for AirComp in Wireless Federated Learning | Yujia Mu, Xizixiang Wei, Cong Shen | 2024-04-15 | Download | Wireless federated learning (FL) relies on efficient uplink communications to aggregate model updates across distributed edge devices. Over-the-air computation (a.k.a. |

cs.PF - Performance

| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization | Rishov Sarkar, Rachel Paul, Cong Hao | 2024-04-15 | Download | High-Level Synthesis (HLS) enables rapid prototyping of complex hardware designs by translating C or C++ code to low-level RTL code. However, the testing and evaluation of HLS designs still typically ... |
| On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up | Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin | 2024-04-15 | Download | A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers. |