2024-04-15

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Field-Programmable Gate Array Architecture for Deep Learning: Survey & Future Directions	Andrew Boutros, Aman Arora, Vaughn Betz	2024-04-15	Download	Deep learning (DL) is becoming the cornerstone of numerous applications both in datacenters and at the edge. Specialized hardware is often necessary to meet the performance requirements of state-of-th...
Error Detection and Correction Codes for Safe In-Memory Computations	Luca Parrini, Taha Soliman, Benjamin Hettwer, Jan Micha Borrmann, Simranjeet Singh, Ankit Bende, Vikas Rana, Farhad Merchant, Norbert Wehn	2024-04-15	Download	In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators.
Towards a high-performance AI compiler with upstream MLIR	Renato Golin, Lorenzo Chelini, Adam Siemieniuk, Kavitha Madhu, Niranjan Hasabnis, Hans Pabst, Evangelos Georganas, Alexander Heinecke	2024-04-15	Download	This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction.
Efficient and accurate neural field reconstruction using resistive memory	Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu	2024-04-15	Download	Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency.
Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity	Cenlin Duan, Jianlei Yang, Yiou Wang, Yikun Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao	2024-04-15	Download	Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency.
LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization	Rishov Sarkar, Rachel Paul, Cong Hao	2024-04-15	Download	High-Level Synthesis (HLS) enables rapid prototyping of complex hardware designs by translating C or C++ code to low-level RTL code. However, the testing and evaluation of HLS designs still typically ...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Distributing Context-Aware Shared Memory Data Structures: A Case Study on Singly-Linked Lists	Raaghav Ravishankar, Sandeep Kulkarni, Sathya Peri, Gokarna Sharma	2024-04-15	Download	In this paper, we study the partitioning of a context-aware shared memory data structure so that it can be implemented as a distributed data structure running on multiple machines.
[FEDSTR: Money-In AI-Out | A Decentralized Marketplace for Federated Learning and LLM Training on the NOSTR Protocol](https://arxiv.org/abs/2404.15834v2)	Konstantinos E. Nikolakakis, George Chantzialexiou, Dionysis Kalogerias	2024-04-15	Download
Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning	Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao	2024-04-15	Download	E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained by Artificial Intelligence (AI) technologies to help doctors make diagnosis.
Design and Implementation of a Java-Based Client-Server Application	Omkar Patil, Aarya Shirbhate	2024-04-15	Download	This report details the development of a networked distributed system named Group Communication System (GCS), implemented in Java to exemplify socket programming and communication protocols.
Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics	Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su	2024-04-15	Download	Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of cl...
cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores	Zixuan Li, Mingxing Duan, Huizhang Luo, Wangdong Yang, Kenli Li, Keqin Li	2024-04-15	Download	Sparse tensors are prevalent in real-world applications, often characterized by their large-scale, high-order, and high-dimensional nature. Directly handling raw tensors is impractical due to the sign...
ChainScience 2024, Conference Proceedings	Nicolò Vallarano, Claudio J. Tessone	2024-04-15	Download	ChainScience 2024, the second edition of the interdisciplinary conference, brought together academics, practitioners, and industry experts to explore novel developments in the realm of distributed led...
AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster	Siyuan Li, Youshao Xiao, Fanzhuang Meng, Lin Ju, Lei Liang, Lin Wang, Jun Zhou	2024-04-15	Download	Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and compl...
AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes	Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou	2024-04-15	Download	Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features.
Towards a high-performance AI compiler with upstream MLIR	Renato Golin, Lorenzo Chelini, Adam Siemieniuk, Kavitha Madhu, Niranjan Hasabnis, Hans Pabst, Evangelos Georganas, Alexander Heinecke	2024-04-15	Download	This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction.
Centralization in Proof-of-Stake Blockchains: A Game-Theoretic Analysis of Bootstrapping Protocols	Varul Srivastava, Sankarshan Damle, Sujit Gujar	2024-04-15	Download	Proof-of-stake (PoS) has emerged as a natural alternative to the resource-intensive Proof-of-Work (PoW) blockchain, as was recently seen with the Ethereum Merge.
Characterization and Mitigation of Insufficiencies in Automated Driving Systems	Yuting Fu, Jochen Seemann, Caspar Hanselaar, Tim Beurskens, Andrei Terechko, Emilia Silvas, Maurice Heemels	2024-04-15	Download	Automated Driving (AD) systems have the potential to increase safety, comfort and energy efficiency. Recently, major automotive companies have started testing and validating AD systems (ADS) on public...
Noiseless Privacy-Preserving Decentralized Learning	Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos	2024-04-15	Download	Decentralized learning (DL) enables collaborative learning without a server and without training data leaving the users' devices. However, the models shared in DL can still be used to infer training d...
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism	Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin	2024-04-15	Download	The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same reques...
On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up	Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin	2024-04-15	Download	A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers.
Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network	Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng	2024-04-15	Download	Federated learning is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Quality of Experience Oriented Cross-layer Optimization for Real-time XR Video Transmission	Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun	2024-04-15	Download	Extended reality (XR) is one of the most important applications of beyond 5G and 6G networks. Real-time XR video transmission presents challenges in terms of data rate and delay.
Decentralized Multi-Party Multi-Network AI for Global Deployment of 6G Wireless Systems	Merim Dzaferagic, Marco Ruffini, Nina Slamnik-Krijestorac, Joao F. Santos, Johann Marquez-Barja, Christos Tranoris, Spyros Denazis, Thomas Kyriakakis, Panagiotis Karafotis, Luiz DaSilva, Shashi Raj Pandey, Junya Shiraishi, Petar Popovski, Soren Kejser Jensen, Christian Thomsen, Torben Bach Pedersen, Holger Claussen, Jinfeng Du, Gil Zussman, Tingjun Chen, Yiran Chen, Seshu Tirupathi, Ivan Seskar, Daniel Kilper	2024-04-15	Download	Multiple visions of 6G networks elicit Artificial Intelligence (AI) as a central, native element. When 6G systems are deployed at a large scale, end-to-end AI-based solutions will necessarily have to ...
Improved methodology for longitudinal Web analytics using Common Crawl	Henry S. Thompson	2024-04-15	Download	Common Crawl is a multi-petabyte longitudinal dataset containing over 100 billion web pages which is widely used as a source of language data for sequence model training and in web science research.
Demonstration of a Networked Music Performance Experience with MEVO	Leonardo Severi, Matteo Sacchetto, Andrea Bianco, Cristina Rottondi, Aleksandra Knapinska, Piotr Lechowicz	2024-04-15	Download	In this paper we present a Networked Music Performance system currently under development at Politecnico di Torino. We demonstrate its use in a distributed concert held in June 2023, which featured th...
OpenAirLink: Reproducible Wireless Channel Emulation using Software Defined Radios	Yash Deshpande, Xianglong Wang, Wolfgang Kellerer	2024-04-15	Download	This paper presents OpenAirLink (OAL), an open-source channel emulator for reproducible testing of wireless scenarios. OAL is implemented on off-the-shelf software-defined radios (SDR) and presents a s...
LR-FHSS-Sim: A Discrete-Event Simulator for LR-FHSS Networks	Jean Michel de Souza Sant Ana, Arliones Hoeller, Hirley Alves, Richard Demo Souza	2024-04-15	Download	This work presents the LR-FHSS-Sim, a free and open-source discrete-event simulator for LR-FHSS networks. We highlight the importance of network modeling for IoT coverage, especially when it is needed...
A Distributed Scalable Cross-chain State Channel Scheme Based on Recursive State Synchronization	Xinyu Liang, Ruiying Du, Jing Chen, Yu Zhang, Meng Jia, Shuangxi Cao, Yufeng Wei, Shixiong Yao	2024-04-15	Download	As cross-chain technology continues to advance, the scale of cross-chain transactions is experiencing significant expansion. To improve scalability, researchers have turned to the study of cross-chain...
An Autoencoder-Based Constellation Design for AirComp in Wireless Federated Learning	Yujia Mu, Xizixiang Wei, Cong Shen	2024-04-15	Download	Wireless federated learning (FL) relies on efficient uplink communications to aggregate model updates across distributed edge devices. Over-the-air computation (a.k.a.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization	Rishov Sarkar, Rachel Paul, Cong Hao	2024-04-15	Download	High-Level Synthesis (HLS) enables rapid prototyping of complex hardware designs by translating C or C++ code to low-level RTL code. However, the testing and evaluation of HLS designs still typically ...
On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up	Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin	2024-04-15	Download	A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers.