
2024-04-15

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Field-Programmable Gate Array Architecture for Deep Learning: Survey & Future Directions | Andrew Boutros, Aman Arora, Vaughn Betz | 2024-04-15 | Download | Deep learning (DL) is becoming the cornerstone of numerous applications both in datacenters and at the edge. Specialized hardware is often necessary to meet the performance requirements of state-of-th... |
| Error Detection and Correction Codes for Safe In-Memory Computations | Luca Parrini, Taha Soliman, Benjamin Hettwer, Jan Micha Borrmann, Simranjeet Singh, Ankit Bende, Vikas Rana, Farhad Merchant, Norbert Wehn | 2024-04-15 | Download | In-Memory Computing (IMC) introduces a new paradigm of computation that offers high efficiency in terms of latency and power consumption for AI accelerators. |
| Towards a high-performance AI compiler with upstream MLIR | Renato Golin, Lorenzo Chelini, Adam Siemieniuk, Kavitha Madhu, Niranjan Hasabnis, Hans Pabst, Evangelos Georganas, Alexander Heinecke | 2024-04-15 | Download | This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction. |
| Efficient and accurate neural field reconstruction using resistive memory | Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu | 2024-04-15 | Download | Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency. |
| Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity | Cenlin Duan, Jianlei Yang, Yiou Wang, Yikun Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao | 2024-04-15 | Download | Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency. |
| LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization | Rishov Sarkar, Rachel Paul, Cong Hao | 2024-04-15 | Download | High-Level Synthesis (HLS) enables rapid prototyping of complex hardware designs by translating C or C++ code to low-level RTL code. However, the testing and evaluation of HLS designs still typically ... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Distributing Context-Aware Shared Memory Data Structures: A Case Study on Singly-Linked Lists | Raaghav Ravishankar, Sandeep Kulkarni, Sathya Peri, Gokarna Sharma | 2024-04-15 | Download | In this paper, we study the partitioning of a context-aware shared memory data structure so that it can be implemented as a distributed data structure running on multiple machines. |
| [FEDSTR: Money-In AI-Out \| A Decentralized Marketplace for Federated Learning and LLM Training on the NOSTR Protocol](https://arxiv.org/abs/2404.15834v2) | Konstantinos E. Nikolakakis, George Chantzialexiou, Dionysis Kalogerias | 2024-04-15 | Download | |
| Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning | Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao | 2024-04-15 | Download | E-health allows smart devices and medical institutions to collaboratively collect patients' data, which is trained by Artificial Intelligence (AI) technologies to help doctors make diagnosis. |
| Design and Implementation of a Java-Based Client-Server Application | Omkar Patil, Aarya Shirbhate | 2024-04-15 | Download | This report details the development of a networked distributed system named Group Communication System (GCS), implemented in Java to exemplify socket programming and communication protocols. |
| Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics | Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su | 2024-04-15 | Download | Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of cl... |
| cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores | Zixuan Li, Mingxing Duan, Huizhang Luo, Wangdong Yang, Kenli Li, Keqin Li | 2024-04-15 | Download | Sparse tensors are prevalent in real-world applications, often characterized by their large-scale, high-order, and high-dimensional nature. Directly handling raw tensors is impractical due to the sign... |
| ChainScience 2024, Conference Proceedings | Nicolò Vallarano, Claudio J. Tessone | 2024-04-15 | Download | ChainScience 2024, the second edition of the interdisciplinary conference, brought together academics, practitioners, and industry experts to explore novel developments in the realm of distributed led... |
| AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster | Siyuan Li, Youshao Xiao, Fanzhuang Meng, Lin Ju, Lei Liang, Lin Wang, Jun Zhou | 2024-04-15 | Download | Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and compl... |
| AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes | Youshao Xiao, Lin Ju, Zhenglei Zhou, Siyuan Li, Zhaoxin Huan, Dalong Zhang, Rujie Jiang, Lin Wang, Xiaolu Zhang, Lei Liang, Jun Zhou | 2024-04-15 | Download | Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features. |
| Towards a high-performance AI compiler with upstream MLIR | Renato Golin, Lorenzo Chelini, Adam Siemieniuk, Kavitha Madhu, Niranjan Hasabnis, Hans Pabst, Evangelos Georganas, Alexander Heinecke | 2024-04-15 | Download | This work proposes a compilation flow using open-source compiler passes to build a framework to achieve ninja performance from a generic linear algebra high-level abstraction. |
| Centralization in Proof-of-Stake Blockchains: A Game-Theoretic Analysis of Bootstrapping Protocols | Varul Srivastava, Sankarshan Damle, Sujit Gujar | 2024-04-15 | Download | Proof-of-stake (PoS) has emerged as a natural alternative to the resource-intensive Proof-of-Work (PoW) blockchain, as was recently seen with the Ethereum Merge. |
| Characterization and Mitigation of Insufficiencies in Automated Driving Systems | Yuting Fu, Jochen Seemann, Caspar Hanselaar, Tim Beurskens, Andrei Terechko, Emilia Silvas, Maurice Heemels | 2024-04-15 | Download | Automated Driving (AD) systems have the potential to increase safety, comfort and energy efficiency. Recently, major automotive companies have started testing and validating AD systems (ADS) on public... |
| Noiseless Privacy-Preserving Decentralized Learning | Sayan Biswas, Mathieu Even, Anne-Marie Kermarrec, Laurent Massoulie, Rafael Pires, Rishi Sharma, Martijn de Vos | 2024-04-15 | Download | Decentralized learning (DL) enables collaborative learning without a server and without training data leaving the users' devices. However, the models shared in DL can still be used to infer training d... |
| LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism | Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin | 2024-04-15 | Download | The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same reques... |
| On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up | Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin | 2024-04-15 | Download | A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers. |
| Hybrid FedGraph: An efficient hybrid federated learning algorithm using graph convolutional neural network | Jaeyeon Jang, Diego Klabjan, Veena Mendiratta, Fanfei Meng | 2024-04-15 | Download | Federated learning is an emerging paradigm for decentralized training of machine learning models on distributed clients, without revealing the data to the central server. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Quality of Experience Oriented Cross-layer Optimization for Real-time XR Video Transmission | Guangjin Pan, Shugong Xu, Shunqing Zhang, Xiaojing Chen, Yanzan Sun | 2024-04-15 | Download | Extended reality (XR) is one of the most important applications of beyond 5G and 6G networks. Real-time XR video transmission presents challenges in terms of data rate and delay. |
| Decentralized Multi-Party Multi-Network AI for Global Deployment of 6G Wireless Systems | Merim Dzaferagic, Marco Ruffini, Nina Slamnik-Krijestorac, Joao F. Santos, Johann Marquez-Barja, Christos Tranoris, Spyros Denazis, Thomas Kyriakakis, Panagiotis Karafotis, Luiz DaSilva, Shashi Raj Pandey, Junya Shiraishi, Petar Popovski, Soren Kejser Jensen, Christian Thomsen, Torben Bach Pedersen, Holger Claussen, Jinfeng Du, Gil Zussman, Tingjun Chen, Yiran Chen, Seshu Tirupathi, Ivan Seskar, Daniel Kilper | 2024-04-15 | Download | Multiple visions of 6G networks elicit Artificial Intelligence (AI) as a central, native element. When 6G systems are deployed at a large scale, end-to-end AI-based solutions will necessarily have to ... |
| Improved methodology for longitudinal Web analytics using Common Crawl | Henry S. Thompson | 2024-04-15 | Download | Common Crawl is a multi-petabyte longitudinal dataset containing over 100 billion web pages which is widely used as a source of language data for sequence model training and in web science research. |
| Demonstration of a Networked Music Performance Experience with MEVO | Leonardo Severi, Matteo Sacchetto, Andrea Bianco, Cristina Rottondi, Aleksandra Knapinska, Piotr Lechowicz | 2024-04-15 | Download | In this paper we present a Networked Music Performance system currently under development at Politecnico di Torino. We demonstrate its use in a distributed concert held in June 2023, which featured th... |
| OpenAirLink: Reproducible Wireless Channel Emulation using Software Defined Radios | Yash Deshpande, Xianglong Wang, Wolfgang Kellerer | 2024-04-15 | Download | This paper presents OpenAirLink (OAL), an open-source channel emulator for reproducible testing of wireless scenarios. OAL is implemented on off-the-shelf software-defined radios (SDR) and presents a s... |
| LR-FHSS-Sim: A Discrete-Event Simulator for LR-FHSS Networks | Jean Michel de Souza Sant Ana, Arliones Hoeller, Hirley Alves, Richard Demo Souza | 2024-04-15 | Download | This work presents the LR-FHSS-Sim, a free and open-source discrete-event simulator for LR-FHSS networks. We highlight the importance of network modeling for IoT coverage, especially when it is needed... |
| A Distributed Scalable Cross-chain State Channel Scheme Based on Recursive State Synchronization | Xinyu Liang, Ruiying Du, Jing Chen, Yu Zhang, Meng Jia, Shuangxi Cao, Yufeng Wei, Shixiong Yao | 2024-04-15 | Download | As cross-chain technology continues to advance, the scale of cross-chain transactions is experiencing significant expansion. To improve scalability, researchers have turned to the study of cross-chain... |
| An Autoencoder-Based Constellation Design for AirComp in Wireless Federated Learning | Yujia Mu, Xizixiang Wei, Cong Shen | 2024-04-15 | Download | Wireless federated learning (FL) relies on efficient uplink communications to aggregate model updates across distributed edge devices. Over-the-air computation (a.k.a. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| LightningSimV2: Faster and Scalable Simulation for High-Level Synthesis via Graph Compilation and Optimization | Rishov Sarkar, Rachel Paul, Cong Hao | 2024-04-15 | Download | High-Level Synthesis (HLS) enables rapid prototyping of complex hardware designs by translating C or C++ code to low-level RTL code. However, the testing and evaluation of HLS designs still typically ... |
| On Optimal Server Allocation for Moldable Jobs with Concave Speed-Up | Samira Ghanbarian, Arpan Mukhopadhyay, Ravi R. Mazumdar, Fabrice M. Guillemin | 2024-04-15 | Download | A large proportion of jobs submitted to modern computing clusters and data centers are parallelizable and capable of running on a flexible number of computing cores or servers. |
