2024-05-01

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond	Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao	2024-05-01	Download	Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE).
SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators	Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque	2024-05-01	Download	Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware.
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs	Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang	2024-05-01	Download	IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN).
Towards Green AI: Current status and future research	Christian Clemm, Lutz Stobbe, Kishan Wimalawarne, Jan Druschke	2024-05-01	Download	The immense technological progress in artificial intelligence research and applications is increasingly drawing attention to the environmental sustainability of such systems, a field that has been ter...
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey	Dayou Du, Gu Gong, Xiaowen Chu	2024-05-01	Download	Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning	Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang	2024-05-01	Download	Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task ...
Tight Lower Bounds in the Supported LOCAL Model	Alkida Balliu, Thomas Boudier, Sebastian Brandt, Dennis Olivetti	2024-05-01	Download	We study the complexity of fundamental distributed graph problems in the recently popular setting where information about the input graph is available to the nodes before the start of the computation.
SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators	Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque	2024-05-01	Download	Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware.
A Fast Confirmation Rule (aka Fast Synchronous Finality) for the Ethereum Consensus Protocol	Aditya Asgaonkar, Francesco D'Amato, Roberto Saltini, Luca Zanolini, Chenyi Zhang	2024-05-01	Download	A Confirmation Rule, within blockchain networks, refers to an algorithm implemented by network nodes that determines (either probabilistically or deterministically) the permanence of certain blocks on...
A New Approach for Evaluating the Performance of Distributed Latency-Sensitive Services	Theodoros Theodoropoulos, John Violos, Antonios Makris, Konstantinos Tserpes	2024-05-01	Download	Conventional latency metrics are formulated based on a broad definition of traditional monolithic services, and hence lack the capacity to address the complexities inherent in modern services and dist...
Porting HPC Applications to AMD Instinct™ MI300A Using Unified Memory and OpenMP	Suyash Tandon, Leopold Grinberg, Gheorghe-Teodor Bercea, Carlo Bertolli, Mark Olesen, Simone Bnà, Nicholas Malaya	2024-05-01	Download	AMD Instinct™ MI300A is the world's first data center accelerated processing unit (APU) with memory shared between the AMD "Zen 4" EPYC™ cores and third generation CDNA™...
On the Potential of Re-configurable Intelligent Surface (RIS)-assisted Physical Layer Authentication (PLA)	Hala Amin, Waqas Aman, Saif Al-Kuwari	2024-05-01	Download	Re-configurable Intelligent Surfaces (RIS) technology is increasingly becoming a potential component for next-generation wireless networks, offering enhanced performance in terms of throughput, spectr...
A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices	Takahiro Katagiri, Jun'ichi Iwata, Kazuyuki Uchida	2024-05-01	Download	In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a superc...
Improved Massively Parallel Triangle Counting in $O(1)$ Rounds	Quanquan C. Liu, C. Seshadhri	2024-05-01	Download	In this short note, we give a novel algorithm for $O(1)$ round triangle counting in bounded arboricity graphs. Counting triangles in $O(1)$ rounds (exactly) is listed as one of the interesting remaini...
Xabclib:A Fully Auto-tuned Sparse Iterative Solver	Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Shoji Itoh, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima	2024-05-01	Download	In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to establish the reusability of AT functions.
A Hybrid Approach to Monitor Context Parameters for Optimising Caching for Context-Aware IoT Applications	Ashish Manchanda, Prem Prakash Jayaraman, Abhik Banerjee, Arkady Zaslavsky, Shakthi Weerasinghe, Guang-Li Huang	2024-05-01	Download	Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling	Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan	2024-05-01	Download	As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing ...
Optimizing Profitability in Timely Gossip Networks	Priyanka Kaswan, Melih Bastopcu, Sennur Ulukus, S. Rasoul Etesami, Tamer Başar	2024-05-01	Download	We consider a communication system where a group of users, interconnected in a bidirectional gossip network, wishes to follow a time-varying source, e.g., updates on an event, in real-time.
Cross-Cluster Networking to Support Extended Reality Services	Theodoros Theodoropoulos, Luis Rosa, Abderrahmane Boudi, Tarik Zakaria Benmerar, Antonios Makris, Tarik Taleb, Luis Cordeiro, Konstantinos Tserpes, JaeSeung Song	2024-05-01	Download	Extended Reality (XR) refers to a class of contemporary services that are intertwined with a plethora of rather demanding Quality of Service (QoS) and functional requirements.
FMLFS: A Federated Multi-Label Feature Selection Based on Information Theory in IoT Environment	Afsaneh Mahanipour, Hana Khamfroush	2024-05-01	Download	In certain emerging applications such as health monitoring wearable and traffic monitoring systems, Internet-of-Things (IoT) devices generate or collect a huge amount of multi-label datasets.
Cell Switching in HAPS-Aided Networking: How the Obscurity of Traffic Loads Affects the Decision	Berk Çiloğlu, Görkem Berkay Koç, Metin Ozturk, Halim Yanikomeroglu	2024-05-01	Download	This study aims to introduce the cell load estimation problem of cell switching approaches in cellular networks specially-presented in a high-altitude platform station (HAPS)-assisted network.
Robot-As-A-Sensor: Forming a Sensing Network with Robots for Underground Mining Missions	Xiaoyu Ai, Chengpei Xu, Binghao Li, Feng Xia	2024-05-01	Download	Nowadays, robots are deployed as mobile platforms equipped with sensing, communication and computing capabilities, especially in the mining industry, where they perform tasks in hazardous and repetiti...
A Hybrid Approach to Monitor Context Parameters for Optimising Caching for Context-Aware IoT Applications	Ashish Manchanda, Prem Prakash Jayaraman, Abhik Banerjee, Arkady Zaslavsky, Shakthi Weerasinghe, Guang-Li Huang	2024-05-01	Download	Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning	Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang	2024-05-01	Download	Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task ...
SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators	Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque	2024-05-01	Download	Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware.
A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices	Takahiro Katagiri, Jun'ichi Iwata, Kazuyuki Uchida	2024-05-01	Download	In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a superc...
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey	Dayou Du, Gu Gong, Xiaowen Chu	2024-05-01	Download	Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications.
Xabclib:A Fully Auto-tuned Sparse Iterative Solver	Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Shoji Itoh, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima	2024-05-01	Download	In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to establish the reusability of AT functions.