2024-05-01

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond | Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao | 2024-05-01 | Download | Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). |
| SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-05-01 | Download | Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. |
| vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs | Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang | 2024-05-01 | Download | IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). |
| Towards Green AI: Current status and future research | Christian Clemm, Lutz Stobbe, Kishan Wimalawarne, Jan Druschke | 2024-05-01 | Download | The immense technological progress in artificial intelligence research and applications is increasingly drawing attention to the environmental sustainability of such systems, a field that has been ter... |
| Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey | Dayou Du, Gu Gong, Xiaowen Chu | 2024-05-01 | Download | Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning | Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang | 2024-05-01 | Download | Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task ... |
| Tight Lower Bounds in the Supported LOCAL Model | Alkida Balliu, Thomas Boudier, Sebastian Brandt, Dennis Olivetti | 2024-05-01 | Download | We study the complexity of fundamental distributed graph problems in the recently popular setting where information about the input graph is available to the nodes before the start of the computation. |
| SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-05-01 | Download | Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. |
| A Fast Confirmation Rule (aka Fast Synchronous Finality) for the Ethereum Consensus Protocol | Aditya Asgaonkar, Francesco D'Amato, Roberto Saltini, Luca Zanolini, Chenyi Zhang | 2024-05-01 | Download | A Confirmation Rule, within blockchain networks, refers to an algorithm implemented by network nodes that determines (either probabilistically or deterministically) the permanence of certain blocks on... |
| A New Approach for Evaluating the Performance of Distributed Latency-Sensitive Services | Theodoros Theodoropoulos, John Violos, Antonios Makris, Konstantinos Tserpes | 2024-05-01 | Download | Conventional latency metrics are formulated based on a broad definition of traditional monolithic services, and hence lack the capacity to address the complexities inherent in modern services and dist... |
| Porting HPC Applications to AMD Instinct$^\text{TM}$ MI300A Using Unified Memory and OpenMP | Suyash Tandon, Leopold Grinberg, Gheorghe-Teodor Bercea, Carlo Bertolli, Mark Olesen, Simone Bnà, Nicholas Malaya | 2024-05-01 | Download | AMD Instinct$^\text{TM}$ MI300A is the world's first data center accelerated processing unit (APU) with memory shared between the AMD "Zen 4" EPYC$^\text{TM}$ cores and third generation CDNA... |
| On the Potential of Re-configurable Intelligent Surface (RIS)-assisted Physical Layer Authentication (PLA) | Hala Amin, Waqas Aman, Saif Al-Kuwari | 2024-05-01 | Download | Re-configurable Intelligent Surfaces (RIS) technology is increasingly becoming a potential component for next-generation wireless networks, offering enhanced performance in terms of throughput, spectr... |
| A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices | Takahiro Katagiri, Jun'ichi Iwata, Kazuyuki Uchida | 2024-05-01 | Download | In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a superc... |
| Improved Massively Parallel Triangle Counting in $O(1)$ Rounds | Quanquan C. Liu, C. Seshadhri | 2024-05-01 | Download | In this short note, we give a novel algorithm for $O(1)$ round triangle counting in bounded arboricity graphs. Counting triangles in $O(1)$ rounds (exactly) is listed as one of the interesting remaini... |
| Xabclib: A Fully Auto-tuned Sparse Iterative Solver | Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Shoji Itoh, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima | 2024-05-01 | Download | In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to establish the reusability of AT functions. |
| A Hybrid Approach to Monitor Context Parameters for Optimising Caching for Context-Aware IoT Applications | Ashish Manchanda, Prem Prakash Jayaraman, Abhik Banerjee, Arkady Zaslavsky, Shakthi Weerasinghe, Guang-Li Huang | 2024-05-01 | Download | Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling | Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan | 2024-05-01 | Download | As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing ... |
| Optimizing Profitability in Timely Gossip Networks | Priyanka Kaswan, Melih Bastopcu, Sennur Ulukus, S. Rasoul Etesami, Tamer Başar | 2024-05-01 | Download | We consider a communication system where a group of users, interconnected in a bidirectional gossip network, wishes to follow a time-varying source, e.g., updates on an event, in real-time. |
| Cross-Cluster Networking to Support Extended Reality Services | Theodoros Theodoropoulos, Luis Rosa, Abderrahmane Boudi, Tarik Zakaria Benmerar, Antonios Makris, Tarik Taleb, Luis Cordeiro, Konstantinos Tserpes, JaeSeung Song | 2024-05-01 | Download | Extended Reality (XR) refers to a class of contemporary services that are intertwined with a plethora of rather demanding Quality of Service (QoS) and functional requirements. |
| FMLFS: A Federated Multi-Label Feature Selection Based on Information Theory in IoT Environment | Afsaneh Mahanipour, Hana Khamfroush | 2024-05-01 | Download | In certain emerging applications such as health monitoring wearable and traffic monitoring systems, Internet-of-Things (IoT) devices generate or collect a huge amount of multi-label datasets. |
| Cell Switching in HAPS-Aided Networking: How the Obscurity of Traffic Loads Affects the Decision | Berk Çiloğlu, Görkem Berkay Koç, Metin Ozturk, Halim Yanikomeroglu | 2024-05-01 | Download | This study aims to introduce the cell load estimation problem of cell switching approaches in cellular networks specially-presented in a high-altitude platform station (HAPS)-assisted network. |
| Robot-As-A-Sensor: Forming a Sensing Network with Robots for Underground Mining Missions | Xiaoyu Ai, Chengpei Xu, Binghao Li, Feng Xia | 2024-05-01 | Download | Nowadays, robots are deployed as mobile platforms equipped with sensing, communication and computing capabilities, especially in the mining industry, where they perform tasks in hazardous and repetiti... |
| A Hybrid Approach to Monitor Context Parameters for Optimising Caching for Context-Aware IoT Applications | Ashish Manchanda, Prem Prakash Jayaraman, Abhik Banerjee, Arkady Zaslavsky, Shakthi Weerasinghe, Guang-Li Huang | 2024-05-01 | Download | Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning | Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang | 2024-05-01 | Download | Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task ... |
| SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-05-01 | Download | Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. |
| A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices | Takahiro Katagiri, Jun'ichi Iwata, Kazuyuki Uchida | 2024-05-01 | Download | In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a superc... |
| Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey | Dayou Du, Gu Gong, Xiaowen Chu | 2024-05-01 | Download | Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. |
| Xabclib: A Fully Auto-tuned Sparse Iterative Solver | Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Shoji Itoh, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima | 2024-05-01 | Download | In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to establish the reusability of AT functions. |
