2024-05-01
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond | Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao | 2024-05-01 | Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). |
| SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-05-01 | Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. |
| vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs | Size Zheng, Renze Chen, Meng Li, Zihao Ye, Luis Ceze, Yun Liang | 2024-05-01 | IoT devices based on microcontroller units (MCU) provide ultra-low power consumption and ubiquitous computation for near-sensor deep learning models (DNN). |
| Towards Green AI: Current status and future research | Christian Clemm, Lutz Stobbe, Kishan Wimalawarne, Jan Druschke | 2024-05-01 | The immense technological progress in artificial intelligence research and applications is increasingly drawing attention to the environmental sustainability of such systems, a field that has been ter... |
| Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey | Dayou Du, Gu Gong, Xiaowen Chu | 2024-05-01 | Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning | Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang | 2024-05-01 | Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task ... |
| Tight Lower Bounds in the Supported LOCAL Model | Alkida Balliu, Thomas Boudier, Sebastian Brandt, Dennis Olivetti | 2024-05-01 | We study the complexity of fundamental distributed graph problems in the recently popular setting where information about the input graph is available to the nodes before the start of the computation. |
| SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-05-01 | Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. |
| A Fast Confirmation Rule (aka Fast Synchronous Finality) for the Ethereum Consensus Protocol | Aditya Asgaonkar, Francesco D'Amato, Roberto Saltini, Luca Zanolini, Chenyi Zhang | 2024-05-01 | A Confirmation Rule, within blockchain networks, refers to an algorithm implemented by network nodes that determines (either probabilistically or deterministically) the permanence of certain blocks on... |
| A New Approach for Evaluating the Performance of Distributed Latency-Sensitive Services | Theodoros Theodoropoulos, John Violos, Antonios Makris, Konstantinos Tserpes | 2024-05-01 | Conventional latency metrics are formulated based on a broad definition of traditional monolithic services, and hence lack the capacity to address the complexities inherent in modern services and dist... |
| Porting HPC Applications to AMD Instinct MI300A Using Unified Memory and OpenMP | Suyash Tandon, Leopold Grinberg, Gheorghe-Teodor Bercea, Carlo Bertolli, Mark Olesen, Simone Bnà, Nicholas Malaya | 2024-05-01 | AMD Instinct MI300A is the world's first data center accelerated processing unit (APU) with memory shared between the AMD "Zen 4" EPYC cores and third generation CDNA™... |
| On the Potential of Re-configurable Intelligent Surface (RIS)-assisted Physical Layer Authentication (PLA) | Hala Amin, Waqas Aman, Saif Al-Kuwari | 2024-05-01 | Re-configurable Intelligent Surfaces (RIS) technology is increasingly becoming a potential component for next-generation wireless networks, offering enhanced performance in terms of throughput, spectr... |
| A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices | Takahiro Katagiri, Jun'ichi Iwata, Kazuyuki Uchida | 2024-05-01 | In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a superc... |
| Improved Massively Parallel Triangle Counting in Rounds | Quanquan C. Liu, C. Seshadhri | 2024-05-01 | In this short note, we give a novel algorithm for round triangle counting in bounded arboricity graphs. Counting triangles in rounds (exactly) is listed as one of the interesting remaini... |
| Xabclib: A Fully Auto-tuned Sparse Iterative Solver | Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Shoji Itoh, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima | 2024-05-01 | In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to establish the reusability of AT functions. |
| A Hybrid Approach to Monitor Context Parameters for Optimising Caching for Context-Aware IoT Applications | Ashish Manchanda, Prem Prakash Jayaraman, Abhik Banerjee, Arkady Zaslavsky, Shakthi Weerasinghe, Guang-Li Huang | 2024-05-01 | Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling | Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Hao Wang, Xin Fu, Miao Pan | 2024-05-01 | As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing ... |
| Optimizing Profitability in Timely Gossip Networks | Priyanka Kaswan, Melih Bastopcu, Sennur Ulukus, S. Rasoul Etesami, Tamer Başar | 2024-05-01 | We consider a communication system where a group of users, interconnected in a bidirectional gossip network, wishes to follow a time-varying source, e.g., updates on an event, in real-time. |
| Cross-Cluster Networking to Support Extended Reality Services | Theodoros Theodoropoulos, Luis Rosa, Abderrahmane Boudi, Tarik Zakaria Benmerar, Antonios Makris, Tarik Taleb, Luis Cordeiro, Konstantinos Tserpes, JaeSeung Song | 2024-05-01 | Extended Reality (XR) refers to a class of contemporary services that are intertwined with a plethora of rather demanding Quality of Service (QoS) and functional requirements. |
| FMLFS: A Federated Multi-Label Feature Selection Based on Information Theory in IoT Environment | Afsaneh Mahanipour, Hana Khamfroush | 2024-05-01 | In certain emerging applications such as health monitoring wearable and traffic monitoring systems, Internet-of-Things (IoT) devices generate or collect a huge amount of multi-label datasets. |
| Cell Switching in HAPS-Aided Networking: How the Obscurity of Traffic Loads Affects the Decision | Berk Çiloğlu, Görkem Berkay Koç, Metin Ozturk, Halim Yanikomeroglu | 2024-05-01 | This study aims to introduce the cell load estimation problem of cell switching approaches in cellular networks specially-presented in a high-altitude platform station (HAPS)-assisted network. |
| Robot-As-A-Sensor: Forming a Sensing Network with Robots for Underground Mining Missions | Xiaoyu Ai, Chengpei Xu, Binghao Li, Feng Xia | 2024-05-01 | Nowadays, robots are deployed as mobile platforms equipped with sensing, communication and computing capabilities, especially in the mining industry, where they perform tasks in hazardous and repetiti... |
| A Hybrid Approach to Monitor Context Parameters for Optimising Caching for Context-Aware IoT Applications | Ashish Manchanda, Prem Prakash Jayaraman, Abhik Banerjee, Arkady Zaslavsky, Shakthi Weerasinghe, Guang-Li Huang | 2024-05-01 | Internet of Things (IoT) has seen a prolific rise in recent times and provides the ability to solve several key challenges faced by our societies and environment. |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Communication-Efficient Training Workload Balancing for Decentralized Multi-Agent Learning | Seyed Mahmoud Sajjadi Mohammadabadi, Lei Yang, Feng Yan, Junshan Zhang | 2024-05-01 | Decentralized Multi-agent Learning (DML) enables collaborative model training while preserving data privacy. However, inherent heterogeneity in agents' resources (computation, communication, and task ... |
| SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-05-01 | Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. |
| A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices | Takahiro Katagiri, Jun'ichi Iwata, Kazuyuki Uchida | 2024-05-01 | In this paper, a parallel symmetric eigensolver with very small matrices in massively parallel processing is considered. We define very small matrices that fit the sizes of caches per node in a superc... |
| Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey | Dayou Du, Gu Gong, Xiaowen Chu | 2024-05-01 | Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. |
| Xabclib: A Fully Auto-tuned Sparse Iterative Solver | Takahiro Katagiri, Takao Sakurai, Mitsuyoshi Igai, Shoji Itoh, Satoshi Ohshima, Hisayasu Kuroda, Ken Naono, Kengo Nakajima | 2024-05-01 | In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to establish the reusability of AT functions. |