2024-04-17

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads	Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon	2024-04-17	Download	Among ML operators today, GEneral Matrix Multiplication (GEMM)-based operators are known to be key operators that build the main backbone of ML models.
Functionality Locality, Mixture & Control = Logic = Memory	Xiangjun Peng	2024-04-17	Download	This work provides new insights and constructs to the field of computer architecture and systems, and these insights are expected to be useful for the broad software stack.
Real Time Evolvable Hardware for Optimal Reconfiguration of Cusp-Like Pulse Shapers	Juan Lanchares, Oscar Garnica, José L. Risco-Martín, J. Ignacio Hidalgo, J. Manuel Colmenar, Alfredo Cuesta	2024-04-17	Download	The design of a cusp-like digital pulse shaper for particle energy measurements requires the definition of four parameters whose values are defined based on the nature of the shaper input signal (timi...
Revisiting Main Memory-Based Covert and Side Channel Attacks in the Context of Processing-in-Memory	F. Nisa Bostanci, Konstantinos Kanellopoulos, Ataberk Olgun, A. Giray Yaglikci, Ismail Emir Yuksel, Nika Mansouri Ghiasi, Zulal Bingol, Mohammad Sadrosadati, Onur Mutlu	2024-04-17	Download	We introduce IMPACT, a set of high-throughput main memory-based timing attacks that leverage characteristics of processing-in-memory (PiM) architectures to establish covert and side channels.
Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs	Endri Taka, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, Aman Arora	2024-04-17	Download	FPGAs are a promising platform for accelerating Deep Learning (DL) applications, due to their high performance, low power consumption, and reconfigurability.
Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access	Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang, Tianyue Lu, Mingyu Chen, Siwei Luo, Keji Huang	2024-04-17	Download	The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, Accelerator and CSD	Jia Wei, Xingjun Zhang, Witold Pedrycz, Longxiang Wang, Jie Zhao	2024-04-17	Download	For image-related deep learning tasks, the first step often involves reading data from external storage and performing preprocessing on the CPU.
Simulating Cloud Environments of Connected Vehicles for Anomaly Detection	M. Weiß, J. Stümpfle, F. Dettinger, N. Jazdi, M. Weyrich	2024-04-17	Download	The emergence of connected vehicles is driven by increasing customer and regulatory demands. To meet these, more complex software applications, some of which require service-based cloud and edge backe...
Araucaria: Simplifying INC Fault Tolerance with High-Level Intents	Ricardo Parizotto, Israat Haque, Alberto Schaeffer-Filho	2024-04-17	Download	Network programmability allows modification of fine-grain data plane functionality. The performance benefits of data plane programmability have motivated many researchers to offload computation that p...
A Secure and Trustworthy Network Architecture for Federated Learning Healthcare Applications	Antonio Boiano, Marco Di Gennaro, Luca Barbieri, Michele Carminati, Monica Nicoli, Alessandro Redondi, Stefano Savazzi, Albert Sund Aillet, Diogo Reis Santos, Luigi Serio	2024-04-17	Download	Federated Learning (FL) has emerged as a promising approach for privacy-preserving machine learning, particularly in sensitive domains such as healthcare.
FLeeC: a Fast Lock-Free Application Cache	André J. Costa, Nuno M. Preguiça, João M. Lourenço	2024-04-17	Download	When compared to blocking concurrency, non-blocking concurrency can provide higher performance in parallel shared-memory contexts, especially in high contention scenarios.
Hierarchical storage management in user space for neuroimaging applications	Valérie Hayot-Sasson, Tristan Glatard	2024-04-17	Download	Neuroimaging open-data initiatives have led to increased availability of large scientific datasets. While these datasets are shifting the processing bottleneck from compute-intensive to data-intensive...
IoTSim-Osmosis-RES: Towards autonomic renewable energy-aware osmotic computing	Tomasz Szydlo, Amadeusz Szabala, Nazar Kordiumov, Konrad Siuzdak, Lukasz Wolski, Khaled Alwasel, Fawzy Habeeb, Rajiv Ranjan	2024-04-17	Download	Internet of Things systems exist in various areas of our everyday life. For example, sensors installed in smart cities and homes are processed in edge and cloud computing centres providing several be...
Quantum Cloud Computing: A Review, Open Problems, and Future Directions	Hoa T. Nguyen, Prabhakar Krishnan, Dilip Krishnaswamy, Muhammad Usman, Rajkumar Buyya	2024-04-17	Download	Quantum cloud computing is an emerging paradigm of computing that empowers quantum applications and their deployment on quantum computing resources without the need for a specialized environment to ho...
Distributed Fractional Bayesian Learning for Adaptive Optimization	Yaqun Yang, Jinlong Lei, Guanghui Wen, Yiguang Hong	2024-04-17	Download	This paper considers a distributed adaptive optimization problem, where all agents only have access to their local cost functions with a common unknown parameter, whereas they mean to collaboratively ...
Accelerating Geo-distributed Machine Learning with Network-Aware Adaptive Tree and Auxiliary Route	Zonghang Li, Wenjiao Feng, Weibo Cai, Hongfang Yu, Long Luo, Gang Sun, Hongyang Du, Dusit Niyato	2024-04-17	Download	Distributed machine learning is becoming increasingly popular for geo-distributed data analytics, facilitating the collaborative analysis of data scattered across data centers in different regions.
Undo and Redo Support for Replicated Registers	Leo Stewen, Martin Kleppmann	2024-04-17	Download	Undo and redo functionality is ubiquitous in collaboration software. In single user settings, undo and redo are well understood. However, when multiple users edit a document, concurrency may arise, le...
Mutiny! How does Kubernetes fail, and what can we do about it?	Marco Barletta, Marcello Cinque, Catello Di Martino, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer	2024-04-17	Download	In this paper, we i) analyze and classify real-world failures of Kubernetes (the most popular container orchestration system), ii) develop a framework to perform a fault/error injection campaign targe...
XMiner: Efficient Directed Subgraph Matching with Pattern Reduction	Pingpeng Yuan, Yujiang Wang, Tianyu Ma, Siyuan He, Ling Liu	2024-04-17	Download	Graph pattern matching, one of the fundamental graph mining problems, aims to extract structural patterns of interest from an input graph. The state-of-the-art graph matching algorithms and systems ar...
ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours	Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch	2024-04-17	Download	AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training ...
Approximate Wireless Communication for Lossy Gradient Updates in IoT Federated Learning	Xiang Ma, Haijian Sun, Rose Qingyang Hu, Yi Qian	2024-04-17	Download	Federated learning (FL) has emerged as a distributed machine learning (ML) technique that can protect local data privacy for participating clients and improve system efficiency.
FedFa: A Fully Asynchronous Training Paradigm for Federated Learning	Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, Khalid Ayed Alharthi, Jiannong Cao	2024-04-17	Download	Federated learning has been identified as an efficient decentralized training paradigm for scaling the machine learning model training on a large number of devices while guaranteeing the data privacy ...
A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation	Jinghai He, Haoyu Liu, Yuhang Wu, Zeyu Zheng, Tingyu Zhu	2024-04-17	Download	We provide a preliminary study on utilizing GPU (Graphics Processing Unit) to accelerate computation for three simulation optimization tasks with either first-order or second-order algorithms.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Araucaria: Simplifying INC Fault Tolerance with High-Level Intents	Ricardo Parizotto, Israat Haque, Alberto Schaeffer-Filho	2024-04-17	Download	Network programmability allows modification of fine-grain data plane functionality. The performance benefits of data plane programmability have motivated many researchers to offload computation that p...
SERENE: A Collusion Resilient Replication-based Verification Framework	Amir Esmaeili, Abderrahmen Mtibaa	2024-04-17	Download	The rapid advancement of autonomous driving technology is accompanied by substantial challenges, particularly the reliance on remote task execution without ensuring a reliable and accurate returned re...
What-if Analysis Framework for Digital Twins in 6G Wireless Network Management	Elif Ak, Berk Canberk, Vishal Sharma, Octavia A. Dobre, Trung Q. Duong	2024-04-17	Download	This study explores implementing a digital twin network (DTN) for efficient 6G wireless network management, aligning with the fault, configuration, accounting, performance, and security (FCAPS) model.
Enhancing Data Privacy In Wireless Sensor Networks: Investigating Techniques And Protocols To Protect Privacy Of Data Transmitted Over Wireless Sensor Networks In Critical Applications Of Healthcare And National Security	Akinsola Ahmed, Ejiofor Oluomachi, Akinde Abdullah, Njoku Tochukwu	2024-04-17	Download	The article discusses the emergence of Wireless Sensor Networks (WSNs) as a groundbreaking technology in data processing and communication. It outlines how WSNs, composed of dispersed autonomous senso...
Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks	Eri Hosonuma, Taku Yamazaki, Takumi Miyoshi, Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki	2024-04-17	Download	To reduce network traffic and support environments with limited resources, a method for transmitting images with minimal transmission data is required.
On the Performance of RIS-assisted Networks with HQAM	Thrassos K. Oikonomou, Dimitrios Tyrovolas, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, Panagiotis Sarigiannidis, Christos Liaskos, George K. Karagiannidis	2024-04-17	Download	In this paper, we investigate the application of hexagonal quadrature amplitude modulation (HQAM) in reconfigurable intelligent surface (RIS)-assisted networks, specifically focusing on its efficiency...
Approximate Wireless Communication for Lossy Gradient Updates in IoT Federated Learning	Xiang Ma, Haijian Sun, Rose Qingyang Hu, Yi Qian	2024-04-17	Download	Federated learning (FL) has emerged as a distributed machine learning (ML) technique that can protect local data privacy for participating clients and improve system efficiency.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads	Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon	2024-04-17	Download	Among ML operators today, GEneral Matrix Multiplication (GEMM)-based operators are known to be key operators that build the main backbone of ML models.