
2024-04-17

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads | Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon | 2024-04-17 | Download | Among ML operators today, GEneral Matrix Multiplication (GEMM)-based operators are known to be key operators that build the main backbone of ML models. |
| Functionality Locality, Mixture & Control = Logic = Memory | Xiangjun Peng | 2024-04-17 | Download | This work provides new insights and constructs to the field of computer architecture and systems, and these insights are expected to be useful for the broad software stack. |
| Real Time Evolvable Hardware for Optimal Reconfiguration of Cusp-Like Pulse Shapers | Juan Lanchares, Oscar Garnica, José L. Risco-Martín, J. Ignacio Hidalgo, J. Manuel Colmenar, Alfredo Cuesta | 2024-04-17 | Download | The design of a cusp-like digital pulse shaper for particle energy measurements requires the definition of four parameters whose values are defined based on the nature of the shaper input signal (timi... |
| Revisiting Main Memory-Based Covert and Side Channel Attacks in the Context of Processing-in-Memory | F. Nisa Bostanci, Konstantinos Kanellopoulos, Ataberk Olgun, A. Giray Yaglikci, Ismail Emir Yuksel, Nika Mansouri Ghiasi, Zulal Bingol, Mohammad Sadrosadati, Onur Mutlu | 2024-04-17 | Download | We introduce IMPACT, a set of high-throughput main memory-based timing attacks that leverage characteristics of processing-in-memory (PiM) architectures to establish covert and side channels. |
| Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs | Endri Taka, Dimitrios Gourounas, Andreas Gerstlauer, Diana Marculescu, Aman Arora | 2024-04-17 | Download | FPGAs are a promising platform for accelerating Deep Learning (DL) applications, due to their high performance, low power consumption, and reconfigurability. |
| Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access | Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang, Tianyue Lu, Mingyu Chen, Siwei Luo, Keji Huang | 2024-04-17 | Download | The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, Accelerator and CSD | Jia Wei, Xingjun Zhang, Witold Pedrycz, Longxiang Wang, Jie Zhao | 2024-04-17 | Download | For image-related deep learning tasks, the first step often involves reading data from external storage and performing preprocessing on the CPU. |
| Simulating Cloud Environments of Connected Vehicles for Anomaly Detection | M. Weiß, J. Stümpfle, F. Dettinger, N. Jazdi, M. Weyrich | 2024-04-17 | Download | The emergence of connected vehicles is driven by increasing customer and regulatory demands. To meet these, more complex software applications, some of which require service-based cloud and edge backe... |
| Araucaria: Simplifying INC Fault Tolerance with High-Level Intents | Ricardo Parizotto, Israat Haque, Alberto Schaeffer-Filho | 2024-04-17 | Download | Network programmability allows modification of fine-grain data plane functionality. The performance benefits of data plane programmability have motivated many researchers to offload computation that p... |
| A Secure and Trustworthy Network Architecture for Federated Learning Healthcare Applications | Antonio Boiano, Marco Di Gennaro, Luca Barbieri, Michele Carminati, Monica Nicoli, Alessandro Redondi, Stefano Savazzi, Albert Sund Aillet, Diogo Reis Santos, Luigi Serio | 2024-04-17 | Download | Federated Learning (FL) has emerged as a promising approach for privacy-preserving machine learning, particularly in sensitive domains such as healthcare. |
| FLeeC: a Fast Lock-Free Application Cache | André J. Costa, Nuno M. Preguiça, João M. Lourenço | 2024-04-17 | Download | When compared to blocking concurrency, non-blocking concurrency can provide higher performance in parallel shared-memory contexts, especially in high contention scenarios. |
| Hierarchical storage management in user space for neuroimaging applications | Valérie Hayot-Sasson, Tristan Glatard | 2024-04-17 | Download | Neuroimaging open-data initiatives have led to increased availability of large scientific datasets. While these datasets are shifting the processing bottleneck from compute-intensive to data-intensive... |
| IoTSim-Osmosis-RES: Towards autonomic renewable energy-aware osmotic computing | Tomasz Szydlo, Amadeusz Szabala, Nazar Kordiumov, Konrad Siuzdak, Lukasz Wolski, Khaled Alwasel, Fawzy Habeeb, Rajiv Ranjan | 2024-04-17 | Download | Internet of Things systems exist in various areas of our everyday life. For example, sensors installed in smart cities and homes are processed in edge and cloud computing centres providing several be... |
| Quantum Cloud Computing: A Review, Open Problems, and Future Directions | Hoa T. Nguyen, Prabhakar Krishnan, Dilip Krishnaswamy, Muhammad Usman, Rajkumar Buyya | 2024-04-17 | Download | Quantum cloud computing is an emerging paradigm of computing that empowers quantum applications and their deployment on quantum computing resources without the need for a specialized environment to ho... |
| Distributed Fractional Bayesian Learning for Adaptive Optimization | Yaqun Yang, Jinlong Lei, Guanghui Wen, Yiguang Hong | 2024-04-17 | Download | This paper considers a distributed adaptive optimization problem, where all agents only have access to their local cost functions with a common unknown parameter, whereas they mean to collaboratively ... |
| Accelerating Geo-distributed Machine Learning with Network-Aware Adaptive Tree and Auxiliary Route | Zonghang Li, Wenjiao Feng, Weibo Cai, Hongfang Yu, Long Luo, Gang Sun, Hongyang Du, Dusit Niyato | 2024-04-17 | Download | Distributed machine learning is becoming increasingly popular for geo-distributed data analytics, facilitating the collaborative analysis of data scattered across data centers in different regions. |
| Undo and Redo Support for Replicated Registers | Leo Stewen, Martin Kleppmann | 2024-04-17 | Download | Undo and redo functionality is ubiquitous in collaboration software. In single user settings, undo and redo are well understood. However, when multiple users edit a document, concurrency may arise, le... |
| Mutiny! How does Kubernetes fail, and what can we do about it? | Marco Barletta, Marcello Cinque, Catello Di Martino, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer | 2024-04-17 | Download | In this paper, we i) analyze and classify real-world failures of Kubernetes (the most popular container orchestration system), ii) develop a framework to perform a fault/error injection campaign targe... |
| XMiner: Efficient Directed Subgraph Matching with Pattern Reduction | Pingpeng Yuan, Yujiang Wang, Tianyu Ma, Siyuan He, Ling Liu | 2024-04-17 | Download | Graph pattern matching, one of the fundamental graph mining problems, aims to extract structural patterns of interest from an input graph. The state-of-the-art graph matching algorithms and systems ar... |
| ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours | Feiwen Zhu, Arkadiusz Nowaczynski, Rundong Li, Jie Xin, Yifei Song, Michal Marcinkiewicz, Sukru Burc Eryilmaz, Jun Yang, Michael Andersch | 2024-04-17 | Download | AlphaFold2 has been hailed as a breakthrough in protein folding. It can rapidly predict protein structures with lab-grade accuracy. However, its implementation does not include the necessary training ... |
| Approximate Wireless Communication for Lossy Gradient Updates in IoT Federated Learning | Xiang Ma, Haijian Sun, Rose Qingyang Hu, Yi Qian | 2024-04-17 | Download | Federated learning (FL) has emerged as a distributed machine learning (ML) technique that can protect local data privacy for participating clients and improve system efficiency. |
| FedFa: A Fully Asynchronous Training Paradigm for Federated Learning | Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, Khalid Ayed Alharthi, Jiannong Cao | 2024-04-17 | Download | Federated learning has been identified as an efficient decentralized training paradigm for scaling the machine learning model training on a large number of devices while guaranteeing the data privacy ... |
| A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation | Jinghai He, Haoyu Liu, Yuhang Wu, Zeyu Zheng, Tingyu Zhu | 2024-04-17 | Download | We provide a preliminary study on utilizing GPU (Graphics Processing Unit) to accelerate computation for three simulation optimization tasks with either first-order or second-order algorithms. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Araucaria: Simplifying INC Fault Tolerance with High-Level Intents | Ricardo Parizotto, Israat Haque, Alberto Schaeffer-Filho | 2024-04-17 | Download | Network programmability allows modification of fine-grain data plane functionality. The performance benefits of data plane programmability have motivated many researchers to offload computation that p... |
| SERENE: A Collusion Resilient Replication-based Verification Framework | Amir Esmaeili, Abderrahmen Mtibaa | 2024-04-17 | Download | The rapid advancement of autonomous driving technology is accompanied by substantial challenges, particularly the reliance on remote task execution without ensuring a reliable and accurate returned re... |
| What-if Analysis Framework for Digital Twins in 6G Wireless Network Management | Elif Ak, Berk Canberk, Vishal Sharma, Octavia A. Dobre, Trung Q. Duong | 2024-04-17 | Download | This study explores implementing a digital twin network (DTN) for efficient 6G wireless network management, aligning with the fault, configuration, accounting, performance, and security (FCAPS) model. |
| Enhancing Data Privacy In Wireless Sensor Networks: Investigating Techniques And Protocols To Protect Privacy Of Data Transmitted Over Wireless Sensor Networks In Critical Applications Of Healthcare And National Security | Akinsola Ahmed, Ejiofor Oluomachi, Akinde Abdullah, Njoku Tochukwu | 2024-04-17 | Download | The article discusses the emergence of Wireless Sensor Networks (WSNs) as a groundbreaking technology in data processing and communication. It outlines how WSNs, composed of dispersed autonomous senso... |
| Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks | Eri Hosonuma, Taku Yamazaki, Takumi Miyoshi, Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki | 2024-04-17 | Download | To reduce network traffic and support environments with limited resources, a method for transmitting images with minimal transmission data is required. |
| On the Performance of RIS-assisted Networks with HQAM | Thrassos K. Oikonomou, Dimitrios Tyrovolas, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, Panagiotis Sarigiannidis, Christos Liaskos, George K. Karagiannidis | 2024-04-17 | Download | In this paper, we investigate the application of hexagonal quadrature amplitude modulation (HQAM) in reconfigurable intelligent surface (RIS)-assisted networks, specifically focusing on its efficiency... |
| Approximate Wireless Communication for Lossy Gradient Updates in IoT Federated Learning | Xiang Ma, Haijian Sun, Rose Qingyang Hu, Yi Qian | 2024-04-17 | Download | Federated learning (FL) has emerged as a distributed machine learning (ML) technique that can protect local data privacy for participating clients and improve system efficiency. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Understanding the Performance Horizon of the Latest ML Workloads with NonGEMM Workloads | Rachid Karami, Sheng-Chun Kao, Hyoukjun Kwon | 2024-04-17 | Download | Among ML operators today, GEneral Matrix Multiplication (GEMM)-based operators are known to be key operators that build the main backbone of ML models. |
