2024-02-23

cs.AR - Hardware Architecture

Title	Authors	Date	PDF	Abstract
Prime+Retouch: When Cache is Locked and Leaked	Jaehyuk Lee, Fan Sang, Taesoo Kim	2024-02-23	Download	Caches on the modern commodity CPUs have become one of the major sources of side-channel leakages and been abused as a new attack vector. To thwart the cache-based side-channel attacks, two types of c...
Toward High Performance, Programmable Extreme-Edge Intelligence for Neuromorphic Vision Sensors utilizing Magnetic Domain Wall Motion-based MTJ	Md Abdullah-Al Kaiser, Gourav Datta, Peter A. Beerel, Akhilesh R. Jaiswal	2024-02-23	Download	The desire to empower resource-limited edge devices with computer vision (CV) must overcome the high energy consumption of collecting and processing vast sensory data.
A$^3$PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader	Qingcai Jiang, Shaojie Tan, Junshi Chen, Hong An	2024-02-23	Download	The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory predominate the overa...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Date	PDF	Abstract
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs	Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu	2024-02-23	Download	We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPU...
Portable acceleration of CMS computing workflows with coprocessors as a service	CMS Collaboration	2024-02-23	Download	Computing demands for large scientific experiments, such as the CMS experiment at the CERN LHC, will increase dramatically in the next decades.
Trustworthy confidential virtual machines for the masses	Anna Galanou, Khushboo Bindlish, Luca Preibsch, Yvonne-Anne Pignolet, Christof Fetzer, Rüdiger Kapitza	2024-02-23	Download	Confidential computing alleviates the concerns of distrustful customers by removing the cloud provider from their trusted computing base and resolves their disincentive to migrate their workloads to t...
PICO: Accelerating All k-Core Paradigms on GPU	Chen Zhao, Ting Yu, Zhigao Zheng, Song Jin, Jiawei Jiang, Bo Du, Dacheng Tao	2024-02-23	Download	Core decomposition is a well-established graph mining problem with various applications that involves partitioning the graph into hierarchical subgraphs.
Streaming IoT Data and the Quantum Edge: A Classic/Quantum Machine Learning Use Case	Sabrina Herbst, Vincenzo De Maio, Ivona Brandic	2024-02-23	Download	With the advent of the Post-Moore era, the scientific community is faced with the challenge of addressing the demands of current data-intensive machine learning applications, which are the cornerstone...
Convergence Analysis of Split Federated Learning on Heterogeneous Data	Pengchao Han, Chao Huang, Geng Tian, Ming Tang, Xin Liu	2024-02-23	Download	Split federated learning (SFL) is a recent distributed approach for collaborative model training among multiple clients. In SFL, a global model is typically split into two parts, where clients train o...
MSPipe: Efficient Temporal GNN Training via Staleness-Aware Pipeline	Guangming Sheng, Junwei Su, Chao Huang, Chuan Wu	2024-02-23	Download	Memory-based Temporal Graph Neural Networks (MTGNNs) are a class of temporal graph neural networks that utilize a node memory module to capture and retain long-term temporal dependencies, leading to s...
Chu-ko-nu: A Reliable, Efficient, and Anonymously Authentication-Enabled Realization for Multi-Round Secure Aggregation in Federated Learning	Kaiping Cui, Xia Feng, Liangmin Wang, Haiqin Wu, Xiaoyu Zhang, Boris Düdder	2024-02-23	Download	Secure aggregation enables federated learning (FL) to perform collaborative training of clients from local gradient updates without exposing raw data.
Sampling-based Distributed Training with Message Passing Neural Network	Priyesh Kakka, Sheel Nidhan, Rishikesh Ranade, Jay Pathak, Jonathan F. MacArt	2024-02-23	Download	In this study, we introduce a domain-decomposition-based distributed training and inference approach for message-passing neural networks (MPNN).
Two-Stage Block Orthogonalization to Improve Performance of $s$-step GMRES	Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld	2024-02-23	Download	On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace.

cs.NI - Networking and Internet Architecture

Title	Authors	Date	PDF	Abstract
Electric Vehicles Limit Equitable Access to Essential Services During Blackouts	Yamil Essus, Benjamin Rachunok	2024-02-23	Download	Electric vehicles (EVs) link mobility and electric power availability, posing a risk of making transportation unavailable during blackouts. We develop a computational framework to quantify the impact ...
Low-Latency Upstream Scheduling in Multi-Tenant, SLA Compliant TWDM PON	Arijeet Ganguli, Marco Ruffini	2024-02-23	Download	We present a multi-tenant multi-wavelength upstream transmission scheme for virtualised PONs, enabling compliance with latency-oriented Service Level Agreements (SLAs).

cs.PF - Performance

Title	Authors	Date	PDF	Abstract
A$^3$PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader	Qingcai Jiang, Shaojie Tan, Junshi Chen, Hong An	2024-02-23	Download	The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory predominate the overa...