
2024-02-23

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Prime+Retouch: When Cache is Locked and Leaked | Jaehyuk Lee, Fan Sang, Taesoo Kim | 2024-02-23 | Download | Caches on the modern commodity CPUs have become one of the major sources of side-channel leakages and been abused as a new attack vector. To thwart the cache-based side-channel attacks, two types of c... |
| Toward High Performance, Programmable Extreme-Edge Intelligence for Neuromorphic Vision Sensors utilizing Magnetic Domain Wall Motion-based MTJ | Md Abdullah-Al Kaiser, Gourav Datta, Peter A. Beerel, Akhilesh R. Jaiswal | 2024-02-23 | Download | The desire to empower resource-limited edge devices with computer vision (CV) must overcome the high energy consumption of collecting and processing vast sensory data. |
| A^3PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader | Qingcai Jiang, Shaojie Tan, Junshi Chen, Hong An | 2024-02-23 | Download | The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory predominate the overa... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu | 2024-02-23 | Download | We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPU... |
| Portable acceleration of CMS computing workflows with coprocessors as a service | CMS Collaboration | 2024-02-23 | Download | Computing demands for large scientific experiments, such as the CMS experiment at the CERN LHC, will increase dramatically in the next decades. |
| Trustworthy confidential virtual machines for the masses | Anna Galanou, Khushboo Bindlish, Luca Preibsch, Yvonne-Anne Pignolet, Christof Fetzer, Rüdiger Kapitza | 2024-02-23 | Download | Confidential computing alleviates the concerns of distrustful customers by removing the cloud provider from their trusted computing base and resolves their disincentive to migrate their workloads to t... |
| PICO: Accelerating All k-Core Paradigms on GPU | Chen Zhao, Ting Yu, Zhigao Zheng, Song Jin, Jiawei Jiang, Bo Du, Dacheng Tao | 2024-02-23 | Download | Core decomposition is a well-established graph mining problem with various applications that involves partitioning the graph into hierarchical subgraphs. |
| Streaming IoT Data and the Quantum Edge: A Classic/Quantum Machine Learning Use Case | Sabrina Herbst, Vincenzo De Maio, Ivona Brandic | 2024-02-23 | Download | With the advent of the Post-Moore era, the scientific community is faced with the challenge of addressing the demands of current data-intensive machine learning applications, which are the cornerstone... |
| Convergence Analysis of Split Federated Learning on Heterogeneous Data | Pengchao Han, Chao Huang, Geng Tian, Ming Tang, Xin Liu | 2024-02-23 | Download | Split federated learning (SFL) is a recent distributed approach for collaborative model training among multiple clients. In SFL, a global model is typically split into two parts, where clients train o... |
| MSPipe: Efficient Temporal GNN Training via Staleness-Aware Pipeline | Guangming Sheng, Junwei Su, Chao Huang, Chuan Wu | 2024-02-23 | Download | Memory-based Temporal Graph Neural Networks (MTGNNs) are a class of temporal graph neural networks that utilize a node memory module to capture and retain long-term temporal dependencies, leading to s... |
| Chu-ko-nu: A Reliable, Efficient, and Anonymously Authentication-Enabled Realization for Multi-Round Secure Aggregation in Federated Learning | Kaiping Cui, Xia Feng, Liangmin Wang, Haiqin Wu, Xiaoyu Zhang, Boris Düdder | 2024-02-23 | Download | Secure aggregation enables federated learning (FL) to perform collaborative training of clients from local gradient updates without exposing raw data. |
| Sampling-based Distributed Training with Message Passing Neural Network | Priyesh Kakka, Sheel Nidhan, Rishikesh Ranade, Jay Pathak, Jonathan F. MacArt | 2024-02-23 | Download | In this study, we introduce a domain-decomposition-based distributed training and inference approach for message-passing neural networks (MPNN). |
| Two-Stage Block Orthogonalization to Improve Performance of s-step GMRES | Ichitaro Yamazaki, Andrew J. Higgins, Erik G. Boman, Daniel B. Szyld | 2024-02-23 | Download | On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Electric Vehicles Limit Equitable Access to Essential Services During Blackouts | Yamil Essus, Benjamin Rachunok | 2024-02-23 | Download | Electric vehicles (EVs) link mobility and electric power availability, posing a risk of making transportation unavailable during blackouts. We develop a computational framework to quantify the impact ... |
| Low-Latency Upstream Scheduling in Multi-Tenant, SLA Compliant TWDM PON | Arijeet Ganguli, Marco Ruffini | 2024-02-23 | Download | We present a multi-tenant multi-wavelength upstream transmission scheme for virtualised PONs, enabling compliance with latency-oriented Service Level Agreements (SLAs). |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A^3PIM: An Automated, Analytic and Accurate Processing-in-Memory Offloader | Qingcai Jiang, Shaojie Tan, Junshi Chen, Hong An | 2024-02-23 | Download | The performance gap between memory and processor has grown rapidly. Consequently, the energy and wall-clock time costs associated with moving data between the CPU and main memory predominate the overa... |
