2024-09-06

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Parallax: A Compiler for Neutral Atom Quantum Computers under Hardware Constraints	Jason Ludmir, Tirthak Patel	2024-09-06	Download	Among different quantum computing technologies, neutral atom quantum computers have several advantageous features, such as multi-qubit gates, application-specific topologies, movable qubits, homogenou...
OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models	Jahyun Koo, Dahoon Park, Sangwoo Jung, Jaeha Kung	2024-09-06	Download	To overcome the burden on the memory size and bandwidth due to ever-increasing size of large language models (LLMs), aggressive weight quantization has been recently studied, while lacking research on...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance	Wei Wen, Quanyu Zhu, Weiwei Chu, Wen-Yen Chen, Jiyan Yang	2024-09-06	Download	Scaling up deep learning models has been proven effective to improve intelligence of machine learning (ML) models, especially for industry recommendation models and large language models.
Hermes: Memory-Efficient Pipeline Inference for Large Models on Edge Devices	Xueyuan Han, Zinuo Cai, Yichu Zhang, Chongxin Fan, Junhan Liu, Ruhui Ma, Rajkumar Buyya	2024-09-06	Download	The application of Transformer-based large models has achieved numerous success in recent years. However, the exponential growth in the parameters of large models introduces formidable memory challeng...
Revisiting the Time Cost Model of AllReduce	Dian Xiong, Li Chen, Youhe Jiang, Dan Li, Shuai Wang, Songtao Wang	2024-09-06	Download	AllReduce is an important and popular collective communication primitive, which has been widely used in areas such as distributed machine learning and high performance computing.
3D System Design: A Case for Building Customized Modular Systems in 3D	Philip Emma, Eren Kurshan	2024-09-06	Download	3D promises a new dimension in composing systems by aggregating chips. Literally. While the most common uses are still tightly connected with its early forms as a packaging technology, new application...
Heterogeneity-Aware Cooperative Federated Edge Learning with Adaptive Computation and Communication Compression	Zhenxiao Zhang, Zhidong Gao, Yuanxiong Guo, Yanmin Gong	2024-09-06	Download	Motivated by the drawbacks of cloud-based federated learning (FL), cooperative federated edge learning (CFEL) has been proposed to improve efficiency for FL over mobile edge networks, where multiple e...
Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study	Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, Shunfan Zhou	2024-09-06	Download	This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs for large language model (LLM) inference tasks.
A Hybrid Vectorized Merge Sort on ARM NEON	Jincheng Zhou, Jin Zhang, Xiang Zhang, Tiaojie Xiao, Di Ma, Chunye Gong	2024-09-06	Download	Sorting algorithms are the most extensively researched topics in computer science and serve for numerous practical applications. Although various sorts have been proposed for efficiency, different arc...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Digital Twin Enabled Data-Driven Approach for Traffic Efficiency and Software-Defined Vehicular Network Optimization	Mohammad Sajid Shahriar, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin	2024-09-06	Download	In the realms of the internet of vehicles (IoV) and intelligent transportation systems (ITS), software defined vehicular networks (SDVN) and edge computing (EC) have emerged as promising technologies ...
Fast Adaptation for Deep Learning-based Wireless Communications	Ouya Wang, Hengtao He, Shenglong Zhou, Zhi Ding, Shi Jin, Khaled B. Letaief, Geoffrey Ye Li	2024-09-06	Download	The integration with artificial intelligence (AI) is recognized as one of the six usage scenarios in next-generation wireless communications. However, several critical challenges hinder the widespread...
Minimizing Power Consumption under SINR Constraints for Cell-Free Massive MIMO in O-RAN	Vaishnavi Kasuluru, Luis Blanco, Miguel Angel Vazquez, Cristian J. Vaca-Rubio, Engin Zeydan	2024-09-06	Download	This paper deals with the problem of energy consumption minimization in Open RAN cell-free (CF) massive Multiple-Input Multiple-Output (mMIMO) systems under minimum per-user signal-to-noise-plus-inter...
A Centralized Discovery-Based Method for Integrating Data Distribution Service and Time-Sensitive Networking in In-Vehicle Networks	Feng Luo, Yi Ren, Yanhua Yu, Yunpeng Li, Zitong Wang	2024-09-06	Download	As the electronic and electrical architecture (E/EA) of intelligent and connected vehicles (ICVs) evolves, traditional distributed and signal-oriented architectures are being replaced by centralized, ...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
The HitchHiker's Guide to High-Assurance System Observability Protection with Efficient Permission Switches	Chuqi Zhang, Jun Zeng, Yiming Zhang, Adil Ahmad, Fengwei Zhang, Hai Jin, Zhenkai Liang	2024-09-06	Download	Protecting system observability records (logs) from compromised OSs has gained significant traction in recent times, with several note-worthy approaches proposed.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study	Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, Shunfan Zhou	2024-09-06	Download	This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs for large language model (LLM) inference tasks.