2024-09-06

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Parallax: A Compiler for Neutral Atom Quantum Computers under Hardware Constraints | Jason Ludmir, Tirthak Patel | 2024-09-06 | Download | Among different quantum computing technologies, neutral atom quantum computers have several advantageous features, such as multi-qubit gates, application-specific topologies, movable qubits, homogenou... |
| OPAL: Outlier-Preserved Microscaling Quantization Accelerator for Generative Large Language Models | Jahyun Koo, Dahoon Park, Sangwoo Jung, Jaeha Kung | 2024-09-06 | Download | To overcome the burden on the memory size and bandwidth due to ever-increasing size of large language models (LLMs), aggressive weight quantization has been recently studied, while lacking research on... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance | Wei Wen, Quanyu Zhu, Weiwei Chu, Wen-Yen Chen, Jiyan Yang | 2024-09-06 | Download | Scaling up deep learning models has been proven effective to improve intelligence of machine learning (ML) models, especially for industry recommendation models and large language models. |
| Hermes: Memory-Efficient Pipeline Inference for Large Models on Edge Devices | Xueyuan Han, Zinuo Cai, Yichu Zhang, Chongxin Fan, Junhan Liu, Ruhui Ma, Rajkumar Buyya | 2024-09-06 | Download | The application of Transformer-based large models has achieved numerous success in recent years. However, the exponential growth in the parameters of large models introduces formidable memory challeng... |
| Revisiting the Time Cost Model of AllReduce | Dian Xiong, Li Chen, Youhe Jiang, Dan Li, Shuai Wang, Songtao Wang | 2024-09-06 | Download | AllReduce is an important and popular collective communication primitive, which has been widely used in areas such as distributed machine learning and high performance computing. |
| 3D System Design: A Case for Building Customized Modular Systems in 3D | Philip Emma, Eren Kurshan | 2024-09-06 | Download | 3D promises a new dimension in composing systems by aggregating chips. Literally. While the most common uses are still tightly connected with its early forms as a packaging technology, new application... |
| Heterogeneity-Aware Cooperative Federated Edge Learning with Adaptive Computation and Communication Compression | Zhenxiao Zhang, Zhidong Gao, Yuanxiong Guo, Yanmin Gong | 2024-09-06 | Download | Motivated by the drawbacks of cloud-based federated learning (FL), cooperative federated edge learning (CFEL) has been proposed to improve efficiency for FL over mobile edge networks, where multiple e... |
| Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study | Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, Shunfan Zhou | 2024-09-06 | Download | This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs for large language model (LLM) inference tasks. |
| A Hybrid Vectorized Merge Sort on ARM NEON | Jincheng Zhou, Jin Zhang, Xiang Zhang, Tiaojie Xiao, Di Ma, Chunye Gong | 2024-09-06 | Download | Sorting algorithms are the most extensively researched topics in computer science and serve for numerous practical applications. Although various sorts have been proposed for efficiency, different arc... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Digital Twin Enabled Data-Driven Approach for Traffic Efficiency and Software-Defined Vehicular Network Optimization | Mohammad Sajid Shahriar, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin | 2024-09-06 | Download | In the realms of the internet of vehicles (IoV) and intelligent transportation systems (ITS), software defined vehicular networks (SDVN) and edge computing (EC) have emerged as promising technologies ... |
| Fast Adaptation for Deep Learning-based Wireless Communications | Ouya Wang, Hengtao He, Shenglong Zhou, Zhi Ding, Shi Jin, Khaled B. Letaief, Geoffrey Ye Li | 2024-09-06 | Download | The integration with artificial intelligence (AI) is recognized as one of the six usage scenarios in next-generation wireless communications. However, several critical challenges hinder the widespread... |
| Minimizing Power Consumption under SINR Constraints for Cell-Free Massive MIMO in O-RAN | Vaishnavi Kasuluru, Luis Blanco, Miguel Angel Vazquez, Cristian J. Vaca-Rubio, Engin Zeydan | 2024-09-06 | Download | This paper deals with the problem of energy consumption minimization in Open RAN cell-free (CF) massive Multiple-Input Multiple-Output (mMIMO) systems under minimum per-user signal-to-noise-plus-inter... |
| A Centralized Discovery-Based Method for Integrating Data Distribution Service and Time-Sensitive Networking in In-Vehicle Networks | Feng Luo, Yi Ren, Yanhua Yu, Yunpeng Li, Zitong Wang | 2024-09-06 | Download | As the electronic and electrical architecture (E/EA) of intelligent and connected vehicles (ICVs) evolves, traditional distributed and signal-oriented architectures are being replaced by centralized, ... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| The HitchHiker's Guide to High-Assurance System Observability Protection with Efficient Permission Switches | Chuqi Zhang, Jun Zeng, Yiming Zhang, Adil Ahmad, Fengwei Zhang, Hai Jin, Zhenkai Liang | 2024-09-06 | Download | Protecting system observability records (logs) from compromised OSs has gained significant traction in recent times, with several note-worthy approaches proposed. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study | Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, Shunfan Zhou | 2024-09-06 | Download | This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA Hopper GPUs for large language model (LLM) inference tasks. |
