2024-07-18

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
Automated and Holistic Co-design of Neural Networks and ASICs for Enabling In-Pixel Intelligence	Shubha R. Kharel, Prashansa Mukim, Piotr Maj, Grzegorz W. Deptuch, Shinjae Yoo, Yihui Ren, Soumyajit Mandal	2024-07-18	Download	Extreme edge-AI systems, such as those in readout ASICs for radiation detection, must operate under stringent hardware constraints such as micron-level dimensions, sub-milliwatt power, and nanosecond-...
SecScale: A Scalable and Secure Trusted Execution Environment for Servers	Ani Sunny, Nivedita Shrivastava, Smruti R. Sarangi	2024-07-18	Download	Trusted execution environments (TEEs) are an integral part of modern secure processors. They ensure that their application and code pages are confidential, tamper proof and immune to diverse types of ...
Integrated Hardware Architecture and Device Placement Search	Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan	2024-07-18	Download	Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
PowerTrain: Fast, Generalizable Time and Power Prediction Models to Optimize DNN Training on Accelerated Edges	Prashanthi S. K., Saisamarth Taluri, Beautlin S, Lakshya Karwa, Yogesh Simmhan	2024-07-18	Download	Accelerated edge devices, like Nvidia's Jetson with 1000+ CUDA cores, are increasingly used for DNN training and federated learning, rather than just for inferencing workloads.
Microservices-based Software Systems Reengineering: State-of-the-Art and Future Directions	Thakshila Imiya Mohottige, Artem Polyvyanyy, Rajkumar Buyya, Colin Fidge, Alistair Barros	2024-07-18	Download	Designing software compatible with cloud-based Microservice Architectures (MSAs) is vital due to the performance, scalability, and availability limitations.
SecureVAX: A Blockchain-Enabled Secure Vaccine Passport System	Debendranath Das, Sushmita Ruj, Subhamoy Maitra	2024-07-18	Download	A vaccine passport serves as documentary proof, providing passport holders with greater freedom while roaming around during pandemics. It confirms vaccination against certain infectious diseases like ...
DPDPU: Data Processing with DPUs	Jiasheng Hu, Philip A. Bernstein, Jialin Li, Qizhen Zhang	2024-07-18	Download	Improving the performance and reducing the cost of cloud data systems is increasingly challenging. Data processing units (DPUs) are a promising solution, but utilizing them for data processing needs c...
DDS: DPU-optimized Disaggregated Storage [Extended Report]	Qizhen Zhang, Philip Bernstein, Badrish Chandramouli, Jiasheng Hu, Yiming Zheng	2024-07-18	Download	This extended report presents DDS, a novel disaggregated storage architecture enabled by emerging networking hardware, namely DPUs (Data Processing Units).
Integrated Hardware Architecture and Device Placement Search	Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan	2024-07-18	Download	Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy.
Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration	Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang	2024-07-18	Download	Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (...
Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation	Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief	2024-07-18	Download	Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or of...
Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing	Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang	2024-07-18	Download	Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Streaming Technologies and Serialization Protocols: Empirical Performance Analysis	Samuel Jackson, Nathan Cummings, Saiful Khan	2024-07-18	Download	Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training, particularly when dealing with high-volume datasets.
An Agile Adaptation Method for Multi-mode Vehicle Communication Networks	Shiwen He, Kanghong Chen, Shiyue Huang, Wei Huang, Zhenyu An	2024-07-18	Download	This paper focuses on discovering the impact of communication mode allocation on communication efficiency in the vehicle communication networks.
TwinRAN: Twinning the 5G RAN in Azure Cloud	Yash Deshpande, Eni Sulkaj, Wolfgang Kellerer	2024-07-18	Download	The proliferation of 5G technology necessitates advanced network management strategies to ensure optimal performance and reliability. Digital Twin (DT)s have emerged as a promising paradigm for modeli...
Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation	Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief	2024-07-18	Download	Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or of...
Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks	Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu	2024-07-18	Download	Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Attention in SRAM on Tenstorrent Grayskull	Moritz Thüning	2024-07-18	Download	When implementations of the Transformer's self-attention layer utilize SRAM instead of DRAM, they can achieve significant speedups. The Tenstorrent Grayskull architecture provides a large SRAM, distri...
Forecasting GPU Performance for Deep Learning Training and Inference	Seonho Lee, Amar Phanishayee, Divya Mahajan	2024-07-18	Download	Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution.
DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information	Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang	2024-07-18	Download	Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (...