2024-07-18
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Automated and Holistic Co-design of Neural Networks and ASICs for Enabling In-Pixel Intelligence | Shubha R. Kharel, Prashansa Mukim, Piotr Maj, Grzegorz W. Deptuch, Shinjae Yoo, Yihui Ren, Soumyajit Mandal | 2024-07-18 | Download | Extreme edge-AI systems, such as those in readout ASICs for radiation detection, must operate under stringent hardware constraints such as micron-level dimensions, sub-milliwatt power, and nanosecond-... |
| SecScale: A Scalable and Secure Trusted Execution Environment for Servers | Ani Sunny, Nivedita Shrivastava, Smruti R. Sarangi | 2024-07-18 | Download | Trusted execution environments (TEEs) are an integral part of modern secure processors. They ensure that their application and code pages are confidential, tamper proof and immune to diverse types of ... |
| Integrated Hardware Architecture and Device Placement Search | Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan | 2024-07-18 | Download | Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PowerTrain: Fast, Generalizable Time and Power Prediction Models to Optimize DNN Training on Accelerated Edges | Prashanthi S. K., Saisamarth Taluri, Beautlin S, Lakshya Karwa, Yogesh Simmhan | 2024-07-18 | Download | Accelerated edge devices, like Nvidia's Jetson with 1000+ CUDA cores, are increasingly used for DNN training and federated learning, rather than just for inferencing workloads. |
| Microservices-based Software Systems Reengineering: State-of-the-Art and Future Directions | Thakshila Imiya Mohottige, Artem Polyvyanyy, Rajkumar Buyya, Colin Fidge, Alistair Barros | 2024-07-18 | Download | Designing software compatible with cloud-based Microservice Architectures (MSAs) is vital due to the performance, scalability, and availability limitations. |
| SecureVAX: A Blockchain-Enabled Secure Vaccine Passport System | Debendranath Das, Sushmita Ruj, Subhamoy Maitra | 2024-07-18 | Download | A vaccine passport serves as documentary proof, providing passport holders with greater freedom while roaming around during pandemics. It confirms vaccination against certain infectious diseases like ... |
| DPDPU: Data Processing with DPUs | Jiasheng Hu, Philip A. Bernstein, Jialin Li, Qizhen Zhang | 2024-07-18 | Download | Improving the performance and reducing the cost of cloud data systems is increasingly challenging. Data processing units (DPUs) are a promising solution, but utilizing them for data processing needs c... |
| DDS: DPU-optimized Disaggregated Storage [Extended Report] | Qizhen Zhang, Philip Bernstein, Badrish Chandramouli, Jiasheng Hu, Yiming Zheng | 2024-07-18 | Download | This extended report presents DDS, a novel disaggregated storage architecture enabled by emerging networking hardware, namely DPUs (Data Processing Units). |
| Integrated Hardware Architecture and Device Placement Search | Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan | 2024-07-18 | Download | Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. |
| Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration | Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang | 2024-07-18 | Download | Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (... |
| Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation | Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief | 2024-07-18 | Download | Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or of... |
| Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing | Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang | 2024-07-18 | Download | Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Streaming Technologies and Serialization Protocols: Empirical Performance Analysis | Samuel Jackson, Nathan Cummings, Saiful Khan | 2024-07-18 | Download | Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training, particularly when dealing with high-volume datasets. |
| An Agile Adaptation Method for Multi-mode Vehicle Communication Networks | Shiwen He, Kanghong Chen, Shiyue Huang, Wei Huang, Zhenyu An | 2024-07-18 | Download | This paper focuses on discovering the impact of communication mode allocation on communication efficiency in the vehicle communication networks. |
| TwinRAN: Twinning the 5G RAN in Azure Cloud | Yash Deshpande, Eni Sulkaj, Wolfgang Kellerer | 2024-07-18 | Download | The proliferation of 5G technology necessitates advanced network management strategies to ensure optimal performance and reliability. Digital Twin (DT)s have emerged as a promising paradigm for modeli... |
| Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation | Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief | 2024-07-18 | Download | Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or of... |
| Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks | Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu | 2024-07-18 | Download | Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Attention in SRAM on Tenstorrent Grayskull | Moritz Thüning | 2024-07-18 | Download | When implementations of the Transformer's self-attention layer utilize SRAM instead of DRAM, they can achieve significant speedups. The Tenstorrent Grayskull architecture provides a large SRAM, distri... |
| Forecasting GPU Performance for Deep Learning Training and Inference | Seonho Lee, Amar Phanishayee, Divya Mahajan | 2024-07-18 | Download | Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution. |
| DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information | Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang | 2024-07-18 | Download | Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (... |