
2024-07-18

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Automated and Holistic Co-design of Neural Networks and ASICs for Enabling In-Pixel Intelligence | Shubha R. Kharel, Prashansa Mukim, Piotr Maj, Grzegorz W. Deptuch, Shinjae Yoo, Yihui Ren, Soumyajit Mandal | 2024-07-18 | Download | Extreme edge-AI systems, such as those in readout ASICs for radiation detection, must operate under stringent hardware constraints such as micron-level dimensions, sub-milliwatt power, and nanosecond-... |
| SecScale: A Scalable and Secure Trusted Execution Environment for Servers | Ani Sunny, Nivedita Shrivastava, Smruti R. Sarangi | 2024-07-18 | Download | Trusted execution environments (TEEs) are an integral part of modern secure processors. They ensure that their application and code pages are confidential, tamper proof and immune to diverse types of ... |
| Integrated Hardware Architecture and Device Placement Search | Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan | 2024-07-18 | Download | Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| PowerTrain: Fast, Generalizable Time and Power Prediction Models to Optimize DNN Training on Accelerated Edges | Prashanthi S. K., Saisamarth Taluri, Beautlin S, Lakshya Karwa, Yogesh Simmhan | 2024-07-18 | Download | Accelerated edge devices, like Nvidia's Jetson with 1000+ CUDA cores, are increasingly used for DNN training and federated learning, rather than just for inferencing workloads. |
| Microservices-based Software Systems Reengineering: State-of-the-Art and Future Directions | Thakshila Imiya Mohottige, Artem Polyvyanyy, Rajkumar Buyya, Colin Fidge, Alistair Barros | 2024-07-18 | Download | Designing software compatible with cloud-based Microservice Architectures (MSAs) is vital due to the performance, scalability, and availability limitations. |
| SecureVAX: A Blockchain-Enabled Secure Vaccine Passport System | Debendranath Das, Sushmita Ruj, Subhamoy Maitra | 2024-07-18 | Download | A vaccine passport serves as documentary proof, providing passport holders with greater freedom while roaming around during pandemics. It confirms vaccination against certain infectious diseases like ... |
| DPDPU: Data Processing with DPUs | Jiasheng Hu, Philip A. Bernstein, Jialin Li, Qizhen Zhang | 2024-07-18 | Download | Improving the performance and reducing the cost of cloud data systems is increasingly challenging. Data processing units (DPUs) are a promising solution, but utilizing them for data processing needs c... |
| DDS: DPU-optimized Disaggregated Storage [Extended Report] | Qizhen Zhang, Philip Bernstein, Badrish Chandramouli, Jiasheng Hu, Yiming Zheng | 2024-07-18 | Download | This extended report presents DDS, a novel disaggregated storage architecture enabled by emerging networking hardware, namely DPUs (Data Processing Units). |
| Integrated Hardware Architecture and Device Placement Search | Irene Wang, Jakub Tarnawski, Amar Phanishayee, Divya Mahajan | 2024-07-18 | Download | Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. |
| Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration | Tianyu Wang, Sheng Li, Bingyao Li, Yue Dai, Ao Li, Geng Yuan, Yufei Ding, Youtao Zhang, Xulong Tang | 2024-07-18 | Download | Continuous learning (CL) has emerged as one of the most popular deep learning paradigms deployed in modern cloud GPUs. Specifically, CL has the capability to continuously update the model parameters (... |
| Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation | Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief | 2024-07-18 | Download | Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or of... |
| Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing | Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang | 2024-07-18 | Download | Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Streaming Technologies and Serialization Protocols: Empirical Performance Analysis | Samuel Jackson, Nathan Cummings, Saiful Khan | 2024-07-18 | Download | Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training, particularly when dealing with high-volume datasets. |
| An Agile Adaptation Method for Multi-mode Vehicle Communication Networks | Shiwen He, Kanghong Chen, Shiyue Huang, Wei Huang, Zhenyu An | 2024-07-18 | Download | This paper focuses on discovering the impact of communication mode allocation on communication efficiency in the vehicle communication networks. |
| TwinRAN: Twinning the 5G RAN in Azure Cloud | Yash Deshpande, Eni Sulkaj, Wolfgang Kellerer | 2024-07-18 | Download | The proliferation of 5G technology necessitates advanced network management strategies to ensure optimal performance and reliability. Digital Twin (DT)s have emerged as a promising paradigm for modeli... |
| Reconfigurable Intelligent Surface Aided Vehicular Edge Computing: Joint Phase-shift Optimization and Multi-User Power Allocation | Kangwei Qi, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief | 2024-07-18 | Download | Vehicular edge computing (VEC) is an emerging technology with significant potential in the field of internet of vehicles (IoV), enabling vehicles to perform intensive computational tasks locally or of... |
| Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks | Ziqi Lin, Xu Zhang, Shimin Gong, Lanhua Li, Zhou Su, Bo Gu | 2024-07-18 | Download | Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Attention in SRAM on Tenstorrent Grayskull | Moritz Thüning | 2024-07-18 | Download | When implementations of the Transformer's self-attention layer utilize SRAM instead of DRAM, they can achieve significant speedups. The Tenstorrent Grayskull architecture provides a large SRAM, distri... |
| Forecasting GPU Performance for Deep Learning Training and Inference | Seonho Lee, Amar Phanishayee, Divya Mahajan | 2024-07-18 | Download | Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution. |
| DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information | Qiang Wang, Laiyi Li, Weile Luo, Yijia Zhang, Bingqiang Wang | 2024-07-18 | Download | Increased reliance on graphics processing units (GPUs) for high-intensity computing tasks raises challenges regarding energy consumption. To address this issue, dynamic voltage and frequency scaling (... |
