
2024-06-15

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Triangel: A High-Performance, Accurate, Timely On-Chip Temporal Prefetcher | Sam Ainsworth, Lev Mukhanov | 2024-06-15 | Download | Temporal prefetching, where correlated pairs of addresses are logged and replayed on repeat accesses, has recently become viable in commercial designs. |
| Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV | Qian Chen, Xiaofeng Yang, Shengli Lu | 2024-06-15 | Download | Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflows can be categorized into c... |
| FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design | Nandeeka Nayak, Xinrui Wu, Toluwanimi O. Odemuyiwa, Michael Pellauer, Joel S. Emer, Christopher W. Fletcher | 2024-06-15 | Download | Attention for transformers is a critical workload that has recently received significant "attention" as a target for custom acceleration. Yet, while prior work succeeds in reducing attention's memory-... |
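The Triangel entry sums up temporal prefetching in one line: correlated address pairs are logged, then replayed when the same addresses recur. As a rough illustration of that general idea only (the listing says nothing about Triangel's actual design), here is a minimal sketch. It assumes an unbounded Python dict as the correlation table and a hypothetical `degree` parameter that controls how many recorded successors are replayed per access.

```python
from typing import Dict, List, Optional


class PairwiseTemporalPrefetcher:
    """Toy temporal prefetcher (illustrative only, not Triangel's design).

    Assumption: the correlation table is an unbounded dict; real hardware
    would use bounded storage with a replacement policy.
    """

    def __init__(self, degree: int = 1) -> None:
        self.table: Dict[int, int] = {}  # address -> address seen right after it
        self.prev: Optional[int] = None  # previous address in the access stream
        self.degree = degree  # hypothetical knob: successors replayed per access

    def access(self, addr: int) -> List[int]:
        # Log the correlated pair (prev -> addr).
        if self.prev is not None:
            self.table[self.prev] = addr
        self.prev = addr
        # Replay: walk the recorded successor chain starting at addr.
        prefetches: List[int] = []
        cur = addr
        for _ in range(self.degree):
            nxt = self.table.get(cur)
            if nxt is None:
                break
            prefetches.append(nxt)
            cur = nxt
        return prefetches


p = PairwiseTemporalPrefetcher(degree=2)
for a in [0x100, 0x7A0, 0x340, 0x100]:
    print(hex(a), [hex(x) for x in p.access(a)])
```

On the first pass through the stream nothing can be predicted. When `0x100` recurs, the sketch replays the logged chain and prefetches `0x7a0` and then `0x340`. This sketch ignores table capacity, replacement, and timeliness, all of which a practical prefetcher must handle.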

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers | Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae | 2024-06-15 | Download | Transformers and LLMs have seen rapid adoption in all domains. Their sizes have exploded to hundreds of billions of parameters and keep increasing. |
| DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models | Avinash Maurya, Robert Underwood, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae | 2024-06-15 | Download | LLMs have seen rapid adoption in all domains. They need to be trained on high-end high-performance computing (HPC) infrastructures and ingest massive amounts of input data. |
| HiFGL: A Hierarchical Framework for Cross-silo Cross-device Federated Graph Learning | Zhuoning Guo, Duanyi Yao, Qiang Yang, Hao Liu | 2024-06-15 | Download | Federated Graph Learning (FGL) has emerged as a promising way to learn high-quality representations from distributed graph data with privacy preservation. |
| Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV | Qian Chen, Xiaofeng Yang, Shengli Lu | 2024-06-15 | Download | Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflows can be categorized into c... |
| Federated Neural Radiance Field for Distributed Intelligence | Yintian Zhang, Ziyu Shao | 2024-06-15 | Download | Novel view synthesis (NVS) is an important technology for many AR and VR applications. The recently proposed Neural Radiance Field (NeRF) approach has demonstrated superior performance on NVS tasks, a... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| A Novel Joint DRL-Based Utility Optimization for UAV Data Services | Xuli Cai, Poonam Lohan, Burak Kantarci | 2024-06-15 | Download | In this paper, we propose a novel joint deep reinforcement learning (DRL)-based solution to optimize the utility of an uncrewed aerial vehicle (UAV)-assisted communication network. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| ROSfs: A User-Level File System for ROS | Zijun Xu, Xuanjun Wen, Yanjie Song, Shu Yin | 2024-06-15 | Download | We present ROSfs, a novel user-level file system for the Robot Operating System (ROS). ROSfs interprets a robot file as a group of sub-files, with each having a distinct label. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Efficient Hardware Accelerator Based on Medium Granularity Dataflow for SpTRSV | Qian Chen, Xiaofeng Yang, Shengli Lu | 2024-06-15 | Download | Sparse triangular solve (SpTRSV) is widely used in various domains. Numerous studies have been conducted using CPUs, GPUs, and specific hardware accelerators, where dataflows can be categorized into c... |
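The SpTRSV abstract is truncated, so the paper's medium-granularity dataflow is not described here. For reference, here is a minimal sketch of the baseline computation itself: sequential forward substitution for a lower-triangular system L x = b. It assumes L is stored in CSR form (`indptr`, `indices`, `data`). The function name `sptrsv_lower_csr` is hypothetical and chosen only for illustration.

```python
from typing import List


def sptrsv_lower_csr(indptr: List[int], indices: List[int], data: List[float],
                     b: List[float]) -> List[float]:
    """Solve L x = b by forward substitution (illustrative sketch).

    Assumption: L is sparse, lower-triangular, and stored in CSR form.
    Entries with column j > i are ignored, since a lower-triangular
    matrix has none.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                # Depends on x[j] from an earlier row: the row-to-row
                # dependency that limits parallelism in SpTRSV.
                acc -= data[k] * x[j]
            elif j == i:
                diag = data[k]
        if diag is None or diag == 0.0:
            raise ValueError(f"zero or missing diagonal in row {i}")
        x[i] = acc / diag
    return x


# L = [[2, 0, 0], [1, 1, 0], [0, 3, 4]], b chosen so that x = [1, 2, 3]
print(sptrsv_lower_csr([0, 1, 3, 5], [0, 0, 1, 1, 2],
                       [2.0, 1.0, 1.0, 3.0, 4.0], [2.0, 3.0, 18.0]))
```

Row i cannot finish until every x[j] it references has been solved. Hardware and parallel SpTRSV designs must therefore schedule work around this dependency structure rather than treating rows as independent.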
