
2024-06-14

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories | Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, Maurice Meijer, Wim Dehaene, Marian Verhelst | 2024-06-14 | Download | Deep neural networks (DNN) use a wide range of network topologies to achieve high accuracy within diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execut... |
| Optimising GPGPU Execution Through Runtime Micro-Architecture Parameter Analysis | Giuseppe M. Sarda, Nimish Shah, Debjyoti Bhattacharjee, Peter Debacker, Marian Verhelst | 2024-06-14 | Download | GPGPU execution analysis has always been tied to closed-source, proprietary benchmarking tools that provide high-level, non-exhaustive, and/or statistical information, preventing a thorough understand... |
| Optimizing Layer-Fused Scheduling of Transformer Networks on Multi-accelerator Platforms | Steven Colleman, Arne Symons, Victor J. B. Jung, Marian Verhelst | 2024-06-14 | Download | The impact of transformer networks is booming, yet they come with significant computational complexity. It is therefore essential to understand how to optimally map and execute these networks on mode... |
| SAGA: Synthesis Augmentation with Genetic Algorithms for In-Memory Sequence Optimization | Andey Robins, Mike Borowczak | 2024-06-14 | Download | The von-Neumann architecture has a bottleneck which limits the speed at which data can be made available for computation. To combat this problem, novel paradigms for computing are being developed. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Byzantine-Robust Decentralized Federated Learning | Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, Neil Gong | 2024-06-14 | Download | Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. |
| A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models | L. Apanasevich, Yogesh Kale, Himanshu Sharma, Ana Marija Sokovic | 2024-06-14 | Download | For many years, systems running Nvidia-based GPU architectures have dominated the heterogeneous supercomputer landscape. However, recently GPU chipsets manufactured by Intel and AMD have cut into this... |
| Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors | Siyuan Chen, Zhuofeng Wang, Zelong Guan, Yudong Liu, Phillip B. Gibbons | 2024-06-14 | Download | Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the G... |
| Harnessing GPU Power for Enhanced OLTP: A Study in Concurrency Control Schemes | Zihan Sun, Yong Zhang, Chao Li, Chunxiao Xing | 2024-06-14 | Download | GPUs, whose performance has gone through a huge leap over the past decade, have proved their ability to accelerate Online Analytical Processing (OLAP) operations. |
| CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories | Man Shi, Steven Colleman, Charlotte VanDeMieroop, Antony Joseph, Maurice Meijer, Wim Dehaene, Marian Verhelst | 2024-06-14 | Download | Deep neural networks (DNN) use a wide range of network topologies to achieve high accuracy within diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execut... |
| Federated Learning with Flexible Architectures | Jong-Ik Park, Carlee Joe-Wong | 2024-06-14 | Download | Traditional federated learning (FL) methods have limited support for clients with varying computational and communication abilities, leading to inefficiencies and potential inaccuracies in model train... |
| A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention | Heejun Lee, Geon Park, Youngwan Lee, Jaduk Suh, Jina Kim, Wonyoung Jeong, Bumsik Kim, Hyemin Lee, Myeongjae Jeon, Sung Ju Hwang | 2024-06-14 | Download | In modern large language models (LLMs), increasing the context length is crucial for improving comprehension and coherence in long-context, multi-modal, and retrieval-augmented language generation. |
| Optimization policy for file replica placement in fog domains | Carlos Guerrero, Isaac Lera, Carlos Juiz | 2024-06-14 | Download | Fog computing architectures distribute computational and storage resources along the continuum from the cloud to things. Therefore, the execution of services or the storage of files can be closer to t... |
| PixRO: Pixel-Distributed Rotational Odometry with Gaussian Belief Propagation | Ignacio Alzugaray, Riku Murai, Andrew Davison | 2024-06-14 | Download | Images are the standard input for most computer vision algorithms. However, their processing often reduces to parallelizable operations applied locally and independently to individual pixels. |
| Speed-up of Data Analysis with Kernel Trick in Encrypted Domain | Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon | 2024-06-14 | Download | Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially fo... |
| Heterogeneous Federated Learning with Convolutional and Spiking Neural Networks | Yingchao Yu, Yuping Yan, Jisong Cai, Yaochu Jin | 2024-06-14 | Download | Federated learning (FL) has emerged as a promising paradigm for training models on decentralized data while safeguarding data privacy. Most existing FL systems, however, assume that all machine learni... |
| Cyberattack Data Analysis in IoT Environments using Big Data | Neelam Patidar, Sally Zreiqat, Sirisha Mahesh, Jongwook Woo | 2024-06-14 | Download | In the landscape of the Internet of Things (IoT), transforming various industries, our research addresses the growing connectivity and security challenges, including interoperability and standardized ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A New Realistic Platform for Benchmarking and Performance Evaluation of DRL-Driven and Reconfigurable SFC Provisioning Solutions | Murat Arda Onsu, Poonam Lohan, Burak Kantarci, Emil Janulewicz, Sergio Slobodrian | 2024-06-14 | Download | Service Function Chain (SFC) provisioning stands as a pivotal technology in the realm of 5G and future networks. Its essence lies in orchestrating VNFs (Virtual Network Functions) in a specified seque... |
| A Near-Optimal Category Information Sampling in RFID Systems | Xiujun Wang, Zhi Liu, Xiaokang Zhou, Yong Liao, Han Hu, Xiao Zheng, Jie Li | 2024-06-14 | Download | In many RFID-enabled applications, objects are classified into different categories, and the information associated with each object's category (called category information) is written into the attach... |
| Efficient Mixed Integer Linear Programming Approaches to Dynamic Path Restoration | Alexander Rubtsov, Bruno Bauwens, Dmitri Shmelkin, Elizaveta Rudenko, Alexey Lavrov | 2024-06-14 | Download | We consider the problem of single link failure in an elastic optical network (also known as a flex-grid WDM network). The task is to reroute optical connections that go through the broken link using fr... |
| Intra-QLAN Connectivity: beyond the Physical Topology | Francesco Mazza, Marcello Caleffi, Angela Sara Cacciapuoti | 2024-06-14 | Download | In the near to mid future, Quantum Local Area Networks (QLANs) -- the fundamental building block of the Quantum Internet -- will unlikely exhibit physical topologies characterized by densely physical co... |
| ARA-O-RAN: End-to-End Programmable O-RAN Living Lab for Agriculture and Rural Communities | Tianyi Zhang, Joshua Ofori Boateng, Taimoor UI Islam, Arsalan Ahmad, Hongwei Zhang, Daji Qiao | 2024-06-14 | Download | As wireless networks evolve towards open architectures like O-RAN, testing and integration platforms are crucial to address challenges like interoperability. |
| Carbon-Aware End-to-End Data Movement | Jacob Goldverg, Hasibul Jamil, Elvis Rodriguez, Tevfik Kosar | 2024-06-14 | Download | The latest trends in the adoption of cloud, edge, and distributed computing, as well as a rise in applying AI/ML workloads, have created a need to measure, monitor, and reduce the carbon emissions of ... |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SquirrelFS: using the Rust compiler to check file-system crash consistency | Hayley LeBlanc, Nathan Taylor, James Bornholt, Vijay Chidambaram | 2024-06-14 | Download | This work introduces a new approach to building crash-safe file systems for persistent memory. We exploit the fact that Rust's typestate pattern allows compile-time enforcement of a specific order of ... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models | L. Apanasevich, Yogesh Kale, Himanshu Sharma, Ana Marija Sokovic | 2024-06-14 | Download | For many years, systems running Nvidia-based GPU architectures have dominated the heterogeneous supercomputer landscape. However, recently GPU chipsets manufactured by Intel and AMD have cut into this... |
