2024-11-18
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah | 2024-11-18 | Large language models (LLMs) have demonstrated remarkable performance across various machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders their deployment. |
| Analysis of Hardware Synthesis Strategies for Machine Learning in Collider Trigger and Data Acquisition | Haoyi Jia, Abhilasha Dave, Julia Gonski, Ryan Herbst | 2024-11-18 | To fully exploit the physics potential of current and future high energy particle colliders, machine learning (ML) can be implemented in detector electronics for intelligent data processing and acquis... |
| Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries | Fangzheng Lin, Zhongfa Wang, Hiroshi Sasaki | 2024-11-18 | Speculative execution is crucial in enhancing modern processor performance but can introduce Spectre-type vulnerabilities that may leak sensitive information. |
| An Efficient Multicast Addressing Encoding Scheme for Multi-Core Neuromorphic Processors | Zhe Su, Aron Bencsik, Giacomo Indiveri, Davide Bertozzi | 2024-11-18 | Multi-core neuromorphic processors are becoming increasingly significant due to their energy-efficient local computing and scalable modular architecture, particularly for event-based processing applic... |
| SILVIA: Automated Superword-Level Parallelism Exploitation via HLS-Specific LLVM Passes for Compute-Intensive FPGA Accelerators | Giovanni Brignone, Roberto Bosio, Fabrizio Ottati, Claudio Sansoè, Luciano Lavagno | 2024-11-18 | High-level synthesis (HLS) aims at democratizing custom hardware acceleration with highly abstracted software-like descriptions. However, efficient accelerators still require substantial low-level har... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations | Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, Eric Smith, Hongyuan Zhan, Jianfeng Chi, Yuriy Hulovatyy, Kimish Patel, Zechun Liu, Changsheng Zhao, Yangyang Shi, Tijmen Blankevoort, Mahesh Pasupuleti, Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra | 2024-11-18 | This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model, which has been open-sourced to the community during Meta Connect 2024. |
| Scaling Deep Learning Research with Kubernetes on the NRP Nautilus HyperCluster | J. Alex Hurt, Anes Ouadou, Mariam Alshehri, Grant J. Scott | 2024-11-18 | Throughout the scientific computing space, deep learning algorithms have shown excellent performance in a wide range of applications. As these deep neural networks (DNNs) continue to mature, the neces... |
| Towards Scalable and Practical Batch-Dynamic Connectivity | Quinten De Man, Laxman Dhulipala, Adam Karczmarz, Jakub Łącki, Julian Shun, Zhongqi Wang | 2024-11-18 | We study the problem of dynamically maintaining the connected components of an undirected graph subject to edge insertions and deletions. We give the first parallel algorithm for the problem which is ... |
| Distributed Maximum Flow in Planar Graphs | Yaseen Abd-Elhaleem, Michal Dory, Merav Parter, Oren Weimann | 2024-11-18 | The dual of a planar graph G is a planar graph G* that has a vertex for each face of G and an edge for each pair of adjacent faces of G. |
| FLMarket: Enabling Privacy-preserved Pre-training Data Pricing for Federated Learning | Zhenyu Wen, Wanglei Feng, Di Wu, Haozhen Hu, Chang Xu, Bin Qian, Zhen Hong, Cong Wang, Shouling Ji | 2024-11-18 | Federated Learning (FL), as a mainstream privacy-preserving machine learning paradigm, offers promising solutions for privacy-critical domains such as healthcare and finance. |
| Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi | 2024-11-18 | 6G's AI native vision of embedding advance intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. |
| Hash & Adjust: Competitive Demand-Aware Consistent Hashing | Arash Pourdamghani, Chen Avin, Robert Sama, Maryam Shiran, Stefan Schmid | 2024-11-18 | Distributed systems often serve dynamic workloads and resource demands evolve over time. Such a temporal behavior stands in contrast to the static and demand-oblivious nature of most data structures u... |
| Topology-aware Preemptive Scheduling for Co-located LLM Workloads | Ping Zhang, Lei Su, Jinjie Yang, Xin Chen | 2024-11-18 | Hosting diverse large language model workloads in a unified resource pool through co-location is cost-effective. For example, long-running chat services generally follow diurnal traffic patterns, whic... |
| gpuPairHMM: High-speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs | Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, Christian Hundt | 2024-11-18 | The continually increasing volume of DNA sequence data has resulted in a growing demand for fast implementations of core algorithms. Computation of pairwise alignments between candidate haplotypes and... |
| The Jevons Paradox In Cloud Computing: A Thermodynamics Perspective | Prateek Sharma | 2024-11-18 | How do we explain the simultaneous growth in energy efficiency of cloud computing and its energy consumption? The Jevons paradox provides one perspective of this phenomenon. |
| LSRAM: A Lightweight Autoscaling and SLO Resource Allocation Framework for Microservices Based on Gradient Descent | Kan Hu, Minxian Xu, Kejiang Ye, Chengzhong Xu | 2024-11-18 | Microservices architecture has become the dominant architecture in cloud computing paradigm with its advantages of facilitating development, deployment, modularity and scalability. |
| ν-LPA: Fast GPU-based Label Propagation Algorithm (LPA) for Community Detection | Subhajit Sahu | 2024-11-18 | Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications. |
| Massively Parallel Maximum Coverage Revisited | Thai Bui, Hoa T. Vu | 2024-11-18 | We study the maximum set coverage problem in the massively parallel model. In this setting, sets that are subsets of a universe of elements are distributed among machines. |
| MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica | 2024-11-18 | Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| A Software Platform for Testing Multi-Link Operation in Industrial Wi-Fi Networks | Matteo Rosani, Gianluca Cena, Dave Cavalcanti, Valerio Frascolla, Guido Marchetto, Stefano Scanzio | 2024-11-18 | Multi-Link Operation (MLO) in Wi-Fi 7 is expected to tangibly boost throughput while lowering transmission latency at the same time. This is very relevant in industrial scenarios and makes MLO suitabl... |
| Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi | 2024-11-18 | 6G's AI native vision of embedding advance intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. |
| SpiderDAN: Matching Augmentation in Demand-Aware Networks | Aleksander Figiel, Darya Melnyk, André Nichterlein, Arash Pourdamghani, Stefan Schmid | 2024-11-18 | Graph augmentation is a fundamental and well-studied problem that arises in network optimization. We consider a new variant of this model motivated by reconfigurable communication networks. |
| Next-generation optical networks to sustain connectivity of the future: All roads lead to optical-computing-enabled network? | Dao Thanh Hai, Isaac Woungang | 2024-11-18 | From an architectural perspective with the main goal of reducing the effective traffic load in the network and thus gaining more operational efficiency, optical networks have been essentially remained... |
| Multi-hop Differential Topology based Algorithms for Resilient Network of UAV Swarm | Huan Lin, Lianghui Ding | 2024-11-18 | Unmanned aerial vehicle (UAV) swarm networks face severe challenges of communication network split (CNS) issues caused by massive damage in hostile environments. |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi | 2024-11-18 | 6G's AI native vision of embedding advance intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. |
| gDist: Efficient Distance Computation between 3D Meshes on GPU | Peng Fang, Wei Wang, Ruofeng Tong, Hailong Li, Min Tang | 2024-11-18 | Computing maximum/minimum distances between 3D meshes is crucial for various applications, i.e., robotics, CAD, VR/AR, etc. In this work, we introduce a highly parallel algorithm (gDist) optimized for... |