2024-11-18

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration	Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah	2024-11-18	Download	Large language models (LLMs) have demonstrated remarkable performance across various machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders their deployment.
Analysis of Hardware Synthesis Strategies for Machine Learning in Collider Trigger and Data Acquisition	Haoyi Jia, Abhilasha Dave, Julia Gonski, Ryan Herbst	2024-11-18	Download	To fully exploit the physics potential of current and future high energy particle colliders, machine learning (ML) can be implemented in detector electronics for intelligent data processing and acquis...
Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries	Fangzheng Lin, Zhongfa Wang, Hiroshi Sasaki	2024-11-18	Download	Speculative execution is crucial in enhancing modern processor performance but can introduce Spectre-type vulnerabilities that may leak sensitive information.
An Efficient Multicast Addressing Encoding Scheme for Multi-Core Neuromorphic Processors	Zhe Su, Aron Bencsik, Giacomo Indiveri, Davide Bertozzi	2024-11-18	Download	Multi-core neuromorphic processors are becoming increasingly significant due to their energy-efficient local computing and scalable modular architecture, particularly for event-based processing applic...
SILVIA: Automated Superword-Level Parallelism Exploitation via HLS-Specific LLVM Passes for Compute-Intensive FPGA Accelerators	Giovanni Brignone, Roberto Bosio, Fabrizio Ottati, Claudio Sansoè, Luciano Lavagno	2024-11-18	Download	High-level synthesis (HLS) aims at democratizing custom hardware acceleration with highly abstracted software-like descriptions. However, efficient accelerators still require substantial low-level har...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations	Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, Eric Smith, Hongyuan Zhan, Jianfeng Chi, Yuriy Hulovatyy, Kimish Patel, Zechun Liu, Changsheng Zhao, Yangyang Shi, Tijmen Blankevoort, Mahesh Pasupuleti, Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra	2024-11-18	Download	This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model, which has been open-sourced to the community during Meta Connect 2024.
Scaling Deep Learning Research with Kubernetes on the NRP Nautilus HyperCluster	J. Alex Hurt, Anes Ouadou, Mariam Alshehri, Grant J. Scott	2024-11-18	Download	Throughout the scientific computing space, deep learning algorithms have shown excellent performance in a wide range of applications. As these deep neural networks (DNNs) continue to mature, the neces...
Towards Scalable and Practical Batch-Dynamic Connectivity	Quinten De Man, Laxman Dhulipala, Adam Karczmarz, Jakub Łącki, Julian Shun, Zhongqi Wang	2024-11-18	Download	We study the problem of dynamically maintaining the connected components of an undirected graph subject to edge insertions and deletions. We give the first parallel algorithm for the problem which is ...
Distributed Maximum Flow in Planar Graphs	Yaseen Abd-Elhaleem, Michal Dory, Merav Parter, Oren Weimann	2024-11-18	Download	The dual of a planar graph $G$ is a planar graph $G^*$ that has a vertex for each face of $G$ and an edge for each pair of adjacent faces of $G$.
FLMarket: Enabling Privacy-preserved Pre-training Data Pricing for Federated Learning	Zhenyu Wen, Wanglei Feng, Di Wu, Haozhen Hu, Chang Xu, Bin Qian, Zhen Hong, Cong Wang, Shouling Ji	2024-11-18	Download	Federated Learning (FL), as a mainstream privacy-preserving machine learning paradigm, offers promising solutions for privacy-critical domains such as healthcare and finance.
Generative AI on the Edge: Architecture and Performance Evaluation	Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi	2024-11-18	Download	6G's AI native vision of embedding advance intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices.
Hash & Adjust: Competitive Demand-Aware Consistent Hashing	Arash Pourdamghani, Chen Avin, Robert Sama, Maryam Shiran, Stefan Schmid	2024-11-18	Download	Distributed systems often serve dynamic workloads and resource demands evolve over time. Such a temporal behavior stands in contrast to the static and demand-oblivious nature of most data structures u...
Topology-aware Preemptive Scheduling for Co-located LLM Workloads	Ping Zhang, Lei Su, Jinjie Yang, Xin Chen	2024-11-18	Download	Hosting diverse large language model workloads in a unified resource pool through co-location is cost-effective. For example, long-running chat services generally follow diurnal traffic patterns, whic...
gpuPairHMM: High-speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs	Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, Christian Hundt	2024-11-18	Download	The continually increasing volume of DNA sequence data has resulted in a growing demand for fast implementations of core algorithms. Computation of pairwise alignments between candidate haplotypes and...
The Jevons Paradox In Cloud Computing: A Thermodynamics Perspective	Prateek Sharma	2024-11-18	Download	How do we explain the simultaneous growth in energy efficiency of cloud computing and its energy consumption? The Jevons paradox provides one perspective of this phenomenon.
LSRAM: A Lightweight Autoscaling and SLO Resource Allocation Framework for Microservices Based on Gradient Descent	Kan Hu, Minxian Xu, Kejiang Ye, Chengzhong Xu	2024-11-18	Download	Microservices architecture has become the dominant architecture in cloud computing paradigm with its advantages of facilitating development, deployment, modularity and scalability.
ν-LPA: Fast GPU-based Label Propagation Algorithm (LPA) for Community Detection	Subhajit Sahu	2024-11-18	Download	Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications.
Massively Parallel Maximum Coverage Revisited	Thai Bui, Hoa T. Vu	2024-11-18	Download	We study the maximum set coverage problem in the massively parallel model. In this setting, $m$ sets that are subsets of a universe of $n$ elements are distributed among $m$ machines.
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs	Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica	2024-11-18	Download	Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Software Platform for Testing Multi-Link Operation in Industrial Wi-Fi Networks	Matteo Rosani, Gianluca Cena, Dave Cavalcanti, Valerio Frascolla, Guido Marchetto, Stefano Scanzio	2024-11-18	Download	Multi-Link Operation (MLO) in Wi-Fi 7 is expected to tangibly boost throughput while lowering transmission latency at the same time. This is very relevant in industrial scenarios and makes MLO suitabl...
Generative AI on the Edge: Architecture and Performance Evaluation	Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi	2024-11-18	Download	6G's AI native vision of embedding advance intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices.
SpiderDAN: Matching Augmentation in Demand-Aware Networks	Aleksander Figiel, Darya Melnyk, André Nichterlein, Arash Pourdamghani, Stefan Schmid	2024-11-18	Download	Graph augmentation is a fundamental and well-studied problem that arises in network optimization. We consider a new variant of this model motivated by reconfigurable communication networks.
Next-generation optical networks to sustain connectivity of the future: All roads lead to optical-computing-enabled network?	Dao Thanh Hai, Isaac Woungang	2024-11-18	Download	From an architectural perspective with the main goal of reducing the effective traffic load in the network and thus gaining more operational efficiency, optical networks have been essentially remained...
Multi-hop Differential Topology based Algorithms for Resilient Network of UAV Swarm	Huan Lin, Lianghui Ding	2024-11-18	Download	Unmanned aerial vehicle (UAV) swarm networks face severe challenges of communication network split (CNS) issues caused by massive damage in hostile environments.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Generative AI on the Edge: Architecture and Performance Evaluation	Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi	2024-11-18	Download	6G's AI native vision of embedding advance intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices.
gDist: Efficient Distance Computation between 3D Meshes on GPU	Peng Fang, Wei Wang, Ruofeng Tong, Hailong Li, Min Tang	2024-11-18	Download	Computing maximum/minimum distances between 3D meshes is crucial for various applications, i.e., robotics, CAD, VR/AR, etc. In this work, we introduce a highly parallel algorithm (gDist) optimized for...