2024-11-18

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah | 2024-11-18 | Download | Large language models (LLMs) have demonstrated remarkable performance across various machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders their deployment. |
| Analysis of Hardware Synthesis Strategies for Machine Learning in Collider Trigger and Data Acquisition | Haoyi Jia, Abhilasha Dave, Julia Gonski, Ryan Herbst | 2024-11-18 | Download | To fully exploit the physics potential of current and future high energy particle colliders, machine learning (ML) can be implemented in detector electronics for intelligent data processing and acquis... |
| Teapot: Efficiently Uncovering Spectre Gadgets in COTS Binaries | Fangzheng Lin, Zhongfa Wang, Hiroshi Sasaki | 2024-11-18 | Download | Speculative execution is crucial in enhancing modern processor performance but can introduce Spectre-type vulnerabilities that may leak sensitive information. |
| An Efficient Multicast Addressing Encoding Scheme for Multi-Core Neuromorphic Processors | Zhe Su, Aron Bencsik, Giacomo Indiveri, Davide Bertozzi | 2024-11-18 | Download | Multi-core neuromorphic processors are becoming increasingly significant due to their energy-efficient local computing and scalable modular architecture, particularly for event-based processing applic... |
| SILVIA: Automated Superword-Level Parallelism Exploitation via HLS-Specific LLVM Passes for Compute-Intensive FPGA Accelerators | Giovanni Brignone, Roberto Bosio, Fabrizio Ottati, Claudio Sansoè, Luciano Lavagno | 2024-11-18 | Download | High-level synthesis (HLS) aims at democratizing custom hardware acceleration with highly abstracted software-like descriptions. However, efficient accelerators still require substantial low-level har... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations | Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, Eric Smith, Hongyuan Zhan, Jianfeng Chi, Yuriy Hulovatyy, Kimish Patel, Zechun Liu, Changsheng Zhao, Yangyang Shi, Tijmen Blankevoort, Mahesh Pasupuleti, Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra | 2024-11-18 | Download | This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model, which has been open-sourced to the community during Meta Connect 2024. |
| Scaling Deep Learning Research with Kubernetes on the NRP Nautilus HyperCluster | J. Alex Hurt, Anes Ouadou, Mariam Alshehri, Grant J. Scott | 2024-11-18 | Download | Throughout the scientific computing space, deep learning algorithms have shown excellent performance in a wide range of applications. As these deep neural networks (DNNs) continue to mature, the neces... |
| Towards Scalable and Practical Batch-Dynamic Connectivity | Quinten De Man, Laxman Dhulipala, Adam Karczmarz, Jakub Łącki, Julian Shun, Zhongqi Wang | 2024-11-18 | Download | We study the problem of dynamically maintaining the connected components of an undirected graph subject to edge insertions and deletions. We give the first parallel algorithm for the problem which is ... |
| Distributed Maximum Flow in Planar Graphs | Yaseen Abd-Elhaleem, Michal Dory, Merav Parter, Oren Weimann | 2024-11-18 | Download | The dual of a planar graph $G$ is a planar graph $G^*$ that has a vertex for each face of $G$ and an edge for each pair of adjacent faces of $G$. |
| FLMarket: Enabling Privacy-preserved Pre-training Data Pricing for Federated Learning | Zhenyu Wen, Wanglei Feng, Di Wu, Haozhen Hu, Chang Xu, Bin Qian, Zhen Hong, Cong Wang, Shouling Ji | 2024-11-18 | Download | Federated Learning (FL), as a mainstream privacy-preserving machine learning paradigm, offers promising solutions for privacy-critical domains such as healthcare and finance. |
| Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi | 2024-11-18 | Download | 6G's AI-native vision of embedding advanced intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. |
| Hash & Adjust: Competitive Demand-Aware Consistent Hashing | Arash Pourdamghani, Chen Avin, Robert Sama, Maryam Shiran, Stefan Schmid | 2024-11-18 | Download | Distributed systems often serve dynamic workloads and resource demands evolve over time. Such a temporal behavior stands in contrast to the static and demand-oblivious nature of most data structures u... |
| Topology-aware Preemptive Scheduling for Co-located LLM Workloads | Ping Zhang, Lei Su, Jinjie Yang, Xin Chen | 2024-11-18 | Download | Hosting diverse large language model workloads in a unified resource pool through co-location is cost-effective. For example, long-running chat services generally follow diurnal traffic patterns, whic... |
| gpuPairHMM: High-speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs | Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, Christian Hundt | 2024-11-18 | Download | The continually increasing volume of DNA sequence data has resulted in a growing demand for fast implementations of core algorithms. Computation of pairwise alignments between candidate haplotypes and... |
| The Jevons Paradox In Cloud Computing: A Thermodynamics Perspective | Prateek Sharma | 2024-11-18 | Download | How do we explain the simultaneous growth in energy efficiency of cloud computing and its energy consumption? The Jevons paradox provides one perspective of this phenomenon. |
| LSRAM: A Lightweight Autoscaling and SLO Resource Allocation Framework for Microservices Based on Gradient Descent | Kan Hu, Minxian Xu, Kejiang Ye, Chengzhong Xu | 2024-11-18 | Download | Microservices architecture has become the dominant architecture in the cloud computing paradigm with its advantages of facilitating development, deployment, modularity and scalability. |
| ν-LPA: Fast GPU-based Label Propagation Algorithm (LPA) for Community Detection | Subhajit Sahu | 2024-11-18 | Download | Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for identifying such divisions are critical in a number of applications. |
| Massively Parallel Maximum Coverage Revisited | Thai Bui, Hoa T. Vu | 2024-11-18 | Download | We study the maximum set coverage problem in the massively parallel model. In this setting, $m$ sets that are subsets of a universe of $n$ elements are distributed among $m$ machines. |
| MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica | 2024-11-18 | Download | Efficient deployment of large language models, particularly Mixture of Experts (MoE), on resource-constrained platforms presents significant challenges, especially in terms of computational efficiency... |
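
The dual-graph definition quoted in the *Distributed Maximum Flow in Planar Graphs* abstract can be made concrete with a short sketch. It assumes the planar embedding is already known and is supplied as a list of faces, each given as the cyclic walk of vertex IDs around its boundary, and that the primal graph has no parallel edges. The function name `planar_dual` and this input format are illustrative assumptions, not the paper's representation. The sketch emits one dual edge per primal edge, so two faces that share several edges get parallel dual edges, and a bridge (which borders the same face on both sides) becomes a self-loop.

```python
from collections import defaultdict


def planar_dual(faces):
    """Build the dual of an embedded planar graph (illustrative sketch).

    faces: list of faces; each face is the cyclic sequence of primal vertices
    met when walking its boundary in a fixed planar embedding (assumed input).
    Returns a list of (face_a, face_b, primal_edge): one dual edge per primal edge.
    """
    # Record, for every undirected primal edge, which face sides it lies on.
    sides = defaultdict(list)
    for f, cycle in enumerate(faces):
        for i, u in enumerate(cycle):
            v = cycle[(i + 1) % len(cycle)]
            sides[frozenset((u, v))].append(f)

    dual_edges = []
    for edge, fs in sides.items():
        # Each edge has exactly two sides; for a bridge both sides are the same face.
        if len(fs) != 2:
            raise ValueError(f"edge {sorted(edge)} lies on {len(fs)} face sides, expected 2")
        dual_edges.append((fs[0], fs[1], tuple(sorted(edge))))
    return dual_edges


if __name__ == "__main__":
    # K4 drawn with vertex 3 inside triangle 0-1-2: three inner faces plus the outer face.
    k4_faces = [[0, 1, 3], [1, 2, 3], [2, 0, 3], [0, 2, 1]]
    for f, g, edge in planar_dual(k4_faces):
        print(f"face {f} -- face {g} across primal edge {edge}")
```

For this drawing of $K_4$ the six dual edges connect every pair of the four faces, so the dual is again $K_4$, matching the fact that $K_4$ is self-dual.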

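ν-LPA is described as a GPU-based label propagation algorithm (LPA) for community detection. As a reference point only, here is a minimal sequential sketch of classic LPA, not the paper's GPU algorithm: every vertex starts in its own community and repeatedly adopts the label that is most common among its neighbors, until no label changes. The fixed visiting order and smallest-label tie-breaking are simplifying assumptions that keep the output deterministic; the classic formulation visits vertices in random order and breaks ties at random. The function name `label_propagation` and the adjacency-dict input are hypothetical.

```python
from collections import Counter


def label_propagation(adj, max_iters=100):
    """Sequential, asynchronous label propagation (illustrative sketch).

    adj: dict mapping each vertex to a list of its neighbors (undirected graph).
    Returns a dict mapping each vertex to its community label.
    """
    labels = {v: v for v in adj}  # every vertex starts in its own community
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):  # fixed order (assumption; classic LPA shuffles)
            if not adj[v]:
                continue  # isolated vertices keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            # Smallest label wins ties (assumption; classic LPA picks at random).
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:  # converged: no vertex changed its label this sweep
            break
    return labels


if __name__ == "__main__":
    # Two disjoint triangles should end up as two communities.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(label_propagation(adj))
```

Each sweep reuses labels updated earlier in the same sweep (asynchronous updates), which avoids the label oscillation that fully synchronous updates can cause on bipartite structures; a GPU implementation has to manage that trade-off differently.
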
cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Software Platform for Testing Multi-Link Operation in Industrial Wi-Fi Networks | Matteo Rosani, Gianluca Cena, Dave Cavalcanti, Valerio Frascolla, Guido Marchetto, Stefano Scanzio | 2024-11-18 | Download | Multi-Link Operation (MLO) in Wi-Fi 7 is expected to tangibly boost throughput while lowering transmission latency at the same time. This is very relevant in industrial scenarios and makes MLO suitabl... |
| Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi | 2024-11-18 | Download | 6G's AI-native vision of embedding advanced intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. |
| SpiderDAN: Matching Augmentation in Demand-Aware Networks | Aleksander Figiel, Darya Melnyk, André Nichterlein, Arash Pourdamghani, Stefan Schmid | 2024-11-18 | Download | Graph augmentation is a fundamental and well-studied problem that arises in network optimization. We consider a new variant of this model motivated by reconfigurable communication networks. |
| Next-generation optical networks to sustain connectivity of the future: All roads lead to optical-computing-enabled network? | Dao Thanh Hai, Isaac Woungang | 2024-11-18 | Download | From an architectural perspective, with the main goal of reducing the effective traffic load in the network and thus gaining more operational efficiency, optical networks have essentially remained... |
| Multi-hop Differential Topology based Algorithms for Resilient Network of UAV Swarm | Huan Lin, Lianghui Ding | 2024-11-18 | Download | Unmanned aerial vehicle (UAV) swarm networks face severe challenges of communication network split (CNS) issues caused by massive damage in hostile environments. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Generative AI on the Edge: Architecture and Performance Evaluation | Zeinab Nezami, Maryam Hafeez, Karim Djemame, Syed Ali Raza Zaidi | 2024-11-18 | Download | 6G's AI-native vision of embedding advanced intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. |
| gDist: Efficient Distance Computation between 3D Meshes on GPU | Peng Fang, Wei Wang, Ruofeng Tong, Hailong Li, Min Tang | 2024-11-18 | Download | Computing maximum/minimum distances between 3D meshes is crucial for various applications, e.g., robotics, CAD, and VR/AR. In this work, we introduce a highly parallel algorithm (gDist) optimized for... |
