
2024-01-30

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers | Junyao Zhang, Hanrui Wang, Qi Ding, Jiaqi Gu, Reouven Assouly, William D. Oliver, Song Han, Kenneth R. Brown, Hai "Helen" Li, Yiran Chen | 2024-01-30 | Download | Noisy Intermediate-Scale Quantum (NISQ) computers face a critical limitation in qubit numbers, hindering their progression towards large-scale and fault-tolerant quantum computing. |
| Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators | Mika Markus Müller, Alexander Richard Manfred Borst, Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Oliver Bringmann | 2024-01-30 | Download | Artificial Intelligence (AI) has witnessed remarkable growth, particularly through the proliferation of Deep Neural Networks (DNNs). These powerful models drive technological advancements across vario... |
| SAL-PIM: A Subarray-level Processing-in-Memory Architecture with LUT-based Linear Interpolation for Transformer-based Text Generation | Wontak Han, Hyunjun Cho, Donghyuk Kim, Joo-Young Kim | 2024-01-30 | Download | Text generation is a compelling sub-field of natural language processing, aiming to generate human-readable text from input words. In particular, the decoder-only generative models, such as generative... |
| Method for determining the acceleration of a parallel specialised computer system based on Amdahl's law | Aleksandr S. Filipchenko | 2024-01-30 | Download | The modification of Amdahl's law for the case of increment of processor elements in a computer system is considered. The coefficient k linking accelerations of parallel and parallel specialized comp... |
| A Scalable RISC-V Vector Processor Enabling Efficient Multi-Precision DNN Inference | Chuanning Wang, Chao Fang, Xiao Wu, Zhongfeng Wang, Jun Lin | 2024-01-30 | Download | RISC-V processors encounter substantial challenges in deploying multi-precision deep neural networks (DNNs) due to their restricted precision support, constrained throughput, and suboptimal dataflow d... |
| WideSA: A High Array Utilization Mapping Scheme for Uniform Recurrences on the Versal ACAP Architecture | Tuo Dai, Bizhao Shi, Guojie Luo | 2024-01-30 | Download | The Versal Adaptive Compute Acceleration Platform (ACAP) is a new architecture that combines AI Engines (AIEs) with reconfigurable fabric. This architecture offers significant acceleration potential f... |
| T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives | Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair | 2024-01-30 | Download | Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices which can reduce scaling efficiency as the num... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CLAIRE: Scalable GPU-Accelerated Algorithms for Diffeomorphic Image Registration in 3D | Andreas Mang | 2024-01-30 | Download | We present our work on scalable, GPU-accelerated algorithms for diffeomorphic image registration. The associated software package is termed CLAIRE. Image registration is a non-linear inverse problem. |
| Parallelization Strategies for the Randomized Kaczmarz Algorithm on Large-Scale Dense Systems | Inês Ferreira, Juan A. Acebrón, José Monteiro | 2024-01-30 | Download | The Kaczmarz algorithm is an iterative technique designed to solve consistent linear systems of equations. It falls within the category of row-action methods, focusing on handling one equation per ite... |
| Rendering Wireless Environments Useful for Gradient Estimators: A Zero-Order Stochastic Federated Learning Method | Elissa Mhanna, Mohamad Assaad | 2024-01-30 | Download | Cross-device federated learning (FL) is a growing machine learning setting whereby multiple edge devices collaborate to train a model without disclosing their raw data. |
| Characterising resource management performance in Kubernetes | Víctor Medel, Rafael Tolosana-Calasanz, José Ángel Bañares, Unai Arronategui, Omer F. Rana | 2024-01-30 | Download | A key challenge for supporting elastic behaviour in cloud systems is to achieve a good performance in automated (de-)provisioning and scheduling of computing resources. |
| Identifying Quality Mersenne Twister Streams For Parallel Stochastic Simulations | Benjamin Antunes, Claude Mazel, David R. C Hill | 2024-01-30 | Download | The Mersenne Twister (MT) is a pseudo-random number generator (PRNG) widely used in High Performance Computing for parallel stochastic simulations. |
| GPU-Accelerated Batch-Dynamic Subgraph Matching | Linshan Qiu, Lu Chen, Hailiang Jie, Xiangyu Ke, Yunjun Gao, Yang Liu, Zetao Zhang | 2024-01-30 | Download | Subgraph matching has garnered increasing attention for its diverse real-world applications. Given the dynamic nature of real-world graphs, addressing evolving scenarios without incurring prohibitive ... |
| Method for determining the acceleration of a parallel specialised computer system based on Amdahl's law | Aleksandr S. Filipchenko | 2024-01-30 | Download | The modification of Amdahl's law for the case of increment of processor elements in a computer system is considered. The coefficient k linking accelerations of parallel and parallel specialized comp... |
| Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations | Francieli Boito, Jim Brandt, Valeria Cardellini, Philip Carns, Florina M. Ciorba, Hilary Egan, Ahmed Eleliemy, Ann Gentile, Thomas Gruber, Jeff Hanson, Utz-Uwe Haus, Kevin Huck, Thomas Ilsche, Thomas Jakobsche, Terry Jones, Sven Karlsson, Abdullah Mueen, Michael Ott, Tapasya Patki, Ivy Peng, Krishnan Raghavan, Stephen Simms, Kathleen Shoga, Michael Showerman, Devesh Tiwari, Torsten Wilde, Keiji Yamamoto | 2024-01-30 | Download | Many High Performance Computing (HPC) facilities have developed and deployed frameworks in support of continuous monitoring and operational data analytics (MODA) to help improve efficiency and through... |
| BCM-Broadcast: A Byzantine-Tolerant Causal Broadcast Algorithm for Distributed Mobile Systems | Leila NamvariTazehkand, Saied Pashazadeh, Ali Ebnenasir | 2024-01-30 | Download | This paper presents an algorithm, called BCM-Broadcast, for the implementation of causal broadcast in distributed mobile systems in the presence of Byzantine failures. |
| Interactive Byzantine-Resilient Gradient Coding for General Data Assignments | Shreyas Jain, Luis Maßny, Christoph Hofmeister, Eitan Yaakobi, Rawad Bitar | 2024-01-30 | Download | We tackle the problem of Byzantine errors in distributed gradient descent within the Byzantine-resilient gradient coding framework. Our proposed solution can recover the exact full gradient in the pre... |
| Quantum-Secure Hybrid Blockchain System for DID-based Verifiable Random Function with NTRU Linkable Ring Signature | Bong Gon Kim, Dennis Wong, Yoon Seok Yang | 2024-01-30 | Download | In this study, we present a secure smart contract-based Verifiable Random Function (VRF) model, addressing the shortcomings of existing systems. |
| Computational Power of Opaque Robots | Caterina Feletti, Lucia Mambretti, Carlo Mereghetti, Beatrice Palano | 2024-01-30 | Download | In the field of distributed computing by robot swarms, the research comprehends manifold models where robots operate in the Euclidean plane through a sequence of look-compute-move cycles. |
| Using Sequential Runtime Distributions for the Parallel Speedup Prediction of SAT Local Search | Alejandro Arbelaez, Charlotte Truchet, Philippe Codognet | 2024-01-30 | Download | This paper presents a detailed analysis of the scalability and parallelization of local search algorithms for the Satisfiability problem. We propose a framework to estimate the parallel performance of... |
| SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget | Kun Wang, Jiani Cao, Zimu Zhou, Zhenjiang Li | 2024-01-30 | Download | Executing deep neural networks (DNNs) on edge artificial intelligence (AI) devices enables various autonomous mobile computing applications. However, the memory budget of edge AI devices restricts the... |
| EdgeOL: Efficient in-situ Online Learning on Edge Devices | Sheng Li, Geng Yuan, Yue Dai, Tianyu Wang, Yawen Wu, Alex K. Jones, Jingtong Hu, Tony, Geng, Yanzhi Wang, Bo Yuan, Yufei Ding, Xulong Tang | 2024-01-30 | Download | Emerging applications, such as robot-assisted eldercare and object recognition, generally employ deep learning neural networks (DNNs) and naturally require: i) handling streaming-in inference requests... |
| Communication-Efficient Multimodal Federated Learning: Joint Modality and Client Selection | Liangqi Yuan, Dong-Jun Han, Su Wang, Devesh Upadhyay, Christopher G. Brinton | 2024-01-30 | Download | Multimodal federated learning (MFL) aims to enrich model training in FL settings where clients are collecting measurements across multiple modalities. |
| T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives | Suchita Pati, Shaizeen Aga, Mahzabeen Islam, Nuwan Jayasena, Matthew D. Sinclair | 2024-01-30 | Download | Large Language Models increasingly rely on distributed techniques for their training and inference. These techniques require communication across devices which can reduce scaling efficiency as the num... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Differentiated Service Entanglement Routing for Quantum Networks | Hui Han, Bo Liu, Bangying Tang, Siyu Xiong, Jinquan Huang, Wanrong Yu, Shuhui Chen | 2024-01-30 | Download | The entanglement distribution networks with various topologies are mainly implemented by active wavelength multiplexing routing strategies. However, designing an entanglement routing scheme, which ach... |
| Socially Aware V2X Localized QoS | Rafael Kaliski, Yue-hua Han | 2024-01-30 | Download | Vehicle-to-everything (V2X) is a core 5G technology. V2X and its enabler, Device-to-Device (D2D), are essential for the Internet of Things (IoT) and the Internet of Vehicles (IoV). |
| URLLC-Aware Proactive UAV Placement in Internet of Vehicles | Chen-Feng Liu, Nirmal D. Wickramasinghe, Himal A. Suraweera, Mehdi Bennis, Merouane Debbah | 2024-01-30 | Download | Unmanned aerial vehicles (UAVs) are envisioned to provide diverse services from the air. The service quality may rely on the wireless performance which is affected by the UAV's position. |
| Quantum X-Secure B-Byzantine T-Colluding Private Information Retrieval | Mohamed Nomeir, Alptug Aytekin, Sennur Ulukus | 2024-01-30 | Download | We consider the problems arising from the presence of Byzantine servers in a quantum private information retrieval (QPIR) setting. This is the first work to precisely define what the capabilities of B... |
| Utilizing Large Language Models to Translate RFC Protocol Specifications to CPSA Definitions | Martin Duclos, Ivan A. Fernandez, Kaneesha Moore, Sudip Mittal, Edward Zieglar | 2024-01-30 | Download | This paper proposes the use of Large Language Models (LLMs) for translating Request for Comments (RFC) protocol specifications into a format compatible with the Cryptographic Protocol Shapes Analyzer ... |
| Evaluating ML-Based Anomaly Detection Across Datasets of Varied Integrity: A Case Study | Adrian Pekar, Richard Jozsa | 2024-01-30 | Download | Cybersecurity remains a critical challenge in the digital age, with network traffic flow anomaly detection being a key pivotal instrument in the fight against cyber threats. |
| Dynamic Human Digital Twin Deployment at the Edge for Task Execution: A Two-Timescale Accuracy-Aware Online Optimization | Yuye Yang, You Shi, Changyan Yi, Jun Cai, Jiawen Kang, Dusit Niyato, Xuemin, Shen | 2024-01-30 | Download | Human digital twin (HDT) is an emerging paradigm that bridges physical twins (PTs) with powerful virtual twins (VTs) for assisting complex task executions in human-centric services. |
| Large Multi-Modal Models (LMMs) as Universal Foundation Models for AI-Native Wireless Systems | Shengzhe Xu, Christo Kurisummoottil Thomas, Omar Hashash, Nikhil Muralidhar, Walid Saad, Naren Ramakrishnan | 2024-01-30 | Download | Large language models (LLMs) and foundation models have been recently touted as a game-changer for 6G systems. However, recent efforts on LLMs for wireless networks are limited to a direct application... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Realtime Facial Expression Recognition: Neuromorphic Hardware vs. Edge AI Accelerators | Heath Smith, James Seekings, Mohammadreza Mohammadi, Ramtin Zand | 2024-01-30 | Download | The paper focuses on real-time facial expression recognition (FER) systems as an important component in various real-world applications such as social robotics. |
