2024-09-23

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Investigating Robot Dogs for Construction Monitoring: A Comparative Analysis of Specifications and On-site Requirements	Miguel Arturo Vega Torres, Fabian Pfitzner	2024-09-23	Download	Robot dogs are receiving increasing attention in various fields of research. However, the number of studies investigating their potential usability on construction sites is scarce.
Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog	Bingkun Yao, Ning Wang, Jie Zhou, Xi Wang, Hong Gao, Zhe Jiang, Nan Guan	2024-09-23	Download	Bug localization in Verilog code is a crucial and time-consuming task during the verification of hardware design. Since introduction, Large Language Models (LLMs) have showed their strong programming ...
Chattronics: using GPTs to assist in the design of data acquisition systems	Jonathan Paul Driemeyer Brown, Tiago Oliveira Weber	2024-09-23	Download	The usefulness of Large Language Models (LLM) is being continuously tested in various fields. However, their intrinsic linguistic characteristic is still one of the limiting factors when applying thes...
Fast Virtual Gate Extraction For Silicon Quantum Dot Devices	Shize Che, Seong W Oh, Haoyun Qin, Yuhao Liu, Anthony Sigillito, Gushu Li	2024-09-23	Download	Silicon quantum dot devices stand as promising candidates for large-scale quantum computing due to their extended coherence times, compact size, and recent experimental demonstrations of sizable qubit...
Google Quantum AI's Quest for Error-Corrected Quantum Computers	M. AbuGhanem	2024-09-23	Download	Quantum computers stand at the forefront of technological innovation, offering exponential computational speed-ups that challenge classical computing capabilities.
Exploring and Exploiting Runtime Reconfigurable Floating Point Precision in Scientific Computing: a Case Study for Solving PDEs	Cong "Callie" Hao	2024-09-23	Download	Scientific computing applications, such as computational fluid dynamics and climate modeling, typically rely on 64-bit double-precision floating-point operations, which are extremely costly in terms o...
Analogous Alignments: Digital "Formally" meets Analog	Hansa Mohanty, Deepak Narayan Gadde	2024-09-23	Download	The complexity of modern-day System-on-Chips (SoCs) is continually increasing, and it becomes increasingly challenging to deliver dependable and credible chips in a short time-to-market.
FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale	Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng	2024-09-23	Download	Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks.
A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures	Fernando M. Quintana, Maryada, Pedro L. Galindo, Elisa Donati, Giacomo Indiveri, Fernando Perez-Peña	2024-09-23	Download	Developing dedicated mixed-signal neuromorphic computing systems optimized for real-time sensory-processing in extreme edge-computing applications requires time-consuming design, fabrication, and depl...
Efficient Tabular Data Preprocessing of ML Pipelines	Yu Zhu, Wenqi Jiang, Gustavo Alonso	2024-09-23	Download	Data preprocessing pipelines, which includes data decoding, cleaning, and transforming, are a crucial component of Machine Learning (ML) training.
MICSim: A Modular Simulator for Mixed-signal Compute-in-Memory based AI Accelerator	Cong Wang, Zeming Chen, Shanshi Huang	2024-09-23	Download	This work introduces MICSim, an open-source, pre-circuit simulator designed for early-stage evaluation of chip-level software performance and hardware overhead of mixed-signal compute-in-memory (CIM) ...
MESC: Re-thinking Algorithmic Priority and/or Criticality Inversions for Heterogeneous MCSs	Jiapeng Guan, Ran Wei, Dean You, Yingquan Wang, Ruizhe Yang, Hui Wang, Zhe Jiang	2024-09-23	Download	Modern Mixed-Criticality Systems (MCSs) rely on hardware heterogeneity to satisfy ever-increasing computational demands. However, most of the heterogeneous co-processors are designed to achieve high t...
Hardware/Algorithm Co-design for Real-Time I/O Control with Improved Timing Accuracy and Robustness	Zhe Jiang, Shuai Zhao, Ran Wei, Xin Si, Gang Chen, Nan Guan	2024-09-23	Download	In safety-critical systems, timing accuracy is the key to achieving precise I/O control. To meet such strict timing requirements, dedicated hardware assistance has recently been investigated and devel...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training	Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory R. Ganger, Yida Wang	2024-09-23	Download	Training Deep Neural Networks (DNNs) with billions of parameters generally involves pipeline-parallel (PP) execution. Unfortunately, PP model training can use GPUs inefficiently, especially at large s...
Stalactite: Toolbox for Fast Prototyping of Vertical Federated Learning Systems	Anastasiia Zakharova, Dmitriy Alexandrov, Maria Khodorchenko, Nikolay Butakov, Alexey Vasilev, Maxim Savchenko, Alexander Grigorievskiy	2024-09-23	Download	Machine learning (ML) models trained on datasets owned by different organizations and physically located in remote databases offer benefits in many real-world use cases.
MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines	Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram	2024-09-23	Download	Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers. The next frontier is LLM personalization, where a foundation model can be fine-tuned with user/task-specif...
Parallel Dynamic Maximal Matching	Mohsen Ghaffari, Anton Trygub	2024-09-23	Download	We present the first (randomized) parallel dynamic algorithm for maximal matching, which can process an arbitrary number of updates simultaneously.
A Near-Optimal Low-Energy Deterministic Distributed SSSP with Ramifications on Congestion and APSP	Mohsen Ghaffari, Anton Trygub	2024-09-23	Download	We present a low-energy deterministic distributed algorithm that computes exact Single-Source Shortest Paths (SSSP) in near-optimal time: it runs in $\tilde{O}(n)$ rounds and each node is awake during...
Renaming in distributed certification	Nicolas Bousquet, Louis Esperet, Laurent Feuilloley, Sébastien Zeitoun	2024-09-23	Download	Local certification is the area of distributed network computing asking the following question: How to certify to the nodes of a network that a global property holds, if they are limited to a local ve...
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping	Guanhua Wang, Chengming Zhang, Zheyu Shen, Ang Li, Olatunji Ruwase	2024-09-23	Download	Given the popularity of generative AI, Large Language Models (LLMs) often consume hundreds or thousands of GPUs for parallelizing and accelerating the training process.
FLeNS: Federated Learning with Enhanced Nesterov-Newton Sketch	Sunny Gupta, Mohit Jindal, Pankhi Kashyap, Pranav Jeevan, Amit Sethi	2024-09-23	Download	Federated learning faces a critical challenge in balancing communication efficiency with rapid convergence, especially for second-order methods.
PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference	Zeyu Zhang, Haiying Shen	2024-09-23	Download	The scaling of transformer-based Large Language Models (LLMs) has significantly expanded their context lengths, enabling applications where inputs exceed 100K tokens.
Cucheb: A GPU implementation of the filtered Lanczos procedure	Jared L. Aurentz, Vassilis Kalantzis, Yousef Saad	2024-09-23	Download	This paper describes the software package Cucheb, a GPU implementation of the filtered Lanczos procedure for the solution of large sparse symmetric eigenvalue problems.
UELLM: A Unified and Efficient Approach for LLM Inference Serving	Yiyuan He, Minxian Xu, Jingfeng Wu, Wanyi Zheng, Kejiang Ye, Chengzhong Xu	2024-09-23	Download	In the context of Machine Learning as a Service (MLaaS) clouds, the extensive use of Large Language Models (LLMs) often requires efficient management of significant query loads.
MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices	Kan Hu, Linfeng Wen, Minxian Xu, Kejiang Ye	2024-09-23	Download	Service Level Objectives (SLOs) aim to set threshold for service time in cloud services to ensure acceptable quality of service (QoS) and user satisfaction.
FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale	Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng	2024-09-23	Download	Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks.
Energy-Aware Federated Learning in Satellite Constellations	Nasrin Razmi, Bho Matthiesen, Armin Dekorsy, Petar Popovski	2024-09-23	Download	Federated learning in satellite constellations, where the satellites collaboratively train a machine learning model, is a promising technology towards enabling globally connected intelligence and the ...
Machine Learning Methods as Robust Quantum Noise Estimators	Jon Gardeazabal-Gutierrez, Erik B. Terres-Escudero, Pablo García Bringas	2024-09-23	Download	Access to quantum computing is steadily increasing each year as the speed advantage of quantum computers solidifies with the growing number of usable qubits.
Federated Graph Learning with Adaptive Importance-based Sampling	Anran Li, Yuanyuan Chen, Chao Ren, Wenhan Wang, Ming Hu, Tianlin Li, Han Yu, Qingyu Chen	2024-09-23	Download	For privacy-preserving graph learning tasks involving distributed graph datasets, federated learning (FL)-based GCN (FedGCN) training is required.
MECURY: Practical Cross-Chain Exchange via Trusted Hardware	Xiaoqing Wen, Quanbi Feng, Jianyu Niu, Yinqian Zhang, Chen Feng	2024-09-23	Download	The proliferation of blockchain-backed cryptocurrencies has sparked the need for cross-chain exchanges of diverse digital assets. Unfortunately, current exchanges suffer from high on-chain verificatio...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
On-Air Deep Learning Integrated Semantic Inference Models for Enhanced Earth Observation Satellite Networks	Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas	2024-09-23	Download	Earth Observation (EO) systems are crucial for cartography, disaster surveillance, and resource administration. Nonetheless, they encounter considerable obstacles in the processing and transmission of...
Intelligent Routing Algorithm over SDN: Reusable Reinforcement Learning Approach	Wang Wumian, Sajal Saha, Anwar Haque, Greg Sidebottom	2024-09-23	Download	Traffic routing is vital for the proper functioning of the Internet. As users and network traffic increase, researchers try to develop adaptive and intelligent routing algorithms that can fulfill vari...
Architectural Challenges of Nomadic Networks in 6G	Daniel Lindenschmitt, Benedikt Veith, Khurshid Alam, Ainur Daurembekova, Michael Gundall, Mohammad Asif Habibi, Bin Han, Dennis Krummacker, Philipp Rosemann, Hans D. Schotten	2024-09-23	Download	This paper examines architectural challenges and opportunities arising from Nomadic Networks in the context of emerging 6G research. Nomadic networks are proposed as a solution to the limitations of s...
Improved Routing of Multiparty Entanglement over Quantum Networks	Nirupam Basak, Goutam Paul	2024-09-23	Download	Effective routing of entanglements over a quantum network is a fundamental problem in quantum communication. Due to the fragility of quantum states, it is difficult to route entanglements at long dist...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
FRSZ2 for In-Register Block Compression Inside GMRES on GPUs	Thomas Grützmacher, Robert Underwood, Sheng Di, Franck Cappello, Hartwig Anzt	2024-09-23	Download	The performance of the GMRES iterative solver on GPUs is limited by the GPU main memory bandwidth. Compressed Basis GMRES outperforms GMRES by storing the Krylov basis in low precision, thereby reduci...
Deploying Open-Source Large Language Models: A performance Analysis	Yannis Bendi-Ouis, Dan Dutartre, Xavier Hinaut	2024-09-23	Download	Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, with many open-weight models available.