2024-09-23
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Investigating Robot Dogs for Construction Monitoring: A Comparative Analysis of Specifications and On-site Requirements | Miguel Arturo Vega Torres, Fabian Pfitzner | 2024-09-23 | Download | Robot dogs are receiving increasing attention in various fields of research. However, the number of studies investigating their potential usability on construction sites is scarce. |
| Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog | Bingkun Yao, Ning Wang, Jie Zhou, Xi Wang, Hong Gao, Zhe Jiang, Nan Guan | 2024-09-23 | Download | Bug localization in Verilog code is a crucial and time-consuming task during the verification of hardware design. Since introduction, Large Language Models (LLMs) have showed their strong programming ... |
| Chattronics: using GPTs to assist in the design of data acquisition systems | Jonathan Paul Driemeyer Brown, Tiago Oliveira Weber | 2024-09-23 | Download | The usefulness of Large Language Models (LLM) is being continuously tested in various fields. However, their intrinsic linguistic characteristic is still one of the limiting factors when applying thes... |
| Fast Virtual Gate Extraction For Silicon Quantum Dot Devices | Shize Che, Seong W Oh, Haoyun Qin, Yuhao Liu, Anthony Sigillito, Gushu Li | 2024-09-23 | Download | Silicon quantum dot devices stand as promising candidates for large-scale quantum computing due to their extended coherence times, compact size, and recent experimental demonstrations of sizable qubit... |
| Google Quantum AI's Quest for Error-Corrected Quantum Computers | M. AbuGhanem | 2024-09-23 | Download | Quantum computers stand at the forefront of technological innovation, offering exponential computational speed-ups that challenge classical computing capabilities. |
| Exploring and Exploiting Runtime Reconfigurable Floating Point Precision in Scientific Computing: a Case Study for Solving PDEs | Cong "Callie" Hao | 2024-09-23 | Download | Scientific computing applications, such as computational fluid dynamics and climate modeling, typically rely on 64-bit double-precision floating-point operations, which are extremely costly in terms o... |
| Analogous Alignments: Digital "Formally" meets Analog | Hansa Mohanty, Deepak Narayan Gadde | 2024-09-23 | Download | The complexity of modern-day System-on-Chips (SoCs) is continually increasing, and it becomes increasingly challenging to deliver dependable and credible chips in a short time-to-market. |
| FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale | Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng | 2024-09-23 | Download | Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks. |
| A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures | Fernando M. Quintana, Maryada, Pedro L. Galindo, Elisa Donati, Giacomo Indiveri, Fernando Perez-Peña | 2024-09-23 | Download | Developing dedicated mixed-signal neuromorphic computing systems optimized for real-time sensory-processing in extreme edge-computing applications requires time-consuming design, fabrication, and depl... |
| Efficient Tabular Data Preprocessing of ML Pipelines | Yu Zhu, Wenqi Jiang, Gustavo Alonso | 2024-09-23 | Download | Data preprocessing pipelines, which includes data decoding, cleaning, and transforming, are a crucial component of Machine Learning (ML) training. |
| MICSim: A Modular Simulator for Mixed-signal Compute-in-Memory based AI Accelerator | Cong Wang, Zeming Chen, Shanshi Huang | 2024-09-23 | Download | This work introduces MICSim, an open-source, pre-circuit simulator designed for early-stage evaluation of chip-level software performance and hardware overhead of mixed-signal compute-in-memory (CIM) ... |
| MESC: Re-thinking Algorithmic Priority and/or Criticality Inversions for Heterogeneous MCSs | Jiapeng Guan, Ran Wei, Dean You, Yingquan Wang, Ruizhe Yang, Hui Wang, Zhe Jiang | 2024-09-23 | Download | Modern Mixed-Criticality Systems (MCSs) rely on hardware heterogeneity to satisfy ever-increasing computational demands. However, most of the heterogeneous co-processors are designed to achieve high t... |
| Hardware/Algorithm Co-design for Real-Time I/O Control with Improved Timing Accuracy and Robustness | Zhe Jiang, Shuai Zhao, Ran Wei, Xin Si, Gang Chen, Nan Guan | 2024-09-23 | Download | In safety-critical systems, timing accuracy is the key to achieving precise I/O control. To meet such strict timing requirements, dedicated hardware assistance has recently been investigated and devel... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training | Daiyaan Arfeen, Zhen Zhang, Xinwei Fu, Gregory R. Ganger, Yida Wang | 2024-09-23 | Download | Training Deep Neural Networks (DNNs) with billions of parameters generally involves pipeline-parallel (PP) execution. Unfortunately, PP model training can use GPUs inefficiently, especially at large s... |
| Stalactite: Toolbox for Fast Prototyping of Vertical Federated Learning Systems | Anastasiia Zakharova, Dmitriy Alexandrov, Maria Khodorchenko, Nikolay Butakov, Alexey Vasilev, Maxim Savchenko, Alexander Grigorievskiy | 2024-09-23 | Download | Machine learning (ML) models trained on datasets owned by different organizations and physically located in remote databases offer benefits in many real-world use cases. |
| MobiZO: Enabling Efficient LLM Fine-Tuning at the Edge via Inference Engines | Lei Gao, Amir Ziashahabi, Yue Niu, Salman Avestimehr, Murali Annavaram | 2024-09-23 | Download | Large Language Models (LLMs) are currently pre-trained and fine-tuned on large cloud servers. The next frontier is LLM personalization, where a foundation model can be fine-tuned with user/task-specif... |
| Parallel Dynamic Maximal Matching | Mohsen Ghaffari, Anton Trygub | 2024-09-23 | Download | We present the first (randomized) parallel dynamic algorithm for maximal matching, which can process an arbitrary number of updates simultaneously. |
| A Near-Optimal Low-Energy Deterministic Distributed SSSP with Ramifications on Congestion and APSP | Mohsen Ghaffari, Anton Trygub | 2024-09-23 | Download | We present a low-energy deterministic distributed algorithm that computes exact Single-Source Shortest Paths (SSSP) in near-optimal time: it runs in rounds and each node is awake during... |
| Renaming in distributed certification | Nicolas Bousquet, Louis Esperet, Laurent Feuilloley, Sébastien Zeitoun | 2024-09-23 | Download | Local certification is the area of distributed network computing asking the following question: How to certify to the nodes of a network that a global property holds, if they are limited to a local ve... |
| Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping | Guanhua Wang, Chengming Zhang, Zheyu Shen, Ang Li, Olatunji Ruwase | 2024-09-23 | Download | Given the popularity of generative AI, Large Language Models (LLMs) often consume hundreds or thousands of GPUs for parallelizing and accelerating the training process. |
| FLeNS: Federated Learning with Enhanced Nesterov-Newton Sketch | Sunny Gupta, Mohit Jindal, Pankhi Kashyap, Pranav Jeevan, Amit Sethi | 2024-09-23 | Download | Federated learning faces a critical challenge in balancing communication efficiency with rapid convergence, especially for second-order methods. |
| PecSched: Preemptive and Efficient Cluster Scheduling for LLM Inference | Zeyu Zhang, Haiying Shen | 2024-09-23 | Download | The scaling of transformer-based Large Language Models (LLMs) has significantly expanded their context lengths, enabling applications where inputs exceed 100K tokens. |
| Cucheb: A GPU implementation of the filtered Lanczos procedure | Jared L. Aurentz, Vassilis Kalantzis, Yousef Saad | 2024-09-23 | Download | This paper describes the software package Cucheb, a GPU implementation of the filtered Lanczos procedure for the solution of large sparse symmetric eigenvalue problems. |
| UELLM: A Unified and Efficient Approach for LLM Inference Serving | Yiyuan He, Minxian Xu, Jingfeng Wu, Wanyi Zheng, Kejiang Ye, Chengzhong Xu | 2024-09-23 | Download | In the context of Machine Learning as a Service (MLaaS) clouds, the extensive use of Large Language Models (LLMs) often requires efficient management of significant query loads. |
| MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices | Kan Hu, Linfeng Wen, Minxian Xu, Kejiang Ye | 2024-09-23 | Download | Service Level Objectives (SLOs) aim to set threshold for service time in cloud services to ensure acceptable quality of service (QoS) and user satisfaction. |
| FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale | Zeyu Zhu, Peisong Wang, Qinghao Hu, Gang Li, Xiaoyao Liang, Jian Cheng | 2024-09-23 | Download | Graph Neural Networks (GNNs) have shown great superiority on non-Euclidean graph data, achieving ground-breaking performance on various graph-related tasks. |
| Energy-Aware Federated Learning in Satellite Constellations | Nasrin Razmi, Bho Matthiesen, Armin Dekorsy, Petar Popovski | 2024-09-23 | Download | Federated learning in satellite constellations, where the satellites collaboratively train a machine learning model, is a promising technology towards enabling globally connected intelligence and the ... |
| Machine Learning Methods as Robust Quantum Noise Estimators | Jon Gardeazabal-Gutierrez, Erik B. Terres-Escudero, Pablo García Bringas | 2024-09-23 | Download | Access to quantum computing is steadily increasing each year as the speed advantage of quantum computers solidifies with the growing number of usable qubits. |
| Federated Graph Learning with Adaptive Importance-based Sampling | Anran Li, Yuanyuan Chen, Chao Ren, Wenhan Wang, Ming Hu, Tianlin Li, Han Yu, Qingyu Chen | 2024-09-23 | Download | For privacy-preserving graph learning tasks involving distributed graph datasets, federated learning (FL)-based GCN (FedGCN) training is required. |
| MECURY: Practical Cross-Chain Exchange via Trusted Hardware | Xiaoqing Wen, Quanbi Feng, Jianyu Niu, Yinqian Zhang, Chen Feng | 2024-09-23 | Download | The proliferation of blockchain-backed cryptocurrencies has sparked the need for cross-chain exchanges of diverse digital assets. Unfortunately, current exchanges suffer from high on-chain verificatio... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| On-Air Deep Learning Integrated Semantic Inference Models for Enhanced Earth Observation Satellite Networks | Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Luis M. Garces-Socarras, Jorge L. Gonzalez-Rios, Juan Carlos Merlano-Duncan, Symeon Chatzinotas | 2024-09-23 | Download | Earth Observation (EO) systems are crucial for cartography, disaster surveillance, and resource administration. Nonetheless, they encounter considerable obstacles in the processing and transmission of... |
| Intelligent Routing Algorithm over SDN: Reusable Reinforcement Learning Approach | Wang Wumian, Sajal Saha, Anwar Haque, Greg Sidebottom | 2024-09-23 | Download | Traffic routing is vital for the proper functioning of the Internet. As users and network traffic increase, researchers try to develop adaptive and intelligent routing algorithms that can fulfill vari... |
| Architectural Challenges of Nomadic Networks in 6G | Daniel Lindenschmitt, Benedikt Veith, Khurshid Alam, Ainur Daurembekova, Michael Gundall, Mohammad Asif Habibi, Bin Han, Dennis Krummacker, Philipp Rosemann, Hans D. Schotten | 2024-09-23 | Download | This paper examines architectural challenges and opportunities arising from Nomadic Networks in the context of emerging 6G research. Nomadic networks are proposed as a solution to the limitations of s... |
| Improved Routing of Multiparty Entanglement over Quantum Networks | Nirupam Basak, Goutam Paul | 2024-09-23 | Download | Effective routing of entanglements over a quantum network is a fundamental problem in quantum communication. Due to the fragility of quantum states, it is difficult to route entanglements at long dist... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| FRSZ2 for In-Register Block Compression Inside GMRES on GPUs | Thomas Grützmacher, Robert Underwood, Sheng Di, Franck Cappello, Hartwig Anzt | 2024-09-23 | Download | The performance of the GMRES iterative solver on GPUs is limited by the GPU main memory bandwidth. Compressed Basis GMRES outperforms GMRES by storing the Krylov basis in low precision, thereby reduci... |
| Deploying Open-Source Large Language Models: A performance Analysis | Yannis Bendi-Ouis, Dan Dutartre, Xavier Hinaut | 2024-09-23 | Download | Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, with many open-weight models available. |