2025-03-03
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Scanning HTML at Tens of Gigabytes per Second on ARM Processors | Daniel Lemire | 2025-03-03 | Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. |
| A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks | Muhammad Ihsan Al Hafiz, Naresh Ravichandran, Anders Lansner, Pawel Herman, Artur Podobas | 2025-03-03 | Brain-inspired algorithms are attractive and emerging alternatives to classical deep learning methods for use in various machine learning applications. |
| DCI: A Coordinated Allocation and Filling Workload-Aware Dual-Cache Allocation GNN Inference Acceleration System | Yi Luo, Yaobin Wang, Qi Wang, Yingchen Song, Huan Wu, Qingfeng Wang, Jun Huang | 2025-03-03 | Graph Neural Networks (GNNs) are powerful tools for processing graph-structured data, increasingly used for large-scale real-world graphs via sampling-based inference methods. |
| Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective | Rakshit Aralimatti, Syed Abdul Gaffar Shakhadri, Kruthika KR, Kartik Basavaraj Angadi | 2025-03-03 | Deploying large scale language models on edge devices faces inherent challenges such as high computational demands, energy consumption, and potential data privacy risks. |
| Scalable Connectivity for Ising Machines: Dense to Sparse | M Mahmudul Hasan Sajeeb, Navid Anjum Aadit, Shuvro Chowdhury, Tong Wu, Cesely Smith, Dhruv Chinmay, Atharva Raut, Kerem Y. Camsari, Corentin Delacour, Tathagata Srimani | 2025-03-03 | In recent years, hardware implementations of Ising machines have emerged as a viable alternative to quantum computing for solving hard optimization problems among other applications. |
| CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design | Zishen Wan, Hanchen Yang, Ritik Raj, Che-Kai Liu, Ananda Samajdar, Arijit Raychowdhury, Tushar Krishna | 2025-03-03 | Neurosymbolic AI is an emerging compositional paradigm that fuses neural learning with symbolic reasoning to enhance the transparency, interpretability, and trustworthiness of AI. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Enabling mixed-precision in spectral element codes | Yanxiang Chen, Pablo de Oliveira Castro, Paolo Bientinesi, Niclas Jansson, Roman Iakymchuk | 2025-03-03 | Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. |
| A Survey on Heterogeneous Computing Using SmartNICs and Emerging Data Processing Units | Nathan Tibbetts, Sifat Ibtisum, Satish Puri | 2025-03-03 | The emergence of new, off-path smart network cards (SmartNICs), known generally as Data Processing Units (DPU), has opened a wide range of research opportunities. |
| GRAIN: Exact Graph Reconstruction from Gradients | Maria Drencheva, Ivo Petrov, Maximilian Baader, Dimitar I. Dimitrov, Martin Vechev | 2025-03-03 | Federated learning claims to enable collaborative model training among multiple clients with data privacy by transmitting gradient updates instead of the actual client data. |
| Bridging Paradigms: Designing for HPC-Quantum Convergence | Amir Shehata, Peter Groszkowski, Thomas Naughton, Murali Gopalakrishnan Meena, Elaine Wong, Daniel Claudino, Rafael Ferreira da Silva, Thomas Beck | 2025-03-03 | This paper presents a comprehensive software stack architecture for integrating quantum computing (QC) capabilities with High-Performance Computing (HPC) environments. |
| PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization | Xinyi Wan, Penghui Qi, Guangxing Huang, Min Lin, Jialin Li | 2025-03-03 | Pipeline parallelism (PP) is widely used for training large language models (LLMs), yet its scalability is often constrained by high activation memory consumption as the number of in-flight microbatch... |
| PVU: Design and Implementation of a Posit Vector Arithmetic Unit (PVU) for Enhanced Floating-Point Computing in Edge and AI Applications | Xinyu Wu, Yaobin Wang, Tianyi Zhao, Jiawei Qin, Zhu Liang, Jie Fu | 2025-03-03 | With the rapid development of edge computing, artificial intelligence and other fields, the accuracy and efficiency of floating-point computing have become increasingly crucial. |
| NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU | Cong Ma, Du Wu, Zhelang Deng, Jiang Chen, Xiaowen Huang, Jintao Meng, Wenxi Zhu, Bingqiang Wang, Amelie Chi Zhou, Peng Chen, Minwen Deng, Yanjie Wei, Shengzhong Feng, Yi Pan | 2025-03-03 | Deep learning demonstrates effectiveness across a wide range of tasks. However, the dense and over-parameterized nature of these models results in significant resource consumption during deployment. |
| Alchemist: Towards the Design of Efficient Online Continual Learning System | Yuyang Huang, Yuhan Liu, Haryadi S. Gunawi, Beibin Li, Changho Hwang | 2025-03-03 | Continual learning has become a promising solution to refine large language models incrementally by leveraging user feedback. In particular, online continual learning - iteratively training the model ... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| A Survey on Heterogeneous Computing Using SmartNICs and Emerging Data Processing Units | Nathan Tibbetts, Sifat Ibtisum, Satish Puri | 2025-03-03 | The emergence of new, off-path smart network cards (SmartNICs), known generally as Data Processing Units (DPU), has opened a wide range of research opportunities. |
| m4: A Learned Flow-level Network Simulator | Chenning Li, Anton A. Zabreyko, Arash Nasr-Esfahany, Kevin Zhao, Prateesh Goyal, Mohammad Alizadeh, Thomas Anderson | 2025-03-03 | Flow-level simulation is widely used to model large-scale data center networks due to its scalability. Unlike packet-level simulators that model individual packets, flow-level simulators abstract traf... |
| Application of the List Viterbi Algorithm for Satellite-based AIS Detection | Linda Kanaan, Karine Amis, Frédéric Guilloud, Rémi Chauvat | 2025-03-03 | Satellites receiving Automatic Identification System (AIS) packets in dense areas are particularly prone to AIS channel overload due to the extensive number of vessels. |
| Formally Discovering and Reproducing Network Protocols Vulnerabilities | Christophe Crochet, John Aoga, Axel Legay | 2025-03-03 | The rapid evolution of cyber threats has increased the need for robust methods to discover vulnerabilities in increasingly complex and diverse network protocols. |
| A Survey on Semantic Communications in Internet of Vehicles | Sha Ye, Qiong Wu, Pingyi Fan, Qiang Fan | 2025-03-03 | Internet of Vehicles (IoV), as the core of intelligent transportation system, enables comprehensive interconnection between vehicles and their surroundings through multiple communication modes, which ... |
| An Empirical Smart Contracts Latency Analysis on Ethereum Blockchain for Trustworthy Inter-Provider Agreements | Farhana Javed, Josep Mangues-Bafalluy | 2025-03-03 | As 6G networks evolve, inter-provider agreements become crucial for dynamic resource sharing and network slicing across multiple domains, requiring on-demand capacity provisioning while enabling trust... |
| Verifying QUIC implementations using Ivy | Christophe Crochet, Tom Rousseaux, J-F Sambon, Maxime Piraux, Axel Legay | 2025-03-03 | QUIC is a new transport protocol combining the reliability and congestion control features of TCP with the security features of TLS. One of the main challenges with QUIC is to guarantee that any of it... |
| Measuring the Energy of Smartphone Communications in the Edge-Cloud Continuum: Approaches, Challenges, and a Case Study | Chiara Caiazza, Valerio Luconi, Alessio Vecchio | 2025-03-03 | As computational resources are placed at different points in the edge-cloud continuum, not only the responsiveness on the client side is affected, but also the energy spent during communications. |
| Comparative Analysis of Ray Tracing and Rayleigh Fading Models for Distributed MIMO Systems in Industrial Environments | Aymen Jaziri, David Demmer, Yoann Corre, Jean-Baptiste Doré, Didier Le Ruyet, Hmaied Shaiek, Pascal Chevalier | 2025-03-03 | This paper presents a detailed analysis of coverage in a factory environment using realistic 3D map data to evaluate the benefits of Distributed MIMO (D-MIMO) over colocalized approach. |
cs.OS - Operating Systems
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Bomfather: An eBPF-based Kernel-level Monitoring Framework for Accurate Identification of Unknown, Unused, and Dynamically Loaded Dependencies in Modern Software Supply Chains | Naveen Srinivasan, Nathan Naveen, Neil Naveen | 2025-03-03 | Inaccuracies in conventional dependency-tracking methods frequently undermine the security and integrity of modern software supply chains. This paper introduces a kernel-level framework leveraging ext... |
| TUNA: Tuning Unstable and Noisy Cloud Applications | Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman | 2025-03-03 | Autotuning plays a pivotal role in optimizing the performance of systems, particularly in large-scale cloud deployments. One of the main challenges in performing autotuning in the cloud arises from pe... |
| CHRONOS: Compensating Hardware Related Overheads with Native Multi Timer Support for Real-Time Operating Systems | Kay Heider, Christian Hakert, Kuan-Hsun Chen, Jian-Jia Chen | 2025-03-03 | The management of timing constraints in a real-time operating system (RTOS) is usually realized through a global tick counter. This counter acts as the foundational time unit for all tasks in the syst... |
| Scalable and Accurate Application-Level Crash-Consistency Testing via Representative Testing | Yile Gu, Ian Neal, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci | 2025-03-03 | Crash consistency is essential for applications that must persist data. Crash-consistency testing has been commonly applied to find crash-consistency bugs in applications. |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| m4: A Learned Flow-level Network Simulator | Chenning Li, Anton A. Zabreyko, Arash Nasr-Esfahany, Kevin Zhao, Prateesh Goyal, Mohammad Alizadeh, Thomas Anderson | 2025-03-03 | Flow-level simulation is widely used to model large-scale data center networks due to its scalability. Unlike packet-level simulators that model individual packets, flow-level simulators abstract traf... |
| Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension | Hongguang Chen | 2025-03-03 | Stencil computation is essential in high-performance computing, especially for large-scale tasks like liquid simulation and weather forecasting. |