2025-03-03

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scanning HTML at Tens of Gigabytes per Second on ARM Processors | Daniel Lemire | 2025-03-03 | Download | Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. |
| A Reconfigurable Stream-Based FPGA Accelerator for Bayesian Confidence Propagation Neural Networks | Muhammad Ihsan Al Hafiz, Naresh Ravichandran, Anders Lansner, Pawel Herman, Artur Podobas | 2025-03-03 | Download | Brain-inspired algorithms are attractive and emerging alternatives to classical deep learning methods for use in various machine learning applications. |
| DCI: A Coordinated Allocation and Filling Workload-Aware Dual-Cache Allocation GNN Inference Acceleration System | Yi Luo, Yaobin Wang, Qi Wang, Yingchen Song, Huan Wu, Qingfeng Wang, Jun Huang | 2025-03-03 | Download | Graph Neural Networks (GNNs) are powerful tools for processing graph-structured data, increasingly used for large-scale real-world graphs via sampling-based inference methods. |
| Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective | Rakshit Aralimatti, Syed Abdul Gaffar Shakhadri, Kruthika KR, Kartik Basavaraj Angadi | 2025-03-03 | Download | Deploying large scale language models on edge devices faces inherent challenges such as high computational demands, energy consumption, and potential data privacy risks. |
| Scalable Connectivity for Ising Machines: Dense to Sparse | M Mahmudul Hasan Sajeeb, Navid Anjum Aadit, Shuvro Chowdhury, Tong Wu, Cesely Smith, Dhruv Chinmay, Atharva Raut, Kerem Y. Camsari, Corentin Delacour, Tathagata Srimani | 2025-03-03 | Download | In recent years, hardware implementations of Ising machines have emerged as a viable alternative to quantum computing for solving hard optimization problems among other applications. |
| CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design | Zishen Wan, Hanchen Yang, Ritik Raj, Che-Kai Liu, Ananda Samajdar, Arijit Raychowdhury, Tushar Krishna | 2025-03-03 | Download | Neurosymbolic AI is an emerging compositional paradigm that fuses neural learning with symbolic reasoning to enhance the transparency, interpretability, and trustworthiness of AI. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Enabling mixed-precision in spectral element codes | Yanxiang Chen, Pablo de Oliveira Castro, Paolo Bientinesi, Niclas Jansson, Roman Iakymchuk | 2025-03-03 | Download | Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. |
| A Survey on Heterogeneous Computing Using SmartNICs and Emerging Data Processing Units | Nathan Tibbetts, Sifat Ibtisum, Satish Puri | 2025-03-03 | Download | The emergence of new, off-path smart network cards (SmartNICs), known generally as Data Processing Units (DPU), has opened a wide range of research opportunities. |
| GRAIN: Exact Graph Reconstruction from Gradients | Maria Drencheva, Ivo Petrov, Maximilian Baader, Dimitar I. Dimitrov, Martin Vechev | 2025-03-03 | Download | Federated learning claims to enable collaborative model training among multiple clients with data privacy by transmitting gradient updates instead of the actual client data. |
| Bridging Paradigms: Designing for HPC-Quantum Convergence | Amir Shehata, Peter Groszkowski, Thomas Naughton, Murali Gopalakrishnan Meena, Elaine Wong, Daniel Claudino, Rafael Ferreira da Silvaa, Thomas Beck | 2025-03-03 | Download | This paper presents a comprehensive software stack architecture for integrating quantum computing (QC) capabilities with High-Performance Computing (HPC) environments. |
| PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization | Xinyi Wan, Penghui Qi, Guangxing Huang, Min Lin, Jialin Li | 2025-03-03 | Download | Pipeline parallelism (PP) is widely used for training large language models (LLMs), yet its scalability is often constrained by high activation memory consumption as the number of in-flight microbatch... |
| PVU: Design and Implementation of a Posit Vector Arithmetic Unit (PVU) for Enhanced Floating-Point Computing in Edge and AI Applications | Xinyu Wu, Yaobin Wang, Tianyi Zhao, Jiawei Qin, Zhu Liang, Jie Fu | 2025-03-03 | Download | With the rapid development of edge computing, artificial intelligence and other fields, the accuracy and efficiency of floating-point computing have become increasingly crucial. |
| NM-SpMM: Accelerating Matrix Multiplication Using N:M Sparsity with GPGPU | Cong Ma, Du Wu, Zhelang Deng, Jiang Chen, Xiaowen Huang, Jintao Meng, Wenxi Zhu, Bingqiang Wang, Amelie Chi Zhou, Peng Chen, Minwen Deng, Yanjie Wei, Shengzhong Feng, Yi Pan | 2025-03-03 | Download | Deep learning demonstrates effectiveness across a wide range of tasks. However, the dense and over-parameterized nature of these models results in significant resource consumption during deployment. |
| Alchemist: Towards the Design of Efficient Online Continual Learning System | Yuyang Huang, Yuhan Liu, Haryadi S. Gunawi, Beibin Li, Changho Hwang | 2025-03-03 | Download | Continual learning has become a promising solution to refine large language models incrementally by leveraging user feedback. In particular, online continual learning - iteratively training the model ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Survey on Heterogeneous Computing Using SmartNICs and Emerging Data Processing Units | Nathan Tibbetts, Sifat Ibtisum, Satish Puri | 2025-03-03 | Download | The emergence of new, off-path smart network cards (SmartNICs), known generally as Data Processing Units (DPU), has opened a wide range of research opportunities. |
| m4: A Learned Flow-level Network Simulator | Chenning Li, Anton A. Zabreyko, Arash Nasr-Esfahany, Kevin Zhao, Prateesh Goyal, Mohammad Alizadeh, Thomas Anderson | 2025-03-03 | Download | Flow-level simulation is widely used to model large-scale data center networks due to its scalability. Unlike packet-level simulators that model individual packets, flow-level simulators abstract traf... |
| Application of the List Viterbi Algorithm for Satellite-based AIS Detection | Linda Kanaan, Karine Amis, Frédéric Guilloud, Rémi Chauvat | 2025-03-03 | Download | Satellites receiving Automatic Identification System (AIS) packets in dense areas are particularly prone to AIS channel overload due to the extensive number of vessels. |
| Formally Discovering and Reproducing Network Protocols Vulnerabilities | Christophe Crochet, John Aoga, Axel Legay | 2025-03-03 | Download | The rapid evolution of cyber threats has increased the need for robust methods to discover vulnerabilities in increasingly complex and diverse network protocols. |
| A Survey on Semantic Communications in Internet of Vehicles | Sha Ye, Qiong Wu, Pingyi Fan, Qiang Fan | 2025-03-03 | Download | Internet of Vehicles (IoV), as the core of intelligent transportation system, enables comprehensive interconnection between vehicles and their surroundings through multiple communication modes, which ... |
| An Empirical Smart Contracts Latency Analysis on Ethereum Blockchain for Trustworthy Inter-Provider Agreements | Farhana Javed, Josep Mangues-Bafalluy | 2025-03-03 | Download | As 6G networks evolve, inter-provider agreements become crucial for dynamic resource sharing and network slicing across multiple domains, requiring on-demand capacity provisioning while enabling trust... |
| Verifying QUIC implementations using Ivy | Christophe Crochet, Tom Rousseaux, J-F Sambon, Maxime Piraux, Axel Legay | 2025-03-03 | Download | QUIC is a new transport protocol combining the reliability and congestion control features of TCP with the security features of TLS. One of the main challenges with QUIC is to guarantee that any of it... |
| Measuring the Energy of Smartphone Communications in the Edge-Cloud Continuum: Approaches, Challenges, and a Case Study | Chiara Caiazza, Valerio Luconi, Alessio Vecchio | 2025-03-03 | Download | As computational resources are placed at different points in the edge-cloud continuum, not only the responsiveness on the client side is affected, but also the energy spent during communications. |
| Comparative Analysis of Ray Tracing and Rayleigh Fading Models for Distributed MIMO Systems in Industrial Environments | Aymen Jaziri, David Demmer, Yoann Corre, Jean-Baptiste Doré, Didier Le Ruyet, Hmaied Shaiek, Pascal Chevalier | 2025-03-03 | Download | This paper presents a detailed analysis of coverage in a factory environment using realistic 3D map data to evaluate the benefits of Distributed MIMO (D-MIMO) over colocalized approach. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Bomfather: An eBPF-based Kernel-level Monitoring Framework for Accurate Identification of Unknown, Unused, and Dynamically Loaded Dependencies in Modern Software Supply Chains | Naveen Srinivasan, Nathan Naveen, Neil Naveen | 2025-03-03 | Download | Inaccuracies in conventional dependency-tracking methods frequently undermine the security and integrity of modern software supply chains. This paper introduces a kernel-level framework leveraging ext... |
| TUNA: Tuning Unstable and Noisy Cloud Applications | Johannes Freischuetz, Konstantinos Kanellis, Brian Kroth, Shivaram Venkataraman | 2025-03-03 | Download | Autotuning plays a pivotal role in optimizing the performance of systems, particularly in large-scale cloud deployments. One of the main challenges in performing autotuning in the cloud arises from pe... |
| CHRONOS: Compensating Hardware Related Overheads with Native Multi Timer Support for Real-Time Operating Systems | Kay Heider, Christian Hakert, Kuan-Hsun Chen, Jian-Jia Chen | 2025-03-03 | Download | The management of timing constraints in a real-time operating system (RTOS) is usually realized through a global tick counter. This counter acts as the foundational time unit for all tasks in the syst... |
| Scalable and Accurate Application-Level Crash-Consistency Testing via Representative Testing | Yile Gu, Ian Neal, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci | 2025-03-03 | Download | Crash consistency is essential for applications that must persist data. Crash-consistency testing has been commonly applied to find crash-consistency bugs in applications. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| m4: A Learned Flow-level Network Simulator | Chenning Li, Anton A. Zabreyko, Arash Nasr-Esfahany, Kevin Zhao, Prateesh Goyal, Mohammad Alizadeh, Thomas Anderson | 2025-03-03 | Download | Flow-level simulation is widely used to model large-scale data center networks due to its scalability. Unlike packet-level simulators that model individual packets, flow-level simulators abstract traf... |
| Performance Optimization of 3D Stencil Computation on ARM Scalable Vector Extension | Hongguang Chen | 2025-03-03 | Download | Stencil computation is essential in high-performance computing, especially for large-scale tasks like liquid simulation and weather forecasting. |