2026-02-26

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths	Marco Graziano	2026-02-26	Download	AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure.
A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification	David Condrey	2026-02-26	Download	Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state.
BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator	Yuhao Liu, Salim Ullah, Akash Kumar	2026-02-26	Download	Lightweight neural network accelerators are essential for edge devices with limited resources and power constraints. While quantization and binarization can efficiently reduce hardware cost, they stil...
Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators	Yuhao Liu, Salim Ullah, Akash Kumar	2026-02-26	Download	Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explored the quantization technologies in relat...
Real-Time Stream Compaction for Sparse Machine Learning on FPGAs	Marc Neu, Isabel Haide, Torben Ferber, Jürgen Becker	2026-02-26	Download	Machine learning algorithms are being used more frequently in the first-level triggers in collider experiments, with Graph Neural Networks pushing the hardware requirements of FPGA-based triggers beyo...
The AetherFloat Family: Block-Scale-Free Quad-Radix Floating-Point Architectures for AI Accelerators	Keita Morisaki	2026-02-26	Download	The IEEE 754 floating-point standard is the bedrock of modern computing, but its structural requirements -- a hidden leading bit, Base-2 bit-level normalization, and Sign-Magnitude encoding -- impose ...
EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning	Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang	2026-02-26	Download	Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed on...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths	Marco Graziano	2026-02-26	Download	AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure.
Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents	Aishwarya Sarkar, Sayan Ghosh, Nathan Tallent, Aman Chadha, Tanya Roosta, Ali Jannesari	2026-02-26	Download	Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular co...
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments	Anik Pramanik, Murat Kantarcioglu, Vincent Oria, Shantanu Sharma	2026-02-26	Download	Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous.
2G2T: Constant-Size, Statistically Sound MSM Outsourcing	Majid Khabbazian	2026-02-26	Download	Multi-scalar multiplication (MSM), defined as $\mathrm{MSM}(P, x) = \sum_{i=1}^{n} x_i P_i$, is a dominant computational kernel in discrete-logarithm-based cryptography and often becomes a bottleneck for verifiers ...
The PLUTO Code on GPUs: Offloading Lagrangian Particle Methods	Alessio Suriano, Stefano Truzzi, Agnese Costa, Marco Rossazza, Nitin Shukla, Andrea Mignone, Vittoria Berta, Claudio Zanni	2026-02-26	Download	The Lagrangian Particles (LP) module of the PLUTO code offers a powerful simulation tool to predict the non-thermal emission produced by shock accelerated particles in large-scale relativistic magneti...
Exploiting network topology in brain-scale simulations of spiking neural networks	Melissa Lober, Markus Diesmann, Susanne Kunkel	2026-02-26	Download	Simulation code for conventional supercomputers serves as a reference for neuromorphic computing systems. The present bottleneck of distributed large-scale spiking neuronal network simulations is the ...
STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems	Chris Egersdoerfer, Philip Carns, Shane Snyder, Robert Ross, Dong Dai	2026-02-26	Download	I/O performance is crucial to efficiency in data-intensive scientific computing; but tuning large-scale storage systems is complex, costly, and notoriously manpower-intensive, making it inaccessible f...
A High-Throughput AES-GCM Implementation on GPUs for Secure, Policy-Based Access to Massive Astronomical Catalogs	Samuel Lemes-Perera, Miguel R. Alarcon, Pino Caballero-Gil, Miquel Serra-Ricart	2026-02-26	Download	The era of large astronomical surveys generates massive image catalogs requiring efficient and secure access, particularly during pre-publication periods where data confidentiality and integrity are p...
LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure	Jaehong Cho, Hyunmin Choi, Guseul Heo, Jongse Park	2026-02-26	Download	Large language model (LLM) serving infrastructures are undergoing a shift toward heterogeneity and disaggregation. Modern deployments increasingly integrate diverse accelerators and near-memory proces...
A Simple Distributed Deterministic Planar Separator	Yaseen Abd-Elhaleem, Michal Dory, Oren Weimann	2026-02-26	Download	A balanced separator of a graph $G$ is a set of vertices whose removal disconnects the graph into connected components that are a constant factor smaller than $G$.
Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks	Oliver Larsson, Thijs Metsch, Cristian Klein, Erik Elmroth	2026-02-26	Download	Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU u...
Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching	Hiroki Matsutani, Naoki Matsuda, Naoto Sugiura	2026-02-26	Download	Since local LLM inference on resource-constrained edge devices imposes a severe performance bottleneck, this paper proposes distributed prompt caching to enhance inference performance by cooperatively...
An Artificial Intelligence Framework for Joint Structural-Temporal Load Forecasting in Cloud Native Platforms	Qingyuan Zhang	2026-02-26	Download	This study targets cloud native environments where microservice invocation relations are complex, load fluctuations are multi-scale and superimposed, and cross-service impacts are significant.
Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study	Philipp Wiesner, Soeren Becker, Brett Cornick, Dominik Scheinert, Alexander Acker, Odej Kao	2026-02-26	Download	Training large language models (LLMs) requires substantial compute and energy. At the same time, renewable energy sources regularly produce more electricity than the grid can absorb, leading to curtai...
Dynamic Hierarchical Birkhoff-von Neumann Decomposition for All-to-All GPU Communication	Yen-Chieh Wu, Cheng-Shang Chang, Duan-Shin Lee, H. Jonathan Chao	2026-02-26	Download	All-to-all GPU communication is a critical bottleneck in large-scale training clusters, where completion time is constrained by per-port bandwidth and can be severely impacted by traffic skew across G...
RLHFless: Serverless Computing for Efficient RLHF	Rui Wei, Hanfei Yu, Shubham Jain, Yogarajan Sivakumar, Devesh Tiwari, Jian Li, Seung-Jong Park, Hao Wang	2026-02-26	Download	Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences.
Tackling Privacy Heterogeneity in Differentially Private Federated Learning	Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang	2026-02-26	Download	Differentially private federated learning (DP-FL) enables clients to collaboratively train machine learning models while preserving the privacy of their local data.
FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving	Shouwei Gao, Junqi Yin, Feiyi Wang, Wenqian Dong	2026-02-26	Download	Production LLM serving must simultaneously deliver high throughput, low latency, and sufficient context capacity under non-stationary traffic and mixed request requirements.
FuxiShuffle: An Adaptive and Resilient Shuffle Service for Distributed Data Processing on Alibaba Cloud	Yuhao Lin, Zhipeng Tang, Jiayan Tong, Junqing Xiao, Bin Lu, Yuhang Li, Chao Li, Zhiguo Zhang, Junhua Wang, Hao Luo, James Cheng, Chuang Hu, Jiawei Jiang, Xiao Yan	2026-02-26	Download	Shuffle exchanges intermediate results between upstream and downstream operators in distributed data processing and is usually the bottleneck due to factors such as small random I/Os and network conte...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Software-Defined Testbed for Quantifying Deauthentication Resilience in Modern Wi-Fi Networks	Alex Carbajal, Asma Jodeiri Akbarfam	2026-02-26	Download	Wi-Fi deauthentication attacks remain a practical denial-of-service (DoS) threat by exploiting unprotected management frames to disrupt client connectivity.
Dynamic Hierarchical Birkhoff-von Neumann Decomposition for All-to-All GPU Communication	Yen-Chieh Wu, Cheng-Shang Chang, Duan-Shin Lee, H. Jonathan Chao	2026-02-26	Download	All-to-all GPU communication is a critical bottleneck in large-scale training clusters, where completion time is constrained by per-port bandwidth and can be severely impacted by traffic skew across G...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification	David Condrey	2026-02-26	Download	Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents	Aishwarya Sarkar, Sayan Ghosh, Nathan Tallent, Aman Chadha, Tanya Roosta, Ali Jannesari	2026-02-26	Download	Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular co...
The Road to Useful Quantum Computers	Timothy Proctor, Robin Blume-Kohout, Andrew Baczewski	2026-02-26	Download	Building a useful quantum computer is a grand science and engineering challenge, currently pursued intensely by teams around the world. In the 1980s, Richard Feynman and Yuri Manin observed independen...