
2026-02-26

cs.AR - Architecture

| Title | Authors | Published | Abstract |
|---|---|---|---|
| The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths | Marco Graziano | 2026-02-26 | AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure. |
| A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification | David Condrey | 2026-02-26 | Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state. |
| BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator | Yuhao Liu, Salim Ullah, Akash Kumar | 2026-02-26 | Lightweight neural network accelerators are essential for edge devices with limited resources and power constraints. While quantization and binarization can efficiently reduce hardware cost, they stil... |
| Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators | Yuhao Liu, Salim Ullah, Akash Kumar | 2026-02-26 | Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explored the quantization technologies in relat... |
| Real-Time Stream Compaction for Sparse Machine Learning on FPGAs | Marc Neu, Isabel Haide, Torben Ferber, Jürgen Becker | 2026-02-26 | Machine learning algorithms are being used more frequently in the first-level triggers in collider experiments, with Graph Neural Networks pushing the hardware requirements of FPGA-based triggers beyo... |
| The AetherFloat Family: Block-Scale-Free Quad-Radix Floating-Point Architectures for AI Accelerators | Keita Morisaki | 2026-02-26 | The IEEE 754 floating-point standard is the bedrock of modern computing, but its structural requirements -- a hidden leading bit, Base-2 bit-level normalization, and Sign-Magnitude encoding -- impose ... |
| EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning | Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang | 2026-02-26 | Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed on... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
|---|---|---|---|
| The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths | Marco Graziano | 2026-02-26 | AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure. |
| Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents | Aishwarya Sarkar, Sayan Ghosh, Nathan Tallent, Aman Chadha, Tanya Roosta, Ali Jannesari | 2026-02-26 | Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular co... |
| FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments | Anik Pramanik, Murat Kantarcioglu, Vincent Oria, Shantanu Sharma | 2026-02-26 | Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous. |
| 2G2T: Constant-Size, Statistically Sound MSM Outsourcing | Majid Khabbazian | 2026-02-26 | Multi-scalar multiplication (MSM), defined as $\mathrm{MSM}(P, x) = \sum_{i=1}^{n} x_i P_i$, is a dominant computational kernel in discrete-logarithm-based cryptography and often becomes a bottleneck for verifiers ... |
| The PLUTO Code on GPUs: Offloading Lagrangian Particle Methods | Alessio Suriano, Stefano Truzzi, Agnese Costa, Marco Rossazza, Nitin Shukla, Andrea Mignone, Vittoria Berta, Claudio Zanni | 2026-02-26 | The Lagrangian Particles (LP) module of the PLUTO code offers a powerful simulation tool to predict the non-thermal emission produced by shock accelerated particles in large-scale relativistic magneti... |
| Exploiting network topology in brain-scale simulations of spiking neural networks | Melissa Lober, Markus Diesmann, Susanne Kunkel | 2026-02-26 | Simulation code for conventional supercomputers serves as a reference for neuromorphic computing systems. The present bottleneck of distributed large-scale spiking neuronal network simulations is the ... |
| STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems | Chris Egersdoerfer, Philip Carns, Shane Snyder, Robert Ross, Dong Dai | 2026-02-26 | I/O performance is crucial to efficiency in data-intensive scientific computing; but tuning large-scale storage systems is complex, costly, and notoriously manpower-intensive, making it inaccessible f... |
| A High-Throughput AES-GCM Implementation on GPUs for Secure, Policy-Based Access to Massive Astronomical Catalogs | Samuel Lemes-Perera, Miguel R. Alarcon, Pino Caballero-Gil, Miquel Serra-Ricart | 2026-02-26 | The era of large astronomical surveys generates massive image catalogs requiring efficient and secure access, particularly during pre-publication periods where data confidentiality and integrity are p... |
| LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure | Jaehong Cho, Hyunmin Choi, Guseul Heo, Jongse Park | 2026-02-26 | Large language model (LLM) serving infrastructures are undergoing a shift toward heterogeneity and disaggregation. Modern deployments increasingly integrate diverse accelerators and near-memory proces... |
| A Simple Distributed Deterministic Planar Separator | Yaseen Abd-Elhaleem, Michal Dory, Oren Weimann | 2026-02-26 | A balanced separator of a graph $G$ is a set of vertices whose removal disconnects the graph into connected components that are a constant factor smaller than $G$. |
| Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks | Oliver Larsson, Thijs Metsch, Cristian Klein, Erik Elmroth | 2026-02-26 | Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU u... |
| Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching | Hiroki Matsutani, Naoki Matsuda, Naoto Sugiura | 2026-02-26 | Since local LLM inference on resource-constrained edge devices imposes a severe performance bottleneck, this paper proposes distributed prompt caching to enhance inference performance by cooperatively... |
| An Artificial Intelligence Framework for Joint Structural-Temporal Load Forecasting in Cloud Native Platforms | Qingyuan Zhang | 2026-02-26 | This study targets cloud native environments where microservice invocation relations are complex, load fluctuations are multi-scale and superimposed, and cross-service impacts are significant. |
| Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study | Philipp Wiesner, Soeren Becker, Brett Cornick, Dominik Scheinert, Alexander Acker, Odej Kao | 2026-02-26 | Training large language models (LLMs) requires substantial compute and energy. At the same time, renewable energy sources regularly produce more electricity than the grid can absorb, leading to curtai... |
| Dynamic Hierarchical Birkhoff-von Neumann Decomposition for All-to-All GPU Communication | Yen-Chieh Wu, Cheng-Shang Chang, Duan-Shin Lee, H. Jonathan Chao | 2026-02-26 | All-to-all GPU communication is a critical bottleneck in large-scale training clusters, where completion time is constrained by per-port bandwidth and can be severely impacted by traffic skew across G... |
| RLHFless: Serverless Computing for Efficient RLHF | Rui Wei, Hanfei Yu, Shubham Jain, Yogarajan Sivakumar, Devesh Tiwari, Jian Li, Seung-Jong Park, Hao Wang | 2026-02-26 | Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences. |
| Tackling Privacy Heterogeneity in Differentially Private Federated Learning | Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang | 2026-02-26 | Differentially private federated learning (DP-FL) enables clients to collaboratively train machine learning models while preserving the privacy of their local data. |
| FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving | Shouwei Gao, Junqi Yin, Feiyi Wang, Wenqian Dong | 2026-02-26 | Production LLM serving must simultaneously deliver high throughput, low latency, and sufficient context capacity under non-stationary traffic and mixed request requirements. |
| FuxiShuffle: An Adaptive and Resilient Shuffle Service for Distributed Data Processing on Alibaba Cloud | Yuhao Lin, Zhipeng Tang, Jiayan Tong, Junqing Xiao, Bin Lu, Yuhang Li, Chao Li, Zhiguo Zhang, Junhua Wang, Hao Luo, James Cheng, Chuang Hu, Jiawei Jiang, Xiao Yan | 2026-02-26 | Shuffle exchanges intermediate results between upstream and downstream operators in distributed data processing and is usually the bottleneck due to factors such as small random I/Os and network conte... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
|---|---|---|---|
| A Software-Defined Testbed for Quantifying Deauthentication Resilience in Modern Wi-Fi Networks | Alex Carbajal, Asma Jodeiri Akbarfam | 2026-02-26 | Wi-Fi deauthentication attacks remain a practical denial-of-service (DoS) threat by exploiting unprotected management frames to disrupt client connectivity. |
| Dynamic Hierarchical Birkhoff-von Neumann Decomposition for All-to-All GPU Communication | Yen-Chieh Wu, Cheng-Shang Chang, Duan-Shin Lee, H. Jonathan Chao | 2026-02-26 | All-to-all GPU communication is a critical bottleneck in large-scale training clusters, where completion time is constrained by per-port bandwidth and can be severely impacted by traffic skew across G... |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
|---|---|---|---|
| A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification | David Condrey | 2026-02-26 | Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
|---|---|---|---|
| Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents | Aishwarya Sarkar, Sayan Ghosh, Nathan Tallent, Aman Chadha, Tanya Roosta, Ali Jannesari | 2026-02-26 | Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular co... |
| The Road to Useful Quantum Computers | Timothy Proctor, Robin Blume-Kohout, Andrew Baczewski | 2026-02-26 | Building a useful quantum computer is a grand science and engineering challenge, currently pursued intensely by teams around the world. In the 1980s, Richard Feynman and Yuri Manin observed independen... |
