2026-02-26
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths | Marco Graziano | 2026-02-26 | Download | AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure. |
| A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification | David Condrey | 2026-02-26 | Download | Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state. |
| BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator | Yuhao Liu, Salim Ullah, Akash Kumar | 2026-02-26 | Download | Lightweight neural network accelerators are essential for edge devices with limited resources and power constraints. While quantization and binarization can efficiently reduce hardware cost, they stil... |
| Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators | Yuhao Liu, Salim Ullah, Akash Kumar | 2026-02-26 | Download | Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explored the quantization technologies in relat... |
| Real-Time Stream Compaction for Sparse Machine Learning on FPGAs | Marc Neu, Isabel Haide, Torben Ferber, Jürgen Becker | 2026-02-26 | Download | Machine learning algorithms are being used more frequently in the first-level triggers in collider experiments, with Graph Neural Networks pushing the hardware requirements of FPGA-based triggers beyo... |
| The AetherFloat Family: Block-Scale-Free Quad-Radix Floating-Point Architectures for AI Accelerators | Keita Morisaki | 2026-02-26 | Download | The IEEE 754 floating-point standard is the bedrock of modern computing, but its structural requirements -- a hidden leading bit, Base-2 bit-level normalization, and Sign-Magnitude encoding -- impose ... |
| EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning | Guangyu Hu, Xiaofeng Zhou, Wei Zhang, Hongce Zhang | 2026-02-26 | Download | Progress in hardware model checking depends critically on high-quality benchmarks. However, the community faces a significant benchmark gap: existing suites are limited in number, often distributed on... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths | Marco Graziano | 2026-02-26 | Download | AI transport libraries move bytes efficiently, but they commonly assume that buffers are already correctly allocated, placed, shared, registered, and safe under completion and teardown pressure. |
| Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents | Aishwarya Sarkar, Sayan Ghosh, Nathan Tallent, Aman Chadha, Tanya Roosta, Ali Jannesari | 2026-02-26 | Download | Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular co... |
| FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments | Anik Pramanik, Murat Kantarcioglu, Vincent Oria, Shantanu Sharma | 2026-02-26 | Download | Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous. |
| 2G2T: Constant-Size, Statistically Sound MSM Outsourcing | Majid Khabbazian | 2026-02-26 | Download | Multi-scalar multiplication (MSM), defined as MSM(P, x) = sum_{i=1}^n x_i P_i, is a dominant computational kernel in discrete-logarithm-based cryptography and often becomes a bottleneck for verifiers ... |
| The PLUTO Code on GPUs: Offloading Lagrangian Particle Methods | Alessio Suriano, Stefano Truzzi, Agnese Costa, Marco Rossazza, Nitin Shukla, Andrea Mignone, Vittoria Berta, Claudio Zanni | 2026-02-26 | Download | The Lagrangian Particles (LP) module of the PLUTO code offers a powerful simulation tool to predict the non-thermal emission produced by shock accelerated particles in large-scale relativistic magneti... |
| Exploiting network topology in brain-scale simulations of spiking neural networks | Melissa Lober, Markus Diesmann, Susanne Kunkel | 2026-02-26 | Download | Simulation code for conventional supercomputers serves as a reference for neuromorphic computing systems. The present bottleneck of distributed large-scale spiking neuronal network simulations is the ... |
| STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems | Chris Egersdoerfer, Philip Carns, Shane Snyder, Robert Ross, Dong Dai | 2026-02-26 | Download | I/O performance is crucial to efficiency in data-intensive scientific computing; but tuning large-scale storage systems is complex, costly, and notoriously manpower-intensive, making it inaccessible f... |
| A High-Throughput AES-GCM Implementation on GPUs for Secure, Policy-Based Access to Massive Astronomical Catalogs | Samuel Lemes-Perera, Miguel R. Alarcon, Pino Caballero-Gil, Miquel Serra-Ricart | 2026-02-26 | Download | The era of large astronomical surveys generates massive image catalogs requiring efficient and secure access, particularly during pre-publication periods where data confidentiality and integrity are p... |
| LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure | Jaehong Cho, Hyunmin Choi, Guseul Heo, Jongse Park | 2026-02-26 | Download | Large language model (LLM) serving infrastructures are undergoing a shift toward heterogeneity and disaggregation. Modern deployments increasingly integrate diverse accelerators and near-memory proces... |
| A Simple Distributed Deterministic Planar Separator | Yaseen Abd-Elhaleem, Michal Dory, Oren Weimann | 2026-02-26 | Download | A balanced separator of a graph is a set of vertices whose removal disconnects the graph into connected components that are a constant factor smaller than the graph itself. |
| Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks | Oliver Larsson, Thijs Metsch, Cristian Klein, Erik Elmroth | 2026-02-26 | Download | Modern multi-tenant, hardware-heterogeneous computing environments pose significant challenges for effective workload orchestration. Simple heuristics for assessing workload performance, such as CPU u... |
| Accelerating Local LLMs on Resource-Constrained Edge Devices via Distributed Prompt Caching | Hiroki Matsutani, Naoki Matsuda, Naoto Sugiura | 2026-02-26 | Download | Since local LLM inference on resource-constrained edge devices imposes a severe performance bottleneck, this paper proposes distributed prompt caching to enhance inference performance by cooperatively... |
| An Artificial Intelligence Framework for Joint Structural-Temporal Load Forecasting in Cloud Native Platforms | Qingyuan Zhang | 2026-02-26 | Download | This study targets cloud native environments where microservice invocation relations are complex, load fluctuations are multi-scale and superimposed, and cross-service impacts are significant. |
| Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study | Philipp Wiesner, Soeren Becker, Brett Cornick, Dominik Scheinert, Alexander Acker, Odej Kao | 2026-02-26 | Download | Training large language models (LLMs) requires substantial compute and energy. At the same time, renewable energy sources regularly produce more electricity than the grid can absorb, leading to curtai... |
| Dynamic Hierarchical Birkhoff-von Neumann Decomposition for All-to-All GPU Communication | Yen-Chieh Wu, Cheng-Shang Chang, Duan-Shin Lee, H. Jonathan Chao | 2026-02-26 | Download | All-to-all GPU communication is a critical bottleneck in large-scale training clusters, where completion time is constrained by per-port bandwidth and can be severely impacted by traffic skew across G... |
| RLHFless: Serverless Computing for Efficient RLHF | Rui Wei, Hanfei Yu, Shubham Jain, Yogarajan Sivakumar, Devesh Tiwari, Jian Li, Seung-Jong Park, Hao Wang | 2026-02-26 | Download | Reinforcement Learning from Human Feedback (RLHF) has been widely applied to Large Language Model (LLM) post-training to align model outputs with human preferences. |
| Tackling Privacy Heterogeneity in Differentially Private Federated Learning | Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang | 2026-02-26 | Download | Differentially private federated learning (DP-FL) enables clients to collaboratively train machine learning models while preserving the privacy of their local data. |
| FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving | Shouwei Gao, Junqi Yin, Feiyi Wang, Wenqian Dong | 2026-02-26 | Download | Production LLM serving must simultaneously deliver high throughput, low latency, and sufficient context capacity under non-stationary traffic and mixed request requirements. |
| FuxiShuffle: An Adaptive and Resilient Shuffle Service for Distributed Data Processing on Alibaba Cloud | Yuhao Lin, Zhipeng Tang, Jiayan Tong, Junqing Xiao, Bin Lu, Yuhang Li, Chao Li, Zhiguo Zhang, Junhua Wang, Hao Luo, James Cheng, Chuang Hu, Jiawei Jiang, Xiao Yan | 2026-02-26 | Download | Shuffle exchanges intermediate results between upstream and downstream operators in distributed data processing and is usually the bottleneck due to factors such as small random I/Os and network conte... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A Software-Defined Testbed for Quantifying Deauthentication Resilience in Modern Wi-Fi Networks | Alex Carbajal, Asma Jodeiri Akbarfam | 2026-02-26 | Download | Wi-Fi deauthentication attacks remain a practical denial-of-service (DoS) threat by exploiting unprotected management frames to disrupt client connectivity. |
| Dynamic Hierarchical Birkhoff-von Neumann Decomposition for All-to-All GPU Communication | Yen-Chieh Wu, Cheng-Shang Chang, Duan-Shin Lee, H. Jonathan Chao | 2026-02-26 | Download | All-to-all GPU communication is a critical bottleneck in large-scale training clusters, where completion time is constrained by per-port bandwidth and can be severely impacted by traffic skew across G... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification | David Condrey | 2026-02-26 | Download | Process attestation systems verify that a continuous physical process, such as human authorship, actually occurred, rather than merely checking system state. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents | Aishwarya Sarkar, Sayan Ghosh, Nathan Tallent, Aman Chadha, Tanya Roosta, Ali Jannesari | 2026-02-26 | Download | Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular co... |
| The Road to Useful Quantum Computers | Timothy Proctor, Robin Blume-Kohout, Andrew Baczewski | 2026-02-26 | Download | Building a useful quantum computer is a grand science and engineering challenge, currently pursued intensely by teams around the world. In the 1980s, Richard Feynman and Yuri Manin observed independen... |