2026-02-25

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
ArchAgent: Agentic AI-driven Computer Architecture Discovery	Raghav Gupta, Akanksha Jain, Abraham Gonzalez, Alexander Novikov, Po-Sen Huang, Matej Balog, Marvin Eisenberger, Sergey Shirobokov, Ngân Vũ, Martin Dixon, Borivoje Nikolić, Parthasarathy Ranganathan, Sagar Karandikar	2026-02-25	Download	Agile hardware design flows are a critically needed force multiplier to meet the exploding demand for compute. Recently, agentic generative AI systems have demonstrated significant advances in algorit...
GRAU: Generic Reconfigurable Activation Unit Design for Neural Network Hardware Accelerators	Yuhao Liu, Salim Ullah, Akash Kumar	2026-02-25	Download	With the continuous growth of neural network scales, low-precision quantization is widely used in edge accelerators. Classic multi-threshold activation hardware requires 2^n thresholds for n-bit outpu...
Architectural Design and Performance Analysis of FPGA based AI Accelerators: A Comprehensive Review	Soumita Chatterjee, Sudip Ghosh, Tamal Ghosh, Hafizur Rahaman	2026-02-25	Download	Deep learning (DL) has emerged as a rapidly developing advanced technology, enabling the performance of complex tasks involving image recognition, natural language processing, and autonomous decision-...
SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference	Qunyou Liu, Pengbo Yu, Marina Zapater, David Atienza	2026-02-25	Download	Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, ener...
Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service	Xianzhe Zheng, Zhengheng Wang, Ruiyan Ma, Rui Wang, Xiyu Wang, Rui Chen, Peng Zhang, Sicheng Pan, Zhangheng Huang, Chenxin Wu, Yi Zhang, Bo Cai, Kan Liu, Teng Ma, Yin Du, Dong Deng, Sai Wu, Guoyun Zhu, Wei Zhang, Feifei Li	2026-02-25	Download	The memory-for-computation paradigm of KV caching is essential for accelerating large language model (LLM) inference service, but limited GPU high-bandwidth memory (HBM) capacity motivates offloading ...
FormalRTL: Verified RTL Synthesis at Scale	Kezhi Li, Min Li, Xiangyu Wen, Shibo Zhao, Jieying Wu, Junhua Huang, Qiang Xu	2026-02-25	Download	Large language models (LLMs) have demonstrated significant potential in automating hardware synthesis, yet substantial barriers remain for industrial-scale, datapath-centric designs due to ambiguous s...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling	Dong Xu, Han Meng, Xinyu Chen, Dengcheng Zhu, Wei Tang, Fei Liu, Liguang Xie, Wu Xiang, Rui Shi, Yue Li, Henry Hu, Hui Zhang, Jianping Jiang, Dong Li	2026-02-25	Download	Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth.
Fault-tolerant Reduce and Allreduce operations based on correction	Martin Kuettler, Hermann Haertig	2026-02-25	Download	Implementations of Broadcast based on some information dissemination algorithm -- e.g., gossip or tree-based communication -- followed by a correction algorithm has been proposed previously.
veScale-FSDP: Flexible and High-Performance FSDP at Scale	Zezhou Wang, Youjie Li, Zhiqi Lin, Jiacheng Yang, Cong Xie, Guanyu Feng, Zheng Zhong, Ziyue Huang, Hongyu Zhu, Zhi Zhang, Yanghua Peng, Xin Liu	2026-02-25	Download	Fully Sharded Data Parallel (FSDP), also known as ZeRO, is widely used for training large-scale models, featuring its flexibility and minimal intrusion on model code.
GetBatch: Distributed Multi-Object Retrieval for ML Data Loading	Alex Aizman, Abhishek Gaikwad, Piotr Żelasko	2026-02-25	Download	Machine learning training pipelines consume data in batches. A single training step may require thousands of samples drawn from shards distributed across a storage cluster.
CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems	Md Hasanur Rashid, Nathan R. Tallent, Forrest Sheng Bao, Dong Dai	2026-02-25	Download	Tuning parallel file system in High-Performance Computing (HPC) systems remains challenging due to the complex I/O paths, diverse I/O patterns, and dynamic system conditions.
AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage	Md Hasanur Rashid, Dong Dai	2026-02-25	Download	Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of st...
DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System	Md Hasanur Rashid, Xinyi Li, Youbiao He, Forrest Sheng Bao, Dong Dai	2026-02-25	Download	Enabling efficient, high-performance data access in parallel file systems (PFS) is critical for today's high-performance computing systems. PFS client-side I/O heavily impacts the final I/O performanc...
Engineered Simultaneity: The Physical Impossibility of Consolidated Price Discovery Across Spacelike-Separated Exchanges	Paul Borrill	2026-02-25	Download	We define engineered simultaneity: the construction of a system that requires temporal comparison of events at spacelike-separated locations, implements this comparison via an implicit simultan...
Hybrid Consensus with Quantum Sybil Resistance	Dar Gilboa, Siddhartha Jain, Or Sattath	2026-02-25	Download	Sybil resistance is a key requirement of decentralized consensus protocols. It is achieved by introducing a scarce resource (such as computational power, monetary stake, disk space, etc.
LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models	Minqiu Sun, Xin Huang, Luanzheng Guo, Nathan R. Tallent, Kento Sato, Dong Dai	2026-02-25	Download	Checkpointing is essential for fault tolerance in training large language models (LLMs). However, existing methods, regardless of their I/O strategies, periodically store the entire model and optimize...
PASTA: A Modular Program Analysis Tool Framework for Accelerators	Mao Lin, Hyeran Jeon, Keren Zhou	2026-02-25	Download	The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools.
IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs	Chris Egersdoerfer, Arnav Sareen, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai	2026-02-25	Download	As the complexity of the HPC storage stack rapidly grows, domain scientists face increasing challenges in effectively utilizing HPC storage systems to achieve their desired I/O performance.
Energy Efficient Federated Learning with Hyperdimensional Computing (HDC)	Yahao Ding, Yinchao Yang, Jiaxiang Wang, Zhonghao Liu, Zhaohui Yang, Mingzhe Chen, Mohammad Shikh-Bahaei	2026-02-25	Download	This paper investigates the problem of minimizing total energy consumption for secure federated learning (FL) in wireless edge networks, a key paradigm for decentralized big data analytics.
Energy Efficient Federated Learning with Hyperdimensional Computing over Wireless Communication Networks	Yahao Ding, Yinchao Yang, Jiaxiang Wang, Zhaohui Yang, Dusit Niyato, Zhu Han, Mohammad Shikh-Bahaei	2026-02-25	Download	In this paper, we investigate a problem of minimizing total energy consumption for secure federated learning (FL) over wireless edge networks.
A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs	Aleix Boné, Alejandro Aguirre, David Álvarez, Pedro J. Martinez-Ferrer, Vicenç Beltran	2026-02-25	Download	Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures.
JSAM: Privacy Straggler-Resilient Joint Client Selection and Incentive Mechanism Design in Differentially Private Federated Learning	Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang	2026-02-25	Download	Differentially private federated learning faces a fundamental tension: privacy protection mechanisms that safeguard client data simultaneously create quantifiable privacy costs that discourage partici...
Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service	Xianzhe Zheng, Zhengheng Wang, Ruiyan Ma, Rui Wang, Xiyu Wang, Rui Chen, Peng Zhang, Sicheng Pan, Zhangheng Huang, Chenxin Wu, Yi Zhang, Bo Cai, Kan Liu, Teng Ma, Yin Du, Dong Deng, Sai Wu, Guoyun Zhu, Wei Zhang, Feifei Li	2026-02-25	Download	The memory-for-computation paradigm of KV caching is essential for accelerating large language model (LLM) inference service, but limited GPU high-bandwidth memory (HBM) capacity motivates offloading ...
DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism	Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li	2026-02-25	Download	Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous.
Lamport's Arrow of Time: The Category Mistake in Logical Clocks	Paul Borrill	2026-02-25	Download	Lamport's 1978 paper introduced the happens-before relation and logical clocks, freeing distributed systems from dependence on synchronized physical clocks.
Type-Based Enforcement of Non-Interference for Choreographic Programming	Marco Bertoni, Saverio Giallorenzo, Marco Peressotti	2026-02-25	Download	Choreographies describe distributed protocols from a global viewpoint, enabling correct-by-construction synthesis of local behaviours. We develop a policy-parametric type system that prevents informat...
Multi-Layer Scheduling for MoE-Based LLM Reasoning	Yifan Sun, Gholamreza Haffari, Minxian Xu, Rajkumar Buyya, Adel N. Toosi	2026-02-25	Download	Large Language Models (LLMs) have achieved remarkable success across a wide range of tasks, but serving them efficiently at scale remains a critical challenge due to their substantial computational an...
Epoch-based Optimistic Concurrency Control in Geo-replicated Databases	Yunhao Mao, Harunari Takata, Michail Bachras, Yuqiu Zhang, Shiquan Zhang, Gengrui Zhang, Hans-Arno Jacobsen	2026-02-25	Download	Geo-distribution is essential for modern online applications to ensure service reliability and high availability. However, supporting high-performance serializable transactions in geo-replicated datab...
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference	Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Shangyan Zhou, Yuxuan Liu, Shunfeng Zhou, Mingxing Zhang, Xin Jin, Panpan Huang	2026-02-25	Download	The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache f...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Generic Web Component for WebRTC Pub-Sub	Kundan Singh	2026-02-25	Download	We present video-io, a generic web component to publish or subscribe to a media stream in WebRTC (web real-time communication) applications. Unlike a call or conference room abstraction of existing vi...
Enhancing Cellular-enabled Collaborative Robots Planning through GNSS data for SAR Scenarios	Arnau Romero, Carmen Delgado, Jana Baguer, Raúl Suárez, Xavier Costa-Pérez	2026-02-25	Download	Cellular-enabled collaborative robots are becoming paramount in Search-and-Rescue (SAR) and emergency response. Crucially dependent on resilient mobile network connectivity, they serve as invaluable a...
Lossy Compression of Network Feature Data: When Less Is Enough	Fabio Palmese, Gabriele Merlach, Damiano Ravalico, Martino Trevisan, Alessandro E. C. Redondi	2026-02-25	Download	Network traffic analysis increasingly relies on feature-based representations to support monitoring and security in the presence of pervasive encryption.
Dual-Hop Joint Visible Light and Backscatter Communication Relaying under Finite Blocklength	Boxuan Xie, Lauri Mela, Alexis A. Dowhuszko, Jiacheng Wang, Kalle Ruttik, Riku Jäntti	2026-02-25	Download	This paper investigates a dual-hop joint visible light communication (VLC) and backscatter communication (BC) relaying framework under the finite blocklength (FBL) constraint, aiming at energy-neutral...
Implementation and transition to post-quantum cryptography of the Minimal IKE protocol	Davide De Zuane, Paolo Santini, Marco Baldi	2026-02-25	Download	This paper concerns the Minimal Internet Key Exchange (IKE) protocol, which has received little attention to date, despite its potential to make the best-known IKE protocol sufficiently lightweight to...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents	Cosmo Santoni	2026-02-25	Download	As large language models engage in extended reasoning tasks, they accumulate significant state -- architectural mappings, trade-off decisions, codebase conventions -- within the context window.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems	Md Hasanur Rashid, Nathan R. Tallent, Forrest Sheng Bao, Dong Dai	2026-02-25	Download	Tuning parallel file system in High-Performance Computing (HPC) systems remains challenging due to the complex I/O paths, diverse I/O patterns, and dynamic system conditions.
AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage	Md Hasanur Rashid, Dong Dai	2026-02-25	Download	Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of st...
DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System	Md Hasanur Rashid, Xinyi Li, Youbiao He, Forrest Sheng Bao, Dong Dai	2026-02-25	Download	Enabling efficient, high-performance data access in parallel file systems (PFS) is critical for today's high-performance computing systems. PFS client-side I/O heavily impacts the final I/O performanc...
PASTA: A Modular Program Analysis Tool Framework for Accelerators	Mao Lin, Hyeran Jeon, Keren Zhou	2026-02-25	Download	The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools.
