
2026-02-25

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ArchAgent: Agentic AI-driven Computer Architecture Discovery | Raghav Gupta, Akanksha Jain, Abraham Gonzalez, Alexander Novikov, Po-Sen Huang, Matej Balog, Marvin Eisenberger, Sergey Shirobokov, Ngân Vũ, Martin Dixon, Borivoje Nikolić, Parthasarathy Ranganathan, Sagar Karandikar | 2026-02-25 | Download | Agile hardware design flows are a critically needed force multiplier to meet the exploding demand for compute. Recently, agentic generative AI systems have demonstrated significant advances in algorit... |
| GRAU: Generic Reconfigurable Activation Unit Design for Neural Network Hardware Accelerators | Yuhao Liu, Salim Ullah, Akash Kumar | 2026-02-25 | Download | With the continuous growth of neural network scales, low-precision quantization is widely used in edge accelerators. Classic multi-threshold activation hardware requires 2^n thresholds for n-bit outpu... |
| Architectural Design and Performance Analysis of FPGA based AI Accelerators: A Comprehensive Review | Soumita Chatterjee, Sudip Ghosh, Tamal Ghosh, Hafizur Rahaman | 2026-02-25 | Download | Deep learning (DL) has emerged as a rapidly developing advanced technology, enabling the performance of complex tasks involving image recognition, natural language processing, and autonomous decision-... |
| SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference | Qunyou Liu, Pengbo Yu, Marina Zapater, David Atienza | 2026-02-25 | Download | Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, ener... |
| Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service | Xianzhe Zheng, Zhengheng Wang, Ruiyan Ma, Rui Wang, Xiyu Wang, Rui Chen, Peng Zhang, Sicheng Pan, Zhangheng Huang, Chenxin Wu, Yi Zhang, Bo Cai, Kan Liu, Teng Ma, Yin Du, Dong Deng, Sai Wu, Guoyun Zhu, Wei Zhang, Feifei Li | 2026-02-25 | Download | The memory-for-computation paradigm of KV caching is essential for accelerating large language model (LLM) inference service, but limited GPU high-bandwidth memory (HBM) capacity motivates offloading ... |
| FormalRTL: Verified RTL Synthesis at Scale | Kezhi Li, Min Li, Xiangyu Wen, Shibo Zhao, Jieying Wu, Junhua Huang, Qiang Xu | 2026-02-25 | Download | Large language models (LLMs) have demonstrated significant potential in automating hardware synthesis, yet substantial barriers remain for industrial-scale, datapath-centric designs due to ambiguous s... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling | Dong Xu, Han Meng, Xinyu Chen, Dengcheng Zhu, Wei Tang, Fei Liu, Liguang Xie, Wu Xiang, Rui Shi, Yue Li, Henry Hu, Hui Zhang, Jianping Jiang, Dong Li | 2026-02-25 | Download | Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. |
| Fault-tolerant Reduce and Allreduce operations based on correction | Martin Kuettler, Hermann Haertig | 2026-02-25 | Download | Implementations of Broadcast based on some information dissemination algorithm -- e.g., gossip or tree-based communication -- followed by a correction algorithm has been proposed previously. |
| veScale-FSDP: Flexible and High-Performance FSDP at Scale | Zezhou Wang, Youjie Li, Zhiqi Lin, Jiacheng Yang, Cong Xie, Guanyu Feng, Zheng Zhong, Ziyue Huang, Hongyu Zhu, Zhi Zhang, Yanghua Peng, Xin Liu | 2026-02-25 | Download | Fully Sharded Data Parallel (FSDP), also known as ZeRO, is widely used for training large-scale models, featuring its flexibility and minimal intrusion on model code. |
| GetBatch: Distributed Multi-Object Retrieval for ML Data Loading | Alex Aizman, Abhishek Gaikwad, Piotr Żelasko | 2026-02-25 | Download | Machine learning training pipelines consume data in batches. A single training step may require thousands of samples drawn from shards distributed across a storage cluster. |
| CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems | Md Hasanur Rashid, Nathan R. Tallent, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Tuning parallel file system in High-Performance Computing (HPC) systems remains challenging due to the complex I/O paths, diverse I/O patterns, and dynamic system conditions. |
| AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage | Md Hasanur Rashid, Dong Dai | 2026-02-25 | Download | Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of st... |
| DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System | Md Hasanur Rashid, Xinyi Li, Youbiao He, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Enabling efficient, high-performance data access in parallel file systems (PFS) is critical for today's high-performance computing systems. PFS client-side I/O heavily impacts the final I/O performanc... |
| Engineered Simultaneity: The Physical Impossibility of Consolidated Price Discovery Across Spacelike-Separated Exchanges | Paul Borrill | 2026-02-25 | Download | We define \emph{engineered simultaneity}: the construction of a system that requires temporal comparison of events at spacelike-separated locations, implements this comparison via an implicit simultan... |
| Hybrid Consensus with Quantum Sybil Resistance | Dar Gilboa, Siddhartha Jain, Or Sattath | 2026-02-25 | Download | Sybil resistance is a key requirement of decentralized consensus protocols. It is achieved by introducing a scarce resource (such as computational power, monetary stake, disk space, etc. |
| LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models | Minqiu Sun, Xin Huang, Luanzheng Guo, Nathan R. Tallent, Kento Sato, Dong Dai | 2026-02-25 | Download | Checkpointing is essential for fault tolerance in training large language models (LLMs). However, existing methods, regardless of their I/O strategies, periodically store the entire model and optimize... |
| PASTA: A Modular Program Analysis Tool Framework for Accelerators | Mao Lin, Hyeran Jeon, Keren Zhou | 2026-02-25 | Download | The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. |
| IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs | Chris Egersdoerfer, Arnav Sareen, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai | 2026-02-25 | Download | As the complexity of the HPC storage stack rapidly grows, domain scientists face increasing challenges in effectively utilizing HPC storage systems to achieve their desired I/O performance. |
| Energy Efficient Federated Learning with Hyperdimensional Computing (HDC) | Yahao Ding, Yinchao Yang, Jiaxiang Wang, Zhonghao Liu, Zhaohui Yang, Mingzhe Chen, Mohammad Shikh-Bahaei | 2026-02-25 | Download | This paper investigates the problem of minimizing total energy consumption for secure federated learning (FL) in wireless edge networks, a key paradigm for decentralized big data analytics. |
| Energy Efficient Federated Learning with Hyperdimensional Computing over Wireless Communication Networks | Yahao Ding, Yinchao Yang, Jiaxiang Wang, Zhaohui Yang, Dusit Niyato, Zhu Han, Mohammad Shikh-Bahaei | 2026-02-25 | Download | In this paper, we investigate a problem of minimizing total energy consumption for secure federated learning (FL) over wireless edge networks. |
| A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs | Aleix Boné, Alejandro Aguirre, David Álvarez, Pedro J. Martinez-Ferrer, Vicenç Beltran | 2026-02-25 | Download | Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. |
| JSAM: Privacy Straggler-Resilient Joint Client Selection and Incentive Mechanism Design in Differentially Private Federated Learning | Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang | 2026-02-25 | Download | Differentially private federated learning faces a fundamental tension: privacy protection mechanisms that safeguard client data simultaneously create quantifiable privacy costs that discourage partici... |
| Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service | Xianzhe Zheng, Zhengheng Wang, Ruiyan Ma, Rui Wang, Xiyu Wang, Rui Chen, Peng Zhang, Sicheng Pan, Zhangheng Huang, Chenxin Wu, Yi Zhang, Bo Cai, Kan Liu, Teng Ma, Yin Du, Dong Deng, Sai Wu, Guoyun Zhu, Wei Zhang, Feifei Li | 2026-02-25 | Download | The memory-for-computation paradigm of KV caching is essential for accelerating large language model (LLM) inference service, but limited GPU high-bandwidth memory (HBM) capacity motivates offloading ... |
| DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism | Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li | 2026-02-25 | Download | Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. |
| Lamport's Arrow of Time: The Category Mistake in Logical Clocks | Paul Borrill | 2026-02-25 | Download | Lamport's 1978 paper introduced the happens-before relation and logical clocks, freeing distributed systems from dependence on synchronized physical clocks. |
| Type-Based Enforcement of Non-Interference for Choreographic Programming | Marco Bertoni, Saverio Giallorenzo, Marco Peressotti | 2026-02-25 | Download | Choreographies describe distributed protocols from a global viewpoint, enabling correct-by-construction synthesis of local behaviours. We develop a policy-parametric type system that prevents informat... |
| Multi-Layer Scheduling for MoE-Based LLM Reasoning | Yifan Sun, Gholamreza Haffari, Minxian Xu, Rajkumar Buyya, Adel N. Toosi | 2026-02-25 | Download | Large Language Models (LLMs) have achieved remarkable success across a wide range of tasks, but serving them efficiently at scale remains a critical challenge due to their substantial computational an... |
| Epoch-based Optimistic Concurrency Control in Geo-replicated Databases | Yunhao Mao, Harunari Takata, Michail Bachras, Yuqiu Zhang, Shiquan Zhang, Gengrui Zhang, Hans-Arno Jacobsen | 2026-02-25 | Download | Geo-distribution is essential for modern online applications to ensure service reliability and high availability. However, supporting high-performance serializable transactions in geo-replicated datab... |
| DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference | Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Shangyan Zhou, Yuxuan Liu, Shunfeng Zhou, Mingxing Zhang, Xin Jin, Panpan Huang | 2026-02-25 | Download | The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache f... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Generic Web Component for WebRTC Pub-Sub | Kundan Singh | 2026-02-25 | Download | We present video-io, a generic web component to publish or subscribe to a media stream in WebRTC (web real-time communication) applications. Unlike a call or conference room abstraction of existing vi... |
| Enhancing Cellular-enabled Collaborative Robots Planning through GNSS data for SAR Scenarios | Arnau Romero, Carmen Delgado, Jana Baguer, Raúl Suárez, Xavier Costa-Pérez | 2026-02-25 | Download | Cellular-enabled collaborative robots are becoming paramount in Search-and-Rescue (SAR) and emergency response. Crucially dependent on resilient mobile network connectivity, they serve as invaluable a... |
| Lossy Compression of Network Feature Data: When Less Is Enough | Fabio Palmese, Gabriele Merlach, Damiano Ravalico, Martino Trevisan, Alessandro E. C. Redondi | 2026-02-25 | Download | Network traffic analysis increasingly relies on feature-based representations to support monitoring and security in the presence of pervasive encryption. |
| Dual-Hop Joint Visible Light and Backscatter Communication Relaying under Finite Blocklength | Boxuan Xie, Lauri Mela, Alexis A. Dowhuszko, Jiacheng Wang, Kalle Ruttik, Riku Jäntti | 2026-02-25 | Download | This paper investigates a dual-hop joint visible light communication (VLC) and backscatter communication (BC) relaying framework under the finite blocklength (FBL) constraint, aiming at energy-neutral... |
| Implementation and transition to post-quantum cryptography of the Minimal IKE protocol | Davide De Zuane, Paolo Santini, Marco Baldi | 2026-02-25 | Download | This paper concerns the Minimal Internet Key Exchange (IKE) protocol, which has received little attention to date, despite its potential to make the best-known IKE protocol sufficiently lightweight to... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents | Cosmo Santoni | 2026-02-25 | Download | As large language models engage in extended reasoning tasks, they accumulate significant state -- architectural mappings, trade-off decisions, codebase conventions -- within the context window. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems | Md Hasanur Rashid, Nathan R. Tallent, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Tuning parallel file system in High-Performance Computing (HPC) systems remains challenging due to the complex I/O paths, diverse I/O patterns, and dynamic system conditions. |
| AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage | Md Hasanur Rashid, Dong Dai | 2026-02-25 | Download | Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of st... |
| DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System | Md Hasanur Rashid, Xinyi Li, Youbiao He, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Enabling efficient, high-performance data access in parallel file systems (PFS) is critical for today's high-performance computing systems. PFS client-side I/O heavily impacts the final I/O performanc... |
| PASTA: A Modular Program Analysis Tool Framework for Accelerators | Mao Lin, Hyeran Jeon, Keren Zhou | 2026-02-25 | Download | The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. |
