2026-02-25
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| ArchAgent: Agentic AI-driven Computer Architecture Discovery | Raghav Gupta, Akanksha Jain, Abraham Gonzalez, Alexander Novikov, Po-Sen Huang, Matej Balog, Marvin Eisenberger, Sergey Shirobokov, Ngân Vũ, Martin Dixon, Borivoje Nikolić, Parthasarathy Ranganathan, Sagar Karandikar | 2026-02-25 | Download | Agile hardware design flows are a critically needed force multiplier to meet the exploding demand for compute. Recently, agentic generative AI systems have demonstrated significant advances in algorit... |
| GRAU: Generic Reconfigurable Activation Unit Design for Neural Network Hardware Accelerators | Yuhao Liu, Salim Ullah, Akash Kumar | 2026-02-25 | Download | With the continuous growth of neural network scales, low-precision quantization is widely used in edge accelerators. Classic multi-threshold activation hardware requires 2^n thresholds for n-bit outpu... |
| Architectural Design and Performance Analysis of FPGA based AI Accelerators: A Comprehensive Review | Soumita Chatterjee, Sudip Ghosh, Tamal Ghosh, Hafizur Rahaman | 2026-02-25 | Download | Deep learning (DL) has emerged as a rapidly developing advanced technology, enabling the performance of complex tasks involving image recognition, natural language processing, and autonomous decision-... |
| SigmaQuant: Hardware-Aware Heterogeneous Quantization Method for Edge DNN Inference | Qunyou Liu, Pengbo Yu, Marina Zapater, David Atienza | 2026-02-25 | Download | Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource constraints, including limited memory, ener... |
| Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service | Xianzhe Zheng, Zhengheng Wang, Ruiyan Ma, Rui Wang, Xiyu Wang, Rui Chen, Peng Zhang, Sicheng Pan, Zhangheng Huang, Chenxin Wu, Yi Zhang, Bo Cai, Kan Liu, Teng Ma, Yin Du, Dong Deng, Sai Wu, Guoyun Zhu, Wei Zhang, Feifei Li | 2026-02-25 | Download | The memory-for-computation paradigm of KV caching is essential for accelerating large language model (LLM) inference service, but limited GPU high-bandwidth memory (HBM) capacity motivates offloading ... |
| FormalRTL: Verified RTL Synthesis at Scale | Kezhi Li, Min Li, Xiangyu Wen, Shibo Zhao, Jieying Wu, Junhua Huang, Qiang Xu | 2026-02-25 | Download | Large language models (LLMs) have demonstrated significant potential in automating hardware synthesis, yet substantial barriers remain for industrial-scale, datapath-centric designs due to ambiguous s... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| CCCL: Node-Spanning GPU Collectives with CXL Memory Pooling | Dong Xu, Han Meng, Xinyu Chen, Dengcheng Zhu, Wei Tang, Fei Liu, Liguang Xie, Wu Xiang, Rui Shi, Yue Li, Henry Hu, Hui Zhang, Jianping Jiang, Dong Li | 2026-02-25 | Download | Large language models (LLMs) training or inference across multiple nodes introduces significant pressure on GPU memory and interconnect bandwidth. |
| Fault-tolerant Reduce and Allreduce operations based on correction | Martin Kuettler, Hermann Haertig | 2026-02-25 | Download | Implementations of Broadcast based on some information dissemination algorithm -- e.g., gossip or tree-based communication -- followed by a correction algorithm has been proposed previously. |
| veScale-FSDP: Flexible and High-Performance FSDP at Scale | Zezhou Wang, Youjie Li, Zhiqi Lin, Jiacheng Yang, Cong Xie, Guanyu Feng, Zheng Zhong, Ziyue Huang, Hongyu Zhu, Zhi Zhang, Yanghua Peng, Xin Liu | 2026-02-25 | Download | Fully Sharded Data Parallel (FSDP), also known as ZeRO, is widely used for training large-scale models, featuring its flexibility and minimal intrusion on model code. |
| GetBatch: Distributed Multi-Object Retrieval for ML Data Loading | Alex Aizman, Abhishek Gaikwad, Piotr Żelasko | 2026-02-25 | Download | Machine learning training pipelines consume data in batches. A single training step may require thousands of samples drawn from shards distributed across a storage cluster. |
| CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems | Md Hasanur Rashid, Nathan R. Tallent, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Tuning parallel file system in High-Performance Computing (HPC) systems remains challenging due to the complex I/O paths, diverse I/O patterns, and dynamic system conditions. |
| AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage | Md Hasanur Rashid, Dong Dai | 2026-02-25 | Download | Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of st... |
| DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System | Md Hasanur Rashid, Xinyi Li, Youbiao He, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Enabling efficient, high-performance data access in parallel file systems (PFS) is critical for today's high-performance computing systems. PFS client-side I/O heavily impacts the final I/O performanc... |
| Engineered Simultaneity: The Physical Impossibility of Consolidated Price Discovery Across Spacelike-Separated Exchanges | Paul Borrill | 2026-02-25 | Download | We define *engineered simultaneity*: the construction of a system that requires temporal comparison of events at spacelike-separated locations, implements this comparison via an implicit simultan... |
| Hybrid Consensus with Quantum Sybil Resistance | Dar Gilboa, Siddhartha Jain, Or Sattath | 2026-02-25 | Download | Sybil resistance is a key requirement of decentralized consensus protocols. It is achieved by introducing a scarce resource (such as computational power, monetary stake, disk space, etc. |
| LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models | Minqiu Sun, Xin Huang, Luanzheng Guo, Nathan R. Tallent, Kento Sato, Dong Dai | 2026-02-25 | Download | Checkpointing is essential for fault tolerance in training large language models (LLMs). However, existing methods, regardless of their I/O strategies, periodically store the entire model and optimize... |
| PASTA: A Modular Program Analysis Tool Framework for Accelerators | Mao Lin, Hyeran Jeon, Keren Zhou | 2026-02-25 | Download | The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. |
| IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs | Chris Egersdoerfer, Arnav Sareen, Jean Luca Bez, Suren Byna, Dongkuan Xu, Dong Dai | 2026-02-25 | Download | As the complexity of the HPC storage stack rapidly grows, domain scientists face increasing challenges in effectively utilizing HPC storage systems to achieve their desired I/O performance. |
| Energy Efficient Federated Learning with Hyperdimensional Computing (HDC) | Yahao Ding, Yinchao Yang, Jiaxiang Wang, Zhonghao Liu, Zhaohui Yang, Mingzhe Chen, Mohammad Shikh-Bahaei | 2026-02-25 | Download | This paper investigates the problem of minimizing total energy consumption for secure federated learning (FL) in wireless edge networks, a key paradigm for decentralized big data analytics. |
| Energy Efficient Federated Learning with Hyperdimensional Computing over Wireless Communication Networks | Yahao Ding, Yinchao Yang, Jiaxiang Wang, Zhaohui Yang, Dusit Niyato, Zhu Han, Mohammad Shikh-Bahaei | 2026-02-25 | Download | In this paper, we investigate a problem of minimizing total energy consumption for secure federated learning (FL) over wireless edge networks. |
| A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs | Aleix Boné, Alejandro Aguirre, David Álvarez, Pedro J. Martinez-Ferrer, Vicenç Beltran | 2026-02-25 | Download | Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. |
| JSAM: Privacy Straggler-Resilient Joint Client Selection and Incentive Mechanism Design in Differentially Private Federated Learning | Ruichen Xu, Ying-Jun Angela Zhang, Jianwei Huang | 2026-02-25 | Download | Differentially private federated learning faces a fundamental tension: privacy protection mechanisms that safeguard client data simultaneously create quantifiable privacy costs that discourage partici... |
| Adaptive Multi-Objective Tiered Storage Configuration for KV Cache in LLM Service | Xianzhe Zheng, Zhengheng Wang, Ruiyan Ma, Rui Wang, Xiyu Wang, Rui Chen, Peng Zhang, Sicheng Pan, Zhangheng Huang, Chenxin Wu, Yi Zhang, Bo Cai, Kan Liu, Teng Ma, Yin Du, Dong Deng, Sai Wu, Guoyun Zhu, Wei Zhang, Feifei Li | 2026-02-25 | Download | The memory-for-computation paradigm of KV caching is essential for accelerating large language model (LLM) inference service, but limited GPU high-bandwidth memory (HBM) capacity motivates offloading ... |
| DHP: Efficient Scaling of MLLM Training with Dynamic Hybrid Parallelism | Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li | 2026-02-25 | Download | Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. |
| Lamport's Arrow of Time: The Category Mistake in Logical Clocks | Paul Borrill | 2026-02-25 | Download | Lamport's 1978 paper introduced the happens-before relation and logical clocks, freeing distributed systems from dependence on synchronized physical clocks. |
| Type-Based Enforcement of Non-Interference for Choreographic Programming | Marco Bertoni, Saverio Giallorenzo, Marco Peressotti | 2026-02-25 | Download | Choreographies describe distributed protocols from a global viewpoint, enabling correct-by-construction synthesis of local behaviours. We develop a policy-parametric type system that prevents informat... |
| Multi-Layer Scheduling for MoE-Based LLM Reasoning | Yifan Sun, Gholamreza Haffari, Minxian Xu, Rajkumar Buyya, Adel N. Toosi | 2026-02-25 | Download | Large Language Models (LLMs) have achieved remarkable success across a wide range of tasks, but serving them efficiently at scale remains a critical challenge due to their substantial computational an... |
| Epoch-based Optimistic Concurrency Control in Geo-replicated Databases | Yunhao Mao, Harunari Takata, Michail Bachras, Yuqiu Zhang, Shiquan Zhang, Gengrui Zhang, Hans-Arno Jacobsen | 2026-02-25 | Download | Geo-distribution is essential for modern online applications to ensure service reliability and high availability. However, supporting high-performance serializable transactions in geo-replicated datab... |
| DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference | Yongtong Wu, Shaoyuan Chen, Yinmin Zhong, Rilin Huang, Yixuan Tan, Wentao Zhang, Liyue Zhang, Shangyan Zhou, Yuxuan Liu, Shunfeng Zhou, Mingxing Zhang, Xin Jin, Panpan Huang | 2026-02-25 | Download | The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive KV-Cache f... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A Generic Web Component for WebRTC Pub-Sub | Kundan Singh | 2026-02-25 | Download | We present video-io, a generic web component to publish or subscribe to a media stream in WebRTC (web real-time communication) applications. Unlike a call or conference room abstraction of existing vi... |
| Enhancing Cellular-enabled Collaborative Robots Planning through GNSS data for SAR Scenarios | Arnau Romero, Carmen Delgado, Jana Baguer, Raúl Suárez, Xavier Costa-Pérez | 2026-02-25 | Download | Cellular-enabled collaborative robots are becoming paramount in Search-and-Rescue (SAR) and emergency response. Crucially dependent on resilient mobile network connectivity, they serve as invaluable a... |
| Lossy Compression of Network Feature Data: When Less Is Enough | Fabio Palmese, Gabriele Merlach, Damiano Ravalico, Martino Trevisan, Alessandro E. C. Redondi | 2026-02-25 | Download | Network traffic analysis increasingly relies on feature-based representations to support monitoring and security in the presence of pervasive encryption. |
| Dual-Hop Joint Visible Light and Backscatter Communication Relaying under Finite Blocklength | Boxuan Xie, Lauri Mela, Alexis A. Dowhuszko, Jiacheng Wang, Kalle Ruttik, Riku Jäntti | 2026-02-25 | Download | This paper investigates a dual-hop joint visible light communication (VLC) and backscatter communication (BC) relaying framework under the finite blocklength (FBL) constraint, aiming at energy-neutral... |
| Implementation and transition to post-quantum cryptography of the Minimal IKE protocol | Davide De Zuane, Paolo Santini, Marco Baldi | 2026-02-25 | Download | This paper concerns the Minimal Internet Key Exchange (IKE) protocol, which has received little attention to date, despite its potential to make the best-known IKE protocol sufficiently lightweight to... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Contextual Memory Virtualisation: DAG-Based State Management and Structurally Lossless Trimming for LLM Agents | Cosmo Santoni | 2026-02-25 | Download | As large language models engage in extended reasoning tasks, they accumulate significant state -- architectural mappings, trade-off decisions, codebase conventions -- within the context window. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems | Md Hasanur Rashid, Nathan R. Tallent, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Tuning parallel file system in High-Performance Computing (HPC) systems remains challenging due to the complex I/O paths, diverse I/O patterns, and dynamic system conditions. |
| AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage | Md Hasanur Rashid, Dong Dai | 2026-02-25 | Download | Modern high-performance computing (HPC) applications run on compute resources but share global storage systems. This design can cause problems when applications consume a disproportionate amount of st... |
| DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System | Md Hasanur Rashid, Xinyi Li, Youbiao He, Forrest Sheng Bao, Dong Dai | 2026-02-25 | Download | Enabling efficient, high-performance data access in parallel file systems (PFS) is critical for today's high-performance computing systems. PFS client-side I/O heavily impacts the final I/O performanc... |
| PASTA: A Modular Program Analysis Tool Framework for Accelerators | Mao Lin, Hyeran Jeon, Keren Zhou | 2026-02-25 | Download | The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. |