2026-03-01

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
HAVEN: High-Bandwidth Flash Augmented Vector Engine for Large-Scale Approximate Nearest-Neighbor Search Acceleration	Po-Kai Hsu, Weihong Xu, Qunyou Liu, Tajana Rosing, Shimeng Yu	2026-03-01	Download	Retrieval-Augmented Generation (RAG) relies on large-scale Approximate Nearest Neighbor Search (ANNS) to retrieve semantically relevant context for large language models.
VIKIN: A Reconfigurable Accelerator for KANs and MLPs with Two-Stage Sparsity Support	Wenhui Ou, Zhuoyu Wu, Yipu Zhang, Zheng Wang, C. Patrick Yue	2026-03-01	Download	Recently, multi-layer perceptrons (MLPs) widely used in modern AI applications suffer from limited real-time performance due to intensive memory access overhead.
FLICKER: A Fine-Grained Contribution-Aware Accelerator for Real-Time 3D Gaussian Splatting	Wenhui Ou, Zhuoyu Wu, Yipu Zhang, Dongjun Wu, Freddy Ziyang Hong, Chik Patrick Yue	2026-03-01	Download	Recently, 3D Gaussian Splatting (3DGS) has emerged as a mainstream rendering technique due to its photorealistic quality and low latency. However, processing massive numbers of non-contributing Gaussi...
SHIELD8-UAV: Sequential 8-bit Hardware Implementation of a Precision-Aware 1D-F-CNN for Low-Energy UAV Acoustic Detection and Temporal Tracking	Susmita Ghanta, Karan Nathwani, Rohit Chaurasiya	2026-03-01	Download	Real-time unmanned aerial vehicle (UAV) acoustic detection at the edge demands low-latency inference under strict power and hardware limits. This paper presents SHIELD8-UAV, a sequential 8-bit hardwar...
TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading	Yudong Pan, Yintao He, Tianhua Han, Lian Liu, Shixin Zhao, Zhirong Chen, Mengdi Wang, Cangyuan Li, Yinhe Han, Ying Wang	2026-03-01	Download	To deploy large Mixture-of-Experts (MoE) models cost-effectively, offloading-based single-GPU heterogeneous inference is crucial. While GPU-CPU architectures that offload cold experts are constrained ...
SoberDSE: Sample-Efficient Design Space Exploration via Learning-Based Algorithm Selection	Lei Xu, Shanshan Wang, Chenglong Xiao	2026-03-01	Download	High-Level Synthesis (HLS) is a pivotal electronic design automation (EDA) technology that enables the generation of hardware circuits from high-level language descriptions.
Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture	Huize Li, Qinggang Wang, Bing Gao, Dan Chen, Yu Huang, Xin Xin	2026-03-01	Download	Multi Scale Deformable Attention (MSDAttn) has become a fundamental component in various vision tasks due to its effective multi scale grid sampling (MSGS).
Capstone: Power-Capped Pipelining for Coarse-Grained Reconfigurable Array Compilers	Sabrina Yarzada, Christopher Torng	2026-03-01	Download	Coarse-grained reconfigurable arrays (CGRAs) have attracted growing interest because they exhibit performance and energy efficiency competitive with ASICs while maintaining flexibility similar to FPGA...
Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures	Manoj Vishwanathan, Suvinay Subramanian, Anand Raghunathan	2026-03-01	Download	Vision-Language-Action (VLA) models are an emerging class of workloads critical for robotics and embodied AI at the edge. As these models scale, they demonstrate significant capability gains, yet they...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
The Finality Calculator: Analyzing and Quantifying Filecoin's Finality Guarantees	Guy Goren, Jorge M. Soares	2026-03-01	Download	In this paper, we analyze the finality of the Filecoin network, focusing on dynamic probabilistic guarantees of tipset permanence in the canonical chain.
A402: Binding Cryptocurrency Payments to Service Execution for Agentic Commerce	Yue Li, Lei Wang, Kaixuan Wang, Zhiqiang Yang, Ke Wang, Zhi Guan, Jianbo Gao	2026-03-01	Download	The rapid proliferation of autonomous AI agents is driving a shift toward agentic commerce, where agents are expected to autonomously invoke and pay for services.
Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention	Mengqi Liao, Lu Wang, Chaoyun Zhang, Bo Qiao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Huaiyu Wan	2026-03-01	Download	With reasoning becoming the generative paradigm for large language models (LLMs), the memory bottleneck caused by KV cache during the decoding phase has become a critical factor limiting high-concurre...
TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading	Yudong Pan, Yintao He, Tianhua Han, Lian Liu, Shixin Zhao, Zhirong Chen, Mengdi Wang, Cangyuan Li, Yinhe Han, Ying Wang	2026-03-01	Download	To deploy large Mixture-of-Experts (MoE) models cost-effectively, offloading-based single-GPU heterogeneous inference is crucial. While GPU-CPU architectures that offload cold experts are constrained ...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Border Gateway Protocol Extension for Distributing Endpoint Identifier Reachability Information in Delay-tolerant Networks	Marius Feldmann, Théo Tchilinguirian, Felix Walter	2026-03-01	Download	The Delay-Tolerant Networking (DTN) community has created solid results during the last three decades. One aspect that still requires focus, however, is the simplification of configuring systems parti...
On Utility-optimal Entanglement Routing in Quantum Networks	Sounak Kar, Arpan Mukhopadhyay	2026-03-01	Download	Quantum networks are envisioned to enable reliable distribution and manipulation of quantum information across distances, forming the foundation of a future quantum internet.
Demand- and Priority-Aware Adaptive Congestion Control for Heterogeneous V2X Service Requirements	Miguel Sepulcre, Javier Tortosa-Garcia, Javier Gozalvez	2026-03-01	Download	Vehicle-to-Everything (V2X) communications enable the exchange of information among vehicles to improve road safety and traffic efficiency. As V2X deployments progress, vehicles are expected to suppor...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures	Manoj Vishwanathan, Suvinay Subramanian, Anand Raghunathan	2026-03-01	Download	Vision-Language-Action (VLA) models are an emerging class of workloads critical for robotics and embodied AI at the edge. As these models scale, they demonstrate significant capability gains, yet they...