
2026-03-01

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HAVEN: High-Bandwidth Flash Augmented Vector Engine for Large-Scale Approximate Nearest-Neighbor Search Acceleration | Po-Kai Hsu, Weihong Xu, Qunyou Liu, Tajana Rosing, Shimeng Yu | 2026-03-01 | Download | Retrieval-Augmented Generation (RAG) relies on large-scale Approximate Nearest Neighbor Search (ANNS) to retrieve semantically relevant context for large language models. |
| VIKIN: A Reconfigurable Accelerator for KANs and MLPs with Two-Stage Sparsity Support | Wenhui Ou, Zhuoyu Wu, Yipu Zhang, Zheng Wang, C. Patrick Yue | 2026-03-01 | Download | Recently, multi-layer perceptrons (MLPs) widely used in modern AI applications suffer from limited real-time performance due to intensive memory access overhead. |
| FLICKER: A Fine-Grained Contribution-Aware Accelerator for Real-Time 3D Gaussian Splatting | Wenhui Ou, Zhuoyu Wu, Yipu Zhang, Dongjun Wu, Freddy Ziyang Hong, Chik Patrick Yue | 2026-03-01 | Download | Recently, 3D Gaussian Splatting (3DGS) has emerged as a mainstream rendering technique due to its photorealistic quality and low latency. However, processing massive numbers of non-contributing Gaussi... |
| SHIELD8-UAV: Sequential 8-bit Hardware Implementation of a Precision-Aware 1D-F-CNN for Low-Energy UAV Acoustic Detection and Temporal Tracking | Susmita Ghanta, Karan Nathwani, Rohit Chaurasiya | 2026-03-01 | Download | Real-time unmanned aerial vehicle (UAV) acoustic detection at the edge demands low-latency inference under strict power and hardware limits. This paper presents SHIELD8-UAV, a sequential 8-bit hardwar... |
| TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading | Yudong Pan, Yintao He, Tianhua Han, Lian Liu, Shixin Zhao, Zhirong Chen, Mengdi Wang, Cangyuan Li, Yinhe Han, Ying Wang | 2026-03-01 | Download | To deploy large Mixture-of-Experts (MoE) models cost-effectively, offloading-based single-GPU heterogeneous inference is crucial. While GPU-CPU architectures that offload cold experts are constrained ... |
| SoberDSE: Sample-Efficient Design Space Exploration via Learning-Based Algorithm Selection | Lei Xu, Shanshan Wang, Chenglong Xiao | 2026-03-01 | Download | High-Level Synthesis (HLS) is a pivotal electronic design automation (EDA) technology that enables the generation of hardware circuits from high-level language descriptions. |
| Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture | Huize Li, Qinggang Wang, Bing Gao, Dan Chen, Yu Huang, Xin Xin | 2026-03-01 | Download | Multi Scale Deformable Attention (MSDAttn) has become a fundamental component in various vision tasks due to its effective multi scale grid sampling (MSGS). |
| Capstone: Power-Capped Pipelining for Coarse-Grained Reconfigurable Array Compilers | Sabrina Yarzada, Christopher Torng | 2026-03-01 | Download | Coarse-grained reconfigurable arrays (CGRAs) have attracted growing interest because they exhibit performance and energy efficiency competitive with ASICs while maintaining flexibility similar to FPGA... |
| Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures | Manoj Vishwanathan, Suvinay Subramanian, Anand Raghunathan | 2026-03-01 | Download | Vision-Language-Action (VLA) models are an emerging class of workloads critical for robotics and embodied AI at the edge. As these models scale, they demonstrate significant capability gains, yet they... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| The Finality Calculator: Analyzing and Quantifying Filecoin's Finality Guarantees | Guy Goren, Jorge M. Soares | 2026-03-01 | Download | In this paper, we analyze the finality of the Filecoin network, focusing on dynamic probabilistic guarantees of tipset permanence in the canonical chain. |
| A402: Binding Cryptocurrency Payments to Service Execution for Agentic Commerce | Yue Li, Lei Wang, Kaixuan Wang, Zhiqiang Yang, Ke Wang, Zhi Guan, Jianbo Gao | 2026-03-01 | Download | The rapid proliferation of autonomous AI agents is driving a shift toward agentic commerce, where agents are expected to autonomously invoke and pay for services. |
| Zipage: Maintain High Request Concurrency for LLM Reasoning through Compressed PagedAttention | Mengqi Liao, Lu Wang, Chaoyun Zhang, Bo Qiao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Huaiyu Wan | 2026-03-01 | Download | With reasoning becoming the generative paradigm for large language models (LLMs), the memory bottleneck caused by KV cache during the decoding phase has become a critical factor limiting high-concurre... |
| TriMoE: Augmenting GPU with AMX-Enabled CPU and DIMM-NDP for High-Throughput MoE Inference via Offloading | Yudong Pan, Yintao He, Tianhua Han, Lian Liu, Shixin Zhao, Zhirong Chen, Mengdi Wang, Cangyuan Li, Yinhe Han, Ying Wang | 2026-03-01 | Download | To deploy large Mixture-of-Experts (MoE) models cost-effectively, offloading-based single-GPU heterogeneous inference is crucial. While GPU-CPU architectures that offload cold experts are constrained ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Border Gateway Protocol Extension for Distributing Endpoint Identifier Reachability Information in Delay-tolerant Networks | Marius Feldmann, Théo Tchilinguirian, Felix Walter | 2026-03-01 | Download | The Delay-Tolerant Networking (DTN) community has created solid results during the last three decades. One aspect that still requires focus, however, is the simplification of configuring systems parti... |
| On Utility-optimal Entanglement Routing in Quantum Networks | Sounak Kar, Arpan Mukhopadhyay | 2026-03-01 | Download | Quantum networks are envisioned to enable reliable distribution and manipulation of quantum information across distances, forming the foundation of a future quantum internet. |
| Demand- and Priority-Aware Adaptive Congestion Control for Heterogeneous V2X Service Requirements | Miguel Sepulcre, Javier Tortosa-Garcia, Javier Gozalvez | 2026-03-01 | Download | Vehicle-to-Everything (V2X) communications enable the exchange of information among vehicles to improve road safety and traffic efficiency. As V2X deployments progress, vehicles are expected to suppor... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures | Manoj Vishwanathan, Suvinay Subramanian, Anand Raghunathan | 2026-03-01 | Download | Vision-Language-Action (VLA) models are an emerging class of workloads critical for robotics and embodied AI at the edge. As these models scale, they demonstrate significant capability gains, yet they... |
