
2026-04-11

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A 129FPS Full HD Real-Time Accelerator for 3D Gaussian Splatting | Fang-Chi Chang, Tian-Sheuan Chang | 2026-04-11 | Download | Rendering large-scale, unbounded scenes on AR/VR-class devices is constrained by the computation, bandwidth, and storage cost of 3D Gaussian Splatting (3DGS). |
| WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning | Kaixuan Zhang, Chutong Ding, Shiyou Qian, Luping Wang, Jian Cao, Guangtao Xue, Cheng Huang, Guodong Yang, Liping Zhang | 2026-04-11 | Download | The rapid adoption of Large Language Models (LLMs) has made GPU inference efficiency an increasingly critical system concern. The runtime of LLM workloads is largely dominated by tile-based kernels, p... |
| FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs | Bohan Li, Shengmin Li, Xinyu Shi, Enyi Yao, Francky Catthoor, Simei Yang | 2026-04-11 | Download | Graph Convolutional Networks (GCNs) are widely adopted for tasks involving relational or graph-structured data and can be formulated as two-stage sparse-dense matrix multiplication (SpMM) during infer... |
| Late Breaking Results: CHESSY: Coupled Hybrid Emulation with SystemC-FPGA Synchronization | Lorenzo Ruotolo, Giovanni Pollo, Mohamed Amine Hamdi, Matteo Risso, Yukai Chen, Enrico Macii, Massimo Poncino, Sara Vinco, Alessio Burrello, Daniele Jahier Pagliari | 2026-04-11 | Download | The growing complexity of cyber-physical systems (CPSs) calls for early prototyping tools that combine accuracy, speed, and usability. Virtual Platforms (VPs) provide fast functional simulation, but h... |
| Aging Aware Adaptive Voltage Scaling for Reliable and Efficient AI Accelerators | Tong Xie, Zuodong Zhang, Chao Yang, Yuan Wang, Runsheng Wang, Meng Li | 2026-04-11 | Download | Deep neural networks (DNNs) have showcased remarkable performance across various tasks and are widely deployed on AI accelerators fabricated in advanced technology nodes for efficiency. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems | Haochen Pan, Ryan Chard, Song Young Oh, Maxime Gonthier, Valérie Hayot-Sasson, Geoffrey Lentner, Joe Bottigliero, Rachana Ananthakrishnan, Kyle Chard, Ian Foster | 2026-04-11 | Download | Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. |
| Verifying In-Network Computing Systems for Design Risks | Tianyu Bai, Ying Zhang, Ying Zhang, Wenfei Wu | 2026-04-11 | Download | The emergence of programmable switches has brought in-network computing (INC) into the spotlight in recent years. By offloading computation directly onto the data transmission process, INC improves ne... |
| RF-LEGO: Modularized Signal Processing-Deep Learning Co-Design for RF Sensing via Deep Unrolling | Luca Jiang-Tao Yu, Chenshu Wu | 2026-04-11 | Download | Wireless sensing, traditionally relying on signal processing (SP) techniques, has recently shifted toward data-driven deep learning (DL) to achieve performance breakthroughs. |
| Tessera: Unlocking Heterogeneous GPUs through Kernel-Granularity Disaggregation | Tiancheng Hu, Jin Qin, Zheng Wang, Junhao Hu, Yuzheng Wang, Lei Chen, Yizhou Shan, Mingxing Zhang, Ting Cao, Chunwei Xia, Huimin Cui, Tao Xie, Chenxi Wang | 2026-04-11 | Download | Disaggregation maps parts of an AI workload to different types of GPUs, offering a path to utilize modern heterogeneous GPU clusters. However, existing solutions operate at a coarse granularity and ar... |
| FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs | Bohan Li, Shengmin Li, Xinyu Shi, Enyi Yao, Francky Catthoor, Simei Yang | 2026-04-11 | Download | Graph Convolutional Networks (GCNs) are widely adopted for tasks involving relational or graph-structured data and can be formulated as two-stage sparse-dense matrix multiplication (SpMM) during infer... |
| LoDAdaC: a unified local training-based decentralized framework with adaptive gradients and compressed communication | Wei Liu, Anweshit Panda, Ujwal Pandey, Haven Cook, George M. Slota, Naigang Wang, Jie Chen, Yangyang Xu | 2026-04-11 | Download | In the decentralized distributed learning, achieving fast convergence and low communication cost is essential for scalability and high efficiency. |
| Rebooting Microreboot: Architectural Support for Safe, Parallel Recovery in Microservice Systems | Laurent Bindschaedler | 2026-04-11 | Download | Microreboot enables fast recovery by restarting only the failing component, but in modern microservices naive restarts are unsafe: dense dependencies mean rebooting one service can disrupt many caller... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Impact of Intelligent Technologies on IoV Security: Integrating Edge Computing and AI | Awais Bilal, Kashif Sharif, Liehuang Zhu, Chang Xu, Fan Li, Sadaf Bukhari, Sujit Biswas | 2026-04-11 | Download | The rapid development and integration of intelligent technologies in the Internet of Vehicles (IoV) have revolutionized transportation systems by enhancing connectivity, automation, and safety. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents | Mofasshara Rafique, Laurent Bindschaedler | 2026-04-11 | Download | Stateful tool-using LLM agents treat the context window as working memory, yet today's agent harnesses manage residency and durability as best-effort, causing recurring failures: lost state after comp... |
