2026-03-18

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
PAI: Fast, Accurate, and Full Benchmark Performance Projection with AI	Avery Johnson, Mohammad Majharul Islam, Riad Akram, Abdullah Muzahid	2026-03-18	Download	The exponential increase in complex IPs within modern SoCs, driven by Moore's Law, has created a pressing need for fast and accurate hardware-software power-performance analysis.
A Survey of Neural Network Variational Monte Carlo from a Computing Workload Characterization Perspective	Zhengze Xiao, Xuanzhe Ding, Yuyang Lou, Lixue Cheng, Chaojian Li	2026-03-18	Download	Neural Network Variational Monte Carlo (NNVMC) has emerged as a promising paradigm for solving quantum many-body problems by combining variational Monte Carlo with expressive neural-network wave-funct...
Enabling RISC-V Vector Code Generation in MLIR through Custom xDSL Lowerings	Jie Lei, Héctor Martínez, Adrián Castelló	2026-03-18	Download	The growing adoption of RISC-V in high-performance and scientific computing has increased the need for performance-portable code targeting the RISC-V Vector (RVV) extension.
HWE-Bench: Can Language Models Perform Board-level Schematic Designs?	Weibo Qiu, Yinhao Xiao, Runyu Pan	2026-03-18	Download	Large Language Models (LLMs) have demonstrated significant potential in various engineering tasks, including software development, digital logic generation, and companion document maintenance.
The Program Hypergraph: Multi-Way Relational Structure for Geometric Algebra, Spatial Compute, and Physics-Aware Compilation	Houston Haynes	2026-03-18	Download	The Program Semantic Graph (PSG) introduced in prior work on Dimensional Type Systems and Deterministic Memory Management encodes compilation-relevant properties as binary edge relations between compu...
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression	Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu	2026-03-18	Download	Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving.
ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization	Panuganti Chirag Sai, Gandholi Sarat, R. Raghunatha Sarma, Venkata Kalyan Tavva, Naveen M	2026-03-18	Download	Reducing latency and energy consumption is critical to improving the efficiency of memory systems in modern computing. This work introduces ReLMXEL (Reinforcement Learning for Memory Controller with E...
A Synthesizable RTL Implementation of Predictive Coding Networks	Timothy Oh	2026-03-18	Download	Backpropagation has enabled modern deep learning but is difficult to realize as an online, fully distributed hardware learning system due to global error propagation, phase separation, and heavy relia...
KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference	Sohaib Errabii, Olivier Sentieys, Marcello Traiola	2026-03-18	Download	Kolmogorov-Arnold Networks (KANs) have gained attention for their potential to outperform Multi-Layer Perceptrons (MLPs) in terms of parameter efficiency and interpretability.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI	Houston Haynes	2026-03-18	Download	Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structur...
A mechanism design overview of Sedna	Benjamin Marsh, Alejandro Ranchal-Pedrosa	2026-03-18	Download	Sedna is a coded multi-proposer consensus protocol in which a sender shards a transaction payload into rateless symbols and disseminates them across parallel proposer lanes, providing high throughput ...
Multi-stage Flow Scheduling for LLM Serving	Yijun Sun, Xudong Liao, Songrun Xie, Hao Chen, Han Tian, Wenxue Li, Yiming Zhang, Kai Chen	2026-03-18	Download	Meeting stringent Time-To-First-Token (TTFT) requirements is crucial for LLM applications. To improve efficiency, modern LLM serving systems adopt disaggregated architectures with diverse parallelisms...
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression	Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu	2026-03-18	Download	Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving.
The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency	Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu	2026-03-18	Download	How many tokens can a GPU inference cluster deliver per watt? Across deployments of identical hardware, the answer varies by 40x -- not because of software inefficiency, but because of the serving con...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Offload or Overload: A Platform Measurement Study of Mobile Robotic Manipulation Workloads	Sara Pohland, Xenofon Foukas, Ganesh Ananthanarayanan, Andrey Kolobov, Sanjeev Mehrotra, Bozidar Radunovic, Ankit Verma	2026-03-18	Download	Mobile robotic manipulation--the ability of robots to navigate spaces and interact with objects--is a core capability of physical AI. Foundation models have led to breakthroughs in their performance, ...
RIS-Aided Mobile Network Design	Adam Samorzewski, Adrian Kliks	2026-03-18	Download	In this paper, we examine the distribution of radio signal propagation within the city of Poznan (Poland) to determine optimal locations for deploying Reconfigurable Intelligent Surfaces (RIS).
Access Controlled Website Interaction for Agentic AI with Delegated Critical Tasks	Sunyoung Kim, Hokeun Kim	2026-03-18	Download	Recent studies reveal gaps in delegating critical tasks to agentic AI that accesses websites on the user's behalf, primarily due to limited access control mechanisms on websites designed for agentic A...
Enabling Real-Time Programmability for RAN Functions: A Wasm-Based Approach for Robust and High-Performance dApps	João Paulo Esper, Yure Freitas, Pedro Souza, Bruno Silvestre, Joao F. Santos, Alexandre Huff, Cristiano Both, Kleber Cardoso	2026-03-18	Download	While the Open Radio Access Network Alliance (O-RAN) architecture enables third-party applications to optimize radio access networks at multiple timescales, real-time distributed applications (dApps) ...
A Vision-based Framework for Intelligent gNodeB Mobility Control	Pedro Duarte, André Coelho, Francisco Ribeiro, Filipe B. Teixeira, Luís Pessoa, Manuel Ricardo	2026-03-18	Download	This paper proposes a vision-based framework for the intelligent control of mobile Open Radio Access Network (O-RAN) base stations (gNBs) operating in dynamic wireless environments.
Bringing Network Coding into Multi-Robot Systems: Interplay Study for Autonomous Systems over Wireless Communications	Anil Zaher, Kiril Solovey, Alejandro Cohen	2026-03-18	Download	Communication is a core enabler for multi-robot systems (MRS), providing the mechanism through which robots exchange state information, coordinate actions, and satisfy safety constraints.
Multi-stage Flow Scheduling for LLM Serving	Yijun Sun, Xudong Liao, Songrun Xie, Hao Chen, Han Tian, Wenxue Li, Yiming Zhang, Kai Chen	2026-03-18	Download	Meeting stringent Time-To-First-Token (TTFT) requirements is crucial for LLM applications. To improve efficiency, modern LLM serving systems adopt disaggregated architectures with diverse parallelisms...
IEMAS: An Incentive-Efficiency Routing Framework for Open Agentic Web Ecosystems	Hongze Liu, Chang Guo, Yingzeng Li, Mengru Wang, Jiong Lou, Shijing Yuan, Hefeng Zhou, Chentao Wu, Jie LI	2026-03-18	Download	The transition to open, distributed Multi-Agent Systems (MAS) promises scalable intelligence but introduces a non-trivial tension: maximizing global efficiency requires cooperative, resource-aware sch...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
AppFlow: Memory Scheduling for Cold Launch of Large Apps on Mobile and Vehicle Systems	Xiaochen Li, Sicong Liu, Bin Guo, Yu Ouyang, Fengmin Wu, Yuan Xu, Zhiwen Yu	2026-03-18	Download	GB-scale large apps like on-device LLMs and rich media editors are becoming the next-generation trend, but their heavy memory and I/O demands, especially during multitasking, cause devices to reclaim ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Swarm: Co-Activation Aware KVCache Offloading Across Multiple SSDs	Tuowei Wang, Liyun Chu, Ruwen Fan, Ju Ren	2026-03-18	Download	The key-value (KV) cache has become the dominant contributor to memory consumption in large language model (LLM) inference. Although offloading KVCache from GPU high-bandwidth memory (HBM) to CPU DRAM...
ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression	Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu	2026-03-18	Download	Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving.