2026-03-18

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PAI: Fast, Accurate, and Full Benchmark Performance Projection with AI | Avery Johnson, Mohammad Majharul Islam, Riad Akram, Abdullah Muzahid | 2026-03-18 | Download | The exponential increase in complex IPs within modern SoCs, driven by Moore's Law, has created a pressing need for fast and accurate hardware-software power-performance analysis. |
| A Survey of Neural Network Variational Monte Carlo from a Computing Workload Characterization Perspective | Zhengze Xiao, Xuanzhe Ding, Yuyang Lou, Lixue Cheng, Chaojian Li | 2026-03-18 | Download | Neural Network Variational Monte Carlo (NNVMC) has emerged as a promising paradigm for solving quantum many-body problems by combining variational Monte Carlo with expressive neural-network wave-funct... |
| Enabling RISC-V Vector Code Generation in MLIR through Custom xDSL Lowerings | Jie Lei, Héctor Martínez, Adrián Castelló | 2026-03-18 | Download | The growing adoption of RISC-V in high-performance and scientific computing has increased the need for performance-portable code targeting the RISC-V Vector (RVV) extension. |
| HWE-Bench: Can Language Models Perform Board-level Schematic Designs? | Weibo Qiu, Yinhao Xiao, Runyu Pan | 2026-03-18 | Download | Large Language Models (LLMs) have demonstrated significant potential in various engineering tasks, including software development, digital logic generation, and companion document maintenance. |
| The Program Hypergraph: Multi-Way Relational Structure for Geometric Algebra, Spatial Compute, and Physics-Aware Compilation | Houston Haynes | 2026-03-18 | Download | The Program Semantic Graph (PSG) introduced in prior work on Dimensional Type Systems and Deterministic Memory Management encodes compilation-relevant properties as binary edge relations between compu... |
| ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression | Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu | 2026-03-18 | Download | Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving. |
| ReLMXEL: Adaptive RL-Based Memory Controller with Explainable Energy and Latency Optimization | Panuganti Chirag Sai, Gandholi Sarat, R. Raghunatha Sarma, Venkata Kalyan Tavva, Naveen M | 2026-03-18 | Download | Reducing latency and energy consumption is critical to improving the efficiency of memory systems in modern computing. This work introduces ReLMXEL (Reinforcement Learning for Memory Controller with E... |
| A Synthesizable RTL Implementation of Predictive Coding Networks | Timothy Oh | 2026-03-18 | Download | Backpropagation has enabled modern deep learning but is difficult to realize as an online, fully distributed hardware learning system due to global error propagation, phase separation, and heavy relia... |
| KANtize: Exploring Low-bit Quantization of Kolmogorov-Arnold Networks for Efficient Inference | Sohaib Errabii, Olivier Sentieys, Marcello Traiola | 2026-03-18 | Download | Kolmogorov-Arnold Networks (KANs) have gained attention for their potential to outperform Multi-Layer Perceptrons (MLPs) in terms of parameter efficiency and interpretability. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI | Houston Haynes | 2026-03-18 | Download | Prevailing AI training infrastructure assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structur... |
| A mechanism design overview of Sedna | Benjamin Marsh, Alejandro Ranchal-Pedrosa | 2026-03-18 | Download | Sedna is a coded multi-proposer consensus protocol in which a sender shards a transaction payload into rateless symbols and disseminates them across parallel proposer lanes, providing high throughput ... |
| Multi-stage Flow Scheduling for LLM Serving | Yijun Sun, Xudong Liao, Songrun Xie, Hao Chen, Han Tian, Wenxue Li, Yiming Zhang, Kai Chen | 2026-03-18 | Download | Meeting stringent Time-To-First-Token (TTFT) requirements is crucial for LLM applications. To improve efficiency, modern LLM serving systems adopt disaggregated architectures with diverse parallelisms... |
| ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression | Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu | 2026-03-18 | Download | Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving. |
| The 1/W Law: An Analytical Study of Context-Length Routing Topology and GPU Generation Gains for LLM Inference Energy Efficiency | Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu | 2026-03-18 | Download | How many tokens can a GPU inference cluster deliver per watt? Across deployments of identical hardware, the answer varies by 40x -- not because of software inefficiency, but because of the serving con... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Offload or Overload: A Platform Measurement Study of Mobile Robotic Manipulation Workloads | Sara Pohland, Xenofon Foukas, Ganesh Ananthanarayanan, Andrey Kolobov, Sanjeev Mehrotra, Bozidar Radunovic, Ankit Verma | 2026-03-18 | Download | Mobile robotic manipulation--the ability of robots to navigate spaces and interact with objects--is a core capability of physical AI. Foundation models have led to breakthroughs in their performance, ... |
| RIS-Aided Mobile Network Design | Adam Samorzewski, Adrian Kliks | 2026-03-18 | Download | In this paper, we examine the distribution of radio signal propagation within the city of Poznan (Poland) to determine optimal locations for deploying Reconfigurable Intelligent Surfaces (RIS). |
| Access Controlled Website Interaction for Agentic AI with Delegated Critical Tasks | Sunyoung Kim, Hokeun Kim | 2026-03-18 | Download | Recent studies reveal gaps in delegating critical tasks to agentic AI that accesses websites on the user's behalf, primarily due to limited access control mechanisms on websites designed for agentic A... |
| Enabling Real-Time Programmability for RAN Functions: A Wasm-Based Approach for Robust and High-Performance dApps | João Paulo Esper, Yure Freitas, Pedro Souza, Bruno Silvestre, Joao F. Santos, Alexandre Huff, Cristiano Both, Kleber Cardoso | 2026-03-18 | Download | While the Open Radio Access Network Alliance (O-RAN) architecture enables third-party applications to optimize radio access networks at multiple timescales, real-time distributed applications (dApps) ... |
| A Vision-based Framework for Intelligent gNodeB Mobility Control | Pedro Duarte, André Coelho, Francisco Ribeiro, Filipe B. Teixeira, Luís Pessoa, Manuel Ricardo | 2026-03-18 | Download | This paper proposes a vision-based framework for the intelligent control of mobile Open Radio Access Network (O-RAN) base stations (gNBs) operating in dynamic wireless environments. |
| Bringing Network Coding into Multi-Robot Systems: Interplay Study for Autonomous Systems over Wireless Communications | Anil Zaher, Kiril Solovey, Alejandro Cohen | 2026-03-18 | Download | Communication is a core enabler for multi-robot systems (MRS), providing the mechanism through which robots exchange state information, coordinate actions, and satisfy safety constraints. |
| Multi-stage Flow Scheduling for LLM Serving | Yijun Sun, Xudong Liao, Songrun Xie, Hao Chen, Han Tian, Wenxue Li, Yiming Zhang, Kai Chen | 2026-03-18 | Download | Meeting stringent Time-To-First-Token (TTFT) requirements is crucial for LLM applications. To improve efficiency, modern LLM serving systems adopt disaggregated architectures with diverse parallelisms... |
| IEMAS: An Incentive-Efficiency Routing Framework for Open Agentic Web Ecosystems | Hongze Liu, Chang Guo, Yingzeng Li, Mengru Wang, Jiong Lou, Shijing Yuan, Hefeng Zhou, Chentao Wu, Jie LI | 2026-03-18 | Download | The transition to open, distributed Multi-Agent Systems (MAS) promises scalable intelligence but introduces a non-trivial tension: maximizing global efficiency requires cooperative, resource-aware sch... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AppFlow: Memory Scheduling for Cold Launch of Large Apps on Mobile and Vehicle Systems | Xiaochen Li, Sicong Liu, Bin Guo, Yu Ouyang, Fengmin Wu, Yuan Xu, Zhiwen Yu | 2026-03-18 | Download | GB-scale large apps like on-device LLMs and rich media editors are becoming the next-generation trend, but their heavy memory and I/O demands, especially during multitasking, cause devices to reclaim ... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Swarm: Co-Activation Aware KVCache Offloading Across Multiple SSDs | Tuowei Wang, Liyun Chu, Ruwen Fan, Ju Ren | 2026-03-18 | Download | The key-value (KV) cache has become the dominant contributor to memory consumption in large language model (LLM) inference. Although offloading KVCache from GPU high-bandwidth memory (HBM) to CPU DRAM... |
| ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression | Ruibo Fan, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, Xiaowen Chu | 2026-03-18 | Download | Lossless model compression holds tremendous promise for alleviating the memory and bandwidth bottlenecks in bit-exact Large Language Model (LLM) serving. |
