2026-04-13

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents	Swanand Rao, Kiran Kashalkar, Parvathi Somashekar, Priya Krishnan	2026-04-13	Download	The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure.
Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores	Yixian Shen, Chaoyao Shen, Jan Deen, George Floros, Andy Pimentel, Anuj Pathania	2026-04-13	Download	Large Foundation Model (LFM) inference is both memory- and compute-intensive, traditionally relying on GPUs. However, the limited availability and high cost have motivated the adoption of high-perform...
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead	Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang	2026-04-13	Download	Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design ove...
EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models	Jinane Bazzi, Mariam Rakka, Fadi Kurdahi, Mohammed E. Fouda, Ahmed Eltawil	2026-04-13	Download	The growing demand for deploying Small Language Models (SLMs) on edge devices, including laptops, smartphones, and embedded platforms, has exposed fundamental inefficiencies in existing accelerators.
CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction	Hector R. Rodriguez, Jiechen Huang, Wenjian Yu	2026-04-13	Download	We present CapBench, a fully reproducible, multi-PDK dataset for capacitance extraction. The dataset is derived from open-source designs, including single-core CPUs, systems-on-chip, and media acceler...
Technology solutions targeting the performance of gen-AI inference in resource constrained platforms	Joyjit Kundu, Joshua Klein, Aakash Patel, Dwaipayan Biswas	2026-04-13	Download	The rise of generative AI workloads, particularly language model inference, is intensifying on/off-chip memory pressure. Multimodal inputs such as video streams or images and downstream applications l...
Automated SVA Generation with LLMs	Lik Tung Fu, Qihang Wang, Shaokai Ren, Mengli Zhang, Sichao Yang, Jun Liu, Xi Wang	2026-04-13	Download	Functional verification remains a dominant cost in modern IC development, and SystemVerilog Assertions (SVAs) are critical for simulation-based monitoring and formal property checking.
Compiler Framework for Directional Transport in Zoned Neutral Atom Systems with AOD Assistance: A Hybrid Remote CZ Approach	Lingyi Kong, Chen Huang, Zhemin Zhang, Yidong Zhou, Xiangyu Ren, Shaochen Li, Zhiding Liang	2026-04-13	Download	We present a directional-transport (DT)-based remote CZ gate and compiler for zoned neutral-atom arrays that overcomes movement-bound entanglement limitations.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents	Swanand Rao, Kiran Kashalkar, Parvathi Somashekar, Priya Krishnan	2026-04-13	Download	The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure.
Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO	Jonas Svedas, Nathan Laubeuf, Ryan Harvey, Arjun Singh, Changhai Man, Abubakr Nada, Tushar Krishna, James Myers, Debjyoti Bhattacharjee	2026-04-13	Download	Predicting the performance of large-scale distributed machine learning (ML) workloads across multiple accelerator architectures remains a central challenge in ML system design.
Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework	Ruixiang Huang, Weifan Liu	2026-04-13	Download	Simulating large-scale microswimmer dynamics in viscous fluid poses significant challenges due to the coupled high spatial and temporal complexity.
Predictive Bayesian Arbitration: A Scalable Noisy-OR Model with Service Criticality Awareness	Anil Jangam, Ganesh Karthick Rajendran, Roy Kantharajah	2026-04-13	Download	Geographically High-Available (Geo-HA) cluster systems are essential for service continuity in distributed cloud-native environments. However, traditional arbitration mechanisms, which are often predi...
GitFarm: Git as a Service for Large-Scale Monorepos	Preetam Dwivedi, Akshay Hacholli, Adam Bettigole	2026-04-13	Download	At the scale of Uber's monorepos, traditional Git workflows become a fundamental bottleneck. Cloning multi-gigabyte repositories, maintaining local checkouts, periodically syncing from upstream, and e...
Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics	Allison Austin, Shilpika, Yan To Linus Lam, Yun-Hsin Kuo, Venkatram Vishwanath, Michael E. Papka, Kwan-Liu Ma	2026-04-13	Download	In high-performance computing (HPC) environments, system monitoring data is often unlabeled and high-dimensional, making it difficult to reliably detect and understand anomalous computing nodes.
ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism	Alan Aboudib, Rodrigo Lopez Portillo A., Kalei Brady, Steffen Cruz	2026-04-13	Download	Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enab...
Nanvix: A Multikernel OS Design for High-Density Serverless Deployments	Carlos Segarra, Pedro Henrique Penna, Enrique Saurez, Íñigo Goiri, Peter Pietzuch, Shan Lu, Rodrigo Fonseca	2026-04-13	Download	Serverless providers strive for high resource utilization by optimizing deployment density: how many applications can be deployed per host server.
GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs	Lara D'Agata, Carlos Agulló-Domingo, Óscar Vera-López, Kaustubh Shivdikar, Ardhi W. B. Yudha, Ferhat Yaman, David Kaeli, José L. Abellán, Ian Colbert, José Cano	2026-04-13	Download	Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE prese...
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead	Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang	2026-04-13	Download	Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design ove...
OpenDT: Exploring Datacenter Performance and Sustainability with a Self-Calibrating Digital Twin	Radu Nicolae, Jules van der Toorn, Stavriana Kraniti, Houcen Liu, Alexandru Iosup	2026-04-13	Download	Datacenters are the backbone of our digital society, but raise numerous operational challenges. We envision digital twins becoming primary instruments in datacenter operations, continuously and autono...
Characterizing the Impact of Congestion in Modern HPC Interconnects	Lorenzo Piarulli, Marco Faltelli, Dirk Pleiter, Karthee Sivalingam, Dancheng Zhang, Kexue Zhao, Matteo Turisini, Francesco Iannone, Aldo Artigiani, Daniele De Sensi	2026-04-13	Download	High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations.
A Full Compression Pipeline for Green Federated Learning in Communication-Constrained Environments	Elouan Colybes, Shirin Salehi, Anke Schmeink	2026-04-13	Download	Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, thereby preserving privacy. However, FL often suffers from significant communication a...
Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search	Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon	2026-04-13	Download	As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge.
NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks	Chamath Wanigasooriya, Indrajith Ekanayake	2026-04-13	Download	Cloud native architecture is about building and running scalable microservice applications to take full advantage of the cloud environments. Managed Kubernetes is the powerhouse orchestrating cloud na...
QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting	Vinooth Kulkarni, Aaron Orenstein, Xinpeng Li, Shuai Xu, Daniel Blankenberg, Vipin Chaudhary	2026-04-13	Download	The quantum computing community is increasingly positioning quantum processors as accelerators within classical HPC workflows, analogous to GPUs and TPUs.
RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving	Hossein Hosseini Kasnavieh, Christopher Leckie, Adel N. Toosi	2026-04-13	Download	Multi-model LLM routing has emerged as an effective approach for reducing serving cost and latency while maintaining output quality by assigning each prompt to an appropriate model.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
BLAST: Blockchain-based LLM-powered Agentic Spectrum Trading	Anas Abognah, Otman Basir	2026-04-13	Download	The management of radio frequency spectrum is undergoing a paradigm shift from static, centralized command-and-control models to dynamic, market-driven approaches.
A Geometric Algebra-informed NeRF Framework for Generalizable Wireless Channel Prediction	Jingzhou Shen, Luis Lago Enamorado, Shiwen Mao, Xuyu Wang	2026-04-13	Download	In this paper, we propose the geometric algebra-informed neural radiance fields (GAI-NeRF), a novel framework for wireless channel prediction that leverages geometric algebra attention mechanisms to c...
Network Slice Embedding over Space Division Multiplexed Elastic Optical Networks	Divya Khanure, Riti Gour, Congzhou Li, Jason P. Jue	2026-04-13	Download	Network slicing over space division multiplexed elastic optical networks (SDM EONs) enables efficient multiservice provisioning on a shared optical substrate.
ISAC-Enabled Non-Terrestrial Networks for 6G: Design Principles, Standardization, Performance Tradeoffs, and Use Cases	Muhammad Ali Jamshed, Rohit Singh, Malik Muhammad Saad, Aryan Kaushik, Wonjae Shin, Miguel Dajer, Alain Mourad	2026-04-13	Download	Non-Terrestrial Networks (NTN) have emerged as a key enabler to fully realize the vision of integrated, intelligent, and ubiquitous connectivity in 6G systems.
Security Implications of 5G Communication in Industrial Systems	Stefan Lenz, Sotiris Michaelides, Moritz Rickert, Jonas Holtwick, Martin Henze	2026-04-13	Download	Traditionally, industrial control systems (ICS) were designed without security in mind, prioritizing availability and real-time communication.
Programmable Packet Scheduling with Dynamic Reordering at Line Rate	Zekun Wang, Binghao Yue, Yichen Deng, Weitao Pan, Jiangyi Shi, Yue Hao	2026-04-13	Download	High-speed switch packet scheduling demands both line-rate performance and programmability. Existing programmable hardware scheduling models, such as PIFO and PIEO, can express a broad range of schedu...
BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection	Ammar Bhilwarawala, Likhamba Rongmei, Harsh Sharma, Arya Jena, Kaushal Singh, Jayashree Piri, Raghunath Dey	2026-04-13	Download	IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training prac...
RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving	Hossein Hosseini Kasnavieh, Christopher Leckie, Adel N. Toosi	2026-04-13	Download	Multi-model LLM routing has emerged as an effective approach for reducing serving cost and latency while maintaining output quality by assigning each prompt to an appropriate model.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems	Daeyeon Son	2026-04-13	Download	An OS kernel that runs LLM inference internally can read logit distributions before any text is generated -- and act on them as a governance primitive.
Nanvix: A Multikernel OS Design for High-Density Serverless Deployments	Carlos Segarra, Pedro Henrique Penna, Enrique Saurez, Íñigo Goiri, Peter Pietzuch, Shan Lu, Rodrigo Fonseca	2026-04-13	Download	Serverless providers strive for high resource utilization by optimizing deployment density: how many applications can be deployed per host server.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs	Lara D'Agata, Carlos Agulló-Domingo, Óscar Vera-López, Kaustubh Shivdikar, Ardhi W. B. Yudha, Ferhat Yaman, David Kaeli, José L. Abellán, Ian Colbert, José Cano	2026-04-13	Download	Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE prese...
Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages	Vinooth Kulkarni, Jaehyun Lee, Adam Hutchings, Anas Albahri, Jai Nana, Shuai Xu, Vipin Chaudhary	2026-04-13	Download	Dynamic quantum circuits with mid-circuit measurement and classical feedforward are essential for near-term algorithms such as error mitigation, adaptive phase estimation, and Variational Quantum Eige...
Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200	Aditya Ujeniya, Jan Eitzinger, Georg Hager, Gerhard Wellein	2026-04-13	Download	Modern NVIDIA GPUs like the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth.
Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search	Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon	2026-04-13	Download	As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge.
LCS.jl: A High-Performance, Multi-Platform Computational Model in Julia for Turbulent Particle-Laden Flows	Taketo Tominaga, Ryo Onishi	2026-04-13	Download	Multiphase turbulent flow phenomena are observed not only in industrial devices but also in environmental flows, and direct numerical simulation (DNS) plays a key role in their investigation.