2026-04-13

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents | Swanand Rao, Kiran Kashalkar, Parvathi Somashekar, Priya Krishnan | 2026-04-13 | Download | The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure.
Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores | Yixian Shen, Chaoyao Shen, Jan Deen, George Floros, Andy Pimentel, Anuj Pathania | 2026-04-13 | Download | Large Foundation Model (LFM) inference is both memory- and compute-intensive, traditionally relying on GPUs. However, the limited availability and high cost have motivated the adoption of high-perform...
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead | Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang | 2026-04-13 | Download | Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design ove...
EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models | Jinane Bazzi, Mariam Rakka, Fadi Kurdahi, Mohammed E. Fouda, Ahmed Eltawil | 2026-04-13 | Download | The growing demand for deploying Small Language Models (SLMs) on edge devices, including laptops, smartphones, and embedded platforms, has exposed fundamental inefficiencies in existing accelerators.
CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction | Hector R. Rodriguez, Jiechen Huang, Wenjian Yu | 2026-04-13 | Download | We present CapBench, a fully reproducible, multi-PDK dataset for capacitance extraction. The dataset is derived from open-source designs, including single-core CPUs, systems-on-chip, and media acceler...
Technology solutions targeting the performance of gen-AI inference in resource constrained platforms | Joyjit Kundu, Joshua Klein, Aakash Patel, Dwaipayan Biswas | 2026-04-13 | Download | The rise of generative AI workloads, particularly language model inference, is intensifying on/off-chip memory pressure. Multimodal inputs such as video streams or images and downstream applications l...
Automated SVA Generation with LLMs | Lik Tung Fu, Qihang Wang, Shaokai Ren, Mengli Zhang, Sichao Yang, Jun Liu, Xi Wang | 2026-04-13 | Download | Functional verification remains a dominant cost in modern IC development, and SystemVerilog Assertions (SVAs) are critical for simulation-based monitoring and formal property checking.
Compiler Framework for Directional Transport in Zoned Neutral Atom Systems with AOD Assistance: A Hybrid Remote CZ Approach | Lingyi Kong, Chen Huang, Zhemin Zhang, Yidong Zhou, Xiangyu Ren, Shaochen Li, Zhiding Liang | 2026-04-13 | Download | We present a directional-transport (DT)-based remote CZ gate and compiler for zoned neutral-atom arrays that overcomes movement-bound entanglement limitations.

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents | Swanand Rao, Kiran Kashalkar, Parvathi Somashekar, Priya Krishnan | 2026-04-13 | Download | The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure.
Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO | Jonas Svedas, Nathan Laubeuf, Ryan Harvey, Arjun Singh, Changhai Man, Abubakr Nada, Tushar Krishna, James Myers, Debjyoti Bhattacharjee | 2026-04-13 | Download | Predicting the performance of large-scale distributed machine learning (ML) workloads across multiple accelerator architectures remains a central challenge in ML system design.
Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework | Ruixiang Huang, Weifan Liu | 2026-04-13 | Download | Simulating large-scale microswimmer dynamics in viscous fluid poses significant challenges due to the coupled high spatial and temporal complexity.
Predictive Bayesian Arbitration: A Scalable Noisy-OR Model with Service Criticality Awareness | Anil Jangam, Ganesh Karthick Rajendran, Roy Kantharajah | 2026-04-13 | Download | Geographically High-Available (Geo-HA) cluster systems are essential for service continuity in distributed cloud-native environments. However, traditional arbitration mechanisms, which are often predi...
GitFarm: Git as a Service for Large-Scale Monorepos | Preetam Dwivedi, Akshay Hacholli, Adam Bettigole | 2026-04-13 | Download | At the scale of Uber's monorepos, traditional Git workflows become a fundamental bottleneck. Cloning multi-gigabyte repositories, maintaining local checkouts, periodically syncing from upstream, and e...
Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics | Allison Austin, Shilpika, Yan To Linus Lam, Yun-Hsin Kuo, Venkatram Vishwanath, Michael E. Papka, Kwan-Liu Ma | 2026-04-13 | Download | In high-performance computing (HPC) environments, system monitoring data is often unlabeled and high-dimensional, making it difficult to reliably detect and understand anomalous computing nodes.
ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism | Alan Aboudib, Rodrigo Lopez Portillo A., Kalei Brady, Steffen Cruz | 2026-04-13 | Download | Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enab...
Nanvix: A Multikernel OS Design for High-Density Serverless Deployments | Carlos Segarra, Pedro Henrique Penna, Enrique Saurez, Íñigo Goiri, Peter Pietzuch, Shan Lu, Rodrigo Fonseca | 2026-04-13 | Download | Serverless providers strive for high resource utilization by optimizing deployment density: how many applications can be deployed per host server.
GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs | Lara D'Agata, Carlos Agulló-Domingo, Óscar Vera-López, Kaustubh Shivdikar, Ardhi W. B. Yudha, Ferhat Yaman, David Kaeli, José L. Abellán, Ian Colbert, José Cano | 2026-04-13 | Download | Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE prese...
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead | Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang | 2026-04-13 | Download | Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design ove...
OpenDT: Exploring Datacenter Performance and Sustainability with a Self-Calibrating Digital Twin | Radu Nicolae, Jules van der Toorn, Stavriana Kraniti, Houcen Liu, Alexandru Iosup | 2026-04-13 | Download | Datacenters are the backbone of our digital society, but raise numerous operational challenges. We envision digital twins becoming primary instruments in datacenter operations, continuously and autono...
Characterizing the Impact of Congestion in Modern HPC Interconnects | Lorenzo Piarulli, Marco Faltelli, Dirk Pleiter, Karthee Sivalingam, Dancheng Zhang, Kexue Zhao, Matteo Turisini, Francesco Iannone, Aldo Artigiani, Daniele De Sensi | 2026-04-13 | Download | High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations.
A Full Compression Pipeline for Green Federated Learning in Communication-Constrained Environments | Elouan Colybes, Shirin Salehi, Anke Schmeink | 2026-04-13 | Download | Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, thereby preserving privacy. However, FL often suffers from significant communication a...
Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search | Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon | 2026-04-13 | Download | As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge.
NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks | Chamath Wanigasooriya, Indrajith Ekanayake | 2026-04-13 | Download | Cloud native architecture is about building and running scalable microservice applications to take full advantage of the cloud environments. Managed Kubernetes is the powerhouse orchestrating cloud na...
QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting | Vinooth Kulkarni, Aaron Orenstein, Xinpeng Li, Shuai Xu, Daniel Blankenberg, Vipin Chaudhary | 2026-04-13 | Download | The quantum computing community is increasingly positioning quantum processors as accelerators within classical HPC workflows, analogous to GPUs and TPUs.
RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving | Hossein Hosseini Kasnavieh, Christopher Leckie, Adel N. Toosi | 2026-04-13 | Download | Multi-model LLM routing has emerged as an effective approach for reducing serving cost and latency while maintaining output quality by assigning each prompt to an appropriate model.

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
BLAST: Blockchain-based LLM-powered Agentic Spectrum Trading | Anas Abognah, Otman Basir | 2026-04-13 | Download | The management of radio frequency spectrum is undergoing a paradigm shift from static, centralized command-and-control models to dynamic, market-driven approaches.
A Geometric Algebra-informed NeRF Framework for Generalizable Wireless Channel Prediction | Jingzhou Shen, Luis Lago Enamorado, Shiwen Mao, Xuyu Wang | 2026-04-13 | Download | In this paper, we propose the geometric algebra-informed neural radiance fields (GAI-NeRF), a novel framework for wireless channel prediction that leverages geometric algebra attention mechanisms to c...
Network Slice Embedding over Space Division Multiplexed Elastic Optical Networks | Divya Khanure, Riti Gour, Congzhou Li, Jason P. Jue | 2026-04-13 | Download | Network slicing over space division multiplexed elastic optical networks (SDM EONs) enables efficient multiservice provisioning on a shared optical substrate.
ISAC-Enabled Non-Terrestrial Networks for 6G: Design Principles, Standardization, Performance Tradeoffs, and Use Cases | Muhammad Ali Jamshed, Rohit Singh, Malik Muhammad Saad, Aryan Kaushik, Wonjae Shin, Miguel Dajer, Alain Mourad | 2026-04-13 | Download | Non-Terrestrial Networks (NTN) have emerged as a key enabler to fully realize the vision of integrated, intelligent, and ubiquitous connectivity in 6G systems.
Security Implications of 5G Communication in Industrial Systems | Stefan Lenz, Sotiris Michaelides, Moritz Rickert, Jonas Holtwick, Martin Henze | 2026-04-13 | Download | Traditionally, industrial control systems (ICS) were designed without security in mind, prioritizing availability and real-time communication.
Programmable Packet Scheduling with Dynamic Reordering at Line Rate | Zekun Wang, Binghao Yue, Yichen Deng, Weitao Pan, Jiangyi Shi, Yue Hao | 2026-04-13 | Download | High-speed switch packet scheduling demands both line-rate performance and programmability. Existing programmable hardware scheduling models, such as PIFO and PIEO, can express a broad range of schedu...
BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection | Ammar Bhilwarawala, Likhamba Rongmei, Harsh Sharma, Arya Jena, Kaushal Singh, Jayashree Piri, Raghunath Dey | 2026-04-13 | Download | IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training prac...
RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving | Hossein Hosseini Kasnavieh, Christopher Leckie, Adel N. Toosi | 2026-04-13 | Download | Multi-model LLM routing has emerged as an effective approach for reducing serving cost and latency while maintaining output quality by assigning each prompt to an appropriate model.

cs.OS - Operating Systems

Title | Authors | Published | PDF | Abstract
ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems | Daeyeon Son | 2026-04-13 | Download | An OS kernel that runs LLM inference internally can read logit distributions before any text is generated -- and act on them as a governance primitive.
Nanvix: A Multikernel OS Design for High-Density Serverless Deployments | Carlos Segarra, Pedro Henrique Penna, Enrique Saurez, Íñigo Goiri, Peter Pietzuch, Shan Lu, Rodrigo Fonseca | 2026-04-13 | Download | Serverless providers strive for high resource utilization by optimizing deployment density: how many applications can be deployed per host server.

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs | Lara D'Agata, Carlos Agulló-Domingo, Óscar Vera-López, Kaustubh Shivdikar, Ardhi W. B. Yudha, Ferhat Yaman, David Kaeli, José L. Abellán, Ian Colbert, José Cano | 2026-04-13 | Download | Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE prese...
Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages | Vinooth Kulkarni, Jaehyun Lee, Adam Hutchings, Anas Albahri, Jai Nana, Shuai Xu, Vipin Chaudhary | 2026-04-13 | Download | Dynamic quantum circuits with mid-circuit measurement and classical feedforward are essential for near-term algorithms such as error mitigation, adaptive phase estimation, and Variational Quantum Eige...
Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200 | Aditya Ujeniya, Jan Eitzinger, Georg Hager, Gerhard Wellein | 2026-04-13 | Download | Modern NVIDIA GPUs like the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth.
Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search | Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon | 2026-04-13 | Download | As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge.
LCS.jl: A High-Performance, Multi-Platform Computational Model in Julia for Turbulent Particle-Laden Flows | Taketo Tominaga, Ryo Onishi | 2026-04-13 | Download | Multiphase turbulent flow phenomena are observed not only in industrial devices but also in environmental flows, and direct numerical simulation (DNS) plays a key role in their investigation.