2026-04-13
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents | Swanand Rao, Kiran Kashalkar, Parvathi Somashekar, Priya Krishnan | 2026-04-13 | Download | The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure. |
| Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores | Yixian Shen, Chaoyao Shen, Jan Deen, George Floros, Andy Pimentel, Anuj Pathania | 2026-04-13 | Download | Large Foundation Model (LFM) inference is both memory- and compute-intensive, traditionally relying on GPUs. However, the limited availability and high cost have motivated the adoption of high-perform... |
| CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead | Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang | 2026-04-13 | Download | Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design ove... |
| EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models | Jinane Bazzi, Mariam Rakka, Fadi Kurdahi, Mohammed E. Fouda, Ahmed Eltawil | 2026-04-13 | Download | The growing demand for deploying Small Language Models (SLMs) on edge devices, including laptops, smartphones, and embedded platforms, has exposed fundamental inefficiencies in existing accelerators. |
| CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction | Hector R. Rodriguez, Jiechen Huang, Wenjian Yu | 2026-04-13 | Download | We present CapBench, a fully reproducible, multi-PDK dataset for capacitance extraction. The dataset is derived from open-source designs, including single-core CPUs, systems-on-chip, and media acceler... |
| Technology solutions targeting the performance of gen-AI inference in resource constrained platforms | Joyjit Kundu, Joshua Klein, Aakash Patel, Dwaipayan Biswas | 2026-04-13 | Download | The rise of generative AI workloads, particularly language model inference, is intensifying on/off-chip memory pressure. Multimodal inputs such as video streams or images and downstream applications l... |
| Automated SVA Generation with LLMs | Lik Tung Fu, Qihang Wang, Shaokai Ren, Mengli Zhang, Sichao Yang, Jun Liu, Xi Wang | 2026-04-13 | Download | Functional verification remains a dominant cost in modern IC development, and SystemVerilog Assertions (SVAs) are critical for simulation-based monitoring and formal property checking. |
| Compiler Framework for Directional Transport in Zoned Neutral Atom Systems with AOD Assistance: A Hybrid Remote CZ Approach | Lingyi Kong, Chen Huang, Zhemin Zhang, Yidong Zhou, Xiangyu Ren, Shaochen Li, Zhiding Liang | 2026-04-13 | Download | We present a directional-transport (DT)-based remote CZ gate and compiler for zoned neutral-atom arrays that overcomes movement-bound entanglement limitations. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents | Swanand Rao, Kiran Kashalkar, Parvathi Somashekar, Priya Krishnan | 2026-04-13 | Download | The transition from stateless model inference to stateful agentic execution is reshaping the systems assumptions underlying modern AI infrastructure. |
| Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO | Jonas Svedas, Nathan Laubeuf, Ryan Harvey, Arjun Singh, Changhai Man, Abubakr Nada, Tushar Krishna, James Myers, Debjyoti Bhattacharjee | 2026-04-13 | Download | Predicting the performance of large-scale distributed machine learning (ML) workloads across multiple accelerator architectures remains a central challenge in ML system design. |
| Accelerating Microswimmer Simulations via a Heterogeneous Pipelined Parallel-in-Time Framework | Ruixiang Huang, Weifan Liu | 2026-04-13 | Download | Simulating large-scale microswimmer dynamics in viscous fluid poses significant challenges due to the coupled high spatial and temporal complexity. |
| Predictive Bayesian Arbitration: A Scalable Noisy-OR Model with Service Criticality Awareness | Anil Jangam, Ganesh Karthick Rajendran, Roy Kantharajah | 2026-04-13 | Download | Geographically High-Available (Geo-HA) cluster systems are essential for service continuity in distributed cloud-native environments. However, traditional arbitration mechanisms, which are often predi... |
| GitFarm: Git as a Service for Large-Scale Monorepos | Preetam Dwivedi, Akshay Hacholli, Adam Bettigole | 2026-04-13 | Download | At the scale of Uber's monorepos, traditional Git workflows become a fundamental bottleneck. Cloning multi-gigabyte repositories, maintaining local checkouts, periodically syncing from upstream, and e... |
| Understanding Large-Scale HPC System Behavior Through Cluster-Based Visual Analytics | Allison Austin, Shilpika, Yan To Linus Lam, Yun-Hsin Kuo, Venkatram Vishwanath, Michael E. Papka, Kwan-Liu Ma | 2026-04-13 | Download | In high-performance computing (HPC) environments, system monitoring data is often unlabeled and high-dimensional, making it difficult to reliably detect and understand anomalous computing nodes. |
| ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism | Alan Aboudib, Rodrigo Lopez Portillo A., Kalei Brady, Steffen Cruz | 2026-04-13 | Download | Unlocking large-scale low-bandwidth decentralized training has the potential to utilize otherwise untapped compute resources. In centralized settings, large-scale multi-node training is primarily enab... |
| Nanvix: A Multikernel OS Design for High-Density Serverless Deployments | Carlos Segarra, Pedro Henrique Penna, Enrique Saurez, Íñigo Goiri, Peter Pietzuch, Shan Lu, Rodrigo Fonseca | 2026-04-13 | Download | Serverless providers strive for high resource utilization by optimizing deployment density: how many applications can be deployed per host server. |
| GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs | Lara D'Agata, Carlos Agulló-Domingo, Óscar Vera-López, Kaustubh Shivdikar, Ardhi W. B. Yudha, Ferhat Yaman, David Kaeli, José L. Abellán, Ian Colbert, José Cano | 2026-04-13 | Download | Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE prese... |
| CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead | Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang | 2026-04-13 | Download | Matrix extensions have emerged as an essential feature in modern CPUs to address the surging demands of AI workloads. However, existing designs often incur substantial hardware and software design ove... |
| OpenDT: Exploring Datacenter Performance and Sustainability with a Self-Calibrating Digital Twin | Radu Nicolae, Jules van der Toorn, Stavriana Kraniti, Houcen Liu, Alexandru Iosup | 2026-04-13 | Download | Datacenters are the backbone of our digital society, but raise numerous operational challenges. We envision digital twins becoming primary instruments in datacenter operations, continuously and autono... |
| Characterizing the Impact of Congestion in Modern HPC Interconnects | Lorenzo Piarulli, Marco Faltelli, Dirk Pleiter, Karthee Sivalingam, Dancheng Zhang, Kexue Zhao, Matteo Turisini, Francesco Iannone, Aldo Artigiani, Daniele De Sensi | 2026-04-13 | Download | High-performance computing (HPC) systems increasingly support both scalable AI training and large-scale simulation workloads. Both typically rely heavily on collective communication operations. |
| A Full Compression Pipeline for Green Federated Learning in Communication-Constrained Environments | Elouan Colybes, Shirin Salehi, Anke Schmeink | 2026-04-13 | Download | Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, thereby preserving privacy. However, FL often suffers from significant communication a... |
| Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search | Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon | 2026-04-13 | Download | As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge. |
| NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks | Chamath Wanigasooriya, Indrajith Ekanayake | 2026-04-13 | Download | Cloud native architecture is about building and running scalable microservice applications to take full advantage of the cloud environments. Managed Kubernetes is the powerhouse orchestrating cloud na... |
| QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting | Vinooth Kulkarni, Aaron Orenstein, Xinpeng Li, Shuai Xu, Daniel Blankenberg, Vipin Chaudhary | 2026-04-13 | Download | The quantum computing community is increasingly positioning quantum processors as accelerators within classical HPC workflows, analogous to GPUs and TPUs. |
| RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving | Hossein Hosseini Kasnavieh, Christopher Leckie, Adel N. Toosi | 2026-04-13 | Download | Multi-model LLM routing has emerged as an effective approach for reducing serving cost and latency while maintaining output quality by assigning each prompt to an appropriate model. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| BLAST: Blockchain-based LLM-powered Agentic Spectrum Trading | Anas Abognah, Otman Basir | 2026-04-13 | Download | The management of radio frequency spectrum is undergoing a paradigm shift from static, centralized command-and-control models to dynamic, market-driven approaches. |
| A Geometric Algebra-informed NeRF Framework for Generalizable Wireless Channel Prediction | Jingzhou Shen, Luis Lago Enamorado, Shiwen Mao, Xuyu Wang | 2026-04-13 | Download | In this paper, we propose the geometric algebra-informed neural radiance fields (GAI-NeRF), a novel framework for wireless channel prediction that leverages geometric algebra attention mechanisms to c... |
| Network Slice Embedding over Space Division Multiplexed Elastic Optical Networks | Divya Khanure, Riti Gour, Congzhou Li, Jason P. Jue | 2026-04-13 | Download | Network slicing over space division multiplexed elastic optical networks (SDM EONs) enables efficient multiservice provisioning on a shared optical substrate. |
| ISAC-Enabled Non-Terrestrial Networks for 6G: Design Principles, Standardization, Performance Tradeoffs, and Use Cases | Muhammad Ali Jamshed, Rohit Singh, Malik Muhammad Saad, Aryan Kaushik, Wonjae Shin, Miguel Dajer, Alain Mourad | 2026-04-13 | Download | Non-Terrestrial Networks (NTN) have emerged as a key enabler to fully realize the vision of integrated, intelligent, and ubiquitous connectivity in 6G systems. |
| Security Implications of 5G Communication in Industrial Systems | Stefan Lenz, Sotiris Michaelides, Moritz Rickert, Jonas Holtwick, Martin Henze | 2026-04-13 | Download | Traditionally, industrial control systems (ICS) were designed without security in mind, prioritizing availability and real-time communication. |
| Programmable Packet Scheduling with Dynamic Reordering at Line Rate | Zekun Wang, Binghao Yue, Yichen Deng, Weitao Pan, Jiangyi Shi, Yue Hao | 2026-04-13 | Download | High-speed switch packet scheduling demands both line-rate performance and programmability. Existing programmable hardware scheduling models, such as PIFO and PIEO, can express a broad range of schedu... |
| BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection | Ammar Bhilwarawala, Likhamba Rongmei, Harsh Sharma, Arya Jena, Kaushal Singh, Jayashree Piri, Raghunath Dey | 2026-04-13 | Download | IoT botnet detection has advanced, yet most published systems are validated on a single dataset and rarely generalise across environments. Heterogeneous feature spaces make multi-dataset training prac... |
| RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving | Hossein Hosseini Kasnavieh, Christopher Leckie, Adel N. Toosi | 2026-04-13 | Download | Multi-model LLM routing has emerged as an effective approach for reducing serving cost and latency while maintaining output quality by assigning each prompt to an appropriate model. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| ProbeLogits: Kernel-Level LLM Inference Primitives for AI-Native Operating Systems | Daeyeon Son | 2026-04-13 | Download | An OS kernel that runs LLM inference internally can read logit distributions before any text is generated -- and act on them as a governance primitive. |
| Nanvix: A Multikernel OS Design for High-Density Serverless Deployments | Carlos Segarra, Pedro Henrique Penna, Enrique Saurez, Íñigo Goiri, Peter Pietzuch, Shan Lu, Rodrigo Fonseca | 2026-04-13 | Download | Serverless providers strive for high resource utilization by optimizing deployment density: how many applications can be deployed per host server. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| GPU Acceleration of Sparse Fully Homomorphic Encrypted DNNs | Lara D'Agata, Carlos Agulló-Domingo, Óscar Vera-López, Kaustubh Shivdikar, Ardhi W. B. Yudha, Ferhat Yaman, David Kaeli, José L. Abellán, Ian Colbert, José Cano | 2026-04-13 | Download | Fully homomorphic encryption (FHE) has recently attracted significant attention as both a cryptographic primitive and a systems challenge. Given the latest advances in accelerated computing, FHE prese... |
| Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDA-Q: Performance and Expressiveness Advantages | Vinooth Kulkarni, Jaehyun Lee, Adam Hutchings, Anas Albahri, Jai Nana, Shuai Xu, Vipin Chaudhary | 2026-04-13 | Download | Dynamic quantum circuits with mid-circuit measurement and classical feedforward are essential for near-term algorithms such as error mitigation, adaptive phase estimation, and Variational Quantum Eige... |
| Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200 | Aditya Ujeniya, Jan Eitzinger, Georg Hager, Gerhard Wellein | 2026-04-13 | Download | Modern NVIDIA GPUs like the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. |
| Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search | Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon | 2026-04-13 | Download | As high-performance computing and AI workloads become increasingly dependent on GPUs, maintaining high performance across rapidly evolving hardware generations has become a major challenge. |
| LCS.jl: A High-Performance, Multi-Platform Computational Model in Julia for Turbulent Particle-Laden Flows | Taketo Tominaga, Ryo Onishi | 2026-04-13 | Download | Multiphase turbulent flow phenomena are observed not only in industrial devices but also in environmental flows, and direct numerical simulation (DNS) plays a key role in their investigation. |