2026-02-10

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A	Aaron Jarmusch, Connor Vitz, Sunita Chandrasekaran	2026-02-10	Download	The AMD MI300A APU integrates CDNA3 GPUs with high-bandwidth memory and advanced accelerator features: FP8 matrix cores, asynchronous compute engines (ACE), and 2:4 structured sparsity.
Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching	Hanyuan Gao, Xiaoxuan Yang	2026-02-10	Download	Mixture-of-Experts (MoE) layers activate a subset of model weights, dubbed experts, to improve model performance. MoE is particularly promising for deployment on process-in-memory (PIM) architectures,...
ACE-RTL: When Agentic Context Evolution Meets RTL-Specialized LLMs	Chenhui Deng, Zhongzhi Yu, Guan-Ting Liu, Nathaniel Pinckney, Brucek Khailany, Haoxing Ren	2026-02-10	Download	Recent advances in LLMs have sparked growing interest in applying them to hardware design automation, particularly for accurate RTL code generation.
KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware	Jiayi Nie, Haoran Wu, Yao Lai, Zeyu Cao, Cheng Zhang, Binglei Lou, Erwei Wang, Jianyi Cheng, Timothy M. Jones, Robert Mullins, Rika Antonova, Yiren Zhao	2026-02-10	Download	New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels -- a time-consuming, laborious, and error-prone process that cannot sca...
Development of an Energy-Efficient and Real-Time Data Movement Strategy for Next-Generation Heterogeneous Mixed-Criticality Systems	Thomas Benz	2026-02-10	Download	Industrial domains such as automotive, robotics, and aerospace are rapidly evolving to satisfy the increasing demand for machine-learning-driven Autonomy, Connectivity, Electrification, and Shared mob...
AnalogToBi: Device-Level Analog Circuit Topology Generation via Bipartite Graph and Grammar Guided Decoding	Seungmin Kim, Mingun Kim, Yuna Lee, Yulhwa Kim	2026-02-10	Download	Automatic generation of device-level analog circuit topologies remains a fundamental challenge in analog design automation. Recent transformer-based approaches have shown promise, yet they often suffe...
SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation	Mu-Chi Chen, Yu-Hung Kao, Po-Hsuan Huang, Shao-Chun Ho, Hsiang-Yu Tsou, I-Ting Wu, En-Ming Huang, Yu-Kai Hung, Wei-Po Hsin, Cheng Liang, Chia-Heng Tu, Shih-Hao Hung, H. T. Kung	2026-02-10	Download	Large language models (LLMs) have recently emerged as a promising approach for automating Verilog code generation; however, existing methods primarily emphasize syntactic correctness and often rely on...
Accelerating Post-Quantum Cryptography via LLM-Driven Hardware-Software Co-Design	Yuchao Liao, Tosiron Adegbija, Roman Lysecky	2026-02-10	Download	Post-quantum cryptography (PQC) is crucial for securing data against emerging quantum threats. However, its algorithms are computationally complex and difficult to implement efficiently on hardware.
FHECore: Rethinking GPU Microarchitecture for Fully Homomorphic Encryption	Lohit Daksha, Seyda Guzelhan, Kaustubh Shivdikar, Carlos Agulló Domingo, Óscar Vera Lopez, Gilbert Jonatan, Hubert Dymarkowski, Aymane El Jerari, José Cano, José L. Abellán, John Kim, David Kaeli, Ajay Joshi	2026-02-10	Download	Fully Homomorphic Encryption (FHE) enables computation directly on encrypted data but incurs massive computational and memory overheads, often exceeding plaintext execution by several orders of magnit...
CktEvo: Repository-Level RTL Code Benchmark for Design Evolution	Zhengyuan Shi, Jingxin Wang, Tairan Cheng, Changran Xu, Weikang Qian, Qiang Xu	2026-02-10	Download	Register-Transfer Level (RTL) coding is an iterative, repository-scale process in which Power, Performance, and Area (PPA) emerge from interactions across many files and the downstream toolchain.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Flash-SD-KDE: Accelerating SD-KDE with Tensor Cores	Elliot L. Epstein, Rajat Vadiraj Dwaraknath, John Winnicki	2026-02-10	Download	Score-debiased kernel density estimation (SD-KDE) achieves improved asymptotic convergence rates over classical KDE, but its use of an empirical score has made it significantly slower in practice.
Implementability of Global Distributed Protocols modulo Network Architectures	Elaine Li, Thomas Wies	2026-02-10	Download	Global protocols specify distributed, message-passing protocols from a birds-eye view, and are used as a specification for synthesizing local implementations.
Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A	Aaron Jarmusch, Connor Vitz, Sunita Chandrasekaran	2026-02-10	Download	The AMD MI300A APU integrates CDNA3 GPUs with high-bandwidth memory and advanced accelerator features: FP8 matrix cores, asynchronous compute engines (ACE), and 2:4 structured sparsity.
KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis	Mayur Akewar, Sandeep Madireddy, Dongsheng Luo, Janki Bhimani	2026-02-10	Download	Solid State Drives (SSDs) are critical to datacenters, consumer platforms, and mission-critical systems. Yet diagnosing their performance and reliability is difficult because data are fragmented and t...
Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?	Taeyoon Kim, Woohyeok Park, Hoyeong Yun, Kyungyong Lee	2026-02-10	Download	Failures in large-scale cloud systems incur substantial financial losses, making automated Root Cause Analysis (RCA) essential for operational stability.
Efficient Remote Prefix Fetching with GPU-native Media ASICs	Liang Mi, Weijun Wang, Jinghan Chen, Ting Cao, Haipeng Dai, Yunxin Liu	2026-02-10	Download	Remote KV cache reuse fetches KV cache for identical contexts from remote storage, avoiding recomputation, accelerating LLM inference. While it excels in high-speed networks, its performance degrades ...
Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems	Guowei Liu, Hongming Li, Yaning Guo, Yongxi Lyu, Mo Zhou, Yi Liu, Zhaogeng Li, Yanpeng Wang	2026-02-10	Download	Deploying large-scale MoE models presents challenges in memory capacity and bandwidth for expert activation. While Attention-FFN Disaggregation (AFD) has emerged as a potential architecture to decoupl...
High-performance Vector-length Agnostic Quantum Circuit Simulations on ARM Processors	Ruimin Shi, Gabin Schieffer, Pei-Hung Lin, Maya Gokhale, Andreas Herten, Ivy Peng	2026-02-10	Download	ARM SVE and RISC-V RVV are emerging vector architectures in high-end processors that support vectorization of flexible vector length. In this work, we leverage an important workload for quantum comput...
Rashomon Sets and Model Multiplicity in Federated Learning	Xenia Heilmann, Luca Corbucci, Mattia Cerrato	2026-02-10	Download	The Rashomon set captures the collection of models that achieve near-identical empirical performance yet may differ substantially in their decision boundaries.
It's not a lie if you don't get caught: simplifying reconfiguration in SMR through dirty logs	Allen Clement, Natacha Crooks, Neil Giridharan, Alex Shamis	2026-02-10	Download	Production state-machine replication (SMR) implementations are complex, multi-layered architectures comprising data dissemination, ordering, execution, and reconfiguration components.
The Coordination Criterion	Joseph M. Hellerstein	2026-02-10	Download	When is coordination intrinsically required by a distributed specification, rather than imposed by a particular protocol or implementation strategy? We give a general answer using minimal assumptions.
Architectural Foundations for Checkpointing and Restoration in Quantum HPC Systems	Qiang Guan, Qinglei Cao, Xiaoyi Lu, Siyuan Niu	2026-02-10	Download	In this work, we explore the design of the checkpointing and restoration for quantum HPC that leverages dynamic circuit technology to enable restartable and resilient quantum execution.
LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms	Jie Kong, Wei Wang, Jiehan Zhou, Chen Yu	2026-02-10	Download	Major challenges in LLMs inference remain frequent memory bandwidth bottlenecks, computational redundancy, and inefficiencies in long-sequence processing.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Bring Your Own Objective: Inter-operability of Network Objectives in Datacenters	Sanjoli Narang, Anup Agarwal, Venkat Arun, Manya Ghobadi	2026-02-10	Download	Datacenter networks are currently locked in a "tyranny of the single objective". While modern workloads demand diverse performance goals, ranging from coflow completion times, per-flow fairness, short...
Resilient Topology-Aware Coordination for Dynamic 3D UAV Networks under Node Failure	Chuan-Chi Lai	2026-02-10	Download	Ensuring continuous service coverage under unexpected hardware failures is a fundamental challenge for 3D Aerial-Ground Integrated Networks. Although Multi-Agent Reinforcement Learning facilitates aut...
ORCHID: Fairness-Aware Orchestration in Mission-Critical Air-Ground Integrated Networks	Chuan-Chi Lai, Chi Jai Choy	2026-02-10	Download	In the era of 6G Air-Ground Integrated Networks (AGINs), Unmanned Aerial Vehicles (UAVs) are pivotal for providing on-demand wireless coverage in mission-critical environments, such as post-disaster r...
SCOPE: A Training-Free Online 3D Deployment for UAV-BSs with Theoretical Analysis and Comparative Study	Chuan-Chi Lai	2026-02-10	Download	Unmanned Aerial Vehicle (UAV)-mounted Base Stations (UAV-BSs) offer a flexible solution for serving ground users in temporary hotspot scenarios.
Tracing Data Packet Paths over the Internet using Traceroute	Thomas Dreibholz, Somnath Mazumdar	2026-02-10	Download	Network communication using the Internet Protocol (IP) is a pillar of modern Internet applications. IP allows data packets to travel the world through a complex set of interconnected computer networks...
Hybrid Responsible AI-Stochastic Approach for SLA Compliance in Multivendor 6G Networks	Emanuel Figetakis, Ahmed Refaey Hussein	2026-02-10	Download	The convergence of AI and 6G network automation introduces new challenges in maintaining transparency, fairness, and accountability across multivendor management systems.
6G NTN Waveforms: A Comparison of OTFS, AFDM and OCDM in LEO Satellite Channels	Baidyanath Mandal, Aniruddha Chandra, Rastislav Roka, Jarosław Wojtun, Jan Kelner, Cezary Ziołkowski	2026-02-10	Download	Sixth generation (6G) physical layer (PHY) is evolving beyond the legacy orthogonal frequency division multiplexing (OFDM)-based waveforms. In this paper, we compare the bit error rate (BER) performan...
Optimally Deployed Multistatic OTFS-ISAC Design With Kalman-Based Tracking of Targets	Jyotsna Rani, Kuntal Deka, Ganesh Prasad, Zilong Liu	2026-02-10	Download	This paper studies orthogonal time-frequency space (OTFS) modulation aided multistatic integrated sensing and communication (ISAC) in vehicular networks, whereby its delay-Doppler robustness is exploi...
Semantic Waveforms for AI-Native 6G Networks	Nour Hello, Mohamed Amine Hamoura, Francois Rivet, Emilio Calvanese Strinati	2026-02-10	Download	In this paper, we propose a semantic-aware waveform design framework for AI-native 6G networks that jointly optimizes physical layer resource usage and semantic communication efficiency and robustness...
ISO FastLane: Faster ISO 11783 with Dual Stack Approach as a Short Term Solution	Timo Oksanen	2026-02-10	Download	The agricultural industry has been searching for a high-speed successor to the 250 kbit/s CAN bus backbone of ISO 11783 (ISOBUS) for over a decade, yet no protocol-level solution has reached standardi...
Fidelity-Age-Aware Scheduling in Quantum Repeater Networks	Ozgur Ercetin, Zafer Gedik	2026-02-10	Download	Quantum repeater networks distribute entanglement over long distances but must balance fidelity, delay, and resource contention. Prior work optimized throughput and end-to-end fidelity, yet little att...
QoS Identifier and Slice Mapping in 5G and Non-Terrestrial Network Interconnected Systems	Yuma Abe, Mariko Sekiguchi, Amane Miura	2026-02-10	Download	The interconnection of 5G and non-terrestrial networks (NTNs) has been actively studied to expand connectivity beyond conventional terrestrial infrastructure.
XLB: A High Performance Layer-7 Load Balancer for Microservices using eBPF-based In-kernel Interposition	Yuejie Wang, Chenchen Shou, Jiaxu Qian, Guyue Liu	2026-02-10	Download	L7 load balancers are a fundamental building block in microservices as they enable fine-grained traffic distribution. Compared to monolithic applications, microservices demand higher performance and s...
Resilient and Freshness-Aware Scheduling for Industrial Multi-Hop IAB Networks: A Packet Duplication Approach	Shuo Zhu, Siyu Lin, Zijing Wang, Qiao Ren, Xiaoheng Deng, Bo Ai	2026-02-10	Download	In industrial millimeter-wave (mmWave) multi-hop Integrated Access and Backhaul (IAB) networks, dynamic blockages caused by moving obstacles pose a severe threat to robust and continuous networks.
MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift	Yunpeng Tan, Qingyang Li, Mingxin Yang, Yannan Hu, Lei Zhang, Xinggong Zhang	2026-02-10	Download	Encryption has been commonly used in network traffic to secure transmission, but it also brings challenges for malicious traffic detection, due to the invisibility of the packet payload.
XMap: Fast Internet-wide IPv4 and IPv6 Network Scanner	Xiang Li, Zixuan Xie, Lu Sun, Yuqi Qiu, Zuyao Xu, Zheli Liu	2026-02-10	Download	XMap is an open-source network scanner designed for performing fast Internet-wide IPv4 and IPv6 network research scanning. XMap was initially developed as the research artifact of a paper published at...
Cooperative Edge Caching with Large Language Model in Wireless Networks	Ning Yang, Wentao Wang, Lingtao Ouyang, Haijun Zhang	2026-02-10	Download	Cooperative edge caching in overlapping zones couples Base Station (BS) decisions, making content replacement sensitive to spatial topology and temporal reuse.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
AgentCgroup: Understanding and Controlling OS Resources of AI Agents	Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, Andi Quinn	2026-02-10	Download	AI agents are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls within sandboxed containers, each call with distinct resource demands and rapid fluctuatio...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
XLB: A High Performance Layer-7 Load Balancer for Microservices using eBPF-based In-kernel Interposition	Yuejie Wang, Chenchen Shou, Jiaxu Qian, Guyue Liu	2026-02-10	Download	L7 load balancers are a fundamental building block in microservices as they enable fine-grained traffic distribution. Compared to monolithic applications, microservices demand higher performance and s...