
2026-02-10

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A | Aaron Jarmusch, Connor Vitz, Sunita Chandrasekaran | 2026-02-10 | Download | The AMD MI300A APU integrates CDNA3 GPUs with high-bandwidth memory and advanced accelerator features: FP8 matrix cores, asynchronous compute engines (ACE), and 2:4 structured sparsity. |
| Area-Efficient In-Memory Computing for Mixture-of-Experts via Multiplexing and Caching | Hanyuan Gao, Xiaoxuan Yang | 2026-02-10 | Download | Mixture-of-Experts (MoE) layers activate a subset of model weights, dubbed experts, to improve model performance. MoE is particularly promising for deployment on process-in-memory (PIM) architectures,... |
| ACE-RTL: When Agentic Context Evolution Meets RTL-Specialized LLMs | Chenhui Deng, Zhongzhi Yu, Guan-Ting Liu, Nathaniel Pinckney, Brucek Khailany, Haoxing Ren | 2026-02-10 | Download | Recent advances in LLMs have sparked growing interest in applying them to hardware design automation, particularly for accurate RTL code generation. |
| KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware | Jiayi Nie, Haoran Wu, Yao Lai, Zeyu Cao, Cheng Zhang, Binglei Lou, Erwei Wang, Jianyi Cheng, Timothy M. Jones, Robert Mullins, Rika Antonova, Yiren Zhao | 2026-02-10 | Download | New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels -- a time-consuming, laborious, and error-prone process that cannot sca... |
| Development of an Energy-Efficient and Real-Time Data Movement Strategy for Next-Generation Heterogeneous Mixed-Criticality Systems | Thomas Benz | 2026-02-10 | Download | Industrial domains such as automotive, robotics, and aerospace are rapidly evolving to satisfy the increasing demand for machine-learning-driven Autonomy, Connectivity, Electrification, and Shared mob... |
| AnalogToBi: Device-Level Analog Circuit Topology Generation via Bipartite Graph and Grammar Guided Decoding | Seungmin Kim, Mingun Kim, Yuna Lee, Yulhwa Kim | 2026-02-10 | Download | Automatic generation of device-level analog circuit topologies remains a fundamental challenge in analog design automation. Recent transformer-based approaches have shown promise, yet they often suffe... |
| SiliconMind-V1: Multi-Agent Distillation and Debug-Reasoning Workflows for Verilog Code Generation | Mu-Chi Chen, Yu-Hung Kao, Po-Hsuan Huang, Shao-Chun Ho, Hsiang-Yu Tsou, I-Ting Wu, En-Ming Huang, Yu-Kai Hung, Wei-Po Hsin, Cheng Liang, Chia-Heng Tu, Shih-Hao Hung, H. T. Kung | 2026-02-10 | Download | Large language models (LLMs) have recently emerged as a promising approach for automating Verilog code generation; however, existing methods primarily emphasize syntactic correctness and often rely on... |
| Accelerating Post-Quantum Cryptography via LLM-Driven Hardware-Software Co-Design | Yuchao Liao, Tosiron Adegbija, Roman Lysecky | 2026-02-10 | Download | Post-quantum cryptography (PQC) is crucial for securing data against emerging quantum threats. However, its algorithms are computationally complex and difficult to implement efficiently on hardware. |
| FHECore: Rethinking GPU Microarchitecture for Fully Homomorphic Encryption | Lohit Daksha, Seyda Guzelhan, Kaustubh Shivdikar, Carlos Agulló Domingo, Óscar Vera Lopez, Gilbert Jonatan, Hubert Dymarkowski, Aymane El Jerari, José Cano, José L. Abellán, John Kim, David Kaeli, Ajay Joshi | 2026-02-10 | Download | Fully Homomorphic Encryption (FHE) enables computation directly on encrypted data but incurs massive computational and memory overheads, often exceeding plaintext execution by several orders of magnit... |
| CktEvo: Repository-Level RTL Code Benchmark for Design Evolution | Zhengyuan Shi, Jingxin Wang, Tairan Cheng, Changran Xu, Weikang Qian, Qiang Xu | 2026-02-10 | Download | Register-Transfer Level (RTL) coding is an iterative, repository-scale process in which Power, Performance, and Area (PPA) emerge from interactions across many files and the downstream toolchain. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Flash-SD-KDE: Accelerating SD-KDE with Tensor Cores | Elliot L. Epstein, Rajat Vadiraj Dwaraknath, John Winnicki | 2026-02-10 | Download | Score-debiased kernel density estimation (SD-KDE) achieves improved asymptotic convergence rates over classical KDE, but its use of an empirical score has made it significantly slower in practice. |
| Implementability of Global Distributed Protocols modulo Network Architectures | Elaine Li, Thomas Wies | 2026-02-10 | Download | Global protocols specify distributed, message-passing protocols from a birds-eye view, and are used as a specification for synthesizing local implementations. |
| Execution-Centric Characterization of FP8 Matrix Cores, Asynchronous Execution, and Structured Sparsity on AMD MI300A | Aaron Jarmusch, Connor Vitz, Sunita Chandrasekaran | 2026-02-10 | Download | The AMD MI300A APU integrates CDNA3 GPUs with high-bandwidth memory and advanced accelerator features: FP8 matrix cores, asynchronous compute engines (ACE), and 2:4 structured sparsity. |
| KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis | Mayur Akewar, Sandeep Madireddy, Dongsheng Luo, Janki Bhimani | 2026-02-10 | Download | Solid State Drives (SSDs) are critical to datacenters, consumer platforms, and mission-critical systems. Yet diagnosing their performance and reliability is difficult because data are fragmented and t... |
| Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis? | Taeyoon Kim, Woohyeok Park, Hoyeong Yun, Kyungyong Lee | 2026-02-10 | Download | Failures in large-scale cloud systems incur substantial financial losses, making automated Root Cause Analysis (RCA) essential for operational stability. |
| Efficient Remote Prefix Fetching with GPU-native Media ASICs | Liang Mi, Weijun Wang, Jinghan Chen, Ting Cao, Haipeng Dai, Yunxin Liu | 2026-02-10 | Download | Remote KV cache reuse fetches KV cache for identical contexts from remote storage, avoiding recomputation, accelerating LLM inference. While it excels in high-speed networks, its performance degrades ... |
| Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems | Guowei Liu, Hongming Li, Yaning Guo, Yongxi Lyu, Mo Zhou, Yi Liu, Zhaogeng Li, Yanpeng Wang | 2026-02-10 | Download | Deploying large-scale MoE models presents challenges in memory capacity and bandwidth for expert activation. While Attention-FFN Disaggregation (AFD) has emerged as a potential architecture to decoupl... |
| High-performance Vector-length Agnostic Quantum Circuit Simulations on ARM Processors | Ruimin Shi, Gabin Schieffer, Pei-Hung Lin, Maya Gokhale, Andreas Herten, Ivy Peng | 2026-02-10 | Download | ARM SVE and RISC-V RVV are emerging vector architectures in high-end processors that support vectorization of flexible vector length. In this work, we leverage an important workload for quantum comput... |
| Rashomon Sets and Model Multiplicity in Federated Learning | Xenia Heilmann, Luca Corbucci, Mattia Cerrato | 2026-02-10 | Download | The Rashomon set captures the collection of models that achieve near-identical empirical performance yet may differ substantially in their decision boundaries. |
| It's not a lie if you don't get caught: simplifying reconfiguration in SMR through dirty logs | Allen Clement, Natacha Crooks, Neil Giridharan, Alex Shamis | 2026-02-10 | Download | Production state-machine replication (SMR) implementations are complex, multi-layered architectures comprising data dissemination, ordering, execution, and reconfiguration components. |
| The Coordination Criterion | Joseph M. Hellerstein | 2026-02-10 | Download | When is coordination intrinsically required by a distributed specification, rather than imposed by a particular protocol or implementation strategy? We give a general answer using minimal assumptions. |
| Architectural Foundations for Checkpointing and Restoration in Quantum HPC Systems | Qiang Guan, Qinglei Cao, Xiaoyi Lu, Siyuan Niu | 2026-02-10 | Download | In this work, we explore the design of the checkpointing and restoration for quantum HPC that leverages dynamic circuit technology to enable restartable and resilient quantum execution. |
| LLM-CoOpt: A Co-Design and Optimization Framework for Efficient LLM Inference on Heterogeneous Platforms | Jie Kong, Wei Wang, Jiehan Zhou, Chen Yu | 2026-02-10 | Download | Major challenges in LLMs inference remain frequent memory bandwidth bottlenecks, computational redundancy, and inefficiencies in long-sequence processing. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Bring Your Own Objective: Inter-operability of Network Objectives in Datacenters | Sanjoli Narang, Anup Agarwal, Venkat Arun, Manya Ghobadi | 2026-02-10 | Download | Datacenter networks are currently locked in a "tyranny of the single objective". While modern workloads demand diverse performance goals, ranging from coflow completion times, per-flow fairness, short... |
| Resilient Topology-Aware Coordination for Dynamic 3D UAV Networks under Node Failure | Chuan-Chi Lai | 2026-02-10 | Download | Ensuring continuous service coverage under unexpected hardware failures is a fundamental challenge for 3D Aerial-Ground Integrated Networks. Although Multi-Agent Reinforcement Learning facilitates aut... |
| ORCHID: Fairness-Aware Orchestration in Mission-Critical Air-Ground Integrated Networks | Chuan-Chi Lai, Chi Jai Choy | 2026-02-10 | Download | In the era of 6G Air-Ground Integrated Networks (AGINs), Unmanned Aerial Vehicles (UAVs) are pivotal for providing on-demand wireless coverage in mission-critical environments, such as post-disaster r... |
| SCOPE: A Training-Free Online 3D Deployment for UAV-BSs with Theoretical Analysis and Comparative Study | Chuan-Chi Lai | 2026-02-10 | Download | Unmanned Aerial Vehicle (UAV)-mounted Base Stations (UAV-BSs) offer a flexible solution for serving ground users in temporary hotspot scenarios. |
| Tracing Data Packet Paths over the Internet using Traceroute | Thomas Dreibholz, Somnath Mazumdar | 2026-02-10 | Download | Network communication using the Internet Protocol (IP) is a pillar of modern Internet applications. IP allows data packets to travel the world through a complex set of interconnected computer networks... |
| Hybrid Responsible AI-Stochastic Approach for SLA Compliance in Multivendor 6G Networks | Emanuel Figetakis, Ahmed Refaey Hussein | 2026-02-10 | Download | The convergence of AI and 6G network automation introduces new challenges in maintaining transparency, fairness, and accountability across multivendor management systems. |
| 6G NTN Waveforms: A Comparison of OTFS, AFDM and OCDM in LEO Satellite Channels | Baidyanath Mandal, Aniruddha Chandra, Rastislav Roka, Jarosław Wojtun, Jan Kelner, Cezary Ziołkowski | 2026-02-10 | Download | Sixth generation (6G) physical layer (PHY) is evolving beyond the legacy orthogonal frequency division multiplexing (OFDM)-based waveforms. In this paper, we compare the bit error rate (BER) performan... |
| Optimally Deployed Multistatic OTFS-ISAC Design With Kalman-Based Tracking of Targets | Jyotsna Rani, Kuntal Deka, Ganesh Prasad, Zilong Liu | 2026-02-10 | Download | This paper studies orthogonal time-frequency space (OTFS) modulation aided multistatic integrated sensing and communication (ISAC) in vehicular networks, whereby its delay-Doppler robustness is exploi... |
| Semantic Waveforms for AI-Native 6G Networks | Nour Hello, Mohamed Amine Hamoura, Francois Rivet, Emilio Calvanese Strinati | 2026-02-10 | Download | In this paper, we propose a semantic-aware waveform design framework for AI-native 6G networks that jointly optimizes physical layer resource usage and semantic communication efficiency and robustness... |
| ISO FastLane: Faster ISO 11783 with Dual Stack Approach as a Short Term Solution | Timo Oksanen | 2026-02-10 | Download | The agricultural industry has been searching for a high-speed successor to the 250 kbit/s CAN bus backbone of ISO 11783 (ISOBUS) for over a decade, yet no protocol-level solution has reached standardi... |
| Fidelity-Age-Aware Scheduling in Quantum Repeater Networks | Ozgur Ercetin, Zafer Gedik | 2026-02-10 | Download | Quantum repeater networks distribute entanglement over long distances but must balance fidelity, delay, and resource contention. Prior work optimized throughput and end-to-end fidelity, yet little att... |
| QoS Identifier and Slice Mapping in 5G and Non-Terrestrial Network Interconnected Systems | Yuma Abe, Mariko Sekiguchi, Amane Miura | 2026-02-10 | Download | The interconnection of 5G and non-terrestrial networks (NTNs) has been actively studied to expand connectivity beyond conventional terrestrial infrastructure. |
| XLB: A High Performance Layer-7 Load Balancer for Microservices using eBPF-based In-kernel Interposition | Yuejie Wang, Chenchen Shou, Jiaxu Qian, Guyue Liu | 2026-02-10 | Download | L7 load balancers are a fundamental building block in microservices as they enable fine-grained traffic distribution. Compared to monolithic applications, microservices demand higher performance and s... |
| Resilient and Freshness-Aware Scheduling for Industrial Multi-Hop IAB Networks: A Packet Duplication Approach | Shuo Zhu, Siyu Lin, Zijing Wang, Qiao Ren, Xiaoheng Deng, Bo Ai | 2026-02-10 | Download | In industrial millimeter-wave (mmWave) multi-hop Integrated Access and Backhaul (IAB) networks, dynamic blockages caused by moving obstacles pose a severe threat to robust and continuous networks. |
| MalMoE: Mixture-of-Experts Enhanced Encrypted Malicious Traffic Detection Under Graph Drift | Yunpeng Tan, Qingyang Li, Mingxin Yang, Yannan Hu, Lei Zhang, Xinggong Zhang | 2026-02-10 | Download | Encryption has been commonly used in network traffic to secure transmission, but it also brings challenges for malicious traffic detection, due to the invisibility of the packet payload. |
| XMap: Fast Internet-wide IPv4 and IPv6 Network Scanner | Xiang Li, Zixuan Xie, Lu Sun, Yuqi Qiu, Zuyao Xu, Zheli Liu | 2026-02-10 | Download | XMap is an open-source network scanner designed for performing fast Internet-wide IPv4 and IPv6 network research scanning. XMap was initially developed as the research artifact of a paper published at... |
| Cooperative Edge Caching with Large Language Model in Wireless Networks | Ning Yang, Wentao Wang, Lingtao Ouyang, Haijun Zhang | 2026-02-10 | Download | Cooperative edge caching in overlapping zones couples Base Station (BS) decisions, making content replacement sensitive to spatial topology and temporal reuse. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AgentCgroup: Understanding and Controlling OS Resources of AI Agents | Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, Andi Quinn | 2026-02-10 | Download | AI agents are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls within sandboxed containers, each call with distinct resource demands and rapid fluctuatio... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| XLB: A High Performance Layer-7 Load Balancer for Microservices using eBPF-based In-kernel Interposition | Yuejie Wang, Chenchen Shou, Jiaxu Qian, Guyue Liu | 2026-02-10 | Download | L7 load balancers are a fundamental building block in microservices as they enable fine-grained traffic distribution. Compared to monolithic applications, microservices demand higher performance and s... |
