2026-03-17

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks	Mohammad Javad Sekonji, Ali Mahani, Maryam Mirsadeghi, Mahdi Taheri	2026-03-17	Download	Spiking Neural Networks (SNNs) offer high energy efficiency and event-driven computation, ideal for low-power edge AI. Their hardware implementation on FPGAs, however, faces challenges due to heavy co...
Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs	Danial Monachan, Samira Nazari, Mahdi Taheri, Ali Azarpeyvand, Milos Krstic, Michael Huebner, Christian Herglotz	2026-03-17	Download	Deploying deep neural networks (DNNs) on edge devices requires strong compression with minimal accuracy loss. This paper introduces Mix-and-Match Pruning, a globally guided, layer-wise sparsification ...
CODMAS: A Dialectic Multi-Agent Collaborative Framework for Structured RTL Optimization	Che-Ming Chang, Prashanth Vijayaraghavan, Ashutosh Jadhav, Charles Mackin, Vandana Mukherjee, Hsinyu Tsai, Ehsan Degan	2026-03-17	Download	Optimizing Register Transfer Level (RTL) code is a critical step in Electronic Design Automation (EDA) for improving power, performance, and area (PPA).
Vectorization of Verilog Designs and its Effects on Verification and Synthesis	Maria Fernanda Oiveira Guimarães, Ulisses Rosa, Ian Trudel, João Victor Amorim Vieira, Augusto Amaral Mafra, Mirlaine Crepalde, Fernando Magno Quintão Pereira	2026-03-17	Download	Vectorization is a compiler optimization that replaces multiple operations on scalar values with a single operation on vector values. Although common in traditional compilers such as rustc, clang, and...
ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation	Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester	2026-03-17	Download	Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability.
Deep Learning-Driven Black-Box Doherty Power Amplifier with Pixelated Output Combiner and Extended Efficiency Range	Han Zhou, Haojie Chang, David Widen	2026-03-17	Download	This article presents a deep learning-driven inverse design methodology for Doherty power amplifiers (PA) with multi-port pixelated output combiner networks.
A Pin-Array Structured Climbing Robot for Stable Locomotion on Steep Rocky Terrain	Keita Nagaoka, Kentaro Uno, Kazuya Yoshida	2026-03-17	Download	Climbing robots face significant challenges when navigating unstructured environments, where reliable attachment to irregular surfaces is critical.
ETM2: Empowering Traditional Memory Bandwidth Regulation using ETM	Alexander Zuepke, Ashutosh Pradhan, Daniele Ottaviano, Andrea Bastoni, Marco Caccamo	2026-03-17	Download	The Embedded Trace Macrocell (ETM) is a standard component of Arm's CoreSight architecture, present in a wide range of platforms and primarily designed for tracing and debugging.
A Scalable Open-Source QEC System with Sub-Microsecond Decoding-Feedback Latency	Junyi Liu, Yi Lee, Yilun Xu, Gang Huang, Xiaodi Wu	2026-03-17	Download	Quantum error correction (QEC) is essential for realizing large-scale, fault-tolerant quantum computation, yet its practical implementation remains a major engineering challenge.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage	Haidong Rong, Jiashu Yao, Matthias Langer, Shijie Liu, Li Fan, Dongxin Wang, Jia He, Jinglin Chen, Jiaheng Rang, Julian Qian, Mengyao Xu, Fan Yu, Minseok Lee, Zehuan Wang, Even Oldridge	2026-03-17	Download	Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity.
Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks	Xavier Gonzalez	2026-03-17	Download	Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain ...
ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation	Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester	2026-03-17	Download	Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability.
Looking for (Genomic) Needles in a Haystack: Sparsity-Driven Search for Identifying Correlated Genetic Mutations in Cancer	Ritvik Prabhu, Emil Vatai, Bernard Moussad, Emmanuel Jeannot, Ramu Anandakrishnan, Wu-chun Feng, Mohamed Wahib	2026-03-17	Download	Cancer typically arises not from a single genetic mutation (i.e., hit) but from multi-hit combinations that accumulate within cells. However, enumerating multi-hit combinations becomes exponentially m...
Dataflow-Oriented Classification and Performance Analysis of GPU-Accelerated Homomorphic Encryption	Ai Nozaki, Takuya Kojima, Hideki Takase, Hiroshi Nakamura	2026-03-17	Download	Fully Homomorphic Encryption (FHE) enables secure computation over encrypted data, but its computational cost remains a major obstacle to practical deployment.
Accelerating the Particle-In-Cell code ECsim with OpenACC	Elisabetta Boella, Nitin Shukla, Filippo Spiga, Mozhgan Kabiri Chimeh, Matt Bettencourt, Maria Elena Innocenti	2026-03-17	Download	The Particle-In-Cell (PIC) method is a computational technique widely used in plasma physics to model plasmas at the kinetic level. In this work, we present our effort to prepare the semi-implicit ene...
FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism	Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu	2026-03-17	Download	Modern LLM GPU fleets are provisioned for worst-case context lengths that the vast majority of requests never approach, wasting GPU capacity on idle KV-cache slots.
When GPUs Fail Quietly: Observability-Aware Early Warning Beyond Numeric Telemetry	Michael Bidollahkhani, Freja Nordsiek, Julian M. Kunkel	2026-03-17	Download	GPU nodes are central to modern HPC and AI workloads, yet many failures do not manifest as immediate hard faults. While some instabilities emerge gradually as weak thermal or efficiency drift, a signi...
An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU	Ruijia Yang, Zeyi Wen	2026-03-17	Download	Fine-tuning Large Language Models (LLMs) has become essential for domain adaptation, but its memory-intensive property exceeds the capabilities of most GPUs.
Biased Compression in Gradient Coding for Distributed Learning	Chengxi Li, Ming Xiao, Mikael Skoglund	2026-03-17	Download	Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression function...
Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding	Chengxi Li, Youssef Allouah, Rachid Guerraoui, Mikael Skoglund, Ming Xiao	2026-03-17	Download	In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server t...
inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference	Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu	2026-03-17	Download	Sizing a GPU fleet for LLM inference is harder than it looks. The obvious questions -- how many GPUs, which type, where to split a two-pool fleet -- have no closed-form answers.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
AI-Driven Multi-Modal Adaptive Handover Control Optimization for O-RAN	Abdul Wadud, Fatemeh Golpayegani, Nima Afraz	2026-03-17	Download	Handover optimization in O-RAN faces growing challenges due to heterogeneous user mobility patterns and rapidly varying radio conditions. Existing ML-based handover schemes typically operate at the ne...
HAPS-RIS-assisted IoT Networks for Disaster Recovery and Emergency Response: Architecture, Application Scenarios, and Open Challenges	Bilal Karaman, Ilhan Basturk, Engin Zeydan, Ferdi Kara, Esra Aycan Beyazit, Sezai Taskin, Halim Yanikomeroglu	2026-03-17	Download	Reliable and resilient communication is essential for disaster recovery and emergency response, yet terrestrial infrastructure often fails during large-scale natural disasters.
Evaluating Smartphone GNSS Accuracy for Geofenced 6 GHz Operations	Joshua Roy Palathinkal, Hardani Ismu Nabil, Muhammad Iqbal Rochman, Hossein Nasiri, Francis A. Gatsi, Monisha Ghosh	2026-03-17	Download	The recently deployed 6 GHz spectrum in the U.S. utilizes distinct power categories, with the latest proposed "Geofenced Variable Power" (GVP) category permitting indoor and outdoor operations without...
Fine-Grained Network Traffic Classification with Contextual QoS Profiling	Huiwen Zhang, Feng Ye	2026-03-17	Download	Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time XR, and autonomous systems.
Persistent Device Identity for Network Access Control in the Era of MAC Address Randomization: A RADIUS-Based Framework	Premanand Seralathan	2026-03-17	Download	Modern operating systems increasingly randomize Media Access Control (MAC) addresses to protect user privacy, fundamentally disrupting Network Access Control (NAC) systems that have relied on MAC addr...
Optimal Radio Resource Management for ISAC Under Imperfect Information: A Resource Economy-Driven Perspective	Luis F. Abanto-Leon, Setareh Maghsudi	2026-03-17	Download	This work investigates the radio resource management (RRM) design for downlink integrated sensing and communications (ISAC) systems, jointly optimizing timeslot allocation, beam adaptation, functional...
Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization	Linghao Zhang, Haitao Zhao, Bo Xu, Hongbo Zhu, Xianbin Wang	2026-03-17	Download	Space-air-ground integrated networks (SAGIN) promise ubiquitous 6G connectivity but face significant resource management challenges due to heterogeneous infrastructure, dynamic topologies, and stringe...
Toward Experimentation-as-a-Service in 5G/6G: The Plaza6G Prototype for AI-Assisted Trials	Sergio Barrachina-Muñoz, Marc Carrascosa-Zamacois, Horacio Bleda, Umair Riaz, Yasir Maqsood, Xavier Calle, Selva Vía, Miquel Payaró, Josep Mangues-Bafalluy	2026-03-17	Download	This paper presents Plaza6G, the first operational Experiment-as-a-Service (ExaS) platform unifying cloud resources with next-generation wireless infrastructure.
Communication-Aware Multi-Agent Reinforcement Learning for Decentralized Cooperative UAV Deployment	Enguang Fan, Yifan Chen, Zihan Shan, Matthew Caesar, Jae Kim	2026-03-17	Download	Autonomous Unmanned Aerial Vehicle (UAV) swarms are increasingly used as rapidly deployable aerial relays and sensing platforms, yet practical deployments must operate under partial observability and ...
BLADE: Adaptive Wi-Fi Contention Control for Next-Generation Real-Time Communication	Fengqian Guo, Yuhan Zhou, Longwei Jiang, Congcong Miao, Yuxin Liu, Chenren Xu, Hancheng Lu, Chang Wen Chen, Yaxiong Xie, Honghao Liu	2026-03-17	Download	Next-generation real-time communication (NGRTC) applications, such as cloud gaming and XR, demand consistently ultra-low latency. However, through our first large-scale measurement, we find that despi...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper)	Xiaoyu Chu, Shashikant Ilager, Yizhen Zang, Sacheendra Talluri, Alexandru Iosup	2026-03-17	Download	Incident management is essential to maintain the reliability and availability of cloud computing services. Cloud vendors typically disclose incident reports to the public, summarizing the failures and...
Elastic Sketch under Random Stationary Streams: Limiting Behavior and Near-Optimal Configuration	Younes Ben Mazziane, Vinay Kumar B. R., Othmane Marfoq	2026-03-17	Download	Elastic-Sketch is a hash-based data structure for counting item's appearances in a data stream, and it has been empirically shown to achieve a better memory-accuracy trade-off compared to classical me...
ETM2: Empowering Traditional Memory Bandwidth Regulation using ETM	Alexander Zuepke, Ashutosh Pradhan, Daniele Ottaviano, Andrea Bastoni, Marco Caccamo	2026-03-17	Download	The Embedded Trace Macrocell (ETM) is a standard component of Arm's CoreSight architecture, present in a wide range of platforms and primarily designed for tracing and debugging.
AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models	Martin Mayr, Sebastian Wind, Lukas Schröder, Georg Hager, Harald Köstler, Gerhard Wellein	2026-03-17	Download	Artificial Intelligence (AI) workloads drive a rapid expansion of high-performance computing (HPC) infrastructures and increase their power and energy demands towards a critical level.