2026-03-17

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks | Mohammad Javad Sekonji, Ali Mahani, Maryam Mirsadeghi, Mahdi Taheri | 2026-03-17 | Download | Spiking Neural Networks (SNNs) offer high energy efficiency and event-driven computation, ideal for low-power edge AI. Their hardware implementation on FPGAs, however, faces challenges due to heavy co... |
| Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs | Danial Monachan, Samira Nazari, Mahdi Taheri, Ali Azarpeyvand, Milos Krstic, Michael Huebner, Christian Herglotz | 2026-03-17 | Download | Deploying deep neural networks (DNNs) on edge devices requires strong compression with minimal accuracy loss. This paper introduces Mix-and-Match Pruning, a globally guided, layer-wise sparsification ... |
| CODMAS: A Dialectic Multi-Agent Collaborative Framework for Structured RTL Optimization | Che-Ming Chang, Prashanth Vijayaraghavan, Ashutosh Jadhav, Charles Mackin, Vandana Mukherjee, Hsinyu Tsai, Ehsan Degan | 2026-03-17 | Download | Optimizing Register Transfer Level (RTL) code is a critical step in Electronic Design Automation (EDA) for improving power, performance, and area (PPA). |
| Vectorization of Verilog Designs and its Effects on Verification and Synthesis | Maria Fernanda Oiveira Guimarães, Ulisses Rosa, Ian Trudel, João Victor Amorim Vieira, Augusto Amaral Mafra, Mirlaine Crepalde, Fernando Magno Quintão Pereira | 2026-03-17 | Download | Vectorization is a compiler optimization that replaces multiple operations on scalar values with a single operation on vector values. Although common in traditional compilers such as rustc, clang, and... |
| ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation | Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester | 2026-03-17 | Download | Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. |
| Deep Learning-Driven Black-Box Doherty Power Amplifier with Pixelated Output Combiner and Extended Efficiency Range | Han Zhou, Haojie Chang, David Widen | 2026-03-17 | Download | This article presents a deep learning-driven inverse design methodology for Doherty power amplifiers (PA) with multi-port pixelated output combiner networks. |
| A Pin-Array Structured Climbing Robot for Stable Locomotion on Steep Rocky Terrain | Keita Nagaoka, Kentaro Uno, Kazuya Yoshida | 2026-03-17 | Download | Climbing robots face significant challenges when navigating unstructured environments, where reliable attachment to irregular surfaces is critical. |
| ETM2: Empowering Traditional Memory Bandwidth Regulation using ETM | Alexander Zuepke, Ashutosh Pradhan, Daniele Ottaviano, Andrea Bastoni, Marco Caccamo | 2026-03-17 | Download | The Embedded Trace Macrocell (ETM) is a standard component of Arm's CoreSight architecture, present in a wide range of platforms and primarily designed for tracing and debugging. |
| A Scalable Open-Source QEC System with Sub-Microsecond Decoding-Feedback Latency | Junyi Liu, Yi Lee, Yilun Xu, Gang Huang, Xiaodi Wu | 2026-03-17 | Download | Quantum error correction (QEC) is essential for realizing large-scale, fault-tolerant quantum computation, yet its practical implementation remains a major engineering challenge. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage | Haidong Rong, Jiashu Yao, Matthias Langer, Shijie Liu, Li Fan, Dongxin Wang, Jia He, Jinglin Chen, Jiaheng Rang, Julian Qian, Mengyao Xu, Fan Yu, Minseok Lee, Zehuan Wang, Even Oldridge | 2026-03-17 | Download | Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. |
| Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks | Xavier Gonzalez | 2026-03-17 | Download | Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain ... |
| ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation | Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester | 2026-03-17 | Download | Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. |
| Looking for (Genomic) Needles in a Haystack: Sparsity-Driven Search for Identifying Correlated Genetic Mutations in Cancer | Ritvik Prabhu, Emil Vatai, Bernard Moussad, Emmanuel Jeannot, Ramu Anandakrishnan, Wu-chun Feng, Mohamed Wahib | 2026-03-17 | Download | Cancer typically arises not from a single genetic mutation (i.e., hit) but from multi-hit combinations that accumulate within cells. However, enumerating multi-hit combinations becomes exponentially m... |
| Dataflow-Oriented Classification and Performance Analysis of GPU-Accelerated Homomorphic Encryption | Ai Nozaki, Takuya Kojima, Hideki Takase, Hiroshi Nakamura | 2026-03-17 | Download | Fully Homomorphic Encryption (FHE) enables secure computation over encrypted data, but its computational cost remains a major obstacle to practical deployment. |
| Accelerating the Particle-In-Cell code ECsim with OpenACC | Elisabetta Boella, Nitin Shukla, Filippo Spiga, Mozhgan Kabiri Chimeh, Matt Bettencourt, Maria Elena Innocenti | 2026-03-17 | Download | The Particle-In-Cell (PIC) method is a computational technique widely used in plasma physics to model plasmas at the kinetic level. In this work, we present our effort to prepare the semi-implicit ene... |
| FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism | Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu | 2026-03-17 | Download | Modern LLM GPU fleets are provisioned for worst-case context lengths that the vast majority of requests never approach, wasting GPU capacity on idle KV-cache slots. |
| When GPUs Fail Quietly: Observability-Aware Early Warning Beyond Numeric Telemetry | Michael Bidollahkhani, Freja Nordsiek, Julian M. Kunkel | 2026-03-17 | Download | GPU nodes are central to modern HPC and AI workloads, yet many failures do not manifest as immediate hard faults. While some instabilities emerge gradually as weak thermal or efficiency drift, a signi... |
| An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU | Ruijia Yang, Zeyi Wen | 2026-03-17 | Download | Fine-tuning Large Language Models (LLMs) has become essential for domain adaptation, but its memory-intensive property exceeds the capabilities of most GPUs. |
| Biased Compression in Gradient Coding for Distributed Learning | Chengxi Li, Ming Xiao, Mikael Skoglund | 2026-03-17 | Download | Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression function... |
| Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding | Chengxi Li, Youssef Allouah, Rachid Guerraoui, Mikael Skoglund, Ming Xiao | 2026-03-17 | Download | In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server t... |
| inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference | Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu | 2026-03-17 | Download | Sizing a GPU fleet for LLM inference is harder than it looks. The obvious questions -- how many GPUs, which type, where to split a two-pool fleet -- have no closed-form answers. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AI-Driven Multi-Modal Adaptive Handover Control Optimization for O-RAN | Abdul Wadud, Fatemeh Golpayegani, Nima Afraz | 2026-03-17 | Download | Handover optimization in O-RAN faces growing challenges due to heterogeneous user mobility patterns and rapidly varying radio conditions. Existing ML-based handover schemes typically operate at the ne... |
| HAPS-RIS-assisted IoT Networks for Disaster Recovery and Emergency Response: Architecture, Application Scenarios, and Open Challenges | Bilal Karaman, Ilhan Basturk, Engin Zeydan, Ferdi Kara, Esra Aycan Beyazit, Sezai Taskin, Halim Yanikomeroglu | 2026-03-17 | Download | Reliable and resilient communication is essential for disaster recovery and emergency response, yet terrestrial infrastructure often fails during large-scale natural disasters. |
| Evaluating Smartphone GNSS Accuracy for Geofenced 6 GHz Operations | Joshua Roy Palathinkal, Hardani Ismu Nabil, Muhammad Iqbal Rochman, Hossein Nasiri, Francis A. Gatsi, Monisha Ghosh | 2026-03-17 | Download | The recently deployed 6 GHz spectrum in the U.S. utilizes distinct power categories, with the latest proposed "Geofenced Variable Power" (GVP) category permitting indoor and outdoor operations without... |
| Fine-Grained Network Traffic Classification with Contextual QoS Profiling | Huiwen Zhang, Feng Ye | 2026-03-17 | Download | Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time XR, and autonomous systems. |
| Persistent Device Identity for Network Access Control in the Era of MAC Address Randomization: A RADIUS-Based Framework | Premanand Seralathan | 2026-03-17 | Download | Modern operating systems increasingly randomize Media Access Control (MAC) addresses to protect user privacy, fundamentally disrupting Network Access Control (NAC) systems that have relied on MAC addr... |
| Optimal Radio Resource Management for ISAC Under Imperfect Information: A Resource Economy-Driven Perspective | Luis F. Abanto-Leon, Setareh Maghsudi | 2026-03-17 | Download | This work investigates the radio resource management (RRM) design for downlink integrated sensing and communications (ISAC) systems, jointly optimizing timeslot allocation, beam adaptation, functional... |
| Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization | Linghao Zhang, Haitao Zhao, Bo Xu, Hongbo Zhu, Xianbin Wang | 2026-03-17 | Download | Space-air-ground integrated networks (SAGIN) promise ubiquitous 6G connectivity but face significant resource management challenges due to heterogeneous infrastructure, dynamic topologies, and stringe... |
| Toward Experimentation-as-a-Service in 5G/6G: The Plaza6G Prototype for AI-Assisted Trials | Sergio Barrachina-Muñoz, Marc Carrascosa-Zamacois, Horacio Bleda, Umair Riaz, Yasir Maqsood, Xavier Calle, Selva Vía, Miquel Payaró, Josep Mangues-Bafalluy | 2026-03-17 | Download | This paper presents Plaza6G, the first operational Experiment-as-a-Service (ExaS) platform unifying cloud resources with next-generation wireless infrastructure. |
| Communication-Aware Multi-Agent Reinforcement Learning for Decentralized Cooperative UAV Deployment | Enguang Fan, Yifan Chen, Zihan Shan, Matthew Caesar, Jae Kim | 2026-03-17 | Download | Autonomous Unmanned Aerial Vehicle (UAV) swarms are increasingly used as rapidly deployable aerial relays and sensing platforms, yet practical deployments must operate under partial observability and ... |
| BLADE: Adaptive Wi-Fi Contention Control for Next-Generation Real-Time Communication | Fengqian Guo, Yuhan Zhou, Longwei Jiang, Congcong Miao, Yuxin Liu, Chenren Xu, Hancheng Lu, Chang Wen Chen, Yaxiong Xie, Honghao Liu | 2026-03-17 | Download | Next-generation real-time communication (NGRTC) applications, such as cloud gaming and XR, demand consistently ultra-low latency. However, through our first large-scale measurement, we find that despi... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper) | Xiaoyu Chu, Shashikant Ilager, Yizhen Zang, Sacheendra Talluri, Alexandru Iosup | 2026-03-17 | Download | Incident management is essential to maintain the reliability and availability of cloud computing services. Cloud vendors typically disclose incident reports to the public, summarizing the failures and... |
| Elastic Sketch under Random Stationary Streams: Limiting Behavior and Near-Optimal Configuration | Younes Ben Mazziane, Vinay Kumar B. R., Othmane Marfoq | 2026-03-17 | Download | Elastic-Sketch is a hash-based data structure for counting item's appearances in a data stream, and it has been empirically shown to achieve a better memory-accuracy trade-off compared to classical me... |
| ETM2: Empowering Traditional Memory Bandwidth Regulation using ETM | Alexander Zuepke, Ashutosh Pradhan, Daniele Ottaviano, Andrea Bastoni, Marco Caccamo | 2026-03-17 | Download | The Embedded Trace Macrocell (ETM) is a standard component of Arm's CoreSight architecture, present in a wide range of platforms and primarily designed for tracing and debugging. |
| AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models | Martin Mayr, Sebastian Wind, Lukas Schröder, Georg Hager, Harald Köstler, Gerhard Wellein | 2026-03-17 | Download | Artificial Intelligence (AI) workloads drive a rapid expansion of high-performance computing (HPC) infrastructures and increase their power and energy demands towards a critical level. |
