2026-03-17
cs.AR - Hardware Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| An FPGA-Based SoC Architecture with a RISC-V Controller for Energy-Efficient Temporal-Coding Spiking Neural Networks | Mohammad Javad Sekonji, Ali Mahani, Maryam Mirsadeghi, Mahdi Taheri | 2026-03-17 | Spiking Neural Networks (SNNs) offer high energy efficiency and event-driven computation, ideal for low-power edge AI. Their hardware implementation on FPGAs, however, faces challenges due to heavy co... |
| Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs | Danial Monachan, Samira Nazari, Mahdi Taheri, Ali Azarpeyvand, Milos Krstic, Michael Huebner, Christian Herglotz | 2026-03-17 | Deploying deep neural networks (DNNs) on edge devices requires strong compression with minimal accuracy loss. This paper introduces Mix-and-Match Pruning, a globally guided, layer-wise sparsification ... |
| CODMAS: A Dialectic Multi-Agent Collaborative Framework for Structured RTL Optimization | Che-Ming Chang, Prashanth Vijayaraghavan, Ashutosh Jadhav, Charles Mackin, Vandana Mukherjee, Hsinyu Tsai, Ehsan Degan | 2026-03-17 | Optimizing Register Transfer Level (RTL) code is a critical step in Electronic Design Automation (EDA) for improving power, performance, and area (PPA). |
| Vectorization of Verilog Designs and its Effects on Verification and Synthesis | Maria Fernanda Oiveira Guimarães, Ulisses Rosa, Ian Trudel, João Victor Amorim Vieira, Augusto Amaral Mafra, Mirlaine Crepalde, Fernando Magno Quintão Pereira | 2026-03-17 | Vectorization is a compiler optimization that replaces multiple operations on scalar values with a single operation on vector values. Although common in traditional compilers such as rustc, clang, and... |
| ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation | Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester | 2026-03-17 | Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. |
| Deep Learning-Driven Black-Box Doherty Power Amplifier with Pixelated Output Combiner and Extended Efficiency Range | Han Zhou, Haojie Chang, David Widen | 2026-03-17 | This article presents a deep learning-driven inverse design methodology for Doherty power amplifiers (PA) with multi-port pixelated output combiner networks. |
| A Pin-Array Structured Climbing Robot for Stable Locomotion on Steep Rocky Terrain | Keita Nagaoka, Kentaro Uno, Kazuya Yoshida | 2026-03-17 | Climbing robots face significant challenges when navigating unstructured environments, where reliable attachment to irregular surfaces is critical. |
| ETM2: Empowering Traditional Memory Bandwidth Regulation using ETM | Alexander Zuepke, Ashutosh Pradhan, Daniele Ottaviano, Andrea Bastoni, Marco Caccamo | 2026-03-17 | The Embedded Trace Macrocell (ETM) is a standard component of Arm's CoreSight architecture, present in a wide range of platforms and primarily designed for tracing and debugging. |
| A Scalable Open-Source QEC System with Sub-Microsecond Decoding-Feedback Latency | Junyi Liu, Yi Lee, Yilun Xu, Gang Huang, Xiaodi Wu | 2026-03-17 | Quantum error correction (QEC) is essential for realizing large-scale, fault-tolerant quantum computation, yet its practical implementation remains a major engineering challenge. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage | Haidong Rong, Jiashu Yao, Matthias Langer, Shijie Liu, Li Fan, Dongxin Wang, Jia He, Jinglin Chen, Jiaheng Rang, Julian Qian, Mengyao Xu, Fan Yu, Minseok Lee, Zehuan Wang, Even Oldridge | 2026-03-17 | Traditional GPU hash tables preserve every inserted key -- a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. |
| Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks | Xavier Gonzalez | 2026-03-17 | Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain ... |
| ODIN-Based CPU-GPU Architecture with Replay-Driven Simulation and Emulation | Nij Dorairaj, Debabrata Chatterjee, Hong Wang, Hong Jiang, Alankar Saxena, Altug Koker, Thiam Ern Lim, Cathrane Teoh, Chuan Yin Loo, Bishara Shomar, Anthony Lester | 2026-03-17 | Integration of CPU and GPU technologies is a key enabler for modern AI and graphics workloads, combining control-oriented processing with massive parallel compute capability. |
| Looking for (Genomic) Needles in a Haystack: Sparsity-Driven Search for Identifying Correlated Genetic Mutations in Cancer | Ritvik Prabhu, Emil Vatai, Bernard Moussad, Emmanuel Jeannot, Ramu Anandakrishnan, Wu-chun Feng, Mohamed Wahib | 2026-03-17 | Cancer typically arises not from a single genetic mutation (i.e., hit) but from multi-hit combinations that accumulate within cells. However, enumerating multi-hit combinations becomes exponentially m... |
| Dataflow-Oriented Classification and Performance Analysis of GPU-Accelerated Homomorphic Encryption | Ai Nozaki, Takuya Kojima, Hideki Takase, Hiroshi Nakamura | 2026-03-17 | Fully Homomorphic Encryption (FHE) enables secure computation over encrypted data, but its computational cost remains a major obstacle to practical deployment. |
| Accelerating the Particle-In-Cell code ECsim with OpenACC | Elisabetta Boella, Nitin Shukla, Filippo Spiga, Mozhgan Kabiri Chimeh, Matt Bettencourt, Maria Elena Innocenti | 2026-03-17 | The Particle-In-Cell (PIC) method is a computational technique widely used in plasma physics to model plasmas at the kinetic level. In this work, we present our effort to prepare the semi-implicit ene... |
| FleetOpt: Analytical Fleet Provisioning for LLM Inference with Compress-and-Route as Implementation Mechanism | Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu | 2026-03-17 | Modern LLM GPU fleets are provisioned for worst-case context lengths that the vast majority of requests never approach, wasting GPU capacity on idle KV-cache slots. |
| When GPUs Fail Quietly: Observability-Aware Early Warning Beyond Numeric Telemetry | Michael Bidollahkhani, Freja Nordsiek, Julian M. Kunkel | 2026-03-17 | GPU nodes are central to modern HPC and AI workloads, yet many failures do not manifest as immediate hard faults. While some instabilities emerge gradually as weak thermal or efficiency drift, a signi... |
| An Efficient Heterogeneous Co-Design for Fine-Tuning on a Single GPU | Ruijia Yang, Zeyi Wen | 2026-03-17 | Fine-tuning Large Language Models (LLMs) has become essential for domain adaptation, but its memory-intensive property exceeds the capabilities of most GPUs. |
| Biased Compression in Gradient Coding for Distributed Learning | Chengxi Li, Ming Xiao, Mikael Skoglund | 2026-03-17 | Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression function... |
| Byzantine-Robust and Communication-Efficient Distributed Training: Compressive and Cyclic Gradient Coding | Chengxi Li, Youssef Allouah, Rachid Guerraoui, Mikael Skoglund, Ming Xiao | 2026-03-17 | In this paper, we study the problem of distributed training (DT) under Byzantine attacks with communication constraints. While prior work has developed various robust aggregation rules at the server t... |
| inference-fleet-sim: A Queueing-Theory-Grounded Fleet Capacity Planner for LLM Inference | Huamin Chen, Xunzhuo Liu, Yuhan Liu, Junchen Jiang, Bowei He, Xue Liu | 2026-03-17 | Sizing a GPU fleet for LLM inference is harder than it looks. The obvious questions -- how many GPUs, which type, where to split a two-pool fleet -- have no closed-form answers. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| AI-Driven Multi-Modal Adaptive Handover Control Optimization for O-RAN | Abdul Wadud, Fatemeh Golpayegani, Nima Afraz | 2026-03-17 | Handover optimization in O-RAN faces growing challenges due to heterogeneous user mobility patterns and rapidly varying radio conditions. Existing ML-based handover schemes typically operate at the ne... |
| HAPS-RIS-assisted IoT Networks for Disaster Recovery and Emergency Response: Architecture, Application Scenarios, and Open Challenges | Bilal Karaman, Ilhan Basturk, Engin Zeydan, Ferdi Kara, Esra Aycan Beyazit, Sezai Taskin, Halim Yanikomeroglu | 2026-03-17 | Reliable and resilient communication is essential for disaster recovery and emergency response, yet terrestrial infrastructure often fails during large-scale natural disasters. |
| Evaluating Smartphone GNSS Accuracy for Geofenced 6 GHz Operations | Joshua Roy Palathinkal, Hardani Ismu Nabil, Muhammad Iqbal Rochman, Hossein Nasiri, Francis A. Gatsi, Monisha Ghosh | 2026-03-17 | The recently deployed 6 GHz spectrum in the U.S. utilizes distinct power categories, with the latest proposed "Geofenced Variable Power" (GVP) category permitting indoor and outdoor operations without... |
| Fine-Grained Network Traffic Classification with Contextual QoS Profiling | Huiwen Zhang, Feng Ye | 2026-03-17 | Accurate network traffic classification is vital for managing modern applications with strict Quality of Service (QoS) demands, such as edge computing, real-time XR, and autonomous systems. |
| Persistent Device Identity for Network Access Control in the Era of MAC Address Randomization: A RADIUS-Based Framework | Premanand Seralathan | 2026-03-17 | Modern operating systems increasingly randomize Media Access Control (MAC) addresses to protect user privacy, fundamentally disrupting Network Access Control (NAC) systems that have relied on MAC addr... |
| Optimal Radio Resource Management for ISAC Under Imperfect Information: A Resource Economy-Driven Perspective | Luis F. Abanto-Leon, Setareh Maghsudi | 2026-03-17 | This work investigates the radio resource management (RRM) design for downlink integrated sensing and communications (ISAC) systems, jointly optimizing timeslot allocation, beam adaptation, functional... |
| Agentic AI for SAGIN Resource Management: Semantic Awareness, Orchestration, and Optimization | Linghao Zhang, Haitao Zhao, Bo Xu, Hongbo Zhu, Xianbin Wang | 2026-03-17 | Space-air-ground integrated networks (SAGIN) promise ubiquitous 6G connectivity but face significant resource management challenges due to heterogeneous infrastructure, dynamic topologies, and stringe... |
| Toward Experimentation-as-a-Service in 5G/6G: The Plaza6G Prototype for AI-Assisted Trials | Sergio Barrachina-Muñoz, Marc Carrascosa-Zamacois, Horacio Bleda, Umair Riaz, Yasir Maqsood, Xavier Calle, Selva Vía, Miquel Payaró, Josep Mangues-Bafalluy | 2026-03-17 | This paper presents Plaza6G, the first operational Experiment-as-a-Service (ExaS) platform unifying cloud resources with next-generation wireless infrastructure. |
| Communication-Aware Multi-Agent Reinforcement Learning for Decentralized Cooperative UAV Deployment | Enguang Fan, Yifan Chen, Zihan Shan, Matthew Caesar, Jae Kim | 2026-03-17 | Autonomous Unmanned Aerial Vehicle (UAV) swarms are increasingly used as rapidly deployable aerial relays and sensing platforms, yet practical deployments must operate under partial observability and ... |
| BLADE: Adaptive Wi-Fi Contention Control for Next-Generation Real-Time Communication | Fengqian Guo, Yuhan Zhou, Longwei Jiang, Congcong Miao, Yuxin Liu, Chenren Xu, Hancheng Lu, Chang Wen Chen, Yaxiong Xie, Honghao Liu | 2026-03-17 | Next-generation real-time communication (NGRTC) applications, such as cloud gaming and XR, demand consistently ultra-low latency. However, through our first large-scale measurement, we find that despi... |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Leveraging LLMs for Structured Information Extraction and Analysis from Cloud Incident Reports (Work In Progress Paper) | Xiaoyu Chu, Shashikant Ilager, Yizhen Zang, Sacheendra Talluri, Alexandru Iosup | 2026-03-17 | Incident management is essential to maintain the reliability and availability of cloud computing services. Cloud vendors typically disclose incident reports to the public, summarizing the failures and... |
| Elastic Sketch under Random Stationary Streams: Limiting Behavior and Near-Optimal Configuration | Younes Ben Mazziane, Vinay Kumar B. R., Othmane Marfoq | 2026-03-17 | Elastic-Sketch is a hash-based data structure for counting item's appearances in a data stream, and it has been empirically shown to achieve a better memory-accuracy trade-off compared to classical me... |
| ETM2: Empowering Traditional Memory Bandwidth Regulation using ETM | Alexander Zuepke, Ashutosh Pradhan, Daniele Ottaviano, Andrea Bastoni, Marco Caccamo | 2026-03-17 | The Embedded Trace Macrocell (ETM) is a standard component of Arm's CoreSight architecture, present in a wide range of platforms and primarily designed for tracing and debugging. |
| AI Application Benchmarking: Power-Aware Performance Analysis for Vision and Language Models | Martin Mayr, Sebastian Wind, Lukas Schröder, Georg Hager, Harald Köstler, Gerhard Wellein | 2026-03-17 | Artificial Intelligence (AI) workloads drive a rapid expansion of high-performance computing (HPC) infrastructures and increase their power and energy demands towards a critical level. |