2025-05-13

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Dataflow & Tiling Strategies in Edge-AI FPGA Accelerators: A Comprehensive Literature Review	Richie Li	2025-05-13	Download	Edge-AI applications demand high-throughput, low-latency inference on FPGAs under tight resource and power constraints. This survey provides a comprehensive review of two key architectural decisions f...
ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition	Keran Zheng, Yinting Huang, Zhewen Yu, Christos-Savvas Bouganis	2025-05-13	Download	Recent advancements in Large Language Models (LLMs) have demonstrated impressive capabilities as their scale expands to billions of parameters.
AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies	Amit Sharma	2025-05-13	Download	The rapid growth of large-language models (LLMs) is driving a new wave of specialized hardware for inference. This paper presents the first workload-centric, cross-architectural performance study of c...
MINIMALIST: switched-capacitor circuits for efficient in-memory computation of gated recurrent units	Sebastian Billaudelle, Laura Kriener, Filippo Moro, Tristan Torchet, Melika Payvand	2025-05-13	Download	Recurrent neural networks (RNNs) have been a long-standing candidate for processing of temporal sequence data, especially in memory-constrained systems that one may find in embedded edge computing env...
Area Comparison of CHERIoT and PMP in Ibex	Samuel Riedel, Marno van der Maas, John Thomson, Andreas Kurth, Pirmin Vogel	2025-05-13	Download	Memory safety is a critical concern for modern embedded systems, particularly in security-sensitive applications. This paper explores the area impact of adding memory safety extensions to the Ibex RIS...
e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications	Simone Machetti, Pasquale Davide Schiavone, Lara Orlandic, Darong Huang, Deniz Kasap, Giovanni Ansaloni, David Atienza	2025-05-13	Download	Graphics processing units (GPUs) excel at parallel processing, but remain largely unexplored in ultra-low-power edge devices (TinyAI) due to their power and area limitations, as well as the lack of su...
SpNeRF: Memory Efficient Sparse Volumetric Neural Rendering Accelerator for Edge Devices	Yipu Zhang, Jiawei Liang, Jian Peng, Jiang Xu, Wei Zhang	2025-05-13	Download	Neural rendering has gained prominence for its high-quality output, which is crucial for AR/VR applications. However, its large voxel grid data size and irregular access patterns challenge real-time p...
Ray Antenna Array: A Novel Cost-Effective Multi-Antenna Architecture for Enhanced Wireless Communication	Zhenjun Dong, Zhiwen Zhou, Yong Zeng	2025-05-13	Download	This paper proposes a novel multi-antenna architecture, termed ray antenna array (RAA), which aims to enhance wireless communication performance in a cost-effective manner.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Toward Accessible and Safe Live Streaming Using Distributed Content Filtering with MoQ	Andrew C. Freeman	2025-05-13	Download	Live video streaming is increasingly popular on social media platforms. With the growth of live streaming comes an increased need for robust content moderation to remove dangerous, illegal, or otherwi...
Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony	Shaoyu Wang, Guangrong He, Geon-Woo Kim, Yanqi Zhou, Seo Jin Park	2025-05-13	Download	Mixture-of-Experts (MoE) architectures offer the promise of larger model capacity without the prohibitive costs of fully dense designs. However, in real-world inference serving, load skew across exper...
ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage	Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler	2025-05-13	Download	Network simulators play a crucial role in evaluating the performance of large-scale systems. However, existing simulators rely heavily on synthetic microbenchmarks or narrowly focus on specific domain...
Packaging HEP Heterogeneous Mini-apps for Portable Benchmarking and Facility Evaluation on Modern HPCs	Mohammad Atif, Pengfei Ding, Ka Hei Martin Kwok, Charles Leggett	2025-05-13	Download	High Energy Physics (HEP) experiments are making increasing use of GPUs and GPU dominated High Performance Computer facilities. Both the software and hardware of these systems are rapidly evolving, cr...
Comparing Parallel Functional Array Languages: Programming and Performance	David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas	2025-05-13	Download	Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability.
Kudzu: Fast and Simple High-Throughput BFT	Victor Shoup, Jakub Sliwinski, Yann Vonlanthen	2025-05-13	Download	We present Kudzu, a high-throughput atomic broadcast protocol with an integrated fast path. Our contribution is based on the combination of two lines of work.
FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs	Pengcuo Dege, Qiuming Luo, Rui Mao, Chang Kong	2025-05-13	Download	Efficient inference of Multi-Head Latent Attention (MLA) is challenged by deploying the DeepSeek-R1 671B model on a single Multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that...
Distributed Quantum Neural Networks on Distributed Photonic Quantum Computing	Kuan-Cheng Chen, Chen-Yu Liu, Yu Shang, Felix Burt, Kin K. Leung	2025-05-13	Download	We introduce a distributed quantum-classical framework that synergizes photonic quantum neural networks (QNNs) with matrix-product-state (MPS) mapping to achieve parameter-efficient training of classi...
Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning	Muhammad Saqib, Dipkumar Mehta, Fnu Yashu, Shubham Malhotra	2025-05-13	Download	The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity ...
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles	Matteo Gallici, Ivan Masmitja, Mario Martín	2025-05-13	Download	Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs...
A Generalized Hierarchical Federated Learning Framework with Theoretical Guarantees	Seyed Mohammad Azimi-Abarghouyi, Carlo Fischione	2025-05-13	Download	Almost all existing hierarchical federated learning (FL) models are limited to two aggregation layers, restricting scalability and flexibility in complex, large-scale networks.
Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions	Keita Teranishi, Harshitha Menon, William F. Godoy, Prasanna Balaprakash, David Bau, Tal Ben-Nun, Abhinav Bhatele, Franz Franchetti, Michael Franusich, Todd Gamblin, Giorgis Georgakoudis, Tom Goldstein, Arjun Guha, Steven Hahn, Costin Iancu, Zheming Jin, Terry Jones, Tze Meng Low, Het Mankad, Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Daniel Nichols, Konstantinos Parasyris, Swaroop Pophale, Pedro Valero-Lara, Jeffrey S. Vetter, Samuel Williams, Aaron Young	2025-05-13	Download	We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Toward Accessible and Safe Live Streaming Using Distributed Content Filtering with MoQ	Andrew C. Freeman	2025-05-13	Download	Live video streaming is increasingly popular on social media platforms. With the growth of live streaming comes an increased need for robust content moderation to remove dangerous, illegal, or otherwi...
Adaptive Entanglement Generation for Quantum Routing	Tasdiqul Islam, Md Arifuzzaman, Engin Arslan	2025-05-13	Download	Entanglement generation in long-distance quantum networks is a difficult process due to resource limitations and the probabilistic nature of entanglement swapping.
Towards Real-Time Interpolation for Enhanced AUV Deep Sea Mapping	Devanshu Saxena	2025-05-13	Download	Approximately seventy-one percent of the Earth is covered in water. Of that area, ninety-five percent of the ocean has never been explored or mapped.
Decoupling the Device and Identity in Cellular Networks with vSIM	Shirin Ebadi, Zach Moolman, Eric Keller, Tamara Lehman	2025-05-13	Download	Cellular networks are now fundamental infrastructure, powering not just smartphones for daily communication and commerce, but also enabling the expansion of IoT and edge computing through last-mile co...
AI-Driven Digital Twins: Optimizing 5G/6G Network Slicing with NTNs	Afan Ali, Huseyin Arslan	2025-05-13	Download	Network slicing in 5G/6G Non-Terrestrial Network (NTN) is confronted with mobility and traffic variability. An artificial intelligence (AI)-based digital twin (DT) architecture with deep reinforcement...
Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning	Muhammad Saqib, Dipkumar Mehta, Fnu Yashu, Shubham Malhotra	2025-05-13	Download	The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity ...
Hybrid Wi-Fi/PDR Indoor Localization with Fingerprint Matching	Chunyi Zhang, Zongwei Li, Xiaoqi Li	2025-05-13	Download	Indoor position technology has become one of the research highlights in the Internet of Things (IoT), but there is still a lack of universal, low-cost, and high-precision solutions.
A Generalized Hierarchical Federated Learning Framework with Theoretical Guarantees	Seyed Mohammad Azimi-Abarghouyi, Carlo Fischione	2025-05-13	Download	Almost all existing hierarchical federated learning (FL) models are limited to two aggregation layers, restricting scalability and flexibility in complex, large-scale networks.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Geometric lower bounds for the steady-state occupancy of processing networks with limited connectivity	Diego Goldsztajn, Andres Ferragut	2025-05-13	Download	We consider processing networks where multiple dispatchers are connected to single-server queues by a bipartite compatibility graph, modeling constraints that are common in data centers and cloud netw...
Comparing Parallel Functional Array Languages: Programming and Performance	David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas	2025-05-13	Download	Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability.
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles	Matteo Gallici, Ivan Masmitja, Mario Martín	2025-05-13	Download	Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs...
Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions	Keita Teranishi, Harshitha Menon, William F. Godoy, Prasanna Balaprakash, David Bau, Tal Ben-Nun, Abhinav Bhatele, Franz Franchetti, Michael Franusich, Todd Gamblin, Giorgis Georgakoudis, Tom Goldstein, Arjun Guha, Steven Hahn, Costin Iancu, Zheming Jin, Terry Jones, Tze Meng Low, Het Mankad, Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Daniel Nichols, Konstantinos Parasyris, Swaroop Pophale, Pedro Valero-Lara, Jeffrey S. Vetter, Samuel Williams, Aaron Young	2025-05-13	Download	We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software.