
2025-05-13

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Dataflow & Tiling Strategies in Edge-AI FPGA Accelerators: A Comprehensive Literature Review | Richie Li | 2025-05-13 | Download | Edge-AI applications demand high-throughput, low-latency inference on FPGAs under tight resource and power constraints. This survey provides a comprehensive review of two key architectural decisions f... |
| ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition | Keran Zheng, Yinting Huang, Zhewen Yu, Christos-Savvas Bouganis | 2025-05-13 | Download | Recent advancements in Large Language Models (LLMs) have demonstrated impressive capabilities as their scale expands to billions of parameters. |
| AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies | Amit Sharma | 2025-05-13 | Download | The rapid growth of large-language models (LLMs) is driving a new wave of specialized hardware for inference. This paper presents the first workload-centric, cross-architectural performance study of c... |
| MINIMALIST: switched-capacitor circuits for efficient in-memory computation of gated recurrent units | Sebastian Billaudelle, Laura Kriener, Filippo Moro, Tristan Torchet, Melika Payvand | 2025-05-13 | Download | Recurrent neural networks (RNNs) have been a long-standing candidate for processing of temporal sequence data, especially in memory-constrained systems that one may find in embedded edge computing env... |
| Area Comparison of CHERIoT and PMP in Ibex | Samuel Riedel, Marno van der Maas, John Thomson, Andreas Kurth, Pirmin Vogel | 2025-05-13 | Download | Memory safety is a critical concern for modern embedded systems, particularly in security-sensitive applications. This paper explores the area impact of adding memory safety extensions to the Ibex RIS... |
| e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications | Simone Machetti, Pasquale Davide Schiavone, Lara Orlandic, Darong Huang, Deniz Kasap, Giovanni Ansaloni, David Atienza | 2025-05-13 | Download | Graphics processing units (GPUs) excel at parallel processing, but remain largely unexplored in ultra-low-power edge devices (TinyAI) due to their power and area limitations, as well as the lack of su... |
| SpNeRF: Memory Efficient Sparse Volumetric Neural Rendering Accelerator for Edge Devices | Yipu Zhang, Jiawei Liang, Jian Peng, Jiang Xu, Wei Zhang | 2025-05-13 | Download | Neural rendering has gained prominence for its high-quality output, which is crucial for AR/VR applications. However, its large voxel grid data size and irregular access patterns challenge real-time p... |
| Ray Antenna Array: A Novel Cost-Effective Multi-Antenna Architecture for Enhanced Wireless Communication | Zhenjun Dong, Zhiwen Zhou, Yong Zeng | 2025-05-13 | Download | This paper proposes a novel multi-antenna architecture, termed ray antenna array (RAA), which aims to enhance wireless communication performance in a cost-effective manner. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Toward Accessible and Safe Live Streaming Using Distributed Content Filtering with MoQ | Andrew C. Freeman | 2025-05-13 | Download | Live video streaming is increasingly popular on social media platforms. With the growth of live streaming comes an increased need for robust content moderation to remove dangerous, illegal, or otherwi... |
| Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | Shaoyu Wang, Guangrong He, Geon-Woo Kim, Yanqi Zhou, Seo Jin Park | 2025-05-13 | Download | Mixture-of-Experts (MoE) architectures offer the promise of larger model capacity without the prohibitive costs of fully dense designs. However, in real-world inference serving, load skew across exper... |
| ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage | Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler | 2025-05-13 | Download | Network simulators play a crucial role in evaluating the performance of large-scale systems. However, existing simulators rely heavily on synthetic microbenchmarks or narrowly focus on specific domain... |
| Packaging HEP Heterogeneous Mini-apps for Portable Benchmarking and Facility Evaluation on Modern HPCs | Mohammad Atif, Pengfei Ding, Ka Hei Martin Kwok, Charles Leggett | 2025-05-13 | Download | High Energy Physics (HEP) experiments are making increasing use of GPUs and GPU dominated High Performance Computer facilities. Both the software and hardware of these systems are rapidly evolving, cr... |
| Comparing Parallel Functional Array Languages: Programming and Performance | David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas | 2025-05-13 | Download | Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. |
| Kudzu: Fast and Simple High-Throughput BFT | Victor Shoup, Jakub Sliwinski, Yann Vonlanthen | 2025-05-13 | Download | We present Kudzu, a high-throughput atomic broadcast protocol with an integrated fast path. Our contribution is based on the combination of two lines of work. |
| FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs | Pengcuo Dege, Qiuming Luo, Rui Mao, Chang Kong | 2025-05-13 | Download | Efficient inference of Multi-Head Latent Attention (MLA) is challenged by deploying the DeepSeek-R1 671B model on a single Multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that... |
| Distributed Quantum Neural Networks on Distributed Photonic Quantum Computing | Kuan-Cheng Chen, Chen-Yu Liu, Yu Shang, Felix Burt, Kin K. Leung | 2025-05-13 | Download | We introduce a distributed quantum-classical framework that synergizes photonic quantum neural networks (QNNs) with matrix-product-state (MPS) mapping to achieve parameter-efficient training of classi... |
| Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | Muhammad Saqib, Dipkumar Mehta, Fnu Yashu, Shubham Malhotra | 2025-05-13 | Download | The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity ... |
| Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles | Matteo Gallici, Ivan Masmitja, Mario Martín | 2025-05-13 | Download | Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs... |
| A Generalized Hierarchical Federated Learning Framework with Theoretical Guarantees | Seyed Mohammad Azimi-Abarghouyi, Carlo Fischione | 2025-05-13 | Download | Almost all existing hierarchical federated learning (FL) models are limited to two aggregation layers, restricting scalability and flexibility in complex, large-scale networks. |
| Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions | Keita Teranishi, Harshitha Menon, William F. Godoy, Prasanna Balaprakash, David Bau, Tal Ben-Nun, Abhinav Bhatele, Franz Franchetti, Michael Franusich, Todd Gamblin, Giorgis Georgakoudis, Tom Goldstein, Arjun Guha, Steven Hahn, Costin Iancu, Zheming Jin, Terry Jones, Tze Meng Low, Het Mankad, Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Daniel Nichols, Konstantinos Parasyris, Swaroop Pophale, Pedro Valero-Lara, Jeffrey S. Vetter, Samuel Williams, Aaron Young | 2025-05-13 | Download | We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Toward Accessible and Safe Live Streaming Using Distributed Content Filtering with MoQ | Andrew C. Freeman | 2025-05-13 | Download | Live video streaming is increasingly popular on social media platforms. With the growth of live streaming comes an increased need for robust content moderation to remove dangerous, illegal, or otherwi... |
| Adaptive Entanglement Generation for Quantum Routing | Tasdiqul Islam, Md Arifuzzaman, Engin Arslan | 2025-05-13 | Download | Entanglement generation in long-distance quantum networks is a difficult process due to resource limitations and the probabilistic nature of entanglement swapping. |
| Towards Real-Time Interpolation for Enhanced AUV Deep Sea Mapping | Devanshu Saxena | 2025-05-13 | Download | Approximately seventy-one percent of the Earth is covered in water. Of that area, ninety-five percent of the ocean has never been explored or mapped. |
| Decoupling the Device and Identity in Cellular Networks with vSIM | Shirin Ebadi, Zach Moolman, Eric Keller, Tamara Lehman | 2025-05-13 | Download | Cellular networks are now fundamental infrastructure, powering not just smartphones for daily communication and commerce, but also enabling the expansion of IoT and edge computing through last-mile co... |
| AI-Driven Digital Twins: Optimizing 5G/6G Network Slicing with NTNs | Afan Ali, Huseyin Arslan | 2025-05-13 | Download | Network slicing in 5G/6G Non-Terrestrial Network (NTN) is confronted with mobility and traffic variability. An artificial intelligence (AI)-based digital twin (DT) architecture with deep reinforcement... |
| Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | Muhammad Saqib, Dipkumar Mehta, Fnu Yashu, Shubham Malhotra | 2025-05-13 | Download | The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity ... |
| Hybrid Wi-Fi/PDR Indoor Localization with Fingerprint Matching | Chunyi Zhang, Zongwei Li, Xiaoqi Li | 2025-05-13 | Download | Indoor position technology has become one of the research highlights in the Internet of Things (IoT), but there is still a lack of universal, low-cost, and high-precision solutions. |
| A Generalized Hierarchical Federated Learning Framework with Theoretical Guarantees | Seyed Mohammad Azimi-Abarghouyi, Carlo Fischione | 2025-05-13 | Download | Almost all existing hierarchical federated learning (FL) models are limited to two aggregation layers, restricting scalability and flexibility in complex, large-scale networks. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Geometric lower bounds for the steady-state occupancy of processing networks with limited connectivity | Diego Goldsztajn, Andres Ferragut | 2025-05-13 | Download | We consider processing networks where multiple dispatchers are connected to single-server queues by a bipartite compatibility graph, modeling constraints that are common in data centers and cloud netw... |
| Comparing Parallel Functional Array Languages: Programming and Performance | David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas | 2025-05-13 | Download | Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. |
| Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles | Matteo Gallici, Ivan Masmitja, Mario Martín | 2025-05-13 | Download | Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs... |
| Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions | Keita Teranishi, Harshitha Menon, William F. Godoy, Prasanna Balaprakash, David Bau, Tal Ben-Nun, Abhinav Bhatele, Franz Franchetti, Michael Franusich, Todd Gamblin, Giorgis Georgakoudis, Tom Goldstein, Arjun Guha, Steven Hahn, Costin Iancu, Zheming Jin, Terry Jones, Tze Meng Low, Het Mankad, Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Daniel Nichols, Konstantinos Parasyris, Swaroop Pophale, Pedro Valero-Lara, Jeffrey S. Vetter, Samuel Williams, Aaron Young | 2025-05-13 | Download | We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. |
