2025-05-13
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Dataflow & Tiling Strategies in Edge-AI FPGA Accelerators: A Comprehensive Literature Review | Richie Li | 2025-05-13 | Download | Edge-AI applications demand high-throughput, low-latency inference on FPGAs under tight resource and power constraints. This survey provides a comprehensive review of two key architectural decisions f... |
| ITERA-LLM: Boosting Sub-8-Bit Large Language Model Inference via Iterative Tensor Decomposition | Keran Zheng, Yinting Huang, Zhewen Yu, Christos-Savvas Bouganis | 2025-05-13 | Download | Recent advancements in Large Language Models (LLMs) have demonstrated impressive capabilities as their scale expands to billions of parameters. |
| AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies | Amit Sharma | 2025-05-13 | Download | The rapid growth of large-language models (LLMs) is driving a new wave of specialized hardware for inference. This paper presents the first workload-centric, cross-architectural performance study of c... |
| MINIMALIST: switched-capacitor circuits for efficient in-memory computation of gated recurrent units | Sebastian Billaudelle, Laura Kriener, Filippo Moro, Tristan Torchet, Melika Payvand | 2025-05-13 | Download | Recurrent neural networks (RNNs) have been a long-standing candidate for processing of temporal sequence data, especially in memory-constrained systems that one may find in embedded edge computing env... |
| Area Comparison of CHERIoT and PMP in Ibex | Samuel Riedel, Marno van der Maas, John Thomson, Andreas Kurth, Pirmin Vogel | 2025-05-13 | Download | Memory safety is a critical concern for modern embedded systems, particularly in security-sensitive applications. This paper explores the area impact of adding memory safety extensions to the Ibex RIS... |
| e-GPU: An Open-Source and Configurable RISC-V Graphic Processing Unit for TinyAI Applications | Simone Machetti, Pasquale Davide Schiavone, Lara Orlandic, Darong Huang, Deniz Kasap, Giovanni Ansaloni, David Atienza | 2025-05-13 | Download | Graphics processing units (GPUs) excel at parallel processing, but remain largely unexplored in ultra-low-power edge devices (TinyAI) due to their power and area limitations, as well as the lack of su... |
| SpNeRF: Memory Efficient Sparse Volumetric Neural Rendering Accelerator for Edge Devices | Yipu Zhang, Jiawei Liang, Jian Peng, Jiang Xu, Wei Zhang | 2025-05-13 | Download | Neural rendering has gained prominence for its high-quality output, which is crucial for AR/VR applications. However, its large voxel grid data size and irregular access patterns challenge real-time p... |
| Ray Antenna Array: A Novel Cost-Effective Multi-Antenna Architecture for Enhanced Wireless Communication | Zhenjun Dong, Zhiwen Zhou, Yong Zeng | 2025-05-13 | Download | This paper proposes a novel multi-antenna architecture, termed ray antenna array (RAA), which aims to enhance wireless communication performance in a cost-effective manner. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Toward Accessible and Safe Live Streaming Using Distributed Content Filtering with MoQ | Andrew C. Freeman | 2025-05-13 | Download | Live video streaming is increasingly popular on social media platforms. With the growth of live streaming comes an increased need for robust content moderation to remove dangerous, illegal, or otherwi... |
| Toward Cost-Efficient Serving of Mixture-of-Experts with Asynchrony | Shaoyu Wang, Guangrong He, Geon-Woo Kim, Yanqi Zhou, Seo Jin Park | 2025-05-13 | Download | Mixture-of-Experts (MoE) architectures offer the promise of larger model capacity without the prohibitive costs of fully dense designs. However, in real-world inference serving, load skew across exper... |
| ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage | Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler | 2025-05-13 | Download | Network simulators play a crucial role in evaluating the performance of large-scale systems. However, existing simulators rely heavily on synthetic microbenchmarks or narrowly focus on specific domain... |
| Packaging HEP Heterogeneous Mini-apps for Portable Benchmarking and Facility Evaluation on Modern HPCs | Mohammad Atif, Pengfei Ding, Ka Hei Martin Kwok, Charles Leggett | 2025-05-13 | Download | High Energy Physics (HEP) experiments are making increasing use of GPUs and GPU dominated High Performance Computer facilities. Both the software and hardware of these systems are rapidly evolving, cr... |
| Comparing Parallel Functional Array Languages: Programming and Performance | David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas | 2025-05-13 | Download | Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. |
| Kudzu: Fast and Simple High-Throughput BFT | Victor Shoup, Jakub Sliwinski, Yann Vonlanthen | 2025-05-13 | Download | We present Kudzu, a high-throughput atomic broadcast protocol with an integrated fast path. Our contribution is based on the combination of two lines of work. |
| FlashMLA-ETAP: Efficient Transpose Attention Pipeline for Accelerating MLA Inference on NVIDIA H20 GPUs | Pengcuo Dege, Qiuming Luo, Rui Mao, Chang Kong | 2025-05-13 | Download | Efficient inference of Multi-Head Latent Attention (MLA) is challenged by deploying the DeepSeek-R1 671B model on a single Multi-GPU server. This paper introduces FlashMLA-ETAP, a novel framework that... |
| Distributed Quantum Neural Networks on Distributed Photonic Quantum Computing | Kuan-Cheng Chen, Chen-Yu Liu, Yu Shang, Felix Burt, Kin K. Leung | 2025-05-13 | Download | We introduce a distributed quantum-classical framework that synergizes photonic quantum neural networks (QNNs) with matrix-product-state (MPS) mapping to achieve parameter-efficient training of classi... |
| Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | Muhammad Saqib, Dipkumar Mehta, Fnu Yashu, Shubham Malhotra | 2025-05-13 | Download | The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity ... |
| Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles | Matteo Gallici, Ivan Masmitja, Mario Martín | 2025-05-13 | Download | Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs... |
| A Generalized Hierarchical Federated Learning Framework with Theoretical Guarantees | Seyed Mohammad Azimi-Abarghouyi, Carlo Fischione | 2025-05-13 | Download | Almost all existing hierarchical federated learning (FL) models are limited to two aggregation layers, restricting scalability and flexibility in complex, large-scale networks. |
| Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions | Keita Teranishi, Harshitha Menon, William F. Godoy, Prasanna Balaprakash, David Bau, Tal Ben-Nun, Abhinav Bhatele, Franz Franchetti, Michael Franusich, Todd Gamblin, Giorgis Georgakoudis, Tom Goldstein, Arjun Guha, Steven Hahn, Costin Iancu, Zheming Jin, Terry Jones, Tze Meng Low, Het Mankad, Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Daniel Nichols, Konstantinos Parasyris, Swaroop Pophale, Pedro Valero-Lara, Jeffrey S. Vetter, Samuel Williams, Aaron Young | 2025-05-13 | Download | We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Toward Accessible and Safe Live Streaming Using Distributed Content Filtering with MoQ | Andrew C. Freeman | 2025-05-13 | Download | Live video streaming is increasingly popular on social media platforms. With the growth of live streaming comes an increased need for robust content moderation to remove dangerous, illegal, or otherwi... |
| Adaptive Entanglement Generation for Quantum Routing | Tasdiqul Islam, Md Arifuzzaman, Engin Arslan | 2025-05-13 | Download | Entanglement generation in long-distance quantum networks is a difficult process due to resource limitations and the probabilistic nature of entanglement swapping. |
| Towards Real-Time Interpolation for Enhanced AUV Deep Sea Mapping | Devanshu Saxena | 2025-05-13 | Download | Approximately seventy-one percent of the Earth is covered in water. Of that area, ninety-five percent of the ocean has never been explored or mapped. |
| Decoupling the Device and Identity in Cellular Networks with vSIM | Shirin Ebadi, Zach Moolman, Eric Keller, Tamara Lehman | 2025-05-13 | Download | Cellular networks are now fundamental infrastructure, powering not just smartphones for daily communication and commerce, but also enabling the expansion of IoT and edge computing through last-mile co... |
| AI-Driven Digital Twins: Optimizing 5G/6G Network Slicing with NTNs | Afan Ali, Huseyin Arslan | 2025-05-13 | Download | Network slicing in 5G/6G Non-Terrestrial Network (NTN) is confronted with mobility and traffic variability. An artificial intelligence (AI)-based digital twin (DT) architecture with deep reinforcement... |
| Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | Muhammad Saqib, Dipkumar Mehta, Fnu Yashu, Shubham Malhotra | 2025-05-13 | Download | The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity ... |
| Hybrid Wi-Fi/PDR Indoor Localization with Fingerprint Matching | Chunyi Zhang, Zongwei Li, Xiaoqi Li | 2025-05-13 | Download | Indoor position technology has become one of the research highlights in the Internet of Things (IoT), but there is still a lack of universal, low-cost, and high-precision solutions. |
| A Generalized Hierarchical Federated Learning Framework with Theoretical Guarantees | Seyed Mohammad Azimi-Abarghouyi, Carlo Fischione | 2025-05-13 | Download | Almost all existing hierarchical federated learning (FL) models are limited to two aggregation layers, restricting scalability and flexibility in complex, large-scale networks. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Geometric lower bounds for the steady-state occupancy of processing networks with limited connectivity | Diego Goldsztajn, Andres Ferragut | 2025-05-13 | Download | We consider processing networks where multiple dispatchers are connected to single-server queues by a bipartite compatibility graph, modeling constraints that are common in data centers and cloud netw... |
| Comparing Parallel Functional Array Languages: Programming and Performance | David van Balen, Tiziano De Matteis, Clemens Grelck, Troels Henriksen, Aaron W. Hsu, Gabriele K. Keller, Thomas Koopman, Trevor L. McDonell, Cosmin Oancea, Sven-Bodo Scholz, Artjoms Sinkarovs, Tom Smeding, Phil Trinder, Ivo Gabe de Wolff, Alexandros Nikolaos Ziogas | 2025-05-13 | Download | Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability. |
| Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles | Matteo Gallici, Ivan Masmitja, Mario Martín | 2025-05-13 | Download | Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs... |
| Leveraging AI for Productive and Trustworthy HPC Software: Challenges and Research Directions | Keita Teranishi, Harshitha Menon, William F. Godoy, Prasanna Balaprakash, David Bau, Tal Ben-Nun, Abhinav Bhatele, Franz Franchetti, Michael Franusich, Todd Gamblin, Giorgis Georgakoudis, Tom Goldstein, Arjun Guha, Steven Hahn, Costin Iancu, Zheming Jin, Terry Jones, Tze Meng Low, Het Mankad, Narasinga Rao Miniskar, Mohammad Alaul Haque Monil, Daniel Nichols, Konstantinos Parasyris, Swaroop Pophale, Pedro Valero-Lara, Jeffrey S. Vetter, Samuel Williams, Aaron Young | 2025-05-13 | Download | We discuss the challenges and propose research directions for using AI to revolutionize the development of high-performance computing (HPC) software. |