2025-08-29
cs.AR - Hardware Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL | Hamza Ezzaoui Rahali, Abhilasha Dave, Larry Ruckman, Mohammad Mehdi Rahimifar, Audrey C. Therrien, James J. Russel, Ryan T. Herbst | 2025-08-29 | Download | The LCLS-II Free Electron Laser (FEL) will generate X-ray pulses for beamline experiments at rates of up to 1 MHz, with detectors producing data throughputs exceeding 1 TB/s. |
| Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators | Wenyong Zhou, Zhengwu Liu, Yuan Ren, Ngai Wong | 2025-08-29 | Download | Compute-in-memory (CIM) accelerators have emerged as a promising way for enhancing the energy efficiency of convolutional neural networks (CNNs). |
| SIRA: Scaled-Integer Range Analysis for Optimizing FPGA Dataflow Neural Network Accelerators | Yaman Umuroglu, Christoph Berganski, Felix Jentzsch, Michal Danilowicz, Tomasz Kryjak, Charalampos Bezaitis, Magnus Sjalander, Ian Colbert, Thomas Preusser, Jakoba Petri-Koenig, Michaela Blott | 2025-08-29 | Download | While neural network quantization effectively reduces the cost of matrix multiplications, aggressive quantization can expose non-matrix-multiply operations as significant performance and resource bott... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition | Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Yongseok Soh, Jesmin Jahan Tithi, Fabrizio Petrini, Jee Choi | 2025-08-29 | Download | Tensor decomposition (TD) is essential for analyzing high-dimensional sparse data, yet its irregular computations and memory-access patterns pose major performance challenges on modern parallel proces... |
| Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference | Ruokai Yin, Sattwik Deb Mishra, Xuan Zuo, Hokchhay Tann, Preyas Shah, Apala Guha | 2025-08-29 | Download | Distributed LLM inference requires careful coordination of parallelization strategies across hundreds to thousands of NPUs to meet production SLOs. |
| Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding | Zhibin Wang, Zhonghui Zhang, Yuhang Zhou, Zibo Wang, Mo Zhou, Peng Jiang, Weilin Cai, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian | 2025-08-29 | Download | Recent advancements in Mixture of Experts (MoE) models have significantly increased their parameter scale as well as model performance. Extensive offloading techniques have been proposed to address th... |
| Odyssey: Adaptive Policy Selection for Resilient Distributed Training | Yuhang Zhou, Zhibin Wang, Peng Jiang, Haoran Xia, Junhe Lu, Qianyu Jiang, Rong Gu, Hengxi Xu, Xinjing Huang, Guanghuan Fang, Zhiheng Hu, Jingyi Zhang, Yongjin Cai, Jian He, Chen Tian | 2025-08-29 | Download | Training large language models faces frequent interruptions due to various faults, demanding robust fault-tolerance. Existing backup-free methods, such as redundant computation, dynamic parallelism, a... |
| Unpacking Maximum Extractable Value on Polygon: A Study on Atomic Arbitrage | Daniil Vostrikov, Yash Madhwal, Andrey Seoev, Anastasiia Smirnova, Yury Yanovich, Alexey Smirnov, Vladimir Gorgadze | 2025-08-29 | Download | The evolution of blockchain technology, from its origins as a decentralized ledger for cryptocurrencies to its broader applications in areas like decentralized finance (DeFi), has significantly transf... |
| An Optimistic Gradient Tracking Method for Distributed Minimax Optimization | Yan Huang, Jinming Xu, Jiming Chen, Karl Henrik Johansson | 2025-08-29 | Download | This paper studies the distributed minimax optimization problem over networks. To enhance convergence performance, we propose a distributed optimistic gradient tracking method, termed DOGT, which solv... |
| A Knowledge Distillation-empowered Adaptive Federated Reinforcement Learning Framework for Multi-Domain IoT Applications Scheduling | Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya | 2025-08-29 | Download | The rapid proliferation of Internet of Things (IoT) applications across heterogeneous Cloud-Edge-IoT environments presents significant challenges in distributed scheduling optimization. |
| Addressing Reproducibility Challenges in HPC with Continuous Integration | Valérie Hayot-Sasson, Nathaniel Hudson, André Bauer, Maxime Gonthier, Ian Foster, Kyle Chard | 2025-08-29 | Download | The high-performance computing (HPC) community has adopted incentive structures to motivate reproducible research, with major conferences awarding badges to papers that meet reproducibility requiremen... |
| Decentralized Federated Averaging via Random Walk | Changheng Wang, Zhiqing Wei, Lizhe Liu, Qiao Deng, Yingda Wu, Yangyang Niu, Yashan Pang, Zhiyong Feng | 2025-08-29 | Download | Federated Learning (FL) is a communication-efficient distributed machine learning method that allows multiple devices to collaboratively train models without sharing raw data. |
| On the Optimization of Methods for Establishing Well-Connected Communities | Mohammad Dindoost, Oliver Alvarado Rodriguez, Bartosz Bryg, Minhyuk Park, George Chacko, Tandy Warnow, David A. Bader | 2025-08-29 | Download | Community detection plays a central role in uncovering meso scale structures in networks. However, existing methods often suffer from disconnected or weakly connected clusters, undermining interpretab... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| QoS-Aware Proportional Fairness Scheduling for Multi-Flow 5G UEs: A Smart Factory Perspective | Mohamed Seliem, Utz Roedig, Cormac Sreenan, Dirk Pesch | 2025-08-29 | Download | Private 5G networks are emerging as key enablers for smart factories, where a single device often handles multiple concurrent traffic flows with distinct Quality of Service (QoS) requirements. |
| VOTA: Parallelizing 6G-RAN Experimentation with Virtualized Over-The-Air Workloads | Chang Liu, T. D. Khoa Le, Rahul Saini, Kishor C. Joshi, George Exarchakos | 2025-08-29 | Download | Testbed sharing, a practice in which different researchers concurrently develop independent use cases on top of the same testbed, is ubiquitous in wireless experimental research. |
| Generalized Encrypted Traffic Classification Using Inter-Flow Signals | Federica Bianchi, Edoardo Di Paolo, Angelo Spognardi | 2025-08-29 | Download | In this paper, we present a novel encrypted traffic classification model that operates directly on raw PCAP data without requiring prior assumptions about traffic type. |
| A Combined Push-Pull Access Framework for Digital Twin Alignment and Anomaly Reporting | Federico Chiariotti, Fabio Saggese, Andrea Munari, Leonardo Badia, Petar Popovski | 2025-08-29 | Download | A digital twin (DT) contains a set of virtual models of real systems and processes that are synchronized to their physical counterparts. This enables experimentation and examination of counterfactuals... |
| Towards a Decentralized IoT Onboarding for Smart Homes Using Consortium Blockchain | Narges Dadkhah, Khan Reaz, Gerhard Wunder | 2025-08-29 | Download | The increasing adoption of smart home devices and IoT-based security systems presents significant opportunities to enhance convenience, safety, and risk management for homeowners and service providers... |
| Synergetic Empowerment: Wireless Communications Meets Embodied Intelligence | Hongtao Liang, Yihe Diao, YuHang Wu, Fuhui Zhou, Qihui Wu | 2025-08-29 | Download | Wireless communication is evolving into an agent era, where large-scale agents with inherent embodied intelligence are not just users but active participants. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition | Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Yongseok Soh, Jesmin Jahan Tithi, Fabrizio Petrini, Jee Choi | 2025-08-29 | Download | Tensor decomposition (TD) is essential for analyzing high-dimensional sparse data, yet its irregular computations and memory-access patterns pose major performance challenges on modern parallel proces... |