2025-08-29

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL	Hamza Ezzaoui Rahali, Abhilasha Dave, Larry Ruckman, Mohammad Mehdi Rahimifar, Audrey C. Therrien, James J. Russel, Ryan T. Herbst	2025-08-29	Download	The LCLS-II Free Electron Laser (FEL) will generate X-ray pulses for beamline experiments at rates of up to 1 MHz, with detectors producing data throughputs exceeding 1 TB/s.
Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators	Wenyong Zhou, Zhengwu Liu, Yuan Ren, Ngai Wong	2025-08-29	Download	Compute-in-memory (CIM) accelerators have emerged as a promising way for enhancing the energy efficiency of convolutional neural networks (CNNs).
SIRA: Scaled-Integer Range Analysis for Optimizing FPGA Dataflow Neural Network Accelerators	Yaman Umuroglu, Christoph Berganski, Felix Jentzsch, Michal Danilowicz, Tomasz Kryjak, Charalampos Bezaitis, Magnus Sjalander, Ian Colbert, Thomas Preusser, Jakoba Petri-Koenig, Michaela Blott	2025-08-29	Download	While neural network quantization effectively reduces the cost of matrix multiplications, aggressive quantization can expose non-matrix-multiply operations as significant performance and resource bott...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition	Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Yongseok Soh, Jesmin Jahan Tithi, Fabrizio Petrini, Jee Choi	2025-08-29	Download	Tensor decomposition (TD) is essential for analyzing high-dimensional sparse data, yet its irregular computations and memory-access patterns pose major performance challenges on modern parallel proces...
Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference	Ruokai Yin, Sattwik Deb Mishra, Xuan Zuo, Hokchhay Tann, Preyas Shah, Apala Guha	2025-08-29	Download	Distributed LLM inference requires careful coordination of parallelization strategies across hundreds to thousands of NPUs to meet production SLOs.
Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding	Zhibin Wang, Zhonghui Zhang, Yuhang Zhou, Zibo Wang, Mo Zhou, Peng Jiang, Weilin Cai, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian	2025-08-29	Download	Recent advancements in Mixture of Experts (MoE) models have significantly increased their parameter scale as well as model performance. Extensive offloading techniques have been proposed to address th...
Odyssey: Adaptive Policy Selection for Resilient Distributed Training	Yuhang Zhou, Zhibin Wang, Peng Jiang, Haoran Xia, Junhe Lu, Qianyu Jiang, Rong Gu, Hengxi Xu, Xinjing Huang, Guanghuan Fang, Zhiheng Hu, Jingyi Zhang, Yongjin Cai, Jian He, Chen Tian	2025-08-29	Download	Training large language models faces frequent interruptions due to various faults, demanding robust fault-tolerance. Existing backup-free methods, such as redundant computation, dynamic parallelism, a...
Unpacking Maximum Extractable Value on Polygon: A Study on Atomic Arbitrage	Daniil Vostrikov, Yash Madhwal, Andrey Seoev, Anastasiia Smirnova, Yury Yanovich, Alexey Smirnov, Vladimir Gorgadze	2025-08-29	Download	The evolution of blockchain technology, from its origins as a decentralized ledger for cryptocurrencies to its broader applications in areas like decentralized finance (DeFi), has significantly transf...
An Optimistic Gradient Tracking Method for Distributed Minimax Optimization	Yan Huang, Jinming Xu, Jiming Chen, Karl Henrik Johansson	2025-08-29	Download	This paper studies the distributed minimax optimization problem over networks. To enhance convergence performance, we propose a distributed optimistic gradient tracking method, termed DOGT, which solv...
A Knowledge Distillation-empowered Adaptive Federated Reinforcement Learning Framework for Multi-Domain IoT Applications Scheduling	Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya	2025-08-29	Download	The rapid proliferation of Internet of Things (IoT) applications across heterogeneous Cloud-Edge-IoT environments presents significant challenges in distributed scheduling optimization.
Addressing Reproducibility Challenges in HPC with Continuous Integration	Valérie Hayot-Sasson, Nathaniel Hudson, André Bauer, Maxime Gonthier, Ian Foster, Kyle Chard	2025-08-29	Download	The high-performance computing (HPC) community has adopted incentive structures to motivate reproducible research, with major conferences awarding badges to papers that meet reproducibility requiremen...
Decentralized Federated Averaging via Random Walk	Changheng Wang, Zhiqing Wei, Lizhe Liu, Qiao Deng, Yingda Wu, Yangyang Niu, Yashan Pang, Zhiyong Feng	2025-08-29	Download	Federated Learning (FL) is a communication-efficient distributed machine learning method that allows multiple devices to collaboratively train models without sharing raw data.
On the Optimization of Methods for Establishing Well-Connected Communities	Mohammad Dindoost, Oliver Alvarado Rodriguez, Bartosz Bryg, Minhyuk Park, George Chacko, Tandy Warnow, David A. Bader	2025-08-29	Download	Community detection plays a central role in uncovering meso scale structures in networks. However, existing methods often suffer from disconnected or weakly connected clusters, undermining interpretab...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
QoS-Aware Proportional Fairness Scheduling for Multi-Flow 5G UEs: A Smart Factory Perspective	Mohamed Seliem, Utz Roedig, Cormac Sreenan, Dirk Pesch	2025-08-29	Download	Private 5G networks are emerging as key enablers for smart factories, where a single device often handles multiple concurrent traffic flows with distinct Quality of Service (QoS) requirements.
VOTA: Parallelizing 6G-RAN Experimentation with Virtualized Over-The-Air Workloads	Chang Liu, T. D. Khoa Le, Rahul Saini, Kishor C. Joshi, George Exarchakos	2025-08-29	Download	Testbed sharing, a practice in which different researchers concurrently develop independent use cases on top of the same testbed, is ubiquitous in wireless experimental research.
Generalized Encrypted Traffic Classification Using Inter-Flow Signals	Federica Bianchi, Edoardo Di Paolo, Angelo Spognardi	2025-08-29	Download	In this paper, we present a novel encrypted traffic classification model that operates directly on raw PCAP data without requiring prior assumptions about traffic type.
A Combined Push-Pull Access Framework for Digital Twin Alignment and Anomaly Reporting	Federico Chiariotti, Fabio Saggese, Andrea Munari, Leonardo Badia, Petar Popovski	2025-08-29	Download	A digital twin (DT) contains a set of virtual models of real systems and processes that are synchronized to their physical counterparts. This enables experimentation and examination of counterfactuals...
Towards a Decentralized IoT Onboarding for Smart Homes Using Consortium Blockchain	Narges Dadkhah, Khan Reaz, Gerhard Wunder	2025-08-29	Download	The increasing adoption of smart home devices and IoT-based security systems presents significant opportunities to enhance convenience, safety, and risk management for homeowners and service providers...
Synergetic Empowerment: Wireless Communications Meets Embodied Intelligence	Hongtao Liang, Yihe Diao, YuHang Wu, Fuhui Zhou, Qihui Wu	2025-08-29	Download	Wireless communication is evolving into an agent era, where large-scale agents with inherent embodied intelligence are not just users but active participants.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition	Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Yongseok Soh, Jesmin Jahan Tithi, Fabrizio Petrini, Jee Choi	2025-08-29	Download	Tensor decomposition (TD) is essential for analyzing high-dimensional sparse data, yet its irregular computations and memory-access patterns pose major performance challenges on modern parallel proces...