
2025-08-29

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Neural Network Acceleration on MPSoC board: Integrating SLAC's SNL, Rogue Software and Auto-SNL | Hamza Ezzaoui Rahali, Abhilasha Dave, Larry Ruckman, Mohammad Mehdi Rahimifar, Audrey C. Therrien, James J. Russel, Ryan T. Herbst | 2025-08-29 | Download | The LCLS-II Free Electron Laser (FEL) will generate X-ray pulses for beamline experiments at rates of up to 1 MHz, with detectors producing data throughputs exceeding 1 TB/s. |
| Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators | Wenyong Zhou, Zhengwu Liu, Yuan Ren, Ngai Wong | 2025-08-29 | Download | Compute-in-memory (CIM) accelerators have emerged as a promising way for enhancing the energy efficiency of convolutional neural networks (CNNs). |
| SIRA: Scaled-Integer Range Analysis for Optimizing FPGA Dataflow Neural Network Accelerators | Yaman Umuroglu, Christoph Berganski, Felix Jentzsch, Michal Danilowicz, Tomasz Kryjak, Charalampos Bezaitis, Magnus Sjalander, Ian Colbert, Thomas Preusser, Jakoba Petri-Koenig, Michaela Blott | 2025-08-29 | Download | While neural network quantization effectively reduces the cost of matrix multiplications, aggressive quantization can expose non-matrix-multiply operations as significant performance and resource bott... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition | Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Yongseok Soh, Jesmin Jahan Tithi, Fabrizio Petrini, Jee Choi | 2025-08-29 | Download | Tensor decomposition (TD) is essential for analyzing high-dimensional sparse data, yet its irregular computations and memory-access patterns pose major performance challenges on modern parallel proces... |
| Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference | Ruokai Yin, Sattwik Deb Mishra, Xuan Zuo, Hokchhay Tann, Preyas Shah, Apala Guha | 2025-08-29 | Download | Distributed LLM inference requires careful coordination of parallelization strategies across hundreds to thousands of NPUs to meet production SLOs. |
| Accelerating Mixture-of-Experts Inference by Hiding Offloading Latency with Speculative Decoding | Zhibin Wang, Zhonghui Zhang, Yuhang Zhou, Zibo Wang, Mo Zhou, Peng Jiang, Weilin Cai, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian | 2025-08-29 | Download | Recent advancements in Mixture of Experts (MoE) models have significantly increased their parameter scale as well as model performance. Extensive offloading techniques have been proposed to address th... |
| Odyssey: Adaptive Policy Selection for Resilient Distributed Training | Yuhang Zhou, Zhibin Wang, Peng Jiang, Haoran Xia, Junhe Lu, Qianyu Jiang, Rong Gu, Hengxi Xu, Xinjing Huang, Guanghuan Fang, Zhiheng Hu, Jingyi Zhang, Yongjin Cai, Jian He, Chen Tian | 2025-08-29 | Download | Training large language models faces frequent interruptions due to various faults, demanding robust fault-tolerance. Existing backup-free methods, such as redundant computation, dynamic parallelism, a... |
| Unpacking Maximum Extractable Value on Polygon: A Study on Atomic Arbitrage | Daniil Vostrikov, Yash Madhwal, Andrey Seoev, Anastasiia Smirnova, Yury Yanovich, Alexey Smirnov, Vladimir Gorgadze | 2025-08-29 | Download | The evolution of blockchain technology, from its origins as a decentralized ledger for cryptocurrencies to its broader applications in areas like decentralized finance (DeFi), has significantly transf... |
| An Optimistic Gradient Tracking Method for Distributed Minimax Optimization | Yan Huang, Jinming Xu, Jiming Chen, Karl Henrik Johansson | 2025-08-29 | Download | This paper studies the distributed minimax optimization problem over networks. To enhance convergence performance, we propose a distributed optimistic gradient tracking method, termed DOGT, which solv... |
| A Knowledge Distillation-empowered Adaptive Federated Reinforcement Learning Framework for Multi-Domain IoT Applications Scheduling | Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya | 2025-08-29 | Download | The rapid proliferation of Internet of Things (IoT) applications across heterogeneous Cloud-Edge-IoT environments presents significant challenges in distributed scheduling optimization. |
| Addressing Reproducibility Challenges in HPC with Continuous Integration | Valérie Hayot-Sasson, Nathaniel Hudson, André Bauer, Maxime Gonthier, Ian Foster, Kyle Chard | 2025-08-29 | Download | The high-performance computing (HPC) community has adopted incentive structures to motivate reproducible research, with major conferences awarding badges to papers that meet reproducibility requiremen... |
| Decentralized Federated Averaging via Random Walk | Changheng Wang, Zhiqing Wei, Lizhe Liu, Qiao Deng, Yingda Wu, Yangyang Niu, Yashan Pang, Zhiyong Feng | 2025-08-29 | Download | Federated Learning (FL) is a communication-efficient distributed machine learning method that allows multiple devices to collaboratively train models without sharing raw data. |
| On the Optimization of Methods for Establishing Well-Connected Communities | Mohammad Dindoost, Oliver Alvarado Rodriguez, Bartosz Bryg, Minhyuk Park, George Chacko, Tandy Warnow, David A. Bader | 2025-08-29 | Download | Community detection plays a central role in uncovering meso scale structures in networks. However, existing methods often suffer from disconnected or weakly connected clusters, undermining interpretab... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| QoS-Aware Proportional Fairness Scheduling for Multi-Flow 5G UEs: A Smart Factory Perspective | Mohamed Seliem, Utz Roedig, Cormac Sreenan, Dirk Pesch | 2025-08-29 | Download | Private 5G networks are emerging as key enablers for smart factories, where a single device often handles multiple concurrent traffic flows with distinct Quality of Service (QoS) requirements. |
| VOTA: Parallelizing 6G-RAN Experimentation with Virtualized Over-The-Air Workloads | Chang Liu, T. D. Khoa Le, Rahul Saini, Kishor C. Joshi, George Exarchakos | 2025-08-29 | Download | Testbed sharing, a practice in which different researchers concurrently develop independent use cases on top of the same testbed, is ubiquitous in wireless experimental research. |
| Generalized Encrypted Traffic Classification Using Inter-Flow Signals | Federica Bianchi, Edoardo Di Paolo, Angelo Spognardi | 2025-08-29 | Download | In this paper, we present a novel encrypted traffic classification model that operates directly on raw PCAP data without requiring prior assumptions about traffic type. |
| A Combined Push-Pull Access Framework for Digital Twin Alignment and Anomaly Reporting | Federico Chiariotti, Fabio Saggese, Andrea Munari, Leonardo Badia, Petar Popovski | 2025-08-29 | Download | A digital twin (DT) contains a set of virtual models of real systems and processes that are synchronized to their physical counterparts. This enables experimentation and examination of counterfactuals... |
| Towards a Decentralized IoT Onboarding for Smart Homes Using Consortium Blockchain | Narges Dadkhah, Khan Reaz, Gerhard Wunder | 2025-08-29 | Download | The increasing adoption of smart home devices and IoT-based security systems presents significant opportunities to enhance convenience, safety, and risk management for homeowners and service providers... |
| Synergetic Empowerment: Wireless Communications Meets Embodied Intelligence | Hongtao Liang, Yihe Diao, YuHang Wu, Fuhui Zhou, Qihui Wu | 2025-08-29 | Download | Wireless communication is evolving into an agent era, where large-scale agents with inherent embodied intelligence are not just users but active participants. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ReLATE: Learning Efficient Sparse Encoding for High-Performance Tensor Decomposition | Ahmed E. Helal, Fabio Checconi, Jan Laukemann, Yongseok Soh, Jesmin Jahan Tithi, Fabrizio Petrini, Jee Choi | 2025-08-29 | Download | Tensor decomposition (TD) is essential for analyzing high-dimensional sparse data, yet its irregular computations and memory-access patterns pose major performance challenges on modern parallel proces... |
