2025-06-10

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Low-Level and NUMA-Aware Optimization for High-Performance Quantum Simulation	Ali Rezaei, Luc Jaulmes, Maria Bahna, Oliver Thomson Brown, Antonio Barbalace	2025-06-10	Download	Scalable classical simulation of quantum circuits is crucial for advancing quantum algorithm development and validating emerging hardware. This work focuses on performance enhancements through targete...
Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU	Petar Jakuš, Hrvoje Džapo	2025-06-10	Download	This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-con...
STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design	Kainan Wang, Chengyi Yang, Chengting Yu, Yee Sin Ang, Bo Wang, Aili Wang	2025-06-10	Download	Brain-inspired Spiking Neural Networks (SNNs) have attracted attention for their event-driven characteristics and high energy efficiency. However, the temporal dependency and irregularity of spikes pr...
POLARON: Precision-aware On-device Learning and Adaptive Runtime-cONfigurable AI acceleration	Mukul Lokhande, Santosh Kumar Vishvakarma	2025-06-10	Download	The increasing complexity of AI models requires flexible hardware capable of supporting diverse precision formats, particularly for energy-constrained edge platforms.
CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA	Jiale Dong, Hao Wu, Zihao Wang, Wenqi Lou, Zhendong Zheng, Lei Gong, Chao Wang, Xuehai Zhou	2025-06-10	Download	Vision Transformers (ViTs) exhibit superior performance in computer vision tasks but face deployment challenges on resource-constrained devices due to high computational/memory demands.
ABC-FHE: A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption	Sungwoong Yune, Hyojeong Lee, Adiwena Putra, Hyunjun Cho, Cuong Duong Manh, Jaeho Jeon, Joo-Young Kim	2025-06-10	Download	As the demand for privacy-preserving computation continues to grow, fully homomorphic encryption (FHE), which enables continuous computation on encrypted data, has become a critical solution.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs	Dhruv Parikh, Viktor Prasanna	2025-06-10	Download	Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents and manipulates information using high-dimensional vectors, called hypervectors (HV).
TTrace: Lightweight Error Checking and Diagnosis for Distributed Training	Haitian Jiang, Shaowei Zhu, Zhen Zhang, Zhenyu Song, Xinwei Fu, Zhen Jia, Yida Wang, Jinyang Li	2025-06-10	Download	Distributed training is essential for scaling the training of large neural network models, such as large language models (LLMs), across thousands of GPUs.
A Survey of End-to-End Modeling for Distributed DNN Training: Workloads, Simulators, and TCO	Jonas Svedas, Hannah Watson, Nathan Laubeuf, Diksha Moolchandani, Abubakr Nada, Arjun Singh, Dwaipayan Biswas, James Myers, Debjyoti Bhattacharjee	2025-06-10	Download	Distributed deep neural networks (DNNs) have become a cornerstone for scaling machine learning to meet the demands of increasingly complex applications.
Multi-GPU Acceleration of PALABOS Fluid Solver using C++ Standard Parallelism	Jonas Latt, Christophe Coreixas	2025-06-10	Download	This article presents the principles, software architecture, and performance analysis of the GPU port of the lattice Boltzmann software library Palabos (J. Latt et al.
Terabyte-Scale Analytics in the Blink of an Eye	Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen	2025-06-10	Download	For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of...
FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models	Hariharan Ramesh, Jyotikrishna Dass	2025-06-10	Download	Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data.
Mycelium: A Transformation-Embedded LSM-Tree	Holly Casaletto, Jeff Lefevre, Aldrin Montana, Peter Alvaro	2025-06-10	Download	Compaction is a necessary, but often costly background process in write-optimized data structures like LSM-trees that reorganizes incoming data that is sequentially appended to logs.
Balancing Fixed Number of Nodes Among Multiple Fixed Clusters	Paritosh Ranjan, Surajit Majumder, Prodip Roy, Bhuban Padhan	2025-06-10	Download	Cloud infrastructure users often allocate a fixed number of nodes to individual container clusters (e.g., Kubernetes, OpenShift), resulting in underutilization of computing resources due to asynchrono...
Synchronization in Anonymous Networks Under Arbitrary Dynamics	Rida Bazzi, Cameron Bickley, Anya Chaturvedi, Andréa W. Richa, Peter Vargas	2025-06-10	Download	We present the δ-Synchronizer, which works in non-synchronous dynamic networks under minimal assumptions. Our model allows for arbitrary topological changes without any guarantee of eventual global ...
Parallel FFTW on RISC-V: A Comparative Study including OpenMP, MPI, and HPX	Alexander Strack, Christopher Taylor, Dirk Pflüger	2025-06-10	Download	Rapid advancements in RISC-V hardware development shift the focus from low-level optimizations to higher-level parallelization. Recent RISC-V processors, such as the SOPHON SG2042, have 64 cores.
Blockchain and Edge Computing Nexus: A Large-scale Systematic Literature Review	Zeinab Nezami, Zhuolun Li, Chuhao Qin, Fatemeh Banaie, Rabiya Khalid, Evangelos Pournaras	2025-06-10	Download	Blockchain and edge computing are two instrumental paradigms of decentralized computation, driving key advancements in Smart Cities applications such as supply chain, energy and mobility.
Towards Provenance-Aware Earth Observation Workflows: the openEO Case Study	H. Omidi, L. Sacco, V. Hutter, G. Irsiegler, M. Claus, M. Schobben, A. Jacob, M. Schramm, S. Fiore	2025-06-10	Download	Capturing the history of operations and activities during a computational workflow is significantly important for Earth Observation (EO). The data provenance helps to collect the metadata that records...
EROICA: Online Performance Troubleshooting for Large-scale Model Training	Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai	2025-06-10	Download	Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and t...
Low-resource domain adaptation while minimizing energy and hardware resource consumption	Hernán Maina, Nicolás Wolovick, Luciana Benotti	2025-06-10	Download	Training Large Language Models (LLMs) is costly in terms of energy, hardware, and annotated data, often resulting in a positionality rooted in predominant cultures and values (Santy et al., 2023).
HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems	Zheng Lin, Zhe Chen, Xianhao Chen, Wei Ni, Yue Gao	2025-06-10	Download	Split federated learning (SFL) has emerged as a promising paradigm to democratize machine learning (ML) on edge devices by enabling layer-wise model partitioning.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Multi-Armed Bandit Framework for Online Optimisation in Green Integrated Terrestrial and Non-Terrestrial Networks	Henri Alam, Antonio de Domenico, Tareq Si Salem, Florian Kaltenberger	2025-06-10	Download	Integrated terrestrial and non-terrestrial network (TN-NTN) architectures offer a promising solution for expanding coverage and improving capacity for the network.
Age of Information in Unreliable Tandem Queues	Muthukrishnan Senthilkumar, Aresh Dadlani, Hina Tabassum	2025-06-10	Download	Stringent demands for timely information delivery, driven by the widespread adoption of real-time applications and the Internet of Things, have established the age of information (AoI) as a critical m...
Adaptive Bandwidth Sharing for Optimizing QoE of Real-Time Video	Sushi Anna George, Vinay Joseph	2025-06-10	Download	The concept of spectrum or bandwidth sharing has gained significant global attention as a means to enhance the efficiency of real-time traffic management in wireless networks.
MOSE: A Novel Orchestration Framework for Stateful Microservice Migration at the Edge	Antonio Calagna, Yenchia Yu, Paolo Giaccone, Carla Fabiana Chiasserini	2025-06-10	Download	Stateful migration has emerged as the dominant technology to support microservice mobility at the network edge while ensuring a satisfying experience to mobile end users.
Deep Reinforcement Learning-Based RAN Slicing with Efficient Inter-Slice Isolation in Tactical Wireless Networks	Abderrahime Filali, Diala Naboulsi, Georges Kaddoum	2025-06-10	Download	The next generation of tactical networks (TNs) is poised to further leverage the key enablers of 5G and beyond 5G (B5G) technology, such as radio access network (RAN) slicing and the open RAN (O-RAN) ...
When Simple Model Just Works: Is Network Traffic Classification in Crisis?	Kamil Jerabek, Jan Luxemburk, Richard Plny, Josef Koumar, Jaroslav Pesek, Karel Hynek	2025-06-10	Download	Machine learning has been applied to network traffic classification (TC) for over two decades. While early efforts used shallow models, the latter 2010s saw a shift toward complex neural networks, oft...
Aerial Shepherds: Enabling Hierarchical Localization in Heterogeneous MAV Swarms	Haoyang Wang, Jingao Xu, Chenyu Zhao, Yuhan Cheng, Xuecheng Chen, Chaopeng Hong, Xiao-Ping Zhang, Yunhao Liu, Xinlei Chen	2025-06-10	Download	A heterogeneous micro aerial vehicles (MAV) swarm consists of resource-intensive but expensive advanced MAVs (AMAVs) and resource-limited but cost-effective basic MAVs (BMAVs), offering opportunities ...
5G Aero: A Prototyping Platform for Evaluating Aerial 5G Communications	Matteo Bordin, Madhukara S. Holla, Sakthivel Velumani, Salvatore D'Oro, Tommaso Melodia	2025-06-10	Download	The application of small-factor, 5G-enabled Unmanned Aerial Vehicles (UAVs) has recently gained significant interest in various aerial and Industry 4.0 applications.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
EROICA: Online Performance Troubleshooting for Large-scale Model Training	Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai	2025-06-10	Download	Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and t...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
GPU-accelerated Modeling of Biological Regulatory Networks	Joyce Reimer, Pranta Saha, Chris Chen, Neeraj Dhar, Brook Byrns, Steven Rayan, Gordon Broderick	2025-06-10	Download	The complex regulatory dynamics of a biological network can be succinctly captured using discrete logic models. Given even sparse time-course data from the system of interest, previous work has shown ...
Terabyte-Scale Analytics in the Blink of an Eye	Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen	2025-06-10	Download	For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of...
Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU	Petar Jakuš, Hrvoje Džapo	2025-06-10	Download	This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-con...
A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search	Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney	2025-06-10	Download	Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field's rec...