
2025-06-10

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
Low-Level and NUMA-Aware Optimization for High-Performance Quantum Simulation | Ali Rezaei, Luc Jaulmes, Maria Bahna, Oliver Thomson Brown, Antonio Barbalace | 2025-06-10 | Download | Scalable classical simulation of quantum circuits is crucial for advancing quantum algorithm development and validating emerging hardware. This work focuses on performance enhancements through targete...
Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU | Petar Jakuš, Hrvoje Džapo | 2025-06-10 | Download | This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-con...
STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design | Kainan Wang, Chengyi Yang, Chengting Yu, Yee Sin Ang, Bo Wang, Aili Wang | 2025-06-10 | Download | Brain-inspired Spiking Neural Networks (SNNs) have attracted attention for their event-driven characteristics and high energy efficiency. However, the temporal dependency and irregularity of spikes pr...
POLARON: Precision-aware On-device Learning and Adaptive Runtime-cONfigurable AI acceleration | Mukul Lokhande, Santosh Kumar Vishvakarma | 2025-06-10 | Download | The increasing complexity of AI models requires flexible hardware capable of supporting diverse precision formats, particularly for energy-constrained edge platforms.
CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA | Jiale Dong, Hao Wu, Zihao Wang, Wenqi Lou, Zhendong Zheng, Lei Gong, Chao Wang, Xuehai Zhou | 2025-06-10 | Download | Vision Transformers (ViTs) exhibit superior performance in computer vision tasks but face deployment challenges on resource-constrained devices due to high computational/memory demands.
ABC-FHE: A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption | Sungwoong Yune, Hyojeong Lee, Adiwena Putra, Hyunjun Cho, Cuong Duong Manh, Jaeho Jeon, Joo-Young Kim | 2025-06-10 | Download | As the demand for privacy-preserving computation continues to grow, fully homomorphic encryption (FHE), which enables continuous computation on encrypted data, has become a critical solution.

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs | Dhruv Parikh, Viktor Prasanna | 2025-06-10 | Download | Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents and manipulates information using high-dimensional vectors, called hypervectors (HV).
TTrace: Lightweight Error Checking and Diagnosis for Distributed Training | Haitian Jiang, Shaowei Zhu, Zhen Zhang, Zhenyu Song, Xinwei Fu, Zhen Jia, Yida Wang, Jinyang Li | 2025-06-10 | Download | Distributed training is essential for scaling the training of large neural network models, such as large language models (LLMs), across thousands of GPUs.
A Survey of End-to-End Modeling for Distributed DNN Training: Workloads, Simulators, and TCO | Jonas Svedas, Hannah Watson, Nathan Laubeuf, Diksha Moolchandani, Abubakr Nada, Arjun Singh, Dwaipayan Biswas, James Myers, Debjyoti Bhattacharjee | 2025-06-10 | Download | Distributed deep neural networks (DNNs) have become a cornerstone for scaling machine learning to meet the demands of increasingly complex applications.
Multi-GPU Acceleration of PALABOS Fluid Solver using C++ Standard Parallelism | Jonas Latt, Christophe Coreixas | 2025-06-10 | Download | This article presents the principles, software architecture, and performance analysis of the GPU port of the lattice Boltzmann software library Palabos (J. Latt et al....
Terabyte-Scale Analytics in the Blink of an Eye | Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen | 2025-06-10 | Download | For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of...
FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models | Hariharan Ramesh, Jyotikrishna Dass | 2025-06-10 | Download | Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data.
Mycelium: A Transformation-Embedded LSM-Tree | Holly Casaletto, Jeff Lefevre, Aldrin Montana, Peter Alvaro | 2025-06-10 | Download | Compaction is a necessary, but often costly, background process in write-optimized data structures like LSM-trees that reorganizes incoming data that is sequentially appended to logs.
Balancing Fixed Number of Nodes Among Multiple Fixed Clusters | Paritosh Ranjan, Surajit Majumder, Prodip Roy, Bhuban Padhan | 2025-06-10 | Download | Cloud infrastructure users often allocate a fixed number of nodes to individual container clusters (e.g., Kubernetes, OpenShift), resulting in underutilization of computing resources due to asynchrono...
Synchronization in Anonymous Networks Under Arbitrary Dynamics | Rida Bazzi, Cameron Bickley, Anya Chaturvedi, Andréa W. Richa, Peter Vargas | 2025-06-10 | Download | We present the δ-Synchronizer, which works in non-synchronous dynamic networks under minimal assumptions. Our model allows for arbitrary topological changes without any guarantee of eventual global ...
Parallel FFTW on RISC-V: A Comparative Study including OpenMP, MPI, and HPX | Alexander Strack, Christopher Taylor, Dirk Pflüger | 2025-06-10 | Download | Rapid advancements in RISC-V hardware development shift the focus from low-level optimizations to higher-level parallelization. Recent RISC-V processors, such as the SOPHON SG2042, have 64 cores.
Blockchain and Edge Computing Nexus: A Large-scale Systematic Literature Review | Zeinab Nezami, Zhuolun Li, Chuhao Qin, Fatemeh Banaie, Rabiya Khalid, Evangelos Pournaras | 2025-06-10 | Download | Blockchain and edge computing are two instrumental paradigms of decentralized computation, driving key advancements in Smart Cities applications such as supply chain, energy and mobility.
Towards Provenance-Aware Earth Observation Workflows: the openEO Case Study | H. Omidi, L. Sacco, V. Hutter, G. Irsiegler, M. Claus, M. Schobben, A. Jacob, M. Schramm, S. Fiore | 2025-06-10 | Download | Capturing the history of operations and activities during a computational workflow is significantly important for Earth Observation (EO). The data provenance helps to collect the metadata that records...
EROICA: Online Performance Troubleshooting for Large-scale Model Training | Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai | 2025-06-10 | Download | Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and t...
Low-resource domain adaptation while minimizing energy and hardware resource consumption | Hernán Maina, Nicolás Wolovick, Luciana Benotti | 2025-06-10 | Download | Training Large Language Models (LLMs) is costly in terms of energy, hardware, and annotated data, often resulting in a positionality rooted in predominant cultures and values (Santy et al., 2023).
HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems | Zheng Lin, Zhe Chen, Xianhao Chen, Wei Ni, Yue Gao | 2025-06-10 | Download | Split federated learning (SFL) has emerged as a promising paradigm to democratize machine learning (ML) on edge devices by enabling layer-wise model partitioning.

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
A Multi-Armed Bandit Framework for Online Optimisation in Green Integrated Terrestrial and Non-Terrestrial Networks | Henri Alam, Antonio de Domenico, Tareq Si Salem, Florian Kaltenberger | 2025-06-10 | Download | Integrated terrestrial and non-terrestrial network (TN-NTN) architectures offer a promising solution for expanding coverage and improving capacity for the network.
Age of Information in Unreliable Tandem Queues | Muthukrishnan Senthilkumar, Aresh Dadlani, Hina Tabassum | 2025-06-10 | Download | Stringent demands for timely information delivery, driven by the widespread adoption of real-time applications and the Internet of Things, have established the age of information (AoI) as a critical m...
Adaptive Bandwidth Sharing for Optimizing QoE of Real-Time Video | Sushi Anna George, Vinay Joseph | 2025-06-10 | Download | The concept of spectrum or bandwidth sharing has gained significant global attention as a means to enhance the efficiency of real-time traffic management in wireless networks.
MOSE: A Novel Orchestration Framework for Stateful Microservice Migration at the Edge | Antonio Calagna, Yenchia Yu, Paolo Giaccone, Carla Fabiana Chiasserini | 2025-06-10 | Download | Stateful migration has emerged as the dominant technology to support microservice mobility at the network edge while ensuring a satisfying experience to mobile end users.
Deep Reinforcement Learning-Based RAN Slicing with Efficient Inter-Slice Isolation in Tactical Wireless Networks | Abderrahime Filali, Diala Naboulsi, Georges Kaddoum | 2025-06-10 | Download | The next generation of tactical networks (TNs) is poised to further leverage the key enablers of 5G and beyond 5G (B5G) technology, such as radio access network (RAN) slicing and the open RAN (O-RAN) ...
When Simple Model Just Works: Is Network Traffic Classification in Crisis? | Kamil Jerabek, Jan Luxemburk, Richard Plny, Josef Koumar, Jaroslav Pesek, Karel Hynek | 2025-06-10 | Download | Machine learning has been applied to network traffic classification (TC) for over two decades. While early efforts used shallow models, the latter 2010s saw a shift toward complex neural networks, oft...
Aerial Shepherds: Enabling Hierarchical Localization in Heterogeneous MAV Swarms | Haoyang Wang, Jingao Xu, Chenyu Zhao, Yuhan Cheng, Xuecheng Chen, Chaopeng Hong, Xiao-Ping Zhang, Yunhao Liu, Xinlei Chen | 2025-06-10 | Download | A heterogeneous micro aerial vehicles (MAV) swarm consists of resource-intensive but expensive advanced MAVs (AMAVs) and resource-limited but cost-effective basic MAVs (BMAVs), offering opportunities ...
5G Aero: A Prototyping Platform for Evaluating Aerial 5G Communications | Matteo Bordin, Madhukara S. Holla, Sakthivel Velumani, Salvatore D'Oro, Tommaso Melodia | 2025-06-10 | Download | The application of small-factor, 5G-enabled Unmanned Aerial Vehicles (UAVs) has recently gained significant interest in various aerial and Industry 4.0 applications.

cs.OS - Operating Systems

Title | Authors | Published | PDF | Abstract
EROICA: Online Performance Troubleshooting for Large-scale Model Training | Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai | 2025-06-10 | Download | Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and t...

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
GPU-accelerated Modeling of Biological Regulatory Networks | Joyce Reimer, Pranta Saha, Chris Chen, Neeraj Dhar, Brook Byrns, Steven Rayan, Gordon Broderick | 2025-06-10 | Download | The complex regulatory dynamics of a biological network can be succinctly captured using discrete logic models. Given even sparse time-course data from the system of interest, previous work has shown ...
Terabyte-Scale Analytics in the Blink of an Eye | Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen | 2025-06-10 | Download | For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of...
Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU | Petar Jakuš, Hrvoje Džapo | 2025-06-10 | Download | This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-con...
A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search | Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney | 2025-06-10 | Download | Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field's rec...
