2025-06-10
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Low-Level and NUMA-Aware Optimization for High-Performance Quantum Simulation | Ali Rezaei, Luc Jaulmes, Maria Bahna, Oliver Thomson Brown, Antonio Barbalace | 2025-06-10 | Download | Scalable classical simulation of quantum circuits is crucial for advancing quantum algorithm development and validating emerging hardware. This work focuses on performance enhancements through targete... |
| Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU | Petar Jakuš, Hrvoje Džapo | 2025-06-10 | Download | This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-con... |
| STI-SNN: A 0.14 GOPS/W/PE Single-Timestep Inference FPGA-based SNN Accelerator with Algorithm and Hardware Co-Design | Kainan Wang, Chengyi Yang, Chengting Yu, Yee Sin Ang, Bo Wang, Aili Wang | 2025-06-10 | Download | Brain-inspired Spiking Neural Networks (SNNs) have attracted attention for their event-driven characteristics and high energy efficiency. However, the temporal dependency and irregularity of spikes pr... |
| POLARON: Precision-aware On-device Learning and Adaptive Runtime-cONfigurable AI acceleration | Mukul Lokhande, Santosh Kumar Vishvakarma | 2025-06-10 | Download | The increasing complexity of AI models requires flexible hardware capable of supporting diverse precision formats, particularly for energy-constrained edge platforms. |
| CoQMoE: Co-Designed Quantization and Computation Orchestration for Mixture-of-Experts Vision Transformer on FPGA | Jiale Dong, Hao Wu, Zihao Wang, Wenqi Lou, Zhendong Zheng, Lei Gong, Chao Wang, Xuehai Zhou | 2025-06-10 | Download | Vision Transformers (ViTs) exhibit superior performance in computer vision tasks but face deployment challenges on resource-constrained devices due to high computational/memory demands. |
| ABC-FHE : A Resource-Efficient Accelerator Enabling Bootstrappable Parameters for Client-Side Fully Homomorphic Encryption | Sungwoong Yune, Hyojeong Lee, Adiwena Putra, Hyunjun Cho, Cuong Duong Manh, Jaeho Jeon, Joo-Young Kim | 2025-06-10 | Download | As the demand for privacy-preserving computation continues to grow, fully homomorphic encryption (FHE), which enables continuous computation on encrypted data, has become a critical solution. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs | Dhruv Parikh, Viktor Prasanna | 2025-06-10 | Download | Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that represents and manipulates information using high-dimensional vectors, called hypervectors (HV). |
| TTrace: Lightweight Error Checking and Diagnosis for Distributed Training | Haitian Jiang, Shaowei Zhu, Zhen Zhang, Zhenyu Song, Xinwei Fu, Zhen Jia, Yida Wang, Jinyang Li | 2025-06-10 | Download | Distributed training is essential for scaling the training of large neural network models, such as large language models (LLMs), across thousands of GPUs. |
| A Survey of End-to-End Modeling for Distributed DNN Training: Workloads, Simulators, and TCO | Jonas Svedas, Hannah Watson, Nathan Laubeuf, Diksha Moolchandani, Abubakr Nada, Arjun Singh, Dwaipayan Biswas, James Myers, Debjyoti Bhattacharjee | 2025-06-10 | Download | Distributed deep neural networks (DNNs) have become a cornerstone for scaling machine learning to meet the demands of increasingly complex applications. |
| Multi-GPU Acceleration of PALABOS Fluid Solver using C++ Standard Parallelism | Jonas Latt, Christophe Coreixas | 2025-06-10 | Download | This article presents the principles, software architecture, and performance analysis of the GPU port of the lattice Boltzmann software library Palabos (J. Latt et al. |
| Terabyte-Scale Analytics in the Blink of an Eye | Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen | 2025-06-10 | Download | For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of... |
| FLoRIST: Singular Value Thresholding for Efficient and Accurate Federated Fine-Tuning of Large Language Models | Hariharan Ramesh, Jyotikrishna Dass | 2025-06-10 | Download | Integrating Low-Rank Adaptation (LoRA) into federated learning offers a promising solution for parameter-efficient fine-tuning of Large Language Models (LLMs) without sharing local data. |
| Mycelium: A Transformation-Embedded LSM-Tree | Holly Casaletto, Jeff Lefevre, Aldrin Montana, Peter Alvaro | 2025-06-10 | Download | Compaction is a necessary, but often costly background process in write-optimized data structures like LSM-trees that reorganizes incoming data that is sequentially appended to logs. |
| Balancing Fixed Number of Nodes Among Multiple Fixed Clusters | Paritosh Ranjan, Surajit Majumder, Prodip Roy, Bhuban Padhan | 2025-06-10 | Download | Cloud infrastructure users often allocate a fixed number of nodes to individual container clusters (e.g., Kubernetes, OpenShift), resulting in underutilization of computing resources due to asynchrono... |
| Synchronization in Anonymous Networks Under Arbitrary Dynamics | Rida Bazzi, Cameron Bickley, Anya Chaturvedi, Andréa W. Richa, Peter Vargas | 2025-06-10 | Download | We present the δ-Synchronizer, which works in non-synchronous dynamic networks under minimal assumptions. Our model allows for arbitrary topological changes without any guarantee of eventual global ... |
| Parallel FFTW on RISC-V: A Comparative Study including OpenMP, MPI, and HPX | Alexander Strack, Christopher Taylor, Dirk Pflüger | 2025-06-10 | Download | Rapid advancements in RISC-V hardware development shift the focus from low-level optimizations to higher-level parallelization. Recent RISC-V processors, such as the SOPHON SG2042, have 64 cores. |
| Blockchain and Edge Computing Nexus: A Large-scale Systematic Literature Review | Zeinab Nezami, Zhuolun Li, Chuhao Qin, Fatemeh Banaie, Rabiya Khalid, Evangelos Pournaras | 2025-06-10 | Download | Blockchain and edge computing are two instrumental paradigms of decentralized computation, driving key advancements in Smart Cities applications such as supply chain, energy and mobility. |
| Towards Provenance-Aware Earth Observation Workflows: the openEO Case Study | H. Omidi, L. Sacco, V. Hutter, G. Irsiegler, M. Claus, M. Schobben, A. Jacob, M. Schramm, S. Fiore | 2025-06-10 | Download | Capturing the history of operations and activities during a computational workflow is significantly important for Earth Observation (EO). The data provenance helps to collect the metadata that records... |
| EROICA: Online Performance Troubleshooting for Large-scale Model Training | Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai | 2025-06-10 | Download | Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and t... |
| Low-resource domain adaptation while minimizing energy and hardware resource consumption | Hernán Maina, Nicolás Wolovick, Luciana Benotti | 2025-06-10 | Download | Training Large Language Models (LLMs) is costly in terms of energy, hardware, and annotated data, often resulting in a positionality rooted in predominant cultures and values (Santy et al., 2023). |
| HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems | Zheng Lin, Zhe Chen, Xianhao Chen, Wei Ni, Yue Gao | 2025-06-10 | Download | Split federated learning (SFL) has emerged as a promising paradigm to democratize machine learning (ML) on edge devices by enabling layer-wise model partitioning. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A Multi-Armed Bandit Framework for Online Optimisation in Green Integrated Terrestrial and Non-Terrestrial Networks | Henri Alam, Antonio de Domenico, Tareq Si Salem, Florian Kaltenberger | 2025-06-10 | Download | Integrated terrestrial and non-terrestrial network (TN-NTN) architectures offer a promising solution for expanding coverage and improving capacity for the network. |
| Age of Information in Unreliable Tandem Queues | Muthukrishnan Senthilkumar, Aresh Dadlani, Hina Tabassum | 2025-06-10 | Download | Stringent demands for timely information delivery, driven by the widespread adoption of real-time applications and the Internet of Things, have established the age of information (AoI) as a critical m... |
| Adaptive Bandwidth Sharing for Optimizing QoE of Real-Time Video | Sushi Anna George, Vinay Joseph | 2025-06-10 | Download | The concept of spectrum or bandwidth sharing has gained significant global attention as a means to enhance the efficiency of real-time traffic management in wireless networks. |
| MOSE: A Novel Orchestration Framework for Stateful Microservice Migration at the Edge | Antonio Calagna, Yenchia Yu, Paolo Giaccone, Carla Fabiana Chiasserini | 2025-06-10 | Download | Stateful migration has emerged as the dominant technology to support microservice mobility at the network edge while ensuring a satisfying experience to mobile end users. |
| Deep Reinforcement Learning-Based RAN Slicing with Efficient Inter-Slice Isolation in Tactical Wireless Networks | Abderrahime Filali, Diala Naboulsi, Georges Kaddoum | 2025-06-10 | Download | The next generation of tactical networks (TNs) is poised to further leverage the key enablers of 5G and beyond 5G (B5G) technology, such as radio access network (RAN) slicing and the open RAN (O-RAN) ... |
| When Simple Model Just Works: Is Network Traffic Classification in Crisis? | Kamil Jerabek, Jan Luxemburk, Richard Plny, Josef Koumar, Jaroslav Pesek, Karel Hynek | 2025-06-10 | Download | Machine learning has been applied to network traffic classification (TC) for over two decades. While early efforts used shallow models, the latter 2010s saw a shift toward complex neural networks, oft... |
| Aerial Shepherds: Enabling Hierarchical Localization in Heterogeneous MAV Swarms | Haoyang Wang, Jingao Xu, Chenyu Zhao, Yuhan Cheng, Xuecheng Chen, Chaopeng Hong, Xiao-Ping Zhang, Yunhao Liu, Xinlei Chen | 2025-06-10 | Download | A heterogeneous micro aerial vehicles (MAV) swarm consists of resource-intensive but expensive advanced MAVs (AMAVs) and resource-limited but cost-effective basic MAVs (BMAVs), offering opportunities ... |
| 5G Aero: A Prototyping Platform for Evaluating Aerial 5G Communications | Matteo Bordin, Madhukara S. Holla, Sakthivel Velumani, Salvatore D'Oro, Tommaso Melodia | 2025-06-10 | Download | The application of small-factor, 5G-enabled Unmanned Aerial Vehicles (UAVs) has recently gained significant interest in various aerial and Industry 4.0 applications. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| EROICA: Online Performance Troubleshooting for Large-scale Model Training | Yu Guan, Zhiyu Yin, Haoyu Chen, Sheng Cheng, Chaojie Yang, Kun Qian, Tianyin Xu, Pengcheng Zhang, Yang Zhang, Hanyu Zhao, Yong Li, Wei Lin, Dennis Cai, Ennan Zhai | 2025-06-10 | Download | Troubleshooting performance problems of large model training (LMT) is immensely challenging, due to unprecedented scales of modern GPU clusters, the complexity of software-hardware interactions, and t... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| GPU-accelerated Modeling of Biological Regulatory Networks | Joyce Reimer, Pranta Saha, Chris Chen, Neeraj Dhar, Brook Byrns, Steven Rayan, Gordon Broderick | 2025-06-10 | Download | The complex regulatory dynamics of a biological network can be succinctly captured using discrete logic models. Given even sparse time-course data from the system of interest, previous work has shown ... |
| Terabyte-Scale Analytics in the Blink of an Eye | Bowen Wu, Wei Cui, Carlo Curino, Matteo Interlandi, Rathijit Sen | 2025-06-10 | Download | For the past two decades, the DB community has devoted substantial research to take advantage of cheap clusters of machines for distributed data analytics -- we believe that we are at the beginning of... |
| Implementing Keyword Spotting on the MCUX947 Microcontroller with Integrated NPU | Petar Jakuš, Hrvoje Džapo | 2025-06-10 | Download | This paper presents a keyword spotting (KWS) system implemented on the NXP MCXN947 microcontroller with an integrated Neural Processing Unit (NPU), enabling real-time voice interaction on resource-con... |
| A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search | Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney | 2025-06-10 | Download | Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field's rec... |