2025-12-22

cs.AR - Architecture

Title	Authors	Date	PDF	Abstract
Sensitivity-Aware Mixed-Precision Quantization for ReRAM-based Computing-in-Memory	Guan-Cheng Chen, Chieh-Lin Tsai, Pei-Hsuan Tsai, Yuan-Hao Chang	2025-12-22	Download	Compute-In-Memory (CIM) systems, particularly those utilizing ReRAM and memristive technologies, offer a promising path toward energy-efficient neural network computation.
Binary Neural Network Implementation for Handwritten Digit Recognition on FPGA	Emir Devlet Ertörer, Cem Ünsalan	2025-12-22	Download	Binary neural networks provide a promising solution for low-power, high-speed inference by replacing expensive floating-point operations with bitwise logic.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Date	PDF	Abstract
An Adaptive Distributed Stencil Abstraction for GPUs	Aditya Bhosale, Laxmikant Kale	2025-12-22	Download	The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomput...
UCCL-EP: Portable Expert-Parallel Communication	Ziming Mao, Yihan Zhang, Chihan Cui, Zhen Huang, Kaichao You, Zhongjie Chen, Zhiying Xu, Zhenyu Gu, Scott Shenker, Costin Raiciu, Yang Zhou, Ion Stoica	2025-12-22	Download	Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but exhibit p...
Holoscope: Open and Lightweight Distributed Telescope & Honeypot Platform	Andrea Sordello, Marco Mellia, Idilio Drago, Rodolfo Valentim, Francesco Musumeci, Massimo Tornatore, Federico Cerutti, Martino Trevisan, Alessio Botta, Willen Borges Coelho	2025-12-22	Download	The complexity and scale of Internet attacks call for distributed, cooperative observatories capable of monitoring malicious traffic across diverse networks.
PHOTON: Hierarchical Autoregressive Modeling for Lightspeed and Memory-Efficient Language Generation	Yuma Ichikawa, Naoya Takagi, Takumi Nakagawa, Yuzi Kanazawa, Akira Sakai	2025-12-22	Download	Transformers operate as horizontal token-by-token scanners; at each generation step, attending to an ever-growing sequence of token-level states.
RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference	George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta	2025-12-22	Download	RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operato...
Learned Digital Codes for Over-the-Air Computation in Federated Edge Learning	Antonio Tarizzo, Mohammad Kazemi, Deniz Gündüz	2025-12-22	Download	Federated edge learning (FEEL) enables wireless devices to collaboratively train a centralised model without sharing raw data, but repeated uplink transmission of model updates makes communication the...
A Survey of Real-Time Support, Analysis, and Advancements in ROS 2	Daniel Casini, Jian-Jia Chen, Jing Li, Federico Reghenzani, Harun Teper	2025-12-22	Download	The Robot Operating System 2 (ROS 2) has emerged as a relevant middleware framework for robotic applications, offering modularity, distributed execution, and communication.
Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs	Xinhao Cheng, Zhihao Zhang, Yu Zhou, Jianan Ji, Jinchen Jiang, Zepeng Zhao, Ziruo Xiao, Zihao Ye, Yingyi Huang, Ruihang Lai, Hongyi Jin, Bohan Hou, Mengdi Wu, Yixin Dong, Anthony Yip, Zihao Ye, Songting Wang, Wenqin Yang, Xupeng Miao, Tianqi Chen, Zhihao Jia	2025-12-22	Download	We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance megakernel.
Faster Distributed Inference-Only Recommender Systems via Bounded Lag Synchronous Collectives	Kiril Dichev, Filip Pawlowski, Albert-Jan Yzelman	2025-12-22	Download	Recommender systems are enablers of personalized content delivery, and therefore revenue, for many large companies. In the last decade, deep learning recommender models (DLRMs) are the de-facto standa...
Simulations between Strongly Sublinear MPC and Node-Capacitated Clique	Philipp Schneider, Julian Werthmann	2025-12-22	Download	We study how the strongly sublinear MPC model relates to the classic, graph-centric distributed models, focusing on the Node-Capacitated Clique (NCC), a bandwidth-parametrized generalization of the Co...
SPUMA: a minimally invasive approach to the GPU porting of OPENFOAM	Simone Bnà, Giuseppe Giaquinto, Ettore Fadiga, Tommaso Zanelli, Francesco Bottau	2025-12-22	Download	High Performance Computing (HPC) on hybrid clusters represents a significant opportunity for Computational Fluid Dynamics (CFD), especially when modern accelerators are utilized effectively.
CascadeInfer: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling	Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He, Wenfei Wu	2025-12-22	Download	Efficiently harnessing GPU compute is critical to improving user experience and reducing operational costs in large language model (LLM) services.
Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT	Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya	2025-12-22	Download	Decentralized federated learning (DFL) enables collaborative model training across edge devices without centralized coordination, offering resilience against single points of failure.
Timely Parameter Updating in Over-the-Air Federated Learning	Jiaqi Zhu, Zhongyuan Zhao, Xiao Li, Ruihao Du, Shi Jin, Howard H. Yang	2025-12-22	Download	Incorporating over-the-air computations (OAC) into the model training process of federated learning (FL) is an effective approach to alleviating the communication bottleneck in FL systems.

cs.NI - Networking and Internet Architecture

Title	Authors	Date	PDF	Abstract
UCCL-EP: Portable Expert-Parallel Communication	Ziming Mao, Yihan Zhang, Chihan Cui, Zhen Huang, Kaichao You, Zhongjie Chen, Zhiying Xu, Zhenyu Gu, Scott Shenker, Costin Raiciu, Yang Zhou, Ion Stoica	2025-12-22	Download	Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but exhibit p...
CORE: Compensable Reward as a Catalyst for Improving Offline RL in Wireless Networks	Lipeng Zu, Hansong Zhou, Yu Qian, Shayok Chakraborty, Yukun Yuan, Linke Guo, Xiaonan Zhang	2025-12-22	Download	Real-world wireless data are expensive to collect and often lack sufficient expert demonstrations, causing existing offline RL methods to overfit suboptimal behaviors and exhibit unstable performance.
On Network-Aware Semantic Communication and Edge-Cloud Collaborative Intelligence Systems	Murdadha Nasif, Ahmed Refaey Hussein	2025-12-22	Download	Semantic communication and edge-cloud collaborative intelligence are increasingly recognized as foundational enablers for next-generation intelligent services operating under stringent bandwidth, late...
Lightweight Intrusion Detection in IoT via SHAP-Guided Feature Pruning and Knowledge-Distilled Kronecker Networks	Hafsa Benaddi, Mohammed Jouhari, Nouha Laamech, Anas Motii, Khalil Ibrahimi	2025-12-22	Download	The widespread deployment of Internet of Things (IoT) devices requires intrusion detection systems (IDS) with high accuracy while operating under strict resource constraints.
Semantic Communication for Rate-Limited Closed-Loop Distributed Communication-Sensing-Control Systems	Guangjin Pan, Ayça Özçelikkale, Christian Häger, Musa Furkan Keskin, Henk Wymeersch	2025-12-22	Download	The growing integration of distributed integrated sensing and communication (ISAC) with closed-loop control in intelligent networks demands efficient information transmission under stringent bandwidth...
BEVCooper: Accurate and Communication-Efficient Bird's-Eye-View Perception in Vehicular Networks	Jiawei Hou, Peng Yang, Xiangxiang Dai, Mingliu Liu, Conghao Zhou	2025-12-22	Download	Bird's-Eye-View (BEV) is critical to connected and automated vehicles (CAVs) as it can provide unified and precise representation of vehicular surroundings.
Optimal 3D Directional WPT Charging via UAV for 3D Wireless Rechargeable Sensor Networks	Zhenguo Gao, Hui Li, Yiqin Chen, Qingyu Gao, Zhufang Kuang, Shih-Hau Fang, Hsiao-Chun Wu	2025-12-22	Download	The high mobility and flexible deployment capability of UAVs make them an impressive option for charging nodes in Wireless Rechargeable Sensor Networks (WRSNs) using Directional Wireless Power Transfe...

cs.PF - Performance

Title	Authors	Date	PDF	Abstract
RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference	George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta	2025-12-22	Download	RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operato...