
2025-12-22

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Sensitivity-Aware Mixed-Precision Quantization for ReRAM-based Computing-in-Memory | Guan-Cheng Chen, Chieh-Lin Tsai, Pei-Hsuan Tsai, Yuan-Hao Chang | 2025-12-22 | Download | Compute-In-Memory (CIM) systems, particularly those utilizing ReRAM and memristive technologies, offer a promising path toward energy-efficient neural network computation. |
| Binary Neural Network Implementation for Handwritten Digit Recognition on FPGA | Emir Devlet Ertörer, Cem Ünsalan | 2025-12-22 | Download | Binary neural networks provide a promising solution for low-power, high-speed inference by replacing expensive floating-point operations with bitwise logic. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| An Adaptive Distributed Stencil Abstraction for GPUs | Aditya Bhosale, Laxmikant Kale | 2025-12-22 | Download | The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomput... |
| UCCL-EP: Portable Expert-Parallel Communication | Ziming Mao, Yihan Zhang, Chihan Cui, Zhen Huang, Kaichao You, Zhongjie Chen, Zhiying Xu, Zhenyu Gu, Scott Shenker, Costin Raiciu, Yang Zhou, Ion Stoica | 2025-12-22 | Download | Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but exhibit p... |
| Holoscope: Open and Lightweight Distributed Telescope & Honeypot Platform | Andrea Sordello, Marco Mellia, Idilio Drago, Rodolfo Valentim, Francesco Musumeci, Massimo Tornatore, Federico Cerutti, Martino Trevisan, Alessio Botta, Willen Borges Coelho | 2025-12-22 | Download | The complexity and scale of Internet attacks call for distributed, cooperative observatories capable of monitoring malicious traffic across diverse networks. |
| PHOTON: Hierarchical Autoregressive Modeling for Lightspeed and Memory-Efficient Language Generation | Yuma Ichikawa, Naoya Takagi, Takumi Nakagawa, Yuzi Kanazawa, Akira Sakai | 2025-12-22 | Download | Transformers operate as horizontal token-by-token scanners; at each generation step, attending to an ever-growing sequence of token-level states. |
| RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference | George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta | 2025-12-22 | Download | RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operato... |
| Learned Digital Codes for Over-the-Air Computation in Federated Edge Learning | Antonio Tarizzo, Mohammad Kazemi, Deniz Gündüz | 2025-12-22 | Download | Federated edge learning (FEEL) enables wireless devices to collaboratively train a centralised model without sharing raw data, but repeated uplink transmission of model updates makes communication the... |
| A Survey of Real-Time Support, Analysis, and Advancements in ROS 2 | Daniel Casini, Jian-Jia Chen, Jing Li, Federico Reghenzani, Harun Teper | 2025-12-22 | Download | The Robot Operating System 2 (ROS 2) has emerged as a relevant middleware framework for robotic applications, offering modularity, distributed execution, and communication. |
| Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs | Xinhao Cheng, Zhihao Zhang, Yu Zhou, Jianan Ji, Jinchen Jiang, Zepeng Zhao, Ziruo Xiao, Zihao Ye, Yingyi Huang, Ruihang Lai, Hongyi Jin, Bohan Hou, Mengdi Wu, Yixin Dong, Anthony Yip, Zihao Ye, Songting Wang, Wenqin Yang, Xupeng Miao, Tianqi Chen, Zhihao Jia | 2025-12-22 | Download | We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance megakernel. |
| Faster Distributed Inference-Only Recommender Systems via Bounded Lag Synchronous Collectives | Kiril Dichev, Filip Pawlowski, Albert-Jan Yzelman | 2025-12-22 | Download | Recommender systems are enablers of personalized content delivery, and therefore revenue, for many large companies. In the last decade, deep learning recommender models (DLRMs) are the de-facto standa... |
| Simulations between Strongly Sublinear MPC and Node-Capacitated Clique | Philipp Schneider, Julian Werthmann | 2025-12-22 | Download | We study how the strongly sublinear MPC model relates to the classic, graph-centric distributed models, focusing on the Node-Capacitated Clique (NCC), a bandwidth-parametrized generalization of the Co... |
| SPUMA: a minimally invasive approach to the GPU porting of OPENFOAM | Simone Bnà, Giuseppe Giaquinto, Ettore Fadiga, Tommaso Zanelli, Francesco Bottau | 2025-12-22 | Download | High Performance Computing (HPC) on hybrid clusters represents a significant opportunity for Computational Fluid Dynamics (CFD), especially when modern accelerators are utilized effectively. |
| CascadeInfer: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling | Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He, Wenfei Wu | 2025-12-22 | Download | Efficiently harnessing GPU compute is critical to improving user experience and reducing operational costs in large language model (LLM) services. |
| Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT | Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya | 2025-12-22 | Download | Decentralized federated learning (DFL) enables collaborative model training across edge devices without centralized coordination, offering resilience against single points of failure. |
| Timely Parameter Updating in Over-the-Air Federated Learning | Jiaqi Zhu, Zhongyuan Zhao, Xiao Li, Ruihao Du, Shi Jin, Howard H. Yang | 2025-12-22 | Download | Incorporating over-the-air computations (OAC) into the model training process of federated learning (FL) is an effective approach to alleviating the communication bottleneck in FL systems. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| UCCL-EP: Portable Expert-Parallel Communication | Ziming Mao, Yihan Zhang, Chihan Cui, Zhen Huang, Kaichao You, Zhongjie Chen, Zhiying Xu, Zhenyu Gu, Scott Shenker, Costin Raiciu, Yang Zhou, Ion Stoica | 2025-12-22 | Download | Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but exhibit p... |
| CORE: Compensable Reward as a Catalyst for Improving Offline RL in Wireless Networks | Lipeng Zu, Hansong Zhou, Yu Qian, Shayok Chakraborty, Yukun Yuan, Linke Guo, Xiaonan Zhang | 2025-12-22 | Download | Real-world wireless data are expensive to collect and often lack sufficient expert demonstrations, causing existing offline RL methods to overfit suboptimal behaviors and exhibit unstable performance. |
| On Network-Aware Semantic Communication and Edge-Cloud Collaborative Intelligence Systems | Murdadha Nasif, Ahmed Refaey Hussein | 2025-12-22 | Download | Semantic communication and edge-cloud collaborative intelligence are increasingly recognized as foundational enablers for next-generation intelligent services operating under stringent bandwidth, late... |
| Lightweight Intrusion Detection in IoT via SHAP-Guided Feature Pruning and Knowledge-Distilled Kronecker Networks | Hafsa Benaddi, Mohammed Jouhari, Nouha Laamech, Anas Motii, Khalil Ibrahimi | 2025-12-22 | Download | The widespread deployment of Internet of Things (IoT) devices requires intrusion detection systems (IDS) with high accuracy while operating under strict resource constraints. |
| Semantic Communication for Rate-Limited Closed-Loop Distributed Communication-Sensing-Control Systems | Guangjin Pan, Ayça Özçelikkale, Christian Häger, Musa Furkan Keskin, Henk Wymeersch | 2025-12-22 | Download | The growing integration of distributed integrated sensing and communication (ISAC) with closed-loop control in intelligent networks demands efficient information transmission under stringent bandwidth... |
| BEVCooper: Accurate and Communication-Efficient Bird's-Eye-View Perception in Vehicular Networks | Jiawei Hou, Peng Yang, Xiangxiang Dai, Mingliu Liu, Conghao Zhou | 2025-12-22 | Download | Bird's-Eye-View (BEV) is critical to connected and automated vehicles (CAVs) as it can provide unified and precise representation of vehicular surroundings. |
| Optimal 3D Directional WPT Charging via UAV for 3D Wireless Rechargeable Sensor Networks | Zhenguo Gao, Hui Li, Yiqin Chen, Qingyu Gao, Zhufang Kuang, Shih-Hau Fang, Hsiao-Chun Wu | 2025-12-22 | Download | The high mobility and flexible deployment capability of UAVs make them an impressive option for charging nodes in Wireless Rechargeable Sensor Networks (WRSNs) using Directional Wireless Power Transfe... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference | George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta | 2025-12-22 | Download | RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operato... |
