2025-12-22
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Sensitivity-Aware Mixed-Precision Quantization for ReRAM-based Computing-in-Memory | Guan-Cheng Chen, Chieh-Lin Tsai, Pei-Hsuan Tsai, Yuan-Hao Chang | 2025-12-22 | Download | Compute-In-Memory (CIM) systems, particularly those utilizing ReRAM and memristive technologies, offer a promising path toward energy-efficient neural network computation. |
| Binary Neural Network Implementation for Handwritten Digit Recognition on FPGA | Emir Devlet Ertörer, Cem Ünsalan | 2025-12-22 | Download | Binary neural networks provide a promising solution for low-power, high-speed inference by replacing expensive floating-point operations with bitwise logic. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| An Adaptive Distributed Stencil Abstraction for GPUs | Aditya Bhosale, Laxmikant Kale | 2025-12-22 | Download | The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomput... |
| UCCL-EP: Portable Expert-Parallel Communication | Ziming Mao, Yihan Zhang, Chihan Cui, Zhen Huang, Kaichao You, Zhongjie Chen, Zhiying Xu, Zhenyu Gu, Scott Shenker, Costin Raiciu, Yang Zhou, Ion Stoica | 2025-12-22 | Download | Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but exhibit p... |
| Holoscope: Open and Lightweight Distributed Telescope & Honeypot Platform | Andrea Sordello, Marco Mellia, Idilio Drago, Rodolfo Valentim, Francesco Musumeci, Massimo Tornatore, Federico Cerutti, Martino Trevisan, Alessio Botta, Willen Borges Coelho | 2025-12-22 | Download | The complexity and scale of Internet attacks call for distributed, cooperative observatories capable of monitoring malicious traffic across diverse networks. |
| PHOTON: Hierarchical Autoregressive Modeling for Lightspeed and Memory-Efficient Language Generation | Yuma Ichikawa, Naoya Takagi, Takumi Nakagawa, Yuzi Kanazawa, Akira Sakai | 2025-12-22 | Download | Transformers operate as horizontal token-by-token scanners; at each generation step, attending to an ever-growing sequence of token-level states. |
| RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference | George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta | 2025-12-22 | Download | RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operato... |
| Learned Digital Codes for Over-the-Air Computation in Federated Edge Learning | Antonio Tarizzo, Mohammad Kazemi, Deniz Gündüz | 2025-12-22 | Download | Federated edge learning (FEEL) enables wireless devices to collaboratively train a centralised model without sharing raw data, but repeated uplink transmission of model updates makes communication the... |
| A Survey of Real-Time Support, Analysis, and Advancements in ROS 2 | Daniel Casini, Jian-Jia Chen, Jing Li, Federico Reghenzani, Harun Teper | 2025-12-22 | Download | The Robot Operating System 2 (ROS 2) has emerged as a relevant middleware framework for robotic applications, offering modularity, distributed execution, and communication. |
| Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs | Xinhao Cheng, Zhihao Zhang, Yu Zhou, Jianan Ji, Jinchen Jiang, Zepeng Zhao, Ziruo Xiao, Zihao Ye, Yingyi Huang, Ruihang Lai, Hongyi Jin, Bohan Hou, Mengdi Wu, Yixin Dong, Anthony Yip, Zihao Ye, Songting Wang, Wenqin Yang, Xupeng Miao, Tianqi Chen, Zhihao Jia | 2025-12-22 | Download | We introduce Mirage Persistent Kernel (MPK), the first compiler and runtime system that automatically transforms multi-GPU model inference into a single high-performance megakernel. |
| Faster Distributed Inference-Only Recommender Systems via Bounded Lag Synchronous Collectives | Kiril Dichev, Filip Pawlowski, Albert-Jan Yzelman | 2025-12-22 | Download | Recommender systems are enablers of personalized content delivery, and therefore revenue, for many large companies. In the last decade, deep learning recommender models (DLRMs) are the de-facto standa... |
| Simulations between Strongly Sublinear MPC and Node-Capacitated Clique | Philipp Schneider, Julian Werthmann | 2025-12-22 | Download | We study how the strongly sublinear MPC model relates to the classic, graph-centric distributed models, focusing on the Node-Capacitated Clique (NCC), a bandwidth-parametrized generalization of the Co... |
| SPUMA: a minimally invasive approach to the GPU porting of OPENFOAM | Simone Bnà, Giuseppe Giaquinto, Ettore Fadiga, Tommaso Zanelli, Francesco Bottau | 2025-12-22 | Download | High Performance Computing (HPC) on hybrid clusters represents a significant opportunity for Computational Fluid Dynamics (CFD), especially when modern accelerators are utilized effectively. |
| CascadeInfer: Low-Latency and Load-Balanced LLM Serving via Length-Aware Scheduling | Yitao Yuan, Chenqi Zhao, Bohan Zhao, Zane Cao, Yongchao He, Wenfei Wu | 2025-12-22 | Download | Efficiently harnessing GPU compute is critical to improving user experience and reducing operational costs in large language model (LLM) services. |
| Evidential Trust-Aware Model Personalization in Decentralized Federated Learning for Wearable IoT | Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya | 2025-12-22 | Download | Decentralized federated learning (DFL) enables collaborative model training across edge devices without centralized coordination, offering resilience against single points of failure. |
| Timely Parameter Updating in Over-the-Air Federated Learning | Jiaqi Zhu, Zhongyuan Zhao, Xiao Li, Ruihao Du, Shi Jin, Howard H. Yang | 2025-12-22 | Download | Incorporating over-the-air computations (OAC) into the model training process of federated learning (FL) is an effective approach to alleviating the communication bottleneck in FL systems. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| UCCL-EP: Portable Expert-Parallel Communication | Ziming Mao, Yihan Zhang, Chihan Cui, Zhen Huang, Kaichao You, Zhongjie Chen, Zhiying Xu, Zhenyu Gu, Scott Shenker, Costin Raiciu, Yang Zhou, Ion Stoica | 2025-12-22 | Download | Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but exhibit p... |
| CORE: Compensable Reward as a Catalyst for Improving Offline RL in Wireless Networks | Lipeng Zu, Hansong Zhou, Yu Qian, Shayok Chakraborty, Yukun Yuan, Linke Guo, Xiaonan Zhang | 2025-12-22 | Download | Real-world wireless data are expensive to collect and often lack sufficient expert demonstrations, causing existing offline RL methods to overfit suboptimal behaviors and exhibit unstable performance. |
| On Network-Aware Semantic Communication and Edge-Cloud Collaborative Intelligence Systems | Murdadha Nasif, Ahmed Refaey Hussein | 2025-12-22 | Download | Semantic communication and edge-cloud collaborative intelligence are increasingly recognized as foundational enablers for next-generation intelligent services operating under stringent bandwidth, late... |
| Lightweight Intrusion Detection in IoT via SHAP-Guided Feature Pruning and Knowledge-Distilled Kronecker Networks | Hafsa Benaddi, Mohammed Jouhari, Nouha Laamech, Anas Motii, Khalil Ibrahimi | 2025-12-22 | Download | The widespread deployment of Internet of Things (IoT) devices requires intrusion detection systems (IDS) with high accuracy while operating under strict resource constraints. |
| Semantic Communication for Rate-Limited Closed-Loop Distributed Communication-Sensing-Control Systems | Guangjin Pan, Ayça Özçelikkale, Christian Häger, Musa Furkan Keskin, Henk Wymeersch | 2025-12-22 | Download | The growing integration of distributed integrated sensing and communication (ISAC) with closed-loop control in intelligent networks demands efficient information transmission under stringent bandwidth... |
| BEVCooper: Accurate and Communication-Efficient Bird's-Eye-View Perception in Vehicular Networks | Jiawei Hou, Peng Yang, Xiangxiang Dai, Mingliu Liu, Conghao Zhou | 2025-12-22 | Download | Bird's-Eye-View (BEV) is critical to connected and automated vehicles (CAVs) as it can provide unified and precise representation of vehicular surroundings. |
| Optimal 3D Directional WPT Charging via UAV for 3D Wireless Rechargeable Sensor Networks | Zhenguo Gao, Hui Li, Yiqin Chen, Qingyu Gao, Zhufang Kuang, Shih-Hau Fang, Hsiao-Chun Wu | 2025-12-22 | Download | The high mobility and flexible deployment capability of UAVs make them an impressive option for charging nodes in Wireless Rechargeable Sensor Networks (WRSNs) using Directional Wireless Power Transfe... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference | George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta | 2025-12-22 | Download | RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operato... |