2025-09-25

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem	Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim	2025-09-25	Download	The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits.
Reliability Analysis of Fully Homomorphic Encryption Systems Under Memory Faults	Rian Adam Rajagede, Yan Solihin	2025-09-25	Download	Fully Homomorphic Encryption (FHE) represents a paradigm shift in cryptography, enabling computation directly on encrypted data and unlocking privacy-critical computation.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices	Yilong Li, Shuai Zhang, Yijing Zeng, Hao Zhang, Xinmiao Xiong, Jingyu Liu, Pan Hu, Suman Banerjee	2025-09-25	Download	Large Multimodal Models (LMMs) are inherently modular, consisting of vision and audio encoders, projectors, and large language models. Yet, they are almost always executed monolithically, which underu...
Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM	Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, Szilárd Páll	2025-09-25	Download	Improving time-to-solution in molecular dynamics simulations often requires strong scaling due to fixed-sized problems. GROMACS is highly latency-sensitive, with peak iteration rates in the sub-millis...
Data-Centric Elastic Pipeline Parallelism for Efficient Long-Context LLM Training	Shiju Wang, Yujie Wang, Ao Sun, Fangcheng Fu, Zijian Zhu, Bin Cui, Xu Han, Kaisheng Ma	2025-09-25	Download	Long context training is crucial for LLM's context extension. Existing schemes, such as sequence parallelism, incur substantial communication overhead.
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips	Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang	2025-09-25	Download	The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates GPU and CPU on the...
Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models	Nikolay Blagoev, Bart Cox, Jérémie Decouchant, Lydia Y. Chen	2025-09-25	Download	Motivated by the emergence of large language models (LLMs) and the importance of democratizing their training, we propose GWTF, the first crash tolerant practical decentralized training framework for ...
From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem	Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim	2025-09-25	Download	The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits.
Communication Bias in Large Language Models: A Regulatory Perspective	Adrian Kuenzler, Stefan Schmid	2025-09-25	Download	Large language models (LLMs) are increasingly central to many applications, raising concerns about bias, fairness, and regulatory compliance. This paper reviews risks of biased outputs and their socie...
Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem	William F. Godoy, Tatiana Melnichenko, Pedro Valero-Lara, Wael Elwasif, Philip Fackler, Rafael Ferreira Da Silva, Keita Teranishi, Jeffrey S. Vetter	2025-09-25	Download	We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on the LLVM's Multi-Level Intermediate Representation (MLI...
Utilizing Sparsity in the GPU-accelerated Assembly of Schur Complement Matrices in Domain Decomposition Methods	Jakub Homola, Ondřej Meca, Lubomír Říha, Tomáš Brzobohatý	2025-09-25	Download	Schur complement matrices emerge in many domain decomposition methods that can solve complex engineering problems using supercomputers. Today, as most of the high-performance clusters' performance lie...
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training	Wei Gao, Yuheng Zhao, Dakai An, Tianyuan Wu, Lunxi Cao, Shaopan Xiong, Ju Huang, Weixun Wang, Siran Yang, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng, Wei Wang	2025-09-25	Download	Reinforcement Learning (RL) is a pivotal post-training technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, synchronous RL post-training often suffers from sign...
IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol	Ningyuan Yang, Guanliang Lyu, Mingchen Ma, Yiyi Lu, Yiming Li, Zhihui Gao, Hancheng Ye, Jianyi Zhang, Tingjun Chen, Yiran Chen	2025-09-25	Download	The integration of Large Language Models (LLMs) with Internet-of-Things (IoT) systems faces significant challenges in hardware heterogeneity and control complexity.
RecIS: Sparse to Dense, A Unified Training Framework for Recommendation Models	Hua Zong, Qingtao Zeng, Zhengxiong Zhou, Zhihua Han, Zhensong Yan, Mingjie Liu, Hechen Sun, Jiawei Liu, Yiwen Hu, Qi Wang, YiHan Xian, Wenjie Guo, Houyuan Xiang, Zhiyuan Zeng, Xiangrong Sheng, Bencheng Yan, Nan Hu, Yuheng Huang, Jinqing Lian, Ziru Xu, Yan Zhang, Ju Huang, Siran Yang, Huimin Yi, Jiamang Wang, Pengjie Wang, Han Zhu, Jian Wu, Dan Ou, Jian Xu, Haihong Tang, Yuning Jiang, Bo Zheng, Lin Qu	2025-09-25	Download	In this paper, we propose RecIS, a unified Sparse-Dense training framework designed to achieve two primary goals: 1. Unified Framework To create a Unified sparse-dense training framework based on the ...
Prompt-Aware Scheduling for Low-Latency LLM Serving	Yiheng Tao, Yihe Zhang, Matthew T. Dearing, Xin Wang, Yuping Fan, Zhiling Lan	2025-09-25	Download	Efficient scheduling of LLM inference tasks is essential for achieving low latency and high throughput, particularly with the growing use of reasoning-capable LLMs.
Integrating and Characterizing HPC Task Runtime Systems for hybrid AI-HPC workloads	Andre Merzky, Mikhail Titov, Matteo Turilli, Shantenu Jha	2025-09-25	Download	Scientific workflows increasingly involve both HPC and machine-learning tasks, combining MPI-based simulations, training, and inference in a single execution.
Distributed-memory Algorithms for Sparse Matrix Permutation, Extraction, and Assignment	Elaheh Hassani, Md Taufique Hussain, Ariful Azad	2025-09-25	Download	We present scalable distributed-memory algorithms for sparse matrix permutation, extraction, and assignment. Our methods follow an Identify-Exchange-Build (IEB) strategy where each process identifies ...
Kant: An Efficient Unified Scheduling System for Large-Scale AI Clusters	Lingling Zeng, Gen Zhang, Jialin Peng, Xiang Xu, Yuan Xu, Lijun Ma	2025-09-25	Download	As AI cluster sizes continue to expand and the demand for large-language-model (LLM) training and inference workloads grows rapidly, traditional scheduling systems face significant challenges in balan...
Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations	Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P. Sadayappan, Karol Kowalski	2025-09-25	Download	In this work, we develop machine learning (ML) based strategies to predict resources (costs) required for massively parallel chemistry computations, such as coupled-cluster methods, to guide applicati...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
XenoFlow: How Fast Can a SmartNIC-Based DNS Load Balancer Run?	Max Schrötter, Sten Heimbrodt, Bettina Schnor	2025-09-25	Download	With the advent of programmable network hardware, more and more functionality can be moved from software running on general purpose CPUs to the NIC.
eXplainable Artificial Intelligence for RL-based Networking Solutions	Yeison Stiven Murcia, Oscar Mauricio Caicedo, Daniela Maria Casas, Nelson Luis Saldanha da Fonseca	2025-09-25	Download	Reinforcement Learning (RL) agents have been widely used to improve networking tasks. However, understanding the decisions made by these agents is essential for their broader adoption in networking an...
MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs	Prakhar Sharma, Haohuang Wen, Vinod Yegneswaran, Ashish Gehani, Phillip Porras, Zhiqiang Lin	2025-09-25	Download	The evolution toward 6G networks is being accelerated by the Open Radio Access Network (O-RAN) paradigm -- an open, interoperable architecture that enables intelligent, modular applications across pub...
A Target-Agnostic Protocol-Independent Interface for the Transport Layer	Pedro Mizuno, Kimiya Mohammadtaheri, Linfan Qian, Joshua Johnson, Danny Akbarzadeh, Chris Neely, Mario Baldi, Nachiket Kapre, Mina Tahmasbi Arashloo	2025-09-25	Download	Transport protocols continue to evolve to meet the demands of new applications, workloads, and network environments, yet implementing and evolving transport protocols remains difficult and costly.
Context-Aware Hybrid Routing in Bluetooth Mesh Networks Using Multi-Model Machine Learning and AODV Fallback	Md Sajid Islam, Tanvir Hasan	2025-09-25	Download	Bluetooth-based mesh networks offer a promising infrastructure for offline communication in emergency and resource constrained scenarios. However, traditional routing strategies such as Ad hoc On-Dema...
Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks	Murat Arda Onsu, Poonam Lohan, Burak Kantarci, Aisha Syed, Matthew Andrews, Sean Kennedy	2025-09-25	Download	Real-time urban traffic surveillance is vital for Intelligent Transportation Systems (ITS) to ensure road safety, optimize traffic flow, track vehicle trajectories, and prevent collisions in smart cit...
Hybrid RIS-Aided Digital Over-the-Air Computing for Edge AI Inference: Joint Feature Quantization and Active-Passive Beamforming Design	Yang Fu, Peng Qin, Liming Chen, Xianchao Zhang, Yifei Wang	2025-09-25	Download	The vision of 6G networks aims to enable edge inference by leveraging ubiquitously deployed artificial intelligence (AI) models, facilitating intelligent environmental perception for a wide range of a...
Flight Dynamics to Sensing Modalities: Exploiting Drone Ground Effect for Accurate Edge Detection	Chenyu Zhao, Jingao Xu, Ciyu Ruan, Haoyang Wang, Shengbo Wang, Jiaqi Li, Jirong Zha, Weijie Hong, Zheng Yang, Yunhao Liu, Xiao-Ping Zhang, Xinlei Chen	2025-09-25	Download	Drone-based rapid and accurate environmental edge detection is highly advantageous for tasks such as disaster relief and autonomous navigation.
Leveraging Large Language Models for Automated Reproduction of Networking Research Results	Yining Jiang, Yunxin Xu, Wenyun Xu, Yufan Zhu, Tangtang He, Haiying Huang, Letian Zhu, Qingyu Song, Qiang Su, Lizhao You, Lu Tang, Wanjin Feng, Yuchao Zhang, Linghe Kong, Qiao Xiang, Jiwu Shu	2025-09-25	Download	Code reproduction is a cornerstone of scientific validity, yet it remains a formidable challenge in computer networking research due to the scarcity of open-source implementations and the complexity o...
A Novel Integrated Architecture for Intent Based Approach and Zero Touch Networks	Neelam Gupta, Dibakar Das, Tamizhelakkiya K, Uma Maheswari Natarajan, Sharvari Ravindran, Komal Sharma, Jyotsna Bapat, Debabrata Das	2025-09-25	Download	The transition to Sixth Generation (6G) networks presents challenges in managing quality of service (QoS) of diverse applications and achieving Service Level Agreements (SLAs) under varying network co...
BSB: Towards Demand-Aware Peer Selection With XOR-based Routing	Qingyun Ji, Darya Melnyk, Arash Pourdamghani, Stefan Schmid	2025-09-25	Download	Peer-to-peer networks, as a key enabler of modern networked and distributed systems, rely on peer-selection algorithms to optimize their scalability and performance.
Joint Active RIS Configuration and User Power Control for Localization: A Neuroevolution-Based Approach	George Stamatelis, Hui Chen, Henk Wymeersch, George C. Alexandropoulos	2025-09-25	Download	This paper studies user localization aided by a Reconfigurable Intelligent Surface (RIS). A feedback link from the Base Station (BS) to the user is adopted to enable dynamic power control of the user ...
Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions	Yanghe Pan, Yuntao Wang, Shaolong Guo, Chengyu Yin, Ruidong Li, Zhou Su, Yuan Wu	2025-09-25	Download	Semantic communication (SemCom) has the potential to significantly reduce communication delay in vehicle-to-everything (V2X) communications within vehicular networks (VNs).
NetCAS: Dynamic Cache and Backend Device Management in Networked Environments	Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim	2025-09-25	Download	Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits...
An SDR-Based Test Platform for 5G NTN Prototyping and Validation	Lu Hou, Kan Zheng, Jie Mei, Cheng Huang	2025-09-25	Download	The integration of satellite communication into 5G has been formalized in 3GPP Release 17 through the specification of Non-Terrestrial Networks (NTN), marking a significant step toward achieving globa...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
A Target-Agnostic Protocol-Independent Interface for the Transport Layer	Pedro Mizuno, Kimiya Mohammadtaheri, Linfan Qian, Joshua Johnson, Danny Akbarzadeh, Chris Neely, Mario Baldi, Nachiket Kapre, Mina Tahmasbi Arashloo	2025-09-25	Download	Transport protocols continue to evolve to meet the demands of new applications, workloads, and network environments, yet implementing and evolving transport protocols remains difficult and costly.
Nova: Real-Time Agentic Vision-Language Model Serving with Adaptive Cross-Stage Parallelization	Yuhang Xu, Shengzhong Liu, Dong Zhang, Bingheng Yan, Fan Wu, Guihai Chen	2025-09-25	Download	This paper presents Nova, a real-time scheduling framework for serving agentic vision-language models (VLMs) on a single GPU with balanced per-request latency and overall request process throughput.
NetCAS: Dynamic Cache and Backend Device Management in Networked Environments	Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim	2025-09-25	Download	Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Size-Aware Dispatching to Fluid Queues	Runhan Xie, Esa Hyytiä, Rhonda Righter	2025-09-25	Download	We develop a fluid-flow model for routing problems, where fluid consists of different size particles and the task is to route the incoming fluid to $n$ parallel servers using the size information in o...
PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters	Krishu K Thapa, Reet Barik, Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath	2025-09-25	Download	Training large models ranging from millions to billions of parameters is highly resource-intensive, requiring significant time, compute, and memory.
Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM	Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, Szilárd Páll	2025-09-25	Download	Improving time-to-solution in molecular dynamics simulations often requires strong scaling due to fixed-sized problems. GROMACS is highly latency-sensitive, with peak iteration rates in the sub-millis...
Fast-SEnSeI: Lightweight Sensor-Independent Cloud Masking for On-board Multispectral Sensors	Jan Kněžík, Jonáš Herec, Rado Pitoňák	2025-09-25	Download	Cloud segmentation is a critical preprocessing step for many Earth observation tasks, yet most models are tightly coupled to specific sensor configurations and rely on ground-based processing.
Prompt-Aware Scheduling for Low-Latency LLM Serving	Yiheng Tao, Yihe Zhang, Matthew T. Dearing, Xin Wang, Yuping Fan, Zhiling Lan	2025-09-25	Download	Efficient scheduling of LLM inference tasks is essential for achieving low latency and high throughput, particularly with the growing use of reasoning-capable LLMs.
NetCAS: Dynamic Cache and Backend Device Management in Networked Environments	Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim	2025-09-25	Download	Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits...
Sig2Model: A Boosting-Driven Model for Updatable Learned Indexes	Alireza Heidari, Amirhossein Ahmad, Wei Zhang, Ying Xiong	2025-09-25	Download	Learned Indexes (LIs) represent a paradigm shift from traditional index structures by employing machine learning models to approximate the cumulative distribution function (CDF) of sorted data.