2025-09-25

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem | Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim | 2025-09-25 | Download | The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits.
Reliability Analysis of Fully Homomorphic Encryption Systems Under Memory Faults | Rian Adam Rajagede, Yan Solihin | 2025-09-25 | Download | Fully Homomorphic Encryption (FHE) represents a paradigm shift in cryptography, enabling computation directly on encrypted data and unlocking privacy-critical computation.

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices | Yilong Li, Shuai Zhang, Yijing Zeng, Hao Zhang, Xinmiao Xiong, Jingyu Liu, Pan Hu, Suman Banerjee | 2025-09-25 | Download | Large Multimodal Models (LMMs) are inherently modular, consisting of vision and audio encoders, projectors, and large language models. Yet, they are almost always executed monolithically, which underu...
Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM | Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, Szilárd Páll | 2025-09-25 | Download | Improving time-to-solution in molecular dynamics simulations often requires strong scaling due to fixed-sized problems. GROMACS is highly latency-sensitive, with peak iteration rates in the sub-millis...
Data-Centric Elastic Pipeline Parallelism for Efficient Long-Context LLM Training | Shiju Wang, Yujie Wang, Ao Sun, Fangcheng Fu, Zijian Zhu, Bin Cui, Xu Han, Kaisheng Ma | 2025-09-25 | Download | Long context training is crucial for LLM's context extension. Existing schemes, such as sequence parallelism, incur substantial communication overhead.
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips | Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang | 2025-09-25 | Download | The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates GPU and CPU on the...
Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models | Nikolay Blagoev, Bart Cox, Jérémie Decouchant, Lydia Y. Chen | 2025-09-25 | Download | Motivated by the emergence of large language models (LLMs) and the importance of democratizing their training, we propose GWTF, the first crash tolerant practical decentralized training framework for ...
From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem | Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim | 2025-09-25 | Download | The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits.
Communication Bias in Large Language Models: A Regulatory Perspective | Adrian Kuenzler, Stefan Schmid | 2025-09-25 | Download | Large language models (LLMs) are increasingly central to many applications, raising concerns about bias, fairness, and regulatory compliance. This paper reviews risks of biased outputs and their socie...
Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem | William F. Godoy, Tatiana Melnichenko, Pedro Valero-Lara, Wael Elwasif, Philip Fackler, Rafael Ferreira Da Silva, Keita Teranishi, Jeffrey S. Vetter | 2025-09-25 | Download | We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on the LLVM's Multi-Level Intermediate Representation (MLI...
Utilizing Sparsity in the GPU-accelerated Assembly of Schur Complement Matrices in Domain Decomposition Methods | Jakub Homola, Ondřej Meca, Lubomír Říha, Tomáš Brzobohatý | 2025-09-25 | Download | Schur complement matrices emerge in many domain decomposition methods that can solve complex engineering problems using supercomputers. Today, as most of the high-performance clusters' performance lie...
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training | Wei Gao, Yuheng Zhao, Dakai An, Tianyuan Wu, Lunxi Cao, Shaopan Xiong, Ju Huang, Weixun Wang, Siran Yang, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng, Wei Wang | 2025-09-25 | Download | Reinforcement Learning (RL) is a pivotal post-training technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, synchronous RL post-training often suffers from sign...
IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol | Ningyuan Yang, Guanliang Lyu, Mingchen Ma, Yiyi Lu, Yiming Li, Zhihui Gao, Hancheng Ye, Jianyi Zhang, Tingjun Chen, Yiran Chen | 2025-09-25 | Download | The integration of Large Language Models (LLMs) with Internet-of-Things (IoT) systems faces significant challenges in hardware heterogeneity and control complexity.
RecIS: Sparse to Dense, A Unified Training Framework for Recommendation Models | Hua Zong, Qingtao Zeng, Zhengxiong Zhou, Zhihua Han, Zhensong Yan, Mingjie Liu, Hechen Sun, Jiawei Liu, Yiwen Hu, Qi Wang, YiHan Xian, Wenjie Guo, Houyuan Xiang, Zhiyuan Zeng, Xiangrong Sheng, Bencheng Yan, Nan Hu, Yuheng Huang, Jinqing Lian, Ziru Xu, Yan Zhang, Ju Huang, Siran Yang, Huimin Yi, Jiamang Wang, Pengjie Wang, Han Zhu, Jian Wu, Dan Ou, Jian Xu, Haihong Tang, Yuning Jiang, Bo Zheng, Lin Qu | 2025-09-25 | Download | In this paper, we propose RecIS, a unified Sparse-Dense training framework designed to achieve two primary goals: 1. Unified Framework To create a Unified sparse-dense training framework based on the ...
Prompt-Aware Scheduling for Low-Latency LLM Serving | Yiheng Tao, Yihe Zhang, Matthew T. Dearing, Xin Wang, Yuping Fan, Zhiling Lan | 2025-09-25 | Download | Efficient scheduling of LLM inference tasks is essential for achieving low latency and high throughput, particularly with the growing use of reasoning-capable LLMs.
Integrating and Characterizing HPC Task Runtime Systems for hybrid AI-HPC workloads | Andre Merzky, Mikhail Titov, Matteo Turilli, Shantenu Jha | 2025-09-25 | Download | Scientific workflows increasingly involve both HPC and machine-learning tasks, combining MPI-based simulations, training, and inference in a single execution.
Distributed-memory Algorithms for Sparse Matrix Permutation, Extraction, and Assignment | Elaheh Hassani, Md Taufique Hussain, Ariful Azad | 2025-09-25 | Download | We present scalable distributed-memory algorithms for sparse matrix permutation, extraction, and assignment. Our methods follow an Identify-Exchange-Build (IEB) strategy where each process identifies ...
Kant: An Efficient Unified Scheduling System for Large-Scale AI Clusters | Lingling Zeng, Gen Zhang, Jialin Peng, Xiang Xu, Yuan Xu, Lijun Ma | 2025-09-25 | Download | As AI cluster sizes continue to expand and the demand for large-language-model (LLM) training and inference workloads grows rapidly, traditional scheduling systems face significant challenges in balan...
Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations | Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P. Sadayappan, Karol Kowalski | 2025-09-25 | Download | In this work, we develop machine learning (ML) based strategies to predict resources (costs) required for massively parallel chemistry computations, such as coupled-cluster methods, to guide applicati...

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
XenoFlow: How Fast Can a SmartNIC-Based DNS Load Balancer Run? | Max Schrötter, Sten Heimbrodt, Bettina Schnor | 2025-09-25 | Download | With the advent of programmable network hardware, more and more functionality can be moved from software running on general purpose CPUs to the NIC.
eXplainable Artificial Intelligence for RL-based Networking Solutions | Yeison Stiven Murcia, Oscar Mauricio Caicedo, Daniela Maria Casas, Nelson Luis Saldanha da Fonseca | 2025-09-25 | Download | Reinforcement Learning (RL) agents have been widely used to improve networking tasks. However, understanding the decisions made by these agents is essential for their broader adoption in networking an...
MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs | Prakhar Sharma, Haohuang Wen, Vinod Yegneswaran, Ashish Gehani, Phillip Porras, Zhiqiang Lin | 2025-09-25 | Download | The evolution toward 6G networks is being accelerated by the Open Radio Access Network (O-RAN) paradigm -- an open, interoperable architecture that enables intelligent, modular applications across pub...
A Target-Agnostic Protocol-Independent Interface for the Transport Layer | Pedro Mizuno, Kimiya Mohammadtaheri, Linfan Qian, Joshua Johnson, Danny Akbarzadeh, Chris Neely, Mario Baldi, Nachiket Kapre, Mina Tahmasbi Arashloo | 2025-09-25 | Download | Transport protocols continue to evolve to meet the demands of new applications, workloads, and network environments, yet implementing and evolving transport protocols remains difficult and costly.
Context-Aware Hybrid Routing in Bluetooth Mesh Networks Using Multi-Model Machine Learning and AODV Fallback | Md Sajid Islam, Tanvir Hasan | 2025-09-25 | Download | Bluetooth-based mesh networks offer a promising infrastructure for offline communication in emergency and resource constrained scenarios. However, traditional routing strategies such as Ad hoc On-Dema...
Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks | Murat Arda Onsu, Poonam Lohan, Burak Kantarci, Aisha Syed, Matthew Andrews, Sean Kennedy | 2025-09-25 | Download | Real-time urban traffic surveillance is vital for Intelligent Transportation Systems (ITS) to ensure road safety, optimize traffic flow, track vehicle trajectories, and prevent collisions in smart cit...
Hybrid RIS-Aided Digital Over-the-Air Computing for Edge AI Inference: Joint Feature Quantization and Active-Passive Beamforming Design | Yang Fu, Peng Qin, Liming Chen, Xianchao Zhang, Yifei Wang | 2025-09-25 | Download | The vision of 6G networks aims to enable edge inference by leveraging ubiquitously deployed artificial intelligence (AI) models, facilitating intelligent environmental perception for a wide range of a...
Flight Dynamics to Sensing Modalities: Exploiting Drone Ground Effect for Accurate Edge Detection | Chenyu Zhao, Jingao Xu, Ciyu Ruan, Haoyang Wang, Shengbo Wang, Jiaqi Li, Jirong Zha, Weijie Hong, Zheng Yang, Yunhao Liu, Xiao-Ping Zhang, Xinlei Chen | 2025-09-25 | Download | Drone-based rapid and accurate environmental edge detection is highly advantageous for tasks such as disaster relief and autonomous navigation.
Leveraging Large Language Models for Automated Reproduction of Networking Research Results | Yining Jiang, Yunxin Xu, Wenyun Xu, Yufan Zhu, Tangtang He, Haiying Huang, Letian Zhu, Qingyu Song, Qiang Su, Lizhao You, Lu Tang, Wanjin Feng, Yuchao Zhang, Linghe Kong, Qiao Xiang, Jiwu Shu | 2025-09-25 | Download | Code reproduction is a cornerstone of scientific validity, yet it remains a formidable challenge in computer networking research due to the scarcity of open-source implementations and the complexity o...
A Novel Integrated Architecture for Intent Based Approach and Zero Touch Networks | Neelam Gupta, Dibakar Das, Tamizhelakkiya K, Uma Maheswari Natarajan, Sharvari Ravindran, Komal Sharma, Jyotsna Bapat, Debabrata Das | 2025-09-25 | Download | The transition to Sixth Generation (6G) networks presents challenges in managing quality of service (QoS) of diverse applications and achieving Service Level Agreements (SLAs) under varying network co...
BSB: Towards Demand-Aware Peer Selection With XOR-based Routing | Qingyun Ji, Darya Melnyk, Arash Pourdamghani, Stefan Schmid | 2025-09-25 | Download | Peer-to-peer networks, as a key enabler of modern networked and distributed systems, rely on peer-selection algorithms to optimize their scalability and performance.
Joint Active RIS Configuration and User Power Control for Localization: A Neuroevolution-Based Approach | George Stamatelis, Hui Chen, Henk Wymeersch, George C. Alexandropoulos | 2025-09-25 | Download | This paper studies user localization aided by a Reconfigurable Intelligent Surface (RIS). A feedback link from the Base Station (BS) to the user is adopted to enable dynamic power control of the user ...
Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions | Yanghe Pan, Yuntao Wang, Shaolong Guo, Chengyu Yin, Ruidong Li, Zhou Su, Yuan Wu | 2025-09-25 | Download | Semantic communication (SemCom) has the potential to significantly reduce communication delay in vehicle-to-everything (V2X) communications within vehicular networks (VNs).
NetCAS: Dynamic Cache and Backend Device Management in Networked Environments | Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim | 2025-09-25 | Download | Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits...
An SDR-Based Test Platform for 5G NTN Prototyping and Validation | Lu Hou, Kan Zheng, Jie Mei, Cheng Huang | 2025-09-25 | Download | The integration of satellite communication into 5G has been formalized in 3GPP Release 17 through the specification of Non-Terrestrial Networks (NTN), marking a significant step toward achieving globa...

cs.OS - Operating Systems

Title | Authors | Published | PDF | Abstract
A Target-Agnostic Protocol-Independent Interface for the Transport Layer | Pedro Mizuno, Kimiya Mohammadtaheri, Linfan Qian, Joshua Johnson, Danny Akbarzadeh, Chris Neely, Mario Baldi, Nachiket Kapre, Mina Tahmasbi Arashloo | 2025-09-25 | Download | Transport protocols continue to evolve to meet the demands of new applications, workloads, and network environments, yet implementing and evolving transport protocols remains difficult and costly.
Nova: Real-Time Agentic Vision-Language Model Serving with Adaptive Cross-Stage Parallelization | Yuhang Xu, Shengzhong Liu, Dong Zhang, Bingheng Yan, Fan Wu, Guihai Chen | 2025-09-25 | Download | This paper presents Nova, a real-time scheduling framework for serving agentic vision-language models (VLMs) on a single GPU with balanced per-request latency and overall request process throughput.
NetCAS: Dynamic Cache and Backend Device Management in Networked Environments | Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim | 2025-09-25 | Download | Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits...

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
Size-Aware Dispatching to Fluid Queues | Runhan Xie, Esa Hyytiä, Rhonda Righter | 2025-09-25 | Download | We develop a fluid-flow model for routing problems, where fluid consists of different size particles and the task is to route the incoming fluid to n parallel servers using the size information in o...
PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters | Krishu K Thapa, Reet Barik, Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath | 2025-09-25 | Download | Training large models ranging from millions to billions of parameters is highly resource-intensive, requiring significant time, compute, and memory.
Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM | Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, Szilárd Páll | 2025-09-25 | Download | Improving time-to-solution in molecular dynamics simulations often requires strong scaling due to fixed-sized problems. GROMACS is highly latency-sensitive, with peak iteration rates in the sub-millis...
Fast-SEnSeI: Lightweight Sensor-Independent Cloud Masking for On-board Multispectral Sensors | Jan Kněžík, Jonáš Herec, Rado Pitoňák | 2025-09-25 | Download | Cloud segmentation is a critical preprocessing step for many Earth observation tasks, yet most models are tightly coupled to specific sensor configurations and rely on ground-based processing.
Prompt-Aware Scheduling for Low-Latency LLM Serving | Yiheng Tao, Yihe Zhang, Matthew T. Dearing, Xin Wang, Yuping Fan, Zhiling Lan | 2025-09-25 | Download | Efficient scheduling of LLM inference tasks is essential for achieving low latency and high throughput, particularly with the growing use of reasoning-capable LLMs.
NetCAS: Dynamic Cache and Backend Device Management in Networked Environments | Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim | 2025-09-25 | Download | Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits...
Sig2Model: A Boosting-Driven Model for Updatable Learned Indexes | Alireza Heidari, Amirhossein Ahmad, Wei Zhang, Ying Xiong | 2025-09-25 | Download | Learned Indexes (LIs) represent a paradigm shift from traditional index structures by employing machine learning models to approximate the cumulative distribution function (CDF) of sorted data.
