2025-09-25
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem | Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim | 2025-09-25 | The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits. |
| Reliability Analysis of Fully Homomorphic Encryption Systems Under Memory Faults | Rian Adam Rajagede, Yan Solihin | 2025-09-25 | Fully Homomorphic Encryption (FHE) represents a paradigm shift in cryptography, enabling computation directly on encrypted data and unlocking privacy-critical computation. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Tiny but Mighty: A Software-Hardware Co-Design Approach for Efficient Multimodal Inference on Battery-Powered Small Devices | Yilong Li, Shuai Zhang, Yijing Zeng, Hao Zhang, Xinmiao Xiong, Jingyu Liu, Pan Hu, Suman Banerjee | 2025-09-25 | Large Multimodal Models (LMMs) are inherently modular, consisting of vision and audio encoders, projectors, and large language models. Yet, they are almost always executed monolithically, which underu... |
| Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM | Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, Szilárd Páll | 2025-09-25 | Improving time-to-solution in molecular dynamics simulations often requires strong scaling due to fixed-sized problems. GROMACS is highly latency-sensitive, with peak iteration rates in the sub-millis... |
| Data-Centric Elastic Pipeline Parallelism for Efficient Long-Context LLM Training | Shiju Wang, Yujie Wang, Ao Sun, Fangcheng Fu, Zijian Zhu, Bin Cui, Xu Han, Kaisheng Ma | 2025-09-25 | Long context training is crucial for LLM's context extension. Existing schemes, such as sequence parallelism, incur substantial communication overhead. |
| SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips | Xinyu Lian, Masahiro Tanaka, Olatunji Ruwase, Minjia Zhang | 2025-09-25 | The emergence of Superchips represents a significant advancement in next-generation AI hardware. These Superchips employ a tightly coupled heterogeneous architecture that integrates GPU and CPU on the... |
| Go With The Flow: Churn-Tolerant Decentralized Training of Large Language Models | Nikolay Blagoev, Bart Cox, Jérémie Decouchant, Lydia Y. Chen | 2025-09-25 | Motivated by the emergence of large language models (LLMs) and the importance of democratizing their training, we propose GWTF, the first crash tolerant practical decentralized training framework for ... |
| From GPUs to RRAMs: Distributed In-Memory Primal-Dual Hybrid Gradient Method for Solving Large-Scale Linear Optimization Problem | Huynh Q. N. Vo, Md Tawsif Rahman Chowdhury, Paritosh Ramanan, Gozde Tutuncuoglu, Junchi Yang, Feng Qiu, Murat Yildirim | 2025-09-25 | The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits. |
| Communication Bias in Large Language Models: A Regulatory Perspective | Adrian Kuenzler, Stefan Schmid | 2025-09-25 | Large language models (LLMs) are increasingly central to many applications, raising concerns about bias, fairness, and regulatory compliance. This paper reviews risks of biased outputs and their socie... |
| Mojo: MLIR-Based Performance-Portable HPC Science Kernels on GPUs for the Python Ecosystem | William F. Godoy, Tatiana Melnichenko, Pedro Valero-Lara, Wael Elwasif, Philip Fackler, Rafael Ferreira Da Silva, Keita Teranishi, Jeffrey S. Vetter | 2025-09-25 | We explore the performance and portability of the novel Mojo language for scientific computing workloads on GPUs. As the first language based on the LLVM's Multi-Level Intermediate Representation (MLI... |
| Utilizing Sparsity in the GPU-accelerated Assembly of Schur Complement Matrices in Domain Decomposition Methods | Jakub Homola, Ondřej Meca, Lubomír Říha, Tomáš Brzobohatý | 2025-09-25 | Schur complement matrices emerge in many domain decomposition methods that can solve complex engineering problems using supercomputers. Today, as most of the high-performance clusters' performance lie... |
| RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training | Wei Gao, Yuheng Zhao, Dakai An, Tianyuan Wu, Lunxi Cao, Shaopan Xiong, Ju Huang, Weixun Wang, Siran Yang, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng, Wei Wang | 2025-09-25 | Reinforcement Learning (RL) is a pivotal post-training technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, synchronous RL post-training often suffers from sign... |
| IoT-MCP: Bridging LLMs and IoT Systems Through Model Context Protocol | Ningyuan Yang, Guanliang Lyu, Mingchen Ma, Yiyi Lu, Yiming Li, Zhihui Gao, Hancheng Ye, Jianyi Zhang, Tingjun Chen, Yiran Chen | 2025-09-25 | The integration of Large Language Models (LLMs) with Internet-of-Things (IoT) systems faces significant challenges in hardware heterogeneity and control complexity. |
| RecIS: Sparse to Dense, A Unified Training Framework for Recommendation Models | Hua Zong, Qingtao Zeng, Zhengxiong Zhou, Zhihua Han, Zhensong Yan, Mingjie Liu, Hechen Sun, Jiawei Liu, Yiwen Hu, Qi Wang, YiHan Xian, Wenjie Guo, Houyuan Xiang, Zhiyuan Zeng, Xiangrong Sheng, Bencheng Yan, Nan Hu, Yuheng Huang, Jinqing Lian, Ziru Xu, Yan Zhang, Ju Huang, Siran Yang, Huimin Yi, Jiamang Wang, Pengjie Wang, Han Zhu, Jian Wu, Dan Ou, Jian Xu, Haihong Tang, Yuning Jiang, Bo Zheng, Lin Qu | 2025-09-25 | In this paper, we propose RecIS, a unified Sparse-Dense training framework designed to achieve two primary goals: 1. Unified Framework To create a Unified sparse-dense training framework based on the ... |
| Prompt-Aware Scheduling for Low-Latency LLM Serving | Yiheng Tao, Yihe Zhang, Matthew T. Dearing, Xin Wang, Yuping Fan, Zhiling Lan | 2025-09-25 | Efficient scheduling of LLM inference tasks is essential for achieving low latency and high throughput, particularly with the growing use of reasoning-capable LLMs. |
| Integrating and Characterizing HPC Task Runtime Systems for hybrid AI-HPC workloads | Andre Merzky, Mikhail Titov, Matteo Turilli, Shantenu Jha | 2025-09-25 | Scientific workflows increasingly involve both HPC and machine-learning tasks, combining MPI-based simulations, training, and inference in a single execution. |
| Distributed-memory Algorithms for Sparse Matrix Permutation, Extraction, and Assignment | Elaheh Hassani, Md Taufique Hussain, Ariful Azad | 2025-09-25 | We present scalable distributed-memory algorithms for sparse matrix permutation, extraction, and assignment. Our methods follow an Identify-Exchange-Build (IEB) strategy where each process identifies ... |
| Kant: An Efficient Unified Scheduling System for Large-Scale AI Clusters | Lingling Zeng, Gen Zhang, Jialin Peng, Xiang Xu, Yuan Xu, Lijun Ma | 2025-09-25 | As AI cluster sizes continue to expand and the demand for large-language-model (LLM) training and inference workloads grows rapidly, traditional scheduling systems face significant challenges in balan... |
| Guiding Application Users via Estimation of Computational Resources for Massively Parallel Chemistry Computations | Tanzila Tabassum, Omer Subasi, Ajay Panyala, Epiya Ebiapia, Gerald Baumgartner, Erdal Mutlu, P. Sadayappan, Karol Kowalski | 2025-09-25 | In this work, we develop machine learning (ML) based strategies to predict resources (costs) required for massively parallel chemistry computations, such as coupled-cluster methods, to guide applicati... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| XenoFlow: How Fast Can a SmartNIC-Based DNS Load Balancer Run? | Max Schrötter, Sten Heimbrodt, Bettina Schnor | 2025-09-25 | With the advent of programmable network hardware, more and more functionality can be moved from software running on general purpose CPUs to the NIC. |
| eXplainable Artificial Intelligence for RL-based Networking Solutions | Yeison Stiven Murcia, Oscar Mauricio Caicedo, Daniela Maria Casas, Nelson Luis Saldanha da Fonseca | 2025-09-25 | Reinforcement Learning (RL) agents have been widely used to improve networking tasks. However, understanding the decisions made by these agents is essential for their broader adoption in networking an... |
| MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs | Prakhar Sharma, Haohuang Wen, Vinod Yegneswaran, Ashish Gehani, Phillip Porras, Zhiqiang Lin | 2025-09-25 | The evolution toward 6G networks is being accelerated by the Open Radio Access Network (O-RAN) paradigm -- an open, interoperable architecture that enables intelligent, modular applications across pub... |
| A Target-Agnostic Protocol-Independent Interface for the Transport Layer | Pedro Mizuno, Kimiya Mohammadtaheri, Linfan Qian, Joshua Johnson, Danny Akbarzadeh, Chris Neely, Mario Baldi, Nachiket Kapre, Mina Tahmasbi Arashloo | 2025-09-25 | Transport protocols continue to evolve to meet the demands of new applications, workloads, and network environments, yet implementing and evolving transport protocols remains difficult and costly. |
| Context-Aware Hybrid Routing in Bluetooth Mesh Networks Using Multi-Model Machine Learning and AODV Fallback | Md Sajid Islam, Tanvir Hasan | 2025-09-25 | Bluetooth-based mesh networks offer a promising infrastructure for offline communication in emergency and resource constrained scenarios. However, traditional routing strategies such as Ad hoc On-Dema... |
| Semantic Edge-Cloud Communication for Real-Time Urban Traffic Surveillance with ViT and LLMs over Mobile Networks | Murat Arda Onsu, Poonam Lohan, Burak Kantarci, Aisha Syed, Matthew Andrews, Sean Kennedy | 2025-09-25 | Real-time urban traffic surveillance is vital for Intelligent Transportation Systems (ITS) to ensure road safety, optimize traffic flow, track vehicle trajectories, and prevent collisions in smart cit... |
| Hybrid RIS-Aided Digital Over-the-Air Computing for Edge AI Inference: Joint Feature Quantization and Active-Passive Beamforming Design | Yang Fu, Peng Qin, Liming Chen, Xianchao Zhang, Yifei Wang | 2025-09-25 | The vision of 6G networks aims to enable edge inference by leveraging ubiquitously deployed artificial intelligence (AI) models, facilitating intelligent environmental perception for a wide range of a... |
| Flight Dynamics to Sensing Modalities: Exploiting Drone Ground Effect for Accurate Edge Detection | Chenyu Zhao, Jingao Xu, Ciyu Ruan, Haoyang Wang, Shengbo Wang, Jiaqi Li, Jirong Zha, Weijie Hong, Zheng Yang, Yunhao Liu, Xiao-Ping Zhang, Xinlei Chen | 2025-09-25 | Drone-based rapid and accurate environmental edge detection is highly advantageous for tasks such as disaster relief and autonomous navigation. |
| Leveraging Large Language Models for Automated Reproduction of Networking Research Results | Yining Jiang, Yunxin Xu, Wenyun Xu, Yufan Zhu, Tangtang He, Haiying Huang, Letian Zhu, Qingyu Song, Qiang Su, Lizhao You, Lu Tang, Wanjin Feng, Yuchao Zhang, Linghe Kong, Qiao Xiang, Jiwu Shu | 2025-09-25 | Code reproduction is a cornerstone of scientific validity, yet it remains a formidable challenge in computer networking research due to the scarcity of open-source implementations and the complexity o... |
| A Novel Integrated Architecture for Intent Based Approach and Zero Touch Networks | Neelam Gupta, Dibakar Das, Tamizhelakkiya K, Uma Maheswari Natarajan, Sharvari Ravindran, Komal Sharma, Jyotsna Bapat, Debabrata Das | 2025-09-25 | The transition to Sixth Generation (6G) networks presents challenges in managing quality of service (QoS) of diverse applications and achieving Service Level Agreements (SLAs) under varying network co... |
| BSB: Towards Demand-Aware Peer Selection With XOR-based Routing | Qingyun Ji, Darya Melnyk, Arash Pourdamghani, Stefan Schmid | 2025-09-25 | Peer-to-peer networks, as a key enabler of modern networked and distributed systems, rely on peer-selection algorithms to optimize their scalability and performance. |
| Joint Active RIS Configuration and User Power Control for Localization: A Neuroevolution-Based Approach | George Stamatelis, Hui Chen, Henk Wymeersch, George C. Alexandropoulos | 2025-09-25 | This paper studies user localization aided by a Reconfigurable Intelligent Surface (RIS). A feedback link from the Base Station (BS) to the user is adopted to enable dynamic power control of the user ... |
| Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions | Yanghe Pan, Yuntao Wang, Shaolong Guo, Chengyu Yin, Ruidong Li, Zhou Su, Yuan Wu | 2025-09-25 | Semantic communication (SemCom) has the potential to significantly reduce communication delay in vehicle-to-everything (V2X) communications within vehicular networks (VNs). |
| NetCAS: Dynamic Cache and Backend Device Management in Networked Environments | Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim | 2025-09-25 | Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits... |
| An SDR-Based Test Platform for 5G NTN Prototyping and Validation | Lu Hou, Kan Zheng, Jie Mei, Cheng Huang | 2025-09-25 | The integration of satellite communication into 5G has been formalized in 3GPP Release 17 through the specification of Non-Terrestrial Networks (NTN), marking a significant step toward achieving globa... |
cs.OS - Operating Systems
| Title | Authors | Published | Abstract |
|---|---|---|---|
| A Target-Agnostic Protocol-Independent Interface for the Transport Layer | Pedro Mizuno, Kimiya Mohammadtaheri, Linfan Qian, Joshua Johnson, Danny Akbarzadeh, Chris Neely, Mario Baldi, Nachiket Kapre, Mina Tahmasbi Arashloo | 2025-09-25 | Transport protocols continue to evolve to meet the demands of new applications, workloads, and network environments, yet implementing and evolving transport protocols remains difficult and costly. |
| Nova: Real-Time Agentic Vision-Language Model Serving with Adaptive Cross-Stage Parallelization | Yuhang Xu, Shengzhong Liu, Dong Zhang, Bingheng Yan, Fan Wu, Guihai Chen | 2025-09-25 | This paper presents Nova, a real-time scheduling framework for serving agentic vision-language models (VLMs) on a single GPU with balanced per-request latency and overall request process throughput. |
| NetCAS: Dynamic Cache and Backend Device Management in Networked Environments | Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim | 2025-09-25 | Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits... |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Size-Aware Dispatching to Fluid Queues | Runhan Xie, Esa Hyytiä, Rhonda Righter | 2025-09-25 | We develop a fluid-flow model for routing problems, where fluid consists of different size particles and the task is to route the incoming fluid to parallel servers using the size information in o... |
| PreLoRA: Hybrid Pre-training of Vision Transformers with Full Training and Low-Rank Adapters | Krishu K Thapa, Reet Barik, Krishna Teja Chitty-Venkata, Murali Emani, Venkatram Vishwanath | 2025-09-25 | Training large models ranging from millions to billions of parameters is highly resource-intensive, requiring significant time, compute, and memory. |
| Redesigning GROMACS Halo Exchange: Improving Strong Scaling with GPU-initiated NVSHMEM | Mahesh Doijade, Andrey Alekseenko, Ania Brown, Alan Gray, Szilárd Páll | 2025-09-25 | Improving time-to-solution in molecular dynamics simulations often requires strong scaling due to fixed-sized problems. GROMACS is highly latency-sensitive, with peak iteration rates in the sub-millis... |
| Fast-SEnSeI: Lightweight Sensor-Independent Cloud Masking for On-board Multispectral Sensors | Jan Kněžík, Jonáš Herec, Rado Pitoňák | 2025-09-25 | Cloud segmentation is a critical preprocessing step for many Earth observation tasks, yet most models are tightly coupled to specific sensor configurations and rely on ground-based processing. |
| Prompt-Aware Scheduling for Low-Latency LLM Serving | Yiheng Tao, Yihe Zhang, Matthew T. Dearing, Xin Wang, Yuping Fan, Zhiling Lan | 2025-09-25 | Efficient scheduling of LLM inference tasks is essential for achieving low latency and high throughput, particularly with the growing use of reasoning-capable LLMs. |
| NetCAS: Dynamic Cache and Backend Device Management in Networked Environments | Joon Yong Hwang, Chanseo Park, Ikjun Yeom, Younghoon Kim | 2025-09-25 | Modern storage systems often combine fast cache with slower backend devices to accelerate I/O. As performance gaps narrow, concurrently accessing both devices, rather than relying solely on cache hits... |
| Sig2Model: A Boosting-Driven Model for Updatable Learned Indexes | Alireza Heidari, Amirhossein Ahmad, Wei Zhang, Ying Xiong | 2025-09-25 | Learned Indexes (LIs) represent a paradigm shift from traditional index structures by employing machine learning models to approximate the cumulative distribution function (CDF) of sorted data. |
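The Sig2Model abstract describes learned indexes as machine learning models that approximate the CDF of sorted data. A minimal sketch of that general idea follows, assuming the simplest possible model: one least-squares line mapping a key to its position in a sorted array, plus the worst-case prediction error recorded at build time so that lookups only binary-search a bounded window. This is not Sig2Model's boosting-driven or updatable method; the class `LinearLearnedIndex` and its methods are hypothetical names chosen for illustration.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Hypothetical single-model learned index over a sorted list of keys.

    A linear model approximates the empirical CDF scaled to array positions,
    i.e. position ~= slope * key + intercept. The largest error observed at
    build time bounds the window that a lookup has to search.
    """

    def __init__(self, keys: List[float]) -> None:
        if not keys:
            raise ValueError("keys must be non-empty")
        if any(a > b for a, b in zip(keys, keys[1:])):
            raise ValueError("keys must be sorted")
        self.keys = keys
        n = len(keys)
        # Ordinary least-squares fit of position i against key k.
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        self.slope = cov / var_k if var_k > 0 else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case distance between predicted and true position.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(keys))

    def _predict(self, key: float) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: float) -> Optional[int]:
        """Return the index of the first occurrence of `key`, or None."""
        pos = self._predict(key)
        lo = max(0, pos - self.max_err)
        hi = max(lo, min(len(self.keys), pos + self.max_err + 1))
        # Binary search restricted to the error window [lo, hi).
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    idx = LinearLearnedIndex([1, 3, 4, 7, 9, 12, 15, 20])
    print(idx.lookup(9), idx.lookup(8))
```

The design trades a full binary search over all `n` keys for a search over at most `2 * max_err + 1` positions; the better the model fits the data's CDF, the smaller that window. Supporting inserts without refitting, which is the problem the Sig2Model title targets, is outside the scope of this sketch.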