2025-11-03
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects | Mansi Choudhary, Karthik Sangaiah, Sonali Singh, Muhammad Osama, Lisa Wu Wills, Ganesh Dasika | 2025-11-03 | The rise of disaggregated AI GPUs has exposed a critical bottleneck in large-scale attention workloads: non-uniform memory access (NUMA). As multi-chiplet designs become the norm for scaling compute c... |
| LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models | Ahmad Tahmasivand, Noureldin Zahran, Saba Al-Sayouri, Mohammed Fouda, Khaled N. Khasawneh | 2025-11-03 | This paper presents LM-Fix, a lightweight detection and rapid recovery framework for faults in large language models (LLMs). Existing integrity approaches are often heavy or slow for modern LLMs. |
| Simulation-Driven Evaluation of Chiplet-Based Architectures Using VisualSim | Wajid Ali, Ayaz Akram, Deepak Shankar | 2025-11-03 | This paper focuses on the simulation of multi-die System-on-Chip (SoC) architectures using VisualSim, emphasizing chiplet-based system modeling and performance analysis. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects | Mansi Choudhary, Karthik Sangaiah, Sonali Singh, Muhammad Osama, Lisa Wu Wills, Ganesh Dasika | 2025-11-03 | The rise of disaggregated AI GPUs has exposed a critical bottleneck in large-scale attention workloads: non-uniform memory access (NUMA). As multi-chiplet designs become the norm for scaling compute c... |
| Quantum-Enhanced Generative Models for Rare Event Prediction | M. Z. Haider, M. U. Ghouri, Tayyaba Noreen, M. Salman | 2025-11-03 | Rare events such as financial crashes, climate extremes, and biological anomalies are notoriously difficult to model due to their scarcity and heavy-tailed distributions. |
| GPoS: Geospatially-aware Proof of Stake | Shashank Motepalli, Naman Garg, Gengrui Zhang, Hans-Arno Jacobsen | 2025-11-03 | Geospatial decentralization is essential for blockchains, ensuring regulatory resilience, robustness, and fairness. We empirically analyze five major Proof of Stake (PoS) blockchains: Aptos, Avalanche... |
| RobustFSM: Submodular Maximization in Federated Setting with Malicious Clients | Duc A. Tran, Dung Truong, Duy Le | 2025-11-03 | Submodular maximization is an optimization problem benefiting many machine learning applications, where we seek a small subset best representing an extremely large dataset. |
| LARK -- Linearizability Algorithms for Replicated Keys in Aerospike | Andrew Goodng, Kevin Porter, Thomas Lopatic, Ashish Shinde, Sunil Sayyaparaju, Srinivasan Seshadri, V. Srinivasan | 2025-11-03 | We present LARK (Linearizability Algorithms for Replicated Keys), a synchronous replication protocol that achieves linearizability while minimizing latency and infrastructure cost, at significantly hi... |
| Edge AI in Highly Volatile Environments: Is Fairness Worth the Accuracy Trade-off? | Obaidullah Zaland, Feras M. Awaysheh, Sawsan Al Zubi, Abdul Rahman Safi, Monowar Bhuyan | 2025-11-03 | Federated learning (FL) has emerged as a transformative paradigm for edge intelligence, enabling collaborative model training while preserving data privacy across distributed personal devices. |
| Federated Cyber Defense: Privacy-Preserving Ransomware Detection Across Distributed Systems | Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Joaquin Del Rio, Oleksii Sliusarenko, Xabi Uribe-Etxebarria | 2025-11-03 | Detecting malware, especially ransomware, is essential to securing today's interconnected ecosystems, including cloud storage, enterprise file-sharing, and database services. |
| Adaptive Multidimensional Quadrature on Multi-GPU Systems | Melanie Tonarelli, Simone Riva, Pietro Benedusi, Fabrizio Ferrandi, Rolf Krause | 2025-11-03 | We introduce a distributed adaptive quadrature method that formulates multidimensional integration as a hierarchical domain decomposition problem on multi-GPU architectures. |
| Real-time Continual Learning on Intel Loihi 2 | Elvin Hajizada, Danielle Rager, Timothy Shea, Leobardo Campos-Macias, Andreas Wild, Eyke Hüllermeier, Yulia Sandamirskaya, Mike Davies | 2025-11-03 | AI systems on edge devices face a critical challenge in open-world environments: adapting when data distributions shift and novel classes emerge. |
| Gradient Clock Synchronization with Practically Constant Local Skew | Christoph Lenzen | 2025-11-03 | Gradient Clock Synchronization (GCS) is the task of minimizing the local skew, i.e., the clock offset between neighboring clocks, in a larger network. |
| Why Should the Server Do It All?: A Scalable, Versatile, and Model-Agnostic Framework for Server-Light DNN Inference over Massively Distributed Clients via Training-Free Intermediate Feature Compression | Mingyu Sung, Suhwan Im, Daeho Bang, Il-Min Kim, Sangseok Yun, Jae-Mo Kang | 2025-11-03 | Modern DNNs often rely on edge-cloud model partitioning (MP), but widely used schemes fix shallow, static split points that underutilize edge compute and concentrate latency and energy on the server. |
| Transformer-Based Sparse CSI Estimation for Non-Stationary Channels | Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Hassan Rizwan, Sagnik Bhattacharya, Muhammad Ali Jamshed, John M. Cioffi | 2025-11-03 | Accurate and efficient estimation of Channel State Information (CSI) is critical for next-generation wireless systems operating under non-stationary conditions, where user mobility, Doppler spread, an... |
| Design of quasi phase matching crystal based on differential gray wolf algorithm | He Chen, ZiHua Zheng, JingHua Sun | 2025-11-03 | This paper focuses on the key problem in the development of nonlinear optical technology, the performance optimization of aperiodically polarized crystals. |
| Scalable Maxflow Processing for Dynamic Graphs | Shruthi Kannappan, Ashwina Kumar, Rupesh Nasre | 2025-11-03 | The Maximum Flow (Max-Flow) problem is a cornerstone in graph theory and combinatorial optimization, aiming to determine the largest possible flow from a designated source node to a sink node within a... |
| Boosting performance of computer vision applications through embedded GPUs on the edge | Fabio Diniz Rossi | 2025-11-03 | Computer vision applications, especially those using augmented reality technology, are becoming quite popular in mobile devices. However, this type of application is known as presenting significant de... |
| Neuro-Inspired Task Offloading in Edge-IoT Networks Using Spiking Neural Networks | Fabio Diniz Rossi | 2025-11-03 | Traditional task offloading strategies in edge computing often rely on static heuristics or data-intensive machine learning models, which are not always suitable for highly dynamic and resource-constr... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Pinching Antennas Meet AI in Next-Generation Wireless Networks | Fang Fang, Zhiguo Ding, Victor C. M. Leung, Lajos Hanzo | 2025-11-03 | Next-generation (NG) wireless networks must embrace innate intelligence in support of demanding emerging applications, such as extended reality and autonomous systems, under ultra-reliable and low-lat... |
| GPoS: Geospatially-aware Proof of Stake | Shashank Motepalli, Naman Garg, Gengrui Zhang, Hans-Arno Jacobsen | 2025-11-03 | Geospatial decentralization is essential for blockchains, ensuring regulatory resilience, robustness, and fairness. We empirically analyze five major Proof of Stake (PoS) blockchains: Aptos, Avalanche... |
| Deep Reinforcement Learning for Multi-flow Routing in Heterogeneous Wireless Networks | Brian Kim, Justin H. Kong, Terrence J. Moore, Fikadu T. Dagefu | 2025-11-03 | Due to the rapid growth of heterogeneous wireless networks (HWNs), where devices with diverse communication technologies coexist, there is increasing demand for efficient and adaptive multi-hop routin... |
| A Modular DTaaS Architecture for Predictive Slice Management in 6G Systems | Tuğçe Bilen, Mehmet Özdem | 2025-11-03 | The sixth generation (6G) of wireless networks will require fundamentally new orchestration paradigms to meet stringent requirements for ultra-low latency, high reliability, and pervasive intelligence... |
| Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing | Song Gao, Songyang Zhang, Shusen Jing, Shuai Zhang, Xiangwei Zhou, Yue Wang, Zhipeng Cai | 2025-11-03 | Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. |
| Multi-Server FL with Overlapping Clients: A Latency-Aware Relay Framework | Yun Ji, Zeyu Chen, Xiaoxiong Zhong, Yanan Ma, Sheng Zhang, Yuguang Fang | 2025-11-03 | Multi-server Federated Learning (FL) has emerged as a promising solution to mitigate communication bottlenecks of single-server FL. In a typical multi-server FL architecture, the regions covered by di... |
| Beyond Static Thresholds: Adaptive RRC Signaling Storm Detection with Extreme Value Theory | Dang Kien Nguyen, Rim El Malki, Filippo Rebecchi, Raymond Knopp, Melek Önen | 2025-11-03 | In 5G and beyond networks, the radio communication between a User Equipment (UE) and a base station (gNodeB or gNB), also known as the air interface, is a critical component of network access and conn... |
| 3D Gaussian Radiation Field Modeling for Integrated RIS-FAS Systems: Analysis and Optimization | Kaining Wang, Bo Yang, Yusheng Lei, Zhiwen Yu, Xuelin Cao, Liang Wang, Bin Guo, George C. Alexandropoulos, Mérouane Debbah, Zhu Han | 2025-11-03 | The integration of reconfigurable intelligent surfaces (RIS) and fluid antenna systems (FAS) has attracted considerable attention due to its tremendous potential in enhancing wireless communication pe... |
| DeepSpecs: Expert-Level Questions Answering in 5G | Aman Ganapathy Manvattira, Yifei Xu, Ziyue Dang, Songwu Lu | 2025-11-03 | 5G technology enables mobile Internet access for billions of users. Answering expert-level questions about 5G specifications requires navigating thousands of pages of cross-referenced standards that e... |
| Experimental Demonstration of Software-Orchestrated Quantum Network Applications over a Campus-Scale Testbed | Md. Shariful Islam, Joaquin Chung, Ely Marcus Eastman, Robert J. Hayek, Prem Kumar, Rajkumar Kettimuthu | 2025-11-03 | To fulfill their promise, quantum networks must transform from isolated testbeds into scalable infrastructures for distributed quantum applications. |
| Joint Computation Offloading and Resource Allocation for Maritime MEC with Energy Harvesting | Zhen Wang, Bin Lin, Qiang Ye, Yuguang Fang, Xiaoling Han | 2025-11-03 | In this paper, we establish a multi-access edge computing (MEC)-enabled sea lane monitoring network (MSLMN) architecture with energy harvesting (EH) to support dynamic ship tracking, accident forensic... |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects | Mansi Choudhary, Karthik Sangaiah, Sonali Singh, Muhammad Osama, Lisa Wu Wills, Ganesh Dasika | 2025-11-03 | The rise of disaggregated AI GPUs has exposed a critical bottleneck in large-scale attention workloads: non-uniform memory access (NUMA). As multi-chiplet designs become the norm for scaling compute c... |
| Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants | Bozhi You, Irene Wang, Zelal Su Mustafaoglu, Abhinav Jangda, Angélica Moreira, Roshan Dathathri, Divya Mahajan, Keshav Pingali | 2025-11-03 | Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion t... |
| Simulation-Driven Evaluation of Chiplet-Based Architectures Using VisualSim | Wajid Ali, Ayaz Akram, Deepak Shankar | 2025-11-03 | This paper focuses on the simulation of multi-die System-on-Chip (SoC) architectures using VisualSim, emphasizing chiplet-based system modeling and performance analysis. |