
2025-11-03

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects | Mansi Choudhary, Karthik Sangaiah, Sonali Singh, Muhammad Osama, Lisa Wu Wills, Ganesh Dasika | 2025-11-03 | Download | The rise of disaggregated AI GPUs has exposed a critical bottleneck in large-scale attention workloads: non-uniform memory access (NUMA). As multi-chiplet designs become the norm for scaling compute c... |
| LM-Fix: Lightweight Bit-Flip Detection and Rapid Recovery Framework for Language Models | Ahmad Tahmasivand, Noureldin Zahran, Saba Al-Sayouri, Mohammed Fouda, Khaled N. Khasawneh | 2025-11-03 | Download | This paper presents LM-Fix, a lightweight detection and rapid recovery framework for faults in large language models (LLMs). Existing integrity approaches are often heavy or slow for modern LLMs. |
| Simulation-Driven Evaluation of Chiplet-Based Architectures Using VisualSim | Wajid Ali, Ayaz Akram, Deepak Shankar | 2025-11-03 | Download | This paper focuses on the simulation of multi-die System-on-Chip (SoC) architectures using VisualSim, emphasizing chiplet-based system modeling and performance analysis. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects | Mansi Choudhary, Karthik Sangaiah, Sonali Singh, Muhammad Osama, Lisa Wu Wills, Ganesh Dasika | 2025-11-03 | Download | The rise of disaggregated AI GPUs has exposed a critical bottleneck in large-scale attention workloads: non-uniform memory access (NUMA). As multi-chiplet designs become the norm for scaling compute c... |
| Quantum-Enhanced Generative Models for Rare Event Prediction | M. Z. Haider, M. U. Ghouri, Tayyaba Noreen, M. Salman | 2025-11-03 | Download | Rare events such as financial crashes, climate extremes, and biological anomalies are notoriously difficult to model due to their scarcity and heavy-tailed distributions. |
| GPoS: Geospatially-aware Proof of Stake | Shashank Motepalli, Naman Garg, Gengrui Zhang, Hans-Arno Jacobsen | 2025-11-03 | Download | Geospatial decentralization is essential for blockchains, ensuring regulatory resilience, robustness, and fairness. We empirically analyze five major Proof of Stake (PoS) blockchains: Aptos, Avalanche... |
| RobustFSM: Submodular Maximization in Federated Setting with Malicious Clients | Duc A. Tran, Dung Truong, Duy Le | 2025-11-03 | Download | Submodular maximization is an optimization problem benefiting many machine learning applications, where we seek a small subset best representing an extremely large dataset. |
| LARK -- Linearizability Algorithms for Replicated Keys in Aerospike | Andrew Goodng, Kevin Porter, Thomas Lopatic, Ashish Shinde, Sunil Sayyaparaju, Srinivasan Seshadri, V. Srinivasan | 2025-11-03 | Download | We present LARK (Linearizability Algorithms for Replicated Keys), a synchronous replication protocol that achieves linearizability while minimizing latency and infrastructure cost, at significantly hi... |
| Edge AI in Highly Volatile Environments: Is Fairness Worth the Accuracy Trade-off? | Obaidullah Zaland, Feras M. Awaysheh, Sawsan Al Zubi, Abdul Rahman Safi, Monowar Bhuyan | 2025-11-03 | Download | Federated learning (FL) has emerged as a transformative paradigm for edge intelligence, enabling collaborative model training while preserving data privacy across distributed personal devices. |
| Federated Cyber Defense: Privacy-Preserving Ransomware Detection Across Distributed Systems | Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Joaquin Del Rio, Oleksii Sliusarenko, Xabi Uribe-Etxebarria | 2025-11-03 | Download | Detecting malware, especially ransomware, is essential to securing today's interconnected ecosystems, including cloud storage, enterprise file-sharing, and database services. |
| Adaptive Multidimensional Quadrature on Multi-GPU Systems | Melanie Tonarelli, Simone Riva, Pietro Benedusi, Fabrizio Ferrandi, Rolf Krause | 2025-11-03 | Download | We introduce a distributed adaptive quadrature method that formulates multidimensional integration as a hierarchical domain decomposition problem on multi-GPU architectures. |
| Real-time Continual Learning on Intel Loihi 2 | Elvin Hajizada, Danielle Rager, Timothy Shea, Leobardo Campos-Macias, Andreas Wild, Eyke Hüllermeier, Yulia Sandamirskaya, Mike Davies | 2025-11-03 | Download | AI systems on edge devices face a critical challenge in open-world environments: adapting when data distributions shift and novel classes emerge. |
| Gradient Clock Synchronization with Practically Constant Local Skew | Christoph Lenzen | 2025-11-03 | Download | Gradient Clock Synchronization (GCS) is the task of minimizing the local skew, i.e., the clock offset between neighboring clocks, in a larger network. |
| Why Should the Server Do It All?: A Scalable, Versatile, and Model-Agnostic Framework for Server-Light DNN Inference over Massively Distributed Clients via Training-Free Intermediate Feature Compression | Mingyu Sung, Suhwan Im, Daeho Bang, Il-Min Kim, Sangseok Yun, Jae-Mo Kang | 2025-11-03 | Download | Modern DNNs often rely on edge-cloud model partitioning (MP), but widely used schemes fix shallow, static split points that underutilize edge compute and concentrate latency and energy on the server. |
| Transformer-Based Sparse CSI Estimation for Non-Stationary Channels | Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Hassan Rizwan, Sagnik Bhattacharya, Muhammad Ali Jamshed, John M. Cioffi | 2025-11-03 | Download | Accurate and efficient estimation of Channel State Information (CSI) is critical for next-generation wireless systems operating under non-stationary conditions, where user mobility, Doppler spread, an... |
| Design of quasi phase matching crystal based on differential gray wolf algorithm | He Chen, ZiHua Zheng, JingHua Sun | 2025-11-03 | Download | This paper focuses on the key problem in the development of nonlinear optical technology, the performance optimization of aperiodically polarized crystals. |
| Scalable Maxflow Processing for Dynamic Graphs | Shruthi Kannappan, Ashwina Kumar, Rupesh Nasre | 2025-11-03 | Download | The Maximum Flow (Max-Flow) problem is a cornerstone in graph theory and combinatorial optimization, aiming to determine the largest possible flow from a designated source node to a sink node within a... |
| Boosting performance of computer vision applications through embedded GPUs on the edge | Fabio Diniz Rossi | 2025-11-03 | Download | Computer vision applications, especially those using augmented reality technology, are becoming quite popular in mobile devices. However, this type of application is known as presenting significant de... |
| Neuro-Inspired Task Offloading in Edge-IoT Networks Using Spiking Neural Networks | Fabio Diniz Rossi | 2025-11-03 | Download | Traditional task offloading strategies in edge computing often rely on static heuristics or data-intensive machine learning models, which are not always suitable for highly dynamic and resource-constr... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Pinching Antennas Meet AI in Next-Generation Wireless Networks | Fang Fang, Zhiguo Ding, Victor C. M. Leung, Lajos Hanzo | 2025-11-03 | Download | Next-generation (NG) wireless networks must embrace innate intelligence in support of demanding emerging applications, such as extended reality and autonomous systems, under ultra-reliable and low-lat... |
| GPoS: Geospatially-aware Proof of Stake | Shashank Motepalli, Naman Garg, Gengrui Zhang, Hans-Arno Jacobsen | 2025-11-03 | Download | Geospatial decentralization is essential for blockchains, ensuring regulatory resilience, robustness, and fairness. We empirically analyze five major Proof of Stake (PoS) blockchains: Aptos, Avalanche... |
| Deep Reinforcement Learning for Multi-flow Routing in Heterogeneous Wireless Networks | Brian Kim, Justin H. Kong, Terrence J. Moore, Fikadu T. Dagefu | 2025-11-03 | Download | Due to the rapid growth of heterogeneous wireless networks (HWNs), where devices with diverse communication technologies coexist, there is increasing demand for efficient and adaptive multi-hop routin... |
| A Modular DTaaS Architecture for Predictive Slice Management in 6G Systems | Tuğçe Bilen, Mehmet Özdem | 2025-11-03 | Download | The sixth generation (6G) of wireless networks will require fundamentally new orchestration paradigms to meet stringent requirements for ultra-low latency, high reliability, and pervasive intelligence... |
| Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing | Song Gao, Songyang Zhang, Shusen Jing, Shuai Zhang, Xiangwei Zhou, Yue Wang, Zhipeng Cai | 2025-11-03 | Download | Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. |
| Multi-Server FL with Overlapping Clients: A Latency-Aware Relay Framework | Yun Ji, Zeyu Chen, Xiaoxiong Zhong, Yanan Ma, Sheng Zhang, Yuguang Fang | 2025-11-03 | Download | Multi-server Federated Learning (FL) has emerged as a promising solution to mitigate communication bottlenecks of single-server FL. In a typical multi-server FL architecture, the regions covered by di... |
| Beyond Static Thresholds: Adaptive RRC Signaling Storm Detection with Extreme Value Theory | Dang Kien Nguyen, Rim El Malki, Filippo Rebecchi, Raymond Knopp, Melek Önen | 2025-11-03 | Download | In 5G and beyond networks, the radio communication between a User Equipment (UE) and a base station (gNodeB or gNB), also known as the air interface, is a critical component of network access and conn... |
| 3D Gaussian Radiation Field Modeling for Integrated RIS-FAS Systems: Analysis and Optimization | Kaining Wang, Bo Yang, Yusheng Lei, Zhiwen Yu, Xuelin Cao, Liang Wang, Bin Guo, George C. Alexandropoulos, Mérouane Debbah, Zhu Han | 2025-11-03 | Download | The integration of reconfigurable intelligent surfaces (RIS) and fluid antenna systems (FAS) has attracted considerable attention due to its tremendous potential in enhancing wireless communication pe... |
| DeepSpecs: Expert-Level Questions Answering in 5G | Aman Ganapathy Manvattira, Yifei Xu, Ziyue Dang, Songwu Lu | 2025-11-03 | Download | 5G technology enables mobile Internet access for billions of users. Answering expert-level questions about 5G specifications requires navigating thousands of pages of cross-referenced standards that e... |
| Experimental Demonstration of Software-Orchestrated Quantum Network Applications over a Campus-Scale Testbed | Md. Shariful Islam, Joaquin Chung, Ely Marcus Eastman, Robert J. Hayek, Prem Kumar, Rajkumar Kettimuthu | 2025-11-03 | Download | To fulfill their promise, quantum networks must transform from isolated testbeds into scalable infrastructures for distributed quantum applications. |
| Joint Computation Offloading and Resource Allocation for Maritime MEC with Energy Harvesting | Zhen Wang, Bin Lin, Qiang Ye, Yuguang Fang, Xiaoling Han | 2025-11-03 | Download | In this paper, we establish a multi-access edge computing (MEC)-enabled sea lane monitoring network (MSLMN) architecture with energy harvesting (EH) to support dynamic ship tracking, accident forensic... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimizing Attention on GPUs by Exploiting GPU Architectural NUMA Effects | Mansi Choudhary, Karthik Sangaiah, Sonali Singh, Muhammad Osama, Lisa Wu Wills, Ganesh Dasika | 2025-11-03 | Download | The rise of disaggregated AI GPUs has exposed a critical bottleneck in large-scale attention workloads: non-uniform memory access (NUMA). As multi-chiplet designs become the norm for scaling compute c... |
| Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants | Bozhi You, Irene Wang, Zelal Su Mustafaoglu, Abhinav Jangda, Angélica Moreira, Roshan Dathathri, Divya Mahajan, Keshav Pingali | 2025-11-03 | Download | Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion t... |
| Simulation-Driven Evaluation of Chiplet-Based Architectures Using VisualSim | Wajid Ali, Ayaz Akram, Deepak Shankar | 2025-11-03 | Download | This paper focuses on the simulation of multi-die System-on-Chip (SoC) architectures using VisualSim, emphasizing chiplet-based system modeling and performance analysis. |
