
2025-10-15

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
ArtNet: Hierarchical Clustering-Based Artificial Netlist Generator for ML and DTCO Application | Andrew B. Kahng, Seokhyeong Kang, Seonghyeon Park, Dooseok Yoon | 2025-10-15 | Download | In advanced nodes, optimization of power, performance and area (PPA) has become highly complex and challenging. Machine learning (ML) and design-technology co-optimization (DTCO) provide promising mit...
F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs | Jude Haris, José Cano | 2025-10-15 | Download | Large Language Models (LLMs) have become increasingly prominent for daily tasks, from improving sound-to-text translation to generating additional frames for the latest video games.
Energy-Efficient FPGA Framework for Non-Quantized Convolutional Neural Networks | Angelos Athanasiadis, Nikolaos Tampouratzis, Ioannis Papaefstathiou | 2025-10-15 | Download | The growing demand for real-time processing in artificial intelligence applications, particularly those involving Convolutional Neural Networks (CNNs), has highlighted the need for efficient computati...
D-com: Accelerating Iterative Processing to Enable Low-rank Decomposition of Activations | Faraz Tahmasebi, Michael Pelluer, Hyoukjun Kwon | 2025-10-15 | Download | The computation and memory costs of large language models kept increasing over last decade, which reached over the scale of 1T parameters. To address the challenges from the large scale models, model ...
ShuffleV: A Microarchitectural Defense Strategy against Electromagnetic Side-Channel Attacks in Microprocessors | Nuntipat Narkthong, Yukui Luo, Xiaolin Xu | 2025-10-15 | Download | The run-time electromagnetic (EM) emanation of microprocessors presents a side-channel that leaks the confidentiality of the applications running on them.

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
Privacy-Preserving and Incentive-Driven Relay-Based Framework for Cross-Domain Blockchain Interoperability | Saeed Moradi, Koosha Esmaeilzadeh Khorasani, Sara Rouhani | 2025-10-15 | Download | Interoperability is essential for transforming blockchains from isolated networks into collaborative ecosystems, unlocking their full potential.
Distributed-Memory Parallel Algorithms for Fixed-Radius Near Neighbor Graph Construction | Gabriel Raulet, Dmitriy Morozov, Aydin Buluc, Katherine Yelick | 2025-10-15 | Download | Computing fixed-radius near-neighbor graphs is an important first step for many data analysis algorithms. Near-neighbor graphs connect points that are close under some metric, endowing point clouds wi...
Cortex: Workflow-Aware Resource Pooling and Scheduling for Agentic Serving | Nikos Pagonas, Yeounoh Chung, Kostis Kaffes, Arvind Krishnamurthy | 2025-10-15 | Download | We introduce Cortex, a prototype workflow-aware serving platform designed for agentic workloads. The core principle of Cortex is stage isolation: it provisions dedicated resource pools for each distin...
FedHFT: Efficient Federated Finetuning with Heterogeneous Edge Clients | Fatih Ilhan, Selim Furkan Tekin, Tiansheng Huang, Gaowen Liu, Ramana Kompella, Greg Eisenhauer, Yingyan Celine Lin, Calton Pu, Ling Liu | 2025-10-15 | Download | Fine-tuning pre-trained large language models (LLMs) has become a common practice for personalized natural language understanding (NLU) applications on downstream tasks and domain-specific datasets.
Anonymized Network Sensing using C++26 std::execution on GPUs | Michael Mandulak, Sayan Ghosh, S M Ferdous, Mahantesh Halappanavar, George Slota | 2025-10-15 | Download | Large-scale network sensing plays a vital role in network traffic analysis and characterization. As network packet data grows increasingly large, parallel methods have become mainstream for network an...
Efficiently Executing High-throughput Lightweight LLM Inference Applications on Heterogeneous Opportunistic GPU Clusters with Pervasive Context Management | Thanh Son Phung, Douglas Thain | 2025-10-15 | Download | The rise of Generative AI introduces a new class of HPC workloads that integrates lightweight LLMs with traditional high-throughput applications to accelerate scientific discovery.
On-Chain Decentralized Learning and Cost-Effective Inference for DeFi Attack Mitigation | Abdulrahman Alhaidari, Balaji Palanisamy, Prashant Krishnamurthy | 2025-10-15 | Download | Billions of dollars are lost every year in DeFi platforms by transactions exploiting business logic or accounting vulnerabilities. Existing defenses focus on static code analysis, public mempool scree...
Tight Conditions for Binary-Output Tasks under Crashes | Timothé Albouy, Antonio Fernández Anta, Chryssis Georgiou, Nicolas Nicolaou, Junlang Wang | 2025-10-15 | Download | This paper explores necessary and sufficient system conditions to solve distributed tasks with binary outputs (i.e., tasks with output values in {0,1}).
FIRST: Federated Inference Resource Scheduling Toolkit for Scientific AI Model Access | Aditya Tanikanti, Benoit Côté, Yanfei Guo, Le Chen, Nickolaus Saint, Ryan Chard, Ken Raffenetti, Rajeev Thakur, Thomas Uram, Ian Foster, Michael E. Papka, Venkatram Vishwanath | 2025-10-15 | Download | We present the Federated Inference Resource Scheduling Toolkit (FIRST), a framework enabling Inference-as-a-Service across distributed High-Performance Computing (HPC) clusters.
Adaptive Rescheduling in Prefill-Decode Disaggregated LLM Inference | Zhibin Wang, Zetao Hong, Xue Li, Zibo Wang, Shipeng Li, Qingkai Meng, Qing Wang, Chengying Huan, Rong Gu, Sheng Zhong, Chen Tian | 2025-10-15 | Download | Large Language Model (LLM) inference has emerged as a fundamental paradigm. In real-world scenarios, variations in output length cause severe workload imbalance in the decode phase, particularly for l...
Service-Level Energy Modeling and Experimentation for Cloud-Native Microservices | Julian Legler, Sebastian Werner, Maria C. Borges, Stefan Tai | 2025-10-15 | Download | Microservice architectures have become the dominant paradigm for cloud-native systems, offering flexibility and scalability. However, this shift has also led to increased demand for cloud resources, c...
Verification Challenges in Sparse Matrix Vector Multiplication in High Performance Computing: Part I | Junchao Zhang | 2025-10-15 | Download | Sparse matrix vector multiplication (SpMV) is a fundamental kernel in scientific codes that rely on iterative solvers. In this first part of our work, we present both a sequential and a basic MPI para...
VSS Challenge Problem: Verifying the Correctness of AllReduce Algorithms in the MPICH Implementation of MPI | Paul D. Hovland | 2025-10-15 | Download | We describe a challenge problem for verification based on the MPICH implementation of MPI. The MPICH implementation includes several algorithms for allreduce, all of which should be functionally equiv...
F-BFQ: Flexible Block Floating-Point Quantization Accelerator for LLMs | Jude Haris, José Cano | 2025-10-15 | Download | Large Language Models (LLMs) have become increasingly prominent for daily tasks, from improving sound-to-text translation to generating additional frames for the latest video games.
ACE-GNN: Adaptive GNN Co-Inference with System-Aware Scheduling in Dynamic Edge Environments | Ao Zhou, Jianlei Yang, Tong Qiao, Yingjie Qi, Xinming Wei, Cenlin Duan, Weisheng Zhao, Chunming Hu | 2025-10-15 | Download | The device-edge co-inference paradigm effectively bridges the gap between the high resource demands of Graph Neural Networks (GNNs) and limited device resources, making it a promising solution for adv...
Distributed Reductions for the Maximum Weight Independent Set Problem | Jannick Borowitz, Ernestine Großmann, Matthias Schimek | 2025-10-15 | Download | Finding maximum-weight independent sets in graphs is an important NP-hard optimization problem. Given a vertex-weighted graph G, the task is to find a subset of pairwise non-adjacent vertices of G...
BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure | Yiyuan He, Minxian Xu, Jingfeng Wu, Jianmin Hu, Chong Ma, Min Shen, Le Chen, Chengzhong Xu, Lin Qu, Kejiang Ye | 2025-10-15 | Download | Large language models (LLMs) are increasingly deployed in AI infrastructure, driving the need for high-throughput, resource-efficient serving systems.
Scrutiny new framework in integrated distributed reliable systems | Mehdi Zekriyapanah Gashti | 2025-10-15 | Download | In this paper we present a new framework for integrated distributed systems. In the proposed framework we have used three parts to increase Satisfaction and Performance of this framework.
Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms | Tong Qiao, Ao Zhou, Yingjie Qi, Yiou Wang, Han Wan, Jianlei Yang, Chunming Hu | 2025-10-15 | Download | Graph Neural Networks (GNNs) have been widely adopted due to their strong performance. However, GNN training often relies on expensive, high-performance computing platforms, limiting accessibility for...

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
DiffLoc: Diffusion Model-Based High-Precision Positioning for 6G Networks | Taekyun Lee, Tommaso Balercia, Heasung Kim, Hyeji Kim, Jeffrey G. Andrews | 2025-10-15 | Download | This paper introduces a novel framework for high-accuracy outdoor user equipment (UE) positioning that applies a conditional generative diffusion model directly to high-dimensional massive MIMO channe...
Pilot Assignment for Distributed Massive MIMO Based on Channel Estimation Error Minimization | Mohd Saif Ali Khan, Karthik RM, Samar Agnihotri | 2025-10-15 | Download | Pilot contamination remains a major bottleneck in realizing the full potential of distributed massive MIMO systems. We propose two dynamic and scalable pilot assignment schemes designed for practical ...
Investigating Web Content Delivery Performance over Starlink | Rohan Bose, Jinwei Zhao, Tanya Shreedhar, Jianping Pan, Nitinder Mohan | 2025-10-15 | Download | Low Earth Orbit (LEO) satellite ISPs promise universal Internet connectivity, yet their interaction with content delivery remains poorly understood.
Optimize Replica Server Placement in a Satellite Network | Zhiyuan He, Yi Xu, Cheng Luo, Lili Qiu, Yuqing Yang | 2025-10-15 | Download | Satellite communication offers Internet connectivity to remote locations, such as villages, deserts, mountains, and at sea. However, transmitting content over satellite networks is significantly more ...
Beyond Lamport, Towards Probabilistic Fair Ordering | Muhammad Haseeb, Jinkun Geng, Radhika Mittal, Aurojit Panda, Srinivas Narayana, Anirudh Sivaraman | 2025-10-15 | Download | A growing class of applications demands fair ordering of events, which ensures that events generated earlier are processed before later events.
An LLM-Powered AI Agent Framework for Holistic IoT Traffic Interpretation | Daniel Adu Worae, Spyridon Mastorakis | 2025-10-15 | Download | Internet of Things (IoT) networks generate diverse and high-volume traffic that reflects both normal activity and potential threats. Deriving meaningful insight from such telemetry requires cross-laye...
NetMCP: Network-Aware Model Context Protocol Platform for LLM Capability Extension | Enhan Li, Hongyang Du, Kaibin Huang | 2025-10-15 | Download | Large Language Models (LLMs) remain static in functionality after training, and extending their capabilities requires integration with external data, computation, and services.
Mobile Coverage Analysis using Crowdsourced Data | Timothy Wong, Tom Freeman, Joseph Feehily | 2025-10-15 | Download | Effective assessment of mobile network coverage and the precise identification of service weak spots are paramount for network operators striving to enhance user Quality of Experience (QoE).
Optimizing Storage Overhead of User Behavior Log for ML-embedded Mobile Apps | Chen Gong, Yan Zhuang, Zhenzhe Zheng, Yiliu Chen, Sheng Wang, Fan Wu, Guihai Chen | 2025-10-15 | Download | Machine learning (ML) models are increasingly integrated into modern mobile apps to enable personalized and intelligent services. These models typically rely on rich input features derived from histor...
Towards Trusted Service Monitoring: Verifiable Service Level Agreements | Fernando Castillo, Eduardo Brito, Sebastian Werner, Pille Pullonen-Raudvere, Jonathan Heiss | 2025-10-15 | Download | Service Level Agreement (SLA) monitoring in service-oriented environments suffers from inherent trust conflicts when providers self-report metrics, creating incentives to underreport violations.
Automated Network Protocol Testing with LLM Agents | Yunze Wei, Kaiwen Wei, Shibo Du, Jianyu Wang, Zhangzhong Liu, Yawen Wang, Zhanyou Li, Congcong Miao, Xiaohui Xie, Yong Cui | 2025-10-15 | Download | Network protocol testing is fundamental for modern network infrastructure. However, traditional network protocol testing methods are labor-intensive and error-prone, requiring manual interpretation of...

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
Accelerated Feature Detectors for Visual SLAM: A Comparative Study of FPGA vs GPU | Ruiqi Ye, Mikel Luján | 2025-10-15 | Download | Feature detection is a common yet time-consuming module in Simultaneous Localization and Mapping (SLAM) implementations, which are increasingly deployed on power-constrained platforms, such as drones.
D-com: Accelerating Iterative Processing to Enable Low-rank Decomposition of Activations | Faraz Tahmasebi, Michael Pelluer, Hyoukjun Kwon | 2025-10-15 | Download | The computation and memory costs of large language models kept increasing over last decade, which reached over the scale of 1T parameters. To address the challenges from the large scale models, model ...
