2025-12-26

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling	Hannah Atmer, Yuan Yao, Thiemo Voigt, Stefanos Kaxiras	2025-12-26	Download	Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficien...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Efficient Multi-Model Orchestration for Self-Hosted Large Language Models	Bhanu Prakash Vangala, Tanu Malik	2025-12-26	Download	Self-hosting large language models (LLMs) is increasingly appealing for organizations seeking privacy, cost control, and customization. Yet deploying and maintaining in-house models poses challenges i...
Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL	Saurabh Deochake, Debajyoti Mukhopadhyay	2025-12-26	Download	While Text-to-SQL systems achieve high accuracy, existing efficiency metrics like the Valid Efficiency Score prioritize execution time, a metric we show is fundamentally decoupled from consumption-bas...
Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications	Shengkun Cui, Rahul Krishna, Saurabh Jha, Ravishankar K. Iyer	2025-12-26	Download	Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents costing on average over $2M per hour.
Proceedings First Workshop on Adaptable Cloud Architectures	Giuseppe De Palma, Saverio Giallorenzo	2025-12-26	Download	This volume contains the post-proceedings of the Workshop on Adaptable Cloud Architectures (WACA 2025), held on June 20, 2025, in Lille, France, co-located with DisCoTec 2025 - 20th International Fede...
FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion	Zhuoran Zhu, Chunyang Zhu, Hao Lin, Xu Fu, Yiming Zhou, Quanlu Zhang, Zhenhua Li, Feng Qian, Chao Yu, Boxun Li, Guohao Dai, Yu Wang	2025-12-26	Download	Large-scale Mixture-of-Experts (MoE) models rely on \emph{expert parallelism} for efficient training and inference, which splits experts across devices and necessitates distributed data shuffling to r...
Robust Federated Fine-Tuning in Heterogeneous Networks with Unreliable Connections: An Aggregation View	Yanmeng Wang, Zhiwen Dai, Shuai Wang, Jian Zhou, Fu Xiao, Tony Q. S. Quek, Tsung-Hui Chang	2025-12-26	Download	Federated Fine-Tuning (FFT) has attracted growing interest as it leverages both server- and client-side data to enhance global model generalization while preserving privacy, and significantly reduces ...
BLEST: Blazingly Efficient BFS using Tensor Cores	Deniz Elbek, Kamer Kaya	2025-12-26	Download	Breadth-First Search (BFS) is a fundamental graph kernel that underpins a wide range of applications. While modern GPUs provide specialised Matrix-Multiply-Accumulate (MMA) units, e.g.
Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models	Tingyang Sun, Ting He, Bo Ji, Parimal Parag	2025-12-26	Download	Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs.
LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices	Mingyu Sun, Xiao Zhang, Shen Qu, Yan Li, Mengbai Xiao, Yuan Yuan, Dongxiao Yu	2025-12-26	Download	Large language models (LLMs) have emerged as a powerful foundation for intelligent reasoning and decision-making, demonstrating substantial impact across a wide range of domains and applications.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Schwarz Information Criterion Aided MAB for Resource Allocation in Dynamic LoRa System	Ryotai Ariyoshi, Aohan Li, Mikio Hasegawa, Miao Pan, Tomoaki Ohtsuki, Zhu Han	2025-12-26	Download	This paper proposes a lightweight distributed learning method for transmission parameter selection in Long Range (LoRa) networks that can adapt to dynamic communication environments.
Meta-Learning-Based Handover Management in NextG O-RAN	Michail Kalntis, George Iosifidis, José Suárez-Varela, Andra Lutu, Fernando A. Kuipers	2025-12-26	Download	While traditional handovers (THOs) have served as a backbone for mobile connectivity, they increasingly suffer from failures and delays, especially in dense deployments and high-frequency bands.
Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models	Tingyang Sun, Ting He, Bo Ji, Parimal Parag	2025-12-26	Download	Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling	Hannah Atmer, Yuan Yao, Thiemo Voigt, Stefanos Kaxiras	2025-12-26	Download	Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficien...