2025-12-26

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling | Hannah Atmer, Yuan Yao, Thiemo Voigt, Stefanos Kaxiras | 2025-12-26 | Download | Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficien... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Efficient Multi-Model Orchestration for Self-Hosted Large Language Models | Bhanu Prakash Vangala, Tanu Malik | 2025-12-26 | Download | Self-hosting large language models (LLMs) is increasingly appealing for organizations seeking privacy, cost control, and customization. Yet deploying and maintaining in-house models poses challenges i... |
| Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL | Saurabh Deochake, Debajyoti Mukhopadhyay | 2025-12-26 | Download | While Text-to-SQL systems achieve high accuracy, existing efficiency metrics like the Valid Efficiency Score prioritize execution time, a metric we show is fundamentally decoupled from consumption-bas... |
| Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications | Shengkun Cui, Rahul Krishna, Saurabh Jha, Ravishankar K. Iyer | 2025-12-26 | Download | Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents costing on average over $2M per hour. |
| Proceedings First Workshop on Adaptable Cloud Architectures | Giuseppe De Palma, Saverio Giallorenzo | 2025-12-26 | Download | This volume contains the post-proceedings of the Workshop on Adaptable Cloud Architectures (WACA 2025), held on June 20, 2025, in Lille, France, co-located with DisCoTec 2025 - 20th International Fede... |
| FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion | Zhuoran Zhu, Chunyang Zhu, Hao Lin, Xu Fu, Yiming Zhou, Quanlu Zhang, Zhenhua Li, Feng Qian, Chao Yu, Boxun Li, Guohao Dai, Yu Wang | 2025-12-26 | Download | Large-scale Mixture-of-Experts (MoE) models rely on expert parallelism for efficient training and inference, which splits experts across devices and necessitates distributed data shuffling to r... |
| Robust Federated Fine-Tuning in Heterogeneous Networks with Unreliable Connections: An Aggregation View | Yanmeng Wang, Zhiwen Dai, Shuai Wang, Jian Zhou, Fu Xiao, Tony Q. S. Quek, Tsung-Hui Chang | 2025-12-26 | Download | Federated Fine-Tuning (FFT) has attracted growing interest as it leverages both server- and client-side data to enhance global model generalization while preserving privacy, and significantly reduces ... |
| BLEST: Blazingly Efficient BFS using Tensor Cores | Deniz Elbek, Kamer Kaya | 2025-12-26 | Download | Breadth-First Search (BFS) is a fundamental graph kernel that underpins a wide range of applications. While modern GPUs provide specialised Matrix-Multiply-Accumulate (MMA) units, e.g. |
| Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models | Tingyang Sun, Ting He, Bo Ji, Parimal Parag | 2025-12-26 | Download | Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs. |
| LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices | Mingyu Sun, Xiao Zhang, Shen Qu, Yan Li, Mengbai Xiao, Yuan Yuan, Dongxiao Yu | 2025-12-26 | Download | Large language models (LLMs) have emerged as a powerful foundation for intelligent reasoning and decision-making, demonstrating substantial impact across a wide range of domains and applications. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Schwarz Information Criterion Aided MAB for Resource Allocation in Dynamic LoRa System | Ryotai Ariyoshi, Aohan Li, Mikio Hasegawa, Miao Pan, Tomoaki Ohtsuki, Zhu Han | 2025-12-26 | Download | This paper proposes a lightweight distributed learning method for transmission parameter selection in Long Range (LoRa) networks that can adapt to dynamic communication environments. |
| Meta-Learning-Based Handover Management in NextG O-RAN | Michail Kalntis, George Iosifidis, José Suárez-Varela, Andra Lutu, Fernando A. Kuipers | 2025-12-26 | Download | While traditional handovers (THOs) have served as a backbone for mobile connectivity, they increasingly suffer from failures and delays, especially in dense deployments and high-frequency bands. |
| Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models | Tingyang Sun, Ting He, Bo Ji, Parimal Parag | 2025-12-26 | Download | Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling | Hannah Atmer, Yuan Yao, Thiemo Voigt, Stefanos Kaxiras | 2025-12-26 | Download | Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficien... |
