2025-12-26
cs.AR - Hardware Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling | Hannah Atmer, Yuan Yao, Thiemo Voigt, Stefanos Kaxiras | 2025-12-26 | Download | Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficien... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Efficient Multi-Model Orchestration for Self-Hosted Large Language Models | Bhanu Prakash Vangala, Tanu Malik | 2025-12-26 | Download | Self-hosting large language models (LLMs) is increasingly appealing for organizations seeking privacy, cost control, and customization. Yet deploying and maintaining in-house models poses challenges i... |
| Cost Trade-offs of Reasoning and Non-Reasoning Large Language Models in Text-to-SQL | Saurabh Deochake, Debajyoti Mukhopadhyay | 2025-12-26 | Download | While Text-to-SQL systems achieve high accuracy, existing efficiency metrics like the Valid Efficiency Score prioritize execution time, a metric we show is fundamentally decoupled from consumption-bas... |
| Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications | Shengkun Cui, Rahul Krishna, Saurabh Jha, Ravishankar K. Iyer | 2025-12-26 | Download | Cloud incidents pose major operational challenges in production, with unresolved production cloud incidents cost on average over $2M per hour. |
| Proceedings First Workshop on Adaptable Cloud Architectures | Giuseppe De Palma, Saverio Giallorenzo | 2025-12-26 | Download | This volume contains the post-proceedings of the Workshop on Adaptable Cloud Architectures (WACA 2025), held on June 20, 2025, in Lille, France, co-located with DisCoTec 2025 - 20th International Fede... |
| FUSCO: High-Performance Distributed Data Shuffling via Transformation-Communication Fusion | Zhuoran Zhu, Chunyang Zhu, Hao Lin, Xu Fu, Yiming Zhou, Quanlu Zhang, Zhenhua Li, Feng Qian, Chao Yu, Boxun Li, Guohao Dai, Yu Wang | 2025-12-26 | Download | Large-scale Mixture-of-Experts (MoE) models rely on *expert parallelism* for efficient training and inference, which splits experts across devices and necessitates distributed data shuffling to r... |
| Robust Federated Fine-Tuning in Heterogeneous Networks with Unreliable Connections: An Aggregation View | Yanmeng Wang, Zhiwen Dai, Shuai Wang, Jian Zhou, Fu Xiao, Tony Q. S. Quek, Tsung-Hui Chang | 2025-12-26 | Download | Federated Fine-Tuning (FFT) has attracted growing interest as it leverages both server- and client-side data to enhance global model generalization while preserving privacy, and significantly reduces ... |
| BLEST: Blazingly Efficient BFS using Tensor Cores | Deniz Elbek, Kamer Kaya | 2025-12-26 | Download | Breadth-First Search (BFS) is a fundamental graph kernel that underpins a wide range of applications. While modern GPUs provide specialised Matrix-Multiply-Accumulate (MMA) units, e.g. |
| Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models | Tingyang Sun, Ting He, Bo Ji, Parimal Parag | 2025-12-26 | Download | Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs. |
| LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices | Mingyu Sun, Xiao Zhang, Shen Qu, Yan Li, Mengbai Xiao, Yuan Yuan, Dongxiao Yu | 2025-12-26 | Download | Large language models (LLMs) have emerged as a powerful foundation for intelligent reasoning and decision-making, demonstrating substantial impact across a wide range of domains and applications. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Schwarz Information Criterion Aided MAB for Resource Allocation in Dynamic LoRa System | Ryotai Ariyoshi, Aohan Li, Mikio Hasegawa, Miao Pan, Tomoaki Ohtsuki, Zhu Han | 2025-12-26 | Download | This paper proposes a lightweight distributed learning method for transmission parameter selection in Long Range (LoRa) networks that can adapt to dynamic communication environments. |
| Meta-Learning-Based Handover Management in NextG O-RAN | Michail Kalntis, George Iosifidis, José Suárez-Varela, Andra Lutu, Fernando A. Kuipers | 2025-12-26 | Download | While traditional handovers (THOs) have served as a backbone for mobile connectivity, they increasingly suffer from failures and delays, especially in dense deployments and high-frequency bands. |
| Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models | Tingyang Sun, Ting He, Bo Ji, Parimal Parag | 2025-12-26 | Download | Large language models have demonstrated extraordinary performance in many AI tasks but are expensive to use, even after training, due to their requirement of high-end GPUs. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling | Hannah Atmer, Yuan Yao, Thiemo Voigt, Stefanos Kaxiras | 2025-12-26 | Download | Energy consumption dictates the cost and environmental impact of deploying Large Language Models. This paper investigates the impact of on-chip SRAM size and operating frequency on the energy efficien... |