2025-12-07

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Accurate Models of NVIDIA Tensor Cores	Faizan A. Khattak, Mantas Mikaitis	2025-12-07	Download	Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in har...
ArchPower: Dataset for Architecture-Level Power Modeling of Modern CPU Design	Qijun Zhang, Yao Lu, Mengming Li, Shang Liu, Zhiyao Xie	2025-12-07	Download	Power is the primary design objective of large-scale integrated circuits (ICs), especially for complex modern processors (i.e., CPUs). Accurate CPU power evaluation requires designers to go through th...
Formal that "Floats" High: Formal Verification of Floating Point Arithmetic	Hansa Mohanty, Vaisakh Naduvodi Viswambharan, Deepak Narayan Gadde	2025-12-07	Download	Formal verification of floating-point arithmetic remains challenging due to non-linear arithmetic behavior and the tight coupling between control and datapath logic.
GPU-Accelerated Optimization Solver for Unit Commitment in Large-Scale Power Grids	Hussein Sharadga, Javad Mohammadi	2025-12-07	Download	This work presents a GPU-accelerated solver for the unit commitment (UC) problem in large-scale power grids. The solver uses the Primal-Dual Hybrid Gradient (PDHG) algorithm to efficiently solve the r...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Optimizing video analytics inference pipelines: a case study	Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O'Connell, Hans Vandierendonck	2025-12-07	Download	Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generates substant...
ELANA: A Simple Energy and Latency Analyzer for LLMs	Hung-Yueh Chiang, Bokun Wang, Diana Marculescu	2025-12-07	Download	The latency and power consumption of large language models (LLMs) are major constraints when serving them across a wide spectrum of hardware platforms, from mobile edge devices to cloud GPU clusters.
A Chunked-Object Pattern for Multi-Region Large Payload Storage in Managed NoSQL Databases	Manideep Reddy Chinthareddy	2025-12-07	Download	Many managed key-value and NoSQL databases - such as Amazon DynamoDB, Azure Cosmos DB, and Google Cloud Firestore - enforce strict maximum item sizes (e.g., 400 KB in DynamoDB).
Cloud Revolution: Tracing the Origins and Rise of Cloud Computing	Deepa Gurung, S M Zia Ur Rashid, Zain ul Abdeen, Suman Rath	2025-12-07	Download	The history behind the development of cloud computing is more than several decades of technological progress in the fields of virtualization, distributed systems, and high-speed networking, but its cu...
Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks	Long Shi, Bingyan Ou, Kang Wei, Weihao Zhu, Zhe Wang, Zhiyong Chen	2025-12-07	Download	The sparse activation mechanism of mixture of experts (MoE) model empowers edge intelligence with enhanced training efficiency and reduced computational resource consumption.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Managed TLS Under Migration: Authentication Authority Across CDN and Hosting Transitions	Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl	2025-12-07	Download	Managed TLS has become a common approach for deploying HTTPS, with platforms generating and storing private keys and automating certificate issuance on behalf of domain operators.
Permission Manifests for Web Agents	Samuele Marro, Alan Chan, Xinxing Ren, Lewis Hammond, Jesse Wright, Gurjyot Wanga, Tiziano Piccardi, Nuno Campos, Tobin South, Jialin Yu, Sunando Sengupta, Eric Sommerlade, Alex Pentland, Philip Torr, Jiaxin Pei	2025-12-07	Download	The rise of Large Language Model (LLM)-based web agents represents a significant shift in automated interactions with the web. Unlike traditional crawlers that follow simple conventions, such as robot...
AQUILA: A QUIC-Based Link Architecture for Resilient Long-Range UAV Communication	Ximing Huang, Yirui Rao	2025-12-07	Download	The proliferation of autonomous Unmanned Aerial Vehicles (UAVs) in Beyond Visual Line of Sight (BVLOS) applications is critically dependent on resilient, high-bandwidth, and low-latency communication ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Block Sparse Flash Attention	Daniel Ohayon, Itay Lamprecht, Itay Hubara, Israel Cohen, Daniel Soudry, Noam Elata	2025-12-07	Download	Modern large language models increasingly require long contexts for reasoning and multi-document tasks, but attention's quadratic complexity creates a severe computational bottleneck.
Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization	Karthik Prabhakar, Durgamadhab Mishra	2025-12-07	Download	Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data.