
2025-12-07

cs.AR - Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Accurate Models of NVIDIA Tensor Cores | Faizan A. Khattak, Mantas Mikaitis | 2025-12-07 | Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in har... |
| ArchPower: Dataset for Architecture-Level Power Modeling of Modern CPU Design | Qijun Zhang, Yao Lu, Mengming Li, Shang Liu, Zhiyao Xie | 2025-12-07 | Power is the primary design objective of large-scale integrated circuits (ICs), especially for complex modern processors (i.e., CPUs). Accurate CPU power evaluation requires designers to go through th... |
| Formal that "Floats" High: Formal Verification of Floating Point Arithmetic | Hansa Mohanty, Vaisakh Naduvodi Viswambharan, Deepak Narayan Gadde | 2025-12-07 | Formal verification of floating-point arithmetic remains challenging due to non-linear arithmetic behavior and the tight coupling between control and datapath logic. |
| GPU-Accelerated Optimization Solver for Unit Commitment in Large-Scale Power Grids | Hussein Sharadga, Javad Mohammadi | 2025-12-07 | This work presents a GPU-accelerated solver for the unit commitment (UC) problem in large-scale power grids. The solver uses the Primal-Dual Hybrid Gradient (PDHG) algorithm to efficiently solve the r... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Optimizing video analytics inference pipelines: a case study | Saeid Ghafouri, Yuming Ding, Katerine Diaz Chito, Jesús Martinez del Rincón, Niamh O'Connell, Hans Vandierendonck | 2025-12-07 | Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generates substant... |
| ELANA: A Simple Energy and Latency Analyzer for LLMs | Hung-Yueh Chiang, Bokun Wang, Diana Marculescu | 2025-12-07 | The latency and power consumption of large language models (LLMs) are major constraints when serving them across a wide spectrum of hardware platforms, from mobile edge devices to cloud GPU clusters. |
| A Chunked-Object Pattern for Multi-Region Large Payload Storage in Managed NoSQL Databases | Manideep Reddy Chinthareddy | 2025-12-07 | Many managed key-value and NoSQL databases - such as Amazon DynamoDB, Azure Cosmos DB, and Google Cloud Firestore - enforce strict maximum item sizes (e.g., 400 KB in DynamoDB). |
| Cloud Revolution: Tracing the Origins and Rise of Cloud Computing | Deepa Gurung, S M Zia Ur Rashid, Zain ul Abdeen, Suman Rath | 2025-12-07 | The history behind the development of cloud computing is more than several decades of technological progress in the fields of virtualization, distributed systems, and high-speed networking, but its cu... |
| Stable-MoE: Lyapunov-based Token Routing for Distributed Mixture-of-Experts Training over Edge Networks | Long Shi, Bingyan Ou, Kang Wei, Weihao Zhu, Zhe Wang, Zhiyong Chen | 2025-12-07 | The sparse activation mechanism of mixture of experts (MoE) model empowers edge intelligence with enhanced training efficiency and reduced computational resource consumption. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Managed TLS Under Migration: Authentication Authority Across CDN and Hosting Transitions | Daniyal Ganiuly, Nurzhau Bolatbek, Assel Smaiyl | 2025-12-07 | Managed TLS has become a common approach for deploying HTTPS, with platforms generating and storing private keys and automating certificate issuance on behalf of domain operators. |
| Permission Manifests for Web Agents | Samuele Marro, Alan Chan, Xinxing Ren, Lewis Hammond, Jesse Wright, Gurjyot Wanga, Tiziano Piccardi, Nuno Campos, Tobin South, Jialin Yu, Sunando Sengupta, Eric Sommerlade, Alex Pentland, Philip Torr, Jiaxin Pei | 2025-12-07 | The rise of Large Language Model (LLM)-based web agents represents a significant shift in automated interactions with the web. Unlike traditional crawlers that follow simple conventions, such as robot... |
| AQUILA: A QUIC-Based Link Architecture for Resilient Long-Range UAV Communication | Ximing Huang, Yirui Rao | 2025-12-07 | The proliferation of autonomous Unmanned Aerial Vehicles (UAVs) in Beyond Visual Line of Sight (BVLOS) applications is critically dependent on resilient, high-bandwidth, and low-latency communication ... |

cs.PF - Performance

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Block Sparse Flash Attention | Daniel Ohayon, Itay Lamprecht, Itay Hubara, Israel Cohen, Daniel Soudry, Noam Elata | 2025-12-07 | Modern large language models increasingly require long contexts for reasoning and multi-document tasks, but attention's quadratic complexity creates a severe computational bottleneck. |
| Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization | Karthik Prabhakar, Durgamadhab Mishra | 2025-12-07 | Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. |
