2025-12-16
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading | William Meng, Benjamin Lee, Hong Wang | 2025-12-16 | Download | KV cache offloading enables long-context LLM inference by storing caches in CPU DRAM, but PCIe bandwidth limitations create severe bottlenecks. |
| Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models | Chiyue Wei, Cong Guo, Junyao Zhang, Haoxuan Shan, Yifan Xu, Ziyue Zhang, Yudong Liu, Qinsi Wang, Changchun Zhou, Hai "Helen" Li, Yiran Chen | 2025-12-16 | Download | Vision-Language Models (VLMs) have demonstrated strong performance on tasks such as video captioning and visual question answering. However, their growing scale and video-level inputs lead to signific... |
| PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion | Huizheng Wang, Hongbin Wang, Zichuan Wang, Zhiheng Yue, Yang Wang, Chao Li, Yang Hu, Shouyi Yin | 2025-12-16 | Download | Attention-based models have revolutionized AI, but the quadratic cost of self-attention incurs severe computational and memory overhead. Sparse attention methods alleviate this by skipping low-relevan... |
| TEMP: A Memory Efficient Physical-aware Tensor Partition-Mapping Framework on Wafer-scale Chips | Huizheng Wang, Taiquan Wei, Zichuan Wang, Dingcheng Jiang, Qize Yang, Jiaxin Liu, Jingxiang Hou, Chao Li, Jinyi Deng, Yang Hu, Shouyi Yin | 2025-12-16 | Download | Large language models (LLMs) demand significant memory and computation resources. Wafer-scale chips (WSCs) provide high computation power and die-to-die (D2D) bandwidth but face a unique trade-off bet... |
| ReadyPower: A Reliable, Interpretable, and Handy Architectural Power Model Based on Analytical Framework | Qijun Zhang, Shang Liu, Yao Lu, Mengming Li, Zhiyao Xie | 2025-12-16 | Download | Power is a primary objective in modern processor design, requiring accurate yet efficient power modeling techniques. Architecture-level power models are necessary for early power optimization and desi... |
| Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement | Songze Liu, Hongkun Du, Shaowen Wang | 2025-12-16 | Download | Large Language Models (LLMs), such as GPT and LLaMA, introduce unique memory access characteristics during inference due to frequent token sequence lookups and embedding vector retrievals. |
| The Impact Market to Save Conference Peer Review: Decoupling Dissemination and Credentialing | Karthikeyan Sankaralingam | 2025-12-16 | Download | Top-tier academic conferences are failing under the strain of two irreconcilable roles: (1) rapid dissemination of all sound research and (2) scarce credentialing for prestige and career advancement. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Optimizing Sensor Node Localization for Achieving Sustainable Smart Agriculture System Connectivity | Mohamed Naeem | 2025-12-16 | Download | The innovative agriculture system is revolutionizing how we farm, making it one of the most critical innovations of our time! Yet it faces significant connectivity challenges, particularly with the se... |
| Understanding Bottlenecks for Efficiently Serving LLM Inference With KV Offloading | William Meng, Benjamin Lee, Hong Wang | 2025-12-16 | Download | KV cache offloading enables long-context LLM inference by storing caches in CPU DRAM, but PCIe bandwidth limitations create severe bottlenecks. |
| PruneX: A Hierarchical Communication-Efficient System for Distributed CNN Training with Structured Pruning | Alireza Olama, Andreas Lundell, Izzat El Hajj, Johan Lilius, Jerker Björkqvist | 2025-12-16 | Download | Inter-node communication bandwidth increasingly constrains distributed training at scale on multi-node GPU clusters. While compact models are the ultimate deployment target, conventional pruning-aware... |
| Improving Slow Transfer Predictions: Generative Methods Compared | Jacob Taegon Kim, Alex Sim, Kesheng Wu, Jinoh Kim | 2025-12-16 | Download | Monitoring data transfer performance is a crucial task in scientific computing networks. By predicting performance early in the communication phase, potentially sluggish transfers can be identified an... |
| Performance and Stability of Barrier Mode Parallel Systems with Heterogeneous and Redundant Jobs | Brenton Walker, Markus Fidler | 2025-12-16 | Download | In some models of parallel computation, jobs are split into smaller tasks and can be executed completely asynchronously. In other situations the parallel tasks have constraints that require them to sy... |
| A Hybrid Reactive-Proactive Auto-scaling Algorithm for SLA-Constrained Edge Computing | Suhrid Gupta, Muhammed Tawfiqul Islam, Rajkumar Buyya | 2025-12-16 | Download | Edge computing decentralizes computing resources, allowing for novel applications in domains such as the Internet of Things (IoT) in healthcare and agriculture by reducing latency and improving perfor... |
| Privacy-Preserving Feature Valuation in Vertical Federated Learning Using Shapley-CMI and PSI Permutation | Unai Laskurain, Aitor Aguirre-Ortuzar, Urko Zurutuza | 2025-12-16 | Download | Federated Learning (FL) is an emerging machine learning paradigm that enables multiple parties to collaboratively train models without sharing raw data, ensuring data privacy. |
| Cornserve: Efficiently Serving Any-to-Any Multimodal Models | Jeff J. Ma, Jae-Won Chung, Jisang Ahn, Yizhuo Liang, Akshay Jajoo, Myungjin Lee, Mosharaf Chowdhury | 2025-12-16 | Download | We present Cornserve, an efficient online serving system for an emerging class of multimodal models called Any-to-Any models. Any-to-Any models accept combinations of text and multimodal data (e.g. |
| Real-Time Service Subscription and Adaptive Offloading Control in Vehicular Edge Computing | Chuanchao Gao, Arvind Easwaran | 2025-12-16 | Download | Vehicular Edge Computing (VEC) has emerged as a promising paradigm for enhancing the computational efficiency and service quality in intelligent transportation systems by enabling vehicles to wireless... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Improving Slow Transfer Predictions: Generative Methods Compared | Jacob Taegon Kim, Alex Sim, Kesheng Wu, Jinoh Kim | 2025-12-16 | Download | Monitoring data transfer performance is a crucial task in scientific computing networks. By predicting performance early in the communication phase, potentially sluggish transfers can be identified an... |
| Hybrid Cognitive IoT with Cooperative Caching and SWIPT-EH: A Hierarchical Reinforcement Learning Framework | Nadia Abdolkhani, Walaa Hamouda | 2025-12-16 | Download | This paper proposes a hierarchical deep reinforcement learning (DRL) framework based on the soft actor-critic (SAC) algorithm for hybrid underlay-overlay cognitive Internet of Things (CIoT) networks w... |
| Performance and Stability of Barrier Mode Parallel Systems with Heterogeneous and Redundant Jobs | Brenton Walker, Markus Fidler | 2025-12-16 | Download | In some models of parallel computation, jobs are split into smaller tasks and can be executed completely asynchronously. In other situations the parallel tasks have constraints that require them to sy... |
| Assessing the Carbon Footprint of Virtual Meetings: A Quantitative Analysis of Camera Usage | Félix Mortas | 2025-12-16 | Download | This paper quantifies the carbon emissions related to data consumption during video calls, focusing on the impact of having the camera on versus off. |
| FUSION: Forecast-Embedded Agent Scheduling with Service Incentive Optimization over Distributed Air-Ground Edge Networks | Houyi Qi, Minghui Liwang, Seyyedali Hosseinalipour, Liqun Fu, Sai Zou, Xianbin Wang, Wei Ni, Yiguang Hong | 2025-12-16 | Download | In this paper, we introduce a first-of-its-kind forecasting-driven, incentive-inherent service provisioning framework for distributed air-ground integrated networks that explicitly accounts for human-... |
| A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks | Agrippina Mwangi, León Navarro-Hilfiker, Lukasz Brewka, Mikkel Gryning, Elena Fumagalli, Madeleine Gibescu | 2025-12-16 | Download | Stochastic disruptions such as flash events arising from benign traffic bursts and switch thermal fluctuations are major contributors to intermittent service degradation in software-defined industrial... |
| Cooperative Caching Towards Efficient Spectrum Utilization in Cognitive-IoT Networks | Nadia Abdolkhani, Walaa Hamouda | 2025-12-16 | Download | In cognitive Internet of Things (CIoT) networks, efficient spectrum sharing is essential to address increasing wireless demands. This paper presents a novel deep reinforcement learning (DRL)-based app... |
| Hierarchical Deep Reinforcement Learning for Robust Access in Cognitive IoT Networks under Smart Jamming Attacks | Nadia Abdolkhani, Walaa Hamouda | 2025-12-16 | Download | In this paper, we address the challenge of dynamic spectrum access in a cognitive Internet of Things (CIoT) network where a secondary user (SU) operates under both energy constraints and adversarial i... |
| Country-in-the-Middle: Measuring Paths between People and their Governments | Alisha Ukani, Katherine Izhikevich, Shambhavi Mittal, Manan Patel, Samvrit Srinath, Kristy Ly, kc claffy, Alex C. Snoeren | 2025-12-16 | Download | Understanding where Internet services are hosted, and how users reach them, has captured the interest of government regulators and others concerned with the privacy of data flows. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving | Shaoting Feng, Yuhan Liu, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang | 2025-12-16 | Download | Reusing KV cache is essential for high efficiency of Large Language Model (LLM) inference systems. With more LLM users, the KV cache footprint can easily exceed GPU memory capacity, so prior work has ... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| From HNSW to Information-Theoretic Binarization: Rethinking the Architecture of Scalable Vector Search | Seyed Moein Abtahi, Majid Fekri, Tara Khani, Akramul Azim | 2025-12-16 | Download | Modern semantic search and retrieval-augmented generation (RAG) systems rely predominantly on in-memory approximate nearest neighbor (ANN) indexes over high-precision floating-point vectors, resulting... |
| Performance and Stability of Barrier Mode Parallel Systems with Heterogeneous and Redundant Jobs | Brenton Walker, Markus Fidler | 2025-12-16 | Download | In some models of parallel computation, jobs are split into smaller tasks and can be executed completely asynchronously. In other situations the parallel tasks have constraints that require them to sy... |
| A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks | Agrippina Mwangi, León Navarro-Hilfiker, Lukasz Brewka, Mikkel Gryning, Elena Fumagalli, Madeleine Gibescu | 2025-12-16 | Download | Stochastic disruptions such as flash events arising from benign traffic bursts and switch thermal fluctuations are major contributors to intermittent service degradation in software-defined industrial... |
| Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement | Songze Liu, Hongkun Du, Shaowen Wang | 2025-12-16 | Download | Large Language Models (LLMs), such as GPT and LLaMA, introduce unique memory access characteristics during inference due to frequent token sequence lookups and embedding vector retrievals. |