2026-04-07

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Interference Suppression for Massive MU-MIMO Long-Term Beamforming with Matrix Inversion Approximation	Amirreza Kiani, Ali Rasteh, Marco Mezzavilla, Sundeep Rangan	2026-04-07	Download	Long-term beamforming (LTBF) is a widely-used scalable alternative to instantaneous multi-user MIMO processing that leverages slowly varying spatial channel statistics.
Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes	Adam McDaniel, Michael Jantz, Ashesh Sharma, Steve Abbott, Steven Martin, Shreyas Khandekar, Brandon Neth, Bruno Villasenor Alvarez, Aditya Kashi, Wael Elwasif, Oscar Hernandez	2026-04-07	Download	Modern exascale GPU- and APU-based systems provide multiple power and energy sensors, but differences in scope, update rate, timing, and filtering complicate the attribution of short-lived accelerator...
PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance	Shixin Ji, Jinming Zhuang, Sarah Schultz, Zhuoping Yang, Xingzhen Chen, Zheng Dong, Alex K. Jones, Yihui Ren, Peipei Zhou	2026-04-07	Download	Spatially partitioned heterogeneous accelerators (HAs) are increasingly adopted in embedded systems for their performance and flexibility. Yet most existing HA design frameworks optimize primarily for...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache	Shao Wang, Rui Ren, Lin Gui	2026-04-07	Download	The serving paradigm of large language models (LLMs) is rapidly shifting towards complex multi-agent workflows where specialized agents collaborate over massive shared contexts.
Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes	Adam McDaniel, Michael Jantz, Ashesh Sharma, Steve Abbott, Steven Martin, Shreyas Khandekar, Brandon Neth, Bruno Villasenor Alvarez, Aditya Kashi, Wael Elwasif, Oscar Hernandez	2026-04-07	Download	Modern exascale GPU- and APU-based systems provide multiple power and energy sensors, but differences in scope, update rate, timing, and filtering complicate the attribution of short-lived accelerator...
CodecFlow: Codec-Guided End-to-End Optimization for Streaming Video Analytics	Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Luo Tao, Francisco Romero, Dmitrii Ustiugov	2026-04-07	Download	Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability.
GTaP: A GPU-Resident Fork-Join Task-Parallel Runtime with a Pragma-Based Interface	Yuki Maeda, Kenjiro Taura	2026-04-07	Download	Graphics Processing Units (GPUs) excel at regular data-parallel workloads where massive hardware parallelism can be readily exploited. In contrast, many important irregular applications are naturally ...
JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA	Jens Stücker, Oliver Hahn, Lukas Winkler, Adrian Gutierrez Adame, Thomas Flöss	2026-04-07	Download	Algorithms based on spatial tree traversal are widely regarded as among the most efficient and flexible approaches for many problems in CPU-based high-performance computing (HPC).
Communication Requirements for Linearizable Registers	Raïssa Nataf, Yoram Moses	2026-04-07	Download	While linearizability is a fundamental correctness condition for distributed systems, ensuring the linearizability of implementations can be quite complex.
Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions	Ehsan Ataie, Mohammadreza Pooshani, Hossein Aqasizade	2026-04-07	Download	Serverless computing, particularly Function-as-a-Service (FaaS), has revolutionized cloud computing by abstracting infrastructure management and enabling dynamic resource allocation.
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads	Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang	2026-04-07	Download	Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Real-World LoRaWAN Performance and Propagation Modeling Using UAV, Helikite, and Vehicle-Based Measurements	Sergio Vargas Villar, Simran Singh, Özgür Özdemir, Mihail L. Sichitiu, İsmail Güvenç	2026-04-07	Download	This paper presents a field-based evaluation of Long Range Wide Area Network (LoRaWAN) signal propagation conducted at two locations within the Aerial Experimentation and Research Platform for Advance...
Towards Realistic Waveform-Level IoT Network Simulation via IQ Mixing	Alexis Delplace, Samer Lahoud, Kinda Khawam, Dominique Quadri	2026-04-07	Download	Most Internet of Things (IoT) network simulators are packet-level discrete-event systems in which physical-layer (PHY) behavior is approximated through analytical interference rules and precomputed er...
Design and Analysis of Chirp-Layered Superposition Coding for LoRa	Jingxiang Huang, Samer Lahoud	2026-04-07	Download	This paper investigates the design of chirp-layered superposition coding for LoRa, where an additional waveform is linearly superposed on a standard LoRa transmission with minimal impact on the LoRa d...
Edge Intelligence for Satellite-based Earth Observation: Scheduling Image Acquisition and Processing	Beatriz Soret, Antonio M. Mercado-Martínez, Antonio Jurado-Navas, Nicolai D. Lyholm, Marco Moretti, Petar Popovski, Israel Leyva-Mayorga	2026-04-07	Download	Modern Earth Observation (EO) missions generate massive volumes of imagery that challenge existing downlink and ground-processing capabilities, particularly for time-critical applications.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Qurator: Scheduling Hybrid Quantum-Classical Workflows Across Heterogeneous Cloud Providers	Sinan Pehlivanoglu, Ulrik de Muelenaere, Peter Kogge, Amr Sabry	2026-04-07	Download	As quantum computing moves from isolated experiments toward integration with large-scale workflows, the integration of quantum devices into HPC systems has gained much interest.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions	Ehsan Ataie, Mohammadreza Pooshani, Hossein Aqasizade	2026-04-07	Download	Serverless computing, particularly Function-as-a-Service (FaaS), has revolutionized cloud computing by abstracting infrastructure management and enabling dynamic resource allocation.
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning	Qisheng Su, Shiting Huang, Zhen Fang, Ziyan Chen, Zehui Chen, Feng Zhao	2026-04-07	Download	In real-world Tool-Integrated Reasoning (TIR) scenarios, where LLMs interleave reasoning with external tool calls, a major source of inefficiency is that the tool calls create pauses between LLM reques...