2026-04-07

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Interference Suppression for Massive MU-MIMO Long-Term Beamforming with Matrix Inversion Approximation | Amirreza Kiani, Ali Rasteh, Marco Mezzavilla, Sundeep Rangan | 2026-04-07 | Download | Long-term beamforming (LTBF) is a widely-used scalable alternative to instantaneous multi-user MIMO processing that leverages slowly varying spatial channel statistics. |
| Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes | Adam McDaniel, Michael Jantz, Ashesh Sharma, Steve Abbott, Steven Martin, Shreyas Khandekar, Brandon Neth, Bruno Villasenor Alvarez, Aditya Kashi, Wael Elwasif, Oscar Hernandez | 2026-04-07 | Download | Modern exascale GPU- and APU-based systems provide multiple power and energy sensors, but differences in scope, update rate, timing, and filtering complicate the attribution of short-lived accelerator... |
| PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance | Shixin Ji, Jinming Zhuang, Sarah Schultz, Zhuoping Yang, Xingzhen Chen, Zheng Dong, Alex K. Jones, Yihui Ren, Peipei Zhou | 2026-04-07 | Download | Spatially partitioned heterogeneous accelerators (HAs) are increasingly adopted in embedded systems for their performance and flexibility. Yet most existing HA design frameworks optimize primarily for... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ForkKV: Scaling Multi-LoRA Agent Serving via Copy-on-Write Disaggregated KV Cache | Shao Wang, Rui Ren, Lin Gui | 2026-04-07 | Download | The serving paradigm of large language models (LLMs) is rapidly shifting towards complex multi-agent workflows where specialized agents collaborate over massive shared contexts. |
| Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes | Adam McDaniel, Michael Jantz, Ashesh Sharma, Steve Abbott, Steven Martin, Shreyas Khandekar, Brandon Neth, Bruno Villasenor Alvarez, Aditya Kashi, Wael Elwasif, Oscar Hernandez | 2026-04-07 | Download | Modern exascale GPU- and APU-based systems provide multiple power and energy sensors, but differences in scope, update rate, timing, and filtering complicate the attribution of short-lived accelerator... |
| CodecFlow: Codec-Guided End-to-End Optimization for Streaming Video Analytics | Yulin Zou, Yan Chen, Wenyan Chen, JooYoung Park, Shivaraman Nitin, Luo Tao, Francisco Romero, Dmitrii Ustiugov | 2026-04-07 | Download | Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. |
| GTaP: A GPU-Resident Fork-Join Task-Parallel Runtime with a Pragma-Based Interface | Yuki Maeda, Kenjiro Taura | 2026-04-07 | Download | Graphics Processing Units (GPUs) excel at regular data-parallel workloads where massive hardware parallelism can be readily exploited. In contrast, many important irregular applications are naturally ... |
| JZ-Tree: GPU friendly neighbour search and friends-of-friends with dual tree walks in JAX plus CUDA | Jens Stücker, Oliver Hahn, Lukas Winkler, Adrian Gutierrez Adame, Thomas Flöss | 2026-04-07 | Download | Algorithms based on spatial tree traversal are widely regarded as among the most efficient and flexible approaches for many problems in CPU-based high-performance computing (HPC). |
| Communication Requirements for Linearizable Registers | Raïssa Nataf, Yoram Moses | 2026-04-07 | Download | While linearizability is a fundamental correctness condition for distributed systems, ensuring the linearizability of implementations can be quite complex. |
| Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions | Ehsan Ataie, Mohammadreza Pooshani, Hossein Aqasizade | 2026-04-07 | Download | Serverless computing, particularly Function-as-a-Service (FaaS), has revolutionized cloud computing by abstracting infrastructure management and enabling dynamic resource allocation. |
| ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads | Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang | 2026-04-07 | Download | Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Real-World LoRaWAN Performance and Propagation Modeling Using UAV, Helikite, and Vehicle-Based Measurements | Sergio Vargas Villar, Simran Singh, Özgür Özdemir, Mihail L. Sichitiu, İsmail Güvenç | 2026-04-07 | Download | This paper presents a field-based evaluation of Long Range Wide Area Network (LoRaWAN) signal propagation conducted at two locations within the Aerial Experimentation and Research Platform for Advance... |
| Towards Realistic Waveform-Level IoT Network Simulation via IQ Mixing | Alexis Delplace, Samer Lahoud, Kinda Khawam, Dominique Quadri | 2026-04-07 | Download | Most Internet of Things (IoT) network simulators are packet-level discrete-event systems in which physical-layer (PHY) behavior is approximated through analytical interference rules and precomputed er... |
| Design and Analysis of Chirp-Layered Superposition Coding for LoRa | Jingxiang Huang, Samer Lahoud | 2026-04-07 | Download | This paper investigates the design of chirp-layered superposition coding for LoRa, where an additional waveform is linearly superposed on a standard LoRa transmission with minimal impact on the LoRa d... |
| Edge Intelligence for Satellite-based Earth Observation: Scheduling Image Acquisition and Processing | Beatriz Soret, Antonio M. Mercado-Martínez, Antonio Jurado-Navas, Nicolai D. Lyholm, Marco Moretti, Petar Popovski, Israel Leyva-Mayorga | 2026-04-07 | Download | Modern Earth Observation (EO) missions generate massive volumes of imagery that challenge existing downlink and ground-processing capabilities, particularly for time-critical applications. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Qurator: Scheduling Hybrid Quantum-Classical Workflows Across Heterogeneous Cloud Providers | Sinan Pehlivanoglu, Ulrik de Muelenaere, Peter Kogge, Amr Sabry | 2026-04-07 | Download | As quantum computing moves from isolated experiments toward integration with large-scale workflows, the integration of quantum devices into HPC systems has gained much interest. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimizing OpenFaaS on Kubernetes: Comparative Analysis of Language Runtimes and Cluster Distributions | Ehsan Ataie, Mohammadreza Pooshani, Hossein Aqasizade | 2026-04-07 | Download | Serverless computing, particularly Function-as-a-Service (FaaS), has revolutionized cloud computing by abstracting infrastructure management and enabling dynamic resource allocation. |
| Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning | Qisheng Su, Shiting Huang, Zhen Fang, Ziyan Chen, Zehui Chen, Feng Zhao | 2026-04-07 | Download | In real-world Tool-Integrated Reasoning (TIR) scenarios, where LLMs interleave reasoning with external tool calls, a major source of inefficiency is that the tool calls create pauses between LLM reques... |
