2026-03-29

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning	Xinyu Zhang, Zhiteng Chao, Yonghao Wang, Bin Sun, Tianyun Ma, Tianmeng Yang, Jianan Mu, Jing Justin Ye, Huawei Li	2026-03-29	Download	Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifia...
Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling	Songchen Ma, Hongyi Li, Weihao Zhang, Yonghao Tan, Pingcheng Dong, Yu Liu, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Luhong Liang, Kwang-Ting Cheng	2026-03-29	Download	Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloa...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Operational Strategies for Non-Disruptive Scheduling Transitions in Production HPC Systems	Glen MacLachlan, Joseph Creech, Rubeel Muhammad Iqbal, Clark Gaylord, Jake Messick	2026-03-29	Download	Migrating heterogeneous high-performance computing (HPC) systems to resource-aware scheduling introduces both technical and behavioral challenges, particularly in production environments with establis...
jaxsgp4: GPU-accelerated mega-constellation propagation with batch parallelism	Charlotte Priestley, Will Handley	2026-03-29	Download	As the population of anthropogenic space objects transitions from sparse clusters to mega-constellations exceeding 100,000 satellites, traditional orbital propagation techniques face a critical bottle...
Optimising Blockchain Scalability for Real-Time IoT Applications	Hasan Mahmud Rhidoy, Mahdi H. Miraz, Iftekhar Salam	2026-03-29	Download	The convergence of blockchain and the Internet of Things (IoT) enables secure, decentralised, and verifiable data exchange across distributed smart environments.
Beating vDSP: A 138 GFLOPS Radix-8 Stockham FFT on Apple Silicon via Two-Tier Register-Threadgroup Memory Decomposition	Mohamed Amine Bergach	2026-03-29	Download	We present an optimized Fast Fourier Transform (FFT) implementation for Apple Silicon GPUs, achieving 138.45 GFLOPS for $N=4096$ complex single-precision transforms -- a 29% improvement over Appl...
The First OpenFOAM HPC Challenge (OHC-1)	Sergey Lesnik, Gregor Olenik, Mark Wassermann	2026-03-29	Download	The first OpenFOAM HPC Challenge (OHC-1) was organised by the OpenFOAM HPC Technical Committee (HPCTC) to collect a snapshot of OpenFOAM's computational performance on contemporary production hardware...
BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed Modalities	Pranav M R, Jayant Chandwani, Ahmed M. Abdelmoniem, Arnab K. Paul	2026-03-29	Download	Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often mis...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Energy Efficient Orchestration in Multiple-Access Vehicular Aerial-Terrestrial 6G Networks	Mohammad Farhoudi, Hamidreza Mazandarani, Masoud Shokrnezhad, Tarik Taleb, Ignacio Lacalle	2026-03-29	Download	The proliferation of users, devices, and novel vehicular applications - propelled by advancements in autonomous systems and connected technologies - is precipitating an unprecedented surge in novel se...
Fronthaul Network Planning for Hierarchical and Radio-Stripes-Enabled CF-mMIMO in O-RAN	Anas S. Mohammed, Krishnendu S. Tharakan, Hussein A. Ammar, Hesham ElSawy, Hossam S. Hassanein	2026-03-29	Download	The deployment of ultra-dense networks (UDNs), particularly cell-free massive MIMO (CF-mMIMO), is mainly hindered by costly and capacity-limited fronthaul links.
LP-Based Algorithms for Scheduling in a Quantum Switch	R. Srikant	2026-03-29	Download	We consider scheduling in a quantum switch with stochastic entanglement generation, finite quantum memories, and decoherence. The objective is to design a scheduling algorithm with polynomial-time com...
Tracking without Seeing: Geospatial Inference using Encrypted Traffic from Distributed Nodes	Sadik Yagiz Yetim, Gaofeng Dong, Isaac-Neil Zanoria, Ronit Barman, Maggie Wigness, Tarek Abdelzaher, Mani Srivastava, Suhas Diggavi	2026-03-29	Download	Accurate observation of dynamic environments traditionally relies on synthesizing raw, signal-level information from multiple distributed sensors.
Serverless5GC: Private 5G Core Deployment via a Procedure-as-a-Function Architecture	Hai Dinh-Tuan	2026-03-29	Download	Open-source 5G core implementations deploy network functions as always-on processes that consume resources even when idle. This inefficiency is most acute in private and edge deployments with sporadic...
RADAR-Q: Resource-Aware Distributed Asynchronous Routing for Entanglement Distribution in Multi-Tenant Quantum Networks	Chenliang Tian, Zebo Yang, Raj Jain, Ramana Kompella, Reza Nejabati, Eneet Kaur, Aiman Erbad, Mohamed Abdallah, Mounir Hamdi	2026-03-29	Download	Scalable quantum networks must support concurrent entanglement requests, yet existing routing protocols fail when users compete for shared repeater resources, wasting fragile quantum states.
Asynchronous Routing for Multipartite Entanglement in Quantum Networks	Chenliang Tian, Zebo Yang, Raj Jain, Ramana Kompella, Reza Nejabati, Eneet Kaur, Aiman Erbad, Mounir Hamdi, Mohamed Abdallah	2026-03-29	Download	In quantum networks, one way to communicate is to distribute entanglements through swapping at intermediate nodes. Most existing work primarily aims to create efficient two-party end-to-end entangleme...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Operational Strategies for Non-Disruptive Scheduling Transitions in Production HPC Systems	Glen MacLachlan, Joseph Creech, Rubeel Muhammad Iqbal, Clark Gaylord, Jake Messick	2026-03-29	Download	Migrating heterogeneous high-performance computing (HPC) systems to resource-aware scheduling introduces both technical and behavioral challenges, particularly in production environments with establis...
Beating vDSP: A 138 GFLOPS Radix-8 Stockham FFT on Apple Silicon via Two-Tier Register-Threadgroup Memory Decomposition	Mohamed Amine Bergach	2026-03-29	Download	We present an optimized Fast Fourier Transform (FFT) implementation for Apple Silicon GPUs, achieving 138.45 GFLOPS for $N=4096$ complex single-precision transforms -- a 29% improvement over Appl...
Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs	Yi Liu	2026-03-29	Download	Scaling laws relate model quality to compute budget (FLOPs), but practitioners face wall-clock time constraints, not compute budgets. We study optimal model sizing under fixed time budgets from 5 minu...
RSR-core: A High-Performance Engine for Low-Bit Matrix-Vector Multiplication	Mohsen Dehghankar, Abolfazl Asudeh	2026-03-29	Download	Matrix-vector multiplication is a fundamental building block in neural networks, vector databases, and large language models, particularly during inference.