
2026-03-29

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| RTLSeek: Boosting the LLM-Based RTL Generation with Multi-Stage Diversity-Oriented Reinforcement Learning | Xinyu Zhang, Zhiteng Chao, Yonghao Wang, Bin Sun, Tianyun Ma, Tianmeng Yang, Jianan Mu, Jing Justin Ye, Huawei Li | 2026-03-29 | Download | Register Transfer Level (RTL) design translates high-level specifications into hardware using HDLs such as Verilog. Although LLM-based RTL generation is promising, the scarcity of functionally verifia... |
| Expert Streaming: Accelerating Low-Batch MoE Inference via Multi-chiplet Architecture and Dynamic Expert Trajectory Scheduling | Songchen Ma, Hongyi Li, Weihao Zhang, Yonghao Tan, Pingcheng Dong, Yu Liu, Lan Liu, Yuzhong Jiao, Xuejiao Liu, Luhong Liang, Kwang-Ting Cheng | 2026-03-29 | Download | Mixture-of-Experts is a promising approach for edge AI with low-batch inference. Yet, on-device deployments often face limited on-chip memory and severe workload imbalance; the prevalent use of offloa... |

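In a Mixture-of-Experts layer, a router sends each token to a small top-k subset of experts. With only a few tokens per batch, some experts receive several assignments and others receive none. This is the kind of workload imbalance the Expert Streaming abstract refers to. The sketch below shows generic softmax top-k gating in pure Python. It is not the paper's method. The function names, `k = 2`, and the batch and expert counts are hypothetical choices. The sketch also does not model the paper's chiplet architecture or trajectory scheduling.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k highest-probability experts for one token and
    renormalise their gate weights so they sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return top, [probs[i] / mass for i in top]


def expert_load(batch_logits, num_experts, k):
    """Count how many token-to-expert assignments each expert receives."""
    load = [0] * num_experts
    for gate_logits in batch_logits:
        experts, _ = top_k_route(gate_logits, k)
        for e in experts:
            load[e] += 1
    return load


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical low-batch setting: 4 tokens, 8 experts, top-2 routing.
    num_experts, k, batch = 8, 2, 4
    logits = [[random.gauss(0.0, 1.0) for _ in range(num_experts)]
              for _ in range(batch)]
    load = expert_load(logits, num_experts, k)
    # sum(load) is always batch * k = 8; how it is spread varies.
    print(load, sum(load))
```

The total number of assignments is fixed at `batch * k`, but their spread across experts depends entirely on the router's outputs. At low batch sizes, the per-expert load therefore cannot be predicted in advance.
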
cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Operational Strategies for Non-Disruptive Scheduling Transitions in Production HPC Systems | Glen MacLachlan, Joseph Creech, Rubeel Muhammad Iqbal, Clark Gaylord, Jake Messick | 2026-03-29 | Download | Migrating heterogeneous high-performance computing (HPC) systems to resource-aware scheduling introduces both technical and behavioral challenges, particularly in production environments with establis... |
| jaxsgp4: GPU-accelerated mega-constellation propagation with batch parallelism | Charlotte Priestley, Will Handley | 2026-03-29 | Download | As the population of anthropogenic space objects transitions from sparse clusters to mega-constellations exceeding 100,000 satellites, traditional orbital propagation techniques face a critical bottle... |
| Optimising Blockchain Scalability for Real-Time IoT Applications | Hasan Mahmud Rhidoy, Mahdi H. Miraz, Iftekhar Salam | 2026-03-29 | Download | The convergence of blockchain and the Internet of Things (IoT) enables secure, decentralised, and verifiable data exchange across distributed smart environments. |
| Beating vDSP: A 138 GFLOPS Radix-8 Stockham FFT on Apple Silicon via Two-Tier Register-Threadgroup Memory Decomposition | Mohamed Amine Bergach | 2026-03-29 | Download | We present an optimized Fast Fourier Transform (FFT) implementation for Apple Silicon GPUs, achieving 138.45 GFLOPS for N = 4096 complex single-precision transforms, a 29% improvement over Appl... |
| The First OpenFOAM HPC Challenge (OHC-1) | Sergey Lesnik, Gregor Olenik, Mark Wassermann | 2026-03-29 | Download | The first OpenFOAM HPC Challenge (OHC-1) was organised by the OpenFOAM HPC Technical Committee (HPCTC) to collect a snapshot of OpenFOAM's computational performance on contemporary production hardware... |
| BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed Modalities | Pranav M R, Jayant Chandwani, Ahmed M. Abdelmoniem, Arnab K. Paul | 2026-03-29 | Download | Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often mis... |

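A Stockham autosort FFT avoids the separate bit-reversal permutation of in-place Cooley-Tukey. It ping-pongs between two buffers, and each stage writes its output in an order that ends up natural. The sketch below is a minimal radix-2 version in pure Python, for illustration only. The "Beating vDSP" kernel is radix-8 and runs on Apple GPUs with a register/threadgroup-memory split, none of which is modelled here. The function name `stockham_fft` is hypothetical.

```python
import cmath


def stockham_fft(x):
    """Radix-2 Stockham autosort FFT (forward, unnormalised).

    Ping-pongs between two buffers so the result comes out in natural
    order without a separate bit-reversal pass."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]   # current stage input
    b = [0j] * n                  # current stage output
    half = n // 2                 # half-length of each sub-transform
    stride = 1                    # interleave stride of the sub-transforms
    while half >= 1:
        for j in range(half):
            w = cmath.exp(-2j * cmath.pi * j / (2 * half))  # twiddle factor
            for k in range(stride):
                c0 = a[k + j * stride]
                c1 = a[k + (j + half) * stride]
                b[k + 2 * j * stride] = c0 + c1
                b[k + (2 * j + 1) * stride] = w * (c0 - c1)
        a, b = b, a               # swap buffers instead of copying
        half //= 2
        stride *= 2
    return a
```

For example, `stockham_fft([1, 2, 3, 4])` returns approximately `[10, -2+2j, -2, -2-2j]`. A higher radix such as the paper's radix-8 folds several radix-2 stages into one pass over the data. A 4096-point transform then needs 4 passes (8^4 = 4096) instead of 12 (2^12 = 4096), which reduces memory traffic.
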
cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Energy Efficient Orchestration in Multiple-Access Vehicular Aerial-Terrestrial 6G Networks | Mohammad Farhoudi, Hamidreza Mazandarani, Masoud Shokrnezhad, Tarik Taleb, Ignacio Lacalle | 2026-03-29 | Download | The proliferation of users, devices, and novel vehicular applications, propelled by advancements in autonomous systems and connected technologies, is precipitating an unprecedented surge in novel se... |
| Fronthaul Network Planning for Hierarchical and Radio-Stripes-Enabled CF-mMIMO in O-RAN | Anas S. Mohammed, Krishnendu S. Tharakan, Hussein A. Ammar, Hesham ElSawy, Hossam S. Hassanein | 2026-03-29 | Download | The deployment of ultra-dense networks (UDNs), particularly cell-free massive MIMO (CF-mMIMO), is mainly hindered by costly and capacity-limited fronthaul links. |
| LP-Based Algorithms for Scheduling in a Quantum Switch | R. Srikant | 2026-03-29 | Download | We consider scheduling in a quantum switch with stochastic entanglement generation, finite quantum memories, and decoherence. The objective is to design a scheduling algorithm with polynomial-time com... |
| Tracking without Seeing: Geospatial Inference using Encrypted Traffic from Distributed Nodes | Sadik Yagiz Yetim, Gaofeng Dong, Isaac-Neil Zanoria, Ronit Barman, Maggie Wigness, Tarek Abdelzaher, Mani Srivastava, Suhas Diggavi | 2026-03-29 | Download | Accurate observation of dynamic environments traditionally relies on synthesizing raw, signal-level information from multiple distributed sensors. |
| Serverless5GC: Private 5G Core Deployment via a Procedure-as-a-Function Architecture | Hai Dinh-Tuan | 2026-03-29 | Download | Open-source 5G core implementations deploy network functions as always-on processes that consume resources even when idle. This inefficiency is most acute in private and edge deployments with sporadic... |
| RADAR-Q: Resource-Aware Distributed Asynchronous Routing for Entanglement Distribution in Multi-Tenant Quantum Networks | Chenliang Tian, Zebo Yang, Raj Jain, Ramana Kompella, Reza Nejabati, Eneet Kaur, Aiman Erbad, Mohamed Abdallah, Mounir Hamdi | 2026-03-29 | Download | Scalable quantum networks must support concurrent entanglement requests, yet existing routing protocols fail when users compete for shared repeater resources, wasting fragile quantum states. |
| Asynchronous Routing for Multipartite Entanglement in Quantum Networks | Chenliang Tian, Zebo Yang, Raj Jain, Ramana Kompella, Reza Nejabati, Eneet Kaur, Aiman Erbad, Mounir Hamdi, Mohamed Abdallah | 2026-03-29 | Download | In quantum networks, one way to communicate is to distribute entanglements through swapping at intermediate nodes. Most existing work primarily aims to create efficient two-party end-to-end entangleme... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Operational Strategies for Non-Disruptive Scheduling Transitions in Production HPC Systems | Glen MacLachlan, Joseph Creech, Rubeel Muhammad Iqbal, Clark Gaylord, Jake Messick | 2026-03-29 | Download | Migrating heterogeneous high-performance computing (HPC) systems to resource-aware scheduling introduces both technical and behavioral challenges, particularly in production environments with establis... |
| Beating vDSP: A 138 GFLOPS Radix-8 Stockham FFT on Apple Silicon via Two-Tier Register-Threadgroup Memory Decomposition | Mohamed Amine Bergach | 2026-03-29 | Download | We present an optimized Fast Fourier Transform (FFT) implementation for Apple Silicon GPUs, achieving 138.45 GFLOPS for N = 4096 complex single-precision transforms, a 29% improvement over Appl... |
| Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs | Yi Liu | 2026-03-29 | Download | Scaling laws relate model quality to compute budget (FLOPs), but practitioners face wall-clock time constraints, not compute budgets. We study optimal model sizing under fixed time budgets from 5 minu... |
| RSR-core: A High-Performance Engine for Low-Bit Matrix-Vector Multiplication | Mohsen Dehghankar, Abolfazl Asudeh | 2026-03-29 | Download | Matrix-vector multiplication is a fundamental building block in neural networks, vector databases, and large language models, particularly during inference. |

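The "Time is Not Compute" abstract contrasts FLOP budgets with wall-clock budgets. The arithmetic that links them can be sketched with the common dense-transformer approximation C ≈ 6·N·D, where C is training FLOPs, N is the parameter count and D is the number of training tokens. This is a generic back-of-the-envelope sketch, not the paper's scaling law. The function names, the 100 TFLOP/s peak and the 40% utilisation are hypothetical assumptions.

```python
def compute_budget(seconds, peak_flops_per_s, mfu):
    """FLOPs actually delivered in a wall-clock window.

    mfu is the achieved fraction of peak throughput (model FLOPs utilisation)."""
    return seconds * peak_flops_per_s * mfu


def tokens_for_budget(compute_flops, n_params):
    """Training tokens affordable under the C ~= 6 * N * D approximation."""
    return compute_flops / (6 * n_params)


if __name__ == "__main__":
    # Hypothetical numbers: 1 hour on a GPU with 100 TFLOP/s peak at 40% MFU.
    c = compute_budget(3600, 100e12, 0.40)   # 1.44e17 FLOPs
    for n in (50e6, 100e6, 200e6):
        print(f"N={n:.0e} params -> D={tokens_for_budget(c, n):.2e} tokens")
```

Under these assumptions, one hour yields 1.44e17 FLOPs, enough for about 2.4e8 tokens at 100M parameters. The seconds-to-FLOPs step hinges on `mfu`, which on real hardware depends on model size, batch size and kernel efficiency. A fixed time budget and a fixed FLOP budget therefore need not favour the same model size.
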