2025-08-07

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Accelerating Data Chunking in Deduplication Systems using Vector Instructions	Sreeharsha Udayashankar, Abdelrahman Baba, Samer Al-Kiswany	2025-08-07	Download	Content-defined Chunking (CDC) algorithms dictate the overall space savings that deduplication systems achieve. However, due to their need to scan each file in its entirety, they are slow and often th...
ConiQ: Enabling Concatenated Quantum Error Correction on Neutral Atom Arrays	Pengyu Liu, Mingkuan Xu, Hengyun Zhou, Hanrui Wang, Umut A. Acar, Yunong Shi	2025-08-07	Download	Recent progress on concatenated codes, especially many-hypercube codes, achieves unprecedented space efficiency. Yet two critical challenges persist in practice.
relOBI: A Reliable Low-latency Interconnect for Tightly-Coupled On-chip Communication	Michael Rogenmoser, Angelo Garofalo, Luca Benini	2025-08-07	Download	On-chip communication is a critical element of modern systems-on-chip (SoCs), allowing processor cores to interact with memory and peripherals.
Understanding and Mitigating Errors of LLM-Generated RTL Code	Jiazheng Zhang, Cheng Liu, Long Cheng, Xiaowei Li, Huawei Li	2025-08-07	Download	Despite limited success in large language model (LLM)-based register-transfer-level (RTL) code generation, the root causes of errors remain poorly understood.
TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper	Zhongling Su, Rong Fu, Weihan Cao, Jianfei Gao, Minxi Jin, Zhilin Pei, Hui Wang	2025-08-07	Download	Current FP8 grouped GEMM implementations require padding each group to a fixed alignment (e.g., 128), incurring memory and computational overhead.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Snowpark: Performant, Secure, User-Friendly Data Engineering and AI/ML Next To Your Data	Brandon Baker, Elliott Brossard, Chenwei Xie, Zihao Ye, Deen Liu, Yijun Xie, Arthur Zwiegincew, Nitya Kumar Sharma, Gaurav Jain, Eugene Retunsky, Mike Halcrow, Derek Denny-Brown, Istvan Cseri, Tyler Akidau, Yuxiong He	2025-08-07	Download	Snowflake revolutionized data analytics with an elastic architecture that decouples compute and storage, enabling scalable solutions supporting data architectures like data lake, data warehouse, data ...
A Dynamic Approach to Load Balancing in Cloud Infrastructure: Enhancing Energy Efficiency and Resource Utilization	Shadman Sakib, Ajay Katangur, Rahul Dubey	2025-08-07	Download	Cloud computing has grown rapidly in recent years, mainly due to the sharp increase in data transferred over the internet. This growth makes load balancing a key part of cloud systems, as it helps dis...
Accelerating Data Chunking in Deduplication Systems using Vector Instructions	Sreeharsha Udayashankar, Abdelrahman Baba, Samer Al-Kiswany	2025-08-07	Download	Content-defined Chunking (CDC) algorithms dictate the overall space savings that deduplication systems achieve. However, due to their need to scan each file in its entirety, they are slow and often th...
X-VFL: A New Vertical Federated Learning Framework with Cross Completion and Decision Subspace Alignment	Qinghua Yao, Xiangrui Xu, Zhize Li	2025-08-07	Download	Vertical Federated Learning (VFL) enables collaborative learning by integrating disjoint feature subsets from multiple clients/parties. However, VFL typically faces two key challenges: i) the requirem...
Modular Architecture for High-Performance and Low Overhead Data Transfers	Rasman Mubtasim Swargo, Engin Arslan, Md Arifuzzaman	2025-08-07	Download	High-performance applications necessitate rapid and dependable transfer of massive datasets across geographically dispersed locations. Traditional file transfer tools often suffer from resource underu...
Adaptive Parallel Downloader for Large Genomic Datasets	Rasman Mubtasim Swargo, Engin Arslan, Md Arifuzzaman	2025-08-07	Download	Modern next-generation sequencing (NGS) projects routinely generate terabytes of data, which researchers commonly download from public repositories such as SRA or ENA.
A Feature Engineering Approach for Business Impact-Oriented Failure Detection in Distributed Instant Payment Systems	Lorenzo Porcelli	2025-08-07	Download	Instant payment infrastructures have stringent performance requirements, processing millions of transactions daily with zero-downtime expectations.
Simulating LLM training workloads for heterogeneous compute and network infrastructure	Sumit Kumar, Arjun Temura, Naman Sharma, Ramanjeet Singh, Meet Dadhania, Praveen Tammana, Satananda Burla, Abed Mohammad Kamaluddin, Rinku Shah	2025-08-07	Download	The growing demand for large-scale GPU clusters in distributed model training presents a significant barrier to innovation, particularly in model optimization, performance tuning, and system-level enh...
HFedATM: Hierarchical Federated Domain Generalization via Optimal Transport and Regularized Mean Aggregation	Thinh Nguyen, Trung Phan, Binh T. Nguyen, Khoa D Doan, Kok-Seng Wong	2025-08-07	Download	Federated Learning (FL) is a decentralized approach where multiple clients collaboratively train a shared global model without sharing their raw data.
Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement	Felipe Aramburú, William Malpica, Kaouther Abrougui, Amin Aramoon, Romulo Auccapuclla, Claude Brisson, Matthijs Brobbel, Colby Farrell, Pradeep Garigipati, Joost Hoozemans, Supun Kamburugamuve, Akhil Nair, Alexander Ocsa, Johan Peltenburg, Rubén Quesada López, Deepak Sihag, Ahmet Uyar, Dhruv Vats, Michael Wendt, Jignesh M. Patel, Rodrigo Aramburú	2025-08-07	Download	Online analytical processing of queries on datasets in the many-terabyte range is only possible with costly distributed computing systems. To decrease the cost and increase the throughput, systems can...
Task-Based Programming for Adaptive Mesh Refinement in Compressible Flow Simulations	Anjiang Wei, Hang Song, Mert Hidayetoglu, Elliott Slaughter, Sanjiva K. Lele, Alex Aiken	2025-08-07	Download	High-order solvers for compressible flows are vital in scientific applications. Adaptive mesh refinement (AMR) is a key technique for reducing computational cost by concentrating resolution in regions...
Tesserae: Scalable Placement Policies for Deep Learning Workloads	Song Bian, Saurabh Agarwal, Md. Tareq Mahmood, Shivaram Venkataraman	2025-08-07	Download	Training deep learning (DL) models has become a dominant workload in data-centers and improving resource utilization is a key goal of DL cluster schedulers.
Managing, Analyzing and Sharing Research Data with Gen3 Data Commons	Craig Barnes, Kyle Burton, Michael S. Fitzsimons, Hara Prasad Juvvala, Brienna Larrick, Christopher Meyer, Pauline Ribeyre, Ao Liu, Clint Malson, Noah Metoki-Shlubsky, Andrii Prokhorenkov, Jawad Qureshi, Radhika Reddy, L. Philip Schumm, Mingfei Shao, Trevar Simmons, Alexander VanTol, Peter Vassilatos, Aarti Venkat, Robert L. Grossman	2025-08-07	Download	Gen3 is an open-source data platform for building data commons. A data commons is a cloud-based data platform for managing, analyzing, and sharing data with a research community.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
HiSTM: Hierarchical Spatiotemporal Mamba for Cellular Traffic Forecasting	Zineddine Bettouche, Khalid Ali, Andreas Fischer, Andreas Kassler	2025-08-07	Download	Cellular traffic forecasting is essential for network planning, resource allocation, or load-balancing traffic across cells. However, accurate forecasting is difficult due to intricate spatial and tem...
Modular Design and Experimental Evaluation of 5G Mobile Cell Architectures Based on Overlay and Integrated Models	José Ruela, Ivan Cojocaru, André Coelho, Rui Campos, Manuel Ricardo	2025-08-07	Download	This paper presents the concept, architectural design, and performance evaluation of a 5G Mobile Cell (MC) used to provide 5G wireless connectivity to User Equipment (UE) in areas with limited fixed 5...
TeraRIS NOMA-MIMO Communications for 6G and Beyond Industrial Networks	Ali Raza, Muhammad Farhan Khan, Zeeshan Alam, Muhammad Saad, Ilyas Saleem, Muhammad Ahmed Mohsin, Muhammad Ali Jamshed	2025-08-07	Download	This paper presents a joint framework that integrates reconfigurable intelligent surfaces (RISs) with Terahertz (THz) communications and non-orthogonal multiple access (NOMA) to enhance smart industri...
A Design for an Early Quantum Network	Yuan Li, Chen Zhang, Hao Zhang, Tao Huang, Yunjie Liu	2025-08-07	Download	With the rapid advancement of quantum information technology, quantum networks have become essential for supporting diverse applications, which often have stringent demands for key metrics such as fid...
Camel: Energy-Aware LLM Inference on Resource-Constrained Devices	Hao Xu, Long Peng, Shezheng Song, Xiaodong Liu, Ma Jun, Shasha Li, Jie Yu, Xiaoguang Mao	2025-08-07	Download	Most Large Language Models (LLMs) are currently deployed in the cloud, with users relying on internet connectivity for access. However, this paradigm faces challenges such as network latency, privacy ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Back to Bits: Extending Shannon's communication performance framework to computing	Max Hawkins, Richard Vuduc	2025-08-07	Download	This work proposes a novel computing performance unit grounded in information theory. Modern computing systems are increasingly diverse, supporting low-precision formats, hardware specialization, and ...
Dancing with a Robot: An Experimental Study of Child-Robot Interaction in a Performative Art Setting	Victor Ngo, Rachel Ramchurn, Roma Patel, Alan Chamberlain, Ayse Kucukyilmaz	2025-08-07	Download	This paper presents an evaluation of 18 children's in-the-wild experiences with the autonomous robot arm performer NED (Never-Ending Dancer) within the Thingamabobas installation, showcased across the...
CRAM: Large-scale Video Continual Learning with Bootstrapped Compression	Shivani Mall, Joao F. Henriques	2025-08-07	Download	Continual learning (CL) promises to allow neural networks to learn from continuous streams of inputs, instead of IID (independent and identically distributed) sampling, which requires random access to...