2025-08-07

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Accelerating Data Chunking in Deduplication Systems using Vector Instructions | Sreeharsha Udayashankar, Abdelrahman Baba, Samer Al-Kiswany | 2025-08-07 | Download | Content-defined Chunking (CDC) algorithms dictate the overall space savings that deduplication systems achieve. However, due to their need to scan each file in its entirety, they are slow and often th... |
| ConiQ: Enabling Concatenated Quantum Error Correction on Neutral Atom Arrays | Pengyu Liu, Mingkuan Xu, Hengyun Zhou, Hanrui Wang, Umut A. Acar, Yunong Shi | 2025-08-07 | Download | Recent progress on concatenated codes, especially many-hypercube codes, achieves unprecedented space efficiency. Yet two critical challenges persist in practice. |
| relOBI: A Reliable Low-latency Interconnect for Tightly-Coupled On-chip Communication | Michael Rogenmoser, Angelo Garofalo, Luca Benini | 2025-08-07 | Download | On-chip communication is a critical element of modern systems-on-chip (SoCs), allowing processor cores to interact with memory and peripherals. |
| Understanding and Mitigating Errors of LLM-Generated RTL Code | Jiazheng Zhang, Cheng Liu, Long Cheng, Xiaowei Li, Huawei Li | 2025-08-07 | Download | Despite limited success in large language model (LLM)-based register-transfer-level (RTL) code generation, the root causes of errors remain poorly understood. |
| TMA-Adaptive FP8 Grouped GEMM: Eliminating Padding Requirements in Low-Precision Training and Inference on Hopper | Zhongling Su, Rong Fu, Weihan Cao, Jianfei Gao, Minxi Jin, Zhilin Pei, Hui Wang | 2025-08-07 | Download | Current FP8 grouped GEMM implementations require padding each group to a fixed alignment (e.g., 128), incurring memory and computational overhead. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Snowpark: Performant, Secure, User-Friendly Data Engineering and AI/ML Next To Your Data | Brandon Baker, Elliott Brossard, Chenwei Xie, Zihao Ye, Deen Liu, Yijun Xie, Arthur Zwiegincew, Nitya Kumar Sharma, Gaurav Jain, Eugene Retunsky, Mike Halcrow, Derek Denny-Brown, Istvan Cseri, Tyler Akidau, Yuxiong He | 2025-08-07 | Download | Snowflake revolutionized data analytics with an elastic architecture that decouples compute and storage, enabling scalable solutions supporting data architectures like data lake, data warehouse, data ... |
| A Dynamic Approach to Load Balancing in Cloud Infrastructure: Enhancing Energy Efficiency and Resource Utilization | Shadman Sakib, Ajay Katangur, Rahul Dubey | 2025-08-07 | Download | Cloud computing has grown rapidly in recent years, mainly due to the sharp increase in data transferred over the internet. This growth makes load balancing a key part of cloud systems, as it helps dis... |
| Accelerating Data Chunking in Deduplication Systems using Vector Instructions | Sreeharsha Udayashankar, Abdelrahman Baba, Samer Al-Kiswany | 2025-08-07 | Download | Content-defined Chunking (CDC) algorithms dictate the overall space savings that deduplication systems achieve. However, due to their need to scan each file in its entirety, they are slow and often th... |
| X-VFL: A New Vertical Federated Learning Framework with Cross Completion and Decision Subspace Alignment | Qinghua Yao, Xiangrui Xu, Zhize Li | 2025-08-07 | Download | Vertical Federated Learning (VFL) enables collaborative learning by integrating disjoint feature subsets from multiple clients/parties. However, VFL typically faces two key challenges: i) the requirem... |
| Modular Architecture for High-Performance and Low Overhead Data Transfers | Rasman Mubtasim Swargo, Engin Arslan, Md Arifuzzaman | 2025-08-07 | Download | High-performance applications necessitate rapid and dependable transfer of massive datasets across geographically dispersed locations. Traditional file transfer tools often suffer from resource underu... |
| Adaptive Parallel Downloader for Large Genomic Datasets | Rasman Mubtasim Swargo, Engin Arslan, Md Arifuzzaman | 2025-08-07 | Download | Modern next-generation sequencing (NGS) projects routinely generate terabytes of data, which researchers commonly download from public repositories such as SRA or ENA. |
| A Feature Engineering Approach for Business Impact-Oriented Failure Detection in Distributed Instant Payment Systems | Lorenzo Porcelli | 2025-08-07 | Download | Instant payment infrastructures have stringent performance requirements, processing millions of transactions daily with zero-downtime expectations. |
| Simulating LLM training workloads for heterogeneous compute and network infrastructure | Sumit Kumar, Arjun Temura, Naman Sharma, Ramanjeet Singh, Meet Dadhania, Praveen Tammana, Satananda Burla, Abed Mohammad Kamaluddin, Rinku Shah | 2025-08-07 | Download | The growing demand for large-scale GPU clusters in distributed model training presents a significant barrier to innovation, particularly in model optimization, performance tuning, and system-level enh... |
| HFedATM: Hierarchical Federated Domain Generalization via Optimal Transport and Regularized Mean Aggregation | Thinh Nguyen, Trung Phan, Binh T. Nguyen, Khoa D Doan, Kok-Seng Wong | 2025-08-07 | Download | Federated Learning (FL) is a decentralized approach where multiple clients collaboratively train a shared global model without sharing their raw data. |
| Theseus: A Distributed and Scalable GPU-Accelerated Query Processing Platform Optimized for Efficient Data Movement | Felipe Aramburú, William Malpica, Kaouther Abrougui, Amin Aramoon, Romulo Auccapuclla, Claude Brisson, Matthijs Brobbel, Colby Farrell, Pradeep Garigipati, Joost Hoozemans, Supun Kamburugamuve, Akhil Nair, Alexander Ocsa, Johan Peltenburg, Rubén Quesada López, Deepak Sihag, Ahmet Uyar, Dhruv Vats, Michael Wendt, Jignesh M. Patel, Rodrigo Aramburú | 2025-08-07 | Download | Online analytical processing of queries on datasets in the many-terabyte range is only possible with costly distributed computing systems. To decrease the cost and increase the throughput, systems can... |
| Task-Based Programming for Adaptive Mesh Refinement in Compressible Flow Simulations | Anjiang Wei, Hang Song, Mert Hidayetoglu, Elliott Slaughter, Sanjiva K. Lele, Alex Aiken | 2025-08-07 | Download | High-order solvers for compressible flows are vital in scientific applications. Adaptive mesh refinement (AMR) is a key technique for reducing computational cost by concentrating resolution in regions... |
| Tesserae: Scalable Placement Policies for Deep Learning Workloads | Song Bian, Saurabh Agarwal, Md. Tareq Mahmood, Shivaram Venkataraman | 2025-08-07 | Download | Training deep learning (DL) models has become a dominant workload in data-centers and improving resource utilization is a key goal of DL cluster schedulers. |
| Managing, Analyzing and Sharing Research Data with Gen3 Data Commons | Craig Barnes, Kyle Burton, Michael S. Fitzsimons, Hara Prasad Juvvala, Brienna Larrick, Christopher Meyer, Pauline Ribeyre, Ao Liu, Clint Malson, Noah Metoki-Shlubsky, Andrii Prokhorenkov, Jawad Qureshi, Radhika Reddy, L. Philip Schumm, Mingfei Shao, Trevar Simmons, Alexander VanTol, Peter Vassilatos, Aarti Venkat, Robert L. Grossman | 2025-08-07 | Download | Gen3 is an open-source data platform for building data commons. A data commons is a cloud-based data platform for managing, analyzing, and sharing data with a research community. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HiSTM: Hierarchical Spatiotemporal Mamba for Cellular Traffic Forecasting | Zineddine Bettouche, Khalid Ali, Andreas Fischer, Andreas Kassler | 2025-08-07 | Download | Cellular traffic forecasting is essential for network planning, resource allocation, or load-balancing traffic across cells. However, accurate forecasting is difficult due to intricate spatial and tem... |
| Modular Design and Experimental Evaluation of 5G Mobile Cell Architectures Based on Overlay and Integrated Models | José Ruela, Ivan Cojocaru, André Coelho, Rui Campos, Manuel Ricardo | 2025-08-07 | Download | This paper presents the concept, architectural design, and performance evaluation of a 5G Mobile Cell (MC) used to provide 5G wireless connectivity to User Equipment (UE) in areas with limited fixed 5... |
| TeraRIS NOMA-MIMO Communications for 6G and Beyond Industrial Networks | Ali Raza, Muhammad Farhan Khan, Zeeshan Alam, Muhammad Saad, Ilyas Saleem, Muhammad Ahmed Mohsin, Muhammad Ali Jamshed | 2025-08-07 | Download | This paper presents a joint framework that integrates reconfigurable intelligent surfaces (RISs) with Terahertz (THz) communications and non-orthogonal multiple access (NOMA) to enhance smart industri... |
| A Design for an Early Quantum Network | Yuan Li, Chen Zhang, Hao Zhang, Tao Huang, Yunjie Liu | 2025-08-07 | Download | With the rapid advancement of quantum information technology, quantum networks have become essential for supporting diverse applications, which often have stringent demands for key metrics such as fid... |
| Camel: Energy-Aware LLM Inference on Resource-Constrained Devices | Hao Xu, Long Peng, Shezheng Song, Xiaodong Liu, Ma Jun, Shasha Li, Jie Yu, Xiaoguang Mao | 2025-08-07 | Download | Most Large Language Models (LLMs) are currently deployed in the cloud, with users relying on internet connectivity for access. However, this paradigm faces challenges such as network latency, privacy ... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Back to Bits: Extending Shannon's communication performance framework to computing | Max Hawkins, Richard Vuduc | 2025-08-07 | Download | This work proposes a novel computing performance unit grounded in information theory. Modern computing systems are increasingly diverse, supporting low-precision formats, hardware specialization, and ... |
| Dancing with a Robot: An Experimental Study of Child-Robot Interaction in a Performative Art Setting | Victor Ngo, Rachel Ramchurn, Roma Patel, Alan Chamberlain, Ayse Kucukyilmaz | 2025-08-07 | Download | This paper presents an evaluation of 18 children's in-the-wild experiences with the autonomous robot arm performer NED (Never-Ending Dancer) within the Thingamabobas installation, showcased across the... |
| CRAM: Large-scale Video Continual Learning with Bootstrapped Compression | Shivani Mall, Joao F. Henriques | 2025-08-07 | Download | Continual learning (CL) promises to allow neural networks to learn from continuous streams of inputs, instead of IID (independent and identically distributed) sampling, which requires random access to... |