2025-06-18

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Sudoku: Decomposing DRAM Address Mapping into Component Functions	Minbok Wi, Seungmin Baek, Seonyong Park, Mattan Erez, Jung Ho Ahn	2025-06-18	Download	Decomposing DRAM address mappings into component-level functions is critical for understanding memory behavior and enabling precise RowHammer attacks, yet existing reverse-engineering methods fall sho...
Bias Variation Compensation in Perimeter-Gated SPAD TRNGs	Md Sakibur Sajal, Hunter Guthrie, Marc Dandin	2025-06-18	Download	Random number generators that utilize arrays of entropy source elements suffer from bias variation (BV). Despite the availability of efficient debiasing algorithms, optimized implementations of hardwa...
A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures	Dirk Van Essendelft, Patrick Wingo, Terry Jordan, Ryan Smith, Wissam Saidi	2025-06-18	Download	We have developed a novel compiler called the Multiple-Architecture Compiler for Advanced Computing Hardware (MACH) designed specifically for massively-parallel, spatial, dataflow architectures like t...
SR-NCL: an Area-/Energy-Efficient Resilient NCL Architecture Based on Selective Redundancy	Hasnain A. Ziad, Alexander C. Bodoh, Ashiq A. Sakib	2025-06-18	Download	Duplication-based redundancy schemes have proven to be effective in designing fully-resilient Quasi-delay Insensitive (QDI) asynchronous circuits.
From Block to Byte: Transforming PCIe SSDs with CXL Memory Protocol and Instruction Annotation	Miryeong Kwon, Donghyun Gouk, Junhyeok Jang, Jinwoo Baek, Hyunwoo You, Sangyoon Ji, Hongjoo Jung, Junseok Moon, Seungkwan Kang, Seungjun Lee, Myoungsoo Jung	2025-06-18	Download	This paper explores how Compute Express Link (CXL) can transform PCIe-based block storage into a scalable, byte-addressable working memory. We address the challenges of adapting block storage to CXL's...
CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies	Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, Myoungsoo Jung	2025-06-18	Download	This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs).
Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration	Omar Numan, Gaurav Singh, Kazybek Adam, Jelin Leslin, Aleksi Korsman, Otto Simola, Marko Kosunen, Jussi Ryynänen, Martin Andraud	2025-06-18	Download	Developing accurate and reliable Compute-In-Memory (CIM) architectures is becoming a key research focus to accelerate Artificial Intelligence (AI) tasks on hardware, particularly Deep Neural Networks ...
J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor	Benoit Tain, Raphael Millet, Romain Lemaire, Michal Szczepanski, Laurent Alacoque, Emmanuel Pluchart, Sylvain Choisnet, Rohit Prasad, Jerome Chossat, Pascal Pierunek, Pascal Vivet, Sebastien Thuries	2025-06-18	Download	This paper presents J3DAI, a tiny deep neural network-based hardware accelerator for a 3-layer 3D-stacked CMOS image sensor featuring an artificial intelligence (AI) chip integrating a Deep Neural Net...
ChatModel: Automating Reference Model Design and Verification with LLMs	Jianmin Ye, Tianyang Liu, Qi Tian, Shengchu Su, Zhe Jiang, Xi Wang	2025-06-18	Download	As the complexity of integrated circuit designs continues to escalate, the functional verification becomes increasingly challenging. Reference models, critical for accelerating the verification proces...
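The Sudoku entry concerns how a physical address is split into component functions (for example channel, rank, bank group, and bank). Prior reverse-engineering work commonly models each bank-address bit as the XOR (parity) of a subset of physical-address bits. The Python sketch below assumes that linear XOR model with made-up masks (`HYPOTHETICAL_BANK_MASKS`, `bank_index`, and `same_bank` are illustrative names). It shows what a single component function looks like and is not the paper's decomposition method.

```python
# Hypothetical masks for illustration only; real platforms differ, and the
# functions recovered by the Sudoku paper are not reproduced here.
HYPOTHETICAL_BANK_MASKS = (
    (1 << 13) | (1 << 17),  # bank-index bit 0
    (1 << 14) | (1 << 18),  # bank-index bit 1
    (1 << 15) | (1 << 19),  # bank-index bit 2
    (1 << 16) | (1 << 20),  # bank-index bit 3
)


def parity(value: int) -> int:
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def bank_index(phys_addr: int, masks=HYPOTHETICAL_BANK_MASKS) -> int:
    """Each bank-index bit is the XOR of the address bits selected by its mask."""
    index = 0
    for bit, mask in enumerate(masks):
        index |= parity(phys_addr & mask) << bit
    return index


def same_bank(a: int, b: int, masks=HYPOTHETICAL_BANK_MASKS) -> bool:
    """True if both physical addresses map to the same bank under the masks."""
    return bank_index(a, masks) == bank_index(b, masks)


# Flipping bits 13 and 17 together leaves bank bit 0 unchanged, so both
# addresses land in the same bank (bank 0 under these masks).
base = 0x0
other = base ^ (1 << 13) ^ (1 << 17)
print(bank_index(base), bank_index(other), same_bank(base, other))  # 0 0 True
```

Knowing which addresses share a bank is what lets an attacker place aggressor rows in the same bank as a victim row, which is the precision the abstract ties to RowHammer.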

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
PNCS:Power-Norm Cosine Similarity for Diverse Client Selection in Federated Learning	Liangyan Li, Yangyi Liu, Yimo Ning, Stefano Rini, Jun Chen	2025-06-18	Download	Federated Learning (FL) has emerged as a powerful paradigm for leveraging diverse datasets from multiple sources while preserving data privacy by avoiding centralized storage.
Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme	Zakria Qadir, Muhammad Bilal, Guoqiang Liu, Xiaolong Xu	2025-06-18	Download	The unmanned aerial vehicles (UAVs) in a disaster-prone environment play an important role in assisting the rescue services and providing the internet connectivity with the outside world.
A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures	Dirk Van Essendelft, Patrick Wingo, Terry Jordan, Ryan Smith, Wissam Saidi	2025-06-18	Download	We have developed a novel compiler called the Multiple-Architecture Compiler for Advanced Computing Hardware (MACH) designed specifically for massively-parallel, spatial, dataflow architectures like t...
Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction	Vincent Roca, Marc Tommasi, Paul Andrey, Aurélien Bellet, Markus D. Schirmer, Hilde Henon, Laurent Puy, Julien Ramon, Grégory Kuchcinski, Martin Bretzner, Renaud Lopes	2025-06-18	Download	Objective: Brain-predicted age difference (BrainAGE) is a neuroimaging biomarker reflecting brain health. However, training robust BrainAGE models requires large datasets, often restricted ...
BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters	Kunming Zhang, Hanlong Liao, Junyu Xue, Deke Guo, Guoming Tang	2025-06-18	Download	Modern multi-tenant AI clusters are increasingly communication-bound, driven by high-volume and multi-round GPU-to-GPU collective communication.
Automatic Metadata Capture and Processing for High-Performance Workflows	Polina Shpilker, Line Pouchard	2025-06-18	Download	Modern workflows run on increasingly heterogeneous computing architectures and with this heterogeneity comes additional complexity. We aim to apply the FAIR principles for research reproducibility by ...
Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation	Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité	2025-06-18	Download	In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a 3-dimensional symmetric tensor.
All is Not Lost: LLM Recovery without Checkpoints	Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen	2025-06-18	Download	Training LLMs on decentralized nodes or on-spot instances lowers the training cost and enables model democratization. The inevitable challenge here is the transient churns of nodes due to failures an...
Parallel Paradigms in Modern HPC: A Comparative Analysis of MPI, OpenMP, and CUDA	Nizar ALHafez, Ahmad Kurdi	2025-06-18	Download	This paper presents a comprehensive comparison of three dominant parallel programming models in High Performance Computing (HPC): Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and C...
Exploring Fast Fourier Transforms on the Tenstorrent Wormhole	Nick Brown, Jake Davies, Felix LeClair	2025-06-18	Download	Whilst numerous areas of computing have adopted the RISC-V Instruction Set Architecture (ISA) wholesale in recent years, it is yet to become widespread in HPC.
Programming RISC-V accelerators via Fortran	Nick Brown, Jake Davies, Felix LeClair	2025-06-18	Download	A range of RISC-V based accelerators are available and coming to market, and there is strong potential for these to be used for High Performance Computing (HPC) workloads.
RISC-V for HPC: An update of where we are and main action points	Nick Brown	2025-06-18	Download	This extended abstract is submitted on behalf of the RISC-V HPC SIG who have been undertaking an analysis to explore the current state and limitations of the RISC-V ecosystem for HPC.
Centroid Approximation for Byzantine-Tolerant Federated Learning	Mélanie Cambus, Darya Melnyk, Tijana Milentijević, Stefan Schmid	2025-06-18	Download	Federated learning allows each client to keep its data locally when training machine learning models in a distributed setting. Significant recent research established the requirements that the input m...
eLLM: Elastic Memory Management Framework for Efficient LLM Serving	Jiale Xu, Rui Zhang, Yi Xiong, Cong Guo, Zihan Liu, Yangjie Zhou, Weiming Hu, Hao Wu, Changxu Shao, Ziqing Wang, Yongjie Yuan, Junping Zhao, Minyi Guo, Jingwen Leng	2025-06-18	Download	Large Language Models are increasingly being deployed in datacenters. Serving these models requires careful memory management, as their memory usage includes static weights, dynamic activations, and k...
Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library	Youjia Li, Robert Latham, Robert Ross, Ankit Agrawal, Alok Choudhary, Wei-Keng Liao	2025-06-18	Download	High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel.
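The symmetric tensor times same vector (STTsV) kernel in the Al Daas et al. entry computes y_i = Σ_{j,k} A_{ijk} x_j x_k for a symmetric n×n×n tensor A. As a minimal sequential sketch (not the paper's parallel, communication-minimizing algorithm), the Python below compares a dense reference with a version that reads only the n(n+1)(n+2)/6 unique entries with i ≤ j ≤ k and scatters each one to its distinct index permutations. NumPy is assumed, and the function names `sttsv_dense` and `sttsv_symmetric` are hypothetical.

```python
import itertools

import numpy as np


def sttsv_dense(A: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Reference STTsV: y[i] = sum over j, k of A[i, j, k] * x[j] * x[k]."""
    return np.einsum("ijk,j,k->i", A, x, x)


def sttsv_symmetric(A: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Same result, reading only entries A[i, j, k] with i <= j <= k.

    Assumes A is fully symmetric, so every ordered index triple is a
    distinct permutation of exactly one sorted triple.
    """
    n = A.shape[0]
    y = np.zeros(n, dtype=np.result_type(A, x))
    for i, j, k in itertools.combinations_with_replacement(range(n), 3):
        a = A[i, j, k]
        # Each distinct permutation (p, q, r) contributes A[p, q, r] * x[q] * x[r] to y[p].
        for p, q, r in set(itertools.permutations((i, j, k))):
            y[p] += a * x[q] * x[r]
    return y


# Usage: build a random symmetric tensor by averaging over all axis permutations.
rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n, n))
A = sum(np.transpose(B, perm) for perm in itertools.permutations(range(3))) / 6
x = rng.standard_normal(n)
print(np.allclose(sttsv_dense(A, x), sttsv_symmetric(A, x)))  # True
```

Reading each unique entry once is what makes symmetric storage attractive; the listed paper studies the parallel communication cost of this computation, which the sketch does not address.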

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme	Zakria Qadir, Muhammad Bilal, Guoqiang Liu, Xiaolong Xu	2025-06-18	Download	The unmanned aerial vehicles (UAVs) in a disaster-prone environment play an important role in assisting the rescue services and providing the internet connectivity with the outside world.
Golden Ratio Assisted Localization for Wireless Sensor Network	Hitesh Mohapatra	2025-06-18	Download	This paper presents a novel localization algorithm for wireless sensor networks (WSNs) called Golden Ratio Localization (GRL), which leverages the mathematical properties of the golden ratio (phi ≈ 1.618)...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Atys: An Efficient Profiling Framework for Identifying Hotspot Functions in Large-scale Cloud Microservices	Jiaqi Sun, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue	2025-06-18	Download	To handle the high volume of requests, large-scale services are comprised of thousands of instances deployed in clouds. These services utilize diverse programming languages and are distributed across ...