
2025-06-18

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Sudoku: Decomposing DRAM Address Mapping into Component Functions | Minbok Wi, Seungmin Baek, Seonyong Park, Mattan Erez, Jung Ho Ahn | 2025-06-18 | Download | Decomposing DRAM address mappings into component-level functions is critical for understanding memory behavior and enabling precise RowHammer attacks, yet existing reverse-engineering methods fall sho... |
| Bias Variation Compensation in Perimeter-Gated SPAD TRNGs | Md Sakibur Sajal, Hunter Guthrie, Marc Dandin | 2025-06-18 | Download | Random number generators that utilize arrays of entropy source elements suffer from bias variation (BV). Despite the availability of efficient debiasing algorithms, optimized implementations of hardwa... |
| A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures | Dirk Van Essendelft, Patrick Wingo, Terry Jordan, Ryan Smith, Wissam Saidi | 2025-06-18 | Download | We have developed a novel compiler called the Multiple-Architecture Compiler for Advanced Computing Hardware (MACH) designed specifically for massively-parallel, spatial, dataflow architectures like t... |
| SR-NCL: an Area-/Energy-Efficient Resilient NCL Architecture Based on Selective Redundancy | Hasnain A. Ziad, Alexander C. Bodoh, Ashiq A. Sakib | 2025-06-18 | Download | Duplication-based redundancy schemes have proven to be effective in designing fully-resilient Quasi-delay Insensitive (QDI) asynchronous circuits. |
| From Block to Byte: Transforming PCIe SSDs with CXL Memory Protocol and Instruction Annotation | Miryeong Kwon, Donghyun Gouk, Junhyeok Jang, Jinwoo Baek, Hyunwoo You, Sangyoon Ji, Hongjoo Jung, Junseok Moon, Seungkwan Kang, Seungjun Lee, Myoungsoo Jung | 2025-06-18 | Download | This paper explores how Compute Express Link (CXL) can transform PCIe-based block storage into a scalable, byte-addressable working memory. We address the challenges of adapting block storage to CXL's... |
| CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies | Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, Myoungsoo Jung | 2025-06-18 | Download | This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). |
| Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration | Omar Numan, Gaurav Singh, Kazybek Adam, Jelin Leslin, Aleksi Korsman, Otto Simola, Marko Kosunen, Jussi Ryynänen, Martin Andraud | 2025-06-18 | Download | Developing accurate and reliable Compute-In-Memory (CIM) architectures is becoming a key research focus to accelerate Artificial Intelligence (AI) tasks on hardware, particularly Deep Neural Networks ... |
| J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor | Benoit Tain, Raphael Millet, Romain Lemaire, Michal Szczepanski, Laurent Alacoque, Emmanuel Pluchart, Sylvain Choisnet, Rohit Prasad, Jerome Chossat, Pascal Pierunek, Pascal Vivet, Sebastien Thuries | 2025-06-18 | Download | This paper presents J3DAI, a tiny deep neural network-based hardware accelerator for a 3-layer 3D-stacked CMOS image sensor featuring an artificial intelligence (AI) chip integrating a Deep Neural Net... |
| ChatModel: Automating Reference Model Design and Verification with LLMs | Jianmin Ye, Tianyang Liu, Qi Tian, Shengchu Su, Zhe Jiang, Xi Wang | 2025-06-18 | Download | As the complexity of integrated circuit designs continues to escalate, the functional verification becomes increasingly challenging. Reference models, critical for accelerating the verification proces... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PNCS:Power-Norm Cosine Similarity for Diverse Client Selection in Federated Learning | Liangyan Li, Yangyi Liu, Yimo Ning, Stefano Rini, Jun Chen | 2025-06-18 | Download | Federated Learning (FL) has emerged as a powerful paradigm for leveraging diverse datasets from multiple sources while preserving data privacy by avoiding centralized storage. |
| Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme | Zakria Qadir, Muhammad Bilal, Guoqiang Liu, Xiaolong Xu | 2025-06-18 | Download | The unmanned aerial vehicles (UAVs) in a disaster-prone environment plays important role in assisting the rescue services and providing the internet connectivity with the outside world. |
| A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures | Dirk Van Essendelft, Patrick Wingo, Terry Jordan, Ryan Smith, Wissam Saidi | 2025-06-18 | Download | We have developed a novel compiler called the Multiple-Architecture Compiler for Advanced Computing Hardware (MACH) designed specifically for massively-parallel, spatial, dataflow architectures like t... |
| Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction | Vincent Roca, Marc Tommasi, Paul Andrey, Aurélien Bellet, Markus D. Schirmer, Hilde Henon, Laurent Puy, Julien Ramon, Grégory Kuchcinski, Martin Bretzner, Renaud Lopes | 2025-06-18 | Download | Objective: Brain-predicted age difference (BrainAGE) is a neuroimaging biomarker reflecting brain health. However, training robust BrainAGE models requires large datasets, often restricted ... |
| BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters | Kunming Zhang, Hanlong Liao, Junyu Xue, Deke Guo, Guoming Tang | 2025-06-18 | Download | Modern multi-tenant AI clusters are increasingly communication-bound, driven by high-volume and multi-round GPU-to-GPU collective communication. |
| Automatic Metadata Capture and Processing for High-Performance Workflows | Polina Shpilker, Line Pouchard | 2025-06-18 | Download | Modern workflows run on increasingly heterogeneous computing architectures and with this heterogeneity comes additional complexity. We aim to apply the FAIR principles for research reproducibility by ... |
| Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation | Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité | 2025-06-18 | Download | In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a 3-dimensional symmetric tensor. |
| All is Not Lost: LLM Recovery without Checkpoints | Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen | 2025-06-18 | Download | Training LLMs on decentralized nodes or on-spot instances, lowers the training cost and enables model democratization. The inevitable challenge here is the transient churns of nodes due to failures an... |
| Parallel Paradigms in Modern HPC: A Comparative Analysis of MPI, OpenMP, and CUDA | Nizar ALHafez, Ahmad Kurdi | 2025-06-18 | Download | This paper presents a comprehensive comparison of three dominant parallel programming models in High Performance Computing (HPC): Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and C... |
| Exploring Fast Fourier Transforms on the Tenstorrent Wormhole | Nick Brown, Jake Davies, Felix LeClair | 2025-06-18 | Download | Whilst numerous areas of computing have adopted the RISC-V Instruction Set Architecture (ISA) wholesale in recent years, it is yet to become widespread in HPC. |
| Programming RISC-V accelerators via Fortran | Nick Brown, Jake Davies, Felix LeClair | 2025-06-18 | Download | A range of RISC-V based accelerators are available and coming to market, and there is strong potential for these to be used for High Performance Computing (HPC) workloads. |
| RISC-V for HPC: An update of where we are and main action points | Nick Brown | 2025-06-18 | Download | This extended abstract is submitted on behalf of the RISC-V HPC SIG who have been undertaking an analysis to explore the current state and limitations of the RISC-V ecosystem for HPC. |
| Centroid Approximation for Byzantine-Tolerant Federated Learning | Mélanie Cambus, Darya Melnyk, Tijana Milentijević, Stefan Schmid | 2025-06-18 | Download | Federated learning allows each client to keep its data locally when training machine learning models in a distributed setting. Significant recent research established the requirements that the input m... |
| eLLM: Elastic Memory Management Framework for Efficient LLM Serving | Jiale Xu, Rui Zhang, Yi Xiong, Cong Guo, Zihan Liu, Yangjie Zhou, Weiming Hu, Hao Wu, Changxu Shao, Ziqing Wang, Yongjie Yuan, Junping Zhao, Minyi Guo, Jingwen Leng | 2025-06-18 | Download | Large Language Models are increasingly being deployed in datacenters. Serving these models requires careful memory management, as their memory usage includes static weights, dynamic activations, and k... |
| Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library | Youjia Li, Robert Latham, Robert Ross, Ankit Agrawal, Alok Choudhary, Wei-Keng Liao | 2025-06-18 | Download | High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme | Zakria Qadir, Muhammad Bilal, Guoqiang Liu, Xiaolong Xu | 2025-06-18 | Download | The unmanned aerial vehicles (UAVs) in a disaster-prone environment plays important role in assisting the rescue services and providing the internet connectivity with the outside world. |
| Golden Ratio Assisted Localization for Wireless Sensor Network | Hitesh Mohapatra | 2025-06-18 | Download | This paper presents a novel localization algorithm for wireless sensor networks (WSNs) called Golden Ratio Localization (GRL), which leverages the mathematical properties of the golden ratio (phi 1. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Atys: An Efficient Profiling Framework for Identifying Hotspot Functions in Large-scale Cloud Microservices | Jiaqi Sun, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue | 2025-06-18 | Download | To handle the high volume of requests, large-scale services are comprised of thousands of instances deployed in clouds. These services utilize diverse programming languages and are distributed across ... |
