2025-06-18
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Sudoku: Decomposing DRAM Address Mapping into Component Functions | Minbok Wi, Seungmin Baek, Seonyong Park, Mattan Erez, Jung Ho Ahn | 2025-06-18 | Download | Decomposing DRAM address mappings into component-level functions is critical for understanding memory behavior and enabling precise RowHammer attacks, yet existing reverse-engineering methods fall sho... |
| Bias Variation Compensation in Perimeter-Gated SPAD TRNGs | Md Sakibur Sajal, Hunter Guthrie, Marc Dandin | 2025-06-18 | Download | Random number generators that utilize arrays of entropy source elements suffer from bias variation (BV). Despite the availability of efficient debiasing algorithms, optimized implementations of hardwa... |
| A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures | Dirk Van Essendelft, Patrick Wingo, Terry Jordan, Ryan Smith, Wissam Saidi | 2025-06-18 | Download | We have developed a novel compiler called the Multiple-Architecture Compiler for Advanced Computing Hardware (MACH) designed specifically for massively-parallel, spatial, dataflow architectures like t... |
| SR-NCL: an Area-/Energy-Efficient Resilient NCL Architecture Based on Selective Redundancy | Hasnain A. Ziad, Alexander C. Bodoh, Ashiq A. Sakib | 2025-06-18 | Download | Duplication-based redundancy schemes have proven to be effective in designing fully-resilient Quasi-delay Insensitive (QDI) asynchronous circuits. |
| From Block to Byte: Transforming PCIe SSDs with CXL Memory Protocol and Instruction Annotation | Miryeong Kwon, Donghyun Gouk, Junhyeok Jang, Jinwoo Baek, Hyunwoo You, Sangyoon Ji, Hongjoo Jung, Junseok Moon, Seungkwan Kang, Seungjun Lee, Myoungsoo Jung | 2025-06-18 | Download | This paper explores how Compute Express Link (CXL) can transform PCIe-based block storage into a scalable, byte-addressable working memory. We address the challenges of adapting block storage to CXL's... |
| CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies | Donghyun Gouk, Seungkwan Kang, Seungjun Lee, Jiseon Kim, Kyungkuk Nam, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Hanyeoreum Bae, Myoungsoo Jung | 2025-06-18 | Download | This work introduces a GPU storage expansion solution utilizing CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). |
| Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration | Omar Numan, Gaurav Singh, Kazybek Adam, Jelin Leslin, Aleksi Korsman, Otto Simola, Marko Kosunen, Jussi Ryynänen, Martin Andraud | 2025-06-18 | Download | Developing accurate and reliable Compute-In-Memory (CIM) architectures is becoming a key research focus to accelerate Artificial Intelligence (AI) tasks on hardware, particularly Deep Neural Networks ... |
| J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor | Benoit Tain, Raphael Millet, Romain Lemaire, Michal Szczepanski, Laurent Alacoque, Emmanuel Pluchart, Sylvain Choisnet, Rohit Prasad, Jerome Chossat, Pascal Pierunek, Pascal Vivet, Sebastien Thuries | 2025-06-18 | Download | This paper presents J3DAI, a tiny deep neural network-based hardware accelerator for a 3-layer 3D-stacked CMOS image sensor featuring an artificial intelligence (AI) chip integrating a Deep Neural Net... |
| ChatModel: Automating Reference Model Design and Verification with LLMs | Jianmin Ye, Tianyang Liu, Qi Tian, Shengchu Su, Zhe Jiang, Xi Wang | 2025-06-18 | Download | As the complexity of integrated circuit designs continues to escalate, the functional verification becomes increasingly challenging. Reference models, critical for accelerating the verification proces... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PNCS:Power-Norm Cosine Similarity for Diverse Client Selection in Federated Learning | Liangyan Li, Yangyi Liu, Yimo Ning, Stefano Rini, Jun Chen | 2025-06-18 | Download | Federated Learning (FL) has emerged as a powerful paradigm for leveraging diverse datasets from multiple sources while preserving data privacy by avoiding centralized storage. |
| Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme | Zakria Qadir, Muhammad Bilal, Guoqiang Liu, Xiaolong Xu | 2025-06-18 | Download | The unmanned aerial vehicles (UAVs) in a disaster-prone environment plays important role in assisting the rescue services and providing the internet connectivity with the outside world. |
| A System Level Compiler for Massively-Parallel, Spatial, Dataflow Architectures | Dirk Van Essendelft, Patrick Wingo, Terry Jordan, Ryan Smith, Wissam Saidi | 2025-06-18 | Download | We have developed a novel compiler called the Multiple-Architecture Compiler for Advanced Computing Hardware (MACH) designed specifically for massively-parallel, spatial, dataflow architectures like t... |
| Federated Learning for MRI-based BrainAGE: a multicenter study on post-stroke functional outcome prediction | Vincent Roca, Marc Tommasi, Paul Andrey, Aurélien Bellet, Markus D. Schirmer, Hilde Henon, Laurent Puy, Julien Ramon, Grégory Kuchcinski, Martin Bretzner, Renaud Lopes | 2025-06-18 | Download | Objective: Brain-predicted age difference (BrainAGE) is a neuroimaging biomarker reflecting brain health. However, training robust BrainAGE models requires large datasets, often restricted ... |
| BandPilot: Towards Performance- and Contention-Aware GPU Dispatching in AI Clusters | Kunming Zhang, Hanlong Liao, Junyu Xue, Deke Guo, Guoming Tang | 2025-06-18 | Download | Modern multi-tenant AI clusters are increasingly communication-bound, driven by high-volume and multi-round GPU-to-GPU collective communication. |
| Automatic Metadata Capture and Processing for High-Performance Workflows | Polina Shpilker, Line Pouchard | 2025-06-18 | Download | Modern workflows run on increasingly heterogeneous computing architectures and with this heterogeneity comes additional complexity. We aim to apply the FAIR principles for research reproducibility by ... |
| Minimizing Communication for Parallel Symmetric Tensor Times Same Vector Computation | Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse, Mathieu Vérité | 2025-06-18 | Download | In this article, we focus on the parallel communication cost of multiplying the same vector along two modes of a -dimensional symmetric tensor. |
| All is Not Lost: LLM Recovery without Checkpoints | Nikolay Blagoev, Oğuzhan Ersoy, Lydia Yiyu Chen | 2025-06-18 | Download | Training LLMs on decentralized nodes or on-spot instances, lowers the training cost and enables model democratization. The inevitable challenge here is the transient churns of nodes due to failures an... |
| Parallel Paradigms in Modern HPC: A Comparative Analysis of MPI, OpenMP, and CUDA | Nizar ALHafez, Ahmad Kurdi | 2025-06-18 | Download | This paper presents a comprehensive comparison of three dominant parallel programming models in High Performance Computing (HPC): Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and C... |
| Exploring Fast Fourier Transforms on the Tenstorrent Wormhole | Nick Brown, Jake Davies, Felix LeClair | 2025-06-18 | Download | Whilst numerous areas of computing have adopted the RISC-V Instruction Set Architecture (ISA) wholesale in recent years, it is yet to become widespread in HPC. |
| Programming RISC-V accelerators via Fortran | Nick Brown, Jake Davies, Felix LeClair | 2025-06-18 | Download | A range of RISC-V based accelerators are available and coming to market, and there is strong potential for these to be used for High Performance Computing (HPC) workloads. |
| RISC-V for HPC: An update of where we are and main action points | Nick Brown | 2025-06-18 | Download | This extended abstract is submitted on behalf of the RISC-V HPC SIG who have been undertaking an analysis to explore the current state and limitations of the RISC-V ecosystem for HPC. |
| Centroid Approximation for Byzantine-Tolerant Federated Learning | Mélanie Cambus, Darya Melnyk, Tijana Milentijević, Stefan Schmid | 2025-06-18 | Download | Federated learning allows each client to keep its data locally when training machine learning models in a distributed setting. Significant recent research established the requirements that the input m... |
| eLLM: Elastic Memory Management Framework for Efficient LLM Serving | Jiale Xu, Rui Zhang, Yi Xiong, Cong Guo, Zihan Liu, Yangjie Zhou, Weiming Hu, Hao Wu, Changxu Shao, Ziqing Wang, Yongjie Yuan, Junping Zhao, Minyi Guo, Jingwen Leng | 2025-06-18 | Download | Large Language Models are increasingly being deployed in datacenters. Serving these models requires careful memory management, as their memory usage includes static weights, dynamic activations, and k... |
| Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library | Youjia Li, Robert Latham, Robert Ross, Ankit Agrawal, Alok Choudhary, Wei-Keng Liao | 2025-06-18 | Download | High-level I/O libraries, such as HDF5 and PnetCDF, are commonly used by large-scale scientific applications to perform I/O tasks in parallel. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Autonomous Trajectory Optimization for UAVs in Disaster Zone Using Henry Gas Optimization Scheme | Zakria Qadir, Muhammad Bilal, Guoqiang Liu, Xiaolong Xu | 2025-06-18 | Download | The unmanned aerial vehicles (UAVs) in a disaster-prone environment plays important role in assisting the rescue services and providing the internet connectivity with the outside world. |
| Golden Ratio Assisted Localization for Wireless Sensor Network | Hitesh Mohapatra | 2025-06-18 | Download | This paper presents a novel localization algorithm for wireless sensor networks (WSNs) called Golden Ratio Localization (GRL), which leverages the mathematical properties of the golden ratio (phi 1. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Atys: An Efficient Profiling Framework for Identifying Hotspot Functions in Large-scale Cloud Microservices | Jiaqi Sun, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue | 2025-06-18 | Download | To handle the high volume of requests, large-scale services are comprised of thousands of instances deployed in clouds. These services utilize diverse programming languages and are distributed across ... |