
2025-11-06

cs.AR - Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| MDM: Manhattan Distance Mapping of DNN Weights for Parasitic-Resistance-Resilient Memristive Crossbars | Matheus Farias, Wanghley Martins, H. T. Kung | 2025-11-06 | Manhattan Distance Mapping (MDM) is a post-training deep neural network (DNN) weight mapping technique for memristive bit-sliced compute-in-memory (CIM) crossbars that reduces parasitic resistance (PR... |
| SLOFetch: Compressed-Hierarchical Instruction Prefetching for Cloud Microservices | Zerui Bao, Di Zhu, Liu Jiang, Shiqi Sheng, Ziwei Wang, Haoyun Zhang | 2025-11-06 | Large-scale networked services rely on deep software stacks and microservice orchestration, which increase instruction footprints and create frontend stalls that inflate tail latency and energy. |
| FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow | Rubens Lacouture, Nathan Zhang, Ritvik Sharma, Marco Siracusa, Fredrik Kjolstad, Kunle Olukotun, Olivia Hsu | 2025-11-06 | As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. We propose FuseFlow, a compiler that converts sparse machi... |
| Scalable and Efficient Intra- and Inter-node Interconnection Networks for Post-Exascale Supercomputers and Data centers | Joaquin Tarraga-Moreno, Daniel Barley, Francisco J. Andujar Munoz, Jesus Escudero-Sahuquillo, Holger Froning, Pedro Javier Garcia, Francisco J. Quiles, Jose Duato | 2025-11-06 | The rapid growth of data-intensive applications such as generative AI, scientific simulations, and large-scale analytics is driving modern supercomputers and data centers toward increasingly heterogen... |
| wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation | Benjamin Hawks, Jason Weitz, Dmitri Demler, Karla Tame-Narvaez, Dennis Plotnikov, Mohammad Mehdi Rahimifar, Hamza Ezzaoui Rahali, Audrey C. Therrien, Donovan Sproule, Elham E Khoda, Keegan A. Smith, Russell Marroquin, Giuseppe Di Guglielmo, Nhan Tran, Javier Duarte, Vladimir Loncar | 2025-11-06 | As machine learning (ML) is increasingly implemented in hardware to address real-time challenges in scientific applications, the development of advanced toolchains has significantly reduced the time r... |
| Analysis of Single Event Induced Bit Faults in a Deep Neural Network Accelerator Pipeline | Naïn Jonckers, Toon Vinck, Peter Karsmakers, Jeffrey Prinzie | 2025-11-06 | In recent years, the increased interest and the growth in application domains of Artificial Intelligence (AI), and more specifically Deep Neural Networks (DNNs), has led to an extensive usage of domai... |
| AIM: Software and Hardware Co-design for Architecture-level IR-drop Mitigation in High-performance PIM | Yuanpeng Zhang, Xing Hu, Xi Chen, Zhihang Yuan, Cong Li, Jingchen Zhu, Zhao Wang, Chenguang Zhang, Xin Si, Wei Gao, Qiang Wu, Runsheng Wang, Guangyu Sun | 2025-11-06 | SRAM Processing-in-Memory (PIM) has emerged as the most promising implementation for high-performance PIM, delivering superior computing density, energy efficiency, and computational precision. |
| FiCABU: A Fisher-Based, Context-Adaptive Machine Unlearning Processor for Edge AI | Eun-Su Cho, Jongin Choi, Jeongmin Jin, Jae-Jin Lee, Woojoo Lee | 2025-11-06 | Machine unlearning, driven by privacy regulations and the "right to be forgotten", is increasingly needed at the edge, yet server-centric or retraining-heavy methods are impractical under tight comput... |
| Disaggregated Architectures and the Redesign of Data Center Ecosystems: Scheduling, Pooling, and Infrastructure Trade-offs | Chao Guo, Jiahe Xu, Moshe Zukerman | 2025-11-06 | Hardware disaggregation seeks to transform Data Center (DC) resources from traditional server fleets into unified resource pools. Despite existing challenges that may hinder its full realization, sign... |
| PICNIC: Silicon Photonic Interconnected Chiplets with Computational Network and In-memory Computing for LLM Inference Acceleration | Yue Jiet Chong, Yimin Wang, Zhen Wu, Xuanyao Fong | 2025-11-06 | This paper presents a 3D-stacked chiplets based large language model (LLM) inference accelerator, consisting of non-volatile in-memory-computing processing elements (PEs) and Inter-PE Computational Ne... |
| From Minutes to Seconds: Redefining the Five-Minute Rule for AI-Era Memory Hierarchies | Tong Zhang, Vikram Sharma Mailthody, Fei Sun, Linsen Ma, Chris J. Newburn, Teresa Zhang, Yang Liu, Jiangpeng Li, Hao Zhong, Wen-Mei Hwu | 2025-11-06 | In 1987, Jim Gray and Gianfranco Putzolu introduced the five-minute rule, a simple, storage-memory-economics-based heuristic for deciding when data should live in DRAM rather than on storage. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Marionette: Data Structure Description and Management for Heterogeneous Computing | Nuno dos Santos Fernandes, Pedro Tomás, Nuno Roma, Frank Winklmeier, Patricia Conde-Muíño | 2025-11-06 | Adapting large, object-oriented C++ codebases for hardware acceleration might be extremely challenging, particularly when targeting heterogeneous platforms such as GPUs. |
| Resolving Conflicts with Grace: Dynamically Concurrent Universality | Petr Kuznetsov, Nathan Josia Schrodt | 2025-11-06 | Synchronization is the major obstacle to scalability in distributed computing. Concurrent operations on the shared data engage in synchronization when they encounter a *conflict*, i.e. ... |
| A New Probabilistic Mobile Byzantine Failure Model for Self-Protecting Systems | Silvia Bonomi, Giovanni Farina, Roy Friedman, Eviatar B. Procaccia, Sebastien Tixeuil | 2025-11-06 | Modern distributed systems face growing security threats, as attackers continuously enhance their skills and vulnerabilities span across the entire system stack, from hardware to the application layer... |
| Scalable Domain-decomposed Monte Carlo Neutral Transport for Nuclear Fusion | Oskar Lappi, Huw Leggate, Yannick Marandet, Jan Åström, Keijo Heljanko, Dmitriy V. Borodin | 2025-11-06 | EIRENE [1] is a Monte Carlo neutral transport solver heavily used in the fusion community. EIRENE does not implement domain decomposition, making it impossible to use for simulations where the grid da... |
| Enabling Dynamic Sparsity in Quantized LLM Inference | Rongxiang Wang, Kangyuan Shu, Felix Xiaozhu Lin | 2025-11-06 | Deploying large language models (LLMs) on end-user devices is gaining importance due to benefits in responsiveness, privacy, and operational cost. |
| AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs | Pedro Antunes, Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luís Frazão, Nuno Costa, António Pereira | 2025-11-06 | The rise of Large Language Models (LLM) has increased the need for scalable, high-performance inference systems, yet most existing frameworks assume homogeneous, resource-rich hardware, often unrealis... |
| Parallel Spawning Strategies for Dynamic-Aware MPI Applications | Iker Martín-Álvarez, José I. Aliaga, Maribel Castillo | 2025-11-06 | Dynamic resource management is an increasingly important capability of High Performance Computing systems, as it enables jobs to adjust their resource allocation at runtime. |
| A Reinforced Evolution-Based Approach to Multi-Resource Load Balancing | Leszek Sliwko | 2025-11-06 | This paper presents a reinforced genetic approach to a defined d-resource system optimization problem. The classical evolution schema was ineffective due to a very strict feasibility function in the s... |
| DIAP: A Decentralized Agent Identity Protocol with Zero-Knowledge Proofs and a Hybrid P2P Stack | Yuanjie Liu, Wenpeng Xing, Ye Zhou, Gaowei Chang, Changting Lin, Meng Han | 2025-11-06 | The absence of a fully decentralized, verifiable, and privacy-preserving communication protocol for autonomous agents remains a core challenge in decentralized computing. |
| Stochastic Modeling for Energy-Efficient Edge Infrastructure | Fabio Diniz Rossi | 2025-11-06 | Edge Computing enables low-latency processing for real-time applications but introduces challenges in power management due to the distributed nature of edge devices and their limited energy resources. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Bit-Flipping Attack Exploration and Countermeasure in 5G Network | Joon Kim, Chengwei Duan, Sandip Ray | 2025-11-06 | 5G communication technology has become a vital component in a wide range of applications due to its unique advantages such as high data rate and low latency. |
| Improving dynamic congestion isolation in data-center networks | Alberto Merino, Jesus Escudero-Sahuquillo, Pedro Javier Garcia, Francisco J. Quiles | 2025-11-06 | The rise of distributed AI and large-scale applications has impacted the communication operations of data-center and Supercomputer interconnection networks, leading to dramatic incast or in-network co... |
| Age of Job Completion Minimization with Stable Queues | Stavros Mitrolaris, Subhankar Banerjee, Sennur Ulukus | 2025-11-06 | We consider a time-slotted job-assignment system with a central server, N users and a machine which changes its state according to a Markov chain (hence called a Markov machine). |
| HiFiNet: Hierarchical Fault Identification in Wireless Sensor Networks via Edge-Based Classification and Graph Aggregation | Nguyen Van Son, Nguyen Tri Nghia, Nguyen Thi Hanh, Huynh Thi Thanh Binh | 2025-11-06 | Wireless Sensor Networks (WSN) are the backbone of essential monitoring applications, but their deployment in unfavourable conditions increases the risk to data integrity and system reliability. |
| AIvailable: A Software-Defined Architecture for LLM-as-a-Service on Heterogeneous and Legacy GPUs | Pedro Antunes, Ana Rita Ortigoso, Gabriel Vieira, Daniel Fuentes, Luís Frazão, Nuno Costa, António Pereira | 2025-11-06 | The rise of Large Language Models (LLM) has increased the need for scalable, high-performance inference systems, yet most existing frameworks assume homogeneous, resource-rich hardware, often unrealis... |
| Hybrid Quantum-Classical Detection for RIS-Assisted SC-FDE via Grover Adaptive Search | Maryam Tariq, Omar Alhussein, Raneem Abdelraheem, Abdullah Quran, Georges Kaddoum, Sami Muhaidat | 2025-11-06 | Wideband and low-latency requirements in sixth-generation (6G) networks demand detectors that approach maximum-likelihood (ML) performance without incurring exponential complexity. |
| Disaggregated Architectures and the Redesign of Data Center Ecosystems: Scheduling, Pooling, and Infrastructure Trade-offs | Chao Guo, Jiahe Xu, Moshe Zukerman | 2025-11-06 | Hardware disaggregation seeks to transform Data Center (DC) resources from traditional server fleets into unified resource pools. Despite existing challenges that may hinder its full realization, sign... |
| OTS-PC: OTS-based Payment Channels for the Lightning Network | Sergio Demian Lerner, Ariel Futoransky | 2025-11-06 | We present a new type of bidirectional payment channel based on One-Time Signatures on state sequence numbers. This new construction is simpler than the Poon-Dryja construction, but provides a number ... |

cs.PF - Performance

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| An MLCommons Scientific Benchmarks Ontology | Ben Hawks, Gregor von Laszewski, Matthew D. Sinclair, Marco Colombo, Shivaram Venkataraman, Rutwik Jain, Yiwei Jiang, Nhan Tran, Geoffrey Fox | 2025-11-06 | Scientific machine learning research spans diverse domains and data modalities, yet existing benchmark efforts remain siloed and lack standardization. |
| Scalable Domain-decomposed Monte Carlo Neutral Transport for Nuclear Fusion | Oskar Lappi, Huw Leggate, Yannick Marandet, Jan Åström, Keijo Heljanko, Dmitriy V. Borodin | 2025-11-06 | EIRENE [1] is a Monte Carlo neutral transport solver heavily used in the fusion community. EIRENE does not implement domain decomposition, making it impossible to use for simulations where the grid da... |
