2025-11-19

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Toward Open-Source Chiplets for HPC and AI: Occamy and Beyond | Paul Scheffler, Thomas Benz, Tim Fischer, Lorenzo Leone, Sina Arjmandpour, Luca Benini | 2025-11-19 | Download | We present a roadmap for open-source chiplet-based RISC-V systems targeting high-performance computing and artificial intelligence, aiming to close the performance gap to proprietary designs. |
| Ten-Four: An Open-Source Fused Dot Product Unit for Mixed-Precision GPGPU Tensor Cores | Nikhil Rout, Blaise Tine | 2025-11-19 | Download | Efficient mixed-precision matrix multiply accumulate (MMA) operations are critical for accelerating deep learning workloads on GPGPUs. However, existing open-source dot product implementations for Ten... |
| Instruction-Based Coordination of Heterogeneous Processing Units for Acceleration of DNN Inference | Anastasios Petropoulos, Theodore Antonakopoulos | 2025-11-19 | Download | This paper presents an instruction-based coordination architecture for Field-Programmable Gate Array (FPGA)-based systems with multiple high-performance Processing Units (PUs) for accelerating Deep Ne... |
| A Tensor Compiler for Processing-In-Memory Architectures | Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula | 2025-11-19 | Download | Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model... |
| Hemlet: A Heterogeneous Compute-in-Memory Chiplet Architecture for Vision Transformers with Group-Level Parallelism | Cong Wang, Zexin Fu, Jiayi Huang, Shanshi Huang | 2025-11-19 | Download | Vision Transformers (ViTs) have established new performance benchmarks in vision tasks such as image recognition and object detection. However, these advancements come with significant demands for mem... |
| DARE: An Irregularity-Tolerant Matrix Processing Unit with a Densifying ISA and Filtered Runahead Execution | Xin Yang, Xin Fan, Zengshi Wang, Jun Han | 2025-11-19 | Download | Deep Neural Networks (DNNs) are widely applied across domains and have shown strong effectiveness. As DNN workloads increasingly run on CPUs, dedicated Matrix Processing Units (MPUs) and Matrix Instru... |
| GPU-Initiated Networking for NCCL | Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata | 2025-11-19 | Download | Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control. |
| A Flower-Inspired Solution for Computer Memory Wear-Leveling | Elizabeth Shen, Huiyang Zhou | 2025-11-19 | Download | Lengthening a computer memory's lifespan is important for e-waste and sustainability. Uneven wear of memory is a major barrier. The problem is becoming even more urgent as emerging memory such as phas... |
| CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations | Zhuolun Jiang, Songyue Wang, Xiaokun Pei, Tianyue Lu, Mingyu Chen | 2025-11-19 | Download | Modern data-intensive applications face memory latency challenges exacerbated by disaggregated memory systems. Recent work shows that coroutines are promising in effectively interleaving tasks and hid... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles | Yuxin Wang, Yuankai He, Weisong Shi | 2025-11-19 | Download | Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day. |
| Beluga: Block Synchronization for BFT Consensus Protocols | Tasos Kichidis, Lefteris Kokoris-Kogias, Arun Koshy, Ilya Sergey, Alberto Sonnino, Mingwei Tian, Jianting Zhang | 2025-11-19 | Download | Modern high-throughput BFT consensus protocols use streamlined push-pull mechanisms to disseminate blocks and keep happy-path performance optimal. |
| A Tensor Compiler for Processing-In-Memory Architectures | Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula | 2025-11-19 | Download | Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model... |
| Proving there is a leader without naming it | Laurent Feuilloley, Josef Erik Sedláček, Martin Slávik | 2025-11-19 | Download | Local certification is a mechanism for certifying to the nodes of a network that a certain property holds. In this framework, nodes are assigned labels, called certificates, which are supposed to prov... |
| Towards a Formal Verification of Secure Vehicle Software Updates | Martin Slind Hagen, Emil Lundqvist, Alex Phu, Yenan Wang, Kim Strandberg, Elad Michael Schiller | 2025-11-19 | Download | With the rise of software-defined vehicles (SDVs), where software governs most vehicle functions alongside enhanced connectivity, the need for secure software updates has become increasingly critical. |
| When Can You Trust Bitcoin? Value-Dependent Block Confirmation to Determine Transaction Finality | Ethan Hicks, Joseph Oglio, Mikhail Nesterenko, Gokarna Sharma | 2025-11-19 | Download | We study financial transaction confirmation finality in Bitcoin as a function of transaction amount and user risk tolerance. A transaction is recorded in a block on a blockchain. |
| Multiple Sides of 36 Coins: Measuring Peer-to-Peer Infrastructure Across Cryptocurrencies | Lucianna Kiffer, Lioba Heimbach, Dennis Trautwein, Yann Vonlanthen, Oliver Gasser | 2025-11-19 | Download | Blockchain technologies underpin an expanding ecosystem of decentralized applications, financial systems, and infrastructure. However, the fundamental networking layer that sustains these systems, the... |
| BlueBottle: Fast and Robust Blockchains through Subsystem Specialization | Preston Vander Vos, Alberto Sonnino, Giorgos Tsimos, Philipp Jovanovic, Lefteris Kokoris-Kogias | 2025-11-19 | Download | Blockchain consensus faces a trilemma of security, latency, and decentralization. High-throughput systems often require a reduction in decentralization or robustness against strong adversaries, while ... |
| Privacy-Preserving IoT in Connected Aircraft Cabin | Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar | 2025-11-19 | Download | The proliferation of IoT devices in shared, multi-vendor environments like the modern aircraft cabin creates a fundamental conflict between the promise of data collaboration and the risks to passenger... |
| GPU-Initiated Networking for NCCL | Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata | 2025-11-19 | Download | Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control. |
| GeoShield: Byzantine Fault Detection and Recovery for Geo-Distributed Real-Time Cyber-Physical Systems | Yifan Cai, Linh Thi Xuan Phan | 2025-11-19 | Download | Large-scale cyber-physical systems (CPS), such as railway control systems and smart grids, consist of geographically distributed subsystems that are connected via unreliable, asynchronous inter-region... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Inter-Satellite Link Configuration for Fast Delivery in Low-Earth-Orbit Constellations | Arman Mollakhani, Jerayu Tiamraj, Shu-Jie Cao, Dongning Guo | 2025-11-19 | Download | End-to-end latency in large low-Earth-orbit (LEO) constellations is dominated by propagation delay, making total delay roughly proportional to the network diameter, the longest shortest path in hops. |
| Privacy-Preserving IoT in Connected Aircraft Cabin | Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar | 2025-11-19 | Download | The proliferation of IoT devices in shared, multi-vendor environments like the modern aircraft cabin creates a fundamental conflict between the promise of data collaboration and the risks to passenger... |
| QADR: A Scalable, Quantum-Resistant Protocol for Anonymous Data Reporting | Nilesh Vyas, Konstantin Baier | 2025-11-19 | Download | The security of future large-scale IoT networks is critically threatened by the "Harvest Now, Decrypt Later" (HNDL) attack paradigm. Securing the massive, long-lived data streams from these systems ... |
| PLATONT: Learning a Platonic Representation for Unified Network Tomography | Chengze Du, Heng Xu, Zhiwei Yu, Bo Liu, Jialong Li | 2025-11-19 | Download | Network tomography aims to infer hidden network states, such as link performance, traffic load, and topology, from external observations. Most existing methods solve these problems separately and depe... |
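The Inter-Satellite Link Configuration abstract ties end-to-end delay to the network diameter, defined there as the longest shortest path in hops. The following is a minimal Python sketch of that definition only: it runs breadth-first search from every satellite and keeps the largest hop count. The +Grid torus topology, the `plus_grid` helper, and the plane and satellite counts are hypothetical illustrations, not the link configuration studied in the paper.

```python
from collections import deque


def bfs_hops(adj, src):
    """Return hop distances from src to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_hops(adj):
    """Longest shortest path, in hops, over all node pairs of an undirected graph."""
    best = 0
    for src in adj:
        dist = bfs_hops(adj, src)
        # A node missing from dist is unreachable, so the diameter is unbounded.
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected; diameter is infinite")
        best = max(best, max(dist.values()))
    return best


def plus_grid(planes, sats_per_plane):
    """Hypothetical +Grid torus: each satellite links to its two in-plane
    neighbours and to the same slot in the two adjacent orbital planes."""
    adj = {}
    for p in range(planes):
        for s in range(sats_per_plane):
            adj[(p, s)] = {
                (p, (s + 1) % sats_per_plane),
                (p, (s - 1) % sats_per_plane),
                ((p + 1) % planes, s),
                ((p - 1) % planes, s),
            }
    return adj


if __name__ == "__main__":
    # Hypothetical constellation: 6 planes of 10 satellites each.
    print(diameter_hops(plus_grid(planes=6, sats_per_plane=10)))  # prints 8
```

For 6 planes of 10 satellites the sketch reports 8 hops (at most 3 across planes plus 5 within a plane). All-pairs BFS costs O(V·(V+E)), which is fine for illustration but slow for constellations with thousands of satellites.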

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles | Yuxin Wang, Yuankai He, Weisong Shi | 2025-11-19 | Download | Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day. |
| A Flower-Inspired Solution for Computer Memory Wear-Leveling | Elizabeth Shen, Huiyang Zhou | 2025-11-19 | Download | Lengthening a computer memory's lifespan is important for e-waste and sustainability. Uneven wear of memory is a major barrier. The problem is becoming even more urgent as emerging memory such as phas... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| A Latency-Constrained, Gated Recurrent Unit (GRU) Implementation in the Versal AI Engine | M. Sapkas, A. Triossi, M. Zanetti | 2025-11-19 | Download | This work explores the use of the AMD Xilinx Versal Adaptable Intelligent Engine (AIE) to accelerate Gated Recurrent Unit (GRU) inference for latency constrained applications. |
| A Tensor Compiler for Processing-In-Memory Architectures | Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula | 2025-11-19 | Download | Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model... |
| Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference | Kexin Chu, Dawei Xiang, Zixu Shen, Yiwei Yang, Zecheng Liu, Wei Zhang | 2025-11-19 | Download | Mixture-of-Experts (MoE) has become a practical architecture for scaling LLM capacity while keeping per-token compute modest, but deploying MoE models on a single, memory-limited GPU remains difficult... |