2025-11-19

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Toward Open-Source Chiplets for HPC and AI: Occamy and Beyond	Paul Scheffler, Thomas Benz, Tim Fischer, Lorenzo Leone, Sina Arjmandpour, Luca Benini	2025-11-19	Download	We present a roadmap for open-source chiplet-based RISC-V systems targeting high-performance computing and artificial intelligence, aiming to close the performance gap to proprietary designs.
Ten-Four: An Open-Source Fused Dot Product Unit for Mixed-Precision GPGPU Tensor Cores	Nikhil Rout, Blaise Tine	2025-11-19	Download	Efficient mixed-precision matrix multiply accumulate (MMA) operations are critical for accelerating deep learning workloads on GPGPUs. However, existing open-source dot product implementations for Ten...
Instruction-Based Coordination of Heterogeneous Processing Units for Acceleration of DNN Inference	Anastasios Petropoulos, Theodore Antonakopoulos	2025-11-19	Download	This paper presents an instruction-based coordination architecture for Field-Programmable Gate Array (FPGA)-based systems with multiple high-performance Processing Units (PUs) for accelerating Deep Ne...
A Tensor Compiler for Processing-In-Memory Architectures	Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula	2025-11-19	Download	Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model...
Hemlet: A Heterogeneous Compute-in-Memory Chiplet Architecture for Vision Transformers with Group-Level Parallelism	Cong Wang, Zexin Fu, Jiayi Huang, Shanshi Huang	2025-11-19	Download	Vision Transformers (ViTs) have established new performance benchmarks in vision tasks such as image recognition and object detection. However, these advancements come with significant demands for mem...
DARE: An Irregularity-Tolerant Matrix Processing Unit with a Densifying ISA and Filtered Runahead Execution	Xin Yang, Xin Fan, Zengshi Wang, Jun Han	2025-11-19	Download	Deep Neural Networks (DNNs) are widely applied across domains and have shown strong effectiveness. As DNN workloads increasingly run on CPUs, dedicated Matrix Processing Units (MPUs) and Matrix Instru...
GPU-Initiated Networking for NCCL	Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata	2025-11-19	Download	Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control.
A Flower-Inspired Solution for Computer Memory Wear-Leveling	Elizabeth Shen, Huiyang Zhou	2025-11-19	Download	Lengthening a computer memory's lifespan is important for e-waste and sustainability. Uneven wear of memory is a major barrier. The problem is becoming even more urgent as emerging memory such as phas...
CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations	Zhuolun Jiang, Songyue Wang, Xiaokun Pei, Tianyue Lu, Mingyu Chen	2025-11-19	Download	Modern data-intensive applications face memory latency challenges exacerbated by disaggregated memory systems. Recent work shows that coroutines are promising in effectively interleaving tasks and hid...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles	Yuxin Wang, Yuankai He, Weisong Shi	2025-11-19	Download	Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day.
Beluga: Block Synchronization for BFT Consensus Protocols	Tasos Kichidis, Lefteris Kokoris-Kogias, Arun Koshy, Ilya Sergey, Alberto Sonnino, Mingwei Tian, Jianting Zhang	2025-11-19	Download	Modern high-throughput BFT consensus protocols use streamlined push-pull mechanisms to disseminate blocks and keep happy-path performance optimal.
A Tensor Compiler for Processing-In-Memory Architectures	Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula	2025-11-19	Download	Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model...
Proving there is a leader without naming it	Laurent Feuilloley, Josef Erik Sedláček, Martin Slávik	2025-11-19	Download	Local certification is a mechanism for certifying to the nodes of a network that a certain property holds. In this framework, nodes are assigned labels, called certificates, which are supposed to prov...
Towards a Formal Verification of Secure Vehicle Software Updates	Martin Slind Hagen, Emil Lundqvist, Alex Phu, Yenan Wang, Kim Strandberg, Elad Michael Schiller	2025-11-19	Download	With the rise of software-defined vehicles (SDVs), where software governs most vehicle functions alongside enhanced connectivity, the need for secure software updates has become increasingly critical.
When Can You Trust Bitcoin? Value-Dependent Block Confirmation to Determine Transaction Finality	Ethan Hicks, Joseph Oglio, Mikhail Nesterenko, Gokarna Sharma	2025-11-19	Download	We study financial transaction confirmation finality in Bitcoin as a function of transaction amount and user risk tolerance. A transaction is recorded in a block on a blockchain.
Multiple Sides of 36 Coins: Measuring Peer-to-Peer Infrastructure Across Cryptocurrencies	Lucianna Kiffer, Lioba Heimbach, Dennis Trautwein, Yann Vonlanthen, Oliver Gasser	2025-11-19	Download	Blockchain technologies underpin an expanding ecosystem of decentralized applications, financial systems, and infrastructure. However, the fundamental networking layer that sustains these systems, the...
BlueBottle: Fast and Robust Blockchains through Subsystem Specialization	Preston Vander Vos, Alberto Sonnino, Giorgos Tsimos, Philipp Jovanovic, Lefteris Kokoris-Kogias	2025-11-19	Download	Blockchain consensus faces a trilemma of security, latency, and decentralization. High-throughput systems often require a reduction in decentralization or robustness against strong adversaries, while ...
Privacy-Preserving IoT in Connected Aircraft Cabin	Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar	2025-11-19	Download	The proliferation of IoT devices in shared, multi-vendor environments like the modern aircraft cabin creates a fundamental conflict between the promise of data collaboration and the risks to passenger...
GPU-Initiated Networking for NCCL	Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata	2025-11-19	Download	Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control.
GeoShield: Byzantine Fault Detection and Recovery for Geo-Distributed Real-Time Cyber-Physical Systems	Yifan Cai, Linh Thi Xuan Phan	2025-11-19	Download	Large-scale cyber-physical systems (CPS), such as railway control systems and smart grids, consist of geographically distributed subsystems that are connected via unreliable, asynchronous inter-region...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Inter-Satellite Link Configuration for Fast Delivery in Low-Earth-Orbit Constellations	Arman Mollakhani, Jerayu Tiamraj, Shu-Jie Cao, Dongning Guo	2025-11-19	Download	End-to-end latency in large low-Earth-orbit (LEO) constellations is dominated by propagation delay, making total delay roughly proportional to the network diameter, the longest shortest path in hops.
Privacy-Preserving IoT in Connected Aircraft Cabin	Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar	2025-11-19	Download	The proliferation of IoT devices in shared, multi-vendor environments like the modern aircraft cabin creates a fundamental conflict between the promise of data collaboration and the risks to passenger...
QADR: A Scalable, Quantum-Resistant Protocol for Anonymous Data Reporting	Nilesh Vyas, Konstantin Baier	2025-11-19	Download	The security of future large-scale IoT networks is critically threatened by the "Harvest Now, Decrypt Later" (HNDL) attack paradigm. Securing the massive, long-lived data streams from these systems ...
PLATONT: Learning a Platonic Representation for Unified Network Tomography	Chengze Du, Heng Xu, Zhiwei Yu, Bo Liu, Jialong Li	2025-11-19	Download	Network tomography aims to infer hidden network states, such as link performance, traffic load, and topology, from external observations. Most existing methods solve these problems separately and depe...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles	Yuxin Wang, Yuankai He, Weisong Shi	2025-11-19	Download	Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day.
A Flower-Inspired Solution for Computer Memory Wear-Leveling	Elizabeth Shen, Huiyang Zhou	2025-11-19	Download	Lengthening a computer memory's lifespan is important for e-waste and sustainability. Uneven wear of memory is a major barrier. The problem is becoming even more urgent as emerging memory such as phas...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
A Latency-Constrained, Gated Recurrent Unit (GRU) Implementation in the Versal AI Engine	M. Sapkas, A. Triossi, M. Zanetti	2025-11-19	Download	This work explores the use of the AMD Xilinx Versal Adaptable Intelligent Engine (AIE) to accelerate Gated Recurrent Unit (GRU) inference for latency constrained applications.
A Tensor Compiler for Processing-In-Memory Architectures	Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula	2025-11-19	Download	Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model...
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference	Kexin Chu, Dawei Xiang, Zixu Shen, Yiwei Yang, Zecheng Liu, Wei Zhang	2025-11-19	Download	Mixture-of-Experts (MoE) has become a practical architecture for scaling LLM capacity while keeping per-token compute modest, but deploying MoE models on a single, memory-limited GPU remains difficult...