2025-11-19
cs.AR - Hardware Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Toward Open-Source Chiplets for HPC and AI: Occamy and Beyond | Paul Scheffler, Thomas Benz, Tim Fischer, Lorenzo Leone, Sina Arjmandpour, Luca Benini | 2025-11-19 | Download | We present a roadmap for open-source chiplet-based RISC-V systems targeting high-performance computing and artificial intelligence, aiming to close the performance gap to proprietary designs. |
| Ten-Four: An Open-Source Fused Dot Product Unit for Mixed-Precision GPGPU Tensor Cores | Nikhil Rout, Blaise Tine | 2025-11-19 | Download | Efficient mixed-precision matrix multiply accumulate (MMA) operations are critical for accelerating deep learning workloads on GPGPUs. However, existing open-source dot product implementations for Ten... |
| Instruction-Based Coordination of Heterogeneous Processing Units for Acceleration of DNN Inference | Anastasios Petropoulos, Theodore Antonakopoulos | 2025-11-19 | Download | This paper presents an instruction-based coordination architecture for Field-Programmable Gate Array (FPGA)-based systems with multiple high-performance Processing Units (PUs) for accelerating Deep Ne... |
| A Tensor Compiler for Processing-In-Memory Architectures | Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula | 2025-11-19 | Download | Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model... |
| Hemlet: A Heterogeneous Compute-in-Memory Chiplet Architecture for Vision Transformers with Group-Level Parallelism | Cong Wang, Zexin Fu, Jiayi Huang, Shanshi Huang | 2025-11-19 | Download | Vision Transformers (ViTs) have established new performance benchmarks in vision tasks such as image recognition and object detection. However, these advancements come with significant demands for mem... |
| DARE: An Irregularity-Tolerant Matrix Processing Unit with a Densifying ISA and Filtered Runahead Execution | Xin Yang, Xin Fan, Zengshi Wang, Jun Han | 2025-11-19 | Download | Deep Neural Networks (DNNs) are widely applied across domains and have shown strong effectiveness. As DNN workloads increasingly run on CPUs, dedicated Matrix Processing Units (MPUs) and Matrix Instru... |
| GPU-Initiated Networking for NCCL | Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata | 2025-11-19 | Download | Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control. |
| A Flower-Inspired Solution for Computer Memory Wear-Leveling | Elizabeth Shen, Huiyang Zhou | 2025-11-19 | Download | Lengthening a computer memory's lifespan is important for e-waste and sustainability. Uneven wear of memory is a major barrier. The problem is becoming even more urgent as emerging memory such as phas... |
| CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations | Zhuolun Jiang, Songyue Wang, Xiaokun Pei, Tianyue Lu, Mingyu Chen | 2025-11-19 | Download | Modern data-intensive applications face memory latency challenges exacerbated by disaggregated memory systems. Recent work shows that coroutines are promising in effectively interleaving tasks and hid... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles | Yuxin Wang, Yuankai He, Weisong Shi | 2025-11-19 | Download | Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day. |
| Beluga: Block Synchronization for BFT Consensus Protocols | Tasos Kichidis, Lefteris Kokoris-Kogias, Arun Koshy, Ilya Sergey, Alberto Sonnino, Mingwei Tian, Jianting Zhang | 2025-11-19 | Download | Modern high-throughput BFT consensus protocols use streamlined push-pull mechanisms to disseminate blocks and keep happy-path performance optimal. |
| A Tensor Compiler for Processing-In-Memory Architectures | Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula | 2025-11-19 | Download | Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model... |
| Proving there is a leader without naming it | Laurent Feuilloley, Josef Erik Sedláček, Martin Slávik | 2025-11-19 | Download | Local certification is a mechanism for certifying to the nodes of a network that a certain property holds. In this framework, nodes are assigned labels, called certificates, which are supposed to prov... |
| Towards a Formal Verification of Secure Vehicle Software Updates | Martin Slind Hagen, Emil Lundqvist, Alex Phu, Yenan Wang, Kim Strandberg, Elad Michael Schiller | 2025-11-19 | Download | With the rise of software-defined vehicles (SDVs), where software governs most vehicle functions alongside enhanced connectivity, the need for secure software updates has become increasingly critical. |
| When Can You Trust Bitcoin? Value-Dependent Block Confirmation to Determine Transaction Finality | Ethan Hicks, Joseph Oglio, Mikhail Nesterenko, Gokarna Sharma | 2025-11-19 | Download | We study financial transaction confirmation finality in Bitcoin as a function of transaction amount and user risk tolerance. A transaction is recorded in a block on a blockchain. |
| Multiple Sides of 36 Coins: Measuring Peer-to-Peer Infrastructure Across Cryptocurrencies | Lucianna Kiffer, Lioba Heimbach, Dennis Trautwein, Yann Vonlanthen, Oliver Gasser | 2025-11-19 | Download | Blockchain technologies underpin an expanding ecosystem of decentralized applications, financial systems, and infrastructure. However, the fundamental networking layer that sustains these systems, the... |
| BlueBottle: Fast and Robust Blockchains through Subsystem Specialization | Preston Vander Vos, Alberto Sonnino, Giorgos Tsimos, Philipp Jovanovic, Lefteris Kokoris-Kogias | 2025-11-19 | Download | Blockchain consensus faces a trilemma of security, latency, and decentralization. High-throughput systems often require a reduction in decentralization or robustness against strong adversaries, while ... |
| Privacy-Preserving IoT in Connected Aircraft Cabin | Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar | 2025-11-19 | Download | The proliferation of IoT devices in shared, multi-vendor environments like the modern aircraft cabin creates a fundamental conflict between the promise of data collaboration and the risks to passenger... |
| GPU-Initiated Networking for NCCL | Khaled Hamidouche, John Bachan, Pak Markthub, Peter-Jan Gootzen, Elena Agostini, Sylvain Jeaugey, Aamir Shafi, Georgios Theodorakis, Manjunath Gorentla Venkata | 2025-11-19 | Download | Modern AI workloads, especially Mixture-of-Experts (MoE) architectures, increasingly demand low-latency, fine-grained GPU-to-GPU communication with device-side control. |
| GeoShield: Byzantine Fault Detection and Recovery for Geo-Distributed Real-Time Cyber-Physical Systems | Yifan Cai, Linh Thi Xuan Phan | 2025-11-19 | Download | Large-scale cyber-physical systems (CPS), such as railway control systems and smart grids, consist of geographically distributed subsystems that are connected via unreliable, asynchronous inter-region... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Inter-Satellite Link Configuration for Fast Delivery in Low-Earth-Orbit Constellations | Arman Mollakhani, Jerayu Tiamraj, Shu-Jie Cao, Dongning Guo | 2025-11-19 | Download | End-to-end latency in large low-Earth-orbit (LEO) constellations is dominated by propagation delay, making total delay roughly proportional to the network diameter, the longest shortest path in hops. |
| Privacy-Preserving IoT in Connected Aircraft Cabin | Nilesh Vyas, Benjamin Zhao, Aygün Baltaci, Gustavo de Carvalho Bertoli, Hassan Asghar, Markus Klügel, Gerrit Schramm, Martin Kubisch, Dali Kaafar | 2025-11-19 | Download | The proliferation of IoT devices in shared, multi-vendor environments like the modern aircraft cabin creates a fundamental conflict between the promise of data collaboration and the risks to passenger... |
| QADR: A Scalable, Quantum-Resistant Protocol for Anonymous Data Reporting | Nilesh Vyas, Konstantin Baier | 2025-11-19 | Download | The security of future large-scale IoT networks is critically threatened by the "Harvest Now, Decrypt Later" (HNDL) attack paradigm. Securing the massive, long-lived data streams from these systems ... |
| PLATONT: Learning a Platonic Representation for Unified Network Tomography | Chengze Du, Heng Xu, Zhiwei Yu, Bo Liu, Jialong Li | 2025-11-19 | Download | Network tomography aims to infer hidden network states, such as link performance, traffic load, and topology, from external observations. Most existing methods solve these problems separately and depe... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles | Yuxin Wang, Yuankai He, Weisong Shi | 2025-11-19 | Download | Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day. |
| A Flower-Inspired Solution for Computer Memory Wear-Leveling | Elizabeth Shen, Huiyang Zhou | 2025-11-19 | Download | Lengthening a computer memory's lifespan is important for e-waste and sustainability. Uneven wear of memory is a major barrier. The problem is becoming even more urgent as emerging memory such as phas... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A Latency-Constrained, Gated Recurrent Unit (GRU) Implementation in the Versal AI Engine | M. Sapkas, A. Triossi, M. Zanetti | 2025-11-19 | Download | This work explores the use of the AMD Xilinx Versal Adaptable Intelligent Engine (AIE) to accelerate Gated Recurrent Unit (GRU) inference for latency constrained applications. |
| A Tensor Compiler for Processing-In-Memory Architectures | Peiming Yang, Sankeerth Durvasula, Ivan Fernandez, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko, Christina Giannoula | 2025-11-19 | Download | Processing-In-Memory (PIM) devices integrated with high-performance Host processors (e.g., GPUs) can accelerate memory-intensive kernels in Machine Learning (ML) models, including Large Language Model... |
| Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference | Kexin Chu, Dawei Xiang, Zixu Shen, Yiwei Yang, Zecheng Liu, Wei Zhang | 2025-11-19 | Download | Mixture-of-Experts (MoE) has become a practical architecture for scaling LLM capacity while keeping per-token compute modest, but deploying MoE models on a single, memory-limited GPU remains difficult... |