2025-04-08

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Accelerating Hybrid XOR-CNF SAT Problems Natively with In-Memory Computing	Haesol Im, Fabian Böhm, Giacomo Pedretti, Noriyuki Kushida, Moslem Noori, Elisabetta Valiante, Xiangyi Zhang, Chan-Woo Yang, Tinish Bhattacharya, Xia Sheng, Jim Ignowski, Arne Heittmann, John Paul Strachan, Masoud Mohseni, Ray Beausoleil, Thomas Van Vaerenbergh, Ignacio Rozada	2025-04-08	Download	The Boolean satisfiability (SAT) problem is a computationally challenging decision problem central to many industrial applications. For SAT problems in cryptanalysis, circuit design, and telecommunica...
CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model	Jinming Lu, Minghao She, Wendong Mao, Zhongfeng Wang	2025-04-08	Download	Fine-tuning large diffusion models for custom applications demands substantial power and time, which poses significant challenges for efficient implementation on mobile devices.
FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training	Jinming Lu, Jiayi Tian, Hai Li, Ian Young, Zheng Zhang	2025-04-08	Download	The increasing demand for on-device training of deep neural networks (DNNs) aims to leverage personal data for high-performance applications while addressing privacy concerns and reducing communicatio...
Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering	Akhil Shekar, Kevin Gaffney, Martin Prammer, Khyati Kiyawat, Lingxi Wu, Helena Caminal, Zhenxing Fan, Yimin Gao, Ashish Venkat, José F. Martínez, Jignesh Patel, Kevin Skadron	2025-04-08	Download	In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck.
Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs	Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen	2025-04-08	Download	Zero-Knowledge Proofs (ZKPs) are rapidly gaining importance in privacy-preserving and verifiable computing. ZKPs enable a proving party to prove the truth of a statement to a verifying party without r...
SpikeStream: Accelerating Spiking Neural Network Inference on RISC-V Clusters with Sparse Computation Extensions	Simone Manoni, Paul Scheffler, Luca Zanatta, Andrea Acquaviva, Luca Benini, Andrea Bartolini	2025-04-08	Download	Spiking Neural Network (SNN) inference has a clear potential for high energy efficiency as computation is triggered by events. However, the inherent sparsity of events poses challenges for conventiona...
CVA6-VMRT: A Modular Approach Towards Time-Predictable Virtual Memory in a 64-bit Application Class RISC-V Processor	Christopher Reinwardt, Robert Balas, Alessandro Ottaviano, Angelo Garofalo, Luca Benini	2025-04-08	Download	The increasing complexity of autonomous systems has driven a shift to integrated heterogeneous SoCs with real-time and safety demands. Ensuring deterministic WCETs and low-latency for critical tasks r...
BoolE: Exact Symbolic Reasoning via Boolean Equality Saturation	Jiaqi Yin, Zhan Song, Chen Chen, Qihao Hu, Cunxi Yu	2025-04-08	Download	Boolean symbolic reasoning for gate-level netlists is a critical step in verification, logic and datapath synthesis, and hardware security. Specifically, reasoning datapath and adder tree in bit-blast...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Federated Neural Architecture Search with Model-Agnostic Meta Learning	Xinyuan Huang, Jiechao Gao	2025-04-08	Download	Federated Learning (FL) often struggles with data heterogeneity due to the naturally uneven distribution of user data across devices. Federated Neural Architecture Search (NAS) enables collaborative s...
Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication	Thomas McFarland, Julian Bellavita, Giulia Guidi	2025-04-08	Download	Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics.
Fixing Non-blocking Data Structures for Better Compatibility with Memory Reclamation Schemes	Md Amit Hasan Arovi, Ruslan Nikolaev	2025-04-08	Download	We present a new technique, Safe Concurrent Optimistic Traversals (SCOT), to address a well-known problem related to optimistic traversals with classical and more recent safe memory reclamation (SMR) ...
Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training	Daiyaan Arfeen, Dheevatsa Mudigere, Ankit More, Bhargava Gopireddy, Ahmet Inci, Gregory R. Ganger	2025-04-08	Download	LLM training is scaled up to 10Ks of GPUs by a mix of data-(DP) and model-parallel (MP) execution. Critical to achieving efficiency is tensor-parallel (TP; a form of MP) execution within tightly-coupl...
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference	Shuzhang Zhong, Yanfan Sun, Ling Liang, Runsheng Wang, Ru Huang, Meng Li	2025-04-08	Download	The Mixture of Experts (MoE) architecture has demonstrated significant advantages as it enables to increase the model capacity without a proportional increase in computation.
TAGC: Optimizing Gradient Communication in Distributed Transformer Training	Igor Polyakov, Alexey Dukhanov, Egor Spirin	2025-04-08	Download	The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training.
DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs	Amir Fakhim Babaei, Thidapat Chantem	2025-04-08	Download	The widespread use of Deep Neural Networks (DNNs) is limited by high computational demands, especially in constrained environments. GPUs, though effective accelerators, often face underutilization and...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Spectrum Sharing by Space-Time Waveform Shaping	Hatef Nouri, George Sklivanitis, Dimitris A. Pados, Elizabeth Serena Bentley	2025-04-08	Download	In this paper, we consider the task of introducing a new wireless data link over a given occupied frequency band using a multi-antenna transmitter and receiver.
Scalable Routing in a City-Scale Wi-Fi Network for Disaster Recovery	Ziqian Liu, Om Chabra, James Lynch, Chenning Li, Manya Ghobadi, Hari Balakrishnan	2025-04-08	Download	In this paper, we present a new city-scale decentralized mesh network system suited for disaster recovery and emergencies. When wide-area connectivity is unavailable or significantly degraded, our sys...
A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response	Mark Timmons, Daniel Lukaszewski, Geoffrey Xie	2025-04-08	Download	Recent cyber incidents and the push for zero trust security underscore the necessity of monitoring host-level events. However, current host-level intrusion detection systems (IDS) lack the ability to ...
Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning	Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang	2025-04-08	Download	Beamforming techniques are considered as essential parts to compensate severe path losses in millimeter-wave (mmWave) communications. In particular, these techniques adopt large antenna arrays and for...
Sherlock: A Dataset for Process-aware Intrusion Detection Research on Power Grid Networks	Eric Wagner, Lennart Bader, Konrad Wolsing, Martin Serror	2025-04-08	Download	Physically distributed components and legacy protocols make the protection of power grids against increasing cyberattack threats challenging. Infamously, the 2015 and 2016 blackouts in Ukraine were ca...
Context-aware Rate Adaptation for Predictive Flying Networks using Contextual Bandits	Ruben Queiros, Megumi Kaneko, Helder Fontes, Rui Campos	2025-04-08	Download	The increasing complexity of wireless technologies, such as Wi-Fi, presents significant challenges for Rate Adaptation (RA) due to the large configuration space of transmission parameters.
Negotiating strict latency limits for dynamic real-time services in vehicular time-sensitive networks	Timo Salomon, Lisa Maile, Philipp Meyer, Franz Korf, Thomas C. Schmidt	2025-04-08	Download	Future vehicles are expected to dynamically deploy in-vehicle applications within a Service-Oriented Architecture (SOA) while critical services continue to operate under hard real-time constraints.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Zerrow: True Zero-Copy Arrow Pipelines in Bauplan	Yifan Dai, Jacopo Tagliabue, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tyler R. Caraza-Harter	2025-04-08	Download	Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
cuTeSpMM: Accelerating Sparse-Dense Matrix Multiplication using GPU Tensor Cores	Lizhi Xiang, Omid Asudeh, Gerald Sabin, Aravind Sukumaran-Rajam, P. Sadayappan	2025-04-08	Download	Many recent GPUs feature matrix multiplication engines (aka Tensor Core Units or TCUs) that perform small fixed-size matrix-matrix products at very high throughput.