2025-04-08

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Accelerating Hybrid XOR-CNF SAT Problems Natively with In-Memory Computing | Haesol Im, Fabian Böhm, Giacomo Pedretti, Noriyuki Kushida, Moslem Noori, Elisabetta Valiante, Xiangyi Zhang, Chan-Woo Yang, Tinish Bhattacharya, Xia Sheng, Jim Ignowski, Arne Heittmann, John Paul Strachan, Masoud Mohseni, Ray Beausoleil, Thomas Van Vaerenbergh, Ignacio Rozada | 2025-04-08 | Download | The Boolean satisfiability (SAT) problem is a computationally challenging decision problem central to many industrial applications. For SAT problems in cryptanalysis, circuit design, and telecommunica... |
| CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model | Jinming Lu, Minghao She, Wendong Mao, Zhongfeng Wang | 2025-04-08 | Download | Fine-tuning large diffusion models for custom applications demands substantial power and time, which poses significant challenges for efficient implementation on mobile devices. |
| FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training | Jinming Lu, Jiayi Tian, Hai Li, Ian Young, Zheng Zhang | 2025-04-08 | Download | The increasing demand for on-device training of deep neural networks (DNNs) aims to leverage personal data for high-performance applications while addressing privacy concerns and reducing communicatio... |
| Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering | Akhil Shekar, Kevin Gaffney, Martin Prammer, Khyati Kiyawat, Lingxi Wu, Helena Caminal, Zhenxing Fan, Yimin Gao, Ashish Venkat, José F. Martínez, Jignesh Patel, Kevin Skadron | 2025-04-08 | Download | In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck. |
| Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs | Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen | 2025-04-08 | Download | Zero-Knowledge Proofs (ZKPs) are rapidly gaining importance in privacy-preserving and verifiable computing. ZKPs enable a proving party to prove the truth of a statement to a verifying party without r... |
| SpikeStream: Accelerating Spiking Neural Network Inference on RISC-V Clusters with Sparse Computation Extensions | Simone Manoni, Paul Scheffler, Luca Zanatta, Andrea Acquaviva, Luca Benini, Andrea Bartolini | 2025-04-08 | Download | Spiking Neural Network (SNN) inference has a clear potential for high energy efficiency as computation is triggered by events. However, the inherent sparsity of events poses challenges for conventiona... |
| CVA6-VMRT: A Modular Approach Towards Time-Predictable Virtual Memory in a 64-bit Application Class RISC-V Processor | Christopher Reinwardt, Robert Balas, Alessandro Ottaviano, Angelo Garofalo, Luca Benini | 2025-04-08 | Download | The increasing complexity of autonomous systems has driven a shift to integrated heterogeneous SoCs with real-time and safety demands. Ensuring deterministic WCETs and low-latency for critical tasks r... |
| BoolE: Exact Symbolic Reasoning via Boolean Equality Saturation | Jiaqi Yin, Zhan Song, Chen Chen, Qihao Hu, Cunxi Yu | 2025-04-08 | Download | Boolean symbolic reasoning for gate-level netlists is a critical step in verification, logic and datapath synthesis, and hardware security. Specifically, reasoning datapath and adder tree in bit-blast... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Federated Neural Architecture Search with Model-Agnostic Meta Learning | Xinyuan Huang, Jiechao Gao | 2025-04-08 | Download | Federated Learning (FL) often struggles with data heterogeneity due to the naturally uneven distribution of user data across devices. Federated Neural Architecture Search (NAS) enables collaborative s... |
| Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication | Thomas McFarland, Julian Bellavita, Giulia Guidi | 2025-04-08 | Download | Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics. |
| Fixing Non-blocking Data Structures for Better Compatibility with Memory Reclamation Schemes | Md Amit Hasan Arovi, Ruslan Nikolaev | 2025-04-08 | Download | We present a new technique, Safe Concurrent Optimistic Traversals (SCOT), to address a well-known problem related to optimistic traversals with classical and more recent safe memory reclamation (SMR) ... |
| Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training | Daiyaan Arfeen, Dheevatsa Mudigere, Ankit More, Bhargava Gopireddy, Ahmet Inci, Gregory R. Ganger | 2025-04-08 | Download | LLM training is scaled up to 10Ks of GPUs by a mix of data-(DP) and model-parallel (MP) execution. Critical to achieving efficiency is tensor-parallel (TP; a form of MP) execution within tightly-coupl... |
| HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Shuzhang Zhong, Yanfan Sun, Ling Liang, Runsheng Wang, Ru Huang, Meng Li | 2025-04-08 | Download | The Mixture of Experts (MoE) architecture has demonstrated significant advantages as it enables to increase the model capacity without a proportional increase in computation. |
| TAGC: Optimizing Gradient Communication in Distributed Transformer Training | Igor Polyakov, Alexey Dukhanov, Egor Spirin | 2025-04-08 | Download | The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training. |
| DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs | Amir Fakhim Babaei, Thidapat Chantem | 2025-04-08 | Download | The widespread use of Deep Neural Networks (DNNs) is limited by high computational demands, especially in constrained environments. GPUs, though effective accelerators, often face underutilization and... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Spectrum Sharing by Space-Time Waveform Shaping | Hatef Nouri, George Sklivanitis, Dimitris A. Pados, Elizabeth Serena Bentley | 2025-04-08 | Download | In this paper, we consider the task of introducing a new wireless data link over a given occupied frequency band using a multi-antenna transmitter and receiver. |
| Scalable Routing in a City-Scale Wi-Fi Network for Disaster Recovery | Ziqian Liu, Om Chabra, James Lynch, Chenning Li, Manya Ghobadi, Hari Balakrishnan | 2025-04-08 | Download | In this paper, we present a new city-scale decentralized mesh network system suited for disaster recovery and emergencies. When wide-area connectivity is unavailable or significantly degraded, our sys... |
| A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response | Mark Timmons, Daniel Lukaszewski, Geoffrey Xie | 2025-04-08 | Download | Recent cyber incidents and the push for zero trust security underscore the necessity of monitoring host-level events. However, current host-level intrusion detection systems (IDS) lack the ability to ... |
| Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning | Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang | 2025-04-08 | Download | Beamforming techniques are considered as essential parts to compensate severe path losses in millimeter-wave (mmWave) communications. In particular, these techniques adopt large antenna arrays and for... |
| Sherlock: A Dataset for Process-aware Intrusion Detection Research on Power Grid Networks | Eric Wagner, Lennart Bader, Konrad Wolsing, Martin Serror | 2025-04-08 | Download | Physically distributed components and legacy protocols make the protection of power grids against increasing cyberattack threats challenging. Infamously, the 2015 and 2016 blackouts in Ukraine were ca... |
| Context-aware Rate Adaptation for Predictive Flying Networks using Contextual Bandits | Ruben Queiros, Megumi Kaneko, Helder Fontes, Rui Campos | 2025-04-08 | Download | The increasing complexity of wireless technologies, such as Wi-Fi, presents significant challenges for Rate Adaptation (RA) due to the large configuration space of transmission parameters. |
| Negotiating strict latency limits for dynamic real-time services in vehicular time-sensitive networks | Timo Salomon, Lisa Maile, Philipp Meyer, Franz Korf, Thomas C. Schmidt | 2025-04-08 | Download | Future vehicles are expected to dynamically deploy in-vehicle applications within a Service-Oriented Architecture (SOA) while critical services continue to operate under hard real-time constraints. |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Zerrow: True Zero-Copy Arrow Pipelines in Bauplan | Yifan Dai, Jacopo Tagliabue, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tyler R. Caraza-Harter | 2025-04-08 | Download | Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| cuTeSpMM: Accelerating Sparse-Dense Matrix Multiplication using GPU Tensor Cores | Lizhi Xiang, Omid Asudeh, Gerald Sabin, Aravind Sukumaran-Rajam, P. Sadayappan | 2025-04-08 | Download | Many recent GPUs feature matrix multiplication engines (aka Tensor Core Units or TCUs) that perform small fixed-size matrix-matrix products at very high throughput. |