2025-04-08
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Accelerating Hybrid XORCNF SAT Problems Natively with In-Memory Computing | Haesol Im, Fabian Böhm, Giacomo Pedretti, Noriyuki Kushida, Moslem Noori, Elisabetta Valiante, Xiangyi Zhang, Chan-Woo Yang, Tinish Bhattacharya, Xia Sheng, Jim Ignowski, Arne Heittmann, John Paul Strachan, Masoud Mohseni, Ray Beausoleil, Thomas Van Vaerenbergh, Ignacio Rozada | 2025-04-08 | Download | The Boolean satisfiability (SAT) problem is a computationally challenging decision problem central to many industrial applications. For SAT problems in cryptanalysis, circuit design, and telecommunica... |
| CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model | Jinming Lu, Minghao She, Wendong Mao, Zhongfeng Wang | 2025-04-08 | Download | Fine-tuning large diffusion models for custom applications demands substantial power and time, which poses significant challenges for efficient implementation on mobile devices. |
| FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training | Jinming Lu, Jiayi Tian, Hai Li, Ian Young, Zheng Zhang | 2025-04-08 | Download | The increasing demand for on-device training of deep neural networks (DNNs) aims to leverage personal data for high-performance applications while addressing privacy concerns and reducing communicatio... |
| Membrane: Accelerating Database Analytics with Bank-Level DRAM-PIM Filtering | Akhil Shekar, Kevin Gaffney, Martin Prammer, Khyati Kiyawat, Lingxi Wu, Helena Caminal, Zhenxing Fan, Yimin Gao, Ashish Venkat, José F. Martínez, Jignesh Patel, Kevin Skadron | 2025-04-08 | Download | In-memory database query processing frequently involves substantial data transfers between the CPU and memory, leading to inefficiencies due to Von Neumann bottleneck. |
| Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs | Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen | 2025-04-08 | Download | Zero-Knowledge Proofs (ZKPs) are rapidly gaining importance in privacy-preserving and verifiable computing. ZKPs enable a proving party to prove the truth of a statement to a verifying party without r... |
| SpikeStream: Accelerating Spiking Neural Network Inference on RISC-V Clusters with Sparse Computation Extensions | Simone Manoni, Paul Scheffler, Luca Zanatta, Andrea Acquaviva, Luca Benini, Andrea Bartolini | 2025-04-08 | Download | Spiking Neural Network (SNN) inference has a clear potential for high energy efficiency as computation is triggered by events. However, the inherent sparsity of events poses challenges for conventiona... |
| CVA6-VMRT: A Modular Approach Towards Time-Predictable Virtual Memory in a 64-bit Application Class RISC-V Processor | Christopher Reinwardt, Robert Balas, Alessandro Ottaviano, Angelo Garofalo, Luca Benini | 2025-04-08 | Download | The increasing complexity of autonomous systems has driven a shift to integrated heterogeneous SoCs with real-time and safety demands. Ensuring deterministic WCETs and low-latency for critical tasks r... |
| BoolE: Exact Symbolic Reasoning via Boolean Equality Saturation | Jiaqi Yin, Zhan Song, Chen Chen, Qihao Hu, Cunxi Yu | 2025-04-08 | Download | Boolean symbolic reasoning for gate-level netlists is a critical step in verification, logic and datapath synthesis, and hardware security. Specifically, reasoning datapath and adder tree in bit-blast... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Federated Neural Architecture Search with Model-Agnostic Meta Learning | Xinyuan Huang, Jiechao Gao | 2025-04-08 | Download | Federated Learning (FL) often struggles with data heterogeneity due to the naturally uneven distribution of user data across devices. Federated Neural Architecture Search (NAS) enables collaborative s... |
| Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication | Thomas McFarland, Julian Bellavita, Giulia Guidi | 2025-04-08 | Download | Sparse General Matrix Multiply (SpGEMM) is key for various High-Performance Computing (HPC) applications such as genomics and graph analytics. |
| Fixing Non-blocking Data Structures for Better Compatibility with Memory Reclamation Schemes | Md Amit Hasan Arovi, Ruslan Nikolaev | 2025-04-08 | Download | We present a new technique, Safe Concurrent Optimistic Traversals (SCOT), to address a well-known problem related to optimistic traversals with classical and more recent safe memory reclamation (SMR) ... |
| Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training | Daiyaan Arfeen, Dheevatsa Mudigere, Ankit More, Bhargava Gopireddy, Ahmet Inci, Gregory R. Ganger | 2025-04-08 | Download | LLM training is scaled up to 10Ks of GPUs by a mix of data-(DP) and model-parallel (MP) execution. Critical to achieving efficiency is tensor-parallel (TP; a form of MP) execution within tightly-coupl... |
| HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Shuzhang Zhong, Yanfan Sun, Ling Liang, Runsheng Wang, Ru Huang, Meng Li | 2025-04-08 | Download | The Mixture of Experts (MoE) architecture has demonstrated significant advantages as it enables to increase the model capacity without a proportional increase in computation. |
| TAGC: Optimizing Gradient Communication in Distributed Transformer Training | Igor Polyakov, Alexey Dukhanov, Egor Spirin | 2025-04-08 | Download | The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training. |
| DARIS: An Oversubscribed Spatio-Temporal Scheduler for Real-Time DNN Inference on GPUs | Amir Fakhim Babaei, Thidapat Chantem | 2025-04-08 | Download | The widespread use of Deep Neural Networks (DNNs) is limited by high computational demands, especially in constrained environments. GPUs, though effective accelerators, often face underutilization and... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Spectrum Sharing by Space-Time Waveform Shaping | Hatef Nouri, George Sklivanitis, Dimitris A. Pados, Elizabeth Serena Bentley | 2025-04-08 | Download | In this paper, we consider the task of introducing a new wireless data link over a given occupied frequency band using a multi-antenna transmitter and receiver. |
| Scalable Routing in a City-Scale Wi-Fi Network for Disaster Recovery | Ziqian Liu, Om Chabra, James Lynch, Chenning Li, Manya Ghobadi, Hari Balakrishnan | 2025-04-08 | Download | In this paper, we present a new city-scale decentralized mesh network system suited for disaster recovery and emergencies. When wide-area connectivity is unavailable or significantly degraded, our sys... |
| A Case for Network-wide Orchestration of Host-based Intrusion Detection and Response | Mark Timmons, Daniel Lukaszewski, Geoffrey Xie | 2025-04-08 | Download | Recent cyber incidents and the push for zero trust security underscore the necessity of monitoring host-level events. However, current host-level intrusion detection systems (IDS) lack the ability to ... |
| Multi-Modality Sensing in mmWave Beamforming for Connected Vehicles Using Deep Learning | Muhammad Baqer Mollah, Honggang Wang, Mohammad Ataul Karim, Hua Fang | 2025-04-08 | Download | Beamforming techniques are considered as essential parts to compensate severe path losses in millimeter-wave (mmWave) communications. In particular, these techniques adopt large antenna arrays and for... |
| Sherlock: A Dataset for Process-aware Intrusion Detection Research on Power Grid Networks | Eric Wagner, Lennart Bader, Konrad Wolsing, Martin Serror | 2025-04-08 | Download | Physically distributed components and legacy protocols make the protection of power grids against increasing cyberattack threats challenging. Infamously, the 2015 and 2016 blackouts in Ukraine were ca... |
| Context-aware Rate Adaptation for Predictive Flying Networks using Contextual Bandits | Ruben Queiros, Megumi Kaneko, Helder Fontes, Rui Campos | 2025-04-08 | Download | The increasing complexity of wireless technologies, such as Wi-Fi, presents significant challenges for Rate Adaptation (RA) due to the large configuration space of transmission parameters. |
| Negotiating strict latency limits for dynamic real-time services in vehicular time-sensitive networks | Timo Salomon, Lisa Maile, Philipp Meyer, Franz Korf, Thomas C. Schmidt | 2025-04-08 | Download | Future vehicles are expected to dynamically deploy in-vehicle applications within a Service-Oriented Architecture (SOA) while critical services continue to operate under hard real-time constraints. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Zerrow: True Zero-Copy Arrow Pipelines in Bauplan | Yifan Dai, Jacopo Tagliabue, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, Tyler R. Caraza-Harter | 2025-04-08 | Download | Bauplan is a FaaS-based lakehouse specifically built for data pipelines: its execution engine uses Apache Arrow for data passing between the nodes in the DAG. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| cuTeSpMM: Accelerating Sparse-Dense Matrix Multiplication using GPU Tensor Cores | Lizhi Xiang, Omid Asudeh, Gerald Sabin, Aravind Sukumaran-Rajam, P. Sadayappan | 2025-04-08 | Download | Many recent GPUs feature matrix multiplication engines (aka Tensor Core Units or TCUs) that perform small fixed-size matrix-matrix products at very high throughput. |