2025-11-10

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing | Yuzhe Fu, Changchun Zhou, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo, Hai "Helen" Li, Yiran Chen | 2025-11-10 | Three-dimensional (3D) point clouds are increasingly used in applications such as autonomous driving, robotics, and virtual reality (VR). Point-based neural networks (PNNs) have demonstrated strong pe... |
| ZeroSim: Zero-Shot Analog Circuit Evaluation with Unified Transformer Embeddings | Xiaomeng Yang, Jian Gao, Yanzhi Wang, Xuan Zhang | 2025-11-10 | Although recent advancements in learning-based analog circuit design automation have tackled tasks such as topology generation, device sizing, and layout synthesis, efficient performance evaluation re... |
| Decoupled Control Flow and Data Access in RISC-V GPGPUs | Giuseppe M. Sarda, Nimish Shah, Abubakr Nada, Debjyoti Bhattacharjee, Marian Verhelst | 2025-11-10 | Vortex, a newly proposed open-source GPGPU platform based on the RISC-V ISA, offers a valid alternative for GPGPU research over the broadly-used modeling platforms based on commercial GPUs. |
| FPGA-Accelerated RISC-V ISA Extensions for Efficient Neural Network Inference on Edge Devices | Arya Parameshwara, Santosh Hanamappa Mokashi | 2025-11-10 | Edge AI deployment faces critical challenges balancing computational performance, energy efficiency, and resource constraints. This paper presents FPGA-accelerated RISC-V instruction set architecture ... |
| Optimizing GEMM for Energy and Performance on Versal ACAP Architectures | Ilias Papalamprou, Dimosthenis Masouros, Ioannis Loudaros, Francky Catthoor, Dimitrios Soudris | 2025-11-10 | General Matrix Multiplication (GEMM) is a fundamental operation in many scientific workloads, signal processing, and particularly deep learning. |
| P3-LLM: An Integrated NPU-PIM Accelerator for LLM Inference Using Hybrid Numerical Formats | Yuzong Chen, Chao Fang, Xilai Dai, Yuheng Wu, Thierry Tambe, Marian Verhelst, Mohamed S. Abdelfattah | 2025-11-10 | The substantial memory bandwidth and computational demands of large language models (LLMs) present critical challenges for efficient inference. |
| ASTER: Attention-based Spiking Transformer Engine for Event-driven Reasoning | Tamoghno Das, Khanh Phan Vu, Hanning Chen, Hyunwoo Oh, Mohsen Imani | 2025-11-10 | The integration of spiking neural networks (SNNs) with transformer-based architectures has opened new opportunities for bio-inspired low-power, event-driven visual reasoning on edge devices. |
| Reconfigurable Quantum Instruction Set Computers for High Performance Attainable on Hardware | Zhaohui Yang, Dawei Ding, Qi Ye, Cupjin Huang, Jianxin Chen, Yuan Xie | 2025-11-10 | The performance of current quantum hardware is severely limited. While expanding the quantum ISA with high-fidelity, expressive basis gates is a key path forward, it imposes significant gate calibrati... |
| Preemption-Enhanced Benchmark Suite for FPGAs | Arsalan Ali Malik, John Buchanan, Aydin Aysu | 2025-11-10 | Field-Programmable Gate Arrays (FPGAs) have become essential in cloud computing due to their reconfigurability, energy efficiency, and ability to accelerate domain-specific workloads. |
| Hardware-Aware Neural Network Compilation with Learned Optimization: A RISC-V Accelerator Approach | Ravindra Ganti, Steve Xu | 2025-11-10 | We present XgenSilicon ML Compiler, a fully automated end-to-end compilation framework that transforms high-level machine learning models into optimized RISC-V assembly code for custom ASIC accelerato... |
| EONSim: An NPU Simulator for On-Chip Memory and Embedding Vector Operations | Sangun Choi, Yunho Oh | 2025-11-10 | Embedding vector operations are a key component of modern deep neural network workloads. Unlike matrix operations with deterministic access patterns, embedding vector operations exhibit input data-dep... |
| DMA Collectives for Efficient ML Communication Offloads | Suchita Pati, Mahzabeen Islam, Shaizeen Aga, Mohamed Assem Ibrahim | 2025-11-10 | Offloading machine learning (ML) communication collectives to direct memory access (DMA) engines has emerged as an interesting and low-cost solution to efficiently overlap computation and communicatio... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| SemanticForge: Repository-Level Code Generation through Semantic Knowledge Graphs and Constraint Satisfaction | Wuyang Zhang, Chenkai Zhang, Zhen Luo, Jianming Ma, Wangming Yuan, Chuqiao Gu, Chenwei Feng | 2025-11-10 | Large language models (LLMs) have transformed software development by enabling automated code generation, yet they frequently suffer from systematic errors that limit practical deployment. |
| HyProv: Hybrid Provenance Management for Scientific Workflows | Vasilis Bountris, Lauritz Thamsen, Ulf Leser | 2025-11-10 | Provenance plays a crucial role in scientific workflow execution, for instance by providing data for failure analysis, real-time monitoring, or statistics on resource utilization for right-sizing allo... |
| Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields | Zhao-Heng Yin, Pieter Abbeel | 2025-11-10 | Despite years of research, real-time diverse grasp synthesis for dexterous hands remains an unsolved core challenge in robotics and computer graphics. |
| LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure | Jaehong Cho, Hyunmin Choi, Jongse Park | 2025-11-10 | This paper introduces LLMServingSim2.0, a system simulator designed for exploring heterogeneous hardware in large-scale LLM serving systems. LLMServingSim2. |
| Resilient by Design -- Active Inference for Distributed Continuum Intelligence | Praveen Kumar Donta, Alfreds Lapkovskis, Enzo Mingozzi, Schahram Dustdar | 2025-11-10 | Failures are the norm in highly complex and heterogeneous devices spanning the distributed computing continuum (DCC), from resource-constrained IoT and edge nodes to high-performance computing systems... |
| A GPU-boosted high-performance multi-working condition joint analysis framework for predicting dynamics of textured axial piston pump | Xin Yao, Yang Liu, Jin Jiang, Yesen Chen, Zhilong Chen, Hongkang Dong, Xiaofeng Wei, Teng Zhang, Dongyun Wang | 2025-11-10 | Accurate simulation to dynamics of axial piston pump (APP) is essential for its design, manufacture and maintenance. However, limited by computation capacity of CPU device and traditional solvers, con... |
| Wireless Sensor Networks Nodes Clustering and Optimization Based on Fuzzy C-Means and Water Strider Algorithms | Raya Majid Alsharfa, Mahmood Mohassel Feghhi, Majid Hameed Majeed | 2025-11-10 | Wireless sensor networks (WSNs) face critical challenges in energy management and network lifetime optimization due to limited battery resources and communication overhead. |
| Argus: Quality-Aware High-Throughput Text-to-Image Inference Serving System | Shubham Agarwal, Subrata Mitra, Saud Iqbal | 2025-11-10 | Text-to-image (T2I) models have gained significant popularity. Most of these are diffusion models with unique computational characteristics, distinct from both traditional small-scale ML models and la... |
| DMA Collectives for Efficient ML Communication Offloads | Suchita Pati, Mahzabeen Islam, Shaizeen Aga, Mohamed Assem Ibrahim | 2025-11-10 | Offloading machine learning (ML) communication collectives to direct memory access (DMA) engines has emerged as an interesting and low-cost solution to efficiently overlap computation and communicatio... |
| Saarthi: An End-to-End Intelligent Platform for Optimising Distributed Serverless Workloads | Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya | 2025-11-10 | FaaS offers significant advantages with its infrastructure abstraction, on-demand execution, and attractive no idle resource pricing for modern cloud applications. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| UAV-Assisted Resilience in 6G and Beyond Network Energy Saving: A Multi-Agent DRL Approach | Dao Lan Vy Dinh, Anh Nguyen Thi Mai, Hung Tran, Giang Quynh Le Vu, Tu Dac Ho, Zhenni Pan, Vo Nhan Van, Symeon Chatzinotas, Dinh-Hieu Tran | 2025-11-10 | This paper investigates the unmanned aerial vehicle (UAV)-assisted resilience perspective in the 6G network energy saving (NES) scenario. More specifically, we consider multiple ground base stations (... |
| When Intelligence Overloads Infrastructure: A Forecast Model for AI-Driven Bottlenecks | Gamal Refai-Ahmed, Mallik Tatipamula, Victor Zhirnov, Ahmed Refaey Hussein, Abdallah Shami | 2025-11-10 | The exponential growth of AI agents and connected devices fundamentally transforms the structure and capacity demands of global digital infrastructure. |
| Resilient by Design -- Active Inference for Distributed Continuum Intelligence | Praveen Kumar Donta, Alfreds Lapkovskis, Enzo Mingozzi, Schahram Dustdar | 2025-11-10 | Failures are the norm in highly complex and heterogeneous devices spanning the distributed computing continuum (DCC), from resource-constrained IoT and edge nodes to high-performance computing systems... |
| Improving Remote Patient Monitoring Systems Using a Fog-based IoT Platform with Speech Recognition | Marc Jayson Baucas, Petros Spachos | 2025-11-10 | Due to the recent shortage of resources in the healthcare industry, Remote Patient Monitoring (RPM) systems arose to establish a convenient alternative for accessing healthcare services remotely. |
| Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents | Hanlin Cai, Houtianfu Wang, Haofan Dong, Kai Li, Sai Zou, Ozgur B. Akan | 2025-11-10 | Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale. |
| Fine-Tuning Diffusion-Based Recommender Systems via Reinforcement Learning with Reward Function Optimization | Yu Hou, Hua Li, Ha Young Kim, Won-Yong Shin | 2025-11-10 | Diffusion models recently emerged as a powerful paradigm for recommender systems, offering state-of-the-art performance by modeling the generative process of user-item interactions. |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| GoCkpt: Gradient-Assisted Multi-Step overlapped Checkpointing for Efficient LLM Training | Keyao Zhang, Yiquan Chen, Zhuo Hu, Wenhai Lin, Jiexiong Xu, Wenzhi Chen | 2025-11-10 | The accuracy of large language models (LLMs) improves with increasing model size, but increasing model complexity also poses significant challenges to training stability. |
| Preemption-Enhanced Benchmark Suite for FPGAs | Arsalan Ali Malik, John Buchanan, Aydin Aysu | 2025-11-10 | Field-Programmable Gate Arrays (FPGAs) have become essential in cloud computing due to their reconfigurability, energy efficiency, and ability to accelerate domain-specific workloads. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines: A Comparative Analysis | Punit Kumar, Asif Imran, Tevfik Kosar | 2025-11-10 | This paper presents a detailed comparative analysis of the performance of three major Python data manipulation libraries - Pandas, Polars, and Dask - specifically when embedded within complete deep le... |