
2025-12-05

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SparsePixels: Efficient Convolution for Sparse Data on FPGAs | Ho Fung Tsoi, Dylan Rankin, Vladimir Loncar, Philip Harris | 2025-12-05 | Download | Inference of standard convolutional neural networks (CNNs) on FPGAs often incurs high latency and a long initiation interval due to the deep nested loops required to densely convolve every input pixel... |
| From PyTorch to Calyx: An Open-Source Compiler Toolchain for ML Accelerators | Jiahan Xie, Evan Williams, Adrian Sampson | 2025-12-05 | Download | We present an end-to-end open-source compiler toolchain that targets synthesizable SystemVerilog from ML models written in PyTorch. Our toolchain leverages the accelerator design language Allo, the ha... |
| Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures | Bin Xu, Ayan Banerjee, Sandeep Gupta | 2025-12-05 | Download | Model Recovery (MR) is a core primitive for physical AI and real-time digital twins, but GPUs often execute MR inefficiently due to iterative dependencies, kernel-launch overheads, underutilized memor... |
| Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads | Boyu Li, Zongwei Zhu, Yi Xiong, Qianyue Cao, Jiawei Geng, Xiaonan Zhang, Xi Li | 2025-12-05 | Download | Large Language Models (LLMs) impose massive computational demands, driving the need for scalable multi-chiplet accelerators. However, existing mapping space exploration efforts for such accelerators p... |
| ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications | Changwen Xing, SamZaak Wong, Xinlai Wan, Yanfeng Lu, Mengli Zhang, Zebin Ma, Lei Qi, Zhengxiong Li, Nan Guan, Zhe Jiang, Xi Wang, Jun Yang | 2025-12-05 | Download | While Large Language Models (LLMs) demonstrate immense potential for automating integrated circuit (IC) development, their practical deployment is fundamentally limited by restricted context windows. |
| First Demonstration of Second-order Training of Deep Neural Networks with In-memory Analog Matrix Computing | Saitao Zhang, Yubiao Luo, Shiqing Wang, Pushen Zuo, Yongxiang Li, Lunshuai Pan, Zheng Miao, Zhong Sun | 2025-12-05 | Download | Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam. |
| When Forgetting Builds Reliability: LLM Unlearning for Reliable Hardware Code Generation | Yiwen Liang, Qiufeng Li, Shikai Wang, Weidong Cao | 2025-12-05 | Download | Large Language Models (LLMs) have shown strong potential in accelerating digital hardware design through automated code generation. Yet, ensuring their reliability remains a critical challenge, as exi... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Evaluation Framework for Centralized and Decentralized Aggregation Algorithm in Federated Systems | Sumit Chongder | 2025-12-05 | Download | In recent years, the landscape of federated learning has witnessed significant advancements, particularly in decentralized methodologies. This research paper presents a comprehensive comparison of Cen... |
| Metronome: Differentiated Delay Scheduling for Serverless Functions | Zhuangbin Chen, Juzheng Zheng, Zibin Zheng | 2025-12-05 | Download | Function-as-a-Service (FaaS) computing is an emerging cloud computing paradigm for its ease-of-management and elasticity. However, optimizing scheduling for serverless functions remains challenging du... |
| Are Bus-Mounted Edge Servers Feasible? | Xuezhi Li, Jiancong He, Ming Xie, Xuyang Chen, Le Chang, Li Jiang, Gui Gui | 2025-12-05 | Download | Placement of edge servers is the prerequisite of provisioning edge computing services for Internet of Vehicles (IoV). Fixed-site edge servers at Road Side Units (RSUs) or base stations are able to off... |
| Compiler-supported reduced precision and AoS-SoA transformations for heterogeneous hardware | Pawel K. Radtke, Tobias Weinzierl | 2025-12-05 | Download | This study evaluates AoS-to-SoA transformations over reduced-precision data layouts for a particle simulation code on several GPU platforms: We hypothesize that SoA fits particularly well to SIMT, whi... |
| Model Gateway: Model Management Platform for Model-Driven Drug Discovery | Yan-Shiun Wu, Nathan A. Morin | 2025-12-05 | Download | This paper presents the Model Gateway, a management platform for managing machine learning (ML) and scientific computational models in the drug discovery pipeline. |
| FedGMR: Federated Learning with Gradual Model Restoration under Asynchrony and Model Heterogeneity | Chengjie Ma, Seungeun Oh, Jihong Park, Seong-Lyun Kim | 2025-12-05 | Download | Federated learning (FL) holds strong potential for distributed machine learning, but in heterogeneous environments, Bandwidth-Constrained Clients (BCCs) often struggle to participate effectively due t... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AIMNET: An IoT-Empowered Digital Twin for Continuous Gas Emission Monitoring and Early Hazard Detection | Zifan Zhou, Xuan Wang, Yang Yan, Lkhanaajav Mijiddorj, Yu Ding, Tyler Beringer, Parisa Masnadi Khiabani, Wolfgang G. Jentner, Xiao-Ming Hu, Chenghao Wang, Bryan M. Carroll, Ming Xue, David Ebert, Bin Li, Binbin Weng | 2025-12-05 | Download | A Digital Twin (DT) framework to enhance carbon-based gas plume monitoring is critical for supporting timely and effective mitigation responses to environmental hazards such as industrial gas leaks, o... |
| Cross-Domain Elephant Flow Detection: A Unified Machine Learning Approach with Application-Aware and Security Features | Tabidah Usmani, Sara Zahid, Amna Javaid | 2025-12-05 | Download | Network traffic classification, particularly elephant flow detection, faces significant challenges when deployed across heterogeneous network environments. |
| AIORA: An AI-Native Multi-Stakeholder Orchestration Architecture for 6G Continuum | Nuria Molner, Luis Rosa, Fulvio Risso, Konstantinos Samdanis, David Artuñedo, Rob Smets, Tarik Taleb, David Gomez-Barquero | 2025-12-05 | Download | This paper elaborates on a novel AI-native architecture for emerging 6G systems harnessing open APIs, along with supporting mechanisms to empower intelligent and coordinated orchestration of edge-clou... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Compiling Away the Overhead of Race Detection | Alexey Paznikov, Andrey Kogutenko, Yaroslav Osipov, Michael Schwarz, Umang Mathur | 2025-12-05 | Download | Dynamic data race detectors are indispensable for flagging concurrency errors in software, but their high runtime overhead limits their adoption. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Dissecting Embedding Bag Performance in DLRM Inference | Chandrish Ambati, Jing Ding, Trung Diep | 2025-12-05 | Download | As the size of DLRMs gets larger, the models must be partitioned across multiple GPUs or nodes of GPUs due to the size limitation of total HBM memory that can be packaged in a GPU. |
