2025-12-05

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
SparsePixels: Efficient Convolution for Sparse Data on FPGAs	Ho Fung Tsoi, Dylan Rankin, Vladimir Loncar, Philip Harris	2025-12-05	Download	Inference of standard convolutional neural networks (CNNs) on FPGAs often incurs high latency and a long initiation interval due to the deep nested loops required to densely convolve every input pixel...
From PyTorch to Calyx: An Open-Source Compiler Toolchain for ML Accelerators	Jiahan Xie, Evan Williams, Adrian Sampson	2025-12-05	Download	We present an end-to-end open-source compiler toolchain that targets synthesizable SystemVerilog from ML models written in PyTorch. Our toolchain leverages the accelerator design language Allo, the ha...
Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures	Bin Xu, Ayan Banerjee, Sandeep Gupta	2025-12-05	Download	Model Recovery (MR) is a core primitive for physical AI and real-time digital twins, but GPUs often execute MR inefficiently due to iterative dependencies, kernel-launch overheads, underutilized memor...
Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads	Boyu Li, Zongwei Zhu, Yi Xiong, Qianyue Cao, Jiawei Geng, Xiaonan Zhang, Xi Li	2025-12-05	Download	Large Language Models (LLMs) impose massive computational demands, driving the need for scalable multi-chiplet accelerators. However, existing mapping space exploration efforts for such accelerators p...
ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications	Changwen Xing, SamZaak Wong, Xinlai Wan, Yanfeng Lu, Mengli Zhang, Zebin Ma, Lei Qi, Zhengxiong Li, Nan Guan, Zhe Jiang, Xi Wang, Jun Yang	2025-12-05	Download	While Large Language Models (LLMs) demonstrate immense potential for automating integrated circuit (IC) development, their practical deployment is fundamentally limited by restricted context windows.
First Demonstration of Second-order Training of Deep Neural Networks with In-memory Analog Matrix Computing	Saitao Zhang, Yubiao Luo, Shiqing Wang, Pushen Zuo, Yongxiang Li, Lunshuai Pan, Zheng Miao, Zhong Sun	2025-12-05	Download	Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam.
When Forgetting Builds Reliability: LLM Unlearning for Reliable Hardware Code Generation	Yiwen Liang, Qiufeng Li, Shikai Wang, Weidong Cao	2025-12-05	Download	Large Language Models (LLMs) have shown strong potential in accelerating digital hardware design through automated code generation. Yet, ensuring their reliability remains a critical challenge, as exi...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Evaluation Framework for Centralized and Decentralized Aggregation Algorithm in Federated Systems	Sumit Chongder	2025-12-05	Download	In recent years, the landscape of federated learning has witnessed significant advancements, particularly in decentralized methodologies. This research paper presents a comprehensive comparison of Cen...
Metronome: Differentiated Delay Scheduling for Serverless Functions	Zhuangbin Chen, Juzheng Zheng, Zibin Zheng	2025-12-05	Download	Function-as-a-Service (FaaS) computing is an emerging cloud computing paradigm for its ease-of-management and elasticity. However, optimizing scheduling for serverless functions remains challenging du...
Are Bus-Mounted Edge Servers Feasible?	Xuezhi Li, Jiancong He, Ming Xie, Xuyang Chen, Le Chang, Li Jiang, Gui Gui	2025-12-05	Download	Placement of edge servers is the prerequisite of provisioning edge computing services for Internet of Vehicles (IoV). Fixed-site edge servers at Road Side Units (RSUs) or base stations are able to off...
Compiler-supported reduced precision and AoS-SoA transformations for heterogeneous hardware	Pawel K. Radtke, Tobias Weinzierl	2025-12-05	Download	This study evaluates AoS-to-SoA transformations over reduced-precision data layouts for a particle simulation code on several GPU platforms: We hypothesize that SoA fits particularly well to SIMT, whi...
Model Gateway: Model Management Platform for Model-Driven Drug Discovery	Yan-Shiun Wu, Nathan A. Morin	2025-12-05	Download	This paper presents the Model Gateway, a management platform for managing machine learning (ML) and scientific computational models in the drug discovery pipeline.
FedGMR: Federated Learning with Gradual Model Restoration under Asynchrony and Model Heterogeneity	Chengjie Ma, Seungeun Oh, Jihong Park, Seong-Lyun Kim	2025-12-05	Download	Federated learning (FL) holds strong potential for distributed machine learning, but in heterogeneous environments, Bandwidth-Constrained Clients (BCCs) often struggle to participate effectively due t...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
AIMNET: An IoT-Empowered Digital Twin for Continuous Gas Emission Monitoring and Early Hazard Detection	Zifan Zhou, Xuan Wang, Yang Yan, Lkhanaajav Mijiddorj, Yu Ding, Tyler Beringer, Parisa Masnadi Khiabani, Wolfgang G. Jentner, Xiao-Ming Hu, Chenghao Wang, Bryan M. Carroll, Ming Xue, David Ebert, Bin Li, Binbin Weng	2025-12-05	Download	A Digital Twin (DT) framework to enhance carbon-based gas plume monitoring is critical for supporting timely and effective mitigation responses to environmental hazards such as industrial gas leaks, o...
Cross-Domain Elephant Flow Detection: A Unified Machine Learning Approach with Application-Aware and Security Features	Tabidah Usmani, Sara Zahid, Amna Javaid	2025-12-05	Download	Network traffic classification, particularly elephant flow detection, faces significant challenges when deployed across heterogeneous network environments.
AIORA: An AI-Native Multi-Stakeholder Orchestration Architecture for 6G Continuum	Nuria Molner, Luis Rosa, Fulvio Risso, Konstantinos Samdanis, David Artuñedo, Rob Smets, Tarik Taleb, David Gomez-Barquero	2025-12-05	Download	This paper elaborates on a novel AI-native architecture for emerging 6G systems harnessing open APIs, along with supporting mechanisms to empower intelligent and coordinated orchestration of edge-clou...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Compiling Away the Overhead of Race Detection	Alexey Paznikov, Andrey Kogutenko, Yaroslav Osipov, Michael Schwarz, Umang Mathur	2025-12-05	Download	Dynamic data race detectors are indispensable for flagging concurrency errors in software, but their high runtime overhead limits their adoption.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Dissecting Embedding Bag Performance in DLRM Inference	Chandrish Ambati, Jing Ding, Trung Diep	2025-12-05	Download	As the size of DLRMs gets larger, the models must be partitioned across multiple GPUs or nodes of GPUs due to the size limitation of total HBM memory that can be packaged in a GPU.