
2024-03-30

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| DE-HNN: An effective neural model for Circuit Netlist representation | Zhishang Luo, Truong Son Hy, Puoya Tabaghi, Donghyeon Koh, Michael Defferrard, Elahe Rezaei, Ryan Carey, Rhett Davis, Rajeev Jain, Yusu Wang | 2024-03-30 | Download | The run-time for optimization tools used in chip design has grown with the complexity of designs to the point where it can take several days to go through one design cycle which has become a bottlenec... |
| Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root | Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller | 2024-03-30 | Download | Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final resul... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Computation and Communication Efficient Lightweighting Vertical Federated Learning for Smart Building IoT | Heqiang Wang, Xiang Liu, Yucheng Liu, Jia Zhou, Weihong Yang, Xiaoxiong Zhong | 2024-03-30 | Download | With the increasing number and enhanced capabilities of IoT devices in smart buildings, these devices are evolving beyond basic data collection and control to actively participate in deep learning tas... |
| Communication Efficient Distributed Training with Distributed Lion | Bo Liu, Lemeng Wu, Lizhang Chen, Kaizhao Liang, Jiaxu Zhu, Chen Liang, Raghuraman Krishnamoorthi, Qiang Liu | 2024-03-30 | Download | The Lion optimizer has been a promising competitor to AdamW for training large AI models, with advantages in memory, computation, and sample efficiency. |
| Going Forward-Forward in Distributed Deep Learning | Ege Aktemur, Ege Zorlutuna, Kaan Bilgili, Tacettin Emre Bok, Berrin Yanikoglu, Suha Orhun Mutluergil | 2024-03-30 | Download | We introduce a new approach in distributed deep learning, utilizing Geoffrey Hinton's Forward-Forward (FF) algorithm to speed up the training of neural networks in distributed computing environments. |
| Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes | Benjamin Berg, Benjamin Moseley, Weina Wang, Mor Harchol-Balter | 2024-03-30 | Download | Modern computing workloads are often composed of parallelizable jobs. A parallelizable job can be completed more quickly when run on additional servers. |
| Engineering A Workload-balanced Push-Relabel Algorithm for Massive Graphs on GPUs | Chou-Ying Hsieh, Po-Chieh Lin, Sy-Yen Kuo | 2024-03-30 | Download | The push-relabel algorithm is an efficient algorithm that solves the maximum flow/minimum cut problems and has an affinity to parallelization. As the size of graphs grows exponentially, researchers have ... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Asymptotically Optimal Scheduling of Multiple Parallelizable Job Classes | Benjamin Berg, Benjamin Moseley, Weina Wang, Mor Harchol-Balter | 2024-03-30 | Download | Modern computing workloads are often composed of parallelizable jobs. A parallelizable job can be completed more quickly when run on additional servers. |
