
2024-09-13

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Parallel Tempering-Inspired Distributed Binary Optimization with In-Memory Computing | Xiangyi Zhang, Fabian Böhm, Elisabetta Valiante, Moslem Noori, Thomas Van Vaerenbergh, Chan-Woo Yang, Giacomo Pedretti, Masoud Mohseni, Raymond Beausoleil, Ignacio Rozada | 2024-09-13 | In-memory computing (IMC) has been shown to be a promising approach for solving binary optimization problems while significantly reducing energy and latency. |
| Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis | Xiaoyu Chu, Daniel Hofstätter, Shashikant Ilager, Sacheendra Talluri, Duncan Kampert, Damian Podareanu, Dmitry Duplyakin, Ivona Brandic, Alexandru Iosup | 2024-09-13 | HPC datacenters offer a backbone to the modern digital society. Increasingly, they run Machine Learning (ML) jobs next to generic, compute-intensive workloads, supporting science, business, and other ... |
| Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators | Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, Mika Markus Müller, Federico Nicolás Peccia, Felix Thömmes, Jannik Steinmetz, Valentin Biermaier, Adrian Frischknecht, Paul Palomero Bernardo, Oliver Bringmann | 2024-09-13 | Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their perform... |
| AnalogGym: An Open and Practical Testing Suite for Analog Circuit Synthesis | Jintao Li, Haochang Zhi, Ruiyu Lyu, Wangzhen Li, Zhaori Bi, Keren Zhu, Yanhan Zeng, Weiwei Shan, Changhao Yan, Fan Yang, Yun Li, Xuan Zeng | 2024-09-13 | Recent advances in machine learning (ML) for automating analog circuit synthesis have been significant, yet challenges remain. A critical gap is the lack of a standardized evaluation framework, compou... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| HotSwap: Enabling Live Dependency Sharing in Serverless Computing | Rui Li, Devesh Tiwari, Gene Cooperman | 2024-09-13 | This work presents HotSwap, a novel provider-side cold-start optimization for serverless computing. This optimization reduces cold-start time when booting and loading dependencies at runtime inside a ... |
| Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis | Xiaoyu Chu, Daniel Hofstätter, Shashikant Ilager, Sacheendra Talluri, Duncan Kampert, Damian Podareanu, Dmitry Duplyakin, Ivona Brandic, Alexandru Iosup | 2024-09-13 | HPC datacenters offer a backbone to the modern digital society. Increasingly, they run Machine Learning (ML) jobs next to generic, compute-intensive workloads, supporting science, business, and other ... |
| Exploring System-Heterogeneous Federated Learning with Dynamic Model Selection | Dixi Yao | 2024-09-13 | Federated learning is a distributed learning paradigm in which multiple mobile clients train a global model while keeping data local. These mobile clients can have various available memory and network... |
| Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs | Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg | 2024-09-13 | Bessel functions are critical in scientific computing for applications such as machine learning, protein structure modeling, and robotics. However, currently, available routines lack precision or fail... |
| Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering | Changxin Liu, Yanghao Li, Yuhao Yi, Karl H. Johansson | 2024-09-13 | Distributed learning has become the standard approach for training large-scale machine learning models across private data silos. While distributed learning enhances privacy preservation and training ... |
| CompressedMediQ: Hybrid Quantum Machine Learning Pipeline for High-Dimensional Neuroimaging Data | Kuan-Cheng Chen, Yi-Tien Li, Tai-Yu Li, Chen-Yu Liu, Po-Heng Li, Cheng-Yu Chen | 2024-09-13 | This paper introduces CompressedMediQ, a novel hybrid quantum-classical machine learning pipeline specifically developed to address the computational challenges associated with high-dimensional multi-... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Throughput-Optimal Scheduling via Rate Learning | Panagiotis Promponas, Víctor Valls, Konstantinos Nikolakakis, Dionysis Kalogerias, Leandros Tassiulas | 2024-09-13 | We study the problem of designing scheduling policies for communication networks. This problem is often addressed with max-weight-type approaches since they are throughput-optimal. |
| Dynamic Pricing based Near-Optimal Resource Allocation for Elastic Edge Offloading | Yun Xia, Hai Xue, Di Zhang, Shahid Mumtaz, Xiaolong Xu, Joel J. P. C. Rodrigues | 2024-09-13 | In mobile edge computing (MEC), task offloading can significantly reduce task execution latency and energy consumption of end user (EU). However, edge server (ES) resources are limited, necessitating ... |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators | Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, Mika Markus Müller, Federico Nicolás Peccia, Felix Thömmes, Jannik Steinmetz, Valentin Biermaier, Adrian Frischknecht, Paul Palomero Bernardo, Oliver Bringmann | 2024-09-13 | Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their perform... |
