2024-09-13

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Parallel Tempering-Inspired Distributed Binary Optimization with In-Memory Computing	Xiangyi Zhang, Fabian Böhm, Elisabetta Valiante, Moslem Noori, Thomas Van Vaerenbergh, Chan-Woo Yang, Giacomo Pedretti, Masoud Mohseni, Raymond Beausoleil, Ignacio Rozada	2024-09-13	Download	In-memory computing (IMC) has been shown to be a promising approach for solving binary optimization problems while significantly reducing energy and latency.
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis	Xiaoyu Chu, Daniel Hofstätter, Shashikant Ilager, Sacheendra Talluri, Duncan Kampert, Damian Podareanu, Dmitry Duplyakin, Ivona Brandic, Alexandru Iosup	2024-09-13	Download	HPC datacenters offer a backbone to the modern digital society. Increasingly, they run Machine Learning (ML) jobs next to generic, compute-intensive workloads, supporting science, business, and other ...
Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators	Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, Mika Markus Müller, Federico Nicolás Peccia, Felix Thömmes, Jannik Steinmetz, Valentin Biermaier, Adrian Frischknecht, Paul Palomero Bernardo, Oliver Bringmann	2024-09-13	Download	Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their perform...
AnalogGym: An Open and Practical Testing Suite for Analog Circuit Synthesis	Jintao Li, Haochang Zhi, Ruiyu Lyu, Wangzhen Li, Zhaori Bi, Keren Zhu, Yanhan Zeng, Weiwei Shan, Changhao Yan, Fan Yang, Yun Li, Xuan Zeng	2024-09-13	Download	Recent advances in machine learning (ML) for automating analog circuit synthesis have been significant, yet challenges remain. A critical gap is the lack of a standardized evaluation framework, compou...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
HotSwap: Enabling Live Dependency Sharing in Serverless Computing	Rui Li, Devesh Tiwari, Gene Cooperman	2024-09-13	Download	This work presents HotSwap, a novel provider-side cold-start optimization for serverless computing. This optimization reduces cold-start time when booting and loading dependencies at runtime inside a ...
Generic and ML Workloads in an HPC Datacenter: Node Energy, Job Failures, and Node-Job Analysis	Xiaoyu Chu, Daniel Hofstätter, Shashikant Ilager, Sacheendra Talluri, Duncan Kampert, Damian Podareanu, Dmitry Duplyakin, Ivona Brandic, Alexandru Iosup	2024-09-13	Download	HPC datacenters offer a backbone to the modern digital society. Increasingly, they run Machine Learning (ML) jobs next to generic, compute-intensive workloads, supporting science, business, and other ...
Exploring System-Heterogeneous Federated Learning with Dynamic Model Selection	Dixi Yao	2024-09-13	Download	Federated learning is a distributed learning paradigm in which multiple mobile clients train a global model while keeping data local. These mobile clients can have various available memory and network...
Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs	Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg	2024-09-13	Download	Bessel functions are critical in scientific computing for applications such as machine learning, protein structure modeling, and robotics. However, currently, available routines lack precision or fail...
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering	Changxin Liu, Yanghao Li, Yuhao Yi, Karl H. Johansson	2024-09-13	Download	Distributed learning has become the standard approach for training large-scale machine learning models across private data silos. While distributed learning enhances privacy preservation and training ...
CompressedMediQ: Hybrid Quantum Machine Learning Pipeline for High-Dimensional Neuroimaging Data	Kuan-Cheng Chen, Yi-Tien Li, Tai-Yu Li, Chen-Yu Liu, Po-Heng Li, Cheng-Yu Chen	2024-09-13	Download	This paper introduces CompressedMediQ, a novel hybrid quantum-classical machine learning pipeline specifically developed to address the computational challenges associated with high-dimensional multi-...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Throughput-Optimal Scheduling via Rate Learning	Panagiotis Promponas, Víctor Valls, Konstantinos Nikolakakis, Dionysis Kalogerias, Leandros Tassiulas	2024-09-13	Download	We study the problem of designing scheduling policies for communication networks. This problem is often addressed with max-weight-type approaches since they are throughput-optimal.
Dynamic Pricing based Near-Optimal Resource Allocation for Elastic Edge Offloading	Yun Xia, Hai Xue, Di Zhang, Shahid Mumtaz, Xiaolong Xu, Joel J. P. C. Rodrigues	2024-09-13	Download	In mobile edge computing (MEC), task offloading can significantly reduce task execution latency and energy consumption of end user (EU). However, edge server (ES) resources are limited, necessitating ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators	Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, Mika Markus Müller, Federico Nicolás Peccia, Felix Thömmes, Jannik Steinmetz, Valentin Biermaier, Adrian Frischknecht, Paul Palomero Bernardo, Oliver Bringmann	2024-09-13	Download	Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their perform...