2024-08-05

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis	Sneha Swaroopa, Rijoy Mukherjee, Anushka Debnath, Rajat Subhra Chakraborty	2024-08-05	Download	The ever-growing popularity of large language models (LLMs) has resulted in their increasing adoption for hardware design and verification. Prior research has attempted to assess the capability of LLM...
Finite-Time Lyapunov Exponent Calculation on FPGA using High-Level Synthesis Tools	Manuel de Castro, Roberto R. Osorio, Francisco J. Andujar, Rocío Carratalá-Sáez, Yuri Torres, Diego R. Llanos	2024-08-05	Download	As the computing capabilities of Field Programmable Gate Arrays (FPGAs) continue to grow, so does the interest in building scientific accelerators around them.
Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow	Philip Wiese, Gamze İslamoğlu, Moritz Scherer, Luka Macan, Victor J. B. Jung, Alessio Burrello, Francesco Conti, Luca Benini	2024-08-05	Download	One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers.
PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy	Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique	2024-08-05	Download	Convolutional Neural Networks (CNNs), a prominent type of Deep Neural Networks (DNNs), have emerged as a state-of-the-art solution for solving machine learning tasks.
TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Architecture and Hardware Implementation	Cristian Sestito, Shady Agwa, Themis Prodromakis	2024-08-05	Download	Modern hardware architectures for Convolutional Neural Networks (CNNs), besides targeting high performance, aim to dissipate limited energy.
SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving	Andreas Kosmas Kakolyris, Dimosthenis Masouros, Petros Vavaroutsos, Sotirios Xydis, Dimitrios Soudris	2024-08-05	Download	As Large Language Models (LLMs) gain traction, their reliance on power-hungry GPUs places ever-increasing energy demands, raising environmental and monetary concerns.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Mitigating Malicious Attacks in Federated Learning via Confidence-aware Defense	Qilei Li, Ahmed M. Abdelmoniem	2024-08-05	Download	Federated Learning (FL) is a distributed machine learning paradigm that enables multiple clients to collaboratively train a global model without sharing their private local data.
Toward Smart Scheduling in Tapis	Joe Stubbs, Smruti Padhy, Richard Cardone	2024-08-05	Download	The Tapis framework provides APIs for automating job execution on remote resources, including HPC clusters and servers running in the cloud. Tapis can simplify the interaction with remote cyberinfrast...
Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures	Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson	2024-08-05	Download	Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems t...
Asynchronous Latency and Fast Atomic Snapshot	João Paulo Bezerra, Luciano Freitas, Petr Kuznetsov, Matthieu Rambaud	2024-08-05	Download	This paper introduces a novel, fast atomic-snapshot protocol for asynchronous message-passing systems. In the process of defining what "fast" means exactly, we spot a few interesting issues that ari...
SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving	Andreas Kosmas Kakolyris, Dimosthenis Masouros, Petros Vavaroutsos, Sotirios Xydis, Dimitrios Soudris	2024-08-05	Download	As Large Language Models (LLMs) gain traction, their reliance on power-hungry GPUs places ever-increasing energy demands, raising environmental and monetary concerns.
Nonlinear Perturbation-based Non-Convex Optimization over Time-Varying Networks	Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee	2024-08-05	Download	Decentralized optimization strategies are helpful for various applications, from networked estimation to distributed machine learning. This paper studies finite-sum minimization problems described ove...
Large Language Model Aided QoS Prediction for Service Recommendation	Huiying Liu, Zekun Zhang, Honghao Li, Qilin Wu, Yiwen Zhang	2024-08-05	Download	Large language models (LLMs) have improved rapidly in recent years and have been used in a wider range of applications. After being trained on a large text corpus, LLMs obtain the capability ...
Enabling Practical Transparent Checkpointing for MPI: A Topological Sort Approach	Yao Xu, Gene Cooperman	2024-08-05	Download	MPI is the de facto standard for parallel computing on a cluster of computers. Checkpointing is an important component in any strategy for software resilience and for long-running jobs that must be ex...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Active Learning for WBAN-based Health Monitoring	Cho-Chun Chiu, Tuan Nguyen, Ting He, Shiqiang Wang, Beom-Su Kim, Ki-Il Kim	2024-08-05	Download	We consider a novel active learning problem motivated by the need to learn machine learning models for health monitoring in wireless body area networks (WBANs).
Performance analysis of a RIS-assisted communications	Hamza Adrat, Laurent Decreusefond, Philippe Martins	2024-08-05	Download	Reconfigurable Intelligent Surfaces (RIS) are currently being considered for adoption in future 6G standards. ETSI and 3GPP have started feasibility and performance investigations of this technology.
Demystifying AMD SEV Performance Penalty for NFV Deployment	Syafiq Al Atiiq, Aris Cahyadi Risdianto	2024-08-05	Download	Network Function Virtualization (NFV) has shifted communication networks towards more adaptable software solutions, but this transition raises new security concerns, particularly in public cloud deplo...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Toward Smart Scheduling in Tapis	Joe Stubbs, Smruti Padhy, Richard Cardone	2024-08-05	Download	The Tapis framework provides APIs for automating job execution on remote resources, including HPC clusters and servers running in the cloud. Tapis can simplify the interaction with remote cyberinfrast...
Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures	Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson	2024-08-05	Download	Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems t...