
2024-08-05

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis | Sneha Swaroopa, Rijoy Mukherjee, Anushka Debnath, Rajat Subhra Chakraborty | 2024-08-05 | Download | The ever-growing popularity of large language models (LLMs) has resulted in their increasing adoption for hardware design and verification. Prior research has attempted to assess the capability of LLM... |
| Finite-Time Lyapunov Exponent Calculation on FPGA using High-Level Synthesis Tools | Manuel de Castro, Roberto R. Osorio, Francisco J. Andujar, Rocío Carratalá-Sáez, Yuri Torres, Diego R. Llanos | 2024-08-05 | Download | As the computing capabilities of Field Programmable Gate Arrays (FPGAs) continue to grow, so does the interest in building scientific accelerators around them. |
| Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow | Philip Wiese, Gamze İslamoğlu, Moritz Scherer, Luka Macan, Victor J. B. Jung, Alessio Burrello, Francesco Conti, Luca Benini | 2024-08-05 | Download | One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. |
| PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy | Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique | 2024-08-05 | Download | Convolutional Neural Networks (CNNs), a prominent type of Deep Neural Networks (DNNs), have emerged as a state-of-the-art solution for solving machine learning tasks. |
| TrIM, Triangular Input Movement Systolic Array for Convolutional Neural Networks: Architecture and Hardware Implementation | Cristian Sestito, Shady Agwa, Themis Prodromakis | 2024-08-05 | Download | Modern hardware architectures for Convolutional Neural Networks (CNNs), in addition to targeting high performance, aim to dissipate limited energy. |
| SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Andreas Kosmas Kakolyris, Dimosthenis Masouros, Petros Vavaroutsos, Sotirios Xydis, Dimitrios Soudris | 2024-08-05 | Download | As Large Language Models (LLMs) gain traction, their reliance on power-hungry GPUs places ever-increasing energy demands, raising environmental and monetary concerns. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Mitigating Malicious Attacks in Federated Learning via Confidence-aware Defense | Qilei Li, Ahmed M. Abdelmoniem | 2024-08-05 | Download | Federated Learning (FL) is a distributed machine learning paradigm that enables multiple clients to collaboratively train a global model without sharing their private local data. |
| Toward Smart Scheduling in Tapis | Joe Stubbs, Smruti Padhy, Richard Cardone | 2024-08-05 | Download | The Tapis framework provides APIs for automating job execution on remote resources, including HPC clusters and servers running in the cloud. Tapis can simplify the interaction with remote cyberinfrast... |
| Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures | Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson | 2024-08-05 | Download | Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems t... |
| Asynchronous Latency and Fast Atomic Snapshot | João Paulo Bezerra, Luciano Freitas, Petr Kuznetsov, Matthieu Rambaud | 2024-08-05 | Download | This paper introduces a novel, fast atomic-snapshot protocol for asynchronous message-passing systems. In the process of defining what "fast" means exactly, we spot a few interesting issues that ari... |
| SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving | Andreas Kosmas Kakolyris, Dimosthenis Masouros, Petros Vavaroutsos, Sotirios Xydis, Dimitrios Soudris | 2024-08-05 | Download | As Large Language Models (LLMs) gain traction, their reliance on power-hungry GPUs places ever-increasing energy demands, raising environmental and monetary concerns. |
| Nonlinear Perturbation-based Non-Convex Optimization over Time-Varying Networks | Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee | 2024-08-05 | Download | Decentralized optimization strategies are helpful for various applications, from networked estimation to distributed machine learning. This paper studies finite-sum minimization problems described ove... |
| Large Language Model Aided QoS Prediction for Service Recommendation | Huiying Liu, Zekun Zhang, Honghao Li, Qilin Wu, Yiwen Zhang | 2024-08-05 | Download | Large language models (LLMs) have seen rapid improvement in recent years, and have been used in a wider range of applications. After being trained on large text corpus, LLMs obtain the capability ... |
| Enabling Practical Transparent Checkpointing for MPI: A Topological Sort Approach | Yao Xu, Gene Cooperman | 2024-08-05 | Download | MPI is the de facto standard for parallel computing on a cluster of computers. Checkpointing is an important component in any strategy for software resilience and for long-running jobs that must be ex... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Active Learning for WBAN-based Health Monitoring | Cho-Chun Chiu, Tuan Nguyen, Ting He, Shiqiang Wang, Beom-Su Kim, Ki-Il Kim | 2024-08-05 | Download | We consider a novel active learning problem motivated by the need of learning machine learning models for health monitoring in wireless body area network (WBAN). |
| Performance analysis of a RIS-assisted communications | Hamza Adrat, Laurent Decreusefond, Philippe Martins | 2024-08-05 | Download | Reconfigurable Intelligent Surfaces (RIS) are currently considered for adoption in future 6G standards. ETSI and 3GPP have started feasibility and performance investigations of such a technology. |
| Demystifying AMD SEV Performance Penalty for NFV Deployment | Syafiq Al Atiiq, Aris Cahyadi Risdianto | 2024-08-05 | Download | Network Function Virtualization (NFV) has shifted communication networks towards more adaptable software solutions, but this transition raises new security concerns, particularly in public cloud deplo... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Toward Smart Scheduling in Tapis | Joe Stubbs, Smruti Padhy, Richard Cardone | 2024-08-05 | Download | The Tapis framework provides APIs for automating job execution on remote resources, including HPC clusters and servers running in the cloud. Tapis can simplify the interaction with remote cyberinfrast... |
| Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures | Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson | 2024-08-05 | Download | Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems t... |
