
2024-06-12

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference | Christopher Wolters, Xiaoxuan Yang, Ulf Schlichtmann, Toyotaro Suzumura | 2024-06-12 | Large language models (LLMs) have recently transformed natural language processing, enabling machines to generate human-like text and engage in meaningful conversations. |
| Continuous-Time Digital Twin with Analogue Memristive Neural Ordinary Differential Equation Solver | Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu | 2024-06-12 | Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. |
| It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives | Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann | 2024-06-12 | Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training of statistical performance models often requires vast amou... |
| Hypervisor Extension for a RISC-V Processor | Jaume Gauchola, JuanJosé Costa, Enric Morancho, Ramon Canal, Xavier Carril, Max Doblas, Beatriz Otero, Alex Pajuelo, Eva Rodríguez, Javier Salamero, Javier Verdú | 2024-06-12 | This paper describes our experience implementing a Hypervisor extension for a 64-bit RISC-V processor. We describe the design process and the main required parts with a brief explanation of each one. |
| ONNXim: A Fast, Cycle-level Multi-core NPU Simulator | Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim | 2024-06-12 | As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is beco... |
| Hardware Implementation of Soft Mapper/Demappers in Iterative EP-based Receivers | Ian Fischer Schilling, Serdar Sahin, Camille Leroux, Antonio Maria Cipriano, Christophe Jego | 2024-06-12 | This paper presents a comprehensive study and implementations onto FPGA device of an Expectation Propagation (EP)-based receiver for QPSK, 8-PSK, and 16-QAM. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| PETSc/TAO Developments for GPU-Based Early Exascale Systems | Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Toby Isaac, Matthew Knepley, Todd Munson, Hansol Suh, Stefano Zampini, Hong Zhang, Junchao Zhang | 2024-06-12 | The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization vi... |
| Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data | Jiaojiao Zhang, Jiang Hu, Anthony Man-Cho So, Mikael Johansson | 2024-06-12 | Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. |
| ProTrain: Efficient LLM Training via Memory-Aware Techniques | Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu | 2024-06-12 | It is extremely memory-hungry to train Large Language Models (LLM). To solve this problem, existing work exploits the combination of CPU and GPU for the training process, such as ZeRO-Offload. |
| A deep cut into Split Federated Self-supervised Learning | Marcin Przewięźlikowski, Marcin Osial, Bartosz Zieliński, Marek Śmieja | 2024-06-12 | Collaborative self-supervised learning has recently become feasible in highly distributed environments by dividing the network layers between client devices and a central server. |
| Vitamin-V: Expanding Open-Source RISC-V Cloud Environments | Ramon Canal, Stefano Di Carlo, Dimitris Gizopoulos, Alberto Scionti, Francesco Lubrano, Josep-Lluís Berral, Aaron Call, Diego Marron, Konstantinos Nikas, Dionisios Pnevmatikatos, Daniel Raho, Alvise Rigo, Yannis Papaefstathiou, José María Arnau, Angelos Arelakis | 2024-06-12 | Among the key contributions of Vitamin-V (2023-2025 Horizon Europe project), we develop a complete RISC-V open-source software stack for cloud services with comparable performance to the cloud-dominan... |
| Resource Allocation and Workload Scheduling for Large-Scale Distributed Deep Learning: A Survey | Feng Liang, Zhen Zhang, Haifeng Lu, Chengming Li, Victor C. M. Leung, Yanyi Guo, Xiping Hu | 2024-06-12 | With rapidly increasing distributed deep learning workloads in large-scale data centers, efficient distributed deep learning framework strategies for resource allocation and workload scheduling have b... |
| IMFL-AIGC: Incentive Mechanism Design for Federated Learning Empowered by Artificial Intelligence Generated Content | Guangjing Huang, Qiong Wu, Jingyi Li, Xu Chen | 2024-06-12 | Federated learning (FL) has emerged as a promising paradigm that enables clients to collaboratively train a shared global model without uploading their local data. |
| Federated Incomplete Multi-View Clustering with Heterogeneous Graph Neural Networks | Xueming Yan, Ziqi Wang, Yaochu Jin | 2024-06-12 | Federated multi-view clustering offers the potential to develop a global clustering model using data distributed across multiple devices. However, current methods face challenges due to the absence of... |
| Emergent Peer-to-Peer Multi-Hub Topology | Mohamed Amine Legheraba, Maria Potop-Butucaru, Sébastien Tixeuil, Serge Fdida | 2024-06-12 | In this paper we propose and evaluate an innovative algorithm that enables the creation of Peer-to-Peer network overlays characterized by emergent multi-hubs. |
| Computational Performance and Energy Efficiency of ARM based HPC servers | Oskar Schirmer | 2024-06-12 | HPC world is dominated by x86 ISA CPUs. This monoculture is not necessarily justified by best performance evaluation, but may inherit from e.g. SW related restrictions on the choice of HW platforms. |
| FDLoRA: Personalized Federated Learning of Large Language Model via Dual LoRA Tuning | Jiaxing QI, Zhongzhi Luan, Shaohan Huang, Carol Fung, Hailong Yang, Depei Qian | 2024-06-12 | Large language models (LLMs) have emerged as important components across various fields, yet their training requires substantial computation resources and abundant labeled data. |
| Class-Wise Federated Averaging for Efficient Personalization | Gyuejeong Lee, Daeyoung Choi | 2024-06-12 | Federated learning (FL) enables collaborative model training across distributed clients without centralizing data. However, existing approaches such as Federated Averaging (FedAvg) often perform poorl... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Enhancing Path Selections with Interference Graphs in Multihop Relay Wireless Networks | Cao Vien Phung, Andre Drummond, Admela Jukan | 2024-06-12 | The multihop relay wireless networks have gained traction due to the emergence of Reconfigurable Intelligent Surfaces (RISs) which can be used as relays in high frequency range wireless network, inclu... |
| Machine Learning-Driven Open-Source Framework for Assessing QoE in Multimedia Networks | Parsa Hassani Shariat Panahi, Amir Hossein Jalilvand, Abolfazl Diyanat | 2024-06-12 | The Internet is integral to modern life, influencing communication, business, and lifestyles globally. As dependence on Internet services grows, the demand for high-quality service delivery increases. |
| MSADM: Large Language Model (LLM) Assisted End-to-End Network Health Management Based on Multi-Scale Semanticization | Fengxiao Tang, Xiaonan Wang, Xun Yuan, Linfeng Luo, Ming Zhao, Tianchi Huang, Nei Kato | 2024-06-12 | Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based... |
| Systematic Literature Review of AI-enabled Spectrum Management in 6G and Future Networks | Bushra Sabir, Shuiqiao Yang, David Nguyen, Nan Wu, Alsharif Abuadbba, Hajime Suzuki, Shangqi Lai, Wei Ni, Ding Ming, Surya Nepal | 2024-06-12 | Artificial Intelligence (AI) has advanced significantly in various domains like healthcare, finance, and cybersecurity, with successes such as DeepMind's medical imaging and Tesla's autonomous vehicle... |
| Efficient Network Traffic Feature Sets for IoT Intrusion Detection | Miguel Silva, João Vitorino, Eva Maia, Isabel Praça | 2024-06-12 | The use of Machine Learning (ML) models in cybersecurity solutions requires high-quality data that is stripped of redundant, missing, and noisy information. |
| Semantic-Aware Resource Allocation Based on Deep Reinforcement Learning for 5G-V2X HetNets | Zhiyu Shao, Qiong Wu, Pingyi Fan, Nan Cheng, Qiang Fan, Jiangzhou Wang | 2024-06-12 | This letter proposes a semantic-aware resource allocation (SARA) framework with flexible duty cycle (DC) coexistence mechanism (SARADC) for 5G-V2X Heterogeneous Network (HetNets) based on deep reinfor... |
| Demonstration of Safe Electromagnetic Radiation Emitted by 5G Active Antenna Systems | Sumit Kumar, Chandan Kumar Sheemar, Abdelrahman Astro, Jorge Querol, Symeon Chatzinotas | 2024-06-12 | The careful planning and safe deployment of 5G technologies will bring enormous benefits to society and the economy. Higher frequency, beamforming, and small-cells are key technologies that will provi... |
| Content Provider Contributions to Capacity Expansion of a Neutral ISP: Effect of Private Option | Pranay Agarwal, D. Manjunath | 2024-06-12 | Increasing content consumption by users and the expectation of a better Internet experience requires Internet service providers (ISPs) to expand the capacity of the access network continually. |
| Toward Enhanced Reinforcement Learning-Based Resource Management via Digital Twin: Opportunities, Applications, and Challenges | Nan Cheng, Xiucheng Wang, Zan Li, Zhisheng Yin, Tom Luan, Xuemin Shen | 2024-06-12 | This article presents a digital twin (DT)-enhanced reinforcement learning (RL) framework aimed at optimizing performance and reliability in network resource management, since the traditional RL method... |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| ProTrain: Efficient LLM Training via Memory-Aware Techniques | Hanmei Yang, Jin Zhou, Yao Fu, Xiaoqun Wang, Ramine Roane, Hui Guan, Tongping Liu | 2024-06-12 | It is extremely memory-hungry to train Large Language Models (LLM). To solve this problem, existing work exploits the combination of CPU and GPU for the training process, such as ZeRO-Offload. |
| It's all about PR -- Smart Benchmarking AI Accelerators using Performance Representatives | Alexander Louis-Ferdinand Jung, Jannik Steinmetz, Jonathan Gietz, Konstantin Lübeck, Oliver Bringmann | 2024-06-12 | Statistical models are widely used to estimate the performance of commercial off-the-shelf (COTS) AI hardware accelerators. However, training of statistical performance models often requires vast amou... |
| ONNXim: A Fast, Cycle-level Multi-core NPU Simulator | Hyungkyu Ham, Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Guseul Heo, Sangyeop Lee, Jongse Park, Gwangsun Kim | 2024-06-12 | As DNNs are widely adopted in various application domains while demanding increasingly higher compute and memory requirements, designing efficient and performant NPUs (Neural Processing Units) is beco... |
