
2024-11-25

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths | Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan | 2024-11-25 | Download | The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph.
DocEDA: Automated Extraction and Design of Analog Circuits from Documents with Large Language Model | Hong Cai Chen, Longchang Wu, Ming Gao, Lingrui Shen, Jiarui Zhong, Yipin Xu | 2024-11-25 | Download | Efficient and accurate extraction of electrical parameters from circuit datasheets and design documents is critical for accelerating circuit design in Electronic Design Automation (EDA).
SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis | Kunming Shao, Fengshi Tian, Xiaomeng Wang, Jiakun Zheng, Jia Chen, Jingyu He, Hui Wu, Jinbo Chen, Xihao Guan, Yi Deng, Fengbin Tu, Jie Yang, Mohamad Sawan, Tim Kwang-Ting Cheng, Chi-Ying Tsui | 2024-11-25 | Download | Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing.
A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference | Pol Puigdemont, Enrico Russo, Axel Wassington, Abhijit Das, Sergi Abadal, Maurizio Palesi | 2024-11-25 | Download | Graph Neural Networks (GNNs) have shown significant promise in various domains, such as recommendation systems, bioinformatics, and network analysis.
From CISC to RISC: language-model guided assembly transpilation | Ahmed Heakl, Chaimaa Abi, Rania Hossam, Abdulrahman Mahmoud | 2024-11-25 | Download | The transition from x86 to ARM architecture is becoming increasingly common across various domains, primarily driven by ARM's energy efficiency and improved performance across traditional sectors.
Dataflow Optimized Reconfigurable Acceleration for FEM-based CFD Simulations | Anastassis Kapetanakis, Aggelos Ferikoglou, George Anagnostopoulos, Sotirios Xydis | 2024-11-25 | Download | Computational Fluid Dynamics (CFD) simulations are essential for analyzing and optimizing fluid flows in a wide range of real-world applications.
UVLLM: An Automated Universal RTL Verification Framework using LLMs | Yuchen Hu, Junhao Ye, Ke Xu, Jialin Sun, Shiyue Zhang, Xinyao Jiao, Dingrong Pan, Jie Zhou, Ning Wang, Weiwei Shan, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang | 2024-11-25 | Download | Verifying hardware designs in embedded systems is crucial but often labor-intensive and time-consuming. While existing solutions have improved automation, they frequently rely on unrealistic assumptio...
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference | Yu Zhang, Mingzi Wang, Lancheng Zou, Wulong Liu, Hui-Ling Zhen, Mingxuan Yuan, Bei Yu | 2024-11-25 | Download | Transformer-based large language models (LLMs) have achieved remarkable success as model sizes continue to grow, yet their deployment remains challenging due to significant computational and memory de...
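The MOS problem named in the OPMOS entry can be made concrete with a small sketch. It assumes non-negative cost vectors and an adjacency-dict graph, both illustrative assumptions. The function `pareto_costs` is a hypothetical name, and it implements a simple sequential label-correcting search, not the OPMOS parallel algorithm. Each node keeps only non-dominated cost labels, and the function returns the Pareto-optimal cost vectors that reach the destination.

```python
from collections import deque


def dominates(a, b):
    """True if cost vector a Pareto-dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_costs(graph, start, goal, num_objectives):
    """Return the sorted Pareto-optimal cost vectors of start->goal paths.

    graph maps each node to a list of (neighbor, cost_vector) edges with
    non-negative costs (an assumption of this sketch). Every node keeps a
    list of mutually non-dominated labels; a FIFO queue re-expands labels
    until no new non-dominated label appears.
    """
    origin = tuple([0] * num_objectives)
    labels = {start: [origin]}
    queue = deque([(start, origin)])
    while queue:
        node, cost = queue.popleft()
        # Skip labels that were pruned after they were queued.
        if cost not in labels.get(node, []):
            continue
        for nbr, edge in graph.get(node, []):
            new = tuple(c + e for c, e in zip(cost, edge))
            existing = labels.setdefault(nbr, [])
            # Discard the new label if an existing one equals or dominates it.
            if any(old == new or dominates(old, new) for old in existing):
                continue
            # Drop labels the new one dominates, then record and expand it.
            existing[:] = [old for old in existing if not dominates(new, old)]
            existing.append(new)
            queue.append((nbr, new))
    return sorted(labels.get(goal, []))
```

Label sets can grow quickly with the number of objectives and alternative paths, so this sequential sketch is only meant for small graphs.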

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach | Xiaoteng Liu, Pavly Halim | 2024-11-25 | Download | Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency.
OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths | Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan | 2024-11-25 | Download | The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph.
Observability in Fog Computing | Aleteia Araujo, Breno Costa, Joao Bachiega, Leonardo R. Carvalho, Rajkumar Buyya | 2024-11-25 | Download | Fog Computing provides computational resources close to the end user, supporting low-latency and high-bandwidth communications. It supports IoT applications, enabling real-time data processing, analyt...
K8s Pro Sentinel: Extend Secret Security in Kubernetes Cluster | Kavindu Gunathilake, Indrajith Ekanayake | 2024-11-25 | Download | Microservice architecture is widely adopted among distributed systems. It follows the modular approach that decomposes large software applications into independent services.
Lion Cub: Minimizing Communication Overhead in Distributed Lion | Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden | 2024-11-25 | Download | Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottle...
Proxima. A DAG based cooperative distributed ledger | Evaldas Drasutis | 2024-11-25 | Download | This paper introduces a novel architecture for a distributed ledger, commonly referred to as a "blockchain", which is organized in the form of a directed acyclic graph (DAG) with UTXO transactions as ve...
Truffle: Efficient Data Passing for Data-Intensive Serverless Workflows in the Edge-Cloud Continuum | Cynthia Marcelino, Stefan Nastic | 2024-11-25 | Download | Serverless computing promises a scalable, reliable, and cost-effective solution for running data-intensive applications and workflows in the heterogeneous and limited-resource environment of the Edge-...
A Framework for Consistency Models in Distributed Systems | Paulo Sérgio Almeida | 2024-11-25 | Download | We define an axiomatic timeless framework for asynchronous distributed systems, together with well-formedness and consistency axioms, which unifies and generalizes the expressive power of current appr...
Scalable Fault-Tolerant MapReduce | Demian Hespe, Lukas Hübner, Charel Mercatoris, Peter Sanders | 2024-11-25 | Download | Supercomputers getting ever larger and energy-efficient is at odds with the reliability of the used hardware. Thus, the time intervals between component failures are decreasing.
HeteroTune: Efficient Federated Learning for Large Heterogeneous Models | Ruofan Jia, Weiying Xie, Jie Lei, Jitao Ma, Haonan Qin, Leyuan Fang | 2024-11-25 | Download | While large pre-trained models have achieved impressive performance across AI tasks, their deployment in privacy-sensitive and distributed environments remains challenging.
Energy-aware operation of HPC systems in Germany | Estela Suarez, Hendryk Bockelmann, Norbert Eicker, Jan Eitzinger, Salem El Sayed, Thomas Fieseler, Martin Frank, Peter Frech, Pay Giesselmann, Daniel Hackenberg, Georg Hager, Andreas Herten, Thomas Ilsche, Bastian Koller, Erwin Laure, Cristina Manzano, Sebastian Oeste, Michael Ott, Klaus Reuter, Ralf Schneider, Kay Thust, Benedikt von St. Vieth | 2024-11-25 | Download | High-Performance Computing (HPC) systems are among the most energy-intensive scientific facilities, with electric power consumption reaching and often exceeding 20 megawatts per installation.
Staleness-Centric Optimizations for Parallel Diffusion MoE Inference | Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang | 2024-11-25 | Download | Mixture-of-Experts-based (MoE-based) diffusion models demonstrate remarkable scalability in high-fidelity image generation, yet their reliance on expert parallelism introduces critical communication b...
HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge Platforms | Zain Taufique, Aman Vyas, Antonio Miele, Pasi Liljeberg, Anil Kanduri | 2024-11-25 | Download | Edge inference techniques partition and distribute Deep Neural Network (DNN) inference tasks among multiple edge nodes for low latency inference, without considering the core-level heterogeneity of ed...
Data Processing Efficiency Aware User Association and Resource Allocation in Blockchain Enabled Metaverse over Wireless Communications | Liangxin Qian, Jun Zhao | 2024-11-25 | Download | In the rapidly evolving landscape of the Metaverse, enhanced by blockchain technology, the efficient processing of data has emerged as a critical challenge, especially in wireless communication system...
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers | Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib | 2024-11-25 | Download | Graph Convolutional Networks (GCNs), particularly for large-scale graphs, are crucial across numerous domains. However, training distributed full-batch GCNs on large-scale graphs suffers from ineffici...

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
Enabling Skip Graphs to Process K-Dimensional Range Queries in a Mobile Sensor Network | Gregory J. Brault, Christopher James Augeri, Barry E. Mullins, Rusty O. Baldwin, Christopher B. Mayer | 2024-11-25 | Download | A skip graph is a resilient application-layer routing structure that supports range queries of distributed k-dimensional data. By sorting deterministic keys into groups based on locally computed rando...
Generative vs. Predictive Models in Massive MIMO Channel Prediction | Ju-Hyung Lee, Joohan Lee, Andreas F. Molisch | 2024-11-25 | Download | Massive MIMO (mMIMO) systems are essential for 5G/6G networks to meet high throughput and reliability demands, with machine learning (ML)-based techniques, particularly autoencoders (AEs), showing pro...
Poster: Could Large Language Models Perform Network Management? | Zine el abidine Kherroubi, Monika Prakash, Jean-Pierre Giacalone, Michael Baddeley | 2024-11-25 | Download | Modern wireless communication systems have become increasingly complex due to the proliferation of wireless devices, increasing performance standards, and growing security threats.
Static and Dynamic Routing, Fiber, Modulation Format, and Spectrum Allocation in Hybrid ULL Fiber-SSMF Elastic Optical Networks | Kangao Ouyang, Fengxian Tang, Zhilin Yuan, Jun Li, Yongcheng Li | 2024-11-25 | Download | Traditional standard single-mode fibers (SSMF) are unable to satisfy the future long-distance and high-speed optical channel transmission requirement due to their relatively large signal losses.
Data Processing Efficiency Aware User Association and Resource Allocation in Blockchain Enabled Metaverse over Wireless Communications | Liangxin Qian, Jun Zhao | 2024-11-25 | Download | In the rapidly evolving landscape of the Metaverse, enhanced by blockchain technology, the efficient processing of data has emerged as a critical challenge, especially in wireless communication system...
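The skip-graph entry describes keys sorted into groups based on locally computed random values. A minimal centralized simulation of the standard skip-graph construction follows. It assumes one-dimensional integer keys and has each node draw a random bit string, its membership vector; the paper's k-dimensional scheme and mobile-sensor setting are not modeled. The class `SkipGraph` and its methods are hypothetical names for illustration. Level 0 holds every key in sorted order, and level i groups keys that share the first i bits of their membership vectors. A range query routes greedily from the top level down to the lower bound, then walks right along level 0.

```python
import random
from collections import defaultdict


class SkipGraph:
    """Centralized simulation of a skip graph over 1-D integer keys (illustrative only)."""

    def __init__(self, keys, max_level=3, seed=0):
        rng = random.Random(seed)
        # Each key draws its membership vector independently (simulated locally here).
        self.mvec = {k: tuple(rng.randint(0, 1) for _ in range(max_level)) for k in keys}
        self.max_level = max_level
        # lists[i][prefix] -> sorted keys whose membership vectors start with prefix.
        self.lists = []
        for level in range(max_level + 1):
            groups = defaultdict(list)
            for k in sorted(keys):
                groups[self.mvec[k][:level]].append(k)
            self.lists.append(dict(groups))

    def neighbors(self, key, level):
        """Return (left, right) neighbors of key in its list at this level (None at the ends)."""
        row = self.lists[level][self.mvec[key][:level]]
        i = row.index(key)
        left = row[i - 1] if i > 0 else None
        right = row[i + 1] if i + 1 < len(row) else None
        return left, right

    def route(self, start, target):
        """Greedy top-down search: move toward target without overshooting, then drop a level."""
        node = start
        for level in range(self.max_level, -1, -1):
            while True:
                left, right = self.neighbors(node, level)
                if node < target and right is not None and right <= target:
                    node = right
                elif node > target and left is not None and left >= target:
                    node = left
                else:
                    break
        return node

    def range_query(self, start, lo, hi):
        """Return all keys in [lo, hi], routing from the node that holds key `start`."""
        node = self.route(start, lo)
        if node < lo:
            node = self.neighbors(node, 0)[1]  # first key >= lo, or None
        found = []
        while node is not None and node <= hi:
            found.append(node)
            node = self.neighbors(node, 0)[1]
        return found
```

Because level 0 always contains every key, the query result does not depend on the random seed; the seed only changes which higher-level shortcuts the routing step can use.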

cs.OS - Operating Systems

Title | Authors | Published | PDF | Abstract
Asynchronous I/O -- With Great Power Comes Great Responsibility | Constantin Pestka, Marcus Paradies, Matthias Pohl | 2024-11-25 | Download | The performance of storage hardware has improved vastly recently, leaving the traditional I/O stack incapable of exploiting these gains due to increasingly large relative overheads.

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach | Xiaoteng Liu, Pavly Halim | 2024-11-25 | Download | Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency.
OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths | Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan | 2024-11-25 | Download | The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph.
Optimizing Winograd Convolution on ARMv8 processors | Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li | 2024-11-25 | Download | As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency.
DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs | Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang | 2024-11-25 | Download | Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train...
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers | Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib | 2024-11-25 | Download | Graph Convolutional Networks (GCNs), particularly for large-scale graphs, are crucial across numerous domains. However, training distributed full-batch GCNs on large-scale graphs suffers from ineffici...
