2024-11-25
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths | Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan | 2024-11-25 | Download | The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph. |
| DocEDA: Automated Extraction and Design of Analog Circuits from Documents with Large Language Model | Hong Cai Chen, Longchang Wu, Ming Gao, Lingrui Shen, Jiarui Zhong, Yipin Xu | 2024-11-25 | Download | Efficient and accurate extraction of electrical parameters from circuit datasheets and design documents is critical for accelerating circuit design in Electronic Design Automation (EDA). |
| SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis | Kunming Shao, Fengshi Tian, Xiaomeng Wang, Jiakun Zheng, Jia Chen, Jingyu He, Hui Wu, Jinbo Chen, Xihao Guan, Yi Deng, Fengbin Tu, Jie Yang, Mohamad Sawan, Tim Kwang-Ting Cheng, Chi-Ying Tsui | 2024-11-25 | Download | Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing. |
| A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference | Pol Puigdemont, Enrico Russo, Axel Wassington, Abhijit Das, Sergi Abadal, Maurizio Palesi | 2024-11-25 | Download | Graph Neural Networks (GNNs) have shown significant promise in various domains, such as recommendation systems, bioinformatics, and network analysis. |
| From CISC to RISC: language-model guided assembly transpilation | Ahmed Heakl, Chaimaa Abi, Rania Hossam, Abdulrahman Mahmoud | 2024-11-25 | Download | The transition from x86 to ARM architecture is becoming increasingly common across various domains, primarily driven by ARM's energy efficiency and improved performance across traditional sectors. |
| Dataflow Optimized Reconfigurable Acceleration for FEM-based CFD Simulations | Anastassis Kapetanakis, Aggelos Ferikoglou, George Anagnostopoulos, Sotirios Xydis | 2024-11-25 | Download | Computational Fluid Dynamics (CFD) simulations are essential for analyzing and optimizing fluid flows in a wide range of real-world applications. |
| UVLLM: An Automated Universal RTL Verification Framework using LLMs | Yuchen Hu, Junhao Ye, Ke Xu, Jialin Sun, Shiyue Zhang, Xinyao Jiao, Dingrong Pan, Jie Zhou, Ning Wang, Weiwei Shan, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang | 2024-11-25 | Download | Verifying hardware designs in embedded systems is crucial but often labor-intensive and time-consuming. While existing solutions have improved automation, they frequently rely on unrealistic assumptio... |
| MixPE: Quantization and Hardware Co-design for Efficient LLM Inference | Yu Zhang, Mingzi Wang, Lancheng Zou, Wulong Liu, Hui-Ling Zhen, Mingxuan Yuan, Bei Yu | 2024-11-25 | Download | Transformer-based large language models (LLMs) have achieved remarkable success as model sizes continue to grow, yet their deployment remains challenging due to significant computational and memory de... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach | Xiaoteng Liu, Pavly Halim | 2024-11-25 | Download | Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency. |
| OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths | Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan | 2024-11-25 | Download | The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph. |
| Observability in Fog Computing | Aleteia Araujo, Breno Costa, Joao Bachiega, Leonardo R. Carvalho, Rajkumar Buyya | 2024-11-25 | Download | Fog Computing provides computational resources close to the end user, supporting low-latency and high-bandwidth communications. It supports IoT applications, enabling real-time data processing, analyt... |
| K8s Pro Sentinel: Extend Secret Security in Kubernetes Cluster | Kavindu Gunathilake, Indrajith Ekanayake | 2024-11-25 | Download | Microservice architecture is widely adopted among distributed systems. It follows the modular approach that decomposes large software applications into independent services. |
| Lion Cub: Minimizing Communication Overhead in Distributed Lion | Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden | 2024-11-25 | Download | Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottle... |
| Proxima. A DAG based cooperative distributed ledger | Evaldas Drasutis | 2024-11-25 | Download | This paper introduces a novel architecture for a distributed ledger, commonly referred to as a "blockchain", which is organized in the form of directed acyclic graph (DAG) with UTXO transactions as ve... |
| Truffle: Efficient Data Passing for Data-Intensive Serverless Workflows in the Edge-Cloud Continuum | Cynthia Marcelino, Stefan Nastic | 2024-11-25 | Download | Serverless computing promises a scalable, reliable, and cost-effective solution for running data-intensive applications and workflows in the heterogeneous and limited-resource environment of the Edge-... |
| A Framework for Consistency Models in Distributed Systems | Paulo Sérgio Almeida | 2024-11-25 | Download | We define an axiomatic timeless framework for asynchronous distributed systems, together with well-formedness and consistency axioms, which unifies and generalizes the expressive power of current appr... |
| Scalable Fault-Tolerant MapReduce | Demian Hespe, Lukas Hübner, Charel Mercatoris, Peter Sanders | 2024-11-25 | Download | Supercomputers getting ever larger and energy-efficient is at odds with the reliability of the used hardware. Thus, the time intervals between component failures are decreasing. |
| HeteroTune: Efficient Federated Learning for Large Heterogeneous Models | Ruofan Jia, Weiying Xie, Jie Lei, Jitao Ma, Haonan Qin, Leyuan Fang | 2024-11-25 | Download | While large pre-trained models have achieved impressive performance across AI tasks, their deployment in privacy-sensitive and distributed environments remains challenging. |
| Energy-aware operation of HPC systems in Germany | Estela Suarez, Hendryk Bockelmann, Norbert Eicker, Jan Eitzinger, Salem El Sayed, Thomas Fieseler, Martin Frank, Peter Frech, Pay Giesselmann, Daniel Hackenberg, Georg Hager, Andreas Herten, Thomas Ilsche, Bastian Koller, Erwin Laure, Cristina Manzano, Sebastian Oeste, Michael Ott, Klaus Reuter, Ralf Schneider, Kay Thust, Benedikt von St. Vieth | 2024-11-25 | Download | High-Performance Computing (HPC) systems are among the most energy-intensive scientific facilities, with electric power consumption reaching and often exceeding 20 megawatts per installation. |
| Staleness-Centric Optimizations for Parallel Diffusion MoE Inference | Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang | 2024-11-25 | Download | Mixture-of-Experts-based (MoE-based) diffusion models demonstrate remarkable scalability in high-fidelity image generation, yet their reliance on expert parallelism introduces critical communication b... |
| HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge Platforms | Zain Taufique, Aman Vyas, Antonio Miele, Pasi Liljeberg, Anil Kanduri | 2024-11-25 | Download | Edge inference techniques partition and distribute Deep Neural Network (DNN) inference tasks among multiple edge nodes for low latency inference, without considering the core-level heterogeneity of ed... |
| Data Processing Efficiency Aware User Association and Resource Allocation in Blockchain Enabled Metaverse over Wireless Communications | Liangxin Qian, Jun Zhao | 2024-11-25 | Download | In the rapidly evolving landscape of the Metaverse, enhanced by blockchain technology, the efficient processing of data has emerged as a critical challenge, especially in wireless communication system... |
| Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers | Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib | 2024-11-25 | Download | Graph Convolutional Networks (GCNs), particularly for large-scale graphs, are crucial across numerous domains. However, training distributed full-batch GCNs on large-scale graphs suffers from ineffici... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Enabling Skip Graphs to Process K-Dimensional Range Queries in a Mobile Sensor Network | Gregory J. Brault, Christopher James Augeri, Barry E. Mullins, Rusty O. Baldwin, Christopher B. Mayer | 2024-11-25 | Download | A skip graph is a resilient application-layer routing structure that supports range queries of distributed k-dimensional data. By sorting deterministic keys into groups based on locally computed rando... |
| Generative vs. Predictive Models in Massive MIMO Channel Prediction | Ju-Hyung Lee, Joohan Lee, Andreas F. Molisch | 2024-11-25 | Download | Massive MIMO (mMIMO) systems are essential for 5G/6G networks to meet high throughput and reliability demands, with machine learning (ML)-based techniques, particularly autoencoders (AEs), showing pro... |
| Poster: Could Large Language Models Perform Network Management? | Zine el abidine Kherroubi, Monika Prakash, Jean-Pierre Giacalone, Michael Baddeley | 2024-11-25 | Download | Modern wireless communication systems have become increasingly complex due to the proliferation of wireless devices, increasing performance standards, and growing security threats. |
| Static and Dynamic Routing, Fiber, Modulation Format, and Spectrum Allocation in Hybrid ULL Fiber-SSMF Elastic Optical Networks | Kangao Ouyang, Fengxian Tang, Zhilin Yuan, Jun Li, Yongcheng Li | 2024-11-25 | Download | Traditional standard single-mode fibers (SSMF) are unable to satisfy the future long-distance and high-speed optical channel transmission requirement due to their relatively large signal losses. |
| Data Processing Efficiency Aware User Association and Resource Allocation in Blockchain Enabled Metaverse over Wireless Communications | Liangxin Qian, Jun Zhao | 2024-11-25 | Download | In the rapidly evolving landscape of the Metaverse, enhanced by blockchain technology, the efficient processing of data has emerged as a critical challenge, especially in wireless communication system... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Asynchronous I/O -- With Great Power Comes Great Responsibility | Constantin Pestka, Marcus Paradies, Matthias Pohl | 2024-11-25 | Download | The performance of storage hardware has improved vastly recently, leaving the traditional I/O stack incapable of exploiting these gains due to increasingly large relative overheads. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach | Xiaoteng Liu, Pavly Halim | 2024-11-25 | Download | Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency. |
| OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths | Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan | 2024-11-25 | Download | The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph. |
| Optimizing Winograd Convolution on ARMv8 processors | Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li | 2024-11-25 | Download | As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency. |
| DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs | Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang | 2024-11-25 | Download | Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train... |
| Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers | Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib | 2024-11-25 | Download | Graph Convolutional Networks (GCNs), particularly for large-scale graphs, are crucial across numerous domains. However, training distributed full-batch GCNs on large-scale graphs suffers from ineffici... |