2024-11-25

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths	Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan	2024-11-25	Download	The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph.
DocEDA: Automated Extraction and Design of Analog Circuits from Documents with Large Language Model	Hong Cai Chen, Longchang Wu, Ming Gao, Lingrui Shen, Jiarui Zhong, Yipin Xu	2024-11-25	Download	Efficient and accurate extraction of electrical parameters from circuit datasheets and design documents is critical for accelerating circuit design in Electronic Design Automation (EDA).
SynDCIM: A Performance-Aware Digital Computing-in-Memory Compiler with Multi-Spec-Oriented Subcircuit Synthesis	Kunming Shao, Fengshi Tian, Xiaomeng Wang, Jiakun Zheng, Jia Chen, Jingyu He, Hui Wu, Jinbo Chen, Xihao Guan, Yi Deng, Fengbin Tu, Jie Yang, Mohamad Sawan, Tim Kwang-Ting Cheng, Chi-Ying Tsui	2024-11-25	Download	Digital Computing-in-Memory (DCIM) is an innovative technology that integrates multiply-accumulation (MAC) logic directly into memory arrays to enhance the performance of modern AI computing.
A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference	Pol Puigdemont, Enrico Russo, Axel Wassington, Abhijit Das, Sergi Abadal, Maurizio Palesi	2024-11-25	Download	Graph Neural Networks (GNNs) have shown significant promise in various domains, such as recommendation systems, bioinformatics, and network analysis.
From CISC to RISC: language-model guided assembly transpilation	Ahmed Heakl, Chaimaa Abi, Rania Hossam, Abdulrahman Mahmoud	2024-11-25	Download	The transition from x86 to ARM architecture is becoming increasingly common across various domains, primarily driven by ARM's energy efficiency and improved performance across traditional sectors.
Dataflow Optimized Reconfigurable Acceleration for FEM-based CFD Simulations	Anastassis Kapetanakis, Aggelos Ferikoglou, George Anagnostopoulos, Sotirios Xydis	2024-11-25	Download	Computational Fluid Dynamics (CFD) simulations are essential for analyzing and optimizing fluid flows in a wide range of real-world applications.
UVLLM: An Automated Universal RTL Verification Framework using LLMs	Yuchen Hu, Junhao Ye, Ke Xu, Jialin Sun, Shiyue Zhang, Xinyao Jiao, Dingrong Pan, Jie Zhou, Ning Wang, Weiwei Shan, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang	2024-11-25	Download	Verifying hardware designs in embedded systems is crucial but often labor-intensive and time-consuming. While existing solutions have improved automation, they frequently rely on unrealistic assumptio...
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference	Yu Zhang, Mingzi Wang, Lancheng Zou, Wulong Liu, Hui-Ling Zhen, Mingxuan Yuan, Bei Yu	2024-11-25	Download	Transformer-based large language models (LLMs) have achieved remarkable success as model sizes continue to grow, yet their deployment remains challenging due to significant computational and memory de...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach	Xiaoteng Liu, Pavly Halim	2024-11-25	Download	Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency.
OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths	Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan	2024-11-25	Download	The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph.
Observability in Fog Computing	Aleteia Araujo, Breno Costa, Joao Bachiega, Leonardo R. Carvalho, Rajkumar Buyya	2024-11-25	Download	Fog Computing provides computational resources close to the end user, supporting low-latency and high-bandwidth communications. It supports IoT applications, enabling real-time data processing, analyt...
K8s Pro Sentinel: Extend Secret Security in Kubernetes Cluster	Kavindu Gunathilake, Indrajith Ekanayake	2024-11-25	Download	Microservice architecture is widely adopted among distributed systems. It follows the modular approach that decomposes large software applications into independent services.
Lion Cub: Minimizing Communication Overhead in Distributed Lion	Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden	2024-11-25	Download	Communication overhead is a key challenge in distributed deep learning, especially on slower Ethernet interconnects, and given current hardware trends, communication is likely to become a major bottle...
Proxima. A DAG based cooperative distributed ledger	Evaldas Drasutis	2024-11-25	Download	This paper introduces a novel architecture for a distributed ledger, commonly referred to as a "blockchain", which is organized in the form of directed acyclic graph (DAG) with UTXO transactions as ve...
Truffle: Efficient Data Passing for Data-Intensive Serverless Workflows in the Edge-Cloud Continuum	Cynthia Marcelino, Stefan Nastic	2024-11-25	Download	Serverless computing promises a scalable, reliable, and cost-effective solution for running data-intensive applications and workflows in the heterogeneous and limited-resource environment of the Edge-...
A Framework for Consistency Models in Distributed Systems	Paulo Sérgio Almeida	2024-11-25	Download	We define an axiomatic timeless framework for asynchronous distributed systems, together with well-formedness and consistency axioms, which unifies and generalizes the expressive power of current appr...
Scalable Fault-Tolerant MapReduce	Demian Hespe, Lukas Hübner, Charel Mercatoris, Peter Sanders	2024-11-25	Download	Supercomputers getting ever larger and energy-efficient is at odds with the reliability of the used hardware. Thus, the time intervals between component failures are decreasing.
HeteroTune: Efficient Federated Learning for Large Heterogeneous Models	Ruofan Jia, Weiying Xie, Jie Lei, Jitao Ma, Haonan Qin, Leyuan Fang	2024-11-25	Download	While large pre-trained models have achieved impressive performance across AI tasks, their deployment in privacy-sensitive and distributed environments remains challenging.
Energy-aware operation of HPC systems in Germany	Estela Suarez, Hendryk Bockelmann, Norbert Eicker, Jan Eitzinger, Salem El Sayed, Thomas Fieseler, Martin Frank, Peter Frech, Pay Giesselmann, Daniel Hackenberg, Georg Hager, Andreas Herten, Thomas Ilsche, Bastian Koller, Erwin Laure, Cristina Manzano, Sebastian Oeste, Michael Ott, Klaus Reuter, Ralf Schneider, Kay Thust, Benedikt von St. Vieth	2024-11-25	Download	High-Performance Computing (HPC) systems are among the most energy-intensive scientific facilities, with electric power consumption reaching and often exceeding 20 megawatts per installation.
Staleness-Centric Optimizations for Parallel Diffusion MoE Inference	Jiajun Luo, Lizhuo Luo, Jianru Xu, Jiajun Song, Rongwei Lu, Chen Tang, Zhi Wang	2024-11-25	Download	Mixture-of-Experts-based (MoE-based) diffusion models demonstrate remarkable scalability in high-fidelity image generation, yet their reliance on expert parallelism introduces critical communication b...
HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge Platforms	Zain Taufique, Aman Vyas, Antonio Miele, Pasi Liljeberg, Anil Kanduri	2024-11-25	Download	Edge inference techniques partition and distribute Deep Neural Network (DNN) inference tasks among multiple edge nodes for low latency inference, without considering the core-level heterogeneity of ed...
Data Processing Efficiency Aware User Association and Resource Allocation in Blockchain Enabled Metaverse over Wireless Communications	Liangxin Qian, Jun Zhao	2024-11-25	Download	In the rapidly evolving landscape of the Metaverse, enhanced by blockchain technology, the efficient processing of data has emerged as a critical challenge, especially in wireless communication system...
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers	Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib	2024-11-25	Download	Graph Convolutional Networks (GCNs), particularly for large-scale graphs, are crucial across numerous domains. However, training distributed full-batch GCNs on large-scale graphs suffers from ineffici...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Enabling Skip Graphs to Process K-Dimensional Range Queries in a Mobile Sensor Network	Gregory J. Brault, Christopher James Augeri, Barry E. Mullins, Rusty O. Baldwin, Christopher B. Mayer	2024-11-25	Download	A skip graph is a resilient application-layer routing structure that supports range queries of distributed k-dimensional data. By sorting deterministic keys into groups based on locally computed rando...
Generative vs. Predictive Models in Massive MIMO Channel Prediction	Ju-Hyung Lee, Joohan Lee, Andreas F. Molisch	2024-11-25	Download	Massive MIMO (mMIMO) systems are essential for 5G/6G networks to meet high throughput and reliability demands, with machine learning (ML)-based techniques, particularly autoencoders (AEs), showing pro...
Poster: Could Large Language Models Perform Network Management?	Zine el abidine Kherroubi, Monika Prakash, Jean-Pierre Giacalone, Michael Baddeley	2024-11-25	Download	Modern wireless communication systems have become increasingly complex due to the proliferation of wireless devices, increasing performance standards, and growing security threats.
Static and Dynamic Routing, Fiber, Modulation Format, and Spectrum Allocation in Hybrid ULL Fiber-SSMF Elastic Optical Networks	Kangao Ouyang, Fengxian Tang, Zhilin Yuan, Jun Li, Yongcheng Li	2024-11-25	Download	Traditional standard single-mode fibers (SSMF) are unable to satisfy the future long-distance and high-speed optical channel transmission requirement due to their relatively large signal losses.
Data Processing Efficiency Aware User Association and Resource Allocation in Blockchain Enabled Metaverse over Wireless Communications	Liangxin Qian, Jun Zhao	2024-11-25	Download	In the rapidly evolving landscape of the Metaverse, enhanced by blockchain technology, the efficient processing of data has emerged as a critical challenge, especially in wireless communication system...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Asynchronous I/O -- With Great Power Comes Great Responsibility	Constantin Pestka, Marcus Paradies, Matthias Pohl	2024-11-25	Download	The performance of storage hardware has improved vastly recently, leaving the traditional I/O stack incapable of exploiting these gains due to increasingly large relative overheads.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach	Xiaoteng Liu, Pavly Halim	2024-11-25	Download	Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency.
OPMOS: Ordered Parallel Algorithm for Multi-Objective Shortest-Paths	Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan	2024-11-25	Download	The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph.
Optimizing Winograd Convolution on ARMv8 processors	Haoyuan Gui, Xiaoyu Zhang, Chong Zhang, Zitong Su, Huiyuan Li	2024-11-25	Download	As Convolutional Neural Networks (CNNs) gain prominence in deep learning, algorithms like Winograd Convolution have been introduced to enhance computational efficiency.
DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs	Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang	2024-11-25	Download	Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train...
Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers	Chen Zhuang, Lingqi Zhang, Du Wu, Peng Chen, Jiajun Huang, Xin Liu, Rio Yokota, Nikoli Dryden, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib	2024-11-25	Download	Graph Convolutional Networks (GCNs), particularly for large-scale graphs, are crucial across numerous domains. However, training distributed full-batch GCNs on large-scale graphs suffers from ineffici...