2025-04-16
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks | Stefan Abi-Karam, Cong Hao | 2025-04-16 | Download | The rapid scaling of large language model (LLM) training and inference has driven their adoption in semiconductor design across academia and industry. |
| Subitizing-Inspired Large Language Models for Floorplanning | Shao-Chien Lu, Chen-Chen Yeh, Hui-Lin Cho, Yu-Cheng Lin, Rung-Bin Lin | 2025-04-16 | Download | We present a novel approach to solving the floorplanning problem by leveraging fine-tuned Large Language Models (LLMs). Inspired by subitizing--the human ability to instantly and accurately count smal... |
| Fast Parameter Optimization of Delayed Feedback Reservoir with Backpropagation and Gradient Descent | Sosei Ikeda, Hiromitsu Awano, Takashi Sato | 2025-04-16 | Download | A delayed feedback reservoir (DFR) is a reservoir computing system well-suited for hardware implementations. However, achieving high accuracy in DFRs depends heavily on selecting appropriate hyperpara... |
| Hardware-Friendly Delayed-Feedback Reservoir for Multivariate Time-Series Classification | Sosei Ikeda, Hiromitsu Awano, Takashi Sato | 2025-04-16 | Download | Reservoir computing (RC) is attracting attention as a machine-learning technique for edge computing. In time-series classification tasks, the number of features obtained using a reservoir depends on t... |
| Online Training and Inference System on Edge FPGA Using Delayed Feedback Reservoir | Sosei Ikeda, Hiromitsu Awano, Takashi Sato | 2025-04-16 | Download | A delayed feedback reservoir (DFR) is a hardware-friendly reservoir computing system. Implementing DFRs in embedded hardware requires efficient online training. |
| Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen | 2025-04-16 | Download | Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models | Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan | 2025-04-16 | Download | Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommod... |
| Diffusion Models on the Edge: Challenges, Optimizations, and Applications | Dongqi Zheng | 2025-04-16 | Download | Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge... |
| Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks | Tingyang Sun, Tuan Nguyen, Ting He | 2025-04-16 | Download | Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge. |
| Extrae.jl: Julia bindings for the Extrae HPC Profiler | Sergio Sanchez-Ramirez, Mosè Giordano | 2025-04-16 | Download | The Julia programming language has gained acceptance within the High-Performance Computing (HPC) community due to its ability to tackle the two-language problem: Julia code feels as high-level as Python b... |
| Learning from the Past: Adaptive Parallelism Tuning for Stream Processing Systems | Yuxing Han, Lixiang Chen, Haoyu Wang, Zhanghao Chen, Yifan Zhang, Chengcheng Yang, Kongzhang Hao, Zhengyi Yang | 2025-04-16 | Download | Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators. |
| LO2: Microservice API Anomaly Dataset of Logs and Metrics | Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, Davide Taibi | 2025-04-16 | Download | Context. Microservice-based systems have gained significant attention over the past years. A critical factor for understanding and analyzing the behavior of these systems is the collection of monitori... |
| Combining Declarative and Linear Programming for Application Management in the Cloud-Edge Continuum | Jacopo Massa, Stefano Forti, Patrizio Dazzi, Antonio Brogi | 2025-04-16 | Download | This work investigates the data-aware multi-service application placement problem in Cloud-Edge settings. We previously introduced EdgeWise, a hybrid approach that combines declarative programming wit... |
| Deterministic Parallel High-Quality Hypergraph Partitioning | Robert Krause, Lars Gottesbüren, Nikolai Maas | 2025-04-16 | Download | We present a deterministic parallel multilevel algorithm for balanced hypergraph partitioning that matches the state of the art for non-deterministic algorithms. |
| Scaled Block Vecchia Approximation for High-Dimensional Gaussian Process Emulation on GPUs | Qilong Pan, Sameh Abdulah, Mustafa Abduljabbar, Hatem Ltaief, Andreas Herten, Mathis Bode, Matthew Pratola, Arindam Fadikar, Marc G. Genton, David E. Keyes, Ying Sun | 2025-04-16 | Download | Emulating computationally intensive scientific simulations is crucial for enabling uncertainty quantification, optimization, and informed decision-making at scale. |
| FedCanon: Non-Convex Composite Federated Learning with Efficient Proximal Operation on Heterogeneous Data | Yuan Zhou, Jiachen Zhong, Xinli Shi, Guanghui Wen, Xinghuo Yu | 2025-04-16 | Download | Composite federated learning offers a general framework for solving machine learning problems with additional regularization terms. However, existing methods often face significant limitations: many r... |
| Benchmarking Mutual Information-based Loss Functions in Federated Learning | Sarang S, Harsh D. Chothani, Qilei Li, Ahmed M. Abdelmoniem, Arnab K. Paul | 2025-04-16 | Download | Federated Learning (FL) has attracted considerable interest due to growing privacy concerns and regulations like the General Data Protection Regulation (GDPR), which stresses the importance of privacy... |
| When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications | Sören Henning, Adriano Vogel, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser | 2025-04-16 | Download | Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems. |
| Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Kihyun Kim, Jinwoo Kim, Hyunsun Chung, Myung-Hoon Cha, Hong-Yeon Kim, Youngjae Kim | 2025-04-16 | Download | LLM inference is essential for applications like text summarization, translation, and data analysis, but the high cost of GPU instances from Cloud Service Providers (CSPs) like AWS is a major burden. |
| Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery | Namitha Liyanage, Yue Wu, Emmet Houghton, Lin Zhong | 2025-04-16 | Download | Existing real-time decoders for surface codes are limited to isolated logical qubits and do not support logical operations involving multiple logical qubits. |
| Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen | 2025-04-16 | Download | Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G... |
| Scalability Optimization in Cloud-Based AI Inference Services: Strategies for Real-Time Load Balancing and Automated Scaling | Yihong Jin, Ze Yang | 2025-04-16 | Download | The rapid expansion of AI inference services in the cloud necessitates a robust scalability solution to manage dynamic workloads and maintain high performance. |
| TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU | Shixun Wu, Yujia Zhai, Huangliang Dai, Hairui Zhao, Yue Zhu, Haiyang Hu, Zizhong Chen | 2025-04-16 | Download | Fourier Neural Operators (FNO) are widely used for learning partial differential equation solution operators. However, FNO lacks architecture-aware optimizations, with its Fourier layers executing FFT,... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Structural Resilience Analysis of an Internet Fragment Against Targeted and Random Attacks -- A Case Study Based on iThena Project Data | Lukasz Swierczewski | 2025-04-16 | Download | This article presents an analysis of the structural resilience of a fragment of Internet topology against both targeted and random attacks, based on empirical data obtained from the iThena project. |
| Diffusion Models on the Edge: Challenges, Optimizations, and Applications | Dongqi Zheng | 2025-04-16 | Download | Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge... |
| Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks | Tingyang Sun, Tuan Nguyen, Ting He | 2025-04-16 | Download | Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge. |
| LO2: Microservice API Anomaly Dataset of Logs and Metrics | Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, Davide Taibi | 2025-04-16 | Download | Context. Microservice-based systems have gained significant attention over the past years. A critical factor for understanding and analyzing the behavior of these systems is the collection of monitori... |
| The Evolution of Zero Trust Architecture (ZTA) from Concept to Implementation | Md Nasiruzzaman, Maaruf Ali, Iftekhar Salam, Mahdi H. Miraz | 2025-04-16 | Download | Zero Trust Architecture (ZTA) is one of the paradigm changes in cybersecurity, from the traditional perimeter-based model to perimeterless. This article studies the core concepts of ZTA, its beginning... |
| Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery | Namitha Liyanage, Yue Wu, Emmet Houghton, Lin Zhong | 2025-04-16 | Download | Existing real-time decoders for surface codes are limited to isolated logical qubits and do not support logical operations involving multiple logical qubits. |
| A New Paradigm of User-Centric Wireless Communication Driven by Large Language Models | Kuiyuan Ding, Caili Guo, Yang Yang, Wuxia Hu, Yonina C. Eldar | 2025-04-16 | Download | The next generation of wireless communications seeks to deeply integrate artificial intelligence (AI) with user-centric communication networks, with the goal of developing AI-native networks that more... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models | Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan | 2025-04-16 | Download | Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommod... |
| Extrae.jl: Julia bindings for the Extrae HPC Profiler | Sergio Sanchez-Ramirez, Mosè Giordano | 2025-04-16 | Download | The Julia programming language has gained acceptance within the High-Performance Computing (HPC) community due to its ability to tackle the two-language problem: Julia code feels as high-level as Python b... |
| When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications | Sören Henning, Adriano Vogel, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser | 2025-04-16 | Download | Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems. |
| Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen | 2025-04-16 | Download | Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G... |
| A Technical Survey of Sparse Linear Solvers in Electronic Design Automation | Nityanand Rai | 2025-04-16 | Download | Sparse linear system solvers are critical computational kernels in Electronic Design Automation (EDA), underpinning vital simulations for modern IC and system design. |