2025-04-16

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks	Stefan Abi-Karam, Cong Hao	2025-04-16	Download	The rapid scaling of large language model (LLM) training and inference has driven their adoption in semiconductor design across academia and industry.
Subitizing-Inspired Large Language Models for Floorplanning	Shao-Chien Lu, Chen-Chen Yeh, Hui-Lin Cho, Yu-Cheng Lin, Rung-Bin Lin	2025-04-16	Download	We present a novel approach to solving the floorplanning problem by leveraging fine-tuned Large Language Models (LLMs). Inspired by subitizing--the human ability to instantly and accurately count smal...
Fast Parameter Optimization of Delayed Feedback Reservoir with Backpropagation and Gradient Descent	Sosei Ikeda, Hiromitsu Awano, Takashi Sato	2025-04-16	Download	A delayed feedback reservoir (DFR) is a reservoir computing system well-suited for hardware implementations. However, achieving high accuracy in DFRs depends heavily on selecting appropriate hyperpara...
Hardware-Friendly Delayed-Feedback Reservoir for Multivariate Time-Series Classification	Sosei Ikeda, Hiromitsu Awano, Takashi Sato	2025-04-16	Download	Reservoir computing (RC) is attracting attention as a machine-learning technique for edge computing. In time-series classification tasks, the number of features obtained using a reservoir depends on t...
Online Training and Inference System on Edge FPGA Using Delayed Feedback Reservoir	Sosei Ikeda, Hiromitsu Awano, Takashi Sato	2025-04-16	Download	A delayed feedback reservoir (DFR) is a hardware-friendly reservoir computing system. Implementing DFRs in embedded hardware requires efficient online training.
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures	Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen	2025-04-16	Download	Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models	Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan	2025-04-16	Download	Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommod...
Diffusion Models on the Edge: Challenges, Optimizations, and Applications	Dongqi Zheng	2025-04-16	Download	Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge...
Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks	Tingyang Sun, Tuan Nguyen, Ting He	2025-04-16	Download	Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge.
Extrae.jl: Julia bindings for the Extrae HPC Profiler	Sergio Sanchez-Ramirez, Mosè Giordano	2025-04-16	Download	The Julia programming language has gained acceptance within the High-Performance Computing (HPC) community due to its ability to tackle the two-language problem: Julia code feels as high-level as Python b...
Learning from the Past: Adaptive Parallelism Tuning for Stream Processing Systems	Yuxing Han, Lixiang Chen, Haoyu Wang, Zhanghao Chen, Yifan Zhang, Chengcheng Yang, Kongzhang Hao, Zhengyi Yang	2025-04-16	Download	Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators.
LO2: Microservice API Anomaly Dataset of Logs and Metrics	Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, Davide Taibi	2025-04-16	Download	Context. Microservice-based systems have gained significant attention over the past years. A critical factor for understanding and analyzing the behavior of these systems is the collection of monitori...
Combining Declarative and Linear Programming for Application Management in the Cloud-Edge Continuum	Jacopo Massa, Stefano Forti, Patrizio Dazzi, Antonio Brogi	2025-04-16	Download	This work investigates the data-aware multi-service application placement problem in Cloud-Edge settings. We previously introduced EdgeWise, a hybrid approach that combines declarative programming wit...
Deterministic Parallel High-Quality Hypergraph Partitioning	Robert Krause, Lars Gottesbüren, Nikolai Maas	2025-04-16	Download	We present a deterministic parallel multilevel algorithm for balanced hypergraph partitioning that matches the state of the art for non-deterministic algorithms.
Scaled Block Vecchia Approximation for High-Dimensional Gaussian Process Emulation on GPUs	Qilong Pan, Sameh Abdulah, Mustafa Abduljabbar, Hatem Ltaief, Andreas Herten, Mathis Bode, Matthew Pratola, Arindam Fadikar, Marc G. Genton, David E. Keyes, Ying Sun	2025-04-16	Download	Emulating computationally intensive scientific simulations is crucial for enabling uncertainty quantification, optimization, and informed decision-making at scale.
FedCanon: Non-Convex Composite Federated Learning with Efficient Proximal Operation on Heterogeneous Data	Yuan Zhou, Jiachen Zhong, Xinli Shi, Guanghui Wen, Xinghuo Yu	2025-04-16	Download	Composite federated learning offers a general framework for solving machine learning problems with additional regularization terms. However, existing methods often face significant limitations: many r...
Benchmarking Mutual Information-based Loss Functions in Federated Learning	Sarang S, Harsh D. Chothani, Qilei Li, Ahmed M. Abdelmoniem, Arnab K. Paul	2025-04-16	Download	Federated Learning (FL) has attracted considerable interest due to growing privacy concerns and regulations like the General Data Protection Regulation (GDPR), which stresses the importance of privacy...
When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications	Sören Henning, Adriano Vogel, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser	2025-04-16	Download	Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems.
Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading	Kihyun Kim, Jinwoo Kim, Hyunsun Chung, Myung-Hoon Cha, Hong-Yeon Kim, Youngjae Kim	2025-04-16	Download	LLM inference is essential for applications like text summarization, translation, and data analysis, but the high cost of GPU instances from Cloud Service Providers (CSPs) like AWS is a major burden.
Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery	Namitha Liyanage, Yue Wu, Emmet Houghton, Lin Zhong	2025-04-16	Download	Existing real-time decoders for surface codes are limited to isolated logical qubits and do not support logical operations involving multiple logical qubits.
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures	Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen	2025-04-16	Download	Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G...
Scalability Optimization in Cloud-Based AI Inference Services: Strategies for Real-Time Load Balancing and Automated Scaling	Yihong Jin, Ze Yang	2025-04-16	Download	The rapid expansion of AI inference services in the cloud necessitates a robust scalability solution to manage dynamic workloads and maintain high performance.
TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU	Shixun Wu, Yujia Zhai, Huangliang Dai, Hairui Zhao, Yue Zhu, Haiyang Hu, Zizhong Chen	2025-04-16	Download	Fourier Neural Operators (FNO) are widely used for learning partial differential equation solution operators. However, FNO lacks architecture-aware optimizations, with its Fourier layers executing FFT,...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Structural Resilience Analysis of an Internet Fragment Against Targeted and Random Attacks -- A Case Study Based on iThena Project Data	Lukasz Swierczewski	2025-04-16	Download	This article presents an analysis of the structural resilience of a fragment of Internet topology against both targeted and random attacks, based on empirical data obtained from the iThena project.
Diffusion Models on the Edge: Challenges, Optimizations, and Applications	Dongqi Zheng	2025-04-16	Download	Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge...
Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks	Tingyang Sun, Tuan Nguyen, Ting He	2025-04-16	Download	Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge.
LO2: Microservice API Anomaly Dataset of Logs and Metrics	Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, Davide Taibi	2025-04-16	Download	Context. Microservice-based systems have gained significant attention over the past years. A critical factor for understanding and analyzing the behavior of these systems is the collection of monitori...
The Evolution of Zero Trust Architecture (ZTA) from Concept to Implementation	Md Nasiruzzaman, Maaruf Ali, Iftekhar Salam, Mahdi H. Miraz	2025-04-16	Download	Zero Trust Architecture (ZTA) is one of the paradigm changes in cybersecurity, from the traditional perimeter-based model to a perimeterless one. This article studies the core concepts of ZTA, its beginning...
Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery	Namitha Liyanage, Yue Wu, Emmet Houghton, Lin Zhong	2025-04-16	Download	Existing real-time decoders for surface codes are limited to isolated logical qubits and do not support logical operations involving multiple logical qubits.
A New Paradigm of User-Centric Wireless Communication Driven by Large Language Models	Kuiyuan Ding, Caili Guo, Yang Yang, Wuxia Hu, Yonina C. Eldar	2025-04-16	Download	The next generation of wireless communications seeks to deeply integrate artificial intelligence (AI) with user-centric communication networks, with the goal of developing AI-native networks that more...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models	Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan	2025-04-16	Download	Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommod...
Extrae.jl: Julia bindings for the Extrae HPC Profiler	Sergio Sanchez-Ramirez, Mosè Giordano	2025-04-16	Download	The Julia programming language has gained acceptance within the High-Performance Computing (HPC) community due to its ability to tackle the two-language problem: Julia code feels as high-level as Python b...
When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications	Sören Henning, Adriano Vogel, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser	2025-04-16	Download	Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems.
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures	Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen	2025-04-16	Download	Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G...
A Technical Survey of Sparse Linear Solvers in Electronic Design Automation	Nityanand Rai	2025-04-16	Download	Sparse linear system solvers ($Ax=b$) are critical computational kernels in Electronic Design Automation (EDA), underpinning vital simulations for modern IC and system design.