
2025-04-16

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks | Stefan Abi-Karam, Cong Hao | 2025-04-16 | Download | The rapid scaling of large language model (LLM) training and inference has driven their adoption in semiconductor design across academia and industry.
Subitizing-Inspired Large Language Models for Floorplanning | Shao-Chien Lu, Chen-Chen Yeh, Hui-Lin Cho, Yu-Cheng Lin, Rung-Bin Lin | 2025-04-16 | Download | We present a novel approach to solving the floorplanning problem by leveraging fine-tuned Large Language Models (LLMs). Inspired by subitizing--the human ability to instantly and accurately count smal...
Fast Parameter Optimization of Delayed Feedback Reservoir with Backpropagation and Gradient Descent | Sosei Ikeda, Hiromitsu Awano, Takashi Sato | 2025-04-16 | Download | A delayed feedback reservoir (DFR) is a reservoir computing system well-suited for hardware implementations. However, achieving high accuracy in DFRs depends heavily on selecting appropriate hyperpara...
Hardware-Friendly Delayed-Feedback Reservoir for Multivariate Time-Series Classification | Sosei Ikeda, Hiromitsu Awano, Takashi Sato | 2025-04-16 | Download | Reservoir computing (RC) is attracting attention as a machine-learning technique for edge computing. In time-series classification tasks, the number of features obtained using a reservoir depends on t...
Online Training and Inference System on Edge FPGA Using Delayed Feedback Reservoir | Sosei Ikeda, Hiromitsu Awano, Takashi Sato | 2025-04-16 | Download | A delayed feedback reservoir (DFR) is a hardware-friendly reservoir computing system. Implementing DFRs in embedded hardware requires efficient online training.
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen | 2025-04-16 | Download | Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G...

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models | Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan | 2025-04-16 | Download | Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommod...
Diffusion Models on the Edge: Challenges, Optimizations, and Applications | Dongqi Zheng | 2025-04-16 | Download | Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge...
Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks | Tingyang Sun, Tuan Nguyen, Ting He | 2025-04-16 | Download | Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge.
Extrae.jl: Julia bindings for the Extrae HPC Profiler | Sergio Sanchez-Ramirez, Mosè Giordano | 2025-04-16 | Download | The Julia programming language has gained acceptance within the High-Performance Computing (HPC) community due to its ability to tackle the two-language problem: Julia code feels as high-level as Python b...
Learning from the Past: Adaptive Parallelism Tuning for Stream Processing Systems | Yuxing Han, Lixiang Chen, Haoyu Wang, Zhanghao Chen, Yifan Zhang, Chengcheng Yang, Kongzhang Hao, Zhengyi Yang | 2025-04-16 | Download | Distributed stream processing systems rely on the dataflow model to define and execute streaming jobs, organizing computations as Directed Acyclic Graphs (DAGs) of operators.
LO2: Microservice API Anomaly Dataset of Logs and Metrics | Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, Davide Taibi | 2025-04-16 | Download | Context. Microservice-based systems have gained significant attention over the past years. A critical factor for understanding and analyzing the behavior of these systems is the collection of monitori...
Combining Declarative and Linear Programming for Application Management in the Cloud-Edge Continuum | Jacopo Massa, Stefano Forti, Patrizio Dazzi, Antonio Brogi | 2025-04-16 | Download | This work investigates the data-aware multi-service application placement problem in Cloud-Edge settings. We previously introduced EdgeWise, a hybrid approach that combines declarative programming wit...
Deterministic Parallel High-Quality Hypergraph Partitioning | Robert Krause, Lars Gottesbüren, Nikolai Maas | 2025-04-16 | Download | We present a deterministic parallel multilevel algorithm for balanced hypergraph partitioning that matches the state of the art for non-deterministic algorithms.
Scaled Block Vecchia Approximation for High-Dimensional Gaussian Process Emulation on GPUs | Qilong Pan, Sameh Abdulah, Mustafa Abduljabbar, Hatem Ltaief, Andreas Herten, Mathis Bode, Matthew Pratola, Arindam Fadikar, Marc G. Genton, David E. Keyes, Ying Sun | 2025-04-16 | Download | Emulating computationally intensive scientific simulations is crucial for enabling uncertainty quantification, optimization, and informed decision-making at scale.
FedCanon: Non-Convex Composite Federated Learning with Efficient Proximal Operation on Heterogeneous Data | Yuan Zhou, Jiachen Zhong, Xinli Shi, Guanghui Wen, Xinghuo Yu | 2025-04-16 | Download | Composite federated learning offers a general framework for solving machine learning problems with additional regularization terms. However, existing methods often face significant limitations: many r...
Benchmarking Mutual Information-based Loss Functions in Federated Learning | Sarang S, Harsh D. Chothani, Qilei Li, Ahmed M. Abdelmoniem, Arnab K. Paul | 2025-04-16 | Download | Federated Learning (FL) has attracted considerable interest due to growing privacy concerns and regulations like the General Data Protection Regulation (GDPR), which stresses the importance of privacy...
When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications | Sören Henning, Adriano Vogel, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser | 2025-04-16 | Download | Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems.
Cost-Efficient LLM Serving in the Cloud: VM Selection with KV Cache Offloading | Kihyun Kim, Jinwoo Kim, Hyunsun Chung, Myung-Hoon Cha, Hong-Yeon Kim, Youngjae Kim | 2025-04-16 | Download | LLM inference is essential for applications like text summarization, translation, and data analysis, but the high cost of GPU instances from Cloud Service Providers (CSPs) like AWS is a major burden.
Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery | Namitha Liyanage, Yue Wu, Emmet Houghton, Lin Zhong | 2025-04-16 | Download | Existing real-time decoders for surface codes are limited to isolated logical qubits and do not support logical operations involving multiple logical qubits.
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen | 2025-04-16 | Download | Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G...
Scalability Optimization in Cloud-Based AI Inference Services: Strategies for Real-Time Load Balancing and Automated Scaling | Yihong Jin, Ze Yang | 2025-04-16 | Download | The rapid expansion of AI inference services in the cloud necessitates a robust scalability solution to manage dynamic workloads and maintain high performance.
TurboFNO: High-Performance Fourier Neural Operator with Fused FFT-GEMM-iFFT on GPU | Shixun Wu, Yujia Zhai, Huangliang Dai, Hairui Zhao, Yue Zhu, Haiyang Hu, Zizhong Chen | 2025-04-16 | Download | Fourier Neural Operators (FNO) are widely used for learning partial differential equation solution operators. However, FNO lacks architecture-aware optimizations, with its Fourier layers executing FFT,...

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
Structural Resilience Analysis of an Internet Fragment Against Targeted and Random Attacks -- A Case Study Based on iThena Project Data | Lukasz Swierczewski | 2025-04-16 | Download | This article presents an analysis of the structural resilience of a fragment of Internet topology against both targeted and random attacks, based on empirical data obtained from the iThena project.
Diffusion Models on the Edge: Challenges, Optimizations, and Applications | Dongqi Zheng | 2025-04-16 | Download | Diffusion models have shown remarkable capabilities in generating high-fidelity data across modalities such as images, audio, and video. However, their computational intensity makes deployment on edge...
Communication Optimization for Decentralized Learning atop Bandwidth-limited Edge Networks | Tingyang Sun, Tuan Nguyen, Ting He | 2025-04-16 | Download | Decentralized federated learning (DFL) is a promising machine learning paradigm for bringing artificial intelligence (AI) capabilities to the network edge.
LO2: Microservice API Anomaly Dataset of Logs and Metrics | Alexander Bakhtin, Jesse Nyyssölä, Yuqing Wang, Noman Ahmad, Ke Ping, Matteo Esposito, Mika Mäntylä, Davide Taibi | 2025-04-16 | Download | Context. Microservice-based systems have gained significant attention over the past years. A critical factor for understanding and analyzing the behavior of these systems is the collection of monitori...
The Evolution of Zero Trust Architecture (ZTA) from Concept to Implementation | Md Nasiruzzaman, Maaruf Ali, Iftekhar Salam, Mahdi H. Miraz | 2025-04-16 | Download | Zero Trust Architecture (ZTA) is one of the paradigm changes in cybersecurity, from the traditional perimeter-based model to perimeterless. This article studies the core concepts of ZTA, its beginning...
Network-Integrated Decoding System for Real-Time Quantum Error Correction with Lattice Surgery | Namitha Liyanage, Yue Wu, Emmet Houghton, Lin Zhong | 2025-04-16 | Download | Existing real-time decoders for surface codes are limited to isolated logical qubits and do not support logical operations involving multiple logical qubits.
A New Paradigm of User-Centric Wireless Communication Driven by Large Language Models | Kuiyuan Ding, Caili Guo, Yang Yang, Wuxia Hu, Yonina C. Eldar | 2025-04-16 | Download | The next generation of wireless communications seeks to deeply integrate artificial intelligence (AI) with user-centric communication networks, with the goal of developing AI-native networks that more...

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
You Don't Need All Attentions: Distributed Dynamic Fine-Tuning for Foundation Models | Shiwei Ding, Lan Zhang, Zhenlin Wang, Giuseppe Ateniese, Xiaoyong Yuan | 2025-04-16 | Download | Fine-tuning plays a crucial role in adapting models to downstream tasks with minimal training efforts. However, the rapidly increasing size of foundation models poses a daunting challenge for accommod...
Extrae.jl: Julia bindings for the Extrae HPC Profiler | Sergio Sanchez-Ramirez, Mosè Giordano | 2025-04-16 | Download | The Julia programming language has gained acceptance within the High-Performance Computing (HPC) community due to its ability to tackle the two-language problem: Julia code feels as high-level as Python b...
When Should I Run My Application Benchmark?: Studying Cloud Performance Variability for the Case of Stream Processing Applications | Sören Henning, Adriano Vogel, Esteban Perez-Wohlfeil, Otmar Ertl, Rick Rabiser | 2025-04-16 | Download | Performance benchmarking is a common practice in software engineering, particularly when building large-scale, distributed, and data-intensive systems.
Characterizing and Optimizing LLM Inference Workloads on CPU-GPU Coupled Architectures | Prabhu Vellaisamy, Thomas Labonte, Sourav Chakraborty, Matt Turner, Samantika Sury, John Paul Shen | 2025-04-16 | Download | Large language model (LLM)-based inference workloads increasingly dominate data center costs and resource utilization. Therefore, understanding the inference workload characteristics on evolving CPU-G...
A Technical Survey of Sparse Linear Solvers in Electronic Design Automation | Nityanand Rai | 2025-04-16 | Download | Sparse linear system solvers (Ax=b) are critical computational kernels in Electronic Design Automation (EDA), underpinning vital simulations for modern IC and system design.
