2025-11-04

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Implementation and Evaluation of Stable Diffusion on a General-Purpose CGLA Accelerator	Takuto Ando, Yu Eto, Yasuhiko Nakashima	2025-11-04	Download	This paper presents the first implementation and in-depth evaluation of the primary computational kernels from the stable-diffusion.cpp image generation framework on IMAX3, a general-purpose Coarse-Gr...
Digit-Recurrence Posit Division	Raul Murillo, Julio Villalba-Moreno, Alberto A. Del Barrio, Guillermo Botella	2025-11-04	Download	Posit arithmetic has emerged as a promising alternative to IEEE 754 floating-point representation, offering enhanced accuracy and dynamic range.
Facial Expression Recognition System Using DNN Accelerator with Multi-threading on FPGA	Takuto Ando, Yusuke Inoue	2025-11-04	Download	In this paper, we implement a stand-alone facial expression recognition system on an SoC FPGA with multi-threading using a Deep learning Processor Unit (DPU).
VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning	Zhuorui Zhao, Bing Li, Grace Li Zhang, Ulf Schlichtmann	2025-11-04	Download	Large Language Models (LLMs) have shown impressive potential in generating Verilog codes, but ensuring functional correctness remains a challenge.
Energy-Efficient Hardware Acceleration of Whisper ASR on a CGLA	Takuto Ando, Yu Eto, Ayumu Takeuchi, Yasuhiko Nakashima	2025-11-04	Download	The rise of generative AI for tasks like Automatic Speech Recognition (ASR) has created a critical energy consumption challenge. While ASICs offer high efficiency, they lack the programmability to ada...
BoolSkeleton: Boolean Network Skeletonization via Homogeneous Pattern Reduction	Liwei Ni, Jiaxi Zhang, Shenggen Zheng, Junfeng Liu, Xingyu Meng, Biwei Xie, Xingquan Li, Huawei Li	2025-11-04	Download	Boolean equivalence allows Boolean networks with identical functionality to exhibit diverse graph structures. This gives more room for exploration in logic optimization, while also posing a challenge ...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Harvesting energy consumption on European HPC systems: Sharing Experience from the CEEC project	Kajol Kulkarni, Samuel Kemmler, Anna Schwarz, Gulcin Gedik, Yanxiang Chen, Dimitrios Papageorgiou, Ioannis Kavroulakis, Roman Iakymchuk	2025-11-04	Download	Energy efficiency has emerged as a central challenge for modern high-performance computing (HPC) systems, where escalating computational demands and architectural complexity have led to significant en...
Making Democracy Work: Fixing and Simplifying Egalitarian Paxos (Extended Version)	Fedor Ryabinin, Alexey Gotsman, Pierre Sutra	2025-11-04	Download	Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and incr...
Implementing Multi-GPU Scientific Computing Miniapps Across Performance Portable Frameworks	Johansell Villalobos, Josef Ruzicka, Silvio Rizzi	2025-11-04	Download	Scientific computing in the exascale era demands increased computational power to solve complex problems across various domains. With the rise of heterogeneous computing architectures the need for ven...
Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks	Xiumei Deng, Zehui Xiong, Binbin Chen, Dong In Kim, Merouane Debbah, H. Vincent Poor	2025-11-04	Download	Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scena...
Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach	Mohan Liyanage, Eldiyar Zhantileuov, Ali Kadhum Idrees, Rolf Schuster	2025-11-04	Download	Accurately predicting end-to-end network latency is essential for enabling reliable task offloading in real-time edge computing applications. This paper introduces a lightweight latency prediction sch...
3D Point Cloud Object Detection on Edge Devices for Split Computing	Taisuke Noguchi, Takuya Azumi	2025-11-04	Download	The field of autonomous driving technology is rapidly advancing, with deep learning being a key component. Particularly in the field of sensing, 3D point cloud data collected by LiDAR is utilized to r...
Fast Algorithms for Scheduling Many-body Correlation Functions on Accelerators	Oguz Selvitopi, Emin Ozturk, Jie Chen, Ponnuswamy Sadayappan, Robert G. Edwards, Aydın Buluç	2025-11-04	Download	Computation of correlation functions is a key operation in Lattice quantum chromodynamics (LQCD) simulations to extract nuclear physics observables.
From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models	Xingqi Cui, Chieh-Jan Mike Liang, Jiarong Xing, Haoran Qiu	2025-11-04	Download	Serving large generative models such as LLMs and multimodal transformers requires balancing user-facing SLOs (e.g., time-to-first-token, time-between-tokens) with provider goals of efficiency and co...
Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI	Arturo Urías Jiménez	2025-11-04	Download	AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures.
Evaluating Large Language Models for Workload Mapping and Scheduling in Heterogeneous HPC Systems	Aasish Kumar Sharma, Julian Kunkel	2025-11-04	Download	Large language models (LLMs) are increasingly explored for their reasoning capabilities, yet their ability to perform structured, constraint-based optimization from natural language remains insufficie...
Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs	Octavian Alexandru Trifan, Karthik Sangaiah, Muhammad Awad, Muhammad Osama, Sumanth Gudaparthi, Alexandru Nicolau, Alexander Veidenbaum, Ganesh Dasika	2025-11-04	Download	As large language models (LLMs) continue to scale, their workloads increasingly rely on distributed execution across multiple GPUs. However, the conventional bulk synchronous parallel (BSP) model used...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Distributed Incast Detection in Data Center Networks	Yiming Zheng, Haoran Qi, Lirui Yu, Zhan Shu, Qing Zhao	2025-11-04	Download	Incast traffic in data centers can lead to severe performance degradation, such as packet loss and increased latency. Effectively addressing incast requires prompt and accurate detection.
Federated Learning with Gramian Angular Fields for Privacy-Preserving ECG Classification on Heterogeneous IoT Devices	Youssef Elmir, Yassine Himeur, Abbes Amira	2025-11-04	Download	This study presents a federated learning (FL) framework for privacy-preserving electrocardiogram (ECG) classification in Internet of Things (IoT) healthcare environments.
DecodeX: Exploring and Benchmarking of LDPC Decoding across CPU, GPU, and ASIC Platforms	Zhenzhou Qi, Yuncheng Yao, Yiming Li, Chung-Hsuan Tung, Junyao Zheng, Danyang Zhuo, Tingjun Chen	2025-11-04	Download	Emerging virtualized radio access networks (vRANs) demand flexible and efficient baseband processing across heterogeneous compute substrates. In this paper, we present DecodeX, a unified benchmarking ...
Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning	Farhad Rezazadeh, Hatim Chergui, Merouane Debbah, Houbing Song, Dusit Niyato, Lingjia Liu	2025-11-04	Download	We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncert...
On the Optimization of Model Aggregation for Federated Learning at the Network Edge	Mengyao Li, Noah Ploch, Sebastian Troia, Carlo Spatocco, Wolfgang Kellerer, Guido Maier	2025-11-04	Download	The rapid increase in connected devices has significantly intensified the computational and communication demands on modern telecommunication networks.
CRRM: A 5G system-level simulator	Keith Briggs, Ibrahim Nur	2025-11-04	Download	System-level simulation is indispensable for developing and testing novel algorithms for 5G and future wireless networks, yet a gap persists between the needs of the machine learning research commun...
Decentralized AI Service Placement, Selection and Routing in Mobile Networks	Jinkun Zhang, Stefan Vlaski, Kin Leung	2025-11-04	Download	The rapid development and usage of large-scale AI models by mobile users will dominate the traffic load in future communication networks. The advent of AI technology also facilitates a decentralized A...
Janus: Leveraging Incremental Computation for Efficient DNS Verification	Yao Wang, Kexin Yu, Wenyun Xu, Kaiqiang Hu, Ziyi Wang, Lizhao You, Qiang Su, Dong Guo, Haizhou Du, Wanjian Feng, Qingyu Song, Linghe Kong, Qiao Xiang, Jiwu Shu	2025-11-04	Download	Existing DNS configuration verification tools face significant issues (e.g., inefficient and lacking support for incremental verification). Inspired by the advancements in recent work of distributed d...
Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach	Mohan Liyanage, Eldiyar Zhantileuov, Ali Kadhum Idrees, Rolf Schuster	2025-11-04	Download	Accurately predicting end-to-end network latency is essential for enabling reliable task offloading in real-time edge computing applications. This paper introduces a lightweight latency prediction sch...
Optimizing Multi-UAV 3D Deployment for Energy-Efficient Sensing over Uneven Terrains	Rushi Moliya, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez	2025-11-04	Download	In this work, we consider a multi-unmanned aerial vehicle (UAV) cooperative sensing system where UAVs are deployed to sense multiple targets in terrain-aware line of sight (LoS) conditions in uneven t...
Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live	Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica	2025-11-04	Download	KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting.
Permissioned Blockchain in Advanced Air Mobility: A Performance Analysis for UTM	Rodrigo Nunes, André Melo, Rafael Albarello, Reinaldo Gomes, Cesar Marcondes, Lourenço Pereira	2025-11-04	Download	The integration of Uncrewed Aerial Vehicles (UAVs) into low-altitude airspace has led authorities to adopt distributed Uncrewed Traffic Management (UTM) architectures that ensure interoperability and ...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live	Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica	2025-11-04	Download	KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Performance Evaluation of Bitstring Representations in a Linear Genetic Programming Framework	Clyde Meli, Vitezslav Nezval, Zuzana Kominkova Oplatkova, Victor Buttigieg, Anthony Spiteri Staines	2025-11-04	Download	Different bitstring representations can yield varying computational performance. This work compares three bitstring implementations in C++: std::bitset, boost::dynamic_bitset, and a custom direct impl...