2025-11-04

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Implementation and Evaluation of Stable Diffusion on a General-Purpose CGLA Accelerator | Takuto Ando, Yu Eto, Yasuhiko Nakashima | 2025-11-04 | Download | This paper presents the first implementation and in-depth evaluation of the primary computational kernels from the stable-diffusion.cpp image generation framework on IMAX3, a general-purpose Coarse-Gr... |
| Digit-Recurrence Posit Division | Raul Murillo, Julio Villalba-Moreno, Alberto A. Del Barrio, Guillermo Botella | 2025-11-04 | Download | Posit arithmetic has emerged as a promising alternative to IEEE 754 floating-point representation, offering enhanced accuracy and dynamic range. |
| Facial Expression Recognition System Using DNN Accelerator with Multi-threading on FPGA | Takuto Ando, Yusuke Inoue | 2025-11-04 | Download | In this paper, we implement a stand-alone facial expression recognition system on an SoC FPGA with multi-threading using a Deep learning Processor Unit (DPU). |
| VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning | Zhuorui Zhao, Bing Li, Grace Li Zhang, Ulf Schlichtmann | 2025-11-04 | Download | Large Language Models (LLMs) have shown impressive potential in generating Verilog codes, but ensuring functional correctness remains a challenge. |
| Energy-Efficient Hardware Acceleration of Whisper ASR on a CGLA | Takuto Ando, Yu Eto, Ayumu Takeuchi, Yasuhiko Nakashima | 2025-11-04 | Download | The rise of generative AI for tasks like Automatic Speech Recognition (ASR) has created a critical energy consumption challenge. While ASICs offer high efficiency, they lack the programmability to ada... |
| BoolSkeleton: Boolean Network Skeletonization via Homogeneous Pattern Reduction | Liwei Ni, Jiaxi Zhang, Shenggen Zheng, Junfeng Liu, Xingyu Meng, Biwei Xie, Xingquan Li, Huawei Li | 2025-11-04 | Download | Boolean equivalence allows Boolean networks with identical functionality to exhibit diverse graph structures. This gives more room for exploration in logic optimization, while also posing a challenge ... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Harvesting energy consumption on European HPC systems: Sharing Experience from the CEEC project | Kajol Kulkarni, Samuel Kemmler, Anna Schwarz, Gulcin Gedik, Yanxiang Chen, Dimitrios Papageorgiou, Ioannis Kavroulakis, Roman Iakymchuk | 2025-11-04 | Download | Energy efficiency has emerged as a central challenge for modern high-performance computing (HPC) systems, where escalating computational demands and architectural complexity have led to significant en... |
| Making Democracy Work: Fixing and Simplifying Egalitarian Paxos (Extended Version) | Fedor Ryabinin, Alexey Gotsman, Pierre Sutra | 2025-11-04 | Download | Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and incr... |
| Implementing Multi-GPU Scientific Computing Miniapps Across Performance Portable Frameworks | Johansell Villalobos, Josef Ruzicka, Silvio Rizzi | 2025-11-04 | Download | Scientific computing in the exascale era demands increased computational power to solve complex problems across various domains. With the rise of heterogeneous computing architectures the need for ven... |
| Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks | Xiumei Deng, Zehui Xiong, Binbin Chen, Dong In Kim, Merouane Debbah, H. Vincent Poor | 2025-11-04 | Download | Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scena... |
| Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach | Mohan Liyanage, Eldiyar Zhantileuov, Ali Kadhum Idrees, Rolf Schuster | 2025-11-04 | Download | Accurately predicting end-to-end network latency is essential for enabling reliable task offloading in real-time edge computing applications. This paper introduces a lightweight latency prediction sch... |
| 3D Point Cloud Object Detection on Edge Devices for Split Computing | Taisuke Noguchi, Takuya Azumi | 2025-11-04 | Download | The field of autonomous driving technology is rapidly advancing, with deep learning being a key component. Particularly in the field of sensing, 3D point cloud data collected by LiDAR is utilized to r... |
| Fast Algorithms for Scheduling Many-body Correlation Functions on Accelerators | Oguz Selvitopi, Emin Ozturk, Jie Chen, Ponnuswamy Sadayappan, Robert G. Edwards, Aydın Buluç | 2025-11-04 | Download | Computation of correlation functions is a key operation in Lattice quantum chromodynamics (LQCD) simulations to extract nuclear physics observables. |
| From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models | Xingqi Cui, Chieh-Jan Mike Liang, Jiarong Xing, Haoran Qiu | 2025-11-04 | Download | Serving large generative models such as LLMs and multimodal transformers requires balancing user-facing SLOs (e.g., time-to-first-token, time-between-tokens) with provider goals of efficiency and co... |
| Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI | Arturo Urías Jiménez | 2025-11-04 | Download | AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures. |
| Evaluating Large Language Models for Workload Mapping and Scheduling in Heterogeneous HPC Systems | Aasish Kumar Sharma, Julian Kunkel | 2025-11-04 | Download | Large language models (LLMs) are increasingly explored for their reasoning capabilities, yet their ability to perform structured, constraint-based optimization from natural language remains insufficie... |
| Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs | Octavian Alexandru Trifan, Karthik Sangaiah, Muhammad Awad, Muhammad Osama, Sumanth Gudaparthi, Alexandru Nicolau, Alexander Veidenbaum, Ganesh Dasika | 2025-11-04 | Download | As large language models (LLMs) continue to scale, their workloads increasingly rely on distributed execution across multiple GPUs. However, the conventional bulk synchronous parallel (BSP) model used... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Distributed Incast Detection in Data Center Networks | Yiming Zheng, Haoran Qi, Lirui Yu, Zhan Shu, Qing Zhao | 2025-11-04 | Download | Incast traffic in data centers can lead to severe performance degradation, such as packet loss and increased latency. Effectively addressing incast requires prompt and accurate detection. |
| Federated Learning with Gramian Angular Fields for Privacy-Preserving ECG Classification on Heterogeneous IoT Devices | Youssef Elmir, Yassine Himeur, Abbes Amira | 2025-11-04 | Download | This study presents a federated learning (FL) framework for privacy-preserving electrocardiogram (ECG) classification in Internet of Things (IoT) healthcare environments. |
| DecodeX: Exploring and Benchmarking of LDPC Decoding across CPU, GPU, and ASIC Platforms | Zhenzhou Qi, Yuncheng Yao, Yiming Li, Chung-Hsuan Tung, Junyao Zheng, Danyang Zhuo, Tingjun Chen | 2025-11-04 | Download | Emerging virtualized radio access networks (vRANs) demand flexible and efficient baseband processing across heterogeneous compute substrates. In this paper, we present DecodeX, a unified benchmarking ... |
| Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh, Hatim Chergui, Merouane Debbah, Houbing Song, Dusit Niyato, Lingjia Liu | 2025-11-04 | Download | We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncert... |
| On the Optimization of Model Aggregation for Federated Learning at the Network Edge | Mengyao Li, Noah Ploch, Sebastian Troia, Carlo Spatocco, Wolfgang Kellerer, Guido Maier | 2025-11-04 | Download | The rapid increase in connected devices has significantly intensified the computational and communication demands on modern telecommunication networks. |
| CRRM: A 5G system-level simulator | Keith Briggs, Ibrahim Nur | 2025-11-04 | Download | System-level simulation is indispensable for developing and testing novel algorithms for 5G and future wireless networks, yet a gap persists between the needs of the machine learning research commun... |
| Decentralized AI Service Placement, Selection and Routing in Mobile Networks | Jinkun Zhang, Stefan Vlaski, Kin Leung | 2025-11-04 | Download | The rapid development and usage of large-scale AI models by mobile users will dominate the traffic load in future communication networks. The advent of AI technology also facilitates a decentralized A... |
| Janus: Leveraging Incremental Computation for Efficient DNS Verification | Yao Wang, Kexin Yu, Wenyun Xu, Kaiqiang Hu, Ziyi Wang, Lizhao You, Qiang Su, Dong Guo, Haizhou Du, Wanjian Feng, Qingyu Song, Linghe Kong, Qiao Xiang, Jiwu Shu | 2025-11-04 | Download | Existing DNS configuration verification tools face significant issues (e.g., inefficient and lacking support for incremental verification). Inspired by the advancements in recent work of distributed d... |
| Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach | Mohan Liyanage, Eldiyar Zhantileuov, Ali Kadhum Idrees, Rolf Schuster | 2025-11-04 | Download | Accurately predicting end-to-end network latency is essential for enabling reliable task offloading in real-time edge computing applications. This paper introduces a lightweight latency prediction sch... |
| Optimizing Multi-UAV 3D Deployment for Energy-Efficient Sensing over Uneven Terrains | Rushi Moliya, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez | 2025-11-04 | Download | In this work, we consider a multi-unmanned aerial vehicle (UAV) cooperative sensing system where UAVs are deployed to sense multiple targets in terrain-aware line of sight (LoS) conditions in uneven t... |
| Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live | Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica | 2025-11-04 | Download | KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting. |
| Permissioned Blockchain in Advanced Air Mobility: A Performance Analysis for UTM | Rodrigo Nunes, André Melo, Rafael Albarello, Reinaldo Gomes, Cesar Marcondes, Lourenço Pereira | 2025-11-04 | Download | The integration of Uncrewed Aerial Vehicles (UAVs) into low-altitude airspace has led authorities to adopt distributed Uncrewed Traffic Management (UTM) architectures that ensure interoperability and ... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live | Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica | 2025-11-04 | Download | KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Performance Evaluation of Bitstring Representations in a Linear Genetic Programming Framework | Clyde Meli, Vitezslav Nezval, Zuzana Kominkova Oplatkova, Victor Buttigieg, Anthony Spiteri Staines | 2025-11-04 | Download | Different bitstring representations can yield varying computational performance. This work compares three bitstring implementations in C++: std::bitset, boost::dynamic_bitset, and a custom direct impl... |