2025-11-04
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Implementation and Evaluation of Stable Diffusion on a General-Purpose CGLA Accelerator | Takuto Ando, Yu Eto, Yasuhiko Nakashima | 2025-11-04 | Download | This paper presents the first implementation and in-depth evaluation of the primary computational kernels from the stable-diffusion.cpp image generation framework on IMAX3, a general-purpose Coarse-Gr... |
| Digit-Recurrence Posit Division | Raul Murillo, Julio Villalba-Moreno, Alberto A. Del Barrio, Guillermo Botella | 2025-11-04 | Download | Posit arithmetic has emerged as a promising alternative to IEEE 754 floating-point representation, offering enhanced accuracy and dynamic range. |
| Facial Expression Recognition System Using DNN Accelerator with Multi-threading on FPGA | Takuto Ando, Yusuke Inoue | 2025-11-04 | Download | In this paper, we implement a stand-alone facial expression recognition system on an SoC FPGA with multi-threading using a Deep learning Processor Unit (DPU). |
| VFocus: Better Verilog Generation from Large Language Model via Focused Reasoning | Zhuorui Zhao, Bing Li, Grace Li Zhang, Ulf Schlichtmann | 2025-11-04 | Download | Large Language Models (LLMs) have shown impressive potential in generating Verilog codes, but ensuring functional correctness remains a challenge. |
| Energy-Efficient Hardware Acceleration of Whisper ASR on a CGLA | Takuto Ando, Yu Eto, Ayumu Takeuchi, Yasuhiko Nakashima | 2025-11-04 | Download | The rise of generative AI for tasks like Automatic Speech Recognition (ASR) has created a critical energy consumption challenge. While ASICs offer high efficiency, they lack the programmability to ada... |
| BoolSkeleton: Boolean Network Skeletonization via Homogeneous Pattern Reduction | Liwei Ni, Jiaxi Zhang, Shenggen Zheng, Junfeng Liu, Xingyu Meng, Biwei Xie, Xingquan Li, Huawei Li | 2025-11-04 | Download | Boolean equivalence allows Boolean networks with identical functionality to exhibit diverse graph structures. This gives more room for exploration in logic optimization, while also posing a challenge ... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Harvesting energy consumption on European HPC systems: Sharing Experience from the CEEC project | Kajol Kulkarni, Samuel Kemmler, Anna Schwarz, Gulcin Gedik, Yanxiang Chen, Dimitrios Papageorgiou, Ioannis Kavroulakis, Roman Iakymchuk | 2025-11-04 | Download | Energy efficiency has emerged as a central challenge for modern high-performance computing (HPC) systems, where escalating computational demands and architectural complexity have led to significant en... |
| Making Democracy Work: Fixing and Simplifying Egalitarian Paxos (Extended Version) | Fedor Ryabinin, Alexey Gotsman, Pierre Sutra | 2025-11-04 | Download | Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and incr... |
| Implementing Multi-GPU Scientific Computing Miniapps Across Performance Portable Frameworks | Johansell Villalobos, Josef Ruzicka, Silvio Rizzi | 2025-11-04 | Download | Scientific computing in the exascale era demands increased computational power to solve complex problems across various domains. With the rise of heterogeneous computing architectures the need for ven... |
| Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks | Xiumei Deng, Zehui Xiong, Binbin Chen, Dong In Kim, Merouane Debbah, H. Vincent Poor | 2025-11-04 | Download | Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scena... |
| Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach | Mohan Liyanage, Eldiyar Zhantileuov, Ali Kadhum Idrees, Rolf Schuster | 2025-11-04 | Download | Accurately predicting end-to-end network latency is essential for enabling reliable task offloading in real-time edge computing applications. This paper introduces a lightweight latency prediction sch... |
| 3D Point Cloud Object Detection on Edge Devices for Split Computing | Taisuke Noguchi, Takuya Azumi | 2025-11-04 | Download | The field of autonomous driving technology is rapidly advancing, with deep learning being a key component. Particularly in the field of sensing, 3D point cloud data collected by LiDAR is utilized to r... |
| Fast Algorithms for Scheduling Many-body Correlation Functions on Accelerators | Oguz Selvitopi, Emin Ozturk, Jie Chen, Ponnuswamy Sadayappan, Robert G. Edwards, Aydın Buluç | 2025-11-04 | Download | Computation of correlation functions is a key operation in Lattice quantum chromodynamics (LQCD) simulations to extract nuclear physics observables. |
| From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models | Xingqi Cui, Chieh-Jan Mike Liang, Jiarong Xing, Haoran Qiu | 2025-11-04 | Download | Serving large generative models such as LLMs and multi-modal transformers requires balancing user-facing SLOs (e.g., time-to-first-token, time-between-tokens) with provider goals of efficiency and co... |
| Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI | Arturo Urías Jiménez | 2025-11-04 | Download | AI acceleration has been dominated by GPUs, but the growing need for lower latency, energy efficiency, and fine-grained hardware control exposes the limits of fixed architectures. |
| Evaluating Large Language Models for Workload Mapping and Scheduling in Heterogeneous HPC Systems | Aasish Kumar Sharma, Julian Kunkel | 2025-11-04 | Download | Large language models (LLMs) are increasingly explored for their reasoning capabilities, yet their ability to perform structured, constraint-based optimization from natural language remains insufficie... |
| Eliminating Multi-GPU Performance Taxes: A Systems Approach to Efficient Distributed LLMs | Octavian Alexandru Trifan, Karthik Sangaiah, Muhammad Awad, Muhammad Osama, Sumanth Gudaparthi, Alexandru Nicolau, Alexander Veidenbaum, Ganesh Dasika | 2025-11-04 | Download | As large language models (LLMs) continue to scale, their workloads increasingly rely on distributed execution across multiple GPUs. However, the conventional bulk synchronous parallel (BSP) model used... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Distributed Incast Detection in Data Center Networks | Yiming Zheng, Haoran Qi, Lirui Yu, Zhan Shu, Qing Zhao | 2025-11-04 | Download | Incast traffic in data centers can lead to severe performance degradation, such as packet loss and increased latency. Effectively addressing incast requires prompt and accurate detection. |
| Federated Learning with Gramian Angular Fields for Privacy-Preserving ECG Classification on Heterogeneous IoT Devices | Youssef Elmir, Yassine Himeur, Abbes Amira | 2025-11-04 | Download | This study presents a federated learning (FL) framework for privacy-preserving electrocardiogram (ECG) classification in Internet of Things (IoT) healthcare environments. |
| DecodeX: Exploring and Benchmarking of LDPC Decoding across CPU, GPU, and ASIC Platforms | Zhenzhou Qi, Yuncheng Yao, Yiming Li, Chung-Hsuan Tung, Junyao Zheng, Danyang Zhuo, Tingjun Chen | 2025-11-04 | Download | Emerging virtualized radio access networks (vRANs) demand flexible and efficient baseband processing across heterogeneous compute substrates. In this paper, we present DecodeX, a unified benchmarking ... |
| Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning | Farhad Rezazadeh, Hatim Chergui, Merouane Debbah, Houbing Song, Dusit Niyato, Lingjia Liu | 2025-11-04 | Download | We argue that sixth-generation (6G) intelligence is not fluent token prediction but the capacity to imagine and choose -- to simulate future scenarios, weigh trade-offs, and act with calibrated uncert... |
| On the Optimization of Model Aggregation for Federated Learning at the Network Edge | Mengyao Li, Noah Ploch, Sebastian Troia, Carlo Spatocco, Wolfgang Kellerer, Guido Maier | 2025-11-04 | Download | The rapid increase in connected devices has significantly intensified the computational and communication demands on modern telecommunication networks. |
| CRRM: A 5G system-level simulator | Keith Briggs, Ibrahim Nur | 2025-11-04 | Download | System-level simulation is indispensable for developing and testing novel algorithms for 5G and future wireless networks, yet a gap persists between the needs of the machine learning research commun... |
| Decentralized AI Service Placement, Selection and Routing in Mobile Networks | Jinkun Zhang, Stefan Vlaski, Kin Leung | 2025-11-04 | Download | The rapid development and usage of large-scale AI models by mobile users will dominate the traffic load in future communication networks. The advent of AI technology also facilitates a decentralized A... |
| Janus: Leveraging Incremental Computation for Efficient DNS Verification | Yao Wang, Kexin Yu, Wenyun Xu, Kaiqiang Hu, Ziyi Wang, Lizhao You, Qiang Su, Dong Guo, Haizhou Du, Wanjian Feng, Qingyu Song, Linghe Kong, Qiao Xiang, Jiwu Shu | 2025-11-04 | Download | Existing DNS configuration verification tools face significant issues (e.g., inefficient and lacking support for incremental verification). Inspired by the advancements in recent work of distributed d... |
| Lightweight Latency Prediction Scheme for Edge Applications: A Rational Modelling Approach | Mohan Liyanage, Eldiyar Zhantileuov, Ali Kadhum Idrees, Rolf Schuster | 2025-11-04 | Download | Accurately predicting end-to-end network latency is essential for enabling reliable task offloading in real-time edge computing applications. This paper introduces a lightweight latency prediction sch... |
| Optimizing Multi-UAV 3D Deployment for Energy-Efficient Sensing over Uneven Terrains | Rushi Moliya, Dhaval K. Patel, Brijesh Soni, Miguel López-Benítez | 2025-11-04 | Download | In this work, we consider a multi-unmanned aerial vehicle (UAV) cooperative sensing system where UAVs are deployed to sense multiple targets in terrain-aware line of sight (LoS) conditions in uneven t... |
| Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live | Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica | 2025-11-04 | Download | KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting. |
| Permissioned Blockchain in Advanced Air Mobility: A Performance Analysis for UTM | Rodrigo Nunes, André Melo, Rafael Albarello, Reinaldo Gomes, Cesar Marcondes, Lourenço Pereira | 2025-11-04 | Download | The integration of Uncrewed Aerial Vehicles (UAVs) into low-altitude airspace has led authorities to adopt distributed Uncrewed Traffic Management (UTM) architectures that ensure interoperability and ... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live | Hanchen Li, Qiuyang Mang, Runyuan He, Qizheng Zhang, Huanzhi Mao, Xiaokun Chen, Hangrui Zhou, Alvin Cheung, Joseph Gonzalez, Ion Stoica | 2025-11-04 | Download | KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache if new requests are waiting. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Performance Evaluation of Bitstring Representations in a Linear Genetic Programming Framework | Clyde Meli, Vitezslav Nezval, Zuzana Kominkova Oplatkova, Victor Buttigieg, Anthony Spiteri Staines | 2025-11-04 | Download | Different bitstring representations can yield varying computational performance. This work compares three bitstring implementations in C++: std::bitset, boost::dynamic_bitset, and a custom direct impl... |