2025-09-24

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Experience Deploying Containerized GenAI Services at an HPC Center	Angel M. Beltre, Jeff Ogden, Kevin Pedretti	2025-09-24	Download	Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected vi...
ZynqParrot: A Scale-Down Approach to Cycle-Accurate, FPGA-Accelerated Co-Emulation	Daniel Ruelas-Petrisko, Farzam Gilani, Anoop Mysore Nataraja, Zoe Taylor, Michael Taylor	2025-09-24	Download	As processors increase in complexity, costs grow even more rapidly, both for functional verification and performance validation. Most often, silicon characterizations comprise simple performance count...
SoCks - Simplifying Firmware and Software Integration for Heterogeneous SoCs	Marvin Fuchs, Lukas Scheller, Timo Muscheid, Oliver Sander, Luis E. Ardila-Perez	2025-09-24	Download	Modern heterogeneous System-on-Chip (SoC) devices integrate advanced components into a single package, offering powerful capabilities while also introducing significant complexity.
Pedagogically Motivated and Composable Open-Source RISC-V Processors for Computer Science Education	Ian McDougall, Harish Batchu, Michael Davies, Karthikeyan Sankaralingam	2025-09-24	Download	While most instruction set architectures (ISAs) are only available to use through the purchase of a restrictive commercial license, the RISC-V ISA presents a free and open-source alternative.
Design Insights and Comparative Evaluation of a Hardware-Based Cooperative Perception Architecture for Lane Change Prediction	Mohamed Manzour, Catherine M. Elias, Omar M. Shehata, Rubén Izquierdo, Miguel Ángel Sotelo	2025-09-24	Download	Research on lane change prediction has gained attention in the last few years. Most existing works in this area have been conducted in simulation environments or with pre-recorded datasets, these work...
The Cream Rises to the Top: Efficient Reranking Method for Verilog Code Generation	Guang Yang, Wei Zheng, Xiang Chen, Yifan Sun, Fengji Zhang, Terry Yue Zhuo	2025-09-24	Download	LLMs face significant challenges in Verilog generation due to limited domain-specific knowledge. While sampling techniques improve pass@k metrics, hardware engineers need one trustworthy solution rath...
Automated Multi-Agent Workflows for RTL Design	Amulya Bhattaram, Janani Ramamoorthy, Ranit Gupta, Diana Marculescu, Dimitrios Stamoulis	2025-09-24	Download	The rise of agentic AI workflows unlocks novel opportunities for computer systems design and optimization. However, for specialized domains such as program synthesis, the relative scarcity of HDL and ...
Digital Signal Processing from Classical Coherent Systems to Continuous-Variable QKD: A Review of Cross-Domain Techniques, Applications, and Challenges	Davi Juvêncio Gomes de Sousa, Caroline da Silva Morais Alves, Valéria Loureiro da Silva, Nelson Alves Ferreira Neto	2025-09-24	Download	This systematic review investigates the application of digital signal processing (DSP) techniques -- originally developed for coherent optical communication systems to continuous-variable quantum key ...
OpenGL GPU-Based Rowhammer Attack (Work in Progress)	Antoine Plin, Frédéric Fauberteau, Nga Nguyen	2025-09-24	Download	Rowhammer attacks have emerged as a significant threat to modern DRAM-based memory systems, leveraging frequent memory accesses to induce bit flips in adjacent memory cells.
SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding	Linfeng Zhong, Songqiang Xu, Huifeng Wen, Tong Xie, Qingyu Guo, Yuan Wang, Meng Li	2025-09-24	Download	The growing demand for efficient long-sequence modeling on edge devices has propelled widespread adoption of State Space Models (SSMs) like Mamba, due to their superior computational efficiency and sc...
Open-source Stand-Alone Versatile Tensor Accelerator	Anthony Faure-Gignoux, Kevin Delmas, Adrien Gauffriau, Claire Pagetti	2025-09-24	Download	Machine Learning (ML) applications demand significant computational resources, posing challenges for safety-critical domains like aeronautics.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Experience Deploying Containerized GenAI Services at an HPC Center	Angel M. Beltre, Jeff Ogden, Kevin Pedretti	2025-09-24	Download	Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected vi...
FZModules: A Heterogeneous Computing Framework for Customizable Scientific Data Compression Pipelines	Skyler Ruiter, Jiannan Tian, Fengguang Song	2025-09-24	Download	Modern scientific simulations and instruments generate data volumes that overwhelm memory and storage, throttling scalability. Lossy compression mitigates this by trading controlled error for reduced ...
Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications	Samer Alshaer, Ala Khalifeh, Roman Obermaisser	2025-09-24	Download	Metascheduling in time-triggered architectures has been crucial in adapting to dynamic and unpredictable environments, ensuring the reliability and efficiency of task execution.
Reconstruction-Based Adaptive Scheduling Using AI Inferences in Safety-Critical Systems	Samer Alshaer, Ala Khalifeh, Roman Obermaisser	2025-09-24	Download	Adaptive scheduling is crucial for ensuring the reliability and safety of time-triggered systems (TTS) in dynamic operational environments. Scheduling frameworks face significant challenges, including...
xGFabric: Coupling Sensor Networks and HPC Facilities with Private 5G Wireless Networks for Real-Time Digital Agriculture	Liubov Kurafeeva, Alan Subedi, Ryan Hartung, Michael Fay, Avhishek Biswas, Shantenu Jha, Ozgur O. Kilic, Chandra Krintz, Andre Merzky, Douglas Thain, Mehmet C. Vuran, Rich Wolski	2025-09-24	Download	Advanced scientific applications require coupling distributed sensor networks with centralized high-performance computing facilities. Citrus Under Protective Screening (CUPS) exemplifies this need in ...
Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute	Felipe Oviedo, Fiodar Kazhamiaka, Esha Choukse, Allen Kim, Amy Luers, Melanie Nakagawa, Ricardo Bianchini, Juan M. Lavista Ferres	2025-09-24	Download	As AI inference scales to billions of queries and emerging reasoning and agentic workflows increase token demand, reliable estimates of per-query energy use are increasingly important for capacity pla...
An Empirical Analysis of Secure Federated Learning for Autonomous Vehicle Applications	Md Jueal Mia, M. Hadi Amini	2025-09-24	Download	Federated Learning lends itself as a promising paradigm in enabling distributed learning for autonomous vehicles applications and ensuring data privacy while enhancing and refining predictive model pe...
Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators	Prashanthi S. K., Saisamarth Taluri, Pranav Gupta, Amartya Ranjan Saikia, Kunal Kumar Sahoo, Atharva Vinay Joshi, Lakshya Karwa, Kedar Dhule, Yogesh Simmhan	2025-09-24	Download	The proliferation of GPU accelerated edge devices like Nvidia Jetsons and the rise in privacy concerns are placing an emphasis on concurrent DNN training and inferencing on edge devices.
Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators	Prashanthi S. K., Kunal Kumar Sahoo, Amartya Ranjan Saikia, Pranav Gupta, Atharva Vinay Joshi, Priyanshu Pansari, Yogesh Simmhan	2025-09-24	Download	Edge accelerators such as Nvidia Jetsons are becoming an integral part of the computing continuum, and are often used for DNN inferencing and training.
Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models	Prashanthi S. K., Sai Anuroop Kesanapalli, Yogesh Simmhan	2025-09-24	Download	Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source.
BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens	Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun	2025-09-24	Download	Existing methods for training LLMs on long-sequence data, such as Tensor Parallelism and Context Parallelism, exhibit low Model FLOPs Utilization as sequence lengths and number of GPUs increase, espec...
A Theory of Multi-Agent Generative Flow Networks	Leo Maxime Brunswic, Haozhi Wang, Shuang Luo, Jianye Hao, Amir Rasouli, Yinchuan Li	2025-09-24	Download	Generative flow networks utilize a flow-matching loss to learn a stochastic policy for generating objects from a sequence of actions, such that the probability of generating a pattern can be proportio...
Gyges: Dynamic Cross-Instance Parallelism Transformation for Efficient LLM Inference	Haoyu Chen, Xue Li, Kun Qian, Yu Guan, Jin Zhao, Xin Wang	2025-09-24	Download	Efficiently processing the dynamics of requests, especially the context length variance, is important in Large Language Model (LLM) serving scenarios.
Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE	Akash Poptani, Alireza Khadem, Scott Mahlke, Jonah Miller, Joshua Dolence, Reetuparna Das	2025-09-24	Download	Hero-class HPC simulations rely on Adaptive Mesh Refinement (AMR) to reduce compute and memory demands while maintaining accuracy. This work analyzes the performance of Parthenon, a block-structured A...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
An LLM-based Agentic Framework for Accessible Network Control	Samuel Lin, Jiawei Zhou, Minlan Yu	2025-09-24	Download	Traditional approaches to network management have been accessible only to a handful of highly-trained network operators with significant expert knowledge.
TSKAN: Interpretable Machine Learning for QoE modeling over Time Series Data	Kamal Singh, Priyanka Rawat, Sami Marouani, Baptiste Jeudy	2025-09-24	Download	Quality of Experience (QoE) modeling is crucial for optimizing video streaming services to capture the complex relationships between different features and user experience.
Can LLMs Forecast Internet Traffic from Social Media?	Jonatan Langlet, Mariano Scazzariello, Flavio Luciani, Marta Burocchi, Dejan Kostić, Marco Chiesa	2025-09-24	Download	Societal events shape the Internet's behavior. The death of a prominent public figure, a software launch, or a major sports match can trigger sudden demand surges that overwhelm peering points and con...
A Novel Short-Term Anomaly Prediction for IIoT with Software Defined Twin Network	Bilal Dalgic, Betul Sen, Muge Erel-Ozcevik	2025-09-24	Download	Secure monitoring and dynamic control in an IIoT environment are major requirements for current development goals. We believe that dynamic, secure monitoring of the IIoT environment can be achieved th...
Joint Ex-Post Location Calibration and Radio Map Construction under Biased Positioning Errors	Koki Kanzaki, Koya Sato	2025-09-24	Download	This paper proposes a high-accuracy radio map construction method tailored for environments where location information is affected by bursty errors.
SPARQ: An Optimization Framework for the Distribution of AI-Intensive Applications under Non-Linear Delay Constraints	Pietro Spadaccino, Paolo Di Lorenzo, Sergio Barbarossa, Antonia M. Tulino, Jaime Llorca	2025-09-24	Download	Next-generation real-time compute-intensive applications, such as extended reality, multi-user gaming, and autonomous transportation, are increasingly composed of heterogeneous AI-intensive functions ...
CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks	Jiewei Chen, Xiumei Deng, Zehui Xiong, Shaoyong Guo, Xuesong Qiu, Ping Wang, Dusit Niyato	2025-09-24	Download	The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks.
Large Language Models for Real-World IoT Device Identification	Rameen Mahmood, Tousif Ahmed, Sai Teja Peddinti, Danny Yuxing Huang	2025-09-24	Download	The rapid expansion of IoT devices has outpaced current identification methods, creating significant risks for security, privacy, and network accountability.
Games Are Not Equal: Classifying Cloud Gaming Contexts for Effective User Experience Measurement	Yifan Wang, Minzhao Lyu, Vijay Sivaraman	2025-09-24	Download	To tap into the growing market of cloud gaming, whereby game graphics is rendered in the cloud and streamed back to the user as a video feed, network operators are creating monetizable assurance servi...
RIS-assisted Data Collection and Wireless Power Transfer in Low-altitude Wireless Networks	Wenwen Xie, Geng Sun, Jiahui Li, Jiacheng Wang, Yinqiu Liu, Dusit Niyato, Dong In Kim, Shiwen Mao	2025-09-24	Download	Low-altitude wireless networks (LAWNs) have become effective solutions for collecting data from low-power Internet-of-Things devices (IoTDs) in remote areas with limited communication infrastructure.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Preparation Meets Opportunity: Enhancing Data Preprocessing for ML Training With Seneca	Omkar Desai, Ziyang Jiao, Shuyi Pei, Janki Bhimani, Bryan S. Kim	2025-09-24	Download	Input data preprocessing is a common bottleneck when concurrently training multimedia machine learning (ML) models in modern systems. To alleviate these bottlenecks and reduce the training time for co...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
denet, a lightweight command-line tool for process monitoring in benchmarking and beyond	Ben Carrillo, Izaskun Mallona	2025-09-24	Download	Summary: denet is a lightweight process monitoring utility providing real-time resource profiling of running processes. denet reports CPU, memory, disk I/O, network activity, and thread usage, includi...
Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE	Akash Poptani, Alireza Khadem, Scott Mahlke, Jonah Miller, Joshua Dolence, Reetuparna Das	2025-09-24	Download	Hero-class HPC simulations rely on Adaptive Mesh Refinement (AMR) to reduce compute and memory demands while maintaining accuracy. This work analyzes the performance of Parthenon, a block-structured A...