
2025-09-24

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
|---|---|---|---|---|
| Experience Deploying Containerized GenAI Services at an HPC Center | Angel M. Beltre, Jeff Ogden, Kevin Pedretti | 2025-09-24 | Download | Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected vi... |
| ZynqParrot: A Scale-Down Approach to Cycle-Accurate, FPGA-Accelerated Co-Emulation | Daniel Ruelas-Petrisko, Farzam Gilani, Anoop Mysore Nataraja, Zoe Taylor, Michael Taylor | 2025-09-24 | Download | As processors increase in complexity, costs grow even more rapidly, both for functional verification and performance validation. Most often, silicon characterizations comprise simple performance count... |
| SoCks - Simplifying Firmware and Software Integration for Heterogeneous SoCs | Marvin Fuchs, Lukas Scheller, Timo Muscheid, Oliver Sander, Luis E. Ardila-Perez | 2025-09-24 | Download | Modern heterogeneous System-on-Chip (SoC) devices integrate advanced components into a single package, offering powerful capabilities while also introducing significant complexity. |
| Pedagogically Motivated and Composable Open-Source RISC-V Processors for Computer Science Education | Ian McDougall, Harish Batchu, Michael Davies, Karthikeyan Sankaralingam | 2025-09-24 | Download | While most instruction set architectures (ISAs) are only available to use through the purchase of a restrictive commercial license, the RISC-V ISA presents a free and open-source alternative. |
| Design Insights and Comparative Evaluation of a Hardware-Based Cooperative Perception Architecture for Lane Change Prediction | Mohamed Manzour, Catherine M. Elias, Omar M. Shehata, Rubén Izquierdo, Miguel Ángel Sotelo | 2025-09-24 | Download | Research on lane change prediction has gained attention in the last few years. Most existing works in this area have been conducted in simulation environments or with pre-recorded datasets, these work... |
| The Cream Rises to the Top: Efficient Reranking Method for Verilog Code Generation | Guang Yang, Wei Zheng, Xiang Chen, Yifan Sun, Fengji Zhang, Terry Yue Zhuo | 2025-09-24 | Download | LLMs face significant challenges in Verilog generation due to limited domain-specific knowledge. While sampling techniques improve pass@k metrics, hardware engineers need one trustworthy solution rath... |
| Automated Multi-Agent Workflows for RTL Design | Amulya Bhattaram, Janani Ramamoorthy, Ranit Gupta, Diana Marculescu, Dimitrios Stamoulis | 2025-09-24 | Download | The rise of agentic AI workflows unlocks novel opportunities for computer systems design and optimization. However, for specialized domains such as program synthesis, the relative scarcity of HDL and ... |
| Digital Signal Processing from Classical Coherent Systems to Continuous-Variable QKD: A Review of Cross-Domain Techniques, Applications, and Challenges | Davi Juvêncio Gomes de Sousa, Caroline da Silva Morais Alves, Valéria Loureiro da Silva, Nelson Alves Ferreira Neto | 2025-09-24 | Download | This systematic review investigates the application of digital signal processing (DSP) techniques -- originally developed for coherent optical communication systems to continuous-variable quantum key ... |
| OpenGL GPU-Based Rowhammer Attack (Work in Progress) | Antoine Plin, Frédéric Fauberteau, Nga Nguyen | 2025-09-24 | Download | Rowhammer attacks have emerged as a significant threat to modern DRAM-based memory systems, leveraging frequent memory accesses to induce bit flips in adjacent memory cells. |
| SpecMamba: Accelerating Mamba Inference on FPGA with Speculative Decoding | Linfeng Zhong, Songqiang Xu, Huifeng Wen, Tong Xie, Qingyu Guo, Yuan Wang, Meng Li | 2025-09-24 | Download | The growing demand for efficient long-sequence modeling on edge devices has propelled widespread adoption of State Space Models (SSMs) like Mamba, due to their superior computational efficiency and sc... |
| Open-source Stand-Alone Versatile Tensor Accelerator | Anthony Faure-Gignoux, Kevin Delmas, Adrien Gauffriau, Claire Pagetti | 2025-09-24 | Download | Machine Learning (ML) applications demand significant computational resources, posing challenges for safety-critical domains like aeronautics. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
|---|---|---|---|---|
| Experience Deploying Containerized GenAI Services at an HPC Center | Angel M. Beltre, Jeff Ogden, Kevin Pedretti | 2025-09-24 | Download | Generative Artificial Intelligence (GenAI) applications are built from specialized components -- inference servers, object storage, vector and graph databases, and user interfaces -- interconnected vi... |
| FZModules: A Heterogeneous Computing Framework for Customizable Scientific Data Compression Pipelines | Skyler Ruiter, Jiannan Tian, Fengguang Song | 2025-09-24 | Download | Modern scientific simulations and instruments generate data volumes that overwhelm memory and storage, throttling scalability. Lossy compression mitigates this by trading controlled error for reduced ... |
| Adaptive Approach to Enhance Machine Learning Scheduling Algorithms During Runtime Using Reinforcement Learning in Metascheduling Applications | Samer Alshaer, Ala Khalifeh, Roman Obermaisser | 2025-09-24 | Download | Metascheduling in time-triggered architectures has been crucial in adapting to dynamic and unpredictable environments, ensuring the reliability and efficiency of task execution. |
| Reconstruction-Based Adaptive Scheduling Using AI Inferences in Safety-Critical Systems | Samer Alshaer, Ala Khalifeh, Roman Obermaisser | 2025-09-24 | Download | Adaptive scheduling is crucial for ensuring the reliability and safety of time-triggered systems (TTS) in dynamic operational environments. Scheduling frameworks face significant challenges, including... |
| xGFabric: Coupling Sensor Networks and HPC Facilities with Private 5G Wireless Networks for Real-Time Digital Agriculture | Liubov Kurafeeva, Alan Subedi, Ryan Hartung, Michael Fay, Avhishek Biswas, Shantenu Jha, Ozgur O. Kilic, Chandra Krintz, Andre Merzky, Douglas Thain, Mehmet C. Vuran, Rich Wolski | 2025-09-24 | Download | Advanced scientific applications require coupling distributed sensor networks with centralized high-performance computing facilities. Citrus Under Protective Screening (CUPS) exemplifies this need in ... |
| Energy Use of AI Inference: Efficiency Pathways and Test-Time Compute | Felipe Oviedo, Fiodar Kazhamiaka, Esha Choukse, Allen Kim, Amy Luers, Melanie Nakagawa, Ricardo Bianchini, Juan M. Lavista Ferres | 2025-09-24 | Download | As AI inference scales to billions of queries and emerging reasoning and agentic workflows increase token demand, reliable estimates of per-query energy use are increasingly important for capacity pla... |
| An Empirical Analysis of Secure Federated Learning for Autonomous Vehicle Applications | Md Jueal Mia, M. Hadi Amini | 2025-09-24 | Download | Federated Learning lends itself as a promising paradigm in enabling distributed learning for autonomous vehicles applications and ensuring data privacy while enhancing and refining predictive model pe... |
| Fulcrum: Optimizing Concurrent DNN Training and Inferencing on Edge Accelerators | Prashanthi S. K., Saisamarth Taluri, Pranav Gupta, Amartya Ranjan Saikia, Kunal Kumar Sahoo, Atharva Vinay Joshi, Lakshya Karwa, Kedar Dhule, Yogesh Simmhan | 2025-09-24 | Download | The proliferation of GPU accelerated edge devices like Nvidia Jetsons and the rise in privacy concerns are placing an emphasis on concurrent DNN training and inferencing on edge devices. |
| Pagoda: An Energy and Time Roofline Study for DNN Workloads on Edge Accelerators | Prashanthi S. K., Kunal Kumar Sahoo, Amartya Ranjan Saikia, Pranav Gupta, Atharva Vinay Joshi, Priyanshu Pansari, Yogesh Simmhan | 2025-09-24 | Download | Edge accelerators such as Nvidia Jetsons are becoming an integral part of the computing continuum, and are often used for DNN inferencing and training. |
| Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models | Prashanthi S. K., Sai Anuroop Kesanapalli, Yogesh Simmhan | 2025-09-24 | Download | Deep Neural Networks (DNNs) have had a significant impact on domains like autonomous vehicles and smart cities through low-latency inferencing on edge computing devices close to the data source. |
| BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens | Ao Sun, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun | 2025-09-24 | Download | Existing methods for training LLMs on long-sequence data, such as Tensor Parallelism and Context Parallelism, exhibit low Model FLOPs Utilization as sequence lengths and number of GPUs increase, espec... |
| A Theory of Multi-Agent Generative Flow Networks | Leo Maxime Brunswic, Haozhi Wang, Shuang Luo, Jianye Hao, Amir Rasouli, Yinchuan Li | 2025-09-24 | Download | Generative flow networks utilize a flow-matching loss to learn a stochastic policy for generating objects from a sequence of actions, such that the probability of generating a pattern can be proportio... |
| Gyges: Dynamic Cross-Instance Parallelism Transformation for Efficient LLM Inference | Haoyu Chen, Xue Li, Kun Qian, Yu Guan, Jin Zhao, Xin Wang | 2025-09-24 | Download | Efficiently processing the dynamics of requests, especially the context length variance, is important in Large Language Model (LLM) serving scenarios. |
| Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE | Akash Poptani, Alireza Khadem, Scott Mahlke, Jonah Miller, Joshua Dolence, Reetuparna Das | 2025-09-24 | Download | Hero-class HPC simulations rely on Adaptive Mesh Refinement (AMR) to reduce compute and memory demands while maintaining accuracy. This work analyzes the performance of Parthenon, a block-structured A... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
|---|---|---|---|---|
| An LLM-based Agentic Framework for Accessible Network Control | Samuel Lin, Jiawei Zhou, Minlan Yu | 2025-09-24 | Download | Traditional approaches to network management have been accessible only to a handful of highly-trained network operators with significant expert knowledge. |
| TSKAN: Interpretable Machine Learning for QoE modeling over Time Series Data | Kamal Singh, Priyanka Rawat, Sami Marouani, Baptiste Jeudy | 2025-09-24 | Download | Quality of Experience (QoE) modeling is crucial for optimizing video streaming services to capture the complex relationships between different features and user experience. |
| Can LLMs Forecast Internet Traffic from Social Media? | Jonatan Langlet, Mariano Scazzariello, Flavio Luciani, Marta Burocchi, Dejan Kostić, Marco Chiesa | 2025-09-24 | Download | Societal events shape the Internet's behavior. The death of a prominent public figure, a software launch, or a major sports match can trigger sudden demand surges that overwhelm peering points and con... |
| A Novel Short-Term Anomaly Prediction for IIoT with Software Defined Twin Network | Bilal Dalgic, Betul Sen, Muge Erel-Ozcevik | 2025-09-24 | Download | Secure monitoring and dynamic control in an IIoT environment are major requirements for current development goals. We believe that dynamic, secure monitoring of the IIoT environment can be achieved th... |
| Joint Ex-Post Location Calibration and Radio Map Construction under Biased Positioning Errors | Koki Kanzaki, Koya Sato | 2025-09-24 | Download | This paper proposes a high-accuracy radio map construction method tailored for environments where location information is affected by bursty errors. |
| SPARQ: An Optimization Framework for the Distribution of AI-Intensive Applications under Non-Linear Delay Constraints | Pietro Spadaccino, Paolo Di Lorenzo, Sergio Barbarossa, Antonia M. Tulino, Jaime Llorca | 2025-09-24 | Download | Next-generation real-time compute-intensive applications, such as extended reality, multi-user gaming, and autonomous transportation, are increasingly composed of heterogeneous AI-intensive functions ... |
| CollaPipe: Adaptive Segment-Optimized Pipeline Parallelism for Collaborative LLM Training in Heterogeneous Edge Networks | Jiewei Chen, Xiumei Deng, Zehui Xiong, Shaoyong Guo, Xuesong Qiu, Ping Wang, Dusit Niyato | 2025-09-24 | Download | The increasing demand for intelligent mobile applications has made multi-agent collaboration with Transformer-based large language models (LLMs) essential in mobile edge computing (MEC) networks. |
| Large Language Models for Real-World IoT Device Identification | Rameen Mahmood, Tousif Ahmed, Sai Teja Peddinti, Danny Yuxing Huang | 2025-09-24 | Download | The rapid expansion of IoT devices has outpaced current identification methods, creating significant risks for security, privacy, and network accountability. |
| Games Are Not Equal: Classifying Cloud Gaming Contexts for Effective User Experience Measurement | Yifan Wang, Minzhao Lyu, Vijay Sivaraman | 2025-09-24 | Download | To tap into the growing market of cloud gaming, whereby game graphics is rendered in the cloud and streamed back to the user as a video feed, network operators are creating monetizable assurance servi... |
| RIS-assisted Data Collection and Wireless Power Transfer in Low-altitude Wireless Networks | Wenwen Xie, Geng Sun, Jiahui Li, Jiacheng Wang, Yinqiu Liu, Dusit Niyato, Dong In Kim, Shiwen Mao | 2025-09-24 | Download | Low-altitude wireless networks (LAWNs) have become effective solutions for collecting data from low-power Internet-of-Things devices (IoTDs) in remote areas with limited communication infrastructure. |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
|---|---|---|---|---|
| Preparation Meets Opportunity: Enhancing Data Preprocessing for ML Training With Seneca | Omkar Desai, Ziyang Jiao, Shuyi Pei, Janki Bhimani, Bryan S. Kim | 2025-09-24 | Download | Input data preprocessing is a common bottleneck when concurrently training multimedia machine learning (ML) models in modern systems. To alleviate these bottlenecks and reduce the training time for co... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
|---|---|---|---|---|
| denet, a lightweight command-line tool for process monitoring in benchmarking and beyond | Ben Carrillo, Izaskun Mallona | 2025-09-24 | Download | Summary: denet is a lightweight process monitoring utility providing real-time resource profiling of running processes. denet reports CPU, memory, disk I/O, network activity, and thread usage, includi... |
| Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE | Akash Poptani, Alireza Khadem, Scott Mahlke, Jonah Miller, Joshua Dolence, Reetuparna Das | 2025-09-24 | Download | Hero-class HPC simulations rely on Adaptive Mesh Refinement (AMR) to reduce compute and memory demands while maintaining accuracy. This work analyzes the performance of Parthenon, a block-structured A... |
