2025-01-14

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli	2025-01-14	Download	Large language models (LLMs) are typically served from clusters of GPUs/NPUs that consist of large number of devices. Unfortunately, communication between these devices incurs significant overhead, in...
CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning	Guoliang He, Eiko Yoneki	2025-01-14	Download	Large language models (LLMs) are remarked by their substantial computational requirements. To mitigate the cost, researchers develop specialized CUDA kernels, which often fuse several tensor operation...
PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning	Marta Andronic, Jiawen Li, George A. Constantinides	2025-01-14	Download	Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded these oper...
PUFBind: PUF-Enabled Lightweight Program Binary Authentication for FPGA-based Embedded Systems	Sneha Swaroopa, Venkata Sreekanth Balijabudda, Rajat Subhra Chakraborty, Indrajit Chakrabarti	2025-01-14	Download	Field Programmable Gate Array (FPGA)-based embedded systems have become mainstream in the last decade, often in security-sensitive applications.
An Efficient Sparse Hardware Accelerator for Spike-Driven Transformer	Zhengke Li, Wendong Mao, Siyu Zhang, Qiwei Dong, Zhongfeng Wang	2025-01-14	Download	Recently, large models, such as Vision Transformer and BERT, have garnered significant attention due to their exceptional performance. However, their extensive computational requirements lead to consi...
HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference	Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam	2025-01-14	Download	Point cloud is an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have proven to achieve...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3	Rémi Genet, Hugo Inzirillo	2025-01-14	Download	In this paper we introduce Keras Sig a high-performance pythonic library designed to compute path signature for deep learning applications. Entirely built in Keras 3, Keras Sig leverages the ...
A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems	Minseok Ryu, Geunyeong Byeon, Kibaek Kim	2025-01-14	Download	We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies.
PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving	Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli	2025-01-14	Download	Large language models (LLMs) are typically served from clusters of GPUs/NPUs that consist of large number of devices. Unfortunately, communication between these devices incurs significant overhead, in...
DNN-Powered MLOps Pipeline Optimization for Large Language Models: A Framework for Automated Deployment and Resource Management	Mahesh Vaijainthymala Krishnamoorthy, Kuppusamy Vellamadam Palavesam, Siva Venkatesh Arcot, Rajarajeswari Chinniah Kuppuswami	2025-01-14	Download	The exponential growth in the size and complexity of Large Language Models (LLMs) has introduced unprecedented challenges in their deployment and operational management.
Hierarchical Autoscaling for Large Language Model Serving with Chiron	Archit Patke, Dhemath Reddy, Saurabh Jha, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer	2025-01-14	Download	Large language model (LLM) serving is becoming an increasingly important workload for cloud providers. Based on performance SLO requirements, LLM inference requests can be divided into (a) interactive...
Technical Report: Exploring Automatic Model-Checking of the Ethereum specification	Igor Konnov, Jure Kukovec, Thomas Pani, Roberto Saltini, Thanh Hai Tran	2025-01-14	Download	We investigate automated model-checking of the Ethereum specification, focusing on the Accountable Safety property of the 3SF consensus protocol.
HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference	Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam	2025-01-14	Download	Point cloud is an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have proven to achieve...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
CVaR-Based Variational Quantum Optimization for User Association in Handoff-Aware Vehicular Networks	Zijiang Yan, Hao Zhou, Jianhua Pei, Aryan Kaushik, Hina Tabassum, Ping Wang	2025-01-14	Download	Efficient resource allocation is essential for optimizing various tasks in wireless networks, which are usually formulated as generalized assignment problems (GAP).
Enhancing Train Transportation in Sri Lanka: A Smart IOT based Multi-Subsystem Approach using MQTT	Dhanushka Balasingham, Sadeesha Samarathunga, Anuththara Bandara, Gayantha Godakanda Arachchige, Narmada Gamage, Jaliya L. Wijayaraja	2025-01-14	Download	This research proposes a system as a solution for the challenges faced by Sri Lanka's historic railway system, such as scheduling delays, overcrowding, manual ticketing, and management inefficiencies...
Mobility Management in Integrated Sensing and Communications Networks	Yuri S. Ribeiro, Behrooz Makki, Andre L. F. de Almeida, Gabor Fodor	2025-01-14	Download	The performance of the integrated sensing and communication (ISAC) networks is considerably affected by the mobility of the transceiver nodes, user equipment devices (UEs) and the passive objects that...
Toward Interactive Multi-User Extended Reality Using Millimeter-Wave Networking	Jakob Struye, Sam Van Damme, Nabeel Nisar Bhat, Arno Troch, Barend Van Liempd, Hany Assasa, Filip Lemic, Jeroen Famaey, Maria Torres Vega	2025-01-14	Download	Extended Reality (XR) enables a plethora of novel interactive shared experiences. Ideally, users are allowed to roam around freely, while audiovisual content is delivered wirelessly to their Head-Moun...
Smooth Handovers via Smoothed Online Learning	Michail Kalntis, Andra Lutu, Jesús Omaña Iglesias, Fernando A. Kuipers, George Iosifidis	2025-01-14	Download	With users demanding seamless connectivity, handovers (HOs) have become a fundamental element of cellular networks. However, optimizing HOs is a challenging problem, further exacerbated by the growing...
Continual Reinforcement Learning for Digital Twin Synchronization Optimization	Haonan Tong, Mingzhe Chen, Jun Zhao, Ye Hu, Zhaohui Yang, Yuchen Liu, Changchuan Yin	2025-01-14	Download	This article investigates the adaptive resource allocation scheme for digital twin (DT) synchronization optimization over dynamic wireless networks.
Enhanced SPS Velocity-adaptive Scheme: Access Fairness in 5G NR V2I Networks	Xiao Xu, Qiong Wu, Pingyi Fan, Kezhi Wang	2025-01-14	Download	Vehicle-to-Infrastructure (V2I) technology enables information exchange between vehicles and road infrastructure. Specifically, when a vehicle approaches a roadside unit (RSU), it can exchange informa...
RCD-IoT: Enabling Industrial Monitoring and Control with Resource-Constrained Devices Under High Packet Transmission Rates	Ayesha Abid, Muhammad Jazib, Muhammad Riaz	2025-01-14	Download	This paper highlights the significance of resource-constrained Internet of Things (RCD-IoT) systems in addressing the challenges faced by industries with limited resources.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
CORD: Co-design of Resource Allocation and Deadline Decomposition with Generative Profiling	Robert Gifford, Abby Eisenklam, Georgiy A. Bondar, Yifan Cai, Tushar Sial, Linh Thi Xuan Phan, Abhishek Halder	2025-01-14	Download	As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since th...
Verifying Device Drivers with Pancake	Junming Zhao, Miki Tanaka, Johannes Åman Pohjola, Alessandro Legnani, Tiana Tsang Ung, H. Truong, Tsun Wang Sau, Thomas Sewell, Rob Sison, Hira Syeda, Magnus Myreen, Michael Norrish, Gernot Heiser	2025-01-14	Download	Device driver bugs are the leading cause of OS compromises, and their formal verification is therefore highly desirable. To the best of our knowledge, no realistic and performant driver has been verif...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
CORD: Co-design of Resource Allocation and Deadline Decomposition with Generative Profiling	Robert Gifford, Abby Eisenklam, Georgiy A. Bondar, Yifan Cai, Tushar Sial, Linh Thi Xuan Phan, Abhishek Halder	2025-01-14	Download	As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since th...
M|D|∞ Queue Busy Period and Busy Cycle Distributions Computational Calculus	Manuel Alberto M. Ferreira	2025-01-14	https://arxiv.org/abs/2505.10567v1