
2025-01-14

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli | 2025-01-14 | Download | Large language models (LLMs) are typically served from clusters of GPUs/NPUs that consist of large number of devices. Unfortunately, communication between these devices incurs significant overhead, in... |
| CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning | Guoliang He, Eiko Yoneki | 2025-01-14 | Download | Large language models (LLMs) are remarked by their substantial computational requirements. To mitigate the cost, researchers develop specialized CUDA kernels, which often fuse several tensor operation... |
| PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning | Marta Andronic, Jiawen Li, George A. Constantinides | 2025-01-14 | Download | Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded these oper... |
| PUFBind: PUF-Enabled Lightweight Program Binary Authentication for FPGA-based Embedded Systems | Sneha Swaroopa, Venkata Sreekanth Balijabudda, Rajat Subhra Chakraborty, Indrajit Chakrabarti | 2025-01-14 | Download | Field Programmable Gate Array (FPGA)-based embedded systems have become mainstream in the last decade, often in security-sensitive applications. |
| An Efficient Sparse Hardware Accelerator for Spike-Driven Transformer | Zhengke Li, Wendong Mao, Siyu Zhang, Qiwei Dong, Zhongfeng Wang | 2025-01-14 | Download | Recently, large models, such as Vision Transformer and BERT, have garnered significant attention due to their exceptional performance. However, their extensive computational requirements lead to consi... |
| HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference | Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam | 2025-01-14 | Download | Point cloud is an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have proven to achieve... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | Rémi Genet, Hugo Inzirillo | 2025-01-14 | Download | In this paper we introduce Keras Sig a high-performance pythonic library designed to compute path signature for deep learning applications. Entirely built in Keras 3, Keras Sig leverages the ... |
| A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems | Minseok Ryu, Geunyeong Byeon, Kibaek Kim | 2025-01-14 | Download | We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. |
| PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli | 2025-01-14 | Download | Large language models (LLMs) are typically served from clusters of GPUs/NPUs that consist of large number of devices. Unfortunately, communication between these devices incurs significant overhead, in... |
| DNN-Powered MLOps Pipeline Optimization for Large Language Models: A Framework for Automated Deployment and Resource Management | Mahesh Vaijainthymala Krishnamoorthy, Kuppusamy Vellamadam Palavesam, Siva Venkatesh Arcot, Rajarajeswari Chinniah Kuppuswami | 2025-01-14 | Download | The exponential growth in the size and complexity of Large Language Models (LLMs) has introduced unprecedented challenges in their deployment and operational management. |
| Hierarchical Autoscaling for Large Language Model Serving with Chiron | Archit Patke, Dhemath Reddy, Saurabh Jha, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer | 2025-01-14 | Download | Large language model (LLM) serving is becoming an increasingly important workload for cloud providers. Based on performance SLO requirements, LLM inference requests can be divided into (a) interactive... |
| Technical Report: Exploring Automatic Model-Checking of the Ethereum specification | Igor Konnov, Jure Kukovec, Thomas Pani, Roberto Saltini, Thanh Hai Tran | 2025-01-14 | Download | We investigate automated model-checking of the Ethereum specification, focusing on the Accountable Safety property of the 3SF consensus protocol. |
| HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference | Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam | 2025-01-14 | Download | Point cloud is an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have proven to achieve... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CVaR-Based Variational Quantum Optimization for User Association in Handoff-Aware Vehicular Networks | Zijiang Yan, Hao Zhou, Jianhua Pei, Aryan Kaushik, Hina Tabassum, Ping Wang | 2025-01-14 | Download | Efficient resource allocation is essential for optimizing various tasks in wireless networks, which are usually formulated as generalized assignment problems (GAP). |
| Enhancing Train Transportation in Sri Lanka: A Smart IOT based Multi-Subsystem Approach using MQTT | Dhanushka Balasingham, Sadeesha Samarathunga, Anuththara Bandara, Gayantha Godakanda Arachchige, Narmada Gamage, Jaliya L. Wijayaraja | 2025-01-14 | Download | This research proposes a system as a solution for the challenges faced by Sri Lanka's historic railway system, such as scheduling delays, overcrowding, manual ticketing, and management inefficiencies... |
| Mobility Management in Integrated Sensing and Communications Networks | Yuri S. Ribeiro, Behrooz Makki, Andre L. F. de Almeida, Gabor Fodor | 2025-01-14 | Download | The performance of the integrated sensing and communication (ISAC) networks is considerably affected by the mobility of the transceiver nodes, user equipment devices (UEs) and the passive objects that... |
| Toward Interactive Multi-User Extended Reality Using Millimeter-Wave Networking | Jakob Struye, Sam Van Damme, Nabeel Nisar Bhat, Arno Troch, Barend Van Liempd, Hany Assasa, Filip Lemic, Jeroen Famaey, Maria Torres Vega | 2025-01-14 | Download | Extended Reality (XR) enables a plethora of novel interactive shared experiences. Ideally, users are allowed to roam around freely, while audiovisual content is delivered wirelessly to their Head-Moun... |
| Smooth Handovers via Smoothed Online Learning | Michail Kalntis, Andra Lutu, Jesús Omaña Iglesias, Fernando A. Kuipers, George Iosifidis | 2025-01-14 | Download | With users demanding seamless connectivity, handovers (HOs) have become a fundamental element of cellular networks. However, optimizing HOs is a challenging problem, further exacerbated by the growing... |
| Continual Reinforcement Learning for Digital Twin Synchronization Optimization | Haonan Tong, Mingzhe Chen, Jun Zhao, Ye Hu, Zhaohui Yang, Yuchen Liu, Changchuan Yin | 2025-01-14 | Download | This article investigates the adaptive resource allocation scheme for digital twin (DT) synchronization optimization over dynamic wireless networks. |
| Enhanced SPS Velocity-adaptive Scheme: Access Fairness in 5G NR V2I Networks | Xiao Xu, Qiong Wu, Pingyi Fan, Kezhi Wang | 2025-01-14 | Download | Vehicle-to-Infrastructure (V2I) technology enables information exchange between vehicles and road infrastructure. Specifically, when a vehicle approaches a roadside unit (RSU), it can exchange informa... |
| RCD-IoT: Enabling Industrial Monitoring and Control with Resource-Constrained Devices Under High Packet Transmission Rates | Ayesha Abid, Muhammad Jazib, Muhammad Riaz | 2025-01-14 | Download | This paper highlights the significance of resource-constrained Internet of Things (RCD-IoT) systems in addressing the challenges faced by industries with limited resources. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CORD: Co-design of Resource Allocation and Deadline Decomposition with Generative Profiling | Robert Gifford, Abby Eisenklam, Georgiy A. Bondar, Yifan Cai, Tushar Sial, Linh Thi Xuan Phan, Abhishek Halder | 2025-01-14 | Download | As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since th... |
| Verifying Device Drivers with Pancake | Junming Zhao, Miki Tanaka, Johannes Åman Pohjola, Alessandro Legnani, Tiana Tsang Ung, H. Truong, Tsun Wang Sau, Thomas Sewell, Rob Sison, Hira Syeda, Magnus Myreen, Michael Norrish, Gernot Heiser | 2025-01-14 | Download | Device driver bugs are the leading cause of OS compromises, and their formal verification is therefore highly desirable. To the best of our knowledge, no realistic and performant driver has been verif... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CORD: Co-design of Resource Allocation and Deadline Decomposition with Generative Profiling | Robert Gifford, Abby Eisenklam, Georgiy A. Bondar, Yifan Cai, Tushar Sial, Linh Thi Xuan Phan, Abhishek Halder | 2025-01-14 | Download | As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since th... |
| [M\|D\|∞ Queue Busy Period and Busy Cycle Distributions Computational Calculus](https://arxiv.org/abs/2505.10567v1) | Manuel Alberto M. Ferreira | 2025-01-14 | | |
