2025-01-14
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli | 2025-01-14 | Download | Large language models (LLMs) are typically served from clusters of GPUs/NPUs that consist of large number of devices. Unfortunately, communication between these devices incurs significant overhead, in... |
| CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning | Guoliang He, Eiko Yoneki | 2025-01-14 | Download | Large language models (LLMs) are remarked by their substantial computational requirements. To mitigate the cost, researchers develop specialized CUDA kernels, which often fuse several tensor operation... |
| PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning | Marta Andronic, Jiawen Li, George A. Constantinides | 2025-01-14 | Download | Standard deep neural network inference involves the computation of interleaved linear maps and nonlinear activation functions. Prior work for ultra-low latency implementations has hardcoded these oper... |
| PUFBind: PUF-Enabled Lightweight Program Binary Authentication for FPGA-based Embedded Systems | Sneha Swaroopa, Venkata Sreekanth Balijabudda, Rajat Subhra Chakraborty, Indrajit Chakrabarti | 2025-01-14 | Download | Field Programmable Gate Array (FPGA)-based embedded systems have become mainstream in the last decade, often in security-sensitive applications. |
| An Efficient Sparse Hardware Accelerator for Spike-Driven Transformer | Zhengke Li, Wendong Mao, Siyu Zhang, Qiwei Dong, Zhongfeng Wang | 2025-01-14 | Download | Recently, large models, such as Vision Transformer and BERT, have garnered significant attention due to their exceptional performance. However, their extensive computational requirements lead to consi... |
| HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference | Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam | 2025-01-14 | Download | Point cloud is an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have proven to achieve... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | Rémi Genet, Hugo Inzirillo | 2025-01-14 | Download | In this paper we introduce Keras Sig a high-performance pythonic library designed to compute path signature for deep learning applications. Entirely built in Keras 3, Keras Sig leverages the ... |
| A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems | Minseok Ryu, Geunyeong Byeon, Kibaek Kim | 2025-01-14 | Download | We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. |
| PRESERVE: Prefetching Model Weights and KV-Cache in Distributed LLM Serving | Ahmet Caner Yüzügüler, Jiawei Zhuang, Lukas Cavigelli | 2025-01-14 | Download | Large language models (LLMs) are typically served from clusters of GPUs/NPUs that consist of large number of devices. Unfortunately, communication between these devices incurs significant overhead, in... |
| DNN-Powered MLOps Pipeline Optimization for Large Language Models: A Framework for Automated Deployment and Resource Management | Mahesh Vaijainthymala Krishnamoorthy, Kuppusamy Vellamadam Palavesam, Siva Venkatesh Arcot, Rajarajeswari Chinniah Kuppuswami | 2025-01-14 | Download | The exponential growth in the size and complexity of Large Language Models (LLMs) has introduced unprecedented challenges in their deployment and operational management. |
| Hierarchical Autoscaling for Large Language Model Serving with Chiron | Archit Patke, Dhemath Reddy, Saurabh Jha, Chandra Narayanaswami, Zbigniew Kalbarczyk, Ravishankar Iyer | 2025-01-14 | Download | Large language model (LLM) serving is becoming an increasingly important workload for cloud providers. Based on performance SLO requirements, LLM inference requests can be divided into (a) interactive... |
| Technical Report: Exploring Automatic Model-Checking of the Ethereum specification | Igor Konnov, Jure Kukovec, Thomas Pani, Roberto Saltini, Thanh Hai Tran | 2025-01-14 | Download | We investigate automated model-checking of the Ethereum specification, focusing on the Accountable Safety property of the 3SF consensus protocol. |
| HgPCN: A Heterogeneous Architecture for E2E Embedded Point Cloud Inference | Yiming Gao, Chao Jiang, Wesley Piard, Xiangru Chen, Bhavesh Patel, Herman Lam | 2025-01-14 | Download | Point cloud is an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have proven to achieve... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| CVaR-Based Variational Quantum Optimization for User Association in Handoff-Aware Vehicular Networks | Zijiang Yan, Hao Zhou, Jianhua Pei, Aryan Kaushik, Hina Tabassum, Ping Wang | 2025-01-14 | Download | Efficient resource allocation is essential for optimizing various tasks in wireless networks, which are usually formulated as generalized assignment problems (GAP). |
| Enhancing Train Transportation in Sri Lanka: A Smart IOT based Multi-Subsystem Approach using MQTT | Dhanushka Balasingham, Sadeesha Samarathunga, Anuththara Bandara, Gayantha Godakanda Arachchige, Narmada Gamage, Jaliya L. Wijayaraja | 2025-01-14 | Download | This research proposes a system as a solution for the challenges faced by Sri Lanka's historic railway system, such as scheduling delays, overcrowding, manual ticketing, and management inefficiencies... |
| Mobility Management in Integrated Sensing and Communications Networks | Yuri S. Ribeiro, Behrooz Makki, Andre L. F. de Almeida, Gabor Fodor | 2025-01-14 | Download | The performance of the integrated sensing and communication (ISAC) networks is considerably affected by the mobility of the transceiver nodes, user equipment devices (UEs) and the passive objects that... |
| Toward Interactive Multi-User Extended Reality Using Millimeter-Wave Networking | Jakob Struye, Sam Van Damme, Nabeel Nisar Bhat, Arno Troch, Barend Van Liempd, Hany Assasa, Filip Lemic, Jeroen Famaey, Maria Torres Vega | 2025-01-14 | Download | Extended Reality (XR) enables a plethora of novel interactive shared experiences. Ideally, users are allowed to roam around freely, while audiovisual content is delivered wirelessly to their Head-Moun... |
| Smooth Handovers via Smoothed Online Learning | Michail Kalntis, Andra Lutu, Jesús Omaña Iglesias, Fernando A. Kuipers, George Iosifidis | 2025-01-14 | Download | With users demanding seamless connectivity, handovers (HOs) have become a fundamental element of cellular networks. However, optimizing HOs is a challenging problem, further exacerbated by the growing... |
| Continual Reinforcement Learning for Digital Twin Synchronization Optimization | Haonan Tong, Mingzhe Chen, Jun Zhao, Ye Hu, Zhaohui Yang, Yuchen Liu, Changchuan Yin | 2025-01-14 | Download | This article investigates the adaptive resource allocation scheme for digital twin (DT) synchronization optimization over dynamic wireless networks. |
| Enhanced SPS Velocity-adaptive Scheme: Access Fairness in 5G NR V2I Networks | Xiao Xu, Qiong Wu, Pingyi Fan, Kezhi Wang | 2025-01-14 | Download | Vehicle-to-Infrastructure (V2I) technology enables information exchange between vehicles and road infrastructure. Specifically, when a vehicle approaches a roadside unit (RSU), it can exchange informa... |
| RCD-IoT: Enabling Industrial Monitoring and Control with Resource-Constrained Devices Under High Packet Transmission Rates | Ayesha Abid, Muhammad Jazib, Muhammad Riaz | 2025-01-14 | Download | This paper highlights the significance of resource-constrained Internet of Things (RCD-IoT) systems in addressing the challenges faced by industries with limited resources. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| CORD: Co-design of Resource Allocation and Deadline Decomposition with Generative Profiling | Robert Gifford, Abby Eisenklam, Georgiy A. Bondar, Yifan Cai, Tushar Sial, Linh Thi Xuan Phan, Abhishek Halder | 2025-01-14 | Download | As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since th... |
| Verifying Device Drivers with Pancake | Junming Zhao, Miki Tanaka, Johannes Åman Pohjola, Alessandro Legnani, Tiana Tsang Ung, H. Truong, Tsun Wang Sau, Thomas Sewell, Rob Sison, Hira Syeda, Magnus Myreen, Michael Norrish, Gernot Heiser | 2025-01-14 | Download | Device driver bugs are the leading cause of OS compromises, and their formal verification is therefore highly desirable. To the best of our knowledge, no realistic and performant driver has been verif... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| CORD: Co-design of Resource Allocation and Deadline Decomposition with Generative Profiling | Robert Gifford, Abby Eisenklam, Georgiy A. Bondar, Yifan Cai, Tushar Sial, Linh Thi Xuan Phan, Abhishek Halder | 2025-01-14 | Download | As multicore hardware is becoming increasingly common in real-time systems, traditional scheduling techniques that assume a single worst-case execution time for a task are no longer adequate, since th... |
| [M\|D\| Queue Busy Period and Busy Cycle Distributions Computational Calculus](https://arxiv.org/abs/2505.10567v1) | Manuel Alberto M. Ferreira | 2025-01-14 | | |