2024-11-24

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception	Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque	2024-11-24	Download	We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings.
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format	Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst	2024-11-24	Download	The widely-used, weight-only quantized large language models (LLMs), which leverage low-bit integer (INT) weights and retain floating-point (FP) activations, reduce storage requirements while maintain...
Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas	2024-11-24	Download	The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception	Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque	2024-11-24	Download	We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings.
Ensuring Fair LLM Serving Amid Diverse Applications	Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan Rajmohan	2024-11-24	Download	In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other us...
SARS: A Resource Selection Algorithm for Autonomous Driving Tasks in Heterogeneous Mobile Edge Computing	Reza Zakerian, Hadi Gholami	2024-11-24	Download	With the rapid advancement of devices requiring intensive computation, such as Internet of Things (IoT) devices, smart sensors, and wearable technology, the computational demands on individual platfor...
Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas	2024-11-24	Download	The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe...
Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution	Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng Li	2024-11-24	Download	The growth of Large Language Models (LLMs) has necessitated large-scale distributed training. Highly optimized frameworks, however, still suffer significant losses in Model FLOPS utilization (often be...
Streaming SQL Multi-Way Join Method for Long State Streams	Jinlong Hu, Tingfeng Qiu	2024-11-24	Download	Streaming computing effectively manages large-scale streaming data in real-time, making it ideal for applications such as real-time recommendations, anomaly detection, and monitoring, all of which req...
Runtime-optimized Multi-way Stream Join Operator for Large-scale Streaming data	Jinlong Hu, Tingfeng Qiu	2024-11-24	Download	Streaming computing enables the real-time processing of large volumes of data and offers significant advantages for various applications, including real-time recommendations, anomaly detection, and mo...
Algorithmics and Complexity of Cost-Driven Task Offloading with Submodular Optimization in Edge-Cloud Environments	Longkun Guo, Jiawei Lin, Xuanming Xu, Peng Li	2024-11-24	Download	Emerging applications such as autonomous driving pose the challenge of efficient cost-driven offloading in edge-cloud environments. This involves assigning tasks to edge and cloud servers for separate...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
SARS: A Resource Selection Algorithm for Autonomous Driving Tasks in Heterogeneous Mobile Edge Computing	Reza Zakerian, Hadi Gholami	2024-11-24	Download	With the rapid advancement of devices requiring intensive computation, such as Internet of Things (IoT) devices, smart sensors, and wearable technology, the computational demands on individual platfor...
Assessing the Viability of Quantum-Resistant IKEv2 over Constrained and Internet-Scale Networks	Geoff Twardokus, William Joslin, Hanif Rahbari, William Layton	2024-11-24	Download	Within 1-2 decades, quantum computers may become powerful enough to break current public-key cryptography, prompting authorities such as the IETF and NIST to push for adopting quantum-resistant crypto...
Space-ground Fluid AI for 6G Edge Intelligence	Qian Chen, Zhanwei Wang, Xianhao Chen, Juan Wen, Di Zhou, Sijing Ji, Min Sheng, Kaibin Huang	2024-11-24	Download	Edge artificial intelligence (AI) and space-ground integrated networks (SGINs) are two main usage scenarios of the sixth-generation (6G) mobile networks.
Efficient Multi-user Offloading of Personalized Diffusion Models: A DRL-Convex Hybrid Solution	Wanting Yang, Zehui Xiong, Song Guo, Shiwen Mao, Dong In Kim, Merouane Debbah	2024-11-24	Download	With the impressive generative capabilities of diffusion models, personalized content synthesis has emerged as the most highly anticipated. However, the large model sizes and iterative nature of infer...
Editable-DeepSC: Reliable Cross-Modal Semantic Communications for Facial Editing	Bin Chen, Wenbo Yu, Qinshan Zhang, Tianqu Zhuang, Hao Wu, Yong Jiang, Shu-Tao Xia	2024-11-24	Download	Interactive computer vision (CV) plays a crucial role in various real-world applications, whose performance is highly dependent on communication networks.
A Comparative Study on Self-Organization in Wireless Sensor Networks	Michael Simon, Salwa M. Din, Raja Jamal Chib	2024-11-24	Download	With advancements in microelectromechanical systems, low-power integrated circuits, and wireless communications, wireless sensor networks (WSNs) have become increasingly significant [1][2].

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas	2024-11-24	Download	The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception	Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque	2024-11-24	Download	We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings.
Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments	Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas	2024-11-24	Download	The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe...