2024-11-24

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-11-24 | Download | We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. |
| Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst | 2024-11-24 | Download | The widely-used, weight-only quantized large language models (LLMs), which leverage low-bit integer (INT) weights and retain floating-point (FP) activations, reduce storage requirements while maintain... |
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-11-24 | Download | We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. |
| Ensuring Fair LLM Serving Amid Diverse Applications | Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan Rajmohan | 2024-11-24 | Download | In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other us... |
| SARS: A Resource Selection Algorithm for Autonomous Driving Tasks in Heterogeneous Mobile Edge Computing | Reza Zakerian, Hadi Gholami | 2024-11-24 | Download | With the rapid advancement of devices requiring intensive computation, such as Internet of Things (IoT) devices, smart sensors, and wearable technology, the computational demands on individual platfor... |
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |
| Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution | Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng Li | 2024-11-24 | Download | The growth of Large Language Models (LLMs) has necessitated large-scale distributed training. Highly optimized frameworks, however, still suffer significant losses in Model FLOPS utilization (often be... |
| Streaming SQL Multi-Way Join Method for Long State Streams | Jinlong Hu, Tingfeng Qiu | 2024-11-24 | Download | Streaming computing effectively manages large-scale streaming data in real-time, making it ideal for applications such as real-time recommendations, anomaly detection, and monitoring, all of which req... |
| Runtime-optimized Multi-way Stream Join Operator for Large-scale Streaming data | Jinlong Hu, Tingfeng Qiu | 2024-11-24 | Download | Streaming computing enables the real-time processing of large volumes of data and offers significant advantages for various applications, including real-time recommendations, anomaly detection, and mo... |
| Algorithmics and Complexity of Cost-Driven Task Offloading with Submodular Optimization in Edge-Cloud Environments | Longkun Guo, Jiawei Lin, Xuanming Xu, Peng Li | 2024-11-24 | Download | Emerging applications such as autonomous driving pose the challenge of efficient cost-driven offloading in edge-cloud environments. This involves assigning tasks to edge and cloud servers for separate... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SARS: A Resource Selection Algorithm for Autonomous Driving Tasks in Heterogeneous Mobile Edge Computing | Reza Zakerian, Hadi Gholami | 2024-11-24 | Download | With the rapid advancement of devices requiring intensive computation, such as Internet of Things (IoT) devices, smart sensors, and wearable technology, the computational demands on individual platfor... |
| Assessing the Viability of Quantum-Resistant IKEv2 over Constrained and Internet-Scale Networks | Geoff Twardokus, William Joslin, Hanif Rahbari, William Layton | 2024-11-24 | Download | Within 1-2 decades, quantum computers may become powerful enough to break current public-key cryptography, prompting authorities such as the IETF and NIST to push for adopting quantum-resistant crypto... |
| Space-ground Fluid AI for 6G Edge Intelligence | Qian Chen, Zhanwei Wang, Xianhao Chen, Juan Wen, Di Zhou, Sijing Ji, Min Sheng, Kaibin Huang | 2024-11-24 | Download | Edge artificial intelligence (AI) and space-ground integrated networks (SGINs) are two main usage scenarios of the sixth-generation (6G) mobile networks. |
| Efficient Multi-user Offloading of Personalized Diffusion Models: A DRL-Convex Hybrid Solution | Wanting Yang, Zehui Xiong, Song Guo, Shiwen Mao, Dong In Kim, Merouane Debbah | 2024-11-24 | Download | With the impressive generative capabilities of diffusion models, personalized content synthesis has emerged as the most highly anticipated. However, the large model sizes and iterative nature of infer... |
| Editable-DeepSC: Reliable Cross-Modal Semantic Communications for Facial Editing | Bin Chen, Wenbo Yu, Qinshan Zhang, Tianqu Zhuang, Hao Wu, Yong Jiang, Shu-Tao Xia | 2024-11-24 | Download | Interactive computer vision (CV) plays a crucial role in various real-world applications, whose performance is highly dependent on communication networks. |
| A Comparative Study on Self-Organization in Wireless Sensor Networks | Michael Simon, Salwa M. Din, Raja Jamal Chib | 2024-11-24 | Download | With advancements in microelectromechanical systems, low-power integrated circuits, and wireless communications, wireless sensor networks (WSNs) have become increasingly significant [1][2]. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-11-24 | Download | We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. |
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |
