2024-11-24
cs.AR - Hardware Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-11-24 | Download | We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. |
| Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format | Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst | 2024-11-24 | Download | The widely-used, weight-only quantized large language models (LLMs), which leverage low-bit integer (INT) weights and retain floating-point (FP) activations, reduce storage requirements while maintain... |
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-11-24 | Download | We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. |
| Ensuring Fair LLM Serving Amid Diverse Applications | Redwan Ibne Seraj Khan, Kunal Jain, Haiying Shen, Ankur Mallick, Anjaly Parayil, Anoop Kulkarni, Steve Kofsky, Pankhuri Choudhary, Renèe St. Amant, Rujia Wang, Yue Cheng, Ali R. Butt, Victor Rühle, Chetan Bansal, Saravan Rajmohan | 2024-11-24 | Download | In a multi-tenant large language model (LLM) serving platform hosting diverse applications, some users may submit an excessive number of requests, causing the service to become unavailable to other us... |
| SARS: A Resource Selection Algorithm for Autonomous Driving Tasks in Heterogeneous Mobile Edge Computing | Reza Zakerian, Hadi Gholami | 2024-11-24 | Download | With the rapid advancement of devices requiring intensive computation, such as Internet of Things (IoT) devices, smart sensors, and wearable technology, the computational demands on individual platfor... |
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |
| Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution | Haiquan Wang, Chaoyi Ruan, Jia He, Jiaqi Ruan, Chengjie Tang, Xiaosong Ma, Cheng Li | 2024-11-24 | Download | The growth of Large Language Models (LLMs) has necessitated large-scale distributed training. Highly optimized frameworks, however, still suffer significant losses in Model FLOPS utilization (often be... |
| Streaming SQL Multi-Way Join Method for Long State Streams | Jinlong Hu, Tingfeng Qiu | 2024-11-24 | Download | Streaming computing effectively manages large-scale streaming data in real-time, making it ideal for applications such as real-time recommendations, anomaly detection, and monitoring, all of which req... |
| Runtime-optimized Multi-way Stream Join Operator for Large-scale Streaming data | Jinlong Hu, Tingfeng Qiu | 2024-11-24 | Download | Streaming computing enables the real-time processing of large volumes of data and offers significant advantages for various applications, including real-time recommendations, anomaly detection, and mo... |
| Algorithmics and Complexity of Cost-Driven Task Offloading with Submodular Optimization in Edge-Cloud Environments | Longkun Guo, Jiawei Lin, Xuanming Xu, Peng Li | 2024-11-24 | Download | Emerging applications such as autonomous driving pose the challenge of efficient cost-driven offloading in edge-cloud environments. This involves assigning tasks to edge and cloud servers for separate... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| SARS: A Resource Selection Algorithm for Autonomous Driving Tasks in Heterogeneous Mobile Edge Computing | Reza Zakerian, Hadi Gholami | 2024-11-24 | Download | With the rapid advancement of devices requiring intensive computation, such as Internet of Things (IoT) devices, smart sensors, and wearable technology, the computational demands on individual platfor... |
| Assessing the Viability of Quantum-Resistant IKEv2 over Constrained and Internet-Scale Networks | Geoff Twardokus, William Joslin, Hanif Rahbari, William Layton | 2024-11-24 | Download | Within 1-2 decades, quantum computers may become powerful enough to break current public-key cryptography, prompting authorities such as the IETF and NIST to push for adopting quantum-resistant crypto... |
| Space-ground Fluid AI for 6G Edge Intelligence | Qian Chen, Zhanwei Wang, Xianhao Chen, Juan Wen, Di Zhou, Sijing Ji, Min Sheng, Kaibin Huang | 2024-11-24 | Download | Edge artificial intelligence (AI) and space-ground integrated networks (SGINs) are two main usage scenarios of the sixth-generation (6G) mobile networks. |
| Efficient Multi-user Offloading of Personalized Diffusion Models: A DRL-Convex Hybrid Solution | Wanting Yang, Zehui Xiong, Song Guo, Shiwen Mao, Dong In Kim, Merouane Debbah | 2024-11-24 | Download | With the impressive generative capabilities of diffusion models, personalized content synthesis has emerged as the most highly anticipated. However, the large model sizes and iterative nature of infer... |
| Editable-DeepSC: Reliable Cross-Modal Semantic Communications for Facial Editing | Bin Chen, Wenbo Yu, Qinshan Zhang, Tianqu Zhuang, Hao Wu, Yong Jiang, Shu-Tao Xia | 2024-11-24 | Download | Interactive computer vision (CV) plays a crucial role in various real-world applications, whose performance is highly dependent on communication networks. |
| A Comparative Study on Self-Organization in Wireless Sensor Networks | Michael Simon, Salwa M. Din, Raja Jamal Chib | 2024-11-24 | Download | With advancements in microelectromechanical systems, low-power integrated circuits, and wireless communications, wireless sensor networks (WSNs) have become increasingly significant [1][2]. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception | Mohanad Odema, Luke Chen, Hyoukjun Kwon, Mohammad Abdullah Al Faruque | 2024-11-24 | Download | We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. |
| Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments | Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas | 2024-11-24 | Download | The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for diffe... |