2025-04-19
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| RedMulE-FT: A Reconfigurable Fault-Tolerant Matrix Multiplication Engine | Philip Wiese, Maurus Item, Luca Bertaccini, Yvan Tortorella, Angelo Garofalo, Luca Benini | 2025-04-19 | As safety-critical applications increasingly rely on data-parallel floating-point computations, there is an increasing need for flexible and configurable fault tolerance in parallel floating-point acc... |
| Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator | Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna | 2025-04-19 | Large language model (LLM) pruning with fixed N:M structured sparsity significantly limits the expressivity of the sparse model, yielding sub-optimal performance. |
| Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management | Hang Zhang, Jiuchen Shi, Yixiao Wang, Quan Chen, Yizhou Shan, Minyi Guo | 2025-04-19 | Multiple Low-Rank Adapters (Multi-LoRAs) are gaining popularity for task-specific Large Language Model (LLM) applications. For multi-LoRA serving, caching hot KV caches and LoRA adapters in high bandw... |
| FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference | Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Sophia Shao, Brucek Khailany | 2025-04-19 | Quantization is a powerful tool to improve large language model (LLM) inference efficiency by utilizing more energy-efficient low-precision datapaths and reducing memory footprint. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| A fast MPI-based Distributed Hash-Table as Surrogate Model demonstrated in a coupled reactive transport HPC simulation | Max Lübke, Marco De Lucia, Steffen Christgau, Stefan Petri, Bettina Schnor | 2025-04-19 | Surrogate models can play a pivotal role in enhancing performance in contemporary High-Performance Computing applications. Cache-based surrogates use already calculated simulation results to interpola... |
| Decentralization in PoS Blockchain Consensus: Quantification and Advancement | Shashank Motepalli, Hans-Arno Jacobsen | 2025-04-19 | Decentralization is a foundational principle of permissionless blockchains, with consensus mechanisms serving a critical role in its realization. |
| A parallel implementation of reduced-order modeling of large-scale systems | Ionut-Gabriel Farcas, Rayomand P. Gundevia, Ramakanth Munipalli, Karen E. Willcox | 2025-04-19 | Motivated by the large-scale nature of modern aerospace engineering simulations, this paper presents a detailed description of distributed Operator Inference (dOpInf), a recently developed parallel al... |
| Advancing Polyglot Big Data Processing using the Hadoop ecosystem | Antony Seabra, Sergio Lifschitz | 2025-04-19 | This article explores the utilization of the Hadoop ecosystem as a polyglot big data processing platform, focusing on the integration of diverse computation and storage technologies and their potentia... |
| Towards Polyglot Data Processing in Social Networks using the Hadoop-Spark ecosystem | Antony Seabra, Sergio Lifschitz | 2025-04-19 | This article explores the use of the Hadoop-Spark ecosystem for social media data processing, adopting a polyglot approach with the integration of various computation and storage technologies, such as... |
| DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline | Zhenliang Xue, Hanpeng Hu, Xing Chen, Yimin Jiang, Yixin Song, Zeyu Mi, Yibo Zhu, Daxin Jiang, Yubin Xia, Haibo Chen | 2025-04-19 | Large multimodal models (LMMs) have demonstrated excellent capabilities in both understanding and generation tasks with various modalities. While these models can accept flexible combinations of input... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Planet as a Brain: Towards Internet of AgentSites based on AIOS Server | Xiang Zhang, Yongfeng Zhang | 2025-04-19 | The internet is undergoing a historical transformation from the "Internet of Websites" to the "Internet of AgentSites." While traditional Websites served as the foundation for information hosting and ... |
| Diffusion-based Dynamic Contract for Federated AI Agent Construction in Mobile Metaverses | Jinbo Wen, Jiawen Kang, Yang Zhang, Yue Zhong, Dusit Niyato, Jie Xu, Jianhang Tang, Chau Yuen | 2025-04-19 | Mobile metaverses are envisioned as a transformative digital ecosystem that delivers immersive, intelligent, and ubiquitous services through mobile devices. |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management | Hang Zhang, Jiuchen Shi, Yixiao Wang, Quan Chen, Yizhou Shan, Minyi Guo | 2025-04-19 | Multiple Low-Rank Adapters (Multi-LoRAs) are gaining popularity for task-specific Large Language Model (LLM) applications. For multi-LoRA serving, caching hot KV caches and LoRA adapters in high bandw... |