2025-04-19

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| RedMulE-FT: A Reconfigurable Fault-Tolerant Matrix Multiplication Engine | Philip Wiese, Maurus Item, Luca Bertaccini, Yvan Tortorella, Angelo Garofalo, Luca Benini | 2025-04-19 | Download | As safety-critical applications increasingly rely on data-parallel floating-point computations, there is an increasing need for flexible and configurable fault tolerance in parallel floating-point acc... |
| Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator | Akshat Ramachandran, Souvik Kundu, Arnab Raha, Shamik Kundu, Deepak K. Mathaikutty, Tushar Krishna | 2025-04-19 | Download | Large language model (LLM) pruning with fixed N:M structured sparsity significantly limits the expressivity of the sparse model, yielding sub-optimal performance. |
| Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management | Hang Zhang, Jiuchen Shi, Yixiao Wang, Quan Chen, Yizhou Shan, Minyi Guo | 2025-04-19 | Download | Multiple Low-Rank Adapters (Multi-LoRAs) are gaining popularity for task-specific Large Language Model (LLM) applications. For multi-LoRA serving, caching hot KV caches and LoRA adapters in high bandw... |
| FGMP: Fine-Grained Mixed-Precision Weight and Activation Quantization for Hardware-Accelerated LLM Inference | Coleman Hooper, Charbel Sakr, Ben Keller, Rangharajan Venkatesan, Kurt Keutzer, Sophia Shao, Brucek Khailany | 2025-04-19 | Download | Quantization is a powerful tool to improve large language model (LLM) inference efficiency by utilizing more energy-efficient low-precision datapaths and reducing memory footprint. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A fast MPI-based Distributed Hash-Table as Surrogate Model demonstrated in a coupled reactive transport HPC simulation | Max Lübke, Marco De Lucia, Steffen Christgau, Stefan Petri, Bettina Schnor | 2025-04-19 | Download | Surrogate models can play a pivotal role in enhancing performance in contemporary High-Performance Computing applications. Cache-based surrogates use already calculated simulation results to interpola... |
| Decentralization in PoS Blockchain Consensus: Quantification and Advancement | Shashank Motepalli, Hans-Arno Jacobsen | 2025-04-19 | Download | Decentralization is a foundational principle of permissionless blockchains, with consensus mechanisms serving a critical role in its realization. |
| A parallel implementation of reduced-order modeling of large-scale systems | Ionut-Gabriel Farcas, Rayomand P. Gundevia, Ramakanth Munipalli, Karen E. Willcox | 2025-04-19 | Download | Motivated by the large-scale nature of modern aerospace engineering simulations, this paper presents a detailed description of distributed Operator Inference (dOpInf), a recently developed parallel al... |
| Advancing Polyglot Big Data Processing using the Hadoop ecosystem | Antony Seabra, Sergio Lifschitz | 2025-04-19 | Download | This article explores the utilization of the Hadoop ecosystem as a polyglot big data processing platform, focusing on the integration of diverse computation and storage technologies and their potentia... |
| Towards Polyglot Data Processing in Social Networks using the Hadoop-Spark ecosystem | Antony Seabra, Sergio Lifschitz | 2025-04-19 | Download | This article explores the use of the Hadoop-Spark ecosystem for social media data processing, adopting a polyglot approach with the integration of various computation and storage technologies, such as... |
| DIP: Efficient Large Multimodal Model Training with Dynamic Interleaved Pipeline | Zhenliang Xue, Hanpeng Hu, Xing Chen, Yimin Jiang, Yixin Song, Zeyu Mi, Yibo Zhu, Daxin Jiang, Yubin Xia, Haibo Chen | 2025-04-19 | Download | Large multimodal models (LMMs) have demonstrated excellent capabilities in both understanding and generation tasks with various modalities. While these models can accept flexible combinations of input... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Planet as a Brain: Towards Internet of AgentSites based on AIOS Server | Xiang Zhang, Yongfeng Zhang | 2025-04-19 | Download | The internet is undergoing a historical transformation from the "Internet of Websites" to the "Internet of AgentSites." While traditional Websites served as the foundation for information hosting and ... |
| Diffusion-based Dynamic Contract for Federated AI Agent Construction in Mobile Metaverses | Jinbo Wen, Jiawen Kang, Yang Zhang, Yue Zhong, Dusit Niyato, Jie Xu, Jianhang Tang, Chau Yuen | 2025-04-19 | Download | Mobile metaverses are envisioned as a transformative digital ecosystem that delivers immersive, intelligent, and ubiquitous services through mobile devices. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management | Hang Zhang, Jiuchen Shi, Yixiao Wang, Quan Chen, Yizhou Shan, Minyi Guo | 2025-04-19 | Download | Multiple Low-Rank Adapters (Multi-LoRAs) are gaining popularity for task-specific Large Language Model (LLM) applications. For multi-LoRA serving, caching hot KV caches and LoRA adapters in high bandw... |
