
2025-02-27

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Wildcat: Educational RISC-V Microprocessors | Martin Schoeberl | 2025-02-27 | Download | In computer architecture courses, we usually teach RISC processors using a five-stage pipeline, neglecting alternative organizations. This design choice, rooted in the 1980s technology, may not be opt... |
| HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture | Taiqiang Wu, Chenchen Ding, Wenyong Zhou, Yuxin Cheng, Xincheng Feng, Shuqi Wang, Wendong Xu, Chufan Shi, Zhengwu Liu, Ngai Wong | 2025-02-27 | Download | Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method for adapting large language models (LLMs) to downstream tasks. |
| HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration | Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, Li-Shiuan Peh | 2025-02-27 | Download | Quantization is critical for efficiently deploying large language models (LLMs). Yet conventional methods remain hardware-agnostic, limited to bit-width constraints, and do not account for intrinsic c... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Communication-Efficient and Differentially Private Vertical Federated Learning with Zeroth-Order Optimization | Jianing Zhang, Evan Chen, Dong-Jun Han, Chaoyue Liu, Christopher G. Brinton | 2025-02-27 | Download | Vertical Federated Learning (VFL) enables collaborative model training across feature-partitioned devices, yet its reliance on device-server information exchange introduces significant communication o... |
| Building a Theory of Distributed Systems: Work by Nancy Lynch and Collaborators | Nancy Lynch | 2025-02-27 | Download | In this manuscript I overview my work on developing a Theory for Distributed Systems -- work that has involved many students and other collaborators. |
| Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning | Thomas Budiarjo, Santana Yuda Pradata, Kadek Gemilang Santiyuda, Muhammad Alfian Amrizal, Reza Pulungan, Hiroyuki Takizawa | 2025-02-27 | Download | High energy consumption remains a key challenge in high-performance computing (HPC) systems, which often feature hundreds or thousands of nodes drawing substantial power even in idle or standby modes. |
| Methodology for GPU Frequency Switching Latency Measurement | Daniel Velicka, Ondrej Vysocky, Lubomir Riha | 2025-02-27 | Download | The development of exascale and post-exascale HPC and AI systems integrates thousands of CPUs and specialized accelerators, making energy optimization critical as power costs rival hardware expenses. |
| Large-Scale Simulations of Fully Resolved Complex Moving Geometries with Partially Saturated Cells | P. Suffa, S. Kemmler, H. Koestler, U. Ruede | 2025-02-27 | Download | We employ the Partially Saturated Cells Method (PSM) to model the interaction between the fluid flow and solid moving objects as an extension to the conventional lattice Boltzmann method. |
| SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks | Nikolay Blagoev, Lydia Yiyu Chen, Oğuzhan Ersoy | 2025-02-27 | Download | Data and pipeline parallelism are ubiquitous for training of Large Language Models (LLM) on distributed nodes. Driven by the need for cost-effective training, recent work explores efficient communicat... |
| RingAda: Pipelining Large Model Fine-Tuning on Edge Devices with Scheduled Layer Unfreezing | Liang Li, Xiaopei Chen, Wen Wu | 2025-02-27 | Download | To enable large model (LM) based edge intelligent service provisioning, on-device fine-tuning with locally personalized data allows for continuous and privacy-preserving LM customization. |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, Qi Hou, Weihao Cui, Size Zheng, Li-Wen Chang, Quan Chen, Xin Liu | 2025-02-27 | Download | Mixture-of-experts (MoE) has been extensively employed to scale large language models to trillion-plus parameters while maintaining a fixed computational cost. |
| Static task mapping for heterogeneous systems based on series-parallel decompositions | Martin Wilhelm, Thilo Pionteck | 2025-02-27 | Download | Modern heterogeneous systems consist of many different processing units, such as CPUs, GPUs, FPGAs and AI units. A central problem in the design of applications in this environment is to find a benefi... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scalable Coordinated Learning for H2M/R Applications over Optical Access Networks (Invited) | Sourav Mondal, Elaine Wong | 2025-02-27 | Download | One of the primary research interests adhering to next-generation fiber-wireless access networks is human-to-machine/robot (H2M/R) collaborative communications facilitating Industry 5.0. |
| Robust Multicast Origin Authentication in MACsec and CANsec for Automotive Scenarios | Gianluca Cena, Lucia Seno, Stefano Scanzio | 2025-02-27 | Download | Having everything interconnected through the Internet, including vehicle onboard systems, is making security a primary concern in the automotive domain as well. |
| Data Taxonomy Towards the Applicability of the Digital Twin Conceptual Framework in Disaster Management | Eva Brucherseifer, Marco Marquard, Martin Hellmann, Andrea Tundis | 2025-02-27 | Download | The Digital Twin (DT) offers a novel approach to the management of critical infrastructures, including energy, water, traffic, public health, and communication systems, which are indispensable for the... |
| ACCORD: Application Context-aware Cross-layer Optimization and Resource Design for 5G/NextG Machine-centric Applications | Azuka Chiejina, Subhramoy Mohanti, Vijay K. Shah | 2025-02-27 | Download | Recent advancements in AI and edge computing have accelerated the development of machine-centric applications (MCAs), such as smart surveillance systems. |
| Pricing for Routing and Flow-Control in Payment Channel Networks | Suryanarayana Sankagiri, Bruce Hajek | 2025-02-27 | Download | A payment channel network is a blockchain-based overlay mechanism that allows parties to transact more efficiently than directly using the blockchain. |
| Energy consumption of smartphones and IoT devices when using different versions of the HTTP protocol | Chiara Caiazza, Valerio Luconi, Alessio Vecchio | 2025-02-27 | Download | HTTP is frequently used by smartphones and IoT devices to access information and Web services. Nowadays, HTTP is used in three major versions, each introducing significant changes with respect to the ... |
| Harmonious Coexistence between Aloha and CSMA: Novel Dual-channel Modeling and Throughput Optimization | Wenhai Lin, Xinghua Sun, Anshan Yuan, Yayu Gao | 2025-02-27 | Download | The scarcity of the licensed spectrum is forcing emerging Internet of Things (IoT) networks to operate within the unlicensed spectrum. Yet there has been extensive observation indicating that performa... |
| AutoBS: Autonomous Base Station Deployment with Reinforcement Learning and Digital Network Twins | Ju-Hyung Lee, Andreas F. Molisch | 2025-02-27 | Download | This paper introduces AutoBS, a reinforcement learning (RL)-based framework for optimal base station (BS) deployment in 6G radio access networks (RAN). |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Entanglement buffering with multiple quantum memories | Álvaro G. Iñesta, Bethany Davies, Sounak Kar, Stephanie Wehner | 2025-02-27 | Download | Entanglement buffers are systems that maintain high-quality entanglement, ensuring it is readily available for consumption when needed. In this work, we study the performance of a two-node buffer, whe... |
| A high-performance and portable implementation of the SISSO method for CPUs and GPUs | Sebastian Eibl, Yi Yao, Matthias Scheffler, Markus Rampp, Luca M. Ghiringhelli, Thomas A. R. Purcell | 2025-02-27 | Download | SISSO (sure-independence screening and sparsifying operator) is an artificial intelligence (AI) method based on symbolic regression and compressed sensing widely used in materials science research. |
