2025-02-27

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Wildcat: Educational RISC-V Microprocessors	Martin Schoeberl	2025-02-27	Download	In computer architecture courses, we usually teach RISC processors using a five-stage pipeline, neglecting alternative organizations. This design choice, rooted in the 1980s technology, may not be opt...
HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture	Taiqiang Wu, Chenchen Ding, Wenyong Zhou, Yuxin Cheng, Xincheng Feng, Shuqi Wang, Wendong Xu, Chufan Shi, Zhengwu Liu, Ngai Wong	2025-02-27	Download	Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method for adapting large language models (LLMs) to downstream tasks.
HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration	Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, Li-Shiuan Peh	2025-02-27	Download	Quantization is critical for efficiently deploying large language models (LLMs). Yet conventional methods remain hardware-agnostic, limited to bit-width constraints, and do not account for intrinsic c...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Communication-Efficient and Differentially Private Vertical Federated Learning with Zeroth-Order Optimization	Jianing Zhang, Evan Chen, Dong-Jun Han, Chaoyue Liu, Christopher G. Brinton	2025-02-27	Download	Vertical Federated Learning (VFL) enables collaborative model training across feature-partitioned devices, yet its reliance on device-server information exchange introduces significant communication o...
Building a Theory of Distributed Systems: Work by Nancy Lynch and Collaborators	Nancy Lynch	2025-02-27	Download	In this manuscript I overview my work on developing a Theory for Distributed Systems -- work that has involved many students and other collaborators.
Improving the Efficiency of a Deep Reinforcement Learning-Based Power Management System for HPC Clusters Using Curriculum Learning	Thomas Budiarjo, Santana Yuda Pradata, Kadek Gemilang Santiyuda, Muhammad Alfian Amrizal, Reza Pulungan, Hiroyuki Takizawa	2025-02-27	Download	High energy consumption remains a key challenge in high-performance computing (HPC) systems, which often feature hundreds or thousands of nodes drawing substantial power even in idle or standby modes.
Methodology for GPU Frequency Switching Latency Measurement	Daniel Velicka, Ondrej Vysocky, Lubomir Riha	2025-02-27	Download	The development of exascale and post-exascale HPC and AI systems integrates thousands of CPUs and specialized accelerators, making energy optimization critical as power costs rival hardware expenses.
Large-Scale Simulations of Fully Resolved Complex Moving Geometries with Partially Saturated Cells	P. Suffa, S. Kemmler, H. Koestler, U. Ruede	2025-02-27	Download	We employ the Partially Saturated Cells Method (PSM) to model the interaction between the fluid flow and solid moving objects as an extension to the conventional lattice Boltzmann method.
SkipPipe: Partial and Reordered Pipelining Framework for Training LLMs in Heterogeneous Networks	Nikolay Blagoev, Lydia Yiyu Chen, Oğuzhan Ersoy	2025-02-27	Download	Data and pipeline parallelism are ubiquitous for training of Large Language Models (LLM) on distributed nodes. Driven by the need for cost-effective training, recent work explores efficient communicat...
RingAda: Pipelining Large Model Fine-Tuning on Edge Devices with Scheduled Layer Unfreezing	Liang Li, Xiaopei Chen, Wen Wu	2025-02-27	Download	To enable large model (LM) based edge intelligent service provisioning, on-device fine-tuning with locally personalized data allows for continuous and privacy-preserving LM customization.
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, Qi Hou, Weihao Cui, Size Zheng, Li-Wen Chang, Quan Chen, Xin Liu	2025-02-27	Download	Mixture-of-experts (MoE) has been extensively employed to scale large language models to trillion-plus parameters while maintaining a fixed computational cost.
Static task mapping for heterogeneous systems based on series-parallel decompositions	Martin Wilhelm, Thilo Pionteck	2025-02-27	Download	Modern heterogeneous systems consist of many different processing units, such as CPUs, GPUs, FPGAs and AI units. A central problem in the design of applications in this environment is to find a benefi...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Scalable Coordinated Learning for H2M/R Applications over Optical Access Networks (Invited)	Sourav Mondal, Elaine Wong	2025-02-27	Download	One of the primary research interests adhering to next-generation fiber-wireless access networks is human-to-machine/robot (H2M/R) collaborative communications facilitating Industry 5.0.
Robust Multicast Origin Authentication in MACsec and CANsec for Automotive Scenarios	Gianluca Cena, Lucia Seno, Stefano Scanzio	2025-02-27	Download	Having everything interconnected through the Internet, including vehicle onboard systems, is making security a primary concern in the automotive domain as well.
Data Taxonomy Towards the Applicability of the Digital Twin Conceptual Framework in Disaster Management	Eva Brucherseifer, Marco Marquard, Martin Hellmann, Andrea Tundis	2025-02-27	Download	The Digital Twin (DT) offers a novel approach to the management of critical infrastructures, including energy, water, traffic, public health, and communication systems, which are indispensable for the...
ACCORD: Application Context-aware Cross-layer Optimization and Resource Design for 5G/NextG Machine-centric Applications	Azuka Chiejina, Subhramoy Mohanti, Vijay K. Shah	2025-02-27	Download	Recent advancements in AI and edge computing have accelerated the development of machine-centric applications (MCAs), such as smart surveillance systems.
Pricing for Routing and Flow-Control in Payment Channel Networks	Suryanarayana Sankagiri, Bruce Hajek	2025-02-27	Download	A payment channel network is a blockchain-based overlay mechanism that allows parties to transact more efficiently than directly using the blockchain.
Energy consumption of smartphones and IoT devices when using different versions of the HTTP protocol	Chiara Caiazza, Valerio Luconi, Alessio Vecchio	2025-02-27	Download	HTTP is frequently used by smartphones and IoT devices to access information and Web services. Nowadays, HTTP is used in three major versions, each introducing significant changes with respect to the ...
Harmonious Coexistence between Aloha and CSMA: Novel Dual-channel Modeling and Throughput Optimization	Wenhai Lin, Xinghua Sun, Anshan Yuan, Yayu Gao	2025-02-27	Download	The scarcity of the licensed spectrum is forcing emerging Internet of Things (IoT) networks to operate within the unlicensed spectrum. Yet there has been extensive observation indicating that performa...
AutoBS: Autonomous Base Station Deployment with Reinforcement Learning and Digital Network Twins	Ju-Hyung Lee, Andreas F. Molisch	2025-02-27	Download	This paper introduces AutoBS, a reinforcement learning (RL)-based framework for optimal base station (BS) deployment in 6G radio access networks (RAN).

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Entanglement buffering with multiple quantum memories	Álvaro G. Iñesta, Bethany Davies, Sounak Kar, Stephanie Wehner	2025-02-27	Download	Entanglement buffers are systems that maintain high-quality entanglement, ensuring it is readily available for consumption when needed. In this work, we study the performance of a two-node buffer, whe...
A high-performance and portable implementation of the SISSO method for CPUs and GPUs	Sebastian Eibl, Yi Yao, Matthias Scheffler, Markus Rampp, Luca M. Ghiringhelli, Thomas A. R. Purcell	2025-02-27	Download	SISSO (sure-independence screening and sparsifying operator) is an artificial intelligence (AI) method based on symbolic regression and compressed sensing widely used in materials science research.