2025-02-11

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference	Yufeng Gu, Alireza Khadem, Sumanth Umesh, Ning Liang, Xavier Servot, Onur Mutlu, Ravi Iyer, Reetuparna Das	2025-02-11	Download	Large Language Model (LLM) inference uses an autoregressive manner to generate one token at a time, which exhibits notably lower operational intensity compared to earlier Machine Learning (ML) models ...
Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators	Jiyoon Kim, Kang Eun Jeon, Yulhwa Kim, Jong Hwan Ko	2025-02-11	Download	Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision ...
A Hybrid-Domain Floating-Point Compute-in-Memory Architecture for Efficient Acceleration of High-Precision Deep Neural Networks	Zhiqiang Yi, Yiwen Liang, Weidong Cao	2025-02-11	Download	Compute-in-memory (CIM) has shown significant potential in efficiently accelerating deep neural networks (DNNs) at the edge, particularly in speeding up quantized models for inference applications.
MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures	Do Yeong Kang, Yeong Hwan Oh, Chanwook Hwang, Jinhee Kim, Kang Eun Jeon, Jong Hwan Ko	2025-02-11	Download	The implementation of Hyperdimensional Computing (HDC) on In-Memory Computing (IMC) architectures faces significant challenges due to the mismatch between high-dimensional vectors and IMC array sizes, ...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning	Divyansh Jhunjhunwala, Pranay Sharma, Zheng Xu, Gauri Joshi	2025-02-11	Download	Initializing with pre-trained models when learning on downstream tasks is becoming standard practice in machine learning. Several recent works explore the benefits of pre-trained initialization in a f...
Actor Capabilities for Message Ordering (Extended Version)	Colin S. Gordon	2025-02-11	Download	Actor systems are a flexible model of concurrent and distributed programming, which are efficiently implementable, and avoid many classic concurrency bugs by construction.
Federated Self-supervised Domain Generalization for Label-efficient Polyp Segmentation	Xinyi Tan, Jiacheng Wang, Liansheng Wang	2025-02-11	Download	Employing self-supervised learning (SSL) methodologies assumes paramount significance in handling unlabeled polyp datasets when building deep learning-based automatic polyp segmentation models.
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment	Youhe Jiang, Ran Yan, Binhang Yuan	2025-02-11	Download	Disaggregating the prefill and decoding phases represents an effective new paradigm for generative inference of large language models (LLM), which eliminates prefill-decoding interference and optimize...
Distributed Non-Interactive Zero-Knowledge Proofs	Alex B. Grilo, Ami Paz, Mor Perry	2025-02-11	Download	Distributed certification is a set of mechanisms that allows an all-knowing prover to convince the units of a communication network that the network's state has some desired property, such as being 3-...
DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training	Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu	2025-02-11	Download	Diffusion Transformers (DiTs) have shown remarkable performance in generating high-quality videos. However, the quadratic complexity of 3D full attention remains a bottleneck in scaling DiT training, ...
Quantum Communication Advantage for Leader Election and Agreement	Fabien Dufoulon, Frédéric Magniez, Gopal Pandurangan	2025-02-11	Download	This work focuses on understanding the quantum message complexity of two central problems in distributed computing, namely, leader election and agreement in synchronous message-passing communication n...
Closing a Source Complexity Gap between Chapel and HPX	Shreyas Atre, Chris Taylor, Patrick Diehl, Hartmut Kaiser	2025-02-11	Download	A previous case study measured performance vs. source-code complexity across multiple languages. The case study identified that Chapel and HPX provide similar performance and code complexity.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Connectivity of LEO Satellite Mega Constellations: An Application of Percolation Theory on a Sphere	Hao Lin, Mustafa A. Kishk, Mohamed-Slim Alouini	2025-02-11	Download	With the advent of the 6G era, global connectivity has become a common goal in the evolution of communications, aiming to bring Internet services to more unconnected regions.
Energy-as-a-Service for RF-Powered IoE Networks: A Percolation Theory Approach	Hao Lin, Ainur Zhaikhan, Mustafa A. Kishk, Hesham ElSawy, Mohamed-Slim Alouini	2025-02-11	Download	Due to the involved massive number of devices, radio frequency (RF) energy harvesting is indispensable to realize the foreseen Internet-of-Everything (IoE) within 6G networks.
StarCast: A Secure and Spectrum-Efficient Group Communication Scheme for LEO Satellite Networks	Chaoyu Zhang, Hexuan Yu, Shanghao Shi, Shaoyu Li, Yi Shi, Eric Burger, Y. Thomas Hou, Wenjing Lou	2025-02-11	Download	Low Earth Orbit (LEO) satellite networks serve as a cornerstone infrastructure for providing ubiquitous connectivity in areas where terrestrial infrastructure is unavailable.
LLM-Sketch: Enhancing Network Sketches with LLM	Yuanpeng Li, Zhen Xu, Zongwei Lv, Yannan Hu, Yong Cui, Tong Yang	2025-02-11	Download	Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for net...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Memory Analysis on the Training Course of DeepSeek Models	Ping Zhang, Lei Su	2025-02-11	Download	We present a theoretical analysis of GPU memory consumption during the training of DeepSeek models such as DeepSeek-v2 and DeepSeek-v3. Our primary objective is to clarify the device-level memory requ...