2025-02-11

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | Yufeng Gu, Alireza Khadem, Sumanth Umesh, Ning Liang, Xavier Servot, Onur Mutlu, Ravi Iyer, Reetuparna Das | 2025-02-11 | Download | Large Language Model (LLM) inference uses an autoregressive manner to generate one token at a time, which exhibits notably lower operational intensity compared to earlier Machine Learning (ML) models ... |
| Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators | Jiyoon Kim, Kang Eun Jeon, Yulhwa Kim, Jong Hwan Ko | 2025-02-11 | Download | Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision ... |
| A Hybrid-Domain Floating-Point Compute-in-Memory Architecture for Efficient Acceleration of High-Precision Deep Neural Networks | Zhiqiang Yi, Yiwen Liang, Weidong Cao | 2025-02-11 | Download | Compute-in-memory (CIM) has shown significant potential in efficiently accelerating deep neural networks (DNNs) at the edge, particularly in speeding up quantized models for inference applications. |
| MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures | Do Yeong Kang, Yeong Hwan Oh, Chanwook Hwang, Jinhee Kim, Kang Eun Jeon, Jong Hwan Ko | 2025-02-11 | Download | The implementation of Hyperdimensional Computing (HDC) on In-Memory Computing (IMC) architectures faces significant challenges due to the mismatch between high-dimensional vectors and IMC array sizes, ... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning | Divyansh Jhunjhunwala, Pranay Sharma, Zheng Xu, Gauri Joshi | 2025-02-11 | Download | Initializing with pre-trained models when learning on downstream tasks is becoming standard practice in machine learning. Several recent works explore the benefits of pre-trained initialization in a f... |
| Actor Capabilities for Message Ordering (Extended Version) | Colin S. Gordon | 2025-02-11 | Download | Actor systems are a flexible model of concurrent and distributed programming, which are efficiently implementable, and avoid many classic concurrency bugs by construction. |
| Federated Self-supervised Domain Generalization for Label-efficient Polyp Segmentation | Xinyi Tan, Jiacheng Wang, Liansheng Wang | 2025-02-11 | Download | Employing self-supervised learning (SSL) methodologies assumes paramount significance in handling unlabeled polyp datasets when building deep learning-based automatic polyp segmentation models. |
| HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment | Youhe Jiang, Ran Yan, Binhang Yuan | 2025-02-11 | Download | Disaggregating the prefill and decoding phases represents an effective new paradigm for generative inference of large language models (LLM), which eliminates prefill-decoding interference and optimize... |
| Distributed Non-Interactive Zero-Knowledge Proofs | Alex B. Grilo, Ami Paz, Mor Perry | 2025-02-11 | Download | Distributed certification is a set of mechanisms that allows an all-knowing prover to convince the units of a communication network that the network's state has some desired property, such as being 3-... |
| DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training | Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu | 2025-02-11 | Download | Diffusion Transformers (DiTs) have shown remarkable performance in generating high-quality videos. However, the quadratic complexity of 3D full attention remains a bottleneck in scaling DiT training, ... |
| Quantum Communication Advantage for Leader Election and Agreement | Fabien Dufoulon, Frédéric Magniez, Gopal Pandurangan | 2025-02-11 | Download | This work focuses on understanding the quantum message complexity of two central problems in distributed computing, namely, leader election and agreement in synchronous message-passing communication n... |
| Closing a Source Complexity Gap between Chapel and HPX | Shreyas Atre, Chris Taylor, Patrick Diehl, Hartmut Kaiser | 2025-02-11 | Download | A previous case study measured performance vs source-code complexity across multiple languages. The case study identified Chapel and HPX provide similar performance and code complexity. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Connectivity of LEO Satellite Mega Constellations: An Application of Percolation Theory on a Sphere | Hao Lin, Mustafa A. Kishk, Mohamed-Slim Alouini | 2025-02-11 | Download | With the advent of the 6G era, global connectivity has become a common goal in the evolution of communications, aiming to bring Internet services to more unconnected regions. |
| Energy-as-a-Service for RF-Powered IoE Networks: A Percolation Theory Approach | Hao Lin, Ainur Zhaikhan, Mustafa A. Kishk, Hesham ElSawy, Mohamed-Slim Alouini | 2025-02-11 | Download | Due to the involved massive number of devices, radio frequency (RF) energy harvesting is indispensable to realize the foreseen Internet-of-Everything (IoE) within 6G networks. |
| StarCast: A Secure and Spectrum-Efficient Group Communication Scheme for LEO Satellite Networks | Chaoyu Zhang, Hexuan Yu, Shanghao Shi, Shaoyu Li, Yi Shi, Eric Burger, Y. Thomas Hou, Wenjing Lou | 2025-02-11 | Download | Low Earth Orbit (LEO) satellite networks serve as a cornerstone infrastructure for providing ubiquitous connectivity in areas where terrestrial infrastructure is unavailable. |
| LLM-Sketch: Enhancing Network Sketches with LLM | Yuanpeng Li, Zhen Xu, Zongwei Lv, Yannan Hu, Yong Cui, Tong Yang | 2025-02-11 | Download | Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for net... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Memory Analysis on the Training Course of DeepSeek Models | Ping Zhang, Lei Su | 2025-02-11 | Download | We present a theoretical analysis of GPU memory consumption during the training of DeepSeek models such as DeepSeek-v2 and DeepSeek-v3. Our primary objective is to clarify the device-level memory requ... |
