2025-02-11
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | Yufeng Gu, Alireza Khadem, Sumanth Umesh, Ning Liang, Xavier Servot, Onur Mutlu, Ravi Iyer, Reetuparna Das | 2025-02-11 | Download | Large Language Model (LLM) inference uses an autoregressive manner to generate one token at a time, which exhibits notably lower operational intensity compared to earlier Machine Learning (ML) models ... |
| Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators | Jiyoon Kim, Kang Eun Jeon, Yulhwa Kim, Jong Hwan Ko | 2025-02-11 | Download | Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision ... |
| A Hybrid-Domain Floating-Point Compute-in-Memory Architecture for Efficient Acceleration of High-Precision Deep Neural Networks | Zhiqiang Yi, Yiwen Liang, Weidong Cao | 2025-02-11 | Download | Compute-in-memory (CIM) has shown significant potential in efficiently accelerating deep neural networks (DNNs) at the edge, particularly in speeding up quantized models for inference applications. |
| MEMHD: Memory-Efficient Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures | Do Yeong Kang, Yeong Hwan Oh, Chanwook Hwang, Jinhee Kim, Kang Eun Jeon, Jong Hwan Ko | 2025-02-11 | Download | The implementation of Hyperdimensional Computing (HDC) on In-Memory Computing (IMC) architectures faces significant challenges due to the mismatch between high-dimensional vectors and IMC array sizes, ... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Initialization Matters: Unraveling the Impact of Pre-Training on Federated Learning | Divyansh Jhunjhunwala, Pranay Sharma, Zheng Xu, Gauri Joshi | 2025-02-11 | Download | Initializing with pre-trained models when learning on downstream tasks is becoming standard practice in machine learning. Several recent works explore the benefits of pre-trained initialization in a f... |
| Actor Capabilities for Message Ordering (Extended Version) | Colin S. Gordon | 2025-02-11 | Download | Actor systems are a flexible model of concurrent and distributed programming, which are efficiently implementable, and avoid many classic concurrency bugs by construction. |
| Federated Self-supervised Domain Generalization for Label-efficient Polyp Segmentation | Xinyi Tan, Jiacheng Wang, Liansheng Wang | 2025-02-11 | Download | Employing self-supervised learning (SSL) methodologies assumes paramount significance in handling unlabeled polyp datasets when building deep learning-based automatic polyp segmentation models. |
| HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment | Youhe Jiang, Ran Yan, Binhang Yuan | 2025-02-11 | Download | Disaggregating the prefill and decoding phases represents an effective new paradigm for generative inference of large language models (LLM), which eliminates prefill-decoding interference and optimize... |
| Distributed Non-Interactive Zero-Knowledge Proofs | Alex B. Grilo, Ami Paz, Mor Perry | 2025-02-11 | Download | Distributed certification is a set of mechanisms that allows an all-knowing prover to convince the units of a communication network that the network's state has some desired property, such as being 3-... |
| DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training | Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu | 2025-02-11 | Download | Diffusion Transformers (DiTs) have shown remarkable performance in generating high-quality videos. However, the quadratic complexity of 3D full attention remains a bottleneck in scaling DiT training, ... |
| Quantum Communication Advantage for Leader Election and Agreement | Fabien Dufoulon, Frédéric Magniez, Gopal Pandurangan | 2025-02-11 | Download | This work focuses on understanding the quantum message complexity of two central problems in distributed computing, namely, leader election and agreement in synchronous message-passing communication n... |
| Closing a Source Complexity Gap between Chapel and HPX | Shreyas Atre, Chris Taylor, Patrick Diehl, Hartmut Kaiser | 2025-02-11 | Download | A previous case study measured performance vs source-code complexity across multiple languages. The case study identified Chapel and HPX provide similar performance and code complexity. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Connectivity of LEO Satellite Mega Constellations: An Application of Percolation Theory on a Sphere | Hao Lin, Mustafa A. Kishk, Mohamed-Slim Alouini | 2025-02-11 | Download | With the advent of the 6G era, global connectivity has become a common goal in the evolution of communications, aiming to bring Internet services to more unconnected regions. |
| Energy-as-a-Service for RF-Powered IoE Networks: A Percolation Theory Approach | Hao Lin, Ainur Zhaikhan, Mustafa A. Kishk, Hesham ElSawy, Mohamed-Slim Alouini | 2025-02-11 | Download | Due to the involved massive number of devices, radio frequency (RF) energy harvesting is indispensable to realize the foreseen Internet-of-Everything (IoE) within 6G networks. |
| StarCast: A Secure and Spectrum-Efficient Group Communication Scheme for LEO Satellite Networks | Chaoyu Zhang, Hexuan Yu, Shanghao Shi, Shaoyu Li, Yi Shi, Eric Burger, Y. Thomas Hou, Wenjing Lou | 2025-02-11 | Download | Low Earth Orbit (LEO) satellite networks serve as a cornerstone infrastructure for providing ubiquitous connectivity in areas where terrestrial infrastructure is unavailable. |
| LLM-Sketch: Enhancing Network Sketches with LLM | Yuanpeng Li, Zhen Xu, Zongwei Lv, Yannan Hu, Yong Cui, Tong Yang | 2025-02-11 | Download | Network stream mining is fundamental to many network operations. Sketches, as compact data structures that offer low memory overhead with bounded accuracy, have emerged as a promising solution for net... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Memory Analysis on the Training Course of DeepSeek Models | Ping Zhang, Lei Su | 2025-02-11 | Download | We present a theoretical analysis of GPU memory consumption during the training of DeepSeek models such as DeepSeek-v2 and DeepSeek-v3. Our primary objective is to clarify the device-level memory requ... |