2025-01-16

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Managed-Retention Memory: A New Class of Memory for the AI Era | Sergey Legtchenko, Ioan Stefanovici, Richard Black, Antony Rowstron, Junyi Liu, Paolo Costa, Burcu Canakci, Dushyanth Narayanan, Xingbo Wu | 2025-01-16 | AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance,... |
| Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures | Pratyush Dhingra, Janardhan Rao Doppa, Partha Pratim Pande | 2025-01-16 | Transformer architectures have become the standard neural network model for various machine learning applications including natural language processing and computer vision. |
| MOGNET: A Mux-residual quantized Network leveraging Online-Generated weights | Van Thien Nguyen, William Guicquero, Gilles Sicard | 2025-01-16 | This paper presents a compact model architecture called MOGNET, compatible with a resource-limited hardware. MOGNET uses a streamlined Convolutional factorization block based on a combination of 2 poi... |
| Holistic Optimization Framework for FPGA Accelerators | Stéphane Pouget, Michael Lo, Louis-Noël Pouchet, Jason Cong | 2025-01-16 | Customized accelerators have revolutionized modern computing by delivering substantial gains in energy efficiency and performance through hardware specialization. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Serverless Computing: Architecture, Concepts, and Applications | Mohsen Ghorbian, Mostafa Ghobaei-Arani | 2025-01-16 | Recently, serverless computing has gained recognition as a leading cloud computing method. Providing a solution that does not require direct server and infrastructure management, this technology has a... |
| The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution | Frank Sifei Luan, Ron Yifeng Wang, Yile Gu, Ziming Mao, Charlotte Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, Stephanie Wang | 2025-01-16 | While ML model training and inference are both GPU-intensive, CPU-based data processing is often the bottleneck. Distributed data processing systems based on the batch or stream processing models assu... |
| Managed-Retention Memory: A New Class of Memory for the AI Era | Sergey Legtchenko, Ioan Stefanovici, Richard Black, Antony Rowstron, Junyi Liu, Paolo Costa, Burcu Canakci, Dushyanth Narayanan, Xingbo Wu | 2025-01-16 | AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance,... |
| Cloud abstractions for AI workloads | Marco Canini, Theophilus A. Benson, Ricardo Bianchini, Íñigo Goiri, Dejan Kostić, Peter Pietzuch, Simon Peter | 2025-01-16 | AI workloads, often hosted in multi-tenant cloud environments, require vast computational resources but suffer inefficiencies due to limited tenant-provider coordination. |
| Core Hours and Carbon Credits: Incentivizing Sustainability in HPC | Alok Kamatar, Maxime Gonthier, Valerie Hayot-Sasson, Andre Bauer, Marcin Copik, Torsten Hoefler, Raul Castro Fernandez, Kyle Chard, Ian Foster | 2025-01-16 | Realizing a shared responsibility between providers and consumers is critical to manage the sustainability of HPC. However, while cost may motivate efficiency improvements by infrastructure operators,... |
| RE-POSE: Synergizing Reinforcement Learning-Based Partitioning and Offloading for Edge Object Detection | Jianrui Shi, Yong Zhao, Zeyang Cui, Xiaoming Shen, Minhang Zeng, Xiaojie Liu | 2025-01-16 | Object detection plays a crucial role in smart video analysis, with applications ranging from autonomous driving and security to smart cities. |
| Boosting Performance of Iterative Applications on GPUs: Kernel Batching with CUDA Graphs | Jonah Ekelund, Stefano Markidis, Ivy Peng | 2025-01-16 | Graphics Processing Units (GPUs) have become the standard in accelerating scientific applications on heterogeneous systems. However, as GPUs are getting faster, one potential performance bottleneck wi... |
| PICE: A Semantic-Driven Progressive Inference System for LLM Serving in Cloud-Edge Networks | Huiyou Zhan, Xuan Zhang, Haisheng Tan, Han Tian, Dongping Yong, Junyang Zhang, Xiang-Yang Li | 2025-01-16 | Large language models (LLMs), while driving a new wave of interactive AI applications across numerous domains, suffer from high inference costs and heavy cloud dependency. |
| Jodes: Efficient Oblivious Join in the Distributed Setting | Yilei Wang, Xiangdong Zeng, Sheng Wang, Feifei Li | 2025-01-16 | Trusted execution environment (TEE) has provided an isolated and secure environment for building cloud-based analytic systems, but it still suffers from access pattern leakages caused by side-channel ... |
| PATCHEDSERVE: A Patch Management Framework for SLO-Optimized Hybrid Resolution Diffusion Serving | Desen Sun, Zepeng Zhao, Yuke Wang | 2025-01-16 | The Text-to-Image (T2I) diffusion model has emerged as one of the most widely adopted generative models. However, serving diffusion models at the granularity of entire images introduces significant ch... |
| Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores | Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi | 2025-01-16 | General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) b... |
| Split Fine-Tuning for Large Language Models in Wireless Networks | Songge Zhang, Guoliang Cheng, Xinyu Huang, Zuguang Li, Wen Wu, Lingyang Song, Xuemin Shen | 2025-01-16 | Fine-tuning is the process of adapting the pre-trained large language models (LLMs) for downstream tasks. Due to substantial parameters, fine-tuning LLMs on mobile devices demands considerable memory ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Complex-Valued Neural Networks for Ultra-Reliable Massive MIMO | Pedro Benevenuto Valadares, Jonathan Aguiar Soares, Kayol Mayer, Dalton Soares Arantes | 2025-01-16 | In the evolving landscape of 5G and 6G networks, the demands extend beyond high data rates, ultra-low latency, and extensive coverage, increasingly emphasizing the need for reliability. |
| pFedWN: A Personalized Federated Learning Framework for D2D Wireless Networks with Heterogeneous Data | Zhou Ni, Masoud Ghazikor, Morteza Hashemi | 2025-01-16 | Traditional Federated Learning (FL) approaches often struggle with data heterogeneity across clients, leading to suboptimal model performance for individual clients. |
| Ruling the Unruly: Designing Effective, Low-Noise Network Intrusion Detection Rules for Security Operations Centers | Koen T. W. Teuwen, Tom Mulders, Emmanuele Zambon, Luca Allodi | 2025-01-16 | Many Security Operations Centers (SOCs) today still heavily rely on signature-based Network Intrusion Detection Systems (NIDS) such as Suricata. |
| Parallel multi-objective metaheuristics for smart communications in vehicular networks | Jamal Toutouh, Enrique Alba | 2025-01-16 | This article analyzes the use of two parallel multi-objective soft computing algorithms to automatically search for high-quality settings of the Ad hoc On Demand Vector routing protocol for vehicular ... |
| Intelligent OLSR Routing Protocol Optimization for VANETs | Jamal Toutouh, José García-Nieto, Enrique Alba | 2025-01-16 | Recent advances in wireless technologies have given rise to the emergence of vehicular ad hoc networks (VANETs). In such networks, the limited coverage of WiFi and the high mobility of the nodes gener... |
| Authenticated Delegation and Authorized AI Agents | Tobin South, Samuele Marro, Thomas Hardjono, Robert Mahari, Cedric Deslandes Whitney, Dazza Greenwood, Alan Chan, Alex Pentland | 2025-01-16 | The rapid deployment of autonomous AI agents creates urgent challenges around authorization, accountability, and access control in digital spaces. |
| HpC: A Calculus for Hybrid and Mobile Systems -- Full Version | Xiong Xu, Jean-Pierre Talpin, Shuling Wang, Hao Wu, Bohua Zhan, Xinxin Liu, Naijun Zhan | 2025-01-16 | Networked cybernetic and physical systems of the Internet of Things (IoT) immerse civilian and industrial infrastructures into an interconnected and dynamic web of hybrid and mobile devices. |
| MoE²: Optimizing Collaborative Inference for Edge Large Language Models | Lyudong Jin, Yanning Zhang, Yanhan Li, Shurong Wang, Howard H. Yang, Jian Wu, Meng Zhang | 2025-01-16 | Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks. Exploiting the heterogeneous capabilities of edge LLMs is crucial for d... |
| Artificial Intelligence, Ambient Backscatter Communication and Non-Terrestrial Networks: A 6G Commixture | Muhammad Ali Jamshed, Bushra Haq, Muhammad Ahmed Mohsin, Ali Nauman, Halim Yanikomeroglu | 2025-01-16 | The advent of Non-Terrestrial Networks (NTN) represents a compelling response to the International Mobile Telecommunications 2030 (IMT-2030) framework, enabling the delivery of advanced, seamless conn... |
| Contract-Inspired Contest Theory for Controllable Image Generation in Mobile Edge Metaverse | Guangyuan Liu, Hongyang Du, Jiacheng Wang, Dusit Niyato, Dong In Kim | 2025-01-16 | The rapid advancement of immersive technologies has propelled the development of the Metaverse, where the convergence of virtual and physical realities necessitates the generation of high-quality, pho... |
| Adaptive Contextual Caching for Mobile Edge Large Language Model Service | Guangyuan Liu, Yinqiu Liu, Jiacheng Wang, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong | 2025-01-16 | Mobile edge Large Language Model (LLM) deployments face inherent constraints, such as limited computational resources and network bandwidth. Although Retrieval-Augmented Generation (RAG) mitigates som... |
| MagnetDB: A Longitudinal Torrent Discovery Dataset with IMDb-Matched Movies and TV Shows | Scott Seidenberger, Noah Pursell, Anindya Maiti | 2025-01-16 | BitTorrent remains a prominent channel for illicit distribution of copyrighted material, yet the supply side of such content remains understudied. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Quantum-Enhanced Transformers for Robust Acoustic Scene Classification in IoT Environments | Minh K. Quan, Mayuri Wijayasundara, Sujeeva Setunge, Pubudu N. Pathirana | 2025-01-16 | The proliferation of Internet of Things (IoT) devices equipped with acoustic sensors necessitates robust acoustic scene classification (ASC) capabilities, even in noisy and data-limited environments. |
| PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Trevor McInroe, Samuel Garcin | 2025-01-16 | We present PixelBrax, a set of continuous control tasks with pixel observations. We combine the Brax physics engine with a pure JAX renderer, allowing reinforcement learning (RL) experiments to run en... |
