
2025-02-02

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference | Patrick Yubeaton, Tareq Mahmoud, Shehab Naga, Pooria Taheri, Tianhua Xia, Arun George, Yasmein Khalil, Sai Qian Zhang, Siddharth Joshi, Chinmay Hegde, Siddharth Garg | 2025-02-02 | As they become more capable, large language models (LLMs) have continued to rapidly increase in size. This has exacerbated the difficulty in running state-of-the-art LLMs on small, edge devices. |
| A Flexible Precision Scaling Deep Neural Network Accelerator with Efficient Weight Combination | Liang Zhao, Kunming Shao, Fengshi Tian, Tim Kwang-Ting Cheng, Chi-Ying Tsui, Yi Zou | 2025-02-02 | Deploying mixed-precision neural networks on edge devices is friendly to hardware resources and power consumption. To support fully mixed-precision neural network inference, it is necessary to design ... |
| DeepGate4: Efficient and Effective Representation Learning for Circuit Design at Scale | Ziyang Zheng, Shan Huang, Jianyuan Zhong, Zhengyuan Shi, Guohao Dai, Ningyi Xu, Qiang Xu | 2025-02-02 | Circuit representation learning has become pivotal in electronic design automation, enabling critical tasks such as testability analysis, logic reasoning, power estimation, and SAT solving. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving | Haoran Qiu, Anish Biswas, Zihan Zhao, Jayashree Mohan, Alind Khare, Esha Choukse, Íñigo Goiri, Zeyu Zhang, Haiying Shen, Chetan Bansal, Ramachandran Ramjee, Rodrigo Fonseca | 2025-02-02 | Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significan... |
| FedRIR: Rethinking Information Representation in Federated Learning | Yongqiang Huang, Zerui Shao, Ziyuan Yang, Zexin Lu, Yi Zhang | 2025-02-02 | Mobile and Web-of-Things (WoT) devices at the network edge generate vast amounts of data for machine learning applications, yet privacy concerns hinder centralized model training. |
| ATA: Adaptive Task Allocation for Efficient Resource Management in Distributed Machine Learning | Artavazd Maranjyan, El Mehdi Saad, Peter Richtárik, Francesco Orabona | 2025-02-02 | Asynchronous methods are fundamental for parallelizing computations in distributed machine learning. They aim to accelerate training by fully utilizing all available resources. |
| Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs | Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki | 2025-02-02 | Recent advancements in Large Language Models (LLMs) have led to increasingly diverse requests, accompanied by varying resource (compute and memory) demands to serve them. |
| DeLIAP e DeLIAJ: Interfaces de biblioteca de Dependabilidade para Python e Julia | Marcos Irigoyen, Carla Santana, Ramon C. F Araújo, Samuel Xavier-de-Souza | 2025-02-02 | The ever-growing computational complexity of High Performance Computing applications is often met with a horizontal scaling of computing systems. |
| Optimal local certification on graphs of bounded pathwidth | Dan Alden Baterisna, Yi-Jun Chang | 2025-02-02 | We present proof labeling schemes for graphs with bounded pathwidth that can decide any graph property expressible in monadic second-order (MSO) logic using $O(\log n)$-bit vertex labels. |
| POSMAC: Powering Up In-Network AR/CG Traffic Classification with Online Learning | Alireza Shirmarz, Fabio Luciano Verdi, Suneet Kumar Singh, Christian Esteve Rothenberg | 2025-02-02 | In this demonstration, we showcase POSMAC, a platform designed to deploy Decision Tree (DT) and Random Forest (RF) models on the NVIDIA DOCA DPU, equipped with an ARM processor, for real-time network... |
| General Coded Computing in a Probabilistic Straggler Regime | Parsa Moradi, Mohammad Ali Maddah-Ali | 2025-02-02 | Coded computing has demonstrated promising results in addressing straggler resiliency in distributed computing systems. However, most coded computing schemes are designed for exact computation, requir... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Improving SDN Performance Using Network Coding: A Quantitative Analysis | Amer T. Ali, Qutaiba I. Ali | 2025-02-02 | Software Defined Networking (SDN) is an architectural approach to managing the network in which control and forwarding are separate planes controlled through an application interface. |
| Mathematical Modeling for Network Upgrades in Internet Service Provider Infrastructure | Omar M Malallah, Qutaiba I. Ali | 2025-02-02 | The growing demand for superior Internet services puts great pressure on ISPs to accurately estimate network upgrade needs. |
| Forecasting Global Network Traffic Trends: The Role of Virtual Reality | Raghad H. AlShekh, Qutaiba I. Ali | 2025-02-02 | Virtual Reality (VR) technology demands real-time data transmission to deliver an immersive and interactive user experience. This study investigates the implementation of UDP Ethernet communication in... |
| REAL: Reinforcement Learning-Enabled xApps for Experimental Closed-Loop Optimization in O-RAN with OSC RIC and srsRAN | Ryan Barker, Alireza Ebrahimi Dorcheh, Tolunay Seyfi, Fatemeh Afghah | 2025-02-02 | Open Radio Access Network (O-RAN) offers an open, programmable architecture for next-generation wireless networks, enabling advanced control through AI-based applications on the near-Real-Time RAN Int... |
| CardioLive: Empowering Video Streaming with Online Cardiac Monitoring | Sheng Lyu, Ruiming Huang, Sijie Ji, Yasar Abbas Ur Rehman, Lan Ma, Chenshu Wu | 2025-02-02 | Online Cardiac Monitoring (OCM) emerges as a compelling enhancement for the next-generation video streaming platforms. It enables various applications including remote health, online affective computi... |
| POSMAC: Powering Up In-Network AR/CG Traffic Classification with Online Learning | Alireza Shirmarz, Fabio Luciano Verdi, Suneet Kumar Singh, Christian Esteve Rothenberg | 2025-02-02 | In this demonstration, we showcase POSMAC, a platform designed to deploy Decision Tree (DT) and Random Forest (RF) models on the NVIDIA DOCA DPU, equipped with an ARM processor, for real-time network... |
| Congestion Management in High-Performance Interconnection Networks Using Adaptive Routing Notifications | Jose Rocher-Gonzalez, Jesus Escudero-Sahuquillo, Pedro J. Garcia, Francisco J. Quiles | 2025-02-02 | The interconnection network is a crucial subsystem in High-Performance Computing clusters and data centers, guaranteeing high bandwidth and low latency to the applications' communication operations. |
| Using Causality for Enhanced Prediction of Web Traffic Time Series | Chang Tian, Mingzhe Xing, Zenglin Shi, Matthew B. Blaschko, Yinliang Yue, Marie-Francine Moens | 2025-02-02 | Predicting web service traffic has significant social value, as it can be applied to various practical scenarios, including but not limited to dynamic resource scaling, load balancing, system anomaly ... |
| Hades: Hierarchical Adaptable Decoding for Efficient and Elastic vRAN | Jincao Zhu, Kobus Van Der Merwe, Xenofon Foukas, Bozidar Radunovic | 2025-02-02 | In cellular networks, virtualized Radio Access Networks (vRANs) enable replacing traditional specialized hardware at cell sites with software running on commodity servers distributed across edge and r... |
