
2024-06-03

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Suvinay Subramanian, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna | 2024-06-03 | Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these gigantic models efficiently for diverse ... |
| A 0.96pJ/SOP, 30.23K-neuron/mm² Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing | P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao | 2024-06-03 | Edge-AI computing requires high energy efficiency, low power consumption, and relatively high flexibility and compact area, challenging the AI-chip design. This work presents a 0... |
| ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation | Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye, Dongrui Fan | 2024-06-03 | Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, rendering them widely applied in many critical real-world domains. |
| VERTECS: A COTS-based payload interface board to enable next generation astronomical imaging payloads | Ezra Fielding, Victor H. Schulz, Keenan A. A. Chatar, Kei Sano, Akitoshi Hanazawa | 2024-06-03 | Due to advances in observation and imaging technologies, modern astronomical satellites generate large volumes of data. This necessitates efficient onboard data processing and high-speed data downlink... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Efficient Data Distribution Estimation for Accelerated Federated Learning | Yuanli Wang, Lei Huang | 2024-06-03 | Federated Learning (FL) is a privacy-preserving machine learning paradigm where a global model is trained in-situ across a large number of distributed edge devices. |
| Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification | Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt | 2024-06-03 | While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as... |
| A Surprisingly Simple Method for Distributed Euclidean-Minimum Spanning Tree / Single Linkage Dendrogram Construction from High Dimensional Embeddings via Distance Decomposition | Richard Lettich | 2024-06-03 | We introduce a decomposition method for the distributed calculation of exact Euclidean Minimum Spanning Trees in high dimensions (where sub-quadratic algorithms are not effective), or more generalized... |
| Demystifying AI Platform Design for Distributed Inference of Next-Generation LLM models | Abhimanyu Bambhaniya, Ritik Raj, Geonhwa Jeong, Souvik Kundu, Sudarshan Srinivasan, Suvinay Subramanian, Midhilesh Elavazhagan, Madhu Kumar, Tushar Krishna | 2024-06-03 | Large language models (LLMs) have shown remarkable performance across a wide range of applications, often outperforming human experts. However, deploying these gigantic models efficiently for diverse ... |
| Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow | Yixuan Mei, Yonghao Zhuang, Xupeng Miao, Juncheng Yang, Zhihao Jia, Rashmi Vinayak | 2024-06-03 | This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving in heterogeneous GPU clusters. |
| Asynchronous Multi-Server Federated Learning for Geo-Distributed Clients | Yuncong Zuo, Bart Cox, Lydia Y. Chen, Jérémie Decouchant | 2024-06-03 | Federated learning (FL) systems enable multiple clients to train a machine learning model iteratively through synchronously exchanging the intermediate model weights with a single server. |
| Asynchronous Byzantine Federated Learning | Bart Cox, Abele Mălan, Lydia Y. Chen, Jérémie Decouchant | 2024-06-03 | Federated learning (FL) enables a set of geographically distributed clients to collectively train a model through a server. Classically, the training process is synchronous, but can be made asynchrono... |
| Performance comparison of Dask and Apache Spark on HPC systems for Neuroimaging | Mathieu Dugré, Valérie Hayot-Sasson, Tristan Glatard | 2024-06-03 | The general increase in data size and data sharing motivates the adoption of Big Data strategies in several scientific disciplines. However, while several options are available, no particular guidelin... |
| sAirflow: Adopting Serverless in a Legacy Workflow Scheduler | Filip Mikina, Pawel Zuk, Krzysztof Rzadca | 2024-06-03 | Serverless clouds promise efficient scaling, reduced toil and monetary costs. Yet, serverless-ing a complex, legacy application might require major refactoring and thus is risky. |
| A GPU-ready pseudo-spectral method for direct numerical simulations of multiphase turbulence | Alessio Roccon | 2024-06-03 | In this work, we detail the GPU-porting of an in-house pseudo-spectral solver tailored towards large-scale simulations of interface-resolved simulation of drop- and bubble-laden turbulent flows. |
| Structures and Techniques for Streaming Dynamic Graph Processing on Decentralized Message-Driven Systems | Bibrak Qamar Chandio, Maciej Brodowicz, Thomas Sterling | 2024-06-03 | The paper presents structures and techniques aimed towards co-designing scalable asynchronous and decentralized dynamic graph processing for fine-grain memory-driven architectures. |
| Formal Definition and Implementation of Reproducibility Tenets for Computational Workflows | Nicholas J. Pritchard, Andreas Wicenec | 2024-06-03 | Computational workflow management systems power contemporary data-intensive sciences. The slowly resolving reproducibility crisis presents both a sobering warning and an opportunity to iterate on what... |
| No Vandalism: Privacy-Preserving and Byzantine-Robust Federated Learning | Zhibo Xing, Zijian Zhang, Zi'ang Zhang, Jiamou Liu, Liehuang Zhu, Giovanni Russello | 2024-06-03 | Federated learning allows several clients to train one machine learning model jointly without sharing private data, providing privacy protection. |
| An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing | Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda | 2024-06-03 | Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization but trying... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Non-uniformity is All You Need: Efficient and Timely Encrypted Traffic Classification With ECHO | Shilo Daum, Tal Shapira, Anat Bremler-Barr, David Hay | 2024-06-03 | With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO -- a novel optimization process... |
| TSpec-LLM: An Open-source Dataset for LLM Understanding of 3GPP Specifications | Rasoul Nikbakht, Mohamed Benzaghta, Giovanni Geraci | 2024-06-03 | Understanding telecom standards involves sorting through numerous technical documents, such as those produced by the 3rd Generation Partnership Project (3GPP), which is time-consuming and labor-intens... |
| Experimental comparison of 5G SDR platforms: srsRAN x OpenAirInterface | Ruan P. Alves, Joao Guilherme A. da S. Alves, Mikael R. Camelo, Wilker O. de Feitosa, Victor F. Monteiro, Fco. Rodrigo P. Cavalcanti | 2024-06-03 | A Software-Defined Radio (SDR) platform is a communication system that implements, as software, functions that are typically implemented in dedicated hardware. |
| Conditional Gumbel-Softmax for constrained feature selection with application to node selection in wireless sensor networks | Thomas Strypsteen, Alexander Bertrand | 2024-06-03 | In this paper, we introduce Conditional Gumbel-Softmax as a method to perform end-to-end learning of the optimal feature subset for a given task and deep neural network (DNN) model, while adhering to ... |
| Joint Constellation Shaping Using Gradient Descent Approach for MU-MIMO Broadcast Channel | Maxime Vaillant, Alix Jeannerot, Jean-Marie Gorce | 2024-06-03 | We introduce a learning-based approach to optimize a joint constellation for a multi-user MIMO broadcast channel (T Tx antennas, K users, each with R Rx antennas), with perfect channel knowledge... |
| Comparison of 5G Performance Post-Merger between Two Network Operators Using Field Tests in Urban Areas | Surachai Chatchalermpu, Therdpong Daengsi, Pakkasit Sriamorntrakul, Kritphon Phanrattanachai | 2024-06-03 | In late Q1/2023, DTAC and TRUE officially completed their merger. Consequently, this study was initiated to ascertain whether their respective 5G networks had been seamlessly integrated several months... |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Recover as It is Designed to Be: Recovering from Compatibility Mobile App Crashes by Reusing User Flows | Donghwi Kim, Hyungjun Yoon, Chang Min Park, Sujin Han, Youngjin Kwon, Steven Y. Ko, Sung-Ju Lee | 2024-06-03 | Android OS is severely fragmented by API updates and device vendors' OS customization, creating a market condition where vastly different OS versions coexist. |
| SVFF: An Automated Framework for SR-IOV Virtual Function Management in FPGA Accelerated Virtualized Environments | Stefano Cirici, Michele Paolino, Daniel Raho | 2024-06-03 | FPGA accelerator devices have emerged as a powerful platform for implementing high-performance and scalable solutions in a wide range of industries, leveraging their reconfigurability and virtualizati... |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Impact of Generative AI (Large Language Models) on the PRA model construction and maintenance, observations | Valentin Rychkov, Claudia Picoco, Emilie Caleca | 2024-06-03 | The rapid development of Large Language Models (LLMs) and Generative Pre-Trained Transformers (GPTs) in the field of Generative Artificial Intelligence (AI) can significantly impact task automation in ... |

Built with VitePress