
2024-03-07

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Parendi: Thousand-Way Parallel RTL Simulation | Mahyar Emami, Thomas Bourgeat, James Larus | 2024-03-07 | Download | Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core... |
| Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology | Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, Onur Mutlu | 2024-03-07 | Download | The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM acr... |
| PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures | Geraldo F. Oliveira, Emanuele G. Esposito, Juan Gómez-Luna, Onur Mutlu | 2024-03-07 | Download | Processing-using-DRAM (PUD) architectures impose a restrictive data layout and alignment for their operands, where source and destination operands (i) must reside in the same DRAM subarray (i.e. |
| A methodology to automatically optimize dynamic memory managers applying grammatical evolution | José L. Risco-Martín, J. Manuel Colmenar, J. Ignacio Hidalgo, Juan Lanchares, Josefa Díaz | 2024-03-07 | Download | Modern consumer devices must execute multimedia applications that exhibit high resource utilization. In order to efficiently execute these applications, the dynamic memory subsystem needs to be optimi... |
| Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware Accelerators | Febin Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha | 2024-03-07 | Download | Modern machine learning (ML) applications are becoming increasingly complex and monolithic (single chip) accelerator architectures cannot keep up with their energy efficiency and throughput demands. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimizing CNN Using HPC Tools | Shahrin Rahman | 2024-03-07 | Download | This paper optimizes the Convolutional Neural Network (CNN) algorithm using high-performance computing (HPC) technologies. It uses multi-core processors, GPUs, and parallel computing frameworks like O... |
| Parendi: Thousand-Way Parallel RTL Simulation | Mahyar Emami, Thomas Bourgeat, James Larus | 2024-03-07 | Download | Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core... |
| GreenBytes: Intelligent Energy Estimation for Edge-Cloud | Kasra Kassai, Tasos Dagiuklas, Satwat Bashir, Muddesar Iqbal | 2024-03-07 | Download | This study investigates the application of advanced machine learning models, specifically Long Short-Term Memory (LSTM) networks and Gradient Booster models, for accurate energy consumption estimation... |
| ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks | Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser | 2024-03-07 | Download | Distributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams. |
| Improvements & Evaluations on the MLCommons CloudMask Benchmark | Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao, Banani Ghosh, Sergey V. Samsonau | 2024-03-07 | Download | In this paper, we report the performance benchmarking results of deep learning models on MLCommons' Science cloud-masking benchmark using a high-performance computing cluster at New York University (N... |
| Architectural Blueprint For Heterogeneity-Resilient Federated Learning | Satwat Bashir, Tasos Dagiuklas, Kasra Kassai, Muddesar Iqbal | 2024-03-07 | Download | This paper proposes a novel three tier architecture for federated learning to optimize edge computing environments. The proposed architecture addresses the challenges associated with client data heter... |
| Enhancing Data Quality in Federated Fine-Tuning of Foundation Models | Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang | 2024-03-07 | Download | In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research. |
| On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks | Bingkun Lai, Jiayi He, Jiawen Kang, Gaolei Li, Minrui Xu, Tao zhang, Shengli Xie | 2024-03-07 | Download | Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in Mobile Edge Networks, such as the metaverse and the Industrial Internet of Things. |
| CARISMA: CAR-Integrated Service Mesh Architecture | Kevin Klein, Pascal Hirmer, Steffen Becker | 2024-03-07 | Download | The amount of software in modern cars is increasing continuously with traditional electric/electronic (E/E) architectures reaching their limit when deploying complex applications, e.g. |
| LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression | Laurent Condat, Artavazd Maranjyan, Peter Richtárik | 2024-03-07 | Download | In Distributed optimization and Learning, and even more in the modern framework of federated learning, communication, which is slow and costly, is critical. |
| Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry | Deepti Raghavan, Keshav Santhanam, Muhammad Shahir Rahman, Nayani Modugula, Luis Gaspar Schroeder, Maximilien Cura, Houjun Liu, Pratiksha Thaker, Philip Levis, Matei Zaharia | 2024-03-07 | Download | Compound AI applications chain together subcomponents such as generative language models, document retrievers, and embedding models. Applying traditional systems optimizations such as parallelism and ... |
| Portable GPU implementation of the WP-CCC ion-atom collisions code | I. B. Abdurakhmanov, N. W. Antonio, M. Cytowski, A. S. Kadyrov | 2024-03-07 | Download | We present our experience of porting the code used in the wave-packet convergent-close-coupling (WP-CCC) approach to run on NVIDIA V100 and AMD MI250X GPUs. |
| HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning | Gyudong Kim, Mehdi Ghasemi, Soroush Heidari, Seungryong Kim, Young Geun Kim, Sarma Vrudhula, Carole-Jean Wu | 2024-03-07 | Download | Federated Learning (FL) is a practical approach to train deep learning models collaboratively across user-end devices, protecting user privacy by retaining raw data on-device. |
| GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models | Tolga Dimlioglu, Anna Choromanska | 2024-03-07 | Download | We study distributed training of deep learning models in time-constrained environments. We propose a new algorithm that periodically pulls workers towards the center variable computed as a weighted av... |
| FL-GUARD: A Holistic Framework for Run-Time Detection and Recovery of Negative Federated Learning | Hong Lin, Lidan Shou, Ke Chen, Gang Chen, Sai Wu | 2024-03-07 | Download | Federated learning (FL) is a promising approach for learning a model from data distributed on massive clients without exposing data privacy. It works effectively in the ideal federation where clients ... |
| FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering | Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng | 2024-03-07 | Download | Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| QRtree -- Decision Tree dialect specification of QRscript | Stefano Scanzio, Matteo Rosani, Mattia Scamuzzi, Gianluca Cena | 2024-03-07 | Download | This specification document specifies the syntax and semantics of QRtree, which is a specific dialect of QRscript particularly suited to represent decision trees without chance nodes. |
| QRscript specification | Stefano Scanzio, Matteo Rosani, Mattia Scamuzzi, Gianluca Cena | 2024-03-07 | Download | This specification document specifies the syntax and semantics of QRscript. The current document only shows the part related to the QRscript header, i.e. |
| GreenBytes: Intelligent Energy Estimation for Edge-Cloud | Kasra Kassai, Tasos Dagiuklas, Satwat Bashir, Muddesar Iqbal | 2024-03-07 | Download | This study investigates the application of advanced machine learning models, specifically Long Short-Term Memory (LSTM) networks and Gradient Booster models, for accurate energy consumption estimation... |
| Architectural Blueprint For Heterogeneity-Resilient Federated Learning | Satwat Bashir, Tasos Dagiuklas, Kasra Kassai, Muddesar Iqbal | 2024-03-07 | Download | This paper proposes a novel three tier architecture for federated learning to optimize edge computing environments. The proposed architecture addresses the challenges associated with client data heter... |
| Evacuation Management Framework towards Smart City-wide Intelligent Emergency Interactive Response System | Anuj Abraham, Yi Zhang, Shitala Prasad | 2024-03-07 | Download | A smart city solution toward future 6G network deployment allows small and medium sized enterprises (SMEs), industry, and government entities to connect with the infrastructures and play a crucial rol... |
| On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks | Bingkun Lai, Jiayi He, Jiawen Kang, Gaolei Li, Minrui Xu, Tao zhang, Shengli Xie | 2024-03-07 | Download | Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in Mobile Edge Networks, such as the metaverse and the Industrial Internet of Things. |
| iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning | Debasmita Dey, Nirnay Ghosh | 2024-03-07 | Download | Routing Protocol for Low Power and Lossy Networks (RPL) is the de-facto routing standard in IoT networks. It enables nodes to collaborate and autonomously build ad-hoc networks modeled by tree-like de... |
| Performance evaluation of conditional handover in 5G systems under fading scenario | Souvik Deb, Megh Rathod, Rishi Balamurugan, Shankar K. Ghosh, Rajeev K. Singh, Samriddha Sanyal | 2024-03-07 | Download | To enhance the handover performance in fifth generation (5G) cellular systems, conditional handover (CHO) has been evolved as a promising solution. |
| DV-Hop localization based on Distance Estimation using Multinode and Hop Loss in WSNs | Penghong Wang, Xingtao Wang, Wenrui Li, Xiaopeng Fan, Debin Zhao | 2024-03-07 | Download | Location awareness is a critical issue in wireless sensor network applications. For more accurate location estimation, the two issues should be considered extensively: 1) how to sufficiently utilize t... |
| Super-resolution on network telemetry time series | Fengchen Gong, Divya Raghunathan, Aarti Gupta, Maria Apostolaki | 2024-03-07 | Download | Fine-grained monitoring is crucial for multiple data-driven tasks such as debugging, provisioning, and securing networks. Yet, practical constraints in collecting, extracting, and storing data often f... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology | Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, Onur Mutlu | 2024-03-07 | Download | The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM acr... |
