2024-03-07

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Parendi: Thousand-Way Parallel RTL Simulation	Mahyar Emami, Thomas Bourgeat, James Larus	2024-03-07	Download	Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core...
Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology	Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, Onur Mutlu	2024-03-07	Download	The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM acr...
PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures	Geraldo F. Oliveira, Emanuele G. Esposito, Juan Gómez-Luna, Onur Mutlu	2024-03-07	Download	Processing-using-DRAM (PUD) architectures impose a restrictive data layout and alignment for their operands, where source and destination operands (i) must reside in the same DRAM subarray (i.e.
A methodology to automatically optimize dynamic memory managers applying grammatical evolution	José L. Risco-Martín, J. Manuel Colmenar, J. Ignacio Hidalgo, Juan Lanchares, Josefa Díaz	2024-03-07	Download	Modern consumer devices must execute multimedia applications that exhibit high resource utilization. In order to efficiently execute these applications, the dynamic memory subsystem needs to be optimi...
Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware Accelerators	Febin Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha	2024-03-07	Download	Modern machine learning (ML) applications are becoming increasingly complex and monolithic (single chip) accelerator architectures cannot keep up with their energy efficiency and throughput demands.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Optimizing CNN Using HPC Tools	Shahrin Rahman	2024-03-07	Download	This paper optimizes the Convolutional Neural Network (CNN) algorithm using high-performance computing (HPC) technologies. It uses multi-core processors, GPUs, and parallel computing frameworks like O...
Parendi: Thousand-Way Parallel RTL Simulation	Mahyar Emami, Thomas Bourgeat, James Larus	2024-03-07	Download	Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core...
GreenBytes: Intelligent Energy Estimation for Edge-Cloud	Kasra Kassai, Tasos Dagiuklas, Satwat Bashir, Muddesar Iqbal	2024-03-07	Download	This study investigates the application of advanced machine learning models, specifically Long Short-Term Memory (LSTM) networks and Gradient Booster models, for accurate energy consumption estimation...
ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks	Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser	2024-03-07	Download	Distributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams.
Improvements & Evaluations on the MLCommons CloudMask Benchmark	Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao, Banani Ghosh, Sergey V. Samsonau	2024-03-07	Download	In this paper, we report the performance benchmarking results of deep learning models on MLCommons' Science cloud-masking benchmark using a high-performance computing cluster at New York University (N...
Architectural Blueprint For Heterogeneity-Resilient Federated Learning	Satwat Bashir, Tasos Dagiuklas, Kasra Kassai, Muddesar Iqbal	2024-03-07	Download	This paper proposes a novel three tier architecture for federated learning to optimize edge computing environments. The proposed architecture addresses the challenges associated with client data heter...
Enhancing Data Quality in Federated Fine-Tuning of Foundation Models	Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang	2024-03-07	Download	In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research.
On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks	Bingkun Lai, Jiayi He, Jiawen Kang, Gaolei Li, Minrui Xu, Tao zhang, Shengli Xie	2024-03-07	Download	Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in Mobile Edge Networks, such as the metaverse and the Industrial Internet of Things.
CARISMA: CAR-Integrated Service Mesh Architecture	Kevin Klein, Pascal Hirmer, Steffen Becker	2024-03-07	Download	The amount of software in modern cars is increasing continuously with traditional electric/electronic (E/E) architectures reaching their limit when deploying complex applications, e.g.
LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression	Laurent Condat, Artavazd Maranjyan, Peter Richtárik	2024-03-07	Download	In Distributed optimization and Learning, and even more in the modern framework of federated learning, communication, which is slow and costly, is critical.
Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry	Deepti Raghavan, Keshav Santhanam, Muhammad Shahir Rahman, Nayani Modugula, Luis Gaspar Schroeder, Maximilien Cura, Houjun Liu, Pratiksha Thaker, Philip Levis, Matei Zaharia	2024-03-07	Download	Compound AI applications chain together subcomponents such as generative language models, document retrievers, and embedding models. Applying traditional systems optimizations such as parallelism and ...
Portable GPU implementation of the WP-CCC ion-atom collisions code	I. B. Abdurakhmanov, N. W. Antonio, M. Cytowski, A. S. Kadyrov	2024-03-07	Download	We present our experience of porting the code used in the wave-packet convergent-close-coupling (WP-CCC) approach to run on NVIDIA V100 and AMD MI250X GPUs.
HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning	Gyudong Kim, Mehdi Ghasemi, Soroush Heidari, Seungryong Kim, Young Geun Kim, Sarma Vrudhula, Carole-Jean Wu	2024-03-07	Download	Federated Learning (FL) is a practical approach to train deep learning models collaboratively across user-end devices, protecting user privacy by retaining raw data on-device.
GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models	Tolga Dimlioglu, Anna Choromanska	2024-03-07	Download	We study distributed training of deep learning models in time-constrained environments. We propose a new algorithm that periodically pulls workers towards the center variable computed as a weighted av...
FL-GUARD: A Holistic Framework for Run-Time Detection and Recovery of Negative Federated Learning	Hong Lin, Lidan Shou, Ke Chen, Gang Chen, Sai Wu	2024-03-07	Download	Federated learning (FL) is a promising approach for learning a model from data distributed on massive clients without exposing data privacy. It works effectively in the ideal federation where clients ...
FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering	Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng	2024-03-07	Download	Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
QRtree -- Decision Tree dialect specification of QRscript	Stefano Scanzio, Matteo Rosani, Mattia Scamuzzi, Gianluca Cena	2024-03-07	Download	This specification document specifies the syntax and semantics of QRtree, which is a specific dialect of QRscript particularly suited to represent decision trees without chance nodes.
QRscript specification	Stefano Scanzio, Matteo Rosani, Mattia Scamuzzi, Gianluca Cena	2024-03-07	Download	This specification document specifies the syntax and semantics of QRscript. The current document only shows the part related to the QRscript header, i.e.
GreenBytes: Intelligent Energy Estimation for Edge-Cloud	Kasra Kassai, Tasos Dagiuklas, Satwat Bashir, Muddesar Iqbal	2024-03-07	Download	This study investigates the application of advanced machine learning models, specifically Long Short-Term Memory (LSTM) networks and Gradient Booster models, for accurate energy consumption estimation...
Architectural Blueprint For Heterogeneity-Resilient Federated Learning	Satwat Bashir, Tasos Dagiuklas, Kasra Kassai, Muddesar Iqbal	2024-03-07	Download	This paper proposes a novel three tier architecture for federated learning to optimize edge computing environments. The proposed architecture addresses the challenges associated with client data heter...
Evacuation Management Framework towards Smart City-wide Intelligent Emergency Interactive Response System	Anuj Abraham, Yi Zhang, Shitala Prasad	2024-03-07	Download	A smart city solution toward future 6G network deployment allows small and medium sized enterprises (SMEs), industry, and government entities to connect with the infrastructures and play a crucial rol...
On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks	Bingkun Lai, Jiayi He, Jiawen Kang, Gaolei Li, Minrui Xu, Tao zhang, Shengli Xie	2024-03-07	Download	Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in Mobile Edge Networks, such as the metaverse and the Industrial Internet of Things.
iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning	Debasmita Dey, Nirnay Ghosh	2024-03-07	Download	Routing Protocol for Low Power and Lossy Networks (RPL) is the de-facto routing standard in IoT networks. It enables nodes to collaborate and autonomously build ad-hoc networks modeled by tree-like de...
Performance evaluation of conditional handover in 5G systems under fading scenario	Souvik Deb, Megh Rathod, Rishi Balamurugan, Shankar K. Ghosh, Rajeev K. Singh, Samriddha Sanyal	2024-03-07	Download	To enhance the handover performance in fifth generation (5G) cellular systems, conditional handover (CHO) has been evolved as a promising solution.
DV-Hop localization based on Distance Estimation using Multinode and Hop Loss in WSNs	Penghong Wang, Xingtao Wang, Wenrui Li, Xiaopeng Fan, Debin Zhao	2024-03-07	Download	Location awareness is a critical issue in wireless sensor network applications. For more accurate location estimation, the two issues should be considered extensively: 1) how to sufficiently utilize t...
Super-resolution on network telemetry time series	Fengchen Gong, Divya Raghunathan, Aarti Gupta, Maria Apostolaki	2024-03-07	Download	Fine-grained monitoring is crucial for multiple data-driven tasks such as debugging, provisioning, and securing networks. Yet, practical constraints in collecting, extracting, and storing data often f...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology	Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, Onur Mutlu	2024-03-07	Download	The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM acr...