2024-03-07
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Parendi: Thousand-Way Parallel RTL Simulation | Mahyar Emami, Thomas Bourgeat, James Larus | 2024-03-07 | Download | Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core... |
| Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology | Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, Onur Mutlu | 2024-03-07 | Download | The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM acr... |
| PUMA: Efficient and Low-Cost Memory Allocation and Alignment Support for Processing-Using-Memory Architectures | Geraldo F. Oliveira, Emanuele G. Esposito, Juan Gómez-Luna, Onur Mutlu | 2024-03-07 | Download | Processing-using-DRAM (PUD) architectures impose a restrictive data layout and alignment for their operands, where source and destination operands (i) must reside in the same DRAM subarray (i.e. |
| A methodology to automatically optimize dynamic memory managers applying grammatical evolution | José L. Risco-Martín, J. Manuel Colmenar, J. Ignacio Hidalgo, Juan Lanchares, Josefa Díaz | 2024-03-07 | Download | Modern consumer devices must execute multimedia applications that exhibit high resource utilization. In order to efficiently execute these applications, the dynamic memory subsystem needs to be optimi... |
| Silicon Photonic 2.5D Interposer Networks for Overcoming Communication Bottlenecks in Scale-out Machine Learning Hardware Accelerators | Febin Sunny, Ebadollah Taheri, Mahdi Nikdast, Sudeep Pasricha | 2024-03-07 | Download | Modern machine learning (ML) applications are becoming increasingly complex and monolithic (single chip) accelerator architectures cannot keep up with their energy efficiency and throughput demands. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Optimizing CNN Using HPC Tools | Shahrin Rahman | 2024-03-07 | Download | This paper optimizes the Convolutional Neural Network (CNN) algorithm using high-performance computing (HPC) technologies. It uses multi-core processors, GPUs, and parallel computing frameworks like O... |
| Parendi: Thousand-Way Parallel RTL Simulation | Mahyar Emami, Thomas Bourgeat, James Larus | 2024-03-07 | Download | Hardware development critically depends on cycle-accurate RTL simulation. However, as chip complexity increases, conventional single-threaded simulation becomes impractical due to stagnant single-core... |
| GreenBytes: Intelligent Energy Estimation for Edge-Cloud | Kasra Kassai, Tasos Dagiuklas, Satwat Bashir, Muddesar Iqbal | 2024-03-07 | Download | This study investigates the application of advanced machine learning models, specifically Long Short-Term Memory (LSTM) networks and Gradient Booster models, for accurate energy consumption estimation... |
| ShuffleBench: A Benchmark for Large-Scale Data Shuffling Operations with Distributed Stream Processing Frameworks | Sören Henning, Adriano Vogel, Michael Leichtfried, Otmar Ertl, Rick Rabiser | 2024-03-07 | Download | Distributed stream processing frameworks help building scalable and reliable applications that perform transformations and aggregations on continuous data streams. |
| Improvements & Evaluations on the MLCommons CloudMask Benchmark | Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao, Banani Ghosh, Sergey V. Samsonau | 2024-03-07 | Download | In this paper, we report the performance benchmarking results of deep learning models on MLCommons' Science cloud-masking benchmark using a high-performance computing cluster at New York University (N... |
| Architectural Blueprint For Heterogeneity-Resilient Federated Learning | Satwat Bashir, Tasos Dagiuklas, Kasra Kassai, Muddesar Iqbal | 2024-03-07 | Download | This paper proposes a novel three tier architecture for federated learning to optimize edge computing environments. The proposed architecture addresses the challenges associated with client data heter... |
| Enhancing Data Quality in Federated Fine-Tuning of Foundation Models | Wanru Zhao, Yaxin Du, Nicholas Donald Lane, Siheng Chen, Yanfeng Wang | 2024-03-07 | Download | In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research. |
| On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks | Bingkun Lai, Jiayi He, Jiawen Kang, Gaolei Li, Minrui Xu, Tao zhang, Shengli Xie | 2024-03-07 | Download | Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in Mobile Edge Networks, such as the metaverse and the Industrial Internet of Things. |
| CARISMA: CAR-Integrated Service Mesh Architecture | Kevin Klein, Pascal Hirmer, Steffen Becker | 2024-03-07 | Download | The amount of software in modern cars is increasing continuously with traditional electric/electronic (E/E) architectures reaching their limit when deploying complex applications, e.g. |
| LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression | Laurent Condat, Artavazd Maranjyan, Peter Richtárik | 2024-03-07 | Download | In Distributed optimization and Learning, and even more in the modern framework of federated learning, communication, which is slow and costly, is critical. |
| Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry | Deepti Raghavan, Keshav Santhanam, Muhammad Shahir Rahman, Nayani Modugula, Luis Gaspar Schroeder, Maximilien Cura, Houjun Liu, Pratiksha Thaker, Philip Levis, Matei Zaharia | 2024-03-07 | Download | Compound AI applications chain together subcomponents such as generative language models, document retrievers, and embedding models. Applying traditional systems optimizations such as parallelism and ... |
| Portable GPU implementation of the WP-CCC ion-atom collisions code | I. B. Abdurakhmanov, N. W. Antonio, M. Cytowski, A. S. Kadyrov | 2024-03-07 | Download | We present our experience of porting the code used in the wave-packet convergent-close-coupling (WP-CCC) approach to run on NVIDIA V100 and AMD MI250X GPUs. |
| HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning | Gyudong Kim, Mehdi Ghasemi, Soroush Heidari, Seungryong Kim, Young Geun Kim, Sarma Vrudhula, Carole-Jean Wu | 2024-03-07 | Download | Federated Learning (FL) is a practical approach to train deep learning models collaboratively across user-end devices, protecting user privacy by retaining raw data on-device. |
| GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models | Tolga Dimlioglu, Anna Choromanska | 2024-03-07 | Download | We study distributed training of deep learning models in time-constrained environments. We propose a new algorithm that periodically pulls workers towards the center variable computed as a weighted av... |
| FL-GUARD: A Holistic Framework for Run-Time Detection and Recovery of Negative Federated Learning | Hong Lin, Lidan Shou, Ke Chen, Gang Chen, Sai Wu | 2024-03-07 | Download | Federated learning (FL) is a promising approach for learning a model from data distributed on massive clients without exposing data privacy. It works effectively in the ideal federation where clients ... |
| FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering | Md Sirajul Islam, Simin Javaherian, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng | 2024-03-07 | Download | Federated learning (FL) is an emerging distributed machine learning paradigm enabling collaborative model training on decentralized devices without exposing their local data. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| QRtree -- Decision Tree dialect specification of QRscript | Stefano Scanzio, Matteo Rosani, Mattia Scamuzzi, Gianluca Cena | 2024-03-07 | Download | This specification document specifies the syntax and semantics of QRtree, which is a specific dialect of QRscript particularly suited to represent decision trees without chance nodes. |
| QRscript specification | Stefano Scanzio, Matteo Rosani, Mattia Scamuzzi, Gianluca Cena | 2024-03-07 | Download | This specification document specifies the syntax and semantics of QRscript. The current document only shows the part related to the QRscript header, i.e. |
| GreenBytes: Intelligent Energy Estimation for Edge-Cloud | Kasra Kassai, Tasos Dagiuklas, Satwat Bashir, Muddesar Iqbal | 2024-03-07 | Download | This study investigates the application of advanced machine learning models, specifically Long Short-Term Memory (LSTM) networks and Gradient Booster models, for accurate energy consumption estimation... |
| Architectural Blueprint For Heterogeneity-Resilient Federated Learning | Satwat Bashir, Tasos Dagiuklas, Kasra Kassai, Muddesar Iqbal | 2024-03-07 | Download | This paper proposes a novel three tier architecture for federated learning to optimize edge computing environments. The proposed architecture addresses the challenges associated with client data heter... |
| Evacuation Management Framework towards Smart City-wide Intelligent Emergency Interactive Response System | Anuj Abraham, Yi Zhang, Shitala Prasad | 2024-03-07 | Download | A smart city solution toward future 6G network deployment allows small and medium sized enterprises (SMEs), industry, and government entities to connect with the infrastructures and play a crucial rol... |
| On-demand Quantization for Green Federated Generative Diffusion in Mobile Edge Networks | Bingkun Lai, Jiayi He, Jiawen Kang, Gaolei Li, Minrui Xu, Tao zhang, Shengli Xie | 2024-03-07 | Download | Generative Artificial Intelligence (GAI) shows remarkable productivity and creativity in Mobile Edge Networks, such as the metaverse and the Industrial Internet of Things. |
| iTRPL: An Intelligent and Trusted RPL Protocol based on Multi-Agent Reinforcement Learning | Debasmita Dey, Nirnay Ghosh | 2024-03-07 | Download | Routing Protocol for Low Power and Lossy Networks (RPL) is the de-facto routing standard in IoT networks. It enables nodes to collaborate and autonomously build ad-hoc networks modeled by tree-like de... |
| Performance evaluation of conditional handover in 5G systems under fading scenario | Souvik Deb, Megh Rathod, Rishi Balamurugan, Shankar K. Ghosh, Rajeev K. Singh, Samriddha Sanyal | 2024-03-07 | Download | To enhance the handover performance in fifth generation (5G) cellular systems, conditional handover (CHO) has been evolved as a promising solution. |
| DV-Hop localization based on Distance Estimation using Multinode and Hop Loss in WSNs | Penghong Wang, Xingtao Wang, Wenrui Li, Xiaopeng Fan, Debin Zhao | 2024-03-07 | Download | Location awareness is a critical issue in wireless sensor network applications. For more accurate location estimation, the two issues should be considered extensively: 1) how to sufficiently utilize t... |
| Super-resolution on network telemetry time series | Fengchen Gong, Divya Raghunathan, Aarti Gupta, Maria Apostolaki | 2024-03-07 | Download | Fine-grained monitoring is crucial for multiple data-driven tasks such as debugging, provisioning, and securing networks. Yet, practical constraints in collecting, extracting, and storing data often f... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Virtuoso: Enabling Fast and Accurate Virtual Memory Research via an Imitation-based Operating System Simulation Methodology | Konstantinos Kanellopoulos, Konstantinos Sgouras, F. Nisa Bostanci, Andreas Kosmas Kakolyris, Berkin Kerim Konar, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Nandita Vijaykumar, Onur Mutlu | 2024-03-07 | Download | The unprecedented growth in data demand from emerging applications has turned virtual memory (VM) into a major performance bottleneck. Researchers explore new hardware/OS co-designs to optimize VM acr... |