2025-01-10
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Axon: A novel systolic array architecture for improved run time and energy efficient GeMM and Conv operation with on-chip im2col | Md Mizanur Rahaman Nayan, Ritik Raj, Gouse Basha Shaik, Tushar Krishna, Azad J Naeemi | 2025-01-10 | Download | General matrix multiplication (GeMM) is a core operation in virtually all AI applications. Systolic array (SA) based architectures have shown great promise as GeMM hardware accelerators thanks to thei... |
| EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models | Jaehoon Heo, Adiwena Putra, Jieon Yoon, Sungwoong Yune, Hangyeol Lee, Ji-Hoon Kim, Joo-Young Kim | 2025-01-10 | Download | Over the past few years, diffusion models have emerged as novel AI solutions, generating diverse multi-modal outputs from text prompts. Despite their capabilities, they face challenges in computing, s... |
| TransPlace: Transferable Circuit Global Placement via Graph Neural Network | Yunbo Hou, Haoran Ye, Shuwen Yang, Yingxue Zhang, Siyuan Xu, Guojie Song | 2025-01-10 | Download | Global placement, a critical step in designing the physical layout of computer chips, is essential to optimize chip performance. Prior global placement methods optimize each circuit design individuall... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Batched DGEMMs for scientific codes running on long vector architectures | Fabio Banchelli, Marta Garcia-Gasulla, Filippo Mantovani | 2025-01-10 | Download | In this work, we evaluate the performance of SeisSol, a simulator of seismic wave phenomena and earthquake dynamics, on a RISC-V-based system utilizing a vector processing unit. |
| Benchmarking Different Application Types across Heterogeneous Cloud Compute Services | Nivedhitha Duggi, Masoud Rafiei, Mohsen Amini Salehi | 2025-01-10 | Download | Infrastructure as a Service (IaaS) clouds have become the predominant underlying infrastructure for the operation of modern and smart technology. |
| Scale-up Unlearnable Examples Learning with High-Performance Computing | Yanfan Zhu, Issac Lyngaas, Murali Gopalakrishnan Meena, Mary Ellen I. Koran, Bradley Malin, Daniel Moyer, Shunxing Bao, Anuj Kapadia, Xiao Wang, Bennett Landman, Yuankai Huo | 2025-01-10 | Download | Recent advancements in AI models are structured to retain user interactions, which could inadvertently include sensitive healthcare data. In the healthcare field, particularly when radiologists use AI... |
| Beyond Optimal Fault Tolerance | Andrew Lewis-Pye, Tim Roughgarden | 2025-01-10 | Download | The optimal fault-tolerance achievable by any protocol has been characterized in a wide range of settings. For example, for state machine replication (SMR) protocols operating in the partially synchro... |
| ML-Based Optimum Number of CUDA Streams for the GPU Implementation of the Tridiagonal Partition Method | Milena Veneva, Toshiyuki Imamura | 2025-01-10 | Download | This paper presents a heuristic for finding the optimum number of CUDA streams by using tools common to the modern AI-oriented approaches and applied to the parallel partition algorithm. |
| Encoded Spatial Attribute in Multi-Tier Federated Learning | Asfia Kawnine, Francis Palma, Seyed Alireza Rahimi Azghadi, Hung Cao | 2025-01-10 | Download | This research presents an Encoded Spatial Multi-Tier Federated Learning approach for a comprehensive evaluation of aggregated models for geospatial data. |
| STHFL: Spatio-Temporal Heterogeneous Federated Learning | Shunxin Guo, Hongsong Wang, Shuxia Lin, Xu Yang, Xin Geng | 2025-01-10 | Download | Federated learning is a new framework that protects data privacy and allows multiple devices to cooperate in training machine learning models. |
| A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers | Chenxi Yang, Yan Li, Martin Maas, Mustafa Uysal, Ubaid Ullah Hafeez, Arif Merchant, Richard McDougall | 2025-01-10 | Download | Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers, and thus have a major impact on the overall system's efficiency. |
| Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation | Zheqi Lv, Tianyu Zhan, Wenjie Wang, Xinyu Lin, Shengyu Zhang, Wenqiao Zhang, Jiwei Li, Kun Kuang, Fei Wu | 2025-01-10 | Download | Large Language Models (LLMs) for Recommendation (LLM4Rec) is a promising research direction that has demonstrated exceptional performance in this field. |
| Constrained Over-the-Air Model Updating for Wireless Online Federated Learning with Delayed Information | Juncheng Wang, Yituo Liu, Ben Liang, Min Dong | 2025-01-10 | Download | We study online federated learning over a wireless network, where the central server updates an online global model sequence to minimize the time-varying loss of multiple local devices over time. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Over-the-Air FEEL with Integrated Sensing: Joint Scheduling and Beamforming Design | Saba Asaad, Ping Wang, Hina Tabassum | 2025-01-10 | Download | Employing wireless systems with dual sensing and communications functionalities is becoming critical in next generation of wireless networks. In this paper, we propose a robust design for over-the-air... |
| Network-centric optimal hybrid sensing hole recovery and self-healing in IPV6 WSNs | Kwadwo Asante, Yaw Marfo Missah, Frimpong Twum, Michael Asante | 2025-01-10 | Download | In our earlier work, Network-Centric Optimal Hybrid Mobility for IPv6 wireless sensor networks, in which the work sought to control mobility of sensor nodes from an external network was proposed. |
| GR-WiFi: A GNU Radio based WiFi Platform with Single-User and Multi-User MIMO Capability | Natong Lin, Zelin Yun, Shengli Zhou, Song Han | 2025-01-10 | Download | Since its first release, WiFi has been highly successful in providing wireless local area networks. The ever-evolving IEEE 802.11 standards continue to add new features to keep up with the trend of in... |
| RPKI-Based Location-Unaware Tor Guard Relay Selection Algorithms | Zhifan Lu, Siyang Sun, Yixin Sun | 2025-01-10 | Download | Tor is a well-known anonymous communication tool, used by people with various privacy and security needs. Prior works have exploited routing attacks to observe Tor traffic and deanonymize Tor users. |
| Collaborative Content Moderation in the Fediverse | Haris Bin Zia, Aravindh Raman, Ignacio Castro, Gareth Tyson | 2025-01-10 | Download | The Fediverse, a group of interconnected servers providing a variety of interoperable services (e.g. micro-blogging in Mastodon) has gained rapid popularity. |
| UAV Swarm-enabled Collaborative Post-disaster Communications in Low Altitude Economy via a Two-stage Optimization Approach | Xiaoya Zheng, Geng Sun, Jiahui Li, Jiacheng Wang, Qingqing Wu, Dusit Niyato, Abbas Jamalipour | 2025-01-10 | Download | The low-altitude economy (LAE) plays an indispensable role in cargo transportation, healthcare, infrastructure inspection, and especially post-disaster communication. |
| Network Diffuser for Placing-Scheduling Service Function Chains with Inverse Demonstration | Zuyuan Zhang, Vaneet Aggarwal, Tian Lan | 2025-01-10 | Download | Network services are increasingly managed by considering chained-up virtual network functions and relevant traffic flows, known as the Service Function Chains (SFCs). |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Daniel Rossi, Guido Borghi, Roberto Vezzani | 2025-01-10 | Download | Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergenc... |
| MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning | Mathys Jam, Eric Petit, Pablo de Oliveira Castro, David Defour, Greg Henry, William Jalby | 2025-01-10 | Download | Many High-Performance Computing (HPC) libraries rely on decision trees to select the best kernel hyperparameters at runtime, depending on the input and environment. |