2025-01-10

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Axon: A novel systolic array architecture for improved run time and energy efficient GeMM and Conv operation with on-chip im2col | Md Mizanur Rahaman Nayan, Ritik Raj, Gouse Basha Shaik, Tushar Krishna, Azad J Naeemi | 2025-01-10 | General matrix multiplication (GeMM) is a core operation in virtually all AI applications. Systolic array (SA) based architectures have shown great promise as GeMM hardware accelerators thanks to thei... |
| EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models | Jaehoon Heo, Adiwena Putra, Jieon Yoon, Sungwoong Yune, Hangyeol Lee, Ji-Hoon Kim, Joo-Young Kim | 2025-01-10 | Over the past few years, diffusion models have emerged as novel AI solutions, generating diverse multi-modal outputs from text prompts. Despite their capabilities, they face challenges in computing, s... |
| TransPlace: Transferable Circuit Global Placement via Graph Neural Network | Yunbo Hou, Haoran Ye, Shuwen Yang, Yingxue Zhang, Siyuan Xu, Guojie Song | 2025-01-10 | Global placement, a critical step in designing the physical layout of computer chips, is essential to optimize chip performance. Prior global placement methods optimize each circuit design individuall... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Batched DGEMMs for scientific codes running on long vector architectures | Fabio Banchelli, Marta Garcia-Gasulla, Filippo Mantovani | 2025-01-10 | In this work, we evaluate the performance of SeisSol, a simulator of seismic wave phenomena and earthquake dynamics, on a RISC-V-based system utilizing a vector processing unit. |
| Benchmarking Different Application Types across Heterogeneous Cloud Compute Services | Nivedhitha Duggi, Masoud Rafiei, Mohsen Amini Salehi | 2025-01-10 | Infrastructure as a Service (IaaS) clouds have become the predominant underlying infrastructure for the operation of modern and smart technology. |
| Scale-up Unlearnable Examples Learning with High-Performance Computing | Yanfan Zhu, Issac Lyngaas, Murali Gopalakrishnan Meena, Mary Ellen I. Koran, Bradley Malin, Daniel Moyer, Shunxing Bao, Anuj Kapadia, Xiao Wang, Bennett Landman, Yuankai Huo | 2025-01-10 | Recent advancements in AI models are structured to retain user interactions, which could inadvertently include sensitive healthcare data. In the healthcare field, particularly when radiologists use AI... |
| Beyond Optimal Fault Tolerance | Andrew Lewis-Pye, Tim Roughgarden | 2025-01-10 | The optimal fault-tolerance achievable by any protocol has been characterized in a wide range of settings. For example, for state machine replication (SMR) protocols operating in the partially synchro... |
| ML-Based Optimum Number of CUDA Streams for the GPU Implementation of the Tridiagonal Partition Method | Milena Veneva, Toshiyuki Imamura | 2025-01-10 | This paper presents a heuristic for finding the optimum number of CUDA streams by using tools common to the modern AI-oriented approaches and applied to the parallel partition algorithm. |
| Encoded Spatial Attribute in Multi-Tier Federated Learning | Asfia Kawnine, Francis Palma, Seyed Alireza Rahimi Azghadi, Hung Cao | 2025-01-10 | This research presents an Encoded Spatial Multi-Tier Federated Learning approach for a comprehensive evaluation of aggregated models for geospatial data. |
| STHFL: Spatio-Temporal Heterogeneous Federated Learning | Shunxin Guo, Hongsong Wang, Shuxia Lin, Xu Yang, Xin Geng | 2025-01-10 | Federated learning is a new framework that protects data privacy and allows multiple devices to cooperate in training machine learning models. |
| A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers | Chenxi Yang, Yan Li, Martin Maas, Mustafa Uysal, Ubaid Ullah Hafeez, Arif Merchant, Richard McDougall | 2025-01-10 | Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers, and thus have a major impact on the overall system's efficiency. |
| Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation | Zheqi Lv, Tianyu Zhan, Wenjie Wang, Xinyu Lin, Shengyu Zhang, Wenqiao Zhang, Jiwei Li, Kun Kuang, Fei Wu | 2025-01-10 | Large Language Models (LLMs) for Recommendation (LLM4Rec) is a promising research direction that has demonstrated exceptional performance in this field. |
| Constrained Over-the-Air Model Updating for Wireless Online Federated Learning with Delayed Information | Juncheng Wang, Yituo Liu, Ben Liang, Min Dong | 2025-01-10 | We study online federated learning over a wireless network, where the central server updates an online global model sequence to minimize the time-varying loss of multiple local devices over time. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Over-the-Air FEEL with Integrated Sensing: Joint Scheduling and Beamforming Design | Saba Asaad, Ping Wang, Hina Tabassum | 2025-01-10 | Employing wireless systems with dual sensing and communications functionalities is becoming critical in next generation of wireless networks. In this paper, we propose a robust design for over-the-air... |
| Network-centric optimal hybrid sensing hole recovery and self-healing in IPV6 WSNs | Kwadwo Asante, Yaw Marfo Missah, Frimpong Twum, Michael Asante | 2025-01-10 | In our earlier work, Network-Centric Optimal Hybrid Mobility for IPv6 wireless sensor networks, in which the work sought to control mobility of sensor nodes from an external network was proposed. |
| GR-WiFi: A GNU Radio based WiFi Platform with Single-User and Multi-User MIMO Capability | Natong Lin, Zelin Yun, Shengli Zhou, Song Han | 2025-01-10 | Since its first release, WiFi has been highly successful in providing wireless local area networks. The ever-evolving IEEE 802.11 standards continue to add new features to keep up with the trend of in... |
| RPKI-Based Location-Unaware Tor Guard Relay Selection Algorithms | Zhifan Lu, Siyang Sun, Yixin Sun | 2025-01-10 | Tor is a well-known anonymous communication tool, used by people with various privacy and security needs. Prior works have exploited routing attacks to observe Tor traffic and deanonymize Tor users. |
| Collaborative Content Moderation in the Fediverse | Haris Bin Zia, Aravindh Raman, Ignacio Castro, Gareth Tyson | 2025-01-10 | The Fediverse, a group of interconnected servers providing a variety of interoperable services (e.g. micro-blogging in Mastodon) has gained rapid popularity. |
| UAV Swarm-enabled Collaborative Post-disaster Communications in Low Altitude Economy via a Two-stage Optimization Approach | Xiaoya Zheng, Geng Sun, Jiahui Li, Jiacheng Wang, Qingqing Wu, Dusit Niyato, Abbas Jamalipour | 2025-01-10 | The low-altitude economy (LAE) plays an indispensable role in cargo transportation, healthcare, infrastructure inspection, and especially post-disaster communication. |
| Network Diffuser for Placing-Scheduling Service Function Chains with Inverse Demonstration | Zuyuan Zhang, Vaneet Aggarwal, Tian Lan | 2025-01-10 | Network services are increasingly managed by considering chained-up virtual network functions and relevant traffic flows, known as the Service Function Chains (SFCs). |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Daniel Rossi, Guido Borghi, Roberto Vezzani | 2025-01-10 | Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergenc... |
| MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning | Mathys Jam, Eric Petit, Pablo de Oliveira Castro, David Defour, Greg Henry, William Jalby | 2025-01-10 | Many High-Performance Computing (HPC) libraries rely on decision trees to select the best kernel hyperparameters at runtime, depending on the input and environment. |
