2025-01-10

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Axon: A novel systolic array architecture for improved run time and energy efficient GeMM and Conv operation with on-chip im2col	Md Mizanur Rahaman Nayan, Ritik Raj, Gouse Basha Shaik, Tushar Krishna, Azad J Naeemi	2025-01-10	Download	General matrix multiplication (GeMM) is a core operation in virtually all AI applications. Systolic array (SA) based architectures have shown great promise as GeMM hardware accelerators thanks to thei...
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models	Jaehoon Heo, Adiwena Putra, Jieon Yoon, Sungwoong Yune, Hangyeol Lee, Ji-Hoon Kim, Joo-Young Kim	2025-01-10	Download	Over the past few years, diffusion models have emerged as novel AI solutions, generating diverse multi-modal outputs from text prompts. Despite their capabilities, they face challenges in computing, s...
TransPlace: Transferable Circuit Global Placement via Graph Neural Network	Yunbo Hou, Haoran Ye, Shuwen Yang, Yingxue Zhang, Siyuan Xu, Guojie Song	2025-01-10	Download	Global placement, a critical step in designing the physical layout of computer chips, is essential to optimize chip performance. Prior global placement methods optimize each circuit design individuall...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Batched DGEMMs for scientific codes running on long vector architectures	Fabio Banchelli, Marta Garcia-Gasulla, Filippo Mantovani	2025-01-10	Download	In this work, we evaluate the performance of SeisSol, a simulator of seismic wave phenomena and earthquake dynamics, on a RISC-V-based system utilizing a vector processing unit.
Benchmarking Different Application Types across Heterogeneous Cloud Compute Services	Nivedhitha Duggi, Masoud Rafiei, Mohsen Amini Salehi	2025-01-10	Download	Infrastructure as a Service (IaaS) clouds have become the predominant underlying infrastructure for the operation of modern and smart technology.
Scale-up Unlearnable Examples Learning with High-Performance Computing	Yanfan Zhu, Issac Lyngaas, Murali Gopalakrishnan Meena, Mary Ellen I. Koran, Bradley Malin, Daniel Moyer, Shunxing Bao, Anuj Kapadia, Xiao Wang, Bennett Landman, Yuankai Huo	2025-01-10	Download	Recent advancements in AI models are structured to retain user interactions, which could inadvertently include sensitive healthcare data. In the healthcare field, particularly when radiologists use AI...
Beyond Optimal Fault Tolerance	Andrew Lewis-Pye, Tim Roughgarden	2025-01-10	Download	The optimal fault-tolerance achievable by any protocol has been characterized in a wide range of settings. For example, for state machine replication (SMR) protocols operating in the partially synchro...
ML-Based Optimum Number of CUDA Streams for the GPU Implementation of the Tridiagonal Partition Method	Milena Veneva, Toshiyuki Imamura	2025-01-10	Download	This paper presents a heuristic for finding the optimum number of CUDA streams by using tools common to the modern AI-oriented approaches and applied to the parallel partition algorithm.
Encoded Spatial Attribute in Multi-Tier Federated Learning	Asfia Kawnine, Francis Palma, Seyed Alireza Rahimi Azghadi, Hung Cao	2025-01-10	Download	This research presents an Encoded Spatial Multi-Tier Federated Learning approach for a comprehensive evaluation of aggregated models for geospatial data.
STHFL: Spatio-Temporal Heterogeneous Federated Learning	Shunxin Guo, Hongsong Wang, Shuxia Lin, Xu Yang, Xin Geng	2025-01-10	Download	Federated learning is a new framework that protects data privacy and allows multiple devices to cooperate in training machine learning models.
A Bring-Your-Own-Model Approach for ML-Driven Storage Placement in Warehouse-Scale Computers	Chenxi Yang, Yan Li, Martin Maas, Mustafa Uysal, Ubaid Ullah Hafeez, Arif Merchant, Richard McDougall	2025-01-10	Download	Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers, and thus have a major impact on the overall system's efficiency.
Collaboration of Large Language Models and Small Recommendation Models for Device-Cloud Recommendation	Zheqi Lv, Tianyu Zhan, Wenjie Wang, Xinyu Lin, Shengyu Zhang, Wenqiao Zhang, Jiwei Li, Kun Kuang, Fei Wu	2025-01-10	Download	Large Language Models (LLMs) for Recommendation (LLM4Rec) is a promising research direction that has demonstrated exceptional performance in this field.
Constrained Over-the-Air Model Updating for Wireless Online Federated Learning with Delayed Information	Juncheng Wang, Yituo Liu, Ben Liang, Min Dong	2025-01-10	Download	We study online federated learning over a wireless network, where the central server updates an online global model sequence to minimize the time-varying loss of multiple local devices over time.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Over-the-Air FEEL with Integrated Sensing: Joint Scheduling and Beamforming Design	Saba Asaad, Ping Wang, Hina Tabassum	2025-01-10	Download	Employing wireless systems with dual sensing and communications functionalities is becoming critical in next generation of wireless networks. In this paper, we propose a robust design for over-the-air...
Network-centric optimal hybrid sensing hole recovery and self-healing in IPV6 WSNs	Kwadwo Asante, Yaw Marfo Missah, Frimpong Twum, Michael Asante	2025-01-10	Download	In our earlier work, Network-Centric Optimal Hybrid Mobility for IPv6 wireless sensor networks, in which the work sought to control mobility of sensor nodes from an external network was proposed.
GR-WiFi: A GNU Radio based WiFi Platform with Single-User and Multi-User MIMO Capability	Natong Lin, Zelin Yun, Shengli Zhou, Song Han	2025-01-10	Download	Since its first release, WiFi has been highly successful in providing wireless local area networks. The ever-evolving IEEE 802.11 standards continue to add new features to keep up with the trend of in...
RPKI-Based Location-Unaware Tor Guard Relay Selection Algorithms	Zhifan Lu, Siyang Sun, Yixin Sun	2025-01-10	Download	Tor is a well-known anonymous communication tool, used by people with various privacy and security needs. Prior works have exploited routing attacks to observe Tor traffic and deanonymize Tor users.
Collaborative Content Moderation in the Fediverse	Haris Bin Zia, Aravindh Raman, Ignacio Castro, Gareth Tyson	2025-01-10	Download	The Fediverse, a group of interconnected servers providing a variety of interoperable services (e.g. micro-blogging in Mastodon) has gained rapid popularity.
UAV Swarm-enabled Collaborative Post-disaster Communications in Low Altitude Economy via a Two-stage Optimization Approach	Xiaoya Zheng, Geng Sun, Jiahui Li, Jiacheng Wang, Qingqing Wu, Dusit Niyato, Abbas Jamalipour	2025-01-10	Download	The low-altitude economy (LAE) plays an indispensable role in cargo transportation, healthcare, infrastructure inspection, and especially post-disaster communication.
Network Diffuser for Placing-Scheduling Service Function Chains with Inverse Demonstration	Zuyuan Zhang, Vaneet Aggarwal, Tian Lan	2025-01-10	Download	Network services are increasingly managed by considering chained-up virtual network functions and relevant traffic flows, known as the Service Function Chains (SFCs).

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios	Daniel Rossi, Guido Borghi, Roberto Vezzani	2025-01-10	Download	Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergenc...
MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning	Mathys Jam, Eric Petit, Pablo de Oliveira Castro, David Defour, Greg Henry, William Jalby	2025-01-10	Download	Many High-Performance Computing (HPC) libraries rely on decision trees to select the best kernel hyperparameters at runtime, depending on the input and environment.