2025-04-11

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
An FPGA Compiler for On-the-Fly Adaptive CNN Deployment and Reconfiguration	Alaa Mazouz, Duc Han Le, Van-Tam Nguyen	2025-04-11	Download	We introduce ForgeMorph, a full-stack compiler for adaptive CNN deployment on FPGAs, combining design-time optimization with runtime reconfigurability.
An Early Experience with Confidential Computing Architecture for On-Device Model Protection	Sina Abdollahi, Mohammad Maheri, Sandra Siby, Marios Kogias, Hamed Haddadi	2025-04-11	Download	Deploying machine learning (ML) models on user devices can improve privacy (by keeping data local) and reduce inference latency. Trusted Execution Environments (TEEs) are a practical solution for prot...
MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization	Daeun Kim, Jinwoo Hwang, Changhun Oh, Jongse Park	2025-04-11	Download	Diffusion Transformer (DiT) has driven significant progress in image generation tasks. However, DiT inferencing is notoriously compute-intensive and incurs long latency even on datacenter-scale GPUs, ...
All-in-Memory Stochastic Computing using ReRAM	João Paulo C. de Lima, Mehran Shoushtari Moghadam, Sercan Aygun, Jeronimo Castrillon, M. Hassan Najafi, Asif Ali Khan	2025-04-11	Download	As the demand for efficient, low-power computing in embedded and edge devices grows, traditional computing methods are becoming less effective for handling complex tasks.
Efficient Architecture for RISC-V Vector Memory Access	Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang	2025-04-11	Download	Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or s...
A 55-nm SRAM Chip Scanning Errors Every 125 ns for Event-Wise Soft Error Measurement	Yuibi Gomi, Akira Sato, Waleed Madany, Kenichi Okada, Satoshi Adachi, Masatoshi Itoh, Masanori Hashimoto	2025-04-11	Download	We developed a 55 nm CMOS SRAM chip that scans all data every 125 ns and outputs timestamped soft error data via an SPI interface through a FIFO.
ML For Hardware Design Interpretability: Challenges and Opportunities	Raymond Baartmans, Andrew Ensinger, Victor Agostinelli, Lizhong Chen	2025-04-11	Download	The increasing size and complexity of machine learning (ML) models have driven the growing need for custom hardware accelerators capable of efficiently supporting ML workloads.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
MSCCL++: Rethinking GPU Communication Abstractions for AI Inference	Changho Hwang, Peng Cheng, Roshan Dathathri, Abhinav Jangda, Saeed Maleki, Madan Musuvathi, Olli Saarikivi, Aashaka Shah, Ziyue Yang, Binyang Li, Caio Rocha, Qinghua Zhou, Mahdieh Ghazimirsaeed, Sreevatsa Anantharamu, Jithin Jose	2025-04-11	Download	AI applications increasingly run on fast-evolving, heterogeneous hardware to maximize performance, but general-purpose libraries lag in supporting these features.
Path Connected Dynamic Graphs with a Study of Dispersion and Exploration	Ashish Saxena, Kaushik Mondal	2025-04-11	Download	In dynamic graphs, edges may be added or deleted in each synchronous round. Various connectivity models exist based on constraints on these changes.
Personalizing Federated Learning for Hierarchical Edge Networks with Non-IID Data	Seunghyun Lee, Omid Tavallaie, Shuaijun Chen, Kanchana Thilakarathna, Suranga Seneviratne, Adel Nadjaran Toosi, Albert Y. Zomaya	2025-04-11	Download	Accommodating edge networks between IoT devices and the cloud server in Hierarchical Federated Learning (HFL) enhances communication efficiency without compromising data privacy.
An Empirical Study of Production Incidents in Generative AI Cloud Services	Haoran Yan, Yinfang Chen, Minghua Ma, Ming Wen, Shan Lu, Shenglin Zhang, Tianyin Xu, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Chaoyun Zhang, Dongmei Zhang	2025-04-11	Download	The ever-increasing demand for generative artificial intelligence (GenAI) has motivated cloud-based GenAI services such as Azure OpenAI Service and Amazon Bedrock.
String Problems in the Congested Clique Model	Shay Golan, Matan Kraus	2025-04-11	Download	In this paper we present algorithms for several string problems in the Congested Clique model. In the Congested Clique model, $n$ nodes (computers) are used to solve some problem.
Assessing the Elephant in the Room in Scheduling for Current Hybrid HPC-QC Clusters	Paolo Viviani, Roberto Rocco, Matteo Barbieri, Gabriella Bettonte, Elisabetta Boella, Marco Cipollini, Jonathan Frassineti, Fulvio Ganz, Sara Marzella, Daniele Ottaviani, Simone Rizzo, Alberto Scionti, Chiara Vercellino, Giacomo Vitali, Olivier Terzo, Bartolomeo Montrucchio, Daniele Gregori	2025-04-11	Download	Quantum computing resources are among the most promising candidates for extending the computational capabilities of High-Performance Computing (HPC) systems.
A Nonlinear Hash-based Optimization Method for SpMV on GPUs	Chen Yan, Boyu Diao, Hangda Liu, Zhulin An, Yongjun Xu	2025-04-11	Download	Sparse matrix-vector multiplication (SpMV) is a fundamental operation with a wide range of applications in scientific computing and artificial intelligence.
Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing	Tobias Pfandzelter, Nikita Bauer, Alexander Leis, Corentin Perdrizet, Felix Trautwein, Trever Schirmer, Osama Abboud, David Bermbach	2025-04-11	Download	Orbital edge computing reduces the data transmission needs of Earth observation satellites by processing sensor data on-board, allowing near-real-time insights while minimizing downlink costs.
Efficient Architecture for RISC-V Vector Memory Access	Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang	2025-04-11	Download	Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or s...
Self-Stabilizing Weakly Byzantine Perpetual Gathering of Mobile Agents	Jion Hirose, Ryota Eguchi, Yuichi Sudo	2025-04-11	Download	We study the Byzantine gathering problem involving $k$ mobile agents with unique identifiers (IDs), $f$ of which are Byzantine. These agents start the execution of a common algorithm from (poss...
Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye, Bei Ouyang, Liekang Zeng, Tianyi Qian, Xiaowen Chu, Jian Tang, Xu Chen	2025-04-11	Download	Generative large language models (LLMs) have garnered significant attention due to their exceptional capabilities in various AI tasks. Traditionally deployed in cloud datacenters, LLMs are now increas...
A Hybrid Cloud Management Plane for Data Processing Pipelines	Vignesh Babu, Feng Lu, Haotian Wu, Cameron Moberg	2025-04-11	Download	As organizations increasingly rely on data-driven insights, the ability to run data intensive applications seamlessly across multiple cloud environments becomes critical for tapping into cloud innovat...
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting	Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai	2025-04-11	Download	Early exiting has recently emerged as a promising technique for accelerating large language models (LLMs) by effectively reducing the hardware computation and memory access.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Analyzing Localizability of LEO/MEO Hybrid Networks: A Stochastic Geometry Approach	Ruibo Wang, Mustafa A. Kishk, Howard H. Yang, Mohamed-Slim Alouini	2025-04-11	Download	With the increase in global positioning service demands and the requirement for more precise positioning, assisting existing medium and high orbit satellite-enabled positioning systems with low Earth ...
Optimizing Collaborative UAV Networks for Data Efficiency in IoT Ecosystems	Priyavrat Dev Sharma, Ibrahim Sorkhoh, Muthucumaru Maheswaran	2025-04-11	Download	Advances in the Internet of Things are revolutionizing data acquisition, enhancing artificial intelligence and quality of service. Unmanned Aerial Vehicles (UAVs) provide an efficient data-gathering s...
Target Tracking With ISAC Using EMLSR in Next-Generation IEEE 802.11 WLANs: Non-Cooperative and Cooperative Approaches	Ching-Lun Tai, Jingyuan Zhang, Douglas M. Blough, Raghupathy Sivakumar	2025-04-11	Download	New amendments support Wi-Fi access points (APs) and stations (STAs) in next-generation IEEE 802.11 wireless local area networks (WLANs). IEEE 802.
CertainSync: Rateless Set Reconciliation with Certainty	Tomer Keniagin, Eitan Yaakobi, Ori Rottenstreich	2025-04-11	Download	Set reconciliation is a fundamental task in distributed systems, particularly in blockchain networks, where it enables synchronization of transaction pools among peers and facilitates block disseminat...
CICV5G: A 5G Communication Delay Dataset for PnC in Cloud-based Intelligent Connected Vehicles	Xinrui Zhang, Peizhi Zhang, Junpeng Huang, Haojie Feng, Yining Ma, Feng Shen, Lu Xiong	2025-04-11	Download	Cloud-based intelligent connected vehicles (CICVs) leverage cloud computing and vehicle-to-everything (V2X) to enable efficient information exchange and cooperative control.
Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices	Shengyuan Ye, Bei Ouyang, Liekang Zeng, Tianyi Qian, Xiaowen Chu, Jian Tang, Xu Chen	2025-04-11	Download	Generative large language models (LLMs) have garnered significant attention due to their exceptional capabilities in various AI tasks. Traditionally deployed in cloud datacenters, LLMs are now increas...
A Hybrid Cloud Management Plane for Data Processing Pipelines	Vignesh Babu, Feng Lu, Haotian Wu, Cameron Moberg	2025-04-11	Download	As organizations increasingly rely on data-driven insights, the ability to run data intensive applications seamlessly across multiple cloud environments becomes critical for tapping into cloud innovat...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration	Matteo Spanio, Antonio Rodà	2025-04-11	Download	The burgeoning complexity and real-time processing demands of audio signals necessitate optimized algorithms that harness the computational prowess of Graphics Processing Units (GPUs).