2025-04-11

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| An FPGA Compiler for On-the-Fly Adaptive CNN Deployment and Reconfiguration | Alaa Mazouz, Duc Han Le, Van-Tam Nguyen | 2025-04-11 | Download | We introduce ForgeMorph, a full-stack compiler for adaptive CNN deployment on FPGAs, combining design-time optimization with runtime reconfigurability. |
| An Early Experience with Confidential Computing Architecture for On-Device Model Protection | Sina Abdollahi, Mohammad Maheri, Sandra Siby, Marios Kogias, Hamed Haddadi | 2025-04-11 | Download | Deploying machine learning (ML) models on user devices can improve privacy (by keeping data local) and reduce inference latency. Trusted Execution Environments (TEEs) are a practical solution for prot... |
| MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization | Daeun Kim, Jinwoo Hwang, Changhun Oh, Jongse Park | 2025-04-11 | Download | Diffusion Transformer (DiT) has driven significant progress in image generation tasks. However, DiT inferencing is notoriously compute-intensive and incurs long latency even on datacenter-scale GPUs, ... |
| All-in-Memory Stochastic Computing using ReRAM | João Paulo C. de Lima, Mehran Shoushtari Moghadam, Sercan Aygun, Jeronimo Castrillon, M. Hassan Najafi, Asif Ali Khan | 2025-04-11 | Download | As the demand for efficient, low-power computing in embedded and edge devices grows, traditional computing methods are becoming less effective for handling complex tasks. |
| Efficient Architecture for RISC-V Vector Memory Access | Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang | 2025-04-11 | Download | Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or s... |
| A 55-nm SRAM Chip Scanning Errors Every 125 ns for Event-Wise Soft Error Measurement | Yuibi Gomi, Akira Sato, Waleed Madany, Kenichi Okada, Satoshi Adachi, Masatoshi Itoh, Masanori Hashimoto | 2025-04-11 | Download | We developed a 55 nm CMOS SRAM chip that scans all data every 125 ns and outputs timestamped soft error data via an SPI interface through a FIFO. |
| ML For Hardware Design Interpretability: Challenges and Opportunities | Raymond Baartmans, Andrew Ensinger, Victor Agostinelli, Lizhong Chen | 2025-04-11 | Download | The increasing size and complexity of machine learning (ML) models have driven the growing need for custom hardware accelerators capable of efficiently supporting ML workloads. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| MSCCL++: Rethinking GPU Communication Abstractions for AI Inference | Changho Hwang, Peng Cheng, Roshan Dathathri, Abhinav Jangda, Saeed Maleki, Madan Musuvathi, Olli Saarikivi, Aashaka Shah, Ziyue Yang, Binyang Li, Caio Rocha, Qinghua Zhou, Mahdieh Ghazimirsaeed, Sreevatsa Anantharamu, Jithin Jose | 2025-04-11 | Download | AI applications increasingly run on fast-evolving, heterogeneous hardware to maximize performance, but general-purpose libraries lag in supporting these features. |
| Path Connected Dynamic Graphs with a Study of Dispersion and Exploration | Ashish Saxena, Kaushik Mondal | 2025-04-11 | Download | In dynamic graphs, edges may be added or deleted in each synchronous round. Various connectivity models exist based on constraints on these changes. |
| Personalizing Federated Learning for Hierarchical Edge Networks with Non-IID Data | Seunghyun Lee, Omid Tavallaie, Shuaijun Chen, Kanchana Thilakarathna, Suranga Seneviratne, Adel Nadjaran Toosi, Albert Y. Zomaya | 2025-04-11 | Download | Accommodating edge networks between IoT devices and the cloud server in Hierarchical Federated Learning (HFL) enhances communication efficiency without compromising data privacy. |
| An Empirical Study of Production Incidents in Generative AI Cloud Services | Haoran Yan, Yinfang Chen, Minghua Ma, Ming Wen, Shan Lu, Shenglin Zhang, Tianyin Xu, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Chaoyun Zhang, Dongmei Zhang | 2025-04-11 | Download | The ever-increasing demand for generative artificial intelligence (GenAI) has motivated cloud-based GenAI services such as Azure OpenAI Service and Amazon Bedrock. |
| String Problems in the Congested Clique Model | Shay Golan, Matan Kraus | 2025-04-11 | Download | In this paper we present algorithms for several string problems in the Congested Clique model. In the Congested Clique model, n nodes (computers) are used to solve some problem. |
| Assessing the Elephant in the Room in Scheduling for Current Hybrid HPC-QC Clusters | Paolo Viviani, Roberto Rocco, Matteo Barbieri, Gabriella Bettonte, Elisabetta Boella, Marco Cipollini, Jonathan Frassineti, Fulvio Ganz, Sara Marzella, Daniele Ottaviani, Simone Rizzo, Alberto Scionti, Chiara Vercellino, Giacomo Vitali, Olivier Terzo, Bartolomeo Montrucchio, Daniele Gregori | 2025-04-11 | Download | Quantum computing resources are among the most promising candidates for extending the computational capabilities of High-Performance Computing (HPC) systems. |
| A Nonlinear Hash-based Optimization Method for SpMV on GPUs | Chen Yan, Boyu Diao, Hangda Liu, Zhulin An, Yongjun Xu | 2025-04-11 | Download | Sparse matrix-vector multiplication (SpMV) is a fundamental operation with a wide range of applications in scientific computing and artificial intelligence. |
| Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing | Tobias Pfandzelter, Nikita Bauer, Alexander Leis, Corentin Perdrizet, Felix Trautwein, Trever Schirmer, Osama Abboud, David Bermbach | 2025-04-11 | Download | Orbital edge computing reduces the data transmission needs of Earth observation satellites by processing sensor data on-board, allowing near-real-time insights while minimizing downlink costs. |
| Efficient Architecture for RISC-V Vector Memory Access | Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang | 2025-04-11 | Download | Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or s... |
| Self-Stabilizing Weakly Byzantine Perpetual Gathering of Mobile Agents | Jion Hirose, Ryota Eguchi, Yuichi Sudo | 2025-04-11 | Download | We study the Byzantine gathering problem involving k mobile agents with unique identifiers (IDs), f of which are Byzantine. These agents start the execution of a common algorithm from (poss... |
| Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye, Bei Ouyang, Liekang Zeng, Tianyi Qian, Xiaowen Chu, Jian Tang, Xu Chen | 2025-04-11 | Download | Generative large language models (LLMs) have garnered significant attention due to their exceptional capabilities in various AI tasks. Traditionally deployed in cloud datacenters, LLMs are now increas... |
| A Hybrid Cloud Management Plane for Data Processing Pipelines | Vignesh Babu, Feng Lu, Haotian Wu, Cameron Moberg | 2025-04-11 | Download | As organizations increasingly rely on data-driven insights, the ability to run data intensive applications seamlessly across multiple cloud environments becomes critical for tapping into cloud innovat... |
| SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai | 2025-04-11 | Download | Early exiting has recently emerged as a promising technique for accelerating large language models (LLMs) by effectively reducing the hardware computation and memory access. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Analyzing Localizability of LEO/MEO Hybrid Networks: A Stochastic Geometry Approach | Ruibo Wang, Mustafa A. Kishk, Howard H. Yang, Mohamed-Slim Alouini | 2025-04-11 | Download | With the increase in global positioning service demands and the requirement for more precise positioning, assisting existing medium and high orbit satellite-enabled positioning systems with low Earth ... |
| Optimizing Collaborative UAV Networks for Data Efficiency in IoT Ecosystems | Priyavrat Dev Sharma, Ibrahim Sorkhoh, Muthucumaru Maheswaran | 2025-04-11 | Download | Advances in the Internet of Things are revolutionizing data acquisition, enhancing artificial intelligence and quality of service. Unmanned Aerial Vehicles (UAVs) provide an efficient data-gathering s... |
| Target Tracking With ISAC Using EMLSR in Next-Generation IEEE 802.11 WLANs: Non-Cooperative and Cooperative Approaches | Ching-Lun Tai, Jingyuan Zhang, Douglas M. Blough, Raghupathy Sivakumar | 2025-04-11 | Download | New amendments support Wi-Fi access points (APs) and stations (STAs) in next-generation IEEE 802.11 wireless local area networks (WLANs). IEEE 802. |
| CertainSync: Rateless Set Reconciliation with Certainty | Tomer Keniagin, Eitan Yaakobi, Ori Rottenstreich | 2025-04-11 | Download | Set reconciliation is a fundamental task in distributed systems, particularly in blockchain networks, where it enables synchronization of transaction pools among peers and facilitates block disseminat... |
| CICV5G: A 5G Communication Delay Dataset for PnC in Cloud-based Intelligent Connected Vehicles | Xinrui Zhang, Peizhi Zhang, Junpeng Huang, Haojie Feng, Yining Ma, Feng Shen, Lu Xiong | 2025-04-11 | Download | Cloud-based intelligent connected vehicles (CICVs) leverage cloud computing and vehicle-to-everything (V2X) to enable efficient information exchange and cooperative control. |
| Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye, Bei Ouyang, Liekang Zeng, Tianyi Qian, Xiaowen Chu, Jian Tang, Xu Chen | 2025-04-11 | Download | Generative large language models (LLMs) have garnered significant attention due to their exceptional capabilities in various AI tasks. Traditionally deployed in cloud datacenters, LLMs are now increas... |
| A Hybrid Cloud Management Plane for Data Processing Pipelines | Vignesh Babu, Feng Lu, Haotian Wu, Cameron Moberg | 2025-04-11 | Download | As organizations increasingly rely on data-driven insights, the ability to run data intensive applications seamlessly across multiple cloud environments becomes critical for tapping into cloud innovat... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration | Matteo Spanio, Antonio Rodà | 2025-04-11 | Download | The burgeoning complexity and real-time processing demands of audio signals necessitate optimized algorithms that harness the computational prowess of Graphics Processing Units (GPUs). |
