2025-04-11
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| An FPGA Compiler for On-the-Fly Adaptive CNN Deployment and Reconfiguration | Alaa Mazouz, Duc Han Le, Van-Tam Nguyen | 2025-04-11 | Download | We introduce ForgeMorph, a full-stack compiler for adaptive CNN deployment on FPGAs, combining design-time optimization with runtime reconfigurability. |
| An Early Experience with Confidential Computing Architecture for On-Device Model Protection | Sina Abdollahi, Mohammad Maheri, Sandra Siby, Marios Kogias, Hamed Haddadi | 2025-04-11 | Download | Deploying machine learning (ML) models on user devices can improve privacy (by keeping data local) and reduce inference latency. Trusted Execution Environments (TEEs) are a practical solution for prot... |
| MixDiT: Accelerating Image Diffusion Transformer Inference with Mixed-Precision MX Quantization | Daeun Kim, Jinwoo Hwang, Changhun Oh, Jongse Park | 2025-04-11 | Download | Diffusion Transformer (DiT) has driven significant progress in image generation tasks. However, DiT inferencing is notoriously compute-intensive and incurs long latency even on datacenter-scale GPUs, ... |
| All-in-Memory Stochastic Computing using ReRAM | João Paulo C. de Lima, Mehran Shoushtari Moghadam, Sercan Aygun, Jeronimo Castrillon, M. Hassan Najafi, Asif Ali Khan | 2025-04-11 | Download | As the demand for efficient, low-power computing in embedded and edge devices grows, traditional computing methods are becoming less effective for handling complex tasks. |
| Efficient Architecture for RISC-V Vector Memory Access | Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang | 2025-04-11 | Download | Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or s... |
| A 55-nm SRAM Chip Scanning Errors Every 125 ns for Event-Wise Soft Error Measurement | Yuibi Gomi, Akira Sato, Waleed Madany, Kenichi Okada, Satoshi Adachi, Masatoshi Itoh, Masanori Hashimoto | 2025-04-11 | Download | We developed a 55 nm CMOS SRAM chip that scans all data every 125 ns and outputs timestamped soft error data via an SPI interface through a FIFO. |
| ML For Hardware Design Interpretability: Challenges and Opportunities | Raymond Baartmans, Andrew Ensinger, Victor Agostinelli, Lizhong Chen | 2025-04-11 | Download | The increasing size and complexity of machine learning (ML) models have driven the growing need for custom hardware accelerators capable of efficiently supporting ML workloads. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| MSCCL++: Rethinking GPU Communication Abstractions for AI Inference | Changho Hwang, Peng Cheng, Roshan Dathathri, Abhinav Jangda, Saeed Maleki, Madan Musuvathi, Olli Saarikivi, Aashaka Shah, Ziyue Yang, Binyang Li, Caio Rocha, Qinghua Zhou, Mahdieh Ghazimirsaeed, Sreevatsa Anantharamu, Jithin Jose | 2025-04-11 | Download | AI applications increasingly run on fast-evolving, heterogeneous hardware to maximize performance, but general-purpose libraries lag in supporting these features. |
| Path Connected Dynamic Graphs with a Study of Dispersion and Exploration | Ashish Saxena, Kaushik Mondal | 2025-04-11 | Download | In dynamic graphs, edges may be added or deleted in each synchronous round. Various connectivity models exist based on constraints on these changes. |
| Personalizing Federated Learning for Hierarchical Edge Networks with Non-IID Data | Seunghyun Lee, Omid Tavallaie, Shuaijun Chen, Kanchana Thilakarathna, Suranga Seneviratne, Adel Nadjaran Toosi, Albert Y. Zomaya | 2025-04-11 | Download | Accommodating edge networks between IoT devices and the cloud server in Hierarchical Federated Learning (HFL) enhances communication efficiency without compromising data privacy. |
| An Empirical Study of Production Incidents in Generative AI Cloud Services | Haoran Yan, Yinfang Chen, Minghua Ma, Ming Wen, Shan Lu, Shenglin Zhang, Tianyin Xu, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Qingwei Lin, Chaoyun Zhang, Dongmei Zhang | 2025-04-11 | Download | The ever-increasing demand for generative artificial intelligence (GenAI) has motivated cloud-based GenAI services such as Azure OpenAI Service and Amazon Bedrock. |
| String Problems in the Congested Clique Model | Shay Golan, Matan Kraus | 2025-04-11 | Download | In this paper we present algorithms for several string problems in the Congested Clique model. In the Congested Clique model, nodes (computers) are used to solve some problem. |
| Assessing the Elephant in the Room in Scheduling for Current Hybrid HPC-QC Clusters | Paolo Viviani, Roberto Rocco, Matteo Barbieri, Gabriella Bettonte, Elisabetta Boella, Marco Cipollini, Jonathan Frassineti, Fulvio Ganz, Sara Marzella, Daniele Ottaviani, Simone Rizzo, Alberto Scionti, Chiara Vercellino, Giacomo Vitali, Olivier Terzo, Bartolomeo Montrucchio, Daniele Gregori | 2025-04-11 | Download | Quantum computing resources are among the most promising candidates for extending the computational capabilities of High-Performance Computing (HPC) systems. |
| A Nonlinear Hash-based Optimization Method for SpMV on GPUs | Chen Yan, Boyu Diao, Hangda Liu, Zhulin An, Yongjun Xu | 2025-04-11 | Download | Sparse matrix-vector multiplication (SpMV) is a fundamental operation with a wide range of applications in scientific computing and artificial intelligence. |
| Trabant: A Serverless Architecture for Multi-Tenant Orbital Edge Computing | Tobias Pfandzelter, Nikita Bauer, Alexander Leis, Corentin Perdrizet, Felix Trautwein, Trever Schirmer, Osama Abboud, David Bermbach | 2025-04-11 | Download | Orbital edge computing reduces the data transmission needs of Earth observation satellites by processing sensor data on-board, allowing near-real-time insights while minimizing downlink costs. |
| Efficient Architecture for RISC-V Vector Memory Access | Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang | 2025-04-11 | Download | Vector processors frequently suffer from inefficient memory accesses, particularly for strided and segment patterns. While coalescing strided accesses is a natural solution, effectively gathering or s... |
| Self-Stabilizing Weakly Byzantine Perpetual Gathering of Mobile Agents | Jion Hirose, Ryota Eguchi, Yuichi Sudo | 2025-04-11 | Download | We study the *Byzantine* gathering problem involving mobile agents with unique identifiers (IDs), some of which are Byzantine. These agents start the execution of a common algorithm from (poss... |
| Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye, Bei Ouyang, Liekang Zeng, Tianyi Qian, Xiaowen Chu, Jian Tang, Xu Chen | 2025-04-11 | Download | Generative large language models (LLMs) have garnered significant attention due to their exceptional capabilities in various AI tasks. Traditionally deployed in cloud datacenters, LLMs are now increas... |
| A Hybrid Cloud Management Plane for Data Processing Pipelines | Vignesh Babu, Feng Lu, Haotian Wu, Cameron Moberg | 2025-04-11 | Download | As organizations increasingly rely on data-driven insights, the ability to run data intensive applications seamlessly across multiple cloud environments becomes critical for tapping into cloud innovat... |
| SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting | Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai | 2025-04-11 | Download | Early exiting has recently emerged as a promising technique for accelerating large language models (LLMs) by effectively reducing the hardware computation and memory access. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Analyzing Localizability of LEO/MEO Hybrid Networks: A Stochastic Geometry Approach | Ruibo Wang, Mustafa A. Kishk, Howard H. Yang, Mohamed-Slim Alouini | 2025-04-11 | Download | With the increase in global positioning service demands and the requirement for more precise positioning, assisting existing medium and high orbit satellite-enabled positioning systems with low Earth ... |
| Optimizing Collaborative UAV Networks for Data Efficiency in IoT Ecosystems | Priyavrat Dev Sharma, Ibrahim Sorkhoh, Muthucumaru Maheswaran | 2025-04-11 | Download | Advances in the Internet of Things are revolutionizing data acquisition, enhancing artificial intelligence and quality of service. Unmanned Aerial Vehicles (UAVs) provide an efficient data-gathering s... |
| Target Tracking With ISAC Using EMLSR in Next-Generation IEEE 802.11 WLANs: Non-Cooperative and Cooperative Approaches | Ching-Lun Tai, Jingyuan Zhang, Douglas M. Blough, Raghupathy Sivakumar | 2025-04-11 | Download | New amendments support Wi-Fi access points (APs) and stations (STAs) in next-generation IEEE 802.11 wireless local area networks (WLANs). IEEE 802. |
| CertainSync: Rateless Set Reconciliation with Certainty | Tomer Keniagin, Eitan Yaakobi, Ori Rottenstreich | 2025-04-11 | Download | Set reconciliation is a fundamental task in distributed systems, particularly in blockchain networks, where it enables synchronization of transaction pools among peers and facilitates block disseminat... |
| CICV5G: A 5G Communication Delay Dataset for PnC in Cloud-based Intelligent Connected Vehicles | Xinrui Zhang, Peizhi Zhang, Junpeng Huang, Haojie Feng, Yining Ma, Feng Shen, Lu Xiong | 2025-04-11 | Download | Cloud-based intelligent connected vehicles (CICVs) leverage cloud computing and vehicle-to-everything (V2X) to enable efficient information exchange and cooperative control. |
| Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices | Shengyuan Ye, Bei Ouyang, Liekang Zeng, Tianyi Qian, Xiaowen Chu, Jian Tang, Xu Chen | 2025-04-11 | Download | Generative large language models (LLMs) have garnered significant attention due to their exceptional capabilities in various AI tasks. Traditionally deployed in cloud datacenters, LLMs are now increas... |
| A Hybrid Cloud Management Plane for Data Processing Pipelines | Vignesh Babu, Feng Lu, Haotian Wu, Cameron Moberg | 2025-04-11 | Download | As organizations increasingly rely on data-driven insights, the ability to run data intensive applications seamlessly across multiple cloud environments becomes critical for tapping into cloud innovat... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration | Matteo Spanio, Antonio Rodà | 2025-04-11 | Download | The burgeoning complexity and real-time processing demands of audio signals necessitate optimized algorithms that harness the computational prowess of Graphics Processing Units (GPUs). |