2025-03-07

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator	Deepak Vungarala, Mohammed E. Elbtity, Sumiya Syed, Sakila Alam, Kartik Pandit, Arnob Ghosh, Ramtin Zand, Shaahin Angizi	2025-03-07	Download	The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficie...
SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads	Amin Mamandipoor, Huy Dinh Tran, Mohammad Alian	2025-03-07	Download	Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure.
MatrixFlow: System-Accelerator co-design for high-performance transformer applications	Qunyou Liu, Marina Zapater, David Atienza	2025-03-07	Download	Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing.
Real-Time Semantic Segmentation of Aerial Images Using an Embedded U-Net: A Comparison of CPU, GPU, and FPGA Workflows	Julien Posso, Hugo Kieffer, Nicolas Menga, Omar Hlimi, Sébastien Tarris, Hubert Guerard, Guy Bois, Matthieu Couderc, Eric Jenn	2025-03-07	Download	This study introduces a lightweight U-Net model optimized for real-time semantic segmentation of aerial images, targeting the efficient utilization of Commercial Off-The-Shelf (COTS) embedded computin...
StreamGrid: Streaming Point Cloud Analytics via Compulsory Splitting and Deterministic Termination	Yu Feng, Zheng Liu, Weikai Lin, Zihan Liu, Jingwen Leng, Minyi Guo, Zhezhi He, Jieru Zhao, Yuhao Zhu	2025-03-07	Download	Point clouds are increasingly important in intelligent applications, but frequent off-chip memory traffic in accelerators causes pipeline stalls and leads to high energy consumption.
Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather	Changmin Shin, Jaeyong Song, Hongsun Jang, Dogeun Kim, Jun Sung, Taehee Kwon, Jae Hyung Ju, Frank Liu, Yeonkyu Choi, Jinho Lee	2025-03-07	Download	Graph processing requires irregular, fine-grained random access patterns incompatible with contemporary off-chip memory architecture, leading to inefficient data access.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
VersaSlot: Efficient Fine-grained FPGA Sharing with Big.Little Slots and Live Migration in FPGA Cluster	Jianfeng Gu, Hao Wang, Xiaorang Guo, Martin Schulz, Michael Gerndt	2025-03-07	Download	As FPGAs gain popularity for on-demand application acceleration in data center computing, dynamic partial reconfiguration (DPR) has become an effective fine-grained sharing technique for FPGA multiple...
Practical Federated Learning without a Server	Akash Dhasade, Anne-Marie Kermarrec, Erick Lavoie, Johan Pouwelse, Rishi Sharma, Martijn de Vos	2025-03-07	Download	Federated Learning (FL) enables end-user devices to collaboratively train ML models without sharing raw data, thereby preserving data privacy.
Umbilical Choir: Automated Live Testing for Edge-To-Cloud FaaS Applications	Mohammadreza Malekabbasi, Tobias Pfandzelter, David Bermbach	2025-03-07	Download	Application users react negatively to performance regressions or availability issues across software releases. To address this, modern cloud-based applications with their multiple daily releases rely ...
A Decentralized Sequencer and Data Availability Committee for Rollups Using Set Consensus	Margarita Capretto, Martín Ceresa, Antonio Fernández Anta, Pedro Moreno-Sánchez, César Sánchez	2025-03-07	Download	Blockchains face a scalability challenge due to the intrinsic throughput limitations of consensus protocols and the limitation in block sizes due to decentralization.
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts	Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng	2025-03-07	Download	Linear Sequence Modeling (LSM) like linear attention, state space models and linear RNNs, and Mixture-of-Experts (MoE) have recently emerged as significant architectural improvements.
Efficient Parallel Scheduling for Sparse Triangular Solvers	Toni Böhnlein, Pál András Papp, Raphael S. Steiner, Christos K. Matzoros, A. N. Yzelman	2025-03-07	Download	We develop and analyze new scheduling algorithms for solving sparse triangular linear systems (SpTRSV) in parallel. Our approach produces highly efficient synchronous schedules for the forward- and ba...
Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching	Bowen Pang, Kai Li, Feifan Wang	2025-03-07	Download	The increasing adoption of large language models (LLMs) necessitates inference serving systems that can deliver both high throughput and low latency.
Uncertainty-Aware Explainable Federated Learning	Yanci Zhang, Han Yu	2025-03-07	Download	Federated Learning (FL) is a collaborative machine learning paradigm for enhancing data privacy preservation. Its privacy-preserving nature complicates the explanation of the decision-making processes...
Dilu: Enabling GPU Resourcing-on-Demand for Serverless DL Serving via Introspective Elasticity	Cunchi Lv, Xiao Shi, Zhengyu Lei, Jinyue Huang, Wenting Tan, Xiaohui Zheng, Xiaofang Zhao	2025-03-07	Download	Serverless computing, with its ease of management, auto-scaling, and cost-effectiveness, is widely adopted by deep learning (DL) applications.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads	Amin Mamandipoor, Huy Dinh Tran, Mohammad Alian	2025-03-07	Download	Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure.
REACT: Multi Robot Energy-Aware Orchestrator for Indoor Search and Rescue Critical Tasks	Fabio Maresca, Arnau Romero, Carmen Delgado, Vincenzo Sciancalepore, Josep Paradells, Xavier Costa-Pérez	2025-03-07	Download	Smart factories enhance production efficiency and sustainability, but emergencies like human errors, machinery failures and natural disasters pose significant risks.
RiLoCo: An ISAC-oriented AI Solution to Build RIS-empowered Networks	Guillermo Encinas-Lago, Vincenzo Sciancalepore, Henk Wymeersch, Marco Di Renzo, Xavier Costa-Perez	2025-03-07	Download	The advance towards 6G networks comes with the promise of unprecedented performance in sensing and communication capabilities. The feat of achieving those, while satisfying the ever-growing demands pl...
Wi-Fi 6 Cross-Technology Interference Detection and Mitigation by OFDMA: an Experimental Study	Thijs Havinga, Xianjun Jiao, Wei Liu, Baiheng Chen, Adnan Shahid, Ingrid Moerman	2025-03-07	Download	Cross-Technology Interference (CTI) poses challenges for the performance and robustness of wireless networks. There are opportunities for better cooperation if the spectral occupation and technology o...
Routing for Large ML Models	Ofir Cohen, Jose Yallouz, Michael Schapira, Shahar Belkar, Tal Mizrahi	2025-03-07	Download	Training large language models (LLMs), and other large machine learning models, involves repeated communication of large volumes of data across a data center network.
Evaluation of 3D Terrestrial and Aerial Spectrum Sharing with Massive MIMO Systems	Achiel Colpaert, Zhuangzhuang Cui, Sofie Pollin	2025-03-07	Download	Connecting aerial and terrestrial users with a single base station (BS) is increasingly challenging due to the rising number of aerial users like unmanned aerial vehicles (UAVs).
ORANSight-2.0: Foundational LLMs for O-RAN	Pranshav Gajjar, Vijay K. Shah	2025-03-07	Download	Despite the transformative impact of Large Language Models (LLMs) across critical domains such as healthcare, customer service, and business marketing, their integration into Open Radio Access Network...
Cross-Layer-Optimized Link Selection for Hologram Video Streaming over Millimeter Wave Networks	Yiming Jiang, Yanwei Liu, Jinxia Liu, Antonios Argyriou, Yifei Chen, Wen Zhang	2025-03-07	Download	Holographic-type communication brings an immersive tele-holography experience by delivering holographic contents to users. As the direct representation of holographic contents, hologram videos are nat...
Small noise limits of Markov chains and the PageRank	Vivek S Borkar, S Sowmya, Raghavendra Tripathi	2025-03-07	Download	We recall the classical formulation of PageRank as the stationary distribution of a singularly perturbed irreducible Markov chain that is not irreducible when the perturbation parameter goes to zero.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
HyperGraph ROS: An Open-Source Robot Operating System for Hybrid Parallel Computing based on Computational HyperGraph	Shufang Zhang, Jiazheng Wu, Jiacheng He, Kaiyi Wang, Shan An	2025-03-07	Download	This paper presents HyperGraph ROS, an open-source robot operating system that unifies intra-process, inter-process, and cross-device computation into a computational hypergraph for efficient message ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Leveraging Approximate Caching for Faster Retrieval-Augmented Generation	Shai Bergman, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos, Ji Zhang	2025-03-07	Download	Retrieval-augmented generation (RAG) improves the reliability of large language model (LLM) answers by integrating external knowledge. However, RAG increases the end-to-end inference time since lookin...