2025-03-07

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| TPU-Gen: LLM-Driven Custom Tensor Processing Unit Generator | Deepak Vungarala, Mohammed E. Elbtity, Sumiya Syed, Sakila Alam, Kartik Pandit, Arnob Ghosh, Ramtin Zand, Shaahin Angizi | 2025-03-07 | Download | The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficie... |
| SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads | Amin Mamandipoor, Huy Dinh Tran, Mohammad Alian | 2025-03-07 | Download | Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure. |
| MatrixFlow: System-Accelerator co-design for high-performance transformer applications | Qunyou Liu, Marina Zapater, David Atienza | 2025-03-07 | Download | Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. |
| Real-Time Semantic Segmentation of Aerial Images Using an Embedded U-Net: A Comparison of CPU, GPU, and FPGA Workflows | Julien Posso, Hugo Kieffer, Nicolas Menga, Omar Hlimi, Sébastien Tarris, Hubert Guerard, Guy Bois, Matthieu Couderc, Eric Jenn | 2025-03-07 | Download | This study introduces a lightweight U-Net model optimized for real-time semantic segmentation of aerial images, targeting the efficient utilization of Commercial Off-The-Shelf (COTS) embedded computin... |
| StreamGrid: Streaming Point Cloud Analytics via Compulsory Splitting and Deterministic Termination | Yu Feng, Zheng Liu, Weikai Lin, Zihan Liu, Jingwen Leng, Minyi Guo, Zhezhi He, Jieru Zhao, Yuhao Zhu | 2025-03-07 | Download | Point clouds are increasingly important in intelligent applications, but frequent off-chip memory traffic in accelerators causes pipeline stalls and leads to high energy consumption. |
| Piccolo: Large-Scale Graph Processing with Fine-Grained In-Memory Scatter-Gather | Changmin Shin, Jaeyong Song, Hongsun Jang, Dogeun Kim, Jun Sung, Taehee Kwon, Jae Hyung Ju, Frank Liu, Yeonkyu Choi, Jinho Lee | 2025-03-07 | Download | Graph processing requires irregular, fine-grained random access patterns incompatible with contemporary off-chip memory architecture, leading to inefficient data access. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| VersaSlot: Efficient Fine-grained FPGA Sharing with Big.Little Slots and Live Migration in FPGA Cluster | Jianfeng Gu, Hao Wang, Xiaorang Guo, Martin Schulz, Michael Gerndt | 2025-03-07 | Download | As FPGAs gain popularity for on-demand application acceleration in data center computing, dynamic partial reconfiguration (DPR) has become an effective fine-grained sharing technique for FPGA multiple... |
| Practical Federated Learning without a Server | Akash Dhasade, Anne-Marie Kermarrec, Erick Lavoie, Johan Pouwelse, Rishi Sharma, Martijn de Vos | 2025-03-07 | Download | Federated Learning (FL) enables end-user devices to collaboratively train ML models without sharing raw data, thereby preserving data privacy. |
| Umbilical Choir: Automated Live Testing for Edge-To-Cloud FaaS Applications | Mohammadreza Malekabbasi, Tobias Pfandzelter, David Bermbach | 2025-03-07 | Download | Application users react negatively to performance regressions or availability issues across software releases. To address this, modern cloud-based applications with their multiple daily releases rely ... |
| A Decentralized Sequencer and Data Availability Committee for Rollups Using Set Consensus | Margarita Capretto, Martín Ceresa, Antonio Fernández Anta, Pedro Moreno-Sánchez, César Sánchez | 2025-03-07 | Download | Blockchains face a scalability challenge due to the intrinsic throughput limitations of consensus protocols and the limitation in block sizes due to decentralization. |
| Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts | Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng | 2025-03-07 | Download | Linear Sequence Modeling (LSM) like linear attention, state space models and linear RNNs, and Mixture-of-Experts (MoE) have recently emerged as significant architectural improvements. |
| Efficient Parallel Scheduling for Sparse Triangular Solvers | Toni Böhnlein, Pál András Papp, Raphael S. Steiner, Christos K. Matzoros, A. N. Yzelman | 2025-03-07 | Download | We develop and analyze new scheduling algorithms for solving sparse triangular linear systems (SpTRSV) in parallel. Our approach produces highly efficient synchronous schedules for the forward- and ba... |
| Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching | Bowen Pang, Kai Li, Feifan Wang | 2025-03-07 | Download | The increasing adoption of large language models (LLMs) necessitates inference serving systems that can deliver both high throughput and low latency. |
| Uncertainty-Aware Explainable Federated Learning | Yanci Zhang, Han Yu | 2025-03-07 | Download | Federated Learning (FL) is a collaborative machine learning paradigm for enhancing data privacy preservation. Its privacy-preserving nature complicates the explanation of the decision-making processes... |
| Dilu: Enabling GPU Resourcing-on-Demand for Serverless DL Serving via Introspective Elasticity | Cunchi Lv, Xiao Shi, Zhengyu Lei, Jinyue Huang, Wenting Tan, Xiaohui Zheng, Xiaofang Zhao | 2025-03-07 | Download | Serverless computing, with its ease of management, auto-scaling, and cost-effectiveness, is widely adopted by deep learning (DL) applications. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads | Amin Mamandipoor, Huy Dinh Tran, Mohammad Alian | 2025-03-07 | Download | Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure. |
| REACT: Multi Robot Energy-Aware Orchestrator for Indoor Search and Rescue Critical Tasks | Fabio Maresca, Arnau Romero, Carmen Delgado, Vincenzo Sciancalepore, Josep Paradells, Xavier Costa-Pérez | 2025-03-07 | Download | Smart factories enhance production efficiency and sustainability, but emergencies like human errors, machinery failures and natural disasters pose significant risks. |
| RiLoCo: An ISAC-oriented AI Solution to Build RIS-empowered Networks | Guillermo Encinas-Lago, Vincenzo Sciancalepore, Henk Wymeersch, Marco Di Renzo, Xavier Costa-Perez | 2025-03-07 | Download | The advance towards 6G networks comes with the promise of unprecedented performance in sensing and communication capabilities. The feat of achieving those, while satisfying the ever-growing demands pl... |
| Wi-Fi 6 Cross-Technology Interference Detection and Mitigation by OFDMA: an Experimental Study | Thijs Havinga, Xianjun Jiao, Wei Liu, Baiheng Chen, Adnan Shahid, Ingrid Moerman | 2025-03-07 | Download | Cross-Technology Interference (CTI) poses challenges for the performance and robustness of wireless networks. There are opportunities for better cooperation if the spectral occupation and technology o... |
| Routing for Large ML Models | Ofir Cohen, Jose Yallouz, Michael Schapira, Shahar Belkar, Tal Mizrahi | 2025-03-07 | Download | Training large language models (LLMs), and other large machine learning models, involves repeated communication of large volumes of data across a data center network. |
| Evaluation of 3D Terrestrial and Aerial Spectrum Sharing with Massive MIMO Systems | Achiel Colpaert, Zhuangzhuang Cui, Sofie Pollin | 2025-03-07 | Download | Connecting aerial and terrestrial users with a single base station (BS) is increasingly challenging due to the rising number of aerial users like unmanned aerial vehicles (UAVs). |
| ORANSight-2.0: Foundational LLMs for O-RAN | Pranshav Gajjar, Vijay K. Shah | 2025-03-07 | Download | Despite the transformative impact of Large Language Models (LLMs) across critical domains such as healthcare, customer service, and business marketing, their integration into Open Radio Access Network... |
| Cross-Layer-Optimized Link Selection for Hologram Video Streaming over Millimeter Wave Networks | Yiming Jiang, Yanwei Liu, Jinxia Liu, Antonios Argyriou, Yifei Chen, Wen Zhang | 2025-03-07 | Download | Holographic-type communication brings an immersive tele-holography experience by delivering holographic contents to users. As the direct representation of holographic contents, hologram videos are nat... |
| Small noise limits of Markov chains and the PageRank | Vivek S Borkar, S Sowmya, Raghavendra Tripathi | 2025-03-07 | Download | We recall the classical formulation of PageRank as the stationary distribution of a singularly perturbed irreducible Markov chain that is not irreducible when the perturbation parameter goes to zero. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HyperGraph ROS: An Open-Source Robot Operating System for Hybrid Parallel Computing based on Computational HyperGraph | Shufang Zhang, Jiazheng Wu, Jiacheng He, Kaiyi Wang, Shan An | 2025-03-07 | Download | This paper presents HyperGraph ROS, an open-source robot operating system that unifies intra-process, inter-process, and cross-device computation into a computational hypergraph for efficient message ... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Leveraging Approximate Caching for Faster Retrieval-Augmented Generation | Shai Bergman, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos, Ji Zhang | 2025-03-07 | Download | Retrieval-augmented generation (RAG) improves the reliability of large language model (LLM) answers by integrating external knowledge. However, RAG increases the end-to-end inference time since lookin... |
