2025-05-06

cs.AR - Architecture

Title	Authors	Publication Date	PDF	Abstract
QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies	Shuyao Cheng, Rui Zhang, Wenkai He, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Yifan Hao, Guanglin Xu, Yuanbo Wen, Ling Li, Qi Guo, Yunji Chen	2025-05-06	Download	Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single...
Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU	Huanzhi Pu, Rishabh Ravi, Shinnung Jeong, Udit Subramanya, Euijun Chung, Jisheng Zhao, Chihyo Ahn, Hyesoon Kim	2025-05-06	Download	RISC-V GPUs present a promising path for supporting GPU applications. Traditionally, GPUs achieve high efficiency through the SPMD (Single Program Multiple Data) programming model.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Publication Date	PDF	Abstract
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving	Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng	2025-05-06	Download	Serving large language models (LLMs) is expensive, especially for providers hosting many models, making cost reduction essential. The unique workload patterns of serving multiple LLMs (i.e.
Rollbaccine : Herd Immunity against Storage Rollback Attacks in TEEs [Technical Report]	David Chu, Aditya Balasubramanian, Dee Bao, Natacha Crooks, Heidi Howard, Lucky E. Katahanas, Soujanya Ponnapalli	2025-05-06	Download	Today, users can "lift-and-shift" unmodified applications into modern, VM-based Trusted Execution Environments (TEEs) in order to gain hardware-based security guarantees.
Can Large Language Models Predict Parallel Code Performance?	Gregory Bolet, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, Niranjan Hasabnis, Hayden Estes, Kirk W. Cameron, Gal Oren	2025-05-06	Download	Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs...
Decentralized Distributed Proximal Policy Optimization (DD-PPO) for High Performance Computing Scheduling on Multi-User Systems	Matthew Sgambati, Aleksandar Vakanski, Matthew Anderson	2025-05-06	Download	Resource allocation in High Performance Computing (HPC) environments presents a complex and multifaceted challenge for job scheduling algorithms.
MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing	Asif Rahman, Veljko Cvetkovic, Kathleen Reece, Aidan Walters, Yasir Hassan, Aneesh Tummeti, Bryan Torres, Denise Cooney, Margaret Ellis, Dimitrios S. Nikolopoulos	2025-05-06	Download	Large language models (LLMs) have transformed software development through code generation capabilities, yet their effectiveness for high-performance computing (HPC) remains limited.
Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence	Shuhua Yu, Dusan Jakovetic, Soummya Kar	2025-05-06	Download	Heavy-tailed noise in nonconvex stochastic optimization has garnered increasing research interest, as empirical studies, including those on training attention models, suggest it is a more realistic gr...
Revisiting Lower Bounds for Two-Step Consensus	Fedor Ryabinin, Alexey Gotsman, Pierre Sutra	2025-05-06	Download	A seminal result by Lamport shows that at least $\max\{2e+f+1,2f+1\}$ processes are required to implement partially synchronous consensus that tolerates $f$ process failures and can furthermore decide...
TailBench++: Flexible Multi-Client, Multi-Server Benchmarking for Latency-Critical Workloads	Zhilin Li, Lucia Pons, Salvador Petit, Julio Sahuquillo, Julio Pons	2025-05-06	Download	Cloud systems have rapidly expanded worldwide in the last decade, shifting computational tasks to cloud servers where clients submit their requests.
A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning	Kolawole E. Ogunsina, Morayo A. Ogunsina	2025-05-06	Download	Inconsistent outputs and hallucinations from large language models (LLMs) are major obstacles to reliable AI systems. When different proprietary reasoning models (RMs), such as those by OpenAI, Google...
Elevating Semantic Exploration: A Novel Approach Utilizing Distributed Repositories	Valerio Bellandi	2025-05-06	Download	Centralized and distributed systems are two main approaches to organizing ICT infrastructure, each with its pros and cons. Centralized systems concentrate resources in one location, making management ...
The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use	Leon Oostrum, Bram Veenboer, Ronald Rook, Michael Brown, Pieter Kruizinga, John W. Romein	2025-05-06	Download	Beamforming is a well-known technique to combine signals from multiple sensors. It has a wide range of application domains. This paper introduces the Tensor-Core Beamformer: a generic, optimized beamf...

cs.NI - Networking and Internet Architecture

Title	Authors	Publication Date	PDF	Abstract
Terahertz Spatial Wireless Channel Modeling with Radio Radiance Field	John Song, Lihao Zhang, Feng Ye, Haijian Sun	2025-05-06	Download	Terahertz (THz) communication is a key enabler for 6G systems, offering ultra-wide bandwidth and unprecedented data rates. However, THz signal propagation differs significantly from lower-frequency ba...
Intelligent Load Balancing Systems using Reinforcement Learning System	Raju Singh	2025-05-06	Download	Load Balancing is a fundamental technology for scaling cloud infrastructure. It enables systems to distribute incoming traffic across backend servers using predefined algorithms such as round robin, w...
Hybrid Quantum-Classical Maximum-Likelihood Detection via Grover-based Adaptive Search for RIS-assisted Broadband Wireless Systems	Maryam Tariq, Raneem Abdelrahim, Omar Alhussein, Sami Muhaidat	2025-05-06	Download	The escalating complexity and stringent performance demands of sixth-generation wireless systems necessitate advanced signal processing methods capable of simultaneously achieving high spectral effici...
Minimum Congestion Routing of Unsplittable Flows in Data-Center Networks	Miguel Ferreira, Nirav Atre, Justine Sherry, Michael Dinitz, João Luís Sobrinho	2025-05-06	Download	Millions of flows are routed concurrently through a modern data-center. These networks are often built as Clos topologies, and flow demands are constrained only by the link capacities at the ingress a...
Task-Oriented Multimodal Token Transmission in Resource-Constrained Multiuser Networks	Junhe Zhang, Wanli Ni, Pengwei Wang, Dongyu Wang	2025-05-06	Download	With the emergence of large model-based agents, widely adopted transformer-based architectures inevitably produce excessively long token embeddings for transmission, which may result in high bandwidth...
Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving	Giacomo Avanzi, Marco Giordani, Michele Zorzi	2025-05-06	Download	The teleoperated driving (TD) scenario comes with stringent Quality of Service (QoS) communication constraints, especially in terms of end-to-end (E2E) latency and reliability.
Advancing Remote and Continuous Cardiovascular Patient Monitoring through a Novel and Resource-efficient IoT-Driven Framework	Sanam Nayab, Sohail Raza Chohan, Aqsa Jameel, Syed Rehan Shah, Syed Ahsan Masud Zaidi, Aditya Nath Jha, Kamran Siddique	2025-05-06	Download	Cardiovascular diseases are a leading cause of fatalities worldwide, often occurring suddenly with limited time for intervention. Current healthcare monitoring systems for cardiac patients rely heavil...
Efficient Wi-Fi Sensing for IoT Forensics with Lossy Compression of CSI Data	Paolo Cerutti, Fabio Palmese, Marco Cominelli, Alessandro E. C. Redondi	2025-05-06	Download	Wi-Fi sensing is an emerging technology that uses channel state information (CSI) from ambient Wi-Fi signals to monitor human activity without the need for dedicated sensors.
A Trustworthy Multi-LLM Network: Challenges,Solutions, and A Use Case	Haoxiang Luo, Gang Sun, Yinqiu Liu, Dusit Niyato, Hongfang Yu, Mohammed Atiquzzaman, Schahram Dustdar	2025-05-06	Download	Large Language Models (LLMs) demonstrate strong potential across a variety of tasks in communications and networking due to their advanced reasoning capabilities.

cs.PF - Performance

Title	Authors	Publication Date	PDF	Abstract
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving	Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng	2025-05-06	Download	Serving large language models (LLMs) is expensive, especially for providers hosting many models, making cost reduction essential. The unique workload patterns of serving multiple LLMs (i.e.
Can Large Language Models Predict Parallel Code Performance?	Gregory Bolet, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, Niranjan Hasabnis, Hayden Estes, Kirk W. Cameron, Gal Oren	2025-05-06	Download	Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs...
Minimum Congestion Routing of Unsplittable Flows in Data-Center Networks	Miguel Ferreira, Nirav Atre, Justine Sherry, Michael Dinitz, João Luís Sobrinho	2025-05-06	Download	Millions of flows are routed concurrently through a modern data-center. These networks are often built as Clos topologies, and flow demands are constrained only by the link capacities at the ingress a...
Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow	Roblex Nana Tchakoute, Claude Tadonki, Petr Dokladal, Youssef Mesri	2025-05-06	Download	Power management has become a crucial focus in the modern computing landscape, considering that energy is increasingly recognized as a critical resource.
TeleEval-OS: Performance evaluations of large language models for operations scheduling	Yanyan Wang, Yingying Wang, Junli Liang, Yin Xu, Yunlong Liu, Yiming Xu, Zhengwang Jiang, Zhehe Li, Fei Li, Long Zhao, Kuang Xu, Qi Song, Xiangyang Li	2025-05-06	Download	The rapid advancement of large language models (LLMs) has significantly propelled progress in artificial intelligence, demonstrating substantial application potential across multiple specialized domai...