2025-05-06

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| QiMeng-CPU-v2: Automated Superscalar Processor Design by Learning Data Dependencies | Shuyao Cheng, Rui Zhang, Wenkai He, Pengwei Jin, Chongxiao Li, Zidong Du, Xing Hu, Yifan Hao, Guanglin Xu, Yuanbo Wen, Ling Li, Qi Guo, Yunji Chen | 2025-05-06 | Download | Automated processor design, which can significantly reduce human efforts and accelerate design cycles, has received considerable attention. While recent advancements have automatically designed single... |
| Hardware vs. Software Implementation of Warp-Level Features in Vortex RISC-V GPU | Huanzhi Pu, Rishabh Ravi, Shinnung Jeong, Udit Subramanya, Euijun Chung, Jisheng Zhao, Chihyo Ahn, Hyesoon Kim | 2025-05-06 | Download | RISC-V GPUs present a promising path for supporting GPU applications. Traditionally, GPUs achieve high efficiency through the SPMD (Single Program Multiple Data) programming model. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving | Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng | 2025-05-06 | Download | Serving large language models (LLMs) is expensive, especially for providers hosting many models, making cost reduction essential. The unique workload patterns of serving multiple LLMs (i.e. |
| Rollbaccine: Herd Immunity against Storage Rollback Attacks in TEEs [Technical Report] | David Chu, Aditya Balasubramanian, Dee Bao, Natacha Crooks, Heidi Howard, Lucky E. Katahanas, Soujanya Ponnapalli | 2025-05-06 | Download | Today, users can "lift-and-shift" unmodified applications into modern, VM-based Trusted Execution Environments (TEEs) in order to gain hardware-based security guarantees. |
| Can Large Language Models Predict Parallel Code Performance? | Gregory Bolet, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, Niranjan Hasabnis, Hayden Estes, Kirk W. Cameron, Gal Oren | 2025-05-06 | Download | Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs... |
| Decentralized Distributed Proximal Policy Optimization (DD-PPO) for High Performance Computing Scheduling on Multi-User Systems | Matthew Sgambati, Aleksandar Vakanski, Matthew Anderson | 2025-05-06 | Download | Resource allocation in High Performance Computing (HPC) environments presents a complex and multifaceted challenge for job scheduling algorithms. |
| MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing | Asif Rahman, Veljko Cvetkovic, Kathleen Reece, Aidan Walters, Yasir Hassan, Aneesh Tummeti, Bryan Torres, Denise Cooney, Margaret Ellis, Dimitrios S. Nikolopoulos | 2025-05-06 | Download | Large language models (LLMs) have transformed software development through code generation capabilities, yet their effectiveness for high-performance computing (HPC) remains limited. |
| Decentralized Nonconvex Optimization under Heavy-Tailed Noise: Normalization and Optimal Convergence | Shuhua Yu, Dusan Jakovetic, Soummya Kar | 2025-05-06 | Download | Heavy-tailed noise in nonconvex stochastic optimization has garnered increasing research interest, as empirical studies, including those on training attention models, suggest it is a more realistic gr... |
| Revisiting Lower Bounds for Two-Step Consensus | Fedor Ryabinin, Alexey Gotsman, Pierre Sutra | 2025-05-06 | Download | A seminal result by Lamport shows that at least $\max\{2e+f+1, 2f+1\}$ processes are required to implement partially synchronous consensus that tolerates $f$ process failures and can furthermore decide... |
| TailBench++: Flexible Multi-Client, Multi-Server Benchmarking for Latency-Critical Workloads | Zhilin Li, Lucia Pons, Salvador Petit, Julio Sahuquillo, Julio Pons | 2025-05-06 | Download | Cloud systems have rapidly expanded worldwide in the last decade, shifting computational tasks to cloud servers where clients submit their requests. |
| A Hashgraph-Inspired Consensus Mechanism for Reliable Multi-Model Reasoning | Kolawole E. Ogunsina, Morayo A. Ogunsina | 2025-05-06 | Download | Inconsistent outputs and hallucinations from large language models (LLMs) are major obstacles to reliable AI systems. When different proprietary reasoning models (RMs), such as those by OpenAI, Google... |
| Elevating Semantic Exploration: A Novel Approach Utilizing Distributed Repositories | Valerio Bellandi | 2025-05-06 | Download | Centralized and distributed systems are two main approaches to organizing ICT infrastructure, each with its pros and cons. Centralized systems concentrate resources in one location, making management ... |
| The Tensor-Core Beamformer: A High-Speed Signal-Processing Library for Multidisciplinary Use | Leon Oostrum, Bram Veenboer, Ronald Rook, Michael Brown, Pieter Kruizinga, John W. Romein | 2025-05-06 | Download | Beamforming is a well-known technique to combine signals from multiple sensors. It has a wide range of application domains. This paper introduces the Tensor-Core Beamformer: a generic, optimized beamf... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Terahertz Spatial Wireless Channel Modeling with Radio Radiance Field | John Song, Lihao Zhang, Feng Ye, Haijian Sun | 2025-05-06 | Download | Terahertz (THz) communication is a key enabler for 6G systems, offering ultra-wide bandwidth and unprecedented data rates. However, THz signal propagation differs significantly from lower-frequency ba... |
| Intelligent Load Balancing Systems using Reinforcement Learning System | Raju Singh | 2025-05-06 | Download | Load Balancing is a fundamental technology for scaling cloud infrastructure. It enables systems to distribute incoming traffic across backend servers using predefined algorithms such as round robin, w... |
| Hybrid Quantum-Classical Maximum-Likelihood Detection via Grover-based Adaptive Search for RIS-assisted Broadband Wireless Systems | Maryam Tariq, Raneem Abdelrahim, Omar Alhussein, Sami Muhaidat | 2025-05-06 | Download | The escalating complexity and stringent performance demands of sixth-generation wireless systems necessitate advanced signal processing methods capable of simultaneously achieving high spectral effici... |
| Minimum Congestion Routing of Unsplittable Flows in Data-Center Networks | Miguel Ferreira, Nirav Atre, Justine Sherry, Michael Dinitz, João Luís Sobrinho | 2025-05-06 | Download | Millions of flows are routed concurrently through a modern data-center. These networks are often built as Clos topologies, and flow demands are constrained only by the link capacities at the ingress a... |
| Task-Oriented Multimodal Token Transmission in Resource-Constrained Multiuser Networks | Junhe Zhang, Wanli Ni, Pengwei Wang, Dongyu Wang | 2025-05-06 | Download | With the emergence of large model-based agents, widely adopted transformer-based architectures inevitably produce excessively long token embeddings for transmission, which may result in high bandwidth... |
| Multi-Agent Reinforcement Learning Scheduling to Support Low Latency in Teleoperated Driving | Giacomo Avanzi, Marco Giordani, Michele Zorzi | 2025-05-06 | Download | The teleoperated driving (TD) scenario comes with stringent Quality of Service (QoS) communication constraints, especially in terms of end-to-end (E2E) latency and reliability. |
| Advancing Remote and Continuous Cardiovascular Patient Monitoring through a Novel and Resource-efficient IoT-Driven Framework | Sanam Nayab, Sohail Raza Chohan, Aqsa Jameel, Syed Rehan Shah, Syed Ahsan Masud Zaidi, Aditya Nath Jha, Kamran Siddique | 2025-05-06 | Download | Cardiovascular diseases are a leading cause of fatalities worldwide, often occurring suddenly with limited time for intervention. Current healthcare monitoring systems for cardiac patients rely heavil... |
| Efficient Wi-Fi Sensing for IoT Forensics with Lossy Compression of CSI Data | Paolo Cerutti, Fabio Palmese, Marco Cominelli, Alessandro E. C. Redondi | 2025-05-06 | Download | Wi-Fi sensing is an emerging technology that uses channel state information (CSI) from ambient Wi-Fi signals to monitor human activity without the need for dedicated sensors. |
| A Trustworthy Multi-LLM Network: Challenges, Solutions, and A Use Case | Haoxiang Luo, Gang Sun, Yinqiu Liu, Dusit Niyato, Hongfang Yu, Mohammed Atiquzzaman, Schahram Dustdar | 2025-05-06 | Download | Large Language Models (LLMs) demonstrate strong potential across a variety of tasks in communications and networking due to their advanced reasoning capabilities. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving | Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng | 2025-05-06 | Download | Serving large language models (LLMs) is expensive, especially for providers hosting many models, making cost reduction essential. The unique workload patterns of serving multiple LLMs (i.e. |
| Can Large Language Models Predict Parallel Code Performance? | Gregory Bolet, Giorgis Georgakoudis, Harshitha Menon, Konstantinos Parasyris, Niranjan Hasabnis, Hayden Estes, Kirk W. Cameron, Gal Oren | 2025-05-06 | Download | Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware -- an increasingly prohibitive step due to limited access to high-end GPUs... |
| Minimum Congestion Routing of Unsplittable Flows in Data-Center Networks | Miguel Ferreira, Nirav Atre, Justine Sherry, Michael Dinitz, João Luís Sobrinho | 2025-05-06 | Download | Millions of flows are routed concurrently through a modern data-center. These networks are often built as Clos topologies, and flow demands are constrained only by the link capacities at the ingress a... |
| Benchmark-based Study of CPU/GPU Power-Related Features through JAX and TensorFlow | Roblex Nana Tchakoute, Claude Tadonki, Petr Dokladal, Youssef Mesri | 2025-05-06 | Download | Power management has become a crucial focus in the modern computing landscape, considering that energy is increasingly recognized as a critical resource. |
| TeleEval-OS: Performance evaluations of large language models for operations scheduling | Yanyan Wang, Yingying Wang, Junli Liang, Yin Xu, Yunlong Liu, Yiming Xu, Zhengwang Jiang, Zhehe Li, Fei Li, Long Zhao, Kuang Xu, Qi Song, Xiangyang Li | 2025-05-06 | Download | The rapid advancement of large language models (LLMs) has significantly propelled progress in artificial intelligence, demonstrating substantial application potential across multiple specialized domai... |
