2025-03-14

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| ARCAS: Adaptive Runtime System for Chiplet-Aware Scheduling | Alessandro Fogli, Bo Zhao, Peter Pietzuch, Jana Giceva | 2025-03-14 | The growing disparity between CPU core counts and available memory bandwidth has intensified memory contention in servers. This particularly affects highly parallelizable applications, which must achi... |
| Cost-effective Deep Learning Infrastructure with NVIDIA GPU | Aatiz Ghimire, Shahnawaz Alam, Siman Giri, Madhav Prasad Ghimire | 2025-03-14 | The growing demand for computational power is driven by advancements in deep learning, the increasing need for big data processing, and the requirements of scientific simulations for academic and rese... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs | Shengkun Cui, Archit Patke, Hung Nguyen, Aditya Ranjan, Ziheng Chen, Phuong Cao, Gregory Bauer, Brett Bode, Catello Di Martino, Saurabh Jha, Chandra Narayanaswami, Daby Sow, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer | 2025-03-14 | This study characterizes GPU resilience in Delta, a large-scale AI system that consists of 1,056 A100 and H100 GPUs, with over 1,300 petaflops of peak throughput. We used 2. |
| Performance Analysis of Decentralized Federated Learning Deployments | Chengyan Jiang, Jiamin Fan, Talal Halabi, Israat Haque | 2025-03-14 | The widespread adoption of smartphones and smart wearable devices has led to the widespread use of Centralized Federated Learning (CFL) for training powerful machine learning models while preserving d... |
| Supervised Distributed Computing | John Augustine, Christian Scheideler, Julian Werthmann | 2025-03-14 | We introduce a new framework for distributed computing that extends and refines the standard master-worker approach of scheduling multi-threaded computations. |
| Finding a Fair Scoring Function for Top-k Selection: From Hardness to Practice | Guangya Cai | 2025-03-14 | Selecting a subset of the k "best" items from a dataset of n items, based on a scoring function, is a key task in decision-making. Given the rise of automated decision-making software, it is impor... |
| ARCAS: Adaptive Runtime System for Chiplet-Aware Scheduling | Alessandro Fogli, Bo Zhao, Peter Pietzuch, Jana Giceva | 2025-03-14 | The growing disparity between CPU core counts and available memory bandwidth has intensified memory contention in servers. This particularly affects highly parallelizable applications, which must achi... |
| On the Limits of Distributed Quantum Computing | Francesco d'Amore | 2025-03-14 | Quantum advantage is well-established in centralized computing, where quantum algorithms can solve certain problems exponentially faster than classical ones. |
| Efficient Distributed MLLM Training with Cornstarch | Insu Jang, Runyu Lu, Nikhil Bansal, Ang Chen, Mosharaf Chowdhury | 2025-03-14 | Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities like images and audio. |
| Towards Fine-Grained Scalability for Stateful Stream Processing Systems | Yunfan Qing, Wenli Zheng | 2025-03-14 | Dynamic scaling is critical to stream processing engines, as their long-running nature demands adaptive resource management. Existing scaling approaches easily cause performance degradation due to coa... |
| Federated Koopman-Reservoir Learning for Large-Scale Multivariate Time-Series Anomaly Detection | Long Tan Le, Tung-Anh Nguyen, Han Shu, Suranga Seneviratne, Choong Seon Hong, Nguyen H. Tran | 2025-03-14 | The proliferation of edge devices has dramatically increased the generation of multivariate time-series (MVTS) data, essential for applications from healthcare to smart cities. |
| Cost-effective Deep Learning Infrastructure with NVIDIA GPU | Aatiz Ghimire, Shahnawaz Alam, Siman Giri, Madhav Prasad Ghimire | 2025-03-14 | The growing demand for computational power is driven by advancements in deep learning, the increasing need for big data processing, and the requirements of scientific simulations for academic and rese... |
| LLMPerf: GPU Performance Modeling meets Large Language Models | Khoi N. M. Nguyen, Hoang Duy Nguyen Do, Huyen Thao Le, Thanh Tuan Dao | 2025-03-14 | Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landsc... |
| The Case for ABI Interoperability in a Fault Tolerant MPI | Yao Xu, Grace Nansamba, Anthony Skjellum, Gene Cooperman | 2025-03-14 | There is new momentum behind an interoperable ABI for MPI, which will be a major component of MPI-5. This capability brings true separation of concerns to a running MPI computation. |
| Sustainable Grid through Distributed Data Centers: Spinning AI Demand for Grid Stabilization and Optimization | Scott C Evans, Nathan Dahlin, Ibrahima Ndiaye, Sachini Piyoni Ekanayake, Alexander Duncan, Blake Rose, Hao Huang | 2025-03-14 | We propose a disruptive paradigm to actively place and schedule TWhrs of parallel AI jobs strategically on the grid, at distributed, grid-aware high performance compute data centers (HPC) capable of u... |
| SmartShards: Churn-Tolerant Continuously Available Distributed Ledger | Joseph Oglio, Mikhail Nesterenko, Gokarna Sharma | 2025-03-14 | We present SmartShards: a new sharding algorithm for improving Byzantine tolerance and churn resistance in blockchains. Our algorithm places a peer in multiple shards to create an overlap. |
| Beyond A Single AI Cluster: A Survey of Decentralized LLM Training | Haotian Dong, Jingyan Jiang, Rongwei Lu, Jiajun Luo, Jiajun Song, Bowen Li, Ying Shen, Zhi Wang | 2025-03-14 | The emergence of large language models (LLMs) has revolutionized AI development, yet the resource demands beyond a single cluster or even datacenter, limiting accessibility to well-resourced organizat... |
| Power-Aware Scheduling for Multi-Center HPC Electricity Cost Optimization | Abrar Hossain, Abubeker Abdurahman, Mohammad A. Islam, Kishwar Ahmed | 2025-03-14 | This paper introduces TARDIS (Temporal Allocation for Resource Distribution using Intelligent Scheduling), a novel power-aware job scheduler for High-Performance Computing (HPC) systems that minimizes... |
| FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration | Xue Feng, M. Paul Laiu, Thomas Strohmer | 2025-03-14 | Federated learning (FL) is a distributed machine learning approach that enables multiple local clients and a central server to collaboratively train a model while keeping the data on their own devices... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Performance Analysis of Decentralized Federated Learning Deployments | Chengyan Jiang, Jiamin Fan, Talal Halabi, Israat Haque | 2025-03-14 | The widespread adoption of smartphones and smart wearable devices has led to the widespread use of Centralized Federated Learning (CFL) for training powerful machine learning models while preserving d... |
| Enhancing Resiliency of Sketch-based Security via LSB Sharing-based Dynamic Late Merging | Seungsam Yang, Seyed Mohammad Mehdi Mirnajafizadeh, Sian Kim, Rhongho Jang, DaeHun Nyang | 2025-03-14 | With the exponentially growing Internet traffic, sketch data structure with a probabilistic algorithm has been expected to be an alternative solution for non-compromised (non-selective) security monit... |
| Scalable Video Conferencing Using SDN Principles | Oliver Michel, Satadal Sengupta, Hyojoon Kim, Ravi Netravali, Jennifer Rexford | 2025-03-14 | Video-conferencing applications face an unwavering surge in traffic, stressing their underlying infrastructure in unprecedented ways. This paper rethinks the key building block for conferencing infras... |
| Experimental evaluation of xApp Conflict Mitigation Framework in O-RAN: Insights from Testbed deployment in OTIC | Abida Sultana, Cezary Adamczyk, Mayukh Roy Chowdhury, Adrian Kliks, Aloizio Da Silva | 2025-03-14 | Conflict Mitigation (CM) in Open Radio Access Network (O-RAN) is a topic that is gaining importance as commercial O-RAN deployments become more complex. |
| PassiveBLE: Towards Fully Commodity-Compatible BLE Backscatter | Huixin Dong, Yijie Wu, Feiyu Li, Wei Kuang, Yuan He, Qian Zhang, Wei Wang | 2025-03-14 | Bluetooth Low Energy (BLE) backscatter is a promising candidate for battery-free Internet of Things (IoT) applications. Unlike existing commodity-level BLE backscatter systems that only enable one-sho... |
| Optimizing 6G Dense Network Deployment for the Metaverse Using Deep Reinforcement Learning | Jie Zhang, Swarna Chetty, Qiao Wang, Chenrui Sun, Paul Daniel Mitchell, David Grace, Hamed Ahmadi | 2025-03-14 | As the Metaverse envisions deeply immersive and pervasive connectivity in 6G networks, Integrated Access and Backhaul (IAB) emerges as a critical enabler to meet the demanding requirements of massive ... |
| Sketch Disaggregation Across Time and Space | Jonatan Langlet, Peiqing Chen, Michael Mitzenmacher, Ran Ben Basat, Zaoxing Liu, Gianni Antichi | 2025-03-14 | Streaming analytics are essential in a large range of applications, including databases, networking, and machine learning. To optimize performance, practitioners are increasingly offloading such analy... |
| Non Line-of-Sight Optical Wireless Communication using Neuromorphic Cameras | Abbaas Alif Mohamed Nishar, Alireza Marefat, Ashwin Ashok | 2025-03-14 | Neuromorphic or event cameras, inspired by biological vision systems, capture changes in illumination with high temporal resolution and efficiency, producing streams of events rather than traditional ... |
| Reliable and Cost-Efficient IoT Connectivity for Smart Agriculture: A Comparative Study of LPWAN, 5G, and Hybrid Connectivity Models | Mohamed Shabeer Mohamed Rafi, Mehran Behjati, Ahmad Sahban Rafsanjani | 2025-03-14 | The integration of the Internet of Things (IoT) in smart agriculture has transformed farming practices by enabling real time monitoring, data-driven decision making, and automation. |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery | Balaji Rama, Kai Mei, Yongfeng Zhang | 2025-03-14 | Autonomous LLM-based agents have emerged as a powerful paradigm for complex task execution, yet the field lacks standardized tools for development, deployment, distribution and discovery of agents. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| ARCAS: Adaptive Runtime System for Chiplet-Aware Scheduling | Alessandro Fogli, Bo Zhao, Peter Pietzuch, Jana Giceva | 2025-03-14 | The growing disparity between CPU core counts and available memory bandwidth has intensified memory contention in servers. This particularly affects highly parallelizable applications, which must achi... |
| LLMPerf: GPU Performance Modeling meets Large Language Models | Khoi N. M. Nguyen, Hoang Duy Nguyen Do, Huyen Thao Le, Thanh Tuan Dao | 2025-03-14 | Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landsc... |