2025-03-14

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
ARCAS: Adaptive Runtime System for Chiplet-Aware Scheduling	Alessandro Fogli, Bo Zhao, Peter Pietzuch, Jana Giceva	2025-03-14	Download	The growing disparity between CPU core counts and available memory bandwidth has intensified memory contention in servers. This particularly affects highly parallelizable applications, which must achi...
Cost-effective Deep Learning Infrastructure with NVIDIA GPU	Aatiz Ghimire, Shahnawaz Alam, Siman Giri, Madhav Prasad Ghimire	2025-03-14	Download	The growing demand for computational power is driven by advancements in deep learning, the increasing need for big data processing, and the requirements of scientific simulations for academic and rese...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Story of Two GPUs: Characterizing the Resilience of Hopper H100 and Ampere A100 GPUs	Shengkun Cui, Archit Patke, Hung Nguyen, Aditya Ranjan, Ziheng Chen, Phuong Cao, Gregory Bauer, Brett Bode, Catello Di Martino, Saurabh Jha, Chandra Narayanaswami, Daby Sow, Zbigniew T. Kalbarczyk, Ravishankar K. Iyer	2025-03-14	Download	This study characterizes GPU resilience in Delta, a large-scale AI system that consists of 1,056 A100 and H100 GPUs, with over 1,300 petaflops of peak throughput. We used 2.
Performance Analysis of Decentralized Federated Learning Deployments	Chengyan Jiang, Jiamin Fan, Talal Halabi, Israat Haque	2025-03-14	Download	The widespread adoption of smartphones and smart wearable devices has led to the widespread use of Centralized Federated Learning (CFL) for training powerful machine learning models while preserving d...
Supervised Distributed Computing	John Augustine, Christian Scheideler, Julian Werthmann	2025-03-14	Download	We introduce a new framework for distributed computing that extends and refines the standard master-worker approach of scheduling multi-threaded computations.
Finding a Fair Scoring Function for Top-$k$ Selection: From Hardness to Practice	Guangya Cai	2025-03-14	Download	Selecting a subset of the $k$ "best" items from a dataset of $n$ items, based on a scoring function, is a key task in decision-making. Given the rise of automated decision-making software, it is impor...
ARCAS: Adaptive Runtime System for Chiplet-Aware Scheduling	Alessandro Fogli, Bo Zhao, Peter Pietzuch, Jana Giceva	2025-03-14	Download	The growing disparity between CPU core counts and available memory bandwidth has intensified memory contention in servers. This particularly affects highly parallelizable applications, which must achi...
On the Limits of Distributed Quantum Computing	Francesco d'Amore	2025-03-14	Download	Quantum advantage is well-established in centralized computing, where quantum algorithms can solve certain problems exponentially faster than classical ones.
Efficient Distributed MLLM Training with Cornstarch	Insu Jang, Runyu Lu, Nikhil Bansal, Ang Chen, Mosharaf Chowdhury	2025-03-14	Download	Multimodal large language models (MLLMs) extend the capabilities of large language models (LLMs) by combining heterogeneous model architectures to handle diverse modalities like images and audio.
Towards Fine-Grained Scalability for Stateful Stream Processing Systems	Yunfan Qing, Wenli Zheng	2025-03-14	Download	Dynamic scaling is critical to stream processing engines, as their long-running nature demands adaptive resource management. Existing scaling approaches easily cause performance degradation due to coa...
Federated Koopman-Reservoir Learning for Large-Scale Multivariate Time-Series Anomaly Detection	Long Tan Le, Tung-Anh Nguyen, Han Shu, Suranga Seneviratne, Choong Seon Hong, Nguyen H. Tran	2025-03-14	Download	The proliferation of edge devices has dramatically increased the generation of multivariate time-series (MVTS) data, essential for applications from healthcare to smart cities.
Cost-effective Deep Learning Infrastructure with NVIDIA GPU	Aatiz Ghimire, Shahnawaz Alam, Siman Giri, Madhav Prasad Ghimire	2025-03-14	Download	The growing demand for computational power is driven by advancements in deep learning, the increasing need for big data processing, and the requirements of scientific simulations for academic and rese...
LLMPerf: GPU Performance Modeling meets Large Language Models	Khoi N. M. Nguyen, Hoang Duy Nguyen Do, Huyen Thao Le, Thanh Tuan Dao	2025-03-14	Download	Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landsc...
The Case for ABI Interoperability in a Fault Tolerant MPI	Yao Xu, Grace Nansamba, Anthony Skjellum, Gene Cooperman	2025-03-14	Download	There is new momentum behind an interoperable ABI for MPI, which will be a major component of MPI-5. This capability brings true separation of concerns to a running MPI computation.
Sustainable Grid through Distributed Data Centers: Spinning AI Demand for Grid Stabilization and Optimization	Scott C Evans, Nathan Dahlin, Ibrahima Ndiaye, Sachini Piyoni Ekanayake, Alexander Duncan, Blake Rose, Hao Huang	2025-03-14	Download	We propose a disruptive paradigm to actively place and schedule TWhrs of parallel AI jobs strategically on the grid, at distributed, grid-aware high performance compute data centers (HPC) capable of u...
SmartShards: Churn-Tolerant Continuously Available Distributed Ledger	Joseph Oglio, Mikhail Nesterenko, Gokarna Sharma	2025-03-14	Download	We present SmartShards: a new sharding algorithm for improving Byzantine tolerance and churn resistance in blockchains. Our algorithm places a peer in multiple shards to create an overlap.
Beyond A Single AI Cluster: A Survey of Decentralized LLM Training	Haotian Dong, Jingyan Jiang, Rongwei Lu, Jiajun Luo, Jiajun Song, Bowen Li, Ying Shen, Zhi Wang	2025-03-14	Download	The emergence of large language models (LLMs) has revolutionized AI development, yet the resource demands beyond a single cluster or even datacenter, limiting accessibility to well-resourced organizat...
Power-Aware Scheduling for Multi-Center HPC Electricity Cost Optimization	Abrar Hossain, Abubeker Abdurahman, Mohammad A. Islam, Kishwar Ahmed	2025-03-14	Download	This paper introduces TARDIS (Temporal Allocation for Resource Distribution using Intelligent Scheduling), a novel power-aware job scheduler for High-Performance Computing (HPC) systems that minimizes...
FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration	Xue Feng, M. Paul Laiu, Thomas Strohmer	2025-03-14	Download	Federated learning (FL) is a distributed machine learning approach that enables multiple local clients and a central server to collaboratively train a model while keeping the data on their own devices...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Performance Analysis of Decentralized Federated Learning Deployments	Chengyan Jiang, Jiamin Fan, Talal Halabi, Israat Haque	2025-03-14	Download	The widespread adoption of smartphones and smart wearable devices has led to the widespread use of Centralized Federated Learning (CFL) for training powerful machine learning models while preserving d...
Enhancing Resiliency of Sketch-based Security via LSB Sharing-based Dynamic Late Merging	Seungsam Yang, Seyed Mohammad Mehdi Mirnajafizadeh, Sian Kim, Rhongho Jang, DaeHun Nyang	2025-03-14	Download	With the exponentially growing Internet traffic, sketch data structure with a probabilistic algorithm has been expected to be an alternative solution for non-compromised (non-selective) security monit...
Scalable Video Conferencing Using SDN Principles	Oliver Michel, Satadal Sengupta, Hyojoon Kim, Ravi Netravali, Jennifer Rexford	2025-03-14	Download	Video-conferencing applications face an unwavering surge in traffic, stressing their underlying infrastructure in unprecedented ways. This paper rethinks the key building block for conferencing infras...
Experimental evaluation of xApp Conflict Mitigation Framework in O-RAN: Insights from Testbed deployment in OTIC	Abida Sultana, Cezary Adamczyk, Mayukh Roy Chowdhury, Adrian Kliks, Aloizio Da Silva	2025-03-14	Download	Conflict Mitigation (CM) in Open Radio Access Network (O-RAN) is a topic that is gaining importance as commercial O-RAN deployments become more complex.
PassiveBLE: Towards Fully Commodity-Compatible BLE Backscatter	Huixin Dong, Yijie Wu, Feiyu Li, Wei Kuang, Yuan He, Qian Zhang, Wei Wang	2025-03-14	Download	Bluetooth Low Energy (BLE) backscatter is a promising candidate for battery-free Internet of Things (IoT) applications. Unlike existing commodity-level BLE backscatter systems that only enable one-sho...
Optimizing 6G Dense Network Deployment for the Metaverse Using Deep Reinforcement Learning	Jie Zhang, Swarna Chetty, Qiao Wang, Chenrui Sun, Paul Daniel Mitchell, David Grace, Hamed Ahmadi	2025-03-14	Download	As the Metaverse envisions deeply immersive and pervasive connectivity in 6G networks, Integrated Access and Backhaul (IAB) emerges as a critical enabler to meet the demanding requirements of massive ...
Sketch Disaggregation Across Time and Space	Jonatan Langlet, Peiqing Chen, Michael Mitzenmacher, Ran Ben Basat, Zaoxing Liu, Gianni Antichi	2025-03-14	Download	Streaming analytics are essential in a large range of applications, including databases, networking, and machine learning. To optimize performance, practitioners are increasingly offloading such analy...
Non Line-of-Sight Optical Wireless Communication using Neuromorphic Cameras	Abbaas Alif Mohamed Nishar, Alireza Marefat, Ashwin Ashok	2025-03-14	Download	Neuromorphic or event cameras, inspired by biological vision systems, capture changes in illumination with high temporal resolution and efficiency, producing streams of events rather than traditional ...
Reliable and Cost-Efficient IoT Connectivity for Smart Agriculture: A Comparative Study of LPWAN, 5G, and Hybrid Connectivity Models	Mohamed Shabeer Mohamed Rafi, Mehran Behjati, Ahmad Sahban Rafsanjani	2025-03-14	Download	The integration of the Internet of Things (IoT) in smart agriculture has transformed farming practices by enabling real time monitoring, data-driven decision making, and automation.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Cerebrum (AIOS SDK): A Platform for Agent Development, Deployment, Distribution, and Discovery	Balaji Rama, Kai Mei, Yongfeng Zhang	2025-03-14	Download	Autonomous LLM-based agents have emerged as a powerful paradigm for complex task execution, yet the field lacks standardized tools for development, deployment, distribution and discovery of agents.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
ARCAS: Adaptive Runtime System for Chiplet-Aware Scheduling	Alessandro Fogli, Bo Zhao, Peter Pietzuch, Jana Giceva	2025-03-14	Download	The growing disparity between CPU core counts and available memory bandwidth has intensified memory contention in servers. This particularly affects highly parallelizable applications, which must achi...
LLMPerf: GPU Performance Modeling meets Large Language Models	Khoi N. M. Nguyen, Hoang Duy Nguyen Do, Huyen Thao Le, Thanh Tuan Dao	2025-03-14	Download	Performance modeling, a pivotal domain in program cost analysis, currently relies on manually crafted models constrained by various program and hardware limitations, especially in the intricate landsc...