2025-11-13

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Inside VOLT: Designing an Open-Source GPU Compiler	Shinnung Jeong, Chihyo Ahn, Huanzhi Pu, Jisheng Zhao, Hyesoon Kim, Blaise Tine	2025-11-13	Download	Recent efforts in open-source GPU research are opening new avenues in a domain that has long been tightly coupled with a few commercial vendors.
Tiny Chiplets Enabled by Packaging Scaling: Opportunities in ESD Protection and Signal Integrity	Emad Haque, Pragnya Sudershan Nalla, Jeff Zhang, Sachin S. Sapatnekar, Chaitali Chakrabarti, Yu Cao	2025-11-13	Download	The scaling of advanced packaging technologies provides abundant interconnection resources for 2.5D/3D heterogeneous integration (HI), thereby enabling the construction of larger-scale VLSI systems wi...
FengHuang: Next-Generation Memory Orchestration for AI Inferencing	Jiamin Li, Lei Qu, Tao Zhang, Grigory Chirkov, Shuotao Xu, Peng Cheng, Lidong Zhou	2025-11-13	Download	This document presents a vision for a novel AI infrastructure design that has been initially validated through inference simulations on state-of-the-art large language models.
Beamspace Equalization for mmWave Massive MIMO: Algorithms and VLSI Implementations	Seyed Hadi Mirfarshbafan, Christoph Studer	2025-11-13	Download	Massive multiuser multiple-input multiple-output (MIMO) and millimeter-wave (mmWave) communication are key physical layer technologies in future wireless systems.
Critical Path Aware Timing-Driven Global Placement for Large-Scale Heterogeneous FPGAs	He Jiang, Yi Guo, Shikai Guo, Huijiang Liu, Xiaochen Li, Ning Wang, Zhixiong Di	2025-11-13	Download	Timing optimization during global placement is critical for achieving optimal circuit performance and remains a key challenge in modern Field Programmable Gate Array (FPGA) design.
Combined power management and congestion control in High-Speed Ethernet-based Networks for Supercomputers and Data Centers	Miguel Sánchez de la Rosa, Francisco J. Andújar, Jesus Escudero-Sahuquillo, José L. Sánchez, Francisco J. Alfaro-Cortés	2025-11-13	Download	The demand for computing in our daily lives has led to the proliferation of datacenters that power many indispensable services. On the other hand, computing has become essential for some research for v...
The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads	Shahid Amin, Syed Pervez Hussnain Shah	2025-11-13	Download	The remarkable progress in Artificial Intelligence (AI) is foundationally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs), have grow...
AssertMiner: Module-Level Spec Generation and Assertion Mining using Static Analysis Guided LLMs	Hongqin Lyu, Yonghao Wang, Jiaxin Zhou, Zhiteng Chao, Tiancheng Wang, Huawei Li	2025-11-13	Download	Assertion-based verification (ABV) is a key approach to checking whether a logic design complies with its architectural specifications. Existing assertion generation methods based on design specificat...
Lit Silicon: A Case Where Thermal Imbalance Couples Concurrent Execution in Multiple GPUs	Marco Kurzynski, Shaizeen Aga, Di Wu	2025-11-13	Download	GPU systems are increasingly powering modern datacenters at scale. Despite being highly performant, GPU systems suffer from performance variation at the node and cluster levels.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation	Rabimba Karanjai, Lei Xu, Weidong Shi	2025-11-13	Download	Unit testing in High-Performance Computing (HPC) is critical but challenged by parallelism, complex algorithms, and diverse hardware. Traditional methods often fail to address non-deterministic behavi...
Inside VOLT: Designing an Open-Source GPU Compiler	Shinnung Jeong, Chihyo Ahn, Huanzhi Pu, Jisheng Zhao, Hyesoon Kim, Blaise Tine	2025-11-13	Download	Recent efforts in open-source GPU research are opening new avenues in a domain that has long been tightly coupled with a few commercial vendors.
EarthSight: A Distributed Framework for Low-Latency Satellite Intelligence	Ansel Kaplan Erol, Seungjun Lee, Divya Mahajan	2025-11-13	Download	Low-latency delivery of satellite imagery is essential for time-critical applications such as disaster response, intelligence, and infrastructure monitoring.
FengHuang: Next-Generation Memory Orchestration for AI Inferencing	Jiamin Li, Lei Qu, Tao Zhang, Grigory Chirkov, Shuotao Xu, Peng Cheng, Lidong Zhou	2025-11-13	Download	This document presents a vision for a novel AI infrastructure design that has been initially validated through inference simulations on state-of-the-art large language models.
STAGE: A Symbolic Tensor grAph GEnerator for distributed AI system co-design	Changhai Man, Joongun Park, Hanjiang Wu, Huan Xu, Srinivas Sridharan, Tushar Krishna	2025-11-13	Download	Optimizing the performance of large language models (LLMs) on large-scale AI training and inference systems requires a scalable and expressive mechanism to model distributed workload execution.
How Machine Learning-Data Driven Replication Strategies Enhance Fault Tolerance in Large-Scale Distributed Systems	Almond Kiruthu Murimi	2025-11-13	Download	This research paper investigates how machine learning-driven data replication strategies can enhance fault tolerance in large-scale distributed systems.
FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing	Aarush Agarwal, Raymond He, Jan Kieseler, Matteo Cremonesi, Shah Rukh Qasim	2025-11-13	Download	We introduce FastGraph, a novel GPU-optimized k-nearest neighbor algorithm specifically designed to accelerate graph construction in low-dimensional spaces (2-10 dimensions), critical for high-perform...
Unlocking Dynamic Inter-Client Spatial Dependencies: A Federated Spatio-Temporal Graph Learning Method for Traffic Flow Forecasting	Feng Wang, Tianxiang Chen, Shuyue Wei, Qian Chu, Yi Zhang, Yifan Sun, Zhiming Zheng	2025-11-13	Download	Spatio-temporal graphs are powerful tools for modeling complex dependencies in traffic time series. However, the distributed nature of real-world traffic data across multiple stakeholders poses signif...
On The Performance of Prefix-Sum Parallel Kalman Filters and Smoothers on GPUs	Simo Särkkä, Ángel F. García-Fernández	2025-11-13	Download	This paper presents an experimental evaluation of parallel-in-time Kalman filters and smoothers using graphics processing units (GPUs). In particular, the paper evaluates different all-prefix-sum algo...
Massively Parallel Proof-Number Search for Impartial Games and Beyond	Tomáš Čížek, Martin Balko, Martin Schmid	2025-11-13	Download	Proof-Number Search is a best-first search algorithm with many successful applications, especially in game solving. As large-scale computing clusters become increasingly accessible, parallelization is...
Workload Schedulers -- Genesis, Algorithms and Differences	Leszek Sliwko, Vladimir Getov	2025-11-13	Download	This paper presents a novel approach to categorization of modern workload schedulers. We provide descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Systems Jobs...
Pk-IOTA: Blockchain empowered Programmable Data Plane to secure OPC UA communications in Industry 4.0	Rinieri Lorenzo, Gori Giacomo, Melis Andrea, Girau Roberto, Prandini Marco, Callegati Franco	2025-11-13	Download	The OPC UA protocol is becoming the de facto standard for Industry 4.0 machine-to-machine communication. It stands out as one of the few industrial protocols that provide robust security features desi...
Selection of Supervised Learning-based Sparse Matrix Reordering Algorithms	Tao Tang, Youfu Jiang, Yingbo Cui, Jianbin Fang, Peng Zhang, Lin Peng, Chun Huang	2025-11-13	Download	Sparse matrix ordering is a vital optimization technique often employed for solving large-scale sparse matrices. Its goal is to minimize the matrix bandwidth by reorganizing its rows and columns, thus...
Noise-Aware Optimization in Nominally Identical Manufacturing and Measuring Systems for High-Throughput Parallel Workflows	Christina Schenk, Miguel Hernández-del-Valle, Luis Calero-Lumbreras, Marcus Noack, Maciej Haranczyk	2025-11-13	Download	Device-to-device variability in experimental noise critically impacts reproducibility, especially in automated, high-throughput systems like additive manufacturing farms.
Dynamic Edge Server Selection in Time-Varying Environments: A Reliability-Aware Predictive Approach	Jaime Sebastian Burbano, Arnova Abdullah, Eldiyar Zhantileuov, Mohan Liyanage, Rolf Schuster	2025-11-13	Download	Latency-sensitive embedded applications increasingly rely on edge computing, yet dynamic network congestion in multi-server architectures challenges proper edge server selection.
dHPR: A Distributed Halpern Peaceman--Rachford Method for Non-smooth Distributed Optimization Problems	Zhangcheng Feng, Defeng Sun, Yancheng Yuan, Guojun Zhang	2025-11-13	Download	This paper introduces the distributed Halpern Peaceman--Rachford (dHPR) method, an efficient algorithm for solving distributed convex composite optimization problems with non-smooth objectives, which ...
Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput	Jingwei Song, Wanyi Chen, Xinyuan Song, Max, Chris Tong, Gufeng Chen, Tianyi Zhao, Eric Yang, Bill Shi, Lynn Ai	2025-11-13	Download	Speculative decoding accelerates large language model (LLM) inference by using a lightweight draft model to propose tokens that are later verified by a stronger target model.
Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms	Ao Xu, Han Zhao, Weihao Cui, Quan Chen, Yukang Chen, Shulai Zhang, Shuang Chen, Jiemin Jiang, Zhibin Yu, Minyi Guo	2025-11-13	Download	Large language models (LLMs) are increasingly deployed under the Model-as-a-Service (MaaS) paradigm. To meet stringent quality-of-service (QoS) requirements, existing LLM serving systems disaggregate ...
Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction	Mani Tofigh, Edward Guo, Weiwei Jia, Xiaoning Ding, Zirui Neil Zhao, Jianchen Shan	2025-11-13	Download	This paper shows that cache-based optimizations are often ineffective in cloud virtual machines (VMs) due to limited visibility into and control over provisioned caches.
Lit Silicon: A Case Where Thermal Imbalance Couples Concurrent Execution in Multiple GPUs	Marco Kurzynski, Shaizeen Aga, Di Wu	2025-11-13	Download	GPU systems are increasingly powering modern datacenters at scale. Despite being highly performant, GPU systems suffer from performance variation at the node and cluster levels.
MoFa: A Unified Performance Modeling Framework for LLM Pretraining	Lu Zhao, Rong Shi, Shaoqing Zhang, Shangchao Su, Ziqing Yin, Zhiyan Cui, Hongfeng Sun, Baoguo He, Yueqiang Chen, Liang Dong, Xiyuan Li, Lingbin Wang, Lijun Ma, Qiang Huang, Ting Liu, Chong Wang, Can Wei	2025-11-13	Download	The exponential growth in LLM scales, with parameters soaring from billions to trillions, has necessitated distributed pretraining across large clusters comprising thousands to tens of thousands of de...
A Meta-Heuristic Load Balancer for Cloud Computing Systems	Leszek Sliwko, Vladimir Getov	2025-11-13	Download	This paper presents a strategy to allocate services on a Cloud system without overloading nodes and maintaining the system stability with minimum cost.
SMoFi: Step-wise Momentum Fusion for Split Federated Learning on Heterogeneous Data	Mingkun Yang, Ran Zhu, Qing Wang, Jie Yang	2025-11-13	Download	Split Federated Learning is a system-efficient federated learning paradigm that leverages the rich computing resources at a central server to train model partitions.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
LM4Opt-RA: A Multi-Candidate LLM Framework with Structured Ranking for Automating Network Resource Allocation	Tasnim Ahmed, Siana Rizwan, Naveed Ejaz, Salimur Choudhury	2025-11-13	Download	Building on advancements in Large Language Models (LLMs), we can tackle complex analytical and mathematical reasoning tasks requiring nuanced contextual understanding.
Millimeter-Wave UAV Channel Model with Height-Dependent Path Loss and Shadowing in Urban Scenarios	Abdul Saboor, Evgenii Vinogradov	2025-11-13	Download	Uncrewed Aerial Vehicles (UAVs) serving as Aerial Base Stations (ABSs) are expected to extend 6G millimeter-Wave (mmWave) coverage and improve link reliability in urban areas.
Towards an Agentic Workflow for Internet Measurement Research	Alagappan Ramanathan, Eunju Kang, Dongsu Han, Sangeetha Abdu Jyothi	2025-11-13	Download	Internet measurement research faces an accessibility crisis: complex analyses require custom integration of multiple specialized tools that demands specialized domain expertise.
Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access	Aswin Arun, Christo Kurisummoottil Thomas, Rimalpudi Sarvendranath, Walid Saad	2025-11-13	Download	Despite the advantages of multi-agent reinforcement learning (MARL) for wireless use cases such as medium access control (MAC), their real-world deployment in Internet of Things (IoT) is hindered by th...
P4-TAS: P4-Based Time-Aware Shaper for Time-Sensitive Networking	Fabian Ihle, Moritz Flüchter, Michael Menth	2025-11-13	Download	Time-Sensitive Networking (TSN) is a set of IEEE standards that extends Ethernet with real-time capabilities. Among its mechanisms, TSN can coordinate transmission times network-wide to minimize queue...
Pk-IOTA: Blockchain empowered Programmable Data Plane to secure OPC UA communications in Industry 4.0	Rinieri Lorenzo, Gori Giacomo, Melis Andrea, Girau Roberto, Prandini Marco, Callegati Franco	2025-11-13	Download	The OPC UA protocol is becoming the de facto standard for Industry 4.0 machine-to-machine communication. It stands out as one of the few industrial protocols that provide robust security features desi...
Dynamic Edge Server Selection in Time-Varying Environments: A Reliability-Aware Predictive Approach	Jaime Sebastian Burbano, Arnova Abdullah, Eldiyar Zhantileuov, Mohan Liyanage, Rolf Schuster	2025-11-13	Download	Latency-sensitive embedded applications increasingly rely on edge computing, yet dynamic network congestion in multi-server architectures challenges proper edge server selection.
Learning-Based Channel Access in Wi-Fi: A Multi-Armed Bandit Approach	Miguel Casasnovas, Francesc Wilhelmi, Richard Combes, Maksymilian Wojnar, Katarzyna Kosek-Szott, Szymon Szott, Anders Jonsson, Luis Esteve, Boris Bellalta	2025-11-13	Download	Due to its static protocol design, IEEE 802.11 (aka Wi-Fi) channel access lacks adaptability to address dynamic network conditions, resulting in inefficient spectrum utilization, unnecessary contentio...
See and Beam: Leveraging LiDAR Sensing and Specular Surfaces for Indoor mmWave Connectivity	Raj Sai Sohel Bandari, Amod Ashtekar, Omar Ibrahim, Mohammed E. Eltayeb	2025-11-13	Download	Millimeter-wave (mmWave) communication enables multi-gigabit-per-second data rates but is highly susceptible to path loss and blockage, especially indoors.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Vmem: A Lightweight Hot-Upgradable Memory Management for In-production Cloud Environment	Hao Zheng, Qiang Wang, Longxiang Wang, Xishi Qiu, Yibin Shen, Xiaoshe Dong, Naixuan Guan, Jia Wei, Fudong Qiu, Xingjun Zhang, Yun Xu, Mao Zhao, Yisheng Xie, Shenglong Zhao, Min He, Yu Li, Xiao Zheng, Ben Luo, Jiesheng Wu	2025-11-13	Download	Traditional memory management suffers from metadata overhead, architectural complexity, and stability degradation, problems intensified in cloud environments.
Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction	Mani Tofigh, Edward Guo, Weiwei Jia, Xiaoning Ding, Zirui Neil Zhao, Jianchen Shan	2025-11-13	Download	This paper shows that cache-based optimizations are often ineffective in cloud virtual machines (VMs) due to limited visibility into and control over provisioned caches.
Taiji: A DPU Memory Elasticity Solution for In-production Cloud Environments	Hao Zheng, Longxiang Wang, Yun Xu, Qiang Wang, Yibin Shen, Xiaoshe Dong, Bang Di, Jia Wei, Shenyu Dong, Xingjun Zhang, Weichen Chen, Zhao Han, Sanqian Zhao, Dongdong Huang, Jie Qi, Yifan Yang, Zhao Gao, Yi Wang, Jinhu Li, Xudong Ren, Min He, Hang Yang, Xiao Zheng, Haijiao Hao, Jiesheng Wu	2025-11-13	Download	The growth of cloud computing drives data centers toward higher density and efficiency. Data processing units (DPUs) enhance server network and storage performance but face challenges such as long har...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
The Configuration Wall: Characterization and Elimination of Accelerator Configuration Overhead	Josse Van Delm, Anton Lydike, Joren Dumoulin, Jonas Crols, Xiaoling Yi, Ryan Antonio, Jackson Woodruff, Tobias Grosser, Marian Verhelst	2025-11-13	Download	Contemporary compute platforms increasingly offload compute kernels from CPU to integrated hardware accelerators to reach maximum performance per Watt.
EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training	Qingao Yi, Jiaang Duan, Hanwen Hu, Qin Hua, Haiyan Zhao, Shiyou Qian, Dingyu Yang, Jian Cao, Jinghua Tang, Yinghao Yu, Chenzhi Liao, Kangjin Wang, Liping Zhang	2025-11-13	Download	Training large language models (LLMs) poses significant challenges regarding computational resources and memory capacity. Although distributed training techniques help mitigate these issues, they stil...
Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction	Mani Tofigh, Edward Guo, Weiwei Jia, Xiaoning Ding, Zirui Neil Zhao, Jianchen Shan	2025-11-13	Download	This paper shows that cache-based optimizations are often ineffective in cloud virtual machines (VMs) due to limited visibility into and control over provisioned caches.
Steering Pretrained Drafters during Speculative Decoding	Frédéric Berdoz, Peer Rheinboldt, Roger Wattenhofer	2025-11-13	Download	Speculative decoding accelerates language model inference by separating generation into fast drafting and parallel verification. Its main limitation is drafter-verifier misalignment, which limits toke...
