2025-11-13
cs.AR - Hardware Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Inside VOLT: Designing an Open-Source GPU Compiler | Shinnung Jeong, Chihyo Ahn, Huanzhi Pu, Jisheng Zhao, Hyesoon Kim, Blaise Tine | 2025-11-13 | Download | Recent efforts in open-source GPU research are opening new avenues in a domain that has long been tightly coupled with a few commercial vendors. |
| Tiny Chiplets Enabled by Packaging Scaling: Opportunities in ESD Protection and Signal Integrity | Emad Haque, Pragnya Sudershan Nalla, Jeff Zhang, Sachin S. Sapatnekar, Chaitali Chakrabarti, Yu Cao | 2025-11-13 | Download | The scaling of advanced packaging technologies provides abundant interconnection resources for 2.5D/3D heterogeneous integration (HI), thereby enabling the construction of larger-scale VLSI systems wi... |
| FengHuang: Next-Generation Memory Orchestration for AI Inferencing | Jiamin Li, Lei Qu, Tao Zhang, Grigory Chirkov, Shuotao Xu, Peng Cheng, Lidong Zhou | 2025-11-13 | Download | This document presents a vision for a novel AI infrastructure design that has been initially validated through inference simulations on state-of-the-art large language models. |
| Beamspace Equalization for mmWave Massive MIMO: Algorithms and VLSI Implementations | Seyed Hadi Mirfarshbafan, Christoph Studer | 2025-11-13 | Download | Massive multiuser multiple-input multiple-output (MIMO) and millimeter-wave (mmWave) communication are key physical layer technologies in future wireless systems. |
| Critical Path Aware Timing-Driven Global Placement for Large-Scale Heterogeneous FPGAs | He Jiang, Yi Guo, Shikai Guo, Huijiang Liu, Xiaochen Li, Ning Wang, Zhixiong Di | 2025-11-13 | Download | Timing optimization during global placement is critical for achieving optimal circuit performance and remains a key challenge in modern Field Programmable Gate Array (FPGA) design. |
| Combined power management and congestion control in High-Speed Ethernet-based Networks for Supercomputers and Data Centers | Miguel Sánchez de la Rosa, Francisco J. Andújar, Jesus Escudero-Sahuquillo, José L. Sánchez, Francisco J. Alfaro-Cortés | 2025-11-13 | Download | The demand for computing in our daily lives has led to the proliferation of datacenters that power many indispensable services. On the other hand, computing has become essential for some research for v... |
| The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads | Shahid Amin, Syed Pervez Hussnain Shah | 2025-11-13 | Download | The remarkable progress in Artificial Intelligence (AI) is foundationally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs), have grow... |
| AssertMiner: Module-Level Spec Generation and Assertion Mining using Static Analysis Guided LLMs | Hongqin Lyu, Yonghao Wang, Jiaxin Zhou, Zhiteng Chao, Tiancheng Wang, Huawei Li | 2025-11-13 | Download | Assertion-based verification (ABV) is a key approach to checking whether a logic design complies with its architectural specifications. Existing assertion generation methods based on design specificat... |
| Lit Silicon: A Case Where Thermal Imbalance Couples Concurrent Execution in Multiple GPUs | Marco Kurzynski, Shaizeen Aga, Di Wu | 2025-11-13 | Download | GPU systems are increasingly powering modern datacenters at scale. Despite being highly performant, GPU systems suffer from performance variation at the node and cluster levels. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| HPCAgentTester: A Multi-Agent LLM Approach for Enhanced HPC Unit Test Generation | Rabimba Karanjai, Lei Xu, Weidong Shi | 2025-11-13 | Download | Unit testing in High-Performance Computing (HPC) is critical but challenged by parallelism, complex algorithms, and diverse hardware. Traditional methods often fail to address non-deterministic behavi... |
| Inside VOLT: Designing an Open-Source GPU Compiler | Shinnung Jeong, Chihyo Ahn, Huanzhi Pu, Jisheng Zhao, Hyesoon Kim, Blaise Tine | 2025-11-13 | Download | Recent efforts in open-source GPU research are opening new avenues in a domain that has long been tightly coupled with a few commercial vendors. |
| EarthSight: A Distributed Framework for Low-Latency Satellite Intelligence | Ansel Kaplan Erol, Seungjun Lee, Divya Mahajan | 2025-11-13 | Download | Low-latency delivery of satellite imagery is essential for time-critical applications such as disaster response, intelligence, and infrastructure monitoring. |
| FengHuang: Next-Generation Memory Orchestration for AI Inferencing | Jiamin Li, Lei Qu, Tao Zhang, Grigory Chirkov, Shuotao Xu, Peng Cheng, Lidong Zhou | 2025-11-13 | Download | This document presents a vision for a novel AI infrastructure design that has been initially validated through inference simulations on state-of-the-art large language models. |
| STAGE: A Symbolic Tensor grAph GEnerator for distributed AI system co-design | Changhai Man, Joongun Park, Hanjiang Wu, Huan Xu, Srinivas Sridharan, Tushar Krishna | 2025-11-13 | Download | Optimizing the performance of large language models (LLMs) on large-scale AI training and inference systems requires a scalable and expressive mechanism to model distributed workload execution. |
| How Machine Learning-Data Driven Replication Strategies Enhance Fault Tolerance in Large-Scale Distributed Systems | Almond Kiruthu Murimi | 2025-11-13 | Download | This research paper investigates how machine learning-driven data replication strategies can enhance fault tolerance in large-scale distributed systems. |
| FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing | Aarush Agarwal, Raymond He, Jan Kieseler, Matteo Cremonesi, Shah Rukh Qasim | 2025-11-13 | Download | We introduce FastGraph, a novel GPU-optimized k-nearest neighbor algorithm specifically designed to accelerate graph construction in low-dimensional spaces (2-10 dimensions), critical for high-perform... |
| Unlocking Dynamic Inter-Client Spatial Dependencies: A Federated Spatio-Temporal Graph Learning Method for Traffic Flow Forecasting | Feng Wang, Tianxiang Chen, Shuyue Wei, Qian Chu, Yi Zhang, Yifan Sun, Zhiming Zheng | 2025-11-13 | Download | Spatio-temporal graphs are powerful tools for modeling complex dependencies in traffic time series. However, the distributed nature of real-world traffic data across multiple stakeholders poses signif... |
| On The Performance of Prefix-Sum Parallel Kalman Filters and Smoothers on GPUs | Simo Särkkä, Ángel F. García-Fernández | 2025-11-13 | Download | This paper presents an experimental evaluation of parallel-in-time Kalman filters and smoothers using graphics processing units (GPUs). In particular, the paper evaluates different all-prefix-sum algo... |
| Massively Parallel Proof-Number Search for Impartial Games and Beyond | Tomáš Čížek, Martin Balko, Martin Schmid | 2025-11-13 | Download | Proof-Number Search is a best-first search algorithm with many successful applications, especially in game solving. As large-scale computing clusters become increasingly accessible, parallelization is... |
| Workload Schedulers -- Genesis, Algorithms and Differences | Leszek Sliwko, Vladimir Getov | 2025-11-13 | Download | This paper presents a novel approach to categorization of modern workload schedulers. We provide descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Systems Jobs... |
| Pk-IOTA: Blockchain empowered Programmable Data Plane to secure OPC UA communications in Industry 4.0 | Rinieri Lorenzo, Gori Giacomo, Melis Andrea, Girau Roberto, Prandini Marco, Callegati Franco | 2025-11-13 | Download | The OPC UA protocol is becoming the de facto standard for Industry 4.0 machine-to-machine communication. It stands out as one of the few industrial protocols that provide robust security features desi... |
| Selection of Supervised Learning-based Sparse Matrix Reordering Algorithms | Tao Tang, Youfu Jiang, Yingbo Cui, Jianbin Fang, Peng Zhang, Lin Peng, Chun Huang | 2025-11-13 | Download | Sparse matrix ordering is a vital optimization technique often employed for solving large-scale sparse matrices. Its goal is to minimize the matrix bandwidth by reorganizing its rows and columns, thus... |
| Noise-Aware Optimization in Nominally Identical Manufacturing and Measuring Systems for High-Throughput Parallel Workflows | Christina Schenk, Miguel Hernández-del-Valle, Luis Calero-Lumbreras, Marcus Noack, Maciej Haranczyk | 2025-11-13 | Download | Device-to-device variability in experimental noise critically impacts reproducibility, especially in automated, high-throughput systems like additive manufacturing farms. |
| Dynamic Edge Server Selection in Time-Varying Environments: A Reliability-Aware Predictive Approach | Jaime Sebastian Burbano, Arnova Abdullah, Eldiyar Zhantileuov, Mohan Liyanage, Rolf Schuster | 2025-11-13 | Download | Latency-sensitive embedded applications increasingly rely on edge computing, yet dynamic network congestion in multi-server architectures challenges proper edge server selection. |
| dHPR: A Distributed Halpern Peaceman--Rachford Method for Non-smooth Distributed Optimization Problems | Zhangcheng Feng, Defeng Sun, Yancheng Yuan, Guojun Zhang | 2025-11-13 | Download | This paper introduces the distributed Halpern Peaceman--Rachford (dHPR) method, an efficient algorithm for solving distributed convex composite optimization problems with non-smooth objectives, which ... |
| Speculative Decoding in Decentralized LLM Inference: Turning Communication Latency into Computation Throughput | Jingwei Song, Wanyi Chen, Xinyuan Song, Max, Chris Tong, Gufeng Chen, Tianyi Zhao, Eric Yang, Bill Shi, Lynn Ai | 2025-11-13 | Download | Speculative decoding accelerates large language model (LLM) inference by using a lightweight draft model to propose tokens that are later verified by a stronger target model. |
| Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms | Ao Xu, Han Zhao, Weihao Cui, Quan Chen, Yukang Chen, Shulai Zhang, Shuang Chen, Jiemin Jiang, Zhibin Yu, Minyi Guo | 2025-11-13 | Download | Large language models (LLMs) are increasingly deployed under the Model-as-a-Service (MaaS) paradigm. To meet stringent quality-of-service (QoS) requirements, existing LLM serving systems disaggregate ... |
| Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction | Mani Tofigh, Edward Guo, Weiwei Jia, Xiaoning Ding, Zirui Neil Zhao, Jianchen Shan | 2025-11-13 | Download | This paper shows that cache-based optimizations are often ineffective in cloud virtual machines (VMs) due to limited visibility into and control over provisioned caches. |
| Lit Silicon: A Case Where Thermal Imbalance Couples Concurrent Execution in Multiple GPUs | Marco Kurzynski, Shaizeen Aga, Di Wu | 2025-11-13 | Download | GPU systems are increasingly powering modern datacenters at scale. Despite being highly performant, GPU systems suffer from performance variation at the node and cluster levels. |
| MoFa: A Unified Performance Modeling Framework for LLM Pretraining | Lu Zhao, Rong Shi, Shaoqing Zhang, Shangchao Su, Ziqing Yin, Zhiyan Cui, Hongfeng Sun, Baoguo He, Yueqiang Chen, Liang Dong, Xiyuan Li, Lingbin Wang, Lijun Ma, Qiang Huang, Ting Liu, Chong Wang, Can Wei | 2025-11-13 | Download | The exponential growth in LLM scales, with parameters soaring from billions to trillions, has necessitated distributed pretraining across large clusters comprising thousands to tens of thousands of de... |
| A Meta-Heuristic Load Balancer for Cloud Computing Systems | Leszek Sliwko, Vladimir Getov | 2025-11-13 | Download | This paper presents a strategy to allocate services on a Cloud system without overloading nodes and maintaining the system stability with minimum cost. |
| SMoFi: Step-wise Momentum Fusion for Split Federated Learning on Heterogeneous Data | Mingkun Yang, Ran Zhu, Qing Wang, Jie Yang | 2025-11-13 | Download | Split Federated Learning is a system-efficient federated learning paradigm that leverages the rich computing resources at a central server to train model partitions. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| LM4Opt-RA: A Multi-Candidate LLM Framework with Structured Ranking for Automating Network Resource Allocation | Tasnim Ahmed, Siana Rizwan, Naveed Ejaz, Salimur Choudhury | 2025-11-13 | Download | Building on advancements in Large Language Models (LLMs), we can tackle complex analytical and mathematical reasoning tasks requiring nuanced contextual understanding. |
| Millimeter-Wave UAV Channel Model with Height-Dependent Path Loss and Shadowing in Urban Scenarios | Abdul Saboor, Evgenii Vinogradov | 2025-11-13 | Download | Uncrewed Aerial Vehicles (UAVs) serving as Aerial Base Stations (ABSs) are expected to extend 6G millimeter-Wave (mmWave) coverage and improve link reliability in urban areas. |
| Towards an Agentic Workflow for Internet Measurement Research | Alagappan Ramanathan, Eunju Kang, Dongsu Han, Sangeetha Abdu Jyothi | 2025-11-13 | Download | Internet measurement research faces an accessibility crisis: complex analyses require custom integration of multiple specialized tools that demands specialized domain expertise. |
| Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access | Aswin Arun, Christo Kurisummoottil Thomas, Rimalpudi Sarvendranath, Walid Saad | 2025-11-13 | Download | Despite the advantages of multi-agent reinforcement learning (MARL) for wireless use cases such as medium access control (MAC), their real-world deployment in Internet of Things (IoT) is hindered by th... |
| P4-TAS: P4-Based Time-Aware Shaper for Time-Sensitive Networking | Fabian Ihle, Moritz Flüchter, Michael Menth | 2025-11-13 | Download | Time-Sensitive Networking (TSN) is a set of IEEE standards that extends Ethernet with real-time capabilities. Among its mechanisms, TSN can coordinate transmission times network-wide to minimize queue... |
| Pk-IOTA: Blockchain empowered Programmable Data Plane to secure OPC UA communications in Industry 4.0 | Rinieri Lorenzo, Gori Giacomo, Melis Andrea, Girau Roberto, Prandini Marco, Callegati Franco | 2025-11-13 | Download | The OPC UA protocol is becoming the de facto standard for Industry 4.0 machine-to-machine communication. It stands out as one of the few industrial protocols that provide robust security features desi... |
| Dynamic Edge Server Selection in Time-Varying Environments: A Reliability-Aware Predictive Approach | Jaime Sebastian Burbano, Arnova Abdullah, Eldiyar Zhantileuov, Mohan Liyanage, Rolf Schuster | 2025-11-13 | Download | Latency-sensitive embedded applications increasingly rely on edge computing, yet dynamic network congestion in multi-server architectures challenges proper edge server selection. |
| Learning-Based Channel Access in Wi-Fi: A Multi-Armed Bandit Approach | Miguel Casasnovas, Francesc Wilhelmi, Richard Combes, Maksymilian Wojnar, Katarzyna Kosek-Szott, Szymon Szott, Anders Jonsson, Luis Esteve, Boris Bellalta | 2025-11-13 | Download | Due to its static protocol design, IEEE 802.11 (aka Wi-Fi) channel access lacks adaptability to address dynamic network conditions, resulting in inefficient spectrum utilization, unnecessary contentio... |
| See and Beam: Leveraging LiDAR Sensing and Specular Surfaces for Indoor mmWave Connectivity | Raj Sai Sohel Bandari, Amod Ashtekar, Omar Ibrahim, Mohammed E. Eltayeb | 2025-11-13 | Download | Millimeter-wave (mmWave) communication enables multi-gigabit-per-second data rates but is highly susceptible to path loss and blockage, especially indoors. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Vmem: A Lightweight Hot-Upgradable Memory Management for In-production Cloud Environment | Hao Zheng, Qiang Wang, Longxiang Wang, Xishi Qiu, Yibin Shen, Xiaoshe Dong, Naixuan Guan, Jia Wei, Fudong Qiu, Xingjun Zhang, Yun Xu, Mao Zhao, Yisheng Xie, Shenglong Zhao, Min He, Yu Li, Xiao Zheng, Ben Luo, Jiesheng Wu | 2025-11-13 | Download | Traditional memory management suffers from metadata overhead, architectural complexity, and stability degradation, problems intensified in cloud environments. |
| Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction | Mani Tofigh, Edward Guo, Weiwei Jia, Xiaoning Ding, Zirui Neil Zhao, Jianchen Shan | 2025-11-13 | Download | This paper shows that cache-based optimizations are often ineffective in cloud virtual machines (VMs) due to limited visibility into and control over provisioned caches. |
| Taiji: A DPU Memory Elasticity Solution for In-production Cloud Environments | Hao Zheng, Longxiang Wang, Yun Xu, Qiang Wang, Yibin Shen, Xiaoshe Dong, Bang Di, Jia Wei, Shenyu Dong, Xingjun Zhang, Weichen Chen, Zhao Han, Sanqian Zhao, Dongdong Huang, Jie Qi, Yifan Yang, Zhao Gao, Yi Wang, Jinhu Li, Xudong Ren, Min He, Hang Yang, Xiao Zheng, Haijiao Hao, Jiesheng Wu | 2025-11-13 | Download | The growth of cloud computing drives data centers toward higher density and efficiency. Data processing units (DPUs) enhance server network and storage performance but face challenges such as long har... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| The Configuration Wall: Characterization and Elimination of Accelerator Configuration Overhead | Josse Van Delm, Anton Lydike, Joren Dumoulin, Jonas Crols, Xiaoling Yi, Ryan Antonio, Jackson Woodruff, Tobias Grosser, Marian Verhelst | 2025-11-13 | Download | Contemporary compute platforms increasingly offload compute kernels from CPU to integrated hardware accelerators to reach maximum performance per Watt. |
| EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training | Qingao Yi, Jiaang Duan, Hanwen Hu, Qin Hua, Haiyan Zhao, Shiyou Qian, Dingyu Yang, Jian Cao, Jinghua Tang, Yinghao Yu, Chenzhi Liao, Kangjin Wang, Liping Zhang | 2025-11-13 | Download | Training large language models (LLMs) poses significant challenges regarding computational resources and memory capacity. Although distributed training techniques help mitigate these issues, they stil... |
| Optimizing CPU Cache Utilization in Cloud VMs with Accurate Cache Abstraction | Mani Tofigh, Edward Guo, Weiwei Jia, Xiaoning Ding, Zirui Neil Zhao, Jianchen Shan | 2025-11-13 | Download | This paper shows that cache-based optimizations are often ineffective in cloud virtual machines (VMs) due to limited visibility into and control over provisioned caches. |
| Steering Pretrained Drafters during Speculative Decoding | Frédéric Berdoz, Peer Rheinboldt, Roger Wattenhofer | 2025-11-13 | Download | Speculative decoding accelerates language model inference by separating generation into fast drafting and parallel verification. Its main limitation is drafter-verifier misalignment, which limits toke... |