2026-01-30

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction	Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim	2026-01-30	Download	Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive d...
Optimal Engagement of Residential Battery Storage to Alleviate Grid Upgrades Caused by EVs and Solar Systems	Rafi Zahedi, Amirhossein Ahmadian, Chen Zhang, Shashank Narayana Gowda, Kourosh SedghiSigarchi, Rajit Gadh	2026-01-30	Download	The integration of distributed energy resources has ushered in a host of complex challenges, significantly impacting power quality in distribution networks.
Accelerating Physics-Based Electromigration Analysis via Rational Krylov Subspaces	Sheldon X. -D. Tan, Haotian Lu	2026-01-30	Download	Electromigration (EM) induced stress evolution is a major reliability challenge in nanometer-scale VLSI interconnects. Accurate EM analysis requires solving stress-governing partial differential equat...
FOCUS: DLLMs Know How to Tame Their Compute Bound	Kaihua Liang, Xin Tan, An Zhong, Hong Xu, Marco Canini	2026-01-30	Download	Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost.
Toward Digital Twins in 3D IC Packaging: A Critical Review of Physics, Data, and Hybrid Architectures	Gourab Datta, Sarah Safura Sharif, Yaser Mike Banad	2026-01-30	Download	Three-dimensional integrated circuit (3D IC) packaging and heterogeneous integration have emerged as central pillars of contemporary semiconductor scaling.
Machine Learning for Energy-Performance-aware Scheduling	Zheyuan Hu, Yifei Shi	2026-01-30	Download	In the post-Dennard era, optimizing embedded systems requires navigating complex trade-offs between energy efficiency and latency. Traditional heuristic tuning is often inefficient in such high-dimens...
Design of a GPU with Heterogeneous Cores for Graphics	Aurora Tomás, Juan Luis Aragón, Joan Manuel Parcerisa, Antonio González	2026-01-30	Download	Heterogeneous architectures can deliver higher performance and energy efficiency than symmetric counterparts by using multiple architectures tuned to different types of workloads.
Trojan-Resilient NTT: Protecting Against Control Flow and Timing Faults on Reconfigurable Platforms	Rourab Paul, Krishnendu Guha, Amlan Chakrabarti	2026-01-30	Download	Number Theoretic Transform (NTT) is the most essential component for polynomial multiplications used in lattice-based Post-Quantum Cryptography (PQC) algorithms such as Kyber, Dilithium, NTRU, etc.
Deep Learning-Based Early-Stage IR-Drop Estimation via CNN Surrogate Modeling	Ritesh Bhadana	2026-01-30	Download	IR-drop is a critical power integrity challenge in modern VLSI designs that can cause timing degradation, reliability issues, and functional failures if not detected early in the design flow.
RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning	Ruizhe Zhong, Xingbo Du, Junchi Yan	2026-01-30	Download	Floorplanning determines the coordinate and shape of each module in Integrated Circuits. With the scaling of technology nodes, in floorplanning stage especially 3D scenarios with multiple stacked laye...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Standardized Methods and Recommendations for Green Federated Learning	Austin Tapp, Holger R. Roth, Ziyue Xu, Abhijeet Parida, Hareem Nisar, Marius George Linguraru	2026-01-30	Download	Federated learning (FL) enables collaborative model training over privacy-sensitive, distributed data, but its environmental impact is difficult to compare across studies due to inconsistent measureme...
Self-Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation	Franz A. Heinsen, Leo Kozachkov	2026-01-30	Download	The most widely used artificial intelligence (AI) models today are Transformers employing self-attention. In its standard form, self-attention incurs costs that increase with context length, driving d...
Training LLMs with Fault Tolerant HSDP on 100,000 GPUs	Omkar Salpekar, Rohan Varma, Kenny Yu, Vladimir Ivanov, Yang Wang, Ahmed Sharif, Min Si, Shawn Xu, Feng Tian, Shengbao Zheng, Tristan Rice, Ankush Garg, Shangfu Peng, Shreyas Siravara, Wenyin Fu, Rodrigo de Castro, Adithya Gangidi, Andrey Obraztsov, Sharan Narang, Sergey Edunov, Maxim Naumov, Chunqiang Tang, Mathew Oldham	2026-01-30	Download	Large-scale training systems typically use synchronous training, requiring all GPUs to be healthy simultaneously. In our experience training on O(100K) GPUs, synchronous training results in a low effi...
A Fault-Tolerant Version of Safra's Termination Detection Algorithm	Wan Fokkink, Georgios Karlos, Andy Tatman	2026-01-30	Download	Safra's distributed termination detection algorithm employs a logical token ring structure within a distributed network; only passive nodes forward the token, and a counter in the token keeps track of...
VoxServe: Streaming-Centric Serving System for Speech Language Models	Keisuke Kamahori, Wei-Tzu Lee, Atindra Jha, Rohan Kadekodi, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci	2026-01-30	Download	Deploying modern Speech Language Models (SpeechLMs) in streaming settings requires systems that provide low latency, high throughput, and strong guarantees of streamability.
ERA: Epoch-Resolved Arbitration for Duelling Admins in Group Management CRDTs	Kegan Dougal	2026-01-30	Download	Conflict-Free Replicated Data Types (CRDTs) are used in a range of fields for their coordination-free replication with strong eventual consistency.
iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems	Yi-Xiang Hu, Yuke Wang, Feng Wu, Zirui Huang, Shuli Zeng, Xiang-Yang Li	2026-01-30	Download	Scheduling precedence-constrained tasks under shared renewable resources is central to modern computing platforms. The Resource Investment Problem (RIP) models this setting by minimizing the cost of p...
AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation	Zhongzhen Wen, Shudi Shao, Zhong Li, Yu Ge, Tongtong Xu, Yuanyi Lin, Tian Zhang	2026-01-30	Download	The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertis...
Error Analysis of Matrix Multiplication Emulation Using Ozaki-II Scheme	Yuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura	2026-01-30	Download	The Ozaki-II scheme is an emulation method that leverages the Chinese Remainder Theorem to compute high-precision matrix multiplication via a sequence of low-precision matrix multiplications.
SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks	Matteo Gambella, Fabrizio Pittorino, Giuliano Casale, Manuel Roveri	2026-01-30	Download	Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions when sufficient confidence is achieved.
CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control	Qiaoling Chen, Zhisheng Ye, Tian Tang, Peng Sun, Boyu Tian, Guoteng Wang, Shenggui Li, Yonggang Wen, Zhenhua Han, Tianwei Zhang	2026-01-30	Download	Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe throughput degradation well before memory capacity is exhausted.
HetCCL: Accelerating LLM Training with Heterogeneous GPUs	Heehoon Kim, Jaehwan Lee, Taejeoung Kim, Jongwon Park, Jinpyo Kim, Pyongwon Suh, Ryan H. Choi, Sangwoo Lee, Jaejin Lee	2026-01-30	Download	The rapid growth of large language models is driving organizations to expand their GPU clusters, often with GPUs from multiple vendors. However, current deep learning frameworks lack support for colle...
Coordinating Power Grid Frequency Regulation Service with Data Center Load Flexibility	Ali Jahanshahi, Sara Rashidi Golrouye, Osten Anderson, Nanpeng Yu, Daniel Wong	2026-01-30	Download	AI/ML data center growth has led to higher energy consumption and carbon emissions. The shift to renewable energy and growing data center energy demands can destabilize the power grid.
AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism	Thalaiyasingam Ajanthan, Sameera Ramasinghe, Gil Avraham, Hadi Mohaghegh Dolatabadi, Chamin P Hewa Koneputugodage, Violetta Shevchenko, Yan Zuo, Alexander Long	2026-01-30	Download	Data and pipeline parallelism are key strategies for scaling neural network training across distributed devices, but their high communication cost necessitates co-located computing clusters with fast ...
Towards Resiliency in Large Language Model Serving with KevlarFlow	Shangshu Qian, Kipling Liu, P. C. Sruthi, Lin Tan, Yongle Zhang	2026-01-30	Download	Large Language Model (LLM) serving systems remain fundamentally fragile, where frequent hardware faults in hyperscale clusters trigger disproportionate service outages in the software stack.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Lossy Compression of Cellular Network KPIs	Andrea Pimpinella, Fabio Palmese, Alessandro E. C. Redondi	2026-01-30	Download	Network Key Performance Indicators (KPIs) are a fundamental component of mobile cellular network monitoring and optimization. Their massive volume, resulting from fine-grained measurements collected a...
Digital Twin Synchronization: towards a data-centric architecture	Eduardo Freitas, Assis T. de Oliveira Filho, Pedro R. X. do Carmo, Djamel Sadok, Judith Kelner	2026-01-30	Download	Digital Twin (DT) technology revolutionizes industrial processes by enabling the representation of physical entities and their dynamics to enhance productivity and operational efficiency.
Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise Networks	Lukas Köder, Nils Lohmiller, Phil Schmieder, Bastian Buck, Michael Menth, Tobias Heer	2026-01-30	Download	The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication.
MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics	Devansh Lodha, Mohit Panchal, Sameer G. Kulkarni	2026-01-30	Download	The integration of Large Language Models (LLMs) into network operations (AIOps) is hindered by two fundamental challenges: the stochastic grounding problem, where LLMs struggle to reliably parse unstr...
Chance-Constrained Secrecy Optimization in Hybrid RIS-Empowered and UAV-Assisted Networks	Elhadj Moustapha Diallo, Mamadou Aliou Diallo, Abusaeed B. M. Adam, Muhammad Naeem Shah	2026-01-30	Download	This paper considers a hybrid reconfigurable environment comprising a UAV-mounted reflecting RIS, an outdoor STAR-RIS enabling simultaneous transmission and reflection, and an indoor holographic RIS (...
Nethira: A Heterogeneity-aware Hierarchical Pre-trained Model for Network Traffic Classification	Chungang Lin, Weiyao Zhang, Haitong Luo, Xuying Meng, Yujun Zhang	2026-01-30	Download	Network traffic classification is vital for network security and management. The pre-training technology has shown promise by learning general traffic representations from raw byte sequences, thereby ...
Toward Non-Expert Customized Congestion Control	Mingrui Zhang, Hamid Bagheri, Lisong Xu	2026-01-30	Download	General-purpose congestion control algorithms (CCAs) are designed to achieve general congestion control goals, but they may not meet the specific requirements of certain users.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction	Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim	2026-01-30	Download	Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive d...
Standardized Methods and Recommendations for Green Federated Learning	Austin Tapp, Holger R. Roth, Ziyue Xu, Abhijeet Parida, Hareem Nisar, Marius George Linguraru	2026-01-30	Download	Federated learning (FL) enables collaborative model training over privacy-sensitive, distributed data, but its environmental impact is difficult to compare across studies due to inconsistent measureme...
Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise Networks	Lukas Köder, Nils Lohmiller, Phil Schmieder, Bastian Buck, Michael Menth, Tobias Heer	2026-01-30	Download	The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication.
AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation	Zhongzhen Wen, Shudi Shao, Zhong Li, Yu Ge, Tongtong Xu, Yuanyi Lin, Tian Zhang	2026-01-30	Download	The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertis...