2026-01-30

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction | Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim | 2026-01-30 | Download | Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive d... |
| Optimal Engagement of Residential Battery Storage to Alleviate Grid Upgrades Caused by EVs and Solar Systems | Rafi Zahedi, Amirhossein Ahmadian, Chen Zhang, Shashank Narayana Gowda, Kourosh SedghiSigarchi, Rajit Gadh | 2026-01-30 | Download | The integration of distributed energy resources has ushered in a host of complex challenges, significantly impacting power quality in distribution networks. |
| Accelerating Physics-Based Electromigration Analysis via Rational Krylov Subspaces | Sheldon X.-D. Tan, Haotian Lu | 2026-01-30 | Download | Electromigration (EM) induced stress evolution is a major reliability challenge in nanometer-scale VLSI interconnects. Accurate EM analysis requires solving stress-governing partial differential equat... |
| FOCUS: DLLMs Know How to Tame Their Compute Bound | Kaihua Liang, Xin Tan, An Zhong, Hong Xu, Marco Canini | 2026-01-30 | Download | Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. |
| Toward Digital Twins in 3D IC Packaging: A Critical Review of Physics, Data, and Hybrid Architectures | Gourab Datta, Sarah Safura Sharif, Yaser Mike Banad | 2026-01-30 | Download | Three-dimensional integrated circuit (3D IC) packaging and heterogeneous integration have emerged as central pillars of contemporary semiconductor scaling. |
| Machine Learning for Energy-Performance-aware Scheduling | Zheyuan Hu, Yifei Shi | 2026-01-30 | Download | In the post-Dennard era, optimizing embedded systems requires navigating complex trade-offs between energy efficiency and latency. Traditional heuristic tuning is often inefficient in such high-dimens... |
| Design of a GPU with Heterogeneous Cores for Graphics | Aurora Tomás, Juan Luis Aragón, Joan Manuel Parcerisa, Antonio González | 2026-01-30 | Download | Heterogeneous architectures can deliver higher performance and energy efficiency than symmetric counterparts by using multiple architectures tuned to different types of workloads. |
| Trojan-Resilient NTT: Protecting Against Control Flow and Timing Faults on Reconfigurable Platforms | Rourab Paul, Krishnendu Guha, Amlan Chakrabarti | 2026-01-30 | Download | Number Theoretic Transform (NTT) is the most essential component for polynomial multiplications used in lattice-based Post-Quantum Cryptography (PQC) algorithms such as Kyber, Dilithium, NTRU etc. |
| Deep Learning-Based Early-Stage IR-Drop Estimation via CNN Surrogate Modeling | Ritesh Bhadana | 2026-01-30 | Download | IR-drop is a critical power integrity challenge in modern VLSI designs that can cause timing degradation, reliability issues, and functional failures if not detected early in the design flow. |
| RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning | Ruizhe Zhong, Xingbo Du, Junchi Yan | 2026-01-30 | Download | Floorplanning determines the coordinate and shape of each module in Integrated Circuits. With the scaling of technology nodes, in floorplanning stage especially 3D scenarios with multiple stacked laye... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Standardized Methods and Recommendations for Green Federated Learning | Austin Tapp, Holger R. Roth, Ziyue Xu, Abhijeet Parida, Hareem Nisar, Marius George Linguraru | 2026-01-30 | Download | Federated learning (FL) enables collaborative model training over privacy-sensitive, distributed data, but its environmental impact is difficult to compare across studies due to inconsistent measureme... |
| Self-Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation | Franz A. Heinsen, Leo Kozachkov | 2026-01-30 | Download | The most widely used artificial intelligence (AI) models today are Transformers employing self-attention. In its standard form, self-attention incurs costs that increase with context length, driving d... |
| Training LLMs with Fault Tolerant HSDP on 100,000 GPUs | Omkar Salpekar, Rohan Varma, Kenny Yu, Vladimir Ivanov, Yang Wang, Ahmed Sharif, Min Si, Shawn Xu, Feng Tian, Shengbao Zheng, Tristan Rice, Ankush Garg, Shangfu Peng, Shreyas Siravara, Wenyin Fu, Rodrigo de Castro, Adithya Gangidi, Andrey Obraztsov, Sharan Narang, Sergey Edunov, Maxim Naumov, Chunqiang Tang, Mathew Oldham | 2026-01-30 | Download | Large-scale training systems typically use synchronous training, requiring all GPUs to be healthy simultaneously. In our experience training on O(100K) GPUs, synchronous training results in a low effi... |
| A Fault-Tolerant Version of Safra's Termination Detection Algorithm | Wan Fokkink, Georgios Karlos, Andy Tatman | 2026-01-30 | Download | Safra's distributed termination detection algorithm employs a logical token ring structure within a distributed network; only passive nodes forward the token, and a counter in the token keeps track of... |
| VoxServe: Streaming-Centric Serving System for Speech Language Models | Keisuke Kamahori, Wei-Tzu Lee, Atindra Jha, Rohan Kadekodi, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci | 2026-01-30 | Download | Deploying modern Speech Language Models (SpeechLMs) in streaming settings requires systems that provide low latency, high throughput, and strong guarantees of streamability. |
| ERA: Epoch-Resolved Arbitration for Duelling Admins in Group Management CRDTs | Kegan Dougal | 2026-01-30 | Download | Conflict-Free Replicated Data Types (CRDTs) are used in a range of fields for their coordination-free replication with strong eventual consistency. |
| iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems | Yi-Xiang Hu, Yuke Wang, Feng Wu, Zirui Huang, Shuli Zeng, Xiang-Yang Li | 2026-01-30 | Download | Scheduling precedence-constrained tasks under shared renewable resources is central to modern computing platforms. The Resource Investment Problem (RIP) models this setting by minimizing the cost of p... |
| AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation | Zhongzhen Wen, Shudi Shao, Zhong Li, Yu Ge, Tongtong Xu, Yuanyi Lin, Tian Zhang | 2026-01-30 | Download | The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertis... |
| Error Analysis of Matrix Multiplication Emulation Using Ozaki-II Scheme | Yuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura | 2026-01-30 | Download | The Ozaki-II scheme is an emulation method that leverages the Chinese Remainder Theorem to compute high-precision matrix multiplication via a sequence of low-precision matrix multiplications. |
| SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks | Matteo Gambella, Fabrizio Pittorino, Giuliano Casale, Manuel Roveri | 2026-01-30 | Download | Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions when sufficient confidence is achieved. |
| CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control | Qiaoling Chen, Zhisheng Ye, Tian Tang, Peng Sun, Boyu Tian, Guoteng Wang, Shenggui Li, Yonggang Wen, Zhenhua Han, Tianwei Zhang | 2026-01-30 | Download | Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe throughput degradation well before memory capacity is exhausted. |
| HetCCL: Accelerating LLM Training with Heterogeneous GPUs | Heehoon Kim, Jaehwan Lee, Taejeoung Kim, Jongwon Park, Jinpyo Kim, Pyongwon Suh, Ryan H. Choi, Sangwoo Lee, Jaejin Lee | 2026-01-30 | Download | The rapid growth of large language models is driving organizations to expand their GPU clusters, often with GPUs from multiple vendors. However, current deep learning frameworks lack support for colle... |
| Coordinating Power Grid Frequency Regulation Service with Data Center Load Flexibility | Ali Jahanshahi, Sara Rashidi Golrouye, Osten Anderson, Nanpeng Yu, Daniel Wong | 2026-01-30 | Download | AI/ML data center growth have led to higher energy consumption and carbon emissions. The shift to renewable energy and growing data center energy demands can destabilize the power grid. |
| AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism | Thalaiyasingam Ajanthan, Sameera Ramasinghe, Gil Avraham, Hadi Mohaghegh Dolatabadi, Chamin P Hewa Koneputugodage, Violetta Shevchenko, Yan Zuo, Alexander Long | 2026-01-30 | Download | Data and pipeline parallelism are key strategies for scaling neural network training across distributed devices, but their high communication cost necessitates co-located computing clusters with fast ... |
| Towards Resiliency in Large Language Model Serving with KevlarFlow | Shangshu Qian, Kipling Liu, P. C. Sruthi, Lin Tan, Yongle Zhang | 2026-01-30 | Download | Large Language Model (LLM) serving systems remain fundamentally fragile, where frequent hardware faults in hyperscale clusters trigger disproportionate service outages in the software stack. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Lossy Compression of Cellular Network KPIs | Andrea Pimpinella, Fabio Palmese, Alessandro E. C. Redondi | 2026-01-30 | Download | Network Key Performance Indicators (KPIs) are a fundamental component of mobile cellular network monitoring and optimization. Their massive volume, resulting from fine-grained measurements collected a... |
| Digital Twin Synchronization: towards a data-centric architecture | Eduardo Freitas, Assis T. de Oliveira Filho, Pedro R. X. do Carmo, Djamel Sadok, Judith Kelner | 2026-01-30 | Download | Digital Twin (DT) technology revolutionizes industrial processes by enabling the representation of physical entities and their dynamics to enhance productivity and operational efficiency. |
| Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise Networks | Lukas Köder, Nils Lohmiller, Phil Schmieder, Bastian Buck, Michael Menth, Tobias Heer | 2026-01-30 | Download | The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication. |
| MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics | Devansh Lodha, Mohit Panchal, Sameer G. Kulkarni | 2026-01-30 | Download | The integration of Large Language Models (LLMs) into network operations (AIOps) is hindered by two fundamental challenges: the stochastic grounding problem, where LLMs struggle to reliably parse unstr... |
| Chance-Constrained Secrecy Optimization in Hybrid RIS-Empowered and UAV-Assisted Networks | Elhadj Moustapha Diallo, Mamadou Aliou Diallo, Abusaeed B. M. Adam, Muhammad Naeem Shah | 2026-01-30 | Download | This paper considers a hybrid reconfigurable environment comprising a UAV-mounted reflecting RIS, an outdoor STAR-RIS enabling simultaneous transmission and reflection, and an indoor holographic RIS (... |
| Nethira: A Heterogeneity-aware Hierarchical Pre-trained Model for Network Traffic Classification | Chungang Lin, Weiyao Zhang, Haitong Luo, Xuying Meng, Yujun Zhang | 2026-01-30 | Download | Network traffic classification is vital for network security and management. The pre-training technology has shown promise by learning general traffic representations from raw byte sequences, thereby ... |
| Toward Non-Expert Customized Congestion Control | Mingrui Zhang, Hamid Bagheri, Lisong Xu | 2026-01-30 | Download | General-purpose congestion control algorithms (CCAs) are designed to achieve general congestion control goals, but they may not meet the specific requirements of certain users. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction | Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim | 2026-01-30 | Download | Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive d... |
| Standardized Methods and Recommendations for Green Federated Learning | Austin Tapp, Holger R. Roth, Ziyue Xu, Abhijeet Parida, Hareem Nisar, Marius George Linguraru | 2026-01-30 | Download | Federated learning (FL) enables collaborative model training over privacy-sensitive, distributed data, but its environmental impact is difficult to compare across studies due to inconsistent measureme... |
| Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise Networks | Lukas Köder, Nils Lohmiller, Phil Schmieder, Bastian Buck, Michael Menth, Tobias Heer | 2026-01-30 | Download | The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication. |
| AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation | Zhongzhen Wen, Shudi Shao, Zhong Li, Yu Ge, Tongtong Xu, Yuanyi Lin, Tian Zhang | 2026-01-30 | Download | The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertis... |