2026-01-30
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction | Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim | 2026-01-30 | Download | Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive d... |
| Optimal Engagement of Residential Battery Storage to Alleviate Grid Upgrades Caused by EVs and Solar Systems | Rafi Zahedi, Amirhossein Ahmadian, Chen Zhang, Shashank Narayana Gowda, Kourosh SedghiSigarchi, Rajit Gadh | 2026-01-30 | Download | The integration of distributed energy resources has ushered in a host of complex challenges, significantly impacting power quality in distribution networks. |
| Accelerating Physics-Based Electromigration Analysis via Rational Krylov Subspaces | Sheldon X. -D. Tan, Haotian Lu | 2026-01-30 | Download | Electromigration (EM) induced stress evolution is a major reliability challenge in nanometer-scale VLSI interconnects. Accurate EM analysis requires solving stress-governing partial differential equat... |
| FOCUS: DLLMs Know How to Tame Their Compute Bound | Kaihua Liang, Xin Tan, An Zhong, Hong Xu, Marco Canini | 2026-01-30 | Download | Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. |
| Toward Digital Twins in 3D IC Packaging: A Critical Review of Physics, Data, and Hybrid Architectures | Gourab Datta, Sarah Safura Sharif, Yaser Mike Banad | 2026-01-30 | Download | Three-dimensional integrated circuit (3D IC) packaging and heterogeneous integration have emerged as central pillars of contemporary semiconductor scaling. |
| Machine Learning for Energy-Performance-aware Scheduling | Zheyuan Hu, Yifei Shi | 2026-01-30 | Download | In the post-Dennard era, optimizing embedded systems requires navigating complex trade-offs between energy efficiency and latency. Traditional heuristic tuning is often inefficient in such high-dimens... |
| Design of a GPU with Heterogeneous Cores for Graphics | Aurora Tomás, Juan Luis Aragón, Joan Manuel Parcerisa, Antonio González | 2026-01-30 | Download | Heterogeneous architectures can deliver higher performance and energy efficiency than symmetric counterparts by using multiple architectures tuned to different types of workloads. |
| Trojan-Resilient NTT: Protecting Against Control Flow and Timing Faults on Reconfigurable Platforms | Rourab Paul, Krishnendu Guha, Amlan Chakrabarti | 2026-01-30 | Download | The Number Theoretic Transform (NTT) is the most essential component for polynomial multiplication in lattice-based Post-Quantum Cryptography (PQC) algorithms such as Kyber, Dilithium, and NTRU. |
| Deep Learning-Based Early-Stage IR-Drop Estimation via CNN Surrogate Modeling | Ritesh Bhadana | 2026-01-30 | Download | IR-drop is a critical power integrity challenge in modern VLSI designs that can cause timing degradation, reliability issues, and functional failures if not detected early in the design flow. |
| RulePlanner: All-in-One Reinforcement Learner for Unifying Design Rules in 3D Floorplanning | Ruizhe Zhong, Xingbo Du, Junchi Yan | 2026-01-30 | Download | Floorplanning determines the coordinates and shape of each module in integrated circuits. With the scaling of technology nodes, in floorplanning stage especially 3D scenarios with multiple stacked laye... |
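For readers unfamiliar with the NTT named in the Trojan-Resilient NTT entry, the following is a minimal sketch of NTT-based polynomial multiplication. It makes several assumptions. It uses toy parameters (prime q = 17, length n = 8, and omega = 2, which has order 8 modulo 17). It computes a cyclic convolution modulo x^n - 1 with an O(n^2) direct transform rather than butterflies. Kyber and Dilithium instead use larger moduli and negacyclic rings. The sketch includes no fault or Trojan protection, and all function names are illustrative, not taken from the paper.

```python
def ntt(a, omega, q):
    """Forward NTT by direct evaluation: A[i] = sum_j a[j] * omega**(i*j) mod q.
    Assumes omega is a primitive len(a)-th root of unity modulo the prime q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q for i in range(n)]


def intt(A, omega, q):
    """Inverse NTT: forward transform with omega**-1, then scale by n**-1 mod q."""
    n = len(A)
    n_inv = pow(n, -1, q)          # modular inverse (Python 3.8+)
    omega_inv = pow(omega, -1, q)
    return [x * n_inv % q for x in ntt(A, omega_inv, q)]


def cyclic_polymul(a, b, omega, q):
    """Multiply two polynomials modulo (x**n - 1, q): transform both operands,
    multiply pointwise in the NTT domain, and transform back."""
    A = ntt(a, omega, q)
    B = ntt(b, omega, q)
    return intt([x * y % q for x, y in zip(A, B)], omega, q)


if __name__ == "__main__":
    # (1 + 2x + 3x^2 + 4x^3) * (5 + 6x + 7x^2) with coefficients mod 17.
    print(cyclic_polymul([1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 0, 0, 0, 0, 0], 2, 17))
```

The pointwise product in the transformed domain is what reduces schoolbook O(n^2) multiplication to O(n log n) once the transform itself is implemented with butterflies. The direct transform here only keeps the example short and easy to check.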
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Standardized Methods and Recommendations for Green Federated Learning | Austin Tapp, Holger R. Roth, Ziyue Xu, Abhijeet Parida, Hareem Nisar, Marius George Linguraru | 2026-01-30 | Download | Federated learning (FL) enables collaborative model training over privacy-sensitive, distributed data, but its environmental impact is difficult to compare across studies due to inconsistent measureme... |
| Self-Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation | Franz A. Heinsen, Leo Kozachkov | 2026-01-30 | Download | The most widely used artificial intelligence (AI) models today are Transformers employing self-attention. In its standard form, self-attention incurs costs that increase with context length, driving d... |
| Training LLMs with Fault Tolerant HSDP on 100,000 GPUs | Omkar Salpekar, Rohan Varma, Kenny Yu, Vladimir Ivanov, Yang Wang, Ahmed Sharif, Min Si, Shawn Xu, Feng Tian, Shengbao Zheng, Tristan Rice, Ankush Garg, Shangfu Peng, Shreyas Siravara, Wenyin Fu, Rodrigo de Castro, Adithya Gangidi, Andrey Obraztsov, Sharan Narang, Sergey Edunov, Maxim Naumov, Chunqiang Tang, Mathew Oldham | 2026-01-30 | Download | Large-scale training systems typically use synchronous training, requiring all GPUs to be healthy simultaneously. In our experience training on O(100K) GPUs, synchronous training results in a low effi... |
| A Fault-Tolerant Version of Safra's Termination Detection Algorithm | Wan Fokkink, Georgios Karlos, Andy Tatman | 2026-01-30 | Download | Safra's distributed termination detection algorithm employs a logical token ring structure within a distributed network; only passive nodes forward the token, and a counter in the token keeps track of... |
| VoxServe: Streaming-Centric Serving System for Speech Language Models | Keisuke Kamahori, Wei-Tzu Lee, Atindra Jha, Rohan Kadekodi, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci | 2026-01-30 | Download | Deploying modern Speech Language Models (SpeechLMs) in streaming settings requires systems that provide low latency, high throughput, and strong guarantees of streamability. |
| ERA: Epoch-Resolved Arbitration for Duelling Admins in Group Management CRDTs | Kegan Dougal | 2026-01-30 | Download | Conflict-Free Replicated Data Types (CRDTs) are used in a range of fields for their coordination-free replication with strong eventual consistency. |
| iScheduler: Reinforcement Learning-Driven Continual Optimization for Large-Scale Resource Investment Problems | Yi-Xiang Hu, Yuke Wang, Feng Wu, Zirui Huang, Shuli Zeng, Xiang-Yang Li | 2026-01-30 | Download | Scheduling precedence-constrained tasks under shared renewable resources is central to modern computing platforms. The Resource Investment Problem (RIP) models this setting by minimizing the cost of p... |
| AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation | Zhongzhen Wen, Shudi Shao, Zhong Li, Yu Ge, Tongtong Xu, Yuanyi Lin, Tian Zhang | 2026-01-30 | Download | The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertis... |
| Error Analysis of Matrix Multiplication Emulation Using Ozaki-II Scheme | Yuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura | 2026-01-30 | Download | The Ozaki-II scheme is an emulation method that leverages the Chinese Remainder Theorem to compute high-precision matrix multiplication via a sequence of low-precision matrix multiplications. |
| SQUAD: Scalable Quorum Adaptive Decisions via ensemble of early exit neural networks | Matteo Gambella, Fabrizio Pittorino, Giuliano Casale, Manuel Roveri | 2026-01-30 | Download | Early-exit neural networks have become popular for reducing inference latency by allowing intermediate predictions when sufficient confidence is achieved. |
| CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control | Qiaoling Chen, Zhisheng Ye, Tian Tang, Peng Sun, Boyu Tian, Guoteng Wang, Shenggui Li, Yonggang Wen, Zhenhua Han, Tianwei Zhang | 2026-01-30 | Download | Batch inference for agentic workloads stresses the GPU key-value (KV) cache in a sustained and cumulative manner, often causing severe throughput degradation well before memory capacity is exhausted. |
| HetCCL: Accelerating LLM Training with Heterogeneous GPUs | Heehoon Kim, Jaehwan Lee, Taejeoung Kim, Jongwon Park, Jinpyo Kim, Pyongwon Suh, Ryan H. Choi, Sangwoo Lee, Jaejin Lee | 2026-01-30 | Download | The rapid growth of large language models is driving organizations to expand their GPU clusters, often with GPUs from multiple vendors. However, current deep learning frameworks lack support for colle... |
| Coordinating Power Grid Frequency Regulation Service with Data Center Load Flexibility | Ali Jahanshahi, Sara Rashidi Golrouye, Osten Anderson, Nanpeng Yu, Daniel Wong | 2026-01-30 | Download | AI/ML data center growth has led to higher energy consumption and carbon emissions. The shift to renewable energy and growing data center energy demands can destabilize the power grid. |
| AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism | Thalaiyasingam Ajanthan, Sameera Ramasinghe, Gil Avraham, Hadi Mohaghegh Dolatabadi, Chamin P Hewa Koneputugodage, Violetta Shevchenko, Yan Zuo, Alexander Long | 2026-01-30 | Download | Data and pipeline parallelism are key strategies for scaling neural network training across distributed devices, but their high communication cost necessitates co-located computing clusters with fast ... |
| Towards Resiliency in Large Language Model Serving with KevlarFlow | Shangshu Qian, Kipling Liu, P. C. Sruthi, Lin Tan, Yongle Zhang | 2026-01-30 | Download | Large Language Model (LLM) serving systems remain fundamentally fragile, where frequent hardware faults in hyperscale clusters trigger disproportionate service outages in the software stack. |
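The Ozaki-II entry describes computing a high-precision matrix product from several low-precision ones via the Chinese Remainder Theorem (CRT). The following is a minimal sketch of only the CRT step on integer matrices. It assumes the inputs are already integers: the scaling of floating-point matrices to integers, the INT8-sized residues, and the specific moduli of the actual scheme are all omitted. The moduli used here (7, 11, 13) and every function name are illustrative choices, not the paper's.

```python
from math import prod


def matmul_mod(A, B, m):
    """Integer product A @ B reduced modulo m (stands in for one low-precision GEMM)."""
    n, k, p = len(A), len(B), len(B[0])
    return [[sum((A[i][t] % m) * (B[t][j] % m) for t in range(k)) % m
             for j in range(p)] for i in range(n)]


def crt_signed(residues, moduli):
    """Combine residues modulo pairwise-coprime moduli into the unique integer x
    with x % m == r for every (r, m), mapped to the symmetric range |x| <= M // 2."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # Mi * pow(Mi, -1, m) is congruent to 1 mod m and to 0 mod every other modulus.
        x += r * Mi * pow(Mi, -1, m)
    x %= M
    return x - M if x > M // 2 else x


def emulated_matmul(A, B, moduli):
    """Recover the exact integer product from per-modulus products.
    Correct only if every true entry c satisfies |c| <= prod(moduli) // 2."""
    partials = [matmul_mod(A, B, m) for m in moduli]
    n, p = len(A), len(B[0])
    return [[crt_signed([P[i][j] for P in partials], moduli) for j in range(p)]
            for i in range(n)]


if __name__ == "__main__":
    A = [[3, -5], [7, 2]]
    B = [[-4, 6], [1, 9]]
    print(emulated_matmul(A, B, [7, 11, 13]))  # exact product: [[-17, -27], [-26, 60]]
```

The product of the moduli (1001 here) bounds the magnitude of entries that can be recovered exactly. This is why the number of moduli, and hence the number of low-precision products, grows with the precision required.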
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Lossy Compression of Cellular Network KPIs | Andrea Pimpinella, Fabio Palmese, Alessandro E. C. Redondi | 2026-01-30 | Download | Network Key Performance Indicators (KPIs) are a fundamental component of mobile cellular network monitoring and optimization. Their massive volume, resulting from fine-grained measurements collected a... |
| Digital Twin Synchronization: towards a data-centric architecture | Eduardo Freitas, Assis T. de Oliveira Filho, Pedro R. X. do Carmo, Djamel Sadok, Judith Kelner | 2026-01-30 | Download | Digital Twin (DT) technology revolutionizes industrial processes by enabling the representation of physical entities and their dynamics to enhance productivity and operational efficiency. |
| Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise Networks | Lukas Köder, Nils Lohmiller, Phil Schmieder, Bastian Buck, Michael Menth, Tobias Heer | 2026-01-30 | Download | The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication. |
| MCP-Diag: A Deterministic, Protocol-Driven Architecture for AI-Native Network Diagnostics | Devansh Lodha, Mohit Panchal, Sameer G. Kulkarni | 2026-01-30 | Download | The integration of Large Language Models (LLMs) into network operations (AIOps) is hindered by two fundamental challenges: the stochastic grounding problem, where LLMs struggle to reliably parse unstr... |
| Chance-Constrained Secrecy Optimization in Hybrid RIS-Empowered and UAV-Assisted Networks | Elhadj Moustapha Diallo, Mamadou Aliou Diallo, Abusaeed B. M. Adam, Muhammad Naeem Shah | 2026-01-30 | Download | This paper considers a hybrid reconfigurable environment comprising a UAV-mounted reflecting RIS, an outdoor STAR-RIS enabling simultaneous transmission and reflection, and an indoor holographic RIS (... |
| Nethira: A Heterogeneity-aware Hierarchical Pre-trained Model for Network Traffic Classification | Chungang Lin, Weiyao Zhang, Haitong Luo, Xuying Meng, Yujun Zhang | 2026-01-30 | Download | Network traffic classification is vital for network security and management. The pre-training technology has shown promise by learning general traffic representations from raw byte sequences, thereby ... |
| Toward Non-Expert Customized Congestion Control | Mingrui Zhang, Hamid Bagheri, Lisong Xu | 2026-01-30 | Download | General-purpose congestion control algorithms (CCAs) are designed to achieve general congestion control goals, but they may not meet the specific requirements of certain users. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction | Jatin Chhugani, Geonhwa Jeong, Bor-Yiing Su, Yunjie Pan, Hanmei Yang, Aayush Ankit, Jiecao Yu, Summer Deng, Yunqing Chen, Nadathur Satish, Changkyu Kim | 2026-01-30 | Download | Large Language Models (LLMs) have intensified the need for low-precision formats that enable efficient, large-scale inference. The Open Compute Project (OCP) Microscaling (MX) standard is attractive d... |
| Standardized Methods and Recommendations for Green Federated Learning | Austin Tapp, Holger R. Roth, Ziyue Xu, Abhijeet Parida, Hareem Nisar, Marius George Linguraru | 2026-01-30 | Download | Federated learning (FL) enables collaborative model training over privacy-sensitive, distributed data, but its environmental impact is difficult to compare across studies due to inconsistent measureme... |
| Assessing the Real-World Impact of Post-Quantum Cryptography on WPA-Enterprise Networks | Lukas Köder, Nils Lohmiller, Phil Schmieder, Bastian Buck, Michael Menth, Tobias Heer | 2026-01-30 | Download | The advent of large-scale quantum computers poses a significant threat to contemporary network security protocols, including Wi-Fi Protected Access (WPA)-Enterprise authentication. |
| AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation | Zhongzhen Wen, Shudi Shao, Zhong Li, Yu Ge, Tongtong Xu, Yuanyi Lin, Tian Zhang | 2026-01-30 | Download | The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertis... |
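The MXFP4 entry refers to the OCP Microscaling (MX) format. Below is a minimal round-trip sketch of baseline MXFP4 quantization. Blocks of 32 values share one power-of-two scale, and each element is stored as FP4 E2M1, whose magnitudes are {0, 0.5, 1, 1.5, 2, 3, 4, 6}. The sketch makes several simplifications and assumptions: it does not clamp the E8M0 scale range, and it breaks rounding ties toward the smaller magnitude. It also does not reproduce the error-reduction strategies proposed in the paper, and the function names are hypothetical.

```python
import math

# Magnitudes representable by the FP4 E2M1 element type used in MXFP4.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)
E2M1_MAX = 6.0
BLOCK_SIZE = 32    # number of elements sharing one scale


def round_to_e2m1(v):
    """Round to the nearest E2M1 value, saturating at +/-6.0.
    Simplification: ties go to the smaller magnitude (min returns the first match)."""
    mag = min(abs(v), E2M1_MAX)
    nearest = min(E2M1_MAGNITUDES, key=lambda q: abs(q - mag))
    return math.copysign(nearest, v)


def quantize_block(block):
    """Return (shared_exp, elements) for one block; the scale is 2**shared_exp.
    The shared exponent aligns the block maximum with E2M1's top binade."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    return shared_exp, [round_to_e2m1(x / scale) for x in block]


def dequantize_block(shared_exp, elems):
    """Multiply each E2M1 element back by the block's power-of-two scale."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elems]


def mxfp4_roundtrip(values):
    """Quantize and then dequantize a 1-D list, block by block."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        shared_exp, elems = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(shared_exp, elems))
    return out


if __name__ == "__main__":
    print(mxfp4_roundtrip([0.1, -0.7, 3.3, 5.2]))  # [0.0, -0.5, 3.0, 6.0]
```

In this example the block scale is set by the largest element (5.2), so 0.1 flushes to zero and 5.2 rounds up to 6.0. Errors of this kind, caused by one shared scale per block, are the baseline behavior that MX error-reduction work targets.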