2026-01-31

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
ENFOR-SA: End-to-end Cross-layer Transient Fault Injector for Efficient and Accurate DNN Reliability Assessment on Systolic Arrays	Rafael Billig Tonetto, Marcello Traiola, Fernando Fernandes dos Santos, Angeliki Kritikakou	2026-01-31	Download	Recent advances in deep learning have produced highly accurate but increasingly large and complex DNNs, making traditional fault-injection techniques impractical.
Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators	Prabhu Vellaisamy, Harideep Nair, Di Wu, Shawn Blanton, John Paul Shen	2026-01-31	Download	General matrix multiplication (GEMM) is a fundamental operation in deep learning (DL). With DL moving increasingly toward low precision, recent works have proposed novel unary GEMM designs as an alter...
AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance	Seungkwan Kang, Seungjun Lee, Donghyun Gouk, Miryeong Kwon, Hyunkyu Choi, Junhyeok Jang, Sangwon Lee, Huiwon Choi, Jie Zhang, Wonil Choi, Mahmut Taylan Kandemir, Myoungsoo Jung	2026-01-31	Download	Graph neural network (GNN) inference faces significant bottlenecks in preprocessing, which often dominate overall inference latency. We introduce AutoGNN, an FPGA-based accelerator designed to address...
HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures	Fangxin Liu, Qinghua Zhang, Hanjing Shen, Zhibo Liang, Li Jiang, Haibing Guan, Chong Bao, Xuefeng Jin	2026-01-31	Download	The rapid evolution of Large Language Models (LLMs) towards long-context reasoning and sparse architectures has pushed memory requirements far beyond the capacity of individual device HBM.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Fast Sparse Matrix Permutation for Mesh-Based Direct Solvers	Behrooz Zarebavami, Ahmed H. Mahmoud, Ana Dodik, Changcheng Yuan, Serban D. Porumbescu, John D. Owens, Maryam Mehri Dehnavi, Justin Solomon	2026-01-31	Download	We present a fast sparse matrix permutation algorithm tailored to linear systems arising from triangle meshes. Our approach produces nested-dissection-style permutations while significantly reducing p...
System-Level Performance Modeling of Photonic In-Memory Computing	Jebacyril Arockiaraj, Sasindu Wijeratne, Sugeet Sunder, Md Abdullah-Al Kaiser, Akhilesh Jaiswal, Ajey P. Jacob, Viktor Prasanna	2026-01-31	Download	Photonic in-memory computing is a high-speed, low-energy alternative to traditional transistor-based digital computing that utilizes high photonic operating frequencies and bandwidths.
HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures	Fangxin Liu, Qinghua Zhang, Hanjing Shen, Zhibo Liang, Li Jiang, Haibing Guan, Chong Bao, Xuefeng Jin	2026-01-31	Download	The rapid evolution of Large Language Models (LLMs) towards long-context reasoning and sparse architectures has pushed memory requirements far beyond the capacity of individual device HBM.
Forecasting Energy Availability in Local Energy Communities via LSTM Federated Learning	Fabio Turazza, Marcello Pietri, Natalia Selini Hadjidimitriou, Marco Mamei	2026-01-31	Download	Local Energy Communities are emerging as crucial players in the landscape of sustainable development. A significant challenge for these communities is achieving self-sufficiency through effective mana...
PROBE: Co-Balancing Computation and Communication in MoE Inference via Real-Time Predictive Prefetching	Qianchao Zhu, Xucheng Ye, Yuliang Liu, Haodong Ouyang, Chengru Song	2026-01-31	Download	Mixture-of-Experts models have become a dominant architecture for scaling Large Language Models by activating only a sparse subset of experts per token.
FedMOA: Federated GRPO for Personalized Reasoning LLMs under Heterogeneous Rewards	Ziyao Wang, Daeun Jung, Yexiao He, Guoheng Sun, Zheyu Shen, Myungjin Lee, Ang Li	2026-01-31	Download	Group Relative Policy Optimization (GRPO) has recently emerged as an effective approach for improving the reasoning capabilities of large language models through online multi-objective reinforcement l...
Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA	Xiaoyu Wang, Xiaotian Li, Zhixiang Zhou, Chen Li, Yong Liu	2026-01-31	Download	Decentralized federated learning (DFL), a serverless variant of federated learning, poses unique challenges for parameter-efficient fine-tuning due to the factorized structure of low-rank adaptation (...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
LMTE: Putting the "Reasoning" into WAN Traffic Engineering with Language Models	Xinyu Yuan, Yan Qiao, Zonghui Wang, Meng Li, Wenzhi Chen	2026-01-31	Download	The rapid expansion of modern wide-area networks (WANs) has made traffic engineering (TE) increasingly challenging, as traditional solvers struggle to keep pace.
The Syntactic-Semantic Internet: Engineering Infrastructures for Autonomous Systems	Mallik Tatipamula, Xuesong Liu, Yao Sun, Muhammad Ali Imran	2026-01-31	Download	The Internet has evolved through successive architectural abstractions that enabled unprecedented scale, interoperability, and innovation. Packet-based networking enabled the reliable transport of bit...
How segmented is my network?	Rohit Dube	2026-01-31	Download	Network segmentation is a popular security practice for limiting lateral movement, yet practitioners lack a metric to measure how segmented a network actually is.
NetWorld: Communication-Based Diffusion World Model for Multi-Agent Reinforcement Learning in Wireless Networks	Kechen Meng, Rongpeng Li, Yansha Deng, Zhifeng Zhao, Honggang Zhang	2026-01-31	Download	As wireless communication networks grow in scale and complexity, diverse resource allocation tasks become increasingly critical. Multi-Agent Reinforcement Learning (MARL) provides a promising solution...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation	Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Chongyang Qiu, Pengfei Wang	2026-01-31	Download	The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches of retriev...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
WritePolicyBench: Benchmarking Memory Write Policies under Byte Budgets	Edgard El Cham	2026-01-31	Download	We introduce WritePolicyBench, a benchmark for evaluating memory write policies: decision rules that choose what to store, merge, and evict under a strict byte budget while processing a stream with do...