
2026-01-31

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ENFOR-SA: End-to-end Cross-layer Transient Fault Injector for Efficient and Accurate DNN Reliability Assessment on Systolic Arrays | Rafael Billig Tonetto, Marcello Traiola, Fernando Fernandes dos Santos, Angeliki Kritikakou | 2026-01-31 | Download | Recent advances in deep learning have produced highly accurate but increasingly large and complex DNNs, making traditional fault-injection techniques impractical. |
| Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators | Prabhu Vellaisamy, Harideep Nair, Di Wu, Shawn Blanton, John Paul Shen | 2026-01-31 | Download | General matrix multiplication (GEMM) is a fundamental operation in deep learning (DL). With DL moving increasingly toward low precision, recent works have proposed novel unary GEMM designs as an alter... |
| AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance | Seungkwan Kang, Seungjun Lee, Donghyun Gouk, Miryeong Kwon, Hyunkyu Choi, Junhyeok Jang, Sangwon Lee, Huiwon Choi, Jie Zhang, Wonil Choi, Mahmut Taylan Kandemir, Myoungsoo Jung | 2026-01-31 | Download | Graph neural network (GNN) inference faces significant bottlenecks in preprocessing, which often dominate overall inference latency. We introduce AutoGNN, an FPGA-based accelerator designed to address... |
| HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures | Fangxin Liu, Qinghua Zhang, Hanjing Shen, Zhibo Liang, Li Jiang, Haibing Guan, Chong Bao, Xuefeng Jin | 2026-01-31 | Download | The rapid evolution of Large Language Models (LLMs) towards long-context reasoning and sparse architectures has pushed memory requirements far beyond the capacity of individual device HBM. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Fast Sparse Matrix Permutation for Mesh-Based Direct Solvers | Behrooz Zarebavami, Ahmed H. Mahmoud, Ana Dodik, Changcheng Yuan, Serban D. Porumbescu, John D. Owens, Maryam Mehri Dehnavi, Justin Solomon | 2026-01-31 | Download | We present a fast sparse matrix permutation algorithm tailored to linear systems arising from triangle meshes. Our approach produces nested-dissection-style permutations while significantly reducing p... |
| System-Level Performance Modeling of Photonic In-Memory Computing | Jebacyril Arockiaraj, Sasindu Wijeratne, Sugeet Sunder, Md Abdullah-Al Kaiser, Akhilesh Jaiswal, Ajey P. Jacob, Viktor Prasanna | 2026-01-31 | Download | Photonic in-memory computing is a high-speed, low-energy alternative to traditional transistor-based digital computing that utilizes high photonic operating frequencies and bandwidths. |
| HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures | Fangxin Liu, Qinghua Zhang, Hanjing Shen, Zhibo Liang, Li Jiang, Haibing Guan, Chong Bao, Xuefeng Jin | 2026-01-31 | Download | The rapid evolution of Large Language Models (LLMs) towards long-context reasoning and sparse architectures has pushed memory requirements far beyond the capacity of individual device HBM. |
| Forecasting Energy Availability in Local Energy Communities via LSTM Federated Learning | Fabio Turazza, Marcello Pietri, Natalia Selini Hadjidimitriou, Marco Mamei | 2026-01-31 | Download | Local Energy Communities are emerging as crucial players in the landscape of sustainable development. A significant challenge for these communities is achieving self-sufficiency through effective mana... |
| PROBE: Co-Balancing Computation and Communication in MoE Inference via Real-Time Predictive Prefetching | Qianchao Zhu, Xucheng Ye, Yuliang Liu, Haodong Ouyang, Chengru Song | 2026-01-31 | Download | Mixture-of-Experts models have become a dominant architecture for scaling Large Language Models by activating only a sparse subset of experts per token. |
| FedMOA: Federated GRPO for Personalized Reasoning LLMs under Heterogeneous Rewards | Ziyao Wang, Daeun Jung, Yexiao He, Guoheng Sun, Zheyu Shen, Myungjin Lee, Ang Li | 2026-01-31 | Download | Group Relative Policy Optimization (GRPO) has recently emerged as an effective approach for improving the reasoning capabilities of large language models through online multi-objective reinforcement l... |
| Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA | Xiaoyu Wang, Xiaotian Li, Zhixiang Zhou, Chen Li, Yong Liu | 2026-01-31 | Download | Decentralized federated learning (DFL), a serverless variant of federated learning, poses unique challenges for parameter-efficient fine-tuning due to the factorized structure of low-rank adaptation (... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| LMTE: Putting the "Reasoning" into WAN Traffic Engineering with Language Models | Xinyu Yuan, Yan Qiao, Zonghui Wang, Meng Li, Wenzhi Chen | 2026-01-31 | Download | The rapid expansion of modern wide-area networks (WANs) has made traffic engineering (TE) increasingly challenging, as traditional solvers struggle to keep pace. |
| The Syntactic-Semantic Internet: Engineering Infrastructures for Autonomous Systems | Mallik Tatipamula, Xuesong Liu, Yao Sun, Muhammad Ali Imran | 2026-01-31 | Download | The Internet has evolved through successive architectural abstractions that enabled unprecedented scale, interoperability, and innovation. Packet-based networking enabled the reliable transport of bit... |
| How segmented is my network? | Rohit Dube | 2026-01-31 | Download | Network segmentation is a popular security practice for limiting lateral movement, yet practitioners lack a metric to measure how segmented a network actually is. |
| NetWorld: Communication-Based Diffusion World Model for Multi-Agent Reinforcement Learning in Wireless Networks | Kechen Meng, Rongpeng Li, Yansha Deng, Zhifeng Zhao, Honggang Zhang | 2026-01-31 | Download | As wireless communication networks grow in scale and complexity, diverse resource allocation tasks become increasingly critical. Multi-Agent Reinforcement Learning (MARL) provides a promising solution... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation | Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Chongyang Qiu, Pengfei Wang | 2026-01-31 | Download | The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble pre-calculated KV caches of retriev... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| WritePolicyBench: Benchmarking Memory Write Policies under Byte Budgets | Edgard El Cham | 2026-01-31 | Download | We introduce WritePolicyBench, a benchmark for evaluating memory write policies: decision rules that choose what to store, merge, and evict under a strict byte budget while processing a stream with do... |
