2026-02-24

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Dynamic Symmetric Point Tracking: Tackling Non-ideal Reference in Analog In-memory Training	Quan Xiao, Jindan Li, Zhaoxian Wu, Tayfun Gokmen, Tianyi Chen	2026-02-24	Download	Analog in-memory computing (AIMC) performs computation directly within resistive crossbar arrays, offering an energy-efficient platform to scale large vision and language models.
Heterogeneous Memory Design Exploration for AI Accelerators with a Gain Cell Memory Compiler	Xinxin Wang, Lixian Yan, Shuhan Liu, Luke Upton, Zhuoqi Cai, Yiming Tan, Shengman Li, Koustav Jana, Peijing Li, Jesse Cirimelli-Low, Thierry Tambe, Matthew Guthaus, H. -S. Philip Wong	2026-02-24	Download	As memory increasingly dominates system cost and energy, heterogeneous on-chip memory systems that combine technologies with complementary characteristics are becoming essential.
Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP)	Julia Gonski, Jenni Ott, Shiva Abbaszadeh, Sagar Addepalli, Matteo Cremonesi, Jennet Dickinson, Giuseppe Di Guglielmo, Erdem Yigit Ertorer, Lindsey Gray, Ryan Herbst, Christian Herwig, Tae Min Hong, Benedikt Maier, Maryam Bayat Makou, David Miller, Mark S. Neubauer, Cristián Peña, Dylan Rankin, Seon-Hee Seo, Giordon Stark, Alexander Tapper, Audrey Corbeil Therrien, Ioannis Xiotidis, Keisuke Yoshihara, G Abarajithan, Sagar Addepalli, Nural Akchurin, Carlos Argüelles, Saptaparna Bhattacharya, Lorenzo Borella, Christian Boutan, Tom Braine, James Brau, Martin Breidenbach, Antonio Chahine, Talal Ahmed Chowdhury, Yuan-Tang Chou, Seokju Chung, Alberto Coppi, Mariarosaria D'Alfonso, Abhilasha Dave, Chance Desmet, Angela Di Fulvio, Karri DiPetrillo, Javier Duarte, Auralee Edelen, Jan Eysermans, Yongbin Feng, Emmett Forrestel, Dolores Garcia, Loredana Gastaldo, Julián García Pardiñas, Lino Gerlach, Loukas Gouskos, Katya Govorkova, Carl Grace, Christopher Grant, Philip Harris, Ciaran Hasnip, Timon Heim, Abraham Holtermann, Tae Min Hong, Gian Michele Innocenti, Koji Ishidoshiro, Miaochen Jin, Jyothisraj Johnson, Stephen Jones, Andreas Jung, Georgia Karagiorgi, Ryan Kastner, Nicholas Kamp, Doojin Kim, Kyoungchul Kong, Katie Kudela, Jelena Lalic, Bo-Cheng Lai, Yun-Tsung Lai, Tommy Lam, Jeffrey Lazar, Aobo Li, Zepeng Li, Haoyun Liu, Vladimir Lončar, Luca Macchiarulo, Christopher Madrid, Benedikt Maier, Zhenghua Ma, Prashansa Mukim, Mark S. Neubauer, Victoria Nguyen, Sungbin Oh, Isobel Ojalvo, Hideyoshi Ozaki, Simone Pagan Griso, Myeonghun Park, Christoph Paus, Santosh Parajuli, Benjamin Parpillon, Sara Pozzi, Ema Puljak, Benjamin Ramhorst, Amy Roberts, Larry Ruckman, Kate Scholberg, Sebastian Schmitt, Noah Singer, Eluned Anne Smith, Alexandre Sousa, Michael Spannowsky, Sioni Summers, Yanwen Sun, Daniel Tapia Takaki, Antonino Tumeo, Caterina Vernieri, Belina von Krosigk, Yash Vora, Linyan Wan, Michael H. L. S. Wang, Amanda Weinstein, Andy White, Simon Williams, Felix Yu	2026-02-24	Download	The next generation of particle physics experiments will face a new era of challenges in data acquisition, due to unprecedented data rates and volumes along with extreme environments and operational c...
Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators	Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner	2026-02-24	Download	This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardwar...
RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators	Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu	2026-02-24	Download	Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers.
LUTstructions: Self-loading FPGA-based Reconfigurable Instructions	Philippos Papaphilippou	2026-02-24	Download	General-purpose processors feature a limited number of instructions based on an instruction set. They can be numerous, such as with vector extensions that include hundreds or thousands of instructions...
TOM: A Ternary Read-only Memory Accelerator for LLM-powered Edge Intelligence	Hongyi Guan, Yijia Zhang, Wenqiang Wang, Yizhao Gao, Shijie Cao, Chen Zhang, Ningyi Xu	2026-02-24	Download	The deployment of Large Language Models (LLMs) for real-time intelligence on edge devices is rapidly growing. However, conventional hardware architectures face a fundamental memory wall challenge, whe...
Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors	Sangkeum Lee	2026-02-24	Download	Ancilla reuse in repeated syndrome extraction couples reset quality to logical-cycle latency. We evaluate blind reset -- unitary-only recycling via scaled sequence replay -- on IQM Garnet, Rigetti Ank...
FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill	Rakshith Jayanth, Viktor Prasanna	2026-02-24	Download	In long-context large language model (LLM) inference, the prefill stage dominates computation due to self-attention over the complete input context.
SegSEM: Enabling and Enhancing SAM2 for SEM Contour Extraction	Da Chen, Guangyu Hu, Kaihong Xu, Kaichao Liang, Songjiang Li, Wei Yang, XiangYu Wen, Mingxuan Yuan	2026-02-24	Download	Extracting high-fidelity 2D contours from Scanning Electron Microscope (SEM) images is critical for calibrating Optical Proximity Correction (OPC) models.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
General Convex Agreement with Near-Optimal Communication	Marc Dufay, Diana Ghinea, Anton Paramonov	2026-02-24	Download	Convex Agreement (CA) strengthens Byzantine Agreement (BA) by requiring the output agreed upon to lie in the convex hull of the honest parties' inputs.
Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking	Ravi Ghadia, Maksim Abraham, Sergei Vorobyov, Max Ryabinin	2026-02-24	Download	Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism.
Circumventing the CAP Theorem with Open Atomic Ethernet	Paul Borrill	2026-02-24	Download	The CAP theorem is routinely treated as a systems law: under network partition, a replicated service must sacrifice either consistency or availability.
Scaling State-Space Models on Multiple GPUs with Tensor Parallelism	Anurag Dutt, Nimit Shah, Hazem Masarani, Anshul Gandhi	2026-02-24	Download	Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads.
ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments	Haley Li, Xinglu Wang, Cong Feng, Chunxu Zuo, Yanan Wang, Hei Lo, Yufei Cui, Bingji Wang, Duo Cui, Shuming Jing, Yizhou Shan, Ying Xiong, Jiannan Wang, Yong Zhang, Zhenan Fan	2026-02-24	Download	As LLM deployments scale over more hardware, the probability of a single failure in a system increases significantly, and cloud operators must consider robust countermeasures to handle these inevitabl...
MineDraft: A Framework for Batch Parallel Speculative Decoding	Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low	2026-02-24	Download	Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model.
Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators	Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner	2026-02-24	Download	This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardwar...
Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management	Mohammed Cherifi	2026-02-24	Download	Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers non-functional -- and multi-day mean time to resolution, imposi...
Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation	Sales Aribe, Gil Nicholas Cagande	2026-02-24	Download	Federated Learning (FL) has emerged as a transformative approach for distributed machine learning, particularly in edge computing environments where data privacy, low latency, and bandwidth efficiency...
Is a LOCAL algorithm computable?	Antonio Cruciani, Avinandan Das, Massimo Equi, Henrik Lievonen, Diep Luong-Le, Augusto Modanese, Jukka Suomela	2026-02-24	Download	Common definitions of the "standard" LOCAL model tend to be sloppy and even self-contradictory on one point: do the nodes update their state using an arbitrary function or a computable function? So fa...
A Morton-Type Space-Filling Curve for Pyramid Subdivision and Hybrid Adaptive Mesh Refinement	David Knapp, Johannes Albrecht Holke, Thomas Spenke, Carsten Burstedde	2026-02-24	Download	The forest-of-refinement-trees approach allows for dynamic adaptive mesh refinement (AMR) at negligible cost. While originally developed for quadrilateral and hexahedral elements, previous work establ...
RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators	Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu	2026-02-24	Download	Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers.
RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs	Aiying Li, Jingwei Sun, Han Li, Wence Ji, Guangzhong Sun	2026-02-24	Download	Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads.
From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation	Linus Bantel, Moritz Strack, Alexander Strack, Dirk Pflüger	2026-02-24	Download	Large Language Models (LLM) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied.
Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training	Guanbin Xu, ZhenGuo Xu, Yuzhe Li, Youhui Bai, Ping Gong, Chaoyi Ruan, Cheng Li	2026-02-24	Download	Overlapping communication with computation is crucial for distributed large-model training, yet optimizing it - especially when computation becomes the bottleneck - remains challenging.
A Granularity Characterization of Task Scheduling Effectiveness	Sana Taghipour Anvari, David Kaeli	2026-02-24	Download	Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity.
Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning	Nihal Balivada, Shrey Gupta, Shashank Shreedhar Bhatt, Suyash Gupta	2026-02-24	Download	Federated Learning (FL) enables a distributed client-server architecture where multiple clients collaboratively train a global Machine Learning (ML) model without sharing sensitive local data.
Circumventing the FLP Impossibility Result with Open Atomic Ethernet	Paul Borrill	2026-02-24	Download	The Fischer--Lynch--Paterson (FLP) impossibility result is widely regarded as one of the most fundamental negative results in distributed computing: no deterministic protocol can guarantee consensus i...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Compensating the Packet Delay Variation for 6G Integrated with IEEE Time-Sensitive Networking	Marilet De Andrade, Joachim Sachs, Lucas Haug, Simon Egger, Frank Dürr, Balázs Varga, Janos Farkas, György Miklós	2026-02-24	Download	6G is deemed a key technology to support emerging applications with stringent requirements for highly dependable and time-critical communication.
UnlinkableDFL: a Practical Mixnet Protocol for Churn-Tolerant Decentralized FL Model Sharing	Chao Feng, Thomas Grubl, Jan von der Assen, Sandrin Raphael Hunkeler, Linn Anna Spitz, Gerome Bovet, Burkhard Stiller	2026-02-24	Download	Decentralized Federated Learning (DFL) eliminates the need for a central aggregator, but it can expose communication patterns that reveal participant identities.
The Instability of all Backoff Protocols	Leslie Ann Goldberg, John Lapinskas	2026-02-24	Download	In this paper we prove Aldous's conjecture from 1987 that there is no backoff protocol that is stable for any positive arrival rate. The setting is a communication channel for coordinating requests fo...
Telemetry-Based Server Selection in the Quantum Internet via Cross-Layer Runtime Estimation	Masaki Nagai, Hideaki Kawaguchi, Shin Nishio, Takahiko Satoh	2026-02-24	Download	The Quantum Internet will allow clients to delegate quantum workloads to remote servers over heterogeneous networks, but choosing the server that minimizes end-to-end execution time is difficult becau...
Airavat: An Agentic Framework for Internet Measurement	Alagappan Ramanathan, Eunju Kang, Dongsu Han, Sangeetha Abdu Jyothi	2026-02-24	Download	Internet measurement faces twin challenges: complex analyses require expert-level orchestration of tools, yet even syntactically correct implementations can have methodological flaws and can be diffic...
Deep Reinforcement Learning Based Block Coordinate Descent for Downlink Weighted Sum-rate Maximization on AI-Native Wireless Networks	Siya Chen, Chee Wei Tan, H. Vincent Poor	2026-02-24	Download	This paper introduces a deep reinforcement learning-based block coordinate descent (DRL-based BCD) algorithm to address the nonconvex weighted sum-rate maximization (WSRM) problem with a total power c...
AWCP: A Workspace Delegation Protocol for Deep-Engagement Collaboration across Remote Agents	Xiaohang Nie, Zihan Guo, Youliang Chen, Yuanjian Zhou, Weinan Zhang	2026-02-24	Download	The rapid evolution of Large Language Model (LLM)-based autonomous agents is reshaping the digital landscape toward an emerging Agentic Web, where increasingly specialized agents must collaborate to a...
Threadle: A Memory-Efficient Network Storage and Query Engine for Large, Multilayer, and Mixed-mode Networks	Carl Nordlund, Yukun Jiao	2026-02-24	Download	We present Threadle, an open-source, high-performance, and memory-efficient network storage and query engine written in C#. Designed for working with full-population networks derived from administrati...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Exploiting Dependency and Parallelism: Real-Time Scheduling and Analysis for GPU Tasks	Yuanhai Zhang, Songyang He, Ruizhe Gou, Mingyue Cui, Boyang Li, Shuai Zhao, Kai Huang	2026-02-24	Download	With the rapid advancement of Artificial Intelligence, the Graphics Processing Unit (GPU) has become increasingly essential across a growing number of safety-critical application domains.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators	Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu	2026-02-24	Download	Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers.