2026-02-24

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Dynamic Symmetric Point Tracking: Tackling Non-ideal Reference in Analog In-memory Training | Quan Xiao, Jindan Li, Zhaoxian Wu, Tayfun Gokmen, Tianyi Chen | 2026-02-24 | Download | Analog in-memory computing (AIMC) performs computation directly within resistive crossbar arrays, offering an energy-efficient platform to scale large vision and language models. |
| Heterogeneous Memory Design Exploration for AI Accelerators with a Gain Cell Memory Compiler | Xinxin Wang, Lixian Yan, Shuhan Liu, Luke Upton, Zhuoqi Cai, Yiming Tan, Shengman Li, Koustav Jana, Peijing Li, Jesse Cirimelli-Low, Thierry Tambe, Matthew Guthaus, H.-S. Philip Wong | 2026-02-24 | Download | As memory increasingly dominates system cost and energy, heterogeneous on-chip memory systems that combine technologies with complementary characteristics are becoming essential. |
| Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP) | Julia Gonski, Jenni Ott, Shiva Abbaszadeh, Sagar Addepalli, Matteo Cremonesi, Jennet Dickinson, Giuseppe Di Guglielmo, Erdem Yigit Ertorer, Lindsey Gray, Ryan Herbst, Christian Herwig, Tae Min Hong, Benedikt Maier, Maryam Bayat Makou, David Miller, Mark S. Neubauer, Cristián Peña, Dylan Rankin, Seon-Hee Seo, Giordon Stark, Alexander Tapper, Audrey Corbeil Therrien, Ioannis Xiotidis, Keisuke Yoshihara, G Abarajithan, Sagar Addepalli, Nural Akchurin, Carlos Argüelles, Saptaparna Bhattacharya, Lorenzo Borella, Christian Boutan, Tom Braine, James Brau, Martin Breidenbach, Antonio Chahine, Talal Ahmed Chowdhury, Yuan-Tang Chou, Seokju Chung, Alberto Coppi, Mariarosaria D'Alfonso, Abhilasha Dave, Chance Desmet, Angela Di Fulvio, Karri DiPetrillo, Javier Duarte, Auralee Edelen, Jan Eysermans, Yongbin Feng, Emmett Forrestel, Dolores Garcia, Loredana Gastaldo, Julián García Pardiñas, Lino Gerlach, Loukas Gouskos, Katya Govorkova, Carl Grace, Christopher Grant, Philip Harris, Ciaran Hasnip, Timon Heim, Abraham Holtermann, Tae Min Hong, Gian Michele Innocenti, Koji Ishidoshiro, Miaochen Jin, Jyothisraj Johnson, Stephen Jones, Andreas Jung, Georgia Karagiorgi, Ryan Kastner, Nicholas Kamp, Doojin Kim, Kyoungchul Kong, Katie Kudela, Jelena Lalic, Bo-Cheng Lai, Yun-Tsung Lai, Tommy Lam, Jeffrey Lazar, Aobo Li, Zepeng Li, Haoyun Liu, Vladimir Lončar, Luca Macchiarulo, Christopher Madrid, Benedikt Maier, Zhenghua Ma, Prashansa Mukim, Mark S. Neubauer, Victoria Nguyen, Sungbin Oh, Isobel Ojalvo, Hideyoshi Ozaki, Simone Pagan Griso, Myeonghun Park, Christoph Paus, Santosh Parajuli, Benjamin Parpillon, Sara Pozzi, Ema Puljak, Benjamin Ramhorst, Amy Roberts, Larry Ruckman, Kate Scholberg, Sebastian Schmitt, Noah Singer, Eluned Anne Smith, Alexandre Sousa, Michael Spannowsky, Sioni Summers, Yanwen Sun, Daniel Tapia Takaki, Antonino Tumeo, Caterina Vernieri, Belina von Krosigk, Yash Vora, Linyan Wan, Michael H. L. S. Wang, Amanda Weinstein, Andy White, Simon Williams, Felix Yu | 2026-02-24 | Download | The next generation of particle physics experiments will face a new era of challenges in data acquisition, due to unprecedented data rates and volumes along with extreme environments and operational c... |
| Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators | Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner | 2026-02-24 | Download | This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardwar... |
| RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators | Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu | 2026-02-24 | Download | Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. |
| LUTstructions: Self-loading FPGA-based Reconfigurable Instructions | Philippos Papaphilippou | 2026-02-24 | Download | General-purpose processors feature a limited number of instructions based on an instruction set. They can be numerous, such as with vector extensions that include hundreds or thousands of instructions... |
| TOM: A Ternary Read-only Memory Accelerator for LLM-powered Edge Intelligence | Hongyi Guan, Yijia Zhang, Wenqiang Wang, Yizhao Gao, Shijie Cao, Chen Zhang, Ningyi Xu | 2026-02-24 | Download | The deployment of Large Language Models (LLMs) for real-time intelligence on edge devices is rapidly growing. However, conventional hardware architectures face a fundamental memory wall challenge, whe... |
| Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors | Sangkeum Lee | 2026-02-24 | Download | Ancilla reuse in repeated syndrome extraction couples reset quality to logical-cycle latency. We evaluate blind reset -- unitary-only recycling via scaled sequence replay -- on IQM Garnet, Rigetti Ank... |
| FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill | Rakshith Jayanth, Viktor Prasanna | 2026-02-24 | Download | In long-context large language model (LLM) inference, the prefill stage dominates computation due to self-attention over the complete input context. |
| SegSEM: Enabling and Enhancing SAM2 for SEM Contour Extraction | Da Chen, Guangyu Hu, Kaihong Xu, Kaichao Liang, Songjiang Li, Wei Yang, XiangYu Wen, Mingxuan Yuan | 2026-02-24 | Download | Extracting high-fidelity 2D contours from Scanning Electron Microscope (SEM) images is critical for calibrating Optical Proximity Correction (OPC) models. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| General Convex Agreement with Near-Optimal Communication | Marc Dufay, Diana Ghinea, Anton Paramonov | 2026-02-24 | Download | Convex Agreement (CA) strengthens Byzantine Agreement (BA) by requiring the output agreed upon to lie in the convex hull of the honest parties' inputs. |
| Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking | Ravi Ghadia, Maksim Abraham, Sergei Vorobyov, Max Ryabinin | 2026-02-24 | Download | Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. |
| Circumventing the CAP Theorem with Open Atomic Ethernet | Paul Borrill | 2026-02-24 | Download | The CAP theorem is routinely treated as a systems law: under network partition, a replicated service must sacrifice either consistency or availability. |
| Scaling State-Space Models on Multiple GPUs with Tensor Parallelism | Anurag Dutt, Nimit Shah, Hazem Masarani, Anshul Gandhi | 2026-02-24 | Download | Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. |
| ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments | Haley Li, Xinglu Wang, Cong Feng, Chunxu Zuo, Yanan Wang, Hei Lo, Yufei Cui, Bingji Wang, Duo Cui, Shuming Jing, Yizhou Shan, Ying Xiong, Jiannan Wang, Yong Zhang, Zhenan Fan | 2026-02-24 | Download | As LLM deployments scale over more hardware, the probability of a single failure in a system increases significantly, and cloud operators must consider robust countermeasures to handle these inevitabl... |
| MineDraft: A Framework for Batch Parallel Speculative Decoding | Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low | 2026-02-24 | Download | Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. |
| Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators | Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner | 2026-02-24 | Download | This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardwar... |
| Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management | Mohammed Cherifi | 2026-02-24 | Download | Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers non-functional -- and multi-day mean time to resolution, imposi... |
| Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation | Sales Aribe, Gil Nicholas Cagande | 2026-02-24 | Download | Federated Learning (FL) has emerged as a transformative approach for distributed machine learning, particularly in edge computing environments where data privacy, low latency, and bandwidth efficiency... |
| Is a LOCAL algorithm computable? | Antonio Cruciani, Avinandan Das, Massimo Equi, Henrik Lievonen, Diep Luong-Le, Augusto Modanese, Jukka Suomela | 2026-02-24 | Download | Common definitions of the "standard" LOCAL model tend to be sloppy and even self-contradictory on one point: do the nodes update their state using an arbitrary function or a computable function? So fa... |
| A Morton-Type Space-Filling Curve for Pyramid Subdivision and Hybrid Adaptive Mesh Refinement | David Knapp, Johannes Albrecht Holke, Thomas Spenke, Carsten Burstedde | 2026-02-24 | Download | The forest-of-refinement-trees approach allows for dynamic adaptive mesh refinement (AMR) at negligible cost. While originally developed for quadrilateral and hexahedral elements, previous work establ... |
| RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators | Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu | 2026-02-24 | Download | Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. |
| RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs | Aiying Li, Jingwei Sun, Han Li, Wence Ji, Guangzhong Sun | 2026-02-24 | Download | Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads. |
| From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation | Linus Bantel, Moritz Strack, Alexander Strack, Dirk Pflüger | 2026-02-24 | Download | Large Language Models (LLMs) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. |
| Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training | Guanbin Xu, ZhenGuo Xu, Yuzhe Li, Youhui Bai, Ping Gong, Chaoyi Ruan, Cheng Li | 2026-02-24 | Download | Overlapping communication with computation is crucial for distributed large-model training, yet optimizing it, especially when computation becomes the bottleneck, remains challenging. |
| A Granularity Characterization of Task Scheduling Effectiveness | Sana Taghipour Anvari, David Kaeli | 2026-02-24 | Download | Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity. |
| Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning | Nihal Balivada, Shrey Gupta, Shashank Shreedhar Bhatt, Suyash Gupta | 2026-02-24 | Download | Federated Learning (FL) enables a distributed client-server architecture where multiple clients collaboratively train a global Machine Learning (ML) model without sharing sensitive local data. |
| Circumventing the FLP Impossibility Result with Open Atomic Ethernet | Paul Borrill | 2026-02-24 | Download | The Fischer-Lynch-Paterson (FLP) impossibility result is widely regarded as one of the most fundamental negative results in distributed computing: no deterministic protocol can guarantee consensus i... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Compensating the Packet Delay Variation for 6G Integrated with IEEE Time-Sensitive Networking | Marilet De Andrade, Joachim Sachs, Lucas Haug, Simon Egger, Frank Dürr, Balázs Varga, Janos Farkas, György Miklós | 2026-02-24 | Download | 6G is deemed a key technology to support emerging applications with stringent requirements for highly dependable and time-critical communication. |
| UnlinkableDFL: a Practical Mixnet Protocol for Churn-Tolerant Decentralized FL Model Sharing | Chao Feng, Thomas Grubl, Jan von der Assen, Sandrin Raphael Hunkeler, Linn Anna Spitz, Gerome Bovet, Burkhard Stiller | 2026-02-24 | Download | Decentralized Federated Learning (DFL) eliminates the need for a central aggregator, but it can expose communication patterns that reveal participant identities. |
| The Instability of all Backoff Protocols | Leslie Ann Goldberg, John Lapinskas | 2026-02-24 | Download | In this paper we prove Aldous's conjecture from 1987 that there is no backoff protocol that is stable for any positive arrival rate. The setting is a communication channel for coordinating requests fo... |
| Telemetry-Based Server Selection in the Quantum Internet via Cross-Layer Runtime Estimation | Masaki Nagai, Hideaki Kawaguchi, Shin Nishio, Takahiko Satoh | 2026-02-24 | Download | The Quantum Internet will allow clients to delegate quantum workloads to remote servers over heterogeneous networks, but choosing the server that minimizes end-to-end execution time is difficult becau... |
| Airavat: An Agentic Framework for Internet Measurement | Alagappan Ramanathan, Eunju Kang, Dongsu Han, Sangeetha Abdu Jyothi | 2026-02-24 | Download | Internet measurement faces twin challenges: complex analyses require expert-level orchestration of tools, yet even syntactically correct implementations can have methodological flaws and can be diffic... |
| Deep Reinforcement Learning Based Block Coordinate Descent for Downlink Weighted Sum-rate Maximization on AI-Native Wireless Networks | Siya Chen, Chee Wei Tan, H. Vincent Poor | 2026-02-24 | Download | This paper introduces a deep reinforcement learning-based block coordinate descent (DRL-based BCD) algorithm to address the nonconvex weighted sum-rate maximization (WSRM) problem with a total power c... |
| AWCP: A Workspace Delegation Protocol for Deep-Engagement Collaboration across Remote Agents | Xiaohang Nie, Zihan Guo, Youliang Chen, Yuanjian Zhou, Weinan Zhang | 2026-02-24 | Download | The rapid evolution of Large Language Model (LLM)-based autonomous agents is reshaping the digital landscape toward an emerging Agentic Web, where increasingly specialized agents must collaborate to a... |
| Threadle: A Memory-Efficient Network Storage and Query Engine for Large, Multilayer, and Mixed-mode Networks | Carl Nordlund, Yukun Jiao | 2026-02-24 | Download | We present Threadle, an open-source, high-performance, and memory-efficient network storage and query engine written in C#. Designed for working with full-population networks derived from administrati... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Exploiting Dependency and Parallelism: Real-Time Scheduling and Analysis for GPU Tasks | Yuanhai Zhang, Songyang He, Ruizhe Gou, Mingyue Cui, Boyang Li, Shuai Zhao, Kai Huang | 2026-02-24 | Download | With the rapid advancement of Artificial Intelligence, the Graphics Processing Unit (GPU) has become increasingly essential across a growing number of safety-critical application domains. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators | Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu | 2026-02-24 | Download | Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. |