2026-02-24
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Dynamic Symmetric Point Tracking: Tackling Non-ideal Reference in Analog In-memory Training | Quan Xiao, Jindan Li, Zhaoxian Wu, Tayfun Gokmen, Tianyi Chen | 2026-02-24 | Download | Analog in-memory computing (AIMC) performs computation directly within resistive crossbar arrays, offering an energy-efficient platform to scale large vision and language models. |
| Heterogeneous Memory Design Exploration for AI Accelerators with a Gain Cell Memory Compiler | Xinxin Wang, Lixian Yan, Shuhan Liu, Luke Upton, Zhuoqi Cai, Yiming Tan, Shengman Li, Koustav Jana, Peijing Li, Jesse Cirimelli-Low, Thierry Tambe, Matthew Guthaus, H. -S. Philip Wong | 2026-02-24 | Download | As memory increasingly dominates system cost and energy, heterogeneous on-chip memory systems that combine technologies with complementary characteristics are becoming essential. |
| Machine Learning on Heterogeneous, Edge, and Quantum Hardware for Particle Physics (ML-HEQUPP) | Julia Gonski, Jenni Ott, Shiva Abbaszadeh, Sagar Addepalli, Matteo Cremonesi, Jennet Dickinson, Giuseppe Di Guglielmo, Erdem Yigit Ertorer, Lindsey Gray, Ryan Herbst, Christian Herwig, Tae Min Hong, Benedikt Maier, Maryam Bayat Makou, David Miller, Mark S. Neubauer, Cristián Peña, Dylan Rankin, Seon-Hee Seo, Giordon Stark, Alexander Tapper, Audrey Corbeil Therrien, Ioannis Xiotidis, Keisuke Yoshihara, G Abarajithan, Sagar Addepalli, Nural Akchurin, Carlos Argüelles, Saptaparna Bhattacharya, Lorenzo Borella, Christian Boutan, Tom Braine, James Brau, Martin Breidenbach, Antonio Chahine, Talal Ahmed Chowdhury, Yuan-Tang Chou, Seokju Chung, Alberto Coppi, Mariarosaria D'Alfonso, Abhilasha Dave, Chance Desmet, Angela Di Fulvio, Karri DiPetrillo, Javier Duarte, Auralee Edelen, Jan Eysermans, Yongbin Feng, Emmett Forrestel, Dolores Garcia, Loredana Gastaldo, Julián García Pardiñas, Lino Gerlach, Loukas Gouskos, Katya Govorkova, Carl Grace, Christopher Grant, Philip Harris, Ciaran Hasnip, Timon Heim, Abraham Holtermann, Tae Min Hong, Gian Michele Innocenti, Koji Ishidoshiro, Miaochen Jin, Jyothisraj Johnson, Stephen Jones, Andreas Jung, Georgia Karagiorgi, Ryan Kastner, Nicholas Kamp, Doojin Kim, Kyoungchul Kong, Katie Kudela, Jelena Lalic, Bo-Cheng Lai, Yun-Tsung Lai, Tommy Lam, Jeffrey Lazar, Aobo Li, Zepeng Li, Haoyun Liu, Vladimir Lončar, Luca Macchiarulo, Christopher Madrid, Benedikt Maier, Zhenghua Ma, Prashansa Mukim, Mark S. Neubauer, Victoria Nguyen, Sungbin Oh, Isobel Ojalvo, Hideyoshi Ozaki, Simone Pagan Griso, Myeonghun Park, Christoph Paus, Santosh Parajuli, Benjamin Parpillon, Sara Pozzi, Ema Puljak, Benjamin Ramhorst, Amy Roberts, Larry Ruckman, Kate Scholberg, Sebastian Schmitt, Noah Singer, Eluned Anne Smith, Alexandre Sousa, Michael Spannowsky, Sioni Summers, Yanwen Sun, Daniel Tapia Takaki, Antonino Tumeo, Caterina Vernieri, Belina von Krosigk, Yash Vora, Linyan Wan, Michael H. L. S. Wang, Amanda Weinstein, Andy White, Simon Williams, Felix Yu | 2026-02-24 | Download | The next generation of particle physics experiments will face a new era of challenges in data acquisition, due to unprecedented data rates and volumes along with extreme environments and operational c... |
| Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators | Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner | 2026-02-24 | Download | This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardwar... |
| RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators | Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu | 2026-02-24 | Download | Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. |
| LUTstructions: Self-loading FPGA-based Reconfigurable Instructions | Philippos Papaphilippou | 2026-02-24 | Download | General-purpose processors feature a limited number of instructions based on an instruction set. They can be numerous, such as with vector extensions that include hundreds or thousands of instructions... |
| TOM: A Ternary Read-only Memory Accelerator for LLM-powered Edge Intelligence | Hongyi Guan, Yijia Zhang, Wenqiang Wang, Yizhao Gao, Shijie Cao, Chen Zhang, Ningyi Xu | 2026-02-24 | Download | The deployment of Large Language Models (LLMs) for real-time intelligence on edge devices is rapidly growing. However, conventional hardware architectures face a fundamental memory wall challenge, whe... |
| Measurement-Free Ancilla Recycling via Blind Reset: A Cross-Platform Study on Superconducting and Trapped-Ion Processors | Sangkeum Lee | 2026-02-24 | Download | Ancilla reuse in repeated syndrome extraction couples reset quality to logical-cycle latency. We evaluate blind reset -- unitary-only recycling via scaled sequence replay -- on IQM Garnet, Rigetti Ank... |
| FAST-Prefill: FPGA Accelerated Sparse Attention for Long Context LLM Prefill | Rakshith Jayanth, Viktor Prasanna | 2026-02-24 | Download | In long-context large language model (LLM) inference, the prefill stage dominates computation due to self-attention over the complete input context. |
| SegSEM: Enabling and Enhancing SAM2 for SEM Contour Extraction | Da Chen, Guangyu Hu, Kaihong Xu, Kaichao Liang, Songjiang Li, Wei Yang, XiangYu Wen, Mingxuan Yuan | 2026-02-24 | Download | Extracting high-fidelity 2D contours from Scanning Electron Microscope (SEM) images is critical for calibrating Optical Proximity Correction (OPC) models. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| General Convex Agreement with Near-Optimal Communication | Marc Dufay, Diana Ghinea, Anton Paramonov | 2026-02-24 | Download | Convex Agreement (CA) strengthens Byzantine Agreement (BA) by requiring the output agreed upon to lie in the convex hull of the honest parties' inputs. |
| Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking | Ravi Ghadia, Maksim Abraham, Sergei Vorobyov, Max Ryabinin | 2026-02-24 | Download | Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. |
| Circumventing the CAP Theorem with Open Atomic Ethernet | Paul Borrill | 2026-02-24 | Download | The CAP theorem is routinely treated as a systems law: under network partition, a replicated service must sacrifice either consistency or availability. |
| Scaling State-Space Models on Multiple GPUs with Tensor Parallelism | Anurag Dutt, Nimit Shah, Hazem Masarani, Anshul Gandhi | 2026-02-24 | Download | Selective state space models (SSMs) have rapidly become a compelling backbone for large language models, especially for long-context workloads. |
| ReviveMoE: Fast Recovery for Hardware Failures in Large-Scale MoE LLM Inference Deployments | Haley Li, Xinglu Wang, Cong Feng, Chunxu Zuo, Yanan Wang, Hei Lo, Yufei Cui, Bingji Wang, Duo Cui, Shuming Jing, Yizhou Shan, Ying Xiong, Jiannan Wang, Yong Zhang, Zhenan Fan | 2026-02-24 | Download | As LLM deployments scale over more hardware, the probability of a single failure in a system increases significantly, and cloud operators must consider robust countermeasures to handle these inevitabl... |
| MineDraft: A Framework for Batch Parallel Speculative Decoding | Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low | 2026-02-24 | Download | Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. |
| Sensitivity-Guided Framework for Pruned and Quantized Reservoir Computing Accelerators | Atousa Jafari, Mahdi Taheri, Hassan Ghasemzadeh Mohammadi, Christian Herglotz, Marco Platzner | 2026-02-24 | Download | This paper presents a compression framework for Reservoir Computing that enables systematic design-space exploration of trade-offs among quantization levels, pruning rates, model accuracy, and hardwar... |
| Autonomous Edge-Deployed AI Agents for Electric Vehicle Charging Infrastructure Management | Mohammed Cherifi | 2026-02-24 | Download | Public EV charging infrastructure suffers from significant failure rates -- with field studies reporting up to 27.5% of DC fast chargers non-functional -- and multi-day mean time to resolution, imposi... |
| Benchmarking Federated Learning in Edge Computing Environments: A Systematic Review and Performance Evaluation | Sales Aribe, Gil Nicholas Cagande | 2026-02-24 | Download | Federated Learning (FL) has emerged as a transformative approach for distributed machine learning, particularly in edge computing environments where data privacy, low latency, and bandwidth efficiency... |
| Is a LOCAL algorithm computable? | Antonio Cruciani, Avinandan Das, Massimo Equi, Henrik Lievonen, Diep Luong-Le, Augusto Modanese, Jukka Suomela | 2026-02-24 | Download | Common definitions of the "standard" LOCAL model tend to be sloppy and even self-contradictory on one point: do the nodes update their state using an arbitrary function or a computable function? So fa... |
| A Morton-Type Space-Filling Curve for Pyramid Subdivision and Hybrid Adaptive Mesh Refinement | David Knapp, Johannes Albrecht Holke, Thomas Spenke, Carsten Burstedde | 2026-02-24 | Download | The forest-of-refinement-trees approach allows for dynamic adaptive mesh refinement (AMR) at negligible cost. While originally developed for quadrilateral and hexahedral elements, previous work establ... |
| RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators | Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu | 2026-02-24 | Download | Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. |
| RSH-SpMM: A Row-Structured Hybrid Kernel for Sparse Matrix-Matrix Multiplication on GPUs | Aiying Li, Jingwei Sun, Han Li, Wence Ji, Guangzhong Sun | 2026-02-24 | Download | Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads. |
| From Prompts to Performance: Evaluating LLMs for Task-based Parallel Code Generation | Linus Bantel, Moritz Strack, Alexander Strack, Dirk Pflüger | 2026-02-24 | Download | Large Language Models (LLMs) show strong abilities in code generation, but their skill in creating efficient parallel programs is less studied. |
| Lagom: Unleashing the Power of Communication and Computation Overlapping for Distributed LLM Training | Guanbin Xu, ZhenGuo Xu, Yuzhe Li, Youhui Bai, Ping Gong, Chaoyi Ruan, Cheng Li | 2026-02-24 | Download | Overlapping communication with computation is crucial for distributed large-model training, yet optimizing it, especially when computation becomes the bottleneck, remains challenging. |
| A Granularity Characterization of Task Scheduling Effectiveness | Sana Taghipour Anvari, David Kaeli | 2026-02-24 | Download | Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity. |
| Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning | Nihal Balivada, Shrey Gupta, Shashank Shreedhar Bhatt, Suyash Gupta | 2026-02-24 | Download | Federated Learning (FL) enables a distributed client-server architecture where multiple clients collaboratively train a global Machine Learning (ML) model without sharing sensitive local data. |
| Circumventing the FLP Impossibility Result with Open Atomic Ethernet | Paul Borrill | 2026-02-24 | Download | The Fischer–Lynch–Paterson (FLP) impossibility result is widely regarded as one of the most fundamental negative results in distributed computing: no deterministic protocol can guarantee consensus i... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Compensating the Packet Delay Variation for 6G Integrated with IEEE Time-Sensitive Networking | Marilet De Andrade, Joachim Sachs, Lucas Haug, Simon Egger, Frank Dürr, Balázs Varga, Janos Farkas, György Miklós | 2026-02-24 | Download | 6G is deemed a key technology to support emerging applications with stringent requirements for highly dependable and time-critical communication. |
| UnlinkableDFL: a Practical Mixnet Protocol for Churn-Tolerant Decentralized FL Model Sharing | Chao Feng, Thomas Grubl, Jan von der Assen, Sandrin Raphael Hunkeler, Linn Anna Spitz, Gerome Bovet, Burkhard Stiller | 2026-02-24 | Download | Decentralized Federated Learning (DFL) eliminates the need for a central aggregator, but it can expose communication patterns that reveal participant identities. |
| The Instability of all Backoff Protocols | Leslie Ann Goldberg, John Lapinskas | 2026-02-24 | Download | In this paper we prove Aldous's conjecture from 1987 that there is no backoff protocol that is stable for any positive arrival rate. The setting is a communication channel for coordinating requests fo... |
| Telemetry-Based Server Selection in the Quantum Internet via Cross-Layer Runtime Estimation | Masaki Nagai, Hideaki Kawaguchi, Shin Nishio, Takahiko Satoh | 2026-02-24 | Download | The Quantum Internet will allow clients to delegate quantum workloads to remote servers over heterogeneous networks, but choosing the server that minimizes end-to-end execution time is difficult becau... |
| Airavat: An Agentic Framework for Internet Measurement | Alagappan Ramanathan, Eunju Kang, Dongsu Han, Sangeetha Abdu Jyothi | 2026-02-24 | Download | Internet measurement faces twin challenges: complex analyses require expert-level orchestration of tools, yet even syntactically correct implementations can have methodological flaws and can be diffic... |
| Deep Reinforcement Learning Based Block Coordinate Descent for Downlink Weighted Sum-rate Maximization on AI-Native Wireless Networks | Siya Chen, Chee Wei Tan, H. Vincent Poor | 2026-02-24 | Download | This paper introduces a deep reinforcement learning-based block coordinate descent (DRL-based BCD) algorithm to address the nonconvex weighted sum-rate maximization (WSRM) problem with a total power c... |
| AWCP: A Workspace Delegation Protocol for Deep-Engagement Collaboration across Remote Agents | Xiaohang Nie, Zihan Guo, Youliang Chen, Yuanjian Zhou, Weinan Zhang | 2026-02-24 | Download | The rapid evolution of Large Language Model (LLM)-based autonomous agents is reshaping the digital landscape toward an emerging Agentic Web, where increasingly specialized agents must collaborate to a... |
| Threadle: A Memory-Efficient Network Storage and Query Engine for Large, Multilayer, and Mixed-mode Networks | Carl Nordlund, Yukun Jiao | 2026-02-24 | Download | We present Threadle, an open-source, high-performance, and memory-efficient network storage and query engine written in C#. Designed for working with full-population networks derived from administrati... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Exploiting Dependency and Parallelism: Real-Time Scheduling and Analysis for GPU Tasks | Yuanhai Zhang, Songyang He, Ruizhe Gou, Mingyue Cui, Boyang Li, Shuai Zhao, Kai Huang | 2026-02-24 | Download | With the rapid advancement of Artificial Intelligence, the Graphics Processing Unit (GPU) has become increasingly essential across a growing number of safety-critical application domains. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators | Xinsheng Tang, Yangcheng Li, Nan Wang, Zhiyi Shu, Xingyu Ling, Junna Xing, Peng Zhou, Qiang Liu | 2026-02-24 | Download | Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. |