2026-02-03

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
A Parameterizable Convolution Accelerator for Embedded Deep Learning Applications	Panagiotis Mousouliotis, Georgios Keramidas	2026-02-03	Download	Convolutional neural network (CNN) accelerators implemented on Field-Programmable Gate Arrays (FPGAs) are typically designed with a primary focus on maximizing performance, often measured in giga-oper...
LLM-FSM: Scaling Large Language Models for Finite-State Reasoning in RTL Code Generation	Yuheng Wu, Berk Gokmen, Zhouhua Xie, Peijing Li, Caroline Trippel, Priyanka Raina, Thierry Tambe	2026-02-03	Download	Finite-state reasoning, the ability to understand and implement state-dependent behavior, is central to hardware design. In this paper, we present LLM-FSM, a benchmark that evaluates how well large la...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Pending Conflicts Make Progress Impossible	Petr Kuznetsov, Pierre Sutra, Guillermo Toyos-Marfurt	2026-02-03	Download	In this work, we study progress conditions for commutativity-aware, linearizable implementations of shared objects. Motivated by the observation that commuting operations can be executed in parallel, ...
Do We Need Asynchronous SGD? On the Near-Optimality of Synchronous Methods	Grigory Begunov, Alexander Tyurin	2026-02-03	Download	Modern distributed optimization methods mostly rely on traditional synchronous approaches, despite substantial recent progress in asynchronous optimization.
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments	Guangyi Liu, Pengxiang Zhao, Yaozhen Liang, Qinyi Luo, Shunye Tang, Yuxiang Chai, Weifeng Lin, Han Xiao, WenHao Wang, Siheng Chen, Zhengxi Lu, Gao Wu, Hao Wang, Liang Liu, Yong Liu	2026-02-03	Download	Current mobile GUI agent benchmarks systematically fail to assess memory capabilities, with only 5.2-11.8% memory-related tasks and no cross-session learning evaluation.
Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA	Pierre Aguié, Mathieu Even, Laurent Massoulié	2026-02-03	Download	We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in dec...
Evaluating Kubernetes Performance for GenAI Inference: From Automatic Speech Recognition to LLM Summarization	Sai Sindhur Malleni, Raúl Sevilla, Aleksei Vasilevskii, José Castillo Lema, André Bauer	2026-02-03	Download	As Generative AI (GenAI), particularly inference, rapidly emerges as a dominant workload category, the Kubernetes ecosystem is proactively evolving to natively support its unique demands.
Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation	Hyunji Jung, Sungbin Shin, Namhoon Lee	2026-02-03	Download	Asynchronous pipeline parallelism maximizes hardware utilization by eliminating the pipeline bubbles inherent in synchronous execution, offering a path toward efficient large-scale distributed trainin...
DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs	Zeyu Zhu, Gang Li, Peisong Wang, Zitao Mo, Minnan Pei, Zhuoran Song, Xiaoyao Liang, Jian Cheng	2026-02-03	Download	Mixture of Experts (MoE) architectures significantly enhance the capacity of LLMs without proportional increases in computation, but at the cost of a vast parameter size.
Recursive Energy Efficient Agreement	Shachar Meir, David Peleg	2026-02-03	Download	Agreement is a foundational problem in distributed computing that has been studied extensively for over four decades. Recently, Meir, Mirault, Peleg and Robinson introduced the notion of Energy...
Exploiting Multi-Core Parallelism in Blockchain Validation and Construction	Arivarasan Karmegam, Lucianna Kiffer, Antonio Fernández Anta	2026-02-03	Download	Blockchain validators can reduce block processing time by exploiting multi-core CPUs, but deterministic execution must preserve a given total order while respecting transaction conflicts and per-block...
Dynamic Topology Optimization for Non-IID Data in Decentralized Learning	Bart Cox, Antreas Ioannou, Jérémie Decouchant	2026-02-03	Download	Decentralized learning (DL) enables a set of nodes to train a model collaboratively without central coordination, offering benefits for privacy and scalability.
Experimental Analysis of Server-Side Caching for Web Performance	Mohammad Umar, Bharat Tripathi	2026-02-03	Download	Performance in web applications is a key aspect of user experience and system scalability. Among the different techniques used to improve web application performance, caching has been widely used.
Joint Network-and-Server Congestion in Multi-Source Traffic Allocation: A Convex Formulation and Price-Based Decentralization	Tamoghna Sarkar, Bhaskar Krishnamachari	2026-02-03	Download	This paper studies an important rate allocation problem that arises in many networked and distributed systems: steady-state traffic rate allocation from multiple sources to multiple service nodes when...
StreamShield: A Production-Proven Resiliency Solution for Apache Flink at ByteDance	Yong Fang, Yuxing Han, Meng Wang, Yifan Zhang, Yue Ma, Chi Zhang	2026-02-03	Download	Distributed Stream Processing Systems (DSPSs) form the backbone of real-time processing and analytics at ByteDance, where Apache Flink powers one of the largest production clusters worldwide.
Studying the Effect of Schedule Preemption on Dynamic Task Graph Scheduling	Mohammadali Khodabandehlou, Jared Coleman, Niranjan Suri, Bhaskar Krishnamachari	2026-02-03	Download	Dynamic scheduling of task graphs is often addressed without revisiting prior task allocations, with a primary focus on minimizing makespan. We study controlled schedule preemption, introducing the La...
Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control	Ruihan Lin, Zezhen Ding, Zean Han, Jiheng Zhang	2026-02-03	Download	Large Language Models (LLMs) are rapidly becoming critical infrastructure for enterprise applications, driving unprecedented demand for GPU-based inference services.
PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference	Rui Ning, Wei Zhang, Fan Lai	2026-02-03	Download	Attention efficiency is critical to large language model (LLM) inference. While prior advances optimize attention execution for individual requests (e.g.
It's Not Just Timestamps: A Study on Docker Reproducibility	Oreofe Solarin	2026-02-03	Download	Reproducible container builds promise a simple integrity check for software supply chains: rebuild an image from its Dockerfile and compare hashes.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Less is More: Optimizing Probe Selection Using Shared Latency Anomalies	Taveesh Sharma, Andrew Chu, Paul Schmitt, Francesco Bronzino, Nick Feamster, Nicole Marwell	2026-02-03	Download	Latency anomalies, defined as persistent or transient increases in round-trip time (RTT), are common in residential Internet performance. When multiple users observe anomalies to the same destination,...
Perfect Network Resilience in Polynomial Time	Matthias Bentert, Stefan Schmid	2026-02-03	Download	Modern communication networks support local fast rerouting mechanisms to quickly react to link failures: nodes store a set of conditional rerouting rules which define how to forward an incoming packet...
xDevSM: An Open-Source Framework for Portable, AI-Ready xApps Across Heterogeneous O-RAN Deployments	Angelo Feraudo, Stefano Maxenti, Andrea Lacava, Leonardo Bonati, Paolo Bellavista, Michele Polese, Tommaso Melodia	2026-02-03	Download	Openness and programmability in the O-RAN architecture enable closed-loop control of the Radio Access Network (RAN). Artificial Intelligence (AI)-driven xApps, in the near-real-time RAN Intelligent Co...
RIPPLE: Lifecycle-aware Embedding of Service Function Chains in Multi-access Edge Computing	Federico Giarrè, Holger Karl	2026-02-03	Download	In Multi-access Edge Computing networks, services can be deployed on nearby edge clouds (EC) as service function chains (SFCs) to meet strict quality of service (QoS) requirements.
On the Multi-Commodity Flow with convex objective function: Column-Generation approaches	Guillaume Beraud-Sudreau, Lucas Létocart, Youcef Magnouche, Sébastien Martin	2026-02-03	Download	The purpose of this work is to develop an algorithmic optimization approach for a capacitated Multi-Commodity flow problem, where the objective is to minimize the total link costs, where the cost of e...
Morphe: High-Fidelity Generative Video Streaming with Vision Foundation Model	Tianyi Gong, Zijian Cao, Zixing Zhang, Jiangkai Wu, Xinggong Zhang, Shuguang Cui, Fangxin Wang	2026-02-03	Download	Video streaming is a fundamental Internet service, but its quality still cannot be guaranteed, especially under poor network conditions such as in bandwidth-constrained and remote areas.
QASM: A Novel Framework for QUIC-Aware Stateful Middleboxes	Hari Hara Sudhan Selvam, Sameer G. Kulkarni	2026-02-03	Download	Stateful middleboxes are an integral part of enterprise and campus networks that provide essential in-network, security, and value-added services.
Towards Context-Aware Edge-Cloud Continuum Orchestration for Multi-user XR Services	Inhar Yeregui, Ángel Martín, Mikel Zorrilla, Roberto Viola, Jasone Astorga, Eduardo Jacob	2026-02-03	Download	The rapid growth of multi-user eXtended Reality (XR) applications, spanning fields such as entertainment, education, and telemedicine, demands seamless, immersive experiences for users interacting wit...
Joint Network-and-Server Congestion in Multi-Source Traffic Allocation: A Convex Formulation and Price-Based Decentralization	Tamoghna Sarkar, Bhaskar Krishnamachari	2026-02-03	Download	This paper studies an important rate allocation problem that arises in many networked and distributed systems: steady-state traffic rate allocation from multiple sources to multiple service nodes when...
Analyzing Zigbee Traffic: Datasets, Classification and Storage Trade-offs	Antonio Boiano, Dalin Zheng, Fabio Palmese, Andrea Pimpinella, Alessandro E. C. Redondi	2026-02-03	Download	Zigbee is widely used in smart home environments due to its low power consumption and support for mesh networking, making it a relevant target for traffic-based IoT forensic analysis.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Mitigating Timing-Based Attacks in Real-Time Cyber-Physical Systems	Arkaprava Sain, Sunandan Adhikary, Soumyajit Dey	2026-02-03	Download	Real-time cyber-physical systems depend on deterministic task execution to guarantee safety and correctness. Unfortunately, this determinism can unintentionally expose timing information that enables ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States	Ximing Dong, Shaowei Wang, Dayi Lin, Boyuan Chen, Ahmed E. Hassan	2026-02-03	Download	Large Language Models (LLMs) achieve strong performance across many tasks but suffer from high inference latency due to autoregressive decoding.
WebSplatter: Enabling Cross-Device Efficient Gaussian Splatting in Web Browsers via WebGPU	Yudong Han, Chao Xu, Xiaodan Ye, Weichen Bi, Zilong Dong, Yun Ma	2026-02-03	Download	We present WebSplatter, an end-to-end GPU rendering pipeline for the heterogeneous web ecosystem. Unlike naive ports, WebSplatter introduces a wait-free hierarchical radix sort that circumvents the la...
Accelerating the Tesseract Decoder for Quantum Error Correction	Dragana Grbic, Laleh Aghababaie Beni, Noah Shutty	2026-02-03	Download	Quantum Error Correction (QEC) is essential for building robust, fault-tolerant quantum computers; however, the decoding process often presents a significant computational bottleneck.