2026-02-03

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Parameterizable Convolution Accelerator for Embedded Deep Learning Applications | Panagiotis Mousouliotis, Georgios Keramidas | 2026-02-03 | Download | Convolutional neural network (CNN) accelerators implemented on Field-Programmable Gate Arrays (FPGAs) are typically designed with a primary focus on maximizing performance, often measured in giga-oper... |
| LLM-FSM: Scaling Large Language Models for Finite-State Reasoning in RTL Code Generation | Yuheng Wu, Berk Gokmen, Zhouhua Xie, Peijing Li, Caroline Trippel, Priyanka Raina, Thierry Tambe | 2026-02-03 | Download | Finite-state reasoning, the ability to understand and implement state-dependent behavior, is central to hardware design. In this paper, we present LLM-FSM, a benchmark that evaluates how well large la... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Pending Conflicts Make Progress Impossible | Petr Kuznetsov, Pierre Sutra, Guillermo Toyos-Marfurt | 2026-02-03 | Download | In this work, we study progress conditions for commutativity-aware, linearizable implementations of shared objects. Motivated by the observation that commuting operations can be executed in parallel, ... |
| Do We Need Asynchronous SGD? On the Near-Optimality of Synchronous Methods | Grigory Begunov, Alexander Tyurin | 2026-02-03 | Download | Modern distributed optimization methods mostly rely on traditional synchronous approaches, despite substantial recent progress in asynchronous optimization. |
| MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments | Guangyi Liu, Pengxiang Zhao, Yaozhen Liang, Qinyi Luo, Shunye Tang, Yuxiang Chai, Weifeng Lin, Han Xiao, WenHao Wang, Siheng Chen, Zhengxi Lu, Gao Wu, Hao Wang, Liang Liu, Yong Liu | 2026-02-03 | Download | Current mobile GUI agent benchmarks systematically fail to assess memory capabilities, with only 5.2-11.8% memory-related tasks and no cross-session learning evaluation. |
| Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA | Pierre Aguié, Mathieu Even, Laurent Massoulié | 2026-02-03 | Download | We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in dec... |
| Evaluating Kubernetes Performance for GenAI Inference: From Automatic Speech Recognition to LLM Summarization | Sai Sindhur Malleni, Raúl Sevilla, Aleksei Vasilevskii, José Castillo Lema, André Bauer | 2026-02-03 | Download | As Generative AI (GenAI), particularly inference, rapidly emerges as a dominant workload category, the Kubernetes ecosystem is proactively evolving to natively support its unique demands. |
| Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation | Hyunji Jung, Sungbin Shin, Namhoon Lee | 2026-02-03 | Download | Asynchronous pipeline parallelism maximizes hardware utilization by eliminating the pipeline bubbles inherent in synchronous execution, offering a path toward efficient large-scale distributed trainin... |
| DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs | Zeyu Zhu, Gang Li, Peisong Wang, Zitao Mo, Minnan Pei, Zhuoran Song, Xiaoyao Liang, Jian Cheng | 2026-02-03 | Download | Mixture of Experts (MoE) architectures significantly enhance the capacity of LLMs without proportional increases in computation, but at the cost of a vast parameter size. |
| Recursive Energy Efficient Agreement | Shachar Meir, David Peleg | 2026-02-03 | Download | Agreement is a foundational problem in distributed computing that has been studied extensively for over four decades. Recently, Meir, Mirault, Peleg and Robinson introduced the notion of Energy... |
| Exploiting Multi-Core Parallelism in Blockchain Validation and Construction | Arivarasan Karmegam, Lucianna Kiffer, Antonio Fernández Anta | 2026-02-03 | Download | Blockchain validators can reduce block processing time by exploiting multi-core CPUs, but deterministic execution must preserve a given total order while respecting transaction conflicts and per-block... |
| Dynamic Topology Optimization for Non-IID Data in Decentralized Learning | Bart Cox, Antreas Ioannou, Jérémie Decouchant | 2026-02-03 | Download | Decentralized learning (DL) enables a set of nodes to train a model collaboratively without central coordination, offering benefits for privacy and scalability. |
| Experimental Analysis of Server-Side Caching for Web Performance | Mohammad Umar, Bharat Tripathi | 2026-02-03 | Download | Performance in web applications is a key aspect of user experience and system scalability. Among the different techniques used to improve web application performance, caching has been widely used. |
| Joint Network-and-Server Congestion in Multi-Source Traffic Allocation: A Convex Formulation and Price-Based Decentralization | Tamoghna Sarkar, Bhaskar Krishnamachari | 2026-02-03 | Download | This paper studies an important rate allocation problem that arises in many networked and distributed systems: steady-state traffic rate allocation from multiple sources to multiple service nodes when... |
| StreamShield: A Production-Proven Resiliency Solution for Apache Flink at ByteDance | Yong Fang, Yuxing Han, Meng Wang, Yifan Zhang, Yue Ma, Chi Zhang | 2026-02-03 | Download | Distributed Stream Processing Systems (DSPSs) form the backbone of real-time processing and analytics at ByteDance, where Apache Flink powers one of the largest production clusters worldwide. |
| Studying the Effect of Schedule Preemption on Dynamic Task Graph Scheduling | Mohammadali Khodabandehlou, Jared Coleman, Niranjan Suri, Bhaskar Krishnamachari | 2026-02-03 | Download | Dynamic scheduling of task graphs is often addressed without revisiting prior task allocations, with a primary focus on minimizing makespan. We study controlled schedule preemption, introducing the La... |
| Large-Scale LLM Inference with Heterogeneous Workloads: Prefill-Decode Contention and Asymptotically Optimal Control | Ruihan Lin, Zezhen Ding, Zean Han, Jiheng Zhang | 2026-02-03 | Download | Large Language Models (LLMs) are rapidly becoming critical infrastructure for enterprise applications, driving unprecedented demand for GPU-based inference services. |
| PackInfer: Compute- and I/O-Efficient Attention for Batched LLM Inference | Rui Ning, Wei Zhang, Fan Lai | 2026-02-03 | Download | Attention efficiency is critical to large language model (LLM) inference. While prior advances optimize attention execution for individual requests (e.g. |
| It's Not Just Timestamps: A Study on Docker Reproducibility | Oreofe Solarin | 2026-02-03 | Download | Reproducible container builds promise a simple integrity check for software supply chains: rebuild an image from its Dockerfile and compare hashes. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Less is More: Optimizing Probe Selection Using Shared Latency Anomalies | Taveesh Sharma, Andrew Chu, Paul Schmitt, Francesco Bronzino, Nick Feamster, Nicole Marwell | 2026-02-03 | Download | Latency anomalies, defined as persistent or transient increases in round-trip time (RTT), are common in residential Internet performance. When multiple users observe anomalies to the same destination,... |
| Perfect Network Resilience in Polynomial Time | Matthias Bentert, Stefan Schmid | 2026-02-03 | Download | Modern communication networks support local fast rerouting mechanisms to quickly react to link failures: nodes store a set of conditional rerouting rules which define how to forward an incoming packet... |
| xDevSM: An Open-Source Framework for Portable, AI-Ready xApps Across Heterogeneous O-RAN Deployments | Angelo Feraudo, Stefano Maxenti, Andrea Lacava, Leonardo Bonati, Paolo Bellavista, Michele Polese, Tommaso Melodia | 2026-02-03 | Download | Openness and programmability in the O-RAN architecture enable closed-loop control of the Radio Access Network (RAN). Artificial Intelligence (AI)-driven xApps, in the near-real-time RAN Intelligent Co... |
| RIPPLE: Lifecycle-aware Embedding of Service Function Chains in Multi-access Edge Computing | Federico Giarrè, Holger Karl | 2026-02-03 | Download | In Multi-access Edge Computing networks, services can be deployed on nearby edge clouds (EC) as service function chains (SFCs) to meet strict quality of service (QoS) requirements. |
| On the Multi-Commodity Flow with convex objective function: Column-Generation approaches | Guillaume Beraud-Sudreau, Lucas Létocart, Youcef Magnouche, Sébastien Martin | 2026-02-03 | Download | The purpose of this work is to develop an algorithmic optimization approach for a capacitated Multi-Commodity flow problem, where the objective is to minimize the total link costs, where the cost of e... |
| Morphe: High-Fidelity Generative Video Streaming with Vision Foundation Model | Tianyi Gong, Zijian Cao, Zixing Zhang, Jiangkai Wu, Xinggong Zhang, Shuguang Cui, Fangxin Wang | 2026-02-03 | Download | Video streaming is a fundamental Internet service, while the quality still cannot be guaranteed especially in poor network conditions such as bandwidth-constrained and remote areas. |
| QASM: A Novel Framework for QUIC-Aware Stateful Middleboxes | Hari Hara Sudhan Selvam, Sameer G. Kulkarni | 2026-02-03 | Download | Stateful Middleboxes are an integral part of enterprise and campus networks that provide essential in-network, security, and value-added services. |
| Towards Context-Aware Edge-Cloud Continuum Orchestration for Multi-user XR Services | Inhar Yeregui, Ángel Martín, Mikel Zorrilla, Roberto Viola, Jasone Astorga, Eduardo Jacob | 2026-02-03 | Download | The rapid growth of multi-user eXtended Reality (XR) applications, spanning fields such as entertainment, education, and telemedicine, demands seamless, immersive experiences for users interacting wit... |
| Joint Network-and-Server Congestion in Multi-Source Traffic Allocation: A Convex Formulation and Price-Based Decentralization | Tamoghna Sarkar, Bhaskar Krishnamachari | 2026-02-03 | Download | This paper studies an important rate allocation problem that arises in many networked and distributed systems: steady-state traffic rate allocation from multiple sources to multiple service nodes when... |
| Analyzing Zigbee Traffic: Datasets, Classification and Storage Trade-offs | Antonio Boiano, Dalin Zheng, Fabio Palmese, Andrea Pimpinella, Alessandro E. C. Redondi | 2026-02-03 | Download | Zigbee is widely used in smart home environments due to its low power consumption and support for mesh networking, making it a relevant target for traffic-based IoT forensic analysis. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Mitigating Timing-Based Attacks in Real-Time Cyber-Physical Systems | Arkaprava Sain, Sunandan Adhikary, Soumyajit Dey | 2026-02-03 | Download | Real-time cyber-physical systems depend on deterministic task execution to guarantee safety and correctness. Unfortunately, this determinism can unintentionally expose timing information that enables ... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States | Ximing Dong, Shaowei Wang, Dayi Lin, Boyuan Chen, Ahmed E. Hassan | 2026-02-03 | Download | Large Language Models (LLMs) achieve strong performance across many tasks but suffer from high inference latency due to autoregressive decoding. |
| WebSplatter: Enabling Cross-Device Efficient Gaussian Splatting in Web Browsers via WebGPU | Yudong Han, Chao Xu, Xiaodan Ye, Weichen Bi, Zilong Dong, Yun Ma | 2026-02-03 | Download | We present WebSplatter, an end-to-end GPU rendering pipeline for the heterogeneous web ecosystem. Unlike naive ports, WebSplatter introduces a wait-free hierarchical radix sort that circumvents the la... |
| Accelerating the Tesseract Decoder for Quantum Error Correction | Dragana Grbic, Laleh Aghababaie Beni, Noah Shutty | 2026-02-03 | Download | Quantum Error Correction (QEC) is essential for building robust, fault-tolerant quantum computers; however, the decoding process often presents a significant computational bottleneck. |