2025-11-14

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Advanced Strategies for Uncertainty-Guided Live Measurement Sequencing in Fast, Robust SAR ADC Linearity Testing	Thorben Schey, Khaled Karoonlatifi, Michael Weyrich, Andrey Morozov	2025-11-14	Download	This paper builds on our Uncertainty-Guided Live Measurement Sequencing (UGLMS) method. UGLMS is a closed-loop test strategy that adaptively selects SAR ADC code edges based on model uncertainty and r...
Uncertainty-Guided Live Measurement Sequencing for Fast SAR ADC Linearity Testing	Thorben Schey, Khaled Karoonlatifi, Michael Weyrich, Andrey Morozov	2025-11-14	Download	This paper introduces a novel closed-loop testing methodology for efficient linearity testing of high-resolution Successive Approximation Register (SAR) Analog-to-Digital Converters (ADCs).
Autonomous Underwater Cognitive System for Adaptive Navigation: A SLAM-Integrated Cognitive Architecture	K. A. I. N Jayarathne, R. M. N. M. Rathnayaka, D. P. S. S. Peiris	2025-11-14	Download	Deep-sea exploration poses significant challenges, including disorientation, communication loss, and navigational failures in dynamic underwater environments.
T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup	Jianyu Wei, Qingtao Li, Shijie Cao, Lingxiao Ma, Zixu Hao, Yanyong Zhang, Xiaoyan Hu, Ting Cao	2025-11-14	Download	Large language models (LLMs) are increasingly deployed on customer devices. To support them, current devices are adopting SoCs (System on Chip) with NPUs (Neural Processing Unit) installed.
Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications	Jiaxi Li, Yue Zhu, Eun Kyung Lee, Klara Nahrstedt	2025-11-14	Download	Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workload...
A Compilation Framework for Quantum Circuits with Mid-Circuit Measurement Error Awareness	Ming Zhong, Zhemin Zhang, Xiangyu Ren, Chenghong Zhu, Siyuan Niu, Zhiding Liang	2025-11-14	Download	Mid-circuit measurement (MCM) provides the capability for qubit reuse and dynamic control in quantum processors, enabling more resource-efficient algorithms and supporting error-correction procedures.
MMA-Sim: Bit-Accurate Reference Model of Tensor Cores and Matrix Cores	Peichen Xie, Yang Wang, Fan Yang, Mao Yang	2025-11-14	Download	The rapidly growing computation demands of deep neural networks (DNNs) have driven hardware vendors to integrate matrix multiplication accelerators (MMAs), such as NVIDIA Tensor Cores and AMD Matrix C...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Modular GPU Programming with Typed Perspectives	Manya Bansal, Daniel Sainati, Joseph W. Cutler, Saman Amarasinghe, Jonathan Ragan-Kelley	2025-11-14	Download	To achieve peak performance on modern GPUs, one must balance two frames of mind: issuing instructions to individual threads to control their behavior, while simultaneously tracking the convergence of ...
KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference	Huawei Zhang, Chunwei Xia, Zheng Wang	2025-11-14	Download	Language models (LMs) underpin emerging mobile and embedded AI applications like meeting and video summarization and document analysis, which often require processing multiple long-context inputs.
Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation	Camila Machado de Araujo, Egon P. B. S. Borges, Ricardo Marcelo Canteiro Grangeiro, Allan Pinto	2025-11-14	Download	High-resolution volumetric imaging techniques, such as X-ray tomography and advanced microscopy, generate increasingly large datasets that challenge existing tools for efficient processing, segmentati...
Flash-Fusion: Enabling Expressive, Low-Latency Queries on IoT Sensor Streams with LLMs	Kausar Patherya, Ashutosh Dhekne, Francisco Romero	2025-11-14	Download	Smart cities and pervasive IoT deployments have generated interest in IoT data analysis across transportation and urban planning. At the same time, Large Language Models offer a new interface for expl...
TD-Orch: Scalable Load-Balancing for Distributed Systems with Applications to Graph Processing	Yiwei Zhao, Qiushi Lin, Hongbo Kang, Guy E. Blelloch, Laxman Dhulipala, Yan Gu, Charles McGuffey, Phillip B. Gibbons	2025-11-14	Download	In this paper, we introduce a task-data orchestration abstraction that supports a range of distributed applications, including graph processing and key-value stores.
A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication	Angelo Rodio, Giovanni Neglia, Zheng Chen, Erik G. Larsson	2025-11-14	Download	In semi-decentralized federated learning, devices primarily rely on device-to-device communication but occasionally interact with a central server.
Beyond Exascale: Dataflow Domain Translation on a Cerebras Cluster	Tomas Oppelstrup, Nicholas Giamblanco, Delyan Z. Kalchev, Ilya Sharapov, Mark Taylor, Dirk Van Essendelft, Sivasankaran Rajamanickam, Michael James	2025-11-14	Download	Simulation of physical systems is essential across scientific and engineering domains. Commonly used domain decomposition methods are unable to simultaneously deliver both high simulation rate and hig...
UFO3: Weaving the Digital Agent Galaxy	Chaoyun Zhang, Liqun Li, He Huang, Chiming Ni, Bo Qiao, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang	2025-11-14	Download	Large language model (LLM)-powered agents are transforming digital devices from passive tools into proactive intelligent collaborators. However, most existing frameworks remain confined to a single OS...
What happens when nanochat meets DiLoCo?	Alexander Acker, Soeren Becker, Sasho Nedelkoski, Dominik Scheinert, Odej Kao, Philipp Wiesner	2025-11-14	Download	Although LLM training is typically centralized with high-bandwidth interconnects and large compute budgets, emerging methods target communication-constrained training in distributed environments.
SMART: A Surrogate Model for Predicting Application Runtime in Dragonfly Systems	Xin Wang, Pietro Lodi Rizzini, Sourav Medya, Zhiling Lan	2025-11-14	Download	The Dragonfly network, with its high-radix and low-diameter structure, is a leading interconnect in high-performance computing. A major challenge is workload interference on shared network links.
SemanticNN: Compressive and Error-Resilient Semantic Offloading for Extremely Weak Devices	Jiaming Huang, Yi Gao, Fuchang Pan, Renjie Li, Wei Dong	2025-11-14	Download	With the rapid growth of the Internet of Things (IoT), integrating artificial intelligence (AI) on extremely weak embedded devices has garnered significant attention, enabling improved real-time perfo...
Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications	Jiaxi Li, Yue Zhu, Eun Kyung Lee, Klara Nahrstedt	2025-11-14	Download	Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workload...
Cascading Bandits With Feedback	R Sri Prakash, Nikhil Karamchandani, Sharayu Moharir	2025-11-14	Download	Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability.
Root Cause Analysis for Microservice Systems via Cascaded Conditional Learning with Hypergraphs	Shuaiyu Xie, Hanbin He, Jian Wang, Bing Li	2025-11-14	Download	Root cause analysis in microservice systems typically involves two core tasks: root cause localization (RCL) and failure type identification (FTI).

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Heterogeneous CACC Coexistence: Simulation, Analysis, and Modeling	Lorenzo Ghiro, Marco Franceschini, Renato Lo Cigno, Michele Segata	2025-11-14	Download	The design of Cooperative Adaptive Cruise Control (CACC) algorithms for vehicle platooning has been extensively investigated, leading to a wide range of approaches with different requirements and perf...
SoK: Security Evaluation of Wi-Fi CSI Biometrics: Attacks, Metrics, and Open Challenges	Gioliano de Oliveira Braga, Pedro Henrique dos Santos Rocha, Rafael Pimenta de Mattos Paixão, Giovani Hoff da Costa, Gustavo Cavalcanti Morais, Lourenço Alves Pereira Júnior	2025-11-14	Download	Wi-Fi Channel State Information (CSI) has been repeatedly proposed as a biometric modality, often with reports of high accuracy and operational feasibility.
Use Cases, Metrics, and Challenges of Nomadic Non-Public Networks for the 6G Standardization	Daniel Lindenschmitt, Michael Gundall, Ainur Daurembekova, Marcos Rates Crippa, Mohammad Asif Habibi, Bin Han, Philipp Rosemann, Dennis Krummacker, Benedikt Veith, Hans D. Schotten	2025-11-14	Download	Wireless communication is evolving with the adoption of dynamic and self-organizing networks. They are expected to play a crucial role in shaping sixth-generation (6G) systems and the ongoing standard...
Advancing IoT System Dependability: A Deep Dive into Management and Operation Plane Separation	Luoyao Hao, Shuo Zhang, Henning Schulzrinne	2025-11-14	Download	We propose to enhance the dependability of large-scale IoT systems by separating the management and operation plane. We innovate the management plane to enforce overarching policies, such as safety no...
Constrained Network Slice Assignment via Large Language Models	Sagar Sudhakara, Pankaj Rajak	2025-11-14	Download	Modern networks support network slicing, which partitions physical infrastructure into virtual slices tailored to different service requirements (for example, high bandwidth or low latency).

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Looking Forward: Challenges and Opportunities in Agentic AI Reliability	Liudong Xing, Janet Lin	2025-11-14	Download	This chapter presents perspectives for challenges and future development in building reliable AI systems, particularly, agentic AI systems. Several open research problems related to mitigating the ris...
Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels	Arun Thangamani, Md Asghar Ahmad Shahid, Adam Siemieniuk, Rolf Morel, Renato Golin, Alexander Heinecke	2025-11-14	Download	The rapidly evolving landscape of AI and machine learning workloads has widened the gap between high-level domain operations and efficient hardware utilization.
Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications	Jiaxi Li, Yue Zhu, Eun Kyung Lee, Klara Nahrstedt	2025-11-14	Download	Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workload...