2025-11-14

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Advanced Strategies for Uncertainty-Guided Live Measurement Sequencing in Fast, Robust SAR ADC Linearity Testing | Thorben Schey, Khaled Karoonlatifi, Michael Weyrich, Andrey Morozov | 2025-11-14 | Download | This paper builds on our Uncertainty-Guided Live Measurement Sequencing (UGLMS) method. UGLMS is a closed-loop test strategy that adaptively selects SAR ADC code edges based on model uncertainty and r... |
| Uncertainty-Guided Live Measurement Sequencing for Fast SAR ADC Linearity Testing | Thorben Schey, Khaled Karoonlatifi, Michael Weyrich, Andrey Morozov | 2025-11-14 | Download | This paper introduces a novel closed-loop testing methodology for efficient linearity testing of high-resolution Successive Approximation Register (SAR) Analog-to-Digital Converters (ADCs). |
| Autonomous Underwater Cognitive System for Adaptive Navigation: A SLAM-Integrated Cognitive Architecture | K. A. I. N Jayarathne, R. M. N. M. Rathnayaka, D. P. S. S. Peiris | 2025-11-14 | Download | Deep-sea exploration poses significant challenges, including disorientation, communication loss, and navigational failures in dynamic underwater environments. |
| T-MAN: Enabling End-to-End Low-Bit LLM Inference on NPUs via Unified Table Lookup | Jianyu Wei, Qingtao Li, Shijie Cao, Lingxiao Ma, Zixu Hao, Yanyong Zhang, Xiaoyan Hu, Ting Cao | 2025-11-14 | Download | Large language models (LLMs) are increasingly deployed on customer devices. To support them, current devices are adopting SoCs (System on Chip) with NPUs (Neural Processing Unit) installed. |
| Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications | Jiaxi Li, Yue Zhu, Eun Kyung Lee, Klara Nahrstedt | 2025-11-14 | Download | Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workload... |
| A Compilation Framework for Quantum Circuits with Mid-Circuit Measurement Error Awareness | Ming Zhong, Zhemin Zhang, Xiangyu Ren, Chenghong Zhu, Siyuan Niu, Zhiding Liang | 2025-11-14 | Download | Mid-circuit measurement (MCM) provides the capability for qubit reuse and dynamic control in quantum processors, enabling more resource-efficient algorithms and supporting error-correction procedures. |
| MMA-Sim: Bit-Accurate Reference Model of Tensor Cores and Matrix Cores | Peichen Xie, Yang Wang, Fan Yang, Mao Yang | 2025-11-14 | Download | The rapidly growing computation demands of deep neural networks (DNNs) have driven hardware vendors to integrate matrix multiplication accelerators (MMAs), such as NVIDIA Tensor Cores and AMD Matrix C... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Modular GPU Programming with Typed Perspectives | Manya Bansal, Daniel Sainati, Joseph W. Cutler, Saman Amarasinghe, Jonathan Ragan-Kelley | 2025-11-14 | Download | To achieve peak performance on modern GPUs, one must balance two frames of mind: issuing instructions to individual threads to control their behavior, while simultaneously tracking the convergence of ... |
| KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference | Huawei Zhang, Chunwei Xia, Zheng Wang | 2025-11-14 | Download | Language models (LMs) underpin emerging mobile and embedded AI applications like meeting and video summarization and document analysis, which often require processing multiple long-context inputs. |
| Advancing Annotat3D with Harpia: A CUDA-Accelerated Library For Large-Scale Volumetric Data Segmentation | Camila Machado de Araujo, Egon P. B. S. Borges, Ricardo Marcelo Canteiro Grangeiro, Allan Pinto | 2025-11-14 | Download | High-resolution volumetric imaging techniques, such as X-ray tomography and advanced microscopy, generate increasingly large datasets that challenge existing tools for efficient processing, segmentati... |
| Flash-Fusion: Enabling Expressive, Low-Latency Queries on IoT Sensor Streams with LLMs | Kausar Patherya, Ashutosh Dhekne, Francisco Romero | 2025-11-14 | Download | Smart cities and pervasive IoT deployments have generated interest in IoT data analysis across transportation and urban planning. At the same time, Large Language Models offer a new interface for expl... |
| TD-Orch: Scalable Load-Balancing for Distributed Systems with Applications to Graph Processing | Yiwei Zhao, Qiushi Lin, Hongbo Kang, Guy E. Blelloch, Laxman Dhulipala, Yan Gu, Charles McGuffey, Phillip B. Gibbons | 2025-11-14 | Download | In this paper, we introduce a task-data orchestration abstraction that supports a range of distributed applications, including graph processing and key-value stores. |
| A Unified Convergence Analysis for Semi-Decentralized Learning: Sampled-to-Sampled vs. Sampled-to-All Communication | Angelo Rodio, Giovanni Neglia, Zheng Chen, Erik G. Larsson | 2025-11-14 | Download | In semi-decentralized federated learning, devices primarily rely on device-to-device communication but occasionally interact with a central server. |
| Beyond Exascale: Dataflow Domain Translation on a Cerebras Cluster | Tomas Oppelstrup, Nicholas Giamblanco, Delyan Z. Kalchev, Ilya Sharapov, Mark Taylor, Dirk Van Essendelft, Sivasankaran Rajamanickam, Michael James | 2025-11-14 | Download | Simulation of physical systems is essential across scientific and engineering domains. Commonly used domain decomposition methods are unable to simultaneously deliver both high simulation rate and hig... |
| UFO3: Weaving the Digital Agent Galaxy | Chaoyun Zhang, Liqun Li, He Huang, Chiming Ni, Bo Qiao, Si Qin, Yu Kang, Minghua Ma, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang | 2025-11-14 | Download | Large language model (LLM)-powered agents are transforming digital devices from passive tools into proactive intelligent collaborators. However, most existing frameworks remain confined to a single OS... |
| What happens when nanochat meets DiLoCo? | Alexander Acker, Soeren Becker, Sasho Nedelkoski, Dominik Scheinert, Odej Kao, Philipp Wiesner | 2025-11-14 | Download | Although LLM training is typically centralized with high-bandwidth interconnects and large compute budgets, emerging methods target communication-constrained training in distributed environments. |
| SMART: A Surrogate Model for Predicting Application Runtime in Dragonfly Systems | Xin Wang, Pietro Lodi Rizzini, Sourav Medya, Zhiling Lan | 2025-11-14 | Download | The Dragonfly network, with its high-radix and low-diameter structure, is a leading interconnect in high-performance computing. A major challenge is workload interference on shared network links. |
| SemanticNN: Compressive and Error-Resilient Semantic Offloading for Extremely Weak Devices | Jiaming Huang, Yi Gao, Fuchang Pan, Renjie Li, Wei Dong | 2025-11-14 | Download | With the rapid growth of the Internet of Things (IoT), integrating artificial intelligence (AI) on extremely weak embedded devices has garnered significant attention, enabling improved real-time perfo... |
| Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications | Jiaxi Li, Yue Zhu, Eun Kyung Lee, Klara Nahrstedt | 2025-11-14 | Download | Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workload... |
| Cascading Bandits With Feedback | R Sri Prakash, Nikhil Karamchandani, Sharayu Moharir | 2025-11-14 | Download | Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability. |
| Root Cause Analysis for Microservice Systems via Cascaded Conditional Learning with Hypergraphs | Shuaiyu Xie, Hanbin He, Jian Wang, Bing Li | 2025-11-14 | Download | Root cause analysis in microservice systems typically involves two core tasks: root cause localization (RCL) and failure type identification (FTI). |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Heterogeneous CACC Coexistence: Simulation, Analysis, and Modeling | Lorenzo Ghiro, Marco Franceschini, Renato Lo Cigno, Michele Segata | 2025-11-14 | Download | The design of Cooperative Adaptive Cruise Control (CACC) algorithms for vehicle platooning has been extensively investigated, leading to a wide range of approaches with different requirements and perf... |
| SoK: Security Evaluation of Wi-Fi CSI Biometrics: Attacks, Metrics, and Open Challenges | Gioliano de Oliveira Braga, Pedro Henrique dos Santos Rocha, Rafael Pimenta de Mattos Paixão, Giovani Hoff da Costa, Gustavo Cavalcanti Morais, Lourenço Alves Pereira Júnior | 2025-11-14 | Download | Wi-Fi Channel State Information (CSI) has been repeatedly proposed as a biometric modality, often with reports of high accuracy and operational feasibility. |
| Use Cases, Metrics, and Challenges of Nomadic Non-Public Networks for the 6G Standardization | Daniel Lindenschmitt, Michael Gundall, Ainur Daurembekova, Marcos Rates Crippa, Mohammad Asif Habibi, Bin Han, Philipp Rosemann, Dennis Krummacker, Benedikt Veith, Hans D. Schotten | 2025-11-14 | Download | Wireless communication is evolving with the adoption of dynamic and self-organizing networks. They are expected to play a crucial role in shaping sixth-generation (6G) systems and the ongoing standard... |
| Advancing IoT System Dependability: A Deep Dive into Management and Operation Plane Separation | Luoyao Hao, Shuo Zhang, Henning Schulzrinne | 2025-11-14 | Download | We propose to enhance the dependability of large-scale IoT systems by separating the management and operation plane. We innovate the management plane to enforce overarching policies, such as safety no... |
| Constrained Network Slice Assignment via Large Language Models | Sagar Sudhakara, Pankaj Rajak | 2025-11-14 | Download | Modern networks support network slicing, which partitions physical infrastructure into virtual slices tailored to different service requirements (for example, high bandwidth or low latency). |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Looking Forward: Challenges and Opportunities in Agentic AI Reliability | Liudong Xing, Janet, Lin | 2025-11-14 | Download | This chapter presents perspectives for challenges and future development in building reliable AI systems, particularly, agentic AI systems. Several open research problems related to mitigating the ris... |
| Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels | Arun Thangamani, Md Asghar Ahmad Shahid, Adam Siemieniuk, Rolf Morel, Renato Golin, Alexander Heinecke | 2025-11-14 | Download | The rapidly evolving landscape of AI and machine learning workloads has widened the gap between high-level domain operations and efficient hardware utilization. |
| Revisiting Disaggregated Large Language Model Serving for Performance and Energy Implications | Jiaxi Li, Yue Zhu, Eun Kyung Lee, Klara Nahrstedt | 2025-11-14 | Download | Different from traditional Large Language Model (LLM) serving that colocates the prefill and decode stages on the same GPU, disaggregated serving dedicates distinct GPUs to prefill and decode workload... |
