
2025-10-31

cs.AR - Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits | Dowon Kim, MinJae Lee, Janghyeon Kim, HyuckSung Kwon, Hyeonggyu Jeong, Sang-Soo Park, Minyong Yoon, Si-Dong Roh, Yongsuk Kwon, Jinin So, Jungwook Choi | 2025-10-31 | The expansion of context windows in large language models (LLMs) to multi-million tokens introduces severe memory and compute bottlenecks, particularly in managing the growing Key-Value (KV) cache. |
| PEARL: Power- and Energy-Aware Multicore Intermittent Computing | Khakim Akhunov, Eren Yildiz, Kasim Sinan Yildirim | 2025-10-31 | Low-power multicore platforms are suitable for running data-intensive tasks in parallel, but they are highly inefficient for computing on intermittent power. |
| H-FA: A Hybrid Floating-Point and Logarithmic Approach to Hardware Accelerated FlashAttention | Kosmas Alexandridis, Giorgos Dimitrakopoulos | 2025-10-31 | Transformers have significantly advanced AI and machine learning through their powerful attention mechanism. However, computing attention on long sequences can become a computational bottleneck. |
| A Memory-Efficient Retrieval Architecture for RAG-Enabled Wearable Medical LLMs-Agents | Zhipeng Liao, Kunming Shao, Jiangnan Yu, Liang Zhao, Tim Kwang-Ting Cheng, Chi-Ying Tsui, Jie Yang, Mohamad Sawan | 2025-10-31 | With powerful and integrative large language models (LLMs), medical AI agents have demonstrated unique advantages in providing personalized medical consultations, continuous health monitoring, and pre... |
| Descriptor-Based Object-Aware Memory Systems: A Comprehensive Review | Dong Tong | 2025-10-31 | The security and efficiency of modern computing systems are fundamentally undermined by the absence of a native architectural mechanism to propagate high-level program semantics, such as object identi... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Tetris: An SLA-aware Application Placement Strategy in the Edge-Cloud Continuum | Lucas Almeida, Maycon Peixoto | 2025-10-31 | An Edge-Cloud Continuum integrates edge and cloud resources to provide a flexible and scalable infrastructure. This paradigm can minimize latency by processing data closer to the source at the edge wh... |
| LongCat-Flash-Omni Technical Report | Meituan LongCat Team, Bairui Wang, Bayan, Bin Xiao, Bo Zhang, Bolin Rong, Borun Chen, Chang Wan, Chao Zhang, Chen Huang, Chen Chen, Chen Chen, Chengxu Yang, Chengzuo Yang, Cong Han, Dandan Peng, Delian Ruan, Detai Xin, Disong Wang, Dongchao Yang, Fanfan Liu, Fengjiao Chen, Fengyu Yang, Gan Dong, Gang Huang, Gang Xu, Guanglu Wan, Guoqiang Tan, Guoqiao Yu, Haibo Qiu, Hao Lu, Hongbo Liu, Hongyu Xiang, Jiaheng Wu, Jian Yang, Jiaxing Liu, Jing Huang, Jingang Wang, Jinrui Ding, Juchao Jiang, Jun Kuang, Jun Wang, Junhui Mei, Ke Ding, Kefeng Zhang, Lei Chen, Liang Shi, Limeng Qiao, Liming Zheng, Lin Ma, Liuyang Guo, Liya Ma, Luying Sun, Man Gao, Mengshen Zhu, Miao Cao, Minliang Lin, Nuo Xu, Peng Shi, Qi Zhang, Qian Fang, Qian Wang, Qian Yang, Quanxiu Wang, Rongxiang Weng, Rongxin Guo, Ruoxuan Liang, Senbin Yang, Shanbo Xu, Shanglin Lei, Shengze Ye, Shimin Chen, Shuaiqi Chen, Shujie Hu, Shuo Li, Siqi Yang, Siyu Xu, Siyu Ren, Song Li, Songxiang Liu, Tianhao Bai, Tianye Dai, Wei Hong, Wei Wang, Weixiao Zhao, Wengang Cao, Wenlong Zhu, Wenlong He, Xi Su, Xi Nan, Xiaohan Zhao, Xiaohao Wang, Xiaoyu Zhao, Xiaoyu Wang, Xiaoyu Li, Xin Pan, Xin Chen, Xiusong Sun, Xu Xiang, Xudong Xing, Xuezhi Cao, Xunliang Cai, Yang Yang, Yanli Tan, Yao Yao, Yerui Sun, Yi Chen, Yifan Lu, Yin Gong, Yining Zhang, Yitian Chen, Yiyang Gan, Yuchen Tang, Yuchen Xie, Yueqian Wang, Yuewen Zheng, Yufei Zhang, Yufeng Zhong, Yulei Qian, Yuqi Peng, Yuqian Li, Yuwei Jiang, Zeyang Hu, Zheng Zhang, Zhengkun Tian, Zhiqing Hong, Zhixiong Zeng, Zhuqi Mi, Ziran Li, Ziwen Wang, Ziyi Zhao, Ziyuan Zhuang, Zizhe Zhao | 2025-10-31 | We introduce LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters, excelling at real-time audio-visual interaction. |
| COOL Is Optimal in Error-Free Asynchronous Byzantine Agreement | Jinyuan Chen | 2025-10-31 | COOL (Chen'21) is an error-free, information-theoretically secure Byzantine agreement (BA) protocol proven to achieve BA consensus in the synchronous setting for an $\ell$-bit message, with a total co... |
| Machine learning-based cloud resource allocation algorithms: a comprehensive comparative review | Deep Bodra, Sushil Khairnar | 2025-10-31 | Cloud resource allocation has emerged as a major challenge in modern computing environments, with organizations struggling to manage complex, dynamic workloads while optimizing performance and cost ef... |
| Fix: externalizing network I/O in serverless computing | Yuhan Deng, Akshay Srivatsan, Sebastian Ingino, Francis Chua, Yasmine Mitchell, Matthew Vilaysack, Keith Winstein | 2025-10-31 | We describe a system for serverless computing where users, programs, and the underlying platform share a common representation of a computation: a deterministic procedure, run in an environment ... |
| RDMA Point-to-Point Communication for LLM Systems | Nandor Licker, Kevin Hu, Vladimir Zaytsev, Lequn Chen | 2025-10-31 | Emerging Large Language Model (LLM) system patterns, such as disaggregated inference, Mixture-of-Experts (MoE) routing, and asynchronous reinforcement fine-tuning, require flexible point-to-point comm... |
| ML-Based Optimum Sub-system Size Heuristic for the GPU Implementation of the Tridiagonal Partition Method | Milena Veneva | 2025-10-31 | This paper presents a machine learning (ML)-based heuristic for finding the optimum sub-system size for the CUDA implementation of the parallel partition algorithm. |
| Dynamic Service Scheduling and Resource Management in Energy-Harvesting Multi-access Edge Computing | Shuyi Chen, Panagiotis Oikonomou, Zhengchang Hua, Nikos Tziritas, Karim Djemame, Nan Zhang, Georgios Theodoropoulos | 2025-10-31 | Multi-access Edge Computing (MEC) delivers low-latency services by hosting applications near end-users. To promote sustainability, these systems are increasingly integrated with renewable Energy Harve... |
| A Digital Twin-based Multi-Agent Reinforcement Learning Framework for Vehicle-to-Grid Coordination | Zhengchang Hua, Panagiotis Oikonomou, Karim Djemame, Nikos Tziritas, Georgios Theodoropoulos | 2025-10-31 | The coordination of large-scale, decentralised systems, such as a fleet of Electric Vehicles (EVs) in a Vehicle-to-Grid (V2G) network, presents a significant challenge for modern control systems. |
| Synergistic Tensor and Pipeline Parallelism | Mengshi Qi, Jiaxuan Peng, Jie Zhang, Juan Zhu, Yong Li, Huadong Ma | 2025-10-31 | In the machine learning system, the hybrid model parallelism combining tensor parallelism (TP) and pipeline parallelism (PP) has become the dominant solution for distributed training of Large Language... |
| SERFLOW: A Cross-Service Cost Optimization Framework for SLO-Aware Dynamic ML Inference | Zongshun Zhang, Ibrahim Matta | 2025-10-31 | Dynamic offloading of Machine Learning (ML) model partitions across different resource orchestration services, such as Function-as-a-Service (FaaS) and Infrastructure-as-a-Service (IaaS), can balance ... |
| Glia: A Human-Inspired AI for Automated Systems Design and Optimization | Pouya Hamadanian, Pantea Karimi, Arash Nasr-Esfahany, Kimia Noorbakhsh, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, Hari Balakrishnan | 2025-10-31 | Can AI autonomously design mechanisms for computer systems on par with the creativity and reasoning of human experts? We present Glia, an AI architecture for networked systems design that uses large l... |
| Byzantine Attacks in RIS-Enhanced Cooperative Spectrum Sensing: A Decision Fusion Perspective | Gaoyuan Zhang, Gaolei Song, Boyuan Li, Zijian Li, Baofeng Ji, Ruijuan Zheng, Guoqiang Zheng, Tony Q. S. Quek | 2025-10-31 | From the perspective of hard decision fusion, we investigate Byzantine attacks in Reconfigurable Intelligent Surface (RIS)-enhanced and decode-and-forward relay-assisted Cooperative Spectrum Sensing (... |
| Secure Communication in the Presence of an RIS-Enhanced Eavesdropper in MIMO Networks | Gaoyuan Zhang, Ruisong Si, Boyuan Li, Zijian Li, Baofeng Ji, Chenqi Zhu, Tony Q. S. Quek | 2025-10-31 | We pay our attention towards secure and robust communication in the presence of a Reconfigurable Intelligent Surface (RIS)-enhanced mobile eavesdropping attacker in Multiple-Input Multiple-Output (MIM... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Tetris: An SLA-aware Application Placement Strategy in the Edge-Cloud Continuum | Lucas Almeida, Maycon Peixoto | 2025-10-31 | An Edge-Cloud Continuum integrates edge and cloud resources to provide a flexible and scalable infrastructure. This paradigm can minimize latency by processing data closer to the source at the edge wh... |
| Learning a Network Digital Twin as a Hybrid System | Christos Mavridis, Fernando S. Barbosa, Hamed Farhadi, Karl H. Johansson | 2025-10-31 | Network digital twin (NDT) models are virtual models that replicate the behavior of physical communication networks and are considered a key technology component to enable novel features and capabilit... |
| Reinforcement Learning for Resource Allocation in Vehicular Multi-Fog Computing | Mohammad Hadi Akbarzadeh, Mahmood Ahmadi, Mohammad Saeed Jahangiry, Jae Young Hur | 2025-10-31 | The exponential growth of Internet of Things (IoT) devices, smart vehicles, and latency-sensitive applications has created an urgent demand for efficient distributed computing paradigms. |
| Mist-Assisted Federated Learning for Intrusion Detection in Heterogeneous IoT Networks | Saadat Izadi, Shakib Komasi, Ali Salimi, Alireza Rezaei, Mahmood Ahmadi | 2025-10-31 | The rapid growth of the Internet of Things (IoT) offers new opportunities but also expands the attack surface of distributed, resource-limited devices. |
| Toward Hybrid COTS-based LiFi/WiFi Networks with QoS Requirements in Mobile Environments | Emilio Ancillotti, Loreto Pescosolido, Andrea Passarella | 2025-10-31 | We consider a hybrid LiFi/WiFi network consisting of commercially available equipment, for mobile scenarios, where WiFi backs up communications, through vertical handovers, in case of insufficient LiF... |
| Towards Sub-millisecond Latency and Guaranteed Bit Rates in 5G User Plane | Leonardo Alberro, Noura Limam, Raouf Boutaba | 2025-10-31 | Next-generation services demand stringent Quality of Service (QoS) guarantees, such as per-flow bandwidth assurance, ultra-low latency, and traffic prioritization, posing significant challenges to 5G ... |
| Rethinking Telemetry Design for Fine-Grained Anomaly Detection in 5G User Planes | Niloy Saha, Noura Limam, Yang Xiao, Raouf Boutaba | 2025-10-31 | Detecting QoS anomalies in 5G user planes requires fine-grained per-flow visibility, but existing telemetry approaches face a fundamental trade-off. |
| Asynchronous Risk-Aware Multi-Agent Packet Routing for Ultra-Dense LEO Satellite Networks | Ke He, Thang X. Vu, Le He, Lisheng Fan, Symeon Chatzinotas, Bjorn Ottersten | 2025-10-31 | The rise of ultra-dense LEO constellations creates a complex and asynchronous network environment, driven by their massive scale, dynamic topologies, and significant delays. |
| Challenging Tribal Knowledge -- Large Scale Measurement Campaign on Decentralized NAT Traversal | Dennis Trautwein, Cornelius Ihle, Moritz Schubotz, Bela Gipp | 2025-10-31 | The promise of decentralized peer-to-peer (P2P) systems is fundamentally gated by the challenge of Network Address Translation (NAT) traversal, with existing solutions often reintroducing the very cen... |
| Selected Results from the REDMARS2 Project: Recursive Delay-Tolerant Networking using Bundle-in-Bundle Encapsulation | Marius Feldmann, Tobias Nöthlich, Felix Walter, Maximilian Nitsch, Juan A. Fraire, Georg A. Murzik, Fiona Fuchs | 2025-10-31 | This whitepaper presents parts of the results of the REDMARS2 project conducted in 2021-2022, exploring the integration of Recursive Internetwork Architecture (RINA) concepts into Delay- and Disruptio... |
| Effective Delayed Patching for Transient Malware Control on Networks | Minh Phu Vuong, Chul-Ho Lee, Do Young Eun | 2025-10-31 | Patching nodes is an effective network defense strategy for malware control at early stages, and its performance is primarily dependent on how accurately the infection propagation is characterized. |
| Study of Cluster-Based Routing Based on Machine Learning for UAV Networks in 6G | Luis Antonio L. F. da Costa, Rodrigo C. de Lamare, Rafael Kunst, Edison Pignaton de Freitas | 2025-10-31 | The sixth generation (6G) wireless networks are envisioned to deliver ultra-low latency, massive connectivity, and high data rates, enabling advanced applications such as autonomous unmanned aerial ve... |
| Stochastic Geometry of Cylinders: Characterizing Inter-Nodal Distances for 3D UAV Networks | Yunfeng Jiang, Zhiming Huang, Jianping Pan | 2025-10-31 | The analytical characterization of coverage probability in finite three-dimensional wireless networks has long remained an open problem, hindered by the loss of spatial independence in finite-node set... |
| Analytical Model of NR-V2X Mode 2 with Re-Evaluation Mechanism | Shuo Zhu, Siyu Lin | 2025-10-31 | Massive message transmissions, unpredictable aperiodic messages, and high-speed moving vehicles contribute to the complex wireless environment, resulting in inefficient resource collisions in Vehicle ... |

cs.OS - Operating Systems

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| Fix: externalizing network I/O in serverless computing | Yuhan Deng, Akshay Srivatsan, Sebastian Ingino, Francis Chua, Yasmine Mitchell, Matthew Vilaysack, Keith Winstein | 2025-10-31 | We describe a system for serverless computing where users, programs, and the underlying platform share a common representation of a computation: a deterministic procedure, run in an environment ... |
| Supply Chain Exploitation of Secure ROS 2 Systems: A Proof-of-Concept on Autonomous Platform Compromise via Keystore Exfiltration | Tahmid Hasan Sakib, Yago Romano Martinez, Carter Brady, Syed Rafay Hasan, Terry N. Guo | 2025-10-31 | This paper presents a proof-of-concept supply chain attack against the Secure ROS 2 (SROS 2) framework, demonstrated on a Quanser QCar2 autonomous vehicle platform. |
| Sockeye: a language for analyzing hardware documentation | Ben Fiedler, Samuel Gruetter, Timothy Roscoe | 2025-10-31 | Systems programmers have to consolidate the ever growing hardware mess present on modern System-on-Chips (SoCs). Correctly programming a multitude of components, providing functionality but also secur... |

cs.PF - Performance

| Title | Authors | Date | Abstract |
| --- | --- | --- | --- |
| AMD MI300X GPU Performance Analysis | Chandrish Ambati, Trung Diep | 2025-10-31 | The rapid growth of large language models (LLMs) has driven the need for high-performance, scalable GPU hardware capable of efficiently serving models with hundreds of billions of parameters. |
| Dependence-Driven, Scalable Quantum Circuit Mapping with Affine Abstractions | Marouane Benbetka, Merwan Bekkar, Riyadh Baghdadi, Martin Kong | 2025-10-31 | Qubit Mapping is a critical task in Quantum Compilation, as modern Quantum Processing Units (QPUs) are constrained to nearest-neighbor interactions defined by a qubit coupling graph. |
| MLPerf Automotive | Radoyeh Shojaei, Predrag Djurdjevic, Mostafa El-Khamy, James Goel, Kasper Mecklenburg, John Owens, Pınar Muyan-Özçelik, Tom St. John, Jinho Suh, Arjun Suresh | 2025-10-31 | We present MLPerf Automotive, the first standardized public benchmark for evaluating Machine Learning systems that are deployed for AI acceleration in automotive systems. |
