2026-03-16

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
LEXI: Lossless Exponent Coding for Efficient Inter-Chiplet Communication in Hybrid LLMs	Miao Sun, Alish Kanani, Kaushik Shroff, Umit Ogras	2026-03-16	Download	Data movement overheads increase the inference latency of state-of-the-art large language models (LLMs). These models commonly use the bfloat16 (BF16) format for stable training.
Co-Design of Memory-Storage Systems for Workload Awareness with Interpretable Models	Jay Sarkar, Vamsi Pavan Rayaprolu, Abhijeet Bhalerao	2026-03-16	Download	Solid-state storage architectures based on NAND or emerging memory devices (SSD), are fundamentally architected and optimized for both reliability and performance.
DUET: Disaggregated Hybrid Mamba-Transformer LLMs with Prefill and Decode-Specific Packages	Alish Kanani, Sangwan Lee, Han Lyu, Jiahao Lin, Jaehyun Park, Umit Y. Ogras	2026-03-16	Download	Large language models operate in distinct compute-bound prefill followed by memory bandwidth-bound decode phases. Hybrid Mamba-Transformer models inherit this asymmetry while adding state space model ...
GLANCE: Gaze-Led Attention Network for Compressed Edge-inference	Neeraj Solanki, Hong Ding, Sepehr Tabrizchi, Ali Shafiee Sarvestani, Shaahin Angizi, David Z. Pan, Arman Roohi	2026-03-16	Download	Real-time object detection in AR/VR systems faces critical computational constraints, requiring sub-10 ms latency within tight power budgets.
RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks	Ali Soltan Mohammadi, Samira Nazari, Ali Azarpeyvand, Mahdi Taheri, Milos Krstic, Michael Huebner, Christian Herglotz, Tara Ghasempouri	2026-03-16	Download	This work proposes a unified three-stage framework that produces a quantized DNN with balanced fault and attack robustness. The first stage improves attack resilience via fine-tuning that desensitizes...
bitSMM: A bit-Serial Matrix Multiplication Accelerator	Pedro Antunes, Artur Podobas	2026-03-16	Download	Neural-network (NN) inference is increasingly present on-board spacecraft to reduce downlink bandwidth and enable timely decision making. However, the power and reliability constraints of space missio...
SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation	Zicheng He, Anhao Zhao, Xiaoyu Shen, Chen Wu, Lei He	2026-03-16	Download	Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their inference efficiency remains a critical bottleneck due to rapidly growing parameters.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
DUET: Disaggregated Hybrid Mamba-Transformer LLMs with Prefill and Decode-Specific Packages	Alish Kanani, Sangwan Lee, Han Lyu, Jiahao Lin, Jaehyun Park, Umit Y. Ogras	2026-03-16	Download	Large language models operate in distinct compute-bound prefill followed by memory bandwidth-bound decode phases. Hybrid Mamba-Transformer models inherit this asymmetry while adding state space model ...
Cuckoo-GPU: Accelerating Cuckoo Filters on Modern GPUs	Tim Dortmann, Markus Vieth, Bertil Schmidt	2026-03-16	Download	Approximate Membership Query (AMQ) structures are essential for high-throughput systems in databases, networking, and bioinformatics. While Bloom filters offer speed, they lack support for deletions.
Multi-Objective Load Balancing for Heterogeneous Edge-Based Object Detection Systems	Daghash K. Alqahtani, Maria A. Rodriguez, Muhammad Aamir Cheema, Adel N. Toosi	2026-03-16	Download	The rapid proliferation of the Internet of Things (IoT) and smart applications has led to a surge in data generated by distributed sensing devices.
LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling	Dingyan Zhang, Jinbo Han, Kaixi Zhang, Xingda Wei, Sijie Shen, Chenguang Fang, Wenyuan Yu, Jingren Zhou, Rong Chen	2026-03-16	Download	High-quality LLM request scheduling requires achieving two key objectives: whether the routed instance has KV$ to accelerate the request execution and whether the workload is balanced across instances...
Token Coherence: Adapting MESI Cache Protocols to Minimize Synchronization Overhead in Multi-Agent LLM Systems	Vladyslav Parakhin	2026-03-16	Download	Multi-agent LLM orchestration incurs synchronization costs scaling as O(n × S × |D|) in agents, steps, and artifact size under naive broadcast -- a regime I term broadcast-induced triply-multiplicat...
Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing	Zhenyuan Yang, Wenxin Zheng, Mingyu Li, Haibo Chen	2026-03-16	Download	Existing GPU spatial sharing systems face a three-way tradeoff: resource utilization, performance isolation, and semantic determinism. Hardware partitioning suffers from hardware under-utilization.
SCALE-TRACK: Asynchronous Euler-Lagrange particle tracking on heterogeneous computing architecture	Silvio Schmalfuß, Sergey Lesnik, Henrik Rusche, Dennis Niedermeier	2026-03-16	Download	Euler-Lagrange (EL) simulations provide a direct and robust framework for modeling disperse multiphase flows. However, they are computationally expensive.
Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling	Joyjit Roy, Samaresh Kumar Singh, Sushanta Das	2026-03-16	Download	Road crashes remain a leading cause of preventable fatalities. Existing prediction models predominantly produce binary outcomes, which offer limited actionable insights for real-time driver feedback.
Protecting Distributed Blockchain with Twin-Field Quantum Key Distribution: A Quantum Resistant Approach	Xuan Li, Ying Guo	2026-03-16	Download	Quantum computing provides the feasible multi-layered security challenges to classical blockchain systems. Whereas, quantum-secured blockchains relied on quantum key distribution (QKD) to establish se...
Fold-CP: A Context Parallelism Framework for Biomolecular Modeling	Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen, Tianjing Zhang, Philipp Junk, Michelle Dimon, Paweł Gniewek, Fabian Ortega, McKinley Polen, Ivan Grubisic, Ali Bashir, Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang, Jian Peng, Anthony Costa, Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli	2026-03-16	Download	Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requ...
DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning	Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya	2026-03-16	Download	Next-generation IoT applications increasingly span across autonomous administrative entities, necessitating silo-cooperative scheduling to leverage diverse computational resources while preserving dat...
Can you keep a secret? A new protocol for sender-side enforcement of causal message delivery	Yan Tong, Nathan Liittschwager, Lindsey Kuper	2026-03-16	Download	Protocols for causal message delivery are widely used in distributed systems. Traditionally, causal delivery can be enforced either on the message sender's side or on the receiver's side.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Galaxy Tracer: A Topology-First 3D Interface for Interactive PCAP Exploration	Ryan Younger	2026-03-16	Download	Packet analysis tools conventionally present capture data through tabular packet lists, constraining the analyst to a sequential view that obscures the relational structure of network communication.
Evaluating Performance Characteristic of Opportunistic Routing Protocols: A Case Study of the 2016 Italian League Match Earthquake in the Stadio Adriatico	Yihang Cao, Milena Radenkovic	2026-03-16	Download	Delay Tolerant Networks (DTNs) can provide emergency communication support when conventional infrastructure is disrupted during disasters. This paper evaluates the performance of opportunistic routing...
The Internet of Physical AI Agents: Interoperability, Longevity, and the Cost of Getting It Wrong	Roberto Morabito, Mallik Tatipamula	2026-03-16	Download	The Internet has evolved by progressively expanding what humanity connects: first computers, then people, and later billions of devices through the Internet of Things (IoT).
Entropy-Aware Task Offloading in Mobile Edge Computing	Mohsen Sahraei Ardakani, Hong Wan, Rui Song	2026-03-16	Download	Mobile Edge Computing (MEC) technology has been introduced to enable cloud computing at the edge of the network in order to help resource limited mobile devices with time sensitive data processing tas...
Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing	Yung-Fu Chen, Anish Arora	2026-03-16	Download	While deep learning models that leverage local features have demonstrated significant potential for near-optimal routing in dense Euclidean graphs, they struggle to generalize well in sparse networks ...
ln(3): A Universal Percolation Constant for Collective Dynamics on One-Dimensional Proximity Networks	Jian Ji	2026-03-16	Download	We report the identification and proof of a universal constant, ln(3) = 1.09861, which governs the onset of bidirectional collective behavior in one-dimensional Poisson proximity networks.
Brain-Inspired Graph Multi-Agent Systems for LLM Reasoning	Guangfu Hao, Yuming Dai, Xianzhe Qin, Shan Yu	2026-03-16	Download	Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of language tasks, yet complex multi-step reasoning remains a fundamental challenge.
SliceMapper: Intelligent Mapping of O-CU and O-DU onto O-Cloud Sites in 6G O-RAN	Mohammad Asif Habibi, Xavier Costa-Pérez, Hans D. Schotten	2026-03-16	Download	In this paper, we propose an rApp, named SliceMapper, to optimize the mapping process of the open centralized unit (O-CU) and open distributed unit (O-DU) of an open radio access network (O-RAN) slice...
Joint Routing and Model Pruning for Decentralized Federated Learning in Bandwidth-Constrained Multi-Hop Wireless Networks	Xiaoyu He, Weicai Li, Tiejun Lv, Xi Yu	2026-03-16	Download	Decentralized federated learning (D-FL) enables privacy-preserving training without a central server, but multi-hop model exchanges and aggregation are often bottlenecked by communication resource con...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling	Dingyan Zhang, Jinbo Han, Kaixi Zhang, Xingda Wei, Sijie Shen, Chenguang Fang, Wenyuan Yu, Jingren Zhou, Rong Chen	2026-03-16	Download	High-quality LLM request scheduling requires achieving two key objectives: whether the routed instance has KV$ to accelerate the request execution and whether the workload is balanced across instances...
Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing	Zhenyuan Yang, Wenxin Zheng, Mingyu Li, Haibo Chen	2026-03-16	Download	Existing GPU spatial sharing systems face a three-way tradeoff: resource utilization, performance isolation, and semantic determinism. Hardware partitioning suffers from hardware under-utilization.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs	Lars Krupp, Daniel Geißler, Francisco M. Calatrava-Nicolas, Vishal Banwari, Paul Lukowicz, Jakob Karolus	2026-03-16	Download	The energy consumption of Large Language Models (LLMs) is raising growing concerns due to their adverse effects on environmental stability and resource use.