
2026-03-16

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| LEXI: Lossless Exponent Coding for Efficient Inter-Chiplet Communication in Hybrid LLMs | Miao Sun, Alish Kanani, Kaushik Shroff, Umit Ogras | 2026-03-16 | Data movement overheads increase the inference latency of state-of-the-art large language models (LLMs). These models commonly use the bfloat16 (BF16) format for stable training. |
| Co-Design of Memory-Storage Systems for Workload Awareness with Interpretable Models | Jay Sarkar, Vamsi Pavan Rayaprolu, Abhijeet Bhalerao | 2026-03-16 | Solid-state storage architectures based on NAND or emerging memory devices (SSD), are fundamentally architected and optimized for both reliability and performance. |
| DUET: Disaggregated Hybrid Mamba-Transformer LLMs with Prefill and Decode-Specific Packages | Alish Kanani, Sangwan Lee, Han Lyu, Jiahao Lin, Jaehyun Park, Umit Y. Ogras | 2026-03-16 | Large language models operate in distinct compute-bound prefill followed by memory bandwidth-bound decode phases. Hybrid Mamba-Transformer models inherit this asymmetry while adding state space model ... |
| GLANCE: Gaze-Led Attention Network for Compressed Edge-inference | Neeraj Solanki, Hong Ding, Sepehr Tabrizchi, Ali Shafiee Sarvestani, Shaahin Angizi, David Z. Pan, Arman Roohi | 2026-03-16 | Real-time object detection in AR/VR systems faces critical computational constraints, requiring sub-10 ms latency within tight power budgets. |
| RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks | Ali Soltan Mohammadi, Samira Nazari, Ali Azarpeyvand, Mahdi Taheri, Milos Krstic, Michael Huebner, Christian Herglotz, Tara Ghasempouri | 2026-03-16 | This work proposes a unified three-stage framework that produces a quantized DNN with balanced fault and attack robustness. The first stage improves attack resilience via fine-tuning that desensitizes... |
| bitSMM: A bit-Serial Matrix Multiplication Accelerator | Pedro Antunes, Artur Podobas | 2026-03-16 | Neural-network (NN) inference is increasingly present on-board spacecraft to reduce downlink bandwidth and enable timely decision making. However, the power and reliability constraints of space missio... |
| SkipOPU: An FPGA-based Overlay Processor for Large Language Models with Dynamically Allocated Computation | Zicheng He, Anhao Zhao, Xiaoyu Shen, Chen Wu, Lei He | 2026-03-16 | Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their inference efficiency remains a critical bottleneck due to rapidly growing parameters. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| DUET: Disaggregated Hybrid Mamba-Transformer LLMs with Prefill and Decode-Specific Packages | Alish Kanani, Sangwan Lee, Han Lyu, Jiahao Lin, Jaehyun Park, Umit Y. Ogras | 2026-03-16 | Large language models operate in distinct compute-bound prefill followed by memory bandwidth-bound decode phases. Hybrid Mamba-Transformer models inherit this asymmetry while adding state space model ... |
| Cuckoo-GPU: Accelerating Cuckoo Filters on Modern GPUs | Tim Dortmann, Markus Vieth, Bertil Schmidt | 2026-03-16 | Approximate Membership Query (AMQ) structures are essential for high-throughput systems in databases, networking, and bioinformatics. While Bloom filters offer speed, they lack support for deletions. |
| Multi-Objective Load Balancing for Heterogeneous Edge-Based Object Detection Systems | Daghash K. Alqahtani, Maria A. Rodriguez, Muhammad Aamir Cheema, Adel N. Toosi | 2026-03-16 | The rapid proliferation of the Internet of Things (IoT) and smart applications has led to a surge in data generated by distributed sensing devices. |
| LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling | Dingyan Zhang, Jinbo Han, Kaixi Zhang, Xingda Wei, Sijie Shen, Chenguang Fang, Wenyuan Yu, Jingren Zhou, Rong Chen | 2026-03-16 | High-quality LLM request scheduling requires achieving two key objectives: whether the routed instance has KV$ to accelerate the request execution and whether the workload is balanced across instances... |
| Token Coherence: Adapting MESI Cache Protocols to Minimize Synchronization Overhead in Multi-Agent LLM Systems | Vladyslav Parakhin | 2026-03-16 | Multi-agent LLM orchestration incurs synchronization costs scaling as O(n x S x \|D\|) in agents, steps, and artifact size under naive broadcast -- a regime I term broadcast-induced triply-multiplicat... |
| Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing | Zhenyuan Yang, Wenxin Zheng, Mingyu Li, Haibo Chen | 2026-03-16 | Existing GPU spatial sharing systems face a three-way tradeoff: resource utilization, performance isolation, and semantic determinism. Hardware partitioning suffers from hardware under-utilization. |
| SCALE-TRACK: Asynchronous Euler-Lagrange particle tracking on heterogeneous computing architecture | Silvio Schmalfuß, Sergey Lesnik, Henrik Rusche, Dennis Niedermeier | 2026-03-16 | Euler-Lagrange (EL) simulations provide a direct and robust framework for modeling disperse multiphase flows. However, they are computationally expensive. |
| Real-Time Driver Safety Scoring Through Inverse Crash Probability Modeling | Joyjit Roy, Samaresh Kumar Singh, Sushanta Das | 2026-03-16 | Road crashes remain a leading cause of preventable fatalities. Existing prediction models predominantly produce binary outcomes, which offer limited actionable insights for real-time driver feedback. |
| Protecting Distributed Blockchain with Twin-Field Quantum Key Distribution: A Quantum Resistant Approach | Xuan Li, Ying Guo | 2026-03-16 | Quantum computing provides the feasible multi-layered security challenges to classical blockchain systems. Whereas, quantum-secured blockchains relied on quantum key distribution (QKD) to establish se... |
| Fold-CP: A Context Parallelism Framework for Biomolecular Modeling | Dejun Lin, Simon Chu, Vishanth Iyer, Youhan Lee, John St John, Kevin Boyd, Brian Roland, Xiaowei Ren, Guoqing Zhou, Zhonglin Cao, Polina Binder, Yuliya Zhautouskaya, Jakub Zakrzewski, Maximilian Stadler, Kyle Gion, Yuxing Peng, Xi Chen, Tianjing Zhang, Philipp Junk, Michelle Dimon, Paweł Gniewek, Fabian Ortega, McKinley Polen, Ivan Grubisic, Ali Bashir, Graham Holt, Danny Kovtun, Matthias Grass, Luca Naef, Rui Wang, Jian Peng, Anthony Costa, Saee Paliwal, Eddie Calleja, Timur Rvachov, Neha Tadimeti, Roy Tal, Emine Kucukbenli | 2026-03-16 | Understanding cellular machinery requires atomic-scale reconstruction of large biomolecular assemblies. However, predicting the structures of these systems has been constrained by hardware memory requ... |
| DeFRiS: Silo-Cooperative IoT Applications Scheduling via Decentralized Federated Reinforcement Learning | Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya | 2026-03-16 | Next-generation IoT applications increasingly span across autonomous administrative entities, necessitating silo-cooperative scheduling to leverage diverse computational resources while preserving dat... |
| Can you keep a secret? A new protocol for sender-side enforcement of causal message delivery | Yan Tong, Nathan Liittschwager, Lindsey Kuper | 2026-03-16 | Protocols for causal message delivery are widely used in distributed systems. Traditionally, causal delivery can be enforced either on the message sender's side or on the receiver's side. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Galaxy Tracer: A Topology-First 3D Interface for Interactive PCAP Exploration | Ryan Younger | 2026-03-16 | Packet analysis tools conventionally present capture data through tabular packet lists, constraining the analyst to a sequential view that obscures the relational structure of network communication. |
| Evaluating Performance Characteristic of Opportunistic Routing Protocols: A Case Study of the 2016 Italian League Match Earthquake in the Stadio Adriatico | Yihang Cao, Milena Radenkovic | 2026-03-16 | Delay Tolerant Networks (DTNs) can provide emergency communication support when conventional infrastructure is disrupted during disasters. This paper evaluates the performance of opportunistic routing... |
| The Internet of Physical AI Agents: Interoperability, Longevity, and the Cost of Getting It Wrong | Roberto Morabito, Mallik Tatipamula | 2026-03-16 | The Internet has evolved by progressively expanding what humanity connects: first computers, then people, and later billions of devices through the Internet of Things (IoT). |
| Entropy-Aware Task Offloading in Mobile Edge Computing | Mohsen Sahraei Ardakani, Hong Wan, Rui Song | 2026-03-16 | Mobile Edge Computing (MEC) technology has been introduced to enable cloud computing at the edge of the network in order to help resource limited mobile devices with time sensitive data processing tas... |
| Bridging Local and Global Knowledge: Cascaded Mixture-of-Experts Learning for Near-Shortest Path Routing | Yung-Fu Chen, Anish Arora | 2026-03-16 | While deep learning models that leverage local features have demonstrated significant potential for near-optimal routing in dense Euclidean graphs, they struggle to generalize well in sparse networks ... |
| ln(3): A Universal Percolation Constant for Collective Dynamics on One-Dimensional Proximity Networks | Jian Ji | 2026-03-16 | We report the identification and proof of a universal constant, ln(3) = 1.09861, which governs the onset of bidirectional collective behavior in one-dimensional Poisson proximity networks. |
| Brain-Inspired Graph Multi-Agent Systems for LLM Reasoning | Guangfu Hao, Yuming Dai, Xianzhe Qin, Shan Yu | 2026-03-16 | Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of language tasks, yet complex multi-step reasoning remains a fundamental challenge. |
| SliceMapper: Intelligent Mapping of O-CU and O-DU onto O-Cloud Sites in 6G O-RAN | Mohammad Asif Habibi, Xavier Costa-Pérez, Hans D. Schotten | 2026-03-16 | In this paper, we propose an rApp, named SliceMapper, to optimize the mapping process of the open centralized unit (O-CU) and open distributed unit (O-DU) of an open radio access network (O-RAN) slice... |
| Joint Routing and Model Pruning for Decentralized Federated Learning in Bandwidth-Constrained Multi-Hop Wireless Networks | Xiaoyu He, Weicai Li, Tiejun Lv, Xi Yu | 2026-03-16 | Decentralized federated learning (D-FL) enables privacy-preserving training without a central server, but multi-hop model exchanges and aggregation are often bottlenecked by communication resource con... |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling | Dingyan Zhang, Jinbo Han, Kaixi Zhang, Xingda Wei, Sijie Shen, Chenguang Fang, Wenyuan Yu, Jingren Zhou, Rong Chen | 2026-03-16 | High-quality LLM request scheduling requires achieving two key objectives: whether the routed instance has KV$ to accelerate the request execution and whether the workload is balanced across instances... |
| Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing | Zhenyuan Yang, Wenxin Zheng, Mingyu Li, Haibo Chen | 2026-03-16 | Existing GPU spatial sharing systems face a three-way tradeoff: resource utilization, performance isolation, and semantic determinism. Hardware partitioning suffers from hardware under-utilization. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| This Is Taking Too Long -- Investigating Time as a Proxy for Energy Consumption of LLMs | Lars Krupp, Daniel Geißler, Francisco M. Calatrava-Nicolas, Vishal Banwari, Paul Lukowicz, Jakob Karolus | 2026-03-16 | The energy consumption of Large Language Models (LLMs) is raising growing concerns due to their adverse effects on environmental stability and resource use. |
