2026-02-09

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System	Marzieh Barkhordar, Alireza Tabatabaeian, Mohammad Sadrosadati, Christina Giannoula, Juan Gomez Luna, Izzat El Hajj, Onur Mutlu, Alaa R. Alameldeen	2026-02-09	Download	Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused ...
karl. - A Research Vehicle for Automated and Connected Driving	Jean-Pierre Busch, Lukas Ostendorf, Guido Linden, Lennart Reiher, Till Beemelmanns, Bastian Lampe, Timo Woopen, Lutz Eckstein	2026-02-09	Download	As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g.
Antiferromagnetic Tunnel Junctions (AFMTJs) for In-Memory Computing: Modeling and Case Study	Yousuf Choudhary, Tosiron Adegbija	2026-02-09	Download	Antiferromagnetic Tunnel Junctions (AFMTJs) enable picosecond switching and femtojoule writes through ultrafast sublattice dynamics. We present the first end-to-end AFMTJ simulation framework integrat...
ZipFlow: a Compiler-based Framework to Unleash Compressed Data Movement for Modern GPUs	Gwangoo Yeo, Zhiyang Shen, Wei Cui, Matteo Interlandi, Rathijit Sen, Bailu Ding, Qi Chen, Minsoo Rhu	2026-02-09	Download	In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Harvest: Adaptive Photonic Switching Schedules for Collective Communication in Scale-up Domains	Mahir Rahman, Samuel Joseph, Nihar Kodkani, Behnaz Arzani, Vamsi Addanki	2026-02-09	Download	As chip-to-chip silicon photonics gain traction for their bandwidth and energy efficiency, their circuit-switched nature raises a fundamental question for collective communication: when and how should...
ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System	Marzieh Barkhordar, Alireza Tabatabaeian, Mohammad Sadrosadati, Christina Giannoula, Juan Gomez Luna, Izzat El Hajj, Onur Mutlu, Alaa R. Alameldeen	2026-02-09	Download	Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused ...
Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers	Jędrzej Maczan	2026-02-09	Download	WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true cost of this overhead is poorly characterize...
Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide	Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen	2026-02-09	Download	With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference.
DynamiQ: Accelerating Gradient Synchronization using Compressed Multi-hop All-reduce	Wenchen Han, Shay Vargaftik, Michael Mitzenmacher, Ran Ben Basat	2026-02-09	Download	Multi-hop all-reduce is the de facto backbone of large model training. As the training scale increases, the network often becomes a bottleneck, motivating reducing the volume of transmitted data.
Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale	Kaiyang Zhao, Neha Gholkar, Hasan Maruf, Abhishek Dhanotia, Johannes Weiner, Gregory Price, Ning Sun, Bhavya Dwivedi, Stuart Clark, Dimitrios Skarlatos	2026-02-09	Download	Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requi...
PARD: Enhancing Goodput for Inference Pipeline via Proactive Request Dropping	Zhixin Zhao, Yitao Hu, Simin Chen, Mingfang Ji, Wei Yang, Yuhao Zhang, Laiping Zhao, Wenxin Li, Xiulong Liu, Wenyu Qu, Hao Wang	2026-02-09	Download	Modern deep neural network (DNN) applications integrate multiple DNN models into inference pipelines with stringent latency requirements for customized tasks.
RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks	Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei, Jörg Henkel	2026-02-09	Download	Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These devices, typically relying on TinyML models, collaborat...
Mathematical Foundations of Modeling ETL Process Chains	Levin Maier, Lucas Schulze, Robert Lilow, Lukas Hahn, Nikola Krasowski, Arnulf Barth, Sebastian Gaebel, Ferdi Güran, Oliver Hanau, Giovanni Wagner, Falk Borgmann, Oleg Arenz, Jan Peters	2026-02-09	Download	Extract-Transform-Load (ETL) processes are core components of modern data processing infrastructures. The throughput of processed data records can be adjusted by changing the amount of allocated resou...
Modalities, a PyTorch-native Framework For Large-scale LLM Training and Research	Max Lübbering, Timm Ruland, Richard Rutmann, Felix Stollenwerk, David Fitzek, Michael Fromm, Alexander Weber, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Mehdi Ali	2026-02-09	Download	Today's LLM (pre-) training and research workflows typically allocate a significant amount of compute to large-scale ablation studies. Despite the substantial compute costs of these ablations, existin...
Towards CXL Resilience to CPU Failures	Antonis Psistakis, Burak Ocalan, Chloe Alverti, Fabien Chaix, Ramnatthan Alagappan, Josep Torrellas	2026-02-09	Download	Compute Express Link (CXL) 3.0 and beyond allows the compute nodes of a cluster to share data with hardware cache coherence and at the granularity of a cache line.
HEAL: Online Incremental Recovery for Leaderless Distributed Systems Across Persistency Models	Antonis Psistakis, Burak Ocalan, Fabien Chaix, Ramnatthan Alagappan, Josep Torrellas	2026-02-09	Download	Ensuring resilience in distributed systems has become an acute concern. In today's environment, it is crucial to develop light-weight mechanisms that recover a distributed system from faults quickly a...
The Computer System Trail	Sushant Kumar Gupta	2026-02-09	Download	No matter how much the world of computing changes, system design remains crucial. While most people try to learn it through quick tutorials or AI-generated summaries, there is no better way to master ...
Fork, Explore, Commit: OS Primitives for Agentic Exploration	Cong Wang, Yusheng Zheng	2026-02-09	Download	AI agents increasingly perform agentic exploration: pursuing multiple solution paths in parallel and committing only the successful one. Because each exploration path may modify files and spawn proces...
ZipFlow: a Compiler-based Framework to Unleash Compressed Data Movement for Modern GPUs	Gwangoo Yeo, Zhiyang Shen, Wei Cui, Matteo Interlandi, Rathijit Sen, Bailu Ding, Qi Chen, Minsoo Rhu	2026-02-09	Download	In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Harvest: Adaptive Photonic Switching Schedules for Collective Communication in Scale-up Domains	Mahir Rahman, Samuel Joseph, Nihar Kodkani, Behnaz Arzani, Vamsi Addanki	2026-02-09	Download	As chip-to-chip silicon photonics gain traction for their bandwidth and energy efficiency, their circuit-switched nature raises a fundamental question for collective communication: when and how should...
Probabilistic Fair Ordering of Events	Muhammad Haseeb, Jinkun Geng, Aurojit Panda, Radhika Mittal, Nirav Atre, Srinivas Narayana, Anirudh Sivaraman	2026-02-09	Download	A growing class of applications depends on fair ordering, where events that occur earlier should be processed before later ones. Providing such guarantees is difficult in practice because clock synchr...
Zero Trust for Multi-RAT IoT: Trust Boundary Management in Heterogeneous Wireless Network Environments	Jonathan Shelby	2026-02-09	Download	The proliferation of Multi-Radio Access Technology, Internet of Things devices, particularly Unmanned Aerial Vehicles operating across LoRaWAN, 5G/4G cellular, Meshtastic mesh, proprietary protocols s...
Lightweight Call Signaling and Peer-to-Peer Control of WebRTC Video Conferencing	Kundan Singh	2026-02-09	Download	We present the software architecture and implementation of our web-based multiparty video conference application. It does not use a media server.
DynamiQ: Accelerating Gradient Synchronization using Compressed Multi-hop All-reduce	Wenchen Han, Shay Vargaftik, Michael Mitzenmacher, Ran Ben Basat	2026-02-09	Download	Multi-hop all-reduce is the de facto backbone of large model training. As the training scale increases, the network often becomes a bottleneck, motivating reducing the volume of transmitted data.
Framework for Integrating Zero Trust in Cloud-Based Endpoint Security for Critical Infrastructure	Shyam Kumar Gajula	2026-02-09	Download	Cyber threats have become highly sophisticated, prompting a heightened concern for endpoint security, especially in critical infrastructure, to new heights.
Rethinking IPv6 Defense: A Unified Edge-Centric Zero-Trust Data-Plane Architecture	Walid Aljoby, Mohammed Alzayani, Md. Kamrul Hossain, Khaled A. Harras	2026-02-09	Download	IPv6 dependability is increasingly inseparable from IPv6 security: Neighbor Discovery (ND), Router Advertisements (RA), and ICMPv6 are essential for correct operation yet expose a broad attack surface...
6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks	Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah	2026-02-09	Download	This paper introduces 6G-Bench, an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. 6G-Bench defines a taxonomy of 30 decision-making tasks (T...
Equitable Multi-Task Learning for AI-RANs	Panayiotis Raptis, Fatih Aslan, George Iosifidis	2026-02-09	Download	AI-enabled Radio Access Networks (AI-RANs) are expected to serve heterogeneous users with time-varying learning tasks over shared edge resources.
From Raw Data to Shared 3D Semantics: Task-Oriented Communication for Multi-Robot Collaboration	Ruibo Xue, Jiedan Tan, Fang Liu, Jingwen Tong, Taotao Wang, Shuoyao Wang	2026-02-09	Download	Multi-robot systems (MRS) rely on exchanging raw sensory data to cooperate in complex three-dimensional (3D) environments. However, this strategy often leads to severe communication congestion and hig...
Decentralized Spatial Reuse Optimization in Wi-Fi: An Internal Regret Minimization Approach	Francesc Wilhelmi, Boris Bellalta, Miguel Casasnovas, Aleksandra Kijanka, Miguel Calvo-Fullana	2026-02-09	Download	Spatial Reuse (SR) is a cost-effective technique for improving spectral efficiency in dense IEEE 802.11 deployments by enabling simultaneous transmissions.
RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks	Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei, Jörg Henkel	2026-02-09	Download	Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These devices, typically relying on TinyML models, collaborat...
PACC: Protocol-Aware Cross-Layer Compression for Compact Network Traffic Representation	Zhaochen Guo, Tianyufei Zhou, Honghao Wang, Ronghua Li, Shinan Liu	2026-02-09	Download	Network traffic classification is a core primitive for network security and management, yet it is increasingly challenged by pervasive encryption and evolving protocols.
MonkeyTree: Near-Minimal Congestion for Multi-tenant Training via Migration	Anton A. Zabreyko, Weiyang Wang, Manya Ghobadi	2026-02-09	Download	We present MonkeyTree, the first system to mitigate network congestion in multi-tenant GPU clusters through job-migration based defragmentation rather than network-layer techniques.
Software Testing at the Network Layer: Automated HTTP API Quality Assessment and Security Analysis of Production Web Applications	Ali Hassaan Mughal, Muhammad Bilal, Noor Fatima	2026-02-09	Download	Modern web applications rely heavily on client-side API calls to fetch data, render content, and communicate with backend services. However, the quality of these network interactions (redundant reques...
NeuroScaler: Towards Energy-Optimal Autoscaling for Container-Based Services	Alisson O. Chaves, Rodrigo Moreira, Larissa F. Rodrigues Moreira, Joao Correia, David Santos, Rui Silva, Tiago Barros, Daniel Corujo, Miguel Rocha, Flavio de Oliveira Silva	2026-02-09	Download	Future networks must meet stringent requirements while operating within tight energy and carbon constraints. Current autoscaling mechanisms remain workload-centric and infrastructure-siloed, and are l...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale	Kaiyang Zhao, Neha Gholkar, Hasan Maruf, Abhishek Dhanotia, Johannes Weiner, Gregory Price, Ning Sun, Bhavya Dwivedi, Stuart Clark, Dimitrios Skarlatos	2026-02-09	Download	Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requi...
The Computer System Trail	Sushant Kumar Gupta	2026-02-09	Download	No matter how much the world of computing changes, system design remains crucial. While most people try to learn it through quick tutorials or AI-generated summaries, there is no better way to master ...
Fork, Explore, Commit: OS Primitives for Agentic Exploration	Cong Wang, Yusheng Zheng	2026-02-09	Download	AI agents increasingly perform agentic exploration: pursuing multiple solution paths in parallel and committing only the successful one. Because each exploration path may modify files and spawn proces...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers	Jędrzej Maczan	2026-02-09	Download	WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true cost of this overhead is poorly characterize...
A Machine Learning accelerated geophysical fluid solver	Yang Bai	2026-02-09	Download	Machine learning methods have been successful in many areas, like image classification and natural language processing. However, it still needs to be determined how to apply ML to areas with mathemati...