
2026-02-09

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System | Marzieh Barkhordar, Alireza Tabatabaeian, Mohammad Sadrosadati, Christina Giannoula, Juan Gomez Luna, Izzat El Hajj, Onur Mutlu, Alaa R. Alameldeen | 2026-02-09 | Download | Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused ... |
| karl. - A Research Vehicle for Automated and Connected Driving | Jean-Pierre Busch, Lukas Ostendorf, Guido Linden, Lennart Reiher, Till Beemelmanns, Bastian Lampe, Timo Woopen, Lutz Eckstein | 2026-02-09 | Download | As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g. |
| Antiferromagnetic Tunnel Junctions (AFMTJs) for In-Memory Computing: Modeling and Case Study | Yousuf Choudhary, Tosiron Adegbija | 2026-02-09 | Download | Antiferromagnetic Tunnel Junctions (AFMTJs) enable picosecond switching and femtojoule writes through ultrafast sublattice dynamics. We present the first end-to-end AFMTJ simulation framework integrat... |
| ZipFlow: a Compiler-based Framework to Unleash Compressed Data Movement for Modern GPUs | Gwangoo Yeo, Zhiyang Shen, Wei Cui, Matteo Interlandi, Rathijit Sen, Bailu Ding, Qi Chen, Minsoo Rhu | 2026-02-09 | Download | In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Harvest: Adaptive Photonic Switching Schedules for Collective Communication in Scale-up Domains | Mahir Rahman, Samuel Joseph, Nihar Kodkani, Behnaz Arzani, Vamsi Addanki | 2026-02-09 | Download | As chip-to-chip silicon photonics gain traction for their bandwidth and energy efficiency, their circuit-switched nature raises a fundamental question for collective communication: when and how should... |
| ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System | Marzieh Barkhordar, Alireza Tabatabaeian, Mohammad Sadrosadati, Christina Giannoula, Juan Gomez Luna, Izzat El Hajj, Onur Mutlu, Alaa R. Alameldeen | 2026-02-09 | Download | Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused ... |
| Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers | Jędrzej Maczan | 2026-02-09 | Download | WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true cost of this overhead is poorly characterize... |
| Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide | Hossam Amer, Rezaul Karim, Ali Pourranjbar, Weiwei Zhang, Walid Ahmed, Boxing Chen | 2026-02-09 | Download | With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inference. |
| DynamiQ: Accelerating Gradient Synchronization using Compressed Multi-hop All-reduce | Wenchen Han, Shay Vargaftik, Michael Mitzenmacher, Ran Ben Basat | 2026-02-09 | Download | Multi-hop all-reduce is the de facto backbone of large model training. As the training scale increases, the network often becomes a bottleneck, motivating reducing the volume of transmitted data. |
| Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale | Kaiyang Zhao, Neha Gholkar, Hasan Maruf, Abhishek Dhanotia, Johannes Weiner, Gregory Price, Ning Sun, Bhavya Dwivedi, Stuart Clark, Dimitrios Skarlatos | 2026-02-09 | Download | Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requi... |
| PARD: Enhancing Goodput for Inference Pipeline via Proactive Request Dropping | Zhixin Zhao, Yitao Hu, Simin Chen, Mingfang Ji, Wei Yang, Yuhao Zhang, Laiping Zhao, Wenxin Li, Xiulong Liu, Wenyu Qu, Hao Wang | 2026-02-09 | Download | Modern deep neural network (DNN) applications integrate multiple DNN models into inference pipelines with stringent latency requirements for customized tasks. |
| RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks | Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei, Jörg Henkel | 2026-02-09 | Download | Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These devices, typically relying on TinyML models, collaborat... |
| Mathematical Foundations of Modeling ETL Process Chains | Levin Maier, Lucas Schulze, Robert Lilow, Lukas Hahn, Nikola Krasowski, Arnulf Barth, Sebastian Gaebel, Ferdi Güran, Oliver Hanau, Giovanni Wagner, Falk Borgmann, Oleg Arenz, Jan Peters | 2026-02-09 | Download | Extract-Transform-Load (ETL) processes are core components of modern data processing infrastructures. The throughput of processed data records can be adjusted by changing the amount of allocated resou... |
| Modalities, a PyTorch-native Framework For Large-scale LLM Training and Research | Max Lübbering, Timm Ruland, Richard Rutmann, Felix Stollenwerk, David Fitzek, Michael Fromm, Alexander Weber, Rafet Sifa, Nicolas Flores-Herr, Joachim Köhler, Mehdi Ali | 2026-02-09 | Download | Today's LLM (pre-) training and research workflows typically allocate a significant amount of compute to large-scale ablation studies. Despite the substantial compute costs of these ablations, existin... |
| Towards CXL Resilience to CPU Failures | Antonis Psistakis, Burak Ocalan, Chloe Alverti, Fabien Chaix, Ramnatthan Alagappan, Josep Torrellas | 2026-02-09 | Download | Compute Express Link (CXL) 3.0 and beyond allows the compute nodes of a cluster to share data with hardware cache coherence and at the granularity of a cache line. |
| HEAL: Online Incremental Recovery for Leaderless Distributed Systems Across Persistency Models | Antonis Psistakis, Burak Ocalan, Fabien Chaix, Ramnatthan Alagappan, Josep Torrellas | 2026-02-09 | Download | Ensuring resilience in distributed systems has become an acute concern. In today's environment, it is crucial to develop light-weight mechanisms that recover a distributed system from faults quickly a... |
| The Computer System Trail | Sushant Kumar Gupta | 2026-02-09 | Download | No matter how much the world of computing changes, system design remains crucial. While most people try to learn it through quick tutorials or AI-generated summaries, there is no better way to master ... |
| Fork, Explore, Commit: OS Primitives for Agentic Exploration | Cong Wang, Yusheng Zheng | 2026-02-09 | Download | AI agents increasingly perform agentic exploration: pursuing multiple solution paths in parallel and committing only the successful one. Because each exploration path may modify files and spawn proces... |
| ZipFlow: a Compiler-based Framework to Unleash Compressed Data Movement for Modern GPUs | Gwangoo Yeo, Zhiyang Shen, Wei Cui, Matteo Interlandi, Rathijit Sen, Bailu Ding, Qi Chen, Minsoo Rhu | 2026-02-09 | Download | In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Harvest: Adaptive Photonic Switching Schedules for Collective Communication in Scale-up Domains | Mahir Rahman, Samuel Joseph, Nihar Kodkani, Behnaz Arzani, Vamsi Addanki | 2026-02-09 | Download | As chip-to-chip silicon photonics gain traction for their bandwidth and energy efficiency, their circuit-switched nature raises a fundamental question for collective communication: when and how should... |
| Probabilistic Fair Ordering of Events | Muhammad Haseeb, Jinkun Geng, Aurojit Panda, Radhika Mittal, Nirav Atre, Srinivas Narayana, Anirudh Sivaraman | 2026-02-09 | Download | A growing class of applications depends on fair ordering, where events that occur earlier should be processed before later ones. Providing such guarantees is difficult in practice because clock synchr... |
| Zero Trust for Multi-RAT IoT: Trust Boundary Management in Heterogeneous Wireless Network Environments | Jonathan Shelby | 2026-02-09 | Download | The proliferation of Multi-Radio Access Technology, Internet of Things devices, particularly Unmanned Aerial Vehicles operating across LoRaWAN, 5G/4G cellular, Meshtastic mesh, proprietary protocols s... |
| Lightweight Call Signaling and Peer-to-Peer Control of WebRTC Video Conferencing | Kundan Singh | 2026-02-09 | Download | We present the software architecture and implementation of our web-based multiparty video conference application. It does not use a media server. |
| DynamiQ: Accelerating Gradient Synchronization using Compressed Multi-hop All-reduce | Wenchen Han, Shay Vargaftik, Michael Mitzenmacher, Ran Ben Basat | 2026-02-09 | Download | Multi-hop all-reduce is the de facto backbone of large model training. As the training scale increases, the network often becomes a bottleneck, motivating reducing the volume of transmitted data. |
| Framework for Integrating Zero Trust in Cloud-Based Endpoint Security for Critical Infrastructure | Shyam Kumar Gajula | 2026-02-09 | Download | Cyber threats have become highly sophisticated, prompting a heightened concern for endpoint security, especially in critical infrastructure, to new heights. |
| Rethinking IPv6 Defense: A Unified Edge-Centric Zero-Trust Data-Plane Architecture | Walid Aljoby, Mohammed Alzayani, Md. Kamrul Hossain, Khaled A. Harras | 2026-02-09 | Download | IPv6 dependability is increasingly inseparable from IPv6 security: Neighbor Discovery (ND), Router Advertisements (RA), and ICMPv6 are essential for correct operation yet expose a broad attack surface... |
| 6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks | Mohamed Amine Ferrag, Abderrahmane Lakas, Merouane Debbah | 2026-02-09 | Download | This paper introduces 6G-Bench, an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. 6G-Bench defines a taxonomy of 30 decision-making tasks (T... |
| Equitable Multi-Task Learning for AI-RANs | Panayiotis Raptis, Fatih Aslan, George Iosifidis | 2026-02-09 | Download | AI-enabled Radio Access Networks (AI-RANs) are expected to serve heterogeneous users with time-varying learning tasks over shared edge resources. |
| From Raw Data to Shared 3D Semantics: Task-Oriented Communication for Multi-Robot Collaboration | Ruibo Xue, Jiedan Tan, Fang Liu, Jingwen Tong, Taotao Wang, Shuoyao Wang | 2026-02-09 | Download | Multi-robot systems (MRS) rely on exchanging raw sensory data to cooperate in complex three-dimensional (3D) environments. However, this strategy often leads to severe communication congestion and hig... |
| Decentralized Spatial Reuse Optimization in Wi-Fi: An Internal Regret Minimization Approach | Francesc Wilhelmi, Boris Bellalta, Miguel Casasnovas, Aleksandra Kijanka, Miguel Calvo-Fullana | 2026-02-09 | Download | Spatial Reuse (SR) is a cost-effective technique for improving spectral efficiency in dense IEEE 802.11 deployments by enabling simultaneous transmissions. |
| RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks | Pouria Arefijamal, Mahdi Ahmadlou, Bardia Safaei, Jörg Henkel | 2026-02-09 | Download | Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These devices, typically relying on TinyML models, collaborat... |
| PACC: Protocol-Aware Cross-Layer Compression for Compact Network Traffic Representation | Zhaochen Guo, Tianyufei Zhou, Honghao Wang, Ronghua Li, Shinan Liu | 2026-02-09 | Download | Network traffic classification is a core primitive for network security and management, yet it is increasingly challenged by pervasive encryption and evolving protocols. |
| MonkeyTree: Near-Minimal Congestion for Multi-tenant Training via Migration | Anton A. Zabreyko, Weiyang Wang, Manya Ghobadi | 2026-02-09 | Download | We present MonkeyTree, the first system to mitigate network congestion in multi-tenant GPU clusters through job-migration based defragmentation rather than network-layer techniques. |
| Software Testing at the Network Layer: Automated HTTP API Quality Assessment and Security Analysis of Production Web Applications | Ali Hassaan Mughal, Muhammad Bilal, Noor Fatima | 2026-02-09 | Download | Modern web applications rely heavily on client-side API calls to fetch data, render content, and communicate with backend services. However, the quality of these network interactions (redundant reques... |
| NeuroScaler: Towards Energy-Optimal Autoscaling for Container-Based Services | Alisson O. Chaves, Rodrigo Moreira, Larissa F. Rodrigues Moreira, Joao Correia, David Santos, Rui Silva, Tiago Barros, Daniel Corujo, Miguel Rocha, Flavio de Oliveira Silva | 2026-02-09 | Download | Future networks must meet stringent requirements while operating within tight energy and carbon constraints. Current autoscaling mechanisms remain workload-centric and infrastructure-siloed, and are l... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Equilibria: Fair Multi-Tenant CXL Memory Tiering At Scale | Kaiyang Zhao, Neha Gholkar, Hasan Maruf, Abhishek Dhanotia, Johannes Weiner, Gregory Price, Ning Sun, Bhavya Dwivedi, Stuart Clark, Dimitrios Skarlatos | 2026-02-09 | Download | Memory dominates datacenter system cost and power. Memory expansion via Compute Express Link (CXL) is an effective way to provide additional memory at lower cost and power, but its effective use requi... |
| The Computer System Trail | Sushant Kumar Gupta | 2026-02-09 | Download | No matter how much the world of computing changes, system design remains crucial. While most people try to learn it through quick tutorials or AI-generated summaries, there is no better way to master ... |
| Fork, Explore, Commit: OS Primitives for Agentic Exploration | Cong Wang, Yusheng Zheng | 2026-02-09 | Download | AI agents increasingly perform agentic exploration: pursuing multiple solution paths in parallel and committing only the successful one. Because each exploration path may modify files and spawn proces... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Characterizing WebGPU Dispatch Overhead for LLM Inference Across Four GPU Vendors, Three Backends, and Three Browsers | Jędrzej Maczan | 2026-02-09 | Download | WebGPU's security-focused design imposes per-operation validation that compounds across the many small dispatches in neural network inference, yet the true cost of this overhead is poorly characterize... |
| A Machine Learning accelerated geophysical fluid solver | Yang Bai | 2026-02-09 | Download | Machine learning methods have been successful in many areas, like image classification and natural language processing. However, it still needs to be determined how to apply ML to areas with mathemati... |
