2026-03-05

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Scalable Digital Compute-in-Memory Ising Machines for Robustness Verification of Binary Neural Networks | Madhav Vadlamani, Rahul Singh, Yuyao Kong, Zheng Zhang, Shimeng Yu | 2026-03-05 | Verification of binary neural network (BNN) robustness is NP-hard, as it can be formulated as a combinatorial search for an adversarial perturbation that induces misclassification. |
| NL2GDS: LLM-aided interface for Open Source Chip Design | Max Eland, Jeyan Thiyagalingam, Dinesh Pamunuwa, Roshan Weerasekera | 2026-03-05 | The growing complexity of hardware design and the widening gap between high-level specifications and register-transfer level (RTL) implementation hinder rapid prototyping and system design. |
| Network Design for Wafer-Scale Systems with Wafer-on-Wafer Hybrid Bonding | Patrick Iff, Tommaso Bonato, Maciej Besta, Luca Benini, Torsten Hoefler | 2026-03-05 | Transformer-based large language models are increasingly constrained by data movement as communication bandwidth drops sharply beyond the chip boundary. |
| AI+HW 2035: Shaping the Next Decade | Deming Chen, Jason Cong, Azalia Mirhoseini, Christos Kozyrakis, Subhasish Mitra, Jinjun Xiong, Cliff Young, Anima Anandkumar, Michael Littman, Aron Kirschen, Sophia Shao, Serge Leef, Naresh Shanbhag, Dejan Milojicic, Michael Schulte, Gert Cauwenberghs, Jerry M. Chow, Tri Dao, Kailash Gopalakrishnan, Richard Ho, Hoshik Kim, Kunle Olukotun, David Z. Pan, Mark Ren, Dan Roth, Aarti Singh, Yizhou Sun, Yusu Wang, Yann LeCun, Ruchir Puri | 2026-03-05 | Artificial intelligence (AI) and hardware (HW) are advancing at unprecedented rates, yet their trajectories have become inseparably intertwined. |
| Diagnosing FP4 inference: a layer-wise and block-wise sensitivity analysis of NVFP4 and MXFP4 | Musa Cim, Burak Topcu, Mahmut Taylan Kandemir | 2026-03-05 | Quantization addresses the high resource demand for large language models (LLMs) by alleviating memory pressure and bandwidth congestion and providing significantly scaled compute power with a tolerab... |
| MCEL: Margin-Based Cross-Entropy Loss for Error-Tolerant Quantized Neural Networks | Mikail Yayla, Akash Kumar | 2026-03-05 | Robustness to bit errors is a key requirement for the reliable use of neural networks (NNs) on emerging approximate computing platforms and error-prone memory technologies. |
| VMXDOTP: A RISC-V Vector ISA Extension for Efficient Microscaling (MX) Format Acceleration | Max Wipfli, Gamze İslamoğlu, Navaneeth Kunhi Purayil, Angelo Garofalo, Luca Benini | 2026-03-05 | Compared to the first generation of deep neural networks, dominated by regular, compute-intensive kernels such as matrix multiplications (MatMuls) and convolutions, modern decoder-based transformers i... |
| Hardware-Software Co-design for 3D-DRAM-based LLM Serving Accelerator | Cong Li, Yihan Yin, Chenhao Xue, Zhao Wang, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Yuan Xie, Guangyu Sun | 2026-03-05 | Large language models (LLMs) have been widely deployed for online generative services, where numerous LLM instances jointly handle workloads with fluctuating request arrival rates and variable request... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| A Lock-Free Work-Stealing Algorithm for Bulk Operations | Raja Sai Nandhan Yadav Kataru, Danial Davarnia, Ali Jannesari | 2026-03-05 | Work-stealing is a widely used technique for balancing irregular parallel workloads, and most modern runtime systems adopt lock-free work-stealing deques to reduce contention and improve scalability. |
| Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks | Burak Topcu, Musa Oguzhan Cim, Poovaiah Palangappa, Meena Arunachalam, Mahmut Taylan Kandemir | 2026-03-05 | Breakthroughs in the generative AI domain have fueled an explosion of large language model (LLM)-powered applications, whose workloads fundamentally consist of sequences of inferences through transfor... |
| Why Ethereum Needs Fairness Mechanisms that Do Not Depend on Participant Altruism | Patrick Spiesberger, Nils Henrik Beyer, Hannes Hartenstein | 2026-03-05 | Ethereum's ideals of decentralization and censorship resistance are undermined in practice, motivating ongoing efforts to reestablish these properties. |
| The Need for Quantitative Resilience Models and Metrics in Classical-Quantum Computing Systems | Santiago Núñez-Corrales | 2026-03-05 | Increasingly deeper integration of HPC resources and QPUs unveils new challenges in computer architecture and engineering. As a consequence, dependability arises again as a concern encompassing resili... |
| Radiation Hydrodynamics at Scale: Comparing MPI and Asynchronous Many-Task Runtimes with FleCSI | Alexander Strack, Hartmut Kaiser, Dirk Pflüger | 2026-03-05 | Writing efficient distributed code remains a labor-intensive and complex endeavor. To simplify application development, the Flexible Computational Science Infrastructure (FleCSI) framework offers a us... |
| A monitoring system for collecting and aggregating metrics from distributed clouds | Tamara Ranković, Mateja Rilak, Janko Rakonjac, Miloš Simić | 2026-03-05 | Applications requiring real-time processing of large volumes of data have been the main driver for rethinking the traditional cloud, giving rise to novel cloud models. |
| Scaling Real-Time Traffic Analytics on Edge-Cloud Fabrics for City-Scale Camera Networks | Akash Sharma, Pranjal Naman, Roopkatha Banerjee, Priyanshu Pansari, Sankalp Gawali, Mayank Arya, Sharath Chandra, Arun Josephraj, Rakshit Ramesh, Punit Rathore, Anirban Chakraborty, Raghu Krishnapuram, Vijay Kovvali, Yogesh Simmhan | 2026-03-05 | Real-time city-scale traffic analytics requires processing 100s-1000s of CCTV streams under strict latency, bandwidth, and compute limits. We present a scalable AI-driven Intelligent Transportation Sy... |
| Leveraging Structural Knowledge for Solving Election in Anonymous Networks with Shared Randomness | Jérémie Chalopin, Emmanuel Godard | 2026-03-05 | We study the classical Election problem in anonymous networks, where solutions can rely on the use of random bits, which may be either shared or unshared among nodes. |
| PromptTuner: SLO-Aware Elastic System for LLM Prompt Tuning | Wei Gao, Peng Sun, Dmitrii Ustiugov, Tianwei Zhang, Yonggang Wen | 2026-03-05 | Prompt tuning has become a prominent strategy for enhancing the performance of Large Language Models (LLMs) on downstream tasks. Many IT enterprises now offer Prompt-Tuning-as-a-Service to fulfill the... |
| FluxSieve: Unifying Streaming and Analytical Data Planes for Scalable Cloud Observability | Adriano Vogel, Sören Henning, Otmar Ertl | 2026-03-05 | Despite many advances in query optimization, indexing techniques, and data storage, modern data platforms still face difficulties in delivering robust query performance under high concurrency and comp... |
| The Semantic Arrow of Time, Part V: The Leibniz Bridge -- Toward a Unified Theory of Semantic Time | Paul Borrill | 2026-03-05 | This is the final paper in the five-part series The Semantic Arrow of Time. Part I identified the FITO category mistake -- treating forward temporal flow as sufficient for establishing meaning. |
| The Semantic Arrow of Time, Part IV: Why Transactions Fail | Paul Borrill | 2026-03-05 | This is the fourth of five papers comprising The Semantic Arrow of Time. Parts I-III established that computing's hidden arrow of time is semantic rather than thermodynamic, that bilateral transaction... |
| Unlocking Python's Cores: Hardware Usage and Energy Implications of Removing the GIL | José Daniel Montoya Salazar | 2026-03-05 | Python's Global Interpreter Lock prevents execution on more than one CPU core at the same time, even when multiple threads are used. However, starting with Python 3. |
| The Semantic Arrow of Time, Part III: RDMA and the Completion Fallacy | Paul Borrill | 2026-03-05 | This is the third of five papers comprising The Semantic Arrow of Time. Parts I and II identified computing's hidden semantic arrow of time, the FITO category mistake, and presented the constructive a... |
| SLO-Aware Compute Resource Allocation for Prefill-Decode Disaggregated LLM Inference | Luchang Li, Dongfang Li, Bozhao Gong, Yu Zhang | 2026-03-05 | Prefill-Decode (P/D) disaggregation has emerged as a widely adopted optimization strategy for Large Language Model (LLM) inference. However, there currently exists no well-established methodology for ... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Tureis: Transformer-based Unified Resilience for IoT Devices in Smart Homes | Alireza Borhani, Vafa Andalibi, Bahar Asgari | 2026-03-05 | Smart-home IoT systems rely on heterogeneous sensor networks whose correctness shapes application behavior and the physical environment. However, these low-cost, resource-constrained sensors are highl... |
| V2N-Based Algorithm and Communication Protocol for Autonomous Non-Stop Intersections | Lorenzo Farina, Lorenzo Mario Amorosa, Marco Rapelli, Barbara Maví Masini, Claudio Casetti, Alessandro Bazzi | 2026-03-05 | Intersections are critical areas for road safety and traffic efficiency, accounting for a significant portion of vehicle crashes and fatalities. |
| Analysis of Proactive Uncoordinated Techniques to Mitigate Interference in FMCW Automotive Radars | Alessandro Bazzi, Francesco Miccoli, Fabrizio Cuccoli, Luca Facheris, Vincent Martinez | 2026-03-05 | Modern vehicles increasingly rely on advanced driver-assistance systems (ADAS), with radars playing a key role due to their cost-effectiveness and reliable performance. |
| U-Parking: Distributed UWB-Assisted Autonomous Parking System with Robust Localization and Intelligent Planning | Yiang Wu, Qiong Wu, Pingyi Fan, Kezhi Wang, Wen Chen, Guoqiang Mao, Khaled B. Letaief | 2026-03-05 | This demonstration presents U-Parking, a distributed Ultra-Wideband (UWB)-assisted autonomous parking system. By integrating Large Language Models (LLMs)-assisted planning with robust fusion localizat... |
| Adaptive Personalized Federated Reinforcement Learning for RIS-Assisted Aerial Relays in SAGINs with Fluid Antennas | Yuxuan Yang, Bin Lyu, Abbas Jamalipour | 2026-03-05 | Space-air-ground integrated networks (SAGINs) interconnect satellites, uncrewed aerial vehicles (UAVs), and ground devices to enable flexible and ubiquitous wireless services. |
| Selfish Cooperation Towards Low-Altitude Economy: Integrated Multi-Service Deployment with Resilient Federated Reinforcement Learning | Yuxuan Yang, Bin Lyu, Abbas Jamalipour | 2026-03-05 | The low-altitude economy (LAE) is a rapidly emerging paradigm that builds a service-centric economic ecosystem through large-scale and sustainable uncrewed aerial vehicle (UAV)-enabled service provisi... |
| Body-scale NFC for wearables: human-centric body-scale NFC networking for ultra-low-power wearable devices (Demo of UTokyo Kawahara Lab 2025) | Hideaki Yamamoto, Yifan Li, Wakako Yukita, Tomoyuki Yokota, Takao Someya, Ryo Takahashi, Yoshihiro Kawahara | 2026-03-05 | Near Field Communication (NFC) is a promising technology for ultra-low-power wearables, yet its short communication range limits its use to narrow-area, point-to-point interactions. |
| Quantum Algorithms for Network Signal Coordination | Vinayak Dixit, Richard Pech | 2026-03-05 | There has been increasing interest in developing efficient quantum algorithms for hard classical problems. The Network Signal Coordination (NSC) problem is one such problem known to be NP complete. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Parallelization Strategies for Dense LLM Deployment: Navigating Through Application-Specific Tradeoffs and Bottlenecks | Burak Topcu, Musa Oguzhan Cim, Poovaiah Palangappa, Meena Arunachalam, Mahmut Taylan Kandemir | 2026-03-05 | Breakthroughs in the generative AI domain have fueled an explosion of large language model (LLM)-powered applications, whose workloads fundamentally consist of sequences of inferences through transfor... |
| FluxSieve: Unifying Streaming and Analytical Data Planes for Scalable Cloud Observability | Adriano Vogel, Sören Henning, Otmar Ertl | 2026-03-05 | Despite many advances in query optimization, indexing techniques, and data storage, modern data platforms still face difficulties in delivering robust query performance under high concurrency and comp... |
| Rethinking Temporal Models for TinyML: LSTM versus 1D-CNN in Resource-Constrained Devices | Bidyut Saha, Riya Samanta | 2026-03-05 | Time series classification underpins applications such as human activity recognition, healthcare monitoring, and gesture detection in the IoT domain. |
| Unlocking Python's Cores: Hardware Usage and Energy Implications of Removing the GIL | José Daniel Montoya Salazar | 2026-03-05 | Python's Global Interpreter Lock prevents execution on more than one CPU core at the same time, even when multiple threads are used. However, starting with Python 3. |