
2026-03-27

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Efficient CMOS Invertible Logic Using Stochastic Computing | Sean C. Smithson, Naoya Onizawa, Brett H. Meyer, Warren J. Gross, Takahiro Hanyu | 2026-03-27 | Download | Invertible logic can operate in one of two modes: 1) a forward mode, in which inputs are presented and a single, correct output is produced, and 2) a reverse mode, in which the output is fixed and the... |
| Who Checks the Checker? Enhancing Component-level Architectural SEU Fault Tolerance for End-to-End SoC Protection | Michael Rogenmoser, Philippe Sauter, Chen Wu, Angelo Garofalo, Luca Benini | 2026-03-27 | Download | Single-event upset (SEU) fault tolerance for systems-on-chip (SoCs) in radiation-heavy environments is often addressed by architectural fault-tolerance approaches protecting individual SoC components ... |
| A Lightweight High-Throughput Collective-Capable NoC for Large-Scale ML Accelerators | Luca Colagrande, Lorenzo Leone, Chen Wu, Tim Fischer, Raphael Roth, Luca Benini | 2026-03-27 | Download | The exponential increase in Machine Learning (ML) model size and complexity has driven unprecedented demand for high-performance acceleration systems. |
| Wattchmen: Watching the Wattchers -- High Fidelity, Flexible GPU Energy Modeling | Brandon Tran, Matthias Maiterth, Woong Shin, Matthew D. Sinclair, Shivaram Venkataraman | 2026-03-27 | Download | Modern GPU-rich HPC systems are increasingly becoming energy-constrained. Thus, understanding an application's energy consumption becomes essential. |
| VolTune: A Fine-Grained Runtime Voltage Control Architecture for FPGA Systems | Akram Ben Ahmed, Takahiro Hirofuchi, Takaaki Fukai | 2026-03-27 | Download | The rapid emergence of edge computing platforms and large-scale data centers has made power efficiency a primary design constraint, particularly for data-intensive and AI-driven workloads. |
| IBEX: Internal Bandwidth-Efficient Compression Architecture for Scalable CXL Memory Expansion | Younghoon Ko, Hyemin Park, Hyuk-Jae Lee, Hyokeun Lee | 2026-03-27 | Download | As the memory channel count is confined by physical dimensions, memory expanders appear to be a promising approach to extending memory capacity and channels by augmenting the existing I/O interface (e... |
| RAGnaroX: A Secure, Local-Hosted ChatOps Assistant Using Small Language Models | Benedikt Dornauer, Mircea-Cristian Racasan | 2026-03-27 | Download | This paper introduces RAGnaroX, a resource-efficient ChatOps assistant that operates entirely on commodity hardware. Unlike existing solutions that often rely on external providers such as Azure or Op... |
| Per-Bank Memory Bandwidth Regulation for Predictable and Performant Real-Time System | Connor Rudy Sullivan, Amin Mamandipoor, Cole Ridge Strickler, Heechul Yun | 2026-03-27 | Download | Modern multicore system-on-chips (SoCs) share off-chip DRAM across cores, where bank-level interference can significantly degrade performance and threaten real-time guarantees. |
| Data Gravity and the Energy Limits of Computation | Wonsuk Lee, Jehoshua Bruck | 2026-03-27 | Download | Unlike the von Neumann architecture, which separates computation from memory, the brain tightly integrates them, an organization that large language models increasingly resemble. |
| VeRA+: Vector-Based Lightweight Digital Compensation for Drift-Resilient RRAM In-Memory Computing | Weirong Dong, Kai Zhou, Zhen Kong, Zhengke Yang, Quan Cheng, Haoyuan Li, Junkai Huang, Jun Lan, Yida Li, Masanori Hashimoto, Longyang Lin | 2026-03-27 | Download | RRAM-based in-memory computing (IMC) offers high energy efficiency but suffers from conductance drift that severely degrades long-term accuracy. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HFIPay: Privacy-Preserving, Cross-Chain Cryptocurrency Payments to Human-Friendly Identifiers | Jian Sheng Wang | 2026-03-27 | Download | Sending cryptocurrency to an email address or phone number should be as simple as a bank transfer, yet naive schemes that map identifiers directly to blockchain addresses expose the recipient's balanc... |
| Fast Topology-Aware Lossy Data Compression with Full Preservation of Critical Points and Local Order | Alex Fallin, Nathaniel Gorski, Tripti Agarwal, Bei Wang, Ganesh Gopalakrishnan, Martin Burtscher | 2026-03-27 | Download | Many scientific codes and instruments generate large amounts of floating-point data at high rates that must be compressed before they can be stored. |
| Efficiently Reproducing Distributed Workflows in Notebook-based Systems | Talha Azaz, Raza Ahmad, Md Saiful Islam, Douglas Thain, Tanu Malik | 2026-03-27 | Download | Notebooks provide an author-friendly environment for iterative development, modular execution, and easy sharing. Distributed workflows are increasingly being authored and executed in notebooks, yet sh... |
| Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP | Ghazal Rahimi, Victor Lopez, Marc Clascà, Joan Vinyals Ylla Català, Jesus Labarta, Marta Garcia-Gasulla | 2026-03-27 | Download | The increasing adoption of heterogeneous platforms that combine CPUs with accelerators such as GPUs in high-performance computing (HPC) introduces new challenges for performance analysis and optimizat... |
| Rocks, Pebbles and Sand: Modality-aware Scheduling for Multimodal Large Language Model Inference | Konstantinos Papaioannou, Thaleia Dimitra Doudali | 2026-03-27 | Download | Multimodal Large Language Models (MLLMs) power platforms like ChatGPT, Gemini, and Copilot, enabling richer interactions with text, images, and videos. |
| UNIFERENCE: A Discrete Event Simulation Framework for Developing Distributed AI Models | Doğaç Eldenk, Stephen Xia | 2026-03-27 | Download | Developing and evaluating distributed inference algorithms remains difficult due to the lack of standardized tools for modeling heterogeneous devices and networks. |
| A Lightweight High-Throughput Collective-Capable NoC for Large-Scale ML Accelerators | Luca Colagrande, Lorenzo Leone, Chen Wu, Tim Fischer, Raphael Roth, Luca Benini | 2026-03-27 | Download | The exponential increase in Machine Learning (ML) model size and complexity has driven unprecedented demand for high-performance acceleration systems. |
| Wattchmen: Watching the Wattchers -- High Fidelity, Flexible GPU Energy Modeling | Brandon Tran, Matthias Maiterth, Woong Shin, Matthew D. Sinclair, Shivaram Venkataraman | 2026-03-27 | Download | Modern GPU-rich HPC systems are increasingly becoming energy-constrained. Thus, understanding an application's energy consumption becomes essential. |
| ParaQAOA: Efficient Parallel Divide-and-Conquer QAOA for Large-Scale Max-Cut Problems Beyond 10,000 Vertices | Po-Hsuan Huang, Xie-Ru Li, Chi Chuang, Chia-Heng Tu, Shih-Hao Hung | 2026-03-27 | Download | Quantum Approximate Optimization Algorithm (QAOA) has emerged as a promising solution for combinatorial optimization problems using a hybrid quantum-classical framework. |
| Distributed Quantum Discrete Logarithm Algorithm | Renjie Xu, Daowen Qiu, Ligang Xiao, Le Luo, Xu Zhou | 2026-03-27 | Download | Solving the discrete logarithm problem (DLP) with quantum computers is a fundamental task with important implications. Beyond Shor's algorithm, many researchers have proposed alternative solutions in ... |
| DarwinNet: An Evolutionary Network Architecture for Agent-Driven Protocol Synthesis | Jinliang Xu, Bingqi Li | 2026-03-27 | Download | Traditional network architectures suffer from severe protocol ossification and structural fragility due to their reliance on static, human-defined rules that fail to adapt to the emergent edge cases a... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ML-Enabled Open RAN: A Comprehensive Survey of Architectures, Challenges, and Opportunities | Mira Chandra Kirana, Patatchona Keyela, Fatemeh Rostamian, Deemah H. Tashman, Soumaya Cherkaoui | 2026-03-27 | Download | As wireless communication systems become more advanced, Open Radio Access Networks (O-RAN) stand out as a notable framework that promotes interoperability and cost-effectiveness. |
| Trustworthy AI-Driven Dynamic Hybrid RIS: Joint Optimization and Reward Poisoning-Resilient Control in Cognitive MISO Networks | Deemah H. Tashman, Soumaya Cherkaoui | 2026-03-27 | Download | Cognitive radio networks (CRNs) are a key mechanism for alleviating spectrum scarcity by enabling secondary users (SUs) to opportunistically access licensed frequency bands without harmful interferenc... |
| Innovation Discovery System for Networking Research | Mengrui Zhang, Bang Huang, Yunxin Xu, Haiying Huang, Luxi Zhao, Mochun Long, Qingyu Song, Qiao Xiang, Xue Liu, Jiwu Shu | 2026-03-27 | Download | As networking systems become increasingly complex, achieving disruptive innovation grows more challenging. At the same time, recent progress in Large Language Models (LLMs) has shown strong potential ... |
| DarwinNet: An Evolutionary Network Architecture for Agent-Driven Protocol Synthesis | Jinliang Xu, Bingqi Li | 2026-03-27 | Download | Traditional network architectures suffer from severe protocol ossification and structural fragility due to their reliance on static, human-defined rules that fail to adapt to the emergent edge cases a... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP | Ghazal Rahimi, Victor Lopez, Marc Clascà, Joan Vinyals Ylla Català, Jesus Labarta, Marta Garcia-Gasulla | 2026-03-27 | Download | The increasing adoption of heterogeneous platforms that combine CPUs with accelerators such as GPUs in high-performance computing (HPC) introduces new challenges for performance analysis and optimizat... |
| ParaQAOA: Efficient Parallel Divide-and-Conquer QAOA for Large-Scale Max-Cut Problems Beyond 10,000 Vertices | Po-Hsuan Huang, Xie-Ru Li, Chi Chuang, Chia-Heng Tu, Shih-Hao Hung | 2026-03-27 | Download | Quantum Approximate Optimization Algorithm (QAOA) has emerged as a promising solution for combinatorial optimization problems using a hybrid quantum-classical framework. |
| Optimization Trade-offs in Asynchronous Federated Learning: A Stochastic Networks Approach | Abdelkrim Alahyane, Céline Comte, Matthieu Jonckheere | 2026-03-27 | Download | Synchronous federated learning scales poorly due to the straggler effect. Asynchronous algorithms increase the update throughput by processing updates upon arrival, but they introduce two fundamental ... |
| Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations | Mayank Jha | 2026-03-27 | Download | The development of large-scale foundation models, particularly Large Language Models (LLMs), is constrained by significant computational and memory bottlenecks. |
