
2025-12-03

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Spiking Manifesto | Eugene Izhikevich | 2025-12-03 | Download | Practically everything computers do is better, faster, and more power-efficient than the brain. For example, a calculator performs numerical computations more energy-efficiently than any human. |
| A Spatial Array for Spectrally Agile Wireless Processing | Ali Rasteh, Andrew Hennessee, Ishaan Shivhare, Siddharth Garg, Sundeep Rangan, Brandon Reagen | 2025-12-03 | Download | Massive MIMO is a cornerstone of next-generation wireless communication, offering significant gains in capacity, reliability, and energy efficiency. |
| The BrainScaleS-2 multi-chip system: Interconnecting continuous-time neuromorphic compute substrates | Joscha Ilmberger, Johannes Schemmel | 2025-12-03 | Download | The BrainScaleS-2 SoC integrates analog neuron and synapse circuits with digital periphery, including two CPUs with SIMD extensions. Each ASIC is connected to a Node-FPGA, providing experiment control... |
| Lightweight Unified Sha-3/Shake Architecture with a Fault-Resilient State | Christian Ewert, Amrit Sharma Poudel, Mouadh Ayache, Andrija Neskovic, Rainer Buchty, Mladen Berekovic, Sebastian Berndt, Saleh Mulhem | 2025-12-03 | Download | Hash functions have become a key part of standard Post-quantum cryptography (PQC) schemes, especially Sha-3 and Shake, calling for lightweight implementation. |
| KVNAND: Efficient On-Device Large Language Model Inference Using DRAM-Free In-Flash Computing | Lishuo Deng, Shaojie Xu, Jinwu Chen, Changwei Yan, Jiajie Wang, Zhe Jiang, Weiwei Shan | 2025-12-03 | Download | Deploying large language models (LLMs) on edge devices enables personalized agents with strong privacy and low cost. However, with tens to hundreds of billions of parameters, single-batch autoregressi... |
| Accelerating Detailed Routing Convergence through Offline Reinforcement Learning | Afsara Khan, Austin Rovinski | 2025-12-03 | Download | Detailed routing remains one of the most complex and time-consuming steps in modern physical design due to the challenges posed by shrinking feature sizes and stricter design rules. |
| In-Situ Encryption of Single-Transistor Nonvolatile Memories without Density Loss | Sanwar Ahmed Ovy, Jiahui Duan, Md Ashraful Islam Romel, Franz Muller, Thomas Kampfe, Kai Ni, Sumitha George | 2025-12-03 | Download | Non-volatile memories (NVMs) offer negligible leakage power consumption, high integration density, and data retention, but their non-volatility also raises the risk of data exposure. |
| Agentic Operator Generation for ML ASICs | Alec M. Hammond, Aram Markosyan, Aman Dontula, Simon Mahns, Zacharias Fisches, Dmitrii Pedchenko, Keyur Muzumdar, Natacha Supper, Mark Saroufim, Joe Isaacson, Laura Wang, Warren Hunt, Kaustubh Gondkar, Roman Levenstein, Gabriel Synnaeve, Richard Li, Jacob Kahn, Ajit Mathews | 2025-12-03 | Download | We present TritorX, an agentic AI system designed to generate functionally correct Triton PyTorch ATen kernels at scale for emerging accelerator platforms. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| VLCs: Managing Parallelism with Virtualized Libraries | Yineng Yan, William Ruys, Hochan Lee, Ian Henriksen, Arthur Peters, Sean Stephens, Bozhi You, Henrique Fingler, Martin Burtscher, Milos Gligoric, Keshav Pingali, Mattan Erez, George Biros, Christopher J. Rossbach | 2025-12-03 | Download | As the complexity and scale of modern parallel machines continue to grow, programmers increasingly rely on composition of software libraries to encapsulate and exploit parallelism. |
| Scaling MPI Applications on Aurora | Huda Ibeid, Anthony-Trung Nguyen, Aditya Nishtala, Premanand Sakarda, Larry Kaplan, Nilakantan Mahadevan, Michael Woodacre, Victor Anisimov, Kalyan Kumaran, JaeHyuk Kwack, Vitali Morozov, Servesh Muralidharan, Scott Parker | 2025-12-03 | Download | The Aurora supercomputer, which was deployed at Argonne National Laboratory in 2024, is currently one of three Exascale machines in the world on the Top500 list. |
| tritonBLAS: Triton-based Analytical Approach for GEMM Kernel Parameter Selection | Ryan Swann, Muhammad Osama, Xiaohu Guo, Bryant Nelson, Lixun Zhang, Alex Brown, Yen Ong, Ali Yazdani, Sean Siddens, Ganesh Dasika, Alex Underwood | 2025-12-03 | Download | We present tritonBLAS, a fast and deterministic analytical model that uses architectural parameters like the cache hierarchy, and relative code and data placement to generate performant GPU GEMM kerne... |
| A Chronological Analysis of the Evolution of SmartNICs | Olasupo Ajayi, Ryan Grant | 2025-12-03 | Download | Network Interface Cards (NICs) are one of the key enablers of the modern Internet. They serve as gateways for connecting computing devices to networks for the exchange of data with other devices. |
| OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference | Liujianfu Wang, Yuyang Du, Yuchen Pan, Soung Chang Liew, Jiacheng Liu, Kexin Chen | 2025-12-03 | Download | Mixture-of-Experts (MoE), while offering significant advantages as a Large Language Model (LLM) architecture, faces substantial challenges when deployed on low-cost edge devices with tight memory cons... |
| Integrating High Performance In-Memory Data Streaming and In-Situ Visualization in Hybrid MPI+OpenMP PIC MC Simulations Towards Exascale | Jeremy J. Williams, Stefan Costea, Daniel Medeiros, Jordy Trilaksono, Pratibha Hegde, David Tskhakaya, Leon Kos, Ales Podolnik, Jakub Hromadka, Kevin A. Huck, Allen D. Malony, Frank Jenko, Erwin Laure, Stefano Markidis | 2025-12-03 | Download | Efficient simulation of complex plasma dynamics is crucial for advancing fusion energy research. Particle-in-Cell (PIC) Monte Carlo (MC) simulations provide insights into plasma behavior, including tu... |
| Seamless Transitions: A Comprehensive Review of Live Migration Technologies | Sima Attar-Khorasani, Lincoln Sherpa, Matthias Lieber, Siavash Ghiasvand | 2025-12-03 | Download | Live migration, a technology enabling seamless transition of operational computational entities between various hosts while preserving continuous functionality and client connectivity, has been the su... |
| Acceleration of Parallel Tempering for Markov Chain Monte Carlo methods | Aingeru Ramos, Jose A Pascual, Javier Navaridas, Ivan Coluzza | 2025-12-03 | Download | Markov Chain Monte Carlo methods are algorithms used to sample probability distributions, commonly used to sample the Boltzmann distribution of physical/chemical models (e.g. |
| On the Challenges of Energy-Efficiency Analysis in HPC Systems: Evaluating Synthetic Benchmarks and Gromacs | Rafael Ravedutti Lucio Machado, Jan Eitzinger, Georg Hager, Gerhard Wellein | 2025-12-03 | Download | This paper discusses the challenges encountered when analyzing the energy efficiency of synthetic benchmarks and the Gromacs package on the Fritz and Alex HPC clusters. |
| Distributed Quantum Computing with Fan-Out Operations and Qudits: the Case of Distributed Global Gates | Seng W. Loke | 2025-12-03 | Download | Much recent work on distributed quantum computing have focused on the use of entangled pairs and distributed two qubit gates. But there has also been work on efficient schemes for achieving multiparti... |
| FFTrainer: Fast Failover in Large-Language Model Training with Almost-Free State Management | Bohan Zhao, Yuanhong Wang, Chenglin Liu, Jiagi Pan, Guang Yang, Ruitao Liu, Tingrui Zhang, Kai Luo, Wei Xu | 2025-12-03 | Download | Recent developments in large language models (LLMs) have introduced new requirements for efficient and robust training. As LLM clusters scale, node failures, lengthy recoveries, and bulky checkpoints ... |
| Tuning of Vectorization Parameters for Molecular Dynamics Simulations in AutoPas | Luis Gall, Samuel James Newcome, Fabio Alexander Gratl, Markus Mühlhäußer, Manish Kumar Mishra, Hans-Joachim Bungartz | 2025-12-03 | Download | Molecular Dynamics simulations can help scientists to gather valuable insights for physical processes on an atomic scale. This work explores various techniques for SIMD vectorization to improve the pa... |
| Double-Edge-Assisted Computation Offloading and Resource Allocation for Space-Air-Marine Integrated Networks | Zhen Wang, Bin Lin, Qiang Ye | 2025-12-03 | Download | In this paper, we propose a double-edge-assisted computation offloading and resource allocation scheme tailored for space-air-marine integrated networks (SAMINs). |
| Agentic Operator Generation for ML ASICs | Alec M. Hammond, Aram Markosyan, Aman Dontula, Simon Mahns, Zacharias Fisches, Dmitrii Pedchenko, Keyur Muzumdar, Natacha Supper, Mark Saroufim, Joe Isaacson, Laura Wang, Warren Hunt, Kaustubh Gondkar, Roman Levenstein, Gabriel Synnaeve, Richard Li, Jacob Kahn, Ajit Mathews | 2025-12-03 | Download | We present TritorX, an agentic AI system designed to generate functionally correct Triton PyTorch ATen kernels at scale for emerging accelerator platforms. |
| TokenScale: Timely and Accurate Autoscaling for Disaggregated LLM Serving with Token Velocity | Ruiqi Lai, Hongrui Liu, Chengzhi Lu, Zonghao Liu, Siyu Cao, Siyang Shao, Yixin Zhang, Luo Mai, Dmitrii Ustiugov | 2025-12-03 | Download | The architectural shift to prefill/decode (PD) disaggregation in LLM serving improves resource utilization but struggles with the bursty nature of modern workloads. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Meta-Continual Mobility Forecasting for Proactive Handover Prediction | Sasi Vardhan Reddy Mandapati | 2025-12-03 | Download | Short-term mobility forecasting is a core requirement for proactive handover (HO) in cellular networks. Real-world mobility is highly non-stationary: abrupt turns, rapid speed changes, and unpredictab... |
| Simulation of a Heterogeneous Quantum Network | Hayden Miller, Caitao Zhan, Michael Bishof, Joaquin Chung, Han Xu, Prem Kumar, Rajkumar Kettimuthu | 2025-12-03 | Download | Quantum networks are expected to be heterogeneous systems, combining distinct qubit platforms, photon wavelengths, and device timescales to achieve scalable, multiuser connectivity. |
| A Chronological Analysis of the Evolution of SmartNICs | Olasupo Ajayi, Ryan Grant | 2025-12-03 | Download | Network Interface Cards (NICs) are one of the key enablers of the modern Internet. They serve as gateways for connecting computing devices to networks for the exchange of data with other devices. |
| Tutorial on Large Language Model-Enhanced Reinforcement Learning for Wireless Networks | Lingyi Cai, Wenjie Fu, Yuxi Huang, Ruichen Zhang, Yinqiu Liu, Jiawen Kang, Zehui Xiong, Tao Jiang, Dusit Niyato, Xianbin Wang, Shiwen Mao, Xuemin Shen | 2025-12-03 | Download | Reinforcement Learning (RL) has shown remarkable success in enabling adaptive and data-driven optimization for various applications in wireless networks. |
| Machine Learning to Predict Slot Usage in TSCH Wireless Sensor Networks | Stefano Scanzio, Gabriele Formis, Tullio Facchinetti, Gianluca Cena | 2025-12-03 | Download | Wireless sensor networks (WSNs) are employed across a wide range of industrial applications where ultra-low power consumption is a critical prerequisite. |
| Performance Evaluation of Parallel Wi-Fi Redundancy with Deferral Techniques | Gianluca Cena, Pietro Chiavassa, Stefano Scanzio | 2025-12-03 | Download | Wireless communication is increasingly used in industrial environments, since it supports mobility of interconnected devices. Among the transmission technologies operating in unlicensed bands availabl... |
| Mobility Induced Sensitivity of UAV based Nodes to Jamming in Private 5G Airfield Networks An Experimental Study | Pavlo Mykytyn, Ronald Chitauro, Onur Yener, Peter Langendoerfer | 2025-12-03 | Download | This work presents an experimental performance evaluation of a private 5G airfield network under controlled directional SDR jamming attacks targeting UAV-based UE nodes. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| VLCs: Managing Parallelism with Virtualized Libraries | Yineng Yan, William Ruys, Hochan Lee, Ian Henriksen, Arthur Peters, Sean Stephens, Bozhi You, Henrique Fingler, Martin Burtscher, Milos Gligoric, Keshav Pingali, Mattan Erez, George Biros, Christopher J. Rossbach | 2025-12-03 | Download | As the complexity and scale of modern parallel machines continue to grow, programmers increasingly rely on composition of software libraries to encapsulate and exploit parallelism. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Integrating High Performance In-Memory Data Streaming and In-Situ Visualization in Hybrid MPI+OpenMP PIC MC Simulations Towards Exascale | Jeremy J. Williams, Stefan Costea, Daniel Medeiros, Jordy Trilaksono, Pratibha Hegde, David Tskhakaya, Leon Kos, Ales Podolnik, Jakub Hromadka, Kevin A. Huck, Allen D. Malony, Frank Jenko, Erwin Laure, Stefano Markidis | 2025-12-03 | Download | Efficient simulation of complex plasma dynamics is crucial for advancing fusion energy research. Particle-in-Cell (PIC) Monte Carlo (MC) simulations provide insights into plasma behavior, including tu... |
| Hyperdimensional Computing for Sustainable Manufacturing: An Initial Assessment | Danny Hoang, Anandkumar Patel, Ruimen Chen, Rajiv Malhotra, Farhad Imani | 2025-12-03 | Download | Smart manufacturing can significantly improve efficiency and reduce energy consumption, yet the energy demands of AI models may offset these gains. |
| Tuning of Vectorization Parameters for Molecular Dynamics Simulations in AutoPas | Luis Gall, Samuel James Newcome, Fabio Alexander Gratl, Markus Mühlhäußer, Manish Kumar Mishra, Hans-Joachim Bungartz | 2025-12-03 | Download | Molecular Dynamics simulations can help scientists to gather valuable insights for physical processes on an atomic scale. This work explores various techniques for SIMD vectorization to improve the pa... |
