2025-12-01

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Microbenchmarking NVIDIA's Blackwell Architecture: An in-depth Architectural Analysis	Aaron Jarmusch, Sunita Chandrasekaran	2025-12-01	Download	As GPU architectures rapidly evolve to meet the growing demands of exascale computing and machine learning, the performance implications of architectural innovations remain poorly understood across di...
A Low-Cost Reliable Racetrack Cache Based on Data Compression	Elham Cheshmikhani, Fateme Shokouhinia, Hamed Farbeh	2025-12-01	Download	SRAM-based cache memory faces several scalability limitations in deep nanoscale technologies, e.g., high leakage current, low cell stability, and low density.
A Systematic Characterization of LLM Inference on GPUs	Haonan Wang, Xuxin Xiao, Mingyu Yan, Zhuoyuan Zhu, Dengke Han, Duo Wang, Wenming Li, Xiaochun Ye, Cunchen Hu, Hongyang Chen, Guangyu Sun	2025-12-01	Download	This work presents a systematic characterization of Large Language Model (LLM) inference to address fragmented understanding. Through comprehensive experiments, we establish a four-dimensional analyti...
IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements	Sangpyo Kim, Hyesung Ji, Jongmin Kim, Wonseok Choi, Jaiyoung Park, Jung Ho Ahn	2025-12-01	Download	Private information retrieval (PIR) is an essential cryptographic protocol for privacy-preserving applications, enabling a client to retrieve a record from a server's database without revealing which ...
RoMe: Row Granularity Access Memory System for Large Language Models	Hwayong Nam, Seungmin Baek, Jumin Kim, Michael Jaemin Kim, Jung Ho Ahn	2025-12-01	Download	Modern HBM-based memory systems have evolved over generations while retaining cache line granularity accesses. Preserving this fine granularity necessitated the introduction of bank groups and pseudo ...
Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control	Fabian Kresse, Christoph H. Lampert	2025-12-01	Download	We investigate whether continuous-control policies can be represented and learned as discrete logic circuits instead of continuous neural networks.
hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware	Jan-Frederik Schulte, Benjamin Ramhorst, Chang Sun, Jovan Mitrevski, Nicolò Ghielmetti, Enrico Lupi, Dimitrios Danopoulos, Vladimir Loncar, Javier Duarte, David Burnette, Lauri Laatu, Stylianos Tzelepis, Konstantinos Axiotis, Quentin Berthet, Haoyan Wang, Paul White, Suleyman Demirsoy, Marco Colombo, Thea Aarrestad, Sioni Summers, Maurizio Pierini, Giuseppe Di Guglielmo, Jennifer Ngadiuba, Javier Campos, Ben Hawks, Abhijith Gandrakota, Farah Fahim, Nhan Tran, George Constantinides, Zhiqiang Que, Wayne Luk, Alexander Tapper, Duc Hoang, Noah Paladino, Philip Harris, Bo-Cheng Lai, Manuel Valentin, Ryan Forelli, Seda Ogrenci, Lino Gerlach, Rian Flynn, Mia Liu, Daniel Diaz, Elham Khoda, Melissa Quinnan, Russell Solares, Santosh Parajuli, Mark Neubauer, Christian Herwig, Ho Fung Tsoi, Dylan Rankin, Shih-Chieh Hsu, Scott Hauck	2025-12-01	Download	We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into fu...
Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity	Wenbin Zhu, Zhaoyan Shen, Zili Shao, Hongjun Dai, Feng Chen	2025-12-01	Download	Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing.
Leveraging Recurrent Patterns in Graph Accelerators	Masoud Rahimi, Sébastien Le Beux	2025-12-01	Download	Graph accelerators have emerged as a promising solution for processing large-scale sparse graphs, leveraging the in-situ computation of ReRAM-based crossbars to maximize computational efficiency.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async	Yi Liu, Chen Qian	2025-12-01	Download	Vector similarity search has become a critical component in AI-driven applications such as large language models (LLMs). To achieve high recall and low latency, GPUs are utilized to exploit massive pa...
Sampling on Metric Graphs	Rajat Vadiraj Dwaraknath, Lexing Ying	2025-12-01	Download	Metric graphs are structures obtained by associating edges in a standard graph with segments of the real line and gluing these segments at the vertices of the graph.
Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning	Eunjeong Jeong, Giovanni Perin, Howard H. Yang, Nikolaos Pappas	2025-12-01	Download	Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs.
Dion2: A Simple Method to Shrink Matrix in Muon	Kwangjun Ahn, Noah Amsel, John Langford	2025-12-01	Download	The Muon optimizer enjoys strong empirical performance and theoretical grounding. However, the super-linear cost of its orthonormalization step introduces increasing overhead with scale.
QAISim: A Toolkit for Modeling and Simulation of AI in Quantum Cloud Computing Environments	Irwindeep Singh, Sukhpal Singh Gill, Jinzhao Sun, Jan Mol	2025-12-01	Download	Quantum computing offers new ways to explore the theory of computation via the laws of quantum mechanics. Due to the rising demand for quantum computing resources, there is growing interest in develop...
Trace-based, time-resolved analysis of MPI application performance using standard metrics	Kingshuk Haldar	2025-12-01	Download	Detailed trace analysis of MPI applications is essential for performance engineering, but growing trace sizes and complex communication behaviour often render comprehensive visual inspection impractic...
Morphling: Fast, Fused, and Flexible GNN Training at Scale	Anubhab, Rupesh Nasre	2025-12-01	Download	Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations.
StarDist: A Code Generator for Distributed Graph Algorithms	Barenya Kumar Nandy, Rupesh Nasre	2025-12-01	Download	Relational data, occurring in the real world, are often structured as graphs, which provide the logical abstraction required to make analytical derivations simpler.
Delta Sum Learning: an approach for fast and global convergence in Gossip Learning	Tom Goethals, Merlijn Sebrechts, Stijn De Schrijver, Filip De Turck, Bruno Volckaert	2025-12-01	Download	Federated Learning is a popular approach for distributed learning due to its security and computational benefits. With the advent of powerful devices in the network edge, Gossip Learning further decen...
Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity	Wenbin Zhu, Zhaoyan Shen, Zili Shao, Hongjun Dai, Feng Chen	2025-12-01	Download	Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Adversarial Robustness of Traffic Classification under Resource Constraints: Input Structure Matters	Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino	2025-12-01	Download	Traffic classification (TC) plays a critical role in cybersecurity, particularly in IoT and embedded contexts, where inspection must often occur locally under tight hardware constraints.
Intrusion Detection on Resource-Constrained IoT Devices with Hardware-Aware ML and DL	Ali Diab, Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino, Amer Baghdadi, Mostafa Rizk	2025-12-01	Download	This paper proposes a hardware-aware intrusion detection system (IDS) for Internet of Things (IoT) and Industrial IoT (IIoT) networks; it targets scenarios where classification is essential for fast, ...
Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning	Eunjeong Jeong, Giovanni Perin, Howard H. Yang, Nikolaos Pappas	2025-12-01	Download	Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs.
Delay Tolerant Networking to Extend Connectivity in Rural Areas Using Public Transport Systems: Design And Analysis	Salah Abdeljabar, Marco Zennaro, Mohamed-Slim Alouini	2025-12-01	Download	In today's digital age, access to the Internet is essential, yet a significant digital divide exists, particularly in rural areas of developing nations.
HERMES: Heterogeneous Application-Enabled Routing Middleware for Edge-IoT Systems	Jéssica Consciência, António Grilo	2025-12-01	Download	The growth of the Internet of Things has enabled a new generation of applications, pushing computation and intelligence toward the network edge.
Secure Over-the-Air Computation Against Multiple Eavesdroppers using Correlated Artificial Noise	David Nordlund, Luis Maßny, Antonia Wachter-Zeh, Erik G. Larsson, Zheng Chen	2025-12-01	Download	In the era of the Internet of Things and massive connectivity, many engineering applications, such as sensor fusion and federated edge learning, rely on efficient data aggregation from geographically ...
Towards a Multi-Layer Defence Framework for Securing Near-Real-Time Operations in Open RAN	Hamed Alimohammadi, Samara Mayhoub, Sotiris Chatzimiltis, Mohammad Shojafar, Muhammad Nasir Mumtaz Bhutta	2025-12-01	Download	Securing the near-real-time (near-RT) control operations in Open Radio Access Networks (Open RAN) is increasingly critical, yet remains insufficiently addressed, as new runtime threats target the cont...
Velocity-Adaptive Access Scheme for Semantic-Aware Vehicular Networks: Joint Fairness and AoI Optimization	Xiao Xu, Qiong Wu, Pingyi Fan, Kezhi Wang, Nan Cheng, Wen Chen, Khaled B. Letaief	2025-12-01	Download	In this paper, we address the problem of fair access and Age of Information (AoI) optimization in 5G New Radio (NR) Vehicle to Everything (V2X) Mode 2.
Modeling and Simulation of Data Protection Systems for Business Continuity and Disaster Recovery	Saso Nikolovski, Pece Mitrevski	2025-12-01	Download	In today's corporate landscape, particularly where operations rely heavily on information technologies, establishing a robust business continuity plan, including a disaster recovery strategy, is essen...
Value of Communication in Goal-Oriented Semantic Communications: A Pareto Analysis	Jiping Luo, Bowen Li, Nikolaos Pappas	2025-12-01	Download	Emerging cyber-physical systems increasingly operate under stringent communication constraints that preclude the reliable transmission of their extensive machine-type data streams.
INFERMAL: Inferential analysis of maliciously registered domains	Yevheniya Nosyk, Maciej Korczyński, Carlos Gañán, Sourena Maroofi, Jan Bayer, Zul Odgerel, Samaneh Tajalizadehkhoob, Andrzej Duda	2025-12-01	Download	Cybercriminals have long depended on domain names for phishing, spam, malware distribution, and botnet operation. To facilitate the malicious activities, they continually register new domain names for...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA	Sina Abdollahi, Amir Al Sadi, David Kotz, Marios Kogias, Hamed Haddadi	2025-12-01	Download	Confidential Virtual Machines (CVMs) are increasingly adopted to protect sensitive workloads from privileged adversaries such as the hypervisor.
Accelerating Probabilistic Response-Time Analysis: Revised Critical Instant and Optimized Convolution	Hiroto Takahashi, Atsushi Yano, Takuya Azumi	2025-12-01	Download	Accurate estimation of the Worst-Case Deadline Failure Probability (WCDFP) has attracted growing attention as a means to provide safety assurances in complex systems such as robotic platforms and auto...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Scalable, Cloud-Based Simulations of Blood Flow and Targeted Drug Delivery in Retinal Capillaries	Lucas Amoudruz, Sergey Litvinov, Riccardo Murri, Volker Eyrich, Jens Zudrop, Costas Bekas, Petros Koumoutsakos	2025-12-01	Download	We investigate the capabilities of cloud computing for large-scale, tightly-coupled simulations of biological fluids in complex geometries, traditionally performed in supercomputing centers.