
2025-12-01

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Microbenchmarking NVIDIA's Blackwell Architecture: An in-depth Architectural Analysis | Aaron Jarmusch, Sunita Chandrasekaran | 2025-12-01 | Download | As GPU architectures rapidly evolve to meet the growing demands of exascale computing and machine learning, the performance implications of architectural innovations remain poorly understood across di... |
| A Low-Cost Reliable Racetrack Cache Based on Data Compression | Elham Cheshmikhani, Fateme Shokouhinia, Hamed Farbeh | 2025-12-01 | Download | SRAM-based cache memory faces several scalability limitations in deep nanoscale technologies, e.g., high leakage current, low cell stability, and low density. |
| A Systematic Characterization of LLM Inference on GPUs | Haonan Wang, Xuxin Xiao, Mingyu Yan, Zhuoyuan Zhu, Dengke Han, Duo Wang, Wenming Li, Xiaochun Ye, Cunchen Hu, Hongyang Chen, Guangyu Sun | 2025-12-01 | Download | This work presents a systematic characterization of Large Language Model (LLM) inference to address fragmented understanding. Through comprehensive experiments, we establish a four-dimensional analyti... |
| IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements | Sangpyo Kim, Hyesung Ji, Jongmin Kim, Wonseok Choi, Jaiyoung Park, Jung Ho Ahn | 2025-12-01 | Download | Private information retrieval (PIR) is an essential cryptographic protocol for privacy-preserving applications, enabling a client to retrieve a record from a server's database without revealing which ... |
| RoMe: Row Granularity Access Memory System for Large Language Models | Hwayong Nam, Seungmin Baek, Jumin Kim, Michael Jaemin Kim, Jung Ho Ahn | 2025-12-01 | Download | Modern HBM-based memory systems have evolved over generations while retaining cache line granularity accesses. Preserving this fine granularity necessitated the introduction of bank groups and pseudo ... |
| Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control | Fabian Kresse, Christoph H. Lampert | 2025-12-01 | Download | We investigate whether continuous-control policies can be represented and learned as discrete logic circuits instead of continuous neural networks. |
| hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware | Jan-Frederik Schulte, Benjamin Ramhorst, Chang Sun, Jovan Mitrevski, Nicolò Ghielmetti, Enrico Lupi, Dimitrios Danopoulos, Vladimir Loncar, Javier Duarte, David Burnette, Lauri Laatu, Stylianos Tzelepis, Konstantinos Axiotis, Quentin Berthet, Haoyan Wang, Paul White, Suleyman Demirsoy, Marco Colombo, Thea Aarrestad, Sioni Summers, Maurizio Pierini, Giuseppe Di Guglielmo, Jennifer Ngadiuba, Javier Campos, Ben Hawks, Abhijith Gandrakota, Farah Fahim, Nhan Tran, George Constantinides, Zhiqiang Que, Wayne Luk, Alexander Tapper, Duc Hoang, Noah Paladino, Philip Harris, Bo-Cheng Lai, Manuel Valentin, Ryan Forelli, Seda Ogrenci, Lino Gerlach, Rian Flynn, Mia Liu, Daniel Diaz, Elham Khoda, Melissa Quinnan, Russell Solares, Santosh Parajuli, Mark Neubauer, Christian Herwig, Ho Fung Tsoi, Dylan Rankin, Shih-Chieh Hsu, Scott Hauck | 2025-12-01 | Download | We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into fu... |
| Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity | Wenbin Zhu, Zhaoyan Shen, Zili Shao, Hongjun Dai, Feng Chen | 2025-12-01 | Download | Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing. |
| Leveraging Recurrent Patterns in Graph Accelerators | Masoud Rahimi, Sébastien Le Beux | 2025-12-01 | Download | Graph accelerators have emerged as a promising solution for processing large-scale sparse graphs, leveraging the in-situ computation of ReRAM-based crossbars to maximize computational efficiency. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async | Yi Liu, Chen Qian | 2025-12-01 | Download | Vector similarity search has become a critical component in AI-driven applications such as large language models (LLMs). To achieve high recall and low latency, GPUs are utilized to exploit massive pa... |
| Sampling on Metric Graphs | Rajat Vadiraj Dwaraknath, Lexing Ying | 2025-12-01 | Download | Metric graphs are structures obtained by associating edges in a standard graph with segments of the real line and gluing these segments at the vertices of the graph. |
| Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning | Eunjeong Jeong, Giovanni Perin, Howard H. Yang, Nikolaos Pappas | 2025-12-01 | Download | Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs. |
| Dion2: A Simple Method to Shrink Matrix in Muon | Kwangjun Ahn, Noah Amsel, John Langford | 2025-12-01 | Download | The Muon optimizer enjoys strong empirical performance and theoretical grounding. However, the super-linear cost of its orthonormalization step introduces increasing overhead with scale. |
| QAISim: A Toolkit for Modeling and Simulation of AI in Quantum Cloud Computing Environments | Irwindeep Singh, Sukhpal Singh Gill, Jinzhao Sun, Jan Mol | 2025-12-01 | Download | Quantum computing offers new ways to explore the theory of computation via the laws of quantum mechanics. Due to the rising demand for quantum computing resources, there is growing interest in develop... |
| Trace-based, time-resolved analysis of MPI application performance using standard metrics | Kingshuk Haldar | 2025-12-01 | Download | Detailed trace analysis of MPI applications is essential for performance engineering, but growing trace sizes and complex communication behaviour often render comprehensive visual inspection impractic... |
| Morphling: Fast, Fused, and Flexible GNN Training at Scale | Anubhab, Rupesh Nasre | 2025-12-01 | Download | Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. |
| StarDist: A Code Generator for Distributed Graph Algorithms | Barenya Kumar Nandy, Rupesh Nasre | 2025-12-01 | Download | Relational data, occurring in the real world, are often structured as graphs, which provide the logical abstraction required to make analytical derivations simpler. |
| Delta Sum Learning: an approach for fast and global convergence in Gossip Learning | Tom Goethals, Merlijn Sebrechts, Stijn De Schrijver, Filip De Turck, Bruno Volckaert | 2025-12-01 | Download | Federated Learning is a popular approach for distributed learning due to its security and computational benefits. With the advent of powerful devices in the network edge, Gossip Learning further decen... |
| Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity | Wenbin Zhu, Zhaoyan Shen, Zili Shao, Hongjun Dai, Feng Chen | 2025-12-01 | Download | Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Adversarial Robustness of Traffic Classification under Resource Constraints: Input Structure Matters | Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino | 2025-12-01 | Download | Traffic classification (TC) plays a critical role in cybersecurity, particularly in IoT and embedded contexts, where inspection must often occur locally under tight hardware constraints. |
| Intrusion Detection on Resource-Constrained IoT Devices with Hardware-Aware ML and DL | Ali Diab, Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino, Amer Baghdadi, Mostafa Rizk | 2025-12-01 | Download | This paper proposes a hardware-aware intrusion detection system (IDS) for Internet of Things (IoT) and Industrial IoT (IIoT) networks; it targets scenarios where classification is essential for fast, ... |
| Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning | Eunjeong Jeong, Giovanni Perin, Howard H. Yang, Nikolaos Pappas | 2025-12-01 | Download | Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs. |
| Delay Tolerant Networking to Extend Connectivity in Rural Areas Using Public Transport Systems: Design And Analysis | Salah Abdeljabar, Marco Zennaro, Mohamed-Slim Alouini | 2025-12-01 | Download | In today's digital age, access to the Internet is essential, yet a significant digital divide exists, particularly in rural areas of developing nations. |
| HERMES: Heterogeneous Application-Enabled Routing Middleware for Edge-IoT Systems | Jéssica Consciência, António Grilo | 2025-12-01 | Download | The growth of the Internet of Things has enabled a new generation of applications, pushing computation and intelligence toward the network edge. |
| Secure Over-the-Air Computation Against Multiple Eavesdroppers using Correlated Artificial Noise | David Nordlund, Luis Maßny, Antonia Wachter-Zeh, Erik G. Larsson, Zheng Chen | 2025-12-01 | Download | In the era of the Internet of Things and massive connectivity, many engineering applications, such as sensor fusion and federated edge learning, rely on efficient data aggregation from geographically ... |
| Towards a Multi-Layer Defence Framework for Securing Near-Real-Time Operations in Open RAN | Hamed Alimohammadi, Samara Mayhoub, Sotiris Chatzimiltis, Mohammad Shojafar, Muhammad Nasir Mumtaz Bhutta | 2025-12-01 | Download | Securing the near-real-time (near-RT) control operations in Open Radio Access Networks (Open RAN) is increasingly critical, yet remains insufficiently addressed, as new runtime threats target the cont... |
| Velocity-Adaptive Access Scheme for Semantic-Aware Vehicular Networks: Joint Fairness and AoI Optimization | Xiao Xu, Qiong Wu, Pingyi Fan, Kezhi Wang, Nan Cheng, Wen Chen, Khaled B. Letaief | 2025-12-01 | Download | In this paper, we address the problem of fair access and Age of Information (AoI) optimization in 5G New Radio (NR) Vehicle to Everything (V2X) Mode 2. |
| Modeling and Simulation of Data Protection Systems for Business Continuity and Disaster Recovery | Saso Nikolovski, Pece Mitrevski | 2025-12-01 | Download | In today's corporate landscape, particularly where operations rely heavily on information technologies, establishing a robust business continuity plan, including a disaster recovery strategy, is essen... |
| Value of Communication in Goal-Oriented Semantic Communications: A Pareto Analysis | Jiping Luo, Bowen Li, Nikolaos Pappas | 2025-12-01 | Download | Emerging cyber-physical systems increasingly operate under stringent communication constraints that preclude the reliable transmission of their extensive machine-type data streams. |
| INFERMAL: Inferential analysis of maliciously registered domains | Yevheniya Nosyk, Maciej Korczyński, Carlos Gañán, Sourena Maroofi, Jan Bayer, Zul Odgerel, Samaneh Tajalizadehkhoob, Andrzej Duda | 2025-12-01 | Download | Cybercriminals have long depended on domain names for phishing, spam, malware distribution, and botnet operation. To facilitate the malicious activities, they continually register new domain names for... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA | Sina Abdollahi, Amir Al Sadi, David Kotz, Marios Kogias, Hamed Haddadi | 2025-12-01 | Download | Confidential Virtual Machines (CVMs) are increasingly adopted to protect sensitive workloads from privileged adversaries such as the hypervisor. |
| Accelerating Probabilistic Response-Time Analysis: Revised Critical Instant and Optimized Convolution | Hiroto Takahashi, Atsushi Yano, Takuya Azumi | 2025-12-01 | Download | Accurate estimation of the Worst-Case Deadline Failure Probability (WCDFP) has attracted growing attention as a means to provide safety assurances in complex systems such as robotic platforms and auto... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scalable, Cloud-Based Simulations of Blood Flow and Targeted Drug Delivery in Retinal Capillaries | Lucas Amoudruz, Sergey Litvinov, Riccardo Murri, Volker Eyrich, Jens Zudrop, Costas Bekas, Petros Koumoutsakos | 2025-12-01 | Download | We investigate the capabilities of cloud computing for large-scale, tightly-coupled simulations of biological fluids in complex geometries, traditionally performed in supercomputing centers. |
