2025-12-01
cs.AR - Hardware Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Microbenchmarking NVIDIA's Blackwell Architecture: An in-depth Architectural Analysis | Aaron Jarmusch, Sunita Chandrasekaran | 2025-12-01 | Download | As GPU architectures rapidly evolve to meet the growing demands of exascale computing and machine learning, the performance implications of architectural innovations remain poorly understood across di... |
| A Low-Cost Reliable Racetrack Cache Based on Data Compression | Elham Cheshmikhani, Fateme Shokouhinia, Hamed Farbeh | 2025-12-01 | Download | SRAM-based cache memory faces several scalability limitations in deep nanoscale technologies, e.g., high leakage current, low cell stability, and low density. |
| A Systematic Characterization of LLM Inference on GPUs | Haonan Wang, Xuxin Xiao, Mingyu Yan, Zhuoyuan Zhu, Dengke Han, Duo Wang, Wenming Li, Xiaochun Ye, Cunchen Hu, Hongyang Chen, Guangyu Sun | 2025-12-01 | Download | This work presents a systematic characterization of Large Language Model (LLM) inference to address fragmented understanding. Through comprehensive experiments, we establish a four-dimensional analyti... |
| IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements | Sangpyo Kim, Hyesung Ji, Jongmin Kim, Wonseok Choi, Jaiyoung Park, Jung Ho Ahn | 2025-12-01 | Download | Private information retrieval (PIR) is an essential cryptographic protocol for privacy-preserving applications, enabling a client to retrieve a record from a server's database without revealing which ... |
| RoMe: Row Granularity Access Memory System for Large Language Models | Hwayong Nam, Seungmin Baek, Jumin Kim, Michael Jaemin Kim, Jung Ho Ahn | 2025-12-01 | Download | Modern HBM-based memory systems have evolved over generations while retaining cache line granularity accesses. Preserving this fine granularity necessitated the introduction of bank groups and pseudo ... |
| Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control | Fabian Kresse, Christoph H. Lampert | 2025-12-01 | Download | We investigate whether continuous-control policies can be represented and learned as discrete logic circuits instead of continuous neural networks. |
| hls4ml: A Flexible, Open-Source Platform for Deep Learning Acceleration on Reconfigurable Hardware | Jan-Frederik Schulte, Benjamin Ramhorst, Chang Sun, Jovan Mitrevski, Nicolò Ghielmetti, Enrico Lupi, Dimitrios Danopoulos, Vladimir Loncar, Javier Duarte, David Burnette, Lauri Laatu, Stylianos Tzelepis, Konstantinos Axiotis, Quentin Berthet, Haoyan Wang, Paul White, Suleyman Demirsoy, Marco Colombo, Thea Aarrestad, Sioni Summers, Maurizio Pierini, Giuseppe Di Guglielmo, Jennifer Ngadiuba, Javier Campos, Ben Hawks, Abhijith Gandrakota, Farah Fahim, Nhan Tran, George Constantinides, Zhiqiang Que, Wayne Luk, Alexander Tapper, Duc Hoang, Noah Paladino, Philip Harris, Bo-Cheng Lai, Manuel Valentin, Ryan Forelli, Seda Ogrenci, Lino Gerlach, Rian Flynn, Mia Liu, Daniel Diaz, Elham Khoda, Melissa Quinnan, Russell Solares, Santosh Parajuli, Mark Neubauer, Christian Herwig, Ho Fung Tsoi, Dylan Rankin, Shih-Chieh Hsu, Scott Hauck | 2025-12-01 | Download | We present hls4ml, a free and open-source platform that translates machine learning (ML) models from modern deep learning frameworks into high-level synthesis (HLS) code that can be integrated into fu... |
| Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity | Wenbin Zhu, Zhaoyan Shen, Zili Shao, Hongjun Dai, Feng Chen | 2025-12-01 | Download | Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing. |
| Leveraging Recurrent Patterns in Graph Accelerators | Masoud Rahimi, Sébastien Le Beux | 2025-12-01 | Download | Graph accelerators have emerged as a promising solution for processing large-scale sparse graphs, leveraging the in-situ computation of ReRAM-based crossbars to maximize computational efficiency. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async | Yi Liu, Chen Qian | 2025-12-01 | Download | Vector similarity search has become a critical component in AI-driven applications such as large language models (LLMs). To achieve high recall and low latency, GPUs are utilized to exploit massive pa... |
| Sampling on Metric Graphs | Rajat Vadiraj Dwaraknath, Lexing Ying | 2025-12-01 | Download | Metric graphs are structures obtained by associating edges in a standard graph with segments of the real line and gluing these segments at the vertices of the graph. |
| Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning | Eunjeong Jeong, Giovanni Perin, Howard H. Yang, Nikolaos Pappas | 2025-12-01 | Download | Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs. |
| Dion2: A Simple Method to Shrink Matrix in Muon | Kwangjun Ahn, Noah Amsel, John Langford | 2025-12-01 | Download | The Muon optimizer enjoys strong empirical performance and theoretical grounding. However, the super-linear cost of its orthonormalization step introduces increasing overhead with scale. |
| QAISim: A Toolkit for Modeling and Simulation of AI in Quantum Cloud Computing Environments | Irwindeep Singh, Sukhpal Singh Gill, Jinzhao Sun, Jan Mol | 2025-12-01 | Download | Quantum computing offers new ways to explore the theory of computation via the laws of quantum mechanics. Due to the rising demand for quantum computing resources, there is growing interest in develop... |
| Trace-based, time-resolved analysis of MPI application performance using standard metrics | Kingshuk Haldar | 2025-12-01 | Download | Detailed trace analysis of MPI applications is essential for performance engineering, but growing trace sizes and complex communication behaviour often render comprehensive visual inspection impractic... |
| Morphling: Fast, Fused, and Flexible GNN Training at Scale | Anubhab, Rupesh Nasre | 2025-12-01 | Download | Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. |
| StarDist: A Code Generator for Distributed Graph Algorithms | Barenya Kumar Nandy, Rupesh Nasre | 2025-12-01 | Download | Relational data, occurring in the real world, are often structured as graphs, which provide the logical abstraction required to make analytical derivations simpler. |
| Delta Sum Learning: an approach for fast and global convergence in Gossip Learning | Tom Goethals, Merlijn Sebrechts, Stijn De Schrijver, Filip De Turck, Bruno Volckaert | 2025-12-01 | Download | Federated Learning is a popular approach for distributed learning due to its security and computational benefits. With the advent of powerful devices in the network edge, Gossip Learning further decen... |
| Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity | Wenbin Zhu, Zhaoyan Shen, Zili Shao, Hongjun Dai, Feng Chen | 2025-12-01 | Download | Serverless Large Language Models (LLMs) have emerged as a cost-effective solution for deploying AI services by enabling a 'pay-as-you-go' pricing model through GPU resource sharing. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Adversarial Robustness of Traffic Classification under Resource Constraints: Input Structure Matters | Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino | 2025-12-01 | Download | Traffic classification (TC) plays a critical role in cybersecurity, particularly in IoT and embedded contexts, where inspection must often occur locally under tight hardware constraints. |
| Intrusion Detection on Resource-Constrained IoT Devices with Hardware-Aware ML and DL | Ali Diab, Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino, Amer Baghdadi, Mostafa Rizk | 2025-12-01 | Download | This paper proposes a hardware-aware intrusion detection system (IDS) for Internet of Things (IoT) and Industrial IoT (IIoT) networks; it targets scenarios where classification is essential for fast, ... |
| Feature-Based Semantics-Aware Scheduling for Energy-Harvesting Federated Learning | Eunjeong Jeong, Giovanni Perin, Howard H. Yang, Nikolaos Pappas | 2025-12-01 | Download | Federated Learning (FL) on resource-constrained edge devices faces a critical challenge: The computational energy required for training Deep Neural Networks (DNNs) often dominates communication costs. |
| Delay Tolerant Networking to Extend Connectivity in Rural Areas Using Public Transport Systems: Design And Analysis | Salah Abdeljabar, Marco Zennaro, Mohamed-Slim Alouini | 2025-12-01 | Download | In today's digital age, access to the Internet is essential, yet a significant digital divide exists, particularly in rural areas of developing nations. |
| HERMES: Heterogeneous Application-Enabled Routing Middleware for Edge-IoT Systems | Jéssica Consciência, António Grilo | 2025-12-01 | Download | The growth of the Internet of Things has enabled a new generation of applications, pushing computation and intelligence toward the network edge. |
| Secure Over-the-Air Computation Against Multiple Eavesdroppers using Correlated Artificial Noise | David Nordlund, Luis Maßny, Antonia Wachter-Zeh, Erik G. Larsson, Zheng Chen | 2025-12-01 | Download | In the era of the Internet of Things and massive connectivity, many engineering applications, such as sensor fusion and federated edge learning, rely on efficient data aggregation from geographically ... |
| Towards a Multi-Layer Defence Framework for Securing Near-Real-Time Operations in Open RAN | Hamed Alimohammadi, Samara Mayhoub, Sotiris Chatzimiltis, Mohammad Shojafar, Muhammad Nasir Mumtaz Bhutta | 2025-12-01 | Download | Securing the near-real-time (near-RT) control operations in Open Radio Access Networks (Open RAN) is increasingly critical, yet remains insufficiently addressed, as new runtime threats target the cont... |
| Velocity-Adaptive Access Scheme for Semantic-Aware Vehicular Networks: Joint Fairness and AoI Optimization | Xiao Xu, Qiong Wu, Pingyi Fan, Kezhi Wang, Nan Cheng, Wen Chen, Khaled B. Letaief | 2025-12-01 | Download | In this paper, we address the problem of fair access and Age of Information (AoI) optimization in 5G New Radio (NR) Vehicle to Everything (V2X) Mode 2. |
| Modeling and Simulation of Data Protection Systems for Business Continuity and Disaster Recovery | Saso Nikolovski, Pece Mitrevski | 2025-12-01 | Download | In today's corporate landscape, particularly where operations rely heavily on information technologies, establishing a robust business continuity plan, including a disaster recovery strategy, is essen... |
| Value of Communication in Goal-Oriented Semantic Communications: A Pareto Analysis | Jiping Luo, Bowen Li, Nikolaos Pappas | 2025-12-01 | Download | Emerging cyber-physical systems increasingly operate under stringent communication constraints that preclude the reliable transmission of their extensive machine-type data streams. |
| INFERMAL: Inferential analysis of maliciously registered domains | Yevheniya Nosyk, Maciej Korczyński, Carlos Gañán, Sourena Maroofi, Jan Bayer, Zul Odgerel, Samaneh Tajalizadehkhoob, Andrzej Duda | 2025-12-01 | Download | Cybercriminals have long depended on domain names for phishing, spam, malware distribution, and botnet operation. To facilitate the malicious activities, they continually register new domain names for... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA | Sina Abdollahi, Amir Al Sadi, David Kotz, Marios Kogias, Hamed Haddadi | 2025-12-01 | Download | Confidential Virtual Machines (CVMs) are increasingly adopted to protect sensitive workloads from privileged adversaries such as the hypervisor. |
| Accelerating Probabilistic Response-Time Analysis: Revised Critical Instant and Optimized Convolution | Hiroto Takahashi, Atsushi Yano, Takuya Azumi | 2025-12-01 | Download | Accurate estimation of the Worst-Case Deadline Failure Probability (WCDFP) has attracted growing attention as a means to provide safety assurances in complex systems such as robotic platforms and auto... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Scalable, Cloud-Based Simulations of Blood Flow and Targeted Drug Delivery in Retinal Capillaries | Lucas Amoudruz, Sergey Litvinov, Riccardo Murri, Volker Eyrich, Jens Zudrop, Costas Bekas, Petros Koumoutsakos | 2025-12-01 | Download | We investigate the capabilities of cloud computing for large-scale, tightly-coupled simulations of biological fluids in complex geometries, traditionally performed in supercomputing centers. |