2025-05-20

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution	Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang	2025-05-20	Download	Transformers, while revolutionary, face challenges due to their demanding computational cost and large data movement. To address this, we propose HyFlexPIM, a novel mixed-signal processing-in-memory (...
SecCAN: An Extended CAN Controller with Embedded Intrusion Detection	Shashwat Khandelwal, Shreejith Shanker	2025-05-20	Download	Recent research has highlighted the vulnerability of in-vehicle network protocols such as controller area networks (CAN) and proposed machine learning-based intrusion detection systems (IDSs) as an ef...
RISC-Q: A Generator for Real-Time Quantum Control System-on-Chips Compatible with RISC-V	Junyi Liu, Yi Lee, Haowei Deng, Connor Clayton, Gengzhi Yang, Xiaodi Wu	2025-05-20	Download	Quantum computing imposes stringent requirements for the precise control of large-scale qubit systems, including, for example, microsecond-latency feedback and nanosecond-precision timing of gigahertz...
CRYPTONITE: Scalable Accelerator Design for Cryptographic Primitives and Algorithms	Karthikeya Sharma Maheswaran, Camille Bossut, Andy Wanna, Qirun Zhang, Cong Hao	2025-05-20	Download	Cryptographic primitives, consisting of repetitive operations with different inputs, are typically implemented using straight-line C code due to traditional execution on CPUs.
Distributed quantum computing with black-box subroutines	X. Xu, Y. -D. Liu, S. Shi, Y. -J. Wang, D. -S. Wang	2025-05-20	Download	In this work, we propose a general protocol for distributed quantum computing that accommodates arbitrary unknown subroutines. It can be applied to scale up quantum computing through multi-chip interc...
Low-Cost FlashAttention with Fused Exponential and Multiplication Hardware Operators	Kosmas Alexandridis, Vasileios Titopoulos, Giorgos Dimitrakopoulos	2025-05-20	Download	Attention mechanisms, particularly within Transformer architectures and large language models (LLMs), have revolutionized sequence modeling in machine learning and artificial intelligence applications...
FLASH-D: FlashAttention with Hidden Softmax Division	Kosmas Alexandridis, Vasileios Titopoulos, Giorgos Dimitrakopoulos	2025-05-20	Download	The transformer's attention mechanism has revolutionized AI and machine learning, with its efficient computation being crucial to its performance.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Machine Learning for Consistency Violation Faults Analysis	Kamal Giri, Amit Garu	2025-05-20	Download	Distributed systems frequently encounter consistency violation faults (cvfs), where nodes operate on outdated or inaccurate data, adversely affecting convergence and overall system performance.
EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models	Han Liu, Ruoyao Wen, Srijith Nair, Jia Liu, Wenjing Lou, Chongjie Zhang, William Yeoh, Yevgeniy Vorobeychik, Ning Zhang	2025-05-20	Download	To address data locality and privacy restrictions, Federated Learning (FL) has recently been adopted to fine-tune large language models (LLMs), enabling improved performance on various downstream task...
Sei Giga	Benjamin Marsh, Steven Landers, Jayendra Jog	2025-05-20	Download	We introduce the Sei Giga, a multi-concurrent producer parallelized execution EVM layer one blockchain. In an internal testnet Giga has achieved >5 gigagas/sec throughput and sub 400ms finality.
Balanced and Elastic End-to-end Training of Dynamic LLMs	Mohamed Wahib, Muhammed Abdullah Soyturk, Didem Unat	2025-05-20	Download	To reduce the computational and memory overhead of Large Language Models, various approaches have been proposed. These include a) Mixture of Experts (MoEs), where token routing affects compute balance...
Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs	Melanie Cornelius, Greg Cross, Shilpika Shilpika, Matthew T. Dearing, Zhiling Lan	2025-05-20	Download	As supercomputers grow in size and complexity, power efficiency has become a critical challenge, particularly in understanding GPU power consumption within modern HPC workloads.
PSMOA: Policy Support Multi-Objective Optimization Algorithm for Decentralized Data Replication	Xi Wang, Susmit Shannigrahi	2025-05-20	Download	Efficient data replication in decentralized storage systems must account for diverse policies, especially in multi-organizational, data-intensive environments.
Distributed quantum computing with black-box subroutines	X. Xu, Y. -D. Liu, S. Shi, Y. -J. Wang, D. -S. Wang	2025-05-20	Download	In this work, we propose a general protocol for distributed quantum computing that accommodates arbitrary unknown subroutines. It can be applied to scale up quantum computing through multi-chip interc...
Federated prediction for scalable and privacy-preserved knowledge-based planning in radiotherapy	Jingyun Chen, David Horowitz, Yading Yuan	2025-05-20	Download	Background: Deep learning has potential to improve the efficiency and consistency of radiation therapy planning, but clinical adoption is hindered by the limited model generalizability due to data sca...
ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs	Yifan Sui, Hao Wang, Hanfei Yu, Yitao Hu, Jianxun Li, Hao Wang	2025-05-20	Download	Serverless computing has grown rapidly for serving Large Language Model (LLM) inference due to its pay-as-you-go pricing, fine-grained GPU usage, and rapid scaling.
Evaluating the Impact Of Spatial Features Of Mobility Data and Index Choice On Database Performance	Tim C. Rese, Alexandra Kapp, David Bermbach	2025-05-20	Download	The growing number of moving Internet-of-Things (IoT) devices has led to a surge in moving object data, powering applications such as traffic routing, hotspot detection, or weather forecasting.
SkyMemory: A LEO Edge Cache for Transformer Inference Optimization and Scale Out	Thomas Sandholm, Sayandev Mukherjee, Lin Cheng, Bernardo A. Huberman	2025-05-20	Download	We expand the scope of cache memory to include LEO constellations, which are highly distributed systems with thousands of satellites connected with free-space optics inter-satellite links (ISL) always...
Co-LoRA: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients	Minhyuk Seo, Taeheon Kim, Hankook Lee, Jonghyun Choi, Tinne Tuytelaars	2025-05-20	Download	As AI becomes more personal, e.g., Agentic AI, there is an increasing need for personalizing models for various use cases. Personalized federated learning (PFL) enables each client to collaboratively ...
Prime Collective Communications Library -- Technical Report	Michael Keiblinger, Mario Sieg, Jack Min Ong, Sami Jaghouar, Johannes Hagemann	2025-05-20	Download	This report presents the Prime Collective Communications Library (PCCL), a novel fault-tolerant collective communication library designed for distributed ML workloads over the public internet.
FedGraM: Defending Against Untargeted Attacks in Federated Learning via Embedding Gram Matrix	Di Wu, Qian Li, Heng Yang, Yong Han	2025-05-20	Download	Federated Learning (FL) enables geographically distributed clients to collaboratively train machine learning models by sharing only their local models, ensuring data privacy.
Paradigm Shift in Infrastructure Inspection Technology: Leveraging High-performance Imaging and Advanced AI Analytics to Inspect Road Infrastructure	Du Wu, Enzhi Zhang, Isaac Lyngaas, Xiao Wang, Amir Ziabari, Tao Luo, Peng Chen, Kento Sato, Fumiyoshi Shoji, Takaki Hatsui, Kentaro Uesugi, Akira Seo, Yasuhito Sakai, Toshio Endo, Tetsuya Ishikawa, Satoshi Matsuoka, Mohamed Wahib	2025-05-20	Download	Effective road infrastructure management is crucial for modern society. Traditional manual inspection techniques remain constrained by cost, efficiency, and scalability, while camera and laser imaging...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
TSA-WF: Exploring the Effectiveness of Time Series Analysis for Website Fingerprinting	Michael Wrana, Uzma Maroof, Diogo Barradas	2025-05-20	Download	Website fingerprinting (WF) is a technique that allows an eavesdropper to determine the website a target user is accessing by inspecting the metadata associated with the packets she exchanges via some...
PSMOA: Policy Support Multi-Objective Optimization Algorithm for Decentralized Data Replication	Xi Wang, Susmit Shannigrahi	2025-05-20	Download	Efficient data replication in decentralized storage systems must account for diverse policies, especially in multi-organizational, data-intensive environments.
Automated, Cross-Layer Root Cause Analysis of 5G Video-Conferencing Quality Degradation	Fan Yi, Haoran Wan, Kyle Jamieson, Oliver Michel	2025-05-20	Download	5G wireless networks are complex, leveraging layers of scheduling, retransmission, and adaptation mechanisms to maximize their efficiency. But these mechanisms interact to produce significant fluctuat...
A5/1 is in the Air: Passive Detection of 2G (GSM) Ciphering Algorithms	Matthias Koch, Christian Nettersheim, Thorsten Horstmann, Michael Rademacher	2025-05-20	Download	This paper investigates the ongoing use of the A5/1 ciphering algorithm within 2G GSM networks. Despite its known vulnerabilities and the gradual phasing out of GSM technology by some operators, GSM s...
open5Gcube: A Modular and Usable Framework for Mobile Network Laboratories	Thorsten Horstmann, Dominik Brunke, Tobias Kremeyer, Matthias Wilmes, Gunnar Schneider, Julian Sturm, Hartmut König, Michael Rademacher	2025-05-20	Download	In mobile network research, the integration of real-world components such as User Equipment (UE) with open-source network infrastructure is essential yet challenging.
Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks	Kamal Singh, Sami Marouani, Ahmad Al Sheikh, Pham Tran Anh Quang, Amaury Habrard	2025-05-20	Download	Reinforcement learning (RL) has been increasingly applied to network control problems, such as load balancing. However, existing RL approaches often suffer from lack of interpretability and difficulty...
Measuring Round-Trip Response Latencies Under Asymmetric Routing	Bhavana Vannarth Shobhana, Yen-lin Chien, Jonathan Diamant, Badri Nath, Shir Landau Feibish, Srinivas Narayana	2025-05-20	Download	Latency is a key indicator of Internet service performance. Continuously tracking the latency of client requests enables service operators to quickly identify bottlenecks, perform adaptive resource al...
Sibling Prefixes: Identifying Similarities in IPv4 and IPv6 Prefixes	Fariba Osali, Khwaja Zubair Sediqi, Oliver Gasser	2025-05-20	Download	Since the standardization of IPv6 in 1998, both versions of the Internet Protocol have coexisted in the Internet. Clients usually run algorithms such as Happy Eyeballs, to decide whether to connect to...
Integration of TinyML and LargeML: A Survey of 6G and Beyond	Thai-Hoc Vu, Ngo Hoang Tu, Thien Huynh-The, Kyungchun Lee, Sunghwan Kim, Miroslav Voznak, Quoc-Viet Pham	2025-05-20	Download	The evolution from fifth-generation (5G) to sixth-generation (6G) networks is driving an unprecedented demand for advanced machine learning (ML) solutions.
VaN3Twin: the Multi-Technology V2X Digital Twin with Ray-Tracing in the Loop	Roberto Pegurri, Diego Gasco, Francesco Linsalata, Marco Rapelli, Eugenio Moro, Francesco Raviglione, Claudio Casetti	2025-05-20	Download	This paper presents VaN3Twin, the first open-source, full-stack Network Digital Twin (NDT) framework for simulating the coexistence of multiple Vehicle-to-Everything (V2X) communication technologies wi...
CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration	Pengyan Zhu, Tingting Yang	2025-05-20	Download	Emerging intelligent service scenarios in 6G communication impose stringent requirements for low latency, high reliability, and privacy preservation.
6G communications through sub-Terahertz CMOS power amplifiers: Design challenges and trends	Jun Yan Lee, Duo Wu, Xuanrui Guo, Jian Ding Tan, Teh Jia Yew, Zi Neng Ng, Mohammad Arif Sobhan Bhuiyan, Mahdi H. Miraz	2025-05-20	Download	The fifth-generation (5G) network faces limitations in supporting emerging applications, such as artificial intelligence (AI), virtual reality (VR) and digital twins.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs	Melanie Cornelius, Greg Cross, Shilpika Shilpika, Matthew T. Dearing, Zhiling Lan	2025-05-20	Download	As supercomputers grow in size and complexity, power efficiency has become a critical challenge, particularly in understanding GPU power consumption within modern HPC workloads.
Task-parallelism in SWIFT for heterogeneous compute architectures	Abouzied M. A. Nasar, Benedict D. Rogers, Georgios Fourtakas, Mladen Ivkovic, Tobias Weinzierl, Scott T. Kay, Matthieu Schaller	2025-05-20	Download	This paper highlights first steps towards enabling graphics processing unit (GPU) acceleration of the task-parallel smoothed particle hydrodynamics (SPH) solver SWIFT.
Heterogeneous Memory Pool Tuning	Filip Vaverka, Ondrej Vysocky, Lubomir Riha	2025-05-20	Download	We present a lightweight tool for the analysis and tuning of application data placement in systems with heterogeneous memory pools. The tool allows non-intrusively identifying, analyzing, and controll...
Towards Efficient Multi-Scale Deformable Attention on NPU	Chenghuan Huang, Zhigeng Xu, Chong Sun, Chen Li, Ziyang Ma	2025-05-20	Download	Multi-scale deformable attention (MSDA) is a flexible and powerful feature extraction mechanism for visual tasks, but its random-access grid sampling strategy poses significant optimization challenges...