2025-05-20

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
Hybrid SLC-MLC RRAM Mixed-Signal Processing-in-Memory Architecture for Transformer Acceleration via Gradient Redistribution | Chang Eun Song, Priyansh Bhatnagar, Zihan Xia, Nam Sung Kim, Tajana Rosing, Mingu Kang | 2025-05-20 | Download | Transformers, while revolutionary, face challenges due to their demanding computational cost and large data movement. To address this, we propose HyFlexPIM, a novel mixed-signal processing-in-memory (...
SecCAN: An Extended CAN Controller with Embedded Intrusion Detection | Shashwat Khandelwal, Shreejith Shanker | 2025-05-20 | Download | Recent research has highlighted the vulnerability of in-vehicle network protocols such as controller area networks (CAN) and proposed machine learning-based intrusion detection systems (IDSs) as an ef...
RISC-Q: A Generator for Real-Time Quantum Control System-on-Chips Compatible with RISC-V | Junyi Liu, Yi Lee, Haowei Deng, Connor Clayton, Gengzhi Yang, Xiaodi Wu | 2025-05-20 | Download | Quantum computing imposes stringent requirements for the precise control of large-scale qubit systems, including, for example, microsecond-latency feedback and nanosecond-precision timing of gigahertz...
CRYPTONITE: Scalable Accelerator Design for Cryptographic Primitives and Algorithms | Karthikeya Sharma Maheswaran, Camille Bossut, Andy Wanna, Qirun Zhang, Cong Hao | 2025-05-20 | Download | Cryptographic primitives, consisting of repetitive operations with different inputs, are typically implemented using straight-line C code due to traditional execution on CPUs.
Distributed quantum computing with black-box subroutines | X. Xu, Y. -D. Liu, S. Shi, Y. -J. Wang, D. -S. Wang | 2025-05-20 | Download | In this work, we propose a general protocol for distributed quantum computing that accommodates arbitrary unknown subroutines. It can be applied to scale up quantum computing through multi-chip interc...
Low-Cost FlashAttention with Fused Exponential and Multiplication Hardware Operators | Kosmas Alexandridis, Vasileios Titopoulos, Giorgos Dimitrakopoulos | 2025-05-20 | Download | Attention mechanisms, particularly within Transformer architectures and large language models (LLMs), have revolutionized sequence modeling in machine learning and artificial intelligence applications...
FLASH-D: FlashAttention with Hidden Softmax Division | Kosmas Alexandridis, Vasileios Titopoulos, Giorgos Dimitrakopoulos | 2025-05-20 | Download | The transformer's attention mechanism has revolutionized AI and machine learning, with its efficient computation being crucial to its performance.

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
Machine Learning for Consistency Violation Faults Analysis | Kamal Giri, Amit Garu | 2025-05-20 | Download | Distributed systems frequently encounter consistency violation faults (cvfs), where nodes operate on outdated or inaccurate data, adversely affecting convergence and overall system performance.
EcoLoRA: Communication-Efficient Federated Fine-Tuning of Large Language Models | Han Liu, Ruoyao Wen, Srijith Nair, Jia Liu, Wenjing Lou, Chongjie Zhang, William Yeoh, Yevgeniy Vorobeychik, Ning Zhang | 2025-05-20 | Download | To address data locality and privacy restrictions, Federated Learning (FL) has recently been adopted to fine-tune large language models (LLMs), enabling improved performance on various downstream task...
Sei Giga | Benjamin Marsh, Steven Landers, Jayendra Jog | 2025-05-20 | Download | We introduce the Sei Giga, a multi-concurrent producer parallelized execution EVM layer one blockchain. In an internal testnet Giga has achieved >5 gigagas/sec throughput and sub 400ms finality.
Balanced and Elastic End-to-end Training of Dynamic LLMs | Mohamed Wahib, Muhammed Abdullah Soyturk, Didem Unat | 2025-05-20 | Download | To reduce the computational and memory overhead of Large Language Models, various approaches have been proposed. These include a) Mixture of Experts (MoEs), where token routing affects compute balance...
Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs | Melanie Cornelius, Greg Cross, Shilpika Shilpika, Matthew T. Dearing, Zhiling Lan | 2025-05-20 | Download | As supercomputers grow in size and complexity, power efficiency has become a critical challenge, particularly in understanding GPU power consumption within modern HPC workloads.
PSMOA: Policy Support Multi-Objective Optimization Algorithm for Decentralized Data Replication | Xi Wang, Susmit Shannigrahi | 2025-05-20 | Download | Efficient data replication in decentralized storage systems must account for diverse policies, especially in multi-organizational, data-intensive environments.
Distributed quantum computing with black-box subroutines | X. Xu, Y. -D. Liu, S. Shi, Y. -J. Wang, D. -S. Wang | 2025-05-20 | Download | In this work, we propose a general protocol for distributed quantum computing that accommodates arbitrary unknown subroutines. It can be applied to scale up quantum computing through multi-chip interc...
Federated prediction for scalable and privacy-preserved knowledge-based planning in radiotherapy | Jingyun Chen, David Horowitz, Yading Yuan | 2025-05-20 | Download | Background: Deep learning has potential to improve the efficiency and consistency of radiation therapy planning, but clinical adoption is hindered by the limited model generalizability due to data sca...
ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs | Yifan Sui, Hao Wang, Hanfei Yu, Yitao Hu, Jianxun Li, Hao Wang | 2025-05-20 | Download | Serverless computing has grown rapidly for serving Large Language Model (LLM) inference due to its pay-as-you-go pricing, fine-grained GPU usage, and rapid scaling.
Evaluating the Impact Of Spatial Features Of Mobility Data and Index Choice On Database Performance | Tim C. Rese, Alexandra Kapp, David Bermbach | 2025-05-20 | Download | The growing number of moving Internet-of-Things (IoT) devices has led to a surge in moving object data, powering applications such as traffic routing, hotspot detection, or weather forecasting.
SkyMemory: A LEO Edge Cache for Transformer Inference Optimization and Scale Out | Thomas Sandholm, Sayandev Mukherjee, Lin Cheng, Bernardo A. Huberman | 2025-05-20 | Download | We expand the scope of cache memory to include LEO constellations, which are highly distributed systems with thousands of satellites connected with free-space optics inter-satellite links (ISL) always...
Co-LoRA: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients | Minhyuk Seo, Taeheon Kim, Hankook Lee, Jonghyun Choi, Tinne Tuytelaars | 2025-05-20 | Download | As AI becomes more personal, e.g., Agentic AI, there is an increasing need for personalizing models for various use cases. Personalized federated learning (PFL) enables each client to collaboratively ...
Prime Collective Communications Library -- Technical Report | Michael Keiblinger, Mario Sieg, Jack Min Ong, Sami Jaghouar, Johannes Hagemann | 2025-05-20 | Download | This report presents the Prime Collective Communications Library (PCCL), a novel fault-tolerant collective communication library designed for distributed ML workloads over the public internet.
FedGraM: Defending Against Untargeted Attacks in Federated Learning via Embedding Gram Matrix | Di Wu, Qian Li, Heng Yang, Yong Han | 2025-05-20 | Download | Federated Learning (FL) enables geographically distributed clients to collaboratively train machine learning models by sharing only their local models, ensuring data privacy.
Paradigm Shift in Infrastructure Inspection Technology: Leveraging High-performance Imaging and Advanced AI Analytics to Inspect Road Infrastructure | Du Wu, Enzhi Zhang, Isaac Lyngaas, Xiao Wang, Amir Ziabari, Tao Luo, Peng Chen, Kento Sato, Fumiyoshi Shoji, Takaki Hatsui, Kentaro Uesugi, Akira Seo, Yasuhito Sakai, Toshio Endo, Tetsuya Ishikawa, Satoshi Matsuoka, Mohamed Wahib | 2025-05-20 | Download | Effective road infrastructure management is crucial for modern society. Traditional manual inspection techniques remain constrained by cost, efficiency, and scalability, while camera and laser imaging...

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
TSA-WF: Exploring the Effectiveness of Time Series Analysis for Website Fingerprinting | Michael Wrana, Uzma Maroof, Diogo Barradas | 2025-05-20 | Download | Website fingerprinting (WF) is a technique that allows an eavesdropper to determine the website a target user is accessing by inspecting the metadata associated with the packets she exchanges via some...
PSMOA: Policy Support Multi-Objective Optimization Algorithm for Decentralized Data Replication | Xi Wang, Susmit Shannigrahi | 2025-05-20 | Download | Efficient data replication in decentralized storage systems must account for diverse policies, especially in multi-organizational, data-intensive environments.
Automated, Cross-Layer Root Cause Analysis of 5G Video-Conferencing Quality Degradation | Fan Yi, Haoran Wan, Kyle Jamieson, Oliver Michel | 2025-05-20 | Download | 5G wireless networks are complex, leveraging layers of scheduling, retransmission, and adaptation mechanisms to maximize their efficiency. But these mechanisms interact to produce significant fluctuat...
A5/1 is in the Air: Passive Detection of 2G (GSM) Ciphering Algorithms | Matthias Koch, Christian Nettersheim, Thorsten Horstmann, Michael Rademacher | 2025-05-20 | Download | This paper investigates the ongoing use of the A5/1 ciphering algorithm within 2G GSM networks. Despite its known vulnerabilities and the gradual phasing out of GSM technology by some operators, GSM s...
open5Gcube: A Modular and Usable Framework for Mobile Network Laboratories | Thorsten Horstmann, Dominik Brunke, Tobias Kremeyer, Matthias Wilmes, Gunnar Schneider, Julian Sturm, Hartmut König, Michael Rademacher | 2025-05-20 | Download | In mobile network research, the integration of real-world components such as User Equipment (UE) with open-source network infrastructure is essential yet challenging.
Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks | Kamal Singh, Sami Marouani, Ahmad Al Sheikh, Pham Tran Anh Quang, Amaury Habrard | 2025-05-20 | Download | Reinforcement learning (RL) has been increasingly applied to network control problems, such as load balancing. However, existing RL approaches often suffer from lack of interpretability and difficulty...
Measuring Round-Trip Response Latencies Under Asymmetric Routing | Bhavana Vannarth Shobhana, Yen-lin Chien, Jonathan Diamant, Badri Nath, Shir Landau Feibish, Srinivas Narayana | 2025-05-20 | Download | Latency is a key indicator of Internet service performance. Continuously tracking the latency of client requests enables service operators to quickly identify bottlenecks, perform adaptive resource al...
Sibling Prefixes: Identifying Similarities in IPv4 and IPv6 Prefixes | Fariba Osali, Khwaja Zubair Sediqi, Oliver Gasser | 2025-05-20 | Download | Since the standardization of IPv6 in 1998, both versions of the Internet Protocol have coexisted in the Internet. Clients usually run algorithms such as Happy Eyeballs, to decide whether to connect to...
Integration of TinyML and LargeML: A Survey of 6G and Beyond | Thai-Hoc Vu, Ngo Hoang Tu, Thien Huynh-The, Kyungchun Lee, Sunghwan Kim, Miroslav Voznak, Quoc-Viet Pham | 2025-05-20 | Download | The evolution from fifth-generation (5G) to sixth-generation (6G) networks is driving an unprecedented demand for advanced machine learning (ML) solutions.
VaN3Twin: the Multi-Technology V2X Digital Twin with Ray-Tracing in the Loop | Roberto Pegurri, Diego Gasco, Francesco Linsalata, Marco Rapelli, Eugenio Moro, Francesco Raviglione, Claudio Casetti | 2025-05-20 | Download | This paper presents VaN3Twin-the first open-source, full-stack Network Digital Twin (NDT) framework for simulating the coexistence of multiple Vehicle-to-Everything (V2X) communication technologies wi...
CE-LSLM: Efficient Large-Small Language Model Inference and Communication via Cloud-Edge Collaboration | Pengyan Zhu, Tingting Yang | 2025-05-20 | Download | Emerging intelligent service scenarios in 6G communication impose stringent requirements for low latency, high reliability, and privacy preservation.
6G communications through sub-Terahertz CMOS power amplifiers: Design challenges and trends | Jun Yan Lee, Duo Wu, Xuanrui Guo, Jian Ding Tan, Teh Jia Yew, Zi Neng Ng, Mohammad Arif Sobhan Bhuiyan, Mahdi H. Miraz | 2025-05-20 | Download | The fifth-generation (5G) network faces limitations in supporting emerging applications, such as artificial intelligence (AI), virtual reality (VR) and digital twins.

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs | Melanie Cornelius, Greg Cross, Shilpika Shilpika, Matthew T. Dearing, Zhiling Lan | 2025-05-20 | Download | As supercomputers grow in size and complexity, power efficiency has become a critical challenge, particularly in understanding GPU power consumption within modern HPC workloads.
Task-parallelism in SWIFT for heterogeneous compute architectures | Abouzied M. A. Nasar, Benedict D. Rogers, Georgios Fourtakas, Mladen Ivkovic, Tobias Weinzierl, Scott T. Kay, Matthieu Schaller | 2025-05-20 | Download | This paper highlights first steps towards enabling graphics processing unit (GPU) acceleration of the task-parallel smoothed particle hydrodynamics (SPH) solver SWIFT.
Heterogeneous Memory Pool Tuning | Filip Vaverka, Ondrej Vysocky, Lubomir Riha | 2025-05-20 | Download | We present a lightweight tool for the analysis and tuning of application data placement in systems with heterogeneous memory pools. The tool allows non-intrusively identifying, analyzing, and controll...
Towards Efficient Multi-Scale Deformable Attention on NPU | Chenghuan Huang, Zhigeng Xu, Chong Sun, Chen Li, Ziyang Ma | 2025-05-20 | Download | Multi-scale deformable attention (MSDA) is a flexible and powerful feature extraction mechanism for visual tasks, but its random-access grid sampling strategy poses significant optimization challenges...
