2025-09-12

cs.AR - Architecture

Title	Authors	Date	PDF	Abstract
Design and Analysis of Approximate Hardware Accelerators for VVC Intra Angular Prediction	Lucas M. Leipnitz de Fraga, Cláudio Machado Diniz	2025-09-12	Download	The Versatile Video Coding (VVC) standard significantly improves compression efficiency over its predecessor, HEVC, but at the cost of substantially higher computational complexity, particularly in in...
Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems	Mohammed Humaid Siddiqui, Fernando Guzman, Yufei Wu, Ruishu Ann	2025-09-12	Download	Hardware prefetching is critical to fill the performance gap between CPU speeds and slower memory accesses. With multicore architectures becoming commonplace, traditional prefetchers are severely chal...
Side-channel Inference of User Activities in AR/VR Using GPU Profiling	Seonghun Son, Chandrika Mukherjee, Reham Mohamed Aburas, Berk Gulmezoglu, Z. Berkay Celik	2025-09-12	Download	Over the past decade, AR/VR devices have drastically changed how we interact with the digital world. Users often share sensitive information, such as their location, browsing history, and even financi...
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators	Charles Hong, Qijing Huang, Grace Dinh, Mahesh Subedar, Yakun Sophia Shao	2025-09-12	Download	In the hardware design space exploration process, it is critical to optimize both hardware parameters and algorithm-to-hardware mappings. Previous work has largely approached this simultaneous optimiz...
ReCross: Efficient Embedding Reduction Scheme for In-Memory Computing using ReRAM-Based Crossbar	Yu-Hong Lai, Chieh-Lin Tsai, Wen Sheng Lim, Han-Wen Hu, Tei-Wei Kuo, Yuan-Hao Chang	2025-09-12	Download	Deep learning-based recommendation models (DLRMs) are widely deployed in commercial applications to enhance user experience. However, the large and sparse embedding layers in these models impose subst...
TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification	Yang Zhong, Haoran Wu, Xueqi Li, Sa Wang, David Boland, Yungang Bao, Kan Shi	2025-09-12	Download	Verification is a critical process for ensuring the correctness of modern processors. The increasing complexity of processor designs and the emergence of new instruction set architectures (ISAs) like ...
MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness	Huizheng Wang, Zichuan Wang, Zhiheng Yue, Yousheng Long, Taiquan Wei, Jianxun Yang, Yang Wang, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin	2025-09-12	Download	Large language models (LLMs) face significant inference latency due to inefficiencies in GEMM operations, weight access, and KV cache access, especially in real-time scenarios.
Fully Automated Verification Framework for Configurable IPs: From Requirements to Results	Shuhang Zhang, Jelena Radulovic, Thorsten Dworzak	2025-09-12	Download	The increasing competition in the semiconductor industry has created significant pressure to reduce chip prices while maintaining quality and reliability.
Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design	Tianwei Pan, Tianao Dai, Jianlei Yang, Hongbin Jing, Yang Su, Zeyu Hao, Xiaotao Jia, Chunming Hu, Weisheng Zhao	2025-09-12	Download	Pairing-based cryptography (PBC) is crucial in modern cryptographic applications. With the rapid advancement of adversarial research and the growing diversity of application requirements, PBC accelera...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Date	PDF	Abstract
Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems	Mohammed Humaid Siddiqui, Fernando Guzman, Yufei Wu, Ruishu Ann	2025-09-12	Download	Hardware prefetching is critical to fill the performance gap between CPU speeds and slower memory accesses. With multicore architectures becoming commonplace, traditional prefetchers are severely chal...
MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing	Rahma Nouaji, Stella Bitchebe, Ricardo Macedo, Oana Balmau	2025-09-12	Download	Data loaders are used by Machine Learning (ML) frameworks like PyTorch and TensorFlow to apply transformations to data before feeding it into the accelerator.
Asynchronous Gathering of Opaque Robots with Mobility Faults	Subhajit Pramanick, Saswata Jana, Partha Sarathi Mandal, Gokarna Sharma	2025-09-12	Download	We consider the fundamental benchmarking problem of gathering in an $(N,f)$-fault system consisting of $N$ robots, of which at most $f$ might fail at any execution, under asynchrony.
Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective	Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan	2025-09-12	Download	The rapid scaling of Large Language Models (LLMs) has pushed training workloads far beyond the limits of single-node analysis, demanding a deeper understanding of how these models behave across large-...
The Entropy of Parallel Systems	Temitayo Adefemi	2025-09-12	Download	Ever since Claude Shannon used entropy for his "Mathematical Theory of Communication", entropy has become a buzzword in research circles with scientists applying entropy to describe any phenomena that...
FedBiF: Communication-Efficient Federated Learning via Bits Freezing	Shiwei Li, Qunwei Li, Haozhao Wang, Ruixuan Li, Jianbin Lin, Wenliang Zhong	2025-09-12	Download	Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative model training without sharing local data.
SynergAI: Edge-to-Cloud Synergy for Architecture-Driven High-Performance Orchestration for AI Inference	Foteini Stathopoulou, Aggelos Ferikoglou, Manolis Katsaragakis, Dimosthenis Masouros, Sotirios Xydis, Dimitrios Soudris	2025-09-12	Download	The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads.
The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science	Woong Shin, Renan Souza, Daniel Rosendo, Frédéric Suter, Feiyi Wang, Prasanna Balaprakash, Rafael Ferreira da Silva	2025-09-12	Download	Modern scientific discovery increasingly requires coordinating distributed facilities and heterogeneous resources, forcing researchers to act as manual workflow coordinators rather than scientists.

cs.NI - Networking and Internet Architecture

Title	Authors	Date	PDF	Abstract
Large-Scale Network Utility Maximization via GPU-Accelerated Proximal Message Passing	Akshay Sreekumar, Anthony Degleris, Ram Rajagopal	2025-09-12	Download	We present a GPU-accelerated proximal message passing algorithm for large-scale network utility maximization (NUM). NUM is a fundamental problem in resource allocation, where resources are allocated a...
gNB-based Local Breakout for URLLC in industrial 5G	Rajendra Paudyal, Rajendra Upadhyay, Al Nahian Bin Emran, Duminda Wijesekera	2025-09-12	Download	Industrial URLLC workloads (coordinated robotics, automated guided vehicles, and machine-vision collaboration) require sub-5 ms latency and five-nines reliability.
Realistic UE Antennas for 6G in the 3GPP Channel Model	Simon Svendsen, Dimitri Gold, Christian Rom, Volker Pauli, Vuokko Nurmela	2025-09-12	Download	The transition to 6G has driven significant updates to the 3GPP channel model, particularly in modeling UE antennas and user-induced blockage for handheld devices. The 3GPP Rel.19 revision of TR 38.
Trusted Repeater Placement in QKD-enabled Optical Networks	Arup Kumar Marik, Basabdatta Palit, Sadananda Behera	2025-09-12	Download	Quantum Key Distribution (QKD) provides information-theoretic security, but is limited by distance in optical networks, thereby requiring repeater nodes to extend coverage.
Proof of AutoML: SDN based Secure Energy Trading with Blockchain in Disaster Case	Salih Toprak, Muge Erel-Ozcevik	2025-09-12	Download	In disaster scenarios where conventional energy infrastructure is compromised, secure and traceable energy trading between solar-powered households and mobile charging units becomes a necessity.
RFSeek and Ye Shall Find	Noga H. Rotman, Tiago Ferreira, Hila Peleg, Mark Silberstein, Alexandra Silva	2025-09-12	Download	Requests for Comments (RFCs) are extensive specification documents for network protocols, but their prose-based format and their considerable length often impede precise operational understanding.
Friend or Foe? Identifying Anomalous Peers in Monero's P2P Network	Yannik Kopyciok, Stefan Schmid, Friedhelm Victor	2025-09-12	Download	Monero, the leading privacy-focused cryptocurrency, relies on a peer-to-peer (P2P) network to propagate transactions and blocks. Growing evidence suggests that non-standard nodes exist in the network,...
Secure and Scalable Rerouting in LEO Satellite Networks	Lyubomir Yanev, Pietro Ronchetti, Joshua Smailes, Martin Strohmeier	2025-09-12	Download	Resilient routing in large-scale Low Earth Orbit (LEO) satellite networks remains a key challenge due to frequent and unpredictable link and node failures, potentially in response to cybersecurity bre...
Cost-Free Personalization via Information-Geometric Projection in Bayesian Federated Learning	Nour Jamoussi, Giuseppe Serra, Photios A. Stavrou, Marios Kountouris	2025-09-12	Download	Bayesian Federated Learning (BFL) combines uncertainty modeling with decentralized training, enabling the development of personalized and reliable models under data heterogeneity and privacy constrain...
Maximising Energy Efficiency in Large-Scale Open RAN: Hybrid xApps and Digital Twin Integration	Ahmed Al-Tahmeesschi, Yi Chu, Gurdeep Singh, Charles Turyagyenda, Dritan Kaleshi, David Grace, Hamed Ahmadi	2025-09-12	Download	The growing demand for high-speed, ultra-reliable, and low-latency communications in 5G and beyond networks has significantly driven up power consumption, particularly within the Radio Access Network ...
Service Function Chaining Architecture for Multi-hop Split Inference and Learning	Takanori Hara, Masahiro Sasabe	2025-09-12	Download	Service Function Chaining (SFC) is a networking technique that ensures traffic traverses a predefined sequence of service functions, realizing arbitrary network services through dynamic and efficient ...
Taming Volatility: Stable and Private QUIC Classification with Federated Learning	Richard Jozsa, Karel Hynek, Adrian Pekar	2025-09-12	Download	Federated Learning (FL) is a promising approach for privacy-preserving network traffic analysis, but its practical deployment is challenged by the non-IID nature of real-world data.

cs.OS - Operating Systems

Title	Authors	Date	PDF	Abstract
XBOF: A Cost-Efficient CXL JBOF with Inter-SSD Compute Resource Sharing	Shushu Yi, Yuda An, Li Peng, Xiurui Pan, Qiao Li, Jieming Yin, Guangyan Zhang, Wenfei Wu, Diyu Zhou, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, Ke Zhou, Jie Zhang	2025-09-12	Download	Enterprise SSDs integrate numerous computing resources (e.g., ARM processor and onboard DRAM) to satisfy the ever-increasing performance requirements of I/O bursts.

cs.PF - Performance

Title	Authors	Date	PDF	Abstract
Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems	Mohammed Humaid Siddiqui, Fernando Guzman, Yufei Wu, Ruishu Ann	2025-09-12	Download	Hardware prefetching is critical to fill the performance gap between CPU speeds and slower memory accesses. With multicore architectures becoming commonplace, traditional prefetchers are severely chal...
Stencil-Lifting: Hierarchical Recursive Lifting System for Extracting Summary of Stencil Kernel in Legacy Codes	Mingyi Li, Junmin Xiao, Siyan Chen, Hui Ma, Xi Chen, Peihua Bao, Liang Yuan, Guangming Tan	2025-09-12	Download	We introduce Stencil-Lifting, a novel system for automatically converting stencil kernels written in low-level languages in legacy code into semantically equivalent Domain-Specific Language (DSL) impl...
Matrix-Free Evaluation Strategies for Continuous and Discontinuous Galerkin Discretizations on Unstructured Tetrahedral Grids	Dominik Still, Niklas Fehn, Wolfgang A. Wall, Martin Kronbichler	2025-09-12	Download	This study presents novel strategies for improving the node-level performance of matrix-free evaluation of continuous and discontinuous Galerkin spatial discretizations on unstructured tetrahedral gri...