
2025-09-12

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Design and Analysis of Approximate Hardware Accelerators for VVC Intra Angular Prediction | Lucas M. Leipnitz de Fraga, Cláudio Machado Diniz | 2025-09-12 | Download | The Versatile Video Coding (VVC) standard significantly improves compression efficiency over its predecessor, HEVC, but at the cost of substantially higher computational complexity, particularly in in... |
| Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems | Mohammed Humaid Siddiqui, Fernando Guzman, Yufei Wu, Ruishu Ann | 2025-09-12 | Download | Hardware prefetching is critical to fill the performance gap between CPU speeds and slower memory accesses. With multicore architectures becoming commonplace, traditional prefetchers are severely chal... |
| Side-channel Inference of User Activities in AR/VR Using GPU Profiling | Seonghun Son, Chandrika Mukherjee, Reham Mohamed Aburas, Berk Gulmezoglu, Z. Berkay Celik | 2025-09-12 | Download | Over the past decade, AR/VR devices have drastically changed how we interact with the digital world. Users often share sensitive information, such as their location, browsing history, and even financi... |
| DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators | Charles Hong, Qijing Huang, Grace Dinh, Mahesh Subedar, Yakun Sophia Shao | 2025-09-12 | Download | In the hardware design space exploration process, it is critical to optimize both hardware parameters and algorithm-to-hardware mappings. Previous work has largely approached this simultaneous optimiz... |
| ReCross: Efficient Embedding Reduction Scheme for In-Memory Computing using ReRAM-Based Crossbar | Yu-Hong Lai, Chieh-Lin Tsai, Wen Sheng Lim, Han-Wen Hu, Tei-Wei Kuo, Yuan-Hao Chang | 2025-09-12 | Download | Deep learning-based recommendation models (DLRMs) are widely deployed in commercial applications to enhance user experience. However, the large and sparse embedding layers in these models impose subst... |
| TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification | Yang Zhong, Haoran Wu, Xueqi Li, Sa Wang, David Boland, Yungang Bao, Kan Shi | 2025-09-12 | Download | Verification is a critical process for ensuring the correctness of modern processors. The increasing complexity of processor designs and the emergence of new instruction set architectures (ISAs) like ... |
| MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness | Huizheng Wang, Zichuan Wang, Zhiheng Yue, Yousheng Long, Taiquan Wei, Jianxun Yang, Yang Wang, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin | 2025-09-12 | Download | Large language models (LLMs) face significant inference latency due to inefficiencies in GEMM operations, weight access, and KV cache access, especially in real-time scenarios. |
| Fully Automated Verification Framework for Configurable IPs: From Requirements to Results | Shuhang Zhang, Jelena Radulovic, Thorsten Dworzak | 2025-09-12 | Download | The increasing competition in the semiconductor industry has created significant pressure to reduce chip prices while maintaining quality and reliability. |
| Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design | Tianwei Pan, Tianao Dai, Jianlei Yang, Hongbin Jing, Yang Su, Zeyu Hao, Xiaotao Jia, Chunming Hu, Weisheng Zhao | 2025-09-12 | Download | Pairing-based cryptography (PBC) is crucial in modern cryptographic applications. With the rapid advancement of adversarial research and the growing diversity of application requirements, PBC accelera... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems | Mohammed Humaid Siddiqui, Fernando Guzman, Yufei Wu, Ruishu Ann | 2025-09-12 | Download | Hardware prefetching is critical to fill the performance gap between CPU speeds and slower memory accesses. With multicore architectures becoming commonplace, traditional prefetchers are severely chal... |
| MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing | Rahma Nouaji, Stella Bitchebe, Ricardo Macedo, Oana Balmau | 2025-09-12 | Download | Data loaders are used by Machine Learning (ML) frameworks like PyTorch and TensorFlow to apply transformations to data before feeding it into the accelerator. |
| Asynchronous Gathering of Opaque Robots with Mobility Faults | Subhajit Pramanick, Saswata Jana, Partha Sarathi Mandal, Gokarna Sharma | 2025-09-12 | Download | We consider the fundamental benchmarking problem of gathering in an $(N,f)$-fault system consisting of $N$ robots, of which at most $f$ might fail at any execution, under asynchrony. |
| Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective | Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan | 2025-09-12 | Download | The rapid scaling of Large Language Models (LLMs) has pushed training workloads far beyond the limits of single-node analysis, demanding a deeper understanding of how these models behave across large-... |
| The Entropy of Parallel Systems | Temitayo Adefemi | 2025-09-12 | Download | Ever since Claude Shannon used entropy for his "Mathematical Theory of Communication", entropy has become a buzzword in research circles with scientists applying entropy to describe any phenomena that... |
| FedBiF: Communication-Efficient Federated Learning via Bits Freezing | Shiwei Li, Qunwei Li, Haozhao Wang, Ruixuan Li, Jianbin Lin, Wenliang Zhong | 2025-09-12 | Download | Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative model training without sharing local data. |
| SynergAI: Edge-to-Cloud Synergy for Architecture-Driven High-Performance Orchestration for AI Inference | Foteini Stathopoulou, Aggelos Ferikoglou, Manolis Katsaragakis, Dimosthenis Masouros, Sotirios Xydis, Dimitrios Soudris | 2025-09-12 | Download | The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly heightened computational demands, particularly for inference-serving workloads. |
| The (R)evolution of Scientific Workflows in the Agentic AI Era: Towards Autonomous Science | Woong Shin, Renan Souza, Daniel Rosendo, Frédéric Suter, Feiyi Wang, Prasanna Balaprakash, Rafael Ferreira da Silva | 2025-09-12 | Download | Modern scientific discovery increasingly requires coordinating distributed facilities and heterogeneous resources, forcing researchers to act as manual workflow coordinators rather than scientists. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Large-Scale Network Utility Maximization via GPU-Accelerated Proximal Message Passing | Akshay Sreekumar, Anthony Degleris, Ram Rajagopal | 2025-09-12 | Download | We present a GPU-accelerated proximal message passing algorithm for large-scale network utility maximization (NUM). NUM is a fundamental problem in resource allocation, where resources are allocated a... |
| gNB-based Local Breakout for URLLC in industrial 5G | Rajendra Paudyal, Rajendra Upadhyay, Al Nahian Bin Emran, Duminda Wijesekera | 2025-09-12 | Download | Industrial URLLC workloads (coordinated robotics, automated guided vehicles, machine-vision collaboration) require sub-5 ms latency and five-nines reliability. |
| Realistic UE Antennas for 6G in the 3GPP Channel Model | Simon Svendsen, Dimitri Gold, Christian Rom, Volker Pauli, Vuokko Nurmela | 2025-09-12 | Download | The transition to 6G has driven significant updates to the 3GPP channel model, particularly in modeling UE antennas and user-induced blockage for handheld devices. The 3GPP Rel.19 revision of TR 38. |
| Trusted Repeater Placement in QKD-enabled Optical Networks | Arup Kumar Marik, Basabdatta Palit, Sadananda Behera | 2025-09-12 | Download | Quantum Key Distribution (QKD) provides information-theoretic security, but is limited by distance in optical networks, thereby requiring repeater nodes to extend coverage. |
| Proof of AutoML: SDN based Secure Energy Trading with Blockchain in Disaster Case | Salih Toprak, Muge Erel-Ozcevik | 2025-09-12 | Download | In disaster scenarios where conventional energy infrastructure is compromised, secure and traceable energy trading between solar-powered households and mobile charging units becomes a necessity. |
| RFSeek and Ye Shall Find | Noga H. Rotman, Tiago Ferreira, Hila Peleg, Mark Silberstein, Alexandra Silva | 2025-09-12 | Download | Requests for Comments (RFCs) are extensive specification documents for network protocols, but their prose-based format and their considerable length often impede precise operational understanding. |
| Friend or Foe? Identifying Anomalous Peers in Monero's P2P Network | Yannik Kopyciok, Stefan Schmid, Friedhelm Victor | 2025-09-12 | Download | Monero, the leading privacy-focused cryptocurrency, relies on a peer-to-peer (P2P) network to propagate transactions and blocks. Growing evidence suggests that non-standard nodes exist in the network,... |
| Secure and Scalable Rerouting in LEO Satellite Networks | Lyubomir Yanev, Pietro Ronchetti, Joshua Smailes, Martin Strohmeier | 2025-09-12 | Download | Resilient routing in large-scale Low Earth Orbit (LEO) satellite networks remains a key challenge due to frequent and unpredictable link and node failures, potentially in response to cybersecurity bre... |
| Cost-Free Personalization via Information-Geometric Projection in Bayesian Federated Learning | Nour Jamoussi, Giuseppe Serra, Photios A. Stavrou, Marios Kountouris | 2025-09-12 | Download | Bayesian Federated Learning (BFL) combines uncertainty modeling with decentralized training, enabling the development of personalized and reliable models under data heterogeneity and privacy constrain... |
| Maximising Energy Efficiency in Large-Scale Open RAN: Hybrid xApps and Digital Twin Integration | Ahmed Al-Tahmeesschi, Yi Chu, Gurdeep Singh, Charles Turyagyenda, Dritan Kaleshi, David Grace, Hamed Ahmadi | 2025-09-12 | Download | The growing demand for high-speed, ultra-reliable, and low-latency communications in 5G and beyond networks has significantly driven up power consumption, particularly within the Radio Access Network ... |
| Service Function Chaining Architecture for Multi-hop Split Inference and Learning | Takanori Hara, Masahiro Sasabe | 2025-09-12 | Download | Service Function Chaining (SFC) is a networking technique that ensures traffic traverses a predefined sequence of service functions, realizing arbitrary network services through dynamic and efficient ... |
| Taming Volatility: Stable and Private QUIC Classification with Federated Learning | Richard Jozsa, Karel Hynek, Adrian Pekar | 2025-09-12 | Download | Federated Learning (FL) is a promising approach for privacy-preserving network traffic analysis, but its practical deployment is challenged by the non-IID nature of real-world data. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| XBOF: A Cost-Efficient CXL JBOF with Inter-SSD Compute Resource Sharing | Shushu Yi, Yuda An, Li Peng, Xiurui Pan, Qiao Li, Jieming Yin, Guangyan Zhang, Wenfei Wu, Diyu Zhou, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, Ke Zhou, Jie Zhang | 2025-09-12 | Download | Enterprise SSDs integrate numerous computing resources (e.g., ARM processor and onboard DRAM) to satisfy the ever-increasing performance requirements of I/O bursts. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Coordinated Reinforcement Learning Prefetching Architecture for Multicore Systems | Mohammed Humaid Siddiqui, Fernando Guzman, Yufei Wu, Ruishu Ann | 2025-09-12 | Download | Hardware prefetching is critical to fill the performance gap between CPU speeds and slower memory accesses. With multicore architectures becoming commonplace, traditional prefetchers are severely chal... |
| Stencil-Lifting: Hierarchical Recursive Lifting System for Extracting Summary of Stencil Kernel in Legacy Codes | Mingyi Li, Junmin Xiao, Siyan Chen, Hui Ma, Xi Chen, Peihua Bao, Liang Yuan, Guangming Tan | 2025-09-12 | Download | We introduce Stencil-Lifting, a novel system for automatically converting stencil kernels written in low-level languages in legacy code into semantically equivalent Domain-Specific Language (DSL) impl... |
| Matrix-Free Evaluation Strategies for Continuous and Discontinuous Galerkin Discretizations on Unstructured Tetrahedral Grids | Dominik Still, Niklas Fehn, Wolfgang A. Wall, Martin Kronbichler | 2025-09-12 | Download | This study presents novel strategies for improving the node-level performance of matrix-free evaluation of continuous and discontinuous Galerkin spatial discretizations on unstructured tetrahedral gri... |
