2025-08-11

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| JSPIM: A Skew-Aware PIM Accelerator for High-Performance Databases Join and Select Operations | Sabiha Tajdari, Anastasia Ailamaki, Sandhya Dwarkadas | 2025-08-11 | Download | Database applications are increasingly bottlenecked by memory bandwidth and latency due to the memory wall and the limited scalability of DRAM. |
| Vector-Centric Machine Learning Systems: A Cross-Stack Approach | Wenqi Jiang | 2025-08-11 | Download | Today, two major trends are shaping the evolution of ML systems. First, modern AI systems are becoming increasingly complex, often integrating components beyond the model itself. |
| Architecting Long-Context LLM Acceleration with Packing-Prefetch Scheduler and Ultra-Large Capacity On-Chip Memories | Ming-Yen Lee, Faaiq Waqar, Hanchen Yang, Muhammed Ahosan Ul Karim, Harsono Simka, Shimeng Yu | 2025-08-11 | Download | Long-context Large Language Model (LLM) inference faces increasing compute bottlenecks as attention calculations scale with context length, primarily due to the growing KV-cache transfer overhead that... |
| Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended | Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle | 2025-08-11 | Download | The proliferation of IoT devices and advancements in network technologies have intensified the demand for real-time data processing at the network edge. |
| XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs | Fanchen Kong, Yunhao Deng, Xiaoling Yi, Ryan Antonio, Marian Verhelst | 2025-08-11 | Download | As modern AI workloads increasingly rely on heterogeneous accelerators, ensuring high-bandwidth and layout-flexible data movements between accelerator memories has become a pressing challenge. |
| ELF: Efficient Logic Synthesis by Pruning Redundancy in Refactoring | Dimitris Tsaras, Xing Li, Lei Chen, Zhiyao Xie, Mingxuan Yuan | 2025-08-11 | Download | In electronic design automation, logic optimization operators play a crucial role in minimizing the gate count of logic circuits. However, their computation demands are high. |
| TLV-HGNN: Thinking Like a Vertex for Memory-efficient HGNN Inference | Dengke Han, Duo Wang, Mingyu Yan, Xiaochun Ye, Dongrui Fan | 2025-08-11 | Download | Heterogeneous graph neural networks (HGNNs) excel at processing heterogeneous graph data and are widely applied in critical domains. In HGNN inference, the neighbor aggregation stage is the primary pe... |
| ARISE: Automating RISC-V Instruction Set Extension | Andreas Hager-Clukas, Philipp van Kempen, Stefan Wallentowitz | 2025-08-11 | Download | RISC-V is an extendable Instruction Set Architecture, growing in popularity for embedded systems. However, optimizing it to specific requirements imposes a great deal of manual effort. |
| A Matrix Decomposition Method for Odd-Type Gaussian Normal Basis Multiplication | Kittiphon Phalakarn, Athasit Surarerks | 2025-08-11 | Download | Normal basis is used in many applications because of the efficiency of the implementation. However, most space complexity reduction techniques for binary field multiplier are applicable for only optim... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Reinforcement Learning-Driven Task Scheduling Algorithm for Multi-Tenant Distributed Systems | Xiaopei Zhang, Xingang Wang, Xin Wang | 2025-08-11 | Download | This paper addresses key challenges in task scheduling for multi-tenant distributed systems, including dynamic resource variation, heterogeneous tenant demands, and fairness assurance. |
| Benchmarking Federated Learning for Throughput Prediction in 5G Live Streaming Applications | Yuvraj Dutta, Soumyajit Chatterjee, Sandip Chakraborty, Basabdatta Palit | 2025-08-11 | Download | Accurate and adaptive network throughput prediction is essential for latency-sensitive and bandwidth-intensive applications in 5G and emerging 6G networks. |
| Vector-Centric Machine Learning Systems: A Cross-Stack Approach | Wenqi Jiang | 2025-08-11 | Download | Today, two major trends are shaping the evolution of ML systems. First, modern AI systems are becoming increasingly complex, often integrating components beyond the model itself. |
| Towards Efficient and Practical GPU Multitasking in the Era of LLM | Jiarong Xing, Yifan Qiao, Simon Mo, Xingqi Cui, Gur-Eyal Sela, Yang Zhou, Joseph Gonzalez, Ion Stoica | 2025-08-11 | Download | GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, ... |
| Extremely Scalable Distributed Computation of Contour Trees via Pre-Simplification | Mingzhe Li, Hamish Carr, Oliver Rübel, Bei Wang, Gunther H. Weber | 2025-08-11 | Download | Contour trees offer an abstract representation of the level set topology in scalar fields and are widely used in topological data analysis and visualization. |
| Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended | Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle | 2025-08-11 | Download | The proliferation of IoT devices and advancements in network technologies have intensified the demand for real-time data processing at the network edge. |
| XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs | Fanchen Kong, Yunhao Deng, Xiaoling Yi, Ryan Antonio, Marian Verhelst | 2025-08-11 | Download | As modern AI workloads increasingly rely on heterogeneous accelerators, ensuring high-bandwidth and layout-flexible data movements between accelerator memories has become a pressing challenge. |
| Fully-Fluctuating Participation in Sleepy Consensus | Yuval Efron, Joachim Neu, Toniann Pitassi | 2025-08-11 | Download | Proof-of-work allows Bitcoin to boast security amidst arbitrary fluctuations in participation of miners throughout time, so long as, at any point in time, a majority of hash power is honest. |
| On the Operational Resilience of CBDC: Threats and Prospects of Formal Validation for Offline Payments | Marco Bernardo, Federico Calandra, Andrea Esposito, Francesco Fabris | 2025-08-11 | Download | Information and communication technologies are by now employed in most human activities, including economics and finance. Modern computers have reached an extraordinary power in terms of information p... |
| Optimizing Federated Learning for Scalable Power-demand Forecasting in Microgrids | Roopkatha Banerjee, Sampath Koti, Gyanendra Singh, Anirban Chakraborty, Gurunath Gurrala, Bhushan Jagyasi, Yogesh Simmhan | 2025-08-11 | Download | Real-time monitoring of power consumption in cities and micro-grids through the Internet of Things (IoT) can help forecast future demand and optimize grid operations. |
| Performance Evaluation of Brokerless Messaging Libraries | Lorenzo La Corte, Syed Aftab Rashid, Andrei-Marian Dan | 2025-08-11 | Download | Messaging systems are essential for efficiently transferring large volumes of data, ensuring rapid response times and high-throughput communication. |
| GPU-Accelerated Syndrome Decoding for Quantum LDPC Codes below the 63 μs Latency Threshold | Oscar Ferraz, Bruno Coutinho, Gabriel Falcao, Marco Gomes, Francisco A. Monteiro, Vitor Silva | 2025-08-11 | Download | This paper presents a GPU-accelerated decoder for quantum low-density parity-check (QLDPC) codes that achieves sub-63 μs latency, below the surface code decoder's real-time threshold demonstrated ... |
| EFU: Enforcing Federated Unlearning via Functional Encryption | Samaneh Mohammadi, Vasileios Tsouvalas, Iraklis Symeonidis, Ali Balador, Tanir Ozcelebi, Francesco Flammini, Nirvana Meratnia | 2025-08-11 | Download | Federated unlearning (FU) algorithms allow clients in federated settings to exercise their "right to be forgotten" by removing the influence of their data from a collaboratively trained model. |
| Towards Lock Modularization for Heterogeneous Environments | Hanze Zhang, Rong Chen, Haibo Chen | 2025-08-11 | Download | Modern hardware environments are becoming increasingly heterogeneous, leading to the emergence of applications specifically designed to exploit this heterogeneity. |
| Over-the-Top Resource Broker System for Split Computing: An Approach to Distribute Cloud Computing Infrastructure | Ingo Friese, Jochen Klaffer, Mandy Galkow-Schneider, Sergiy Melnyk, Qiuheng Zhou, Hans Dieter Schotten | 2025-08-11 | Download | 6G network architectures will usher in a wave of innovative services and capabilities, introducing concepts like split computing and dynamic processing nodes. |
| Perpetual exploration in anonymous synchronous networks with a Byzantine black hole | Adri Bhattacharya, Pritam Goswami, Evangelos Bampas, Partha Sarathi Mandal | 2025-08-11 | Download | In this paper, we investigate: "How can a group of initially co-located mobile agents perpetually explore an unknown graph, when one stationary node occasionally behaves maliciously, under an adversa... |
| Multi-Hop Privacy Propagation for Differentially Private Federated Learning in Social Networks | Chenchen Lin, Xuehe Wang | 2025-08-11 | Download | Federated learning (FL) enables collaborative model training across decentralized clients without sharing local data, thereby enhancing privacy and facilitating collaboration among clients connected v... |
| Taming Cold Starts: Proactive Serverless Scheduling with Model Predictive Control | Chanh Nguyen, Monowar Bhuyan, Erik Elmroth | 2025-08-11 | Download | Serverless computing has transformed cloud application deployment by introducing a fine-grained, event-driven execution model that abstracts away infrastructure management. |
| Coordinated Power Management on Heterogeneous Systems | Zhong Zheng, Zhiling Lan, Xingfu Wu, Valerie E. Taylor, Michael E. Papka | 2025-08-11 | Download | Performance prediction is essential for energy-efficient computing in heterogeneous computing systems that integrate CPUs and GPUs. However, traditional performance modeling methods often rely on exha... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Experimental Validation of Provably Covert Communication Using Software-Defined Radio | Rohan Bali, Trevor E. Bailey, Michael S. Bullock, Boulat A. Bash | 2025-08-11 | Download | The fundamental information-theoretic limits of covert, or low probability of detection/intercept (LPD/LPI), communication have been extensively studied for over a decade, resulting in the square root... |
| Industrial Viewpoints on RAN Technologies for 6G | Mansoor Shafi, Erik G. Larsson, Xingqin Lin, Dorin Panaitopol, Stefan Parkvall, Flavien Ronteix-Jacquet, Antti Toskala | 2025-08-11 | Download | 6G standardization is to start imminently, with commercial deployments expected before 2030. Its technical components and performance requirements are the focus of this article. |
| VeriPHY: Physical Layer Signal Authentication for Wireless Communication in 5G Environments | Clifton Paul Robinson, Salvatore D'Oro, Tommaso Melodia | 2025-08-11 | Download | Physical layer authentication (PLA) uses inherent characteristics of the communication medium to provide secure and efficient authentication in wireless networks, bypassing the need for traditional cr... |
| Adaptive Multiple Access and Service Placement for Generative Diffusion Models | Hamidreza Mazandarani, Mohammad Farhoudi, Masoud Shokrnezhad, Tarik Taleb | 2025-08-11 | Download | Generative Diffusion Models (GDMs) have emerged as key components of Generative Artificial Intelligence (GenAI), offering unparalleled expressiveness and controllability for complex data generation ta... |
| Performance Evaluation of Brokerless Messaging Libraries | Lorenzo La Corte, Syed Aftab Rashid, Andrei-Marian Dan | 2025-08-11 | Download | Messaging systems are essential for efficiently transferring large volumes of data, ensuring rapid response times and high-throughput communication. |
| Scalable and Energy-Efficient Predictive Data Collection in Wireless Sensor Networks with Constructive Interference | Conor Muldoon | 2025-08-11 | Download | A new class of Wireless Sensor Network has emerged whereby multiple nodes transmit data simultaneously, exploiting constructive interference to enable data collection frameworks with low energy usage ... |
| An Experimental Reservoir-Augmented Foundation Model: 6G O-RAN Case Study | Farhad Rezazadeh, Raymond Zhao, Jiongyu Dai, Amir Ashtari Gargari, Hatim Chergui, Lingjia Liu | 2025-08-11 | Download | Next-generation open radio access networks (O-RAN) continuously stream tens of key performance indicators (KPIs) together with raw in-phase/quadrature (IQ) samples, yielding ultra-high-dimensional, no... |
| Over-the-Top Resource Broker System for Split Computing: An Approach to Distribute Cloud Computing Infrastructure | Ingo Friese, Jochen Klaffer, Mandy Galkow-Schneider, Sergiy Melnyk, Qiuheng Zhou, Hans Dieter Schotten | 2025-08-11 | Download | 6G network architectures will usher in a wave of innovative services and capabilities, introducing concepts like split computing and dynamic processing nodes. |
| Joint link scheduling and power allocation in imperfect and energy-constrained underwater wireless sensor networks | Tong Zhang, Yu Gou, Jun Liu, Shanshan Song, Tingting Yang, Jun-Hong Cui | 2025-08-11 | Download | Underwater wireless sensor networks (UWSNs) stand as promising technologies facilitating diverse underwater applications. However, the major design issues of the considered system are the severely lim... |
| Joint Scheduling and Resource Allocation in mmWave IAB Networks Using Deep RL | Maryam Abbasalizadeh, Sashank Narain | 2025-08-11 | Download | Integrated Access and Backhaul (IAB) is critical for dense 5G and beyond deployments, especially in mmWave bands where fiber backhaul is infeasible. |
| Optimization of Private Semantic Communication Performance: An Uncooperative Covert Communication Method | Wenjing Zhang, Ye Hu, Tao Luo, Zhilong Zhang, Mingzhe Chen | 2025-08-11 | Download | In this paper, a novel covert semantic communication framework is investigated. Within this framework, a server extracts and transmits the semantic information, i.e. |
| Achieving Fair-Effective Communications and Robustness in Underwater Acoustic Sensor Networks: A Semi-Cooperative Approach | Yu Gou, Tong Zhang, Jun Liu, Tingting Yang, Shanshan Song, Jun-Hong Cui | 2025-08-11 | Download | This paper investigates the fair-effective communication and robustness in imperfect and energy-constrained underwater acoustic sensor networks (IC-UASNs). |
| Enhancing Mega-Satellite Networks with Generative Semantic Communication: A Networking Perspective | Binquan Guo, Wanting Yang, Zehui Xiong, Zhou Zhang, Baosheng Li, Zhu Han, Rahim Tafazolli, Tony Q. S. Quek | 2025-08-11 | Download | The advance of direct satellite-to-device communication has positioned mega-satellite constellations as a cornerstone of 6G wireless communication, enabling seamless global connectivity even in remote... |
| Multimodal Remote Inference | Keyuan Zhang, Yin Sun, Bo Ji | 2025-08-11 | Download | We consider a remote inference system with multiple modalities, where a multimodal machine learning (ML) model performs real-time inference using features collected from remote sensors. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Towards Efficient and Practical GPU Multitasking in the Era of LLM | Jiarong Xing, Yifan Qiao, Simon Mo, Xingqi Cui, Gur-Eyal Sela, Yang Zhou, Joseph Gonzalez, Ion Stoica | 2025-08-11 | Download | GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, ... |
| Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference | Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang | 2025-08-11 | Download | Global KV-cache sharing is an effective optimization for accelerating large language model (LLM) inference, yet it introduces an API-visible timing side channel that lets adversaries infer sensitive u... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| JSPIM: A Skew-Aware PIM Accelerator for High-Performance Databases Join and Select Operations | Sabiha Tajdari, Anastasia Ailamaki, Sandhya Dwarkadas | 2025-08-11 | Download | Database applications are increasingly bottlenecked by memory bandwidth and latency due to the memory wall and the limited scalability of DRAM. |
| Vector-Centric Machine Learning Systems: A Cross-Stack Approach | Wenqi Jiang | 2025-08-11 | Download | Today, two major trends are shaping the evolution of ML systems. First, modern AI systems are becoming increasingly complex, often integrating components beyond the model itself. |
| Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended | Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle | 2025-08-11 | Download | The proliferation of IoT devices and advancements in network technologies have intensified the demand for real-time data processing at the network edge. |
| A Data-driven ML Approach for Maximizing Performance in LLM-Adapter Serving | Ferran Agullo, Joan Oliveras, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Ll. Berral | 2025-08-11 | Download | With the rapid adoption of Large Language Models (LLMs), LLM-adapters have become increasingly common, providing lightweight specialization of large-scale models. |
| Taming Cold Starts: Proactive Serverless Scheduling with Model Predictive Control | Chanh Nguyen, Monowar Bhuyan, Erik Elmroth | 2025-08-11 | Download | Serverless computing has transformed cloud application deployment by introducing a fine-grained, event-driven execution model that abstracts away infrastructure management. |
