2025-11-24

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Understanding Accelerator Compilers via Performance Profiling	Ayaka Yorihiro, Griffin Berlstein, Pedro Pontes García, Kevin Laeufer, Adrian Sampson	2025-11-24	Download	Accelerator design languages (ADLs), high-level languages that compile to hardware units, help domain experts quickly design efficient application-specific hardware.
CAMformer: Associative Memory is All You Need	Tergel Molom-Ochir, Benjamin F. Morris, Mark Horton, Chiyue Wei, Cong Guo, Brady Taylor, Peter Liu, Shan X. Wang, Deliang Fan, Hai Helen Li, Yiran Chen	2025-11-24	Download	Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys.
HeLEx: A Heterogeneous Layout Explorer for Spatial Elastic Coarse-Grained Reconfigurable Arrays	Alan Jia Bao Du, Tarek S. Abdelrahman	2025-11-24	Download	We present HeLEx, a framework for determining the functional layout of heterogeneous spatially-configured elastic Coarse-Grained Reconfigurable Arrays (CGRAs).
IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment	Antonis Psistakis	2025-11-24	Download	In complex systems with many compute nodes containing multiple CPUs that are coherent within each node, a key challenge is maintaining efficient and correct coherence between nodes.
A CNN-Based Technique to Assist Layout-to-Generator Conversion for Analog Circuits	Sungyu Jeong, Minsu Kim, Byungsub Kim	2025-11-24	Download	We propose a technique to assist in converting a reference layout of an analog circuit into the procedural layout generator by efficiently reusing available generators for sub-cell creation.
Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing	Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, Minyi Guo	2025-11-24	Download	3D Gaussian splatting (3DGS) has emerged as a promising direction for SLAM due to its high-fidelity reconstruction and rapid convergence. However, 3DGS-SLAM algorithms remain impractical for mobile pl...
Evaluation of GPU Video Encoder for Low-Latency Real-Time 4K UHD Encoding	Kasidis Arunruangsirilert, Jiro Katto	2025-11-24	Download	The demand for high-quality, real-time video streaming has grown exponentially, with 4K Ultra High Definition (UHD) becoming the new standard for many applications such as live broadcasting, TV servic...
Evaluation of NVENC Split-Frame Encoding (SFE) for UHD Video Transcoding	Kasidis Arunruangsirilert, Jiro Katto	2025-11-24	Download	NVIDIA Encoder (NVENC) features in modern NVIDIA GPUs offer significant advantages over software encoders by providing comparable Rate-Distortion (RD) performance while consuming considerably less po...
Evaluation of Hardware-based Video Encoders on Modern GPUs for UHD Live-Streaming	Kasidis Arunruangsirilert, Jiro Katto	2025-11-24	Download	Many GPUs have incorporated hardware-accelerated video encoders, which allow video encoding tasks to be offloaded from the main CPU and provide higher power efficiency.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment	Antonis Psistakis	2025-11-24	Download	In complex systems with many compute nodes containing multiple CPUs that are coherent within each node, a key challenge is maintaining efficient and correct coherence between nodes.
Leader Election via Unique Sink Orientation	Jérémie Chalopin, Maria Kokkou	2025-11-24	Download	A Locally Checkable Labeling (LCL) is a specification describing a set of labels that are valid with respect to a set of conditions that characterize a local part of a solution to a global problem.
AME: An Efficient Heterogeneous Agentic Memory Engine for Smartphones	Xinkui Zhao, Qingyu Ma, Yifan Zhang, Hengxuan Lou, Guanjie Cheng, Shuiguang Deng, Jianwei Yin	2025-11-24	Download	On-device agents on smartphones increasingly require continuously evolving memory to support personalized, context-aware, and long-term behaviors.
An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds	Marco Zambianco, Lorenzo Fasol, Roberto Doriguzzi-Corin	2025-11-24	Download	The explosive growth of AI applications has created unprecedented demand for GPU resources. Cloud providers meet this demand through GPU-as-a-Service platforms that offer rentable GPU resources for ru...
Federated style aware transformer aggregation of representations	Mincheol Jeon, Euinam Huh	2025-11-24	Download	Personalized Federated Learning (PFL) faces persistent challenges, including domain heterogeneity from diverse client data, data imbalance due to skewed participation, and strict communication constra...
N2N: A Parallel Framework for Large-Scale MILP under Distributed Memory	Longfei Wang, Junyan Liu, Fan Zhang, Jiangwen Wei, Yuanhua Tang, Jie Sun, Xiaodong Luo	2025-11-24	Download	Parallelization has emerged as a promising approach for accelerating MILP solving. However, the complexity of the branch-and-bound (B&B) framework and the numerous effective algorithm components in MI...
Low-Rank GEMM: Efficient Matrix Multiplication via Low-Rank Approximation with FP8 Acceleration	Alfredo Metere	2025-11-24	Download	Large matrix multiplication is a cornerstone of modern machine learning workloads, yet traditional approaches suffer from cubic computational complexity (e.g.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Layered Protocol Architecture for the Internet of Agents	Charles Fleming, Luca Muscariello, Vijoy Pandey, Ramana Kompella	2025-11-24	Download	Large Language Models (LLMs) have demonstrated remarkable performance improvements and the ability to learn domain-specific languages (DSLs), including APIs and tool interfaces.
LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems	Tianyang Duan, Zongyuan Zhang, Zheng Lin, Songxiao Guo, Xiuxian Guan, Guangyu Wu, Zihan Fang, Haotian Meng, Xia Du, Ji-Zhe Zhou, Heming Cui, Jun Luo, Yue Gao	2025-11-24	Download	Multi-agent reinforcement learning (MARL) has been increasingly adopted in many real-world applications. While MARL enables decentralized deployment on resource-constrained edge devices, it suffers fr...
Automated Fault Detection in 5G Core Networks Using Large Language Models	Parsa Hatami, Ahmadreza Majlesara, Ali Majlesi, Babak Hossein Khalaj	2025-11-24	Download	With the rapid growth of data volume in modern telecommunication networks and the continuous expansion of their scale, maintaining high reliability has become a critical requirement.
An O-RAN Framework for AI/ML-Based Localization with OpenAirInterface and FlexRIC	Nada Bouknana, Mohsen Ahadi, Florian Kaltenberger, Robert Schmidt	2025-11-24	Download	Localization is increasingly becoming an integral component of wireless cellular networks. The advent of artificial intelligence (AI) and machine learning (ML) based localization algorithms presents p...
Characterizing the Impact of Active Queue Management on Speed Test Measurements	Siddhant Ray, Taveesh Sharma, Jonatas Marques, Paul Schmitt, Francesco Bronzino, Nick Feamster	2025-11-24	Download	Present-day speed test tools measure peak throughput, but often fail to capture the user-perceived responsiveness of a network connection under load.
LLM-Based Agentic Negotiation for 6G: Addressing Uncertainty Neglect and Tail-Event Risk	Hatim Chergui, Farhad Rezazadeh, Mehdi Bennis, Merouane Debbah	2025-11-24	Download	A critical barrier to the trustworthiness of sixth-generation (6G) agentic autonomous networks is the uncertainty neglect bias: a cognitive tendency for large language model (LLM)-powered agents to ma...
Agent Discovery in Internet of Agents: Challenges and Solutions	Shaolong Guo, Yuntao Wang, Zhou Su, Yanghe Pan, Qinnan Hu, Tom H. Luan	2025-11-24	Download	Rapid advances in large language models and agentic AI are driving the emergence of the Internet of Agents (IoA), a paradigm where billions of autonomous software and embodied agents interact, coordin...
Diffusion Model-Enhanced Environment Reconstruction in ISAC	Nguyen Duc Minh Quang, Chang Liu, Shuangyang Li, Hoai-Nam Vu, Derrick Wing Kwan Ng, Wei Xiang	2025-11-24	Download	Recently, environment reconstruction (ER) in integrated sensing and communication (ISAC) systems has emerged as a promising approach for achieving high-resolution environmental perception.
Energy-Efficient Routing Protocol in Vehicular Opportunistic Networks: A Dynamic Cluster-based Routing Using Deep Reinforcement Learning	Meisam Sharifi Sani, Saeid Iranmanesh, Raad Raad, Faisel Tubbal	2025-11-24	Download	Opportunistic Networks (OppNets) employ the Store-Carry-Forward (SCF) paradigm to maintain communication during intermittent connectivity. However, routing performance suffers due to dynamic topology ...
An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds	Marco Zambianco, Lorenzo Fasol, Roberto Doriguzzi-Corin	2025-11-24	Download	The explosive growth of AI applications has created unprecedented demand for GPU resources. Cloud providers meet this demand through GPU-as-a-Service platforms that offer rentable GPU resources for ru...
Toward Integrated Air-Ground Computing and Communications: A Synergy of Computing Power Networks and Low-Altitude Economy Network	Yan Sun, Yinqiu Liu, Shaoyong Guo, Ruichen Zhang, Jiacheng Wang, Feng Qi, Xuesong Qiu, Dusit Niyato	2025-11-24	Download	With the rapid rise of the Low-Altitude Economy (LAE), the demand for intelligent processing and real-time response in services such as aerial traffic, emergency communications, and environmental moni...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking	Kichang Yang, Seonjun Kim, Minjae Kim, Nairan Zhang, Chi Zhang, Youngki Lee	2025-11-24	Download	Edge deployment of large Vision-Language Models (VLMs) increasingly relies on flash-based weight offloading, where activation sparsification is used to reduce I/O overhead.
Low-Rank GEMM: Efficient Matrix Multiplication via Low-Rank Approximation with FP8 Acceleration	Alfredo Metere	2025-11-24	Download	Large matrix multiplication is a cornerstone of modern machine learning workloads, yet traditional approaches suffer from cubic computational complexity (e.g.