
2025-11-24

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Understanding Accelerator Compilers via Performance Profiling | Ayaka Yorihiro, Griffin Berlstein, Pedro Pontes García, Kevin Laeufer, Adrian Sampson | 2025-11-24 | Download | Accelerator design languages (ADLs), high-level languages that compile to hardware units, help domain experts quickly design efficient application-specific hardware. |
| CAMformer: Associative Memory is All You Need | Tergel Molom-Ochir, Benjamin F. Morris, Mark Horton, Chiyue Wei, Cong Guo, Brady Taylor, Peter Liu, Shan X. Wang, Deliang Fan, Hai Helen Li, Yiran Chen | 2025-11-24 | Download | Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. |
| HeLEx: A Heterogeneous Layout Explorer for Spatial Elastic Coarse-Grained Reconfigurable Arrays | Alan Jia Bao Du, Tarek S. Abdelrahman | 2025-11-24 | Download | We present HeLEx, a framework for determining the functional layout of heterogeneous spatially-configured elastic Coarse-Grained Reconfigurable Arrays (CGRAs). |
| IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment | Antonis Psistakis | 2025-11-24 | Download | In complex systems with many compute nodes containing multiple CPUs that are coherent within each node, a key challenge is maintaining efficient and correct coherence between nodes. |
| A CNN-Based Technique to Assist Layout-to-Generator Conversion for Analog Circuits | Sungyu Jeong, Minsu Kim, Byungsub Kim | 2025-11-24 | Download | We propose a technique to assist in converting a reference layout of an analog circuit into the procedural layout generator by efficiently reusing available generators for sub-cell creation. |
| Splatonic: Architecture Support for 3D Gaussian Splatting SLAM via Sparse Processing | Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, Minyi Guo | 2025-11-24 | Download | 3D Gaussian splatting (3DGS) has emerged as a promising direction for SLAM due to its high-fidelity reconstruction and rapid convergence. However, 3DGS-SLAM algorithms remain impractical for mobile pl... |
| Evaluation of GPU Video Encoder for Low-Latency Real-Time 4K UHD Encoding | Kasidis Arunruangsirilert, Jiro Katto | 2025-11-24 | Download | The demand for high-quality, real-time video streaming has grown exponentially, with 4K Ultra High Definition (UHD) becoming the new standard for many applications such as live broadcasting, TV servic... |
| Evaluation of NVENC Split-Frame Encoding (SFE) for UHD Video Transcoding | Kasidis Arunruangsirilert, Jiro Katto | 2025-11-24 | Download | NVIDIA Encoder (NVENC) features in modern NVIDIA GPUs, offer significant advantages over software encoders by providing comparable Rate-Distortion (RD) performance while consuming considerably less po... |
| Evaluation of Hardware-based Video Encoders on Modern GPUs for UHD Live-Streaming | Kasidis Arunruangsirilert, Jiro Katto | 2025-11-24 | Download | Many GPUs have incorporated hardware-accelerated video encoders, which allow video encoding tasks to be offloaded from the main CPU and provide higher power efficiency. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment | Antonis Psistakis | 2025-11-24 | Download | In complex systems with many compute nodes containing multiple CPUs that are coherent within each node, a key challenge is maintaining efficient and correct coherence between nodes. |
| Leader Election via Unique Sink Orientation | Jérémie Chalopin, Maria Kokkou | 2025-11-24 | Download | A Locally Checkable Labeling (LCL) is a specification describing a set of labels that are valid with respect to a set of conditions that characterize a local part of a solution to a global problem. |
| AME: An Efficient Heterogeneous Agentic Memory Engine for Smartphones | Xinkui Zhao, Qingyu Ma, Yifan Zhang, Hengxuan Lou, Guanjie Cheng, Shuiguang Deng, Jianwei Yin | 2025-11-24 | Download | On-device agents on smartphones increasingly require continuously evolving memory to support personalized, context-aware, and long-term behaviors. |
| An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds | Marco Zambianco, Lorenzo Fasol, Roberto Doriguzzi-Corin | 2025-11-24 | Download | The explosive growth of AI applications has created unprecedented demand for GPU resources. Cloud providers meet this demand through GPU-as-a-Service platforms that offer rentable GPU resources for ru... |
| Federated style aware transformer aggregation of representations | Mincheol Jeon, Euinam Huh | 2025-11-24 | Download | Personalized Federated Learning (PFL) faces persistent challenges, including domain heterogeneity from diverse client data, data imbalance due to skewed participation, and strict communication constra... |
| N2N: A Parallel Framework for Large-Scale MILP under Distributed Memory | Longfei Wang, Junyan Liu, Fan Zhang, Jiangwen Wei, Yuanhua Tang, Jie Sun, Xiaodong Luo | 2025-11-24 | Download | Parallelization has emerged as a promising approach for accelerating MILP solving. However, the complexity of the branch-and-bound (B&B) framework and the numerous effective algorithm components in MI... |
| Low-Rank GEMM: Efficient Matrix Multiplication via Low-Rank Approximation with FP8 Acceleration | Alfredo Metere | 2025-11-24 | Download | Large matrix multiplication is a cornerstone of modern machine learning workloads, yet traditional approaches suffer from cubic computational complexity (e.g. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Layered Protocol Architecture for the Internet of Agents | Charles Fleming, Luca Muscariello, Vijoy Pandey, Ramana Kompella | 2025-11-24 | Download | Large Language Models (LLMs) have demonstrated remarkable performance improvements and the ability to learn domain-specific languages (DSLs), including APIs and tool interfaces. |
| LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems | Tianyang Duan, Zongyuan Zhang, Zheng Lin, Songxiao Guo, Xiuxian Guan, Guangyu Wu, Zihan Fang, Haotian Meng, Xia Du, Ji-Zhe Zhou, Heming Cui, Jun Luo, Yue Gao | 2025-11-24 | Download | Multi-agent reinforcement learning (MARL) has been increasingly adopted in many real-world applications. While MARL enables decentralized deployment on resource-constrained edge devices, it suffers fr... |
| Automated Fault Detection in 5G Core Networks Using Large Language Models | Parsa Hatami, Ahmadreza Majlesara, Ali Majlesi, Babak Hossein Khalaj | 2025-11-24 | Download | With the rapid growth of data volume in modern telecommunication networks and the continuous expansion of their scale, maintaining high reliability has become a critical requirement. |
| An O-RAN Framework for AI/ML-Based Localization with OpenAirInterface and FlexRIC | Nada Bouknana, Mohsen Ahadi, Florian Kaltenberger, Robert Schmidt | 2025-11-24 | Download | Localization is increasingly becoming an integral component of wireless cellular networks. The advent of artificial intelligence (AI) and machine learning (ML) based localization algorithms presents p... |
| Characterizing the Impact of Active Queue Management on Speed Test Measurements | Siddhant Ray, Taveesh Sharma, Jonatas Marques, Paul Schmitt, Francesco Bronzino, Nick Feamster | 2025-11-24 | Download | Present day speed test tools measure peak throughput, but often fail to capture the user-perceived responsiveness of a network connection under load. |
| LLM-Based Agentic Negotiation for 6G: Addressing Uncertainty Neglect and Tail-Event Risk | Hatim Chergui, Farhad Rezazadeh, Mehdi Bennis, Merouane Debbah | 2025-11-24 | Download | A critical barrier to the trustworthiness of sixth-generation (6G) agentic autonomous networks is the uncertainty neglect bias; a cognitive tendency for large language model (LLM)-powered agents to ma... |
| Agent Discovery in Internet of Agents: Challenges and Solutions | Shaolong Guo, Yuntao Wang, Zhou Su, Yanghe Pan, Qinnan Hu, Tom H. Luan | 2025-11-24 | Download | Rapid advances in large language models and agentic AI are driving the emergence of the Internet of Agents (IoA), a paradigm where billions of autonomous software and embodied agents interact, coordin... |
| Diffusion Model-Enhanced Environment Reconstruction in ISAC | Nguyen Duc Minh Quang, Chang Liu, Shuangyang Li, Hoai-Nam Vu, Derrick Wing Kwan Ng, Wei Xiang | 2025-11-24 | Download | Recently, environment reconstruction (ER) in integrated sensing and communication (ISAC) systems has emerged as a promising approach for achieving high-resolution environmental perception. |
| Energy-Efficient Routing Protocol in Vehicular Opportunistic Networks: A Dynamic Cluster-based Routing Using Deep Reinforcement Learning | Meisam Sharifi Sani, Saeid Iranmanesh, Raad Raad, Faisel Tubbal | 2025-11-24 | Download | Opportunistic Networks (OppNets) employ the Store-Carry-Forward (SCF) paradigm to maintain communication during intermittent connectivity. However, routing performance suffers due to dynamic topology ... |
| An Online Fragmentation-Aware GPU Scheduler for Multi-Tenant MIG-based Clouds | Marco Zambianco, Lorenzo Fasol, Roberto Doriguzzi-Corin | 2025-11-24 | Download | The explosive growth of AI applications has created unprecedented demand for GPU resources. Cloud providers meet this demand through GPU-as-a-Service platforms that offer rentable GPU resources for ru... |
| Toward Integrated Air-Ground Computing and Communications: A Synergy of Computing Power Networks and Low-Altitude Economy Network | Yan Sun, Yinqiu Liu, Shaoyong Guo, Ruichen Zhang, Jiacheng Wang, Feng Qi, Xuesong Qiu, Dusit Niyato | 2025-11-24 | Download | With the rapid rise of the Low-Altitude Economy (LAE), the demand for intelligent processing and real-time response in services such as aerial traffic, emergency communications, and environmental moni... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| VLM in a flash: I/O-Efficient Sparsification of Vision-Language Model via Neuron Chunking | Kichang Yang, Seonjun Kim, Minjae Kim, Nairan Zhang, Chi Zhang, Youngki Lee | 2025-11-24 | Download | Edge deployment of large Vision-Language Models (VLMs) increasingly relies on flash-based weight offloading, where activation sparsification is used to reduce I/O overhead. |
| Low-Rank GEMM: Efficient Matrix Multiplication via Low-Rank Approximation with FP8 Acceleration | Alfredo Metere | 2025-11-24 | Download | Large matrix multiplication is a cornerstone of modern machine learning workloads, yet traditional approaches suffer from cubic computational complexity (e.g. |
