2025-10-20

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ReLANCE: A Resource-Efficient Low-Latency Cortical Neural Acceleration Engine | Sonu Kumar, Arjun S. Nair, Bhawna Chaudhary, Mukul Lokhande, Santosh Kumar Vishvakarma | 2025-10-20 | Download | We present a Cortical Neural Pool (CNP) architecture featuring a high-speed, resource-efficient CORDIC based Hodgkin-Huxley (RCHH) neuron model. |
| RAS: A Bit-Exact rANS Accelerator For High-Performance Neural Lossless Compression | Yuchao Qin, Anjunyi Fan, Bonan Yan | 2025-10-20 | Download | Data centers handle vast volumes of data that require efficient lossless compression, yet emerging probabilistic models based methods are often computationally slow. |
| SmaRTLy: RTL Optimization with Logic Inferencing and Structural Rebuilding | Chengxi Li, Yang Sun, Lei Chen, Yiwen Wang, Mingxuan Yuan, Evangeline F. Y. Young | 2025-10-20 | Download | This paper proposes smaRTLy: a new optimization technique for multiplexers in Register-Transfer Level (RTL) logic synthesis. Multiplexer trees are very common in RTL designs, and traditional tools lik... |
| SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference | Wenxun Wang, Shuchang Zhou, Wenyu Sun, Peiqin Sun, Yongpan Liu | 2025-10-20 | Download | Transformers have shown remarkable performance in both natural language processing (NLP) and computer vision (CV) tasks. However, their real-time inference speed and efficiency are limited due to the ... |
| YAP+: Pad-Layout-Aware Yield Modeling and Simulation for Hybrid Bonding | Zhichao Chen, Puneet Gupta | 2025-10-20 | Download | Three-dimensional (3D) integration continues to advance Moore's Law by facilitating dense interconnects and enabling multi-tier system architectures. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Efficient Multi-Worker Selection based Distributed Swarm Learning via Analog Aggregation | Zhuoyu Yao, Yue Wang, Songyang Zhang, Yingshu Li, Zhipeng Cai, Zhi Tian | 2025-10-20 | Download | Recent advances in distributed learning systems have introduced effective solutions for implementing collaborative artificial intelligence techniques in wireless communication networks. |
| Efficient Long-context Language Model Training by Core Attention Disaggregation | Yonghao Zhuang, Junda Chen, Bo Pang, Yi Gu, Yibo Zhu, Yimin Jiang, Ion Stoica, Eric Xing, Hao Zhang | 2025-10-20 | Download | We present core attention disaggregation (CAD), a technique that improves long-context large language model training by decoupling the core attention computation, softmax(QK^T)V, from the rest of the ... |
| A New Broadcast Model for Several Network Topologies | Hongbo Lu, Junsung Hwang, Bernard Tenreiro, Nabila Jaman Tripti, Darren Hamilton, Yuefan Deng | 2025-10-20 | Download | We present Broadcast by Balanced Saturation (BBS), a general broadcast algorithm designed to optimize communication efficiency across diverse network topologies. |
| AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators | Jacopo Tagliabue | 2025-10-20 | Download | We explore AI-driven distributed-systems policy design by combining stochastic code generation from large language models (LLMs) with deterministic verification in a domain-specific simulator. |
| Quantum Federated Learning: Architectural Elements and Future Directions | Siva Sai, Abhishek Sawaika, Prabhjot Singh, Rajkumar Buyya | 2025-10-20 | Download | Federated learning (FL) focuses on collaborative model training without the need to move the private data silos to a central server. Despite its several benefits, the classical FL is plagued with seve... |
| On the Universality of Round Elimination Fixed Points | Alkida Balliu, Sebastian Brandt, Ole Gabsdil, Dennis Olivetti, Jukka Suomela | 2025-10-20 | Download | Recent work on distributed graph algorithms [e.g. STOC 2022, ITCS 2022, PODC 2020] has drawn attention to the following open question: are round elimination fixed points a universal technique for prov... |
| DynaKV: Enabling Accurate and Efficient Long-Sequence LLM Decoding on Smartphones | Tuowei Wang, Minxing Huang, Fengzu Li, Ligeng Chen, Jinrui Zhang, Ju Ren | 2025-10-20 | Download | As the demand for human-like reasoning, multi-turn dialogues, and long-form responses grows, large language models (LLMs) are increasingly expected to support efficient and effective long-sequence dec... |
| Network and Systems Performance Characterization of MCP-Enabled LLM Agents | Zihao Ding, Mufeng Zhu, Yao Liu | 2025-10-20 | Download | Model Context Protocol (MCP) has recently gained increased attention within the AI community for providing a standardized way for large language models (LLMs) to interact with external tools and servi... |
| Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization | Daniel Nichols, Konstantinos Parasyris, Charles Jekel, Abhinav Bhatele, Harshitha Menon | 2025-10-20 | Download | Language models are now prevalent in software engineering with many developers using them to automate tasks and accelerate their development. While language models have been tremendous at accomplishin... |
| An Evaluation of LLMs Inference on Popular Single-board Computers | Tung, Nguyen, Tuyen Nguyen | 2025-10-20 | Download | The growing demand for on-device large language model (LLM) inference is driving interest in deploying lightweight, cost-effective AI solutions on edge hardware. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Black-Box Evasion Attacks on Data-Driven Open RAN Apps: Tailored Design and Experimental Evaluation | Pranshav Gajjar, Molham Khoja, Abiodun Ganiyu, Marc Juarez, Mahesh K. Marina, Andrew Lehane, Vijay K. Shah | 2025-10-20 | Download | The impending adoption of Open Radio Access Network (O-RAN) is fueling innovation in the RAN towards data-driven operation. Unlike traditional RAN where the RAN data and its usage is restricted within... |
| A New Broadcast Model for Several Network Topologies | Hongbo Lu, Junsung Hwang, Bernard Tenreiro, Nabila Jaman Tripti, Darren Hamilton, Yuefan Deng | 2025-10-20 | Download | We present Broadcast by Balanced Saturation (BBS), a general broadcast algorithm designed to optimize communication efficiency across diverse network topologies. |
| Pointing-Error-Induced Fading in an Open-Loop THz Uplink with Hardware Impairments | P. Brach del Prever, P. Testolina, A. Masihi, S. Petrushkevich, M. Polese, T. Melodia, J. M. Jornet | 2025-10-20 | Download | We analyze the open-loop mechanical tracking performance of a sub-Terahertz (sub-THz) and Terahertz (THz) uplink communication system. These high-frequency bands enable multi-gigabit links through lar... |
| Adaptive Local Combining with Decentralized Decoding for Distributed Massive MIMO | Mohd Saif Ali Khan, Karthik RM, Samar Agnihotri | 2025-10-20 | Download | Efficient uplink processing in distributed massive multiple-input multiple-output (D-mMIMO) systems requires both effective local combining and scalable decoding to significantly mitigate inter-user i... |
| Is It Worth to Use Feedback Channel in 5G V2X Platoon Scenarios? | Dmitry Bankov, Artem Krasilov, Artem Otmakhov, Pavel Savlukovich, Evgeny Khorov | 2025-10-20 | Download | 5G Vehicle-to-Everything (V2X) is a new technology developed by 3GPP to support inter-vehicle communication. In contrast to 4G V2X which allows only broadcast communication, 5G V2X enables groupcast a... |
| Enhancing 5G V2X Mode 2 for Sporadic Traffic | Dmitry Bankov, Artem Krasilov, Artem Otmakhov, Aleksei Shashin, Evgeny Khorov | 2025-10-20 | Download | The emerging road safety and autonomous vehicle applications require timely and reliable data delivery between vehicles and between vehicles and infrastructure. |
| AoA Services in 5G Networks: A Framework for Real-World Implementation and Systematic Testing | Alberto Ceresoli, Viola Bernazzoli, Roberto Pegurri, Ilario Filippini | 2025-10-20 | Download | Accurate positioning is a key enabler for emerging 5G applications. While the standardized Location Management Function (LMF) operates centrally within the core network, its scalability and latency li... |
| Network and Systems Performance Characterization of MCP-Enabled LLM Agents | Zihao Ding, Mufeng Zhu, Yao Liu | 2025-10-20 | Download | Model Context Protocol (MCP) has recently gained increased attention within the AI community for providing a standardized way for large language models (LLMs) to interact with external tools and servi... |
| Mamba4Net: Distilled Hybrid Mamba Large Language Models For Networking | Linhan Xia, Mingzhan Yang, Jingjing Wang, Ziwei Yan, Yakun Ren, Guo Yu, Kai Lei | 2025-10-20 | Download | Transformer-based large language models (LLMs) are increasingly being adopted in networking research to address domain-specific challenges. However, their quadratic time complexity and substantial mod... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums | Jaeyeon Won, Willow Ahrens, Joel S. Emer, Saman Amarasinghe | 2025-10-20 | Download | Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall... |
| Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization | Daniel Nichols, Konstantinos Parasyris, Charles Jekel, Abhinav Bhatele, Harshitha Menon | 2025-10-20 | Download | Language models are now prevalent in software engineering with many developers using them to automate tasks and accelerate their development. While language models have been tremendous at accomplishin... |
