2025-07-18

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| LIMINAL: Exploring The Frontiers of LLM Decode Performance | Michael Davies, Neal Crago, Karthikeyan Sankaralingam, Christos Kozyrakis | 2025-07-18 | Download | The rapid advancement of Large Language Models (LLMs) necessitates a deep understanding of their fundamental performance limits. This paper investigates the limits of LLM inference, focusing on hardwa... |
| An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC | Matthias Jobst, Tim Langer, Chen Liu, Mehmet Alici, Hector A. Gonzalez, Christian Mayr | 2025-07-18 | Download | This work presents a multi-layer DNN scheduling framework as an extension of OctopuScheduler, providing an end-to-end flow from PyTorch models to inference on a single SpiNNaker2 chip. |
| 4T2R X-ReRAM CiM Array for Variation-tolerant, Low-power, Massively Parallel MAC Operation | Fuyuki Kihara, Seiji Uenohara, Satoshi Awamura, Naoko Misawa, Chihiro Matsui, Ken Takeuchi | 2025-07-18 | Download | Computation-in-Memory (CiM) is attracting attention as a technology that can perform the MAC calculations required for AI accelerators at high speed with low power consumption. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Characterizing Communication Patterns in Distributed Large Language Model Inference | Lang Xu, Kaushik Kandadi Suresh, Quentin Anthony, Nawras Alnaasan, Dhabaleswar K. Panda | 2025-07-18 | Download | Large Language Models (LLMs) built on transformer architectures have transformed natural language processing, achieving remarkable performance across diverse applications. |
| FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning | Md Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain | 2025-07-18 | Download | Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. |
| Weighted Matching in a Poly-Streaming Model | Ahammed Ullah, S. M. Ferdous, Alex Pothen | 2025-07-18 | Download | We introduce the poly-streaming model, a generalization of streaming models of computation in which $k$ processors process $k$ data streams containing a total of $N$ items. |
| CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning | Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum | 2025-07-18 | Download | The exponential growth in demand for GPU computing resources has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, curr... |
| Shipwright: Proving liveness of distributed systems with Byzantine participants | Derek Leung, Nickolai Zeldovich, Frans Kaashoek | 2025-07-18 | Download | Ensuring liveness in a decentralized system, such as PBFT, is critical, because there may not be any single administrator that can restart the system if it encounters a liveness bug. |
| Edge Intelligence with Spiking Neural Networks | Shuiguang Deng, Di Yu, Changze Lv, Xin Du, Linshan Jiang, Xiaofan Zhao, Wentao Tong, Xiaoqing Zheng, Weijia Fang, Peng Zhao, Gang Pan, Schahram Dustdar, Albert Y. Zomaya | 2025-07-18 | Download | The convergence of artificial intelligence and edge computing has spurred growing interest in enabling intelligent services directly on resource-constrained devices. |
| Application Placement with Constraint Relaxation | Damiano Azzolini, Marco Duca, Stefano Forti, Francesco Gallo, Antonio Ielo | 2025-07-18 | Download | Novel utility computing paradigms rely upon the deployment of multi-service applications to pervasive and highly distributed cloud-edge infrastructure resources. |
| DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training | Zhixin Wang, Tianyi Zhou, Liming Liu, Ao Li, Jiarui Hu, Dian Yang, Yinhui Lu, Jinlong Hou, Siyuan Feng, Yuan Cheng, Yuan Qi | 2025-07-18 | Download | Reinforcement learning (RL) has become the pivotal post-training technique for large language model (LLM). Effectively scaling reinforcement learning is now the key to unlocking advanced reasoning cap... |
| An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC | Matthias Jobst, Tim Langer, Chen Liu, Mehmet Alici, Hector A. Gonzalez, Christian Mayr | 2025-07-18 | Download | This work presents a multi-layer DNN scheduling framework as an extension of OctopuScheduler, providing an end-to-end flow from PyTorch models to inference on a single SpiNNaker2 chip. |
| Quantum Blockchain Survey: Foundations, Trends, and Gaps | Saurav Ghosh, Niloy Deb Roy Mishu | 2025-07-18 | Download | Quantum computing poses fundamental risks to classical blockchain systems by undermining widely used cryptographic primitives. In response, two major research directions have emerged: post-quantum blo... |
| FedSkipTwin: Digital-Twin-Guided Client Skipping for Communication-Efficient Federated Learning | Daniel Commey, Kamel Abbad, Garth V. Crosby, Lyes Khoukhi | 2025-07-18 | Download | Communication overhead remains a primary bottleneck in federated learning (FL), particularly for applications involving mobile and IoT devices with constrained bandwidth. |
| Leveraging Multi-Instance GPUs through moldable task scheduling | Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz | 2025-07-18 | Download | NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully-isolated resources, which can be dynamically reconfigured. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| NetIntent: Leveraging Large Language Models for End-to-End Intent-Based SDN Automation | Md. Kamrul Hossain, Walid Aljoby | 2025-07-18 | Download | Intent-Based Networking (IBN) often leverages the programmability of Software-Defined Networking (SDN) to simplify network management. However, significant challenges remain in automating the entire p... |
| The Proportional Fair Scheduler in Wavelength-Multiplexed Quantum Networks | Sanidhay Bhambay, Siddarth Koduru Joshi, Thirupathaiah Vasantam, Neil Walton | 2025-07-18 | Download | We address the problem of optimal pumping strategies in quantum networks. These networks enable secure communication by distributing entangled photon pairs to user (or node) pairs. |
| Preprint: Poster: Did I Just Browse A Website Written by LLMs? | Sichang Steven He, Ramesh Govindan, Harsha V. Madhyastha | 2025-07-18 | Download | Increasingly, web content is automatically generated by large language models (LLMs) with little human input. We call this "LLM-dominant" content. |
| Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts | Ramesh Raskar, Pradyumna Chari, John Zinky, Mahesh Lambe, Jared James Grogan, Sichao Wang, Rajesh Ranjan, Rekha Singhal, Shailja Gupta, Robert Lincourt, Raghu Bala, Aditi Joshi, Abhishek Singh, Ayush Chopra, Dimitris Stripelis, Bhuwan B, Sumit Kumar, Maria Gorskikh | 2025-07-18 | Download | The Internet is poised to host billions to trillions of autonomous AI agents that negotiate, delegate, and migrate in milliseconds and workloads that will strain DNS-centred identity and discovery. |
| On the Trade-Off Between Sum-Rate and Energy Efficiency through the Convergence of HAPS and Active RIS Technologies | Bilal Karaman, Ilhan Basturk, Ferdi Kara, Metin Ozturk, Sezai Taskin, Halil Yanikomeroglu | 2025-07-18 | Download | This paper investigates the integration of active reconfigurable intelligent surfaces (RIS) relay with high-altitude platform stations (HAPS) to enhance non-terrestrial network (NTN) performance in ne... |
| Quantum Blockchain Survey: Foundations, Trends, and Gaps | Saurav Ghosh, Niloy Deb Roy Mishu | 2025-07-18 | Download | Quantum computing poses fundamental risks to classical blockchain systems by undermining widely used cryptographic primitives. In response, two major research directions have emerged: post-quantum blo... |
| ATRO: A Fast Algorithm for Topology Engineering of Reconfigurable Datacenter Networks | Yingming Mao, Qiaozhu Zhai, Ximeng Liu, Xinchi Han, Fafan li, Shizhen Zhao, Yuzhou Zhou, Zhen Yao, Xia Zhu | 2025-07-18 | Download | Reconfigurable data center networks (DCNs) enhance traditional architectures with optical circuit switches (OCSs), enabling dynamic reconfiguration of inter-pod links, i.e., the logical topology. |
| CARTS: Cooperative and Adaptive Resource Triggering and Stitching for 5G ISAC | Cheng Jiang, Yihe Yan, Yanxiang Wang, Jiawei Hu, Chun Tung Chou, Wen Hu | 2025-07-18 | Download | This paper presents CARTS, an adaptive 5G uplink sensing scheme designed to provide Integrated Sensing and Communication (ISAC) services. The performance of both communication and sensing fundamentall... |
| Agent Network Protocol Technical White Paper | Gaowei Chang, Eidan Lin, Chengxuan Yuan, Rizhao Cai, Binbin Chen, Xuan Xie, Yin Zhang | 2025-07-18 | Download | With the development of large models and autonomous decision-making AI, agents are rapidly becoming the new entities of the internet, following mobile apps. |
| FedSkipTwin: Digital-Twin-Guided Client Skipping for Communication-Efficient Federated Learning | Daniel Commey, Kamel Abbad, Garth V. Crosby, Lyes Khoukhi | 2025-07-18 | Download | Communication overhead remains a primary bottleneck in federated learning (FL), particularly for applications involving mobile and IoT devices with constrained bandwidth. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Photonic Fabric Platform for AI Accelerators | Jing Ding, Trung Diep | 2025-07-18 | Download | This paper presents the Photonic Fabric™ and the Photonic Fabric Appliance™ (PFA), a photonic-enabled switch and memory subsystem that delivers low latency, high bandwidth, and low per-bit energy. |
| The Proportional Fair Scheduler in Wavelength-Multiplexed Quantum Networks | Sanidhay Bhambay, Siddarth Koduru Joshi, Thirupathaiah Vasantam, Neil Walton | 2025-07-18 | Download | We address the problem of optimal pumping strategies in quantum networks. These networks enable secure communication by distributing entangled photon pairs to user (or node) pairs. |
| Leveraging Multi-Instance GPUs through moldable task scheduling | Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz | 2025-07-18 | Download | NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully-isolated resources, which can be dynamically reconfigured. |