2025-10-16
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| From Loop Nests to Silicon: Mapping AI Workloads onto AMD NPUs with MLIR-AIR | Erwei Wang, Samuel Bayliss, Andra Bisca, Zachary Blair, Sangeeta Chowdhary, Kristof Denolf, Jeff Fifield, Brandon Freiberger, Erika Hunhoff, Phil James-Roxby, Jack Lo, Joseph Melber, Stephen Neuendorffer, Eddie Richter, Andre Rosti, Javier Setoain, Gagandeep Singh, Endri Taka, Pranathi Vasireddy, Zhewen Yu, Niansong Zhang, Jinming Zhuang | 2025-10-16 | Download | General-purpose compilers abstract away parallelism, locality, and synchronization, limiting their effectiveness on modern spatial architectures. |
| ColumnDisturb: Understanding Column-based Read Disturbance in Real DRAM Chips and Implications for Future Systems | İsmail Emir Yüksel, Ataberk Olgun, F. Nisa Bostancı, Haocong Luo, A. Giray Yağlıkçı, Onur Mutlu | 2025-10-16 | Download | We experimentally demonstrate a new widespread read disturbance phenomenon, ColumnDisturb, in real commodity DRAM chips. By repeatedly opening or keeping a DRAM row (aggressor row) open, we show that ... |
| Deadlock-free routing for Full-mesh networks without using Virtual Channels | Alejandro Cano, Cristóbal Camarero, Carmen Martínez, Ramón Beivide | 2025-10-16 | Download | High-radix, low-diameter networks like HyperX and Dragonfly use a Full-mesh core, and rely on multiple virtual channels (VCs) to avoid packet deadlocks in adaptive routing. |
| Tawa: Automatic Warp Specialization for Modern GPUs with Asynchronous References | Hongzheng Chen, Bin Fan, Alexander Collins, Bastian Hagedorn, Evghenii Gaburov, Masahiro Masuda, Matthew Brookhart, Chris Sullivan, Jason Knight, Zhiru Zhang, Vinod Grover | 2025-10-16 | Download | Modern GPUs feature specialized hardware units that enable high-performance, asynchronous dataflow execution. However, the conventional SIMT programming model is fundamentally misaligned with this tas... |
| MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving | Jungi Lee, Junyong Park, Soohyun Cha, Jaehoon Cho, Jaewoong Sim | 2025-10-16 | Download | Reduced-precision data formats are crucial for cost-effective serving of large language models (LLMs). While numerous reduced-precision formats have been introduced thus far, they often require intrus... |
| Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow | Ching-Lin Hsiung, Tian-Sheuan Chang | 2025-10-16 | Download | Current transformer accelerators primarily focus on optimizing self-attention due to its quadratic complexity. However, this focus is less relevant for vision transformers with short token lengths, wh... |
| Computing-In-Memory Aware Model Adaption For Edge Devices | Ming-Han Lin, Tian-Sheuan Chang | 2025-10-16 | Download | Computing-in-Memory (CIM) macros have gained popularity for deep learning acceleration due to their highly parallel computation and low power consumption. |
| Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing | Tianhua Xia, Sai Qian Zhang | 2025-10-16 | Download | Running Large Language Models (LLMs) on edge devices is crucial for reducing latency, improving real-time processing, and enhancing privacy. By performing inference directly on the device, data does n... |
| Systolic Array Acceleration of Diagonal-Optimized Sparse-Sparse Matrix Multiplication for Efficient Quantum Simulation | Yuchao Su, Srikar Chundury, Jiajia Li, Frank Mueller | 2025-10-16 | Download | Hamiltonian simulation is a key workload in quantum computing, enabling the study of complex quantum systems and serving as a critical tool for classical verification of quantum devices. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| An Elastic Job Scheduler for HPC Applications on the Cloud | Aditya Bhosale, Kavitha Chandrasekar, Laxmikant Kale, Sara Kokkila-Schumacher | 2025-10-16 | Download | The last few years have seen an increase in adoption of the cloud for running HPC applications. The pay-as-you-go cost model of these cloud resources has necessitated the development of specialized pr... |
| NEMO: Faster Parallel Execution for Highly Contended Blockchain Workloads (Full version) | François Ezard, Can Umut Ileri, Jérémie Decouchant | 2025-10-16 | Download | Following the design of more efficient blockchain consensus algorithms, the execution layer has emerged as the new performance bottleneck of blockchains, especially under high contention. |
| Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks | Utku Demir, Tugba Erpek, Yalin E. Sagduyu, Sastry Kompella, Mengran Xue | 2025-10-16 | Download | In emerging networked systems, mobile edge devices such as ground vehicles and unmanned aerial system (UAS) swarms collectively aggregate vast amounts of data to make machine learning decisions such a... |
| Hive Hash Table: A Warp-Cooperative, Dynamically Resizable Hash Table for GPUs | Md Sabbir Hossain Polak, David Troendle, Byunghyun Jang | 2025-10-16 | Download | Hash tables are essential building blocks in data-intensive applications, yet existing GPU implementations often struggle with concurrent updates, high load factors, and irregular memory access patter... |
| Multi-modal video data-pipelines for machine learning with minimal human supervision | Mihai-Cristian Pîrvu, Marius Leordeanu | 2025-10-16 | Download | The real-world is inherently multi-modal at its core. Our tools observe and take snapshots of it, in digital form, such as videos or sounds, however much of it is lost. |
| Balls and Bins and the Infinite Process with Random Deletions | Petra Berenbrink, Tom Friedetzky, Peter Kling, Lars Nagel | 2025-10-16 | Download | We consider an infinite balls-into-bins process with deletions where in each discrete step a coin is tossed as to whether, with probability β(t) ∈ (0,1), a new ball is allocated using the Gree... |
| Deadlock-free routing for Full-mesh networks without using Virtual Channels | Alejandro Cano, Cristóbal Camarero, Carmen Martínez, Ramón Beivide | 2025-10-16 | Download | High-radix, low-diameter networks like HyperX and Dragonfly use a Full-mesh core, and rely on multiple virtual channels (VCs) to avoid packet deadlocks in adaptive routing. |
| xLLM Technical Report | Tongxuan Liu, Tao Peng, Peijun Yang, Xiaoyang Zhao, Xiusheng Lu, Weizhe Huang, Zirui Liu, Xiaoyu Chen, Zhiwei Liang, Jun Xiong, Donghe Jin, Minchao Zhang, Jinrong Guo, Yingxu Deng, Xu Zhang, Xianzhe Dong, Siqi Wang, Siyu Wu, Yu Wu, Zihan Tang, Yuting Zeng, Yanshu Wang, Jinguang Liu, Meng Kang, Menxin Li, Yunlong Wang, Yiming Liu, Xiaolong Ma, Yifan Wang, Yichen Zhang, Jinrun Yin, Keyang Zheng, Jiawei Yin, Jun Zhang, Ziyue Wang, Xiaobo Lin, Liangyu Liu, Liwei Lan, Yang Liu, Chunhua Peng, Han Liu, Songcheng Ren, Xuezhu Wang, Yunheng Shen, Yi Wang, Guyue Liu, Yitao Hu, Hui Chen, Tong Yang, Hailong Yang, Jing Li, Guiguang Ding, Ke Zhang | 2025-10-16 | Download | We introduce xLLM, an intelligent and efficient Large Language Model (LLM) inference framework designed for high-performance, large-scale enterprise-grade serving, with deep optimizations for diverse ... |
| The Bidding Games: Reinforcement Learning for MEV Extraction on Polygon Blockchain | Andrei Seoev, Leonid Gremyachikh, Anastasiia Smirnova, Yash Madhwal, Alisa Kalacheva, Dmitry Belousov, Ilia Zubov, Aleksei Smirnov, Denis Fedyanin, Vladimir Gorgadze, Yury Yanovich | 2025-10-16 | Download | In blockchain networks, the strategic ordering of transactions within blocks has emerged as a significant source of profit extraction, known as Maximal Extractable Value (MEV). |
| MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems | Miryeong Kwon, Donghyun Gouk, Hyein Woo, Junhee Kim, Jinwoo Baek, Kyungkuk Nam, Sangyoon Ji, Jiseon Kim, Hanyeoreum Bae, Junhyeok Jang, Hyunwoo You, Junseok Moon, Myoungsoo Jung | 2025-10-16 | Download | MPI implementations commonly rely on explicit memory-copy operations, incurring overhead from redundant data movement and buffer management. This overhead notably impacts HPC workloads involving inten... |
| JASDA: Introducing Job-Aware Scheduling in Scheduler-Driven Job Atomization | Michal Konopa, Jan Fesl, Ladislav Beránek | 2025-10-16 | Download | The increasing complexity and temporal variability of workloads on MIG-enabled GPUs challenge the scalability of traditional centralized scheduling. |
| ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains | Hyein Woo, Miryeong Kwon, Jiseon Kim, Eunjee Na, Hanjin Choi, Seonghyeon Jang, Myoungsoo Jung | 2025-10-16 | Download | This paper proposes ScalePool, a novel cluster architecture designed to interconnect numerous accelerators using unified hardware interconnects rather than traditional long-distance networking. |
| FairBatching: Fairness-Aware Batch Formation for LLM Inference | Hongtao Lyu, Boyue Liu, Mingyu Wu, Haibo Chen | 2025-10-16 | Download | Large language model (LLM) inference systems face a fundamental tension between minimizing Time-to-First-Token (TTFT) latency for new requests and maintaining a high, steady token generation rate (low... |
| From Attention to Disaggregation: Tracing the Evolution of LLM Inference | Madabattula Rajesh Kumar, Srinivasa Rao Aravilli, Mustafa Saify, Shashank Srivastava | 2025-10-16 | Download | The evolution of Large Language Models from the Transformer architecture to models with trillions of parameters has shifted the primary bottleneck from model training to real time inference. |
| Incentive-Based Federated Learning: Architectural Elements and Future Directions | Chanuka A. S. Hewa Kaluannakkage, Rajkumar Buyya | 2025-10-16 | Download | Federated learning promises to revolutionize machine learning by enabling collaborative model training without compromising data privacy. However, practical adaptability can be limited by critical fac... |
| Proof-Carrying Fair Ordering: Asymmetric Verification for BFT via Incremental Graphs | Pengkun Ren, Hai Dong, Nasrin Sohrabi, Zahir Tari, Pengcheng Zhang | 2025-10-16 | Download | Byzantine Fault-Tolerant (BFT) consensus protocols ensure agreement on transaction ordering despite malicious actors, but unconstrained ordering power enables sophisticated value extraction attacks li... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks | Utku Demir, Tugba Erpek, Yalin E. Sagduyu, Sastry Kompella, Mengran Xue | 2025-10-16 | Download | In emerging networked systems, mobile edge devices such as ground vehicles and unmanned aerial system (UAS) swarms collectively aggregate vast amounts of data to make machine learning decisions such a... |
| Decoherence-Aware Entangling and Swapping Strategy Optimization for Entanglement Routing in Quantum Networks | Shao-Min Huang, Cheng-Yang Cheng, Ming-Huang Chien, Jian-Jhih Kuo, Chih-Yu Wang | 2025-10-16 | Download | Quantum teleportation enables high-security communications through end-to-end quantum entangled pairs. End-to-end entangled pairs are created by using swapping processes to consume short entangled pai... |
| Intelligent Dynamic Handover via AI-assisted Signal Quality Prediction in 6G Multi-RAT Networks | Maria Lamprini A. Bartsioka, Anastasios Giannopoulos, Sotirios Spantideas | 2025-10-16 | Download | The emerging paradigm of 6G multiple Radio Access Technology (multi-RAT) networks, where cellular and Wireless Fidelity (WiFi) transmitters coexist, requires mobility decisions that remain reliable un... |
| Automated Extraction of Protocol State Machines from 3GPP Specifications with Domain-Informed Prompts and LLM Ensembles | Miao Zhang, Runhan Feng, Hongbo Tang, Yu Zhao, Jie Yang, Hang Qiu, Qi Liu | 2025-10-16 | Download | Mobile telecommunication networks are foundational to global infrastructure and increasingly support critical sectors such as manufacturing, transportation, and healthcare. |
| Energy-Latency Optimization for Dynamic 5G Mobile Radio Access Networks | Gabriela N. Caspa H., Carlos A. Astudillo, Nelson L. S. da Fonseca | 2025-10-16 | Download | In 5G networks, base station (BS) disaggregation and new services present challenges in radio access network (RAN) configuration, particularly in meeting their bandwidth and latency constraints. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Stability and Heavy-traffic Delay Optimality of General Load Balancing Policies in Heterogeneous Service Systems | Yishun Luo, Martin Zubeldia | 2025-10-16 | Download | We consider a load balancing system consisting of single-server queues working in parallel, with heterogeneous service rates. Jobs arrive to a central dispatcher, which has to dispatch them to one... |