2025-12-19

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs	Rupanshu Soi, Rohan Yadav, Fredrik Kjolstad, Alex Aiken, Maryam Mehri Dehnavi, Michael Garland, Michael Bauer	2025-12-19	Download	GPU architectures have continued to grow in complexity, with recent incarnations introducing increasingly powerful fixed-function units for matrix multiplication and data movement to accompany highly ...
PermuteV: A Performant Side-channel-Resistant RISC-V Core Securing Edge AI Inference	Nuntipat Narkthong, Xiaolin Xu	2025-12-19	Download	Edge AI inference is becoming prevalent thanks to the emergence of small yet high-performance microprocessors. This shift from cloud to edge processing brings several benefits in terms of energy savin...
A 14ns-Latency 9Gb/s 0.44mm$^2$ 62pJ/b Short-Blocklength LDPC Decoder ASIC in 22FDX	Darja Nonaca, Jérémy Guichemerre, Reinhard Wiesmayr, Nihat Engin Tunali, Christoph Studer	2025-12-19	Download	Ultra-reliable low latency communication (URLLC) is a key part of 5G wireless systems. Achieving low latency necessitates codes with short blocklengths for which polar codes with successive cancellati...
LLM-based Behaviour Driven Development for Hardware Design	Rolf Drechsler, Qian Liu	2025-12-19	Download	Test and verification are essential activities in hardware and system design, but their complexity grows significantly with increasing system sizes.
Torrent: A Distributed DMA for Efficient and Flexible Point-to-Multipoint Data Movement	Yunhao Deng, Fanchen Kong, Xiaoling Yi, Ryan Antonio, Marian Verhelst	2025-12-19	Download	The growing disparity between computational power and on-chip communication bandwidth is a critical bottleneck in modern Systems-on-Chip (SoCs), especially for data-parallel workloads like AI.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Constrained Cuts, Flows, and Lattice-Linearity	Robert Streit, Vijay K. Garg	2025-12-19	Download	In a capacitated directed graph, it is known that the set of all min-cuts forms a distributive lattice [1], [2]. Here, we describe this lattice as a regular predicate whose forbidden elements can be a...
ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training	Yi Yang, Ziyu Lin, Liesheng Wei	2025-12-19	Download	Large-scale deep learning models impose substantial communication overhead in distributed training, particularly in bandwidth-constrained or heterogeneous cloud-edge environments.
Asymptotic behaviour of galactic small-scale dynamos at modest magnetic Prandtl number	Frederick A. Gent, Mordecai-Mark Mac Low, Maarit J. Korpi-Lagg, Touko Puro, Matthias Reinhardt	2025-12-19	Download	Magnetic fields are critical at many scales to galactic dynamics and structure, including multiphase pressure balance, dust processing, and star formation.
Torrent: A Distributed DMA for Efficient and Flexible Point-to-Multipoint Data Movement	Yunhao Deng, Fanchen Kong, Xiaoling Yi, Ryan Antonio, Marian Verhelst	2025-12-19	Download	The growing disparity between computational power and on-chip communication bandwidth is a critical bottleneck in modern Systems-on-Chip (SoCs), especially for data-parallel workloads like AI.
Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing	Lingxiao Zhao, Haoran Zhou, Yuezhi Che, Dazhao Cheng	2025-12-19	Download	Multimodal large language models (MLLMs) extend LLMs with visual understanding through a three-stage pipeline: multimodal preprocessing, vision encoding, and LLM inference.
iOS as Acceleration	Alexander K. Chen	2025-12-19	Download	Practical utilization of large-scale machine learning requires a powerful compute setup, a necessity which poses a significant barrier to engagement with such artificial intelligence in more restricte...
The HEAL Data Platform	Brienna M. Larrick, L. Philip Schumm, Mingfei Shao, Craig Barnes, Anthony Juehne, Hara Prasad Juvvla, Michael B. Kranz, Michael Lukowski, Clint Malson, Jessica N. Mazerik, Christopher G. Meyer, Jawad Qureshi, Erin Spaniol, Andrea Tentner, Alexander VanTol, Peter Vassilatos, Sara Volk de Garcia, Robert L. Grossman	2025-12-19	Download	Objective: The objective was to develop a cloud-based, federated system to serve as a single point of search, discovery and analysis for data generated under the NIH Helping to End Addiction Long-term...
Democratizing Scalable Cloud Applications: Transactional Stateful Functions on Streaming Dataflows	Kyriakos Psarakis	2025-12-19	Download	Web applications underpin much of modern digital life, yet building scalable and consistent cloud applications remains difficult, requiring expertise across cloud computing, distributed systems, datab...
Adaptive Graph Pruning with Sudden-Events Evaluation for Traffic Prediction using Online Semi-Decentralized ST-GNNs	Ivan Kralj, Lodovico Giaretta, Gordan Ježić, Ivana Podnar Žarko, Šarūnas Girdzijauskas	2025-12-19	Download	Spatio-Temporal Graph Neural Networks (ST-GNNs) are well-suited for processing high-frequency data streams from geographically distributed sensors in smart mobility systems.
Scalable Distributed Vector Search via Accuracy Preserving Index Construction	Yuming Xu, Qianxi Zhang, Qi Chen, Baotong Lu, Menghao Li, Philip Adams, Mingqin Li, Zengzhong Li, Jing Liu, Cheng Li, Fan Yang	2025-12-19	Download	Scaling Approximate Nearest Neighbor Search (ANNS) to billions of vectors requires distributed indexes that balance accuracy, latency, and throughput.
Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning	Baolei Zhang, Minghong Fang, Zhuqing Liu, Biao Yi, Peizhao Zhou, Yuan Wang, Tong Li, Zheli Liu	2025-12-19	Download	Federated Learning (FL) allows multiple clients to collaboratively train a model without sharing their private data. However, FL is vulnerable to Byzantine attacks, where adversaries manipulate client...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting	Yifei Cheng, Yujia Zhu, Baiyang Li, Xinhao Deng, Yitong Cai, Yaochen Ren, Qingyun Liu	2025-12-19	Download	Modern HTTPS mechanisms such as Encrypted Client Hello (ECH) and encrypted DNS improve privacy but remain vulnerable to website fingerprinting (WF) attacks, where adversaries infer visited sites from ...
Binding Agent ID: Unleashing the Power of AI Agents with accountability and credibility	Zibin Lin, Shengli Zhang, Guofu Liao, Dacheng Tao, Taotao Wang	2025-12-19	Download	Autonomous AI agents lack traceable accountability mechanisms, creating a fundamental dilemma where systems must either operate as "downgraded tools" or risk real-world abuse.
A decomposition approach for large virtual network embedding	Amal Benhamiche, Pierre Fouilhoux, Lucas Létocart, Nancy Perrot, Alexis Schneider	2025-12-19	Download	Virtual Network Embedding (VNE) is the core combinatorial problem of Network Slicing, a 5G technology which enables telecommunication operators to propose diverse service-dedicated virtual networks, e...
Timely Information Updating for Mobile Devices Without and With ML Advice	Yu-Pin Hsu, Yi-Hsuan Tseng	2025-12-19	Download	This paper investigates an information update system in which a mobile device monitors a physical process and sends status updates to an access point (AP).
Quantum-enhanced Information Retrieval from Reflective Intelligent Surfaces	Shiqian Guo, Tingxiang Ji, Jianqing Liu	2025-12-19	Download	Information retrieval from passive backscatter systems is widely used in digital applications with tight energy budgets, short communication distances, and low data rates.
Enhancing AIGC Service Efficiency with Adaptive Multi-Edge Collaboration in A Distributed System	Changfu Xu, Jianxiong Guo, Jiandian Zeng, Houming Qiu, Tian Wang, Xiaowen Chu, Jiannong Cao	2025-12-19	Download	The Artificial Intelligence Generated Content (AIGC) technique has gained significant traction for producing diverse content. However, existing AIGC services typically operate within a centralized fra...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
On General Linearly Implicit Quantized State System Methods	Mariana Bergonzi, Joaquín Fernández, Ernesto Kofman	2025-12-19	Download	This work proposes a methodology to develop new numerical integration algorithms for ordinary differential equations based on state quantization, generalizing the notions of Linearly Implicit Quantize...
GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping	Yishu Yin, Xuehai Qian	2025-12-19	Download	SSD-offloaded training offers a practical and promising approach to making LLM training cost-effective. Building on gradient accumulation with micro-batches, this paper introduces GreedySnake, a new S...