
2025-12-19

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs | Rupanshu Soi, Rohan Yadav, Fredrik Kjolstad, Alex Aiken, Maryam Mehri Dehnavi, Michael Garland, Michael Bauer | 2025-12-19 | Download | GPU architectures have continued to grow in complexity, with recent incarnations introducing increasingly powerful fixed-function units for matrix multiplication and data movement to accompany highly ... |
| PermuteV: A Performant Side-channel-Resistant RISC-V Core Securing Edge AI Inference | Nuntipat Narkthong, Xiaolin Xu | 2025-12-19 | Download | Edge AI inference is becoming prevalent thanks to the emergence of small yet high-performance microprocessors. This shift from cloud to edge processing brings several benefits in terms of energy savin... |
| A 14ns-Latency 9Gb/s 0.44mm² 62pJ/b Short-Blocklength LDPC Decoder ASIC in 22FDX | Darja Nonaca, Jérémy Guichemerre, Reinhard Wiesmayr, Nihat Engin Tunali, Christoph Studer | 2025-12-19 | Download | Ultra-reliable low latency communication (URLLC) is a key part of 5G wireless systems. Achieving low latency necessitates codes with short blocklengths for which polar codes with successive cancellati... |
| LLM-based Behaviour Driven Development for Hardware Design | Rolf Drechsler, Qian Liu | 2025-12-19 | Download | Test and verification are essential activities in hardware and system design, but their complexity grows significantly with increasing system sizes. |
| Torrent: A Distributed DMA for Efficient and Flexible Point-to-Multipoint Data Movement | Yunhao Deng, Fanchen Kong, Xiaoling Yi, Ryan Antonio, Marian Verhelst | 2025-12-19 | Download | The growing disparity between computational power and on-chip communication bandwidth is a critical bottleneck in modern Systems-on-Chip (SoCs), especially for data-parallel workloads like AI. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Constrained Cuts, Flows, and Lattice-Linearity | Robert Streit, Vijay K. Garg | 2025-12-19 | Download | In a capacitated directed graph, it is known that the set of all min-cuts forms a distributive lattice [1], [2]. Here, we describe this lattice as a regular predicate whose forbidden elements can be a... |
| ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training | Yi Yang, Ziyu Lin, Liesheng Wei | 2025-12-19 | Download | Large-scale deep learning models impose substantial communication overhead in distributed training, particularly in bandwidth-constrained or heterogeneous cloud-edge environments. |
| Asymptotic behaviour of galactic small-scale dynamos at modest magnetic Prandtl number | Frederick A. Gent, Mordecai-Mark Mac Low, Maarit J. Korpi-Lagg, Touko Puro, Matthias Reinhardt | 2025-12-19 | Download | Magnetic fields are critical at many scales to galactic dynamics and structure, including multiphase pressure balance, dust processing, and star formation. |
| Torrent: A Distributed DMA for Efficient and Flexible Point-to-Multipoint Data Movement | Yunhao Deng, Fanchen Kong, Xiaoling Yi, Ryan Antonio, Marian Verhelst | 2025-12-19 | Download | The growing disparity between computational power and on-chip communication bandwidth is a critical bottleneck in modern Systems-on-Chip (SoCs), especially for data-parallel workloads like AI. |
| Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing | Lingxiao Zhao, Haoran Zhou, Yuezhi Che, Dazhao Cheng | 2025-12-19 | Download | Multimodal large language models (MLLMs) extend LLMs with visual understanding through a three-stage pipeline: multimodal preprocessing, vision encoding, and LLM inference. |
| iOS as Acceleration | Alexander K. Chen | 2025-12-19 | Download | Practical utilization of large-scale machine learning requires a powerful compute setup, a necessity which poses a significant barrier to engagement with such artificial intelligence in more restricte... |
| The HEAL Data Platform | Brienna M. Larrick, L. Philip Schumm, Mingfei Shao, Craig Barnes, Anthony Juehne, Hara Prasad Juvvla, Michael B. Kranz, Michael Lukowski, Clint Malson, Jessica N. Mazerik, Christopher G. Meyer, Jawad Qureshi, Erin Spaniol, Andrea Tentner, Alexander VanTol, Peter Vassilatos, Sara Volk de Garcia, Robert L. Grossman | 2025-12-19 | Download | Objective: The objective was to develop a cloud-based, federated system to serve as a single point of search, discovery and analysis for data generated under the NIH Helping to End Addiction Long-term... |
| Democratizing Scalable Cloud Applications: Transactional Stateful Functions on Streaming Dataflows | Kyriakos Psarakis | 2025-12-19 | Download | Web applications underpin much of modern digital life, yet building scalable and consistent cloud applications remains difficult, requiring expertise across cloud computing, distributed systems, datab... |
| Adaptive Graph Pruning with Sudden-Events Evaluation for Traffic Prediction using Online Semi-Decentralized ST-GNNs | Ivan Kralj, Lodovico Giaretta, Gordan Ježić, Ivana Podnar Žarko, Šarūnas Girdzijauskas | 2025-12-19 | Download | Spatio-Temporal Graph Neural Networks (ST-GNNs) are well-suited for processing high-frequency data streams from geographically distributed sensors in smart mobility systems. |
| Scalable Distributed Vector Search via Accuracy Preserving Index Construction | Yuming Xu, Qianxi Zhang, Qi Chen, Baotong Lu, Menghao Li, Philip Adams, Mingqin Li, Zengzhong Li, Jing Liu, Cheng Li, Fan Yang | 2025-12-19 | Download | Scaling Approximate Nearest Neighbor Search (ANNS) to billions of vectors requires distributed indexes that balance accuracy, latency, and throughput. |
| Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning | Baolei Zhang, Minghong Fang, Zhuqing Liu, Biao Yi, Peizhao Zhou, Yuan Wang, Tong Li, Zheli Liu | 2025-12-19 | Download | Federated Learning (FL) allows multiple clients to collaboratively train a model without sharing their private data. However, FL is vulnerable to Byzantine attacks, where adversaries manipulate client... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting | Yifei Cheng, Yujia Zhu, Baiyang Li, Xinhao Deng, Yitong Cai, Yaochen Ren, Qingyun Liu | 2025-12-19 | Download | Modern HTTPS mechanisms such as Encrypted Client Hello (ECH) and encrypted DNS improve privacy but remain vulnerable to website fingerprinting (WF) attacks, where adversaries infer visited sites from ... |
| Binding Agent ID: Unleashing the Power of AI Agents with accountability and credibility | Zibin Lin, Shengli Zhang, Guofu Liao, Dacheng Tao, Taotao Wang | 2025-12-19 | Download | Autonomous AI agents lack traceable accountability mechanisms, creating a fundamental dilemma where systems must either operate as "downgraded tools" or risk real-world abuse. |
| A decomposition approach for large virtual network embedding | Amal Benhamiche, Pierre Fouilhoux, Lucas Létocart, Nancy Perrot, Alexis Schneider | 2025-12-19 | Download | Virtual Network Embedding (VNE) is the core combinatorial problem of Network Slicing, a 5G technology which enables telecommunication operators to propose diverse service-dedicated virtual networks, e... |
| Timely Information Updating for Mobile Devices Without and With ML Advice | Yu-Pin Hsu, Yi-Hsuan Tseng | 2025-12-19 | Download | This paper investigates an information update system in which a mobile device monitors a physical process and sends status updates to an access point (AP). |
| Quantum-enhanced Information Retrieval from Reflective Intelligent Surfaces | Shiqian Guo, Tingxiang Ji, Jianqing Liu | 2025-12-19 | Download | Information retrieval from passive backscatter systems is widely used in digital applications with tight energy budgets, short communication distances, and low data rates. |
| Enhancing AIGC Service Efficiency with Adaptive Multi-Edge Collaboration in A Distributed System | Changfu Xu, Jianxiong Guo, Jiandian Zeng, Houming Qiu, Tian Wang, Xiaowen Chu, Jiannong Cao | 2025-12-19 | Download | The Artificial Intelligence Generated Content (AIGC) technique has gained significant traction for producing diverse content. However, existing AIGC services typically operate within a centralized fra... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| On General Linearly Implicit Quantized State System Methods | Mariana Bergonzi, Joaquín Fernández, Ernesto Kofman | 2025-12-19 | Download | This work proposes a methodology to develop new numerical integration algorithms for ordinary differential equations based on state quantization, generalizing the notions of Linearly Implicit Quantize... |
| GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping | Yishu Yin, Xuehai Qian | 2025-12-19 | Download | SSD-offloaded training offers a practical and promising approach to making LLM training cost-effective. Building on gradient accumulation with micro-batches, this paper introduces GreedySnake, a new S... |
