2025-05-24

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Efficient Multi-Head Attention on Tile-Based Many-PE Accelerators	Chi Zhang, Luca Colagrande, Renzo Andri, Thomas Benz, Gamze Islamoglu, Alessandro Nadalini, Francesco Conti, Yawei Li, Luca Benini	2025-05-24	Download	Multi-Head Attention (MHA) is a critical computational kernel in transformer-based AI models. Emerging scalable tile-based accelerator architectures integrate increasing numbers of tightly-packed proc...
CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance	Dongsuk Oh, Miryeong Kwon, Jiseon Kim, Eunjee Na, Junseok Moon, Hyunkyu Choi, Seonghyeon Jang, Hanjin Choi, Hongjoo Jung, Sangwon Lee, Myoungsoo Jung	2025-05-24	Download	Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level ...
Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators	Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao	2025-05-24	Download	Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming...
Garibaldi: A Pairwise Instruction-Data Management for Enhancing Shared Last-Level Cache Performance in Server Workloads	Jaewon Kwon, Yongju Lee, Jiwan Kim, Enhyeok Jang, Hongju Kal, Won Woo Ro	2025-05-24	Download	Modern CPUs suffer from the frontend bottleneck because the instruction footprint of server workloads exceeds the private cache capacity. Prior works have examined the CPU components or private cache ...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Distributed Incremental SAT Solving with Mallob: Report and Case Study with Hierarchical Planning	Dominik Schreiber	2025-05-24	Download	This report describes an extension of the distributed job scheduling and SAT solving platform Mallob by incremental SAT solving, embedded in a case study on SAT-based hierarchical planning.
EvoSort: A Genetic-Algorithm-Based Adaptive Parallel Sorting Framework for Large-Scale High Performance Computing	Shashank Raj, Kalyanmoy Deb	2025-05-24	Download	We present EvoSort, a general-purpose adaptive parallel sorting framework accessible at the Python level. EvoSort employs a Genetic Algorithm (GA) to automatically discover and refine critica...
TEE is not a Healer: Rollback-Resistant Reliable Storage (Extended Version)	Sadegh Keshavarzi, Gregory Chockler, Alexey Gotsman	2025-05-24	Download	Recent advances in secure hardware technologies, such as Intel SGX or ARM TrustZone, offer an opportunity to substantially reduce the costs of Byzantine fault-tolerance by placing the program code and...
PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning	Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher	2025-05-24	Download	Large-scale deep neural networks (DNN) exhibit excellent performance for various tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters.
MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing	Zhaoyuan Su, Zeyu Zhang, Tingfeng Lan, Zirui Wang, Haiying Shen, Juncheng Yang, Yue Cheng	2025-05-24	Download	Efficiently serving large language models (LLMs) under dynamic and bursty workloads remains a key challenge for real-world deployment. Existing serving frameworks and static model compression techniqu...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
The Dual Horizon: A Rendezvous of Computing and Communication Services at the Optical Layer in Optical Computing-Communication Integrated Network	Dao Thanh Hai, Isaac Woungang	2025-05-24	Download	With the significant advancements in optical computing platforms recently capable of performing various primitive operations, a seamless integration of optical computing into the very fabric of optical co...
A DSP-Free Carrier Phase Recovery System using 16-Offset-QAM Laser Forwarded Links for 400Gb/s and Beyond	Marziyeh Rezaei, Dan Sturm, Pengyu Zeng, Sajjad Moazeni	2025-05-24	Download	Optical interconnects are becoming a major bottleneck in scaling up future GPU racks and network switches within data centers. Although 200 Gb/s optical transceivers using PAM-4 modulation have been d...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS	Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang	2025-05-24	Download	We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powe...