
2025-05-24

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Efficient Multi-Head Attention on Tile-Based Many-PE Accelerators | Chi Zhang, Luca Colagrande, Renzo Andri, Thomas Benz, Gamze Islamoglu, Alessandro Nadalini, Francesco Conti, Yawei Li, Luca Benini | 2025-05-24 | Download | Multi-Head Attention (MHA) is a critical computational kernel in transformer-based AI models. Emerging scalable tile-based accelerator architectures integrate increasing numbers of tightly-packed proc... |
| CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance | Dongsuk Oh, Miryeong Kwon, Jiseon Kim, Eunjee Na, Junseok Moon, Hyunkyu Choi, Seonghyeon Jang, Hanjin Choi, Hongjoo Jung, Sangwon Lee, Myoungsoo Jung | 2025-05-24 | Download | Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level ... |
| Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators | Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao | 2025-05-24 | Download | Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming... |
| Garibaldi: A Pairwise Instruction-Data Management for Enhancing Shared Last-Level Cache Performance in Server Workloads | Jaewon Kwon, Yongju Lee, Jiwan Kim, Enhyeok Jang, Hongju Kal, Won Woo Ro | 2025-05-24 | Download | Modern CPUs suffer from the frontend bottleneck because the instruction footprint of server workloads exceeds the private cache capacity. Prior works have examined the CPU components or private cache ... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Distributed Incremental SAT Solving with Mallob: Report and Case Study with Hierarchical Planning | Dominik Schreiber | 2025-05-24 | Download | This report describes an extension of the distributed job scheduling and SAT solving platform Mallob by incremental SAT solving, embedded in a case study on SAT-based hierarchical planning. |
| EvoSort: A Genetic-Algorithm-Based Adaptive Parallel Sorting Framework for Large-Scale High Performance Computing | Shashank Raj, Kalyanmoy Deb | 2025-05-24 | Download | We present EvoSort, a general-purpose adaptive parallel sorting framework accessible at the Python level. EvoSort employs a Genetic Algorithm (GA) to automatically discover and refine critica... |
| TEE is not a Healer: Rollback-Resistant Reliable Storage (Extended Version) | Sadegh Keshavarzi, Gregory Chockler, Alexey Gotsman | 2025-05-24 | Download | Recent advances in secure hardware technologies, such as Intel SGX or ARM TrustZone, offer an opportunity to substantially reduce the costs of Byzantine fault-tolerance by placing the program code and... |
| PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning | Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher | 2025-05-24 | Download | Large-scale deep neural networks (DNN) exhibit excellent performance for various tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters. |
| MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing | Zhaoyuan Su, Zeyu Zhang, Tingfeng Lan, Zirui Wang, Haiying Shen, Juncheng Yang, Yue Cheng | 2025-05-24 | Download | Efficiently serving large language models (LLMs) under dynamic and bursty workloads remains a key challenge for real-world deployment. Existing serving frameworks and static model compression techniqu... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| The Dual Horizon: A Rendezvous of Computing and Communication Services at the Optical Layer in Optical Computing-Communication Integrated Network | Dao Thanh Hai, Isaac Woungang | 2025-05-24 | Download | With the significant advancements in optical computing platforms recently capable of performing various primitive operations, a seamless integration of optical computing into the very fabric of optical co... |
| A DSP-Free Carrier Phase Recovery System using 16-Offset-QAM Laser Forwarded Links for 400Gb/s and Beyond | Marziyeh Rezaei, Dan Sturm, Pengyu Zeng, Sajjad Moazeni | 2025-05-24 | Download | Optical interconnects are becoming a major bottleneck in scaling up future GPU racks and network switches within data centers. Although 200 Gb/s optical transceivers using PAM-4 modulation have been d... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS | Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang | 2025-05-24 | Download | We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powe... |
