2025-05-24
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| FlatAttention: Dataflow and Fabric Collectives Co-Optimization for Efficient Multi-Head Attention on Tile-Based Many-PE Accelerators | Chi Zhang, Luca Colagrande, Renzo Andri, Thomas Benz, Gamze Islamoglu, Alessandro Nadalini, Francesco Conti, Yawei Li, Luca Benini | 2025-05-24 | Download | Multi-Head Attention (MHA) is a critical computational kernel in transformer-based AI models. Emerging scalable tile-based accelerator architectures integrate increasing numbers of tightly-packed proc... |
| CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance | Dongsuk Oh, Miryeong Kwon, Jiseon Kim, Eunjee Na, Junseok Moon, Hyunkyu Choi, Seonghyeon Jang, Hanjin Choi, Hongjoo Jung, Sangwon Lee, Myoungsoo Jung | 2025-05-24 | Download | Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level ... |
| Autocomp: A Powerful and Portable Code Optimizer for Tensor Accelerators | Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao | 2025-05-24 | Download | Hardware accelerators, especially those designed for tensor processing, have become ubiquitous in today's computing landscape. However, even with significant efforts in building compilers, programming... |
| Garibaldi: A Pairwise Instruction-Data Management for Enhancing Shared Last-Level Cache Performance in Server Workloads | Jaewon Kwon, Yongju Lee, Jiwan Kim, Enhyeok Jang, Hongju Kal, Won Woo Ro | 2025-05-24 | Download | Modern CPUs suffer from the frontend bottleneck because the instruction footprint of server workloads exceeds the private cache capacity. Prior works have examined the CPU components or private cache ... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Distributed Incremental SAT Solving with Mallob: Report and Case Study with Hierarchical Planning | Dominik Schreiber | 2025-05-24 | Download | This report describes an extension of the distributed job scheduling and SAT solving platform Mallob by incremental SAT solving, embedded in a case study on SAT-based hierarchical planning. |
| EvoSort: A Genetic-Algorithm-Based Adaptive Parallel Sorting Framework for Large-Scale High Performance Computing | Shashank Raj, Kalyanmoy Deb | 2025-05-24 | Download | We present EvoSort, a general-purpose adaptive parallel sorting framework accessible at the Python level. EvoSort employs a Genetic Algorithm (GA) to automatically discover and refine critica... |
| TEE is not a Healer: Rollback-Resistant Reliable Storage (Extended Version) | Sadegh Keshavarzi, Gregory Chockler, Alexey Gotsman | 2025-05-24 | Download | Recent advances in secure hardware technologies, such as Intel SGX or ARM TrustZone, offer an opportunity to substantially reduce the costs of Byzantine fault-tolerance by placing the program code and... |
| PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning | Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher | 2025-05-24 | Download | Large-scale deep neural networks (DNN) exhibit excellent performance for various tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters. |
| MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing | Zhaoyuan Su, Zeyu Zhang, Tingfeng Lan, Zirui Wang, Haiying Shen, Juncheng Yang, Yue Cheng | 2025-05-24 | Download | Efficiently serving large language models (LLMs) under dynamic and bursty workloads remains a key challenge for real-world deployment. Existing serving frameworks and static model compression techniqu... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| The Dual Horizon: A Rendezvous of Computing and Communication Services at the Optical Layer in Optical Computing-Communication Integrated Network | Dao Thanh Hai, Isaac Woungang | 2025-05-24 | Download | With the significant advancements in optical computing platforms recently capable of performing various primitive operations, a seamless integration of optical computing into the very fabric of optical co... |
| A DSP-Free Carrier Phase Recovery System using 16-Offset-QAM Laser Forwarded Links for 400Gb/s and Beyond | Marziyeh Rezaei, Dan Sturm, Pengyu Zeng, Sajjad Moazeni | 2025-05-24 | Download | Optical interconnects are becoming a major bottleneck in scaling up future GPU racks and network switches within data centers. Although 200 Gb/s optical transceivers using PAM-4 modulation have been d... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS | Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang | 2025-05-24 | Download | We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powe... |