2025-04-21
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis | Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna | 2025-04-21 | Download | The rapid advancements in AI, scientific computing, and high-performance computing (HPC) have driven the need for versatile and efficient hardware accelerators. |
| Computing with Printed and Flexible Electronics | Mehdi B. Tahoori, Emre Ozer, Georgios Zervakis, Konstantinos Balaskas, Priyanjana Pal | 2025-04-21 | Download | Printed and flexible electronics (PFE) have emerged as the ubiquitous solution for application domains at the extreme edge, where the demands for low manufacturing and operational cost cannot be met b... |
| ForgeBench: A Machine Learning Benchmark Suite and Auto-Generation Framework for Next-Generation HLS Tools | Andy Wanna, Hanqiu Chen, Cong Hao | 2025-04-21 | Download | Although High-Level Synthesis (HLS) has attracted considerable interest in hardware design, it has not yet become mainstream due to two primary challenges. |
| Advancing AI-assisted Hardware Design with Hierarchical Decentralized Training and Personalized Inference-Time Optimization | Hao Mark Chen, Zehuan Zhang, Wanru Zhao, Nicholas Lane, Hongxiang Fan | 2025-04-21 | Download | Recent years have witnessed a significant increase in the adoption of AI techniques to enhance electronic design automation. In particular, the emergence of Large Language Models (LLMs) has sparked si... |
| Considerations on the Design of Transceivers for Ambient Internet of Things | Yuxiao Zhao, Zhen Shen, Shiyu Li, Jing Feng, Hao Min | 2025-04-21 | Download | The Ambient IoT (A-IoT) will introduce trillions of connections and enable low-cost battery-less devices. The A-IoT nodes can achieve low cost ($\sim$ 0. |
| Hardware-based Heterogeneous Memory Management for Large Language Model Inference | Soojin Hwang, Jungwoo Kim, Sanghyeon Lee, Hongbeen Kim, Jaehyuk Huh | 2025-04-21 | Download | A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferen... |
| GainSight: A Unified Framework for Data Lifetime Profiling and Heterogeneous Memory Composition | Peijing Li, Matthew Hung, Yiming Tan, Konstantin Hoßfeld, Jake Cheng Jiajun, Shuhan Liu, Lixian Yan, Xinxin Wang, Philip Levis, H. -S. Philip Wong, Thierry Tambe | 2025-04-21 | Download | As AI workloads drive increasing memory requirements, domain-specific accelerators need higher-density on-chip memory beyond what current SRAM scaling trends can provide. |
| Ultra-Low-Power Spiking Neurons in 7 nm FinFET Technology: A Comparative Analysis of Leaky Integrate-and-Fire, Morris-Lecar, and Axon-Hillock Architectures | Logan Larsh, Raiyan Siddique, Sarah Sharif, Yaser Mike Banad | 2025-04-21 | Download | Neuromorphic computing aims to replicate the brain's remarkable energy efficiency and parallel processing capabilities for large-scale artificial intelligence applications. |
| Splitwiser: Efficient LM inference with constrained resources | Asad Aali, Adney Cardoza, Melissa Capo | 2025-04-21 | Download | Efficient inference of LLMs remains a crucial challenge, with two main phases: a compute-intensive prompt computation and a memory-intensive token generation. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Tracing Cross-chain Transactions between EVM-based Blockchains: An Analysis of Ethereum-Polygon Bridges | Tao Yan, Chuanshan Huang, Claudio J. Tessone | 2025-04-21 | Download | Ethereum's scalability has been a major concern due to its limited transaction throughput and high fees. To address these limitations, Polygon has emerged as a sidechain solution that facilitates asse... |
| FedFetch: Faster Federated Learning with Adaptive Downstream Prefetching | Qifan Yan, Andrew Liu, Shiqi He, Mathias Lécuyer, Ivan Beschastnikh | 2025-04-21 | Download | Federated learning (FL) is a machine learning paradigm that facilitates massively distributed model training with end-user data on edge devices directed by a central server. |
| Advancing AI-assisted Hardware Design with Hierarchical Decentralized Training and Personalized Inference-Time Optimization | Hao Mark Chen, Zehuan Zhang, Wanru Zhao, Nicholas Lane, Hongxiang Fan | 2025-04-21 | Download | Recent years have witnessed a significant increase in the adoption of AI techniques to enhance electronic design automation. In particular, the emergence of Large Language Models (LLMs) has sparked si... |
| To Offload or Not To Offload: Model-driven Comparison of Edge-native and On-device Processing In the Era of Accelerators | Nathan Ng, David Irwin, Ananthram Swami, Don Towsley, Prashant Shenoy | 2025-04-21 | Download | Computational offloading is a promising approach for overcoming resource constraints on client devices by moving some or all of an application's computations to remote servers. |
| Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments? | Xinglei Dou, Lei Liu, Limin Xiao | 2025-04-21 | Download | Making it intelligent is a promising way in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services. |
| SLO-Aware Scheduling for Large Language Model Inferences | Jinqi Huang, Yi Xiong, Xuebing Yu, Wenjie Huang, Entong Li, Li Zeng, Xin Chen | 2025-04-21 | Download | Large language models (LLMs) have revolutionized applications such as code completion, chatbots, and online classification. To elevate user experiences, service level objectives (SLOs) serve as crucia... |
| MoE Parallel Folding: Heterogeneous Parallelism Mappings for Efficient Large-Scale MoE Model Training with Megatron Core | Dennis Liu, Zijie Yan, Xin Yao, Tong Liu, Vijay Korthikanti, Evan Wu, Shiqing Fan, Gao Deng, Hongxiao Bai, Jianbin Chang, Ashwath Aithal, Michael Andersch, Mohammad Shoeybi, Jiajie Yao, Chandler Zhou, David Wu, Xipeng Li, June Yang | 2025-04-21 | Download | Mixture of Experts (MoE) models enhance neural network scalability by dynamically selecting relevant experts per input token, enabling larger model sizes while maintaining manageable computation costs... |
| WindVE: Collaborative CPU-NPU Vector Embedding | Jinqi Huang, Xuebing Yu, Yi Xiong, Wenjie Huang, Entong Li, Li Zeng, Xin Chen | 2025-04-21 | Download | Retrieval-Augmented Generation is a technology that enhances large language models by integrating information retrieval. In the industry, inference services based on LLMs are highly sensitive to cost-... |
| ReCraft: Self-Contained Split, Merge, and Membership Change of Raft Protocol | Kezhi Xiong, Soonwon Moon, Joshua Kang, Bryant Curto, Jieung Kim, Ji-Yong Shin | 2025-04-21 | Download | Designing reconfiguration schemes for consensus protocols is challenging because subtle corner cases during reconfiguration could invalidate the correctness of the protocol. |
| Cultivating Multidisciplinary Research and Education on GPU Infrastructure for Mid-South Institutions at the University of Memphis: Practice and Challenge | Mayira Sharif, Guangzeng Han, Weisi Liu, Xiaolei Huang | 2025-04-21 | Download | To support rapid scientific advancement and promote access to large-scale computing resources for under-resourced institutions at the Mid-South region, the University of Memphis (UofM) established the... |
| Splitwiser: Efficient LM inference with constrained resources | Asad Aali, Adney Cardoza, Melissa Capo | 2025-04-21 | Download | Efficient inference of LLMs remains a crucial challenge, with two main phases: a compute-intensive prompt computation and a memory-intensive token generation. |
| gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling | Tianyu Guo, Xianwei Zhang, Jiangsu Du, Zhiguang Chen, Nong Xiao, Yutong Lu | 2025-04-21 | Download | Pipeline parallelism has emerged as a predominant approach for deploying large language models (LLMs) across distributed nodes, owing to its lower communication overhead compared to tensor parallelism... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Generative Artificial Intelligence for Beamforming in Low-Altitude Economy | Geng Sun, Jia Qi, Chuang Zhang, Xuejie Liu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu, Dong In Kim | 2025-04-21 | Download | The growth of low-altitude economy (LAE) has driven a rising demand for efficient and secure communication. However, conventional beamforming optimization techniques struggle in the complex LAE enviro... |
| Direct Search Algorithm for Clock Skew Compensation Immune to Floating-Point Precision Loss | Kyeong Soo Kim | 2025-04-21 | Download | We have been investigating clock skew compensation immune to floating-point precision loss by taking into account the discrete nature of clocks in digital communication systems; extending Bresenham's ... |
| NetCloak: Dynamic Topology Expansion for Secure and Scalable Configuration Sharing | Qianye Wang, Yuejie Wang, Yongting Chen, Guyue Liu | 2025-04-21 | Download | As modern networks continue to grow in both scale and complexity, sharing real-world device configurations poses significant privacy risks, especially when adversaries can infer organizational size or... |
| IoT-AMLHP: Aligned Multimodal Learning of Header-Payload Representations for Resource-Efficient Malicious IoT Traffic Classification | Fengyuan Nie, Guangjie Liu, Weiwei Liu, Jianan Huang, Bo Gao | 2025-04-21 | Download | Traffic classification is crucial for securing Internet of Things (IoT) networks. Deep learning-based methods can autonomously extract latent patterns from massive network traffic, demonstrating signi... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| LithOS: An Operating System for Efficient Machine Learning on GPUs | Patrick H. Coppock, Brian Zhang, Eliot H. Solomon, Vasilis Kypriotis, Leon Yang, Bikash Sharma, Dan Schatzberg, Todd C. Mowry, Dimitrios Skarlatos | 2025-04-21 | Download | The surging demand for GPUs in datacenters for machine learning (ML) has made efficient GPU utilization crucial. However, meeting the diverse needs of ML models while optimizing resource usage is chal... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| SCALE-Sim v3: A modular cycle-accurate systolic accelerator simulator for end-to-end system analysis | Ritik Raj, Sarbartha Banerjee, Nikhil Chandra, Zishen Wan, Jianming Tong, Ananda Samajdar, Tushar Krishna | 2025-04-21 | Download | The rapid advancements in AI, scientific computing, and high-performance computing (HPC) have driven the need for versatile and efficient hardware accelerators. |
| Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning | Yassir Benhammou, Alessandro Tiberio, Gabriel Trautmann, Suman Kalyan | 2025-04-21 | Download | MILS (Multimodal Iterative LLM Solver) is a recently published framework that claims "LLMs can see and hear without any training" by leveraging an iterative, LLM-CLIP based approach for zero-shot imag... |
| Is Intelligence the Right Direction in New OS Scheduling for Multiple Resources in Cloud Environments? | Xinglei Dou, Lei Liu, Limin Xiao | 2025-04-21 | Download | Making it intelligent is a promising way in System/OS design. This paper proposes OSML+, a new ML-based resource scheduling mechanism for co-located cloud services. |