2025-03-28

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Learning Library Cell Representations in Vector Space	Rongjian Liang, Yi-Chen Lu, Wen-Hao Liu, Haoxing Ren	2025-03-28	Download	We propose Lib2Vec, a novel self-supervised framework to efficiently learn meaningful vector representations of library cells, enabling ML models to capture essential cell semantics.
Benchmarking Ultra-Low-Power μNPUs	Josh Millar, Yushan Huang, Sarab Sethi, Hamed Haddadi, Anil Madhavapeddy	2025-03-28	Download	Efficient on-device neural network (NN) inference offers predictable latency, improved privacy and reliability, and lower operating costs for vendors than cloud-based inference.
NLS: Natural-Level Synthesis for Hardware Implementation Through GenAI	Kaiyuan Yang, Huang Ouyang, Xinyi Wang, Bingjie Lu, Yanbo Wang, Charith Abhayaratne, Sizhao Li, Long Jin, Tiantai Deng	2025-03-28	Download	This paper introduces Natural-Level Synthesis, an innovative approach for generating hardware using generative artificial intelligence on both the system level and component-level.
A Survey of Circuit Foundation Model: Foundation AI Models for VLSI Circuit Design and EDA	Wenji Fang, Jing Wang, Yao Lu, Shang Liu, Yuchao Wu, Yuzhe Ma, Zhiyao Xie	2025-03-28	Download	Artificial intelligence (AI)-driven electronic design automation (EDA) techniques have been extensively explored for VLSI circuit design applications.
CIMR-V: An End-to-End SRAM-based CIM Accelerator with RISC-V for AI Edge Device	Yan-Cheng Guo, Tian-Sheuan Chang, Chih-Sheng Lin, Bo-Cheng Chiou, Chih-Ming Lai, Shyh-Shyuan Sheu, Wei-Chung Lo, Shih-Chieh Chang	2025-03-28	Download	Computing-in-memory (CIM) is renowned in deep learning due to its high energy efficiency resulting from highly parallel computing with minimal data movement.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
A Pilot Study on Tunable Precision Emulation via Automatic BLAS Offloading	Hang Liu, Junjie Li, Yinzhi Wang	2025-03-28	Download	This study explores the use of automatic BLAS offloading and INT8-based emulation for accelerating traditional HPC workloads on modern GPU architectures.
A Performance Analysis of Task Scheduling for UQ Workflows on HPC Systems	Chung Ming Loi, Anne Reinarz, Mikkel Lykkegaard, William Hornsby, James Buchanan, Linus Seelinger	2025-03-28	Download	Uncertainty Quantification (UQ) workloads are becoming increasingly common in science and engineering. They involve the submission of thousands or even millions of similar tasks with potentially unpre...
Hiding Latencies in Network-Based Image Loading for Deep Learning	Francesco Versaci, Giovanni Busonera	2025-03-28	Download	In the last decades, the computational power of GPUs has grown exponentially, allowing current deep learning (DL) applications to handle increasingly large amounts of data at a progressively higher th...
Niyama : Breaking the Silos of LLM Inference Serving	Kanishk Goel, Jayashree Mohan, Nipun Kwatra, Ravi Shreyas Anupindi, Ramachandran Ramjee	2025-03-28	Download	The widespread adoption of Large Language Models (LLMs) has enabled diverse applications with very different latency requirements. Existing LLM serving frameworks rely on siloed infrastructure with co...
On the Solvability of Byzantine-tolerant Reliable Communication in Dynamic Networks	Silvia Bonomi, Giovanni Farina, Sébastien Tixeuil	2025-03-28	Download	A reliable communication primitive guarantees the delivery, integrity, and authorship of messages exchanged between correct processes of a distributed system.
Memory-aware Adaptive Scheduling of Scientific Workflows on Heterogeneous Architectures	Svetlana Kulagina, Anne Benoit, Henning Meyerhenke	2025-03-28	Download	The analysis of massive scientific data often happens in the form of workflows with interdependent tasks. When such a scientific workflow needs to be scheduled on a parallel or distributed system, one...
SimDC: A High-Fidelity Device Simulation Platform for Device-Cloud Collaborative Computing	Ruiguang Pei, Junjie Wu, Dan Peng, Min Fang, Jianan Zhang, Zhihui Fu, Jun Wang	2025-03-28	Download	The advent of edge intelligence and escalating concerns for data privacy protection have sparked a surge of interest in device-cloud collaborative computing.
CAT: A GPU-Accelerated FHE Framework with Its Application to High-Precision Private Dataset Query	Qirui Li, Rui Zong	2025-03-28	Download	We introduce an open-source GPU-accelerated fully homomorphic encryption (FHE) framework CAT, which surpasses existing solutions in functionality and efficiency.
Route-and-Aggregate Decentralized Federated Learning Under Communication Errors	Weicai Li, Tiejun Lv, Wei Ni, Jingbo Zhao, Ekram Hossain, H. Vincent Poor	2025-03-28	Download	Decentralized federated learning (D-FL) allows clients to aggregate learning models locally, offering flexibility and scalability. Existing D-FL methods use gossip protocols, which are inefficient whe...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
NetSSM: Multi-Flow and State-Aware Network Trace Generation using State Space Models	Andrew Chu, Xi Jiang, Shinan Liu, Arjun Bhagoji, Francesco Bronzino, Paul Schmitt, Nick Feamster	2025-03-28	Download	Access to raw network traffic data is essential for many computer networking tasks, from traffic modeling to performance evaluation. Unfortunately, this data is scarce due to high collection costs and...
Training Large Language Models for Advanced Typosquatting Detection	Jackson Welch	2025-03-28	Download	Typosquatting is a long-standing cyber threat that exploits human error in typing URLs to deceive users, distribute malware, and conduct phishing attacks.
Privacy-Preserving Secure Neighbor Discovery for Wireless Networks	Ahmed Mohamed Hussain, Panos Papadimitratos	2025-03-28	Download	Traditional Neighbor Discovery (ND) and Secure Neighbor Discovery (SND) are key elements for network functionality. SND is a hard problem, satisfying not only typical security properties (authenticati...
Route-and-Aggregate Decentralized Federated Learning Under Communication Errors	Weicai Li, Tiejun Lv, Wei Ni, Jingbo Zhao, Ekram Hossain, H. Vincent Poor	2025-03-28	Download	Decentralized federated learning (D-FL) allows clients to aggregate learning models locally, offering flexibility and scalability. Existing D-FL methods use gossip protocols, which are inefficient whe...
QoS-Aware Service Restoration in 5G Optical Transport Networks	Zahra Sharifi Soltani, Arash Rezaee, Orlando Arias, Vinod M Vokkarane	2025-03-28	Download	The rapid growth of high-bandwidth applications in fifth-generation (5G) networks and beyond has driven a substantial increase in traffic within transport optical networks.
Saving Storage Space Using Files on the Web	Kevin Saric, Gowri Sankar Ramachandran, Raja Jurdak, Surya Nepal	2025-03-28	Download	As conventional storage density reaches its physical limits, the cost of a gigabyte of storage is no longer plummeting, but rather has remained mostly flat for the past decade.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Saving Storage Space Using Files on the Web	Kevin Saric, Gowri Sankar Ramachandran, Raja Jurdak, Surya Nepal	2025-03-28	Download	As conventional storage density reaches its physical limits, the cost of a gigabyte of storage is no longer plummeting, but rather has remained mostly flat for the past decade.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models	Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, Mohamed S. Abdelfattah, Diana Marculescu	2025-03-28	Download	State Space Models (SSMs) are emerging as a compelling alternative to Transformers because of their consistent memory usage and high performance.
A Pilot Study on Tunable Precision Emulation via Automatic BLAS Offloading	Hang Liu, Junjie Li, Yinzhi Wang	2025-03-28	Download	This study explores the use of automatic BLAS offloading and INT8-based emulation for accelerating traditional HPC workloads on modern GPU architectures.
Service-the-Longest-Queue Among d Choices Policy for Quantum Entanglement Switching	Guo Xian Yau, Thirupathaiah Vasantam, Gayane Vardoyan	2025-03-28	Download	An Entanglement Generation Switch (EGS) is a quantum network hub that provides entangled states to a set of connected nodes by enabling them to share a limited number of hub resources.