2025-01-09

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Analog Bayesian neural networks are insensitive to the shape of the weight distribution	Ravi G. Patel, T. Patrick Xiao, Sapan Agarwal, Christopher Bennett	2025-01-09	Download	Recent work has demonstrated that Bayesian neural networks (BNN's) trained with mean field variational inference (MFVI) can be implemented in analog hardware, promising orders of magnitude energy savi...
Explore Activation Sparsity in Recurrent LLMs for Energy-Efficient Neuromorphic Computing	Ivan Knunyants, Maryam Tavakol, Manolis Sifalakis, Yingfu Xu, Amirreza Yousefzadeh, Guangzhi Tang	2025-01-09	Download	The recent rise of Large Language Models (LLMs) has revolutionized the deep learning field. However, the desire to deploy LLMs on edge devices introduces energy efficiency and latency challenges.
Towards High-Performance Network Coding: FPGA Acceleration With Bounded-value Generators	Jiaxin Qing, Philip H. W. Leong, Kin Hong Lee, Raymond W. Yeung	2025-01-09	Download	Network coding enhances performance in network communications and distributed storage by increasing throughput and robustness while reducing latency.
HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers	Yiyao Yang, Fu Teng, Pengju Liu, Mengnan Qi, Chenyang Lv, Ji Li, Xuhong Zhang, Zhezhi He	2025-01-09	Download	Recently, the use of large language models (LLMs) for Verilog code generation has attracted great research interest to enable hardware design automation.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra	Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, Giulia Guidi	2025-01-09	Download	K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback of K-means is its inability to identify non-linearly separable cluste...
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters	Ziyue Luo, Jia Liu, Myungjin Lee, Ness B. Shroff	2025-01-09	Download	The recent explosive growth of deep learning (DL) models has necessitated a compelling need for efficient job scheduling for distributed deep learning training with mixed parallelisms (DDLwMP) in GPU ...
On Fair Ordering and Differential Privacy	Shir Cohen, Neel Basu, Soumya Basu, Lorenzo Alvisi	2025-01-09	Download	In blockchain systems, fair transaction ordering is crucial for a trusted and regulation-compliant economic ecosystem. Unlike traditional State Machine Replication (SMR) systems, which focus solely on...
Track reconstruction as a service for collider physics	Haoran Zhao, Yuan-Tang Chou, Yao Yao, Xiangyang Ju, Yongbin Feng, William Patrick McCormack, Miles Cochran-Branson, Jan-Frederik Schulte, Miaoyuan Liu, Javier Duarte, Philip Harris, Shih-Chieh Hsu, Kevin Pedro, Nhan Tran	2025-01-09	Download	Optimizing charged-particle track reconstruction algorithms is crucial for efficient event reconstruction in Large Hadron Collider (LHC) experiments due to their significant computational demands.
Decentralized Diffusion Models	David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa	2025-01-09	Download	Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic cl...
Tempo: Compiled Dynamic Deep Learning with Symbolic Dependence Graphs	Pedro F. Silvestre, Peter Pietzuch	2025-01-09	Download	Deep learning (DL) algorithms are often defined in terms of temporal relationships: a tensor at one timestep may depend on tensors from earlier or later timesteps.
Byzantine Fault Tolerant Protocols with Near-Constant Work per Node without Signatures	Philipp Schneider	2025-01-09	Download	Numerous distributed tasks have to be handled in a setting where a fraction of nodes behaves Byzantine, that is, deviates arbitrarily from the intended protocol.
Validation of GPU Computation in Decentralized, Trustless Networks	Eric Boniardi, Stanley Bishop, Alison Haire	2025-01-09	Download	Verifying computational processes in decentralized networks poses a fundamental challenge, particularly for Graphics Processing Unit (GPU) computations.
Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing	Mengfan Liu, Wei Wang, Chuan Wu	2025-01-09	Download	With the advancement of serverless computing, running machine learning (ML) inference services over a serverless platform has been advocated, given its labor-free scalability and cost effectiveness.
Distributed Graph Algorithms with Predictions	Joan Boyar, Faith Ellen, Kim S. Larsen	2025-01-09	Download	We initiate the study of deterministic distributed graph algorithms with predictions in synchronous message passing systems. The process at each node in the graph is given a prediction, which is some ...
A Scalable System for Visual Analysis of Ocean Data	Toshit Jain, Upkar Singh, Varun Singh, Vijay Kumar Boda, Ingrid Hotz, Sathish S. Vadhiyar, P. N. Vinayachandran, Vijay Natarajan	2025-01-09	Download	Oceanographers rely on visual analysis to interpret model simulations, identify events and phenomena, and track dynamic ocean processes. The ever increasing resolution and complexity of ocean data due...
Topology-aware Microservice Architecture in Edge Networks: Deployment Optimization and Implementation	Yuang Chen, Chang Wu, Fangyu Zhang, Chengdi Lu, Yongsheng Huang, Hancheng Lu	2025-01-09	Download	As a ubiquitous deployment paradigm, integrating microservice architecture (MSA) into edge networks promises to enhance the flexibility and scalability of services.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Optimal Scheduling in a Quantum Switch	Sanidhay Bhambay, Thirupathaiah Vasantam, Neil Walton	2025-01-09	Download	With a growing number of quantum networks in operation, there is a pressing need for performance analysis of quantum switching technologies. A quantum switch establishes, distributes, and maintains en...
Distributed Learning and Inference Systems: A Networking Perspective	Hesham G. Moussa, Arashmid Akhavain, S. Maryam Hosseini, Bill McCormick	2025-01-09	Download	Machine learning models have achieved, and in some cases surpassed, human-level performance in various tasks, mainly through centralized training of static models and the use of large models stored in...
QMDB: Quick Merkle Database	Isaac Zhang, Ryan Zarick, Daniel Wong, Thomas Kim, Bryan Pellegrino, Mignon Li, Kelvin Wong	2025-01-09	Download	Quick Merkle Database (QMDB) addresses longstanding bottlenecks in blockchain state management by integrating key-value (KV) and Merkle tree storage into a single unified architecture.
Topology-aware Microservice Architecture in Edge Networks: Deployment Optimization and Implementation	Yuang Chen, Chang Wu, Fangyu Zhang, Chengdi Lu, Yongsheng Huang, Hancheng Lu	2025-01-09	Download	As a ubiquitous deployment paradigm, integrating microservice architecture (MSA) into edge networks promises to enhance the flexibility and scalability of services.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
ByteFS: System Support for (CXL-based) Memory-Semantic Solid-State Drives	Shaobo Li, Yirui Eric Zhou, Hao Ren, Jian Huang	2025-01-09	Download	Unlike non-volatile memory that resides on the processor memory bus, memory-semantic solid-state drives (SSDs) support both byte and block access granularity via PCIe or CXL interconnects.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Optimal Scheduling in a Quantum Switch	Sanidhay Bhambay, Thirupathaiah Vasantam, Neil Walton	2025-01-09	Download	With a growing number of quantum networks in operation, there is a pressing need for performance analysis of quantum switching technologies. A quantum switch establishes, distributes, and maintains en...