2025-01-09

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Analog Bayesian neural networks are insensitive to the shape of the weight distribution | Ravi G. Patel, T. Patrick Xiao, Sapan Agarwal, Christopher Bennett | 2025-01-09 | Download | Recent work has demonstrated that Bayesian neural networks (BNN's) trained with mean field variational inference (MFVI) can be implemented in analog hardware, promising orders of magnitude energy savi... |
| Explore Activation Sparsity in Recurrent LLMs for Energy-Efficient Neuromorphic Computing | Ivan Knunyants, Maryam Tavakol, Manolis Sifalakis, Yingfu Xu, Amirreza Yousefzadeh, Guangzhi Tang | 2025-01-09 | Download | The recent rise of Large Language Models (LLMs) has revolutionized the deep learning field. However, the desire to deploy LLMs on edge devices introduces energy efficiency and latency challenges. |
| Towards High-Performance Network Coding: FPGA Acceleration With Bounded-value Generators | Jiaxin Qing, Philip H. W. Leong, Kin Hong Lee, Raymond W. Yeung | 2025-01-09 | Download | Network coding enhances performance in network communications and distributed storage by increasing throughput and robustness while reducing latency. |
| HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers | Yiyao Yang, Fu Teng, Pengju Liu, Mengnan Qi, Chenyang Lv, Ji Li, Xuhong Zhang, Zhezhi He | 2025-01-09 | Download | Recently, the use of large language models (LLMs) for Verilog code generation has attracted great research interest to enable hardware design automation. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra | Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, Giulia Guidi | 2025-01-09 | Download | K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback of K-means is its inability to identify non-linearly separable cluste... |
| Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters | Ziyue Luo, Jia Liu, Myungjin Lee, Ness B. Shroff | 2025-01-09 | Download | The recent explosive growth of deep learning (DL) models has necessitated a compelling need for efficient job scheduling for distributed deep learning training with mixed parallelisms (DDLwMP) in GPU ... |
| On Fair Ordering and Differential Privacy | Shir Cohen, Neel Basu, Soumya Basu, Lorenzo Alvisi | 2025-01-09 | Download | In blockchain systems, fair transaction ordering is crucial for a trusted and regulation-compliant economic ecosystem. Unlike traditional State Machine Replication (SMR) systems, which focus solely on... |
| Track reconstruction as a service for collider physics | Haoran Zhao, Yuan-Tang Chou, Yao Yao, Xiangyang Ju, Yongbin Feng, William Patrick McCormack, Miles Cochran-Branson, Jan-Frederik Schulte, Miaoyuan Liu, Javier Duarte, Philip Harris, Shih-Chieh Hsu, Kevin Pedro, Nhan Tran | 2025-01-09 | Download | Optimizing charged-particle track reconstruction algorithms is crucial for efficient event reconstruction in Large Hadron Collider (LHC) experiments due to their significant computational demands. |
| Decentralized Diffusion Models | David McAllister, Matthew Tancik, Jiaming Song, Angjoo Kanazawa | 2025-01-09 | Download | Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic cl... |
| Tempo: Compiled Dynamic Deep Learning with Symbolic Dependence Graphs | Pedro F. Silvestre, Peter Pietzuch | 2025-01-09 | Download | Deep learning (DL) algorithms are often defined in terms of temporal relationships: a tensor at one timestep may depend on tensors from earlier or later timesteps. |
| Byzantine Fault Tolerant Protocols with Near-Constant Work per Node without Signatures | Philipp Schneider | 2025-01-09 | Download | Numerous distributed tasks have to be handled in a setting where a fraction of nodes behaves Byzantine, that is, deviates arbitrarily from the intended protocol. |
| Validation of GPU Computation in Decentralized, Trustless Networks | Eric Boniardi, Stanley Bishop, Alison Haire | 2025-01-09 | Download | Verifying computational processes in decentralized networks poses a fundamental challenge, particularly for Graphics Processing Unit (GPU) computations. |
| Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | Mengfan Liu, Wei Wang, Chuan Wu | 2025-01-09 | Download | With the advancement of serverless computing, running machine learning (ML) inference services over a serverless platform has been advocated, given its labor-free scalability and cost effectiveness. |
| Distributed Graph Algorithms with Predictions | Joan Boyar, Faith Ellen, Kim S. Larsen | 2025-01-09 | Download | We initiate the study of deterministic distributed graph algorithms with predictions in synchronous message passing systems. The process at each node in the graph is given a prediction, which is some ... |
| A Scalable System for Visual Analysis of Ocean Data | Toshit Jain, Upkar Singh, Varun Singh, Vijay Kumar Boda, Ingrid Hotz, Sathish S. Vadhiyar, P. N. Vinayachandran, Vijay Natarajan | 2025-01-09 | Download | Oceanographers rely on visual analysis to interpret model simulations, identify events and phenomena, and track dynamic ocean processes. The ever increasing resolution and complexity of ocean data due... |
| Topology-aware Microservice Architecture in Edge Networks: Deployment Optimization and Implementation | Yuang Chen, Chang Wu, Fangyu Zhang, Chengdi Lu, Yongsheng Huang, Hancheng Lu | 2025-01-09 | Download | As a ubiquitous deployment paradigm, integrating microservice architecture (MSA) into edge networks promises to enhance the flexibility and scalability of services. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimal Scheduling in a Quantum Switch | Sanidhay Bhambay, Thirupathaiah Vasantam, Neil Walton | 2025-01-09 | Download | With a growing number of quantum networks in operation, there is a pressing need for performance analysis of quantum switching technologies. A quantum switch establishes, distributes, and maintains en... |
| Distributed Learning and Inference Systems: A Networking Perspective | Hesham G. Moussa, Arashmid Akhavain, S. Maryam Hosseini, Bill McCormick | 2025-01-09 | Download | Machine learning models have achieved, and in some cases surpassed, human-level performance in various tasks, mainly through centralized training of static models and the use of large models stored in... |
| QMDB: Quick Merkle Database | Isaac Zhang, Ryan Zarick, Daniel Wong, Thomas Kim, Bryan Pellegrino, Mignon Li, Kelvin Wong | 2025-01-09 | Download | Quick Merkle Database (QMDB) addresses longstanding bottlenecks in blockchain state management by integrating key-value (KV) and Merkle tree storage into a single unified architecture. |
| Topology-aware Microservice Architecture in Edge Networks: Deployment Optimization and Implementation | Yuang Chen, Chang Wu, Fangyu Zhang, Chengdi Lu, Yongsheng Huang, Hancheng Lu | 2025-01-09 | Download | As a ubiquitous deployment paradigm, integrating microservice architecture (MSA) into edge networks promises to enhance the flexibility and scalability of services. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ByteFS: System Support for (CXL-based) Memory-Semantic Solid-State Drives | Shaobo Li, Yirui Eric Zhou, Hao Ren, Jian Huang | 2025-01-09 | Download | Unlike non-volatile memory that resides on the processor memory bus, memory-semantic solid-state drives (SSDs) support both byte and block access granularity via PCIe or CXL interconnects. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Optimal Scheduling in a Quantum Switch | Sanidhay Bhambay, Thirupathaiah Vasantam, Neil Walton | 2025-01-09 | Download | With a growing number of quantum networks in operation, there is a pressing need for performance analysis of quantum switching technologies. A quantum switch establishes, distributes, and maintains en... |
