
2024-07-17

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization | Rui Xie, Asad Ul Haq, Linsen Ma, Krystal Sun, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang | 2024-07-17 | Download | Recent studies have revealed that, during inference on generative AI models such as transformers, the importance of different weights exhibits substantial context-dependent variations. |
| CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference | Mohammad Erfan Sadeghi, Arash Fayyazi, Suhas Somashekar, Armin Abdollahi, Massoud Pedram | 2024-07-17 | Download | Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been w... |
| Highly Efficient Parallel Row-Layered Min-Sum MDPC Decoder for McEliece Cryptosystem | Jiaxuan Cai, Xinmiao Zhang | 2024-07-17 | Download | The medium-density parity-check (MDPC) code-based McEliece cryptosystem remains a finalist of the post-quantum cryptography standard. The Min-sum decoding algorithm achieves better performance-complex... |
| ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks | Salma Afifi, Ishan Thakkar, Sudeep Pasricha | 2024-07-17 | Download | Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains when c... |
| MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs | Junfeng Gong, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li | 2024-07-17 | Download | Mixed-precision neural network (MPNN) that utilizes just enough data width for the neural network processing is an effective approach to meet the stringent resources constraints including memory and c... |
| IICPilot: An Intelligent Integrated Circuit Backend Design Framework Using Open EDA | Zesong Jiang, Qing Zhang, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li | 2024-07-17 | Download | Open-source EDA tools are rapidly advancing, fostering collaboration, innovation, and knowledge sharing within the EDA community. However, the growing complexity of these tools, characterized by numer... |
| Graphitron: A Domain Specific Language for FPGA-based Graph Processing Accelerator Generation | Xinmiao Zhang, Zheng Feng, Shengwen Liang, Xinyu Chen, Cheng Liu, Huawei Li, Xiaowei Li | 2024-07-17 | Download | FPGA-based graph processing accelerators, enabling extensive customization, have demonstrated significant energy efficiency over general computing engines like CPUs and GPUs. |
| SigDLA: A Deep Learning Accelerator Extension for Signal Processing | Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li | 2024-07-17 | Download | Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSP... |
| An Efficient Algorithm for Modulus Operation and Its Hardware Implementation in Prime Number Calculation | W. A. Susantha Wijesinghe | 2024-07-17 | Download | This paper presents a novel algorithm for the modulus operation for FPGA implementation. The proposed algorithm uses only addition, subtraction, logical, and bit shift operations, avoiding the complexi... |
| StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators | Ethan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte, Cheng Wang | 2024-07-17 | Download | Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by t... |
| SENTAUR: Security EnhaNced Trojan Assessment Using LLMs Against Undesirable Revisions | Jitendra Bhandari, Rajat Sadhukhan, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri | 2024-07-17 | Download | A globally distributed IC supply chain brings risks due to untrusted third parties. The risks span inadvertent use of hardware Trojan (HT), inserted Intellectual Property (3P-IP) or Electronic Design ... |
| Chip Placement with Diffusion Models | Vint Lee, Minh Nguyen, Leena Elzeiny, Chun Deng, Pieter Abbeel, John Wawrzynek | 2024-07-17 | Download | Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2D chip. |
| RTL Verification for Secure Speculation Using Contract Shadow Logic | Qinhan Tan, Yuheng Yang, Thomas Bourgeat, Sharad Malik, Mengjia Yan | 2024-07-17 | Download | Modern out-of-order processors face speculative execution attacks. Despite various proposed software and hardware mitigations to prevent such attacks, new attacks keep arising from unknown vulnerabili... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Proof-of-Collaborative-Learning: A Multi-winner Federated Learning Consensus Algorithm | Amirreza Sokhankhosh, Sara Rouhani | 2024-07-17 | Download | Regardless of their variations, blockchains require a consensus mechanism to validate transactions, supervise added blocks, maintain network security, synchronize the network state, and distribute inc... |
| Automated Gateways: A Smart Contract-Powered Solution for Interoperability Across Blockchains | Koosha Esmaeilzadeh Khorasani, Sara Rouhani, Rui Pan, Vahid Pourheidari | 2024-07-17 | Download | Interoperability is a significant challenge in blockchain technology, hindering seamless data and service sharing across diverse blockchain networks. |
| A Framework for testing Federated Learning algorithms using an edge-like environment | Felipe Machado Schwanck, Marcos Tomazzoli Leipnitz, Joel Luís Carbonera, Juliano Araujo Wickboldt | 2024-07-17 | Download | Federated Learning (FL) is a machine learning paradigm in which many clients cooperatively train a single centralized model while keeping their data private and decentralized. |
| FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios | Zekai Chen, Chentao Jia, Ming Hu, Xiaofei Xie, Anran Li, Mingsong Chen | 2024-07-17 | Download | Along with the increasing popularity of Deep Learning (DL) techniques, more and more Artificial Intelligence of Things (AIoT) systems are adopting federated learning (FL) to enable privacy-aware colla... |
| LSKV: A Confidential Distributed Datastore to Protect Critical Data in the Cloud | Andrew Jeffery, Julien Maffre, Heidi Howard, Richard Mortier | 2024-07-17 | Download | Software services are increasingly migrating to the cloud, requiring trust in actors with direct access to the hardware, software and data comprising the service. |
| Continuous reasoning for adaptive container image distribution in the cloud-edge continuum | Damiano Azzolini, Stefano Forti, Antonio Ielo | 2024-07-17 | Download | Cloud-edge computing requires applications to operate across diverse infrastructures, often triggered by cyber-physical events. Containers offer a lightweight deployment option but pulling images from... |
| Computing: Looking Back and Moving Forward | Muhammed Golec, Sukhpal Singh Gill | 2024-07-17 | Download | The Internet and computer commercialization have transformed the computing systems area over the past sixty years, affecting society. Computer systems have evolved to meet diverse social needs thanks ... |
| LLM Inference Serving: Survey of Recent Advances and Opportunities | Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari | 2024-07-17 | Download | This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research since the year 2023. |
| Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters | Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue | 2024-07-17 | Download | Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Framework for testing Federated Learning algorithms using an edge-like environment | Felipe Machado Schwanck, Marcos Tomazzoli Leipnitz, Joel Luís Carbonera, Juliano Araujo Wickboldt | 2024-07-17 | Download | Federated Learning (FL) is a machine learning paradigm in which many clients cooperatively train a single centralized model while keeping their data private and decentralized. |
| A Scheduler for Real-Time Service in Wi-Fi 8 Multi-AP Networks With Parameterized Spatial Reuse | Kirill Chemrov, Dmitry Bankov, Andrey Lyakhov, Evgeny Khorov | 2024-07-17 | Download | Real-time applications (RTAs) require low delays and impose a significant challenge to Wi-Fi. In Wi-Fi, high delays are often caused by waiting for the channel to become idle. |
| Plausibly Deniable Content Discovery for Bitswap Using Random Walks | Manuel Wedler, Erik Daniel, Florian Tschorsch | 2024-07-17 | Download | Bitswap is the data exchange protocol for the content-addressed peer-to-peer overlay network IPFS. During content discovery, Bitswap reveals the interest of a peer in content to all neighbors, enablin... |
| Bayesian Optimization for Fast Radio Mapping and Localization with an Autonomous Aerial Drone | Paul S. Kudyba, Qin Lu, Haijian Sun | 2024-07-17 | Download | This paper explores how a flying drone can autonomously navigate while constructing a narrowband radio map for signal localization. As flying drones become more ubiquitous, their wireless signals will... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Cheddar: A Swift Fully Homomorphic Encryption Library Designed for GPU Architectures | Wonseok Choi, Jongmin Kim, Jung Ho Ahn | 2024-07-17 | Download | Fully homomorphic encryption (FHE) frees cloud computing from privacy concerns by enabling secure computation on encrypted data. However, its substantial computational and memory overhead results in s... |
