2024-07-17

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
SmartQuant: CXL-based AI Model Store in Support of Runtime Configurable Weight Quantization	Rui Xie, Asad Ul Haq, Linsen Ma, Krystal Sun, Sanchari Sen, Swagath Venkataramani, Liu Liu, Tong Zhang	2024-07-17	Download	Recent studies have revealed that, during the inference on generative AI models such as transformer, the importance of different weights exhibits substantial context-dependent variations.
CHOSEN: Compilation to Hardware Optimization Stack for Efficient Vision Transformer Inference	Mohammad Erfan Sadeghi, Arash Fayyazi, Suhas Somashekar, Armin Abdollahi, Massoud Pedram	2024-07-17	Download	Vision Transformers (ViTs) represent a groundbreaking shift in machine learning approaches to computer vision. Unlike traditional approaches, ViTs employ the self-attention mechanism, which has been w...
Highly Efficient Parallel Row-Layered Min-Sum MDPC Decoder for McEliece Cryptosystem	Jiaxuan Cai, Xinmiao Zhang	2024-07-17	Download	The medium-density parity-check (MDPC) code-based McEliece cryptosystem remains a finalist of the post-quantum cryptography standard. The Min-sum decoding algorithm achieves better performance-complex...
ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks	Salma Afifi, Ishan Thakkar, Sudeep Pasricha	2024-07-17	Download	Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains when c...
MCU-MixQ: A HW/SW Co-optimized Mixed-precision Neural Network Design Framework for MCUs	Junfeng Gong, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li	2024-07-17	Download	Mixed-precision neural network (MPNN) that utilizes just enough data width for the neural network processing is an effective approach to meet the stringent resource constraints including memory and c...
IICPilot: An Intelligent Integrated Circuit Backend Design Framework Using Open EDA	Zesong Jiang, Qing Zhang, Cheng Liu, Long Cheng, Huawei Li, Xiaowei Li	2024-07-17	Download	Open-source EDA tools are rapidly advancing, fostering collaboration, innovation, and knowledge sharing within the EDA community. However, the growing complexity of these tools, characterized by numer...
Graphitron: A Domain Specific Language for FPGA-based Graph Processing Accelerator Generation	Xinmiao Zhang, Zheng Feng, Shengwen Liang, Xinyu Chen, Cheng Liu, Huawei Li, Xiaowei Li	2024-07-17	Download	FPGA-based graph processing accelerators, enabling extensive customization, have demonstrated significant energy efficiency over general computing engines like CPUs and GPUs.
SigDLA: A Deep Learning Accelerator Extension for Signal Processing	Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li	2024-07-17	Download	Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSP...
An Efficient Algorithm for Modulus Operation and Its Hardware Implementation in Prime Number Calculation	W. A. Susantha Wijesinghe	2024-07-17	Download	This paper presents a novel algorithm for the modulus operation for FPGA implementation. The proposed algorithm uses only addition, subtraction, logical, and bit shift operations, avoiding the complexi...
StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators	Ethan G Rogers, Sohan Salahuddin Mugdho, Kshemal Kshemendra Gupte, Cheng Wang	2024-07-17	Download	Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs). However, the energy and latency of IMC systems are dominated by t...
SENTAUR: Security EnhaNced Trojan Assessment Using LLMs Against Undesirable Revisions	Jitendra Bhandari, Rajat Sadhukhan, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri	2024-07-17	Download	A globally distributed IC supply chain brings risks due to untrusted third parties. The risks span inadvertent use of hardware Trojan (HT), inserted Intellectual Property (3P-IP) or Electronic Design ...
Chip Placement with Diffusion Models	Vint Lee, Minh Nguyen, Leena Elzeiny, Chun Deng, Pieter Abbeel, John Wawrzynek	2024-07-17	Download	Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2D chip.
RTL Verification for Secure Speculation Using Contract Shadow Logic	Qinhan Tan, Yuheng Yang, Thomas Bourgeat, Sharad Malik, Mengjia Yan	2024-07-17	Download	Modern out-of-order processors face speculative execution attacks. Despite various proposed software and hardware mitigations to prevent such attacks, new attacks keep arising from unknown vulnerabili...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Proof-of-Collaborative-Learning: A Multi-winner Federated Learning Consensus Algorithm	Amirreza Sokhankhosh, Sara Rouhani	2024-07-17	Download	Regardless of their variations, blockchains require a consensus mechanism to validate transactions, supervise added blocks, maintain network security, synchronize the network state, and distribute inc...
Automated Gateways: A Smart Contract-Powered Solution for Interoperability Across Blockchains	Koosha Esmaeilzadeh Khorasani, Sara Rouhani, Rui Pan, Vahid Pourheidari	2024-07-17	Download	Interoperability is a significant challenge in blockchain technology, hindering seamless data and service sharing across diverse blockchain networks.
A Framework for testing Federated Learning algorithms using an edge-like environment	Felipe Machado Schwanck, Marcos Tomazzoli Leipnitz, Joel Luís Carbonera, Juliano Araujo Wickboldt	2024-07-17	Download	Federated Learning (FL) is a machine learning paradigm in which many clients cooperatively train a single centralized model while keeping their data private and decentralized.
FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios	Zekai Chen, Chentao Jia, Ming Hu, Xiaofei Xie, Anran Li, Mingsong Chen	2024-07-17	Download	Along with the increasing popularity of Deep Learning (DL) techniques, more and more Artificial Intelligence of Things (AIoT) systems are adopting federated learning (FL) to enable privacy-aware colla...
LSKV: A Confidential Distributed Datastore to Protect Critical Data in the Cloud	Andrew Jeffery, Julien Maffre, Heidi Howard, Richard Mortier	2024-07-17	Download	Software services are increasingly migrating to the cloud, requiring trust in actors with direct access to the hardware, software and data comprising the service.
Continuous reasoning for adaptive container image distribution in the cloud-edge continuum	Damiano Azzolini, Stefano Forti, Antonio Ielo	2024-07-17	Download	Cloud-edge computing requires applications to operate across diverse infrastructures, often triggered by cyber-physical events. Containers offer a lightweight deployment option but pulling images from...
Computing: Looking Back and Moving Forward	Muhammed Golec, Sukhpal Singh Gill	2024-07-17	Download	The Internet and computer commercialization have transformed the computing systems area over the past sixty years, affecting society. Computer systems have evolved to meet diverse social needs thanks ...
LLM Inference Serving: Survey of Recent Advances and Opportunities	Baolin Li, Yankai Jiang, Vijay Gadepally, Devesh Tiwari	2024-07-17	Download	This survey offers a comprehensive overview of recent advancements in Large Language Model (LLM) serving systems, focusing on research since the year 2023.
Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters	Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue	2024-07-17	Download	Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
A Framework for testing Federated Learning algorithms using an edge-like environment	Felipe Machado Schwanck, Marcos Tomazzoli Leipnitz, Joel Luís Carbonera, Juliano Araujo Wickboldt	2024-07-17	Download	Federated Learning (FL) is a machine learning paradigm in which many clients cooperatively train a single centralized model while keeping their data private and decentralized.
A Scheduler for Real-Time Service in Wi-Fi 8 Multi-AP Networks With Parameterized Spatial Reuse	Kirill Chemrov, Dmitry Bankov, Andrey Lyakhov, Evgeny Khorov	2024-07-17	Download	Real-time applications (RTAs) require low delays and impose a significant challenge to Wi-Fi. In Wi-Fi, high delays are often caused by waiting for the channel to become idle.
Plausibly Deniable Content Discovery for Bitswap Using Random Walks	Manuel Wedler, Erik Daniel, Florian Tschorsch	2024-07-17	Download	Bitswap is the data exchange protocol for the content-addressed peer-to-peer overlay network IPFS. During content discovery, Bitswap reveals the interest of a peer in content to all neighbors, enablin...
Bayesian Optimization for Fast Radio Mapping and Localization with an Autonomous Aerial Drone	Paul S. Kudyba, Qin Lu, Haijian Sun	2024-07-17	Download	This paper explores how a flying drone can autonomously navigate while constructing a narrowband radio map for signal localization. As flying drones become more ubiquitous, their wireless signals will...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Cheddar: A Swift Fully Homomorphic Encryption Library Designed for GPU Architectures	Wonseok Choi, Jongmin Kim, Jung Ho Ahn	2024-07-17	Download	Fully homomorphic encryption (FHE) frees cloud computing from privacy concerns by enabling secure computation on encrypted data. However, its substantial computational and memory overhead results in s...