2024-01-25

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
Designing Silicon Brains using LLM: Leveraging ChatGPT for Automated Description of a Spiking Neuron Array	Michael Tomlinson, Joe Li, Andreas Andreou	2024-01-25	Download	Large language models (LLMs) have made headlines for synthesizing correct-sounding responses to a variety of prompts, including code generation.
InfiniteEn: A Multi-Source Energy Harvesting System with Load Monitoring Module for Batteryless Internet of Things	Priyesh Pappinisseri Puluckul, Maarten Weyn	2024-01-25	Download	This paper presents InfiniteEn, a multi-source energy harvesting platform designed for the Internet of Batteryless Things (IoBT). InfiniteEn incorporates an efficient energy combiner to combine energy...
Evaluation of POSIT Arithmetic with Accelerators	Naohito Nakasato, Yuki Murakami, Fumiya Kono, Maho Nakata	2024-01-25	Download	We present an evaluation of 32-bit POSIT arithmetic through its implementation as accelerators on FPGAs and GPUs. POSIT, a floating-point number format, adaptively changes the size of its fractional p...
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design	Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song	2024-01-25	Download	Six-bit quantization (FP6) can effectively reduce the size of large language models (LLMs) and preserve the model quality consistently across varied applications.
Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators	Yaniv Blumenfeld, Itay Hubara, Daniel Soudry	2024-01-25	Download	The majority of the research on the quantization of Deep Neural Networks (DNNs) is focused on reducing the precision of tensors visible by high-level frameworks (e.g.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Interactive and Urgent HPC: Challenges and Opportunities	Albert Reuther, Nick Brown, William Arndt, Johannes Blaschke, Christian Boehme, Antony Chazapis, Bjoern Enders, Robert Henschel, Julian Kunkel, Maxime Martinasso	2024-01-25	Download	As a broader set of applications from simulations to data analysis and machine learning require more parallel computational capability, the demand for interactive and urgent high performance computing...
Unsealing the secrets of blockchain consensus: A systematic comparison of the formal security of proof-of-work and proof-of-stake	Iván Abellán Álvarez, Vincent Gramlich, Johannes Sedlmeir	2024-01-25	Download	With the increasing adoption of decentralized information systems based on a variety of permissionless blockchain networks, the choice of consensus mechanism is at the core of many controversial discu...
The Case for Co-Designing Model Architectures with Hardware	Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar Panda	2024-01-25	Download	While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) mo...
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models	Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai	2024-01-25	Download	This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs).
CHIRON: Accelerating Node Synchronization without Security Trade-offs in Distributed Ledgers	Ray Neiheiser, Arman Babaei, Giannis Alexopoulos, Marios Kogias, Eleftherios Kokoris Kogias	2024-01-25	Download	Blockchain performance has historically faced challenges posed by the throughput limitations of consensus algorithms. Recent breakthroughs in research have successfully alleviated these constraints by...
Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation	Vasileios Tsouvalas, Aaqib Saeed, Tanir Ozcelebi, Nirvana Meratnia	2024-01-25	Download	Federated Learning (FL) is a promising technique for the collaborative training of deep neural networks across multiple devices while preserving data privacy.
Enabling Cross-Camera Collaboration for Video Analytics on Distributed Smart Cameras	Chulhong Min, Juheon Yi, Utku Gunay Acer, Fahim Kawsar	2024-01-25	Download	Overlapping cameras offer exciting opportunities to view a scene from different angles, allowing for more advanced, comprehensive and robust analysis.
Evaluation of POSIT Arithmetic with Accelerators	Naohito Nakasato, Yuki Murakami, Fumiya Kono, Maho Nakata	2024-01-25	Download	We present an evaluation of 32-bit POSIT arithmetic through its implementation as accelerators on FPGAs and GPUs. POSIT, a floating-point number format, adaptively changes the size of its fractional p...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
5G Network Security Practices: An Overview and Survey	Fatema Bannat Wala, Mariam Kiran	2024-01-25	Download	This document provides an overview of 5G network security, describing various components of the 5G core network architecture and what kind of security services are offered by these 5G components.
Multicasting Optical Reconfigurable Switch	Niyazi Ulas Dinc, Mustafa Yildirim, Ilker Oguz, Christophe Moser, Demetri Psaltis	2024-01-25	Download	Artificial Intelligence (AI) demands large data flows within datacenters, heavily relying on multicasting data transfers. As AI models scale, the requirement for high-bandwidth and low-latency network...
Model CBOR Serialization for Federated Learning	Koen Zandberg, Mayank Gulati, Gerhard Wunder, Emmanuel Baccelli	2024-01-25	Download	The typical federated learning workflow requires communication between a central server and a large set of clients synchronizing model parameters between each other.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache	Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina	2024-01-25	Download	This paper presents MoE-Infinity, an efficient MoE inference system designed for personal machines with limited GPU memory capacity. The key idea for MoE-Infinity is that on personal machines, which a...