2024-10-16

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis	Jason Lau, Yuanlong Xiao, Yutong Xie, Yuze Chi, Linghao Song, Shaojie Xiang, Michael Lo, Zhiru Zhang, Jason Cong, Licheng Guo	2024-10-16	Download	The increasing complexity of large-scale FPGA accelerators poses significant challenges in achieving high performance while maintaining design productivity.
Low-Power Encoding for PAM-3 DRAM Bus	Jonghyeon Nam, Jaeduk Han, Hokeun Kim	2024-10-16	Download	The 3-level pulse amplitude modulation (PAM-3) signaling is expected to be widely used in memory interfaces for its greater voltage margins compared to PAM-4.
Toleo: Scaling Freshness to Tera-scale Memory using CXL and PIM	Juechu Dong, Jonah Rosenblum, Satish Narayanasamy	2024-10-16	Download	Trusted hardware's freshness guarantee ensures that an adversary cannot replay an old value in response to a memory read request. They rely on maintaining a version number for each cache block and ens...
Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration	M. Croci, G. N. Wells	2024-10-16	Download	In this paper we develop the first fine-grained rounding error analysis of finite element (FE) cell kernels and assembly. The theory includes mixed-precision implementations and accounts for hardware-...
An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors	Qinghang Zhao, Jiaqi Wang, Yixi Ji, Jinjian Wu, Guangming Shi	2024-10-16	Download	A dynamic vision sensor (DVS) is a novel neuromorphic imaging device that generates asynchronous events. Despite the high temporal resolution and high dynamic range features, DVS is faced with background ...
COMET: Towards Partical W4A4KV4 LLMs Serving	Lian Liu, Haimeng Ren, Long Cheng, Zhaohui Xu, Yudong Pan, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang	2024-10-16	Download	Quantization is a widely-used compression technology to reduce the overhead of serving large language models (LLMs) on terminal devices and in cloud data centers.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis	Jason Lau, Yuanlong Xiao, Yutong Xie, Yuze Chi, Linghao Song, Shaojie Xiang, Michael Lo, Zhiru Zhang, Jason Cong, Licheng Guo	2024-10-16	Download	The increasing complexity of large-scale FPGA accelerators poses significant challenges in achieving high performance while maintaining design productivity.
Boosting Asynchronous Decentralized Learning with Model Fragmentation	Sayan Biswas, Anne-Marie Kermarrec, Alexis Marouani, Rafael Pires, Rishi Sharma, Martijn de Vos	2024-10-16	Download	Decentralized learning (DL) is an emerging technique that allows nodes on the web to collaboratively train machine learning models without sharing raw data. Dealing with stragglers, i.e. ...
Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks	Hunmin Lee, Hongju Seong, Wonbin Kim, Hyeokchan Kwon, Daehee Seo	2024-10-16	Download	Automatic modulation classification (AMC) serves a vital role in ensuring efficient and reliable communication services within distributed wireless networks.
Toward Optimal-Complexity Hash-Based Asynchronous MVBA with Optimal Resilience	Jovan Komatovic, Joachim Neu, Tim Roughgarden	2024-10-16	Download	Multi-valued validated Byzantine agreement (MVBA), a fundamental primitive of distributed computing, allows $n$ processes to agree on a valid $\ell$-bit value, despite $t$ faulty processes behaving ma...
HEnRY: A Multi-Agent System Framework for Multi-Domain Contexts	Emmanuele Lacavalla, Shuyi Yang, Riccardo Crupi, Joseph E. Gonzalez	2024-10-16	Download	This project, named HEnRY, aims to introduce a Multi-Agent System (MAS) into Intesa Sanpaolo. The name HEnRY summarizes the project's core principles: the Hierarchical organization of agents in a laye...
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression	Zhenheng Tang, Xueze Kang, Yiming Yin, Xinglin Pan, Yuxin Wang, Xin He, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Amelie Chi Zhou, Bo Li, Bingsheng He, Xiaowen Chu	2024-10-16	Download	To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly large language models (LLMs), we present FusionLLM, a decentralized training system designed and implemented ...
Optimization and Application of Cloud-based Deep Learning Architecture for Multi-Source Data Prediction	Yang Zhang, Fa Wang, Xin Huang, Xintao Li, Sibei Liu, Hansong Zhang	2024-10-16	Download	This study develops a cloud-based deep learning system for early prediction of diabetes, leveraging the distributed computing capabilities of the AWS cloud platform and deep learning technologies to a...
FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training	Tianyuan Wu, Wei Wang, Yinghao Yu, Siran Yang, Wenchao Wu, Qinkai Duan, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang	2024-10-16	Download	Fail-slows, or stragglers, are common but largely unheeded problems in large-scale hybrid-parallel training that spans thousands of GPU servers and runs for weeks to months.
SEMSO: A Secure and Efficient Multi-Data Source Blockchain Oracle	Youquan Xian, Xueying Zeng, Chunpei Li, Peng Wang, Dongcheng Li, Peng Liu, Xianxian Li	2024-10-16	Download	In recent years, blockchain oracle, as the key link between blockchain and real-world data interaction, has greatly expanded the application scope of blockchain.
Disentangling data distribution for Federated Learning	Xinyuan Zhao, Hanlin Gu, Lixin Fan, Yuxing Han, Qiang Yang	2024-10-16	Download	Federated Learning (FL) facilitates collaborative training of a global model whose performance is boosted by private data owned by distributed clients, without compromising data privacy.
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment	Qinfeng Li, Tianyue Luo, Xuhong Zhang, Yangfan Xie, Zhiqiang Shen, Lijun Zhang, Yier Jin, Hao Peng, Xinkui Zhao, Xianwei Zhu, Jianwei Yin	2024-10-16	Download	Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons.
Federated Temporal Graph Clustering	Zihao Zhou, Yang Liu, Xianghong Xu, Qian Li	2024-10-16	Download	Temporal graph clustering is a complex task that involves discovering meaningful structures in dynamic graphs where relationships and entities change over time.
Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges	Handi Chen, Weipeng Deng, Shuo Yang, Jinfeng Xu, Zhihan Jiang, Edith C. H. Ngai, Jiangchuan Liu, Xue Liu	2024-10-16	Download	Edge Intelligence (EI) has been instrumental in delivering real-time, localized services by leveraging the computational capabilities of edge networks.
TPFL: A Trustworthy Personalized Federated Learning Framework via Subjective Logic	Jinqian Chen, Jihua Zhu	2024-10-16	Download	Federated learning (FL) enables collaborative model training across distributed clients while preserving data privacy. Despite its widespread adoption, most FL approaches focusing solely on privacy pr...
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference	Yulei Qian, Fengcun Li, Xiangyang Ji, Xiaoyu Zhao, Jianchao Tan, Kefeng Zhang, Xunliang Cai	2024-10-16	Download	The Mixture-of-Experts (MoE) model has emerged as a prominent architecture in the field of Large Language Models (LLMs), providing a better balance between model performance and computational efficien...
EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge	Motahare Mounesan, Xiaojie Zhang, Saptarshi Debroy	2024-10-16	Download	Balancing mutually diverging performance metrics, such as processing latency, outcome accuracy, and end device energy consumption, is a challenging undertaking for deep learning model inference in ad-...
SANSee: A Physical-layer Semantic-aware Networking Framework for Distributed Wireless Sensing	Huixiang Zhu, Yong Xiao, Yingyu Li, Guangming Shi, Marwan Krunz	2024-10-16	Download	Contactless device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications using ubiquitously...
Accelerating high-order continuum kinetic plasma simulations using multiple GPUs	Andrew Ho, Genia Vogman	2024-10-16	Download	Kinetic plasma simulations solve the Vlasov-Poisson or Vlasov-Maxwell equations to evolve scalar-variable distribution functions in position-velocity phase space and vector-variable electromagnetic fi...
Proof of Team Sprint: A Collaborative Consensus Algorithm for Reducing Energy Consumption in Blockchain Systems	Naoki Yonezawa	2024-10-16	Download	This paper introduces Proof of Team Sprint (PoTS), a novel consensus algorithm designed to address the significant energy inefficiencies inherent in traditional Proof of Work (PoW) systems.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Latency-Aware Inter-domain Routing	Shihan Lin, Yi Zhou, Xiao Zhang, Todd Arnold, Ramesh Govindan, Xiaowei Yang	2024-10-16	Download	Despite efforts from cloud and content providers to lower latency to acceptable levels for current and future services (e.g., augmented reality or cloud gaming), there are still opportunities for impr...
ORANSlice: An Open-Source 5G Network Slicing Platform for O-RAN	Hai Cheng, Salvatore D'Oro, Rajeev Gangula, Sakthivel Velumani, Davide Villa, Leonardo Bonati, Michele Polese, Gabriel Arrobo, Christian Maciocco, Tommaso Melodia	2024-10-16	Download	Network slicing allows Telecom Operators (TOs) to support service provisioning with diverse Service Level Agreements (SLAs). The combination of network slicing and Open Radio Access Network (RAN) enab...
IRS-aided Near-field Communication: Prospects and Challenges with Codebook Approach	Ryuhei Hibi, Hiroaki Hashida, Yuichi Kawamoto, Nei Kato	2024-10-16	Download	Intelligent reflecting surfaces (IRSs) are gaining attention as a low-cost solution to the coverage reduction in high-frequency bands used in next-generation communications.
Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges	Handi Chen, Weipeng Deng, Shuo Yang, Jinfeng Xu, Zhihan Jiang, Edith C. H. Ngai, Jiangchuan Liu, Xue Liu	2024-10-16	Download	Edge Intelligence (EI) has been instrumental in delivering real-time, localized services by leveraging the computational capabilities of edge networks.
LPUF-AuthNet: A Lightweight PUF-Based IoT Authentication via Tandem Neural Networks and Split Learning	Brahim Mefgouda, Raviha Khan, Omar Alhussein, Hani Saleh, Hossien B. Eldeeb, Anshul Pandey, Sami Muhaidat	2024-10-16	Download	By 2025, the internet of things (IoT) is projected to connect over 75 billion devices globally, fundamentally altering how we interact with our environments in both urban and rural settings.
SANSee: A Physical-layer Semantic-aware Networking Framework for Distributed Wireless Sensing	Huixiang Zhu, Yong Xiao, Yingyu Li, Guangming Shi, Marwan Krunz	2024-10-16	Download	Contactless device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications using ubiquitously...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training	Tianyuan Wu, Wei Wang, Yinghao Yu, Siran Yang, Wenchao Wu, Qinkai Duan, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang	2024-10-16	Download	Fail-slows, or stragglers, are common but largely unheeded problems in large-scale hybrid-parallel training that spans thousands of GPU servers and runs for weeks to months.