2024-10-16

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis | Jason Lau, Yuanlong Xiao, Yutong Xie, Yuze Chi, Linghao Song, Shaojie Xiang, Michael Lo, Zhiru Zhang, Jason Cong, Licheng Guo | 2024-10-16 | Download | The increasing complexity of large-scale FPGA accelerators poses significant challenges in achieving high performance while maintaining design productivity. |
| Low-Power Encoding for PAM-3 DRAM Bus | Jonghyeon Nam, Jaeduk Han, Hokeun Kim | 2024-10-16 | Download | The 3-level pulse amplitude modulation (PAM-3) signaling is expected to be widely used in memory interfaces for its greater voltage margins compared to PAM-4. |
| Toleo: Scaling Freshness to Tera-scale Memory using CXL and PIM | Juechu Dong, Jonah Rosenblum, Satish Narayanasamy | 2024-10-16 | Download | Trusted hardware's freshness guarantee ensures that an adversary cannot replay an old value in response to a memory read request. They rely on maintaining a version number for each cache block and ens... |
| Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration | M. Croci, G. N. Wells | 2024-10-16 | Download | In this paper we develop the first fine-grained rounding error analysis of finite element (FE) cell kernels and assembly. The theory includes mixed-precision implementations and accounts for hardware-... |
| An O(m+n)-Space Spatiotemporal Denoising Filter with Cache-Like Memories for Dynamic Vision Sensors | Qinghang Zhao, Jiaqi Wang, Yixi Ji, Jinjian Wu, Guangming Shi | 2024-10-16 | Download | Dynamic vision sensor (DVS) is a novel neuromorphic imaging device that generates asynchronous events. Despite the high temporal resolution and high dynamic range features, DVS is faced with background ... |
| COMET: Towards Partical W4A4KV4 LLMs Serving | Lian Liu, Haimeng Ren, Long Cheng, Zhaohui Xu, Yudong Pan, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang | 2024-10-16 | Download | Quantization is a widely-used compression technology to reduce the overhead of serving large language models (LLMs) on terminal devices and in cloud data centers. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| RapidStream IR: Infrastructure for FPGA High-Level Physical Synthesis | Jason Lau, Yuanlong Xiao, Yutong Xie, Yuze Chi, Linghao Song, Shaojie Xiang, Michael Lo, Zhiru Zhang, Jason Cong, Licheng Guo | 2024-10-16 | Download | The increasing complexity of large-scale FPGA accelerators poses significant challenges in achieving high performance while maintaining design productivity. |
| Boosting Asynchronous Decentralized Learning with Model Fragmentation | Sayan Biswas, Anne-Marie Kermarrec, Alexis Marouani, Rafael Pires, Rishi Sharma, Martijn de Vos | 2024-10-16 | Download | Decentralized learning (DL) is an emerging technique that allows nodes on the web to collaboratively train machine learning models without sharing raw data. Dealing with stragglers, i.e. |
| Vaccinating Federated Learning for Robust Modulation Classification in Distributed Wireless Networks | Hunmin Lee, Hongju Seong, Wonbin Kim, Hyeokchan Kwon, Daehee Seo | 2024-10-16 | Download | Automatic modulation classification (AMC) serves a vital role in ensuring efficient and reliable communication services within distributed wireless networks. |
| Toward Optimal-Complexity Hash-Based Asynchronous MVBA with Optimal Resilience | Jovan Komatovic, Joachim Neu, Tim Roughgarden | 2024-10-16 | Download | Multi-valued validated Byzantine agreement (MVBA), a fundamental primitive of distributed computing, allows $n$ processes to agree on a valid $\ell$-bit value, despite $t$ faulty processes behaving ma... |
| HEnRY: A Multi-Agent System Framework for Multi-Domain Contexts | Emmanuele Lacavalla, Shuyi Yang, Riccardo Crupi, Joseph E. Gonzalez | 2024-10-16 | Download | This project, named HEnRY, aims to introduce a Multi-Agent System (MAS) into Intesa Sanpaolo. The name HEnRY summarizes the project's core principles: the Hierarchical organization of agents in a laye... |
| FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | Zhenheng Tang, Xueze Kang, Yiming Yin, Xinglin Pan, Yuxin Wang, Xin He, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Amelie Chi Zhou, Bo Li, Bingsheng He, Xiaowen Chu | 2024-10-16 | Download | To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly large language models (LLMs), we present FusionLLM, a decentralized training system designed and implemented ... |
| Optimization and Application of Cloud-based Deep Learning Architecture for Multi-Source Data Prediction | Yang Zhang, Fa Wang, Xin Huang, Xintao Li, Sibei Liu, Hansong Zhang | 2024-10-16 | Download | This study develops a cloud-based deep learning system for early prediction of diabetes, leveraging the distributed computing capabilities of the AWS cloud platform and deep learning technologies to a... |
| FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training | Tianyuan Wu, Wei Wang, Yinghao Yu, Siran Yang, Wenchao Wu, Qinkai Duan, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang | 2024-10-16 | Download | Fail-slows, or stragglers, are common but largely unheeded problems in large-scale hybrid-parallel training that spans thousands of GPU servers and runs for weeks to months. |
| SEMSO: A Secure and Efficient Multi-Data Source Blockchain Oracle | Youquan Xian, Xueying Zeng, Chunpei Li, Peng Wang, Dongcheng Li, Peng Liu, Xianxian Li | 2024-10-16 | Download | In recent years, blockchain oracle, as the key link between blockchain and real-world data interaction, has greatly expanded the application scope of blockchain. |
| Disentangling data distribution for Federated Learning | Xinyuan Zhao, Hanlin Gu, Lixin Fan, Yuxing Han, Qiang Yang | 2024-10-16 | Download | Federated Learning (FL) facilitates collaborative training of a global model whose performance is boosted by private data owned by distributed clients, without compromising data privacy. |
| CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment | Qinfeng Li, Tianyue Luo, Xuhong Zhang, Yangfan Xie, Zhiqiang Shen, Lijun Zhang, Yier Jin, Hao Peng, Xinkui Zhao, Xianwei Zhu, Jianwei Yin | 2024-10-16 | Download | Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. |
| Federated Temporal Graph Clustering | Zihao Zhou, Yang Liu, Xianghong Xu, Qian Li | 2024-10-16 | Download | Temporal graph clustering is a complex task that involves discovering meaningful structures in dynamic graphs where relationships and entities change over time. |
| Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges | Handi Chen, Weipeng Deng, Shuo Yang, Jinfeng Xu, Zhihan Jiang, Edith C. H. Ngai, Jiangchuan Liu, Xue Liu | 2024-10-16 | Download | Edge Intelligence (EI) has been instrumental in delivering real-time, localized services by leveraging the computational capabilities of edge networks. |
| TPFL: A Trustworthy Personalized Federated Learning Framework via Subjective Logic | Jinqian Chen, Jihua Zhu | 2024-10-16 | Download | Federated learning (FL) enables collaborative model training across distributed clients while preserving data privacy. Despite its widespread adoption, most FL approaches focusing solely on privacy pr... |
| EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference | Yulei Qian, Fengcun Li, Xiangyang Ji, Xiaoyu Zhao, Jianchao Tan, Kefeng Zhang, Xunliang Cai | 2024-10-16 | Download | The Mixture-of-Experts (MoE) model has emerged as a prominent architecture in the field of Large Language Models (LLMs), providing a better balance between model performance and computational efficien... |
| EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge | Motahare Mounesan, Xiaojie Zhang, Saptarshi Debroy | 2024-10-16 | Download | Balancing mutually diverging performance metrics, such as processing latency, outcome accuracy, and end device energy consumption, is a challenging undertaking for deep learning model inference in ad-... |
| SANSee: A Physical-layer Semantic-aware Networking Framework for Distributed Wireless Sensing | Huixiang Zhu, Yong Xiao, Yingyu Li, Guangming Shi, Marwan Krunz | 2024-10-16 | Download | Contactless device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications using ubiquitously... |
| Accelerating high-order continuum kinetic plasma simulations using multiple GPUs | Andrew Ho, Genia Vogman | 2024-10-16 | Download | Kinetic plasma simulations solve the Vlasov-Poisson or Vlasov-Maxwell equations to evolve scalar-variable distribution functions in position-velocity phase space and vector-variable electromagnetic fi... |
| Proof of Team Sprint: A Collaborative Consensus Algorithm for Reducing Energy Consumption in Blockchain Systems | Naoki Yonezawa | 2024-10-16 | Download | This paper introduces Proof of Team Sprint (PoTS), a novel consensus algorithm designed to address the significant energy inefficiencies inherent in traditional Proof of Work (PoW) systems. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Latency-Aware Inter-domain Routing | Shihan Lin, Yi Zhou, Xiao Zhang, Todd Arnold, Ramesh Govindan, Xiaowei Yang | 2024-10-16 | Download | Despite efforts from cloud and content providers to lower latency to acceptable levels for current and future services (e.g., augmented reality or cloud gaming), there are still opportunities for impr... |
| ORANSlice: An Open-Source 5G Network Slicing Platform for O-RAN | Hai Cheng, Salvatore D'Oro, Rajeev Gangula, Sakthivel Velumani, Davide Villa, Leonardo Bonati, Michele Polese, Gabriel Arrobo, Christian Maciocco, Tommaso Melodia | 2024-10-16 | Download | Network slicing allows Telecom Operators (TOs) to support service provisioning with diverse Service Level Agreements (SLAs). The combination of network slicing and Open Radio Access Network (RAN) enab... |
| IRS-aided Near-field Communication: Prospects and Challenges with Codebook Approach | Ryuhei Hibi, Hiroaki Hashida, Yuichi Kawamoto, Nei Kato | 2024-10-16 | Download | Intelligent reflecting surfaces (IRSs) are gaining attention as a low-cost solution to the coverage reduction in high-frequency bands used in next-generation communications. |
| Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges | Handi Chen, Weipeng Deng, Shuo Yang, Jinfeng Xu, Zhihan Jiang, Edith C. H. Ngai, Jiangchuan Liu, Xue Liu | 2024-10-16 | Download | Edge Intelligence (EI) has been instrumental in delivering real-time, localized services by leveraging the computational capabilities of edge networks. |
| LPUF-AuthNet: A Lightweight PUF-Based IoT Authentication via Tandem Neural Networks and Split Learning | Brahim Mefgouda, Raviha Khan, Omar Alhussein, Hani Saleh, Hossien B. Eldeeb, Anshul Pandey, Sami Muhaidat | 2024-10-16 | Download | By 2025, the internet of things (IoT) is projected to connect over 75 billion devices globally, fundamentally altering how we interact with our environments in both urban and rural settings. |
| SANSee: A Physical-layer Semantic-aware Networking Framework for Distributed Wireless Sensing | Huixiang Zhu, Yong Xiao, Yingyu Li, Guangming Shi, Marwan Krunz | 2024-10-16 | Download | Contactless device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications using ubiquitously... |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FALCON: Pinpointing and Mitigating Stragglers for Large-Scale Hybrid-Parallel Training | Tianyuan Wu, Wei Wang, Yinghao Yu, Siran Yang, Wenchao Wu, Qinkai Duan, Guodong Yang, Jiamang Wang, Lin Qu, Liping Zhang | 2024-10-16 | Download | Fail-slows, or stragglers, are common but largely unheeded problems in large-scale hybrid-parallel training that spans thousands of GPU servers and runs for weeks to months. |