
2024-06-20

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Exploring DRAM Cache Prefetching for Pooled Memory | Chandrahas Tirumalasetty, Narasimha Annapreddy | 2024-06-20 | Download | Hardware based memory pooling enabled by interconnect standards like CXL have been gaining popularity amongst cloud providers and system integrators. |
| WAGONN: Weight Bit Agglomeration in Crossbar Arrays for Reduced Impact of Interconnect Resistance on DNN Inference Accuracy | Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta | 2024-06-20 | Download | Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. |
| Scalable and RISC-V Programmable Near-Memory Computing Architectures for Edge Nodes | Michele Caon, Clément Choné, Pasquale Davide Schiavone, Alexandre Levisse, Guido Masera, Maurizio Martina, David Atienza | 2024-06-20 | Download | The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving ... |
| COOK Access Control on an embedded Volta GPU | Benjamin Lesage, Frédéric Boniol, Claire Pagetti | 2024-06-20 | Download | The last decade has seen the emergence of a new generation of multi-core in response to advances in machine learning, and in particular Deep Neural Network (DNN) training and inference tasks. |
| AMC: Access to Miss Correlation Prefetcher for Evolving Graph Analytics | Abhishek Singh, Christian Schulte, Xiaochen Guo | 2024-06-20 | Download | Modern memory hierarchies work well with applications that have good spatial locality. Evolving (dynamic) graphs are important applications widely used to model graphs and networks with edge and verte... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Suki: Choreographed Distributed Dataflow in Rust | Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein | 2024-06-20 | Download | Programming models for distributed dataflow have long focused on analytical workloads that allow the runtime to dynamically place and schedule compute logic. |
| Vahana.jl -- A framework (not only) for large-scale agent-based models | Steffen Fürst, Tim Conrad, Carlo Jaeger, Sarah Wolf | 2024-06-20 | Download | Agent-based models (ABMs) offer a powerful framework for understanding complex systems. However, their computational demands often become a significant barrier as the number of agents and complexity o... |
| CascadeServe: Unlocking Model Cascades for Inference Serving | Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, Samuel Madden | 2024-06-20 | Download | Machine learning (ML) models are increasingly deployed to production, calling for efficient inference serving systems. Efficient inference serving is complicated by two challenges: (i) ML models incur... |
| Communication-efficient Vertical Federated Learning via Compressed Error Feedback | Pedro Valdeira, João Xavier, Cláudia Soares, Yuejie Chi | 2024-06-20 | Download | Communication overhead is a known bottleneck in federated learning (FL). To address this, lossy compression is commonly used on the information communicated between the server and clients during train... |
| Safety-Critical Edge Robotics Architecture with Bounded End-to-End Latency | Gautam Gala, Tilmann Unte, Luiz Maia, Johannes Kühbacher, Isser Kadusale, Mohammad Ibrahim Alkoudsi, Gerhard Fohler, Sebastian Altmeyer | 2024-06-20 | Download | Edge computing processes data near its source, reducing latency and enhancing security compared to traditional cloud computing while providing its benefits. |
| AI-coupled HPC Workflow Applications, Middleware and Performance | Wes Brewer, Ana Gainaru, Frédéric Suter, Feiyi Wang, Murali Emani, Shantenu Jha | 2024-06-20 | Download | AI integration is revolutionizing the landscape of HPC simulations, enhancing the importance, use, and performance of AI-driven HPC workflows. |
| NAC-QFL: Noise Aware Clustered Quantum Federated Learning | Himanshu Sahu, Hari Prabhat Gupta | 2024-06-20 | Download | Recent advancements in quantum computing, alongside successful deployments of quantum communication, hold promises for revolutionizing mobile networks. |
| Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices | Li Wang, Liang Li, Lianming Xu, Xian Peng, Aiguo Fei | 2024-06-20 | Download | The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementations of deep learning based intelligent services on extremely... |
| ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation | Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu | 2024-06-20 | Download | Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for empowering large language model (LLM) applications. Compared with the supervised training process of LLMs, the RLHF trainin... |
| Reducing Memory Contention and I/O Congestion for Disk-based GNN Training | Qisheng Jiang, Lei Jia, Chundong Wang | 2024-06-20 | Download | Graph neural networks (GNNs) gain wide popularity. Large graphs with high-dimensional features become common and training GNNs on them is non-trivial on an ordinary machine. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| Adaptive Compression of Massive MIMO Channel State Information with Deep Learning | Faris B. Mismar, Aliye Özge Kaya | 2024-06-20 | Download | This paper proposes the use of deep autoencoders to compress the channel information in a massive multiple input and multiple output (MIMO) system. |
| QuIP: A P4 Quantum Internet Protocol Prototyping Framework | Wojciech Kozlowski, Fernando A. Kuipers, Rob Smets, Belma Turkovic | 2024-06-20 | Download | Quantum entanglement is so fundamentally different from a network packet that several quantum network stacks have been proposed; one of which has even been experimentally demonstrated. |
| Age of Information Versions: a Semantic View of Markov Source Monitoring | Mehrdad Salimnejad, Marios Kountouris, Anthony Ephremides, Nikolaos Pappas | 2024-06-20 | Download | We consider the problem of real-time remote monitoring of a two-state Markov process, where a sensor observes the state of the source and makes a decision on whether to transmit the status updates ove... |
| Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling | S. R. Eshwar, Lucas Lopes Felipe, Alexandre Reiffers-Masson, Daniel Sadoc Menasché, Gugan Thoppe | 2024-06-20 | Download | Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. |
| Leveraging eBPF and AI for Ransomware Nose Out | Arjun Sekar, Sameer G. Kulkarni, Joy Kuri | 2024-06-20 | Download | In this work, we propose a two-phased approach for real-time detection and deterrence of ransomware. To achieve this, we leverage the capabilities of eBPF (Extended Berkeley Packet Filter) and artific... |
| Hierarchical Micro-Segmentations for Zero-Trust Services via Large Language Model (LLM)-enhanced Graph Diffusion | Yinqiu Liu, Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin Shen | 2024-06-20 | Download | In the rapidly evolving Next-Generation Networking (NGN) era, the adoption of zero-trust architectures has become increasingly crucial to protect security. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
|---|---|---|---|---|
| CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines | Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai | 2024-06-20 | Download | Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities. |
| Queen: A quick, scalable, and comprehensive quantum circuit simulation for supercomputing | Chuan-Chi Wang, Yu-Cheng Lin, Yan-Jie Wang, Chia-Heng Tu, Shih-Hao Hung | 2024-06-20 | Download | The state vector-based simulation offers a convenient approach to developing and validating quantum algorithms with noise-free results. However, limited by the absence of cache-aware implementations a... |
| TurboSpec: Closed-loop Speculation Control System for Optimizing LLM Serving Goodput | Xiaoxuan Liu, Jongseok Park, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Chen Zhang, Kuntai Du, Xiangxi Mo, Kaichao You, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang | 2024-06-20 | Download | Large Language Model (LLM) serving systems batch concurrent user requests to achieve efficient serving. However, in real-world deployments, such inter-request parallelism from batching is often limite... |
