2024-06-20

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Exploring DRAM Cache Prefetching for Pooled Memory	Chandrahas Tirumalasetty, Narasimha Annapreddy	2024-06-20	Download	Hardware based memory pooling enabled by interconnect standards like CXL have been gaining popularity amongst cloud providers and system integrators.
WAGONN: Weight Bit Agglomeration in Crossbar Arrays for Reduced Impact of Interconnect Resistance on DNN Inference Accuracy	Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta	2024-06-20	Download	Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms.
Scalable and RISC-V Programmable Near-Memory Computing Architectures for Edge Nodes	Michele Caon, Clément Choné, Pasquale Davide Schiavone, Alexandre Levisse, Guido Masera, Maurizio Martina, David Atienza	2024-06-20	Download	The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving ...
COOK Access Control on an embedded Volta GPU	Benjamin Lesage, Frédéric Boniol, Claire Pagetti	2024-06-20	Download	The last decade has seen the emergence of a new generation of multi-core in response to advances in machine learning, and in particular Deep Neural Network (DNN) training and inference tasks.
AMC: Access to Miss Correlation Prefetcher for Evolving Graph Analytics	Abhishek Singh, Christian Schulte, Xiaochen Guo	2024-06-20	Download	Modern memory hierarchies work well with applications that have good spatial locality. Evolving (dynamic) graphs are important applications widely used to model graphs and networks with edge and verte...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Suki: Choreographed Distributed Dataflow in Rust	Shadaj Laddad, Alvin Cheung, Joseph M. Hellerstein	2024-06-20	Download	Programming models for distributed dataflow have long focused on analytical workloads that allow the runtime to dynamically place and schedule compute logic.
Vahana.jl -- A framework (not only) for large-scale agent-based models	Steffen Fürst, Tim Conrad, Carlo Jaeger, Sarah Wolf	2024-06-20	Download	Agent-based models (ABMs) offer a powerful framework for understanding complex systems. However, their computational demands often become a significant barrier as the number of agents and complexity o...
CascadeServe: Unlocking Model Cascades for Inference Serving	Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, Samuel Madden	2024-06-20	Download	Machine learning (ML) models are increasingly deployed to production, calling for efficient inference serving systems. Efficient inference serving is complicated by two challenges: (i) ML models incur...
Communication-efficient Vertical Federated Learning via Compressed Error Feedback	Pedro Valdeira, João Xavier, Cláudia Soares, Yuejie Chi	2024-06-20	Download	Communication overhead is a known bottleneck in federated learning (FL). To address this, lossy compression is commonly used on the information communicated between the server and clients during train...
Safety-Critical Edge Robotics Architecture with Bounded End-to-End Latency	Gautam Gala, Tilmann Unte, Luiz Maia, Johannes Kühbacher, Isser Kadusale, Mohammad Ibrahim Alkoudsi, Gerhard Fohler, Sebastian Altmeyer	2024-06-20	Download	Edge computing processes data near its source, reducing latency and enhancing security compared to traditional cloud computing while providing its benefits.
AI-coupled HPC Workflow Applications, Middleware and Performance	Wes Brewer, Ana Gainaru, Frédéric Suter, Feiyi Wang, Murali Emani, Shantenu Jha	2024-06-20	Download	AI integration is revolutionizing the landscape of HPC simulations, enhancing the importance, use, and performance of AI-driven HPC workflows.
NAC-QFL: Noise Aware Clustered Quantum Federated Learning	Himanshu Sahu, Hari Prabhat Gupta	2024-06-20	Download	Recent advancements in quantum computing, alongside successful deployments of quantum communication, hold promises for revolutionizing mobile networks.
Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices	Li Wang, Liang Li, Lianming Xu, Xian Peng, Aiguo Fei	2024-06-20	Download	The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementations of deep learning based intelligent services on extremely...
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation	Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu	2024-06-20	Download	Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for empowering large language model (LLM) applications. Compared with the supervised training process of LLMs, the RLHF trainin...
Reducing Memory Contention and I/O Congestion for Disk-based GNN Training	Qisheng Jiang, Lei Jia, Chundong Wang	2024-06-20	Download	Graph neural networks (GNNs) gain wide popularity. Large graphs with high-dimensional features become common and training GNNs on them is non-trivial on an ordinary machine.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Adaptive Compression of Massive MIMO Channel State Information with Deep Learning	Faris B. Mismar, Aliye Özge Kaya	2024-06-20	Download	This paper proposes the use of deep autoencoders to compress the channel information in a massive multiple input and multiple output (MIMO) system.
QuIP: A P4 Quantum Internet Protocol Prototyping Framework	Wojciech Kozlowski, Fernando A. Kuipers, Rob Smets, Belma Turkovic	2024-06-20	Download	Quantum entanglement is so fundamentally different from a network packet that several quantum network stacks have been proposed; one of which has even been experimentally demonstrated.
Age of Information Versions: a Semantic View of Markov Source Monitoring	Mehrdad Salimnejad, Marios Kountouris, Anthony Ephremides, Nikolaos Pappas	2024-06-20	Download	We consider the problem of real-time remote monitoring of a two-state Markov process, where a sensor observes the state of the source and makes a decision on whether to transmit the status updates ove...
Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling	S. R. Eshwar, Lucas Lopes Felipe, Alexandre Reiffers-Masson, Daniel Sadoc Menasché, Gugan Thoppe	2024-06-20	Download	Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes.
Leveraging eBPF and AI for Ransomware Nose Out	Arjun Sekar, Sameer G. Kulkarni, Joy Kuri	2024-06-20	Download	In this work, we propose a two-phased approach for real-time detection and deterrence of ransomware. To achieve this, we leverage the capabilities of eBPF (Extended Berkeley Packet Filter) and artific...
Hierarchical Micro-Segmentations for Zero-Trust Services via Large Language Model (LLM)-enhanced Graph Diffusion	Yinqiu Liu, Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin Shen	2024-06-20	Download	In the rapidly evolving Next-Generation Networking (NGN) era, the adoption of zero-trust architectures has become increasingly crucial to protect security.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines	Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai	2024-06-20	Download	Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities.
Queen: A quick, scalable, and comprehensive quantum circuit simulation for supercomputing	Chuan-Chi Wang, Yu-Cheng Lin, Yan-Jie Wang, Chia-Heng Tu, Shih-Hao Hung	2024-06-20	Download	The state vector-based simulation offers a convenient approach to developing and validating quantum algorithms with noise-free results. However, limited by the absence of cache-aware implementations a...
TurboSpec: Closed-loop Speculation Control System for Optimizing LLM Serving Goodput	Xiaoxuan Liu, Jongseok Park, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Chen Zhang, Kuntai Du, Xiangxi Mo, Kaichao You, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang	2024-06-20	Download	Large Language Model (LLM) serving systems batch concurrent user requests to achieve efficient serving. However, in real-world deployments, such inter-request parallelism from batching is often limite...