2024-10-29

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Systolic Array Data Flows for Efficient Matrix Multiplication in Deep Neural Networks	Tejas Raja	2024-10-29	Download	The paper discusses how Systolic Arrays can improve matrix multiplication for deep neural networks (DNNs). With AI models like OpenAI's GPT now containing trillions of parameters, the need for efficie...
Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection	Mohamadreza Rostami, Shaza Zeitouni, Rahul Kande, Chen Chen, Pouya Mahmoody, Jeyavijayan Rajendran, Ahmad-Reza Sadeghi	2024-10-29	Download	Microarchitectural attacks represent a challenging and persistent threat to modern processors, exploiting inherent design vulnerabilities in processors to leak sensitive information or compromise syst...
Communication Characterization of AI Workloads for Large-scale Multi-chiplet Accelerators	Mariam Musavi, Emmanuel Irabor, Abhijit Das, Eduard Alarcon, Sergi Abadal	2024-10-29	Download	Next-generation artificial intelligence (AI) workloads are posing challenges of scalability and robustness in terms of execution time due to their intrinsic evolving data-intensive characteristics.
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs	Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das	2024-10-29	Download	Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalizat...
Online Alignment and Addition in Multi-Term Floating-Point Adders	Kosmas Alexandridis, Giorgos Dimitrakopoulos	2024-10-29	Download	Multi-term floating-point addition appears in vector dot-product computations, matrix multiplications, and other forms of floating-point data aggregation.
A Host-SSD Collaborative Write Accelerator for LSM-Tree-Based Key-Value Stores	KiHwan Kim, Hyunsun Chung, Seonghoon Ahn, Junhyeok Park, Safdar Jamil, Hongsu Byun, Myungcheol Lee, Jinchun Choi, Youngjae Kim	2024-10-29	Download	Log-Structured Merge (LSM) tree-based Key-Value Stores (KVSs) are widely adopted for their high performance in write-intensive environments, but they often face performance degradation due to write st...
Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking	Matheus Farias, H. T. Kung	2024-10-29	Download	We introduce a novel approach to reduce the number of times required for reprogramming memristors on bit-sliced compute-in-memory crossbars for deep neural networks (DNNs).

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
An AD based library for Efficient Hessian and Hessian-Vector Product Computation on GPU	Desh Ranjan, Mohammad Zubair	2024-10-29	Download	The Hessian-vector product computation appears in many scientific applications such as in optimization and finite element modeling. Often there is a need for computing Hessian-vector products at many ...
Vertical Federated Learning with Missing Features During Training and Inference	Pedro Valdeira, Shiqiang Wang, Yuejie Chi	2024-10-29	Download	Vertical federated learning trains models from feature-partitioned datasets across multiple clients, who collaborate without sharing their local data.
Adaptive Aggregation Weights for Federated Segmentation of Pancreas MRI	Hongyi Pan, Gorkem Durak, Zheyuan Zhang, Yavuz Taktak, Elif Keles, Halil Ertugrul Aktas, Alpay Medetalibeyoglu, Yury Velichko, Concetto Spampinato, Ivo Schoots, Marco J. Bruno, Rajesh N. Keswani, Pallavi Tiwari, Candice Bolan, Tamas Gonda, Michael G. Goggins, Michael B. Wallace, Ziyue Xu, Ulas Bagci	2024-10-29	Download	Federated learning (FL) enables collaborative model training across institutions without sharing sensitive data, making it an attractive solution for medical imaging tasks.
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration	Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu	2024-10-29	Download	Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache tha...
Unleashing Multicore Strength for Efficient Execution of Transactions	Ankit Ravish, Akshay Tejwani, Piduguralla Manaswini, Sathya Peri	2024-10-29	Download	Blockchain technology is booming up the digital world in recent days and thus paved a way for creating separate blockchain network for various industries.
GPU Sharing with Triples Mode	Chansup Byun, Albert Reuther, LaToya Anderson, William Arcand, Bill Bergeron, David Bestor, Alexander Bonn, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner	2024-10-29	Download	There is a tremendous amount of interest in AI/ML technologies due to the proliferation of generative AI applications such as ChatGPT. This trend has significantly increased demand on GPUs, which are ...
ProMoE: Fast MoE-based LLM Serving using Proactive Caching	Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen	2024-10-29	Download	The promising applications of large language models are often limited by the constrained GPU memory capacity available on edge devices. Mixture-of-Experts (MoE) models help address this issue by activ...
A New Broadcast Primitive for BFT Protocols	Manu Drijvers, Tim Gretler, Yotam Harchol, Tobias Klenze, Ognjen Maric, Stefan Neamtu, Yvonne-Anne Pignolet, Rostislav Rumenov, Daniel Sharifi, Victor Shoup	2024-10-29	Download	Byzantine fault tolerant (BFT) protocol descriptions often assume application-layer networking primitives, such as best-effort and reliable broadcast, which are impossible to implement in practice in ...
Optimizing Streamlined Blockchain Consensus with Generalized Weighted Voting and Enhanced Leader Rotation	Diana Micloiu, Rowdy Chotkan, Jérémie Decouchant	2024-10-29	Download	Streamlined Byzantine Fault Tolerant (BFT) protocols, such as HotStuff [PODC'19], and weighted voting represent two possible strategies to improve consensus in the distributed systems world.
Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials	Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca	2024-10-29	Download	The accurate simulation of complex biochemical phenomena has historically been hampered by the computational requirements of high-fidelity molecular-modeling techniques.
Histrio: a Serverless Actor System	Giorgio Natale Buttiglieri, Luca De Martini, Alessandro Margara	2024-10-29	Download	In recent years, the serverless paradigm has been widely adopted to develop cloud applications, as it enables building scalable solutions while delegating operational concerns such as infrastructure m...
Building Castles in the Cloud: Architecting Resilient and Scalable Infrastructure	Naresh Kumar Gundla	2024-10-29	Download	In the contemporary world of dynamic digital solutions and services, the significance of effective and stable cloud solutions cannot be overestimated.
Revisiting Reliability in Large-Scale Machine Learning Research Clusters	Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu	2024-10-29	Download	Reliability is a fundamental challenge in operating large-scale machine learning (ML) infrastructures, particularly as the scale of ML models and training clusters continues to grow.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Energy-Aware Multi-Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks	Ying Li, Changling Li, Jiyao Chen, Christine Roinou	2024-10-29	Download	Mission-oriented drone networks have been widely used for structural inspection, disaster monitoring, border surveillance, etc. Due to the limited battery capacity of drones, mission execution strateg...
BBR Fairness Evaluation Using NS-3	Linchuan Tang	2024-10-29	Download	This paper evaluates the fairness of BBR congestion control using NS-3 simulator. While BBR improves performance over loss-based methods in single flows, unfairness issues emerge with competing BBR an...
Optimizing and Managing Wireless Backhaul for Resilient Next-Generation Cellular Networks	Gabriele Gemmi, Michele Polese, Tommaso Melodia, Leonardo Maccari	2024-10-29	Download	Next-generation wireless networks target high network availability, ubiquitous coverage, and extremely high data rates for mobile users. This requires exploring new frequency bands, e.g.
Cora: Accelerating Stateful Network Applications with SmartNICs	Shaoke Xi, Jiaqi Gao, Mengqi Liu, Jiamin Cao, Fuliang Li, Kai Bu, Kui Ren, Minlan Yu, Dennis Cai, Ennan Zhai	2024-10-29	Download	With the growing performance requirements on networked applications, there is a new trend of offloading stateful network applications to SmartNICs to improve performance and reduce the total cost of o...
From Simulators to Digital Twins for Enabling Emerging Cellular Networks: A Tutorial and Survey	Marvin Manalastas, Muhammad Umar Bin Farooq, Syed Muhammad Asad Zaidi, Haneya Naeem Qureshi, Yusuf Sambo, Ali Imran	2024-10-29	Download	Simulators are indispensable parts of the research and development necessary to advance countless industries, including cellular networks. With simulators, the evaluation, analysis, testing, and exper...
A New Broadcast Primitive for BFT Protocols	Manu Drijvers, Tim Gretler, Yotam Harchol, Tobias Klenze, Ognjen Maric, Stefan Neamtu, Yvonne-Anne Pignolet, Rostislav Rumenov, Daniel Sharifi, Victor Shoup	2024-10-29	Download	Byzantine fault tolerant (BFT) protocol descriptions often assume application-layer networking primitives, such as best-effort and reliable broadcast, which are impossible to implement in practice in ...
Generative AI Enabled Matching for 6G Multiple Access	Xudong Wang, Hongyang Du, Dusit Niyato, Lijie Zhou, Lei Feng, Zhixiang Yang, Fanqin Zhou, Wenjing Li	2024-10-29	Download	In wireless networks, applying deep learning models to solve matching problems between different entities has become a mainstream and effective approach.
ReDAN: An Empirical Study on Remote DoS Attacks against NAT Networks	Xuewei Feng, Yuxiang Yang, Qi Li, Xingxiang Zhan, Kun Sun, Ziqiang Wang, Ao Wang, Ganqiu Du, Ke Xu	2024-10-29	Download	In this paper, we conduct an empirical study on remote DoS attacks targeting NAT networks. We show that Internet attackers operating outside local NAT networks can remotely identify a NAT device and s...
Data streaming platform for crowd-sourced vehicle dataset generation	Felipe Mogollon, Zaloa Fernandez, Angel Martin, Juan Diego Ortega, Gorka Velez	2024-10-29	Download	Vehicles are sophisticated machines equipped with sensors that provide real-time data for onboard driving assistance systems. Due to the wide variety of traffic, road, and weather conditions, continuo...
Cognitive Semantic Augmentation LEO Satellite Networks for Earth Observation	Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Duc Dung Tran, Luis M. Garces-Socarras, Juan Carlos Merlano-Duncan, Symeon Chatzinotas	2024-10-29	Download	Earth observation (EO) systems are essential for mapping, catastrophe monitoring, and resource management, but they have trouble processing and sending large amounts of EO data efficiently, especially...
A Range-Free Node Localization Method for Anisotropic Wireless Sensor Networks with Sparse Anchors	Yong Jin, Junfang Leng, Lin Zhou, Yu Jiang, Qian Wei	2024-10-29	Download	In sensor networks characterized by irregular layouts and poor connectivity, anisotropic properties can significantly reduce the accuracy of distance estimation between nodes, consequently impairing t...
Demand-Aware Beam Hopping and Power Allocation for Load Balancing in Digital Twin empowered LEO Satellite Networks	Ruili Zhao, Jun Cai, Jiangtao Luo, Junpeng Gao, Yongyi Ran	2024-10-29	Download	Low-Earth orbit (LEO) satellites utilizing beam hopping (BH) technology offer extensive coverage, low latency, high bandwidth, and significant flexibility.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration	Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu	2024-10-29	Download	Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache tha...
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs	Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das	2024-10-29	Download	Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalizat...
Two Criteria for Performance Analysis of Optimization Algorithms	Yunpeng Jing, HaiLin Liu, Qunfeng Liu	2024-10-29	Download	Performance analysis is crucial in optimization research, especially when addressing black-box problems through nature-inspired algorithms. Current practices often rely heavily on statistical methods,...