2024-10-29

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Systolic Array Data Flows for Efficient Matrix Multiplication in Deep Neural Networks | Tejas Raja | 2024-10-29 | Download | The paper discusses how Systolic Arrays can improve matrix multiplication for deep neural networks (DNNs). With AI models like OpenAI's GPT now containing trillions of parameters, the need for efficie... |
| Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection | Mohamadreza Rostami, Shaza Zeitouni, Rahul Kande, Chen Chen, Pouya Mahmoody, Jeyavijayan Rajendran, Ahmad-Reza Sadeghi | 2024-10-29 | Download | Microarchitectural attacks represent a challenging and persistent threat to modern processors, exploiting inherent design vulnerabilities in processors to leak sensitive information or compromise syst... |
| Communication Characterization of AI Workloads for Large-scale Multi-chiplet Accelerators | Mariam Musavi, Emmanuel Irabor, Abhijit Das, Eduard Alarcon, Sergi Abadal | 2024-10-29 | Download | Next-generation artificial intelligence (AI) workloads are posing challenges of scalability and robustness in terms of execution time due to their intrinsic evolving data-intensive characteristics. |
| Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs | Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das | 2024-10-29 | Download | Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalizat... |
| Online Alignment and Addition in Multi-Term Floating-Point Adders | Kosmas Alexandridis, Giorgos Dimitrakopoulos | 2024-10-29 | Download | Multi-term floating-point addition appears in vector dot-product computations, matrix multiplications, and other forms of floating-point data aggregation. |
| A Host-SSD Collaborative Write Accelerator for LSM-Tree-Based Key-Value Stores | KiHwan Kim, Hyunsun Chung, Seonghoon Ahn, Junhyeok Park, Safdar Jamil, Hongsu Byun, Myungcheol Lee, Jinchun Choi, Youngjae Kim | 2024-10-29 | Download | Log-Structured Merge (LSM) tree-based Key-Value Stores (KVSs) are widely adopted for their high performance in write-intensive environments, but they often face performance degradation due to write st... |
| Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking | Matheus Farias, H. T. Kung | 2024-10-29 | Download | We introduce a novel approach to reduce the number of times required for reprogramming memristors on bit-sliced compute-in-memory crossbars for deep neural networks (DNNs). |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| An AD based library for Efficient Hessian and Hessian-Vector Product Computation on GPU | Desh Ranjan, Mohammad Zubair | 2024-10-29 | Download | The Hessian-vector product computation appears in many scientific applications such as in optimization and finite element modeling. Often there is a need for computing Hessian-vector products at many ... |
| Vertical Federated Learning with Missing Features During Training and Inference | Pedro Valdeira, Shiqiang Wang, Yuejie Chi | 2024-10-29 | Download | Vertical federated learning trains models from feature-partitioned datasets across multiple clients, who collaborate without sharing their local data. |
| Adaptive Aggregation Weights for Federated Segmentation of Pancreas MRI | Hongyi Pan, Gorkem Durak, Zheyuan Zhang, Yavuz Taktak, Elif Keles, Halil Ertugrul Aktas, Alpay Medetalibeyoglu, Yury Velichko, Concetto Spampinato, Ivo Schoots, Marco J. Bruno, Rajesh N. Keswani, Pallavi Tiwari, Candice Bolan, Tamas Gonda, Michael G. Goggins, Michael B. Wallace, Ziyue Xu, Ulas Bagci | 2024-10-29 | Download | Federated learning (FL) enables collaborative model training across institutions without sharing sensitive data, making it an attractive solution for medical imaging tasks. |
| VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration | Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu | 2024-10-29 | Download | Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache tha... |
| Unleashing Multicore Strength for Efficient Execution of Transactions | Ankit Ravish, Akshay Tejwani, Piduguralla Manaswini, Sathya Peri | 2024-10-29 | Download | Blockchain technology is booming up the digital world in recent days and thus paved a way for creating separate blockchain network for various industries. |
| GPU Sharing with Triples Mode | Chansup Byun, Albert Reuther, LaToya Anderson, William Arcand, Bill Bergeron, David Bestor, Alexander Bonn, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner | 2024-10-29 | Download | There is a tremendous amount of interest in AI/ML technologies due to the proliferation of generative AI applications such as ChatGPT. This trend has significantly increased demand on GPUs, which are ... |
| ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen | 2024-10-29 | Download | The promising applications of large language models are often limited by the constrained GPU memory capacity available on edge devices. Mixture-of-Experts (MoE) models help address this issue by activ... |
| A New Broadcast Primitive for BFT Protocols | Manu Drijvers, Tim Gretler, Yotam Harchol, Tobias Klenze, Ognjen Maric, Stefan Neamtu, Yvonne-Anne Pignolet, Rostislav Rumenov, Daniel Sharifi, Victor Shoup | 2024-10-29 | Download | Byzantine fault tolerant (BFT) protocol descriptions often assume application-layer networking primitives, such as best-effort and reliable broadcast, which are impossible to implement in practice in ... |
| Optimizing Streamlined Blockchain Consensus with Generalized Weighted Voting and Enhanced Leader Rotation | Diana Micloiu, Rowdy Chotkan, Jérémie Decouchant | 2024-10-29 | Download | Streamlined Byzantine Fault Tolerant (BFT) protocols, such as HotStuff [PODC'19], and weighted voting represent two possible strategies to improve consensus in the distributed systems world. |
| Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials | Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca | 2024-10-29 | Download | The accurate simulation of complex biochemical phenomena has historically been hampered by the computational requirements of high-fidelity molecular-modeling techniques. |
| Histrio: a Serverless Actor System | Giorgio Natale Buttiglieri, Luca De Martini, Alessandro Margara | 2024-10-29 | Download | In recent years, the serverless paradigm has been widely adopted to develop cloud applications, as it enables building scalable solutions while delegating operational concerns such as infrastructure m... |
| Building Castles in the Cloud: Architecting Resilient and Scalable Infrastructure | Naresh Kumar Gundla | 2024-10-29 | Download | In the contemporary world of dynamic digital solutions and services, the significance of effective and stable cloud solutions cannot be overestimated. |
| Revisiting Reliability in Large-Scale Machine Learning Research Clusters | Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu | 2024-10-29 | Download | Reliability is a fundamental challenge in operating large-scale machine learning (ML) infrastructures, particularly as the scale of ML models and training clusters continues to grow. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Energy-Aware Multi-Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks | Ying Li, Changling Li, Jiyao Chen, Christine Roinou | 2024-10-29 | Download | Mission-oriented drone networks have been widely used for structural inspection, disaster monitoring, border surveillance, etc. Due to the limited battery capacity of drones, mission execution strateg... |
| BBR Fairness Evaluation Using NS-3 | Linchuan Tang | 2024-10-29 | Download | This paper evaluates the fairness of BBR congestion control using NS-3 simulator. While BBR improves performance over loss-based methods in single flows, unfairness issues emerge with competing BBR an... |
| Optimizing and Managing Wireless Backhaul for Resilient Next-Generation Cellular Networks | Gabriele Gemmi, Michele Polese, Tommaso Melodia, Leonardo Maccari | 2024-10-29 | Download | Next-generation wireless networks target high network availability, ubiquitous coverage, and extremely high data rates for mobile users. This requires exploring new frequency bands, e.g. |
| Cora: Accelerating Stateful Network Applications with SmartNICs | Shaoke Xi, Jiaqi Gao, Mengqi Liu, Jiamin Cao, Fuliang Li, Kai Bu, Kui Ren, Minlan Yu, Dennis Cai, Ennan Zhai | 2024-10-29 | Download | With the growing performance requirements on networked applications, there is a new trend of offloading stateful network applications to SmartNICs to improve performance and reduce the total cost of o... |
| From Simulators to Digital Twins for Enabling Emerging Cellular Networks: A Tutorial and Survey | Marvin Manalastas, Muhammad Umar Bin Farooq, Syed Muhammad Asad Zaidi, Haneya Naeem Qureshi, Yusuf Sambo, Ali Imran | 2024-10-29 | Download | Simulators are indispensable parts of the research and development necessary to advance countless industries, including cellular networks. With simulators, the evaluation, analysis, testing, and exper... |
| A New Broadcast Primitive for BFT Protocols | Manu Drijvers, Tim Gretler, Yotam Harchol, Tobias Klenze, Ognjen Maric, Stefan Neamtu, Yvonne-Anne Pignolet, Rostislav Rumenov, Daniel Sharifi, Victor Shoup | 2024-10-29 | Download | Byzantine fault tolerant (BFT) protocol descriptions often assume application-layer networking primitives, such as best-effort and reliable broadcast, which are impossible to implement in practice in ... |
| Generative AI Enabled Matching for 6G Multiple Access | Xudong Wang, Hongyang Du, Dusit Niyato, Lijie Zhou, Lei Feng, Zhixiang Yang, Fanqin Zhou, Wenjing Li | 2024-10-29 | Download | In wireless networks, applying deep learning models to solve matching problems between different entities has become a mainstream and effective approach. |
| ReDAN: An Empirical Study on Remote DoS Attacks against NAT Networks | Xuewei Feng, Yuxiang Yang, Qi Li, Xingxiang Zhan, Kun Sun, Ziqiang Wang, Ao Wang, Ganqiu Du, Ke Xu | 2024-10-29 | Download | In this paper, we conduct an empirical study on remote DoS attacks targeting NAT networks. We show that Internet attackers operating outside local NAT networks can remotely identify a NAT device and s... |
| Data streaming platform for crowd-sourced vehicle dataset generation | Felipe Mogollon, Zaloa Fernandez, Angel Martin, Juan Diego Ortega, Gorka Velez | 2024-10-29 | Download | Vehicles are sophisticated machines equipped with sensors that provide real-time data for onboard driving assistance systems. Due to the wide variety of traffic, road, and weather conditions, continuo... |
| Cognitive Semantic Augmentation LEO Satellite Networks for Earth Observation | Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Duc Dung Tran, Luis M. Garces-Socarras, Juan Carlos Merlano-Duncan, Symeon Chatzinotas | 2024-10-29 | Download | Earth observation (EO) systems are essential for mapping, catastrophe monitoring, and resource management, but they have trouble processing and sending large amounts of EO data efficiently, especially... |
| A Range-Free Node Localization Method for Anisotropic Wireless Sensor Networks with Sparse Anchors | Yong Jin, Junfang Leng, Lin Zhou, Yu Jiang, Qian Wei | 2024-10-29 | Download | In sensor networks characterized by irregular layouts and poor connectivity, anisotropic properties can significantly reduce the accuracy of distance estimation between nodes, consequently impairing t... |
| Demand-Aware Beam Hopping and Power Allocation for Load Balancing in Digital Twin empowered LEO Satellite Networks | Ruili Zhao, Jun Cai, Jiangtao Luo, Junpeng Gao, Yongyi Ran | 2024-10-29 | Download | Low-Earth orbit (LEO) satellites utilizing beam hopping (BH) technology offer extensive coverage, low latency, high bandwidth, and significant flexibility. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration | Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu | 2024-10-29 | Download | Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache tha... |
| Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs | Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das | 2024-10-29 | Download | Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalizat... |
| Two Criteria for Performance Analysis of Optimization Algorithms | Yunpeng Jing, HaiLin Liu, Qunfeng Liu | 2024-10-29 | Download | Performance analysis is crucial in optimization research, especially when addressing black-box problems through nature-inspired algorithms. Current practices often rely heavily on statistical methods,... |
