2024-10-29
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Systolic Array Data Flows for Efficient Matrix Multiplication in Deep Neural Networks | Tejas Raja | 2024-10-29 | Download | The paper discusses how Systolic Arrays can improve matrix multiplication for deep neural networks (DNNs). With AI models like OpenAI's GPT now containing trillions of parameters, the need for efficie... |
| Lost and Found in Speculation: Hybrid Speculative Vulnerability Detection | Mohamadreza Rostami, Shaza Zeitouni, Rahul Kande, Chen Chen, Pouya Mahmoody, Jeyavijayan Rajendran, Ahmad-Reza Sadeghi | 2024-10-29 | Download | Microarchitectural attacks represent a challenging and persistent threat to modern processors, exploiting inherent design vulnerabilities in processors to leak sensitive information or compromise syst... |
| Communication Characterization of AI Workloads for Large-scale Multi-chiplet Accelerators | Mariam Musavi, Emmanuel Irabor, Abhijit Das, Eduard Alarcon, Sergi Abadal | 2024-10-29 | Download | Next-generation artificial intelligence (AI) workloads are posing challenges of scalability and robustness in terms of execution time due to their intrinsic evolving data-intensive characteristics. |
| Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs | Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das | 2024-10-29 | Download | Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalizat... |
| Online Alignment and Addition in Multi-Term Floating-Point Adders | Kosmas Alexandridis, Giorgos Dimitrakopoulos | 2024-10-29 | Download | Multi-term floating-point addition appears in vector dot-product computations, matrix multiplications, and other forms of floating-point data aggregation. |
| A Host-SSD Collaborative Write Accelerator for LSM-Tree-Based Key-Value Stores | KiHwan Kim, Hyunsun Chung, Seonghoon Ahn, Junhyeok Park, Safdar Jamil, Hongsu Byun, Myungcheol Lee, Jinchun Choi, Youngjae Kim | 2024-10-29 | Download | Log-Structured Merge (LSM) tree-based Key-Value Stores (KVSs) are widely adopted for their high performance in write-intensive environments, but they often face performance degradation due to write st... |
| Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking | Matheus Farias, H. T. Kung | 2024-10-29 | Download | We introduce a novel approach to reduce the number of times required for reprogramming memristors on bit-sliced compute-in-memory crossbars for deep neural networks (DNNs). |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| An AD based library for Efficient Hessian and Hessian-Vector Product Computation on GPU | Desh Ranjan, Mohammad Zubair | 2024-10-29 | Download | The Hessian-vector product computation appears in many scientific applications such as in optimization and finite element modeling. Often there is a need for computing Hessian-vector products at many ... |
| Vertical Federated Learning with Missing Features During Training and Inference | Pedro Valdeira, Shiqiang Wang, Yuejie Chi | 2024-10-29 | Download | Vertical federated learning trains models from feature-partitioned datasets across multiple clients, who collaborate without sharing their local data. |
| Adaptive Aggregation Weights for Federated Segmentation of Pancreas MRI | Hongyi Pan, Gorkem Durak, Zheyuan Zhang, Yavuz Taktak, Elif Keles, Halil Ertugrul Aktas, Alpay Medetalibeyoglu, Yury Velichko, Concetto Spampinato, Ivo Schoots, Marco J. Bruno, Rajesh N. Keswani, Pallavi Tiwari, Candice Bolan, Tamas Gonda, Michael G. Goggins, Michael B. Wallace, Ziyue Xu, Ulas Bagci | 2024-10-29 | Download | Federated learning (FL) enables collaborative model training across institutions without sharing sensitive data, making it an attractive solution for medical imaging tasks. |
| VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration | Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu | 2024-10-29 | Download | Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache tha... |
| Unleashing Multicore Strength for Efficient Execution of Transactions | Ankit Ravish, Akshay Tejwani, Piduguralla Manaswini, Sathya Peri | 2024-10-29 | Download | Blockchain technology is booming up the digital world in recent days and thus paved a way for creating separate blockchain network for various industries. |
| GPU Sharing with Triples Mode | Chansup Byun, Albert Reuther, LaToya Anderson, William Arcand, Bill Bergeron, David Bestor, Alexander Bonn, Daniel Burrill, Vijay Gadepally, Michael Houle, Matthew Hubbell, Hayden Jananthan, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin, Guillermo Morales, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner | 2024-10-29 | Download | There is a tremendous amount of interest in AI/ML technologies due to the proliferation of generative AI applications such as ChatGPT. This trend has significantly increased demand on GPUs, which are ... |
| ProMoE: Fast MoE-based LLM Serving using Proactive Caching | Xiaoniu Song, Zihang Zhong, Rong Chen, Haibo Chen | 2024-10-29 | Download | The promising applications of large language models are often limited by the constrained GPU memory capacity available on edge devices. Mixture-of-Experts (MoE) models help address this issue by activ... |
| A New Broadcast Primitive for BFT Protocols | Manu Drijvers, Tim Gretler, Yotam Harchol, Tobias Klenze, Ognjen Maric, Stefan Neamtu, Yvonne-Anne Pignolet, Rostislav Rumenov, Daniel Sharifi, Victor Shoup | 2024-10-29 | Download | Byzantine fault tolerant (BFT) protocol descriptions often assume application-layer networking primitives, such as best-effort and reliable broadcast, which are impossible to implement in practice in ... |
| Optimizing Streamlined Blockchain Consensus with Generalized Weighted Voting and Enhanced Leader Rotation | Diana Micloiu, Rowdy Chotkan, Jérémie Decouchant | 2024-10-29 | Download | Streamlined Byzantine Fault Tolerant (BFT) protocols, such as HotStuff [PODC'19], and weighted voting represent two possible strategies to improve consensus in the distributed systems world. |
| Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials | Ryan Stocks, Jorge L. Galvez Vallejo, Fiona C. Y. Yu, Calum Snowdon, Elise Palethorpe, Jakub Kurzak, Dmytro Bykov, Giuseppe M. J. Barca | 2024-10-29 | Download | The accurate simulation of complex biochemical phenomena has historically been hampered by the computational requirements of high-fidelity molecular-modeling techniques. |
| Histrio: a Serverless Actor System | Giorgio Natale Buttiglieri, Luca De Martini, Alessandro Margara | 2024-10-29 | Download | In recent years, the serverless paradigm has been widely adopted to develop cloud applications, as it enables building scalable solutions while delegating operational concerns such as infrastructure m... |
| Building Castles in the Cloud: Architecting Resilient and Scalable Infrastructure | Naresh Kumar Gundla | 2024-10-29 | Download | In the contemporary world of dynamic digital solutions and services, the significance of effective and stable cloud solutions cannot be overestimated. |
| Revisiting Reliability in Large-Scale Machine Learning Research Clusters | Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu | 2024-10-29 | Download | Reliability is a fundamental challenge in operating large-scale machine learning (ML) infrastructures, particularly as the scale of ML models and training clusters continues to grow. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Energy-Aware Multi-Agent Reinforcement Learning for Collaborative Execution in Mission-Oriented Drone Networks | Ying Li, Changling Li, Jiyao Chen, Christine Roinou | 2024-10-29 | Download | Mission-oriented drone networks have been widely used for structural inspection, disaster monitoring, border surveillance, etc. Due to the limited battery capacity of drones, mission execution strateg... |
| BBR Fairness Evaluation Using NS-3 | Linchuan Tang | 2024-10-29 | Download | This paper evaluates the fairness of BBR congestion control using NS-3 simulator. While BBR improves performance over loss-based methods in single flows, unfairness issues emerge with competing BBR an... |
| Optimizing and Managing Wireless Backhaul for Resilient Next-Generation Cellular Networks | Gabriele Gemmi, Michele Polese, Tommaso Melodia, Leonardo Maccari | 2024-10-29 | Download | Next-generation wireless networks target high network availability, ubiquitous coverage, and extremely high data rates for mobile users. This requires exploring new frequency bands, e.g. ... |
| Cora: Accelerating Stateful Network Applications with SmartNICs | Shaoke Xi, Jiaqi Gao, Mengqi Liu, Jiamin Cao, Fuliang Li, Kai Bu, Kui Ren, Minlan Yu, Dennis Cai, Ennan Zhai | 2024-10-29 | Download | With the growing performance requirements on networked applications, there is a new trend of offloading stateful network applications to SmartNICs to improve performance and reduce the total cost of o... |
| From Simulators to Digital Twins for Enabling Emerging Cellular Networks: A Tutorial and Survey | Marvin Manalastas, Muhammad Umar Bin Farooq, Syed Muhammad Asad Zaidi, Haneya Naeem Qureshi, Yusuf Sambo, Ali Imran | 2024-10-29 | Download | Simulators are indispensable parts of the research and development necessary to advance countless industries, including cellular networks. With simulators, the evaluation, analysis, testing, and exper... |
| A New Broadcast Primitive for BFT Protocols | Manu Drijvers, Tim Gretler, Yotam Harchol, Tobias Klenze, Ognjen Maric, Stefan Neamtu, Yvonne-Anne Pignolet, Rostislav Rumenov, Daniel Sharifi, Victor Shoup | 2024-10-29 | Download | Byzantine fault tolerant (BFT) protocol descriptions often assume application-layer networking primitives, such as best-effort and reliable broadcast, which are impossible to implement in practice in ... |
| Generative AI Enabled Matching for 6G Multiple Access | Xudong Wang, Hongyang Du, Dusit Niyato, Lijie Zhou, Lei Feng, Zhixiang Yang, Fanqin Zhou, Wenjing Li | 2024-10-29 | Download | In wireless networks, applying deep learning models to solve matching problems between different entities has become a mainstream and effective approach. |
| ReDAN: An Empirical Study on Remote DoS Attacks against NAT Networks | Xuewei Feng, Yuxiang Yang, Qi Li, Xingxiang Zhan, Kun Sun, Ziqiang Wang, Ao Wang, Ganqiu Du, Ke Xu | 2024-10-29 | Download | In this paper, we conduct an empirical study on remote DoS attacks targeting NAT networks. We show that Internet attackers operating outside local NAT networks can remotely identify a NAT device and s... |
| Data streaming platform for crowd-sourced vehicle dataset generation | Felipe Mogollon, Zaloa Fernandez, Angel Martin, Juan Diego Ortega, Gorka Velez | 2024-10-29 | Download | Vehicles are sophisticated machines equipped with sensors that provide real-time data for onboard driving assistance systems. Due to the wide variety of traffic, road, and weather conditions, continuo... |
| Cognitive Semantic Augmentation LEO Satellite Networks for Earth Observation | Hong-fu Chou, Vu Nguyen Ha, Prabhu Thiruvasagam, Thanh-Dung Le, Geoffrey Eappen, Ti Ti Nguyen, Duc Dung Tran, Luis M. Garces-Socarras, Juan Carlos Merlano-Duncan, Symeon Chatzinotas | 2024-10-29 | Download | Earth observation (EO) systems are essential for mapping, catastrophe monitoring, and resource management, but they have trouble processing and sending large amounts of EO data efficiently, especially... |
| A Range-Free Node Localization Method for Anisotropic Wireless Sensor Networks with Sparse Anchors | Yong Jin, Junfang Leng, Lin Zhou, Yu Jiang, Qian Wei | 2024-10-29 | Download | In sensor networks characterized by irregular layouts and poor connectivity, anisotropic properties can significantly reduce the accuracy of distance estimation between nodes, consequently impairing t... |
| Demand-Aware Beam Hopping and Power Allocation for Load Balancing in Digital Twin empowered LEO Satellite Networks | Ruili Zhao, Jun Cai, Jiangtao Luo, Junpeng Gao, Yongyi Ran | 2024-10-29 | Download | Low-Earth orbit (LEO) satellites utilizing beam hopping (BH) technology offer extensive coverage, low latency, high bandwidth, and significant flexibility. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration | Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu | 2024-10-29 | Download | Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache tha... |
| Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs | Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das | 2024-10-29 | Download | Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalizat... |
| Two Criteria for Performance Analysis of Optimization Algorithms | Yunpeng Jing, HaiLin Liu, Qunfeng Liu | 2024-10-29 | Download | Performance analysis is crucial in optimization research, especially when addressing black-box problems through nature-inspired algorithms. Current practices often rely heavily on statistical methods,... |