2025-02-14
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs | Abhishek Moitra, Arkapravo Ghosh, Shrey Agarwal, Aporva Amarnath, Karthik Swaminathan, Priyadarshini Panda | 2025-02-14 | Download | The computational and memory challenges of large language models (LLMs) have sparked several optimization approaches towards their efficient implementation. |
| Lorecast: Layout-Aware Performance and Power Forecasting from Natural Language | Runzhi Wang, Prianka Sengupta, Cristhian Roman-Vicharra, Yiran Chen, Jiang Hu | 2025-02-14 | Download | In chip design planning, obtaining reliable performance and power forecasts for various design options is of critical importance. Traditionally, this involves using system-level models, which often la... |
| Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Bowen Pang, Kai Li, Ruifeng She, Feifan Wang | 2025-02-14 | Download | With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the ... |
| Translating Common Security Assertions Across Processor Designs: A RISC-V Case Study | Sharjeel Imtiaz, Uljana Reinsalu, Tara Ghasempouri | 2025-02-14 | Download | RISC-V is gaining popularity for its adaptability and cost-effectiveness in processor design. With the increasing adoption of RISC-V, the importance of implementing robust security verification has gr... |
| Modeling and Simulating Emerging Memory Technologies: A Tutorial | Yun-Chih Chen, Tristan Seidl, Nils Hölscher, Christian Hakert, Minh Duy Truong, Jian-Jia Chen, João Paulo C. de Lima, Asif Ali Khan, Jeronimo Castrillon, Ali Nezhadi, Lokesh Siddhu, Hassan Nassar, Mahta Mayahinia, Mehdi Baradaran Tahoori, Jörg Henkel, Nils Wilbert, Stefan Wildermann, Jürgen Teich | 2025-02-14 | Download | Non-volatile Memory (NVM) technologies present a promising alternative to traditional volatile memories such as SRAM and DRAM. Due to the limited availability of real NVM devices, simulators play a cr... |
| A Hybrid Edge Classifier: Combining TinyML-Optimised CNN with RRAM-CMOS ACAM for Energy-Efficient Inference | Kieran Woodward, Eiman Kanjo, Georgios Papandroulidakis, Shady Agwa, Themis Prodromakis | 2025-02-14 | Download | In recent years, the development of smart edge computing systems to process information locally is on the rise. Many near-sensor machine learning (ML) approaches have been implemented to introduce acc... |
| Strassen Multisystolic Array Hardware Architectures | Trevor E. Pogue, Nicola Nicolici | 2025-02-14 | Download | While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical spee... |
| EmbBERT: Attention Under 2 MB Memory | Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri | 2025-02-14 | Download | Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task. |
| A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs | Hongsun Jang, Jaeyong Song, Changmin Shin, Si Ung Noh, Jaewon Jung, Jisung Park, Jinho Lee | 2025-02-14 | Download | The computational and memory demands of large language models for generative inference present significant challenges for practical deployment. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Distributed Application Provisioning over Ethereum based private and permissioned Blockchain: Availability modeling, capacity, and costs planning | Carlos Melo, Jamilson Dantas, Paulo Pereira, Paulo Maciel | 2025-02-14 | Download | Blockchain and Cloud Computing are two of the main topics related to the distributed computing paradigm, and in the last decade, they have seen exponential growth in their adoption. |
| A Comprehensive Hyperledger Fabric Performance Evaluation based on Resources Capacity Planning | Carlos Melo, Glauber Gonçalves, Francisco A. Silva, André Soares | 2025-02-14 | Download | Hyperledger Fabric is a platform for permissioned blockchain networks that enables secure and auditable distributed data storage for enterprise applications. |
| Dynamic Fraud Proof | Gabriele Picco, Andrea Fortugno | 2025-02-14 | Download | In this paper, we present a novel fraud-proof mechanism that achieves fast finality and, when combined with optimistic execution, enables real-time transaction processing. |
| Investigations of multi-socket high core count RISC-V for HPC workloads | Nick Brown, Christopher Day | 2025-02-14 | Download | Whilst RISC-V has become popular in fields such as embedded computing, it is yet to find mainstream success in High Performance Computing (HPC). |
| Seamless acceleration of Fortran intrinsics via AMD AI engines | Nick Brown, Gabriel Rodríguez Canal | 2025-02-14 | Download | A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations. |
| Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Bowen Pang, Kai Li, Ruifeng She, Feifan Wang | 2025-02-14 | Download | With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the ... |
| AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence | Zhijie Cai, Xiaowen Cao, Xu Chen, Yuanhao Cui, Guangxu Zhu, Kaibin Huang, Shuguang Cui | 2025-02-14 | Download | Recent breakthroughs in artificial intelligence (AI), wireless communications, and sensing technologies have accelerated the evolution of edge intelligence. |
| Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay | Petru Neague, Quinten Stokkink, Naman Goel, Johan Pouwelse | 2025-02-14 | Download | Centralized search engines are key for the Internet, but lead to undesirable concentration of power. Decentralized alternatives fail to offer equal document retrieval accuracy and speed. |
| Anthemius: Efficient & Modular Block Assembly for Concurrent Execution | Ray Neiheiser, Eleftherios Kokoris-Kogias | 2025-02-14 | Download | Many blockchains such as Ethereum execute all incoming transactions sequentially significantly limiting the potential throughput. A common approach to scale execution is parallel execution engines tha... |
| Janus: Collaborative Vision Transformer Under Dynamic Network Environment | Linyi Jiang, Silvery D. Fu, Yifei Zhu, Bo Li | 2025-02-14 | Download | Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Network architectures and achieved state-of-the-art results in various computer vision tasks. |
| EmbBERT: Attention Under 2 MB Memory | Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri | 2025-02-14 | Download | Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task. |
| Energy-Aware Scheduling Strategies for Partially-Replicable Task Chains on Heterogeneous Processors | Yacine Idouar, Adrien Cassagne, Laércio Lima Pilla, Julien Sopena, Manuel Bouyer, Diane Orhan, Lionel Lacassagne, Dimitri Galayko, Denis Barthou, Christophe Jego | 2025-02-14 | Download | The arrival of heterogeneous (or hybrid) multicore architectures has brought new performance trade-offs for applications, and efficiency opportunities to systems. |
| SmartEdge: Smart Healthcare End-to-End Integrated Edge and Cloud Computing System for Diabetes Prediction Enabled by Ensemble Machine Learning | Alain Hennebelle, Qifan Dieng, Leila Ismail, Rajkumar Buyya | 2025-02-14 | Download | The Internet of Things (IoT) revolutionizes smart city domains such as healthcare, transportation, industry, and education. The Internet of Medical Things (IoMT) is gaining prominence, particularly in... |
| The Blind Men and the Elephant: Mapping Interdisciplinarity in Research on Decentralized Autonomous Organizations | Giorgia Sampò, Oliver Baumann, Marco Peressotti | 2025-02-14 | Download | Decentralized Autonomous Organizations (DAOs) are attracting interdisciplinary interest, particularly in business, economics, and computer science. |
| λScale: Enabling Fast Scaling for Serverless Large Language Model Inference | Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Zirui Wang, Yue Cheng, Wei Wang, Ao Wang, Ruichuan Chen | 2025-02-14 | Download | Serverless computing has emerged as a compelling solution for cloud-based model inference. However, as modern large language models (LLMs) continue to grow in size, existing serverless platforms often... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Federated Learning-Driven Cybersecurity Framework for IoT Networks with Privacy-Preserving and Real-Time Threat Detection Capabilities | Milad Rahmati | 2025-02-14 | Download | The rapid expansion of the Internet of Things (IoT) ecosystem has transformed various sectors but has also introduced significant cybersecurity challenges. |
| TrustZero -- open, verifiable and scalable zero-trust | Adrian-Tudor Dumitrescu, Johan Pouwelse | 2025-02-14 | Download | We present a passport-level trust token for Europe. In an era of escalating cyber threats fueled by global competition in economic, military, and technological domains, traditional security models are... |
| Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay | Petru Neague, Quinten Stokkink, Naman Goel, Johan Pouwelse | 2025-02-14 | Download | Centralized search engines are key for the Internet, but lead to undesirable concentration of power. Decentralized alternatives fail to offer equal document retrieval accuracy and speed. |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Analysis of Stable Vertex Values: Fast Query Evaluation Over An Evolving Graph | Mahbod Afarin, Chao Gao, Xizhe Yin, Zhijia Zhao, Nael Abu-Ghazaleh, Rajiv Gupta | 2025-02-14 | Download | Evaluating a query over a large, irregular graph is inherently challenging. This challenge intensifies when solving a query over a sequence of snapshots of an evolving graph, where changes occur throu... |
| KernelBench: Can LLMs Write Efficient GPU Kernels? | Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini | 2025-02-14 | Download | Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore usin... |
| Seamless acceleration of Fortran intrinsics via AMD AI engines | Nick Brown, Gabriel Rodríguez Canal | 2025-02-14 | Download | A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations. |
| Strassen Multisystolic Array Hardware Architectures | Trevor E. Pogue, Nicola Nicolici | 2025-02-14 | Download | While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical spee... |