2025-02-14

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs	Abhishek Moitra, Arkapravo Ghosh, Shrey Agarwal, Aporva Amarnath, Karthik Swaminathan, Priyadarshini Panda	2025-02-14	Download	The computational and memory challenges of large language models (LLMs) have sparked several optimization approaches towards their efficient implementation.
Lorecast: Layout-Aware Performance and Power Forecasting from Natural Language	Runzhi Wang, Prianka Sengupta, Cristhian Roman-Vicharra, Yiran Chen, Jiang Hu	2025-02-14	Download	In chip design planning, obtaining reliable performance and power forecasts for various design options is of critical importance. Traditionally, this involves using system-level models, which often la...
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization	Bowen Pang, Kai Li, Ruifeng She, Feifan Wang	2025-02-14	Download	With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the ...
Translating Common Security Assertions Across Processor Designs: A RISC-V Case Study	Sharjeel Imtiaz, Uljana Reinsalu, Tara Ghasempouri	2025-02-14	Download	RISC-V is gaining popularity for its adaptability and cost-effectiveness in processor design. With the increasing adoption of RISC-V, the importance of implementing robust security verification has gr...
Modeling and Simulating Emerging Memory Technologies: A Tutorial	Yun-Chih Chen, Tristan Seidl, Nils Hölscher, Christian Hakert, Minh Duy Truong, Jian-Jia Chen, João Paulo C. de Lima, Asif Ali Khan, Jeronimo Castrillon, Ali Nezhadi, Lokesh Siddhu, Hassan Nassar, Mahta Mayahinia, Mehdi Baradaran Tahoori, Jörg Henkel, Nils Wilbert, Stefan Wildermann, Jürgen Teich	2025-02-14	Download	Non-volatile Memory (NVM) technologies present a promising alternative to traditional volatile memories such as SRAM and DRAM. Due to the limited availability of real NVM devices, simulators play a cr...
A Hybrid Edge Classifier: Combining TinyML-Optimised CNN with RRAM-CMOS ACAM for Energy-Efficient Inference	Kieran Woodward, Eiman Kanjo, Georgios Papandroulidakis, Shady Agwa, Themis Prodromakis	2025-02-14	Download	In recent years, the development of smart edge computing systems to process information locally is on the rise. Many near-sensor machine learning (ML) approaches have been implemented to introduce acc...
Strassen Multisystolic Array Hardware Architectures	Trevor E. Pogue, Nicola Nicolici	2025-02-14	Download	While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical spee...
EmbBERT: Attention Under 2 MB Memory	Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri	2025-02-14	Download	Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task.
A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs	Hongsun Jang, Jaeyong Song, Changmin Shin, Si Ung Noh, Jaewon Jung, Jisung Park, Jinho Lee	2025-02-14	Download	The computational and memory demands of large language models for generative inference present significant challenges for practical deployment.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Distributed Application Provisioning over Ethereum based private and permissioned Blockchain: Availability modeling, capacity, and costs planning	Carlos Melo, Jamilson Dantas, Paulo Pereira, Paulo Maciel	2025-02-14	Download	Blockchain and Cloud Computing are two of the main topics related to the distributed computing paradigm, and in the last decade, they have seen exponential growth in their adoption.
A Comprehensive Hyperledger Fabric Performance Evaluation based on Resources Capacity Planning	Carlos Melo, Glauber Gonçalves, Francisco A. Silva, André Soares	2025-02-14	Download	Hyperledger Fabric is a platform for permissioned blockchain networks that enables secure and auditable distributed data storage for enterprise applications.
Dynamic Fraud Proof	Gabriele Picco, Andrea Fortugno	2025-02-14	Download	In this paper, we present a novel fraud-proof mechanism that achieves fast finality and, when combined with optimistic execution, enables real-time transaction processing.
Investigations of multi-socket high core count RISC-V for HPC workloads	Nick Brown, Christopher Day	2025-02-14	Download	Whilst RISC-V has become popular in fields such as embedded computing, it is yet to find mainstream success in High Performance Computing (HPC).
Seamless acceleration of Fortran intrinsics via AMD AI engines	Nick Brown, Gabriel Rodríguez Canal	2025-02-14	Download	A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations.
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization	Bowen Pang, Kai Li, Ruifeng She, Feifan Wang	2025-02-14	Download	With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the ...
AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence	Zhijie Cai, Xiaowen Cao, Xu Chen, Yuanhao Cui, Guangxu Zhu, Kaibin Huang, Shuguang Cui	2025-02-14	Download	Recent breakthroughs in artificial intelligence (AI), wireless communications, and sensing technologies have accelerated the evolution of edge intelligence.
Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay	Petru Neague, Quinten Stokkink, Naman Goel, Johan Pouwelse	2025-02-14	Download	Centralized search engines are key for the Internet, but lead to undesirable concentration of power. Decentralized alternatives fail to offer equal document retrieval accuracy and speed.
Anthemius: Efficient & Modular Block Assembly for Concurrent Execution	Ray Neiheiser, Eleftherios Kokoris-Kogias	2025-02-14	Download	Many blockchains such as Ethereum execute all incoming transactions sequentially significantly limiting the potential throughput. A common approach to scale execution is parallel execution engines tha...
Janus: Collaborative Vision Transformer Under Dynamic Network Environment	Linyi Jiang, Silvery D. Fu, Yifei Zhu, Bo Li	2025-02-14	Download	Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Network architectures and achieved state-of-the-art results in various computer vision tasks.
EmbBERT: Attention Under 2 MB Memory	Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri	2025-02-14	Download	Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task.
Energy-Aware Scheduling Strategies for Partially-Replicable Task Chains on Heterogeneous Processors	Yacine Idouar, Adrien Cassagne, Laércio Lima Pilla, Julien Sopena, Manuel Bouyer, Diane Orhan, Lionel Lacassagne, Dimitri Galayko, Denis Barthou, Christophe Jego	2025-02-14	Download	The arrival of heterogeneous (or hybrid) multicore architectures has brought new performance trade-offs for applications, and efficiency opportunities to systems.
SmartEdge: Smart Healthcare End-to-End Integrated Edge and Cloud Computing System for Diabetes Prediction Enabled by Ensemble Machine Learning	Alain Hennebelle, Qifan Dieng, Leila Ismail, Rajkumar Buyya	2025-02-14	Download	The Internet of Things (IoT) revolutionizes smart city domains such as healthcare, transportation, industry, and education. The Internet of Medical Things (IoMT) is gaining prominence, particularly in...
The Blind Men and the Elephant: Mapping Interdisciplinarity in Research on Decentralized Autonomous Organizations	Giorgia Sampò, Oliver Baumann, Marco Peressotti	2025-02-14	Download	Decentralized Autonomous Organizations (DAOs) are attracting interdisciplinary interest, particularly in business, economics, and computer science.
λScale: Enabling Fast Scaling for Serverless Large Language Model Inference	Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Zirui Wang, Yue Cheng, Wei Wang, Ao Wang, Ruichuan Chen	2025-02-14	Download	Serverless computing has emerged as a compelling solution for cloud-based model inference. However, as modern large language models (LLMs) continue to grow in size, existing serverless platforms often...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Federated Learning-Driven Cybersecurity Framework for IoT Networks with Privacy-Preserving and Real-Time Threat Detection Capabilities	Milad Rahmati	2025-02-14	Download	The rapid expansion of the Internet of Things (IoT) ecosystem has transformed various sectors but has also introduced significant cybersecurity challenges.
TrustZero -- open, verifiable and scalable zero-trust	Adrian-Tudor Dumitrescu, Johan Pouwelse	2025-02-14	Download	We present a passport-level trust token for Europe. In an era of escalating cyber threats fueled by global competition in economic, military, and technological domains, traditional security models are...
Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay	Petru Neague, Quinten Stokkink, Naman Goel, Johan Pouwelse	2025-02-14	Download	Centralized search engines are key for the Internet, but lead to undesirable concentration of power. Decentralized alternatives fail to offer equal document retrieval accuracy and speed.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Analysis of Stable Vertex Values: Fast Query Evaluation Over An Evolving Graph	Mahbod Afarin, Chao Gao, Xizhe Yin, Zhijia Zhao, Nael Abu-Ghazaleh, Rajiv Gupta	2025-02-14	Download	Evaluating a query over a large, irregular graph is inherently challenging. This challenge intensifies when solving a query over a sequence of snapshots of an evolving graph, where changes occur throu...
KernelBench: Can LLMs Write Efficient GPU Kernels?	Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini	2025-02-14	Download	Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore usin...
Seamless acceleration of Fortran intrinsics via AMD AI engines	Nick Brown, Gabriel Rodríguez Canal	2025-02-14	Download	A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations.
Strassen Multisystolic Array Hardware Architectures	Trevor E. Pogue, Nicola Nicolici	2025-02-14	Download	While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical spee...