
2025-02-14

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| MEADOW: Memory-efficient Dataflow and Data Packing for Low Power Edge LLMs | Abhishek Moitra, Arkapravo Ghosh, Shrey Agarwal, Aporva Amarnath, Karthik Swaminathan, Priyadarshini Panda | 2025-02-14 | Download | The computational and memory challenges of large language models (LLMs) have sparked several optimization approaches towards their efficient implementation. |
| Lorecast: Layout-Aware Performance and Power Forecasting from Natural Language | Runzhi Wang, Prianka Sengupta, Cristhian Roman-Vicharra, Yiran Chen, Jiang Hu | 2025-02-14 | Download | In chip design planning, obtaining reliable performance and power forecasts for various design options is of critical importance. Traditionally, this involves using system-level models, which often la... |
| Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Bowen Pang, Kai Li, Ruifeng She, Feifan Wang | 2025-02-14 | Download | With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the ... |
| Translating Common Security Assertions Across Processor Designs: A RISC-V Case Study | Sharjeel Imtiaz, Uljana Reinsalu, Tara Ghasempouri | 2025-02-14 | Download | RISC-V is gaining popularity for its adaptability and cost-effectiveness in processor design. With the increasing adoption of RISC-V, the importance of implementing robust security verification has gr... |
| Modeling and Simulating Emerging Memory Technologies: A Tutorial | Yun-Chih Chen, Tristan Seidl, Nils Hölscher, Christian Hakert, Minh Duy Truong, Jian-Jia Chen, João Paulo C. de Lima, Asif Ali Khan, Jeronimo Castrillon, Ali Nezhadi, Lokesh Siddhu, Hassan Nassar, Mahta Mayahinia, Mehdi Baradaran Tahoori, Jörg Henkel, Nils Wilbert, Stefan Wildermann, Jürgen Teich | 2025-02-14 | Download | Non-volatile Memory (NVM) technologies present a promising alternative to traditional volatile memories such as SRAM and DRAM. Due to the limited availability of real NVM devices, simulators play a cr... |
| A Hybrid Edge Classifier: Combining TinyML-Optimised CNN with RRAM-CMOS ACAM for Energy-Efficient Inference | Kieran Woodward, Eiman Kanjo, Georgios Papandroulidakis, Shady Agwa, Themis Prodromakis | 2025-02-14 | Download | In recent years, the development of smart edge computing systems to process information locally is on the rise. Many near-sensor machine learning (ML) approaches have been implemented to introduce acc... |
| Strassen Multisystolic Array Hardware Architectures | Trevor E. Pogue, Nicola Nicolici | 2025-02-14 | Download | While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical spee... |
| EmbBERT: Attention Under 2 MB Memory | Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri | 2025-02-14 | Download | Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task. |
| A Cost-Effective Near-Storage Processing Solution for Offline Inference of Long-Context LLMs | Hongsun Jang, Jaeyong Song, Changmin Shin, Si Ung Noh, Jaewon Jung, Jisung Park, Jinho Lee | 2025-02-14 | Download | The computational and memory demands of large language models for generative inference present significant challenges for practical deployment. |
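The Strassen Multisystolic Array entry above notes that Strassen's algorithm reduces the complexity of naive matrix multiplication. Here is a minimal, software-only sketch of that idea in pure Python. It is not the authors' hardware architecture. It assumes square matrices whose size is a power of two. The helper names `add`, `sub`, `split`, `join`, `naive`, `strassen` and the `leaf` cutoff parameter are hypothetical, chosen for illustration.

```python
# Minimal recursive Strassen sketch (illustrative only, not the paper's design).
# Assumption: inputs are square n x n matrices with n a power of two.
from typing import List

Matrix = List[List[float]]


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise sum of two equally sized matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise difference of two equally sized matrices.
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m: Matrix):
    # Split into four h x h quadrants: top-left, top-right, bottom-left, bottom-right.
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11: Matrix, c12: Matrix, c21: Matrix, c22: Matrix) -> Matrix:
    # Reassemble four quadrants into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


def naive(a: Matrix, b: Matrix) -> Matrix:
    # Textbook O(n^3) product, used as the recursion base case.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def strassen(a: Matrix, b: Matrix, leaf: int = 1) -> Matrix:
    # Hypothetical 'leaf' cutoff: at or below this size, fall back to naive().
    n = len(a)
    if n <= leaf:
        return naive(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven half-size products instead of the eight a blocked naive product needs.
    m1 = strassen(add(a11, a22), add(b11, b22), leaf)
    m2 = strassen(add(a21, a22), b11, leaf)
    m3 = strassen(a11, sub(b12, b22), leaf)
    m4 = strassen(a22, sub(b21, b11), leaf)
    m5 = strassen(add(a11, a12), b22, leaf)
    m6 = strassen(sub(a21, a11), add(b11, b12), leaf)
    m7 = strassen(sub(a12, a22), add(b21, b22), leaf)
    # Recombine: C11 = M1 + M4 - M5 + M7, C12 = M3 + M5,
    #            C21 = M2 + M4,           C22 = M1 - M2 + M3 + M6.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)
    return join(c11, c12, c21, c22)


if __name__ == "__main__":
    print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

At each level the sketch forms seven half-size products (`m1` to `m7`) instead of eight. As a result, the scalar multiplication count grows as n^(log2 7), roughly n^2.81, rather than n^3. The price is extra matrix additions and subtractions, so practical implementations typically switch to the naive product below some size. The `leaf` parameter models that cutoff.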

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Distributed Application Provisioning over Ethereum based private and permissioned Blockchain: Availability modeling, capacity, and costs planning | Carlos Melo, Jamilson Dantas, Paulo Pereira, Paulo Maciel | 2025-02-14 | Download | Blockchain and Cloud Computing are two of the main topics related to the distributed computing paradigm, and in the last decade, they have seen exponential growth in their adoption. |
| A Comprehensive Hyperledger Fabric Performance Evaluation based on Resources Capacity Planning | Carlos Melo, Glauber Gonçalves, Francisco A. Silva, André Soares | 2025-02-14 | Download | Hyperledger Fabric is a platform for permissioned blockchain networks that enables secure and auditable distributed data storage for enterprise applications. |
| Dynamic Fraud Proof | Gabriele Picco, Andrea Fortugno | 2025-02-14 | Download | In this paper, we present a novel fraud-proof mechanism that achieves fast finality and, when combined with optimistic execution, enables real-time transaction processing. |
| Investigations of multi-socket high core count RISC-V for HPC workloads | Nick Brown, Christopher Day | 2025-02-14 | Download | Whilst RISC-V has become popular in fields such as embedded computing, it is yet to find mainstream success in High Performance Computing (HPC). |
| Seamless acceleration of Fortran intrinsics via AMD AI engines | Nick Brown, Gabriel Rodríguez Canal | 2025-02-14 | Download | A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations. |
| Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Bowen Pang, Kai Li, Ruifeng She, Feifan Wang | 2025-02-14 | Download | With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the ... |
| AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence | Zhijie Cai, Xiaowen Cao, Xu Chen, Yuanhao Cui, Guangxu Zhu, Kaibin Huang, Shuguang Cui | 2025-02-14 | Download | Recent breakthroughs in artificial intelligence (AI), wireless communications, and sensing technologies have accelerated the evolution of edge intelligence. |
| Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay | Petru Neague, Quinten Stokkink, Naman Goel, Johan Pouwelse | 2025-02-14 | Download | Centralized search engines are key for the Internet, but lead to undesirable concentration of power. Decentralized alternatives fail to offer equal document retrieval accuracy and speed. |
| Anthemius: Efficient & Modular Block Assembly for Concurrent Execution | Ray Neiheiser, Eleftherios Kokoris-Kogias | 2025-02-14 | Download | Many blockchains such as Ethereum execute all incoming transactions sequentially significantly limiting the potential throughput. A common approach to scale execution is parallel execution engines tha... |
| Janus: Collaborative Vision Transformer Under Dynamic Network Environment | Linyi Jiang, Silvery D. Fu, Yifei Zhu, Bo Li | 2025-02-14 | Download | Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Network architectures and achieved state-of-the-art results in various computer vision tasks. |
| EmbBERT: Attention Under 2 MB Memory | Riccardo Bravin, Massimo Pavan, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri | 2025-02-14 | Download | Transformer architectures based on the attention mechanism have revolutionized natural language processing (NLP), driving major breakthroughs across virtually every NLP task. |
| Energy-Aware Scheduling Strategies for Partially-Replicable Task Chains on Heterogeneous Processors | Yacine Idouar, Adrien Cassagne, Laércio Lima Pilla, Julien Sopena, Manuel Bouyer, Diane Orhan, Lionel Lacassagne, Dimitri Galayko, Denis Barthou, Christophe Jego | 2025-02-14 | Download | The arrival of heterogeneous (or hybrid) multicore architectures has brought new performance trade-offs for applications, and efficiency opportunities to systems. |
| SmartEdge: Smart Healthcare End-to-End Integrated Edge and Cloud Computing System for Diabetes Prediction Enabled by Ensemble Machine Learning | Alain Hennebelle, Qifan Dieng, Leila Ismail, Rajkumar Buyya | 2025-02-14 | Download | The Internet of Things (IoT) revolutionizes smart city domains such as healthcare, transportation, industry, and education. The Internet of Medical Things (IoMT) is gaining prominence, particularly in... |
| The Blind Men and the Elephant: Mapping Interdisciplinarity in Research on Decentralized Autonomous Organizations | Giorgia Sampò, Oliver Baumann, Marco Peressotti | 2025-02-14 | Download | Decentralized Autonomous Organizations (DAOs) are attracting interdisciplinary interest, particularly in business, economics, and computer science. |
| λScale: Enabling Fast Scaling for Serverless Large Language Model Inference | Minchen Yu, Rui Yang, Chaobo Jia, Zhaoyuan Su, Sheng Yao, Tingfeng Lan, Yuchen Yang, Zirui Wang, Yue Cheng, Wei Wang, Ao Wang, Ruichuan Chen | 2025-02-14 | Download | Serverless computing has emerged as a compelling solution for cloud-based model inference. However, as modern large language models (LLMs) continue to grow in size, existing serverless platforms often... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Federated Learning-Driven Cybersecurity Framework for IoT Networks with Privacy-Preserving and Real-Time Threat Detection Capabilities | Milad Rahmati | 2025-02-14 | Download | The rapid expansion of the Internet of Things (IoT) ecosystem has transformed various sectors but has also introduced significant cybersecurity challenges. |
| TrustZero -- open, verifiable and scalable zero-trust | Adrian-Tudor Dumitrescu, Johan Pouwelse | 2025-02-14 | Download | We present a passport-level trust token for Europe. In an era of escalating cyber threats fueled by global competition in economic, military, and technological domains, traditional security models are... |
| Semantica: Decentralized Search using a LLM-Guided Semantic Tree Overlay | Petru Neague, Quinten Stokkink, Naman Goel, Johan Pouwelse | 2025-02-14 | Download | Centralized search engines are key for the Internet, but lead to undesirable concentration of power. Decentralized alternatives fail to offer equal document retrieval accuracy and speed. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Analysis of Stable Vertex Values: Fast Query Evaluation Over An Evolving Graph | Mahbod Afarin, Chao Gao, Xizhe Yin, Zhijia Zhao, Nael Abu-Ghazaleh, Rajiv Gupta | 2025-02-14 | Download | Evaluating a query over a large, irregular graph is inherently challenging. This challenge intensifies when solving a query over a sequence of snapshots of an evolving graph, where changes occur throu... |
| KernelBench: Can LLMs Write Efficient GPU Kernels? | Anne Ouyang, Simon Guo, Simran Arora, Alex L. Zhang, William Hu, Christopher Ré, Azalia Mirhoseini | 2025-02-14 | Download | Efficient GPU kernels are crucial for building performant machine learning architectures, but writing them is a time-consuming challenge that requires significant expertise; therefore, we explore usin... |
| Seamless acceleration of Fortran intrinsics via AMD AI engines | Nick Brown, Gabriel Rodríguez Canal | 2025-02-14 | Download | A major challenge that the HPC community faces is how to continue delivering the performance demanded by scientific programmers, whilst meeting an increased emphasis on sustainable operations. |
| Strassen Multisystolic Array Hardware Architectures | Trevor E. Pogue, Nicola Nicolici | 2025-02-14 | Download | While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical spee... |
