
2025-02-17

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Nexus Machine: An Active Message Inspired Reconfigurable Architecture for Irregular Workloads | Rohan Juneja, Pranav Dangi, Thilini Kaushalya Bandara, Tulika Mitra, Li-shiuan Peh | 2025-02-17 | Download | Modern reconfigurable architectures are increasingly favored for resource-constrained edge devices as they balance high performance, energy efficiency, and programmability well. |
| Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory | Dong Eun Kim, Tanvi Sharma, Kaushik Roy | 2025-02-17 | Download | Transformers have become the backbone of neural network architecture for most machine learning applications. Their widespread use has resulted in multiple efforts on accelerating attention, the basic ... |
| Gem5-AcceSys: Enabling System-Level Exploration of Standard Interconnects for Novel Accelerators | Qunyou Liu, Marina Zapater, David Atienza | 2025-02-17 | Download | The growing demand for efficient, high-performance processing in machine learning (ML) and image processing has made hardware accelerators, such as GPUs and Data Streaming Accelerators (DSAs), increas... |
| Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs | Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich | 2025-02-17 | Download | Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate... |
| HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models | Tianfan Peng, Jiajun Qin, Tianhua Xia, Sai Qian Zhang | 2025-02-17 | Download | Large language models (LLMs) have revolutionized natural language processing (NLP) tasks by achieving state-of-the-art performance across a range of benchmarks. |
| Exploring the Versal AI Engine for 3D Gaussian Splatting | Kotaro Shimamura, Ayumi Ohno, Shinya Takamaeda-Yamazaki | 2025-02-17 | Download | Dataflow-oriented spatial architectures are the emerging paradigm for higher computation performance and efficiency. AMD Versal AI Engine is a commercial spatial architecture consisting of tiles of ... |
| Understanding RowHammer Under Reduced Refresh Latency: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions | Yahya Can Tuğrul, A. Giray Yağlıkçı, İsmail Emir Yüksel, Ataberk Olgun, Oğuzhan Canpolat, Nisa Bostancı, Mohammad Sadrosadati, Oğuz Ergin, Onur Mutlu | 2025-02-17 | Download | RowHammer is a major read disturbance mechanism in DRAM where repeatedly accessing (hammering) a row of DRAM cells (DRAM row) induces bitflips in physically nearby DRAM rows (victim rows). |
| Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication | Ayumi Ohno, Kotaro Shimamura, Shinya Takamaeda-Yamazaki | 2025-02-17 | Download | Multi-scalar multiplication (MSM) is crucial in cryptographic applications and computationally intensive in zero-knowledge proofs. MSM involves accumulating the products of scalars and points on an el... |
| Non-Binary LDPC Arithmetic Error Correction For Processing-in-Memory | Daijing Shi, Yihang Zhu, Anjunyi Fan, Yaoyu Tao, Yuchao Yang, Bonan Yan | 2025-02-17 | Download | Processing-in-memory (PIM) based on emerging devices such as memristors is more vulnerable to noise than traditional memories, due to the physical non-idealities and complex operations in analog domai... |
| SparseZipper: Enhancing Matrix Extensions to Accelerate SpGEMM on CPUs | Tuan Ta, Joshua Randall, Christopher Batten | 2025-02-17 | Download | The importance of general matrix multiplication (GEMM) is motivating new instruction set extensions for multiplying dense matrices in almost all contemporary ISAs, and these extensions are often imple... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Understanding Silent Data Corruption in LLM Training | Jeffrey Ma, Hengzhi Pei, Leonard Lausen, George Karypis | 2025-02-17 | Download | As the scale of training large language models (LLMs) increases, one emergent failure is silent data corruption (SDC), where hardware produces incorrect computations without explicit failure signals. |
| Connecting Large Language Model Agent to High Performance Computing Resource | Heng Ma, Alexander Brace, Carlo Siebenschuh, Greg Pauloski, Ian Foster, Arvind Ramanathan | 2025-02-17 | Download | The Large Language Model agent workflow enables the LLM to invoke tool functions to increase the performance on specific scientific domain questions. |
| Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis | Yuetai Li, Zhangchen Xu, Yiqi Wang, Zihan Zhou, Lei Zhang, Jon Crowcroft | 2025-02-17 | Download | In this paper, we propose a modularized framework for communication processes applicable to crash and Byzantine fault-tolerant consensus protocols. |
| Transaction Fee Market Design for Parallel Execution | Bahar Acilan, Andrei Constantinescu, Lioba Heimbach, Roger Wattenhofer | 2025-02-17 | Download | Given the low throughput of blockchains like Bitcoin and Ethereum, scalability - the ability to process an increasing number of transactions - has become a central focus of blockchain research. |
| Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs | Nazmus Sakib, Tarun Prabhu, Nandakishore Santhi, John Shalf, Abdel-Hameed A. Badawy | 2025-02-17 | Download | Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critic... |
| Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei | 2025-02-17 | Download | The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ... |
| A Proposed End-To-End Principle for Data Commons | Robert L. Grossman | 2025-02-17 | Download | A data commons brings together (or co-locates) data with cloud computing infrastructure and commonly used software services, tools and applications for managing, analyzing and sharing data to create a... |
| On the Locality of the Lovász Local Lemma | Peter Davies-Peck | 2025-02-17 | Download | The Lovász Local Lemma is a versatile result in probability theory, characterizing circumstances in which a collection of n 'bad events', each occurring with probability at most p and dependent on... |
| Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations | Shahaf Gargir, Sivan Toledo | 2025-02-17 | Download | We present a numerically-stable parallel-in-time linear Kalman smoother. The smoother uses a novel highly-parallel QR factorization for a class of structured sparse matrices for state estimation, and ... |
| InTec: integrated things-edge computing: a framework for distributing machine learning pipelines in edge AI systems | Habib Larian, Faramarz Safi-Esfahani | 2025-02-17 | Download | With the rapid expansion of the Internet of Things (IoT), sensors, smartphones, and wearables have become integral to daily life, powering smart applications in home automation, healthcare, and intell... |
| Robust Set Partitioning Strategy for Malicious Information Detection in Large-Scale Internet of Things | Yuhan Suo, Runqi Chai, Kaiyuan Chen, Senchun Chai, Wannian Liang, Yuanqing Xia | 2025-02-17 | Download | With the rapid development of the Internet of Things (IoT), the risks of data tampering and malicious information injection have intensified, making efficient threat detection in large-scale distribut... |
| Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding | Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin | 2025-02-17 | Download | Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explored parallel decoding by identifying and ... |
| zScore: A Universal Decentralised Reputation System for the Blockchain Economy | Himanshu Udupi, Ashutosh Sahoo, Akshay S. P., Gurukiran S., Parag Paul, Petrus C. Martens | 2025-02-17 | Download | Modern society functions on trust. The onchain economy, however, is built on the founding principles of trustless peer-to-peer interactions in an adversarial environment without a centralised body of ... |
| GPU-accelerated Multi-relational Parallel Graph Retrieval for Web-scale Recommendations | Zhuoning Guo, Guangxing Chen, Qian Gao, Xiaochao Liao, Jianjia Zheng, Lu Shen, Hao Liu | 2025-02-17 | Download | Web recommendations provide personalized items from massive catalogs for users, which rely heavily on retrieval stages to trade off the effectiveness and efficiency of selecting a small relevant set f... |
| BagChain: A Dual-functional Blockchain Leveraging Bagging-based Distributed Learning | Zixiang Cui, Xintong Ling, Xingyu Zhou, Jiaheng Wang, Zhi Ding, Xiqi Gao | 2025-02-17 | Download | This work proposes a dual-functional blockchain framework named BagChain for bagging-based decentralized learning. BagChain integrates blockchain with distributed machine learning by replacing the com... |
| DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services | Ting Sun, Penghan Wang, Fan Lai | 2025-02-17 | Download | The rapid rise of large language models (LLMs) in text streaming services has introduced significant cost and Quality of Experience (QoE) challenges in serving millions of daily requests, especially i... |
| Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning | Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu | 2025-02-17 | Download | High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to ge... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Reconfigurable Intelligent Surfaces-Assisted Integrated Access and Backhaul | Charitha Madapatha, Behrooz Makki, Hao Guo, Tommy Svensson | 2025-02-17 | Download | In this paper, we study the impact of reconfigurable intelligent surfaces (RISs) on the coverage extension of integrated access and backhaul (IAB) networks. |
| Blank Space: Adaptive Causal Coding for Streaming Communications Over Multi-Hop Networks | Adina Waxman, Shai Ginzach, Aviel Glam, Alejandro Cohen | 2025-02-17 | Download | In this work, we introduce Blank Space AC-RLNC (BS), a novel Adaptive and Causal Network Coding (AC-RLNC) solution designed to mitigate the triplet trade-off between throughput-delay-efficiency in mul... |
| Design Considerations Based on Stability for a Class of TCP Algorithms | Sreekanth Prabhakar, Gaurav Raina | 2025-02-17 | Download | Transmission Control Protocol (TCP) continues to be the dominant transport protocol on the Internet. The stability of fluid models has been a key consideration in the design of TCP and the performance... |
| End-to-End Reliability in Wireless IEEE 802.1Qbv Time-Sensitive Networks | S. Egger, J. Gross, J. Sachs, G. P. Sharma, C. Becker, F. Dürr | 2025-02-17 | Download | Industrial cyber-physical systems require dependable network communication with formal end-to-end reliability guarantees. Striving towards this goal, recent efforts aim to advance the integration of 5... |
| A Unified Modeling Framework for Automated Penetration Testing | Yunfei Wang, Shixuan Liu, Wenhao Wang, Changling Zhou, Chao Zhang, Jiandong Jin, Cheng Zhu | 2025-02-17 | Download | The integration of artificial intelligence into automated penetration testing (AutoPT) has highlighted the necessity of simulation modeling for the training of intelligent agents, due to its cost-effi... |
| Graph Neural Network-based Spectral Filtering Mechanism for Imbalance Classification in Network Digital Twin | Abubakar Isah, Ibrahim Aliyu, Sulaiman Muhammad Rashid, Jaehyung Park, Minsoo Hahn, Jinsul Kim | 2025-02-17 | Download | Graph neural networks are gaining attention in fifth-generation (5G) core network digital twins, which are data-driven complex systems with numerous components. |
| Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning | Yinqiu Liu, Ruichen Zhang, Jiacheng Wang, Dusit Niyato, Xianbin Wang, Dong In Kim, Hongyang Du | 2025-02-17 | Download | Due to massive computational demands of large generative models, AI-Generated Content (AIGC) can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and... |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| A Survey of Fuzzing Open-Source Operating Systems | Kun Hu, Qicai Chen, Wenzhuo Zhang, Zilong Lu, Bihuan Chen, You Lu, Haowen Jiang, Bingkun Sun, Xin Peng, Wenyun Zhao | 2025-02-17 | Download | Vulnerabilities in open-source operating systems (OSs) pose substantial security risks to software systems, making their detection crucial. While fuzzing has been an effective vulnerability detection ... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Gem5-AcceSys: Enabling System-Level Exploration of Standard Interconnects for Novel Accelerators | Qunyou Liu, Marina Zapater, David Atienza | 2025-02-17 | Download | The growing demand for efficient, high-performance processing in machine learning (ML) and image processing has made hardware accelerators, such as GPUs and Data Streaming Accelerators (DSAs), increas... |
| Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs | Nazmus Sakib, Tarun Prabhu, Nandakishore Santhi, John Shalf, Abdel-Hameed A. Badawy | 2025-02-17 | Download | Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critic... |
| Cheesemap: A High-Performance Point-Indexing Data Structure for Neighbor Search in LiDAR Data | Ruben Laso, Miguel Yermo | 2025-02-17 | Download | Point cloud data, as the representation of three-dimensional spatial information, is a fundamental piece of information in various domains where indexing and querying these point clouds efficiently is... |
| Biases in Edge Language Models: Detection, Analysis, and Mitigation | Vinamra Sharma, Danilo Pietro Pau, José Cano | 2025-02-17 | Download | The integration of large language models (LLMs) on low-power edge devices such as Raspberry Pi, known as edge language models (ELMs), has introduced opportunities for more personalized, secure, and lo... |
| Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment | Ben Dong, Qian Wang | 2025-02-17 | Download | The increasing adoption of Large Language Models (LLMs) in cloud environments raises critical security concerns, particularly regarding model confidentiality and data privacy. |
