2025-02-17

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Nexus Machine: An Active Message Inspired Reconfigurable Architecture for Irregular Workloads	Rohan Juneja, Pranav Dangi, Thilini Kaushalya Bandara, Tulika Mitra, Li-shiuan Peh	2025-02-17	Download	Modern reconfigurable architectures are increasingly favored for resource-constrained edge devices as they balance high performance, energy efficiency, and programmability well.
Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory	Dong Eun Kim, Tanvi Sharma, Kaushik Roy	2025-02-17	Download	Transformers have become the backbone of neural network architecture for most machine learning applications. Their widespread use has resulted in multiple efforts on accelerating attention, the basic ...
Gem5-AcceSys: Enabling System-Level Exploration of Standard Interconnects for Novel Accelerators	Qunyou Liu, Marina Zapater, David Atienza	2025-02-17	Download	The growing demand for efficient, high-performance processing in machine learning (ML) and image processing has made hardware accelerators, such as GPUs and Data Streaming Accelerators (DSAs), increas...
Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs	Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich	2025-02-17	Download	Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate...
HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models	Tianfan Peng, Jiajun Qin, Tianhua Xia, Sai Qian Zhang	2025-02-17	Download	Large language models (LLMs) have revolutionized natural language processing (NLP) tasks by achieving state-of-the-art performance across a range of benchmarks.
Exploring the Versal AI Engine for 3D Gaussian Splatting	Kotaro Shimamura, Ayumi Ohno, Shinya Takamaeda-Yamazaki	2025-02-17	Download	Dataflow-oriented spatial architectures are the emerging paradigm for higher computation performance and efficiency. AMD Versal AI Engine is a commercial spatial architecture consisting of tiles of ...
Understanding RowHammer Under Reduced Refresh Latency: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions	Yahya Can Tuğrul, A. Giray Yağlıkçı, İsmail Emir Yüksel, Ataberk Olgun, Oğuzhan Canpolat, Nisa Bostancı, Mohammad Sadrosadati, Oğuz Ergin, Onur Mutlu	2025-02-17	Download	RowHammer is a major read disturbance mechanism in DRAM where repeatedly accessing (hammering) a row of DRAM cells (DRAM row) induces bitflips in physically nearby DRAM rows (victim rows).
Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication	Ayumi Ohno, Kotaro Shimamura, Shinya Takamaeda-Yamazaki	2025-02-17	Download	Multi-scalar multiplication (MSM) is crucial in cryptographic applications and computationally intensive in zero-knowledge proofs. MSM involves accumulating the products of scalars and points on an el...
Non-Binary LDPC Arithmetic Error Correction For Processing-in-Memory	Daijing Shi, Yihang Zhu, Anjunyi Fan, Yaoyu Tao, Yuchao Yang, Bonan Yan	2025-02-17	Download	Processing-in-memory (PIM) based on emerging devices such as memristors is more vulnerable to noise than traditional memories, due to the physical non-idealities and complex operations in analog domai...
SparseZipper: Enhancing Matrix Extensions to Accelerate SpGEMM on CPUs	Tuan Ta, Joshua Randall, Christopher Batten	2025-02-17	Download	The importance of general matrix multiplication (GEMM) is motivating new instruction set extensions for multiplying dense matrices in almost all contemporary ISAs, and these extensions are often imple...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Understanding Silent Data Corruption in LLM Training	Jeffrey Ma, Hengzhi Pei, Leonard Lausen, George Karypis	2025-02-17	Download	As the scale of training large language models (LLMs) increases, one emergent failure is silent data corruption (SDC), where hardware produces incorrect computations without explicit failure signals.
Connecting Large Language Model Agent to High Performance Computing Resource	Heng Ma, Alexander Brace, Carlo Siebenschuh, Greg Pauloski, Ian Foster, Arvind Ramanathan	2025-02-17	Download	The Large Language Model agent workflow enables the LLM to invoke tool functions to increase the performance on specific scientific domain questions.
Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis	Yuetai Li, Zhangchen Xu, Yiqi Wang, Zihan Zhou, Lei Zhang, Jon Crowcroft	2025-02-17	Download	In this paper, we propose a modularized framework for communication processes applicable to crash and Byzantine fault-tolerant consensus protocols.
Transaction Fee Market Design for Parallel Execution	Bahar Acilan, Andrei Constantinescu, Lioba Heimbach, Roger Wattenhofer	2025-02-17	Download	Given the low throughput of blockchains like Bitcoin and Ethereum, scalability - the ability to process an increasing number of transactions - has become a central focus of blockchain research.
Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs	Nazmus Sakib, Tarun Prabhu, Nandakishore Santhi, John Shalf, Abdel-Hameed A. Badawy	2025-02-17	Download	Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critic...
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs	Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei	2025-02-17	Download	The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ...
A Proposed End-To-End Principle for Data Commons	Robert L. Grossman	2025-02-17	Download	A data commons brings together (or co-locates) data with cloud computing infrastructure and commonly used software services, tools and applications for managing, analyzing and sharing data to create a...
On the Locality of the Lovász Local Lemma	Peter Davies-Peck	2025-02-17	Download	The Lovász Local Lemma is a versatile result in probability theory, characterizing circumstances in which a collection of $n$ `bad events', each occurring with probability at most $p$ and dependent on...
Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations	Shahaf Gargir, Sivan Toledo	2025-02-17	Download	We present a numerically-stable parallel-in-time linear Kalman smoother. The smoother uses a novel highly-parallel QR factorization for a class of structured sparse matrices for state estimation, and ...
InTec: integrated things-edge computing: a framework for distributing machine learning pipelines in edge AI systems	Habib Larian, Faramarz Safi-Esfahani	2025-02-17	Download	With the rapid expansion of the Internet of Things (IoT), sensors, smartphones, and wearables have become integral to daily life, powering smart applications in home automation, healthcare, and intell...
Robust Set Partitioning Strategy for Malicious Information Detection in Large-Scale Internet of Things	Yuhan Suo, Runqi Chai, Kaiyuan Chen, Senchun Chai, Wannian Liang, Yuanqing Xia	2025-02-17	Download	With the rapid development of the Internet of Things (IoT), the risks of data tampering and malicious information injection have intensified, making efficient threat detection in large-scale distribut...
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding	Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin	2025-02-17	Download	Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explored parallel decoding by identifying and ...
zScore: A Universal Decentralised Reputation System for the Blockchain Economy	Himanshu Udupi, Ashutosh Sahoo, Akshay S. P., Gurukiran S., Parag Paul, Petrus C. Martens	2025-02-17	Download	Modern society functions on trust. The onchain economy, however, is built on the founding principles of trustless peer-to-peer interactions in an adversarial environment without a centralised body of ...
GPU-accelerated Multi-relational Parallel Graph Retrieval for Web-scale Recommendations	Zhuoning Guo, Guangxing Chen, Qian Gao, Xiaochao Liao, Jianjia Zheng, Lu Shen, Hao Liu	2025-02-17	Download	Web recommendations provide personalized items from massive catalogs for users, which rely heavily on retrieval stages to trade off the effectiveness and efficiency of selecting a small relevant set f...
BagChain: A Dual-functional Blockchain Leveraging Bagging-based Distributed Learning	Zixiang Cui, Xintong Ling, Xingyu Zhou, Jiaheng Wang, Zhi Ding, Xiqi Gao	2025-02-17	Download	This work proposes a dual-functional blockchain framework named BagChain for bagging-based decentralized learning. BagChain integrates blockchain with distributed machine learning by replacing the com...
DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services	Ting Sun, Penghan Wang, Fan Lai	2025-02-17	Download	The rapid rise of large language models (LLMs) in text streaming services has introduced significant cost and Quality of Experience (QoE) challenges in serving millions of daily requests, especially i...
Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning	Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu	2025-02-17	Download	High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to ge...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Reconfigurable Intelligent Surfaces-Assisted Integrated Access and Backhaul	Charitha Madapatha, Behrooz Makki, Hao Guo, Tommy Svensson	2025-02-17	Download	In this paper, we study the impact of reconfigurable intelligent surfaces (RISs) on the coverage extension of integrated access and backhaul (IAB) networks.
Blank Space: Adaptive Causal Coding for Streaming Communications Over Multi-Hop Networks	Adina Waxman, Shai Ginzach, Aviel Glam, Alejandro Cohen	2025-02-17	Download	In this work, we introduce Blank Space AC-RLNC (BS), a novel Adaptive and Causal Network Coding (AC-RLNC) solution designed to mitigate the triplet trade-off between throughput-delay-efficiency in mul...
Design Considerations Based on Stability for a Class of TCP Algorithms	Sreekanth Prabhakar, Gaurav Raina	2025-02-17	Download	Transmission Control Protocol (TCP) continues to be the dominant transport protocol on the Internet. The stability of fluid models has been a key consideration in the design of TCP and the performance...
End-to-End Reliability in Wireless IEEE 802.1Qbv Time-Sensitive Networks	S. Egger, J. Gross, J. Sachs, G. P. Sharma, C. Becker, F. Dürr	2025-02-17	Download	Industrial cyber-physical systems require dependable network communication with formal end-to-end reliability guarantees. Striving towards this goal, recent efforts aim to advance the integration of 5...
A Unified Modeling Framework for Automated Penetration Testing	Yunfei Wang, Shixuan Liu, Wenhao Wang, Changling Zhou, Chao Zhang, Jiandong Jin, Cheng Zhu	2025-02-17	Download	The integration of artificial intelligence into automated penetration testing (AutoPT) has highlighted the necessity of simulation modeling for the training of intelligent agents, due to its cost-effi...
Graph Neural Network-based Spectral Filtering Mechanism for Imbalance Classification in Network Digital Twin	Abubakar Isah, Ibrahim Aliyu, Sulaiman Muhammad Rashid, Jaehyung Park, Minsoo Hahn, Jinsul Kim	2025-02-17	Download	Graph neural networks are gaining attention in fifth-generation (5G) core network digital twins, which are data-driven complex systems with numerous components.
Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning	Yinqiu Liu, Ruichen Zhang, Jiacheng Wang, Dusit Niyato, Xianbin Wang, Dong In Kim, Hongyang Du	2025-02-17	Download	Due to massive computational demands of large generative models, AI-Generated Content (AIGC) can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
A Survey of Fuzzing Open-Source Operating Systems	Kun Hu, Qicai Chen, Wenzhuo Zhang, Zilong Lu, Bihuan Chen, You Lu, Haowen Jiang, Bingkun Sun, Xin Peng, Wenyun Zhao	2025-02-17	Download	Vulnerabilities in open-source operating systems (OSs) pose substantial security risks to software systems, making their detection crucial. While fuzzing has been an effective vulnerability detection ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Gem5-AcceSys: Enabling System-Level Exploration of Standard Interconnects for Novel Accelerators	Qunyou Liu, Marina Zapater, David Atienza	2025-02-17	Download	The growing demand for efficient, high-performance processing in machine learning (ML) and image processing has made hardware accelerators, such as GPUs and Data Streaming Accelerators (DSAs), increas...
Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs	Nazmus Sakib, Tarun Prabhu, Nandakishore Santhi, John Shalf, Abdel-Hameed A. Badawy	2025-02-17	Download	Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critic...
Cheesemap: A High-Performance Point-Indexing Data Structure for Neighbor Search in LiDAR Data	Ruben Laso, Miguel Yermo	2025-02-17	Download	Point cloud data, as the representation of three-dimensional spatial information, is a fundamental piece of information in various domains where indexing and querying these point clouds efficiently is...
Biases in Edge Language Models: Detection, Analysis, and Mitigation	Vinamra Sharma, Danilo Pietro Pau, José Cano	2025-02-17	Download	The integration of large language models (LLMs) on low-power edge devices such as Raspberry Pi, known as edge language models (ELMs), has introduced opportunities for more personalized, secure, and lo...
Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment	Ben Dong, Qian Wang	2025-02-17	Download	The increasing adoption of Large Language Models (LLMs) in cloud environments raises critical security concerns, particularly regarding model confidentiality and data privacy.