2025-02-17
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Nexus Machine: An Active Message Inspired Reconfigurable Architecture for Irregular Workloads | Rohan Juneja, Pranav Dangi, Thilini Kaushalya Bandara, Tulika Mitra, Li-shiuan Peh | 2025-02-17 | Download | Modern reconfigurable architectures are increasingly favored for resource-constrained edge devices as they balance high performance, energy efficiency, and programmability well. |
| Hardware-Software Co-Design for Accelerating Transformer Inference Leveraging Compute-in-Memory | Dong Eun Kim, Tanvi Sharma, Kaushik Roy | 2025-02-17 | Download | Transformers have become the backbone of neural network architecture for most machine learning applications. Their widespread use has resulted in multiple efforts on accelerating attention, the basic ... |
| Gem5-AcceSys: Enabling System-Level Exploration of Standard Interconnects for Novel Accelerators | Qunyou Liu, Marina Zapater, David Atienza | 2025-02-17 | Download | The growing demand for efficient, high-performance processing in machine learning (ML) and image processing has made hardware accelerators, such as GPUs and Data Streaming Accelerators (DSAs), increas... |
| Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs | Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich | 2025-02-17 | Download | Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate... |
| HAAN: A Holistic Approach for Accelerating Normalization Operations in Large Language Models | Tianfan Peng, Jiajun Qin, Tianhua Xia, Sai Qian Zhang | 2025-02-17 | Download | Large language models (LLMs) have revolutionized natural language processing (NLP) tasks by achieving state-of-the-art performance across a range of benchmarks. |
| Exploring the Versal AI Engine for 3D Gaussian Splatting | Kotaro Shimamura, Ayumi Ohno, Shinya Takamaeda-Yamazaki | 2025-02-17 | Download | Dataflow-oriented spatial architectures are the emerging paradigm for higher computation performance and efficiency. AMD Versal AI Engine is a commercial spatial architecture consisting of tiles of ... |
| Understanding RowHammer Under Reduced Refresh Latency: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions | Yahya Can Tuğrul, A. Giray Yağlıkçı, İsmail Emir Yüksel, Ataberk Olgun, Oğuzhan Canpolat, Nisa Bostancı, Mohammad Sadrosadati, Oğuz Ergin, Onur Mutlu | 2025-02-17 | Download | RowHammer is a major read disturbance mechanism in DRAM where repeatedly accessing (hammering) a row of DRAM cells (DRAM row) induces bitflips in physically nearby DRAM rows (victim rows). |
| Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication | Ayumi Ohno, Kotaro Shimamura, Shinya Takamaeda-Yamazaki | 2025-02-17 | Download | Multi-scalar multiplication (MSM) is crucial in cryptographic applications and computationally intensive in zero-knowledge proofs. MSM involves accumulating the products of scalars and points on an el... |
| Non-Binary LDPC Arithmetic Error Correction For Processing-in-Memory | Daijing Shi, Yihang Zhu, Anjunyi Fan, Yaoyu Tao, Yuchao Yang, Bonan Yan | 2025-02-17 | Download | Processing-in-memory (PIM) based on emerging devices such as memristors is more vulnerable to noise than traditional memories, due to the physical non-idealities and complex operations in analog domai... |
| SparseZipper: Enhancing Matrix Extensions to Accelerate SpGEMM on CPUs | Tuan Ta, Joshua Randall, Christopher Batten | 2025-02-17 | Download | The importance of general matrix multiplication (GEMM) is motivating new instruction set extensions for multiplying dense matrices in almost all contemporary ISAs, and these extensions are often imple... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Understanding Silent Data Corruption in LLM Training | Jeffrey Ma, Hengzhi Pei, Leonard Lausen, George Karypis | 2025-02-17 | Download | As the scale of training large language models (LLMs) increases, one emergent failure is silent data corruption (SDC), where hardware produces incorrect computations without explicit failure signals. |
| Connecting Large Language Model Agent to High Performance Computing Resource | Heng Ma, Alexander Brace, Carlo Siebenschuh, Greg Pauloski, Ian Foster, Arvind Ramanathan | 2025-02-17 | Download | The Large Language Model agent workflow enables the LLM to invoke tool functions to increase the performance on specific scientific domain questions. |
| Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis | Yuetai Li, Zhangchen Xu, Yiqi Wang, Zihan Zhou, Lei Zhang, Jon Crowcroft | 2025-02-17 | Download | In this paper, we propose a modularized framework for communication processes applicable to crash and Byzantine fault-tolerant consensus protocols. |
| Transaction Fee Market Design for Parallel Execution | Bahar Acilan, Andrei Constantinescu, Lioba Heimbach, Roger Wattenhofer | 2025-02-17 | Download | Given the low throughput of blockchains like Bitcoin and Ethereum, scalability - the ability to process an increasing number of transactions - has become a central focus of blockchain research. |
| Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs | Nazmus Sakib, Tarun Prabhu, Nandakishore Santhi, John Shalf, Abdel-Hameed A. Badawy | 2025-02-17 | Download | Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critic... |
| Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | Jinheng Wang, Hansong Zhou, Ting Song, Shijie Cao, Yan Xia, Ting Cao, Jianyu Wei, Shuming Ma, Hongyu Wang, Furu Wei | 2025-02-17 | Download | The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ... |
| A Proposed End-To-End Principle for Data Commons | Robert L. Grossman | 2025-02-17 | Download | A data commons brings together (or co-locates) data with cloud computing infrastructure and commonly used software services, tools and applications for managing, analyzing and sharing data to create a... |
| On the Locality of the Lovász Local Lemma | Peter Davies-Peck | 2025-02-17 | Download | The Lovász Local Lemma is a versatile result in probability theory, characterizing circumstances in which a collection of "bad events", each occurring with probability at most $p$ and dependent on... |
| Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations | Shahaf Gargir, Sivan Toledo | 2025-02-17 | Download | We present a numerically-stable parallel-in-time linear Kalman smoother. The smoother uses a novel highly-parallel QR factorization for a class of structured sparse matrices for state estimation, and ... |
| InTec: integrated things-edge computing: a framework for distributing machine learning pipelines in edge AI systems | Habib Larian, Faramarz Safi-Esfahani | 2025-02-17 | Download | With the rapid expansion of the Internet of Things (IoT), sensors, smartphones, and wearables have become integral to daily life, powering smart applications in home automation, healthcare, and intell... |
| Robust Set Partitioning Strategy for Malicious Information Detection in Large-Scale Internet of Things | Yuhan Suo, Runqi Chai, Kaiyuan Chen, Senchun Chai, Wannian Liang, Yuanqing Xia | 2025-02-17 | Download | With the rapid development of the Internet of Things (IoT), the risks of data tampering and malicious information injection have intensified, making efficient threat detection in large-scale distribut... |
| Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding | Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin | 2025-02-17 | Download | Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explored parallel decoding by identifying and ... |
| zScore: A Universal Decentralised Reputation System for the Blockchain Economy | Himanshu Udupi, Ashutosh Sahoo, Akshay S. P., Gurukiran S., Parag Paul, Petrus C. Martens | 2025-02-17 | Download | Modern society functions on trust. The onchain economy, however, is built on the founding principles of trustless peer-to-peer interactions in an adversarial environment without a centralised body of ... |
| GPU-accelerated Multi-relational Parallel Graph Retrieval for Web-scale Recommendations | Zhuoning Guo, Guangxing Chen, Qian Gao, Xiaochao Liao, Jianjia Zheng, Lu Shen, Hao Liu | 2025-02-17 | Download | Web recommendations provide personalized items from massive catalogs for users, which rely heavily on retrieval stages to trade off the effectiveness and efficiency of selecting a small relevant set f... |
| BagChain: A Dual-functional Blockchain Leveraging Bagging-based Distributed Learning | Zixiang Cui, Xintong Ling, Xingyu Zhou, Jiaheng Wang, Zhi Ding, Xiqi Gao | 2025-02-17 | Download | This work proposes a dual-functional blockchain framework named BagChain for bagging-based decentralized learning. BagChain integrates blockchain with distributed machine learning by replacing the com... |
| DiSCo: Device-Server Collaborative LLM-Based Text Streaming Services | Ting Sun, Penghan Wang, Fan Lai | 2025-02-17 | Download | The rapid rise of large language models (LLMs) in text streaming services has introduced significant cost and Quality of Experience (QoE) challenges in serving millions of daily requests, especially i... |
| Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning | Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu | 2025-02-17 | Download | High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to ge... |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Reconfigurable Intelligent Surfaces-Assisted Integrated Access and Backhaul | Charitha Madapatha, Behrooz Makki, Hao Guo, Tommy Svensson | 2025-02-17 | Download | In this paper, we study the impact of reconfigurable intelligent surfaces (RISs) on the coverage extension of integrated access and backhaul (IAB) networks. |
| Blank Space: Adaptive Causal Coding for Streaming Communications Over Multi-Hop Networks | Adina Waxman, Shai Ginzach, Aviel Glam, Alejandro Cohen | 2025-02-17 | Download | In this work, we introduce Blank Space AC-RLNC (BS), a novel Adaptive and Causal Network Coding (AC-RLNC) solution designed to mitigate the triplet trade-off between throughput-delay-efficiency in mul... |
| Design Considerations Based on Stability for a Class of TCP Algorithms | Sreekanth Prabhakar, Gaurav Raina | 2025-02-17 | Download | Transmission Control Protocol (TCP) continues to be the dominant transport protocol on the Internet. The stability of fluid models has been a key consideration in the design of TCP and the performance... |
| End-to-End Reliability in Wireless IEEE 802.1Qbv Time-Sensitive Networks | S. Egger, J. Gross, J. Sachs, G. P. Sharma, C. Becker, F. Dürr | 2025-02-17 | Download | Industrial cyber-physical systems require dependable network communication with formal end-to-end reliability guarantees. Striving towards this goal, recent efforts aim to advance the integration of 5... |
| A Unified Modeling Framework for Automated Penetration Testing | Yunfei Wang, Shixuan Liu, Wenhao Wang, Changling Zhou, Chao Zhang, Jiandong Jin, Cheng Zhu | 2025-02-17 | Download | The integration of artificial intelligence into automated penetration testing (AutoPT) has highlighted the necessity of simulation modeling for the training of intelligent agents, due to its cost-effi... |
| Graph Neural Network-based Spectral Filtering Mechanism for Imbalance Classification in Network Digital Twin | Abubakar Isah, Ibrahim Aliyu, Sulaiman Muhammad Rashid, Jaehyung Park, Minsoo Hahn, Jinsul Kim | 2025-02-17 | Download | Graph neural networks are gaining attention in fifth-generation (5G) core network digital twins, which are data-driven complex systems with numerous components. |
| Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning | Yinqiu Liu, Ruichen Zhang, Jiacheng Wang, Dusit Niyato, Xianbin Wang, Dong In Kim, Hongyang Du | 2025-02-17 | Download | Due to massive computational demands of large generative models, AI-Generated Content (AIGC) can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and... |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| A Survey of Fuzzing Open-Source Operating Systems | Kun Hu, Qicai Chen, Wenzhuo Zhang, Zilong Lu, Bihuan Chen, You Lu, Haowen Jiang, Bingkun Sun, Xin Peng, Wenyun Zhao | 2025-02-17 | Download | Vulnerabilities in open-source operating systems (OSs) pose substantial security risks to software systems, making their detection crucial. While fuzzing has been an effective vulnerability detection ... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Gem5-AcceSys: Enabling System-Level Exploration of Standard Interconnects for Novel Accelerators | Qunyou Liu, Marina Zapater, David Atienza | 2025-02-17 | Download | The growing demand for efficient, high-performance processing in machine learning (ML) and image processing has made hardware accelerators, such as GPUs and Data Streaming Accelerators (DSAs), increas... |
| Comparison of Vectorization Capabilities of Different Compilers for X86 and ARM CPUs | Nazmus Sakib, Tarun Prabhu, Nandakishore Santhi, John Shalf, Abdel-Hameed A. Badawy | 2025-02-17 | Download | Most modern processors contain vector units that simultaneously perform the same arithmetic operation over multiple sets of operands. The ability of compilers to automatically vectorize code is critic... |
| Cheesemap: A High-Performance Point-Indexing Data Structure for Neighbor Search in LiDAR Data | Ruben Laso, Miguel Yermo | 2025-02-17 | Download | Point cloud data, as the representation of three-dimensional spatial information, is a fundamental piece of information in various domains where indexing and querying these point clouds efficiently is... |
| Biases in Edge Language Models: Detection, Analysis, and Mitigation | Vinamra Sharma, Danilo Pietro Pau, José Cano | 2025-02-17 | Download | The integration of large language models (LLMs) on low-power edge devices such as Raspberry Pi, known as edge language models (ELMs), has introduced opportunities for more personalized, secure, and lo... |
| Evaluating the Performance of the DeepSeek Model in Confidential Computing Environment | Ben Dong, Qian Wang | 2025-02-17 | Download | The increasing adoption of Large Language Models (LLMs) in cloud environments raises critical security concerns, particularly regarding model confidentiality and data privacy. |