
2025-03-18

cs.AR - Architecture

Title | Authors | Published | PDF | Abstract
NeCTAr: A Heterogeneous RISC-V SoC for Language Model Inference in Intel 16 | Viansa Schmulbach, Jason Kim, Ethan Gao, Lucy Revina, Nikhil Jha, Ethan Wu, Borivoje Nikolic | 2025-03-18 | Download | This paper introduces NeCTAr (Near-Cache Transformer Accelerator), a 16nm heterogeneous multicore RISC-V SoC for sparse and dense machine learning kernels with both near-core and near-memory accelerat...
Retrospective: A CORDIC Based Configurable Activation Function for NN Applications | Omkar Kokane, Gopal Raut, Salim Ullah, Mukul Lokhande, Adam Teman, Akash Kumar, Santosh Kumar Vishvakarma | 2025-03-18 | Download | A CORDIC-based configuration for the design of Activation Functions (AF) was previously suggested to accelerate ASIC hardware design for resource-constrained systems by providing functional reconfigur...
Speculative Decoding for Verilog: Speed and Quality, All in One | Changran Xu, Yi Liu, Yunhao Zhou, Shan Huang, Ningyi Xu, Qiang Xu | 2025-03-18 | Download | The rapid advancement of large language models (LLMs) has revolutionized code generation tasks across various programming languages. However, the unique characteristics of programming languages, parti...
Streamlining SIMD ISA Extensions with Takum Arithmetic: A Case Study on Intel AVX10.2 | Laslo Hunhold | 2025-03-18 | Download | Modern microprocessors extend their instruction set architecture (ISA) with Single Instruction, Multiple Data (SIMD) operations to improve performance.
A Modular Edge Device Network for Surgery Digitalization | Vincent Schorp, Frédéric Giraud, Gianluca Pargätzi, Michael Wäspe, Lorenzo von Ritter-Zahony, Marcel Wegmann, Nicola A. Cavalcanti, John Garcia Henao, Nicholas Bünger, Dominique Cachin, Sebastiano Caprara, Philipp Fürnstahl, Fabio Carrillo | 2025-03-18 | Download | Future surgical care demands real-time, integrated data to drive informed decision-making and improve patient outcomes. The pressing need for seamless and efficient data capture in the OR motivates ou...
FlexStep: Enabling Flexible Error Detection in Multi/Many-core Real-time Systems | Tinglue Wang, Yiming Li, Wei Tang, Jiapeng Guan, Zhenghui Guo, Renshuang Jiang, Ran Wei, Jing Li, Zhe Jiang | 2025-03-18 | Download | Reliability and real-time responsiveness in safety-critical systems have traditionally been achieved using error detection mechanisms, such as LockStep, which require pre-configured checker cores,stri...

cs.DC - Distributed, Parallel, and Cluster Computing

Title | Authors | Published | PDF | Abstract
Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation | Ioannis Zarkadas, Amanda Tomlinson, Asaf Cidon, Baris Kasikci, Ofir Weisse | 2025-03-18 | Download | As models become larger, ML accelerators are a scarce resource whose performance must be continually optimized to improve efficiency. Existing performance analysis tools are coarse grained, and fail t...
zkMixer: A Configurable Zero-Knowledge Mixer with Anti-Money Laundering Consensus Protocols | Theodoros Constantinides, John Cartlidge | 2025-03-18 | Download | We introduce a zero-knowledge cryptocurrency mixer framework that allows groups of users to set up a mixing pool with configurable governance conditions, configurable deposit delays, and the ability t...
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving | Wenqi Jiang, Suvinay Subramanian, Cat Graves, Gustavo Alonso, Amir Yazdanbakhsh, Vidushi Dadu | 2025-03-18 | Download | Retrieval-augmented generation (RAG), which combines large language models (LLMs) with retrievals from external knowledge databases, is emerging as a popular approach for reliable LLM serving.
Unified Analysis of Decentralized Gradient Descent: a Contraction Mapping Framework | Erik G. Larsson, Nicolo Michelusi | 2025-03-18 | Download | The decentralized gradient descent (DGD) algorithm, and its sibling, diffusion, are workhorses in decentralized machine learning, distributed inference and estimation, and multi-agent coordination.
Enhancing Kubernetes Resilience through Anomaly Detection and Prediction | V. Anemogiannis, B. Andreou, K. Myrtollari, K. Panagidi, S. Hadjiefthymiades | 2025-03-18 | Download | Kubernetes, in recent years, has become widely used for the deployment and management of software projects on cloud infrastructure. Due to the execution of these applications across numerous Nodes, ea...
Data Race Satisfiability on Array Elements | Junhyung Shim, Quazi Ishtiaque Mahmud, Ali Jannesari | 2025-03-18 | Download | Detection of data races is one of the most important tasks for verifying the correctness of OpenMP parallel codes. Two main models of analysis tools have been proposed for detecting data races: dynami...
FlexStep: Enabling Flexible Error Detection in Multi/Many-core Real-time Systems | Tinglue Wang, Yiming Li, Wei Tang, Jiapeng Guan, Zhenghui Guo, Renshuang Jiang, Ran Wei, Jing Li, Zhe Jiang | 2025-03-18 | Download | Reliability and real-time responsiveness in safety-critical systems have traditionally been achieved using error detection mechanisms, such as LockStep, which require pre-configured checker cores,stri...

cs.NI - Networking and Internet Architecture

Title | Authors | Published | PDF | Abstract
Distribution and Purification of Entanglement States in Quantum Networks | Xiaojie Fan, Yukun Yang, Himanshu Gupta, C. R. Ramakrishnan | 2025-03-18 | Download | We consider problems of distributing high-fidelity entangled states across nodes of a quantum network. We consider a repeater-based network architecture with entanglement swapping (fusion) operations ...
Towards More Economical Context-Augmented LLM Generation by Reusing Stored KV Cache | Hanchen Li, Yuhan Liu, Yihua Cheng, Kuntai Du, Junchen Jiang | 2025-03-18 | Download | Across large language model (LLM) applications, we observe an emerging trend for reusing KV caches to save the prefill delays of processing repeated input texts in different LLM inputs.
Transparent Attested DNS for Confidential Computing Services | Antoine Delignat-Lavaud, Cédric Fournet, Kapil Vaswani, Manuel Costa, Sylvan Clebsch, Christoph M. Wintersteiger | 2025-03-18 | Download | Confidential services running in hardware-protected Trusted Execution Environments (TEEs) can provide higher security assurance, but this requires custom clients and protocols to distribute, update, a...
Load-Balancing versus Anycast: A First Look at Operational Challenges | Remi Hendriks, Mattijs Jonker, Roland van Rijswijk-Deij, Raffaele Sommese | 2025-03-18 | Download | Load Balancing (LB) is a routing strategy that increases performance by distributing traffic over multiple outgoing paths. In this work, we introduce a novel methodology to detect the influence of LB ...
Video Streaming with Kairos: An MPC-Based ABR with Streaming-Aware Throughput Prediction | Ziyu Zhong, Mufan Liu, Le Yang, Yifan Wang, Yiling Xu, Jenq-Neng Hwang | 2025-03-18 | Download | In this paper, we present Kairos, a model predictive control (MPC)-based adaptive bitrate (ABR) scheme that integrates streaming-aware throughput predictions to enhance video streaming quality.
Joint Channel Bandwidth Assignment and Relay Positioning for Predictive Flying Networks | Ruben Queiros, Megumi Kaneko, Helder Fontes, Rui Campos | 2025-03-18 | Download | Flying Networks (FNs) have emerged as a promising solution to provide on-demand wireless connectivity when network coverage is insufficient or the communications infrastructure is compromised, such as...
Multi-user Wireless Image Semantic Transmission over MIMO Multiple Access Channels | Bingyan Xie, Yongpeng Wu, Feng Shu, Jiangzhou Wang, Wenjun Zhang | 2025-03-18 | Download | This paper focuses on a typical uplink transmission scenario over multiple-input multiple-output multiple access channel (MIMO-MAC) and thus propose a multi-user learnable CSI fusion semantic communic...
5G-Enabled Teleoperated Driving: An Experimental Evaluation | Mehdi Testouri, Gamal Elghazaly, Faisal Hawlader, Raphael Frank | 2025-03-18 | Download | Teleoperated driving enables remote human intervention in autonomous vehicles, addressing challenges in complex driving environments. However, its effectiveness depends on ultra-low latency, high-reli...
Effect of Hotspot Traffic on Blocking Probability in Elastic Optical Networks | Paresh Upadhyay, Yatindra Nath Singh | 2025-03-18 | Download | In a circuit-switched network, traffic can be characterized by several factors that define how communication resources are allocated and utilized during a connection.
Bitcoin Burn Addresses: Unveiling the Permanent Losses and Their Underlying Causes | Mohamed El Khatib, Arnaud Legout | 2025-03-18 | Download | Bitcoin burn addresses are addresses where bitcoins can be sent but never retrieved, resulting in the permanent loss of those coins. Given Bitcoin's fixed supply of 21 million coins, understanding the...
A Modular Edge Device Network for Surgery Digitalization | Vincent Schorp, Frédéric Giraud, Gianluca Pargätzi, Michael Wäspe, Lorenzo von Ritter-Zahony, Marcel Wegmann, Nicola A. Cavalcanti, John Garcia Henao, Nicholas Bünger, Dominique Cachin, Sebastiano Caprara, Philipp Fürnstahl, Fabio Carrillo | 2025-03-18 | Download | Future surgical care demands real-time, integrated data to drive informed decision-making and improve patient outcomes. The pressing need for seamless and efficient data capture in the OR motivates ou...
SNAKE: A Sustainable and Multi-functional Traffic Analysis System utilizing Specialized Large-Scale Models with a Mixture of Experts Architecture | Tian Qin, Guang Cheng, Yuyang Zhou, Zihan Chen, Xing Luan | 2025-03-18 | Download | The rapid advancement of internet technology has led to a surge in data transmission, making network traffic classification crucial for security and management.

cs.PF - Performance

Title | Authors | Published | PDF | Abstract
Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation | Ioannis Zarkadas, Amanda Tomlinson, Asaf Cidon, Baris Kasikci, Ofir Weisse | 2025-03-18 | Download | As models become larger, ML accelerators are a scarce resource whose performance must be continually optimized to improve efficiency. Existing performance analysis tools are coarse grained, and fail t...
