
2025-03-24

cs.AR - Architecture

| Title | Authors | Date | Abstract |
|---|---|---|---|
| "Test, Build, Deploy" -- A CI/CD Framework for Open-Source Hardware Designs | Calvin Deutschbein, Aristotle Stassinopoulos | 2025-03-24 | Addressing TedX, Amber Huffman made an impassioned case that "none of us is as smart as all of us" and that open-source hardware is the future. |
| Reimagining Memory Access for LLM Inference: Compression-Aware Memory Controller Design | Rui Xie, Asad Ul Haq, Linsen Ma, Yunhua Fang, Zirak Burzin Engineer, Liu Liu, Tong Zhang | 2025-03-24 | The efficiency of Large Language Model (LLM) inference is often constrained by substantial memory bandwidth and capacity demands. Existing techniques, such as pruning, quantization, and mixture of exp... |
| Efficient Trace for RISC-V: Design, Evaluation, and Integration in CVA6 | Umberto Laghi, Simone Manoni, Emanuele Parisi, Andrea Bartolini | 2025-03-24 | In this work, we present the design and evaluation of a Processor Tracing System compliant with the RISC-V Efficient Trace specification for Instruction Branch Tracing. |
| BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache | Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang | 2025-03-24 | The growth of long-context Large Language Models (LLMs) significantly increases memory and bandwidth pressure during autoregressive decoding due to the expanding Key-Value (KV) cache. |
| Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | Minsu Kim, Seongmin Hong, RyeoWook Ko, Soongyu Choi, Hunjong Lee, Junsoo Kim, Joo-Young Kim, Jongse Park | 2025-03-24 | Modern Large Language Model serving system batches multiple requests to achieve high throughput, while batching attention operations is challenging, rendering memory bandwidth a critical bottleneck. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | Abstract |
|---|---|---|---|
| COoL-TEE: Client-TEE Collaboration for Resilient Distributed Search | Matthieu Bettinger, Etienne Rivière, Sonia Ben Mokhtar, Anthony Simonet-Boulogne | 2025-03-24 | Current marketplaces rely on search mechanisms with distributed systems but centralized governance, making them vulnerable to attacks, failures, censorship and biases. |
| Reliability is Blind: Collective Incentives for Decentralized Computing Marketplaces without Individual Behavior Information | Henry Mont, Matthieu Bettinger, Sonia Ben Mokhtar, Anthony Simonet-Boulogne | 2025-03-24 | In decentralized cloud computing marketplaces, ensuring fair and efficient interactions among asset providers and end-users is crucial. A key concern is meeting agreed-upon service-level objectives li... |
| Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization | Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko | 2025-03-24 | Various parallelism, such as data, tensor, and pipeline parallelism, along with memory optimizations like activation checkpointing, redundancy elimination, and offloading, have been proposed to accele... |
| Efficient Distributed Algorithms for Shape Reduction via Reconfigurable Circuits | Nada Almalki, Siddharth Gupta, Othon Michail, Andreas Padalkin | 2025-03-24 | Autonomous reconfiguration of agent-based systems is a key challenge in the study of programmable matter, distributed robotics, and molecular self-assembly. |
| cfdSCOPE: A Fluid-Dynamics Proxy App for Teaching Performance Engineering | Peter Arzt, Sebastian Kreutzer, Tim Jammer, Christian Bischof | 2025-03-24 | Teaching performance engineering in high-performance computing (HPC) requires example codes that demonstrate bottlenecks and enable hands-on optimization. |
| Monte Cimone v2: Down the Road of RISC-V High-Performance Computers | Emanuele Venieri, Simone Manoni, Gabriele Ceccolini, Giacomo Madella, Federico Ficarelli, Daniele Gregori, Daniele Cesarini, Luca Benini, Andrea Bartolini | 2025-03-24 | Many RISC-V (RV) platforms and SoCs have been announced in recent years targeting the HPC sector, but only a few of them are commercially available and engineered to fit the HPC requirements. |
| Õptimal Fault-Tolerant Labeling for Reachability and Approximate Distances in Directed Planar Graphs | Itai Boneh, Shiri Chechik, Shay Golan, Shay Mozes, Oren Weimann | 2025-03-24 | We present a labeling scheme that assigns labels of size $\tilde{O}(1)$ to the vertices of a directed weighted planar graph $G$, such that for any fixed $\varepsilon>0$ from the labels of any three ver... |
| AES-SpMM: Balancing Accuracy and Speed by Adaptive Edge Sampling Strategy to Accelerate SpMM in GNNs | Yingchen Song, Yaobin Wang, Yi Luo, Huan Wu, Pingping Tang | 2025-03-24 | Coordinating the design of sampling and sparse-dense matrix multiplication (SpMM) is crucial for accelerating graph neural networks (GNNs). However, due to irrational sampling strategies, existing met... |
| ED-DAO: Energy Donation Algorithms based on Decentralized Autonomous Organization | Abdulrezzak Zekiye, Ouns Bouachir, Öznur Özkasap, Moayad Aloqaily | 2025-03-24 | Energy is a fundamental component of modern life, driving nearly all aspects of daily activities. As such, the inability to access energy when needed is a significant issue that requires innovative so... |
| Jenga: Effective Memory Management for Serving LLM with Heterogeneity | Chen Zhang, Kuntai Du, Shu Liu, Woosuk Kwon, Xiangxi Mo, Yufeng Wang, Xiaoxuan Liu, Kaichao You, Zhuohan Li, Mingsheng Long, Jidong Zhai, Joseph Gonzalez, Ion Stoica | 2025-03-24 | Large language models (LLMs) are widely used but expensive to run, especially as inference workloads grow. To lower costs, maximizing the request batch size by managing GPU memory efficiently is cruci... |
| Risk Management for Distributed Arbitrage Systems: Integrating Artificial Intelligence | Akaash Vishal Hazarika, Mahak Shah, Swapnil Patil, Pradyumna Shukla | 2025-03-24 | Effective risk management solutions become absolutely crucial when financial markets embrace distributed technology and decentralized financing (DeFi). |
| Bridging Emotions and Architecture: Sentiment Analysis in Modern Distributed Systems | Mahak Shah, Akaash Vishal Hazarika, Meetu Malhotra, Sachin C. Patil, Joshit Mohanty | 2025-03-24 | Sentiment analysis is a field within NLP that has gained importance because it is applied in various areas such as social media surveillance, customer feedback evaluation and market research. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | Abstract |
|---|---|---|---|
| Rank-Based Modeling for Universal Packets Compression in Multi-Modal Communications | Xuanhao Luo, Zhiyuan Peng, Zhouyu Li, Ruozhou Yu, Yuchen Liu | 2025-03-24 | The rapid increase in networked systems and data transmission requires advanced data compression solutions to optimize bandwidth utilization and enhance network performance. |
| Enhancing V2X Communications with UAV-mounted Reconfigurable Intelligent Surfaces | Salim Janji, Paweł Sroka, Adrian Kliks | 2025-03-24 | This paper addresses the crucial need for reliable wireless communication in vehicular networks, particularly vital for the safety and efficacy of (semi-)autonomous driving amid increasing traffic. |
| Signal Propagation in RIS-Aided 5G Systems | Adam Samorzewski, Adrian Kliks | 2025-03-24 | In this paper, we conduct an in-depth analysis of radio signal propagation characteristics within the urban environment of Poznan (Poland). The study specifically addresses the deployment of a 5th gen... |
| Energy-Efficient Dynamic Training and Inference for GNN-Based Network Modeling | Chetna Singhal, Yassine Hadjadj-Aoul | 2025-03-24 | Efficient network modeling is essential for resource optimization and network planning in next-generation large-scale complex networks. Traditional approaches, such as queuing theory-based modeling an... |
| Periodic Chains Scheduling on Dedicated Resources -- A Crucial Problem in Time-Sensitive Networks | Josef Grus, Claire Hanen, Zdeněk Hanzálek | 2025-03-24 | Periodic messages transfer data from sensors to actuators in cars, planes, and complex production machines. When considering a given routing, the unicast message starts at its source and goes over sev... |
| Real-Time Streaming Telemetry Based Detection and Mitigation of OOK and Power Interference in Multi-User OSaaS Networks | Agastya Raj, Devika Dass, Daniel C. Kilper, Marco Ruffini | 2025-03-24 | We present a framework to identify and mitigate rogue OOK signals and user-generated power interference in a multi-user Optical-Spectrum-as-a-Service network. |
| Large Language Models powered Malicious Traffic Detection: Architecture, Opportunities and Case Study | Xinggong Zhang, Haotian Meng, Qingyang Li, Yunpeng Tan, Lei Zhang | 2025-03-24 | Malicious traffic detection is a pivotal technology for network security to identify abnormal network traffic and detect network attacks. Large Language Models (LLMs) are trained on a vast corpus of t... |

cs.PF - Performance

| Title | Authors | Date | Abstract |
|---|---|---|---|
| BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache | Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang | 2025-03-24 | The growth of long-context Large Language Models (LLMs) significantly increases memory and bandwidth pressure during autoregressive decoding due to the expanding Key-Value (KV) cache. |
| cfdSCOPE: A Fluid-Dynamics Proxy App for Teaching Performance Engineering | Peter Arzt, Sebastian Kreutzer, Tim Jammer, Christian Bischof | 2025-03-24 | Teaching performance engineering in high-performance computing (HPC) requires example codes that demonstrate bottlenecks and enable hands-on optimization. |
