
2025-11-26

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication | Haoxuan Shan, Cong Guo, Chiyue Wei, Feng Cheng, Junyao Zhang, Hai "Helen" Li, Yiran Chen | 2025-11-26 | Download | The rapid scaling of large language models demands more efficient hardware. Quantization offers a promising trade-off between efficiency and performance. |
| Modeling and Simulation Frameworks for Processing-in-Memory Architectures | Mahdi Aghaei, Saba Ebrahimi, Mohammad Saleh Arafati, Elham Cheshmikhani, Dara Rahmati, Saeid Gorgin, Jungrae Kim | 2025-11-26 | Download | Processing-in-Memory (PIM) has emerged as a promising computing paradigm to address the memory wall and the fundamental bottleneck of the von Neumann architecture by reducing costly data movement betw... |
| Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators | Jason Yik, Walter Gallego Gomez, Andrew Cheng, Benedetto Leto, Alessandro Pierro, Noah Pacik-Nelson, Korneel Van den Berghe, Vittorio Fra, Andreea Danielescu, Gianvito Urgese, Vijay Janapa Reddi | 2025-11-26 | Download | Neuromorphic accelerators offer promising platforms for machine learning (ML) inference by leveraging event-driven, spatially-expanded architectures that naturally exploit unstructured sparsity throug... |
| A 0.32 mm$^2$ 100 Mb/s 223 mW ASIC in 22FDX for Joint Jammer Mitigation, Channel Estimation, and SIMO Data Detection | Jonas Elmiger, Fabian Stuber, Oscar Castañeda, Gian Marti, Christoph Studer | 2025-11-26 | Download | We present the first single-input multiple-output (SIMO) receiver ASIC that jointly performs jammer mitigation, channel estimation, and data detection. |
| A Jammer-Resilient 2.87 mm$^2$ 1.28 MS/s 310 mW Multi-Antenna Synchronization ASIC in 65 nm | Flurin Arquint, Oscar Castañeda, Gian Marti, Christoph Studer | 2025-11-26 | Download | We present the first ASIC implementation of jammer-resilient multi-antenna time synchronization. The ASIC implements a recent algorithm that mitigates jamming attacks on synchronization signals using ... |
| Bombyx: OpenCilk Compilation for FPGA Hardware Acceleration | Mohamed Shahawy, Julien de Castelnau, Paolo Ienne | 2025-11-26 | Download | Task-level parallelism (TLP) is a widely used approach in software where independent tasks are dynamically created and scheduled at runtime. Recent systems have explored architectural support for TLP ... |
| RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI | Muhammed Yildirim, Ozcan Ozturk | 2025-11-26 | Download | The increasing demand for on-device intelligence in Edge AI and TinyML applications requires the efficient execution of modern Convolutional Neural Networks (CNNs). |
| LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling | Zhongchun Zhou, Chengtao Lai, Wei Zhang | 2025-11-26 | Download | Large Language Models (LLMs) have achieved unprecedented success across various applications, but their substantial memory requirements pose significant challenges to current memory system designs, es... |
| Handling of Memory Page Faults during Virtual-Address RDMA | Antonis Psistakis | 2025-11-26 | Download | Nowadays, avoiding system calls during cluster communication (e.g., in Data Centers and High Performance Computing) in modern high-speed interconnection networks has become a necessity, due to the hig... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ZipperChain: Transmuting Trusted Third-Party Services Into Trustless Atomic Broadcast | Matteo Bjornsson, Taylor Hardin, Taylor Heinecke, Marcin Furtak, David L. Millman, Mike P. Wittie | 2025-11-26 | Download | Distributed ledger technologies (DLTs) rely on distributed consensus mechanisms to reach agreement over the order of transactions and to provide immutability and availability of transaction data. |
| Clock2Q+: A Simple and Efficient Replacement Algorithm for Metadata Cache in VMware vSAN | Yiyan Zhai, Bintang Dwi Marthen, Sarath Balivada, Vamsi Sudhakar Bojji, Eric Knauft, Jitender Rohilla, Jiaqi Zuo, Quanxing Liu, Maxime Austruy, Wenguang Wang, Juncheng Yang | 2025-11-26 | Download | Cache replacement algorithms are critical building blocks of storage systems. This paper examines the characteristics of metadata caches and argues that they inherently exhibit correlated references, ... |
| OOCO: Latency-disaggregated Architecture for Online-Offline Co-locate LLM Serving | Siyu Wu, Zihan Tang, Yuting Zeng, Hui Chen, Guiguang Ding, Tongxuan Liu, Ke Zhang, Hailong Yang | 2025-11-26 | Download | Large Language Models (LLMs) are increasingly deployed in both latency-sensitive online services and cost-sensitive offline workloads. Co-locating these workloads on shared serving instances can impro... |
| Equivalence and Separation between Heard-Of and Asynchronous Message-Passing Models | Hagit Attiya, Armando Castañeda, Dhrubajyoti Ghosh, Thomas Nowak | 2025-11-26 | Download | We revisit the relationship between two fundamental models of distributed computation: the asynchronous message-passing model with up to $f$ crash failures ($\operatorname{AMP}_f$) and the Heard-Of mo... |
| A Sustainable and Reward Incentivized High-Performance Cluster Computing for Artificial Intelligence: A Novel Bayesian-Time-Decay Trust Mechanism in Blockchain | Murat Yaslioglu | 2025-11-26 | Download | In an age where sustainability is of paramount importance, the significance of both high-performance computing and intelligent algorithms cannot be understated. |
| DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving | Fengze Yu, Leshu Li, Brad McDanel, Sai Qian Zhang | 2025-11-26 | Download | Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. |
| AI/ML Model Cards in Edge AI Cyberinfrastructure: towards Agentic AI | Beth Plale, Neelesh Karthikeyan, Isuru Gamage, Joe Stubbs, Sachith Withana | 2025-11-26 | Download | AI/ML model cards can contain a benchmarked evaluation of an AI/ML model against intended use but a one time assessment during model training does not get at how and where a model is actually used ove... |
| Diagonal Scaling: A Multi-Dimensional Resource Model and Optimization Framework for Distributed Databases | Shahir Abdullah, Syed Rohit Zaman | 2025-11-26 | Download | Modern cloud databases present scaling as a binary decision: scale-out by adding nodes or scale-up by increasing per-node resources. This one-dimensional view is limiting because database performance,... |
| MAD-DAG: Protecting Blockchain Consensus from MEV | Roi Bar-Zur, Aviv Tamar, Ittay Eyal | 2025-11-26 | Download | Blockchain security is threatened by selfish mining, where a miner (operator) deviates from the protocol to increase their revenue. Selfish mining is exacerbated by adverse conditions: rushing (networ... |
| Modeling the Effect of Data Redundancy on Speedup in MLFMA Near-Field Computation | Morteza Sadeghi | 2025-11-26 | Download | The near-field (P2P) operator in the Multilevel Fast Multipole Algorithm (MLFMA) is a performance bottleneck on GPUs due to poor memory locality. |
| MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training | Lu Zhao, Rong Shi, Shaoqing Zhang, Yueqiang Chen, Baoguo He, Hongfeng Sun, Ziqing Yin, Shangchao Su, Zhiyan Cui, Liang Dong, Xiyuan Li, Lingbin Wang, Jianwei He, Jiesong Ma, Weikang Huang, Jianglei Tong, Dongdong Gao, Jian Zhang, Hong Tian, Hui Shen, Zongtai Luo, Zhaoqun Sun, Hongxing Niu, Yue Sun | 2025-11-26 | Download | The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing. |
| Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM | Tim Trappen, Robert Keßler, Roland Pabel, Viktor Achter, Stefan Wesner | 2025-11-26 | Download | Due to rising demands for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging. |
| GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems | Jithin VG, Ditto PS | 2025-11-26 | Download | The proliferation of GPU-accelerated workloads, particularly in artificial intelligence and large language model (LLM) inference, has created unprecedented demand for efficient GPU resource sharing in... |
| Privacy in Federated Learning with Spiking Neural Networks | Dogukan Aksu, Jesus Martinez del Rincon, Ihsen Alouani | 2025-11-26 | Download | Spiking neural networks (SNNs) have emerged as prominent candidates for embedded and edge AI. Their inherent low power consumption makes them far more efficient than conventional ANNs in scenarios whe... |
| GPU Memory Prediction for Multimodal Model Training | Jinwoo Jeong, Minchul Kang, Younghun Go, Changyong Shin, Hyunho Lee, Junho Yoon, Gyeongsik Yang, Chuck Yoo | 2025-11-26 | Download | As deep learning models in agentic AI systems grow in scale and complexity, GPU memory requirements increase and often exceed the available GPU memory capacity, so that out-of-memory (OoM) errors occu... |
| LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling | Zhongchun Zhou, Chengtao Lai, Wei Zhang | 2025-11-26 | Download | Large Language Models (LLMs) have achieved unprecedented success across various applications, but their substantial memory requirements pose significant challenges to current memory system designs, es... |
| Handling of Memory Page Faults during Virtual-Address RDMA | Antonis Psistakis | 2025-11-26 | Download | Nowadays, avoiding system calls during cluster communication (e.g., in Data Centers and High Performance Computing) in modern high-speed interconnection networks has become a necessity, due to the hig... |
| Efficient Multi-Adapter LLM Serving via Cross-Model KV-Cache Reuse with Activated LoRA | Allison Li, Kristjan Greenewald, Thomas Parnell, Navid Azizan | 2025-11-26 | Download | Modern large language model (LLM) systems increasingly rely on multi-turn pipelines that are composed of multiple task-specific adapters, yet existing serving frameworks remain inefficient, incurring ... |
| DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving | Junhan Liao, Minxian Xu, Wanyi Zheng, Yan Wang, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu | 2025-11-26 | Download | To meet strict Service-Level Objectives (SLOs), contemporary Large Language Models (LLMs) decouple the prefill and decoding stages and place them on separate GPUs to mitigate the distinct bottlenecks i... |
| Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows | Yinwei Dai, Zhuofu Chen, Anand Iyer, Ravi Netravali | 2025-11-26 | Download | Agentic workflows have emerged as a powerful paradigm for solving complex, multi-stage tasks, but serving them at scale is computationally expensive given the many LLM inferences that each request mus... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| ZipperChain: Transmuting Trusted Third-Party Services Into Trustless Atomic Broadcast | Matteo Bjornsson, Taylor Hardin, Taylor Heinecke, Marcin Furtak, David L. Millman, Mike P. Wittie | 2025-11-26 | Download | Distributed ledger technologies (DLTs) rely on distributed consensus mechanisms to reach agreement over the order of transactions and to provide immutability and availability of transaction data. |
| Resilient and Reliable Cloud Network Control for Mission-Critical Latency-Sensitive Service Chains | Chin-Wei Huang, Jaime Llorca, Antonia M. Tulino, Andreas F. Molisch | 2025-11-26 | Download | The proliferation of mission-critical latency-sensitive services has intensified the demand for next-generation cloud-integrated networks to guarantee both reliable and resilient service delivery. |
| Secure Command, Control and Communications Systems (C3) for Army UxVs | T. Rebolo, A. Grilo, C. Ribeiro | 2025-11-26 | Download | Unmanned Vehicles (UxVs) are increasingly used in modern military operations for reconnaissance, surveillance, and strike missions, enhancing situational awareness while reducing risk to personnel. |
| Toward Secure Content-Centric Approaches for 5G-Based IoT: Advances and Emerging Trends | Ghada Jaber, Mohamed Ali Zormati, Walid Cavelius, Louka Chapiro, Mohamed El Ahmadi | 2025-11-26 | Download | The convergence of the Internet of Things (IoT) and 5G technologies is transforming modern communication systems by enabling massive connectivity, low latency, and high-speed data transmission. |
| ChronoRAN: Analyzing Latency in 5G Systems | Arman Maghsoudnia, Aoyu Gong, Raphael Cannatà, Dan Mihai Dumitriu, Haitham Hassanieh | 2025-11-26 | Download | This paper presents ChronoRAN, a mathematical framework for accurately computing one-way latency (for uplink and downlink) in the 5G RAN across diverse system configurations. |
| Digital Twin-Driven Secure Access Strategy for SAGIN-Enabled IoT Networks | Hui Liang, Zhihui Wu, Runqi Yuan, Guobin Zhang, Yanfeng Zhang, Jinkai Zheng, Tom H. Luan | 2025-11-26 | Download | In space-air-ground integrated networks (SAGIN)-enabled IoT networks, secure access has become a significant challenge due to the increasing risks of eavesdropping attacks. |
| 5G Network Automation Using Local Large Language Models and Retrieval-Augmented Generation | Ahmadreza Majlesara, Ali Majlesi, Ali Mamaghani, Alireza Shokrani, Babak Hossein Khalaj | 2025-11-26 | Download | This demonstration showcases the integration of a lightweight, locally deployed Large Language Model (LLaMA-3 8b Q-4b) empowered by retrieval augmented generation (RAG) to automate 5G network manageme... |
| Performance Evaluation of Low-Latency Live Streaming of MPEG-DASH UHD video over Commercial 5G NSA/SA Network | Kasidis Arunruangsirilert, Bo Wei, Hang Song, Jiro Katto | 2025-11-26 | Download | 5G Standalone (SA) is the goal of the 5G evolution, which aims to provide higher throughput and lower latency than the existing LTE network. One of the main applications of 5G is the real-time distrib... |
| Real-World Performance Evaluations of Low-Band 5G NR/4G LTE 4x4 MIMO on Commercial Smartphones | Pasapong Wongprasert, Kasidis Arunruangsirilert, Jiro Katto | 2025-11-26 | Download | All 3GPP-compliant commercial 5G New Radio (NR)-capable UEs on the market are equipped with 4x4 MIMO support for Mid-Band frequencies (>1.7 GHz) and above, enabling up to rank 4 MIMO transmission. |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| DynamicAdaptiveClimb: Adaptive Cache Replacement with Dynamic Resizing | Daniel Berend, Shlomi Dolev, Sweta Kumari, Dhruv Mishra, Marina Kogan-Sadetsky, Archit Somani | 2025-11-26 | Download | Efficient cache management is critical for optimizing the system performance, and numerous caching mechanisms have been proposed, each exploring various insertion and eviction strategies. |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Modeling the Effect of Data Redundancy on Speedup in MLFMA Near-Field Computation | Morteza Sadeghi | 2025-11-26 | Download | The near-field (P2P) operator in the Multilevel Fast Multipole Algorithm (MLFMA) is a performance bottleneck on GPUs due to poor memory locality. |
| Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM | Tim Trappen, Robert Keßler, Roland Pabel, Viktor Achter, Stefan Wesner | 2025-11-26 | Download | Due to rising demands for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging. |
