2025-11-26

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Platinum: Path-Adaptable LUT-Based Accelerator Tailored for Low-Bit Weight Matrix Multiplication	Haoxuan Shan, Cong Guo, Chiyue Wei, Feng Cheng, Junyao Zhang, Hai "Helen" Li, Yiran Chen	2025-11-26	Download	The rapid scaling of large language models demands more efficient hardware. Quantization offers a promising trade-off between efficiency and performance.
Modeling and Simulation Frameworks for Processing-in-Memory Architectures	Mahdi Aghaei, Saba Ebrahimi, Mohammad Saleh Arafati, Elham Cheshmikhani, Dara Rahmati, Saeid Gorgin, Jungrae Kim	2025-11-26	Download	Processing-in-Memory (PIM) has emerged as a promising computing paradigm to address the memory wall and the fundamental bottleneck of the von Neumann architecture by reducing costly data movement betw...
Modeling and Optimizing Performance Bottlenecks for Neuromorphic Accelerators	Jason Yik, Walter Gallego Gomez, Andrew Cheng, Benedetto Leto, Alessandro Pierro, Noah Pacik-Nelson, Korneel Van den Berghe, Vittorio Fra, Andreea Danielescu, Gianvito Urgese, Vijay Janapa Reddi	2025-11-26	Download	Neuromorphic accelerators offer promising platforms for machine learning (ML) inference by leveraging event-driven, spatially-expanded architectures that naturally exploit unstructured sparsity throug...
A 0.32 mm$^2$ 100 Mb/s 223 mW ASIC in 22FDX for Joint Jammer Mitigation, Channel Estimation, and SIMO Data Detection	Jonas Elmiger, Fabian Stuber, Oscar Castañeda, Gian Marti, Christoph Studer	2025-11-26	Download	We present the first single-input multiple-output (SIMO) receiver ASIC that jointly performs jammer mitigation, channel estimation, and data detection.
A Jammer-Resilient 2.87 mm$^2$ 1.28 MS/s 310 mW Multi-Antenna Synchronization ASIC in 65 nm	Flurin Arquint, Oscar Castañeda, Gian Marti, Christoph Studer	2025-11-26	Download	We present the first ASIC implementation of jammer-resilient multi-antenna time synchronization. The ASIC implements a recent algorithm that mitigates jamming attacks on synchronization signals using ...
Bombyx: OpenCilk Compilation for FPGA Hardware Acceleration	Mohamed Shahawy, Julien de Castelnau, Paolo Ienne	2025-11-26	Download	Task-level parallelism (TLP) is a widely used approach in software where independent tasks are dynamically created and scheduled at runtime. Recent systems have explored architectural support for TLP ...
RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI	Muhammed Yildirim, Ozcan Ozturk	2025-11-26	Download	The increasing demand for on-device intelligence in Edge AI and TinyML applications requires the efficient execution of modern Convolutional Neural Networks (CNNs).
LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling	Zhongchun Zhou, Chengtao Lai, Wei Zhang	2025-11-26	Download	Large Language Models (LLMs) have achieved unprecedented success across various applications, but their substantial memory requirements pose significant challenges to current memory system designs, es...
Handling of Memory Page Faults during Virtual-Address RDMA	Antonis Psistakis	2025-11-26	Download	Nowadays, avoiding system calls during cluster communication (e.g., in Data Centers and High Performance Computing) in modern high-speed interconnection networks has become a necessity, due to the hig...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
ZipperChain: Transmuting Trusted Third-Party Services Into Trustless Atomic Broadcast	Matteo Bjornsson, Taylor Hardin, Taylor Heinecke, Marcin Furtak, David L. Millman, Mike P. Wittie	2025-11-26	Download	Distributed ledger technologies (DLTs) rely on distributed consensus mechanisms to reach agreement over the order of transactions and to provide immutability and availability of transaction data.
Clock2Q+: A Simple and Efficient Replacement Algorithm for Metadata Cache in VMware vSAN	Yiyan Zhai, Bintang Dwi Marthen, Sarath Balivada, Vamsi Sudhakar Bojji, Eric Knauft, Jitender Rohilla, Jiaqi Zuo, Quanxing Liu, Maxime Austruy, Wenguang Wang, Juncheng Yang	2025-11-26	Download	Cache replacement algorithms are critical building blocks of storage systems. This paper examines the characteristics of metadata caches and argues that they inherently exhibit correlated references, ...
OOCO: Latency-disaggregated Architecture for Online-Offline Co-locate LLM Serving	Siyu Wu, Zihan Tang, Yuting Zeng, Hui Chen, Guiguang Ding, Tongxuan Liu, Ke Zhang, Hailong Yang	2025-11-26	Download	Large Language Models (LLMs) are increasingly deployed in both latency-sensitive online services and cost-sensitive offline workloads. Co-locating these workloads on shared serving instances can impro...
Equivalence and Separation between Heard-Of and Asynchronous Message-Passing Models	Hagit Attiya, Armando Castañeda, Dhrubajyoti Ghosh, Thomas Nowak	2025-11-26	Download	We revisit the relationship between two fundamental models of distributed computation: the asynchronous message-passing model with up to $f$ crash failures ($\operatorname{AMP}_f$) and the Heard-Of mo...
A Sustainable and Reward Incentivized High-Performance Cluster Computing for Artificial Intelligence: A Novel Bayesian-Time-Decay Trust Mechanism in Blockchain	Murat Yaslioglu	2025-11-26	Download	In an age where sustainability is of paramount importance, the significance of both high-performance computing and intelligent algorithms cannot be understated.
DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving	Fengze Yu, Leshu Li, Brad McDanel, Sai Qian Zhang	2025-11-26	Download	Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments.
AI/ML Model Cards in Edge AI Cyberinfrastructure: towards Agentic AI	Beth Plale, Neelesh Karthikeyan, Isuru Gamage, Joe Stubbs, Sachith Withana	2025-11-26	Download	AI/ML model cards can contain a benchmarked evaluation of an AI/ML model against intended use but a one time assessment during model training does not get at how and where a model is actually used ove...
Diagonal Scaling: A Multi-Dimensional Resource Model and Optimization Framework for Distributed Databases	Shahir Abdullah, Syed Rohit Zaman	2025-11-26	Download	Modern cloud databases present scaling as a binary decision: scale-out by adding nodes or scale-up by increasing per-node resources. This one-dimensional view is limiting because database performance,...
MAD-DAG: Protecting Blockchain Consensus from MEV	Roi Bar-Zur, Aviv Tamar, Ittay Eyal	2025-11-26	Download	Blockchain security is threatened by selfish mining, where a miner (operator) deviates from the protocol to increase their revenue. Selfish mining is exacerbated by adverse conditions: rushing (networ...
Modeling the Effect of Data Redundancy on Speedup in MLFMA Near-Field Computation	Morteza Sadeghi	2025-11-26	Download	The near-field (P2P) operator in the Multilevel Fast Multipole Algorithm (MLFMA) is a performance bottleneck on GPUs due to poor memory locality.
MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training	Lu Zhao, Rong Shi, Shaoqing Zhang, Yueqiang Chen, Baoguo He, Hongfeng Sun, Ziqing Yin, Shangchao Su, Zhiyan Cui, Liang Dong, Xiyuan Li, Lingbin Wang, Jianwei He, Jiesong Ma, Weikang Huang, Jianglei Tong, Dongdong Gao, Jian Zhang, Hong Tian, Hui Shen, Zongtai Luo, Zhaoqun Sun, Hongxing Niu, Yue Sun	2025-11-26	Download	The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing.
Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM	Tim Trappen, Robert Keßler, Roland Pabel, Viktor Achter, Stefan Wesner	2025-11-26	Download	Due to rising demands for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging.
GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems	Jithin VG, Ditto PS	2025-11-26	Download	The proliferation of GPU-accelerated workloads, particularly in artificial intelligence and large language model (LLM) inference, has created unprecedented demand for efficient GPU resource sharing in...
Privacy in Federated Learning with Spiking Neural Networks	Dogukan Aksu, Jesus Martinez del Rincon, Ihsen Alouani	2025-11-26	Download	Spiking neural networks (SNNs) have emerged as prominent candidates for embedded and edge AI. Their inherent low power consumption makes them far more efficient than conventional ANNs in scenarios whe...
GPU Memory Prediction for Multimodal Model Training	Jinwoo Jeong, Minchul Kang, Younghun Go, Changyong Shin, Hyunho Lee, Junho Yoon, Gyeongsik Yang, Chuck Yoo	2025-11-26	Download	As deep learning models in agentic AI systems grow in scale and complexity, GPU memory requirements increase and often exceed the available GPU memory capacity, so that out-of-memory (OoM) errors occu...
LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling	Zhongchun Zhou, Chengtao Lai, Wei Zhang	2025-11-26	Download	Large Language Models (LLMs) have achieved unprecedented success across various applications, but their substantial memory requirements pose significant challenges to current memory system designs, es...
Handling of Memory Page Faults during Virtual-Address RDMA	Antonis Psistakis	2025-11-26	Download	Nowadays, avoiding system calls during cluster communication (e.g., in Data Centers and High Performance Computing) in modern high-speed interconnection networks has become a necessity, due to the hig...
Efficient Multi-Adapter LLM Serving via Cross-Model KV-Cache Reuse with Activated LoRA	Allison Li, Kristjan Greenewald, Thomas Parnell, Navid Azizan	2025-11-26	Download	Modern large language model (LLM) systems increasingly rely on multi-turn pipelines that are composed of multiple task-specific adapters, yet existing serving frameworks remain inefficient, incurring ...
DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving	Junhan Liao, Minxian Xu, Wanyi Zheng, Yan Wang, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu	2025-11-26	Download	To meet strict Service-Level Objectives (SLOs), contemporary Large Language Models (LLMs) decouple the prefill and decoding stages and place them on separate GPUs to mitigate the distinct bottlenecks i...
Aragog: Just-in-Time Model Routing for Scalable Serving of Agentic Workflows	Yinwei Dai, Zhuofu Chen, Anand Iyer, Ravi Netravali	2025-11-26	Download	Agentic workflows have emerged as a powerful paradigm for solving complex, multi-stage tasks, but serving them at scale is computationally expensive given the many LLM inferences that each request mus...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
ZipperChain: Transmuting Trusted Third-Party Services Into Trustless Atomic Broadcast	Matteo Bjornsson, Taylor Hardin, Taylor Heinecke, Marcin Furtak, David L. Millman, Mike P. Wittie	2025-11-26	Download	Distributed ledger technologies (DLTs) rely on distributed consensus mechanisms to reach agreement over the order of transactions and to provide immutability and availability of transaction data.
Resilient and Reliable Cloud Network Control for Mission-Critical Latency-Sensitive Service Chains	Chin-Wei Huang, Jaime Llorca, Antonia M. Tulino, Andreas F. Molisch	2025-11-26	Download	The proliferation of mission-critical latency-sensitive services has intensified the demand for next-generation cloud-integrated networks to guarantee both reliable and resilient service delivery.
Secure Command, Control and Communications Systems (C3) for Army UxVs	T. Rebolo, A. Grilo, C. Ribeiro	2025-11-26	Download	Unmanned Vehicles (UxVs) are increasingly used in modern military operations for reconnaissance, surveillance, and strike missions, enhancing situational awareness while reducing risk to personnel.
Toward Secure Content-Centric Approaches for 5G-Based IoT: Advances and Emerging Trends	Ghada Jaber, Mohamed Ali Zormati, Walid Cavelius, Louka Chapiro, Mohamed El Ahmadi	2025-11-26	Download	The convergence of the Internet of Things (IoT) and 5G technologies is transforming modern communication systems by enabling massive connectivity, low latency, and high-speed data transmission.
ChronoRAN: Analyzing Latency in 5G Systems	Arman Maghsoudnia, Aoyu Gong, Raphael Cannatà, Dan Mihai Dumitriu, Haitham Hassanieh	2025-11-26	Download	This paper presents ChronoRAN, a mathematical framework for accurately computing one-way latency (for uplink and downlink) in the 5G RAN across diverse system configurations.
Digital Twin-Driven Secure Access Strategy for SAGIN-Enabled IoT Networks	Hui Liang, Zhihui Wu, Runqi Yuan, Guobin Zhang, Yanfeng Zhang, Jinkai Zheng, Tom H. Luan	2025-11-26	Download	In space-air-ground integrated networks (SAGIN)-enabled IoT networks, secure access has become a significant challenge due to the increasing risks of eavesdropping attacks.
5G Network Automation Using Local Large Language Models and Retrieval-Augmented Generation	Ahmadreza Majlesara, Ali Majlesi, Ali Mamaghani, Alireza Shokrani, Babak Hossein Khalaj	2025-11-26	Download	This demonstration showcases the integration of a lightweight, locally deployed Large Language Model (LLaMA-3 8b Q-4b) empowered by retrieval augmented generation (RAG) to automate 5G network manageme...
Performance Evaluation of Low-Latency Live Streaming of MPEG-DASH UHD video over Commercial 5G NSA/SA Network	Kasidis Arunruangsirilert, Bo Wei, Hang Song, Jiro Katto	2025-11-26	Download	5G Standalone (SA) is the goal of the 5G evolution, which aims to provide higher throughput and lower latency than the existing LTE network. One of the main applications of 5G is the real-time distrib...
Real-World Performance Evaluations of Low-Band 5G NR/4G LTE 4x4 MIMO on Commercial Smartphones	Pasapong Wongprasert, Kasidis Arunruangsirilert, Jiro Katto	2025-11-26	Download	All 3GPP-compliant commercial 5G New Radio (NR)-capable UEs on the market are equipped with 4x4 MIMO support for Mid-Band frequencies (>1.7 GHz) and above, enabling up to rank 4 MIMO transmission.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
DynamicAdaptiveClimb: Adaptive Cache Replacement with Dynamic Resizing	Daniel Berend, Shlomi Dolev, Sweta Kumari, Dhruv Mishra, Marina Kogan-Sadetsky, Archit Somani	2025-11-26	Download	Efficient cache management is critical for optimizing the system performance, and numerous caching mechanisms have been proposed, each exploring various insertion and eviction strategies.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Modeling the Effect of Data Redundancy on Speedup in MLFMA Near-Field Computation	Morteza Sadeghi	2025-11-26	Download	The near-field (P2P) operator in the Multilevel Fast Multipole Algorithm (MLFMA) is a performance bottleneck on GPUs due to poor memory locality.
Automated Dynamic AI Inference Scaling on HPC-Infrastructure: Integrating Kubernetes, Slurm and vLLM	Tim Trappen, Robert Keßler, Roland Pabel, Viktor Achter, Stefan Wesner	2025-11-26	Download	Due to rising demands for Artificial Intelligence (AI) inference, especially in higher education, novel solutions utilising existing infrastructure are emerging.