2025-08-28

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Catwalk: Unary Top-K for Efficient Ramp-No-Leak Neuron Design for Temporal Neural Networks	Devon Lister, Prabhu Vellaisamy, John Paul Shen, Di Wu	2025-08-28	Download	Temporal neural networks (TNNs) are neuromorphic neural networks that utilize bit-serial temporal coding. TNNs are composed of columns, which in turn employ neurons as their building blocks.
SCE-NTT: A Hardware Accelerator for Number Theoretic Transform Using Superconductor Electronics	Sasan Razmkhah, Mingye Li, Zeming Cheng, Robert S. Aviles, Kyle Jackman, Joey Delport, Lieze Schindler, Wenhui Luo, Takuya Suzuki, Mehdi Kamal, Christopher L. Ayala, Coenrad J. Fourie, Nabuyuki Yoshikawa, Peter A. Beerel, Sandeep Gupta, Massoud Pedram	2025-08-28	Download	This research explores the use of superconductor electronics (SCE) for accelerating fully homomorphic encryption (FHE), focusing on the Number-Theoretic Transform (NTT), a key computational bottleneck...
Microarchitecture Design and Benchmarking of Custom SHA-3 Instruction for RISC-V	Alperen Bolat, Sakir Sezer, Kieran McLaughlin, Henry Hui	2025-08-28	Download	Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex and multistage operations.
The Future of Memory: Limits and Opportunities	Samuel Dayo, Shuhan Liu, Peijing Li, Philip Levis, Subhasish Mitra, Thierry Tambe, David Tennenhouse, H.-S. Philip Wong	2025-08-28	Download	Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memori...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores	Brian Curless, Michael Gowanlock	2025-08-28	Download	Modern GPUs are equipped with tensor cores (TCs) that are commonly used for matrix multiplication in artificial intelligence workloads. However, because they have high computational throughput, they c...
TinyServe: Query-Aware Cache Selection for Efficient LLM Serving	Dong Liu, Yanxuan Yu	2025-08-28	Download	Serving large language models (LLMs) efficiently remains challenging due to the high memory and latency overhead of key-value (KV) cache access during autoregressive decoding.
A Hybrid Stochastic Gradient Tracking Method for Distributed Online Optimization Over Time-Varying Directed Networks	Xinli Shi, Xingxing Yuan, Longkang Zhu, Guanghui Wen	2025-08-28	Download	With the increasing scale and dynamics of data, distributed online optimization has become essential for real-time decision-making in various applications.
A Proposal for High-Level Architectural Model Capable of Expressing Various Data Collaboration Platform and Data Space Concepts	Masaru Dobashi, Kohei Toshimitsu, Hirotsugu Seike, Miki Kanno, Genki Horie, Noboru Koshizuka	2025-08-28	Download	This paper proposes "Data Space High-Level Architecture Model" (DS-HLAM) for expressing diverse data collaboration platforms across regional implementations.
High performance visualization for Astronomy and Cosmology: the VisIVO's pathway toward Exascale systems	Eva Sciacca, Nicola Tuccari, Fabio Vitello, Valentina Cesare	2025-08-28	Download	Petabyte-scale data volumes are generated by observations and simulations in modern astronomy and astrophysics. Storage, access, and data analysis are significantly hampered by such data volumes and a...
Collaborative Evolution of Intelligent Agents in Large-Scale Microservice Systems	Yilin Li, Song Han, Sibo Wang, Ming Wang, Renzi Meng	2025-08-28	Download	This paper proposes an intelligent service optimization method based on a multi-agent collaborative evolution mechanism to address governance challenges in large-scale microservice architectures.
pdGRASS: A Fast Parallel Density-Aware Algorithm for Graph Spectral Sparsification	Tiancheng Zhao, Zekun Yin, Huihai An, Xiaoyu Yang, Zhou Jin, Jiasi Shen, Helen Xu	2025-08-28	Download	Graph Spectral Sparsification (GSS) identifies an ultra-sparse subgraph, or sparsifier, whose Laplacian matrix closely approximates the spectral properties of the original graph, enabling substantial ...
CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference	Guanyu Xu, Zhiwei Hao, Li Shen, Yong Luo, Fuhui Sun, Xiaoyan Wang, Han Hu, Yonggang Wen	2025-08-28	Download	The impressive performance of transformer models has sparked the deployment of intelligent applications on resource-constrained edge devices. However, ensuring high-quality service for real-time edge ...
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs	Md Abdullah Al Mamun, Ihsen Alouani, Nael Abu-Ghazaleh	2025-08-28	Download	Large Language Models (LLMs) are aligned to meet ethical standards and safety requirements by training them to refuse answering harmful or unsafe prompts.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
DSROQ: Dynamic Scheduling and Routing for QoE Management in LEO Satellite Networks	Dhiraj Bhattacharjee, Pablo G. Madoery, Abhishek Naik, Halim Yanikomeroglu, Gunes Karabulut Kurt, Stephane Martel, Khaled Ahmed	2025-08-28	Download	The modern Internet supports diverse applications with heterogeneous quality of service (QoS) requirements. Low Earth orbit (LEO) satellite constellations offer a promising solution to meet these need...
RANGAN: GAN-empowered Anomaly Detection in 5G Cloud RAN	Douglas Liao, Jiping Luo, Jens Vevstad, Nikolaos Pappas	2025-08-28	Download	Radio Access Network (RAN) systems are inherently complex, requiring continuous monitoring to prevent performance degradation and ensure optimal user experience.
Digital Twin-Empowered Deep Reinforcement Learning for Intelligent VNF Migration in Edge-Core Networks	Faisal Ahmed, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin	2025-08-28	Download	The growing demand for services and the rapid deployment of virtualized network functions (VNFs) pose significant challenges for achieving low-latency and energy-efficient orchestration in modern edge...
Microarchitecture Design and Benchmarking of Custom SHA-3 Instruction for RISC-V	Alperen Bolat, Sakir Sezer, Kieran McLaughlin, Henry Hui	2025-08-28	Download	Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex and multistage operations.
Relay Selection in Wireless Networks as Restless Bandits	Mandar R. Nalavade, Ravindra S. Tomar, Gaurav S. Kasbekar	2025-08-28	Download	We consider a wireless network in which a source node needs to transmit a large file to a destination node. The direct wireless link between the source and the destination is assumed to be blocked.
Exploring Busy Period for Worst-Case Deadline Failure Probability Analysis	Junyi Liu, Xu Jiang, Yuanzhen Mu, Wang Yi, Nan Guan	2025-08-28	Download	Busy period is a fundamental concept in classical deterministic real-time scheduling analysis. In this deterministic context, only one busy period - which starts at the critical instant - needs to be ...
Enhancing Resilience for IoE: A Perspective of Networking-Level Safeguard	Guan-Yan Yang, Jui-Ning Chen, Farn Wang, Kuo-Hui Yeh	2025-08-28	Download	The Internet of Energy (IoE) integrates IoT-driven digital communication with power grids to enable efficient and sustainable energy systems. Still, its interconnectivity exposes critical infrastructu...

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Exploring Busy Period for Worst-Case Deadline Failure Probability Analysis	Junyi Liu, Xu Jiang, Yuanzhen Mu, Wang Yi, Nan Guan	2025-08-28	Download	Busy period is a fundamental concept in classical deterministic real-time scheduling analysis. In this deterministic context, only one busy period - which starts at the critical instant - needs to be ...
AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving	Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang	2025-08-28	Download	Large language model (LLM) applications often reuse previously processed context, such as chat history and documents, which introduces significant redundant computation.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores	Brian Curless, Michael Gowanlock	2025-08-28	Download	Modern GPUs are equipped with tensor cores (TCs) that are commonly used for matrix multiplication in artificial intelligence workloads. However, because they have high computational throughput, they c...
Blind Source Separation-Enabled Joint Communication and Sensing in IBFD MIMO Systems	Siyao Li, Conrad Prisby, Thomas Yang	2025-08-28	Download	This paper addresses the challenge of joint communication and sensing (JCAS) in next-generation wireless networks, with an emphasis on in-band full-duplex (IBFD) multiple-input multiple-output (MIMO) ...
CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference	Guanyu Xu, Zhiwei Hao, Li Shen, Yong Luo, Fuhui Sun, Xiaoyan Wang, Han Hu, Yonggang Wen	2025-08-28	Download	The impressive performance of transformer models has sparked the deployment of intelligent applications on resource-constrained edge devices. However, ensuring high-quality service for real-time edge ...