
2025-08-28

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Catwalk: Unary Top-K for Efficient Ramp-No-Leak Neuron Design for Temporal Neural Networks | Devon Lister, Prabhu Vellaisamy, John Paul Shen, Di Wu | 2025-08-28 | Temporal neural networks (TNNs) are neuromorphic neural networks that utilize bit-serial temporal coding. TNNs are composed of columns, which in turn employ neurons as their building blocks. |
| SCE-NTT: A Hardware Accelerator for Number Theoretic Transform Using Superconductor Electronics | Sasan Razmkhah, Mingye Li, Zeming Cheng, Robert S. Aviles, Kyle Jackman, Joey Delport, Lieze Schindler, Wenhui Luo, Takuya Suzuki, Mehdi Kamal, Christopher L. Ayala, Coenrad J. Fourie, Nabuyuki Yoshikawa, Peter A. Beerel, Sandeep Gupta, Massoud Pedram | 2025-08-28 | This research explores the use of superconductor electronics (SCE) for accelerating fully homomorphic encryption (FHE), focusing on the Number-Theoretic Transform (NTT), a key computational bottleneck... |
| Microarchitecture Design and Benchmarking of Custom SHA-3 Instruction for RISC-V | Alperen Bolat, Sakir Sezer, Kieran McLaughlin, Henry Hui | 2025-08-28 | Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex and multistage operations. |
| The Future of Memory: Limits and Opportunities | Samuel Dayo, Shuhan Liu, Peijing Li, Philip Levis, Subhasish Mitra, Thierry Tambe, David Tennenhouse, H. -S. Philip Wong | 2025-08-28 | Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memori... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores | Brian Curless, Michael Gowanlock | 2025-08-28 | Modern GPUs are equipped with tensor cores (TCs) that are commonly used for matrix multiplication in artificial intelligence workloads. However, because they have high computational throughput, they c... |
| TinyServe: Query-Aware Cache Selection for Efficient LLM Serving | Dong Liu, Yanxuan Yu | 2025-08-28 | Serving large language models (LLMs) efficiently remains challenging due to the high memory and latency overhead of key-value (KV) cache access during autoregressive decoding. |
| A Hybrid Stochastic Gradient Tracking Method for Distributed Online Optimization Over Time-Varying Directed Networks | Xinli Shi, Xingxing Yuan, Longkang Zhu, Guanghui Wen | 2025-08-28 | With the increasing scale and dynamics of data, distributed online optimization has become essential for real-time decision-making in various applications. |
| A Proposal for High-Level Architectural Model Capable of Expressing Various Data Collaboration Platform and Data Space Concepts | Masaru Dobashi, Kohei Toshimitsu, Hirotsugu Seike, Miki Kanno, Genki Horie, Noboru Koshizuka | 2025-08-28 | This paper proposes "Data Space High-Level Architecture Model" (DS-HLAM) for expressing diverse data collaboration platforms across regional implementations. |
| High performance visualization for Astronomy and Cosmology: the VisIVO's pathway toward Exascale systems | Eva Sciacca, Nicola Tuccari, Fabio Vitello, Valentina Cesare | 2025-08-28 | Petabyte-scale data volumes are generated by observations and simulations in modern astronomy and astrophysics. Storage, access, and data analysis are significantly hampered by such data volumes and a... |
| Collaborative Evolution of Intelligent Agents in Large-Scale Microservice Systems | Yilin Li, Song Han, Sibo Wang, Ming Wang, Renzi Meng | 2025-08-28 | This paper proposes an intelligent service optimization method based on a multi-agent collaborative evolution mechanism to address governance challenges in large-scale microservice architectures. |
| pdGRASS: A Fast Parallel Density-Aware Algorithm for Graph Spectral Sparsification | Tiancheng Zhao, Zekun Yin, Huihai An, Xiaoyu Yang, Zhou Jin, Jiasi Shen, Helen Xu | 2025-08-28 | Graph Spectral Sparsification (GSS) identifies an ultra-sparse subgraph, or sparsifier, whose Laplacian matrix closely approximates the spectral properties of the original graph, enabling substantial ... |
| CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference | Guanyu Xu, Zhiwei Hao, Li Shen, Yong Luo, Fuhui Sun, Xiaoyan Wang, Han Hu, Yonggang Wen | 2025-08-28 | The impressive performance of transformer models has sparked the deployment of intelligent applications on resource-constrained edge devices. However, ensuring high-quality service for real-time edge ... |
| Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs | Md Abdullah Al Mamun, Ihsen Alouani, Nael Abu-Ghazaleh | 2025-08-28 | Large Language Models (LLMs) are aligned to meet ethical standards and safety requirements by training them to refuse answering harmful or unsafe prompts. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| DSROQ: Dynamic Scheduling and Routing for QoE Management in LEO Satellite Networks | Dhiraj Bhattacharjee, Pablo G. Madoery, Abhishek Naik, Halim Yanikomeroglu, Gunes Karabulut Kurt, Stephane Martel, Khaled Ahmed | 2025-08-28 | The modern Internet supports diverse applications with heterogeneous quality of service (QoS) requirements. Low Earth orbit (LEO) satellite constellations offer a promising solution to meet these need... |
| RANGAN: GAN-empowered Anomaly Detection in 5G Cloud RAN | Douglas Liao, Jiping Luo, Jens Vevstad, Nikolaos Pappas | 2025-08-28 | Radio Access Network (RAN) systems are inherently complex, requiring continuous monitoring to prevent performance degradation and ensure optimal user experience. |
| Digital Twin-Empowered Deep Reinforcement Learning for Intelligent VNF Migration in Edge-Core Networks | Faisal Ahmed, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin | 2025-08-28 | The growing demand for services and the rapid deployment of virtualized network functions (VNFs) pose significant challenges for achieving low-latency and energy-efficient orchestration in modern edge... |
| Microarchitecture Design and Benchmarking of Custom SHA-3 Instruction for RISC-V | Alperen Bolat, Sakir Sezer, Kieran McLaughlin, Henry Hui | 2025-08-28 | Integrating cryptographic accelerators into modern CPU architectures presents unique microarchitectural challenges, particularly when extending instruction sets with complex and multistage operations. |
| Relay Selection in Wireless Networks as Restless Bandits | Mandar R. Nalavade, Ravindra S. Tomar, Gaurav S. Kasbekar | 2025-08-28 | We consider a wireless network in which a source node needs to transmit a large file to a destination node. The direct wireless link between the source and the destination is assumed to be blocked. |
| Exploring Busy Period for Worst-Case Deadline Failure Probability Analysis | Junyi Liu, Xu Jiang, Yuanzhen Mu, Wang Yi, Nan Guan | 2025-08-28 | Busy period is a fundamental concept in classical deterministic real-time scheduling analysis. In this deterministic context, only one busy period - which starts at the critical instant - needs to be ... |
| Enhancing Resilience for IoE: A Perspective of Networking-Level Safeguard | Guan-Yan Yang, Jui-Ning Chen, Farn Wang, Kuo-Hui Yeh | 2025-08-28 | The Internet of Energy (IoE) integrates IoT-driven digital communication with power grids to enable efficient and sustainable energy systems. Still, its interconnectivity exposes critical infrastructu... |

cs.OS - Operating Systems

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Exploring Busy Period for Worst-Case Deadline Failure Probability Analysis | Junyi Liu, Xu Jiang, Yuanzhen Mu, Wang Yi, Nan Guan | 2025-08-28 | Busy period is a fundamental concept in classical deterministic real-time scheduling analysis. In this deterministic context, only one busy period - which starts at the critical instant - needs to be ... |
| AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving | Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang | 2025-08-28 | Large language model (LLM) applications often reuse previously processed context, such as chat history and documents, which introduces significant redundant computation. |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores | Brian Curless, Michael Gowanlock | 2025-08-28 | Modern GPUs are equipped with tensor cores (TCs) that are commonly used for matrix multiplication in artificial intelligence workloads. However, because they have high computational throughput, they c... |
| Blind Source Separation-Enabled Joint Communication and Sensing in IBFD MIMO Systems | Siyao Li, Conrad Prisby, Thomas Yang | 2025-08-28 | This paper addresses the challenge of joint communication and sensing (JCAS) in next-generation wireless networks, with an emphasis on in-band full-duplex (IBFD) multiple-input multiple-output (MIMO) ... |
| CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference | Guanyu Xu, Zhiwei Hao, Li Shen, Yong Luo, Fuhui Sun, Xiaoyan Wang, Han Hu, Yonggang Wen | 2025-08-28 | The impressive performance of transformer models has sparked the deployment of intelligent applications on resource-constrained edge devices. However, ensuring high-quality service for real-time edge ... |
