2025-11-09

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
FPGA or GPU? Analyzing comparative research for application-specific guidance	Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal	2025-11-09	Download	The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs...
Offloading Data Center Tax	Akshay Revankar, Charan Renganathan, Sartaj Wariah	2025-11-09	Download	The data centers of today are running diverse workloads sharing many common lower level functions called tax components. Any optimization to any tax component will lead to performance improvements acr...
Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration	Stef Cuyckens, Xiaoling Yi, Robin Geens, Joren Dumoulin, Martin Wiesner, Chao Fang, Marian Verhelst	2025-11-09	Download	Emerging continual learning applications necessitate next-generation neural processing unit (NPU) platforms to support both training and inference operations.
STAR: Improving Lifetime and Performance of High-Capacity Modern SSDs Using State-Aware Randomizer	Omin Kwon, Kyungjun Oh, Jaeyong Lee, Myungsuk Kim, Jihong Kim	2025-11-09	Download	Although NAND flash memory has achieved continuous capacity improvements via advanced 3D stacking and multi-level cell technologies, these innovations introduce new reliability challenges, particularl...
Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications	Sed Centeno, Christopher Sprague, Arnab A Purkayastha, Ray Simar, Neeraj Magotra	2025-11-09	Download	Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes.
SoK: Systematizing a Decade of Architectural RowHammer Defenses Through the Lens of Streaming Algorithms	Michael Jaemin Kim, Seungmin Baek, Jumin Kim, Hwayong Nam, Nam Sung Kim, Jung Ho Ahn	2025-11-09	Download	A decade after its academic introduction, RowHammer (RH) remains a moving target that continues to challenge both the industry and academia. With its potential to serve as a critical attack vector, th...
LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs	Zifan He, Shengyu Ye, Rui Ma, Yang Wang, Jason Cong	2025-11-09	Download	The rapid development of large language models (LLM) has greatly enhanced everyday applications. While many FPGA-based accelerators, with flexibility for fine-grained data control, exhibit superior sp...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
FPGA or GPU? Analyzing comparative research for application-specific guidance	Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal	2025-11-09	Download	The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs...
Towards Optimal Constellation Design for Digital Over-the-Air Computation	Saeed Razavikia, Deniz Gündüz, Carlo Fischione	2025-11-09	Download	Over-the-air computation (OAC) has emerged as a key technique for efficient function computation over multiple-access channels (MACs) by exploiting the waveform superposition property of the wireless ...
PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization	Kelun Lei, Hailong Yang, Huaitao Zhang, Xin You, Kaige Zhang, Zhongzhi Luan, Yi Liu, Depei Qian	2025-11-09	Download	Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel gene...
Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism	Cong Li, Yuzhe Yang, Xuegui Zheng, Qifan Yang, Yijin Guan, Size Zheng, Li-Wen Chang, Shufan Liu, Xin Liu, Guangyu Sun	2025-11-09	Download	With the advancement of large language models (LLMs), their context windows have rapidly expanded. To meet diverse demands from varying-length requests in online services, existing state-of-the-art sy...
Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications	Sed Centeno, Christopher Sprague, Arnab A Purkayastha, Ray Simar, Neeraj Magotra	2025-11-09	Download	Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes.
LiteCast: A Lightweight Forecaster for Carbon Optimizations	Mathew Joseph, Tanush Savadi, Abel Souza	2025-11-09	Download	Over recent decades, electricity demand has experienced sustained growth through widespread electrification of transportation and the accelerated expansion of Artificial Intelligence (AI).

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
CYPRESS: Transferring Secrets in the Shadow of Visible Packets	Sirus Shahini, Robert Ricci	2025-11-09	Download	Network steganography and covert communication channels have been studied extensively in the past. However, prior works offer minimal practical use for their proposed techniques and are limited to spe...
Privacy-Preserving Federated Learning for Fair and Efficient Urban Traffic Optimization	Rathin Chandra Shit, Sharmila Subudhi	2025-11-09	Download	The optimization of urban traffic is threatened by the complexity of achieving a balance between transport efficiency and the maintenance of privacy, as well as the equitable distribution of traffic b...
Enhancing Adversarial Robustness of IoT Intrusion Detection via SHAP-Based Attribution Fingerprinting	Dilli Prasad Sharma, Liang Xue, Xiaowei Sun, Xiaodong Lin, Pulei Xiong	2025-11-09	Download	The rapid proliferation of Internet of Things (IoT) devices has transformed numerous industries by enabling seamless connectivity and data-driven automation.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
Guidelines for Building Indexes on Partially Cache-Coherent CXL Shared Memory	Fangnuo Wu, Mingkai Dong, Wenjun Cai, Jingsheng Yan, Haibo Chen	2025-11-09	Download	The Partial Cache-Coherence (PCC) model maintains hardware cache coherence only within subsets of cores, enabling large-scale memory sharing with emerging memory interconnect technologies like ...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
EcoSpa: Efficient Transformer Training with Coupled Sparsity	Jinqi Xiao, Cheng Luo, Lingyi Huang, Cheng Yang, Yang Sui, Huy Phan, Xiao Zang, Yibiao Ying, Zhexiang Tang, Anima Anandkumar, Bo Yuan	2025-11-09	Download	Transformers have become the backbone of modern AI, yet their high computational demands pose critical system challenges. While sparse training offers efficiency gains, existing methods fail to preser...