2025-11-09

cs.AR - Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FPGA or GPU? Analyzing comparative research for application-specific guidance | Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal | 2025-11-09 | Download | The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs... |
| Offloading Data Center Tax | Akshay Revankar, Charan Renganathan, Sartaj Wariah | 2025-11-09 | Download | The data centers of today are running diverse workloads sharing many common lower level functions called tax components. Any optimization to any tax component will lead to performance improvements acr... |
| Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration | Stef Cuyckens, Xiaoling Yi, Robin Geens, Joren Dumoulin, Martin Wiesner, Chao Fang, Marian Verhelst | 2025-11-09 | Download | Emerging continual learning applications necessitate next-generation neural processing unit (NPU) platforms to support both training and inference operations. |
| STAR: Improving Lifetime and Performance of High-Capacity Modern SSDs Using State-Aware Randomizer | Omin Kwon, Kyungjun Oh, Jaeyong Lee, Myungsuk Kim, Jihong Kim | 2025-11-09 | Download | Although NAND flash memory has achieved continuous capacity improvements via advanced 3D stacking and multi-level cell technologies, these innovations introduce new reliability challenges, particularl... |
| Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications | Sed Centeno, Christopher Sprague, Arnab A Purkayastha, Ray Simar, Neeraj Magotra | 2025-11-09 | Download | Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes. |
| SoK: Systematizing a Decade of Architectural RowHammer Defenses Through the Lens of Streaming Algorithms | Michael Jaemin Kim, Seungmin Baek, Jumin Kim, Hwayong Nam, Nam Sung Kim, Jung Ho Ahn | 2025-11-09 | Download | A decade after its academic introduction, RowHammer (RH) remains a moving target that continues to challenge both the industry and academia. With its potential to serve as a critical attack vector, th... |
| LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs | Zifan He, Shengyu Ye, Rui Ma, Yang Wang, Jason Cong | 2025-11-09 | Download | The rapid development of large language models (LLM) has greatly enhanced everyday applications. While many FPGA-based accelerators, with flexibility for fine-grained data control, exhibit superior sp... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| FPGA or GPU? Analyzing comparative research for application-specific guidance | Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal | 2025-11-09 | Download | The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs... |
| Towards Optimal Constellation Design for Digital Over-the-Air Computation | Saeed Razavikia, Deniz Gündüz, Carlo Fischione | 2025-11-09 | Download | Over-the-air computation (OAC) has emerged as a key technique for efficient function computation over multiple-access channels (MACs) by exploiting the waveform superposition property of the wireless ... |
| PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization | Kelun Lei, Hailong Yang, Huaitao Zhang, Xin You, Kaige Zhang, Zhongzhi Luan, Yi Liu, Depei Qian | 2025-11-09 | Download | Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel gene... |
| Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism | Cong Li, Yuzhe Yang, Xuegui Zheng, Qifan Yang, Yijin Guan, Size Zheng, Li-Wen Chang, Shufan Liu, Xin Liu, Guangyu Sun | 2025-11-09 | Download | With the advancement of large language models (LLMs), their context windows have rapidly expanded. To meet diverse demands from varying-length requests in online services, existing state-of-the-art sy... |
| Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications | Sed Centeno, Christopher Sprague, Arnab A Purkayastha, Ray Simar, Neeraj Magotra | 2025-11-09 | Download | Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes. |
| LiteCast: A Lightweight Forecaster for Carbon Optimizations | Mathew Joseph, Tanush Savadi, Abel Souza | 2025-11-09 | Download | Over recent decades, electricity demand has experienced sustained growth through widespread electrification of transportation and the accelerated expansion of Artificial Intelligence (AI). |

cs.NI - Networking and Internet Architecture

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| CYPRESS: Transferring Secrets in the Shadow of Visible Packets | Sirus Shahini, Robert Ricci | 2025-11-09 | Download | Network steganography and covert communication channels have been studied extensively in the past. However, prior works offer minimal practical use for their proposed techniques and are limited to spe... |
| Privacy-Preserving Federated Learning for Fair and Efficient Urban Traffic Optimization | Rathin Chandra Shit, Sharmila Subudhi | 2025-11-09 | Download | The optimization of urban traffic is threatened by the complexity of achieving a balance between transport efficiency and the maintenance of privacy, as well as the equitable distribution of traffic b... |
| Enhancing Adversarial Robustness of IoT Intrusion Detection via SHAP-Based Attribution Fingerprinting | Dilli Prasad Sharma, Liang Xue, Xiaowei Sun, Xiaodong Lin, Pulei Xiong | 2025-11-09 | Download | The rapid proliferation of Internet of Things (IoT) devices has transformed numerous industries by enabling seamless connectivity and data-driven automation. |

cs.OS - Operating Systems

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Guidelines for Building Indexes on Partially Cache-Coherent CXL Shared Memory | Fangnuo Wu, Mingkai Dong, Wenjun Cai, Jingsheng Yan, Haibo Chen | 2025-11-09 | Download | The Partial Cache-Coherence (PCC) model maintains hardware cache coherence only within subsets of cores, enabling large-scale memory sharing with emerging memory interconnect technologies like ... |

cs.PF - Performance

| Title | Authors | Date | PDF | Abstract |
| --- | --- | --- | --- | --- |
| EcoSpa: Efficient Transformer Training with Coupled Sparsity | Jinqi Xiao, Cheng Luo, Lingyi Huang, Cheng Yang, Yang Sui, Huy Phan, Xiao Zang, Yibiao Ying, Zhexiang Tang, Anima Anandkumar, Bo Yuan | 2025-11-09 | Download | Transformers have become the backbone of modern AI, yet their high computational demands pose critical system challenges. While sparse training offers efficiency gains, existing methods fail to preser... |