2025-11-09
cs.AR - Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| FPGA or GPU? Analyzing comparative research for application-specific guidance | Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal | 2025-11-09 | Download | The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs... |
| Offloading Data Center Tax | Akshay Revankar, Charan Renganathan, Sartaj Wariah | 2025-11-09 | Download | The data centers of today are running diverse workloads sharing many common lower level functions called tax components. Any optimization to any tax component will lead to performance improvements acr... |
| Precision-Scalable Microscaling Datapaths with Optimized Reduction Tree for Efficient NPU Integration | Stef Cuyckens, Xiaoling Yi, Robin Geens, Joren Dumoulin, Martin Wiesner, Chao Fang, Marian Verhelst | 2025-11-09 | Download | Emerging continual learning applications necessitate next-generation neural processing unit (NPU) platforms to support both training and inference operations. |
| STAR: Improving Lifetime and Performance of High-Capacity Modern SSDs Using State-Aware Randomizer | Omin Kwon, Kyungjun Oh, Jaeyong Lee, Myungsuk Kim, Jihong Kim | 2025-11-09 | Download | Although NAND flash memory has achieved continuous capacity improvements via advanced 3D stacking and multi-level cell technologies, these innovations introduce new reliability challenges, particularl... |
| Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications | Sed Centeno, Christopher Sprague, Arnab A Purkayastha, Ray Simar, Neeraj Magotra | 2025-11-09 | Download | Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes. |
| SoK: Systematizing a Decade of Architectural RowHammer Defenses Through the Lens of Streaming Algorithms | Michael Jaemin Kim, Seungmin Baek, Jumin Kim, Hwayong Nam, Nam Sung Kim, Jung Ho Ahn | 2025-11-09 | Download | A decade after its academic introduction, RowHammer (RH) remains a moving target that continues to challenge both the industry and academia. With its potential to serve as a critical attack vector, th... |
| LUT-LLM: Efficient Large Language Model Inference with Memory-based Computations on FPGAs | Zifan He, Shengyu Ye, Rui Ma, Yang Wang, Jason Cong | 2025-11-09 | Download | The rapid development of large language models (LLM) has greatly enhanced everyday applications. While many FPGA-based accelerators, with flexibility for fine-grained data control, exhibit superior sp... |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| FPGA or GPU? Analyzing comparative research for application-specific guidance | Arnab A Purkayastha, Jay Tharwani, Shobhit Aggarwal | 2025-11-09 | Download | The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs... |
| Towards Optimal Constellation Design for Digital Over-the-Air Computation | Saeed Razavikia, Deniz Gündüz, Carlo Fischione | 2025-11-09 | Download | Over-the-air computation (OAC) has emerged as a key technique for efficient function computation over multiple-access channels (MACs) by exploiting the waveform superposition property of the wireless ... |
| PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization | Kelun Lei, Hailong Yang, Huaitao Zhang, Xin You, Kaige Zhang, Zhongzhi Luan, Yi Liu, Depei Qian | 2025-11-09 | Download | Designing high-performance kernels requires expert-level tuning and a deep understanding of hardware characteristics. Recent advances in large language models (LLMs) have enabled automated kernel gene... |
| Optimizing Long-context LLM Serving via Fine-grained Sequence Parallelism | Cong Li, Yuzhe Yang, Xuegui Zheng, Qifan Yang, Yijin Guan, Size Zheng, Li-Wen Chang, Shufan Liu, Xin Liu, Guangyu Sun | 2025-11-09 | Download | With the advancement of large language models (LLMs), their context windows have rapidly expanded. To meet diverse demands from varying-length requests in online services, existing state-of-the-art sy... |
| Exploring Parallelism in FPGA-Based Accelerators for Machine Learning Applications | Sed Centeno, Christopher Sprague, Arnab A Purkayastha, Ray Simar, Neeraj Magotra | 2025-11-09 | Download | Speculative backpropagation has emerged as a promising technique to accelerate the training of neural networks by overlapping the forward and backward passes. |
| LiteCast: A Lightweight Forecaster for Carbon Optimizations | Mathew Joseph, Tanush Savadi, Abel Souza | 2025-11-09 | Download | Over recent decades, electricity demand has experienced sustained growth through widespread electrification of transportation and the accelerated expansion of Artificial Intelligence (AI). |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| CYPRESS: Transferring Secrets in the Shadow of Visible Packets | Sirus Shahini, Robert Ricci | 2025-11-09 | Download | Network steganography and covert communication channels have been studied extensively in the past. However, prior works offer minimal practical use for their proposed techniques and are limited to spe... |
| Privacy-Preserving Federated Learning for Fair and Efficient Urban Traffic Optimization | Rathin Chandra Shit, Sharmila Subudhi | 2025-11-09 | Download | The optimization of urban traffic is threatened by the complexity of achieving a balance between transport efficiency and the maintenance of privacy, as well as the equitable distribution of traffic b... |
| Enhancing Adversarial Robustness of IoT Intrusion Detection via SHAP-Based Attribution Fingerprinting | Dilli Prasad Sharma, Liang Xue, Xiaowei Sun, Xiaodong Lin, Pulei Xiong | 2025-11-09 | Download | The rapid proliferation of Internet of Things (IoT) devices has transformed numerous industries by enabling seamless connectivity and data-driven automation. |
cs.OS - Operating Systems
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| Guidelines for Building Indexes on Partially Cache-Coherent CXL Shared Memory | Fangnuo Wu, Mingkai Dong, Wenjun Cai, Jingsheng Yan, Haibo Chen | 2025-11-09 | Download | The Partial Cache-Coherence (PCC) model maintains hardware cache coherence only within subsets of cores, enabling large-scale memory sharing with emerging memory interconnect technologies like ... |
cs.PF - Performance
| Title | Authors | Published | Download | Abstract |
|---|---|---|---|---|
| EcoSpa: Efficient Transformer Training with Coupled Sparsity | Jinqi Xiao, Cheng Luo, Lingyi Huang, Cheng Yang, Yang Sui, Huy Phan, Xiao Zang, Yibiao Ying, Zhexiang Tang, Anima Anandkumar, Bo Yuan | 2025-11-09 | Download | Transformers have become the backbone of modern AI, yet their high computational demands pose critical system challenges. While sparse training offers efficiency gains, existing methods fail to preser... |