2025-03-26
cs.AR - Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Late Breaking Results: A RISC-V ISA Extension for Chaining in Scalar Processors | Luca Colagrande, Jayanth Jonnalagadda, Luca Benini | 2025-03-26 | Modern general-purpose accelerators integrate a large number of programmable area- and energy-efficient processing elements (PEs), to deliver high performance while meeting stringent power delivery an... |
| Dual-Issue Execution of Mixed Integer and Floating-Point Workloads on Energy-Efficient In-Order RISC-V Cores | Luca Colagrande, Luca Benini | 2025-03-26 | To meet the computational requirements of modern workloads under tight energy constraints, general-purpose accelerator architectures have to integrate an ever-increasing number of extremely area- and ... |
| Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu | 2025-03-26 | Hybrid storage systems (HSS) integrate multiple storage devices with diverse characteristics to deliver high performance and capacity at low cost. |
| Analyzing Modern NVIDIA GPU cores | Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González | 2025-03-26 | GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pip... |
| UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture | Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, Zhe Zhou, Ziyang Zhang, Yuhang Gai, Cunle Qian, Yi Xiong, Zhongwu Cheng, Jing Xia, Yuli Ma, Xi Chen, Wenhua Du, Shizhong Xiao, Chungang Li, Yong Qin, Liudong Xiong, Zhou Yu, Lv Chen, Lei Chen, Buyun Wang, Pei Wu, Junen Gao, Xiaochu Li, Jian He, Shizhuan Yan, Bill McColl | 2025-03-26 | As the Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture... |
| Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters | Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo | 2025-03-26 | The growing scale of data requires efficient memory subsystems with large memory capacity and high memory performance. Disaggregated architecture has become a promising solution for today's cloud and ... |
| VESTA: A Versatile SNN-Based Transformer Accelerator with Unified PEs for Multiple Computational Layers | Ching-Yao Chen, Meng-Chieh Chen, Tian-Sheuan Chang | 2025-03-26 | Spiking Neural Networks (SNNs) and transformers represent two powerful paradigms in neural computation, known for their low power consumption and ability to capture feature dependencies, respectively. |
| ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network | Chih-Chia Hsu, Tian-Sheuan Chang | 2025-03-26 | Deep learning-based super-resolution (SR) is challenging to implement in resource-constrained edge devices for resolutions beyond full HD due to its high computational complexity and memory bandwidth ... |
| Enhancing Finite State Machine Design Automation with Large Language Models and Prompt Engineering Techniques | Qun-Kai Lin, Cheng Hsu, Tian-Sheuan Chang | 2025-03-26 | Large Language Models (LLMs) have attracted considerable attention in recent years due to their remarkable compatibility with Hardware Description Language (HDL) design. |
cs.DC - Distributed, Parallel, and Cluster Computing
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs | Dimitar Mileski, Nikola Petrovski, Marjan Gusev | 2025-03-26 | Training large language models requires extensive processing, made possible by many high-performance computing resources. This study compares multi-node and multi-GPU environments for training large l... |
| History-Independent Concurrent Hash Tables | Hagit Attiya, Michael A. Bender, Martín Farach-Colton, Rotem Oshman, Noa Schiller | 2025-03-26 | A history-independent data structure does not reveal the history of operations applied to it, only its current logical state, even if its internal state is examined. |
| AllReduce Scheduling with Hierarchical Deep Reinforcement Learning | Yufan Wei, Mickel Liu, Wenfei Wu | 2025-03-26 | AllReduce is a technique in distributed computing which saw use in many critical applications of deep learning. Existing methods of AllReduce scheduling oftentimes lack flexibility due to being topolo... |
| Byzantine-Robust Federated Learning Using Generative Adversarial Networks | Usama Zafar, André M. H. Teixeira, Salman Toor | 2025-03-26 | Federated learning (FL) enables collaborative model training across distributed clients without sharing raw data, but its robustness is threatened by Byzantine behaviors such as data and model poisoni... |
| Advances in Semantic Patching for HPC-oriented Refactorings with Coccinelle | Michele Martone, Julia Lawall | 2025-03-26 | Currently, the most energy-efficient hardware platforms for floating point-intensive calculations (also known as High Performance Computing, or HPC) are graphical processing units (GPUs). |
| An Empirical Study of the Impact of Federated Learning on Machine Learning Model Accuracy | Haotian Yang, Zhuoran Wang, Benson Chou, Sophie Xu, Hao Wang, Jingxian Wang, Qizhen Zhang | 2025-03-26 | Federated Learning (FL) enables distributed ML model training on private user data at the global scale. Despite the potential of FL demonstrated in many domains, an in-depth view of its impact on mode... |
| NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs | Benjamin Carver, Jingyuan Zhang, Haoliang Wang, Kanak Mahadik, Yue Cheng | 2025-03-26 | Interactive notebook programming is universal in modern ML and AI workflows, with interactive deep learning training (IDLT) emerging as a dominant use case. |
| Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation | Yunkai Liang, Zhangyu Chen, Pengfei Zuo, Zhi Zhou, Xu Chen, Zhou Yu | 2025-03-26 | In large language model (LLM) serving systems, executing each request consists of two phases: the compute-intensive prefill phase and the memory-intensive decoding phase. |
| Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu | 2025-03-26 | Hybrid storage systems (HSS) integrate multiple storage devices with diverse characteristics to deliver high performance and capacity at low cost. |
| A Blockchain-Enabled Framework for Storage and Retrieval of Social Data | Aishwarya Parab, Prakhar Pradhan, Yogesh Simmhan, Arnab K. Paul | 2025-03-26 | The increasing availability of data from diverse sources, including trusted entities such as governments, as well as untrusted crowd-sourced contributors, demands a secure and trustworthy environment ... |
| Workshop Scientific HPC in the pre-Exascale era (part of ITADATA 2024) Proceedings | Nicola Bena, Claudia Diamantini, Michela Natilli, Luigi Romano, Giovanni Stilo, Valentina Pansanella, Claudio A. Ardagna, Anna Monreale, Roberto Trasarti, Valentina Cesare, Gianluca Mittone, Emanuele De Rubeis, Alberto Vecchiato | 2025-03-26 | The proceedings of Workshop Scientific HPC in the pre-Exascale era (SHPC), held in Pisa, Italy, September 18, 2024, are part of 3rd Italian Conference on Big Data and Data Science (ITADATA2024) procee... |
| GeoNimbus: A serverless framework to build earth observation and environmental services | Dante D. Sánchez-Gallegos, Diana Carrizales-Espinoza, Alejandro Zequeira, Catherine Torres-Charles, J. L. Gonzalez-Compean, Jesus Carretero | 2025-03-26 | Cloud computing has become a popular solution for organizations implementing Earth Observation Systems (EOS). However, this produces a dependency on provider resources. |
| TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives | Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ningxin Zheng, Ziheng Jiang, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Xin Liu | 2025-03-26 | Large deep learning models have achieved state-of-the-art performance in a wide range of tasks. These models often necessitate distributed systems for efficient training and inference. |
| Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters | Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo | 2025-03-26 | The growing scale of data requires efficient memory subsystems with large memory capacity and high memory performance. Disaggregated architecture has become a promising solution for today's cloud and ... |
| L4: Diagnosing Large-scale LLM Training Failures via Automated Log Analysis | Zhihan Jiang, Junjie Huang, Zhuangbin Chen, Yichen Li, Guangba Yu, Cong Feng, Yongqiang Yang, Zengyin Yang, Michael R. Lyu | 2025-03-26 | As Large Language Models (LLMs) show their capabilities across various applications, training customized LLMs has become essential for modern enterprises. |
| Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation | Srihas Yarlagadda, Amey Agrawal, Elton Pinto, Hakesh Darapaneni, Mitali Meratwal, Shivam Mittal, Pranavi Bajjuri, Srinivas Sridharan, Alexey Tumanov | 2025-03-26 | Training large foundation models costs hundreds of millions of dollars, making deployment optimization critical. Current approaches require machine learning engineers to manually craft training recipe... |
| AIGC-assisted Federated Learning for Edge Intelligence: Architecture Design, Research Challenges and Future Directions | Xianke Qiang, Zheng Chang, Ying-Chang Liang | 2025-03-26 | Federated learning (FL) can fully leverage large-scale terminal data while ensuring privacy and security, and is considered as a distributed alternative for the centralized machine learning. |
cs.NI - Networking and Internet Architecture
| Title | Authors | Published | Abstract |
|---|---|---|---|
| AllReduce Scheduling with Hierarchical Deep Reinforcement Learning | Yufan Wei, Mickel Liu, Wenfei Wu | 2025-03-26 | AllReduce is a technique in distributed computing which saw use in many critical applications of deep learning. Existing methods of AllReduce scheduling oftentimes lack flexibility due to being topolo... |
| Probabilistic Forecasting for Network Resource Analysis in Integrated Terrestrial and Non-Terrestrial Networks | Cristian J. Vaca-Rubio, Vaishnavi Kasuluru, Engin Zeydan, Luis Blanco, Roberto Pereira, Marius Caus, Kapal Dev | 2025-03-26 | Efficient resource management is critical for Non-Terrestrial Networks (NTNs) to provide consistent, high-quality service in remote and under-served regions. |
| Robust Deep Reinforcement Learning in Robotics via Adaptive Gradient-Masked Adversarial Attacks | Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui, Yue Gao | 2025-03-26 | Deep reinforcement learning (DRL) has emerged as a promising approach for robotic control, but its realworld deployment remains challenging due to its vulnerability to environmental perturbations. |
| State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning | Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui | 2025-03-26 | Recently, deep reinforcement learning (DRL) has emerged as a promising approach for robotic control. However, the deployment of DRL in real-world robots is hindered by its sensitivity to environmental... |
| LACeS: An Open, Fast, Responsible, and Efficient Longitudinal Anycast Census System | Remi Hendriks, Matthew Luckie, Mattijs Jonker, Raffaele Sommese, Roland van Rijswijk-Deij | 2025-03-26 | IP anycast replicates an address at multiple locations to reduce latency and enhance resilience. Due to anycast's crucial role in the modern Internet, earlier research introduced tools to perform anyc... |
| UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture | Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, Zhe Zhou, Ziyang Zhang, Yuhang Gai, Cunle Qian, Yi Xiong, Zhongwu Cheng, Jing Xia, Yuli Ma, Xi Chen, Wenhua Du, Shizhong Xiao, Chungang Li, Yong Qin, Liudong Xiong, Zhou Yu, Lv Chen, Lei Chen, Buyun Wang, Pei Wu, Junen Gao, Xiaochu Li, Jian He, Shizhuan Yan, Bill McColl | 2025-03-26 | As the Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture... |
| CNN+Transformer Based Anomaly Traffic Detection in UAV Networks for Emergency Rescue | Yulu Han, Ziye Jia, Sijie He, Yu Zhang, Qihui Wu | 2025-03-26 | The unmanned aerial vehicle (UAV) network has gained significant attentions in recent years due to its various applications. However, the traffic security becomes the key threatening public safety iss... |
| Sequential Task Assignment and Resource Allocation in V2X-Enabled Mobile Edge Computing | Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang | 2025-03-26 | Nowadays, the convergence of Mobile Edge Computing (MEC) and vehicular networks has emerged as a vital facilitator for the ever-increasing intelligent onboard applications. |
| TURBO: Utility-Aware Bandwidth Allocation for Cloud-Augmented Autonomous Control | Peter Schafhalter, Alexander Krentsel, Hongbo Wei, Joseph E. Gonzalez, Sylvia Ratnasamy, Scott Shenker, Ion Stoica | 2025-03-26 | Autonomous driving system progress has been driven by improvements in machine learning models, whose computational demands now exceed what edge devices alone can provide. |
cs.PF - Performance
| Title | Authors | Published | Abstract |
|---|---|---|---|
| Assembly line balancing considering stochastic task times and production defects | Gazi Nazia Nur, Mohammad Ahnaf Sadat, Basit Mahmud Shahriar | 2025-03-26 | In this paper, we address the inherent limitations in traditional assembly line balancing, specifically the assumptions that task times are constant and no defective outputs occur. |
| Underwater Image Enhancement by Convolutional Spiking Neural Networks | Vidya Sudevan, Fakhreddine Zayer, Rizwana Kausar, Sajid Javed, Hamad Karki, Giulia De Masi, Jorge Dias | 2025-03-26 | Underwater image enhancement (UIE) is fundamental for marine applications, including autonomous vision-based navigation. Deep learning methods using convolutional neural networks (CNN) and vision tran... |
| Linear-Time Graph Programs without Preconditions | Ziad Ismaili Alaoui, Detlef Plump | 2025-03-26 | We report on a recent breakthrough in rule-based graph programming, which allows us to reach the time complexity of imperative linear-time algorithms. |
| Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations | Pooja Rani, Jan-Andrea Bard, June Sallou, Alexander Boll, Timo Kehrer, Alberto Bacchelli | 2025-03-26 | The rapid technological evolution has accelerated software development for various domains and use cases, contributing to a growing share of global carbon emissions. |