2025-03-26

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
Late Breaking Results: A RISC-V ISA Extension for Chaining in Scalar Processors	Luca Colagrande, Jayanth Jonnalagadda, Luca Benini	2025-03-26	Download	Modern general-purpose accelerators integrate a large number of programmable area- and energy-efficient processing elements (PEs), to deliver high performance while meeting stringent power delivery an...
Dual-Issue Execution of Mixed Integer and Floating-Point Workloads on Energy-Efficient In-Order RISC-V Cores	Luca Colagrande, Luca Benini	2025-03-26	Download	To meet the computational requirements of modern workloads under tight energy constraints, general-purpose accelerator architectures have to integrate an ever-increasing number of extremely area- and ...
Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems	Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu	2025-03-26	Download	Hybrid storage systems (HSS) integrate multiple storage devices with diverse characteristics to deliver high performance and capacity at low cost.
Analyzing Modern NVIDIA GPU cores	Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González	2025-03-26	Download	GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pip...
UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture	Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, Zhe Zhou, Ziyang Zhang, Yuhang Gai, Cunle Qian, Yi Xiong, Zhongwu Cheng, Jing Xia, Yuli Ma, Xi Chen, Wenhua Du, Shizhong Xiao, Chungang Li, Yong Qin, Liudong Xiong, Zhou Yu, Lv Chen, Lei Chen, Buyun Wang, Pei Wu, Junen Gao, Xiaochu Li, Jian He, Shizhuan Yan, Bill McColl	2025-03-26	Download	As the Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture...
Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters	Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo	2025-03-26	Download	The growing scale of data requires efficient memory subsystems with large memory capacity and high memory performance. Disaggregated architecture has become a promising solution for today's cloud and ...
VESTA: A Versatile SNN-Based Transformer Accelerator with Unified PEs for Multiple Computational Layers	Ching-Yao Chen, Meng-Chieh Chen, Tian-Sheuan Chang	2025-03-26	Download	Spiking Neural Networks (SNNs) and transformers represent two powerful paradigms in neural computation, known for their low power consumption and ability to capture feature dependencies, respectively.
ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network	Chih-Chia Hsu, Tian-Sheuan Chang	2025-03-26	Download	Deep learning-based super-resolution (SR) is challenging to implement in resource-constrained edge devices for resolutions beyond full HD due to its high computational complexity and memory bandwidth ...
Enhancing Finite State Machine Design Automation with Large Language Models and Prompt Engineering Techniques	Qun-Kai Lin, Cheng Hsu, Tian-Sheuan Chang	2025-03-26	Download	Large Language Models (LLMs) have attracted considerable attention in recent years due to their remarkable compatibility with Hardware Description Language (HDL) design.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs	Dimitar Mileski, Nikola Petrovski, Marjan Gusev	2025-03-26	Download	Training large language models requires extensive processing, made possible by many high-performance computing resources. This study compares multi-node and multi-GPU environments for training large l...
History-Independent Concurrent Hash Tables	Hagit Attiya, Michael A. Bender, Martín Farach-Colton, Rotem Oshman, Noa Schiller	2025-03-26	Download	A history-independent data structure does not reveal the history of operations applied to it, only its current logical state, even if its internal state is examined.
AllReduce Scheduling with Hierarchical Deep Reinforcement Learning	Yufan Wei, Mickel Liu, Wenfei Wu	2025-03-26	Download	AllReduce is a technique in distributed computing which saw use in many critical applications of deep learning. Existing methods of AllReduce scheduling oftentimes lack flexibility due to being topolo...
Byzantine-Robust Federated Learning Using Generative Adversarial Networks	Usama Zafar, André M. H. Teixeira, Salman Toor	2025-03-26	Download	Federated learning (FL) enables collaborative model training across distributed clients without sharing raw data, but its robustness is threatened by Byzantine behaviors such as data and model poisoni...
Advances in Semantic Patching for HPC-oriented Refactorings with Coccinelle	Michele Martone, Julia Lawall	2025-03-26	Download	Currently, the most energy-efficient hardware platforms for floating point-intensive calculations (also known as High Performance Computing, or HPC) are graphical processing units (GPUs).
An Empirical Study of the Impact of Federated Learning on Machine Learning Model Accuracy	Haotian Yang, Zhuoran Wang, Benson Chou, Sophie Xu, Hao Wang, Jingxian Wang, Qizhen Zhang	2025-03-26	Download	Federated Learning (FL) enables distributed ML model training on private user data at the global scale. Despite the potential of FL demonstrated in many domains, an in-depth view of its impact on mode...
NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs	Benjamin Carver, Jingyuan Zhang, Haoliang Wang, Kanak Mahadik, Yue Cheng	2025-03-26	Download	Interactive notebook programming is universal in modern ML and AI workflows, with interactive deep learning training (IDLT) emerging as a dominant use case.
Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation	Yunkai Liang, Zhangyu Chen, Pengfei Zuo, Zhi Zhou, Xu Chen, Zhou Yu	2025-03-26	Download	In large language model (LLM) serving systems, executing each request consists of two phases: the compute-intensive prefill phase and the memory-intensive decoding phase.
Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems	Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu	2025-03-26	Download	Hybrid storage systems (HSS) integrate multiple storage devices with diverse characteristics to deliver high performance and capacity at low cost.
A Blockchain-Enabled Framework for Storage and Retrieval of Social Data	Aishwarya Parab, Prakhar Pradhan, Yogesh Simmhan, Arnab K. Paul	2025-03-26	Download	The increasing availability of data from diverse sources, including trusted entities such as governments, as well as untrusted crowd-sourced contributors, demands a secure and trustworthy environment ...
Workshop Scientific HPC in the pre-Exascale era (part of ITADATA 2024) Proceedings	Nicola Bena, Claudia Diamantini, Michela Natilli, Luigi Romano, Giovanni Stilo, Valentina Pansanella, Claudio A. Ardagna, Anna Monreale, Roberto Trasarti, Valentina Cesare, Gianluca Mittone, Emanuele De Rubeis, Alberto Vecchiato	2025-03-26	Download	The proceedings of Workshop Scientific HPC in the pre-Exascale era (SHPC), held in Pisa, Italy, September 18, 2024, are part of 3rd Italian Conference on Big Data and Data Science (ITADATA2024) procee...
GeoNimbus: A serverless framework to build earth observation and environmental services	Dante D. Sánchez-Gallegos, Diana Carrizales-Espinoza, Alejandro Zequeira, Catherine Torres-Charles, J. L. Gonzalez-Compean, Jesus Carretero	2025-03-26	Download	Cloud computing has become a popular solution for organizations implementing Earth Observation Systems (EOS). However, this produces a dependency on provider resources.
TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives	Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ningxin Zheng, Ziheng Jiang, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Xin Liu	2025-03-26	Download	Large deep learning models have achieved state-of-the-art performance in a wide range of tasks. These models often necessitate distributed systems for efficient training and inference.
Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters	Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo	2025-03-26	Download	The growing scale of data requires efficient memory subsystems with large memory capacity and high memory performance. Disaggregated architecture has become a promising solution for today's cloud and ...
L4: Diagnosing Large-scale LLM Training Failures via Automated Log Analysis	Zhihan Jiang, Junjie Huang, Zhuangbin Chen, Yichen Li, Guangba Yu, Cong Feng, Yongqiang Yang, Zengyin Yang, Michael R. Lyu	2025-03-26	Download	As Large Language Models (LLMs) show their capabilities across various applications, training customized LLMs has become essential for modern enterprises.
Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation	Srihas Yarlagadda, Amey Agrawal, Elton Pinto, Hakesh Darapaneni, Mitali Meratwal, Shivam Mittal, Pranavi Bajjuri, Srinivas Sridharan, Alexey Tumanov	2025-03-26	Download	Training large foundation models costs hundreds of millions of dollars, making deployment optimization critical. Current approaches require machine learning engineers to manually craft training recipe...
AIGC-assisted Federated Learning for Edge Intelligence: Architecture Design, Research Challenges and Future Directions	Xianke Qiang, Zheng Chang, Ying-Chang Liang	2025-03-26	Download	Federated learning (FL) can fully leverage large-scale terminal data while ensuring privacy and security, and is considered as a distributed alternative for the centralized machine learning.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
AllReduce Scheduling with Hierarchical Deep Reinforcement Learning	Yufan Wei, Mickel Liu, Wenfei Wu	2025-03-26	Download	AllReduce is a technique in distributed computing which saw use in many critical applications of deep learning. Existing methods of AllReduce scheduling oftentimes lack flexibility due to being topolo...
Probabilistic Forecasting for Network Resource Analysis in Integrated Terrestrial and Non-Terrestrial Networks	Cristian J. Vaca-Rubio, Vaishnavi Kasuluru, Engin Zeydan, Luis Blanco, Roberto Pereira, Marius Caus, Kapal Dev	2025-03-26	Download	Efficient resource management is critical for Non-Terrestrial Networks (NTNs) to provide consistent, high-quality service in remote and under-served regions.
Robust Deep Reinforcement Learning in Robotics via Adaptive Gradient-Masked Adversarial Attacks	Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui, Yue Gao	2025-03-26	Download	Deep reinforcement learning (DRL) has emerged as a promising approach for robotic control, but its real-world deployment remains challenging due to its vulnerability to environmental perturbations.
State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning	Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui	2025-03-26	Download	Recently, deep reinforcement learning (DRL) has emerged as a promising approach for robotic control. However, the deployment of DRL in real-world robots is hindered by its sensitivity to environmental...
LACeS: An Open, Fast, Responsible, and Efficient Longitudinal Anycast Census System	Remi Hendriks, Matthew Luckie, Mattijs Jonker, Raffaele Sommese, Roland van Rijswijk-Deij	2025-03-26	Download	IP anycast replicates an address at multiple locations to reduce latency and enhance resilience. Due to anycast's crucial role in the modern Internet, earlier research introduced tools to perform anyc...
UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture	Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, Zhe Zhou, Ziyang Zhang, Yuhang Gai, Cunle Qian, Yi Xiong, Zhongwu Cheng, Jing Xia, Yuli Ma, Xi Chen, Wenhua Du, Shizhong Xiao, Chungang Li, Yong Qin, Liudong Xiong, Zhou Yu, Lv Chen, Lei Chen, Buyun Wang, Pei Wu, Junen Gao, Xiaochu Li, Jian He, Shizhuan Yan, Bill McColl	2025-03-26	Download	As the Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture...
CNN+Transformer Based Anomaly Traffic Detection in UAV Networks for Emergency Rescue	Yulu Han, Ziye Jia, Sijie He, Yu Zhang, Qihui Wu	2025-03-26	Download	The unmanned aerial vehicle (UAV) network has gained significant attentions in recent years due to its various applications. However, the traffic security becomes the key threatening public safety iss...
Sequential Task Assignment and Resource Allocation in V2X-Enabled Mobile Edge Computing	Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang	2025-03-26	Download	Nowadays, the convergence of Mobile Edge Computing (MEC) and vehicular networks has emerged as a vital facilitator for the ever-increasing intelligent onboard applications.
TURBO: Utility-Aware Bandwidth Allocation for Cloud-Augmented Autonomous Control	Peter Schafhalter, Alexander Krentsel, Hongbo Wei, Joseph E. Gonzalez, Sylvia Ratnasamy, Scott Shenker, Ion Stoica	2025-03-26	Download	Autonomous driving system progress has been driven by improvements in machine learning models, whose computational demands now exceed what edge devices alone can provide.

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Assembly line balancing considering stochastic task times and production defects	Gazi Nazia Nur, Mohammad Ahnaf Sadat, Basit Mahmud Shahriar	2025-03-26	Download	In this paper, we address the inherent limitations in traditional assembly line balancing, specifically the assumptions that task times are constant and no defective outputs occur.
Underwater Image Enhancement by Convolutional Spiking Neural Networks	Vidya Sudevan, Fakhreddine Zayer, Rizwana Kausar, Sajid Javed, Hamad Karki, Giulia De Masi, Jorge Dias	2025-03-26	Download	Underwater image enhancement (UIE) is fundamental for marine applications, including autonomous vision-based navigation. Deep learning methods using convolutional neural networks (CNN) and vision tran...
Linear-Time Graph Programs without Preconditions	Ziad Ismaili Alaoui, Detlef Plump	2025-03-26	Download	We report on a recent breakthrough in rule-based graph programming, which allows us to reach the time complexity of imperative linear-time algorithms.
Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations	Pooja Rani, Jan-Andrea Bard, June Sallou, Alexander Boll, Timo Kehrer, Alberto Bacchelli	2025-03-26	Download	The rapid technological evolution has accelerated software development for various domains and use cases, contributing to a growing share of global carbon emissions.