2025-03-26

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Late Breaking Results: A RISC-V ISA Extension for Chaining in Scalar Processors | Luca Colagrande, Jayanth Jonnalagadda, Luca Benini | 2025-03-26 | Download | Modern general-purpose accelerators integrate a large number of programmable area- and energy-efficient processing elements (PEs), to deliver high performance while meeting stringent power delivery an... |
| Dual-Issue Execution of Mixed Integer and Floating-Point Workloads on Energy-Efficient In-Order RISC-V Cores | Luca Colagrande, Luca Benini | 2025-03-26 | Download | To meet the computational requirements of modern workloads under tight energy constraints, general-purpose accelerator architectures have to integrate an ever-increasing number of extremely area- and ... |
| Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu | 2025-03-26 | Download | Hybrid storage systems (HSS) integrate multiple storage devices with diverse characteristics to deliver high performance and capacity at low cost. |
| Analyzing Modern NVIDIA GPU cores | Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González | 2025-03-26 | Download | GPUs are the most popular platform for accelerating HPC workloads, such as artificial intelligence and science simulations. However, most microarchitectural research in academia relies on GPU core pip... |
| UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture | Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, Zhe Zhou, Ziyang Zhang, Yuhang Gai, Cunle Qian, Yi Xiong, Zhongwu Cheng, Jing Xia, Yuli Ma, Xi Chen, Wenhua Du, Shizhong Xiao, Chungang Li, Yong Qin, Liudong Xiong, Zhou Yu, Lv Chen, Lei Chen, Buyun Wang, Pei Wu, Junen Gao, Xiaochu Li, Jian He, Shizhuan Yan, Bill McColl | 2025-03-26 | Download | As the Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture... |
| Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters | Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo | 2025-03-26 | Download | The growing scale of data requires efficient memory subsystems with large memory capacity and high memory performance. Disaggregated architecture has become a promising solution for today's cloud and ... |
| VESTA: A Versatile SNN-Based Transformer Accelerator with Unified PEs for Multiple Computational Layers | Ching-Yao Chen, Meng-Chieh Chen, Tian-Sheuan Chang | 2025-03-26 | Download | Spiking Neural Networks (SNNs) and transformers represent two powerful paradigms in neural computation, known for their low power consumption and ability to capture feature dependencies, respectively. |
| ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network | Chih-Chia Hsu, Tian-Sheuan Chang | 2025-03-26 | Download | Deep learning-based super-resolution (SR) is challenging to implement in resource-constrained edge devices for resolutions beyond full HD due to its high computational complexity and memory bandwidth ... |
| Enhancing Finite State Machine Design Automation with Large Language Models and Prompt Engineering Techniques | Qun-Kai Lin, Cheng Hsu, Tian-Sheuan Chang | 2025-03-26 | Download | Large Language Models (LLMs) have attracted considerable attention in recent years due to their remarkable compatibility with Hardware Description Language (HDL) design. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Scalability Evaluation of HPC Multi-GPU Training for ECG-based LLMs | Dimitar Mileski, Nikola Petrovski, Marjan Gusev | 2025-03-26 | Download | Training large language models requires extensive processing, made possible by many high-performance computing resources. This study compares multi-node and multi-GPU environments for training large l... |
| History-Independent Concurrent Hash Tables | Hagit Attiya, Michael A. Bender, Martín Farach-Colton, Rotem Oshman, Noa Schiller | 2025-03-26 | Download | A history-independent data structure does not reveal the history of operations applied to it, only its current logical state, even if its internal state is examined. |
| AllReduce Scheduling with Hierarchical Deep Reinforcement Learning | Yufan Wei, Mickel Liu, Wenfei Wu | 2025-03-26 | Download | AllReduce is a technique in distributed computing which saw use in many critical applications of deep learning. Existing methods of AllReduce scheduling oftentimes lack flexibility due to being topolo... |
| Byzantine-Robust Federated Learning Using Generative Adversarial Networks | Usama Zafar, André M. H. Teixeira, Salman Toor | 2025-03-26 | Download | Federated learning (FL) enables collaborative model training across distributed clients without sharing raw data, but its robustness is threatened by Byzantine behaviors such as data and model poisoni... |
| Advances in Semantic Patching for HPC-oriented Refactorings with Coccinelle | Michele Martone, Julia Lawall | 2025-03-26 | Download | Currently, the most energy-efficient hardware platforms for floating point-intensive calculations (also known as High Performance Computing, or HPC) are graphical processing units (GPUs). |
| An Empirical Study of the Impact of Federated Learning on Machine Learning Model Accuracy | Haotian Yang, Zhuoran Wang, Benson Chou, Sophie Xu, Hao Wang, Jingxian Wang, Qizhen Zhang | 2025-03-26 | Download | Federated Learning (FL) enables distributed ML model training on private user data at the global scale. Despite the potential of FL demonstrated in many domains, an in-depth view of its impact on mode... |
| NotebookOS: A Replicated Notebook Platform for Interactive Training with On-Demand GPUs | Benjamin Carver, Jingyuan Zhang, Haoliang Wang, Kanak Mahadik, Yue Cheng | 2025-03-26 | Download | Interactive notebook programming is universal in modern ML and AI workflows, with interactive deep learning training (IDLT) emerging as a dominant use case. |
| Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation | Yunkai Liang, Zhangyu Chen, Pengfei Zuo, Zhi Zhou, Xu Chen, Zhou Yu | 2025-03-26 | Download | In large language model (LLM) serving systems, executing each request consists of two phases: the compute-intensive prefill phase and the memory-intensive decoding phase. |
| Harmonia: A Multi-Agent Reinforcement Learning Approach to Data Placement and Migration in Hybrid Storage Systems | Rakesh Nadig, Vamanan Arulchelvan, Rahul Bera, Taha Shahroodi, Gagandeep Singh, Andreas Kakolyris, Mohammad Sadrosadati, Jisung Park, Onur Mutlu | 2025-03-26 | Download | Hybrid storage systems (HSS) integrate multiple storage devices with diverse characteristics to deliver high performance and capacity at low cost. |
| A Blockchain-Enabled Framework for Storage and Retrieval of Social Data | Aishwarya Parab, Prakhar Pradhan, Yogesh Simmhan, Arnab K. Paul | 2025-03-26 | Download | The increasing availability of data from diverse sources, including trusted entities such as governments, as well as untrusted crowd-sourced contributors, demands a secure and trustworthy environment ... |
| Workshop Scientific HPC in the pre-Exascale era (part of ITADATA 2024) Proceedings | Nicola Bena, Claudia Diamantini, Michela Natilli, Luigi Romano, Giovanni Stilo, Valentina Pansanella, Claudio A. Ardagna, Anna Monreale, Roberto Trasarti, Valentina Cesare, Gianluca Mittone, Emanuele De Rubeis, Alberto Vecchiato | 2025-03-26 | Download | The proceedings of Workshop Scientific HPC in the pre-Exascale era (SHPC), held in Pisa, Italy, September 18, 2024, are part of 3rd Italian Conference on Big Data and Data Science (ITADATA2024) procee... |
| GeoNimbus: A serverless framework to build earth observation and environmental services | Dante D. Sánchez-Gallegos, Diana Carrizales-Espinoza, Alejandro Zequeira, Catherine Torres-Charles, J. L. Gonzalez-Compean, Jesus Carretero | 2025-03-26 | Download | Cloud computing has become a popular solution for organizations implementing Earth Observation Systems (EOS). However, this produces a dependency on provider resources. |
| TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives | Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ningxin Zheng, Ziheng Jiang, Dongyang Wang, Jianxi Ye, Haibin Lin, Li-Wen Chang, Xin Liu | 2025-03-26 | Download | Large deep learning models have achieved state-of-the-art performance in a wide range of tasks. These models often necessitate distributed systems for efficient training and inference. |
| Survey of Disaggregated Memory: Cross-layer Technique Insights for Next-Generation Datacenters | Jing Wang, Chao Li, Taolei Wang, Jinyang Guo, Hanzhang Yang, Yiming Zhuansun, Minyi Guo | 2025-03-26 | Download | The growing scale of data requires efficient memory subsystems with large memory capacity and high memory performance. Disaggregated architecture has become a promising solution for today's cloud and ... |
| L4: Diagnosing Large-scale LLM Training Failures via Automated Log Analysis | Zhihan Jiang, Junjie Huang, Zhuangbin Chen, Yichen Li, Guangba Yu, Cong Feng, Yongqiang Yang, Zengyin Yang, Michael R. Lyu | 2025-03-26 | Download | As Large Language Models (LLMs) show their capabilities across various applications, training customized LLMs has become essential for modern enterprises. |
| Maya: Optimizing Deep Learning Training Workloads using GPU Runtime Emulation | Srihas Yarlagadda, Amey Agrawal, Elton Pinto, Hakesh Darapaneni, Mitali Meratwal, Shivam Mittal, Pranavi Bajjuri, Srinivas Sridharan, Alexey Tumanov | 2025-03-26 | Download | Training large foundation models costs hundreds of millions of dollars, making deployment optimization critical. Current approaches require machine learning engineers to manually craft training recipe... |
| AIGC-assisted Federated Learning for Edge Intelligence: Architecture Design, Research Challenges and Future Directions | Xianke Qiang, Zheng Chang, Ying-Chang Liang | 2025-03-26 | Download | Federated learning (FL) can fully leverage large-scale terminal data while ensuring privacy and security, and is considered as a distributed alternative for the centralized machine learning. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| AllReduce Scheduling with Hierarchical Deep Reinforcement Learning | Yufan Wei, Mickel Liu, Wenfei Wu | 2025-03-26 | Download | AllReduce is a technique in distributed computing which saw use in many critical applications of deep learning. Existing methods of AllReduce scheduling oftentimes lack flexibility due to being topolo... |
| Probabilistic Forecasting for Network Resource Analysis in Integrated Terrestrial and Non-Terrestrial Networks | Cristian J. Vaca-Rubio, Vaishnavi Kasuluru, Engin Zeydan, Luis Blanco, Roberto Pereira, Marius Caus, Kapal Dev | 2025-03-26 | Download | Efficient resource management is critical for Non-Terrestrial Networks (NTNs) to provide consistent, high-quality service in remote and under-served regions. |
| Robust Deep Reinforcement Learning in Robotics via Adaptive Gradient-Masked Adversarial Attacks | Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui, Yue Gao | 2025-03-26 | Download | Deep reinforcement learning (DRL) has emerged as a promising approach for robotic control, but its realworld deployment remains challenging due to its vulnerability to environmental perturbations. |
| State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning | Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui | 2025-03-26 | Download | Recently, deep reinforcement learning (DRL) has emerged as a promising approach for robotic control. However, the deployment of DRL in real-world robots is hindered by its sensitivity to environmental... |
| LACeS: An Open, Fast, Responsible, and Efficient Longitudinal Anycast Census System | Remi Hendriks, Matthew Luckie, Mattijs Jonker, Raffaele Sommese, Roland van Rijswijk-Deij | 2025-03-26 | Download | IP anycast replicates an address at multiple locations to reduce latency and enhance resilience. Due to anycast's crucial role in the modern Internet, earlier research introduced tools to perform anyc... |
| UB-Mesh: a Hierarchically Localized nD-FullMesh Datacenter Network Architecture | Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, Zhe Zhou, Ziyang Zhang, Yuhang Gai, Cunle Qian, Yi Xiong, Zhongwu Cheng, Jing Xia, Yuli Ma, Xi Chen, Wenhua Du, Shizhong Xiao, Chungang Li, Yong Qin, Liudong Xiong, Zhou Yu, Lv Chen, Lei Chen, Buyun Wang, Pei Wu, Junen Gao, Xiaochu Li, Jian He, Shizhuan Yan, Bill McColl | 2025-03-26 | Download | As the Large-scale Language Models (LLMs) continue to scale, the requisite computational power and bandwidth escalate. To address this, we introduce UB-Mesh, a novel AI datacenter network architecture... |
| CNN+Transformer Based Anomaly Traffic Detection in UAV Networks for Emergency Rescue | Yulu Han, Ziye Jia, Sijie He, Yu Zhang, Qihui Wu | 2025-03-26 | Download | The unmanned aerial vehicle (UAV) network has gained significant attentions in recent years due to its various applications. However, the traffic security becomes the key threatening public safety iss... |
| Sequential Task Assignment and Resource Allocation in V2X-Enabled Mobile Edge Computing | Yufei Ye, Shijian Gao, Xinhu Zheng, Liuqing Yang | 2025-03-26 | Download | Nowadays, the convergence of Mobile Edge Computing (MEC) and vehicular networks has emerged as a vital facilitator for the ever-increasing intelligent onboard applications. |
| TURBO: Utility-Aware Bandwidth Allocation for Cloud-Augmented Autonomous Control | Peter Schafhalter, Alexander Krentsel, Hongbo Wei, Joseph E. Gonzalez, Sylvia Ratnasamy, Scott Shenker, Ion Stoica | 2025-03-26 | Download | Autonomous driving system progress has been driven by improvements in machine learning models, whose computational demands now exceed what edge devices alone can provide. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Assembly line balancing considering stochastic task times and production defects | Gazi Nazia Nur, Mohammad Ahnaf Sadat, Basit Mahmud Shahriar | 2025-03-26 | Download | In this paper, we address the inherent limitations in traditional assembly line balancing, specifically the assumptions that task times are constant and no defective outputs occur. |
| Underwater Image Enhancement by Convolutional Spiking Neural Networks | Vidya Sudevan, Fakhreddine Zayer, Rizwana Kausar, Sajid Javed, Hamad Karki, Giulia De Masi, Jorge Dias | 2025-03-26 | Download | Underwater image enhancement (UIE) is fundamental for marine applications, including autonomous vision-based navigation. Deep learning methods using convolutional neural networks (CNN) and vision tran... |
| Linear-Time Graph Programs without Preconditions | Ziad Ismaili Alaoui, Detlef Plump | 2025-03-26 | Download | We report on a recent breakthrough in rule-based graph programming, which allows us to reach the time complexity of imperative linear-time algorithms. |
| Can We Make Code Green? Understanding Trade-Offs in LLMs vs. Human Code Optimizations | Pooja Rani, Jan-Andrea Bard, June Sallou, Alexander Boll, Timo Kehrer, Alberto Bacchelli | 2025-03-26 | Download | The rapid technological evolution has accelerated software development for various domains and use cases, contributing to a growing share of global carbon emissions. |