2025-03-31

cs.AR - Architecture

Title	Authors	Published	PDF	Abstract
SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis	Nika Mansouri Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu	2025-03-31	Download	Genome sequence analysis, which examines the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields.
PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs	Jinendra Malekar, Peyton Chandarana, Md Hasibul Amin, Mohammed E. Elbtity, Ramtin Zand	2025-03-31	Download	In this paper, we propose PIM-LLM, a hybrid architecture developed to accelerate 1-bit large language models (LLMs). PIM-LLM leverages analog processing-in-memory (PIM) architectures and digital systo...
SPRING: Systematic Profiling of Randomly Interconnected Neural Networks Generated by HLS	Rui Shi, Seda Ogrenci	2025-03-31	Download	Profiling is important for performance optimization by providing real-time observations and measurements of important parameters of hardware execution.
Banked Memories for Soft SIMT Processors	Martin Langhammer, George A. Constantinides	2025-03-31	Download	Recent advances in soft GPGPU architectures have shown that a small (<10K LUT), high performance (770 MHz) processor is possible in modern FPGAs.
ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance	Tong Xie, Jiawang Zhao, Zishen Wan, Zuodong Zhang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li	2025-03-31	Download	The demand for efficient large language model (LLM) inference has propelled the development of dedicated accelerators. As accelerators are vulnerable to hardware faults due to aging, variation, etc, e...
DiffuSE: Cross-Layer Design Space Exploration of DNN Accelerator via Diffusion-Driven Optimization	Yi Ren, Chenhao Xue, Jiaxing Zhang, Chen Zhang, Qiang Xu, Yibo Lin, Lining Zhang, Guangyu Sun	2025-03-31	Download	The proliferation of deep learning accelerators calls for efficient and cost-effective hardware design solutions, where parameterized modular hardware generator and electronic design automation (EDA) ...
DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators	Chenhao Xue, Yi Ren, Jinwei Zhou, Kezhi Li, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun	2025-03-31	Download	Multipliers and multiply-accumulators (MACs) are fundamental building blocks for compute-intensive applications such as artificial intelligence.
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration	Tatsuya Kubo, Daichi Tokuda, Tomoya Nagatani, Masayuki Usui, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki	2025-03-31	Download	General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models.
TuRTLe: A Unified Evaluation of LLMs for RTL Generation	Dario Garcia-Gasulla, Gokcen Kestor, Emanuele Parisi, Miquel Albertí-Binimelis, Cristian Gutierrez, Razine Moundir Ghorab, Orlando Montenegro, Bernat Homs, Miquel Moreto	2025-03-31	Download	The rapid advancements in LLMs have driven the adoption of generative AI in various domains, including Electronic Design Automation (EDA). Unlike traditional software development, EDA presents unique ...
Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers	Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Celine Lin	2025-03-31	Download	Recent advancements in neural rendering technologies and their supporting devices have paved the way for immersive 3D experiences, significantly transforming human interaction with intelligent devices...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis	Nika Mansouri Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu	2025-03-31	Download	Genome sequence analysis, which examines the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields.
Rack Position Optimization in Large-Scale Heterogeneous Data Centers	Chang-Lin Chen, Jiayu Chen, Tian Lan, Zhaoxia Zhao, Hongbo Dong, Vaneet Aggarwal	2025-03-31	Download	As rapidly growing AI computational demands accelerate the need for new hardware installation and maintenance, this work explores optimal data center resource management by balancing operational effic...
GPU-centric Communication Schemes for HPC and ML Applications	Naveen Namashivayam	2025-03-31	Download	Compute nodes on modern heterogeneous supercomputing systems comprise CPUs, GPUs, and high-speed network interconnects (NICs). Parallelization is identified as a technique for effectively utilizing th...
Fermilab's Transition to Token Authentication	Dave Dykstra, Mine Altunay, Shreyas Bhat, Dmitry Litvintsev, Marco Mambelli, Marc Mengel, Stephen White	2025-03-31	Download	Fermilab is the first High Energy Physics institution to transition from X.509 user certificates to authentication tokens in production systems.
Enhancing Traffic Safety with AI and 6G: Latency Requirements and Real-Time Threat Detection	Kurt Horvath, Dragi Kimovski, Stojan Kitanov, Radu Prodan	2025-03-31	Download	The rapid digitalization of urban infrastructure opens the path to smart cities, where IoT-enabled infrastructure enhances public safety and efficiency.
Deep Learning Model Deployment in Multiple Cloud Providers: an Exploratory Study Using Low Computing Power Environments	Elayne Lemos, Rodrigo Oliveira, Jairson Rodrigues, Rosalvo F. Oliveira Neto	2025-03-31	Download	The deployment of Machine Learning models in the cloud has grown among tech companies. Hardware requirements are higher when these models involve Deep Learning techniques, and the cloud providers' cos...
A Practical Rollup Escape Hatch Design	Francisco Gomes Figueira, Martin Derka, Ching Lun Chiu, Jan Gorzny	2025-03-31	Download	A rollup network is a type of popular "Layer 2" scaling solution for general purpose "Layer 1" blockchains like Ethereum. Rollups networks separate execution of transactions from other aspects like co...
OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training	Yijie Zheng, Bangjun Xiao, Lei Shi, Xiaoyang Li, Faming Wu, Tianyu Li, Xuefeng Xiao, Yang Zhang, Yuxuan Wang, Shouda Liu	2025-03-31	Download	Multimodal large language models (MLLMs), such as GPT-4o, are garnering significant attention. During the exploration of MLLM training, we identified Modality Composition Incoherence, a phenomenon tha...
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration	Tatsuya Kubo, Daichi Tokuda, Tomoya Nagatani, Masayuki Usui, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki	2025-03-31	Download	General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models.
Who is in Charge here? Understanding How Runtime Configuration Affects Software along with Variables&Constants	Chaopeng Luo, Yuanliang Zhang, Haochen He, Zhouyang Jia, Teng Wang, Shulin Zhou, Si Zheng, Shanshan Li	2025-03-31	Download	Runtime misconfiguration can lead to software performance degradation and even cause failure. Developers typically perform sanity checks during the configuration parsing stage to prevent invalid param...

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Rack Position Optimization in Large-Scale Heterogeneous Data Centers	Chang-Lin Chen, Jiayu Chen, Tian Lan, Zhaoxia Zhao, Hongbo Dong, Vaneet Aggarwal	2025-03-31	Download	As rapidly growing AI computational demands accelerate the need for new hardware installation and maintenance, this work explores optimal data center resource management by balancing operational effic...
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning	Yubo Zhang, Pedro Botelho, Trevor Gordon, Gil Zussman, Igor Kadota	2025-03-31	Download	We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, the...
Moving Edge for On-Demand Edge Computing: An Uncertainty-aware Approach	Fangtong Zhou, Ruozhou Yu	2025-03-31	Download	We study an edge demand response problem where, based on historical edge workload demands, an edge provider needs to dispatch moving computing units, e.g.
Traffic Engineering in Large-scale Networks with Generalizable Graph Neural Networks	Fangtong Zhou, Xiaorui Liu, Ruozhou Yu, Guoliang Xue	2025-03-31	Download	Traffic Engineering (TE) in large-scale networks like cloud Wide Area Networks (WANs) and Low Earth Orbit (LEO) satellite constellations is a critical challenge.
Trident: Interference Avoidance in Multi-reader Backscatter Network via Frequency-space Division	Yang Zou, Xin Na, Yimiao Sun, Yuan He	2025-03-31	Download	Backscatter is a key technology for battery-free sensing in industrial IoT applications. To fully cover numerous tags in the deployment area, one often needs to deploy multiple readers, each of which ...
Cell-Free Massive MIMO Under Mobility: A Fairness-Differentiated Handover Scheme	Yunlu Xiao, Marina Petrova, Ljiljana Simić	2025-03-31	Download	While cell-free massive MIMO (CF-mMIMO) offers high network-wide throughput in static networks, especially for the worst-served users, its performance in mobile networks is not yet fully addressed.
Robust Predictive Routing for Internet of Vehicles Leveraging Both V2I and V2V Links	Yawen Chang, Xudong Wang	2025-03-31	Download	With the developments of the Internet of Vehicles (IoV) from 4G to 5G, vehicle-to-infrastructure (V2I) communications are becoming attractive for vehicle users (VUEs) to obtain diverse cloud service t...
Blockchain for Federated Learning in the Internet of Things: Trustworthy Adaptation, Standards, and the Road Ahead	Farhana Javed, Engin Zeydan, Josep Mangues-Bafalluy, Kapal Dev, Luis Blanco	2025-03-31	Download	As edge computing gains prominence in Internet of Things (IoTs), smart cities, and autonomous systems, the demand for real-time machine intelligence with low latency and model reliability continues to...
Multi-Agent Deep Reinforcement Learning for Optimized Multi-UAV Coverage and Power-Efficient UE Connectivity	Xuli Cai, Poonam Lohan, Burak Kantarci	2025-03-31	Download	In critical situations such as natural disasters, network outages, battlefield communication, or large-scale public events, Unmanned Aerial Vehicles (UAVs) offer a promising approach to maximize wirel...
Optimizing Age of Information in Networks with Large and Small Updates	Zhuoyi Zhao, Vishrant Tripathi, Igor Kadota	2025-03-31	Download	Modern sensing and monitoring applications typically consist of sources transmitting updates of different sizes, ranging from a few bytes (position, temperature, etc.

cs.OS - Operating Systems

Title	Authors	Published	PDF	Abstract
HeteroPod: XPU-Accelerated Infrastructure Offloading for Commodity Cloud-Native Applications	Bicheng Yang, Jingkai He, Dong Du, Yubin Xia, Haibo Chen	2025-03-31	Download	Cloud-native systems increasingly rely on infrastructure services (e.g., service meshes, monitoring agents), which compete for resources with user applications, degrading performance and scalability.
Who is in Charge here? Understanding How Runtime Configuration Affects Software along with Variables&Constants	Chaopeng Luo, Yuanliang Zhang, Haochen He, Zhouyang Jia, Teng Wang, Shulin Zhou, Si Zheng, Shanshan Li	2025-03-31	Download	Runtime misconfiguration can lead to software performance degradation and even cause failure. Developers typically perform sanity checks during the configuration parsing stage to prevent invalid param...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
Deep Learning Model Deployment in Multiple Cloud Providers: an Exploratory Study Using Low Computing Power Environments	Elayne Lemos, Rodrigo Oliveira, Jairson Rodrigues, Rosalvo F. Oliveira Neto	2025-03-31	Download	The deployment of Machine Learning models in the cloud has grown among tech companies. Hardware requirements are higher when these models involve Deep Learning techniques, and the cloud providers' cos...