
2025-03-31

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis | Nika Mansouri Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu | 2025-03-31 | Download | Genome sequence analysis, which examines the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields. |
| PIM-LLM: A High-Throughput Hybrid PIM Architecture for 1-bit LLMs | Jinendra Malekar, Peyton Chandarana, Md Hasibul Amin, Mohammed E. Elbtity, Ramtin Zand | 2025-03-31 | Download | In this paper, we propose PIM-LLM, a hybrid architecture developed to accelerate 1-bit large language models (LLMs). PIM-LLM leverages analog processing-in-memory (PIM) architectures and digital systo... |
| SPRING: Systematic Profiling of Randomly Interconnected Neural Networks Generated by HLS | Rui Shi, Seda Ogrenci | 2025-03-31 | Download | Profiling is important for performance optimization by providing real-time observations and measurements of important parameters of hardware execution. |
| Banked Memories for Soft SIMT Processors | Martin Langhammer, George A. Constantinides | 2025-03-31 | Download | Recent advances in soft GPGPU architectures have shown that a small (<10K LUT), high performance (770 MHz) processor is possible in modern FPGAs. |
| ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance | Tong Xie, Jiawang Zhao, Zishen Wan, Zuodong Zhang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li | 2025-03-31 | Download | The demand for efficient large language model (LLM) inference has propelled the development of dedicated accelerators. As accelerators are vulnerable to hardware faults due to aging, variation, etc, e... |
| DiffuSE: Cross-Layer Design Space Exploration of DNN Accelerator via Diffusion-Driven Optimization | Yi Ren, Chenhao Xue, Jiaxing Zhang, Chen Zhang, Qiang Xu, Yibo Lin, Lining Zhang, Guangyu Sun | 2025-03-31 | Download | The proliferation of deep learning accelerators calls for efficient and cost-effective hardware design solutions, where parameterized modular hardware generator and electronic design automation (EDA) ... |
| DOMAC: Differentiable Optimization for High-Speed Multipliers and Multiply-Accumulators | Chenhao Xue, Yi Ren, Jinwei Zhou, Kezhi Li, Chen Zhang, Yibo Lin, Lining Zhang, Qiang Xu, Guangyu Sun | 2025-03-31 | Download | Multipliers and multiply-accumulators (MACs) are fundamental building blocks for compute-intensive applications such as artificial intelligence. |
| MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration | Tatsuya Kubo, Daichi Tokuda, Tomoya Nagatani, Masayuki Usui, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki | 2025-03-31 | Download | General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models. |
| TuRTLe: A Unified Evaluation of LLMs for RTL Generation | Dario Garcia-Gasulla, Gokcen Kestor, Emanuele Parisi, Miquel Albertí-Binimelis, Cristian Gutierrez, Razine Moundir Ghorab, Orlando Montenegro, Bernat Homs, Miquel Moreto | 2025-03-31 | Download | The rapid advancements in LLMs have driven the adoption of generative AI in various domains, including Electronic Design Automation (EDA). Unlike traditional software development, EDA presents unique ... |
| Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers | Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Celine Lin | 2025-03-31 | Download | Recent advancements in neural rendering technologies and their supporting devices have paved the way for immersive 3D experiences, significantly transforming human interaction with intelligent devices... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis | Nika Mansouri Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu | 2025-03-31 | Download | Genome sequence analysis, which examines the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields. |
| Rack Position Optimization in Large-Scale Heterogeneous Data Centers | Chang-Lin Chen, Jiayu Chen, Tian Lan, Zhaoxia Zhao, Hongbo Dong, Vaneet Aggarwal | 2025-03-31 | Download | As rapidly growing AI computational demands accelerate the need for new hardware installation and maintenance, this work explores optimal data center resource management by balancing operational effic... |
| GPU-centric Communication Schemes for HPC and ML Applications | Naveen Namashivayam | 2025-03-31 | Download | Compute nodes on modern heterogeneous supercomputing systems comprise CPUs, GPUs, and high-speed network interconnects (NICs). Parallelization is identified as a technique for effectively utilizing th... |
| Fermilab's Transition to Token Authentication | Dave Dykstra, Mine Altunay, Shreyas Bhat, Dmitry Litvintsev, Marco Mambelli, Marc Mengel, Stephen White | 2025-03-31 | Download | Fermilab is the first High Energy Physics institution to transition from X.509 user certificates to authentication tokens in production systems. |
| Enhancing Traffic Safety with AI and 6G: Latency Requirements and Real-Time Threat Detection | Kurt Horvath, Dragi Kimovski, Stojan Kitanov, Radu Prodan | 2025-03-31 | Download | The rapid digitalization of urban infrastructure opens the path to smart cities, where IoT-enabled infrastructure enhances public safety and efficiency. |
| Deep Learning Model Deployment in Multiple Cloud Providers: an Exploratory Study Using Low Computing Power Environments | Elayne Lemos, Rodrigo Oliveira, Jairson Rodrigues, Rosalvo F. Oliveira Neto | 2025-03-31 | Download | The deployment of Machine Learning models in the cloud has grown among tech companies. Hardware requirements are higher when these models involve Deep Learning techniques, and the cloud providers' cos... |
| A Practical Rollup Escape Hatch Design | Francisco Gomes Figueira, Martin Derka, Ching Lun Chiu, Jan Gorzny | 2025-03-31 | Download | A rollup network is a type of popular "Layer 2" scaling solution for general purpose "Layer 1" blockchains like Ethereum. Rollups networks separate execution of transactions from other aspects like co... |
| OrchMLLM: Orchestrate Multimodal Data with Batch Post-Balancing to Accelerate Multimodal Large Language Model Training | Yijie Zheng, Bangjun Xiao, Lei Shi, Xiaoyang Li, Faming Wu, Tianyu Li, Xuefeng Xiao, Yang Zhang, Yuxuan Wang, Shouda Liu | 2025-03-31 | Download | Multimodal large language models (MLLMs), such as GPT-4o, are garnering significant attention. During the exploration of MLLM training, we identified Modality Composition Incoherence, a phenomenon tha... |
| MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration | Tatsuya Kubo, Daichi Tokuda, Tomoya Nagatani, Masayuki Usui, Lei Qu, Ting Cao, Shinya Takamaeda-Yamazaki | 2025-03-31 | Download | General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models. |
| Who is in Charge here? Understanding How Runtime Configuration Affects Software along with Variables&Constants | Chaopeng Luo, Yuanliang Zhang, Haochen He, Zhouyang Jia, Teng Wang, Shulin Zhou, Si Zheng, Shanshan Li | 2025-03-31 | Download | Runtime misconfiguration can lead to software performance degradation and even cause failure. Developers typically perform sanity checks during the configuration parsing stage to prevent invalid param... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Rack Position Optimization in Large-Scale Heterogeneous Data Centers | Chang-Lin Chen, Jiayu Chen, Tian Lan, Zhaoxia Zhao, Hongbo Dong, Vaneet Aggarwal | 2025-03-31 | Download | As rapidly growing AI computational demands accelerate the need for new hardware installation and maintenance, this work explores optimal data center resource management by balancing operational effic... |
| Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning | Yubo Zhang, Pedro Botelho, Trevor Gordon, Gil Zussman, Igor Kadota | 2025-03-31 | Download | We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, the... |
| Moving Edge for On-Demand Edge Computing: An Uncertainty-aware Approach | Fangtong Zhou, Ruozhou Yu | 2025-03-31 | Download | We study an edge demand response problem where, based on historical edge workload demands, an edge provider needs to dispatch moving computing units, e.g. |
| Traffic Engineering in Large-scale Networks with Generalizable Graph Neural Networks | Fangtong Zhou, Xiaorui Liu, Ruozhou Yu, Guoliang Xue | 2025-03-31 | Download | Traffic Engineering (TE) in large-scale networks like cloud Wide Area Networks (WANs) and Low Earth Orbit (LEO) satellite constellations is a critical challenge. |
| Trident: Interference Avoidance in Multi-reader Backscatter Network via Frequency-space Division | Yang Zou, Xin Na, Yimiao Sun, Yuan He | 2025-03-31 | Download | Backscatter is a key technology for battery-free sensing in industrial IoT applications. To fully cover numerous tags in the deployment area, one often needs to deploy multiple readers, each of which ... |
| Cell-Free Massive MIMO Under Mobility: A Fairness-Differentiated Handover Scheme | Yunlu Xiao, Marina Petrova, Ljiljana Simić | 2025-03-31 | Download | While cell-free massive MIMO (CF-mMIMO) offers high network-wide throughput in static networks, especially for the worst-served users, its performance in mobile networks is not yet fully addressed. |
| Robust Predictive Routing for Internet of Vehicles Leveraging Both V2I and V2V Links | Yawen Chang, Xudong Wang | 2025-03-31 | Download | With the developments of the Internet of Vehicles (IoV) from 4G to 5G, vehicle-to-infrastructure (V2I) communications are becoming attractive for vehicle users (VUEs) to obtain diverse cloud service t... |
| Blockchain for Federated Learning in the Internet of Things: Trustworthy Adaptation, Standards, and the Road Ahead | Farhana Javed, Engin Zeydan, Josep Mangues-Bafalluy, Kapal Dev, Luis Blanco | 2025-03-31 | Download | As edge computing gains prominence in Internet of Things (IoTs), smart cities, and autonomous systems, the demand for real-time machine intelligence with low latency and model reliability continues to... |
| Multi-Agent Deep Reinforcement Learning for Optimized Multi-UAV Coverage and Power-Efficient UE Connectivity | Xuli Cai, Poonam Lohan, Burak Kantarci | 2025-03-31 | Download | In critical situations such as natural disasters, network outages, battlefield communication, or large-scale public events, Unmanned Aerial Vehicles (UAVs) offer a promising approach to maximize wirel... |
| Optimizing Age of Information in Networks with Large and Small Updates | Zhuoyi Zhao, Vishrant Tripathi, Igor Kadota | 2025-03-31 | Download | Modern sensing and monitoring applications typically consist of sources transmitting updates of different sizes, ranging from a few bytes (position, temperature, etc. |

cs.OS - Operating Systems

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| HeteroPod: XPU-Accelerated Infrastructure Offloading for Commodity Cloud-Native Applications | Bicheng Yang, Jingkai He, Dong Du, Yubin Xia, Haibo Chen | 2025-03-31 | Download | Cloud-native systems increasingly rely on infrastructure services (e.g., service meshes, monitoring agents), which compete for resources with user applications, degrading performance and scalability. |
| Who is in Charge here? Understanding How Runtime Configuration Affects Software along with Variables&Constants | Chaopeng Luo, Yuanliang Zhang, Haochen He, Zhouyang Jia, Teng Wang, Shulin Zhou, Si Zheng, Shanshan Li | 2025-03-31 | Download | Runtime misconfiguration can lead to software performance degradation and even cause failure. Developers typically perform sanity checks during the configuration parsing stage to prevent invalid param... |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Deep Learning Model Deployment in Multiple Cloud Providers: an Exploratory Study Using Low Computing Power Environments | Elayne Lemos, Rodrigo Oliveira, Jairson Rodrigues, Rosalvo F. Oliveira Neto | 2025-03-31 | Download | The deployment of Machine Learning models in the cloud has grown among tech companies. Hardware requirements are higher when these models involve Deep Learning techniques, and the cloud providers' cos... |
