2025-05-09

cs.AR - Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| "vcd2df" -- Leveraging Data Science Insights for Hardware Security Research | Calvin Deutschbein, Jimmy Ostler, Hriday Raj | 2025-05-09 | In this work, we hope to expand the universe of security practitioners of open-source hardware by creating a bridge from hardware design languages (HDLs) to data science languages like Python and R th... |
| A Comprehensive Data Description for LoRaWAN Path Loss Measurements in an Indoor Office Setting: Effects of Environmental Factors | Nahshon Mokua Obiri, Kristof Van Laerhoven | 2025-05-09 | This paper presents a comprehensive dataset of LoRaWAN technology path loss measurements collected in an indoor office environment, focusing on quantifying the effects of environmental factors on sign... |
| Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities | Hiari Pizzini Cavagna, Daniele Cesarini, Andrea Bartolini | 2025-05-09 | The increasing demand for generative AI as Large Language Models (LLMs) services has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumptio... |
| Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization | Luca Colagrande, Luca Benini | 2025-05-09 | Heterogeneous multi-core architectures combine on a single chip a few large, general-purpose host cores, optimized for single-thread performance, with (many) clusters of small, specialized, energy-eff... |
| LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization | Seunghee Han, Soongyu Choi, Joo-Young Kim | 2025-05-09 | Recent advances in Protein Structure Prediction Models (PPMs), such as AlphaFold2 and ESMFold, have revolutionized computational biology by achieving unprecedented accuracy in predicting three-dimensi... |
| What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips | Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang | 2025-05-09 | Large language models (LLMs) are rapidly pushing the limits of contemporary computing hardware. For example, training GPT-3 has been estimated to consume around 1300 MWh of electricity, and projection... |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference | Haolin Zhang, Jeff Huang | 2025-05-09 | The common assumption in on-device AI is that GPUs, with their superior parallel processing, always provide the best performance for large language model (LLM) inference. |
| On Optimal Batch Size in Coded Computing | Swapnil Saha, Emina Soljanin, Philip Whiting | 2025-05-09 | We consider computing systems that partition jobs into tasks, add redundancy through coding, and assign the encoded tasks to different computing nodes for parallel execution. |
| Distributed Tensor Network Library for Quantum Computing Emulation | Jakub Adamski, Oliver Thomson Brown | 2025-05-09 | Tensor networks offer an adaptable and efficient approach to emulation of quantum computers. Their usage relies on partitioning circuits into small tensors, which are contracted together to form the f... |
| HashKitty: Distributed Password Analysis | Pedro Antunes, Tomás Santos, Daniel Fuentes, Luís Frazão | 2025-05-09 | This article documents the HashKitty platform, a distributed solution for password analysis based on the hashcat tool, designed to improve efficiency in both offensive and defensive security operation... |
| Scheduled Jacobian Chaining | Simon Märtens, Uwe Naumann | 2025-05-09 | This paper addresses the efficient computation of Jacobian matrices for programs composed of sequential differentiable subprograms. By representing the overall Jacobian as a chain product of the Jacob... |
| Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI | Jianpeng Qi, Chao Liu, Chengxiang Xu, Rui Wang, Junyu Dong, Yanwei Yu | 2025-05-09 | Timely and efficient dissemination of service information is critical in compute-first networking systems, where user requests arrive dynamically and computing resources are constrained. |
| Toward Heterogeneous, Distributed, and Energy-Efficient Computing with SYCL | Biagio Cosenza, Lorenzo Carpentieri, Kaijie Fan, Marco D'Antonio, Peter Thoman, Philip Salzmann | 2025-05-09 | Programming modern high-performance computing systems is challenging due to the need to efficiently program GPUs and accelerators and to handle data movement between nodes. |
| An Autonomy Loop for Dynamic HPC Job Time Limit Adjustment | Thomas Jakobsche, Osman Seckin Simsek, Jim Brandt, Ann Gentile, Florina M. Ciorba | 2025-05-09 | High Performance Computing (HPC) systems rely on fixed user-provided estimates of job time limits. These estimates are often inaccurate, resulting in inefficient resource use and the loss of unsaved w... |
| Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization | Luca Colagrande, Luca Benini | 2025-05-09 | Heterogeneous multi-core architectures combine on a single chip a few large, general-purpose host cores, optimized for single-thread performance, with (many) clusters of small, specialized, energy-eff... |
| DawnPiper: A Memory-scablable Pipeline Parallel Training Framework | Xuan Peng, Xuanhua Shi, Haolin Zhang, Yunfei Zhao, Xuehai Qian | 2025-05-09 | Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can lead to significant GPU memory wastage, limiting the model sizes th... |
| All-to-All Communication with Mobile Edge Adversary: Almost Linearly More Faults, For Free | Orr Fischer, Merav Parter | 2025-05-09 | Resilient computation in all-to-all-communication models has attracted tremendous attention over the years. Most of these works assume the classical faulty model which restricts the total number of co... |
| Understanding Stragglers in Large Model Training Using What-if Analysis | Jinkun Lin, Ziheng Jiang, Zuquan Song, Sida Zhao, Menghan Yu, Zhanghan Wang, Chenyuan Wang, Zuocheng Shi, Xiang Shi, Wei Jia, Zherui Liu, Shuguang Wang, Haibin Lin, Xin Liu, Aurojit Panda, Jinyang Li | 2025-05-09 | Large language model (LLM) training is one of the most demanding distributed computations today, often requiring thousands of GPUs with frequent synchronization across machines. |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| Revenue Optimization in Video Caching Networks with Privacy-Preserving Demand Predictions | Yijing Zhang, Ferdous Pervej, Andreas F. Molisch | 2025-05-09 | Performance of video streaming, which accounts for most of the traffic in wireless communication, can be significantly improved by caching popular videos at the wireless edge. |
| A Comprehensive Data Description for LoRaWAN Path Loss Measurements in an Indoor Office Setting: Effects of Environmental Factors | Nahshon Mokua Obiri, Kristof Van Laerhoven | 2025-05-09 | This paper presents a comprehensive dataset of LoRaWAN technology path loss measurements collected in an indoor office environment, focusing on quantifying the effects of environmental factors on sign... |
| Extending the Control Plane of Container Orchestrators for I/O Virtualization | Garegin Grigoryan, Minseok Kwon, M. Mustafa Rafique | 2025-05-09 | Single Root Input/Output Virtualization (SR-IOV) is a standard technology for forking a single PCI express device and providing it to applications while ensuring performance isolation. |
| Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI | Jianpeng Qi, Chao Liu, Chengxiang Xu, Rui Wang, Junyu Dong, Yanwei Yu | 2025-05-09 | Timely and efficient dissemination of service information is critical in compute-first networking systems, where user requests arrive dynamically and computing resources are constrained. |
| P4Kube: In-Network Load Balancer for Kubernetes | Garegin Grigoryan, Kevin Penkowski, Minseok Kwon | 2025-05-09 | Kubernetes Services such as LoadBalancer and NodePort expose applications running on pods within a Kubernetes cluster to external users. While the LoadBalancer Service requires an external load-balanc... |
| Learning Power Control Protocol for In-Factory 6G Subnetworks | Uyoata E. Uyoata, Gilberto Berardinelli, Ramoni Adeogun | 2025-05-09 | In-X Subnetworks are envisioned to meet the stringent demands of short-range communication in diverse 6G use cases. In the context of In-Factory scenarios, effective power control is critical to mitig... |
| Multi-User Beamforming with Deep Reinforcement Learning in Sensing-Aided Communication | Xiyu Wang, Gilberto Berardinelli, Hei Victor Cheng, Petar Popovski, Ramoni Adeogun | 2025-05-09 | Mobile users are prone to experience beam failure due to beam drifting in millimeter wave (mmWave) communications. Sensing can help alleviate beam drifting with timely beam changes and low overhead si... |

cs.PF - Performance

| Title | Authors | Published | Abstract |
| --- | --- | --- | --- |
| On Optimal Batch Size in Coded Computing | Swapnil Saha, Emina Soljanin, Philip Whiting | 2025-05-09 | We consider computing systems that partition jobs into tasks, add redundancy through coding, and assign the encoded tasks to different computing nodes for parallel execution. |
| Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities | Hiari Pizzini Cavagna, Daniele Cesarini, Andrea Bartolini | 2025-05-09 | The increasing demand for generative AI as Large Language Models (LLMs) services has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumptio... |