2025-05-09

cs.AR - Hardware Architecture

Title	Authors	Published	PDF	Abstract
"vcd2df" -- Leveraging Data Science Insights for Hardware Security Research	Calvin Deutschbein, Jimmy Ostler, Hriday Raj	2025-05-09	Download	In this work, we hope to expand the universe of security practitioners of open-source hardware by creating a bridge from hardware design languages (HDLs) to data science languages like Python and R th...
A Comprehensive Data Description for LoRaWAN Path Loss Measurements in an Indoor Office Setting: Effects of Environmental Factors	Nahshon Mokua Obiri, Kristof Van Laerhoven	2025-05-09	Download	This paper presents a comprehensive dataset of LoRaWAN technology path loss measurements collected in an indoor office environment, focusing on quantifying the effects of environmental factors on sign...
Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities	Hiari Pizzini Cavagna, Daniele Cesarini, Andrea Bartolini	2025-05-09	Download	The increasing demand for generative AI as Large Language Models (LLMs) services has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumptio...
Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization	Luca Colagrande, Luca Benini	2025-05-09	Download	Heterogeneous multi-core architectures combine on a single chip a few large, general-purpose host cores, optimized for single-thread performance, with (many) clusters of small, specialized, energy-eff...
LightNobel: Improving Sequence Length Limitation in Protein Structure Prediction Model via Adaptive Activation Quantization	Seunghee Han, Soongyu Choi, Joo-Young Kim	2025-05-09	Download	Recent advances in Protein Structure Prediction Models (PPMs), such as AlphaFold2 and ESMFold, have revolutionized computational biology by achieving unprecedented accuracy in predicting three-dimensi...
What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips	Renjie Li, Wenjie Wei, Qi Xin, Xiaoli Liu, Sixuan Mao, Erik Ma, Zijian Chen, Malu Zhang, Haizhou Li, Zhaoyu Zhang	2025-05-09	Download	Large language models (LLMs) are rapidly pushing the limits of contemporary computing hardware. For example, training GPT-3 has been estimated to consume around 1300 MWh of electricity, and projection...

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Published	PDF	Abstract
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference	Haolin Zhang, Jeff Huang	2025-05-09	Download	The common assumption in on-device AI is that GPUs, with their superior parallel processing, always provide the best performance for large language model (LLM) inference.
On Optimal Batch Size in Coded Computing	Swapnil Saha, Emina Soljanin, Philip Whiting	2025-05-09	Download	We consider computing systems that partition jobs into tasks, add redundancy through coding, and assign the encoded tasks to different computing nodes for parallel execution.
Distributed Tensor Network Library for Quantum Computing Emulation	Jakub Adamski, Oliver Thomson Brown	2025-05-09	Download	Tensor networks offer an adaptable and efficient approach to emulation of quantum computers. Their usage relies on partitioning circuits into small tensors, which are contracted together to form the f...
HashKitty: Distributed Password Analysis	Pedro Antunes, Tomás Santos, Daniel Fuentes, Luís Frazão	2025-05-09	Download	This article documents the HashKitty platform, a distributed solution for password analysis based on the hashcat tool, designed to improve efficiency in both offensive and defensive security operation...
Scheduled Jacobian Chaining	Simon Märtens, Uwe Naumann	2025-05-09	Download	This paper addresses the efficient computation of Jacobian matrices for programs composed of sequential differentiable subprograms. By representing the overall Jacobian as a chain product of the Jacob...
Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI	Jianpeng Qi, Chao Liu, Chengxiang Xu, Rui Wang, Junyu Dong, Yanwei Yu	2025-05-09	Download	Timely and efficient dissemination of service information is critical in compute-first networking systems, where user requests arrive dynamically and computing resources are constrained.
Toward Heterogeneous, Distributed, and Energy-Efficient Computing with SYCL	Biagio Cosenza, Lorenzo Carpentieri, Kaijie Fan, Marco D'Antonio, Peter Thoman, Philip Salzmann	2025-05-09	Download	Programming modern high-performance computing systems is challenging due to the need to efficiently program GPUs and accelerators and to handle data movement between nodes.
An Autonomy Loop for Dynamic HPC Job Time Limit Adjustment	Thomas Jakobsche, Osman Seckin Simsek, Jim Brandt, Ann Gentile, Florina M. Ciorba	2025-05-09	Download	High Performance Computing (HPC) systems rely on fixed user-provided estimates of job time limits. These estimates are often inaccurate, resulting in inefficient resource use and the loss of unsaved w...
Taming Offload Overheads in a Massively Parallel Open-Source RISC-V MPSoC: Analysis and Optimization	Luca Colagrande, Luca Benini	2025-05-09	Download	Heterogeneous multi-core architectures combine on a single chip a few large, general-purpose host cores, optimized for single-thread performance, with (many) clusters of small, specialized, energy-eff...
DawnPiper: A Memory-scablable Pipeline Parallel Training Framework	Xuan Peng, Xuanhua Shi, Haolin Zhang, Yunfei Zhao, Xuehai Qian	2025-05-09	Download	Pipeline parallelism is a crucial paradigm for large-scale model training. However, imbalances in memory footprint across stages can lead to significant GPU memory wastage, limiting the model sizes th...
All-to-All Communication with Mobile Edge Adversary: Almost Linearly More Faults, For Free	Orr Fischer, Merav Parter	2025-05-09	Download	Resilient computation in all-to-all-communication models has attracted tremendous attention over the years. Most of these works assume the classical faulty model which restricts the total number of co...
Understanding Stragglers in Large Model Training Using What-if Analysis	Jinkun Lin, Ziheng Jiang, Zuquan Song, Sida Zhao, Menghan Yu, Zhanghan Wang, Chenyuan Wang, Zuocheng Shi, Xiang Shi, Wei Jia, Zherui Liu, Shuguang Wang, Haibin Lin, Xin Liu, Aurojit Panda, Jinyang Li	2025-05-09	Download	Large language model (LLM) training is one of the most demanding distributed computations today, often requiring thousands of GPUs with frequent synchronization across machines.

cs.NI - Networking and Internet Architecture

Title	Authors	Published	PDF	Abstract
Revenue Optimization in Video Caching Networks with Privacy-Preserving Demand Predictions	Yijing Zhang, Ferdous Pervej, Andreas F. Molisch	2025-05-09	Download	Performance of video streaming, which accounts for most of the traffic in wireless communication, can be significantly improved by caching popular videos at the wireless edge.
A Comprehensive Data Description for LoRaWAN Path Loss Measurements in an Indoor Office Setting: Effects of Environmental Factors	Nahshon Mokua Obiri, Kristof Van Laerhoven	2025-05-09	Download	This paper presents a comprehensive dataset of LoRaWAN technology path loss measurements collected in an indoor office environment, focusing on quantifying the effects of environmental factors on sign...
Extending the Control Plane of Container Orchestrators for I/O Virtualization	Garegin Grigoryan, Minseok Kwon, M. Mustafa Rafique	2025-05-09	Download	Single Root Input/Output Virtualization (SR-IOV) is a standard technology for forking a single PCI express device and providing it to applications while ensuring performance isolation.
Efficient Information Updates in Compute-First Networking via Reinforcement Learning with Joint AoI and VoI	Jianpeng Qi, Chao Liu, Chengxiang Xu, Rui Wang, Junyu Dong, Yanwei Yu	2025-05-09	Download	Timely and efficient dissemination of service information is critical in compute-first networking systems, where user requests arrive dynamically and computing resources are constrained.
P4Kube: In-Network Load Balancer for Kubernetes	Garegin Grigoryan, Kevin Penkowski, Minseok Kwon	2025-05-09	Download	Kubernetes Services such as LoadBalancer and NodePort expose applications running on pods within a Kubernetes cluster to external users. While the LoadBalancer Service requires an external load-balanc...
Learning Power Control Protocol for In-Factory 6G Subnetworks	Uyoata E. Uyoata, Gilberto Berardinelli, Ramoni Adeogun	2025-05-09	Download	In-X Subnetworks are envisioned to meet the stringent demands of short-range communication in diverse 6G use cases. In the context of In-Factory scenarios, effective power control is critical to mitig...
Multi-User Beamforming with Deep Reinforcement Learning in Sensing-Aided Communication	Xiyu Wang, Gilberto Berardinelli, Hei Victor Cheng, Petar Popovski, Ramoni Adeogun	2025-05-09	Download	Mobile users are prone to experience beam failure due to beam drifting in millimeter wave (mmWave) communications. Sensing can help alleviate beam drifting with timely beam changes and low overhead si...

cs.PF - Performance

Title	Authors	Published	PDF	Abstract
On Optimal Batch Size in Coded Computing	Swapnil Saha, Emina Soljanin, Philip Whiting	2025-05-09	Download	We consider computing systems that partition jobs into tasks, add redundancy through coding, and assign the encoded tasks to different computing nodes for parallel execution.
Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities	Hiari Pizzini Cavagna, Daniele Cesarini, Andrea Bartolini	2025-05-09	Download	The increasing demand for generative AI as Large Language Models (LLMs) services has driven the need for specialized hardware architectures that optimize computational efficiency and energy consumptio...