2025-10-03

cs.AR - Architecture

Title	Authors	Date	PDF	Abstract
Implémentation Efficiente de Fonctions de Convolution sur FPGA à l'Aide de Blocs Paramétrables et d'Approximations Polynomiales	Philippe Magalhães, Virginie Fresse, Benoît Suffran, Olivier Alata	2025-10-03	Download	Implementing convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) has emerged as a promising alternative to GPUs, offering lower latency, greater power efficiency and greater...
A Resource-Driven Approach for Implementing CNNs on FPGAs Using Adaptive IPs	Philippe Magalhães, Virginie Fresse, Benoît Suffran, Olivier Alata	2025-10-03	Download	The increasing demand for real-time, low-latency artificial intelligence applications has propelled the use of Field-Programmable Gate Arrays (FPGAs) for Convolutional Neural Network (CNN) implementat...
A Hardware Accelerator for the Goemans-Williamson Algorithm	D. A. Herrera-Martí, E. Guthmuller, J. Fereyre	2025-10-03	Download	The combinatorial problem Max-Cut has become a benchmark in the evaluation of local search heuristics for both quantum and classical optimisers.
UPMEM Unleashed: Software Secrets for Speed	Krystian Chmielewski, Jarosław Ławnicki, Uladzislau Lukyanau, Tadeusz Kobus, Maciej Maciejewski	2025-10-03	Download	Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units.
TeLLMe v2: An Efficient End-to-End Ternary LLM Prefill and Decode Accelerator with Table-Lookup Matmul on Edge FPGAs	Ye Qiao, Zhiheng Chen, Yifan Zhang, Yian Wang, Sitao Huang	2025-10-03	Download	With the emergence of wearable devices and other embedded systems, deploying large language models (LLMs) on edge platforms has become an urgent need.
HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference	Shubham Negi, Kaushik Roy	2025-10-03	Download	The rapid adoption of Large Language Models (LLMs) has driven a growing demand for efficient inference, particularly in latency-sensitive applications such as chatbots and personalized assistants.

cs.DC - Distributed, Parallel, and Cluster Computing

Title	Authors	Date	PDF	Abstract
Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability	Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko	2025-10-03	Download	Resolving the most fundamental questions in cosmology requires simulations that match the scale, fidelity, and physical complexity demanded by next-generation sky surveys.
A Lightweight Federated Learning Approach for Privacy-Preserving Botnet Detection in IoT	Taha M. Mahmoud, Naima Kaabouch	2025-10-03	Download	The rapid growth of the Internet of Things (IoT) has expanded opportunities for innovation but also increased exposure to botnet-driven cyberattacks.
Short-circuiting Rings for Low-Latency AllReduce	Sarah-Michelle Hammer, Stefan Schmid, Rachee Singh, Vamsi Addanki	2025-10-03	Download	Efficient collective communication is critical for many distributed ML and HPC applications. In this context, it is widely believed that the Ring algorithm for the AllReduce collective communication o...
Paris: A Decentralized Trained Open-Weight Diffusion Model	Zhiying Jiang, Raihan Seraj, Marcos Villagra, Bidhan Roy	2025-10-03	Download	We present Paris, the first publicly released diffusion model pre-trained entirely through decentralized computation. Paris demonstrates that high-quality text-to-image generation can be achieved with...
Sensors in viticulture: functions, benefits, and data-driven insights	Milan Milenkovic	2025-10-03	Download	Use of sensors and related analytical predictions can be a powerful tool in providing data-informed input to viticulturalists' decision process, complementing their vineyard observations and intuition...
iDDS: Intelligent Distributed Dispatch and Scheduling for Workflow Orchestration	Wen Guan, Tadashi Maeno, Aleksandr Alekseev, Fernando Harald Barreiro Megino, Kaushik De, Edward Karavakis, Alexei Klimentov, Tatiana Korchuganova, FaHui Lin, Paul Nilsson, Torre Wenaus, Zhaoyu Yang, Xin Zhao	2025-10-03	Download	The intelligent Distributed Dispatch and Scheduling (iDDS) service is a versatile workflow orchestration system designed for large-scale, distributed scientific computing.
PyRadiomics-cuda: 3D features extraction from medical images for HPC using GPU acceleration	Jakub Lisowski, Piotr Tyrakowski, Szymon Zyguła, Krzysztof Kaczmarski	2025-10-03	Download	PyRadiomics-cuda is a GPU-accelerated extension of the PyRadiomics library, designed to address the computational challenges of extracting three-dimensional shape features from medical images.
Energy Efficiency in Cloud-Based Big Data Processing for Earth Observation: Gap Analysis and Future Directions	Adhitya Bhawiyuga, Serkan Girgin, Rolf A. de By, Raul Zurita-Milla	2025-10-03	Download	Earth observation (EO) data volumes are rapidly increasing. While cloud computing is now used for processing large EO datasets, the energy efficiency aspects of such processing have received much l...
On the energy efficiency of sparse matrix computations on multi-GPU clusters	Massimo Bernaschi, Alessandro Celestini, Pasqua D'Ambra, Giorgio Richelli	2025-10-03	Download	We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accel...
Action Deviation-Aware Inference for Low-Latency Wireless Robots	Jeyoung Park, Yeonsub Lim, Seungeun Oh, Jihong Park, Jinho Choi, Seong-Lyun Kim	2025-10-03	Download	To support latency-sensitive AI applications ranging from autonomous driving to industrial robot manipulation, 6G envisions distributed ML with computational resources in mobile, edge, and cloud conne...
TridentServe: A Stage-level Serving System for Diffusion Pipelines	Yifei Xia, Fangcheng Fu, Hao Yuan, Hanke Zhang, Xupeng Miao, Yijun Liu, Suhan Ling, Jie Jiang, Bin Cui	2025-10-03	Download	Diffusion pipelines, renowned for their powerful visual generation capabilities, have seen widespread adoption in generative vision tasks (e.g., text-to-image/video).
Distributed Low-Communication Training with Decoupled Momentum Optimization	Sasho Nedelkoski, Alexander Acker, Odej Kao, Soeren Becker, Dominik Scheinert	2025-10-03	Download	The training of large models demands substantial computational resources, typically available only in data centers with high-bandwidth interconnects.
UPMEM Unleashed: Software Secrets for Speed	Krystian Chmielewski, Jarosław Ławnicki, Uladzislau Lukyanau, Tadeusz Kobus, Maciej Maciejewski	2025-10-03	Download	Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units.
GRNND: A GPU-Parallel Relative NN-Descent Algorithm for Efficient Approximate Nearest Neighbor Graph Construction	Xiang Li, Qiong Chang, Yun Li, Jun Miyazaki	2025-10-03	Download	Relative Nearest Neighbor Descent (RNN-Descent) is a state-of-the-art algorithm for constructing sparse approximate nearest neighbor (ANN) graphs by combining the iterative refinement of NN-Descent wi...

cs.NI - Networking and Internet Architecture

Title	Authors	Date	PDF	Abstract
An efficient grey theory-driven path selection for energy efficiency control in the Internet of Things using fog and cloud computing	Mohammad Reza Akbari, Hamid Barati, Ali Barati	2025-10-03	Download	Due to the big data exchange on the Internet of Things, proper routing and selecting the best routes for fast data transmission improve network performance.
A distributed routing protocol for sending data from things to the cloud leveraging fog technology in the large-scale IoT ecosystem	Mohammad Reza Akbari, Hamid Barati, Ali Barati	2025-10-03	Download	Fog computing integrates cloud and edge resources. Using an intelligent and decentralized method, this technology processes data generated by IoT sensors to seamlessly integrate physical and cy...
Short-circuiting Rings for Low-Latency AllReduce	Sarah-Michelle Hammer, Stefan Schmid, Rachee Singh, Vamsi Addanki	2025-10-03	Download	Efficient collective communication is critical for many distributed ML and HPC applications. In this context, it is widely believed that the Ring algorithm for the AllReduce collective communication o...
Scalable Ground Station Selection for Large LEO Constellations	Grace Ra Kim, Duncan Eddy, Vedant Srinivas, Mykel J. Kochenderfer	2025-10-03	Download	Effective ground station selection is critical for low Earth orbiting (LEO) satellite constellations to minimize operational costs, maximize data downlink volume, and reduce communication gaps between...
Automatic Generation of Digital Twins for Network Testing	Shenjia Ding, David Flynn, Paul Harvey	2025-10-03	Download	The increased use of software in the operation and management of telecommunication networks has moved the industry one step closer to realizing autonomous network operation.
Corrosion Risk Estimation for Heritage Preservation: An Internet of Things and Machine Learning Approach Using Temperature and Humidity	Reginald Juan M. Mercado, Muhammad Kabeer, Haider Al-Obaidy, Rosdiadee Nordin	2025-10-03	Download	Proactive preservation of steel structures at culturally significant heritage sites like the San Sebastian Basilica in the Philippines requires accurate corrosion forecasting.
Sequence-Based Deep Learning for Handover Optimization in Dense Urban Cellular Network	Muhammad Kabeer, Rosdiadee Nordin, Mehran Behjati, Lau Sian Lun	2025-10-03	Download	Efficient handover management remains a critical challenge in dense urban cellular networks, where high cell density, user mobility, and diverse service demands increase the likelihood of unnecessary ...
SoK: Preconfirmations	Aikaterini-Panagiota Stouka, Conor McMenamin, Demetris Kyriacou, Lin Oshitani, Quentin Botha	2025-10-03	Download	In recent years, significant research efforts have focused on improving blockchain throughput and confirmation speeds without compromising security.
DH-EAC: Design of a Dynamic, Hierarchical Entanglement Access Control Protocol	Akihisa Takahashi, Yoshito Tobe	2025-10-03	Download	We propose Dynamic, Hierarchical Entanglement Access Control (DH-EAC), a pure-quantum protocol for fair and anonymous allocation of scarce entanglement across wide-area quantum networks composed of ma...
FSMA: Scalable and Reliable LoRa for Non-Terrestrial Networks with Mobile Gateways	Rohith Reddy Vennam, Maiyun Zhang, Raghav Subbaraman, Deepak Vashist, Dinesh Bharadia	2025-10-03	Download	The proliferation of Low Earth Orbit (LEO) satellites for universal IoT applications and the growing use of drones in emergency services, agriculture, and military operations highlight the transformat...
L4Span: Spanning Congestion Signaling over NextG Networks for Interactive Applications	Haoran Wan, Kyle Jamieson	2025-10-03	Download	Design for low latency networking is essential for tomorrow's interactive applications, but it is essential to deploy incrementally and universally at the network's last mile.

cs.PF - Performance

Title	Authors	Date	PDF	Abstract
Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability	Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko	2025-10-03	Download	Resolving the most fundamental questions in cosmology requires simulations that match the scale, fidelity, and physical complexity demanded by next-generation sky surveys.
Formal Analysis of Metastable Failures in Software Systems	Peter Alvaro, Rebecca Isaacs, Rupak Majumdar, Kiran-Kumar Muniswamy-Reddy, Mahmoud Salamati, Sadegh Soudjani	2025-10-03	Download	Many large-scale software systems demonstrate metastable failures. In this class of failures, a stressor such as a temporary spike in workload causes the system performance to drop and, subsequently, ...
On the energy efficiency of sparse matrix computations on multi-GPU clusters	Massimo Bernaschi, Alessandro Celestini, Pasqua D'Ambra, Giorgio Richelli	2025-10-03	Download	We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accel...
UPMEM Unleashed: Software Secrets for Speed	Krystian Chmielewski, Jarosław Ławnicki, Uladzislau Lukyanau, Tadeusz Kobus, Maciej Maciejewski	2025-10-03	Download	Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units.