
2025-10-03

cs.AR - Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Implémentation Efficiente de Fonctions de Convolution sur FPGA à l'Aide de Blocs Paramétrables et d'Approximations Polynomiales | Philippe Magalhães, Virginie Fresse, Benoît Suffran, Olivier Alata | 2025-10-03 | Download | Implementing convolutional neural networks (CNNs) on field-programmable gate arrays (FPGAs) has emerged as a promising alternative to GPUs, offering lower latency, greater power efficiency and greater... |
| A Resource-Driven Approach for Implementing CNNs on FPGAs Using Adaptive IPs | Philippe Magalhães, Virginie Fresse, Benoît Suffran, Olivier Alata | 2025-10-03 | Download | The increasing demand for real-time, low-latency artificial intelligence applications has propelled the use of Field-Programmable Gate Arrays (FPGAs) for Convolutional Neural Network (CNN) implementat... |
| A Hardware Accelerator for the Goemans-Williamson Algorithm | D. A. Herrera-Martí, E. Guthmuller, J. Fereyre | 2025-10-03 | Download | The combinatorial problem Max-Cut has become a benchmark in the evaluation of local search heuristics for both quantum and classical optimisers. |
| UPMEM Unleashed: Software Secrets for Speed | Krystian Chmielewski, Jarosław Ławnicki, Uladzislau Lukyanau, Tadeusz Kobus, Maciej Maciejewski | 2025-10-03 | Download | Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units. |
| TeLLMe v2: An Efficient End-to-End Ternary LLM Prefill and Decode Accelerator with Table-Lookup Matmul on Edge FPGAs | Ye Qiao, Zhiheng Chen, Yifan Zhang, Yian Wang, Sitao Huang | 2025-10-03 | Download | With the emergence of wearable devices and other embedded systems, deploying large language models (LLMs) on edge platforms has become an urgent need. |
| HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference | Shubham Negi, Kaushik Roy | 2025-10-03 | Download | The rapid adoption of Large Language Models (LLMs) has driven a growing demand for efficient inference, particularly in latency-sensitive applications such as chatbots and personalized assistants. |

cs.DC - Distributed, Parallel, and Cluster Computing

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability | Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko | 2025-10-03 | Download | Resolving the most fundamental questions in cosmology requires simulations that match the scale, fidelity, and physical complexity demanded by next-generation sky surveys. |
| A Lightweight Federated Learning Approach for Privacy-Preserving Botnet Detection in IoT | Taha M. Mahmoud, Naima Kaabouch | 2025-10-03 | Download | The rapid growth of the Internet of Things (IoT) has expanded opportunities for innovation but also increased exposure to botnet-driven cyberattacks. |
| Short-circuiting Rings for Low-Latency AllReduce | Sarah-Michelle Hammer, Stefan Schmid, Rachee Singh, Vamsi Addanki | 2025-10-03 | Download | Efficient collective communication is critical for many distributed ML and HPC applications. In this context, it is widely believed that the Ring algorithm for the AllReduce collective communication o... |
| Paris: A Decentralized Trained Open-Weight Diffusion Model | Zhiying Jiang, Raihan Seraj, Marcos Villagra, Bidhan Roy | 2025-10-03 | Download | We present Paris, the first publicly released diffusion model pre-trained entirely through decentralized computation. Paris demonstrates that high-quality text-to-image generation can be achieved with... |
| Sensors in viticulture: functions, benefits, and data-driven insights | Milan Milenkovic | 2025-10-03 | Download | Use of sensors and related analytical predictions can be a powerful tool in providing data-informed input to viticulturalists' decision process, complementing their vineyard observations and intuition... |
| iDDS: Intelligent Distributed Dispatch and Scheduling for Workflow Orchestration | Wen Guan, Tadashi Maeno, Aleksandr Alekseev, Fernando Harald Barreiro Megino, Kaushik De, Edward Karavakis, Alexei Klimentov, Tatiana Korchuganova, FaHui Lin, Paul Nilsson, Torre Wenaus, Zhaoyu Yang, Xin Zhao | 2025-10-03 | Download | The intelligent Distributed Dispatch and Scheduling (iDDS) service is a versatile workflow orchestration system designed for large-scale, distributed scientific computing. |
| PyRadiomics-cuda: 3D features extraction from medical images for HPC using GPU acceleration | Jakub Lisowski, Piotr Tyrakowski, Szymon Zyguła, Krzysztof Kaczmarski | 2025-10-03 | Download | PyRadiomics-cuda is a GPU-accelerated extension of the PyRadiomics library, designed to address the computational challenges of extracting three-dimensional shape features from medical images. |
| Energy Efficiency in Cloud-Based Big Data Processing for Earth Observation: Gap Analysis and Future Directions | Adhitya Bhawiyuga, Serkan Girgin, Rolf A. de By, Raul Zurita-Milla | 2025-10-03 | Download | Earth observation (EO) data volumes are rapidly increasing. While cloud computing are now used for processing large EO datasets, the energy efficiency aspects of such a processing have received much l... |
| On the energy efficiency of sparse matrix computations on multi-GPU clusters | Massimo Bernaschi, Alessandro Celestini, Pasqua D'Ambra, Giorgio Richelli | 2025-10-03 | Download | We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accel... |
| Action Deviation-Aware Inference for Low-Latency Wireless Robots | Jeyoung Park, Yeonsub Lim, Seungeun Oh, Jihong Park, Jinho Choi, Seong-Lyun Kim | 2025-10-03 | Download | To support latency-sensitive AI applications ranging from autonomous driving to industrial robot manipulation, 6G envisions distributed ML with computational resources in mobile, edge, and cloud conne... |
| TridentServe: A Stage-level Serving System for Diffusion Pipelines | Yifei Xia, Fangcheng Fu, Hao Yuan, Hanke Zhang, Xupeng Miao, Yijun Liu, Suhan Ling, Jie Jiang, Bin Cui | 2025-10-03 | Download | Diffusion pipelines, renowned for their powerful visual generation capabilities, have seen widespread adoption in generative vision tasks (e.g., text-to-image/video). |
| Distributed Low-Communication Training with Decoupled Momentum Optimization | Sasho Nedelkoski, Alexander Acker, Odej Kao, Soeren Becker, Dominik Scheinert | 2025-10-03 | Download | The training of large models demands substantial computational resources, typically available only in data centers with high-bandwidth interconnects. |
| UPMEM Unleashed: Software Secrets for Speed | Krystian Chmielewski, Jarosław Ławnicki, Uladzislau Lukyanau, Tadeusz Kobus, Maciej Maciejewski | 2025-10-03 | Download | Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units. |
| GRNND: A GPU-Parallel Relative NN-Descent Algorithm for Efficient Approximate Nearest Neighbor Graph Construction | Xiang Li, Qiong Chang, Yun Li, Jun Miyazaki | 2025-10-03 | Download | Relative Nearest Neighbor Descent (RNN-Descent) is a state-of-the-art algorithm for constructing sparse approximate nearest neighbor (ANN) graphs by combining the iterative refinement of NN-Descent wi... |

cs.NI - Networking and Internet Architecture

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| An efficient grey theory-driven path selection for energy efficiency control in the Internet of Things using fog and cloud computing | Mohammad Reza Akbari, Hamid Barati, Ali Barati | 2025-10-03 | Download | Due to the big data exchange on the Internet of Things, proper routing and selecting the best routes for fast data transmission improve network performance. |
| A distributed routing protocol for sending data from things to the cloud leveraging fog technology in the large-scale IoT ecosystem | Mohammad Reza Akbari, Hamid Barati, Ali Barati | 2025-10-03 | Download | Fog computing integrates cloud and edge resources. According to an intelligent and decentralized method, this technology processes data generated by IoT sensors to seamlessly integrate physical and cy... |
| Short-circuiting Rings for Low-Latency AllReduce | Sarah-Michelle Hammer, Stefan Schmid, Rachee Singh, Vamsi Addanki | 2025-10-03 | Download | Efficient collective communication is critical for many distributed ML and HPC applications. In this context, it is widely believed that the Ring algorithm for the AllReduce collective communication o... |
| Scalable Ground Station Selection for Large LEO Constellations | Grace Ra Kim, Duncan Eddy, Vedant Srinivas, Mykel J. Kochenderfer | 2025-10-03 | Download | Effective ground station selection is critical for low Earth orbiting (LEO) satellite constellations to minimize operational costs, maximize data downlink volume, and reduce communication gaps between... |
| Automatic Generation of Digital Twins for Network Testing | Shenjia Ding, David Flynn, Paul Harvey | 2025-10-03 | Download | The increased use of software in the operation and management of telecommunication networks has moved the industry one step closer to realizing autonomous network operation. |
| Corrosion Risk Estimation for Heritage Preservation: An Internet of Things and Machine Learning Approach Using Temperature and Humidity | Reginald Juan M. Mercado, Muhammad Kabeer, Haider Al-Obaidy, Rosdiadee Nordin | 2025-10-03 | Download | Proactive preservation of steel structures at culturally significant heritage sites like the San Sebastian Basilica in the Philippines requires accurate corrosion forecasting. |
| Sequence-Based Deep Learning for Handover Optimization in Dense Urban Cellular Network | Muhammad Kabeer, Rosdiadee Nordin, Mehran Behjati, Lau Sian Lun | 2025-10-03 | Download | Efficient handover management remains a critical challenge in dense urban cellular networks, where high cell density, user mobility, and diverse service demands increase the likelihood of unnecessary ... |
| SoK: Preconfirmations | Aikaterini-Panagiota Stouka, Conor McMenamin, Demetris Kyriacou, Lin Oshitani, Quentin Botha | 2025-10-03 | Download | In recent years, significant research efforts have focused on improving blockchain throughput and confirmation speeds without compromising security. |
| DH-EAC: Design of a Dynamic, Hierarchical Entanglement Access Control Protocol | Akihisa Takahashi, Yoshito Tobe | 2025-10-03 | Download | We propose Dynamic, Hierarchical Entanglement Access Control (DH-EAC), a pure-quantum protocol for fair and anonymous allocation of scarce entanglement across wide-area quantum networks composed of ma... |
| FSMA: Scalable and Reliable LoRa for Non-Terrestrial Networks with Mobile Gateways | Rohith Reddy Vennam, Maiyun Zhang, Raghav Subbaraman, Deepak Vashist, Dinesh Bharadia | 2025-10-03 | Download | The proliferation of Low Earth Orbit (LEO) satellites for universal IoT applications and the growing use of drones in emergency services, agriculture, and military operations highlight the transformat... |
| L4Span: Spanning Congestion Signaling over NextG Networks for Interactive Applications | Haoran Wan, Kyle Jamieson | 2025-10-03 | Download | Design for low latency networking is essential for tomorrow's interactive applications, but it is essential to deploy incrementally and universally at the network's last mile. |

cs.PF - Performance

| Title | Authors | Published | PDF | Abstract |
| --- | --- | --- | --- | --- |
| Cosmological Hydrodynamics at Exascale: A Trillion-Particle Leap in Capability | Nicholas Frontiere, J. D. Emberson, Michael Buehlmann, Esteban M. Rangel, Salman Habib, Katrin Heitmann, Patricia Larsen, Vitali Morozov, Adrian Pope, Claude-André Faucher-Giguère, Antigoni Georgiadou, Damien Lebrun-Grandié, Andrey Prokopenko | 2025-10-03 | Download | Resolving the most fundamental questions in cosmology requires simulations that match the scale, fidelity, and physical complexity demanded by next-generation sky surveys. |
| Formal Analysis of Metastable Failures in Software Systems | Peter Alvaro, Rebecca Isaacs, Rupak Majumdar, Kiran-Kumar Muniswamy-Reddy, Mahmoud Salamati, Sadegh Soudjani | 2025-10-03 | Download | Many large-scale software systems demonstrate metastable failures. In this class of failures, a stressor such as a temporary spike in workload causes the system performance to drop and, subsequently, ... |
| On the energy efficiency of sparse matrix computations on multi-GPU clusters | Massimo Bernaschi, Alessandro Celestini, Pasqua D'Ambra, Giorgio Richelli | 2025-10-03 | Download | We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accel... |
| UPMEM Unleashed: Software Secrets for Speed | Krystian Chmielewski, Jarosław Ławnicki, Uladzislau Lukyanau, Tadeusz Kobus, Maciej Maciejewski | 2025-10-03 | Download | Developing kernels for Processing-In-Memory (PIM) platforms poses unique challenges in data management and parallel programming on limited processing units. |
