Abstract
The identification and quantification of marine mammals are crucial for understanding their abundance and ecology and for supporting conservation efforts. Traditional methods for detecting cetaceans, however, are often labor-intensive and limited in accuracy. To overcome these challenges, this work explores the use of convolutional neural networks (CNNs) to automate the detection of cetaceans in aerial images from unmanned aerial vehicles (UAVs). Additionally, the study proposes Long Short-Term Memory (LSTM)-based models for video detection using a CNN-LSTM architecture. Models were trained on a curated dataset of dolphin examples acquired from 138 online videos, with the aim of testing methods that hold potential for practical field monitoring. The approach was validated on field data, suggesting that it is suitable for operational settings. The results show that image-based methods are effective at detecting dolphins in aerial UAV images: the best-performing model, based on a ConvNeXt architecture, achieved accuracy and F1-score values of 83.9% and 82.0%, respectively, on the field observations conducted. Video-based methods, however, proved more difficult, as LSTM-based models struggled to generalize beyond their training environments, achieving a top accuracy of 68%. By reducing the labor required for cetacean detection and thus improving monitoring efficiency, this research provides a scalable approach that can support ongoing conservation efforts by enabling more robust data collection on cetacean populations.