Object detection for sky surveillance is challenging because the objects are small relative to the monitored volume and the background changes constantly, so high-resolution frames are required. One example is detecting flying birds in wind farms to prevent collisions with the turbines. This paper proposes a YOLOv4-based ensemble model for bird detection in grayscale videos captured around wind turbines. To tackle this problem, we introduce two datasets, Klim and Skagen, collected at two locations in Denmark. We use the Klim training set to train three increasingly capable YOLOv4-based models. Model 1 is YOLOv4 trained on the Klim dataset, Model 2 adds tiling to improve the detection of small birds, and Model 3 combines tiling with temporal stacking and achieves the best mAP values of the three on both the Klim and Skagen datasets. We use this model to set up an ensemble detector, which further improves mAP values on both datasets. The three models achieve test mAP values of 82%, 88%, and 90%, respectively, on the Klim dataset. On the Skagen dataset, Model 1 and Model 3 achieve mAP values of 60% and 92%, respectively. Improved detection accuracy could help reduce bird mortality by informing the choice of wind farm sites and the placement of turbines within them. It could also be used to improve the collision avoidance systems used in wind energy facilities.
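
The two preprocessing ideas named above, tiling and temporal stacking, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the tile size, the overlap, the number of stacked frames, or whether past or future frames are used, so the function names (`stack_frames`, `tile_frame`) and all parameter values here are illustrative assumptions. The sketch stacks consecutive grayscale frames as channels so that motion becomes visible to a single-image detector, then cuts the result into overlapping tiles so that small birds occupy more of each detector input.

```python
import numpy as np


def stack_frames(frames, t, k=3):
    """Stack k consecutive grayscale frames ending at index t as channels.

    Assumption: only past frames are used, and the first frame is
    repeated when fewer than k frames are available.
    """
    idx = [max(t - i, 0) for i in range(k - 1, -1, -1)]
    return np.stack([frames[i] for i in idx], axis=-1)


def tile_frame(frame, tile=64, overlap=16):
    """Split a frame (H x W or H x W x C) into overlapping square tiles.

    Returns a list of (y, x, patch), where (y, x) is the top-left corner
    of the patch in full-frame coordinates.
    """
    h, w = frame.shape[:2]
    stride = tile - overlap
    ys = list(range(0, max(h - tile, 0) + 1, stride))
    xs = list(range(0, max(w - tile, 0) + 1, stride))
    # Add a final row/column of tiles so the frame border is covered.
    if ys[-1] + tile < h:
        ys.append(h - tile)
    if xs[-1] + tile < w:
        xs.append(w - tile)
    return [(y, x, frame[y:y + tile, x:x + tile]) for y in ys for x in xs]


# Hypothetical input: five 100 x 160 grayscale frames, frame i filled with i.
frames = np.stack([np.full((100, 160), i, dtype=np.uint8) for i in range(5)])
stacked = stack_frames(frames, t=2, k=3)
tiles = tile_frame(stacked, tile=64, overlap=16)
```

Keeping the (y, x) offset with each tile lets tile-level detections be mapped back to full-frame coordinates before any merging step, such as non-maximum suppression across overlapping tiles.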