Camera trapping for wildlife insights
A camera trap is a device for capturing wildlife on film. It is typically left in a remote place for extended periods of time to monitor the presence and activity of various animals. The camera is triggered by movement, ideally animal activity, but also by other kinds of motion such as waving grass. The goal is to offer as much information with as little noise as possible to the people managing the parks. Rather than having an ecologist labour through thousands of pictures, it would be ideal to have an algorithm decide whether 1) an animal is present in an image, and if so, 2) classify the animal so that it is immediately clear which images offer valuable information.
As Sensing Clues serves wildlife parks across the globe, we went for an algorithm that can detect species from many different regions. Pre-trained algorithms are available that can classify a large number of species, such as those trained on the iNaturalist dataset. However, the images they were trained on are mostly not from camera traps.
Images from camera traps are notoriously difficult to analyse: animals may be occluded or motion-blurred, and the images are taken under varying circumstances, such as day- and nighttime. There are a number of quality camera trap datasets, such as the LILA BC repository. However, many of these datasets cover only relatively small regions of the world. Luckily, the camera trap dataset from the iWildCam2020 FGVC7 Kaggle competition proved to be of great help. It contains over 200k images of a vast number of species across the globe and thus offered us a great starting point for an algorithm to detect and classify animals.
For this problem we took a two-step approach. First, we detect animals in images. For this we use the MegaDetector, an object detection algorithm developed by Microsoft AI for Earth that has been specifically trained to detect animals and humans in camera trap footage (see Figure 3). We crop the detected objects to their respective bounding boxes and then turn them into squares by padding the images with zeros to preserve the aspect ratio. Although the MegaDetector works great at detecting animals, it cannot classify which species a detected animal belongs to. That is the second step of our approach: the crops are passed to a dedicated classifier, based on an InceptionV3 network trained with TensorFlow. This network has been pre-trained on the iNaturalist 2017 dataset and therefore already incorporates some baseline knowledge of animals before it has seen a single camera trap photo.
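The crop-and-pad step between the two models can be sketched as follows. This is a minimal illustration, not our production code; the function name and the pixel-coordinate box format are assumptions.

```python
import numpy as np

def crop_to_square(image, box):
    """Crop a detection to its bounding box, then zero-pad it to a square.

    image: H x W x C uint8 array.
    box:   (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    x_min, y_min, x_max, y_max = box
    crop = image[y_min:y_max, x_min:x_max]
    h, w = crop.shape[:2]
    side = max(h, w)
    # Centre the crop on a black (zero) square canvas, so the aspect
    # ratio is preserved when the classifier later resizes the input.
    square = np.zeros((side, side, crop.shape[2]), dtype=crop.dtype)
    top = (side - h) // 2
    left = (side - w) // 2
    square[top:top + h, left:left + w] = crop
    return square
```

Padding with zeros rather than stretching keeps the animal's proportions intact, which matters for species that differ mainly in body shape.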
In order to ease classification, we reduce the variation between images by blurring each image with a Gaussian filter and subtracting the blurred image from the original (see Figure 1). We perform this preprocessing step before cropping the images. The result is an image with roughly uniform brightness throughout, which especially improves contrast in nighttime images.
Figure 1: image preprocessing
An example of this preprocessing step can be found in Figure 2.
Figure 2: nighttime camera-trap image of a badger without (left) and with (right) the preprocessing function defined in Figure 1 applied. Original image credit: Jasper Ridge Biological Preserve of Stanford University.
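The preprocessing step described above can be sketched like this. It is a simplified stand-in for the function in Figure 1; the filter width `sigma` and the rescaling back to 0-255 are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def remove_background(image, sigma=10):
    """Flatten slow brightness variations by subtracting a blurred copy.

    image: H x W x C uint8 array; sigma is the Gaussian filter width
    in pixels (an assumed value, tuned per camera in practice).
    """
    image = image.astype(np.float32)
    # Blur only the spatial axes, not the colour channels.
    blurred = gaussian_filter(image, sigma=(sigma, sigma, 0))
    diff = image - blurred
    # Rescale the residual back to the 0-255 range the classifier expects.
    diff -= diff.min()
    if diff.max() > 0:
        diff *= 255.0 / diff.max()
    return diff.astype(np.uint8)
```

Because the blurred image captures the slowly varying illumination, the subtraction leaves mostly edges and texture, which is what gives nighttime images their improved contrast.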
Analysing images in practice
The algorithm we trained performed well in the Kaggle competition, but it is only of added value if it can be used in practice. So how do you run such an algorithm in practice?
The models can easily be served using TensorFlow Serving. Images are picked up and passed to the MegaDetector, and its detections are used to create crops, which are passed to the classification algorithm.
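As a sketch of what a client of such a setup looks like: TensorFlow Serving's REST API accepts a JSON body with base64-encoded binary inputs under a `"b64"` key. The endpoint URL and model name below are hypothetical and depend entirely on how the serving container is deployed.

```python
import base64
import json

# Hypothetical endpoint; host, port and model name depend on deployment.
DETECTOR_URL = "http://localhost:8501/v1/models/megadetector:predict"

def build_predict_request(image_bytes):
    """Build the JSON body for a TensorFlow Serving REST :predict call.

    Wrapping the base64 string in a {"b64": ...} object is TF Serving's
    convention for passing binary data such as encoded images.
    """
    instance = {"b64": base64.b64encode(image_bytes).decode("ascii")}
    return json.dumps({"instances": [instance]})
```

The body would then be POSTed to the detector endpoint with any HTTP client; its response (bounding boxes and scores) drives the cropping step, and each crop is sent to the classifier endpoint in the same way.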
Although our algorithm was trained on species from across the globe, we know for a fact that certain animals do not inhabit certain parts of the world. We incorporated this by means of a simple business rule that compares the location of the camera trap with the regions the predicted species inhabits. This helps in situations with look-alike species that each inhabit a different region of the world, such as the South American jaguar vs. the African/Asian leopard. In principle, location information could also be included in the algorithm itself, but as our goal was a minimum viable product, we opted for a low-complexity, business-rule-driven solution.
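A minimal sketch of such a business rule, with a made-up two-species range table (the region names and table contents are illustrative, not our actual data):

```python
# Hypothetical range table: which broad regions each species inhabits.
SPECIES_RANGES = {
    "jaguar": {"south_america"},
    "leopard": {"africa", "asia"},
}

def plausible_prediction(species, camera_region):
    """Reject a prediction if the species is known not to occur in the
    region where the camera trap is located."""
    regions = SPECIES_RANGES.get(species)
    # No range information for this species: do not reject.
    return regions is None or camera_region in regions
```

A rejected prediction could then fall back to the next most likely species in the classifier's output, which is exactly how the rule disambiguates jaguar from leopard.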
Figure 3: a squirrel detected by the MegaDetector
Having an algorithm that performs well on a test set does not necessarily mean you can rely on it in a live environment. Information in the Sensing Clues Wildlife Insights platform has to be of excellent quality so that wildlife rangers can rely on it, which means it must be validated before it is trusted. Therefore, all images from the parks Sensing Clues serves in which animals have been detected are first presented to the image owners in a separate platform for validation. Validated images can then be used to further improve the algorithm (i.e. human-in-the-loop).
We have developed a tool to detect and classify animal species in camera trap images from various regions across the globe. It processes camera trap information efficiently and with little manual labour, allowing ecologists to sift through images faster: empty images, which can make up a significant percentage, are automatically discarded, and a species prediction is provided whenever an animal is detected. The next step for this algorithm is to prove itself in practice and ultimately add to the stack of information that helps safeguard wildlife!
Written by Richard Bartels and Mike Kraus