Keeping track of animals in the wild with computer vision

Globally, countless wildlife species are under constant pressure from encroachment, poaching and trafficking, (illegal) logging, and other human-induced threats. The presence of wild animals such as elephants, tigers, bears and wolves in rugged countryside calls for measures to ensure peaceful human-wildlife coexistence. Leveraging data can give rangers an edge in nature conservation and wildlife protection.

Sensing Clues has developed a wildlife insights platform that enables wildlife rangers to instantly assess what is happening, where, when, who is involved or affected, and how threats tend to evolve. This platform combines data from various sources to offer comprehensive information on the current state of affairs in wildlife parks across the globe. For instance, while on patrol, rangers can track their sightings offline through the Cluey app and later synchronise with the central platform for further analysis.

To make these assessments, high-quality sightings data is of utmost importance, and the more of it, the better. Traditionally, much relies on sightings by rangers in the field. However, boots on the ground are always scarce compared with the vast areas that need protection. Camera traps that capture wildlife on film are a common additional source of information. However, incorporating camera trap information into the platform often turns out to be quite labour-intensive, as ecologists have to manually sift through hours of camera trap footage.

In this post, we will delve into how image recognition can be used to process camera trap information in an efficient, non-labour-intensive manner and how this is currently being incorporated into the Sensing Clues wildlife insights platform.

Camera trapping for wildlife insights

A camera trap is a device for capturing wildlife on film. It is typically left in a remote place for extended periods of time to monitor the presence and activity of various animals. The camera is triggered by, among other things, animal activity. However, it can also be triggered by other kinds of movement, such as waving grass. The goal is to offer the people managing the parks as much information with as little noise as possible. Rather than having an ecologist labour through thousands of pictures, it would be ideal to have an algorithm 1) decide whether an animal is present in an image, and if so, 2) classify the animal, so that it is immediately clear which images offer valuable information.

The model

As Sensing Clues serves wildlife parks across the globe, we went for an algorithm that is able to detect species from different regions of the world. There are pre-trained algorithms available that are able to classify a large number of species, such as those trained on the iNaturalist dataset. However, the images in that dataset are mostly not from camera traps.

Images from camera traps are notoriously difficult to analyse: animals may be occluded or motion-blurred, and images are taken under varying conditions, such as daytime and nighttime. There are a number of quality camera trap datasets, such as those in the LILA BC repository. However, many of these datasets only cover relatively small regions of the world. Luckily, the camera trap dataset from the iWildCam2020 FGVC7 Kaggle competition proved to be of great help. It contains over 200k images of a vast number of species across the globe and thus offered us a great starting point for an algorithm to detect and classify animals.

For this problem we took a two-step approach. First, we detect animals in images. For this we use the MegaDetector, an object detection algorithm developed by Microsoft AI for Earth that has been specifically trained to detect animals and humans in camera trap footage (see Figure 3). We crop the detected objects to their respective bounding boxes and then turn the crops into squares by padding them with zeros, which preserves the aspect ratio. Although the MegaDetector works great when it comes to detecting animals, it is not able to tell which species a detected animal belongs to.

This is what we do in the second step of our approach. Here the crops are passed to a dedicated classifier, which is based on a pre-trained InceptionV3 network trained with TensorFlow. This particular network has been pre-trained on the iNaturalist 2017 dataset and already incorporates some base-level information about animals before it has even been fed a single camera trap photo.
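The crop-and-pad step can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the function name `crop_and_pad_square` is hypothetical, and it assumes the bounding box arrives as normalised `(x_min, y_min, width, height)` fractions of the image size and that the crop is centred in the padded square.

```python
import numpy as np


def crop_and_pad_square(image, bbox):
    """Crop `image` to `bbox` and zero-pad the crop to a square.

    `image` is an (H, W) or (H, W, C) array. `bbox` is assumed to be
    (x_min, y_min, width, height) in normalised [0, 1] coordinates.
    """
    height, width = image.shape[:2]
    x_min, y_min, box_w, box_h = bbox

    # Convert the normalised box to pixel indices, clamped to the image.
    left = max(int(round(x_min * width)), 0)
    top = max(int(round(y_min * height)), 0)
    right = min(int(round((x_min + box_w) * width)), width)
    bottom = min(int(round((y_min + box_h) * height)), height)

    crop = image[top:bottom, left:right]
    crop_h, crop_w = crop.shape[:2]

    # Zero-filled square canvas; centring the crop is an assumption.
    side = max(crop_h, crop_w)
    square = np.zeros((side, side) + crop.shape[2:], dtype=image.dtype)
    y_off = (side - crop_h) // 2
    x_off = (side - crop_w) // 2
    square[y_off:y_off + crop_h, x_off:x_off + crop_w] = crop
    return square
```

Because the shorter side is filled with zeros rather than stretched, the animal keeps its proportions when the square is later resized to the classifier's input size.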

To ease classification, we reduce the brightness variation between images by blurring each image with a Gaussian filter and subtracting the blurred image from the original (see Figure 1). We perform this preprocessing step before cropping the images. This results in images with roughly uniform brightness throughout. Especially for nighttime images, this results in better contrast.

Figure 1: image preprocessing

An example of this preprocessing step can be found in Figure 2.

Figure 2: nighttime camera-trap image of a badger without (left) and with (right) the preprocessing function defined in Figure 1 applied. Original image credit: Jasper Ridge Biological Preserve of Stanford University.
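For readers who want to experiment with this kind of preprocessing, a rough NumPy-only sketch is given below. It is not the function shown in Figure 1: the names, the `sigma` value, and the `gain` and `offset` used to map the difference back into the 0-255 range are all assumptions, and it handles a single greyscale channel (a colour image could be processed channel by channel).

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian kernel covering roughly three sigma."""
    radius = max(int(3 * sigma), 1)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image, sigma):
    """Separable Gaussian blur of a 2-D float array (edges replicated)."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image, radius, mode="edge")
    # Blur along rows first, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def normalize_local_brightness(image, sigma=10.0, gain=4.0, offset=128.0):
    """Subtract a blurred copy so only local contrast remains.

    `gain` and `offset` are assumed values that re-centre the result
    around mid-grey; the source only describes the subtraction itself.
    """
    image = image.astype(np.float64)
    difference = image - gaussian_blur(image, sigma)
    result = np.clip(gain * difference + offset, 0, 255)
    return np.rint(result).astype(np.uint8)
```

A uniformly dark frame and a uniformly bright frame both come out as flat mid-grey, which illustrates why day and night images end up with comparable brightness after this step.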

Analysing images in practice

The algorithm we trained performed well in the Kaggle competition, but it only adds value if it can be used in practice. So how do you run such an algorithm in a live setting?

The models can be easily served using TensorFlow Serving. Images are picked up and passed to the MegaDetector, and the outcome is used to create crops, which are then passed to the classification algorithm.
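The flow described above could be wired together along the following lines. This is only a sketch under stated assumptions: `Detection`, `classify_animals`, the `"animal"` label and the 0.8 confidence threshold are hypothetical, and the `detect`, `crop` and `classify` callables stand in for calls to the served detector, the cropping step and the served classifier.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple

BBox = Tuple[float, float, float, float]
ANIMAL = "animal"  # assumed label for animal detections


@dataclass(frozen=True)
class Detection:
    label: str
    confidence: float
    bbox: BBox


def classify_animals(
    image: Any,
    detect: Callable[[Any], Sequence[Detection]],
    crop: Callable[[Any, BBox], Any],
    classify: Callable[[Any], str],
    min_confidence: float = 0.8,  # assumed threshold, tune per deployment
) -> List[Tuple[Detection, str]]:
    """Run detection, keep confident animal detections, classify each crop.

    An empty result means no animal was found, so the image can be
    treated as empty.
    """
    results = []
    for detection in detect(image):
        if detection.label != ANIMAL or detection.confidence < min_confidence:
            continue
        species = classify(crop(image, detection.bbox))
        results.append((detection, species))
    return results
```

Passing the three steps in as callables keeps the orchestration independent of how the models are hosted, so the same function can be exercised with simple stubs in a test.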

Although our algorithm was trained on species across the globe, we know for a fact that certain animals do not inhabit certain parts of the world. We incorporated this by means of a simple business rule that compares the location of the camera trap with the regions that the predicted species inhabits. This can be of help in situations with lookalike species that each inhabit a different region of the world, such as the South American jaguar vs. the African / Asian leopard. In principle, location information can also be included in the algorithm itself. However, as our goal was to come up with a minimum viable product, we opted for a low-complexity, business rule-driven solution.
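A business rule of this kind might look something like the sketch below. The range table, the region names and the function `pick_plausible_species` are illustrative assumptions rather than the rule used in the platform; here, species without an entry in the table are treated as implausible.

```python
from typing import Dict, List, Optional, Set, Tuple

# Illustrative range table; a real one would come from ecological data.
SPECIES_REGIONS: Dict[str, Set[str]] = {
    "jaguar": {"South America"},
    "leopard": {"Africa", "Asia"},
}


def pick_plausible_species(
    predictions: List[Tuple[str, float]],
    camera_region: str,
    species_regions: Dict[str, Set[str]] = SPECIES_REGIONS,
) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring species that lives in `camera_region`.

    `predictions` holds (species, score) pairs from the classifier.
    Returns None when no predicted species is known to occur there.
    """
    plausible = [
        (species, score)
        for species, score in predictions
        if camera_region in species_regions.get(species, set())
    ]
    if not plausible:
        return None
    return max(plausible, key=lambda item: item[1])
```

For a camera trap in Africa, a prediction of jaguar at 0.55 and leopard at 0.40 would resolve to leopard, because the jaguar is filtered out before the best score is picked.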

Figure 3: a squirrel detected by the MegaDetector

Having an algorithm that performs well on a test set does not necessarily mean you can rely on it in a live environment. Information in the Sensing Clues wildlife insights platform has to be of excellent quality, so that wildlife rangers can rely on it. Hence, it is important that information is validated before it is trusted. Therefore, all images from the parks Sensing Clues serves in which animals have been detected are first presented to the image owners in a separate platform for validation. Validated images can then potentially be used to further improve the algorithm (i.e. human-in-the-loop).

Conclusion

We have developed a tool to detect and classify animal species in camera trap images from various regions across the globe. This tool processes camera trap information in an efficient, non-labour-intensive manner, allowing ecologists to sift through images faster, since the empty images, which can make up a significant percentage, are automatically discarded. In addition, it provides a species prediction whenever an animal is detected. The next step for this algorithm is to prove itself in practice and ultimately add to the stack of information that helps safeguard wildlife!

Written by Richard Bartels and Mike Kraus


Vantage AI B.V.