Workshop: Third Workshop on AI for Humanitarian Assistance and Disaster Response

On Pseudo-Absence Generation and Machine Learning for Locust Breeding Ground Prediction in Africa

Arnu Pretorius

Abstract: Desert locust outbreaks threaten food security in Africa and have affected the livelihoods of millions of people. Furthermore, these outbreaks could become more severe and frequent as a result of climate change. Machine learning (ML) has been shown to be an effective approach to locust distribution modelling, which may assist in early warning. However, ML models require a significant amount of labelled data to train. Most publicly available labelled data on locusts are presence-only data, in which only sightings of locusts at particular locations are recorded. Prior work using ML has resorted to pseudo-absence generation methods to circumvent this issue and build balanced datasets for training. The most commonly used approach is to randomly sample points in a region of interest while ensuring that these sampled pseudo-absence points are at least a specified distance away from any true presence point. In this paper, we compare this random sampling approach to more advanced pseudo-absence generation methods, such as environmental profiling and background extent limitation, for predicting desert locust breeding grounds in Africa. We find that among the algorithms we tested, namely logistic regression (LR), gradient boosting and random forests, LR performed significantly better ($p$-value $< 2.2 \times 10^{-16}$) than the more sophisticated ensemble methods, in terms of both prediction accuracy and F1 score. Although background extent limitation combined with random sampling seemed to boost performance for the ensemble methods, no statistically significant differences were detected between the pseudo-absence generation methods used to train LR. In light of this, we conclude that simpler approaches, such as random sampling for pseudo-absence generation and linear classifiers such as LR for modelling, are sensible and effective for predicting locust breeding grounds across Africa.
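The random sampling approach described in the abstract can be illustrated with a minimal sketch. It assumes presence points are given as (latitude, longitude) pairs, that the region of interest is a simple bounding box, and that distance is measured with the haversine formula. The function names, the bounding box, and the distance threshold are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, arrays broadcast."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def sample_pseudo_absences(presence, bounds, n, min_km, seed=0,
                           max_tries=100_000):
    """Rejection-sample up to n pseudo-absence points inside a bounding box.

    presence : array-like of shape (k, 2) holding (lat, lon) in degrees
    bounds   : (lat_min, lat_max, lon_min, lon_max), an assumed region format
    min_km   : minimum allowed distance to every presence point (assumed unit)

    Note: sampling uniformly in lat/lon is not uniform in area; this is a
    simplification that is reasonable only for illustration.
    """
    rng = np.random.default_rng(seed)
    presence = np.asarray(presence, dtype=float)
    lat_min, lat_max, lon_min, lon_max = bounds
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        lat = rng.uniform(lat_min, lat_max)
        lon = rng.uniform(lon_min, lon_max)
        # Keep the candidate only if it is far enough from all presences.
        dist = haversine_km(lat, lon, presence[:, 0], presence[:, 1])
        if dist.min() >= min_km:
            accepted.append((lat, lon))
    return np.array(accepted, dtype=float).reshape(-1, 2)


if __name__ == "__main__":
    # Hypothetical presence points and region, for demonstration only.
    presence = np.array([[15.0, 30.0], [20.0, 10.0]])
    points = sample_pseudo_absences(presence, (0.0, 30.0, 0.0, 40.0),
                                    n=50, min_km=100.0)
    print(points.shape)
```

Sampling as many pseudo-absences as there are presence points gives the balanced dataset mentioned in the abstract. The function may return fewer than `n` points if `max_tries` is exhausted, so callers should check the returned shape.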