Keywords: [ Natural Language Processing ] [ Social Media ] [ Vaccine Hesitancy ] [ Hotspot Analysis ] [ COVID-19 ]
Vaccine hesitancy is one of the major challenges health policymakers face in the fight against community-based infectious diseases such as COVID-19, malaria, monkeypox, and Marburg. In Nigeria, Twitter is one of the social media platforms used to spread anti-vaccination posts. Such posts, and the reactions to them, can undermine community confidence in a vaccine and reduce willingness to take it during an outbreak. In this research, we collected 10,000 vaccine-related geotagged Twitter posts from Nigeria, posted between December 2020 and February 2022, to identify hotspots by clustering tweet sentiments. We used VADER, a lexicon- and rule-based Natural Language Processing sentiment analysis tool, to classify the tweets into three sentiment classes (positive, negative, and neutral). The outputs were validated using machine learning classification algorithms: Naïve Bayes (accuracy of 66%), Logistic Regression (71%), Support Vector Machines (65%), Decision Tree (61%), and K-Nearest Neighbour (56%). The average Area Under the Curve scores of 78%, 85%, 83%, 67%, and 63%, respectively, were used to evaluate the quality of the multi-class classification outputs. The classified sentiments were visualised on a map of Nigeria using ArcGIS Online. The point-based location technique was used to calculate the hotspots on the map. Green, red, and grey were used to indicate the dominance of positive, negative, and neutral sentiments, respectively. The outcome of this research shows that social media data can complement existing data in identifying hotspots during an outbreak. It can also inform health policy on managing vaccine hesitancy.
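The labelling and colour-coding steps described above can be illustrated with a minimal sketch. It assumes that each tweet already has a VADER compound score in [-1, 1], and that the conventional VADER cut-offs of ±0.05 separate the three classes; the thresholds actually used in the study are not stated here. Grouping by a named area is a simplification for illustration only and is not the point-based technique applied in ArcGIS Online. The function names, area names, and scores are hypothetical.

```python
from collections import Counter, defaultdict

# Conventional VADER cut-offs on the compound score.
# Assumption: the study's actual thresholds are not given in the abstract.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05

# Colour scheme taken from the abstract.
COLOURS = {"positive": "green", "negative": "red", "neutral": "grey"}


def label_from_compound(compound: float) -> str:
    """Map a VADER compound score to one of three sentiment classes."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def dominant_sentiment_by_area(records):
    """Return {area: (dominant_label, colour)} for (area, compound) pairs.

    Ties are resolved in favour of the label that was seen first in the area.
    """
    counts = defaultdict(Counter)
    for area, compound in records:
        counts[area][label_from_compound(compound)] += 1

    result = {}
    for area, counter in counts.items():
        label, _ = counter.most_common(1)[0]
        result[area] = (label, COLOURS[label])
    return result


# Hypothetical (area, compound score) pairs, for illustration only.
sample = [
    ("Lagos", 0.62), ("Lagos", 0.10), ("Lagos", -0.40),
    ("Kano", -0.55), ("Kano", -0.20), ("Kano", 0.00),
    ("Abuja", 0.01),
]

hotspots = dominant_sentiment_by_area(sample)
for area, (label, colour) in hotspots.items():
    print(f"{area}: {label} ({colour})")
```

In practice the compound scores would come from a VADER implementation, and the per-area aggregation would be replaced by the spatial hotspot calculation on the geotagged points.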