Skip to yearly menu bar Skip to main content

Affinity Workshop: Women in Machine Learning

Resume Parsing using an ensemble of CNN, Bi-LSTM and CRF in a Hard Voting Predictive Approach

Scholastica N. Mallo · Francisca Nonyelum Ogwueleka · Philip Odion · Martins E. Irhebhude


In this study, we propose a neural network ensemble approach using Bi-LSTM, CNN and CRF combined with hard voting to parse resume and select the candidate that is best fit for the job. For the data sets, we collected about 7000 resumes in .pdf and .docx from kaggle and Teledom Nigeria. Due to the unstructured format of resumes, some preprocessing steps would be taken before they can be used. The first step is the extraction of the plain text whereby the resumes which are in .pdf, .docx, and .doc formats are converted into text format using Apache Tika Server. The second step is to remove the Stop Words whereby the many unwanted lines, punctuations, bullets, etc, are removed by using string replacement method and regular expressions. After this is done, then the Segmentation is carried out. The segmentation model is created using a Convolutional Neural Network (CNN) architecture, this will return the sentences with the correct label. In this step, the model divides a whole resume into different segments such as personal, educational occupational, Skills, Publications, Certifications etc. The next task would be to extract useful information from each segment. The base model for this stage is an ensemble of CRF, Bi-LSTM and CNN and hard voting approach will be employed. The idea is to use a single unified predictive model in place of separate classification models by considering the predicted class with maximum votes against each class label. This study is ongoing.

Chat is not available.