Affinity Workshop: WiML Workshop 1

Importance of Data Re-Sampling and Dimensionality Reduction in Predicting Students’ Success

Eluwumi Buraimoh · Ritesh Ajoodha · Kershree Padayachee


We present the importance of data pre-processing for predicting students’ success. We implemented Principal Component Analysis (PCA) for dimensionality reduction to improve model performance. We also applied data re-sampling to handle the class imbalance problem, which is one of the significant obstacles to effective classification in Educational Data Mining because of the nature of data from educational settings. We compared the effects of applying Random Under-Sampling (RUS), Random Over-Sampling (ROS), and the Synthetic Minority Over-Sampling Technique (SMOTE) to the imbalanced dataset used in this study. Combining SMOTE with PCA gave better performance than combining either RUS or ROS with PCA. The Support Vector Machine achieved the highest accuracy, 0.94, after SMOTE and PCA were applied. Applying PCA to the imbalanced data also improved the accuracy of the models used in this study. In addition to accuracy, we evaluated our models with Kappa, the Area Under the Curve (AUC), and the Precision-Recall curve. Our findings show that the predictive models can predict student success when PCA and data re-sampling techniques are applied.
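The abstract does not say which implementation the study used. The following is a minimal, stdlib-only sketch of the three re-sampling strategies it compares (RUS, ROS, SMOTE), assuming numeric feature rows stored as lists and a binary target for SMOTE. All function names here are hypothetical, and the PCA and SVM stages are not shown. Tested implementations exist in the imbalanced-learn library (`RandomUnderSampler`, `RandomOverSampler`, `SMOTE`).

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Map each class label to the list of feature rows carrying that label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(X, y, seed=0):
    """RUS: randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """ROS: duplicate random rows of smaller classes until each matches the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        # sampling with replacement fills the gap with exact copies
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def smote(X, y, minority, k=5, seed=0):
    """SMOTE (binary case): add synthetic minority rows until the minority matches the majority.

    Each synthetic row lies on the segment between a randomly chosen minority row
    and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    pool = [row for row, label in zip(X, y) if label == minority]
    if len(pool) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    k = min(k, len(pool) - 1)
    n_needed = max(Counter(y).values()) - len(pool)
    synthetic = []
    for _ in range(n_needed):
        i = rng.randrange(len(pool))
        base = pool[i]
        # indices of the k nearest minority neighbours of `base`, excluding itself
        nearest = sorted((j for j in range(len(pool)) if j != i),
                         key=lambda j: _sq_dist(base, pool[j]))[:k]
        neighbour = pool[rng.choice(nearest)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


if __name__ == "__main__":
    # Toy imbalanced data: 16 rows of class 0, 4 rows of class 1.
    X = [[float(i), float(i % 3)] for i in range(20)]
    y = [0] * 16 + [1] * 4
    for name, (_, y_new) in [("RUS", random_under_sample(X, y)),
                             ("ROS", random_over_sample(X, y)),
                             ("SMOTE", smote(X, y, minority=1, k=3))]:
        print(name, Counter(y_new))
```

RUS balances the classes by discarding majority rows, ROS by repeating minority rows verbatim, and SMOTE by interpolating new minority rows, which is why it can add variety that ROS cannot. In a real pipeline, re-sampling would normally be fitted on the training split only, so that duplicated or synthetic rows do not leak into the evaluation data.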
