Improving Triage Level Classification Using Gradient Boosting Algorithms
Abstract
Triage in emergency departments (EDs) is critical for prioritizing patient care and optimizing limited healthcare resources. However, conventional triage systems are prone to inconsistencies, such as undertriage and overtriage, especially in high-demand clinical environments. This study proposes a machine learning approach for predicting triage levels using structured data from the emergency unit of a high-complexity regional hospital in Chile. The dataset was filtered to include only adult and dental consultation records, excluding pediatric and obstetric-gynecological cases due to their distinct clinical characteristics. The records span five triage levels (C1–C5) and show a strong class imbalance, with C2 and C3 accounting for nearly 90% of records. We applied two tree-based gradient boosting algorithms, XGBoost and CatBoost, tuned their hyperparameters with Bayesian optimization, and used SVMSMOTE oversampling to address class imbalance. Data preprocessing included Random Forest-based imputation, standardization of numerical features, and one-hot encoding of categorical variables. Our results show that XGBoost outperformed CatBoost in overall accuracy (0.73) and weighted F1-score (0.72), particularly for the most frequent categories (C2 and C3). CatBoost, while slightly lower in overall accuracy, achieved higher sensitivity for the minority class C1. Both models, however, exhibited limitations in predicting the C5 category, highlighting the ongoing challenge of accurately classifying less frequent triage levels.
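Because accuracy, sensitivity, and weighted F1-score behave differently under class imbalance, a small worked example may help in reading these results. The sketch below uses invented toy labels (not data from this study) and hypothetical helper names to compute accuracy, per-class sensitivity (recall), and the support-weighted F1-score in plain Python. It illustrates how a classifier can score reasonably on accuracy and weighted F1 while never detecting a rare class such as C5.

```python
# Toy illustration only: labels below are invented, not taken from the study.
LABELS = ["C1", "C2", "C3", "C4", "C5"]

# 20 toy records; C2 and C3 dominate, mimicking a strong class imbalance.
y_true = ["C1"] + ["C2"] * 9 + ["C3"] * 8 + ["C4"] + ["C5"]
y_pred = (
    ["C1"]                      # the single C1 record is detected
    + ["C2"] * 7 + ["C3"] * 2   # 7 of 9 C2 records correct
    + ["C3"] * 6 + ["C2"] * 2   # 6 of 8 C3 records correct
    + ["C3"]                    # the C4 record is missed
    + ["C3"]                    # the C5 record is missed
)


def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} for each class."""
    metrics = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1, tp + fn)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted level equals the true level."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(metrics):
    """Average of per-class F1 scores, weighted by class support."""
    total = sum(m[3] for m in metrics.values())
    return sum(m[2] * m[3] for m in metrics.values()) / total


def macro_f1(metrics):
    """Unweighted average of per-class F1 scores."""
    return sum(m[2] for m in metrics.values()) / len(metrics)


m = per_class_metrics(y_true, y_pred, LABELS)
print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(m):.3f}")
print(f"macro F1    = {macro_f1(m):.3f}")
for label in LABELS:
    print(f"{label}: sensitivity = {m[label][1]:.3f}, support = {m[label][3]}")
```

In this toy setting C2 and C3 make up 17 of the 20 records, so the weighted F1-score stays close to the accuracy even though sensitivity for C4 and C5 is zero; the unweighted (macro) average makes that gap visible.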