International Journal of Strategy and Organisational Learning (IJSOL)
Vol.2 No.1
DOI https://www.doi.org/10.56830/IJSOL06202501
Authors
Walaa H. Elashmawi
Mohamed Rashad
Yasmin Alkady
Abstract
The high rates of dropout from higher education, which range from 30% to 40%
globally, pose significant challenges to institutions and societies. Conventional binary
classification models (graduate versus dropout) fail to identify enrolled students at risk
of academic or personal struggles, hindering proactive interventions. This study
proposes a data-driven framework based on machine learning (ML) for classifying
student trajectories into three distinct categories: graduate, enrolled, and dropout,
providing a nuanced understanding of student progression. The framework uses a Kaggle dataset
of 4424 instances covering students’ demographic backgrounds, academic histories, and
personal context features, and evaluates three machine learning classifiers: Random
Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The
framework is composed of various phases, including data preprocessing, feature
extraction of the most significant features, and evaluation of the ML models.
The RF model demonstrated superior performance, achieving 73.22% accuracy,
71.19% precision, 73.22% recall, and 71.26% F1 score, with critical predictors identified through
a feature importance analysis. This multiclass approach enables early identification of
at-risk enrolled students, facilitating targeted interventions such as tailored academic
advising and retention strategies. By providing interpretable data-driven insights, the
framework empowers institutions to optimize resource allocation and improve student
success.
Keywords: Higher Education, Machine Learning, Multiclass classification,
Predictive analytics, Random Forest classifier.
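
Note on the reported metrics: the recall and accuracy figures in the abstract are identical (73.22%), which is consistent with support-weighted averaging over the three classes, since weighted recall always reduces to overall accuracy. The abstract does not state the averaging scheme, so this is an assumption. The sketch below computes weighted precision, recall, and F1 from a three-class confusion matrix; the matrix values and the names weighted_metrics and TOY_CM are hypothetical and are not taken from the study.

```python
from typing import Dict, List

# Class order used for rows (true class) and columns (predicted class).
LABELS = ["graduate", "enrolled", "dropout"]

# Hypothetical confusion matrix for illustration only (not the study's results).
# TOY_CM[i][j] = number of students whose true class is i and predicted class is j.
TOY_CM = [
    [80, 15, 5],   # true graduate
    [20, 15, 15],  # true enrolled
    [5, 10, 60],   # true dropout
]


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Return accuracy and support-weighted precision, recall, and F1."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = cm[k][k]
        support = sum(cm[k])                                 # samples truly in class k
        predicted = sum(cm[i][k] for i in range(n_classes))  # samples predicted as class k
        p_k = tp / predicted if predicted else 0.0
        r_k = tp / support if support else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support / total                             # class share of the dataset
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = weighted_metrics(TOY_CM)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With any confusion matrix, the weighted recall returned here equals the accuracy, while weighted precision and F1 generally differ from it, which matches the pattern of the figures reported above.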
