Emerging Technologies in Computing. Second International Conference, iCETiC 2019, London, UK, August 19–20, 2019, Proceedings

Research Article

Accuracy Comparison of Machine Learning Algorithms for Predictive Analytics in Higher Education

  • @INPROCEEDINGS{10.1007/978-3-030-23943-5_19,
        author={Sarfraz Brohi and Thulasyammal Pillai and Sukhminder Kaur and Harsimren Kaur and Sanath Sukumaran and David Asirvatham},
        title={Accuracy Comparison of Machine Learning Algorithms for Predictive Analytics in Higher Education},
        booktitle={Emerging Technologies in Computing. Second International Conference, iCETiC 2019, London, UK, August 19--20, 2019, Proceedings},
        publisher={Springer},
        year={2019},
        month={7},
        keywords={Predictive analytics, Machine learning, Higher education},
        doi={10.1007/978-3-030-23943-5_19}
    }
    
  • Sarfraz Brohi, Thulasyammal Pillai, Sukhminder Kaur, Harsimren Kaur, Sanath Sukumaran, David Asirvatham (2019). Accuracy Comparison of Machine Learning Algorithms for Predictive Analytics in Higher Education. ICETIC. Springer. DOI: 10.1007/978-3-030-23943-5_19
Sarfraz Brohi¹,*, Thulasyammal Pillai¹,*, Sukhminder Kaur¹,*, Harsimren Kaur²,*, Sanath Sukumaran¹,*, David Asirvatham¹,*
  • 1: Taylor’s University
  • 2: Hilti Asia IT Services
*Contact email: SarfrazNawaz.Brohi@taylors.edu.my, Thulasyammal.RamiahPillai@taylors.edu.my, Sukhminder.Kaur@taylors.edu.my, simyaulekh10@gmail.com, Sanath@taylors.edu.my, David.Asirvatham@taylors.edu.my

Abstract

In this research, we compared the accuracy of machine learning algorithms that could be used for predictive analytics in higher education. The experiment combines classic machine learning algorithms such as Naive Bayes and Random Forest with additional methods such as Stochastic Gradient Boosting, Linear Discriminant Analysis (LDA), the C5.0 tree model, Bagged CART (treebag), and K-Nearest Neighbors (KNN). We applied traditional classification methods to classify students' performance and to determine the independent variables that offer the highest accuracy. Our results show that the data with 11 features, using random forest, yielded the best accuracy of 0.7333. We then revised the experiment with ensemble algorithms to reduce variance (bagging), reduce bias (boosting), and improve prediction accuracy (stacking). Consequently, the bagged random forest outperformed the other methods with an accuracy of 0.7959.
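The comparison described in the abstract can be sketched in scikit-learn. This is a minimal illustration only: the authors' student-performance dataset is not public, so a synthetic stand-in with 11 features is generated with `make_classification`, and the accuracy values produced here will not match the reported 0.7333 and 0.7959. Model names and hyperparameters below are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the abstract's accuracy comparison, assuming a
# scikit-learn pipeline and a synthetic stand-in dataset.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 11 features, mirroring the best-performing
# feature count mentioned in the abstract (NOT the authors' real data).
X, y = make_classification(n_samples=500, n_features=11, n_informative=8,
                           n_classes=3, random_state=42)

models = {
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(random_state=42),
    "lda": LinearDiscriminantAnalysis(),
    "knn": KNeighborsClassifier(),
    # Bagging wrapped around a random forest, analogous to the
    # "bagging random forest" of the revised experiment.
    "bagged_rf": BaggingClassifier(RandomForestClassifier(random_state=42),
                                   n_estimators=10, random_state=42),
}

# 5-fold cross-validated accuracy for each model.
results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=5,
                                    scoring="accuracy").mean()
    print(f"{name}: {results[name]:.4f}")
```

On the authors' actual data, this loop would reproduce the kind of per-model accuracy table the abstract summarizes; the ensemble variants (bagging, boosting, stacking) each wrap a base learner in the same way `BaggingClassifier` wraps the random forest here.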