Testbeds and Research Infrastructures for the Development of Networks and Communications. 14th EAI International Conference, TridentCom 2019, Changsha, China, December 7-8, 2019, Proceedings

Research Article

Evaluating the Effectiveness of Wrapper Feature Selection Methods with Artificial Neural Network Classifier for Diabetes Prediction

  • @INPROCEEDINGS{10.1007/978-3-030-43215-7_1,
        author={M. Fahmiin and T. Lim},
        title={Evaluating the Effectiveness of Wrapper Feature Selection Methods with Artificial Neural Network Classifier for Diabetes Prediction},
        proceedings={Testbeds and Research Infrastructures for the Development of Networks and Communications. 14th EAI International Conference, TridentCom 2019, Changsha, China, December 7-8, 2019, Proceedings},
        proceedings_a={TRIDENTCOM},
        year={2020},
        month={3},
        keywords={Feature selection, Wrapper methods, Diabetes classification},
        doi={10.1007/978-3-030-43215-7_1}
    }
    
  • M. Fahmiin
    T. Lim
    Year: 2020
    Evaluating the Effectiveness of Wrapper Feature Selection Methods with Artificial Neural Network Classifier for Diabetes Prediction
    TRIDENTCOM
    Springer
    DOI: 10.1007/978-3-030-43215-7_1
M. Fahmiin¹*, T. Lim¹*
  • 1: Universiti Teknologi Brunei
*Contact email: fahmiinabdullah96@gmail.com, lim.tiong.hoo@utb.edu.bn

Abstract

Feature selection is an important preprocessing technique used to determine the features that contribute most to the classification of a dataset, and it is typically applied to high-dimensional datasets. Various feature selection algorithms have been proposed for diabetes prediction. However, the effectiveness of these algorithms has not been thoroughly evaluated statistically. In this paper, three wrapper feature selection methods (Sequential Forward Selection, Sequential Backward Selection and Recursive Feature Elimination) are used to identify the optimal subset of features for classifying the Pima Indians Diabetes dataset, with an Artificial Neural Network (ANN) as the classifier. All three methods identify the important features of the dataset (Plasma Glucose Concentration and BMI reading), indicating their effectiveness for feature selection, and Sequential Forward Selection obtains the feature subset that most improves the ANN. However, training the ANN on the optimal subset from each method yields little to no improvement in the classifier evaluation metrics (accuracy and precision) compared with training on the original dataset, showing that feature selection is ineffective on the low-dimensional Pima Indians Diabetes dataset.
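The three wrapper methods named in the abstract differ only in how they search the space of feature subsets; each one judges a subset by actually training and scoring the classifier on it. The sketch below illustrates those search strategies in plain Python. It is not the authors' implementation. The function names, the `evaluate` callback (assumed to train an ANN on the given columns and return a validation metric such as accuracy) and the `importances` callback (assumed to return a per-feature importance score for RFE, since an ANN has no single built-in ranking) are all hypothetical. The toy weights are invented for illustration and are not Pima results.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]
# Hypothetical caller-supplied scorer: train the classifier (e.g. an ANN) on the
# columns in `subset` only and return a validation metric such as accuracy.
Evaluator = Callable[[Subset], float]
# Hypothetical caller-supplied ranker for RFE: fit a model on `subset` and return
# an importance score per feature (higher means more useful).
Ranker = Callable[[Subset], Dict[int, float]]


def sequential_forward_selection(n_features: int, k: int,
                                 evaluate: Evaluator) -> Tuple[Subset, float]:
    """SFS: start empty, repeatedly add the single feature that maximises the score."""
    selected: Subset = frozenset()
    best_score = float("-inf")
    while len(selected) < k:
        scored = [(evaluate(selected | {f}), selected | {f})
                  for f in range(n_features) if f not in selected]
        # max() keeps the first candidate on ties, so the search is deterministic.
        best_score, selected = max(scored, key=lambda sc: sc[0])
    return selected, best_score


def sequential_backward_selection(n_features: int, k: int,
                                  evaluate: Evaluator) -> Tuple[Subset, float]:
    """SBS: start with all features, repeatedly drop the one whose removal scores best."""
    selected: Subset = frozenset(range(n_features))
    best_score = evaluate(selected)
    while len(selected) > k:
        scored = [(evaluate(selected - {f}), selected - {f})
                  for f in sorted(selected)]
        best_score, selected = max(scored, key=lambda sc: sc[0])
    return selected, best_score


def recursive_feature_elimination(n_features: int, k: int,
                                  importances: Ranker) -> Subset:
    """RFE: refit on the surviving features and discard the least important one each round."""
    selected = set(range(n_features))
    while len(selected) > k:
        weights = importances(frozenset(selected))
        weakest = min(sorted(selected), key=lambda f: weights[f])
        selected.remove(weakest)
    return frozenset(selected)


# Hypothetical toy problem: five features with made-up "usefulness" weights and a
# 0.04 penalty per feature, standing in for training and validating an ANN.
TOY_WEIGHTS = [0.02, 0.30, 0.01, 0.25, 0.05]


def toy_evaluate(subset: Subset) -> float:
    return sum(TOY_WEIGHTS[f] for f in subset) - 0.04 * len(subset)


def toy_importances(subset: Subset) -> Dict[int, float]:
    return {f: TOY_WEIGHTS[f] for f in subset}


if __name__ == "__main__":
    print(sequential_forward_selection(5, 2, toy_evaluate))
    print(sequential_backward_selection(5, 2, toy_evaluate))
    print(recursive_feature_elimination(5, 2, toy_importances))
```

On this toy problem all three searches settle on features 1 and 3. The sketch stops at a fixed target size `k`; to pick an "optimal" subset instead, one could run SFS or SBS for every `k` and keep the best-scoring subset. The abstract does not state which stopping rule or importance measure the authors used.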