
Research Article
Predictive Modeling of Diabetes Using Ensemble Learning and Feature Optimization
@INPROCEEDINGS{10.4108/eai.28-4-2025.2358067,
  author={M. Dhilsath Fathima and M. Akash and A. Yashwanth Reddy and G. Trilok},
  title={Predictive Modeling of Diabetes Using Ensemble Learning and Feature Optimization},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part II},
  publisher={EAI},
  proceedings_a={ICITSM PART II},
  year={2025},
  month={10},
  keywords={diabetes prediction ensemble learning gradient boosting feature optimization xgboost shap smote healthcare analytics},
  doi={10.4108/eai.28-4-2025.2358067}
}
M. Dhilsath Fathima
M. Akash
A. Yashwanth Reddy
G. Trilok
Year: 2025
ICITSM PART II
EAI
DOI: 10.4108/eai.28-4-2025.2358067
Abstract
Diabetes, a chronic metabolic disorder, has emerged as a major global health burden. Early and accurate detection of prediabetes is important for preventing complications such as cardiovascular disease and neuropathy. In this paper, we present a robust ensemble-based predictive framework built on extreme gradient boosting (XGBoost) and combined with advanced feature optimization methods. Data preprocessing steps, including imputation, normalization, outlier removal, and feature elimination, were applied to improve model accuracy. The Synthetic Minority Oversampling Technique (SMOTE) was used to handle class imbalance, and SHAP (SHapley Additive exPlanations) values were used to interpret feature importance. The proposed model was trained and tested on the PIMA Indian Diabetes Dataset and outperformed classical classifiers in both accuracy and AUC-ROC. The system was implemented as a web-based application for online risk prediction. We show that combining ensemble learning with optimized preprocessing yields reliable, scalable, and interpretable diabetes risk prediction.
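The abstract names SMOTE as the method used to handle class imbalance but does not describe its mechanics. As an illustration only, the following minimal sketch shows the core idea in plain Python: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbors. This is not the authors' implementation; the function name `smote`, its parameters `k` and `seed`, and the toy feature values are all assumptions made for the example. In practice a maintained library implementation would normally be used instead.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbors.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority-class neighbors of the base point,
        # excluding the base point itself (Euclidean distance).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Made-up minority-class rows with two illustrative features (assumed values).
minority = [[148.0, 33.6], [183.0, 23.3], [137.0, 43.1], [166.0, 25.8]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies within the range spanned by the existing minority data. Oversampling of this kind is typically applied only to the training split, so that evaluation data contains no synthetic samples.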