Research Article
Implementing Extreme Gradient Boosting (XGBoost) Classifier to Improve Customer Churn Prediction
@INPROCEEDINGS{10.4108/eai.2-8-2019.2290338,
  author={Iqbal Hanif},
  title={Implementing Extreme Gradient Boosting (XGBoost) Classifier to Improve Customer Churn Prediction},
  proceedings={Proceedings of the 1st International Conference on Statistics and Analytics, ICSA 2019, 2-3 August 2019, Bogor, Indonesia},
  publisher={EAI},
  proceedings_a={ICSA},
  year={2020},
  month={1},
  keywords={churn prediction, classification, extreme gradient boosting, imbalanced-classes data, logistic regression},
  doi={10.4108/eai.2-8-2019.2290338}
}
- Iqbal Hanif
Year: 2020
ICSA
EAI
DOI: 10.4108/eai.2-8-2019.2290338
Abstract
As part of Customer Relationship Management (CRM), churn prediction is important for identifying the customers most likely to churn so that they can be retained through caring programs that prevent them from churning. Among machine learning algorithms, Extreme Gradient Boosting (XGBoost) has recently become popular in many machine learning challenges; as an ensemble method, it is expected to give better predictions on imbalanced-classes data, a common characteristic of customer churn data. This research aims to test whether the XGBoost algorithm gives better predictions than the logistic regression algorithm used as the existing baseline. The research was conducted using a sample of customer data (both churned and retained customers) and their behavior recorded over six months, from October 2017 to March 2018. The research consisted of four phases: data preparation, feature selection, modelling, and evaluation. The results show that the XGBoost algorithm gives better predictions than the LogReg algorithm based on prediction accuracy, specificity, sensitivity, and the ROC curve. The XGBoost model also separates churned customers from not-churned customers better than the LogReg model does, according to the KS chart and the Gains-Lift charts produced by each algorithm.