
Research Article
Early Chronic Kidney Disease Identification using Machine Learning
@INPROCEEDINGS{10.4108/eai.28-4-2025.2358049,
  author={Peer Mohamed Appa M. A. Y and Gaduputi Sai Venkat and Udaygiri Charan Prasad and Moilla Prasanth Reddy},
  title={Early Chronic Kidney Disease Identification using Machine Learning},
  proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part II},
  publisher={EAI},
  proceedings_a={ICITSM PART II},
  year={2025},
  month={10},
  keywords={chronic kidney disease, machine learning, xgboost, early diagnosis, predictive analytics, medical data processing, support vector machines, random forest, decision trees},
  doi={10.4108/eai.28-4-2025.2358049}
}
Peer Mohamed Appa M. A. Y
Gaduputi Sai Venkat
Udaygiri Charan Prasad
Moilla Prasanth Reddy
Year: 2025
ICITSM PART II
EAI
DOI: 10.4108/eai.28-4-2025.2358049
Abstract
Chronic Kidney Disease (CKD) is a progressive, irreversible, and disabling condition that severely impairs kidney function and is the most common cause of end-stage renal failure. Traditional diagnostic techniques such as serum creatinine, glomerular filtration rate (GFR), and urinalysis are often laborious and expensive, and they may miss the window for early detection. Machine learning (ML) approaches aim to improve early detection, while the disease is still manageable, by identifying subtle patterns within and across different sources of patient data that conventional methods do not reveal. In this paper, we propose a predictive model based on Extreme Gradient Boosting (XGBoost), a tree-ensemble method that performs well on structured medical data. Using a clinical data set of demographic, biochemical, and haematological parameters, we show that XGBoost outperforms Logistic Regression, Decision Trees, Support Vector Machines, and Random Forest, reaching an accuracy of 95.8%. Additional performance measures (precision, recall, F1-score, and the confusion matrix) further support its effectiveness. Without the ML pipeline, early prediction of CKD would be infeasible, leading to late medical intervention and poorer patient outcomes. In future work, we will improve prediction performance by integrating real-time patient monitoring features and by exploring deep learning approaches.
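The evaluation measures named in the abstract (accuracy, precision, recall, F1-score, confusion matrix) can be illustrated with a minimal sketch. It assumes a binary task where 1 means CKD and 0 means not CKD; the function names and the toy labels are hypothetical and are not taken from the paper's data or code.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are the true class, columns the predicted class: [[tn, fp], [fn, tp]]
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }


# Hypothetical toy labels (1 = CKD, 0 = not CKD); NOT results from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a screening setting such as early CKD detection, recall deserves particular attention, because a false negative corresponds to a missed case, which accuracy alone can hide.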