Proceedings of the 3rd International Conference on Big Data Economy and Digital Management, BDEDM 2024, January 12–14, 2024, Ningbo, China

Research Article

Integrated RFE-XGBoost Credit Risk Prediction for SMEs Using Multi-Source Heterogeneous Big Data

  • @INPROCEEDINGS{10.4108/eai.12-1-2024.2347291,
        author={Yuwen  Zeng and Juan  He and Jun  Ren and Xingyu  Liu},
        title={Integrated RFE-XGBoost Credit Risk Prediction for SMEs Using Multi-Source Heterogeneous Big Data},
        proceedings={Proceedings of the 3rd International Conference on Big Data Economy and Digital Management, BDEDM 2024, January 12--14, 2024, Ningbo, China},
        publisher={EAI},
        proceedings_a={BDEDM},
        year={2024},
        month={6},
        keywords={credit risk prediction; multi-source heterogeneous big data; text mining; rfe-xgboost},
        doi={10.4108/eai.12-1-2024.2347291}
    }
    
Yuwen Zeng1, Juan He1,*, Jun Ren1, Xingyu Liu1
  • 1: Southwest Jiaotong University
*Contact email: hejunlin93@163.com

Abstract

This paper addresses credit risk prediction in SME financing, using a sample of 716 listed manufacturing SMEs from 2018 to 2022. A predictive model combining recursive feature elimination (RFE) with XGBoost was developed by integrating multi-source heterogeneous data, including indicators such as ESG scores, public sentiment, and litigation records. After applying RFE, the average F1 score of XGBoost rose from 0.8822 to 0.9147, an absolute gain of 0.0325 (about 3.25 percentage points, or roughly 3.7% in relative terms). The resulting score is about 4.75% higher, in relative terms, than that of the next-best model (Random Forest, with an average F1 score of 0.8732). These findings underscore the value of multi-source heterogeneous data in credit risk management and offer financial institutions a more effective tool for risk assessment.
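
The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three F1 scores reported above; the variable and function names are illustrative assumptions, not taken from the paper. It suggests that the RFE figure corresponds to an absolute difference in F1, while the Random Forest comparison corresponds to a relative difference.

```python
# F1 scores as reported in the abstract.
f1_xgb_before_rfe = 0.8822
f1_xgb_after_rfe = 0.9147
f1_random_forest = 0.8732


def absolute_gain(new: float, old: float) -> float:
    """Difference between two F1 scores, in raw score units."""
    return new - old


def relative_gain(new: float, old: float) -> float:
    """Difference between two F1 scores as a fraction of the baseline."""
    return (new - old) / old


# Effect of RFE on XGBoost.
rfe_abs = absolute_gain(f1_xgb_after_rfe, f1_xgb_before_rfe)
rfe_rel = relative_gain(f1_xgb_after_rfe, f1_xgb_before_rfe)

# RFE-XGBoost compared with Random Forest.
rf_abs = absolute_gain(f1_xgb_after_rfe, f1_random_forest)
rf_rel = relative_gain(f1_xgb_after_rfe, f1_random_forest)

print(f"RFE effect: {rfe_abs * 100:.2f} points absolute, {rfe_rel * 100:.2f}% relative")
print(f"vs Random Forest: {rf_abs * 100:.2f} points absolute, {rf_rel * 100:.2f}% relative")
```

Under this reading, the two percentages are measured on different bases, so they should not be compared with each other directly.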