Proceedings of the International Conference on Information Economy, Data Modeling and Cloud Computing, ICIDC 2022, 17-19 June 2022, Qingdao, China

Research Article

Research on Income Forecasting based on Machine Learning Methods and the Importance of Features

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.4108/eai.17-6-2022.2322745,
        author={Jinglin Wang},
        title={Research on Income Forecasting based on Machine Learning Methods and the Importance of Features},
        booktitle={Proceedings of the International Conference on Information Economy, Data Modeling and Cloud Computing, ICIDC 2022, 17-19 June 2022, Qingdao, China},
        publisher={EAI},
        proceedings_a={ICIDC},
        year={2022},
        month={10},
        keywords={income; classification; gini importance; random forest; knn},
        doi={10.4108/eai.17-6-2022.2322745}
    }
    
  • Jinglin Wang
    Year: 2022
    Research on Income Forecasting based on Machine Learning Methods and the Importance of Features
    ICIDC
    EAI
    DOI: 10.4108/eai.17-6-2022.2322745
Jinglin Wang1,*
  • 1: Foreign Language School attached to Guangxi Normal University
*Contact email: Axon.Wang@outlook.com

Abstract

In modern society, age has a significant impact on the income distribution of employees. However, little research has examined the precise effect of individual factors on income or their application to predicting a person’s income. Using census income data for 48,842 individuals from the Adult Data Set, this study aims to predict an individual’s annual income level with machine learning approaches based on 13 attributes (age, workclass, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week and native-country) and to determine the key factors in the prediction. For income prediction, 32,561 individuals are randomly selected to train the classification models; the Random Forest (RF), K Nearest Neighbor (KNN), Support Vector Machine (SVM), Logistic Regression (LR) and Naïve Bayes (NB) algorithms are adopted. Since the accuracy of RF is greater than 0.9 in this task, Gini importance is used to measure the relevance of each feature to the income label. Among these five methods, the RF and KNN models perform relatively well, with accuracies of 0.97973 and 0.8976 respectively. The age of the employee shows the highest relevance to his or her likely income, with an importance of 0.225.
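
The Gini importance mentioned in the abstract is built from the decrease in Gini impurity that a feature's splits achieve. The paper's own implementation is not shown on this page, so the following is only a minimal sketch of that underlying quantity for a single threshold split. The function names (`gini_impurity`, `impurity_decrease`), the thresholds, and the toy records are all hypothetical and are not taken from the Adult Data Set.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Gini decrease obtained by splitting the samples on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Child impurities are weighted by the share of samples in each child.
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical toy records (illustration only, not real census data).
ages = [22, 25, 31, 38, 45, 52, 58, 63]
hours = [40, 20, 45, 40, 50, 38, 40, 30]
income = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K", ">50K", "<=50K"]

# Thresholds are arbitrary choices for the illustration.
age_gain = impurity_decrease(ages, income, threshold=35)
hours_gain = impurity_decrease(hours, income, threshold=39)

print(f"age split gain:   {age_gain:.4f}")
print(f"hours split gain: {hours_gain:.4f}")
```

In a random forest, the Gini importance of a feature is typically obtained by accumulating such decreases over every node that splits on the feature, weighting each by the fraction of samples reaching the node, averaging over the trees, and normalizing so the importances sum to 1. On this toy data the age split separates the classes more cleanly than the hours split, which is the kind of contrast a score such as the reported 0.225 for age summarizes.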

Keywords
income; classification; gini importance; random forest; knn
Published
2022-10-13
Publisher
EAI
http://dx.doi.org/10.4108/eai.17-6-2022.2322745
Copyright © 2022–2025 EAI