Big Data Technologies and Applications. 7th International Conference, BDTA 2016, Seoul, South Korea, November 17–18, 2016, Proceedings

Research Article

Feature Selection Techniques for Improving Rare Class Classification in Semiconductor Manufacturing Process

  • @INPROCEEDINGS{10.1007/978-3-319-58967-1_5,
        author={Jae Kim and Kyu Cho and Jong Lee and Young Han},
        title={Feature Selection Techniques for Improving Rare Class Classification in Semiconductor Manufacturing Process},
        booktitle={Big Data Technologies and Applications. 7th International Conference, BDTA 2016, Seoul, South Korea, November 17--18, 2016, Proceedings},
        proceedings_a={BDTA},
        year={2017},
        month={6},
        keywords={Semiconductor manufacturing process, Fault detection prediction, Feature selection, Oversampling, MeanEuSTDEV},
        doi={10.1007/978-3-319-58967-1_5}
    }
    
  • Jae Kim, Kyu Cho, Jong Lee, Young Han (2017). Feature Selection Techniques for Improving Rare Class Classification in Semiconductor Manufacturing Process. BDTA. Springer. DOI: 10.1007/978-3-319-58967-1_5
Jae Kim (1,*), Kyu Cho (2,*), Jong Lee (1,*), Young Han (3,*)
  • 1: Inha University
  • 2: Inha Technical College
  • 3: Sungkyul University
*Contact email: jaekwonkorea@naver.com, kccho@ingatc.ac.kr, jslee@inha.ac.kr, hanys@sungkyul.ac.kr

Abstract

To improve performance, rare class prediction requires a feature selection method that identifies the features related to the target class. Traditional data mining algorithms fail to predict the rare class because models built on class-imbalanced data are inherently biased toward the characteristics shared by the majority class. In this paper, we propose Euclidean distance- and standard deviation-based feature selection, combined with oversampling, for a fault detection prediction model, and we apply it to fault detection prediction in semiconductor manufacturing process control. First, MAV (Mean Absolute Value) median values are calculated for the features. Second, MeanEuSTDEV (the mean of the Euclidean distance and the standard deviation) is used to select the most appropriate features for the classification model. Third, oversampling is applied to address the rare class over-fitting problem. Finally, the fault detection prediction data-mining model is generated by learning from the resulting data, and the prediction model is then evaluated to measure its performance.
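The abstract does not give the exact MeanEuSTDEV formula or say how the MAV and median values are combined, so the following Python sketch is only one hedged reading of the pipeline, not the authors' method. It assumes tabular data with one numeric value per feature and a binary label (1 = fault, the rare class). Each feature is scored as the mean of (a) the Euclidean distance between the rare-class and majority-class (MAV, median) profiles and (b) the feature's standard deviation; the top k features are kept, and the classes are then balanced with simple random oversampling, a stand-in that may differ from the oversampling method used in the paper. All function names (`mav`, `mean_eu_stdev`, `select_features`, `random_oversample`) are hypothetical.

```python
import math
import random
import statistics


def mav(values):
    """Mean absolute value (MAV) of a sequence of numbers."""
    return sum(abs(v) for v in values) / len(values)


def mean_eu_stdev(column, labels, rare_label=1):
    """Hypothetical MeanEuSTDEV score for one feature column.

    Assumed rule (not taken from the paper): average of (a) the Euclidean
    distance between the (MAV, median) profile of the rare class and that
    of the other class, and (b) the population standard deviation of the
    whole column.
    """
    rare = [v for v, y in zip(column, labels) if y == rare_label]
    common = [v for v, y in zip(column, labels) if y != rare_label]
    rare_profile = (mav(rare), statistics.median(rare))
    common_profile = (mav(common), statistics.median(common))
    euclid = math.dist(rare_profile, common_profile)  # requires Python 3.8+
    stdev = statistics.pstdev(column)
    return (euclid + stdev) / 2


def select_features(rows, labels, k, rare_label=1):
    """Return the sorted indices of the k features with the highest score."""
    scored = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scored.append((mean_eu_stdev(column, labels, rare_label), j))
    scored.sort(reverse=True)  # highest score first
    return sorted(j for _, j in scored[:k])


def random_oversample(rows, labels, rare_label=1, seed=0):
    """Balance the classes by duplicating randomly chosen rare-class rows."""
    rng = random.Random(seed)
    rare_rows = [r for r, y in zip(rows, labels) if y == rare_label]
    n_common = sum(1 for y in labels if y != rare_label)
    extra = [rng.choice(rare_rows) for _ in range(n_common - len(rare_rows))]
    return rows + extra, labels + [rare_label] * len(extra)


if __name__ == "__main__":
    # Toy data: feature 0 differs between classes, feature 1 is constant.
    rows = [[0.1, 5.0], [0.2, 5.0], [0.1, 5.0], [0.2, 5.0], [0.9, 5.0]]
    labels = [0, 0, 0, 0, 1]  # 1 = fault (rare class)
    keep = select_features(rows, labels, k=1)
    reduced = [[row[j] for j in keep] for row in rows]
    balanced_rows, balanced_labels = random_oversample(reduced, labels)
    print(keep, balanced_labels.count(0), balanced_labels.count(1))
```

Because the standard-deviation term in this assumed score depends on feature scale, features would typically be normalized before scoring. The balanced, reduced rows can then be passed to any classifier to build the fault detection prediction model.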