Over-sampling imbalanced datasets using the Covariance Matrix

Ireimis Leguen-deVarona; Julio Madera; Yoan Martínez-López; José Hernández-Nieto

ew 20(27): e2

Research Article

@ARTICLE{10.4108/eai.13-7-2018.163982,
    author={Ireimis Leguen-deVarona and Julio Madera and Yoan Mart\'{\i}nez-L\'{o}pez and Jos\'{e} Carlos Hern\'{a}ndez-Nieto},
    title={Over-sampling imbalanced datasets using the Covariance Matrix},
    journal={EAI Endorsed Transactions on Energy Web},
    volume={7},
    number={27},
    publisher={EAI},
    journal_a={EW},
    year={2020},
    month={4},
    keywords={Imbalanced datasets, Oversampling, Covariance Matrix, Attribute Dependency},
    doi={10.4108/eai.13-7-2018.163982}
}

Ireimis Leguen-deVarona¹, Julio Madera¹,*, Yoan Martínez-López¹, José Carlos Hernández-Nieto¹

1: University of Camagüey, Camagüey, Cuba

*Contact email: julio.madera@reduc.edu.cu

Abstract

INTRODUCTION: Nowadays, many machine learning tasks involve learning from imbalanced datasets, which leads to the misclassification of minority-class instances. One of the state-of-the-art approaches to “solve” this problem at the data level is the Synthetic Minority Over-sampling Technique (SMOTE), which uses the K-Nearest Neighbors (KNN) algorithm to select and generate new instances.
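For readers unfamiliar with SMOTE, the following is a minimal sketch of the classic interpolation step, written in Python with NumPy (an assumption; the abstract names no implementation). For each synthetic instance it picks a random minority sample, picks one of its k nearest minority neighbours, and places the new point at a random position on the segment between them. The function name `smote` and its parameters are hypothetical.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Sketch of classic SMOTE (hypothetical helper, not the paper's code):
    interpolate between a minority sample and one of its k nearest
    minority neighbours."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                  # random minority sample
        nb = neighbours[base, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                      # uniform in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic
```

For example, calling `smote(X_min, 10, k=2, rng=0)` on four minority points returns a `(10, n_features)` array whose rows all lie on segments between existing minority points; this neighbour-based selection is the part that SMOTE-Cov replaces.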

OBJECTIVES: This paper presents SMOTE-Cov, a modified SMOTE that uses the Covariance Matrix instead of KNN to balance datasets with continuous attributes and a binary class.
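The abstract does not spell out how SMOTE-Cov uses the Covariance Matrix. As a hedged illustration only, the sketch below shows one generic covariance-based generator: estimate the mean vector and covariance matrix of the minority class, then draw new instances from a multivariate normal distribution with those parameters, which preserves linear dependencies between attributes. This is an assumption for illustration, not the authors' algorithm, and `covariance_oversample` is a hypothetical name.

```python
import numpy as np


def covariance_oversample(X_min, n_new, rng=None):
    """Hypothetical covariance-based generator (assumption, not necessarily
    SMOTE-Cov itself): sample from a Gaussian whose mean and covariance are
    estimated from the minority class, so attribute dependencies are kept."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    mean = X_min.mean(axis=0)
    # Columns are attributes; atleast_2d handles the single-attribute case.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))
    return rng.multivariate_normal(mean, cov, size=n_new)
```

Unlike the KNN interpolation in classic SMOTE, every synthetic instance here depends on the whole minority class through its covariance, so strongly correlated attributes stay correlated in the generated data.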

METHODS: We implemented two variants: SMOTE-CovI, which generates new values within the interval of each attribute, and SMOTE-CovO, which allows some values to fall outside the interval of the attributes.
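The difference between the two variants can be illustrated as a post-processing step on generated values. The abstract only states that SMOTE-CovI keeps values within each attribute's interval while SMOTE-CovO allows some values outside it. The sketch below assumes the interval is the observed [min, max] of each minority attribute and enforces it by clipping, which is only one possible mechanism (resampling out-of-range draws would be another). `apply_interval_policy` is a hypothetical helper.

```python
import numpy as np


def apply_interval_policy(synthetic, X_min, within_interval):
    """Hypothetical illustration of the two variants (assumed mechanism):
    within_interval=True  -> CovI-like: clip each attribute to the observed
                             [min, max] of the minority class.
    within_interval=False -> CovO-like: keep generated values unchanged,
                             so some may lie outside that range."""
    synthetic = np.asarray(synthetic, dtype=float)
    if not within_interval:
        return synthetic.copy()
    X_min = np.asarray(X_min, dtype=float)
    lo = X_min.min(axis=0)  # per-attribute lower bound
    hi = X_min.max(axis=0)  # per-attribute upper bound
    return np.clip(synthetic, lo, hi)
```

For example, with minority data `[[0, 10], [1, 20]]` and generated rows `[[-0.5, 15], [0.5, 25]]`, the interval variant returns `[[0, 15], [0.5, 20]]`, while the outside variant returns the rows unchanged.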

RESULTS: The results show that our approach performs similarly to state-of-the-art approaches.

CONCLUSION: In this paper, a new algorithm is proposed to generate synthetic instances of the minority class, using the Covariance Matrix.

Keywords: Imbalanced datasets, Oversampling, Covariance Matrix, Attribute Dependency

Received: 2020-02-29
Accepted: 2020-04-04
Published: 2020-04-15
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.13-7-2018.163982

Copyright © 2020 I. Leguen-deVarona et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.