phat 24(1):

Research Article

Cancer disease multinomial classification using transfer learning and SVM on the genes’ sequences

  • @ARTICLE{10.4108/eetpht.9.3220,
        author={Ines Slimene and Imene Messaoudi and Afef Elloumi Oueslati and Zied Lachiri},
        title={Cancer disease multinomial classification using transfer learning and SVM on the genes’ sequences},
        journal={EAI Endorsed Transactions on Pervasive Health and Technology},
        volume={9},
        number={1},
        publisher={EAI},
        journal_a={PHAT},
        year={2023},
        month={7},
        keywords={Cancer, FCGR, Deep Insight, Transfer Learning, SVM},
        doi={10.4108/eetpht.9.3220}
    }
    
Ines Slimene1,*, Imene Messaoudi2, Afef Elloumi Oueslati2, Zied Lachiri1
  • 1: National Engineering School of Tunis
  • 2: University of Carthage
*Contact email: ines.slimene@enit.utm.tn

Abstract

INTRODUCTION: Early disease detection plays an important role in the medical field, especially for cancer, because it helps doctors diagnose the disease and choose the therapeutic process. To assist with this, many biological techniques, as well as machine learning and deep learning models, have been proposed and applied to different types of data, such as medical images and clinical data. Despite their efficiency, these techniques remain costly and require considerable preparation time, execution time, and resources.

OBJECTIVES: In this paper, we present a novel method of disease detection that analyzes the composition of gene sequences.

METHODS: We start by extracting k-mer nucleotides as features from gene sequences with the Frequency Chaos Game Representation (FCGR) technique. Because the extracted data are very large, we use a DeepInsight model to extract the most representative k-mers. A combination of a transfer learning model, the Residual Neural Network (ResNet), and a support vector machine (SVM) algorithm is then used to classify samples into 18 cancer disease types.

RESULTS: We achieved an accuracy of 0.98 when using FCGR6 for feature extraction and a combination of ResNet50 and SVM in the multinomial classification step, against an accuracy of 0.97 when using FCGR5 and ResNet50 with a fully connected layer.

CONCLUSION: Identifying gene sequence alterations helps detect disease at an early stage. Here, we adopt the FCGR method, which gives the frequency of each k-mer, to define the features of the gene sequences. We then use deep learning models to handle the large number of features and to predict different cancer diseases.
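
To make the FCGR feature extraction step more concrete, the following is a minimal Python sketch of how a k-th order FCGR matrix could be computed from a nucleotide sequence. It is an illustration under stated assumptions, not the authors' implementation: the corner assignment (A, C, G, T), the normalization of counts to relative frequencies, the skipping of k-mers containing ambiguous bases, and the function names `kmer_cell` and `fcgr` are all hypothetical choices made for this example.

```python
from collections import Counter

# Assumed corner convention for the chaos game square; the paper's layout may differ.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Return the (row, col) cell of a k-mer in the 2^k x 2^k FCGR grid."""
    x = y = 0
    for i, base in enumerate(kmer):
        cx, cy = CORNERS[base]
        # In the chaos game, later nucleotides select the coarser quadrant,
        # so the i-th base contributes bit i (the last base is the highest bit).
        x |= cx << i
        y |= cy << i
    return y, x


def fcgr(sequence, k):
    """Build a 2^k x 2^k matrix of relative k-mer frequencies."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: k-mers with ambiguous symbols (e.g. N) are skipped.
        if all(base in CORNERS for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    for kmer, n in counts.items():
        row, col = kmer_cell(kmer)
        grid[row][col] = n / total
    return grid


if __name__ == "__main__":
    matrix = fcgr("ACGTACGTTTGACCA", 2)
    for row in matrix:
        print(" ".join(f"{value:.3f}" for value in row))
```

With k = 6, as in the FCGR6 setting reported in the Results, this yields a 64 x 64 matrix (4096 k-mer frequencies) per sequence, which illustrates why a feature selection step is applied before classification.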