Support Vector Machine based Breast Cancer Classification using Next Generation Sequences

Babymol Kurian; V. L. Jyothi

Proceedings of the First International Conference on Advanced Scientific Innovation in Science, Engineering and Technology, ICASISET 2020, 16-17 May 2020, Chennai, India

Research Article


@INPROCEEDINGS{10.4108/eai.16-5-2020.2303953,
    author={Babymol Kurian and V. L. Jyothi},
    title={Support Vector Machine based Breast Cancer Classification using Next Generation Sequences},
    proceedings={Proceedings of the First International Conference on Advanced Scientific Innovation in Science, Engineering and Technology, ICASISET 2020, 16-17 May 2020, Chennai, India},
    publisher={EAI},
    proceedings_a={ICASISET},
    year={2021},
    month={1},
    keywords={support vector machine supervised machine learning breast cancer multiple classification next generation sequencing},
    doi={10.4108/eai.16-5-2020.2303953}
}


Babymol Kurian¹,*, V. L. Jyothi²

1: Sathyabama Institute of Science and Technology, Chennai
2: Department of Computer Science & Applications, Guru Shree Shanthi Vijai Jain College, Chennai

*Contact email: babymolkurian@gmail.com

Abstract

Next Generation Sequencing (NGS) has become indispensable for predicting and treating diseases with a high success rate within a reasonable timeframe. Modern technologies such as machine learning support medical research with high speed and accuracy, from disease prediction to cure. In this paper, a supervised learning model, the Support Vector Machine (SVM), is applied to next generation sequences to predict breast cancer. Ten basic features of the DNA sequences are used to build the feature vector: the average counts of the individual nucleobases A, G, C and T, the AT content, the GC content, the AT/GC composition, G-Quadruplex occurrence, the ORF (Open Reading Frame) count and the MR (Mutation Rate). The feature vectors, together with their class values, form the dataset for supervised learning. The class value is ‘0’ for normal sequences, ‘1’ for BRCA1 cancer sequences and ‘2’ for BRCA2 cancer sequences. Four datasets are prepared, containing 50, 100, 150 and 200 sequences for each class (normal, BRCA1 and BRCA2). As the dataset size increases, the outliers, the distribution and the scatter of the features are also analysed. Each dataset is split into training and testing sets in an 80:20 ratio, and an SVM model implemented in Python is applied for supervised classification.
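The feature-vector construction, class labelling and 80:20 split summarised in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define how G-quadruplexes, ORFs or the mutation rate are computed, so the G4 regex, the forward-strand ORF scan and the reference-mismatch mutation rate below are hypothetical stand-ins; the base counts are normalised by sequence length, "AT/GC composition" is read as the (A+T)/(G+C) ratio, and the split is stratified per class. All function names (`feature_vector`, `build_dataset`, `split_80_20`, etc.) are illustrative. The resulting rows could then be passed to any SVM implementation, for example scikit-learn's `SVC`.

```python
import random
import re

# Hypothetical G-quadruplex heuristic: four runs of >= 3 G separated by
# loops of 1-7 nt. The paper does not state which definition it uses.
G4_PATTERN = re.compile(
    r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}"
)
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Class values as given in the abstract.
CLASS_VALUES = {"normal": 0, "BRCA1": 1, "BRCA2": 2}


def count_orfs(seq):
    """Count ATG...stop ORFs in the three forward frames (simplified scan)."""
    total = 0
    for frame in range(3):
        in_orf = False
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if not in_orf and codon == "ATG":
                in_orf = True
            elif in_orf and codon in STOP_CODONS:
                total += 1
                in_orf = False
    return total


def mutation_rate(seq, reference):
    """Hypothetical MR: fraction of aligned positions differing from a reference."""
    n = min(len(seq), len(reference))
    if n == 0:
        return 0.0
    return sum(a != b for a, b in zip(seq, reference)) / n


def feature_vector(seq, reference):
    """Return the ten features in the order listed in the abstract."""
    seq, reference = seq.upper(), reference.upper()
    length = len(seq)  # assumed non-empty
    a, g, c, t = (seq.count(base) for base in "AGCT")
    at, gc = a + t, g + c
    return [
        a / length, g / length, c / length, t / length,  # base frequencies
        at / length,                      # AT content
        gc / length,                      # GC content
        at / gc if gc else 0.0,           # AT/GC ratio (assumed reading)
        len(G4_PATTERN.findall(seq)),     # G-quadruplex occurrences
        count_orfs(seq),                  # ORF count
        mutation_rate(seq, reference),    # mutation rate (hypothetical)
    ]


def build_dataset(sequences_by_class, reference):
    """Pair each feature vector with its class value (0, 1 or 2)."""
    return [
        (feature_vector(s, reference), CLASS_VALUES[name])
        for name, seqs in sequences_by_class.items()
        for s in seqs
    ]


def split_80_20(rows, seed=0):
    """Stratified 80:20 split: 80% of each class goes to training."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        cut = round(0.8 * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


if __name__ == "__main__":
    seq = "ATGGGGAGGGTGGGAGGGTAAATGCCCTGA"
    ref = "ATGGGGAGGGTGGGAGGGTAAATGCCCTGG"
    print(feature_vector(seq, ref))
```

With 50 sequences per class, this split places 40 sequences of each class in the training set and 10 in the test set. Stratifying keeps the three classes balanced in both sets; a plain random 80:20 split over the pooled rows would also match the abstract's description.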

Keywords: support vector machine, supervised machine learning, breast cancer, multiple classification, next generation sequencing

Published: 2021-01-27
Publisher: EAI

DOI: http://dx.doi.org/10.4108/eai.16-5-2020.2303953
