Research Article
A Novel Algorithm for Prediction of Protein Coding DNA from Non-coding DNA in Microbial Genomes Using Genomic Composition and Dinucleotide Compositional Skew
@INPROCEEDINGS{10.1007/978-3-642-27308-7_57,
  author={Baharak Goli and B. Aswathi and Achuthsankar Nair},
  title={A Novel Algorithm for Prediction of Protein Coding DNA from Non-coding DNA in Microbial Genomes Using Genomic Composition and Dinucleotide Compositional Skew},
  proceedings={Advances in Computer Science and Information Technology. Computer Science and Engineering. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part II},
  proceedings_a={CCSIT PART II},
  year={2012},
  month={11},
  keywords={Identification of protein coding DNA, genomic composition, dinucleotide compositional skew, feature selection methods, machine learning},
  doi={10.1007/978-3-642-27308-7_57}
}
- Baharak Goli
- B. Aswathi
- Achuthsankar Nair
Year: 2012
CCSIT PART II
Springer
DOI: 10.1007/978-3-642-27308-7_57
Abstract
Accurate identification of protein-coding genes in genomes remains an open problem in computational biology, and it has received increasing attention with the explosion of sequence data. This has inspired us to revisit the problem. In this study, we propose a novel gene-finding algorithm that relies on genomic composition and dinucleotide compositional skew information. To identify the most prominent features, we applied two feature selection techniques widely used in data preprocessing for machine learning: correlation-based feature selection (CFS) and the ReliefF algorithm. The performance of two types of neural network, a multilayer perceptron (MLP) and a radial basis function (RBF) network, was evaluated with each of these filter approaches. The proposed model distinguished protein-coding from non-coding DNA with 96.47% and 94.18% accuracy for the MLP and RBF network, respectively, using CFS, and with 94.94% and 92.46% accuracy for the MLP and RBF network, respectively, using the ReliefF algorithm.
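The abstract does not spell out the exact feature definitions. Purely as an illustration, the Python sketch below computes one plausible genomic-composition feature vector for a DNA fragment: base frequencies, GC content, overlapping dinucleotide frequencies, and a dinucleotide compositional skew. The skew definition used here (each non-palindromic dinucleotide compared with its reverse complement, analogous to GC skew) and the function name `composition_features` are hypothetical assumptions, not the authors' published method.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def composition_features(seq: str) -> dict[str, float]:
    """Hypothetical feature vector: base composition, GC content,
    dinucleotide frequencies and dinucleotide skews (assumed definitions)."""
    seq = seq.upper()
    features: dict[str, float] = {}

    # Mononucleotide composition: fraction of each base among A/C/G/T.
    counts = {b: seq.count(b) for b in BASES}
    total = sum(counts.values()) or 1
    for b in BASES:
        features[f"freq_{b}"] = counts[b] / total
    features["gc_content"] = (counts["G"] + counts["C"]) / total

    # Overlapping dinucleotide counts; windows with non-ACGT symbols are skipped.
    dinucs = ["".join(p) for p in product(BASES, repeat=2)]
    di_counts = dict.fromkeys(dinucs, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in di_counts:
            di_counts[pair] += 1
    di_total = sum(di_counts.values()) or 1
    for d in dinucs:
        features[f"freq_{d}"] = di_counts[d] / di_total

    # Assumed skew: (n(XY) - n(rc(XY))) / (n(XY) + n(rc(XY))) for each of the
    # six pairs of non-palindromic dinucleotides and their reverse complements.
    seen: set[str] = set()
    for d in dinucs:
        rc = reverse_complement(d)
        if rc == d or rc in seen:
            continue  # palindromes (AT, CG, GC, TA) or pair already handled
        seen.add(d)
        a, b = di_counts[d], di_counts[rc]
        features[f"skew_{d}_{rc}"] = (a - b) / (a + b) if a + b else 0.0
    return features


if __name__ == "__main__":
    feats = composition_features("ATGGCGCATTAA")
    print(len(feats))                      # 27
    print(round(feats["gc_content"], 4))   # 0.4167
    print(feats["skew_CC_GG"])             # -1.0
```

In a pipeline like the one the abstract describes, vectors of this kind would be passed to a feature selection step (CFS or ReliefF) and then to a classifier (MLP or RBF network); those stages are not sketched here.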