Advances in Computer Science and Information Technology. Computer Science and Engineering. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part II

Research Article

A Novel Algorithm for Prediction of Protein Coding DNA from Non-coding DNA in Microbial Genomes Using Genomic Composition and Dinucleotide Compositional Skew

    @INPROCEEDINGS{10.1007/978-3-642-27308-7_57,
        author={Baharak Goli and B. Aswathi and Achuthsankar Nair},
        title={A Novel Algorithm for Prediction of Protein Coding DNA from Non-coding DNA in Microbial Genomes Using Genomic Composition and Dinucleotide Compositional Skew},
        proceedings={Advances in Computer Science and Information Technology. Computer Science and Engineering. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part II},
        proceedings_a={CCSIT PART II},
        year={2012},
        month={11},
        keywords={Identification of protein coding DNA, genomic composition, dinucleotide compositional skew, feature selection methods, machine learning},
        doi={10.1007/978-3-642-27308-7_57}
    }
    
Baharak Goli, B. Aswathi, Achuthsankar Nair (2012). A Novel Algorithm for Prediction of Protein Coding DNA from Non-coding DNA in Microbial Genomes Using Genomic Composition and Dinucleotide Compositional Skew. CCSIT PART II. Springer. DOI: 10.1007/978-3-642-27308-7_57
Baharak Goli¹*, B. Aswathi¹, Achuthsankar Nair¹
¹ University of Kerala
*Contact email: baharak_goli@yahoo.com

Abstract

Accurate identification of protein-coding genes in genomes remains an open problem in computational biology, and one that has received increasing attention with the explosion in sequence data. This has inspired us to revisit the problem. In this study, we propose a novel gene-finding algorithm that relies on genomic composition and dinucleotide compositional skew information. To identify the most informative features, we applied two feature selection techniques widely used in data preprocessing for machine learning: correlation-based feature selection (CFS) and the ReliefF algorithm. The performance of two types of neural network, a multilayer perceptron (MLP) and a radial basis function (RBF) network, was evaluated with each of these filter approaches. Our proposed model discriminated protein-coding from non-coding DNA with accuracies of 96.47% (MLP) and 94.18% (RBF network) using CFS, and 94.94% (MLP) and 92.46% (RBF network) using ReliefF.
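The abstract does not give the exact feature definitions, so the Python sketch below is only an illustration under stated assumptions, not the authors' implementation. "Genomic composition" is assumed to mean mononucleotide and dinucleotide relative frequencies plus GC content, and "dinucleotide compositional skew" is assumed to be (x - y)/(x + y) between a dinucleotide's count and its reverse complement's count, by analogy with GC skew (G - C)/(G + C). The hypothetical `relieff` function is a simplified two-class ReliefF that visits every instance once and uses its k nearest hits and misses; CFS and the MLP and RBF classifiers are not shown. The names `extract_features`, `relieff`, and `skew` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def skew(x: float, y: float) -> float:
    """Generic compositional skew (x - y) / (x + y); 0.0 when both counts are zero."""
    return (x - y) / (x + y) if x + y else 0.0


def extract_features(seq: str) -> dict[str, float]:
    """Hypothetical feature vector: composition plus ASSUMED dinucleotide skews."""
    seq = seq.upper()
    base = Counter(c for c in seq if c in BASES)
    n_base = sum(base.values())
    # Overlapping dinucleotides; pairs containing N or other symbols are skipped.
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1)
                 if seq[i] in BASES and seq[i + 1] in BASES)
    n_di = sum(di.values())

    feats = {}
    # Genomic composition: mono- and dinucleotide relative frequencies, GC content.
    for b in BASES:
        feats[f"freq_{b}"] = base[b] / n_base if n_base else 0.0
    feats["gc_content"] = feats["freq_G"] + feats["freq_C"]
    for d in DINUCS:
        feats[f"freq_{d}"] = di[d] / n_di if n_di else 0.0
    # Classic single-base skews.
    feats["gc_skew"] = skew(base["G"], base["C"])
    feats["at_skew"] = skew(base["A"], base["T"])
    # ASSUMED dinucleotide skew: each dinucleotide vs its reverse complement
    # (AC vs GT, ...); self-complementary AT, TA, CG, GC are left out.
    for d in DINUCS:
        rc = d.translate(COMPLEMENT)[::-1]
        if d < rc:
            feats[f"skew_{d}_{rc}"] = skew(di[d], di[rc])
    return feats


def relieff(X: list[list[float]], y: list[int], k: int = 3) -> list[float]:
    """Simplified two-class ReliefF: every instance used once, k nearest hits/misses."""
    n, p = len(X), len(X[0])
    # Range-normalised per-feature difference, as in Relief/ReliefF.
    span = []
    for a in range(p):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(p))

    w = [0.0] * p
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for a in range(p):
            # Penalise features that differ among near same-class neighbours,
            # reward features that differ among near other-class neighbours.
            w[a] -= sum(diff(a, X[i], X[j]) for j in hits) / (n * k)
            w[a] += sum(diff(a, X[i], X[j]) for j in misses) / (n * k)
    return w
```

In use, feature dictionaries from `extract_features` for labelled coding and non-coding fragments would be turned into rows in a fixed key order (for example `sorted(feats)`), ranked with `relieff`, and the top-ranked features passed to a neural network classifier. The value of k and how many features to keep are not given in the abstract and would need to be tuned.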