Bio-Inspired Models of Network, Information, and Computing Systems. 5th International ICST Conference, BIONETICS 2010, Boston, USA, December 1-3, 2010, Revised Selected Papers

Research Article

Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences

@INPROCEEDINGS{10.1007/978-3-642-32615-8_23,
    author={Uday Kamath and Amarda Shehu and Kenneth Jong},
    title={Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences},
    proceedings={Bio-Inspired Models of Network, Information, and Computing Systems. 5th International ICST Conference, BIONETICS 2010, Boston, USA, December 1-3, 2010, Revised Selected Papers},
    proceedings_a={BIONETICS},
    year={2012},
    month={10},
    keywords={DNase I hypersensitive sites, evolutionary algorithms, support vector machines, genetic programming, kernel functions, motifs},
    doi={10.1007/978-3-642-32615-8_23}
}
    
Kamath, U., Shehu, A., Jong, K. (2012). Feature and Kernel Evolution for Recognition of Hypersensitive Sites in DNA Sequences. BIONETICS. Springer. DOI: 10.1007/978-3-642-32615-8_23
Uday Kamath¹, Amarda Shehu, Kenneth Jong¹
¹ George Mason University

Abstract

Annotating the DNA regions that regulate gene transcription is the first step toward understanding many diseases and the phenotypic differences among cells. DNase I hypersensitive (HS) sites are reliable markers of regulatory regions. Many statistical learning techniques map HS sites by using Support Vector Machines (SVMs) to classify a DNA sequence as HS or non-HS. This paper contributes a novel methodology, inspired by biological evolution, that automates the basic steps of SVM classification and improves classification accuracy. First, an evolutionary algorithm designs optimal sequence motifs, which are used to associate a feature vector with each input sequence. Second, a genetic programming algorithm designs optimal kernel functions that map the feature vectors into a high-dimensional space in which the vectors can be best separated into the HS and non-HS classes. Results show that employing evolutionary computation techniques improves classification accuracy and promises to automate the analysis of biological sequences.
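The abstract names two evolutionary steps but gives no algorithmic detail, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a feature vector simply counts (overlapping) occurrences of each motif in a sequence; it evolves motifs with a simple (1+1)-style mutation loop scored by a hypothetical stand-in fitness (the absolute difference in mean motif count between HS and non-HS sequences) instead of SVM accuracy; and it encodes a kernel as a hypothetical expression tree of the kind a genetic programming search could evolve. The helper names (`motif_features`, `fitness`, `evolve_motifs`, `kernel`), the tree encoding, and the toy sequences are all illustrative inventions; GP crossover/mutation and SVM training are omitted.

```python
import math
import random

ALPHABET = "ACGT"


def motif_features(seq, motifs):
    """Feature vector: overlapping occurrence count of each motif in seq."""
    return [
        sum(1 for i in range(len(seq) - len(m) + 1) if seq[i:i + len(m)] == m)
        for m in motifs
    ]


def fitness(motifs, pos, neg):
    """Hypothetical stand-in fitness (not the paper's): summed absolute
    difference in mean motif count between HS (pos) and non-HS (neg)."""
    score = 0.0
    for m in motifs:
        mean_pos = sum(motif_features(s, [m])[0] for s in pos) / len(pos)
        mean_neg = sum(motif_features(s, [m])[0] for s in neg) / len(neg)
        score += abs(mean_pos - mean_neg)
    return score


def mutate(motifs, rng):
    """Point mutation: replace one base of one randomly chosen motif."""
    new = list(motifs)
    k = rng.randrange(len(new))
    i = rng.randrange(len(new[k]))
    new[k] = new[k][:i] + rng.choice(ALPHABET) + new[k][i + 1:]
    return new


def evolve_motifs(pos, neg, n_motifs=2, length=3, generations=200, seed=0):
    """(1+1)-style evolutionary loop: keep the child if it is at least as fit."""
    rng = random.Random(seed)
    parent = ["".join(rng.choice(ALPHABET) for _ in range(length))
              for _ in range(n_motifs)]
    best = fitness(parent, pos, neg)
    for _ in range(generations):
        child = mutate(parent, rng)
        f = fitness(child, pos, neg)
        if f >= best:
            parent, best = child, f
    return parent, best


def kernel(tree, x, y):
    """Evaluate a hypothetical kernel expression tree on feature vectors.
    Sums and products of valid kernels are valid kernels, so any tree
    built from these nodes stays positive semidefinite."""
    op = tree[0]
    if op == "linear":
        return sum(a * b for a, b in zip(x, y))
    if op == "rbf":
        gamma = tree[1]
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    if op == "add":
        return kernel(tree[1], x, y) + kernel(tree[2], x, y)
    if op == "mul":
        return kernel(tree[1], x, y) * kernel(tree[2], x, y)
    raise ValueError(f"unknown kernel node: {op}")


# Made-up toy sequences, not real HS data.
pos = ["TTGACGTCAA", "ACGTCATGAC", "GACGTCGACG"]
neg = ["AAAATTTTAA", "TTTTAAAATT", "ATATATATAT"]
motifs, score = evolve_motifs(pos, neg)
tree = ("add", ("linear",), ("mul", ("rbf", 0.5), ("linear",)))
x, y = motif_features(pos[0], motifs), motif_features(neg[0], motifs)
print(motifs, score, kernel(tree, x, y))
```

Restricting tree nodes to sums and products of valid kernels is one way to keep every evolved individual a valid (positive semidefinite) kernel; whether the paper's genetic programming imposes such a restriction is not stated in the abstract.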