2nd International ICST Conference on Scalable Information Systems

Research Article

Acquisition of Rule-based Knowledge for Analyzing DNA-binding Sites in Proteins

  • @INPROCEEDINGS{10.4108/infoscale.2007.972,
        author={Shinn-Jang Ho and Chia-Yun Chang and Liang-Tsung Huang and Shiow Fen Hwang and Shinn-Ying Ho},
        title={Acquisition of Rule-based Knowledge for Analyzing DNA-binding Sites in Proteins},
        proceedings={2nd International ICST Conference on Scalable Information Systems},
        proceedings_a={INFOSCALE},
        year={2010},
        month={5},
        keywords={Knowledge acquisition, Binding site, Protein, Decision tree},
        doi={10.4108/infoscale.2007.972}
    }
    
Shinn-Jang Ho1, Chia-Yun Chang2, Liang-Tsung Huang3, Shiow Fen Hwang3, Shinn-Ying Ho2,*
  • 1: Department of Automation Engineering, National Formosa University, Yunlin 632, Taiwan
  • 2: Institute of Bioinformatics, National Chiao Tung University, Hsinchu 300, Taiwan
  • 3: Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan
*Contact email: syho@mail.nctu.edu.tw

Abstract

This study analyzes DNA-binding proteins by acquiring interpretable knowledge that accurately predicts binding sites in proteins, with the goal of understanding the DNA-protein recognition mechanism. To mine accurate and interpretable knowledge, a large-scale dataset of 982 DNA-binding proteins is constructed. The study investigates a novel set of 11 features, including solvent accessibility, secondary structure, charge information near the residue, amino acid group, and neighbor property. The derived binding and non-binding rules reveal that, besides the well-known solvent accessibility, the electric charge distribution near the residue and the amino acid groups also play important roles in predicting binding sites. This interpretable and accurate knowledge can help biologists analyze DNA-binding proteins.
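
The keywords name decision trees as the knowledge-acquisition tool, so the idea of "binding and non-binding rules" can be illustrated with a minimal sketch: each root-to-leaf path of a tree reads as one if-then rule. The tree below is a hand-built toy, not the model from the paper; the feature names `solvent_accessibility` and `net_charge_nearby`, the thresholds, and the helper names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # "binding" or "non-binding"


@dataclass
class Node:
    feature: str                # name of the residue feature tested here
    threshold: float            # split value (hypothetical, not from the paper)
    left: Union["Node", Leaf]   # branch taken when feature <= threshold
    right: Union["Node", Leaf]  # branch taken when feature > threshold


def extract_rules(tree, conditions=()):
    """Return one (conditions, label) pair per root-to-leaf path."""
    if isinstance(tree, Leaf):
        return [(list(conditions), tree.label)]
    low = conditions + (f"{tree.feature} <= {tree.threshold}",)
    high = conditions + (f"{tree.feature} > {tree.threshold}",)
    return extract_rules(tree.left, low) + extract_rules(tree.right, high)


def format_rule(conditions, label):
    """Render a rule as a readable IF ... THEN ... string."""
    return "IF " + " AND ".join(conditions) + " THEN " + label


def predict(tree, residue):
    """Classify one residue (a dict of feature values) by walking the tree."""
    while isinstance(tree, Node):
        if residue[tree.feature] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.label


# Toy tree with made-up splits: buried residues are labelled non-binding,
# exposed residues are split again on the charge near the residue.
TOY_TREE = Node(
    "solvent_accessibility", 0.25,
    Leaf("non-binding"),
    Node("net_charge_nearby", 0.0, Leaf("non-binding"), Leaf("binding")),
)

RULES = [format_rule(c, label) for c, label in extract_rules(TOY_TREE)]

if __name__ == "__main__":
    for rule in RULES:
        print(rule)
```

Because every prediction follows exactly one such path, the rule list is a complete, human-readable restatement of the classifier, which is the sense in which tree-derived knowledge is interpretable.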