Collaborative Computing: Networking, Applications and Worksharing. 14th EAI International Conference, CollaborateCom 2018, Shanghai, China, December 1-3, 2018, Proceedings

Research Article

Target Gene Mining Algorithm Based on gSpan

  • @INPROCEEDINGS{10.1007/978-3-030-12981-1_36,
        author={Liangfu Lu and Xiaoxu Ren and Lianyong Qi and Chenming Cui and Yichen Jiao},
        title={Target Gene Mining Algorithm Based on gSpan},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 14th EAI International Conference, CollaborateCom 2018, Shanghai, China, December 1-3, 2018, Proceedings},
        proceedings_a={COLLABORATECOM},
        year={2019},
        month={2},
        keywords={gSpan gene mining algorithm, Gene expression data, Data mining, Visual analysis},
        doi={10.1007/978-3-030-12981-1_36}
    }
    
  • Liangfu Lu
    Xiaoxu Ren
    Lianyong Qi
    Chenming Cui
    Yichen Jiao
    Year: 2019
    Target Gene Mining Algorithm Based on gSpan
    COLLABORATECOM
    Springer
    DOI: 10.1007/978-3-030-12981-1_36
Liangfu Lu (1), Xiaoxu Ren (1), Lianyong Qi (2, *), Chenming Cui (1), Yichen Jiao (1)
  • 1: Tianjin University
  • 2: Qufu Normal University
*Contact email: lianyongqi@gmail.com

Abstract

In recent years, the focus of bioinformatics research has turned to biological data processing and information extraction. In this paper, a new mining algorithm is designed to mine target gene fragments efficiently from a huge amount of gene data and to study specific gene expression. The extracted gene data are first filtered to remove redundant gene data. A binary tree is then constructed according to the Pearson correlation coefficient between gene data and processed by the gSpan frequent subgraph mining algorithm. Finally, the results are visually analyzed as grayscale images, which helps to identify the target gene. Compared with existing target gene mining algorithms, such as the integrated decision feature gene selection algorithm, our approach offers higher accuracy and the ability to process high-dimensional data. The proposed algorithm has a sufficient theoretical basis; it not only produces results more efficiently, but also reduces the likelihood of erroneous results. Moreover, the dimension of the data is much higher than that of the data sets used by the existing algorithms, so the algorithm is more practical.
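
To make the correlation step concrete, the following is a minimal Python sketch of the Pearson correlation coefficient computed between pairs of gene expression profiles. It is an illustration only, not the authors' implementation: the function names (`pearson`, `correlation_matrix`), the gene labels, and the toy expression values are all hypothetical, and the abstract does not specify how the coefficients are turned into a binary tree, so that step and the gSpan mining are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each sample from its profile mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / norm


def correlation_matrix(profiles):
    """Pairwise Pearson coefficients, keyed by (gene, gene) name pairs."""
    names = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b])
            for a in names for b in names}


# Hypothetical toy expression profiles (one value per sample).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(profiles)
```

In this toy data, `geneA` and `geneB` rise together while `geneC` falls as `geneA` rises, so the first pair is perfectly positively correlated and the second perfectly negatively correlated. A real pipeline would apply the same computation to the filtered expression data before building the tree.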