Research Article
Metabolic Network Construction Using Ensemble Algorithms
@INPROCEEDINGS{10.4108/eai.3-12-2015.2262392,
  author={Seongho Kim and Joohyoung Lee and Hyejeong Jang and Xiang Zhang},
  title={Metabolic Network Construction Using Ensemble Algorithms},
  proceedings={The First International Workshop on Bioinformatics},
  publisher={ACM},
  proceedings_a={BIOINFORMATICS},
  year={2016},
  month={5},
  keywords={ensemble averaging metabolomics network construction},
  doi={10.4108/eai.3-12-2015.2262392}
}
Authors: Seongho Kim, Joohyoung Lee, Hyejeong Jang, Xiang Zhang
Year: 2016
Proceedings: BIOINFORMATICS
Publisher: ACM
DOI: 10.4108/eai.3-12-2015.2262392
Abstract
One of the most important and challenging "knowledge extraction" tasks in bioinformatics is the reverse engineering of gene, protein, and metabolite networks from biological data. Gaussian graphical models (GGMs) have proven to be a powerful formalism for inferring biological networks. Unfortunately, standard GGM selection techniques cannot be used in the "small N, large P" setting, where the number of samples is small relative to the number of variables. Various methods have been developed to overcome this issue, based on regularized estimation, the partial least squares method, and limited-order partial correlation graphs. Several studies have compared the performance of network construction algorithms (PLSR, SCE, and ES; ICR and PCR; ridge regression, Lasso, and adaptive Lasso) to determine which is best for biological network construction. Each comparison concluded that every construction method has its own advantages and disadvantages depending on the circumstances, such as the complexity of the network. However, it is almost impossible to know the complexity of a network before estimating it. We therefore develop an ensemble method, based on model averaging, to construct a metabolic network. Our simulation studies show that network construction based on ensemble averaging achieves a larger F1 score than all of the other methods except adaptive Lasso, reflecting its ability to account for uncertainty about network complexity.
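The ensemble idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the individual methods' outputs are combined, so everything below is an assumption made for illustration: unweighted averaging of edge-score matrices, a fixed threshold for edge selection, the function names (`ensemble_average`, `select_edges`, `f1_score`), and the hand-picked toy scores, which are not results from the paper.

```python
from itertools import combinations


def ensemble_average(score_matrices):
    """Element-wise unweighted mean of edge-score matrices, one per method.

    Assumption: equal weights; the paper's weighting scheme is not stated
    in the abstract.
    """
    n_methods = len(score_matrices)
    p = len(score_matrices[0])
    return [
        [sum(m[i][j] for m in score_matrices) / n_methods for j in range(p)]
        for i in range(p)
    ]


def select_edges(scores, threshold):
    """Return undirected edges (i, j), i < j, whose |score| exceeds threshold."""
    p = len(scores)
    return {
        (i, j)
        for i, j in combinations(range(p), 2)
        if abs(scores[i][j]) > threshold
    }


def f1_score(predicted, truth):
    """Harmonic mean of precision and recall over edge sets."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 4-node chain network: 0 - 1 - 2 - 3.
true_edges = {(0, 1), (1, 2), (2, 3)}

# Made-up edge scores standing in for two different construction methods.
method_a = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
]
method_b = [
    [0.0, 0.7, 0.6, 0.0],
    [0.7, 0.0, 0.8, 0.0],
    [0.6, 0.8, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]

threshold = 0.4  # arbitrary cutoff chosen for the toy example
averaged = ensemble_average([method_a, method_b])

f1_a = f1_score(select_edges(method_a, threshold), true_edges)
f1_b = f1_score(select_edges(method_b, threshold), true_edges)
f1_ensemble = f1_score(select_edges(averaged, threshold), true_edges)
```

In this toy setup each method misses or adds one edge, while the averaged scores recover the chain; that outcome follows from the chosen numbers and says nothing about how the methods in the paper actually compare.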