Efficient Processing of Models for Large-scale Shotgun Proteomics Data

Himanshu Grover; Vanathi Gopalakrishnan

International Workshop on Collaborative Big Data

Research Article




@INPROCEEDINGS{10.4108/icst.collaboratecom.2012.250716,
    author={Himanshu Grover and Vanathi Gopalakrishnan},
    title={Efficient Processing of Models for Large-scale Shotgun Proteomics Data},
    proceedings={International Workshop on Collaborative Big Data},
    publisher={IEEE},
    proceedings_a={C-BIG},
    year={2012},
    month={12},
    keywords={bioinformatics high-throughput proteomics indexing multiprocessing parallelization},
    doi={10.4108/icst.collaboratecom.2012.250716}
}


Himanshu Grover¹, Vanathi Gopalakrishnan¹,*

1: Department of Biomedical Informatics, University of Pittsburgh, USA

*Contact email: vanathi@pitt.edu

Abstract

Mass spectrometry (MS)-based proteomics has become a key enabling technology for the systems approach to biology, providing insights into the protein complement of an organism. Bioinformatics analyses play a critical role in the interpretation of the large, and often replicated, MS datasets generated across laboratories and institutions. A significant share of the computational effort in the workflow is spent on identifying the protein and peptide components of complex biological samples, a task that consists of a series of steps relying on large database searches and intricate scoring algorithms. In this work, we share our efforts and experience in handling these large MS datasets efficiently through database indexing and through parallelization on multiprocessor architectures. We also identify important challenges and opportunities that are relevant specifically to peptide and protein identification and, more generally, to similar multi-step problems that are inherently parallelizable.
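The abstract names two general techniques: indexing the peptide database and parallelizing the search across independent spectra. The Python sketch below illustrates both ideas under stated, hypothetical assumptions. It is not the authors' implementation. It assumes trypsin digestion with no missed cleavages, neutral monoisotopic precursor masses, a fixed mass tolerance, and a toy shared-peak score in place of a real scoring algorithm. All function names (`build_mass_index`, `search_all`, and so on) are illustrative.

```python
"""Sketch: mass-indexed peptide database plus parallel spectrum scoring.

Hypothetical assumptions (not the authors' method): trypsin digestion with no
missed cleavages, neutral monoisotopic precursor masses, singly charged
fragments, and a toy shared-peak-count score instead of a real scoring function.
"""
from bisect import bisect_left, bisect_right
from functools import partial
from multiprocessing import Pool

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_mass_index(proteins):
    """Digest every protein once; keep unique peptides sorted by mass."""
    index = sorted({(peptide_mass(p), p) for prot in proteins
                    for p in tryptic_peptides(prot)})
    masses = [m for m, _ in index]
    return masses, index


def candidates(masses, index, precursor_mass, tol=0.02):
    """Binary-search the sorted masses for peptides within +/- tol Da."""
    lo = bisect_left(masses, precursor_mass - tol)
    hi = bisect_right(masses, precursor_mass + tol)
    return [pep for _, pep in index[lo:hi]]


def b_y_ions(peptide):
    """Singly charged b- and y-ion m/z values (toy fragment model)."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend((b, y))
    return ions


def score(peptide, peaks, frag_tol=0.5):
    """Toy score: number of predicted fragments matching an observed peak."""
    return sum(any(abs(ion - p) <= frag_tol for p in peaks)
               for ion in b_y_ions(peptide))


def search_spectrum(spectrum, masses, index):
    """Return (spectrum id, best peptide or None, score) for one spectrum."""
    spec_id, precursor_mass, peaks = spectrum
    best, best_score = None, 0
    for pep in candidates(masses, index, precursor_mass):
        s = score(pep, peaks)
        if best is None or s > best_score:
            best, best_score = pep, s
    return spec_id, best, best_score


def search_all(spectra, masses, index, workers=4):
    """Spectra are scored independently, so they can run in separate processes."""
    job = partial(search_spectrum, masses=masses, index=index)
    if workers <= 1:
        return [job(s) for s in spectra]
    with Pool(processes=workers) as pool:
        return pool.map(job, spectra)
```

The index pays off because digestion and sorting happen once; each precursor lookup is then a binary search rather than a scan of the whole database. With `workers > 1`, `search_all` distributes spectra over a `multiprocessing.Pool`; on platforms that start workers by spawning a fresh interpreter, call it from under an `if __name__ == "__main__":` guard.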

Keywords: bioinformatics; high-throughput proteomics; indexing; multiprocessing; parallelization

Published: 2012-12-14
Publisher: IEEE

URL: http://dx.doi.org/10.4108/icst.collaboratecom.2012.250716