International Workshop on Collaborative Big Data

Research Article

Efficient Processing of Models for Large-scale Shotgun Proteomics Data

@INPROCEEDINGS{10.4108/icst.collaboratecom.2012.250716,
    author={Himanshu Grover and Vanathi Gopalakrishnan},
    title={Efficient Processing of Models for Large-scale Shotgun Proteomics Data},
    proceedings={International Workshop on Collaborative Big Data},
    publisher={IEEE},
    proceedings_a={C-BIG},
    year={2012},
    month={12},
    keywords={bioinformatics high-throughput proteomics indexing multiprocessing parallelization},
    doi={10.4108/icst.collaboratecom.2012.250716}
}
    
Himanshu Grover, Vanathi Gopalakrishnan. Efficient Processing of Models for Large-scale Shotgun Proteomics Data. C-BIG, ICST, 2012. DOI: 10.4108/icst.collaboratecom.2012.250716
Himanshu Grover¹, Vanathi Gopalakrishnan¹*
¹ Department of Biomedical Informatics, University of Pittsburgh, USA
* Contact email: vanathi@pitt.edu

Abstract

Mass spectrometry (MS)-based proteomics has become a key enabling technology for the systems approach to biology, providing insights into the protein complement of an organism. Bioinformatics analyses play a critical role in interpreting the large, and often replicated, MS datasets generated across laboratories and institutions. A significant share of the computational effort in the analysis workflow is spent on identifying the protein and peptide components of complex biological samples, a process that consists of a series of steps relying on large database searches and intricate scoring algorithms. In this work, we share our efforts and experience in handling these large MS datasets efficiently through database indexing and parallelization on multiprocessor architectures. We also identify important challenges and opportunities that are relevant specifically to peptide and protein identification and, more generally, to similar multi-step problems that are inherently parallelizable.
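The two ideas named above, database indexing and parallelization across processors, can be illustrated with a minimal Python sketch. It assumes a generic approach and is not the authors' implementation: peptides from an in-silico tryptic digest are sorted by monoisotopic mass, so retrieving candidates for a spectrum's precursor mass becomes a binary search, and because each spectrum's lookup is independent, lookups are distributed over worker processes. The function names, the 0.02 Da tolerance, the simplified digestion rule (no missed cleavages or modifications) and the toy protein sequences are all hypothetical; a real search engine would also score each candidate against the observed fragment spectrum, which is omitted here.

```python
"""Hypothetical sketch: index a peptide database by precursor mass and
retrieve candidates for many spectra in parallel across CPU cores."""
from bisect import bisect_left, bisect_right
from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def tryptic_digest(protein):
    """Simplified trypsin rule (assumption): cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def build_index(proteins):
    """Digest once and sort unique peptides by mass (the 'index')."""
    entries = sorted({(peptide_mass(p), p)
                      for prot in proteins for p in tryptic_digest(prot)})
    masses = [m for m, _ in entries]
    peptides = [p for _, p in entries]
    return masses, peptides


def candidates(index, precursor_mass, tol=0.02):
    """Binary-search all peptides within +/- tol Da of the precursor mass."""
    masses, peptides = index
    lo = bisect_left(masses, precursor_mass - tol)
    hi = bisect_right(masses, precursor_mass + tol)
    return peptides[lo:hi]


def search_all(index, precursor_masses, tol=0.02, workers=4):
    """Spectra are independent, so lookups are farmed out to worker processes."""
    lookup = partial(candidates, index, tol=tol)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, precursor_masses, chunksize=64))


if __name__ == "__main__":
    # Toy sequences; precursor masses would normally come from MS/MS spectra.
    index = build_index(["MKWVTFISLLLLFSSAYSRGVFRR", "PEPTIDEKAAAR"])
    print(search_all(index, [peptide_mass("GVFR")], workers=2))
```

Run as a script, this prints `[['GVFR']]`. Sorting once up front trades a one-time O(n log n) build for O(log n + k) lookups per spectrum, and because workers only read the index, the per-spectrum lookups parallelize without locking. The `__main__` guard matters: start methods such as spawn re-import the main module in each worker.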