Research Article
Efficient Processing of Models for Large-scale Shotgun Proteomics Data
@INPROCEEDINGS{10.4108/icst.collaboratecom.2012.250716,
  author={Himanshu Grover and Vanathi Gopalakrishnan},
  title={Efficient Processing of Models for Large-scale Shotgun Proteomics Data},
  proceedings={International Workshop on Collaborative Big Data},
  publisher={IEEE},
  proceedings_a={C-BIG},
  year={2012},
  month={12},
  keywords={bioinformatics high-throughput proteomics indexing multiprocessing parallelization},
  doi={10.4108/icst.collaboratecom.2012.250716}
}
Authors: Himanshu Grover, Vanathi Gopalakrishnan
Year: 2012
C-BIG
ICST
DOI: 10.4108/icst.collaboratecom.2012.250716
Abstract
Mass spectrometry (MS)-based proteomics has become a key enabling technology for the systems approach to biology, providing insights into the protein complement of an organism. Bioinformatics analyses play a critical role in the interpretation of large, and often replicated, MS datasets generated across laboratories and institutions. A significant amount of computational effort in the workflow is spent on identifying the protein and peptide components of complex biological samples, a task that consists of a series of steps relying on large database searches and intricate scoring algorithms. In this work, we share our efforts and experience in handling these large MS datasets efficiently through database indexing and parallelization on multiprocessor architectures. We also identify important challenges and opportunities that are relevant specifically to peptide and protein identification, and more generally to similar multi-step problems that are inherently parallelizable.
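
To make the two ideas named in the abstract concrete, the following is a minimal sketch of database indexing combined with multiprocessor parallelization. It is not the authors' implementation. It assumes that candidate peptides are indexed by mass and that each spectrum's precursor mass can be looked up independently. All function names, the reduced residue-mass table, and the 0.01 Da tolerance are hypothetical choices made for illustration.

```python
from bisect import bisect_left, bisect_right
from multiprocessing import Pool

# Illustrative subset of monoisotopic residue masses in Da (assumption:
# a real search would cover all amino acids and modifications).
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "K": 128.09496, "R": 156.10111}
WATER = 18.01056  # mass of H2O added once per peptide


def peptide_mass(peptide):
    """Return the neutral monoisotopic mass of a peptide string."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_mass_index(peptides):
    """Sort peptides by mass so that range lookups can use binary search."""
    pairs = sorted((peptide_mass(p), p) for p in peptides)
    masses = [m for m, _ in pairs]
    names = [p for _, p in pairs]
    return masses, names


def candidates(index, precursor_mass, tol=0.01):
    """Return all indexed peptides within +/- tol Da of precursor_mass."""
    masses, names = index
    lo = bisect_left(masses, precursor_mass - tol)
    hi = bisect_right(masses, precursor_mass + tol)
    return names[lo:hi]


_INDEX = None  # per-worker copy of the index, set once by the initializer


def _init_worker(index):
    global _INDEX
    _INDEX = index


def _lookup(precursor_mass):
    return candidates(_INDEX, precursor_mass)


def search_all(index, precursor_masses, workers=1):
    """Look up every precursor mass, optionally across worker processes."""
    if workers <= 1:
        return [candidates(index, m) for m in precursor_masses]
    # Each worker receives the index once, then handles a share of the queries.
    with Pool(workers, initializer=_init_worker, initargs=(index,)) as pool:
        return pool.map(_lookup, precursor_masses)
```

In this sketch the index is built once and each lookup costs a binary search rather than a scan of the whole database. Because lookups are independent of one another, they can be distributed over processes by passing `workers` greater than 1. On platforms that start workers by spawning a fresh interpreter, the call to `search_all` would need to sit under an `if __name__ == "__main__":` guard.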