Computational tools for top down mass spectrometry based proteoform identification and proteogenomics

Project: Research project

Description

DESCRIPTION (provided by applicant): Mass spectrometry-based top-down proteomics has emerged as one of the most informative approaches in protein analysis because it provides a bird's-eye view of all intact proteoforms generated from post-translational modifications and sequence variations. A major challenge in proteoform identification by database search is the combinatorial explosion of possible proteoforms resulting from combinations of sequence variations, post-translational modifications, and other molecular events, such as protein degradation. Here, we propose a novel data model, called the mass graph, to efficiently represent a huge number of potential proteoforms, and we will design new mass graph-based alignment and filtering algorithms that precisely identify complex proteoforms at the proteome level. We will also develop a software pipeline that combines top-down mass spectrometry and RNA-Seq data to identify sample-specific proteoforms. The proposed research will be conducted by a group of researchers who have complementary expertise. All the proposed algorithms will be implemented as user-friendly open-source software tools.
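As a rough illustration of the combinatorial explosion described above, and of why a graph representation can stay compact, the following Python sketch compares the number of distinct proteoforms of a sequence with the number of edges in a simple graph that encodes the same alternatives. This is not the project's actual mass graph: the node and edge layout, the function names (count_proteoforms, build_mass_graph), and the small tables of residues and modifications are all assumptions made for the example.

```python
from math import prod

# Monoisotopic residue masses (Da) for a few amino acids; an illustrative subset.
RESIDUE_MASS = {"G": 57.02146, "S": 87.03203, "T": 101.04768, "K": 128.09496}

# Monoisotopic mass shifts (Da) for two common modifications.
MOD_SHIFT = {"phospho": 79.96633, "acetyl": 42.01057}

# Assumption for this sketch: which residues may carry which modification.
ALLOWED_MODS = {"S": ["phospho"], "T": ["phospho"], "K": ["acetyl"]}


def count_proteoforms(seq):
    """Count proteoforms when each residue is unmodified or carries one allowed mod."""
    return prod(1 + len(ALLOWED_MODS.get(res, [])) for res in seq)


def build_mass_graph(seq):
    """Return edges (from_node, to_node, mass) of a hypothetical mass graph.

    Node i sits between residue i-1 and residue i. Each residue contributes one
    edge for its unmodified mass plus one edge per allowed modification, so every
    path from node 0 to node len(seq) spells out one proteoform.
    """
    edges = []
    for i, res in enumerate(seq):
        base = RESIDUE_MASS[res]
        edges.append((i, i + 1, base))
        for mod in ALLOWED_MODS.get(res, []):
            edges.append((i, i + 1, base + MOD_SHIFT[mod]))
    return edges


if __name__ == "__main__":
    seq = "SKT" * 10  # hypothetical 30-residue sequence, every site modifiable
    print("proteoforms:", count_proteoforms(seq))
    print("graph edges:", len(build_mass_graph(seq)))
```

Under these assumptions the proteoform count grows exponentially with the number of modifiable sites, while the edge count grows only linearly, which is the kind of saving a compact graph model is meant to provide.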
Status: Active
Effective start/end date: 6/1/16 to 5/31/20

Funding

  • National Institutes of Health: $299,451.00

Fingerprint

Mass spectrometry
Proteins
RNA
Explosions
Data structures
Pipelines
Degradation
Proteomics
Open source software