A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra

Qiang Kou, Si Wu, Nikola Tolić, Ljiljana Paša-Tolić, Yunlong Liu, Xiaowen Liu

Research output: Contribution to journal › Article

10 Citations (Scopus)

Abstract

Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, posttranslational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem.

Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.
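The combinatorial explosion mentioned in the abstract, and why a graph can represent it compactly, can be illustrated with a toy sketch. This is not TopMG's implementation: the functions `build_mass_graph`, `count_edges`, and `count_proteoforms`, the example sequence, and the choice of variable modifications are all assumptions made for illustration. Each residue becomes a set of parallel edges between consecutive nodes, one edge per possible residue mass, so the number of edges grows linearly while the number of paths (candidate proteoforms) grows multiplicatively.

```python
from math import prod

# Monoisotopic residue masses in Da for a few amino acids.
RESIDUE_MASS = {"G": 57.02146, "S": 87.03203, "T": 101.04768, "K": 128.09496}

# Assumed variable modifications (mass shifts in Da), chosen only for illustration.
VARIABLE_MODS = {
    "S": [79.96633],            # phosphorylation
    "T": [79.96633],            # phosphorylation
    "K": [42.01057, 14.01565],  # acetylation, methylation
}


def build_mass_graph(sequence):
    """Return edges[i]: the possible masses of residue i (edges from node i to node i + 1)."""
    edges = []
    for aa in sequence:
        base = RESIDUE_MASS[aa]
        edges.append([base] + [base + shift for shift in VARIABLE_MODS.get(aa, [])])
    return edges


def count_edges(edges):
    """Size of the graph: grows linearly with sequence length."""
    return sum(len(options) for options in edges)


def count_proteoforms(edges):
    """Number of distinct start-to-end paths: the product of the choices per residue."""
    return prod(len(options) for options in edges)


graph = build_mass_graph("GSKTKS")
print(count_edges(graph), count_proteoforms(graph))
```

For this six-residue toy sequence the graph has 13 edges but encodes 72 modification patterns; for a full-length protein with many modifiable sites, the same product quickly reaches the billions the abstract refers to.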

Original language: English (US)
Pages (from-to): 1309-1316
Number of pages: 8
Journal: Bioinformatics
Volume: 33
Issue number: 9
DOI: 10.1093/bioinformatics/btw806
State: Published - May 1, 2017

Fingerprint

Mass Spectrometry
Proteins
Protein
Graph in graph theory
Biological Phenomena
Alternative Splicing
Explosions
Proteomics
Post Translational Protein Processing
Software Tools
Explosion
Data structures
Mutation
Alignment
Software
Genes
Research Personnel
Gene

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine(all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra. / Kou, Qiang; Wu, Si; Tolić, Nikola; Paša-Tolić, Ljiljana; Liu, Yunlong; Liu, Xiaowen.

In: Bioinformatics, Vol. 33, No. 9, 01.05.2017, p. 1309-1316.

Kou, Qiang ; Wu, Si ; Tolić, Nikola ; Paša-Tolić, Ljiljana ; Liu, Yunlong ; Liu, Xiaowen. / A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra. In: Bioinformatics. 2017 ; Vol. 33, No. 9. pp. 1309-1316.
@article{ed181692e5884facade0b7751e8c3426,
title = "A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra",
abstract = "Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, posttranslational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.",
author = "Qiang Kou and Si Wu and Nikola Tolić and Ljiljana Paša-Tolić and Yunlong Liu and Xiaowen Liu",
year = "2017",
month = "5",
day = "1",
doi = "10.1093/bioinformatics/btw806",
language = "English (US)",
volume = "33",
pages = "1309--1316",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "9"
}

TY - JOUR

T1 - A mass graph-based approach for the identification of modified proteoforms using top-down tandem mass spectra

AU - Kou, Qiang

AU - Wu, Si

AU - Tolić, Nikola

AU - Paša-Tolić, Ljiljana

AU - Liu, Yunlong

AU - Liu, Xiaowen

PY - 2017/5/1

Y1 - 2017/5/1

N2 - Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, posttranslational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.

AB - Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, posttranslational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem. Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.

UR - http://www.scopus.com/inward/record.url?scp=85019736691&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019736691&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw806

DO - 10.1093/bioinformatics/btw806

M3 - Article

C2 - 28453668

AN - SCOPUS:85019736691

VL - 33

SP - 1309

EP - 1316

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 9

ER -