Computational identification of micro-structural variations and their proteogenomic consequences in cancer

Yen Yi Lin, Alexander Gawronski, Faraz Hach, Sujun Li, Ibrahim Numanagić, Iman Sarrafi, Swati Mishra, Andrew McPherson, Colin C. Collins, Milan Radovich, Haixu Tang, S. Cenk Sahinalp

Research output: Contribution to journal (Article)

Abstract

Motivation: Rapid advancement in high-throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples.

Results: We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure.

Availability and implementation: MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie.
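
The step of identifying peptides that span a breakpoint junction can be illustrated with a small sketch. This is not the ProTIE implementation; it is a minimal illustration under simplifying assumptions: the fused transcript is translated in a single reading frame starting at its first base, trypsin is modeled by the common rule "cleave after K or R unless followed by P", and no matching against MS spectra is attempted. The function names (`translate`, `tryptic_peptides`, `junction_peptides`) and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nt):
    """Translate a nucleotide string in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(nt) - 2, 3):
        aa = CODON_TABLE[nt[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tryptic_peptides(protein):
    """Return (start, end, sequence) tuples in amino-acid coordinates.

    Simplified trypsin rule (an assumption): cleave after K or R,
    except when the next residue is P.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append((start, i + 1, protein[start:i + 1]))
            start = i + 1
    return peptides


def junction_peptides(upstream_nt, downstream_nt):
    """Peptides of the fused transcript whose coding span crosses the breakpoint."""
    breakpoint_nt = len(upstream_nt)  # first downstream base in the fused sequence
    protein = translate(upstream_nt + downstream_nt)
    spanning = []
    for start, end, seq in tryptic_peptides(protein):
        # The peptide covers nucleotides [3*start, 3*end); it spans the junction
        # if it contains bases from both sides of the breakpoint.
        if 3 * start < breakpoint_nt < 3 * end:
            spanning.append(seq)
    return spanning


if __name__ == "__main__":
    # Toy sequences chosen for illustration only.
    print(junction_peptides("ATGGCTAAAGGTGAA", "CTGTCTCGTTGGTAA"))
```

In a real analysis the reported breakpoints would come from the variant and fusion callers, and the candidate peptides would then be searched against the mass spectra; both steps are outside this sketch.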

Original language: English (US)
Pages (from-to): 1672-1681
Number of pages: 10
Journal: Bioinformatics
Volume: 34
Issue number: 10
DOI: https://doi.org/10.1093/bioinformatics/btx807
State: Published - May 15 2018

Fingerprint

Aberrations
Proteomics
Peptides
Cancer
Aberration
Genes
Genomics
Tumors
Fusion reactions
Genome
Tumor
Fusion
Genomic Structural Variation
Neoplasms
Breast Cancer
Sequencing
Tissue
Breast Neoplasms
RNA Sequence Analysis
Atlases

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Lin, Y. Y., Gawronski, A., Hach, F., Li, S., Numanagić, I., Sarrafi, I., ... Sahinalp, S. C. (2018). Computational identification of micro-structural variations and their proteogenomic consequences in cancer. Bioinformatics, 34(10), 1672-1681. https://doi.org/10.1093/bioinformatics/btx807

Computational identification of micro-structural variations and their proteogenomic consequences in cancer. / Lin, Yen Yi; Gawronski, Alexander; Hach, Faraz; Li, Sujun; Numanagić, Ibrahim; Sarrafi, Iman; Mishra, Swati; McPherson, Andrew; Collins, Colin C.; Radovich, Milan; Tang, Haixu; Sahinalp, S. Cenk.

In: Bioinformatics, Vol. 34, No. 10, 15.05.2018, p. 1672-1681.

Lin, YY, Gawronski, A, Hach, F, Li, S, Numanagić, I, Sarrafi, I, Mishra, S, McPherson, A, Collins, CC, Radovich, M, Tang, H & Sahinalp, SC 2018, 'Computational identification of micro-structural variations and their proteogenomic consequences in cancer', Bioinformatics, vol. 34, no. 10, pp. 1672-1681. https://doi.org/10.1093/bioinformatics/btx807
Lin, Yen Yi ; Gawronski, Alexander ; Hach, Faraz ; Li, Sujun ; Numanagić, Ibrahim ; Sarrafi, Iman ; Mishra, Swati ; McPherson, Andrew ; Collins, Colin C. ; Radovich, Milan ; Tang, Haixu ; Sahinalp, S. Cenk. / Computational identification of micro-structural variations and their proteogenomic consequences in cancer. In: Bioinformatics. 2018 ; Vol. 34, No. 10. pp. 1672-1681.
@article{da686c7c9f7e432e9a0364b7b3ab91f5,
title = "Computational identification of micro-structural variations and their proteogenomic consequences in cancer",
abstract = "Motivation Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples. Results We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure. 
Availability and implementation MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie.",
author = "Lin, {Yen Yi} and Alexander Gawronski and Faraz Hach and Sujun Li and Ibrahim Numanagić and Iman Sarrafi and Swati Mishra and Andrew McPherson and Collins, {Colin C.} and Milan Radovich and Haixu Tang and Sahinalp, {S. Cenk}",
year = "2018",
month = "5",
day = "15",
doi = "10.1093/bioinformatics/btx807",
language = "English (US)",
volume = "34",
pages = "1672--1681",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "10",

}

TY - JOUR

T1 - Computational identification of micro-structural variations and their proteogenomic consequences in cancer

AU - Lin, Yen Yi

AU - Gawronski, Alexander

AU - Hach, Faraz

AU - Li, Sujun

AU - Numanagić, Ibrahim

AU - Sarrafi, Iman

AU - Mishra, Swati

AU - McPherson, Andrew

AU - Collins, Colin C.

AU - Radovich, Milan

AU - Tang, Haixu

AU - Sahinalp, S. Cenk

PY - 2018/5/15

Y1 - 2018/5/15

AB - Motivation Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples. Results We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure. 
Availability and implementation MiStrVar is available for download at https://bitbucket.org/compbio/mistrvar, and ProTIE is available at https://bitbucket.org/compbio/protie.

UR - http://www.scopus.com/inward/record.url?scp=85047069904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047069904&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btx807

DO - 10.1093/bioinformatics/btx807

M3 - Article

VL - 34

SP - 1672

EP - 1681

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 10

ER -