Computational identification of micro-structural variations and their proteogenomic consequences in cancer

Yen Yi Lin, Alexander Gawronski, Faraz Hach, Sujun Li, Ibrahim Numanagić, Iman Sarrafi, Swati Mishra, Andrew McPherson, Colin C. Collins, Milan Radovich, Haixu Tang, S. Cenk Sahinalp

Research output: Contribution to journal › Article


Motivation: Rapid advancement in high throughput genome and transcriptome sequencing (HTS) and mass spectrometry (MS) technologies has enabled the acquisition of the genomic, transcriptomic and proteomic data from the same tissue sample. We introduce a computational framework, ProTIE, to integratively analyze all three types of omics data for a complete molecular profile of a tissue sample. Our framework features MiStrVar, a novel algorithmic method to identify micro structural variants (microSVs) on genomic HTS data. Coupled with deFuse, a popular gene fusion detection method we developed earlier, MiStrVar can accurately profile structurally aberrant transcripts in tumors. Given the breakpoints obtained by MiStrVar and deFuse, our framework can then identify all relevant peptides that span the breakpoint junctions and match them with unique proteomic signatures. Observing structural aberrations in all three types of omics data validates their presence in the tumor samples.

Results: We have applied our framework to all The Cancer Genome Atlas (TCGA) breast cancer Whole Genome Sequencing (WGS) and/or RNA-Seq datasets, spanning all four major subtypes, for which proteomics data from Clinical Proteomic Tumor Analysis Consortium (CPTAC) have been released. A recent study on this dataset focusing on SNVs has reported many that lead to novel peptides. Complementing and significantly broadening this study, we detected 244 novel peptides from 432 candidate genomic or transcriptomic sequence aberrations. Many of the fusions and microSVs we discovered have not been reported in the literature. Interestingly, the vast majority of these translated aberrations, fusions in particular, were private, demonstrating the extensive inter-genomic heterogeneity present in breast cancer. Many of these aberrations also have matching out-of-frame downstream peptides, potentially indicating novel protein sequence and structure.

Availability and implementation: MiStrVar is available for download at, and ProTIE is available at
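
The abstract's step of identifying peptides that span a breakpoint junction can be illustrated with a small sketch. This is not the ProTIE implementation: the function names (`translate`, `tryptic_digest`, `junction_peptides`), the single forward reading frame, and the simplified trypsin rule (cleave after K or R, ignoring the proline exception and missed cleavages) are all assumptions made for illustration only.

```python
# Hypothetical illustration only; names and rules below are assumptions,
# not taken from ProTIE, MiStrVar or deFuse.

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(nt):
    """Translate an uppercase ACGT sequence in frame 0 up to the first stop."""
    protein = []
    for i in range(0, len(nt) - 2, 3):
        aa = CODON_TABLE[nt[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tryptic_digest(protein):
    """Return (start, end) amino-acid intervals, cleaving after every K or R."""
    intervals, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            intervals.append((start, i + 1))
            start = i + 1
    if start < len(protein):
        intervals.append((start, len(protein)))
    return intervals


def junction_peptides(left, right, min_len=2):
    """Peptides of the fused sequence whose codons cover both breakpoint sides."""
    junction = len(left)  # nucleotide index of the first base from `right`
    protein = translate((left + right).upper())
    spanning = []
    for start, end in tryptic_digest(protein):
        # The peptide covers nucleotides [3*start, 3*end) of the fused sequence.
        if end - start >= min_len and 3 * start < junction < 3 * end:
            spanning.append(protein[start:end])
    return spanning


if __name__ == "__main__":
    # Toy sequences invented for this example.
    print(junction_peptides("ATGGCTAAAGGTGAA", "CTGTCTCGTGCATAA"))
```

In a real analysis, candidate peptides of this kind would then be searched against MS/MS spectra. That matching step is outside the scope of this sketch.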

Original language: English (US)
Pages (from-to): 1672-1681
Number of pages: 10
Issue number: 10
State: Published - May 15 2018

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

  • Cite this

    Lin, Y. Y., Gawronski, A., Hach, F., Li, S., Numanagić, I., Sarrafi, I., Mishra, S., McPherson, A., Collins, C. C., Radovich, M., Tang, H., & Sahinalp, S. C. (2018). Computational identification of micro-structural variations and their proteogenomic consequences in cancer. Bioinformatics, 34(10), 1672-1681.