MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis

Seungyeul Yoo, Tao Huang, Joshua D. Campbell, Eunjee Lee, Zhidong Tu, Mark W. Geraci, Charles A. Powell, Eric E. Schadt, Avrum Spira, Jun Zhu

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are interconnected through cis-regulation. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors across multiple types of molecular data, so that the corrected data can be used in further integrative analysis. Our results indicate that inspecting sample annotation for labeling errors is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher more than doubled the number of statistically significant genetic associations and genomic correlations. In a simulation study, MODMatcher produced more robust results when using three types of omics data than when using two. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
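The core idea in the abstract, that cis-regulated features link two omics data types closely enough to recognize the same individual in both, can be illustrated with a small sketch. This is not the published MODMatcher algorithm; it is a simplified, hypothetical illustration that assumes the two data sets have already been reduced to paired features (feature k in one data type is assumed to be cis-regulated by, and positively correlated with, feature k in the other; negatively related pairs would need their sign flipped first). Each feature is z-scored across samples, every sample in the first data set is compared with every sample in the second by Pearson correlation of the standardized profiles (Pearson is an assumption here; a rank-based measure would also work), and a sample is flagged when its best-correlated partner carries a different label. The function names are illustrative only.

```python
"""Hypothetical sketch: flag possible sample-label mismatches between two
omics data types, using features assumed to be cis-correlated pairs."""

from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no matching signal
    return sxy / sqrt(sxx * syy)


def standardize_features(data):
    """Z-score each feature across samples so sample profiles are comparable.

    `data` maps sample ID -> list of feature values; all lists share the
    same length and the same (assumed paired) feature order.
    """
    samples = list(data)
    n_feat = len(data[samples[0]])
    out = {s: [0.0] * n_feat for s in samples}
    for k in range(n_feat):
        col = [data[s][k] for s in samples]
        mean = sum(col) / len(col)
        sd = sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        for s in samples:
            out[s][k] = (data[s][k] - mean) / sd if sd > 0 else 0.0
    return out


def best_matches(omics_a, omics_b):
    """For each sample in omics_a, return the omics_b sample whose
    standardized profile correlates most strongly with it."""
    za = standardize_features(omics_a)
    zb = standardize_features(omics_b)
    matches = {}
    for sa, prof_a in za.items():
        scores = {sb: pearson(prof_a, prof_b) for sb, prof_b in zb.items()}
        matches[sa] = max(scores, key=scores.get)
    return matches


def flag_mismatches(omics_a, omics_b):
    """Return sorted sample IDs whose best match in omics_b has another label."""
    return sorted(s for s, m in best_matches(omics_a, omics_b).items() if m != s)
```

For example, if the labels of two samples are swapped in the second data set, `flag_mismatches` reports both of them, and `best_matches` suggests the corrected pairing. A per-sample best-match rule like this is deliberately simple: it does not enforce one-to-one matching, does not handle samples present in only one data type, and cannot exploit a third data type to break ambiguous cases, all of which a real pipeline would need to address.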

Original language: English (US)
Article number: e1003790
Journal: PLoS Computational Biology
Publisher: Public Library of Science
ISSN: 1553-734X
Volume: 10
Issue number: 8
DOI: 10.1371/journal.pcbi.1003790
State: Published - Aug 14 2014
Externally published: Yes


Cite this

Yoo, S., Huang, T., Campbell, J. D., Lee, E., Tu, Z., Geraci, M. W., ... Zhu, J. (2014). MODMatcher: Multi-Omics Data Matcher for Integrative Genomic Analysis. PLoS Computational Biology, 10(8), [e1003790]. https://doi.org/10.1371/journal.pcbi.1003790
