Translating genome wide association study results to associations among common diseases: In silico study with an electronic medical record

Research output: Contribution to journal › Article

3 Scopus citations

Abstract

Objective: To develop a map of disease associations exclusively using two publicly available genetic sources, the catalog of single nucleotide polymorphisms (SNPs) from the HapMap and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR).

Methods: A computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases within a Bayesian network (BN) framework, using only genetic data. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for power to discriminate a disease given other diseases, using a test dataset from our EMR. The area under the receiver operating characteristic curve was used as the performance measure. Direct associations between diseases in the I-Model were also searched for in the PubMed database and in the classes of the Human Disease Network (HDN).

Results: On the basis of genetic information alone, the I-Model linked a third of the diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94% specificity, 33% sensitivity, and 80% positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17%) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of the direct associations in the I-Model, 7 (35%) were absent from disease classes of HDN.

Conclusion: Using only publicly available genetic sources, we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive. Genetic data, as it currently exists, can only explain a fraction of the risk of a disease. Our approach makes a quantitative statement about the disease variation in an EMR that can be explained on the basis of genetic associations described in the GWAS.
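The performance measures named in the abstract (sensitivity, specificity, positive predictive value, and area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the study's evaluation code: the function names `confusion_metrics` and `roc_auc` and the toy inputs are hypothetical, chosen only to show how such measures are computed from binary reference labels and model outputs.

```python
from typing import Dict, Sequence


def confusion_metrics(reference: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity and PPV of `predicted` against `reference`."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false positives
    fn = sum(1 for r, p in pairs if r and not p)      # false negatives

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 0.5).
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up inputs for illustration only (not data from the study).
reference = [True, True, True, False, False, False, False, False]
predicted = [True, False, False, False, False, False, False, True]
metrics = confusion_metrics(reference, predicted)

auc_value = roc_auc([True, True, False, False], [0.9, 0.4, 0.5, 0.1])
```

A model with high specificity and low sensitivity, as reported for the I-Model, rarely asserts an association that the reference lacks but misses many associations that the reference contains.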

Original language: English (US)
Pages (from-to): 864-874
Number of pages: 11
Journal: International Journal of Medical Informatics
Volume: 82
Issue number: 9
State: Published - Sep 1 2013

Keywords

  • Bayesian network
  • Bioinformatics
  • Data mining
  • Electronic medical records (EMRs)
  • Genome Wide Association Studies (GWAS)
  • In silico
  • Integration
  • Single nucleotide polymorphisms (SNPs)

ASJC Scopus subject areas

  • Health Informatics
