Translating genome wide association study results to associations among common diseases: In silico study with an electronic medical record

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Objective: To develop a map of disease associations using exclusively two publicly available genetic sources, the catalog of single nucleotide polymorphisms (SNPs) from the HapMap and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR).

Methods: A computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases within a Bayesian network (BN) framework from genetic data alone. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for its power to discriminate a disease given other diseases, using a test dataset from our EMR. The area under the receiver operating characteristic (ROC) curve was used as the performance measure. Direct associations between diseases in the I-Model were also searched for in the PubMed database and in classes of the Human Disease Network (HDN).

Results: On the basis of genetic information alone, the I-Model linked a third of the diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94% specificity, 33% sensitivity, and 80% positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17%) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of the direct associations in the I-Model, 7 (35%) were absent from disease classes of HDN.

Conclusion: Using only publicly available genetic sources, we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive. Genetic data, as it currently exists, can explain only a fraction of the risk of a disease. Our approach makes a quantitative statement about disease variation that can be explained in an EMR on the basis of genetic associations described in the GWAS.
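The rates reported in the Results follow from the standard confusion-matrix definitions. The Python sketch below is illustrative only: the counts in HYPOTHETICAL_COUNTS are not from the study. They are assumed values chosen so that the formulas reproduce the reported 94% specificity, 33% sensitivity, and 80% positive predictive value. The sketch also recomputes the two percentages from the counts stated in the abstract.

```python
def diagnostic_rates(tp, fp, tn, fn):
    """Return sensitivity, specificity, and PPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all actual positives
        "specificity": tn / (tn + fp),  # true negatives among all actual negatives
        "ppv": tp / (tp + fp),          # true positives among all predicted positives
    }


# ASSUMPTION: hypothetical counts, not taken from the paper. They were picked
# only because they reproduce the rates quoted in the abstract.
HYPOTHETICAL_COUNTS = {"tp": 132, "fp": 33, "tn": 517, "fn": 268}

rates = diagnostic_rates(**HYPOTHETICAL_COUNTS)

# Counts below are stated in the abstract.
direct_associations = 117
absent_from_pubmed = 20
absent_from_hdn = 7

pct_absent_pubmed = 100 * absent_from_pubmed / direct_associations
pct_absent_hdn_of_20 = 100 * absent_from_hdn / absent_from_pubmed
pct_absent_hdn_of_117 = 100 * absent_from_hdn / direct_associations

if __name__ == "__main__":
    for name, value in rates.items():
        print(f"{name}: {value:.2f}")
    print(f"absent from PubMed: {pct_absent_pubmed:.0f}% of 117")
    print(f"absent from HDN: {pct_absent_hdn_of_20:.0f}% of 20, "
          f"or {pct_absent_hdn_of_117:.0f}% of 117")
```

The 35% figure matches 7 out of 20 rather than 7 out of 117, which suggests that the HDN percentage is relative to the 20 associations absent from PubMed; the abstract does not state the denominator explicitly.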

Original language: English
Pages (from-to): 864-874
Number of pages: 11
Journal: International Journal of Medical Informatics
Volume: 82
Issue number: 9
DOI: 10.1016/j.ijmedinf.2013.05.003
State: Published - Sep 2013

Fingerprint

Electronic Health Records
Genome-Wide Association Study
Computer Simulation
PubMed
National Human Genome Research Institute (U.S.)
Databases
HapMap Project

Keywords

  • Bayesian network
  • Bioinformatics
  • Data mining
  • Electronic medical records (EMRs)
  • Genome Wide Association Studies (GWAS)
  • In silico
  • Integration
  • Single nucleotide polymorphisms (SNPs)

ASJC Scopus subject areas

  • Health Informatics

Cite this

@article{cac33b31acaa44459f5c60a340811179,
title = "Translating genome wide association study results to associations among common diseases: In silico study with an electronic medical record",
abstract = "Objective: To develop a map of disease associations exclusively using two publicly available genetic sources: the catalog of single nucleotide polymorphisms (SNPs) from the HapMap, and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR). Methods: A computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases using a Bayesian network (BN) framework, using only genetic data. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for power to discriminate a disease given other diseases using a test dataset from our EMR. Area under receiver operator characteristics curve was used as a performance measure. Direct associations between diseases in the I-Model were also searched in the PubMed database and in classes of the Human Disease Network (HDN). Results: On the basis of genetic information alone, the I-Model linked a third of diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94{\%} specificity, 33{\%} sensitivity, and 80{\%} positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17{\%}) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of the direct associations in the I-Model, 7 (35{\%}) were absent from disease classes of HDN. Conclusion: Using only publicly available genetic sources we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive. 
Genetic data, as it currently exists, can only explain a fraction of the risk of a disease. Our approach makes a quantitative statement about disease variation that can be explained in an EMR on the basis of genetic associations described in the GWAS.",
keywords = "Bayesian network, Bioinformatics, Data mining, Electronic medical records (EMRs), Genome Wide Association Studies (GWAS), In silico, Integration, Single nucleotide polymorphisms (SNPs)",
author = "Vibha Anand and Marc Rosenman and Stephen Downs",
year = "2013",
month = "9",
doi = "10.1016/j.ijmedinf.2013.05.003",
language = "English",
volume = "82",
pages = "864--874",
journal = "International Journal of Medical Informatics",
issn = "1386-5056",
publisher = "Elsevier Ireland Ltd",
number = "9",
}

TY - JOUR

T1 - Translating genome wide association study results to associations among common diseases

T2 - In silico study with an electronic medical record

AU - Anand, Vibha

AU - Rosenman, Marc

AU - Downs, Stephen

PY - 2013/9

Y1 - 2013/9

N2 - Objective: To develop a map of disease associations exclusively using two publicly available genetic sources: the catalog of single nucleotide polymorphisms (SNPs) from the HapMap, and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR). Methods: A computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases using a Bayesian network (BN) framework, using only genetic data. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for power to discriminate a disease given other diseases using a test dataset from our EMR. Area under receiver operator characteristics curve was used as a performance measure. Direct associations between diseases in the I-Model were also searched in the PubMed database and in classes of the Human Disease Network (HDN). Results: On the basis of genetic information alone, the I-Model linked a third of diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94% specificity, 33% sensitivity, and 80% positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17%) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of the direct associations in the I-Model, 7 (35%) were absent from disease classes of HDN. Conclusion: Using only publicly available genetic sources we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive. 
Genetic data, as it currently exists, can only explain a fraction of the risk of a disease. Our approach makes a quantitative statement about disease variation that can be explained in an EMR on the basis of genetic associations described in the GWAS.

KW - Bayesian network

KW - Bioinformatics

KW - Data mining

KW - Electronic medical records (EMRs)

KW - Genome Wide Association Studies (GWAS)

KW - In silico

KW - Integration

KW - Single nucleotide polymorphisms (SNPs)

UR - http://www.scopus.com/inward/record.url?scp=84881370708&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881370708&partnerID=8YFLogxK

U2 - 10.1016/j.ijmedinf.2013.05.003

DO - 10.1016/j.ijmedinf.2013.05.003

M3 - Article

C2 - 23743324

AN - SCOPUS:84881370708

VL - 82

SP - 864

EP - 874

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

SN - 1386-5056

IS - 9

ER -