Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction

Chia Yen Chen, Jiali Han, David J. Hunter, Peter Kraft, Alkes L. Price

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color (HC), tanning ability (TA), and basal cell carcinoma (BCC) in European Americans (sample sizes ranging from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRSs) and best linear unbiased prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R² for HC increased by 66% (from 0.0456 to 0.0755; P < 10⁻¹⁶), the R² for TA increased by 123% (from 0.0154 to 0.0344; P < 10⁻¹⁶), and the liability-scale R² for BCC increased by 68% (from 0.0138 to 0.0232; P < 10⁻¹⁶) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being overweighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be underweighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
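
The comparison described in the abstract (a PRS built without ancestry correction versus a prediction that carries ancestry as a separate component) can be illustrated with a small simulation. The following is a minimal sketch and not the authors' pipeline: the two-population Balding-Nichols simulation, the use of a single top principal component as the ancestry covariate, the single train/test split (rather than 10-fold cross-validation), and all names and parameter values (`simulate`, `compare_prs`, `fst`, `ancestry_effect`, `h2`) are assumptions made for illustration. The helper `relative_increase` only reproduces the arithmetic behind the reported percentage gains.

```python
import numpy as np


def relative_increase(before, after):
    """Relative change, e.g. 0.0456 -> 0.0755 is about 0.66 (66%)."""
    return (after - before) / before


def r_squared(y, y_hat):
    """Squared Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1] ** 2)


def simulate(n_per_pop=300, m=200, fst=0.05, ancestry_effect=0.5, h2=0.3, seed=0):
    """Toy two-population sample; every parameter value here is an assumption."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)            # population label per sample
    p_anc = rng.uniform(0.1, 0.9, size=m)         # ancestral allele frequencies
    # Balding-Nichols model: population frequencies drift around the ancestral ones.
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    p_pop = rng.beta(a, b, size=(2, m))
    geno = rng.binomial(2, p_pop[pop])            # allele counts, samples x SNPs
    geno = geno[:, geno.std(axis=0) > 0]          # drop monomorphic SNPs
    x = (geno - geno.mean(axis=0)) / geno.std(axis=0)
    beta = rng.normal(0.0, np.sqrt(h2 / x.shape[1]), size=x.shape[1])
    noise = rng.normal(0.0, np.sqrt(1 - h2), size=len(pop))
    y = ancestry_effect * pop + x @ beta + noise  # ancestry shifts the phenotype
    return x, y


def compare_prs(x, y, train_frac=0.8, seed=1):
    """Test-set R^2 of an unadjusted PRS vs. an ancestry-plus-PRS prediction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]

    # Ancestry covariate: top principal component of the genotypes.
    # It is computed on all samples but never uses the phenotype.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    pc = u[:, 0] * s[0]

    # Center everything on training-set means.
    x_tr = x[tr] - x[tr].mean(axis=0)
    x_te = x[te] - x[tr].mean(axis=0)
    y_tr = y[tr] - y[tr].mean()
    pc_tr = pc[tr] - pc[tr].mean()
    pc_te = pc[te] - pc[tr].mean()

    # (1) Unadjusted PRS: marginal per-SNP regression, ancestry ignored.
    beta_raw = x_tr.T @ y_tr / (x_tr ** 2).sum(axis=0)
    prs_raw = x_te @ beta_raw

    # (2) Ancestry as a separate component: regress the PC out of the
    # phenotype and of every SNP, then estimate SNP effects on the residuals.
    gamma = (pc_tr @ y_tr) / (pc_tr @ pc_tr)      # ancestry effect on phenotype
    coef = (pc_tr @ x_tr) / (pc_tr @ pc_tr)       # PC loading of each SNP
    x_tr_res = x_tr - np.outer(pc_tr, coef)
    y_tr_res = y_tr - gamma * pc_tr
    beta_adj = x_tr_res.T @ y_tr_res / (x_tr_res ** 2).sum(axis=0)
    x_te_res = x_te - np.outer(pc_te, coef)
    pred_adj = gamma * pc_te + x_te_res @ beta_adj

    return {
        "r2_unadjusted": r_squared(y[te], prs_raw),
        "r2_adjusted": r_squared(y[te], pred_adj),
    }


if __name__ == "__main__":
    x, y = simulate()
    print(compare_prs(x, y))
    print(round(100 * relative_increase(0.0456, 0.0755)))  # reported HC gain, in percent
```

In the second predictor the ancestry term enters once, with its own coefficient, instead of leaking into every marginal SNP effect. Whether this improves accuracy in a given run depends on the simulated parameters; the sketch makes no claim about matching the magnitudes reported in the abstract.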

Original language: English (US)
Pages (from-to): 427-438
Number of pages: 12
Journal: Genetic Epidemiology
Volume: 39
Issue number: 6
DOI: 10.1002/gepi.21906
State: Published - Sep 1 2015

Fingerprint

Hair Color
Tanning
Single Nucleotide Polymorphism
Basal Cell Carcinoma
Sample Size
Genome-Wide Association Study
Genome
Population

Keywords

  • Basal cell carcinoma
  • Genome-wide association study
  • Pigmentation
  • Polygenic prediction
  • Principal component analysis

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction. / Chen, Chia Yen; Han, Jiali; Hunter, David J.; Kraft, Peter; Price, Alkes L.

In: Genetic Epidemiology, Vol. 39, No. 6, 01.09.2015, p. 427-438.

@article{ab33d2b6f5ce4de891c033dc9fcee2ca,
title = "Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction",
keywords = "Basal cell carcinoma, Genome-wide association study, Pigmentation, Polygenic prediction, Principal component analysis",
author = "Chen, {Chia Yen} and Jiali Han and Hunter, {David J.} and Peter Kraft and Price, {Alkes L.}",
year = "2015",
month = "9",
day = "1",
doi = "10.1002/gepi.21906",
language = "English (US)",
volume = "39",
pages = "427--438",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "6",

}

TY - JOUR
T1 - Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction
AU - Chen, Chia Yen
AU - Han, Jiali
AU - Hunter, David J.
AU - Kraft, Peter
AU - Price, Alkes L.
PY - 2015/9/1
Y1 - 2015/9/1
KW - Basal cell carcinoma
KW - Genome-wide association study
KW - Pigmentation
KW - Polygenic prediction
KW - Principal component analysis
UR - http://www.scopus.com/inward/record.url?scp=84939465836&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939465836&partnerID=8YFLogxK
U2 - 10.1002/gepi.21906
DO - 10.1002/gepi.21906
M3 - Article
VL - 39
SP - 427
EP - 438
JO - Genetic Epidemiology
JF - Genetic Epidemiology
SN - 0741-0395
IS - 6
ER -