A new statistic to evaluate imputation reliability

Peng Lin, Sarah M. Hartz, Zhehao Zhang, Scott F. Saccone, Jia Wang, Jay A. Tischfield, Howard Edenberg, John R. Kramer, Alison M. Goate, Laura J. Bierut, John P. Rice

Research output: Contribution to journal › Article

43 Citations (Scopus)

Abstract

Background: As the amount of data from genome-wide association studies grows dramatically, many scientific questions require imputation to combine or expand datasets. However, imputation has been problematic in two situations: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets in which subjects are genotyped on different platforms. Traditional measures of imputation quality cannot effectively address these problems.

Methodology/Principal Findings: We introduce a new statistic, the imputation quality score (IQS). To differentiate between well-imputed and poorly-imputed single nucleotide polymorphisms (SNPs), IQS adjusts the concordance between imputed and genotyped SNPs for chance. We first evaluated IQS in relation to minor allele frequency. Using a sample of subjects genotyped on the Illumina 1M array, we extracted the SNPs that were also on the Illumina 550K array and used them to impute the full set of 1M SNPs. As expected, the average IQS value drops dramatically as minor allele frequency decreases, indicating that IQS appropriately adjusts for minor allele frequency. We then evaluated whether IQS can filter out poorly-imputed SNPs when cases and controls are genotyped on different platforms. After randomly dividing the data into "cases" and "controls", we extracted the Illumina 550K SNPs from the cases and imputed the remaining Illumina 1M SNPs. The initial Q-Q plot for the test of association between cases and controls was grossly distorted (genomic inflation factor λ = 1.15) and showed 4016 false positives, reflecting imputation error. After SNPs with IQS < 0.9 were filtered out, the Q-Q plot was acceptable and no false positives remained. Finally, we evaluated the robustness of IQS by computing it independently on the two halves of the data. In both European Americans and African Americans, the correlation between the two sets of IQS values was >0.99, demonstrating that a database of IQS values from common imputations could serve as an effective filter when combining data genotyped on different platforms.

Conclusions/Significance: IQS effectively differentiates well-imputed from poorly-imputed SNPs. It is particularly useful for SNPs with low minor allele frequency and when datasets are genotyped on different platforms.
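The abstract says only that IQS "adjusts the concordance between imputed and genotyped SNPs for chance". One way to read that is as a Cohen's-kappa-style statistic, IQS = (P_o − P_c) / (1 − P_c), where P_o is the observed concordance and P_c is the concordance expected by chance from the margins of a true-by-imputed genotype table. The Python sketch below assumes that form, 0/1/2 genotype coding, and imputed genotypes supplied as posterior-probability triples. Treat it as an illustration under those assumptions rather than the authors' implementation. The names `iqs`, `passes_filter` and `genomic_inflation` are hypothetical. The 0.9 cut-off is the one quoted in the abstract. The λ helper uses the standard genomic-control definition (median observed 1-df χ² divided by its null median, about 0.4549), which the abstract does not spell out.

```python
"""Minimal sketch of a chance-corrected imputation quality score (IQS).

Assumptions, not taken from the abstract: genotypes are coded 0/1/2 copies of
the minor allele, and each imputed genotype is a triple of posterior
probabilities (P(0), P(1), P(2)). All function names are hypothetical.
"""
from statistics import median
from typing import Sequence, Tuple

Posterior = Tuple[float, float, float]

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def iqs(true_genotypes: Sequence[int], posteriors: Sequence[Posterior]) -> float:
    """Kappa-style agreement between true and imputed genotypes."""
    if not true_genotypes or len(true_genotypes) != len(posteriors):
        raise ValueError("need one posterior triple per genotyped subject")
    # 3x3 table: rows = true genotype, columns = imputed genotype. Each subject
    # spreads its posterior probability mass across the columns of its true row.
    table = [[0.0] * 3 for _ in range(3)]
    for g, probs in zip(true_genotypes, posteriors):
        if g not in (0, 1, 2):
            raise ValueError(f"genotype must be 0, 1 or 2, got {g!r}")
        for j in range(3):
            table[g][j] += probs[j]
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(3)) / n
    row_totals = [sum(table[i]) for i in range(3)]
    col_totals = [sum(table[i][j] for i in range(3)) for j in range(3)]
    # Concordance expected by chance if true and imputed genotypes were independent.
    chance = sum(row_totals[k] * col_totals[k] for k in range(3)) / n**2
    if chance >= 1.0:
        return float("nan")  # a single genotype class on both sides: undefined
    return (observed - chance) / (1.0 - chance)


def passes_filter(score: float, threshold: float = 0.9) -> bool:
    """Keep a SNP only if IQS >= threshold (NaN scores are dropped)."""
    return score >= threshold


def genomic_inflation(chi2_stats: Sequence[float]) -> float:
    """Genomic-control lambda: median observed 1-df chi-square over its null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Rare SNP: 98 major homozygotes and 2 heterozygotes, while the imputer
    # calls every subject a major homozygote (genotype 0) with certainty.
    truth = [0] * 98 + [1] * 2
    calls = [(1.0, 0.0, 0.0)] * 100
    # Raw concordance: fraction of subjects whose true genotype matches the call.
    concordance = sum(g == 0 for g in truth) / len(truth)
    print(concordance, iqs(truth, calls))  # 0.98 0.0
```

On this toy low-MAF SNP, raw concordance is 0.98 even though the calls never identify a heterozygote, while the kappa-style score is 0: all of that concordance is what chance alone would produce. This illustrates why a chance-corrected score falls for low-MAF SNPs while raw concordance does not.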

Original language: English
Article number: e9697
Journal: PLoS One
Volume: 5
Issue number: 3
DOIs: https://doi.org/10.1371/journal.pone.0009697
State: Published - 2010

Fingerprint

Polymorphism
Single Nucleotide Polymorphism
Statistics
Nucleotides
Gene Frequency
Genome-Wide Association Study
genetic polymorphism
Databases
Genes

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Lin, P., Hartz, S. M., Zhang, Z., Saccone, S. F., Wang, J., Tischfield, J. A., ... Rice, J. P. (2010). A new statistic to evaluate imputation reliability. PLoS One, 5(3), [e9697]. https://doi.org/10.1371/journal.pone.0009697

A new statistic to evaluate imputation reliability. / Lin, Peng; Hartz, Sarah M.; Zhang, Zhehao; Saccone, Scott F.; Wang, Jia; Tischfield, Jay A.; Edenberg, Howard; Kramer, John R.; Goate, Alison M.; Bierut, Laura J.; Rice, John P.

In: PLoS One, Vol. 5, No. 3, e9697, 2010.

Lin, P, Hartz, SM, Zhang, Z, Saccone, SF, Wang, J, Tischfield, JA, Edenberg, H, Kramer, JR, Goate, AM, Bierut, LJ & Rice, JP 2010, 'A new statistic to evaluate imputation reliability', PLoS One, vol. 5, no. 3, e9697. https://doi.org/10.1371/journal.pone.0009697
Lin P, Hartz SM, Zhang Z, Saccone SF, Wang J, Tischfield JA et al. A new statistic to evaluate imputation reliability. PLoS One. 2010;5(3). e9697. https://doi.org/10.1371/journal.pone.0009697
Lin, Peng ; Hartz, Sarah M. ; Zhang, Zhehao ; Saccone, Scott F. ; Wang, Jia ; Tischfield, Jay A. ; Edenberg, Howard ; Kramer, John R. ; Goate, Alison M. ; Bierut, Laura J. ; Rice, John P. / A new statistic to evaluate imputation reliability. In: PLoS One. 2010 ; Vol. 5, No. 3.
@article{f314344099af46c3a1ba68e4f6c52dcb,
title = "A new statistic to evaluate imputation reliability",
author = "Peng Lin and Hartz, {Sarah M.} and Zhehao Zhang and Saccone, {Scott F.} and Jia Wang and Tischfield, {Jay A.} and Howard Edenberg and Kramer, {John R.} and Goate, {Alison M.} and Bierut, {Laura J.} and Rice, {John P.}",
year = "2010",
doi = "10.1371/journal.pone.0009697",
language = "English",
volume = "5",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "3",

}

TY - JOUR

T1 - A new statistic to evaluate imputation reliability

AU - Lin, Peng

AU - Hartz, Sarah M.

AU - Zhang, Zhehao

AU - Saccone, Scott F.

AU - Wang, Jia

AU - Tischfield, Jay A.

AU - Edenberg, Howard

AU - Kramer, John R.

AU - Goate, Alison M.

AU - Bierut, Laura J.

AU - Rice, John P.

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=78149417737&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78149417737&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0009697

DO - 10.1371/journal.pone.0009697

M3 - Article

C2 - 20300623

AN - SCOPUS:78149417737

VL - 5

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 3

M1 - e9697

ER -