Quality control and quality assurance in genotypic data for genome-wide association studies

Cathy C. Laurie, Kimberly F. Doheny, Daniel B. Mirel, Elizabeth W. Pugh, Laura J. Bierut, Tushar Bhangale, Frederick Boehm, Neil E. Caporaso, Marilyn C. Cornelis, Howard Edenberg, Stacy B. Gabriel, Emily L. Harris, Frank B. Hu, Kevin B. Jacobs, Peter Kraft, Maria Teresa Landi, Thomas Lumley, Teri A. Manolio, Caitlin McHugh, Ian Painter, Justin Paschall, John P. Rice, Kenneth M. Rice, Xiuwen Zheng, Bruce S. Weir

Research output: Contribution to journal › Article

222 Citations (Scopus)

Abstract

Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium test P-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the "Gene Environment Association Studies" (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS.
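Item (5) above looks for genotyping artifacts in how Hardy-Weinberg equilibrium (HWE) test P-values depend on allele frequency. The sketch below illustrates that kind of check under stated assumptions: biallelic SNPs summarized as genotype counts (AA, AB, BB), the simple 1-df chi-square goodness-of-fit test (swapped in for illustration; it is not necessarily the test used in the paper), and a hypothetical threshold `alpha = 1e-4` and hypothetical minor allele frequency (MAF) bins. The function names are illustrative and are not part of any GENEVA software.

```python
import bisect
import math


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg proportions.

    Returns 1.0 for empty or monomorphic SNPs, where the test is undefined.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_failures_by_maf(snps, alpha=1e-4, edges=(0.0, 0.01, 0.05, 0.1, 0.2, 0.5)):
    """Tally (n_failing, n_total) SNPs per MAF bin [edges[i], edges[i+1]).

    A failure rate that climbs sharply in particular MAF bins is the kind of
    pattern that can point to a genotyping artifact rather than biology.
    """
    tally = {}
    for counts in snps:
        if sum(counts) == 0:
            continue  # skip SNPs with no called genotypes
        maf = minor_allele_frequency(*counts)
        # Locate the bin; MAF == 0.5 is placed in the last bin.
        i = min(bisect.bisect_right(edges, maf) - 1, len(edges) - 2)
        key = (edges[i], edges[i + 1])
        n_fail, n_total = tally.get(key, (0, 0))
        failed = hwe_chisq_pvalue(*counts) < alpha
        tally[key] = (n_fail + int(failed), n_total + 1)
    return tally


if __name__ == "__main__":
    # Hypothetical counts: one SNP in HWE, one with no heterozygotes, one common SNP.
    snps = [(25, 50, 25), (50, 0, 50), (70, 28, 2)]
    print(hwe_failures_by_maf(snps))
    # {(0.2, 0.5): (1, 2), (0.1, 0.2): (0, 1)}
```

The chi-square approximation is unreliable when expected genotype counts are small (low MAF), which is one reason exact HWE tests are often preferred in practice; swapping in an exact test would leave the binning logic unchanged.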

Original language: English
Pages (from-to): 591-602
Number of pages: 12
Journal: Genetic Epidemiology
Volume: 34
Issue number: 6
DOI: https://doi.org/10.1002/gepi.20516
State: Published - Sep 2010

Fingerprint

Genome-Wide Association Study
Quality Control
Single Nucleotide Polymorphism
Sex Chromosome Aberrations
Genotype
Chromosomes, Human, Pair 2
Principal Component Analysis
Chromosome Aberrations
Sample Size
Artifacts
Nucleotides
Genome
DNA
Genes

Keywords

  • Chromosome aberration
  • DNA sample quality
  • Genotyping artifact
  • GWAS
  • Hardy-Weinberg equilibrium

ASJC Scopus subject areas

  • Genetics (clinical)
  • Epidemiology

Cite this

Laurie, C. C., Doheny, K. F., Mirel, D. B., Pugh, E. W., Bierut, L. J., Bhangale, T., ... Weir, B. S. (2010). Quality control and quality assurance in genotypic data for genome-wide association studies. Genetic Epidemiology, 34(6), 591-602. https://doi.org/10.1002/gepi.20516

@article{7cde1743ee114e9493188ed8fa1fa5e6,
title = "Quality control and quality assurance in genotypic data for genome-wide association studies",
abstract = "Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium test P-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the {"}Gene Environment Association Studies{"} (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS.",
keywords = "Chromosome aberration, DNA sample quality, Genotyping artifact, GWAS, Hardy-Weinberg equilibrium",
author = "Laurie, {Cathy C.} and Doheny, {Kimberly F.} and Mirel, {Daniel B.} and Pugh, {Elizabeth W.} and Bierut, {Laura J.} and Tushar Bhangale and Frederick Boehm and Caporaso, {Neil E.} and Cornelis, {Marilyn C.} and Howard Edenberg and Gabriel, {Stacy B.} and Harris, {Emily L.} and Hu, {Frank B.} and Jacobs, {Kevin B.} and Peter Kraft and Landi, {Maria Teresa} and Thomas Lumley and Manolio, {Teri A.} and Caitlin McHugh and Ian Painter and Justin Paschall and Rice, {John P.} and Rice, {Kenneth M.} and Xiuwen Zheng and Weir, {Bruce S.}",
year = "2010",
month = "9",
doi = "10.1002/gepi.20516",
language = "English",
volume = "34",
pages = "591--602",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "6",

}
