Identifying significant gene-environment interactions using a combination of screening testing and hierarchical false discovery rate control

H. Robert Frost, Li Shen, Andrew Saykin, Scott M. Williams, Jason H. Moore

Research output: Contribution to journal (article)

4 Citations (Scopus)

Abstract

Although gene-environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G×E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G×E interactions. In our original work on this technique, however, we did not assess type I error control or power, and we evaluated the method using only a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G×E interactions. Second, to support the analysis of truly genome-wide data sets, we incorporate a score-statistic-based prescreening step to reduce the number of single nucleotide polymorphisms (SNPs) prior to fitting the first-stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with those of competing techniques using both simple simulation designs and designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP-education interactions relative to Alzheimer's disease status using genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
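The abstract names hierarchical false discovery rate control as the step that assesses individual G×E interactions but does not spell out the procedure. As a minimal sketch, assuming a two-level hierarchy with the omnibus G×E test at the root and per-SNP interaction tests beneath it, the Python below applies the Benjamini-Hochberg (BH) step-up rule top-down: the per-SNP family is tested only if the omnibus test is rejected. The function names, the dictionary input, and the SNP labels are hypothetical, and this generic scheme is not necessarily the exact procedure used in the paper.

```python
from typing import Dict, List, Sequence, Set


def benjamini_hochberg(pvals: Sequence[float], q: float) -> List[int]:
    """Return the (sorted) indices rejected by the BH step-up procedure at level q."""
    m = len(pvals)
    if m == 0:
        return []
    # Rank hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    return sorted(order[:k_max])


def hierarchical_fdr(
    omnibus_p: float, interaction_p: Dict[str, float], q: float = 0.05
) -> Set[str]:
    """Hypothetical two-level hierarchical BH: omnibus test first, then per-SNP tests.

    omnibus_p: p-value of the single omnibus test for any G x E interaction.
    interaction_p: p-value of each individual G x E term, keyed by a SNP label.
    """
    # Level 1: BH applied to a single hypothesis reduces to p <= q.
    if not benjamini_hochberg([omnibus_p], q):
        return set()
    # Level 2: BH at level q over the individual interaction p-values.
    names = list(interaction_p)
    rejected = benjamini_hochberg([interaction_p[n] for n in names], q)
    return {names[i] for i in rejected}


if __name__ == "__main__":
    # Made-up p-values with hypothetical SNP labels, for illustration only.
    pvals = {"snp1": 0.001, "snp2": 0.02, "snp3": 0.04, "snp4": 0.5}
    print(sorted(hierarchical_fdr(0.01, pvals, q=0.05)))  # ['snp1', 'snp2']
    print(sorted(hierarchical_fdr(0.2, pvals, q=0.05)))   # []
```

Gating the child family on rejection of the parent means no individual interaction can be declared significant unless the omnibus test is significant first. In top-down schemes of this kind, q acts as a per-family level; the resulting bound on the overall FDR depends on the tree structure and on the dependence among the p-values.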

Original language: English (US)
Journal: Genetic Epidemiology
DOI: 10.1002/gepi.21997
State: Accepted/In press, 2016

Keywords

  • Gene-environment interactions
  • Hierarchical FDR
  • Penalized regression
  • Screening testing

ASJC Scopus subject areas

  • Epidemiology
  • Medicine(all)
  • Genetics(clinical)

Cite this

@article{bc11e0df9b2a45668ceda229cb22109c,
title = "Identifying significant gene-environment interactions using a combination of screening testing and hierarchical false discovery rate control",
abstract = "Although gene-environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G×E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G×E interactions. In our original work on this technique, however, we did not assess type I error control or power, and we evaluated the method using only a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G×E interactions. Second, to support the analysis of truly genome-wide data sets, we incorporate a score-statistic-based prescreening step to reduce the number of single nucleotide polymorphisms (SNPs) prior to fitting the first-stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with those of competing techniques using both simple simulation designs and designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP-education interactions relative to Alzheimer's disease status using genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).",
keywords = "Gene-environment interactions, Hierarchical FDR, Penalized regression, Screening testing",
author = "Frost, {H. Robert} and Li Shen and Andrew Saykin and Williams, {Scott M.} and Moore, {Jason H.}",
year = "2016",
doi = "10.1002/gepi.21997",
language = "English (US)",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",

}

TY  - JOUR
T1  - Identifying significant gene-environment interactions using a combination of screening testing and hierarchical false discovery rate control
AU  - Frost, H. Robert
AU  - Shen, Li
AU  - Saykin, Andrew
AU  - Williams, Scott M.
AU  - Moore, Jason H.
PY  - 2016
Y1  - 2016
AB  - Although gene-environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome-wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening-testing approach for G×E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G×E interactions. In our original work on this technique, however, we did not assess type I error control or power, and we evaluated the method using only a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G×E interactions. Second, to support the analysis of truly genome-wide data sets, we incorporate a score-statistic-based prescreening step to reduce the number of single nucleotide polymorphisms (SNPs) prior to fitting the first-stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with those of competing techniques using both simple simulation designs and designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP-education interactions relative to Alzheimer's disease status using genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
KW  - Gene-environment interactions
KW  - Hierarchical FDR
KW  - Penalized regression
KW  - Screening testing
UR  - http://www.scopus.com/inward/record.url?scp=84984700615&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84984700615&partnerID=8YFLogxK
U2  - 10.1002/gepi.21997
DO  - 10.1002/gepi.21997
M3  - Article
C2  - 27578615
AN  - SCOPUS:84984700615
JO  - Genetic Epidemiology
JF  - Genetic Epidemiology
SN  - 0741-0395
ER  -