Expression quantitative locus mapping for identification of hotspots using an empirical Bayes mixture model

Guanglong Jiang, Yingqiang Fu, Pengyue Zhang, Shirin Ardeshir-Rouhani-Fard, Lijun Cheng, Lang Li, Zhigao Li

Research output: Contribution to journal › Article

Abstract

Identification of genomic regions that regulate gene expression can improve our understanding of the mechanisms underlying genetic contributions to phenotypic variation. Hence, we consider a mixture model to locate candidate genomic regions that are more frequently associated with gene expression traits. A modified two-sample t-statistic was used, and single-nucleotide polymorphisms (SNPs) with P-values < 10⁻⁵ were considered for a subsequent two-component negative binomial mixture model. An expectation-maximisation algorithm was adopted to estimate the parameters of the model. The SNPs were then ranked by their false discovery rate (FDR) values. Any SNP with an FDR value < 1% was considered a potential hotspot. Three independent datasets were used to replicate the findings. A number of common hotspots were identified, and many hotspots have annotated functions as binding sites of transcription factors or histone proteins.
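The abstract names the pipeline (a two-component negative binomial mixture, fitted by expectation-maximisation, followed by FDR ranking) but not its details. The sketch below is one possible reading, not the authors' implementation. It assumes each SNP is summarised by a count of expression traits it is associated with at the P-value cut-off, that the negative binomial is parameterised by a mean and a size, that the size is updated by a coarse grid search (which makes this a generalised EM), and that the FDR of a ranked list is the running mean of the posterior null probabilities. All function names, the grid, and the toy counts are hypothetical.

```python
import math

def nb_logpmf(x, mu, r):
    """Log-pmf of a negative binomial with mean mu and size r (variance mu + mu^2/r)."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

# Candidate size values for the grid-search M-step (an arbitrary, assumed grid).
R_GRID = [2.0 ** k for k in range(-4, 9)]

def posterior_high(counts, pi1, mus, rs):
    """E-step: posterior probability that each count belongs to component 1."""
    post = []
    for x in counts:
        l0 = math.log(1.0 - pi1) + nb_logpmf(x, mus[0], rs[0])
        l1 = math.log(pi1) + nb_logpmf(x, mus[1], rs[1])
        m = max(l0, l1)  # subtract the max before exponentiating for stability
        e0, e1 = math.exp(l0 - m), math.exp(l1 - m)
        post.append(e1 / (e0 + e1))
    return post

def fit_nb_mixture(counts, n_iter=100):
    """Generalised EM for a two-component negative binomial mixture."""
    srt = sorted(counts)
    half = len(srt) // 2
    # Crude start: means of the lower and upper halves of the sorted counts.
    mus = [max(sum(srt[:half]) / half, 0.1),
           max(sum(srt[half:]) / (len(srt) - half), 0.2)]
    rs = [1.0, 1.0]
    pi1 = 0.5
    for _ in range(n_iter):
        post = posterior_high(counts, pi1, mus, rs)
        pi1 = min(max(sum(post) / len(counts), 1e-6), 1.0 - 1e-6)
        for k in (0, 1):
            w = [p if k == 1 else 1.0 - p for p in post]
            # For a fixed size, the weighted mean maximises the weighted likelihood.
            mus[k] = max(sum(wi * x for wi, x in zip(w, counts)) / max(sum(w), 1e-12), 1e-6)
            # Size: best value on the grid; keeping the current value as a
            # candidate means this step cannot lower the weighted likelihood.
            rs[k] = max(R_GRID + [rs[k]],
                        key=lambda r: sum(wi * nb_logpmf(x, mus[k], r)
                                          for wi, x in zip(w, counts)))
    if mus[0] > mus[1]:  # make component 1 the high-count ("hotspot") one
        mus.reverse()
        rs.reverse()
        pi1 = 1.0 - pi1
    return pi1, mus, rs, posterior_high(counts, pi1, mus, rs)

def bayesian_fdr(post):
    """Estimated FDR of the list formed by each item and all items ranked above it."""
    order = sorted(range(len(post)), key=lambda i: post[i], reverse=True)
    fdr = [0.0] * len(post)
    running = 0.0
    for rank, i in enumerate(order, start=1):
        running += 1.0 - post[i]  # posterior probability of the background component
        fdr[i] = running / rank
    return fdr

# Toy data (made up): 40 background SNPs with few associated traits, 4 with many.
counts = [1, 2, 1, 3, 2, 1, 1, 2, 4, 1] * 4 + [40, 55, 62, 48]
pi1, mus, rs, post = fit_nb_mixture(counts)
fdr = bayesian_fdr(post)
flagged = [i for i, q in enumerate(fdr) if q < 0.01]  # FDR < 1%, as in the abstract
print("flagged SNP indices:", flagged)
```

Real data would need a more careful initialisation and a convergence check on the log-likelihood rather than a fixed iteration count; the fixed count is used here only to keep the sketch short.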

Original language: English (US)
Pages (from-to): 108-122
Number of pages: 15
Journal: International Journal of Computational Biology and Drug Design
Volume: 10
Issue number: 2
DOI: 10.1504/IJCBDD.2017.083882
State: Published - 2017

Fingerprint

Polymorphism
Identification (control systems)
Gene expression
Statistical Models
Histones
Transcription factors
Binding sites
Statistics
Proteins
Datasets

Keywords

  • Empirical Bayes
  • Expression quantitative trait loci
  • Gene expression
  • Genome-wide association studies
  • Genotype
  • Mixture model
  • Transcription factor

ASJC Scopus subject areas

  • Drug Discovery
  • Computer Science Applications

Cite this

Expression quantitative locus mapping for identification of hotspots using an empirical Bayes mixture model. / Jiang, Guanglong; Fu, Yingqiang; Zhang, Pengyue; Ardeshir-Rouhani-Fard, Shirin; Cheng, Lijun; Li, Lang; Li, Zhigao.

In: International Journal of Computational Biology and Drug Design, Vol. 10, No. 2, 2017, p. 108-122.

Jiang, Guanglong ; Fu, Yingqiang ; Zhang, Pengyue ; Ardeshir-Rouhani-Fard, Shirin ; Cheng, Lijun ; Li, Lang ; Li, Zhigao. / Expression quantitative locus mapping for identification of hotspots using an empirical Bayes mixture model. In: International Journal of Computational Biology and Drug Design. 2017 ; Vol. 10, No. 2. pp. 108-122.
@article{a4aadbc275304233b70c0a73b1be93ad,
title = "Expression quantitative locus mapping for identification of hotspots using an empirical Bayes mixture model",
abstract = "Identification of genomic regions that regulate gene expression can improve our understanding of the mechanisms underlying genetic contributions to phenotypic variation. Hence, we consider a mixture model to locate candidate genomic regions that are more frequently associated with gene expression traits. A modified two-sample t-statistic was used, and single-nucleotide polymorphisms (SNPs) with P-values $<10^{-5}$ were considered for a subsequent two-component negative binomial mixture model. An expectation-maximisation algorithm was adopted to estimate the parameters of the model. The SNPs were then ranked by their false discovery rate (FDR) values. Any SNP with an FDR value <1{\%} was considered a potential hotspot. Three independent datasets were used to replicate the findings. A number of common hotspots were identified, and many hotspots have annotated functions as binding sites of transcription factors or histone proteins.",
keywords = "Empirical Bayes, Expression quantitative trait loci, Gene expression, Genome-wide association studies, Genotype, Mixture model, Transcription factor",
author = "Guanglong Jiang and Yingqiang Fu and Pengyue Zhang and Shirin Ardeshir-Rouhani-Fard and Lijun Cheng and Lang Li and Zhigao Li",
year = "2017",
doi = "10.1504/IJCBDD.2017.083882",
language = "English (US)",
volume = "10",
pages = "108--122",
journal = "International Journal of Computational Biology and Drug Design",
issn = "1756-0756",
publisher = "Inderscience Enterprises Ltd",
number = "2",

}

TY - JOUR

T1 - Expression quantitative locus mapping for identification of hotspots using an empirical Bayes mixture model

AU - Jiang, Guanglong

AU - Fu, Yingqiang

AU - Zhang, Pengyue

AU - Ardeshir-Rouhani-Fard, Shirin

AU - Cheng, Lijun

AU - Li, Lang

AU - Li, Zhigao

PY - 2017

Y1 - 2017

N2 - Identification of genomic regions that regulate gene expression can improve our understanding of the mechanisms underlying genetic contributions to phenotypic variation. Hence, we consider a mixture model to locate candidate genomic regions that are more frequently associated with gene expression traits. A modified two-sample t-statistic was used, and single-nucleotide polymorphisms (SNPs) with P-values < 10^-5 were considered for a subsequent two-component negative binomial mixture model. An expectation-maximisation algorithm was adopted to estimate the parameters of the model. The SNPs were then ranked by their false discovery rate (FDR) values. Any SNP with an FDR value < 1% was considered a potential hotspot. Three independent datasets were used to replicate the findings. A number of common hotspots were identified, and many hotspots have annotated functions as binding sites of transcription factors or histone proteins.

KW - Empirical Bayes

KW - Expression quantitative trait loci

KW - Gene expression

KW - Genome-wide association studies

KW - Genotype

KW - Mixture model

KW - Transcription factor

UR - http://www.scopus.com/inward/record.url?scp=85018261560&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85018261560&partnerID=8YFLogxK

U2 - 10.1504/IJCBDD.2017.083882

DO - 10.1504/IJCBDD.2017.083882

M3 - Article

VL - 10

SP - 108

EP - 122

JO - International Journal of Computational Biology and Drug Design

JF - International Journal of Computational Biology and Drug Design

SN - 1756-0756

IS - 2

ER -