Principal component analysis for predicting transcription-factor binding motifs from array-derived data

Yunlong Liu, Matthew P. Vincenti, Hiroki Yokota

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Background: The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially with transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD to a promoter matrix defined from regulatory DNA sequences, we derived a novel method to predict the critical set of TFBMs. Results: The promoter matrix was defined to establish a quantitative relationship between the IL-1-driven mRNA alteration and genomic DNA sequences of the IL-1-responsive genes. The matrix was decomposed with SVD, and the effects of 8 potential TFBMs (5′-CAGGC-3′, 5′-CGCCC-3′, 5′-CCGCC-3′, 5′-ATGGG-3′, 5′-GGGAA-3′, 5′-CGTCC-3′, 5′-AAAGG-3′, and 5′-ACCCA-3′) were predicted from a pool of 512 random DNA sequences. The prediction included matches to the core binding motifs of biologically known TFBMs such as AP2, SP1, EGR1, KROX, GC-BOX, ABI4, ETF, E217, SRF, STAT, IK-1, PPARG, STAF, ROAZ, and NFKB, and their significance was evaluated numerically using Monte Carlo simulation and a genetic algorithm. Conclusions: The described SVD-based prediction is an analytical method to provide a set of potential TFBMs involved in transcriptional regulation. The results would be useful to evaluate analytically the contribution of individual DNA sequences.
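
The abstract does not give the exact definition of the promoter matrix, so the following is only a minimal sketch of the general idea, under stated assumptions: each matrix entry is taken to be the number of times a candidate motif occurs in a gene's promoter sequence, and expression changes are modelled as a linear combination of motif effects, z = Hx. The motif effects x are then estimated through the SVD-based pseudoinverse. The function names, the toy promoter sequences, and the effect values are hypothetical; only the three motif strings come from the abstract.

```python
import numpy as np


def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif in seq."""
    n = len(motif)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == motif)


def promoter_matrix(promoters, motifs):
    """Assumed definition: rows are genes, columns are motifs, entries are counts."""
    return np.array([[count_motif(p, m) for m in motifs] for p in promoters],
                    dtype=float)


def svd_motif_effects(H, z):
    """Least-squares estimate of x in z = H x via the SVD pseudoinverse."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # Drop singular values that are numerically zero before inverting.
    tol = max(H.shape) * np.finfo(float).eps * s.max()
    k = int((s > tol).sum())
    return Vt[:k].T @ ((U[:, :k].T @ z) / s[:k])


# Three of the motifs listed in the abstract; the promoters are toy data.
motifs = ["CAGGC", "GGGAA", "AAAGG"]
promoters = [
    "CAGGCTTTT",
    "TTGGGAATT",
    "TTAAAGGTT",
    "CAGGCTGGGAA",
]

H = promoter_matrix(promoters, motifs)

# Hypothetical motif effects, used only to generate synthetic expression data.
x_true = np.array([2.0, -1.0, 0.5])
z = H @ x_true

x_est = svd_motif_effects(H, z)
print(np.round(x_est, 3))
```

Because the toy matrix has full column rank and the synthetic data are noise-free, the estimate reproduces the hypothetical effects; with real array data the result would be a least-squares fit rather than an exact recovery.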

Original language: English (US)
Article number: 276
Journal: BMC Bioinformatics
Volume: 6
DOI: 10.1186/1471-2105-6-276
State: Published - Nov 18 2005

Fingerprint

Transcription factors
Transcription Factor
Principal Component Analysis
Principal component analysis
Transcription Factors
DNA sequences
Singular value decomposition
DNA Sequence
Interleukin-1
Critical Set
Promoter
Genomics
Predict
Interleukin
Transcriptional Regulation
Chondrocytes
Oligonucleotide Array Sequence Analysis
Evaluate
Prediction
Random Sequence

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Principal component analysis for predicting transcription-factor binding motifs from array-derived data. / Liu, Yunlong; Vincenti, Matthew P.; Yokota, Hiroki.

In: BMC Bioinformatics, Vol. 6, 276, 18.11.2005.


@article{a68b565e86804dc7ad7ed061c617065a,
title = "Principal component analysis for predicting transcription-factor binding motifs from array-derived data",
abstract = "Background: The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially with transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD to a promoter matrix defined from regulatory DNA sequences, we derived a novel method to predict the critical set of TFBMs. Results: The promoter matrix was defined to establish a quantitative relationship between the IL-1-driven mRNA alteration and genomic DNA sequences of the IL-1-responsive genes. The matrix was decomposed with SVD, and the effects of 8 potential TFBMs (5′-CAGGC-3′, 5′-CGCCC-3′, 5′-CCGCC-3′, 5′-ATGGG-3′, 5′-GGGAA-3′, 5′-CGTCC-3′, 5′-AAAGG-3′, and 5′-ACCCA-3′) were predicted from a pool of 512 random DNA sequences. The prediction included matches to the core binding motifs of biologically known TFBMs such as AP2, SP1, EGR1, KROX, GC-BOX, ABI4, ETF, E217, SRF, STAT, IK-1, PPARG, STAF, ROAZ, and NFKB, and their significance was evaluated numerically using Monte Carlo simulation and a genetic algorithm. Conclusions: The described SVD-based prediction is an analytical method to provide a set of potential TFBMs involved in transcriptional regulation. The results would be useful to evaluate analytically the contribution of individual DNA sequences.",
author = "Yunlong Liu and Vincenti, {Matthew P.} and Hiroki Yokota",
year = "2005",
month = "11",
day = "18",
doi = "10.1186/1471-2105-6-276",
language = "English (US)",
volume = "6",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central"
}

TY - JOUR

T1 - Principal component analysis for predicting transcription-factor binding motifs from array-derived data

AU - Liu, Yunlong

AU - Vincenti, Matthew P.

AU - Yokota, Hiroki

PY - 2005/11/18

Y1 - 2005/11/18

AB - Background: The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially with transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD to a promoter matrix defined from regulatory DNA sequences, we derived a novel method to predict the critical set of TFBMs. Results: The promoter matrix was defined to establish a quantitative relationship between the IL-1-driven mRNA alteration and genomic DNA sequences of the IL-1-responsive genes. The matrix was decomposed with SVD, and the effects of 8 potential TFBMs (5′-CAGGC-3′, 5′-CGCCC-3′, 5′-CCGCC-3′, 5′-ATGGG-3′, 5′-GGGAA-3′, 5′-CGTCC-3′, 5′-AAAGG-3′, and 5′-ACCCA-3′) were predicted from a pool of 512 random DNA sequences. The prediction included matches to the core binding motifs of biologically known TFBMs such as AP2, SP1, EGR1, KROX, GC-BOX, ABI4, ETF, E217, SRF, STAT, IK-1, PPARG, STAF, ROAZ, and NFKB, and their significance was evaluated numerically using Monte Carlo simulation and a genetic algorithm. Conclusions: The described SVD-based prediction is an analytical method to provide a set of potential TFBMs involved in transcriptional regulation. The results would be useful to evaluate analytically the contribution of individual DNA sequences.

UR - http://www.scopus.com/inward/record.url?scp=28344440835&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=28344440835&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-6-276

DO - 10.1186/1471-2105-6-276

M3 - Article

C2 - 16297243

AN - SCOPUS:28344440835

VL - 6

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 276

ER -