Genome-wide analysis and modeling of DNA methylation susceptibility in 30 breast cancer cell lines by using CpG flanking sequences

Jaehyun An, Kwangsoo Kim, Sung Min Rhee, Heejoon Chae, Kenneth Nephew, Sun Kim

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

DNA methylation is an epigenetic modification of DNA that adds a methyl group to cytosine. Aberrant DNA methylation in the CpG context is frequently observed in cancer cells, and it is known that aberrant DNA methylation silences tumor suppressor genes. However, the mechanism of DNA methylation is not well understood. A widely accepted hypothesis is that DNA methylation does not occur randomly and may be controlled by some instructive mechanisms. In this paper, we conducted an extensive study on this important question by using proprietary sequencing data from methyl-binding domain protein (MBD)-Cap ChIP sequencing experiments for 30 breast cancer cell lines. The goal of our study is to investigate differences in nucleotide composition around CpG sites where high levels of methylation are observed, and to use this information for modeling DNA methylation susceptibility. First, we observed that DNA methylation is not uniform across the whole genome and that the character composition of CpG flanking sequences is significantly different between hyper- and hypo-methylated groups. In an in-depth study, we used information-theoretic approaches such as entropy and relative entropy to delineate character composition features and found enrichment of A (adenine) and T (thymine) in specific positions around hyper-methylated sites. As the methylation level increases, A and T proportions increase in specific positions around hyper-methylated sites and decrease in other positions. Second, we built predictive models for methylation susceptibility by using characters flanking CpG sites as features and hyper-/hypo-methylation status as the class. Third, we constructed predictive models using a log-odds score of two profiles from DNA sequences surrounding CpG sites of hyper- and hypo-methylated groups. This analysis showed that the distributions of profile scores for hyper- and hypo-methylated site sequences are quite distinct. Our genome-wide CpG methylation study shows that nucleotides around CpG sites carry information for cytosine methylation. This is consistent with the seminal work on the instructive evidence of DNA methylation by Keshet et al. (Nature Genetics, 38(2), 149-153, 2006). Because our study is on the full-genome scale and uses sequencing data, it differs significantly from the study by Keshet et al. in data resolution and analysis methods.
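
The information-theoretic features and the log-odds profile score described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy flanking sequences, the window length of four bases, and the pseudocount of 1 are all assumptions made for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_profile(seqs, pseudocount=1.0):
    """Per-position nucleotide frequencies with additive smoothing.

    `seqs` are equal-length flanking sequences (the CpG itself omitted).
    The pseudocount is an assumption; it keeps every probability non-zero.
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profile.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return profile


def entropy(dist):
    """Shannon entropy (bits) of one position's nucleotide distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence, bits) of p from q."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in ALPHABET)


def log_odds_score(seq, hyper_profile, hypo_profile):
    """Sum over positions of log2(P_hyper(base) / P_hypo(base))."""
    return sum(
        math.log2(hyper_profile[i][b] / hypo_profile[i][b])
        for i, b in enumerate(seq)
    )


# Hypothetical toy data: two bases upstream and two downstream of a CpG.
hyper_seqs = ["ATTA", "AATA", "TTAA", "ATAT"]  # A/T-rich, as an example
hypo_seqs = ["GCCG", "CGGC", "GGCC", "CCGG"]   # G/C-rich, as an example

hyper_profile = position_profile(hyper_seqs)
hypo_profile = position_profile(hypo_seqs)

# Per-position divergence between the two groups.
divergence = [
    relative_entropy(hp, lp) for hp, lp in zip(hyper_profile, hypo_profile)
]

# A simple decision rule: positive score -> closer to the hyper profile.
score_at_rich = log_odds_score("ATTA", hyper_profile, hypo_profile)
score_gc_rich = log_odds_score("GCCG", hyper_profile, hypo_profile)
```

In this sketch, positions with large relative entropy are those where the two groups differ most in composition, and the sign of the log-odds score gives a simple class call. Any real analysis would need a chosen window size, held-out evaluation, and a score threshold, none of which are specified in the abstract.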

Original language: English
Article number: 1341003
Journal: Journal of Bioinformatics and Computational Biology
Volume: 11
Issue number: 3
DOI: 10.1142/S0219720013410035
State: Published - Jun 2013

Fingerprint

DNA Methylation
Methylation
Genes
Cells
Genome
Breast Neoplasms
Cell Line
Cytosine
Entropy
Nucleotides
RNA Cap-Binding Proteins
Chemical analysis
Thymine
DNA sequences
Adenine
Epigenomics
Tumors
Neoplasms
DNA
Proteins

Keywords

  • classification
  • CpG flanking sequence
  • DNA methylation
  • genome-wide analysis

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Medicine (all)

Cite this

Genome-wide analysis and modeling of DNA methylation susceptibility in 30 breast cancer cell lines by using CpG flanking sequences. / An, Jaehyun; Kim, Kwangsoo; Rhee, Sung Min; Chae, Heejoon; Nephew, Kenneth; Kim, Sun.

In: Journal of Bioinformatics and Computational Biology, Vol. 11, No. 3, 1341003, 06.2013.

@article{11b8965ff2f846c28a00ddf097fd04bf,
title = "Genome-wide analysis and modeling of DNA methylation susceptibility in 30 breast cancer cell lines by using CpG flanking sequences",
abstract = "DNA methylation is an epigenetic modification of DNA that adds a methyl group to cytosine. Aberrant DNA methylation in the CpG context is frequently observed in cancer cells, and it is known that aberrant DNA methylation silences tumor suppressor genes. However, the mechanism of DNA methylation is not well understood. A widely accepted hypothesis is that DNA methylation does not occur randomly and may be controlled by some instructive mechanisms. In this paper, we conducted an extensive study on this important question by using proprietary sequencing data from methyl-binding domain protein (MBD)-Cap ChIP sequencing experiments for 30 breast cancer cell lines. The goal of our study is to investigate differences in nucleotide composition around CpG sites where high levels of methylation are observed, and to use this information for modeling DNA methylation susceptibility. First, we observed that DNA methylation is not uniform across the whole genome and that the character composition of CpG flanking sequences is significantly different between hyper- and hypo-methylated groups. In an in-depth study, we used information-theoretic approaches such as entropy and relative entropy to delineate character composition features and found enrichment of A (adenine) and T (thymine) in specific positions around hyper-methylated sites. As the methylation level increases, A and T proportions increase in specific positions around hyper-methylated sites and decrease in other positions. Second, we built predictive models for methylation susceptibility by using characters flanking CpG sites as features and hyper-/hypo-methylation status as the class. Third, we constructed predictive models using a log-odds score of two profiles from DNA sequences surrounding CpG sites of hyper- and hypo-methylated groups. This analysis showed that the distributions of profile scores for hyper- and hypo-methylated site sequences are quite distinct. Our genome-wide CpG methylation study shows that nucleotides around CpG sites carry information for cytosine methylation. This is consistent with the seminal work on the instructive evidence of DNA methylation by Keshet et al. (Nature Genetics, 38(2), 149-153, 2006). Because our study is on the full-genome scale and uses sequencing data, it differs significantly from the study by Keshet et al. in data resolution and analysis methods.",
keywords = "classification, CpG flanking sequence, DNA methylation, genome-wide analysis",
author = "Jaehyun An and Kwangsoo Kim and Rhee, {Sung Min} and Heejoon Chae and Kenneth Nephew and Sun Kim",
year = "2013",
month = "6",
doi = "10.1142/S0219720013410035",
language = "English",
volume = "11",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "3",

}

TY - JOUR

T1 - Genome-wide analysis and modeling of DNA methylation susceptibility in 30 breast cancer cell lines by using CpG flanking sequences

AU - An, Jaehyun

AU - Kim, Kwangsoo

AU - Rhee, Sung Min

AU - Chae, Heejoon

AU - Nephew, Kenneth

AU - Kim, Sun

PY - 2013/6

Y1 - 2013/6

N2 - DNA methylation is an epigenetic modification of DNA that adds a methyl group to cytosine. Aberrant DNA methylation in the CpG context is frequently observed in cancer cells, and it is known that aberrant DNA methylation silences tumor suppressor genes. However, the mechanism of DNA methylation is not well understood. A widely accepted hypothesis is that DNA methylation does not occur randomly and may be controlled by some instructive mechanisms. In this paper, we conducted an extensive study on this important question by using proprietary sequencing data from methyl-binding domain protein (MBD)-Cap ChIP sequencing experiments for 30 breast cancer cell lines. The goal of our study is to investigate differences in nucleotide composition around CpG sites where high levels of methylation are observed, and to use this information for modeling DNA methylation susceptibility. First, we observed that DNA methylation is not uniform across the whole genome and that the character composition of CpG flanking sequences is significantly different between hyper- and hypo-methylated groups. In an in-depth study, we used information-theoretic approaches such as entropy and relative entropy to delineate character composition features and found enrichment of A (adenine) and T (thymine) in specific positions around hyper-methylated sites. As the methylation level increases, A and T proportions increase in specific positions around hyper-methylated sites and decrease in other positions. Second, we built predictive models for methylation susceptibility by using characters flanking CpG sites as features and hyper-/hypo-methylation status as the class. Third, we constructed predictive models using a log-odds score of two profiles from DNA sequences surrounding CpG sites of hyper- and hypo-methylated groups. This analysis showed that the distributions of profile scores for hyper- and hypo-methylated site sequences are quite distinct. Our genome-wide CpG methylation study shows that nucleotides around CpG sites carry information for cytosine methylation. This is consistent with the seminal work on the instructive evidence of DNA methylation by Keshet et al. (Nature Genetics, 38(2), 149-153, 2006). Because our study is on the full-genome scale and uses sequencing data, it differs significantly from the study by Keshet et al. in data resolution and analysis methods.

KW - classification

KW - CpG flanking sequence

KW - DNA methylation

KW - genome-wide analysis

UR - http://www.scopus.com/inward/record.url?scp=84879634336&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879634336&partnerID=8YFLogxK

U2 - 10.1142/S0219720013410035

DO - 10.1142/S0219720013410035

M3 - Article

VL - 11

JO - Journal of Bioinformatics and Computational Biology

JF - Journal of Bioinformatics and Computational Biology

SN - 0219-7200

IS - 3

M1 - 1341003

ER -