Genome-wide analysis and modeling of DNA methylation susceptibility in 30 breast cancer cell lines by using CpG flanking sequences

Jaehyun An, Kwangsoo Kim, Sung Min Rhee, Heejoon Chae, Kenneth P. Nephew, Sun Kim

Research output: Contribution to journal › Article › peer-review

1 Scopus citations


DNA methylation is an epigenetic modification that adds a methyl group to cytosine. Aberrant DNA methylation in the CpG context is frequently observed in cancer cells and is known to silence tumor suppressor genes. However, the mechanism of DNA methylation is not well understood. A widely accepted hypothesis is that DNA methylation does not occur randomly and may be controlled by instructive mechanisms. In this paper, we conducted an extensive study of this question using proprietary sequencing data from methyl-binding domain protein (MBD)-Cap ChIP sequencing experiments for 30 breast cancer cell lines. The goal of our study is to investigate differences in nucleotide composition around CpG sites where high levels of methylation are observed, and to use this information for modeling DNA methylation susceptibility. First, we observed that DNA methylation is not uniform across the genome and that the character composition of CpG flanking sequences is significantly different between hyper- and hypo-methylated groups. In an in-depth study, we used information-theoretic measures such as entropy and relative entropy to delineate character composition features and found enrichment of A (adenine) and T (thymine) at specific positions around hyper-methylated sites. As the methylation level increases, the A and T proportions at some positions around hyper-methylated sites increase, while those at other positions decrease. Second, we built predictive models for methylation susceptibility using the characters flanking CpG sites as features and hyper-/hypo-methylation status as the class. Third, we constructed predictive models using a log-odds score of two profiles built from the DNA sequences surrounding CpG sites of the hyper- and hypo-methylated groups. This analysis showed that the distributions of profile scores for hyper- and hypo-methylated site sequences are quite distinct.
Our genome-wide CpG methylation study shows that nucleotides around CpG sites carry information for cytosine methylation. This is consistent with the seminal work on instructive evidence for DNA methylation by Keshet et al. (Nature Genetics, 38(2), 149-153, 2006). Our study is on the full-genome scale and uses sequencing data, so it differs significantly from the study by Keshet et al. in data resolution and analysis methods.
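
The relative entropy and log-odds profile score mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pseudocount, and the toy six-base sequences are assumptions made for illustration only, and the abstract does not specify window length or smoothing.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumes sequences contain only these four characters


def build_profile(seqs, pseudocount=1.0):
    """Per-position nucleotide frequencies for equal-length sequences.

    The pseudocount (an assumption, not from the paper) keeps every
    frequency above zero so that logarithms are always defined.
    """
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        profile.append(
            {b: (counts.get(b, 0) + pseudocount) / total for b in ALPHABET}
        )
    return profile


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, at one position."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in ALPHABET)


def log_odds_score(seq, hyper_profile, hypo_profile):
    """Sum over positions of log2(P_hyper(base) / P_hypo(base))."""
    return sum(
        math.log2(hyper_profile[i][b] / hypo_profile[i][b])
        for i, b in enumerate(seq)
    )


# Made-up flanking windows with the CpG at indices 2-3 (illustration only).
hyper_seqs = ["ATCGAT", "TTCGAA", "AACGTT", "TACGAT"]
hypo_seqs = ["GCCGGC", "CGCGCC", "GGCGCG", "CCCGGG"]

hyper_profile = build_profile(hyper_seqs)
hypo_profile = build_profile(hypo_seqs)

# Positions where the two groups differ most in composition.
divergence = [
    relative_entropy(p, q) for p, q in zip(hyper_profile, hypo_profile)
]

# Positive scores lean toward the hyper-methylated profile,
# negative scores toward the hypo-methylated one.
score_at_rich = log_odds_score("ATCGAT", hyper_profile, hypo_profile)
score_gc_rich = log_odds_score("GCCGGC", hyper_profile, hypo_profile)
```

In this toy setup the divergence is zero at the CpG itself, where both groups are identical, and positive at the flanking positions. A classifier in this style would threshold the log-odds score, although the threshold and the features actually used in the study are not given in the abstract.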

Original language: English (US)
Article number: 1341003
Journal: Journal of Bioinformatics and Computational Biology
Issue number: 3
State: Published - Jun 1 2013


  • classification
  • CpG flanking sequence
  • DNA methylation
  • genome-wide analysis

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Medicine(all)
