Peak detection on ChIP-Seq data using wavelet transformation

Heng Yi Wu, Jie Zhang, Kun Huang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

We propose a signal processing approach for detecting enriched regions in ChIP-seq datasets. A wavelet transform of ChIP-seq data offers a direct visualization of both short- and long-range patterns in the genome-wide mapping profile of protein binding sites on DNA. To locate transcription factor binding sites (TFBS) from ChIP-seq data, we propose a wavelet-based peak detection algorithm. Unlike prior methods, which explore the statistics of peaks across the whole genome, our method uses the scalogram of the raw data. In addition, peaks are detected with an SNR-like parameter instead of the raw data. Peak depth and the length of peak regions can also be obtained by thresholding the SNR-like parameter. Furthermore, to eliminate false positives, a filter removes peaks that have sufficient SNR but insufficient sequencing depth. We demonstrate the effectiveness of the method by applying it to STAT1 ChIP-seq data and comparing the results with those of a well-known published method, PeakSeq. The experimental results show that a large fraction of the peaks identified by our method are consistent with the results of the PeakSeq algorithm, while our results show more consistent motif conservation scores.
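The pipeline outlined in the abstract (scalogram, SNR-like score, threshold, depth filter) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the wavelet, the SNR-like parameter, or any threshold values, so the Mexican-hat wavelet, the ridge-over-noise ratio, the function names, and every numeric setting below are assumptions chosen only for illustration, and the coverage track is synthetic.

```python
import numpy as np

def ricker(points, a):
    """Mexican-hat (Ricker) wavelet of width `a`, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def scalogram(signal, scales):
    """Wavelet coefficients of `signal`, one row per scale."""
    rows = []
    for a in scales:
        n = 10 * int(a) + 1                      # odd length keeps the kernel centred
        padded = np.pad(signal, n // 2, mode="reflect")
        rows.append(np.convolve(padded, ricker(n, a), mode="valid"))
    return np.vstack(rows)

def snr_like(coefs):
    """ASSUMED score: strongest coefficient across scales over a robust noise level.

    The noise level is the scaled median absolute deviation of the first
    (finest) scale; the paper's actual definition is not given in the abstract.
    """
    ridge = coefs.max(axis=0)
    finest = coefs[0]
    noise = np.median(np.abs(finest - np.median(finest))) / 0.6745
    return ridge / max(noise, 1e-12)

def call_peaks(coverage, scales, snr_threshold, min_depth):
    """Return (start, end, depth) for regions above the SNR threshold,
    dropping regions whose maximum raw coverage is below `min_depth`."""
    coverage = np.asarray(coverage, dtype=float)
    mask = snr_like(scalogram(coverage, scales)) > snr_threshold
    # Locate the boundaries of each contiguous run of True values.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    peaks = []
    for start, end in zip(edges[::2], edges[1::2]):
        depth = coverage[start:end].max()
        if depth >= min_depth:                   # depth filter against false positives
            peaks.append((int(start), int(end), float(depth)))
    return peaks

# Synthetic coverage: Poisson background plus one Gaussian-shaped enrichment.
rng = np.random.default_rng(0)
positions = np.arange(2000)
coverage = rng.poisson(2.0, size=2000).astype(float)
coverage += 40.0 * np.exp(-0.5 * ((positions - 1000) / 15.0) ** 2)

peaks = call_peaks(coverage, scales=[4, 8, 16, 32], snr_threshold=5.0, min_depth=20.0)
print(peaks)
```

In this sketch the length of a peak region falls out of the thresholding step as `end - start`, and the depth filter is what discards background fluctuations that happen to exceed the SNR threshold without any real read pile-up.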

Original language: English (US)
Title of host publication: 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010
Pages: 555-560
Number of pages: 6
DOI: https://doi.org/10.1109/BIBMW.2010.5703861
State: Published - Dec 1 2010
Externally published: Yes
Event: 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010 - Hong Kong, China
Duration: Dec 18 2010 - Dec 21 2010

Fingerprint

Binding sites
Genes
Transcription factors
Wavelet transforms
Conservation
Signal processing
DNA
Visualization
Binding Sites
Statistics
Wavelet Analysis
Chromosome Mapping
Protein Binding
Transcription Factors
Genome

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Wu, H. Y., Zhang, J., & Huang, K. (2010). Peak detection on ChIP-Seq data using wavelet transformation. In 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010 (pp. 555-560). [5703861] https://doi.org/10.1109/BIBMW.2010.5703861

@inproceedings{9627706ac6b849ed863231d5f60774bb,
title = "Peak detection on ChIP-Seq data using wavelet transformation",
author = "Wu, {Heng Yi} and Jie Zhang and Kun Huang",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/BIBMW.2010.5703861",
language = "English (US)",
isbn = "9781424483044",
pages = "555--560",
booktitle = "2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010",

}

TY - GEN

T1 - Peak detection on ChIP-Seq data using wavelet transformation

AU - Wu, Heng Yi

AU - Zhang, Jie

AU - Huang, Kun

PY - 2010/12/1

Y1 - 2010/12/1

UR - http://www.scopus.com/inward/record.url?scp=79952023244&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952023244&partnerID=8YFLogxK

U2 - 10.1109/BIBMW.2010.5703861

DO - 10.1109/BIBMW.2010.5703861

M3 - Conference contribution

SN - 9781424483044

SP - 555

EP - 560

BT - 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW 2010

ER -