Correcting imbalanced reads coverage in bacterial transcriptome sequencing with extreme deep coverage

Xinjun Zhang, Dharanesh Gangaiah, Robert S. Munson, Stanley M. Spinola, Yunlong Liu

Research output: Contribution to journal (Article)

Abstract

High-throughput bacterial RNA-Seq experiments can generate extremely high and imbalanced sequencing coverage. The resulting over- or underestimation of gene expression levels hinders accurate differential gene expression analysis. Here we evaluated strategies for identifying expression differences in highly covered genes in bacterial transcriptome data, using either raw sequence reads or unique reads (with duplicate fragments removed). In addition, we proposed a generalised linear model (GLM)-based approach that identifies read-coverage imbalance arising from sequence composition. Our results show that analysis using raw reads identifies more differentially expressed genes, with more accurate fold changes, than analysis using unique reads. We also demonstrate the presence of sequence-composition-related biases that are independent of gene expression levels and experimental conditions. Finally, genes that still showed strong coverage imbalance after correction were tagged using a statistical approach.
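The abstract does not give the model's exact terms, so what follows is only a minimal sketch of composition-based coverage correction, not the authors' method. It makes several hypothetical assumptions, suggested by the keyword list. First, the read-start count at each gene position is Poisson-distributed. Second, its log mean is the sum of a per-gene scale and an effect for the tri-nucleotide starting at that position. The sketch fits this two-factor log-linear GLM by alternating the exact maximum-likelihood update for each factor (iterative proportional fitting). It then divides counts by the fitted bias and flags genes whose Pearson dispersion still exceeds a threshold. The function names (`fit_bias`, `correct_and_flag`), the `k=3` window and the `threshold=2.0` rule are illustrative choices. The paper's own statistical tagging procedure may differ.

```python
from collections import defaultdict


def kmer_at(seq, i, k=3):
    """Tri-nucleotide starting at position i (assumed covariate for that position)."""
    return seq[i:i + k]


def positions(genes, k=3):
    """Yield (gene_id, position, kmer, count) for every position with a full k-mer."""
    for gid, (seq, counts) in genes.items():
        for i in range(len(seq) - k + 1):
            yield gid, i, kmer_at(seq, i, k), counts[i]


def fit_bias(genes, k=3, max_iter=1000, tol=1e-12):
    """Poisson log-linear fit: log mu = log s[gene] + log b[kmer].

    genes: dict gene_id -> (sequence, per-position read-start counts).
    Alternates the closed-form ML update of each block (iterative
    proportional fitting). Requires at least one nonzero count.
    Returns (s, b) with the biases rescaled so that mean(b) == 1.
    """
    obs = list(positions(genes, k))
    kmers = sorted({km for _, _, km, _ in obs})
    b = {km: 1.0 for km in kmers}
    s = {gid: 1.0 for gid in genes}
    for _ in range(max_iter):
        # Gene scales given biases: s_g = sum(y) / sum(b) over the gene's positions.
        num, den = defaultdict(float), defaultdict(float)
        for gid, _, km, y in obs:
            num[gid] += y
            den[gid] += b[km]
        s = {gid: num[gid] / den[gid] if den[gid] > 0 else 0.0 for gid in genes}

        # k-mer biases given scales: b_k = sum(y) / sum(s) over positions with k-mer k.
        num, den = defaultdict(float), defaultdict(float)
        for gid, _, km, y in obs:
            num[km] += y
            den[km] += s[gid]
        new_b = {km: num[km] / den[km] if den[km] > 0 else 0.0 for km in kmers}

        # Only the products s*b are identifiable, so fix the mean bias at 1.
        mean_b = sum(new_b.values()) / len(new_b)
        new_b = {km: v / mean_b for km, v in new_b.items()}
        s = {gid: v * mean_b for gid, v in s.items()}

        delta = max(abs(new_b[km] - b[km]) for km in kmers)
        b = new_b
        if delta < tol:
            break
    return s, b


def correct_and_flag(genes, s, b, k=3, threshold=2.0):
    """Divide counts by the fitted k-mer bias and flag genes whose Pearson
    dispersion under the fitted model exceeds `threshold` (hypothetical rule)."""
    corrected, flagged = {}, []
    for gid, (seq, counts) in genes.items():
        out, chi2, n = [], 0.0, 0
        for i in range(len(seq) - k + 1):
            bias = b[kmer_at(seq, i, k)]
            out.append(counts[i] / bias if bias > 0 else 0.0)
            mu = s[gid] * bias
            if mu > 0:
                chi2 += (counts[i] - mu) ** 2 / mu
                n += 1
        corrected[gid] = out
        if n > 1 and chi2 / (n - 1) > threshold:
            flagged.append(gid)
    return corrected, flagged


if __name__ == "__main__":
    # Toy data: counts are exact products of a gene scale and a k-mer bias.
    genes = {
        "gA": ("AAACA", [20, 10, 5, 0, 0]),
        "gB": ("ACAAA", [2, 6, 8, 0, 0]),
    }
    s, b = fit_bias(genes)
    print(round(b["AAA"] / b["AAC"], 6))  # 2.0
    corrected, flagged = correct_and_flag(genes, s, b)
    print(flagged)  # []
```

Because the toy counts are exact products of a gene scale and a k-mer bias, the fit recovers the bias ratio of 2 between AAA and AAC, and the corrected coverage of gA is flat. Adding a gene with a large spike at a single position would leave structure that the shared tri-nucleotide biases cannot explain, so that gene's dispersion rises above the threshold.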

Original language: English (US)
Pages (from-to): 195-213
Number of pages: 19
Journal: International Journal of Computational Biology and Drug Design
Volume: 7
Issue number: 2-3
State: Published - 2014


Keywords

  • Bacterial transcriptome sequencing
  • Computational biology
  • Coverage imbalance
  • Gene differential expression
  • Generalised linear model
  • GLM
  • RNA-Seq
  • Tri-nucleotides

ASJC Scopus subject areas

  • Computer Science Applications
  • Drug Discovery
