A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq data

Kefei Liu, Jieping Ye, Yang Yang, Li Shen, Hui Jiang

Research output: Contribution to journal (Article)

Abstract

RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. In existing DE detection algorithms, normalization is ad hoc and performed once and for all prior to DE detection. This may be suboptimal: ideally, normalization should be based on non-DE genes only, and should therefore be coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of log-transformed RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and are jointly estimated with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation studies show that the proposed model and algorithms perform better than or comparably to existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is more significant when a large proportion of genes are differentially expressed in an asymmetric manner.
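
As an illustration only, the model in the abstract can be read as y_ij = mu_i + x_j*beta_i + d_j + e_ij. Here y_ij is the log-scale expression of gene i in sample j, d_j is the sample-specific normalization factor, and x_j is a 0/1 treatment indicator. This notation and the two-condition design are assumptions; they are not the paper's exact formulation. The minimal Python sketch below minimizes the L1-penalized least-squares objective using plain block coordinate descent. That solver is a simpler one swapped in for illustration; it is not the augmented Lagrangian algorithm the paper uses. The function names `soft_threshold` and `joint_normalize_de` are hypothetical, and centering d to sum to zero is an assumed identifiability convention.

```python
def soft_threshold(v: float, lam: float) -> float:
    """Proximal operator of lam * |b|: shrink v toward zero by lam."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def joint_normalize_de(y, x, lam, n_iter=500, tol=1e-10):
    """Hypothetical sketch (block coordinate descent, NOT the paper's solver) for
        0.5 * sum_ij (y[i][j] - mu[i] - x[j]*beta[i] - d[j])**2 + lam * sum_i |beta[i]|
    y: m genes x n samples of log-scale expression values.
    x: length-n 0/1 treatment indicator (two conditions assumed).
    Returns (mu, beta, d); d is centered to sum to zero.
    """
    m, n = len(y), len(x)
    x_bar = sum(x) / n
    xc = [xj - x_bar for xj in x]              # centered design
    sxx = sum(v * v for v in xc)
    if sxx == 0:
        raise ValueError("both conditions must be present in x")
    mu, beta, d = [0.0] * m, [0.0] * m, [0.0] * n
    for _ in range(n_iter):
        # Block 1: exact per-gene lasso fit of (mu_i, beta_i) given d.
        for i in range(m):
            z = [y[i][j] - d[j] for j in range(n)]
            z_bar = sum(z) / n
            sxz = sum(xc[j] * (z[j] - z_bar) for j in range(n))
            beta[i] = soft_threshold(sxz, lam) / sxx
            mu[i] = z_bar - x_bar * beta[i]
        # Block 2: exact least-squares update of each d_j given (mu, beta).
        new_d = [sum(y[i][j] - mu[i] - x[j] * beta[i] for i in range(m)) / m
                 for j in range(n)]
        # mu and d share a free constant: center d and absorb the shift into mu.
        shift = sum(new_d) / n
        new_d = [v - shift for v in new_d]
        mu = [v + shift for v in mu]
        change = max(abs(a - b) for a, b in zip(new_d, d))
        d = new_d
        if change < tol:
            break
    return mu, beta, d


if __name__ == "__main__":
    # Toy noise-free data: 20 genes, 6 samples, genes 0-3 up-regulated in condition 1.
    x = [0, 0, 0, 1, 1, 1]
    d_true = [0.0, 0.2, -0.1, 0.3, 0.5, -0.2]
    y = [[1.0 + 0.1 * i + x[j] * (2.0 if i < 4 else 0.0) + d_true[j]
          for j in range(6)] for i in range(20)]
    mu, beta, d = joint_normalize_de(y, x, lam=0.3)
    print("DE genes:", [i for i, b in enumerate(beta) if b != 0.0])
    print("normalization factors:", [round(v, 3) for v in d])
```

The d update averages residuals over all genes. Because the penalty sets most beta_i exactly to zero, those genes effectively anchor the normalization, which is the coupling between normalization and DE detection that the abstract argues for. A larger lam gives sparser beta, but it also shrinks the nonzero coefficients more strongly toward zero.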

Original language: English (US)
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
DOI: 10.1109/TCBB.2018.2790918
State: Accepted/In press, Jan 6 2018

Keywords

  • augmented Lagrangian method
  • differential expression analysis
  • L1-Norm regularization
  • linear regression
  • normalization
  • RNA-Seq

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq data. / Liu, Kefei; Ye, Jieping; Yang, Yang; Shen, Li; Jiang, Hui.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, 06.01.2018.

@article{c69f969915a245d4a716d4a014d1f677,
title = "A Unified Model for Joint Normalization and Differential Gene Expression Detection in {RNA-Seq} data",
abstract = "RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. In existing DE detection algorithms, normalization is ad hoc and performed once and for all prior to DE detection. This may be suboptimal: ideally, normalization should be based on non-DE genes only, and should therefore be coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of log-transformed RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and are jointly estimated with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation studies show that the proposed model and algorithms perform better than or comparably to existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is more significant when a large proportion of genes are differentially expressed in an asymmetric manner.",
keywords = "augmented Lagrangian method, differential expression analysis, L1-Norm regularization, linear regression, normalization, RNA-Seq",
author = "Kefei Liu and Jieping Ye and Yang Yang and Li Shen and Hui Jiang",
year = "2018",
month = "1",
day = "6",
doi = "10.1109/TCBB.2018.2790918",
language = "English (US)",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - JOUR
T1  - A Unified Model for Joint Normalization and Differential Gene Expression Detection in RNA-Seq data
AU  - Liu, Kefei
AU  - Ye, Jieping
AU  - Yang, Yang
AU  - Shen, Li
AU  - Jiang, Hui
PY  - 2018/1/6
Y1  - 2018/1/6
AB  - RNA sequencing (RNA-seq) is becoming increasingly popular for quantifying gene expression levels. Since RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. In existing DE detection algorithms, normalization is ad hoc and performed once and for all prior to DE detection. This may be suboptimal: ideally, normalization should be based on non-DE genes only, and should therefore be coupled with DE detection. We propose a unified statistical model for joint normalization and DE detection of log-transformed RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models and are jointly estimated with the regression coefficients. By imposing a sparsity-inducing L1 penalty (or a mixed L1/L2 penalty for multiple treatment conditions) on the regression coefficients, we formulate the problem as a penalized least-squares regression problem and apply the augmented Lagrangian method to solve it. Simulation studies show that the proposed model and algorithms perform better than or comparably to existing methods in terms of detection power and false-positive rate. The performance gain increases with larger sample size or higher signal-to-noise ratio, and is more significant when a large proportion of genes are differentially expressed in an asymmetric manner.
KW  - augmented Lagrangian method
KW  - differential expression analysis
KW  - L1-Norm regularization
KW  - linear regression
KW  - normalization
KW  - RNA-Seq
UR  - http://www.scopus.com/inward/record.url?scp=85040557512&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85040557512&partnerID=8YFLogxK
U2  - 10.1109/TCBB.2018.2790918
DO  - 10.1109/TCBB.2018.2790918
M3  - Article
JO  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
SN  - 1545-5963
ER  - 