A Unified Model for Robust Differential Expression Analysis of RNA-Seq Data

Kefei Liu, Li Shen, Hui Jian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A fundamental task in RNA-seq data analysis is to determine whether the RNA-seq read counts for a gene or exon differ significantly across experimental conditions. Since RNA-seq measurements are relative in nature, between-sample normalization of counts is an essential step in differential expression (DE) analysis. In most existing methods, the normalization step is independent of DE analysis, which is not well justified since, ideally, normalization should be based on non-DE genes only. Recently, Jiang and Zhan proposed a robust statistical model for joint between-sample normalization and DE analysis from log-transformed RNA-seq data. Sample-specific normalization factors are modeled as unknown parameters in the gene-wise linear models, and the L0 penalty is introduced to induce sparsity in the regression coefficients. In their model, the experimental conditions are assumed to be categorical (e.g., 0 for control and 1 for case), and one-way analysis of variance (ANOVA) is used to identify genes that are differentially expressed between two or more conditions. In this work, Jiang and Zhan's model is generalized to accommodate continuous/numerical experimental conditions, and a linear regression model is used to detect genes whose expression level is significantly affected by the experimental conditions. Furthermore, an efficient algorithm is developed to find the global solution of the resultant high-dimensional, non-convex and non-differentiable penalized least squares regression problem. Extensive simulation studies and a real RNA-seq data example show that when the proportion of DE genes is small, or the numbers of up- and down-regulated genes are approximately equal, the proposed method performs similarly to existing methods in terms of detection power and false positive rate. When a large proportion (e.g., >30%) of genes are differentially expressed in an asymmetric manner, it outperforms existing methods, and the performance gain becomes more substantial as the sample size increases.
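To make the role of the L0 penalty concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a per-gene model of the form y_j = alpha + beta * x_j + d_j + noise on log-transformed counts, where x_j is a numerical condition and d_j is a sample-specific normalization offset, and it assumes the objective 0.5 * RSS + lam * 1[beta != 0]. The function name `l0_gene_fit`, the argument names, and the exact scaling of the objective are illustrative assumptions. The offsets d are taken as given here, whereas the paper estimates them jointly with the regression coefficients. With d fixed, the penalized fit reduces to a hard-threshold choice between an intercept-only model and ordinary least squares.

```python
import numpy as np


def l0_gene_fit(y, x, d, lam):
    """Fit one gene under an assumed model y_j = alpha + beta * x_j + d_j + noise.

    Minimises 0.5 * RSS + lam * 1[beta != 0] with the sample offsets d held
    fixed (an illustrative simplification; d is not estimated here).
    Returns the pair (alpha, beta).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    r = y - np.asarray(d, dtype=float)   # remove the normalization offsets
    xc = x - x.mean()                    # centred covariate
    sxx = float(xc @ xc)
    if sxx == 0.0:                       # constant covariate: slope not identifiable
        return float(r.mean()), 0.0
    beta_ols = float(xc @ (r - r.mean())) / sxx
    # Drop in 0.5 * RSS when moving from intercept-only to the OLS fit.
    gain = 0.5 * beta_ols ** 2 * sxx
    if gain > lam:                       # the slope pays for its L0 penalty
        return float(r.mean() - beta_ols * x.mean()), beta_ols
    return float(r.mean()), 0.0          # gene treated as non-DE


if __name__ == "__main__":
    x = np.array([0.0, 1.0, 2.0, 3.0])
    d = np.array([0.5, -0.5, 0.2, -0.2])
    print(l0_gene_fit(1.0 + 2.0 * x + d, x, d, lam=1.0))   # slope retained
    print(l0_gene_fit(3.0 + d, x, d, lam=1.0))             # slope set to zero
```

A gene is flagged as DE in this sketch only when the reduction in residual sum of squares exceeds the penalty, which is how an L0 term yields exactly zero coefficients for non-DE genes.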

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Editors: Harald Schmidt, David Griol, Haiying Wang, Jan Baumbach, Huiru Zheng, Zoraida Callejas, Xiaohua Hu, Julie Dickerson, Le Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 437-442
Number of pages: 6
ISBN (Electronic): 9781538654880
DOI: https://doi.org/10.1109/BIBM.2018.8621331
State: Published - Jan 21 2019
Externally published: Yes
Event: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 - Madrid, Spain
Duration: Dec 3 2018 to Dec 6 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018

Conference

Conference: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018
Country: Spain
City: Madrid
Period: 12/3/18 to 12/6/18

Fingerprint

RNA
Genes
Linear Models
Gene Expression
Statistical Models
Least-Squares Analysis
Sample Size
Analysis of variance (ANOVA)
Exons
Linear regression
Joints

Keywords

  • Differential expression analysis
  • Inter-sample normalization
  • L0 sparsity regularization
  • Linear regression
  • RNA-seq

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Liu, K., Shen, L., & Jian, H. (2019). A Unified Model for Robust Differential Expression Analysis of RNA-Seq Data. In H. Schmidt, D. Griol, H. Wang, J. Baumbach, H. Zheng, Z. Callejas, X. Hu, J. Dickerson, ... L. Zhang (Eds.), Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018 (pp. 437-442). [8621331] (Proceedings - 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BIBM.2018.8621331
