Using machine learning algorithms to identify genes essential for cell survival

Santosh Philips, Heng Yi Wu, Lang Li

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Background: The explosion of available data brings a proportional opportunity to identify novel knowledge with potential application in targeted therapies. Despite this huge amount of data, solutions for treating complex diseases remain elusive. One reason is that these diseases are driven by a network of genes that must be targeted in order to understand and treat them effectively. Part of the solution lies in mining and integrating information from various disciplines. Here we propose a machine learning method for mining the publicly available literature on RNA interference, with the goal of identifying genes essential for cell survival.

Results: A total of 32,164 RNA interference abstracts were identified from 10.5 million PubMed abstracts (2001-2015). These abstracts spanned 1467 cancer cell lines and 4373 genes, representing a total of 25,891 cell-gene associations. Among the 1467 cell lines, 88% had between 1 and 25 genes studied in a given cell line. Among the 4373 genes, 96% were studied in between 1 and 25 different cell lines.

Conclusions: Identifying genes that are crucial for cell survival can provide critical information, especially for treating complex diseases such as cancer. The efficacy of a therapeutic intervention is multifactorial, and in many cases therapeutic disruption could come from an unsuspected source. Machine learning algorithms help to narrow down the search and provide information about essential genes in different cancer types. They also provide the building blocks to generate a network of interconnected genes and processes. The information thus gained can be used to generate hypotheses that can be experimentally validated to improve our understanding of what triggers and maintains the growth of cancerous cells.
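The coverage figures in the Results (the share of cell lines with 1 to 25 genes studied, and the share of genes studied in 1 to 25 cell lines) can be illustrated with a minimal sketch. It assumes the mined associations are available as (cell line, gene) pairs; the function name `fraction_in_range` and the toy pairs are hypothetical and are not taken from the paper, whose extraction pipeline is not described here.

```python
from collections import defaultdict


def fraction_in_range(pairs, key_index, lo=1, hi=25):
    """Fraction of keys whose number of distinct partners lies in [lo, hi].

    key_index=0 groups by cell line (partners are genes);
    key_index=1 groups by gene (partners are cell lines).
    """
    partners = defaultdict(set)
    for pair in pairs:
        key = pair[key_index]
        other = pair[1 - key_index]
        partners[key].add(other)  # sets ignore repeated associations
    if not partners:
        return 0.0
    in_range = sum(1 for s in partners.values() if lo <= len(s) <= hi)
    return in_range / len(partners)


# Hypothetical toy associations (cell line, gene); not data from the paper.
toy_pairs = [
    ("HeLa", "TP53"), ("HeLa", "KRAS"), ("HeLa", "MYC"),
    ("MCF-7", "TP53"),
    ("A549", "KRAS"), ("A549", "EGFR"),
]

# Share of cell lines with 1 to 2 distinct genes studied.
cell_line_fraction = fraction_in_range(toy_pairs, key_index=0, lo=1, hi=2)
# Share of genes studied in exactly 1 cell line.
gene_fraction = fraction_in_range(toy_pairs, key_index=1, lo=1, hi=1)
```

With the default bounds (`lo=1`, `hi=25`), the same call applied to the full set of 25,891 associations would yield the kind of percentages quoted in the Results; the narrower bounds above are used only so the toy data produce a non-trivial split.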

Original language: English (US)
Article number: 397
Journal: BMC Bioinformatics
Volume: 18
DOIs: 10.1186/s12859-017-1799-1
State: Published - Oct 3 2017

Fingerprint

Essential Genes
Learning algorithms
Learning systems
Learning Algorithm
Cell Survival
Machine Learning
Genes
Cells
Gene
Cell
Cell Line
Gene Regulatory Networks
RNA Interference
Cancer
Line
Neoplasms
Explosions
RNA
PubMed
Mining

Keywords

  • Gene essentiality
  • Literature mining
  • Machine learning

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Using machine learning algorithms to identify genes essential for cell survival. / Philips, Santosh; Wu, Heng Yi; Li, Lang.

In: BMC Bioinformatics, Vol. 18, 397, 03.10.2017.

Philips, Santosh; Wu, Heng Yi; Li, Lang. / Using machine learning algorithms to identify genes essential for cell survival. In: BMC Bioinformatics. 2017; Vol. 18.
@article{28e95627e5fc4c6a8f52cb4c4841c85a,
title = "Using machine learning algorithms to identify genes essential for cell survival",
abstract = "Background: The explosion of available data brings a proportional opportunity to identify novel knowledge with potential application in targeted therapies. Despite this huge amount of data, solutions for treating complex diseases remain elusive. One reason is that these diseases are driven by a network of genes that must be targeted in order to understand and treat them effectively. Part of the solution lies in mining and integrating information from various disciplines. Here we propose a machine learning method for mining the publicly available literature on RNA interference, with the goal of identifying genes essential for cell survival. Results: A total of 32,164 RNA interference abstracts were identified from 10.5 million PubMed abstracts (2001-2015). These abstracts spanned 1467 cancer cell lines and 4373 genes, representing a total of 25,891 cell-gene associations. Among the 1467 cell lines, 88{\%} had between 1 and 25 genes studied in a given cell line. Among the 4373 genes, 96{\%} were studied in between 1 and 25 different cell lines. Conclusions: Identifying genes that are crucial for cell survival can provide critical information, especially for treating complex diseases such as cancer. The efficacy of a therapeutic intervention is multifactorial, and in many cases therapeutic disruption could come from an unsuspected source. Machine learning algorithms help to narrow down the search and provide information about essential genes in different cancer types. They also provide the building blocks to generate a network of interconnected genes and processes. The information thus gained can be used to generate hypotheses that can be experimentally validated to improve our understanding of what triggers and maintains the growth of cancerous cells.",
keywords = "Gene essentiality, Literature mining, Machine learning",
author = "Santosh Philips and Wu, {Heng Yi} and Lang Li",
year = "2017",
month = "10",
day = "3",
doi = "10.1186/s12859-017-1799-1",
language = "English (US)",
volume = "18",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Using machine learning algorithms to identify genes essential for cell survival

AU - Philips, Santosh

AU - Wu, Heng Yi

AU - Li, Lang

PY - 2017/10/3

Y1 - 2017/10/3

AB - Background: The explosion of available data brings a proportional opportunity to identify novel knowledge with potential application in targeted therapies. Despite this huge amount of data, solutions for treating complex diseases remain elusive. One reason is that these diseases are driven by a network of genes that must be targeted in order to understand and treat them effectively. Part of the solution lies in mining and integrating information from various disciplines. Here we propose a machine learning method for mining the publicly available literature on RNA interference, with the goal of identifying genes essential for cell survival. Results: A total of 32,164 RNA interference abstracts were identified from 10.5 million PubMed abstracts (2001-2015). These abstracts spanned 1467 cancer cell lines and 4373 genes, representing a total of 25,891 cell-gene associations. Among the 1467 cell lines, 88% had between 1 and 25 genes studied in a given cell line. Among the 4373 genes, 96% were studied in between 1 and 25 different cell lines. Conclusions: Identifying genes that are crucial for cell survival can provide critical information, especially for treating complex diseases such as cancer. The efficacy of a therapeutic intervention is multifactorial, and in many cases therapeutic disruption could come from an unsuspected source. Machine learning algorithms help to narrow down the search and provide information about essential genes in different cancer types. They also provide the building blocks to generate a network of interconnected genes and processes. The information thus gained can be used to generate hypotheses that can be experimentally validated to improve our understanding of what triggers and maintains the growth of cancerous cells.

KW - Gene essentiality

KW - Literature mining

KW - Machine learning

UR - http://www.scopus.com/inward/record.url?scp=85030331000&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030331000&partnerID=8YFLogxK

U2 - 10.1186/s12859-017-1799-1

DO - 10.1186/s12859-017-1799-1

M3 - Article

AN - SCOPUS:85030331000

VL - 18

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 397

ER -