Predicting the results of molecular specific hybridization using boosted tree algorithm

Weijun Zhu, Yingjie Han, Huanmei Wu, Yang Liu, Xiaofei Nan, Qinglei Zhou

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

In bioinformatics and DNA computing, simulated hybridization experiments can, to some extent, replace real molecular hybridization experiments and avoid some of the drawbacks of real experimental design. However, the core techniques used by popular DNA simulation software are limited by the exponential computational complexity of the underlying combinatorial problems. As a result, it is impossible to decide within an acceptable time whether a specific hybridization among complex DNA molecules is effective. To address this common problem, we introduce a new method based on machine learning. First, a sample set is used to train a boosted tree algorithm, yielding a machine learning model. Second, this model is applied to predict the classification results of molecular hybridization for a given group of DNA molecule encodings. The experimental results show that the new method achieves an average accuracy of 94.2% and is, on average, 90 839 times more efficient than existing representative approaches. In the case study presented in this paper, the new method is 235 000, 250 000, and 990 000 times more efficient than the three existing methods, respectively. These results indicate that our approach can quickly and accurately determine the biological effectiveness of molecular hybridization for a given DNA design.
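
The abstract does not specify the features, the boosting variant, or any hyperparameters, so the following is only a minimal sketch of the general train-then-predict workflow under stated assumptions. Each candidate strand pair is reduced to three hypothetical features (fraction of antiparallel Watson-Crick matches, longest complementary run, GC content), and AdaBoost over decision stumps (depth-1 trees) stands in for the paper's boosted tree algorithm. The names `features`, `fit_stump`, `adaboost` and `predict` are illustrative, and the labels come from a toy complementarity rule rather than from the paper's sample set, so the printed accuracy says nothing about the 94.2% reported above.

```python
import math
import random

# Watson-Crick complement of each base.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def revcomp(s: str) -> str:
    """Reverse complement of a DNA strand."""
    return "".join(PAIR[c] for c in reversed(s))


def features(a: str, b: str) -> list:
    """Hypothetical features for two equal-length strands aligned
    antiparallel (a[i] is paired with b[-1 - i])."""
    matches = [PAIR[x] == y for x, y in zip(a, reversed(b))]
    longest = run = 0
    for m in matches:
        run = run + 1 if m else 0
        longest = max(longest, run)
    gc = sum(c in "GC" for c in a + b) / (len(a) + len(b))
    return [sum(matches) / len(a), longest / len(a), gc]


def stump_vote(x, j, t, p):
    """Depth-1 tree: vote p if feature j >= threshold t, else -p."""
    return p if x[j] >= t else -p


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for p in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_vote(x, j, t, p) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, p)
    return best


def adaboost(X, y, rounds=10):
    """AdaBoost over decision stumps; labels y must be -1 or +1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        err, j, t, p = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, p))
        if err <= 1e-10:
            break  # this stump already separates the training set
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_vote(x, j, t, p))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote: +1 = 'hybridizes', -1 = 'does not'."""
    score = sum(a * stump_vote(x, j, t, p) for a, j, t, p in model)
    return 1 if score >= 0 else -1


# Toy sample set (synthetic, NOT data from the paper): perfect partners
# with k randomly re-drawn bases, labelled by a simple complementarity rule.
random.seed(0)
X, y = [], []
for _ in range(200):
    a = "".join(random.choice("ACGT") for _ in range(20))
    b = list(revcomp(a))
    for i in random.sample(range(20), random.randint(0, 12)):
        b[i] = random.choice("ACGT")
    f = features(a, "".join(b))
    X.append(f)
    y.append(1 if f[0] >= 0.75 else -1)

model = adaboost(X, y)
train_acc = sum(predict(model, x) == yi for x, yi in zip(X, y)) / len(X)
print(f"{len(model)} stump(s), training accuracy {train_acc:.3f}")
```

Once a model is trained, classifying a new pair costs one linear pass over the strands plus a handful of threshold comparisons, which illustrates why a learned classifier can be far cheaper than exhaustive combinatorial simulation. A realistic model would need labels obtained from a simulator or from wet-lab experiments, and richer features than these three.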

Original language: English (US)
Journal: Concurrency Computation
DOI: 10.1002/cpe.4982
State: Accepted/In press - Jan 1 2018

Fingerprint

Tree Algorithms
DNA
Learning systems
Machine Learning
Experiment
DNA Computing
Combinatorial Problems
Bioinformatics
Simulation Software
Experimental design
Design of experiments
Computational complexity
Coding
Molecules
Predict
Experimental Results
Model

Keywords

  • biological effectiveness
  • boosted tree algorithm
  • DNA design
  • specific hybridization

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics

Cite this

Predicting the results of molecular specific hybridization using boosted tree algorithm. / Zhu, Weijun; Han, Yingjie; Wu, Huanmei; Liu, Yang; Nan, Xiaofei; Zhou, Qinglei.

In: Concurrency Computation, 01.01.2018.

Zhu, Weijun ; Han, Yingjie ; Wu, Huanmei ; Liu, Yang ; Nan, Xiaofei ; Zhou, Qinglei. / Predicting the results of molecular specific hybridization using boosted tree algorithm. In: Concurrency Computation. 2018.
@article{4b0f2942a159482485d92fa304723615,
title = "Predicting the results of molecular specific hybridization using boosted tree algorithm",
keywords = "biological effectiveness, boosted tree algorithm, DNA design, specific hybridization",
author = "Weijun Zhu and Yingjie Han and Huanmei Wu and Yang Liu and Xiaofei Nan and Qinglei Zhou",
year = "2018",
month = "1",
day = "1",
doi = "10.1002/cpe.4982",
language = "English (US)",
journal = "Concurrency Computation Practice and Experience",
issn = "1532-0626",
publisher = "John Wiley and Sons Ltd",

}

TY - JOUR

T1 - Predicting the results of molecular specific hybridization using boosted tree algorithm

AU - Zhu, Weijun

AU - Han, Yingjie

AU - Wu, Huanmei

AU - Liu, Yang

AU - Nan, Xiaofei

AU - Zhou, Qinglei

PY - 2018/1/1

Y1 - 2018/1/1

KW - biological effectiveness

KW - boosted tree algorithm

KW - DNA design

KW - specific hybridization

UR - http://www.scopus.com/inward/record.url?scp=85053521674&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053521674&partnerID=8YFLogxK

U2 - 10.1002/cpe.4982

DO - 10.1002/cpe.4982

M3 - Article

AN - SCOPUS:85053521674

JO - Concurrency Computation Practice and Experience

JF - Concurrency Computation Practice and Experience

SN - 1532-0626

ER -