Support vector machines in HTS data mining

Type I MetAPs inhibition study

Jianwen Fang, Yinghua Dong, Gerald H. Lushington, Qizhuang Ye, Gunda I. Georg

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

This article reports a successful application of support vector machines (SVMs) to mining high-throughput screening (HTS) data from an inhibition study of type I methionine aminopeptidases (MetAPs). A library of 43,736 small organic molecules was used in the study, and the 1355 compounds in the library with 40% or higher inhibition activity were considered active. The data set was randomly split into a training set and a test set (3:1 ratio). The authors ranked the compounds in the test set by the decision values predicted by SVM models built on the training set. To measure model performance, they defined a novel score, PT50: the percentage of the test set that must be screened to recover 50% of the actives. With carefully selected parameters, the SVM models increased hit rates significantly: 50% of the active compounds could be recovered by screening just 7% of the test set. The authors found that the size of the training set played a significant role in model performance; a training set of 10,000 compounds is likely the minimum size required to build a model with reasonable predictive power.
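The PT50 score described above can be computed directly from a ranked test set. The sketch below is a minimal illustration, not the authors' code: it assumes compounds are ranked by descending SVM decision value, that tied compounds keep their input order, and that "50% of the actives" is rounded up to a whole compound. The function names `pt50` and `enrichment_at_pt50` and the toy scores are hypothetical. The second helper shows the arithmetic linking PT50 to hit-rate improvement: screening p% of the set to recover half of the actives gives a local hit rate of (0.5 A)/(p/100 · N) against A/N overall, i.e. an enrichment of 0.5/(p/100).

```python
import math
from typing import Sequence


def pt50(decision_values: Sequence[float], is_active: Sequence[bool]) -> float:
    """Percentage of the ranked test set that must be screened to recover
    at least 50% of its actives.

    Assumptions (not from the paper): ranking is by descending decision
    value, ties keep input order, and the 50% target is rounded up.
    """
    if len(decision_values) != len(is_active):
        raise ValueError("decision_values and is_active must have the same length")
    n_total = len(is_active)
    n_active = sum(1 for active in is_active if active)
    if n_active == 0:
        raise ValueError("the test set contains no actives")
    target = math.ceil(0.5 * n_active)

    # Rank compound indices from highest to lowest decision value.
    # sorted() is stable even with reverse=True, so ties stay in input order.
    ranking = sorted(range(n_total), key=lambda i: decision_values[i], reverse=True)

    found = 0
    for screened, idx in enumerate(ranking, start=1):
        if is_active[idx]:
            found += 1
            if found >= target:
                return 100.0 * screened / n_total
    raise RuntimeError("unreachable: target never exceeds n_active")


def enrichment_at_pt50(pt50_percent: float) -> float:
    """Hit rate in the screened top fraction divided by the overall hit rate.

    Half the actives are found in pt50_percent of the set, so the ratio
    is 0.5 / (pt50_percent / 100). A random ranking gives PT50 near 50%
    and therefore an enrichment near 1.
    """
    return 0.5 / (pt50_percent / 100.0)


if __name__ == "__main__":
    # Toy decision values and activity labels, purely illustrative.
    scores = [2.1, 1.7, 0.9, 0.3, -0.2, -0.8, -1.1, -1.5]
    actives = [True, False, True, False, False, True, False, False]
    print(pt50(scores, actives))                # 37.5
    print(round(enrichment_at_pt50(7.0), 2))    # 7.14
```

Applied to the reported PT50 of 7%, this gives roughly a 7.1-fold enrichment over random screening. Since about 3.1% of the library is active (1355/43,736), the top 7% of the ranked test set would have a hit rate of roughly 22%, assuming the random split preserves the library-wide active rate.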

Original language: English (US)
Pages (from-to): 138-144
Number of pages: 7
Journal: Journal of Biomolecular Screening
Volume: 11
Issue number: 2
DOI: 10.1177/1087057105284334
State: Published - Mar 2006
Externally published: Yes

Keywords

  • High-throughput screening
  • Machine learning
  • MetAP
  • Support vector machines

ASJC Scopus subject areas

  • Analytical Chemistry
  • Clinical Biochemistry
  • Biotechnology
  • Biochemistry
  • Molecular Biology

Cite this

Support vector machines in HTS data mining: Type I MetAPs inhibition study. / Fang, Jianwen; Dong, Yinghua; Lushington, Gerald H.; Ye, Qizhuang; Georg, Gunda I.

In: Journal of Biomolecular Screening, Vol. 11, No. 2, 03.2006, p. 138-144.

Fang, Jianwen; Dong, Yinghua; Lushington, Gerald H.; Ye, Qizhuang; Georg, Gunda I. / Support vector machines in HTS data mining: Type I MetAPs inhibition study. In: Journal of Biomolecular Screening. 2006; Vol. 11, No. 2. pp. 138-144.
@article{4931760fed5240c6beb460b00dd7a2fa,
title = "Support vector machines in HTS data mining: Type I MetAPs inhibition study",
abstract = "This article reports a successful application of support vector machines (SVMs) in mining high-throughput screening (HTS) data of a type I methionine aminopeptidases (MetAPs) inhibition study. A library with 43,736 small organic molecules was used in the study, and 1355 compounds in the library with 40{\%} or higher inhibition activity were considered as active. The data set was randomly split into a training set and a test set (3:1 ratio). The authors were able to rank compounds in the test set using their decision values predicted by SVM models that were built on the training set. They defined a novel score PT 50, the percentage of the test set needed to be screened to recover 50{\%} of the actives, to measure the performance of the models. With carefully selected parameters, SVM models increased the hit rates significantly, and 50{\%} of the active compounds could be recovered by screening just 7{\%} of the test set. The authors found that the size of the training set played a significant role in the performance of the models. A training set with 10,000 member compounds is likely the minimum size required to build a model with reasonable predictive power.",
keywords = "High-throughput screening, Machine learning, MetAP, Support vector machines",
author = "Jianwen Fang and Yinghua Dong and Lushington, {Gerald H.} and Qizhuang Ye and Georg, {Gunda I.}",
year = "2006",
month = "3",
doi = "10.1177/1087057105284334",
language = "English (US)",
volume = "11",
pages = "138--144",
journal = "Journal of Biomolecular Screening",
issn = "1087-0571",
publisher = "SAGE Publications Inc.",
number = "2",

}

TY  - JOUR
T1  - Support vector machines in HTS data mining: Type I MetAPs inhibition study
AU  - Fang, Jianwen
AU  - Dong, Yinghua
AU  - Lushington, Gerald H.
AU  - Ye, Qizhuang
AU  - Georg, Gunda I.
PY  - 2006/3
Y1  - 2006/3
AB  - This article reports a successful application of support vector machines (SVMs) in mining high-throughput screening (HTS) data of a type I methionine aminopeptidases (MetAPs) inhibition study. A library with 43,736 small organic molecules was used in the study, and 1355 compounds in the library with 40% or higher inhibition activity were considered as active. The data set was randomly split into a training set and a test set (3:1 ratio). The authors were able to rank compounds in the test set using their decision values predicted by SVM models that were built on the training set. They defined a novel score PT 50, the percentage of the test set needed to be screened to recover 50% of the actives, to measure the performance of the models. With carefully selected parameters, SVM models increased the hit rates significantly, and 50% of the active compounds could be recovered by screening just 7% of the test set. The authors found that the size of the training set played a significant role in the performance of the models. A training set with 10,000 member compounds is likely the minimum size required to build a model with reasonable predictive power.
KW  - High-throughput screening
KW  - Machine learning
KW  - MetAP
KW  - Support vector machines
UR  - http://www.scopus.com/inward/record.url?scp=33644934561&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33644934561&partnerID=8YFLogxK
U2  - 10.1177/1087057105284334
DO  - 10.1177/1087057105284334
M3  - Article
VL  - 11
SP  - 138
EP  - 144
JO  - Journal of Biomolecular Screening
JF  - Journal of Biomolecular Screening
SN  - 1087-0571
IS  - 2
ER  - 