Effect of Binding Pose and Modeled Structures on SVMGen and GlideScore Enrichment of Chemical Libraries

David Xu, Samy Meroueh

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Virtual screening consists of docking libraries of small molecules to a target protein followed by rank-ordering of the resulting structures using scoring functions. The ability of scoring methods to distinguish between actives and inactives depends on several factors, including the accuracy of the binding pose during the docking step and the quality of the three-dimensional structure of the target. Here, we build on our previous work to introduce a new scoring approach (SVMGen) that uses machine learning trained with features from statistical pair potentials obtained from three-dimensional crystal structures. We use SVMGen and GlideScore to explore how enrichment or rank-ordering is affected by binding pose accuracy. To that end, we create a validation set that consists strictly of proteins whose crystal structure was solved in complex with their inhibitors. For the rank-ordering studies, we use crystal structures from PDBbind along with corresponding binding affinity data provided in the database. In addition to binding pose, we investigate the effect of using modeled structures for the target on the enrichment performance of SVMGen and GlideScore. To accomplish this, we generate homology models for protein kinases in DUD-E for which crystal structures are available, enabling comparison of enrichment between modeled and crystal structures. We also generate homology models for kinases in SARfari for which there are many known small-molecule inhibitors but no known crystal structure. These models are used to assess the ability of SVMGen and GlideScore to distinguish between actives and decoys. We focus our work on protein kinases considering the wealth of structural and binding affinity data that exists for this family of proteins.
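The abstract does not define how enrichment is measured. As a rough illustration only, the sketch below computes the commonly used enrichment factor: the proportion of actives in the top-ranked fraction of a library divided by the proportion of actives in the whole library. This is not taken from the paper. The function name, parameters, and the toy scores and labels are all hypothetical. The `higher_is_better` flag is included because some scoring functions, docking scores such as GlideScore among them, treat more negative values as better.

```python
def enrichment_factor(scores, is_active, fraction=0.1, higher_is_better=True):
    """Enrichment factor of a ranked library at a given top fraction.

    EF = (actives in top fraction / size of top fraction)
         / (total actives / library size)
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for flag in is_active if flag)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Rank compound indices from best score to worst score.
    order = sorted(range(n_total), key=lambda i: scores[i],
                   reverse=higher_is_better)

    # Size of the top-ranked slice, at least one compound.
    n_top = max(1, round(n_total * fraction))
    hits = sum(1 for i in order[:n_top] if is_active[i])

    return (hits / n_top) / (n_actives / n_total)


# Hypothetical toy library: 10 compounds, 2 of them active.
scores = [9.1, 8.7, 8.2, 7.9, 6.5, 6.1, 5.8, 5.0, 4.4, 3.9]
labels = [True, False, True, False, False, False, False, False, False, False]

ef20 = enrichment_factor(scores, labels, fraction=0.2)
print(f"EF at 20%: {ef20:.2f}")
```

An enrichment factor above 1 means the scoring function places actives near the top of the ranked list more often than random selection would. A value of 1 corresponds to random ranking.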

Original language: English (US)
Pages (from-to): 1139-1151
Number of pages: 13
Journal: Journal of Chemical Information and Modeling
Volume: 56
Issue number: 6
DOI: 10.1021/acs.jcim.5b00709
State: Published - Jun 27 2016

Fingerprint

Crystal structure
Proteins
Protein Kinases
Molecules
Learning systems
Screening
Phosphotransferases
ability
learning
performance

ASJC Scopus subject areas

  • Chemistry(all)
  • Chemical Engineering(all)
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Effect of Binding Pose and Modeled Structures on SVMGen and GlideScore Enrichment of Chemical Libraries. / Xu, David; Meroueh, Samy.

In: Journal of Chemical Information and Modeling, Vol. 56, No. 6, 27.06.2016, p. 1139-1151.

@article{67e35353f4814024aa5398a68ca48282,
title = "Effect of Binding Pose and Modeled Structures on SVMGen and GlideScore Enrichment of Chemical Libraries",
abstract = "Virtual screening consists of docking libraries of small molecules to a target protein followed by rank-ordering of the resulting structures using scoring functions. The ability of scoring methods to distinguish between actives and inactives depends on several factors that include the accuracy of the binding pose during the docking step and the quality of the three-dimensional structure of the target. Here, we build on our previous work to introduce a new scoring approach (SVMGen) that uses machine learning trained with features from statistical pair potentials obtained from three-dimensional crystal structures. We use SVMGen and GlideScore to explore how enrichment or rank-ordering is affected by binding pose accuracy. To that end, we create a validation set that consists strictly of proteins whose crystal structure was solved in complex with their inhibitors. For the rank-ordering studies, we use crystal structures from PDBbind along with corresponding binding affinity data provided in the database. In addition to binding pose, we investigate the effect of using modeled structures for the target on the enrichment performance of SVMGen and GlideScore. To accomplish this, we generated homology models for protein kinases in DUD-E for which crystal structures are available to enable comparison of enrichment between modeled and crystal structure. We also generate homology models for kinases in SARfari for which there are many known small-molecule inhibitors but no known crystal structure. These models are used to assess the ability of SVMGen and GlideScore to distinguish between actives and decoys. We focus our work on protein kinases considering the wealth of structural and binding affinity data that exists for this family of proteins.",
author = "David Xu and Samy Meroueh",
year = "2016",
month = "6",
day = "27",
doi = "10.1021/acs.jcim.5b00709",
language = "English (US)",
volume = "56",
pages = "1139--1151",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "6",

}

TY - JOUR

T1 - Effect of Binding Pose and Modeled Structures on SVMGen and GlideScore Enrichment of Chemical Libraries

AU - Xu, David

AU - Meroueh, Samy

PY - 2016/6/27

Y1 - 2016/6/27

AB - Virtual screening consists of docking libraries of small molecules to a target protein followed by rank-ordering of the resulting structures using scoring functions. The ability of scoring methods to distinguish between actives and inactives depends on several factors that include the accuracy of the binding pose during the docking step and the quality of the three-dimensional structure of the target. Here, we build on our previous work to introduce a new scoring approach (SVMGen) that uses machine learning trained with features from statistical pair potentials obtained from three-dimensional crystal structures. We use SVMGen and GlideScore to explore how enrichment or rank-ordering is affected by binding pose accuracy. To that end, we create a validation set that consists strictly of proteins whose crystal structure was solved in complex with their inhibitors. For the rank-ordering studies, we use crystal structures from PDBbind along with corresponding binding affinity data provided in the database. In addition to binding pose, we investigate the effect of using modeled structures for the target on the enrichment performance of SVMGen and GlideScore. To accomplish this, we generated homology models for protein kinases in DUD-E for which crystal structures are available to enable comparison of enrichment between modeled and crystal structure. We also generate homology models for kinases in SARfari for which there are many known small-molecule inhibitors but no known crystal structure. These models are used to assess the ability of SVMGen and GlideScore to distinguish between actives and decoys. We focus our work on protein kinases considering the wealth of structural and binding affinity data that exists for this family of proteins.

UR - http://www.scopus.com/inward/record.url?scp=84976370554&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84976370554&partnerID=8YFLogxK

U2 - 10.1021/acs.jcim.5b00709

DO - 10.1021/acs.jcim.5b00709

M3 - Article

C2 - 27154487

AN - SCOPUS:84976370554

VL - 56

SP - 1139

EP - 1151

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 6

ER -