Virtual screening consists of docking libraries of small molecules to a target protein, followed by rank-ordering of the resulting structures using scoring functions. The ability of scoring methods to distinguish between actives and inactives depends on several factors, including the accuracy of the binding pose produced in the docking step and the quality of the three-dimensional structure of the target. Here, we build on our previous work to introduce a new scoring approach (SVMGen) that uses machine learning trained with features from statistical pair potentials obtained from three-dimensional crystal structures. We use SVMGen and GlideScore to explore how enrichment and rank-ordering are affected by binding pose accuracy. To that end, we create a validation set that consists strictly of proteins whose crystal structures were solved in complex with their inhibitors. For the rank-ordering studies, we use crystal structures from PDBbind along with the corresponding binding affinity data provided in the database. In addition to binding pose, we investigate the effect of using modeled structures for the target on the enrichment performance of SVMGen and GlideScore. To accomplish this, we generate homology models for protein kinases in DUD-E for which crystal structures are available, enabling comparison of enrichment between modeled and crystal structures. We also generate homology models for kinases in SARfari for which there are many known small-molecule inhibitors but no known crystal structure. These models are used to assess the ability of SVMGen and GlideScore to distinguish between actives and decoys. We focus our work on protein kinases given the wealth of structural and binding affinity data that exists for this family of proteins.
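As an illustration of what "enrichment" measures when actives are ranked against decoys, the following minimal Python sketch computes two commonly used quantities: the enrichment factor in the top fraction of a ranked library and the ROC AUC. It is not the SVMGen or GlideScore implementation, and the abstract does not state which enrichment metric was used. The function names `enrichment_factor` and `roc_auc` are hypothetical, and the sketch assumes that a higher score means a better-ranked compound (docking scores for which more negative is better would need to be negated first).

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor in the top `fraction` of the ranked library.

    Assumption: higher score = predicted better binder.
    EF = (active rate in the top slice) / (active rate in the whole library).
    """
    n_total = len(scores)
    n_actives = sum(bool(a) for a in is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    # Size of the top slice, never smaller than one compound.
    n_top = max(1, round(n_total * fraction))
    # Rank compounds from best to worst score.
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    hits_in_top = sum(bool(a) for _, a in ranked[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random active outscores a random
    decoy, counting ties as one half (pairwise O(n_actives * n_decoys))."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in actives for d in decoys)
    return wins / (len(actives) * len(decoys))


if __name__ == "__main__":
    # Toy library of ten compounds with made-up scores; three are actives.
    toy_scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
    toy_labels = [True, True, False, False, True,
                  False, False, False, False, False]
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.2))
    print(roc_auc(toy_scores, toy_labels))
```

An enrichment factor above 1 indicates that actives are concentrated near the top of the ranking relative to random selection, and a ROC AUC of 0.5 corresponds to random ranking. Running the same two functions on scores from a crystal structure and from a homology model of the same target would give one simple way to compare the two, under the assumptions stated above.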
ASJC Scopus subject areas
- Chemical Engineering(all)
- Computer Science Applications
- Library and Information Sciences