DrugMetab: An Integrated Machine Learning and Lexicon Mapping Named Entity Recognition Method for Drug Metabolite

Heng Yi Wu, Deshun Lu, Mustafa Hyder, Shijun Zhang, Sara Quinney, Zeruesenay Desta, Lang Li

Research output: Contribution to journal (Article)

Abstract

Drug metabolites (DMs) are critical in pharmacology research areas such as drug metabolism pathways and drug-drug interactions. However, no terminology dictionary contains a comprehensive set of drug metabolite names, and no named entity recognition (NER) algorithm focuses on drug metabolite identification. In this article, we developed a novel NER system, DrugMetab, to identify DMs in PubMed abstracts. DrugMetab uses features derived from part-of-speech tags, a drug index, and prefixes/suffixes, and identifies DMs within context. To evaluate performance, a gold-standard corpus was manually constructed. On this task, DrugMetab with a sequential minimal optimization (SMO) classifier achieved 0.89 precision, 0.77 recall, and 0.83 F-measure on the internal testing set, and 0.86 precision, 0.85 recall, and 0.86 F-measure on the external validation set. We further compared DrugMetab with whatizitChemical, which was designed to identify small molecules or chemical entities. DrugMetab outperformed whatizitChemical, which had a lower recall of 0.65.
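
The relationship between the reported precision, recall, and F-measure values can be checked with a short sketch. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and the dictionary labels are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1; the source does not say so.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported (already rounded) in the abstract.
reported = {
    "internal testing set": (0.89, 0.77),
    "external validation set": (0.86, 0.85),
}

for name, (p, r) in reported.items():
    print(f"{name}: F = {f_measure(p, r):.2f}")
```

For the internal testing set this reproduces the reported 0.83. For the external validation set the rounded inputs give 0.85 rather than the reported 0.86; a plausible explanation is that the published value was computed from unrounded precision and recall, but the abstract does not confirm this.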

Original language: English (US)
Journal: CPT: Pharmacometrics and Systems Pharmacology
DOI: 10.1002/psp4.12340
State: Accepted/In press - Jan 1 2018

Fingerprint

Named Entity Recognition
Metabolites
Learning systems
Machine Learning
Drugs
Pharmaceutical Preparations
Drug interactions
Terminology
Glossaries
Metabolism
Classifiers
Gold
Pharmacology
Suffix
Drug Interactions
PubMed
Molecules
Recognition Algorithm
Names
Testing

ASJC Scopus subject areas

  • Modeling and Simulation
  • Pharmacology (medical)

Cite this

DrugMetab: An Integrated Machine Learning and Lexicon Mapping Named Entity Recognition Method for Drug Metabolite. / Wu, Heng Yi; Lu, Deshun; Hyder, Mustafa; Zhang, Shijun; Quinney, Sara; Desta, Zeruesenay; Li, Lang.

In: CPT: Pharmacometrics and Systems Pharmacology, 01.01.2018.

@article{92939aab17994519809053fc2aad9aaf,
title = "DrugMetab: An Integrated Machine Learning and Lexicon Mapping Named Entity Recognition Method for Drug Metabolite",
abstract = "Drug metabolites (DMs) are critical in pharmacology research areas, such as drug metabolism pathways and drug-drug interactions. However, there is no terminology dictionary containing comprehensive drug metabolite names, and there is no named entity recognition (NER) algorithm focusing on drug metabolite identification. In this article, we developed a novel NER system, DrugMetab, to identify DMs from the PubMed abstracts. DrugMetab utilizes the features characterized from the Part-of-Speech, drug index, and pre/suffix, and determines DMs within context. To evaluate the performance, a gold-standard corpus was manually constructed. In this task, DrugMetab with sequential minimal optimization (SMO) classifier achieves 0.89 precision, 0.77 recall, and 0.83 F-measure in the internal testing set; and 0.86 precision, 0.85 recall, and 0.86 F-measure in the external validation set. We further compared the performance between DrugMetab and whatizitChemical, which was designed for identifying small molecules or chemical entities. DrugMetab outperformed whatizitChemical, which had a lower recall rate of 0.65.",
author = "Wu, {Heng Yi} and Deshun Lu and Mustafa Hyder and Shijun Zhang and Sara Quinney and Zeruesenay Desta and Lang Li",
year = "2018",
month = "1",
day = "1",
doi = "10.1002/psp4.12340",
language = "English (US)",
journal = "CPT: Pharmacometrics and Systems Pharmacology",
issn = "2163-8306",
publisher = "Nature Publishing Group",
}

TY - JOUR

T1 - DrugMetab

T2 - An Integrated Machine Learning and Lexicon Mapping Named Entity Recognition Method for Drug Metabolite

AU - Wu, Heng Yi

AU - Lu, Deshun

AU - Hyder, Mustafa

AU - Zhang, Shijun

AU - Quinney, Sara

AU - Desta, Zeruesenay

AU - Li, Lang

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Drug metabolites (DMs) are critical in pharmacology research areas, such as drug metabolism pathways and drug-drug interactions. However, there is no terminology dictionary containing comprehensive drug metabolite names, and there is no named entity recognition (NER) algorithm focusing on drug metabolite identification. In this article, we developed a novel NER system, DrugMetab, to identify DMs from the PubMed abstracts. DrugMetab utilizes the features characterized from the Part-of-Speech, drug index, and pre/suffix, and determines DMs within context. To evaluate the performance, a gold-standard corpus was manually constructed. In this task, DrugMetab with sequential minimal optimization (SMO) classifier achieves 0.89 precision, 0.77 recall, and 0.83 F-measure in the internal testing set; and 0.86 precision, 0.85 recall, and 0.86 F-measure in the external validation set. We further compared the performance between DrugMetab and whatizitChemical, which was designed for identifying small molecules or chemical entities. DrugMetab outperformed whatizitChemical, which had a lower recall rate of 0.65.

UR - http://www.scopus.com/inward/record.url?scp=85054162089&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054162089&partnerID=8YFLogxK

U2 - 10.1002/psp4.12340

DO - 10.1002/psp4.12340

M3 - Article

C2 - 30033622

AN - SCOPUS:85054162089

JO - CPT: Pharmacometrics and Systems Pharmacology

JF - CPT: Pharmacometrics and Systems Pharmacology

SN - 2163-8306

ER -