Literature mining on pharmacokinetics numerical data

A feasibility study

Zhiping Wang, Seongho Kim, Sara Quinney, Yingying Guo, Stephen D. Hall, Luis M. Rocha, Lang Li

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

A feasibility study of literature mining is conducted on drug pharmacokinetic (PK) parameter numerical data with a sequential mining strategy. First, an entity template library is built to retrieve pharmacokinetics-relevant articles. Then a set of tagging and extraction rules is applied to retrieve PK data from the article abstracts. To estimate the PK parameter population-average mean and between-study variance, a linear mixed meta-analysis model and an E-M algorithm are developed to describe the probability distributions of PK parameters. Finally, a cross-validation procedure is developed to ascertain false-positive mining results. Using this approach to mine midazolam (MDZ) PK data, an 88% precision rate and 92% recall rate are achieved, with an F-score of 90%. It greatly outperforms a conventional data mining approach (support vector machine), which has an F-score of 68.1%. Further investigation of 7 more drugs reveals comparable performance of our sequential mining approach.
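The two quantitative steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the model's exact form, so the code assumes the simplest random-effects setup (each study reports one estimate with a known within-study variance), and the names `f_score` and `em_random_effects` and the toy numbers are hypothetical. The first function checks the reported F-score as the harmonic mean of precision and recall; the second estimates a population-average mean and between-study variance by E-M.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the balanced F-score)."""
    return 2 * precision * recall / (precision + recall)


def em_random_effects(y, s2, n_iter=500, tol=1e-10):
    """E-M for an assumed model y_i = mu + b_i + e_i,
    with b_i ~ N(0, tau2) and e_i ~ N(0, s2[i]), s2[i] known.

    Returns (mu, tau2): population-average mean and between-study variance.
    """
    n = len(y)
    mu = sum(y) / n
    # Start from the raw spread of the study estimates (kept strictly positive).
    tau2 = max(sum((yi - mu) ** 2 for yi in y) / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior mean and variance of each study-level true value.
        shrink = [tau2 / (tau2 + v) for v in s2]
        post_mean = [mu + b * (yi - mu) for b, yi in zip(shrink, y)]
        post_var = [b * v for b, v in zip(shrink, s2)]
        # M-step: update mu and tau2 from the expected complete-data statistics.
        new_mu = sum(post_mean) / n
        new_tau2 = sum(
            (m - new_mu) ** 2 + pv for m, pv in zip(post_mean, post_var)
        ) / n
        converged = abs(new_mu - mu) < tol and abs(new_tau2 - tau2) < tol
        mu, tau2 = new_mu, new_tau2
        if converged:
            break
    return mu, tau2


# Precision and recall as reported in the abstract for midazolam.
print(f"F-score: {f_score(0.88, 0.92):.4f}")  # 0.8996, i.e. about 90%

# Made-up toy study estimates and within-study variances (NOT data from the paper).
toy_estimates = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_variances = [0.5] * 5
mu_hat, tau2_hat = em_random_effects(toy_estimates, toy_variances)
print(f"mu = {mu_hat:.3f}, tau2 = {tau2_hat:.3f}")
```

With equal within-study variances, as in the toy input, the maximum-likelihood between-study variance is the spread of the estimates minus the within-study variance, which gives a simple way to sanity-check the iteration.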

Original language: English
Pages (from-to): 726-735
Number of pages: 10
Journal: Journal of Biomedical Informatics
Volume: 42
Issue number: 4
DOIs: 10.1016/j.jbi.2009.03.010
State: Published - Aug 2009

Fingerprint

Pharmacokinetics
Feasibility Studies
Probability distributions
Support vector machines
Data mining
Data Mining
Midazolam
Pharmaceutical Preparations
Libraries
Meta-Analysis
Population
Support Vector Machine

Keywords

  • Clearance
  • Data mining
  • Entity recognition
  • Information extraction
  • Linear mixed model
  • Midazolam
  • Pharmacokinetics

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Literature mining on pharmacokinetics numerical data: A feasibility study. / Wang, Zhiping; Kim, Seongho; Quinney, Sara; Guo, Yingying; Hall, Stephen D.; Rocha, Luis M.; Li, Lang.

In: Journal of Biomedical Informatics, Vol. 42, No. 4, 08.2009, p. 726-735.

Wang, Zhiping ; Kim, Seongho ; Quinney, Sara ; Guo, Yingying ; Hall, Stephen D. ; Rocha, Luis M. ; Li, Lang. / Literature mining on pharmacokinetics numerical data: A feasibility study. In: Journal of Biomedical Informatics. 2009 ; Vol. 42, No. 4. pp. 726-735.
@article{03756f586b764b4cae8a978b41f50862,
title = "Literature mining on pharmacokinetics numerical data: A feasibility study",
abstract = "A feasibility study of literature mining is conducted on drug pharmacokinetic (PK) parameter numerical data with a sequential mining strategy. First, an entity template library is built to retrieve pharmacokinetics-relevant articles. Then a set of tagging and extraction rules is applied to retrieve PK data from the article abstracts. To estimate the PK parameter population-average mean and between-study variance, a linear mixed meta-analysis model and an E-M algorithm are developed to describe the probability distributions of PK parameters. Finally, a cross-validation procedure is developed to ascertain false-positive mining results. Using this approach to mine midazolam (MDZ) PK data, an 88{\%} precision rate and 92{\%} recall rate are achieved, with an F-score of 90{\%}. It greatly outperforms a conventional data mining approach (support vector machine), which has an F-score of 68.1{\%}. Further investigation of 7 more drugs reveals comparable performance of our sequential mining approach.",
keywords = "Clearance, Data mining, Entity recognition, Information extraction, Linear mixed model, Midazolam, Pharmacokinetics",
author = "Zhiping Wang and Seongho Kim and Sara Quinney and Yingying Guo and Hall, {Stephen D.} and Rocha, {Luis M.} and Lang Li",
year = "2009",
month = "8",
doi = "10.1016/j.jbi.2009.03.010",
language = "English",
volume = "42",
pages = "726--735",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "4",

}

TY - JOUR

T1 - Literature mining on pharmacokinetics numerical data

T2 - A feasibility study

AU - Wang, Zhiping

AU - Kim, Seongho

AU - Quinney, Sara

AU - Guo, Yingying

AU - Hall, Stephen D.

AU - Rocha, Luis M.

AU - Li, Lang

PY - 2009/8

Y1 - 2009/8

AB - A feasibility study of literature mining is conducted on drug pharmacokinetic (PK) parameter numerical data with a sequential mining strategy. First, an entity template library is built to retrieve pharmacokinetics-relevant articles. Then a set of tagging and extraction rules is applied to retrieve PK data from the article abstracts. To estimate the PK parameter population-average mean and between-study variance, a linear mixed meta-analysis model and an E-M algorithm are developed to describe the probability distributions of PK parameters. Finally, a cross-validation procedure is developed to ascertain false-positive mining results. Using this approach to mine midazolam (MDZ) PK data, an 88% precision rate and 92% recall rate are achieved, with an F-score of 90%. It greatly outperforms a conventional data mining approach (support vector machine), which has an F-score of 68.1%. Further investigation of 7 more drugs reveals comparable performance of our sequential mining approach.

KW - Clearance

KW - Data mining

KW - Entity recognition

KW - Information extraction

KW - Linear mixed model

KW - Midazolam

KW - Pharmacokinetics

UR - http://www.scopus.com/inward/record.url?scp=67649382957&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67649382957&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2009.03.010

DO - 10.1016/j.jbi.2009.03.010

M3 - Article

VL - 42

SP - 726

EP - 735

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - 4

ER -