Identification of biological relationships from text documents using efficient computational methods.

Mathew Palakal, Matthew Stephens, Snehasis Mukhopadhyay, Rajeev Raje, Simon Rhodes

Research output: Contribution to journal › Article

21 Scopus citations

Abstract

Biological literature databases continue to grow rapidly, accumulating information that is vital to sound biomedical research and development. Manually searching these databases and extracting pertinent knowledge is tedious and time-consuming, even for motivated researchers. Accurate, computationally efficient methods for discovering relationships between biological objects in text documents are therefore important to biologists building biological models. Here, an "object" is any biological entity, such as a protein, gene, or cell cycle, and a "relationship" is either a dynamic action one object has on another (e.g., one protein inhibiting another) or one object belonging to another (e.g., the cells composing an organ). This paper presents a novel approach to extracting relationships among multiple biological objects present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-gram models form the framework for this complex extraction task. Experiments were carried out on a corpus of one thousand Medline abstracts, and intermediate results were obtained for object identification, synonym discovery, and relationship extraction. From the thousand abstracts, 53 relationships were extracted, of which 43 were correct, a precision of 81 percent. These results are promising for multi-object identification and relationship finding in biological documents.
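As a rough illustration of how an HMM can drive the object identification step, the sketch below tags each token as part of an object name (`OBJ`) or not (`O`) using Viterbi decoding, then pairs the nearest tagged objects on either side of a relation verb. Everything here is an assumption for illustration: the two-state model, its hand-set probabilities, the character-based emission function, the relation-verb set, and the helper names `viterbi` and `extract_relations` are hypothetical and are not taken from the paper. The paper's dictionaries, N-gram models, reference resolution, and synonym discovery are not shown.

```python
import math

# Hypothetical two-state HMM for object identification (illustrative values only):
# "OBJ" = token belongs to a biological object name, "O" = any other token.
STATES = ("OBJ", "O")
START = {"OBJ": 0.3, "O": 0.7}
TRANS = {
    "OBJ": {"OBJ": 0.4, "O": 0.6},
    "O": {"OBJ": 0.3, "O": 0.7},
}

# Hypothetical relation dictionary; not the dictionaries used in the paper.
RELATION_VERBS = {"inhibits", "activates", "binds"}


def emission(state, token):
    """Toy emission probability: gene/protein names often contain digits or capitals."""
    name_like = any(c.isdigit() for c in token) or token[:1].isupper()
    if state == "OBJ":
        return 0.8 if name_like else 0.05
    return 0.1 if name_like else 0.9


def viterbi(tokens):
    """Return the most probable OBJ/O tag sequence for tokens (log-space Viterbi)."""
    if not tokens:
        return []
    # Each column maps state -> (best log score of a path ending in that state, previous state).
    columns = [{
        s: (math.log(START[s]) + math.log(emission(s, tokens[0])), None)
        for s in STATES
    }]
    for token in tokens[1:]:
        prev_col = columns[-1]
        col = {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev_col[p][0] + math.log(TRANS[p][s]))
            score = prev_col[best_prev][0] + math.log(TRANS[best_prev][s])
            col[s] = (score + math.log(emission(s, token)), best_prev)
        columns.append(col)
    # Backtrack from the highest-scoring final state using the stored back-pointers.
    state = max(STATES, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


def extract_relations(sentence):
    """Pair the nearest tagged objects on either side of each relation verb."""
    tokens = sentence.rstrip(".").split()
    tags = viterbi(tokens)
    triples = []
    for i, token in enumerate(tokens):
        if token.lower() not in RELATION_VERBS:
            continue
        left = [tokens[j] for j in range(i) if tags[j] == "OBJ"]
        right = [tokens[j] for j in range(i + 1, len(tokens)) if tags[j] == "OBJ"]
        if left and right:
            triples.append((left[-1], token.lower(), right[0]))
    return triples


print(extract_relations("MDM2 inhibits p53 in vivo."))  # [('MDM2', 'inhibits', 'p53')]
```

A practical system would estimate the transition and emission probabilities from annotated text rather than setting them by hand, and would handle multi-token names and sentences with several candidate objects. For scale, the reported result of 43 correct relationships out of 53 extracted corresponds to 43/53 ≈ 0.811.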

Original language: English (US)
Pages (from-to): 307-342
Number of pages: 36
Journal: Journal of Bioinformatics and Computational Biology
Volume: 1
Issue number: 2
State: Published - Jul 2003

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
