Identification of biological relationships from text documents using efficient computational methods.

Mathew Palakal, Matthew Stephens, Snehasis Mukhopadhyay, Rajeev Raje, Simon Rhodes

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

The biological literature databases continue to grow rapidly, accumulating information vital to sound biomedical research and development. Manually searching these databases and extracting pertinent knowledge is tedious and time-consuming, even for motivated researchers. Accurate, computationally efficient approaches for discovering relationships between biological objects in text documents are therefore important for biologists developing biological models. Here, the term "object" refers to any biological entity, such as a protein, gene, or cell cycle, and "relationship" refers either to a dynamic action one object has on another (e.g., one protein inhibiting another) or to one object belonging to another (e.g., the cells composing an organ). This paper presents a novel approach to extracting relationships among multiple biological objects present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-Gram models provide the framework for the complex task of extracting object-object relationships. Experiments were carried out on a corpus of one thousand Medline abstracts, and intermediate results were obtained for object identification, synonym discovery, and, finally, relationship extraction. From the thousand abstracts, 53 relationships were extracted, of which 43 were correct, giving a precision of 81 percent. These results are promising for multi-object identification and relationship finding in biological documents.
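The abstract names the building blocks (HMMs, dictionaries, N-gram models) but not their parameters, so the following is only a toy Python sketch of how an HMM tagger and a dictionary of relation verbs might be combined to extract object-object relationships. Everything in it is an assumption for illustration: the two-state tag set `OBJ`/`OTHER`, the hand-set probabilities, the coarse token-class observations, the hypothetical `OBJECT_DICT` and `RELATION_VERBS` dictionaries, and the nearest-object pairing rule. It omits reference resolution, synonym discovery, and the N-gram component, and it is not the authors' method. The last function reproduces the reported arithmetic: 43 correct out of 53 extracted relationships is 43/53 ≈ 0.81.

```python
import math

# Hypothetical two-state tag set: biological object vs. any other token.
STATES = ("OBJ", "OTHER")

# Hypothetical hand-set HMM parameters; a real system would estimate them from annotated text.
START = {"OBJ": 0.5, "OTHER": 0.5}
TRANS = {"OBJ": {"OBJ": 0.3, "OTHER": 0.7},
         "OTHER": {"OBJ": 0.4, "OTHER": 0.6}}
# Observations are coarse token classes (NAME / WORD), not raw words, to keep the model tiny.
EMIT = {"OBJ": {"NAME": 0.9, "WORD": 0.1},
        "OTHER": {"NAME": 0.1, "WORD": 0.9}}

# Hypothetical dictionaries: known object names and relation verbs (mapped to a base form).
OBJECT_DICT = {"insulin", "cyclin"}
RELATION_VERBS = {"inhibits": "inhibit", "activates": "activate",
                  "binds": "bind", "phosphorylates": "phosphorylate"}


def token_class(token):
    """Map a token to an observation symbol: NAME if it looks like a gene/protein name."""
    looks_like_name = (token.lower() in OBJECT_DICT
                       or any(ch.isdigit() for ch in token)
                       or (token.isupper() and len(token) > 1))
    return "NAME" if looks_like_name else "WORD"


def viterbi(obs):
    """Return the most likely hidden-state sequence for a list of observation symbols."""
    if not obs:
        return []
    # Each trellis cell holds (log probability of best path ending here, previous state).
    trellis = [{s: (math.log(START[s]) + math.log(EMIT[s][obs[0]]), None) for s in STATES}]
    for symbol in obs[1:]:
        prev_row, row = trellis[-1], {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prev_row[p][0] + math.log(TRANS[p][s]))
            score = prev_row[best_prev][0] + math.log(TRANS[best_prev][s])
            row[s] = (score + math.log(EMIT[s][symbol]), best_prev)
        trellis.append(row)
    # Backtrack from the highest-scoring final state.
    path = [max(STATES, key=lambda s: trellis[-1][s][0])]
    for row in reversed(trellis[1:]):
        path.append(row[path[-1]][1])
    return path[::-1]


def extract_relations(sentence):
    """Pair the nearest tagged objects on either side of each relation verb."""
    tokens = [t.strip(".,;:()") for t in sentence.split()]
    tokens = [t for t in tokens if t]
    tags = viterbi([token_class(t) for t in tokens])
    objects = [i for i, tag in enumerate(tags) if tag == "OBJ"]
    relations = []
    for k, tok in enumerate(tokens):
        verb = RELATION_VERBS.get(tok.lower())
        if verb is None:
            continue
        left = [i for i in objects if i < k]
        right = [i for i in objects if i > k]
        if left and right:
            relations.append((tokens[left[-1]], verb, tokens[right[0]]))
    return relations


def precision(correct, extracted):
    """Fraction of extracted relationships that are correct."""
    return correct / extracted


if __name__ == "__main__":
    print(extract_relations("MDM2 inhibits p53 and p21 activates cdk2."))
    # [('MDM2', 'inhibit', 'p53'), ('p21', 'activate', 'cdk2')]
    print(round(precision(43, 53), 2))  # 0.81
```

Working in log space keeps the Viterbi scores from underflowing on long sentences; with the emission probabilities chosen here, the tagger simply follows the token classes, so a trained model would be needed to handle ambiguous names.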

Original language: English
Pages (from-to): 307-342
Number of pages: 36
Journal: Journal of Bioinformatics and Computational Biology
Volume: 1
Issue number: 2
State: Published - Jul 2003

Fingerprint

Computational methods
Proteins
cdc Genes
Biological Models
Cell Cycle Proteins
Hidden Markov models
Glossaries
Ontology
Biomedical Research
Genes
Cells
Research Personnel
Acoustic waves
Databases
Research
Object Attachment
Experiments

ASJC Scopus subject areas

  • Medicine(all)
  • Cell Biology

Cite this

Identification of biological relationships from text documents using efficient computational methods. / Palakal, Mathew; Stephens, Matthew; Mukhopadhyay, Snehasis; Raje, Rajeev; Rhodes, Simon.

In: Journal of Bioinformatics and Computational Biology, Vol. 1, No. 2, 07.2003, p. 307-342.

Palakal, Mathew ; Stephens, Matthew ; Mukhopadhyay, Snehasis ; Raje, Rajeev ; Rhodes, Simon. / Identification of biological relationships from text documents using efficient computational methods. In: Journal of Bioinformatics and Computational Biology. 2003 ; Vol. 1, No. 2. pp. 307-342.
@article{56fc4deaef6d4a39adfb4816b7e7c279,
title = "Identification of biological relationships from text documents using efficient computational methods.",
abstract = "The biological literature databases continue to grow rapidly, accumulating information vital to sound biomedical research and development. Manually searching these databases and extracting pertinent knowledge is tedious and time-consuming, even for motivated researchers. Accurate, computationally efficient approaches for discovering relationships between biological objects in text documents are therefore important for biologists developing biological models. Here, the term {"}object{"} refers to any biological entity, such as a protein, gene, or cell cycle, and {"}relationship{"} refers either to a dynamic action one object has on another (e.g., one protein inhibiting another) or to one object belonging to another (e.g., the cells composing an organ). This paper presents a novel approach to extracting relationships among multiple biological objects present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-Gram models provide the framework for the complex task of extracting object-object relationships. Experiments were carried out on a corpus of one thousand Medline abstracts, and intermediate results were obtained for object identification, synonym discovery, and, finally, relationship extraction. From the thousand abstracts, 53 relationships were extracted, of which 43 were correct, giving a precision of 81 percent. These results are promising for multi-object identification and relationship finding in biological documents.",
author = "Mathew Palakal and Matthew Stephens and Snehasis Mukhopadhyay and Rajeev Raje and Simon Rhodes",
year = "2003",
month = jul,
language = "English",
volume = "1",
pages = "307--342",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "2",

}

TY - JOUR

T1 - Identification of biological relationships from text documents using efficient computational methods.

AU - Palakal, Mathew

AU - Stephens, Matthew

AU - Mukhopadhyay, Snehasis

AU - Raje, Rajeev

AU - Rhodes, Simon

PY - 2003/7

Y1 - 2003/7

AB - The biological literature databases continue to grow rapidly, accumulating information vital to sound biomedical research and development. Manually searching these databases and extracting pertinent knowledge is tedious and time-consuming, even for motivated researchers. Accurate, computationally efficient approaches for discovering relationships between biological objects in text documents are therefore important for biologists developing biological models. Here, the term "object" refers to any biological entity, such as a protein, gene, or cell cycle, and "relationship" refers either to a dynamic action one object has on another (e.g., one protein inhibiting another) or to one object belonging to another (e.g., the cells composing an organ). This paper presents a novel approach to extracting relationships among multiple biological objects present in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-Gram models provide the framework for the complex task of extracting object-object relationships. Experiments were carried out on a corpus of one thousand Medline abstracts, and intermediate results were obtained for object identification, synonym discovery, and, finally, relationship extraction. From the thousand abstracts, 53 relationships were extracted, of which 43 were correct, giving a precision of 81 percent. These results are promising for multi-object identification and relationship finding in biological documents.

UR - http://www.scopus.com/inward/record.url?scp=16644365679&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=16644365679&partnerID=8YFLogxK

M3 - Article

C2 - 15290775

AN - SCOPUS:16644365679

VL - 1

SP - 307

EP - 342

JO - Journal of Bioinformatics and Computational Biology

JF - Journal of Bioinformatics and Computational Biology

SN - 0219-7200

IS - 2

ER -