Exploring alternative knowledge representations for protein secondary-structure prediction

Uros Midic, A. Dunker, Zoran Obradovic

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Methods for 3-class secondary-structure prediction are thought to be reaching the highest achievable accuracy. Their accuracy on the β-sheet residue class is considerably lower than on the other two classes. We analysed the relevance of 315 individual input attributes for a predictor that uses the usual framework of sequence-profile based data with an input window of fixed size. We propose two alternative knowledge representations with significantly smaller sets of input attributes. We also investigated whether the prediction of connected pairs of β-sheet residues and the prediction of residue contact maps can be exploited to improve the accuracy of secondary-structure prediction.
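As an illustration of the fixed-size input window mentioned in the abstract, the sketch below builds one flattened attribute vector per residue from a sequence profile. The abstract gives only the total of 315 attributes; the split into a 15-residue window with 21 values per position (20 profile columns plus one flag marking positions beyond the chain ends) is an assumption chosen because 15 × 21 = 315, and the names `window_features`, `WINDOW` and `N_PROFILE` are hypothetical. It is not the authors' implementation.

```python
from typing import List

WINDOW = 15      # assumed window width; the abstract does not state it
N_PROFILE = 20   # assumed: one profile column per amino-acid type


def window_features(profile: List[List[float]],
                    window: int = WINDOW) -> List[List[float]]:
    """Return one flattened window of attributes per residue.

    profile: one row of N_PROFILE scores per residue.
    Each window position contributes N_PROFILE + 1 values; the extra
    value is 1.0 for positions that fall outside the chain, else 0.0.
    """
    half = window // 2
    # Padding row used for positions before the first or after the last residue.
    pad = [0.0] * N_PROFILE + [1.0]
    # Real residues get the extra flag set to 0.0.
    rows = [list(scores) + [0.0] for scores in profile]
    padded = [pad] * half + rows + [pad] * half

    features = []
    for i in range(len(profile)):
        flat: List[float] = []
        # The window centred on residue i spans padded[i : i + window].
        for position in padded[i:i + window]:
            flat.extend(position)
        features.append(flat)
    return features


if __name__ == "__main__":
    # Toy profile for a 30-residue chain.
    toy = [[float(i * N_PROFILE + j) for j in range(N_PROFILE)] for i in range(30)]
    feats = window_features(toy)
    print(len(feats), len(feats[0]))
```

Under these assumptions every residue is described by 315 values, which is the kind of attribute set whose individual relevance the abstract reports analysing.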

Original language: English
Pages (from-to): 286-313
Number of pages: 28
Journal: International Journal of Data Mining and Bioinformatics
Volume: 1
Issue number: 3
DOI: 10.1504/IJDMB.2007.011614
State: Published - 2007

Fingerprint

Secondary Protein Structure
Knowledge representation
Proteins
contact

Keywords

  • Bioinformatics
  • Data mining
  • Feature selection
  • Knowledge representation
  • Machine learning
  • Protein folding
  • Protein structure prediction
  • Sensitivity analysis

ASJC Scopus subject areas

  • Library and Information Sciences
  • Information Systems
  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

Exploring alternative knowledge representations for protein secondary-structure prediction. / Midic, Uros; Dunker, A.; Obradovic, Zoran.

In: International Journal of Data Mining and Bioinformatics, Vol. 1, No. 3, 2007, p. 286-313.

@article{23df879a264545218d5cc0f563719b46,
title = "Exploring alternative knowledge representations for protein secondary-structure prediction",
abstract = "Methods for 3-class secondary-structure prediction are thought to be reaching the highest achievable accuracy. Their accuracy on β-sheet residue class is considerably lower than for the other two classes. We analysed the relevance of 315 individual input attributes for a predictor with the usual framework of using sequence-profile based data with an input window of fixed size. We propose two alternative knowledge representations with significantly smaller sets of input attributes. We also investigated the possibility of exploiting the prediction of connected pairs of β-sheet residues and the prediction of residue contact maps for the improvement of accuracy of secondary-structure prediction.",
keywords = "Bioinformatics, Data mining, Feature selection, Knowledge representation, Machine learning, Protein folding, Protein structure prediction, Sensitivity analysis",
author = "Uros Midic and A. Dunker and Zoran Obradovic",
year = "2007",
doi = "10.1504/IJDMB.2007.011614",
language = "English",
volume = "1",
pages = "286--313",
journal = "International Journal of Data Mining and Bioinformatics",
issn = "1748-5673",
publisher = "Inderscience Enterprises Ltd",
number = "3",
}

TY  - JOUR
T1  - Exploring alternative knowledge representations for protein secondary-structure prediction
AU  - Midic, Uros
AU  - Dunker, A.
AU  - Obradovic, Zoran
PY  - 2007
Y1  - 2007
N2  - Methods for 3-class secondary-structure prediction are thought to be reaching the highest achievable accuracy. Their accuracy on β-sheet residue class is considerably lower than for the other two classes. We analysed the relevance of 315 individual input attributes for a predictor with the usual framework of using sequence-profile based data with an input window of fixed size. We propose two alternative knowledge representations with significantly smaller sets of input attributes. We also investigated the possibility of exploiting the prediction of connected pairs of β-sheet residues and the prediction of residue contact maps for the improvement of accuracy of secondary-structure prediction.
AB  - Methods for 3-class secondary-structure prediction are thought to be reaching the highest achievable accuracy. Their accuracy on β-sheet residue class is considerably lower than for the other two classes. We analysed the relevance of 315 individual input attributes for a predictor with the usual framework of using sequence-profile based data with an input window of fixed size. We propose two alternative knowledge representations with significantly smaller sets of input attributes. We also investigated the possibility of exploiting the prediction of connected pairs of β-sheet residues and the prediction of residue contact maps for the improvement of accuracy of secondary-structure prediction.
KW  - Bioinformatics
KW  - Data mining
KW  - Feature selection
KW  - Knowledge representation
KW  - Machine learning
KW  - Protein folding
KW  - Protein structure prediction
KW  - Sensitivity analysis
UR  - http://www.scopus.com/inward/record.url?scp=44249085445&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=44249085445&partnerID=8YFLogxK
U2  - 10.1504/IJDMB.2007.011614
DO  - 10.1504/IJDMB.2007.011614
M3  - Article
VL  - 1
SP  - 286
EP  - 313
JO  - International Journal of Data Mining and Bioinformatics
JF  - International Journal of Data Mining and Bioinformatics
SN  - 1748-5673
IS  - 3
ER  -