Feature selection filters based on the permutation test

Predrag Radivojac, Zoran Obradovic, A. Dunker, Slobodan Vucetic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Citations (Scopus)

Abstract

We investigate the problem of supervised feature selection within the filtering framework. In our approach, applicable to two-class problems, a feature's strength is inversely proportional to the p-value of the null hypothesis that its class-conditional densities, p(X|Y = 0) and p(X|Y = 1), are identical. To estimate the p-values, we use Fisher's permutation test combined with four simple filtering criteria in the role of test statistics: sample mean difference, symmetric Kullback-Leibler distance, information gain, and the chi-square statistic. The experimental results of our study, performed using the naive Bayes classifier and support vector machines, strongly indicate that the permutation test improves the above-mentioned filters and can be used effectively when the sample size is relatively small and the number of features is relatively large.
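The core idea of the abstract can be sketched in a few lines: shuffle the class labels many times, recompute the filter statistic on each shuffled copy, and take the fraction of shuffles whose statistic meets or exceeds the observed one as the p-value. The sketch below is illustrative only, not the paper's implementation; it uses the simplest of the four criteria (absolute sample mean difference), and the function name, permutation count, and add-one smoothing are the author's assumptions.

```python
import numpy as np

def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Estimate the p-value of the null hypothesis that feature x has
    identical class-conditional distributions under binary labels y,
    using the absolute sample-mean difference as the test statistic."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    observed = abs(x[y == 1].mean() - x[y == 0].mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(y)  # break any feature-label association
        stat = abs(x[perm == 1].mean() - x[perm == 0].mean())
        if stat >= observed:
            count += 1
    # add-one smoothing keeps the estimate strictly positive
    return (count + 1) / (n_perm + 1)
```

Features would then be ranked by ascending p-value, with the smallest p-values marking the strongest features; the same loop works unchanged with any of the other three statistics substituted for the mean difference.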

Original language: English
Title of host publication: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Editors: J.-F. Boulicaut, F. Esposito, D. Pedreschi, F. Giannotti
Pages: 334-346
Number of pages: 13
Volume: 3201
State: Published - 2004
Event: 15th European Conference on Machine Learning, ECML 2004 - Pisa, Italy
Duration: Sep 20, 2004 – Sep 24, 2004

Other

Other: 15th European Conference on Machine Learning, ECML 2004
Country: Italy
City: Pisa
Period: 9/20/04 – 9/24/04

Fingerprint

  • Permutation test
  • p-value
  • Feature selection
  • Filtering
  • Test statistic
  • Null hypothesis
  • Class-conditional density
  • Kullback-Leibler distance
  • Information gain
  • Chi-square statistic
  • Sample mean
  • Sample size
  • Naive Bayes classifier
  • Support vector machines

ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Radivojac, P., Obradovic, Z., Dunker, A., & Vucetic, S. (2004). Feature selection filters based on the permutation test. In J.-F. Boulicaut, F. Esposito, D. Pedreschi, & F. Giannotti (Eds.), Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3201, pp. 334-346).
