International Journal of Education & Applied Sciences Research

Print ISSN : 2349-4808

Online ISSN : 2349-2899

Frequency : Continuous

Current Issue : Volume 2 , Issue 5
2015

EFFICIENT DESIGN OF FAST FEATURE SUBSET SELECTION ALGORITHM FOR MULTI-DIMENSIONAL DATA BASED ON CLUSTERING

*Meghana Satish, **T. Bhavana Bhat, ***M V Trupthi, ****Kaushal Vishu, *****C N Chinnaswamy

*Information Science NIE, Mysore, India,    **Information Science NIE, Mysore, India,    ***Information Science NIE, Mysore, India,    ****Information Science NIE, Mysore, India,    *****Information Science NIE, Mysore, India

Page No : 17-27

Published Online : 2015-05-30


Abstract

Feature selection for clustering high-dimensional data is a difficult problem because such data can contain a large number of redundant and irrelevant features that can ruin the clustering. A weighting scheme is proposed in which the weight of each feature is measured by its contribution to the given clustering task. The proposed algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, feature subsets are formed from the features that are strongly related to the target classes. Whereas feature subset selection research generally focuses on searching for relevant features, the proposed approach also minimizes redundancy in the data set and improves the accuracy of the feature subset. To ensure the efficiency of the proposed algorithm, an efficient minimum spanning tree (MST) clustering method is adopted. Extensive experiments are carried out to compare the proposed algorithm with several other feature selection algorithms, namely Relief, FCBF, CFS, Consist, and FOCUS-SF. The results demonstrate that the algorithm not only produces smaller subsets of features but also improves the performance of four types of classifiers.

 

Keywords- Feature subset selection, filter technique, feature clustering, graph-based clustering.
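
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: symmetric uncertainty (SU) as the correlation measure for discrete features, an MST built with Prim's algorithm over the distance 1 - SU, a rule that cuts an MST edge when the two features are less correlated with each other than each is with the target, and the hypothetical names `select_features` and `threshold`.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Mutual information normalised to [0, 1] (assumed correlation measure)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, target, threshold=0.0):
    """columns maps a feature name to its list of discrete values."""
    su = symmetric_uncertainty
    relevance = {name: su(col, target) for name, col in columns.items()}
    # Discard features that carry no information about the target.
    names = [n for n in columns if relevance[n] > threshold]
    if not names:
        return []
    pair_su = {(a, b): su(columns[a], columns[b])
               for a in names for b in names if a != b}

    # Step 1a: Prim's algorithm on the complete feature graph, using the
    # distance 1 - SU so that strongly correlated features stay linked.
    in_tree, remaining, mst_edges = [names[0]], names[1:], []
    while remaining:
        u, v = min(((a, b) for a in in_tree for b in remaining),
                   key=lambda e: 1.0 - pair_su[e])
        mst_edges.append((u, v))
        in_tree.append(v)
        remaining.remove(v)

    # Step 1b: cut edges whose feature-feature SU is below both
    # feature-target relevances; the surviving edges define the clusters.
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for u, v in mst_edges:
        w = pair_su[(u, v)]
        if not (w < relevance[u] and w < relevance[v]):
            parent[find(v)] = find(u)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    # Step 2: keep the feature most related to the target in each cluster.
    return [max(members, key=relevance.get) for members in clusters.values()]
```

For instance, given a four-class target and features holding its low bit, a relabelled copy of the low bit, its high bit, and an unrelated column, this sketch would drop the unrelated column, group the low bit with its copy, and return one representative per cluster. Real-valued features would need to be discretised first, which is outside the scope of this sketch.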