Abstract
Feature selection for clustering high-dimensional data is a difficult problem because such data can contain a large number of redundant and irrelevant features that can ruin the clustering. A weighting scheme is proposed, wherein the weight of each feature is measured by its contribution to the given clustering task. The proposed algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, feature subsets are formed from the features that are strongly related to the target classes. Feature subset selection research focuses on searching for relevant features; the proposed approach also minimizes redundancy in the data set and improves the accuracy of the feature subset. To ensure the efficiency of the proposed algorithm, an efficient minimum-spanning tree (MST) clustering method is adopted. Extensive experiments are carried out to compare the proposed algorithm with several other feature selection algorithms, namely Relief, FCBF, CFS, Consist, and FOCUS-SF. The results demonstrate that the algorithm not only produces smaller subsets of features but also improves the performance of four types of classifiers.
Keywords- Feature subset selection, filter technique, feature clustering, graph-based clustering.
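
The two steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: absolute Pearson correlation is used here as a stand-in relatedness measure, the threshold `cut` is a hypothetical parameter, and the function names `abs_corr`, `find`, and `select_features` are introduced only for illustration. Features are nodes of a complete graph, an MST is built with Kruskal's method, weak MST edges are removed so that each remaining tree is a feature cluster, and one feature per cluster is kept.

```python
from itertools import combinations
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation (assumed stand-in relatedness measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def select_features(columns, target, cut=0.5):
    """Return indices of one representative feature per cluster.

    columns: list of numeric feature columns of equal length.
    target:  numeric class labels.
    cut:     hypothetical threshold; MST edges joining features whose
             correlation is below it are removed.
    """
    m = len(columns)
    # Step 1a: complete feature graph; edge distance is 1 - |correlation|.
    edges = [(abs_corr(columns[i], columns[j]), i, j)
             for i, j in combinations(range(m), 2)]
    edges.sort(key=lambda e: 1.0 - e[0])

    # Kruskal's method: take the shortest edges that do not close a cycle.
    parent = list(range(m))
    mst = []
    for corr, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((corr, i, j))

    # Step 1b: drop weak MST edges; each remaining tree is one cluster.
    parent = list(range(m))
    for corr, i, j in mst:
        if corr >= cut:
            parent[find(parent, i)] = find(parent, j)
    clusters = {}
    for i in range(m):
        clusters.setdefault(find(parent, i), []).append(i)

    # Step 2: keep the feature most related to the target in each cluster.
    relevance = [abs_corr(c, target) for c in columns]
    return sorted(max(members, key=lambda i: relevance[i])
                  for members in clusters.values())
```

Calling `select_features(columns, target)` on a data set in which two columns are near copies of each other would be expected to keep only one of them, which is the redundancy reduction the abstract describes. A larger `cut` splits the tree into more, smaller clusters and therefore keeps more features.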