Indian Journal of Science and Technology
Year: 2013, Volume: 9, Issue: 21, Pages: 1-10
Pradeep Kumar Mallick*, Debahuti Mishra, Srikanta Patanaik and Kailash Shaw
Department of Computer Science and Engineering,
*Author For Correspondence
Pradeep Kumar Mallick
Department of Computer Science and Engineering,
Objectives: This study proposes a hybrid model that performs gene clustering and gene selection simultaneously on gene expression datasets, using hierarchical clustering and rough set theory to classify data patterns. Methods/Analysis: The proposed model works in three phases. In the first phase, initial clusters are formed using hierarchical clustering, and the resulting clusters are further divided using the lower and upper approximation properties of rough set theory. In the second phase, the reduct property of rough sets is applied to the clusters obtained from the first phase. In the third phase, gene ranking and cluster ranking are used to rank the genes within the clusters and identify the significant, informative genes. The method aims to find these genes of interest while maximizing the accuracy of the model along with the reduction percentage. The approach is evaluated experimentally on two benchmark datasets, Leukemia and Colon Cancer. Finally, classification performance on the original datasets was recorded using a Support Vector Machine (SVM) classifier and compared with a few existing feature/gene selection and clustering techniques. Findings: The experimental results and performance measures demonstrate the efficiency of the proposed hybrid technique over existing feature/gene selection techniques as well as the established traditional k-means clustering technique.
Keywords: Gene Selection, Hierarchical Clustering, Lower Approximation, Reduct, Rough Set Theory, Upper Approximation
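The rough set notions named in the abstract (lower approximation, upper approximation, reduct) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy table, the gene names g1 to g3, the discretized levels, and the helper functions are all hypothetical, and the reduct search is a brute-force check that is only practical for a handful of attributes.

```python
from collections import defaultdict
from itertools import combinations


def partition(table, attributes):
    """Group objects that are indiscernible on the given attributes."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attributes)].add(obj)
    return list(blocks.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximation of the target set."""
    lower, upper = set(), set()
    for block in partition(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


def positive_region(table, attributes, decision):
    """Union of the lower approximations of every decision class."""
    region = set()
    for decision_class in partition(table, [decision]):
        region |= approximations(table, attributes, decision_class)[0]
    return region


def smallest_reducts(table, attributes, decision):
    """Smallest attribute subsets that keep the full positive region."""
    full = positive_region(table, attributes, decision)
    for size in range(1, len(attributes) + 1):
        found = [set(c) for c in combinations(attributes, size)
                 if positive_region(table, list(c), decision) == full]
        if found:
            return found
    return []


# Hypothetical toy data: four samples, three discretized gene levels.
samples = {
    "s1": {"g1": "high", "g2": "low",  "g3": "high", "label": "tumour"},
    "s2": {"g1": "high", "g2": "low",  "g3": "low",  "label": "normal"},
    "s3": {"g1": "low",  "g2": "low",  "g3": "high", "label": "tumour"},
    "s4": {"g1": "low",  "g2": "high", "g3": "low",  "label": "normal"},
}

tumour = {"s1", "s3"}
lower, upper = approximations(samples, ["g1", "g2"], tumour)
print(sorted(lower))   # ['s3']
print(sorted(upper))   # ['s1', 's2', 's3']
print(smallest_reducts(samples, ["g1", "g2", "g3"], "label"))  # [{'g3'}]
```

In this toy table, s1 and s2 are indiscernible on g1 and g2 but carry different labels, so they fall in the boundary between the lower and upper approximations. The single gene g3 separates the two classes as well as all three genes together, which is why it is the only smallest reduct here.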