A Hybrid Approach for Simultaneous Gene Clustering and Gene Selection for Pattern Classification

Pradeep Kumar Mallick; Debahuti Mishra; Srikanta Patanaik and Kailash Shaw

doi:10.17485/ijst/2016/v9i21/94175

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 21, Pages: 1-10

Original Article

Pradeep Kumar Mallick*, Debahuti Mishra, Srikanta Patanaik and Kailash Shaw

Department of Computer Science and Engineering

*Author for correspondence: Pradeep Kumar Mallick, Department of Computer Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: This study proposes a hybrid model for simultaneous gene clustering and gene selection on gene expression datasets, using hierarchical clustering and rough set theory to classify data patterns. Methods/Analysis: The proposed model works in three phases. In the first phase, initial clusters are formed using hierarchical clustering, and the resulting clusters are further divided into sub-clusters based on the lower and upper approximation properties of rough set theory. In the second phase, the reduct property of rough sets is applied to the clusters obtained from the first phase. In the third phase, gene ranking and cluster ranking are employed to rank the genes within the clusters and to discover the significant, informative genes. The method tries to find the genes of interest, known as significant genes, and to maximize the accuracy of the model along with the reduction percentage. The advantage of this approach is analyzed through experiments on two benchmark datasets, Leukemia and Colon Cancer. Finally, the classification performance on the original datasets was recorded using a Support Vector Machine (SVM) classifier and compared with a few existing feature/gene selection and clustering techniques. Findings: The experimental results and performance measures show that the proposed hybridized technique is more efficient than existing feature/gene selection techniques as well as the established traditional k-means clustering technique.
Keywords: Gene Selection, Hierarchical Clustering, Lower Approximation, Reduct, Rough Set Theory, Upper Approximation
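
The lower and upper approximations mentioned in the abstract can be illustrated with a minimal sketch. It follows the textbook rough set definitions and is not the authors' implementation, since the abstract does not give their exact procedure. The function names, the toy table, the gene labels `g1` and `g2`, and the `tumor` set are all hypothetical and chosen only for illustration; expression values are assumed to be already discretized.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group sample ids that are indiscernible on the given attributes."""
    classes = defaultdict(set)
    for sample_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(sample_id)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) rough set approximations of `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block shares at least one sample with it
            upper |= block
    return lower, upper


# Hypothetical toy table: discretized expression levels of two genes
# for six samples (not data from the paper).
table = {
    "s1": {"g1": "high", "g2": "low"},
    "s2": {"g1": "high", "g2": "low"},
    "s3": {"g1": "low", "g2": "low"},
    "s4": {"g1": "low", "g2": "high"},
    "s5": {"g1": "low", "g2": "high"},
    "s6": {"g1": "high", "g2": "high"},
}
tumor = {"s1", "s2", "s4"}  # hypothetical class of interest

lower, upper = approximations(table, ["g1", "g2"], tumor)
boundary = upper - lower  # samples that cannot be classified with certainty
print(sorted(lower), sorted(upper), sorted(boundary))
```

Here samples `s4` and `s5` have identical values on both genes but only `s4` belongs to the target set, so both fall in the boundary region. Using fewer attributes (for example only `g1`) makes the indiscernibility classes coarser and the boundary larger, which is the kind of effect a reduct computation has to avoid.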