
A Hybrid Approach for Simultaneous Gene Clustering and Gene Selection for Pattern Classification


  • Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan University, Bhubaneswar - 751030, Odisha, India


Objectives: This study proposes a hybrid model for simultaneous gene clustering and gene selection on gene expression datasets, using hierarchical clustering and rough set theory, for the classification of data patterns.

Methods/Analysis: The proposed model works in three phases. In the first phase, initial clusters are formed using hierarchical clustering, and the resulting clusters are further divided into more clusters based on the lower and upper approximation properties of rough set theory. In the second phase, the reduct property of rough sets is applied to the clusters obtained from the first phase. In the third phase, gene ranking and cluster ranking are employed to rank the genes within the clusters and identify the significant, informative genes. The method seeks the genes of interest, known as significant genes, and aims to maximize the accuracy of the model along with the reduction percentage. The approach is evaluated experimentally on two benchmark datasets, Leukemia and Colon Cancer. Finally, classification performance on the original datasets was recorded using a Support Vector Machine (SVM) classifier and compared with a few existing feature/gene selection and clustering techniques.

Findings: The experimental results and performance measures demonstrate the efficiency of the proposed hybrid technique over existing feature/gene selection techniques as well as the traditional k-means clustering technique.
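The lower and upper approximations used to refine clusters in the first phase can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy table, the gene names (`g1` to `g5`), the discretized expression levels, and the function names are all assumptions made for illustration. Genes that share the same attribute values form an equivalence class; the lower approximation collects the classes fully contained in a candidate cluster, and the upper approximation collects the classes that overlap it.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object ids whose values agree on every attribute in `attributes`."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        blocks[key].add(obj)
    return list(blocks.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) rough set approximations of `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block shares at least one object with the target
            upper |= block
    return lower, upper


# Hypothetical toy data: five genes with discretized expression in two samples.
table = {
    "g1": {"s1": "high", "s2": "low"},
    "g2": {"s1": "high", "s2": "low"},
    "g3": {"s1": "low", "s2": "low"},
    "g4": {"s1": "low", "s2": "high"},
    "g5": {"s1": "high", "s2": "high"},
}

# Hypothetical cluster, standing in for one produced by hierarchical clustering.
cluster = {"g1", "g3", "g4"}

lower, upper = approximations(table, ["s1", "s2"], cluster)
boundary = upper - lower  # genes whose membership in the cluster is uncertain
print(sorted(lower), sorted(upper), sorted(boundary))
```

In this toy case `g1` and `g2` are indistinguishable on the chosen attributes, so both end up in the boundary region even though only `g1` was in the candidate cluster. A boundary of this kind is what a rough set based refinement step would use to split or reassign cluster members.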


Keywords: Gene Selection, Hierarchical Clustering, Lower Approximation, Reduct, Rough Set Theory, Upper Approximation






This work is licensed under a Creative Commons Attribution 3.0 License.