P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 36, Pages: 2988-3001

Original Article

Modified Associative Classification Model for Microarray Gene Expression Data using Maximal Frequent Itemsets and Probability Distribution

Received Date: 04 March 2023, Accepted Date: 14 August 2023, Published Date: 27 September 2023


Objective: To study how to generate fewer class association rules and predict the class.

Methods: A modified associative classification model (MACM) is proposed for diagnosing cancer from microarray gene expression data using maximal frequent itemsets and a probability distribution. The proposed system performs supervised discretization, generates maximal frequent itemsets from 80% of the data, and performs prediction on the remaining 20% of the dataset. The frequent itemsets are generated with minimum support values of 20%, 40% and 80% and a minimum confidence of 80%. Binary-class and multi-class datasets are used to evaluate the constructed model, which is compared with classical associative classification algorithms. Model performance is evaluated by type of frequent itemset, number of class association rules generated, accuracy, and training time. The experiments use two colorectal cancer datasets, one lung cancer dataset and one multi-label cancer dataset.

Findings: Maximal frequent itemsets produce the class association rules quickly and in smaller numbers, and therefore consume less memory. The proposed method achieves 100% classification accuracy on the colon cancer datasets GSE15781 and GSE25070 and 99.17% on the colon cancer dataset GSE87211. A classification accuracy of 94% is obtained on the lung cancer dataset GSE43580 when maximal frequent itemsets are used.

Novelty: The proposed modified associative classification model achieves very high performance in classifying gene expression data. The associative classification model can support cancer diagnosis, pathway analysis and cancer treatment.
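To make the terms in the abstract concrete, the following is a minimal Python sketch of mining maximal frequent itemsets and turning them into class association rules under minimum support and minimum confidence thresholds. It is an illustration only, not the authors' MACM: the toy transactions, the item names such as `g1=up` (standing in for discretized gene expression states), the class labels, and the function names `frequent_itemsets`, `maximal_itemsets` and `class_rules` are all assumptions. It uses a plain level-wise search and omits the supervised discretization and probability-based prediction steps.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        # Keep the candidates whose relative support meets the threshold.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        current = list({a | b for a, b in combinations(level, 2)
                        if len(a | b) == k + 1})
        k += 1
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < other for other in frequent)}


def class_rules(transactions, labels, itemsets, min_confidence):
    """Build rules (itemset -> class) whose confidence meets the threshold."""
    rules = []
    for itemset in itemsets:
        # Labels of the samples that contain the whole itemset
        # (non-empty, because every itemset passed the support check).
        covered = [lab for t, lab in zip(transactions, labels) if itemset <= t]
        for cls in sorted(set(covered)):
            confidence = covered.count(cls) / len(covered)
            if confidence >= min_confidence:
                rules.append((itemset, cls, confidence))
    return rules


# Hypothetical toy data: each sample is a set of discretized gene states.
transactions = [
    frozenset({"g1=up", "g2=down", "g3=up"}),
    frozenset({"g1=up", "g2=down"}),
    frozenset({"g1=up", "g2=down", "g3=up"}),
    frozenset({"g2=up", "g3=down"}),
    frozenset({"g2=up", "g3=down"}),
]
labels = ["tumor", "tumor", "tumor", "normal", "normal"]

frequent = frequent_itemsets(transactions, min_support=0.4)
maximal = maximal_itemsets(frequent)
rules = class_rules(transactions, labels, maximal, min_confidence=0.8)

print(len(frequent), "frequent itemsets,", len(maximal), "maximal")
for itemset, cls, confidence in rules:
    print(sorted(itemset), "->", cls, f"(confidence {confidence:.2f})")
```

On this toy data the ten frequent itemsets reduce to two maximal ones, each giving one rule. This is the effect the abstract refers to when it reports fewer class association rules and lower memory use with maximal frequent itemsets.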

Keywords: Microarray; Discretization; Maximal frequent itemsets; Association rules; Probability distribution




© 2023 Alagukumar & Kathirvalavakumar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

