Indian Journal of Science and Technology
Year: 2023, Volume: 16, Issue: 36, Pages: 2988-3001
Original Article
S Alagukumar1,2*, T Kathirvalavakumar3
1Research Scholar, Research centre in Computer Science, V. H. N. Senthikumara Nadar College, Virudhunagar, Madurai Kamaraj University, Madurai, Tamil Nadu, India
2Assistant Professor, Department of Computer Applications, Ayya Nadar Janaki Ammal College, Sivakasi, Tamil Nadu, India
3Associate Professor, Research Centre in Computer Science, V. H. N. Senthikumara Nadar College, Virudhunagar, Madurai Kamaraj University, Madurai, Tamil Nadu, India
*Corresponding Author
Email: [email protected]
Received Date: 04 March 2023, Accepted Date: 14 August 2023, Published Date: 27 September 2023
Objective: To study the generation of a smaller number of class association rules and their use in predicting the class.
Methods: A modified associative classification model (MACM) is proposed for diagnosing cancer from microarray gene expression data using maximal frequent itemsets and a probability distribution. The proposed system performs supervised discretization, generates maximal frequent itemsets from 80% of the data, and carries out prediction on the remaining 20%. Frequent itemsets are generated with minimum support values of 20%, 40% and 80% and a minimum confidence of 80%. Binary-class and multi-class datasets are used to evaluate the constructed model, which is compared with classical associative classification algorithms. Model performance is evaluated by the type of frequent itemset, the number of class association rules generated, accuracy, and training time. The experiments use two colorectal cancer datasets, one lung cancer dataset and one multi-label cancer dataset.
Findings: Maximal frequent itemsets yield fewer class association rules, generate them quickly, and consume less memory. The proposed method achieves 100% classification accuracy on the colon cancer datasets GSE15781 and GSE25070 and 99.17% on the colon cancer dataset GSE87211. With maximal frequent itemsets, it achieves 94% classification accuracy on the lung cancer dataset GSE43580.
Novelty: The proposed modified associative classification model achieves very high performance in classifying gene expression data. The associative classification model can support cancer diagnosis, pathway analysis and treatment.
Keywords: Microarray; Discretization; Maximal frequent itemsets; Association rules; Probability distribution
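The core idea in the Methods summary, keeping only maximal frequent itemsets and turning them into class association rules under support and confidence thresholds, can be illustrated with a minimal Python sketch. This is not the authors' MACM implementation: the toy discretized gene items (g1=high and so on), the function names, the brute-force enumeration, and the choice to check support on the rule antecedent only are all assumptions made for illustration, and the probability-based prediction step is not shown.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force level-wise search; returns {itemset: support}."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(1 for t in transactions if itemset <= t) / n
            if support >= min_support:
                frequent[itemset] = support
                found = True
        if not found:
            # No frequent itemset of size k, so none of size k + 1 exists.
            break
    return frequent

def maximal_only(frequent):
    """Keep itemsets that have no frequent proper superset."""
    return {s: sup for s, sup in frequent.items()
            if not any(s < other for other in frequent)}

def class_association_rules(rows, labels, min_support, min_confidence):
    """Build (antecedent, class, confidence) rules from maximal itemsets.

    Assumption: support is checked on the antecedent only.
    """
    transactions = [frozenset(r) for r in rows]
    maximal = maximal_only(frequent_itemsets(transactions, min_support))
    rules = []
    for itemset in maximal:
        covered = [lab for t, lab in zip(transactions, labels) if itemset <= t]
        for cls in sorted(set(covered)):
            confidence = covered.count(cls) / len(covered)
            if confidence >= min_confidence:
                rules.append((itemset, cls, confidence))
    return rules

# Hypothetical discretized expression data (not from the paper's datasets).
rows = [
    {"g1=high", "g2=low", "g3=high"},
    {"g1=high", "g2=low", "g3=low"},
    {"g1=high", "g2=low", "g3=high"},
    {"g1=low", "g2=high", "g3=low"},
    {"g1=low", "g2=high", "g3=high"},
]
labels = ["tumor", "tumor", "tumor", "normal", "normal"]

for antecedent, cls, conf in class_association_rules(rows, labels, 0.4, 0.8):
    print(sorted(antecedent), "->", cls, f"(confidence {conf:.2f})")
```

On this toy input the maximal filter leaves far fewer candidate antecedents than the full set of frequent itemsets, which is the effect the abstract attributes to maximal frequent itemsets. A practical implementation would replace the exhaustive enumeration with a dedicated maximal itemset miner.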
© 2023 Alagukumar & Kathirvalavakumar. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)