Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 30, Pages: 1-10
R. Vidya1* and G. M. Nasira2
1 Department of Computer Science
2 Department of Computer Science
*Author for correspondence
Background/Objective: Cervical cancer is one of the diseases that most widely affects women around the world. Normally, cells grow and divide to produce more cells only when the body needs them. This orderly process breaks down when cells keep dividing although new cells are not needed. The extra cells may form a mass of tissue called a growth or tumor. Tumors are classified as benign or malignant. Benign tumors are not cancer: they can usually be removed, in most cases they do not recur, and, most importantly, their cells do not spread to other parts of the body. Malignant tumors are cancer: they can damage nearby tissues and organs and are a threat to life. In this research work, a cervix is predicted to be normal or cancerous with the aid of data mining algorithms.
Methods/Statistical Analysis: Data mining plays an indispensable role in prediction, especially in the medical field. Three approaches are introduced for predicting a normal or cancerous cervix: the Classification and Regression Tree (CART) algorithm, the Random Forest Tree (RFT) algorithm, and RFT with K-means learning. The data were collected from NCBI (National Center for Biotechnology Information); the data set contains 500 records and 61 variables (i.e., biopsy numerical values with gene identifiers). The output is presented as a prediction tree. From this data set, a sample of 100 records with 61 biopsy features was selected. Based on this biopsy data, an awareness program was conducted, followed by a survey to identify the changes women experience during this transition period. To collect data efficiently, a personal interview program was conducted among rural women in various places, and, in collaboration with JIPMER hospital, participants were screened for cervical cancer.
The biopsy test results were analyzed statistically and passed to MATLAB for algorithm testing. The results are reported under separate headings, with 100 test records and 60 training records.
Findings: The performance of the algorithms was compared in terms of sensitivity, specificity and accuracy to determine the best predictor of cervical cancer. First, the CART algorithm was used for prediction. The CART binary tree yields one of two results, normal cervix or cancer cervix, and uses a splitting criterion, the Gini index, to measure the diversity in the cervical data. The CART tree output showed 83.87% accuracy. Random Forest Tree (RFT), an ensemble supervised machine learning algorithm, was then used to improve the prediction accuracy; with MATLAB code it achieved 93.54%. Randomization enters the algorithm in two ways: (1) bagging, i.e., random bootstrap sampling, and (2) random selection of input attributes when generating each decision tree. This yields an unbiased estimate of the generalization error as the forest grows. Once the accuracy of RFT had been validated, a new approach, a "combination of two algorithms", was applied. Whitening is used as a pre-processing step for K-means clustering to obtain the best prediction result. The K-means algorithm is considered efficient for processing large data sets, and the RFT with K-means learning tree output achieved the highest accuracy, 96.77%. The time complexity of K-means is also derived.
Applications/Improvements: Cervical cancer diagnosis and prognosis are two medical applications that pose a great challenge to researchers. K-means optimizes a cost function defined on the Euclidean distance between the data points and the cluster means.
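As a minimal sketch of the whitening step and the K-means cost function mentioned above, the following Python computes the sum of squared Euclidean distances between points and their assigned cluster means. It is illustrative only and is not the authors' MATLAB implementation: the function names and toy data are hypothetical, and "whitening" is assumed to mean rescaling each feature to unit standard deviation (as SciPy's `scipy.cluster.vq.whiten` does) rather than a full decorrelating transform.

```python
import math


def whiten(rows):
    """Rescale every feature column to unit standard deviation.

    Assumption: "whitening" is per-feature scaling without centering,
    not a full decorrelating transform.
    """
    n = len(rows)
    scaled_columns = []
    for col in zip(*rows):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scale = std if std > 0 else 1.0  # leave constant columns unchanged
        scaled_columns.append([v / scale for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_labels(rows, means):
    """Assign each point to the index of its nearest cluster mean."""
    return [
        min(range(len(means)), key=lambda k: squared_distance(row, means[k]))
        for row in rows
    ]


def kmeans_cost(rows, means, labels):
    """K-means objective: total squared distance of points to their means."""
    return sum(squared_distance(row, means[k]) for row, k in zip(rows, labels))


# Hypothetical toy data, not taken from the NCBI data set.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
scaled = whiten(points)
means = [[0.0, 1.0], [2.0, 1.0]]  # hypothetical cluster means
labels = assign_labels(scaled, means)
cost = kmeans_cost(scaled, means, labels)
```

Whitening puts both toy features on a comparable scale, so neither dominates the Euclidean distance used in the assignment step.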
The combination of RFT with the K-means algorithm is the novelty of this research work and yields the highest accuracy. Accurate prediction of the occurrence of cervical cancer has been among the most challenging tasks in medical data mining because of the lack of suitable data sets. Much research has been done to develop techniques that improve the prediction accuracy of cervical cancer from images. In this work, by contrast, cervical cancer is predicted from numerical data, using the NCBI (National Center for Biotechnology Information) data set. This research can support the creation of expert medical decision-making systems and help medical practitioners construct an optimal prediction model for cervical cancer.
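For reference, the Gini index used as the CART splitting criterion and the three evaluation measures named in the Findings (sensitivity, specificity, accuracy) can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the confusion-matrix counts in the example are invented and are not results from the paper.

```python
def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split; lower means a purer split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    Assumes at least one actual positive and one actual negative case.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of cancer cases detected
        "specificity": tn / (tn + fp),  # share of normal cases recognized
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# A maximally mixed node versus a split that separates the classes perfectly.
mixed = gini_impurity(["normal", "cancer"])
perfect = split_gini(["normal", "normal"], ["cancer", "cancer"])

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fn=2, tn=9, fp=1)
```

A CART-style learner would evaluate `split_gini` for each candidate split and keep the one with the lowest value.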
Keywords: Cervical Cancer, CART, Data Mining, Hereditary Pattern, K-Mean, RFT