Liver disease prediction is an intensively studied problem in medical organizations and industry. Hepatic disorders must be predicted early to ensure timely treatment. However, automated and fast prediction of the presence of liver disease is a difficult task, especially with incomplete patient data.
The main goal of this research work is to introduce an automated system that predicts liver disease accurately while accounting for the noise and missing values present in the collected dataset. The study predicts the presence of liver disease in a patient by analysing the given input dataset. This is done by introducing a Modified Convolutional Neural Network based liver disease prediction system that automatically detects liver disease from the given input dataset. The work also handles large datasets containing many irrelevant terms by adopting a dimensionality reduction technique, and reduces the computational overhead of the classification process by selecting the most informative features from the input dataset.
In the proposed research work, dimensionality reduction is first carried out using Modified Principal Component Analysis (MPCA) as a preprocessing step. After preprocessing, the optimal features are selected using the Score based Artificial Fish Swarm Algorithm (SAFSA). Finally, a Modified Convolutional Neural Network is used for classification of the dataset. The overall flow of the proposed research work is shown in
The analysis of the research work is carried out on the Indian liver patient dataset, which is described in the following subsection. From the
The number of patients with liver disease has been continuously increasing because of excessive consumption of alcohol, inhalation of harmful gases, and intake of contaminated food, pickles and drugs. This dataset was used to evaluate prediction algorithms in an effort to reduce the burden on doctors. It contains 416 liver patient records and 167 non liver patient records collected from the North East of Andhra Pradesh, India. The "Dataset" column is a class label used to divide the records into liver patient (liver disease) and non liver patient (no disease) groups. The dataset contains 441 male patient records and 142 female patient records. Any patient whose age exceeded 89 is listed as being of age "90". The size of the dataset is 22.8 KB.
The liver disease dataset may contain many noisy and irrelevant features, which increase the computational overhead of the classifier. This can be avoided by preprocessing the input dataset. In this work, dimensionality reduction is carried out using Modified Principal Component Analysis.
PCA is a classical multivariate data analysis method that is useful in linear feature extraction and data compression
The PCA transform is given by equation (1):

Y = TX (1)

where T is the transform matrix, X is the original vector and Y is the transformed vector. The transform matrix T is obtained by solving the eigenvalue problem in equation (2):

(S − λI)U = 0 (2)
where I is the identity matrix (a square matrix with unity along its diagonal), S is the covariance matrix of the original images, U holds the eigenvectors and λ the eigenvalues. U_{j} and λ_{j} (j = 1, 2, ..., m) can be computed through equation (2), with the eigenvalues ordered as λ_{1} ≥ λ_{2} ≥ .... ≥ λ_{m}. The eigenvector matrix can be expressed as U = [U_{1}, U_{2}, ...., U_{m}] = [u_{ij}]_{m×m}, where U satisfies U^{T}U = UU^{T} = I. The matrix T is then determined by inverting U, i.e. T = U^{−1} = U^{T}.
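As an illustration, the standard PCA transform of equations (1)–(2) can be sketched in Python as follows; the function name and interface are illustrative, not part of the proposed system:

```python
import numpy as np

def pca_transform(X, k):
    """Project the rows of X onto the top-k principal components.

    X: (n_samples, m) data matrix. Returns (Y, T) where T holds the top-k
    eigenvectors of the covariance matrix S, sorted so that
    lambda_1 >= lambda_2 >= ... >= lambda_m, and Y = X_centred @ T.T.
    """
    Xc = X - X.mean(axis=0)              # centre the data
    S = np.cov(Xc, rowvar=False)         # covariance matrix of the features
    lam, U = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]        # reorder to descending eigenvalues
    lam, U = lam[order], U[:, order]
    T = U[:, :k].T                       # transform matrix T (k x m)
    return Xc @ T.T, T
```

Because U is orthogonal, the rows of T are orthonormal, which is the property used when inverting U via its transpose.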
Previous studies have demonstrated that PCA is effective in data compression for all classes within the imaged area. In most image processing applications, however, it is better to deal with fewer classes, and some classes present in the image may be neglected. The PCA method cannot guarantee that the information related to the relevant classes is effectively compressed. The major limitations of PCA are:
Standard PCA struggles with Big Data when out-of-core computation is needed.
Standard PCA can detect only linear relationships between variables/features.
The transformed data generated after applying PCA should ideally be sparse; however, standard PCA always generates dense representations for certain datasets.
The above mentioned limitations are addressed in the proposed Modified PCA (MPCA). Instead of relying on the linear assumption alone, three matrices are constructed with the help of the covariance, SVD and iterative methods. In MPCA, training samples that are relevant for a given application are selected from the scene, and the transform matrix T' is obtained from these training samples.
Comparing equations (2) and (3), the difference lies in the transform matrix, and essentially in the samples used to calculate the covariance matrix: one uses the training samples, the other the whole image sample. The basic steps of PCA above are modified by constructing three PCA variants based on the covariance, SVD and iterative methods, through which dimensionality reduction is carried out more effectively. The detailed procedure of MPCA, including its pseudo code, is given below:
Construct PCA using covariance // generate the t-th vector; M is the covariance matrix and its eigen decomposition is computed
Construct PCA using SVD (Singular Value Decomposition) // set B = X and compute the eigen decomposition of BB^T
Construct PCA using the iterative method // s = X^T r; error = |eigenvalue − r^T s|; r = s / ||s||; exit if error < tolerance; return eigenvalue, r
Calculate variance extracted from the 3 PCA
Combine the variance of PCA using mean function
If the averaged variance coefficient of a sample is less than the threshold (0.3), eliminate the sample
End if
End
The variances extracted from the three PCA variants are combined using the mean function. The averaged values are checked against a threshold of 0.3, and samples whose coefficients are less than 0.3 are eliminated. Out of 583 samples, 578 are selected for the subsequent feature selection. The implemented result sample of MPCA is shown in
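The MPCA procedure above can be sketched as follows. The per-sample coefficient used for thresholding is an assumption here (the absolute projection onto the leading component, normalised to [0, 1]), since the text does not define it precisely; the function names are illustrative:

```python
import numpy as np

def leading_component(X, method):
    """Leading principal direction of centred X by one of three routes."""
    Xc = X - X.mean(axis=0)
    if method == "cov":                      # eigen decomposition of covariance
        lam, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
        return U[:, np.argmax(lam)]
    if method == "svd":                      # leading right singular vector
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[0]
    # iterative (power) method on S = Xc^T Xc
    S = Xc.T @ Xc
    r = np.ones(S.shape[1]) / np.sqrt(S.shape[1])
    eig = 0.0
    for _ in range(1000):
        s = S @ r
        new_eig = np.linalg.norm(s)
        r = s / new_eig
        if abs(new_eig - eig) < 1e-10:       # exit if error < tolerance
            break
        eig = new_eig
    return r

def mpca_filter(X, threshold=0.3):
    """Average the per-sample coefficients from the three PCA variants
    (mean function) and keep samples whose coefficient >= threshold."""
    Xc = X - X.mean(axis=0)
    coeffs = []
    for m in ("cov", "svd", "power"):
        v = leading_component(X, m)
        proj = np.abs(Xc @ v)                # |projection| handles sign flips
        coeffs.append(proj / proj.max())     # normalise to [0, 1]
    score = np.mean(coeffs, axis=0)          # combine with the mean function
    return X[score >= threshold]
```

All three routes recover the same leading direction (up to sign), so the averaging mainly guards against numerical differences between the methods.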
After preprocessing, the most relevant features must be selected from the dimensionality-reduced data samples in order to obtain the most accurate and reliable outcome. This optimal feature selection is performed by an optimization algorithm that selects the most informative features from the given input dataset. In this work, optimal feature selection is done by the Score based Artificial Fish Swarm Algorithm (SAFSA), with information gain and entropy values taken as fitness values.
In general, AFSA (artificial fish swarm algorithm) is one of the best optimization methods among the swarm intelligence algorithms
Consider the state vector of artificial fish
Then the basic movement process can be expressed as in equation (5):

X_i(t+1) = X_i(t) + r · Step · (X_j(t) − X_i(t)) / ||X_j(t) − X_i(t)||  (5)

where r is a random number between zero and 1, Step is the step size of a move and X_j is the target state toward which the fish moves.
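A minimal sketch of one movement step per equation (5); the function name is illustrative:

```python
import numpy as np

def afsa_move(x_i, x_j, step, rng):
    """One AFSA movement step toward the target state x_j, per equation (5):
    X_i(t+1) = X_i(t) + r * Step * (X_j - X_i) / ||X_j - X_i||."""
    r = rng.random()                             # random number in [0, 1)
    direction = x_j - x_i
    return x_i + r * step * direction / np.linalg.norm(direction)
```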
In the standard artificial fish algorithm, the two parameters Step and Visual are fixed. Larger Step and Visual values guarantee fast convergence at the early stage of the algorithm, but reduce accuracy and can even lead to locally optimal search results. To balance the algorithm's convergence speed and precision, a dynamic regulating factor l (0 < l < 1) is introduced, and the Step and Visual parameters adapt dynamically in the score based artificial fish algorithm. During artificial fish movement, the standard algorithm can fail to reach the global optimum; to overcome this disadvantage, the mobile reference factor is expanded from the original food concentration centre combined with the global optimal position in the foraging behavior.
The artificial fish algorithm searches for the optimal value of the objective function in continuous function optimization, and reaching high precision within limited time is the key requirement of the algorithm. Experimental data show that the late iterations of the artificial fish algorithm have little effect on the final accuracy; they amount to "invalid iterative calculation".
Limitations of AFSA
High structural and computational complexities
Lack of using AFs’ previous experiences
Lack of appropriate balance between exploration and exploitation to improve the optimization process.
In this section, the above mentioned limitations are handled and eliminated in order to improve the overall performance of AFSA. To eliminate the wasted late iterations and improve the speed and precision of the artificial fish algorithm, a permitted error precision K and an adaptive iteration termination number Z are introduced and combined with the grid search method, so that once the artificial fish converge to within the allowable error range, the iteration terminates in time and computation time is saved. Because random behavior persists in the artificial fish algorithm when computing the global optimal solution at the late stage, a local grid traversal after the iterations expire can largely overcome the influence of random behavior on the final precision and improve computing accuracy.
The pseudocode of Score based artificial fish swarm algorithm is given as follows:
In the initialisation step, first initialise the feature subsets using the AFS algorithm.
Find the fitness value for each feature subset i
while (t < MaxGeneration) do
    Perform follow behaviour on X_i(t) and compute X_i,follow
    Perform swarm behaviour on X_i(t) and compute X_i,swarm
    Calculate the r value:
        phi1 = 2.05; phi2 = 2.05; phi = phi1 + phi2
        chi = 2 / (phi − 2 + sqrt(phi^2 − 4·phi))
        r = chi
    If F(X_i,follow) < F(X_i,swarm)
Find best feature subset based on maximum z value.
Repeat the process until convergence attained
Convergence is attained at the maximum iteration count or when the same value repeats in at least 5 iterations.
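A minimal sketch of the score based feature selection, assuming binary feature subsets, summed information gain as the fitness value, and simplified follow/swarm/foraging behaviours; this is illustrative, not the authors' implementation:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(x, y, bins=4):
    """Information gain of one (equal-width binned) feature w.r.t. the label."""
    edges = np.histogram_bin_edges(x, bins=bins)[1:-1]
    xb = np.digitize(x, edges)
    gain = entropy(y)
    for v in np.unique(xb):
        mask = xb == v
        gain -= mask.mean() * entropy(y[mask])
    return gain

def fitness(subset, X, y):
    """Score of a binary feature subset: summed information gain."""
    idx = np.flatnonzero(subset)
    return sum(info_gain(X[:, j], y) for j in idx) if idx.size else 0.0

def safsa_select(X, y, n_fish=8, max_gen=10, seed=0):
    """Toy score-based fish-swarm search over binary feature subsets."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(n_fish, X.shape[1]))
    for _ in range(max_gen):
        scores = np.array([fitness(f, X, y) for f in pop])
        best = pop[scores.argmax()]
        centre = (pop.mean(axis=0) > 0.5).astype(np.int64)
        for i in range(n_fish):
            follow = np.where(rng.random(X.shape[1]) < 0.5, best, pop[i])   # follow behaviour
            swarm = np.where(rng.random(X.shape[1]) < 0.5, centre, pop[i])  # swarm behaviour
            forage = pop[i].copy()
            forage[rng.integers(X.shape[1])] ^= 1                           # foraging: random bit flip
            pop[i] = max((follow, swarm, forage, pop[i]),
                         key=lambda f: fitness(f, X, y))
    scores = np.array([fitness(f, X, y) for f in pop])
    return pop[scores.argmax()]
```

Each fish keeps the best of its candidate moves, so per-fish fitness never decreases, matching the greedy acceptance implied by the pseudocode.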
Classification of the dataset is done using a modified convolutional neural network. In deep learning, a convolutional neural network is a class of deep neural networks most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks, based on their shared-weights architecture and translation invariance characteristics.
In this work, the following layers are applied to ensure accurate prediction with reduced computation overhead: the input layer, convolution layer, pooling layer, and finally the softmax or fully connected layer, as shown in
The major limitation of a CNN is its inability to encode orientational and relative spatial relationships and view angle: a CNN does not encode the position and orientation of the data, and lacks the ability to be spatially invariant to the input data sample.
The CNN is trained via a sequence of training examples
When tuning the weights and biases of the CNN, the total number of tuning parameters is calculated from the CNN structure. Each individual in the GA then holds a number of candidate solutions equal to the number of tuning parameters. At each iteration, the CNN calculates an output based on the parameters specified by the GA. A separate cost function is defined that compares the deviations between the output and the real target values. The GA minimizes this cost function over several iterations until no further improvement can be made, at which point optimization is terminated and the optimized values are placed in the final CNN structure.
Initialize parameter values
Generate the initial population
While i<MaxIteration and Bestfitness<MaxFitness do
Fitness calculation
Perform selection
Perform cross over
Perform mutation
End while
Return the best solution
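The GA loop above can be sketched as follows; a quadratic stand-in cost is used in place of the CNN error, and the operator choices (tournament selection, blend crossover, Gaussian mutation, elitism) are assumptions for illustration:

```python
import numpy as np

def ga_minimize(cost, n_params, pop_size=30, max_iter=100, seed=0):
    """Minimal real-coded GA: tournament selection, blend crossover,
    Gaussian mutation, with elitism. `cost` maps a parameter vector
    (e.g. CNN weights and biases) to a scalar error."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, n_params))

    def tournament(fit):
        i, j = rng.integers(pop_size, size=2)
        return i if fit[i] < fit[j] else j       # lower cost wins

    for _ in range(max_iter):
        fit = np.array([cost(p) for p in pop])
        new = [pop[fit.argmin()].copy()]         # elitism: carry over the best
        while len(new) < pop_size:
            a, b = tournament(fit), tournament(fit)
            w = rng.random(n_params)             # blend crossover
            child = w * pop[a] + (1.0 - w) * pop[b]
            mutate = rng.random(n_params) < 0.2  # mutate ~20% of the genes
            child = child + mutate * rng.normal(0.0, 0.1, n_params)
            new.append(child)
        pop = np.array(new)
    fit = np.array([cost(p) for p in pop])
    return pop[fit.argmin()], float(fit.min())
```

In the full system the cost function would run the CNN forward pass with the candidate weights and biases and return the error against the targets.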
(i) The output of the convolutional layer can be written as

x_j = f( Σ_i x_i * k_ij + b_j )

where x_i is the i-th input feature map, k_ij is the convolution kernel, b_j is the bias and f is the activation function.

(ii) The output of the output layer can be written as

y = f( w^T z + b )

where z denotes the final feature map in the feature layer, w the weight vector and b the bias of the output layer.

So, the mean-square error can be written as

E(p) = (1/N) Σ_n ( y_n(p) − t_n )^2

where p is the parameter vector (the weights and biases of the network) and y_n(p) and t_n are the network output and target value for the n-th training example.
In this work, the performance of the CNN is improved by introducing optimal parameter selection, in which the CNN parameter values are selected optimally using a genetic algorithm. This optimal parameter selection leads to an accurate and efficient classification outcome. Parameter value estimation is the most important step in the CNN classifier, as it provides the optimal classification result, and appropriate selection of parameter values leads to accurate decision making. In this work, the given dataset is divided into three subsets for accurate and optimal selection of parameter values.
Training set: a set of examples used for learning, i.e. to fit the parameters of the classifier. In the MLP case, we would use the training set to find the "optimal" weights with the backpropagation rule.
Validation set: a set of examples used to tune the hyperparameters of a classifier. In the MLP case, we would use the validation set to find the "optimal" number of hidden units or determine a stopping point for the backpropagation algorithm.
Test set: a set of examples used only to assess the performance of a fully trained classifier. In the MLP case, we would use the test set to estimate the error rate after we have chosen the final model (MLP size and actual weights). After assessing the final model with the test set, you must not further tune the model.
The procedure of parameter selection process is given below:
Divide the available data into training, validation and test set
Select architecture and training parameters
Train the model using the training set
Evaluate the model using the validation set
Repeat steps 2 through 4 using different architectures and training parameters
Select the best model and train it using data from the training and validation set
Assess this final model using the test set
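Step 1 of this procedure can be sketched as below; the 60/20/20 split ratios are an assumption, as the text does not fix them:

```python
import numpy as np

def split_data(X, y, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and split a dataset into training / validation / test sets.

    ratios: fractions for (train, validation, test); illustrative defaults.
    Returns three (X_subset, y_subset) pairs.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # shuffle once, then slice
    n_train = int(ratios[0] * len(X))
    n_val = int(ratios[1] * len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```

Steps 2–6 then loop over candidate architectures, training on the first split, comparing on the second, and touching the third only once at the very end.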
In the CNN, the optimal bias and weight values are calculated using the genetic algorithm in order to improve the prediction of liver disease. The bias and weight values obtained are listed below:
Bias: 0.7151 0.9522 0.1391 0.8866 0.9818 0.1841 0.5042 0.3220 0.2596 0.7990 (Neurons =10)
Weight: 0.9572 0.8442 0.9221 0.8800 0.6134 0.8967 0.5626 0.8254 0.8708 0.7949
In this section, a numerical evaluation of the proposed research methodology is performed in terms of various performance measures, to analyze the performance improvement of the proposed over the existing research methodologies. The MATLAB simulation environment is used to implement the proposed methodology. The performance measures considered in this work are: accuracy, precision, recall and F-measure.
The comparison is made between the proposed Modified Convolutional Neural Network based Liver Disease Prediction System (MCNNLDPS) and the existing multilayer perceptron neural network (MLPNN)
The performance metrics values are given in the following
Metrics        | MLPNN | MCNNLDPS
Accuracy (%)   | 86.70 | 90.75
F-Measure (%)  | 70.02 | 91.25
Precision (%)  | 84.35 | 88.57
Recall (%)     | 59.85 | 94.11
Accuracy represents the model's ability to correctly predict both the positives and the negatives out of all predictions. Mathematically, it is the ratio of the sum of true positives and true negatives to all predictions: Accuracy = (TP + TN) / (TP + TN + FP + FN).
In the following
From this analysis it is proved that the proposed method performs better than the existing technique: the proposed MCNNLDPS shows higher accuracy than MLPNN, mainly because of the proposed SAFSA and MCNN formulations. Here the proposed MCNNLDPS attains 4.05% higher accuracy than the existing MLPNN.
Precision evaluates the fraction of correctly classified instances among the ones classified as positive. Thus, the formula to calculate the precision is given by: Precision = TP / (TP + FP).
The performance analysis in terms of Precision metric is shown in
In
Recall represents the model's ability to correctly predict the positives out of the actual positives: Recall = TP / (TP + FN).
The graphical evaluation of the recall score is clearly depicted in
F-Measure provides a way to combine both precision and recall into a single measure that captures both properties. The traditional F-measure is calculated as: F-Measure = 2 · Precision · Recall / (Precision + Recall).
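All four metrics can be computed directly from the confusion counts; a minimal sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-measure from confusion counts
    (true/false positives and negatives)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure
```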
The F Measure graphical evaluation is clearly shown in
The proposed MCNNLDPS methodology ensures an accurate liver disease prediction outcome. The accuracy and efficiency of the classifier are improved by performing feature selection before classification using the Score based Artificial Fish Swarm Algorithm, an improved fish swarm algorithm in which the position update is done with a new equation. The performance of the CNN classifier is further improved by choosing the weight and bias values optimally using a genetic algorithm. The numerical analysis of the research work has been carried out in MATLAB, from which it is proved that the proposed technique achieves a 4.05% increase in liver disease classification accuracy. The integration of the genetic algorithm with the convolutional neural network, which improves the accuracy of disease prediction, is the novelty of this research.