Brain tumor treatments include surgery, chemotherapy, radiotherapy, or a combination of these. Even with intensive medical monitoring, patients usually do not survive more than 14 months.
Magnetic resonance imaging (MRI) can produce high-quality images and is therefore considered effective for tracking brain tumors.
Two CNN architectures, Inception-V3 and VGG-Net, were used with features and weights extracted during training on the Image-net dataset
Abdalla and Esmail
Another automatic brain tumor detection method was proposed using CNNs with 3 × 3 kernels
A two-phase multi-model automatic diagnosis of brain tumors was proposed
The model in
A model for automatic brain tumor detection was proposed
A model for automatic brain tumor detection was proposed
N. Srivastava et al. in
P. Dvorák et al. in
S. Irsheidat et al. in
Sravya et al.
An automated brain tumor detection system
A brain tumor detection application
From the literature review, it is learnt that CNNs are a promising approach for accurate image classification. CNNs use kernels and feature maps to extract features, reducing the dimensionality of the input. This helps to increase the efficiency of the model with respect to time and memory constraints.
Insufficient training data leads to lower accuracy. For any classification problem with insufficient training data, transfer learning is an effective way to achieve higher accuracy, because it requires fewer computations and fewer trainable parameters than training from scratch, and therefore provides higher accuracy even when the amount of data is small. Fine-tuning is another technique that can increase model accuracy when the target classification problem differs from the problem on which the transferred model was originally trained.
In this study, an automatic brain tumor detection model is developed based on transfer learning, using the Inception-V3, VGG-16, and VGG-19 architectures. In addition, fine-tuning is performed to improve classification accuracy, because the dataset on which these architectures were originally trained differs from the brain MRI dataset used for the present problem.
In this study, automatic classification of brain MRI images into healthy images and images with tumors is proposed.
In this section, we give a brief description of the dataset used and an introduction to CNNs, transfer learning, and fine-tuning.
| Training Set || Test Set ||
|---|---|---|---|
| Yes | No | Yes | No |
| 1050 | 1050 | 450 | 450 |
The experiments described in this study were performed using a publicly available dataset acquired from a Kaggle repository. This dataset consisted of 1500 brain MRI images with tumors and 1500 brain MRI images without tumors. All images were two-dimensional with a size of 256 × 256 pixels. All images were skull-stripped and labeled yes if they contained a tumor and no if they did not.
The dataset contains 3000 images with and without brain tumors. All images were resized to 224 × 224 pixels, converted to grayscale, and then flattened into vectors that serve as input to the neural network. This preprocessed data, along with the image label, is provided as input to the network: label 0 represents an image without a tumor, and label 1 represents an image with a tumor.
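The preprocessing described above might look like the following minimal sketch using Pillow and NumPy; the helper name `preprocess` is ours, and pixel scaling to [0, 1] is an assumption:

```python
import numpy as np
from PIL import Image

def preprocess(image, label):
    """Convert an MRI slice to grayscale, resize it to 224 x 224,
    scale pixel values to [0, 1], and pair it with its label
    (0 = no tumor, 1 = tumor)."""
    image = image.convert("L")            # grayscale
    image = image.resize((224, 224))      # network input size
    array = np.asarray(image, dtype=np.float32) / 255.0
    return array, label
```

The flattened vector mentioned in the text would then be `array.reshape(-1)`.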
Although the dataset contains 3000 MRI images, this is considered insufficient to train a CNN model with on the order of a million parameters. As stated in the objectives, the solution to this problem is data augmentation, which is more effective than manual segmentation because the latter is time-consuming. Similarly, as more data is used, this approach minimizes the occurrence of errors, which are major setbacks in these types of applications. Data augmentation is a technique for artificially increasing the size of existing data by rotating and scaling images and adding noise; the data can also be enlarged by flipping images horizontally or vertically, rotating them by a certain angle, zooming, and increasing or decreasing the brightness range. All of these operations were applied, expanding the data to approximately 16 times its original size. This helps to ensure that the model does not overfit the data.
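A minimal NumPy sketch of such augmentation (flips, a rotation, brightness shifts, and additive noise); the helper name `augment` and the specific parameter values are illustrative assumptions:

```python
import numpy as np

def augment(image, rng):
    """Generate augmented variants of one image in [0, 1]:
    horizontal/vertical flips, a 90-degree rotation,
    brightness shifts, and Gaussian noise."""
    variants = [
        np.fliplr(image),                        # horizontal flip
        np.flipud(image),                        # vertical flip
        np.rot90(image),                         # rotation
        np.clip(image * 1.2, 0.0, 1.0),          # brighten
        np.clip(image * 0.8, 0.0, 1.0),          # darken
        np.clip(image + rng.normal(0.0, 0.02, image.shape), 0.0, 1.0),  # noise
    ]
    return variants
```

In practice a library generator (e.g. Keras's image augmentation utilities) would apply such transforms on the fly during training.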
A CNN
It mainly consists of a sequence of four steps for classification: convolution, activation, pooling, and fully connected layers, as shown in
Convolutional layer: This layer performs a mathematical operation that requires two inputs: the input image matrix and a filter (kernel). The filter is convolved with the input image, and a feature map is generated as the output.
Activation layer: This layer applies an activation function that introduces nonlinearity into the neural network. Rectified linear units (ReLUs) are used because they increase the training speed. Equation (1) shows the ReLU activation: f(x) = max(0, x).
Pooling layer: The main limitation of the convolutional layer is that it captures features at specific locations. Thus, if the location of a feature in the image changes slightly, the classification becomes inaccurate. Pooling allows the network to overcome this limitation by making the representation more compact, so that it is invariant to minor changes and insignificant details. Max pooling and average pooling were used to aggregate the features.
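As an illustration of the pooling step, here is a minimal NumPy sketch of 2 × 2 max pooling (the helper name `max_pool_2x2` is ours):

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a 2-D feature map by taking the maximum of each
    non-overlapping 2x2 block (trailing odd rows/columns are dropped)."""
    h, w = x.shape
    blocks = x[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))
```

For example, pooling the 4 × 4 map `np.arange(16).reshape(4, 4)` yields the 2 × 2 map `[[5, 7], [13, 15]]`, each entry being the largest value in its block.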
Fully connected layer: The features learned by the convolutional layers are finally fed into the fully connected layer. The term “fully connected” means that every node in this layer is connected to every node in the next layer. The main purpose of this layer is to associate a class with a particular input image. This layer uses softmax activation.
Loss function: This function (H) must be minimized during training. The output computed after the image has passed through all the previous layers is compared with the desired output using the loss function, and the error is calculated. This process is repeated over several iterations until the loss function is minimized. The loss function used here is categorical cross-entropy (CCE). Equation (2) shows the mathematical form of CCE: H(y, ŷ) = −Σᵢ yᵢ log(ŷᵢ), where yᵢ is the true label and ŷᵢ is the predicted probability for class i.
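The ReLU activation (Equation (1)) and the categorical cross-entropy loss (Equation (2)) can be sketched in NumPy as follows (function names are illustrative; the clipping constant guards against log(0)):

```python
import numpy as np

def relu(x):
    """Equation (1): f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Equation (2): H(y, yhat) = -sum_i y_i * log(yhat_i),
    averaged over the samples in the batch."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=-1)))
```

A perfect prediction gives a loss of zero, and the loss grows as the predicted probability assigned to the true class shrinks.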
Transfer learning
Transfer learning is mainly used when insufficient training data is available. Therefore, the architectures were initialized with the features and weights obtained from training on the Image-net dataset. The Inception-V3
These models then receive the brain magnetic resonance images as input. The input and output layers and the fully connected layer with softmax activation are trained, and the features are learned. These features are used to classify the images into two classes: healthy and tumor.
The Inception-V3
The features of the Inception-V3
For Inception-V3, 29,630,466 parameters were trained while fine-tuning the last 156 layers, and an accuracy of 89% was achieved on the training set and 89% on the test set. For VGG-16, 11,800,066 parameters were trained while fine-tuning the last nine layers, and an accuracy of 98% was achieved on the training set and 96% on the test set. For VGG-19, 18,289,922 parameters were trained while fine-tuning the last 11 layers, and an accuracy of 98% was achieved on the training set and 97% on the test set.
The model was evaluated using the accuracy metric. The accuracy of a model is defined as the ratio of the number of correctly classified images to the total number of images in the dataset. The accuracy of the training and test sets was calculated using Equation (3): Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP represents True Positives, TN True Negatives, FP False Positives, and FN False Negatives.
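Equation (3) amounts to the following one-line helper (a sketch; the name `accuracy` is ours), shown with a worked example:

```python
def accuracy(tp, tn, fp, fn):
    """Equation (3): fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

# Example: 40 true positives, 45 true negatives, 5 false positives,
# and 10 false negatives give (40 + 45) / 100 = 0.85.
example = accuracy(40, 45, 5, 10)
```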
Initialize the CNN and load its weights and features.
Replace the input and output layers and the fully connected layer with new layers using the ReLU activation function.
Freeze all other layers and train the model.
Finally, re-train the last layers to update the higher-level parameters responsible for accurate tumor classification, using a very low learning rate and the Adam optimizer.
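The steps above can be sketched in Keras, the framework used in this study, taking VGG-16 as an example. The function name, head sizes, and `num_unfrozen` default are our assumptions; in practice `weights="imagenet"` would be passed to load the pretrained base. Note that ImageNet-pretrained weights expect 3-channel input, so grayscale slices are typically stacked to 3 channels:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_fine_tuned_vgg16(weights=None, num_unfrozen=9):
    """Sketch of the transfer-learning + fine-tuning recipe:
    load a (pre-trained) VGG-16 base, attach a new 2-class softmax
    head, freeze the base, then unfreeze the last layers and compile
    with a very low learning rate."""
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                        # freeze all base layers

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),     # new fully connected layer
        layers.Dense(2, activation="softmax"),    # healthy vs. tumor
    ])

    for layer in base.layers[-num_unfrozen:]:     # re-train the last layers
        layer.trainable = True

    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training would then proceed with `model.fit(...)` using the hyperparameters reported in the paper (30 epochs, batch size 128).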
The dataset used was publicly available on the Kaggle website, where a custom dataset containing 3000 brain MRI images was published for research purposes.
This dataset contains 1500 images with tumors and 1500 without, each of size 256 × 256 pixels. All images were resized to 224 × 224, converted into vectors, and fed into the neural network along with their labels.
Data augmentation includes many methods, such as rotating images, flipping images, increasing or decreasing image brightness, and increasing or decreasing image size.
The proposed method was implemented in Python with Keras, using TensorFlow as the backend. Training ran for 30 epochs with a batch size of 128. The learning rate was set to 0.0001 and was chosen carefully, as a higher learning rate causes the optimizer to diverge rather than converge the loss function.
Because we aimed for an accurate and reliable tumor detection method, we used three CNN architectures: the deep Inception-V3 and the shallower VGG-16
Inception-V3: Inception-V3 is a deep architecture with 48 inception layers, each of which consists of four convolutional layers with activation functions and two max-pooling layers
VGG-16: VGG-16 has a shallow architecture with only 16 layers, as shown in
VGG-19: VGG-19 is a CNN architecture with 19 layers, as shown in
All architectures, Inception-V3, VGG-16, and VGG-19, were trained for 30 epochs with a batch size of 128, using categorical cross-entropy as the loss function. The Adam optimizer with a very low learning rate of 0.0001 was used for optimization; the step size was kept small because larger values cause the loss function to diverge instead of converge. The accuracy and loss plots for Inception-V3 are shown in
Inception-V3 was trained with 4098 parameters, VGG-16 with 1029 parameters, and VGG-19 with 4096 parameters using transfer learning. During fine-tuning, Inception-V3 was trained with 29,630,466 parameters, VGG-16 with 11,800,066 parameters, and VGG-19 with 18,289,922 parameters for 20 epochs. The results obtained after fine-tuning were better than those with transfer learning alone, because the weights of the model were re-trained on the current dataset and updated for the problem at hand.
| Architecture | Fine-Tuning | Training Set Accuracy | Test Set Accuracy |
|---|---|---|---|
| Inception-V3 | Not applied | 87% | 86% |
| Inception-V3 | Applied | 89% | 89% |
| VGG-16 | Not applied | 87% | 86% |
| VGG-16 | Applied | 98% | 96% |
| VGG-19 | Not applied | 84% | 83% |
| VGG-19 | Applied | 98% | 97% |
The results obtained with transfer learning and fine-tuning are presented in
Inception-V3 achieved 87% accuracy on the training set and 86% on the test set with transfer learning. After fine-tuning, it achieved 89% on the training set and 89% on the test set, representing a 2% increase in training accuracy and a 3% increase in test accuracy. VGG-16 achieved 87% accuracy on the training set and 86% on the test set with transfer learning. After fine-tuning, it achieved 98% on the training set and 96% on the test set, an 11% increase in training accuracy and a 10% increase in test accuracy. VGG-19 achieved 84% accuracy on the training set and 83% on the test set with transfer learning. After fine-tuning, it achieved 98% on the training set and 97% on the test set, a 14% increase in both training and test accuracy.
VGG-19 and Inception-V3 achieved similar accuracy with transfer learning, as they trained almost the same number of parameters. However, during fine-tuning, Inception-V3 had to train a much larger number of parameters than VGG-19. Therefore, although the accuracies achieved by Inception-V3 and VGG-19 after transfer learning were similar, there was a significant difference in the accuracies they achieved after fine-tuning.
VGG-16 and VGG-19 achieved higher accuracies than Inception-V3 after fine-tuning because the efficiency of the optimizer is affected by the number of layers present in the network. The optimizer works better with networks with fewer layers than those with more layers. Because the optimizer is efficient with VGG-16 and VGG-19, they achieve better accuracy than Inception-V3.
| Model | Accuracy (%) | Description |
|---|---|---|
| VGG-19 | 97.00 | Trained on automatically extracted MRI features using transfer learning and fine-tuning 12 layers with learning rate = 0.0001 |
| VGG-16 | 96.00 | Trained on automatically extracted MRI features using transfer learning and fine-tuning 9 layers with learning rate = 0.0001 |
| VGG-16 (2020) | 96.00 | Trained on a small dataset for 15 epochs using transfer learning |
| Inception-V3 | 89.01 | Trained on automatically extracted MRI features using transfer learning and fine-tuning 155 layers with learning rate = 0.0001 |
| VGGNet-16 (2017) | 83.86 | Trained on the Image-Net dataset and tested on 200 × 200 MRI slices |
| ResNet (2017) | 84.91 | Trained on the Image-Net dataset and tested on 200 × 200 MRI slices |
| Inception-V3 (2020) | 75.00 | Trained on a small dataset for 15 epochs using transfer learning |
Therefore, it can be stated that whenever the dataset is limited and contains an insufficient number of images to train a neural network, transfer learning can be used to achieve better accuracy in less time than training the model from scratch. Fine-tuning can then be used in conjunction with transfer learning to achieve higher accuracy, as the weights of the model are adjusted to fit the current problem.
In this study, we presented a solution to a computer vision problem: automating the detection of brain tumors in MRI images using CNNs and transfer learning. Deep (Inception-V3) and shallow (VGG-16 and VGG-19) CNN architectures were used to extract features via transfer learning, and some weights were then updated by fine-tuning. An accuracy of 89% was achieved with Inception-V3, 96% with VGG-16, and 97% with VGG-19 on the experimental dataset. Data augmentation was used to reduce the likelihood of overfitting, because the dataset was small, and to address the problem of biased datasets. Transfer learning offers a way to analyze data with few annotations by transferring knowledge from a source domain to a target domain. In the future, this technique could be extended to determine the size of a tumor so that its stage can also be estimated, and the model can be recommended for applying transfer learning to any other tumor detection problem for which training data is lacking.
The data used to support the findings of this study are available at
https://www.kaggle.com/ahmedhamada0/brain-tumour-detection.