Automatic fruit classification is gaining importance because of rising labor costs and the shortage of labor in rural production. Fruit recognition is also required in agricultural applications such as robotic harvesting. The proposed system is therefore a deep-learning-based fruit classification system that can be used

Supermarkets place a high value on fruit quality assessment. A quality assurance check is necessary to guarantee that the stock is in acceptable condition and free of bad or blemished fruits. Such inspections are typically performed by visually evaluating the fruits and separating the spoiled and faulty ones from the good ones. However, human inspection is not always dependable because of stress, fatigue, interruption, and lapses in concentration. These shortcomings show the need for automated image recognition techniques.

Deep learning techniques have recently outperformed many algorithms in related fields, including image classification. The convolutional neural network (CNN) is one of the most powerful deep learning approaches for image categorization. According to several studies, CNNs outperform conventional image classification techniques such as k-nearest neighbors, the multilayer perceptron, and support vector machines. CNNs have become a hot research area in the field of image classification and recognition. Unlike traditional classification algorithms, a CNN learns to extract features from the image itself. The image can be used directly as input to the network, so it does not require extensive pre-processing or a separate feature extraction step.

We have also trained the model on the publicly available fruits-360 dataset. We have achieved 100 percent accuracy. We have obtained

The paper is organized as follows. Section 1 introduces the need for fruit classification and reviews related work. Section 2 presents the details of the two datasets used in our system, ImageNet and fruits-360. Section 3 explains the methodology and structure of our proposed CNN model and of the EfficientNet-b0 model. The findings of our experiments on both datasets, including transfer learning, are presented in Section 4. The resulting output images are discussed in Section 5. Section 6 gives the conclusion.

We have used two datasets to test the proposed model and compare its output for the recognition of fruits. The two datasets are the ImageNet dataset and the fruits-360 dataset.

| S.No. | Properties | ImageNet Dataset | Fruits-360 Dataset |
|---|---|---|---|
| 1 | Total Number of Images | 9,130 | 47,526 |
| 2 | Training Set Size | 7,760 | 40,397 |
| 3 | Validation Set Size | 1,370 | 7,129 |
| 4 | Test Set Size | 1,870 | 15,938 |
| 5 | Number of Classes | 11 | 92 |
| 6 | Image Size | 224X224X3 Pixels | 100X100X3 Pixels |
| 7 | Background | Not Homogeneous | Plain White |
| 8 | Each Image Contains | Single/Group of Fruits | Single Fruit |
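As a quick arithmetic check on the dataset table, the sketch below recomputes the training-plus-validation pool and the share held out for validation. The numbers are copied from the table; the dictionary layout and the function name are illustrative only.

```python
# Set sizes copied from the dataset table above.
splits = {
    "ImageNet": {"train": 7760, "val": 1370, "test": 1870},
    "fruits-360": {"train": 40397, "val": 7129, "test": 15938},
}


def validation_fraction(train, val):
    """Share of the train+validation pool that is held out for validation."""
    return val / (train + val)


for name, s in splits.items():
    pool = s["train"] + s["val"]
    frac = validation_fraction(s["train"], s["val"])
    print(f"{name}: train + val = {pool}, validation share = {frac:.1%}")
```

In both datasets the "Total Number of Images" row equals the training set plus the validation set, with roughly 15% of that pool used for validation; the test images are counted separately.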

All the images were downloaded from the ImageNet dataset. The

The fruits-360 dataset is a collection of images of fruits and vegetables. The current version at the time of writing is 2020.05.18.0.

The dataset is available on GitHub [fruits_360_github] and Kaggle [fruits_360_kaggle].

We have used only fruits of 92 categories. The

The total number of images in the training and testing sets is 47,526 and 15,938 respectively. The size of each image is 100X100X3 pixels. Each image contains a single fruit. Different varieties of the same fruit (for example, apples) are classified into different groups. There are numerous apple varieties, each of which is treated as a different category. They are identified by digits like apple red 1, apple red 2 and so on.

Convolutional neural networks are artificial neural networks with a deep feed-forward architecture. They are mostly used for 2D data, i.e. images. The proposed CNN pipeline is divided into three stages: preprocessing, feature extraction, and classification. CNNs require a large dataset. In preprocessing, the images are cropped and resized to 224X224X3 pixels for the ImageNet dataset and 100X100X3 pixels for the fruits-360 dataset. The images are in the standard RGB format, where 3 refers to the R, G, and B color channels. The next step is feature extraction, which is achieved with convolution layers. The proposed CNN model consists of several blocks, each of which has sub-blocks. Each sub-block has a convolution layer followed by a batch normalization layer, a ReLU layer, and a max-pooling layer. These blocks are attached directly to fully connected layers and finally to a softmax layer that classifies and recognizes the input images. The output feature map f_{om} is generated by convolving the input feature vector f_{im} with the kernel k(x, y), i.e.

f_{om}(x, y) = (f_{im} ∗ k)(x, y) = ∑_{i} ∑_{j} f_{im}(i, j) · k(x − i, y − j)    (1)
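To make Equation (1) concrete, here is a minimal pure-Python sketch of a full 2D convolution on nested lists. It is an illustration of the formula only, not the MATLAB layer used in the paper: the function name and the "full" output size are assumptions, and deep learning toolboxes usually compute a padded cross-correlation rather than this flipped-kernel form.

```python
def conv2d_full(f_im, k):
    """Full 2D convolution: f_om(x, y) = sum_i sum_j f_im(i, j) * k(x - i, y - j)."""
    height, width = len(f_im), len(f_im[0])
    k_height, k_width = len(k), len(k[0])
    out_h, out_w = height + k_height - 1, width + k_width - 1
    f_om = [[0.0] * out_w for _ in range(out_h)]
    for x in range(out_h):
        for y in range(out_w):
            total = 0.0
            for i in range(height):
                for j in range(width):
                    u, v = x - i, y - j
                    # Kernel values outside its support are treated as zero.
                    if 0 <= u < k_height and 0 <= v < k_width:
                        total += f_im[i][j] * k[u][v]
            f_om[x][y] = total
    return f_om


image = [[1, 2], [3, 4]]
kernel = [[1, 0], [0, -1]]
print(conv2d_full(image, kernel))
# [[1.0, 2.0, 0.0], [3.0, 3.0, -2.0], [0.0, -3.0, -4.0]]
```

Convolving with the 1X1 kernel [[1]] returns the input unchanged, which is a handy sanity check for the index arithmetic.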

The ReLU layer uses

f(x) = max(0, x)    (2)

as its activation function.

The ReLU layers introduce non-linearity so that the images can be classified correctly, but they do not reduce the dimensions of the feature maps. The ReLU layers are followed by max-pooling layers, which use kernels of size 2X2 and a stride of 2. These layers halve the height and width of each feature map, reducing it to one-fourth of its size, which lowers the number of computations and therefore the processing time. The pooling layers are followed by fully connected (FC) layers. Each neuron of an FC layer is connected to every neuron of the preceding layer. The FC layers are preceded by dropout layers, which regularize the model by randomly deactivating neurons during training and thus help avoid overfitting. Finally, the last layer uses a softmax activation function with as many outputs as there are categories. It calculates the probability of each class, and the class label with the highest probability is the output class of the given image.
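The three operations described above can be sketched in a few lines of pure Python. This is an illustrative sketch on nested lists with hypothetical function names, not the toolbox layers used in the paper.

```python
import math


def relu(feature_map):
    """Apply f(x) = max(0, x) element-wise; the shape is unchanged."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2X2 max-pooling with stride 2; halves the height and the width."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


fmap = [[-1, 2, 0, -3], [4, -5, 1, 2], [0, 0, -2, -1], [3, -4, -6, 5]]
pooled = max_pool_2x2(relu(fmap))
print(pooled)  # [[4, 2], [3, 5]]: 16 values reduced to 4

probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # index of the most probable class
```

The 4X4 map shrinks to 2X2, i.e. one-fourth of the original number of values, and the predicted class is simply the index of the largest softmax probability.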

The proposed CNN has 51 layers, organized in six blocks that are connected through max-pooling layers. Block_1 has three sub-blocks; each sub-block has a convolution layer with 32 filters of size 7X7X3 and a BiasLearnRate factor of 5 as L1 regularization. Block_2 has two sub-blocks, each with a convolution layer of 64 filters of size 5X5X3. Block_3, like Block_2, has two convolution layers, with 128 filters of size 3X3X3. Block_4 has three sub-blocks, each with a convolution layer of 256 filters of size 3X3X3. Block_5 has only one sub-block, with one convolution layer of 384 filters of size 3X3X3. The last block, Block_6, has two sub-blocks with convolution layers of 512 filters of size 3X3X3. Block_6 is followed by a max-pooling layer and then by four FC layers with 4096, 1024 and 120 neurons with 50%, 40%, 50% and 40% dropouts respectively. The final layer is the softmax layer, with 11 outputs for the ImageNet dataset and 92 for the fruits-360 dataset. The output with the maximum probability is the output class.
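The block structure can be written down as data to check a few consequences. The sketch below makes two assumptions that the text does not state: every convolution preserves the spatial size ("same" padding), and there are six 2X2, stride-2 max-pooling layers in total (five between the blocks plus the one after Block_6), each rounding down. All names are illustrative.

```python
# (block name, number of convolution sub-blocks, filters per layer, kernel size)
blocks = [
    ("Block_1", 3, 32, 7),
    ("Block_2", 2, 64, 5),
    ("Block_3", 2, 128, 3),
    ("Block_4", 3, 256, 3),
    ("Block_5", 1, 384, 3),
    ("Block_6", 2, 512, 3),
]

num_conv_layers = sum(n_sub for _, n_sub, _, _ in blocks)
print("convolution layers:", num_conv_layers)  # 13


def spatial_size_after_pools(size, num_pools):
    """Side length after repeated 2X2, stride-2 pooling (assumes floor rounding)."""
    for _ in range(num_pools):
        size //= 2
    return size


for name, side in [("ImageNet", 224), ("fruits-360", 100)]:
    final_side = spatial_size_after_pools(side, num_pools=len(blocks))
    features = final_side * final_side * blocks[-1][2]
    print(f"{name}: {side} -> {final_side}, flattened features = {features}")
```

Under these assumptions the 224X224 inputs shrink to 3X3 and the 100X100 inputs to 1X1 before the fully connected layers; a different padding or pooling configuration would change these numbers.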

We have used SGDM (stochastic gradient descent with momentum) as the optimizer, with a piecewise learning-rate schedule. The choice of optimizer is important because it determines how effectively the model's error function is reduced. The details of the hyperparameters are shown in

| S.No. | Hyperparameters | Value |
|---|---|---|
| 1 | Optimizer | SGDM |
| 2 | Momentum | 0.90 (default) |
| 3 | InitialLearnRate | 0.01 |
| 4 | Learning rate drop period | 50 |
| 5 | Learning rate drop factor | 0.2 |
| 6 | Batch size | 64 |
| 7 | Epochs | 110 |
| 8 | L2 Regularization | 0.0005 |

The performance of the proposed model can be compared against a well-trained convolutional neural network by using transfer learning.

We used a newly developed classifier, which has proven to be effective as well as efficient.

Accuracy =

where N_{pq} denotes the cumulative estimate of the classifier, and TP_{p} denotes the true positives (the number of images in category p that are correctly labelled as category p).
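As an illustration of how such an accuracy can be computed, the sketch below uses the standard overall-accuracy definition for a multi-class confusion matrix: the sum of the true positives divided by the total number of classified images. Reading N_{pq} as the number of images of category p that were assigned to category q is an assumption made here, and the example matrix is hypothetical rather than taken from the experiments.

```python
def overall_accuracy(confusion):
    """Sum of diagonal entries (true positives) divided by the sum of all entries."""
    true_positives = sum(confusion[p][p] for p in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return true_positives / total


# Hypothetical 3-class confusion matrix: rows are true classes, columns are predictions.
example = [
    [8, 1, 1],
    [0, 9, 1],
    [2, 0, 8],
]
print(f"accuracy = {overall_accuracy(example):.2%}")  # accuracy = 83.33%
```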

Stochastic gradient descent is among the most widely used deep learning optimization algorithms. The gradient descent algorithm updates the weights and biases of a deep neural network to reduce the loss function by taking small steps in the direction of the negative gradient of the loss,

θ_{ℓ+1} = θ_{ℓ} − α∇E(θ_{ℓ})    (4)

where ℓ is the iteration number, α > 0 is the learning rate, θ is the parameter vector, and E(θ) is the loss function.

In standard gradient descent, the entire training set is used to calculate the gradient of the loss function, ∇E(θ). The stochastic gradient descent algorithm instead evaluates the gradient and updates the parameters using a subset of the training set rather than the entire dataset. This subset is called a mini-batch. Each evaluation of the gradient on a mini-batch is called an iteration. With each iteration, the algorithm gets closer to minimizing the

An epoch is one complete pass of the training algorithm through all training samples using mini-batches. The 'MiniBatchSize' and 'MaxEpochs' name-value pair arguments set the mini-batch size and the total number of epochs, respectively. Here, the proposed model uses a MiniBatchSize of 64 and a MaxEpochs of 110. The stochastic gradient descent algorithm can oscillate along the path of steepest descent toward the optimum. One way to reduce this oscillation is to add a momentum term to the update:
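The relationship between epochs, mini-batches, and iterations can be worked out for the settings used here. The sketch assumes that a final incomplete mini-batch is dropped in each epoch; this is an assumption about the training loop, and the `drop_last` flag (a hypothetical name) shows the alternative.

```python
def iterations_per_epoch(num_train, mini_batch_size, drop_last=True):
    """Number of mini-batch updates in one pass over the training set."""
    full_batches, remainder = divmod(num_train, mini_batch_size)
    if drop_last or remainder == 0:
        return full_batches
    return full_batches + 1


MINI_BATCH_SIZE = 64
MAX_EPOCHS = 110

for name, num_train in [("ImageNet", 7760), ("fruits-360", 40397)]:
    per_epoch = iterations_per_epoch(num_train, MINI_BATCH_SIZE)
    print(f"{name}: {per_epoch} iterations/epoch, {per_epoch * MAX_EPOCHS} in total")
```

With 7,760 training images this gives 121 iterations per epoch (13,310 over 110 epochs), and with 40,397 images 631 iterations per epoch (69,410 in total); keeping the partial batch would add one iteration per epoch.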

θ_{ℓ+1} = θ_{ℓ} − α∇E(θ_{ℓ}) + γ(θ_{ℓ} − θ_{ℓ−1})    (5)

Equation (5) gives the update for stochastic gradient descent with momentum, where γ determines the contribution of the previous gradient step to the current iteration. The 'Momentum' name-value pair argument can be used to set this value. Here, it is 0.90 (the default). We prefer SGDM over other optimizers such as Adam: although Adam converges faster, SGDM generalizes better and thus results in improved final performance.
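The momentum update of Equation (5) can be illustrated on a toy problem. The sketch below minimizes the one-dimensional loss E(θ) = θ²/2, whose gradient is θ, using the full gradient instead of mini-batches; the loss, the starting point, and the function names are illustrative assumptions, while α = 0.01 and γ = 0.9 match the values reported here.

```python
def sgdm_step(theta, theta_prev, grad, alpha=0.01, gamma=0.9):
    """One update: theta - alpha * grad(theta) + gamma * (theta - theta_prev)."""
    return theta - alpha * grad(theta) + gamma * (theta - theta_prev)


def minimize(grad, theta0, steps, alpha=0.01, gamma=0.9):
    """Run the momentum update repeatedly; the first step has no momentum."""
    theta_prev, theta = theta0, theta0
    for _ in range(steps):
        theta, theta_prev = sgdm_step(theta, theta_prev, grad, alpha, gamma), theta
    return theta


def toy_gradient(theta):
    """Gradient of the toy loss E(theta) = theta**2 / 2."""
    return theta


final_theta = minimize(toy_gradient, theta0=1.0, steps=500)
print(f"theta after 500 steps: {final_theta:.2e}")
```

Setting `gamma=0.0` recovers the plain gradient descent update of Equation (4), which makes it easy to compare the two on the same problem.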

In MATLAB, we set solverName to 'sgdm' to use stochastic gradient descent with momentum for training a neural network. The starting value of the learning rate α is set with the 'InitialLearnRate' name-value pair argument; here α is 0.01. The learning rate can also be changed during training. We have used a piecewise learning-rate schedule, so after each drop period of 50 epochs the value of α is multiplied by a drop factor of 0.2.
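The piecewise schedule can be sketched as a small function. It assumes that the drop is applied at the start of epochs 51 and 101, i.e. after every 50 completed epochs; the function name and the 1-indexed epoch convention are illustrative choices.

```python
def piecewise_learn_rate(epoch, initial=0.01, drop_period=50, drop_factor=0.2):
    """Learning rate in effect during a given 1-indexed epoch."""
    num_drops = (epoch - 1) // drop_period
    return initial * drop_factor ** num_drops


for epoch in (1, 50, 51, 100, 101, 110):
    print(f"epoch {epoch:3d}: learning rate = {piecewise_learn_rate(epoch):.4f}")
```

Under this assumption, the 110 training epochs use a rate of 0.01 for epochs 1 to 50, 0.002 for epochs 51 to 100, and 0.0004 for the final ten epochs.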

We used an Intel Core i7 processor with 32 GB of RAM and an NVIDIA GeForce RTX 2060 Super GPU. The model was implemented in MATLAB R2021a with the Deep Learning Toolbox on Windows 10 Pro.

We have tested and analyzed the proposed model with two different datasets for fruit classification.

We trained the CNN model for the classification and recognition of fruits. The ImageNet dataset was used for the classification of 11 categories of fruits. We trained the model for 110 epochs with a mini-batch size of 64 and an initial learning rate of 0.01. We optimized the model using stochastic gradient descent with momentum (SGDM) and a piecewise learning-rate schedule. The data was shuffled after every epoch. The graphical representation of training and validation of the dataset for every epoch is shown in

We then tested the proposed model on the fruits-360 dataset. Image augmentation was applied in the same way as for the ImageNet dataset. We obtained 100% training accuracy and 100% validation accuracy. Training the model took 1001 minutes and 41 seconds.

Finally, we analyzed the EfficientNet-b0 model on the ImageNet dataset. In this case, we fine-tuned the last layers of the model. We used a small initial learning rate of 3e-4, 6 epochs, and a mini-batch size of 10 images. We optimized the model using stochastic gradient descent with momentum (SGDM) and a constant learning-rate schedule. The data was shuffled after every epoch. Training the model took 55 minutes and 1 second. The graphical representation of training and validation of the dataset for every epoch during fine-tuning of the model is shown in

We also tested the EfficientNet-b0 model on the fruits-360 dataset. Image augmentation was applied in the same way as for the ImageNet dataset. We obtained 100% training accuracy and 100% validation accuracy. Training the model took 350 minutes and 40 seconds.

| S. No. | Research Study | CNN Architecture/Classifiers | Dataset | Number of Images | Number of Categories | Image Size | Classification Accuracy | Test Accuracy |
|---|---|---|---|---|---|---|---|---|
| 1 | The Current Study | The Proposed CNN Model | ImageNet | 9,130 | 11 | 224X224X3 | 91.28% | 100% |
| 2 | The Current Study | The Proposed CNN Model | fruits-360 | 47,526 | 92 | 100X100X3 | 100% | 100% |
| 3 | The Current Study | EfficientNet-b0 Neural Network Model Using Transfer Learning | ImageNet | 9,130 | 11 | 224X224X3 | 96.77% | 100% |
| 4 | The Current Study | EfficientNet-b0 Neural Network Model Using Transfer Learning | fruits-360 | 47,526 | 92 | 100X100X3 | 100% | 99.90% |
| 5 | Classification of 7 fruits using CNN | CNN | ImageNet | 10,578 | 7 | 256X256X3 | 91.66% | N/A |
| 6 | Fruit Classification using EfficientNet and MixNet | EfficientNet-b0 Neural Network Model Using Transfer Learning | fruits-360 | 48,905 | 95 | 100X100X3 | N/A | 99.98% |
| 7 | Fruit Recognition Using EfficientNet-b0 algorithm | EfficientNet-b0 Neural Network Model Using Transfer Learning | fruits-360 | 17,624 | 25 | 100X100X3 | 95.67% | 98% |
| 8 | Fruit Recognition using Deep Learning | CNN | fruits-360 | 90,380 | 131 | 100X100X3 | 100% | 98.66% |
| 9 | Fruit Image Classification using Pure CNN | CNN | fruits-360 | 55,244 | 81 | 100X100X3 | 98.88% | 97.87% |
| 10 | Fruit Classification using Six Layer CNN | CNN | http://images.Google.com, http://images.baidu.com | 1,800 | 9 | 256X256X3 | 91.44% | N/A |

The fruits-360 images are clean, with a plain white background, so the validation and testing accuracies of both the proposed model and the transfer learning model are very high. The ImageNet images, on the other hand, have complicated backgrounds, so the models did not reach 100% validation accuracy on that dataset, although the testing accuracy was 100%. The effect of preprocessing and optimization was analyzed on both datasets. The factors found to influence performance were the initial learning rate, the optimizer, the number of epochs, L2 regularization, and the number of hidden layers. We used the same optimizer for the proposed model and for transfer learning, but the other parameters differed. They were chosen by trial and error; no specific algorithm was used to select the initial learning rate or the other hyperparameters.

The results summarized in

Once the model is trained, it can be used for testing. The testing accuracy is also 100%. The model recognizes a test image within a fraction of a second and can therefore be used in practical implementations.

Similarly,

We have developed a new CNN model for the classification and recognition of fruits using MATLAB. We achieved very good accuracy on the ImageNet and fruits-360 datasets. The results show that the performance of the proposed model is similar to that of transfer learning with the EfficientNet-b0 model. The testing accuracy is also very good on both datasets and for both architectures, so the approach can be applied in agriculture. The results are comparable to the state of the art, which suggests that the proposed model is robust.

Although the proposed model gives 100% accuracy, it requires more computations and therefore more time to train.

In future work, we can add more categories of fruits to the model. We can enlarge the training and validation sets to cover a wider variety of images. We can also test the model with different activation functions and optimization methods.