Date Fruit Recognition using Feature Extraction Techniques and Deep Convolutional Neural Network

Numerous biotechnology software applications have been developed to provide computational solutions to complex agricultural problems such as disease identification and plant growth monitoring. Dates are a healthy fruit, and their contribution to the total GDP of Pakistan is approximately 4%, with District Khairpur providing approximately 81% of production. Approximately 22 types of dates are produced in different areas of Pakistan. It is observed that national as well as international buyers are unable to correctly identify the type of dates. Objectives: This study presents a framework for the recognition of dates using a deep learning technique based on color, shape and size feature extraction methods. Methods: We established a dataset of 500 fruit images for evaluation, similar to the 360 dataset. Three types of dates were selected for the experiments: Aseel, Karbalain and Kupro. Of the 500 date fruit samples collected, 350 were used for training and 150 for testing. Findings: Experiments were performed on the selected samples following the proposed framework. For better accuracy, we used a combination of several hidden layers and 100 epochs, which gave the best performance of 97.2% at the 4th epoch. A confusion matrix was used to analyze and measure the accuracy of the results, through which we obtained 89.2% as true positives. Application and Improvements: The outcome will be beneficial for buyers, researchers and automated factory classification.


Introduction
Date fruits are sweet and edible and are consumed as a staple food. Dates contain practically no sodium or cholesterol. They are very beneficial and are regarded as a divine gift, especially for heart patients. Dates are one of the major crops of Pakistan and hold enormous significance in the global market. There are more than 400 varieties of date fruit found around the world, among which 40 types of dates [...] order to identify and classify fruit images. They provide better accuracy than other machine learning algorithms. The Convolutional Neural Network (CNN) is the most common example of an artificial neural network and is considered one of the most efficient deep learning algorithms [4]. Furthermore, many other algorithms have been proposed to extract the characteristic features of fruit from images for identification and classification. Fruit can be detected and graded according to its quality using color, shape and size features, and this can be done with various classifiers. To identify the class of a fruit, morphological and color features have been used with a neural network [4,5].
Because consumers lack the skills to identify different types of dates, it is important to develop an application that can help with this task in the marketplace. In this study, a date recognition framework using a deep neural network (DNN) based on feature extraction methods is proposed and implemented.
The rest of the paper is organized as follows. Related work and the research contributions are described in Section 2. The collected data is presented in Section 3, and the flow of the developed system is given in Section 4. The results and concluding remarks are presented in Sections 5 and 6 respectively.

Literature Survey
A lot of work has been done on the classification and recognition of fruits, including date fruit, using different statistical and artificial intelligence techniques. The method of fuzzy inference was described in [6]. A complete mechanism based on Gaussian mixtures for date classification was proposed in [7]. The date samples pertaining to every module were further checked with Mardia's multivariate tests. A fruit recognition system based on a convolutional neural network was introduced in [8], using a dataset of 38409 high-quality images of 60 different fruits. The authors trained a neural network model for 40000 iterations on batches of 50 images for fruit detection and achieved an accuracy of 96.3% during the testing phase. An R-CNN-based framework for orchards, built on a state-of-the-art detection technique, was proposed in [9]. Three different fruits were selected for individual study, i.e. apple, mango and almond. With a large number of training images, flip and scale data augmentation techniques were found to give the best results. The experiments produced a detection F1-score of >0.9 for apples and mangoes.
Likewise, a method for generating a large-scale semantic data set of synthetic images was introduced in [10]. The main purpose of generating this segmentation data set for agriculture was to bootstrap or pre-train computer vision models for image identification. An adaptive approach for detecting the area of interest in fruit images was presented in [11]. The solution combined a novel neural architecture with heuristic search and was applied to three types of fruit, i.e. apple, banana and orange. The input images were first segmented to simplify processing, and a heuristic search based on an Adaptive Artificial Neural Network was then performed to detect the area of interest. Similarly, a method for automatic fruit recognition and counting from multiple images was presented in [12]. It uses a bag-of-words model to find fruit in images and a novel statistical approach for counting. A correlation of 74.2% was achieved in experiments on 28000 color images.
Another automatic fruit segmentation and classification approach was presented in [13]. The most relevant features were extracted from images by combining a feature learning algorithm with a conditional random field, and a classification accuracy of 88% was achieved. Furthermore, a fruit recognition method based on compressed sensing was proposed in [14] to reduce the complexity of the algorithm. Different classes of fruit were tested and a recognition rate of 83% was achieved. A fruit-harvesting autonomous robot was designed in [15]. In that work, an advanced neural network method and a stereo matching technique were applied for fruit image identification. Likewise, [16,17] developed an innovative classification approach based on image processing and machine learning algorithms using various classifiers such as SVM, KNN and Naive Bayes. Similarly, a fruit detection approach based on a deep convolutional network was presented in [18]. Fruit detection was performed by adapting the Faster Region-based CNN model to a combination of RGB color and Near-Infrared (NIR) images.
After reviewing the literature, it is observed that the majority of researchers prefer color, shape and size based techniques for the classification and recognition of dates and other fruits. Fadel [19] developed a classifier for separating date fruit on RGB color criteria using a probabilistic neural network. Similarly, [20] presented a system using RGB images that classifies dates into three categories by extracting their qualities. Likewise, [21] introduced a classification method based on computer vision and pattern recognition, which was tested on an image set containing seven different types of dates, with fifteen features extracted from the images. The classification system proposed by Muhammad [1] extracts features such as texture, size and color from images of dates, then decomposes the image into components and applies the Weber local descriptor to each component for classification. A novel method for date fruit grading presented in [22] used a KNN classifier with a combination of shape and texture features, obtained from the contour of the date fruit, the curvelet transform and the local binary pattern respectively.
Our research is also based on three features, i.e. color, shape and size, and some of the techniques used are taken from the above-mentioned literature. The specimens of date fruit selected in this research are completely different from the types used in previous research, owing to changing weather and environmental conditions.

Data Collection
The authors of this research belong to District Khairpur, which is the major date-growing city of Pakistan. The date season ended in August 2018. Moreover, different types of dates grow in District Khairpur. Initially, three types of dates, i.e. Aseel, Karbalain and Kupro, were selected for the experiments. A sample image of Aseel is shown in Figure 1. The samples of these date types were collected at the end of July 2018. For data collection, a digital single-lens reflex camera was used at the 120 megapixel standard on a clear blue-sky day with 120,000 lux illuminance. The captured date images were then saved to computer memory. Images were cropped and converted to 300 dpi using Adobe Photoshop CS6.
Table 1. Statistical information of collected date images. Total: 600, 500.

Overview of Dates Recognition System
Various mechanisms have been proposed for the identification and classification of fruits and vegetables. In this research project, date fruit was selected because dates are the core fruit of District Khairpur. Approximately 22 types of dates are produced; some are grown in small quantities, while others, on the basis of taste, are grown in enormous quantities. From the most widely grown types, three well-known and popular types, i.e. Aseel, Karbalain and Kupro, were selected for experimentation. Experiments for the identification of different types of [...] A dataset for training and testing was created in order to store and process the captured images. The next step is preprocessing, where a normalization technique is applied to all images to make them uniform in size and shape. A noise reduction operation is then applied to all collected date images to remove very small-scale features. The third part of preprocessing is skeletonization, which extracts the region-based shape of the date fruit from the image.
To remove the background and isolate the shape of the date, an image thresholding technique is used, which successfully removes the entire background of the image (everything except the date fruit itself). The thresholded image is then used as input to the histogram step. Using the histogram technique, graphs of the RED, GREEN and BLUE components of the thresholded image are generated. Furthermore, a color feature extraction method is used to extract the color pixel values of the preprocessed image. The area, perimeter, length, width and centroid of the date in the image are calculated to extract its size features, while the shape features are extracted with the Scale-Invariant Feature Transform (SIFT) method.
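The thresholding and size-feature steps described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the threshold level of 128 and the assumption that the fruit is darker than the background are all hypothetical, and perimeter is omitted for brevity.

```python
def threshold_mask(gray, level):
    """Binary mask: 1 where a pixel is darker than `level` (assumed fruit), else 0."""
    return [[1 if px < level else 0 for px in row] for row in gray]


def size_features(mask):
    """Area, bounding-box length/width and centroid of the foreground pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    area = len(coords)                      # number of foreground pixels
    length = max(cols) - min(cols) + 1      # horizontal extent in pixels
    width = max(rows) - min(rows) + 1       # vertical extent in pixels
    centroid = (sum(rows) / area, sum(cols) / area)  # (row, column)
    return {"area": area, "length": length, "width": width, "centroid": centroid}


# Toy 5x6 grayscale image: bright background (250), dark fruit region (60).
gray = [
    [250, 250, 250, 250, 250, 250],
    [250,  60,  60,  60,  60, 250],
    [250,  60,  60,  60,  60, 250],
    [250,  60,  60,  60,  60, 250],
    [250, 250, 250, 250, 250, 250],
]
mask = threshold_mask(gray, level=128)  # 128 is an illustrative threshold
features = size_features(mask)
print(features)
```

On this toy image the fruit region is a 3 by 4 block, so the sketch reports an area of 12 pixels, a length of 4, a width of 3 and a centroid at row 2.0, column 2.5.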
Using the histogram and the extracted feature values, the date fruit is classified with a deep learning method. The trained neural network is able to recognize and classify a specific date type whose graph values, pixel information, shape key points, length, width and color percentages are similar to the target values.

Experiments and Results
The collected images of date fruit were processed with the proposed mechanism. 200 image samples were collected from each selected date type for experimentation. The confusion matrix has also been used by researchers to assess accuracy in fruit classification [23]. Hence, the outcome of the developed application is presented using a confusion table for the identification of the correct date type.
The actual process begins with background removal by image thresholding. For image enhancement, the color space of the selected image must be changed to obtain a single channel for luminance information and two other channels for chrominance information. Therefore, various color spaces were applied to the original color image in order to find a suitable one.
Of the various color space operations applied, the outputs of YCbCr and L*a*b* are shown in Figure 3. With the RGB technique, the image loses its shadow detail: dark areas are turned into black (black crush), so the information in those areas is also lost.
The YCbCr threshold technique used in the experiment splits the image into luminance (Y) and two chrominance components (Cb and Cr). Segmenting the date fruit with this technique removes the complex background and the effect of different illumination conditions. Because the chromaticity of the YCbCr color space is used, these effects on the segmentation of the date region are avoided.
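To make the luminance/chrominance split concrete, the following Python sketch converts a single 8-bit RGB pixel to YCbCr. It assumes the full-range BT.601 coefficients used by JPEG/JFIF; the paper does not state which YCbCr variant was used, and MATLAB's own conversion uses a different (studio-range) scaling, so this is an illustration only.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (BT.601 / JFIF coefficients).

    Results are returned as floats without rounding or clamping.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# Neutral pixels differ only in Y; their chrominance stays at about 128.
dark_gray = rgb_to_ycbcr(40, 40, 40)
light_gray = rgb_to_ycbcr(200, 200, 200)
print(dark_gray, light_gray)
```

A neutral shadow and a neutral highlight differ only in the Y channel, which is one way to see why thresholding on Cb and Cr is less sensitive to illumination changes than thresholding on RGB values.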
The L*a*b* technique describes all perceivable colors in three dimensions: L* for lightness, and a* and b* for the green-red and blue-yellow color opponents. It consists of three layers: luminosity 'L*', chromaticity 'a*', which indicates where the color falls along the green-red axis, and chromaticity 'b*', which indicates where the color falls along the blue-yellow axis. Among the three techniques described, L*a*b* was the most suitable and was selected for further processing.
After removing the background, it can clearly be seen that some noise, such as holes, appears in the image region and that some of the major boundary content is removed. This is because those pixels have values similar to the background pixels. To overcome this problem, the image is converted to grayscale, as shown in Figure 3, in order to recover the lost data. The histogram technique is then applied to the grayscale image to obtain the graphs used in further processing.
In the histogram, the luminance of the image is shown on the horizontal (X) axis, running from pure black at the left edge of the graph to pure white at the right edge. The relative quantity of pixels at each luminance level is shown on the vertical (Y) axis.
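A grayscale histogram of this kind can be sketched in a few lines of Python. The example assumes 8-bit integer pixel values (0 for pure black, 255 for pure white); the function name and the toy image are illustrative only.

```python
def gray_histogram(gray, bins=256):
    """Count how many pixels fall on each intensity level (index 0 = black)."""
    hist = [0] * bins
    for row in gray:
        for px in row:
            hist[px] += 1
    return hist


# Toy 2x3 grayscale image with 8-bit intensities.
gray = [
    [0, 0, 128],
    [255, 128, 128],
]
hist = gray_histogram(gray)
# The list index is the X axis (luminance); the stored count is the Y axis.
print(hist[0], hist[128], hist[255])
```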
The color histogram technique is also applied to the images with and without background. This technique is typically used to adjust the global contrast of images, but in this research it is applied to extract the pixel information in the form of a graph.
When the color histogram technique is applied, the output is a graph that shows the ratio of the RED, GREEN and BLUE components of the image, both individually and combined. Along with the graph, the percentage of RED, GREEN and BLUE pixels is also obtained for the image without background. To analyze the texture of the input image, pixel information is needed, and MATLAB provides a built-in tool that returns the equivalent pixel matrix of the image, as shown in Figure 3.
In Figure 4, the first quadrant shows the RED, GREEN and BLUE components collectively. For the recognition of more than one date type, however, a graphical representation of each of the three components individually, along with their values, is required for comparison and classification. The color histograms of the RED-, GREEN- and BLUE-sensitive pixels therefore show their respective values. The same histogram technique is applied to the thresholded image of the date fruit, which has no background, and the RED, GREEN and BLUE components of the thresholded image are likewise obtained as a color histogram together with the pixel information.
The intensity level is measured by extracting pixel information from both images, with and without background. The results show that the image with background has a higher intensity level than the image without background. The pixel information within the boundary of the date fruit does not differ between the two images, but the pixels outside the fruit region contain different values, which hardly affects the subsequent processes. To obtain accurate results, an image with low intensity is needed; therefore the image without background is selected for further processing.
To extract the color features, an M-by-N-by-3 array is used to store the color image data, defining the red, green and blue color components of each individual pixel. The combination of R, G and B intensities, stored in each color plane at the pixel's location, defines the color of that pixel. In an RGB array, the value of each color component varies between 0 and 1. A pixel with value (0,0,0) is displayed as black and a pixel with value (1,1,1) is displayed as white.
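As an illustration of working with such an M-by-N-by-3 array, the Python sketch below computes the share of total intensity contributed by each channel, skipping black (0,0,0) pixels as removed background. The paper does not define how its color percentages are computed, so this particular definition, the function name and the toy image are assumptions.

```python
def channel_percentages(image):
    """Percentage of total intensity contributed by R, G and B.

    `image` is a nested list of rows of (r, g, b) tuples with values in [0, 1].
    Pure black pixels are treated as background and ignored (an assumption).
    """
    totals = [0.0, 0.0, 0.0]
    for row in image:
        for pixel in row:
            if pixel == (0.0, 0.0, 0.0):
                continue  # background after thresholding
            for k in range(3):
                totals[k] += pixel[k]
    grand = sum(totals)
    if grand == 0:
        return (0.0, 0.0, 0.0)
    return tuple(100.0 * t / grand for t in totals)


# Toy 2x2 image: two background pixels and two reddish fruit pixels.
image = [
    [(0.0, 0.0, 0.0), (0.5, 0.25, 0.25)],
    [(0.5, 0.25, 0.25), (0.0, 0.0, 0.0)],
]
red_pct, green_pct, blue_pct = channel_percentages(image)
print(red_pct, green_pct, blue_pct)
```

For the toy image the red channel carries 50% of the intensity and the green and blue channels 25% each.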
Along with the color features, shape features were also extracted in order to obtain meaningful results. Using various operations of the Scale-Invariant Feature Transform (SIFT) algorithm, different values for the shape features were generated. Using the Harris feature detector, similar features were extracted and matched in the original and grayscale images, as shown in Figure 5. The SIFT feature generator eliminates edge responses, which helps to identify the major and minor edges in an image and to localize accurate key points for shape identification. The edge responses and key point localization are also depicted in Figure 5. For better results, the Canny edge detector is used along with the Harris feature detector in order to extract the maximum number of similar shape features. The Canny edge detector detects internal edges in an image along with the boundary edges, which allows the Harris feature detector to match the maximum number of features, as shown in Figure 5.
Region properties of the image, namely area, perimeter, major axis, minor axis, centroid and bounding box, are extracted to determine the size of the fruit in the image. The perimeter is measured in pixels. First, the connected components in a grayscale or binary image are labeled; then, using the regionprops() function, multiple regions are extracted and structured in an array. To obtain the outlines as an array of (x,y) coordinates of individual horizontal, vertical and diagonal pixels, the bwboundaries() function is used along with plot() to draw all boundary pixels over the original image, and the total number of pixels is counted from the binary image, as shown in Figure 6. The region properties such as area, perimeter, bounding box, length, width, centroid and total number of pixels inside the boundary were calculated as size features and [...]
In this experiment we trained a network with 5 input neurons, a fully connected hidden layer of 10 neurons and an output layer, with a mini-batch size of 50. The simulation was performed in MATLAB and achieved 97.2% identification accuracy with the given inputs and target values, as depicted in Figure 8. The results are analyzed through a confusion table.
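The shape of such a network (5 inputs, 10 hidden neurons) can be sketched as a forward pass in plain Python. Several details are assumptions that the paper does not specify: the three-way softmax output (one unit per date type), the tanh hidden activation, the random untrained weights and the example feature vector. The sketch shows only the data flow and is not the trained MATLAB model.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, hidden, output):
    h = [math.tanh(v) for v in dense(features, *hidden)]  # assumed activation
    return softmax(dense(h, *output))


rng = random.Random(0)
hidden = init_layer(10, 5, rng)   # 5 inputs -> 10 hidden neurons
output = init_layer(3, 10, rng)   # 10 hidden -> 3 classes (assumed)
CLASSES = ["Aseel", "Karbalain", "Kupro"]

features = [0.42, 0.31, 0.27, 0.80, 0.55]  # hypothetical normalized features
probs = forward(features, hidden, output)
predicted = CLASSES[probs.index(max(probs))]
print(predicted, probs)
```

With untrained weights the predicted label is meaningless; training would adjust the weights so that the highest probability falls on the correct date type.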
Image-based results are commonly displayed in matrix or table form. Hence, in this research a confusion table is used to present the results: the developed system identifies most images correctly and gives positive results, and only seldom produces true negative and false positive results, which are unexpected. The detailed results are presented in Table 2.
The three selected date types were evaluated together. Successful identification, in terms of true positives, was achieved with the Karbalain date, while unexpected results such as true negatives were also obtained with the Aseel date. The true positive results for the Aseel and Kupro dates are quite acceptable. False negatives and unexpected false positives were also obtained with images of Aseel and Kupro.
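For readers unfamiliar with confusion tables, the following Python sketch shows how one is built from actual and predicted labels and how overall accuracy and per-class true positive rates are derived from it. The label lists are invented toy data and do not reproduce the values in Table 2.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def true_positive_rates(matrix):
    """Per-class share of actual samples that were predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


CLASSES = ["Aseel", "Karbalain", "Kupro"]
# Toy labels for 12 test images (not the paper's data).
actual = ["Aseel"] * 4 + ["Karbalain"] * 4 + ["Kupro"] * 4
predicted = (["Aseel", "Aseel", "Aseel", "Kupro"]
             + ["Karbalain"] * 4
             + ["Kupro", "Kupro", "Aseel", "Kupro"])

matrix = confusion_matrix(actual, predicted, CLASSES)
overall = accuracy(matrix)
rates = true_positive_rates(matrix)
print(matrix, overall, rates)
```

In this toy example one Aseel image is mistaken for Kupro and one Kupro image for Aseel, so 10 of the 12 images are classified correctly.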

Conclusion
It is demonstrated that the proposed framework is very effective in classifying dates based on color, size and shape features extracted from images of dates. Approximately 22 types of dates are grown in the province of Sindh; only three types, i.e. Aseel, Karbalain and Kupro, were selected for recognition and classification. 500 samples were collected, of which 350 were used in the experiments. The experiments were performed in MATLAB using the proposed framework. Before recognition, three preprocessing operations were performed on each input image. Different color-based techniques and shape and size algorithms were used throughout feature extraction and recognition, and results at an acceptable level were obtained by evaluating graphs and extracting pixel information. The results are presented in a confusion table. The best overall accuracy achieved during the experiments is 97.2%. In future work, the number of features (such as texture) and the number of date types will be increased to develop a more efficient and powerful recognition and classification application.