Optimization is a central part of machine learning and has attracted abundant attention from researchers. With the exponential growth of data volume and the increasing complexity of models, optimization strategies in machine learning face additional challenges
CNN pre-trained models are first used for transfer learning or fine-tuning. Transfer learning from pre-trained models is among the most powerful methods for leveraging what a pre-trained model has already learned. The optimization algorithms were then applied to these pre-trained models. Next, to examine the role of CNN structure in model performance, a new CNN model structure was proposed; it gave very good performance compared with the other models used. In this case, the model structure also interacts with the optimization algorithms to reduce training and testing time.
In the optimization-based strategy, there are different ways to choose parameters that help speed up model training. Accordingly, different optimization algorithms with different parameter-training options were practiced, and the best-performing parameters were used. From the perspective of the gradient information used, popular optimization strategies are divided into three categories: first-order optimization strategies, high-order optimization strategies, and heuristic derivative-free optimization strategies
In SGD, all parameters are updated with the same learning rate αt in the t-th iteration:

θ_{t+1} = θ_t − α_t ∇θ L(θ_t)

where θ_t denotes the model parameters at iteration t and ∇θ L(θ_t) is the gradient of the loss with respect to the parameters. The cross-entropy data loss is

L_data = −(1/N) Σ_i Σ_k y_{i,k} log ŷ_{i,k}

where y_{i,k} is the one-hot ground-truth label of sample i and ŷ_{i,k} is the predicted probability for class k. The regularization loss is

L_reg = (λ/2) Σ_w ||w||²

where λ is the regularization coefficient and the sum runs over the network weights w.
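The SGD update above can be sketched in a few lines of Python. The quadratic objective here is an illustrative stand-in for the network loss, not anything from the text:

```python
# Minimal sketch of the SGD update theta <- theta - alpha * grad(theta),
# applied to an illustrative objective f(w) = (w - 3)^2 (a stand-in for
# the network loss; its gradient is 2 * (w - 3)).
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0          # initial parameter
alpha = 0.1      # learning rate
for t in range(100):
    w = w - alpha * grad(w)

print(round(w, 4))  # converges to the minimum at w = 3
```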
A very popular technique used alongside SGD is momentum. Instead of using only the gradient of the current step to guide the search, momentum also accumulates the gradients of past steps to work out the direction to travel. Note that each gradient update can be decomposed into components along the w1 (weight 1) and w2 (weight 2) directions. If we sum these vectors, their components along the w1 direction cancel out, while the component along the w2 direction is reinforced.
For an update, this adds to the component along w2 (weight 2), while zeroing out the component in the w1 direction. This helps us move faster toward the minimum. For this reason, momentum additionally speeds up convergence.
Formally, the momentum update is

v_t = γ v_{t−1} + α ∇θ L(θ_t)
θ_{t+1} = θ_t − v_t

where γ is the momentum coefficient (typically 0.9). Adagrad instead adapts the learning rate per parameter:

θ_{t+1} = θ_t − (α / sqrt(G_t + ε)) ⊙ g_t

where g_t is the current gradient, G_t accumulates the squared gradients of all past steps, and ε is a small constant for numerical stability. Adadelta replaces the ever-growing sum G_t with a decaying average of squared gradients:

E[g²]_t = ρ E[g²]_{t−1} + (1 − ρ) g_t²

where ρ is the decay rate.
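As a sketch, the momentum update on a toy quadratic (the objective and hyperparameter values are illustrative, not from the text):

```python
# Momentum sketch: v accumulates past gradients, and the parameter moves
# by -v. Toy objective f(w) = (w - 3)^2; values are illustrative.
def grad(w):
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0
alpha, gamma = 0.1, 0.9   # learning rate and momentum coefficient
for t in range(200):
    v = gamma * v + alpha * grad(w)   # accumulate past gradients
    w = w - v

print(round(w, 4))
```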
RMSprop is very similar to Adadelta: it extends Adagrad in much the same way that Adadelta does. It maintains a per-weight learning rate while eliminating the decaying learning rate inherent in Adagrad. RMSprop keeps a "cache" of past squared gradients that decays over time according to a decay parameter, and the current gradient is divided by this "leaky" cache to modulate the learning rate per weight. Unlike Adadelta, RMSprop retains a global learning-rate parameter. One prominent difference, apparent from the visualizations provided, is that RMSprop does not have the initialization problem that Adadelta has. The problem addressed by both Adadelta and RMSprop is Adagrad's monotonically shrinking learning rate.
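The leaky squared-gradient cache can be sketched as follows (toy objective and illustrative hyperparameters, not from the text):

```python
# RMSprop sketch: a leaky cache of squared gradients modulates the
# per-weight step. Toy objective f(w) = (w - 3)^2; values illustrative.
def grad(w):
    return 2.0 * (w - 3.0)

w, cache = 0.0, 0.0
alpha, decay, eps = 0.01, 0.9, 1e-8
for t in range(2000):
    g = grad(w)
    cache = decay * cache + (1 - decay) * g * g   # leaky squared-gradient cache
    w = w - alpha * g / (cache ** 0.5 + eps)      # per-weight modulated step

print(round(w, 2))
```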
Adam
Adam combines momentum with per-parameter adaptive learning rates. Here, the first- and second-order moments are defined as:

m_t = β1 m_{t−1} + (1 − β1) g_t
v_t = β2 v_{t−1} + (1 − β2) g_t²

where β1 and β2 are the decay rates for the first and second moments, respectively, and g_t is the current gradient. The bias-corrected moments are m̂_t = m_t / (1 − β1^t) and v̂_t = v_t / (1 − β2^t), and the parameter update is

θ_{t+1} = θ_t − α m̂_t / (sqrt(v̂_t) + ε)

where ε is a small constant for numerical stability.
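The two bias-corrected moments can be sketched together on the same kind of toy quadratic (illustrative objective; β1 = 0.9 and β2 = 0.999 are the common default values):

```python
# Adam sketch combining the bias-corrected first and second moments.
# Toy objective f(w) = (w - 3)^2; hyperparameters are illustrative defaults.
def grad(w):
    return 2.0 * (w - 3.0)

w, m, v = 0.0, 0.0, 0.0
alpha, b1, b2, eps = 0.01, 0.9, 0.999, 1e-8
for t in range(1, 3001):
    g = grad(w)
    m = b1 * m + (1 - b1) * g        # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * g * g    # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)        # bias corrections
    v_hat = v / (1 - b2 ** t)
    w = w - alpha * m_hat / (v_hat ** 0.5 + eps)

print(round(w, 2))
```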
Recently, many researchers have focused their work on optimization algorithms for reducing training and testing time in SAR image detection and recognition. Despeckling and colorization of SAR images have been performed simultaneously, and the performance of that method was compared with that of other CNN methods (CNN
While it is true that presenting the classifier with enough information is essential to achieving good performance, large training-set sizes can be detrimental to generalization performance and invariably require significant training time. Such large training sets often contain redundant or noisy samples, which only introduce unnecessary computation and can cause learning bias
The methods and techniques used are a critical step that carries the proposed concepts through to the final result; methodology is key to the research work. The methodology of this research work is laid out clearly below in
The MSTAR database, the dataset collected for this research work, was acquired by the Sandia National Laboratory (SNL) SAR sensor platform operating at X-band. The collection was jointly sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory as part of the Moving and Stationary Target Acquisition and Recognition (MSTAR) program.
Automatic Target Recognition (ATR) of Synthetic Aperture Radar (SAR) images is an area of continuing research by all branches of the military and by some research institutions
Obviously, there are many public databases for close-range digital photography, such as ImageNet, PASCAL, and LabelMe. For SAR, however, the available large databases are very limited, and the only publicly released database is MSTAR
[Table: six entries numbered 1–6; the remaining cell contents were not recoverable from the source.]
From the above
[Table: six entries numbered 1–6; the remaining cell contents were not recoverable from the source.]
Below
SAR image recognition often proceeds in three stages: feature extraction, detection, and recognition
LeNet
ResNet was proposed by He et al.
With the successful use of CNNs for image recognition, Simonyan and Zisserman proposed a simple and effective design principle for CNNs. The new architecture was termed VGG and was modular in its layer pattern. Based on these outcomes, VGG substituted the 11x11 and 5x5 filters with stacks of 3x3 filter layers and experimentally confirmed that the concurrent placement of 3x3 filters can induce the effect of a large-size filter
The main drawback of VGG was its high computational cost. The VGG16 model has 41 layers and requires an input image size of [224x224x3].
Deep Convolutional Neural Networks (CNNs) have achieved great success across a broad range of computer vision tasks. However, training CNN model structures requires high CPU or GPU capacity. Research in this field has therefore been concerned with designing and developing CNN structures and optimization algorithms that reduce training and testing time in SAR image detection and recognition. Following the perspective of the study by Chui et al., two scenarios were proposed for this research work: a model-based method and an optimization-algorithm-based method
In the proposed model structure, the convolution layers, non-linear activation function (ReLU), pooling layers, and batch normalization layers are the basic CNN building blocks. Each convolution layer uses 5x5 filters. Along with max pooling to downsample the image, the batch normalization layer was the most important layer for increasing model performance. Accordingly, the proposed model achieved the best reduction in training and testing time when using batch normalization and max-pooling layers in the model structure; batch normalization yielded the preferred performance improvement in the proposed model. In addition, resizing the input images to 128x128x3 instead of 224x224x3 or 227x227x3 gave further support in reducing training and testing time. The purpose of ReLU is to introduce non-linearity into the ConvNet: most of the real-world data the ConvNet must learn is non-linear, while convolution is a linear operation (element-wise matrix multiplication and addition), so non-linearity is introduced with a non-linear function such as ReLU.
Batch normalization allows initializing biases to zero or other small constant values and breaks the weight symmetry. The downsampling used a 3x3 max-pooling window, which gave an additional benefit in the CNN structure; in the proposed model structure, the 3x3 max pooling yielded the greatest training-time reduction. The softmax layer provides the probability distribution of a given image over the trained labels. The final layer, fc_6, shows that the number of classes used for the training model was six target class labels. In general, the CNN structure from input layer to output layer was proposed and used as shown below. Each layer attends to its specific task based on the input image given to the input layer of the proposed model. The proposed model has three convolutional layers, each followed by ReLU and a batch normalization layer instead of a dropout layer; the batch normalization layer performed better than a dropout layer in the network structure. The input image size also has its own impact on the proposed model, and the 3x3 max-pool size was used to downsample the image feature maps, which gave the best result in the new proposed model structure. In this way, the new proposed model with the SGDM optimization method achieved a greater reduction in training and testing time than the pre-trained models used.
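The spatial shrinking produced by the three conv/pool stages can be sketched as follows. The strides and padding are assumptions for illustration, since they are not stated in the text: 'same' convolution padding and pool stride 2 with no pool padding.

```python
# Hypothetical walk-through of feature-map sizes in the proposed stack
# (3 x [conv 5x5 -> ReLU -> batch norm -> maxpool 3x3]). Assumptions not
# stated in the text: 'same' conv padding, pool stride 2, no pool padding.
def conv_same(size):
    return size            # 'same' padding preserves spatial size

def maxpool(size, pool=3, stride=2):
    return (size - pool) // stride + 1

size = 128                 # input resized to 128x128x3
for stage in range(3):
    size = conv_same(size)
    size = maxpool(size)
    print("after stage", stage + 1, ":", size, "x", size)
```

Under these assumptions the 128x128 input shrinks to 63, 31, and finally 15 pixels per side.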
The effectiveness of the proposed method has been evaluated on six different targets from the public MSTAR SAR image dataset: armored personnel carrier BTR60; tanks SLICY and T62; air-defense unit ZSU234; truck ZIL131; and bulldozer D7. The image-data preparation for further detection and recognition, explained below, was carried out in MATLAB. Managing the image dataset for recognition tasks involves careful steps and was done using the IMDS (Image Data Store) method. IMDS stores the different categories of images with their corresponding class labels. To see the number of images available within each category of the IMDS, the countEachLabel function was used. To maintain an equal number of images in all categories of the IMDS (reducing redundancy and classification bias), the minimum set count was used to equalize the number of images in each class label. All image class-label information is stored in the IMDS, which stores the image data properly. To find the location of images of the various categories in the IMDS, the find method was used. Subsequently, the AlexNet, ResNet-50, VGG16, or proposed model was started, with the input and output layers specified by the input image size and the number of output label categories.
In the IMDS, the dataset was divided into training and testing sets by the splitEachLabel function, with a training-set size of 70% and a test-set size of 30%. Many engineers mistakenly overwrite the image data source during processing; to avoid this, an augmented image datastore was used to resize images and convert any grayscale images to RGB, on both the training and testing sets, so the model can process them. In computer vision, the feature vector is usually just the image pixel values. One special-purpose layer commonly used in computer vision is the "convolutional" layer: instead of deriving optimal weights to multiply with every data point in the input vector, a convolutional layer derives an image kernel that it convolves with the input vector.
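The per-label 70/30 split can be sketched as a Python analogue of MATLAB's splitEachLabel; the file names and labels below are illustrative only, not taken from the dataset:

```python
import random

# Python analogue of MATLAB's splitEachLabel: split each class 70/30.
# File names and labels below are illustrative, not from the dataset.
def split_each_label(files_by_label, train_frac=0.7, seed=0):
    rng = random.Random(seed)
    train, test = [], []
    for label, files in files_by_label.items():
        files = list(files)
        rng.shuffle(files)                 # randomize before splitting
        k = int(len(files) * train_frac)   # 70% of this class
        train += [(f, label) for f in files[:k]]
        test  += [(f, label) for f in files[k:]]
    return train, test

imds = {"BTR60": ["btr_%d.jpg" % i for i in range(10)],
        "D7":    ["d7_%d.jpg" % i for i in range(10)]}
train, test = split_each_label(imds)
print(len(train), len(test))  # 14 6
```

Splitting within each label, rather than over the pooled file list, keeps the class proportions identical in both sets.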
Within this research work, MATLAB tools were used, since MATLAB is easily accessible, comes with a deep learning toolbox, and accelerates deep learning.
After different filtering algorithms were applied and compared with each other, the guided filter algorithm achieved the highest PSNR value for SAR image speckle-noise removal, as shown in the
[Table: PSNR comparison of five filtering algorithms (rows 1–5); the remaining cell contents were not recoverable from the source.]
Image features were extracted from the images stored in the IMDS, their information was stored again as a matrix, and the values were converted to grayscale to perform the recognition task. For this work, the activations function was used. By stacking these layers together, a convolutional neural network effectively implements a template-matching approach to recognize objects in an image, except that it creates hundreds of general templates and usually stacks multiple convolutional layers together, as shown by the template below. Early layers usually learn to recognize simple edges and lines (such as those shown in
A confusion matrix lists the values of the known cover types from the reference data in the columns and of the classified data in the rows. The main diagonal of the matrix lists the properly classified pixels. One advantage of a confusion matrix is that it makes it easy to see whether the system is confusing two classes (i.e., commonly mislabeling one as another). A confusion matrix holds information about the actual and predicted classifications made by a classification system on test data, and the performance of such systems is usually evaluated using the data in the matrix. Here, the Error-Correcting Output Codes (ECOC) method was used.
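The diagonal counting described above can be sketched with a toy matrix (the values are illustrative, not from the experiments):

```python
# Toy confusion matrix for three classes: columns hold the reference
# (actual) classes and rows the classified (predicted) data, per the text.
# The main diagonal holds the correctly classified samples.
cm = [[50,  2,  3],
      [ 4, 45,  1],
      [ 2,  3, 40]]

total   = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))
accuracy = correct / total
print(accuracy)  # 135 correct out of 150 -> 0.9
```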
Generally, confusion matrices were computed using the AlexNet, ResNet-50, VGG16, and proposed models on the test data (30% of the whole dataset), and the accuracy of the models was evaluated. Accuracy assessment is an essential step in the process of analyzing remote sensing data. Remote sensing products can serve as the basis for economic decisions, so potential users have to know about the reliability of the data when confronted with maps derived from remote sensing data. Accuracy expresses "exactness"; it measures the agreement between a standard assumed to be correct and a classified image of unknown quality. If the image classification agrees closely with the standard, it is said to be accurate. The most common way to express classification accuracy is the preparation of an error matrix, also known as a confusion matrix or contingency matrix. Diverse measures and statistics can be derived from the values in an error matrix. The basic form of an error matrix and non-statistical measures are described below.
Accuracy of the models on the 30% test set (overall and per-class, %):

| Model | Overall accuracy (%) | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 |
|---|---|---|---|---|---|---|---|
| AlexNet | 89 | 100 | 91 | 100 | 89 | 68 | 90 |
| ResNet-50 | 92 | 100 | 63 | 100 | 93 | 98 | 98 |
| VGG16 | 86 | 92 | 61 | 100 | 83 | 95 | 89 |
| Proposed | 95 | 100 | 96 | 100 | 82 | 92 | 100 |
The performance of the three pre-trained models and the proposed model was compared based on accuracy. The average accuracy across the classes was computed for each model to obtain the overall model accuracy. Accordingly, AlexNet, ResNet-50, VGG16, and the proposed model achieved accuracies of 89%, 92%, 86%, and 95%, respectively; the proposed model achieved the best accuracy, 95%.
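The stated overall accuracies can be cross-checked against the per-class values reported in the table above: the mean of the six per-class accuracies matches each overall figure to within about 1%.

```python
# Per-class accuracies (%) from the table above, in table order.
per_class = {
    "AlexNet":   [100, 91, 100, 89, 68, 90],
    "ResNet-50": [100, 63, 100, 93, 98, 98],
    "VGG16":     [ 92, 61, 100, 83, 95, 89],
    "Proposed":  [100, 96, 100, 82, 92, 100],
}
reported = {"AlexNet": 89, "ResNet-50": 92, "VGG16": 86, "Proposed": 95}

means = {m: sum(a) / len(a) for m, a in per_class.items()}
for m in per_class:
    print(m, round(means[m], 1))
```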
For detecting the target, a bounding-box regression model and RCNN (Region-Based CNN) object detectors were used, based on embedded feature representations. Convolutional Neural Networks (CNNs) are biologically inspired, multilayer deep learning models trained end to end, from raw image pixel values to the recognition outputs
The algorithm of synthetic aperture radar automatic target recognition (SAR-ATR) is generally composed of the extraction of a set of features that transform the raw input into a representation, followed by a trainable classifier
Accordingly, the following detection results were obtained. Here the input x is the image given to the model, the m-dimensional feature of a massive number of images is represented as f(x), and a k-way classifier (
Batch normalization was employed to deal with the problems associated with internal covariate shift.
In the model training, a mini-batch size of 32 was used with the mini-batch gradient optimizer; the initial learning rate (InitialLearnRate) was 0.001, decaying to 1.0000e-04 during training as seen in the training progress; momentum = 0.900; learn-rate drop factor = 0.2; and 10 epochs were used.
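A piecewise-drop schedule consistent with the stated initial rate 0.001 and drop factor 0.2 can be sketched as follows; the drop period of 5 epochs is an assumed value for illustration, since it is not given in the text:

```python
# Piecewise learning-rate schedule: lr = initial * factor ** (epoch // period).
# initial = 0.001 and factor = 0.2 follow the text; period = 5 is an
# assumption used only for illustration.
def lr_at_epoch(epoch, initial=1e-3, factor=0.2, period=5):
    return initial * factor ** (epoch // period)

for epoch in range(10):
    print(epoch, lr_at_epoch(epoch))
```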
The following discussion compares the three pre-trained models and the proposed model in accuracy and in training and testing speed for SAR image recognition using the optimization-algorithm method. To reduce the high training and testing times, the SGDM, RMSProp, and Adam optimization algorithms with parameter-initialization methods were used with the pre-trained models, and the proposed model was also trained from scratch. The combinations evaluated were AlexNet, ResNet-50, VGG16, and the proposed model, each paired with SGDM, RMSProp, and Adam. Training time (testing time in parentheses): AlexNet + SGDM: 32'04" (24s); ResNet-50 + SGDM: 174'05" (27s); VGG16 + SGDM: 324'21" (44s); proposed model + SGDM: 26'49" (17s); AlexNet + RMSProp: 41'05" (26s); ResNet-50 + RMSProp: 190'36" (29s); VGG16 + RMSProp: 338'45" (51s); proposed model + RMSProp: 30'14" (18s); AlexNet + Adam: 40'48" (27s); ResNet-50 + Adam: 177'24" (36s); VGG16 + Adam: 444'02" (1'33"); proposed model + Adam: 28'53" (21s). In a network, a larger pooling size generally results in worse performance, since it throws away too much information during subsampling
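Converting the quoted (minutes, seconds) training times to seconds makes the SGDM comparison direct:

```python
# SGDM training times from the text, as (minutes, seconds) pairs.
times = {"AlexNet": (32, 4), "ResNet-50": (174, 5),
         "VGG16": (324, 21), "Proposed": (26, 49)}

secs = {m: 60 * mn + s for m, (mn, s) in times.items()}
for m, t in sorted(secs.items(), key=lambda kv: kv[1]):
    print(m, t, "s")  # fastest first
```

The proposed model is the fastest of the four under SGDM (1609 s versus 1924 s for AlexNet).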
Training and testing time with SGDM:

| Model | AlexNet | ResNet-50 | VGG16 | Proposed |
|---|---|---|---|---|
| Train time (min:s) | 32:04 | 174:05 | 324:21 | 26:49 |
| Test time (min:s) | 0:24 | 0:27 | 0:44 | 0:17 |
Training and testing time with RMSProp:

| Model | AlexNet | ResNet-50 | VGG16 | Proposed |
|---|---|---|---|---|
| Train time (min:s) | 41:05 | 190:36 | 338:45 | 30:14 |
| Test time (min:s) | 0:26 | 0:29 | 0:51 | 0:18 |
Training and testing time with Adam:

| Model | AlexNet | ResNet-50 | VGG16 | Proposed |
|---|---|---|---|---|
| Train time (min:s) | 40:48 | 177:24 | 444:02 | 28:53 |
| Test time (min:s) | 0:27 | 0:36 | 1:33 | 0:21 |
In this research work, some contributions have been presented that advance the field of SAR image detection and recognition. The contributions focus on optimization algorithms for reducing the training and testing time of SAR image detection and recognition, using both optimization-based and model-based methods.
In the future, to further enhance detection and recognition performance, we will conduct further studies on appropriate optimization algorithms with different deep learning models.