Detection and Segmentation of Blood Cells Based on Supervised Learning

The cells in our blood can be categorized into types. Pathologists detect these blood cells to diagnose different diseases. This paper describes how these cells can be detected using image processing. Detecting the cells and diagnosing disease manually is very time consuming and prone to mistakes. The method under development identifies the cell type when a cell image is given as input. The methodology comprises image acquisition, image segmentation, trained images, test images, feature extraction of cells, a feature vector data set and similarity of cells, and it is used to detect the different cell types.


Introduction
Several diseases that affect human blood cells are dangerous, and some can cause the death of the patient. Detecting these cells helps pathologists diagnose diseases such as AIDS, leukemia and blood cancer [1]. To control such diseases, many scientists have tried different methods for identifying the different cells in human blood [3]. Human blood contains red blood cells (RBC, or erythrocytes), white blood cells (WBC, or leukocytes) and platelets [3]. Red blood cells are the most common cells in blood: human blood contains 20-30 trillion RBC. WBC are far fewer: 4,000-10,000 per µL of blood [10]. Platelets are the smallest cells in blood, 2-3 µm in diameter [4]. These cells are detected by the pathologist, and diseases are diagnosed from them. This paper focuses on WBC. White blood cells are part of the immune system, which protects the body from infectious disease [3]. An abnormal WBC count can indicate disease in the human body [16]. WBC differ from other blood cells in shape, color and size [3]. There are five types of WBC, distinguished by shape and color [3]: neutrophils, eosinophils, monocytes, basophils and lymphocytes [9]. These five types are classified into two categories, granulocytes and agranulocytes.
Granulocytes are cells with granules in the cytoplasm and a nucleus with several lobes; they comprise basophils, eosinophils and neutrophils. Agranulocytes are cells with no granules; they comprise monocytes and lymphocytes. The neutrophil has a nucleus divided into two to five lobes and a diameter of 10-12 µm. The eosinophil has a nucleus divided into two lobes and a diameter of 10-12 µm. The basophil has a bi- or tri-lobed nucleus, but the lobes are difficult to see because coarse granules hide them; its diameter is 12-15 µm. The lymphocyte has one large nucleus that fills most of the cell; its diameter is 7-8 µm for small lymphocytes and 12-15 µm for large ones. The monocyte has one kidney-shaped nucleus and a diameter of 15-30 µm. The proportions of the WBC types in the human body are: neutrophils 50-70%, eosinophils 1-4%, basophils 1%, monocytes 6% and lymphocytes 20-40% [4,16]. Pathologists currently detect these cells manually, which is time consuming and prone to human error.
Manual detection is troublesome because the human eye has to distinguish the cells by their shape, and counting the cells by hand is both time consuming and error prone. The system proposed in this paper detects the different types of white blood cells, with their different shapes, automatically. It is intended to give less error-prone results in less time.

System Proposed
The proposed system is an automated system that detects the different cells in human blood. The process consists of several steps, shown in Figure 1. The input is a blood smear image that contains exactly one WBC of any type. The output identifies the detected cell type and records the file path where the image is stored.

Methodology
The methodology of the proposed system to detect different cells is as follows.

Image Segmentation
The acquired images are given as input to the segmentation phase. Image segmentation clusters pixels with similar properties into one cluster, so that the image is represented in a more meaningful way and becomes easier to analyse. This system uses segmentation to separate the cell from the image background in the blood smear, so that only the cell nucleus remains in the foreground [14,16]. Segmentation also helps to distinguish between the shapes of cell nuclei. Here the region of interest (the nucleus) keeps its original color and the background is set to white.
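The paper does not state which segmentation algorithm is used, so the following is only a minimal sketch of the stated outcome (nucleus keeps its color, background becomes white). It assumes that nucleus pixels are darker than the background and uses a simple brightness threshold; the function name `segment_nucleus` and the threshold value are illustrative assumptions, not the authors' method.

```python
WHITE = (255, 255, 255)

def segment_nucleus(image, threshold=120):
    """Keep dark (assumed nucleus) pixels and paint everything else white.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    The brightness threshold is an assumption made for illustration.
    """
    segmented = []
    for row in image:
        new_row = []
        for (r, g, b) in row:
            brightness = (r + g + b) / 3
            # Dark pixels are treated as nucleus and keep their original color.
            new_row.append((r, g, b) if brightness < threshold else WHITE)
        segmented.append(new_row)
    return segmented

# Tiny 2x3 example: one dark "nucleus" pixel per row on a pale background.
image = [
    [(230, 210, 220), (90, 40, 130), (235, 215, 225)],
    [(228, 208, 222), (80, 35, 120), (240, 220, 230)],
]
result = segment_nucleus(image)
```

A fixed threshold is the simplest possible choice; stained smears with uneven illumination would likely need a more robust clustering or adaptive method.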

Trained Images
The training set is a data set of images, each containing only one cell, which can be any of the five types of WBC. These images are used to create the feature vector data set.

Test Images
The test set is a data set of images, each containing only one cell, which can be any of the five types of WBC. The feature vector of each test image is compared against the reference feature vectors in the feature vector data set.

Feature Extraction
Feature extraction is a very important phase of this project. It includes morphological operations [1] and extracts important information about the object of interest. Features can be based on shape, colour and texture [13]; this system uses the shape features of the cell's nucleus [12]. The feature vector of each trained image is compared with the feature vector obtained from the test image, a blood smear containing one WBC of any type [11]. The shape features are obtained by calculating the moments of each cell. The steps to calculate the feature vector are:
1. Divide the image into regions.
2. Calculate the area of each region.
3. Calculate the sum of the x-coordinates and the sum of the y-coordinates of the pixels in each region.
4. Calculate the x-centroid and y-centroid of each region.
5. Calculate the x-centroid and y-centroid of the entire image.
6. Calculate the feature vector.

Dividing the image into regions
After segmentation, the image is divided into regions. This step makes it easier to locate the region of interest and to compute its area and centroid.
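The paper does not say how the regions are formed. As a sketch only, the division could be a regular grid of rectangular blocks; the function name `split_into_regions` and the grid layout are assumptions made for illustration.

```python
def split_into_regions(image, n_rows, n_cols):
    """Split a 2-D image (list of rows) into n_rows x n_cols rectangular blocks.

    Returns a list of (top, left, block) tuples, where (top, left) is the
    position of the block's first pixel in the full image. Pixels left over
    when the size is not divisible by the grid are ignored in this sketch.
    """
    height, width = len(image), len(image[0])
    block_h, block_w = height // n_rows, width // n_cols
    regions = []
    for i in range(n_rows):
        for j in range(n_cols):
            top, left = i * block_h, j * block_w
            block = [row[left:left + block_w] for row in image[top:top + block_h]]
            regions.append((top, left, block))
    return regions

# 4x4 image whose pixel values are simply row * 4 + column.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
regions = split_into_regions(image, 2, 2)
```

Keeping the block offset alongside each block makes it possible to convert region coordinates back to whole-image coordinates later.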

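Calculating area of each region
Here we count the number of pixels in the region that satisfy a foreground condition on their colour values.
We use the following formula:
\[ \text{area} = \sum_{x} \sum_{y} f(x, y) \]
where f(x, y) gives the value of the pixel at coordinates (x, y), taken as 1 when the pixel satisfies the condition and 0 otherwise; x ranges over the height and y over the width of each block; and r, g and b are the red, green and blue values of the pixel, on which the condition is evaluated [3].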
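The exact pixel condition is not given in the text. The sketch below assumes that, because segmentation sets the background to white, a pixel counts as foreground when it is not pure white; `is_foreground` and `region_area` are illustrative names.

```python
WHITE = (255, 255, 255)

def is_foreground(pixel):
    # Assumed condition: any pixel that is not pure white belongs to the nucleus.
    return pixel != WHITE

def region_area(region):
    """Count the foreground pixels in a region (list of rows of (r, g, b))."""
    return sum(1 for row in region for pixel in row if is_foreground(pixel))

N = (90, 40, 130)  # an arbitrary nucleus color used only for this example
region = [
    [WHITE, N, WHITE],
    [WHITE, N, N],
    [WHITE, N, WHITE],
]
area = region_area(region)
empty_area = region_area([[WHITE, WHITE], [WHITE, WHITE]])
```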

Calculating the sum of x-coordinates and y-coordinates
Here we sum the x-coordinates and, separately, the y-coordinates of all pixels that satisfy the above condition.

Calculating x-centroid and y-centroid of each region
Here we calculate the x-centroid and y-centroid of each region.
The formula to calculate the x-centroid of each region is
\[ x_{centroid} = \frac{sum_x}{area} \]
where sum_x is the sum of the x-coordinates of the pixels that satisfy the condition and area is the area of the region. The formula to calculate the y-centroid of each region is
\[ y_{centroid} = \frac{sum_y}{area} \]
where sum_y is the sum of the y-coordinates of the pixels that satisfy the condition and area is the area of the region.
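The centroid computation can be sketched as follows. The sketch assumes that foreground means "not pure white" and follows the text's convention that x runs along the height (row index) and y along the width (column index); the function name `region_centroid` is illustrative.

```python
WHITE = (255, 255, 255)

def region_centroid(region):
    """Return (x_centroid, y_centroid) of the non-white pixels, or None if none exist."""
    area = sum_x = sum_y = 0
    for x, row in enumerate(region):        # x: row index (height direction)
        for y, pixel in enumerate(row):     # y: column index (width direction)
            if pixel != WHITE:              # assumed foreground condition
                area += 1
                sum_x += x
                sum_y += y
    if area == 0:
        return None                         # avoid dividing by zero for empty regions
    return (sum_x / area, sum_y / area)

N = (90, 40, 130)  # an arbitrary nucleus color used only for this example
region = [
    [WHITE, N, WHITE],
    [WHITE, N, N],
    [WHITE, N, WHITE],
]
centroid = region_centroid(region)
```

In this example the four foreground pixels have row indices summing to 4 and column indices summing to 5, so the centroid is (1.0, 1.25).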

Calculating the x-centroid and y-centroid of entire image
Here we calculate the x-centroid and y-centroid of the entire image.

Calculating the feature vector
Here we calculate the feature vector, which is a vector of moments.
Moments: An image moment is a weighted average (moment) of the image pixels' intensities, or a function of such moments, usually chosen to have some property related to the image. These properties can be area, centroid and so on. Moments are suitable for shape learning. Zeroth- to third-order moments are applied for shape learning and orientation [18]. The formula to calculate the moment of order (p + q) is:
\[ M_{pq} = \sum_{x} \sum_{y} x^{p} y^{q} f(x, y) \]
M_{00}, the zeroth-order moment, gives the area of the foreground, that is, it counts the total number of pixels in the region of interest.
M_{10} gives the first-order moment along the x-axis [8,15].
M_{01} gives the first-order moment along the y-axis [8,15].
M_{20} gives the second-order moment along the x-axis [17].
M_{02} gives the second-order moment along the y-axis [17].
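As a sketch of how such moments can be computed, the code below evaluates the standard raw moment on a binary mask (1 for foreground, 0 for background). Which moments make up the paper's feature vector is not stated explicitly; collecting the five moments named above into `moment_feature_vector` is an assumption.

```python
def raw_moment(mask, p, q):
    """Raw image moment M_pq: sum of x**p * y**q * f(x, y) over all pixels.

    `mask` is a list of rows of 0/1 values; x is the row index, y the column index.
    """
    return sum((x ** p) * (y ** q) * value
               for x, row in enumerate(mask)
               for y, value in enumerate(row))

def moment_feature_vector(mask):
    # Assumed feature vector: the five moments named in the text.
    orders = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2)]
    return [raw_moment(mask, p, q) for p, q in orders]

mask = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
features = moment_feature_vector(mask)
```

For this mask the zeroth-order moment equals the pixel count (4), and dividing the first-order moments by it reproduces the centroid formulas given earlier.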

Feature Vector Data Set
This data set holds, for each cell, the name of the cell type, the file path where the cell image is stored, and the feature vector of the cell.
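One possible in-memory layout for such a record is sketched below. The class name `CellRecord`, the file paths and the numeric values are all placeholders invented for illustration; they are not taken from the paper's data.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CellRecord:
    name: str                    # cell type, e.g. "neutrophil"
    file_path: str               # where the cell image is stored (hypothetical paths below)
    feature_vector: List[float]  # moments computed from the segmented nucleus

# Placeholder entries; the numbers are illustrative, not measured values.
dataset = [
    CellRecord("neutrophil", "train/neutrophil_01.png", [4.0, 4.0, 5.0, 6.0, 7.0]),
    CellRecord("lymphocyte", "train/lymphocyte_01.png", [9.0, 9.0, 9.0, 15.0, 15.0]),
]

def vectors_for(name):
    """Return the feature vectors of all records with the given cell type."""
    return [record.feature_vector for record in dataset if record.name == name]
```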

Similarity of Cells
The features stored in the feature vectors are compared to obtain a match. The feature vector of the test image is compared with the feature vectors of the different WBC types in the trained image feature data set. To relate two feature vectors we calculate their Coefficient of Correlation (CoC).

Coefficient of Correlation
Correlation helps in judging the resemblance between two measured vector quantities, that is, in analysing whether the two quantities are identical or completely different. Pearson's correlation coefficient, developed by Karl Pearson, is denoted r. It is widely used in pattern recognition and computer vision [2].
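Steps to calculate the coefficient of correlation for two feature vectors u and v:
1. Find the average of each of the two feature vectors u, v.
2. Calculate the difference vectors of u, v.
3. Calculate the unit vectors.
4. Calculate the correlation (similarity) of the two unit vectors.

To Find Average of the Two Feature Vectors u, v
Determine the average of each feature vector using
\[ \bar{u} = \frac{1}{n} \sum_{i=1}^{n} u_i, \qquad \bar{v} = \frac{1}{n} \sum_{i=1}^{n} v_i \]
where n is the number of elements in the feature vector.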

Calculating the Difference Vector of u,v.
Here we subtract the average of the feature vector from each of its elements.
The difference vectors for the feature vectors u and v are
\[ u_d = \{u\} - \bar{u}, \qquad v_d = \{v\} - \bar{v} \]
where {u} and {v} are the feature vectors, \bar{u} is the average of feature vector u [6] and \bar{v} is the average of feature vector v [6].
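A minimal sketch of this step, assuming the feature vector is a plain list of numbers (the function name `difference_vector` is illustrative):

```python
def difference_vector(vec):
    """Subtract the mean of the vector from each of its elements."""
    mean = sum(vec) / len(vec)
    return [value - mean for value in vec]

u = [2.0, 4.0, 6.0]              # example vector with mean 4.0
u_diff = difference_vector(u)
```

By construction the elements of a difference vector sum to zero, which is a quick check that the mean was removed correctly.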

Calculating Unit Vector
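Here we calculate the unit vector, which is the difference vector divided by its length. For the vector u,
\[ \hat{u} = \frac{u_d}{|u_d|} \]
where u_d is the difference vector of u (the vector with its average subtracted) and |u_d| is its length.
The unit vector of the vector v is evaluated in the same way,
\[ \hat{v} = \frac{v_d}{|v_d|} \]
where |v_d| is the length of the difference vector of v [16].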
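The normalisation can be sketched as below, taking the length to be the Euclidean norm. The function name `unit_vector` is illustrative, and raising an error for a zero-length vector is a design choice of this sketch, not something the paper specifies.

```python
import math

def unit_vector(vec):
    """Divide a vector by its Euclidean length."""
    length = math.sqrt(sum(v * v for v in vec))
    if length == 0:
        # A constant feature vector has a zero difference vector; no direction exists.
        raise ValueError("cannot normalise a zero-length vector")
    return [v / length for v in vec]

# Difference vector of [3, 3, 1, 1] (mean 2); its length is 2.
u_hat = unit_vector([1.0, 1.0, -1.0, -1.0])
```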

Calculating Correlation (Similarity) of the Two Unit Vectors
Here we find the dot product of the two unit vectors. The formula to calculate the dot product is
\[ r = \hat{u} \cdot \hat{v} = \frac{u_d \cdot v_d}{|u_d| \, |v_d|} \]
where u_d and v_d are the difference vectors of u and v (each vector with its average subtracted), |u_d| is the length of u_d [16] and |v_d| is the length of v_d.
If the coefficient of correlation equals 1, then the two feature vectors are perfectly correlated and the two images are considered identical [2, 16].
If the coefficient of correlation equals 0, then the two images are completely uncorrelated [2].
If the coefficient of correlation equals -1, then the two images are completely anticorrelated [2, 16].
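Putting the steps together, the coefficient of correlation can be sketched as follows. The decision rule in `classify` (pick the reference vector with the highest coefficient) is an assumption, since the paper only says that classification is based on the similarity value; the labels `type_a` and `type_b` and all numbers are placeholders.

```python
import math

def correlation_coefficient(u, v):
    """Pearson correlation of two equal-length vectors via normalised difference vectors."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    du = [a - mean_u for a in u]                 # difference vector of u
    dv = [b - mean_v for b in v]                 # difference vector of v
    len_u = math.sqrt(sum(a * a for a in du))
    len_v = math.sqrt(sum(b * b for b in dv))
    if len_u == 0 or len_v == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sum(a * b for a, b in zip(du, dv)) / (len_u * len_v)

def classify(test_vector, reference_vectors):
    """Return the label whose reference vector correlates best (assumed rule)."""
    return max(reference_vectors,
               key=lambda name: correlation_coefficient(test_vector, reference_vectors[name]))

r_same = correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])      # close to 1
r_opposite = correlation_coefficient([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])  # close to -1

references = {"type_a": [1.0, 2.0, 3.0, 4.0], "type_b": [4.0, 3.0, 2.0, 1.0]}
label = classify([2.0, 4.0, 6.0, 9.0], references)
```

Because the coefficient ignores overall scale and offset, two nuclei of different size but similar moment profile can still score close to 1, which is worth keeping in mind when interpreting a match.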

Experimental Results and Comparisons
The images of the different types of white blood cells before and after segmentation are shown in Figure 2a,b [7, 9].
Table 1 shows the feature vectors extracted from moments, which are used to check the similarity of the cell nuclei. Table 2 shows the result class into which each test image falls. Testing was done on different images.

Conclusion
The automatic detection of WBC is useful to pathologists for detecting cells. This system is intended to reduce the errors that can occur when cells are detected manually. By comparing the feature vectors of the test image data set with those of the trained image feature vector data set, we can analyse the relation between the vectors. The system finds the correlation between two vectors and classifies the cell type based on the similarity value [4].
This system would be useful in the pathology lab for fast detection of cells.