Diabetes is one of the most prevalent chronic diseases affecting the elderly population worldwide.
Many machine learning (ML) and data mining techniques have been applied to diabetes prediction in recent years.
Ioannis Kavakiotis reviewed applications of machine learning and data mining techniques and tools in diabetes research, covering prediction and diagnosis, diabetic complications, genetic background and environment, and health care and management.
The proposed procedure, or workflow, is shown in
The PIMA Indian dataset (Dataset 1) originates from the National Institute of Diabetes and Digestive and Kidney Diseases. Its attributes are listed below.
Number of times pregnant
Plasma Glucose concentration at 2 hours in an oral glucose tolerance test
Diastolic blood pressure (mmHg)
Triceps skin fold thickness (mm)
Two-hour serum insulin (mu U/ml)
Body Mass Index
Diabetes Pedigree Function
Age (years)
Class variable (0 or 1)
The dataset has 768 observations with 8 attributes and one outcome variable. The Pima are a group of Native Americans living in Arizona.
The second dataset (Dataset 2) is an in vivo dataset created from 82 randomly chosen patients. These patients have different medical conditions: diabetic, non-diabetic, and high or low blood pressure. All patients are in the 30-85 age range. The attributes of the in vivo dataset are listed below.
Gender
Age
Blood Pressure
Machine learning is the field in which a machine learns from previous experience; it is closely related to artificial intelligence. Broadly, there are two types of machine learning algorithms: supervised and unsupervised. We chose supervised algorithms, since the outputs in our datasets are already known. Supervised learning maps inputs to outputs based on labeled input-output pairs; the labeled data consists of training examples, each pairing an input with its desired output. The machine learning algorithms used here are Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbor (KNN) and the Naïve Bayes classifier. All classification algorithms are implemented in MATLAB, commercial mathematics software developed by MathWorks, Inc. that is used for algorithm development, data visualization and data analysis, and provides an interactive environment for numerical computation.
SVM is a supervised machine learning algorithm used for both regression and classification. Each data point is plotted in N-dimensional space (N being the number of attributes/features), and the algorithm seeks a hyperplane that distinctly separates the classes. Many hyperplanes could separate two classes; we choose the one with the maximum margin, i.e. the largest distance between the hyperplane and the nearest data points of either class. When the classes are not linearly separable in the original space, SVM uses a technique called the kernel trick: a kernel function takes the low-dimensional input space and implicitly transforms it to a higher-dimensional space, converting a non-separable problem into a separable one. Put simply, it performs complex data transformations and then finds how to separate the data based on the labels or outputs you have defined. The training data is represented as n data points.
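The paper does not state which kernel was used; the radial basis function (RBF) kernel is one common choice, sketched here in Python (the MATLAB implementation is not shown in the text):

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: measures similarity as if the
    points were mapped into a much higher-dimensional space, without
    ever computing that mapping explicitly (the kernel trick)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points have similarity 1.0; similarity decays with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
far = rbf_kernel([1.0, 2.0], [3.0, 4.0])
```

The `gamma` value here is an arbitrary placeholder, not a tuned parameter from the paper.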
where yn = +1 or -1 is the target (output) variable denoting the class to which point xn belongs, and n is the number of data samples. The SVM classifier maps input vectors to decision values and performs classification. The hyperplane is defined as wT x + b = 0, where w is a p-dimensional weight vector perpendicular to the separating hyperplane and b is a scalar offset that shifts the margin. When the training data is linearly separable, we select two parallel hyperplanes such that no points lie between them and maximize the distance between them. Mathematically, we maximize the distance between the hyperplane defined by wT x + b = -1 and the hyperplane defined by wT x + b = 1, as shown in
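The resulting decision rule, i.e. classifying by the sign of wT x + b, can be sketched as follows (the weight vector and offset below are hypothetical, not values learned from the datasets):

```python
def svm_predict(w, b, x):
    """Assign x to class +1 or -1 according to which side of the
    separating hyperplane w^T x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical trained parameters for a 2-feature problem:
w, b = [1.0, 1.0], -3.0
above = svm_predict(w, b, [2.0, 2.0])   # positive side of the hyperplane
below = svm_predict(w, b, [0.0, 0.0])   # negative side
```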
A decision tree is another supervised learning algorithm with a tree-like structure. Decision trees mimic human decision making, which makes them easy to interpret: internal nodes represent attributes/features of the dataset, branches represent decision rules, and leaf nodes represent outcomes. A decision tree makes decisions by splitting nodes into sub-nodes, as shown in
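Node splitting is typically driven by an impurity measure; the paper does not state its splitting criterion, so the sketch below uses Gini impurity as one common choice:

```python
def gini(labels):
    """Gini impurity of a set of class labels (0.0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Pick the threshold on one numeric feature that minimizes the
    weighted Gini impurity of the two child nodes."""
    n = len(values)
    best_t, best_score = None, float("inf")
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t

# Toy example: glucose-like values with class labels 0 / 1.
split = best_threshold([90, 100, 150, 160], [0, 0, 1, 1])
```

Splitting at 100 here separates the toy classes perfectly, so both children are pure.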
KNN is one of the simplest supervised machine learning techniques. It assumes that similar cases lie close together and assigns a new data point to the category most similar to the stored examples. KNN is called a lazy learner because it does not build a model from the training set up front; it simply stores the dataset and defers computation until classification time. To classify an unseen data point, the algorithm finds its k nearest neighbors and assigns the class that holds the majority among those k neighbors, as shown in
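A minimal Python sketch of this majority-vote rule (the paper's MATLAB implementation and its distance metric are not shown; Euclidean distance is assumed here):

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training
    samples. train is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], new_point))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-feature points, labels 0 (non-diabetic) / 1 (diabetic):
train = [((1.0, 1.0), 0), ((1.0, 2.0), 0), ((5.0, 5.0), 1), ((6.0, 5.0), 1)]
label = knn_classify(train, (1.0, 1.5), k=3)
```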
The Naïve Bayes classifier is a probabilistic classifier based on Bayes' theorem, which provides a way to calculate the probability of a hypothesis given prior knowledge. Naïve Bayes scales well to large volumes of data and is often recommended even for datasets with millions of records. Membership probabilities are predicted for every class, i.e. the probability that a data point belongs to a particular class, and the class with the maximum probability is chosen as the most suitable one. The "naïve" assumption is that all variables or features are conditionally independent of each other given the class.
Bayes' theorem:
P(c|x) = P(x|c) P(c) / P(x)
where P(c|x) is the posterior probability of class c given feature data x, P(x|c) is the likelihood of the data given the class, P(c) is the prior probability of the class, and P(x) is the prior probability of the data.
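As a numeric illustration of Bayes' theorem (all probabilities below are hypothetical, not estimated from either dataset):

```python
def bayes_posterior(prior, likelihood, marginal):
    """Bayes' theorem: posterior = likelihood * prior / marginal."""
    return likelihood * prior / marginal

# Hypothetical numbers: 10% of patients are diabetic (prior), and some
# feature value x occurs in 80% of diabetics and 20% of non-diabetics.
prior_d = 0.10
p_x = 0.8 * prior_d + 0.2 * (1 - prior_d)          # total probability of x
p_d_given_x = bayes_posterior(prior_d, 0.8, p_x)   # posterior for "diabetic"
```

A full Naïve Bayes classifier multiplies one likelihood per feature under the independence assumption before normalizing.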
Feature selection methods reduce the number of attributes and remove redundant features. In this research work we used Principal Component Analysis (PCA) on Dataset 1 to reduce dimensionality. Testing is used to make predictions on unknown, unseen data points; here 70% of the data is used for training and 30% for testing.
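The 70/30 split can be sketched as follows (the shuffling scheme and seed are assumptions; the paper does not describe its exact split procedure):

```python
import random

def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle the rows and split them into a 70% training set and a
    30% test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# With the 768 PIMA observations this yields 537 training / 231 test rows.
train, test = train_test_split(range(768))
```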
Dataset 2 was created through in vivo experimentation on 182 patients with different conditions (diabetic and non-diabetic). The photoplethysmography (PPG) technique is used to measure blood glucose concentration.
The performance of the different classifiers is compared using evaluation metrics: sensitivity, specificity and accuracy.
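All three metrics derive from confusion-matrix counts; a minimal sketch (the counts below are hypothetical, not from the paper's experiments):

```python
def metrics(tp, tn, fp, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts:
    tp/fn are positive (diabetic) cases predicted correctly/incorrectly,
    tn/fp the same for negative (non-diabetic) cases."""
    sensitivity = tp / (tp + fn)                    # true positive rate
    specificity = tn / (tn + fp)                    # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Hypothetical confusion matrix:
sens, spec, acc = metrics(tp=90, tn=80, fp=20, fn=10)
```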
Classifier | Sensitivity | Specificity | Accuracy
SVM | 0.7386 | 0.6795 | 0.7222
Naïve Bayes | 0.7941 | 0.7667 | 0.7857
Decision Tree | 0.9107 | 0.8775 | 0.8997
KNN | 0.7402 | 0.6331 | 0.7035
Classifier | Sensitivity | Specificity | Accuracy
SVM | 0.6000 | 0.8148 | 0.7500
Naïve Bayes | 0.7500 | 0.7222 | 0.7273
Decision Tree | 0.8947 | 0.9206 | 0.9146
KNN | 0.0000 | 0.7500 | 0.7187
In this research work four different classifiers are used for diabetes prediction. Their accuracies are compared with earlier researchers' work, as tabulated in the following
Sources | Accuracy (%)
Chang sheng Zhu | 58
Nitrsh Warke | 62
Deepti Sisodia | 65.10
A. Thammireddy | 68
Aishwarya Mujumdar | 68
Proposed method | 72.22
N. Sneha | 77.73
Neha Prerna T | 74.4
Muhammad Azeem Sarwar | 77
Sources | Accuracy (%)
Aishwarya Mujumdar | 67
Neha Prerna T | 68.9
Amina Azar | 71.4
Nitrsh Warke | 72
N. Sneha | 73.48
Muhammad Azeem Sarwar | 74
A. Thammireddy | 76
Getu Gamo Sagaro | 76
Proposed method | 78.57
Sources | Accuracy (%)
Amina Azar | 65.19
Nitrsh Warke | 66
Proposed method | 70.35
Neha Prerna T | 70.8
Aishwarya Mujumdar | 72
Abdulhakim | 75.97
Muhammad Azeem Sarwar | 77
Chang sheng Zhu | 78
Sources | Accuracy (%)
Nitrsh Warke | 68
Neha Prerna T | 69.7
Muhammad Azeem Sarwar | 71
Quan Zou | 72.75
Naveen Kishore G | 72.91
Getu Gamo Sagaro | 73
N. Sneha | 73.18
Aishwarya Mujumdar | 74
Amina Azar | 75.65
A. Thammareddy | 76.22
Seyede Somayeh | 80.2
Proposed method | 89.97
For the Naïve Bayes and Decision Tree classifiers, our algorithms show better performance compared to SVM and KNN.
PPG (photoplethysmography) is a simple, optics-based, noninvasive technique used in the development of advanced health care.
A functional relationship exists between pulse signal and blood glucose concentration.
Time domain features are measured along the X and Y axes of the pulse, as shown in
The following time domain features are extracted for our experimentation.
Width period- Time taken for a single period
Highest peak value- Maximum amplitude of the signal
Time of the highest peak value- Time value when amplitude of a signal is maximum
Diastolic peak amplitude- Amplitude of diastolic peak
Time of Diastolic peak- Time at which the diastolic peak occurs
Notch amplitude- Amplitude of notch
Time of notch- Time at which notch occurs in a signal
Time difference - Total time taken from start to peak, peak to notch, notch to diastolic peak and last diastolic peak to end
Mean amplitude value- Mean amplitude value of a single period
Standard deviation of single period- Standard deviation of amplitudes
Mean amplitude- Mean of amplitudes from start to-peak, peak to notch, notch to diastolic peak and diastolic peak to end
Auto Regression Coefficients
Kaiser Teager Energy
Power Spectral Density
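As an illustration, a few of the time-domain features above could be computed from one sampled pulse period as follows (the waveform and sample rate are toy values; the paper's MATLAB feature extraction is not shown):

```python
def pulse_features(amplitudes, sample_rate):
    """A subset of the listed time-domain features for one sampled pulse
    period: highest peak value and its time, mean amplitude, standard
    deviation of amplitudes, and the width (duration) of the period."""
    n = len(amplitudes)
    peak = max(amplitudes)
    t_peak = amplitudes.index(peak) / sample_rate
    mean = sum(amplitudes) / n
    std = (sum((a - mean) ** 2 for a in amplitudes) / n) ** 0.5
    return {"peak": peak, "t_peak": t_peak, "mean": mean,
            "std": std, "width": n / sample_rate}

# Toy 5-sample pulse at a hypothetical 5 Hz sample rate:
feats = pulse_features([0.0, 1.0, 2.0, 1.0, 0.0], sample_rate=5)
```

Notch and diastolic-peak features would additionally require locating local extrema within the period, which is omitted here.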
These features are given as input to a neural network.
In this research work, we used a neural network topology with three hidden layers. The network structure is shown in
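A forward pass through such a topology can be sketched as below; the layer sizes and random weights are placeholders, since the paper does not state the layer widths or the trained parameters:

```python
import math
import random

def mlp_forward(x, layers):
    """Forward pass through fully connected layers with tanh activation.
    layers is a list of (weights, biases) per layer; weights is stored
    as a list of rows, one row per output neuron."""
    out = list(x)
    for weights, biases in layers:
        out = [math.tanh(sum(w * o for w, o in zip(row, out)) + b)
               for row, b in zip(weights, biases)]
    return out

# Hypothetical 4-16-16-16-1 topology (three hidden layers) with random
# placeholder weights, not trained values:
rng = random.Random(0)
def dense(n_in, n_out):
    return ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

layers = [dense(4, 16), dense(16, 16), dense(16, 16), dense(16, 1)]
y = mlp_forward([0.1, 0.2, 0.3, 0.4], layers)   # single-output prediction
```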
The predicted results are compared with the actual glucose values using Clarke Error Grid analysis, as shown in
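For reference, the zone A criterion of the Clarke Error Grid (clinically accurate readings) can be checked as below; this is only the zone A rule, and zones B-E of the full grid need additional boundary conditions not sketched here. The glucose pairs are hypothetical, not the paper's measurements:

```python
def in_zone_a(reference, predicted):
    """Clarke Error Grid zone A: the prediction is within 20% of the
    reference value, or both readings lie below 70 mg/dL."""
    if reference < 70 and predicted < 70:
        return True
    return abs(predicted - reference) <= 0.2 * reference

# Hypothetical (reference, predicted) glucose pairs in mg/dL:
pairs = [(100, 110), (60, 65), (200, 150)]
frac_a = sum(in_zone_a(r, p) for r, p in pairs) / len(pairs)
```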
In this research work, two datasets are used for experimentation, and four machine learning algorithms (SVM, KNN, Decision Tree and Naïve Bayes) are used for diabetes prediction; the performance of each is measured with several accuracy metrics. The results of all four algorithms are compared against the actual patient results, which were recorded using the traditional invasive method. This comparison, alongside similar previous research work, shows that the Decision Tree algorithm performs best, with an accuracy of 89.97% on Dataset 1 for diabetes prediction. For Dataset 2, Clarke error grid analysis places 94.27% of data points in the clinically accepted regions A and B. This work can be extended to extract derivative features for better blood glucose concentration measurement.
We would like to thank Bharati Hospital and Research Center, Pune, India for allowing us to collect data in pathology laboratory section.