Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 28, Pages: 1-6
Divya1, Babita Pandey2* and Devendra K. Pandey3
1 School of Computer Science and Engineering, [email protected]
2 School of Computer Applications, [email protected]
3 School of Biosciences, [email protected]
*Author for correspondence
School of Computer Applications,
Email: [email protected]
Background/Objectives: Microarray technology allows neuromuscular dystrophies to be predicted from gene expression patterns. Microarray gene expression data suffer from the curse of dimensionality, i.e. tens of thousands of genes but only a few samples, so the dimension must be reduced for accurate diagnosis. Methods/Statistical Analysis: First, five-fold cross-validation is applied so that results are obtained on randomly generated splits of the samples. Two feature selection techniques, the t-test and entropy, are employed to rank and select genes. K-nearest neighbor and linear support vector machine classifiers are then used to classify diseased samples using the ranked genes. The performance of these integrated techniques is tested on microarray datasets of two neuromuscular dystrophies, Juvenile Dermatomyositis (JDM) and Facioscapulohumeral Muscular Dystrophy (FSHD). Findings: Effective disease-specific genes are selected from thousands of genes. The performance measures show that the combination of entropy with k-nearest neighbor performed best on both datasets, achieving 89.47% accuracy on the JDM dataset and 100% accuracy on the FSHD dataset. This is the first application of these combined methods to these two disease datasets, and the approach can be applied to other neuromuscular disorder datasets as well.
Keywords: Dimension Reduction, Entropy, K-Fold Validation, K-Nearest Neighbor, Neuromuscular Dystrophy, Support Vector Machine
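The pipeline summarized in the abstract (rank genes, keep the top-ranked ones, classify, and evaluate with five-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the Welch form of the t-statistic, the number of retained genes, the value of k, and all function names are assumptions made for illustration only. Only the t-test ranking with k-nearest neighbor is shown; an entropy-based score could presumably be substituted for `t_scores`, but its exact definition is not given in the abstract.

```python
import random
from statistics import mean, variance


def t_scores(X, y):
    """Absolute Welch t-statistic per gene (column) for class labels 0/1."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores


def top_genes(X, y, n_genes):
    """Indices of the n_genes highest-scoring genes."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:n_genes]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((p - q) ** 2 for p, q in zip(row, x)), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)


def cross_validate(X, y, n_folds=5, n_genes=10, k=3, seed=0):
    """Accuracy under n_folds-fold cross-validation (unstratified folds)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Rank genes on the training fold only, to avoid selection bias.
        genes = top_genes(X_tr, y_tr, n_genes)
        X_tr_sel = [[row[j] for j in genes] for row in X_tr]
        for i in fold:
            x = [X[i][j] for j in genes]
            correct += knn_predict(X_tr_sel, y_tr, x, k) == y[i]
    return correct / len(X)


def make_synthetic(n_samples=40, n_genes=200, n_informative=5, seed=1):
    """Hypothetical stand-in for a microarray matrix: many genes, few samples."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        lab = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for j in range(n_informative):
            row[j] += 3.0 * lab  # shift informative genes in the diseased class
        X.append(row)
        y.append(lab)
    return X, y


X, y = make_synthetic()
accuracy = cross_validate(X, y)
print(f"5-fold CV accuracy: {accuracy:.2f}")
```

Ranking genes inside each training fold, rather than once on the full dataset, keeps the held-out samples from influencing which genes are selected; whether the original study did this is not stated in the abstract.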