Classification of Gujarati Documents using Naïve Bayes Classifier

Rajnish M. Rakholia and Jatinderkumar R. Saini

doi:10.17485/ijst/2017/v10i5/103233

Indian Journal of Science and Technology

Year: 2017, Volume: 10, Issue: 5, Pages: 1-9

Review Article

Rajnish M. Rakholia¹* and Jatinderkumar R. Saini²

¹School of Computer Science, R. K. University, Rajkot - 360020, Gujarat, India; [email protected]
²Narmada College of Computer Application, Bharuch - 392011, Gujarat, India; [email protected]

*Author for correspondence:
Rajnish M. Rakholia
School of Computer Science, R. K. University, Rajkot - 360020, Gujarat, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: Information overload on the web is a major problem faced by institutions and businesses today. Sorting useful documents out of web content written in Indian languages is challenging because of morphological variation and the language barrier. To date, no document classifier is available for the Gujarati language. Methods: Keyword search is one way to retrieve meaningful documents from the web, but it does not discriminate by context. In this paper we present the Naïve Bayes (NB) statistical machine learning algorithm for the classification of Gujarati documents. Six predefined categories are used: sports, health, entertainment, business, astrology and spiritual. A corpus of 280 Gujarati documents per category is used for training and testing the categorizer, and k-fold cross-validation is used to evaluate the performance of the NB classifier. Findings: The experimental results show that the accuracy of the NB classifier was 75.74% without feature selection and 88.96% with feature selection. These results indicate that the NB classifier contributes effectively to Gujarati document classification. Applications: The proposed work is useful for implementing directory search in web portals to sort useful documents, and for many Information Retrieval (IR) applications.

Keywords: Classification, Document Categorization, Gujarati Language, Naïve Bayes
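
The pipeline summarized in the abstract (a Naïve Bayes classifier evaluated with k-fold cross-validation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the multinomial model with Laplace smoothing, the function names `train_nb`, `predict_nb` and `k_fold_accuracy`, the fold assignment, and the toy English tokens, which stand in for tokenized Gujarati text. Feature selection, which the abstract reports as improving accuracy, is omitted.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect the counts needed by a multinomial Naive Bayes model."""
    class_doc_counts = Counter(labels)      # documents per class
    word_counts = defaultdict(Counter)      # class -> token -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_doc_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_doc_counts, word_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        score = math.log(n_docs / total_docs)            # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:                             # skip unseen tokens
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(docs, labels, k=5):
    """Mean accuracy over k folds; fold f holds out documents with index % k == f."""
    accuracies = []
    for fold in range(k):
        train_idx = [i for i in range(len(docs)) if i % k != fold]
        test_idx = [i for i in range(len(docs)) if i % k == fold]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy placeholder data (hypothetical): two of the six categories named in the abstract.
docs = [
    ["cricket", "match", "team"],
    ["doctor", "medicine", "diet"],
    ["team", "score", "match"],
    ["diet", "hospital", "doctor"],
    ["cricket", "score", "team"],
    ["medicine", "hospital", "diet"],
]
labels = ["sports", "health", "sports", "health", "sports", "health"]

model = train_nb(docs, labels)
print(predict_nb(model, ["cricket", "team"]))
print(k_fold_accuracy(docs, labels, k=3))
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids numerical underflow on long documents. A real experiment would also shuffle and stratify the folds so that each one contains all six categories.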