P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 35, Pages: 3652-3663

Original Article

OCR for historical Kannada documents using clustering methods

Received Date: 31 July 2020, Accepted Date: 06 September 2020, Published Date: 03 October 2020

Abstract

Motivation: Kannada is an ancient language and the official language of the Indian state of Karnataka. Studying ancient Kannada scripts from stone carvings, leaves, metal, cloth, paper and other sources enhances our knowledge of the traditions and culture practiced in Karnataka. Because of their poor quality, variability and contrast, it is very challenging to extract information from ancient Kannada scripts or to recognize their characters. Objectives: To design a suitable Optical Character Recognition (OCR) technique for reading ancient Kannada scripts. Method: Clustering by fast search and find of density peaks is a state-of-the-art density-based clustering algorithm that can effectively find clusters of arbitrary shape. However, it requires calculating the distances between all pairs of points in a data set to determine the density and separation of each point. Consequently, its computational cost is extremely high for large-scale data sets. In this work, the given document is first preprocessed. Features such as SIFT and SURF are then extracted and clustered using K-means clustering. Similarity is computed using different measures. Findings: Classification accuracy was studied under different clustering methods (K-means, agglomerative and density-based clustering) with distance-based measures such as Euclidean and Manhattan. To evaluate the performance of the proposed method, we created our own database of Ashok, Kadamba, Hoysala and Mysuru scripts, and experiments were conducted on this database of 4 classes under 70, 50 and 30 different training models from each class. Novelty: We propose K-means clustering using SIFT and SURF features for ancient Kannada manuscripts. Experiments were conducted on our own database to validate the performance of the presented system.
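
The clustering and similarity steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that feature descriptors have already been extracted, and uses six hypothetical 2-dimensional vectors as stand-ins (real SIFT descriptors are typically 128-dimensional). The function names, the evenly spaced centroid initialisation and the toy data are all assumptions made for illustration. K-means is run with the Euclidean distance, and a query descriptor is then matched to its nearest centroid under both the Euclidean and the Manhattan measure.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(point, centroids, dist):
    """Index of the centroid closest to `point` under the measure `dist`."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain K-means with Euclidean assignment and mean update.

    Assumption: centroids are initialised from evenly spaced input points
    so the result is deterministic; practical code would usually use
    random or k-means++ seeding instead.
    """
    step = max(1, len(points) // k)
    centroids = [list(p) for p in points[::step][:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids, euclidean)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Hypothetical descriptors: two well-separated groups of three vectors each.
descriptors = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
centroids = kmeans(descriptors, k=2)

# Match a new (hypothetical) descriptor to a cluster under each measure.
query = [4.8, 5.2]
label_euclidean = nearest(query, centroids, euclidean)
label_manhattan = nearest(query, centroids, manhattan)
print(centroids, label_euclidean, label_manhattan)
```

On well-separated data such as this, both measures assign the query to the same cluster. On noisy descriptors the two measures can disagree, which is one reason for comparing them experimentally.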

Keywords: Historical Kannada; Karnataka; SIFT; SURF; KMeans


Copyright

© 2020 Ravi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).
