P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 21, Pages: 2157-2168

Original Article

Multilingual OCR systems for the regional languages in Balochistan

Received Date: 27 March 2020, Accepted Date: 27 April 2020, Published Date: 24 June 2020

Abstract

Background: Optical character recognition (OCR) technology has been developed for many languages, but most systems address a single language, so multilingual OCR remains a challenge. Methods: The development of multilingual OCR is a widely debated issue. Researchers are studying its feasibility from both technical and operational viability perspectives. Multilingual OCR covers both printed and handwritten characters. In this paper, we study the significance, challenges and issues of developing a multilingual OCR system for regional languages based on the Perso-Arabic script by conducting a comprehensive survey on the operational viability of multilingual OCR. Findings: Feedback from 339 participants was collected through an online survey to determine the scope and applicability of multilingual OCR. The respondents came from different linguistic backgrounds. The study found that a large majority of participants are willing to use their native language for their computing tasks and believe that support for multiple languages in software would increase their productivity. Novelty: In its current form, the study addresses the viability of multilingual OCR for regional languages based on the Perso-Arabic script. To the best of our knowledge, no such study has previously been conducted for Pakistan.

Keywords:
Multilingual, OCR, Multi-fonts, Omnifont, Regional languages

Copyright

© 2020 Sanjrani, Naveed, Sajid, Ahmed, Awan, Jumani. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published by Indian Society for Education and Environment (iSee)