Multilingual OCR systems for the regional languages in Balochistan

Anwar Ali Sanjrani∗; Muhammad Shumail Naveed; Muhammad Sajid; Atiq Ahmed; Shafiq Awan; Awais Khan Jumani

doi:10.17485/IJST/v13i21.2

Article

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 21, Pages: 2157-2168

Original Article

Anwar Ali Sanjrani¹∗, Muhammad Shumail Naveed¹, Muhammad Sajid¹, Atiq Ahmed¹, Shafiq Awan², Awais Khan Jumani³

1 Department of Computer Science and Information Technology, University of Balochistan Quetta, Pakistan
2 Department of Computer Science, Benazir Bhutto Shaheed University Lyari, Karachi, Pakistan
3 ILMA, University Karachi, Sindh, Pakistan

∗Corresponding author:
Anwar Ali Sanjrani
Department of Computer Science and Information Technology, University of Balochistan-Quetta, Pakistan
Email: [email protected]

Received Date: 27 March 2020, Accepted Date: 27 April 2020, Published Date: 24 June 2020

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background: Optical character recognition (OCR) technology has been developed for many languages, but most systems target a single language, so multilingual OCR remains a challenge. Methods: The development of multilingual OCR is a highly debated issue. Researchers are studying both its technical feasibility and its operational viability. Multilingual OCR covers both printed and handwritten characters. In this paper, we study the significance, challenges, and issues of developing a multilingual OCR system for regional languages based on the Perso-Arabic script by conducting a comprehensive survey of the operational viability of multilingual OCR. Findings: Feedback from 339 participants was collected through an online survey to assess the scope and applicability of multilingual OCR. The respondents came from different linguistic backgrounds. The study found that a large majority of participants are willing to use their native language to accomplish their computational tasks and believe that support for multiple languages in software would increase their productivity. Novelty: In its current form, the study addresses the viability of multilingual OCR for regional languages based on the Perso-Arabic script. To the best of our knowledge, no such study has previously been conducted for Pakistan.

Keywords:
Multilingual, OCR, Multi-fonts, Omnifont, Regional languages

Copyright

© 2020 Sanjrani, Naveed, Sajid, Ahmed, Awan, Jumani. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published By Indian Society for Education and Environment (iSee)