Anwar Ali Sanjrani1∗, Muhammad Shumail Naveed1, Muhammad Sajid1, Atiq Ahmed1, Shafiq Awan2, Awais Khan Jumani3
1 Department of Computer Science and Information Technology, University of Balochistan Quetta, Pakistan
2 Department of Computer Science, Benazir Bhutto Shaheed University Lyari, Karachi, Pakistan
3 ILMA University, Karachi, Sindh, Pakistan
∗Corresponding author:
Anwar Ali Sanjrani
Department of Computer Science and Information Technology, University of Balochistan-Quetta, Pakistan
Email: [email protected]
Abstract
Background: Optical character recognition (OCR) technology has been developed for various languages, but most systems address a single language, so multilingual OCR remains a challenge. Methods: The development of multilingual OCR is a highly debated issue. Researchers are studying its feasibility from both technical and operational-viability perspectives. Multilingual OCR covers both printed and handwritten characters. In this paper, we study the significance, challenges, and issues of developing a multilingual OCR system for regional languages based on the Perso-Arabic script by conducting a comprehensive survey on the operational viability of multilingual OCR. Findings: Feedback from 339 participants was collected through an online survey to determine the scope and applicability of multilingual OCR. The respondents came from different linguistic backgrounds. The study found that a large majority of participants are willing to use their native language to accomplish their computational tasks and believe that support for multiple languages in software would increase their productivity. Novelty: In its current form, the study addresses the viability of multilingual OCR for regional languages based on the Perso-Arabic script. To the best of our knowledge, no such study has previously been conducted for Pakistan.
Keywords:
Multilingual, OCR, Multi-fonts, Omnifont, Regional languages
Copyright
© 2020 Sanjrani, Naveed, Sajid, Ahmed, Awan, Jumani. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Published By Indian Society for Education and Environment (iSee)