Deciphering complex text-based CAPTCHAs with deep learning

Asadullah Kehar; Rafaqat Hussain Arain; Riaz Ahmed Shaikh

doi:10.17485/IJST/v13i13.126

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 13, Pages: 1390-1400

Original Article

Asadullah Kehar¹*, Rafaqat Hussain Arain¹, Riaz Ahmed Shaikh¹

¹Department of Computer Science, Shah Abdul Latif University, Khairpur, Pakistan

*Author for correspondence
Asadullah Kehar
Department of Computer Science, Shah Abdul Latif University, Khairpur, Pakistan
Email: [email protected]

Received Date: 03 April 2020, Accepted Date: 23 April 2020, Published Date: 16 May 2020

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background: CAPTCHA is a mechanism for distinguishing humans from bots. It has become a standard means of protecting resources on the World Wide Web from misuse. Different types of CAPTCHAs have been implemented, but text-based schemes are the most widely used because of their ease of use and robustness. A user is asked to type in the text shown in an image, which is intentionally distorted to defeat bots. Recognizing the text is easy for humans but very hard for computers. Method/Findings: In this work, a text-based CAPTCHA scheme with background clutter and partially connected characters is decoded. The main steps consist of preprocessing, segmentation and recognition. Several digital image processing techniques were applied during the preprocessing and segmentation steps, and a convolutional neural network (CNN) was used for the recognition step. Since a CNN requires a massive amount of data, the data was generated synthetically. A complex text-based CAPTCHA scheme with a varying number of letters (3, 4 and 5) is decoded with an overall precision of 77.5%, 64.2% and 51.9%, respectively.

Keywords: CAPTCHAs; HIPs; image processing; machine learning; CNN
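
The abstract names three stages (preprocessing, segmentation, recognition) without specifying the techniques used in each. As a rough illustration only, the first two stages might be sketched as below, assuming a fixed-threshold binarization and a vertical-projection split. Both choices, along with the names `binarize`, `vertical_projection`, `segment_columns`, `crop` and the toy image `gray`, are assumptions made for this sketch and are not taken from the article. A plain projection split like this one would not separate the partially connected characters the abstract mentions, and the CNN recognition stage is not shown.

```python
# Hypothetical sketch: threshold preprocessing + vertical-projection segmentation.
# These technique choices are assumptions; the abstract does not name them.

def binarize(gray, threshold=128):
    """Preprocessing: mark pixels darker than `threshold` as ink (1), else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def vertical_projection(binary):
    """Count the ink pixels in each column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def segment_columns(binary, min_ink=1):
    """Segmentation: return (start, end) column spans, end exclusive,
    covering runs of columns that hold at least `min_ink` ink pixels."""
    spans, start = [], None
    for x, count in enumerate(vertical_projection(binary)):
        if count >= min_ink and start is None:
            start = x                      # a run of ink columns begins
        elif count < min_ink and start is not None:
            spans.append((start, x))       # the run ended at column x
            start = None
    if start is not None:                  # run reaches the right edge
        spans.append((start, len(binary[0])))
    return spans


def crop(binary, span):
    """Cut one candidate character out of the binary image."""
    start, end = span
    return [row[start:end] for row in binary]


# Toy 3x7 grayscale image: 0 is dark ink, 255 is light background.
gray = [
    [255, 0,   0, 255, 255, 0, 255],
    [255, 0, 255, 255, 255, 0, 255],
    [255, 0,   0, 255, 255, 0, 255],
]
binary = binarize(gray)
spans = segment_columns(binary)
glyphs = [crop(binary, s) for s in spans]  # each crop would go to a classifier
```

In a full pipeline each cropped glyph would be resized to a fixed input size and passed to the trained classifier; that step is omitted here because the abstract gives no details of the network.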

Copyright

Copyright: © 2020 Kehar, Arain, Shaikh. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)