Indian Journal of Science and Technology
DOI: 10.17485/IJST/v16i42.1799
Year: 2023, Volume: 16, Issue: 42, Pages: 3771-3777
Original Article
João Rafael Gonçalves Evangelista1*, Renato José Sassi1, Márcio Romero1
1Universidade Nove de Julho, São Paulo, PA, 01525-000, Brazil
*Corresponding Author
Email: [email protected]
Received Date: 18 July 2023, Accepted Date: 24 September 2023, Published Date: 13 November 2023
Objectives: To apply Natural Language Processing (NLP) to enrich the Google Hacking Database (GHDB) with attributes and to convert its textual values to ASCII codes, enabling Machine Learning (ML) techniques to group Dorks by similarity and find vulnerabilities.
Methods: The computational experiments were conducted in seven steps: Selection of the GHDB; Removal of Hyperlinks and Deletion of Attributes; Removal of the Site Parameter from Dorks; Removal of Outliers and Stopwords; Enrichment with NLP; Base Transformation; and Application of Self-Organizing Maps (SOM).
Findings: Applying NLP allowed the Dorks to be segmented into characters, which were then converted to their numeric ASCII values. This enriched the GHDB and enabled the application of ML techniques, in this case the SOM. The results obtained with the SOM were considered good: the topographic error (TE) and quantization error (QE) of the generated maps were close to 0, indicating good accuracy and that the maps represent the input data well.
Novelty: The formation of clusters of Dorks with SOM after enriching the GHDB with NLP.
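The removal of the site parameter and the character-to-ASCII conversion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `site:` pattern, the function names `strip_site` and `dork_to_ascii`, the fixed vector width, the zero padding, and the sample Dorks are all assumptions made here so that every Dork yields a numeric vector of equal length, as an SOM would require.

```python
import re
from typing import List

# Hypothetical pattern for the `site:` operator; the abstract does not
# state the exact rule used to remove it.
SITE_PARAM = re.compile(r"\bsite:\S+\s*", flags=re.IGNORECASE)


def strip_site(dork: str) -> str:
    """Remove `site:<domain>` operators from a Dork string."""
    return SITE_PARAM.sub("", dork).strip()


def dork_to_ascii(dork: str, width: int = 32, pad: int = 0) -> List[int]:
    """Segment a Dork into characters and map each one to its ASCII code.

    Non-ASCII characters are dropped. The vector is truncated or
    right-padded to `width`; fixed-width padding is an assumption of
    this sketch, not a detail taken from the article.
    """
    codes = [ord(ch) for ch in dork if ord(ch) < 128]
    codes = codes[:width]
    return codes + [pad] * (width - len(codes))


# Illustrative Dorks, not taken from the GHDB.
dorks = ["inurl:admin site:example.com", 'intitle:"index of" passwd']
vectors = [dork_to_ascii(strip_site(d), width=16) for d in dorks]

for dork, vector in zip(dorks, vectors):
    print(dork, "->", vector)
```

Vectors of this kind could then be passed to any SOM implementation; scaling them to a common range beforehand is a frequent choice, though the abstract does not say whether the authors did so.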
Keywords: Google Hacking Database, Dorks, Natural Language Processing, Self-Organizing Maps, Enrichment
© 2023 Evangelista et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)