Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 27, Pages: 1-9
In-A Kim1, Kyu-Hyun Cho1, Hyung-Jun Yim1, Hwan-Kuk Kim2 and Kyu-Chul Lee1*
1 Department of Computer Engineering, Chungnam National University, 220 Gung-Dong, Yuseong-Gu, Daejeon, Korea
2 Korea Internet and Security Agency, 135, Jungdae-ro, Songpa-gu, Seoul, Korea
HTML5 is a recent version of HTML, the markup language for web documents. It was developed to solve the problems of previous HTML versions. However, its new elements and functions have also widened the range of attacks that third parties can mount. Public institutions that adopt HTML5 are especially exposed, and their web sites are more vulnerable to these attacks than private web sites. Public institutions’ web sites also contain more web documents than general web sites, because they provide information on policies, voting, and other events and are linked to the sites of subordinate institutions. To handle this volume, we used Hadoop, an open-source framework for distributed storage and processing, to crawl these web documents and detect HTML5 vulnerabilities in them with distributed parallel processing. Applying distributed parallel processing to both crawling and detection improved the performance of both processes for the large number of web documents connected to public institutions’ web sites.
Keywords: Crawling, Distributed Parallel Processing, Hadoop, HTML5 Vulnerability