P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2017, Volume: 10, Issue: 17, Pages: 1-5

Original Article

A Survey on Uniform Resource Locator and Content Matching to Discover Deep-Web Pages


Objectives: 1) To harvest deep-web pages efficiently. 2) To personalize search according to user interest. 3) To combine the pre-query and post-query approaches.

Methods/Statistical Analysis: Three methods are used. 1) URL matching: the system obtains links from online sources and from the site database, extracts them, and matches the user-entered query against each URL. 2) Content matching: the system extracts the links, retrieves the form content of each page, and matches it against the user-entered query. When a match is found, it calculates how often the query occurs on the form. The Jsoup library is used for this step. 3) Pre-query algorithm: pre-query results are displayed as soon as the search box receives focus. When the user logs in, the system selects the user's profile and displays links according to it.

Findings: Most existing search engines display results according to the most visited or most recently added sites. Finding deep-web pages in databases is challenging because they are not registered with any web index and change constantly. In the proposed system, Smart Crawler performs URL matching and content matching to discover deep-web pages. The proposed crawler efficiently reaches deep-web pages across a wide range of sites and achieves better results than other crawlers. Page ranking is performed, and high-ranked results are displayed on the results page. The system also provides personalized search, displaying results according to the user's profession. Maintaining a log file and pre-query results reduces search time. This is the first time such a crawler performs personalized search, which makes it unique. During evaluation, the proposed approach was found to be more efficient than the existing crawler.

Application/Improvements: The application gathers real-time user profile information from user accounts, so it must be reliable and keep that data safe. The crawler can be used as the search engine for e-learning applications and e-shopping sites. Links can be bookmarked for future use. As an improvement, pages could be ranked according to user-entered reviews of each link. In addition, an opened link could display its page content in task-extraction form, that is, the code, concepts, and URLs on the page.
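
The URL-matching and content-matching steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses the Java Jsoup library, whereas this sketch substitutes Python's standard-library HTML parser so that it stays self-contained. The function names (url_matches, query_frequency, rank_links), the rule that a single matching query term is enough, and the idea of ordering links by term frequency are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def url_matches(url, query):
    """URL matching (assumed rule): any query term appears in the URL."""
    parts = urlparse(url)
    url_terms = set(tokenize(unquote(" ".join([parts.netloc, parts.path, parts.query]))))
    return any(term in url_terms for term in tokenize(query))


class FormTextExtractor(HTMLParser):
    """Collects text and selected attribute values found inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open <form> tags
        self.chunks = []  # text fragments gathered from forms

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.depth += 1
        if self.depth > 0:
            for name, value in attrs:
                # Field names and hints often describe what a form searches.
                if name in ("name", "placeholder", "value") and value:
                    self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag == "form" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def query_frequency(html, query):
    """Content matching: count occurrences of query terms in form content."""
    parser = FormTextExtractor()
    parser.feed(html)
    parser.close()
    wanted = set(tokenize(query))
    return sum(1 for term in tokenize(" ".join(parser.chunks)) if term in wanted)


def rank_links(pages, query):
    """Keep (url, html) pairs that match by URL or content, most frequent first.

    Ordering by frequency is a hypothetical ranking rule; the abstract does
    not say how page ranking is computed.
    """
    scored = []
    for url, html in pages:
        freq = query_frequency(html, query)
        if url_matches(url, query) or freq > 0:
            scored.append((url, freq))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch, a caller would pass already-downloaded (url, html) pairs to rank_links together with the user's query. Fetching the pages, the site database, and the profile-based pre-query step are left out because the abstract does not specify them.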

Keywords: Deep Web, IP, Positioning, Smart Crawler

