P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 17, Pages: 1-6

Original Article

A Hybrid Approach for Extracting Web Information

Abstract

Web page mining is the predominant technique for obtaining data from the internet. Extraction from web pages can be either supervised or unsupervised. Unsupervised extraction tends to return more irrelevant data than relevant data, and it fails to eliminate data redundancy. The proposed hybrid approach separates the relevant content from web pages and filters out duplicated content. The hybrid algorithm performs region separation using a tag tree and then filters out repeated information, so the output contains only reliable data. The approach is an effective way to extract relevant information from web pages.
Keywords: Hybrid Approach for Extraction, Multi String Alignment, Region Separation, Web Content Mining
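
The two steps named in the abstract, region separation on a tag tree followed by filtering of repeated information, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no implementation details, so the region heuristic (a node with at least `min_records` children sharing one tag), the exact-match duplicate filter used in place of multi-string alignment, and every name below (`Node`, `TagTreeBuilder`, `find_regions`, `extract_records`) are assumptions made for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so the tree must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One element of the tag tree; children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TagTreeBuilder(HTMLParser):
    """Builds a simple tag tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the matching open tag; ignore stray closing tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node):
    """Concatenate the text under a node in document order."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def find_regions(node, min_records=3):
    """Assumed heuristic: a region is a set of >= min_records same-tag siblings."""
    regions = []
    elements = [c for c in node.children if isinstance(c, Node)]
    counts = Counter(c.tag for c in elements)
    if counts:
        tag, n = counts.most_common(1)[0]
        if n >= min_records:
            regions.append([c for c in elements if c.tag == tag])
    for child in elements:
        regions.extend(find_regions(child, min_records))
    return regions


def extract_records(html, min_records=3):
    """Return record texts from all regions, with repeated records filtered out."""
    builder = TagTreeBuilder()
    builder.feed(html)
    seen, records = set(), []
    for region in find_regions(builder.root, min_records):
        for record in region:
            text = " ".join(text_of(record).split())  # normalise whitespace
            key = text.lower()  # assumed: case-insensitive exact match
            if text and key not in seen:
                seen.add(key)
                records.append(text)
    return records


SAMPLE = (
    "<html><body><div><a>Home</a></div>"
    "<ul><li>Alpha  item</li><li>Beta item</li>"
    "<li>alpha item</li><li>Gamma item</li></ul></body></html>"
)
print(extract_records(SAMPLE))  # ['Alpha item', 'Beta item', 'Gamma item']
```

In this sketch the navigation link is dropped because it does not sit in a repeated sibling group, and the second "alpha item" is dropped as a duplicate. A closer approximation of the paper's keywords would replace the exact-match key with an alignment-based similarity test.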
