Indian Journal of Science and Technology
DOI: 10.17485/ijst/2015/v8i17/61595
Year: 2015, Volume: 8, Issue: 17, Pages: 1-6
Original Article
R. Abarna1* and S. Pradeepa2
1 School of Computing, SASTRA University, Thanjavur - 613401, Tamil Nadu, India
2 SASTRA University, Thanjavur - 613401, Tamil Nadu, India
Web page mining is the predominant technique for extracting data from the Internet. Extraction from web pages can be either supervised or unsupervised. Unsupervised extraction retrieves more irrelevant data than relevant data, and it fails to eliminate data redundancy. The proposed hybrid approach separates the relevant content of web pages and filters out replicated content. The new hybrid algorithm performs region separation using a tag tree and then filters out repeated information, so the output contains only reliable data. This approach is an efficient way to extract relevant information from web pages.
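The abstract does not give the algorithm's details, so the following is only a minimal sketch of the two stages it names, under stated assumptions. It builds a tag tree with Python's standard-library `html.parser`, applies a hypothetical region rule (each direct child element of `<body>` is one region), and then drops near-duplicate regions using pairwise `difflib.SequenceMatcher` similarity as a simple stand-in for the multi-string alignment named in the keywords. The names `separate_regions`, `filter_repeats`, `extract` and the 0.9 threshold are illustrative assumptions, not the authors' method, and the sketch does not attempt to decide which regions are relevant.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have children; the parser does not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of a minimal tag tree; items mixes text strings and child Nodes in order."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []

    def all_text(self):
        # Concatenate this node's text and its descendants' text in document order.
        parts = [i if isinstance(i, str) else i.all_text() for i in self.items]
        return " ".join(p for p in parts if p)


class TagTreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed or stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore end tags with no match.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text:
            self.current.items.append(text)


def find(node, tag):
    # Depth-first search for the first element with the given tag name.
    for item in node.items:
        if isinstance(item, Node):
            if item.tag == tag:
                return item
            found = find(item, tag)
            if found is not None:
                return found
    return None


def separate_regions(html):
    # Hypothetical region rule: each direct child element of <body> is one region.
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    body = find(builder.root, "body") or builder.root
    regions = []
    for item in body.items:
        if isinstance(item, Node):
            text = item.all_text()
            if text:
                regions.append(text)
    return regions


def filter_repeats(regions, threshold=0.9):
    # Keep a region only if it is not a near-duplicate of a region already kept.
    kept = []
    for text in regions:
        if all(SequenceMatcher(None, text, k).ratio() < threshold for k in kept):
            kept.append(text)
    return kept


def extract(html, threshold=0.9):
    return filter_repeats(separate_regions(html), threshold)


if __name__ == "__main__":
    page = """<html><body>
<div>Home | About | Contact</div>
<div><h1>Title</h1><p>Main article text.</p></div>
<div>Home | About | Contact</div>
</body></html>"""
    print(extract(page))  # ['Home | About | Contact', 'Title Main article text.']
```

In the demo the repeated navigation bar is dropped as a duplicate, but its first copy is kept; separating relevant from irrelevant regions would need an additional rule that this sketch does not model.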
Keywords: Hybrid Approach for Extraction, Multi String Alignment, Region Separation, Web Content Mining