A Hybrid Approach for Extracting Web Information

Indian Journal of Science and Technology

DOI: 10.17485/ijst/2015/v8i17/61595

Year: 2015, Volume: 8, Issue: 17, Pages: 1-6

Original Article

R. Abarna¹* and S. Pradeepa²

¹School of Computing, SASTRA University, Thanjavur - 613401, Tamil Nadu, India; [email protected]
²SASTRA University, Thanjavur - 613401, Tamil Nadu, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Web page mining is the predominant technique for gathering data from the internet. It extracts information from web pages in either a supervised or an unsupervised manner. Unsupervised extraction tends to return more irrelevant data than relevant data, and it fails to eliminate redundant data. The proposed hybrid approach separates the relevant content from web pages and filters out duplicated information. The hybrid algorithm performs region separation using a tag tree and then strains out repeated information, so the output contains only reliable data. This makes the approach an effective way to extract relevant information from web pages.
Keywords: Hybrid Approach for Extraction, Multi String Alignment, Region Separation, Web Content Mining
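
The two stages named in the abstract, region separation over a tag tree followed by removal of repeated information, can be illustrated with a minimal sketch. It is not the authors' algorithm: the names `RegionCollector` and `extract_regions` and the set of region tags are hypothetical, the input is assumed to be well-formed markup, and duplicates are detected by exact text comparison rather than by the multi-string alignment listed in the keywords.

```python
from html.parser import HTMLParser

# Tags treated as region boundaries (an assumption made for illustration).
REGION_TAGS = {"div", "section", "article", "p", "li", "td"}
# Tags whose text is never treated as page content.
SKIP_TAGS = {"script", "style"}


class RegionCollector(HTMLParser):
    """Walks the tag tree and collects the text of each region-level element."""

    def __init__(self):
        super().__init__()
        self.stack = []        # open regions, innermost last: (tag, text parts)
        self.skip_depth = 0    # > 0 while inside <script> or <style>
        self.regions = []      # region texts in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in REGION_TAGS:
            self.stack.append((tag, []))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in REGION_TAGS and self.stack and self.stack[-1][0] == tag:
            _, parts = self.stack.pop()
            # Collapse runs of whitespace so equal content compares equal.
            text = " ".join(" ".join(parts).split())
            if text:
                self.regions.append(text)

    def handle_data(self, data):
        # Text is credited to the innermost open region only.
        if self.skip_depth == 0 and self.stack:
            self.stack[-1][1].append(data)


def extract_regions(html):
    """Returns region texts with exact repeats removed, in first-seen order."""
    collector = RegionCollector()
    collector.feed(html)
    collector.close()
    seen = set()
    unique = []
    for text in collector.regions:
        key = text.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


page = ("<div><p>Price: 10</p><p>Price: 10</p>"
        "<script>var x;</script><p>In stock</p></div>")
print(extract_regions(page))
```

Assigning text to the innermost open region keeps nested blocks apart, and comparing case-folded, whitespace-normalised text is the simplest possible redundancy filter; an alignment-based comparison would also catch near-duplicates that differ in a few tokens.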