P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2015, Volume: 8, Issue: 14, Pages: 1-5

Original Article

A Mathematical Approach for Mining Web Content Outliers using Term Frequency Ranking


The Internet is a vast collection of information, which makes it extremely difficult to search for and retrieve the information that is required and valuable. Because search engine result sets contain irrelevant and redundant data, called outliers, the primary objective of this paper is to provide the user with efficient and effective search results. In this research work, Spearman's rank correlation coefficient is used to calculate the correlation between document pairs. If the correlation value is 1, the document is redundant and can be removed. The method depends on the term frequencies of the words common to both documents in a pair, which are ranked by frequency value. This method improves the effectiveness, efficiency and reliability of the search engine. The performance of the n-gram method, TF.IDF, Linear Correlation and Ranking Correlation is compared. The experimental results show that the proposed method improves precision, recall, F-score and accuracy.

Keywords: Ranking Correlation Coefficient, Search Engines, Term Frequency, Web Content Mining, Web Content Outliers.
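
The abstract describes ranking the term frequencies of the words common to a document pair and computing Spearman's rank correlation coefficient on those ranks. The following is a minimal sketch of that idea, not the authors' implementation. The whitespace tokenizer, the average-rank handling of ties, the numerical tolerance, and all function names (`term_frequencies`, `average_ranks`, `spearman_common_terms`, `is_redundant`) are assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count terms using a naive lowercase whitespace tokenizer (assumption)."""
    return Counter(text.lower().split())


def average_ranks(values):
    """Rank values so the highest frequency gets rank 1; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of positions holding the same value.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_common_terms(doc_a, doc_b):
    """Spearman's rho over the frequency ranks of terms shared by both documents.

    Returns None when rho is undefined (fewer than two common terms,
    or all ranks identical in either document).
    """
    tf_a, tf_b = term_frequencies(doc_a), term_frequencies(doc_b)
    common = sorted(set(tf_a) & set(tf_b))
    if len(common) < 2:
        return None
    ranks_a = average_ranks([tf_a[t] for t in common])
    ranks_b = average_ranks([tf_b[t] for t in common])
    n = len(common)
    mean_a, mean_b = sum(ranks_a) / n, sum(ranks_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    if var_a == 0 or var_b == 0:
        return None
    # Pearson correlation of the ranks, which equals Spearman's rho.
    return cov / math.sqrt(var_a * var_b)


def is_redundant(rho, tol=1e-9):
    """Flag a pair as redundant when rho equals 1 (tolerance is an assumption)."""
    return rho is not None and abs(rho - 1.0) <= tol


doc_a = "web web web mining mining outlier"
doc_b = "web web web web mining mining outlier data"
doc_c = "web mining mining outlier outlier outlier"

rho_ab = spearman_common_terms(doc_a, doc_b)  # same frequency ordering
rho_ac = spearman_common_terms(doc_a, doc_c)  # reversed frequency ordering
print(rho_ab, is_redundant(rho_ab))
print(rho_ac, is_redundant(rho_ac))
```

In this sketch, `doc_a` and `doc_b` rank their common terms identically, so the pair is flagged as redundant, while `doc_c` reverses the ordering and is not. Note that a coefficient of 1 only says the common terms are ranked in the same order; how many terms must be shared before the test is meaningful is a design choice the abstract leaves open.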

