Indian Journal of Science and Technology
DOI: 10.17485/ijst/2015/v8i14/55679
Year: 2015, Volume: 8, Issue: 14, Pages: 1-5
Original Article
S. Sathya Bama*, M. S. Ifran Ahmed and A. Saravanan
Department of MCA, Sri Krishna College of Technology, Coimbatore - 641042, Tamil Nadu, India
The Internet is a vast collection of information, which makes it difficult to search for and retrieve the information that is actually required and valuable. The primary objective of this paper is to provide the user with efficient and effective results from a search engine, since the result set often contains irrelevant and redundant data, called outliers. In this work, Spearman's rank correlation coefficient is used to calculate the correlation between document pairs. If the correlation value is 1, the document is considered redundant and can be removed. The method relies on the term frequencies of the words common to a document pair, which are ranked by frequency value. This method improves the effectiveness, efficiency and reliability of the search engine. The performance is compared across the n-gram method, TF.IDF, Linear Correlation and Ranking Correlation. The experimental results show that the proposed method improves precision, recall, F-score and accuracy.
Keywords: Ranking Correlation Coefficient, Search Engines, Term Frequency, Web Content Mining, Web Content Outliers.
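The redundancy check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the tie handling (average ranks) and the function names `term_frequencies`, `average_ranks`, `spearman_common_terms` and `is_redundant` are assumptions made for illustration only. The coefficient is computed as the Pearson correlation of the rank vectors, which matches the familiar form 1 - 6Σd²/(n(n² - 1)) when there are no tied ranks.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric tokens (a simplistic, assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def average_ranks(values):
    """Rank values so the highest gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of positions that hold the same value.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_common_terms(doc_a, doc_b):
    """Spearman's rho over the frequency ranks of terms common to both documents.

    Returns None when the coefficient is undefined (fewer than two common
    terms, or all ranks equal in one of the documents).
    """
    tf_a, tf_b = term_frequencies(doc_a), term_frequencies(doc_b)
    common = sorted(set(tf_a) & set(tf_b))
    if len(common) < 2:
        return None
    ranks_a = average_ranks([tf_a[t] for t in common])
    ranks_b = average_ranks([tf_b[t] for t in common])
    n = len(common)
    mean_a, mean_b = sum(ranks_a) / n, sum(ranks_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)


def is_redundant(doc_a, doc_b, tolerance=1e-9):
    """Flag a pair as redundant when rho equals 1 (the tolerance is an assumption)."""
    rho = spearman_common_terms(doc_a, doc_b)
    return rho is not None and abs(rho - 1.0) <= tolerance


if __name__ == "__main__":
    first = "web web web mining mining outlier"
    second = "outlier mining web mining web web"
    print(spearman_common_terms(first, second), is_redundant(first, second))
```

A coefficient of 1 only indicates that the common terms have the same frequency ordering in both documents, so a practical system would probably combine this check with other criteria, such as how many terms the two documents share.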