Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 43, Pages: 1-11
R. Janani* and S. Vijayarani
*Author for correspondence
Department of Computer Science, School of Computer Science and Engineering, Bharathiar University, Coimbatore - 641046, Tamil Nadu, India; [email protected]
Objectives: To retrieve information from documents stored on a desktop computer by analyzing their contents with string matching algorithms. Methods/Statistical Analysis: Pattern matching algorithms are used to analyze document content by finding all occurrences of a limited set of patterns within an input text or document. This work applies four existing string matching algorithms: the Brute Force, Knuth-Morris-Pratt (KMP), Boyer Moore and Rabin Karp algorithms. It also proposes three new string matching algorithms: the Enhanced Boyer Moore, Enhanced Rabin Karp and Enhanced Knuth-Morris-Pratt algorithms. Findings: The experiments use two types of documents, .txt and .docx. The performance measures are search time, number of iterations and accuracy. The experimental results show that the Enhanced KMP algorithm gives better accuracy than the other string matching algorithms. Application/Improvements: These algorithms are commonly used in text mining, document classification, content analysis and plagiarism detection. In future work, the algorithms will be enhanced further to improve their performance, and additional document types will be used for experimentation.
Keywords: Brute Force, Boyer Moore, Information Retrieval, Knuth-Morris-Pratt, Pattern Matching, Rabin Karp
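
For illustration, the following is a minimal Python sketch of two of the existing algorithms named in the abstract, Brute Force and standard Knuth-Morris-Pratt, each returning every match position together with a count of character comparisons. It shows only the textbook versions and is not the enhanced variants proposed in this work, whose details are not given here. The function names and the choice of character comparisons as a stand-in for the "number of iterations" measure are assumptions made for this example.

```python
def brute_force_search(text, pattern):
    """Return (match_positions, comparisons), trying every alignment in turn."""
    n, m = len(text), len(pattern)
    positions, comparisons = [], 0
    if m == 0:
        return positions, comparisons
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            positions.append(start)
    return positions, comparisons


def build_failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the standard KMP prefix function)."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return (match_positions, comparisons) using standard KMP.
    Only text-versus-pattern comparisons are counted (an assumption
    of this sketch); table construction is not included in the count."""
    m = len(pattern)
    positions, comparisons = [], 0
    if m == 0:
        return positions, comparisons
    fail = build_failure_table(pattern)
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if ch == pattern[k]:
                k += 1
                break
            if k == 0:
                break
            k = fail[k - 1]  # fall back without moving backwards in the text
        if k == m:
            positions.append(i - m + 1)
            k = fail[k - 1]  # keep going so overlapping matches are found
    return positions, comparisons


if __name__ == "__main__":
    sample_text, sample_pattern = "abababab", "abab"
    print(brute_force_search(sample_text, sample_pattern))
    print(kmp_search(sample_text, sample_pattern))
```

Both functions report the same overlapping match positions (0, 2 and 4) for the sample input. KMP never re-reads a text character after a mismatch, which is why its comparison count is bounded by twice the text length.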