P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 11, Pages: 1270-1275

Original Article

Instant fuzzy search using probabilistic-correlation based ranking

Received Date: 29 February 2020, Accepted Date: 06 April 2020, Published Date: 03 May 2020

Abstract

Background: Instant search recommends query completions on the fly and displays results with every keystroke. These results should be robust against typographical errors that appear not only in the query but also in the documents. Instant search also requires instant response times and ranking of the results so that the most important answers appear first.

Method: This study examines simple and efficient methods for instant fuzzy single-keyword and multi-keyword search that are resilient to typographical errors and use no more than inverted and forward indices. Search results are computed incrementally from cached results, and the answers are ranked by their relevance to the query using probabilistic correlation-based ranking.

Findings: Experiments are conducted on the DBLP and Medline data sets. The execution time for answering instant fuzzy single-keyword search is recorded for different prefix lengths. Similarly, the execution time for answering instant fuzzy multi-keyword search is recorded for sub-queries of two and three keywords at various prefix lengths on the same data sets. To measure the usefulness of the proposed correlation-based ranking, precision is calculated for the search results. The experimental evaluation demonstrates the efficacy of the instant fuzzy search algorithms and the probabilistic correlation-based ranking.

Applications: The proposed instant fuzzy keyword search for single and multiple keywords improves both the efficiency and the quality of the search results.
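To make the ideas in the abstract concrete, the following is a minimal Python sketch of instant fuzzy search over an inverted and a forward index, with results computed incrementally from a cached candidate set. It is an illustration under stated assumptions, not the article's algorithm: the class name `InstantFuzzyIndex`, the helper `prefix_edit_distance`, the use of a prefix edit-distance threshold (`max_errors`) as the fuzziness criterion, and the toy documents are all hypothetical. The probabilistic correlation-based ranking is not specified in the abstract, so it is omitted and the sketch returns unranked document ids.

```python
from collections import defaultdict


def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest edit distance between `query` and any prefix of `word`."""
    prev = list(range(len(word) + 1))  # empty query vs. word[:j]
    for i, qc in enumerate(query, start=1):
        cur = [i]  # query[:i] vs. empty prefix of word
        for j, wc in enumerate(word, start=1):
            cur.append(min(prev[j] + 1,                # delete from query
                           cur[j - 1] + 1,             # insert into query
                           prev[j - 1] + (qc != wc)))  # match or substitute
        prev = cur
    return min(prev)  # best over all prefixes of `word`


class InstantFuzzyIndex:
    """Hypothetical index; assumes whitespace-tokenised, lower-cased text."""

    def __init__(self, docs: dict, max_errors: int = 1):
        self.max_errors = max_errors
        self.inverted = defaultdict(set)  # keyword -> ids of documents
        self.forward = {}                 # document id -> its keywords
        for doc_id, text in docs.items():
            words = set(text.lower().split())
            self.forward[doc_id] = words
            for w in words:
                self.inverted[w].add(doc_id)
        # Cache: (last prefix typed, keywords that fuzzily matched it).
        self._cache = ("", set(self.inverted))

    def matching_keywords(self, prefix: str) -> set:
        last, candidates = self._cache
        # Extending a prefix can never lower its prefix edit distance to a
        # keyword, so only previously matching keywords need rechecking.
        pool = candidates if prefix.startswith(last) else set(self.inverted)
        matched = {w for w in pool
                   if prefix_edit_distance(prefix, w) <= self.max_errors}
        self._cache = (prefix, matched)
        return matched

    def search(self, prefix: str) -> set:
        """Single-keyword search: union of postings of matching keywords."""
        hits = set()
        for w in self.matching_keywords(prefix.lower()):
            hits |= self.inverted[w]
        return hits

    def search_multi(self, query: str) -> set:
        """Multi-keyword search: the last token is the prefix being typed;
        earlier tokens are verified against each candidate's forward list."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        *earlier, partial = tokens
        return {
            d for d in self.search(partial)
            if all(any(prefix_edit_distance(t, w) <= self.max_errors
                       for w in self.forward[d])
                   for t in earlier)
        }


if __name__ == "__main__":
    index = InstantFuzzyIndex({
        1: "instant fuzzy search",
        2: "keyword serch ranking",   # typo in the document itself
        3: "probabilistic correlation",
    })
    for typed in ("s", "se", "sea"):  # simulate successive keystrokes
        print(typed, sorted(index.search(typed)))
    print(sorted(index.search_multi("fuzzy sea")))
```

In this sketch the inverted index retrieves candidate documents for the keyword currently being typed, while the forward index only verifies the remaining keywords of a multi-keyword query, which avoids intersecting many posting lists. For simplicity, earlier tokens are also matched as fuzzy prefixes rather than as complete keywords.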

Keywords: Keyword Search, Multi-keyword Search, Fuzzy Search, Probabilistic Correlation

Copyright

Copyright: © 2020 Rekha. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)
