P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 34, Pages: 3561-3571

Original Article

An efficient FP-Growth based association rule mining algorithm using Hadoop MapReduce

Received Date: 06 July 2020, Accepted Date: 05 September 2020, Published Date: 22 September 2020

Abstract

Objectives: To improve the performance of the FP-Growth based association rule mining algorithm on massive data through effective use of storage and execution capability and an improved partitioning technique within the Hadoop MapReduce framework.

Methodology: The proposed methodology has four main phases. In the first phase, the item sets used for finding frequent patterns are encoded, which minimizes expensive operations on large data sets. In the second phase, improved hash partitioning reduces network overhead and improves communication speed within the MapReduce phase for each item set. In the third phase, compression makes effective use of network bandwidth and storage. In the final phase, a combiner for frequent item set mining finds the patterns within each partition, which reduces the overhead of the reduce phase and the overall execution time of the FP-Growth algorithm.

Findings: The FP-Growth based association rule mining algorithm is designed for parallel execution on a distributed cluster of servers. Changes to the MapReduce implementation of FP-Growth, namely encoding, improved hash partitioning, compression and configuration, result in a significant performance gain, most notably in execution time.

Novelty/Improvements: According to the experimental results, the changes at the storage and processing levels within the MapReduce framework improve the overall performance of parallel frequent item set mining in a Hadoop cluster.
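The phases in the Methodology can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`count_items`, `encode`, `shard`, `grow`, `parallel_fp_growth`) are hypothetical; the partitioning is a plain modulo hash over encoded item IDs in the style of Parallel FP-Growth (PFP) group-dependent shards, not the paper's improved hash partitioning; the combiner appears only as per-split local counting in the first pass; mining inside each partition uses FP-Growth's conditional-pattern-base recursion without building an explicit FP-tree; and compression is omitted.

```python
from collections import Counter, defaultdict
from itertools import chain

# Single-process simulation of a MapReduce-style parallel FP-Growth.
# All function names are hypothetical; this is not the paper's code.


def count_items(splits):
    """First pass: each 'mapper' counts its own split locally (combiner-style
    aggregation), then the partial counts are merged as a reducer would."""
    total = Counter()
    for split in splits:
        local = Counter(chain.from_iterable(set(t) for t in split))
        total.update(local)
    return total


def encode(counts, min_support):
    """Phase 1: replace each frequent item with an integer ID.
    ID 0 is the most frequent item; infrequent items get no ID."""
    frequent = sorted((i for i, c in counts.items() if c >= min_support),
                      key=lambda i: (-counts[i], i))
    return {item: rank for rank, item in enumerate(frequent)}


def shard(transactions, codes, num_parts):
    """Phase 2: route encoded transactions to partitions.
    Assumption: plain modulo hashing of item IDs (PFP-style group-dependent
    shards), not the paper's improved hash partitioning."""
    shards = defaultdict(list)
    for t in transactions:
        enc = sorted({codes[i] for i in t if i in codes})
        seen = set()
        # Scan from the least frequent item so each partition receives
        # the longest prefix ending in one of its items, exactly once.
        for pos in range(len(enc) - 1, -1, -1):
            part = enc[pos] % num_parts
            if part not in seen:
                seen.add(part)
                shards[part].append((tuple(enc[:pos + 1]), 1))
    return shards


def grow(db, suffix, min_support, out, allowed=None):
    """Pattern growth over conditional pattern bases (FP-Growth's projection
    step, without building an explicit FP-tree). db holds (items, count)."""
    counts = Counter()
    for items, c in db:
        for x in items:
            counts[x] += c
    for x, sup in counts.items():
        if sup < min_support or (allowed is not None and not allowed(x)):
            continue
        itemset = suffix | {x}
        out[itemset] = sup
        # Conditional base: the part of each transaction before x.
        cond = [(items[:items.index(x)], c) for items, c in db if x in items]
        grow([(p, c) for p, c in cond if p], itemset, min_support, out)


def parallel_fp_growth(splits, min_support, num_parts=2):
    codes = encode(count_items(splits), min_support)
    decode = {v: k for k, v in codes.items()}
    transactions = [t for split in splits for t in split]
    result = {}
    for part, db in shard(transactions, codes, num_parts).items():
        # Each partition acts as one reducer. It keeps only itemsets whose
        # highest ID (least frequent item) belongs to it, so none repeat.
        grow(db, frozenset(), min_support, result,
             allowed=lambda x, part=part: x % num_parts == part)
    return {frozenset(decode[i] for i in s): sup for s, sup in result.items()}


if __name__ == "__main__":
    splits = [
        [["bread", "milk"], ["bread", "diaper", "beer", "eggs"]],
        [["milk", "diaper", "beer", "cola"], ["bread", "milk", "diaper", "beer"]],
        [["bread", "milk", "diaper", "cola"]],
    ]
    for itemset, sup in parallel_fp_growth(splits, min_support=3).items():
        print(sorted(itemset), sup)
```

Because every itemset is reported only by the partition that owns its least frequent item, and that partition receives every transaction prefix containing the itemset, the merged output equals a single-partition run (`num_parts=1`). In an actual Hadoop job, the compression phase would correspond to compressing intermediate map output, for example through the `mapreduce.map.output.compress` property; the abstract does not say how the paper configures this.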

Keywords: Association rule mining; Hadoop; MapReduce; FP-Growth


Copyright

© 2020 Senthilkumar & Hari Prasad. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).