An efficient FP-Growth based association rule mining algorithm using Hadoop MapReduce

A Senthilkumar; D Hari Prasad

doi:10.17485/IJST/v13i34.1078

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 34, Pages: 3561-3571

Original Article

A Senthilkumar¹*, D Hari Prasad²

¹Research Scholar, Sri Ramakrishna College of Arts and Science, Coimbatore, 641 006
²Professor and Head, Department of Computer Application, Sri Ramakrishna College of Arts and Science, Coimbatore, 641 006

*Corresponding Author
Email: [email protected]

Received Date: 06 July 2020, Accepted Date: 05 September 2020, Published Date: 22 September 2020

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: To improve the performance of the FP-Growth based association rule mining algorithm on massive data through effective use of storage and execution capability and an improved partitioning technique within the Hadoop MapReduce framework.

Methodology: The proposed methodology has four main phases. In the first phase, the item sets used for finding frequent patterns are encoded, which minimizes expensive operations on large data sets. In the second phase, improved hash partitioning reduces network overhead and improves communication speed within the MapReduce phase for each item set. In the third phase, compression makes effective use of network bandwidth and storage. In the final phase, a combiner for frequent item set mining finds the patterns in each partition, which reduces the overhead of the reduce phase and the overall execution time of the FP-Growth algorithm.

Findings: The FP-Growth based association rule mining algorithm is designed for parallel execution on a distributed cluster of servers. Changes to the MapReduce implementation of FP-Growth, namely encoding, improved hash partitioning, compression, and configuration, result in a significant performance gain and a shorter execution time.

Novelty/Improvements: According to the experimental results, the changes at the storage and processing levels within the MapReduce framework improve the overall performance of parallel frequent item set mining in a Hadoop cluster.

Keywords: Association rule mining; Hadoop; MapReduce; FP-Growth
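
The four phases named in the abstract (encoding, hash partitioning, compression, and a combiner) can be illustrated with a small single-process simulation of the item-counting pass that precedes FP-tree construction. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the encoding scheme, the improved partitioner, or the compression codec, so the code below assumes dictionary encoding of items to integer IDs, a plain modulo partitioner, and `zlib` as a stand-in for a Hadoop codec. All function names are hypothetical.

```python
import zlib
from collections import Counter


def encode_items(transactions):
    """Phase 1 (assumed scheme): dictionary-encode items as small integer IDs."""
    distinct = sorted({item for t in transactions for item in t})
    return {item: idx for idx, item in enumerate(distinct)}


def map_with_combiner(split, codes):
    """Phase 4: map one input split and pre-aggregate counts locally,
    so each mapper emits one (item_id, count) pair per item instead of many."""
    local = Counter()
    for t in split:
        for item in set(t):  # count each item once per transaction
            local[codes[item]] += 1
    return local


def partition(key, num_reducers):
    """Phase 2 stand-in: plain modulo on the integer ID (the paper's improved
    hash partitioner is not described in the abstract)."""
    return key % num_reducers


def shuffle(local_counts, num_reducers):
    """Phase 3 stand-in: group pairs by reducer and compress each payload."""
    buckets = [[] for _ in range(num_reducers)]
    for key, count in sorted(local_counts.items()):
        buckets[partition(key, num_reducers)].append(f"{key}\t{count}")
    return [zlib.compress("\n".join(b).encode("utf-8")) for b in buckets]


def reduce_partition(payloads):
    """Decompress the payloads sent to one reducer and sum the counts."""
    totals = Counter()
    for payload in payloads:
        text = zlib.decompress(payload).decode("utf-8")
        for line in filter(None, text.split("\n")):
            key, count = line.split("\t")
            totals[int(key)] += int(count)
    return totals


def count_frequent_items(transactions, min_support, num_splits=2, num_reducers=2):
    """Run the simulated job and keep items that meet the minimum support."""
    codes = encode_items(transactions)
    splits = [transactions[i::num_splits] for i in range(num_splits)]
    shuffled = [shuffle(map_with_combiner(s, codes), num_reducers) for s in splits]
    totals = Counter()
    for r in range(num_reducers):
        totals.update(reduce_partition([m[r] for m in shuffled]))
    decode = {idx: item for item, idx in codes.items()}
    return {decode[k]: v for k, v in totals.items() if v >= min_support}
```

For example, with the four transactions `["bread", "milk"]`, `["bread", "butter", "milk"]`, `["bread", "eggs"]`, and `["milk", "butter"]` and `min_support=2`, the function returns counts of 3 for `bread`, 3 for `milk`, and 2 for `butter`, and drops `eggs`. In an actual Hadoop job the combiner and partitioner would be separate classes configured on the job, and the compression would be applied by the framework to map output rather than by user code.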

Copyright

© 2020 Senthilkumar & Hari Prasad. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by Indian Society for Education and Environment (iSee).