Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 18, Pages: 1-12
K. K. Sherly1* and R. Nedunchezhian2
1 Department of Information Technology, Toc H Institute of Science and Technology, Ernakulam - 682313, Kerala, India
2 Department of Computer Science and Engineering, Sri Ranganathar Institute of Engineering and Technology, Coimbatore - 641110, Tamilnadu, India
Objectives: To develop a memory-efficient, incremental, and interactive distributed frequent pattern mining (FPM) technique with low communication and synchronization overhead and good load balancing, for analyzing dynamic transactional data in a distributed database.
Methods/Analysis: The technique adopts a prefix-based equivalence class partitioning scheme to generate frequent itemsets with low memory consumption, without generating local frequent sets. It uses a range of support values to update the frequent patterns with reduced time complexity. The paper proposes distributed FPM techniques using both count-distribution and compressed-data-distribution parallel approaches. The performance of the algorithms is tested and compared with popular distributed FPM algorithms on standard datasets.
Findings: To handle massive dynamic data stored in distributed databases, this work develops three distributed frequent set generation algorithms that update frequent patterns by reusing previously stored pattern information, without complex calculations or data structures. The proposed approaches also let the user interactively adjust the minimum support value by retaining the nearly frequent itemsets with the help of two minimum support thresholds (low, high). Measures are taken to reduce the additional itemset storage and computation and to achieve good load balancing with low communication and synchronization overhead. Because the proposed algorithms apply prefix-based equivalence class partitioning at each n-itemset level and perform four levels of itemset filtering to remove infrequent items from each class before calculating the individual item counts, they require less inter-node communication. To eliminate the drawbacks of both the count distribution and data distribution (DD) approaches, one of the proposed algorithms adopts a hybrid approach that distributes the compressed data only once, so its communication overhead is lower than that of other DD algorithms.
Conclusion/Application: The proposed distributed techniques reduce memory utilization and the number of itemset comparisons compared with existing approaches. Their performance is tested and evaluated for market analysis and online credit card fraud detection applications.
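The two central ideas of the abstract, prefix-based equivalence class partitioning and the pair of minimum support thresholds (low, high), can be illustrated with a small single-machine sketch. This is not the authors' implementation: the function names are hypothetical, supports are absolute transaction counts (an assumption), and the distribution across nodes, the data compression, and the four levels of itemset filtering are not modelled.

```python
from collections import defaultdict
from itertools import combinations


def count_support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def partition_by_prefix(itemsets):
    """Group sorted k-itemsets into classes sharing their first k-1 items."""
    classes = defaultdict(list)
    for itemset in itemsets:
        classes[itemset[:-1]].append(itemset)
    return dict(classes)


def join_within_classes(classes):
    """Build (k+1)-candidates by joining pairs from the same class only."""
    candidates = []
    for prefix, members in classes.items():
        for a, b in combinations(sorted(members), 2):
            candidates.append(prefix + (a[-1], b[-1]))
    return candidates


def classify(transactions, itemsets, low, high):
    """Split itemsets into frequent (support >= high) and nearly frequent
    (low <= support < high); anything below `low` is dropped."""
    frequent, nearly = {}, {}
    for itemset in itemsets:
        support = count_support(transactions, itemset)
        if support >= high:
            frequent[itemset] = support
        elif support >= low:
            nearly[itemset] = support
    return frequent, nearly


# Hypothetical toy data, used only to exercise the functions above.
transactions = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
pairs = [("a", "b"), ("a", "c"), ("b", "c")]
classes = partition_by_prefix(pairs)        # classes keyed by prefix ("a",) and ("b",)
triples = join_within_classes(classes)      # candidates come from within one class
frequent, nearly = classify(transactions, triples, low=2, high=3)
```

Because candidates are formed only inside a class, each class can be processed independently, which is what makes the scheme attractive for distribution. Keeping the nearly frequent itemsets means a later request for any minimum support between low and high can be answered from stored counts rather than by rescanning the data.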
Keywords: Credit Card Fraud Detection System, Incremental Distributed Frequent Pattern Mining, Interactive Parallel Mining Techniques, Market Basket Analysis, Prefix Based Equivalence Class Partitioning Approach