An Improved Incremental and Interactive Frequent Pattern Mining Techniques for Market Basket Analysis and Fraud Detection in Distributed and Parallel Systems
Objectives: To develop a memory-efficient, incremental and interactive distributed Frequent Pattern Mining (FPM) technique with low communication and synchronization overhead and good load balancing, for analyzing dynamic transactional data in a distributed database. Methods/Analysis: The technique adopts a prefix-based equivalence class partitioning scheme to generate frequent itemsets without generating local frequent sets, which keeps memory consumption low. It uses a range of support values to update the frequent patterns with lower time complexity. The paper proposes distributed FPM techniques using both count-distributed and compressed-data-distributed parallel approaches. The performance of the algorithms is tested and compared with popular distributed FPM algorithms on standard datasets. Findings: To deal with massive dynamic data stored in distributed databases, this approach develops three distributed frequent set generation algorithms, which update frequent patterns by reusing previously stored pattern information, without complex calculations or data structures. The proposed approaches also let the user interactively adjust the minimum support value by retaining the nearly frequent itemsets with the help of two minimum support thresholds (low, high). Measures have been taken to reduce the additional itemset storage and computation, and to achieve good load balancing with low communication and synchronization overhead. Because the proposed algorithms apply prefix-based equivalence class partitioning at each n-itemset level and perform four levels of itemset filtering to remove infrequent items from each class before calculating the individual item counts, little inter-node communication is required. To eliminate the drawbacks of both the count distribution and data distribution (DD) approaches, one of the proposed algorithms adopts a hybrid approach that distributes the compressed data only once, so its communication overhead is lower than that of other DD algorithms. Conclusion/Application: The proposed distributed techniques reduce memory utilization and itemset comparisons compared with existing approaches. Their performance is tested and evaluated for market basket analysis and online credit card fraud detection applications.
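As a rough illustration of two ideas named in the abstract, the following single-machine Python sketch keeps the counts of "nearly frequent" itemsets (support at or above a low threshold) so that any minimum support between low and high can be answered without rescanning the data, and it groups k-itemsets into classes by their shared (k-1)-item prefix. This is not the paper's algorithm: the function names (`count_itemsets`, `keep_near_frequent`, `frequent_at`, `prefix_classes`), the toy transactions, and the threshold values are all assumptions made for illustration, and the distributed, incremental and four-level filtering parts are omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_itemsets(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, duplicates removed
        for itemset in combinations(items, size):
            counts[itemset] += 1
    return counts


def keep_near_frequent(counts, n_transactions, low):
    """Keep only itemsets whose support reaches the (hypothetical) low threshold."""
    return {s: c for s, c in counts.items() if c / n_transactions >= low}


def frequent_at(stored, n_transactions, min_support):
    """Answer a query for any min_support >= low from the stored counts alone."""
    return sorted(s for s, c in stored.items() if c / n_transactions >= min_support)


def prefix_classes(itemsets):
    """Group k-itemsets that share the same (k-1)-item prefix."""
    classes = defaultdict(list)
    for itemset in sorted(itemsets):
        classes[itemset[:-1]].append(itemset)
    return dict(classes)


# Toy market-basket data (made up for this sketch).
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "eggs"],
]
n = len(transactions)

# One scan stores every 2-itemset with support >= low = 0.4.
stored = keep_near_frequent(count_itemsets(transactions, 2), n, low=0.4)

# The user can now move the minimum support anywhere in [low, high]
# and get an answer from the stored counts, with no rescan.
print(frequent_at(stored, n, 0.6))
print(frequent_at(stored, n, 0.4))
print(prefix_classes(stored))
```

In a distributed setting, each prefix class could in principle be assigned to a different node, since itemsets in one class can be extended without consulting another class; how the paper actually assigns classes and balances load is not shown here.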
Credit Card Fraud Detection System, Incremental Distributed Frequent Pattern Mining, Interactive Parallel Mining Techniques, Market Basket Analysis, Prefix Based Equivalence Class Partitioning Approach
This work is licensed under a Creative Commons Attribution 3.0 License.