Indian Journal of Science and Technology
Year: 2019, Volume: 12, Issue: 3, Pages: 1-4
K. Sampath Kini¹* and K. Karthik Pai²
¹Department of Computer Science and Engineering, NMAMIT, Nitte, Karkala - 574110, Karnataka, India; [email protected]
²Department of Information Science and Engineering, NMAMIT, Nitte, Karkala - 574110, Karnataka, India; [email protected]
*Author for correspondence: K. Sampath Kini, Department of Computer Science and Engineering, NMAMIT, Nitte, Karkala - 574110, Karnataka, India; Email: [email protected]
Objectives: Mining of frequent itemsets in transactional databases has been in wide use for years. The traditional technique mines frequent itemsets over the entire set of records in the transaction database at once, which causes performance problems such as out-of-memory errors and long turnaround times. The aim of this study is to propose a technique that overcomes these memory issues, reduces overall turnaround time and enables frequent itemsets to be determined for a specific season or time period.
Methods/Analysis: Frequent itemset mining (FIM) supports decision making in a large number of real-life applications. With the growth in the amount of data, many FIM approaches have been proposed to meet scalability requirements. Although some existing approaches meet this requirement to some extent, they consume a large amount of CPU time and memory. In this study, we present another approach, named FIMUPT (frequent itemset mining using partitioning technique). Instead of processing all records at once, the proposed technique divides the large transaction database into partitions of suitable size and processes them one after another, in multiple small increments. The scheme also allows a separate support threshold to be specified for each partition, and each partition can be mapped to a season or time period as needed.
Findings: We observed that the proposed technique performs well in terms of computation time and memory usage. The partition size is chosen based on the total number of transactions, the size of main memory and the time period covered by the entire transaction database. The technique consumes less memory because each partition is smaller than the entire transaction database.
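The partition-by-partition processing with per-partition support thresholds can be illustrated with a minimal sketch, assuming transactions are sets of item labels already grouped into named partitions (for example, one per season). The abstract does not say which mining algorithm runs inside each partition, so a plain level-wise Apriori search is used here purely as a stand-in; `apriori`, `mine_partitions`, the threshold mapping and the sample data are hypothetical and are not the authors' FIMUPT implementation.

```python
# Hypothetical sketch: Apriori is only a stand-in miner; this is not the FIMUPT code.
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, int]:
    """Level-wise Apriori over a single partition.

    min_support is a fraction of this partition's transactions (assumption);
    returns every frequent itemset with its support count.
    """
    min_count = min_support * len(transactions)
    # Level 1: count individual items.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        candidates = set()
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune: keep only candidates whose (k-1)-subsets are all frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count the surviving candidates with one scan of the partition.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


def mine_partitions(partitions: Dict[str, List[Set[str]]],
                    thresholds: Dict[str, float]) -> Dict[str, Dict[Itemset, int]]:
    """Process partitions one after another, each with its own support threshold."""
    return {name: apriori(txns, thresholds[name]) for name, txns in partitions.items()}


# Usage: two hypothetical seasonal partitions with different thresholds.
partitions = {
    "summer": [{"ice cream", "soda"}, {"ice cream", "soda", "chips"}, {"soda"}],
    "winter": [{"tea", "biscuits"}, {"tea"}, {"tea", "biscuits", "soda"}],
}
thresholds = {"summer": 0.6, "winter": 0.5}
result = mine_partitions(partitions, thresholds)
```

Because each `apriori` call sees only one partition, peak working memory is governed by the largest partition rather than by the whole database. The sketch reports per-partition results only; combining them into database-wide frequent itemsets (as the classic two-phase Partition algorithm does) would require an additional counting pass.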
Because only a single partition is held in memory at a time, the large memory requirement of the traditional technique is eliminated [reference Wikipedia]. We used sample datasets of 50 MB and 100 MB from SPMF, an open-source data mining library. Algorithms such as Apriori failed to run on the 100 MB dataset, throwing an out-of-memory error.
Applications: With the partitioned approach, however, the runs completed successfully. Our test environment divided the entire transaction database into 15 partitions. We conclude that, with a smaller amount of memory, a larger number of transaction records can be processed for data mining tasks. Compared with existing approaches, experimental results show that FIMUPT achieves an average performance gain of 19%. The technique eliminates the out-of-memory issues seen in the Apriori algorithm and its variants.
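The point that only one partition needs to be resident in memory can be sketched by reading the transaction file lazily in fixed-size chunks. This assumes a text file with one transaction per line and space-separated integer item IDs (assumed here to match the plain-text layout of SPMF's itemset datasets); `read_partitions`, `frequent_items_per_partition`, the partition size and the support fraction are illustrative assumptions, and only single items are counted to keep the focus on chunked processing.

```python
import io
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def read_partitions(lines: Iterable[str], partition_size: int) -> Iterator[List[Set[int]]]:
    """Yield successive partitions of at most partition_size transactions.

    Assumed input format: one transaction per line, space-separated integer item IDs.
    Only the current partition is materialised in memory.
    """
    it = iter(lines)  # one shared iterator so each islice resumes where the last stopped
    while True:
        chunk = list(islice(it, partition_size))
        if not chunk:
            return
        yield [set(map(int, line.split())) for line in chunk if line.strip()]


def frequent_items_per_partition(lines: Iterable[str], partition_size: int,
                                 min_support: float) -> Iterator[Tuple[int, Set[int]]]:
    """For each partition, report the single items meeting min_support (a fraction)."""
    for index, partition in enumerate(read_partitions(lines, partition_size)):
        counts = Counter(item for transaction in partition for item in transaction)
        min_count = min_support * len(partition)
        yield index, {item for item, c in counts.items() if c >= min_count}


# Usage: five transactions split into partitions of two (hypothetical data).
data = io.StringIO("1 2 3\n1 2\n2 3\n1 2 4\n4 5\n")
results: Dict[int, Set[int]] = dict(frequent_items_per_partition(data, 2, 0.6))
print(results)  # {0: {1, 2}, 1: {2}, 2: {4, 5}}
```

A real dataset can be processed the same way by passing an open file object instead of `io.StringIO`, since file objects also iterate line by line; the partition size would then be tuned to the available main memory.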
Keywords: Data Mining, Frequent Itemset, Partitioning, Performance, Transaction Database