Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 33, Pages: 1-7
Karthick N* and X. Agnes Kalarani
Objectives: The main objective of this method is to extract meaningful information from a large amount of data and to provide the output to users in aggregated form. This work mines large volumes of data gathered from multiple sites in order to give those sites useful information for improving their performance. Method: Data aggregation plays an important role in the big data environment, where extracting useful information from a large volume of data is very complex. In the existing work, computation based partitioning and aggregation (CP-A) is used to divide the big data into multiple partitions, within which aggregation is performed. However, existing works do not consider the content similarity between data items, which might degrade the accuracy of the aggregation result. To overcome this problem, the proposed research methodology introduces the hybridized content and computation aware partitioning and aggregation (HCCP-A) method. Initially, the big data is divided into multiple partitions with regard to both content and computation properties. After partitioning, a data de-duplication technique is applied to eliminate the repeated data present in every partition. The partitioning and data de-duplication are performed in the mapper stage. The output from the mapper node is passed to the aggregator node, which performs the aggregation. Finally, the aggregation results from the aggregator nodes are fused together in the reducer node. Results: The hybridized content and computation aware partitioning and aggregation method is introduced in this work for extracting useful information, in summarized form, from a large volume of data.
The methodology was used to aggregate a large volume of data, and the aggregation results were compared with those of the existing CP-A methodology in terms of three performance metrics: error rate, execution time and CPU utilization. Experimental tests were conducted and performance was measured for different data sizes. The experiments indicate that the proposed methodology provides better results than the existing methodology on all performance measures. Conclusion: The findings demonstrate that data aggregation using the hybridized content and computation aware partitioning and aggregation method achieves higher accuracy and a lower error rate than the previous methodologies.
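The stages named in the Method (partitioning, de-duplication in the mapper, aggregation, and fusion in the reducer) can be illustrated with a minimal single-process sketch. It is not the authors' HCCP-A implementation: the record layout `(key, value)`, the function names, the use of a CRC32 hash of the key as a stand-in for content aware partitioning, and the count/sum aggregate are all assumptions made for illustration, and the computation aware part of the partitioning is not modeled.

```python
import zlib
from collections import defaultdict


def partition(records, num_partitions):
    """Mapper step 1 (assumed): route records with the same key to the same partition."""
    parts = defaultdict(list)
    for record in records:
        key = record[0]
        # CRC32 is deterministic across runs, unlike Python's built-in str hash.
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        parts[index].append(record)
    return parts


def deduplicate(partition_records):
    """Mapper step 2 (assumed): drop exact repeats, keeping first occurrences."""
    seen = set()
    unique = []
    for record in partition_records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def aggregate(partition_records):
    """Aggregator (assumed): per-key (count, sum) within one partition."""
    totals = defaultdict(lambda: [0, 0.0])
    for key, value in partition_records:
        totals[key][0] += 1
        totals[key][1] += value
    return {key: (count, total) for key, (count, total) in totals.items()}


def fuse(partials):
    """Reducer (assumed): merge the per-partition aggregates into one result."""
    fused = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for key, (count, total) in partial.items():
            fused[key][0] += count
            fused[key][1] += total
    return {key: (count, total) for key, (count, total) in fused.items()}


def run_pipeline(records, num_partitions=2):
    parts = partition(records, num_partitions)
    partials = [aggregate(deduplicate(p)) for p in parts.values()]
    return fuse(partials)


if __name__ == "__main__":
    # Hypothetical sample data: the second record is an exact duplicate.
    sample = [("clicks", 3.0), ("clicks", 3.0), ("clicks", 5.0), ("views", 10.0)]
    print(run_pipeline(sample))
```

Because records are routed by key in this sketch, exact duplicates always land in the same partition, so de-duplicating within each partition is enough to remove them before aggregation.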
Keywords: Aggregation, Big Data, Data Duplication, Data Fusion, Partitioning