Comparison of Machine Learning Algorithm on Map Reduction for Performance Improvement in Big Data

Ananthi Sheshasayee and J. V. N. Lakshmi

doi:10.17485/ijst/2015/v8i29/84650

Indian Journal of Science and Technology

DOI: 10.17485/ijst/2015/v8i29/84650

Year: 2015, Volume: 8, Issue: 29, Pages: 1-6

Original Article

Ananthi Sheshasayee¹* and J. V. N. Lakshmi²

¹PG and Department of Computer Science and Research, Quaid-E-Millath Government College for Women, Chennai - 600002, Tamil Nadu, India; [email protected]
²Department of Research in Computer Science, SCSVMV University, Kanchipuram - 631561, Tamil Nadu, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background: As massive data acquisition and storage become increasingly affordable, a wide variety of enterprises are engaging in sophisticated data analysis. The amount of digital information produced, most of it unstructured, is growing day by day. Method: Machine learning is at the core of data analysis, and the MapReduce programming model is readily applicable to many different learning algorithms. Traditional machine learning algorithms that fit the statistical query model can be sped up on multicore computers. Hadoop avoids data sharing, whereas machine learning algorithms need the data to be stored in a single place. The method compares machine learning algorithms on the MapReduce paradigm in terms of speed. Findings: The MapReduce programming model enables easy development of scalable parallel applications that process large volumes of data. MapReduce jobs run on the Hadoop Distributed File System, which significantly influences performance when handling huge data sets stored across the nodes of a multi-node cluster. This paper analyses the development of machine learning algorithms on Hadoop for processing large volumes of data. Logistic regression algorithms are analysed on MapReduce, and a cost model is developed to evaluate performance and speed up processing. The attributes of the system are evaluated with the aim of improving time efficiency. The objective is to provide ad hoc performance for MapReduce programs that run on large data sets. Improvements: A method for optimizing job assignment for machine learning workloads is implemented in order to minimize total execution time. This helps MapReduce users optimize performance efficiency and improves their productivity.
Keywords: Big Data, HADOOP, HDFS, Machine Learning, Map Reduction
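
To illustrate how a learning algorithm can be expressed in MapReduce form, the following is a minimal sketch of batch gradient ascent for logistic regression. It is not the authors' implementation: plain Python lists stand in for HDFS partitions, no Hadoop API is used, and the function names (`map_gradient`, `reduce_gradients`, `train`) and the learning rate and epoch defaults are assumptions made for illustration only. The map step computes a partial gradient on each partition independently, so no data is shared between partitions, and the reduce step sums the partial gradients.

```python
import math
from functools import reduce


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def map_gradient(partition, weights):
    """Map step: partial log-likelihood gradient over one data partition.

    Each record is (features, label) with label in {0, 1}.
    """
    grad = [0.0] * len(weights)
    for features, label in partition:
        pred = sigmoid(sum(w * x for w, x in zip(weights, features)))
        err = label - pred
        for j, x in enumerate(features):
            grad[j] += err * x
    return grad


def reduce_gradients(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [x + y for x, y in zip(a, b)]


def train(partitions, n_features, lr=0.1, epochs=200):
    """Driver: one map/reduce round per gradient-ascent iteration."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        # On a cluster each call below would run on a separate node.
        partials = [map_gradient(p, weights) for p in partitions]
        total = reduce(reduce_gradients, partials)
        weights = [w + lr * g for w, g in zip(weights, total)]
    return weights
```

Because the gradient is a sum over records, splitting the data across partitions and adding the partial results gives the same gradient as a single pass over all the data, up to floating-point rounding. This is the property that lets such algorithms run without moving the data to a single place.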