Indian Journal of Science and Technology
Year: 2015, Volume: 8, Issue: 29, Pages: 1-6
Ananthi Sheshasayee1* and J. V. N. Lakshmi2
1 PG and Research Department of Computer Science, Quaid-E-Millath Government College for Women, Chennai - 600002, Tamil Nadu, India; [email protected]
2 Department of Research in Computer Science, SCSVMV University, Kanchipuram - 631561, Tamil Nadu, India; [email protected]
Background: As massive data acquisition and storage become increasingly affordable, a wide variety of enterprises are engaged in sophisticated data analysis. The amount of digital information produced, which is largely unstructured, grows day by day. Method: The MapReduce programming method is easily applicable to many different learning algorithms, and machine learning is at the core of data analysis. Traditional machine learning algorithms can be sped up on multicore computers by casting them in the statistical query model. Hadoop avoids data sharing, whereas machine learning algorithms need the data to be stored in a single place. The method compares machine learning algorithms on the MapReduce paradigm to evaluate speed. Findings: The MapReduce programming model enables easy development of scalable parallel applications that process large clusters of data. The Hadoop Distributed File System runs the MapReduce jobs, which influences performance significantly when handling huge data sets stored on different nodes of a multi-node cluster. This paper analyses the development of machine learning algorithms on Hadoop to process large clusters of data. Logistic regression algorithms on MapReduce are analysed to evaluate performance, and a cost model is developed to speed up processing. The attributes of the system are evaluated to improve time efficiency. The objective is to provide ad hoc performance for MapReduce programs that run on large data sets. Improvements: A method for optimizing job assignment in machine learning is implemented in order to minimize the total execution time. This feature improves the productivity of MapReduce users by optimizing performance efficiency.
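To illustrate the pattern the abstract describes, the following is a minimal sketch (not the authors' implementation, and without Hadoop itself) of how a logistic regression gradient step fits the statistical query / MapReduce form: each mapper computes a partial gradient over its own data split, and the reducer sums the partial gradients before the weight update. All names here (map_partial_gradient, reduce_sum, train) are illustrative.

```python
# Sketch of logistic regression in MapReduce style: the gradient of the
# log-loss is a sum over examples, so it can be computed as per-split
# partial sums (map) followed by an element-wise sum (reduce).
import math
from functools import reduce

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def map_partial_gradient(split, w):
    """Mapper: partial log-loss gradient over one data split."""
    grad = [0.0] * len(w)
    for x, y in split:
        pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (pred - y) * xj
    return grad

def reduce_sum(g1, g2):
    """Reducer: element-wise sum of partial gradients."""
    return [a + b for a, b in zip(g1, g2)]

def train(splits, dim, lr=0.5, epochs=200):
    """Batch gradient descent, with each epoch's gradient assembled
    map/reduce-fashion from the per-split partials."""
    w = [0.0] * dim
    for _ in range(epochs):
        partials = [map_partial_gradient(s, w) for s in splits]  # map phase
        grad = reduce(reduce_sum, partials)                      # reduce phase
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Toy data on two hypothetical nodes; last feature is a bias term, and
# the label is 1 when the second feature exceeds the first.
splits = [
    [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1)],
    [([2.0, 0.5, 1.0], 0), ([0.5, 2.0, 1.0], 1)],
]
w = train(splits, dim=3)
```

On a real cluster the splits would live on different HDFS nodes and the reduce step would run in the shuffle/reduce phase; the sketch only shows why the summable gradient makes the algorithm MapReduce-friendly.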
Keywords: Big Data, Hadoop, HDFS, Machine Learning, MapReduce