P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2015, Volume: 8, Issue: 35, Pages: 1-8

Original Article

Exploring Non-Homogeneity and Dynamicity of High Scale Cloud through Hive and Pig

Abstract

Cloud environments are typically non-homogeneous and dynamic in resource usage and access at all levels, so studying this heterogeneous, non-uniform behavior is an important problem. The Google cluster trace, a production trace released by Google in November 2014, serves as an example of a high-scale Cloud environment. This paper presents a statistical analysis of that trace. Because the production trace is very large, we used Hive, a platform built on the Hadoop Distributed File System (HDFS) for querying and analyzing big data. Hive was accessed through its Beeswax interface, and the data was imported into HDFS through HCatalog. In addition to Hive, we used Pig, a scripting language that provides an abstraction on top of Hadoop. The method involves clustering jobs and studying the distributions of job arrival time, resource usage, and process runtime. To the best of our knowledge, this analytical method is novel. The findings reveal that jobs in a production trace can be classified into major, mediocre, and minor resource usage types. Job arrival times followed a Weibull distribution. Usage of resources such as CPU and memory followed a Zipf-like distribution. Process runtimes followed a heavy-tailed distribution: some jobs had very small runtimes while others had very large ones. Our analysis will help researchers understand the non-homogeneous and dynamic behavior characteristic of Cloud environments and develop new algorithms for resource allocation and scheduling in the Cloud.
Keywords: Dynamicity, Hadoop, High Scale Cloud, Hive, Pig, Non-Homogenous
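
To make the phrase "Zipf-like distribution" concrete, the following minimal Python sketch shows one common way to check for it: rank the per-job usage values from largest to smallest and fit a straight line to log(usage) against log(rank). A slope near -1 suggests Zipf-like behavior. This is an illustration only and is not the Hive or Pig workflow used in the paper; the function names zipf_slope and top_share are hypothetical, and the input is synthetic data rather than values from the Google cluster trace.

```python
import math


def zipf_slope(values):
    """Least-squares slope of log(value) against log(rank), largest value first.

    Hypothetical helper for illustration; a slope close to -1 indicates
    Zipf-like behavior. Non-positive values are skipped because their
    logarithm is undefined.
    """
    ranked = sorted((v for v in values if v > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def top_share(values, fraction=0.1):
    """Fraction of total usage consumed by the top `fraction` of jobs."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Synthetic stand-in for per-job CPU usage (NOT trace data):
# the job at rank r uses 1/r units, i.e. an exact Zipf law with exponent 1.
usage = [1.0 / rank for rank in range(1, 1001)]

slope = zipf_slope(usage)
share = top_share(usage, 0.1)
print(f"log-log slope: {slope:.3f}")
print(f"share of usage held by top 10% of jobs: {share:.3f}")
```

On real trace data the fitted slope would only approximate a Zipf exponent, and a log-log fit of this kind is a rough diagnostic rather than a formal goodness-of-fit test.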
