Energy Optimization in Cloud by Applying Horizontal Clustering

Task consolidation in Cloud Computing is an emerging and effective approach to reducing energy consumption. The main goal of application providers in a cloud environment is to use resources efficiently and gain maximum profit, which makes job scheduling a central and challenging issue. In this paper, a horizontal clustering technique is therefore applied to cluster the jobs belonging to the same level of a workflow. Tasks are clustered on the basis of their priority: tasks with similar properties are grouped on one machine so that similar tasks are executed collectively.


Introduction
Cloud Computing provides applications that let us access our data online from anywhere instead of from our own personal computer, and lets us restore our data over the Internet if the PC has any problem. Cloud Computing follows a pay-as-you-use model: it rents out services to run small businesses, such as storage space, infrastructure, platforms to develop software (e.g. Java and C++), and software for communication (e.g. Gmail and Facebook). Cloud Computing is turning IT services into utilities; Figure 1 shows the analogy with the way we pay for water and electricity. In Cloud Computing we need not care where our servers are, where our documents are stored, or where our applications are hosted; these responsibilities belong to the data center owner, who manages the data and the demands of customers. The term cloud is used as a metaphor for the Internet.
The National Institute of Standards and Technology (NIST) defines Cloud Computing as a model that provides on-demand network access to a shared pool of resources (e.g. servers, networks, applications, services and storage). Its essential characteristics include:
• Ubiquitous network access.
• Location-independent resource pooling.
• Pay per use, like electricity.
Cloud services are classified as SaaS (Software as a Service), PaaS (Platform as a Service) and IaaS (Infrastructure as a Service), as shown in Figure 2. Software as a Service (SaaS): SaaS is a full operating platform with several applications, management and the customer interface. In the SaaS model a thin client interface delivers the application to the customer, and the user's work begins and ends with entering information, managing it and interacting with it. Examples of SaaS are Drupal and TurboTax. When a number of users request services at the same time, optimal task scheduling is required to consume minimum cost while supplying a high SLA 3.

Datacenter Architecture
Cloud Computing data centers are much more flexible than traditional data centers. In traditional data centers all servers are physically present, but nowadays, with the developing trend of technology, virtualization is used to give the best output to customers. Data centers such as those of Google and Facebook run many VMs on one host so that a client's query can be answered in a few seconds. A data center houses hundreds to thousands of storage units and servers, connected to each other through various routers and switches arranged in a particular topology. The architecture of a data center is shown in Figure 3, in which enough bandwidth is provisioned to overcome the problem of congestion 4.
In the data center architecture there are three main components: the broker, the green service allocator and the resources. The broker is the intermediary between the customer and the resources: it takes cloudlets from the client and sends them to the servers for processing.
Green service allocator layer: This layer provides effective services to the customer by applying particular policies so that the servers allocate the minimum number of machines to the cloudlets on the broker's request. The green service allocator layer has eight functions.
• Green negotiator: negotiates with the broker and the client to set the SLA (Service Level Agreement) between the service provider and the consumer, according to the QoS requirements of the client and a specific pricing.
• Service analyzer: checks the requirements of the customer before accepting or rejecting the request, based on input from the VM manager, so as to maintain minimum load and energy consumption.
• Consumer profiler: analyzes the background of a consumer to assign priority according to that consumer's importance over other consumers.
• Pricing: calculates the charge for a request, managing the demand and supply of computing resources according to the priority of the cloudlet.
• Energy monitor: searches for and evaluates physical machines to power on or off.
• Service scheduler: allocates VMs to cloudlets and determines when a VM should be added or removed according to the workload.
• VM manager: keeps a record of available VMs and migrates VMs to other physical machines if needed.
• Accounting: maintains the usage of machines to compute the usage cost.
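The interplay between the energy monitor and the VM manager described above can be sketched in a few lines. The following is a minimal illustration, not WorkflowSim code; the host representation, the 10% utilization threshold and the least-loaded migration target are all illustrative assumptions.

```python
# Sketch of two green-service-allocator functions working together:
# the energy monitor finds under-utilized hosts, and the VM manager
# migrates their VMs away so those hosts can be powered off.
# Thresholds, field names and the migration policy are illustrative.

IDLE_THRESHOLD = 0.1  # hosts below 10% CPU utilization may be powered off

def hosts_to_power_off(hosts):
    """Energy monitor: return the under-utilized hosts."""
    return [h for h in hosts if h["utilization"] < IDLE_THRESHOLD]

def migrate_and_power_off(hosts):
    """VM manager: move VMs off idle hosts, then mark those hosts off."""
    idle = hosts_to_power_off(hosts)
    active = [h for h in hosts if h not in idle]
    for h in idle:
        for vm in h["vms"]:
            # place each displaced VM on the least-loaded active host
            target = min(active, key=lambda a: a["utilization"])
            target["vms"].append(vm)
        h["vms"], h["powered_on"] = [], False
    return hosts

hosts = [
    {"name": "h1", "utilization": 0.05, "vms": ["vm1"], "powered_on": True},
    {"name": "h2", "utilization": 0.60, "vms": ["vm2", "vm3"], "powered_on": True},
]
migrate_and_power_off(hosts)
print(hosts[0]["powered_on"], hosts[1]["vms"])  # False ['vm2', 'vm3', 'vm1']
```

A real allocator would also update the target host's utilization after each migration and check capacity constraints; both are omitted here for brevity.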
The next component is the resources: the physical machines, together with the virtualization running on them. Each physical machine hosts more than one VM in order to meet the accepted customer demand. VMs are dynamically started and stopped on a physical machine according to the workload, and with the development of technology they can be migrated to different physical machines to give low SLA violation and consume less energy. Researchers can consider a number of parameters to improve the service level of the data center. Virtual machines are separated on the basis of their use:
• A system virtual machine: a complete operating system used when the real hardware is not available; this is also known as hardware virtualization.
• A process virtual machine: used to run a single program. These machines are built to provide portability and flexibility to the program, and are also called application virtual machines.

Energy Consumption
Since Cloud Computing came into existence, it has not been perfect in terms of energy consumption. Consuming large amounts of energy has a negative impact on our environment by releasing huge quantities of CO2, which contributes to the greenhouse effect. Energy consumption involves many factors, such as load balancing, power distribution, cooling and the servers themselves. In 2007 a report on "Server and Data Center Energy Consumption" was submitted to the US Congress; it found that US data centers consumed 61 billion kilowatt-hours in 2006, totaling $4.5 billion 5. Higher power consumption requires cooling systems that cost in the range of $2 to $5 million per year. The two main ways to make a data center consume less energy are: 1. shutting servers down, and 2. scaling down their performance 5. When Cloud Computing came into existence, its main focus was to build huge data centers for high-performance computing and to profit by charging for what is used, but with the passage of time it became a model of dynamically provisioned computing facilities 6. Nowadays, VM migration and consolidation algorithms are mostly based on an energy-consumption model with a single resource constraint, the CPU, and may not account for the impact of other resources. Some researchers believe that the energy consumption of a whole server varies mostly linearly with CPU utilization; however, a server still consumes more than 70% of its peak energy even when it is completely idle. Day-to-day usage of computing services leads to energy consumption, which has a huge impact on the environment through the emission of CO2, increasing the greenhouse effect 7. When the cloud performs an operation, it uses a data center to store data, processes the stored data with the help of servers, and transfers the data over the Internet. It is estimated that approximately 10% of the world's total energy is consumed by the Internet.
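The linear power model mentioned above, in which an idle server still draws a large fraction of its peak power, is commonly written as P(u) = P_idle + (P_peak − P_idle)·u for CPU utilization u. A small sketch follows; the wattages are illustrative values chosen so that the idle draw is 70% of peak, not measured figures.

```python
# Linear server power model: power grows linearly with CPU utilization,
# but an idle server still draws a large share of its peak power.
# The wattages below are illustrative (idle = 70% of peak), not measured.
P_IDLE = 140.0   # watts drawn at 0% CPU utilization
P_PEAK = 200.0   # watts drawn at 100% CPU utilization

def server_power(utilization):
    """Power draw in watts for a CPU utilization in [0, 1]."""
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

print(server_power(0.0))  # 140.0 -> an idle server already at 70% of peak
print(server_power(1.0))  # 200.0
```

This is why consolidation helps: running one server at 100% (200 W here) costs far less than two servers at 50% each (170 W each, 340 W total).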
The cost of energy to power a data center doubles roughly every five years. The amount of power consumed by data centers grew by 56% between 2005 and 2010, and in 2010 approximately 1.1% to 1.5% of total world energy was consumed by data centers. In 2011 the energy consumed by data centers was approximately 10,000,000 MW, which generated 40,568,000 tons of CO2 emissions. Only 20-30% of a data center's energy is used to operate it, whereas the remaining 70-80% is wasted on over-provisioned idle resources, resulting in approximately 20,000,000 tons of CO2 emissions. Storing a document in Cloud Computing consumes less power, whereas conventional computing consumes much more by comparison 8. Electricity is needed to operate the servers, to interconnect telecommunication networks and to cool the system. A data center is not very costly to build, but it is not eco-friendly 9. Reducing energy consumption is a challenging issue, and governments are also pressing to reduce power consumption in order to cut CO2 emissions and the greenhouse effect. For this reason Google and Yahoo are building data centers in the barren desert along the Columbia River in the US in order to obtain cheap hydro power. Of its total expenditure, Amazon EC2 pays 42% for energy usage, and Microsoft's Dublin data center consumes approximately 5.4 MW of energy. A lot of research has been done on the different aspects of managing and consuming energy in data centers.

Scheduling
In Cloud Computing, scheduling directly affects important parameters of the cloud environment such as energy efficiency. The major consumers of energy in a data center are the servers, the cooling system and the interconnecting telecommunication equipment, so one way to reduce a data center's energy consumption is to decrease the number of servers that are in an active state to receive and process tasks 10. This is possible through effective scheduling that analyzes both the load on the network links and the occupancy of the outgoing queues at the network switches. Figure 4 shows resource scheduling: VMs are allocated to tasks for processing according to the CPU time or the capacity of the resource to execute the workload in a given time period, and the broker assigns tasks so as to avoid delay.
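The idea of decreasing the number of active servers can be sketched as a first-fit placement: each task goes to the first already-active server with spare capacity, and a new server is powered on only when no active server fits. The capacities and CPU demands below are illustrative.

```python
# Sketch of first-fit task placement that keeps the number of active
# servers low: a new server is activated only when no existing one has
# spare capacity. Demands and capacity are illustrative fractions of CPU.

def first_fit(tasks, capacity):
    """tasks: list of CPU demands in [0, 1]; returns per-server load lists."""
    servers = []
    for demand in tasks:
        for load in servers:
            if sum(load) + demand <= capacity:
                load.append(demand)  # fits on an already-active server
                break
        else:
            servers.append([demand])  # no fit: power on a new server
    return servers

placement = first_fit([0.5, 0.4, 0.3, 0.6, 0.2], capacity=1.0)
print(len(placement), placement)  # 3 active servers instead of 5
```

First-fit is a simple heuristic; a production scheduler would also weigh network-link load and switch-queue occupancy, as the text notes.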

Task Scheduling
In task scheduling, some tasks are consolidated to form one job for further assignment to the processors. Task scheduling is of several types:
• Cloud service scheduling: categorized at two levels. 1. User-level scheduling deals with issues raised in service management between the service provider and customers. 2. System-level scheduling manages resources within the data center.
• User-level scheduling: deals with dynamically fluctuating resource demands. In the cloud environment resources are of distinct types, and the main focus of this scheduling is to give the customer a high SLA at minimum cost.
• Static and dynamic scheduling: in static scheduling, information about the tasks arriving for execution is well known beforehand, whereas in dynamic scheduling knowledge about the tasks is not available in advance. Dynamic scheduling is one of the big challenges to overcome.
• Heuristic scheduling: a major approach to improving energy efficiency in cloud data centers. It needs algorithms that can schedule tasks over VMs on the basis of artificial intelligence; that is, they assign tasks to resources the way a human being would, by checking all the factors and taking care of all the parameters.

Task Consolidations
The WorkflowSim simulator is used to implement the technique, in which clustering is applied to the cloudlets to collect tasks with the same properties on one VM and so decrease data transmission between different servers. Jobs with a particular configuration arrive through the broker from the client to be processed; the methodology is then applied to the cloudlets to assign a virtual machine for processing in the minimum time slot. Scheduling techniques are implemented to utilize the resources effectively and to remove the delay between task executions completely, but sometimes a wrong strategy has the opposite effect: it increases task congestion, which raises energy consumption and harms other parameters as well. A job arrives with an input data file size and an output data file size, on the basis of which we measure the impact factor of the job. The impact factor is the job's runtime, measured automatically by the CPU, which is known as the burst time of the cloudlet. The existing model is based on imbalance metrics to optimize task scheduling for scientific workflow executions in cloud environments; it uses task clustering mechanisms to cluster small tasks into a task group that appears as a single task, so that they can be processed smoothly and quickly.
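The consolidation step described above can be sketched as greedy packing of small tasks into groups that are submitted as single tasks. The runtime ("impact factor") values and the grouping threshold below are illustrative, not taken from the experiments.

```python
# Sketch of task consolidation: small tasks are merged into groups so
# that each group is submitted as one task. Runtimes ("impact factors")
# and the per-group runtime budget are illustrative.

def consolidate(tasks, max_group_runtime):
    """tasks: list of (name, runtime) pairs. Greedily pack consecutive
    tasks into groups whose total runtime stays within the budget."""
    groups, current, current_runtime = [], [], 0.0
    for name, runtime in tasks:
        if current and current_runtime + runtime > max_group_runtime:
            groups.append(current)          # close the full group
            current, current_runtime = [], 0.0
        current.append(name)
        current_runtime += runtime
    if current:
        groups.append(current)
    return groups

tasks = [("t1", 2.0), ("t2", 1.5), ("t3", 3.0), ("t4", 0.5), ("t5", 4.0)]
print(consolidate(tasks, max_group_runtime=5.0))  # [['t1', 't2'], ['t3', 't4'], ['t5']]
```

Each resulting group behaves like one cloudlet, so the scheduling and data-transfer overhead is paid once per group rather than once per small task.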

Methodology
The user workflow model is simple: tasks from the same level are clustered to form one task list for execution. By considering the task impact factor when collecting similar tasks on the same virtual machine, our workflow model becomes more beneficial for large engineering and scientific applications.
Since a large amount of data transmission can greatly increase both the energy consumption of a server in the ON state and the energy consumed by transmission over the network, sensible data placement based on the data impact factor is very important. A single task receives its own volume of input data from different parent data nodes. To overcome this problem, horizontal clustering is applied to combine only those tasks which contain the same information and impact factor, as in Figure 5.

Horizontal Clustering
The horizontal clustering technique is applied to jobs on the basis of data dependency: jobs belonging to the same level of the workflow are clustered together so that data transmission over the network is reduced. As in Figure 6, the first parent node has two children, and the jobs of those child nodes are combined in the same cluster, because both children are at the same level and receive their data from the same parent node.
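Horizontal clustering amounts to grouping the tasks of a workflow DAG by their level, i.e. their depth from the entry task, so that same-level children of a parent run in one cluster. A minimal sketch follows; the example DAG and task names are illustrative, not the workflow used in the experiments.

```python
# Sketch of horizontal clustering: tasks in a workflow DAG are grouped
# by their level (depth from the entry task). The example DAG is
# illustrative; node names are arbitrary.
from collections import defaultdict

def task_levels(dag, root):
    """dag: {task: [children]}. Return {task: level} via breadth-first search."""
    levels, frontier = {root: 0}, [root]
    while frontier:
        nxt = []
        for task in frontier:
            for child in dag.get(task, []):
                if child not in levels:          # first visit fixes the level
                    levels[child] = levels[task] + 1
                    nxt.append(child)
        frontier = nxt
    return levels

def horizontal_clusters(dag, root):
    """Group tasks so that every cluster holds the tasks of one level."""
    clusters = defaultdict(list)
    for task, level in task_levels(dag, root).items():
        clusters[level].append(task)
    return dict(clusters)

dag = {"parent": ["child1", "child2"], "child1": ["leaf"], "child2": ["leaf"]}
print(horizontal_clusters(dag, "parent"))
# {0: ['parent'], 1: ['child1', 'child2'], 2: ['leaf']}
```

The level-1 cluster {child1, child2} matches the Figure 6 example: both children depend on the same parent's output, so executing them together avoids re-transmitting that data.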

Simulation Experiments
It is very difficult to implement repeatable experiments in a real cloud environment, so we perform simulations in WorkflowSim, an extension of CloudSim (a simulation toolkit), to ensure the repeatability of the experiments. WorkflowSim is an open-source simulator developed by Weiwei Chen, a PhD student at the University of Southern California. It extends CloudSim by providing workflow-level support for simulation, and it provides an environment for evaluating algorithms and policies before the real development of cloud products. We have simulated a data center consisting of hosts, each of which contains virtual machines of different configurations, and calculated the results by repeatedly changing the properties of the various machines as shown in the table below:

Components
• Workflow Mapper: creates the list of tasks.
• Workflow Engine: manages tasks according to their dependencies.
• Workflow Scheduler: matches jobs to worker nodes.

• Failure Generator: introduces job failures.
• Failure Monitor: collects failed jobs and returns them to the records.

Evaluation of Energy Consumption with Different Workloads
Figure 7 compares the energy consumption results for different workloads. We calculated the energy for three types of workload: low (a load value of 30), medium (60) and high (90). The comparison of the existing and proposed techniques shows that our technique gives somewhat better results than the data-correlation (hierarchical) clustering implemented in 6. We have also seen that the horizontal clustering technique's energy-consumption results are much better than those of the data-correlation clustering technique.

Conclusion
In this paper, we proposed a horizontal clustering technique and compared it with hierarchical clustering on the basis of energy efficiency, which is nowadays a major issue. Sometimes a bad task-scheduling technique creates congestion on machines, and unnecessary VMs get turned on, consuming more energy and releasing CO2 that affects our environment. Effective VM scheduling and task consolidation can therefore decrease energy consumption. We evaluated the energy-consumption results of our proposed technique and the existing technique, and the results show that our technique is better. We used the WorkflowSim simulator to consolidate tasks on the basis of the impact factor.
In future, this technique can be implemented in a real environment to improve SLA violation.