Classification of Virtualization Environment for Cloud Computing

Cloud computing is a relatively new field that is gaining popularity among Internet users day by day across a wide range of applications. Virtualization plays a significant role in managing and coordinating access from the resource pool to multiple virtual machines on which heterogeneous applications are running. The various virtualization methodologies are of significant importance because they help to cope with complex workloads, frequent application patching and updating, and multiple software architectures. Although a great deal of research has been conducted on virtualization, the range of issues involved has mostly been presented in isolation. We therefore attempt a comprehensive survey of the different aspects of virtualization. We present our classification of virtualization methodologies, with brief explanations based on their working principles and underlying features.


Introduction
Cloud computing, or Internet computing, enables convenient, on-demand network access to networks, servers, mass storage, and application-specific services with minimal effort from both the service provider and the end user [14]. Put simply, a cloud is an infrastructure or framework comprising a pool of physical computing resources, i.e., hardware, processors, memory, storage, networks, and bandwidth, which can be organized on demand into services that can grow or shrink in real time [15]. With the surge in demand for the Internet and its immense usage across the globe, computing has moved from traditional computing to distributed high-performance computing, i.e., distributed computing, subsequently grid computing, and then computing through clouds.

Need for Virtualization
A virtualization environment that enables the configuration of systems (i.e., compute power, bandwidth, and storage) as well as the creation of individual virtual machines is a key feature of cloud computing. Virtualization provides a platform for complex IT resources in a scalable (efficiently growing) manner, which is ideal for delivering services. At a fundamental level, virtualization technology enables the abstraction, or decoupling, of the application payload from the underlying physical resources [16]; the physical resources can then be changed or transformed into virtual or logical resources on demand, a step sometimes known as provisioning. The traditional approach involves mixed hardware environments, multiple management tools, frequent application patching and updating, complex workloads, and multiple software architectures. By contrast, a cloud data center takes a far better approach: a homogeneous environment, standardized management tools, minimal application patching and updating, simple workloads, and a single standard software architecture [17].

Characteristics of Virtualization
The cloud computing virtualization environment has the following characteristics:

Consolidation
Virtualization eliminates the need to dedicate a single system to one application; multiple OSes can run on the same server. Both old and newer versions of an OS can be deployed on the same platform without purchasing additional hardware, and newly required applications can run simultaneously, each on its respective OS.

Easier development flexibility
Application developers can run and test their applications and programs under heterogeneous OS environments on the same virtualized machine, since virtual machines can host heterogeneous OSes. The isolation of different applications in their respective virtual partitions also helps developers.

Migration and cloning
A virtual machine can be moved from one site to another to balance the workload. As a result of migration, users can access updated hardware and recover from hardware failures. Cloned virtual machines are easy to deploy at both local and remote sites.

Stability and security
In a virtualized environment, host operating systems host multiple guest operating systems of different types, each containing multiple applications. The virtual machines are isolated from one another and do not interfere with each other's work, which in turn helps both security and stability.

Paravirtualization
Paravirtualization is one of the important aspects of virtualization. In a virtual machine, the guest OS can run on the host with or without modification. If changes or modifications are made to the guest operating system so that it is aware of the Virtual Machine Manager (VMM), the guest is said to be "paravirtualized".

Classification
In this paper, we classify the virtualization environment into six categories: scheduling-based, load-distribution-based, energy-aware-based, operational-based, distribution-pattern-based, and transactional-based, as shown in Fig. 1.
We model each host physical machine as a set of physical resources
S = {CPU cores, memory, storage, I/O, networking}, and the pool of physical resources, denoted P, as the sum of all such sets over the n hosts: P = S_1 + S_2 + … + S_n = Σ S_i. We then partition the pool P into two subsets, P = S_j^1 + S_k^2 (j ≠ k; j, k natural numbers), where S_j^1 is the set of physical machines with resources available to host VMs and S_k^2 is the set of remaining physical machines without available resources.
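Every placement policy below begins by discarding the machines without available resources. Under the set model above, that first step can be sketched in Python; representing a machine as a dict with a `free_cores` count is our own simplifying assumption, not part of the model in [1]:

```python
def partition_pool(pool, demand):
    """Split the resource pool P into S1 (machines able to host a VM needing
    `demand` CPU cores) and S2 (machines without available resources)."""
    s1 = [m for m in pool if m["free_cores"] >= demand]
    s2 = [m for m in pool if m["free_cores"] < demand]
    return s1, s2
```

The policies in the following subsections differ only in how they pick a target machine from S1.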

Scheduling-based Environment
Here we have classified the scheduling environment into four sub-categories, as follows:

Round-Robin Procedure
The Round-Robin scheduling policy spreads the VMs across the pool of host resources as evenly as possible [1]. The Eucalyptus cloud platform currently uses this as its default scheduling policy [2]. The procedure is as follows:
Step 1: For each new VM, iterate sequentially through the resource pool P until an available resource (a physical machine capable of hosting the VM) is found.
Step 1(a): If found, match the VM to that physical machine.
Step 2: For the next VM, iterate sequentially through the resource pool P from the point where the last iteration left off, and choose the nearest host that can serve the VM.
Step 3: Return to Step 1 and repeat the whole process until all VMs are allocated.
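The steps above can be sketched as follows; the host representation (a dict with a count of free VM slots) and the function name are illustrative assumptions, not Eucalyptus code:

```python
def round_robin_place(vms, hosts, start=0):
    """Assign each VM to the next host (cyclically) that can accept it.
    `hosts` is a list of dicts with a 'free' slot count; returns {vm: host index}."""
    placement = {}
    i = start
    n = len(hosts)
    for vm in vms:
        # iterate sequentially from where the last search left off
        for step in range(n):
            h = (i + step) % n
            if hosts[h]["free"] > 0:
                hosts[h]["free"] -= 1
                placement[vm] = h
                i = (h + 1) % n  # the next VM continues from here
                break
    return placement
```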

Dynamic Round-Robin Procedure
This method is an extension of Round-Robin and adds two considerations [3]:
Consideration 1: One physical machine can host multiple virtual machines. If any of these VMs has finished its work while the others are still running on the same physical machine, no further VMs are scheduled onto it; the machine enters a "retiring" state, meaning it can be shut down once the remaining VMs complete their execution.
Consideration 2: If a physical machine remains in the "retiring" state for a long time because it is waiting for its VMs to finish, those VMs are forcibly migrated to other active physical machines, and the machine is shut down once the migration process completes.
This scheduling technique therefore consumes less power than the Round-Robin procedure.
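A minimal sketch of the retiring rule, assuming hosts are plain dicts tracking their running-VM count and power state (the field names are ours, not from [3]):

```python
def place_dynamic_rr(vm, hosts):
    """Place a VM on the first non-retiring host with capacity; retiring
    hosts accept no new VMs. Returns the chosen host dict, or None."""
    for h in hosts:
        if h["retiring"] or h["free"] <= 0:
            continue  # skip retiring or full hosts
        h["free"] -= 1
        h["running"] += 1
        return h
    return None

def vm_finished(host):
    """When a VM finishes, the host enters the retiring state; once the
    last VM exits, the host can be powered off to save energy."""
    host["running"] -= 1
    host["retiring"] = True
    if host["running"] == 0:
        host["powered_on"] = False
```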

Stripping Procedure
The Stripping scheduling policy spreads the VMs across as many hosts in the resource pool as possible [1]. The OpenNebula cloud platform, for example, currently uses this policy [4]. The procedure is as follows:
Step 1: For each new VM, first discard the set S_k^2.
Step 2: From the set S_j^1, find the physical machine hosting the fewest VMs.
Step 2(a): If found, match the VM to that physical machine.
Step 3: Return to Step 1 and repeat the whole process until all VMs are allocated.
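The least-loaded search in Steps 1-2 can be sketched as below; the capacity model is an assumption for illustration, not OpenNebula's implementation:

```python
def stripping_place(vm_count, hosts):
    """Spread VMs: each new VM goes to the eligible host with the fewest VMs.
    `hosts` maps host name -> {'vms': n, 'capacity': n}; returns the placements."""
    placement = []
    for _ in range(vm_count):
        # Step 1: discard full hosts (the set S_k^2)
        eligible = {h: s for h, s in hosts.items() if s["vms"] < s["capacity"]}
        if not eligible:
            break
        # Step 2: pick the machine hosting the fewest VMs
        target = min(eligible, key=lambda h: eligible[h]["vms"])
        hosts[target]["vms"] += 1
        placement.append(target)
    return placement
```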

Packing Procedure
The Packing policy spread the VMs across the pool of host resources as few as possible [1]. OpenNabula cloud platform currently uses this scheduling policy and implemented as Greedy policy option in Eucalyptus [2,4]. Here the technique as follows: Step1: For each new VM, it first discards the set S k 2 .
Step2: From the set S j 1 , it finds the physical machine hosting maximum number of VMs.
Step2 (a): if found, then matching is done between physical machine and VM.
Step3: Now go to Step1 and iterates the whole process until all VMs are allocated.
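The most-loaded selection of Steps 1-2 can be sketched as follows; the capacity model is an illustrative assumption:

```python
def packing_place(vm_count, hosts):
    """Pack VMs: each new VM goes to the eligible host already hosting the
    most VMs, so as few machines as possible stay in use."""
    placement = []
    for _ in range(vm_count):
        # Step 1: discard full hosts (the set S_k^2)
        eligible = {h: s for h, s in hosts.items() if s["vms"] < s["capacity"]}
        if not eligible:
            break
        # Step 2: pick the machine hosting the most VMs
        target = max(eligible, key=lambda h: eligible[h]["vms"])
        hosts[target]["vms"] += 1
        placement.append(target)
    return placement
```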

Energy Aware-based Environment
In this environment, energy-efficiency issues are considered; the procedure used in each category is given below:

Watts per core
The Watts-per-core policy seeks the host that incurs the minimum additional wattage per core, reducing overall power consumption. It is assumed that no additional power is consumed by machines that are shut down or hibernating. The procedure is as follows:
Step 1: For each new VM, first discard the set S_k^2.
Step 2: From the set S_j^1, find the physical machine incurring the minimum additional wattage per CPU core, based on each physical machine's power supply.
Step 2(a): If found, match the VM to that physical machine.
Step 3: Return to Step 1 and repeat the whole process until all VMs are allocated.
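A sketch of the Watts-per-core selection, assuming each machine advertises a per-core wattage figure (the field names are hypothetical):

```python
def watts_per_core_place(vm, hosts):
    """Pick the eligible host whose power supply adds the least extra wattage
    per CPU core. `hosts` maps name -> {'free_cores': n, 'watts_per_core': w}."""
    eligible = {h: s for h, s in hosts.items() if s["free_cores"] >= vm["cores"]}
    if not eligible:
        return None
    target = min(eligible, key=lambda h: eligible[h]["watts_per_core"])
    hosts[target]["free_cores"] -= vm["cores"]
    return target
```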

Cost per core
The Cost-per-core policy is also energy-aware and additionally minimizes cost by seeking the host that would incur the least additional cost per core. The assumption is the same as in the Watts-per-core policy. The method is as follows:
Step 1: For each new VM, first discard the set S_k^2.
Step 2: From the set S_j^1, find the physical machine incurring the minimum additional cost per CPU core, based on each physical machine's power supply and electricity cost.
Step 2(a): If found, match the VM to that physical machine.
Step 3: Return to Step 1 and repeat the whole process until all VMs are allocated.
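A sketch of the Cost-per-core selection; deriving the cost as per-core wattage times a per-site electricity price is our own simplifying assumption:

```python
def cost_per_core_place(vm, hosts):
    """Pick the eligible host adding the least extra cost per CPU core, where
    cost combines the machine's power draw and its site's electricity price."""
    eligible = {h: s for h, s in hosts.items() if s["free_cores"] >= vm["cores"]}
    if not eligible:
        return None
    cost = lambda h: eligible[h]["watts_per_core"] * eligible[h]["price_per_watt"]
    target = min(eligible, key=cost)
    hosts[target]["free_cores"] -= vm["cores"]
    return target
```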

Load-balancing-based Environment
In a virtualization environment, the load balancer is responsible for reducing the load overhead, which is why we classify load-balancing approaches.
Here we consider the CPU core set Q, subdivided into the allocated CPU core subset Q_1 and the free CPU core subset Q_2.

Free-CPU-Count based Procedure
The Free-CPU-Count load-balancing policy aims to minimize the CPU load on the hosts [1]. The OpenNebula cloud platform currently implements it as the Load-Aware policy [4]. The procedure is as follows:
Step 1: For each new VM, first discard the set S_k^2.
Step 2: From the set S_j^1, find the physical machine with the maximum number of free CPU cores (the subset Q_2 of Q).
Step 2(a): If found, match the VM to that physical machine.
Step 3: Return to Step 1 and repeat the whole process until all VMs are allocated.
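A sketch of the Free-CPU-Count selection over the free-core subset Q_2; the data model is an illustrative assumption:

```python
def free_cpu_count_place(vm, hosts):
    """Pick the eligible host with the most free CPU cores (subset Q2)."""
    eligible = {h: s for h, s in hosts.items() if s["free_cores"] >= vm["cores"]}
    if not eligible:
        return None
    target = max(eligible, key=lambda h: eligible[h]["free_cores"])
    hosts[target]["free_cores"] -= vm["cores"]
    return target
```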

Ratio-based Load Balancing Procedure
This is an enhanced version of the count-based load-balancing technique, which also aims to minimize the CPU load on the hosts [1]. The procedure is as follows:
Step 1: For each new VM, first discard the set S_k^2.
Step 2: From the set S_j^1, find the physical machine with the maximum ratio of Q_2 to Q_1 (with Q_2/Q_1 > 1).
Step 2(a): If found, match the VM to that physical machine.
Step 3: Return to Step 1 and repeat the whole process until all VMs are allocated.
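A sketch of the ratio test; representing Q_1 and Q_2 as per-host `allocated` and `free` core counts is our own assumption:

```python
def ratio_place(vm, hosts):
    """Pick the host maximising free/allocated cores (Q2/Q1), requiring the
    ratio to exceed 1. Hosts with no allocated cores are skipped to avoid
    division by zero (an assumption of this sketch)."""
    eligible = {h: s for h, s in hosts.items()
                if s["free"] >= vm["cores"] and s["allocated"] > 0
                and s["free"] / s["allocated"] > 1}
    if not eligible:
        return None
    target = max(eligible, key=lambda h: eligible[h]["free"] / eligible[h]["allocated"])
    hosts[target]["free"] -= vm["cores"]
    hosts[target]["allocated"] += vm["cores"]
    return target
```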

Operational-based Environment
In the operational-based environment, we explain the general movement of virtual machines from local sites to remote sites within the same cloud or across clouds.

Migration
Virtual machine migration is the process of transferring a VM from one host physical machine to another that is either currently running, will be running, or can be booted up after the new VMs are placed. Migration is initiated because of: 1) a lack of resources at the source node or local site; 2) a remote node running VMs on behalf of the local node due to the dynamic unavailability of resources; 3) minimizing the number of host machines running at remote sites (for lower energy consumption); or 4) maximizing the allocation of VMs to local machines rather than remote machines. We discuss the migration process as depicted in Fig. 2.

Fig.2. Schematic Diagram of Migration Process.
In this context, Resource as a Service (RaaS) [5] is a physical layer comprising a pool of physical resources, i.e., servers, networks, storage, and data-center space, which provides all resources to the VMs. The hypervisor layer is the management layer, providing overall management such as decisions about where to deploy VMs, admission control, resource control, and usage accounting. The implementation layer provides the hosting environment for the VMs. Migrating a VM requires coordinating the transfer between the source (S) and destination (D) host machines and their states [6]. A main migration algorithm, a migration request handler, an initiate-transfer handler, and a forced-migration handler manage the migration process [7]. The overall migration steps are as follows:
Step 1: The controller and the intermediate (I) nodes send the migration request to the possible future destinations. D-nodes can be at remote sites, and the request can be forwarded to other remote sites depending on resource availability.
Step 2: The receiver may accept or reject the request depending on account capabilities and account usage.
Step 2(a): If all business rules and the accounting policy permit the request, an "initiate transfer" reply message is sent to the requesting nodes through the I-nodes (intermediate nodes).
Step 3: On receiving the reply message, the transfer operation runs from the S-node to the D-node via the I-nodes, and the VMs at the source site are migrated to the destination sites. Transfer initiation is done by the C-node toward the D-node, and the transfer is verified through a token [7] by the controller with the help of the S-node.
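The handshake above can be sketched as below; the node dictionaries, the policy check, and the token scheme are illustrative assumptions rather than the handlers of [7]:

```python
def handle_migration_request(d_node, vm_demand):
    """Step 2: the destination accepts or rejects based on capacity and its
    accounting policy (modelled here as a simple boolean flag)."""
    ok = d_node["free_cores"] >= vm_demand and d_node["policy_permits"]
    return "initiate transfer" if ok else "reject"

def migrate(s_node, d_nodes, vm, token):
    """Steps 1-3: offer the VM to candidate D-nodes; on acceptance, verify the
    transfer with a shared token and move the VM. Returns the chosen node."""
    for d in d_nodes:                         # Step 1: request sent to candidates
        reply = handle_migration_request(d, vm["cores"])
        if reply == "initiate transfer":      # Step 2(a): reply received
            if token == s_node["token"]:      # Step 3: token verification
                s_node["vms"].remove(vm)
                d["free_cores"] -= vm["cores"]
                d.setdefault("vms", []).append(vm)
                return d
    return None
```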
During migration, monitoring is essential to ensure that VMs obtain the capacity stipulated in the Service Level Agreement and to gather data for accounting of the resources used by the service providers [7]. Monitored quantities may include the amount of RAM, the network bandwidth, or the number of currently logged-in users. A Grid Monitoring Architecture (GMA) has been defined by the Global Grid Forum [8].

Live Migration
With the surge in the use of virtualization, the migration procedure has been enhanced owing to the advantages of live migration, such as server consolidation and resource isolation [9]. Live migration of virtual machines [10,11] is a technique in which the virtual machine appears active and responds to end users throughout the migration process. Live migration facilitates energy efficiency, online maintenance, and load balancing [12]; it is sometimes called real-time or hot migration in the cloud computing environment. While the virtual machine is running on the source node (one host server), it is moved to the target node (another host server) without interrupting any active network connections and without any visible effect from the user's point of view. Live migration helps optimize the efficient utilization of available CPU resources. Different live-migration algorithms are described in [18,19,20,21]. Here we discuss a few live-migration processes.
CPU State Migration: Migration of the CPU state is related to process migration. While migrating the CPU state, the process control block (PCB), the number of cores, the processor speed, and the required process-specific memory are all transferred from the source host to the destination host.
Memory Migration: In memory migration, all pages are transferred from the source host A to the destination host B. Here, pre-copy migration [22] is used: all pages are copied from source to destination during the first round. Assuming there are n rounds, each subsequent round up to n copies only the pages dirtied during the previous transfer round (as indicated by a dirty bitmap). Every virtual machine has some set of pages that are updated very frequently, and this phenomenon degrades the performance of pre-copy. For every iteration, a "dirtying rate" is calculated from the page length and the number of pages being dirtied. Christopher Clark et al. [10] bound the number of pre-copy iterations based on the writable working set (WWS), according to the behavior of typical server workloads.
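The iterative pre-copy rounds can be modelled with a toy calculation, assuming (unrealistically) a constant dirtying rate and a fixed bound on the number of rounds:

```python
def precopy(num_pages, dirty_rate, max_rounds):
    """Return the number of pages copied in each pre-copy round. Round 1 copies
    every page; each later round copies only the pages dirtied during the
    previous round (here approximated as a constant fraction `dirty_rate`).
    Whatever remains after `max_rounds` is handled by the final stop-and-copy."""
    rounds = [num_pages]                  # round 1: full copy
    dirty = int(num_pages * dirty_rate)
    for _ in range(max_rounds - 1):
        if dirty == 0:
            break
        rounds.append(dirty)              # copy only last round's dirty pages
        dirty = int(dirty * dirty_rate)
    return rounds
```

The geometric shrinkage shows why pre-copy converges only when the dirtying rate stays well below the copy bandwidth; write-intensive pages (the WWS) break this assumption, which is why Clark et al. bound the iteration count.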
Storage Migration: To support VM migration, the system must provide each VM with a location-independent, consistent view of the file system that is accessible on all hosts. Each VM uses its own virtual disk, to which its file system is mapped, and the contents of the virtual disk are transferred to the destination machine. We rely on storage area networks (SAN) or NAS to migrate connections to the storage devices, which allows a disk to be migrated by reconnecting to it on the target machine.
Network Migration: To let remote systems locate and communicate with a virtual machine, each VM is assigned a virtual IP address known to the other units. This IP address is distinct from the IP address of the machine currently hosting the VM, and each VM can also have its own unique virtual MAC address. The hypervisor maps the virtual IP and MAC addresses to their corresponding virtual machines. Note that all the virtual machines should be in the same IP subnet. When migrating to the target node, an ARP broadcast is sent to the network declaring that the IP address has moved to a new MAC address (physical location), so TCP connections survive the migration.
Device Migration: The hypervisor virtualizes the physical hardware and presents each VM with a standard set of virtual devices. These virtual devices effectively emulate well-known physical hardware and translate the VM's requests to the system hardware. Device migration introduces a dependency on host-specific devices, which may be difficult to migrate because of an awkward transient state (for example, a CD-recording device in the middle of recording) or outright unavailability for migration.

Distributed Pattern-based Environment
So far we have explained different types of scheduling policies, energy-efficient techniques, load balancing, and migration procedures. But there must also be a virtual-machine deployment pattern that makes the distribution of virtual machines more efficient: faster response times, minimal communication latency, congestion avoidance, and dynamic updating. The classification is as follows:

Centralized distribution
Centralized distribution is the traditional approach. It can be implemented simply, so users and administrators can use it easily. The VM images are stored on a central NFS (Network File System) server, and the client nodes retrieve copies of VMs from the central node on demand. This kind of multiple point-to-point transfer creates contention when a large number of clients want to access a multi-terabyte file, so client transfers should be synchronized.

Balanced Binary tree Distribution
Balanced-binary-tree distribution is used to reduce network congestion and allow parallel transfers. All the computing nodes are arranged in a balanced binary tree with the source node as the root. The root node always carries the VM images and distributes them from parent node to child node; a newly arrived child node can easily get the data from its parent. Data and VM images thus flow through the entire binary tree. However, when a node crashes, all of its descendant nodes stall; a time-out and re-transmission strategy resolves this.
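With the nodes laid out as an array-backed balanced binary tree (an implementation choice we assume for illustration), the forwarding rule and the number of distribution rounds, one per tree level, can be sketched as:

```python
import math

def tree_children(index):
    """In an array layout with the root at index 0, node i forwards the
    VM image to its children at indices 2i+1 and 2i+2."""
    return 2 * index + 1, 2 * index + 2

def distribution_rounds(num_nodes):
    """Rounds needed for the root's image to reach every node: the tree depth,
    since each level receives the image one hop after its parents."""
    return math.ceil(math.log2(num_nodes + 1)) - 1
```

This logarithmic round count, versus the linear cost of unicast, is what makes tree distribution attractive for large clusters.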

Unicast distribution
Unicast distribution sends the VM images sequentially to the destination nodes, even at remote sites. However, it is time-consuming and suffers from network congestion.

Multicast
Multicast is an efficient way to distribute VMs. In multicast, the information or data is transmitted to a required group of destination nodes in a single transmission. Packets are sent to the group at once, which minimizes CPU load but increases the packet-loss probability. It works best in a Local Area Network (LAN).

Distribution between cross clouds
Multicast works well in a LAN, but transfers are sometimes required beyond it. Consider, for example, more than one private, physically distinct desktop cloud sharing the same data and using the multicast distribution method, where transferring data between them is forbidden by their network policy. To overcome this kind of constraint, peer-to-peer or balanced-binary-tree distribution mechanisms are used over the common network linking those clouds.

Peer-to-Peer distribution
Peer-to-peer is a decentralized approach: there is no centralized server, and every node in the system works as a server or a client. Every VM node may act as a sender, a receiver, or both, and multiple transfers of different files to the same node are possible. The BitTorrent protocol is an example of peer-to-peer distribution [13].

Transactional-based Environment
In this category, we explain different architecture-based virtualizations and the deployment of different operating systems and new applications.

Isolated Guest Operating System-based virtualization approaches
In this approach, the host OS runs on the hardware infrastructure. It supports multiple virtualized guest OSes on the same physical server and maintains isolation between the different guest OSes, as shown in Fig. 3. All the operating systems use the same kernel and hardware infrastructure, and the host OS controls the guest OSes.

User application-based virtualization approaches
In this approach, virtualization is performed according to users' on-demand requirements and is hosted on top of the host OS, as shown in Fig. 4. On receiving a request, this virtualization method emulates a VM containing its own guest operating system and related applications, so that users can get their specific on-demand service from the emulated VMs.

Hypervisor-based virtualization approaches
A hypervisor is a supervisory operating system, originating on mainframes, that allows other operating systems to run on the same system concurrently, and its monitoring component monitors the accesses of the virtual machines. The hypervisor takes control at system boot time to regulate the allocation of hardware infrastructure from the resource layer to the multiple VMs. The architecture model is shown in Fig. 5.

Summary and Future Directions
Because of the large number of virtualization techniques reviewed in this paper, it is difficult to compare their quantitative performance exhaustively. We have presented a comprehensive review of different aspects of the virtualization procedure and the interrelationships among them, and shown what should be considered when virtualizing.
In this paper, different types of scheduling and load-balancing techniques are explained; in most cases these procedures are implemented in the management node and the load balancer, which is responsible for reducing the complex load overhead. We have described two energy-efficiency policies that can be implemented quickly and easily on existing cloud platforms without any external systems, giving cloud administrators a convenient and simple way to improve power efficiency and minimize energy costs across their data centers. Combining these aspects, we expect better load-balancing and scheduling algorithms to be introduced that are more powerful, more efficient, and less energy-consuming. The virtual-machine distribution techniques apply whether the setting is a public, private, or community cloud connected via a LAN, or various remote cloud computing sites connected through the Internet. Among these distribution techniques, multicast appears the most efficient because it minimizes CPU load, reduces power consumption, and is cost-effective. Virtual-machine image distribution should be organized so that, when migration comes into the picture, the source and destination nodes, whether local or remote sites, are connected in a particular pattern. Live migration is a dynamic abstraction layer that provides energy efficiency, online maintenance, and load balancing, which is why it is a hot topic in cloud computing virtualization. The different virtualization approaches clarify what the architecture should be, based on the application, the OS, or the hypervisor, on an on-demand basis. Hypervisor-based virtualization, however, has security constraints.
Because the hypervisor controls all VM accesses and monitors the environment, a hypervisor failure, crash, or attack by intruders may lead to performance degradation. Hence, security aspects must be considered when virtualizing.

Conclusion
Cloud computing is one of the most rapidly emerging technologies on the Internet. In the context of cloud computing, virtualization uses computer resources to imitate other computer resources or whole computers. This survey has presented a wide-ranging overview of research in the field of cloud computing with respect to virtualization methods. We have discussed many procedures and schemes and noted their salient features. In particular, we have examined issues in scheduling, load distribution, energy efficiency, distribution patterns, and transactional approaches. We have discussed the basic principles of the various virtualization techniques, through which a general classified concept of virtualization has been presented. This survey of the virtualization environment in cloud computing may help researchers expand the concepts of virtualization and may lead to practical, industry-level implementations of virtualization in private clouds.