AUTONOMIC, OPTIMAL, AND NEAR-OPTIMAL RESOURCE ALLOCATION IN CLOUD COMPUTING

AUTONOMIC, OPTIMAL, AND NEAR-OPTIMAL RESOURCE ALLOCATION IN CLOUD COMPUTING

by Arwa Sulaiman Aldhalaan

A Dissertation Submitted to the Graduate Faculty of George Mason University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, Information Technology

Dr. Daniel A. Menascé, Dissertation Director
Dr. Alexander Brodsky, Committee Member
Dr. Sam Malek, Committee Member
Dr. John Shortle, Committee Member
Dr. Stephen Nash, Department Chair
Dr. Kenneth S. Ball, Dean, The Volgenau School of Engineering

Spring Semester 2015
George Mason University
Fairfax, VA

Autonomic, Optimal, and Near-Optimal Resource Allocation in Cloud Computing

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at George Mason University

By Arwa Sulaiman Aldhalaan
Master of Science, Bowling Green State University, 2010
Bachelor of Science, King Saud University, 2006

Director: Dr. Daniel A. Menascé, Professor, Department of Computer Science

Spring Semester 2015
George Mason University
Fairfax, VA

Copyright 2015 by Arwa Sulaiman Aldhalaan. All Rights Reserved.

Dedication

I dedicate this dissertation to my beloved husband, parents, and sons. Without them, this dissertation would have never been written. I am grateful to have them in my life.

Acknowledgments

I am deeply thankful to my advisor, Prof. Daniel Menascé, for his guidance and support. I truly cannot imagine a better advisor. He provided me with the guidance, assistance, valuable feedback, encouragement, and expertise that I needed during my dissertation. I am also grateful to Dr. Alexander Brodsky, Dr. Sam Malek, and Dr. John Shortle for serving on my committee and for their time and advice. I am thankful to my lovely family for making success possible and rewarding. My special thanks go to my dear husband, Abdulelah Alrajhi, who encouraged me to reach my dreams and motivated me when I needed it the most. He supported me while pursuing his own doctorate, which is magnificent. I am thankful to my parents, Sulaiman and Maha, for their constant love and endless support. To my grandparents for their prayers and words of wisdom. To my caring brothers and sister for their continuous encouragement. To my beautiful little sons Faisal, Nawaf, Sultan, and Mohammed, who have been my motivation, inspiration, and drive. Finally, I am thankful to King Saud University for its scholarship, and to my country for funding my education.

Table of Contents

List of Tables
List of Figures
Abstract
1  Introduction
     Problem Statement
     Thesis Statement
     Summary of Contributions
     Organization of The Dissertation
2  Background and Related Work
     Cloud Computing
     Virtualization
     Autonomic Computing
     Optimization Techniques and Heuristic Search
     Related Work
          Resource Allocation and Scalability
          Power and Energy Consumption
          Performance Modeling in Virtual Environments
          Live VM Migration
          Summary of Related Work
3  Analytic Performance Modeling and Optimization of Live VM Migration
     Introduction
     Background and Problem Statement
     Analytic Model of Live Migration With Uniform Dirtying Rate
     Analytic Model of Live Migration With Hot Pages
          Model of Copying Hot Pages During the Pre-Copy Phase
          Model of Copying Hot Pages During the Downtime Phase
     Clusters of Pages
     Summary of Analytic Model Results
     Optimizing Live Migration Parameters
     Numerical Results
     Conclusion
4  Simulation Validation of Live VM Migration Analytical Performance Models
     Experiments With Uniform Dirtying Rate
          Parameters
          Results
     Experiment Description using Clusters of Pages
          Results
5  Autonomic Allocation of Virtual Machines in IaaS Cloud Providers
     Introduction
     Problem Description
     Processing of Consumer Requests
     Optimization model
          Minimization of VMs migration
          Revenue model
          Availability model
     Heuristic Search
     Experiments
     Results
     Concluding Remarks
6  Autonomic Allocation of Communicating Virtual Machines in Hierarchical IaaS Cloud Providers
     Introduction
     Problem Assumptions and Notation
     Revenue model
     Optimization Problem
     Heuristic Algorithms
          Basic VM Allocation Heuristic (BVAH)
          Advanced VM Allocation Heuristic (AVAH)
          No Communication (NoComm) Allocation Strategies
     Availability Constraint
     Experimental Results
     Concluding Remarks
7  Autonomic Allocation of Virtual Machines in SaaS Cloud Providers
     Introduction
     Problem Formalization and Notation
     Optimization Model
     Heuristic Search
          The ScaleUpDown Solution
          ScaleUpDown Algorithm Example
          The FillSlotsFirst Algorithm
          The Optimal Algorithm
     Experimental Results
     Concluding Remarks
8  Conclusion and Future Work
Bibliography

List of Tables

Summary of performance model results
Parameter values used in the experiments
Optimization Results
Parameter values used in the experiments for the uniform dirtying rate case
Summary of results for the uniform dirtying rate case from both the simulation and the model
Parameter values used in the experiments for the clusters of pages case
Summary of results for the clusters of pages case from both the simulation and the model
Inputs, variables and outputs of a problem instance at time t
Parameter values used in the experiments
Average results including 95% confidence intervals
Average revenue for various values of α
Types of co-locations for VMs i and j
Allocation Example
Parameter Values for the Experiments
Summary of Results
Availability with DNC
Allocation Example
Parameter values used in the experiments
CPU and I/O service demands (in sec) for each VM type
Summary of results of the ScaleUpDown algorithm and the optimal solution

List of Figures

Outline of the dissertation framework
Summary of contributions
HG vs. β
T_down in seconds vs. α
Gain vs. α
P_TotalMig vs. α
T_down in seconds vs. α
Gain vs. α
P_TotalMig vs. α
T_down in seconds vs. α for the clusters of pages
Gain vs. α for the clusters of pages
P_TotalMig vs. α for the clusters of pages
Number of pages migrated vs. number of iterations, α = 10%
Number of pages migrated vs. number of iterations, α = 20%
Number of pages migrated vs. no. of iterations, α = 30%, P_s = 16
Number of pages migrated vs. no. of iterations, α = 30%, P_s = 32
Number of pages migrated vs. no. of iterations, α = 10%, for the clusters of pages
Number of pages migrated vs. no. of iterations, α = 20%, for the clusters of pages
Number of pages migrated vs. no. of iterations, α = 30%, P_s = 16, for the clusters of pages
Number of pages migrated vs. no. of iterations, α = 30%, P_s = 32, for the clusters of pages
Total normalized allocated capacity, C_alloc, vs. time
Revenue vs. time
Availability vs. time
CDF of the total capacity allocated to ECPs
Infrastructure of a Cloud Service Provider
Example of the Operation of the BVAH Algorithm
Normalized allocated capacity over time
AVAH vs. BVAH for the linear revenue function
AVAH vs. A-NoComm for the linear revenue function
AVAH vs. A-NoComm for the exponential revenue function
BVAH vs. B-NoComm for the linear revenue function
BVAH vs. B-NoComm for the exponential revenue function
Availability for AVAH with no DNC vs. AVAH with DNC
Revenue per request for AVAH with no DNC vs. AVAH with DNC
Accumulated revenue for AVAH with no DNC vs. AVAH with DNC
Framework architecture for SaaS
Average number of users
ScaleUpDown: Cost per second vs. time for current VM allocation
ScaleUpDown: Accumulated cost
Cost per second vs. time for ScaleUpDown and FillSlotsFirst
Accumulated cost (in $) for ScaleUpDown and FillSlotsFirst
ScaleUpDown: Average number of users per VM in each VM type
ScaleUpDown: Average number of used VMs of each type
FillSlotsFirst: Average number of users per VM in each VM type
FillSlotsFirst: Average number of used VMs of each type
ScaleUpDown: Average total number of users in each type of VM

Abstract

AUTONOMIC, OPTIMAL, AND NEAR-OPTIMAL RESOURCE ALLOCATION IN CLOUD COMPUTING

Arwa Sulaiman Aldhalaan, PhD
George Mason University, 2015
Dissertation Director: Dr. Daniel A. Menascé

Cloud computing has changed the IT industry by providing robust technology solutions at a variety of prices. The technology used to implement a cloud computing infrastructure is based on largely distributed virtual environments that provide services to consumers, allowing them to lease computing resources that scale to their needs and giving them the ability to run a wide range of applications dynamically and on an on-demand basis, thereby improving resource flexibility and scalability. Resource management is a core aspect of virtualization in cloud computing and enables efficient management of a large volume of resources. Therefore, there is a need to use autonomic techniques for resource management in cloud computing in order to optimize a utility function of interest to stakeholders in a way that considers tradeoffs between competing Service Level Agreements (SLAs) contracted between consumers and cloud providers (CPs). Autonomic and optimization techniques for resource management in cloud computing are designed, implemented, and validated in this research to process consumers' requests in an efficient way in cloud computing environments.

In this research, we provide in Chapter 3 the design and validation of analytic performance models and an optimization of live virtual machine migration that minimizes downtime and disruption of services, along with a non-linear optimization solution. This model is validated in Chapter 4 using a cloud simulation environment. Additionally, we designed and implemented an autonomic resource provisioning method for an Infrastructure as a Service (IaaS) cloud with the goal of maximizing the cloud provider's revenue subject to availability, capacity, and virtual machine (VM) migration constraints; we developed a heuristic algorithm to solve this NP-hard problem in Chapter 5. Furthermore, in Chapter 6 we provide an autonomic resource provisioning method that considers the hierarchical structure of an IaaS cloud: data centers, clusters within the data centers, racks within the clusters, and servers within the racks. This problem assumes that cloud customers request a group of virtual machines with a communication pattern among them. Our solution minimizes communication costs and improves performance subject to VM co-location constraints. In Chapter 7, we provide an autonomic solution for resource management in a Software as a Service (SaaS) cloud where software services are provided dynamically at different prices and categories, enabling consumers to subscribe to and unsubscribe from software services. The SaaS cloud provider's goal is to minimize the cost of its infrastructure, leased from an IaaS provider, while ensuring the delivery of software services to customers and meeting response time SLAs subject to resource constraints.

Chapter 1: Introduction

The demand for computing cycles has steadily increased in recent years. With the advance of various information technologies, more people are able to access faster communication networks, a wide range of information repositories, and many other resources that have become part of our daily lives. These interactions, at both the personal and organizational levels, range from commerce, business, education, manufacturing, and software as a service to social networking. As a result, there is a need for scalable, efficient, and dependable infrastructures such as cloud computing. Cloud computing has changed the IT industry by providing robust technology solutions at a variety of prices. The technology used to implement a cloud computing infrastructure is based on largely distributed virtual environments that provide services to consumers, allowing them to lease computing resources that scale to their needs and giving them the ability to run a wide range of applications dynamically and on an on-demand basis, thereby improving resource flexibility and scalability. Cloud computing has been defined by NIST [65] as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." The main advantage of cloud computing is that it provides its users a pay-as-you-go model, where they pay for the use of computing resources as they need them. Thus, there is no need to provision resources for peak loads, since cloud computing supplies a dynamic way of allocating the needed resources with consistent performance. Another advantage is the ability of a cloud computing environment to adapt its resource usage to the customer's workload variability. This feature of cloud computing eliminates the need for users to plan far ahead for resource provisioning.

One of the main supporting technologies used in cloud computing is virtualization. Virtualization is considered a key enabler for cloud computing since it provides lower operating costs for extremely large-scale computer data centers when compared with traditional data centers. The elasticity and scalability of cloud computing allow customers to dynamically use computing resources, especially for applications that produce workloads of great diversity and massive scale. Cloud computing providers (CPs) use virtual machines (VMs) that can be easily allocated to and deallocated from physical machines (PMs) and can also be migrated to different physical machines in order to meet Quality of Service (QoS) objectives. A random allocation of VMs to PMs could cause significant inefficiencies for CPs and frustration for their consumers. Therefore, resource management is a core aspect of virtualization in cloud computing and enables the efficient management of a large volume of resources. Resource management concerns include: VM migration, live migration strategies, VM allocation, infrastructure consolidation, workload scheduling, energy consumption, resource utilization, and resource monitoring. These concerns are best addressed with the help of performance modeling, optimization, and autonomic computing methods, which produce optimal or near-optimal resource allocations that greatly improve overall system efficiency with minimal overhead. Service and infrastructure management in clouds is challenging due to their ever-growing complexity and variability in workload and environmental requirements. Immediate responses to these variations are necessary to meet the agreed Service Level Agreements (SLAs). Consequently, human administration becomes impractical. Thus, there is a need to use autonomic techniques and optimization methods for resource management in cloud computing to obtain an optimal or near-optimal allocation of resources that optimizes a utility function of interest to stakeholders in a way that considers tradeoffs between competing SLAs contracted between consumers and CPs.

1.1 Problem Statement

Cloud computing is based on largely distributed virtual environments that provide services allowing users to lease computing resources that scale to their needs. This infrastructure contains physical servers organized in a hierarchical structure consisting of several regions of data centers, including racks and clusters of physical machines. Cloud computing providers rely on virtualization to manage the dynamic nature of this infrastructure, where virtual machines can dynamically be allocated to and deallocated from physical machines, and also migrated to different physical machines as the workload varies, in order to meet SLAs and QoS goals. Cloud computing providers have to be able to manage their resources in a dynamic way in order to optimize some objective function of interest to them. Therefore, the most pressing issues to be addressed are how to process consumers' requests in an efficient way in a cloud computing environment so as to (1) optimally or near-optimally allocate virtual machines to physical machines, (2) outsource virtual machines to external cloud providers, (3) migrate virtual machines to other physical machines with minimal downtime and disruption of services, and (4) optimally or near-optimally decide the type and number of VMs to be allocated by a SaaS provider, in a dynamic and on-demand way, so that QoS goals are constantly met in the presence of an ever-changing workload and infrastructure capabilities. These QoS goals can include availability, response time, and throughput.

1.2 Thesis Statement

It is possible to use autonomic and optimization techniques to dynamically manage the computing resources of a cloud provider in a way that meets QoS goals and provides an optimal or near-optimal allocation of resources that optimizes an objective function of interest to the cloud provider.

1.3 Summary of Contributions

The key contributions of this dissertation are (also summarized in Figures 1.1 and 1.2):

- Design and validation of analytic performance models and optimization of live VM migration with minimal downtime and disruption of services. Our proposed model takes into consideration several models of page modifying rates and provides a non-linear optimization solution. We conducted experimental validation of the live VM migration models by simulating a real cloud environment.

- Design, implementation, and validation of an autonomic resource provisioning method in an Infrastructure as a Service (IaaS) cloud with the goal of maximizing the cloud provider's revenue subject to availability SLA, capacity, and VM migration constraints. We developed and assessed a heuristic algorithm to solve this NP-hard problem.

- Design, implementation, and validation of an autonomic resource provisioning method in an IaaS cloud organized in a hierarchical structure consisting of regional data centers, clusters within the data centers, racks within the clusters, and servers within the racks. This problem assumes that cloud customers request a group of virtual machines with a communication pattern among them. Our solution minimizes communication cost subject to virtual machine co-location constraints and improves performance.

- Design, implementation, and validation of an autonomic solution for resource management in a Software as a Service (SaaS) cloud where software services are provided dynamically at a variety of prices and categories, enabling consumers to subscribe to and unsubscribe from software services. The cloud provider's goal is to minimize infrastructure cost and ensure the delivery of software services to customers while meeting response time SLAs subject to resource constraints.

Figure 1.1: Outline of the dissertation framework: user requests arrive at the SaaS and VMs are requested from the IaaS to run applications; VM migrations are then performed by the IaaS to optimize the use of resources.

Figure 1.2: Summary of contributions.

1.4 Organization of The Dissertation

This dissertation is organized as follows:

Chapter 2: Background and Related Work.

Chapter 3: Analytic Performance Modeling and Optimization of Live VM Migration. This chapter provides an analytical performance model for live migration of virtual machines. The chapter includes a model for a uniform dirtying rate of memory pages and two cases of hot and non-hot memory pages. A non-linear optimization solution is also provided. We also extend our model to include several categories of memory pages by clustering the source VM's memory pages into groups depending on their dirtying rate.

Chapter 4: Simulation Validation of Live VM Migration Analytical Performance Models. This chapter provides an experimental validation of the live VM migration models of Chapter 3.

Chapter 5: Autonomic Allocation of Virtual Machines in IaaS Cloud Providers. This chapter provides an autonomic resource provisioning method with availability constraints in an IaaS cloud environment that aims to efficiently utilize resources and maximize the cloud provider's revenue.

Chapter 6: Autonomic Allocation of Communicating Virtual Machines in Hierarchical IaaS Cloud Providers. In this chapter we consider the hierarchical structure of computing resources (regional data centers, clusters, racks, and servers) with the goal of optimizing the communication cost between virtual machines.

Chapter 7: Autonomic Allocation of Virtual Machines in SaaS Cloud Providers. In this chapter, we present an autonomic method to optimize resource allocation in a Software as a Service (SaaS) cloud environment with the goal of minimizing the infrastructure cost while meeting response time SLAs.

Chapter 8: Conclusions and Future Work.

Chapter 2: Background and Related Work

2.1 Cloud Computing

Cloud computing is a model for enabling convenient and on-demand access to shared computing resources such as servers, storage, software, applications, and services that can be dynamically provisioned as needed. According to the National Institute of Standards and Technology (NIST) [65], cloud computing is defined as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." The essential characteristics of cloud computing as set by NIST [65] are:

- On-demand self-service: cloud computing provides on-demand services where users can obtain computing resources as needed.

- Broad network access: cloud computing resources can be accessed over a network through a wide range of devices such as personal computers, tablets, and smartphones.

- Resource pooling: data center resources, including CPUs, networks, and storage, are pooled to provide access to multiple simultaneous users.

- Rapid elasticity: cloud computing resources can be elastically allocated or deallocated according to consumers' needs. These computing resources can scale according to workload variations.

- Measured service: resources in cloud systems, such as memory, network bandwidth, and CPU utilization, can be monitored and measured.

The service models of cloud computing as set by NIST [65] are:

- Software as a Service (SaaS): software applications are hosted by a cloud provider and made available to consumers over the Internet. Consumers can use the software applications without having to perform software installation, maintenance, or updates, and without managing the underlying cloud infrastructure.

- Platform as a Service (PaaS): the cloud provider offers a platform on which consumers deploy the applications they create, using programming languages, libraries, services, and tools supported by the provider. Consumers can use this cloud platform without the need to manage the underlying cloud infrastructure.

- Infrastructure as a Service (IaaS): the cloud provider offers a cloud infrastructure for provisioning computing resources such as CPU, memory, network, and storage. Consumers can allocate or deallocate resources as they need in a dynamic way without managing the underlying cloud infrastructure.

2.2 Virtualization

Virtualization is a technique that was introduced in the early 1970s by IBM to increase the level of sharing and utilization of expensive computing resources such as mainframes [66]. Virtualization provides a hardware abstraction layer, called the hypervisor or the Virtual Machine Monitor (VMM), on top of the physical hardware. This layer provides an interface that is functionally equivalent to the actual hardware (including CPU, memory, and disks) to a number of virtual machines. Several virtual machines can be instantiated on top of this abstraction layer, each with a different operating system, as long as they are compatible with the hardware itself. There are several advantages of virtualization, including:

- Statistical multiplexing: it is used to increase utilization. It is unlikely that different VMs on the same physical machine will exhibit the same workload intensity levels at all times. Therefore, the unused resources of low-utilization VMs can be directed

to other co-located VMs with higher utilization to improve resource sharing. This feature has enabled the operation of extremely large-scale data centers at lower costs.

- Load balancing: it is relatively easy to migrate VMs from one physical machine to another in order to improve performance. Also, if one physical machine is overloaded, some VMs can be migrated to another physical machine to obtain better load balancing.

- Clean separation between applications running on different VMs on the same physical machine. This provides security and reliability. A security attack on one VM does not compromise other VMs because of their isolation. Also, virtualization provides a more reliable environment: if one VM hangs, it does not affect other VMs running on the same physical machine.

2.3 Autonomic Computing

In 2001, IBM introduced the concept of autonomic computing. It is inspired by the human body, which has a complex autonomic nervous system that takes care of most bodily functions, thus removing from our consciousness the task of coordinating all our bodily functions [45]. Autonomic computing systems can regulate themselves without human intervention and adapt to changing environments in a way that enhances performance or QoS [15]. IBM's four fundamental characteristics of self-management in autonomic computing are: self-configuration, self-optimization, self-healing, and self-protection [45]. In this research, self-optimization techniques are taken into consideration in order to optimize the CP's use of resources. The goal of a CP is to optimize a specific utility function. The attributes of this utility function include QoS metrics of interest such as availability, response time, or power consumption of the computing resources. When performance degradation or failures occur over time, the autonomic computing system automatically changes its configuration in a way that optimizes the utility function of the system. The goal of this research is to study, design, implement, and evaluate techniques for autonomic resource management in cloud computing.
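To make the notion of a utility function concrete, one common form (shown here only as an illustration; the specific utility and revenue functions used in this research are defined in later chapters) is a weighted combination of per-metric utilities, each normalized against its SLA target:

\[
U_g \;=\; w_R\, u_R(R) \;+\; w_X\, u_X(X) \;+\; w_A\, u_A(A), \qquad w_R + w_X + w_A = 1,
\]

where, for example, the response time utility can be taken as the sigmoid \( u_R(R) = 1 / \left(1 + e^{\,\beta (R - R_{\max})}\right) \), which is close to 1 when the measured response time \(R\) is well below the SLA value \(R_{\max}\) and decays toward 0 as the SLA is violated; \(X\) and \(A\) denote throughput and availability. An autonomic controller then searches the configuration space for the configuration that maximizes \(U_g\).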

2.4 Optimization Techniques and Heuristic Search

Optimization is the art and science of allocating resources to the best possible effect [23]. Specifically, it solves the problem of finding the best solution among all feasible solutions, with the goal of maximizing or minimizing an objective function of interest. The solution that maximizes or minimizes the objective function is called the optimal solution. Usually, optimization problems have a set of constraints, in the form of equalities and/or inequalities, that need to be satisfied by the optimal solution. There are different optimization methods, depending on the nature of the optimization problem: Linear Programming, Non-linear Programming, Mixed-Integer Linear Programming, and Mixed-Integer Non-linear Programming. In Linear Programming (LP), a linear objective function is optimized, and all its constraints are linear equalities and inequalities [23]. LP works by determining, from the space of all feasible solutions, the values of the variables that maximize or minimize the objective function. Non-linear Programming (NLP) is similar to LP except that its objective function or constraints are non-linear. In Mixed-Integer Linear Programming (MILP), the variables of the model are mixed, such that some of the variables take real values and others take integer values [23]. The objective function and constraints in MILP are all of a linear form. If the objective function or any of the constraints are of a non-linear form, then the problem is called a Mixed-Integer Non-linear Program (MINLP). MINLP problems are much harder to solve because every possible solution must be tested [23]. MILP problems are usually solved using the branch-and-bound method, in which the search for a solution has a tree structure and a bounding function is used to determine the most promising node by estimating a bound on the objective function; that node is then selected for further growth (branching) [23]. The problem with MILP and MINLP is that they are time and memory consuming, especially for medium or large size problems.
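As a concrete illustration of an MILP (our own example, not one of the exact formulations used later in this dissertation), VM placement can be cast in the bin-packing style: let the binary variable x_{ij} be 1 if VM i is placed on PM j, y_j be 1 if PM j is powered on, c_i be the capacity demand of VM i, and C_j be the capacity of PM j. Minimizing the number of active PMs is then

\[
\min \sum_{j} y_j \quad \text{subject to} \quad \sum_{j} x_{ij} = 1 \;\;\forall i, \qquad \sum_{i} c_i\, x_{ij} \le C_j\, y_j \;\;\forall j, \qquad x_{ij},\, y_j \in \{0,1\}.
\]

Even this small model is NP-hard, which motivates the heuristic search techniques discussed next.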

In some optimization problems it is impossible to enumerate every possible solution and take the best of all, as in LP and NLP. The reason is that some problems exhibit combinatorial explosion [23], where the number of possible solutions grows exponentially with the number of elements in the problem. Therefore, other kinds of optimization methods, such as combinatorial search techniques, are used to provide a near-optimal solution in a relatively small amount of time. Combinatorial search techniques use search algorithms to find a near-optimal solution where exhaustive search is not feasible, by efficiently exploring the large solution space using, for example, heuristics. A heuristic is a method that is not guaranteed to provide the optimal solution, but provides a near-optimal solution more quickly than a classic method such as LP. Examples of combinatorial search techniques include hill climbing and beam search. Hill climbing [81] is a combinatorial search that starts from the current point of allocation in the space of possible solutions. It then finds a neighborhood of that point using a heuristic procedure, finds the neighbor with the highest value of the objective function, and moves the search to that point. This process repeats iteratively until no improvement is found or until a threshold on the maximum number of visited neighborhoods is reached. In this way, hill climbing provides a local-optimum solution. To avoid being trapped in a local optimum, hill climbing can be restarted several times from a random point. The near-optimal solution is then the best solution among the local optima. Beam search [79] is also a combinatorial search that starts from the current point of allocation in the space of possible solutions. The objective function is computed for all neighbors of that allocation. The k allocations with the highest value of the utility function are kept to continue the search; the value k is the beam. Then, the neighbors of each of the k selected points are evaluated, and so on. This iterative process continues until a given level d is reached. Several other optimization methods can be used to find an optimal or near-optimal solution in a large solution space, such as simulated annealing, genetic algorithms, and evolutionary computing [61].
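A minimal sketch of hill climbing with random restarts, assuming hypothetical problem-specific callables `random_solution()`, `neighbors(s)`, and `utility(s)` (the actual neighborhood and utility definitions used in this research appear in later chapters):

```python
def hill_climb(initial, neighbors, utility, max_steps=1000):
    """Greedy local search: repeatedly move to the best improving neighbor."""
    current, current_u = initial, utility(initial)
    for _ in range(max_steps):
        candidates = list(neighbors(current))
        if not candidates:
            break
        best = max(candidates, key=utility)
        best_u = utility(best)
        if best_u <= current_u:          # no improving neighbor: local optimum
            break
        current, current_u = best, best_u
    return current, current_u

def restart_hill_climb(random_solution, neighbors, utility, restarts=10):
    """Run hill climbing from several random starts; keep the best local optimum."""
    runs = [hill_climb(random_solution(), neighbors, utility)
            for _ in range(restarts)]
    return max(runs, key=lambda pair: pair[1])
```

Beam search generalizes the same loop by keeping the k best candidates at each step instead of a single one.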

2.5 Related Work

Resource management in cloud computing has been investigated in several research papers as an optimization problem with different QoS goals and different solution methods in order to improve the usability and the efficiency of the cloud infrastructure. Specifically, these papers provide solutions for the problems of resource allocation and VM placement, VM migration and live migration, improving scalability in the cloud environment, minimizing power consumption and energy costs, performance modeling in virtual environments, and improving QoS metrics such as response time and availability.

2.5.1 Resource Allocation and Scalability

Dynamic VM placement and consolidation has been studied in several papers with different goals such as response time, availability, or power saving. In [51], the reader can find a survey and discussion of the resource provisioning options for IaaS providers. Frameworks for an exact solution of the problem were presented in [16,36,54,78,84,90,91], while examples of approaches based on heuristics can be found in [19,27,28,33,34,38,48,80,82]. The fundamentals of an ongoing effort to develop a toolkit are presented in [30]. This toolkit supports scalable and dependable cloud platforms and architectures, enabling flexible and dynamic provisioning of cloud resources. The focus of the toolkit is adapting self-preserving cloud systems to meet predicted and unforeseen changes in resource requirements over the cloud service life cycle, including construction, deployment, and operation of services. The goal is to provide a foundation for a reliable, sustainable, and trustful cloud computing industry. The authors in [49] survey the recent literature and outline a framework for cloud resource management which lays the basis for the core of their research. They classify the available literature into sub-domains, including global scheduling, local scheduling, demand profiling, utilization estimation, pricing, application scaling, workload management, cloud management, and measurement studies. Moreover, they identify five challenges for future investigation: providing predictable performance for cloud-hosted applications, achieving global manageability for cloud systems, engineering scalable resource

management systems, understanding economic behavior and cloud pricing, and developing solutions for the mobile cloud paradigm. Finally, they provide a set of fundamental research challenges for cloud resource management. Finding the optimal deployment using heuristics has been studied in [15,62,68]. In [15], combinatorial search techniques and analytical queuing network models have been used to dynamically deploy resources in a data center. Specifically, their method uses beam search to find an optimal allocation of applications to servers as the workload varies. They used a multi-class queuing network to model online and batch jobs. The goal is to optimize a utility function of the response time for online jobs and of the throughput for batch jobs. The authors in [68] provide an autonomic controller that dynamically allocates CPU resources to various VMs under varying workload levels to optimize a global utility function. This utility function is a weighted average of response time, throughput, and probability of rejection metrics relative to SLAs. Their approach to assigning the CPU allocation to each VM considers both priority-based allocation and CPU-share-based allocation. Using beam search, the controller algorithm searches the space of CPU allocations to find a near-optimal allocation. Nevertheless, in this research we implement other autonomic methods for resource provisioning in cloud environments, such as hill climbing search with different constraints including availability and response time SLAs. Another framework was proposed in [62] to find the optimal deployment of software components onto hardware nodes while meeting a multidimensional QoS metric (latency, energy consumption, security, and availability). The authors propose four optimization algorithms: Mixed Integer Linear Programming, Mixed Integer Non-Linear Programming, a greedy algorithm, and a genetic algorithm. Each of these algorithms can be used depending on the variability and the nature of the workload. This study can be related to our topic because it provides different algorithms to optimize the allocation of software components to hardware nodes. One of these four algorithms can be used depending on system characteristics such as the rate of change in system parameters. Accordingly, in our research on resource allocation we employ different algorithms, such as heuristics, LP, or NLP, for

optimizing resource allocation in a distributed cloud environment. Some papers provide different methods for resource allocation, as in [17,77,78,84]. In [78] the authors present Wrasse, a tool that provides a generic and extensible interface to solve resource allocation problems. It uses the power of GPUs to implement a parallel solver that finds near-optimal solutions of generalized allocation problems. The proposed framework provides a language that enables easy re-formulation of the problem specification when requirements change for a bin-packing problem. Their evaluation shows that Wrasse provides solutions that are as good as those provided by the heuristics used in the experiment. However, some limitations are associated with this tool: it cannot determine the non-existence of a solution, and it does not guarantee a minimal solution. The authors in [17] introduce a management algorithm for dynamic resource allocation in virtualized server environments. The goal of the algorithm is to minimize the cost of running the data center. The cost has competing terms that penalize overcapacity (low utilization) and overloading, which causes poor application performance and violates the response time SLA. Their algorithm is based on Measure-Forecast-Remap (MFR), which starts by measuring historical data, forecasting the future demand, and then remapping VMs to PMs. They compare their algorithm with a static allocation, showing an improvement of 50% and a reduction of SLA violations of up to 20%. However, while the authors take the response time SLA into consideration, they compare the performance of their method only against a static allocation, where it would be better to compare against a dynamic allocation method.
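A minimal sketch of an MFR-style control loop in the spirit of [17]; the names `measure_demand` and `remap` are hypothetical placeholders, and the exponentially weighted moving average below is a simple stand-in for the forecasting model actually used in that paper:

```python
def ewma_forecast(history, alpha=0.5):
    """Exponentially weighted moving average as a simple demand forecaster."""
    estimate = history[0]
    for observation in history[1:]:
        estimate = alpha * observation + (1 - alpha) * estimate
    return estimate

def mfr_loop(vms, measure_demand, remap, rounds=10):
    """Measure-Forecast-Remap: periodically re-place VMs using forecast demand."""
    history = {vm: [] for vm in vms}
    for _ in range(rounds):
        for vm in vms:                                  # Measure
            history[vm].append(measure_demand(vm))
        forecasts = {vm: ewma_forecast(h)               # Forecast
                     for vm, h in history.items()}
        remap(forecasts)                                # Remap VMs to PMs
```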

A unified method that considers policies to place application replicas and to distribute client requests among them was discussed in [77]. In [84], a bin-packing formulation that maximizes resource satisfaction in a data center is proposed. The authors in [74] solve the problem of virtual resource allocation in networked cloud environments, specifically the node mapping phase. They propose a method for mapping user requests to virtual resources using a mixed integer programming (MIP) solution capable of taking QoS requirements into consideration. Their goal is to minimize the mapping cost and the overall number of hops for every virtual link mapped to a substrate path, reducing the number of hops along a traffic route between two communicating nodes. However, in our research we use heuristics based on hill climbing to maximize the revenue and minimize the communication cost between communicating VMs in distributed cloud environments subject to SLA constraints. Other methods, such as Branch-and-Bound, are presented in [6]. The authors present an autonomic adaptation process for multi-cloud applications that relies on a Branch-and-Bound algorithm to optimize the adaptation process when selecting the services to be deployed within the application. Their algorithm selects the best configuration when adapting a multi-cloud application by choosing the services that properly fit the quality properties specified by the application. Moreover, other methods have been proposed for auction-based VM allocation, as in [63]. The authors in [63] address the problem of auction-based VM allocation and provisioning by designing an autonomic approximation mechanism that uses a dynamic programming algorithm to optimally select the winning users. Their mechanism determines the payment the users have to make for using the allocated resources. In their setting, users are allowed to request bundles of heterogeneous VM instances, for example requesting communication-intensive VMs and computation-intensive VMs together. Their proposed strategy-proof polynomial time approximation scheme (PTAS) mechanism can find an allocation of resources to the users that maximizes the social welfare, where the social welfare is the sum of the users' valuations. However, in our research we use autonomic methods for VM allocation with the goal of maximizing the cloud provider's revenue while satisfying specific SLAs. We also provide heuristic search techniques that allocate communicating VMs as close as possible to each other to increase the revenue and minimize communication costs. Some research [8,9,41,48,56] has studied the characteristics of the servers in the cloud infrastructure, such as server structure or server heterogeneity. The organization of servers in the cloud infrastructure has been studied in [48] to minimize response time and provide data locality. The authors presented SCAVP, a structural constraint-aware virtual machine placement system, to improve the performance and availability of services hosted on IaaS clouds. SCAVP supports three types of constraints (demand, communication, and availability) and uses approximation algorithms to find a sub-optimal solution.

Their hierarchical structure of servers is organized into regions, zones, and racks, with the main objective of minimizing the communication cost while satisfying both demand and availability constraints. However, their solution is based on a graph approach, where the application is transformed into a VM graph and the data center specification is transformed into a data center graph; they then apply a divide-and-conquer methodology for VM placement. In our study, we use a heuristic algorithm as the core of VM placement onto PMs. The authors in [18] propose an optimization framework that improves fault tolerance and reduces bandwidth usage by optimizing the allocation of applications to physical machines. They present a detailed analysis of a large-scale web application and its communication patterns. However, much information is needed as input to their framework (e.g., the network topology of the cluster, the services running in the cluster and the number of machines required for each service, the list of fault domains in the cluster and the list of machines belonging to each fault domain, and the traffic matrix for services in the cluster). In our research, we provide different heuristic algorithms that use the communication matrix between communicating VMs to improve the VM allocation on an online basis and minimize the overall costs. The authors in [56] address the problem of heterogeneous servers in a cloud environment, in which servers have different capabilities and the workload exhibits heterogeneous fluctuations. They provide a solution for scheduling jobs on heterogeneous servers so that each one receives its fair share of resources to make progress while providing good performance. Their method is based on dividing the servers into two pools of core and accelerator nodes, and then dynamically adjusting the size of each pool to reduce cost or improve utilization in order to find a good cost-performance trade-off. They also provide a job scheduling method called Progress Share (PS) that captures the contribution of each resource to the progress of a job; based on that, the method calculates the share of computing resources for each server. However, they did not include data locality in the cost calculation, which is a critical factor in reducing response time and improving performance. All the above-mentioned solutions do not consider the problem of maximizing the revenue of a cloud service provider given communication patterns among VMs in a hierarchically organized cloud infrastructure.

There are, however, a few notable exceptions. One is the work of Hu et al. [41], which shows how to collect information about intra-ensemble VM interactions for co-locating communicating VMs in order to reduce the consumption of bi-section bandwidth. The work of Ballani et al. [8] uses simulation to argue for a pricing model that is location independent for cloud consumers. Their work proposes dynamic resource provisioning, which essentially charges consumers based on the dominant resource, i.e., the greater of the occupancy and network prices. The work in [8] does not discuss any heuristic for VM allocation based on VM communication patterns. In [9], Ballani et al. present the design of virtual network abstractions that capture the trade-off between the performance guarantees offered to cloud consumers, their costs, and the provider's revenue. Their work describes Oktopus, a system that implements these abstractions. The authors in [29] focus on how a cloud provider can maximize SLA-based revenues by proper resource allocation. They provide a queuing-theory-based mathematical formulation for the resource allocation problem that includes various parameters such as resource quantity, request arrivals, service time, and the pricing model. The objective is to find how many servers should be assigned to each service instance in order to achieve maximum revenue. They use the mean response time to evaluate service performance, and their model considers each customer as being served by an M/M/1 queue.
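For reference (in our notation, not necessarily that of [29]), an M/M/1 queue with Poisson arrivals at rate λ and exponential service times with mean S = 1/μ has mean response time

\[
R \;=\; \frac{1}{\mu - \lambda} \;=\; \frac{S}{1 - \rho}, \qquad \rho = \lambda S < 1,
\]

where ρ is the server utilization. The mean response time grows without bound as ρ approaches 1, which is what ties a revenue-maximizing server assignment to response time SLAs.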

However, in our research we aim to maximize the revenue using heuristic search based on hill climbing techniques. The authors in [98] solve the problem of dynamic pricing in an IaaS cloud. They present a revenue maximization framework that formulates dynamic pricing as a stochastic dynamic program. Their model takes into consideration that prices are charged per instance per time unit, and as a result the demand departure process is explicitly modeled. Their framework provides the optimality conditions and structural results of the optimal pricing policies. They also include a general non-homogeneous demand model. However, they do not use a heuristic for revenue maximization with QoS constraints. The authors in [89] provide a heuristic scheduling algorithm, called the hyper-heuristic scheduling algorithm (HHSA), that can be used with the rule-based scheduling algorithms widely used in many cloud computing systems. The algorithm reduces the makespan by using diversity detection and improvement detection operators to dynamically determine which low-level heuristic is to be used in finding better candidate solutions. They compare their method with existing scheduling algorithms used in CloudSim. The authors in [40] formulate and optimize the problem of VM placement in public cloud gaming systems with the goal of maximizing the total net profit for service providers while maintaining just-good-enough gaming QoE. They develop a heuristic algorithm that consolidates more VMs on a server as long as the user-specified maximal tolerable QoE degradation is not exceeded. However, our heuristic is based on hill climbing and takes into consideration a group of VMs that communicate with each other to perform a certain task. The authors in [32] address the problem of VM placement in a large-scale data center with compute, network, and availability constraints as an integrated solution. Their technique is based on cold spots, which are collections of compute nodes that provide high availability. They then cluster connected VMs to allocate them in the cloud using a graph-based search algorithm. Our technique uses a heuristic algorithm that builds a maximum spanning tree to determine the placement order of VMs based on their communication strength, in order to place them in close proximity using hill climbing with the goal of minimizing the communication cost between VMs. In [86], the author provides an algorithm for VM placement that takes a collection of VMs, called a pattern, and finds a near-optimal deployment in the cloud that satisfies availability and capacity constraints. Their algorithm is based on the importance sampling technique. Also, the authors in [85] used a statistical sampling method (cross-entropy) to solve a VM placement problem where VMs are clustered based on their communication needs; they consider communication and availability constraints. However, our work introduces four heuristic algorithms for allocating communicating VMs in a hierarchical data center with the goal of maximizing the revenue subject to capacity constraints.
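To illustrate the spanning-tree ordering step mentioned above, here is a sketch of one way such an ordering can be derived from a symmetric communication matrix (a Prim-style greedy construction; the actual algorithm used in this research is presented in Chapter 6):

```python
def placement_order(comm):
    """Order VMs for placement via a maximum-spanning-tree style greedy pass.

    comm[i][j] holds the communication strength between VMs i and j.
    Each step adds the unplaced VM with the strongest link to any placed VM,
    so heavily communicating VMs end up adjacent in the placement order.
    """
    n = len(comm)
    order = [0]                          # start from an arbitrary VM
    remaining = set(range(1, n))
    while remaining:
        nxt = max(remaining,
                  key=lambda v: max(comm[v][p] for p in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Example: three chatty VMs (0, 1, 3) and one quiet VM (2).
comm = [[0, 9, 1, 8],
        [9, 0, 2, 3],
        [1, 2, 0, 1],
        [8, 3, 1, 0]]
print(placement_order(comm))  # [0, 1, 3, 2]
```

Placing VMs in this order lets a hill-climbing step try the slots closest to already-placed communication partners first.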

The authors in [69] use traffic-aware placement of VMs to improve network scalability. Their heuristic algorithm partitions VMs and hosts into clusters and then places the VM clusters in close proximity with the goal of minimizing network traffic. Their input includes a traffic matrix and a cost matrix among host machines. Our work uses a communication matrix as input to determine the communication strength between VMs, and then uses a heuristic algorithm to solve the placement problem with the goal of minimizing the cost. Some research has been conducted to implement methods for autonomic cloud computing environments that improve the elasticity and adaptivity of resource provisioning [28,57,58]. In [58], the authors propose a control policy named proportional thresholding that takes into account the coarse-grained actuators provided by resource providers. It modifies an integral controller by using a dynamic target range instead of a static target value. The goal is to adapt feedback control to a cloud computing infrastructure, which differs from feedback control in other computer systems. The authors extended their work to provide control policies for elastic scaling based on autonomic control in [57]. They also used proportional thresholding in the controller to determine the size of the storage cluster and scale the storage tier in a data-intensive, cluster-based multi-tier service. The goal is to handle unpredicted changes in workload while ensuring that the guest pays the minimum cost to meet its Service Level Objective (SLO). The ability of the cloud environment to scale by adding or removing computing resources as the workload changes dynamically is a critical point. The authors in [28] present SmartScale, an automated scaling framework. It determines the optimal way of scaling, either vertically or horizontally, depending on the application's needs. In vertical scaling, more resources are added to the existing VM instances that are already running; in horizontal scaling, more VM instances are added. SmartScale is used to find the optimal way to allocate resources to a specific application in the cloud as the workload varies, in a way that optimizes both resource usage and the reconfiguration cost incurred due to scaling. Their proposed model scales an application by changing the number of VMs versus changing the resources assigned to VMs. The authors in [60] provide a review of the existing techniques for auto-scaling applications in cloud environments. They propose a classification of these techniques into five main categories: static threshold-based rules, control theory, reinforcement learning, queuing theory, and time series analysis.
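As a rough illustration of threshold-based elasticity control in the spirit of the proportional thresholding of [57,58] (a sketch under our own simplifying assumptions, not their exact policy), the lower utilization bound can be tied to the cluster size so that the target range narrows as the cluster grows and the relative effect of adding or removing one node shrinks:

```python
def target_range(cluster_size, floor=0.5, high=0.8):
    """Dynamic target range: the utilization band tightens as the cluster grows."""
    low = high - (high - floor) / max(cluster_size, 1)
    return low, high

def scaling_decision(avg_utilization, cluster_size):
    """Return +1 to add a node, -1 to remove one, 0 to hold."""
    low, high = target_range(cluster_size)
    if avg_utilization > high:
        return +1
    if avg_utilization < low:
        return -1
    return 0
```

With 2 nodes the band is [0.65, 0.8]; with 20 nodes it tightens to [0.785, 0.8], so large clusters track the high-utilization target more closely while small clusters avoid oscillating.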

They also provide a clear definition of the auto-scaling problem and explain the different experimental platforms used to test auto-scaling algorithms. A study has been presented in [94] on automating cloud resource selection. The authors present Conductor, a system that helps customers decide which services to use when deploying MapReduce computations in the cloud with the goal of minimizing cost. By using optimization techniques to determine an execution plan, the system automatically selects the best cloud services to use. If any changes appear at runtime, the system adapts to them. The goal of the deployment is cost savings while meeting deadline requirements. However, this system is dedicated to MapReduce computations. Frameworks for minimizing the cost of resources in the cloud were presented in [31,50,83,93,96,100,101]. In [96], resource allocation algorithms were presented for SaaS providers to minimize infrastructure cost and SLA violations, such as response time violations. Their algorithms are based on mapping and scheduling mechanisms and policies for translating the customers' QoS requirements into infrastructure-level parameters and for allocating VMs to serve their requests. However, they do not use heuristic algorithms. In [31], the authors present an integer linear program (ILP) formulation for the problem of scheduling SaaS customer workflows onto multiple IaaS providers to meet response time SLAs. They use heuristics to solve the relaxed version of the presented ILP and find integer solutions for the solver parameters. In [100], the authors present evolutionary algorithms to minimize the resource usage of SaaS providers and improve execution time. The authors in [101] proposed a penalty-based grouping genetic algorithm for clustering multiple composite SaaS components in the cloud. Their goal is to minimize the resources used by the SaaS by clustering its components without violating specific constraints. Their algorithm considers the SaaS resource and communication requirements. The authors in [83] and [93] minimized the infrastructure cost by using a multi-tenant SaaS model where a single instance of a software application serves multiple customers (tenants), but they did not use heuristics for optimization. In [50], a toolkit called Services2Cloud was implemented, with which users can compute revenue automatically from probabilistic patterns.

It is used to assist service providers in the analysis of their expected revenue based on customer subscriptions and service usage. However, they did not consider cost minimization. These solutions do not apply heuristic search techniques to the problem of minimizing the cost for a SaaS cloud provider subject to response time SLAs.

2.5.2 Power and Energy Consumption

Cloud environments consist of large-scale distributed data centers across multiple geographical regions. The cost to operate these huge environments is typically dominated by electric energy bills. Therefore, there is a need to develop novel optimization methods that minimize energy consumption by enabling efficient server utilization. Many solutions have been proposed in the literature. In [12], heuristics are presented for dynamic VM consolidation in cloud providers with the goal of minimizing energy consumption while meeting CPU performance SLAs. These adaptive heuristics are used for dynamic consolidation of VMs based on an analysis of historical data of the resource usage by VMs. Their approach uses live VM migration, migrating the VM with the minimum migration time between hosts in order to minimize power consumption. Their algorithms were evaluated using CloudSim and real-world workload traces. They assume that the workload is not completely random, so future events can be predicted based on past data. While quite interesting, this work considers neither availability SLAs nor the maximization of total revenue. In another work [20], the authors proposed an approach to the problem of power-efficient allocation of VMs in virtualized heterogeneous computing environments. Their technique considers the minimum, maximum, and proportion of the CPU allocated to VMs sharing the same resource. The amount of resources allocated to a VM can be adjusted based on available resources, power costs, and application utilities derived from power-performance trade-offs. However, there are some limitations to their approach: the allocation of VMs is not adapted at run-time (the allocation is static), and no other resources besides the CPU are considered during the VM reallocation.

Resource allocation policies for cloud virtualized environments that identify performance and energy trade-offs and provide a priori availability guarantees for cloud end-users are presented in [1]. The authors in [33] aim to minimize the total power and migration costs in cloud computing systems under performance-related SLAs, more specifically, upper bounds on the response times for serving client requests. They solve this optimization problem using a heuristic algorithm based on convex optimization and dynamic programming. Their simulation results show that considering the SLA together with effective VM placement can help minimize the operational cost of a cloud computing system. Regarding VM migration, this paper aims to minimize only the migration cost, not the number of migrations for each resource allocation request. In [27], the authors propose a flexible energy-aware framework to address the problem of energy-aware allocation/deallocation of VMs. The goal is to lower the power consumption while fulfilling SLAs. The proposed optimizer relies on Constraint Programming (CP) to compute a configuration of VMs that minimizes the power consumption. Their approach also uses branching heuristics to choose which variable to instantiate and which value to try for each variable, in order to guide the solver to a near-optimal solution. The results of their experiments show that the approach is capable of saving energy and CO2 emissions in a real-world scenario by 18% on average, and its scalability was tested by splitting the problem into several parts to enable parallel computation of a solution. However, the authors did not show the trade-off between energy consumption and performance when using their approach. The authors in [97] present the design, implementation, and evaluation of a resource management system for cloud computing services that allocates data center resources dynamically based on application demands and supports green computing by optimizing the number of servers in use. In their algorithms, they use the skewness metric to combine VMs with different resource characteristics onto physical servers. Their goal is to prevent overload in the servers and save energy.
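For context, the skewness metric of [97], as we read it (reproduced here for illustration), quantifies how unevenly the different resources of a server p are utilized: if r_i is the utilization of the i-th of n resources and \bar{r} is their average, then

\[
\mathrm{skewness}(p) \;=\; \sqrt{\sum_{i=1}^{n} \left( \frac{r_i}{\bar{r}} - 1 \right)^{2}},
\]

and minimizing skewness evens out utilization across resource types, which lets VMs with complementary demands (e.g., CPU-heavy and memory-heavy) be packed onto the same server.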

The authors in [64] consider the issue of energy-related costs in data centers and cloud infrastructures. To address it, they present ecoCloud, a self-organizing and adaptive approach for consolidating VMs along two resource dimensions, CPU and RAM. Decisions on the assignment and migration of VMs are driven by probabilistic processes and are based on local information. They propose a mathematical model to test ecoCloud in a range of scenarios by varying parameter values, and they report experiments on data centers. The goal is to consolidate VMs on as few physical servers as possible and switch the other servers off, minimizing power consumption while ensuring a good level of QoS. In [76], the authors present a power and performance model based on experimentation. Their model expresses power and response time as functions of the total CPU utilization of the virtualized system. Instead of using simulators, they used actual hardware measurements on a two-server computer system and a number of benchmarks to conduct experiments. Then, they constructed their models based on the results of those experiments. An automated management tool for virtual data centers, ipoem, was presented in [102]. It is an integrated power and performance management middleware intended to guide administration decisions. Users specify a target location in terms of system performance and power cost, and ipoem returns the management configurations and operations required to drive the system to the destination status using a system positioning search algorithm. The operating cost in this work measures the cost of VM migrations and the cost of response time SLA violations due to server overloading. However, their algorithm is deterministic and terminates at the first configuration that satisfies the desired power and performance targets; it does not heuristically search for a better configuration that further minimizes the operational cost or the energy consumption.

2.5.3 Performance Modeling in Virtual Environments

Performance models are used to predict the values of a system's performance measures [67]. These performance models can provide information for resource allocation in order to optimize the system configuration. Several papers have proposed performance models for virtualized environments.

The research in [92] developed performance models for resource allocation. The authors propose a probabilistic performance modeling approach to model the probability distribution of an application performance metric conditioned on one or more variables that can be measured or controlled, such as system resource utilization and allocation metrics. These performance metrics are modeled in terms of percentiles. Their model can be used to estimate response time distributions for a variety of scenarios. However, prior knowledge is required to use their model. The research in [7, 88] studies the performance degradation resulting from CPU contention. In [88], the authors describe the main challenges in modeling VM performance estimation. Many key points must be taken into consideration when modeling VM performance, such as accounting for both visible and invisible resources, since the performance of a VM also depends on the interference of other VMs running on the same server. Thus, the authors of [88] investigate core and cache contention effects. However, they did not provide an analytical performance model that can be used with different VMMs to predict performance. In [7], the authors analyze the performance of server consolidation using the vConsolidate benchmark for different workload characteristics, such as a web server, a mail server, and a database application running simultaneously. Their goal is to build a modeling framework that can be used to predict the performance of future platforms and configurations. By running experiments with vConsolidate, they discovered that the performance degradation of a CPU-intensive workload is due to cache and core interference. However, their results come from measurements of a running VMM rather than from analytical or mathematical performance prediction models. Work has also been done on estimating the resource requirements of applications when they are moved to a virtual environment, as in [14, 95, 99]. In [14], the authors propose an open-class queuing network model for predicting the performance of applications when migrated to run on a Xen virtual system instead of a Linux system. The performance model enables the prediction of performance measures before committing to the migration by predicting bounds on the virtual environment's performance.

Specifically, these metrics include response time, service demands, and slowdown, i.e., the overhead of using virtualization. The authors in [95] present an automated approach to estimate the additional resource requirements incurred by virtualization overheads. Their model is based on a number of micro-benchmarks that profile the different types of overhead caused by virtualization, and on a regression-based model that maps native system usage into virtualized system usage. An analysis of the performance overheads of virtualization in cluster-based environments was presented in [99]. The authors provided a performance model based on traditional benchmarks focusing on CPU, memory, I/O, and network to collect performance metrics. The goal of the study was to find performance bottlenecks when running high performance computing (HPC) applications in a virtual cluster. However, their performance model does not dynamically predict workload variations in order to scale resource usage.

2.5.4 Live VM Migration

Live migration is an essential topic in cloud computing and has been studied in a variety of contexts, in particular to provide methods and approaches that enhance the performance of live VM migration and thus a more reliable distributed virtual environment. In [24], the authors studied and described the live migration of entire OS instances and discussed the design issues involved in live migration, including minimizing downtime, minimizing total migration time, and ensuring that migration does not disrupt active services in the network. The authors in [44] carried out a performance analysis of virtualized environments and examined the performance overhead in VMware ESX and Citrix XenServer virtualized environments. They created regression-based models for virtualized CPU and memory performance in order to predict the performance when migrating applications. In [59], the authors provided a live migration performance and energy model. The model's primary goal is to determine which VM should be migrated within a server farm at minimum migration cost.

In their experiments, they designate the most frequently dirtied memory pages as hot pages for their workloads. They use linear regression methods and show that migration is an I/O-intensive application. Another performance model, of concurrent live migration, is proposed in [53]. The authors experimentally collect performance data from a virtualized system and then construct a model of the performance characteristics of live migration using PRISM, a probabilistic model checker. In [43], a self-adaptive resource allocation method based on online architecture-level performance prediction models is proposed. The method considers cloud dynamics triggered by variations in application workload. The authors of [71] propose a memory page selection mechanism that chooses which memory pages to transfer during live VM migration. Their approach is based on the probability density function of the changes made by virtual machines to memory pages and can help reduce the live migration downtime. The authors in [42] developed a testing framework that measures performance characteristics of VM live migration, including total migration time, VM downtime, and the amount of data transferred over the network during migration. They apply their testing framework and present the results of live migration in virtualization systems including KVM, XenServer, VMware, and Hyper-V. They compare the performance of these virtualization systems experimentally to help cloud administrators choose the one that best fits their infrastructure. In contrast, our research provides analytic performance models that determine the live migration metrics, and we conduct experiments in a simulated cloud environment. The behavior of iterative pre-copy live migration for memory-intensive applications has been studied in [46], which proposes an optimized pre-copy strategy that dynamically adapts to the memory change rate in order to guarantee convergence. Their algorithm, implemented in KVM, detects memory update patterns and terminates migration when improvements in downtime are unlikely to occur. A simulator based on Xen's migration algorithm is designed in [2] to characterize the downtime and total migration time. However, that simulation model is based on dynamic information collected during the pre-copy iterations; thus, it is hard to use for an a priori migration decision made before the migration begins.

In [13], the authors proposed a framework for automatic machine scaling that meets consumer performance requirements and minimizes the number of provisioned virtual machines. This optimization reduces the cost resulting from over-provisioning and the performance issues related to under-provisioning. The authors in [22] describe recent accuracy and scalability advances made in the SimGrid simulation framework. They assess the quality of competing algorithms with respect to objective metrics such as execution time, throughput, and power consumption by comparing these metrics across multiple experiments. Their results show that SimGrid compares favorably to most existing simulators in terms of scalability, accuracy, or the trade-off between the two. A simulation framework for VM live migration, integrated as an extension of the SimGrid toolkit, is proposed in [39]. The authors developed micro-benchmarks for a resource-share calculation mechanism for VMs and a live migration model implementing the pre-copy migration algorithm. Their model calculates the migration time, the migration traffic, and the data exchanged within the whole system, allowing users to obtain results for systems where migrations play a major role. In contrast, our live migration model is an analytical performance model that calculates several parameters of the live migration process under uniform and non-uniform dirtying rates. We also present a non-linear optimization model that minimizes the downtime subject to network utilization constraints.

2.5.5 Summary of Related Work

To the best of our knowledge, none of the above-mentioned solutions considers the problem of maximizing the revenue of a cloud service provider under availability constraints in an IaaS setting. Papers [54] and [38] consider the problem of reducing VM migration during reconfiguration. None of the above studies provides an optimization model whose goal is to minimize the VM's downtime subject to constraints such as network utilization. Our research presents a detailed analytical model of pre-copy based live VM migration and includes the case of hot pages in the prediction and estimation of the VM's downtime, total migration time, number of iterations needed before downtime, gain, and network utilization.

The closest approach to our hierarchical data center solution was proposed in [48]. There, the authors presented SCAVP, a structural constraint-aware virtual machine placement system, to improve the performance and availability of services hosted on IaaS clouds. However, this approach does not consider the problem of minimizing VM migration during system reconfiguration. Our solution introduces four heuristic algorithms for allocating communicating VMs in a hierarchical data center. Our technique builds a maximum spanning tree to determine the placement order of VMs based on their communication strength and then uses hill climbing to place strongly communicating VMs in close proximity, with the goal of minimizing the communication cost between them. Moreover, this research proposes an autonomic solution for resource management in a SaaS cloud where software services are provided dynamically at different prices and in different categories, enabling consumers to subscribe to or unsubscribe from software services. The cloud provider's goal is to minimize infrastructure cost and ensure the delivery of software services to customers while meeting response time SLAs and resource constraints.

Chapter 3: Analytic Performance Modeling and Optimization of Live VM Migration

3.1 Introduction

Virtualization platforms provide support for entire virtual machines (VMs) to be migrated from one physical machine to another should the need arise. Earlier techniques relied on stop-and-copy approaches, by which the VM was stopped and its address space copied over the network to a different physical machine before the VM was restarted at the target machine. This technique could lead to long VM downtimes. More recently, VM hypervisors started to offer live VM migration approaches that allow pages of the address space to be copied while the VM is running. If any copied page is dirtied (i.e., modified), it has to be copied again. The process stops when at most a fraction α of the pages remains to be copied. Then, the VM is stopped and these remaining pages are copied. The main contributions of this chapter are: (1) analytic performance models to compute the VM downtime, the total number of pages copied during migration, and the network utilization due to VM migration, as functions of α and other system parameters; (2) analytic performance models for the case in which a fraction of the pages of the address space are hot pages (i.e., have a higher dirtying rate than the other pages); and (3) a non-linear optimization model to find the value of α that minimizes the VM downtime subject to constraints on network utilization. Some of the results in this chapter were published in [3]. The rest of the chapter is organized as follows. Section 3.2 provides some background on VM migration and introduces the problem statement. Sections 3.3 and 3.4 then provide the analytic models for the case in which all pages have the same dirtying rate and for the case in which some pages ("hot pages") have a higher dirtying rate than the rest. Section 3.7 describes the optimization problem. The results of the experiments are discussed in Section 3.8. Finally, Section 3.9 concludes the chapter.

3.2 Background and Problem Statement

Live migration is the process of migrating the contents of a VM's memory from one physical host (source VM) to another (target VM) while the VM is executing. The goal is to minimize both the downtime (the period during which the VM's execution is stopped) and the total migration time (the duration of the end-to-end migration, from the moment the migration is initiated until the source VM is discarded) [24]. In contrast to live migration, stop-and-copy [24, 47] is considered the simplest VM migration technique; it involves suspending the source VM, copying all its memory pages to the target VM, and then starting this new target VM. Although this approach is easy to implement and control, it can cause long VM downtimes and total migration times, especially for practical applications and VMs with large memory sizes, leading to performance degradation and unacceptable VM outages. The live migration approach discussed in this chapter uses pre-copy based migration [24, 87], in which memory pages are copied from the source VM to the target VM iteratively. While the source VM continues to execute, the migration process starts by copying all pages in the first round and then, at each subsequent round i, copies the pages modified (dirtied) during round i−1. Dirty pages are memory pages that have been modified during the migration process while the source VM is still running. The hypervisor tracks the dirty pages at each iteration in order to re-send them. This iterative process continues for a fixed number of iterations or until a small working set size is reached. After that, the source VM is stopped and the downtime phase starts in order to transfer the remaining active memory contents of the source VM. Since most of the source VM's memory contents were already transferred during the pre-copy phase, the downtime is significantly reduced, except in some special cases.

Current virtualization software supports live migration of VMs with very short downtimes (depending on the workload), ranging from tens of milliseconds to a few seconds [35]. Examples of such support are VMware [72] and Xen [10], an open-source virtual machine monitor (VMM) allowing multiple commodity operating systems to share conventional hardware. Many parameters can affect the performance of the live migration process, such as the size and number of memory pages, the dirtying rate, and the network bandwidth. In this study we analytically model and optimize the parameters of the problem stated above. Our model quantitatively predicts the performance of the live migration process. The goal of our optimization is to minimize the VM's downtime subject to a resource constraint; in other words, the goal is to determine the optimal point at which the pre-copy phase should stop so as to provide the lowest VM downtime subject to the resource constraint. We also take into consideration the concept of hot pages, i.e., the set of pages that are updated very frequently.

3.3 Analytic Model of Live Migration With Uniform Dirtying Rate

Let us define the following:

P_s: number of memory pages currently on VM s (0 ≤ j ≤ P_s, j ∈ ℕ).
s: source VM selected to be migrated.
t: the newly instantiated target VM.
B: available network bandwidth, in KB/sec, between source VM s and target VM t.
S: size of a page in KB.
τ: time to transmit a page over the network; τ = S/B.
n: last iteration during which pages are migrated before downtime. It is a threshold used to stop the migration process; it can either be a fixed number of iterations or the number of iterations until a small working set size is reached (0 ≤ i ≤ n, i ∈ ℕ).
D: average memory dirtying rate in pages/sec. The derivations in this chapter do not require any specific distribution for page modifications; only the average dirtying rate is needed.
ρ: network utilization during live migration; ρ = D · τ.
P(i): number of pages copied during iteration i. Note that P(0) = P_s because the entire address space is copied during the first iteration.
T(i): time spent in iteration i. Note that T(0) = P(0) · τ = P_s · τ.
U_net: utilization of the network due to VM migration.

The number of pages copied from VM s to VM t at a given iteration i is equal to the number of pages dirtied during the previous iteration. Thus,

P(i) = D · T(i−1).   (3.1)

The time spent at iteration i is equal to the time spent transmitting the pages that need to be transferred at that iteration. So,

T(i) = P(i) · τ.   (3.2)

Using Eq. (3.1) in Eq. (3.2) we obtain the following recursive expression for T(i):

T(i) = T(i−1) · D · τ = T(i−1) · ρ.   (3.3)

Solving the recursion in Eq. (3.3) and noting that T(0) = P_s · τ provides us with the following closed-form expression for T(i):

T(i) = P_s · D^i · τ^(i+1) = P_s · τ · ρ^i   for i ≥ 0.   (3.4)

Then, using Eq. (3.4) in Eq. (3.1) gives us a closed-form expression for P(i):

P(i) = P_s · ρ^i   for i ≥ 0.   (3.5)

Because P(i) ≤ P_s for i ≥ 0, Eq. (3.5) implies that ρ ≤ 1. We will assume throughout the chapter that ρ < 1 as our steady-state condition. Pages are copied while the source VM is live during iterations 0 to n. Then, the VM is taken down and all pages that were dirtied during iteration n, i.e., P(n+1) pages, have to be copied. Thus, using Eq. (3.5), the VM downtime, defined as T_down, can be computed as

T_down = P(n+1) · τ = P_s · τ · ρ^(n+1).   (3.6)

The time during which pages are being copied and the VM is up, T_pre-copy, is

T_pre-copy = Σ_{i=0}^{n} T(i) = Σ_{i=0}^{n} P_s · τ · ρ^i = P_s · τ · (1 − ρ^(n+1)) / (1 − ρ).   (3.7)

The total VM migration time is then the sum of the durations of all iterations during the pre-copy phase (i.e., iterations 0 to n) plus the downtime. Thus,

T_total = T_pre-copy + T_down = P_s · τ · (1 − ρ^(n+1))/(1 − ρ) + P_s · τ · ρ^(n+1) = P_s · τ · (1 − ρ^(n+2))/(1 − ρ).   (3.8)

If the threshold n is defined as the number of iterations such that at most α · P_s pages (with α < 1) need to be migrated, we can write

P(n+1) = P_s · ρ^(n+1) ≤ α · P_s.   (3.9)

Applying natural logarithms to both sides and noting that ρ = D · τ < 1, we obtain

n ≥ ln α / ln ρ − 1.   (3.10)

Given that we want to use the smallest number of iterations such that at most α · P_s pages need to be migrated,

n = ⌈ln α / ln ρ − 1⌉.   (3.11)

Since n must be ≥ 0, it follows that α ≤ ρ. Note that n is independent of the size of the address space of the source VM. The total number of pages migrated up to iteration i can be obtained as

NMP(i) = P_s · Σ_{j=0}^{i} ρ^j = P_s · (1 − ρ^(i+1)) / (1 − ρ)   (3.12)

and the total number of pages migrated, P_TotalMig, is then

P_TotalMig = NMP(n) + α · P_s = P_s · [(1 − ρ^(n+1))/(1 − ρ) + α].   (3.13)

We now define the gain G in downtime as the ratio between the downtime without live migration and the downtime with live migration. The downtime without live migration is equal to the time to copy the entire address space, i.e., P_s · τ.

Thus, using Eq. (3.6), we obtain

G = (P_s · τ) / T_down = (P_s · τ) / (P_s · ρ^(n+1) · τ) = 1 / ρ^(n+1).   (3.14)

Because ρ < 1, G > 1, which means that the downtime without live migration is higher than that with live migration by a factor of 1/ρ^(n+1). It is interesting to note that the gain is independent of the size of the address space of the source VM. The utilization of the network, U_net, due to VM migration can be computed as follows. During live copying, the network utilization is ρ. During the period in which the VM is down, the network utilization due to the copying of α · P_s pages is [α · P_s · (S/B)] / T_down = (α · P_s · τ) / T_down. The fraction of time during which live copying takes place is T_pre-copy / (T_down + T_pre-copy) and the fraction of time during which copying takes place while the VM is down is T_down / (T_down + T_pre-copy). Thus, the average network utilization due to VM migration is

U_net = ρ · T_pre-copy / (T_down + T_pre-copy) + [(α · P_s · τ) / T_down] · T_down / (T_down + T_pre-copy).   (3.15)

Using Eqs. (3.6) and (3.7) in Eq. (3.15) and doing some algebraic manipulation provides

U_net = [ρ − ρ^(n+2) + α · (1 − ρ)] / (1 − ρ^(n+2)).   (3.16)

Note that the utilization U_net does not depend on P_s.
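To make these closed-form results easy to apply, the following Python sketch (our illustration; it is not part of the original study) evaluates Eqs. (3.6)-(3.16) for the uniform dirtying rate case. The parameter values in the example are taken from Table 3.2.

import math

def uniform_migration_metrics(Ps, S, B, D, alpha):
    """Live migration metrics for the uniform dirtying rate case (Sec. 3.3)."""
    tau = S / B                          # time to transmit one page: tau = S/B
    rho = D * tau                        # network utilization while live copying
    assert alpha <= rho < 1, "the model requires alpha <= rho < 1"
    n = math.ceil(math.log(alpha) / math.log(rho) - 1)             # Eq. (3.11)
    return {
        "n": n,
        "T_down": Ps * tau * rho ** (n + 1),                       # Eq. (3.6)
        "T_pre_copy": Ps * tau * (1 - rho ** (n + 1)) / (1 - rho), # Eq. (3.7)
        "T_total": Ps * tau * (1 - rho ** (n + 2)) / (1 - rho),    # Eq. (3.8)
        "P_TotalMig": Ps * ((1 - rho ** (n + 1)) / (1 - rho) + alpha),  # Eq. (3.13)
        "G": 1 / rho ** (n + 1),                                   # Eq. (3.14)
        "U_net": (rho - rho ** (n + 2) + alpha * (1 - rho)) / (1 - rho ** (n + 2)),  # Eq. (3.16)
    }

# Table 3.2 values: tau = 16/60 sec and rho = 2 * (16/60), about 0.53
print(uniform_migration_metrics(Ps=4096, S=16, B=60, D=2, alpha=0.10))

For these inputs the sketch gives n = 3 (i.e., four pre-copy iterations, counting iteration 0) and a downtime of roughly 88 seconds, consistent with the magnitudes discussed in Section 3.8.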

3.4 Analytic Model of Live Migration With Hot Pages

Most programs exhibit a locality of reference such that a relatively small number of pages has a much higher probability of being modified than the others. We call them hot pages, as in [24]. We define some additional notation for this case:

β: fraction of hot pages in the address space of the source VM s.
D_nh: dirtying rate of the non-hot pages.
D_h: dirtying rate of the hot pages; D_h > D_nh.

We show in what follows how the model of the previous section can be adapted to the following two situations: (1) Hot I: all pages, including hot pages, are migrated during pre-copy; and (2) Hot II: hot pages are not copied during pre-copy; instead, they are copied when the VM is taken down. Figure 3.1 shows how the ratio HG (for Hot page Gain) varies with β for three values of α (10%, 40%, and 70%). This ratio is defined as the VM downtime under Hot I divided by the VM downtime under Hot II. The curves show that for the two smallest values of α, the VM downtime is smaller when hot pages are migrated during pre-copy than when they are migrated only when the VM is down. Also, the ratio decreases as β increases, i.e., as there are more hot pages in the address space of the VM. For the large value of α (70%), the situation reverses: the downtime when hot pages are copied only while the VM is down is always smaller than when hot pages are copied during pre-copy. The intuitive explanation is that Hot I copies hot pages during pre-copy. Thus, lower values of α imply more iterations and more opportunities for the hot pages to be copied during pre-copy, and consequently less downtime. For higher values of α, fewer hot pages are copied during pre-copy under Hot I, and Hot II has a smaller downtime. The following two subsections provide models that quantify the trade-offs between these two alternatives.

Figure 3.1: HG vs. β for three values of α (10% bottom, 40% center, and 70% top) and for P_s = 4096 pages.

3.4.1 Model of Copying Hot Pages During the Pre-Copy Phase

In this case, we can simply use the results derived in the previous section by replacing D with the effective dirtying rate, D_effective:

D_effective = D_nh · (1 − β) + D_h · β.   (3.17)

We define ρ_eff as D_effective · τ. Then, T_down becomes

T_down = P_s · τ · ρ_eff^(n+1)   (3.18)

where

n = ⌈ln α / ln ρ_eff − 1⌉.   (3.19)

The duration of the pre-copy phase is

T_pre-copy = P_s · τ · (1 − ρ_eff^(n+1)) / (1 − ρ_eff).   (3.20)

The total number of pages migrated is

P_TotalMig = P_s · [(1 − ρ_eff^(n+1))/(1 − ρ_eff) + α].   (3.21)

Therefore, the gain G in this case is computed as

G = 1 / ρ_eff^(n+1).   (3.22)

The network utilization due to VM migration is

U_net = [ρ_eff − ρ_eff^(n+2) + α · (1 − ρ_eff)] / (1 − ρ_eff^(n+2)).   (3.23)

Note that the utilization U_net depends on β through ρ_eff, which depends on D_effective. Also, as in the uniform dirtying rate case, U_net does not depend on P_s.

3.4.2 Model of Copying Hot Pages During the Downtime Phase

In this case, we can adapt the results of the previous section as follows. The value of P_s has to be replaced by P_s · (1 − β) because only a fraction (1 − β) of the address space participates in the live migration.

The dirtying rate has to be replaced by the dirtying rate of the non-hot pages, D_nh. When the VM is taken down, the hot pages as well as the non-hot pages dirtied during iteration n have to be copied. We define ρ_nh as D_nh · τ. Thus, T_down becomes

T_down = P(n+1) · τ + P_s · β · τ   (3.24)
       = P_s · (1 − β) · ρ_nh^(n+1) · τ + P_s · β · τ = P_s · τ · [(1 − β) · ρ_nh^(n+1) + β]   (3.25)

where

n = ⌈ln α / ln ρ_nh − 1⌉.   (3.26)

The total time spent in the pre-copy phase is

T_pre-copy = P_s · (1 − β) · τ · (1 − ρ_nh^(n+1)) / (1 − ρ_nh).   (3.27)

The total number of pages migrated is

P_TotalMig = NMP(n) + (α + β) · P_s = P_s · [(1 − β) · (1 − ρ_nh^(n+1))/(1 − ρ_nh) + (α + β)].   (3.28)

Therefore, the gain G in this case is computed as

G = (P_s · τ) / T_down = (P_s · τ) / (P_s · τ · [(1 − β) · ρ_nh^(n+1) + β]) = 1 / [(1 − β) · ρ_nh^(n+1) + β].   (3.29)

The network utilization due to VM migration is computed similarly to Eq. (3.15), namely

U_net = ρ_nh · T_pre-copy / (T_down + T_pre-copy) + [(α + β) · P_s · τ / T_down] · T_down / (T_down + T_pre-copy).   (3.30)

Using Eqs. (3.25) and (3.27) in (3.30), we obtain

U_net = [(1 − β) · (ρ_nh − ρ_nh^(n+2)) + (α + β) · (1 − ρ_nh)] / [(1 − β) · (1 − ρ_nh^(n+2)) + β · (1 − ρ_nh)].   (3.31)

Note that, as expected, the above expression has the same form as that of the uniform case when β = 0.
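The trade-off between the two hot-page strategies can be made explicit in code. The sketch below (our illustration, not taken from the original text) computes the downtime of each variant from Eqs. (3.17)-(3.19) and (3.25)-(3.26), together with their ratio HG; the parameter values follow Table 3.2.

import math

def n_iters(alpha, rho):
    # Smallest n such that P(n+1) <= alpha * Ps, cf. Eq. (3.11); clamped at 0
    return max(math.ceil(math.log(alpha) / math.log(rho) - 1), 0)

def t_down_hot1(Ps, tau, alpha, beta, D_nh, D_h):
    # Hot I: hot pages are also copied during pre-copy, Eqs. (3.17)-(3.19)
    rho_eff = (D_nh * (1 - beta) + D_h * beta) * tau
    return Ps * tau * rho_eff ** (n_iters(alpha, rho_eff) + 1)

def t_down_hot2(Ps, tau, alpha, beta, D_nh):
    # Hot II: hot pages are copied only during downtime, Eqs. (3.25)-(3.26)
    rho_nh = D_nh * tau
    n = n_iters(alpha, rho_nh)
    return Ps * tau * ((1 - beta) * rho_nh ** (n + 1) + beta)

Ps, tau, beta, D_nh, D_h = 4096, 16 / 60, 0.10, 1.8, 4.0
for alpha in (0.10, 0.40, 0.70):
    hg = t_down_hot1(Ps, tau, alpha, beta, D_nh, D_h) / t_down_hot2(Ps, tau, alpha, beta, D_nh)
    print(f"alpha = {alpha:.2f}: HG = {hg:.3f}")   # HG < 1 means Hot I has less downtime

With these values, HG is well below 1 for α = 10% and rises above 1 for α = 70%, reproducing the reversal discussed for Figure 3.1.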

3.5 Clusters of Pages

This section extends the results derived before to the case in which pages can be clustered into groups with similar page dirtying rates. Let there be K page clusters with page dirtying rates D_1, …, D_k, …, D_K for the pages in the respective clusters, and let the proportion of pages in cluster k be f_k (k = 1, …, K). The average number of pages copied from VM s to VM t at a given iteration i is equal to the number of pages dirtied during the previous iteration. Thus,

P(i) = Σ_{k=1}^{K} f_k · D_k · T(i−1) = T(i−1) · Σ_{k=1}^{K} f_k · D_k.   (3.32)

The time spent at iteration i is equal to the time spent transmitting all the pages that need to be transferred at that iteration. So,

T(i) = P(i) · τ.   (3.33)

Using Eq. (3.32) in Eq. (3.33) we obtain the following recursive expression for T(i):

T(i) = T(i−1) · (Σ_{k=1}^{K} f_k · D_k) · τ = T(i−1) · D_effective · τ   (3.34)

where D_effective = Σ_{k=1}^{K} f_k · D_k. Let us define, as before, ρ_effective as D_effective · τ. Solving the recursion in Eq. (3.34) and noting that T(0) = P_s · τ provides us with the following closed-form expression for T(i):

T(i) = P_s · τ · ρ_effective^i   for i ≥ 0.   (3.35)

Then, using Eq. (3.35) in Eq. (3.33) gives us a closed-form expression for P(i):

P(i) = P_s · ρ_effective^i   for i ≥ 0.   (3.36)

Because P(i) ≤ P_s for i ≥ 0, Eq. (3.36) implies that ρ_effective ≤ 1. We will assume throughout the chapter that ρ_effective < 1 as our steady-state condition. Note that Eq. (3.36) is identical to Eq. (3.5) with ρ replaced by ρ_effective. Therefore, the remaining results can be obtained by using ρ_effective in lieu of ρ in the corresponding equations:

T_down = P_s · τ · ρ_effective^(n+1).   (3.37)

T_pre-copy = P_s · τ · (1 − ρ_effective^(n+1)) / (1 − ρ_effective).   (3.38)

T_total = P_s · τ · (1 − ρ_effective^(n+2)) / (1 − ρ_effective).   (3.39)

n = ⌈ln α / ln ρ_effective − 1⌉.   (3.40)

P_TotalMig = P_s · [(1 − ρ_effective^(n+1))/(1 − ρ_effective) + α].   (3.41)

G = 1 / ρ_effective^(n+1).   (3.42)

U_net = [ρ_effective − ρ_effective^(n+2) + α · (1 − ρ_effective)] / (1 − ρ_effective^(n+2)).   (3.43)

Note that the Hot Pages I case discussed before is a special case of the clustered case in which there are two clusters.

3.6 Summary of Analytic Model Results

Table 3.1 shows all the equations derived in the previous sections. These equations allow us to draw some important conclusions. First, as α increases, n decreases and T_down increases in all three cases. Second, T_down increases with P_s in all cases. Third, P_TotalMig is neither monotonically increasing nor decreasing with α because the terms 1 − ρ^(n+1), 1 − ρ_eff^(n+1), and 1 − ρ_nh^(n+1) decrease as α increases (thus making P_TotalMig decrease), while the term α that appears as a multiplier of P_s makes P_TotalMig increase with α. The gain G is always greater than one and decreases with α. The network utilization due to live migration does not depend on the size of the source VM's address space. The results for page clusters depend only on the value of D_effective and not on the individual values of f_k and D_k.
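Because the clustered case reduces to the uniform model with ρ replaced by ρ_effective, no new code is needed to evaluate it. The following fragment (our illustration) computes D_effective from the cluster proportions and rates of Table 3.2 and reuses the uniform_migration_metrics helper sketched in Section 3.3:

# Clusters reduce to the uniform model with D replaced by D_effective (Sec. 3.5)
f_k = [0.10, 0.10, 0.10, 0.20, 0.50]          # cluster proportions f_k (Table 3.2)
D_k = [0.01, 0.05, 0.40, 0.70, 0.85]          # cluster dirtying rates, pages/sec
D_eff = sum(f * d for f, d in zip(f_k, D_k))  # D_effective = sum_k f_k * D_k
print(uniform_migration_metrics(Ps=4096, S=16, B=60, D=D_eff, alpha=0.10))

Here D_effective ≈ 0.61 pages/sec, so ρ_effective ≈ 0.16 and the downtime is far smaller than in the uniform case with D = 2 pages/sec.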

3.7 Optimizing Live Migration Parameters

An interesting optimization problem is that of finding the value of α that minimizes the VM downtime subject to constraints such as keeping the network utilization due to VM migration below a certain limit. We note that T_down = f(α) and U_net = g(α), where the specific functions f and g for each of the three cases are given in Table 3.1. Then, the optimization problem can be written as:

Minimize T_down = f(α)
subject to U_net(α) ≤ U_net^max.

This is a non-linear optimization problem that we solve using methods included in MATLAB.
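The dissertation solves this problem with MATLAB routines. As an alternative illustration (ours; the function below is not the original implementation), note that because n is integer-valued, T_down(α) and U_net(α) are piecewise functions of α, so a dense grid search over the feasible range (0, ρ] is a simple and robust way to approximate the optimum in the uniform case:

import math

def t_down_and_unet(alpha, Ps, tau, rho):
    # Uniform-case formulas from Table 3.1
    n = math.ceil(math.log(alpha) / math.log(rho) - 1)
    t_down = Ps * tau * rho ** (n + 1)
    u_net = (rho - rho ** (n + 2) + alpha * (1 - rho)) / (1 - rho ** (n + 2))
    return t_down, u_net

def minimize_downtime(Ps, tau, rho, u_max, grid=1000):
    """Grid search for the alpha in (0, rho] minimizing T_down s.t. U_net <= u_max."""
    best = None
    for i in range(1, grid + 1):
        alpha = rho * i / grid           # feasibility requires alpha <= rho
        t_down, u_net = t_down_and_unet(alpha, Ps, tau, rho)
        if u_net <= u_max and (best is None or t_down < best[1]):
            best = (alpha, t_down, u_net)
    return best                          # (alpha*, T_down, U_net), or None if infeasible

tau = 16 / 60
print(minimize_downtime(Ps=4096, tau=tau, rho=0.611 * tau, u_max=0.40))

One property worth noting follows directly from Table 3.1: for any feasible α, U_net ≥ ρ (the numerator of Eq. (3.16) exceeds ρ times its denominator by (1 − ρ)(α − ρ^(n+2)) ≥ 0), so the problem is infeasible whenever U_net^max < ρ, in which case the search correctly returns None.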

Table 3.1: Summary of performance model results.

Uniform Dirtying Rate:
  ρ = D · τ
  T_down = P_s · τ · ρ^(n+1);  n = ⌈ln α / ln ρ − 1⌉
  P_TotalMig = P_s · [(1 − ρ^(n+1))/(1 − ρ) + α]
  G = 1/ρ^(n+1);  U_net = [ρ − ρ^(n+2) + α(1 − ρ)] / (1 − ρ^(n+2))
  Condition: α ≤ ρ < 1

Hot Pages Copied During the Pre-Copy Phase (Hot I):
  D_effective = D_nh · (1 − β) + D_h · β;  ρ_eff = D_effective · τ
  T_down = P_s · τ · ρ_eff^(n+1);  n = ⌈ln α / ln ρ_eff − 1⌉
  P_TotalMig = P_s · [(1 − ρ_eff^(n+1))/(1 − ρ_eff) + α]
  G = 1/ρ_eff^(n+1);  U_net = [ρ_eff − ρ_eff^(n+2) + α(1 − ρ_eff)] / (1 − ρ_eff^(n+2))
  Conditions: α ≤ ρ_eff < 1, β < 1

Hot Pages Copied During the Downtime Phase (Hot II):
  ρ_nh = D_nh · τ
  T_down = P_s · τ · [(1 − β)ρ_nh^(n+1) + β];  n = ⌈ln α / ln ρ_nh − 1⌉
  P_TotalMig = P_s · [(1 − β)(1 − ρ_nh^(n+1))/(1 − ρ_nh) + (α + β)]
  G = 1/[(1 − β)ρ_nh^(n+1) + β]
  U_net = [(1 − β)(ρ_nh − ρ_nh^(n+2)) + (α + β)(1 − ρ_nh)] / [(1 − β)(1 − ρ_nh^(n+2)) + β(1 − ρ_nh)]
  Conditions: α ≤ ρ_nh < 1, β < 1

Page Clusters:
  D_effective = Σ_{k=1}^{K} f_k · D_k;  ρ_effective = D_effective · τ
  T_down = P_s · τ · ρ_effective^(n+1);  n = ⌈ln α / ln ρ_effective − 1⌉
  P_TotalMig = P_s · [(1 − ρ_effective^(n+1))/(1 − ρ_effective) + α]
  G = 1/ρ_effective^(n+1);  U_net = [ρ_effective − ρ_effective^(n+2) + α(1 − ρ_effective)] / (1 − ρ_effective^(n+2))
  Condition: α ≤ ρ_effective < 1

3.8 Numerical Results

Table 3.2 shows the parameters used in the experiments reported here.

Table 3.2: Parameter values used in the experiments.

Parameter   Value
P_s         4096, 8192, 16384, and 32768 pages
D           2 pages/sec
D_nh        1.8 pages/sec
D_h         4 pages/sec
B           60 KB/sec
S           16 KB
β           10%
U_net^max   40%
K           5 clusters
D_k         0.01, 0.05, 0.40, 0.70, and 0.85 pages/sec
f_k         10%, 10%, 10%, 20%, and 50%

Figure 3.2: T_down in seconds vs. α for different values of P_s (4K, 8K, 16K, and 32K pages).

Figure 3.2 shows the variation of the VM downtime in seconds, T_down, with α for the four values of P_s shown in Table 3.2 for the case of uniform dirtying rate. As predicted by the equations, the downtime increases (or stays the same) with α because more pages have to be copied when the VM is taken down. The reason T_down may not increase at times with α is that n is an integer: a small increase in α may not be enough to change the number of iterations, and hence the number of pages to be copied during downtime. The figure shows that, for the parameters used, larger values of α can create very large (and intolerable) downtimes, especially for large address spaces. For example, if one wanted to keep the downtime below 500 sec, one could use any of the values of α shown in the figure for P_s = 4096 pages, α ∈ {0.05, 0.1, 0.15} for P_s ∈ {4096, 8192, 16384}, and α = 0.05 for P_s = 32768. Thus, the formulation presented in this research would allow a hypervisor to dynamically determine the value of the parameter α for a given set of parameters. Figure 3.3 shows the variation of the gain G with α. As predicted, the gain decreases or stays the same as α increases. For a small value of α such as 0.05, the downtime in the stop-and-copy case is 23 times higher than in live migration for the parameters used.
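The dynamic selection of α suggested above is straightforward to express: given a downtime budget, a hypervisor could pick the largest acceptable α (and hence the fewest pre-copy iterations) whose predicted downtime stays within the budget. The sketch below is our illustrative rendering of that policy using Eq. (3.6); the candidate set of α values is hypothetical.

import math

def pick_alpha(Ps, tau, rho, budget_sec, candidates=(0.05, 0.10, 0.15, 0.20)):
    """Largest candidate alpha whose predicted T_down stays within the budget."""
    for alpha in sorted(candidates, reverse=True):    # prefer fewer pre-copy iterations
        n = math.ceil(math.log(alpha) / math.log(rho) - 1)
        if Ps * tau * rho ** (n + 1) <= budget_sec:   # T_down from Eq. (3.6)
            return alpha
    return None                                        # no candidate meets the budget

tau = 16 / 60
for Ps in (4096, 8192, 16384, 32768):
    print(Ps, pick_alpha(Ps, tau, rho=2 * tau, budget_sec=500))

For P_s = 32768 this policy falls back to α = 0.05, matching the discussion of Figure 3.2.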

Figure 3.3: Gain vs. α for the uniform dirtying rate case.

Figure 3.4 shows the variation of the total number of pages migrated during the entire VM migration, including the pages copied while the VM is up and those copied while the VM is down. Clearly, larger address spaces generate more copied pages. As pointed out before, P_TotalMig is neither monotonically increasing nor decreasing with α: as α increases, more pages have to be copied when the VM goes down, but fewer iterations, and therefore fewer pages, are needed while the VM is up. This effect is more pronounced for larger address spaces.

Figure 3.4: P_TotalMig vs. α for different values of P_s (4K, 8K, 16K, and 32K pages) for the uniform dirtying rate case.

Figure 3.5: T_down in seconds vs. α for the two cases of hot pages (hot pages copied during the pre-copy phase, case Hot I, and hot pages copied when the VM is taken down, case Hot II) for two values of P_s (4K and 16K pages).

Figure 3.5 shows the variation of the VM downtime, in seconds, versus α for the two cases of hot page migration and for two sizes of the address space. The figure shows that, for the same type of hot page migration, the downtime increases as the size of the address space increases. The figure also shows that, for the parameters used, the VM downtime is smaller in the case in which hot pages are migrated while the VM is up.

Figure 3.6: Gain vs. α for the two cases of hot pages (hot pages copied during the pre-copy phase, case Hot I, and hot pages copied when the VM is taken down, case Hot II) for two values of P_s (4K and 16K pages).

Figure 3.6 shows the variation of the gain G versus α for the two cases of hot pages and for two values of P_s. In both cases, the gain decreases or stays the same as α increases. However, for the parameters used, the gain is higher when hot pages are migrated while the VM is up because this case has a lower downtime, as seen in Fig. 3.5.

Figure 3.7: P_TotalMig vs. α for the two cases of hot pages (hot pages copied during the pre-copy phase, case Hot I, and hot pages copied when the VM is taken down, case Hot II) for two values of P_s (4K and 16K pages).

Figure 3.7 shows the total number of migrated pages for the two cases of hot page migration and two values of P_s. The figure shows that when hot pages are copied while the VM is up, more pages end up being copied, resulting in more overall network traffic.

Figure 3.8: T_down in seconds vs. α for the clusters of pages case for different values of P_s (4K, 8K, 16K, and 32K pages).

Figure 3.8 shows the variation of the VM downtime in seconds, T_down, with α for the four values of P_s shown in Table 3.2 for the clusters of pages case. The downtime increases (or stays the same) with α as more pages are copied when the VM is taken down. Figure 3.9 shows the variation of the gain G with α for the clusters of pages case. The gain decreases or stays the same as α increases.

Figure 3.9: Gain vs. α for the clusters of pages case.

Figure 3.10 shows the variation of the total number of pages migrated during the VM live migration.

Figure 3.10: P_TotalMig vs. α for the clusters of pages case for different values of P_s (4K, 8K, 16K, and 32K pages).

We ran the optimization problem described in Section 3.7 for a network utilization constraint U_net^max = 40% and for the three cases described above. The results are shown in Table 3.3. The table shows the value of α that minimizes the downtime T_down without violating U_net^max. For the same value of P_s, the uniform case provides a lower downtime than Hot I, which provides a lower downtime than Hot II.

Table 3.3: Optimization results.

Case      P_s    Optimal α    T_down    U_net
Uniform
Hot I
Hot II

3.9 Conclusion

This chapter presented analytic models to estimate the time needed to perform the live migration of a VM. Three cases were considered: uniform page dirtying rate, hot pages copied during the pre-copy phase, and hot pages copied only during the VM's downtime.

The pre-copy phase continues until at most a fraction α of the pages remains to be copied. The value of α is an important parameter: as its value increases, the VM's downtime increases. At the same time, however, lower values of α generate higher network utilization due to VM migration, and the performance of VMs not being migrated may be degraded by this high network utilization. For that reason, this study presents a non-linear optimization problem that finds the value of α that minimizes the VM downtime subject to network utilization constraints. As future work, this optimization model can be implemented and tested in an open-source VMM hypervisor such as Xen. The analytic models presented here can be used to predict the performance of a specific VM's live migration before starting the migration process. This way, a cloud provider can select the VM with the least migration cost in a large environment while satisfying Service Level Agreements. The optimization model can be extended by adding energy consumption constraints associated with the use of resources during VM migration. As part of future work, there are several ongoing research activities related to this chapter. The first is to validate the model in an experimental setting. This study made some simplifying assumptions, such as a constant page dirtying rate and a constant network bandwidth; experiments with real systems will allow us to assess the impact of these assumptions. Nevertheless, we believe that this is the first research to address this problem and the first to provide a closed-form solution to it. Second, we intend to extend the model to the case in which more than one VM is migrated at the same time. Related and interesting problems include the optimal selection of which VM to migrate first in order to minimize the impact on running applications while not exceeding thresholds on maximum migration time and/or downtime.

Chapter 4: Simulation Validation of Live VM Migration Analytical Performance Models

4.1 Experiments With Uniform Dirtying Rate

This section describes the simulation experiments used to evaluate the analytical performance models of live VM migration presented in Chapter 3. It considers the case of uniform dirtying rate, i.e., memory pages are modified at a uniform rate during the live migration process while the source VM is still running. The live migration algorithm shown in Algorithm 1 takes several parameters as input and then creates an array of structures that simulates a hypervisor's memory page table to keep track of memory page modifications (see lines 2-4). Each entry in the page table includes the following attributes:

WriteProb: probability that a page is written. Because we are simulating the uniform dirtying rate case, this parameter is initialized to 1/NumOfPages.
PageWrites: number of times a specific memory page is written during the live migration process. This parameter is initialized to 0 at the beginning (line 8) and is incremented by 1 each time the page is written or modified (line 19).

The simulation then starts the live migration loop (line 11). It begins by migrating the entire memory address space and then tracks the dirty pages at each iteration in order to re-send them in the following iteration. For each memory page (lines 14-21), the number of writes is generated according to the Poisson CDF process shown in Algorithm 2. This iterative process continues, depending on the value of α, until a small working set size is reached. The product of the dirtying rates is computed in each iteration (line 22), and the geometric mean of the dirtying rates across all iterations is then calculated (line 26).

The geometric mean of n numbers is the n-th root of the product of these numbers. For example, let D_i be the dirtying rate at iteration i. Then, the geometric mean of the dirtying rates over n iterations is (D_1 · D_2 · … · D_n)^(1/n). After the loop ends, the VM is stopped and the downtime is calculated (line 27) in order to transfer the remaining active memory contents of the VM.

Algorithm 1 VM Live Migration with uniform dirtying rate
1: LiveMigration (NumOfPages, PageSize, NetBW, α, λ)
2: struct PageTableEntry {real WriteProb; int PageWrites;}
3: PageTableEntry PageTable[NumOfPages];
4: Clock ← 0; /* Initialize system time */
5: WP ← 1 / NumOfPages;
6: for all i = 1 to NumOfPages do
7:   PageTable[i].WriteProb ← WP;
8:   PageTable[i].PageWrites ← 0;
9: end for
10: ModifiedPages ← NumOfPages; m ← 1; Prod ← 1;
11: while ModifiedPages > α · NumOfPages do
12:   IterationDuration ← ModifiedPages · PageSize / NetBW;
13:   ModifiedPages ← 0;
14:   for all i = 1 to NumOfPages do
15:     NumWrites ← Poisson (λ, IterationDuration,
16:                          PageTable[i].WriteProb);
17:     if NumWrites > 0 then
18:       ModifiedPages ← ModifiedPages + 1;
19:       PageTable[i].PageWrites ← PageTable[i].PageWrites + 1;
20:     end if
21:   end for
22:   Prod ← Prod · ModifiedPages / IterationDuration;
23:   Clock ← Clock + IterationDuration; /* Update clock */
24:   m ← m + 1;
25: end while /* end live migration loop */
26: GeomMeanDirtyingRate ← nthroot(Prod, m − 1);
27: Downtime ← ModifiedPages · PageSize / NetBW;

4.1.1 Parameters

We conducted experiments using the parameter values shown in Table 4.1. The experiment runs the simulation 30 times for each of the 12 combinations of the number of pages P_s and α, resulting in a total of 360 runs. For each combination of P_s and α, we compute the simulation's average dirtying rate, average downtime, and average number of modified pages at each live migration iteration, in order to compare these quantities with the results from the model.

Algorithm 2 Generating Poisson-distributed random variables
1: Poisson (λ, t, p_i)
2: /* Initialization */
3: L ← e^(−λ · p_i · t);
4: /* Compute M Poisson-distributed CDF random variables */
5: CDF ← poisscdf(M, L);
6: u ← rand(0,1); /* random number between 0 and 1 */
7: if u < CDF[1] then
8:   k ← 0;
9: else
10:   found ← false;
11:   while not found do
12:     for all i = 1 to M do
13:       if (CDF[i] <= u) and (u < CDF[i+1]) then
14:         k ← i;
15:         found ← true;
16:         Break;
17:       end if
18:     end for
19:   end while
20: end if
21: Return (k)

Table 4.1: Parameter values used in the experiments for the uniform dirtying rate case.

Parameter   Value
P_s         4096, 8192, 16384, and 32768 pages
α           10%, 20%, and 30%
B           60 KB/sec
S           16 KB
λ           2.4
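For readers who prefer executable code over pseudocode, the following Python re-implementation of Algorithm 1 is a sketch of ours, not the original simulator. It replaces the CDF inversion of Algorithm 2 with numpy's Poisson generator, which produces the same distribution of per-page write counts (mean λ · WriteProb · IterationDuration).

import numpy as np

def simulate_live_migration(num_pages, page_size, net_bw, alpha, lam, seed=0):
    """Simulate pre-copy live migration with a uniform dirtying rate (Algorithm 1).
    Returns (downtime_sec, pre_copy_iterations, geometric_mean_dirtying_rate)."""
    rng = np.random.default_rng(seed)
    write_prob = 1.0 / num_pages          # uniform case: WriteProb = 1/NumOfPages
    modified = num_pages                  # iteration 0 copies the whole address space
    log_rate_sum, iters = 0.0, 0
    while modified > alpha * num_pages:   # pre-copy loop, lines 11-25 of Algorithm 1
        duration = modified * page_size / net_bw
        # One Poisson draw per page stands in for Algorithm 2's CDF inversion
        writes = rng.poisson(lam * write_prob * duration, size=num_pages)
        modified = int(np.count_nonzero(writes))
        log_rate_sum += np.log(modified / duration)   # accumulate for geometric mean
        iters += 1
    downtime = modified * page_size / net_bw          # line 27 of Algorithm 1
    return downtime, iters, float(np.exp(log_rate_sum / iters))

print(simulate_live_migration(4096, 16, 60, alpha=0.10, lam=2.4))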

4.1.2 Results

This section compares the results from the simulation with the results from the analytical model described in Chapter 3. Note that the analytical model uses an average dirtying rate as its main parameter, while the simulation assumes a Poisson distribution for the number of modified pages at each iteration and computes the average dirtying rate over all the iterations of the simulation. Table 4.2 shows a summary of the results from both the simulation and the model. For each combination of the number of pages P_s and α, the average dirtying rate of the simulation, Avg. sim. D, is calculated with a 95% confidence interval and then used as an input parameter for the model to compute the downtime and the number of pages migrated in each iteration. Moreover, the table shows the downtime with a 95% CI for the simulation and the downtime for the model, with the percent relative error between them.

Table 4.2: Summary of results for the uniform dirtying rate case from both the simulation and the model.

(P_s, α)        Avg. sim. D    Avg. sim. downtime    Model downtime    % Rel. Error
(4096, 10%)        ±               ±
(4096, 20%)        ±               ±
(4096, 30%)        ±               ±
(8192, 10%)        ±               ±
(8192, 20%)        ±               ±
(8192, 30%)        ±               ±
(16384, 10%)       ±               ±
(16384, 20%)       ±               ±
(16384, 30%)       ±               ±
(32768, 10%)       ±               ±
(32768, 20%)       ±               ±
(32768, 30%)       ±               ±

Figure 4.1: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 4K pages and α is 10%.

Figure 4.2: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 8K pages and α is 20%.

The number of pages migrated at each live migration iteration in the simulation is compared with the model. Figure 4.1 shows the number of pages migrated for the simulation and the model, together with the α · P_s boundary at which the live migration must stop and the downtime starts, for the case where P_s is 4K pages and α is 10%. The number of live migration iterations is 4 for both the simulation and the model. Figure 4.2 shows the number of pages migrated for the simulation and the model, and the downtime, when P_s is 8K pages and α is 20%. Here the number of iterations is 3 for both the simulation and the model. Figures 4.3 and 4.4 show the number of pages migrated when P_s is 16K and 32K pages and α is 30%; the number of iterations is 2 in both graphs. We can see from the figures that the simulation and the model have the same number of iterations for each combination of P_s and α. Moreover, the number of iterations decreases as α increases: once α is increased, more pages are left to be migrated during downtime, which results in fewer live migration iterations.
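These iteration counts also follow from the analytical model. Since n is independent of P_s, the number of pre-copy iterations, n + 1 (iterations 0 through n), depends only on α and ρ. Assuming, for illustration, an average dirtying rate of 2 pages/sec (the value of D in Table 3.2), a short check reproduces the counts seen in Figures 4.1-4.4:

import math

tau = 16 / 60            # S/B, from Table 4.1
rho = 2 * tau            # assumed average dirtying rate of 2 pages/sec
for alpha in (0.10, 0.20, 0.30):
    n = math.ceil(math.log(alpha) / math.log(rho) - 1)
    print(f"alpha = {alpha:.0%}: {n + 1} pre-copy iterations")
# prints 4, 3, and 2 iterations, matching the figures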

Figure 4.3: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 16K pages and α is 30%.

Figure 4.4: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 32K pages and α is 30%.

We plotted figures comparing the simulation and the model for selected values of P_s and α; figures for other combinations with the same value of α are very similar to the ones shown here.

4.2 Experiment Description using Clusters of Pages

In this section we assume that memory pages are grouped into K clusters, each with a different page dirtying rate, as presented in Chapter 3. These clusters have different sizes C_i and different write probabilities WP_i. In this experiment, we extended Algorithm 1 to include the clusters of memory pages. We used four clusters with sizes C_1, C_2, C_3, and C_4 and per-page write probabilities WP_1, WP_2, WP_3, and WP_4 such that

Σ_{i=1}^{K} C_i = P_s   (4.1)

i.e., the sum of all cluster sizes equals the total number of memory pages P_s, and the sum of the write probabilities over all pages equals 1:

Σ_{i=1}^{K} WP_i · C_i = 1.   (4.2)

The parameters used in this experiment are described in Table 4.3. As in the uniform dirtying rate case, the experiment runs the simulation 30 times for each of the 12 combinations of the number of pages P_s and α. Then, for each combination of P_s and α, we compute the simulation's average dirtying rate, average downtime, and average number of modified pages at each live migration iteration, in order to compare these quantities with the results from the model.
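To make Eqs. (4.1) and (4.2) concrete, the short fragment below (ours; the rounding scheme for integer cluster sizes is our own choice) builds the four clusters used in this experiment from the Table 4.3 values and verifies both normalization conditions:

Ps = 4096
fractions = [0.3, 0.4, 0.1, 0.2]                 # cluster size fractions (Table 4.3)
mass = [0.4, 0.2, 0.1, 0.3]                      # total write probability per cluster
sizes = [round(f * Ps) for f in fractions[:-1]]
sizes.append(Ps - sum(sizes))                    # last cluster absorbs rounding error
wp = [m / c for m, c in zip(mass, sizes)]        # per-page probabilities WP_i
assert sum(sizes) == Ps                                          # Eq. (4.1)
assert abs(sum(w * c for w, c in zip(wp, sizes)) - 1.0) < 1e-9   # Eq. (4.2)
print(sizes, wp)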

Table 4.3: Parameter values used in the experiments for the clusters of pages case.

Parameter   Value
P_s         4096, 8192, 16384, and 32768 pages
α           10%, 20%, and 30%
B           60 KB/sec
S           16 KB
λ           2.4
C           0.3·P_s, 0.4·P_s, 0.1·P_s, 0.2·P_s
WP          0.4/C_1, 0.2/C_2, 0.1/C_3, 0.3/C_4

4.2.1 Results

This section compares the results from the simulation experiment with the results from the analytical model described in Chapter 3 for the clusters of pages case. Table 4.4 shows a summary of the results from both the simulation and the model. For each combination of the number of pages P_s and α, the average dirtying rate of the simulation, Avg. sim. D, is calculated with a 95% CI and then used as an input parameter for the model to compute the downtime and the number of pages migrated in each iteration. Moreover, the table shows the downtime with a 95% CI for the simulation and the downtime for the model, with the percent relative error between them.

Table 4.4: Summary of results for the clusters of pages case from both the simulation and the model.

(P_s, α)        Avg. sim. D    Avg. sim. downtime    Model downtime    % Rel. Error
(4096, 10%)        ±               ±
(4096, 20%)        ±               ±
(4096, 30%)        ±               ±
(8192, 10%)        ±               ±
(8192, 20%)        ±               ±
(8192, 30%)        ±               ±
(16384, 10%)       ±               ±
(16384, 20%)       ±               ±
(16384, 30%)       ±               ±
(32768, 10%)       ±               ±
(32768, 20%)       ±               ±
(32768, 30%)       ±               ±

Figure 4.5: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 4K pages and α is 10%, for the case of clusters of pages.

Figure 4.6: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 8K pages and α is 20%, for the case of clusters of pages.

As the results in Table 4.4 show, the error between the simulation and the model is very low (less than 0.2%). The number of pages migrated at each live migration iteration in the simulation is compared with the model. Figure 4.5 shows the number of pages migrated for the simulation and the model, together with the α · P_s boundary at which the live migration must stop and the downtime starts; this graph is for the case where P_s is 4K pages and α is 10%. The number of live migration iterations is four for both the simulation and the model. Figure 4.6 shows the number of pages migrated for the simulation and the model, and the α · P_s boundary, when P_s is 8K pages and α is 20%.

Here the number of iterations is three for both the simulation and the model. Figures 4.7 and 4.8 show the number of pages migrated when P_s is 16K and 32K pages and α is 30%; the number of iterations is two in both graphs. We can see from the figures that the simulation and the model have the same number of iterations for each combination of P_s and α. Figures for other values of P_s and α are very similar to the ones plotted here.

Figure 4.7: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 16K pages and α is 30%, for the case of clusters of pages.

Figure 4.8: Number of pages migrated vs. number of iterations for the simulation and the model when P_s is 32K pages and α is 30%, for the case of clusters of pages.

Chapter 5: Autonomic Allocation of Virtual Machines in IaaS Cloud Providers

5.1 Introduction

Cloud computing has gained significant attention recently due to the flexibility it brings to organizations that have variable computational and storage requirements. The elasticity afforded by cloud computing allows consumers to dynamically request and relinquish computing and storage resources and pay for them on a pay-per-use basis. Cloud computing providers rely on virtualization techniques to manage the dynamic nature of their infrastructure. Virtual machines can easily be allocated to and deallocated from physical machines, and can also be migrated to different physical machines in order to meet Quality of Service (QoS) objectives. The main advantages of cloud computing from the point of view of its users are the pay-as-you-go model, no need to provision for peak loads, reduced time to market, and consistent performance and availability [73]. Cloud computing providers have to be able to manage their resources dynamically in order to optimize some objective function of interest to them (e.g., revenue) subject to constraints, as the workload varies in nature and intensity. Self-optimization is one of the four attributes of autonomic computing systems [15, 52, 75]. This chapter presents an autonomic approach to the design of self-optimizing cloud providers (CPs). Self-protection is another important aspect of autonomic computing; in [37], a biologically inspired technique is presented to increase attack and exploration resilience in cloud computing. The goal of a CP in our case is to optimize its revenue, which is a function of the availability it provides to its consumers. We consider that the CP provides Infrastructure as a Service (IaaS) in the form of virtual machines (VMs) of different capacities, which can be dynamically allocated and deallocated as in Amazon's EC2. We also assume that the CP can outsource the allocation of VMs to other external CPs.

However, doing so decreases the CP's revenue. The revenue function also includes a penalty associated with the migration of VMs when a new VM allocation request has to be processed. Constraints include capacity constraints, a maximum percentage of VM migrations, and availability Service Level Agreement (SLA) constraints. We assume that requests from consumers demand the allocation or deallocation of a certain number and type of VMs and that the CP has to make an optimal decision as to how best to satisfy each request. This problem is NP-hard. The chapter introduces a heuristic solution based on the widely used local search method called hill climbing [81]. The resource provisioning model we propose is at the core of service provisioning frameworks that are emerging today. For example, in OPTIMIS [30], the deployment and execution phase of a service lifecycle needs mechanisms and algorithms to dynamically allocate services on top of the platform. This chapter considers the problem in which the cloud provider wants to maximize its revenue subject to capacity, availability SLA, and VM migration constraints. The chapter presents a heuristic solution, called Near Optimal (NOPT), to this NP-hard problem and discusses the results of its experimental evaluation in comparison with a best fit (BF) allocation strategy. The results show that NOPT provides a 45% improvement in average revenue when compared with BF for the parameters used in the experiment. Moreover, the NOPT algorithm maintained availability close to one for all classes of users, while BF exhibited a lower availability and even failed to meet the availability SLA at times. The major contributions of this chapter are: (1) the formal specification of the optimization problem faced by cloud providers that offer virtual machines of different types in an IaaS model, (2) a heuristic search method designed to find a near-optimal solution to the optimization problem, and (3) an experimental evaluation of the method. Some of the results in this chapter were published in [21]. The rest of the chapter is organized as follows. Section 5.2 introduces the problem statement and motivates this research. The next section describes the notation and formalizes the optimization problems solved in this work.

Section 5.4 describes the heuristic techniques we used to solve the optimization problem. The results of the experiments are discussed in Section 5.5. Finally, Section 5.6 concludes the chapter.

5.2 Problem Description

This study makes the following assumptions regarding cloud consumers and providers:

- Cloud consumers submit requests to a cloud provider (CP) to allocate or deallocate a given number of Virtual Machines (VMs) of different capacities.
- Cloud consumers and the CP agree on an SLA based on the availability level perceived by the consumers.
- The cloud provider owns a pool of Physical Machines (PMs).
- The cloud provider can accept a customer's request without changing the current configuration, simply instantiating the new VMs on the pool of PMs.
- The cloud provider can accept a customer's request by changing the current configuration, that is, migrating VMs from one PM to another in the pool or to a PM controlled by an external cloud provider (eCP).
- VM migration can reduce the CP's revenue because of: (1) a drop in the availability level provided by the new PM in the pool to which the VM is migrated, (2) the cost of outsourcing VMs to another cloud provider, or (3) the penalty in operational costs incurred by migrating VMs.
- VM migration can occur for two reasons: failure of a PM, or because more room is needed in the pool of PMs or in a single PM.
- The CP wants to maximize its revenue.

5.3 Processing of Consumer Requests

This section describes the steps taken by the CP to process a consumer request to allocate/deallocate VMs. First, we need to consider the notation used throughout the chapter.
- $H$: number of PMs owned by the cloud provider ($1 \le h \le H$, $h \in \mathbb{N}$).
- $K$: number of ecps used by the CP to outsource its workload ($1 \le k \le K$, $k \in \mathbb{N}$).
- $N$: number of VM types offered by the CP ($1 \le j \le N$, $j \in \mathbb{N}$).
- $M$: number of consumer service classes ($1 \le i \le M$, $i \in \mathbb{N}$).
- $\mu_{i,j}$: number of type $j$ VMs requested to be allocated ($\mu_{i,j} > 0$) or deallocated ($\mu_{i,j} < 0$) by class $i$ customer requests. A request from class $i$ consumers is characterized by the vector $\vec{\mu}_i = (\mu_{i,1}, \ldots, \mu_{i,N})$. Let $|\vec{\mu}_i| = \sum_{j=1}^{N} \mu_{i,j}$.
- $\lambda_i$: average arrival rate of requests from class $i$ customers.
- $n_{i,j} \ge 0$: number of VMs of type $j$ allocated to consumers of class $i$.
- $n_i \ge 0$: total number of VMs allocated to consumers of class $i$; $n_i = \sum_{j=1}^{N} n_{i,j}$.
- $x_{i,j,h} \in \mathbb{N}$: number of VMs of type $j$ requested by consumers of class $i$ allocated to PM $h$; $0 \le x_{i,j,h} \le n_{i,j}$.
- $y_{i,j,k} \in \mathbb{N}$: number of VMs of type $j$ requested by consumers of class $i$ allocated to ecp $k$; $0 \le y_{i,j,k} \le n_{i,j}$.

$x_{i,j,h}$ and $y_{i,j,k}$ are subject to the following structural constraint:
$$\sum_{j,h} x_{i,j,h} + \sum_{j,k} y_{i,j,k} = n_i \quad \forall i$$

- $(X,Y)$: system state, where $X = \{x_{i,j,h}\}$, $Y = \{y_{i,j,k}\}$, $i \in [1,M]$, $j \in [1,N]$, $h \in [1,H]$, $k \in [1,K]$. The system state at time $t$ is denoted by $(X(t),Y(t))$ and the state variables as $x_{i,j,h}(t)$ and $y_{i,j,k}(t)$.
- $C_h$: nominal capacity of PM $h$ measured in compute units.
- $d_j$: capacity needed to instantiate and operate a VM of type $j$ on a PM.
- $c_h \in [0,C_h]$: available capacity of PM $h$ (that is, the current available capacity). The following capacity constraint must be satisfied:
$$c_h = C_h - \sum_{i=1}^{M} \sum_{j=1}^{N} x_{i,j,h}\, d_j \quad \forall h$$
- $a_l \in [0,1]$: availability of PM $h$ ($1 \le l \le H$) or of ecp $k$ ($H+1 \le l \le H+K$). $a_l = \mathrm{MTTF}_l/(\mathrm{MTTF}_l + \mathrm{MTTR}_l)$, where $\mathrm{MTTF}_l$ and $\mathrm{MTTR}_l$ are the mean time to failure and mean time to repair of PM $l$, respectively.
- $A^{\min}_i$, $i \in [1,M]$: minimum availability accepted by class $i$ consumers.
- $A_i$: availability experienced by class $i$ consumers. $A_i$ is a function of the system state $(X,Y)$. See Section 5.3.4 for the availability model used in this chapter.
- $r^{\min}_{i,j,l}$: minimum revenue obtained by the CP with the allocation of one VM of type $j$ requested by a consumer of class $i$ at a PM or ecp $l$; this happens when that allocation minimally meets the customer's SLA.
- $r_{i,j,l}$: revenue obtained by the CP when allocating a VM of type $j$ requested by a consumer of class $i$ on a PM or ecp $l$ ($r_{i,j,l} \ge r^{\min}_{i,j,l}$).
- $q_{j,k}$: price charged by ecp $k$ to operate a VM of type $j$.
- $R$: total revenue produced by the system state $(X,Y)$.

- $g_{i,j}$: economic loss (or penalty) incurred in the migration of a previously allocated VM of type $j$ for a class $i$ consumer. A penalty may result due to the CP not being able to meet SLAs because resources (e.g., processing and network) are being wasted migrating VMs.
- $\alpha$: maximum percentage of VMs allowed to migrate with each new allocation/deallocation request.

Consider that a class $i$ request $\vec{\mu}_i$ arrives at time $t$ and finds the system at state $(X(t),Y(t))_{curr}$. The CP then tries to allocate the VMs in any (non-optimal) way as long as the capacity constraints are satisfied. Let $(X(t),Y(t))_{non\text{-}opt}$ be the resulting system state. Then, the CP transforms the state $(X(t),Y(t))_{non\text{-}opt}$ into an optimal (or near-optimal) state $(X(t),Y(t))_{opt}$ as described in the subsequent sections.

5.3.1 Optimization model

The optimization problem to be solved can now be expressed as:
$$\max \; R = \sum_{i,j} \left( \sum_{h=1}^{H} x_{i,j,h}\, r_{i,j,h} + \sum_{k=1}^{K} y_{i,j,k}\, r_{i,j,H+k} \right) \tag{5.1}$$
subject to:
(Resource constraints)
$$\sum_{i,j} x_{i,j,h}\, d_j = C_h - c_h \quad \forall h \tag{5.2}$$
$$\sum_{j,h} x_{i,j,h} + \sum_{j,k} y_{i,j,k} = n_i \quad \forall i \tag{5.3}$$
$$x_{i,j,h} \in [0, n_{i,j}] \quad \forall i,j,h \tag{5.4}$$
$$y_{i,j,k} \in [0, n_{i,j}] \quad \forall i,j,k \tag{5.5}$$

(SLA constraint)
$$A_i \ge A^{\min}_i \quad \forall i \tag{5.6}$$

5.3.2 Minimization of VMs migration

Consider that at time $t$ a class $i$ consumer requests the allocation of a set of virtual machines from the CP. The CP then has to compute a new system state $(X(t),Y(t))$ that maximizes its revenue $R$. The transition from state $(X(t-1),Y(t-1))$ to state $(X(t),Y(t))$ can produce an unpredictable number of VM migrations that could hurt system stability and performance. Two possible ways to deal with this phenomenon are: (1) constrain the number of VM migrations allowed at each system state change or (2) minimize the number of VM migrations. The problem formulation presented above does not control VM migrations. In what follows we introduce the concepts of system state variation and economic loss due to VM migration, which allow us to control the stability and performance degradation due to system reconfiguration. More specifically, we provide a formulation that minimizes VM migrations.

Assume that a class $i$ consumer submits a request at time $t$ for $|\vec{\mu}_i(t)|$ VMs ($|\vec{\mu}_i(t)| = \sum_j \mu_{i,j}(t)$). If the allocation is successful, the total number of VMs allocated to customers of class $i$ at time $t$ is $n_i(t) = |\vec{\mu}_i(t)| + n_i(t-1)$. The allocation of these VMs produces the following variation in the system state:
$$\Delta x_{i,j,h} = x_{i,j,h}(t) - x_{i,j,h}(t-1) \quad \forall i,j,h \tag{5.7}$$
and
$$\Delta y_{i,j,k} = y_{i,j,k}(t) - y_{i,j,k}(t-1) \quad \forall i,j,k. \tag{5.8}$$
The cases of $\Delta x_{i,j,h} > 0$ and $\Delta y_{i,j,k} > 0$ include the scenarios for the allocation of the new VMs (on PM $h$ and/or ecp $k$) and for the migration of instantiated VMs (to PM $h$ and/or ecp $k$). Similarly, the cases of $\Delta x_{i,j,h} < 0$ and $\Delta y_{i,j,k} < 0$ include the scenarios

where instantiated VMs are deallocated or are migrated to a different PM or ecp. Then, the total number, $\Gamma_i$, of VM migrations experienced by class $i$ consumers due to the request of $\vec{\mu}_i(t)$ VMs is given by
$$\Gamma_i = \sum_j \left[ \left( \sum_{h \in H^+} \Delta x_{i,j,h} + \sum_{k \in K^+} \Delta y_{i,j,k} \right) - \hat{\mu}_{i,j}(t) \right] \quad \forall i \tag{5.9}$$
where $H^+ = \{h : \Delta x_{i,j,h} > 0 \;\forall i,j\}$; $K^+ = \{k : \Delta y_{i,j,k} > 0 \;\forall i,j\}$; $\hat{\mu}_{i,j}(t) = n_{i,j}(t) - n_{i,j}(t-1)$ if $n_{i,j}(t) \ge n_{i,j}(t-1)$ and $\hat{\mu}_{i,j}(t) = 0$ otherwise. $\hat{\mu}_{i,j}(t) = 0$ means that no new type $j$ VMs are allocated.

Assume that the migration of a previously allocated VM of type $j$ for a class $i$ consumer can be quantified as an economic loss (i.e., a penalty) $g_{i,j}$. The total loss due to VM migration when the system state changes from $(X(t-1),Y(t-1))$ to $(X(t),Y(t))$ can be quantified as:
$$\sum_{i,j} g_{i,j} \left[ \left( \sum_{h \in H^+} \Delta x_{i,j,h} + \sum_{k \in K^+} \Delta y_{i,j,k} \right) - \hat{\mu}_{i,j}(t) \right]. \tag{5.10}$$

For example, consider the case of two consumer classes ($M = 2$), three VM types ($N = 3$), three PMs ($H = 3$), and two ecps ($K = 2$). Suppose the system is in the following state, represented by matrices for consumer class 1 (i.e., $i=1$) and for consumer class 2. The state for each customer class is represented by two VM allocation matrices: the left one for the allocation of VMs to PMs and the right one for the allocation of VMs to ecps. The rows in both matrices correspond to VM types (i.e., the index $j$). The columns in the first matrix correspond to PMs (i.e., the $h$ index) and the columns in the second matrix

correspond to ecps (i.e., the $k$ index).
$$(X(t-1),Y(t-1))_{i=1} = \cdots \qquad (X(t-1),Y(t-1))_{i=2} = \cdots$$
Suppose also that, at time $t$, a new request from consumers of classes 1 and 2 arrives and assume that $\vec{\mu}_1(t) = (0,2,1)$ and $\vec{\mu}_2(t) = (3,1,1)$. Consider that the new system state determined by the solution to the above optimization problem is
$$(X(t),Y(t))_{i=1} = \cdots \qquad (X(t),Y(t))_{i=2} = \cdots$$
Applying Eqs. (5.7)-(5.9), we obtain $\Gamma_1 = 5$ and $\Gamma_2 = 0$. Assuming that $g_{1,j} = 0.1$ and $g_{2,j} = 0.5$ for all VM types, the penalty due to VM migrations is 0.5.

We now modify the formulation of the optimization problem to include the concept of system state variation and economic loss due to VM migration:

$$\text{maximize } R = \sum_{i,j} \left( \sum_{h=1}^{H} x_{i,j,h}(t)\, r_{i,j,h} + \sum_{k=1}^{K} y_{i,j,k}(t)\, r_{i,j,H+k} \right) - \sum_{i,j} g_{i,j} \left[ \left( \sum_{h \in H^+} \Delta x_{i,j,h} + \sum_{k \in K^+} \Delta y_{i,j,k} \right) - \hat{\mu}_{i,j}(t) \right] \tag{5.11}$$
subject to:
(Resource constraints)
$$\sum_{i,j} x_{i,j,h}(t)\, d_j = C_h - c_h \quad \forall h \tag{5.12}$$
(SLA constraint)
$$A_i \ge A^{\min}_i \quad \forall i \tag{5.13}$$
(VM migration constraint)
$$\frac{\sum_i \Gamma_i}{\sum_{i,j} \left( \sum_h x_{i,j,h}(t) + \sum_k y_{i,j,k}(t) \right)} \times 100 \le \alpha \tag{5.14}$$
(Structural constraints)
$$x_{i,j,h}(t) \in [0, n_{i,j}] \quad \forall i,j,h \tag{5.15}$$
$$y_{i,j,k}(t) \in [0, n_{i,j}] \quad \forall i,j,k \tag{5.16}$$
(Definitions)

$$H^+ = \{h : \Delta x_{i,j,h} > 0 \;\forall i,j\}$$
$$K^+ = \{k : \Delta y_{i,j,k} > 0 \;\forall i,j\}$$
$$L = \{(l,m) : n_{l,m}(t) \ge n_{l,m}(t-1)\}$$
$$n_i(t) = \sum_{j=1}^{N} n_{i,j}(t)$$
$$\Delta x_{i,j,h} = x_{i,j,h}(t) - x_{i,j,h}(t-1) \quad \forall i,j,h$$
$$\Delta y_{i,j,k} = y_{i,j,k}(t) - y_{i,j,k}(t-1) \quad \forall i,j,k$$

Table 5.1 summarizes the problem parameters classified as input and output. At time $t = 0$, by definition, $x_{i,j,h}(t-1) = 0$, $y_{i,j,k}(t-1) = 0$, and $n_{i,j}(t-1) = 0$.

Table 5.1: Inputs, variables and outputs of a problem instance at time $t$

Input: $H$, $K$, $N$, $M$, $\mu_{i,j}(t)$, $\lambda_i$, $n_{i,j}(t)$, $n_{i,j}(t-1)$, $x_{i,j,h}(t-1)$, $y_{i,j,k}(t-1)$, $C_h$, $d_j$, $r^{\min}_{i,j,l}$, $q_{j,k}$, $g_{i,j}$, $a_l$, $A^{\min}_i$, $\alpha$
Output: $R$, $x_{i,j,h}(t)$, $y_{i,j,k}(t)$

5.3.3 Revenue model

We consider two revenue models: linear and exponential. The former considers a linear increase in revenue as a function of the availability $A_i$ and the latter considers that the

revenue increases exponentially with $A_i$. In both cases, the revenue is equal to the value $r^{\min}_{i,j,l}$ when the SLA is minimally met (i.e., $A_i = A^{\min}_i$) for the case in which a class $i$ customer request for a type $j$ VM is allocated at a PM or ecp $l$. The linear model is defined as
$$r_{i,j,l} = r^{\min}_{i,j,l} \, \frac{A_i}{A^{\min}_i} \quad \text{for } A_i \ge A^{\min}_i \tag{5.17}$$
and the exponential model is defined as
$$r_{i,j,l} = r^{\min}_{i,j,l} + e^{(A_i - A^{\min}_i)/\beta} - 1 \quad \text{for } A_i \ge A^{\min}_i \tag{5.18}$$
where
$$\beta = \frac{1 - A^{\min}_i}{\ln \left[ r^{\min}_{i,j,l} \left( \frac{1}{A^{\min}_i} - 1 \right) + 1 \right]}.$$
In both models, the revenue increases, linearly or exponentially, from $r^{\min}_{i,j,l}$ to $r^{\min}_{i,j,l}/A^{\min}_i$ as $A_i$ increases from $A^{\min}_i$ to 1. If the VM is allocated at an ecp $k$ ($k = 1,\ldots,K$), the revenue becomes $r_{i,j,l} - q_{j,k}$ due to the price that has to be paid to the ecp. $A_i$ is a function of the state $(X(t),Y(t))$ as shown in the next subsection.

5.3.4 Availability model

The availability perceived by class $i$ customers depends on where its VMs are allocated and on the availability of the PMs and ecps that host the VMs of these consumers. Thus,
$$A_i = \sum_{j=1}^{N} \sum_{l=1}^{H+K} \frac{\epsilon_{i,j,l}}{n_i} \, a_l \tag{5.19}$$

where
$$\epsilon_{i,j,l} = \begin{cases} x_{i,j,l} & l = 1,\ldots,H \\ y_{i,j,l-H} & l = H+1,\ldots,H+K. \end{cases} \tag{5.20}$$
Equation (5.19) computes the perceived availability $A_i$ as the weighted average of the availability of all PMs and ecps that host consumer $i$'s VMs. The weights represent the fraction of consumer $i$'s VMs at a PM or ecp.

Consider the following example. A CP uses 2 PMs ($H = 2$) and one ecp ($K = 1$) to allocate the VMs of a consumer of class $i$. The availability of the PMs are $a_1 = 0.99$ and $a_2 = \ldots$, and the availability of the ecp is $a_3 = \ldots$. The consumer has 5 allocated VMs (i.e., $n_i = 5$). Two of them are of type 1 and are allocated at PM 1. The other three are of type 2 and are allocated as follows: 2 at PM 2 and one at the ecp. Then, according to Eq. (5.19),
$$A_i = \frac{2\,a_1 + 2\,a_2 + 1\,a_3}{5}. \tag{5.21}$$
The availability model defined above is a VM-mapping model since it is based on the availability provided to the VMs by the underlying physical infrastructure of PMs and ecps. Another possible model would be one that takes into account the pattern of use (i.e., sequential, in parallel, or in any other combination) of the VMs by the applications that use them. Such a characterization varies significantly with the workload and with time. Therefore, we decided to use the more general VM-mapping model described above.

5.4 Heuristic Search

The optimization problem presented in Section 5.3 is NP-hard and is non-linear if we adopt the exponential revenue model. Therefore, we resort to a heuristic solution based on hill-climbing search techniques [81]. (See pseudocode in Algorithm 3.) The algorithm starts from the current state $(X,Y)_{curr}$ and iteratively explores a fraction of the solution space

by creating neighborhoods of states by means of the neighbors function (see line 9), to be discussed later. The algorithm performs MaxRestarts searches (see line 4) to minimize the possibility of the hill-climbing search being stuck in a local optimum, a well-known drawback of this type of search. The first search starts from $(X,Y)_{curr}$ and the others start from a feasible (i.e., satisfying the capacity constraint and availability SLA) random state obtained by the function rnd (see line 20), which randomly migrates $\alpha\%$ of the VMs in the state resulting from the previous local search.

Algorithm 3 NOPT hill-climbing local search method
1: NOPT (MaxRestarts, MaxIterations, α, NumNeighbors, (X,Y)_curr) returns ((X,Y));
2: /* Start from current state */
3: (X,Y)_temp ← (X,Y)_curr
4: for all i = 1 to MaxRestarts do
5:   /* Initialization */
6:   Searching ← True; NumIterations ← 0
7:   /* Perform one local search */
8:   while Searching & (NumIterations < MaxIterations) do
9:     S ← neighbors (NumNeighbors, α, (X,Y)_temp)
10:    (X,Y)_max ← argmax_{s∈S} {R(s)}
11:    if R((X,Y)_max) > R((X,Y)_temp) then
12:      (X,Y)_temp ← (X,Y)_max
13:    else
14:      Searching ← False
15:    end if
16:    NumIterations ← NumIterations + 1
17:  end while
18:  LocalOpt[i] ← (X,Y)_temp
19:  /* Restart from a random feasible state by migrating α% of the VMs */
20:  (X,Y)_temp ← rnd ((X,Y)_temp)
21: end for
22: Return argmax_i {R(LocalOpt[i])}

Each local search (see line 9) finds a neighborhood of the state $(X,Y)_{temp}$ that does not allow more than $\alpha\%$ VM migrations, finds the state $(X,Y)_{max}$ in the neighborhood with the largest revenue (line 10), and moves the search to state $(X,Y)_{max}$ if the revenue of that state exceeds that of state $(X,Y)_{temp}$ (lines 11 and 12). Otherwise, the local search ends (line 14) and another is restarted if the limit of restarts, MaxRestarts, has not

been reached. After all local searches are executed, the algorithm returns the state with the largest revenue among those identified by all local searches (line 22). The revenue function $R$ used in lines 10 and 11 is the one given in Eq. (5.11). The neighbors function (line 9) computes a neighborhood composed of NumNeighbors states.

Algorithm 4 neighbors function
1: neighbors (NumNeighbors, α, (X,Y)_curr)
2: returns {(X,Y)}
3: Assumption: PMs are numbered in order of increasing availability.
4: S ← ∅;
5: for all n = 1 to NumNeighbors do
6:   NumMigrated ← 0; (X,Y) ← (X,Y)_curr
7:   /* respect the maximum migration threshold α */
8:   while (NumMigrated ≤ α · Σ_{i=1}^{M} n_i) do
9:     lp ← 1; /* PM with lowest availability */
10:    if ∃ a VM allocated at PM lp in (X,Y) then
11:      randomly select a VM v from PM lp
12:      j ← type (v);
13:      mp ← H; Migrated ← False;
14:      while ((mp > lp) ∧ (¬ Migrated)) do
15:        if c_mp ≥ d_j then
16:          /* enough capacity found at PM mp */
17:          (X,Y) ← migrate((X,Y), v, lp, mp);
18:          NumMigrated ← NumMigrated + 1;
19:          Migrated ← True
20:        else
21:          /* move to the next PM with lower availability */
22:          mp ← mp − 1
23:        end if
24:      end while
25:      if ¬ Migrated then
26:        /* could not migrate within the CP; migrate to a random ecp */
27:        k ← random ecp;
28:        (X,Y) ← migrate((X,Y), v, lp, k);
29:        NumMigrated ← NumMigrated + 1
30:      end if
31:    else
32:      /* move to next higher availability PM */
33:      if lp < H then
34:        lp ← lp + 1
35:      end if
36:    end if
37:  end while
38:  S ← S ∪ {(X,Y)}
39: end for
40: Return S
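For concreteness, the following Python sketch mirrors the restart/local-search skeleton of Algorithms 3 and 4. The revenue and neighbors callables are stand-ins for Eq. (5.11) and Algorithm 4, and all names are ours, purely illustrative.

import random

def nopt(state, revenue, neighbors, max_restarts=5, max_iterations=20,
         num_neighbors=10, alpha=0.5, random_restart=None):
    """Hill climbing with restarts, a sketch of Algorithm 3.

    state:          current allocation state (X, Y)
    revenue:        callable implementing Eq. (5.11)
    neighbors:      callable implementing Algorithm 4
    random_restart: callable that migrates alpha% of VMs at random (the rnd function)
    """
    temp = state
    local_opts = []
    for _ in range(max_restarts):
        for _ in range(max_iterations):
            # Generate a neighborhood that respects the alpha migration threshold
            candidates = neighbors(num_neighbors, alpha, temp)
            best = max(candidates, key=revenue)
            if revenue(best) > revenue(temp):
                temp = best          # move uphill
            else:
                break                # local optimum reached
        local_opts.append(temp)
        if random_restart is not None:
            temp = random_restart(temp)  # perturb to escape local optima
    return max(local_opts, key=revenue)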

The approach for finding a neighborhood, given in Algorithm 4, is as follows. We consider, without loss of generality, that the PMs are numbered from 1 to $H$ in increasing order of availability. A random VM $v$ is chosen from the lowest numbered PM (i.e., lowest availability) that has at least one VM allocated and migrated to the highest numbered PM (i.e., highest availability) with enough remaining capacity to receive VM $v$. If there is no PM of higher availability with enough remaining capacity, the selected VM $v$ is migrated to a randomly selected ecp (lines 27-28). The while loop at line 8 repeats the process as long as the number of migrations does not exceed the threshold of $\alpha\%$ of the number of VMs allocated (i.e., $\sum_{i=1}^{M} n_i$). The while loop at line 14 attempts to migrate VM $v$ to a PM numbered $mp$, starting from $mp = H$ and moving down. The function migrate($(X,Y)$, $v$, $s$, $d$) (lines 17 and 28) transforms a state $(X,Y)$ to represent the migration of VM $v$ from source PM $s$ to PM $d$ ($d \le H$) or to ecp $d$ ($d > H$). After the while loop starting at line 8, a neighbor of the current state, $(X,Y)_{curr}$, is available and added to the set of neighbors $S$ (line 38) to be returned by the neighbors function after all NumNeighbors neighbors have been generated (line 40).

5.5 Experiments

We conducted experiments that consider a random stream of VM allocation or deallocation requests generated by consumers of all classes. The arrival process of requests is considered to be a Poisson process. Every time a request $\vec{\mu}_i$ arrives, a new near-optimal state is computed using the NOPT algorithm described in the previous section. We consider three classes of customers (i.e., $M = 3$) and three VM types (i.e., $N = 3$). The capacities $d_j$ are such that $d_2 = 2\,d_1$ and $d_3 = 2\,d_2$, which is consistent with Amazon EC2's capacity relationships between small, medium, and large VMs. Table 5.2 shows the values of the parameters used in the experiments.

We compare the NOPT algorithm with a Best Fit (BF) strategy that works as follows: each new allocation request is allocated at the PM of highest availability that has sufficient

capacity. If no PM can be used to satisfy the request, a random ecp is chosen.

Table 5.2: Parameter values used in the experiments.

Parameter | Value
H | 200
K | 3
N | 3
M | 3
C_h | 10 ∀h
d_1, d_2, d_3 | 1, 2, 4
A_1^min, A_2^min, A_3^min | …, 0.985, …
r^min_{i,1,l}, r^min_{i,2,l}, r^min_{i,3,l} | 1, 2, 3
a_l | 0.98 (50% of the PMs); … (rest)
q_{1,k}, q_{2,k}, q_{3,k} | 1.2, 2.4, 3.6
g_{i,j} | 3 ∀i,j
α | 0.5
MaxIterations | 20
MaxRestarts | …

5.5.1 Results

This section compares NOPT with BF using the same workload of 1,250 randomly generated allocation/deallocation requests. The workload was generated in a way that maintains the capacity utilization ρ, defined below, around a target value ρ_target = 0.80 in our experiments. For each request, a random number (between 1 and 30) of VMs is generated for each consumer class and each VM type. These requests start as allocation requests (i.e., µ_{i,j} > 0). At each request generation, the capacity utilization ρ is compared to ρ_target. Once ρ reaches

or exceeds $\rho_{target}$, the request becomes a deallocation request (i.e., $\mu_{i,j} < 0$), with the proviso that a deallocation request cannot deallocate a non-allocated VM.
$$\rho = \frac{\sum_{h=1}^{H} \sum_{i=1}^{M} \sum_{j=1}^{N} x_{i,j,h}\, d_j}{\sum_{h=1}^{H} C_h} \tag{5.22}$$
Figure 5.1 shows a graph of $C^{alloc}$, the total allocated capacity normalized by the total capacity of the CP:
$$C^{alloc} = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} d_j \left( \sum_{h=1}^{H} x_{i,j,h} + \sum_{k=1}^{K} y_{i,j,k} \right)}{\sum_{h=1}^{H} C_h}. \tag{5.23}$$

Figure 5.1: Total normalized allocated capacity, $C^{alloc}$, vs. time
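A direct transcription of Eqs. (5.22) and (5.23), assuming the state is kept in the arrays sketched earlier (array names are illustrative):

import numpy as np

def capacity_utilization(x, d, C):
    """Eq. (5.22): fraction of the CP's own capacity in use.

    x: (M, N, H) array of VM counts on PMs
    d: (N,) array of per-type capacity demands
    C: (H,) array of nominal PM capacities
    """
    used = np.einsum('ijh,j->', x, d)   # sum of x[i,j,h] * d[j] over all indices
    return used / C.sum()

def normalized_allocated_capacity(x, y, d, C):
    """Eq. (5.23): total allocated capacity (PMs plus ecps) over CP capacity.

    Values above 1 mean ecps are absorbing part of the demand.
    """
    vms_per_type = x.sum(axis=(0, 2)) + y.sum(axis=(0, 2))  # per VM type
    return (vms_per_type * d).sum() / C.sum()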

The graph in Figure 5.1 shows the effect of the workload of allocation and deallocation requests on the allocated capacity. A value of 1 for $C^{alloc}$ indicates that the total allocated capacity is equal to the total capacity of the CP. Values above 1 indicate that ecps are being used to meet the allocated demand. As can be seen from the figure, the value of $C^{alloc}$ oscillates between 0.7 and 1.45, with an average close to 1, as VMs are allocated and deallocated over time.

Figure 5.2 compares the variation of the revenue for NOPT (upper curve) and BF (lower curve). The graph also shows the upper and lower bounds of the 95% confidence intervals for the average revenue, which is 2,006 for NOPT and 1,382 for BF. Thus, NOPT provides an improvement of 45% in the average revenue. It is interesting to compare the curves in Fig. 5.1 with those of Fig. 5.2. As the demand for capacity goes below the total CP capacity, the NOPT algorithm has more opportunity to rearrange the VMs in order to obtain an improved revenue. For example, at time 3,000, the total allocated capacity is about 80% of the CP's capacity and the revenue for NOPT spikes to its highest value, about 38% above its average value and about 80% above the amount for BF at that time. After that point, there is a very clear separation between the revenue lines of NOPT and BF.

Figure 5.3 shows three graphs displaying the variation of the availability over time for each of the three classes. For each class, the availability for NOPT and BF is shown as well as the availability SLA for the class. It can be easily seen that NOPT always exhibits higher availability than BF. This is expected since NOPT is designed to maximize the revenue, which increases with the availability as indicated by the revenue model. It can also be seen that BF does not meet the availability SLA all the time for class 2. This is due to the fact that BF does not provide any guarantee of meeting availability SLAs.
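The per-VM revenue underlying these curves follows Eqs. (5.17) and (5.18); a minimal sketch of both models (parameter names are illustrative):

import math

def linear_revenue(r_min, A, A_min):
    """Eq. (5.17): revenue grows linearly from r_min to r_min/A_min."""
    assert A >= A_min
    return r_min * A / A_min

def exponential_revenue(r_min, A, A_min):
    """Eq. (5.18): revenue grows exponentially in the availability surplus."""
    assert A >= A_min
    beta = (1 - A_min) / math.log(r_min * (1 / A_min - 1) + 1)
    return r_min + math.exp((A - A_min) / beta) - 1

# Both models agree at the endpoints; e.g., with r_min = 2 and A_min = 0.985,
# both return 2 at A = 0.985 and approach 2 / 0.985 ≈ 2.03 as A -> 1.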

Figure 5.2: Revenue vs. time for NOPT (top) and BF (bottom).

Figure 5.3: Availability vs. time for each class (top: class 1, middle: class 2, bottom: class 3) for NOPT and BF. Also shown is the availability SLA for each class.
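The per-class availability plotted in Figure 5.3 is the VM-weighted average of Eqs. (5.19)-(5.20); a direct transcription follows (array names are illustrative):

import numpy as np

def class_availability(i, x, y, a_pm, a_ecp):
    """Eq. (5.19): availability perceived by class i.

    x: (M, N, H) VM counts on PMs; y: (M, N, K) VM counts on ecps
    a_pm: (H,) PM availabilities; a_ecp: (K,) ecp availabilities
    """
    n_i = x[i].sum() + y[i].sum()          # total VMs of class i
    if n_i == 0:
        return 1.0                          # vacuous: no VMs allocated
    # Sum epsilon_{i,j,l} over VM types j, then weight by a_l (Eq. 5.20)
    weighted = (x[i].sum(axis=0) * a_pm).sum() + (y[i].sum(axis=0) * a_ecp).sum()
    return weighted / n_i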

Figure 5.4: Experimental CDF of the percent of the total capacity allocated to ecps.

Figure 5.4 shows the empirical CDF, $F(x)$, of the percentage of the total capacity allocated to ecps, $P_{ecp}$, defined below, for both NOPT and BF:
$$P_{ecp} = 100 \times \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} d_j \sum_{k=1}^{K} y_{i,j,k}}{\sum_{i=1}^{M} \sum_{j=1}^{N} d_j \left( \sum_{k=1}^{K} y_{i,j,k} + \sum_{h=1}^{H} x_{i,j,h} \right)} \tag{5.24}$$
The curves in Fig. 5.4 show that NOPT tends to allocate more capacity at ecps than BF for this workload and these parameter values. For example, NOPT has an 80% chance

of allocating more than 30% of the capacity on ecps ($1 - F_{NOPT}(0.3) = 0.8$) while BF has only a 20% chance of outsourcing more than 30% of the capacity to ecps ($1 - F_{BF}(0.3) = 0.2$). The reason is that NOPT will use ecps if that increases the revenue due to higher availability.

Table 5.3 shows some average results with their 95% confidence intervals for NOPT and BF. The first two rows show the average new capacity requested per request and the average capacity deallocated per request. Remember that allocation and deallocation requests tend to balance out over time to maintain the CP allocated capacity around the value of $\rho_{target}$. Because the experiments start with allocations, the average allocated capacity per request is slightly higher than the average deallocated capacity per request. The allocation and deallocation values for NOPT and BF are the same because we used the same workload for both. The table also shows the average PM remaining capacity after processing a request for NOPT and BF; a value of 10 means an empty PM and a value of zero a fully occupied PM. NOPT exhibits a lower value for the average PM remaining capacity than BF. The reason is that BF does not use VM migration, which in the case of NOPT helps to better utilize the available capacity in the PMs. The following row shows the average revenue, which, as discussed above, shows a 45% advantage for NOPT. The last row, only applicable to NOPT, shows an average of 1.52 ± 0.4 for the average number of migrated VMs per request.

Table 5.3: Average results including 95% confidence intervals.

Metric | NOPT | BF
Avg. new capacity requested per request | 320 ± 5 | 320 ± 5
Avg. deallocated capacity per request | 301 ± 6 | 301 ± 6
Avg. PM remaining capacity | 1.46 ± … | … ± 0.06
Avg. revenue | 2,006 ± … | 1,382 ± 10
Avg. number of migrated VMs per request | 1.52 ± 0.4 | NA

An interesting feature of the NOPT algorithm is its constraint on the percentage $\alpha$ of VMs that can be migrated when a neighborhood is generated. A low value of $\alpha$ constrains the reallocation possibilities available to NOPT and may decrease the revenue it generates. On the other hand, a higher value of $\alpha$ may incur decreased revenue due to the VM migration penalty $g_{i,j}$, which is part of the revenue function (see Eq. 5.11). This indicates that there may be an optimal value of $\alpha$. Table 5.4 shows the results obtained with NOPT for various values of $\alpha$. The table shows that a value of $\alpha = 0.5$ provides the largest revenue among the values of $\alpha$ chosen for this workload.

Table 5.4: Average revenue for various values of α.

α | 20% | 40% | 50% | 70% | 100%
Revenue | 1,801 | 1,906 | 2,006 | 1,934 | 1,977

Another observation has to do with the impact of the value of the migration penalty $g_{i,j}$. The experiments reported above used $g_{i,j} = 3 \;\forall i,j$. We also ran the experiments with lower values of $g_{i,j}$ (i.e., $g_{1,1} = 0.5$, $g_{1,2} = 0.5$, $g_{1,3} = 0.9$, $g_{2,1} = 0.1$, $g_{2,2} = 0.1$, $g_{2,3} = 0.12$, $g_{3,1} = 0.9$, $g_{3,2} = 0.8$, and $g_{3,3} = 0.7$) and noticed that the average revenue decreased from 2,006 to 1,678 (a reduction of 16.4%) and the average number of migrated VMs increased from 1.52 to 1.78 (a 17% increase).

Each run of the NOPT algorithm (i.e., the processing of a request) took only 0.12 sec in the experiments reported above, executed on a laptop with an Intel Core i5-2410M 2.30 GHz CPU and 6 GB of RAM. The algorithm was implemented in Matlab; an implementation in C or C++ may further speed up its execution.

5.6 Concluding Remarks

This chapter presented a formalization of the problem of optimally allocating VMs requested from a cloud provider (CP) in a way that maximizes the CP's revenue subject to capacity, availability SLA, and VM migration constraints. The CP is allowed to use external CPs to allocate VMs for its customers. The revenue takes into account the charges incurred in using external CPs. Because the problem is NP-hard, the study presented a Near Optimal (NOPT) algorithm based on hill-climbing and compared the results with a Best Fit (BF) strategy. The results show that NOPT provides a 45% improvement in average revenue when compared with BF for the parameters used in the experiments. Moreover, the NOPT algorithm maintained the availability close to 1 for all customer classes while BF exhibited a lower availability and even failed to meet the availability SLA at times. In the future, one may look into the problem of finding an optimal value of α along with the optimal allocation.

Chapter 6: Autonomic Allocation of Communicating Virtual Machines in Hierarchical IaaS Cloud Providers

6.1 Introduction

Cloud providers are typically hierarchically organized into interconnected data centers, each with a collection of racks of servers organized into clusters. The communication cost between two servers is a function of their relative location in the cloud infrastructure. Cloud consumers submit allocation requests for virtual machines of different types and capacities, and provide an indication of the communication strength between all pairs of requested virtual machines. There is therefore a need for autonomic provisioning of virtual machines in a cloud environment.

Autonomic computing deals with the design of self-optimizing, self-configuring, self-healing, and self-protecting computer systems [52]. One of the main applications of autonomic computing is on very large and complex computer systems (e.g., cloud computing infrastructures) for which it is virtually impossible for human beings to make low-level decisions at run-time. In autonomic computing, human beings establish high-level goals that are used by autonomic controllers that follow the MAPE-K loop [52] to (1) analyze data obtained through monitors, (2) plan steps for optimization, configuration, failure recovery, and/or protection against security attacks, and (3) execute these plans.

In Chapter 5, we presented an autonomic algorithm for the provisioning of virtual machines onto the servers of a cloud provider. The goal of that chapter was to maximize the revenue (a function of the availability provided to each consumer) of the cloud provider subject to availability and capacity constraints.

In this chapter we consider a different kind of problem and a different organization for the infrastructure of a cloud provider. In Chapter 5, the organization of the servers was

considered to be flat. Here, we consider a hierarchical organization consisting of several interconnected data centers. Each data center has a collection of servers organized into clusters, which have several racks of servers. The communication cost between two servers varies significantly depending on their relative location in the infrastructure. For this reason, some IaaS providers (e.g., Amazon's EC2) allow their customers to request cluster instances, which provision the VMs in the same logical cluster, providing high bandwidth and low latency for the VMs within the cluster.

A cloud provider typically has a hierarchically organized networking infrastructure. The servers of a rack are connected through a local switch, which has some ports used for uplinks to a cluster-level switch that provides connectivity across all racks of a cluster. Cluster-level switches have uplinks to a data center switch, which allows servers from across different clusters of a data center to communicate. The data center switch also has uplinks to other data center switches so that servers can communicate across data centers. The communication latency increases and the bandwidth decreases as we move from the same server, to a different server on the same rack, to a different rack in the same cluster, to a different cluster in the same data center, and to a different data center [11]. Using the latency and bandwidth values in the example in [11], one can estimate that a 1-MB message would take 0.05 msec to be transmitted within a server (e.g., between two virtual machines on the same server), 10 msec between servers of the same rack, and 100 msec between servers in different racks of the same cluster. Thus, it is imperative that virtual machines that communicate among themselves be allocated as close as possible to each other.

There is a growing number of applications that require close communication between the virtual machines used to support the application. MapReduce [26] is an example: map tasks must communicate with reduce tasks to complete a job. Thus, when allocating virtual machines from a cloud provider, one should try to optimize the locality of communication.

The contributions of this chapter are: (1) A pricing model for cloud resource usage based

on how close, communication-wise, VMs are allocated by a cloud provider. This pricing model provides incentives to the cloud provider to reduce performance uncertainties. (2) Efficient heuristic algorithms to solve this NP-hard VM allocation problem; the heuristics are shown experimentally to perform significantly better than an allocation strategy that is oblivious to the communication strength between virtual machines. In particular, the proposed heuristics were shown to generate between 87% and 88% of the upper bound on the optimal revenue. Because of the efficiency of the heuristics, they can be used to solve the VM placement problem in an online manner as new requests arrive from consumers. (3) We introduce availability constraints based on user requests for not co-locating VMs on the same server to improve fault tolerance. Some of the work reported in this chapter was published in [4].

The rest of this chapter is organized as follows. Section 6.2 presents the assumptions and notation used throughout the chapter. Section 6.3 discusses the two revenue models used in the chapter: a linear and an exponential revenue model. The next section formalizes the optimization model considered here. Section 6.5 presents the heuristic algorithms used to find a near-optimal solution to the optimization problem. The next section provides experimental results that compare the revenue obtained through the heuristics with that of an allocation method that is oblivious to communication costs, and with an upper bound of the optimal solution. Finally, Section 6.8 presents some concluding remarks.

6.2 Problem Assumptions and Notation

Figure 6.1 illustrates the hierarchical infrastructure of a cloud provider considered in this chapter. This infrastructure consists of various interconnected data centers, typically situated in different geographical regions, to improve business continuity in the face of natural disasters as well as to reduce response time by increasing proximity to a widespread set of consumers. Each data center has a number of servers organized in clusters (aka arrays) of racks, with each rack containing several servers.
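To make the hierarchy concrete, the sketch below encodes a server location as a (data center, cluster, rack, server) tuple and maps a pair of locations to the co-location level used later in the revenue model (see Table 6.1); the names are ours and purely illustrative.

from typing import NamedTuple

class Location(NamedTuple):
    """Position of a server in the hierarchy of Figure 6.1."""
    dc: int       # data center
    cluster: int
    rack: int
    server: int

def colocation_type(a: Location, b: Location) -> int:
    """Return the co-location type alpha in {1, ..., 5} (cf. Table 6.1):
    1 = same server, ..., 5 = different data centers."""
    if a.dc != b.dc:
        return 5
    if a.cluster != b.cluster:
        return 4
    if a.rack != b.rack:
        return 3
    if a.server != b.server:
        return 2
    return 1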

Figure 6.1: Infrastructure of a Cloud Service Provider.

The following are our assumptions regarding cloud consumers and providers:
- A Cloud Provider (CP) offers Infrastructure as a Service (IaaS) services to consumers, who can request a certain number of virtual machines (VMs) of different types and capacities.
- The cloud provider has a number of data centers (DCs). Each data center has a number of clusters of racks and each rack has a number of servers.
- There are several categories of cloud consumers, who pay different amounts to obtain a better allocation for their virtual machines, i.e., an allocation that is more consistent with the communication needs of the consumer's requested virtual machines.

- Cloud consumers submit requests to the CP to allocate or deallocate a certain number of VMs. Each allocation request indicates the type of each requested machine and also provides a communication strength index between each pair of requested VMs. This index, a number between 0 and 1, indicates the intensity of the communication between the various VMs. A value of 1 represents maximum communication coupling and 0 represents no communication between the requested VMs.
- The CP charges a fee from the consumer that depends on how well the CP is able to allocate the requested VMs so that VMs that have higher communication strength indices are allocated as close as possible to each other.
- The CP wants to maximize its revenue.

The following notation is used throughout the chapter.
- $D$: number of data centers in the cloud provider infrastructure ($1 \le d \le D$, $d \in \mathbb{N}$).
- $C(d)$: number of clusters of racks in data center $d$ ($1 \le c(d) \le C(d)$, $c(d) \in \mathbb{N}$).
- $R(c,d)$: number of racks in cluster $c(d)$ of data center $d$ ($1 \le r(c,d) \le R(c,d)$, $r(c,d) \in \mathbb{N}$).
- $S(r,c,d)$: number of servers in rack $r(c,d)$ of cluster $c(d)$ of data center $d$ ($1 \le s(r,c,d) \le S(r,c,d)$, $s(r,c,d) \in \mathbb{N}$).
- $N$: number of VM types offered by the CP ($1 \le t \le N$, $t \in \mathbb{N}$).
- $K$: number of VMs requested by a cloud consumer in each allocation request.
- $type(i)$: type of VM $i$.
- $P$: number of CP consumer categories ($1 \le p \le P$, $p \in \mathbb{N}$).
- $\vec{\mu} = (\mu_1, \ldots, \mu_K)$: vector of virtual machines requested by a cloud consumer. $\mu_k$ ($k = 1,\ldots,K$) is the type of the $k$-th virtual machine requested by a consumer in its allocation request ($1 \le \mu_k \le N$, $\mu_k \in \mathbb{N}$).

- $C$: $K \times K$ communication strength matrix such that $0 \le C[i,j] \le 1$, $C[i,j] = C[j,i]$, $C[i,i] = 1$, $\forall i,j \in \{1,\ldots,K\}$.
- $\beta = (\vec{\mu}, C)$: allocation request coming from a cloud consumer. A request consists of the vector of VMs requested (including their types) and the communication strength matrix for the requested VMs. Later in the chapter we extend $\beta$ to include the Do Not Co-locate (DNC) matrix described in Section 6.6.
- $\lambda$: average arrival rate of VM allocation/deallocation requests.
- $C_{s(r,c,d)}$: nominal capacity of server $s(r,c,d)$ measured in compute units.
- $d_t$: capacity needed to instantiate and operate a VM of type $t$ on a server, measured in compute units.
- $A_i = (s,r,c,d)$: allocation of VM $i$ on server $s$ of rack $r$ of cluster $c$ of data center $d$.
- $x^i_{s,r,c,d}$: decision variable for the optimization problem. This variable takes the value 1 if there is an allocation $A_i = (s,r,c,d)$ and 0 otherwise.
- $n_{t,s(r,c,d)} \in \mathbb{N}$: number of VMs of type $t$ allocated to server $s$ of rack $r$ of cluster $c$ of data center $d$. Note that
$$n_{t,s(r,c,d)} = \sum_{i \text{ s.t. } type(i)=t} x^i_{s,r,c,d}. \tag{6.1}$$
- $c_h \in [0,C_h]$: current available capacity of server $h$. The following capacity constraint must be satisfied:
$$c_h = C_h - \sum_{t=1}^{N} n_{t,h}\, d_t \quad \forall \text{ server } h \tag{6.2}$$
- $r^p_{\min}$: minimum revenue obtained by the CP with the allocation of a pair of communicating VMs to category $p$ consumers; this happens when that allocation is minimally

compatible with the customer's communication strength index for the pair of VMs (e.g., allocation at different data centers for strongly communicating VMs).
- $r^p_{\max}$: maximum revenue obtained by the CP with the allocation of a pair of communicating VMs to category $p$ consumers; this happens when that allocation is maximally compatible with the customer's communication strength index for the pair of VMs (e.g., allocation at the same server for strongly communicating VMs).
- $r^p_{i,j}$: revenue obtained by the CP when allocating VM $i$ according to allocation $A_i$ and VM $j$ according to allocation $A_j$ to a category $p$ customer. This revenue depends on the type of the co-locations $A_i$ and $A_j$ according to Table 6.1.
- $R$: total revenue obtained by the CP for an allocation request $\beta = (\vec{\mu}, C)$, which can be expressed as $R = \sum_{A_i,A_j} r^p_{i,j}$. In other words, the cloud consumer pays the CP a certain amount $r^p_{i,j}$ for each pair $i,j$ of requested VMs, for $i < j$.

6.3 Revenue model

The revenue $r^p_{i,j}$ obtained by the CP when allocating a pair of VMs $i$ and $j$ to consumers of category $p$ ($p = 1,\ldots,P$) depends on the type of the co-locations $A_i$ and $A_j$ according to Table 6.1. A co-location indicates the relative proximity of two VMs in terms of their position in the hierarchical CP infrastructure.

Table 6.1: Types of co-locations for VMs i and j.

Co-location Type (α) | Description
1 | Same server: A_i = A_j = (s,r,c,d)
2 | Different servers of the same rack: A_i = (s_i,r,c,d), A_j = (s_j,r,c,d), s_i ≠ s_j
3 | Different racks, same cluster: A_i = (s_i,r_i,c,d), A_j = (s_j,r_j,c,d), r_i ≠ r_j
4 | Different clusters, same data center: A_i = (s_i,r_i,c_i,d), A_j = (s_j,r_j,c_j,d), c_i ≠ c_j
5 | Different data centers: A_i = (s_i,r_i,c_i,d_i), A_j = (s_j,r_j,c_j,d_j), d_i ≠ d_j

We first consider a revenue function $r^p_{i,j}$ that represents a linear decrease in revenue as the co-location type $\alpha$ goes from 1 to 5. So,
$$r^p_{i,j}(\alpha) = \left[ \frac{r^p_{\min} - r^p_{\max}}{4}\,\alpha + \frac{5\,r^p_{\max} - r^p_{\min}}{4} \right] C[i,j] \tag{6.3}$$
where $\alpha$ is the co-location type (see column 1 of Table 6.1) and $r^p_{\max}$, $r^p_{\min}$, and $C[i,j]$ are as defined above. Note that $r^p_{i,j}(1) = r^p_{\max}$ and $r^p_{i,j}(5) = r^p_{\min}$ in Eq. (6.3).

Another revenue model also considered here is one in which the revenue decreases exponentially with the co-location type $\alpha$. Thus,
$$r^p_{i,j}(\alpha) = \left( \frac{(r^p_{\max})^5}{r^p_{\min}} \right)^{1/4} e^{\ln(r^p_{\min}/r^p_{\max})\,\alpha/4}\; C[i,j]. \tag{6.4}$$

As with Eq. (6.3), $r^p_{i,j}(1) = r^p_{\max}$ and $r^p_{i,j}(5) = r^p_{\min}$ in Eq. (6.4). Note that the revenue function $r^p_{i,j}(\alpha)$ is a function of the allocations $A_i$ and $A_j$ through $\alpha$. Thus, $r^p_{i,j}(\alpha) = r^p_{i,j}(A_i,A_j)$.

6.4 Optimization Problem

The optimization problem to be solved can now be expressed as:
$$\max \; R = \sum_{p=1}^{P} \sum_{\substack{A_i,A_j \\ i<j;\; i,j \in [1,\ldots,K]}} r^p_{i,j}(A_i,A_j) \tag{6.5}$$
subject to:
(Server capacity constraint)
$$c_h = C_h - \sum_{t=1}^{N} n_{t,h}\, d_t \quad \forall h \tag{6.6}$$
Note that, according to Eq. (6.1), $n_{t,h}$ depends on the VM allocations. This optimization problem is NP-hard. The number of possible allocations is of the order of $H^K$, where $H$ is the total number of servers in the cloud infrastructure and $K$ is the number of VMs requested by a cloud consumer. Large data centers and cloud infrastructures (e.g., Google, Amazon, and Microsoft) have of the order of millions of servers. Thus, a request for 10 virtual machines would generate of the order of $10^{60}$ possible allocations considering one million servers. For this reason, an efficient heuristic is required to find a near-optimal solution to this optimization problem, which must be solved in real time as allocation requests arrive.
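A sketch of both revenue functions, reusable with the colocation_type helper from Section 6.2 (again, the names are ours):

import math

def pair_revenue(alpha, c_ij, r_min, r_max, model='linear'):
    """Revenue for one VM pair as a function of the co-location type alpha.

    Implements Eq. (6.3) (linear) and Eq. (6.4) (exponential); both give
    r_max at alpha = 1 (same server) and r_min at alpha = 5.
    """
    if model == 'linear':
        base = (r_min - r_max) / 4 * alpha + (5 * r_max - r_min) / 4
    else:
        base = (r_max**5 / r_min) ** 0.25 * math.exp(math.log(r_min / r_max) * alpha / 4)
    return base * c_ij

# Sanity check with the experiment's category-3 values (r_min = 3, r_max = 12):
# pair_revenue(1, 1.0, 3, 12) == 12.0 and pair_revenue(5, 1.0, 3, 12) == 3.0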

6.5 Heuristic Algorithms

We first describe the Basic VM Allocation Heuristic (BVAH), which does not deallocate any already allocated VM in order to find a near-optimal placement for the requested VMs. Then, we describe the Advanced VM Allocation Heuristic (AVAH), which uses BVAH and considers the possibility of deallocating some of the most recently allocated VMs, allocating the VMs in the new request, and reallocating the deallocated VMs. Finally, in this section we also discuss an allocation strategy that we call NoComm, which allocates the VMs in a way equivalent to BVAH and AVAH but does not take into account the values of the communication strength index.

6.5.1 Basic VM Allocation Heuristic (BVAH)

At a high level, the basic heuristic algorithm shown below builds a labeled undirected graph in which the vertices are the requested VMs and the edges represent the existence of a nonzero communication strength value between the vertices at their end points. The labels in the graph are the values of the communication strength indices (Step 1). We then build a maximum spanning tree for the graph (Step 2). The rationale behind this step is to cover all VMs and obtain a maximum sum of the communication strengths, avoiding connections with low communication strength. The edges of the maximum spanning tree are sorted in Step 3 in descending order of communication strength and stored in a list L. Then, in the loop between Steps 5 and 10, each edge $(v,w)$ in the list L is examined from the highest to the lowest communication strength edge. If VMs $v$ and $w$ were already allocated, then the algorithm considers the next edge in the list (Step 6). If VM $v$ has been allocated but VM $w$ has not, then the algorithm invokes the AllocateCloseTo($w$, $v$) procedure, which allocates VM $w$ as close as possible to VM $v$ (Step 7). More about this procedure later. Step 8 is similar to Step 7: in this case, VM $v$ is allocated as close as possible to the already allocated VM $w$. Finally, in Step 9, VMs $v$ and $w$ were not allocated

yet. In this case, the CoAllocate procedure is used to allocate these two VMs as close as possible to each other.

Step 1: Build the undirected graph $G = (V,E,L)$ where the set of nodes $V$ corresponds to each of the VMs requested by a consumer request $\beta = (\vec{\mu}, C)$, the set of edges is $E = \{(v,w) \mid v,w \in V, C[v,w] \in (0,1]\}$, and $L$ is a set of edge labels such that the label of edge $(v,w)$ is $C[v,w]$.

Step 2: Build the maximum spanning tree, $T$, for the graph $G$. This can be done using any of the existing algorithms for finding the minimum spanning tree (e.g., Kruskal's algorithm [55]) by using the negated values of the edge labels in $L$. The number of edges in $T$ is $K-1$, where $K$ is the number of nodes in the tree (equal to the number of VMs requested in $\beta$).

Step 3: Sort the edges of the tree $T$ in descending weight order. Let this list be denoted as $L$ and its elements denoted as $L(1),\ldots,L(K-1)$. So, $L(1)$ corresponds to the edge in $T$ with the highest communication strength. Note that the edges in $L$ are not necessarily the $K-1$ edges in $G$ with the largest weight.

Step 4: [Allocation loop] $k \leftarrow 1$.

Step 5: If $k = K$ then Stop else let $L(k) = (v,w)$.

Step 6: If VMs $v$ and $w$ have already been allocated, go to Step 10.

Step 7: If VM $v$ has been allocated but VM $w$ has not been allocated, then AllocateCloseTo($w$,$v$); go to Step 10.

Step 8: If VM $w$ has been allocated but VM $v$ has not been allocated, then AllocateCloseTo($v$,$w$); go to Step 10.

Step 9: If VMs $v$ and $w$ have not been allocated, then CoAllocate($w$,$v$).

Step 10: $k \leftarrow k + 1$; go to Step 5.
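Steps 1-3 can be sketched directly with Kruskal's algorithm run on negated weights, as the text suggests; the following self-contained Python sketch (names illustrative) returns the tree edges already in descending strength order:

def maximum_spanning_edges(K, C):
    """Steps 1-3 of BVAH: Kruskal's algorithm on negated weights.

    K: number of requested VMs; C: K x K communication strength matrix.
    Returns the spanning-tree edges sorted by descending communication strength.
    """
    parent = list(range(K))

    def find(v):                      # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Candidate edges: pairs with nonzero communication strength, examined
    # from strongest to weakest (equivalent to negating the labels and
    # running a minimum spanning tree algorithm).
    edges = sorted(((C[v][w], v, w) for v in range(K) for w in range(v + 1, K)
                    if C[v][w] > 0), reverse=True)
    tree = []
    for strength, v, w in edges:
        rv, rw = find(v), find(w)
        if rv != rw:                  # adding (v, w) does not create a cycle
            parent[rv] = rw
            tree.append((v, w, strength))
    return tree                       # appended in descending strength order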

We illustrate the operation of the algorithm through a small example. Consider the undirected graph of Fig. 6.2(a) and the corresponding maximum spanning tree in Fig. 6.2(b).

Figure 6.2: Example of the Operation of the BVAH Algorithm.

Then, the list L is $\{(4,5),(1,2),(1,6),(1,3),(3,4)\}$. According to the algorithm above, the following sequence of allocation calls will take place: (a) CoAllocate(4,5); (b) CoAllocate(1,2); (c) AllocateCloseTo(6,1); and (d) AllocateCloseTo(3,1).

Consider now that the CP's infrastructure has a single data center with two clusters with two racks each and two servers per rack. Assume for simplicity that all requested VMs have unit capacity. Table 6.2 illustrates the state of the CP as allocations (a)-(d) above take place. The first row after the headers in the table shows the available capacity of the

CP before any VMs are allocated. For example, server S1 of rack 1 of cluster 1 has available capacity equal to 1 unit and server S1 of rack 2 of the same cluster has an available capacity of 2 units. The following four pairs of rows correspond to the four allocations (a)-(d). The first row of each pair shows where the VMs are allocated and the second row shows the remaining capacity after the allocation. During allocation (a), VMs 4 and 5 are co-allocated at the same server (server S1 of rack 2 of cluster 1). After this allocation, the remaining capacity of this server becomes equal to zero. Allocation (b) requires that VMs 1 and 2 be allocated as close as possible. The only possibility is for these VMs to be allocated on servers S1 and S2 of the same rack (rack 2) of cluster 2. Next, allocation (c) requires VM 6 to be allocated as close as possible to VM 1, which is allocated at rack 2 of cluster 2. The closest available server on that cluster is server S2 on rack 1. Finally, allocation (d) requires that VM 3 be allocated as close as possible to VM 1, which is allocated in cluster 2. It turns out that at this point there is no available capacity in cluster 2. Thus, VM 3 is allocated in cluster 1.

Table 6.2: Allocation Example.

(a) VMs 4, 5 → Cluster 1, Rack 2, S1
(b) VM 1 → Cluster 2, Rack 2, S1; VM 2 → Cluster 2, Rack 2, S2
(c) VM 6 → Cluster 2, Rack 1, S2
(d) VM 3 → Cluster 1

We now explain the algorithms used by the procedures AllocateCloseTo and CoAllocate. Algorithm 5 shows the pseudo-code for the CoAllocate algorithm. It compares the remaining capacity of each server with the capacity required by a VM and attempts to place VMs $v$ and $w$ on the same server (line 3), same rack (line 6), same cluster (line 9), or same data center (line 12). If everything fails, VMs $v$ and $w$ are allocated on different data centers if there is available capacity.

Algorithm 5 CoAllocate Algorithm
1: CoAllocate (VM v, VM w);
2: /* allocate VMs v and w as close as possible to each other */
3: if ∃ server s such that c_s ≥ d_v + d_w then
4:   allocate v and w on s
5: else
6:   if ∃ a rack with servers s_1 and s_2 s.t. c_{s1} ≥ d_v and c_{s2} ≥ d_w then
7:     allocate v on s_1 and w on s_2
8:   else
9:     if ∃ a cluster with servers s_1 and s_2 s.t. c_{s1} ≥ d_v and c_{s2} ≥ d_w then
10:      allocate v on s_1 and w on s_2
11:    else
12:      if ∃ a data center with servers s_1 and s_2 s.t. c_{s1} ≥ d_v and c_{s2} ≥ d_w then
13:        allocate v on s_1 and w on s_2
14:      else
15:        if ∃ data centers d_1, d_2 with servers s_1 and s_2 s.t. c_{s1} ≥ d_v and c_{s2} ≥ d_w then
16:          allocate v on s_1 and w on s_2
17:        else
18:          no allocation possible
19:        end if
20:      end if
21:    end if
22:  end if
23: end if

Algorithm 6 shows the process used to allocate VM $v$ as close as possible to an already allocated VM $w$. The algorithm first attempts to allocate VM $v$ at the server where VM $w$ is allocated (lines 4-5). If this fails, it attempts to allocate VM $v$ on the same rack as $w$ (lines 7-8). If not possible, it attempts the allocation in the same cluster (lines 10-11). If this attempt is not successful, an allocation on the same data center is tried (lines 13-14). If that also fails, the algorithm attempts to allocate VM $v$ at any other data center.

Algorithm 6 AllocateCloseTo Algorithm
1: AllocateCloseTo (VM v, VM w);
2: /* allocates VM v as close as possible to VM w */
3: Let A_w = (s,r,c,d)
4: if c_s ≥ d_v then
5:   allocate v on s
6: else
7:   if ∃ a server s_1 on rack r s.t. c_{s1} ≥ d_v then
8:     allocate v on s_1
9:   else
10:    if ∃ a server s_1 on cluster c s.t. c_{s1} ≥ d_v then
11:      allocate v on s_1
12:    else
13:      if ∃ a server s_1 on data center d s.t. c_{s1} ≥ d_v then
14:        allocate v on s_1
15:      else
16:        if ∃ a server s_1 on data center d_1 (d_1 ≠ d) s.t. c_{s1} ≥ d_v then
17:          allocate v on s_1
18:        else
19:          no allocation possible
20:        end if
21:      end if
22:    end if
23:  end if
24: end if

An efficient implementation of Algorithms 5 and 6 would use a B+-tree [25] in which the keys (stored in ascending order) are the result of concatenating the data center id, cluster id, rack id, and server id. The value associated with a key is the remaining capacity of the server at the location indicated by the key. In a B+-tree, all leaf nodes are typically linked so that one can traverse just the leaves if necessary. A leaf node contains several key-value pairs. By traversing the leaf nodes from left to right, one can efficiently find available capacity on the same server, same rack, same cluster, and same data center. One can also obtain, as required by Algorithm 6, the available capacity at a given server in a few steps (even when the CP has millions of servers).

6.5.2 Advanced VM Allocation Heuristic (AVAH)

This advanced allocation procedure is performed when a new allocation request $\beta$ arrives. Consider the set $D$ consisting of the $M$ most recent allocation requests and the corresponding revenues obtained by these allocations. When a new VM allocation request $\beta$ arrives, the

advanced heuristic performs the following steps, which implement a form of greedy hill climbing. All the allocations described in the following algorithm are done using the Basic VM Allocation Heuristic (BVAH) described in the previous subsection.

Step 1: For each request $\beta_d \in D$ do
  Step 1.1: Allocate $\beta$ and compute the total revenue $R$ obtained by this allocation.
  Step 1.2: Let $R_d$ be the revenue originally generated by request $\beta_d$. Deallocate the VMs in $\beta_d$ and in $\beta$.
  Step 1.3: Allocate the VMs in the request $\beta$ using BVAH and obtain the revenue $R^{new}$ generated by this allocation.
  Step 1.4: Allocate the VMs in the request $\beta_d$ using BVAH and obtain the revenue $R^{new}_d$ generated by this allocation.
  Step 1.5: If $(R^{new}_d + R^{new}) > (R_d + R)$ then go to Step 3.
  Step 1.6: Deallocate $\beta_d$; deallocate $\beta$; allocate $\beta_d$, in this order, using BVAH.
Step 2: Allocate $\beta$ using BVAH.
Step 3: Update $D$ by removing the least recent request and adding the request $\beta$ to $D$.

The AVAH allocation procedure mitigates the problem of cloud infrastructure fragmentation because it deallocates the VMs requested in the $M$ most recent requests and reallocates them using BVAH. It should be noted that the deallocation and allocation operations described in the AVAH algorithm should not be carried out in the actual cloud but in a data structure that keeps track of the allocation of VMs in the cloud infrastructure. Actual allocations and deallocations should only be made after the algorithm determines the final allocation/deallocation combination.
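A compact sketch of the AVAH trial loop, under the assumption that allocation and deallocation operate on a shadow copy of the placement data structure; bvah_allocate stands in for the heuristic above and all names are illustrative:

def avah(beta, recent, bvah_allocate, deallocate, state):
    """Greedy re-allocation over the M most recent requests (a sketch of AVAH).

    recent: list of (request, revenue) pairs, oldest first; state is a shadow
    copy of the placement, so the real cloud is only touched at the end.
    bvah_allocate allocates a request with BVAH and returns its revenue.
    """
    r_new = None
    for beta_d, r_d in recent:
        r = bvah_allocate(state, beta)                      # Step 1.1
        deallocate(state, beta_d); deallocate(state, beta)  # Step 1.2
        r_new = bvah_allocate(state, beta)                  # Step 1.3: beta first
        r_d_new = bvah_allocate(state, beta_d)              # Step 1.4: beta_d second
        if r_d_new + r_new > r_d + r:                       # Step 1.5: keep the swap
            break
        deallocate(state, beta_d); deallocate(state, beta)  # Step 1.6: undo the trial
        bvah_allocate(state, beta_d)
    else:
        r_new = bvah_allocate(state, beta)                  # Step 2: no profitable swap
    recent.pop(0)                                           # Step 3: slide the window
    recent.append((beta, r_new))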

6.5.3 No Communication (NoComm) Allocation Strategies

For purposes of comparison, we consider two variants of the BVAH and AVAH VM allocation strategies. Both are based on ignoring the values of the communication strength index for allocation purposes only. These strategies can be easily implemented by making $C[i,j] = 1 \;\forall i,j$. Clearly, the revenue $r^p_{i,j}$ should still be calculated using the original values of the communication strength indices in the matrix $C$. We call these modified BVAH and AVAH strategies B-NoComm and A-NoComm, respectively. They consider the fact that the CP infrastructure is organized hierarchically and has different communication costs depending on the relative location of VMs in the cloud, and they also consider the capacity constraints.

6.6 Availability Constraint

The availability of VMs in data centers is crucial to most applications. In particular, if more than one VM related to a given request is allocated to the same server, the failure of that server brings down all the VMs running on it. Therefore, when submitting a request, a customer may want to indicate that certain VMs should not be co-located on the same server in order to increase the availability of the ensemble of requested VMs. For example, consider the VMs that support a 3-tier application consisting of web servers, application servers, and database servers. If all VMs that run the web servers are allocated to the same server, the entire site will go down when that server fails.

Consider the impact of co-location in what follows. Let $S$ be the number of servers, each with availability $A$, used to allocate the $V$ VMs requested by a user. Then, the availability of the set of VMs in a request is calculated as:
$$\text{Availability} = 1 - (1 - A)^S \tag{6.7}$$
where $S \in [1,V]$. Thus, the range of availability values is $[A, 1-(1-A)^V]$. If $A = 0.9$ and $V = 5$, that range varies from 0.9 to $1 - (0.1)^5 = 0.99999$. Thus, the overall availability

increases as the VMs are allocated to more servers. We then extend the process described in the previous sections by adding to the allocation request submitted by a user constraints in terms of co-location of VMs on the same server. We define for that purpose a Do Not Co-locate (DNC) matrix such that DNC$[i,j] = 1$ if VMs $i$ and $j$ should not be allocated on the same server and DNC$[i,j] = 0$ if there is no such restriction. This matrix is submitted by the user with each request, along with the communication strength matrix. The DNC constraint is added to line 3 of Algorithm 5, which becomes: ∃ server s such that c_s ≥ d_v + d_w and DNC[v,w] = 0.

6.7 Experimental Results

We implemented the various strategies described in the previous sections in Matlab. The experiments consider a random stream of VM allocation or deallocation requests generated by consumers of all categories. The arrival process of requests is considered to be a Poisson process. Table 6.3 shows the parameters used in the experiments. We consider 1,600 servers organized in two data centers, each with ten clusters of four racks each. There are 20 servers per rack. There are three types of VMs ($N = 3$) offered by the CP and their capacities are 1, 2, and 4 compute units, respectively. Each request requires 10 VMs and each server has a capacity equal to 10 compute units. There are three categories of consumers ($P = 3$).

We randomly generated 30 workloads composed of 1,200 requests each, for a total of 36,000 requests. In each workload, the communication strength matrix $C$, the virtual machine types, and the customer class for each request were randomly generated. We then used the same 30 workloads to compare all VM allocation strategies described above. We computed the average per-request revenue and the cumulative revenue, along with their 95% confidence intervals, over all 30 workloads and all requests. The DNC matrix is randomly generated based on a fraction $f$ of cells of the DNC above the diagonal that have a value equal to 1. For example, when $f = 0.7$, 70% of the values above the diagonal in the DNC matrix are equal to one.
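The sketch below shows one way such a DNC matrix can be drawn for a given fraction f, together with a direct evaluation of Eq. (6.7); all names are illustrative.

import random

def random_dnc(K, f, seed=0):
    """K x K symmetric 0/1 matrix with a fraction f of ones above the diagonal."""
    rng = random.Random(seed)
    dnc = [[0] * K for _ in range(K)]
    pairs = [(i, j) for i in range(K) for j in range(i + 1, K)]
    for i, j in rng.sample(pairs, round(f * len(pairs))):
        dnc[i][j] = dnc[j][i] = 1
    return dnc

def ensemble_availability(num_servers, A):
    """Eq. (6.7): availability of a request spread over num_servers servers,
    each with availability A."""
    return 1 - (1 - A) ** num_servers

# e.g., ensemble_availability(5, 0.9) ≈ 0.99999, the upper end of the range
# discussed above for V = 5 VMs placed on distinct servers.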

Table 6.3: Parameter Values for the Experiments.

Parameter | Value
D | 2
C(d) | 10 for all data centers
R(c,d) | 4 for all clusters
S(r,c,d) | 20 for all racks
N | 3
P | 3
d_t | 1, 2, and 4
K | 10
λ | 0.3
C_{s(r,c,d)} | 10 for all servers
M | 20
r^p_max | 10.5, 11.7, and 12 for p = 1, 2, 3
r^p_min | 1, 2, and 3 for p = 1, 2, 3

We defined the CP's normalized capacity utilization $\rho$ as
$$\rho = \frac{\sum_{h=1}^{H} \sum_{t} n_{t,h}\, d_t}{\sum_{h=1}^{H} C_h}. \tag{6.8}$$
Note that the numerator in Eq. (6.8) is the total allocated capacity, expressed as the summation over all $H$ servers and over all VM types of the number $n_{t,h}$ of VMs of type $t$ allocated at server $h$ multiplied by the required capacity $d_t$ of a VM of type $t$. The denominator of this equation is simply the total capacity available in all servers.

The workload for the experiments was generated in a way that maintains $\rho$ below a target value of around 0.7. Requests start as allocation requests. At each request generation, the

capacity utilization $\rho$ is compared to the target value. Once $\rho$ reaches or exceeds the target value, the request becomes, with a 50% chance, an allocation or a deallocation request.

We also defined a very easy to compute upper bound (which we call UpperOPT) for the optimal per-request revenue by considering that there is no space constraint on the servers. That implies that $\alpha = 1$ in the revenue equations (6.3) and (6.4).

Figure 6.3 shows the variation of the normalized capacity utilization $\rho$ over time as requests are processed. The figure shows that once the target value of $\rho$ (i.e., 0.7) is reached, deallocation requests start to be generated. This happens at around time 1500.

Figure 6.3: Normalized allocated capacity over time.

Figure 6.4 shows a comparison between the average revenue obtained per request using the AVAH (top) and the BVAH (bottom) allocation strategies for the linear revenue function. The graph shows that both strategies generate nearly the same average revenue until time 1500. However, once deallocation requests start to be generated, there is a clear separation in the revenue lines for AVAH and BVAH. The reason is that once the deallocation of VMs happens, AVAH shows a marked advantage over BVAH because AVAH takes advantage of the available capacity that appears in the cloud due to deallocation and has more opportunity to rearrange new and existing VMs to place them in close proximity and obtain an improved revenue.

Figure 6.4: AVAH (top) vs. BVAH (bottom) for the linear revenue function.

Also, in all the graphs we can see that AVAH has a much higher variability than BVAH because AVAH is capable of improving the revenue by switching the order of allocations between a current request and a previously allocated request if this improves the revenue. The average revenue for BVAH is 65.3 ± … at the 95% confidence level and for AVAH is 71.9 ± …. Thus, AVAH provides a 10% higher average revenue than BVAH and performs better than BVAH at the 95% confidence level.

Figure 6.5: AVAH (top) vs. A-NoComm (bottom) for the linear revenue function.

Figure 6.6: AVAH (top) vs. A-NoComm (bottom) for the exponential revenue function.

Figure 6.5 shows a comparison between AVAH (top) and A-NoComm (bottom) when the revenue function is linear. When the allocation strategy does not take into account the communication strength between VMs, an inferior allocation in terms of revenue is obtained. In fact, the average revenue for AVAH is 71.9 ± and that of A-NoComm is 52.9 ±. Therefore, AVAH provides a 36% higher revenue and is better than A-NoComm at the 95% confidence level. A graph similar to that in Fig. 6.5 is shown in Fig. 6.6 for the exponential revenue function. In this case, AVAH provides an average revenue of 72.1 ± while A-NoComm provides an average revenue of 53.2 ±. Thus, AVAH generates 35% more revenue than A-NoComm and is better than A-NoComm at the 95% confidence level.

Figure 6.7: BVAH (top) vs. B-NoComm (bottom) for the linear revenue function.

Figure 6.7 compares the average revenue of BVAH (top) and B-NoComm (bottom) for a linear revenue function. The figure clearly shows that BVAH is superior to B-NoComm. In fact, the average revenue generated by BVAH is 65.3 ± and by B-NoComm is 40.5 ±. Thus, the average revenue generated by BVAH is 61% higher than that of B-NoComm, and BVAH is better than B-NoComm at the 95% confidence level. Figure 6.8 is similar to Fig. 6.7 except that in this case the exponential revenue function is used. As in the AVAH case, there is a marked separation between the two strategies. BVAH's average revenue is 65.3 ± while B-NoComm's is 40.4 ±. Thus, BVAH generates 61% more revenue than B-NoComm and is better than B-NoComm at the 95% confidence level.

Table 6.4 shows a summary of the results, including the UpperOPT results, which show that BVAH achieves more than 88% of the upper bound on the optimal revenue (for both the linear and exponential revenue functions) and AVAH achieves more than 87% (for the linear function) and 88% (for the exponential function).

Figure 6.8: BVAH (top) vs. B-NoComm (bottom) for the exponential revenue function.

Table 6.4: Summary of Results.

    Linear Revenue Model
    Strategy     Accumulated Revenue    Avg. Request Revenue    UpperOPT
    BVAH         25,686 ±               65.3 ±
    AVAH         27,474 ±               71.9 ±
    B-NoComm     16,207 ±               40.5 ±
    A-NoComm     20,775 ±               52.9 ±

    Exponential Revenue Model
    Strategy     Accumulated Revenue    Avg. Request Revenue    UpperOPT
    BVAH         25,667 ±               65.3 ±
    AVAH         27,496 ±               72.1 ±
    B-NoComm     16,153 ±               40.4 ±
    A-NoComm     20,744 ±               53.2 ±

Figure 6.9 compares the average availability of VMs when using DNC with f = 70% (top) and without DNC (bottom) for AVAH with a linear revenue function. The figure clearly shows that the availability of VMs is higher when the DNC matrix is used. As shown, once deallocations start to occur at time 1500, the availability of VMs with no DNC starts to decrease sharply: the deallocated VMs free capacity on their servers, so new VMs tend to be placed on those same servers, which decreases the availability. After that, as allocation requests arrive and find a fuller cloud, the VMs are placed farther apart, resulting in higher availability over time. In fact, the average availability generated by AVAH with DNC f = 70% is ± and by AVAH with no DNC is ±. Thus, the average availability obtained by AVAH using DNC is higher than that of AVAH with no DNC at the 95% confidence level.

Table 6.5 compares the average availability of all algorithms with and without DNC. The fourth column of Table 6.5 gives the value of Δ, the difference between the average availability using DNC f = 70% (B) and the average availability with no DNC (A). Note that the 95% confidence interval for Δ does not contain zero. Because this confidence interval lies entirely on the positive side (note that we defined Δ as

B − A), it means that f = 0.7 yields a higher availability at the 95% confidence level. If the interval did contain zero, there would be no significant difference at the 95% confidence level.

Figure 6.9: Availability for AVAH with DNC f=70% (top) vs. AVAH with no DNC (bottom).

Even though using DNC improves the availability, the revenue decreases when DNC is used: avoiding the placement of VMs on the same servers forces the VMs to be placed farther apart from each other, and the revenue depends on the relative position of the VMs. Figure 6.10 shows the average revenue of AVAH without DNC (top) and

with DNC f = 70% (bottom) for a linear revenue function. The average revenue generated by AVAH with DNC f = 70% is 67.4 ± and by AVAH with no DNC is 71.9 ±. Thus, the average revenue generated by AVAH with no DNC is higher than that of AVAH using DNC at the 95% confidence level.

Figure 6.10: Revenue per request for AVAH with no DNC (top) vs. AVAH with DNC f=70% (bottom) for the linear revenue function.

Figure 6.11: Accumulated revenue for AVAH with no DNC (top) vs. AVAH with DNC f=70% (bottom) for the linear revenue function.

The accumulated revenue over time is shown in Figure 6.11 for AVAH with no DNC (top) and AVAH with DNC f = 70% (bottom). We can see a clear separation between the two revenue lines, showing that AVAH with no DNC generates higher revenue over time. In fact, the accumulated revenue generated by AVAH with DNC f = 70% is 25,954 ± 566 and by AVAH with no DNC is 27,474 ± 600. Thus, the accumulated revenue generated by AVAH with no DNC is higher than that of AVAH using DNC at the 95% confidence level.

Table 6.5: Availability with DNC.

    Strategy     A = Avg. Availability    B = Avg. Availability    Δ = B − A
                 (No DNC)                 (DNC f=70%)
    BVAH         ±                        ±                        ±
    AVAH         ±                        ±                        ±
    B-NoComm     ±                        ±                        ±
    A-NoComm     ±                        ±                        ±

6.8 Concluding Remarks

This chapter considered the problem of optimally allocating VMs in a hierarchically organized CP infrastructure. Servers are organized into racks, which are organized into clusters, which are organized into data centers. The communication cost increases as we move from the same server, to a different server in the same rack, to a different rack in the same cluster, to a different cluster in the same data center, and finally to a different data center. The VM allocation problem consists of maximizing the CP's revenue, which is a function of the relative placement of the VMs on servers and of the degree of communication intensity between VMs. The results of this chapter show that consumers can benefit if they are able to provide accurate information about their application needs, and that cloud providers can increase their revenue if they place the requested VMs as close together as their communication needs warrant. A basic heuristic (BVAH) and an advanced heuristic (AVAH) were presented to find near-optimal solutions to this NP-hard problem. We also presented versions of BVAH and AVAH, called B-NoComm and A-NoComm, which ignore the communication strength among VMs when allocating them. The experiments showed that AVAH is better than BVAH and that BVAH (AVAH) is better than B-NoComm (A-NoComm) at the 95% confidence level. These experiments also showed that BVAH and AVAH generate around 88%

of the upper bound on the optimal revenue. We also introduced availability constraints based on user requests not to co-locate VMs on the same server. The experiments showed that, as expected, the availability increases when such constraints are imposed, but the revenue decreases.

Chapter 7: Autonomic Allocation of Virtual Machines in SaaS Cloud Providers

7.1 Introduction

Software as a Service (SaaS) [65] allows companies and individuals to use software hosted and managed by a SaaS provider on a pay-per-use basis, instead of hosting the software in their own datacenters and paying its entire cost up front plus annual charges for software maintenance and upgrades. SaaS providers can, in turn, lease the computing infrastructure on which they instantiate the VMs that run their software services from Infrastructure as a Service (IaaS) [65] providers, also on a pay-per-use basis. Customers can subscribe to and unsubscribe from a software service at any time. Thus, the SaaS cloud provider should dynamically scale the number of VMs needed to run its software services as a function of demand, in a way that minimizes the SaaS provider's cost of using VMs from an IaaS provider while guaranteeing an agreed-upon Quality of Service (QoS) to its customers.

Providing an individual VM dedicated to each customer to run a software service could lead to substantial waste of resources and high infrastructure costs. Thus, an efficient way to improve resource utilization and reduce cost is for SaaS providers to employ a multi-tenancy approach [70, 96], in which several customers (tenants) subscribe to the same application already running on a specific VM, such that the application behaves for each tenant as if that tenant were its sole user. Consequently, SaaS providers can run the same application for multiple customers in the same computing environment to increase resource utilization. While using this approach, SaaS providers must maintain response-time SLAs and other resource constraints at all times.

This chapter solves the problem of determining how SaaS providers can optimally manage the dynamic nature of customer requests in a heterogeneous environment in which VMs

are of different capacities, costs, and computing power. SaaS providers need to determine how many and what types of VMs to instantiate in order to satisfy a given demand for software services while meeting response-time SLAs. We present a hill-climbing-based heuristic, which we call ScaleUpDown, that provides solutions very close to the optimal while visiting a very small fraction of the solution space. Our experiments showed that the number of states visited by ScaleUpDown is very low (on the order of 10^{-4} of the entire space), while the solution obtained is 2% more expensive in many cases, 13% more expensive in others, and 31% more expensive in only one case. As with any such local search algorithm, the key is the heuristic used to determine the neighborhood of a point in the search space. We devised a neighborhood heuristic that works as follows: for each VM type, build the neighborhood of a state by (1) adding a VM of larger capacity, moving users from lower-capacity VMs to the newly added VM, and removing unused VMs of lower capacity; and (2) removing a VM of larger capacity and moving its users to VMs of lower capacity. (A sketch of this neighborhood construction appears at the end of this introduction.)

We also ran experiments with another heuristic we developed, called FillSlotsFirst. Our experiments showed that, at the peak, the current cost of FillSlotsFirst is 50% higher than that of ScaleUpDown. The accumulated cost for FillSlotsFirst increases at a rate of 0.22/sec at the end of our experiments, while the accumulated cost for ScaleUpDown grows at the smaller rate of 0.18/sec during the same interval. Some of the work reported in this chapter has been submitted for publication [5].

The rest of the chapter is organized as follows. Section 7.2 introduces the problem statement and describes the notation. Section 7.3 formalizes the optimization problem solved in this work. Section 7.4 describes the heuristic techniques we used to solve the optimization problem. The results of the experiments are discussed in Section 7.5. Finally, Section 7.6 concludes the chapter.
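The following is a minimal sketch of the ScaleUpDown neighborhood rule described above, simplified to the enumeration of candidate VM counts. The state representation (a vector counting the VMs of each type) and all names are our own assumptions, not the dissertation's MATLAB code; the accompanying user moves are only indicated in comments.

```python
def neighborhood(state, capacities):
    """Enumerate neighbors of a state, where state[t] is the number of
    VMs of type t and capacities is sorted by increasing capacity.

    Rule (1): add one VM of a type (users would then be moved up from
              lower-capacity VMs, and emptied VMs removed).
    Rule (2): remove one VM of a type (its users would be moved down
              to lower-capacity VMs).
    """
    neighbors = []
    for t in range(len(state)):
        up = list(state)
        up[t] += 1                      # Rule (1): one more VM of type t
        neighbors.append(tuple(up))
        if state[t] > 0:
            down = list(state)
            down[t] -= 1                # Rule (2): one fewer VM of type t
            neighbors.append(tuple(down))
    return neighbors

# Hill climbing then moves to the cheapest feasible neighbor and stops
# when no neighbor improves the current cost.
```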

7.2 Problem Formalization and Notation

This chapter uses the following assumptions and notation regarding SaaS cloud providers, as illustrated in Fig. 7.1. We assume that customers of a SaaS provider can subscribe to and unsubscribe from software applications offered by the SaaS provider. Each application provided by the SaaS provider is offered at different QoS levels. With each request to the SaaS provider, a customer informs the provider of the application it wants, the desired QoS level for that application, and the number of users to be added to or removed from the subscription of that application at that QoS level. For example, customer s may initially send a request (a, q, u, s) to add u users as subscribers of application a at QoS level q. A positive value of u represents additional subscribers to the application and a negative value represents a decrease in the number of subscribers.

Figure 7.1: Customers subscribe/unsubscribe to software services at a requested QoS level for a requested number of users. The SaaS provider determines a near-optimal number of VMs to be requested from an IaaS cloud provider to run these software services.
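A minimal sketch of this request format, and of how a SaaS provider might track subscription counts per (application, QoS level), follows; all names here are illustrative assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

class Request(NamedTuple):
    a: str   # application identifier
    q: int   # desired QoS level
    u: int   # users to add (u > 0) or remove (u < 0)
    s: str   # customer identifier

# Current number of subscribed users per (application, QoS level).
subscribers: dict = defaultdict(int)

def process(req: Request) -> None:
    """Apply a subscribe/unsubscribe request to the subscription state."""
    key = (req.a, req.q)
    # Never let the subscriber count go negative.
    subscribers[key] = max(0, subscribers[key] + req.u)

process(Request(a="crm", q=1, u=25, s="customer-7"))   # 25 users subscribe
process(Request(a="crm", q=1, u=-10, s="customer-7"))  # 10 users unsubscribe
```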
