Escuela Politécnica Superior de Linares


UNIVERSIDAD DE JAÉN
Escuela Politécnica Superior de Linares

Master Thesis

ENERGY OPTIMIZATION IN CLOUD COMPUTING SYSTEMS

Student: Iván Tomás Cotes Ruiz
Supervisors: Dr. Rocío Pérez de Prado, Dr. Sebastián García Galán
Department: Telecommunication Engineering Department
June, 2016

Contents

Index of tables
Index of figures
Antecedentes
Objetivos
Conclusiones
1. Background
2. Objectives
3. Methodology
  3.1. State of the art
    3.1.1. Power model
    3.1.2. Dynamic Voltage and Frequency Scaling (DVFS)
    3.1.3. Fuzzy Logic
    3.1.4. Cloud computing types
    3.1.5. Power saving techniques in Datacenters
  3.2. First stage: simulation environment
    3.2.1. CloudSim
    3.2.2. CloudSim with DVFS
    3.2.3. WorkflowSim
    3.2.4. Merged simulator
    3.2.5. Changes with WorkflowSim
    3.2.6. Changes with CloudSim
    3.2.7. Modifications and additions to achieve the proposed joint simulator
    3.2.8. Additional notes to the power model
  3.3. Second stage: scheduling algorithms
    3.3.1. Power aware scheduling
    3.3.2. VM scheduling
    3.3.3. Tasks scheduling
    3.3.4. Bag-of-tasks power aware scheduling
    3.3.5. Classic schedulers adapted to power
    3.3.6. Watts per MIPS scheduler for VMs
    3.3.7. Fuzzy integration in WorkflowSimDVFS
    3.3.8. VM scheduling FRBS
    3.3.9. Tasks scheduler FRBS
    3.3.10. Power model analytical
    3.3.11. Integration with Matlab
4. Results and discussion
  4.1. DVFS results
    4.1.1. DVFS savings
    4.1.2. DVFS parameters evolution
  4.2. FRBS results
    4.2.1. FRBS simulation scenario
    4.2.2. Rules generated for both FRBS schedulers
    4.2.3. FRBS savings
5. Conclusions
Bibliography

Index of tables

Table 1: basic temperature levels
Table 2: membership functions of air conditioner FRBS
Table 3: car FRBS example
Table 4: CloudSim's basic example
Table 5: WorkflowSim communication messages between entities
Table 6: WorkflowSim tags meaning
Table 7: Frequency multipliers and MIPS
Table 8: Energy estimation example
Table 9: Time summary (s)
Table 10: Overall power summary (W)
Table 11: Avg power summary (W)
Table 12: Energy summary (Wh)
Table 13: physical hosts configuration input parameters
Table 14: physical host's configuration calculated parameters
Table 15: VM MIPS in FRBS scenario
Table 16: basic experiments
Table 17: experiments with fuzzy task scheduler
Table 18: experiments with fuzzy VM scheduler
Table 19: experiments with both fuzzy schedulers

Index of figures

Figure 1: fuzzy temperature levels
Figure 2: FRBS objects
Figure 3: distance membership functions
Figure 4: speed membership functions
Figure 5: acceleration membership functions
Figure 6: car FRBS test {0.8, 0.2}
Figure 7: CloudSim's basic example
Figure 8: Montage 25 DAG
Figure 9: WorkflowSim initialization stage
Figure 10: WorkflowSim main stage
Figure 11: WorkflowSim ending stage
Figure 12: Time summary (s)
Figure 13: Time savings (%)
Figure 14: Overall power summary (W)
Figure 15: Overall power savings (%)
Figure 16: Avg power summary (W)
Figure 17: Avg power savings (%)
Figure 18: Energy summary (Wh)
Figure 19: Energy savings (%)
Figure 20: Utilization evolution
Figure 21: Multiplier evolution
Figure 22: Power evolution
Figure 23: VmWattsPerMipsMetric result values
Figure 24: VmWattsPerMipsMetric result savings
Figure 25: VmWattsPerMipsMetric result values (2)
Figure 26: VmWattsPerMipsMetric result savings (2)
Figure 27: VmsFuzzy result values
Figure 28: VmsFuzzy result savings
Figure 29: VmsFuzzy result values (2)
Figure 30: VmsFuzzy result savings (2)

Antecedentes

The Cloud Computing industry has been growing continuously in recent years. More and more companies offer storage or processing services in the cloud. This gives users the ability to access those services from anywhere with just an Internet connection, making tasks such as backing up important data much simpler, as well as providing access to greater processing capacity for executing tasks. However, the growth of datacenters (Centros de Procesamiento de Datos, CPD), which contain ever more servers with more resources, is increasing their energy consumption and, therefore, global energy consumption. In addition, the cooling systems needed to avoid high temperatures in datacenters are numerous and consume a large amount of power.

Techniques that reduce power consumption also achieve a reduction in costs and an increase in system reliability, since higher energy consumption generates higher temperatures, and higher system temperatures lead to higher error rates. As stated in [ 27 ], for every 10 °C increase in a system's temperature, the failure rate doubles. An imminent solution is required. As mentioned in several papers [ 1 ][ 2 ][ 3 ][ 4 ], energy consumption due to servers reached 0.5% of the world total in 2008, and that consumption is expected to quadruple by 2020. It is also estimated that a supercomputer consumes as much as the sum of 22000 households in the United States.

There is a great deal of previous work related to this project. In [ 43 ], M. Seddiki et al. introduce a fuzzy rule-based power scheduler for virtual machines in the CloudSim and RealCloudSim simulators, but they consider neither task scheduling nor the use of workflows. In [ 44 ], García-Galán et al. propose a new strategy called KARP as an alternative to the Michigan approach. KARP is based on PSO, while Michigan is based on Genetic Algorithms. They apply this new strategy to Grid Computing, obtaining better results than with the genetic alternative. In [ 45 ], they compare different knowledge acquisition strategies in fuzzy systems, comparing KASIA with the Pittsburgh approach and KARP with the Michigan approach, and applying these techniques to Grid Computing. They show how KASIA and KARP, both based on PSO, achieve better results than their genetic alternatives. The KASIA strategy is introduced in [ 46 ], where the advantages of the PSO algorithm over genetic algorithms are shown. They also work with meta-schedulers based on genetic algorithms in [ 47 ].

Objetivos

The main objective of this project is to reduce the power and energy consumption of datacenters. Every algorithm implemented needs to be tested, and to avoid the need to own a datacenter, this project is based on simulations. The choice of simulator is important, since the results obtained should be as similar as possible to those of a real datacenter. If the results obtained in a simulator do not correspond to those that would be obtained in a real scenario, the analysis and conclusions are meaningless, and we cannot claim that the developed algorithms save energy in real datacenters. With this in mind, the first stage of this project consists of obtaining a good simulation scenario as the basis for the second stage, the design of the algorithms that reduce energy consumption. This way, the results obtained at the end of the process rest on a good simulation environment.

It must always be kept in mind that the simulation results will not be perfect. Real processing in a real datacenter involves multiple parameters, and not all of them are considered in simulators. However, simulators are matching the operation of real datacenters ever more closely, and they give researchers a great tool for testing different kinds of algorithms to improve datacenter performance.

There are various power-saving techniques for datacenters; section 3.1.5 presents some of the most important ones. The methods used in this project are based on a combination of the DVFS algorithm and the development of two rule-based expert systems that provide power-aware schedulers in both the virtual machine and task scheduling scopes. The DVFS algorithm is already implemented in the simulator chosen as the basis of the project; this document describes its use and analyzes the power and energy savings achieved with it. The two expert systems developed are based on single-fitness optimization, where the fitness considered is energy. Throughout this document, the parameters considered in each expert system are explained in detail and the results obtained are analyzed.

Conclusiones

Section 4 of this document presents the results obtained in the two stages of the project. First, experiments are run with the obtained simulation environment and the DVFS algorithm. Section 4.1.1 presents these results, and section 4.1.2 analyzes the evolution of three parameters of the algorithm in the simulator. Table 12 and Figures 18 and 19 show that the best energy consumption results are achieved by the dynamic governors, as expected. The results for processing time and consumed power are also analyzed, showing how the minimum-power governor introduces delays in execution, increasing the total processing time and raising energy consumption.

Likewise, 14 experiments are run for the second stage of the project, the addition of two fuzzy rule-based expert system schedulers. Section 4.2.1 describes the simulation scenario, presenting the characteristics of the virtual and physical machines. Section 4.2.2 shows the results obtained in the 14 experiments, comparing the aforementioned schedulers with other, non-rule-based algorithms. The best energy consumption results come from the combined use of both the rule-based virtual machine and task schedulers, achieving a total saving of 7.23%, as shown in Table 19 and Figures 29 and 30. The results for execution time and power consumption are also analyzed, but since no multi-objective optimization is performed, the time results are not optimized by the expert system schedulers. After running and analyzing all the experiments, we can conclude that the developed algorithms achieve a lower energy consumption, which is what we set out to achieve.

Although these algorithms and results are based on experiments run in a simulator, so we cannot claim that the results are exactly what would have been obtained by running the algorithms on a real system, we can state that they are similar to those of a real scenario, thanks to the features implemented in the simulator. Leaving out the DVFS algorithm would not make the fuzzy logic schedulers save more or less energy, but the results obtained by including it are closer to those of a real scenario, where the processors in the servers run this algorithm natively.

Additionally, the inclusion of Directed Acyclic Graphs (DAG) allows us to reaffirm the authenticity of the results and the reduction of energy consumed with these fuzzy logic schedulers, since the order in which tasks are executed in these experiments follows a real dependency pattern and, moreover, their lengths are not randomly generated. Future developments of this project will include a multi-objective optimization, using both time and energy as fitness parameters in these algorithms. As expected, the results of Table 16 show a good energy reduction, but no great reduction in processing time can be achieved; this is because all the fuzzy logic and Pittsburgh algorithms use energy as the only fitness, without taking execution time into account. The runs that obtain a lower execution time also incur a higher energy consumption. A multi-objective optimization would reach a balance between these two parameters, finding values that satisfy both requirements of low execution time and low energy consumption in Cloud Computing systems.

As for the proposed simulator, being open source it offers researchers all over the world a great opportunity to test different algorithms and check the results in time, power and energy, knowing that it can run real traces, reproducing workloads from real scenarios. This can save investment in the early stages of research, when new algorithms are to be tested. Once an algorithm obtains good results in the simulator, it can be tested in a real scenario to confirm its operation. In any case, this open source simulator can keep growing at the hands of any researcher wishing to contribute to the project.

1. Background

The Cloud Computing industry has been in continuous growth in recent years. More and more enterprises offer processing or data storage services in their datacenters. This lets users access those services wherever they are, making it much easier to back up and access their important data, as well as to obtain additional processing resources for their tasks. However, the growth of datacenters, containing more clusters and servers with more resources, is increasing their energy consumption and, hence, the global amount of energy consumed. Also, the cooling systems needed to avoid high temperatures in datacenters are numerous and consume a high amount of power too.

The techniques that reduce power consumption also obtain a reduction in operational costs, as well as an increase in system reliability, as higher power means higher temperature and high temperatures lead to higher failure rates. As stated in [ 27 ], for every 10 °C that temperature increases in a system, the failure rate is doubled. An imminent solution is needed. As mentioned in several papers [ 1 ][ 2 ][ 3 ][ 4 ], the energy consumption due to datacenters reached 0.5% of global consumption in 2008, and it is expected to keep rising and quadruple by 2020. Additionally, estimations show that the energy consumed by a supercomputer is equivalent to the energy consumed by 22000 households in the US.

There is a wide range of previous work related to this project. In [ 43 ], M. Seddiki et al. introduce a power-aware FRBS scheduler for VMs in CloudSim and RealCloudSim, but they do not consider task scheduling or the use of workflows. In [ 44 ], García-Galán et al. propose a new strategy named KARP as an alternative to the Michigan approach. KARP is based on PSO, while the Michigan approach is based on Genetic Algorithms. They apply this new strategy in Grid Computing, getting better results than with the genetic counterpart. In [ 45 ], they compare different strategies of knowledge acquisition in fuzzy systems, comparing KASIA with the Pittsburgh approach and KARP with the Michigan approach, and applying these techniques to Grid Computing. They show how KASIA and KARP, both based on PSO, achieve better results than their genetic alternatives. The KASIA strategy is introduced in [ 46 ], where the advantages of the PSO algorithm over genetic ones are shown. They also cover fuzzy meta-schedulers based on Genetic Algorithms in [ 47 ].

2. Objectives

The main goal of this project is to reduce the power and energy consumption of datacenters. Every algorithm implemented needs to be tested. To avoid the necessity of owning a datacenter, this project is based on simulations. The choice of simulator is important, since the results obtained should be as similar as possible to those of a real datacenter. If the results obtained in a simulator do not match a real scenario at all, the analysis and conclusions are pointless, and we cannot say that the designed algorithms save energy in real datacenters. With this in mind, the first stage of this project consists of providing a good simulation environment as the base for the second stage, the design of the algorithms that reduce energy consumption. This way, the results obtained at the end of the process will rest on a fairly good simulation environment.

It must always be noted that the simulation results will not be perfect. Real processing in a real datacenter involves multiple parameters, and not all of them are considered in simulators. However, simulators are matching real datacenters ever more closely and provide researchers a great tool for testing different types of algorithms to improve datacenter performance.

There are several power-saving techniques for datacenters; in section 3.1.5 we present some of the most important ones. The methods we use are based on a combination of the Dynamic Voltage and Frequency Scaling (DVFS) algorithm and the development of two rule-based expert systems that provide power-aware schedulers in both the VM and task scopes. The first algorithm is already implemented in the chosen base simulator; we describe its use and analyze the power and energy savings obtained with it. The two developed expert systems are based on a single fitness, namely energy. Throughout this document, we explain in detail the parameters considered in each expert system and analyze the results obtained.

3. Methodology

In this section, the overall process of the project is explained, from the starting point to the final results, step by step so that the reader can reproduce the experiments. It begins with a state of the art, introducing several concepts needed to understand the rest of the project. After that, the overall process is described in execution order: first the initial situation, the problems encountered and the approaches taken to solve them, until the simulation environment is built and explained; then the same procedure is followed for the second stage, describing step by step everything needed to obtain the energy-saving algorithms. Results and their analysis are presented in the next section.

3.1. State of the art

This section introduces some concepts that need to be understood in order to fully comprehend the project.

3.1.1. Power model

To know whether the techniques developed in this project achieve the intended reduction in energy consumption, we need to be able to measure the power consumed. Each task executed on a resource lasts a certain amount of time, and that resource consumes a certain amount of power. The relation between both parameters allows the consumed energy to be estimated, following (1).

E = P · t  (1)

This relation is of great importance in this project, as the balance between power consumption and execution time limits the energy that can be saved. A machine that can execute a task in less time usually does so because it has a larger amount of resources, which tends to consume more power. If the time saved is offset by the extra power consumed, no energy has been saved at all, so this relation is at the center of the entire project.
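As an illustration of the balance in equation (1), the following sketch compares two hypothetical machines running the same task. The MIPS ratings and wattages are invented for the example, not taken from the thesis's experiments.

```python
# Sketch of the energy/time balance in equation (1), E = P * t.
# Machine specs below are hypothetical, for illustration only.

def energy_wh(power_w: float, time_h: float) -> float:
    """Energy (Wh) consumed running at power_w watts for time_h hours."""
    return power_w * time_h

task_length_mi = 3_600_000             # task length in millions of instructions
fast = {"mips": 2000, "power_w": 250}  # higher performance, higher power
slow = {"mips": 1000, "power_w": 120}  # half the speed, under half the power

for name, m in (("fast", fast), ("slow", slow)):
    time_h = task_length_mi / m["mips"] / 3600  # execution time in hours
    print(name, round(energy_wh(m["power_w"], time_h), 1), "Wh")
```

With these invented numbers the fast machine finishes in half the time (0.5 h vs. 1 h) yet consumes slightly more energy (125 Wh vs. 120 Wh): the time saved does not fully offset the extra power, which is exactly the trade-off equation (1) captures.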
Execution time is easily obtained as the ratio of the task's length to the performance of the machine. The power parameter is a more delicate issue. The commonly used power model considers energy consumption as the sum of two parts: a dynamic and a static component [ 28 ].

E = E_d + E_s  (2)

The dynamic component depends directly on the processor, while the static component depends on memory, I/O devices and storage. The dynamic power can easily be modified, as will be seen in section 3.1.2, but the static part cannot; this is why the static component is usually expressed in relation to the dynamic one. The dynamic power is estimated as in (3) [ 29 ], where it is related to the square of the processor voltage V, the processor frequency f, its capacitance C and a constant k_d.

P_dynamic = k_d · C · V^2 · f  (3)

The power considered in equation (1) refers to this dynamic component, the part that depends on the CPU parameters. So, from (1) and (3), we obtain the dynamic energy:

E_d = P_d · t  (4)

The static component of the energy is often considered a fraction of the dynamic component (5) [ 30 ][ 31 ], where k_s is a constant.

E_s = k_s · E_d  (5)

With this consideration, and joining equations (2) to (5) [ 28 ]:

E = E_s + E_d = k_s · E_d + E_d = (1 + k_s) · E_d  (6)

This is why the variation of the global energy is considered proportional to the dynamic component, the one we can vary through the CPU parameters, voltage and frequency, always taking into account the balance expressed in (1). The technique introduced in the next section also depends on this balance: reducing a machine's performance reduces the power consumed, but increases the execution time of the tasks the machine will execute.

3.1.2. Dynamic Voltage and Frequency Scaling (DVFS)

Modern CPUs already include this technique, which dynamically varies the voltage and frequency of the CPU to adapt to the workload, measured as the utilization of the CPU. Its main objective is to reduce energy consumption. Its behavior depends on the selected governor, of which there are two basic types: static and dynamic governors.
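Equations (3) to (6) can be combined into a small numeric sketch showing why lowering voltage and frequency can save energy even though execution time grows. The constants k_d, C, k_s and the two operating points are assumed values for illustration only, not measurements from the thesis.

```python
# Sketch combining equations (3)-(6):
#   P_dynamic = k_d * C * V^2 * f   (3)
#   E_d = P_dynamic * t             (4)
#   E = (1 + k_s) * E_d             (5)-(6)
# All constants (k_d, C, k_s, cycle count) are made up for illustration.

def total_energy(v: float, f_hz: float, t_s: float,
                 k_d: float = 1.0, c: float = 1e-9, k_s: float = 0.3) -> float:
    p_dynamic = k_d * c * v**2 * f_hz   # equation (3)
    e_dynamic = p_dynamic * t_s         # equation (4)
    return (1 + k_s) * e_dynamic        # equations (5)-(6)

cycles = 2e9                             # hypothetical task size in CPU cycles
for v, f in ((1.2, 2.0e9), (0.9, 1.0e9)):  # high vs. low DVFS operating point
    t = cycles / f                       # halving f doubles execution time
    print(f"V={v} f={f/1e9:.1f} GHz -> {total_energy(v, f, t):.2f} J")
```

Even though the low operating point takes twice as long, its quadratic dependence on voltage makes it cheaper overall (about 2.1 J vs. 3.7 J here), which is the balance between power and time that equation (1) emphasizes.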
While static governors use a fixed value for both voltage and frequency, dynamic governors are based on thresholds: voltage and frequency are varied according to the current utilization with respect to the configured threshold. It is important to note that the available voltage and frequency levels depend on each processor. Those values are exposed as multipliers, so there is a discrete set of values.

In this project, we work with 5 different governors, introduced below.

Static governors:

- Performance: fixes voltage and frequency at the values that achieve maximum performance. This is the default configuration, i.e., as if the DVFS algorithm were not active. It always achieves the lowest execution time on the machine, but incurs the highest power consumption.
- PowerSave: the opposite of the Performance governor. It fixes the lowest voltage and frequency values, achieving the lowest possible power consumption but enlarging execution time and delaying the end time.
- UserSpace: allows the user to select the desired values for both voltage and frequency. Few tests were performed with this governor in this project, due to its dependency on the processor's available multiplier values and on the user's current selection.

Dynamic governors:

- OnDemand: compares the current CPU utilization with a preselected threshold. In the simulator used, this threshold is configurable; its default value is 95%.
- Conservative: differs from the previous governor in that it uses two thresholds instead of one, called up_threshold and down_threshold. Their default values in the simulator are 80% and 20%, respectively.

The behavior of the dynamic governors increases performance when needed, if CPU utilization exceeds the up_threshold, and lowers it to save power when utilization falls below the down_threshold (note that in OnDemand both roles are played by the same threshold). To avoid constant variation of the multipliers, a counter is introduced: when utilization falls below the threshold in OnDemand, the counter counts a number of steps, and only if utilization is still under the threshold after the counter reaches a certain value are the multipliers lowered one step.
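The OnDemand behavior just described, a single threshold plus a step counter before lowering the multiplier, can be sketched as follows. This is a simplified illustration, not CloudSim's actual governor code; the multiplier levels, threshold and step count are assumed values.

```python
# Minimal sketch of an OnDemand-style governor: utilization above the
# threshold raises the multiplier to maximum immediately, while lowering
# it requires utilization to stay below the threshold for several
# consecutive sampling steps. Values are illustrative only.

class OnDemandGovernor:
    def __init__(self, levels, threshold=0.95, steps_before_down=3):
        self.levels = levels               # available frequency multipliers
        self.idx = len(levels) - 1         # start at maximum performance
        self.threshold = threshold
        self.steps_before_down = steps_before_down
        self.low_count = 0                 # consecutive low-utilization samples

    def sample(self, utilization: float) -> float:
        if utilization > self.threshold:
            self.idx = len(self.levels) - 1  # jump to max when overloaded
            self.low_count = 0
        else:
            self.low_count += 1
            if self.low_count >= self.steps_before_down and self.idx > 0:
                self.idx -= 1                # lower one step, restart count
                self.low_count = 0
        return self.levels[self.idx]

gov = OnDemandGovernor(levels=[0.5, 0.75, 1.0])
print([gov.sample(u) for u in (0.3, 0.3, 0.3, 0.99, 0.3)])
# The multiplier drops only after three low samples, and snaps back to
# maximum as soon as utilization exceeds the threshold.
```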
A more detailed comparison of these governors is shown in the results, where we identify those that manage to reduce energy consumption and the problems that some of them incur.

The general objective of this algorithm is to reduce both voltage and frequency when utilization is low. While utilization stays below 100%, this causes no further delay in the execution's end time. On the other hand, if utilization exceeds the threshold or even reaches 100%, voltage and frequency are switched up until they reach their maximum, so that the CPU's full performance can be used if the executing tasks need it.

Now consider P_idle as the amount of power consumed when CPU utilization is 0, and P_busy as the power consumed when utilization is 100%. Although P_idle and P_busy correspond to utilizations u = 0 and u = 1 respectively, power follows (3), meaning that as this technique decreases both voltage and frequency, power consumption is reduced. A CPU can work at a certain number of different frequencies depending on the multiplier, and voltage is scaled with the frequency; both parameters are interrelated, so lower voltages go with the lower frequencies selectable on the CPU.

The basic power model used in the simulator assumes that the P_idle and P_busy values have been obtained experimentally on a modern CPU that supports DVFS: both parameters can be measured at each multiplier value, i.e., at different frequencies and voltages, yielding an array of values for each of them. Using DVFS in the simulator, we can thus estimate power and energy consumption based on a real CPU's behavior, since modern processors implement this technique. From the P_idle and P_busy values and the utilization level of the CPU, the power consumed is estimated assuming a linear increase with CPU utilization:

P(u) = P_idle + (P_busy - P_idle) · u  (7)

while this utilization u is used to check the thresholds of the selected governor and to scale frequency and voltage so as to minimize the overall energy consumption.
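A minimal sketch of equation (7) follows, assuming a hypothetical table of P_idle/P_busy measurements per multiplier level; a real table would be measured on the target CPU as described above.

```python
# Sketch of equation (7): linear power interpolation between the measured
# P_idle and P_busy at the current multiplier level. The wattage tables
# below are hypothetical, not taken from any real CPU.

P_IDLE = {0.5: 40.0, 0.75: 55.0, 1.0: 70.0}    # watts at utilization 0
P_BUSY = {0.5: 80.0, 0.75: 110.0, 1.0: 150.0}  # watts at utilization 1

def power(multiplier: float, utilization: float) -> float:
    """P(u) = P_idle + (P_busy - P_idle) * u, per equation (7)."""
    p_idle, p_busy = P_IDLE[multiplier], P_BUSY[multiplier]
    return p_idle + (p_busy - p_idle) * utilization

print(power(1.0, 0.5))   # full frequency, 50% utilization
print(power(0.5, 0.5))   # same utilization after DVFS lowers the multiplier
```

With these made-up tables, dropping the multiplier from 1.0 to 0.5 at 50% utilization nearly halves the estimated power (110 W to 60 W), which is the saving the dynamic governors exploit.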
This way, we can estimate power and energy consumption in a simulator while also considering the savings introduced by the DVFS algorithm. This gives us a simulation environment closer to the behavior of a real processor.

3.1.3. Fuzzy Logic

This type of logic is commonly used in Fuzzy Rule Based Systems (FRBS). Its main concept is that the value of a variable can be any real number between 0 and 1, instead of just true or false. The range of values taken by each

variable is specified in the Membership Functions. This type of logic makes systems work closer to the way humans evaluate the state of a parameter. A basic example would be measuring the temperature of water. Classic logic based on true/false variables can only assign discrete values to the input parameters, while fuzzy logic allows defining a combination of the different membership functions based on the level of membership in each of them. In the case of water temperature, we could define different temperature levels based on the value in ºC. For example, with 5 membership functions we could have the following values:

Level      Temperature
Very cold  <10ºC
Cold       >10ºC AND <20ºC
Warm       >20ºC AND <30ºC
Hot        >30ºC AND <40ºC
Very hot   >40ºC

Table 1: basic temperature levels

Using classic logic based on IF-ELSE statements, a temperature of 34ºC would be classified as HOT according to Table 1. However, every temperature between 31ºC and 39ºC would equally be classified as HOT, conveying that all these values correspond to the same kind of temperature. This is not how humans think, as everyone would agree that 36ºC is hotter than 32ºC. This cannot be expressed with traditional logic, which is why the concept of fuzzy logic was created. With this new type of logic, we can define in a system different ranges of temperature values for each specified level. This way, a value of 34ºC would belong to both the WARM and HOT levels, as it is an intermediate point between them. The width of each membership function depends on the user who configures them. Figure 1 below shows a possible configuration for these membership functions. A width of 20 has been set for each of them, allowing each temperature to belong to two membership functions. There are several ways of defining the membership functions in a FRBS; typical methods include triangular, pyramidal and Gaussian functions, among others.
The one used in this temperature example is triangular. Some systems, such as jfuzzylogic, define triangular and pyramidal membership functions as a set of points, while other

authors, like in [5], define pyramidal functions with two points and two slopes. Matlab makes the use of Gaussians easy. Figure 1 has been generated with jfuzzylogic and, as a simple example, we show the code to build this antecedent:

    FUZZIFY temperature
        TERM very_cold := (0, 1) (5, 1) (15, 0);
        TERM cold := (5, 0) (15, 1) (25, 0);
        TERM warm := (15, 0) (25, 1) (35, 0);
        TERM hot := (25, 0) (35, 1) (45, 0);
        TERM very_hot := (35, 0) (45, 1) (50, 1);
    END_FUZZIFY

Figure 1: fuzzy temperature levels

Normally in a FRBS the input value range is not specified as in Figure 1, with temperature values between 0 and 50ºC; instead, inputs are normally normalized between 0 and 1. The input value then needs to be normalized by the maximum possible input temperature, which in this case is 50ºC. This is done to allow a single FRBS to work in different systems. Imagine that we have two systems that measure temperature. If one system considers temperatures between 0 and 50ºC, and the other one takes values between 0 and 100ºC, we can do two different things. As the temperature ranges are different, we can simply build two FRBS. This, however, amounts to having the same engine twice, which is not really efficient. The other option is building just one FRBS between 0 and 1 and making both systems use it. The only thing to do, prior to sending the input parameter, is that system one normalizes the temperature by its maximum of 50ºC, and system two

with 100ºC. This way we can avoid having multiple FRBS, as long as both systems consider the same number and type of antecedents. Antecedents is the name given to the input parameters, but fuzzy logic considers other objects apart from them:

- Fuzzification: This is the first stage. Input parameters are received here after they have been normalized, and this engine is in charge of determining the level of membership of the normalized input parameter in each membership function. Following the current temperature example, a value of 27ºC would belong to a high degree to the membership function WARM, and would also belong, to a lower degree, to HOT. This engine determines the degree of membership in each membership function; these degrees will then be used by the inference engine to evaluate the rules.

- Rule base: The rule base includes all the rules that will be executed in the inference process. Rules are defined as IF-THEN clauses that take the level of membership of some or all of the antecedents in a specified membership function and use them to obtain a value for the output consequent in a membership function. This is easier to understand with an example. Following the FRBS specified in [5], we get a system with two antecedents, temperature and variation of the temperature, and one consequent, the output response sent to the air conditioner. Table 2 shows the membership functions for this system.

Antecedents               Consequent
Temperature   Variation   Output
Cold          Slow        Very low
Cool          Moderate    Low
Mild          Fast        Medium
Warm                      High
Hot                       Very high

Table 2: membership functions of the air conditioner FRBS
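The fuzzification stage just described can be sketched in a few lines. The `triangular` and `fuzzify` helpers below are hypothetical illustrations, not jfuzzylogic's actual API; only the three inner triangular terms of Figure 1 are included, omitting the shouldered very_cold and very_hot terms for brevity:

```python
def triangular(x, a, b, c):
    """Membership degree for a triangle rising from a (0) to b (1)
    and falling back to c (0)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Inner terms of the 'temperature' antecedent as defined in Figure 1.
TERMS = {
    "cold": (5, 15, 25),
    "warm": (15, 25, 35),
    "hot":  (25, 35, 45),
}

def fuzzify(temperature):
    """Degree of membership of the input in every triangular term."""
    return {name: triangular(temperature, *abc) for name, abc in TERMS.items()}
```

Evaluating `fuzzify(27)` yields a degree of 0.8 for warm and 0.2 for hot, matching the intuition that 27ºC belongs mostly to WARM and, to a lower degree, to HOT.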

A sample rule would be: if the measured temperature is very low, belonging to Cold, and the variation of the temperature is high, belonging to Fast (meaning that the temperature is dropping at a high rate), then we would want the air conditioner to heat the room fast, so the output response would be Very high. In fuzzy logic, this rule is written as follows:

IF temperature IS cold AND variation IS fast THEN output IS very-high

This is an example of a possible rule. Keep in mind that there can be several rules in the FRBS, and all of them are evaluated each time the system processes the input parameters to obtain a value for the output. The system therefore needs a set of rules that covers the majority of the situations it can encounter when receiving the input parameters. Even if we have rules for both low and high temperature values, the temperature measured and received as input will be either high or low, but never both at the same time. What the fuzzifier does in the first stage is obtain the degree of membership of each input in each of its membership functions. As stated, most of the membership degrees will be 0, so when evaluating a set of rules, if the input temperature is very low and its degree of membership in the high membership function is 0, the evaluation of the rules involving the high membership function will yield a null value for the output.

There are two types of rules: those which use AND clauses in the antecedents and those which use OR clauses. The main difference between them is as follows: while AND rules set the consequent value to the minimum value over all of the antecedents, OR rules set the consequent value to the maximum over the antecedents. In the rule shown above, if the degree of membership of temperature in cold is 0.8 and the degree of variation in fast is 0.3, the degree to which output is set to very-high will be 0.3, as AND takes the minimum.
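The AND/OR evaluation just described amounts to taking a minimum or a maximum. A minimal sketch, using the membership degrees 0.8 and 0.3 from the example above:

```python
# AND rules take the minimum of the antecedent degrees,
# OR rules take the maximum.
def evaluate_and(*degrees):
    return min(degrees)

def evaluate_or(*degrees):
    return max(degrees)

# IF temperature IS cold AND variation IS fast THEN output IS very-high
cold, fast = 0.8, 0.3
very_high_and = evaluate_and(cold, fast)  # 0.3
very_high_or = evaluate_or(cold, fast)    # 0.8
```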
Modifying the rule to OR with the same antecedent values would yield an output degree of 0.8.

- Inference engine: While the rule base only stores the rules defined by the user, which are the same in every evaluation, the input parameters change in each evaluation and the output value varies accordingly. This evaluation of each rule with the current values of the antecedents is performed in the inference engine, which receives the degree of membership of each antecedent in its membership functions from the fuzzifier. After this evaluation, the system has an output value for each rule.

- Defuzzification: This is where all the output values are combined to obtain a single normalized output value. The typical way these values are merged into one is the Center of Gravity (COG) method. Once the COG has been computed, the defuzzifier obtains the normalized output value based on the membership functions of the consequent. This normalized value then needs to be denormalized by the corresponding system to get an appropriate output value from the FRBS. Figure 2 shows a diagram of this set of objects used in a FRBS.

Figure 2: FRBS objects

To better understand all these concepts, here we show an example of a FRBS consisting of a system that evaluates the distance and speed of a car at a given time and returns the acceleration of the car. The system needs to move the car up to a wall set at a certain distance and stop it at the wall. Table 3 shows the membership functions of the antecedents and the consequent.

Antecedents            Consequent
Distance    Speed      Acceleration
Close       Slow       Much-brake
Medium      Medium     Brake
Far         Fast       Keep-speed
                       Accelerate
                       Much-accelerate

Table 3: car FRBS example
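The COG step can be sketched as a discrete approximation: a weighted average of sampled points of the aggregated output, weighted by their membership degrees. This is an illustrative simplification, not jfuzzylogic's internal implementation:

```python
def center_of_gravity(xs, mus):
    """Discrete COG: weighted average of sample points xs by their
    aggregated membership degrees mus."""
    total = sum(mus)
    if total == 0.0:
        return 0.0  # no rule fired; a real system would need a policy here
    return sum(x * mu for x, mu in zip(xs, mus)) / total
```

For a symmetric aggregated output such as samples [0, 0.5, 1] with degrees [0.2, 0.6, 0.2], the COG falls at the midpoint 0.5; an asymmetric aggregate shifts it towards the heavier side.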

In this example, both antecedents have a normalized range of [0, 1], as neither distance nor speed can be negative. A negative distance would mean that the wall has been passed, and a negative speed would mean going backwards, both of which are unwanted situations. However, the output acceleration needs to be negative in those cases where the car must brake and reduce its speed, so the normalized range of the consequent is [-1, 1]. Figures 3, 4 and 5 show the membership functions listed in Table 3.

Figure 3: distance membership functions
Figure 4: speed membership functions
Figure 5: acceleration membership functions

To test this FRBS, the rules first need to be generated. According to the scenario, the car should move fast when it is far from the wall, but gradually reduce its speed as it gets closer to its destination. For testing purposes, 9 rules are generated that fit these requirements:

1. IF distance IS close AND speed IS slow THEN acceleration IS keep-speed
2. IF distance IS close AND speed IS medium THEN acceleration IS brake
3. IF distance IS close AND speed IS fast THEN acceleration IS much-brake
4. IF distance IS medium AND speed IS slow THEN acceleration IS accelerate
5. IF distance IS medium AND speed IS medium THEN acceleration IS keep-speed
6. IF distance IS medium AND speed IS fast THEN acceleration IS brake
7. IF distance IS far AND speed IS slow THEN acceleration IS much-accelerate
8. IF distance IS far AND speed IS medium THEN acceleration IS accelerate
9. IF distance IS far AND speed IS fast THEN acceleration IS keep-speed

Now, to see how these rules work, we test the FRBS with input values of {distance, speed} = {0.8, 0.2}.

Figure 6: car FRBS test {0.8, 0.2}

As can be seen, since these are AND rules, all the rules involving either close distance or fast speed get an output of 0. The COG is computed with the rest of the output values and, as expected, the FRBS indicates to keep accelerating, as the distance is relatively far for the speed at which the car is moving. The output of a FRBS depends to a high degree on the rules configured by the user. As it is difficult for a user to find the best possible rule configuration to accomplish the

best result, these FRBS are normally used together with another system that optimizes the rules. This is the case of the Pittsburgh and KASIA approaches, among others. These systems use meta-heuristic algorithms, namely the genetic algorithm and Particle Swarm Optimization (PSO) respectively, to try an initially random population of members or particles and optimize them through combinations oriented towards the best solutions found. Here the concepts of exploration and exploitation come into play. If the system is too focused on exploring, there is a low chance of reaching a good final value, as the system does not move towards the good solutions. On the other hand, if the system is too focused on exploiting, there is a high chance of getting stuck in a local optimum in the case of PSO, or of incurring elitism due to a high selective pressure. For these two reasons, a correct balance between the two approaches is needed. This way, using one of these systems we can find a set of rules that accomplishes a better result than the rules we tried manually.

3.1.4. Cloud computing types

After years of research, the community has developed several different types of Cloud Computing networks, each of them based on offering a different service to its users. Accordingly, all the different types are named after a service, using the nomenclature XaaS (X as a Service) [23]. Among all these types, the most important are SaaS (Software as a Service), PaaS (Platform as a Service), IaaS (Infrastructure as a Service) and HaaS (Hardware as a Service). There are many different taxonomies coming from different enterprises, as each of them tries to define Cloud Computing in its own way. The mentioned types follow a layered architecture, with SaaS at the top level and HaaS at the bottom. Their order expresses the level of privileges that the final user gets in each of them. Below we describe what is done in each of them.
- Application level

Contains the actual application that is offered to the users; this will vary depending on the enterprise. The use of this type of service brings several benefits from the point of view of the final user. First, the final user does not need specialized knowledge to support the system. This responsibility falls on the enterprise that offers the cloud computing service, which guarantees the final user the availability of the application at all times and ensures that the application works as intended.

- Platform level

If the final user needs more privileges in the cloud and prefers more flexibility, this is the second layer of the stack. At this level, the final user is in charge of managing the whole operating system instead of just making use of a ready-built application. This gives the flexibility of working with the cloud server as if it were a local server, with the corresponding responsibilities over the system and the applications. The final user is given a platform with a certain performance, without being able to decide on the VMs' characteristics or the physical host management. However, this may be an advantage for some final users, as they do not need the knowledge required to manage these VMs, which reduces the complexity of the system on their side. As in SaaS, hardware failures remain the responsibility of the IT enterprises that offer the service, which saves the final user from having to buy any hardware when a component stops working.

- Infrastructure level

In case the final user wants still more privileges and absolute control over a physical host, this layer offers a whole server of the demanded characteristics, together with the right to manage VM creation and to decide how to divide the host's resources. Still, final users are freed from tasks such as security, backup and data partitioning. Three of the most important virtualization technologies used at this layer of the cloud computing stack are KVM [24], VMware [25] and XEN [26]. Virtualization technologies allow configuring the whole capacity of the physical host and dividing it into different VMs. These VMs are configurable, making it possible to choose the performance capacity of each one of them. This allows dynamic resource management, with systems modifying the capabilities of each VM to achieve the optimum operation of the whole server and system. Another great advantage for these users is always being able to use the latest technology.
If the user instead prefers a local server in order to have full control over it, after some time the hardware will become deprecated and another server will need to be bought. However, using these IaaS solutions, the final user always gets the performance that is being paid for, without worrying about hardware. This allows users to compete at a much lower cost than if they regularly had to acquire new hardware.

- Hardware level

This level is normally managed by the IT enterprise that offers the cloud service to the final users. In this layer, the physical resources of the Datacenter are managed, including the physical servers, routers, switches and the systems that provide power and cooling. Operations like hardware configuration, failure handling, power, cooling and traffic management take place at this level.

In the case of Hardware as a Service (HaaS), hardware belonging to the IT enterprise is installed at the client's site. An SLA indicates the responsibilities of each party, and the client pays for the use of that hardware. This way, the final user is able to manage the hardware while leaving the replacement of broken parts to the IT owner. Among the numerous advantages of this solution, some are shared with the IaaS alternative. For example, the final user does not need to worry about hardware obsolescence: whenever the required level of performance increases, the IT enterprise can replace the hardware with one with more resources, increasing the fee the user pays to the owner of the hardware. The replaced hardware can then be used for another client who does not need as much performance. Additionally, in case of a hardware failure, the IT enterprise is in charge of replacing it with new hardware, as this is one of its responsibilities. This way, the funds the final users need to invest each time a hardware upgrade is needed are much lower than if they had to buy the hardware themselves. Also, in case of failures, the Managed Services Provider (MSP) can do the troubleshooting and help solve the problem. Maintenance can be relied on them too, making the task of keeping the hardware working as intended much easier. Another advantage is scalability: if a client needs to increase the amount of hardware in the Datacenter, the MSP can easily install additional hardware, increasing the fee.
Although this is obvious, there is another possibility that is also important to note. If the client buys the hardware and the enterprise later contracts, the amount of hardware needed will be lower, but hardware already bought means money already invested. With HaaS solutions, it is as simple as having the MSP take away some hardware and reducing the fee paid. The information shown so far concerns the architecture of cloud computing. There is also another classification of cloud computing, by deployment type.

- Public clouds

The services offered by the cloud are available to every user. These networks normally do not require an initial investment, but they lack control over the managed data. Also, the security offered by their applications or storage services may not be enough for some clients, who will then consider paying for a private cloud if their data is too important to sacrifice reliability.

- Private clouds

These clouds are designed for the use of a single organization, which gives a higher level of privacy than free public solutions. They offer the highest level of control over reliability, performance and security, as they can be managed by the same organization that pays for them. Datacenters self-run by the organization may incur high costs and require large amounts of space to house the physical machines within the organization. They also need funds for management and maintenance, which increases costs too. In this case, they can even lose the benefit of off-loaded management, partly losing the idea that makes cloud computing a great solution for many organizations.

- Hybrid clouds

Being a combination of both the public and private alternatives, these networks try to solve the problems encountered in each of them alone. The infrastructure is divided in two parts over the two different networks, one part in a public cloud and another in a private cloud, which achieves higher flexibility than running only one of them. This offers the possibility of storing important data of the organization in the private part, while letting some applications run as a SaaS solution in the public network. Hybrid clouds offer a higher level of control and data security than purely public solutions, while still facilitating the scalability of the network by allowing the Datacenter capabilities to be extended with external services in public clouds.
Another advantage is meeting temporary capacity needs, at those times when a large amount of resources is needed to process a large amount of data. External resources can then be used to handle a peak of burst data. This means that the client pays for those extra resources only when they are needed, avoiding the necessity of owning a huge private cloud to meet users' SLAs at all times and avoid breaching their agreements. This way, the private Datacenter can be built to

support processing average workloads, letting external resources deal with the additional traffic. The difficult part of this mixed solution is deciding and optimizing the best division of the components that run in each of the two parts.

- Virtual private clouds

This is another solution to the problems found when running a solely public or private cloud. Essentially, it consists of a platform running over a public cloud, with the difference of using Virtual Private Networks (VPN) that allow defining custom topologies and security settings.

3.1.5. Power saving techniques in Datacenters

There are many different techniques to achieve a reduction in the power consumed by Datacenters. In this section we summarize some of the most important ones and indicate the direction this project takes within the power saving scope.

1- DVFS

As shown before, DVFS [8][9] is a technique that, based on the type of governor used, adjusts both the frequency and voltage levels of a processor, adapting to the workload. This technique allows a higher performance at a higher power cost and, on the other hand, achieves power reduction at the cost of performance. This algorithm is included in the Linux kernel, which allows its easy use in servers. This technique is not a substitute for other algorithms in the same scope, but an addition that further increases the level of power saving in datacenters. As the reduction of power consumption also incurs a reduction of performance, it is not wise to set power to the minimum, as that would make it difficult to fulfill the agreements set in the user's SLA. To achieve the maximum power saving, the frequency and voltage levels need to be set to the lowest possible values that fulfill the SLA, keeping both performance and power at the minimum possible levels.
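As an illustration of the governor behavior described above, here is a simplified sketch of an ondemand-style frequency selection rule. The frequency table and the threshold are assumptions for illustration, not the Linux kernel's actual implementation:

```python
# Available CPU frequencies in Hz, lowest to highest (assumed values).
FREQS = [1.2e9, 1.8e9, 2.4e9, 3.0e9]
UP_THRESHOLD = 0.95  # hypothetical governor threshold

def select_frequency(utilization, current_freq):
    """Jump to the maximum frequency when utilization crosses the
    threshold; otherwise pick the lowest frequency that can still
    serve the current load without exceeding the threshold."""
    if utilization >= UP_THRESHOLD:
        return FREQS[-1]
    # Cycles per second the current frequency is actually serving.
    demand = utilization * current_freq
    for f in FREQS:
        if demand / f <= UP_THRESHOLD:
            return f
    return FREQS[-1]
```

For example, a lightly loaded CPU at the top frequency is scaled down to the lowest step, while a near-saturated one is pushed to the maximum, matching the threshold behavior described for the governors.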
In the old days, before this technique was implemented, servers used to have both frequency and voltage permanently set to their maximum levels, so the power consumed at any moment depended only on the utilization. This meant an incredibly high amount of power consumed by idle servers, which is now considerably lower.