Outin, Edouard, et al. "Enhancing cloud energy models for optimizing datacenters efficiency." Cloud and Autonomic Computing (ICCAC), 2015 International Conference on. IEEE, 2015. Reviewed by Cristopher Flagg December 6, 2017
Objective Minimize Energy Consumption Maintain SLA requirements Nontrivial Multi-Objective Optimization Problem Genetic algorithm to optimize Cloud energy consumption Machine learning to improve fitness function
Fitness Function - Research Questions Depends on the underlying model RQ1. Do differences exist between the energy simulation based on hardware specifications and the real data that can be observed? RQ2. Could we use machine learning techniques at runtime to improve the simulation accuracy?
Problem Statement Simulation used to model datacenter consumption Accuracy of simulation drives accuracy of modeling Models used in "Analysis" step of MAPE-K Based on Standard Performance Evaluation Corporation (SPEC) benchmarks of power consumption
Problem Statement - CloudSim Aims to provide a generalized and extensible simulation framework that enables modeling, simulation, and experimentation of emerging Cloud computing infrastructures and application services Energy model is based on the host CPU utilization
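A minimal sketch of a CPU-only power model of this kind, assuming power is known at 10% load steps (the layout of spec.org power results) and interpolated linearly in between. The wattages and the name `power_from_utilization` are illustrative placeholders, not the published R620 figures or CloudSim's own API.

```python
# Hypothetical power readings (watts) at 0%, 10%, ..., 100% CPU load.
# Placeholder values for illustration only; not real spec.org data.
POWER_AT_LOAD = [55.0, 70.0, 80.0, 90.0, 100.0, 110.0,
                 122.0, 135.0, 150.0, 168.0, 190.0]


def power_from_utilization(utilization: float) -> float:
    """Interpolate power draw (W) for a CPU utilization in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    position = utilization * 10      # position in the 10%-step table
    lower = int(position)            # table entry at or below the load
    if lower == 10:                  # exactly 100% load
        return POWER_AT_LOAD[10]
    fraction = position - lower
    step = POWER_AT_LOAD[lower + 1] - POWER_AT_LOAD[lower]
    return POWER_AT_LOAD[lower] + fraction * step
```

Because CPU utilization is the only input, two workloads with the same CPU load but different RAM, disk, or network activity receive the same estimate.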
Problem Statement - GreenCloud Packet level simulator with a strong emphasis on networking and energy awareness. Independent energy models for each type of resource (e.g. CPU, RAM, disk, network). Determining coefficients for models is complex and cannot be approximated.
Problem Statement - SimGrid Study the behavior of large-scale distributed systems such as Grids, Clouds, HPC or P2P systems SURF Energy Plugin enables accounting for computation time and dissipated energy Assumes energy consumption is linear with the CPU utilization
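The linear assumption can be sketched as follows. This is an illustration of the formula only, not SimGrid's API; the function names and the idle and maximum wattages passed in are assumptions.

```python
def linear_power(utilization: float, idle_watts: float, max_watts: float) -> float:
    """Power (W) as a linear function of CPU utilization in [0, 1]."""
    return idle_watts + (max_watts - idle_watts) * utilization


def energy_joules(utilization: float, seconds: float,
                  idle_watts: float, max_watts: float) -> float:
    """Energy dissipated while running at a constant utilization."""
    return linear_power(utilization, idle_watts, max_watts) * seconds
```

For example, a host that idles at 100 W and peaks at 200 W would be estimated at 150 W when half loaded, or 1500 J over ten seconds.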
Problem Statement - iCanCloud Predict the trade-offs between cost and performance of a given set of applications executed on specific hardware Supports modeling hardware energy consumption of system components such as CPUs, memories, disks, PSUs. Based on predefined collections of applications
Problem Statement - Summary Simulators are used in the classical analysis step of a MAPE-K loop Analysis step uses hard coded "static" rules, also called Event-Condition-Action (ECA) engines This paper uses and manipulates simulators instead of the ECA engine.
Problem Statement - Experimental Protocol Google Scholar to identify most cited simulator (CloudSim) Simulators based on the spec.org values for the DELL PowerEdge R620 Request the PDU metrics for this server through SNMP Stress tools to mimic variable server utilization (stress-ng) Two experiments on fresh Ubuntu Server 14.04.2 LTS
Problem Statement - Bare Metal No hypervisor - Directly stressing host operating system Average energy consumption over a 120-second interval
Problem Statement - Hypervisor and VM KVM hypervisor with single large Ubuntu VM When idle, non-negligible gap between spec.org and measured value
Problem Statement - RQ1 Revisited CloudSim simulation values not very accurate (based on the spec.org data) Cannot rely on the CPU metric to predict the Watts consumed.
Approach Monitor managed elements of Cloud infrastructure. Analysis determines the changes needed to bring the system to the ideal state: more energy-efficient, no SLA violations, high performance.
Approach Genetic algorithm manipulates a Cloud configuration instanced as a model Fitness function designed to evaluate the energy consumption (goal of paper) Plan and execute changes from best instance
Approach - Cloud Model Model inspired by previous experiments Model maps virtual machine placement, SLA constraints, and the load of the different hosts Allows mutations, crossovers and validity checks
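A minimal sketch of how a genetic algorithm could manipulate such a placement model, assuming a placement is a list mapping each VM to a host index. The VM demands, host capacities, and the placeholder fitness function are all hypothetical; the paper's goal is to replace a fitness of this kind with a learned energy model.

```python
import random

VM_DEMAND = [2, 4, 1, 3]      # hypothetical CPU cores requested by each VM
HOST_CAPACITY = [8, 8, 8]     # hypothetical CPU cores available per host


def host_loads(placement):
    """Return the cores in use on each host; placement[vm] is a host index."""
    loads = [0] * len(HOST_CAPACITY)
    for vm, host in enumerate(placement):
        loads[host] += VM_DEMAND[vm]
    return loads


def is_valid(placement):
    """Validity check: no host is loaded beyond its capacity."""
    return all(load <= cap for load, cap in zip(host_loads(placement), HOST_CAPACITY))


def mutate(placement, rng):
    """Mutation: move one randomly chosen VM to a randomly chosen host."""
    child = list(placement)
    child[rng.randrange(len(child))] = rng.randrange(len(HOST_CAPACITY))
    return child


def crossover(a, b, rng):
    """One-point crossover of two parent placements."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def fitness(placement):
    """Placeholder estimate (W): fixed cost per active host plus a load cost."""
    return sum(100.0 + 10.0 * load for load in host_loads(placement) if load > 0)


def random_valid(rng):
    """Draw random placements until one passes the validity check."""
    while True:
        candidate = [rng.randrange(len(HOST_CAPACITY)) for _ in VM_DEMAND]
        if is_valid(candidate):
            return candidate


def evolve(generations=50, population_size=20, seed=0):
    """Keep the best half each generation and refill with valid offspring."""
    rng = random.Random(seed)
    population = [random_valid(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            child = mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            if is_valid(child):          # discard invalid offspring
                children.append(child)
        population = parents + children
    return min(population, key=fitness)
```

Calling `evolve()` returns the lowest-cost valid placement found; with this placeholder fitness, placements that leave a host empty score better because an idle host costs nothing.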
Approach - Cloud Model Uses KMF modeling framework (modeling.kevoree.org) Utilizes model generators Stores time series of models
Approach - Energy Consumption Model OpenStack Ceilometer compute agent on each node Forwards all the metrics to central agent for aggregation Uses machine learning mechanisms to design a new energy model for the Cloud datacenter Train our model beforehand
Approach - Energy Consumption Model Detailed sequence of actions performed by every compute node agent: On the server, monitor CPU utilization, RAM usage, volume of disk reads and writes, and volume of network data received and sent. With the PDU we get the corresponding energy consumed by the server. Every second we retrieve the metrics from the server and the PDU. Metrics collector stores tuple (%cpu, %ram, read, writes, recv, sent, Watts)
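The collection steps above can be sketched as a polling loop. The names `Sample`, `collect`, `read_server`, and `read_pdu` are hypothetical; the two reader callables stand in for the real monitoring agent and the SNMP query to the PDU, which are not reproduced here.

```python
import time
from typing import Callable, List, NamedTuple


class Sample(NamedTuple):
    """One stored tuple: (%cpu, %ram, read, writes, recv, sent, Watts)."""
    cpu_pct: float
    ram_pct: float
    disk_read: float
    disk_write: float
    net_recv: float
    net_sent: float
    watts: float


def collect(read_server: Callable[[], tuple],
            read_pdu: Callable[[], float],
            samples: int,
            period_s: float = 1.0,
            sleep: Callable[[float], None] = time.sleep) -> List[Sample]:
    """Poll the server and the PDU once per period; store one tuple per poll."""
    store = []
    for _ in range(samples):
        # read_server() must return the six server metrics in Sample order.
        store.append(Sample(*read_server(), read_pdu()))
        sleep(period_s)
    return store
```

Passing the readers and the sleep function as parameters keeps the loop testable without a real server or PDU.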
Approach - Energy Consumption Model Multivariate Adaptive Regression Splines (MARS) Predicts the values of a continuous dependent variable from a set of independent variables Does not assume any particular type or class of relationship (e.g., linear, logistic, etc.) between the predictor variables and the dependent variable $E_{total} = \sum_{host} \mathrm{predict}(host) + E_{network}$ Network energy consumption is not proportional to traffic load; it is related to topology. Model assumes this is a static value
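A minimal sketch of how a fitted MARS model is evaluated and then summed over hosts, assuming the model is an intercept plus weighted hinge functions. The coefficients, knots, and feature names below are made up for illustration; they are not the equation reported in the paper.

```python
def hinge(x: float, knot: float) -> float:
    """MARS basis function: max(0, x - knot)."""
    return max(0.0, x - knot)


# Hypothetical fitted model: intercept plus weighted hinge terms.
# Each term is (coefficient, feature name, knot, sign); values are made up.
INTERCEPT = 90.0
TERMS = [
    (0.8, "cpu", 20.0, +1),   # 0.8 * max(0, cpu - 20)
    (0.3, "cpu", 20.0, -1),   # 0.3 * max(0, 20 - cpu)
    (0.1, "ram", 50.0, +1),   # 0.1 * max(0, ram - 50)
]


def predict(host: dict) -> float:
    """Evaluate the piecewise-linear model for one host's metrics."""
    total = INTERCEPT
    for coef, feature, knot, sign in TERMS:
        # sign = -1 mirrors the hinge, giving max(0, knot - x).
        total += coef * hinge(sign * host[feature], sign * knot)
    return total


def total_energy(hosts, network_static: float) -> float:
    """E_total: sum of per-host predictions plus a static network term."""
    return sum(predict(h) for h in hosts) + network_static
```

The hinge terms let the fitted curve bend at each knot, which is how the model avoids committing to a single linear relationship between utilization and consumption.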
Experimental Protocol - Validation Gather sparse data for predictions, representing different utilization levels of the server's hardware (i.e. CPU, RAM, disk, network) Cloud infrastructure mimics random / variable workloads Stress-ng used to consume server resources
Experimental Protocol - Sample Data Training data gathered for a given host node
Experimental Protocol - Energy model results Ehost is the total energy consumption of a given host cpu refers to the current host CPU utilization ram refers to the current host RAM usage sent denotes the volume of network sent data (in Kb)
Conclusion - Analysis of Results?
Conclusion - Analysis of Results The results look promising: the average error between the measured values and the predicted ones is 3.8%, which improves on the accuracy of CloudSim. This result answers RQ2 positively.
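The slides do not state how the average error was computed. One common choice, shown here purely as an assumption, is the mean absolute percentage error between measured and predicted wattages; the function name is hypothetical.

```python
def mean_absolute_percentage_error(measured, predicted) -> float:
    """Average of |measured - predicted| / |measured|, as a percentage."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(m - p) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

Under this definition, measured readings of 100 W and 200 W with predictions of 95 W and 210 W give an error of 5%.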
Conclusion - Threats to Validity Disk I/O metrics are NOT dominant features in the prediction equation computed by the MARS algorithm Volume of disk operations was quite constant Pure sequential disk access is not realistic NO live migration energy overhead considered
Conclusion - Questions CloudSim is CPU only. GreenCloud takes drives and RAM into account as well, but was not reviewed No results, no analysis of missing results