SANDPIPER: BLACK-BOX AND GRAY-BOX STRATEGIES FOR VIRTUAL MACHINE MIGRATION Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif * University of Massachusetts Amherst * Intel, Portland
Data Centers Data Centers are composed of: Large clusters of servers Network attached storage devices Multiple applications per server Shared hosting environment Allocates resources to meet Service Level Agreements (SLAs)
Provisioning Methods Hotspots form if resource demand exceeds provisioned capacity Static over-provisioning Allocate for peak load Wastes resources Not suitable for dynamic workloads Dynamic provisioning Adjust based on workload Often done manually Becoming easier with virtualization
Problem Statement How can we automatically detect and eliminate hotspots in data center environments? Use VM migration and dynamic resource allocation!
Outline Introduction & Motivation System Overview When? How much? And Where to? Implementation and Evaluation Conclusions
Research Challenges Sandpiper: automatically detect and mitigate hotspots through virtual machine migration When to migrate? Where to move to? How much of each resource to allocate? How much information needed to make decisions?
Sandpiper Architecture Nucleus One per physical server Monitors resources Reports to the control plane Control Plane Centralized server Profiling Engine Constructs profiles Hotspot Detector Detects when hotspots occur Migration & Resizing Manager Determines how to eliminate hotspots [Diagram: a Nucleus with a Monitoring Engine runs in Dom-0 on each PM (PM 1 ... PM N); the Control Plane hosts the Profiling Engine, Hotspot Detector, and Migration & Resizing Manager] PM = Physical Machine VM = Virtual Machine
Black-Box and Gray-Box Black-box: only data observable from outside the VM Completely OS and application agnostic Gray-box: access to OS-level statistics and application logs Can improve detection and profiling Not always feasible since the customer may control the OS Key questions: Is black-box sufficient? What do we gain from gray-box data?
Outline Introduction & Motivation System Overview When? How much? And Where to? Implementation and Evaluation Conclusions
Black-box Monitoring Xen uses a Driver Domain Special VM with network and disk drivers Nucleus runs here CPU Scheduler statistics Apportion the CPU utilization of Dom-0 among the other VMs [Diagram: Nucleus with Monitoring Engine in Dom-0, alongside guest VMs on the hypervisor]
Black-box Monitoring Xen uses a Driver Domain Special VM with network and disk drivers Nucleus runs here Network Dom-0 implements the network interface driver, so per-VM network usage can be observed there
Black-box Monitoring Xen uses a Driver Domain Special VM with network and disk drivers Nucleus runs here Memory Detect memory pressure by monitoring swap activity Only learns of pressure once performance is already poor Limited to reactive decisions
Gray-box Monitoring A monitoring daemon inside each VM gathers OS-level and application-level statistics Enables proactive detection of memory hotspots
Profile generation Each resource profile contains a usage distribution and a time series of recent observations.
Hotspot Detection When? Resource Thresholds Potential hotspot if utilization exceeds threshold Only trigger for sustained overload Must be overloaded for k out of n measurements Minimize impact of transient spikes Time Series prediction Use historical data to predict future values
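The two detection mechanisms above can be sketched in a few lines. This is an illustrative Python sketch, not Sandpiper's implementation; the threshold, k, n, and window sizes shown are hypothetical parameters.

```python
# Sketch of sustained-overload detection and simple time-series
# prediction, assuming utilization is a list of samples in [0, 1].

def is_hotspot(history, threshold=0.75, k=3, n=5):
    """Flag a sustained hotspot: utilization must exceed the threshold
    in at least k of the last n measurements, filtering transient spikes."""
    window = history[-n:]
    return sum(u > threshold for u in window) >= k

def predict_next(history, w=5):
    """Forecast the next value from historical data; here simply the
    mean of the last w observations, as a stand-in for a real predictor."""
    window = history[-w:]
    return sum(window) / len(window)
```

A brief spike to 0.9 in a single sample would not trigger `is_hotspot`, while four high readings in the last five would.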
Resource provisioning How much?
Resource provisioning How much? How many additional resources are needed? Gray-box: needs can be estimated accurately All the required information can be determined from server logs and OS-level statistics Can also be used to reduce a VM's memory allocation
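As a sketch of the gray-box "how much" estimate for memory, one illustrative policy is to size a VM's allocation to its OS-reported usage plus a safety buffer. The function name, headroom fraction, and floor below are assumptions for illustration, not Sandpiper's exact formula.

```python
def gray_box_mem_target(used_mb, headroom_frac=0.1, min_mb=128):
    """Size a VM's memory to OS-reported usage plus a safety buffer.
    used_mb comes from in-VM OS statistics (gray-box only); the
    headroom fraction and minimum floor are illustrative parameters."""
    return max(min_mb, int(used_mb * (1.0 + headroom_frac)))
```

The same estimate can shrink an over-provisioned VM: if reported usage is well below the current allocation, the target drops accordingly, freeing memory for other VMs.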
VM resizing First try adjusting the resource allocation of the overloaded VM in place. Only if there are insufficient spare resources on the local PM is the VM migrated to a different PM.
Determining Placement Where to? Migrate VMs from overloaded to underloaded servers Volume = 1/(1-cpu) * 1/(1-net) * 1/(1-mem) Use Volume to find the most loaded servers Captures load on multiple resource dimensions Migrations incur overhead Migration cost is determined by the VM's RAM footprint Migrate the VM with the highest Volume/size ratio Maximize the amount of load transferred while minimizing the overhead of migrations
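The Volume metric from this slide, Volume = 1/(1-cpu) * 1/(1-net) * 1/(1-mem), and the Volume/size migration criterion can be written directly. A minimal sketch (the function names are mine, not Sandpiper's API):

```python
def volume(cpu, net, mem):
    """Sandpiper's Volume metric: the product of 1/(1-u) over the CPU,
    network, and memory utilizations, so heavy load on any single
    resource dimension drives Volume up sharply."""
    return (1.0 / (1.0 - cpu)) * (1.0 / (1.0 - net)) * (1.0 / (1.0 - mem))

def migration_priority(cpu, net, mem, ram_mb):
    """Volume-to-size ratio: prefer migrating VMs that shed the most
    load per MB of RAM copied, since migration cost scales with RAM."""
    return volume(cpu, net, mem) / ram_mb
```

For example, a VM at 50% utilization on all three resources has Volume 2 * 2 * 2 = 8, while one at 90% on all three has Volume 1000: the metric strongly favors unloading the busiest VMs first.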
Placement Algorithm First try migrations Displace VMs from high-Volume servers Use Volume/size to minimize overhead [Diagram: PMs considered in decreasing order of Volume; candidate VMs on each PM considered in decreasing order of Volume/size]
Placement Algorithm Swap if necessary Exchange a high-Volume VM for a low-Volume one, using spare capacity as scratch space Requires 3 migrations [Diagram: PMs in decreasing order of Volume; VMs in decreasing order of Volume/size]
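The first-pass migration step above can be sketched as a greedy loop. This is an illustrative sketch under simplified assumptions (a single scalar "volume units" capacity per PM, hypothetical names), not Sandpiper's actual algorithm; in particular the swap fallback is only noted in a comment.

```python
def plan_migrations(overloaded_vms, pm_spare):
    """Greedy first-pass placement sketch.
    overloaded_vms: list of (vm_name, volume, ram_mb) on hot PMs.
    pm_spare: dict mapping PM name -> spare capacity in volume units
              (mutated as placements are made).
    Returns a list of (vm_name, target_pm) migrations, considering
    candidates in decreasing Volume/size order to shed the most load
    per byte of RAM copied."""
    plan = []
    for vm, vol, ram_mb in sorted(overloaded_vms,
                                  key=lambda t: t[1] / t[2], reverse=True):
        # Try target PMs in decreasing order of spare capacity.
        for pm in sorted(pm_spare, key=pm_spare.get, reverse=True):
            if pm_spare[pm] >= vol:
                plan.append((vm, pm))
                pm_spare[pm] -= vol
                break
        # If no PM can absorb this VM, the real system falls back
        # to swapping it with a lower-Volume VM (3 migrations).
    return plan
```

In the example below, the small high-ratio VM is placed, while the large one finds no PM with enough spare capacity and would be handled by the swap fallback.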
Outline Introduction & Motivation System Overview When? How much? And Where to? Implementation and Evaluation Conclusions
Implementation Use Xen 3.0.2-3 virtualization software Testbed of twenty 2.4GHz Pentium 4 servers
VM resizing
Migration Effectiveness Sandpiper detects and responds to 3 hotspots [Figure: stacked CPU usage over time on PM 1, PM 2, and PM 3, showing VMs (VM1-VM5) being migrated between the PMs]
Memory Hotspots Memory utilization increases over time The VM is initially assigned 256MB of RAM on a 384MB PM Another idle PM with 1GB of RAM is also running Gray-box system can decrease a VM's memory allocation Gray-box can improve application performance by proactively increasing the allocation
Data Center Prototype A 16-server cluster runs realistic data center applications on 35 virtual machines 6 servers (14 VMs) become simultaneously overloaded 4 CPU hotspots and 2 network hotspots Sandpiper eliminates all hotspots in four minutes Uses 7 migrations and 2 swaps Despite migration overhead, VMs see fewer periods of overload [Figure: number of hotspots over time, comparing static allocation vs. Sandpiper]
Summary Sandpiper can rapidly detect and eliminate hotspots while treating each VM as a black-box Gray-Box information can improve performance in some scenarios Proactive memory allocations
THANK YOU!