Empirical Evaluation of Latency-Sensitive Application Performance in the Cloud
Sean Barker and Prashant Shenoy
University of Massachusetts Amherst, Department of Computer Science
Cloud Computing
- Cloud platforms are built with data centers: large-scale, concentrated server clusters
  - Machines rented out to companies or individuals
  - Hosting for arbitrary applications
  - May supplement local resources
- Cheap enough to rent machines by the hour

Type    CPUs   Memory   Disk      Cost/hr
Small   1      1.7 GB   160 GB    $0.085
Large   4      7.5 GB   850 GB    $0.34
XL      8      15 GB    1690 GB   $0.68

Current prices on Amazon Elastic Compute Cloud (EC2)
Multimedia Cloud Computing Scenarios
- Clouds are designed primarily for web and e-commerce apps, but may also be used for multimedia
- Rent a game server for an evening
  - No firewall or bandwidth issues, only a few dollars
- Rent high-CPU machines for HD video transcoding
  - A home PC may take several hours to transcode one video; the cloud can transcode many in a fraction of this time
- Rent servers for a webcast of a live event
  - Large, inexpensive temporary bandwidth allocation
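The "only a few dollars" estimate can be checked against the hourly EC2 prices listed earlier. The sketch below is a minimal illustration, assuming a hypothetical four-hour evening session (the session length is not given in the talk) and ignoring any rounding of partial hours.

```python
# Hourly EC2 prices as listed in the talk (USD per hour).
HOURLY_PRICE = {"Small": 0.085, "Large": 0.34, "XL": 0.68}


def rental_cost(instance_type: str, hours: float) -> float:
    """Return the cost in USD of renting one instance for `hours` hours."""
    return HOURLY_PRICE[instance_type] * hours


# Assumed 4-hour evening session; this duration is illustrative only.
for name in HOURLY_PRICE:
    print(f"{name}: ${rental_cost(name, 4):.2f}")
```

Even the largest listed instance stays under three dollars for such a session.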
Resource Sharing in the Cloud
- Data center servers are typically well-equipped
  - Providers share individual machines among multiple users
[Figure: one server's four cores, RAM (4 GB, 8 GB, 4 GB), and two 1000 GB disks divided among users]
- Example: one user runs a game server, another runs a high-performance database on the same machine
- Multimedia has unique performance requirements
  - Low latency for games; low jitter and high bandwidth for streaming
- Are cloud platforms designed for conventional web applications suitable for multimedia?
Outline
- Motivation
- Virtualized clouds
- Amazon EC2 study
- Laboratory cloud study
- Real world multimedia case studies
- Related work & conclusions
Virtualized Clouds
- Cloud platforms are virtualized data centers
- Virtualization facilitates machine distribution among multiple users with virtual machines (VMs)
[Figure: Customers A, B, and C run a game server, a web server, and a media server in separate VMs on shared hardware, each serving its own users]
Virtual Machine Isolation
- Each VM is assigned a slice of physical resources
- VM access to hardware is managed by the hypervisor
  - Enforces limits and isolates VMs from each other
[Figure: two stacks of users, Apps A-C, VMs, hypervisor, and hardware; one stack is labeled "resource starvation"]
- Are these resource-sharing mechanisms suitable for the timeliness constraints of multimedia?
EC2 Study Overview
- Amazon Elastic Compute Cloud (EC2)
  - Popular virtualized cloud platform
- Unknown applications coexist on each machine
  - No control over VM placement
- Goal: evaluate performance with unknown background server load
- Methodology: measured CPU, disk, and network consistency over a period of days
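The methodology amounts to running the same fixed task repeatedly and recording how long it takes each time. The sketch below illustrates the idea for CPU only. The workload `cpu_task`, its size, and the function names are hypothetical stand-ins, not the benchmark used in the study.

```python
import time


def cpu_task(n: int = 200_000) -> int:
    """Fixed CPU-bound workload (hypothetical stand-in for the study's benchmark)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sample_cpu_time(samples: int, interval_s: float, n: int = 200_000) -> list[float]:
    """Time the same workload `samples` times, sleeping `interval_s` between runs.

    Returns the elapsed wall-clock time of each run in milliseconds.
    """
    timings_ms = []
    for k in range(samples):
        start = time.perf_counter()
        cpu_task(n)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
        if k < samples - 1:
            time.sleep(interval_s)
    return timings_ms
```

For example, `sample_cpu_time(samples=12, interval_s=300)` would take one sample every five minutes for an hour. Because the work per sample is constant, variation in the recorded times reflects the platform rather than the task.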
EC2 CPU Performance
[Figure: CPU time (ms) vs. time (5-minute intervals) for EC2 and a local server. Annotations: "2.5x average"; "outliers: 1.5-2x avg"; "no competing VMs: no outliers"]
Volatility on EC2 vs. stability on a dedicated server
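Outliers such as those annotated in the figure can be expressed as multiples of the average CPU time. The sketch below is one way to flag them, assuming a threshold of 1.5x the mean. The talk does not state how its outliers were identified, and the sample values in the usage note are made up for illustration.

```python
from statistics import mean


def flag_outliers(samples_ms, threshold=1.5):
    """Return (index, ratio) for samples at or above `threshold` times the mean.

    The mean includes the outliers themselves, so ratios are slightly
    understated when spikes are large.
    """
    avg = mean(samples_ms)
    return [(i, t / avg) for i, t in enumerate(samples_ms) if t >= threshold * avg]
```

With hypothetical samples `[400, 400, 400, 400, 1000]`, only the last sample is flagged, at roughly 1.9 times the mean of 520 ms.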
EC2 Disk Performance
[Figure: long write time (ms) vs. time (5-minute intervals) for EC2 and a local server. Annotation: "widely fluctuating disk performance"]
Similarly: inconsistent EC2 disk performance
EC2 Network Latency (LAN)
[Figure: first-three-hops latency (ms) vs. time (5-minute intervals)]
Latency variations in EC2 LAN
EC2 Study Summary
- Performance variations observed on EC2
  - Not observed on a local server running a single VM
- Can only speculate on causes without access to the hypervisor
- Need to experiment on a controlled platform similar to Amazon's
Laboratory Cloud Study Overview
- Local cloud running the Xen hypervisor
  - Same virtualization technology used by EC2
  - Advantage: local cloud gives us control of interference
- Built-in mechanisms for sharing hardware between VMs
  - CPU credit scheduler
  - Round-robin disk servicing
  - Linux-level tool tc for network sharing
- How well do these tools isolate background work?
- Methodology: evaluated performance impact of a competing VM
CPU Performance with Background Load
[Figure: CPU time (ms) vs. time (5-second intervals). Annotations: "No background work: VM gets 100% CPU"; "Max background work: VM gets 50% CPU"]
Default 1-to-1 sharing with variable background load
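The 100% and 50% figures are what a work-conserving proportional-share scheduler would give two VMs with equal weights. The sketch below is a simplified model of that behavior, not Xen's actual credit scheduler algorithm. The VM names are hypothetical, and the weight of 256 is Xen's documented default rather than a value from the talk.

```python
def cpu_share(weights, active, vm):
    """Fraction of CPU `vm` receives when only the VMs in `active` are busy.

    Simplified proportional-share model: each busy VM gets CPU in
    proportion to its weight, and idle VMs give up their share.
    """
    total = sum(weights[v] for v in active)
    return weights[vm] / total


weights = {"test-vm": 256, "background-vm": 256}  # equal weights, i.e. 1-to-1 sharing

print(cpu_share(weights, {"test-vm"}, "test-vm"))                   # no background work
print(cpu_share(weights, {"test-vm", "background-vm"}, "test-vm"))  # max background work
```

Under this model a fixed CPU task takes `1 / share` times as long, so it takes twice as long when the background VM is fully loaded.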
Disk Performance with Background Load
[Figure: performance impact (%) vs. disk thread pairs on collocated VM (1, 2, 3, 4, 8) for small read, small write, read throughput, and write throughput, with a "Fair Share" reference level; the region beyond it is labeled "unfair impact"]
Degraded by half over fair, but stable with increasing load
Laboratory Cloud Study Summary
- Significant interference possible from background VMs
- Xen configuration can guarantee share of CPU
  - Default settings allow fluctuation in shared CPU
- Disk sharing less fair and harder to control
  - Consistent with observed EC2 behavior
- Network sharing effects evaluated in case studies on laboratory cloud (next)
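The talk does not show the Xen settings it used. As an illustration only, the sketch below builds a credit scheduler command that pins a VM to a fixed CPU allocation with a cap, assuming the `xl sched-credit` tool from a current Xen toolstack (older installations used `xm`). The domain name is hypothetical, and the command is only constructed as a string, not executed.

```python
def sched_credit_command(domain: str, weight: int = 256, cap: int = 0) -> str:
    """Build an `xl sched-credit` command line for one domain.

    weight: relative share under contention (256 is Xen's default).
    cap: upper limit in percent of one physical CPU; 0 means no cap.
    Assumption: this is one plausible configuration, not the study's.
    """
    if weight <= 0:
        raise ValueError("weight must be positive")
    if cap < 0:
        raise ValueError("cap must be zero (uncapped) or positive")
    return f"xl sched-credit -d {domain} -w {weight} -c {cap}"


# Hypothetical domain name; cap=50 limits the VM to half of one CPU.
print(sched_credit_command("game-vm", cap=50))
```

With a cap, the VM's CPU allocation no longer grows when the neighbor goes idle, which trades peak performance for consistency.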
Case Study 1: Doom 3 Game Server
- Multiplayer Doom 3 game server
- Introduced controlled interference as before
- Measured map load times and server latency
- Network sharing configuration via tc:
  - Idle: No bandwidth usage by resource-hog VM
  - Off (default): No rate-limiting, network free-for-all
  - Shared: 50% (min) to 100% (max) of bandwidth per VM
  - Dedicated: 50% (max) of bandwidth per VM
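The talk names tc but not the queueing discipline or the exact commands. The sketch below shows how the Off, Shared, and Dedicated settings could map onto tc's HTB qdisc, where `rate` is the guaranteed minimum and `ceil` is the maximum a class may borrow up to. The use of HTB, the device name, the link speed, and the class IDs are all assumptions. Filters that steer each VM's traffic into its class are omitted, and the commands are only generated as strings.

```python
def tc_commands(dev: str, link_mbit: int, mode: str, n_vms: int = 2) -> list[str]:
    """Build hypothetical tc/HTB commands for the 'off', 'shared', or 'dedicated' modes."""
    if mode == "off":
        return []  # no rate limiting: network free-for-all
    if mode not in ("shared", "dedicated"):
        raise ValueError(f"unknown mode: {mode}")
    rate = link_mbit // 2  # 50% of the link guaranteed per VM
    ceil = link_mbit if mode == "shared" else rate  # shared may borrow up to 100%
    cmds = [
        f"tc qdisc add dev {dev} root handle 1: htb",
        # Parent class: HTB children can only borrow from a common parent.
        f"tc class add dev {dev} parent 1: classid 1:1 htb rate {link_mbit}mbit",
    ]
    for i in range(n_vms):
        cmds.append(
            f"tc class add dev {dev} parent 1:1 classid 1:{10 * (i + 1)} "
            f"htb rate {rate}mbit ceil {ceil}mbit"
        )
    return cmds


# Assumed device name and 100 Mbit/s link, for illustration only.
for line in tc_commands("eth0", 100, "shared"):
    print(line)
```

In the shared mode each class may exceed its guarantee when the other is idle; in the dedicated mode `ceil` equals `rate`, so unused bandwidth is never borrowed.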
Game Server Map Load
[Figure: average server load time (ms) by collocated VM activity: Idle, Disk, CPU, Disk + CPU]
Interference produces up to 50% degradation
Game Server Latency

Configuration           Avg. Latency (ms)   Std. Deviation (jitter)   Timeouts
No interference         8.1                 10.2                      0%
tc off (free-for-all)   N/A                 N/A                       100%
tc, sharing b/w         33.9                16.9                      2%
tc, dedicated b/w       23.6                29.6                      7%

- Server crippled without bandwidth controls (tc off)
- Dedicated vs. shared bandwidth:
  - Dedicated: lower latency, higher jitter
  - Sharing: higher latency, lower jitter
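The three columns of the table can be derived from a series of latency probes. The sketch below is one way to compute them, assuming each probe yields either a round-trip time in milliseconds or `None` for a timeout. The talk does not say whether its standard deviation is the population or sample form; the population form is used here as an assumption.

```python
from statistics import mean, pstdev


def summarize_latency(samples_ms):
    """Return (avg_ms, jitter_ms, timeout_pct) for a list of probe results.

    `None` entries count as timeouts. If every probe timed out, the average
    and jitter are undefined and returned as None (the "N/A" row in the table).
    """
    replies = [s for s in samples_ms if s is not None]
    timeout_pct = 100.0 * (len(samples_ms) - len(replies)) / len(samples_ms)
    if not replies:
        return None, None, timeout_pct
    return mean(replies), pstdev(replies), timeout_pct
```

For example, `summarize_latency([None, None])` returns `(None, None, 100.0)`, matching the shape of the "tc off" row.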
Case Study 2: Darwin Streaming Server
- Streaming video to multiple clients
- Introduced controlled interference as before
- Measured sustained streaming bandwidth and stream jitter (latency variation)
- Varied tc settings and number of clients
  - Max video stream rate of 1 Mbps per client
Streaming Server Bandwidth
[Figure: average bitrate per stream (kbps) by tc sharing type (idle (fair), off, shared, dedicated) for 4 streams and 8 streams. Annotation: "decreased stream quality"]
Both tc configurations recovered bandwidth
Streaming Server Jitter
[Figure: average stream jitter (ms) by tc sharing type (idle (fair), off, shared, dedicated) for 4 streams and 8 streams]
Jitter improved by shared, but worsened by dedicated
Real World Case Studies Summary
- Real applications show substantial impacts from background interference
- Network is particularly vulnerable without administrative controls
- Proper configuration is important
  - CPU and network isolation tools fairly well-developed
  - Disk isolation needs better mechanisms
Related Work
- Fair-share schedulers and quality of service
  - Nieh and Lam (SOSP '97) for multimedia
  - Sundaram et al. (ACM MM '00) for QoS-aware OS
- Virtualization and hypervisors
  - Xen, VMware ESX Server
- Improving performance isolation
  - Gupta et al. (Middleware '06) for Xen mechanisms
- We focus on evaluation of existing mechanisms with specific attention to multimedia
Conclusions
- Clouds exhibit performance variations
  - Applications with timeliness requirements are particularly sensitive
- Appropriate hypervisor configuration can help
  - In some cases, prevents resource starvation
  - Some resource sharing mechanisms need improvement
- Future work: evaluation of non-Xen platforms
- Questions?