Scheduler Support for Video-oriented Multimedia on Client-side Virtualization Hwanju Kim 1, Jinkyu Jeong 1, Jaeho Hwang 1, Joonwon Lee 2, and Seungryoul Maeng 1 Korea Advanced Institute of Science and Technology (KAIST) 1 Sungkyunkwan University 2 ACM Multimedia Systems (MMSys) February, 22-24, Chapel Hill, North Carolina, USA
Virtualization Everywhere A new layer between OS and HW =hypervisor OS Hypervisor OS Multiple OSes High resource utilization Strong Isolation Easy management Live maintenance Fast provisioning Server-side virtualization Client-side virtualization Cloud computing Virtual desktop infrastructure 2/22
Client-side Virtualization Multiple OS instances on a local device Primary use cases Different OSes for application compatibility Consolidating business and personal computing environments on a single device BYOD: Bring Your Own Device Managed domain Business Personal Hypervisor 3/22
Multimedia on Virtualized Clients Multimedia is ubiquitous on any Video Playback Compilation Data Processing 3D game Video conference Downloading Windows Linux Business Personal Business Personal Hypervisor Hypervisor Hypervisor 1. Multimedia workloads are dominant on virtualized clients 2. Interactive systems can have concurrently mixed workloads 4/22
Issues on Multi-layer Scheduling A multimedia-agnostic hypervisor invalidat es OS policies for multimedia Larger CPU proportion & Timely dispatching BVT [SOSP 99] SMART [TOCS 03] Rialto [SOSP 97] BEST [MMCN 02] HuC [TOMCCAP 06] Redline [OSDI 08] RSIO [SIGMETRICS 10] Windows MMCSS Task OS scheduler CPU Task Additional abstraction Task I m unaware of any multimedia -specific OS policies in a, since I see each as a black box. Task OS scheduler Virtual CPU Hypervisor Scheduler CPU Task Task OS Scheduler Virtual CPU Semantic gap! 5/22
Multimedia-agnostic Hypervisor Multimedia QoS degradation Average FPS 30 25 20 15 10 5 0 Two s with equal CPU shares Multimedia + Competing Video playback (720p) on VLC media player Average FPS 100 90 80 70 60 50 40 30 20 10 0 Video playback or 3D game Xen hypervisor Credit scheduler Quake III Arena (demo1) Competing workloads Competing workloads in another Competing workloads in another 6/22
Possible Solutions to Semantic Gap Explicit vs. Implicit Explicit OS cooperation OS scheduler Hypervisor Scheduler + Accurate - OS modification - Infeasible w/o multimedia-friendly OS schedulers Explicit User involvement OS scheduler Hypervisor Scheduler + Simple - Inconvenient - Unsuitable for dynamic workloads Implicit Hypervisor-only OS scheduler Workload monitor Hypervisor Scheduler + Transparency - Difficult to identify workload demands at the hypervisor 7/22
Proposed Approach Multimedia-aware hypervisor scheduler Transparent scheduler support for multimedia No modifications to upper layer SW (OS & apps) Feedback-driven scheduling Multimedia monitor Hypervisor Multimedia manager (feedback-driven) Scheduling command (e.g., CPU share or priority) CPU scheduler Estimated multimedia QoS Audio Video CPU Challenges 1. How to estimate multimedia QoS based on a small set of HW events? 2. How to control CPU scheduler based on the estimated information 8/22
Multimedia QoS Estimation What is estimated as multimedia QoS? Display rate (i.e., frame rate) Used by HuC scheduler [TOMCCAP 06] How is a display rate captured at the hypervisor? Two types of display 1. Memory-mapped display (e.g., video playback) Display interface Memory-mapped Graphics Library 2. GPU-accelerated display (e.g., 3D game) Framebuffer Acceleration unit Video device 9/22
Memory-mapped Display (1/2) How to estimate a display update rate on the memory-mapped framebuffer Write-protection for virtual address space mapped to framebuffer Display interface write The hypervisor can inspect any attempt to map memory Hypervisor page fault handler { Update display rate } Virtual address space Framebuffer memory Write-protection Sampling to reduce trap overheads (1/128 pages, by default) 10/22
Memory-mapped Display (2/2) Accurate estimation Maintaining display rate per task An aggregated display rate does not represent multimedia QoS Tracking guest OS task at the hypervisor Task 25 FPS Task 10 FPS Inspecting address space switches (Antfarm [USENIX 06]) Monitoring audio access (RSIO [SIGMETRIC 10]) Inspecting audio buffer access with write-protection A task with a high display rate and audio access à a multimedia task 11/22
GPU-accelerated Display (1/2) Naïve method Inspecting GPU command buffer with write-protection or polling Too heavy due to huge amount of GPU commands Lightweight method Little overhead, but less accuracy 3D games are less sensitive to frame rate degradation than video playback GPU interrupt-based estimation An interrupt is typically used for an application to manage buffer memory Hypothesis A GPU interrupt rate is in proportion to a display rate 12/22
# of GPU interrupt / sec GPU-accelerated Display (2/2) Linear relationship between display rates and GPU interrupt rates 12000 10000 8000 6000 4000 2000 0 Intel GMA 950 (Apple MacBook) Quake3 demo1 (640x480) Quake3 demo2 (640x480) Quake3 demo1 (1024x768) # of GPU interrupt / sec 160 140 120 100 80 Nvidia 6150 Go (HP Pavillion tablet) Quake3 demo1 (640x480) Quake3 demo2 (640x480) Quake3 demo1 (1024x768) 60 0 0 20 40 60 A 80 GPU 100 interrupt 0 20 rate 40 can 60 be 80 used 100 to estimate 0 20 a display 40 rate 60 FPS FPS FPS without additional overheads # of GPU interrupt / sec 400 350 300 250 200 150 100 50 PowerVR (Samsung GalaxyS) Quake3 demo4 (320x240) Quake3 demo4 (640x480) Exponential weighted moving average (EWMA) is used to reduce fluctuation EWMA t = (1-w) x EWMA t-1 + w x current value 13/22
Multimedia Manager A feedback-driven CPU allocator Base assumption Additional CPU share (or higher priority) improves a display rate Desired frame rate (DFR) A currently achievable display rate Multiplied by tolerable ratio (0.8) /* Exceptional cases: * 1) No relationship between CPU and FPS * 2) FPS is saturated below DFR * 3) Local CPU contention in a */ If no FPS improvement by CPU share increase (3 times) Then Decrease CPU share by half IF current FPS < previous FPS AND current FPS < DFR THEN Increase CPU share If in initial phase Then Exponential increase Else Linear increase 14/22
Priority Boosting Responsive dispatching Problem The hypervisor does not distinguish the types of events for priority boosting A that will handle a multimedia event cannot preempt a currently running handling a normal event. Higher priority for multimedia-related events e.g., video, audio, one-shot timer Priority MMBOOST IOBOOST Normal priority Multimedia events Other events Based on remaining CPU shares 15/22
Evaluation Experimental environment Intel MacBook with Intel GMA 950 Xen 3.4.0 with Ubuntu 8.04 Implementation based on Xen Credit scheduler Two- scenario One with direct I/O + one with indirect (hosted) I/O Presenting the case of direct I/O in this talk See the paper for the details of the indirect I/O case 16/22
Estimation Accuracy Estimation accuracy Error rates: 0.55%~3.05% Video playback (720p) (w/ CPU-bound ) Real FPS Estimated FPS multimedia manager disabled FPS 30 25 20 15 10 5 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 Time (sec) Quake 3 (w/ CPU-bound ) Real FPS Estimated FPS Estimated FPS (EWMA, w=0.2) FPS 100 80 60 40 20 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 Time (sec) 17/22
Estimation Overhead CPU overhead caused by page faults Video playback 0.3~1% with sampling Less than 5% with tracking all pages Overhead All pages Sampling 1/8 pages 1/32 pages 1/128 pages Low resolution (640x354) High resolution (1280x720) 4.95% 1.10% 0.54% 0.58% 3.91% 1.04% 0.69% 0.33% 18/22
Multimedia Manager Video playback (720p) + CPU-bound FPS DFR CPU share (%) FPS 25 20 15 25 20 15 10 5 0 25 100 90 0 10 20 30 40 50 60 70 80 80 20 70 60 50 Time (sec) 15 100 90 80 70 60 50 40 30 20 100 10 090 80 70 60 CPU share (%) 10 40 50 5 5 6 7 8 9 10 30 20 10 80 81 82 83 84 40 19/22
Performance Improvement Performance improvement Closed to maximum achievable frame rates Video playback or 3D game Hypervisor Competing workloads Credit scheduler Video playback (720p) on VLC media player Credit scheduler w/ multimedia support Quake III Arena (demo1) Credit scheduler Credit scheduler w/ multimedia support Average FPS 25 20 15 10 5 0 Average FPS 100 80 60 40 20 0 Competing workloads in another Competing workloads in another 20/22
Limitations & Discussion Network-streamed multimedia Additional preemption support required for multimedia-related network packets Multiple multimedia workloads in a Multimedia manager algorithm should be refined to satisfy QoS of mixed multimedia workloads in the same Adaptive management for SMP s Adaptive vcpu allocation based on hosted multimedia workloads 21/22
Conclusions Demands for multimedia-aware hypervisor Multimedia are increasingly dominant in virtualized systems Multimedia-friendly hypervisor scheduler Transparent and lightweight multimedia support on client-side virtualization Future directions Multimedia for server-side VDI Multicore extension for SMP s Considerations for network-streamed multimedia 22/22
Thank You! Questions and comments Contact hjukim@calab.kaist.ac.kr 23/22
EXTRA SLIDES
Workload Details Descriptions and CPU usages for used workloads 25/22
Task-based Display Rate Management Different types of frame rate accounting For Unix-like OSes, a display rate is accounted to a previously scheduled task when a framebuffer write is observed (this heuristic works well because a display interface is interactive and is highly likely to be scheduled promptly on requests) Video playback (640x354) + Directory listing on the screen (ls alr /) (w/ CPU-bound ) Real FPS Estimated FPS FPS 30 25 20 15 10 5 0 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 Time (sec) Error rate = 1.78% 26/22
Local CPU Contention in a Increasing the number of CPU-bound tasks with a video playback application Native Linux Virtualized Linux w/ our scheme Ø Our scheme provides sufficient CPU to a (IDD) that hosts a multimedia workload even though local CPU contention increases Ø No starvation of another CPU-bound (guest) 27/22