
Tuning Your SUSE Linux Enterprise Virtualization Stack
Jim Fehlig, Software Engineer, jfehlig@suse.com

Agenda
- General guidelines
- Network
- Disk
- CPU
- Memory
- NUMA

General Guidelines
- Minimize software installed on the host: reduces resource consumption, reduces security risks, and increases availability
- Synchronize time: use NTP to synchronize time on the host AND in the virtual machines
- Consider host resource requirements: the host uses resources too! Avoid over-allocating resources to virtual machines
- Remove unneeded devices from virtual machines
- Use paravirtual drivers for better performance (see the sketch below)
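To illustrate the last two points, a minimal libvirt sketch (device paths and names are hypothetical) that uses paravirtual virtio models for disk and network and defines no unneeded devices:

  <devices>
    <!-- paravirtual disk: virtio instead of emulated IDE/SATA -->
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/vm1.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <!-- paravirtual NIC: virtio instead of emulated e1000/rtl8139 -->
    <interface type='bridge'>
      <source bridge='br0'/>
      <model type='virtio'/>
    </interface>
    <!-- no sound card, tablet, or other unneeded devices defined -->
  </devices>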

General Guidelines - Xen
- Disable autoballooning of domain0: Xen parameter 'dom0_mem=xxG', and in /etc/xen/xl.conf: autoballoon="off" (see the sketch below)
- Limit domain0 vcpus: Xen parameter 'dom0_max_vcpus=xx'
- Use tmpfs for the xenstore database: default configuration in SLES 12 and newer
- pvops kernel in SLES 12 SP2: goodbye kernel-xen, hello kernel-default
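A boot and xl.conf sketch of the dom0 settings above, assuming GRUB2 on SLES and an illustrative 4 GiB / 4-vCPU dom0 sizing:

  # /etc/default/grub: Xen hypervisor options
  GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=4096M,max:4096M dom0_max_vcpus=4"

  # /etc/xen/xl.conf: keep dom0 memory fixed
  autoballoon="off"

  # regenerate the GRUB configuration, then reboot
  grub2-mkconfig -o /boot/grub2/grub.cfg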

Network
- Use multiple networks to avoid congestion: admin, storage, live migration, ...
- May require arp_filter to prevent ARP flux (http://linux-ip.net/html/ether-arp.html#ether-arp-flux); a persistent sketch follows below:
  echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
- Use the same MTU on all devices to avoid fragmentation
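To make the arp_filter setting persistent, a small sketch of a sysctl drop-in (the file name is arbitrary):

  # /etc/sysctl.d/90-arp-filter.conf
  # answer ARP requests only on the interface that owns the address
  net.ipv4.conf.all.arp_filter = 1

Apply it with 'sysctl --system' or wait for the next boot.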

Network
- Multiqueue-enabled virtual NICs:
  - virtio (KVM): vhost_net backend
  - xen-vif (Xen): netbk backend (kernel-xen), xen_netback backend (pvops)
- Emulated NICs:
  - e1000: default and preferred emulated NIC
  - rtl8139

Network
- Shared physical NICs:
  - SR-IOV
  - macvtap: VM-to-host communication not possible
- Passthrough of physical NICs, aka PCI passthrough: not supported by Intel due to security concerns
- Note: these approaches offer increased performance, but may complicate migration (interface sketches follow below)
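For reference, a minimal libvirt sketch of the two shared-NIC approaches named above (the host NIC and PCI address are hypothetical and must match your hardware):

  <!-- macvtap: attach the VM directly to a host NIC -->
  <interface type='direct'>
    <source dev='eth1' mode='bridge'/>
    <model type='virtio'/>
  </interface>

  <!-- PCI passthrough of a physical NIC (or an SR-IOV virtual function) -->
  <interface type='hostdev' managed='yes'>
    <source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
    </source>
  </interface>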

Network: Comparison of vNIC Bandwidth, 1G Network
[Chart: vm2host, vm2vm, and vm2network bandwidth in MB/s, comparing rtl8139, e1000, virtio, xen-vif, and macvtap]

Network - virtio
Several tunables are available via the virtual interface configuration:

  <interface type='bridge'>
    <model type='virtio'/>
    <driver ioeventfd='off|on' queues='4'>
      <host csum='off|on' gso='off|on'/>
      <guest csum='off|on' gso='off|on'/>
    </driver>
    <bandwidth>
      <inbound average='102400' peak='1048576'/>
    </bandwidth>
  </interface>

Network - xen-vif

  <interface type='bridge'>
    <model type='netfront'/>
    <bandwidth>
      <inbound average='102400'/>
    </bandwidth>
  </interface>

- netbk backend (kernel-xen): netbk.tasklets, netbk.bind, and netbk.queue_length options
- xen_netback backend (pvops): xen_netback.max_queues option

Disk: Devices and Double Vision
- Two page caches: two copies of data in memory
- Two IO schedulers: guest and host both reordering and delaying IO
  - Kernel >= 3.13 has no IO scheduler for virtual devices:
    # cat /sys/block/vd[x]/queue/scheduler
    none
- Possibly two filesystems: the guest filesystem, plus the host filesystem containing the image
- Possibly two volume managers: guest and host both using LVM
- The doctor says: configure the guest or the host to bypass one of the redundant layers

Disk: Block Devices vs Image Files
- Block devices:
  - Historically better performance
  - Use standard tools for administration/disk modification
  - Accessible from the host (pro and con)
  - Eliminates one of the file systems
- Image files:
  - Easier system management; easier to move, clone, and back up
  - Comprehensive toolkit (guestfs) for image manipulation
  - Fully allocated vs sparse: performance vs resource consumption
(Disk definition sketches for both options follow below.)
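A minimal sketch of the two disk definitions (the logical volume and image paths are hypothetical):

  <!-- block device backed disk, e.g. an LVM logical volume -->
  <disk type='block' device='disk'>
    <driver name='qemu' type='raw'/>
    <source dev='/dev/vg_vms/vm1-root'/>
    <target dev='vda' bus='virtio'/>
  </disk>

  <!-- image file backed disk -->
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/var/lib/libvirt/images/vm1.qcow2'/>
    <target dev='vdb' bus='virtio'/>
  </disk>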

Disk: Image Files
- raw: most common format; historically, best performance
- qcow2: required for snapshot support in libvirt and tools; improved performance and stability compared to older versions of qcow2
- qed: next-generation qcow
- vhd/vhdx/vmdk/others: suggest using for import/export only
(A qemu-img sketch follows below.)
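A quick qemu-img sketch for the common formats (paths and sizes are arbitrary; the preallocation option trades disk space for performance):

  # sparse qcow2 image; grows on demand, required for libvirt snapshots
  qemu-img create -f qcow2 /var/lib/libvirt/images/vm1.qcow2 20G

  # raw image, fully allocated up front
  qemu-img create -f raw -o preallocation=full /var/lib/libvirt/images/vm2.raw 20G

  # inspect format and allocation of an existing image
  qemu-img info /var/lib/libvirt/images/vm1.qcow2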

Disk: Image Files vs Block Devices
[Chart: read and write bandwidth in KB/s, comparing host, blkdev, raw, qcow2, and qed]

Disk: Cache Modes
- writeback:
  - Host page cache enabled
  - Writes reported completed when data is placed in the host page cache
  - VM flush commands honored
  - Default mode in KVM and Xen
- writethrough:
  - Host page cache enabled
  - Writes reported completed only when data has been committed to the storage device
  - VM informed there is no writeback cache

Disk: Cache Modes
- directsync:
  - Host page cache disabled
  - Writes reported completed only when data is committed to the storage device
  - Useful for guests that don't send flush commands
- none:
  - Host page cache disabled (O_DIRECT semantics)
  - Guest informed of a writeback cache

Disk: Cache Modes
- unsafe:
  - Host page cache enabled
  - Similar to writeback, except VM flush commands are ignored
(A configuration sketch follows below.)
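The cache mode is selected per disk in the libvirt driver element; a minimal sketch (the image path is hypothetical):

  <disk type='file' device='disk'>
    <!-- cache may be writeback, writethrough, directsync, none, or unsafe -->
    <driver name='qemu' type='qcow2' cache='none'/>
    <source file='/var/lib/libvirt/images/vm1.qcow2'/>
    <target dev='vda' bus='virtio'/>
  </disk>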

Disk: Cache Modes and Read Bandwidth
[Chart: read bandwidth in KB/s for the writeback, writethrough, directsync, none, and unsafe cache modes, across blkdev, raw, qcow2, and qed]

Disk: Cache Modes and Write Bandwidth
[Chart: write bandwidth in KB/s for the writeback, writethrough, directsync, none, and unsafe cache modes, across blkdev, raw, qcow2, and qed]

Disk - KVM Specific: IO Modes
- native:
  - Linux asynchronous IO
  - Lower CPU overhead
- threads:
  - POSIX asynchronous IO emulation using a pool of worker threads
  - Compatible with all disk types: LVM, block devices, image files
  - Default mode in SLES

  <disk>
    <driver name='qemu' io='native|threads'/>
  </disk>

Disk: IO Mode Bandwidth Characteristics
[Chart: seq-read, rand-read, seq-write, and rand-write bandwidth in KB/s for the threads and native IO modes]

Disk - KVM Specific: IO Threads
Dedicated threads for servicing IO requests:

  <iothreads>2</iothreads>
  <devices>
    <disk>
      <driver name='qemu' iothread='1'/>
    </disk>
    <disk>
      <driver name='qemu' iothread='2'/>
    </disk>
  </devices>

Disk - KVM Specific: IO Thread Bandwidth
[Chart: read and write bandwidth in KB/s, comparing no-iothread and iothread]

Disk: IO Scheduler
- Completely Fair Queuing (CFQ), deadline, noop, none
- In kernel >= 3.13: virtual block devices only support 'none'; CFQ is the default for others
- In kernel < 3.13: the default is CFQ for all block devices
- Tunable per device:
  echo noop > /sys/block/<device>/queue/scheduler
- Disable one of the schedulers (see the sketch below), e.g.:
  - noop in the VM, deadline in the host
  - noop in the VM, CFQ in the host
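A small sketch of the first combination, assuming the VM disk vda is backed by host device sdb (device names are hypothetical) and a guest kernel older than 3.13 (newer guests already use 'none' for virtual block devices):

  # inside the VM: let the host do the reordering
  echo noop > /sys/block/vda/queue/scheduler

  # on the host: deadline for the device backing the VM storage
  echo deadline > /sys/block/sdb/queue/scheduler

  # verify; the active scheduler is shown in brackets
  cat /sys/block/sdb/queue/scheduler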

Disk: IO Scheduler Characteristics, Large Working Set
[Chart: seq-read, rand-read, seq-write, and rand-write bandwidth in KB/s for cfq, deadline, and noop]

Disk: IO Scheduler Characteristics, Small Working Set
[Chart: seq-read, rand-read, seq-write, and rand-write bandwidth in KB/s for cfq, deadline, and noop]

Disk: IO Scheduler Characteristics, Multiple VMs
[Chart: seq-read, rand-read, seq-write, and rand-write bandwidth in KB/s for cfq, deadline, and noop]

CPU - Host
- Avoid excessive CPU contention, due to excessive CPU overcommit or incorrect vcpu pinning
- Scheduler: performance vs latency; CFS is tuned with kernel.sched_* parameters
- CPU power states (see the sketch below):
  - CPU frequency governor: cpupower frequency-set -g performance
  - Kernel parameters processor.max_cstate and intel_idle.max_cstate
- SLES 12 Tuning Guide: https://www.suse.com/documentation/sles-12/book_sle_tuning/data/book_sle_tuning.html
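A short sketch of the power-state settings above (the C-state limit of 1 is illustrative; the right value depends on how much latency vs power saving is acceptable):

  # set the performance governor on all CPUs
  cpupower frequency-set -g performance

  # limit C-states via the kernel command line, e.g. in /etc/default/grub
  GRUB_CMDLINE_LINUX_DEFAULT="processor.max_cstate=1 intel_idle.max_cstate=1"
  grub2-mkconfig -o /boot/grub2/grub.cfg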

CPU - Virtual Machine: vcpu Model and Features
Normalize to allow migration among heterogeneous hosts:

  virsh capabilities | virsh cpu-baseline /dev/stdin >> all-hosts-cpu-caps.xml
  virsh cpu-baseline all-hosts-cpu-caps.xml

  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Nehalem</model>
    <feature policy='require' name='cmt'/>
    ...
  </cpu>

CPU - Virtual Machine: vcpu Topology
- For smaller VMs (<= 8 vcpus), multiple sockets with a single core and thread, on the same NUMA node, generally give the best performance
- For larger VMs, topologies that closely resemble the host topology generally give the best performance

  <cpu>
    <topology sockets='8' cores='1' threads='1'/>
    ...
  </cpu>

CPU - Virtual Machine: vcpu Pinning
Constrain vcpu threads to physical CPUs:

  <cputune>
    <vcpupin vcpu='0' cpuset='0-15'/>
    ...
  </cputune>

  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    ...
  </cputune>

CPU - Virtual Machine: vcpu Scheduling
Fine-tune scheduling of vcpus:

  <cputune>
    <shares>2048</shares>
    <period>1000000</period>
    <quota>10000000</quota>
    ...
  </cputune>

Memory - Host
- Memory overcommit is generally not recommended, but...
- Kernel Samepage Merging (KSM):
  - Memory overcommit technique
  - Best results when running multiple instances of the same image
  - The ksmd thread consumes 5-10% of one core with default settings:
    echo 1 > /sys/kernel/mm/ksm/run
    /sys/kernel/mm/ksm/pages_to_scan
    /sys/kernel/mm/ksm/sleep_millisecs
  - Warning: by default, pages common across NUMA nodes are merged; increased memory access latencies may be observed in the VM:
    echo 0 > /sys/kernel/mm/ksm/merge_across_nodes
(A consolidated sketch follows below.)
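A consolidated sketch of the KSM knobs above (the scan-rate values are illustrative only):

  # keep merging within a NUMA node; must be changed while no pages are merged
  echo 0 > /sys/kernel/mm/ksm/merge_across_nodes

  # scan 100 pages every 200 ms; higher rates merge faster but cost more CPU
  echo 100 > /sys/kernel/mm/ksm/pages_to_scan
  echo 200 > /sys/kernel/mm/ksm/sleep_millisecs

  # start merging
  echo 1 > /sys/kernel/mm/ksm/run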

Memory - KSM: KSM Behavior, 25 SLES 12 SP2 Virtual Machines
[Chart: AnonPages, PagesShared, and PagesSharing in MB over successive ksmd scans]

Memory - Host
- Transparent Huge Pages (THP):
  - Enabled by default
  - Anonymous memory and tmpfs/shmem only
  - Warning: may reduce performance of workloads with sparse memory access patterns:
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
- Huge Pages:
  - Manually control allocation and use of huge pages
  - At boot: hugepagesz=2M hugepages=8192
  - Runtime: echo 8192 > /proc/sys/vm/nr_hugepages
  - Virtual machine configuration (see the sketch below):
    <memoryBacking>
      <hugepages/>
    </memoryBacking>
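A slightly fuller sketch of the memory backing element, assuming the 2 MiB pages reserved above should back the guest's memory:

  <memoryBacking>
    <hugepages>
      <!-- back guest memory with 2 MiB huge pages -->
      <page size='2048' unit='KiB'/>
    </hugepages>
  </memoryBacking>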

Memory - Virtual Machine
- Lock VM pages to prevent swapping:
  <memoryBacking>
    <locked/>
  </memoryBacking>
- Prevent page sharing:
  <memoryBacking>
    <nosharepages/>
  </memoryBacking>

NUMA
- Potentially huge impact on performance
- Consider host topology when sizing guests: virsh {nodeinfo, capabilities, freecell}
- Prevent vcpus from floating across NUMA nodes: vcpu pinning
- Avoid allocating VM memory across NUMA nodes (a virsh sketch follows below):
  <numatune>
    <memory mode='strict' nodeset='1'/>
  </numatune>
- Disable automatic NUMA balancing in the host if pinning VM resources:
  echo 0 > /proc/sys/kernel/numa_balancing
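The same memory policy can also be applied with virsh; a small sketch (the domain name and nodeset are hypothetical):

  # restrict the domain's memory allocation to host NUMA node 1
  virsh numatune vm1 --mode strict --nodeset 1 --live --config

  # show the current policy
  virsh numatune vm1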

NUMA
Consider vNUMA for large virtual machines:

  <cpu>
    <numa>
      <cell id='0' cpus='0-15' memory='16777216' unit='KiB'/>
      <cell id='1' cpus='16-31' memory='16777216' unit='KiB'/>
    </numa>
  </cpu>

NUMA: Memory Bandwidth Comparison, VM Fits on a Single NUMA Node
[Chart: read and write bandwidth in MB/s for 16vcpu-basic, 16vcpu-pinned, and 16vcpu-vnuma]

NUMA: Memory Bandwidth Comparison, VM Larger than a Single NUMA Node
[Chart: read and write bandwidth in MB/s for 72vcpu-basic and 72vcpu-vnuma]

NUMA: Memory Access Comparison, Local vs Remote Access
[Chart: local and remote access bandwidth in GB/s for pnuma and vnuma]

NUMA: Memory Bandwidth Comparison
[Chart: bandwidth in GB/s for 8x1, 1x8, and 4x4 configurations, comparing pnuma and vnuma]

NUMA: Convergence Latency
[Chart: convergence latency in seconds for 1x4, 4x4, 4x1, and 8x1 configurations, comparing pnuma and vnuma]