Software Engineering at VMware
Dan Scales, May 2008



The Challenge
- Suppose that you have a very popular software platform:
  - that includes hardware-level and OS code that can easily crash a machine
  - that must run on many different hardware platforms
  - that big corporations use all the time and that must be very reliable
  - that customers really like and constantly want new features for
- How do you do very rapid development of large new features and yet produce high-quality releases?
  - Requires concurrent development by many developers
  - But can you really do high-quality releases with such concurrent development?

The Challenge (cont.)
- Big part of the solution:
  - Good developers
  - Coding practices that allow lots of testing while code is running
  - Reliable (and fast!) tools for building, testing, and merging code
  - Constant automated QA (testing)

Outline
- Background on VMware
- Scale of development and releases at VMware, and some of our problems
  - Rapid concurrent development
  - Rapidly evolving interfaces
  - Many product releases sharing same code
  - Many hardware platforms
- Some software techniques for keeping OS reliable
- Some more automated techniques for keeping code base robust and ensuring quality of releases

What is Virtualization?
[Figure: software stack (Application, Operating System, Hardware) without virtualization and with virtualization]
- VMware provides hardware virtualization that presents a complete x86 platform to the virtual machine
- Allows multiple applications to run in isolation within virtual machines on the same physical machine
- Virtualization provides direct access to the hardware resources to give you much greater performance than software emulation

Why are virtual machines useful?
- Run Windows on Linux (or Mac OS)
- Consolidate many different workloads/guest OSes on one big machine ("server consolidation")
- We can migrate running virtual machines (VMs) from one host to another, so we can do automatic workload balancing across a cluster of machines
- VMs also simplify high availability, disaster recovery, etc.

VMware Virtualization Architectures
[Figure: VMware Workstation and VMware ESX Server architectures. Labels: App, VMX, VM, VMM, Host OS, VMkernel, Resource Scheduling, Storage Stack, Network Stack, Device Drivers]
- VMware Workstation: hosted design
  - Runs on Windows, Linux, MacOS
  - Device support is inherited from host operating system
  - Virtualization installs like an application rather than like an operating system
- VMware ESX Server: hypervisor
  - Virtualization supported via small kernel (VMkernel)
  - Highly efficient direct I/O pass-through architecture for network and disk
  - Excellent management/scheduling of hardware resources

Virtual Machine Management
[Figure: VirtualCenter 2.0 with the VirtualCenter UI and 3rd-party apps, and two ESX 3.0 hosts, each with a VC daemon, VMX/VMM, and vmkernel]
- We touch every level of the software stack: drivers, CPU code, OS code, user-level code, UI, management agents, management applications
- We keep adding features that can span the entire stack
- Strong interfaces: management API and DDK
- Most intertwined code: VMX, VMM, and vmkernel

Customers constantly want new features
- 64-bit support
- 10-Gigabit Ethernet
- Support for new features in Intel/AMD chips (including virtualization support)
- NUMA support
- Storage migration
- Network boot
- ...



Goal
- Goal: fast development of new features and support for new platforms while maintaining performance and reliability
- Problems:
  - Highly concurrent development
  - Rapidly evolving interfaces
  - Frequent and concurrent releases
  - Many hardware platforms

Highly concurrent development
- Biggest areas of inter-dependence: vmkernel, VMX, and VMM
  - Largely written in C
- Recently, for the vmkernel (hypervisor):
  - 15-19 checkins per day
  - 150 developers
- For the vmkernel, VMX, and VMM:
  - 36 checkins per day
  - 230 developers
- Need to minimize problems with checkins, so one checkin with issues doesn't affect lots of developers

Many releases
- VMware Workstation, VMware Server, and ESX Server are released independently
  - All share common code (VMX and VMM)
- Many different platforms:
  - Windows
  - Linux
  - MacOS
  - VMkernel: variety of server hardware
- Variety of storage hardware (SCSI, Fibre Channel, iSCSI, NFS) and network hardware (1 Gb Ethernet, 10 Gb Ethernet, TCP offload, ...)
- 32-bit vs. 64-bit

Concurrent Development and Releases
- You need to integrate a variety of features and bug fixes into the code base (Main)
- You need to create stable snapshots of the source code for product release
- Typically, you create branches of the code for product releases, so development for future releases can continue on Main
- Code will have to be cross-ported from product branch to Main
[Figure: timeline of Main with Feature1, Feature2, Feature3, Release1, and Release2]


Software development for vmkernel
- Mostly written in C
- Includes base kernel, ~40 drivers, ~25 other modules
- Modules do things such as:
  - Virtual switch
  - Storage multipathing
  - VM migration
  - Distributed file system (VMFS)
- Important to continually verify that new features/subsystems will work together

Static Checking
- Maintain the strictest possible static checking, so we catch easy bugs as soon as possible
- Important to do from the start, else it becomes hard to fix later
  - E.g. all those warning messages that come from Linux drivers
- We use the strictest type-checking options from the compiler
- Sometimes use a C++ compiler, since it does stricter checking
- Caught a 64-bit VMotion bug
- Coverity is very useful

Verify/test code while running
- Make it easy to enable/disable the verification code
  - Development and release builds; all extra code disappears in release builds
- Assertions are crucial
  - Simple and very general assertions are invaluable, e.g. don't ever block while you hold a spin lock or are in an ISR
  - Document and verify invariants at beginning of functions, e.g. the scheduling lock must be held
  - Allows a module/function to protect against a drastic change in its usage
  - Catch compiler and processor bugs!
    - Page table change (CR3) doesn't immediately take effect
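The assertion practice above can be sketched in C roughly as follows. This is a minimal illustration, not VMware's actual macros: `KASSERT`, `DEVEL_BUILD`, `SchedLock`, and `run_queue_add` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build switch: define DEVEL_BUILD for development builds. */
#ifdef DEVEL_BUILD
#define KASSERT(cond) assert(cond)
#else
/* Release build: no code is generated. sizeof keeps the expression
 * type-checked without evaluating it. */
#define KASSERT(cond) ((void)sizeof(cond))
#endif

/* Toy lock that only records whether it is held (illustrative, not thread-safe). */
typedef struct {
    bool held;
} SchedLock;

static SchedLock sched_lock;
static int run_queue_len;

static void sched_lock_acquire(void)
{
    KASSERT(!sched_lock.held);
    sched_lock.held = true;
}

static void sched_lock_release(void)
{
    KASSERT(sched_lock.held);
    sched_lock.held = false;
}

/* Invariant documented and verified at function entry:
 * the scheduling lock must be held by the caller. */
static void run_queue_add(void)
{
    KASSERT(sched_lock.held);
    run_queue_len++;
}
```

A caller takes the lock, calls `run_queue_add()`, and releases the lock. In a development build, calling `run_queue_add()` without the lock aborts at the assertion; in a release build the check costs nothing.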

Verify/test code while running (cont.)
- Lock ranking
  - Establish a global order for all locks; locks must be acquired in that order to guarantee no deadlock
  - E.g. buffer cache lock must always be acquired before file system lock
  - Extra benefit: discourages modules from creating too many locks
- Stress options
  - Used for causing unusual, but recoverable conditions
      if (STRESS_OPTION(disk_error)) return DISK_FAILURE;
  - Enabled in devel builds, returns TRUE in (say) 1 out of 1000 calls
  - Allows testing of cases that would be very hard for QA to reproduce
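One way the lock-ranking check could look in C is sketched below. The rank values, `RankedLock`, and the single global stack of held ranks are assumptions made for the example; a real kernel would track held locks per CPU or per thread and would assert on a violation rather than return a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: a lower rank must be acquired before a higher one. */
typedef enum {
    RANK_NONE = 0,
    RANK_BUFFER_CACHE = 10,
    RANK_FILE_SYSTEM = 20
} LockRank;

typedef struct {
    LockRank rank;
    bool held;
} RankedLock;

#define MAX_HELD 8

/* Ranks of the locks currently held, in acquisition order. */
static LockRank held_ranks[MAX_HELD];
static int held_count;

/* Ranks are strictly increasing, so the most recent one is the highest. */
static LockRank highest_held_rank(void)
{
    assert(held_count >= 0 && held_count <= MAX_HELD);
    return held_count > 0 ? held_ranks[held_count - 1] : RANK_NONE;
}

/* Returns false on a rank violation instead of taking the lock. */
static bool ranked_lock_acquire(RankedLock *lock)
{
    if (lock->held || held_count == MAX_HELD ||
        lock->rank <= highest_held_rank()) {
        return false;
    }
    lock->held = true;
    held_ranks[held_count++] = lock->rank;
    return true;
}

/* This sketch only supports release in reverse acquisition order. */
static bool ranked_lock_release(RankedLock *lock)
{
    if (!lock->held || held_count == 0 ||
        held_ranks[held_count - 1] != lock->rank) {
        return false;
    }
    lock->held = false;
    held_count--;
    return true;
}
```

With a buffer cache lock at `RANK_BUFFER_CACHE` and a file system lock at `RANK_FILE_SYSTEM`, taking the cache lock and then the file system lock succeeds, while taking them in the opposite order is rejected on the second acquire.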

Verify/test code while running (cont.)
- POST (Power-on self test)
  - Tests that run when module is initialized
  - Hard to get developers to write them
- Heap memory poisoning
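The slide names heap memory poisoning without detail. A common form of the technique, sketched here on top of `malloc`, fills fresh and freed blocks with recognizable byte patterns so that reads of uninitialized memory and use after free show up quickly. The pattern values and the `poison_*` names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical poison bytes; any recognizable non-zero patterns work. */
#define POISON_ALLOC 0xA5 /* fresh memory: exposes reads of uninitialized data */
#define POISON_FREE  0xDD /* freed memory: exposes use after free */

/* Header stored in front of each block so the free path knows its size.
 * The max_align_t member keeps the payload suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} PoisonHeader;

static void *poison_alloc(size_t size)
{
    PoisonHeader *hdr = malloc(sizeof *hdr + size);
    if (hdr == NULL) {
        return NULL;
    }
    hdr->size = size;
    memset(hdr + 1, POISON_ALLOC, size);
    return hdr + 1;
}

/* Overwrites the payload with the free pattern and returns the underlying
 * block. Kept separate from the release so the pattern can be inspected. */
static PoisonHeader *poison_scrub(void *ptr)
{
    assert(ptr != NULL);
    PoisonHeader *hdr = (PoisonHeader *)ptr - 1;
    memset(ptr, POISON_FREE, hdr->size);
    return hdr;
}

static void poison_free(void *ptr)
{
    if (ptr != NULL) {
        free(poison_scrub(ptr));
    }
}
```

In keeping with the earlier slide on development and release builds, wrappers like these would typically be enabled only in development builds.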

Modularity
- Make it more likely that one module can't affect another
- Resource usage: verify that a bottom-half or kernel thread doesn't run for more than a reasonable limit
  - Can lead to very mysterious networking performance (serial port bug)
- Separate heap for each module
  - So a memory bug in one module is less likely to affect another
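A separate heap per module can be illustrated with a small fixed-size arena for each module. This is only a sketch under assumed names (`ModuleHeap`, `module_alloc`, `module_owns`) and an assumed 4 KB budget; it uses simple bump allocation with no free path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODULE_HEAP_BYTES 4096

/* One private arena per module (hypothetical layout). */
typedef struct {
    const char *name;
    _Alignas(max_align_t) unsigned char storage[MODULE_HEAP_BYTES];
    size_t used;
} ModuleHeap;

/* Bump allocation from the module's own arena. Returns NULL when that
 * module's budget is exhausted; other modules are unaffected. */
static void *module_alloc(ModuleHeap *heap, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    assert(heap != NULL);
    if (size > MODULE_HEAP_BYTES) {
        return NULL;
    }
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > MODULE_HEAP_BYTES - heap->used) {
        return NULL;
    }
    void *ptr = heap->storage + heap->used;
    heap->used += rounded;
    return ptr;
}

/* True if ptr lies inside the given module's arena, which lets a
 * development build check that a module only touches its own memory. */
static bool module_owns(const ModuleHeap *heap, const void *ptr)
{
    uintptr_t p = (uintptr_t)ptr;
    uintptr_t base = (uintptr_t)heap->storage;
    return p >= base && p < base + MODULE_HEAP_BYTES;
}
```

Because each module draws from its own arena, a leak or overrun in one module exhausts or damages only that module's storage, and an ownership check can flag pointers that stray into another module's heap.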

Modularity (cont.)
- Verify DMA is to appropriate memory
  - DMA goes around the MMU, so it can write arbitrary memory
  - IO MMUs are coming
- Use address-space protection
  - Add extra address space protections in devel builds
  - Separate address space for drivers (Nooks work)
  - Move code out to user space

Miscellaneous
- Coding conventions
- Get error codes right from the start
  - Don't use 0 and 1!
  - Even Linux error codes don't carry enough information
  - Want more general error codes that carry arbitrary other information
  - Probably want a requirement to never ignore error codes
- Interestingly, many of these kernel issues apply to other multi-threaded, modular applications:
  - Hostd: multi-threaded management daemon on ESX host
  - Web servers
  - App servers
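The points about error codes could be realized along the following lines. This sketch reads "don't use 0 and 1" as "don't return bare 0/1 for success and failure"; the `Status` layout, the code values, and `disk_read` are hypothetical. `warn_unused_result` is a GCC/Clang function attribute, so the must-check macro is empty on other compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes, kept away from the bare values 0 and 1. */
typedef enum {
    STATUS_OK = 0x100,
    STATUS_NO_MEMORY,
    STATUS_DISK_FAILURE,
    STATUS_TIMEOUT
} StatusCode;

static_assert(STATUS_OK != 0 && STATUS_OK != 1, "status codes must not be 0 or 1");

/* A status carries extra context in addition to the code. */
typedef struct {
    StatusCode code;
    const char *module; /* which module produced the status */
    int line;           /* source line where it was created */
    uint64_t detail;    /* code-specific payload, e.g. a block number */
} Status;

#define STATUS_MAKE(c, mod, det) ((Status){ (c), (mod), __LINE__, (det) })

/* Ask the compiler to warn when a caller ignores the returned status. */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

static bool status_ok(Status s)
{
    return s.code == STATUS_OK;
}

/* Toy operation: reading past the end of the disk fails and reports
 * the offending block in the detail field. */
static MUST_CHECK Status disk_read(uint64_t block, uint64_t disk_blocks)
{
    if (block >= disk_blocks) {
        return STATUS_MAKE(STATUS_DISK_FAILURE, "disk", block);
    }
    return STATUS_MAKE(STATUS_OK, "disk", 0);
}
```

A caller that receives a failing `Status` from `disk_read(42, 10)` can log the module, line, and block number instead of a bare integer.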


Large-scale Concurrent Development
- How do you keep the source code base (Main) stable for developers and product releases?
  - Make it easy to test/verify code while running
  - Good developers who are careful and follow responsibilities
    - Code reviews
    - Pre-checkin build
    - Pre-checkin testing
  - Policy for determining how code is integrated into Main

Concurrent development
- Modular code with well-defined interfaces: develop in separate branches and merge
- Most changes span modules and interfaces always changing: free-for-all!
- Most changes confined to modules, but interfaces change: component branching
  - More stable than free-for-all
  - Requires sync back-and-forth
- Some features are huge and touch every part of the system: major features developed on feature branch
[Figure: branch diagrams for each case, showing developers working on components A and B]

What to use?
- Feature branches are painful
  - Because you must always merge with changes from other developers until you are ready to check in
  - But useful for really big features that can be incrementally added
- VMware uses component branching
  - Requires merging between the component branches
  - Most other OSes seem to do this as well (Microsoft and Linux have a hierarchy of branches with human mergers)
  - Big changes may use a feature branch
- Component branching requires frequent syncing between branches after really good testing

Fast automation is key!
- We need to be able to do automated testing
  - We have many levels of automatic tests: pre-checkin, nightly tests, longer tests before merging between branches, ...
- We need automatic merging of code from one branch to another
  - Alternatively, have a gatekeeper that manually does the merge and verifies functionality, e.g. Linux integration trees
- For all this, we need a very fast build system
  - So developers will build on all possible platforms before checking in
  - So we can verify checkins to branches and merges to other branches very quickly
  - So we can quickly check if checked-in code compiles on all platforms

Concurrent Development
- Problem: rapid concurrent development of code with rapidly changing interfaces
- Solutions:
  - Developer responsibilities, including pre-checkin tests
  - Component branching
  - Automated build of tree after checkin
  - Automated test of build after checkin
  - Should we have automatic backout if build or tests fail?
  - Must be quick to isolate exact change that caused problem
  - Continuous manual and automated QA to find bugs early
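The slides do not say how the offending change is isolated. One common approach, shown here purely as an illustration, is to bisect the ordered list of checkins between a known-good and a known-bad tree, which needs only about log2(n) build-and-test runs for n checkins. `first_bad_checkin`, `TreeIsGood`, and `demo_is_good` are hypothetical names, and the sketch assumes the tree stays broken once the bad checkin has landed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate: builds and tests the tree as of a given checkin. */
typedef bool (*TreeIsGood)(int checkin_id);

/* The tree is known good at `good` and known bad at `bad`, with good < bad.
 * Returns the first checkin at which the tree is bad. */
static int first_bad_checkin(int good, int bad, TreeIsGood is_good)
{
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_good(mid)) {
            good = mid; /* breakage was introduced after mid */
        } else {
            bad = mid;  /* breakage was introduced at or before mid */
        }
    }
    return bad;
}

/* Stand-in for a real build-and-test run: pretends checkin 37 broke the tree. */
static bool demo_is_good(int checkin_id)
{
    return checkin_id < 37;
}
```

At the checkin rates quoted earlier (tens of checkins per day), a day's worth of changes can be narrowed to a single checkin in five or six test runs, which is where a fast build system pays off.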

Concurrent releases
- Problem: releases of many products on many platforms
- Solutions:
  - Build all products on all branches in order to catch problems early
  - Create product branches when nearing release of a product
[Figure: timeline of Main with Feature1, Feature2, Feature3, Release1, and Release2]

Concurrent Releases (cont.)
- Product branches automatically merge changes to main development tree
  - Need to do continuous builds on main tree to check for problems caused by automerges
  - Need to remind developers if automerges fail
  - Should really do merging via human
[Figure: timeline of Main with Feature1, Feature2, Release1, and Release2, with bug-fix automerges]

Conclusion
- Rapid development of new features on a large code base that is changing at all layers is hard!
- Solutions:
  - Good developers with good development practices
  - Coding practices that allow lots of testing while code is running
  - Reliable (and fast!) tools for building, testing, and merging code
  - Continuous testing (manual and automated)
- Developers are important, but a good build/tools team is also crucial