Beyond 1001 Dedicated Data Service Instances

Introduction

The Challenge

Given: an application platform based on Cloud Foundry that serves thousands of apps.

Application Runtime

Many platform users who don't know each other.
Different app languages & frameworks.
100% on-demand self-service: no involvement of the platform operator necessary.
Instant scalability & self-healing.

Easy deployment:
$> cf push myapp

Runtime abstraction

Java code + Java buildpack -> staging -> Java droplet -> execution of the droplet.

Ruby code + Ruby buildpack -> staging -> Ruby droplet -> execution of the droplet.

Once staged, a Java droplet and a Ruby droplet are both just droplets.

To the Cloud Foundry runtime, all droplets are equal.

A droplet is a container image plus a $START_CMD: something you can execute in a container.

Abstraction enables further assumptions & automation

Scaling apps: assume this is our status quo on the Cloud Foundry runtime: App#1 runs one instance (Instance#1), App#2 runs two instances (Instance#1, Instance#2).

App Scalability $> cf scale -i 3 app#1

Two additional instances of App#1 have been created: App#1 now runs Instance#1, Instance#2 and Instance#3, while App#2 still runs Instance#1 and Instance#2.

App Self-Healing

App self-healing on the Cloud Foundry runtime, with App#1 running three instances and App#2 running two:
Everything is healthy.
App#1 Instance#2 is failing.
App#1 Instance#2 is gone temporarily.
App#1 Instance#2 is re-created.
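Scaling and self-healing can both be read as desired-state reconciliation: the runtime compares the requested number of instances with the instances actually running and starts or stops instances until both match. The Python snippet below is a minimal sketch of that idea under simplifying assumptions; it is not Cloud Foundry's implementation, and the function name and data shapes are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str, int]  # (verb, app name, instance index)


def reconcile(desired: Dict[str, int], actual: Dict[str, Set[int]]) -> List[Action]:
    """Compare desired instance counts with running instances and return the needed actions.

    Instance indices start at 1, matching "App#1 Instance#1" on the slides.
    Apps that appear only in `actual` are ignored in this simplified version.
    """
    actions: List[Action] = []
    for app, count in desired.items():
        running = actual.get(app, set())
        wanted = set(range(1, count + 1))
        # Missing instances: newly requested by scaling, or crashed ones (self-healing).
        for index in sorted(wanted - running):
            actions.append(("start", app, index))
        # Surplus instances: left over after scaling down.
        for index in sorted(running - wanted):
            actions.append(("stop", app, index))
    return actions


if __name__ == "__main__":
    # Status quo: nothing to do.
    print(reconcile({"app#1": 1, "app#2": 2}, {"app#1": {1}, "app#2": {1, 2}}))
    # After `cf scale -i 3 app#1`: start App#1 Instance#2 and Instance#3.
    print(reconcile({"app#1": 3, "app#2": 2}, {"app#1": {1}, "app#2": {1, 2}}))
    # App#1 Instance#2 has failed: it is re-created.
    print(reconcile({"app#1": 3, "app#2": 2}, {"app#1": {1, 3}, "app#2": {1, 2}}))
```

Scaling and self-healing then need no separate code paths: both are just a difference between desired and actual state.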

What does the paradise for backing services look like?

Missing: A solution to serve thousands of data services

Application Runtime Data Services

The Mission

Providing a growing number of data services (a9s PostgreSQL, a9s MongoDB, a9s RabbitMQ, a9s Elasticsearch, a9s Redis, a9s LogMe) with full lifecycle automation of thousands of data service instances across a wide range of infrastructures, in both public and on-premise clouds, and integrating well with multiple platforms.

Requirements

Portability, Security, Usability, Scalability, Performance, Maintainability, Robustness, Manageability, Flexibility, On-demand self-service, Extensibility, Multi-tenancy

Key requirements: Portability, Scalability, Production-readiness, On-demand self-service

Design

How to build it?

Three layers: Data Service Provisioning API, Automation Middleware, Data Service Automation.

Open Service Broker API: a new industry standard for data service provisioning.

Open Service Broker API supporters: Google, Pivotal, IBM, Red Hat, Fujitsu, SAP.

Supporting platforms: Cloud Foundry, OpenShift, Kubernetes, more to come.

Open Service Broker API endpoints and what they do in this design (see http://docs.cloudfoundry.org/services/api.html#api-overview):
Get service catalog: GET /v2/catalog. Deliver metadata about the data service.
Create service instance (provision): PUT /v2/service_instances/:id. Provision VMs, then install and configure the data service; the VM / cluster represents the service instance.
Create service binding (bind): PUT /v2/service_instances/:instance_id/service_bindings/:id. Create a data service user and return credentials; these represent the service binding.
Delete service binding (unbind): DELETE /v2/service_instances/:instance_id/service_bindings/:id. Remove the credentials associated with the service binding.
Delete service instance (unprovision): DELETE /v2/service_instances/:id. Destroy the VMs and data associated with the service instance.
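To make the endpoint list concrete, here is a minimal in-memory sketch of a broker's request dispatch in Python. It uses simplified status codes (noted in the comments) and an abbreviated catalog, since a real OSB catalog needs more fields; `handle`, `instances` and `bindings` are hypothetical names, and the points where the a9s design would deploy VMs or create database users are only marked by comments.

```python
import re
import uuid
from typing import Dict, Optional, Tuple

# Abbreviated catalog; real OSB catalogs require more fields (description, bindable, ...).
CATALOG = {
    "services": [{
        "id": "mongodb",
        "name": "mongodb",
        "plans": [{"id": "single-small", "name": "single-small"},
                  {"id": "cluster-small", "name": "cluster-small"}],
    }]
}

# In-memory stand-ins for "VMs / clusters" and "database users".
instances: Dict[str, dict] = {}
bindings: Dict[Tuple[str, str], dict] = {}

INSTANCE = r"/v2/service_instances/(?P<instance_id>[^/]+)"
BINDING = INSTANCE + r"/service_bindings/(?P<binding_id>[^/]+)"
ROUTES = [
    ("GET", re.compile(r"^/v2/catalog$"), "catalog"),
    ("PUT", re.compile("^" + INSTANCE + "$"), "provision"),
    ("DELETE", re.compile("^" + INSTANCE + "$"), "deprovision"),
    ("PUT", re.compile("^" + BINDING + "$"), "bind"),
    ("DELETE", re.compile("^" + BINDING + "$"), "unbind"),
]


def handle(method: str, path: str, body: Optional[dict] = None) -> Tuple[int, dict]:
    """Dispatch one broker request and return (HTTP status, JSON-like body)."""
    for verb, pattern, action in ROUTES:
        match = pattern.match(path)
        if verb != method or match is None:
            continue
        ids = match.groupdict()
        iid = ids.get("instance_id")
        if action == "catalog":
            return 200, CATALOG
        if action == "provision":
            if iid in instances:
                return 200, {}  # simplified: treat a repeated request as idempotent
            # The a9s design would deploy a dedicated VM / cluster via BOSH here.
            instances[iid] = dict(body or {})
            return 201, {}
        if action == "deprovision":
            if iid not in instances:
                return 410, {}  # Gone: nothing to delete
            del instances[iid]
            for key in [k for k in bindings if k[0] == iid]:
                del bindings[key]
            return 200, {}
        key = (iid, ids["binding_id"])
        if action == "bind":
            if iid not in instances:
                return 404, {}  # simplified: unknown service instance
            # The broker would create a data service user here.
            creds = {"username": f"user-{ids['binding_id']}", "password": uuid.uuid4().hex}
            bindings[key] = creds
            return 201, {"credentials": creds}
        if action == "unbind":
            if key not in bindings:
                return 410, {}
            del bindings[key]
            return 200, {}
    return 404, {}
```

For example, `handle("PUT", "/v2/service_instances/i1", {"plan_id": "single-small"})` returns `(201, {})` for a new instance id.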

Three layers: Data Service Provisioning API, Automation Middleware, Data Service Automation.

The Open Service Broker API does not define what a service instance is.

Applying the design pattern: On-Demand Provisioning of Dedicated Data Service Instances

Result

Using a service broker with Cloud Foundry:
$> cf create-service

Easy deployment:
$> cf create-service mongodb single-small my-single-mongo-1

Result: service instance my-single-mongo-1, backed by one MongoDB VM (VM#1).

Easy deployment:
$> cf create-service mongodb cluster-small my-3node-mongo-cluster-2

Newly created service instance my-3node-mongo-cluster-2, backed by three MongoDB VMs (VM#1, VM#2, VM#3), next to my-single-mongo-1 with its single MongoDB VM.

Technical Challenges

State

State is handled differently in each backing service. The operational model will be different: replication, failure detection, failover. The data service automation will be different.

Where to store state?

App self-healing recap (Cloud Foundry runtime): everything is healthy; App#1 Instance#2 is failing; App#1 Instance#2 is gone temporarily; App#1 Instance#2 is re-created.

App self-healing is easy because there is NO STATE.

How can we store state and still be able to perform self-healing?

Store state on a remotely attached block device, i.e. a persistent disk.

[Diagram: Infrastructure as a Service (IaaS), e.g. OpenStack: a virtual datacenter with IaaS API, router and storage nodes with HDDs. A storage volume from the storage nodes is attached to a virtual machine running an operating system.]

A persistent disk has a file system, and file systems may fail. Replication / clustering and backups are therefore still very important.

The data lifecycle has been decoupled from the VM lifecycle. The VM becomes disposable.

Storing state

What needs to be automated?

Data Service Instance Lifecycle

Lifecycle of a data service instance:
- Provision a data service server
- Install data service software
- Configure data service software
- Consume data service with apps
- Debug data service issues
- Update data service version
- Update operating system
- Backup & recover data
- Scale out data service VM(s)
- Destroy data service & DB VM(s)

Can you do that x * 1000 times?

You either automate it or delegate it to the app developer.

Automation

BOSH

BOSH lets you orchestrate the lifecycle of large-scale deployments of stateful distributed systems onto infrastructure.

[Diagram sequence: $> bosh deploy on the BOSH CLI talks to the BOSH API. BOSH uses its CPI to call the IaaS API (Infrastructure as a Service, e.g. OpenStack) of a virtual datacenter with router and storage nodes. BOSH creates a virtual machine with an operating system and a BOSH Agent, attaches a storage volume, and installs PostgreSQL. In the final picture, BOSH manages several VMs, each with a BOSH Agent, running e.g. PostgreSQL, the Cloud Controller and UAA.]

BOSH Automation

BOSH Releases contain the automation

A BOSH deployment depends on 1..* stemcells.

A BOSH deployment is described by a release & a manifest.

A release describes 1..* jobs.

A release contains 1..* packages.

A BOSH deployment's settings are contained in a manifest.

Infrastructure settings are contained in the cloud config.
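The relationships above can be written down as a small data model. This Python sketch only mirrors the multiplicities listed on the slides (1..* stemcells, jobs, packages); the class and field names are illustrative assumptions, not BOSH's own schema. The cloud config is modeled as a separate class because its infrastructure settings are not part of a deployment's own manifest.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Package:
    name: str  # software to compile, e.g. the data service binaries


@dataclass
class Job:
    name: str  # how to configure and run a process


@dataclass
class Release:
    name: str
    jobs: List[Job]          # a release describes 1..* jobs
    packages: List[Package]  # a release contains 1..* packages

    def __post_init__(self) -> None:
        if not self.jobs or not self.packages:
            raise ValueError("a release needs at least one job and one package")


@dataclass
class Stemcell:
    os: str
    version: str


@dataclass
class Manifest:
    settings: Dict[str, object]  # deployment settings, e.g. number of instances


@dataclass
class CloudConfig:
    settings: Dict[str, object]  # infrastructure settings, e.g. networks, VM and disk types


@dataclass
class Deployment:
    name: str
    releases: List[Release]    # the automation
    stemcells: List[Stemcell]  # a deployment depends on 1..* stemcells
    manifest: Manifest         # the deployment's own settings

    def __post_init__(self) -> None:
        if not self.releases or not self.stemcells:
            raise ValueError("a deployment needs a release and at least one stemcell")


if __name__ == "__main__":
    postgres = Release("postgres", [Job("postgres")], [Package("postgres-binaries")])
    deployment = Deployment("my-single-postgres-1", [postgres],
                            [Stemcell("ubuntu", "latest")], Manifest({"instances": 1}))
    print(deployment.name, [job.name for job in deployment.releases[0].jobs])
```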

BOSH makes your deployments:

Infrastructure Independent

A BOSH release contains the main automation (software packages, how to run processes). BOSH releases can be re-used on every supported infrastructure.

Automate once, deploy everywhere.

[Diagram: one BOSH CLI and one BOSH per infrastructure (VMware, AWS, OpenStack), each managing virtual machines that run some service / app next to a BOSH Agent.]

Target the BOSH on AWS, then deploy:
$> bosh target http://bosh-on.aws.com
$> bosh deploy

Switching a deployment between clouds: keep the same release, use a stemcell specific to the new cloud, adapt the cloud config.

Operating System Independent

A BOSH release does not depend on the OS

The only dependency on the OS is the BOSH stemcell.

[Diagram: a virtual machine consists of an operating system image and a BOSH Agent.]

OS image + BOSH agent = stemcell, e.g. an Ubuntu stemcell.

Changing the OS of a BOSH-deployed system: keep the same release, change the stemcell, change the manifest.

Scalable

Horizontal Scaling

[Diagram: horizontal scaling. Additional VMs, each running some service and a BOSH Agent, are added to the deployment.]

Scaling out a BOSH-deployed system: keep the same release, use the same stemcell, change the manifest.
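Scale-out and OS changes differ only in which input changes. The sketch below uses a Python dict shaped like an abbreviated BOSH v2 deployment manifest (keys such as `instance_groups`, `instances` and `stemcells`); the deployment name, versions and stemcell OS values are hypothetical, and blocks such as `update`, `networks` and `azs` are left out. In both operations the `releases` section stays untouched.

```python
import copy
from typing import Any, Dict

# Abbreviated, hypothetical manifest in the shape of a BOSH v2 deployment manifest.
BASE_MANIFEST: Dict[str, Any] = {
    "name": "my-3node-postgres-cluster-2",
    "releases": [{"name": "postgres", "version": "latest"}],
    "stemcells": [{"alias": "default", "os": "ubuntu-trusty", "version": "latest"}],
    "instance_groups": [{
        "name": "postgres",
        "instances": 3,
        "vm_type": "small",
        "persistent_disk_type": "10GB",
        "stemcell": "default",
        "jobs": [{"name": "postgres", "release": "postgres"}],
    }],
}


def scale_out(manifest: Dict[str, Any], group: str, instances: int) -> Dict[str, Any]:
    """Horizontal scaling: same release, same stemcell, only the instance count changes."""
    new = copy.deepcopy(manifest)
    for instance_group in new["instance_groups"]:
        if instance_group["name"] == group:
            instance_group["instances"] = instances
    return new


def change_os(manifest: Dict[str, Any], os_name: str, version: str) -> Dict[str, Any]:
    """OS change: same release, the manifest now references a different stemcell."""
    new = copy.deepcopy(manifest)
    new["stemcells"] = [{"alias": "default", "os": os_name, "version": version}]
    return new
```

The resulting manifest would then be applied with `bosh deploy`; the original manifest is never modified in place.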

Vertical Scaling

[Diagram sequence: a VM with 4 GB RAM and 1 vCPU runs PostgreSQL with its data on a 10 GB persistent disk. The VM is replaced by a VM with 8 GB RAM and 2 vCPUs; the old 10 GB disk is mounted next to a new 20 GB persistent disk, the data is copied over, and finally only the 20 GB persistent disk remains.]

BOSH Deployments are Predictable

Source code is compiled in freshly created VMs. VMs always contain exactly the software specified in the release. There are no left-overs from prior deployments, as new VMs are used.

BOSH Deployments are Repeatable

Executing a specific BOSH deployment always leads to exactly the same deployed system.

Monitored & Self-Healing

Self-healing process failures

[Diagram: a BOSH installation (BOSH Director with CPI, BOSH Health Monitor, NATS message bus, BOSH Registry, blob store) manages infrastructure resources: VMs, each with a BOSH Agent and a process monitor supervising some process. A failed process is restarted by the process monitor.]

Self-healing process monitor failures

[Diagram: the same setup; the process monitor on one VM fails and is restored.]

Self-healing VM failures

Self-healing BOSH Agent failures

[Diagram: the same setup, showing the failure and recovery of a BOSH Agent.]
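The layered self-healing can be approximated in two levels: on each VM a process monitor (monit on BOSH stemcells) restarts failed processes, while the BOSH Health Monitor watches agent heartbeats and can have the Director recreate VMs whose agent has gone silent (BOSH calls this resurrection). The Python sketch below shows only the detection logic; the timeout value and all function names are hypothetical, and the real Health Monitor is considerably more involved.

```python
from typing import Dict, List

HEARTBEAT_TIMEOUT_S = 60.0  # hypothetical threshold, not a documented BOSH default


def restart_failed(processes: Dict[str, bool]) -> List[str]:
    """Process-monitor level: return the processes that are down and would be restarted."""
    return sorted(name for name, running in processes.items() if not running)


def find_unresponsive(last_heartbeat: Dict[str, float], now: float,
                      timeout: float = HEARTBEAT_TIMEOUT_S) -> List[str]:
    """Health-monitor level: VMs whose agent has been silent for longer than `timeout` seconds."""
    return sorted(vm for vm, seen in last_heartbeat.items() if now - seen > timeout)


def resurrect(vms: List[str]) -> List[str]:
    """Hypothetical stand-in: only describes the recreate requests a director would act on
    (in BOSH, a recreated VM gets its persistent disk re-attached)."""
    return [f"recreate {vm}" for vm in vms]
```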

Three layers: Data Service Provisioning API, Automation Middleware, Data Service Automation.

Reference Architecture

[Diagram: reference architecture. A CF client sends create service / create binding to the Cloud Controller, which calls the a9s Service Broker (Cloud Foundry adapter, middleware adapter). The broker asks the a9s Deployer to create a deployment from template xy with attributes { }; service-specific credentials are created via the a9s PostgreSQL SPI. The a9s Deployer (templates, deployments) has BOSH deploy release abc & deployment manifest xyz; BOSH executes the deployments, producing the service instances my-single-postgres-1 (VM#1), my-3node-postgres-cluster-2 (VM#1 to VM#3) and my-3node-postgres-cluster-3 (VM#1 to VM#3).]
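The step "create deployment from template xy with attributes { }" can be sketched as rendering a per-plan template with instance-specific attributes. In this Python sketch the template contents, the attribute keys `instance_name` and `release`, and the function name are hypothetical; the plan names are the ones used with `cf create-service` earlier.

```python
from typing import Any, Dict

# Hypothetical templates, one per service plan.
TEMPLATES: Dict[str, Dict[str, Any]] = {
    "single-small": {"instances": 1, "vm_type": "small", "persistent_disk_type": "10GB"},
    "cluster-small": {"instances": 3, "vm_type": "small", "persistent_disk_type": "10GB"},
}


def render_deployment(plan: str, attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Combine a plan template with per-instance attributes into a deployment description."""
    template = TEMPLATES[plan]
    release = attributes["release"]
    return {
        "name": attributes["instance_name"],  # one BOSH deployment per service instance
        "releases": [{"name": release, "version": "latest"}],
        "instance_groups": [{
            "name": release,
            "instances": template["instances"],
            "vm_type": template["vm_type"],
            "persistent_disk_type": template["persistent_disk_type"],
            "jobs": [{"name": release, "release": release}],
        }],
    }
```

Keeping one template per plan means the broker never hand-writes manifests: a new plan is a new template, and every service instance becomes its own dedicated deployment.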


Portability: BOSH provides multi-infrastructure support; the Open Service Broker API provides multi-platform support.
Scalability: on-demand provisioning of dedicated service instances; BOSH scales existing service instances vertically; solo & clustered instances.
Production readiness: dedicated data service instances with strong instance isolation; BOSH self-healing; clustered service instances; backup & restore.
On-demand self-service: Open Service Broker API; on-demand provisioning, on-demand updates, on-demand backup & restore.

a9s PostgreSQL, a9s MongoDB, a9s RabbitMQ, a9s Elasticsearch, a9s Redis, a9s LogMe

Operations

Continuous Data Service Delivery

Delivering data service patches: the open source PostgreSQL upstream release is built and tested into a new a9s PostgreSQL release; that release is then used to update the data service instances on platform #1, platform #2, ..., platform #n.

Common maintenance tasks to be performed at scale

Create service instance: create VM; install and start data services.

Vertical scaling of a service instance: destroy old VM; create new VM; mount old persistent disk; create and mount new persistent disk; copy data; optional: reintegrate into the cluster.

OS update: destroy old VM; create new VM based on the new stemcell (with the new OS version); attach persistent disk.
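The vertical-scaling and OS-update steps can be simulated to show that the data survives while VMs are replaced. This Python sketch models VMs and persistent disks as plain objects (an assumption for illustration; nothing here calls BOSH or an IaaS) and follows the listed step order and the vertical-scaling diagram, where only the new disk remains at the end.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disk:
    size_gb: int
    data: List[str] = field(default_factory=list)


@dataclass
class VM:
    ram_gb: int
    vcpus: int
    stemcell: str
    disks: List[Disk] = field(default_factory=list)


def vertical_scale(old: VM, ram_gb: int, vcpus: int, new_disk_gb: int) -> VM:
    old_disk = old.disks[0]
    old.disks = []                            # destroy old VM; its persistent disk survives
    new = VM(ram_gb, vcpus, old.stemcell)     # create new VM
    new.disks.append(old_disk)                # mount old persistent disk
    new_disk = Disk(new_disk_gb)
    new.disks.append(new_disk)                # create and mount new persistent disk
    new_disk.data = list(old_disk.data)       # copy data
    new.disks = [new_disk]                    # as in the diagram, only the new disk remains
    return new


def os_update(old: VM, new_stemcell: str) -> VM:
    disk = old.disks[0]
    old.disks = []                                 # destroy old VM
    new = VM(old.ram_gb, old.vcpus, new_stemcell)  # create new VM from the new stemcell
    new.disks.append(disk)                         # attach persistent disk
    return new
```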

Ultimate Question

Can you handle more than 1001 Data Service Instances?

Yes.

Excerpt from our perf tests:

Provisioning 1001 instances in sequence over a longer period of time does not expose any significant bottleneck.

For large, heavily used platforms, the number of simultaneous deployments may become relevant.

Data service instances   BOSH queue time   Avg. time to provision VM   Total time to create instances
250                      14 min            6:57 min                    21 min
500                      29 min            7:01 min                    36 min
750                      46 min            7:13 min                    53 min
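Reading the table, the total time is close to the BOSH queue time plus one average VM provisioning time, and the queue time grows roughly linearly with the number of instances (about 3.4 to 3.7 s per queued instance). This is an interpretation of the numbers, not a claim made by the perf tests; the Python snippet below only checks the arithmetic.

```python
from typing import List, Tuple

# (instances, BOSH queue time [min], avg. time to provision a VM [m:ss], reported total [min])
ROWS: List[Tuple[int, int, str, int]] = [
    (250, 14, "6:57", 21),
    (500, 29, "7:01", 36),
    (750, 46, "7:13", 53),
]


def minutes(mm_ss: str) -> float:
    """Convert "m:ss" to fractional minutes."""
    m, s = mm_ss.split(":")
    return int(m) + int(s) / 60


def estimated_total(queue_min: float, provision: str) -> float:
    """Assumed model: the last request waits in the queue, then its VM is provisioned."""
    return queue_min + minutes(provision)


def queue_seconds_per_instance(instances: int, queue_min: float) -> float:
    return queue_min * 60 / instances


if __name__ == "__main__":
    for n, queue, prov, total in ROWS:
        print(n, round(estimated_total(queue, prov)), total,
              round(queue_seconds_per_instance(n, queue), 2))
```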

Optimization task: keep BOSH queueing time at an acceptable level.

Scaling BOSH is key to deal with simultaneous provisioning.

Sum Up

Sum up:
Full lifecycle automation is feasible.
Open Service Broker API: a new standard.
Choosing the right automation technology is key, e.g. BOSH.
CI/CD-based dev and ops are essential.

Questions?

Common data service design patterns:
A. Shared VM cluster
B. Dedicated containers
C. Dedicated VMs / VM clusters

Scaling a shared VM cluster

Example: a shared MongoDB cluster of 3 VMs (MongoDB VM #1, #2, #3).

Scaling a shared VM cluster: low costs per service instance.

Simple service broker logic:
- create service -> create a database
- create service binding -> create a database user
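The simple broker logic for a shared cluster fits in a few lines, which is exactly its appeal. This Python sketch keeps databases and users in memory; the class name, naming scheme and `max_databases` limit are hypothetical stand-ins for a real shared MongoDB cluster and its structural limit.

```python
from typing import Dict, Set


class SharedClusterBroker:
    """Every service instance is just a database on one shared cluster."""

    def __init__(self, max_databases: int) -> None:
        self.max_databases = max_databases        # structural limit of the shared cluster
        self.databases: Dict[str, Set[str]] = {}  # database name -> database users

    def is_full(self) -> bool:
        return len(self.databases) >= self.max_databases

    def create_service(self, instance_id: str) -> str:
        """create service -> create a database"""
        if self.is_full():
            raise RuntimeError("shared cluster is full")
        database = f"db_{instance_id}"
        self.databases[database] = set()
        return database

    def create_binding(self, instance_id: str, binding_id: str) -> str:
        """create service binding -> create a database user"""
        user = f"user_{binding_id}"
        self.databases[f"db_{instance_id}"].add(user)
        return user
```

All databases share the same data service processes and VMs, so isolation is only as strong as the data service's own multi-tenancy, and `is_full()` eventually becomes true.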

Weak Isolation!

Structural Limitation!

Scaling a shared VM cluster: what to do when the shared cluster is full?

[Diagram: MongoDB Cluster #1 (3 VMs: MongoDB VM #1 to #3) hosts service instances #1 to #n-max, where each service instance = one database. When it is full, MongoDB Cluster #2 (3 VMs: MongoDB VM #4 to #6) is added for service instances #n+1 to #2*n-max.]

Scaling a shared VM cluster: the simple service broker logic turns into complex service broker logic.

Fragmentation

[Diagram: MongoDB Cluster #1 (VMs #1 to #3) and MongoDB Cluster #2 (VMs #4 to #6) end up with gaps: only some service instances remain, e.g. #1, #3, #5 in Cluster #1 and #n+2, #n+3, #2*n-max in Cluster #2.]

Fragmentation is caused by frequent creation and/or deletion of service instances.

Placement Problem

[Diagram: two fragmented MongoDB clusters with free slots in both; a new service instance arrives. Which cluster and VMs should host it?]

Cluster Rebalancing

[Diagram: service instances are moved from MongoDB Cluster #2 into free slots of MongoDB Cluster #1 (e.g. #2*n-max), until Cluster #2 is empty.]

A cluster rebalance that frees infrastructure resources would be desirable.
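Placement and rebalancing can be illustrated with fixed-size slots per cluster. This Python sketch uses first-fit placement, one simple heuristic chosen for illustration (the slides do not prescribe a strategy), and a rebalance that only reassigns slots; in a real system each move means migrating a database between clusters. All names are hypothetical.

```python
from typing import List, Optional

Cluster = List[Optional[str]]  # fixed number of slots; None marks a free slot


def place(clusters: List[Cluster], instance: str, capacity: int) -> int:
    """First-fit: use the first free slot of any cluster, else add a new cluster.

    Returns the index of the cluster that received the instance.
    """
    for index, slots in enumerate(clusters):
        for slot, occupant in enumerate(slots):
            if occupant is None:
                slots[slot] = instance
                return index
    clusters.append([instance] + [None] * (capacity - 1))
    return len(clusters) - 1


def rebalance(clusters: List[Cluster]) -> int:
    """Pack all instances into as few clusters as possible; return the number of freed clusters."""
    if not clusters:
        return 0
    capacity = len(clusters[0])
    instances = [x for slots in clusters for x in slots if x is not None]
    needed = max(1, -(-len(instances) // capacity))  # ceiling division, keep at least one cluster
    packed: List[Cluster] = [[None] * capacity for _ in range(needed)]
    for i, instance in enumerate(instances):
        packed[i // capacity][i % capacity] = instance
    freed = len(clusters) - needed
    clusters[:] = packed
    return freed
```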

Shared Cluster Conclusion

Scalability issues can be addressed. Isolation issues are heavily data-service specific, so a generic solution is not possible.

Scaling Dedicated Containers

Better Isolation

Example: a PostgreSQL cell of 2 VMs across 2 AZs (Docker host VM #1 and Docker host VM #2).

Each service instance = 2 Docker containers + 2 PostgreSQL processes + 2 PostgreSQL databases, asynchronously replicated; service instances #1 and #2 share the cell's two Docker host VMs.

How to scale?

Structural Limitation!

Scaling dedicated containers: what to do when the cell/cluster is full?

[Diagram: PostgreSQL Cell #1 (Docker host VMs #1 and #2) and PostgreSQL Cell #2 (Docker host VMs #3 and #4), each spanning 2 AZs; every service instance = 2 Docker containers + 2 PostgreSQL processes.]

Same Service Broker Challenge

New Challenge: How to add Cell-VMs on-demand?

On-Demand VM provisioning is unavoidable.

Why not delegate most challenges?

On-Demand Dedicated VMs and Clusters

Architecture

[Diagram: the same reference architecture for a9s MongoDB: CF client -> Cloud Controller -> a9s Service Broker -> a9s Deployer -> BOSH, producing the service instances my-single-mongodb-1 (one MongoDB VM), my-3node-mongodb-cluster-2 and my-3node-mongodb-cluster-3 (three MongoDB VMs each).]

Let BOSH do the VM orchestration!

Let the infrastructure solve the placement and fragmentation challenge!

Shared Data Services

Shared PostgreSQL cluster: a bad idea. Either a single VM running a single PostgreSQL server, or a single PostgreSQL cluster of 3 VMs, hosts service instances 1, 2, 3 and so on. Isolation is limited to PostgreSQL's multi-tenancy capabilities.

Shared PostgreSQL = SPOF

[Diagram: many apps on the Cloud Foundry runtime all use service instances hosted by one shared PostgreSQL cluster of 3 VMs.]

If your shared PostgreSQL cluster goes down, all your PostgreSQL database instances go down.

Beware of the bad neighborhood

[Diagram: service instances of many apps spread across the three VMs of one shared PostgreSQL cluster.]

Shared clusters are vulnerable to bad neighbors

Dedicated

Dedicated PostgreSQL instances: a good idea. n x service instances like my-single-postgres-1 (VM#1) and/or m x service instances like my-3node-postgres-cluster-2 (VM#1, VM#2, VM#3). Service instance = dedicated VM or dedicated cluster of VMs. Infrastructure isolation is used to enable multi-tenancy support.

[Diagram: on the Cloud Foundry runtime, every service instance has its own VM or its own 3-VM cluster.]

PostgreSQL failures are contained. Only one service instance is affected.

Bad neighborhood protection with dedicated service instances

[Diagram: three PostgreSQL clusters on the Cloud Foundry runtime, each a dedicated 3-VM service instance (#1, #2, #3); infrastructure isolation separates them, so a failure in one cluster does not affect the others.]

Dedicated clusters isolate bad neighbors

Cloud: automation, robustness, self-healing, scalability, on-demand self-service

Design dimensions:
Resource type: VM vs. container
Provisioning: pre-provisioned vs. on-demand provisioning
Failover strategy: resurrection failover vs. standby failover
Data redundancy: single replica vs. data redundancy / multiple replicas
Infrastructure reliability: perfect (HA VMs, no SPOFs, never fails; VMs cost more than a …) vs. design to fail (fails from time to time, saves money)
Automation technology: BOSH vs. Chef vs. Puppet
Service instances: shared vs. dedicated

Desired time to repair: seconds, minutes, hours?
Availability: service instance availability, service broker availability.
Configurability: adapt to local network and security policies; integrate existing infrastructure.
Accessibility: remote log-in to service instances.
Performance: service broker performance (ops/s), service instance performance, time to provision a service instance.
Security: network security, encryption.
Transparency: access to metrics and logs.
Operability: ease of operation and maintenance.

1. Define a service instance!
2. Define the total number of service instances!
3. Define the number of service instance CRUD ops / min!
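Once these numbers are defined, Little's law (average number in flight = arrival rate times average duration) gives a first estimate of how many deployments run at the same time, which is what drives the BOSH queue time seen in the perf tests. This Python sketch is an illustration under that assumption: roughly 7 minutes per provisioning is taken from the perf-test table, while the request rates and the number of parallel workers are hypothetical.

```python
def concurrent_deployments(ops_per_min: float, minutes_per_op: float) -> float:
    """Little's law, L = lambda * W: average number of operations in flight."""
    return ops_per_min * minutes_per_op


def queue_grows(ops_per_min: float, minutes_per_op: float, parallel_workers: int) -> bool:
    """True if requests arrive faster than `parallel_workers` can finish them on average."""
    return concurrent_deployments(ops_per_min, minutes_per_op) > parallel_workers


if __name__ == "__main__":
    # Hypothetical: 2 create-service requests per minute, about 7 minutes each.
    print(concurrent_deployments(2, 7))  # average deployments in flight
    print(queue_grows(2, 7, parallel_workers=5))
```

If the estimate exceeds what the automation can run in parallel, the queue keeps growing, which is why scaling BOSH matters for simultaneous provisioning.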
