Beyond 1001 Dedicated Data Service Instances
Introduction
The Challenge
Given: an application platform based on Cloud Foundry serving thousands of apps
Application Runtime
Many platform users who don't know each other
Different app languages & frameworks
100% on-demand self-service: no involvement of the platform operator necessary
Instant scalability & self-healing
Easy Deployment $> cf push myapp
Runtime abstraction
Java code + Java buildpack → staging → Java droplet → execution
Ruby code + Ruby buildpack → staging → Ruby droplet → execution
To the Cloud Foundry runtime, all droplets are equal: a droplet is a container image with a $START_CMD, something you can execute in a container.
Abstraction enables further assumptions & automation
Scaling Apps: the Cloud Foundry runtime runs App#1 Instance#1, App#2 Instance#1 and App#2 Instance#2. Assume this to be our status quo.
App Scalability $> cf scale -i 3 app#1
Scaling Apps: the Cloud Foundry runtime now runs App#1 Instance#1, #2 and #3 next to App#2 Instance#1 and #2. Two additional instances have been created.
App Self-Healing
1. Everything is healthy: App#1 Instances #1 to #3 and App#2 Instances #1 and #2 are running.
2. App#1 Instance#2 is failing.
3. App#1 Instance#2 is gone temporarily.
4. App#1 Instance#2 is re-created.
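Both scaling and self-healing follow from the runtime comparing a desired state with the actual state. A minimal sketch of such a reconciliation loop, assuming a toy in-memory model; the `Runtime` class and its methods are hypothetical and are not Cloud Foundry's implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """Toy runtime: desired instance counts per app vs. actually running instance indices."""
    desired: dict[str, int] = field(default_factory=dict)
    running: dict[str, set[int]] = field(default_factory=dict)

    def scale(self, app: str, instances: int) -> None:
        # Like `cf scale`, this only changes the desired state.
        self.desired[app] = instances

    def crash(self, app: str, index: int) -> None:
        # Simulate a failing instance that disappears.
        self.running.get(app, set()).discard(index)

    def reconcile(self) -> list[str]:
        """Start missing and stop surplus instances; return the actions taken."""
        actions = []
        for app, count in self.desired.items():
            current = self.running.setdefault(app, set())
            for index in range(1, count + 1):
                if index not in current:
                    current.add(index)
                    actions.append(f"start {app} instance #{index}")
            for index in sorted(i for i in current if i > count):
                current.discard(index)
                actions.append(f"stop {app} instance #{index}")
        return actions
```

Scaling only changes the desired count. Self-healing needs no special code path: a crashed instance is simply missing from the actual state on the next reconciliation, which works only because the instance holds no state that must be recovered.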
What does paradise look like for backing services?
Missing: A solution to serve thousands of data services
Application Runtime Data Services
The Mission
Providing a growing number of data services (a9s PostgreSQL, a9s MongoDB, a9s RabbitMQ, a9s Elasticsearch, a9s Redis, a9s LogMe) with full lifecycle automation of thousands of data service instances across a wide range of infrastructures, in both public and on-premise clouds, and integrating well with multiple platforms.
Requirements
Requirements: Portability, Security, Usability, Scalability, Performance, Maintainability, Robustness, Manageability, Flexibility, On-demand self-service, Extensibility, Multi-tenancy
Key requirements: Portability, Scalability, Production-Readiness, On-demand self-service
Design
How to build it?
Three layers: Data Service Provisioning API, Automation Middleware, Data Service Automation
Open Service Broker API: a new industry standard for data service provisioning.
Open Service Broker API supporters: Google, Pivotal, IBM, Red Hat, Fujitsu, SAP
Supporting platforms: Cloud Foundry, OpenShift, Kubernetes, more to come
Open Service Broker API overview (http://docs.cloudfoundry.org/services/api.html#api-overview):
GET /v2/catalog (get service catalog): deliver metadata about the data service.
PUT /v2/service_instances/:id (provision service, i.e. create service instance): provision a VM or VM cluster representing the service instance; install and configure the data service.
PUT /v2/service_instances/:instance_id/service_bindings/:id (bind service, i.e. create service binding): create a data service user and return credentials representing the service binding.
DELETE /v2/service_instances/:instance_id/service_bindings/:id (unbind service, i.e. delete service binding): remove the credentials associated with the service binding.
DELETE /v2/service_instances/:id (unprovision service, i.e. delete service instance): destroy the VMs and data associated with the service instance.
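A broker only has to serve these five endpoints. The following sketch routes them to handlers of a hypothetical in-memory broker. It records state instead of provisioning VMs, ignores request parameters, and omits authentication, the `X-Broker-API-Version` header and asynchronous operations; the catalog IDs and descriptions are placeholders.

```python
import re
import secrets

INSTANCE = r"/v2/service_instances/(?P<instance_id>[^/]+)"
BINDING = INSTANCE + r"/service_bindings/(?P<binding_id>[^/]+)"

# (HTTP verb, path pattern, handler method name), mirroring the endpoint list above.
ROUTES = [
    ("GET", re.compile(r"^/v2/catalog$"), "catalog"),
    ("PUT", re.compile("^" + INSTANCE + "$"), "provision"),
    ("DELETE", re.compile("^" + INSTANCE + "$"), "deprovision"),
    ("PUT", re.compile("^" + BINDING + "$"), "bind"),
    ("DELETE", re.compile("^" + BINDING + "$"), "unbind"),
]


class InMemoryBroker:
    """Hypothetical broker that records state instead of deploying VMs."""

    def __init__(self) -> None:
        self.instances: dict[str, dict[str, dict]] = {}  # instance_id -> {binding_id: credentials}

    def handle(self, method: str, path: str) -> tuple[int, dict]:
        for verb, pattern, handler in ROUTES:
            match = pattern.match(path)
            if verb == method and match:
                return getattr(self, handler)(**match.groupdict())
        return 404, {}

    def catalog(self) -> tuple[int, dict]:
        plans = [{"id": "plan-single-small", "name": "single-small", "description": "1 VM"},
                 {"id": "plan-cluster-small", "name": "cluster-small", "description": "3 VMs"}]
        return 200, {"services": [{"id": "svc-mongodb", "name": "mongodb", "description": "MongoDB",
                                   "bindable": True, "plans": plans}]}

    def provision(self, instance_id: str) -> tuple[int, dict]:
        if instance_id in self.instances:
            return 200, {}                      # already provisioned
        self.instances[instance_id] = {}        # real broker: create VM(s), install, configure
        return 201, {}

    def deprovision(self, instance_id: str) -> tuple[int, dict]:
        if self.instances.pop(instance_id, None) is None:
            return 410, {}                      # Gone: no such instance
        return 200, {}                          # real broker: destroy VMs and data

    def bind(self, instance_id: str, binding_id: str) -> tuple[int, dict]:
        if instance_id not in self.instances:
            return 404, {}                      # status chosen for this sketch
        credentials = {"username": binding_id, "password": secrets.token_urlsafe(16)}
        self.instances[instance_id][binding_id] = credentials  # real broker: create a DB user
        return 201, {"credentials": credentials}

    def unbind(self, instance_id: str, binding_id: str) -> tuple[int, dict]:
        if self.instances.get(instance_id, {}).pop(binding_id, None) is None:
            return 410, {}
        return 200, {}                          # real broker: drop the DB user
```

In the on-demand pattern described next, only `provision` and `deprovision` grow heavy, because they create and destroy whole VMs or VM clusters; binding stays a cheap credential operation.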
The Open Service Broker API does not define what a service instance is.
Applying the design pattern: On-Demand Provisioning of Dedicated Data Service Instances
Result
Using a Service Broker with Cloud Foundry $> cf create-service
Easy Deployment $> cf create-service mongodb single-small my-single-mongo-1
Result: service instance my-single-mongo-1 is backed by a single MongoDB VM (VM#1).
Easy Deployment $> cf create-service mongodb cluster-small my-3node-mongo-cluster-2
Newly created service instance: my-3node-mongo-cluster-2, backed by three MongoDB VMs (VM#1, VM#2, VM#3), next to my-single-mongo-1 with its MongoDB VM#1.
Technical Challenges
State
State is handled differently in each backing service. The operational model will be different: replication, failure detection, failover. Hence the data service automation will be different, too.
Where to store state?
Recap of app self-healing: App#1 Instance#2 fails, is gone temporarily, and is re-created, while all other instances stay healthy.
App self-healing is easy because there is NO STATE. How can we store state and still be able to perform self-healing?
Store state on a remotely attached block device: a persistent disk.
Figure: an Infrastructure as a Service (IaaS), e.g. OpenStack. Behind the IaaS API, a virtual datacenter contains a router and storage nodes with HDDs; a storage volume from the storage nodes is attached to a virtual machine running the operating system.
A persistent disk carries a file system, and file systems may fail: replication/clustering and backups are still very important.
The data lifecycle has been decoupled from the VM lifecycle: the VM becomes disposable.
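A toy model of this decoupling: a VM is replaced while its persistent disk survives. The classes and the `recreate` function are hypothetical; in a BOSH deployment these steps are carried out against the IaaS through the CPI.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_next_vm_id = itertools.count(1)  # hypothetical VM id generator


@dataclass
class PersistentDisk:
    size_gb: int
    files: dict[str, str] = field(default_factory=dict)  # stands in for the file system


@dataclass
class VM:
    vm_id: int = field(default_factory=lambda: next(_next_vm_id))
    disk: Optional[PersistentDisk] = None


def recreate(old_vm: VM) -> VM:
    """Replace a VM (failed, outdated OS, etc.) while keeping its persistent disk."""
    disk, old_vm.disk = old_vm.disk, None  # detach the disk from the old VM
    new_vm = VM()                          # the old VM can now be thrown away
    new_vm.disk = disk                     # attach the surviving disk
    return new_vm
```

Because nothing of value lives on the VM itself, the same `recreate` step serves failure recovery and OS updates alike.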
Storing state
What needs to be automated?
Data Service Instance Lifecycle
Lifecycle of a Data Service Instance
1. Provision a data service server
2. Install data service software
3. Configure data service software
4. Consume data service with apps
5. Debug data service issues
6. Update data service version
7. Update operating system
8. Backup & recover data
9. Scale out data service VM(s)
10. Destroy data service & DB VM(s)
Can you do that x * 1000 times?
You either automate it or delegate it to the app developer.
Automation
BOSH
BOSH lets you orchestrate the lifecycle of large-scale deployments of stateful distributed systems onto infrastructure.
Figure: $> bosh deploy. The BOSH CLI talks to the BOSH API of the BOSH director, which uses its CPI to call the IaaS API (e.g. OpenStack). The IaaS creates a virtual machine in the virtual datacenter and a storage volume on its storage nodes. The VM runs an operating system with a BOSH Agent, which installs the data service, e.g. PostgreSQL. The same mechanism deploys further VMs, e.g. Cloud Controller and UAA.
BOSH Automation
BOSH Releases contain the automation
A BOSH deployment depends on 1..* stemcells.
A BOSH deployment is described by a release and a manifest.
A release describes 1..* jobs.
A release contains 1..* packages.
A BOSH deployment's settings are contained in a manifest.
Infrastructure settings are contained in the cloud config.
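To make these relationships concrete, here is a sketch of a BOSH v2-style deployment manifest written as a Python dictionary (in practice it is a YAML file), together with a small consistency check: every job must come from a declared release, and every instance group must use a declared stemcell alias. All names and values are placeholders; `vm_type`, `persistent_disk_type`, `azs` and `networks` refer to entries that would have to exist in the cloud config.

```python
manifest = {
    "name": "postgresql-instance-1",  # placeholder deployment name
    "releases": [{"name": "postgresql", "version": "latest"}],
    "stemcells": [{"alias": "default", "os": "ubuntu-xenial", "version": "latest"}],
    "update": {"canaries": 1, "max_in_flight": 1,
               "canary_watch_time": "30000-60000", "update_watch_time": "30000-60000"},
    "instance_groups": [{
        "name": "postgresql",
        "instances": 1,
        "azs": ["z1"],                                    # defined in the cloud config
        "jobs": [{"name": "postgres", "release": "postgresql", "properties": {}}],
        "vm_type": "small",                               # defined in the cloud config
        "persistent_disk_type": "10GB",                   # defined in the cloud config
        "stemcell": "default",
        "networks": [{"name": "default"}],                # defined in the cloud config
    }],
}


def check_references(manifest: dict) -> list[str]:
    """Return dangling release/stemcell references (an empty list means consistent)."""
    releases = {r["name"] for r in manifest["releases"]}
    stemcells = {s["alias"] for s in manifest["stemcells"]}
    problems = []
    for group in manifest["instance_groups"]:
        if group["stemcell"] not in stemcells:
            problems.append(f"{group['name']}: unknown stemcell {group['stemcell']}")
        for job in group["jobs"]:
            if job["release"] not in releases:
                problems.append(f"{group['name']}/{job['name']}: unknown release {job['release']}")
    return problems
```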
BOSH makes your deployments
Infrastructure Independent
A BOSH release contains the main automation (software packages, how to run processes). BOSH releases can be re-used on every* infrastructure.
Automate once, deploy everywhere.
Figure: one BOSH CLI and three BOSH directors, on VMware, AWS and OpenStack. The CLI targets a director ($> bosh target http://bosh-on.aws.com) and deploys ($> bosh deploy); each director creates VMs, each running some service/app and a BOSH Agent.
Switching a deployment between clouds:
- Keep the same release.
- Use a stemcell specific to the new cloud.
- Adapt the cloud config.
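The cloud config is what absorbs infrastructure differences. A sketch, assuming the manifest refers only to an abstract `vm_type` named `small` while each director's cloud config maps that name to IaaS-specific `cloud_properties`. The property keys follow the AWS and OpenStack CPIs (`instance_type`) and the vSphere CPI (`cpu`, `ram` and `disk` in MB); the concrete values are examples only, and `resolve_vm_type` is hypothetical.

```python
# Abstract VM type referenced by the deployment manifest; it is the same on every cloud.
MANIFEST_VM_TYPE = "small"

# One cloud config per BOSH director (example values, not sizing advice).
CLOUD_CONFIGS = {
    "aws": {"vm_types": [{"name": "small", "cloud_properties": {"instance_type": "t2.small"}}]},
    "openstack": {"vm_types": [{"name": "small", "cloud_properties": {"instance_type": "m1.small"}}]},
    "vsphere": {"vm_types": [{"name": "small",
                              "cloud_properties": {"cpu": 1, "ram": 2048, "disk": 10240}}]},
}


def resolve_vm_type(iaas: str, vm_type: str = MANIFEST_VM_TYPE) -> dict:
    """Return the IaaS-specific cloud_properties behind an abstract vm_type."""
    for entry in CLOUD_CONFIGS[iaas]["vm_types"]:
        if entry["name"] == vm_type:
            return entry["cloud_properties"]
    raise KeyError(f"vm_type {vm_type!r} is not defined in the {iaas} cloud config")
```

Moving the deployment to another cloud then means uploading that cloud's stemcell and cloud config; the release and the abstract names in the manifest stay as they are.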
Operating System Independent
A BOSH release does not depend on the OS
The only dependency on the OS is the BOSH stemcell.
Stemcell = OS image + BOSH agent. A virtual machine runs the operating system image together with the BOSH Agent, e.g. an Ubuntu stemcell.
Changing the OS of a BOSH-deployed system:
- Keep the same release.
- Change the stemcell.
- Change the manifest.
Scalable
Horizontal Scaling
Figure: horizontal scaling. More VMs are added, each running the service and a BOSH Agent.
Scaling out a BOSH-deployed system:
- Keep the same release.
- Use the same stemcell.
- Change the manifest.
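Scaling out and changing the OS both touch only the manifest (plus, for the OS change, the uploaded stemcell). A sketch of the two changes as pure functions over a minimal, hypothetical manifest dictionary using BOSH v2 manifest keys; note that `releases` is never modified:

```python
import copy


def scale_out(manifest: dict, group: str, instances: int) -> dict:
    """Return a copy of the manifest with a new instance count for one instance group."""
    new = copy.deepcopy(manifest)
    for instance_group in new["instance_groups"]:
        if instance_group["name"] == group:
            instance_group["instances"] = instances
    return new


def change_os(manifest: dict, os_name: str, version: str = "latest") -> dict:
    """Return a copy of the manifest whose stemcells point at another OS."""
    new = copy.deepcopy(manifest)
    for stemcell in new["stemcells"]:
        stemcell.update(os=os_name, version=version)
    return new


# Minimal, hypothetical manifest (most required keys omitted for brevity).
base = {
    "releases": [{"name": "postgresql", "version": "latest"}],
    "stemcells": [{"alias": "default", "os": "ubuntu-trusty", "version": "latest"}],
    "instance_groups": [{"name": "postgresql", "instances": 1, "stemcell": "default"}],
}
```

Returning copies keeps the previous manifest around, which makes it easy to diff the two before running `bosh deploy`.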
Vertical Scaling
1. A VM with 4 GB RAM and 1 vCPU runs PostgreSQL; its data lives on a 10 GB persistent disk.
2. The persistent disk is detached and the VM is replaced by one with 8 GB RAM and 2 vCPUs.
3. The old 10 GB disk and a new 20 GB persistent disk are both attached to the new VM, and the data is copied to the new disk.
4. The old disk is removed: PostgreSQL now runs with 8 GB RAM and 2 vCPUs, with its data on the 20 GB persistent disk.
BOSH Deployments are Predictable
Source code is compiled on freshly created VMs. VMs always contain exactly the software specified in the release. There are no left-overs from prior deployments, as new VMs are used.
BOSH Deployments are Repeatable
Executing a specific BOSH deployment always leads to the exact same deployed system.
Monitored & Self-Healing
Self-healing process failures
Figure: a BOSH installation (BOSH Director, BOSH Health Monitor, CPI, NATS message bus, BOSH Registry, blob store) manages VMs on the infrastructure. Each VM runs a BOSH Agent, a process monitor and a job process. When a process fails, the process monitor on that VM restarts it.
Self-healing process monitor failures
Figure: the process monitor on one VM fails, is gone temporarily, and is restored, so the processes on that VM are supervised again.
Self-healing VM failures
Figure: one of the managed VMs fails; the BOSH Health Monitor detects this and BOSH re-creates the VM through the CPI.
Self-healing BOSH Agent failures
Figure: the BOSH Agent on a VM stops responding. Since agents send heartbeats to the Health Monitor over NATS, the unresponsive agent is detected and BOSH recovers the VM.
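The detection half of VM and agent self-healing can be sketched as a heartbeat check: each agent reports periodically, and an agent that has been silent for longer than a timeout is treated as unresponsive, so its VM is handed to a re-creation callback. This is a simplified model of the BOSH Health Monitor and its resurrector, not their code; the class, the 60-second timeout and the callback are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HealthMonitor:
    """Toy heartbeat tracker; timestamps (seconds) are passed in explicitly."""
    timeout_s: float = 60.0                                    # assumed threshold
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, agent_id: str, now: float) -> None:
        self.last_seen[agent_id] = now

    def unresponsive(self, now: float) -> list[str]:
        return sorted(a for a, seen in self.last_seen.items() if now - seen > self.timeout_s)

    def resurrect(self, now: float, recreate: Callable[[str], None]) -> list[str]:
        """Call recreate(agent_id) for every unresponsive agent and return their ids."""
        victims = self.unresponsive(now)
        for agent_id in victims:
            recreate(agent_id)                # e.g. ask the director to re-create the VM
            self.last_seen[agent_id] = now    # grace period for the replacement VM
        return victims
```

A silent agent and a dead VM look the same from the monitor's side, which is why both failure cases end in re-creating the VM; the persistent disk keeps the data across that step.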
Reference Architecture
Figure: CF Client → Cloud Controller → a9s Service Broker, consisting of a Cloud Foundry adapter and a middleware adapter.
- create service: the broker asks the a9s Deployer to create a deployment from template xy with attributes { }. The a9s Deployer keeps templates and deployments and tells BOSH to deploy release abc with deployment manifest xyz; BOSH executes the deployment.
- create binding: the service-specific SPI (here the a9s PostgreSQL SPI) creates service-specific credentials.
Resulting service instances: my-single-postgres-1 (VM#1), my-3node-postgres-cluster-2 (VM#1, VM#2, VM#3), my-3node-postgres-cluster-3 (VM#1, VM#2, VM#3).
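The step "create deployment from template xy with attributes { }" can be sketched as filling a plan-specific template with per-instance attributes. The a9s Deployer's real template format is not shown here, so the template layout, plan names and the `deployment_name`/`release_version` attributes below are hypothetical:

```python
import copy

# Hypothetical plan templates; only deployment-specific values are filled in later.
TEMPLATES = {
    "single-small": {"instance_groups": [{"name": "postgresql", "instances": 1,
                                          "vm_type": "small", "persistent_disk_type": "10GB"}]},
    "cluster-small": {"instance_groups": [{"name": "postgresql", "instances": 3,
                                           "vm_type": "small", "persistent_disk_type": "10GB"}]},
}


def render_deployment(plan: str, attributes: dict) -> dict:
    """Fill the template of `plan` with per-instance attributes, returning a manifest dict."""
    manifest = copy.deepcopy(TEMPLATES[plan])  # never mutate the shared template
    manifest["name"] = attributes["deployment_name"]
    manifest["releases"] = [{"name": "postgresql", "version": attributes["release_version"]}]
    return manifest
```

In this sketch every service instance becomes a deployment of its own, which is what gives each instance dedicated VMs; the rendered manifest would then be handed to BOSH for execution.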
Portability: BOSH provides multi-infrastructure support; the Open Service Broker API provides multi-platform support.
Scalability: on-demand provisioning of dedicated service instances; BOSH scales existing service instances vertically, for solo and clustered instances.
Production Readiness: dedicated data service instances with strong instance isolation; BOSH: self-healing, clustered service instances, backup & restore.
On-demand self-service: Open Service Broker API, on-demand provisioning, on-demand updates, on-demand backup & restore.
a9s PostgreSQL, a9s MongoDB, a9s RabbitMQ, a9s Elasticsearch, a9s Redis, a9s LogMe
Operations
Continuous Data Service Delivery
Delivering data service patches:
1. Building new data service releases: the open source upstream release (e.g. PostgreSQL) is built into an a9s release (e.g. a9s PostgreSQL) and tested.
2. Updating the data services: the new a9s release is rolled out to Platform #1, Platform #2, up to Platform #n, where the data service instances are updated.
Common Maintenance Tasks to Be Performed at Scale
Create service instance: create VM; install and start the data service.
Vertical scale service instance: destroy old VM; create new VM; mount old persistent disk; create and mount new persistent disk; copy data; optionally reintegrate into the cluster.
OS update: destroy old VM; create new VM based on the new stemcell (with the new OS version); attach the persistent disk.
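The vertical-scaling steps can be traced in a toy model that keeps data on persistent disks and treats VMs as disposable. All names are hypothetical; in a BOSH deployment the director performs the equivalent steps when the manifest asks for a bigger VM type or persistent disk.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    size_gb: int
    data: dict[str, bytes] = field(default_factory=dict)


@dataclass
class VM:
    ram_gb: int
    vcpus: int
    disks: list[Disk] = field(default_factory=list)


def vertical_scale(old_vm: VM, ram_gb: int, vcpus: int, disk_gb: int) -> tuple[VM, list[str]]:
    """Replace `old_vm` with a bigger VM, migrating data to a new disk if the size changes."""
    log = []
    old_disk = old_vm.disks.pop()                # detach the persistent disk
    log.append("destroy old VM")
    new_vm = VM(ram_gb=ram_gb, vcpus=vcpus)
    log.append("create new VM")
    new_vm.disks.append(old_disk)
    log.append("mount old persistent disk")
    if disk_gb != old_disk.size_gb:
        new_disk = Disk(size_gb=disk_gb)
        new_vm.disks.append(new_disk)
        log.append("create and mount new persistent disk")
        new_disk.data = dict(old_disk.data)      # copy; the old disk stays intact until done
        log.append("copy data")
        new_vm.disks.remove(old_disk)
        log.append("detach old persistent disk")
    # Clustered instances would now reintegrate the node into the cluster (not modelled).
    return new_vm, log
```

If only RAM and vCPUs change, the old disk is simply re-attached and no data is copied, which is also exactly the shape of the OS update.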
Ultimate Question
Can you handle more than 1001 Data Service Instances?
Yes.
Excerpt from our perf tests:
Provisioning 1001 instances in sequence over a longer period of time does not expose any significant bottleneck.
For large, heavily used platforms, the number of simultaneous deployments may become relevant.
Data service instances | BOSH queue time | Avg. time to provision a VM | Total time needed to create the instances
250 | 14 min | 6:57 min | 21 min
500 | 29 min | 7:01 min | 36 min
750 | 46 min | 7:13 min | 53 min
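The rows are consistent with total time ≈ BOSH queue time + average VM provisioning time, while the average provisioning time itself barely changes. The queue time per instance works out to about 3.4 s, 3.5 s and 3.7 s, so queueing grows roughly linearly with the number of instances. A small worked check of that arithmetic, using only the numbers from the table:

```python
# (instances, BOSH queue time [min], avg. time to provision a VM [m:ss], total [min]) from the table
ROWS = [(250, 14, "6:57", 21), (500, 29, "7:01", 36), (750, 46, "7:13", 53)]


def to_minutes(mmss: str) -> float:
    """Convert an "m:ss" duration to minutes."""
    minutes, seconds = mmss.split(":")
    return int(minutes) + int(seconds) / 60


def check(rows):
    """Return (instances, queue + provisioning in min, reported total, queue seconds per instance)."""
    return [
        (n, queue + to_minutes(provision), total, queue * 60 / n)
        for n, queue, provision, total in rows
    ]


for n, estimate, total, per_instance in check(ROWS):
    print(f"{n} instances: {estimate:.1f} min estimated vs. {total} min measured, "
          f"{per_instance:.2f} s queue time per instance")
```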
Optimization task: keep BOSH queueing time at an acceptable level.
Scaling BOSH is key to deal with simultaneous provisioning.
Sum Up
- Full lifecycle automation is feasible.
- Open Service Broker API: a new standard.
- Choosing the right automation technology, e.g. BOSH, is key.
- CI/CD-based dev and ops are essential.
Questions?
Common Data Service Design Patterns:
A. Shared VM cluster
B. Dedicated containers
C. Dedicated VMs / VM clusters
Scaling a shared VM cluster
Figure: a shared MongoDB cluster of 3 VMs (MongoDB VM #1, #2, #3). Every service instance is a database on this cluster: Service Instance #1 = database #1, Service Instance #2 = database #2, and so on up to Service Instance #n-max = database #n-max.
Low costs per service instance.
Simple service broker logic:
create service → create a database
create service binding → create a database user
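In pattern A the broker logic really is this small. A sketch with a hypothetical in-memory stand-in for the shared cluster; the comments mark where a real broker would issue the database commands, and the fixed capacity `max_instances` (also hypothetical) is where the cluster eventually runs full:

```python
import secrets


class SharedClusterBroker:
    """Pattern A sketch: one shared cluster, one database per service instance."""

    def __init__(self, max_instances: int) -> None:
        self.max_instances = max_instances              # capacity of the shared cluster
        self.databases: dict[str, dict[str, str]] = {}  # database -> {user: password}

    def create_service(self, instance_id: str) -> None:
        if len(self.databases) >= self.max_instances:
            raise RuntimeError("shared cluster is full")
        self.databases[instance_id] = {}                 # real broker: create a database

    def create_binding(self, instance_id: str, binding_id: str) -> dict[str, str]:
        password = secrets.token_urlsafe(16)
        self.databases[instance_id][binding_id] = password  # real broker: create a database user
        return {"database": instance_id, "username": binding_id, "password": password}
```

All tenants end up as databases inside the same server processes, so isolation is limited to what the data service's own user and database separation provides.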
Weak Isolation!
Structural Limitation!
What to do when the shared cluster is full?
Figure: MongoDB Cluster #1 (3 VMs: MongoDB VM #1 to #3) holds Service Instances #1 to #n-max. When it is full, a second cluster, MongoDB Cluster #2 (3 VMs: MongoDB VM #4 to #6), is added for Service Instances #n+1 to #2*n-max.
The simple service broker logic becomes complex service broker logic.
Fragmentation
Figure: two MongoDB clusters of 3 VMs each after many service instances have been deleted. The remaining instances (e.g. #1, #3, #5 in Cluster #1 and #n+2, #n+3, #2*n-max in Cluster #2) are scattered, leaving gaps. Fragmentation is caused by frequent creation and/or deletion of service instances.
Placement Problem
Figure: a new service instance arrives; which of the fragmented clusters should host it? A strategy to place new service instances is required, and it may require data-service-specific logic.
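Two simple placement strategies, sketched over per-cluster instance counts with a fixed capacity. Both strategies and the `place` function are illustrative assumptions, not a9s logic: first-fit packs instances into existing clusters, least-loaded spreads them. Real strategies may need data-service-specific inputs such as memory or I/O load.

```python
from typing import Optional


def place(load: dict[str, int], capacity: int, strategy: str = "first-fit") -> Optional[str]:
    """Pick a cluster for a new service instance, or None if every cluster is full."""
    candidates = [name for name, count in load.items() if count < capacity]
    if not candidates:
        return None                      # all full: a new cluster has to be provisioned
    if strategy == "first-fit":
        return candidates[0]             # fill clusters in order, keeping later ones free
    if strategy == "least-loaded":
        return min(candidates, key=lambda name: load[name])  # spread the load
    raise ValueError(f"unknown strategy {strategy!r}")
```

First-fit tends to reduce fragmentation but concentrates load; least-loaded evens out load but leaves every cluster partly used, which makes freeing a cluster later harder.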
Cluster Rebalancing
Figure: more service instances are deleted from MongoDB Cluster #2 until only #n+2 and #2*n-max remain. An unbalanced set of clusters wastes infrastructure resources. Moving the remaining instances into Cluster #1 frees Cluster #2 entirely: a cluster rebalance that frees infrastructure resources would be desirable.
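A naive rebalancing plan, assuming every cluster has the same fixed capacity and that instances can be moved freely (in reality, moving an instance means migrating its database). The hypothetical `plan_rebalance` tries to empty the least-used cluster by moving its instances to the clusters with the most free slots:

```python
def plan_rebalance(clusters: dict[str, list[str]], capacity: int) -> dict[str, str]:
    """Return {instance: target cluster} that empties the least-used cluster,
    or {} if its instances do not fit into the free slots of the other clusters."""
    if len(clusters) < 2:
        return {}
    source = min(clusters, key=lambda name: len(clusters[name]))
    free = {name: capacity - len(members) for name, members in clusters.items() if name != source}
    moves = {}
    for instance in clusters[source]:
        target = max(free, key=free.get)  # cluster with the most free slots
        if free[target] <= 0:
            return {}                     # does not fit: keep the current layout
        moves[instance] = target
        free[target] -= 1
    return moves
```

Even this simplified version shows why the logic is data-service specific: the plan is only safe if each move can be executed as a live data migration without downtime.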
Shared Cluster Conclusion
Scalability issues can be addressed. Isolation issues are heavily data-service specific, so a generic solution is not possible.
Scaling Dedicated Containers
Better Isolation
Figure: a PostgreSQL cell of 2 VMs across 2 AZs (Docker host VM #1 and #2). Each service instance = 2 Docker containers + 2 PostgreSQL processes + 2 PostgreSQL databases, asynchronously replicated. Service Instance #1 and Service Instance #2 share the cell.
How to scale?
Structural Limitation!
What to do when the cell/cluster is full?
Figure: a second cell is added. PostgreSQL Cell #1 (Docker host VM #1 and #2) and PostgreSQL Cell #2 (Docker host VM #3 and #4), each 2 VMs across 2 AZs, host service instances of 2 Docker containers + 2 PostgreSQL processes each.
Same Service Broker Challenge
New challenge: how to add cell VMs on demand?
On-Demand VM provisioning is unavoidable.
Why not delegate most challenges?
On-Demand Dedicated VMs and Clusters
Architecture
Figure: CF Client → Cloud Controller → a9s Service Broker (Cloud Foundry adapter, middleware adapter).
- create service: the a9s Deployer creates a deployment from template xy with attributes { } and has BOSH deploy release abc with deployment manifest xyz; BOSH executes the deployments.
- create binding: the a9s MongoDB SPI creates service-specific credentials.
Resulting service instances: my-single-mongodb-1 (MongoDB VM#1), my-3node-mongodb-cluster-2 and my-3node-mongodb-cluster-3 (MongoDB VM#1, VM#2, VM#3 each).
Let BOSH do the VM orchestration!
Let the infrastructure solve the placement and fragmentation challenge!
Shared Data Services
Shared PostgreSQL Cluster > Bad idea. Either 1x single PostgreSQL server on 1 VM (VM#1) hosting Service Instances 1, 2 and 3, OR 1x PostgreSQL cluster of 3 VMs (VM#1, VM#2, VM#3), each VM holding Service Instances 1, 2 and 3. Single VM or single cluster of VMs; single PostgreSQL server or single PostgreSQL cluster; isolation limited to PostgreSQL multitenancy capabilities.
Shared PostgreSQL = SPOF
Cloud Foundry Runtime: many apps bound to service instances that all live on one shared PostgreSQL cluster (3 VMs: VM#1, VM#2, VM#3).
If your shared PostgreSQL cluster goes down, all your PostgreSQL database instances go down.
Beware of bad neighborhood
Cloud Foundry Runtime: one PostgreSQL cluster (3 VMs: VM#1, VM#2, VM#3) hosting many service instances side by side, so every instance shares the same VMs and PostgreSQL processes with its neighbors.
Shared clusters are vulnerable to bad neighbors
Dedicated
Dedicated PostgreSQL instances > Good idea. n x single-VM service instances (e.g. my-single-postgres-1 on VM#1) and / or m x clustered service instances (e.g. my-3node-postgres-cluster-2 on VM#1, VM#2, VM#3). Service instance = dedicated VM or dedicated cluster of VMs. Uses infrastructure isolation to enable multi-tenancy support.
Cloud Foundry Runtime: every service instance runs on its own dedicated VM or its own dedicated cluster of VMs (VM#1, VM#2, VM#3).
PostgreSQL failures are contained: only one service instance is affected.
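The difference in blast radius between a shared cluster and dedicated instances can be expressed as a lookup over a placement map. A small sketch, assuming nine service instances and hypothetical cluster names: in the shared model one cluster failure takes down every instance, in the dedicated model only the instance that owns the failed cluster.

```python
from typing import Dict, List


def affected_instances(placement: Dict[str, str], failed_cluster: str) -> List[str]:
    """Service instances that go down together with one failed PostgreSQL cluster."""
    return sorted(si for si, cluster in placement.items() if cluster == failed_cluster)


# Shared model: all 9 service instances live on the same PostgreSQL cluster.
shared = {f"si-{i}": "pg-shared" for i in range(1, 10)}
# Dedicated model: every service instance is its own PostgreSQL cluster.
dedicated = {f"si-{i}": f"pg-si-{i}" for i in range(1, 10)}

print(len(affected_instances(shared, "pg-shared")))  # 9
print(affected_instances(dedicated, "pg-si-2"))      # ['si-2']
```

Every app bound to an affected instance is hit as well, which is why the shared model turns one database failure into a platform-wide incident.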
Bad neighborhood protection with dedicated service instances
Cloud Foundry Runtime: three PostgreSQL clusters, each a dedicated service instance (#1, #2, #3) running on its own 3 VMs (VM#1, VM#2, VM#3). Infrastructure isolation keeps a problem on one VM of Service Instance #2 from spreading to Service Instances #1 and #3.
Dedicated clusters isolate bad neighbors
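To make the bad-neighbor argument concrete, here is a deliberately simple model, assuming a shared cluster splits its capacity among tenants in proportion to their demand once it is overloaded, while a dedicated instance is capped only by its own capacity. Real databases do not divide resources this neatly; the numbers, units and function names are purely illustrative.

```python
from typing import Dict


def shared_allocation(capacity: float, demand: Dict[str, float]) -> Dict[str, float]:
    """One shared cluster: when overloaded, every tenant is scaled down proportionally."""
    total = sum(demand.values())
    if total <= capacity:
        return dict(demand)
    scale = capacity / total
    return {tenant: d * scale for tenant, d in demand.items()}


def dedicated_allocation(capacity_each: float, demand: Dict[str, float]) -> Dict[str, float]:
    """Dedicated instances: each tenant is limited only by its own capacity."""
    return {tenant: min(d, capacity_each) for tenant, d in demand.items()}


# Tenant "c" suddenly needs 1000 ops/s (hypothetical numbers).
demand = {"a": 100, "b": 100, "c": 1000}
print(shared_allocation(300, demand))     # {'a': 25.0, 'b': 25.0, 'c': 250.0}
print(dedicated_allocation(100, demand))  # {'a': 100, 'b': 100, 'c': 100}
```

In the shared case tenants "a" and "b" lose three quarters of their throughput although their own demand did not change; in the dedicated case only "c" hits a limit.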
Cloud Automation Robustness Self-Healing Scalability On-demand self-service
Resource Type: VM vs. Container
Provisioning: Pre-provisioned vs. On-demand provisioning
Failover Strategy: Resurrection failover vs. Standby failover
Data Redundancy: Single replica vs. Data redundancy / multiple replicas
Infrastructure Reliability: Perfect (HA VMs, no SPOFs, never fails, VMs cost more than a) vs. Design to fail (fails from time to time, saves money)
Automation Technology: BOSH vs. Chef vs. Puppet
Service Instances: Shared vs. Dedicated
Desired time to repair: Seconds, minutes, hours?
Availability: Service instance availability; service broker availability
Configurability: Adapt to local network and security policies. Integrate existing infrastructure.
Accessibility: Remote log-in to service instances
Performance: Service broker performance (ops/s); service instance performance; time to provision a service instance
Security: Network security; encryption
Transparency: Accessing metrics and logs
Operability: Ease of operation and maintenance
1. Define a service instance!
2. Define the total # of service instances!
3. Define the # of service instance CRUD ops / min!
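These definitions feed directly into capacity arithmetic. A small sketch with hypothetical numbers: the number of shared clusters follows from the total number of service instances and an assumed per-cluster maximum, and Little's law (average work in flight = arrival rate × time in the system) estimates how many provisioning operations run concurrently.

```python
import math


def clusters_needed(total_instances: int, max_per_cluster: int) -> int:
    """Shared clusters required to hold all service instances."""
    return math.ceil(total_instances / max_per_cluster)


def concurrent_provisionings(creates_per_min: float, minutes_per_create: float) -> float:
    """Little's law: average operations in flight = arrival rate x duration."""
    return creates_per_min * minutes_per_create


# Hypothetical figures: 1001 service instances, at most 200 per cluster,
# 2 create operations per minute, 10 minutes to provision a dedicated VM cluster.
print(clusters_needed(1001, 200))        # 6
print(concurrent_provisionings(2, 10))   # 20
```

A slow provisioning path (VMs instead of containers) does not lower throughput by itself, but it raises the number of deployments the broker and BOSH must handle in parallel.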
Common Data Service Design Patterns
A. Shared VM cluster
B. Dedicated containers
C. Dedicated VMs / VM clusters
Scaling a shared VM cluster
Scaling a shared VM cluster: MongoDB Cluster of 3 VMs (MongoDB VM #1, MongoDB VM #2, MongoDB VM #3)
Scaling a shared VM cluster: the MongoDB Cluster of 3 VMs (MongoDB VM #1, #2, #3) hosts Service Instance #1 = database #1, Service Instance #2 = database #2, …, Service Instance #n-max = database #n-max.
Scaling a shared VM cluster: Low costs per service instance
Scaling a shared VM cluster: Simple Service Broker Logic
create service → create a database
create service binding → create a database user
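The simple broker logic of pattern A fits in a few lines. A minimal sketch, assuming an in-memory dictionary stands in for the real calls against the shared cluster; the class name, the `db_`/`u_` naming scheme and the credential fields are hypothetical. Once `max_instances` is reached, the broker has nowhere to go, which is the structural limitation discussed next.

```python
import secrets
from typing import Dict, Set


class SharedClusterBroker:
    """Pattern A sketch: one shared cluster, one database per service instance."""

    def __init__(self, max_instances: int):
        self.max_instances = max_instances
        self.databases: Dict[str, Set[str]] = {}  # instance id -> database users

    def create_service(self, instance_id: str) -> str:
        if len(self.databases) >= self.max_instances:
            raise RuntimeError("shared cluster is full")
        # A real broker would create the database on the shared cluster here.
        self.databases[instance_id] = set()
        return f"db_{instance_id}"

    def create_binding(self, instance_id: str, binding_id: str) -> Dict[str, str]:
        user = f"u_{binding_id}"
        # A real broker would create this user, scoped to one database, here.
        self.databases[instance_id].add(user)
        return {"database": f"db_{instance_id}", "username": user,
                "password": secrets.token_urlsafe(16)}


broker = SharedClusterBroker(max_instances=2)
broker.create_service("1")
print(broker.create_binding("1", "app-a")["username"])  # u_app-a
```

Isolation here is only as strong as the database's user and permission model, which is the "weak isolation" point on the following slide.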
Weak Isolation!
Structural Limitation!
Scaling a shared VM cluster: What to do when the shared cluster is full?
Scaling a shared VM cluster: the MongoDB Cluster of 3 VMs (MongoDB VM #1, #2, #3) is full with Service Instances #1 … #n-max (database #1 … database #n-max); further service instances no longer fit.
Scaling a shared VM cluster: MongoDB Cluster #1 (3 VMs: MongoDB VM #1, #2, #3) holds Service Instances #1 … #n-max (database #1 … database #n-max).
Scaling a shared VM cluster: MongoDB Cluster #1 (3 VMs: MongoDB VM #1, #2, #3) holds Service Instances #1 … #n-max; a second MongoDB Cluster #2 (3 VMs: MongoDB VM #4, #5, #6) takes Service Instances #n+1 … #2*n-max.
Scaling a shared VM cluster: Simple Service Broker Logic
Scaling a shared VM cluster: Complex Service Broker Logic
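Why the logic becomes complex: with more than one cluster, the broker must remember where each service instance lives (so binding and deletion requests reach the right cluster) and must create new clusters when all existing ones are full. A sketch under those assumptions; `provision_cluster` is a hypothetical hook, and "first cluster with room" is just one possible placement choice.

```python
from typing import Callable, Dict, Set


class MultiClusterBroker:
    """Sketch of the extra broker state once there is more than one shared cluster."""

    def __init__(self, max_per_cluster: int, provision_cluster: Callable[[int], str]):
        self.max_per_cluster = max_per_cluster
        self.provision_cluster = provision_cluster  # hypothetical hook, e.g. starts a new 3-VM cluster
        self.clusters: Dict[str, Set[str]] = {}     # cluster -> service instances
        self.location: Dict[str, str] = {}          # service instance -> cluster

    def create_service(self, instance_id: str) -> str:
        for name, members in self.clusters.items():
            if len(members) < self.max_per_cluster:
                break
        else:  # no cluster has room (or there is none yet)
            name = self.provision_cluster(len(self.clusters) + 1)
            self.clusters[name] = set()
        self.clusters[name].add(instance_id)
        self.location[instance_id] = name
        return name

    def delete_service(self, instance_id: str) -> None:
        # Bind and delete requests must be routed to the cluster that holds the instance.
        name = self.location.pop(instance_id)
        self.clusters[name].discard(instance_id)


broker = MultiClusterBroker(2, lambda i: f"mongodb-cluster-{i}")
print([broker.create_service(s) for s in ("1", "2", "3")])
# ['mongodb-cluster-1', 'mongodb-cluster-1', 'mongodb-cluster-2']
```

Deletions leave holes that later creations may or may not fill, which is where fragmentation starts.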
Fragmentation
Fragmentation: MongoDB Cluster #1 (3 VMs) holds Service Instances #1 … #n-max (database #1 … database #n-max); MongoDB Cluster #2 (3 VMs: MongoDB VM #4, #5, #6) holds Service Instances #n+1 … #2*n-max.
Fragmentation: after deletions, Cluster #1 still holds Service Instances #1, #3, #5, …, #n-max and Cluster #2 still holds #n+2, #n+3, …, #2*n-max, leaving gaps in both. Caused by frequent creation and deletion of service instances.
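Fragmentation can be quantified by comparing the clusters actually in use with the clusters that would be needed if the remaining instances were packed tightly. A sketch, assuming capacity is measured purely in instance slots per cluster (a simplification) and using hypothetical load numbers.

```python
import math
from typing import Dict, List


def fragmentation_report(loads: List[int], max_per_cluster: int) -> Dict[str, int]:
    """loads[i] = number of service instances left on cluster i."""
    used = sum(loads)
    in_use = sum(1 for load in loads if load > 0)
    packed = math.ceil(used / max_per_cluster)
    return {
        "instances": used,
        "clusters_in_use": in_use,
        "clusters_if_packed": packed,
        "free_slots": in_use * max_per_cluster - used,
        "reclaimable_clusters": in_use - packed,
    }


# Hypothetical: n-max = 100, deletions left 55 and 40 instances behind.
print(fragmentation_report([55, 40], 100))
# {'instances': 95, 'clusters_in_use': 2, 'clusters_if_packed': 1, 'free_slots': 105, 'reclaimable_clusters': 1}
```

A positive `reclaimable_clusters` means whole clusters (here 3 VMs) are kept running only because the instances are spread out.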
Placement Problem
Placement Problem: MongoDB Cluster #1 (holding #1, #3, #5, …, #n-max) and MongoDB Cluster #2 (holding #n+2, #n+3, …, #2*n-max) both have free slots. Where should a New Service Instance go? A strategy to place new service instances is required and may require data-service-specific logic.
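Two textbook placement strategies, shown on instance counts only: first-fit takes the first cluster with room, best-fit takes the fullest cluster that still has room so that emptier clusters can drain over time. This is a sketch under the assumption that "room" means free instance slots; as the slide notes, a real strategy may need data-service-specific signals such as memory, disk or connection counts. Cluster names and loads are hypothetical.

```python
from typing import Dict, Optional


def first_fit(loads: Dict[str, int], max_per_cluster: int) -> Optional[str]:
    """First cluster (in insertion order) that still has room."""
    for name, load in loads.items():
        if load < max_per_cluster:
            return name
    return None


def best_fit(loads: Dict[str, int], max_per_cluster: int) -> Optional[str]:
    """Fullest cluster that still has room, so emptier clusters can drain."""
    candidates = [(load, name) for name, load in loads.items() if load < max_per_cluster]
    return max(candidates)[1] if candidates else None


loads = {"mongodb-cluster-1": 40, "mongodb-cluster-2": 55}  # hypothetical, n-max = 100
print(first_fit(loads, 100))  # mongodb-cluster-1
print(best_fit(loads, 100))   # mongodb-cluster-2
```

Best-fit tends to reduce fragmentation, but it also concentrates load, so it interacts with the bad-neighbor problem of shared clusters.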
Cluster Rebalancing
Cluster Rebalancing: more deletions leave MongoDB Cluster #1 holding Service Instances #1, #3, #5, …, #n-max and MongoDB Cluster #2 (MongoDB VM #4, #5, #6) holding only a few (#n+2, …, #2*n-max). An unbalanced set of clusters wastes infrastructure resources.
Cluster Rebalancing: the remaining service instances of Cluster #2 (#n+2, #2*n-max) are moved into MongoDB Cluster #1, so Cluster #2 is no longer needed. A cluster rebalance freeing infrastructure resources would be desirable.
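A rebalance can be planned before anything is moved. This sketch assumes capacity is counted in instance slots and uses hypothetical names: it picks the least-loaded cluster as the one to free, assigns each of its instances to the fullest other cluster that still has room, and gives up if the instances do not fit elsewhere.

```python
from typing import Dict, List, Optional, Tuple

Move = Tuple[str, str, str]  # (service instance, from cluster, to cluster)


def plan_drain(clusters: Dict[str, List[str]],
               max_per_cluster: int) -> Optional[Tuple[str, List[Move]]]:
    """Try to empty the least-loaded cluster by moving its instances elsewhere."""
    if len(clusters) < 2:
        return None
    victim = min(clusters, key=lambda c: len(clusters[c]))
    loads = {c: len(members) for c, members in clusters.items() if c != victim}
    moves: List[Move] = []
    for instance in clusters[victim]:
        target = max((c for c in loads if loads[c] < max_per_cluster),
                     key=lambda c: loads[c], default=None)
        if target is None:
            return None  # not enough free slots elsewhere: keep the cluster
        loads[target] += 1
        moves.append((instance, victim, target))
    return victim, moves


clusters = {"mongodb-cluster-1": ["si-1", "si-3", "si-5"],
            "mongodb-cluster-2": ["si-102", "si-200"]}
print(plan_drain(clusters, 5))
# ('mongodb-cluster-2', [('si-102', 'mongodb-cluster-2', 'mongodb-cluster-1'), ('si-200', 'mongodb-cluster-2', 'mongodb-cluster-1')])
print(plan_drain(clusters, 4))  # None
```

Executing each move (copying data, pointing bound apps at the new location) is data-service specific and is deliberately not modeled here.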
Shared Cluster Conclusion
Scalability issues can be addressed.
Isolation issues are heavily data-service specific > A generic solution is not possible.
Scaling Dedicated Containers
Better Isolation
Scaling Dedicated Containers: PostgreSQL Cell of 2 VMs across 2 AZs (Docker host VM #1, Docker host VM #2). Service Instance #1 = 2 Docker containers + 2 PostgreSQL processes + 2 asynchronously replicated PostgreSQL databases. Service Instance #2 = 2 Docker containers + 2 PostgreSQL processes + 2 asynchronously replicated PostgreSQL databases.
How to scale?
Structural Limitation!
Scaling Dedicated Containers: What to do when the Cell/Cluster is full?
Scaling Dedicated Containers: PostgreSQL Cell #1 (2 VMs across 2 AZs: Docker host VM #1, Docker host VM #2) and PostgreSQL Cell #2 (2 VMs across 2 AZs: Docker host VM #3, Docker host VM #4) together host Service Instances #1 to #4, each = 2 Docker containers + 2 PostgreSQL processes.
Same Service Broker Challenge
New Challenge: How to add Cell-VMs on-demand?
On-Demand VM provisioning is unavoidable.
Why not delegate most challenges?
On-Demand Dedicated VMs and Clusters
Architecture
CF Client → Cloud Controller: create service
Cloud Controller → a9s Service Broker (Cloud Foundry Adapter, Middleware Adapter): create service, create binding
a9s Service Broker → a9s MongoDB SPI: create service-specific credentials
a9s Service Broker → a9s Deployer (Templates, Deployments): create deployment from template xy with attributes { }
a9s Deployer → BOSH: deploy release abc & deployment manifest xyz; BOSH executes the deployments
Resulting service instances: my-single-mongodb-1 (MongoDB on VM#1), my-3node-mongodb-cluster-2 (MongoDB on VM#1, VM#2, VM#3), my-3node-mongodb-cluster-3 (MongoDB on VM#1, VM#2, VM#3)
Let BOSH do the VM orchestration!
Let the infrastructure solve the placement and fragmentation challenge!
Shared Service Instances
Shared PostgreSQL Cluster > Bad idea. Either 1x single PostgreSQL server on 1 VM (VM#1) hosting Service Instances 1, 2 and 3, OR 1x PostgreSQL cluster of 3 VMs (VM#1, VM#2, VM#3), each VM holding Service Instances 1, 2 and 3. Single VM or single cluster of VMs; single PostgreSQL server or single PostgreSQL cluster; isolation limited to PostgreSQL multitenancy capabilities.
Shared PostgreSQL = SPOF
Cloud Foundry Runtime: many apps bound to service instances that all live on one shared PostgreSQL cluster (3 VMs: VM#1, VM#2, VM#3).