Nomad
Armon Dadgar @armon
Cluster Manager Scheduler Nomad
Schedulers map a set of work to a set of resources
Work (Input): Web Server - Thread 1, Web Server - Thread 2, Redis - Thread 1, Kernel - Thread 1
CPU Scheduler maps the work onto Resources: CPU - Core 1, CPU - Core 2
Schedulers In the Wild

  Type                       Work               Resources
  CPU Scheduler              Threads            Physical Cores
  AWS EC2 / OpenStack Nova   Virtual Machines   Hypervisors
  Hadoop YARN                MapReduce Jobs     Client Nodes
  Cluster Scheduler          Applications       Servers
Advantages

  Higher Resource Utilization: Bin Packing, Over-Subscription, Job Queueing
  Decouple Work from Resources: Abstraction, API Contracts, Standardization
  Better Quality of Service: Priorities, Resource Isolation, Pre-emption
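To make priorities and job queueing concrete in Nomad's own terms, a job can declare a priority that influences how queued work is dequeued; a minimal sketch (job and task names are illustrative; `priority` ranges 1-100, default 50):

```hcl
# Higher-priority jobs are dequeued ahead of lower-priority work
# when cluster resources are scarce.
job "payments" {
  datacenters = ["us-east-1"]
  priority    = 80

  task "api" {
    driver = "docker"

    config {
      image = "payments:latest"
    }
  }
}
```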
Nomad
Cluster Scheduler Easily Deploy Applications Operationally Simple Built for Scale Nomad
example.nomad

  job "redis" {
    datacenters = ["us-east-1"]

    task "redis" {
      driver = "docker"

      config {
        image = "redis:latest"
      }

      resources {
        cpu    = 500 # MHz
        memory = 256 # MB

        network {
          mbits = 10
          dynamic_ports = ["redis"]
        }
      }
    }
  }
Declares what to run Job Specification
Nomad determines where and manages how to run Job Specification
Nomad abstracts work from resources Job Specification
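The abstraction shows up in how instance counts are declared: the job states how many copies to run, and Nomad decides where. A sketch (names and image are illustrative):

```hcl
# Declare *what* to run and *how many*; Nomad picks the machines.
job "web" {
  datacenters = ["us-east-1"]

  group "frontend" {
    count = 3 # three instances, placed anywhere with capacity

    task "nginx" {
      driver = "docker"

      config {
        image = "nginx:latest"
      }
    }
  }
}
```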
  Containerized: Docker, Rkt, Windows Server Containers
  Virtualized: Qemu / KVM, Xen, Hyper-V
  Standalone: Java Jar, Static Binaries, C#
Declarative Job Specification Infrastructure-As-Code Removes Imperative Logic External Dependencies? Nomad
Service Discovery? Health Monitoring? Application Secrets? Stateful Applications? Nomad
  job "my-app" {
    task "my-app" {
      service {
        port = "http"

        check {
          type     = "http"
          path     = "/health"
          interval = "5s"
        }
      }
    }
  }

example.nomad
The Nomad Server schedules the app onto a Client (a Nomad agent running App 1 … App N); the client registers the service with the local Consul agent, which monitors health and syncs with the Consul Server.
Secret Distribution: API Keys DB Credentials SSL/TLS Certificates Nomad
  job "my-app" {
    task "my-app" {
      env {
        DB_USERPASS = "foo:bar"
      }
    }
  }

example.nomad
Secure secret storage Dynamic secrets Leasing, renewal, and revocation Auditing Rich ACLs Vault Multiple client authentication methods
Login → Vault Token; Vault Token + Operation → Op Response
example.nomad

  job "my-app" {
    task "my-app" {
      env {
        VAULT_TOKEN = "b6a10b96-9060-11e6-9c6f-67a52bc6b8d3"
      }
    }
  }
  job "my-app" {
    task "my-app" {
      vault {
        policies = ["my-app-role"]
      }
    }
  }

example.nomad
Submit Job + Vault Token → Nomad Server (verifies the Vault token, schedules the app) → Client, where Nomad generates and renews Vault tokens for App 1 … App N.
Native Vault Integration No Secrets in Jobs No Secrets on Client Disk Minimize Trust Nomad
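One way secrets reach the task without appearing in the job file or on client disk is Nomad's template stanza, which renders Vault data directly into the task environment. A sketch, assuming a Vault secret at the hypothetical path secret/my-app with a password key:

```hcl
job "my-app" {
  task "my-app" {
    vault {
      policies = ["my-app-role"]
    }

    template {
      # Path "secret/my-app" and key "password" are hypothetical.
      data        = "DB_PASS={{ with secret \"secret/my-app\" }}{{ .Data.password }}{{ end }}"
      destination = "secrets/app.env"
      env         = true # expose rendered key=value pairs as env vars
    }
  }
}
```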
Stateful Applications

  EASY (Stateless): API, Web
  MEDIUM (Stateful): Cache, HDFS, Cassandra, MongoDB
  HARD (Stateful): *SQL
  job "my-app" {
    task "my-app" {
      ephemeral_disk {
        sticky = true
      }
    }
  }

example.nomad
Moves data between tasks on the same machine
Copies data between tasks on different machines
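Following the slides' abbreviated job-spec form, the two behaviors map to two ephemeral_disk attributes: sticky for same-machine moves and migrate for cross-machine copies. A sketch:

```hcl
job "my-app" {
  task "my-app" {
    ephemeral_disk {
      sticky  = true # keep data when replacing the task on the same node
      migrate = true # copy data to the new node on a cross-machine move
      size    = 300  # MB
    }
  }
}
```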
Easily Deploy Apps with Nomad: Declarative Jobs, Flexible Workloads, Consul Integration, Vault Integration, Sticky Volumes
Operationally Simple
Client Server
Built on Experience GOSSIP CONSENSUS
Cluster Management Gossip Based (P2P) Membership Failure Detection Serf Event System
Large Scale Production Hardened Simple Clustering and Federation Serf
Service Discovery Configuration Coordination (Locking) Central Servers + Distributed Clients Consul
Multi-Datacenter Raft Consensus Large Scale Production Hardened Consul
Operational Simplicity: Single Binary No Dependencies Highly Available Nomad
Built for Scale
Built on Experience Mature Libraries Proven Design Patterns GOSSIP CONSENSUS Lacking Scheduling Logic
Built on Research GOSSIP CONSENSUS
Single Region Architecture: clients in DC1, DC2, and DC3 connect to the servers over RPC; three servers (one LEADER, two FOLLOWERs) replicate state, with followers forwarding requests to the leader.
Multi Region Architecture: each region (A, B) runs its own leader and followers with internal replication; regions are joined by gossip, and requests are forwarded across regions as needed.
Region is Isolation Domain 1-N Datacenters Per Region Flexibility to do 1:1 (Consul) Scheduling Boundary Nomad
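A server's region and datacenter are declared in its agent configuration; a minimal sketch (file name and values are illustrative):

```hcl
# server.hcl - one of three servers in region "us"
region     = "us"
datacenter = "dc1"

server {
  enabled          = true
  bootstrap_expect = 3 # wait for three servers before electing a leader
}
```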
Hundreds of regions Tens of thousands of clients per region Thousands of jobs per region
Nomad, Inspired by Google Omega: Optimistic Concurrency, State Coordination, Service & Batch Workloads, Pluggable Architecture
Data Model NODE EVALUATION ALLOCATION JOB
Evaluation ~= State Change
Evaluations are triggered by: Job Create / Update / Delete; Node Up / Node Down; Allocation Failed / Finished
Evaluations

  SCHEDULER: func(evaluation) => []AllocationUpdates
  Scheduler types: Service, Batch, System
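The job's type field selects which of the three schedulers processes its evaluations; a sketch (job and image names are illustrative):

```hcl
# type is "service" (default, long-running), "batch" (run to completion),
# or "system" (one instance per eligible client node).
job "nightly-report" {
  datacenters = ["us-east-1"]
  type        = "batch"

  task "report" {
    driver = "docker"

    config {
      image = "report:latest"
    }
  }
}
```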
External Event → Evaluation Creation → Evaluation Queuing → Evaluation Processing → Optimistic Coordination → State Updates
Omega Architecture Optimistically Schedule 100s of Jobs in Parallel Controls for Correctness Nomad
Nomad Million Container Challenge: 1,000 Jobs × 1,000 Tasks per Job = 1,000,000 Containers on 5,000 GCE Hosts
"640 KB ought to be enough for anybody." - Bill Gates
2nd Largest Hedge Fund 18K Cores 5 Hours 2,200 Containers/second
Cluster Scheduler Easily Deploy Applications Operationally Simple Built for Scale Nomad
Thanks! Q/A