Istio A modern service mesh Louis Ryan Principal Engineer @ Google @louiscryan
My Google Career HTTP Reverse Proxy HTTP HTTP2 GRPC Reverse Proxy Reverse Proxy HTTP API Proxy HTTP Control Plane HTTP2 GRPC Stubby API Proxy v2 GData Library Server Stubby GRPC (local) Server Server Centralization Performance & Isolation
Cloud Internal & External Convergence HTTP2 GRPC Network distance & bandwidth Protocols Isolation & Reliability Security Concerns Reverse Proxy Control Plane HTTP2 GRPC API Proxy v2 Stubby GRPC (local) Sidecar! Server
Decoupling Velocity Operators & Developers Code & Networking Network Topology & Security Modernization & Architecture
What is a Service Mesh? A network for services, not bytes Observability Resiliency Traffic Control Security Policy Enforcement Zero code change FREE!
What is a Service Mesh? A network for services, not bytes Observability Resiliency Traffic Control Security Policy Enforcement
Weaving the mesh - Sidecars HTTP/1.1, HTTP/2, grpc, TCP with or without TLS HTTP/1.1, HTTP/2, grpc, TCP with or without TLS Internet proxy sidecar sidecar svca svcb Service A Outbound features: Service authentication Load balancing Retry and circuit breaker Fine-grained routing Telemetry Request Tracing Fault Injection Service B Inbound features: Service authentication Authorization Rate limits Load shedding Telemetry Request Tracing Fault Injection External Services
Istio - Putting it all together Control Plane API Pilot Control flow during request processing Mixer Istio-Auth Discovery & Config data to Envoys TLS certs to Envoy Policy checks, telemetry Pod Traffic is transparently intercepted and proxied. App is unaware of Envoy s presence Envoy Envoy svca svcb Service A Service B
Our sidecar of choice - Envoy A C++ based L4/L7 proxy Low memory footprint Battle-tested @ Lyft 100+ services 10,000+ VMs 2M req/s Plus an awesome team willing to work with the community! Goodies: API driven config updates no reloads Zone-aware load balancing w/ failover Traffic routing and splitting Health checks, circuit breakers, timeouts, retry budgets, fault injection, HTTP/2 & grpc Transparent proxying Designed for observability
2. 3. 4. Custom Eureka Consul Environment-specific topology extraction Topology is mapped to a platform-agnostic model. Additional rules are layered onto the model. E.g. retries, traffic splits etc. Configuration is pushed to Envoy and applied without restarts Rules API 1. Kubernetes Modeling the Service Mesh Platform Adapter Abstract Model Pilot Envoy API Service discovery & traffic rules Envoy Envoy Envoy Envoy
What is a Service Mesh? A network for services, not bytes Observability Resiliency Traffic Control Security Policy Enforcement
Visibility Monitoring & tracing should not be an afterthought in the infrastructure Goals Istio - Grafana dashboard w/ Prometheus backend Metrics without instrumenting apps Consistent metrics across fleet Trace flow of requests across services Portable across metric backend providers Istio Zipkin tracing dashboard
Visibility: Metrics Requests Requests Responses Envoy Responses Service Traces Mixer collects metrics emitted by Envoys Adapters in the Mixer normalize and forward to monitoring backends Metrics backend can be swapped at runtime Zipkin Report([ ]attributes) Check(attributes) Mixer Operator Supplied Config Adapters Backends Prometheus Stackdriver GUIs ServiceGraph Example Grafana Statsd Weave Scope
Visibility: Tracing Requests Requests Responses Envoy Responses Service Traces Applications do not have to deal with generating spans or correlating causality Mixer Applications need to forward context headers on outbound calls Envoy sends traces to Mixer Adapters at Mixer send traces to respective backends Report([ ]attributes) Check(attributes) Envoys generate spans Zipkin Operator Supplied Config Adapters Backends Prometheus Stackdriver GUIs ServiceGraph Example Grafana Statsd Weave Scope
What is a Service Mesh? A network for services, not bytes Observability Resiliency Traffic Control Security Control
Resiliency Istio adds fault tolerance to your application without any changes to code // Circuit breakers destination: serviceb.example.cluster.local policy: - tags: version: v1 circuitbreaker: simplecb: maxconnections: 100 httpmaxrequests: 1000 httpmaxrequestsperconnection: 10 httpconsecutiveerrors: 7 sleepwindow: 15m httpdetectioninterval: 5m Resilience features Timeouts Retries with timeout budget Circuit breakers Health checks AZ-aware load balancing w/ automatic failover Control connection pool size and request load Systematic fault injection
What is a Service Mesh? A network for services, not bytes Observability Resiliency & Efficiency Traffic Control Security Policy Enforcement
Traffic Splitting serviceb.example.cluster.local Rules API // A simple traffic splitting rule Pilot Traffic routing rules 99% Envoy Pod Envoy svcb http://serviceb.example 1% svca Envoy Service A svcb Traffic control is decoupled from infrastructure scaling Pod Labels: version: v2.0-alpha, env:us-staging Service B destination: serviceb.example.cluster.local match: source: servicea.example.cluster.local route: - tags: version: v1.5 env: us-prod weight: 99 - tags: version: v2.0-alpha env: us-staging weight: 1 Pod Labels: version: v1.5 env: us-prod
Traffic Steering // Content-based traffic steering rule destination: serviceb.example.cluster.local match: httpheaders: user-agent: regex: ^(.*?;)?(iphone)(;.*)?$ precedence: 2 route: - tags: version: canary Service B version: v1 Pod 1 svca svcb Service A Service B version: v1 Pod 1 oid* ndr t: *A gen r-a Use Use Pod 2 Pod 3 svcb svca Service A Content-based traffic steering Pod 2 Pod 3 r-ag ent: *iph one * Pod 4 svcb version: canary
What is a Service Mesh? A network for services, not bytes Observability Resiliency & Efficiency Traffic Control Security Policy Enforcement
Securing Services Encryption by default Verifiable identity Secure naming / addressing Revocation
Problem: Strong Service Security at Scale Concerns Insiders Hijacked services Microservice attack surface Workload mobility Brittle fine-grained models Securing resources not just endpoints Audit & Compliance Wants Workload mobility Remote admin & development Shared & 3rd party services User & Service identity Lower costs Traditional perimeter security models are insufficient
Istio - Security at Scale spiffe.io
What is a Service Mesh? A network for services, not bytes Observability Resiliency & Efficiency Traffic Control Security Policy Enforcement
Putting it all together Control Plane API Control flow during request processing Pilot Mixer Istio-Auth Discovery & Config data to Envoys TLS certs to Envoy Policy checks, telemetry Pod Envoy Envoy svca svcb Service A Service B
What s Mixer For? Nexus for policy evaluation and telemetry reporting Precondition checking Quotas & Rate Limiting Primary point of extensibility Enabler for platform mobility Operator-focused configuration model
Attributes - The behavioral vocabulary target.service request.size request.time source.ip source.name source.user api.operation = = = = = = = playlist.svc.cluster.local 345 2017-04-12T12:34:56Z 192.168.10.1 music-fe.serving.cluster.local admin@musicstore.cluster.local GetPlaylist
Roadmap Production Readiness Multi-Cloud & Multi-Environment Networking - Extension models, UDP, QUIC, performance,... Moar integrations - ACLs, Telemetry, Audit, Policy,... Security - HSM, Cert & Key stores, federation,... API Management
Thanks! Phew