Networking Approaches in a Container World Flavio Castelli Engineering Manager fcastelli@suse.com Rossella Sblendido Engineering Manager rsblendido@suse.com
Disclaimer There are many container engines; I'm going to focus on Docker Multiple networking solutions are available: I'll introduce the core concepts Many projects cover only some of them Container orchestration engines: Tightly coupled with networking I'm going to focus on Docker Swarm and Kubernetes Remember: the container ecosystem moves at a fast pace, things can suddenly change 2
The problem Given: Containers are lightweight Containers are great for microservices Microservices: multiple distributed processes communicating Lots of containers that need to be connected together 3
Single host 4
Reuse the host network container-01 host lo... Container has full access to host's interfaces! 5
Reuse the host network
$ docker run --rm --name container-01 --net=host -ti busybox /bin/sh
/ # ifconfig
docker0   Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::x/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:19888 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19314 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3063342 (2.9 MiB)  TX bytes:29045336 (27.6 MiB)
wlan0     Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:192.168.1.121  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::x/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:135513 errors:0 dropped:0 overruns:0 frame:0
          TX packets:109723 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:102680118 (97.9 MiB)  TX bytes:22766730 (21.7 MiB)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:230 errors:0 dropped:0 overruns:0 frame:0
          TX packets:230 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:37871 (36.9 KiB)  TX bytes:37871 (36.9 KiB)
Warning: the container can see and control all the host interfaces 6
Network bridge container-01 container-02 172.17.0.0/16 docker0 An internal, virtual switch Containers are plugged into that switch Containers on the same bridge can talk to each other Users can create multiple bridges host 7
Network bridge: as seen by the host
$ ifconfig docker0
docker0   Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:a2ff:fe10:ccf7/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:480 (480.0 B)  TX bytes:5025 (5.0 KB)
docker0 is by default at 172.17.0.1
$ ip route
default via 192.168.1.1 dev wlan0 proto static metric 600
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1...
route handling traffic from host to containers 8
How to expose a service container-01 container-02 port 80 172.17.0.0/16 docker0 host port 8080 Port 80 of container-02 is mapped to port 8080 of the host Risk: port exhaustion on the host 9
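The port-exhaustion risk can be shown with a toy sketch: every published service claims one host port, so the host's port range is a finite resource. This is an illustration only, not Docker's internals; the class name and the tiny port range are invented to make exhaustion easy to demonstrate.

```python
class HostPortAllocator:
    """Toy model of publishing container ports onto host ports."""

    def __init__(self, start=8080, end=8083):
        # Real hosts have a far larger ephemeral range; a tiny one
        # makes the failure mode visible immediately.
        self.free = list(range(start, end + 1))
        self.mappings = {}  # host port -> (container, container port)

    def publish(self, container, container_port):
        if not self.free:
            raise RuntimeError("host port range exhausted")
        host_port = self.free.pop(0)
        self.mappings[host_port] = (container, container_port)
        return host_port


alloc = HostPortAllocator()
print(alloc.publish("container-02", 80))  # first mapping lands on 8080
```

With only a handful of free ports, a few more `publish` calls raise `RuntimeError`: the host, not the containers, is the bottleneck.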
Multi-host networking 10
Multi-host networking scenarios container01 frontend network container02 container03 container05 container04 application network container06 database network host-a host-b host-c 11
Multi-host networking scenarios container01 frontend network container02 container03 container05 container04 application network container06 database network a big host-a 12
Multi-host networking scenarios container01 container03 container05 VM-1 frontend network container02 VM-2 container04 application network container06 database network VM-3 a big host-a 13
Routing solutions 14
Routing approach Create a common IP space at container level Assign a /24 subnet to each host Setup IP routes between the hosts Main projects: Calico Flannel Romana 15
Routing approach host-a (192.168.1.2): docker0 at 10.0.9.1, subnet 10.0.9.0/24, container-01 (10.0.9.4) and container-02 (10.0.9.5) Routing rule on host-a: 10.0.10.* goes through 192.168.1.3 host-b (192.168.1.3): docker0 at 10.0.10.1, subnet 10.0.10.0/24, container-03 (10.0.10.8) and container-04 (10.0.10.9) Routing rule on host-b: 10.0.10.* goes through docker0 16
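The per-host routing rules above boil down to a subnet lookup: given a destination container IP, find which subnet owns it and forward accordingly. A minimal sketch with the stdlib `ipaddress` module, using the same addresses as the slide (the table is host-a's view; a real deployment keeps these in the kernel routing table, not a dict):

```python
import ipaddress

# Route table as seen from host-a (192.168.1.2):
# its own /24 is reachable via the local bridge,
# host-b's /24 is reachable via host-b's address.
routes_host_a = {
    ipaddress.ip_network("10.0.9.0/24"):  "docker0",      # local bridge
    ipaddress.ip_network("10.0.10.0/24"): "192.168.1.3",  # via host-b
}

def next_hop(routes, dst):
    """Return the next hop for a destination container IP."""
    dst = ipaddress.ip_address(dst)
    for subnet, hop in routes.items():
        if dst in subnet:
            return hop
    raise LookupError(f"no route to {dst}")

print(next_hop(routes_host_a, "10.0.10.8"))  # -> 192.168.1.3
```

Because each host owns exactly one non-overlapping /24, a plain first-match scan is enough; real routers do longest-prefix matching.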
Calico's approach container-01 container-02 10.0.9.4 container-03 10.0.9.5 container-04 10.0.10.8 10.0.10.9 10.0.9.0/24 10.0.10.0/24 10.0.9.1 10.0.10.1 docker0 docker0 host-a 192.168.1.2 host-b 192.168.1.3 Felix agent: uses the kernel's L3 forwarding capabilities, handles ACLs BGP: one of the protocols used to build the Internet, used to advertise routes 17
Flannel's approach container-01 container-02 10.0.9.4 container-03 10.0.9.5 container-04 10.0.10.8 10.0.10.9 10.0.9.0/24 10.0.10.0/24 10.0.9.1 10.0.10.1 docker0 docker0 host-a 192.168.1.2 host-b 192.168.1.3 flanneld process: keeps routes up-to-date etcd: network configuration, network topology 18
Calico + flannel = Canal Collaboration announced on May 9th 2016 Use Calico and flannel together Project still in its early days 19
Overlay solutions 20
Overlay network approach Create a parallel network for cross communication Connect hosts with encapsulation tunnels Connect containers to the virtual networks Main projects: Docker (native) Flannel Weave 21
Overlay network container-01 container-02 container-03 container-04 10.0.9.0/24 capture traffic leaving to some other container in 10.0.9.X 10.0.9.0/24 docker0 Encapsulated traffic (eg. VXLAN) docker0 host-a host-b 192.168.1.2 192.168.1.3 Overlay traffic outer Ether Header (src/dst) outer IP Header (src/dst) outer UDP Header VXLAN Header Inner Ether Frame 22
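The header stack shown above has a fixed per-packet cost, which is why overlay interfaces typically run with a reduced MTU. A quick sketch of the standard arithmetic, assuming IPv4 and a 1500-byte physical MTU (the physical MTU counts everything above the outer Ethernet header, so every wrapped header eats into the inner packet's budget):

```python
# VXLAN encapsulation overhead, following the header stack on the slide.
OUTER_IP  = 20   # outer IPv4 header
OUTER_UDP = 8    # outer UDP header
VXLAN_HDR = 8    # VXLAN header
INNER_ETH = 14   # inner Ethernet frame header, carried as payload

overhead = OUTER_IP + OUTER_UDP + VXLAN_HDR + INNER_ETH
physical_mtu = 1500
inner_mtu = physical_mtu - overhead

print(overhead, inner_mtu)  # 50 1450
```

This is where the commonly seen 1450-byte MTU on VXLAN overlay interfaces comes from; without it, full-size inner packets would be fragmented or dropped.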
Overlay network and k/v store Network state and configuration can be saved into a k/v store: Docker < 1.12: supports etcd, consul and zookeeper via libkv Docker >= 1.12: no external dependency, built-in component Flannel: etcd Weave: no external dependency, doesn't use a k/v store at all 23
Overlay backends

          VXLAN  UDP
Docker      X     -
Weave       X     X*
Flannel     X     X*
(* backed by a custom protocol)

VXLAN: faster than UDP, traffic doesn't go to userspace; some hardware acceleration available UDP: can add encryption easily 24
Routing vs Overlay

Routing — Good: native performance; easy debugging. Bad: requires control over the infrastructure; hybrid cloud more complicated (requires VPN); can run out of addresses.
Overlay — Good: easier inter-cloud; doesn't require control over the infrastructure. Bad: inferior performance; debugging more complicated; no IP multicast (except for Weave). 25
How to use these projects Container Network Model (CNM): Specification used by Docker Plugins for: calico, weave, ... Note well: Docker 1.12+ Swarm mode works only with the native overlay network driver Container Network Interface (CNI): Derived from rkt's networking proposal Supported by rkt, kubernetes, Cloud Foundry, Mesos, ... Support for: calico, flannel, weave, ... 26
More troubles... 27
Are we done? Now we can: Connect containers running on different hosts React to network changes Is that enough? Unfortunately not... 28
Service discovery A container runs a service: producer A container accesses this service: consumer The consumer needs to find where the producer is located (IP address, in some cases even port #) 29
Challenge #1: find the producer Where is redis? web-01 redis-01 host-a host-b web-01 redis-01 host-a host-b 30
Challenge #2: react to changes web is already connected to redis web-01 redis-01 host-a host-b host-c 31
Challenge #2: react to changes redis is moved to another host, getting a different IP web-01 redis-01 redis-02 host-a host-b host-c web still points to the old location, so it's broken 32
Challenge #2: react to changes The link has to be reestablished web-01 redis-02 host-a host-b host-c Containers can be moved at any time: The producer can be moved to a different host The consumer should keep working 33
Challenge #3: multiple choices Multiple instances of the redis image Which redis? web-01 redis-01 redis-02 host-a host-b host-c Workloads can be scaled: More instances of the same producer How to choose between all of them? 34
Addressing service discovery 35
Use DNS Not a good solution: Containers die or are moved far more often than traditional hosts Returning DNS responses with a short TTL means more load on the server Some clients ignore TTL, so old entries are cached Note well: Docker < 1.11: updates /etc/hosts dynamically Docker >= 1.11: integrates a DNS server 36
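The "clients ignore TTL" failure mode is easy to show with a toy resolver. All the names and IPs below are invented for illustration; the point is only that a cache which outlives the record keeps serving the producer's old address after it moves:

```python
class TTLIgnoringClient:
    """A misbehaving resolver that caches answers forever, ignoring TTL."""

    def __init__(self, dns):
        self.dns = dns      # stand-in for the authoritative server
        self.cache = {}     # answers cached without expiry

    def resolve(self, name):
        if name not in self.cache:
            self.cache[name] = self.dns[name]
        return self.cache[name]


dns = {"redis": "10.0.10.8"}
client = TTLIgnoringClient(dns)
client.resolve("redis")            # first lookup populates the cache

dns["redis"] = "10.0.9.4"          # the redis container moved hosts
print(client.resolve("redis"))     # still the old address -> broken
```

A well-behaved client would re-query once the TTL expired, but with containers moving every few seconds even that turns into constant load on the DNS server.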
Key-value store Rely on a k/v store (etcd, consul, zookeeper) Producer registers itself: IP, port # Orchestration engine hands this data to the consumer At run time either: Change your application to read data straight from the k/v Rely on some helper that exposes the values via environment file or configuration file 37
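The register/lookup pattern can be sketched with an in-memory stand-in for the k/v store. The key layout (`/services/<name>`) is just an illustrative convention; real deployments talk to etcd, consul or zookeeper over the network instead of a local dict:

```python
class Registry:
    """Toy in-memory stand-in for an etcd/consul-style k/v store."""

    def __init__(self):
        self.store = {}

    def register(self, service, ip, port):
        # Producer side: announce where the service can be reached.
        self.store[f"/services/{service}"] = (ip, port)

    def lookup(self, service):
        # Consumer side: resolve the service to a concrete endpoint.
        return self.store[f"/services/{service}"]


reg = Registry()
reg.register("redis", "10.0.10.8", 6379)   # producer registers itself
print(reg.lookup("redis"))                 # consumer finds it at run time
```

The two run-time options from the slide map onto the consumer side: either the application calls `lookup` directly, or a helper process performs the lookup and writes the result into an environment or configuration file.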
Handling changes & multiple choices 38
DIY solution Use a load balancer Point all the consumers to a load balancer Expose the producer(s) using the load balancer Configure the load balancer to react to changes More moving pieces 39
Rely on the orchestration engine Service has a unique and stable IP address Consumers are pointed to the service Service redirects the request to one of the containers running the producer Traditional DNS can be added on top of it no changes to legacy applications Feature offered by Kubernetes and Docker >= 1.12 40
Kubernetes and Swarm services redis service VIP web-01 redis-01 redis-02 host-a host-b host-c User declares a service Orchestration engine allocates a virtual IP address for it On each container node: iptables rules to handle the VIP-to-container-IP translation A process keeps the iptables rules up-to-date 41
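A toy model of the service mechanism above: the VIP stays stable while the set of backends changes underneath it, and each connection is spread round-robin across the live backends. This is roughly what the node-local iptables rules achieve; the VIP and backend addresses here are invented for illustration:

```python
import itertools

class Service:
    """Toy service VIP: stable address, changing round-robined backends."""

    def __init__(self, vip):
        self.vip = vip
        self.backends = []
        self._rr = None

    def set_backends(self, backends):
        # Called by the orchestration engine whenever containers
        # are moved, killed, or scaled up/down.
        self.backends = list(backends)
        self._rr = itertools.cycle(self.backends)

    def connect(self):
        # Translate the stable VIP into one concrete container IP.
        return next(self._rr)


redis = Service(vip="10.254.0.5")
redis.set_backends(["10.0.10.8", "10.0.11.2"])   # redis-01, redis-02
print([redis.connect() for _ in range(3)])       # alternates between both
```

Consumers only ever see `10.254.0.5`; when redis-01 moves, the engine calls `set_backends` with the new addresses and existing consumers keep working unchanged.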
Are we really done? 42
Ingress traffic Your production application is running inside of a container cluster How to route customers' requests to these containers? How to react to changes (containers moved, scaling, ...)? 43
Kubernetes approach Services can be of three different types: ClusterIP: virtual IP reachable only by containers inside of the cluster NodePort: ClusterIP + the service is exposed on all the nodes of the cluster on a specific port <NodeIP>:<NodePort> LoadBalancer: NodePort + k8s allocates a load balancer using the underlying cloud provider. Then it configures it and keeps it up-to-date 44
Docker 1.12 approach Define a service using the `--publish` flag The service is exposed on all the nodes of the cluster on a specific port <NodeIP>:<ServicePort> 45
Ingress traffic flow Load balancer http://guestbook.com 8080 8081 guestbook -01 8080 blog-01 host-a Load balancer picks a container host Traffic is handled by the internal service 8081 guestbook -01 host-b 8080 8081 blog-01 host-c Works even when the node chosen by the load balancer is not running the container 46
Recap

               Calico    Docker built-in  Flannel           Weave
Approach       routing   overlay          routing, overlay  overlay
Specification  CNI, CNM  CNM              CNI, CNM          CNI, CNM

It's not just a matter of connecting containers: Service discovery Handling changes & multiple choices Handling ingress traffic 47
Questions? 48