Docker Networking: From One to Many Don Mills
What we are going to talk about Overview of traditional Docker networking Some demonstrations Questions New Docker features Some more demonstrations Questions again
The Building Blocks of Docker Networking VXLAN Discovery Segmentation Network Namespaces (netns) Virtual Ethernet Interfaces (Veths)
The Building Blocks Part One Network Namespaces (netns) A logical, separated, discrete copy of the network stack. Network Namespaces (netns)
Network Namespaces virtualize the network functions Each container has one* Container 1 Namespace (interfaces, routing table) Container 2 Namespace (interfaces, routing table) Container N Namespace (interfaces, routing table) Linux Kernel Global Namespace (interfaces, routing table, iptables) HARDWARE
Docker Single Host Networking (Traditional) Four modes Null (None) Host Mapped Container Bridged (default)
The Building Blocks Part Two Virtual Ethernet Devices (veths) A linked pair of virtual interfaces Network Namespaces (netns) Virtual Ethernet Interfaces (veths)
Veths link the namespaces Traffic goes in one, comes out the other VETH1 VETH2 Container Bridge (docker0) Container's Network Namespace Host's Network Namespace
Bridged Mode Network Outbound traffic NAT to host NIC IP address ContainerA 10.0.1.3 eth0 Bridge docker0 NIC 192.168.0.3 80 8080 Docker Host Inbound traffic DNAT from outside port to inside port
The Building Blocks Part Three- Discovery How containers discover other containers. Discovery Network Namespaces (netns) Virtual Ethernet Interfaces (Veths)
Legacy Links
Questions?
The Building Blocks Part Four - Segmentation Keeping container networks separate and distinct Discovery Segmentation Network Namespaces (netns) Virtual Ethernet Interfaces (Veths)
User-Defined Bridges Users can now define additional bridges to allow for network micro-segregation. Container Yellow1 Container Green1 Bridge Yellow Bridge Green Container Yellow2 Container Green2
Discovery 2 - Embedded DNS Servers, Aliases, and New Links
The Building Blocks Part Five VXLAN VXLAN (Virtual Extensible LAN) is a way of tunneling layer 2 traffic inside layer 3 routed traffic. VXLAN Discovery Segmentation Network Namespaces (netns) Virtual Ethernet Interfaces (Veths)
Bridged Mode Inbound Example
VXLAN Header One Ethernet frame encapsulated inside an outer IP/UDP packet
VXLAN Process
The Architecture of a Switch: Control, Management, and Data planes
Multi-host Network Container 172.18.0.3 eth1 Linux Bridge docker_gwbridge Outbound traffic NAT to host NIC IP address DockerA NIC 10.0.1.3 eth0 Linux Bridge OverlayNetNS vxlan1 Overlay traffic encapsulated in VXLAN Docker Host
Questions?
Appendix (Extra Slides)
The Building Blocks Part One Network Namespaces (netns) A logical, separated, discrete copy of the network stack. Gets its own routes, interfaces, and iptables rules Each container gets its own in /var/run/docker/netns, called its SandboxKey
#docker run -itd --name=test1 busybox
#docker inspect test1 | grep "SandboxKey"
"SandboxKey": "/var/run/docker/netns/2fb603b6d595",
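The SandboxKey above can be used to look inside a running container's namespace from the host. A minimal sketch (requires root; the container name "test1" follows the slide's example, and the `nsenter` usage is one common approach, not Docker's own tooling):

```shell
# Pull the container's netns path out of docker inspect with a Go template
KEY=$(docker inspect -f '{{.NetworkSettings.SandboxKey}}' test1)

# Enter just the network namespace and list its interfaces from the host side
nsenter --net="$KEY" ip addr
```

The output should match what `ip addr` prints when run inside the container itself, since both commands execute against the same netns.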
Docker Single Host Networking (Traditional) Four modes Null (None) Container only has loopback interface in netns Host Container shares host's default netns Mapped Container Container shares another container's netns Bridged (default)
None Mode Container has loopback interface but no other network interfaces.
#docker run -it --net=none --name=test1 busybox
/ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Host Mode Container uses the Docker host's network stack (runs in the default netns).
#docker run -it --net=host --name=test1 busybox
/ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 08:00:27:2c:fe:f4 brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000 link/ether 08:00:27:3e:2d:96 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue link/ether 02:42:5a:ce:26:f7 brd ff:ff:ff:ff:ff:ff
Mapped Container Mode Container uses the network stack of another container (runs in the other container's netns).
dmills@dockerhost:~$ docker run -it --name=test1 busybox
/ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
51: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
dmills@dockerhost:~$ docker run -it --net=container:test1 --name=test2 busybox
/ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
51: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
Default Bridged Mode All containers connect their network interfaces to a shared Linux bridge Allows internal communication between all containers by default (control with --icc=true/false) All outbound traffic is source-translated (Linux IP masquerade) All inbound traffic is destination-translated (DNAT)
The Building Blocks Part Two Virtual Ethernet Devices (Veths) A linked pair of virtual ethernet interfaces (always 2 in a pair) Traffic that goes into one comes out of the other One veth goes in the container netns The other attaches to the bridge in the host netns You can find the linked veth by using ethtool -S {vethname}
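The netns-plus-veth plumbing above can be reproduced by hand to see what Docker does under the hood. A sketch (requires root; the names "demo-ns", "veth-host", "veth-ctr" and the address 172.17.0.99/16 are illustrative, not anything Docker creates itself):

```shell
# 1. A fresh network namespace, standing in for a container's netns
ip netns add demo-ns

# 2. A linked veth pair: traffic in one end comes out the other
ip link add veth-host type veth peer name veth-ctr

# 3. One end into the namespace, the other onto the docker0 bridge
ip link set veth-ctr netns demo-ns
ip link set veth-host master docker0 up

# 4. Address and bring up the "container" end inside the namespace
ip netns exec demo-ns ip addr add 172.17.0.99/16 dev veth-ctr
ip netns exec demo-ns ip link set veth-ctr up
```

After this, `ip netns exec demo-ns ping 172.17.0.1` should reach the bridge, the same path a bridged container's traffic takes.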
Default Bridged Mode The Bridge Creates a Linux bridge (docker0) on the Docker host.
#brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.02425ace26f7 no vethb270fef
#ip addr show dev docker0
4: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:5a:ce:26:f7 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
Default Bridged Mode - Outbound Adds an iptables MASQUERADE (source NAT) rule so outbound traffic is NATed to an interface on the host.
#iptables -L -t nat
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 172.17.0.0/16 anywhere
Default Bridged Mode - Inbound Adds an iptables DNAT rule under the DOCKER chain for inbound traffic if configured.
#docker run -dit --name=test1 -p 80:8080 busybox
#iptables -L -t nat
Chain DOCKER (2 references)
target prot opt source destination
DNAT tcp -- anywhere anywhere tcp dpt:http to:172.17.0.2:8080
Default Bridged Mode Links for discovery Containers are assigned a random IP address on instantiation...how can they find each other? Through the --link feature (as of Docker 1.10, known as a "legacy link").
# docker run -dit --name test1 busybox
028c276905c9777328cb00bf1338fe3360b8b12b68af411a481d043117d8e847
# docker run -it --name test2 --link test1 busybox
/ # grep test1 /etc/hosts
172.17.0.2 test1 028c276905c9
Default Bridged Mode Links for micro-segmentation If the Docker daemon is started with the --icc=false and --iptables=true options, then links allow communication between two containers (by adding iptables rules).
# docker run -dit --name test1 busybox
028c276905c9777328cb00bf1338fe3360b8b12b68af411a481d043117d8e847
# docker run -it --name test2 --link test1 busybox
New Features! New Features in Docker 1.9/1.10: The docker network commands Multiple user-defined bridges for micro-segmentation Built-in DNS server for user-defined bridges, overlays, and link aliases (1.10) Multi-host overlays Plug-in architecture
Docker Network commands Docker has moved most network-related commands to the docker network set. docker network ls docker network inspect docker network create docker network rm docker network connect/disconnect
User-Defined Bridges Users can now define additional bridges (beyond the docker0 default) to allow for network micro-segregation. Replaces the functionality of --icc=false and links All containers on a user-defined bridge can reach each other All containers on a user-defined bridge can resolve each other's container names
#docker network create bridgeyellow
Internal DNS Server As of Docker 1.10, user-defined bridges and overlay networks use an embedded DNS server on each Docker host Runs at 127.0.0.11 Injects a nameserver entry into /etc/resolv.conf You can add network-scoped aliases for a container, and all containers on that network can reach it by the alias as well
#docker run -it --name=server1 --net-alias=web test/apache
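A short end-to-end sketch of alias resolution on a user-defined bridge (requires a Docker 1.10+ daemon; the network name "yellow" and container names are illustrative):

```shell
# A user-defined bridge gets the embedded DNS server automatically
docker network create yellow

# Attach a container with a network-scoped alias "web"
docker run -dit --name=server1 --net=yellow --net-alias=web busybox

# Any peer on the same network resolves both the name and the alias
docker run --rm --net=yellow busybox nslookup web       # answered by 127.0.0.11
docker run --rm --net=yellow busybox ping -c1 server1
```

Note that the alias only exists within the "yellow" network; containers on other networks cannot resolve it.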
Multi-Host Overlay Networks Allows containers on separate hosts to communicate directly Can have multiple Overlay networks on same hosts for segregation Embedded DNS Server on each host can resolve the container names of every container on the overlay network for discovery
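The overlay workflow can be sketched in a few commands. This assumes the hosts are already configured for multi-host networking (in Docker 1.9 that meant a key-value store such as Consul plus --cluster-store/--cluster-advertise daemon flags); the network and container names are illustrative:

```shell
# Create the overlay once; it becomes visible on every participating host
docker network create -d overlay --subnet=10.0.9.0/24 my-overlay

# On host A
docker run -dit --name=web1 --net=my-overlay busybox

# On host B: the embedded DNS server resolves web1 across hosts
docker run --rm --net=my-overlay busybox ping -c1 web1
```

The --subnet flag is optional; without it Docker allocates a subnet for the overlay itself.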
The Building Blocks Part Three VXLAN VXLAN (Virtual Extensible LAN) is a way of tunneling layer 2 traffic inside layer 3 routed traffic. Runs on UDP port 4789 Encapsulates the original ethernet frame inside a UDP/IP packet Traffic is encapsulated at VTEPs (Virtual Tunnel Endpoints) Contains a VNI (Virtual Network Identifier) number that distinguishes between virtual LANs (so you can run multiple ones on the same physical network)
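The VXLAN header that carries the VNI is only 8 bytes. A quick sketch of its layout (per RFC 7348: a flags byte where 0x08 means "VNI present", 24 reserved bits, the 24-bit VNI, then 8 reserved bits), using an illustrative VNI of 42 rather than anything Docker would assign:

```shell
# Render the 8-byte VXLAN header as hex for an example VNI.
# Layout: 08 (flags) | 000000 (reserved) | %06x (24-bit VNI) | 00 (reserved)
vni=42
printf '08000000%06x00\n' "$vni"    # -> 0800000000002a00
```

Everything after this header on the wire is the original, untouched Ethernet frame, which is why VXLAN is described as one frame inside another packet.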
The Vagrant setups Layer 2 (all in same subnet) https://github.com/donmills/dockeroverlayvagrant Layer 3 (with a router in the middle of two subnets) https://github.com/donmills/dockeroverlayvyos