Raw Block Volume in Kubernetes Mitsuhiro Tanino, Principal Software Engineer, Hitachi Vantara

Similar documents
Hitachi & Red Hat collaborate: Container migration guide

Introduction to Kubernetes Storage Primitives for Stateful Workloads

What s New in Kubernetes 1.10

Managing Compute and Storage at Scale with Kubernetes. Dan Paik / Google

Container-Native Storage

Above the clouds with container-native storage

Kubernetes Storage: Current Capabilities and Future Opportunities. September 25, 2018 Saad Ali & Nikhil Kasinadhuni Google

Kubernetes, Persistent Volumes and the Pure Service Orchestrator. Simon Dodsley, Director of New Stack Technologies

Internals of Docking Storage with Kubernetes Workloads

Disaster Recovery and Data Protection for Kubernetes Persistent Volumes. Xing Yang, Principal Architect, Huawei


PRP Distributed Kubernetes Cluster

OpenShift + Container Native Storage (CNS)

Managing and Protecting Persistent Volumes for Kubernetes. Xing Yang, Huawei and Jay Bryant, Lenovo

You Have Stateful Apps - What if Kubernetes Would Also Run Your Storage?

Red Hat Gluster Storage 3.1

Kubernetes Integration with Virtuozzo Storage

Package your Java Application using Docker and Kubernetes. Arun

Red Hat Enterprise Linux Atomic Host 7 Getting Started with Kubernetes

Webinar Series. Cloud Native Storage. May 17, 2017

Kubernetes on Openstack

Convergence of VM and containers orchestration using KubeVirt. Chunfu Wen

What s New in Red Hat OpenShift Container Platform 3.4. Torben Jäger Red Hat Solution Architect

RED HAT GLUSTER TECHSESSION CONTAINER NATIVE STORAGE OPENSHIFT + RHGS. MARCEL HERGAARDEN SR. SOLUTION ARCHITECT, RED HAT BENELUX April 2017

McAfee Cloud Workload Security Installation Guide. (McAfee epolicy Orchestrator)

Data Transfer on a Container. Nadya Williams University of California San Diego

Backup strategies for Stateful Containers in OpenShift Using Gluster based Container-Native Storage

Kubernetes Basics. Christoph Stoettner Meetup Docker Mannheim #kubernetes101

Open Service Broker API: Creating a Cross-Platform Standard Doug Davis IBM Shannon Coen Pivotal

What s New in K8s 1.3

Dell Storage Center with Flocker

Hacking and Hardening Kubernetes

CONTINUOUS INTEGRATION CONTINUOUS DELIVERY

OpenShift Container Platform 3.11

Integrate Ceph and Kubernetes. on Wiwynn ST P. All-Flash Storage

EDB Postgres Containers and Integration with OpenShift. Version 1.0

Scaling Jenkins with Docker and Kubernetes Carlos

Scheduling in Kubernetes October, 2017

DEVELOPER INTRO

Trident Documentation

HITACHI VIRTUAL STORAGE PLATFORM G SERIES FAMILY MATRIX

INTRODUCING CONTAINER-NATIVE VIRTUALIZATION

Kubernetes 1.9 Features and Future

HITACHI VIRTUAL STORAGE PLATFORM F SERIES FAMILY MATRIX

Red Hat JBoss Middleware for OpenShift 3

OpenShift Dedicated 3 Release Notes

Local Ephemeral Storage Resource Management. Jing Xu, Google

TEN LAYERS OF CONTAINER SECURITY

Kubernetes: Twelve KeyFeatures

Containerisation with Docker & Kubernetes

Kubernetes 101. Doug Davis, STSM September, 2017

Red Hat CloudForms 4.5

The speed of containers, the security of VMs

Kubernetes - Load Balancing For Virtual Machines (Pods)

RUNNING VIRTUAL MACHINES ON KUBERNETES. Roman Mohr & Fabian Deutsch, Red Hat, KVM Forum, 2017

How to build scalable, reliable and stable Kubernetes cluster atop OpenStack.

EDB Postgres Containers and Integration with OpenShift. Version 2.3

What s New in K8s 1.3

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

VMWARE PIVOTAL CONTAINER SERVICE

A day in the life of a log message Kyle Liberti, Josef

EDB Postgres Containers and Integration with OpenShift. Version 2.2

Blue elephant on-demand: PostgreSQL + Kubernetes. FOSDEM 2018, Brussels

Question: 2 Kubernetes changed the name of cluster members to "Nodes." What were they called before that? Choose the correct answer:

Learn. Connect. Explore.

Introduction to Kubernetes

Code: Slides:

Top Nine Kubernetes Settings You Should Check Right Now to Maximize Security

Microservices. Chaos Kontrolle mit Kubernetes. Robert Kubis - Developer Advocate,

Veritas CloudPoint 1.0 Administrator's Guide

Unified Kubernetes CRI runtimes based on Kata Containers. Xu Wang hyper.sh

agenda PAE Docker Docker PAE

Software containers are likely to become a very important tool over the

VMware Integrated OpenStack with Kubernetes Getting Started Guide. VMware Integrated OpenStack 4.1

Red Hat Roadmap for Containers and DevOps

Kubernetes objects on Microsoft Azure

An Introduction to Kubernetes

THE AFS NAMESPACE AND CONTAINERS

Red Hat OpenShift Application Runtimes 1

Introduction to the Open Service Broker API. Doug Davis

VirtFS A virtualization aware File System pass-through

Infoblox IPAM Driver for Kubernetes User's Guide

Important DevOps Technologies (3+2+3days) for Deployment

Taming Distributed Pets with Kubernetes

The Path to GPU as a Service in Kubernetes Renaud Gaubert Lead Kubernetes Engineer

VMWARE ENTERPRISE PKS

Infoblox IPAM Driver for Kubernetes. Page 1

What s New in Kubernetes 1.12

Continuous Integration of an Operating System in Kubernetes. Stef Walter Red Hat

VMWARE PKS. What is VMware PKS? VMware PKS Architecture DATASHEET

Disclaimer This presentation may contain product features that are currently under development. This overview of new technology represents no commitme

VMware Integrated OpenStack with Kubernetes Getting Started Guide. VMware Integrated OpenStack 4.0

[Cinder] Support LVM on a shared LU

Red Hat Mobile Application Platform 4.1 MBaaS Administration and Installation Guide

Red Hat CloudForms 4.6-Beta

Container Orchestration on Amazon Web Services. Arun

Life of a Packet. KubeCon Europe Michael Rubin TL/TLM in GKE/Kubernetes github.com/matchstick. logo. Google Cloud Platform

The speed of containers, the security of VMs. KataContainers.io

Part2: Let s pick one cloud IaaS middleware: OpenStack. Sergio Maffioletti

Transcription:

Raw Block Volume in Kubernetes Mitsuhiro Tanino, Principal Software Engineer, Hitachi Vantara

Agenda Background Raw Block Volume Support Usage of Raw Block Volumes Implementation deep dive Future Work

Background By using Kubernetes storage feature, user can create/attach/detach/delete persistent volume to/from their pod. But currently these volumes must be consumed via filesystem inside the pod. User can t consume raw block volume directly from their applications without filesystem. MountPoint: - /var/lib/data Filesystem PV DB Application Pod Storage Plugin PV

Background Benefits of raw block volume are; - Enable users to choose appropriate type of volume for their applications such as MariaDB, HiRDB (*1), etc. - Provide consistent I/O performance and low latency compared to filesystem volume, especially enterprise Fibre Channel, iscsi storage and Local SSD disk are suitable for raw block volume use-case. - At production system, raw block is essential functionality. *1: HiRDB: Hitachi s RDBMS

Problem Current workflow: Pod with Filesystem Persistent Volume User Pod, Claim Persistent Volume (1) Request (2) Bind (3) Attach Backend Storage DB Application Pod MountPoint: /var/lib/data (5) Mount PV Storage Plugin Filesystem PV /dev/sda Kubelet node (4) Start Pod Kubelet (1) User requests a Pod with Persistent Volume Claim (2) PV-Binder binds requested PVC and preprovisioned Persistent Volume (3) Storage plugin attaches PV to Kubelet node. PV is recognized as /dev/sda on node. Filesystem is created on top of /dev/sda automatically (4) Kubelet starts an application Pod (5) PV is mounted to kubelet node, then the mount point is passed into pod s mount point: /var/lib/data

Problem User Current workflow: Pod with Filesystem Persistent Volume Pod, Claim Persistent Volume (1) Request (3) Attach Backend Storage Kubelet node (1) User requests a Pod with pre-provisioned persistent volume Following components DB Application Pod can t handle block type of volume. (2) Kubelet Bind MountPoint: /var/lib/data (5) Mount PV Storage Plugin Filesystem PV /dev/sda (4) Start Pod (2) Dynamic Provisioner creates a new Persistent Volume from backend storage They only support Filesystem type volume at Kubernetes v1.8. Persistent Volume and Persistent Volume Claim Storage Plugin Kubelet (3) Storage plugin attaches PV to Kubelet node. PV is recognized as /dev/sda on node. Filesystem is created on top of /dev/sda automatically (4) Kubelet starts an application Pod (5) PV is mounted to kubelet node, then the mount point is passed into pod s mount point: /var/lib/data

Problem Today s workarounds Create a pod with privileged option can expose all volumes on kubelet node to a container. Not recommended since privileged container may cause serious security problem. apiversion: v1 kind: Pod metadata: name: privileged-pod spec: containers: - name: privileged-container # The container definition #... securitycontext: privileged: true

Raw Block Volume Support

Solution Raw Block Volume Support (v1.9, alpha feature) New API support volumemode API for PV and PVC volumedevices API for Pod definition Plugin support Fibre Channel plugin supports raw Block Volume as a reference driver(only support pre-provisioned PV) Dynamic Provisioning support Spec is under discussion but confirmed our externalprovisioner(closed) works together with FC plugin.

Solution New workflow: Pod with Block Persistent Volume User (1) Request Pod: volumedevices Claim: volumemode (2) Bind PV: volumemode Backend Storage (3) Attach devicepath: /dev/sda (5) Pass PV Storage Plugin (Block Volume Support) Block PV DB Application Pod /dev/sda Kubelet node (4) Start Pod Kubelet (Block Volume Support) (1) User requests a Pod with volumedevices, and Persistent Volume Claim with volumemode = Block (2) PV-Binder binds requested PVC and preprovisioned Persistent Volume with volumemode = Block (3) Storage plugin attaches the PV to Kubelet node. PV is recognized as /dev/sda on node. Filesystem is not created (4) Kubelet starts a DB application Pod. (5) The Block PV is passed to the Pod as /dev/sda (or any user defined devicepath)

Usage of Raw Block Volumes

Configuration for services Enable feature-gates Block Volume is alpha feature at v1.9 Need to configure --feature-gates=blockvolume=true for kubelet, kube-api-server, kube-controller-manager to enable BlockVolume

Block Volume Definition Persistent Volume definition apiversion: v1 kind: PersistentVolume metadata: name: block-pv001 spec: capacity: storage: 1Gi accessmodes: - ReadWriteOnce volumemode: Block persistentvolumereclaimpolicy: Retain fc: targetwwns: ['28000001ff0414e2'] lun: 0 The volumemode support two volume modes Filesystem Block Admin can define expected usage of volume through volumemode

Block Volume Definition cont. Persistent Volume Claim definition apiversion: v1 kind: PersistentVolumeClaim metadata: name: block-pvc001 spec: accessmodes: - ReadWriteOnce volumemode: Block resources: requests: storage: 1Gi The volumemode support two volume modes Filesystem Block User can define an expected mode of volume through PVC s volumemode.

Block Volume Definition cont. Pod definition apiversion: v1 kind: Pod metadata: name: blockvolume-pod spec: containers: - name: blockvolume-container # The container definition # volumedevices: - name: data devicepath: /dev/xvda volumes: - name: data persistentvolumeclaim: claimname: block-pvc001 readonly: true The volumedevices is supported to define block volume in pod. These two combination of parameters are available. volumedevices and volumemode Block volumemounts and volumemode Filesystem The readonly parameter can be define in Pod definition. If this parameter is true, The volume is attached read only block volume.

Quick glance of PV/PVC New field VolumeMode is shown in describe PV/PVC % kubectl describe pv/block-pv001 Name: block-pv001 Labels: <none> Annotations: pv.kubernetes.io/bound-by-controller=yes StorageClass: fast Status: Bound Claim: default/block-pvc001 Reclaim Policy: Retain Access Modes: RWO VolumeMode: Block Capacity: 1Gi Message: Source: # The storage definition #... % kubectl describe pvc/block-pvc001 Name: block-pvc001 Namespace: default StorageClass: fast Status: Bound Volume: block-pv0001 Labels: <none> Annotations: pv.kubernetes.io/bind-completed=yes pv.kubernetes.io/bound-by-controller=yes Capacity: 1Gi Access Modes: RWO VolumeMode: Block Events: <none>

Quick glance of pod with block volume New field Devices is shown in describe pod % kubectl describe po/blockvolume-pod Name: block-pod Containers: # The container definition #... Mounts: /var/run/secrets/kubernetes.io/serviceaccount from default-token-kdjq8 (ro) Devices: /dev/xvda from data Volumes: data: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: block-pvc001 ReadOnly: true

Quick glance of running pod with block volume Device /dev/xvda is created. User can issue I/O to block device directly inside the pod. % kubectl exec -it blockvolume-pod -- ls -la /dev/ total 0 drwxr-xr-x 5 root root 400 Oct 8 22:23. drwxr-xr-x 21 root root 242 Oct 8 22:23.. crw-rw-rw- 1 root root 1, 9 Oct 8 22:23 urandom brw-rw---- 1 root disk 7, 2 Oct 8 22:23 xvda crw-rw-rw- 1 root root 1, 5 Oct 8 22:23 zero % kubectl exec -it block-pod1 -- dd if=/dev/zero of=/dev/xvda bs=1024k count=100 100+0 records in 100+0 records out 104857600 bytes (105 MB, 100 MiB) copied, 0.188629 s, 556 MB/s

Implementation deep dive

Implementation deep dive New interface for plugin to enable block volume Under pod and plugin directory PV/PVC volume binding matrix Avoid silent volume replacement

New interface for plugin: implementation matrix # Interface Mmethod Summary 1 BlockVolumeMapper SetUpDevice() Attach volume to kubelet node 2 BlockVolumeUnmapper TearDownDevice() Detach volume from kubelet node 3 NewBlockVolumeMapper() Create blockvolumemapper object 4 BlockVolumePlugin NewBlockVolumeUnmapper() Create blockvolumeunmapper object 5 ConstructBlockVolumeSpec() Create block volumespec 6 GetGlobalMapPath() Get global map path BlockVolume 7 GetPodDeviceMapPath() Get pod device map path

Global map path directory Each volume has own global map path directory to store volume plugin s unique data. ex. /var/lib/kubelet/kubernetes.io/fc/volumedevices/wwn-lun-0/ Symbolic link associated to block device such as /dev/sdx is stored under global map path directory ex. /var/ /fc/volumedevices/wwn-lun-0/{pod uuid symlink} -> /dev/sdb If multiple pods use a same volume, multiple links are stored under the directory Admin can find physical block device by checking the symbolic link

Pod device map path directory Each pod has own pod device map path directory. ex. /var/lib/kubelet/pods/{pod uuid}/volumedevices/kubernetes.io~fc/ Symbolic link associated to block device such as /dev/sdx is stored under pod device map path directory too. ex. /var/ /volumedevices/kubernetes.io~fc/{block-pv0001 symlink} -> /dev/sdb If a pod has multiple volumes, multiple links with volume names are stored under the directory Admin can find physical block device by checking the symbolic link

VolumeMode binding matrix Unspecified volumemode is handled as Filesystem to keep compatibility with existing behavior To bind PV and PVC, both PV and PVC must have same volumemode # PV volumemode PVC volumemode Result 1 unspecified unspecified BIND 2 unspecified Filesystem BIND 3 unspecified Block NO BIND 4 Filesystem unspecified BIND 5 Filesystem Filesystem BIND 6 Filesystem Block NO BIND 7 Block unspecified NO BIND 8 Block Filesystem NO BIND 9 Block Block BIND

Avoid block volume silent replacement Problem We confirmed container runtime doesn t take a lock to attached block volume even if container is online. Therefore volume is possibly replaced silently to another volume unexpectedly. Solution Device file of Block volume such as /dev/sdx will be also opened via loopback device. This takes device lock using file-descriptor then we can avoid device silent replacement.

Future work

Future work Dynamically provision raw block volumes, Finalize spec Raw block support for remaining volume plugins: Local volumes GCE PD AWS EBS Gluster

References Spec (#1265) Block Volume Support Spec API, Controller, Kubelet, etc changes (#50457) API Change.. (#53385) VolumeMode PV-PVC Binding change (#51494) Container runtime interface change,... volumemanager changes, operationexecutor changes (#55112) Block volume: Command line printer update... - Block Volume support for Plugins (#51493) FC plugin update... (#54752) iscsi plugin update... (#55899) AWS plugin update. [merged] [merged] [merged] [merged] [merged] [merged] ->v1.10 ->v1.10

Disclaimer Kubernetes is a trademark of the Cloud Native Computing Foundation. MariaDB is a trademark of the MariaDB Foundation. GCE is a trademark of the Google Inc. Amazon Web Services is trademarks of Amazon.com, Inc. and/or one or more of its subsidiaries, in the United States and/or other countries. Gluster is a trademark of the Red Hat, Inc. All other trademarks are the property of their respective owners.