Distributed File Storage in Multi-Tenant Clouds using CephFS
OpenStack Vancouver 2018, May 23
Patrick Donnelly, CephFS Engineer, Red Hat, Inc.
Tom Barron, Manila Engineer, Red Hat, Inc.
Ramana Raja, CephFS Engineer, Red Hat, Inc.
How do we solve this?
Ceph Higher Level Components
OBJECT: RGW, S3- and Swift-compatible object storage with object versioning, multi-site federation, and replication.
BLOCK: RBD, a virtual block device with snapshots, copy-on-write clones, and multi-site replication.
FILE: CEPHFS, a distributed POSIX file system with coherent caches and snapshots on any directory.
LIBRADOS: a library allowing apps direct access to RADOS (C, C++, Java, Python, Ruby, PHP).
RADOS: a software-based, reliable, autonomic, distributed object store comprised of self-healing, self-managing, intelligent storage nodes (OSDs) and lightweight monitors (MONs).
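As a quick illustration of the object and block interfaces above, the standard Ceph CLIs talk to the same RADOS cluster; this is only a sketch, and the pool and image names are made up for the example:

    # Store and read back an object directly in RADOS (pool "mypool" is hypothetical)
    rados -p mypool put hello.txt ./hello.txt
    rados -p mypool get hello.txt /tmp/hello.txt

    # Create and map a 1 GiB RBD image (image "vol01" is hypothetical)
    rbd create mypool/vol01 --size 1024
    rbd map mypool/vol01        # exposes a /dev/rbdX block device on the client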
What is CephFS?
CephFS is a POSIX-compatible distributed file system!
3 moving parts: the MDS(s), the clients, and RADOS.
Files, directories, and other metadata are stored in RADOS. The MDS has no local state.
Mount using:
  FUSE: ceph-fuse...
  Kernel: mount -t ceph...
Coherent caching across clients: the MDS issues inode capabilities that enforce synchronous or buffered writes.
Clients access file data directly via RADOS.
[Diagram: clients exchange metadata RPCs with ceph-mds, which journals metadata to RADOS; file I/O goes from the clients straight to RADOS.]
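For illustration, the two mount paths mentioned above might look like this; the monitor address, secret file, and mountpoints are placeholders rather than values from the slides:

    # FUSE client (mountpoint is hypothetical; reads cluster details from ceph.conf)
    ceph-fuse -n client.admin /mnt/cephfs

    # Kernel client (monitor address and secret file are hypothetical)
    mount -t ceph 192.168.1.1:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret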
CephFS native driver deployment
[Deployment diagram: controller nodes run the Manila API and Manila share services; compute nodes host tenant VMs (Tenant A, Tenant B) with 2 NICs, one on the tenant network and one on the storage provider network; storage nodes run Ceph MON, MGR, and MDS on the storage (Ceph public) network; the public OpenStack service API sits on the external provider network.]
Open question: Ceph MDS placement, with the MONs or dedicated?
https://docs.openstack.org/manila/ocata/devref/cephfs_native_driver.html
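A minimal sketch of the tenant-side workflow with the native driver, assuming the cloud admin has already created a CephFS share type; the share type, user, and path names below are hypothetical:

    # Create a 1 GiB CephFS share ("cephfstype" share type is assumed to exist)
    manila create --share-type cephfstype --name cephshare1 cephfs 1

    # Allow a cephx user, then look up the export path and access key
    manila access-allow cephshare1 cephx alice
    manila share-export-location-list cephshare1
    manila access-list cephshare1            # shows the cephx access key

    # Mount from a tenant VM over the Ceph public network (paths are placeholders)
    ceph-fuse /mnt/cephshare1 --id=alice --keyring=./alice.keyring \
        --client-mountpoint=/volumes/_nogroup/<share-uuid>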
CephFS NFS driver (in data plane)
OpenStack clients/Nova VMs connect to an NFS-Ganesha gateway instead of the Ceph cluster directly: better security.
No single point of failure (SPOF) in the Ceph storage cluster (HA of MON, MDS, OSD).
NFS-Ganesha itself needs to be HA for no SPOF in the data plane; active/passive HA is WIP (Pacemaker/Corosync).
[Diagram: the NFS gateway server speaks NFS to the client VMs and native Ceph to the cluster daemons (Monitor, Metadata Server, OSD Daemon) for metadata and data updates.]
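With the NFS driver the tenant-facing steps are the same Manila workflow, but the share is exported over NFS and mounted through the Ganesha gateway; the share type, client IP, gateway address, and export path below are placeholders:

    # Create a 1 GiB NFS share backed by CephFS + NFS-Ganesha
    manila create --share-type cephfsnfstype --name nfsshare1 nfs 1

    # Allow a client by IP address (enforced by Ganesha, not cephx)
    manila access-allow nfsshare1 ip 203.0.113.10

    # Mount from the tenant VM via the NFS-Ganesha gateway
    mount -t nfs <ganesha-ip>:/volumes/_nogroup/<share-uuid> /mnt/nfsshare1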
CephFS NFS driver deployment
[Deployment diagram: as in the native driver layout, controller nodes run the Manila API and share services and storage nodes run Ceph MON and MDS on the storage (Ceph public) network; tenant VMs (Tenant A, Tenant B) on the compute nodes have 2 NICs and reach the NFS-Ganesha gateway over a dedicated storage NFS network rather than mounting CephFS directly; the public OpenStack service API sits on the external provider network.]
Open question: Ceph MDS placement, with the MONs or dedicated?
https://docs.openstack.org/manila/queens/admin/cephfs_driver.html
Future: Ganesha per Tenant
[Deployment diagram: a Kubernetes cluster alongside the controller nodes (Manila API, Manila share) runs one NFS-Ganesha gateway per tenant; Ceph MON, MGR, and MDS sit on the Ceph public network, while tenant VMs (Tenant A, Tenant B) on the compute nodes reach their own Ganesha instance; the public OpenStack service API sits on the external provider network.]
[Diagram: Manila works through the Ceph MGR to provision an NFS-Ganesha gateway (NFSGW) running in a Kubernetes pod, with HA managed by Kubernetes; the gateway sits in front of the MDS and OSD daemons.]
Managing Scale-Out: CephFSShareNamespace
One IP to access the NFS cluster for a tenant; reachable only via the tenant network.
Number of pods equals the scale-out for the CephFSShareNamespace; dynamically grow/shrink??
StatefulSets give stable network identifiers for pods!
[Diagram: NFS clients reach a Kubernetes Service that fronts the NFS-Ganesha gateway pods (Pod #1, Pod #2, ...) of StatefulSet #1.]
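If the per-tenant gateways do land as StatefulSet-managed pods behind a Service, growing or shrinking a tenant's gateway count could be as simple as the following kubectl sketch; the StatefulSet and Service names are invented for illustration and nothing here is an existing Manila or Ceph interface:

    # Inspect the per-tenant gateway StatefulSet and its Service (names are hypothetical)
    kubectl get statefulset ganesha-tenant-a
    kubectl get service ganesha-tenant-a

    # Scale the number of NFS-Ganesha pods for this tenant up or down
    kubectl scale statefulset ganesha-tenant-a --replicas=3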
Thanks!
Patrick Donnelly <pdonnell@redhat.com>
Thanks to the CephFS team: John Spray, Greg Farnum, Zheng Yan, Ramana Raja, Doug Fuller, Jeff Layton, and Brett Niver.
Homepage: http://ceph.com/
Mailing lists/IRC: http://ceph.com/irc/