Linux Cluster next generation Vladislav Bogdanov

Similar documents
Red Hat Enterprise Linux 7

高可用集群的开源解决方案 的发展与实现. 北京酷锐达信息技术有限公司 技术总监史应生

Cluster Components. Clustering with OpenAIS and Corosync Heartbeat

RedHat Cluster (Pacemaker/Corosync)

RHEL Clustering and Storage Management. 5 Days

High Availability for Highly Reliable Systems

Red Hat Enterprise Linux 7

MySQL High Availability and Geographical Disaster Recovery with Percona Replication Manager. Yves Trudeau November 2013

Red Hat Clustering: Best Practices & Pitfalls. Lon Hohberger Principal Software Engineer Red Hat May 2013

Red Hat Enterprise Linux 7

Linux-HA Version 3. Open Source Data Center April 30 th, 2009

Exploring History with Hawk

SUSE Linux Enterprise High Availability Extension

SUSE Linux Enterprise High Availability Extension

Agenda. Clustering. High Availability. Cluster Management. Failover. Fencing. Lock Management GFS2

Red Hat Enterprise Linux 7 High Availability Add-On Overview

The Corosync Cluster Engine

Administration Guide. SUSE Linux Enterprise High Availability Extension 12 SP2

Red Hat Enterprise Linux 7 High Availability Add-On Overview

Red Hat Cluster A walkthrough

Administration Guide. SUSE Linux Enterprise High Availability Extension 15

Red Hat Cluster Suite Networking

CIT 668: System Architecture

High-Availability Option (HAO) for RHEL on System z and System p. David Boyes Sine Nomine Associates

Research Project 1. Failover research for Bright Cluster Manager

Red Hat Enterprise Linux 6

HA NFS Cluster using Pacemaker and DRBD on RHEL/CentOS 7. Matt Kereczman Version 1.5,

Red Hat Enterprise Linux 7 High Availability Add-On Reference. Reference Document for the High Availability Add-On for Red Hat Enterprise Linux 7

High Availability. Neale Ferguson Sine Nomine Associates Tuesday 13 August,

MySQL HA Solutions Selecting the best approach to protect access to your data

HA for OpenStack: Connecting the dots

Linux-HA Clustering, no SAN. Prof Joe R. Doupnik Ingotec, Univ of Oxford, MindworksUK

OCFS2 Mark Fasheh Oracle

Upgrading a HA System from to

GFS Best Practices and Performance Tuning. Curtis Zinzilieta, Red Hat Global Services

Linux Clusters Made Easy with the SUSE Linux Enterprise High Availability Extension

A Carrier-Grade Cloud Phone System

Running Linux-HA on a IBM System z

Pacemaker The Open Source, High Availability Cluster

Cluster Installation using SNA HAO for RHEL 7

Red Hat Enterprise Linux 5 Cluster Suite Overview

Understanding High Availability options for PostgreSQL

High Availability with the openais project. Prepared by: Steven Dake October 2005

Errata and Commentary Final, Submitted to Curriculum. ~]$ restorecon.ssh/authorized_keys

Open Source Storage Management Aperi and SMI-S for Linux

Red Hat Enterprise Linux 5 Cluster Administration. Configuring and Managing a Red Hat Cluster Edition 5

The openais Architecture

Red Hat Enterprise Linux 6

Red Hat Cluster Suite

Red Hat Enterprise Linux 6

Upgrading to Parallels Virtuozzo Containers 4.6

HA solution with PXC-5.7 with ProxySQL. Ramesh Sivaraman Krunal Bauskar

Using OpenSAF for carrier grade High Availability

Symmetric Cluster Architecture and Component Technical Specifications. David Teigland Red Hat, Inc.

Installation and Setup Quick Start

The openais project. Technomap. Prepared by Steven Dake January 2005

Live updates in high-availability (HA) clouds

Red Hat High Availability vs. Steeleye Protection Suite for Linux

Red Hat Enterprise Linux 6 High Availability Add-On Overview

Parallels Virtuozzo Containers 4.7 for Linux

Linux Ha Error Code 139

TUT90846 Towards Zero Downtime How to Maintain SAP HANA System Replication Clusters

V8.02_formatted

OpenSAF More than HA. Jonas Arndt. HP - Telecom Architect OpenSAF - TCC

Choosing a MySQL HA Solution Today

Resource Manager Collector RHCS Guide

MySQL High Availability on the Pacemaker cluster stack. Brian Hellman, Florian Haas

McAfee SIEM HA Receivers

Red Hat Enterprise Linux 5.8 Beta Cluster Suite Overview. Red Hat Cluster Suite for Red Hat Enterprise Linux 5

Evaluation of the SAF AIS implementations

High Availability. Prepared by Vaibhav Daud

Pacemaker 1.1 Configuration Explained

[ 1 ]

High Availability Guide. OpenStack contributors

Availability of. Datacenter

Essentials. Oracle Solaris Cluster. Tim Read. Upper Saddle River, NJ Boston Indianapolis San Francisco. Capetown Sydney Tokyo Singapore Mexico City

Pacemaker 1.1 Clusters from Scratch

Introduction to the Service Availability Forum

INUVIKA TECHNICAL GUIDE

Linux Clustering Technologies. Mark Spencer November 8, 2005

Pacemaker 1.1 Configuration Explained

Highly Available NFS Storage with DRBD and Pacemaker

Slurm Birds of a Feather

Pacemaker 1.1 Clusters from Scratch

Trouble?! Shoot your Cloud!

Group Replication: A Journey to the Group Communication Core. Alfranio Correia Principal Software Engineer

Pacemaker Administration

PGConf US PostgreSQL Power on Power. Michael Meskes. Bernd Helmle. Julian Schauder. credativ

GR Reference Models. GR Reference Models. Without Session Replication

Slurm Roadmap. Danny Auble, Morris Jette, Tim Wickberg SchedMD. Slurm User Group Meeting Copyright 2017 SchedMD LLC

The Zenoss Enablement Series:

Red Hat Global File System

How To Make Databases on Linux on System z Highly Available

Configuring Control Center 1.0.x for HA

Red Hat Enterprise Linux 5 Cluster Administration. Configuring and Managing a Red Hat Cluster

ATCA, HPI, AIS open specifications for HA applications. Artem Kazakov SOKENDAI/KEK TIPP09

Red Hat Enterprise Linux 4 Cluster Suite Overview

Simplified Storage Migration for Microsoft Cluster Server

Percona XtraDB Cluster ProxySQL. For your high availability and clustering needs

Kernel Level Speculative DSM

Transcription:

Linux Cluster next generation Vladislav Bogdanov

Heartbeat Simple (mostly two-node) clusters IP (UDP: unicast, broadcast, multicast) or serial communication Limited functionality (esp. haresources mode), no clustered storage support Later: added crm Then split: Heartbeat (messaging + membership) Resource agents (OCF) Cluster-glue (lrmd, stonith, plumbing, IPC, logging) Cluster Resource Manager -> pacemaker Now: deprecated (except resource-agents and pacemaker)

RHCS CMAN messaging + quorum + API (then plugin to OpenAIS, so only quorum + API) DLM kernel + userspace GFS (then GFS2) kernel + userspace CLVM Fencing framework Rgmanager - cluster resource manager Later: pacemaker support for CMAN Later: fencing via pacemaker Then: project split, DLM and GFS-utils are now separate packages Now: everything except them is deprecated after RHEL6

OpenAIS Totem protocol messaging virtual synchrony SA Forum APIs CPG, AMF, CKPT, EVT, LCK UDP multicast and broadcast communication CMAN became a plugin to OpenAIS Pacemaker supports OpenAIS via plugin DLM, GFS2 (and OCFS2) work with OpenAIS via CMAN only CLVM (mostly) works with OpenAIS DLM, GFS2 and OCFS2 do not work reliably without CMAN Later: project split, Totem and CPG go to corosync (1.x) Now: discontinued

Pacemaker (2-3 years ago) Runs on top of heartbeat, openais (corosync 1.x) Provides quorum Supports: Resource Clone Anonymous Globally-unique Master-slave (with multi-master support) Resource migration Order and colocation constraints Groups Node attributes (volatile and non-volatile) Fencing (STONITH) heartbeat agents Later: support for CMAN (rather ugly) and RHCS fence agents

Corosync 2.x Totem + CPG UDP unicast in addition to multicast and broadcast Multi-ring (active and passive) Libqb based Quorum (votequorum) Single-threaded No plugins anymore (OpenAIS, CMAN and pacemaker via plugin are not compatible) CMAP dynamic cluster reconfiguration Nodelist Dynamic nodelist (UDPU) members via CMAP More info corosync_overview(8), cmap_keys(8), corosync.conf(5)

Votequorum (corosync 2.x) Configurable node votes Expected votes (cluster-wide) Special features Two-node mode WFA (wait-for-all) no quorum until all configured nodes are seen simultaneously LMS (last-man-standing) dynamic expected_votes and quorum recalculation (down) ATB (auto-tie-breaker) partition with node with lowest known id remains quorate even with 50% of votes AD (allow-downscale) decrease expected_votes and quorum on clean shutdown, down to configured expected_votes More info - votequorum(5)

DLM Kernel component: TCP/SCTP communications for locking itself Now: in-kernel interface for fs-control Now: Additional sysctls instead of user-space configuration User-space component (dlm_controld) Before: CMAN for both membership and quorum RHCS (fenced) for fencing Then: CPG for membership (but CPG_NODE_DOWN event is not handled) CMAN for quorum RHCS (fenced) API for fencing Now: CPG for membership Corosync quorum Stonithd (pacemaker) for fencing FS-control support is deprecated but still exists

Kernel component: GFS2 DLM for locking Before: FS-control via user-space Now (Fedora 17+ and RHEL7): In-kernel FS-control User-space component (gfs_controld) Before: DLM for locking control CMAN for membership RHCS (fenced) for fencing Now (Fedora 17+ and RHEL7): obsolete

Locking support: DLM Membership support: Before: corosync (1.x) CLVM OpenAIS (with buggy LCK service) CMAN Now: CPG (corosync 2.x) Legacy Quorum: missing (!!!), global stop on graceful shutdown of cluster node. Developers are insane. Patch exists.

Pacemaker (now) Support for corosync 2.x (messaging, quorum, parts of configuration via CMAP) CPG membership. CMAN, heartbeat, openais (with plugin) support is deprecated. Libqb IPC Order-sets (A then (B and C) then D) Colocation sets (A (B C) D) Sequential yes/no Full stonithd (fencing daemon) rewrite Client API Fencing topology (A or (B and C)) Used by DLM 4.x Can work without the rest of pacemaker

Pacemaker (cont) Full LRMD rewrite Systemd resources Nagios checks for container resources (VM) Tickets GEO-clustering (with booth PAXOS algorithm implementation) Tickets are volatile cluster attributes Still missing: Non-volatile cluster attributes (can be implemented with tickets and migratable 'Ticketer' resource-agent) Migration of 'first' resource causes 'then' resources to restart Being developed Remote LRMD execution

Management tools Console: crmsh (from SUSE). Actively developed. pcs (from RedHat). New development. Web-based: hawk (from SUSE) pcs (from RedHat) GUI: Pacemaker-GUI (almost not supported) LCMC (Java)

Linux cluster (Fedora17 and RHEL7) Corosync 2.x Provides CPG and quorum Fence-agents from RHCS OCF resource-agents from heartbeat Pacemaker: Uses: CPG and quorum from corosync Fence-agents for fencing Resource-agents for resource management Provides: CIB (Cluster information Base) CRMd (Cluster Resource Manager daemon) Stonith (fencing) daemon and API DLM 4.x Uses CPG and quorum from corosync Uses fencing via stonith API (pacemaker) CLVM Uses CPG (and quorum if patched) from corosync GFS2 (with in-kernel FS-control) Uses DLM kernel component directly

Questions? www.sam-solutions.com Value of Talent. Delivered.