Recovery: auto start components
- Angela Parker
- 5 years ago
Ambari auto start for services and components

Summary

This document describes the Ambari auto start feature before and after version 2.4.0. Ambari auto start is a feature that enables certain components to be marked for auto start, so that whenever a node restarts, the ambari agent automatically restarts the stopped components. Auto start of a component is based on its current state and desired state.

Ambari 2.3.x/2.2.x (see here)

Auto start of services and components is supported via the ambari.properties file using several properties. However, this approach is static: any time auto start for a service component needs to be turned on or off, these properties in ambari.properties have to be modified and the ambari server has to be restarted for the changes to take effect. Moreover, the ambari agent has to be restarted so that it can bootstrap with the server to pick up the auto start configuration.

Ambari 2.4.0 (see here)

Auto start is dynamic. No restart of the ambari server or the ambari agent is required for changes to take effect. All auto start properties reside in the database. API support has been added to configure the auto start setting for services and to have the ambari server communicate the changes to the ambari agents during the subsequent registration or heartbeat. Ambari web (UI) uses these APIs to dynamically control the auto start settings.

How auto start works in Ambari versions 2.3.x/2.2.x

When an ambari agent starts, it bootstraps with the ambari server via registration. The server sends the agent information about the components that have been enabled for auto start, along with the other auto start properties in ambari.properties. The agent compares the current state of these components against the desired state to determine whether they should be installed, started, restarted, or stopped.
ambari.properties

To enable components for auto start, specify them with recovery.enabled_components=a,b,c. For example:

    # Enable Metrics Collector auto-restart
    recovery.type=AUTO_START
    recovery.enabled_components=METRICS_COLLECTOR
    recovery.lifetime_max_count=1024

Here's a sample snippet of the auto start configuration that is sent to the agent by the server during agent registration:

    "recoveryConfig": {
      "type" : "AUTO_START",
      "maxCount" : 10,
      "windowInMinutes" : 60,
      "retryGap" : 0,
      "enabledComponents" : "a,b",
      "disabledComponents" : "c,d"
    }
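As an illustration of how an agent might consume this registration payload, here is a minimal Python sketch. The function and field names below are illustrative, not the actual ambari agent code; only the JSON shape comes from the sample above.

```python
import json

# Sample recovery configuration as sent by the server during registration.
REGISTRATION_RESPONSE = """
{
  "recoveryConfig": {
    "type": "AUTO_START",
    "maxCount": 10,
    "windowInMinutes": 60,
    "retryGap": 0,
    "enabledComponents": "a,b",
    "disabledComponents": "c,d"
  }
}
"""

def parse_recovery_config(payload: str) -> dict:
    """Extract the recovery settings an agent needs from a registration response."""
    config = json.loads(payload)["recoveryConfig"]
    return {
        "type": config["type"],
        "max_count": config["maxCount"],
        "window_in_minutes": config["windowInMinutes"],
        "retry_gap": config["retryGap"],
        # Component lists arrive as comma-separated strings.
        "enabled": config.get("enabledComponents", "").split(","),
        "disabled": config.get("disabledComponents", "").split(","),
    }

settings = parse_recovery_config(REGISTRATION_RESPONSE)
```

With the settings in hand, the agent can decide per component whether a recovery action is needed.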
For example, if the current state of the METRICS_COLLECTOR component on a host is INSTALLED but the component is enabled for auto start, the desired state is STARTED. The recovery manager generates a start command for METRICS_COLLECTOR, which is executed by the controller.

Recovery scenarios

Depending on the value of the recovery_type attribute (DEFAULT, AUTO_START, or FULL) in the ambari.properties file, the following recovery commands are supported. DEFAULT means auto start is disabled by default.

Summary of recovery_type values and state transitions:

recovery_type | Commands                       | State transitions
AUTO_START    | Start                          | INSTALLED -> STARTED
FULL          | Install, Start, Restart, Stop  | INIT -> INSTALLED, INIT -> STARTED, INSTALLED -> STARTED, STARTED -> STARTED, STARTED -> INSTALLED
DEFAULT       | None                           | Auto start feature disabled

Detailed state transitions for the various recovery_type values:

Current state | Desired state | Recovery command | Recovery mode | Remarks
INSTALLED     | STARTED       | Start            | AUTO_START    | Start a component
INSTALLED     | STARTED       | Start            | FULL          | Start a component
INSTALLED     | INSTALLED     | Install          | FULL          | Stale component configurations
INIT          | STARTED       | Install          | FULL          | Start a component
INIT          | INSTALLED     | Install          | FULL          | Install a component
STARTED       | STARTED       | Restart          | FULL          | Stale component configurations
STARTED       | INSTALLED     | Stop             | FULL          | Stop a component

How auto start works in Ambari version 2.4.0

Recovery scenarios

Note that only the AUTO_START recovery mode is supported, i.e., components that are in the INSTALLED state can be transitioned to the STARTED state. The ambari server sends the AUTO_START value for the recovery type to the agent. Sample recovery configuration sent by the server to the agent:

    "recoveryConfig": {
      "type" : "AUTO_START",
      "maxCount" : 10,
      "windowInMinutes" : 60,
      "retryGap" : 0,
      "components" : "a,b",
      "recoveryTimestamp" : ...
    }

Enabling or disabling the auto start feature from the UI:
- New RESTful APIs capture the service and component names for auto start
- Multi-instance services and components are supported

Fresh installs and upgrades

In a fresh install, all services will be set to auto start by default. In upgrades this will not be the default; the user has to enable auto start via the UI.

Maintenance mode

Auto start is ignored for host components that are in maintenance mode. A host component can be in maintenance mode for one or more of the following reasons:
- The host component was placed in maintenance mode
- The host was placed in maintenance mode
- The service was placed in maintenance mode
- The cluster that the host belongs to was placed in maintenance mode

The maintenance state of a component is obtained from the maintenance_state field in the hostcomponentdesiredstate table:

cluster_id | host_id | service_name | component_name | maintenance_state

Auto start properties

The auto start setting is per service instance and is stored in the recovery_enabled field of the servicecomponentdesiredstate table. However, all the other properties (recovery.type, recovery.lifetime_max_count, recovery.max_count, recovery.window_in_minutes, recovery.retry_interval) are global: they apply to all service/component instances in the cluster and are stored in the clusterconfig table under the cluster-env property. This is because a per-service-instance or per-component-instance setting would be too noisy with little or no benefit.

Persistence

Properties for auto start will be stored in the database. The idea is to use the servicecomponentdesiredstate and clusterconfig tables and distribute the information across them.

Blueprint based deployments

For blueprint based deployments (headless deployments): blueprints currently have no room for specifying settings properties, so the blueprint schema will have to be modified to accommodate settings.
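The state transition tables above boil down to a small decision function. The following Python sketch is illustrative only, not the actual recovery manager code; it derives a recovery command from the current state, the desired state, and the configured recovery type, skipping components in maintenance mode:

```python
def recovery_command(current, desired, recovery_type, in_maintenance=False):
    """Return the recovery command for a component, or None if no action applies.

    Mirrors the state-transition tables: AUTO_START only handles
    INSTALLED -> STARTED; FULL also handles install/restart/stop cases.
    """
    if in_maintenance or recovery_type == "DEFAULT":
        return None  # auto start disabled or component in maintenance mode
    if current == "INSTALLED" and desired == "STARTED":
        return "START"  # supported by both AUTO_START and FULL
    if recovery_type == "FULL":
        if current == "INIT":
            return "INSTALL"  # INIT -> INSTALLED or INIT -> STARTED
        if current == "INSTALLED" and desired == "INSTALLED":
            return "INSTALL"  # stale component configurations
        if current == "STARTED" and desired == "STARTED":
            return "RESTART"  # stale component configurations
        if current == "STARTED" and desired == "INSTALLED":
            return "STOP"
    return None
```

In 2.4.0 only the first branch after the maintenance check is ever taken, since the server always sends AUTO_START as the recovery type.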
A blueprint can specify that all components, or a set of components, are to be auto started. If it is a set, the list of components must be called out explicitly. To auto start all components, specify recovery_enabled="true" at the cluster level:

    "settings" : [
      { "recovery_settings" : [
          { "recovery_enabled" : "true" }
      ]}
    ],
    "Blueprints" : {
      "stack_name" : "HDP",
      "stack_version" : "2.5"
    }

METRICS_COLLECTOR is specified as the default auto started component in both the UI and blueprints via the stack definition, with the ability for blueprint authors to remove METRICS_COLLECTOR from auto start. Blueprints can override the default list specified in the stack definition. During deployment, the servicecomponentdesiredstate table's recovery_enabled field is set to true or false for each component.

Attributes will be stored in cluster-env.xml, which contains the following non-volatile properties:
- recovery_type
- recovery_lifetime_max_count
- recovery_max_count
- recovery_window_in_minutes
- recovery_retry_interval
- recovery_enabled

/var/lib/ambari-server/resources/stacks/HDP/<version>/configuration/cluster-env.xml:

    <configuration>
      <property>
        <name>recovery_type</name>
        <value>AUTO_START</value>
        <description>Recovery type</description>
      </property>
      :
      :
    </configuration>

Enabling components for auto start

Components can be enabled for auto start in any of the following ways:

1. Stack definition: /var/lib/ambari-server/resources/common-services/<service_name>/<version>/metainfo.xml specifies whether a component is enabled for auto start.
To enable a component for auto start in the stack definition, the XML snippet <recovery_enabled>true</recovery_enabled> should be specified. For example, to enable the METRICS_COLLECTOR component of AMBARI_METRICS for auto start, its stack definition file common-services/AMBARI_METRICS/0.1.0/metainfo.xml should contain the <recovery_enabled> line shown below:

    <metainfo>
      <schemaVersion>2.0</schemaVersion>
      <services>
        <service>
          <name>AMBARI_METRICS</name>
          <displayName>Ambari Metrics</displayName>
          <version>0.1.0</version>
          <comment>A system for metrics collection that provides storage and retrieval
          capability for metrics collected from the cluster</comment>
          <components>
            <component>
              <name>METRICS_COLLECTOR</name>
              <displayName>Metrics Collector</displayName>
              <category>MASTER</category>
              <recovery_enabled>true</recovery_enabled>
              ...

2. Blueprint definition: When using blueprint deployments, the components specified in the blueprint JSON override the ones specified in the stack definition.

3. UI based deployments: Based on the stack definition, while deploying a cluster using the UI, the backend updates the servicecomponentdesiredstate table's new recovery_enabled field with true/false depending on whether the component is enabled or disabled for auto start. Changes to the auto start value of one or more components are made from the UI. The changes are written to the servicecomponentdesiredstate table (recovery_enabled column), which is the source of truth when the ambari server communicates with the ambari agent.

Blueprint schema

Use the cluster-env section in the blueprint JSON to specify cluster specific auto start attributes. JSON for enabling auto start:

    "settings" : [
      { "recovery_settings" : [
          { "recovery_enabled" : "true" }
      ]},
      { "service_settings" : [
          { "name" : "HDFS", "recovery_enabled" : "false" },
          { "name" : "TEZ", "recovery_enabled" : "false" }
      ]},
      { "component_settings" : [
          { "name" : "DATANODE", "recovery_enabled" : "true" }
      ]}
    ]

The blueprint processor hands this list off to the deployment module so that the servicecomponentdesiredstate table can be updated.

Component auto start hierarchy

The stack definition contains the default list of components to be enabled or disabled. A blueprint can use the cluster-env section to specify a list that overrides the one in the stack definition. The UI gets its list from the stack definition. The backend updates the servicecomponentdesiredstate table with the list coming in from the UI or the blueprint.

Ambari Metrics Service specific changes

The METRICS_COLLECTOR component is set to auto start by default in ambari.properties in Ambari versions earlier than 2.4.0. In 2.4.0, this setting has been migrated to /var/lib/ambari-server/resources/common-services/AMBARI_METRICS/<version>/metainfo.xml with the <recovery_enabled>true</recovery_enabled> entry.

Backward compatibility

ambari.properties will be ignored. All values come from either the stack definition (for UI based deployments) or the blueprint (for blueprint based deployments). cluster-env.xml, or the cluster-env section of the blueprint, supplies the auto start properties listed above. Pre-populated settings in the DB: during deployment, the backend populates the servicecomponentdesiredstate table with true/false values for the various components, coming from the stack definition or the blueprint.

Communication

The ambari agent communicates with the ambari server during registration (start up) and with periodic heartbeats. These are the events when the server can send information to the agent about changes to the auto start property of services and components, giving the agent an opportunity to apply those changes.

Registration

The server sends the following JSON to the agent during registration:

    "recoveryConfig": {
      "type" : "AUTO_START",
      "maxCount" : "5",
      "windowInMinutes" : 20,
      "retryGap" : 2,
      "maxLifetimeCount" : 5,
      "components" : "METRICS_COLLECTOR,OOZIE_SERVER"
    }
The components member contains the list of components that are enabled for auto start and are not in maintenance mode.

Heartbeat

If the auto start value for one or more components changes, and/or the cluster-env level recovery properties change, the above JSON is constructed with the changed components and sent to the agent during the subsequent heartbeat.

Database

Cluster specific properties

The following cluster level properties are stored under the cluster-env type in the clusterconfig table as JSON:

Property name               | Value(s)            | Description
recovery_type               | DEFAULT, AUTO_START | DEFAULT: no auto start. AUTO_START: auto start only.
recovery_lifetime_max_count |                     |
recovery_max_count          |                     |
recovery_window_in_minutes  |                     |
recovery_retry_interval     |                     |
recovery_enabled            | true, false         | Cluster level recovery

clusterconfig table:

cluster_id | type_name   | version_tag | version | config_data
2          | cluster-env | version1    | 1       | ...,"recovery_lifetime_max_count":"1024","recovery_max_count":"6","recovery_type":"AUTO_START","recovery_retry_interval":"5"

The recovery_enabled value from clusterconfig overrides the value from servicecomponentdesiredstate for that cluster.

Service component specific properties

The servicecomponentdesiredstate table will be used to specify whether a component is enabled for auto start or not. The recovery_enabled column is new. Existing attributes in ambari.properties are mapped to the new columns here.
recovery.disabled_components / recovery.enabled_components -> recovery_enabled (boolean)

cluster_id | component_name    | service_name   | recovery_enabled
2          | YARN_CLIENT       | YARN           | 0
2          | METRICS_COLLECTOR | AMBARI_METRICS | 1
2          | OOZIE_SERVER      | OOZIE          | 1

REST API

Get auto-start flags of a cluster

Type: GET
Request: api/v1/clusters/<cluster_name>?fields=Clusters/desired_configs/cluster-env

Example response:

    {
      "href" : "...",
      "Clusters" : {
        "cluster_name" : "testcluster",
        "version" : "HDP-2.2",
        "desired_configs" : {
          "cluster-env" : {
            "tag" : "version1",
            "user" : "admin",
            "version" : 1
          }
        }
      }
    }

Type: GET
Request: api/v1/clusters/<cluster_name>/configurations?type=cluster-env&tag=version<xxx>

Example response:

    {
      "href": "...",
      "items": [
        {
          "href": "...",
          "tag": "version<xxx>",
          "type": "cluster-env",
          "version": 2,
          "Config": {
            "cluster_name": "c1",
            "stack_id": "HDP-2.3"
          },
          "properties": {
            "fetch_nonlocal_groups": "true",
            "ignore_groupsusers_create": "false",
            "kerberos_domain": "EXAMPLE.COM",
            "override_uid": "true",
            "repo_suse_rhel_template": "...",
            "repo_ubuntu_template": "package_type base_url components",
            "security_enabled": "false",
            "smokeuser": "ambari-qa",
            "smokeuser_keytab": "/etc/security/keytabs/smokeuser.headless.keytab",
            "user_group": "hadoop",
            "recovery_enabled": "false",
            "recovery_type": "AUTO_START",
            "recovery_lifetime_max_count": "10",
            "recovery_max_count": "2",
            "recovery_window_in_minutes": "10",
            "recovery_retry_interval": "5000"
          }
        }
      ]
    }

Set auto-start flags of a cluster

Type: PUT
Request: api/v1/clusters/<cluster_name>

    {
      "Clusters": {
        "desired_config": {
          "tag": "version<xxx>",
          "type": "cluster-env",
          "properties": {
            "fetch_nonlocal_groups": "true",
            "ignore_groupsusers_create": "false",
            "kerberos_domain": "EXAMPLE.COM",
            "override_uid": "true",
            "repo_suse_rhel_template": "...",
            "repo_ubuntu_template": "...",
            "security_enabled": "false",
            "smokeuser": "ambari-qa",
            "smokeuser_keytab": "...",
            "user_group": "hadoop",
            "recovery_enabled": "true",
            "recovery_type": "AUTO_START",
            "recovery_lifetime_max_count": "10",
            "recovery_max_count": "2",
            "recovery_window_in_minutes": "10",
            "recovery_retry_interval": "5000"
          }
        }
      }
    }

Get auto-start flags of all components

Type: GET
Request: api/v1/clusters/<cluster_name>/components?fields=ServiceComponentInfo/component_name,ServiceComponentInfo/service_name,ServiceComponentInfo/category,ServiceComponentInfo/recovery_enabled

Success response: application/json

Example response:

    {
      "href": "...",
      "items": [
        {
          "href": "...",
          "ServiceComponentInfo": {
            "category": "SLAVE",
            "component_name": "DATANODE",
            "service_name": "HDFS",
            "cluster_name": "c1",
            "recovery_enabled": "true"
          }
        },
        {
          "href": "...",
          "ServiceComponentInfo": {
            "category": "MASTER",
            "cluster_name": "c1",
            "component_name": "NAMENODE",
            "service_name": "HDFS",
            "recovery_enabled": "true"
          }
        },
        {
          "href": "...",
          "ServiceComponentInfo": {
            "category": "SLAVE",
            "cluster_name": "c1",
            "component_name": "JOURNALNODE",
            "service_name": "HDFS",
            "recovery_enabled": "false"
          }
        }
      ]
    }

Error response: 400 Bad Request

    {
      "status" : <status>,
      "message" : <error message>
    }

Set auto-start flags of all components

Type: PUT
Request 1: api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(<enabled_component_names>)

Request body: application/json

    {
      "ServiceComponentInfo": {
        "recovery_enabled": "true"
      }
    }

Request 2: api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(<disabled_component_names>)

Request body: application/json

    {
      "ServiceComponentInfo": {
        "recovery_enabled": "false"
      }
    }

Success response: 202 Accepted
Error response: 400 Bad Request

    {
      "status" : <status>,
      "message" : <error message>
    }

Request 3: api/v1/clusters/testcluster/components/ZOOKEEPER_SERVER -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'

Request 4: api/v1/clusters/testcluster/components?ServiceComponentInfo/component_name=ZOOKEEPER_SERVER -d '{"ServiceComponentInfo" : {"recovery_enabled":"false"}}'

Request 5: curl -u admin:admin -H "X-Requested-By: ambari" -X PUT '...ServiceComponentInfo/component_name.in(ZOOKEEPER_SERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"false"}}'

Request 6: curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo": {"query": "ServiceComponentInfo/component_name.in(ZOOKEEPER_CLIENT,ZOOKEEPER_SERVER)"}, "ServiceComponentInfo" : {"recovery_enabled":"true"}}'
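To tie the API shapes together, here is a small Python sketch that builds the URL and request body for setting the auto-start flag on a set of components. The host name and cluster name are placeholders, and only payload construction is shown; no network call is made.

```python
import json

AMBARI_BASE = "http://ambari-host:8080/api/v1"  # placeholder host

def set_recovery_payload(enabled: bool) -> str:
    """Body for PUT .../components?...component_name.in(...), as in Requests 1 and 2."""
    return json.dumps({
        "ServiceComponentInfo": {"recovery_enabled": str(enabled).lower()}
    })

def components_url(cluster: str, components: list) -> str:
    """Predicate-style URL selecting the components to update."""
    names = ",".join(components)
    return (f"{AMBARI_BASE}/clusters/{cluster}/components"
            f"?ServiceComponentInfo/component_name.in({names})")

url = components_url("testcluster", ["ZOOKEEPER_SERVER", "OOZIE_SERVER"])
body = set_recovery_payload(True)
```

The resulting url and body could then be sent with any HTTP client (with the X-Requested-By: ambari header and admin credentials, as in the curl examples above).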
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationStorageTapper. Real-time MySQL Change Data Uber. Ovais Tariq, Shriniket Kale & Yevgeniy Firsov. October 03, 2017
StorageTapper Real-time MySQL Change Data Streaming @ Uber Ovais Tariq, Shriniket Kale & Yevgeniy Firsov October 03, 2017 Overview What we will cover today Background & Motivation High Level Features System
More informationQuick Install for Amazon EMR
Quick Install for Amazon EMR Version: 4.2 Doc Build Date: 11/15/2017 Copyright Trifacta Inc. 2017 - All Rights Reserved. CONFIDENTIAL These materials (the Documentation ) are the confidential and proprietary
More informationOracle Cloud Using Oracle Big Data Cloud. Release 18.1
Oracle Cloud Using Oracle Big Data Cloud Release 18.1 E70336-14 March 2018 Oracle Cloud Using Oracle Big Data Cloud, Release 18.1 E70336-14 Copyright 2017, 2018, Oracle and/or its affiliates. All rights
More informationEnterprise Data Catalog Fixed Limitations ( Update 1)
Informatica LLC Enterprise Data Catalog 10.2.1 Update 1 Release Notes September 2018 Copyright Informatica LLC 2015, 2018 Contents Enterprise Data Catalog Fixed Limitations (10.2.1 Update 1)... 1 Enterprise
More informationDelft-FEWS2020 in your organization
Delft-FEWS2020 in your organization Impact and scope of the 3 roadmaps implemented Break-out Presentation DFUDA 2018 Gerben Boot (Deltares) 3 rd of May 2018 Overview Impact of the Delft-FEWS 2017.02 version
More informationGetting Started 1. Getting Started. Date of Publish:
1 Date of Publish: 2018-07-03 http://docs.hortonworks.com Contents... 3 Data Lifecycle Manager terminology... 3 Communication with HDP clusters...4 How pairing works in Data Lifecycle Manager... 5 How
More informationIntegrating Pruvan and Property Pres Wizard (PPW) with the Enhanced Driver
Integrating Pruvan and Property Pres Wizard (PPW) with the Enhanced Driver This document is a step-by-step guide on how to integrate your Pruvan account with Property Pres Wizard using the enhanced driver.
More informationCmprssd Intrduction To
Cmprssd Intrduction To Hadoop, SQL-on-Hadoop, NoSQL Arseny.Chernov@Dell.com Singapore University of Technology & Design 2016-11-09 @arsenyspb Thank You For Inviting! My special kind regards to: Professor
More informationQualys Cloud Suite 2.28
Qualys Cloud Suite 2.28 We re excited to tell you about improvements and enhancements in Qualys Cloud Suite 2.28. AssetView ThreatPROTECT View Policy Compliance Summary in Asset Details Export Dashboards
More informationLesson 7: Defining an Application
35 Lesson 7: Defining an Application In this lesson, we will define two new applications in the realm server, with an endpoint for each application. We will also define two new transports to be used by
More informationInstalling an HDF cluster
3 Installing an HDF cluster Date of Publish: 2018-08-13 http://docs.hortonworks.com Contents Installing Ambari...3 Installing Databases...3 Installing MySQL... 3 Configuring SAM and Schema Registry Metadata
More informationTroubleshooting Cloudbreak
2 Troubleshooting Cloudbreak Date of Publish: 2018-09-14 http://docs.hortonworks.com Contents Getting help... 3 HCC...3 Flex subscription...3 Configure SmartSense... 3 Register and manage Flex subscriptions...4
More informationApache ZooKeeper ACLs
3 Apache ZooKeeper ACLs Date of Publish: 2018-07-15 http://docs.hortonworks.com Contents Apache ZooKeeper ACLs Best Practices...3 ZooKeeper ACLs Best Practices: Accumulo... 3 ZooKeeper ACLs Best Practices:
More informationHadoop-PR Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer)
Hortonworks Hadoop-PR000007 Hortonworks Certified Apache Hadoop 2.0 Developer (Pig and Hive Developer) http://killexams.com/pass4sure/exam-detail/hadoop-pr000007 QUESTION: 99 Which one of the following
More informationHortonworks University. Education Catalog 2018 Q1
Hortonworks University Education Catalog 2018 Q1 Revised 03/13/2018 TABLE OF CONTENTS About Hortonworks University... 2 Training Delivery Options... 3 Available Courses List... 4 Blended Learning... 6
More informationQuestion: 1 Which item must be enabled on the client side to allow users to complete certification in offline mode?
Volume: 81 Questions Question: 1 Which item must be enabled on the client side to allow users to complete certification in offline mode? A. In Microsoft Excel, navigate to Excel Options >Trust Center tab
More informationHadoop Integration User Guide. Functional Area: Hadoop Integration. Geneos Release: v4.9. Document Version: v1.0.0
Hadoop Integration User Guide Functional Area: Hadoop Integration Geneos Release: v4.9 Document Version: v1.0.0 Date Published: 25 October 2018 Copyright 2018. ITRS Group Ltd. All rights reserved. Information
More informationRAFT library for Java
k8s : h4p RAFT library for Java RAFT library for Java RAFT library for Java RAFT library for Java https://flokkr.github.io What is Apache Hadoop What is Apache Hadoop in 60 seconds HDFS
More informationBoni Bruno, Chief Solutions Architect, EMC BLUETALON AUDITING AND AUTHORIZATION WITH HDFS ON ISILON ONEFS V8.0
Boni Bruno, Chief Solutions Architect, EMC BLUETALON AUDITING AND AUTHORIZATION WITH HDFS ON ISILON ONEFS V8.0 1 Secure, Fast, Flexible Hadoop Data Security Solution for Enterprises Analyze data at any
More informationDocument Number ECX-Exchange2010-Migration-QSG, Version 1, May 2015 Copyright 2015 NEC Corporation.
EXPRESSCLUSTER X for Windows Quick Start Guide for Microsoft Exchange Server 2010 Migration from a single-node configuration to a two-node mirror disk cluster Version 1 NEC EXPRESSCLUSTER X 3.x for Windows
More informationSQLSplitter v Date:
SQLSplitter v2.0.1 Date: 2017-02-18 1 Contents Introduction... 3 Installation guide... 4 Create S3 bucket access policy... 4 Create a role for your SQLSplitter EC2 machine... 5 Set up your AWS Marketplace
More informationManaging Data Operating System
3 Date of Publish: 2018-12-11 http://docs.hortonworks.com Contents ii Contents Introduction...4 Understanding YARN architecture and features... 4 Application Development... 8 Using the YARN REST APIs to
More informationSAS Event Stream Processing 4.3: Visualizing Event Streams with Streamviewer
SAS Event Stream Processing 4.3: Visualizing Event Streams with Streamviewer Overview Streamviewer provides a user interface that enables you to subscribe to window event streams from one or more event
More information@joerg_schad Nightmares of a Container Orchestration System
@joerg_schad Nightmares of a Container Orchestration System 2017 Mesosphere, Inc. All Rights Reserved. 1 Jörg Schad Distributed Systems Engineer @joerg_schad Jan Repnak Support Engineer/ Solution Architect
More informationBlueprint 7.3 Manual Upgrade Guide
http://documentation.blueprintcloud.com Blueprint 7.3 Manual Upgrade Guide 2016 Blueprint Software Systems Inc. All rights reserved 8/24/2016 Contents Blueprint Manual Upgrade Guide 4 Overview 4 Important
More informationEMARSYS FOR MAGENTO 2
EMARSYS FOR MAGENTO 2 Integration Manual July 2017 Important Note: This PDF was uploaded in July, 2017 and will not be maintained. For the latest version of this manual, please visit our online help portal:
More informationOracle BDA: Working With Mammoth - 1
Hello and welcome to this online, self-paced course titled Administering and Managing the Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled Working With Mammoth.
More informationInsight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold
Insight Case Studies Tuning the Beloved DB-Engines Presented By Nithya Koka and Michael Arnold Who is Nithya Koka? Senior Hadoop Administrator Project Lead Client Engagement On-Call Engineer Cluster Ninja
More informationVMware vrealize Operations for Horizon Installation
VMware vrealize Operations for Horizon Installation vrealize Operations for Horizon 6.4 Installation vrealize Operations for Horizon 6.4 This document supports the version of each product listed and supports
More informationSpecification 11/07/2017. Copyright 2017 FUJITSU LIMITED. Version 5.0
Specification irmc RESTful API Version 5.0 11/07/2017 Copyright 2017 FUJITSU LIMITED Designations used in this document may be trademarks, the use of which by third parties for their own purposes could
More informationInfosys Information Platform. How-to Launch on AWS Marketplace Version 1.2.2
Infosys Information Platform How-to Launch on AWS Marketplace Version 1.2.2 Copyright Notice 2016 Infosys Limited, Bangalore, India. All Rights Reserved. Infosys believes the information in this document
More informationVMware vsphere Big Data Extensions Command-Line Interface Guide
VMware vsphere Big Data Extensions Command-Line Interface Guide vsphere Big Data Extensions 2.0 This document supports the version of each product listed and supports all subsequent versions until the
More informationKnown Issues for Oracle Big Data Cloud. Topics: Supported Browsers. Oracle Cloud. Known Issues for Oracle Big Data Cloud Release 18.
Oracle Cloud Known Issues for Oracle Big Data Cloud Release 18.1 E83737-14 March 2018 Known Issues for Oracle Big Data Cloud Learn about issues you may encounter when using Oracle Big Data Cloud and how
More informationOpenShift Roadmap Enterprise Kubernetes for Developers. Clayton Coleman, Architect, OpenShift
OpenShift Roadmap Enterprise Kubernetes for Developers Clayton Coleman, Architect, OpenShift What Is OpenShift? Application-centric Platform INFRASTRUCTURE APPLICATIONS Use containers for efficiency Hide
More information