Workload Experience Manager


Important Notice

Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required notices. A copy of the Apache License Version 2.0, including any notices, is included herein. A copy of the Apache License Version 2.0 can also be found here:

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
For information about patents covering Cloudera products, see

The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Cloudera, Inc.
395 Page Mill Road
Palo Alto, CA
info@cloudera.com
US:
Intl:

Release Information
Version: Workload Experience Manager 1.0.x
Date: November 26, 2018

Table of Contents

Overview of Workload Experience Manager...5
Default Time Range...5
Workload XM Diagnostic Data Collection...5
Sources of Data Sent to Workload XM...5
Diagnostic Data Collection Details...6
Redaction Capabilities for Diagnostic Data...6
What's New from Workload XM...8
Download SQL Commands to Address "Corrupt Table Statistics" and "Missing Table Statistics" Query Health Checks...8
New Log and Query Redaction Configuration Properties for Telemetry Publisher...9
Proxy Server Support for Telemetry Publisher...9
Multiple Usability Improvements...9
Setting Up Workload Experience Manager with Telemetry Publisher...12
Pre-Requisites for Setting Up Workload XM...12
Configuring a Firewall for Workload XM...12
Redact Data Before Sending to Workload XM...13
Connecting Cloudera Manager to Workload XM...14
Step 1. Get Altus Credentials...14
Step 2. Add Altus Credentials to Cloudera Manager...15
Step 3. Add the Telemetry Publisher Service Role...15
Configuring Telemetry Publisher When Key Trustee Is Enabled...18
Logging In to Workload XM...19
Using Workload Experience Manager (XM)...20
Default Time Range...20
Common Use Cases...20
Troubleshooting Abnormal Job Durations...20
(Hadoop Administrators) Troubleshooting Failed Data Engineering Jobs...24
(Application Developers) Determining Cause for Slow and Failed Queries...25
Workload Experience Manager (XM) Reference...28

Data Warehouse (Apache Impala) Query Status...28
Data Warehouse (Apache Impala) Query Types...28
Data Warehouse (Apache Impala) Health Checks...31
Potential SQL Issues...35
Data Engineering (Apache Hive, Spark, MapReduce) Health Checks...37
Appendix: Apache License, Version

Overview of Workload Experience Manager

Workload Experience Manager (Workload XM) is a tool that provides insights to help you gain an in-depth understanding of the workloads you send to clusters managed by Cloudera Manager. In addition, it provides information that can be used for troubleshooting failed jobs and optimizing slow jobs that run on those clusters.

After a job ends, information about job execution is sent to Workload XM by the Telemetry Publisher, a role in the Cloudera Manager Management Service. Workload XM uses the information to display metrics about the performance of a job. Additionally, Workload XM compares the current run of a job to previous runs of the same job by creating baselines. You can use the knowledge gained from this information to identify and address abnormal or degraded performance or potential performance improvements.

Default Time Range

If you have not specified a time range, Workload Experience Manager (Workload XM) displays data for the last 24 hours by default. If no data is available for the last 24 hours, Workload XM displays the full range that is available.

Workload XM Diagnostic Data Collection

When you enable Workload XM, the Cloudera Management Service starts the Telemetry Publisher role. Telemetry Publisher collects and transmits metrics, as well as configuration and log files, from the Impala, Oozie, Hive, YARN, and Spark services for jobs running on CDH clusters to Workload XM. Telemetry Publisher collects metrics for all clusters that use the environments where Workload XM is enabled. This topic describes the sources of information sent to Workload XM and how that data is redacted.

Sources of Data Sent to Workload XM

The above diagram shows the sources from which you can configure Telemetry Publisher to collect diagnostic data. This data is collected in the following ways:

Pull
Telemetry Publisher pulls diagnostic data from these services periodically (once per minute, by default). These sources are indicated by the outbound arrows leading from Telemetry Publisher in the above diagram. They are Oozie, YARN, and Spark.

Push
A Cloudera Manager Agent pushes diagnostic data from these services to Telemetry Publisher within 5 seconds after a job finishes. These sources are indicated by the inbound arrows to Telemetry Publisher. They are Hive and Impala.

After the diagnostic data reaches Telemetry Publisher, it is stored temporarily in its data directory and periodically (once per minute) exported to Workload XM.

Diagnostic Data Collection Details

The diagnostic data collected by Telemetry Publisher and sent to Workload XM includes the following:

MapReduce Jobs
Telemetry Publisher polls the YARN Job History Server for recently completed MapReduce jobs. For each of these jobs, Telemetry Publisher collects the configuration and the jhist file, which is the job history file that contains job and task counters, from HDFS. Telemetry Publisher can be configured to collect MapReduce task logs from HDFS and send them to Workload XM. By default, this log collection is turned off.

Spark Applications
Telemetry Publisher polls the Spark History Server for recently completed Spark applications. For each of these applications, Telemetry Publisher collects their event log from HDFS. Telemetry Publisher only collects Spark application data from Spark version 2.2 and later. Telemetry Publisher can be configured to collect the executor logs of Spark applications from HDFS and send them to Workload XM, but this data collection is turned off by default.
Important: CDH version 5.x is packaged with Spark 1.6, so you cannot configure Telemetry Publisher data collection for CDH 5.x clusters unless you are using CDS 2.2 Powered by Apache Spark or later versions with those clusters.

Oozie Workflows
Telemetry Publisher polls Oozie servers for recently completed Oozie workflows and sends their details to Workload XM.

Hive Queries
Telemetry Publisher uses the same mechanism used by Cloudera Navigator (a Cloudera Manager Agent) to collect Hive query audits. The Cloudera Manager Agent periodically searches for query detail files that are generated by HiveServer2 after a query completes, and then sends the details from those files to Telemetry Publisher.

Important: Cloudera Navigator does not need to be enabled on the cluster, but Hive query audits must be enabled.

Impala Queries
A Cloudera Manager Agent periodically looks for query profiles of recently completed queries and sends them to Telemetry Publisher.

Redaction Capabilities for Diagnostic Data

The diagnostic data collected by Telemetry Publisher might contain sensitive data in the job configurations or the logs. There are several ways you can redact sensitive data before it is sent to Telemetry Publisher. Cloudera recommends enabling the following redaction features even if you are not sending diagnostic data to Telemetry Publisher:

Log and query redaction
Refer to the Workload XM documentation and the Cloudera Manager documentation for information about log and query redaction. This redaction feature enables you to redact information in logs and queries collected by Telemetry Publisher based on filters created with regular expressions.

MapReduce job properties redaction
You can redact job configuration properties before they are stored in HDFS. Since Telemetry Publisher reads job configuration files from HDFS, it only fetches redacted configuration information. See Redacting MapReduce Job Properties on page 14 for more information.
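The regular-expression filters mentioned above work by substituting matched content before it leaves the cluster. The sketch below illustrates the mechanism with a made-up rule (masking anything shaped like a credit card number); it is not one of Cloudera's shipped filters:

```python
import re

# Illustrative log/query redaction filter: content matching the pattern is
# replaced before the text is collected. The card-number rule here is an
# example, not a Cloudera default.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")

def redact_log_line(line: str) -> str:
    """Return the line with any card-number-shaped token masked."""
    return CARD_PATTERN.sub("XXXX-XXXX-XXXX-XXXX", line)

print(redact_log_line("charged card 4111-1111-1111-1111 for order 42"))
```

A real deployment would configure such patterns through Cloudera Manager's redaction settings rather than in application code; the point is only that each filter is a pattern plus a replacement string.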

Spark event and executor log redaction
The Spark2 on YARN service has the spark.redaction.regex configuration property that can be used to redact sensitive data from event and executor logs. When this configuration property is enabled, Telemetry Publisher sends only redacted data to Workload XM. This configuration property is enabled by default, but can be overridden by using safety valves in Cloudera Manager or in the Spark application itself.

See Redact Data Before Sending to Workload XM on page 13 for more information about data redaction in Workload XM.
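The behavior of a key-matching rule like spark.redaction.regex can be sketched in a few lines. The pattern below mirrors Spark's documented default, (?i)secret|password (an assumption worth verifying against your Spark version); any configuration entry whose key name matches has its value masked:

```python
import re

# Key-matching redaction in the style of spark.redaction.regex. The pattern
# shown mirrors Spark's documented default; verify it against your Spark
# version. Values of matching keys are masked before event logs or executor
# logs are written.
REDACTION_PATTERN = re.compile(r"(?i)secret|password")

def redact_properties(entries: dict) -> dict:
    """Mask the value of every entry whose key matches the pattern."""
    return {
        key: "*********" if REDACTION_PATTERN.search(key) else value
        for key, value in entries.items()
    }

print(redact_properties({"spark.ssl.keyPassword": "hunter2",
                         "spark.app.name": "nightly-etl"}))
```

Note the difference from log redaction: this rule matches property key names, not log content, which is why leaving the default in place keeps credentials out of event and executor logs.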

What's New from Workload XM

Download SQL Commands to Address "Corrupt Table Statistics" and "Missing Table Statistics" Query Health Checks

If your queries trigger the Corrupt Table Statistics or the Missing Table Statistics health checks, Workload XM generates the SQL code you can copy and run on your cluster to address these issues.

To download SQL code for creating or repairing table statistics:

1. Under Data Warehouse, select Queries.
2. On the Queries page, select the time period you want to investigate in the Range column.
3. In the Health Check column, select either Corrupt Table Statistics or Missing Table Statistics. This filters out queries that do not trigger these health checks.
4. Click the query to view its details.
5. In the Performance Issues region of the query details page, click the Health Check Violations tab. This lists the health checks that were triggered for this query. It is here that you see the SQL code that you can copy and run to repair the table statistics issues.
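In Impala, missing statistics are created with COMPUTE STATS, and corrupt statistics are typically discarded with DROP STATS before being recomputed. The sketch below shows how such repair SQL can be assembled; the exact statements Workload XM generates for a given query may differ:

```python
# Hedged sketch of the kind of repair SQL produced for these health checks.
# COMPUTE STATS and DROP STATS are standard Impala statements; the table
# name is supplied by the caller.
def stats_repair_sql(table: str, corrupt: bool = False) -> list:
    """Return the statements that rebuild statistics for the given table."""
    statements = []
    if corrupt:
        # Discard the bad statistics before recomputing them.
        statements.append(f"DROP STATS {table};")
    statements.append(f"COMPUTE STATS {table};")
    return statements

print(stats_repair_sql("sales.orders", corrupt=True))
```

Running the generated statements in impala-shell (or any Impala client) refreshes the table and column statistics that the planner relies on.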

New Log and Query Redaction Configuration Properties for Telemetry Publisher

In Cloudera Manager 5.16, you can now configure log and query redaction for the Telemetry Publisher service in Cloudera Manager. By default, this configuration is enabled. For more information, see Log and Query Redaction for the Telemetry Publisher Service on page 16.

Proxy Server Support for Telemetry Publisher

In Cloudera Manager 5.16, you can now configure the Telemetry Publisher service to send metrics, as well as configuration and log files, to Workload XM by way of a proxy server for database and Altus metrics uploads. For more information, see Configuring Telemetry Publisher to Use a Proxy Server on page 17.

Multiple Usability Improvements

The Workload XM team is constantly improving usability. Here are some of our recent upgrades to the user experience:

Support for parsing Spark 2.3 application history logs.

Job history files and Spark event logs are now available to download from the Execution Detail tab on the Job detail page:

Figure 1: Download Job History Files

Figure 2: Download Spark Event Logs

Additions to the Query Detail page. Now you can download the query profile for Impala queries and view the total number of joins performed for a specific query.

New Concurrency chart added to the Data Warehouse Summary page. This chart shows query concurrency in the cluster during a selected time range. You can use this chart to gain insight, such as identifying potential resource contention or the busiest time of day on your cluster.


Setting Up Workload Experience Manager with Telemetry Publisher

Diagnostic information about job and query execution is sent to Workload Experience Manager (Workload XM) by Telemetry Publisher, a role in the Cloudera Manager Management Service. When new clusters are added with Cloudera Manager, Telemetry Publisher automatically sends the new cluster information to Workload XM. This section describes how to connect Cloudera Manager to Workload XM by configuring Telemetry Publisher:

Pre-Requisites for Setting Up Workload XM
Connecting Cloudera Manager to Workload XM
Configuring Telemetry Publisher When Key Trustee Is Enabled

Pre-Requisites for Setting Up Workload XM

Before you can set up Cloudera Manager's Telemetry Publisher service to send diagnostic data to Workload XM, you must make sure you have the correct versions of Cloudera Manager and CDH:

Supported Versions of Cloudera Manager and CDH

To use Workload XM with CDH clusters managed by Cloudera Manager, you must have the following versions for CDH 5.x clusters:

Cloudera Manager version and later
CDH version 5.8 and later

Important: Workload XM is not available on Cloudera Manager 6.0, whether you are managing CDH 5.x or CDH 6.x clusters.

After you have verified that you have the correct versions of Cloudera Manager and CDH, you must configure data redaction and your firewall. These topics are addressed in the following sections:

Configuring a Firewall for Workload XM

Workload XM is a cloud service which runs on Amazon Web Services (AWS). The Telemetry Publisher service, which was introduced in Cloudera Manager version , collects metrics from various components in a CDH cluster and securely sends these metrics by way of Transport Layer Security (HTTPS) over the internet to the Workload XM service, as shown in the following illustration.

To connect an on-premises CDH cluster to Workload XM, you must configure your firewall using the following information. The Cloudera Telemetry Publisher service makes outbound connections to two endpoints to communicate with Workload XM:

Endpoint #1: This endpoint maps to a dynamic IP address in AWS us-west-1. AWS us-west-1 IP address ranges are documented here.
Endpoint #2: This endpoint also maps to a dynamic IP address in AWS us-west-1. See the above link for the IP address ranges that are documented on the AWS web site.

Starting with Cloudera Manager version , you can also configure an HTTP proxy between Telemetry Publisher and Workload XM. In this configuration, the proxy acts as an HTTP tunnel for the encrypted TLS communication between Telemetry Publisher and Workload XM. See Configuring Telemetry Publisher to Use a Proxy Server on page 17 for details.

Redact Data Before Sending to Workload XM

Telemetry Publisher collects diagnostic data from logs, job configurations, and queries, and then sends this data to Workload XM. This diagnostic information might contain sensitive data, so it is desirable to redact the sensitive information before Telemetry Publisher sends it to Workload XM.

Redact Logs and Queries

To redact sensitive data in the CDH cluster, such as log files, use Cloudera Manager. See Log and Query Redaction in the Cloudera Manager documentation. However, note that this only redacts data, not metadata. Sensitive data in files is redacted, but the name, owner, and other metadata about the files is not redacted. The Cloudera documentation

referred to above explains what is redacted and what is not. Also see Log and Query Redaction for the Telemetry Publisher Service on page 16 for additional details about log and query redaction in Workload XM.

Redact Spark Data

The Spark on YARN service in CDH enables the spark.redaction.regex configuration property by default, which redacts sensitive data from event and executor logs. Do not override this setting, to ensure that Telemetry Publisher only sends redacted information to Workload XM.

Redacting MapReduce Job Properties

Set the mapreduce.job.redacted-properties configuration property for YARN to redact MapReduce job configuration properties before they are stored in HDFS. Telemetry Publisher reads the job configuration file from HDFS, so if you set this property for all the MapReduce jobs you use, only redacted job configuration information is fetched from HDFS.

To set this property in Cloudera Manager:

1. In the Cloudera Manager Admin Console, select the YARN service, and then click the Configuration tab.
2. Search for mapreduce.job.redacted-properties to locate this configuration property. By default, several MapReduce job properties are set. Leave these set as they are.
3. Click the plus sign after the last property listed and add any additional properties for your MapReduce jobs.
4. Click Save Changes and restart the YARN service.

Connecting Cloudera Manager to Workload XM

Diagnostic information about job and query execution is sent to Workload Experience Manager (Workload XM) by Telemetry Publisher, a role in the Cloudera Manager Management Service. When new clusters are added with Cloudera Manager, Telemetry Publisher automatically sends the new cluster information to Workload XM. This topic describes how to connect Cloudera Manager to Workload XM by connecting the Cloudera Manager Telemetry Publisher service role to a Cloudera Altus account.

Note: Cloudera recommends using Java 8.
If you are using Java 7, additional steps are required when you add the Telemetry Publisher service role.

Connecting Cloudera Manager Telemetry Publisher to Workload XM is a three-step process. After connecting Cloudera Manager to Workload XM, you must enable Log and Query Redaction for the Telemetry Publisher Service on page 16. If you must use a proxy server with Workload XM, see Configuring Telemetry Publisher to Use a Proxy Server on page 17.

Step 1. Get Altus Credentials

In order to use Workload XM, you need an Altus account. For more information about how to set up an Altus account, see the Cloudera Altus documentation.

1. Go to wxm.cloudera.com, and follow the prompts to set up your account.
2. On the Altus Home page, click your user name in the upper right corner of the page, and select My Account.
3. Click Generate Access Key. This creates an Altus Access Key ID and Altus Private Key. The Altus Access Key ID and Altus Private Key are needed to add an Altus account to Cloudera Manager.

Note: The Cloudera Altus console displays the API access key immediately after you create it. You must copy or download the access key information when it is displayed. Do not exit the console without copying the keys. After you exit the console, there is no other way to view or copy the access key.

Step 2. Add Altus Credentials to Cloudera Manager

1. Sign in to the Cloudera Manager Admin Console.
2. Navigate to Administration > External Accounts > Altus Credentials.
3. Select Add Access Key Authentication, provide the following information, and click Add:
Name
Altus Access Key ID
Altus Private Key
4. Navigate to Administration > Settings. Type Altus in the search box to find the Telemetry Altus Account configuration setting. Then select the Altus credentials you created and named in Step 3.
5. Click Save Changes.

Step 3. Add the Telemetry Publisher Service Role

After you add an Altus account, add the Telemetry Publisher service role to the Cloudera Manager Service.

Important: Before you specify a host cluster for the Telemetry Publisher service, make sure that you name the cluster with a human-readable name in Cloudera Manager. If you do not name the cluster in Cloudera Manager before you associate the cluster with the Telemetry Publisher service, Workload XM identifies the cluster with a random string of 32 characters, such as 44a6e75e ea e84c2, which is difficult to identify and work with in the Workload XM application.

To rename a cluster in Cloudera Manager:

1. On the Home page of the Admin Console, click the Clusters drop-down list and select the cluster you want to rename.
2. On the cluster page, click the Actions menu adjacent to the cluster name, and select Rename Cluster.
3. In the Rename Cluster dialog box, type the new cluster name, and then click Rename Cluster.

To add the Telemetry Publisher service role:

1. In the Cloudera Manager Admin Console, navigate to Clusters > Cloudera Management Service.
2. Select Actions > Add Role Instances. The Add Role Instances wizard opens. If a Telemetry Publisher role already exists, Cloudera Manager does not let you add another.
3. Select a host for the Telemetry Publisher and complete the wizard.
4. If you are using Java 8, skip this step.
If you are using Java 7, you must configure Telemetry Publisher as follows:

1. In the Cloudera Manager Admin Console, click Cloudera Management Service.
2. On the Cloudera Management Service page, select the Configuration tab, and then select the Telemetry Publisher filter under Scope.
3. Type java configuration in the search text box to locate the Java Configuration Options for Telemetry Publisher configuration property, and add the following to the text box:

-Dhttps.protocols=TLSv1.2
-Dhttps.cipherSuites=TLS_RSA_WITH_AES_256_CBC_SHA256

Figure 3: Java 7 Configuration for Telemetry Publisher

4. Click Save Changes.
5. Go to Clusters > Cloudera Management Service and select the Telemetry Publisher role.
6. Click Actions > Test Altus Connection. A successful test indicates that the Telemetry Publisher can connect to Altus.
7. Go to Clusters > Hive > Instances and restart the roles for Hive.

Log and Query Redaction for the Telemetry Publisher Service

Log and query redaction for the Telemetry Publisher service is controlled with the Log and Query Redaction configuration property. This property is enabled by default and works with the log and query redaction property for HDFS. If you want to disable log and query redaction for the Telemetry Publisher service, you must also disable log and query redaction for HDFS, or the Telemetry Publisher service will not start. The Log and Query Redaction configuration property is only available in Cloudera Manager version 5.16 and later. For more information about log and query redaction, see the Cloudera Manager documentation.

Important: Cloudera strongly recommends that you enable log and query redaction for both HDFS and the Telemetry Publisher service to protect sensitive data from being accessed by unauthorized users.

If you must disable Telemetry Publisher log and query redaction for testing purposes:

1. In the Cloudera Manager Admin Console, navigate to Clusters > HDFS > Configuration, and type redact into the Search box to locate the log and query redaction properties for HDFS.
2. Uncheck the Enable Log and Query Redaction property, and then click Save Changes.
3. Still in the Cloudera Manager Admin Console, click Clusters > Cloudera Management Service > Configuration > Telemetry Publisher, type redact in the Search box, and uncheck the Log and Query Redaction property for the Telemetry Publisher Default Group.

4. Click Save Changes.
5. Restart both the HDFS and the Telemetry Publisher services to disable log and query redaction.

Configuring Telemetry Publisher to Use a Proxy Server

You can configure the Telemetry Publisher service to send metrics, as well as configuration and log files, to Workload XM by way of a proxy server for database and Altus metrics uploads. You cannot upload information from Amazon Web Services (AWS) by way of a proxy server. By default, this configuration property is disabled.

Telemetry Publisher uses the TLS/HTTPS protocol to send telemetry information to Workload XM. This ensures that the data is encrypted in flight. The proxy you use must support the HTTP CONNECT method in order to be able to pass through the encrypted messages. For more information, see the associated RFC. Telemetry Publisher support for proxy servers is only available in Cloudera Manager version 5.16 and later.

To enable Telemetry Publisher to send information by way of a proxy server:

1. In the Cloudera Manager Admin Console, navigate to Clusters > Cloudera Management Service > Configuration > Telemetry Publisher, and type proxy into the Search box to locate the proxy configuration properties.

2. Select Telemetry Publisher Default Group and provide the proxy server name, port, username, and password.
3. Click Save Changes, and then restart the Telemetry Publisher service.

Configuring Telemetry Publisher When Key Trustee Is Enabled

When Key Trustee is enabled, the default HDFS user for Telemetry Publisher (hdfs) does not have permission to download the relevant files from HDFS. The Telemetry Publisher user must be in both of the user groups that contain the Job History Server and the Spark History Server. For example, if the Job History Server is in the hadoop user group and the Spark History Server is in the spark user group, the Telemetry Publisher user must be in both the hadoop group and the spark group to download files from HDFS when Key Trustee is enabled.
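The group requirement described above can be expressed as a simple check: the Telemetry Publisher user needs membership in the group of each history server before it can read their files from HDFS. The hadoop and spark group names below are the examples from the text; real deployments may use different groups:

```python
# Sketch of the Key Trustee group requirement: the Telemetry Publisher user
# must belong to the groups of both the Job History Server and the Spark
# History Server. Group names are the examples from the text, not fixed
# values.
def can_read_history_files(user_groups, jhs_group="hadoop", shs_group="spark"):
    """Return True if the user is in both history-server groups."""
    required = {jhs_group, shs_group}
    return required.issubset(set(user_groups))

print(can_read_history_files(["hadoop", "spark"]))  # membership in both
print(can_read_history_files(["hadoop"]))           # missing the spark group
```

On a live host, the same check amounts to verifying the output of `groups <telemetry-publisher-user>` against the two history-server groups.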

Logging In to Workload XM

To access Workload XM, perform the following steps:

1. Log in to the Workload XM console: wxm.cloudera.com/
2. In Search, type the name of the cluster you want to analyze.
3. Select either the Data Warehouse or Data Engineering summary pages or details (queries or jobs) in the left navigation menu. These links launch pages where you can drill down to view health checks, execution details, baselines, and trends.

There can be a delay from job completion to when the job is available in Workload XM. Large jobs can take up to 10 minutes to display in Workload XM. For information about how to use Workload XM and the information it can provide, see Using Workload Experience Manager (XM) on page 20.

Using Workload Experience Manager (XM)

Default Time Range

If you have not specified a time range, Workload Experience Manager (Workload XM) displays data for the last 24 hours by default. If no data is available for the last 24 hours, Workload XM displays the full range that is available.

Common Use Cases

The following examples of use cases provide an introduction to Workload XM's capabilities.

Troubleshooting Abnormal Job Durations

Use Workload XM to find and troubleshoot slow-running jobs and to help identify areas of risk in jobs running on your cluster.

1. Log in to the Workload XM console at wxm.cloudera.com, and in Search, type the name of the cluster you want to analyze.
2. On the Data Engineering Summary page, click the time range in the upper right corner of the page and specify a time range you are interested in.
3. In the Trend graph, click the Abnormal Duration tab to view the number of jobs with an abnormal duration that executed within the selected time frame. Any jobs that fall outside of the baseline duration are marked as slow. If you hover over the graph, a comparison between the current period and the previous period displays.

After reviewing the chart, click the number of Abnormal Duration jobs above the graph to see a list of the slow jobs within the specified time range.

4. After clicking the Abnormal Duration number, a list of all slow jobs displays on the Data Engineering Jobs page. These jobs have all triggered the Duration health check. From the Duration drop-down list, select a duration range, or select Customize to enter a custom minimum or maximum duration to view any jobs that meet that duration criteria.
5. Click the Job name to view more detailed information. Under the Duration health check, you can see that this job finished much slower than the normal duration.

To further investigate, click the Task Duration health check.

6. After clicking Task Duration, you can see that this job contains several tasks that are heavily skewed, meaning that they took an abnormal amount of time to finish. Click one of the tasks to view further details about it.
7. After clicking one of the tasks, the Task Details pane displays details about its run. In addition to the long run time, garbage collection is taking significantly more time than the average task.

Click Task GC Time to view more information about garbage collection for this job.

8. On the Task GC Time page, click the Execution Details tab, and then click one of the MapReduce stages.
9. On the MapReduce stage Summary page, click View Configurations, and then enter part of the MapReduce memory configuration property name to search for and view the configuration for garbage collection. In this case, setting this property to 1024 might be causing the mapper JVM to have insufficient memory, which triggers overly frequent garbage collection. Increasing this value might improve performance on your cluster.
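The skew reasoning used in this walkthrough, flagging a task whose garbage-collection time stands far above its peers', can be sketched as a simple outlier test. The 2x threshold below is an illustrative choice, not a Workload XM parameter:

```python
# Illustrative outlier test for task GC times: flag any task whose GC time
# exceeds a multiple of the average across all tasks in the job. The 2x
# threshold is an example value, not a Workload XM setting.
def gc_outliers(task_gc_ms, threshold=2.0):
    """Return the sorted names of tasks with abnormally high GC time."""
    average = sum(task_gc_ms.values()) / len(task_gc_ms)
    return sorted(
        task for task, gc in task_gc_ms.items() if gc > threshold * average
    )

# One task spends roughly ten times longer in GC than its peers.
print(gc_outliers({"task_1": 400, "task_2": 450, "task_3": 5000}))
```

A task flagged this way is a candidate for the memory investigation described in steps 8 and 9: heavy GC often points at an undersized JVM heap.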

(Hadoop Administrators) Troubleshooting Failed Data Engineering Jobs

Use Workload XM to quickly troubleshoot failed data engineering jobs.

1. Log in to the Workload XM console at wxm.cloudera.com, and in Search, type the name of the cluster you want to analyze.
2. On the Data Engineering Jobs page, click the Health Checks drop-down list, and select Failed to Finish. This filters the list to display jobs that did not complete.
3. In the list of jobs, click the Job name to view more detailed information.
4. On the Job details page, click Health Checks to view details for the Failed to Finish health check. It indicates that the failure occurred in the Map stage of job execution. Click Map Stage, and then click Execution Details.
5. In the Summary section of the page, click the number of failures to see all failed tasks.

6. Click a failed task to see the error message from each failed attempt. In this example, the error message, "Task KILL is received. Killing attempt!", is not very descriptive or helpful. To gather more information about the task failure, open the associated log file to further analyze the root cause.

(Application Developers) Determining Cause for Slow and Failed Queries

You can also use Workload XM to find the cause of slow query run times and long execution times.

1. Log in to the Workload XM console at wxm.cloudera.com, and in Search, type the name of the cluster you want to analyze.
2. On the Data Engineering Jobs page, click the Health Checks drop-down list, and select Task Wait Time. This filters the list to display jobs with longer than average wait times.

3. Click the job name to view more detailed information.

4. On the details page for that job, click Health Checks, and then click Task Wait Time to see which tasks have abnormally long wait times. Click one of the tasks listed under Outlier Tasks to view details about it.

5. In the Outlier Task details, note the long wait time, indicated under Wait Duration. Compare this value to the run time after the task started, indicated under Successful Attempt Duration. The Successful Attempt Duration value is significantly better than the average, which could mean that insufficient resources were allocated for this job.
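When a task error message is as unhelpful as the "Task KILL is received" message shown in the failed-jobs procedure above, the aggregated job logs usually hold the real cause. As a sketch, on a YARN-managed cluster the logs can typically be retrieved with the yarn command-line tool; the application ID below is a placeholder, and YARN log aggregation must be enabled on the cluster:

```shell
# Fetch aggregated logs for the failed application (placeholder ID)
yarn logs -applicationId application_1526001234567_0042 > app.log

# Scan the logs for common failure signatures such as OutOfMemoryError
# or container-kill messages
grep -iE 'error|exception|killed' app.log | head -n 40
```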


Workload Experience Manager (XM) Reference

The following topics describe the health checks for data engineering jobs, which involve Hive, MapReduce, and Spark, and the health checks for data warehousing workloads, which involve Impala. In addition to health check descriptions, these topics provide recommendations for addressing the conditions that trigger the health checks, as well as information about the query statuses, types, and potential SQL issues that Workload XM identifies.

Impala Query Status
Data Warehouse Query Types
Impala Health Checks
Potential SQL Issues
Hive, Spark, MapReduce Health Checks

Data Warehouse (Apache Impala) Query Status

Query statuses appear in the Failed Queries graph on the Data Warehouse Summary page and in the Status drop-down list on the Data Warehouse Queries page. All query statuses are described below:

Analysis Exception: These queries failed due to syntax errors or incorrect table or column names.

Authorization Exception: These queries failed because the user executing them does not have permission to access the data.

Cancelled: These queries were cancelled by the system or a user.

Exceeded Memory Limit: The amount of memory required to execute these queries exceeds the allocated memory limit.

Failed - Any Reason: These queries failed for any of the reasons listed here.

Other Failures: These queries failed for other, unclassified reasons.

Rejected from Pool: These queries failed because too many queries were already pending in the Impala resource pool.

Session Closed: The session was closed by the system or a user for this set of queries.

Succeeded: These queries succeeded.

Data Warehouse (Apache Impala) Query Types

Query types appear in the Type drop-down list on the Data Warehouse Queries page. All query types are described below. For more detailed information about these SQL statements, see the Impala documentation.
ALTER TABLE: Changes the structure or properties of an existing table. For example: ALTER TABLE table_name ADD PARTITION (month=1, day=1);

ALTER VIEW: Changes the characteristics of a view. For example: ALTER VIEW view_name AS SELECT * FROM table_name;

COMPUTE STATS: Gathers information about the volume and distribution of data in a table and all associated columns and partitions. For example: COMPUTE STATS table_name;

CREATE DATABASE: Creates a new database. For example: CREATE DATABASE database_name;

CREATE FUNCTION: Creates a user-defined function (UDF), which you can use to implement custom logic during SELECT or INSERT operations. For example: CREATE FUNCTION function_name LOCATION 'hdfs_path_to_jar' SYMBOL='class_name';

CREATE ROLE: Creates a role to which privileges can be granted. After privileges are granted to roles, the roles can be assigned to users. A user who has been assigned a role is only able to exercise the privileges of that role. For example: CREATE ROLE role_name;

CREATE TABLE: Creates a new table and specifies its characteristics. For example: CREATE TABLE table_name (column_name data_type) PARTITIONED BY (column_name data_type) LOCATION 'hdfs_path';

CREATE TABLE AS SELECT: Creates a new table with the output from a SELECT statement. For example: CREATE TABLE table_name AS SELECT * FROM table_3;

CREATE TABLE LIKE: Creates a new table by cloning an existing table. For example: CREATE TABLE table_name_2 LIKE table_name_1;

CREATE VIEW: Creates a shorthand abbreviation (alias) for a query. A view is a purely logical construct with no physical data behind it. For example: CREATE VIEW view_name AS SELECT * FROM table_name;

DDL: Data Definition Language. SQL statements that define data structures. For example: CREATE TABLE.

DESCRIBE DB: Displays metadata about a database. For example: DESCRIBE database_name;

DESCRIBE TABLE: Displays metadata about a table. For example: DESCRIBE table_name;

DML: Data Manipulation Language. SQL statements that manipulate data. For example: INSERT.

DROP DATABASE: Removes a database from the system. For example: DROP DATABASE database_name;

DROP FUNCTION: Removes a user-defined function (UDF) so that it is not available for execution during Impala SELECT or INSERT operations. For example: DROP FUNCTION function_name;

DROP STATS: Removes the specified statistics from a table or a partition. For example: DROP STATS table_name;

DROP TABLE: Removes a table and its underlying HDFS data files for internal tables, although not for external tables. For example: DROP TABLE table_name;

DROP VIEW: Removes the specified view. Because a view is purely a logical construct with no physical data behind it, DROP VIEW only involves changes to metadata in the metastore database, not any data files in HDFS. For example: DROP VIEW view_name;

EXPLAIN: Generates a query execution plan for a specific query. For example: EXPLAIN SELECT * FROM table_1;

GRANT PRIVILEGE: Grants privileges on specified objects to roles. For example: GRANT privilege_name ON TABLE table_name TO role_name;

GRANT ROLE: Grants a role to a group. For example: GRANT ROLE role_name TO GROUP group_name;

LOAD: Loads data from an external data source into a table. For example: LOAD DATA INPATH 'hdfs_file_or_directory_path' INTO TABLE table_name;

N/A: These queries failed due to syntax errors, so Impala was not able to identify a query type for them.

REFRESH: Reloads the metadata for a table from the metastore database and does an incremental reload of the file and block metadata from the HDFS NameNode. REFRESH is used to avoid inconsistencies between Impala and external metadata sources, specifically the Hive metastore and the NameNode. For example: REFRESH table_name;

REVOKE PRIVILEGE: Revokes privileges on a specified object from roles. For example: REVOKE privilege_name ON TABLE table_name;

REVOKE ROLE: Revokes a role from a group. For example: REVOKE ROLE role_name FROM GROUP group_name;

SELECT: Requests data from a data source. For example: SELECT * FROM table_1;

SET: Sets configuration properties or session parameters. For example: SET compression_codec=snappy;

SHOW COLUMN STATS: Displays the column statistics for a specified table. For example: SHOW COLUMN STATS table_name;

SHOW CREATE TABLE: Displays the CREATE TABLE statement used to reproduce the current structure of a table. For example: SHOW CREATE TABLE table_name;

SHOW DATABASES: Displays all available databases. For example: SHOW DATABASES;

SHOW FILES: Displays the files that constitute a specified table, or a partition within a partitioned table. For example: SHOW FILES IN table_name;

SHOW FUNCTIONS: Displays the user-defined functions (UDFs) or user-defined aggregate functions (UDAFs) that are associated with a particular database. For example: SHOW FUNCTIONS IN database_name; or SHOW AGGREGATE FUNCTIONS IN database_name;

SHOW GRANT ROLE: Lists all the grants for the specified role. For example: SHOW GRANT ROLE role_name;

SHOW ROLES: Displays all available roles. For example: SHOW ROLES;

SHOW TABLES: Displays the names of tables. For example: SHOW TABLES;

SHOW TABLE STATS: Displays the statistics for a table. For example: SHOW TABLE STATS table_name;

TRUNCATE TABLE: Removes the data from an Impala table while leaving the table itself. For example: TRUNCATE TABLE table_name;

USE: Switches the current session to a specified database. For example: USE database_name;

Data Warehouse (Apache Impala) Health Checks

Impala health checks appear in the Suboptimal Queries graph on the Data Warehouse Summary page and in the Health Check drop-down list on the Data Warehouse Queries page. All query health checks are described below. These health checks provide hints about how to make your workloads faster, or they point out which aspects of your queries might be causing bottlenecks on your cluster. However, the recommendations are not exhaustive, and there may be additional fixes beyond those listed here that can make your workloads run faster. Keep in mind that query tuning can be as much an art as a science. If you are currently satisfied with your cluster performance, you can use these health checks to gain insight into how your query workloads are executing on your cluster. That said, the suboptimal conditions identified by these health checks might cause problems as new applications are added, the system footprint expands, or the overall load on the system increases. Use these health checks to proactively monitor potential issues across your cluster.

Table 1: Health Checks

Aggregation Spilled Partitions: Indicates that data spilled to disk during the aggregation operation for these queries. This health check is triggered during aggregation when there is not enough memory, which causes data to spill to disk. If you are satisfied with your cluster performance despite this health check being triggered, you can disregard it. If performance is an issue, try the following fixes:
- Use a less complex GROUP BY clause that involves fewer columns (do not use a high-cardinality GROUP BY clause).

- Increase the setting for the query's MEM_LIMIT query option. See the Impala documentation.
- Add more physical memory.
For more details, see SQL Operations that Spill to Disk in the Impala documentation set.

Bytes Read Skew: Indicates that one of the cluster nodes is reading a significantly larger amount of data than the other nodes. To address this condition, rebalance the data or use the Impala SCHEDULE_RANDOM_REPLICA query option. For additional suggestions, see Avoiding CPU Hotspots for HDFS Cached Data in the Impala documentation set.

Corrupt Table Statistics: Indicates that these queries used table statistics that were incorrectly computed and cannot be used. This condition can be caused by metastore database issues. Recompute the table statistics. For more information, see Detecting Missing Statistics in the Impala documentation set.

HashJoin Spilled Partitions: Indicates that data spilled to disk during the hash join operation for these queries. This condition occurs when there is not enough memory during the hash join, which causes data to spill to disk. To address this issue:
- Reduce the cardinality of the right-hand side of the join by filtering more rows from it.
- Add more physical memory.
- Increase the setting for the query's MEM_LIMIT query option. See the Impala documentation.
- Use a denormalized table.

Insufficient Partitioning: Indicates that there is insufficient partitioning for parallel query execution to occur for these queries. This condition is triggered when query execution wastes resources and time because the system reads rows that are not required for the operation. To address this condition:
- Check whether your more popular filters can become partition keys. For example, if many queries use ship date as a filter, consider creating partitions with ship date as the partition key.
- Add filters to your query for existing partition columns.
For more details, see Partitioning for Impala Tables in the Impala documentation set.

Many Materialized Columns: Indicates that an abnormally large number of columns were returned for these queries.

This condition is only triggered for Parquet tables, when a query reads more than 15 columns. To address it, rewrite the query so that it does not return more than 15 columns.

Missing Table Statistics: Indicates that no table statistics were computed for query optimization for these queries. To address this condition, compute table statistics. For more information, see Detecting Missing Statistics in the Impala documentation set.

Slow Aggregate: Indicates that the aggregation operations were slower than expected for these queries. Ten million rows per second is the typical throughput; if the observed throughput is less than that, this health check is triggered. Observed throughput is calculated by dividing the number of input rows by the time spent in the aggregation operation. Addressing this condition depends on the root cause:
- If the root cause is resource conflicts with other queries, allocate different resource pools to reduce conflicts.
- If the root cause is overly complex GROUP BY operations, rewrite the queries to simplify them.

Slow Client: Indicates that the client consumed query results more slowly than expected for these queries. The causes and remediations for this health check can vary:
- If the condition is triggered because some clients take too long to unregister the query, use clients that are more appropriate for the workload. For example, if you are testing and building SQL queries, it might make more sense to use an interactive client than ODBC or JDBC.
- If the condition is triggered because you are doing exploratory analysis, reading some rows and then waiting before reading the next set, system resources are consumed because the query has not closed. To remediate, consider using the Impala timeout feature. See Setting Timeout Periods for Daemons, Queries, and Sessions in the Impala documentation set. As an additional option, consider adding a LIMIT clause to your queries to limit the number of rows returned to 100 or fewer.

Slow Code Generation: Indicates that compiled code was generated more slowly than expected for these queries.

In every query plan fragment, Impala tracks how much time is spent generating code; this health check indicates that the time exceeded 20% of the overall query execution time. It might be triggered by query complexity, for example, a query with too many predicates in its WHERE clauses, too many joins, or too many columns. For queries where code generation is too slow, consider using the DISABLE_CODEGEN query option in your session.

Slow HDFS Scan: Indicates that scanning data from HDFS was slower than expected for these queries. Note: if the workload is accessing data that is stored on Amazon S3, this is a known limitation of that storage platform. This condition is caused by a slow disk, extremely complex scan predicates, or an HDFS NameNode that is too busy. The HDFS scan rate is based on the amount of time that the scanner took to read a specific number of rows. This condition can be addressed by:
- Replacing the disk, if the cause is a slow disk.
- Reducing complexity by simplifying the scan predicates.
- If the HDFS NameNode is too busy, upgrading to CDH 5.15 or later. For more information, see Upgrading Cloudera Manager and CDH.

Slow Hash Join: Indicates that hash join operations were slower than expected for these queries. This health check might be triggered when there are overly complex join predicates or when the hash join causes data to spill to disk. Five million rows per second is the typical throughput; if the observed throughput is less than that, this health check is triggered. Observed throughput is calculated by dividing the number of input rows by the time spent in the hash join operation. To remediate this condition, simplify the join predicates or reduce the size of the right side of the join.

Slow Query Planning: Indicates that the query plan was generated more slowly than expected for these queries. This health check is triggered when the query planning time exceeds 30% of the overall query execution time. It can be caused by very complex queries or by a metadata refresh that occurs while the query is executing. To remediate this condition, consider simplifying your queries: for example, reduce the number of columns returned, the number of filters, or the number of joins.

Slow Row Materialization: Indicates that rows were returned more slowly than expected for these queries. This health check is triggered when it takes more than 20% of the query execution time to return rows. It can be caused by overly complex expressions in the SELECT list or by requesting too many rows. To address this condition, simplify the query by reducing the number of columns in the SELECT list or by reducing the number of rows requested.

Slow Sorting Speed: Indicates that the sorting operations were slower than expected for these queries. Ten million rows per second is the typical throughput; if the observed throughput is less than that, this health check is triggered. Observed throughput is calculated by dividing the number of input rows by the time spent in the sorting operation. To remediate this condition, simplify the ORDER BY clauses in queries. If data is spilling to disk, reduce the volume of data to be sorted by adding more predicates to the WHERE clauses, by increasing the available memory, or by increasing the value specified for the MEM_LIMIT query option. See the Impala documentation.

Slow Write Speed: Indicates that the query write speed was slower than expected for these queries. Note: if the workload is accessing data that is stored on Amazon S3, this is a known limitation of that storage platform. This health check is triggered when the difference between the actual write time and the expected write time is more than 20% of the query execution time. This condition can be caused when overly complex expressions are used, or when too many columns or too many rows are requested in the SELECT list. To address this condition, simplify the query by reducing the number of columns or by reducing the complexity of the SELECT list expressions.
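Several of the remediations above amount to session-level Impala settings. As a hedged sketch (the appropriate values depend entirely on your workload and cluster), the following statements illustrate the MEM_LIMIT, DISABLE_CODEGEN, and COMPUTE STATS remediations mentioned in the health check descriptions; table_name and the memory limit are placeholders:

```sql
-- Illustrative session settings; values are examples, not recommendations.

-- Raise the per-query memory limit to reduce spilling during aggregations,
-- hash joins, and sorts (Aggregation/HashJoin Spilled Partitions,
-- Slow Sorting Speed).
SET MEM_LIMIT=2g;

-- Skip code generation for queries where generating it takes too long
-- (Slow Code Generation).
SET DISABLE_CODEGEN=true;

-- Recompute statistics for a table flagged by the Missing or Corrupt
-- Table Statistics health checks (table_name is a placeholder).
COMPUTE STATS table_name;
```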
Potential SQL Issues

Potential issues found in the SQL in your workloads appear in the Performance Issues region of the query details page, which opens when you click a query in the list on the Data Warehouse Queries page. Potential SQL issues are common mistakes made in writing SQL. All SQL issues that Workload Experience Manager (Workload XM) identifies are listed in the following table.


More information

Symantec Ghost Solution Suite Web Console - Getting Started Guide

Symantec Ghost Solution Suite Web Console - Getting Started Guide Symantec Ghost Solution Suite Web Console - Getting Started Guide Symantec Ghost Solution Suite Web Console- Getting Started Guide Documentation version: 3.3 RU1 Legal Notice Copyright 2019 Symantec Corporation.

More information

Service Manager. Database Configuration Guide

Service Manager. Database Configuration Guide Service Manager powered by HEAT Database Configuration Guide 2017.2.1 Copyright Notice This document contains the confidential information and/or proprietary property of Ivanti, Inc. and its affiliates

More information

AvePoint RevIM Installation and Configuration Guide. Issued May AvePoint RevIM Installation and Configuration Guide

AvePoint RevIM Installation and Configuration Guide. Issued May AvePoint RevIM Installation and Configuration Guide AvePoint RevIM 3.2.1 Installation and Configuration Guide Issued May 2017 1 Table of Contents What s New in This Guide... 4 About AvePoint RevIM... 5 Installation Requirements... 6 Hardware Requirements...

More information

IBM Security QRadar Deployment Intelligence app IBM

IBM Security QRadar Deployment Intelligence app IBM IBM Security QRadar Deployment Intelligence app IBM ii IBM Security QRadar Deployment Intelligence app Contents QRadar Deployment Intelligence app.. 1 Installing the QRadar Deployment Intelligence app.

More information

Cloudera ODBC Driver for Apache Hive Version

Cloudera ODBC Driver for Apache Hive Version Cloudera ODBC Driver for Apache Hive Version 2.5.17 Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and any other product or service

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

x10data Application Platform v7.1 Installation Guide

x10data Application Platform v7.1 Installation Guide Copyright Copyright 2010 Automated Data Capture (ADC) Technologies, Incorporated. All rights reserved. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the

More information

Diagnostic Manager Advanced Installation Guide

Diagnostic Manager Advanced Installation Guide Diagnostic Manager Publication Date: May 03, 2017 All Rights Reserved. This software is protected by copyright law and international treaties. Unauthorized reproduction or distribution of this software,

More information

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Copyright 2018, Oracle and/or its affiliates. All rights reserved. Beyond SQL Tuning: Insider's Guide to Maximizing SQL Performance Monday, Oct 22 10:30 a.m. - 11:15 a.m. Marriott Marquis (Golden Gate Level) - Golden Gate A Ashish Agrawal Group Product Manager Oracle

More information

Enterprise Vault Troubleshooting FSA Reporting. 12 and later

Enterprise Vault Troubleshooting FSA Reporting. 12 and later Enterprise Vault Troubleshooting FSA Reporting 12 and later Enterprise Vault : Troubleshooting FSA Reporting Last updated: 2018-04-17. Legal Notice Copyright 2018 Veritas Technologies LLC. All rights reserved.

More information

Data Federation Administration Tool Guide SAP Business Objects Business Intelligence platform 4.1 Support Package 2

Data Federation Administration Tool Guide SAP Business Objects Business Intelligence platform 4.1 Support Package 2 Data Federation Administration Tool Guide SAP Business Objects Business Intelligence platform 4.1 Support Package 2 Copyright 2013 SAP AG or an SAP affiliate company. All rights reserved. No part of this

More information

Documentation. This PDF was generated for your convenience. For the latest documentation, always see

Documentation. This PDF was generated for your convenience. For the latest documentation, always see Management Pack for AWS 1.50 Table of Contents Home... 1 Release Notes... 3 What's New in Release 1.50... 4 Known Problems and Workarounds... 5 Get started... 7 Key concepts... 8 Install... 10 Installation

More information

Remote Support Security Provider Integration: RADIUS Server

Remote Support Security Provider Integration: RADIUS Server Remote Support Security Provider Integration: RADIUS Server 2003-2019 BeyondTrust Corporation. All Rights Reserved. BEYONDTRUST, its logo, and JUMP are trademarks of BeyondTrust Corporation. Other trademarks

More information

External Data Connector for SharePoint

External Data Connector for SharePoint External Data Connector for SharePoint Last Updated: August 2014 Copyright 2014 Vyapin Software Systems Private Limited. All rights reserved. This document is being furnished by Vyapin Software Systems

More information

How to Run the Big Data Management Utility Update for 10.1

How to Run the Big Data Management Utility Update for 10.1 How to Run the Big Data Management Utility Update for 10.1 2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording

More information

Insight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold

Insight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold Insight Case Studies Tuning the Beloved DB-Engines Presented By Nithya Koka and Michael Arnold Who is Nithya Koka? Senior Hadoop Administrator Project Lead Client Engagement On-Call Engineer Cluster Ninja

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Workflow Management (August 31, 2017) docs.hortonworks.com Hortonworks Data Platform: Workflow Management Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks

More information

Hands-on Lab Session 9909 Introduction to Application Performance Management: Monitoring. Timothy Burris, Cloud Adoption & Technical Enablement

Hands-on Lab Session 9909 Introduction to Application Performance Management: Monitoring. Timothy Burris, Cloud Adoption & Technical Enablement Hands-on Lab Session 9909 Introduction to Application Performance Management: Monitoring Timothy Burris, Cloud Adoption & Technical Enablement Copyright IBM Corporation 2017 IBM, the IBM logo and ibm.com

More information

Database Performance Analyzer

Database Performance Analyzer GETTING STARTED GUIDE Database Performance Analyzer Version 11.1 Last Updated: Friday, December 1, 2017 Retrieve the latest version from: https://support.solarwinds.com/@api/deki/files/32225/dpa_getting_started.pdf

More information

Cloudera Introduction

Cloudera Introduction Cloudera Introduction Important Notice 2010-2018 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks

More information

Database Performance Analyzer Integration Module

Database Performance Analyzer Integration Module ADMINISTRATOR GUIDE Database Performance Analyzer Integration Module Version 11.0 Last Updated: Friday, July 21, 2017 Retrieve the latest version from: https://support.solarwinds.com/@api/deki/files/32921/dpaimadministratorguide.pdf

More information

One Identity Active Roles 7.2. Replication: Best Practices and Troubleshooting Guide

One Identity Active Roles 7.2. Replication: Best Practices and Troubleshooting Guide One Identity Active Roles 7.2 Replication: Best Practices and Troubleshooting Copyright 2017 One Identity LLC. ALL RIGHTS RESERVED. This guide contains proprietary information protected by copyright. The

More information

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) The Certificate in Software Development Life Cycle in BIGDATA, Business Intelligence and Tableau program

More information

Scan to Digitech v1.0

Scan to Digitech v1.0 Scan to Digitech v1.0 Administrator's Guide June 2009 www.lexmark.com Lexmark and Lexmark with diamond design are trademarks of Lexmark International, Inc., registered in the United States and/or other

More information

Configuring and Deploying Hadoop Cluster Deployment Templates

Configuring and Deploying Hadoop Cluster Deployment Templates Configuring and Deploying Hadoop Cluster Deployment Templates This chapter contains the following sections: Hadoop Cluster Profile Templates, on page 1 Creating a Hadoop Cluster Profile Template, on page

More information

Storage Foundation and High Availability Solutions HA and Disaster Recovery Solutions Guide for Microsoft SharePoint 2013

Storage Foundation and High Availability Solutions HA and Disaster Recovery Solutions Guide for Microsoft SharePoint 2013 Storage Foundation and High Availability Solutions HA and Disaster Recovery Solutions Guide for Microsoft SharePoint 2013 Windows 7.1 April 2016 Storage Foundation and High Availability Solutions HA and

More information

Netwrix Auditor. Virtual Appliance and Cloud Deployment Guide. Version: /25/2017

Netwrix Auditor. Virtual Appliance and Cloud Deployment Guide. Version: /25/2017 Netwrix Auditor Virtual Appliance and Cloud Deployment Guide Version: 9.5 10/25/2017 Legal Notice The information in this publication is furnished for information use only, and does not constitute a commitment

More information

Table of Contents. Configure and Manage Logging in to the Management Portal Verify and Trust Certificates

Table of Contents. Configure and Manage Logging in to the Management Portal Verify and Trust Certificates Table of Contents Configure and Manage Logging in to the Management Portal Verify and Trust Certificates Configure System Settings Add Cloud Administrators Add Viewers, Developers, or DevOps Administrators

More information

NETWRIX WINDOWS SERVER CHANGE REPORTER

NETWRIX WINDOWS SERVER CHANGE REPORTER NETWRIX WINDOWS SERVER CHANGE REPORTER ADMINISTRATOR S GUIDE Product Version: 4.0 June 2013. Legal Notice The information in this publication is furnished for information use only, and does not constitute

More information

User s Manual. Version 5

User s Manual. Version 5 User s Manual Version 5 Copyright 2017 Safeway. All rights reserved. No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any language,

More information

Have documentation feedback? Submit a Documentation Feedback support ticket using the Support Wizard on support.air-watch.com.

Have documentation feedback? Submit a Documentation Feedback support ticket using the Support Wizard on support.air-watch.com. VMware AirWatch Email Notification Service Installation Guide Providing real-time email notifications to ios devices with AirWatch Inbox and VMware Boxer AirWatch v9.1 Have documentation feedback? Submit

More information

Evaluation Guide Host Access Management and Security Server 12.4

Evaluation Guide Host Access Management and Security Server 12.4 Evaluation Guide Host Access Management and Security Server 12.4 Copyrights and Notices Copyright 2017 Attachmate Corporation, a Micro Focus company. All rights reserved. No part of the documentation materials

More information

SAS Data Loader 2.4 for Hadoop

SAS Data Loader 2.4 for Hadoop SAS Data Loader 2.4 for Hadoop vapp Deployment Guide SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS Data Loader 2.4 for Hadoop: vapp Deployment

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

N4A Device Manager 4.6.0

N4A Device Manager 4.6.0 N4ACMSX-UG460 N4A Device Manager 4.6.0 User Guide Version 1.0 October 30, 2015 NOVATEL WIRELESS COPYRIGHT STATEMENT 2015 Novatel Wireless, Inc. All rights reserved. The information contained in this document

More information

Using Hive for Data Warehousing

Using Hive for Data Warehousing An IBM Proof of Technology Using Hive for Data Warehousing Unit 1: Exploring Hive An IBM Proof of Technology Catalog Number Copyright IBM Corporation, 2013 US Government Users Restricted Rights - Use,

More information

USER GUIDE. CTERA Agent for Windows. June 2016 Version 5.5

USER GUIDE. CTERA Agent for Windows. June 2016 Version 5.5 USER GUIDE CTERA Agent for Windows June 2016 Version 5.5 Copyright 2009-2016 CTERA Networks Ltd. All rights reserved. No part of this document may be reproduced in any form or by any means without written

More information

Cloudera Installation

Cloudera Installation Cloudera Installation Important Notice 2010-2018 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks

More information

INSTALLATION & OPERATIONS GUIDE Wavextend Calculation Framework & List Manager for CRM 4.0

INSTALLATION & OPERATIONS GUIDE Wavextend Calculation Framework & List Manager for CRM 4.0 INSTALLATION & OPERATIONS GUIDE Wavextend Calculation Framework & List Manager for CRM 4.0 COPYRIGHT Information in this document, including URL and other Internet Web site references, is subject to change

More information

Installation Guide. EventTracker Enterprise. Install Guide Centre Park Drive Publication Date: Aug 03, U.S. Toll Free:

Installation Guide. EventTracker Enterprise. Install Guide Centre Park Drive Publication Date: Aug 03, U.S. Toll Free: EventTracker Enterprise Install Guide 8815 Centre Park Drive Publication Date: Aug 03, 2010 Columbia MD 21045 U.S. Toll Free: 877.333.1433 Abstract The purpose of this document is to help users install

More information

VMware vrealize Operations for Horizon Administration

VMware vrealize Operations for Horizon Administration VMware vrealize Operations for Horizon Administration vrealize Operations for Horizon 6.3 This document supports the version of each product listed and supports all subsequent versions until the document

More information

Early Data Analyzer Web User Guide

Early Data Analyzer Web User Guide Early Data Analyzer Web User Guide Early Data Analyzer, Version 1.4 About Early Data Analyzer Web Getting Started Installing Early Data Analyzer Web Opening a Case About the Case Dashboard Filtering Tagging

More information

Installing SmartSense on HDP

Installing SmartSense on HDP 1 Installing SmartSense on HDP Date of Publish: 2018-07-12 http://docs.hortonworks.com Contents SmartSense installation... 3 SmartSense system requirements... 3 Operating system, JDK, and browser requirements...3

More information

BLUEPRINT TEAM REPOSITORY. For Requirements Center & Requirements Center Test Definition

BLUEPRINT TEAM REPOSITORY. For Requirements Center & Requirements Center Test Definition BLUEPRINT TEAM REPOSITORY Installation Guide for Windows For Requirements Center & Requirements Center Test Definition Table Of Contents Contents Table of Contents Getting Started... 3 About the Blueprint

More information

Oracle Cloud Using the Google Calendar Adapter with Oracle Integration

Oracle Cloud Using the Google Calendar Adapter with Oracle Integration Oracle Cloud Using the Google Calendar Adapter with Oracle Integration E85501-05 January 2019 Oracle Cloud Using the Google Calendar Adapter with Oracle Integration, E85501-05 Copyright 2017, 2019, Oracle

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

Analytics: Server Architect (Siebel 7.7)

Analytics: Server Architect (Siebel 7.7) Analytics: Server Architect (Siebel 7.7) Student Guide June 2005 Part # 10PO2-ASAS-07710 D44608GC10 Edition 1.0 D44917 Copyright 2005, 2006, Oracle. All rights reserved. Disclaimer This document contains

More information

MarkLogic Server. Monitoring MarkLogic Guide. MarkLogic 9 May, Copyright 2017 MarkLogic Corporation. All rights reserved.

MarkLogic Server. Monitoring MarkLogic Guide. MarkLogic 9 May, Copyright 2017 MarkLogic Corporation. All rights reserved. Monitoring MarkLogic Guide 1 MarkLogic 9 May, 2017 Last Revised: 9.0-2, July, 2017 Copyright 2017 MarkLogic Corporation. All rights reserved. Table of Contents Table of Contents Monitoring MarkLogic Guide

More information

Tenant Administration. vrealize Automation 6.2

Tenant Administration. vrealize Automation 6.2 vrealize Automation 6.2 You can find the most up-to-date technical documentation on the VMware website at: https://docs.vmware.com/ If you have comments about this documentation, submit your feedback to

More information

Have documentation feedback? Submit a Documentation Feedback support ticket using the Support Wizard on support.air-watch.com.

Have documentation feedback? Submit a Documentation Feedback support ticket using the Support Wizard on support.air-watch.com. VMware AirWatch Email Notification Service Installation Guide Providing real-time email notifications to ios devices with AirWatch Inbox and VMware Boxer Workspace ONE UEM v9.4 Have documentation feedback?

More information

VMware AirWatch Database Migration Guide A sample procedure for migrating your AirWatch database

VMware AirWatch Database Migration Guide A sample procedure for migrating your AirWatch database VMware AirWatch Database Migration Guide A sample procedure for migrating your AirWatch database For multiple versions Have documentation feedback? Submit a Documentation Feedback support ticket using

More information

AvePoint Cloud Governance. Release Notes

AvePoint Cloud Governance. Release Notes AvePoint Cloud Governance Release Notes Table of Contents New Features and Improvements: June 2018... 2 New Features and Improvements: May 2018... 3 New Features and Improvements: April 2018... 4 New Features

More information