How to Install and Configure EBF16193 for Hortonworks HDP 2.3 and HotFix 3 Update 2


Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.

Abstract

Enable Big Data Edition to run mappings on a Hadoop cluster on Hortonworks HDP 2.3.

Supported Versions

Informatica Big Data Edition 9.6.1 HotFix 3
Informatica Big Data Edition 9.6.1 HotFix 3 Update 2

Table of Contents

Overview
Pre-Installation Task
Step 1. Download EBF16193
Step 2. Update the Informatica Domain
    Applying EBF16193 to the Informatica Domain
Step 3. Update the Hadoop Cluster
Post-Installation Tasks
    Configure Big Data Edition
    Create and Configure the Analyst Service
    Configure Big Data Edition for Hortonworks HDP 2.3
    Configure Hadoop Cluster Properties for Hortonworks HDP
    Update the Repository Plug-in
    Add hbase_protocol.jar to the Hadoop classpath
    Optional Hortonworks HDP Configuration
Release Notes
    New Features and Enhancements
    Changes
    Fixed Limitations
    Known Limitations
    Third-Party Limitations

Overview

EBF16193 upgrades Big Data Edition to version 9.6.1 HotFix 3 Update 2 and adds support for Hortonworks HDP 2.3. You can apply the EBF to Informatica 9.6.1 HotFix 3 or Informatica 9.6.1 HotFix 3 Update 2. If you apply the EBF to version 9.6.1 HotFix 3, the EBF upgrades the Informatica domain to version 9.6.1 HotFix 3 Update 2.

To apply the EBF and enable support for Hortonworks HDP 2.3, perform the following tasks:

1. Complete the pre-installation task.
2. Download the EBF.
3. Update the Informatica domain.
4. Update the Hadoop cluster.

5. Update the Informatica clients.
6. Complete the post-installation tasks.

Optionally, you can configure Big Data Edition for Hortonworks HDP 2.3 after you apply the EBF.

Pre-Installation Task

Complete the pre-installation task before you apply EBF16193 to Big Data Edition.

Note: Skip this task if the Informatica domain does not have an Analyst Service.

To use the Analyst Service with a Hadoop cluster that uses Kerberos authentication, perform the following steps before you apply Update 2:

1. Shut down the Analyst Service.
2. Delete the following directories on the machine where the Data Integration Service runs:
   <Informatica installation directory>\tomcat\temp\<AnalystServiceName>
   <Informatica installation directory>\services\AnalystService\analysttool

Step 1. Download EBF16193

Download the EBF.

1. Open a browser.
2. In the address field, enter the URL for the Informatica update site.
3. Navigate to the following directory: /updates/Informatica9/9.6.1 HotFix 3/EBF16193
4. Download the following files:
   EBF16193.Linux64-X86.tar.gz
     Contains the EBF installer for the Informatica domain and the Hadoop cluster.
   EBF16193_Client_Installer_win32_x86.zip
     Contains the EBF installer for the Informatica client. Use this file to update the Developer tool.
5. Extract the files from EBF16193.Linux64-X86.tar.gz. The EBF16193.Linux64-X86.tar.gz file contains the following .tar files:
   EBF16193_Server_installer_linux_em64t.tar
     EBF installer for the Informatica domain. Use this file to update the Informatica domain.
   EBF16193_HadoopRPM_EBFInstaller.tar.Z
     EBF installer for the Hadoop RPM. Use this file to update the Hadoop cluster.

Step 2. Update the Informatica Domain

Apply the EBF to the Informatica domain to enable support for Hortonworks HDP 2.3 and upgrade Big Data Edition to version 9.6.1 HotFix 3 Update 2.

Applying EBF16193 to the Informatica Domain

Apply the EBF to every node in the domain that is used to connect to HDFS or HiveServer. To apply the EBF to a node in the domain, perform the following steps:

1. Copy EBF16193_Server_installer_linux_em64t.tar to a temporary location on the node.
2. Extract the installer file. Run the following command: tar -xvf EBF16193_Server_Installer_linux_em64t.tar
3. Configure the following properties in the Input.properties file:
   DEST_DIR=<Informatica installation directory>
   ROLLBACK=0
4. Run installebf.sh.
5. Repeat steps 1 through 4 for every node in the domain that is used for Hive pushdown.

Note: To roll back the EBF for the Informatica domain on a node, set ROLLBACK to 1 and run installebf.sh.

Step 3. Update the Hadoop Cluster

To update the Hadoop cluster to enable support for Hortonworks HDP 2.3, apply the EBF. Perform the following steps:

1. Copy EBF16193_HadoopRPM_EBFInstaller.tar.Z to a temporary location on the cluster machine.
2. Extract the installer file. Run the following command: tar -xvf EBF16193_HadoopRPM_EBFInstaller.tar.Z
3. Provide the node list in the HadoopDataNodes file.
4. Configure the destdir parameter in the input.properties file: destdir=<Informatica installation directory>
   For example, set the destdir parameter to the following value: destdir="/opt/informatica"
5. Run InformaticaHadoopEBFInstall.sh.

Post-Installation Tasks

Complete the post-installation tasks after you apply the update.

Configure Big Data Edition

If you need to configure Big Data Edition for a new Hadoop distribution after you apply the update, complete the post-installation task. Use the Big Data Edition Configuration Utility to automate part of the Big Data Edition configuration process. Alternatively, you can manually configure Big Data Edition. For more information about the manual configuration steps required, see the Informatica Big Data Edition Installation and Configuration Guide.
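The per-node sequence in Steps 2 and 3 can be sketched as a small shell script. The installation path below is a placeholder for your environment, and the install commands are printed rather than executed so you can review them before running them on a node:

```shell
#!/bin/sh
# Sketch: prepare the EBF16193 domain install on one node.
# INFA_HOME is a placeholder for your Informatica installation directory.
INFA_HOME=/opt/informatica

# Write the Input.properties file described in Step 2.
# (Set ROLLBACK=1 instead to roll back the EBF on this node.)
cat > Input.properties <<EOF
DEST_DIR=$INFA_HOME
ROLLBACK=0
EOF

# Commands to run on the node after copying the installer tar:
echo "tar -xvf EBF16193_Server_Installer_linux_em64t.tar"
echo "sh installebf.sh"
```

Repeat the same sequence on every node that is used for Hive pushdown; the cluster-side installer follows the analogous pattern, with the destdir parameter in input.properties and the node list in the HadoopDataNodes file.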
For information about configuring Hortonworks HDP 2.3, see the topic Configure Big Data Edition for Hortonworks HDP 2.3 below.

To automate part of the configuration process for the Hadoop cluster properties on the machine where the Data Integration Service runs, perform the following steps:

1. On the machine where the Data Integration Service runs, open the command line.
2. Go to the following directory: <Informatica installation directory>/tools/bdeutil
3. Run BDEConfig.sh.
4. Press Enter.
5. Choose the Hadoop distribution:
   1 - Cloudera CDH
   2 - Hortonworks HDP
   3 - MapR
   4 - Pivotal HD
   5 - IBM BigInsights
6. Choose the Hadoop distribution version you want to use to configure Big Data Edition.
7. Choose how to access files on the Hadoop cluster.
   If you choose Cloudera CDH, the following options appear:
   1 - Cloudera Manager. Enter this option to use the Cloudera Manager API to access files on the Hadoop cluster.
   2 - Secure Shell (SSH). Enter this option to use SSH to access files on the Hadoop cluster. This option requires SSH connections to the machines that host the NameNode, JobTracker, and Hive client. If you select this option, Informatica recommends that you use an SSH connection without a password or have sshpass or Expect installed.
   3 - Shared directory. Enter this option to use a shared directory to access files on the Hadoop cluster. You must have read permission for the shared directory.
   Note: Informatica recommends the Cloudera Manager or SSH option.
   If you choose a distribution other than Cloudera CDH, the following options appear:
   1 - Secure Shell (SSH). Enter this option to use SSH to access files on the Hadoop cluster. This option requires SSH connections to the machines that host the NameNode, JobTracker, and Hive client. If you select this option, Informatica recommends that you use an SSH connection without a password or have sshpass or Expect installed.
   2 - Shared directory. Enter this option to use a shared directory to access files on the Hadoop cluster. You must have read permission for the shared directory.
Note: Informatica recommends the SSH option.
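Because the utility recommends passwordless SSH, you may want to set it up before running BDEConfig.sh. The sketch below only generates the standard OpenSSH commands into a helper file for review; the host names and the "hadoop" user are placeholders:

```shell
#!/bin/sh
# Sketch: generate commands that enable passwordless SSH from the Data
# Integration Service machine to the hosts the utility contacts
# (NameNode, JobTracker, Hive client). Hosts and user are placeholders.
{
  echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa   # run once, no passphrase"
  for host in namenode.example.com jobtracker.example.com hiveclient.example.com; do
    echo "ssh-copy-id hadoop@$host"
  done
} > setup_passwordless_ssh.sh
cat setup_passwordless_ssh.sh
```

If passwordless SSH is not an option, installing sshpass or Expect avoids retyping the password for every file the utility downloads.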

8. If you chose Cloudera CDH, choose the Cloudera CDH cluster you want to use to configure Big Data Edition. Otherwise, continue to step 9.
9. Based on the option you selected, see the corresponding topic to continue with the configuration process: Use Cloudera Manager, Use SSH, or Use a Shared Directory.

Use Cloudera Manager

If you choose Cloudera Manager, perform the following steps to configure Big Data Edition:

1. Enter the Cloudera Manager host.
2. Enter the Cloudera user ID.
3. Enter the password for the user ID.
4. Enter the port for Cloudera Manager.
   The Big Data Edition Configuration Utility retrieves the required information from the Hadoop cluster.
5. Complete the manual configuration steps. For more information about the manual configuration steps for Cloudera CDH, see the Informatica Big Data Edition Installation and Configuration Guide.

Use SSH

If you choose SSH, you must provide host names and Hadoop configuration file locations.

Note: Informatica recommends that you use an SSH connection without a password or have sshpass or Expect installed. If you do not use one of these methods, you must enter the password each time the utility downloads a file from the Hadoop cluster.

Verify the following host names: NameNode, JobTracker, and Hive client. Additionally, verify the locations for the following files on the Hadoop cluster: hdfs-site.xml, core-site.xml, mapred-site.xml, yarn-site.xml, and hive-site.xml.

Perform the following steps to configure Big Data Edition:

1. Enter the NameNode host name.
2. Enter the SSH user ID.
3. Enter the password for the SSH user ID. If you use an SSH connection without a password, leave this field blank and press Enter.
4. Enter the location for the hdfs-site.xml file on the Hadoop cluster.
5. Enter the location for the core-site.xml file on the Hadoop cluster.
   The Big Data Edition Configuration Utility connects to the NameNode and downloads the hdfs-site.xml and core-site.xml files.
6. Enter the JobTracker host name.

7. Enter the SSH user ID.
8. Enter the password for the SSH user ID. If you use an SSH connection without a password, leave this field blank and press Enter.
9. Enter the directory for the mapred-site.xml file on the Hadoop cluster.
10. Enter the directory for the yarn-site.xml file on the Hadoop cluster.
    The utility connects to the JobTracker and downloads the mapred-site.xml and yarn-site.xml files.
11. Enter the Hive client host name.
12. Enter the SSH user ID.
13. Enter the password for the SSH user ID. If you use an SSH connection without a password, leave this field blank and press Enter.
14. Enter the directory for the hive-site.xml file on the Hadoop cluster.
    The utility connects to the Hive client and downloads the hive-site.xml file.
15. Complete the manual configuration steps. For more information about the manual configuration steps required for the Hadoop distribution, see the Informatica Big Data Edition Installation and Configuration Guide.

Use a Shared Directory

If you choose shared directory, perform the following steps to configure Big Data Edition:

1. Enter the location of the shared directory.
   Note: You must have read permission for the directory, and the directory should contain the following files: core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml, and yarn-site.xml.
2. Complete the manual configuration steps. For more information about the manual configuration steps required for the Hadoop distribution, see the Informatica Big Data Edition Installation and Configuration Guide.

Troubleshooting the Configuration Utility

Consider the following troubleshooting tips when you perform the post-installation tasks:

In the ClusterConfig.properties file, the host name is incorrect for the command templates if I use the shared directory option for the Big Data Edition Configuration Utility.
If the utility cannot determine the host name for the connection based on the files in the shared directory, the utility uses "localhost." Manually replace "localhost" with the host name for the connection.

In the ClusterConfig.properties file, which user do I provide for the UserName parameter in the Hive remote connection command template?
Provide the user name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster.

In the ClusterConfig.properties file, which user do I provide for the USERNAME parameter in the HDFS connection command template?
Provide the user name that is used to access HDFS.

Create and Configure the Analyst Service

To use the Analyst Service with a Hadoop cluster that uses Kerberos authentication, create the Analyst Service and configure it to use the Kerberos ticket for the Data Integration Service. Perform the following steps:

1. Verify that the Data Integration Service is configured for Kerberos. For more information, see the Informatica Big Data Edition User Guide.
2. Create an Analyst Service. For more information about how to create the Analyst Service, see the Informatica Application Services Guide.
3. Log in to the Administrator tool.
4. In the Domain Navigator, select the Analyst Service.
5. On the Processes tab, edit the Advanced Properties.
6. Add the following value to the JVM Command Line Options field: -DINFA_HADOOP_DIST_DIR=<Informatica installation directory>/services/shared/hadoop/<Hadoop_distribution>

Configure Big Data Edition for Hortonworks HDP 2.3

If the Hadoop cluster runs Hortonworks HDP 2.3, you must configure Big Data Edition. Skip this section if the Hadoop cluster does not run Hortonworks HDP 2.3.

Configure Hadoop Cluster Properties for Hortonworks HDP

Configure Hadoop cluster properties in the yarn-site.xml file and the mapred-site.xml file that the Data Integration Service uses when it runs mappings on a Hortonworks HDP cluster.

Configure yarn-site.xml for the Data Integration Service

You need to configure the Hortonworks cluster properties in the yarn-site.xml file that the Data Integration Service uses when it runs mappings in a Hadoop cluster. If you use the Big Data Edition Configuration Utility to configure Big Data Edition, yarn-site.xml is automatically configured.
Open the yarn-site.xml file in the following directory on the node on which the Data Integration Service runs:
<Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf/

Configure the following property in the yarn-site.xml file:

yarn.resourcemanager.scheduler.address
Scheduler interface address. Use the value in the following file on the cluster: /etc/hadoop/conf/yarn-site.xml

The following sample text shows the property you can set in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hostname:port</value>
  <description>The address of the scheduler interface</description>
</property>

Configure mapred-site.xml for the Data Integration Service

You need to configure the Hortonworks cluster properties in the mapred-site.xml file that the Data Integration Service uses when it runs mappings in a Hadoop cluster.

Open the mapred-site.xml file in the following directory on the node on which the Data Integration Service runs:
<Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf/

Configure the following properties in the mapred-site.xml file:

mapreduce.jobhistory.intermediate-done-dir
Directory where MapReduce jobs write history files. Use the value in the following file on the cluster: /etc/hadoop/conf/mapred-site.xml

mapreduce.jobhistory.done-dir
Directory where the MapReduce JobHistory server manages history files. Use the value in the following file on the cluster: /etc/hadoop/conf/mapred-site.xml

The following sample text shows the properties you must set in the mapred-site.xml file:

<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/mr-history/tmp</value>
  <description>Directory where MapReduce jobs write history files.</description>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/mr-history/done</value>
  <description>Directory where the MapReduce JobHistory server manages history files.</description>
</property>

If you use the Big Data Edition Configuration Utility to configure Big Data Edition, the following properties are automatically configured in mapred-site.xml. If you do not use the utility, configure the following properties in mapred-site.xml:

mapreduce.jobhistory.address
Location of the MapReduce JobHistory Server. Use the value in the following file on the cluster: /etc/hadoop/conf/mapred-site.xml

mapreduce.jobhistory.webapp.address
Web address of the MapReduce JobHistory Server. Use the value in the following file on the cluster: /etc/hadoop/conf/mapred-site.xml

The following sample text shows the properties you can set in the mapred-site.xml file:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hostname:port</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hostname:port</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>

Configure Rolling Upgrades for Hortonworks HDP

To enable support for rolling upgrades for Hortonworks HDP, you must configure the following properties in mapred-site.xml on the machine where the Data Integration Service runs:

mapreduce.application.classpath
Classpaths for MapReduce applications. Use the following value:
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/<hadoop_version>/hadoop/lib/hadoop-lzo jar:/etc/hadoop/conf/secure
Replace <hadoop_version> with the full version string of your Hortonworks HDP 2.3 cluster.

mapreduce.application.framework.path
Path for the MapReduce framework archive. Use the following value:
/hdp/apps/<hadoop_version>/mapreduce/mapreduce.tar.gz#mr-framework
Replace <hadoop_version> with the full version string of your Hortonworks HDP 2.3 cluster.

The following sample text shows the properties you can set in the mapred-site.xml file:

<property>
  <name>mapreduce.application.classpath</name>
  <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/<hadoop_version>/hadoop/lib/hadoop-lzo jar:/etc/hadoop/conf/secure</value>
  <description>Classpaths for MapReduce applications. Replace <hadoop_version> with the full version string of your Hortonworks HDP 2.3 cluster.</description>
</property>
<property>
  <name>mapreduce.application.framework.path</name>
  <value>/hdp/apps/<hadoop_version>/mapreduce/mapreduce.tar.gz#mr-framework</value>
  <description>Path for the MapReduce framework archive. Replace <hadoop_version> with the full version string of your Hortonworks HDP 2.3 cluster.</description>
</property>

Update the Repository Plug-in

If you upgraded an existing repository, you must update the repository plug-in to enable PowerExchange for HDFS to run on the Hadoop distribution. If you created a new repository, skip this task.

1. Ensure that the Repository Service is running in exclusive mode.
2. On the server machine, open the command console.
3. Run cd <Informatica installation directory>/server/bin
4. Run ./pmrep connect -r <repo_name> -d <domain_name> -n <username> -x <password>
5. Run ./pmrep registerplugin -i native/pmhdfs.xml -e -N true
6. Set the Repository Service to normal mode.
7. Open the PowerCenter Workflow Manager on the client machine. The distribution appears in the Connection Object menu.

Add hbase_protocol.jar to the Hadoop classpath

Add hbase-protocol.jar to the Hadoop classpath on every node on the Hadoop cluster. Then, restart the Node Manager for each node in the Hadoop cluster. hbase-protocol.jar is located in the HBase installation directory on the Hadoop cluster.

Optional Hortonworks HDP Configuration

Optionally, you can enable support for the following Hortonworks HDP features:
- HBase
- Tez
- High Availability

Enable HBase Support

To use HBase as a source or target when you run a mapping in the Hive environment, you must add hbase-site.xml to a distributed cache. Perform the following steps:

1. On the machine where the Data Integration Service runs, go to the following directory: <Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/infaConf
2. Edit hadoopEnv.properties.
3. Verify that the HBase version specified in infapdo.env.entry.mapred_classpath matches the HBase version for Hortonworks HDP 2.3. The following sample text shows infapdo.env.entry.mapred_classpath with the correct HBase version:
   infapdo.env.entry.mapred_classpath=INFA_MAPRED_CLASSPATH=$HADOOP_NODE_HADOOP_DIST/lib/hbase-server jar:$HADOOP_NODE_HADOOP_DIST/lib/htrace-core.jar:$HADOOP_NODE_HADOOP_DIST/lib/htrace-core-2.04.jar:$HADOOP_NODE_HADOOP_DIST/lib/protobuf-java jar:$HADOOP_NODE_HADOOP_DIST/lib/hbase-client jar:$HADOOP_NODE_HADOOP_DIST/lib/hbase-common jar:$HADOOP_NODE_HADOOP_DIST/lib/hive-hbase-handler jar:$HADOOP_NODE_HADOOP_DIST/lib/hbase-protocol jar
4. Add the following entry to the infapdo.aux.jars.path variable: file://$DIS_HADOOP_DIST/conf/hbase-site.xml
   The following sample text shows infapdo.aux.jars.path with the entry added:
   infapdo.aux.jars.path=file://$DIS_HADOOP_DIST/infaLib/hive infa-boot.jar,file://$DIS_HADOOP_DIST/infaLib/hive-infa-plugins-interface.jar,file://$DIS_HADOOP_DIST/infaLib/profiling-hive hw21-udf.jar,file://$DIS_HADOOP_DIST/infaLib/hadoop avro_complex_file.jar,file://$DIS_HADOOP_DIST/conf/hbase-site.xml
5. On the machine where the Data Integration Service runs, go to the following directory: <Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf
6. In hbase-site.xml and hive-site.xml, verify that the zookeeper.znode.parent property exists and matches the property set in hbase-site.xml on the cluster. By default, the cluster's hbase-site.xml is located in /usr/hdp/current/hbase-client/conf.
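To verify step 6, you can compare the zookeeper.znode.parent values in the two files. The sketch below creates sample files for illustration and assumes the single-line <value> layout shown; in practice, point the paths at the hortonworks_<version>/conf directory:

```shell
#!/bin/sh
# Sketch: verify zookeeper.znode.parent is identical in hbase-site.xml and
# hive-site.xml. Sample files are created here so the check is runnable;
# replace them with the real files under services/shared/hadoop/.../conf.
cat > hbase-site.xml <<'EOF'
<property><name>zookeeper.znode.parent</name><value>/hbase-unsecure</value></property>
EOF
cat > hive-site.xml <<'EOF'
<property><name>zookeeper.znode.parent</name><value>/hbase-unsecure</value></property>
EOF

# Naive extraction: grab the first <value> element from the file.
get_znode() { grep -o '<value>[^<]*</value>' "$1" | sed 's/<[^>]*>//g'; }

a=$(get_znode hbase-site.xml)
b=$(get_znode hive-site.xml)
if [ "$a" = "$b" ]; then
  echo "zookeeper.znode.parent matches: $a"
else
  echo "MISMATCH: $a vs $b" >&2
  exit 1
fi
```

The grep-based extraction is deliberately simplistic; for multi-property files, a proper XML tool gives more reliable results.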

Enable Tez

To use Tez to push mapping logic to the Hadoop cluster, enable Tez for the Data Integration Service or for a Hive connection. When you enable Tez for the Data Integration Service, Tez becomes the default execution engine to push mapping logic to the Hadoop cluster. When you enable Tez for a Hive connection, Tez takes precedence over the execution engine set for the Data Integration Service.

Enable Tez for the Data Integration Service

To use Tez to push mapping logic to the Hadoop cluster, enable Tez for the Data Integration Service.

Open hive-site.xml in the following directory on the node on which the Data Integration Service runs:
<Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf/

Configure the following property:

hive.execution.engine
Chooses the execution engine. Use "mr" for MapReduce or "tez" for Tez, which requires Hadoop 2.

The following sample text shows the property you can set in hive-site.xml:

<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
  <description>Chooses execution engine. Options are: mr (MapReduce, default) or tez (Hadoop 2 only)</description>
</property>

To use MapReduce as the default execution engine to push mapping logic to the Hadoop cluster, use "mr" as the value for the hive.execution.engine property.

Enable Tez for a Hive Connection

When you enable Tez for a Hive connection, the Data Integration Service uses Tez to push mapping logic to the Hadoop cluster regardless of what is set for the Data Integration Service.

1. Open the Developer tool.
2. Click Window > Preferences.
3. Select Informatica > Connections.
4. Expand the domain.
5. Expand Databases and select the Hive connection.
6. Edit the Hive connection and configure the Environment SQL property on the Database Connection tab.
Use the following value: set hive.execution.engine=tez;

If you enable Tez for the Data Integration Service but want to use MapReduce, you can use the following value for the Environment SQL property: set hive.execution.engine=mr;

Configure Tez

After you enable Tez, you must configure properties in tez-site.xml. You can find tez-site.xml in the following directory on the machine where the Data Integration Service runs: <Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf

Configure the following properties:

tez.lib.uris
Specifies the location of tez.tar.gz on the Hadoop cluster. Use the value specified in tez-site.xml on the cluster. You can find tez-site.xml in the following directory on any node in the cluster: /etc/tez/conf

tez.am.launch.env
Specifies the location of the Hadoop libraries.

Use the following syntax when you configure tez-site.xml:

<property>
  <name>tez.lib.uris</name>
  <value><file system default name>://<directory of tez.tar.gz></value>
  <description>The location of tez.tar.gz. Set tez.lib.uris to point to the tar.gz uploaded to HDFS.</description>
</property>
<property>
  <name>tez.am.launch.env</name>
  <value>LD_LIBRARY_PATH=<HDP directory>/<HDP version>/hadoop/lib/native</value>
  <description>The location of the Hadoop libraries.</description>
</property>

The following example shows the properties if tez.tar.gz is in the /apps/tez/lib directory on HDFS:

<property>
  <name>tez.lib.uris</name>
  <value>hdfs://hdp/apps/tez/lib/tez.tar.gz</value>
  <description>The location of tez.tar.gz. Set tez.lib.uris to point to the tar.gz uploaded to HDFS.</description>
</property>
<property>
  <name>tez.am.launch.env</name>
  <value>LD_LIBRARY_PATH=/usr/hdp/<hadoop_version>/hadoop/lib/native</value>
  <description>The location of the Hadoop libraries.</description>
</property>

Enable Support for a Highly Available Hortonworks HDP Cluster

You can enable the Data Integration Service and the Developer tool to read from and write to a highly available Hortonworks cluster. The Hortonworks cluster provides a highly available NameNode and ResourceManager. To enable support for a highly available Hortonworks HDP cluster, perform the following tasks:

1. Configure cluster properties for high availability.
2. Configure the connection to the cluster.

Configure Cluster Properties for a Highly Available NameNode

You must configure cluster properties in hive-site.xml to enable support for a highly available NameNode. On the machine where the Data Integration Service runs, you can find hive-site.xml in the following directory: <Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf

Configure the following properties in hive-site.xml:

dfs.ha.automatic-failover.enabled
This property determines whether automatic failover is enabled. Set this value to true.
dfs.ha.namenodes.<clustername>
The cluster name is specified in the dfs.nameservices property. The following sample text shows the property for a cluster named cluster01: dfs.ha.namenodes.cluster01. Specify the NameNode IDs as a comma-separated list. For example, you can use the following values: nn1,nn2.

dfs.namenode.https-address
The HTTPS server that the NameNode listens on.

dfs.namenode.https-address.<clustername>.<namenodeid>
The HTTPS server that a highly available NameNode specified in dfs.ha.namenodes.<clustername> listens on. Each NameNode requires a separate entry. For example, if you have two highly available NameNodes, you must have two corresponding dfs.namenode.https-address.<clustername>.<namenodeid> properties. The following sample text shows the property for a NameNode with the ID nn1 on a cluster named cluster01: dfs.namenode.https-address.cluster01.nn1

dfs.namenode.http-address
The HTTP server that the NameNode listens on.

dfs.namenode.http-address.<clustername>.<namenodeid>
The HTTP server that a highly available NameNode specified in dfs.ha.namenodes.<clustername> listens on. Each NameNode requires a separate entry. For example, if you have two highly available NameNodes, you must have two corresponding dfs.namenode.http-address.<clustername>.<namenodeid> properties. The following sample text shows the property for a NameNode with the ID nn1 on a cluster named cluster01: dfs.namenode.http-address.cluster01.nn1

dfs.namenode.rpc-address
The fully qualified RPC address that the NameNode listens on.

dfs.namenode.rpc-address.<clustername>.<namenodeid>
The fully qualified RPC address that a highly available NameNode specified in dfs.ha.namenodes.<clustername> listens on. Each NameNode requires a separate entry. For example, if you have two highly available NameNodes, you must have two corresponding dfs.namenode.rpc-address.<clustername>.<namenodeid> properties. The following sample text shows the property for a NameNode with the ID nn1 on a cluster named cluster01: dfs.namenode.rpc-address.cluster01.nn1
The following sample text shows the properties for two highly available NameNodes with the IDs nn1 and nn2 on a cluster named cluster01:

<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.ha.namenodes.cluster01</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.https-address</name>
  <value>node01.domain01.com:50470</value>
</property>
<property>
  <name>dfs.namenode.https-address.cluster01.nn1</name>
  <value>node01.domain01.com:50470</value>
</property>
<property>
  <name>dfs.namenode.https-address.cluster01.nn2</name>
  <value>node02.domain01.com:50470</value>
</property>

<property>
  <name>dfs.namenode.http-address</name>
  <value>node01.domain01.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster01.nn1</name>
  <value>node01.domain01.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.cluster01.nn2</name>
  <value>node02.domain01.com:50070</value>
</property>
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>node01.domain01.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster01.nn1</name>
  <value>node01.domain01.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster01.nn2</name>
  <value>node02.domain01.com:8020</value>
</property>

Configure Cluster Properties for a Highly Available Resource Manager

You must configure cluster properties in yarn-site.xml to enable support for a highly available Resource Manager. On the machine where the Data Integration Service runs, you can find yarn-site.xml in the following directory: <Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf

Configure the following properties in yarn-site.xml:

yarn.resourcemanager.ha.enabled
This property determines whether high availability is enabled for Resource Managers. Set this value to true.

yarn.resourcemanager.ha.rm-ids
List of highly available Resource Manager IDs. For example, you can use the following values: rm1,rm2.

yarn.resourcemanager.hostname
The host name for the Resource Manager.

yarn.resourcemanager.hostname.<resourcemanagerid>
Host name for one of the highly available Resource Managers specified in yarn.resourcemanager.ha.rm-ids. Each Resource Manager requires a separate entry. For example, if you have two Resource Managers, you must have two corresponding yarn.resourcemanager.hostname.<resourcemanagerid> properties. The following sample text shows the property for a Resource Manager with the ID rm1: yarn.resourcemanager.hostname.rm1

yarn.resourcemanager.webapp.address.<resourcemanagerid>
The HTTP address for the web application of one of the Resource Managers specified in yarn.resourcemanager.ha.rm-ids. Each Resource Manager requires a separate entry.

yarn.resourcemanager.scheduler.address
The address of the scheduler interface.

yarn.resourcemanager.scheduler.address.<resourcemanagerid>
The address of the scheduler interface for one of the highly available Resource Managers. Each Resource Manager requires a separate entry.

The following sample text shows the properties for two highly available Resource Managers with the IDs rm1 and rm2:

<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>node01.domain01.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>node01.domain01.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>node02.domain01.com</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>node01.domain01.com:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>node01.domain01.com:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>node02.domain01.com:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>node01.domain01.com:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>node01.domain01.com:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>node02.domain01.com:8030</value>
</property>
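A quick sanity check for the Resource Manager properties above is to confirm that every ID listed in yarn.resourcemanager.ha.rm-ids has a matching yarn.resourcemanager.hostname.<id> entry. The sketch generates a sample file for illustration and assumes the one-property-per-line layout shown:

```shell
#!/bin/sh
# Sketch: verify each ID in yarn.resourcemanager.ha.rm-ids has a matching
# yarn.resourcemanager.hostname.<id> property. A sample yarn-site.xml
# fragment is generated here; point the path at the real file in practice.
cat > yarn-site-sample.xml <<'EOF'
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>node01.domain01.com</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>node02.domain01.com</value></property>
EOF

# Extract the comma-separated ID list, then check each ID.
ids=$(grep 'ha.rm-ids' yarn-site-sample.xml | sed 's/.*<value>\([^<]*\)<\/value>.*/\1/')
for id in $(echo "$ids" | tr ',' ' '); do
  if grep -q "yarn.resourcemanager.hostname.$id" yarn-site-sample.xml; then
    echo "ok: hostname entry found for $id"
  else
    echo "missing hostname entry for $id" >&2
    exit 1
  fi
done
```

The same pattern can be repeated for the webapp.address.<id> and scheduler.address.<id> entries.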

Configure the Connection to a Highly Available Hortonworks HDP Cluster

You must configure Big Data Edition to connect to a highly available Hortonworks HDP cluster. Perform the following steps:

1. Go to the following directory on the NameNode of the cluster: /etc/hadoop/conf
2. Locate the following files:
   - hdfs-site.xml
   - yarn-site.xml
3. Copy the files to the following directory on the machine where the Data Integration Service runs: <Informatica installation directory>/services/shared/hadoop/hortonworks_<version>/conf
   Note: If you use the Big Data Edition Configuration Utility to configure Big Data Edition, skip this step.
4. Copy the files to the following directory on the machine where the Developer tool runs: <Informatica installation directory>/clients/developerclient/hadoop/hortonworks_<version>/conf
5. Open the Developer tool.
6. Click Window > Preferences.
7. Select Informatica > Connections.
8. Expand the domain.
9. Expand Databases and select the Hive connection.
10. Edit the Hive connection and configure the following properties on the Properties to Run Mappings in Hadoop Cluster tab:
    Default FS URI
    Use the value from the dfs.nameservices property in hdfs-site.xml.
    JobTracker/YARN Resource Manager URI
    Enter any value in the following format: <string>:<port>. For example, enter dummy:<port>.
11. Expand File Systems and select the HDFS connection.
12. Edit the HDFS connection and configure the following property on the Details tab:
    NameNode URI
    Use the value from the dfs.nameservices property in hdfs-site.xml.

Release Notes

EBF16193 adds support for Hortonworks HDP 2.3. Additionally, the EBF upgrades Big Data Edition from version 9.6.1 HotFix 3 to version 9.6.1 HotFix 3 Update 2. The following Release Notes apply to EBF16193.

New Features and Enhancements

This section describes new features and enhancements to Big Data Edition 9.6.1 HotFix 3 Update 2 with EBF16193.

Hadoop Ecosystem

Effective in EBF16193, Big Data Edition supports Hadoop clusters that run Hortonworks HDP 2.3.
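In the connection steps above, the Default FS URI (step 10) and the NameNode URI (step 12) both take their value from the dfs.nameservices property in hdfs-site.xml. The following sketch is not part of Big Data Edition; the helper name and sample file content are illustrative. It extracts that value so you can paste it into both connections:

```python
# Illustrative helper (not an Informatica tool): pull the dfs.nameservices
# value out of hdfs-site.xml. Per the steps above, this is the value to enter
# for the Hive connection's Default FS URI and the HDFS connection's
# NameNode URI. The sample content below is hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>cluster01</value></property>
  <property><name>dfs.ha.namenodes.cluster01</name><value>nn1,nn2</value></property>
</configuration>"""

def nameservice(xml_text):
    """Return the value of dfs.nameservices, or None if it is not set."""
    for prop in ET.fromstring(xml_text).iter("property"):
        if prop.findtext("name") == "dfs.nameservices":
            return prop.findtext("value")
    return None

print(nameservice(SAMPLE_HDFS_SITE))  # → cluster01
```

In practice you would read the file copied from the cluster in step 3 instead of the embedded sample string.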

Kerberos Authentication

Effective in version 9.6.1 HotFix 3 Update 2, Big Data Edition supports Cloudera CDH and Hortonworks HDP clusters that use Microsoft Active Directory as the KDC for Kerberos authentication. For more information, see the Informatica 9.6.1 HotFix 3 Update 2 Big Data Edition User Guide.

Update Strategy Transformation

Effective in version 9.6.1 HotFix 3 Update 2, Big Data Edition supports the Update Strategy transformation for Hive targets in the Hive environment. The Hadoop cluster must use Hive 0.14 or later. For more information, see the Informatica 9.6.1 HotFix 3 Update 2 Big Data Edition User Guide.

Changes

This section describes changes to Big Data Edition 9.6.1 HotFix 3 Update 2.

Kerberos Authentication

Effective in version 9.6.1 HotFix 3 Update 2, Big Data Edition dropped support for Hadoop clusters that use only an MIT KDC for Kerberos authentication.

Fixed Limitations

The following list describes fixed limitations:

- A mapping fails to run in the Hive environment with a permission denied error for the scratch directory when the following conditions are true:
  - The cluster runs Hortonworks HDP.
  - The user designated in the Hive connection is not the Data Integration Service user.

- A mapping that contains a Lookup transformation creates temporary jar files that are not removed after the mapping completes in the Hive environment.

- In the Analyst tool, testing a Hive connection fails for a Hadoop cluster that uses Kerberos authentication.

- When you run a mapping with a JDBC source and target in the Hive environment, the mapping fails in Hortonworks version 2.2 with the following error in the job logs:
  INFO [IPC Server handler 5 on 50241] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_ _0216_m_000000_0: Error: java.io.IOException: Mapping execution failed with the following error: ODL_26128 Database error encountered in connection object [insplash_stghdlr_base] with the following error message: [The Data Integration Service could not find the run-time OSGi bundle for the adapter [com.informatica.adapter.infajdbc.InfaJDBCConnectInfo] for the operating system [LINUX]. Copy the adapter run-time OSGi bundle and verify that you have set the correct library name in the plugin.xml file.]

- A mapping that contains a Lookup transformation fails with the following error in the Hive environment:
  [main] ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script.

Known Limitations

The following list describes known limitations:

- A mapping fails to run in the Hive environment when the following conditions are true:
  - The cluster runs Hortonworks HDP.
  - The mapping has a flat file target.
  - The user designated in the Hive connection is not the Data Integration Service user.

Third-Party Limitations

The following list describes third-party limitations:

- The Update Strategy transformation fails to insert data into a bucketed target table. This is a third-party limitation for Hive versions before 1.3. For more information, see the corresponding Hive limitation.

Author

Big Data Edition Team


Configuring Apache Knox SSO 3 Configuring Apache Knox SSO Date of Publish: 2018-07-15 http://docs.hortonworks.com Contents Configuring Knox SSO... 3 Configuring an Identity Provider (IdP)... 4 Configuring an LDAP/AD Identity Provider

More information

Oracle Cloud Using Oracle Big Data Cloud. Release 18.1

Oracle Cloud Using Oracle Big Data Cloud. Release 18.1 Oracle Cloud Using Oracle Big Data Cloud Release 18.1 E70336-14 March 2018 Oracle Cloud Using Oracle Big Data Cloud, Release 18.1 E70336-14 Copyright 2017, 2018, Oracle and/or its affiliates. All rights

More information

Oracle BDA: Working With Mammoth - 1

Oracle BDA: Working With Mammoth - 1 Hello and welcome to this online, self-paced course titled Administering and Managing the Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled Working With Mammoth.

More information

Apache Ranger User Guide

Apache Ranger User Guide Apache Ranger 0.5 - User Guide USER GUIDE Version : 0.5.0 September 2015 About this document Getting started General Features Login to the system: Log out to the system: Service Manager (Access Manager)

More information

Hitachi Hyper Scale-Out Platform (HSP) Hortonworks Ambari VM Quick Reference Guide

Hitachi Hyper Scale-Out Platform (HSP) Hortonworks Ambari VM Quick Reference Guide Hitachi Hyper Scale-Out Platform (HSP) MK-95HSP013-03 14 October 2016 2016 Hitachi, Ltd. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic

More information

TIBCO Spotfire Connecting to a Kerberized Data Source

TIBCO Spotfire Connecting to a Kerberized Data Source TIBCO Spotfire Connecting to a Kerberized Data Source Introduction Use Cases for Kerberized Data Sources in TIBCO Spotfire Connecting to a Kerberized Data Source from a TIBCO Spotfire Client Connecting

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Apache Zeppelin Component Guide (December 15, 2017) docs.hortonworks.com Hortonworks Data Platform: Apache Zeppelin Component Guide Copyright 2012-2017 Hortonworks, Inc. Some

More information

Spectrum Version Release Notes

Spectrum Version Release Notes Spectrum Spatial for Big Data Version 2.6.1 Release Notes This document contains the new and updated features for Spectrum Spatial for Big Data. Contents: What's New? 2 Fixed Issues 3 Known Issues 3 System

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

BIG DATA TRAINING PRESENTATION

BIG DATA TRAINING PRESENTATION BIG DATA TRAINING PRESENTATION TOPICS TO BE COVERED HADOOP YARN MAP REDUCE SPARK FLUME SQOOP OOZIE AMBARI TOPICS TO BE COVERED FALCON RANGER KNOX SENTRY MASTER IMAGE INSTALLATION 1 JAVA INSTALLATION: 1.

More information

Informatica Big Data Management Hadoop Integration Guide

Informatica Big Data Management Hadoop Integration Guide Informatica Big Data Management 10.2 Hadoop Integration Guide Informatica Big Data Management Hadoop Integration Guide 10.2 September 2017 Copyright Informatica LLC 2014, 2018 This software and documentation

More information

Talend Open Studio for Big Data. Getting Started Guide 5.3.2

Talend Open Studio for Big Data. Getting Started Guide 5.3.2 Talend Open Studio for Big Data Getting Started Guide 5.3.2 Talend Open Studio for Big Data Adapted for v5.3.2. Supersedes previous Getting Started Guide releases. Publication date: January 24, 2014 Copyleft

More information

Configuring a JDBC Resource for IBM DB2/ iseries in Metadata Manager HotFix 2

Configuring a JDBC Resource for IBM DB2/ iseries in Metadata Manager HotFix 2 Configuring a JDBC Resource for IBM DB2/ iseries in Metadata Manager 9.5.1 HotFix 2 2013 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Quick Install for Amazon EMR

Quick Install for Amazon EMR Quick Install for Amazon EMR Version: 4.2 Doc Build Date: 11/15/2017 Copyright Trifacta Inc. 2017 - All Rights Reserved. CONFIDENTIAL These materials (the Documentation ) are the confidential and proprietary

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

How to Use Full Pushdown Optimization in PowerCenter

How to Use Full Pushdown Optimization in PowerCenter How to Use Full Pushdown Optimization in PowerCenter 2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording

More information

Managing Data Operating System

Managing Data Operating System 3 Date of Publish: 2018-12-11 http://docs.hortonworks.com Contents ii Contents Introduction...4 Understanding YARN architecture and features... 4 Application Development... 8 Using the YARN REST APIs to

More information

Hortonworks Data Platform

Hortonworks Data Platform Hortonworks Data Platform Apache Ambari Upgrade for IBM Power Systems (May 17, 2018) docs.hortonworks.com Hortonworks Data Platform: Apache Ambari Upgrade for IBM Power Systems Copyright 2012-2018 Hortonworks,

More information