How to Install and Configure EBF15545 for MapR with MapReduce 2


Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.

Abstract

Enable Big Data Edition to run mappings on a Hadoop cluster on MapR with MapReduce 2.

Supported Versions

Informatica Big Data Edition 9.6.1 HotFix 2

Table of Contents

Overview
Step 1. Download EBF15545
Step 2. Update the Informatica Domain
    Applying EBF15545 to the Informatica Domain
    Configuring MapR Distribution Variables for Mappings in a Hive Environment
    Configuring Hadoop Cluster Properties in yarn-site.xml
Step 3. Update the Hadoop Cluster
    Applying EBF15545 to the Hadoop Cluster
    Configure the Heap Space for the MapR-FS
    Verifying the Cluster Details
Step 4. Update the Developer tool
    Applying EBF15545 to the Informatica Clients
    Configuring the Developer tool
Step 5. Update PowerCenter
    Configuring the PowerCenter Integration Service
    Copying MapR Distribution Files for PowerCenter Mappings in the Native Environment
Enable User Impersonation for Native and Hive Execution Environments
Connections Overview
    HDFS Connection Properties
    HBase Connection Properties
    Hive Connection Properties
    Creating a Connection
Troubleshooting

Overview

EBF15545 adds support for MapR 4.0.2 with MapReduce 2 to Informatica 9.6.1 HotFix 2.

Note: Teradata Connector for Hadoop (Command Line Edition) does not support MapR 4.0.2. Only MapR 3.1 is supported.

To apply the EBF and configure Informatica, perform the following tasks:

1. Download the EBF.

2. Update the Informatica domain.
   Note: If the Data Integration Service runs on a machine that uses SUSE 11, the native mode of execution and Hive pushdown are not supported. Use a Data Integration Service that runs on a machine that uses RHEL.
3. Update the Hadoop cluster.
4. Update the Developer tool client.
5. Update PowerCenter.

Optionally, you can enable support for user impersonation.

Step 1. Download EBF15545

Before you enable MapR with MapReduce 2 for Informatica 9.6.1 HotFix 2, download the EBF.

1. Open a browser.
2. In the address field, enter the URL of the Informatica software download site.
3. Navigate to the following directory: /updates/informatica9/9.6.1 HotFix2/EBF15545
4. Download the following files:
   EBF15545.Linux64-X86.tar.gz
      Contains the EBF installer for the Informatica domain and the Hadoop cluster.
   EBF15545_Client_Installer_win32_x86.zip
      Contains the EBF installer for the Informatica client. Use this file to update the Developer tool.
5. Extract the files from EBF15545.Linux64-X86.tar.gz (a sketch of the extraction follows this list). The EBF15545.Linux64-X86.tar.gz file contains the following .tar files:
   EBF15545_Server_installer_linux_em64t.tar
      EBF installer for the Informatica domain. Use this file to update the Informatica domain.
   EBF15545_HadoopRPM_EBFInstaller.tar
      EBF installer for the Hadoop RPM. Use this file to update the Hadoop cluster.
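For example, the following commands extract the archive and list the installers it contains. This is a minimal sketch; the staging directory /tmp/EBF15545 is an assumption, and it assumes the downloaded file was copied there.

# Staging directory is an assumption; use any temporary location.
mkdir -p /tmp/EBF15545 && cd /tmp/EBF15545
# Extract the archive downloaded from the Informatica site.
tar -xzvf EBF15545.Linux64-X86.tar.gz
# The archive contains the domain installer and the Hadoop RPM installer.
ls -l EBF15545_Server_installer_linux_em64t.tar EBF15545_HadoopRPM_EBFInstaller.tar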

Step 2. Update the Informatica Domain

Update the Informatica domain to enable MapR with MapReduce 2.

Note: If the Data Integration Service runs on a machine that uses SUSE 11, the native mode of execution and Hive pushdown are not supported. Use a Data Integration Service that runs on a machine that uses RHEL.

Perform the following tasks:
1. Apply the EBF to the Informatica domain.
2. Configure MapR distribution variables for mappings in a Hive environment.
3. Configure Hadoop cluster properties in yarn-site.xml.

Applying EBF15545 to the Informatica Domain

Apply the EBF to every node in the domain that is used to connect to HDFS or HiveServer on MapR 4.0.2.

To apply the EBF to a node in the domain, perform the following steps. A sketch of the full sequence on one node follows this section.

1. Copy EBF15545_Server_installer_linux_em64t.tar to a temporary location on the node.
2. Extract the installer file. Run the following command:
   tar -xvf EBF15545_Server_installer_linux_em64t.tar
3. Configure the following properties in the Input.properties file:
   DEST_DIR=<Informatica installation directory>
   ROLLBACK=0
4. Run installebf.sh.
5. Repeat steps 1 through 4 for every node in the domain that is used for Hive pushdown.

Note: To roll back the EBF for the Informatica domain on a node, set ROLLBACK to 1 and run installebf.sh.
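A minimal sketch of steps 1 through 4 on a single node. The staging directory /tmp/EBF15545 and the installation directory /opt/Informatica/9.6.1 are assumptions, as is the location of Input.properties and installebf.sh in the extraction directory; substitute the paths used in your environment.

# Assumed staging directory on the domain node.
cd /tmp/EBF15545
tar -xvf EBF15545_Server_installer_linux_em64t.tar
# Edit Input.properties in the extracted installer directory (location is an assumption).
sed -i 's|^DEST_DIR=.*|DEST_DIR=/opt/Informatica/9.6.1|' Input.properties
sed -i 's|^ROLLBACK=.*|ROLLBACK=0|' Input.properties
# Apply the EBF to this node.
sh installebf.sh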

Configuring MapR Distribution Variables for Mappings in a Hive Environment

When you use the MapR distribution to run mappings in a Hive environment, you must configure MapR environment variables.

Configure the following MapR variables:
- Add MAPR_HOME to the environment variables in the Data Integration Service Process properties. Set MAPR_HOME to the following path: <Informatica installation directory>/services/shared/hadoop/mapr_4.0.2_yarn
- Add -Dmapr.library.flatclass to the custom properties in the Data Integration Service Process properties. For example, add JVMOption1=-Dmapr.library.flatclass
- Add -Dmapr.library.flatclass to the Data Integration Service advanced property JVM Command Line Options.
- Set the MapR Container Location Database (CLDB) name in the following file: <Informatica installation directory>/services/shared/hadoop/mapr_4.0.2_yarn/conf/mapr-clusters.conf. For example, add the following entry:
  INFAMAPR402 secure=false <master_node_name>:7222

Configuring Hadoop Cluster Properties in yarn-site.xml

To run mappings on a MapR cluster, you must configure the cluster properties in yarn-site.xml on the machine where the Data Integration Service runs.

yarn-site.xml is located in the following directory on the machine where the Data Integration Service runs: <Informatica installation directory>/services/shared/hadoop/mapr_4.0.2_yarn/conf/

In yarn-site.xml, configure the following properties:

mapreduce.jobhistory.address
   Location of the MapReduce JobHistory Server. Use the value in the following file on the cluster: /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/mapred-site.xml
mapreduce.jobhistory.webapp.address
   Web address of the MapReduce JobHistory Server. Use the value in the following file on the cluster: /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/mapred-site.xml
yarn.resourcemanager.scheduler.address
   Scheduler interface address. Use the value in the following file on the cluster: /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/yarn-site.xml

The following sample code describes the properties you can set in yarn-site.xml:

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hostname:port</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hostname:port</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hostname:port</value>
  <description>The address of the scheduler interface</description>
</property>

Step 3. Update the Hadoop Cluster

To update the Hadoop cluster to enable MapR 4.0.2, perform the following tasks:
1. Apply the EBF to the Hadoop cluster.
2. Configure the heap space for the MapR-FS.
3. Verify the cluster details.

Applying EBF15545 to the Hadoop Cluster

To apply the EBF to the Hadoop cluster, perform the following steps. A sketch of the sequence follows these steps.

1. Copy EBF15545_HadoopRPM_EBFInstaller.tar to a temporary location on the cluster machine.
2. Extract the installer file. Run the following command:
   tar -xvf EBF15545_HadoopRPM_EBFInstaller.tar
3. Provide the node list in the HadoopDataNodes file.
4. Configure the destdir parameter in the input.properties file:
   destdir=<Informatica home directory>
   For example, set the destdir parameter to the following value:
   destdir="/opt/informatica"
5. Run InformaticaHadoopEBFInstall.sh.
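A minimal sketch of the cluster update, assuming the installer was copied to /tmp/EBF15545 on a cluster machine, that the HadoopDataNodes and input.properties files sit in the extraction directory, and that Big Data Edition is installed under /opt/informatica on the data nodes; all of these are assumptions to adjust for your environment.

cd /tmp/EBF15545
tar -xvf EBF15545_HadoopRPM_EBFInstaller.tar
# List one cluster node host name per line in the HadoopDataNodes file (host names are examples).
printf 'node1.example.com\nnode2.example.com\n' > HadoopDataNodes
# Point destdir at the Big Data Edition installation directory on the data nodes.
sed -i 's|^destdir=.*|destdir="/opt/informatica"|' input.properties
# Run the EBF installer for the Hadoop RPM.
sh InformaticaHadoopEBFInstall.sh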

Configure the Heap Space for the MapR-FS

You must configure the heap space reserved for the MapR-FS on every node in the cluster. Perform the following steps:
1. Navigate to the following directory: /opt/mapr/conf
2. Edit the warden.conf file.
3. Set the value for the service.command.mfs.heapsize.percent property.
4. Save and close the file.
5. Repeat steps 1 through 4 for every node in the cluster.
6. Restart the cluster.

Verifying the Cluster Details

Verify the following settings for the MapR cluster:

MapReduce Version
   If the cluster is configured for Classic MRv1, use the MapR Control System (MCS) to change the configuration to YARN. Then, restart the cluster.

MapR User Details
   Verify that the MapR user exists on each Hadoop cluster node and that the following properties match:
   - User ID (uid)
   - Group ID (gid)
   - Groups
   For example, the MapR user might have the following properties:
   uid=2000(mapr) gid=2000(mapr) groups=2000(mapr)

Data Integration Service User Details
   Verify that the user who runs the Data Integration Service is assigned the same gid as the MapR user and belongs to the same group. For example, a Data Integration Service user named testuser might have the following properties:
   uid=30103(testuser) gid=2000(mapr) groups=2000(mapr)
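To compare the identities quickly, you can run the id command on a cluster node and on the Data Integration Service machine. This is a sketch; the user names mapr and testuser follow the examples above.

# On each Hadoop cluster node: confirm the mapr user's uid, gid, and groups.
id mapr
# On the Data Integration Service machine: confirm the service user shares the mapr gid and group.
id testuser
# Expected output resembles the example above:
#   uid=30103(testuser) gid=2000(mapr) groups=2000(mapr)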

After you verify the Data Integration Service user details, perform the following steps:
1. Create a user that has the same user ID and name as the Data Integration Service user.
2. Add this user to all the nodes in the Hadoop cluster and assign it to the mapr group.
3. Verify that the user you created has read and write permissions for the following directory: /opt/mapr/hive/hive-0.13/logs
   A directory corresponding to the user will be created at this location.
4. Verify that the user you created has permissions for the Hive warehouse directory.
   The Hive warehouse directory is set in the following file: /opt/mapr/hive/hive-0.13/conf/hive-site.xml. For example, if the warehouse directory is /user/hive/warehouse, run the following command to grant the user permissions for the directory:
   hadoop fs -chmod -R 777 /user/hive/warehouse

Step 4. Update the Developer tool

Update the Developer tool to enable MapR 4.0.2. Perform the following tasks:
1. Apply the EBF to the Informatica clients.
2. Configure the Developer tool.

Applying EBF15545 to the Informatica Clients

To apply the EBF to the Informatica client, perform the following steps:
1. Copy EBF15545_Client_Installer_win32_x86.zip to the Windows client machine.
2. Extract the installer.
3. Configure the following properties in the Input.properties file:
   DEST_DIR=<Informatica installation directory>
   ROLLBACK=0
   Use two backslashes when you set the DEST_DIR property. For example, include the following lines in the Input.properties file:
   DEST_DIR=C:\\Informatica\\9.6.1HF2RC
   ROLLBACK=0
4. Run installebf.bat.

Note: To roll back the EBF for the Informatica client, set ROLLBACK to 1 and run installebf.bat.

Configuring the Developer tool

To configure the Developer tool after you apply the EBF, perform the following steps:
1. Go to the following directory on any node in the Hadoop cluster: <MapR installation directory>/conf
2. Find the mapr-clusters.conf file.
3. Copy the file to the following directory on the machine on which the Developer tool runs: <Informatica installation directory>\clients\developerclient\hadoop\mapr_402\conf
4. Go to the following directory on the machine on which the Developer tool runs: <Informatica installation directory>\<version>\clients\developerclient
5. Edit run.bat to set the MAPR_HOME environment variable and include the -clean setting. For example, include the following lines:
   set MAPR_HOME=<Informatica installation directory>\clients\developerclient\hadoop\mapr_402
   developercore.exe -clean
6. Save and close the file.

7. Add the following values to the developercore.ini file:
   -Dmapr.library.flatclass
   -Djava.library.path=hadoop\mapr_402\lib\native\Win32;bin;..\DT\bin
   You can find developercore.ini in the following directory: <Informatica installation directory>\clients\developerclient
8. Save and close the file.
9. Use run.bat to start the Developer tool.

Step 5. Update PowerCenter

Update the Informatica domain to enable MapR 4.0.2 for PowerCenter. Perform the following tasks:
1. Update the repository plugin.
2. Configure the PowerCenter Integration Service.
3. Copy MapR distribution files for PowerCenter mappings in the native environment.

Configuring the PowerCenter Integration Service

To enable support for MapR 4.0.2, configure the PowerCenter Integration Service. Perform the following steps:
1. Log in to the Administrator tool.
2. In the Domain Navigator, select the PowerCenter Integration Service.
3. Click the Processes view.
4. Add the following environment variable: MAPR_HOME
   Use the following value: <INFA_HOME>/server/bin/javalib/hadoop/mapr402
5. Add the following custom property: JVMClassPath
   Use the following value: <INFA_HOME>/server/bin/javalib/hadoop/mapr402/*:<INFA_HOME>/server/bin/javalib/hadoop/*

Copying MapR Distribution Files for PowerCenter Mappings in the Native Environment

When you use the MapR distribution to run mappings in a native environment, you must copy MapR files to the machine on which you install Big Data Edition. A sketch of the copy commands follows this procedure.

1. Go to the following directory on any node in the cluster: <MapR installation directory>/conf
   For example, go to the following directory on any node in the cluster: /opt/mapr/conf
2. Find the following files:
   mapr-clusters.conf
   mapr.login.conf
3. Copy the files to the following directory on the machine on which the Data Integration Service runs: <Informatica installation directory>/server/bin/javalib/hadoop/mapr402/conf
4. Log in to the Administrator tool.
5. In the Domain Navigator, select the PowerCenter Integration Service.
6. Recycle the service. Click Actions > Recycle Service.
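A minimal sketch of steps 1 through 3, assuming the machine can reach a cluster node over SSH; the host name node1.example.com and the installation directory /opt/Informatica are assumptions.

# Run on the machine that hosts the Informatica services.
INFA_HOME=/opt/Informatica            # assumed Informatica installation directory
CLUSTER_NODE=node1.example.com        # assumed cluster node host name
scp "$CLUSTER_NODE:/opt/mapr/conf/mapr-clusters.conf" \
    "$CLUSTER_NODE:/opt/mapr/conf/mapr.login.conf" \
    "$INFA_HOME/server/bin/javalib/hadoop/mapr402/conf/"
# Then recycle the PowerCenter Integration Service in the Administrator tool.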

Enable User Impersonation for Native and Hive Execution Environments

User impersonation allows the Data Integration Service to submit Hadoop jobs as a specific user. By default, Hadoop jobs are submitted as the user who runs the Data Integration Service.

To enable user impersonation for the native and Hive environments, perform the following steps. A consolidated sketch of the commands follows this list.

1. Go to the following directory on the machine on which the Data Integration Service runs: <Informatica installation directory>/services/shared/hadoop/mapr_4.0.2_yarn/conf
2. Create a directory named "proxy". Run the following command:
   mkdir <Informatica installation directory>/services/shared/hadoop/mapr_4.0.2_yarn/conf/proxy
3. Change the permissions for the proxy directory to -rwxr-xr-x. Run the following command:
   chmod 755 <Informatica installation directory>/services/shared/hadoop/mapr_4.0.2_yarn/conf/proxy
4. Verify the following details for the user that you want to impersonate with the Data Integration Service user:
   - Exists on the machine on which the Data Integration Service runs.
   - Exists on every node in the Hadoop cluster.
   - Has the same user ID and group ID on the machine on which the Data Integration Service runs and on the Hadoop cluster nodes.
5. Create a file for the Data Integration Service user that impersonates other users. Run the following command:
   touch <Informatica installation directory>/services/shared/hadoop/mapr_4.0.2_yarn/conf/proxy/<username>
   For example, to create a file for the Data Integration Service user named user1 that is used to impersonate other users, run the following command:
   touch $INFA_HOME/services/shared/hadoop/mapr_4.0.2_yarn/conf/proxy/user1
6. Log in to the Administrator tool.
7. In the Domain Navigator, select the Data Integration Service.
8. Recycle the Data Integration Service. Click Actions > Recycle Service.
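The file-system portion of the procedure, consolidated into one sequence for convenience. The installation directory /opt/Informatica is an assumption, and user1 follows the example above.

INFA_HOME=/opt/Informatica            # assumed Informatica installation directory
PROXY_DIR="$INFA_HOME/services/shared/hadoop/mapr_4.0.2_yarn/conf/proxy"
mkdir -p "$PROXY_DIR"
chmod 755 "$PROXY_DIR"
touch "$PROXY_DIR/user1"              # one file per Data Integration Service user that impersonates
# Recycle the Data Integration Service in the Administrator tool to pick up the change.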

Connections Overview

Define the connections you want to use to access data in Hive or HDFS. You can create the following types of connections:

- HDFS connection. Create an HDFS connection to read data from or write data to the Hadoop cluster.
- HBase connection. Create an HBase connection to access HBase. The HBase connection is a NoSQL connection.
- Hive connection. Create a Hive connection to access Hive data or run Informatica mappings in the Hadoop cluster. Create a Hive connection in the following connection modes:
  - Use the Hive connection to access Hive as a source or target. If you want to use Hive as a target, you need to have the same connection or another Hive connection that is enabled to run mappings in the Hadoop cluster. You can access Hive as a source if the mapping is enabled for the native or Hive environment. You can access Hive as a target only if the mapping is run in the Hadoop cluster.
  - Use the Hive connection to validate or run an Informatica mapping in the Hadoop cluster. Before you run mappings in the Hadoop cluster, review the information in this guide about rules and guidelines for mappings that you can run in the Hadoop cluster.

You can create the connections using the Developer tool, Administrator tool, and infacmd.

Note: For information about creating connections to other sources or targets such as social media web sites or Teradata, see the respective PowerExchange adapter user guide.

HDFS Connection Properties

Use a Hadoop File System (HDFS) connection to access data in the Hadoop cluster. The HDFS connection is a file system type connection. You can create and manage an HDFS connection in the Administrator tool, Analyst tool, or the Developer tool. HDFS connection properties are case sensitive unless otherwise noted.

Note: The order of the connection properties might vary depending on the tool where you view them.

The following table describes HDFS connection properties:

Name
   Name of the connection. The name is not case sensitive and must be unique within the domain. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] \ : ; " ' < , > . ? /
ID
   String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
Description
   The description of the connection. The description cannot exceed 765 characters.
Location
   The domain where you want to create the connection. Not valid for the Analyst tool.
Type
   The connection type. Default is Hadoop File System.
User Name
   User name to access HDFS.
NameNode URI
   The URI to access MapR-FS. Use the following URI: maprfs:///
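Before you create the connection, you can confirm from the Data Integration Service machine that MapR-FS answers on the maprfs:/// URI. This check is a sketch and assumes the Hadoop client binaries delivered with the EBF are on the PATH.

# List the root of MapR-FS using the URI that the NameNode URI property expects.
hadoop fs -ls maprfs:///
# A directory listing confirms that the client configuration and the URI are usable.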

HBase Connection Properties

Use an HBase connection to access HBase. The HBase connection is a NoSQL connection. You can create and manage an HBase connection in the Administrator tool or the Developer tool. HBase connection properties are case sensitive unless otherwise noted.

The following table describes HBase connection properties:

Name
   The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] \ : ; " ' < , > . ? /
ID
   String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
Description
   The description of the connection. The description cannot exceed 4,000 characters.
Location
   The domain where you want to create the connection.
Type
   The connection type. Select HBase.
ZooKeeper Host(s)
   Name of the machine that hosts the ZooKeeper server.
ZooKeeper Port
   Port number of the machine that hosts the ZooKeeper server. Use the value specified for hbase.zookeeper.property.clientPort in hbase-site.xml. You can find hbase-site.xml on the NameNode machine in the following directory: /opt/mapr/hbase/hbase-<version>/conf
Enable Kerberos Connection
   Enables the Informatica domain to communicate with the HBase master server or region server that uses Kerberos authentication.

HBase Master Principal
   Service Principal Name (SPN) of the HBase master server. Enables the ZooKeeper server to communicate with an HBase master server that uses Kerberos authentication. Enter a string in the following format: hbase/<domain.name>@<YOUR-REALM>
   Where:
   - domain.name is the domain name of the machine that hosts the HBase master server.
   - YOUR-REALM is the Kerberos realm.
HBase Region Server Principal
   Service Principal Name (SPN) of the HBase region server. Enables the ZooKeeper server to communicate with an HBase region server that uses Kerberos authentication. Enter a string in the following format: hbase_rs/<domain.name>@<YOUR-REALM>
   Where:
   - domain.name is the domain name of the machine that hosts the HBase region server.
   - YOUR-REALM is the Kerberos realm.

Hive Connection Properties

Use the Hive connection to access Hive data. A Hive connection is a database type connection. You can create and manage a Hive connection in the Administrator tool, Analyst tool, or the Developer tool. Hive connection properties are case sensitive unless otherwise noted.

Note: The order of the connection properties might vary depending on the tool where you view them.

The following table describes Hive connection properties:

Name
   The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] \ : ; " ' < , > . ? /
ID
   String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
Description
   The description of the connection. The description cannot exceed 4,000 characters.
Location
   The domain where you want to create the connection. Not valid for the Analyst tool.
Type
   The connection type. Select Hive.

Connection Modes
   Hive connection mode. Select at least one of the following options:
   - Access Hive as a source or target. Select this option if you want to use the connection to access the Hive data warehouse. If you want to use Hive as a target, you must enable the same connection or another Hive connection to run mappings in the Hadoop cluster.
   - Use Hive to run mappings in Hadoop cluster. Select this option if you want to use the connection to run mappings in the Hadoop cluster.
   You can select both options. Default is Access Hive as a source or target.
User Name
   User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. Use the user name of an operating system user that is present on all nodes of the Hadoop cluster.
Common Attributes to Both the Modes: Environment SQL
   SQL commands to set the Hadoop environment. In the native environment type, the Data Integration Service executes the environment SQL each time it creates a connection to a Hive metastore. If you use the Hive connection to run mappings in the Hadoop cluster, the Data Integration Service executes the environment SQL at the beginning of each Hive session.
   The following rules and guidelines apply to the usage of environment SQL in both connection modes:
   - Use the environment SQL to specify Hive queries.
   - Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. The path must be the fully qualified path to the JAR files used for user-defined functions. Set the parameter hive.aux.jars.path with all the entries in infapdo.aux.jars.path and the path to the JAR files for user-defined functions.
   - You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries.
   If you use the Hive connection to run mappings in the Hadoop cluster, the Data Integration Service executes only the environment SQL of the Hive connection. If the Hive sources and targets are on different clusters, the Data Integration Service does not execute the different environment SQL commands for the connections of the Hive source or target.

Properties to Access Hive as Source or Target

The following table describes the connection properties that you configure to access Hive as a source or target:

Metadata Connection String
   The JDBC connection URI used to access the metadata from the Hadoop server. You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service. To connect to HiveServer2, specify the connection string in the following format:
   jdbc:hive2://<hostname>:<port>/<db>
   Where:
   - <hostname> is the name or IP address of the machine on which HiveServer2 runs.
   - <port> is the port number on which HiveServer2 listens.
   - <db> is the database name to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
Bypass Hive JDBC Server
   JDBC driver mode. Select the check box to use the embedded JDBC driver mode. To use the JDBC embedded mode, perform the following tasks:
   - Verify that the Hive client and Informatica services are installed on the same machine.
   - Configure the Hive connection properties to run mappings in the Hadoop cluster.
   If you choose the non-embedded mode, you must configure the Data Access Connection String. Informatica recommends that you use the JDBC embedded mode.
Data Access Connection String
   The connection string to access data from the Hadoop data store. To connect to HiveServer2, specify the non-embedded JDBC mode connection string in the following format:
   jdbc:hive2://<hostname>:<port>/<db>
   Where:
   - <hostname> is the name or IP address of the machine on which HiveServer2 runs.
   - <port> is the port number on which HiveServer2 listens.
   - <db> is the database to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.

Properties to Run Mappings in Hadoop Cluster

The following table describes the Hive connection properties that you configure when you want to use the Hive connection to run Informatica mappings in the Hadoop cluster:

Database Name
   Namespace for tables. Use the name default for tables that do not have a specified database name.
Default FS URI
   The URI to access the default MapR File System. Use the following connection URI: maprfs:///

Yarn Resource Manager URI
   The service within Hadoop that submits the MapReduce tasks to specific nodes in the cluster. For MapR with YARN, use the following format:
   <hostname>:<port>
   Where:
   - <hostname> is the host name or IP address of the JobTracker or YARN resource manager.
   - <port> is the port on which the JobTracker or YARN resource manager listens for remote procedure calls (RPC).
   Use the value specified by yarn.resourcemanager.address in yarn-site.xml. You can find yarn-site.xml in the following directory on the NameNode: /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop
   For MapR with MapReduce 1, use the following URI: maprfs:///
Hive Warehouse Directory on HDFS
   The absolute HDFS file path of the default database for the warehouse that is local to the cluster. For example, the following file path specifies a local warehouse: /user/hive/warehouse
   If the Metastore Execution Mode is remote, then the file path must match the file path specified by the Hive Metastore Service on the Hadoop cluster.
   Use the value specified for the hive.metastore.warehouse.dir property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /opt/mapr/hive/hive-0.13/conf
Advanced Hive/Hadoop Properties
   Configures or overrides Hive or Hadoop cluster properties in hive-site.xml on the machine on which the Data Integration Service runs. You can specify multiple properties. Use the following format:
   <property1>=<value>
   Where:
   - <property1> is a Hive or Hadoop property in hive-site.xml.
   - <value> is the value of the Hive or Hadoop property.
   To specify multiple properties, use &: as the property separator. The maximum length for the format is 1 MB.
   If you enter a required property for a Hive connection, it overrides the property that you configure in the Advanced Hive/Hadoop Properties.
   The Data Integration Service adds or sets these properties for each map-reduce job. You can verify these properties in the JobConf of each mapper and reducer job. Access the JobConf of each job from the JobTracker URL under each map-reduce job.
   The Data Integration Service writes messages for these properties to the Data Integration Service logs. The Data Integration Service must have the log tracing level set to log each row or have the log tracing level set to verbose initialization tracing.
   For example, specify the following properties to control and limit the number of reducers to run a mapping job:
   mapred.reduce.tasks=2&:hive.exec.reducers.max=10
Temporary Table Compression Codec
   Hadoop compression library for a compression codec class name.

Codec Class Name
   Codec class name that enables data compression and improves performance on temporary staging tables.
Metastore Execution Mode
   Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI.
Metastore Database URI
   The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI:
   jdbc:<data store type>://<node name>:<port>/<database name>
   Where:
   - <node name> is the host name or IP address of the data store.
   - <data store type> is the type of the data store.
   - <port> is the port on which the data store listens for remote procedure calls (RPC).
   - <database name> is the name of the database.
   For example, the following URI specifies a local metastore that uses MySQL as a data store:
   jdbc:mysql://hostname23:3306/metastore
   Use the value specified for the javax.jdo.option.ConnectionURL property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /opt/mapr/hive/hive-0.13/conf
Metastore Database Driver
   Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver: com.mysql.jdbc.Driver
   Use the value specified for the javax.jdo.option.ConnectionDriverName property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /opt/mapr/hive/hive-0.13/conf
Metastore Database Username
   The metastore database user name. Use the value specified for the javax.jdo.option.ConnectionUserName property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /opt/mapr/hive/hive-0.13/conf
Metastore Database Password
   Required if the Metastore Execution Mode is set to local. The password for the metastore user name. Use the value specified for the javax.jdo.option.ConnectionPassword property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /opt/mapr/hive/hive-0.13/conf
Remote Metastore URI
   The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details. Use the following connection URI:
   thrift://<hostname>:<port>
   Where:
   - <hostname> is the name or IP address of the Thrift metastore server.
   - <port> is the port on which the Thrift server is listening.
   Use the value specified for the hive.metastore.uris property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /opt/mapr/hive/hive-0.13/conf
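Several of the values above come from configuration files on the cluster. The following sketch shows one way to read them from a cluster node, assuming the usual one-element-per-line formatting of these files; the paths follow the directories named above.

# Run on a cluster node. Show the Resource Manager address for the Yarn Resource Manager URI property.
grep -A1 'yarn.resourcemanager.address' /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/yarn-site.xml
# Show the warehouse directory and metastore settings for the Hive connection properties.
grep -A1 -E 'hive.metastore.warehouse.dir|hive.metastore.uris|javax.jdo.option.Connection' \
    /opt/mapr/hive/hive-0.13/conf/hive-site.xml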

Creating a Connection

Create a connection before you import data objects, preview data, profile data, and run mappings.

1. Click Window > Preferences.
2. Select Informatica > Connections.
3. Expand the domain in the Available Connections list.
4. Select the type of connection that you want to create:
   - To select a Hive connection, select Database > Hive.
   - To select an HDFS connection, select File Systems > Hadoop File System.
5. Click Add.
6. Enter a connection name and optional description.
7. Click Next.
8. Configure the connection properties. For a Hive connection, you must choose the Hive connection mode and specify the commands for environment SQL. The SQL commands apply to both connection modes. Select at least one of the following connection modes:

   Access Hive as a source or target
      Use the connection to access Hive data. If you select this option and click Next, the Properties to Access Hive as a source or target page appears. Configure the connection strings.
   Run mappings in a Hadoop cluster
      Use the Hive connection to validate and run Informatica mappings in the Hadoop cluster. If you select this option and click Next, the Properties used to Run Mappings in the Hadoop Cluster page appears. Configure the properties.

9. Click Test Connection to verify the connection.
   You can test a Hive connection that is configured to access Hive data. You cannot test a Hive connection that is configured to run Informatica mappings in the Hadoop cluster.
10. Click Finish.

Troubleshooting

This section describes troubleshooting information.

A Hive pushdown mapping fails with the following error in the Hadoop job log:

Container [pid=25720,containerid=container_ _0253_01_000002] is running beyond physical memory limits. Current usage: 1.1 GB of 1 GB physical memory used; 21.8 GB of 2.1 GB virtual memory used. Killing container

To resolve this issue, you must modify yarn-site.xml on every node in the Hadoop cluster. Then, restart the cluster services. yarn-site.xml is located in the following directory on the Hadoop cluster nodes: /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop

In yarn-site.xml, configure the following properties:

Note: If a property does not exist, add it to yarn-site.xml.

yarn.nodemanager.resource.memory-mb
   Amount of physical memory, in MB, that can be allocated for containers. Use 24000 for the value.
yarn.scheduler.minimum-allocation-mb
   The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum. Use 2048 for the value.
yarn.scheduler.maximum-allocation-mb
   The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value. Use 24000 for the value.
yarn.app.mapreduce.am.resource.mb
   The amount of memory the MR AppMaster needs. Use 2048 for the value.
yarn.nodemanager.resource.cpu-vcores
   Number of CPU cores that can be allocated for containers. Use 8 for the value.

The following sample code shows the properties you can configure in yarn-site.xml:

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
  <value>24000</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <description>The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum.</description>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <description>The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.</description>
  <value>24000</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <description>The amount of memory the MR AppMaster needs.</description>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <description>Number of CPU cores that can be allocated for containers.</description>
  <value>8</value>
</property>
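The fix must be applied on every node and followed by a service restart. The following is one possible sketch that pushes an edited copy of yarn-site.xml from the current node and restarts the MapR warden service; it assumes password-less SSH to the hosts listed in a nodes.txt file and that restarting warden is acceptable in your environment.

# Push the updated yarn-site.xml to every cluster node and restart the MapR warden service.
while read -r node; do
    scp /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/yarn-site.xml \
        "$node:/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/yarn-site.xml"
    ssh "$node" "sudo service mapr-warden restart"
done < nodes.txt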

Author

Big Data Edition Team


More information

Lecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018

Lecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018 Lecture 7 (03/12, 03/14): Hive and Impala Decisions, Operations & Information Technologies Robert H. Smith School of Business Spring, 2018 K. Zhang (pic source: mapr.com/blog) Copyright BUDT 2016 758 Where

More information

Guidelines - Configuring PDI, MapReduce, and MapR

Guidelines - Configuring PDI, MapReduce, and MapR Guidelines - Configuring PDI, MapReduce, and MapR This page intentionally left blank. Contents Overview... 1 Set Up Your Environment... 2 Get MapR Server Information... 2 Set Up Your Host Environment...

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Running PowerCenter Advanced Edition in Split Domain Mode

Running PowerCenter Advanced Edition in Split Domain Mode Running PowerCenter Advanced Edition in Split Domain Mode 1993-2016 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

Implementing Data Masking and Data Subset with IMS Unload File Sources

Implementing Data Masking and Data Subset with IMS Unload File Sources Implementing Data Masking and Data Subset with IMS Unload File Sources 2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

Migrating Mappings and Mapplets from a PowerCenter Repository to a Model Repository

Migrating Mappings and Mapplets from a PowerCenter Repository to a Model Repository Migrating Mappings and Mapplets from a PowerCenter Repository to a Model Repository 2016 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Publishing and Subscribing to Cloud Applications with Data Integration Hub

Publishing and Subscribing to Cloud Applications with Data Integration Hub Publishing and Subscribing to Cloud Applications with Data Integration Hub 1993-2015 Informatica LLC. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

Installation and Configuration Guide Simba Technologies Inc.

Installation and Configuration Guide Simba Technologies Inc. Simba Drill ODBC Driver with SQL Connector Installation and Configuration Guide Simba Technologies Inc. Version 1.3.15 November 1, 2017 Copyright 2017 Simba Technologies Inc. All Rights Reserved. Information

More information

Implementing Data Masking and Data Subset with IMS Unload File Sources

Implementing Data Masking and Data Subset with IMS Unload File Sources Implementing Data Masking and Data Subset with IMS Unload File Sources 2013 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying,

More information

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours)

Hadoop. Course Duration: 25 days (60 hours duration). Bigdata Fundamentals. Day1: (2hours) Bigdata Fundamentals Day1: (2hours) 1. Understanding BigData. a. What is Big Data? b. Big-Data characteristics. c. Challenges with the traditional Data Base Systems and Distributed Systems. 2. Distributions:

More information

Installing Datameer with MapR on an Edge Node

Installing Datameer with MapR on an Edge Node Installing Datameer with MapR on an Edge Node If Datameer is installed on an edge node and has to be connected with MapR, you also need to install the MapR client software on the edge node, so the node

More information

Vendor: Cloudera. Exam Code: CCA-505. Exam Name: Cloudera Certified Administrator for Apache Hadoop (CCAH) CDH5 Upgrade Exam.

Vendor: Cloudera. Exam Code: CCA-505. Exam Name: Cloudera Certified Administrator for Apache Hadoop (CCAH) CDH5 Upgrade Exam. Vendor: Cloudera Exam Code: CCA-505 Exam Name: Cloudera Certified Administrator for Apache Hadoop (CCAH) CDH5 Upgrade Exam Version: Demo QUESTION 1 You have installed a cluster running HDFS and MapReduce

More information

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data Introduction to Hadoop High Availability Scaling Advantages and Challenges Introduction to Big Data What is Big data Big Data opportunities Big Data Challenges Characteristics of Big data Introduction

More information

Cloudera Administration

Cloudera Administration Cloudera Administration Important Notice 2010-2018 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks

More information

Optimizing the Data Integration Service to Process Concurrent Web Services

Optimizing the Data Integration Service to Process Concurrent Web Services Optimizing the Data Integration Service to Process Concurrent Web Services 2012 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Important Notice Cloudera, Inc. All rights reserved.

Important Notice Cloudera, Inc. All rights reserved. Apache Hive Guide Important Notice 2010-2018 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks

More information

Hadoop. Introduction to BIGDATA and HADOOP

Hadoop. Introduction to BIGDATA and HADOOP Hadoop Introduction to BIGDATA and HADOOP What is Big Data? What is Hadoop? Relation between Big Data and Hadoop What is the need of going ahead with Hadoop? Scenarios to apt Hadoop Technology in REAL

More information

Exam Questions 1z0-449

Exam Questions 1z0-449 Exam Questions 1z0-449 Oracle Big Data 2017 Implementation Essentials https://www.2passeasy.com/dumps/1z0-449/ 1. What two actions do the following commands perform in the Oracle R Advanced Analytics for

More information

Hortonworks Data Platform

Hortonworks Data Platform Apache Spark Component Guide () docs.hortonworks.com : Apache Spark Component Guide Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and

More information

Server Installation Guide

Server Installation Guide Server Installation Guide Server Installation Guide Legal notice Copyright 2018 LAVASTORM ANALYTICS, INC. ALL RIGHTS RESERVED. THIS DOCUMENT OR PARTS HEREOF MAY NOT BE REPRODUCED OR DISTRIBUTED IN ANY

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

Implementing Informatica Big Data Management in an Amazon Cloud Environment

Implementing Informatica Big Data Management in an Amazon Cloud Environment Implementing Informatica Big Data Management in an Amazon Cloud Environment Copyright Informatica LLC 2017. Informatica LLC. Informatica, the Informatica logo, Informatica Big Data Management, and Informatica

More information

7 Deadly Hadoop Misconfigurations. Kathleen Hadoop Talks Meetup, 27 March 2014

7 Deadly Hadoop Misconfigurations. Kathleen Hadoop Talks Meetup, 27 March 2014 7 Deadly Hadoop Misconfigurations Kathleen Ting kathleen@apache.org @kate_ting Hadoop Talks Meetup, 27 March 2014 Who Am I? Started 3 yr ago as 1 st Cloudera Support Eng Now manages Cloudera s 2 largest

More information

Informatica Version Release Notes December Contents

Informatica Version Release Notes December Contents Informatica Version 10.1.1 Release Notes December 2016 Copyright Informatica LLC 1998, 2017 Contents Installation and Upgrade... 2 Support Changes.... 2 Migrating to a Different Database.... 5 Upgrading

More information

Chase Wu New Jersey Institute of Technology

Chase Wu New Jersey Institute of Technology CS 644: Introduction to Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Institute of Technology Some of the slides were provided through the courtesy of Dr. Ching-Yung Lin at Columbia

More information

Integrating Big Data with Oracle Data Integrator 12c ( )

Integrating Big Data with Oracle Data Integrator 12c ( ) [1]Oracle Fusion Middleware Integrating Big Data with Oracle Data Integrator 12c (12.2.1.1) E73982-01 May 2016 Oracle Fusion Middleware Integrating Big Data with Oracle Data Integrator, 12c (12.2.1.1)

More information

Table of Contents. Abstract

Table of Contents. Abstract JDBC User Guide 2013 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent

More information

Pentaho MapReduce with MapR Client

Pentaho MapReduce with MapR Client Pentaho MapReduce with MapR Client Change log (if you want to use it): Date Version Author Changes Contents Overview... 1 Before You Begin... 1 Use Case: Run MapReduce Jobs on Cluster... 1 Set Up Your

More information

3. Monitoring Scenarios

3. Monitoring Scenarios 3. Monitoring Scenarios This section describes the following: Navigation Alerts Interval Rules Navigation Ambari SCOM Use the Ambari SCOM main navigation tree to browse cluster, HDFS and MapReduce performance

More information