Configuring Sqoop Connectivity for Big Data Management

Copyright Informatica LLC 2017. Informatica, the Informatica logo, and Big Data Management are trademarks or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html.

Abstract

Sqoop is a Hadoop command-line program that transfers data between relational databases and HDFS through MapReduce programs. This article explains how to configure Sqoop connectivity with Big Data Management. Configure Sqoop connectivity for relational data objects, customized data objects, and logical data objects that are based on a JDBC-compliant database.

Supported Versions

Informatica Big Data Management 10.1

Table of Contents

Overview
Download the JDBC Driver JAR Files
Configure the HADOOP_NODE_JDK_HOME Property in the hadoopenv.properties File
Configure the mapred-site.xml File for Cloudera Clusters
Configure the yarn-site.xml File for Cloudera Kerberos Clusters
Configure the mapred-site.xml File for Cloudera Kerberos non-HA Clusters
Configure the core-site.xml File for Ambari-based non-Kerberos Clusters

Overview

Big Data Management uses third-party Hadoop utilities such as Sqoop to process data efficiently. You can use Sqoop to import and export data. When you use Sqoop, you do not need to install the relational database client and software on any node in the Hadoop cluster.

If you did not configure Sqoop connectivity when you installed Informatica Big Data Management, you can configure it later. Perform the following tasks to configure Sqoop connectivity with Big Data Management:

1. Download the JDBC driver JAR files for Sqoop connectivity.
2. Configure the HADOOP_NODE_JDK_HOME property in the hadoopenv.properties file.
3. Configure the mapred-site.xml file for Cloudera clusters.
4. Configure the yarn-site.xml file for Cloudera Kerberos clusters.
5. Configure the mapred-site.xml file for Cloudera Kerberos non-HA clusters.
6. Configure the core-site.xml file for Ambari-based non-Kerberos clusters.

Download the JDBC Driver JAR Files

To configure Sqoop connectivity for relational databases, you must download the relevant JDBC driver JAR files and copy them to the node where the Data Integration Service runs. At run time, the Data Integration Service copies the JAR files to the Hadoop distributed cache so that the JAR files are accessible to all nodes in the Hadoop cluster.

You can use any Type 4 JDBC driver that the database vendor recommends for Sqoop connectivity.

Note: The DataDirect JDBC drivers that Informatica ships are not licensed for Sqoop connectivity.

If you use the Cloudera Connector Powered by Teradata or the Hortonworks Connector for Teradata, you must download additional JAR files and copy them to the node where the Data Integration Service runs.

1. Download the JDBC driver JAR files for the database that you want to connect to.
2. If you use the Cloudera Connector Powered by Teradata, perform the following steps:
   a. Download the Cloudera Connector Powered by Teradata package from the following URL:
      http://www.cloudera.com/downloads.html
      The package is named sqoop-connector-teradata-<version>.tar.gz. Download all the JAR files in the package.
   b. Download the terajdbc4.jar file and the tdgssconfig.jar file from the following URL:
      http://downloads.teradata.com/download/connectivity/jdbc-driver
3. If you use the Hortonworks Connector for Teradata, perform the following steps:
   a. Download the Hortonworks Connector for Teradata package from the following URL:
      http://hortonworks.com/downloads/#addons
      The package is named hdp-connector-for-teradata-<version>-distro.tar.gz. Download all the JAR files in the package.
   b. Download the avro-mapred-1.7.4-hadoop2.jar file from the following URL:
      https://archive.apache.org/dist/avro/avro-1.7.4/java/
4. On the node where the Data Integration Service runs, copy all the JAR files mentioned in the earlier steps to the following directory:
   <Informatica installation directory>\externaljdbcjars

Configure the HADOOP_NODE_JDK_HOME Property in the hadoopenv.properties File

Before you run Sqoop mappings, you must configure the HADOOP_NODE_JDK_HOME property in the hadoopenv.properties file on the Data Integration Service node. Configure the HADOOP_NODE_JDK_HOME property to point to the JDK version that the cluster nodes use. You must use JDK version 1.7 or later.

1. Go to the following location:
   <Informatica installation directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf
2. Find the file named hadoopenv.properties.
3. Back up the file before you update it.
4. Use a text editor to open the file.
5. Define the HADOOP_NODE_JDK_HOME property as follows:
   infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=<cluster_JDK_home>/jdk<version>
   For example:
   infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=/usr/java/default
6. Save the properties file with the name hadoopenv.properties.

Configure the mapred-site.xml File for Cloudera Clusters

Before you run Sqoop mappings on Cloudera clusters, you must configure MapReduce properties in the mapred-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster. The safety valve snippet is ordinary Hadoop configuration XML, as shown in the sketch after these steps.

1. Open the YARN configuration in Cloudera Manager.
2. Find the property named NodeManager Advanced Configuration Snippet (Safety Valve) for mapred-site.xml.
3. Click + and configure the following properties:

   Property: mapreduce.application.classpath
   Value: $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME

   Property: mapreduce.jobhistory.intermediate-done-dir
   Value: <Directory where the MapReduce jobs write history files>

4. Select the Final check box.
5. Redeploy the client configurations.
6. Restart Hadoop services and the cluster.
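As a minimal sketch, assuming a hypothetical intermediate history directory of /user/history/done_intermediate, the two safety valve entries render in mapred-site.xml as follows; the Final check box in step 4 corresponds to the <final> element:

<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <!-- Hypothetical example. Use the directory where your cluster writes intermediate job history files. -->
  <value>/user/history/done_intermediate</value>
  <final>true</final>
</property>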

Configure the yarn-site.xml File for Cloudera Kerberos Clusters

To run Sqoop mappings on Cloudera clusters that use Kerberos authentication, you must configure properties in the yarn-site.xml file on the Data Integration Service node and restart the Data Integration Service.

Copy the following properties from the mapred-site.xml file on the cluster and add them to the yarn-site.xml file on the Data Integration Service node:

mapreduce.jobhistory.address
Location of the MapReduce JobHistory Server. The default port is 10020.

<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hostname:port</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>

mapreduce.jobhistory.principal
SPN for the MapReduce JobHistory Server.

<property>
  <name>mapreduce.jobhistory.principal</name>
  <value>mapred/_HOST@YOUR-REALM</value>
  <description>SPN for the MapReduce JobHistory Server</description>
</property>

mapreduce.jobhistory.webapp.address
Web address of the MapReduce JobHistory Server. The default port is 19888.

<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hostname:port</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>

mapreduce.application.classpath
Classpaths for MapReduce applications.

<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME</value>
  <description>Classpaths for MapReduce applications</description>
</property>

Configure the mapred-site.xml File for Cloudera Kerberos non-HA Clusters

Before you run Sqoop mappings on the Spark and Blaze engines on Cloudera Kerberos clusters that are not enabled for NameNode high availability, you must configure the mapreduce.jobhistory.address property in the mapred-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster. See the sketch after these steps for the resulting entry.

1. Open the YARN configuration in Cloudera Manager.
2. Find the property named NodeManager Advanced Configuration Snippet (Safety Valve) for mapred-site.xml.
3. Click +.
4. Enter the name as mapreduce.jobhistory.address.
5. Set the value as follows:
   <MapReduce JobHistory Server hostname>:<port>
6. Select the Final check box.
7. Redeploy the client configurations.
8. Restart Hadoop services and the cluster.
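As a sketch of the result, assuming a hypothetical JobHistory Server host of jobhistory.example.com listening on the default port, the safety valve entry from steps 4 through 6 renders in mapred-site.xml as follows:

<property>
  <name>mapreduce.jobhistory.address</name>
  <!-- Hypothetical host and port. Substitute your MapReduce JobHistory Server hostname and port. -->
  <value>jobhistory.example.com:10020</value>
  <final>true</final>
</property>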

Configure the core-site.xml File for Ambari-based non-Kerberos Clusters

To run Sqoop mappings on IBM BigInsights, Hortonworks HDP, or Azure HDInsight clusters that do not use Kerberos authentication, you must create a proxy user for the yarn user, which will impersonate other users. You must configure the impersonation properties in the core-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster.

Configure the following user impersonation properties in the core-site.xml file:

hadoop.proxyuser.yarn.groups

<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value><Name_of_the_impersonation_user></value>
  <description>Allows impersonation from any group.</description>
</property>

hadoop.proxyuser.yarn.hosts

<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
  <description>Allows impersonation from any host.</description>
</property>

Author

Ellen Chandler
Principal Technical Writer