iway Big Data Integrator Getting Started Lab

iway Big Data Integrator Getting Started Lab
Version 1.4.0
DN3502228.0816

Active Technologies, EDA, EDA/SQL, FIDEL, FOCUS, Information Builders, the Information Builders logo, iway, iway Software, Parlay, PC/FOCUS, RStat, Table Talk, Web390, WebFOCUS, WebFOCUS Active Technologies, and WebFOCUS Magnify are registered trademarks, and DataMigrator and Hyperstage are trademarks of Information Builders, Inc. Adobe, the Adobe logo, Acrobat, Adobe Reader, Flash, Adobe Flash Builder, Flex, and PostScript are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. Due to the nature of this material, this document refers to numerous hardware and software products by their trademarks. In most, if not all cases, these designations are claimed as trademarks or registered trademarks by their respective companies. It is not this publisher's intent to use any of these names generically. The reader is therefore cautioned to investigate all claimed trademark rights before using any of these names other than to refer to the product described. Copyright 2016, by Information Builders, Inc. and iway Software. All rights reserved. Patent Pending. This manual, or parts thereof, may not be reproduced in any form without the written permission of Information Builders, Inc.

Contents

Preface
  Documentation Conventions
  Related Publications
  Customer Support
  Help Us to Serve You Better
  User Feedback
  iway Software Training and Professional Services
1. iway Big Data Integrator Getting Started Lab
  Overview
  Prerequisites
    Minimum Hardware Requirements
    Download the Cloudera Quick Start Image
    Download Oracle VM VirtualBox
  Installing Oracle VM VirtualBox on the Host System
  Extracting the Cloudera Hadoop Quick Start Image
  Loading and Starting the Cloudera Hadoop Quick Start Image Using Oracle VM VirtualBox Manager
  Starting iway Big Data Integrator on the Cloudera Hadoop Quick Start Image
  Configuring a Hive Database Connection
  Configuring a MySQL Database Connection
  Creating a New Project
  Configuring a Connection to a Hadoop Distributed File System Server
  Creating and Running a Sqoop Configuration
  Creating and Running a Mapping Configuration
  Creating and Running a Flume Configuration
  Creating and Running a Data Wrangler Configuration
  Additional Resources
    Videos
    User Documentation
Reader Comments

Preface

iway Big Data Integrator (BDI) is a solution that provides a design-time and runtime framework, which allows you to leverage Hadoop as a data integration platform. Using iway BDI, you can import and transform data natively in Hadoop. In the Integration Edition, iway BDI provides relational data replication and Change Data Capture (CDC) using Apache Sqoop, along with data de-duplication. iway BDI also provides streaming and unstructured data capture using Apache Flume, and native data transformation capabilities with a rich array of out-of-the-box functions.

This getting started lab describes how to download, install, and configure the Cloudera Hadoop Quick Start Image, which includes an installation of iway BDI Version 1.4.0.

How This Manual Is Organized

This manual includes the following chapters:

Chapter/Appendix: 1. iway Big Data Integrator Getting Started Lab
Contents: Describes how to download, install, and configure the Cloudera Hadoop Quick Start Image, which includes an installation of iway Big Data Integrator (BDI) Version 1.4.0.

Documentation Conventions

The following table describes the documentation conventions that are used in this manual.

Convention: Description
THIS TYPEFACE or this typeface: Denotes syntax that you must enter exactly as shown.
this typeface: Represents a placeholder (or variable), a cross-reference, or an important term. It may also indicate a button, menu item, or dialog box option that you can click or select.
underscore: Indicates a default setting.
Key + Key: Indicates keys that you must press simultaneously.
{ }: Indicates two or three choices. Type one of them, not the braces.
|: Separates mutually exclusive choices in syntax. Type one of them, not the symbol.
...: Indicates that you can enter a parameter multiple times. Type only the parameter, not the ellipsis (...).
... (stacked vertically): Indicates that there are (or could be) intervening or additional commands.

Related Publications

Visit our Technical Documentation Library at http://documentation.informationbuilders.com. You can also contact the Publications Order Department at (800) 969-4636.

Customer Support

Do you have any questions about this product?

Join the Focal Point community. Focal Point is our online developer center and more than a message board. It is an interactive network of more than 3,000 developers from almost every profession and industry, collaborating on solutions and sharing tips and techniques. Access Focal Point at http://forums.informationbuilders.com/eve/forums.

You can also access support services electronically, 24 hours a day, with InfoResponse Online. InfoResponse Online is accessible through our website, http://www.informationbuilders.com. It connects you to the tracking system and known-problem database at the Information Builders support center. Registered users can open, update, and view the status of cases in the tracking system and read descriptions of reported software issues. New users can register immediately for this service. The technical support section of http://www.informationbuilders.com also provides usage techniques, diagnostic tips, and answers to frequently asked questions.

Call Information Builders Customer Support Services (CSS) at (800) 736-6130 or (212) 736-6130. Customer Support Consultants are available Monday through Friday between 8:00 a.m. and 8:00 p.m. EST to address all your questions. Information Builders consultants can also give you general guidance regarding product capabilities and documentation. Please be ready to provide your six-digit site code number (xxxx.xx) when you call.

To learn about the full range of available support services, ask your Information Builders representative about InfoResponse Online, or call (800) 969-INFO.

Help Us to Serve You Better

To help our consultants answer your questions effectively, be prepared to provide specifications and sample files and to answer questions about errors and problems.

The following table lists the environment information our consultants require.

  Platform
  Operating System
  OS Version
  JVM Vendor
  JVM Version

The following table lists the deployment information our consultants require.

  Adapter Deployment (for example, iway Business Services Provider, iway Service Manager)
  Container (for example, WebSphere)
  Version
  Enterprise Information System (EIS), if any
  EIS Release Level
  EIS Service Pack
  EIS Platform

The following table lists iway-related information needed by our consultants.

  iway Adapter
  iway Release Level
  iway Patch

The following table lists additional questions to help us serve you better.

Request/Question (provide Error/Problem Details or Information for each):
  Did the problem arise through a service or event?
  Provide usage scenarios or summarize the application that produces the problem.
  When did the problem start?
  Can you reproduce this problem consistently?
  Describe the problem.
  Describe the steps to reproduce the problem.
  Specify the error message(s).
  Any change in the application environment: software configuration, EIS/database configuration, application, and so forth?
  Under what circumstance does the problem not occur?

The following is a list of error/problem files that might be applicable.

  Input documents (XML instance, XML schema, non-XML documents)
  Transformation files
  Error screen shots
  Error output files
  Trace files
  Service Manager package to reproduce problem
  Custom functions and agents in use
  Diagnostic Zip
  Transaction log

For information on tracing, see the iway Service Manager User's Guide.

User Feedback

In an effort to produce effective documentation, the Technical Content Management staff welcomes your opinions regarding this document. Please use the Reader Comments form at the end of this document to communicate your feedback to us or to suggest changes that will support improvements to our documentation. You can also contact us through our website, http://documentation.informationbuilders.com/connections.asp.

Thank you, in advance, for your comments.

iway Software Training and Professional Services

Interested in training? Our Education Department offers a wide variety of training courses for iway Software and other Information Builders products. For information on course descriptions, locations, and dates, or to register for classes, visit our website, http://education.informationbuilders.com, or call (800) 969-INFO to speak to an Education Representative.

Interested in technical assistance for your implementation? Our Professional Services department provides expert design, systems architecture, implementation, and project management services for all your business integration projects. For information, visit our website, http://www.informationbuilders.com/support.

1. iway Big Data Integrator Getting Started Lab

This getting started lab describes how to download, install, and configure the Cloudera Hadoop Quick Start Image, which includes an installation of iway Big Data Integrator (BDI) Version 1.4.0.

Topics:
  Overview
  Prerequisites
  Installing Oracle VM VirtualBox on the Host System
  Extracting the Cloudera Hadoop Quick Start Image
  Loading and Starting the Cloudera Hadoop Quick Start Image Using Oracle VM VirtualBox Manager
  Starting iway Big Data Integrator on the Cloudera Hadoop Quick Start Image
  Configuring a Hive Database Connection
  Configuring a MySQL Database Connection
  Creating a New Project
  Configuring a Connection to a Hadoop Distributed File System Server
  Creating and Running a Sqoop Configuration
  Creating and Running a Mapping Configuration
  Creating and Running a Flume Configuration
  Creating and Running a Data Wrangler Configuration
  Additional Resources

Overview

iway Big Data Integrator (BDI) Integration Edition simplifies the creation, management, and use of Hadoop-based data lakes. It provides a modern, native approach to Hadoop-based data integration and management that ensures high levels of capability, compatibility, and flexibility to help your organization.

iway BDI runs on all major Hadoop distributions, ensuring high portability. It ingests and cleanses traditional, mobile, social media, sensor, and other data in batch or streams, using native Hadoop facilities. It also runs under YARN, taking advantage of native Hadoop performance and resource negotiation, and leverages the Spark processing engine, if available. iway Big Data Integrator includes a simplified, easy-to-use (Eclipse-based) interface, so you can spend less time coding and debugging.

This getting started lab describes how to download, install, and configure the Cloudera Hadoop Quick Start Image, which includes an installation of iway BDI Version 1.4.0.

Prerequisites

In this section:
  Minimum Hardware Requirements
  Download the Cloudera Quick Start Image
  Download Oracle VM VirtualBox

Before continuing, ensure that you review the prerequisites that are described in this section.

Minimum Hardware Requirements

Ensure that your host system meets the following minimum hardware requirements:

  At least 16 GB of hard disk space.
  At least 8 GB of RAM.
  At least 4 CPU cores.

Download the Cloudera Quick Start Image

  ibdi_lab_start_image.zip (7.78 GB)

You can download this image from the following FTP site:

  ftp.ibi.com/incoming/ibdi

Note: To obtain the required credentials (user name and password) to access this FTP site, contact Information Builders Customer Support Services:

  Online: http://techsupport.ibi.com
  Phone: (800) 736-6130

Open File Explorer and type ftp.ibi.com/incoming/ibdi in the Quick access field, as shown in the following image.

Before you download the ibdi_lab_start_image.zip file, right-click anywhere in the File Explorer window and select Login As from the context menu, as shown in the following image.

The Log On As dialog opens, as shown in the following image.

Log on using a valid user name and password. Download (copy) the ibdi_lab_start_image.zip file to a location on your file system.

Download Oracle VM VirtualBox

  VirtualBox-5.0.26-108824-Win.exe (110 MB)

This software is available for download from the following website:

  https://www.virtualbox.org

Installing Oracle VM VirtualBox on the Host System

How to: Install Oracle VM VirtualBox on the Host System

The images in this procedure were captured during an installation of Oracle VM VirtualBox Version 5.0.22. Although the images show 5.0.22 versioning, all of the steps can be followed as described for Oracle VM VirtualBox Version 5.0.26.
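Before starting the installation, it can be useful to confirm that the host meets the minimum hardware requirements listed in the Prerequisites section. The following is a minimal Python sketch, not part of the iway BDI lab itself. The function names are hypothetical, the thresholds are copied from the requirements list, and the RAM figure must be supplied by hand because the Python standard library does not report it portably. Note that os.cpu_count() reports logical CPUs, which can be higher than the number of physical cores.

```python
import os
import shutil

# Thresholds copied from the Minimum Hardware Requirements list.
MIN_DISK_GB = 16
MIN_RAM_GB = 8
MIN_CPU_CORES = 4


def check_requirements(free_disk_gb, ram_gb, cpu_cores):
    """Return a list of problems; an empty list means all minimums are met."""
    problems = []
    if free_disk_gb < MIN_DISK_GB:
        problems.append(
            f"disk: {free_disk_gb:.1f} GB free, at least {MIN_DISK_GB} GB needed"
        )
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb} GB, at least {MIN_RAM_GB} GB needed")
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"CPU: {cpu_cores} cores, at least {MIN_CPU_CORES} needed")
    return problems


def probe_host(path="."):
    """Return (free disk space in GB for the given path, logical CPU count)."""
    free_disk_gb = shutil.disk_usage(path).free / 1024**3
    cpu_cores = os.cpu_count() or 0  # cpu_count() may return None
    return free_disk_gb, cpu_cores
```

For example, on a host with 8 GB of RAM, calling check_requirements(*probe_host()[:1], 8, probe_host()[1]) returns an empty list when the disk and CPU minimums are also met. Pass to probe_host the path of the drive where the image will be extracted.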

Procedure: How to Install Oracle VM VirtualBox on the Host System

1. Double-click the VirtualBox-5.0.26-108824-Win.exe file.

The Oracle VM VirtualBox Setup installation wizard opens, as shown in the following image.

2. Click Next.

The Custom Setup (choose location) pane opens, as shown in the following image.

If required, you can change the location where Oracle VM VirtualBox is installed.

3. Click Next to accept the default location.

The Custom Setup pane opens, as shown in the following image.

4. Click Next to accept the default options.

The Warning: Network Interfaces pane opens, as shown in the following image.

5. Click Yes to continue with the Oracle VM VirtualBox installation.

The Ready to Install pane opens, as shown in the following image.

6. Click Install.

Oracle VM VirtualBox is installed on your system and a status indicator is displayed, as shown in the following image.

During the installation, the following Windows Security messages/prompts are displayed:

Windows Security Message #1 (Oracle Corporation Universal Serial Bus):

Windows Security Message #2 (Oracle Corporation Network Adapters):

Windows Security Message #3 (Oracle Corporation Network Service):

7. Click Install for each Windows Security message/prompt that is displayed.

The Oracle VM VirtualBox installation is complete pane opens, as shown in the following image.

8. Click Finish.

You have successfully installed Oracle VM VirtualBox Version 5.0.26 on your system. You are now ready to load and start the Cloudera Hadoop Quick Start Image you downloaded, which includes an installation of iway BDI Version 1.4.0.

Extracting the Cloudera Hadoop Quick Start Image

After you have downloaded the ibdi_lab_start_image.zip file from ftp.ibi.com/incoming/ibdi, extract this archive to a location on your file system.

The ibdi-getting-started_lab.vbox file is made available, as shown in the following image.
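The extraction can also be scripted instead of being done in File Explorer. The following Python sketch uses the standard zipfile module. The function name extract_image and the destination folder are hypothetical, and the layout of the archive is an assumption: the sketch searches the extracted tree for any .vbox file instead of assuming where it sits.

```python
import zipfile
from pathlib import Path


def extract_image(zip_path, dest_dir):
    """Extract the archive into dest_dir and return any .vbox files found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Large (ZIP64) archives are handled transparently when reading.
        archive.extractall(dest)
    # The virtual machine definition is expected to end in .vbox.
    return sorted(dest.rglob("*.vbox"))
```

A call such as extract_image("ibdi_lab_start_image.zip", "ibdi_lab") returns the path of the ibdi-getting-started_lab.vbox file if the archive contains it. An empty list indicates that the extraction did not produce the expected file.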

Note: To save disk space on your system, after extracting the ibdi-getting-started_lab.vbox file, you can delete the original ibdi_lab_start_image.zip file.

Loading and Starting the Cloudera Hadoop Quick Start Image Using Oracle VM VirtualBox Manager

How to: Load and Start the Cloudera Hadoop Quick Start Image Using Oracle VM VirtualBox Manager

This section describes how to load and start the Cloudera Hadoop Quick Start Image using Oracle VM VirtualBox Manager.

Procedure: How to Load and Start the Cloudera Hadoop Quick Start Image Using Oracle VM VirtualBox Manager

1. Open Oracle VM VirtualBox Manager by:

   Double-clicking the Oracle VM VirtualBox shortcut icon on your desktop, as shown in the following image.

   or:

   Clicking the Windows Start menu, selecting All Programs, and clicking Oracle VM VirtualBox from the programs group, as shown in the following image.

The Oracle VM VirtualBox Manager opens, as shown in the following image.

2. Click the Machine menu and select Add, as shown in the following image.

The Select a virtual machine file dialog opens, as shown in the following image.

3. Navigate to the location on your file system where the ibdi-getting-started_lab.vbox file is located and then click Open.

After the virtual machine is added, the ibdi-getting-started_lab entry is listed in the left pane of Oracle VM VirtualBox Manager, as shown in the following image.

The appliance is set to Powered Off by default. The system specifications of the virtual machine are displayed in the right pane.

4. Right-click the ibdi-getting-started_lab entry in the left pane, select Start, and then click Normal Start from the context menu, as shown in the following image.

You can also click the Start drop-down arrow from the toolbar, and then click Normal Start from the context menu, as shown in the following image.

You are now ready to begin using iway Big Data Integrator (BDI) Version 1.4.0.
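The same load-and-start sequence can be driven from the command line with VBoxManage, the command-line tool that is installed with Oracle VM VirtualBox. The following Python sketch is an illustration and not part of the lab: it assumes that VBoxManage is on the PATH, that the .vbox path passed in is absolute, and that the machine is registered under the name ibdi-getting-started_lab shown in the left pane. The function names are hypothetical. The registervm subcommand corresponds to Machine > Add, and startvm with --type gui corresponds to Normal Start.

```python
import subprocess


def vboxmanage_commands(vbox_file, vm_name, start_type="gui"):
    """Build the VBoxManage command lines that mirror steps 2 through 4."""
    return [
        # Equivalent of Machine > Add: register the .vbox definition.
        ["VBoxManage", "registervm", str(vbox_file)],
        # Equivalent of Start > Normal Start: open the VM in a GUI window.
        ["VBoxManage", "startvm", vm_name, "--type", start_type],
    ]


def register_and_start(vbox_file, vm_name, start_type="gui"):
    """Run the commands in order, stopping at the first failure."""
    for command in vboxmanage_commands(vbox_file, vm_name, start_type):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it possible to print and review the commands before running them. Registering a machine that is already registered fails, so register_and_start is intended for the first load only.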

Starting iway Big Data Integrator on the Cloudera Hadoop Quick Start Image

How to: Start iway Big Data Integrator on the Cloudera Hadoop Quick Start Image

This section describes how to start iway Big Data Integrator (BDI) Version 1.4.0 on the Cloudera Hadoop Quick Start Image.

Procedure: How to Start iway Big Data Integrator on the Cloudera Hadoop Quick Start Image

1. Double-click the iway BDI shortcut icon, as shown in the following image.

The Workspace Launcher dialog opens, as shown in the following image.

2. Specify a custom workspace or accept the default value.

3. Click OK.

iway Big Data Integrator (BDI) opens as a perspective in the Eclipse Luna framework, as shown in the following image.

Configuring a Hive Database Connection

How to: Configure and Test a Hive Database Connection

The Data Source Explorer tab in iway Big Data Integrator (BDI) allows you to configure and manage connections to your database systems. This section describes how to configure and test a new JDBC connection to Apache Hive.

Apache Hive is a data warehouse infrastructure that is built on top of Hadoop for providing data summarization, query, and analysis. Using an SQL-like language called HiveQL, Apache Hive facilitates querying and managing (through MapReduce) large data sets residing in a Hadoop Distributed File System (HDFS).

Procedure: How to Configure and Test a Hive Database Connection

1. In the Data Source Explorer tab, right-click Database Connections and select New from the context menu, as shown in the following image.

You can also click the New Connection Profile icon in the Data Source Explorer tab, as shown in the following image.

The New Connection Profile wizard opens, as shown in the following image.

2. Select Hive JDBC from the Connection Profile Types area and type BDI_Hive in the Name field.

3. Click Next.

The Specify a Driver and Connection Details pane opens, as shown in the following image.

4. Click the New Driver Definition icon, which is located to the right of the Drivers field.

The New Driver Definition pane opens, as shown in the following image.

5. In the Name/Type tab, which is selected by default, select Generic JDBC Driver.

6. Click the JAR List tab and then click Add JAR/Zip, as shown in the following image.

The Select the file dialog opens, as shown in the following image.

7. Navigate to the /home/cloudera/bdi_resources/working_env folder, select the hive-cdh5-2.5.16.jar file, and then click OK.

You are returned to the New Driver Definition pane where the hive-cdh5-2.5.16.jar file is now included in the JAR List tab, as shown in the following image.

8. Click the Properties tab, as shown in the following image.

9. Select the Driver Class property and then click the corresponding ellipsis (...) icon.

The Available Classes from Jar List dialog opens, as shown in the following image.

10. Select Browse for class.

The hive-cdh5-2.5.16.jar file is processed automatically and a list of available classes is displayed, as shown in the following image.

11. Select the com.cloudera.hive.jdbc4.HS2Driver class from the list and click OK.

You are returned to the New Driver Definition pane where the com.cloudera.hive.jdbc4.HS2Driver class is now listed as a value in the Properties tab, as shown in the following image.

12. Click OK.

The Specify a Driver and Connection Details pane opens, as shown in the following image.

13. Specify the following information:

    Database: default
    URL: jdbc:hive2://localhost:10000
    User name: admin
    Password: admin

14. Select the Save password check box.

15. Click Test Connection.

If you configured your Hive JDBC connection correctly, a Ping succeeded message is displayed, as shown in the following image.

16. Click OK and then click Finish.

Your new Hive database connection BDI_Hive (Apache Hive v. 1.1.0.7.0) is listed as a new node under the Database Connections folder, as shown in the following image.

Configuring a MySQL Database Connection

How to: Configure and Test a MySQL Database Connection

The Data Source Explorer tab in iway Big Data Integrator (BDI) allows you to configure and manage connections to your database systems. This section describes how to configure and test a new JDBC connection to MySQL.
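Both database connections in this lab, the Hive connection (jdbc:hive2://localhost:10000) and the MySQL connection (jdbc:mysql://localhost:3306/retail_db), are defined by a JDBC URL that encodes a host, a port, and optionally a database. When Test Connection fails, a quick first check is whether anything is listening on that host and port. The following Python sketch is a rough diagnostic aid under stated assumptions: the function names are hypothetical, the parser handles only simple URLs of the form used in this lab (no driver-specific options after a semicolon), and a successful TCP connection is weaker than the Ping performed by Test Connection, which goes through the JDBC driver.

```python
import socket
from urllib.parse import urlsplit


def parse_jdbc_url(jdbc_url):
    """Split a simple jdbc:<subprotocol>://host:port[/database] URL."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    # Drop the jdbc: prefix so the remainder parses like an ordinary URL.
    parts = urlsplit(jdbc_url[len(prefix):])
    return {
        "subprotocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run inside the virtual machine, port_is_open("localhost", 10000) indicates whether a service is accepting connections on the Hive port, and port_is_open("localhost", 3306) does the same for MySQL. A False result points to the service not running, as opposed to a driver or credentials problem.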

Procedure: How to Configure and Test a MySQL Database Connection

1. In the Data Source Explorer tab, right-click Database Connections and select New from the context menu, as shown in the following image.

You can also click the New Connection Profile icon in the Data Source Explorer tab, as shown in the following image.

The New Connection Profile wizard opens, as shown in the following image.

2. Select MySQL from the Connection Profile Types area and type Retail_DB_MySQL in the Name field.

3. Click Next.

The Specify a Driver and Connection Details pane opens, as shown in the following image.

4. Click the New Driver Definition icon, which is located to the right of the Drivers field.

The New Driver Definition pane opens, as shown in the following image.

5. In the Name/Type tab, which is selected by default, select MySQL JDBC Driver 5.1.

6. Click the JAR List tab, select the mysql-connector-java-5.1.0-bin.jar file, and then click Edit JAR/Zip, as shown in the following image.

The Select the file dialog opens, as shown in the following image.

7. From the File System, navigate to the /var/lib/sqoop folder, select the mysql-connector-java.jar file, and then click OK.

You are returned to the New Driver Definition pane where the mysql-connector-java.jar file is now included in the JAR List tab, as shown in the following image.

8. Click OK.

The Specify a Driver and Connection Details pane opens, as shown in the following image.

9. Specify the following information:

    Database: retail_db
    URL: jdbc:mysql://localhost:3306/retail_db
    User name: retail_dba
    Password: cloudera

10. Select the Save password check box.

11. Click Test Connection.

   If you configured your MySQL JDBC connection correctly, a Ping succeeded message is displayed, as shown in the following image.
12. Click OK and then click Finish.
   Your new MySQL database connection Retail_DB_MySQL (MySQL v. 5.1.73) is listed as a new node under the Database Connections folder, as shown in the following image.

Creating a New Project

How to:
Create a New Project

This section describes how to create a new project in iWay Big Data Integrator (BDI).

Procedure: How to Create a New Project
1. Right-click anywhere within the Project Explorer tab, select New, and then click Project from the context menu, as shown in the following image.
   The New Project dialog opens, as shown in the following image.
2. Select Big Data Project and click Next.
   The New Big Data Project dialog opens, as shown in the following image.
3. Type BDI in the Project name field and click Finish.
   Your new project (BDI) is listed as a new node in the Project Explorer tab, as shown in the following image.
4. Expand the BDI project node to view its contents and folder structure.

Configuring a Connection to a Hadoop Distributed File System Server

How to:
Configure a Connection to an HDFS Server

The Hadoop Distributed File System (HDFS) is a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster. This section describes how to configure a connection to an HDFS server using iWay Big Data Integrator (BDI).

Procedure: How to Configure a Connection to an HDFS Server
1. Right-click anywhere within the Project Explorer tab, select New, and then click Other from the context menu, as shown in the following image.
   The New dialog opens, as shown in the following image.
2. Type hdfs in the field to filter the selection, select New HDFS Server, and then click Next.
   The HDFS Server Location pane opens, as shown in the following image.
3. Specify the following information:
      Name: hdfs_bdi
      URL: hdfs://localhost:8020
      HDFS Version: 2.2
      User ID: cloudera
      Group IDs: cloudera
4. Click Finish.

   The Open Associated Perspective prompt is displayed, as shown in the following image.
5. Click No.
   Your new connection to an HDFS server (hdfs_bdi) is listed as a new node in the Project Explorer tab, as shown in the following image.
6. Expand the hdfs_bdi node to view its contents and folder structure.

Creating and Running a Sqoop Configuration

How to:
Create and Run a Sqoop Configuration

Apache Sqoop allows you to efficiently transfer bulk data between Apache Hadoop and structured datastores, such as relational databases. This section describes how to create and run a Sqoop configuration using iWay Big Data Integrator (BDI).
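As a rough illustration of the kind of transfer Sqoop performs, the following sketch assembles a plain sqoop import command line for one table, using the MySQL connection values from earlier in this lab. It is a sketch under stated assumptions, not the script that iWay BDI generates: the helper name build_sqoop_import is hypothetical, the Hive database name is the target schema this lab enters later, and the flags shown (--connect, --username, -P, --table, --hive-import, --hive-database) are standard Sqoop 1 import options that may vary with your Sqoop version.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, hive_database):
    """Return the argument list for a basic Sqoop 1 table import into Hive.

    Hypothetical helper for illustration only; iWay BDI builds and runs
    its own deployment scripts, which are not reproduced here.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "-P",                              # prompt for the password at run time
        "--table", table,                  # source table to transfer
        "--hive-import",                   # load the imported data into Hive
        "--hive-database", hive_database,  # Hive database that receives the table
    ]


args = build_sqoop_import(
    jdbc_url="jdbc:mysql://localhost:3306/retail_db",
    username="retail_dba",
    table="customers",
    hive_database="bdi_retail_db",
)

# Print the command as a single shell-safe string for inspection.
print(shlex.join(args))
```

Building the command as a list keeps each value a separate argument, so nothing needs manual quoting, and -P avoids placing the password on the command line.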

Procedure: How to Create and Run a Sqoop Configuration
1. Expand the BDI project node in the Project Explorer tab, right-click the Sqoops folder, select New, and then click Other from the context menu, as shown in the following image.
   The New dialog opens, as shown in the following image.
2. Type sqoop in the field to filter the selection, select Sqoop, and then click Next.
   The New Sqoop dialog opens, as shown in the following image.
3. Type retail_db in the Name field and click Finish.
   The retail_db.sqoop tab opens in the iWay BDI workspace, as shown in the following image.
4. In the Sqoop Target area, click the drop-down arrow to the right of the Target Data Source field and select BDI_Hive, which is the Hive database connection you configured earlier.
5. Type bdi_retail_db in the Target Schema field.
6. In the Source Tables area, click the green plus sign icon (+), as shown in the following image.
   The Table Selection dialog opens, as shown in the following image.
7. Expand Retail_DB_MySQL, retail_db, Schemas, retail_db, and then Tables.
8. Select all of the following tables that are available (press and hold the Shift key):
      categories
      customers
      departments
      order_items
      orders
      products
9. Click OK.
   The selected tables are now populated in the Source Tables area, as shown in the following image.
10. In the Options column for customers, select CDC.
11. Click the Save icon or use the Ctrl+S shortcut to save your work.
12. Expand the Sqoops folder under the BDI project node, right-click retail_db.sqoop, select Run As, and then click Run Configurations, as shown in the following image.

   The Run Configurations dialog opens, as shown in the following image.
13. Right-click iWay Big Data Integrator Build and select New from the context menu.
   A new run/build configuration pane opens, as shown in the following image.
14. Specify the following information:
      Name: sqoop_mysql
      Deployment Build Target: /BDI/Sqoops
      Host Name: localhost
      User Name: cloudera
      Password: cloudera
      Deployment Location: bdi/deployments/sqoop/retail_db
      iWay Big Data Integrator Processes: retail_db  sqoop  /BDI/Sqoops/retail_db.sqoop
   Note: To specify the value for the iWay Big Data Integrator Processes area, click the green plus sign icon (+), which opens the iWay Big Data Integrator Processes Selection dialog. Expand BDI, Sqoops, select retail_db.sqoop, and then click OK.

15. From the Run Configurations dialog, click Apply and then Run, as shown in the following image.
   Note: This process may take 10 to 20 minutes to complete.
16. Click the Console tab, which displays iWay BDI messages during processing, as shown in the following image.
   The following messages indicate that the Sqoop configuration (retail_db.sqoop) has been successfully compiled and deployed:
--------------------------------------------------------------------------------
08/12/2016 05:06:34.271 [INFO] exit-status: 0
08/12/2016 05:06:34.273 [INFO] exec: cd bdi/deployments/sqoop/retail_db/work;./run.sh
08/12/2016 05:06:34.281 [INFO] Deployment successfully sent to 'localhost'.
--------------------------------------------------------------------------------
17. In the Data Source Explorer tab, expand the Hive database connection you configured (BDI_Hive), default, Catalogs, Hive, and then Schemas, as shown in the following image.
18. Right-click the Schemas folder and select Refresh from the context menu. Verify that the bdi_retail_db schema is available, which was generated by running the Sqoop configuration (retail_db.sqoop).
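If you copy the Console output to a file, the success check in step 16 can be automated. The following minimal sketch assumes that every console line follows the layout of the three messages shown above (date, time, level in brackets, message); the function names are hypothetical, and the check covers only the two messages this lab associates with a successful Build run.

```python
import re

# Layout inferred from the sample lines in this lab, for example:
#   08/12/2016 05:06:34.271 [INFO] exit-status: 0
LINE_RE = re.compile(
    r"(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>[A-Z]+)\] (?P<message>.*)"
)


def parse_console(console_text):
    """Return (level, message) pairs for every line that matches the layout."""
    entries = []
    for line in console_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match.group("level"), match.group("message")))
    return entries


def deployment_succeeded(console_text):
    """True if the log reports exit-status 0 and a successful deployment."""
    messages = [message for _, message in parse_console(console_text)]
    exit_ok = "exit-status: 0" in messages
    sent_ok = any(m.startswith("Deployment successfully sent to") for m in messages)
    return exit_ok and sent_ok


sample = """\
08/12/2016 05:06:34.271 [INFO] exit-status: 0
08/12/2016 05:06:34.273 [INFO] exec: cd bdi/deployments/sqoop/retail_db/work;./run.sh
08/12/2016 05:06:34.281 [INFO] Deployment successfully sent to 'localhost'.
"""

print(deployment_succeeded(sample))
```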

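With the tables imported, the mapping built in the next section comes down to an inner join between customers and orders plus a concatenated name column. The following sketch mimics that logic in plain Python on a few made-up rows. The sample data and the function name map_orders are illustrative assumptions; the real mapping is designed in the Mapper tool and runs on Hadoop, not in Python.

```python
def map_orders(customers, orders):
    """Inner-join orders to customers and emit the four target columns."""
    by_id = {customer["customer_id"]: customer for customer in customers}
    rows = []
    for order in orders:
        customer = by_id.get(order["order_customer_id"])
        if customer is None:
            continue  # inner join: an order without a matching customer is dropped
        rows.append({
            "order_id": order["order_id"],
            "order_date": order["order_date"],
            "order_status": order["order_status"],
            # Same idea as concat(customer_fname, ' ', customer_lname)
            "order_name": customer["customer_fname"] + " " + customer["customer_lname"],
        })
    return rows


# Made-up sample rows, for illustration only.
customers = [
    {"customer_id": 1, "customer_fname": "Ann", "customer_lname": "Lee"},
    {"customer_id": 2, "customer_fname": "Raj", "customer_lname": "Patel"},
]
orders = [
    {"order_id": 10, "order_date": "2016-08-01", "order_customer_id": 1, "order_status": "COMPLETE"},
    {"order_id": 11, "order_date": "2016-08-02", "order_customer_id": 3, "order_status": "PENDING"},
]

for row in map_orders(customers, orders):
    print(row)
```

Only order 10 appears in the result: order 11 refers to a customer that does not exist, and customer 2 has no orders, so the inner join excludes both.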
Creating and Running a Mapping Configuration

How to:
Create and Execute a Mapping Configuration

iWay Big Data Integrator (BDI) provides the ability to map and transform data in a Hadoop Distributed File System (HDFS) using the Mapper tool. This section describes how to create and run a mapping configuration using iWay BDI.

Procedure: How to Create and Execute a Mapping Configuration
1. Expand the BDI project node in the Project Explorer tab, right-click the Mappings folder, select New, and then click Other from the context menu, as shown in the following image.
   The New dialog opens, as shown in the following image.
2. Type mapper in the field to filter the selection, select Big Data Mapper, and then click Next.
   The New Mapper dialog opens, as shown in the following image.
3. Type transform_customer in the Name field and click Finish.
   Your new mapping (transform_customer.iwmapper) is listed as a new node under the Mappings folder of your BDI project, as shown in the following image.

4. Drag and drop the Source object from the Palette onto the Design view, as shown in the following image.
   The New Source Table dialog opens, as shown in the following image.
5. Expand bdi_retail_db in the left pane, select the customers table, and then select the following columns:
      customer_id
      customer_fname
      customer_lname
      customer_email
      customer_password
      customer_street
      customer_city
      customer_state
      customer_zipcode
6. Click Finish.
   The first Source object (bdi_retail_db.customers) is added to the Design view, as shown in the following image.
7. Repeat Step 4, by dragging and dropping the Source object from the Palette onto the Design view.
   The New Source Table dialog opens, as shown in the following image.
8. Expand bdi_retail_db in the left pane, select the orders table, and then select the following columns:
      order_id
      order_date
      order_customer_id
      order_status
9. Click Finish.
   The second Source object (bdi_retail_db.orders) is added to the Design view, as shown in the following image.

10. Drag and drop the Expressions object from the Palette onto the Design view, as shown in the following image.
   The Expressions object is added to the Design view, as shown in the following image.
11. Hover over the right corner of the Expressions object and click the Expression Builder icon, as shown in the following image.
   The Expression Builder dialog opens, as shown in the following image.
12. Type the following expression:
--------------------------------------------------------------------------------
concat(bdi_retail_db.customers.customer_fname,' ',bdi_retail_db.customers.customer_lname)
--------------------------------------------------------------------------------
13. Click Finish.
   The Expressions object is updated and refreshed in the Design view, as shown in the following image.
14. Click the Save icon or use the Ctrl+S shortcut to save your work.

15. Drag and drop the Target object from the Palette onto the Design view, as shown in the following image.
   The Target dialog opens, as shown in the following image.
16. Perform the following steps:
   a. Select New Table, which will create a new table target entry with custom columns.
   b. Type simplecustomer in the Table Name field.
   c. Select bdi_retail_db from the Target Schema drop-down list.
17. Click Finish.
   The Target object is added to the Design view, as shown in the following image.
18. Select the first Source object (bdi_retail_db.customers), hover over the customer_id column, click and drag the Create Join operation (as a connecting line) to the order_customer_id column in the second Source object (bdi_retail_db.orders), as shown in the following image.
   The Join Type dialog opens, as shown in the following image.
19. Select inner as the join type and click OK.

   The inner join mapping is created in the Design view, as shown in the following image.
20. Select the first Source object (for example, bdi_retail_db.customers), pause the mouse pointer over the upper-right corner of this object, and then click and drag the Create Mapping operation (as a connecting line) to the Expressions object, as shown in the following image.
   The mapping is created between the first Source object (bdi_retail_db.customers) and the Expressions object in the Design view, as shown in the following image.
   Note: As a best practice, create mappings between the Source objects and the Expressions object before configuring an expression.
21. Click the Save icon or use the Ctrl+S shortcut to save your work.
22. Hover over the upper-right corner of the Target object and click the Add Table Column icon, as shown in the following image.
23. Repeat the previous step (Step 22) three times so that a total of four table columns are now added, as shown in the following image. You will need to change these default column names.
24. Hover over each column_name entry and click the Edit Column Name icon, as shown in the following image.
   The Edit Column dialog opens, as shown in the following image. Here you can specify a new column name and click OK to accept the change.
25. Repeat the previous step (Step 24) for each column_name entry, renaming them as follows:
      Rename column_name to order_id.
      Rename column_name1 to order_date.
      Rename column_name2 to order_status.
      Rename column_name3 to order_name.

   After you are finished, your Target object should look like the following image.
26. Create the following mappings between the second Source object (bdi_retail_db.orders) and the Target object:
      Second Source Object (bdi_retail_db.orders)    Target Object
      order_id                                       order_id
      order_date                                     order_date
      order_status                                   order_status
27. Create a mapping between the Expressions object and the order_name table column in the Target object, as shown in the following image.
   In the Design view, your completed mapping should now look like the following image.
28. Click the Save icon or use the Ctrl+S shortcut to save your work.
29. Expand the Mappings folder under the BDI project node, right-click transform_customer.iwmapper, select Run As, and then click Run Configurations, as shown in the following image.

   The Run Configurations dialog opens, as shown in the following image.
30. Right-click iWay Big Data Integrator Build and select New from the context menu.
31. Specify the following information:
      Name: transform_customer
      Deployment Build Target: /BDI/Mappings
      Host Name: localhost
      User Name: cloudera
      Password: cloudera
      Deployment Location: bdi/deployments/mapper/customer
      iWay Big Data Integrator Processes: transform_customer  iwmapper  /BDI/Mappings/transform_customer.iwmapper
   Note: To specify the value for the iWay Big Data Integrator Processes area, click the green plus sign icon (+), which opens the iWay Big Data Integrator Processes Selection dialog. Expand BDI, Mappings, select transform_customer.iwmapper, and then click OK.
32. From the Run Configurations dialog, click Apply and then Run, as shown in the following image.
   Note: The process may take several minutes to complete.
33. Click the Console tab, which displays iWay BDI messages during processing, as shown in the following image.

   The following messages indicate that the mapper configuration (transform_customer.iwmapper) has been successfully compiled and deployed:
--------------------------------------------------------------------------------
08/14/2016 02:16:21.374 [INFO] exit-status: 0
08/14/2016 02:16:21.375 [INFO] exec: cd bdi/deployments/mapper/customer/work;./run.sh
08/14/2016 02:16:21.386 [INFO] Deployment successfully sent to 'localhost'.
--------------------------------------------------------------------------------
34. In the Project Explorer tab, expand the Mappings folder and then transform_customer.target, where you will now see the generated build folder, as shown in the following image.

Creating and Running a Flume Configuration

How to:
Create and Run a Flume Configuration

Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of streaming data into a Hadoop Distributed File System (HDFS). It has a simple and flexible architecture based on streaming data flows. Apache Flume is robust and fault tolerant, with tunable reliability mechanisms for failover and recovery.
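To make the streaming data flow more concrete, the following sketch renders a minimal Flume agent properties file for the kind of flow this lab configures later in the procedure: an HTTP source, a memory channel, and an HDFS sink. The agent and component names (agent, httpsource, memchannel, hdfssink) are placeholders, and the file that iWay BDI actually generates may differ. Only the property keys follow standard Apache Flume conventions.

```python
def render_flume_properties(agent, bind, port, hdfs_path):
    """Render a minimal source -> channel -> sink agent definition as text."""
    lines = [
        # Declare the components that make up the agent.
        f"{agent}.sources = httpsource",
        f"{agent}.channels = memchannel",
        f"{agent}.sinks = hdfssink",
        "",
        # HTTP source: accepts events posted to bind:port.
        f"{agent}.sources.httpsource.type = http",
        f"{agent}.sources.httpsource.bind = {bind}",
        f"{agent}.sources.httpsource.port = {port}",
        f"{agent}.sources.httpsource.channels = memchannel",
        "",
        # Memory channel: buffers events between the source and the sink.
        f"{agent}.channels.memchannel.type = memory",
        "",
        # HDFS sink: writes buffered events under the given HDFS path.
        f"{agent}.sinks.hdfssink.type = hdfs",
        f"{agent}.sinks.hdfssink.hdfs.path = {hdfs_path}",
        f"{agent}.sinks.hdfssink.channel = memchannel",
    ]
    return "\n".join(lines) + "\n"


properties = render_flume_properties(
    agent="agent",                         # placeholder agent name
    bind="localhost",
    port=8181,
    hdfs_path="bdi_retail_db/flume/data",
)
print(properties)
```

Note that a source lists its channels (plural key), while a sink reads from exactly one channel (singular key).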

You can think of the Flume configuration in iWay BDI as configuring and running a channel/listener in iWay Service Manager (iSM). This section describes how to create and run a Flume configuration using iWay BDI.

Procedure: How to Create and Run a Flume Configuration
1. Expand the BDI project node in the Project Explorer tab, right-click the Flumes folder, select New, and then click Other from the context menu, as shown in the following image.
   The New dialog opens, as shown in the following image.
2. Type flume in the field to filter the selection, select Flume, and then click Next.
   The New Flume dialog opens, as shown in the following image.
3. Type flume_http in the Name field and click Next.
   The Select Source pane opens, as shown in the following image.
4. Select HTTPSource and click Next.
   The Select Channel pane opens, as shown in the following image.
5. Select Memory Channel and click Next.
   The Select Sink pane opens, as shown in the following image.
6. Select HDFS Sink and click Finish.

   The flume_http.iwflume tab opens in the iWay BDI workspace where specific Flume details can be configured, as shown in the following image. The Source tab is selected by default.
7. Specify the following information:
      Bind: localhost
      Port: 8181
8. Click the Sink tab, as shown in the following image.
9. Scroll down to the HDFS Path parameter and type the following value:
--------------------------------------------------------------------------------
bdi_retail_db/flume/data
--------------------------------------------------------------------------------
10. Click the Save icon or use the Ctrl+S shortcut to save your work.
11. Expand the Flumes folder under the BDI project node, right-click flume_http.iwflume, select Run As, and then click Run Configurations, as shown in the following image.
   The Run Configurations dialog opens, as shown in the following image.
12. Right-click iWay Big Data Integrator Publish and select New from the context menu.
   Important: Ensure that you are selecting and right-clicking the iWay Big Data Integrator Publish option and not the iWay Big Data Integrator Build option.

   The Create, manage, and run configurations pane opens, as shown in the following image.
13. Specify the following information:
      Name: flume_http
      Deployment Build Target: /BDI/Flumes
      Host Name: localhost
      User Name: cloudera
      Password: cloudera
      Deployment Location: bdi/deployments/flume/http
      iWay Big Data Integrator Processes: flume_http  iwflume  /BDI/Flumes/flume_http.iwflume
   Note: To specify the value for the iWay Big Data Integrator Processes area, click the green plus sign icon (+), which opens the iWay Big Data Integrator Processes Selection dialog. Expand BDI, Flumes, select flume_http.iwflume, and then click OK.
14. From the Run Configurations dialog, click Apply and then Run, as shown in the following image.
15. Click the Console tab, which displays iWay BDI messages during processing, as shown in the following image.
   The following messages indicate that the Flume configuration (flume_http.iwflume) has been successfully compiled and deployed:
--------------------------------------------------------------------------------
08/15/2016 03:27:05.335 [INFO] exit-status: 0
08/15/2016 03:27:05.337 [INFO] exec: chmod -R 755 bdi/deployments/flume/http
08/15/2016 03:27:05.350 [INFO] Published successfully to 'localhost'.
--------------------------------------------------------------------------------
16. In the Project Explorer tab, expand the Flumes folder and then flume_http.target, where you will now see the generated build folder, as shown in the following image.
   You are now ready to run the Flume configuration, which would be similar to starting a channel/listener in iSM.
17. Open a terminal window, as shown in the following image.

18. Enter the following command:
--------------------------------------------------------------------------------
cd bdi/deployments/flume/http/work/
--------------------------------------------------------------------------------
19. Execute the run.sh file by entering the following command:
--------------------------------------------------------------------------------
./run.sh
--------------------------------------------------------------------------------
   Ensure that the following line is displayed in the terminal window:
--------------------------------------------------------------------------------
INFO instrumentation.monitoredcountergroup: Component type: SOURCE, name: httpsource started
--------------------------------------------------------------------------------
   Note: Do not close this terminal window, since doing so would stop the Flume configuration.
20. Open another terminal window and execute the following CURL commands:
   CURL Command #1:
--------------------------------------------------------------------------------
curl -H "Content-Type: application/json" -X POST -d '[{ "headers" : { "timestamp" : "110434324343", "host": "random_host.example.com", "field1" : "val1", "m_user" :"m_user", "m_year" : "m_year", "m_month" : "m_month", "m_day" :"m_day" }, "body" : "1,2,3,4" }]' http://localhost:8181
--------------------------------------------------------------------------------
21. Return to your iWay BDI workspace.
22. Expand the configured connection to the HDFS server, which is located in the Project Explorer tab, as shown in the following image.
23. Navigate to the following folder:
--------------------------------------------------------------------------------
/user/cloudera
--------------------------------------------------------------------------------
24. Right-click the cloudera folder and select Refresh from the context menu, as shown in the following image.
25. Continue navigating to the following folder:
--------------------------------------------------------------------------------
/user/cloudera/bdi_retail_db/flume/data
--------------------------------------------------------------------------------
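The HDFS Path value typed in the Sink tab (bdi_retail_db/flume/data) is relative, while the folder you browse to here is absolute. HDFS resolves relative paths against the home directory of the user, which is conventionally /user/ followed by the user name. The following sketch reproduces that resolution with the posixpath module in Python. The helper name resolve_hdfs_path is hypothetical, and the sketch assumes that the Flume configuration runs as the cloudera user with the default home directory layout.

```python
import posixpath


def resolve_hdfs_path(path, user):
    """Resolve an HDFS path against the conventional /user/<user> home directory.

    Absolute paths are returned unchanged (apart from normalization).
    """
    home = posixpath.join("/user", user)
    return posixpath.normpath(posixpath.join(home, path))


print(resolve_hdfs_path("bdi_retail_db/flume/data", "cloudera"))
# -> /user/cloudera/bdi_retail_db/flume/data
```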

   In the /data folder, you will see the data you just posted to Flume, as shown in the following image.
   Since you ran two CURL commands earlier, notice that two data results are listed.
   You are now ready to use the Data Wrangler in iWay BDI.

Creating and Running a Data Wrangler Configuration

How to:
Create and Run a Data Wrangler Configuration

The Data Wrangler in iWay Big Data Integrator (BDI) allows you to consume the Flume data you published. This section describes how to create and run a Data Wrangler configuration using iWay BDI.
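The Flume data consumed here originates from the JSON array posted by the CURL command in the previous procedure, where each array element is one event with a headers object and a body string. The following sketch builds the same kind of payload in Python and can optionally post it using only the standard library. It assumes that the Flume HTTP source from this lab is still listening on http://localhost:8181. The function names build_events and post_events are hypothetical.

```python
import json
import urllib.request


def build_events(body, headers):
    """Return the JSON text for a batch containing a single event."""
    return json.dumps([{"headers": headers, "body": body}])


def post_events(payload, url="http://localhost:8181"):
    """POST the payload as application/json and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_events(
    body="1,2,3,4",
    headers={
        "timestamp": "110434324343",
        "host": "random_host.example.com",
        "field1": "val1",
    },
)
print(payload)

# Uncomment only while the Flume configuration from this lab is running:
# print(post_events(payload))
```

Each additional post is written as further data under the HDFS path configured for the sink, which gives the Data Wrangler more sample data to work with.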

Procedure: How to Create and Run a Data Wrangler Configuration
1. Expand the BDI project node in the Project Explorer tab, right-click the Wranglers folder, select New, and then click Other from the context menu, as shown in the following image.
   The New dialog opens, as shown in the following image.
2. Type wrangler in the Wizards field to filter the selection, select Wrangler, and then click Next.
   The New Wrangler dialog opens, as shown in the following image.
   The Project Folder field is automatically populated with the /BDI/Wranglers folder path.
3. Click Browse to the right of the Source field.
   The Select a Source dialog opens, as shown in the following image.
4. Expand hdfs_bdi and navigate to the following folder:
--------------------------------------------------------------------------------
/user/cloudera/bdi_retail_db/flume/data
--------------------------------------------------------------------------------
5. Select one of the Flume data results and click OK.
   You are returned to the New Wrangler dialog where the Source field and Name field are now populated, as shown in the following image.
6. Click Finish.

   The FlumeData.wrangler tab opens in the iWay BDI workspace where specific Data Wrangler details can be configured, as shown in the following image.
7. Type the following value for the Schema parameter:
--------------------------------------------------------------------------------
retail_db
--------------------------------------------------------------------------------
8. Click Execute.
   The Wrangler Execution Status dialog opens and displays a message indicating that the Data Wrangler execution was completed successfully, as shown in the following image.
9. Click OK.
   A Hive table is created for the data that was sent to Flume. You can send additional data to Flume as required using the CURL command.
10. To confirm that Flume data was generated, expand the BDI_Hive database connection you configured in the Data Source Explorer tab, expand default, and then expand Catalogs, as shown in the following image.
11. Right-click the Hive folder and select Refresh from the context menu, as shown in the following image.
12. Continue by expanding Schemas, retail_db, Tables, flumedata, and Columns, as shown in the following image.
13. Right-click a column, select Data, and then click Sample Contents from the context menu, as shown in the following image.
   Note: This process may take several minutes to complete.

   The sample contents of the selected Flume data are displayed in the SQL Results tab, as shown in the following image.

Additional Resources

In this section:
Videos
User Documentation

This section identifies additional resources that are designed to enhance your user experience with iWay Big Data Integrator (BDI). We encourage you to share these resources with your colleagues and team members.

Videos

You can view technical videos on our YouTube channel that provide a walkthrough of all of the core exercises in this getting started lab.
   Information Builders Worldwide Customer Services YouTube Channel
   iWay Big Data Integrator Video Playlist
Individual iWay BDI videos:
   Starting iWay Big Data Integrator and Creating a New Project
   Configuring a Hive Database Connection
   Configuring a MySQL Database Connection
   Configuring a Connection to a HDFS Server
   Creating and Running a Sqoop Configuration
   Creating and Running a Mapping Configuration
   Creating and Running a Flume Configuration