IBM Fluid Query for PureData Analytics - Sanjit Chakraborty

Introduction

IBM Fluid Query is the capability that unifies data access across the Logical Data Warehouse. Users and analytic applications need access to data in a variety of data repositories and platforms without concern for the data's location, its access method, or the need to rewrite a query. IBM Fluid Query lets a data store route a query to the correct data store within the logical data warehouse, so that the query flows to the data rather than the data flowing to the query. No matter where users connect within the logical data warehouse, they can access all data through the same standard API/SQL access.

IBM Fluid Query powers the Logical Data Warehouse by giving users the ability to combine their data, even when it is spread across various sources, in a fast, agile manner to drive analytics and deeper insight, without having to understand how to connect multiple data stores, use different syntaxes or APIs, or change their applications.

IBM Fluid Query allows access to data in Hadoop or Spark from IBM PureData System for Analytics appliances, and enables fast movement of data between Hadoop and those appliances. By enabling both query and data movement, IBM Fluid Query connects the appliances to common Hadoop systems such as IBM BigInsights for Apache Hadoop, Cloudera, and Hortonworks, and can access Spark through a Spark SQL connector. This provides an additional level of flexibility when accessing data residing on the Hadoop framework. With IBM Fluid Query, you can query PureData System for Analytics, Hadoop, or both, combining results from PureData System for Analytics database tables and Hadoop data sources to create powerful analytic combinations.

Objectives

In the following workshop, you will explore these areas:
- Get the necessary libraries and client configuration files from the IBM BigInsights environment for working with Fluid Query
- Configure Fluid Query to connect to BigSQL in the IBM BigInsights environment
- Move data between IBM PureData System for Analytics and BigSQL
- Query data residing in BigSQL from IBM PureData System for Analytics
- Retrieve data from BigSQL in the IBM BigInsights environment back to PureData System

[The Netezza and BigInsights VMs must be up and running prior to starting this workshop]

Start of the workshop

1. Set the Fluid Query environment on PDA

1.1. Access the Netezza Virtual Machine (Netezza VM) by double-clicking the Netezza desktop icon. Use nz as both the user name and password to log in to the Netezza VM.

1.2. Change to the directory that contains the Fluid Query scripts:

cd /nz/export/ae/products/fluidquery

1.3. Download the required Hadoop, Hive, and BigSQL libraries, along with the client configuration files, from BigInsights to the Netezza system:

./fqdownload.sh --host bigdata --user bigsql --service bigsql

Each time you are prompted for a password, type passw0rd. When asked if you want to continue connecting, type yes.

Tips
fqdownload.sh will fail with the following error if the /nz/export/ae/products/fluidquery/libs/ibm/fqdm directory is not empty:

ERROR: The download directory /nz/export/ae/products/fluidquery/libs/ibm/fqdm is not empty

Delete all files from the /nz/export/ae/products/fluidquery/libs/ibm/fqdm directory using the following command:

rm -f /nz/export/ae/products/fluidquery/libs/ibm/fqdm/*
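
Once fqdownload.sh finishes, you can optionally confirm that the libraries and client configuration files were downloaded (a simple sanity check; the exact file names vary with the BigInsights release):

ls /nz/export/ae/products/fluidquery/libs/ibm/fqdm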

2. Configuring the Data Movement Utility on PDA

2.1. Change to the directory containing the configuration file to be edited:

cd /nz/export/ae/products/fluidquery/conf

2.2. Using vi, open the file fq-remote-conf.xml.

Notes
If you would rather not do the configuration yourself, you can copy a prepopulated version with the following command and skip to step 2.4:

cp fq-remote-conf.xml.completed fq-remote-conf.xml

2.3. Locate the following sections and edit them as described.

2.3.1 Change the location used to archive the data so that it is under the bigsql home directory:

<name>nz.fq.output.path</name>
<value>/home/bigsql/nzbackup/sls_sales_fact_archive</value>
<description>Directory in HDFS where transferred data is stored</description>

2.3.2 Specify the name of the Netezza system where the data is stored:

<name>nz.fq.nps.server</name>
<value>netezza</value>
<description>The wall IP address or fully qualified hostname of the Netezza server</description>

2.3.3 Verify the credentials that will be used to connect to the Netezza system. For the purposes of this lab, the database user account is admin and the password is password.

<name>nz.fq.nps.user</name>
<value>admin</value>
<description>The Netezza database user account name for access to the database</description>
</property>
<property>
<name>nz.fq.nps.password</name>
<value>password</value>
<description>The password for the Netezza database user account</description>

2.3.4 Provide the following details for the Hadoop environment:

<name>nz.fq.sql.server</name>
<value>bigdata</value>
<description>SQL server address on Hadoop where the imported table will be created</description>

2.3.5 Provide the BigSQL port:

<name>nz.fq.sql.port</name>
<value>32051</value>
<description>SQL server port number on Hadoop</description>

2.3.6 Set the type property to bigsql:

<name>nz.fq.sql.type</name>
<value>bigsql</value>
<description>Supported types: hive1, hive2, bigsql. Hive2 is preferred; Hive1 should be used on old systems where Hive2 is not available.</description>

2.3.7 Provide the connection credentials:

<name>nz.fq.sql.user</name>
<value>bigsql</value>
<description>User name. Required if user/password authentication should be used.</description>

<name>nz.fq.sql.password</name>
<value>bigsql</value>
<description>Password. Required if a user name was provided.</description>

2.4. Change back to the directory containing the scripts:

cd /nz/export/ae/products/fluidquery

2.5. Using the configuration file we just edited, create a connection to the BigInsights/Hadoop environment for the Data Movement utility:

./fqconfigure.sh --service fqdm --provider ibm --fqdm-conf conf/fq-remote-conf.xml --database gosalesdw --config fqdm

This process will take a few minutes. Press Y if it asks to overwrite the fqdm configuration file. Make sure all checks from the fqconfigure.sh command return OK before you proceed to the next step.
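
If one of the checks does not return OK, a common cause is a typo in the properties edited above. A quick way to review them together (plain grep, shown only as a convenience) is:

grep -A 1 '<name>nz.fq' conf/fq-remote-conf.xml

This prints each nz.fq property name together with the value line that follows it.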

Tips
Following is a sample output from the fqconfigure.sh command:

log4j:WARN No appenders could be found for logger (com.ibm.nz.fq.FqConfiguration).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Database set to: gosalesdw
FluidQuery version: 1.7.0.0 [Build 160420-201]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/nz/export/ae/products/fluidquery/libs/ibm/fqdm/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/nz/export/ae/products/fluidquery/libs/ibm/fqdm/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Checking access to HDFS for user: bigsql. Using following url: hdfs://bigdata.ibm.com:8020...[OK]
Checking connection to warehouse jdbc:netezza://netezza:5480/gosalesdw...[OK]
Checking map-reduce jobs...[OK]
Init destination database engine...[OK]
Checking BigSQL connection. jdbc:db2://bigdata.localdomain:32051/bigsql...[OK]
Done
Connection configuration success.

2.6. Register the UDFs that will be used to move data between the systems:

./fqregister.sh --db gosalesdw --config fqdm --udtf tohadoop,fromhadoop

Notes
If the above command reports "Existing function with same name already found", press Y to register and overwrite each existing function.

3. Configuring Access To BigSQL

Copy the JDBC driver that will be used to query data from BigSQL, and configure the necessary connections.

3.1. Change to the directory designated to hold the BigSQL JDBC driver:

cd /nz/export/ae/products/fluidquery/libs/ibm/bigsql

3.2. Copy the driver from the BigInsights/Hadoop environment. When prompted for the password, enter passw0rd.

scp bigsql@bigdata:/usr/ibmpacks/bigsql/4.1/db2/java/db2jcc.jar .
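
You should now have db2jcc.jar in the current directory; a quick listing confirms the copy succeeded:

ls -l db2jcc.jar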

3.3. Change back to the Fluid Query scripts directory:

cd /nz/export/ae/products/fluidquery

3.4. Next, configure the connection to BigSQL that will be used for queries. When prompted for your password, enter passw0rd. Press Y if it asks to overwrite the pda2bigsql configuration file.

./fqconfigure.sh --host bigdata --port 32051 --username bigsql --config pda2bigsql --provider ibm --service bigsql --database gosalesdw

3.5. Register the UDF that will be used to read data. When asked for the client password, use passw0rd.

./fqregister.sh --db gosalesdw --config pda2bigsql --udtf readbigsql

4. Copying Data to BigSQL

In the previous steps you configured the PDA and BigInsights Hadoop environments. Now let's copy the oldest year of data from PDA to the BigInsights Hadoop environment.

4.1. In the gosalesdw database, the earliest month of data in the sls_sales_fact table is January 2004. You can verify this with the following SQL using nzsql:

nzsql -d gosalesdw -c 'SELECT MIN(order_day_key) FROM sls_sales_fact;'

4.2. To move the full year of data, specify a range of 20040101 through 20041231. You will use the tohadoop UDF that you registered in step 2.6. The parameters of the UDF are as follows:

tohadoop(nps_db, nps_table, hive_schema, hive_table, option, ...);

Notes
You will use the following parameters:
nps_db - Name of the PDA database from which the data is copied.
nps_table - Name of the PDA table from which the data is copied.
hive_schema - Name of the schema in which the data will reside in BigInsights Hadoop.
hive_table - Name of the table in which the data will reside in BigInsights Hadoop.
option - All of the other arguments needed to move data to the Hadoop cluster are taken from the edited fq-remote-conf.xml file; those file parameters can be overridden with command arguments starting at this position. In this lab you will pass nz.fq.nps.where as an option to include a WHERE clause that filters the data.
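
For illustration only (you do not need to run this), the same UDF could move a narrower slice, such as a single month, just by changing the WHERE option:

nzsql -d gosalesdw -c "CALL tohadoop('gosalesdw', 'sls_sales_fact', 'gosalesdw', 'sls_sales_fact_archive', 'nz.fq.nps.where=order_day_key BETWEEN 20040101 and 20040131');"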

Run the following SQL from nzsql to move the full year of data from 2004. This step will take a few minutes; once successful, it returns "Fluid Query Data Movement finished successfully".

nzsql -d gosalesdw -c "CALL tohadoop('gosalesdw', 'sls_sales_fact', 'gosalesdw', 'sls_sales_fact_archive', 'nz.fq.nps.where=order_day_key BETWEEN 20040101 and 20041231');"

4.3. Now let's check the data in BigSQL. Double-click the BigInsights desktop icon to access the BigInsights Virtual Machine (BigInsights VM). This opens a terminal session as user root. Use su - bigsql to log in as user bigsql.

4.4. Run the following commands one at a time to check the data that you copied to BigSQL:

db2 CONNECT TO bigsql
db2 "SELECT COUNT(*) FROM gosalesdw.sls_sales_fact_archive"
db2 "SELECT MIN(order_day_key), MAX(order_day_key) FROM gosalesdw.sls_sales_fact_archive"

Tips
Following are sample outputs from the above commands:

$ db2 connect to bigsql

   Database Connection Information

 Database server        = DB2/LINUXX8664 10.6.3
 SQL authorization ID   = BIGSQL
 Local database alias   = BIGSQL

$ db2 "SELECT COUNT(*) FROM gosalesdw.sls_sales_fact_archive"

1
---------------------------------
                           84474.

  1 record(s) selected.

$ db2 "SELECT MIN(order_day_key), MAX(order_day_key) FROM gosalesdw.sls_sales_fact_archive"

1           2
----------- -----------
   20040112    20041217

  1 record(s) selected.
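
As a cross-check, you can compute the same count on the PDA side over the date range that was copied; run this from the Netezza session:

nzsql -d gosalesdw -c "SELECT COUNT(*) FROM sls_sales_fact WHERE order_day_key BETWEEN 20040101 AND 20041231;"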

5. Querying BigSQL/Hadoop Data from PDA

In this section you will query data that is stored in BigSQL from PDA.

5.1. Go back to the Netezza PuTTY session that you were using previously.

5.2. Use the UDF registered in step 3.5 to count the rows of data in the archive table. Execute the following query from nzsql:

nzsql -d gosalesdw -c "SELECT * FROM TABLE WITH FINAL (readbigsql('','','SELECT COUNT(*) FROM gosalesdw.sls_sales_fact_archive'));"

The above SQL should return the same row count that you found in step 4.4.

5.3. You can query both PDA and BigSQL at once. Execute this query from nzsql; it may take a minute or so:

nzsql -d gosalesdw -c "select * from ((select cast(order_day_key/10000 as int) sales_year, sum(sale_total) from sls_sales_fact group by sales_year) union all (select * from table with final (readbigsql('','','select cast(order_day_key/10000 as int) sales_year, sum(sale_total) from gosalesdw.sls_sales_fact_archive group by cast(order_day_key/10000 as int)')))) zed order by sales_year;"
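
The same pattern works for other read-only BigSQL statements. For example, to see the archived date range from the Netezza session, mirroring the MIN/MAX check from step 4.4:

nzsql -d gosalesdw -c "SELECT * FROM TABLE WITH FINAL (readbigsql('','','SELECT MIN(order_day_key), MAX(order_day_key) FROM gosalesdw.sls_sales_fact_archive'));"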

6. Retrieving Data from BigSQL/Hadoop

While querying Hadoop data from PDA is fine for occasional or ad hoc use, in cases where performance is important you may want to temporarily move data back into PDA. For example, let's bring the 2005 sales data back into PDA to run some ad hoc reports.

6.1. The 2005 data is stored in a BigSQL table called GOSALESDW.SLS_SALES_FACT, which holds multiple years of data, but we only want to pull back the 2005 data and store it in PDA. First, on the Netezza side, drop the SLS_SALES_FACT_2005 table if it exists:

nzsql -d gosalesdw -c "DROP TABLE sls_sales_fact_2005"

6.2. Retrieve the data from BigSQL and store it in PDA (this may take a minute or so):

nzsql -d gosalesdw -c "CREATE TABLE sls_sales_fact_2005 AS SELECT * FROM TABLE WITH FINAL (readbigsql('','','select * from gosalesdw.sls_sales_fact where order_day_key between 20050101 and 20051231')) DISTRIBUTE ON RANDOM;"

6.3. Confirm the amount of data retrieved:

nzsql -d gosalesdw -c "SELECT COUNT(*) FROM sls_sales_fact_2005;"
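
You can also spot-check that only 2005 data came across; both values returned by this query should fall between 20050101 and 20051231:

nzsql -d gosalesdw -c "SELECT MIN(order_day_key), MAX(order_day_key) FROM sls_sales_fact_2005;"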

Workshop Complete

Thank you for taking the time to walk through this workshop. Please perform the following before starting a new workshop or leaving:
- From the Netezza terminal session, run /nzscratch/backup/cleanup.sh
- Close the terminal session that connects to Netezza
- Close the terminal session that connects to BigInsights
- Close this workshop

Customer Experience

When a customer sits down at one of the lab machines, there will be a monitor, laptop, and mouse. On one of the monitors they will be presented with our Reset Application. The customer clicks one of the workshop icons in the Reset Application. At this point a PDF of the workshop displays on one screen, and an app or a browser window pops up on the other screen, depending on the starting point of the workshop.