INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?)


PER STRICKER, THOMAS KALB
07.02.2017, HEART OF TEXAS DB2 USER GROUP, AUSTIN
08.02.2017, DB2 FORUM USER GROUP, DALLAS
INITIAL EVALUATION BIGSQL FOR HORTONWORKS (Homerun or merely a major bluff?)
Copyright 2016 ITGAIN GmbH

Agenda
  Introduction
    The MPP Architecture
    DB2 DPF and Hadoop (HDFS)
  Installation stumbling blocks
    Red Hat or CentOS
    The HDP Installation with Ambari (see Appendix)
    The BigSQL Installation
  Working with BigSQL
    The Familiar and the New
      a. DB2 interface
      b. HDFS interface
  The Big Data Deployment (SQL for unstructured data)
  DB2 Engine versus HDFS Engine
    Functional differences
    Performance differences
  BIG SQL and Hive
  Conclusion: Sham or Masterstroke?
  Questions and Discussion

Hadoop (HDFS)
Source: http://bradhedlund.s3.amazonaws.com/2011/hadoop-network-intro/hadoop-cluster.png

Hadoop Distributions: Cloudera / Hortonworks / MapR / IOP (worldwide market share)
  Cloudera: 53 %
  Hortonworks: 16 %
  MapR: 11 %
  Others: 20 %
Source: https://www.dezyre.com/article/top-6-hadoop-vendors-providing-big-data-solutions-in-open-data-platform/93

Hadoop Appraisal
Source: https://www.cloudera.com/content/dam/www/static/documents/analyst-reports/forrester-wave-big-data-hadoop-distributions.pdf

Hadoop SQL Engines
Source: IBM Big SQL Vendor Landscape, 2014 IBM Corporation


Big SQL and MPP Architecture
  IBM Big SQL is a high-performance SQL-on-Apache-Hadoop engine.
  The IBM MPP engine (C++) replaces the MapReduce layer (Java).
  Big SQL is an MPP (Massively Parallel Processing) SQL engine.
  Hive extends Hadoop with data warehouse features.
  HBase is a distributed, column-oriented database.
  HDFS is a high-availability file system for storing very large volumes of data distributed across many nodes.
Source: Big SQL: A Technical Introduction, 2016 IBM Corporation

SMP vs. MPP Architecture
SMP: Dynamically distributes running processes across all available processors, which share system resources (multiprocessor systems).

SMP vs. MPP Architecture
MPP: Distributes a task across multiple independent nodes, each with its own processors, RAM and I/O (shared-nothing architecture).
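The shared-nothing idea can be illustrated with a small Python sketch. It is not the partitioning code of DB2 DPF or Big SQL: the CRC32 hash, the function names and the sample rows are all assumptions chosen for illustration. Each row is assigned to exactly one node by hashing a distribution key, every node works only on its own partition, and a coordinator combines the partial results.

```python
import zlib
from collections import defaultdict


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a distribution key to a node number with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, key_column, num_nodes):
    """Assign each row (a dict) to exactly one node's private partition."""
    partitions = defaultdict(list)
    for row in rows:
        node = node_for_key(str(row[key_column]), num_nodes)
        partitions[node].append(row)
    return partitions


def parallel_count(partitions):
    """Each node counts only its own rows; the coordinator adds up the partial counts."""
    partial_counts = {node: len(node_rows) for node, node_rows in partitions.items()}
    return sum(partial_counts.values())


# Hypothetical sample data: 1000 rows spread over 4 nodes.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
partitions = distribute(rows, "customer_id", num_nodes=4)
total = parallel_count(partitions)
print(total)
```

Because the same key always hashes to the same node, lookups by key touch one node only, while scans and aggregates run on all nodes at once.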

SMP Scaling: Vertical Scaling

MPP Scaling: Horizontal Scaling


DB2 DPF versus Hadoop (HDFS)
[Figure: DB2 DPF cluster next to a Hadoop cluster (diploma thesis)]

DB2 DPF
Source: toadworld.com

Big SQL: IBM Slide
Source: Big SQL: A Technical Introduction, 2016 IBM Corporation

BIG SQL: ITGAIN Slide


Installation Stumbling Blocks: ITGAIN Test Environment
Installing two nodes
  Hardware: 2 virtual servers with 8 cores / 10 GB RAM / SSDs
  Software:
    Linux Red Hat 7.2 / CentOS 7.2
    Ambari 2.2.2.0
    Hortonworks Data Platform (HDP) 2.4.2
    BETA: Big SQL 4.2 for Hortonworks Data Platform
Extending with two additional identical nodes (DataNode / WorkerNode)

Installation Stumbling Blocks: Red Hat or CentOS?
IBM BigInsights for Apache Hadoop 4.2 only supports:
  Red Hat Enterprise Linux (RHEL) Server 6.7
  Red Hat Enterprise Linux (RHEL) Server 7.2
Hortonworks Data Platform HDP 2.4.2 supports:
  Red Hat Enterprise Linux (RHEL) 6.x - 7.x
  CentOS 6.x - 7.x
  Debian 7.x
  Oracle Linux 6.x - 7.x
  SUSE Linux Enterprise Server (SLES) v11 SP3 / SP4
  Ubuntu Precise v12.04
  Ubuntu Trusty v14.04

Installation Stumbling Blocks: Red Hat or CentOS?
Recommendation for the BETA on Hortonworks:
  Red Hat Enterprise Linux (RHEL) Server 7.2
Test cluster on:
  Red Hat Enterprise Linux (RHEL) Server 7.2
  CentOS 7.2
Installation on both OSes was successful.

Installation Stumbling Blocks: The HDP Installation with Ambari
Tips and tricks:
  Installation with Ambari is very simple, provided there are no errors.
  Therefore, before installing, take the time to clear any warnings in Confirm Hosts and Check Scripts.
In case of errors:
  Check the error output in stderr.
  Often stderr is empty; the typical cause is then a timeout.
  If stderr contains errors, attempt to correct them and retry.
  If the installation crashes, it is often easier to start over with a fresh OS than to repair the existing OS and retry the installation.

Installation Stumbling Blocks: The BigSQL Installation
Recommendations:
  Execute the Big SQL Pre-Checker before the installation.
  The Pre-Checker scripts are included in the installation package but need to be extracted:
    rpm2cpio BigInsights-HDP-1.2.0.0-2.4.el7.x86_64.rpm | cpio -ivd ./var/lib/ambari-server/resources/stacks/hdp/2.4/services/bigsql/package/scripts/bigsql-precheck.sh
    rpm2cpio BigInsights-HDP-1.2.0.0-2.4.el7.x86_64.rpm | cpio -ivd ./var/lib/ambari-server/resources/stacks/hdp/2.4/services/bigsql/package/scripts/bigsql-util.sh
  All errors should be cleared before starting the installation.

Installation Stumbling Blocks: The BigSQL Installation
Execute the Pre-Checker on ALL servers! Start the installation only once it has succeeded on every one of them.

Installation Stumbling Blocks: The BigSQL Installation
Add the service to a cluster.

Installation Stumbling Blocks: The BigSQL Installation
Additional Big SQL Workers can always be added to an individual host via the Add Services option under Hosts. However, this is not possible on a Big SQL Head Node!

Installation Stumbling Blocks: Extending the Cluster with Ambari
Additional hosts can easily be added with the Add New Hosts wizard.

Installation Stumbling Blocks: Extending the Cluster with Ambari
Data must be redistributed after the extension.


Working with BigSQL: The New and the Familiar
DB2 interface

Working with BigSQL: The New and the Familiar
Where does one find the tables in HDFS?
  /apps/hive/warehouse/bigsql.db/firsttable

Working with BigSQL: The New and the Familiar
Or via the command line (HDFS browse):

Working with BigSQL: The New and the Familiar
Not everything works with the DB2 command line, for example loading data into a Hadoop table. What now?

Working with BigSQL: The New and the Familiar
There is also a command line for BigSQL: JSqsh (Java SQL Shell), pronounced "jay-skwish".
According to the docs it should be found in:
  /usr/ibmpacks/common-utils/current/jsqsh
BUT:

Working with BigSQL: The New and the Familiar
SOLUTION: JSqsh isn't part of the BigSQL installation.

Working with BigSQL: The New and the Familiar
JSqsh appears in the list of installed clients.
JSqsh can also be installed from the open-source GitHub project.

Working with BigSQL: The New and the Familiar
JSqsh setup:

Working with BigSQL: The New and the Familiar
JSqsh setup: driver selection

Working with BigSQL: The New and the Familiar
JSqsh setup: customize the connection details and save

Working with BigSQL: The New and the Familiar
Requesting the table list with JSqsh
  JSqsh command help: \help
  Defining the current schema, e.g.: use BIGSQL
  Requesting a table list in a given schema: \show tables

Working with BigSQL: The New and the Familiar
Starting point: load data into the tables
Tip: for better performance, first copy the load file into HDFS:
  hdfs dfs -copyFromLocal /tmp/firsttable.csv /tmp/
  hdfs dfs -chmod 777 /tmp/firsttable.csv

Working with BigSQL: The New and the Familiar
What happened in the HDFS file system? A new file has appeared.

Working with BigSQL: The New and the Familiar
db2top also works, for example for LOAD.

Working with BigSQL: The New and the Familiar
Even db2pd works, for example for LOAD. However, LIST UTILITIES does not work.


Loading the Benchmark: BIGSQL HDFS Table

The HDFS (DB2) Blocks

BIGSQL HDFS versus DB2 DPF

DB2 DPF Restrictions

Performance Differences: DB2 DPF versus DB2 HDFS
Loading 10 million rows:
  DB2 DPF: 22 s
  DB2 HDFS: 64 s
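To put the two load times in proportion, the short Python calculation below derives rows per second and the relative slowdown. Only the row count and the two timings come from the slide; the variable names are illustrative.

```python
ROWS = 10_000_000
load_seconds = {"DB2 DPF": 22, "DB2 HDFS": 64}  # timings from the slide

# Load throughput per engine in rows per second.
throughput = {engine: ROWS / seconds for engine, seconds in load_seconds.items()}

# How many times longer the HDFS load took compared with DPF.
slowdown = load_seconds["DB2 HDFS"] / load_seconds["DB2 DPF"]

for engine, rows_per_second in throughput.items():
    print(f"{engine}: {rows_per_second:,.0f} rows/s")
print(f"The HDFS load took {slowdown:.1f}x as long as the DPF load")
# DB2 DPF: 454,545 rows/s
# DB2 HDFS: 156,250 rows/s
# The HDFS load took 2.9x as long as the DPF load
```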

Performance Differences: DB2 DPF versus DB2 HDFS
Random I/O benchmark (reading 1023 rows)
[Chart: cold and warm timings for DB2 DPF and DB2 HDFS]

Performance Differences: DB2 DPF versus DB2 HDFS
Read-ahead I/O benchmark (reading 10 million rows)
[Chart: cold and warm timings for DB2 DPF and DB2 HDFS]


The Big Data Deployment (SQL for unstructured Data)
Working with datatypes for complex (partially structured) data:
  ARRAY: collection of data of the same datatype
  MAP: collection of key-value pairs
  STRUCT: collection of data with different datatypes
Working with unstructured data is possible via the Serializer and Deserializer (SerDe):
  The SerDe interface is told how it should process data blocks.
  There are many built-in SerDes, for example for JSON, Avro, Parquet, regular expressions, etc.
  Many SerDes are available in the public domain.
  Any specific SerDes that may be required can be developed in Java.

Big Data: Working with ARRAY Types
Collection of data of the same datatype

Big Data: Working with MAP Types
Collection of key-value pairs

Big Data: Working with STRUCT Types
Collection of data with different datatypes
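As an analogy for readers who think in a programming language, the three complex types map onto familiar Python structures. The sketch below is only an illustration of the concepts, not Big SQL syntax; the record layout, field names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Address:
    """STRUCT analogue: a fixed set of named fields with different datatypes."""
    street: str
    zip_code: str
    floor: int


@dataclass
class CustomerRow:
    """One hypothetical table row that uses all three complex types."""
    name: str
    phone_numbers: List[str]     # ARRAY analogue: elements share one datatype
    preferences: Dict[str, str]  # MAP analogue: key-value pairs
    address: Address             # STRUCT analogue


row = CustomerRow(
    name="Example Customer",
    phone_numbers=["555-0100", "555-0101"],
    preferences={"newsletter": "yes", "language": "en"},
    address=Address(street="1 Main Street", zip_code="78701", floor=3),
)

first_phone = row.phone_numbers[0]          # positional access into the array
newsletter = row.preferences["newsletter"]  # lookup by key in the map
zip_code = row.address.zip_code             # access by field name in the struct
print(first_phone, newsletter, zip_code)
```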

Big Data: Unstructured Data, Using SerDes in BigSQL
Before a SerDe .jar file can be used, it must be registered in BigSQL. Only when the jar file has been successfully registered will it be available to BigSQL.
Three steps to register:
  1. Hive servers: copy the SerDe .jar file into the /lib/ directory.
  2. Big SQL nodes: copy the SerDe .jar file into the /userlib/ directory of each individual node.
  3. Restart all BigSQL services.

Big Data: Example of Unstructured Data
Example: parsing log files with regular expressions (RegexSerDe)

Big Data: Example of Unstructured Data
  select * from apache_log fetch first 5 rows only
Use case: for example, correlating client data with web browser data to analyze user behavior.
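What a regex-based SerDe does conceptually can be shown in a few lines of Python: one regular expression with one capture group per column turns a raw log line into a row. This is a sketch of the idea only, not the RegexSerDe implementation; the pattern targets the Apache common log format, and the column names and sample line are assumptions, since the table definition behind apache_log is not shown in the slides.

```python
import re

# One named group per column; the column names are illustrative.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of column values, or None if the line does not match the pattern."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
row = parse_line(sample)
print(row["host"], row["status"], row["request"])
# 127.0.0.1 200 GET /apache_pb.gif HTTP/1.0
```

Once the lines are exposed as columns in this way, they can be queried and joined like any other table.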


Big SQL versus Hive
SQLReplayer

Hive and Big SQL: Object Synchronization
Create a table in Hive:
SQLReplayer

Hive and Big SQL: Object Synchronization
Synchronize the Hive tables:
SQLReplayer

Hive and Big SQL: Object Synchronization
Test the Big SQL table:
SQLReplayer

Hive and Big SQL: Data Synchronization (Refresh)
Edit the HDFS file:
SQLReplayer

Hive and Big SQL: Data Synchronization (Refresh)
Select from the Hive table:
SQLReplayer

Hive and Big SQL: Data Synchronization (Refresh)
Synchronization (refresh):
SQLReplayer


BIGSQL: Sham or Masterstroke?
Sham:
  DB2 DPF for HDFS
Masterstroke:
  The right strategy at the right time
  Reuse of existing investments
  Increased acceptance through the reuse of SQL
  Simple integration of Big Data into an existing infrastructure

The Big Data Solution
  Big SQL Hadoop tables are not a replacement for OLTP DBMS technology.
  Big SQL makes it possible to run SQL queries against existing Hadoop data (no proprietary storage formats).
  All the data are Hadoop files in HDFS.
  Big SQL was developed to make effective and efficient use of the Hadoop infrastructure.
  Most organizations already have experienced SQL developers.
  No UPDATE or DELETE is possible on a Hadoop table.
  Much lower license costs than DPF.
  Good SQL compatibility.
  Great monitoring is available with Speedgain for BIGSQL.

The Big Data Solution
Primary use cases:
  To move rarely referenced data out of the data warehouse and onto cheaper hardware while keeping the ability to query the data via SQL
  To set up a new data warehouse
  To filter and analyze unstructured data (such as log files, sensor data and social media) and to connect this data to existing structured data (for example via federation)

Conclusion
Bluff = Homerun

Q & A