Copyright 2012, Oracle and/or its affiliates. All rights reserved.


Big Data Connectors: High Performance Integration for Hadoop and Oracle Database
Melli Annamalai, Sue Mavris, Rob Abbott

Program Agenda
- Big Data Connectors: Brief Overview
- Connecting Hadoop with Oracle Database
- Oracle Direct Connector for HDFS
- Oracle Loader for Hadoop
- Performance

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Oracle's Big Data Platform
Stages: Stream, Acquire, Organize & Discover, Analyze, Visualize & Decide

Oracle's Big Data Platform: Hadoop and Oracle Database

Oracle Big Data Connectors
- Oracle Direct Connector for HDFS
- Oracle Loader for Hadoop
- Oracle R Connector for Hadoop
- Oracle Data Integrator Application Adapters for Hadoop

Oracle Loader for Hadoop and Oracle Direct Connector for HDFS
- Access data resident on Hadoop from Oracle Database
- Load data from Hadoop into Oracle Database
- Analyze all data together: data processed on Hadoop along with data in Oracle Database

Oracle R Connector for Hadoop
R analytics leveraging Hadoop and HDFS:
- Linearly scale a robust set of R algorithms
- Leverage MapReduce for R calculations
- Compute-intensive parallelism for simulations
[Diagram: Oracle R Client working with Hadoop and HDFS]

Oracle Data Integrator Application Adapters for Hadoop
Improving Productivity and Efficiency for Big Data
[Diagram: ODI activates transforms via MapReduce (Hive) on Hadoop and loads the results into Oracle Database]
Benefits:
- Consistent tooling across BI/DW, SOA, integration, and Big Data
- Reduces the complexity of processing on Hadoop through graphical tooling
- Improves productivity when processing Big Data (structured and unstructured)

Big Data Connectors: Oracle Loader for Hadoop and Oracle Direct Connector for HDFS

Loading and Accessing Data from Hadoop
[Diagram: log files and other inputs (Input 1, Input 2) pass through MapReduce shuffle/sort stages on Hadoop before reaching Oracle Database]

Example Use Case
- Business problem: need insight into customer web activity (clickstream data).
  Connect Hadoop with Oracle Database: aggregate the raw data and load it into the database for analysis.
- Business problem: need to connect web activity with transactional activity.
  Connect Hadoop with Oracle Database: analyze the data in place by running Oracle SQL queries.
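The first use case (aggregate raw clickstream data on Hadoop, then load the result) follows the map, shuffle/sort, reduce flow that the connectors build on. The sketch below is a minimal in-memory illustration of that aggregation, not connector code; it assumes a hypothetical tab-separated log format `user_id<TAB>url`, and the function names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit (user_id, 1) for every well-formed click record."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:  # skip malformed records
            user_id, _url = parts
            yield user_id, 1


def shuffle_sort(pairs):
    """Group map output by key, as Hadoop's shuffle/sort stage does."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))


def reduce_phase(grouped):
    """Sum clicks per user; each result is a row that would be loaded."""
    for user_id, pairs in grouped:
        yield user_id, sum(count for _, count in pairs)


def aggregate_clicks(lines):
    """Run the three phases end to end and return (user_id, clicks) rows."""
    return list(reduce_phase(shuffle_sort(map_phase(lines))))
```

For example, `aggregate_clicks(["u1\t/home", "u2\t/cart", "u1\t/checkout", "bad-record"])` returns `[('u1', 2), ('u2', 1)]`: the malformed line is dropped in the map phase, and only the much smaller aggregated rows would need to travel to the database.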

Usage Scenarios
- Bulk load large volumes of data. Example: historical data, or daily uploads of data gathered during the day
- Loads at regular frequency. Example: 24/7 monitoring of log feeds
- Loads at irregular frequency. Example: monitoring of sensor feeds
- Access data files in place on HDFS

Oracle Direct Connector for HDFS: Accessing HDFS Data from Oracle Database
Features:
- Access and analyze data in place on HDFS
- Access or load into the database in parallel using the external table mechanism
- Query and join data on HDFS with database-resident data
- Load into the database using SQL if required
- Automatic load balancing to maximize performance
[Diagram: a SQL query in Oracle Database reads an external table, which reaches data on HDFS through an HDFS client]

Oracle Direct Connector for HDFS: External Tables
- Access data on HDFS via external tables
- No DML operations are supported, and no indexes can be created on external tables
- Data files can be text files or Oracle Data Pump files (created by Oracle Loader for Hadoop)
- Parallelism is controlled by the external table definition
- Data files are grouped to distribute load evenly across parallel query (PQ) slaves
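The slide does not say how the connector groups data files. One plausible way to balance total bytes per parallel query slave is the greedy largest-first heuristic sketched below: sort files by size and place each one in the group with the smallest running total. The heuristic, the function name, and the file sizes are assumptions for illustration, not a description of the connector's internals.

```python
import heapq


def group_files(files, num_groups):
    """Assign (name, size_bytes) files to num_groups groups of similar total size.

    Greedy longest-processing-time heuristic: largest files first, each
    placed in the group whose running byte total is currently smallest.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be >= 1")
    heap = [(0, i) for i in range(num_groups)]  # (total_bytes, group_index)
    groups = [[] for _ in range(num_groups)]
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        total, idx = heapq.heappop(heap)  # lightest group so far
        groups[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return groups
```

With hypothetical files `[("a", 100), ("b", 90), ("c", 60), ("d", 50), ("e", 40)]` and two slaves, the result is `[["a", "d", "e"], ["b", "c"]]` (190 vs. 150 bytes), which is far more even than splitting the file list in half by position.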

Oracle Direct Connector for HDFS: 3 Simple Steps
1. Create an external table
2. Run the Oracle Direct Connector for HDFS utility to publish HDFS content to the external table
3. Access and load into the database using SQL

    hadoop jar \
      $ODCH_HOME/jlib/orahdfs.jar \
      oracle.hadoop.hdfs.exttab.ExternalTable \
      -conf MyConf.xml \
      -publish

Performance Comparison: Fuse-DFS vs. Oracle Direct Connector for HDFS
[Charts: load rate (TB/hour) and CPU usage (CPU seconds used per GB) for Fuse-DFS and Oracle Direct Connector for HDFS]

Key Benefits
- Uniquely enables access to HDFS data files from Oracle Database
- Performance: 12 TB/hour from Oracle Big Data Appliance to Oracle Exadata, 5 to 20 times faster than comparable third-party products
- Easy to use for Oracle DBAs and Hadoop developers
- Developed and supported by Oracle

Oracle Loader for Hadoop
[Diagram: (1) read target table metadata from the database; (2) partition, sort, and convert into Oracle data types on Hadoop (map and shuffle/sort stages); (3) connect to the database from reducer nodes and load into database partitions in parallel (JDBC or direct path)]
Features:
- Offloads data preprocessing from the database server to Hadoop
- Works with a range of input data formats
- Handles skew in input data to maximize performance
- Online and offline modes (offline: create Oracle Data Pump files on HDFS)
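The "partition, sort, and convert" step can be pictured with a small sketch: parse delimited text, convert each field to a typed value standing in for an Oracle data type, bucket rows by target table partition, and sort within each bucket so that each reducer would write one ordered partition. The `SALES` table, its columns, and the monthly partition naming are hypothetical; this is not how Oracle Loader for Hadoop is implemented internally.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical target table (assumption, for illustration only):
#   SALES(sale_date DATE, cust_id NUMBER, amount NUMBER)
#   range-partitioned by month of sale_date.


def convert(line, sep=","):
    """Convert one delimited text record into typed values (DATE, NUMBER, NUMBER)."""
    d, cust, amount = line.strip().split(sep)
    return date.fromisoformat(d), int(cust), Decimal(amount)


def partition_key(row):
    """Map a row to its hypothetical target partition name, e.g. 'P2012_09'."""
    sale_date = row[0]
    return f"P{sale_date.year}_{sale_date.month:02d}"


def partition_and_sort(lines):
    """Bucket converted rows by partition, then sort each bucket by column values."""
    buckets = defaultdict(list)
    for line in lines:
        row = convert(line)
        buckets[partition_key(row)].append(row)
    # Partitions in name order; rows within a partition ordered by (date, cust_id, amount).
    return {key: sorted(rows) for key, rows in sorted(buckets.items())}
```

Doing this work on the Hadoop side means the database receives rows that are already typed and grouped per partition, which is the preprocessing offload the slide describes.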

Input Formats for Oracle Loader for Hadoop
- Delimited text InputFormat
- Hive tables InputFormat
- Avro record InputFormat
- User-written InputFormat
- (Planned) Regular expression InputFormat
- (Planned) Oracle NoSQL Database InputFormat

Automatically Handle Input Data Skew: Load Balancing across Reducers
- Distributes load evenly across reduce tasks, so all reducers do approximately the same amount of work
- Avoids slowdown caused by unbalanced reducer loads
- Maximizes performance
- Data is sampled to determine the optimal partitioning of map output keys
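The slide states only that input data is sampled to choose the partitioning. A common way to do this in Hadoop (the InputSampler and TotalOrderPartitioner pair works on the same idea) is to sample keys, take evenly spaced quantiles of the sorted sample as split points, and route each key to a reducer by binary search. The sketch below assumes that approach; the function names, the sampling rate, and the example keys are hypothetical.

```python
import bisect
import random


def choose_split_points(keys, num_reducers, sample_rate=0.1, seed=42):
    """Sample keys and return num_reducers - 1 split points at sample quantiles."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be >= 1")
    keys = list(keys)
    if not keys or num_reducers == 1:
        return []  # everything goes to reducer 0
    rng = random.Random(seed)
    sample = sorted(k for k in keys if rng.random() < sample_rate)
    if not sample:
        sample = sorted(keys)  # tiny input: fall back to all keys
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]


def reducer_for(key, split_points):
    """Route a key: keys below split_points[0] go to reducer 0, and so on."""
    return bisect.bisect_right(split_points, key)
```

With skewed keys such as `list(range(900)) + list(range(10_000, 10_100))`, four reducers, and `sample_rate=1.0`, the split points are `[250, 500, 750]` and each reducer receives 250 keys. Equal-width key ranges over the same data would send 900 keys to the first reducer and leave two reducers idle, which is the kind of imbalance the slide says sampling avoids.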

Oracle Loader for Hadoop: 2 Simple Steps
1. Create the target table
2. Submit the Oracle Loader for Hadoop job to the cluster

    hadoop jar \
      $OLH_HOME/jlib/oraloader.jar \
      oracle.hadoop.loader.OraLoader \
      -conf MyConf.xml

Performance Comparison: Third-Party Products vs. Oracle Loader for Hadoop
[Charts: load rate (TB/hour) and CPU usage (CPU seconds used per GB) for a comparable third-party product and Oracle Loader for Hadoop]

Key Benefits
- Load directly from HDFS files and Hive tables into Oracle Database without intermediate staging files
- Performance: 10x faster than comparable third-party products
- Offloads database server processing onto Hadoop, minimizing the impact on performance SLAs of production applications
- Easy to use for Oracle DBAs and Hadoop developers
- Developed and supported by Oracle

Oracle Loader for Hadoop and Oracle Direct Connector for HDFS
[Diagram: Oracle Loader for Hadoop loads data through MapReduce shuffle/sort stages into Oracle Database; Oracle Direct Connector for HDFS serves SQL queries through an external table and an HDFS client]

Performance Summary
- 12 TB/hour (66 billion rows)
- 5 to 20 times faster than third-party products
- Lower database CPU usage in comparison
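A quick back-of-the-envelope reading of the headline figure, assuming decimal units (1 TB = 10^12 bytes) and that the 66 billion rows make up the 12 TB loaded in one hour; the slide states neither assumption explicitly:

```latex
\text{throughput} = \frac{12 \times 10^{12}\ \text{bytes}}{3600\ \text{s}} \approx 3.33 \times 10^{9}\ \text{bytes/s} \approx 3.3\ \text{GB/s}

\text{row rate} = \frac{66 \times 10^{9}\ \text{rows}}{3600\ \text{s}} \approx 1.83 \times 10^{7}\ \text{rows/s}

\text{average row size} = \frac{12 \times 10^{12}\ \text{bytes}}{66 \times 10^{9}\ \text{rows}} \approx 182\ \text{bytes/row}
```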

Summary
- High-performance connectors for loading and accessing data from a Hadoop cluster
- Fast and efficient connectors support a range of use cases
- Simple to set up and easy to use for developers
- Developed and supported by Oracle

Q & A
