Big Data Infrastructure The Oracle Way. Daniel Steiger



About... Daniel Steiger
Principal Consultant @ Trivadis
Oracle DBA and IT Infrastructure Architect
Program Manager IT Infrastructure Optimization
Co-Author "Der Oracle DBA", Hanser Verlag
Speaker and Teacher
17.11.2016

Our company.
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark.
We offer our services in the following strategic business fields:
OPERATION
Trivadis Services takes over the interacting operation of your IT systems.

With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
Branches: Basel, Bern, Brugg, Copenhagen, Düsseldorf, Frankfurt, Freiburg, Geneva, Hamburg, Lausanne, Munich, Stuttgart, Vienna, Zurich

Agenda
1. Introduction
2. Oracle Big Data Infrastructure
3. Oracle Big Data Software
4. BDA Setup
5. Use Case
6. Summary

Introduction

2006: Hadoop is born from Apache Nutch
2008: Foundation of Cloudera
2011: Oracle Rolls Out 'Big Data Appliance'
2012: Oracle Makes Big Data Appliance Move With Cloudera

About the Current State of Big Data Technology
"Cloudera is eight; Apache Hadoop is ten. Big data has gone from zero to how-did-that-happen huge. The bestiary is bigger than ever, too: new projects like Apache Kudu, Apache Impala (incubating), Apache Kafka and Apache Spark define the future of big data and analytics, extending the core Hadoop platform to handle streaming, real-time and advanced analytics."
Mike Olson, Cloudera CSO and Co-Founder, Aug. 25, 2016

Data Lakes and Reservoirs
"Since the data doesn't just sit there until it evaporates but eventually flows to various applications, we should think of this as a data reservoir rather than a data lake."
Source: http://blogs.informatica.com

Data Reservoir Functions
Ingestion
Storage/Retention
Processing
Access
Source: Architecting Data Lakes, 2016 O'Reilly Media, Inc.

Oracle Big Data Management System Architecture
Schema-on-read:
  Raw data
  Complex processing
  Huge volume at low cost
Schema-on-write:
  Cleansed data
  Complex integration
  Large volume at moderate cost
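The contrast between the two approaches on the slide above can be illustrated with a small sketch. It is not taken from the presentation: the sample events, field names and both helper functions are hypothetical, and it only assumes that schema-on-write validates data before it is stored, while schema-on-read keeps the raw data and applies structure at query time.

```python
import json

# Hypothetical raw events as they might land in a data reservoir.
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50"}',
    '{"user": "bob", "amount": "n/a"}',
    '{"user": "carol"}',
]


def write_with_schema(raw_lines):
    """Schema-on-write: cleanse and validate before storing; reject bad rows."""
    stored, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        try:
            stored.append(
                {"user": str(record["user"]), "amount": float(record["amount"])}
            )
        except (KeyError, ValueError):
            # Missing or non-numeric amount: the row never reaches the store.
            rejected.append(line)
    return stored, rejected


def read_with_schema(raw_lines, field, default=None):
    """Schema-on-read: raw lines stay untouched; a field is interpreted at query time."""
    for line in raw_lines:
        yield json.loads(line).get(field, default)


stored, rejected = write_with_schema(RAW_EVENTS)
users = list(read_with_schema(RAW_EVENTS, "user"))
print(len(stored), "rows stored,", len(rejected), "rejected at write time")
print("users visible at read time:", users)
```

In this sketch the schema-on-write path loses two of three rows at load time, whereas the schema-on-read path still answers a question about users from all raw rows; the cost moves from loading to every query.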

Oracle's Big Data Solution
A complete and optimized solution for big data
Tight integration with Exadata, Exalogic, Exalytics and SPARC Supercluster using InfiniBand network
Single-vendor support for both hardware and software

Oracle Big Data Infrastructure

The Big Data Appliance X6-2 Hardware
Per Node (X6-2):
  2 x 22-Core (2.2GHz) Intel Xeon E5-2699 v4
  8 x 32GB DDR4-2400 Memory (max. 768GB)
  12 x 8TB 7,200 RPM High Capacity SAS Drives
  2 x QDR 40Gb/sec InfiniBand Ports
  4 x 10 Gb Ethernet Ports, 1 x ILOM Ethernet Port
RAM to CPU Ratio:
  ODA X6-2M: 38 GB per Core
  MiniCluster S7-2: 32 GB per Core
  BDA: 17.5 GB per Core*
Starter Rack: 6 x nodes
Full Rack: 18 x nodes
Up to 18 racks
* Cloudera recommendation for "Compute Intensive Workloads": 16 GB per core
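The RAM-to-CPU figure for the BDA can be checked with simple arithmetic. The sketch below uses only the per-node numbers from the slide above. The function name is hypothetical, and reading the 17.5 GB per core as the value at the maximum memory of 768 GB is an inference from those numbers, not a statement from the presentation.

```python
def gb_per_core(sockets, cores_per_socket, memory_gb):
    """Return memory per physical core in GB."""
    return memory_gb / (sockets * cores_per_socket)


# BDA X6-2 node: 2 sockets x 22 cores, shipped with 8 x 32 GB DIMMs.
shipped = gb_per_core(sockets=2, cores_per_socket=22, memory_gb=8 * 32)

# Same node expanded to the stated maximum of 768 GB.
maxed = gb_per_core(sockets=2, cores_per_socket=22, memory_gb=768)

print(f"shipped configuration: {shipped:.1f} GB per core")
print(f"maximum memory:        {maxed:.1f} GB per core")
```

Under these assumptions only the fully expanded node reaches the 17.5 GB per core quoted on the slide and exceeds the 16 GB per core that Cloudera recommends for compute-intensive workloads.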

Big Data Appliance Network Connectivity
Source: Oracle Big Data Appliance: Datacenter Network Integration, Oracle White Paper, 2012

Oracle Big Data Appliance Software Stack (Release 4.6.0)
Cloudera Enterprise Data Hub Edition
Apache Hadoop (CDH)
Cloudera Impala
Cloudera Search (Apache Solr)
Apache HBase and Apache Accumulo
Apache Spark
Apache Kafka
Cloudera Manager
Cloudera Navigator
Cloudera Backup and Disaster Recovery (BDR)
Oracle Linux, Oracle Java JDK
MySQL Database Enterprise Server - Advanced Edition
Oracle SQL Connector for HDFS
Oracle XQuery for Hadoop
Oracle R Advanced Analytics for Hadoop
Oracle NoSQL Database (key-value) Community Edition (CE)
Enterprise Manager Plug-In

Oracle Big Data Connectors
Facilitate access to data stored in an Apache Hadoop cluster.
Available on either Oracle Big Data Appliance or a Hadoop cluster running on commodity hardware
  Oracle SQL Connector for HDFS
  Oracle Loader for Hadoop
  Oracle XQuery for Hadoop
  Oracle R Advanced Analytics for Hadoop
  Oracle Data Integrator
  Oracle DataSource for Hadoop (OD4H)
Note: The connectors are licensed separately from Oracle Big Data Appliance
Source: Oracle

Security for Data at Rest and Data in Motion
Authentication through Kerberos
Authorization through Apache Sentry
Auditing through Oracle Audit Vault
Encryption for Data-at-Rest
Network Encryption
Big Data SQL adds Advanced Security on Hadoop & NoSQL:
  Masking and Redaction
  Virtual Private Database: Fine-grain Access Control

Administration with EM Cloud Control
Plug-In for EM Cloud Control 12.1.0.4 and later
Discover the components of a Big Data Appliance Network as managed targets
Manage the HW and SW components
Collect metrics to analyze the performance of the network and each BDS component
Trigger alerts based on availability and system health
Respond to warnings and incidents
Always (!) check My Oracle Support Doc ID 1570523.1, "Enterprise Manager for Oracle Big Data Appliance Frequently Asked Questions"

Oracle Big Data Software

Oracle Big Data Software
Oracle Big Data SQL
Oracle Big Data Discovery
Oracle Data Integrator for Big Data
Oracle GoldenGate for Big Data

Oracle Big Data SQL
Query Data in RDBMS, Hadoop and NoSQL
Same query, but there are intelligent optimizations that push the queries down to the source
Tables in Hadoop or NoSQL databases are defined as external tables in Oracle (leveraging Hive metastore to determine both parallelism and read semantics)
Applying query optimizations to the data (Storage Indexes, Local filtering and Caching)
Oracle DataSource for Hadoop (OD4H)

Oracle Big Data SQL (cont.)
Oracle Big Data SQL extends SmartScan capabilities (such as filter-predicate offloads) to Oracle external tables with the installation of the Big Data SQL processing agent on the DataNodes of the Hadoop cluster.
This technology enables the Hadoop cluster to discard a huge portion of irrelevant data (up to 99 percent of the total) and return much smaller result sets to the Oracle Database server.
Oracle Big Data SQL 3.0 can connect Oracle Database to the Hadoop environment on Oracle Big Data Appliance, other systems based on CDH (Cloudera's Distribution including Apache Hadoop), HDP (Hortonworks Data Platform), and potentially other non-CDH Hadoop systems.
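The effect of filter-predicate offload described above can be illustrated with a toy model. This is not Oracle Big Data SQL code: the node names, rows and both scan functions are hypothetical. The sketch only assumes that, with offload, each DataNode evaluates the filter locally and ships matching rows, while without offload every row travels to the database server first.

```python
# Hypothetical rows held on three Hadoop DataNodes.
DATANODES = {
    "node1": [{"id": 1, "country": "CH"}, {"id": 2, "country": "DE"}],
    "node2": [{"id": 3, "country": "DE"}, {"id": 4, "country": "AT"}],
    "node3": [{"id": 5, "country": "DK"}, {"id": 6, "country": "CH"}],
}


def scan_without_offload(datanodes, predicate):
    """Ship every row to the database server, then filter there."""
    shipped = [row for rows in datanodes.values() for row in rows]
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)


def scan_with_offload(datanodes, predicate):
    """Filter on each DataNode and ship only the matching rows."""
    shipped = []
    for rows in datanodes.values():
        shipped.extend(row for row in rows if predicate(row))
    return shipped, len(shipped)


def is_swiss(row):
    return row["country"] == "CH"


plain_result, plain_shipped = scan_without_offload(DATANODES, is_swiss)
offload_result, offload_shipped = scan_with_offload(DATANODES, is_swiss)

print("rows shipped without offload:", plain_shipped)
print("rows shipped with offload:   ", offload_shipped)
```

Both scans return the same rows; only the number of rows crossing the network differs, which is the point of pushing the predicate down to the storage side.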

Oracle Big Data Discovery
The Visual Face of Big Data
Uses the power of Apache Spark to process massive amounts of information
Uses Oracle Big Data SQL to query the data in HDFS without moving it at all

Oracle Data Integrator (ODI) for Big Data
ODI for Big Data is used to transform and enrich data within the big data reservoir
ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents
Enables users to build business and data mappings without having to learn HiveQL, Pig Latin and MapReduce
ODI separates the design interface used to build logic from the physical implementation layer that runs the code

Oracle GoldenGate for Big Data
Data Delivery to Big Data Targets
Less invasive compared to ETL-Processes
Real-Time Data for Streaming Analytics
Release 12.2 (Dec. 2015)
  Native Java Replication
  Pluggable Formatting Architecture: JSON, AVRO, XML, Delimited Text
  Native Kerberos Support
  Kafka Targets
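The idea behind a pluggable formatting architecture can be sketched in a few lines. This is not the GoldenGate API: the change-record layout, the formatter functions and the registry are all hypothetical. The sketch only assumes that one captured change can be rendered in different output formats by swapping the formatter.

```python
import json


def format_json(op):
    """Render a change record as a JSON document."""
    return json.dumps(op, sort_keys=True)


def format_delimited(op, sep="|"):
    """Render a change record as one line of delimited text."""
    values = [str(value) for value in op["after"].values()]
    return sep.join([op["table"], op["op_type"]] + values)


# Registry of available formatters; adding a format means adding an entry.
FORMATTERS = {"json": format_json, "delimited": format_delimited}


def deliver(op, fmt):
    """Format one change record with the formatter selected by name."""
    return FORMATTERS[fmt](op)


# Hypothetical insert captured from a source table.
change = {
    "table": "SALES.ORDERS",
    "op_type": "I",
    "after": {"order_id": 42, "country": "CH"},
}

print(deliver(change, "json"))
print(deliver(change, "delimited"))
```

Keeping the formatter behind a lookup means the delivery path stays the same when a new target format is needed.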

Big Data Appliance Setup

Well, first you have to move the box...
Safety and Compliance Guide
Site Checklist
Source: kerryosborne

Setup
  cd /opt/oracle/BDAMammoth
  mammoth -s 1 cdh
Mammoth is the utility that deploys software on Oracle's Big Data Appliance
Step 1 = PreinstallChecks
Step 2 = SetupPuppet
Step 3 = PatchFactoryImage
Step 4 = CopyLicenseFiles
Step 5 = CopySoftwareSource
Step 6 = CreateUsers
Step 7 = SetupMountPoints
Step 8 = SetupMySQL
Step 9 = InstallHadoop
Step 10 = StartHadoopServices
Step 11 = InstallBDASoftware
Step 12 = SetupKerberos
Step 13 = HDFSTransparentEncryption
Step 14 = SetupEMAgent
Step 15 = SetupASR
Step 16 = CleanupInstall
Step 17 = CleanupSSHroot (Optional)
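Because Mammoth runs as a sequence of numbered steps, it helps to know what is still outstanding when a step fails. The sketch below is a plain lookup over the step names from the slide above. It does not call Mammoth, and the helper name `remaining_steps` is hypothetical.

```python
# Step names as listed on the slide, in execution order (step 1 first).
MAMMOTH_STEPS = [
    "PreinstallChecks",
    "SetupPuppet",
    "PatchFactoryImage",
    "CopyLicenseFiles",
    "CopySoftwareSource",
    "CreateUsers",
    "SetupMountPoints",
    "SetupMySQL",
    "InstallHadoop",
    "StartHadoopServices",
    "InstallBDASoftware",
    "SetupKerberos",
    "HDFSTransparentEncryption",
    "SetupEMAgent",
    "SetupASR",
    "CleanupInstall",
    "CleanupSSHroot",
]


def remaining_steps(failed_step):
    """Return (number, name) pairs from the failed step through the last step."""
    if not 1 <= failed_step <= len(MAMMOTH_STEPS):
        raise ValueError(f"step must be between 1 and {len(MAMMOTH_STEPS)}")
    return [
        (number, MAMMOTH_STEPS[number - 1])
        for number in range(failed_step, len(MAMMOTH_STEPS) + 1)
    ]


# Example: the installation stopped while setting up Kerberos.
for number, name in remaining_steps(12):
    print(f"Step {number} = {name}")
```

A list like this is a convenient checklist for reviewing the log file of each step before moving on to the next one.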

Install Big Data Discovery
At one node only
Takes a couple of minutes as RAID 6 is built locally
  bdacli enable bdd
Some hints...
  Cannot connect to mysql database => edit temporary password file
  Needs email address during setup dialog
  Installation shows finished successfully... but was not

Patching
Patching means: software that raises an existing software release number
  E.g. CDH 5.5.1 to CDH 5.5.2
Example: Re-Image to 4.2.0 with Patch 22118555 (3.6G)
  JSON specs must exist on the server to be reimaged
  Re-imaging writes the image to the internal USB and boots from USB
BDA Configurator v4.4.0-1
BDA Patch 4.4.0
  P22537238_440_Linux-x86-64_1of3.zip: Mammoth
  P22537238_440_Linux-x86-64_2of3.zip: BDABaseImage-ol6-4.4.0_RELEASE.iso
  P22537238_440_Linux-x86-64_3of3.zip: BDAExtras-ol6-4.4.0

From our experience...
A Big Data Appliance Admin needs a broad skill set
  Unix admin skills are mandatory (ssh, X-server, scp, networking,...)
  Oracle Engineered System expertise helps a lot (Exadata, ODA, InfiniBand,...)
  Cloudera administration skills are useful
Setup and patching:
  Always check for known issues on My Oracle Support (see references for Doc IDs)
  Check logfiles after every step
  Pay attention to the InfiniBand firmware release on the IB switches when connecting Exadata and BDA (they require the exact same version)

Use Case

Use Case "Fraud Detection"
Company:
Reference: Oracle Open World 2016
Business Case: Fraud Detection
Motivation Statement: "With the BDA we want to refine our fraud detection analyses with additional dimensions. For example, the BDA also captures sports performance metrics such as the distance covered by individual players, which can then be compared with each player's average distance covered. Grossly atypical performance values can be an indication of prior arrangements, which we then investigate."
Reference: Computer World

Use Case Solution
Big Data Appliance as "Data Reservoir"
Key arguments from customer perspective
  "Exadata and the BDA in tandem give us integration advantages that we cannot achieve as easily with competing systems." *
  Fast start to "Big Data"
  Comprehensive software stack for data analytics
  Start small, grow on demand
  Ready for future (yet unknown) demands
* Reference: Computer World

Summary

Summary
The main technical advantage when deploying Big Data SQL on the Oracle Big Data Appliance is InfiniBand's high bandwidth to other Oracle Engineered Systems
Other BDA exclusive features: Perfect Balance for reduce tasks
The Big Data Appliance provides a solid enterprise-class infrastructure (HW & SW)
Installation and patching procedures are not yet as mature on the BDA as on other engineered systems like Exadata
Leveraging the full potential of a BDA requires both Engineered System expertise and Data Analytics knowhow

Is the Big Data Appliance the Right Choice for You?
Yes, if...
  you need a fast start to production-ready data analytics
  you already run other Oracle engineered systems with InfiniBand technology
  your use case involves data in RDBMS, Hadoop and NoSQL and you have high query performance demands
  you have an important business case with unpredictable growth :-)
  you would like to stay with Cloudera

Questions and responses
Daniel Steiger
Principal Consultant
daniel.steiger@trivadis.com

Trivadis @ DOAG 2016
Booth: 3rd Floor, next to the escalator
Know-how, T-Shirts, Contest and Trivadis Power to go
We look forward to your visit
Because with Trivadis you always win!

Links & References

Links and References (1)
An Enterprise Architect's Guide to Big Data: Reference Architecture Overview
  http://www.oracle.com/technetwork/topics/entarch/oracle-wp-big-data-refarch-2019930.pdf
Oracle Big Data Management System Statement of Direction
  http://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/2516729.pdf
Oracle Big Data Appliance Documentation
  https://docs.oracle.com/bigdata/bda46/
Oracle Big Data Lite Virtual Machine
  http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html#wp

Links and References (2)
Information Center: Oracle Big Data Appliance (My Oracle Support Doc ID 1445762.2)
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
Oracle Big Data SQL: One Fast Query, All Your Data
  https://blogs.oracle.com/datawarehousing/entry/oracle_big_data_sql_one

Links and References (3)
Owner's Guide, Release 4 (4.4), E65664-03, January 2016
Oracle Big Data Appliance Patch Set Master Note, Doc ID 1485745.1
Information Center: Install/Upgrade/Configure Oracle BDA, Doc ID 1445745.2
Oracle BDA Base Image Version 4.2.0 for New Installations on OL6, Doc ID 2077858.1 (Base for BMR to finally reach 4.4.0)
Oracle Big Data Appliance Installation Frequently Asked Questions, Doc ID 1518939.1
Upgrading CDH, Doc ID 2109175.1
How to Enable/Disable Oracle Big Data Discovery on Oracle Big Data Appliance V4.3/OL6 with bdacli, Doc ID 2083079.1 (is also (not) valid for 4.4)
"bdacli enable bdd" Fails with "ERROR: Error getting mysql database status" on BDA 4.4.0 / BDD 1.1, Doc ID 2109175.1

Backup Slides

Oracle Big Data SQL Licensing
All nodes within the Hadoop cluster that runs Oracle Big Data SQL must be licensed.
A separate license must be procured per disk per Hadoop cluster.
All disks within every node that is part of a cluster running Oracle Big Data SQL must be licensed. Partial licensing within a node is not available. All nodes in the cluster are included.
Only the Hadoop cluster side (Oracle Big Data Appliance, or other) of an Oracle Big Data SQL installation is licensed and no additional license is required for the database server side.
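The per-disk rule above translates into a simple count. The sketch below multiplies nodes by disks per node, using the 12 drives per node and the rack sizes from the hardware slide. The function name is hypothetical and the result is a count of disk licenses only; it says nothing about prices.

```python
def big_data_sql_disk_licenses(nodes, disks_per_node):
    """Every disk in every node of the cluster needs a license; no partial licensing."""
    if nodes < 1 or disks_per_node < 1:
        raise ValueError("nodes and disks_per_node must be positive")
    return nodes * disks_per_node


DISKS_PER_BDA_NODE = 12  # 12 x 8TB drives per X6-2 node, per the hardware slide

starter_rack = big_data_sql_disk_licenses(nodes=6, disks_per_node=DISKS_PER_BDA_NODE)
full_rack = big_data_sql_disk_licenses(nodes=18, disks_per_node=DISKS_PER_BDA_NODE)

print("starter rack:", starter_rack, "disk licenses")
print("full rack:   ", full_rack, "disk licenses")
```

Because all nodes in the cluster are included, growing a cluster from a starter rack to a full rack triples the license count under this rule.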

BDA Price List

Big Data in the Cloud: Offering & Pricing
Reference: https://cloud.oracle.com

BDA Specific Software Features
Oracle NoSQL Database
  Oracle NoSQL Database is a distributed key-value database built on the storage technology of Berkeley DB Java Edition.
  An intelligent driver on top of Berkeley DB keeps track of the underlying storage topology, shards the data and knows where data can be placed with the lowest latency.
Oracle R Support for Big Data
  The standard R distribution is installed on all nodes of Oracle Big Data Appliance
  Oracle R Connector for Hadoop provides R users with high-performance, native access to HDFS and the MapReduce programming framework
  Oracle R Enterprise is a separate package that provides real-time access to Oracle Database.

Big Data Preparation (Cloud Service)
Self-service data preparation for domain experts
Ingest, prepare, enrich, and publish data with a unified cloud-based data wrangling solution
Unique combination of Natural Language Processing (NLP) with Machine Learning (ML)
Leverage Linked Open Data graph of domain knowledge
Powered by Apache Spark
See https://cloud.oracle.com/en_us/big-data-preparation