Big Data Infrastructure The Oracle Way. Daniel Steiger BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
About... Daniel Steiger: Principal Consultant @ Trivadis; Oracle DBA and IT Infrastructure Architect; Program Manager IT Infrastructure Optimization; Co-Author "Der Oracle DBA", Hanser Verlag; Speaker and Teacher. 17.11.2016
Our company. Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: OPERATION. Trivadis Services takes over the interacting operation of your IT systems.
With over 600 specialists and IT experts in your region. 14 Trivadis branches and more than 600 employees. 200 Service Level Agreements. Over 4,000 training participants. Research and development budget: CHF 5.0 million. Financially self-supporting and sustainably profitable. Experience from more than 1,900 projects per year at over 800 customers. Locations: Basel, Bern, Brugg, Copenhagen, Düsseldorf, Frankfurt, Freiburg, Geneva, Hamburg, Lausanne, Munich, Stuttgart, Vienna, Zurich.
Agenda 1. Introduction 2. Oracle Big Data Infrastructure 3. Oracle Big Data Software 4. BDA Setup 5. Use Case 6. Summary
Introduction
2006: Hadoop is born from Apache Nutch. 2008: Foundation of Cloudera. 2011: Oracle Rolls Out 'Big Data Appliance'. 2012: Oracle Makes Big Data Appliance Move With Cloudera.
About the Current State of Big Data Technology "Cloudera is eight; Apache Hadoop is ten. Big data has gone from zero to how-did-that-happen huge. The bestiary is bigger than ever, too: new projects like Apache Kudu, Apache Impala (incubating), Apache Kafka and Apache Spark define the future of big data and analytics, extending the core Hadoop platform to handle streaming, real-time and advanced analytics." Mike Olson, Cloudera CSO and Co-Founder, Aug. 25, 2016
Data Lakes and Reservoirs. Since the data doesn't just sit there until it evaporates but eventually flows to various applications, we should think of this as a data reservoir rather than a data lake. Source: http://blogs.informatica.com
Data Reservoir Functions: Ingestion, Storage/Retention, Processing, Access. Source: Architecting Data Lakes, 2016 O'Reilly Media, Inc.
Oracle Big Data Management System Architecture. Schema-on-read: raw data, complex processing, huge volume at low cost. Schema-on-write: cleansed data, complex integration, large volume at moderate cost.
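The difference between the two approaches on this slide can be illustrated with a minimal Python sketch. The sample records, field names and helper functions are hypothetical and are not part of any Oracle or Hadoop API; the sketch only shows where validation happens in each model.

```python
import json

# Hypothetical raw events: the second lacks a field, the third has a bad value.
RAW_LINES = [
    '{"id": 1, "amount": "19.90", "country": "CH"}',
    '{"id": 2, "amount": "5.00"}',
    '{"id": 3, "amount": "n/a", "country": "DE"}',
]

def write_with_schema(lines):
    """Schema-on-write: cleanse and type every row before it is stored."""
    table, rejected = [], []
    for line in lines:
        rec = json.loads(line)
        try:
            table.append({
                "id": int(rec["id"]),
                "amount": float(rec["amount"]),
                "country": str(rec["country"]),
            })
        except (KeyError, ValueError):
            rejected.append(line)  # rows that do not fit the schema never land
    return table, rejected

def read_with_schema(lines, field, cast, default=None):
    """Schema-on-read: keep raw lines as they are, interpret them per query."""
    for line in lines:
        rec = json.loads(line)
        try:
            yield cast(rec[field])
        except (KeyError, ValueError):
            yield default

table, rejected = write_with_schema(RAW_LINES)
amounts = list(read_with_schema(RAW_LINES, "amount", float))
countries = list(read_with_schema(RAW_LINES, "country", str, default="unknown"))
print(len(table), len(rejected))
print(amounts)
print(countries)
```

With schema-on-write the cost of cleansing is paid once at load time and bad rows are rejected; with schema-on-read all raw rows are retained and each query decides how to interpret them.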
Oracle's Big Data Solution. A complete and optimized solution for big data. Tight integration with Exadata, Exalogic, Exalytics and SPARC SuperCluster using the InfiniBand network. Single-vendor support for both hardware and software.
Oracle Big Data Infrastructure
The Big Data Appliance X6-2 Hardware. Per node (X6-2): 2 x 22-core (2.2 GHz) Intel Xeon E5-2699 v4; 8 x 32 GB DDR4-2400 memory (max. 768 GB); 12 x 8 TB 7,200 RPM high-capacity SAS drives; 2 x QDR 40 Gb/sec InfiniBand ports; 4 x 10 Gb Ethernet ports; 1 x ILOM Ethernet port. RAM-to-CPU ratio: ODA X6-2M: 38 GB per core; MiniCluster S7-2: 32 GB per core; BDA: 17.5 GB per core* (768 GB maximum memory / 44 cores). Starter rack: 6 nodes. Full rack: 18 nodes. Up to 18 racks. * Cloudera recommendation for "Compute Intensive Workloads": 16 GB per core.
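The ratio quoted for the BDA can be reproduced from the per-node figures on this slide. The following Python sketch only does that arithmetic; the constant and function names are illustrative, and the storage figure is raw capacity before any HDFS replication or file-system overhead.

```python
# Per-node figures taken from the slide (X6-2 node).
CORES_PER_NODE = 2 * 22      # 2 x 22-core Intel Xeon E5-2699 v4
DEFAULT_RAM_GB = 8 * 32      # 8 x 32 GB DDR4-2400
MAX_RAM_GB = 768             # maximum memory per node
DISKS_PER_NODE = 12
DISK_TB = 8

def ram_per_core(ram_gb, cores=CORES_PER_NODE):
    """GB of RAM available per CPU core on one node."""
    return ram_gb / cores

def raw_storage_tb(nodes):
    """Raw disk capacity in TB for a number of nodes (no replication applied)."""
    return nodes * DISKS_PER_NODE * DISK_TB

print(round(ram_per_core(DEFAULT_RAM_GB), 1))  # 5.8 GB per core as shipped
print(round(ram_per_core(MAX_RAM_GB), 1))      # 17.5 GB per core at maximum memory
print(raw_storage_tb(6))                       # 576 TB raw in a starter rack
print(raw_storage_tb(18))                      # 1728 TB raw in a full rack
```

The 17.5 GB per core on the slide therefore corresponds to a node upgraded to 768 GB; with the default 256 GB the ratio is about 5.8 GB per core.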
Big Data Appliance Network Connectivity. Source: Oracle Big Data Appliance: Datacenter Network Integration, Oracle White Paper, 2012
Oracle Big Data Appliance Software Stack (Release 4.6.0): Cloudera Enterprise Data Hub Edition; Apache Hadoop (CDH); Cloudera Impala; Cloudera Search (Apache Solr); Apache HBase and Apache Accumulo; Apache Spark; Apache Kafka; Cloudera Manager; Cloudera Navigator; Cloudera Backup and Disaster Recovery (BDR); Oracle Linux, Oracle Java JDK; MySQL Database Enterprise Server - Advanced Edition; Oracle SQL Connector for HDFS; Oracle XQuery for Hadoop; Oracle R Advanced Analytics for Hadoop; Oracle NoSQL Database (key-value) Community Edition (CE); Enterprise Manager Plug-In.
Oracle Big Data Connectors. Facilitate access to data stored in an Apache Hadoop cluster. Available on either Oracle Big Data Appliance or a Hadoop cluster running on commodity hardware. Connectors: Oracle SQL Connector for HDFS; Oracle Loader for Hadoop; Oracle XQuery for Hadoop; Oracle R Advanced Analytics for Hadoop; Oracle Data Integrator; Oracle DataSource for Hadoop (OD4H). Note: The connectors are licensed separately from Oracle Big Data Appliance. Source: Oracle
Security for Data at Rest and Data in Motion. Authentication through Kerberos. Authorization through Apache Sentry. Auditing through Oracle Audit Vault. Encryption for data at rest. Network encryption. Big Data SQL adds Advanced Security on Hadoop & NoSQL: masking and redaction; Virtual Private Database (fine-grained access control).
Administration with EM Cloud Control. Plug-in for EM Cloud Control 12.1.0.4 and later. Discover the components of a Big Data Appliance network as managed targets. Manage the HW and SW components. Collect metrics to analyze the performance of the network and each BDA component. Trigger alerts based on availability and system health. Respond to warnings and incidents. Always (!) check My Oracle Support Doc ID 1570523.1, "Enterprise Manager for Oracle Big Data Appliance Frequently Asked Questions".
Oracle Big Data Software
Oracle Big Data Software: Oracle Big Data SQL; Oracle Big Data Discovery; Oracle Data Integrator for Big Data; Oracle GoldenGate for Big Data.
Oracle Big Data SQL. Query data in RDBMS, Hadoop and NoSQL with the same query; intelligent optimizations push the queries down to the source. Tables in Hadoop or NoSQL databases are defined as external tables in Oracle (leveraging the Hive metastore to determine both parallelism and read semantics). Query optimizations are applied to the data (Storage Indexes, local filtering and caching). Oracle DataSource for Hadoop (OD4H).
Oracle Big Data SQL (cont.) Oracle Big Data SQL extends SmartScan capabilities (such as filter-predicate offloads) to Oracle external tables with the installation of the Big Data SQL processing agent on the DataNodes of the Hadoop cluster. This technology enables the Hadoop cluster to discard a huge portion of irrelevant data (up to 99 percent of the total) and return much smaller result sets to the Oracle Database server. Oracle Big Data SQL 3.0 can connect Oracle Database to the Hadoop environment on Oracle Big Data Appliance, other systems based on CDH (Cloudera's Distribution including Apache Hadoop), HDP (Hortonworks Data Platform), and potentially other non-CDH Hadoop systems.
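The effect of a filter-predicate offload can be illustrated with a small Python simulation. This is a conceptual sketch only: the node data, row layout and function names are made up, and nothing here reflects the actual Big Data SQL agent or its interfaces. It compares how many rows cross the network when the filter runs at the database versus on the data nodes.

```python
# Three hypothetical data nodes holding 10,000 rows each, built deterministically.
NODES = [
    [{"id": i, "amount": (i * 37 + node) % 1000} for i in range(10_000)]
    for node in range(3)
]

def scan_without_offload(nodes, predicate):
    """Ship every row to the database, then filter there."""
    shipped = [row for block in nodes for row in block]
    return [row for row in shipped if predicate(row)], len(shipped)

def scan_with_offload(nodes, predicate):
    """Filter on each data node and ship only the matching rows."""
    shipped = [row for block in nodes for row in block if predicate(row)]
    return shipped, len(shipped)

def is_large(row):
    return row["amount"] >= 990

result_a, moved_a = scan_without_offload(NODES, is_large)
result_b, moved_b = scan_with_offload(NODES, is_large)

print(result_a == result_b)              # True: same answer either way
print(moved_a, moved_b)                  # 30000 300
print(f"{1 - moved_b / moved_a:.0%}")    # 99% of the rows never leave the nodes
```

Both scans return the same result set; the difference lies purely in the volume of data that has to travel to the database server.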
Oracle Big Data Discovery: The Visual Face of Big Data. Uses the power of Apache Spark to process massive amounts of information. Uses Oracle Big Data SQL to query the data in HDFS without moving it at all.
Oracle Data Integrator (ODI) for Big Data. ODI for Big Data is used to transform and enrich data within the big data reservoir. It generates native code that is then run on the underlying Hadoop platform without requiring any additional agents. It enables users to build business and data mappings without having to learn HiveQL, Pig Latin and MapReduce. ODI separates the design interface used to build logic from the physical implementation layer that runs the code.
Oracle GoldenGate for Big Data. Data delivery to big data targets. Less invasive than ETL processes. Real-time data for streaming analytics. Release 12.2 (Dec. 2015): native Java replication; pluggable formatting architecture (JSON, Avro, XML, delimited text); native Kerberos support; Kafka targets.
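The idea of a pluggable formatting architecture can be sketched in a few lines of Python. This is not GoldenGate's API or record layout: the registry, the decorator and the sample change record are hypothetical and only show how a delivery step can stay unchanged while output formats are swapped.

```python
import json

# Registry of output formatters, keyed by format name (hypothetical design).
FORMATTERS = {}

def formatter(name):
    """Decorator that registers a function as the formatter for `name`."""
    def register(func):
        FORMATTERS[name] = func
        return func
    return register

@formatter("json")
def to_json(record):
    return json.dumps(record, sort_keys=True)

@formatter("delimited")
def to_delimited(record, sep="|"):
    # Emit values in key order so the column order is stable.
    return sep.join(str(record[key]) for key in sorted(record))

def deliver(record, fmt):
    """Format one change record with whichever formatter was configured."""
    return FORMATTERS[fmt](record)

# Made-up change record for illustration only.
change = {"op": "I", "table": "SALES.ORDERS", "id": 42}
print(deliver(change, "json"))
print(deliver(change, "delimited"))
```

Adding Avro or XML output in this design would mean registering one more function, without touching `deliver`.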
Big Data Appliance Setup
Well, first you have to move the box... Safety and Compliance Guide. Site Checklist. Source: kerryosborne
Setup. Commands: cd /opt/oracle/BDAMammoth; mammoth -s 1 cdh. Mammoth is the utility that deploys software on Oracle's Big Data Appliance. Step 1 = PreinstallChecks; Step 2 = SetupPuppet; Step 3 = PatchFactoryImage; Step 4 = CopyLicenseFiles; Step 5 = CopySoftwareSource; Step 6 = CreateUsers; Step 7 = SetupMountPoints; Step 8 = SetupMySQL; Step 9 = InstallHadoop; Step 10 = StartHadoopServices; Step 11 = InstallBDASoftware; Step 12 = SetupKerberos; Step 13 = HDFSTransparentEncryption; Step 14 = SetupEMAgent; Step 15 = SetupASR; Step 16 = CleanupInstall; Step 17 = CleanupSSHroot (optional).
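When running Mammoth step by step, it can help to keep the step list in a small script. The Python sketch below only builds the command line for a given step and does not execute anything. It assumes the step option is `-s`, as the garbled command on the slide suggests, and that the last argument (`cdh` on the slide) is the cluster name; both should be verified against the Owner's Guide for your release. The function names are illustrative.

```python
# Step names as listed on the slide, in order (step 1 is index 0).
MAMMOTH_STEPS = [
    "PreinstallChecks", "SetupPuppet", "PatchFactoryImage", "CopyLicenseFiles",
    "CopySoftwareSource", "CreateUsers", "SetupMountPoints", "SetupMySQL",
    "InstallHadoop", "StartHadoopServices", "InstallBDASoftware", "SetupKerberos",
    "HDFSTransparentEncryption", "SetupEMAgent", "SetupASR", "CleanupInstall",
    "CleanupSSHroot",
]

def step_name(step):
    """Return the name of a 1-based Mammoth step number."""
    if not 1 <= step <= len(MAMMOTH_STEPS):
        raise ValueError(f"step must be between 1 and {len(MAMMOTH_STEPS)}")
    return MAMMOTH_STEPS[step - 1]

def mammoth_command(step, cluster):
    """Build (but do not run) the argument list for a single step.

    Assumption: '-s <n> <cluster>' runs step n, as on the slide.
    """
    step_name(step)  # validates the step number
    return ["./mammoth", "-s", str(step), cluster]

for n in (1, 2):
    print(n, step_name(n), " ".join(mammoth_command(n, "cdh")))
```

Printing the step name next to each command makes it easier to match the log files to the step that produced them.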
Install Big Data Discovery. At one node only. Takes a couple of minutes as RAID 6 is built locally. Command: bdacli enable bdd. Some hints: "Cannot connect to mysql database" => edit the temporary password file. The setup dialog asks for an email address. The installation reports that it finished successfully... but it had not.
Patching. Patching means applying software that raises an existing software release number, e.g. CDH 5.5.1 to CDH 5.5.2. Example: re-image to 4.2.0 with Patch 22118555 (3.6G). JSON specs must exist on the server to be re-imaged. Re-imaging writes the image to the internal USB device and boots from it. BDA Configurator v4.4.0-1. BDA Patch 4.4.0: P22537238_440_Linux-x86-64_1of3.zip (Mammoth); P22537238_440_Linux-x86-64_2of3.zip (BDABaseImage-ol6-4.4.0_RELEASE.iso); P22537238_440_Linux-x86-64_3of3.zip (BDAExtras-ol6-4.4.0).
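Because a patch is delivered as several zip files, a quick completeness check before starting can save time. The Python sketch below infers the naming pattern from the three file names on the slide (patch number, release, platform, part x of y); that pattern is an assumption and not an official specification, and the helper names are made up.

```python
import re

# Pattern inferred from names like P22537238_440_Linux-x86-64_1of3.zip
PART_RE = re.compile(
    r"^[Pp](?P<patch>\d+)_(?P<release>\d+)_(?P<platform>.+)"
    r"_(?P<part>\d+)of(?P<total>\d+)\.zip$"
)

def missing_parts(filenames):
    """Map each (patch, release, platform) to the part numbers not present."""
    seen = {}
    for name in filenames:
        match = PART_RE.match(name)
        if match is None:
            continue  # not a multi-part patch file
        key = (match["patch"], match["release"], match["platform"])
        total = int(match["total"])
        seen.setdefault(key, (total, set()))[1].add(int(match["part"]))
    return {
        key: sorted(set(range(1, total + 1)) - parts)
        for key, (total, parts) in seen.items()
    }

def version_tuple(version):
    """Turn '5.5.2' into (5, 5, 2) so releases compare numerically."""
    return tuple(int(part) for part in version.split("."))

downloaded = [
    "P22537238_440_Linux-x86-64_1of3.zip",
    "P22537238_440_Linux-x86-64_3of3.zip",
]
print(missing_parts(downloaded))
print(version_tuple("5.5.1") < version_tuple("5.5.2"))
```

In this example part 2 of 3 is reported as missing, and the version comparison confirms that 5.5.2 is the higher release.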
From our experience... A Big Data Appliance admin needs a broad skill set. Unix admin skills are mandatory (ssh, X server, scp, networking, ...). Oracle Engineered System expertise helps a lot (Exadata, ODA, InfiniBand, ...). Cloudera administration skills are useful. Setup and patching: always check for known issues on My Oracle Support (see references for Doc IDs). Check the log files after every step. Pay attention to the InfiniBand firmware release on the IB switches when connecting Exadata and BDA (they require exactly the same version).
Use Case
Use Case "Fraud Detection". Company: Reference: Oracle Open World 2016. Business case: fraud detection. Motivation statement: "With the BDA, we want to refine our fraud-detection analyses with additional dimensions. For example, the BDA also captures sports performance metrics such as the distance covered by each individual player, which can then be compared with that player's average distance covered. Starkly atypical performance values can be an indication of prior collusion, which we then investigate." Reference: Computer World
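The comparison described in the quote, a player's current value against that player's own average, can be sketched as a simple deviation check. The Python example below is purely illustrative: the distances are invented, and the z-score method and the threshold of 3 are assumptions, since the source does not say how the customer actually scores atypical values.

```python
from statistics import mean, stdev

def is_atypical(history, value, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations
    away from the player's own historical average."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Invented distances covered per match, in km, for one player.
past_matches_km = [10.8, 11.2, 10.5, 11.0, 10.9, 11.4, 10.7]

print(is_atypical(past_matches_km, 11.1))  # within the usual range
print(is_atypical(past_matches_km, 7.9))   # far below the player's average
```

A flag from such a check is only a hint that warrants a closer look, in line with the quote's wording that atypical values "can be an indication" of collusion.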
Use Case Solution. Big Data Appliance as "Data Reservoir". Key arguments from the customer's perspective: "Exadata and the BDA in tandem give us integration advantages that we cannot achieve as easily with competing systems." * Fast start to "Big Data". Comprehensive software stack for data analytics. Start small, grow on demand. Ready for future (yet unknown) demands. * Reference: Computer World
Summary
Summary. The main technical advantage when deploying Big Data SQL on the Oracle Big Data Appliance is InfiniBand's high bandwidth to other Oracle Engineered Systems. Other BDA-exclusive features: Perfect Balance for reduce tasks. The Big Data Appliance provides a solid enterprise-class infrastructure (HW & SW). Installation and patching procedures are not yet as mature on the BDA as on other engineered systems like Exadata. Leveraging the full potential of a BDA requires both Engineered System expertise and data analytics know-how.
Is the Big Data Appliance the Right Choice for You? Yes, if... you need a fast start to production-ready data analytics; you already run other Oracle engineered systems with InfiniBand technology; your use case involves data in RDBMS, Hadoop and NoSQL and you have high query performance demands; you have an important business case with unpredictable growth; you would like to stay with Cloudera.
Questions and responses. Daniel Steiger, Principal Consultant, daniel.steiger@trivadis.com
Trivadis @ DOAG 2016. Booth: 3rd floor, next to the escalator. Know-how, T-shirts, contest and Trivadis Power to go. We look forward to your visit. Because with Trivadis you always win!
Links & References
Links and References (1). An Enterprise Architect's Guide to Big Data, Reference Architecture Overview: http://www.oracle.com/technetwork/topics/entarch/oracle-wp-big-data-refarch-2019930.pdf. Oracle Big Data Management System Statement of Direction: http://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/2516729.pdf. Oracle Big Data Appliance Documentation: https://docs.oracle.com/bigdata/bda46/. Oracle Big Data Lite Virtual Machine: http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html#wp
Links and References (2). Information Center: Oracle Big Data Appliance (My Oracle Support Doc ID 1445762.2). http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/. Oracle Big Data SQL: One Fast Query, All Your Data: https://blogs.oracle.com/datawarehousing/entry/oracle_big_data_sql_one
Links and References (3). Owner's Guide, Release 4 (4.4), E65664-03, January 2016. Oracle Big Data Appliance Patch Set Master Note, Doc ID 1485745.1. Information Center: Install/Upgrade/Configure Oracle BDA, Doc ID 1445745.2. Oracle BDA Base Image Version 4.2.0 for New Installations on OL6, Doc ID 2077858.1 (base for BMR to finally reach 4.4.0). Oracle Big Data Appliance Installation Frequently Asked Questions, Doc ID 1518939.1. Upgrading CDH, Doc ID 2109175.1. How to Enable/Disable Oracle Big Data Discovery on Oracle Big Data Appliance V4.3/OL6 with bdacli, Doc ID 2083079.1 (is also (not) valid for 4.4). "bdacli enable bdd" Fails with "ERROR: Error getting mysql database status" on BDA 4.4.0 / BDD 1.1, Doc ID 2109175.1.
Backup Slides
Oracle Big Data SQL Licensing. All nodes within the Hadoop cluster that runs Oracle Big Data SQL must be licensed. A separate license must be procured per disk per Hadoop cluster. All disks within every node that is part of a cluster running Oracle Big Data SQL must be licensed. Partial licensing within a node is not available. All nodes in the cluster are included. Only the Hadoop cluster side (Oracle Big Data Appliance, or other) of an Oracle Big Data SQL installation is licensed and no additional license is required for the database server side.
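Since the license metric is per disk and every disk in every node counts, the number of licenses follows directly from the cluster size. The Python sketch below does that multiplication; the function name is illustrative, the default of 12 disks per node is the X6-2 figure from the hardware slide, and no prices are implied.

```python
def big_data_sql_disk_licenses(nodes, disks_per_node=12):
    """Number of per-disk licenses for a cluster running Big Data SQL.

    Every disk in every node counts; partial licensing is not possible.
    """
    if nodes < 1 or disks_per_node < 1:
        raise ValueError("nodes and disks_per_node must be positive")
    return nodes * disks_per_node

print(big_data_sql_disk_licenses(6))    # starter rack: 72 disks
print(big_data_sql_disk_licenses(18))   # full rack: 216 disks
```

For a non-BDA Hadoop cluster, `disks_per_node` would have to be set to the actual disk count of the nodes in question.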
BDA Price List
Big Data in the Cloud: Offering & Pricing. Reference: https://cloud.oracle.com
BDA-Specific Software Features. Oracle NoSQL Database: Oracle NoSQL Database is a distributed key-value database built on the storage technology of Berkeley DB Java Edition. An intelligent driver on top of Berkeley DB keeps track of the underlying storage topology, shards the data and knows where data can be placed with the lowest latency. Oracle R Support for Big Data: The standard R distribution is installed on all nodes of Oracle Big Data Appliance. Oracle R Connector for Hadoop provides R users with high-performance, native access to HDFS and the MapReduce programming framework. Oracle R Enterprise is a separate package that provides real-time access to Oracle Database.
Big Data Preparation (Cloud Service). Self-service data preparation for domain experts. Ingest, prepare, enrich, and publish data with a unified cloud-based data wrangling solution. Unique combination of Natural Language Processing (NLP) with Machine Learning (ML). Leverages a Linked Open Data graph of domain knowledge. Powered by Apache Spark. See https://cloud.oracle.com/en_us/big-data-preparation