Oracle Big Data Discovery: Turning Data into Business Value
Harald Erb, Oracle Business Analytics & Big Data
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
Contact
Harald Erb, Principal Sales Consultant, Information Architect
Phone: +49 (0)6103 397-403, e-mail: harald.erb@oracle.com
Data Lab Introduction
1876: Edison's Invention Factory, Menlo Park, NJ
Today: Characteristics of Digital Business Leaders
» They Reframe Challenges: looking at them from new perspectives and multiple angles
» They Sprint: they work at pace, researching, testing and evaluating current ideas while generating new ones
» They Appreciate That Failure Can Be Good: they are not afraid of new ideas
» They Convert Data Into Value: they invest heavily in analyzing their own data and data from external sources to establish patterns and unnoticed opportunities
Data Science Process Model (Klaas Bollhoefer, Chief Data Scientist, *um)
Synergizing Skills: BICC and/or ACC?
Analytical Competency Center (ACC)
» Separate group reporting to CxO. Typically not part of a Business Intelligence Competency Center (BICC)
» Mission: broadening the adoption of Analytics across the organization
» Skilled resource pool of Data Scientists, Statisticians and Business Experts
» Data-driven approach (not development-driven) with privileged access to enterprise data sources
» Group will be assigned to projects for a limited time
Copyright 2016, Oracle and/or its affiliates. All rights reserved.
Data Projects (adapted from Hugenberg 2011, p. 168)
» Define Problem: fundamental business question to be solved; problem area; root of problem; decision maker; decision criteria; boundaries of problem handling; solution limitations
» Structure Problem: break the business question down into hypotheses (What? How? Why?)
» Create Analysis Plan and Create Work Plan (table columns in the diagram: Hypothesis, Analysis, Source, Task, Week)
Guiding questions:
» Business problem? What is at issue? What needs to be analyzed? Precise goal definition, delimitations
» Useful data / structure? Hypothesis definition, verify correlations, descriptive analysis, data preparation
» Select: Necessary information? Available information? Which quality? Data owner? Available data sets?
DOAG 2016 Conference, Nuremberg
Data Lab: Catalyst for Innovation
» A Data Lab is a complete analytics environment with access to all data
» Where interdisciplinary teams work together to invent, research and develop data projects
» To create value from data by commercializing those data projects
Data Lab: Key Requirements
» Based on Raw Data
» Full Access to Data Sources (Select only)
» Complete Sandbox Environment
» Agile Experimentation
» Fail Fast
Data Management: Logical Architecture
[Architecture diagram; labels grouped by apparent layer]
» Sources: Structured Data (Master Data, Applications, Channels, Data Stores); Non-structured Sources (Interactions, Logs, Social Media, External Data, Adhoc Files or Data Sets)
» Data Ingestion: Batch Integration, Real-Time Integration, Data Streaming, Data Wrangling
» Data Lake: Raw Data Sets; Data Processing (Data Enrichment, Data Aggregation); Curated & Transformed Data Sets
» Operational Data Store, Enterprise Information Store
» Data Federation & Virtualization Layer: Common SQL Access to ALL Data
» Consumers: Reporting / Business Intelligence, Advanced Analytics, Data Driven Applications
» Data Lab: Sandboxes, Data Catalog, Transformations, Prototyping, Data Discovery Tools, Analytic Tools
» Cross-cutting: Metadata Management; Orchestration, Scheduling & Monitoring; Line of Governance
Oracle Big Data Discovery Enabling Business Analysts
Enabling Business Analysts to become Citizen Data Scientists
» Problem: A Data Lab is currently the realm of the elusive Data Scientist
» Opportunity: Unlock the Data Lab to Business Analysts and guide them to act like data scientists
Business Analysts require a new approach to the Data Lab
A single intuitive and visual user interface to find, explore, transform, discover and share:
» Find and explore relevant data sets and understand their potential
» Quickly transform and enrich data to make it better
» Unlock Big Data for anyone to discover and share new insights
Team Sport in the Data Lab: One Exploratory Analysis Tool for Business Analysts & Data Scientists?
[Diagram: data sources (DWH / OLTP, Databases, Hadoop); roles (Business Analyst, Data Engineer, Data Scientist); outputs (New KPI, Report Requirement, Discovery Output, New Data Set (cleaned / enriched), Data Science)]
Analysis Scenario: Investigation of Car Complaints (Demo Part 1)
Analysis Scenario: Available Data (Demo Part 1)
» Internal Data (Warranty Claims)
» Additional Data (e.g. demographics)
hadoop fs -cat /datasets/warranty/claims_full.txt | less
Data Loading: Data Ingest Overview
» File Upload via BDD Studio
» Big Data Discovery Data Proc. Client
» Result: New Data Set in the BDD Data Catalog
Oracle Big Data Discovery Python Interface
Team Sport in the Data Lab: The Data Scientist takes over and applies Statistical Methods
[Diagram: data sources (DWH / OLTP, Databases, Hadoop); roles (Business Analyst, Data Engineer, Data Scientist); outputs (New KPI, Report Requirement, Discovery Output, New Data Set (cleaned / enriched), Data Science)]
Shaping a Data Set for further processing: Handling of sparse Data / NULL values
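As a rough illustration of what handling sparse data can mean in practice, the sketch below profiles missing values per attribute, drops attributes that are too sparse, and fills the remaining NULLs with a default. It is plain Python and not the BDD transformation itself; the records, the field names (`make`, `model_year`, `mileage`), the 50% threshold and the `UNKNOWN` default are all assumptions made up for this example.

```python
# Hypothetical complaint records; None stands for a NULL value.
records = [
    {"make": "A", "model_year": 2012, "mileage": 40000},
    {"make": "B", "model_year": None, "mileage": None},
    {"make": None, "model_year": 2014, "mileage": None},
    {"make": "A", "model_year": 2013, "mileage": None},
]


def null_ratio(rows, field):
    """Share of rows in which `field` is missing (None)."""
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def shape(rows, max_null_ratio=0.5, default="UNKNOWN"):
    """Drop attributes that are too sparse, then fill remaining NULLs."""
    fields = sorted({k for r in rows for k in r})
    kept = [f for f in fields if null_ratio(rows, f) <= max_null_ratio]
    return [
        {f: (r.get(f) if r.get(f) is not None else default) for f in kept}
        for r in rows
    ]


shaped = shape(records)
for row in shaped:
    print(row)
```

Whether a sparse attribute should be dropped or filled depends on the analysis; the threshold here is only a placeholder for that decision.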
Shaping a Data Set for further processing: Aggregation
» Roll up low-level data to higher grains: Production Year, Vehicle Model Year, Vehicle Make
» Intuitive UI helps analysts find the right grains
» Execute at full scale using Spark
» Results can be sampled or indexed in full
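The idea of rolling up low-level rows to a coarser grain can be sketched in a few lines of plain Python. This is only a small in-memory illustration, not how BDD or Spark execute the aggregation; the complaint rows and the field names (`make`, `model_year`, `production_year`) are hypothetical.

```python
from collections import Counter

# Hypothetical low-level data: one row per complaint.
complaints = [
    {"make": "A", "model_year": 2012, "production_year": 2011},
    {"make": "A", "model_year": 2012, "production_year": 2012},
    {"make": "A", "model_year": 2013, "production_year": 2013},
    {"make": "B", "model_year": 2012, "production_year": 2012},
]


def roll_up(rows, grain):
    """Count rows per distinct combination of the attributes in `grain`."""
    return Counter(tuple(r[a] for a in grain) for r in rows)


# Same data, two different grains.
by_make_and_year = roll_up(complaints, ("make", "model_year"))
by_make = roll_up(complaints, ("make",))

print(by_make_and_year)
print(by_make)
```

Choosing the grain is the analytical decision; the counting itself stays the same for any combination of attributes.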
Shaping a Data Set for further processing: Combining multiple Data Sets
» Blend huge datasets in BDD UI to support experimentation, preview
» Execute at scale with Spark
» Results can be sampled or indexed in full
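Blending two data sets usually comes down to a join on a shared key. The sketch below shows a left outer join in plain Python, assuming warranty claims are enriched with demographics via a ZIP code. The data, the field names (`claim_id`, `zip`, `median_age`) and the choice of join key are invented for illustration and do not come from the demo data sets.

```python
# Hypothetical inputs: warranty claims and a demographics lookup keyed by ZIP code.
claims = [
    {"claim_id": 1, "zip": "10001", "make": "A"},
    {"claim_id": 2, "zip": "20002", "make": "B"},
    {"claim_id": 3, "zip": "99999", "make": "A"},
]
demographics = [
    {"zip": "10001", "median_age": 36},
    {"zip": "20002", "median_age": 41},
]


def left_join(left, right, key):
    """Keep every left row; add right-side attributes where the key matches."""
    index = {r[key]: r for r in right}  # assumes `key` is unique in `right`
    right_fields = sorted({f for r in right for f in r if f != key})
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        for f in right_fields:
            merged[f] = match.get(f)  # None when no matching row exists
        joined.append(merged)
    return joined


blended = left_join(claims, demographics, "zip")
for row in blended:
    print(row)
```

A left join keeps claims without a demographic match, which makes the resulting NULLs visible instead of silently dropping rows.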
Shaping a Data Set for further processing: Export new Data Set as a Hive Table in Hadoop
BDD Shell Interface: Point of Contact with Data Scientists
» BDD Shell is an interactive tool designed to work with BDD without using Studio's front end
» Provides a way to explore and manipulate the internals of BDD and interact with Hadoop
» Python-based shell
» Exposes all BDD data objects
» Easy-to-use Python wrappers for BDD APIs and Python utilities
» Use of third-party libraries, e.g., Pandas and NumPy
Data Analysis with Python: (Re-)use data from Oracle Big Data Discovery while working with the BDD Shell
» Import the NumPy (Numerical Python) package
» Import the Spark machine learning library MLlib
» List the Oracle Big Data Discovery data sets
» Convert an Oracle Big Data Discovery data set into an Apache Spark DataFrame
Leveraging Notebooks for a better user experience: Point of Contact with Data Scientists
» Easiest way to use the BDD Shell
» Visual appeal, ease of use, collaboration features of an integrated platform
» Power and flexibility of custom code
» Pick up BDD's datasets and leverage Machine Learning algorithms to infer new insight
www.jupyter.org
Re-using a Data Set for Machine Learning (Demo Part 2)
» Shaping a new Data Set in the Big Data Discovery tool
» Machine Learning with Python ML & Spark in a Jupyter Notebook
Using Jupyter Notebook, Python ML & Spark (Demo Part 2)
Oracle Big Data Discovery Customer Cases
Prototype Testing
Prototype Testing
1. A flexible environment to exploit all available data for prototype testing discovery
2. Can driver comments really add value to our prototype testing discovery?
3. What is the relationship between errors?
[Diagram: data sources (1) Telemetry, 1.2 billion rows at 100 Hz, (2) Errors, (3) Driver Comments; components: Data Platform, Analysis & Dashboarding, Storage, Factory, Discovery Lab]
Post-LHC accelerator projects (80-100 km)
CERN Data Lab
» Reliability, Availability, Maintainability and Safety (RAMS) studies for the Future Circular Collider (FCC)
» Use RAMS findings to assess the feasibility of the needs of the FCC
Customer DNA
Garanti Bank's Analytics Platform
[Architecture diagram; labels listed without the original layout]
» Data: Master Data / Operational Data, Changed Data, Enriched Data, On Demand Data, Data Quality, Data Store
» Environments: Analytics Laboratory, Production
» Platforms: Oracle Big Data Appliance (Hive), Oracle Exadata Data Warehouse
» Stages: Discovery & Exploration, Modelling, Scoring, Deployment, Model Monitoring, Model Repository
» Tools named in the diagram: R, R Studio, Oracle R Enterprise on Exadata, Oracle R Advanced Analytics for Hadoop, SAS, SAS Studio, Enterprise Guide & Miner, SAS In-Memory Statistics for Hadoop, SAS Visual Statistics, SAS VA, SAS Decision Manager, Text Miner, Decision Tree, Oracle Big Data Discovery, FICO Blaze Advisor, Java, SQL, SQL / HSQL, SQLite
Oracle Big Data Discovery Deployment Options
Multiple Deployment Options for Oracle Big Data Discovery
Q & A