Oracle Big Data Discovery

Oracle Big Data Discovery: Turning Data into Business Value
Harald Erb, Oracle Business Analytics & Big Data

Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Contact
Harald Erb, Principal Sales Consultant, Information Architect
+49 (0)6103 397-403, harald.erb@oracle.com

Data Lab Introduction

1876: Edison's Invention Factory, Menlo Park, NJ

Today: Characteristics of Digital Business Leaders
» They Reframe Challenges: looking at them from new perspectives and multiple angles
» They Sprint: they work at pace, researching, testing and evaluating current ideas while generating new ones
» They Appreciate That Failure Can Be Good and are not afraid of new ideas
» They Convert Data Into Value: they invest heavily in analyzing their own data and data from external sources to establish patterns and unnoticed opportunities

Data Science Process Model (Klaas Bollhoefer, Chief Data Scientist, *um)

Synergizing Skills: BICC and/or ACC?
Analytical Competency Center (ACC)
» Separate group reporting to CxO. Typically not part of a Business Intelligence Competency Center (BICC)
» Mission: broadening the adoption of Analytics across the organization
» Skilled resource pool of Data Scientists, Statisticians and Business Experts
» Data-driven approach (not development-driven) with privileged access to enterprise data sources
» Group will be assigned to projects for a limited time
Copyright 2016, Oracle and/or its affiliates. All rights reserved.

Data Projects (adapted from Hugenberg 2011, p. 168)
[Figure: four-step worksheet. Panels: Define Problem, Structure Problem, Create Analysis Plan, Create Work Plan.]
» Define Problem: fundamental business question to be solved; problem area; root of problem; decision maker; decision criteria; boundaries of problem handling; solution limitations
» Structure Problem: business question broken down into hypotheses (What? How? Why?), each answered Yes or No
» Create Analysis Plan: hypothesis, analysis, source
» Create Work Plan: task, week
Guiding questions:
» Business problem? What is at issue? What needs to be analyzed? Precise goal definition, delimitations
» Useful data / structure? Hypothesis definition, verify correlations, descriptive analysis, data preparation
» Necessary information? Available information? Which quality? Data owner? Available data sets?
DOAG 2016 Conference, Nuremberg

Data Lab: Catalyst for Innovation
» A Data Lab is a complete analytics environment with access to all data
» Where interdisciplinary teams work together to invent, research and develop data projects
» To create value from data by commercializing those data projects

Data Lab: Key Requirements
» Based on raw data
» Full access to data sources (select only)
» Complete sandbox environment
» Agile experimentation
» Fail fast

Data Management: Logical Architecture (Common SQL Access to ALL Data)
[Diagram] Components shown:
» Sources: structured data, master data, applications, channels, data stores, non-structured sources, interactions, logs, social media, external data, ad hoc files or data sets
» Data ingestion: batch integration, real-time integration, data streaming, data wrangling
» Data lake: raw data sets
» Data processing: data enrichment, data aggregation
» Curated & transformed data sets: operational data store, enterprise information store
» Data federation & virtualization layer
» Consumers: reporting / business intelligence, advanced analytics, data-driven applications
» Data Lab: sandboxes, data catalog, transformations, prototyping, data discovery tools, analytic tools
» Cross-cutting: metadata management; orchestration, scheduling & monitoring; line of governance

Oracle Big Data Discovery Enabling Business Analysts

Enabling Business Analysts to become Citizen Data Scientists
» Problem: A Data Lab is currently the realm of the elusive Data Scientist
» Opportunity: Unlock the Data Lab to Business Analysts and guide them to act like data scientists

Business Analysts require a new approach to the Data Lab
A single intuitive and visual user interface (find, explore, transform, discover, share) to:
» Find and explore relevant data sets and understand their potential
» Quickly transform and enrich the data to make it better
» Unlock Big Data for anyone to discover and share new insights

Team Sport in the Data Lab: One Exploratory Analysis Tool for Business Analysts & Data Scientists?
[Diagram] Roles: Business Analyst, Data Engineer, Data Scientist. Data sources: DWH / OLTP, databases, Hadoop. Other labels: discovery output, new KPI, report requirement, new data set (cleaned / enriched), data science.

Analysis Scenario: Investigation of Car Complaints (Demo Part 1)

Analysis Scenario: Available Data (Demo Part 1)
» Internal data (warranty claims)
» Additional data (e.g., demographics)
hadoop fs -cat /datasets/warranty/claims_full.txt | less

Data Loading: Data Ingest Overview
[Diagram] Labels: new data set in the BDD Data Catalog; file upload; BDD Studio; Big Data Discovery Data Processing client.

Oracle Big Data Discovery Python Interface

Team Sport in the Data Lab: The Data Scientist takes over and applies statistical methods
[Diagram] Roles: Business Analyst, Data Engineer, Data Scientist. Data sources: DWH / OLTP, databases, Hadoop. Other labels: discovery output, new KPI, report requirement, new data set (cleaned / enriched), data science.

Shaping a Data Set for Further Processing: Handling of Sparse Data / NULL Values
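The slide only names the task, so the following is a minimal plain-Python sketch of one common way to handle sparse columns and NULL values. It is not BDD's own transformation logic. The sample records, the column names, the 50% sparsity threshold and the `UNKNOWN` default are all assumptions made for illustration.

```python
# Hypothetical warranty-claim records; None marks a missing (NULL) value.
claims = [
    {"make": "Alpha", "model_year": 2012, "mileage": 41000, "dealer_note": None},
    {"make": "Alpha", "model_year": None, "mileage": 18500, "dealer_note": None},
    {"make": "Beta", "model_year": 2014, "mileage": None, "dealer_note": "rattle"},
    {"make": None, "model_year": 2014, "mileage": 52000, "dealer_note": None},
]


def null_ratio(rows):
    """Return the share of NULL values per column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def shape(rows, max_null_ratio=0.5, defaults=None):
    """Drop columns that are too sparse, then fill remaining NULLs with defaults."""
    defaults = defaults or {}
    ratios = null_ratio(rows)
    keep = [c for c, ratio in ratios.items() if ratio <= max_null_ratio]
    return [
        {c: (r[c] if r[c] is not None else defaults.get(c)) for c in keep}
        for r in rows
    ]


ratios = null_ratio(claims)
shaped = shape(claims, max_null_ratio=0.5, defaults={"make": "UNKNOWN"})
```

Columns without an entry in `defaults` keep their NULLs, which leaves the decision about imputation to a later analysis step.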

Shaping a Data Set for Further Processing: Aggregation
» Roll up low-level data to higher grains (Production Year, Vehicle Model Year, Vehicle Make)
» Intuitive UI helps analysts find the right grains
» Execute at full scale using Spark
» Results can be sampled or indexed in full
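To make "rolling up to a higher grain" concrete, here is a small plain-Python sketch that counts complaints per chosen grain. BDD runs such aggregations with Spark; this version only illustrates the idea on a handful of rows. The records and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical low-level complaint records (one row per complaint).
complaints = [
    {"make": "Alpha", "model_year": 2012, "production_year": 2011},
    {"make": "Alpha", "model_year": 2012, "production_year": 2012},
    {"make": "Alpha", "model_year": 2013, "production_year": 2012},
    {"make": "Beta", "model_year": 2012, "production_year": 2012},
]


def roll_up(rows, grain):
    """Count rows per combination of the columns named in `grain`."""
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[column] for column in grain)
        counts[key] += 1
    return dict(counts)


# The same data at two different grains.
by_make_and_year = roll_up(complaints, ("make", "model_year"))
by_make = roll_up(complaints, ("make",))
```

Changing the `grain` tuple is all it takes to move between grains, which mirrors how an analyst would try several groupings before settling on one.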

Shaping a Data Set for Further Processing: Combining Multiple Data Sets
» Blend huge datasets in the BDD UI to support experimentation and preview
» Execute at scale with Spark
» Results can be sampled or indexed in full
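Blending two data sets amounts to a join on a shared key. The sketch below shows a left outer join in plain Python, assuming warranty claims are enriched with demographics by ZIP code. The join key, the column names and the sample values are assumptions, and BDD itself performs the blend with Spark.

```python
# Hypothetical inputs: warranty claims and demographics keyed by ZIP code.
claims = [
    {"claim_id": 1, "zip": "10001", "make": "Alpha"},
    {"claim_id": 2, "zip": "60601", "make": "Beta"},
    {"claim_id": 3, "zip": "99999", "make": "Alpha"},
]
demographics = [
    {"zip": "10001", "median_income": 72000},
    {"zip": "60601", "median_income": 65000},
]


def left_join(left, right, key):
    """Left outer join on `key`; assumes `key` is unique in `right`."""
    lookup = {row[key]: row for row in right}
    right_columns = [c for c in right[0] if c != key]
    joined = []
    for row in left:
        match = lookup.get(row[key])
        # Unmatched rows keep their claim data and get NULLs for the new columns.
        extra = {c: (match[c] if match else None) for c in right_columns}
        joined.append({**row, **extra})
    return joined


blended = left_join(claims, demographics, "zip")
```

A left join keeps every claim even when no demographic record matches, so the blend never silently drops internal data.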

Shaping a Data Set for Further Processing: Export New Data Set
» Hive table in Hadoop

BDD Shell Interface: Point of Contact with Data Scientists
» BDD Shell is an interactive tool designed to work with BDD without using Studio's front-end
» Provides a way to explore and manipulate the internals of BDD and interact with Hadoop
» Python-based shell
» Exposes all BDD data objects
» Easy-to-use Python wrappers for BDD APIs and Python utilities
» Use of third-party libraries, e.g., Pandas and NumPy

Data Analysis with Python: (Re-)use data from Oracle Big Data Discovery while working with the BDD Shell
» Import the NumPy (Numerical Python) package
» Import the Spark machine learning library MLlib
» List the Oracle Big Data Discovery data sets
» Convert an Oracle Big Data Discovery data set into an Apache Spark DataFrame

Leveraging Notebooks for a Better User Experience: Point of Contact with Data Scientists
» Easiest way to use the BDD Shell
» Visual appeal, ease of use, collaboration features of an integrated platform
» Power and flexibility of custom code
» Pick up BDD's datasets and leverage machine learning algorithms to infer new insight
www.jupyter.org

Re-using a Data Set for Machine Learning (Demo Part 2)
» Shaping a new data set in the Big Data Discovery tool
» Machine learning with Python ML & Spark in a Jupyter Notebook

Using Jupyter Notebook, Python ML & Spark (Demo Part 2)

Oracle Big Data Discovery Customer Cases

Prototype Testing

Prototype Testing
1. A flexible environment to exploit all available data for prototype testing discovery
2. Can driver comments really add value to our prototype testing discovery?
3. What is the relationship between errors?
[Diagram] Data sources: (1) Telemetry, 1.2 billion rows at 100 Hz; (2) Errors; (3) Driver comments. Components: Data Platform, Storage, Factory, Discovery Lab, Analysis & Dashboarding.

Post-LHC accelerator projects (80-100 km)

CERN Data Lab
» Reliability, Availability, Maintainability and Safety (RAMS) studies for the Future Circular Collider (FCC)
» Use RAMS findings to assess the feasibility of the needs of the FCC

Customer DNA


Garanti Bank's Analytics Platform
[Diagram] Components shown:
» Stages: Discovery & Exploration, Modelling, Scoring, Deployment, Model Monitoring
» Environments: Analytics Laboratory, Production
» Data: Master Data / Operational Data, Changed Data, Data Quality, Data Store, Enriched Data, On Demand Data
» Platforms: Oracle Big Data Appliance (Hive), Oracle Exadata (Data Warehouse)
» Oracle and R tools: Oracle Big Data Discovery, Oracle R Enterprise on Exadata, Oracle R Advanced Analytics for Hadoop, R, R Studio
» SAS tools: SAS Enterprise Guide & Miner, SAS Studio, SAS In-Memory Statistics for Hadoop, SAS Visual Statistics, SAS VA, SAS Decision Manager, Text Miner
» Scoring and deployment: Java, R, SQLite, FICO Blaze Advisor, SQL, SAS SQL / HSQL, Model Repository, Decision Tree, On Demand

Oracle Big Data Discovery Deployment Options

Multiple Deployment Options for Oracle Big Data Discovery

Q & A
