Oracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features

Similar documents
Oracle Machine Learning & Advanced Analytics

Oracle Machine Learning Notebook

Transformational Machine Learning Use Cases You Can Deploy Now CON6234

Oracle Machine Learning Notebook

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016

Oracle s Machine Learning and Advanced Analytics Data Management Platforms. Move the Algorithms; Not the Data

Introducing Oracle Machine Learning

Take Your Analytics to the Next Level of Insight using Machine Learning in the Oracle Autonomous Data Warehouse

Oracle Big Data Science

Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL

Oracle Big Data Science IOUG Collaborate 16

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Getting Started with Advanced Analytics in Finance, Marketing, and Operations

New Features in Oracle Data Miner 4.2. The new features in Oracle Data Miner 4.2 include: The new Oracle Data Mining features include:

Oracle R Technologies

Oracle s Machine Learning and Advanced Analytics

Oracle R Enterprise Platform and Configuration Requirements Oracle R Enterprise runs on 64-bit platforms only.

ADVANCED ANALYTICS USING SAS ENTERPRISE MINER RENS FEENSTRA

Oracle Big Data Connectors

Safe Harbor Statement

Oracle Big Data Discovery

Oracle Big Data, Machine Learning, DW OOW újdonságok

C5##54&6*"6*1%2345*D&'*E2)2*F"4G)&"69

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Combining Graph and Machine Learning Technology using R

Week 1 Unit 1: Introduction to Data Science

Oracle9i Data Mining. Data Sheet August 2002

CHAPTER. Oracle Data Miner

Data Science Bootcamp Curriculum. NYC Data Science Academy

BUILT FOR THE SPEED OF BUSINESS

SAS High-Performance Analytics Products

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

How to Troubleshoot Databases and Exadata Using Oracle Log Analytics

Oracle Enterprise Data Quality - Roadmap

Deploying, Managing and Reusing R Models in an Enterprise Environment

INTRODUCTION... 2 FEATURES OF DARWIN... 4 SPECIAL FEATURES OF DARWIN LATEST FEATURES OF DARWIN STRENGTHS & LIMITATIONS OF DARWIN...

Safe Harbor Statement

COMPUTE CLOUD SERVICE. Moving to SPARC in the Oracle Cloud

Big Data For Oil & Gas

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

Using Existing Numerical Libraries on Spark

Taking R to New Heights for Scalability and Performance

Data Science. Data Analyst. Data Scientist. Data Architect

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Scalable Machine Learning in R. with H2O

Chapter 28. Outline. Definitions of Data Mining. Data Mining Concepts

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

Build a system health check for Db2 using IBM Machine Learning for z/os

Test bank for accounting information systems 1st edition by richardson chang and smith

Improving Your Business with Oracle Data Integration See How Oracle Enterprise Metadata Management Can Help You

Introduction to Data Mining and Data Analytics

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E-

Oracle R Enterprise. New Features in Oracle R Enterprise 1.5. Release Notes Release 1.5

Oracle Exadata: Strategy and Roadmap

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

Introducing Oracle R Enterprise 1.4 -

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Adds Leading AI Data Engine to Oracle Cloud Applications, Providing Dynamic and Insightful Company-Level Data to Power Even Smarter Decisions

Chapter 6. Foundations of Business Intelligence: Databases and Information Management VIDEO CASES

Apache SystemML Declarative Machine Learning

Migrate from Netezza Workload Migration

Real Time Summarization. Copyright 2014, Oracle and/or its affiliates. All rights reserved.

Massively Parallel Processing. Big Data Really Fast. A Proven In-Memory Analytical Processing Platform for Big Data

Decision models for the Digital Economy

Gain Greater Productivity in Enterprise Data Mining

Informatica Enterprise Information Catalog

Accelerate your SAS analytics to take the gold

Oracle9i Data Mining. An Oracle White Paper December 2001

Overview of Data Services and Streaming Data Solution with Azure

Graph Analytics and Machine Learning A Great Combination Mark Hornick

Python With Data Science

Data Mining: Approach Towards The Accuracy Using Teradata!

Performance and Load Testing R12 With Oracle Applications Test Suite

Oracle Data Mining Concepts. 18c

Javaentwicklung in der Oracle Cloud

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

Data Virtualization for the Enterprise

Oracle and Tangosol Acquisition Announcement

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Migrate from Netezza Workload Migration

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Implementing SVM in an RDBMS: Improved Scalability and Usability. Joseph Yarmus, Boriana Milenova, Marcos M. Campos Data Mining Technologies Oracle

MySQL Group Replication. Bogdan Kecman MySQL Principal Technical Engineer

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12

Chapter 1 - The Spark Machine Learning Library

Distributed Machine Learning" on Spark

August Oracle - GoldenGate Statement of Direction

BIG DATA SCIENTIST Certification. Big Data Scientist

Convergence and Collaboration: Transforming Business Process and Workflows

KNIME for the life sciences Cambridge Meetup

Evolving To The Big Data Warehouse

Data Preprocessing. Slides by: Shree Jaswal

MDM Partner Summit 2015 Oracle Enterprise Data Quality Overview & Roadmap

Performance Innovations with Oracle Database In-Memory

Modern and Fast: A New Wave of Database and Java in the Cloud. Joost Pronk Van Hoogeveen Lead Product Manager, Oracle

Oracle Enterprise Manager 12c IBM DB2 Database Plug-in

Transcription:

Oracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features Move the Algorithms; Not the Data! Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Advanced Analytics and Machine Learning charlie.berger@oracle.com www.twitter.com/charliedatamine

Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle.

Machine Learning/Analytics + Data Warehouse + Hadoop Platform Sprawl More Duplicated Data More Data Movement Latency More Security challenges More Duplicated Storage More Duplicated Backups More Duplicated Systems More Space and Power

Oracle s Advanced Analytics and Machine Learning Platform Machine Learning Algorithms Embedded in the Data Management Platform Information Producers Data Scientists, R Users, Citizen Data Scientists R Studio Client SQL Dev/Data Miner Oracle Data Viz Information Consumers BI Analysts, Managers Functional Users (HCM, CRM) Predictive Applications HQL Oracle BDA Hadoop ORAAH Machine Learning Algorithms, Statistical Functions + R Integration for Scalable, Parallel, Distributed Execution Data Management + Advanced Analytical Platform Big Data SQL Oracle Database EE Oracle Advanced Analytics (ODM + ORE) Machine Learning Algorithms, Statistical Functions + R Integration for Scalable, Parallel, Distributed, in-db Execution Oracle Database EE Oracle Cloud Copyright 2017, Oracle and/or its affiliates. All rights reserved.

Oracle s Adv. Analytics Machine Learning Algorithms Classification Naïve Bayes Logistic Regression (GLM) Decision Tree Random Forest Neural Network Support Vector Machine Explicit Semantic Analysis Gaussian Mixture Models Clustering Hierarchical K-Means Hierarchical O-Cluster Expectation Maximization (EM) Regression Generalized Linear Model Support Vector Machine (SVM) Random Forest Linear Model Stepwise Linear regression LASSO Association Rules A priori Attribute Importance Minimum Description Length A1 A2 A3 A4 A5 A6 A7 Principal Component Analysis (PCA) Unsupervised Pair-wise KL Divergence Anomaly Detection Predictive Queries One-Class Support Vector Machine (SVM) Statistical Functions Basic statistics: median, stdev, t-test, F-test, Pearson s, Chi-sq, Anova, etc. + Ability to Mine Unstructured, Structured, & Transactional data + Support for SQL Partition-By Models Algorithm Support for Text Algorithms support text type Tokenization and theme extraction Explicit Semantic Analysis (ESA) for document similarity Feature Extraction Principal Component Analysis (PCA) Non-negative Matrix Factorization Singular Value Decomposition (SVD) Time Series Single Exponential Smoothing Double Exponential Smoothing Open Source ML Algorithms CRAN R Algorithm Packages through Embedded R Execution Spark MLlib algorithm integration

Machine Learning & Advanced Analytical Methodologies Data Preparation & Adv. Analytical Process Runs In-Database Additional relevant data and engineered features Oracle Database 12c Historical or Current Data to be scored for predictions Historical data Assembled historical data Predictions & Insights Sensor data, Text, unstructured data, transactional data, spatial data, etc.

Oracle s Advanced Analytics Fastest Way to Deliver Scalable Enterprise-wide ML/Predictive Analytics Oracle Cloud Major Benefits Data remains in Database & Hadoop Model building and scoring occur in-database Use R packages with data-parallel invocations Leverage investment in Oracle IT Eliminate data duplication Eliminate separate analytical servers Deliver enterprise-wide applications GUI for ML/Predictive Analytics & code gen R interface leverages database as HPC engine Traditional Analytics Data Import Data Mining Model Scoring Data Prep. & Transformation Data Mining Model Building Data Prep & Transformation Data Extraction Hours, Days or Weeks Oracle Advanced Analytics avings Model Scoring Embedded Data Prep Model Building Data Preparation Secs, Mins or Hours

Oracle s Advanced Analytics Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics Oracle Cloud Key Features Parallel, scalable data mining algorithms and R integration In-Database + Hadoop Don t move the data Data analysts, data scientists & developers Drag and drop workflow, R and SQL APIs Extends data management into powerful advanced/predictive analytics platform Enables enterprise predictive analytics deployment + applications

You Can Think of Oracle s Advanced Analytics Like This Traditional SQL Oracle Advanced Analytics - SQL & Human-driven queries Domain expertise Any rules must be defined and managed Automated knowledge discovery, model building and deployment Domain expertise to assemble the right data to mine/analyze SQL Queries Analytical SQL Verbs SELECT DISTINCT AGGREGATE WHERE AND OR GROUP BY ORDER BY RANK + PREDICT DETECT CLUSTER CLASSIFY REGRESS PROFILE IDENTIFY FACTORS ASSOCIATE Oracle Cloud

Oracle Text Native Capability of every Oracle Database Oracle Text uses standard SQL to index, search, and analyze text and documents stored in the Oracle database, in files, and on the web. Oracle Text supports multiple languages and uses advanced relevanceranking technology to improve search quality. Oracle Advanced Analytics leverages Oracle Text to pre-process ( tokenize ) unstructured data for the OAA SQL ML/data mining functions

Oracle Advanced Analytics How Oracle R Enterprise Compute Engines Work Oracle Cloud Other R packages R-> SQL Oracle Database 12c R R Engine Other R packages Oracle R Enterprise (ORE) packages Results Results Oracle R Enterprise packages 1 R-> SQL Transparency Push-Down 2 In-Database Adv Analytical SQL Functions 3 Embedded R Package Callouts R language for interaction with the database R-SQL Transparency Framework overloads R functions for scalable in-database execution Function overload for data selection, manipulation and transforms Interactive display of graphical results and flow control as in standard R Submit user-defined R functions for execution at database server under control of Oracle Database 30+ Powerful data mining algorithms (regression, clustering, AR, DT, etc._ Run Oracle Data Mining SQL data mining functioning (ORE.odmSVM, ORE.odmDT, etc.) Speak R but executes as proprietary indatabase SQL functions machine learning algorithms and statistical functions Leverage database strengths: SQL parallelism, scale to large datasets, security Access big data in Database and Hadoop via SQL, R, and Big Data SQL R Engine(s) spawned by Oracle DB for database-managed parallelism ore.groupapply high performance scoring Efficient data transfer to spawned R engines Emulate map-reduce style algorithms and applications Enables production deployment and automated execution of R scripts

Oracle Advanced Analytics 12.2, Oracle Data Miner 4.2 New Features Agenda 1. New Machine Learning Algorithms 2. Significant ML Model Build Performance Improvements 3. Partitioned Models 4. Oracle Data Miner Update of OAA Server Enhancements 5. Analytical Platform that Enables Predictive Applications

Oracle Advanced Analytics 12.2 New Oracle Database Features Unsupervised Feature Selection Uunsupervised algorithm for pair-wise correlations (Kullback-Leibler Divergence (KLD)) for numeric & categorical attributes to find highest information containing attributes Association Rules Enhancements Adds calculation of values associated with AR rules such as sales amount to indicate the value of co-occurring items in baskets Can filter input items prior to market basket anaysis Significant Performance Improvements for all Algorithms New parallel model build / apply redesigned infrastructure to enable faster new algorithm introduction Scale to larger data volumes found in big data and cloud use cases Partitioned Models A1 A2 A3 A4 A5 A6 A7 Instead of building, naming and referencing 10s or 1000s of models, partitioned models organize and represent multiple models as partitions in a single model entity Partition 1 Partition 2 Partition N

Oracle Advanced Analytics 12.2 Model Build Time Performance Unofficial T7-4 (Sparc & Solaris) X5-4 (Intel and Linux) OAA 12.2 Algorithms Rows (Ms) Model Build Time (Secs / Degree of Parallelism) Wow! That s Fast! Attributes Importance 640 28s / 512 44s / 72 K Means Clustering 640 161s / 256 268s / 144 Expectation Maximization 159 455s / 512 588s / 144 Naive Bayes Classification 320 17s / 256 23s / 72 GLM Classification 640 154s / 512 363s / 144 GLM Regression 640 55s / 512 93s / 144 Support Vector Machine (IPM solver) 640 404s / 512 1411s / 144 Support Vector Machine (SGD solver) 640 84s / 256 188s / 72 The way to read their results is that they compare 2 chips: X5 (Intel and Linux) and T7 (Sparc and Solaris). They are measuring scalability (time in seconds) with increase degree of parallelism (dop). The data also has high cardinality Copyright categorical 2016, Oracle columns and/or which its affiliates. translates All rights in reserved. 9K mining attributes (when algorithms require explosion). There are no comparisons to 12.1 and it is fair to say that the 12.1 algorithms could not run on data of this size.

Oracle Advanced Analytics 12.2 New Oracle Database Features Explicit Semantic Analysis (ESA) algorithm Useful technique for extracting meaningful, interpretable features; better than LDA English Wikipedia is Text corpus default to equate tokens with human identifiable features and concepts ESA improves text processing, classification, document similarity and topic identification Compare documents that may not even mention same topics e.g. al-qa ida or Osama bin Laden: Document 1 'Senior members of the Saudi royal family paid at least $560 million to Osama bin Laden terror group and the Taliban for an agreement his forces would not attack targets in Saudi Arabia, according to court documents. The papers, filed in a $US3000 billion ($5500 billion) lawsuit in the US, allege the deal was made after two secret meetings between Saudi royals and leaders of al-qa ida, including bin Laden. The money enabled al-qa ida to fund training camps in Afghanistan later attended by the September 11 hijackers. The disclosures will increase tensions between the US and Saudi Arabia.' Document 2 'The Saudi Interior Ministry on Sunday confirmed it is holding a 21-year-old Saudi man the FBI is seeking for alleged links to the Sept. 11 hijackers. Authorities are interrogating Saud Abdulaziz Saud al-rasheed "and if it is proven that he was connected to terrorism, he will be referred to the sharia (Islamic) court," the official Saudi Press Agency quoted an unidentified ministry official as saying. ESA Similarity Score = 0.62

Oracle Advanced Analytics 12.2 New Oracle Database Features Explicit Semantic Analysis (ESA) algorithm "The more things change... Yes, I'm inclined to agree, especially with regards to the historical relationship between stock prices and bond yields. The two have generally traded together, rising during periods of economic growth and falling during periods of contraction. Consider the period from 1998 through 2010, during which the U.S. economy experienced two expansions as well as two recessions: Then central banks came to the rescue. Fed Chairman Ben Bernanke led from Washington with the help of the bank's current $3.6T balance sheet. He's accompanied by Mario Draghi at the European Central Bank and an equally forthright Shinzo Abe in Japan. Their coordinated monetary expansion has provided all the sugar needed for an equities moonshot, while they vowed to hold global borrowing costs at record lows Top topics (concepts, people, organizations, events ) discovered by ESA using Wikipedia as model source data Recession, Ben Bernanke, Lost Decade Japan, Mario Draghi, Quantitative easing, Long Depression, Great Recession, Federal Open Market Committee, Bank of Canada, Monetary policy, Japanese asset price bubble, Money supply, Great Depression, Central bank, Federal Reserve System If instead of using the entire Wikipedia, we limit ourselves to the source dataset comprised of concepts only, this result would translate to: Recession, Quantitative easing, Monetary policy, Money supply, Central bank, Federal Reserve System

Oracle Advanced Analytics 12.2 New Oracle Database Features Explicit Semantic Analysis (ESA) algorithm View model Apply model

Oracle Advanced Analytics 12.2 New Oracle Database Features Extensibility for R Models Register R models as in-database models for build, apply, settings, and viewing Extends ease of advanced analytics development from R to Oracle Database Supports data with nested attributes, handling text and aggregated transactional data for open source R packages Oracle Data Miner R Build Node The R Build Node allows you to register R models. It builds R models and generates R model test results for Classification and Regression mining function. R Build nodes supports Classification, Regression, Clustering, and Feature Extraction mining functions only. Enables R users to roll out new analytics and more rapidly take advantage of existing R packages

Oracle Data Miner 4.2 New Features for OAA NEW IN 4.2 Add/Expose all 12.2 features in Oracle Data Miner UI

Previewing a 4.2 Feature Workflow Scheduler NEW IN 4.2