Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016

Similar documents
Taking R to New Heights for Scalability and Performance

Oracle Big Data Science

Oracle R Technologies Overview

Oracle Big Data Science IOUG Collaborate 16

Oracle Machine Learning Notebook

Oracle R Technologies

Session 4: Oracle R Enterprise Embedded R Execution SQL Interface Oracle R Technologies

Safe Harbor Statement

Oracle Machine Learning & Advanced Analytics

Introducing Oracle Machine Learning

Getting Started with Advanced Analytics in Finance, Marketing, and Operations

Oracle R Enterprise. New Features in Oracle R Enterprise 1.5. Release Notes Release 1.5

Learning R Series Session 5: Oracle R Enterprise 1.3 Integrating R Results and Images with OBIEE Dashboards Mark Hornick Oracle Advanced Analytics

Introducing Oracle R Enterprise 1.4 -

Oracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features

My name is Brian Pottle. I will be your guide for the next 45 minutes of interactive lectures and review on this lesson.

Oracle Machine Learning Notebook

How to Troubleshoot Databases and Exadata Using Oracle Log Analytics

Combining Graph and Machine Learning Technology using R

Graph Analytics and Machine Learning A Great Combination Mark Hornick

Oracle Database 18c and Autonomous Database

DBAs can use Oracle Application Express? Why?

Oracle R Technologies Overview

C5##54&6*"6*1%2345*D&'*E2)2*F"4G)&"69

Javaentwicklung in der Oracle Cloud

Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL

Using Machine Learning in OBIEE for Actionable BI. By Lakshman Bulusu Mitchell Martin Inc./ Bank of America

Oracle R Enterprise. User's Guide Release E

Session 7: Oracle R Enterprise OAAgraph Package

FastR: Status and Outlook

Deploying Spatial Applications in Oracle Public Cloud

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

OpenWorld 2018 SQL Tuning Tips for Cloud Administrators

Moving Databases to Oracle Cloud: Performance Best Practices

Oracle R Enterprise Platform and Configuration Requirements Oracle R Enterprise runs on 64-bit platforms only.

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

Oracle Big Data Connectors

Automating Information Lifecycle Management with

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider

Oracle s Machine Learning and Advanced Analytics Data Management Platforms. Move the Algorithms; Not the Data

Learning R Series Session 3: Oracle R Enterprise 1.3 Embedded R Execution

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

<Insert Picture Here>

Oracle Big Data Discovery

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13

Brendan Tierney. Running R in the Database using Oracle R Enterprise 05/02/2018. Code Demo

Deploying, Managing and Reusing R Models in an Enterprise Environment

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Learning R Series Session 1: Introduction to Oracle's R Technologies and Oracle R Enterprise 1.3

Oracle Enterprise Data Quality - Roadmap

Spotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12

The Oracle Trust Fabric Securing the Cloud Journey

Performance Innovations with Oracle Database In-Memory

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

This presentation is for informational purposes only and may not be incorporated into a contract or agreement.

Real Time Summarization. Copyright 2014, Oracle and/or its affiliates. All rights reserved.

Consolidate and Prepare for Cloud Efficiencies Oracle Database 12c Oracle Multitenant Option

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12

Oracle s Machine Learning and Advanced Analytics

Implementing SVM in an RDBMS: Improved Scalability and Usability. Joseph Yarmus, Boriana Milenova, Marcos M. Campos Data Mining Technologies Oracle

Massive Scalability With InterSystems IRIS Data Platform

Take Your Analytics to the Next Level of Insight using Machine Learning in the Oracle Autonomous Data Warehouse

Oracle Enterprise Manager 12c IBM DB2 Database Plug-in

High Availability for Enterprise Clouds: Oracle Solaris Cluster and OpenStack

Oracle Exadata: Strategy and Roadmap

Additional License Authorizations. For Vertica software products

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

DNS Level 100. Rohit Rahi November Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Scalable Machine Learning in R. with H2O

Oracle Autonomous Database

Managing Oracle Database 12c with Oracle Enterprise Manager 12c

State of the Dolphin Developing new Apps in MySQL 8

Modern and Fast: A New Wave of Database and Java in the Cloud. Joost Pronk Van Hoogeveen Lead Product Manager, Oracle

Rocky Mountain Oracle Users Group Fall Educational Workshop November 12, 2015

Latest from the Lab: What's New Machine Learning Sam Buhler - Machine Learning Product/Offering Manager

Data Science Bootcamp Curriculum. NYC Data Science Academy

Safe Harbor Statement

Transformational Machine Learning Use Cases You Can Deploy Now CON6234

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and

MySQL Group Replication. Bogdan Kecman MySQL Principal Technical Engineer

Copyright 2017 Oracle and/or its affiliates. All rights reserved.

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 8

Copyright 2014, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Understanding Oracle RAC ( ) Internals: The Cache Fusion Edition

The Fastest and Most Cost-Effective Backup for Oracle Database: What s New in Oracle Secure Backup 10.2

Recent Innovations in Data Storage Technologies Dr Roger MacNicol Software Architect

Large-Scale Patch Automation for the Cloud-Generation DBAs

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

DATA SCIENCE INTRODUCTION QSHORE TECHNOLOGIES. About the Course:

MDM Partner Summit 2015 Oracle Enterprise Data Quality Overview & Roadmap

Trouble-free Upgrade to Oracle Database 12c with Real Application Testing

Boost your Analytics with ML for SQL Nerds

Performance and Load Testing R12 With Oracle Applications Test Suite

Additional License Authorizations. For HPE HAVEn and Vertica Analytics Platform software products

Personalized Experiences Enabled Through Extensibility

Oracle Database 10G. Lindsey M. Pickle, Jr. Senior Solution Specialist Database Technologies Oracle Corporation

Connecting your Microservices and Cloud Services with Oracle Integration CON7348

Transcription:

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016 Mark Hornick, Director, Advanced Analytics January 27, 2016

Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. 2

Oracle s Advanced Analytics Multiple interfaces across platforms SQL, R, GUI, Dashboards, Apps Users R programmers Data / Business Analysts Business Analysts/Mgrs Domain End Users R Client SQL Developer/ OBIEE Applications Oracle Data Miner Platform Hadoop ORAAH Parallel, distributed algorithms Oracle Database Enterprise Edition Oracle Advanced Analytics - Database Option SQL Data Mining & Analytic Functions + R Integration for Scalable, Distributed, Parallel in-database ML Execution Oracle Cloud Oracle Database 12c 3

Scaling R to Big Data Immediate access to database and Hadoop data from R Eliminate need to request extracts from IT/DBA Process data where they reside minimize or eliminate data movement through data.frame proxies Scalability and Performance Use parallel, distributed algorithms that scale to big data on Oracle Database Leverage powerful engineered systems and RAC environments to build and score models on billions of rows of data or build millions of models in parallel from R Ease of deployment Using Oracle Database, place R scripts immediately (no need to recode) in production via SQL Use production quality infrastructure without custom plumbing or extra complexity Process support Maintain and ensure data security, backup, and recovery using existing processes Store, access, manage, and track analytics objects (models, scripts, workflows, data) in Oracle Database 4

Scaling R to Big Data Oracle R Enterprise, part of Oracle Advanced Analytics option Use Oracle Database as HPC environment Use in-database parallel and distributed machine learning algorithms Manage R scripts and R objects in Oracle Database Integrate R results into applications and dashboards via SQL Client R Engine ORE packages Oracle Database User tables In-db stats SQL Interfaces SQL*Plus, SQLDeveloper, Database Server Machine 5

Oracle R Enterprise three key areas Transparency Layer Standard R syntax to manipulate data in-database via proxy objects (ore.frame) Overloads R functions and translates functionality to SQL Rich set of transformations, statistical functions, visualizations Predictive Analytics In-database ODM algorithms exposed via R interface with auto data preparation ORE-specific parallel, distributed algorithms Embedded R Execution Invoke user-defined R functions from R and SQL at database server machine Use CRAN packages in user-defined R functions Execute in data-parallel and task-parallel manner to leverage powerful machines like Exadata Return structured data, images, XML via SQL invocation for easy integration with applications R Script Repository and Datastore as R object repository 6

Scalability through proxies with function overloading In-database aggregation no data movement Oracle Distribution of R version 3.2.0 (--) -- "Full of Ingredients" > aggdata <- aggregate(ontime_s$dest, + by = list(ontime_s$dest), + FUN = length) > class(aggdata) [1] "ore.frame" attr(,"package") [1] "OREbase" > head(aggdata,4) Group.1 x 1 ABE 237 2 ABI 34 3 ABQ 1357 4 ABY 10 Oracle Advanced Analytics ORE Client Packages Transparency Layer Oracle SQL select DEST, count(*) from ONTIME_S group by DEST ONTIME_S In-db Stats 7

Scalable Machine Learning Algorithms ORE parallel distributed model (e.g., Linear Regression) using embedded R engines Oracle Distribution of R version 3.2.0 (--) -- "Full of Ingredients" > options(ore.parallel=4) > lm_mod <- ore.lm(arrdelay ~ DISTANCE + DEPDELAY, data=ontime_s) Oracle Advanced Analytics ORE Client Packages Predictive Analytics > summary(lm_mod) Call: ore.lm(formula = ARRDELAY ~ DISTANCE + DEPDELAY, data = ONTIME_S) Residuals: Min 1Q Median 3Q Max -1462.45-6.97-1.36 5.07 925.08 Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) 2.254e-01 5.197e-02 4.336 1.45e-05 *** DISTANCE -1.218e-03 5.803e-05-20.979 < 2e-16 *** DEPDELAY 9.625e-01 1.151e-03 836.289 < 2e-16 *** --- Residual Signif. codes: standard 0 error: *** 14.73 0.001 on ** 2151440.01 degrees * of 0.05 freedom. 0.1 1 (4785 observations deleted due to missingness) Multiple R-squared: 0.7647, Adjusted R-squared: 0.7647 F-statistic: 3.497e+05 on 2 and 215144 DF, p-value: < 2.2e-16 ONTIME_S 1 2 Parallel ORE Framework 3 extproc extproc extproc external procedure Oracle R Distribution Parallel Oracle ore.lm R Distribution Compute R Oracle Parallel Packages R Distribution ore.lm Compute Parallel R Oracle Packages ore.lm R Distribution Compute Parallel R Packages Compute Engine 4 8

Oracle R Enterprise Predictive Analytics algorithms in-database plus open source R packages for algorithms in combination with embedded R data- and task-parallel execution Classification Decision Tree Logistic Regression Naïve Bayes RandomForest Support Vector Machine Regression Linear Model Generalized Linear Model Multi-Layer Neural Networks Stepwise Linear Regression Support Vector Machine New in ORE 1.5 Clustering Hierarchical k-means Orthogonal Partitioning Clustering Attribute Importance Minimum Description Length Anomaly Detection 1 Class Support Vector Machine Market Basket Analysis Apriori Association Rules Feature Extraction Nonnegative Matrix Factorization Principal Component Analysis Singular Value Decomposition Time Series Single Exponential Smoothing Double Exponential Smoothing 9

Internet of Things Use Case: Energy Demand Leveraging ORE Embedded R Execution Model each customer s usage to understand behavior and predict individual usage and overall aggregate demand 200 thousand households, each with a utility smart meter 1 reading / meter / hr 200K x 8760 hrs / yr 1.752B readings 3 years worth of data 5.256B readings Each customer has 26280 readings If each model takes 10 seconds to build, 555.6 hrs (23.2 days) with 128 DOP 4.3 hrs

Scalable Analysis Model Building Smart meter scenario Oracle Database + ORE Data c1 c2 ci cn R Datastore R Script Repository f(dat,args, ) f(dat,args, ) f(dat,args, ) f(dat,args, ) f(dat,args, ) { R Script build model Model c1 Model c2 Model ci Model cn }

Build models, partitioned on CUST_ID, and store in database ore.groupapply (CUST_USAGE_DATA, CUST_USAGE_DATA$CUST_ID, function(dat, ds.name) { cust_id <- dat$cust_id[1] mod <- lm(consumption ~. -CUST_ID, dat) mod$effects <- mod$residuals <- mod$fitted.values <- NULL name <- paste("mod", cust_id,sep="") assign(name, mod) ds.name1 <- paste(ds.name,".",cust_id,sep="") ore.save(list=paste("mod",cust_id,sep=""), name=ds.name1, overwrite=true) TRUE }, ds.name="mydatastore", ore.connect=true, parallel=128 ) 12

Next Generation Workflow for data collection / model building Oracle Database with OAA / ORE Big Data SQL R Client OAA/ORE R integration SQL Client 13

Next Generation Workflow for data collection / model building Oracle Database with OAA / ORE Big Data SQL R Client OAA/ORE R integration SQL Client 14

Next Generation Streaming Machine Learning Architecture Input event stream Model transfer via Oracle Stream Explorer Data prep, model import, scoring function creation Data Scoring Prediction stream Oracle Database Data Training Data Model Building ore.groupapply + {pmmll} package Oracle R Enterprise Datastore ore.save R Script Repository 15

Check out this ORE session for more details on streaming machine learning Thursday at 9:50 AM, Room 104 Thursday at 11:55 AM, Room 103 16

Oracle Advanced Analytics On Premise or Cloud 100% Compatibility Enables Easy Coexistence and Migration CoExistence and Migration Same Architecture On-Premise Same Analytics Same Standards Oracle Cloud Transparently move workloads and analytical methodologies between On-premise and public cloud 17

To Learn More about Oracle s R Technologies http://oracle.com/goto/r 18