Taking R to New Heights for Scalability and Performance

Size: px
Start display at page:

Download "Taking R to New Heights for Scalability and Performance"

Transcription

1 Taking R to New Heights for Scalability and Performance Mark Hornick Director, Advanced Analytics and Machine Learning blogs.oracle.com/r January 31,2017

2 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. 2

3 Why statisticians data analysts data scientists use R Powerful Extensible Graphical Extensive statistics Ease of installation and use Rich ecosystem ~10K open source packages Millions of users worldwide Free R is a statistics language similar to Base SAS or SPSS statistics 3

4 Traditional R and Data Source Interaction Read/Write using built-in R capabilities read Flat Files extract / export Data Source export load RODBC / RJDBC / ROracle Deployment R script cron job Access latency Paradigm shift: R Data Access Language R Memory limitation data size, call-by-value Single threaded Ad hoc production deployment Issues for backup, recovery, security 4

5 How to take R to new heights for scalability and performance? i.e., to work on Big Data 5

6 Volume Veracity Variety Value Variability Velocity 6

7 Capabilities that take R to new heights Transparent Data Access, Analysis, and Exploration Production Deployment Scalable Machine Learning Data & Task Parallelism 7

8 Transparent data access, analysis, and exploration All Processing Light local Processing Translate R invocations Offload processing to more powerful machines and environments using data.frame proxies Avoid data movement SQL Spark Java, Python, Scala, R HiveQL Data Source Main Processing Main Processing Main Processing 8

9 Transparent data access and manipulation Maintain language features and interface Transparently translate R to language of powerful data processing engines Reference data to eliminate data movement Analyze all of your data 9

10 Proxy objects for Big Data data.frame Proxy data.frame Inherits from 10

11 Transparency Examples library(ore) ore.connect("rquser", "orcl", "localhost", "rquser", all=true) ore.ls() hist(my_table$arrdelay,breaks=100) merge (TEST_DF1, TEST_DF2, by.x="x1", by.y="x2") df <- with(ontime_s, ONTIME_S[DEST=="SFO" DEST=="BOS",1:21]) df$lrgdelay <- ifelse(df$arrdelay > 20,1,0) head(df) summary(df) # with OREdplyr in ORE select(flights, year, month, dep_delay) rename(flights, tail_num = tailnum) filter(flights, month == 1, day == 1) arrange(flights, year, month, day) mutate(flights, speed=air_time/distance) ore.frame Proxy Object 11

12 Scalable Machine Learning Offload model building to parallel software and powerful machines and environments Light local Processing Enable new technologies as they arise All Processing RDBMS Build models using powerful hardware and parallel algorithms Spark Flink Hadoop Data.Rdata Main Processing Main Processing Main Processing Main Processing OAA ORAAH, MLlib FlinkML ORAAH, Mahout 12

13 Scalable Machine Learning Maintain R machine learning interface Easy to specify formula minimal lines of code Include transformations, interaction terms, etc. Algorithm Façade Algorithm Target Predictors log(arrdelay) ~ DISTANCE + ORIGIN + DEST + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum) 13

14 Scalable Machine Learning Maintain R machine learning interface Easy to specify formula minimal lines of code Include transformations, interaction terms, etc. Bring the algorithm to the data Eliminate or minimize data movement Leverage proxy objects to reference data Slow Fast Algorithm Data 14

15 Scalable Machine Learning Maintain R machine learning interface Easy to specify formula minimal lines of code Include transformations, interaction terms, etc. Bring the algorithm to the data Eliminate or minimize data movement Leverage proxy objects to reference data Parallel, distributed algorithm implementations Oracle-proprietary parallel, distributed algorithms Leverage other open source packages and toolkits e.g., Apache Spark Mllib, Apache FlinkML Algorithm 15

16 Linear Model Performance Comparison Predict Total Revenue of a customer based on 31 numeric variables as predictors, on 184 million records using SPARC T5-8, 4TB of RAM Data in an Oracle Database table Algorithm Threads Used* Memory required** Time for Data Loading*** Time for Computation Total Relative Performance Open-Source R Linear Model (lm) 1 220Gb 1h3min 43min 1h46min 1x Oracle R Enterprise lm (ore.lm) min 42.8min 2.47X Oracle R Enterprise lm (ore.lm) min34s 1min34s 67.7X Oracle R Enterprise lm (ore.lm) s 57.97s 110X Oracle R Enterprise lm (ore.lm) s 41.69s 153X *Open-source R lm() is single threaded **Data moved into the R Session s memory, since open-source lm() requires all data to be in-memory ***How long it takes to load 40Gb of raw data into the open-source R Session s memory 16

17 Time in Seconds Time in Seconds Not all parallel implementations are the same Comparing performance with varying Spark memory footprints Benchmark on single X5-2 Node with 74 threads and 256 GB of Total RAM, Spark on CDH Input Data is 15GB Ontime airline dataset with 123mi records, predicting 8,926 total coefficients 1, LM in Spark Performance ORAAH LM Spark MLlib LM 908 1, GLM in Spark Performance ORAAH GLM Spark Mllib GLM DNF Spark Context memory in GB WNR LM formula used ARRDELAY ~ DISTANCE + ORIGIN + DEST + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum) GLM formula used CANCELLED DISTANCE + ORIGIN + DEST + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum) Spark Context memory in GB WNR

18 Data and Task Parallel Execution Easily specify parallelism and data partitioning Simplified API all-in-one Build and score with millions of models Automated management of parallel R engines Insulation from hardware details Limit resources as appropriate Startup and shutdown automatically Automated loading of data into parallel R engines Leverage CRAN packages 18

19 Data and Task Parallelism Hand-code logic to spawn R engines and partition and feed data to R engines as they become available All Processing parallel Rserve Rmpi snow Store UDF Invoke Execute user-defined R function on back-end servers in data-parallel or task-parallel manner Auto-partition and feed data while also leveraging CRAN packages Partition data by value Partition data by count invoke function with index Data.Rdata.Rdata.Rdata.Rdata Spawn & control R engines Provide function and data R Script Repository R Object Repository 19

20 Example API Supply data Specify function Use CRAN packages Store and load R objects Pass Arguments Specify parallelism Get/use results R objects structured data Images etc. library(e1071) mod <- ore.tableapply(iris_table, function(dat,datastore) { library(e1071) dat$species <- as.factor(dat$species) mod<-naivebayes(species ~., dat) ore.save(mod,name=datastore) }, datastore="nb_model-1") scorenbmodel <- function(dat, datastore) { } library(e1071) ore.load(datastore) dat$pred <- predict(mod, newdata = dat) dat IRIS_PRED IRIS_PRED$PRED <- "A" <- IRIS_TABLE[1,] res <- ore.rowapply(iris_table, scorenbmodel, datastore = "NB_Model-1", DAT <- ONTIME_S[ONTIME_S$DEST %in% parallel=4, FUN.VALUE=IRIS_PRED, rows=10) c("bos","sfo","lax","ord","atl","phx","den"),] No parallelism Data parallel by chunk Data parallel by partition modlist <- ore.groupapply( X=DAT, INDEX=DAT$DEST, parallel=3, function(dat) lm(arrdelay ~ DISTANCE + DEPDELAY, dat)) 20

21 Deployment Avoid costly recoding or translating R code Invoke R easily from non-r environments Map data structures and types naturally Seamlessly return data.frames, images, XML, JSON in local environment data structures 21

22 Deployment data.frame images R objects Application Business Logic C, C++, Java, etc. rjava API RServe client/server TCP/IP rpy2 system() cron independent Application / Dashboards Business Logic C, C++, Java SQL Allow applications and dashboard tools to use familiar, existing SQL protocols for invoking R Invoke R from SQL results as table: structured, image, XML, JSON automate parallel and concurrent execution R Scripts Data Stores RDBMS HDFS NoSQL Hive File system SQL - R Engine R Scripts Data RDBMS HDFS NoSQL Hive 22

23 Deploy R using SQL Store named R function in Script Repository from R or SQL Return values Images as PNG BLOB column data.frame content as database table XML with data.frame and image Benefits Fewer moving parts IPC data transfer speeds at backend Invoke same function from R or SQL Security Integrated backup and recovery begin sys.rqscriptdrop('randomreddots'); sys.rqscriptcreate('randomreddots', 'function(){ id <- 1:10 plot( 1:100, rnorm(100), pch = 21, bg = "red", cex = 2, main="random Red Dots" ) data.frame(id=id, val=id / 100) }'); end; select from select from ID, IMAGE table(rqeval( NULL,'PNG','RandomRedDots')); id, val table(rqeval( NULL,'select 1 id, 1 val from dual', 'RandomRedDots')); -- Return structured and image content within XML string select * from table(rqeval(null, 'XML', 'RandomRedDots')); -- In R, invoke same function by name ore.doeval(fun.name='randomreddots') 23

24 Architectural Elements: Enabling R for Big Data Leverage powerful back-ends for the heavy lifting transparently Leverage new, more powerful back-ends more easily as they appear Enable parallelism quickly and easily for big data processing Immediately leverage data scientist R scripts and results in production environments Production Deployment Transparent Data Access, Analysis, and Exploration Data & Task Parallelism Scalable Machine Learning 24

25 Oracle R Enterprise Part of Oracle Advanced Analytics option to Oracle Database Use Oracle Database as HPC environment Use in-database parallel and distributed machine learning algorithms Manage R scripts and R objects in Oracle Database Integrate R results into applications and dashboards via SQL R Client Oracle R Enterprise Oracle Database User tables In-db stats SQL Interfaces SQL*Plus, SQLDeveloper, Database Server Machine 25

26 Oracle R Enterprise Part of Oracle Advanced Analytics option to Oracle Database Transparency layer Leverage proxy objects (ore.frames) - data remains in the database Overload R functions that translate functionality to SQL Use standard R syntax to manipulate database data Parallel, distributed algorithms Scalability and performance Oracle R Enterprise Exposes in-database algorithms from ODM Additional R-based algorithms executing and database server Embedded R execution Manage and invoke R scripts in Oracle Database Data-parallel, task-parallel, and non-parallel execution Use open source CRAN packages R Client Oracle Database User tables In-db stats SQL Interfaces SQL*Plus, SQLDeveloper, Database Server Machine 26

27 OAA / Oracle R Enterprise Predictive Analytics algorithms in-database plus open source R packages for algorithms in combination with embedded R data- and task-parallel execution Classification Decision Tree Logistic Regression Naïve Bayes Support Vector Machine RandomForest Regression Linear Model Generalized Linear Model Multi-Layer Neural Networks Stepwise Linear Regression Support Vector Machine New in ORE *ODB 12.2 only Clustering Hierarchical k-means Orthogonal Partitioning Expectation Maximization* Attribute Importance Minimum Description Length Expectation Maximization* Anomaly Detection 1 Class Support Vector Machine Market Basket Analysis Apriori Association Rules Feature Extraction Nonnegative Matrix Factorization Principal Component Analysis Singular Value Decomposition Explicit Semantic Analysis* Time Series Single Exponential Smoothing Double Exponential Smoothing 27

28 Oracle R Advanced Analytics for Hadoop Using Hadoop/Hive/Spark Integration, plus R Engine and Open-Source R Packages Oracle R Advanced Analytics for Hadoop (ORAAH) on Hadoop Cluster R interface to HQL Basic Statistics, Data Prep, Joins and View creation Parallel, distributed algorithms: ORAAH Spark algorithms: Deep Neural, GLM, LM Spark MLlib algorithms: LM, GLM, LASSO, Ridge Regression, Decision Trees, Random Forests, SVM, k-means, PCA HQL Use of Open-source R packages via custom R Mappers / Reducers Oracle Database with Advanced Analytics option R Client R Analytics Oracle R Advanced Analytics for Hadoop SQL Client SQL Developer Other SQL Apps

29 Oracle R Advanced Analytics for Hadoop Predictive Analytics algorithms Classification GLM ORAAH Logistic Regression ORAAH Logistic Regression Random Forests Decision Trees Support Vector Machines Clustering Hierarchical k-means Hierarchical k-means Gaussian Mixture Models Regression MLP Neural Networks ORAAH LASSO Ridge Regression Support Vector Machines Random Forest Linear Regression Basic Statistics Correlation/Covariance Feature Extraction Non-negative Matrix Factorization Collaborative Filtering (LMF) Singular Value Decomposition Attribute Importance Principal Components Analysis Principal Components Analysis 29

30 Cloud-Based Machine Learning Oracle Advanced Analytics option including Oracle Data Mining and Oracle R Enterprise on: Oracle Exadata Cloud Service Oracle Database Cloud Service: Included in High Performance and Extreme Performance services Oracle R Advanced Analytics for Hadoop Included in the Oracle Big Data Cloud Service 30

31 Demonstration of ORE 31

32 Join us for the Oracle R Enterprise Hands-on Lab 9:00 Using R for Big Data Advanced Analytics and Machine Learning data exploration / attribute importance clustering regression OREdplyr, and more 32

33 Join us for Big Data with ORAAH 1:00 Using Machine Learning to unlock the Business Value in Big Data 33

34 Join us for new technology intro 9:50 Combining Graph and Machine Learning Technologies using R 34

35 Join us for new technology intro 10:55 Introducing Oracle Machine Learning new notebook technology from Oracle 35

36 Learn More about Oracle s Advanced Analytics R Technologies 36

37

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016 Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016 Mark Hornick, Director, Advanced Analytics January 27, 2016 Safe Harbor Statement The following

More information

Oracle R Technologies

Oracle R Technologies Oracle R Technologies R for the Enterprise Mark Hornick, Director, Oracle Advanced Analytics @MarkHornick mark.hornick@oracle.com Safe Harbor Statement The following is intended to outline our general

More information

Oracle Machine Learning Notebook

Oracle Machine Learning Notebook Oracle Machine Learning Notebook Included in Autonomous Data Warehouse Cloud Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics charlie.berger@oracle.com

More information

Session 4: Oracle R Enterprise Embedded R Execution SQL Interface Oracle R Technologies

Session 4: Oracle R Enterprise Embedded R Execution SQL Interface Oracle R Technologies Session 4: Oracle R Enterprise 1.5.1 Embedded R Execution SQL Interface Oracle R Technologies Mark Hornick Director, Oracle Advanced Analytics and Machine Learning July 2017 Safe Harbor Statement The following

More information

Combining Graph and Machine Learning Technology using R

Combining Graph and Machine Learning Technology using R Combining Graph and Machine Learning Technology using R Hassan Chafi Oracle Labs Mark Hornick Oracle Advanced Analytics February 2, 2017 Safe Harbor Statement The following is intended to outline our research

More information

Introducing Oracle Machine Learning

Introducing Oracle Machine Learning Introducing Oracle Machine Learning A Collaborative Zeppelin notebook for Oracle s machine learning capabilities Charlie Berger Marcos Arancibia Mark Hornick Advanced Analytics and Machine Learning Copyright

More information

Oracle Big Data Science

Oracle Big Data Science Oracle Big Data Science Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com @VlamisSoftware Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri

More information

Oracle R Technologies Overview

Oracle R Technologies Overview Oracle R Technologies Overview Mark Hornick, Director, Oracle Advanced Analytics The following is intended to outline our general product direction. It is intended for information

More information

Using Machine Learning in OBIEE for Actionable BI. By Lakshman Bulusu Mitchell Martin Inc./ Bank of America

Using Machine Learning in OBIEE for Actionable BI. By Lakshman Bulusu Mitchell Martin Inc./ Bank of America Using Machine Learning in OBIEE for Actionable BI By Lakshman Bulusu Mitchell Martin Inc./ Bank of America Using Machine Learning in OBIEE for Actionable BI Using Machine Learning (ML) via Oracle R Technologies

More information

Oracle R Technologies Overview

Oracle R Technologies Overview Oracle R Technologies Overview Mark Hornick, Director, Oracle Database Advanced Analytics The following is intended to outline our general product direction. It is intended for information

More information

Introducing Oracle R Enterprise 1.4 -

Introducing Oracle R Enterprise 1.4 - Hello, and welcome to this online, self-paced lesson entitled Introducing Oracle R Enterprise. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle. I

More information

Oracle Big Data Connectors

Oracle Big Data Connectors Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process

More information

Session 3: Oracle R Enterprise Embedded R Execution R Interface

Session 3: Oracle R Enterprise Embedded R Execution R Interface Session 3: Oracle R Enterprise 1.5.1 Embedded R Execution R Interface Oracle R Technologies Mark Hornick Director, Advanced Analytics and Machine Learning mark.hornick@oracle.com October 2018 Copyright

More information

Oracle Big Data Science IOUG Collaborate 16

Oracle Big Data Science IOUG Collaborate 16 Oracle Big Data Science IOUG Collaborate 16 Session 4762 Tim and Dan Vlamis Tuesday, April 12, 2016 Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri Developed 200+ Oracle

More information

Learning R Series Session 5: Oracle R Enterprise 1.3 Integrating R Results and Images with OBIEE Dashboards Mark Hornick Oracle Advanced Analytics

Learning R Series Session 5: Oracle R Enterprise 1.3 Integrating R Results and Images with OBIEE Dashboards Mark Hornick Oracle Advanced Analytics Learning R Series Session 5: Oracle R Enterprise 1.3 Integrating R Results and Images with OBIEE Dashboards Mark Hornick Oracle Advanced Analytics Learning R Series 2012 Session Title

More information

Learning R Series Session 3: Oracle R Enterprise 1.3 Embedded R Execution

Learning R Series Session 3: Oracle R Enterprise 1.3 Embedded R Execution Learning R Series Session 3: Oracle R Enterprise 1.3 Embedded R Execution Mark Hornick, Senior Manager, Development Oracle Advanced Analytics Learning R Series 2012 Session Title

More information

C5##54&6*"6*1%2345*D&'*E2)2*F"4G)&"69

C5##54&6*6*1%2345*D&'*E2)2*F4G)&69 ?23(&65*@52%6&6'*A&)(*B*267* C5##54&6*"6*1%2345*D&'*E2)2*F"4G)&"69!"#$%&'%(?2%3"9*

More information

My name is Brian Pottle. I will be your guide for the next 45 minutes of interactive lectures and review on this lesson.

My name is Brian Pottle. I will be your guide for the next 45 minutes of interactive lectures and review on this lesson. Hello, and welcome to this online, self-paced lesson entitled ORE Embedded R Scripts: SQL Interface. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle.

More information

Graph Analytics and Machine Learning A Great Combination Mark Hornick

Graph Analytics and Machine Learning A Great Combination Mark Hornick Graph Analytics and Machine Learning A Great Combination Mark Hornick Oracle Advanced Analytics and Machine Learning November 3, 2017 Safe Harbor Statement The following is intended to outline our research

More information

Safe Harbor Statement

Safe Harbor Statement Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment

More information

Getting Started with Advanced Analytics in Finance, Marketing, and Operations

Getting Started with Advanced Analytics in Finance, Marketing, and Operations Getting Started with Advanced Analytics in Finance, Marketing, and Operations Southwest Regional Oracle Applications User Group Dan Vlamis February 24, 2017 @VlamisSoftware Vlamis Software Solutions Vlamis

More information

Oracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features

Oracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features Oracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features Move the Algorithms; Not the Data! Charlie Berger, MS Engineering, MBA Sr. Director Product Management,

More information

Specialist ICT Learning

Specialist ICT Learning Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.

More information

Brendan Tierney. Running R in the Database using Oracle R Enterprise 05/02/2018. Code Demo

Brendan Tierney. Running R in the Database using Oracle R Enterprise 05/02/2018. Code Demo Running R in the Database using Oracle R Enterprise Brendan Tierney Code Demo Data Warehousing since 1997 Data Mining since 1998 Analytics since 1993 1 Agenda What is R? Oracle Advanced Analytics Option

More information

Data Science Bootcamp Curriculum. NYC Data Science Academy

Data Science Bootcamp Curriculum. NYC Data Science Academy Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations

More information

<Insert Picture Here>

<Insert Picture Here> 1 Oracle R Enterprise Training Sessions Session 1: Getting Started with Oracle R Enterprise Mark Hornick, Senior Manager, Development Oracle Advanced Analytics The following is intended

More information

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context 1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes

More information

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS

Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with

More information

Big Data Infrastructures & Technologies

Big Data Infrastructures & Technologies Big Data Infrastructures & Technologies Spark and MLLIB OVERVIEW OF SPARK What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop Improves efficiency through: In-memory

More information

Scalable Machine Learning in R. with H2O

Scalable Machine Learning in R. with H2O Scalable Machine Learning in R with H2O Erin LeDell @ledell DSC July 2016 Introduction Statistician & Machine Learning Scientist at H2O.ai in Mountain View, California, USA Ph.D. in Biostatistics with

More information

ORAAH Change List Summary. ORAAH Change List Summary

ORAAH Change List Summary. ORAAH Change List Summary ORAAH 2.7.0 Change List Summary i ORAAH 2.7.0 Change List Summary ORAAH 2.7.0 Change List Summary ii REVISION HISTORY NUMBER DATE DESCRIPTION NAME ORAAH 2.7.0 Change List Summary iii Contents 1 ORAAH 2.7.0

More information

Oracle R Enterprise. New Features in Oracle R Enterprise 1.5. Release Notes Release 1.5

Oracle R Enterprise. New Features in Oracle R Enterprise 1.5. Release Notes Release 1.5 Oracle R Enterprise Release Notes Release 1.5 E49659-04 December 2016 These release notes contain important information about Release 1.5 of Oracle R Enterprise. New Features in Oracle R Enterprise 1.5

More information

Oracle Machine Learning & Advanced Analytics

Oracle Machine Learning & Advanced Analytics Oracle Machine Learning & Advanced Analytics Data Management Platforms Move the Algorithms; Not the Data! Detlef E. Schröder Master Principal Sales Consultant Machine Learning, AI and Cognitive Analytics

More information

Oracle Big Data Discovery

Oracle Big Data Discovery Oracle Big Data Discovery Turning Data into Business Value Harald Erb Oracle Business Analytics & Big Data 1 Safe Harbor Statement The following is intended to outline our general product direction. It

More information

Session 7: Oracle R Enterprise OAAgraph Package

Session 7: Oracle R Enterprise OAAgraph Package Session 7: Oracle R Enterprise 1.5.1 OAAgraph Package Oracle Spatial and Graph PGX Graph Algorithms Oracle R Technologies Mark Hornick Director, Oracle Advanced Analytics and Machine Learning July 2017

More information

Oracle R Advanced Analytics for Hadoop Release Notes. Oracle R Advanced Analytics for Hadoop Release Notes

Oracle R Advanced Analytics for Hadoop Release Notes. Oracle R Advanced Analytics for Hadoop Release Notes Oracle R Advanced Analytics for Hadoop 2.7.1 Release Notes i Oracle R Advanced Analytics for Hadoop 2.7.1 Release Notes Oracle R Advanced Analytics for Hadoop 2.7.1 Release Notes ii REVISION HISTORY NUMBER

More information

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab

2/26/2017. Originally developed at the University of California - Berkeley's AMPLab Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second

More information

Tackling Big Data Using MATLAB

Tackling Big Data Using MATLAB Tackling Big Data Using MATLAB Alka Nair Application Engineer 2015 The MathWorks, Inc. 1 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate

More information

Matrix Computations and " Neural Networks in Spark

Matrix Computations and  Neural Networks in Spark Matrix Computations and " Neural Networks in Spark Reza Zadeh Paper: http://arxiv.org/abs/1509.02256 Joint work with many folks on paper. @Reza_Zadeh http://reza-zadeh.com Training Neural Networks Datasets

More information

DATA SCIENCE USING SPARK: AN INTRODUCTION

DATA SCIENCE USING SPARK: AN INTRODUCTION DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data

More information

Oracle s Machine Learning and Advanced Analytics Data Management Platforms. Move the Algorithms; Not the Data

Oracle s Machine Learning and Advanced Analytics Data Management Platforms. Move the Algorithms; Not the Data Oracle s Machine Learning and Advanced Analytics Data Management Platforms Move the Algorithms; Not the Data Disclaimer The following is intended to outline our general product direction. It is intended

More information

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data

Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

Distributed Machine Learning" on Spark

Distributed Machine Learning on Spark Distributed Machine Learning" on Spark Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Outline Data flow vs. traditional network programming Spark computing engine Optimization Example Matrix Computations

More information

An Introduction to Apache Spark

An Introduction to Apache Spark An Introduction to Apache Spark 1 History Developed in 2009 at UC Berkeley AMPLab. Open sourced in 2010. Spark becomes one of the largest big-data projects with more 400 contributors in 50+ organizations

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

Oracle Machine Learning Notebook

Oracle Machine Learning Notebook Oracle Machine Learning Notebook Included in Autonomous Data Warehouse Cloud Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics charlie.berger@oracle.com

More information

R mit Exadata und Exalytics Erfahrungen mit R

R mit Exadata und Exalytics Erfahrungen mit R R mit Exadata und Exalytics Erfahrungen mit R Oliver Bracht Managing Director eoda Matthias Fuchs Senior Consultant ISE Information Systems Engineering GmbH extreme Datamining with Oracle R Enterprise

More information

Distributed Computing with Spark

Distributed Computing with Spark Distributed Computing with Spark Reza Zadeh Thanks to Matei Zaharia Outline Data flow vs. traditional network programming Limitations of MapReduce Spark computing engine Numerical computing on Spark Ongoing

More information

Distributed Computing with Spark and MapReduce

Distributed Computing with Spark and MapReduce Distributed Computing with Spark and MapReduce Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Traditional Network Programming Message-passing between nodes (e.g. MPI) Very difficult to do at scale:» How

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

KNIME for the life sciences Cambridge Meetup

KNIME for the life sciences Cambridge Meetup KNIME for the life sciences Cambridge Meetup Greg Landrum, Ph.D. KNIME.com AG 12 July 2016 What is KNIME? A bit of motivation: tool blending, data blending, documentation, automation, reproducibility More

More information

Oracle Big Data Fundamentals Ed 1

Oracle Big Data Fundamentals Ed 1 Oracle University Contact Us: +0097143909050 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data

More information

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous

More information

Turning Relational Database Tables into Spark Data Sources

Turning Relational Database Tables into Spark Data Sources Turning Relational Database Tables into Spark Data Sources Kuassi Mensah Jean de Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 3 Safe Harbor Statement The following

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Scaled Machine Learning at Matroid

Scaled Machine Learning at Matroid Scaled Machine Learning at Matroid Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Machine Learning Pipeline Learning Algorithm Replicate model Data Trained Model Serve Model Repeat entire pipeline Scaling

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Big Data Connectors: High Performance Integration for Hadoop and Oracle Database Melli Annamalai Sue Mavris Rob Abbott 2 Program Agenda Big Data Connectors: Brief Overview Connecting Hadoop with Oracle

More information

How to Troubleshoot Databases and Exadata Using Oracle Log Analytics

How to Troubleshoot Databases and Exadata Using Oracle Log Analytics How to Troubleshoot Databases and Exadata Using Oracle Log Analytics Nima Haddadkaveh Director, Product Management Oracle Management Cloud October, 2018 Copyright 2018, Oracle and/or its affiliates. All

More information

Oracle R Enterprise Platform and Configuration Requirements Oracle R Enterprise runs on 64-bit platforms only.

Oracle R Enterprise Platform and Configuration Requirements Oracle R Enterprise runs on 64-bit platforms only. Oracle R Enterprise Release Notes Release 1.5.1 E83205-02 April 2017 These release notes contain important information about Release 1.5.1 of Oracle R Enterprise. New Features in Oracle R Enterprise 1.5.1

More information

Using Existing Numerical Libraries on Spark

Using Existing Numerical Libraries on Spark Using Existing Numerical Libraries on Spark Brian Spector Chicago Spark Users Meetup June 24 th, 2015 Experts in numerical algorithms and HPC services How to use existing libraries on Spark Call algorithm

More information

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics IBM Data Science Experience White paper R Transforming R into a tool for big data analytics 2 R Executive summary This white paper introduces R, a package for the R statistical programming language that

More information

Oracle Enterprise Data Quality - Roadmap

Oracle Enterprise Data Quality - Roadmap Oracle Enterprise Data Quality - Roadmap Mike Matthews Martin Boyd Director, Product Management Senior Director, Product Strategy Copyright 2014 Oracle and/or its affiliates. All rights reserved. Oracle

More information

Deploying, Managing and Reusing R Models in an Enterprise Environment

Deploying, Managing and Reusing R Models in an Enterprise Environment Deploying, Managing and Reusing R Models in an Enterprise Environment Making Data Science Accessible to a Wider Audience Lou Bajuk-Yorgan, Sr. Director, Product Management Streaming and Advanced Analytics

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Index. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225

Index. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225 Index A Anonymous function, 66 Apache Hadoop, 1 Apache HBase, 42 44 Apache Hive, 6 7, 230 Apache Kafka, 8, 178 Apache License, 7 Apache Mahout, 5 Apache Mesos, 38 42 Apache Pig, 7 Apache Spark, 9 Apache

More information

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT. Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem

More information

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and AI and Visual Analytics: Machine Learning in Business Operations Steven Hillion Senior Director, Data Science Anshuman Mishra Principal Data Scientist DISCLAIMER During the course of this presentation,

More information

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone

Introducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230

More information

Javaentwicklung in der Oracle Cloud

Javaentwicklung in der Oracle Cloud Javaentwicklung in der Oracle Cloud Sören Halter Principal Sales Consultant 2016-11-17 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information

More information

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect

Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of

More information

ORAAH Change List Summary. ORAAH Change List Summary

ORAAH Change List Summary. ORAAH Change List Summary ORAAH 2.7.1 Change List Summary i ORAAH 2.7.1 Change List Summary ORAAH 2.7.1 Change List Summary ii REVISION HISTORY NUMBER DATE DESCRIPTION NAME ORAAH 2.7.1 Change List Summary iii Contents 1 ORAAH 2.7.1

More information

Unifying Big Data Workloads in Apache Spark

Unifying Big Data Workloads in Apache Spark Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

Sparkling Water. August 2015: First Edition

Sparkling Water.   August 2015: First Edition Sparkling Water Michal Malohlava Alex Tellez Jessica Lanford http://h2o.gitbooks.io/sparkling-water-and-h2o/ August 2015: First Edition Sparkling Water by Michal Malohlava, Alex Tellez & Jessica Lanford

More information

Spotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017

Spotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017 Spotfire: Brisbane Breakfast & Learn Thursday, 9 November 2017 CONFIDENTIALITY The following information is confidential information of TIBCO Software Inc. Use, duplication, transmission, or republication

More information

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks

Asanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data

More information

Chapter 1 - The Spark Machine Learning Library

Chapter 1 - The Spark Machine Learning Library Chapter 1 - The Spark Machine Learning Library Objectives Key objectives of this chapter: The Spark Machine Learning Library (MLlib) MLlib dense and sparse vectors and matrices Types of distributed matrices

More information

Gain Greater Productivity in Enterprise Data Mining

Gain Greater Productivity in Enterprise Data Mining Clementine 9.0 Specifications Gain Greater Productivity in Enterprise Data Mining Discover patterns and associations in your organization s data and make decisions that lead to significant, measurable

More information

Boost your Analytics with Machine Learning for SQL Nerds. Julie mssqlgirl.com

Boost your Analytics with Machine Learning for SQL Nerds. Julie mssqlgirl.com Boost your Analytics with Machine Learning for SQL Nerds Julie Koesmarno @MsSQLGirl mssqlgirl.com 1. Y ML 2. Operationalizing ML 3. Tips & Tricks 4. Resources automation delighting customers Deepen Engagement

More information

Integrate MATLAB Analytics into Enterprise Applications

Integrate MATLAB Analytics into Enterprise Applications Integrate Analytics into Enterprise Applications Dr. Roland Michaely 2015 The MathWorks, Inc. 1 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics

More information

What is Gluent? The Gluent Data Platform

What is Gluent? The Gluent Data Platform What is Gluent? The Gluent Data Platform The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the

More information

Cloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018

Cloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018 Cloud Computing 3 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning

More information

ACHIEVEMENTS FROM TRAINING

ACHIEVEMENTS FROM TRAINING LEARN WELL TECHNOCRAFT DATA SCIENCE/ MACHINE LEARNING SYLLABUS 8TH YEAR OF ACCOMPLISHMENTS AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE. 8411002339/7709292162 WWW.DW-LEARNWELL.COM

More information

<Insert Picture Here> MySQL Cluster What are we working on

<Insert Picture Here> MySQL Cluster What are we working on MySQL Cluster What are we working on Mario Beck Principal Consultant The following is intended to outline our general product direction. It is intended for information purposes only,

More information

Using Numerical Libraries on Spark

Using Numerical Libraries on Spark Using Numerical Libraries on Spark Brian Spector London Spark Users Meetup August 18 th, 2015 Experts in numerical algorithms and HPC services How to use existing libraries on Spark Call algorithm with

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

Blended Learning Outline: Cloudera Data Analyst Training (171219a) Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills

More information

Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL

Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL Make Big Data + Analytics Simple Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Data Mining and Advanced Analytics

More information

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E-

Machine Learning and SystemML. Nikolay Manchev Data Scientist Europe E- Machine Learning and SystemML Nikolay Manchev Data Scientist Europe E- mail: nmanchev@uk.ibm.com @nikolaymanchev A Simple Problem In this activity, you will analyze the relationship between educational

More information

MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by

MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by 1 MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by MathWorks In 2004, MATLAB had around one million users

More information

Machine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich

Machine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich Machine Learning In A Snap Thomas Parnell Research Staff Member IBM Research - Zurich What are GLMs? Ridge Regression Support Vector Machines Regression Generalized Linear Models Classification Lasso Regression

More information

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider Fault Detection using Advanced Analytics at CERN's Large Hadron Collider Antonio Romero Marín Manuel Martin Marquez USA - 27/01/2016 BIWA 16 1 What s CERN USA - 27/01/2016 BIWA 16 2 What s CERN European

More information

Python With Data Science

Python With Data Science Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers,

More information

Automating Information Lifecycle Management with

Automating Information Lifecycle Management with Automating Information Lifecycle Management with Oracle Database 2c The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

Oracle Data Integrator 12c: Integration and Administration

Oracle Data Integrator 12c: Integration and Administration Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information