Taking R to New Heights for Scalability and Performance
|
|
- Alaina Hensley
- 6 years ago
- Views:
Transcription
1 Taking R to New Heights for Scalability and Performance Mark Hornick Director, Advanced Analytics and Machine Learning blogs.oracle.com/r January 31,2017
2 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. 2
3 Why statisticians data analysts data scientists use R Powerful Extensible Graphical Extensive statistics Ease of installation and use Rich ecosystem ~10K open source packages Millions of users worldwide Free R is a statistics language similar to Base SAS or SPSS statistics 3
4 Traditional R and Data Source Interaction Read/Write using built-in R capabilities read Flat Files extract / export Data Source export load RODBC / RJDBC / ROracle Deployment R script cron job Access latency Paradigm shift: R Data Access Language R Memory limitation data size, call-by-value Single threaded Ad hoc production deployment Issues for backup, recovery, security 4
5 How to take R to new heights for scalability and performance? i.e., to work on Big Data 5
6 Volume Veracity Variety Value Variability Velocity 6
7 Capabilities that take R to new heights Transparent Data Access, Analysis, and Exploration Production Deployment Scalable Machine Learning Data & Task Parallelism 7
8 Transparent data access, analysis, and exploration All Processing Light local Processing Translate R invocations Offload processing to more powerful machines and environments using data.frame proxies Avoid data movement SQL Spark Java, Python, Scala, R HiveQL Data Source Main Processing Main Processing Main Processing 8
9 Transparent data access and manipulation Maintain language features and interface Transparently translate R to language of powerful data processing engines Reference data to eliminate data movement Analyze all of your data 9
10 Proxy objects for Big Data data.frame Proxy data.frame Inherits from 10
11 Transparency Examples library(ore) ore.connect("rquser", "orcl", "localhost", "rquser", all=true) ore.ls() hist(my_table$arrdelay,breaks=100) merge (TEST_DF1, TEST_DF2, by.x="x1", by.y="x2") df <- with(ontime_s, ONTIME_S[DEST=="SFO" DEST=="BOS",1:21]) df$lrgdelay <- ifelse(df$arrdelay > 20,1,0) head(df) summary(df) # with OREdplyr in ORE select(flights, year, month, dep_delay) rename(flights, tail_num = tailnum) filter(flights, month == 1, day == 1) arrange(flights, year, month, day) mutate(flights, speed=air_time/distance) ore.frame Proxy Object 11
12 Scalable Machine Learning Offload model building to parallel software and powerful machines and environments Light local Processing Enable new technologies as they arise All Processing RDBMS Build models using powerful hardware and parallel algorithms Spark Flink Hadoop Data.Rdata Main Processing Main Processing Main Processing Main Processing OAA ORAAH, MLlib FlinkML ORAAH, Mahout 12
13 Scalable Machine Learning Maintain R machine learning interface Easy to specify formula minimal lines of code Include transformations, interaction terms, etc. Algorithm Façade Algorithm Target Predictors log(arrdelay) ~ DISTANCE + ORIGIN + DEST + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum) 13
14 Scalable Machine Learning Maintain R machine learning interface Easy to specify formula minimal lines of code Include transformations, interaction terms, etc. Bring the algorithm to the data Eliminate or minimize data movement Leverage proxy objects to reference data Slow Fast Algorithm Data 14
15 Scalable Machine Learning Maintain R machine learning interface Easy to specify formula minimal lines of code Include transformations, interaction terms, etc. Bring the algorithm to the data Eliminate or minimize data movement Leverage proxy objects to reference data Parallel, distributed algorithm implementations Oracle-proprietary parallel, distributed algorithms Leverage other open source packages and toolkits e.g., Apache Spark Mllib, Apache FlinkML Algorithm 15
16 Linear Model Performance Comparison Predict Total Revenue of a customer based on 31 numeric variables as predictors, on 184 million records using SPARC T5-8, 4TB of RAM Data in an Oracle Database table Algorithm Threads Used* Memory required** Time for Data Loading*** Time for Computation Total Relative Performance Open-Source R Linear Model (lm) 1 220Gb 1h3min 43min 1h46min 1x Oracle R Enterprise lm (ore.lm) min 42.8min 2.47X Oracle R Enterprise lm (ore.lm) min34s 1min34s 67.7X Oracle R Enterprise lm (ore.lm) s 57.97s 110X Oracle R Enterprise lm (ore.lm) s 41.69s 153X *Open-source R lm() is single threaded **Data moved into the R Session s memory, since open-source lm() requires all data to be in-memory ***How long it takes to load 40Gb of raw data into the open-source R Session s memory 16
17 Time in Seconds Time in Seconds Not all parallel implementations are the same Comparing performance with varying Spark memory footprints Benchmark on single X5-2 Node with 74 threads and 256 GB of Total RAM, Spark on CDH Input Data is 15GB Ontime airline dataset with 123mi records, predicting 8,926 total coefficients 1, LM in Spark Performance ORAAH LM Spark MLlib LM 908 1, GLM in Spark Performance ORAAH GLM Spark Mllib GLM DNF Spark Context memory in GB WNR LM formula used ARRDELAY ~ DISTANCE + ORIGIN + DEST + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum) GLM formula used CANCELLED DISTANCE + ORIGIN + DEST + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum) Spark Context memory in GB WNR
18 Data and Task Parallel Execution Easily specify parallelism and data partitioning Simplified API all-in-one Build and score with millions of models Automated management of parallel R engines Insulation from hardware details Limit resources as appropriate Startup and shutdown automatically Automated loading of data into parallel R engines Leverage CRAN packages 18
19 Data and Task Parallelism Hand-code logic to spawn R engines and partition and feed data to R engines as they become available All Processing parallel Rserve Rmpi snow Store UDF Invoke Execute user-defined R function on back-end servers in data-parallel or task-parallel manner Auto-partition and feed data while also leveraging CRAN packages Partition data by value Partition data by count invoke function with index Data.Rdata.Rdata.Rdata.Rdata Spawn & control R engines Provide function and data R Script Repository R Object Repository 19
20 Example API Supply data Specify function Use CRAN packages Store and load R objects Pass Arguments Specify parallelism Get/use results R objects structured data Images etc. library(e1071) mod <- ore.tableapply(iris_table, function(dat,datastore) { library(e1071) dat$species <- as.factor(dat$species) mod<-naivebayes(species ~., dat) ore.save(mod,name=datastore) }, datastore="nb_model-1") scorenbmodel <- function(dat, datastore) { } library(e1071) ore.load(datastore) dat$pred <- predict(mod, newdata = dat) dat IRIS_PRED IRIS_PRED$PRED <- "A" <- IRIS_TABLE[1,] res <- ore.rowapply(iris_table, scorenbmodel, datastore = "NB_Model-1", DAT <- ONTIME_S[ONTIME_S$DEST %in% parallel=4, FUN.VALUE=IRIS_PRED, rows=10) c("bos","sfo","lax","ord","atl","phx","den"),] No parallelism Data parallel by chunk Data parallel by partition modlist <- ore.groupapply( X=DAT, INDEX=DAT$DEST, parallel=3, function(dat) lm(arrdelay ~ DISTANCE + DEPDELAY, dat)) 20
21 Deployment Avoid costly recoding or translating R code Invoke R easily from non-r environments Map data structures and types naturally Seamlessly return data.frames, images, XML, JSON in local environment data structures 21
22 Deployment data.frame images R objects Application Business Logic C, C++, Java, etc. rjava API RServe client/server TCP/IP rpy2 system() cron independent Application / Dashboards Business Logic C, C++, Java SQL Allow applications and dashboard tools to use familiar, existing SQL protocols for invoking R Invoke R from SQL results as table: structured, image, XML, JSON automate parallel and concurrent execution R Scripts Data Stores RDBMS HDFS NoSQL Hive File system SQL - R Engine R Scripts Data RDBMS HDFS NoSQL Hive 22
23 Deploy R using SQL Store named R function in Script Repository from R or SQL Return values Images as PNG BLOB column data.frame content as database table XML with data.frame and image Benefits Fewer moving parts IPC data transfer speeds at backend Invoke same function from R or SQL Security Integrated backup and recovery begin sys.rqscriptdrop('randomreddots'); sys.rqscriptcreate('randomreddots', 'function(){ id <- 1:10 plot( 1:100, rnorm(100), pch = 21, bg = "red", cex = 2, main="random Red Dots" ) data.frame(id=id, val=id / 100) }'); end; select from select from ID, IMAGE table(rqeval( NULL,'PNG','RandomRedDots')); id, val table(rqeval( NULL,'select 1 id, 1 val from dual', 'RandomRedDots')); -- Return structured and image content within XML string select * from table(rqeval(null, 'XML', 'RandomRedDots')); -- In R, invoke same function by name ore.doeval(fun.name='randomreddots') 23
24 Architectural Elements: Enabling R for Big Data Leverage powerful back-ends for the heavy lifting transparently Leverage new, more powerful back-ends more easily as they appear Enable parallelism quickly and easily for big data processing Immediately leverage data scientist R scripts and results in production environments Production Deployment Transparent Data Access, Analysis, and Exploration Data & Task Parallelism Scalable Machine Learning 24
25 Oracle R Enterprise Part of Oracle Advanced Analytics option to Oracle Database Use Oracle Database as HPC environment Use in-database parallel and distributed machine learning algorithms Manage R scripts and R objects in Oracle Database Integrate R results into applications and dashboards via SQL R Client Oracle R Enterprise Oracle Database User tables In-db stats SQL Interfaces SQL*Plus, SQLDeveloper, Database Server Machine 25
26 Oracle R Enterprise Part of Oracle Advanced Analytics option to Oracle Database Transparency layer Leverage proxy objects (ore.frames) - data remains in the database Overload R functions that translate functionality to SQL Use standard R syntax to manipulate database data Parallel, distributed algorithms Scalability and performance Oracle R Enterprise Exposes in-database algorithms from ODM Additional R-based algorithms executing and database server Embedded R execution Manage and invoke R scripts in Oracle Database Data-parallel, task-parallel, and non-parallel execution Use open source CRAN packages R Client Oracle Database User tables In-db stats SQL Interfaces SQL*Plus, SQLDeveloper, Database Server Machine 26
27 OAA / Oracle R Enterprise Predictive Analytics algorithms in-database plus open source R packages for algorithms in combination with embedded R data- and task-parallel execution Classification Decision Tree Logistic Regression Naïve Bayes Support Vector Machine RandomForest Regression Linear Model Generalized Linear Model Multi-Layer Neural Networks Stepwise Linear Regression Support Vector Machine New in ORE *ODB 12.2 only Clustering Hierarchical k-means Orthogonal Partitioning Expectation Maximization* Attribute Importance Minimum Description Length Expectation Maximization* Anomaly Detection 1 Class Support Vector Machine Market Basket Analysis Apriori Association Rules Feature Extraction Nonnegative Matrix Factorization Principal Component Analysis Singular Value Decomposition Explicit Semantic Analysis* Time Series Single Exponential Smoothing Double Exponential Smoothing 27
28 Oracle R Advanced Analytics for Hadoop Using Hadoop/Hive/Spark Integration, plus R Engine and Open-Source R Packages Oracle R Advanced Analytics for Hadoop (ORAAH) on Hadoop Cluster R interface to HQL Basic Statistics, Data Prep, Joins and View creation Parallel, distributed algorithms: ORAAH Spark algorithms: Deep Neural, GLM, LM Spark MLlib algorithms: LM, GLM, LASSO, Ridge Regression, Decision Trees, Random Forests, SVM, k-means, PCA HQL Use of Open-source R packages via custom R Mappers / Reducers Oracle Database with Advanced Analytics option R Client R Analytics Oracle R Advanced Analytics for Hadoop SQL Client SQL Developer Other SQL Apps
29 Oracle R Advanced Analytics for Hadoop Predictive Analytics algorithms Classification GLM ORAAH Logistic Regression ORAAH Logistic Regression Random Forests Decision Trees Support Vector Machines Clustering Hierarchical k-means Hierarchical k-means Gaussian Mixture Models Regression MLP Neural Networks ORAAH LASSO Ridge Regression Support Vector Machines Random Forest Linear Regression Basic Statistics Correlation/Covariance Feature Extraction Non-negative Matrix Factorization Collaborative Filtering (LMF) Singular Value Decomposition Attribute Importance Principal Components Analysis Principal Components Analysis 29
30 Cloud-Based Machine Learning Oracle Advanced Analytics option including Oracle Data Mining and Oracle R Enterprise on: Oracle Exadata Cloud Service Oracle Database Cloud Service: Included in High Performance and Extreme Performance services Oracle R Advanced Analytics for Hadoop Included in the Oracle Big Data Cloud Service 30
31 Demonstration of ORE 31
32 Join us for the Oracle R Enterprise Hands-on Lab 9:00 Using R for Big Data Advanced Analytics and Machine Learning data exploration / attribute importance clustering regression OREdplyr, and more 32
33 Join us for Big Data with ORAAH 1:00 Using Machine Learning to unlock the Business Value in Big Data 33
34 Join us for new technology intro 9:50 Combining Graph and Machine Learning Technologies using R 34
35 Join us for new technology intro 10:55 Introducing Oracle Machine Learning new notebook technology from Oracle 35
36 Learn More about Oracle s Advanced Analytics R Technologies 36
37
Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016
Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016 Mark Hornick, Director, Advanced Analytics January 27, 2016 Safe Harbor Statement The following
More informationOracle R Technologies
Oracle R Technologies R for the Enterprise Mark Hornick, Director, Oracle Advanced Analytics @MarkHornick mark.hornick@oracle.com Safe Harbor Statement The following is intended to outline our general
More informationOracle Machine Learning Notebook
Oracle Machine Learning Notebook Included in Autonomous Data Warehouse Cloud Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics charlie.berger@oracle.com
More informationSession 4: Oracle R Enterprise Embedded R Execution SQL Interface Oracle R Technologies
Session 4: Oracle R Enterprise 1.5.1 Embedded R Execution SQL Interface Oracle R Technologies Mark Hornick Director, Oracle Advanced Analytics and Machine Learning July 2017 Safe Harbor Statement The following
More informationCombining Graph and Machine Learning Technology using R
Combining Graph and Machine Learning Technology using R Hassan Chafi Oracle Labs Mark Hornick Oracle Advanced Analytics February 2, 2017 Safe Harbor Statement The following is intended to outline our research
More informationIntroducing Oracle Machine Learning
Introducing Oracle Machine Learning A Collaborative Zeppelin notebook for Oracle s machine learning capabilities Charlie Berger Marcos Arancibia Mark Hornick Advanced Analytics and Machine Learning Copyright
More informationOracle Big Data Science
Oracle Big Data Science Tim Vlamis and Dan Vlamis Vlamis Software Solutions 816-781-2880 www.vlamis.com @VlamisSoftware Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri
More informationOracle R Technologies Overview
Oracle R Technologies Overview Mark Hornick, Director, Oracle Advanced Analytics The following is intended to outline our general product direction. It is intended for information
More informationUsing Machine Learning in OBIEE for Actionable BI. By Lakshman Bulusu Mitchell Martin Inc./ Bank of America
Using Machine Learning in OBIEE for Actionable BI By Lakshman Bulusu Mitchell Martin Inc./ Bank of America Using Machine Learning in OBIEE for Actionable BI Using Machine Learning (ML) via Oracle R Technologies
More informationOracle R Technologies Overview
Oracle R Technologies Overview Mark Hornick, Director, Oracle Database Advanced Analytics The following is intended to outline our general product direction. It is intended for information
More informationIntroducing Oracle R Enterprise 1.4 -
Hello, and welcome to this online, self-paced lesson entitled Introducing Oracle R Enterprise. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle. I
More informationOracle Big Data Connectors
Oracle Big Data Connectors Oracle Big Data Connectors is a software suite that integrates processing in Apache Hadoop distributions with operations in Oracle Database. It enables the use of Hadoop to process
More informationSession 3: Oracle R Enterprise Embedded R Execution R Interface
Session 3: Oracle R Enterprise 1.5.1 Embedded R Execution R Interface Oracle R Technologies Mark Hornick Director, Advanced Analytics and Machine Learning mark.hornick@oracle.com October 2018 Copyright
More informationOracle Big Data Science IOUG Collaborate 16
Oracle Big Data Science IOUG Collaborate 16 Session 4762 Tim and Dan Vlamis Tuesday, April 12, 2016 Vlamis Software Solutions Vlamis Software founded in 1992 in Kansas City, Missouri Developed 200+ Oracle
More informationLearning R Series Session 5: Oracle R Enterprise 1.3 Integrating R Results and Images with OBIEE Dashboards Mark Hornick Oracle Advanced Analytics
Learning R Series Session 5: Oracle R Enterprise 1.3 Integrating R Results and Images with OBIEE Dashboards Mark Hornick Oracle Advanced Analytics Learning R Series 2012 Session Title
More informationLearning R Series Session 3: Oracle R Enterprise 1.3 Embedded R Execution
Learning R Series Session 3: Oracle R Enterprise 1.3 Embedded R Execution Mark Hornick, Senior Manager, Development Oracle Advanced Analytics Learning R Series 2012 Session Title
More informationC5##54&6*"6*1%2345*D&'*E2)2*F"4G)&"69
?23(&65*@52%6&6'*A&)(*B*267* C5##54&6*"6*1%2345*D&'*E2)2*F"4G)&"69!"#$%&'%(?2%3"9*
More informationMy name is Brian Pottle. I will be your guide for the next 45 minutes of interactive lectures and review on this lesson.
Hello, and welcome to this online, self-paced lesson entitled ORE Embedded R Scripts: SQL Interface. This session is part of an eight-lesson tutorial series on Oracle R Enterprise. My name is Brian Pottle.
More informationGraph Analytics and Machine Learning A Great Combination Mark Hornick
Graph Analytics and Machine Learning A Great Combination Mark Hornick Oracle Advanced Analytics and Machine Learning November 3, 2017 Safe Harbor Statement The following is intended to outline our research
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More informationGetting Started with Advanced Analytics in Finance, Marketing, and Operations
Getting Started with Advanced Analytics in Finance, Marketing, and Operations Southwest Regional Oracle Applications User Group Dan Vlamis February 24, 2017 @VlamisSoftware Vlamis Software Solutions Vlamis
More informationOracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features
Oracle s Machine Learning and Advanced Analytics Release 12.2 and Oracle Data Miner 4.2 New Features Move the Algorithms; Not the Data! Charlie Berger, MS Engineering, MBA Sr. Director Product Management,
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More informationBrendan Tierney. Running R in the Database using Oracle R Enterprise 05/02/2018. Code Demo
Running R in the Database using Oracle R Enterprise Brendan Tierney Code Demo Data Warehousing since 1997 Data Mining since 1998 Analytics since 1993 1 Agenda What is R? Oracle Advanced Analytics Option
More informationData Science Bootcamp Curriculum. NYC Data Science Academy
Data Science Bootcamp Curriculum NYC Data Science Academy 100+ hours free, self-paced online course. Access to part-time in-person courses hosted at NYC campus Machine Learning with R and Python Foundations
More information<Insert Picture Here>
1 Oracle R Enterprise Training Sessions Session 1: Getting Started with Oracle R Enterprise Mark Hornick, Senior Manager, Development Oracle Advanced Analytics The following is intended
More informationApache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context
1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes
More informationOutrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS
Outrun Your Competition With SAS In-Memory Analytics Sascha Schubert Global Technology Practice, SAS Topics AGENDA Challenges with Big Data Analytics How SAS can help you to minimize time to value with
More informationBig Data Infrastructures & Technologies
Big Data Infrastructures & Technologies Spark and MLLIB OVERVIEW OF SPARK What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop Improves efficiency through: In-memory
More informationScalable Machine Learning in R. with H2O
Scalable Machine Learning in R with H2O Erin LeDell @ledell DSC July 2016 Introduction Statistician & Machine Learning Scientist at H2O.ai in Mountain View, California, USA Ph.D. in Biostatistics with
More informationORAAH Change List Summary. ORAAH Change List Summary
ORAAH 2.7.0 Change List Summary i ORAAH 2.7.0 Change List Summary ORAAH 2.7.0 Change List Summary ii REVISION HISTORY NUMBER DATE DESCRIPTION NAME ORAAH 2.7.0 Change List Summary iii Contents 1 ORAAH 2.7.0
More informationOracle R Enterprise. New Features in Oracle R Enterprise 1.5. Release Notes Release 1.5
Oracle R Enterprise Release Notes Release 1.5 E49659-04 December 2016 These release notes contain important information about Release 1.5 of Oracle R Enterprise. New Features in Oracle R Enterprise 1.5
More informationOracle Machine Learning & Advanced Analytics
Oracle Machine Learning & Advanced Analytics Data Management Platforms Move the Algorithms; Not the Data! Detlef E. Schröder Master Principal Sales Consultant Machine Learning, AI and Cognitive Analytics
More informationOracle Big Data Discovery
Oracle Big Data Discovery Turning Data into Business Value Harald Erb Oracle Business Analytics & Big Data 1 Safe Harbor Statement The following is intended to outline our general product direction. It
More informationSession 7: Oracle R Enterprise OAAgraph Package
Session 7: Oracle R Enterprise 1.5.1 OAAgraph Package Oracle Spatial and Graph PGX Graph Algorithms Oracle R Technologies Mark Hornick Director, Oracle Advanced Analytics and Machine Learning July 2017
More informationOracle R Advanced Analytics for Hadoop Release Notes. Oracle R Advanced Analytics for Hadoop Release Notes
Oracle R Advanced Analytics for Hadoop 2.7.1 Release Notes i Oracle R Advanced Analytics for Hadoop 2.7.1 Release Notes Oracle R Advanced Analytics for Hadoop 2.7.1 Release Notes ii REVISION HISTORY NUMBER
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More informationTackling Big Data Using MATLAB
Tackling Big Data Using MATLAB Alka Nair Application Engineer 2015 The MathWorks, Inc. 1 Building Machine Learning Models with Big Data Access Preprocess, Exploration & Model Development Scale up & Integrate
More informationMatrix Computations and " Neural Networks in Spark
Matrix Computations and " Neural Networks in Spark Reza Zadeh Paper: http://arxiv.org/abs/1509.02256 Joint work with many folks on paper. @Reza_Zadeh http://reza-zadeh.com Training Neural Networks Datasets
More informationDATA SCIENCE USING SPARK: AN INTRODUCTION
DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data
More informationOracle s Machine Learning and Advanced Analytics Data Management Platforms. Move the Algorithms; Not the Data
Oracle s Machine Learning and Advanced Analytics Data Management Platforms Move the Algorithms; Not the Data Disclaimer The following is intended to outline our general product direction. It is intended
More informationSpotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationDistributed Machine Learning" on Spark
Distributed Machine Learning" on Spark Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Outline Data flow vs. traditional network programming Spark computing engine Optimization Example Matrix Computations
More informationAn Introduction to Apache Spark
An Introduction to Apache Spark 1 History Developed in 2009 at UC Berkeley AMPLab. Open sourced in 2010. Spark becomes one of the largest big-data projects with more 400 contributors in 50+ organizations
More informationBig Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours
Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals
More informationOracle Machine Learning Notebook
Oracle Machine Learning Notebook Included in Autonomous Data Warehouse Cloud Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics charlie.berger@oracle.com
More informationR mit Exadata und Exalytics Erfahrungen mit R
R mit Exadata und Exalytics Erfahrungen mit R Oliver Bracht Managing Director eoda Matthias Fuchs Senior Consultant ISE Information Systems Engineering GmbH extreme Datamining with Oracle R Enterprise
More informationDistributed Computing with Spark
Distributed Computing with Spark Reza Zadeh Thanks to Matei Zaharia Outline Data flow vs. traditional network programming Limitations of MapReduce Spark computing engine Numerical computing on Spark Ongoing
More informationDistributed Computing with Spark and MapReduce
Distributed Computing with Spark and MapReduce Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Traditional Network Programming Message-passing between nodes (e.g. MPI) Very difficult to do at scale:» How
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More informationKNIME for the life sciences Cambridge Meetup
KNIME for the life sciences Cambridge Meetup Greg Landrum, Ph.D. KNIME.com AG 12 July 2016 What is KNIME? A bit of motivation: tool blending, data blending, documentation, automation, reproducibility More
More informationOracle Big Data Fundamentals Ed 1
Oracle University Contact Us: +0097143909050 Oracle Big Data Fundamentals Ed 1 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big Data
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More informationTurning Relational Database Tables into Spark Data Sources
Turning Relational Database Tables into Spark Data Sources Kuassi Mensah Jean de Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 3 Safe Harbor Statement The following
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationScaled Machine Learning at Matroid
Scaled Machine Learning at Matroid Reza Zadeh @Reza_Zadeh http://reza-zadeh.com Machine Learning Pipeline Learning Algorithm Replicate model Data Trained Model Serve Model Repeat entire pipeline Scaling
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Big Data Connectors: High Performance Integration for Hadoop and Oracle Database Melli Annamalai Sue Mavris Rob Abbott 2 Program Agenda Big Data Connectors: Brief Overview Connecting Hadoop with Oracle
More informationHow to Troubleshoot Databases and Exadata Using Oracle Log Analytics
How to Troubleshoot Databases and Exadata Using Oracle Log Analytics Nima Haddadkaveh Director, Product Management Oracle Management Cloud October, 2018 Copyright 2018, Oracle and/or its affiliates. All
More informationOracle R Enterprise Platform and Configuration Requirements Oracle R Enterprise runs on 64-bit platforms only.
Oracle R Enterprise Release Notes Release 1.5.1 E83205-02 April 2017 These release notes contain important information about Release 1.5.1 of Oracle R Enterprise. New Features in Oracle R Enterprise 1.5.1
More informationUsing Existing Numerical Libraries on Spark
Using Existing Numerical Libraries on Spark Brian Spector Chicago Spark Users Meetup June 24 th, 2015 Experts in numerical algorithms and HPC services How to use existing libraries on Spark Call algorithm
More informationIBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics
IBM Data Science Experience White paper R Transforming R into a tool for big data analytics 2 R Executive summary This white paper introduces R, a package for the R statistical programming language that
More informationOracle Enterprise Data Quality - Roadmap
Oracle Enterprise Data Quality - Roadmap Mike Matthews Martin Boyd Director, Product Management Senior Director, Product Strategy Copyright 2014 Oracle and/or its affiliates. All rights reserved. Oracle
More informationDeploying, Managing and Reusing R Models in an Enterprise Environment
Deploying, Managing and Reusing R Models in an Enterprise Environment Making Data Science Accessible to a Wider Audience Lou Bajuk-Yorgan, Sr. Director, Product Management Streaming and Advanced Analytics
More informationStages of Data Processing
Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,
More informationIndex. bfs() function, 225 Big data characteristics, 2 variety, 3 velocity, 3 veracity, 3 volume, 2 Breadth-first search algorithm, 220, 225
Index A Anonymous function, 66 Apache Hadoop, 1 Apache HBase, 42 44 Apache Hive, 6 7, 230 Apache Kafka, 8, 178 Apache License, 7 Apache Mahout, 5 Apache Mesos, 38 42 Apache Pig, 7 Apache Spark, 9 Apache
More informationOracle Big Data. A NA LYT ICS A ND MA NAG E MENT.
Oracle Big Data. A NALYTICS A ND MANAG E MENT. Oracle Big Data: Redundância. Compatível com ecossistema Hadoop, HIVE, HBASE, SPARK. Integração com Cloudera Manager. Possibilidade de Utilização da Linguagem
More informationThis document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and
AI and Visual Analytics: Machine Learning in Business Operations Steven Hillion Senior Director, Data Science Anshuman Mishra Principal Data Scientist DISCLAIMER During the course of this presentation,
More informationIntroducing Microsoft SQL Server 2016 R Services. Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone
Introducing Microsoft SQL Server 2016 R Services Julian Lee Advanced Analytics Lead Global Black Belt Asia Timezone SQL Server 2016: Everything built-in built-in built-in built-in built-in built-in $2,230
More informationJavaentwicklung in der Oracle Cloud
Javaentwicklung in der Oracle Cloud Sören Halter Principal Sales Consultant 2016-11-17 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information
More informationEvent: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect
Event: PASS SQL Saturday - DC 2018 Presenter: Jon Tupitza, CTO Architect BEOP.CTO.TP4 Owner: OCTO Revision: 0001 Approved by: JAT Effective: 08/30/2018 Buchanan & Edwards Proprietary: Printed copies of
More informationORAAH Change List Summary. ORAAH Change List Summary
ORAAH 2.7.1 Change List Summary i ORAAH 2.7.1 Change List Summary ORAAH 2.7.1 Change List Summary ii REVISION HISTORY NUMBER DATE DESCRIPTION NAME ORAAH 2.7.1 Change List Summary iii Contents 1 ORAAH 2.7.1
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationWe are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info
We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423
More informationSparkling Water. August 2015: First Edition
Sparkling Water Michal Malohlava Alex Tellez Jessica Lanford http://h2o.gitbooks.io/sparkling-water-and-h2o/ August 2015: First Edition Sparkling Water by Michal Malohlava, Alex Tellez & Jessica Lanford
More informationSpotfire: Brisbane Breakfast & Learn. Thursday, 9 November 2017
Spotfire: Brisbane Breakfast & Learn Thursday, 9 November 2017 CONFIDENTIALITY The following information is confidential information of TIBCO Software Inc. Use, duplication, transmission, or republication
More informationAsanka Padmakumara. ETL 2.0: Data Engineering with Azure Databricks
Asanka Padmakumara ETL 2.0: Data Engineering with Azure Databricks Who am I? Asanka Padmakumara Business Intelligence Consultant, More than 8 years in BI and Data Warehousing A regular speaker in data
More informationChapter 1 - The Spark Machine Learning Library
Chapter 1 - The Spark Machine Learning Library Objectives Key objectives of this chapter: The Spark Machine Learning Library (MLlib) MLlib dense and sparse vectors and matrices Types of distributed matrices
More informationGain Greater Productivity in Enterprise Data Mining
Clementine 9.0 Specifications Gain Greater Productivity in Enterprise Data Mining Discover patterns and associations in your organization s data and make decisions that lead to significant, measurable
More informationBoost your Analytics with Machine Learning for SQL Nerds. Julie mssqlgirl.com
Boost your Analytics with Machine Learning for SQL Nerds Julie Koesmarno @MsSQLGirl mssqlgirl.com 1. Y ML 2. Operationalizing ML 3. Tips & Tricks 4. Resources automation delighting customers Deepen Engagement
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Dr. Roland Michaely 2015 The MathWorks, Inc. 1 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics
More informationWhat is Gluent? The Gluent Data Platform
What is Gluent? The Gluent Data Platform The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the
More informationCloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 3 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
More informationACHIEVEMENTS FROM TRAINING
LEARN WELL TECHNOCRAFT DATA SCIENCE/ MACHINE LEARNING SYLLABUS 8TH YEAR OF ACCOMPLISHMENTS AUTHORIZED GLOBAL CERTIFICATION CENTER FOR MICROSOFT, ORACLE, IBM, AWS AND MANY MORE. 8411002339/7709292162 WWW.DW-LEARNWELL.COM
More information<Insert Picture Here> MySQL Cluster What are we working on
MySQL Cluster What are we working on Mario Beck Principal Consultant The following is intended to outline our general product direction. It is intended for information purposes only,
More informationUsing Numerical Libraries on Spark
Using Numerical Libraries on Spark Brian Spector London Spark Users Meetup August 18 th, 2015 Experts in numerical algorithms and HPC services How to use existing libraries on Spark Call algorithm with
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationBlended Learning Outline: Cloudera Data Analyst Training (171219a)
Blended Learning Outline: Cloudera Data Analyst Training (171219a) Cloudera Univeristy s data analyst training course will teach you to apply traditional data analytics and business intelligence skills
More informationBig Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL
Big Data Analytics with Oracle Advanced Analytics 12c and Big Data SQL Make Big Data + Analytics Simple Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Data Mining and Advanced Analytics
More informationMachine Learning and SystemML. Nikolay Manchev Data Scientist Europe E-
Machine Learning and SystemML Nikolay Manchev Data Scientist Europe E- mail: nmanchev@uk.ibm.com @nikolaymanchev A Simple Problem In this activity, you will analyze the relationship between educational
More informationMATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by
1 MATLAB is a multi-paradigm numerical computing environment fourth-generation programming language. A proprietary programming language developed by MathWorks In 2004, MATLAB had around one million users
More informationMachine Learning In A Snap. Thomas Parnell Research Staff Member IBM Research - Zurich
Machine Learning In A Snap Thomas Parnell Research Staff Member IBM Research - Zurich What are GLMs? Ridge Regression Support Vector Machines Regression Generalized Linear Models Classification Lasso Regression
More informationFault Detection using Advanced Analytics at CERN's Large Hadron Collider
Fault Detection using Advanced Analytics at CERN's Large Hadron Collider Antonio Romero Marín Manuel Martin Marquez USA - 27/01/2016 BIWA 16 1 What s CERN USA - 27/01/2016 BIWA 16 2 What s CERN European
More informationPython With Data Science
Course Overview This course covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases. Who Should Attend Data Scientists, Software Developers,
More informationAutomating Information Lifecycle Management with
Automating Information Lifecycle Management with Oracle Database 2c The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated
More informationOracle Data Integrator 12c: Integration and Administration
Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Oracle Data Integrator 12c: Integration and Administration Duration: 5 Days What you will learn Oracle Data Integrator is a comprehensive
More informationBig Data Analytics using Apache Hadoop and Spark with Scala
Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important
More information