Zeppelin on Oracle Big Data Solutions

1 Zeppelin on Oracle Big Data Solutions. Marcos Arancibia, Product Manager, Data Science and Big Data. October 23, 2018. Copyright © 2018, Oracle and/or its affiliates. All rights reserved.

2 Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

3 Oracle Big Data Manager

4 Conceptual: Actionable Events; Actionable Data Sets; Actionable Metrics; Input Events; Streaming Engine; Enterprise Data & Reporting; Structured Enterprise Data; Execution; Innovation; Data; Discovery Output

5 Practical: Actionable Events; Actionable Data Sets; Actionable Metrics; Input Events; Object Store; Hadoop/HDFS; Streaming Engine; Enterprise Data & Reporting; Structured Enterprise Data; Execution; Innovation; Data; Notebooks/Analytic Services; Discovery Output

6 Big Data Manager. Included with all Big Data offerings (BDA, BDCC and BDCS). Enables massive parallel copy of data leveraging Apache Spark: HDFS <--> HDFS; HDFS <--> Object Storage (Cloud); file diff-ing and checking after copies. Build and manage pipelines. Embedded Zeppelin Notebook: analyze data instantly; analyze at scale with Oracle R Advanced Analytics for Hadoop.

7 Included with all Big Data offerings (BDA, BDCC and BDCS). Enables massive parallel copy of data leveraging Apache Spark: HDFS <--> HDFS; HDFS <--> Object Storage (Cloud); file diff-ing and checking after copies. Build and manage pipelines. Embedded Zeppelin Notebook: analyze data instantly; analyze at scale with Oracle R Advanced Analytics for Hadoop. Autonomous Hub - Big Data Cloud Service, Moscone South (Monday - [...])

8

9 File browser enables self-service data movement, for example from HDFS to Object Storage.

10 Importantly, this drag-and-drop or copy is turned into a Spark program, which is executed or scheduled.
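The copy-then-check idea behind this data movement can be sketched on a local file system. This is only an illustration under stated assumptions: Big Data Manager generates a Spark program, whereas the sketch below uses Python threads and SHA-256 digests as stand-ins, and the helper names `sha256_of`, `copy_and_verify` and `parallel_copy` are hypothetical.

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target_dir: Path) -> bool:
    """Copy one file, then check that source and copy have the same digest."""
    target = target_dir / source.name
    shutil.copyfile(source, target)
    return sha256_of(source) == sha256_of(target)


def parallel_copy(sources, target_dir: Path, workers: int = 4) -> dict:
    """Copy files concurrently and map each file name to its check result."""
    sources = list(sources)
    target_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda src: copy_and_verify(src, target_dir), sources))
    return {src.name: ok for src, ok in zip(sources, results)}
```

Calling `parallel_copy(list_of_paths, Path("/some/target"))` returns a dictionary such as `{"part-0.txt": True}`, where `False` would flag a copy whose content differs from its source.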

11

12 Pick your favorite notebook environment and start to code in here against your analytics libraries. The easiest way to build out a lab is to leverage some known basics and clustering to run jobs in parallel. Use libraries like R, TensorFlow and Caffe for your [...], if possible in parallel. Hadoop/HDFS

13 Big Data Manager with R Interpreter. Ability to visualize several sources from Notebooks on BDA, BDCS and BDCC.

14 Announcing: ORAAH 2.8.0 for Spark 2.x

15 What is ORAAH (Oracle R Advanced Analytics for Hadoop)? ORAAH is a set of R packages and Java libraries that provide:
An R interface for manipulating data stored in a local File System, HDFS, HIVE, Impala or JDBC sources, and creating Distributed Model Matrices across a Cluster of Hadoop Nodes in preparation for ML.
A general computation framework where users invoke parallel, distributed MapReduce jobs from R, writing custom mappers and reducers in R while also leveraging open source CRAN packages.
Parallel and distributed Machine Learning algorithms that take advantage of all the nodes of a Hadoop cluster for scalable, high performance modeling on big data. Functions use the expressive R formula object optimized for Spark parallel execution. ORAAH's custom LM/GLM/MLP NN algorithms on Spark scale better and run faster than the open-source Spark MLlib functions, but ORAAH provides interfaces to MLlib as well.

16 Where is ORAAH available? On premises: part of the Oracle Big Data Connectors license for the Oracle Big Data Appliance, DIY Cloudera clusters and DIY Hortonworks clusters. On Oracle Cloud: part of the Oracle Big Data Connectors license that is included with the Oracle Big Data Cloud Service and the Oracle Big Data Cloud at Customer; included as part of the Big Data Cloud (formerly known as Compute Edition).

17 ORAAH Benefits: Making Spark MLlib better for R users
The ORAAH formula parser can handle the full set of open-source R formula transformations, so it can be used with any Spark MLlib algorithm supported by ORAAH. Even in newer Spark releases (Oct 2018), SparkR fails to process a simple interaction between attributes.
Using the Spark MLlib Logistic Regression model in SparkR fails:
R> model <- glm(Kyphosis ~ (Age + Number)^2, df, family = "binomial")
ERROR RBackendHandler: fitRModelFormula on org.apache.spark.ml.api.r.SparkRWrappers failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : java.lang.IllegalArgumentException: Could not parse formula: Kyphosis ~ (Age + Number)^2
Using the Spark MLlib Logistic Regression model via ORAAH:
R> model <- orch.ml.logistic(Kyphosis ~ (Age + Number)^2, data = data)
OBX Model Matrix: processed 1 factor variables, sec
OBX Model Matrix: created MLlib LabeledPoint RDD (81 rows) sec
OBX Machine Learning: MLlib Logistic Regression elapsed time sec
R> model$coefficients
[1]
This produces exactly the same result as open-source R:
glm(Kyphosis ~ (Age + Number)^2, data = kyphosis, family = "binomial")$coefficients
(Intercept) Age Number Age:Number
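To make the formula semantics concrete: in an R formula, `(Age + Number)^2` means "all main effects plus interactions up to order 2", not arithmetic squaring, which is why the coefficient names are Age, Number and Age:Number. The following is a minimal sketch of that expansion only, not ORAAH's or Spark's parser; the function name `expand_power_terms` is hypothetical.

```python
from itertools import combinations


def expand_power_terms(variables, degree):
    """Expand (v1 + v2 + ...)^degree into main effects and interactions.

    Follows the R formula convention in which ^ means crossing up to the
    given order, so each variable appears at most once per term.
    """
    terms = []
    for order in range(1, degree + 1):
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms


print(expand_power_terms(["Age", "Number"], 2))
# ['Age', 'Number', 'Age:Number']
```

A parser that handles this operator only has to generate these term names and then build one model-matrix column per term, with the interaction column being the product of its parent columns.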

18 ORAAH and Python: Simple and clean code: building a Spark MLlib Random Forest model from a HIVE source.
Python user steps (47 lines): Load Libraries; Establish Spark Session; Process Formula; Copy data from HIVE; Create 3rd copy of Data for vectors; Build Model; Single Vector of Predictions.
ORAAH user steps (14 lines): Load Libraries; Establish HIVE and Spark Session; Build Model directly against HIVE (also HDFS, IMPALA, JDBC or Spark DF) data with full formula support; Predictions exported with desired columns, no need to glue back original columns.

19 Machine Learning Algorithms and Utilities in ORAAH. Spark 2.1+ algorithms by Oracle, interfaces to Spark MLlib, plus HIVE, Impala and Spark DF interfaces. RED indicates new in release.
Classification: Extreme Learning Machines (Oracle's MPI/Spark-based); Hierarchical-ELM (Oracle's MPI/Spark-based); Multi-Layer Neural Nets (Oracle's Spark-based); Logistic Regression (Oracle's Spark-based); Gradient Boosted Trees (Spark MLlib); Logistic Regression (Spark MLlib); Decision Trees (Spark MLlib); Random Forest (Spark MLlib).
Regression: Multi-Layer Neural Nets (Oracle's Spark-based); Linear Regression Model (Oracle's Spark-based); Gradient Boosted Trees (Spark MLlib); Linear Regression Model (Spark MLlib); Support Vector Machine (SVM) (Spark MLlib); LASSO (Spark MLlib); Ridge Regression (Spark MLlib); Random Forest (Spark MLlib); Decision Trees (Spark MLlib).
Clustering: Hierarchical k-means (Spark MLlib); Gaussian Mixture Models (Spark MLlib); Hierarchical k-means (also available in Map-Red).
Feature Extraction & Creation: Distributed Stochastic PCA (Oracle's MPI/Spark-based); Distributed Stochastic SVD (Oracle's MPI/Spark-based); Principal Component Analysis (Spark MLlib); Nonnegative Matrix Factorization (Map-Red); Low Rank Matrix Factorization (Map-Red).
Transparency Functions with IMPALA and HIVE: Aggregations, Table Joins, summarization; Variable Creation, Push & Pull data from IMPALA and HIVE; Ability to push and pull data from Oracle Database; JDBC Driver interface - build Spark DataFrames for ORAAH.
Open Source R Algorithms: Ability to run any R package via our hadoop.run function in Map-Reduce mode.

20 [...] for data that fits in memory... ...but also able to solve a 10-billion-row model (which cannot fit entirely in memory). All tests run on a 6-Node Big Data Appliance X7-2 with 256GB of RAM per Node. Formula: [...] ~ distance + origin + dest + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum)
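Each `as.factor(...)` term in a formula like this one becomes several 0/1 columns in the model matrix, which is part of why such models grow wide as well as long. The sketch below shows standard treatment (dummy) coding, where the first level acts as the baseline. It is an illustration of the general technique, not ORAAH's model-matrix code, and `dummy_code` is a hypothetical helper.

```python
def dummy_code(values):
    """Treatment (dummy) coding: one 0/1 column per level except the first.

    Levels are taken in sorted order, and the first level is the baseline
    that is absorbed by the intercept.
    """
    levels = sorted(set(values))
    columns = levels[1:]
    rows = [[1 if value == level else 0 for level in columns] for value in values]
    return columns, rows


columns, rows = dummy_code([1, 2, 3, 2])
print(columns)  # [2, 3]
print(rows)     # [[0, 0], [1, 0], [0, 1], [1, 0]]
```

Under this coding a factor with k distinct levels contributes k - 1 columns, so a 12-level month variable adds 11 columns and a high-cardinality variable such as a flight number adds far more.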

21

22 ORAAH's New Distributed SVD. Benchmark of ORAAH (Spark+MPI) vs Spark MLlib: 6x faster + linear scale up. Singular Value Decomposition, 20k x 20k dense input (3.2GB), 10 threads (12 actual cores, [...]). [Chart: Spark MLlib vs ORAAH Spark-MPI, by Rank]
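The 3.2GB figure for the dense input follows directly from the matrix shape, assuming each entry is stored as an 8-byte double (an assumption, since the slide does not state the element type). The helper name `dense_matrix_gb` is hypothetical.

```python
def dense_matrix_gb(rows, cols, bytes_per_value=8):
    """Size of a dense matrix in decimal gigabytes (1 GB = 1e9 bytes)."""
    return rows * cols * bytes_per_value / 1e9


# 20,000 x 20,000 entries at 8 bytes each
print(dense_matrix_gb(20_000, 20_000))  # 3.2
```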

23 New ORAAH Utility functions for Data Processing and Ingest. JDBC and Spark Data Frames

24 ORAAH 2.8.0 - New Spark DF data manipulation functions

25 ORAAH 2.8.0 - New Spark DF data manipulation functions. orch.csv2df: Interprets a CSV file and loads it in memory into a Spark DF. [...] system or HDFS

26 ORAAH 2.8.0 - New Spark DF data manipulation functions. orch.describe: Generates a simple summary of the information in a Spark DF.

27 ORAAH 2.8.0 - New Spark DF data manipulation functions. orch.collect: Brings a Spark DF into R's local memory for further manipulation or result printing.

28 ORAAH 2.8.0 - New Spark DF Scaling function

29 ORAAH 2.8.0 - New Spark DF Scaling function. orch.scale: Scales a Spark DF using one of 15 [...]

30 ORAAH 2.8.0 - New Spark DF Scaling function. $show: Spark DF accept many rJava default functions, from show() to count().

31 New JDBC interface. Creates a Spark DataFrame from a JDBC source that can be used on any of [...]

32 Model performance sample for Allstate Prediction Challenge. Sample out of the original 13.4 million records, 60%/40% split for Testing. ORAAH Spark-MPI ELM: almost as good as the [...], 3x faster to build. ORAAH Spark GLM2: fastest to build and score.

33 Model performance sample for Allstate Prediction Challenge. Sample out of the original 13.4 million records, 60%/40% split for Testing.
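A 60%/40% split of this kind can be sketched as a seeded shuffle followed by a cut. The slides do not say how the split was produced or which side was used for training, so this is an assumption-laden illustration (60% treated as training, 40% as testing), and `split_60_40` is a hypothetical helper.

```python
import random


def split_60_40(records, seed=42):
    """Shuffle reproducibly, then return (first 60%, remaining 40%)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 6 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


train, test = split_60_40(range(10))
print(len(train), len(test))  # 6 4
```

Fixing the seed keeps the two partitions identical across runs, which matters when build times and accuracy of several algorithms are compared on the same data.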

34 Live Demo: Big Data Manager Notebooks. Working with Large Spark Clusters

35

36

37

38

39

40

41

42

43

44

45

46 Motivation for the future. Regression. Decision Trees. http://www.mlyearning.org/

47 cloudcustomerconnect.oracle.com

48

49


More information

Data Analytics and Machine Learning: From Node to Cluster

Data Analytics and Machine Learning: From Node to Cluster Data Analytics and Machine Learning: From Node to Cluster Presented by Viswanath Puttagunta Ganesh Raju Understanding use cases to optimize on ARM Ecosystem Date BKK16-404B March 10th, 2016 Event Linaro

More information

COPYRIGHT DATASHEET

COPYRIGHT DATASHEET Your Path to Enterprise AI To succeed in the world s rapidly evolving ecosystem, companies (no matter what their industry or size) must use data to continuously develop more innovative operations, processes,

More information

