Zeppelin on Oracle Big Data Solutions
- Joleen Sutton
Transcription
1 Zeppelin on Oracle Big Data Solutions. Marcos Arancibia, Product Manager, Data Science and Big Data. October 23, 2018. Copyright © 2018, Oracle and/or its affiliates. All rights reserved.
2 Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle. Copyright 2018, Oracle and/or its affiliates. All rights reserved.
3 Oracle Big Data Manager
4 Conceptual (diagram; labels only): Input; Events; Streaming Engine; Enterprise Data & Reporting; Structured Enterprise Data; Execution; Innovation; Data; Discovery; Output; Actionable Events; Actionable Data Sets; Actionable Metrics
5 Practical (diagram; labels only): Input; Events; Object Store; Hadoop/HDFS; Streaming Engine; Enterprise Data & Reporting; Structured Enterprise Data; Execution; Innovation; Data; Notebooks/Analytic Services; Discovery; Output; Actionable Events; Actionable Data Sets; Actionable Metrics
6 Big Data Manager
- Included with all Big Data offerings (BDA, BDCC and BDCS)
- Enables massive parallel copy of data leveraging Apache Spark
  - HDFS <-> HDFS
  - HDFS <-> Object Storage (Cloud)
  - File diffing and checking after copies
- Build and manage pipelines
- Embedded Zeppelin Notebook
  - Analyze data instantly
  - Analyze at scale with Oracle R Advanced Analytics for Hadoop
7 Big Data Manager (same bullets as the previous slide), with a callout: Autonomous Hub, Big Data Cloud Service, Moscone South (Monday, [illegible])
9 File browser enables self-service data movement, for example from HDFS to Object Storage
10 Importantly, this drag-and-drop or copy is turned into a Spark program, which is executed or scheduled
12 Pick your favorite notebook environment and start to code in here against your analytics libraries. The easiest way to build out a lab is to leverage some known basics and clustering to run jobs in parallel. Use libraries like R, TensorFlow and Caffe for your [...], if possible in parallel. (Diagram label: Hadoop/HDFS)
13 Big Data Manager with R Interpreter: Ability to visualize several sources from Notebooks on BDA, BDCS and BDCC
14 Announcing: ORAAH 2.8.0 for Spark 2.x
15 What is ORAAH (Oracle R Advanced Analytics for Hadoop)?
ORAAH is a set of R packages and Java libraries that provide:
- An R interface for manipulating data stored in a local file system, HDFS, HIVE, Impala or JDBC sources, and for creating Distributed Model Matrices across a cluster of Hadoop nodes in preparation for ML.
- A general computation framework where users invoke parallel, distributed MapReduce jobs from R, writing custom mappers and reducers in R while also leveraging open-source CRAN packages.
- Parallel and distributed Machine Learning algorithms that take advantage of all the nodes of a Hadoop cluster for scalable, high-performance modeling on big data. Functions use the expressive R formula object, optimized for Spark parallel execution.
- ORAAH's custom LM/GLM/MLP NN algorithms on Spark scale better and run faster than the open-source Spark MLlib functions, but ORAAH provides interfaces to MLlib as well.
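The mapper/reducer style described on this slide can be illustrated in plain R. The following is only a local, single-process sketch of the map, shuffle and reduce phases; it does not use ORAAH or Hadoop, and the names `records`, `mapper` and `reducer` are hypothetical.

```r
# Local illustration of map -> shuffle -> reduce in base R (no Hadoop, no ORAAH).
# All names here (records, mapper, reducer) are hypothetical.
records <- c("spark hadoop spark", "hive spark", "hadoop hive hive")

# Mapper: emit one (key, value) pair per word in a record.
mapper <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}

# Reducer: combine all values that share a key.
reducer <- function(key, values) {
  data.frame(key = key, count = sum(values), stringsAsFactors = FALSE)
}

# Map phase: apply the mapper to every record.
mapped <- do.call(rbind, lapply(records, mapper))

# Shuffle phase: group the emitted values by key.
grouped <- split(mapped$value, mapped$key)

# Reduce phase: apply the reducer to each group of values.
reduced <- do.call(rbind, Map(reducer, names(grouped), grouped))
rownames(reduced) <- NULL
print(reduced)
```

In a real cluster the map and reduce phases would run on different nodes; here `lapply` and `Map` simply stand in for that distribution.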
16 Where is ORAAH available?
On premises:
- Part of the Oracle Big Data Connectors license for the Oracle Big Data Appliance, DIY Cloudera clusters and DIY Hortonworks clusters.
On Oracle Cloud:
- Part of the Oracle Big Data Connectors license that is included with the Oracle Big Data Cloud Service and the Oracle Big Data Cloud at Customer.
- Included as part of the Big Data Cloud (formerly known as Compute Edition).
17 ORAAH Benefits: Making Spark MLlib better for R users
The ORAAH formula parser can handle the full set of open-source R formula transformations, so it can be used with any Spark MLlib algorithm supported by ORAAH. Even in newer Spark releases (Oct 2018), SparkR fails to process a simple interaction between attributes.
Using a Spark MLlib Logistic Regression model in SparkR fails:
    R> model <- glm(Kyphosis ~ (Age + Number)^2, df, family = "binomial")
    ERROR RBackendHandler: fitRModelFormula on org.apache.spark.ml.api.r.SparkRWrappers failed
    Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
      java.lang.IllegalArgumentException: Could not parse formula: Kyphosis ~ (Age + Number)^2
Using a Spark MLlib Logistic Regression model via ORAAH:
    R> model <- orch.ml.logistic(Kyphosis ~ (Age + Number)^2, data = data)
    OBX Model Matrix: processed 1 factor variables, sec
    OBX Model Matrix: created MLlib LabeledPoint RDD (81 rows) sec
    OBX Machine Learning: MLlib Logistic Regression elapsed time sec
    R> model$coefficients
    [1]
This produces exactly the same result as open-source R:
    glm(Kyphosis ~ (Age + Number)^2, data = kyphosis, family = "binomial")$coefficients
    (Intercept) Age Number Age:Number
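As a minimal sketch in base R (no Spark or ORAAH involved), the interaction formula from this slide can be expanded with `terms()` and `model.matrix()` to see which columns a formula parser has to produce. The data frame `toy` is a made-up stand-in, not the kyphosis data.

```r
# Base R only: show how the formula from the slide expands into model terms.
f <- Kyphosis ~ (Age + Number)^2
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Synthetic stand-in data (NOT the kyphosis data set used on the slide).
toy <- data.frame(Age = c(10, 20, 30, 40), Number = c(2, 3, 4, 5))

# Build the design matrix for the right-hand side of the formula.
mm <- model.matrix(~ (Age + Number)^2, data = toy)
print(colnames(mm))

# The interaction column is the element-wise product of the two main effects.
interaction_ok <- all(mm[, "Age:Number"] == toy$Age * toy$Number)
print(interaction_ok)
```

The column names match the coefficient names shown for open-source `glm()` on the slide, which is the expansion SparkR's parser rejected.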
18 ORAAH and Python: simple and clean code. Building a Spark MLlib Random Forest model from a HIVE source.
Python user steps (47 lines):
- Load libraries
- Establish Spark session
- Process formula
- Copy data from HIVE
- Create 3rd copy of data for vectors
- Build model
- Single vector of predictions
ORAAH user steps (14 lines):
- Load libraries
- Establish HIVE and Spark session
- Build model directly against HIVE (also HDFS, IMPALA, JDBC or Spark DF) data with full formula support
- Predictions exported with desired columns, no need to glue back original columns
19 Machine Learning Algorithms and Utilities in ORAAH (Spark 2.1+)
Algorithms by Oracle, interfaces to Spark MLlib, plus HIVE, Impala and Spark DF interfaces. Red indicates new in release.
Classification
- Extreme Learning Machines (Oracle's MPI/Spark-based)
- Hierarchical-ELM (Oracle's MPI/Spark-based)
- Multi-Layer Neural Nets (Oracle's Spark-based)
- Logistic Regression (Oracle's Spark-based)
- Gradient Boosted Trees (Spark MLlib)
- Logistic Regression (Spark MLlib)
- Decision Trees (Spark MLlib)
- Random Forest (Spark MLlib)
Regression
- Multi-Layer Neural Nets (Oracle's Spark-based)
- Linear Regression Model (Oracle's Spark-based)
- Gradient Boosted Trees (Spark MLlib)
- Linear Regression Model (Spark MLlib)
- Support Vector Machine (SVM) (Spark MLlib)
- LASSO (Spark MLlib)
- Ridge Regression (Spark MLlib)
- Random Forest (Spark MLlib)
- Decision Trees (Spark MLlib)
Clustering
- Hierarchical k-means (Spark MLlib)
- Gaussian Mixture Models (Spark MLlib)
- Hierarchical k-means (also available in Map-Red)
Feature Extraction & Creation
- Distributed Stochastic PCA (Oracle's MPI/Spark-based)
- Distributed Stochastic SVD (Oracle's MPI/Spark-based)
- Principal Component Analysis (Spark MLlib)
- Nonnegative Matrix Factorization (Map-Red)
- Low Rank Matrix Factorization (Map-Red)
Transparency Functions with IMPALA and HIVE
- Aggregations, table joins, summarization
- Variable creation, push and pull data from IMPALA and HIVE
- Ability to push and pull data from Oracle Database
- JDBC Driver interface: build Spark DataFrames for ORAAH
Open Source R Algorithms
- Ability to run any R package via our hadoop.run function in Map-Reduce mode
20 ... for data that fits in memory ... but also able to solve a 10-billion-row model (which cannot fit entirely in memory)
All tests run on a 6-node Big Data Appliance X7-2 with 256 GB of RAM per node.
Formula: [...] ~ distance + origin + dest + as.factor(month) + as.factor(year) + as.factor(dayofmonth) + as.factor(dayofweek) + as.factor(flightnum)
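Formulas that wrap columns in `as.factor()`, as on this slide, expand each factor into one indicator column per non-baseline level, which is why the model matrix grows quickly. A small base-R sketch of that expansion follows; the data frame `toy` and its values are invented for illustration and only reuse a few of the slide's column names.

```r
# Invented toy data reusing some column names from the slide's formula.
toy <- data.frame(
  distance  = c(100, 250, 400, 800),
  month     = c(1, 2, 3, 1),
  dayofweek = c(1, 7, 7, 3)
)

# as.factor() turns a numeric column into categories; model.matrix() then
# creates one 0/1 column per level, dropping the first level as the baseline.
mm <- model.matrix(~ distance + as.factor(month) + as.factor(dayofweek),
                   data = toy)
print(colnames(mm))
print(dim(mm))
```

With three distinct months and three distinct weekdays, the two factors contribute two columns each, next to the intercept and the numeric `distance` column.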
22 ORAAH's New Distributed SVD
Benchmark of ORAAH's (Spark+MPI) vs Spark MLlib: 6x faster + linear scale-up
Singular Value Decomposition, 20k x 20k dense input (3.2 GB), 10 threads (12 actual cores, -Xmx40Gb)
[Chart: ORAAH Spark-MPI vs Spark MLlib; axis label: Rank]
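For readers unfamiliar with the decomposition being benchmarked, here is a minimal single-machine sketch using base R's `svd()`. It is not ORAAH's distributed implementation; the 20 x 20 matrix and the rank `k` are arbitrary choices for illustration.

```r
set.seed(42)

# Small dense input; the benchmark on the slide used a 20k x 20k matrix.
A <- matrix(rnorm(20 * 20), nrow = 20)

# Full decomposition: A = U diag(d) V^T, with singular values in d.
s <- svd(A)
A_hat <- s$u %*% diag(s$d) %*% t(s$v)
max_err <- max(abs(A - A_hat))

# Rank-k approximation: keep only the k largest singular values.
k <- 5
A_k <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

# The Frobenius error of the rank-k approximation equals the root of the
# sum of the squared singular values that were discarded.
frob_err <- norm(A - A_k, "F")
expected_err <- sqrt(sum(s$d[-(1:k)]^2))

print(max_err)
print(c(frob_err, expected_err))
```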
23 New ORAAH Utility functions for Data Processing and Ingest: JDBC and Spark Data Frames
24 ORAAH 2.8.0 - New Spark DF data manipulation functions
25 ORAAH 2.8.0 - New Spark DF data manipulation functions: orch.csv2df. Interprets a CSV file and loads it in memory into a Spark DF. [...] system or HDFS
26 ORAAH 2.8.0 - New Spark DF data manipulation functions: orch.describe. Generates a simple summary of the information in a Spark DF
27 ORAAH 2.8.0 - New Spark DF data manipulation functions: orch.collect. Brings a Spark DF into R's local memory for further manipulation or result printing
28 ORAAH 2.8.0 - New Spark DF Scaling function
29 ORAAH 2.8.0 - New Spark DF Scaling function: orch.scale. Scales a Spark DF using one of 15 [...]
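The slide does not list the available scaling methods in this transcription, so the sketch below only shows two widely used ones, z-score standardization and min-max scaling, on a local data frame in base R. It is an illustration of the idea, not of ORAAH's function or its options; the data frame `x` is invented.

```r
# Invented local data; a Spark DF would hold the same kind of numeric columns.
x <- data.frame(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 100))

# Z-score standardization: subtract the column mean, divide by the column sd.
z <- as.data.frame(scale(x))

# Min-max scaling: map each column linearly onto the interval [0, 1].
minmax <- as.data.frame(lapply(x, function(col) {
  (col - min(col)) / (max(col) - min(col))
}))

print(z)
print(minmax)
```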
30 ORAAH 2.8.0 - New Spark DF Scaling function: $show. Spark DFs accept many default rJava functions, from show() to count()
31 New JDBC interface: Creates a Spark DataFrame from a JDBC source that can be used on any of [...]
32 Model performance sample for Allstate Prediction Challenge
[...] sample out of the original 13.4 million records, 60%/40% split for testing
- ORAAH Spark-MPI GLM: Almost as good as the [...], 3x faster to build
- ORAAH Spark ELM2: Fastest to build and score
33 Model performance sample for Allstate Prediction Challenge: [...] sample out of the original 13.4 million records, 60%/40% split for testing
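A 60%/40% split of the kind mentioned for this test can be sketched in base R as follows. The data frame `records`, its size and the seed are arbitrary stand-ins, and the slide does not say how the original split was drawn, so simple random sampling is an assumption.

```r
set.seed(2018)

# Synthetic stand-in for the sampled records (not the Allstate data).
n <- 1000
records <- data.frame(id = seq_len(n), x = rnorm(n))

# Draw 60% of the row indices at random, without replacement.
train_idx <- sample(n, size = round(0.6 * n))

# The sampled rows form one partition; the remaining 40% form the other.
train <- records[train_idx, ]
test  <- records[-train_idx, ]

print(c(train = nrow(train), test = nrow(test)))
```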
34 Live Demo: Big Data Manager Notebooks. Working with Large Spark Clusters
46 Motivation for the future: Regression; Decision Trees; http://www.mlyearning.org/
47 cloudcustomerconnect.oracle.com
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationBring Context To Your Machine Data With Hadoop, RDBMS & Splunk
Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC Forward-Looking Statements During the course of this presentation, we may
More informationHow to Troubleshoot Databases and Exadata Using Oracle Log Analytics
How to Troubleshoot Databases and Exadata Using Oracle Log Analytics Nima Haddadkaveh Director, Product Management Oracle Management Cloud October, 2018 Copyright 2018, Oracle and/or its affiliates. All
More informationMachine Learning in Action
Machine Learning in Action PETER HARRINGTON Ill MANNING Shelter Island brief contents PART l (~tj\ssification...,... 1 1 Machine learning basics 3 2 Classifying with k-nearest Neighbors 18 3 Splitting
More informationDatabases 2 (VU) ( / )
Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationMachine Learning With Spark
Ons Dridi R&D Engineer 13 Novembre 2015 Centre d Excellence en Technologies de l Information et de la Communication CETIC Presentation - An applied research centre in the field of ICT - The knowledge developed
More informationDATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud
DATA INTEGRATION PLATFORM CLOUD Experience Powerful Integration in the Want a unified, powerful, data-driven solution for all your data integration needs? Oracle Integration simplifies your data integration
More informationPractical Machine Learning Agenda
Practical Machine Learning Agenda Starting From Log Management Moving To Machine Learning PunchPlatform team Thales Challenges Thanks 1 Starting From Log Management 2 Starting From Log Management Data
More informationApache SystemML Declarative Machine Learning
Apache Big Data Seville 2016 Apache SystemML Declarative Machine Learning Luciano Resende About Me Luciano Resende (lresende@apache.org) Architect and community liaison at Have been contributing to open
More informationHadoop Development Introduction
Hadoop Development Introduction What is Bigdata? Evolution of Bigdata Types of Data and their Significance Need for Bigdata Analytics Why Bigdata with Hadoop? History of Hadoop Why Hadoop is in demand
More informationPutting it all together: Creating a Big Data Analytic Workflow with Spotfire
Putting it all together: Creating a Big Data Analytic Workflow with Spotfire Authors: David Katz and Mike Alperin, TIBCO Data Science Team In a previous blog, we showed how ultra-fast visualization of
More informationA Cloud System for Machine Learning Exploiting a Parallel Array DBMS
2017 28th International Workshop on Database and Expert Systems Applications A Cloud System for Machine Learning Exploiting a Parallel Array DBMS Yiqun Zhang, Carlos Ordonez, Lennart Johnsson Department
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More information