#Azure #MicrosoftAIJourney

1 #Azure #MicrosoftAIJourney

2

3

4 SQL Server + R

5

6 1990s: Development started, based on the S language (created in 1980). 1993: R starts as a research project at the University of Auckland, New Zealand. Milestones: first version, alpha, stable beta, V1.0 (considered by developers to be production ready), V2.0, first UseR conference, V3.0.

7

8 CRAN: Comprehensive R Archive Network

9 Oct ,143

10 Open Source lingua franca Analytics, Computing, Modeling CRAN Task View by Barry Rowlingson: More packages on Github and BioConductor project

11 Boxplot Bar Plot Histogram Contour Dot Plot Mosaic Scatter Latticist

12 Vectors

13

14 Investment insurance: R is an abstract language, so code is safe across platforms and versions, unlike other big data tools such as Spark.
Runs on existing in-production platforms: SQL 2016, Spark clusters, Teradata.
Approachable language: no need for a computer science degree; free version for learning.
Stable deployment. Familiar tooling.

15 Open Source R compared to Microsoft R Server: US flight data for 20 years; linear regression on arrival delay; run on a 4-core laptop with 16 GB RAM and a 500 GB SSD.

16

17

18

19 Application Database

20 SQL Server Machine Learning Services: better collaboration and sharing of insights; faster time to insight; streamlined productivity and deployment; better security and compliance.

21 [Architecture diagram] sqlservr.exe (MSSQLSERVER service) receives sp_execute_external_script and tells launchpad.exe (MSSQLLAUNCHPAD service) over a named pipe what to launch and how. The R/Python launcher starts the R/Python satellite processes inside a Windows job object; the satellite processes exchange data with SQL Server over TCP through sqlsatellite.dll.

22 Pushing compute to the data

23 SQL Server

24 Data scientist workstation (any R/Python IDE): 1. pull data from the DB; 2. execution on the workstation; 3. model output.
train <- sqlQuery(connection, "select * from nyctaxi_sample")
model <- glm(formula, train)

25 Data scientist workstation (any R/Python IDE) and SQL Server 2017 Machine Learning Services (R/Python runtime): 1. script is sent to SQL Server; 2. execution in SQL Server; 3. rx* output; 4. model or predictions are returned.
cc <- RxInSqlServer(connectionString, computeContext)
rxLogit(formula, cc)
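The difference between the two workflows on slides 24 and 25 can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for SQL Server, and the table `trips`, the column `fare`, and the helper functions are hypothetical names that do not come from the deck or from RevoScaleR. Both functions compute the same mean, but the first moves every row to the client while the second moves a single aggregated row.

```python
import sqlite3


def make_db(n_rows=1000):
    """Build an in-memory table that stands in for a large server-side table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (fare REAL)")
    conn.executemany(
        "INSERT INTO trips VALUES (?)",
        [(float(i % 50),) for i in range(n_rows)],
    )
    return conn


def mean_by_pulling(conn):
    """Pull every row to the client, then compute there (slide 24 pattern)."""
    rows = conn.execute("SELECT fare FROM trips").fetchall()
    return sum(r[0] for r in rows) / len(rows), len(rows)


def mean_by_pushing(conn):
    """Ask the database to compute and return only the result (slide 25 pattern)."""
    rows = conn.execute("SELECT AVG(fare) FROM trips").fetchall()
    return rows[0][0], len(rows)


conn = make_db()
pulled_mean, rows_moved_pull = mean_by_pulling(conn)
pushed_mean, rows_moved_push = mean_by_pushing(conn)
print(pulled_mean, rows_moved_pull)
print(pushed_mean, rows_moved_push)
```

The second value in each pair is the number of rows that crossed the connection, which is the cost the "push compute to the data" approach is meant to reduce.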

26

27 # Set ComputeContext
cc <- RxInSqlServer(connectionString = connection_string, numTasks = num_tasks);
rxSetComputeContext(cc);
# Define data source
visitor_interests <- RxSqlServerData(sqlQuery = input_query,
  colClasses = c(book_category = "numeric", college_education = "numeric", male = "numeric", clicks_in_1 = "numeric", ...),
  connectionString = connection_string, useFastRead = TRUE);
# Train model on SQL Server, i.e. push rxLogit compute to the remote server
logit_model <- rxLogit(book_category ~ college_education + male + clicks_in_1 + ..., data = visitor_interests);

28

29

30

31 SQL Server

32

33 CREATE TABLE iris_rx_data ("Sepal.Length" float not null, "Sepal.Width" float not null, "Petal.Length" float not null, "Petal.Width" float not null, "Species" varchar(100))
INSERT INTO iris_rx_data
EXEC sp_execute_external_script
  @language = N'R',
  @script = N'iris_data <- iris;',
  @input_data_1 = N'',
  @output_data_1_name = N'iris_data'
--WITH RESULT SETS (("Sepal.Length" float not null,
--"Sepal.Width" float not null,
--"Petal.Length" float not null,
--"Petal.Width" float not null, "Species" varchar(100)));
ALTER TABLE iris_rx_data ADD ID INT PRIMARY KEY NOT NULL IDENTITY (1,1)

34 BEGIN TRY
  CREATE TABLE [dbo].[iris_rx_models] ([model_name] [varchar](30) NOT NULL, [model] [varbinary](max) NOT NULL)
END TRY
BEGIN CATCH
  print ERROR_MESSAGE()
END CATCH
GO
CREATE PROCEDURE [dbo].[generate_iris_rxbtrees_model]
AS
BEGIN
  DELETE FROM [dbo].[iris_rx_models] WHERE model_name = 'iris_rxbtrees_model'
  DECLARE @model varbinary(max);
  EXECUTE sp_execute_external_script
    @language = N'R',
    @script = N'
      iris.sub <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
      iris.dtree <- rxDTree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris[iris.sub, ])
      # realtimeScoringOnly: setting this flag could reduce the model size, but rxUnserializeModel can then no longer retrieve the RevoScaleR model
      model <- rxSerializeModel(iris.dtree, realtimeScoringOnly = FALSE)
      rxUnserializeModel(model)
      cat(paste0("r Process ID = ", Sys.getpid()))
      cat("\n")',
    @params = N'@model varbinary(max) OUTPUT',
    @model = @model OUTPUT
  INSERT [dbo].[iris_rx_models] (model_name, model) VALUES ('iris_rxbtrees_model', @model);
END;
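Slide 34 stores a trained model as bytes in a varbinary(max) column so that a later procedure can load it and score with it. The same round trip can be sketched with Python's standard library. Everything here is an assumption made for illustration: sqlite3 stands in for SQL Server, pickle stands in for rxSerializeModel and rxUnserializeModel, and the "model" is a hypothetical dictionary holding one cutoff rather than a real decision tree.

```python
import pickle
import sqlite3


def predict(model, petal_length):
    """Hypothetical toy model: classify by comparing against a stored cutoff."""
    return "long" if petal_length > model["cutoff"] else "short"


def save_model(conn, name, model):
    """Serialize the model to bytes and store it under a name, replacing any old copy."""
    blob = pickle.dumps(model)
    conn.execute("DELETE FROM models WHERE model_name = ?", (name,))
    conn.execute(
        "INSERT INTO models (model_name, model) VALUES (?, ?)", (name, blob)
    )


def load_model(conn, name):
    """Fetch the stored bytes by name and turn them back into a model object."""
    row = conn.execute(
        "SELECT model FROM models WHERE model_name = ?", (name,)
    ).fetchone()
    return pickle.loads(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (model_name TEXT NOT NULL, model BLOB NOT NULL)")
save_model(conn, "toy_model", {"cutoff": 2.5})
restored = load_model(conn, "toy_model")
print(predict(restored, 4.0))
```

Keeping the model in a table next to the data means scoring procedures only need the model name, which is how the predict procedures on the following slides look the model up.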

35 ALTER PROCEDURE [dbo].[predict_species] (@model varchar(100))
AS
BEGIN
  DECLARE @nb_model varbinary(max) = (select model from iris_rx_models where model_name = @model);
  -- Predict species based on the specified model:
  exec sp_execute_external_script
    @language = N'R',
    @script = N'
      require("RevoScaleR");
      irismodel <- rxUnserializeModel(nb_model)
      species <- rxPredict(irismodel, iris_rx_data[,2:5]);
      OutputDataSet <- cbind(iris_rx_data[1], species, iris_rx_data[6]);
      cat(paste0("r Process ID = ", Sys.getpid()))
      cat("\n")
      OutputDataSet <- merge(iris_rx_data, OutputDataSet)
      colnames(OutputDataSet) <- c("id", "1","2","3","4", "Species.Actual", "6","7", "Species.Expected");
      OutputDataSet <- OutputDataSet;',
    @input_data_1 = N'select id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_rx_data',
    @input_data_1_name = N'iris_rx_data',
    @params = N'@nb_model varbinary(max)',
    @nb_model = @nb_model
  with result sets UNDEFINED;
END;
GO
EXEC [predict_species] 'iris_rxbtrees_model'

36 Dataset = Rows = SQL Server 5000

37 ALTER procedure [dbo].[predict_species_stream] (@model varchar(100))
as
begin
  DECLARE @nb_model varbinary(max) = (select model from [dbo].[iris_rx_models] where model_name = @model);
  -- Predict species based on the specified model:
  exec sp_execute_external_script
    @language = N'R',
    @script = N'
      require("RevoScaleR");
      irismodel <- rxUnserializeModel(nb_model)
      species <- rxPredict(irismodel, iris_rx_data[,2:5]);
      OutputDataSet <- cbind(iris_rx_data[1], species, iris_rx_data[6]);
      colnames(OutputDataSet) <- c("id", "Species.Actual", "Species.Expected");
      cat(paste0("r Process ID = ", Sys.getpid()))
      cat("\n")
      OutputDataSet <- OutputDataSet;',
    @input_data_1 = N'select id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_rx_data',
    @input_data_1_name = N'iris_rx_data',
    @params = N'@nb_model varbinary(max), @r_rowsPerRead int',
    @nb_model = @nb_model,
    @r_rowsPerRead = 5000
  with result sets UNDEFINED;
end;
GO
EXEC [predict_species_stream] 'iris_rxbtrees_model'
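The streaming procedure on slide 37 suggests scoring rows in fixed-size batches instead of materializing the whole data set in the R process at once. A minimal sketch of that idea follows, assuming a hypothetical threshold scorer and hypothetical function names; it does not reproduce how SQL Server itself feeds batches to the external runtime.

```python
from itertools import islice


def read_in_batches(rows, rows_per_read):
    """Yield consecutive lists of at most rows_per_read rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, rows_per_read))
        if not batch:
            return
        yield batch


def score_batch(batch, cutoff=2.5):
    """Hypothetical scorer: label each (row_id, petal_length) pair by a cutoff."""
    return [
        (row_id, "long" if petal_length > cutoff else "short")
        for row_id, petal_length in batch
    ]


def score_stream(rows, rows_per_read):
    """Score batch by batch; only one batch is held in memory while scoring."""
    results = []
    batches = 0
    for batch in read_in_batches(rows, rows_per_read):
        results.extend(score_batch(batch))
        batches += 1
    return results, batches


rows = [(i, i * 0.5) for i in range(12)]
results, batches = score_stream(rows, rows_per_read=5)
print(batches, results[:3])
```

Because each batch is scored independently, the same structure also lends itself to the parallel variant shown later in the deck, where several processes each take a share of the rows.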

38 sp_execute_external = N = 1 (MAXDOP = 2)

39

40 --Create Fake Data
SELECT TOP 0 * INTO iris_rx_data_big FROM iris_rx_data
GO
INSERT INTO iris_rx_data_big ([Sepal.Length], [Sepal.Width], [Petal.Length], [Petal.Width])
VALUES (
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0 )
GO 1000
INSERT INTO iris_rx_data_big ([Sepal.Length], [Sepal.Width], [Petal.Length], [Petal.Width])
SELECT
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
  RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0
FROM iris_rx_data_big
GO 10
CREATE INDEX [IXiris_rx_data_big] ON iris_rx_data_big(id)

41 ALTER PROCEDURE [dbo].[predict_species_parallel] (@model varchar(100))
as
begin
  DECLARE @nb_model varbinary(max) = (select model from iris_rx_models where model_name = @model);
  -- Predict species based on the specified model:
  exec sp_execute_external_script
    @language = N'R',
    @script = N'
      require("RevoScaleR");
      irismodel <- rxUnserializeModel(nb_model)
      species <- rxPredict(irismodel, iris_rx_data[,2:5]);
      OutputDataSet <- cbind(iris_rx_data[1], species, iris_rx_data[6]);
      colnames(OutputDataSet) <- c("id", "Species.Actual", "Species.Expected");
      cat(paste0("r Process ID = ", Sys.getpid()))
      cat("\n")
      cat(" ")
      cat("\n")
      OutputDataSet <- OutputDataSet;',
    @input_data_1 = N'select id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" from iris_rx_data_big WHERE LEFT(ID,1) BETWEEN 1 AND 5 /*Add this bit to make parallel*/ OPTION(MAXDOP 2)',
    @input_data_1_name = N'iris_rx_data',
    @parallel = 1,
    @params = N'@nb_model varbinary(max), @r_rowsPerRead int',
    @nb_model = @nb_model,
    @r_rowsPerRead = 5000
  with result sets (("ID" INT, "Setosa" INT, "versicolor" INT, "virginica" INT, "Species" varchar(150)));
END;
GO
SET STATISTICS XML ON
EXEC [predict_species_parallel] 'iris_rxbtrees_model'
SET STATISTICS XML OFF

42

43 DECLARE @model varbinary(max) = (SELECT TOP 1 model from [dbo].[iris_rx_models] WHERE model_name = 'iris_rxbtrees_model');
DROP TABLE IF EXISTS #TMP
SELECT [Sepal.Length] * .9 AS [Sepal.Length], [Sepal.Width] * .9 AS [Sepal.Width], [Petal.Length] * .9 AS [Petal.Length], [Petal.Width] * .9 AS [Petal.Width], Species, ID
INTO #tmp
FROM iris_rx_data
SELECT * FROM PREDICT(MODEL = @model, DATA = #tmp AS d)
WITH (setosa_pred float, versicolor_pred float, virginica_pred float) AS p;

44 DECLARE @model varbinary(max) = 0x626C6F62298B1834AEF8DA6B26B89D32475A371DEDE9328FD60DB36F45A5E2C702CB...
DROP TABLE IF EXISTS #TMP
SELECT * INTO #TMP FROM (
  SELECT 5.1 [Sepal.Length], 3.4 [Sepal.Width], 1.5 [Petal.Length], 0.3 [Petal.Width]
  UNION SELECT 4.3, 3.1, 1.2, 0.2
  UNION SELECT 6.6, 1.4, 5.3, 2.2) AS A
SELECT d.*, p.* FROM PREDICT(MODEL = @model, DATA = #TMP as d)
WITH (setosa_Pred float, versicolor_pred float, virginica_pred float) as p;

45

46

47 DMV: Description
sys.dm_exec_requests: New column: external_script_request_id
sys.dm_external_script_requests: Returns running external scripts, DOP & assigned user account
sys.dm_external_script_execution_stats: Number of executions for rx* functions in RevoScaleR package
sys.dm_os_performance_counters: New External Scripts performance counters

48 SELECT * FROM sys.resource_governor_external_resource_pools
GO
ALTER EXTERNAL RESOURCE POOL [default] WITH (
  MAX_CPU_PERCENT = 90,
  AFFINITY CPU = AUTO,
  MAX_MEMORY_PERCENT = 25 );
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
SELECT * FROM sys.resource_governor_external_resource_pools

49 (SQL Server 2017) 2017

50

51 CD C:\Program Files\Microsoft SQL Server\140\Setup Bootstrap\SQL2017\x64\ RSetup.exe /install /component MLM /version /language 1033 /destdir "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\library\MicrosoftML\mxLibs\x64"

52 CREATE TABLE CNNFileLocations ([file.name] nvarchar(max), type nvarchar(max), label Int, modelname nvarchar(150))
GO
CREATE PROCEDURE [dbo].[spcnnloadfilelocationsr] (@ModeltoTrain NVarChar(MAX))
as
begin
  DELETE FROM CNNFileLocations WHERE modelname = @ModeltoTrain
  INSERT INTO CNNFileLocations
  execute sp_execute_external_script
    @language = N'R',
    @script = N'
      root.directory.name.training <- paste("c:/r/images/", ModeltoTrain, "/Training", sep="")
      root.directory.name.testing <- paste("c:/r/images/", ModeltoTrain, "/testing", sep="")
      training.folders <- list.dirs(root.directory.name.training)
      root.folder.length <- nchar(root.directory.name.training) + 1
      #Remove the root folder as we do not need it
      training.folders <- training.folders[-1]
      imagesdf <- data.frame(cbind(
        file.name = file.path(list.files(training.folders, "*.*", full.names = TRUE)),
        type = substr(dirname(list.files(training.folders, "*.*", full.names = TRUE)), root.folder.length + 1, 1000)),
        stringsAsFactors = FALSE)
      #Create an integer label by turning into a factor and then to an integer
      imagesdf$label <- as.integer(as.factor(imagesdf[[2]])) - 1
      imagesdf$modelname <- ModeltoTrain
      OutputDataSet <- data.frame(imagesdf)',
    @params = N'@ModeltoTrain NVarChar(MAX)',
    @ModeltoTrain = @ModeltoTrain;
END

53 CREATE PROCEDURE [dbo].[spcnnmodelcreate] NVarChar(MAX)) as begin nvarchar(max) = CONCAT(N'SELECT [file.name], [type], [Label] FROM CNNFileLocations WHERE modelname = N'''') CREATE TABLE [dbo].[cnnmodel]( execute sp_execute_external_script [int] = N'R' IDENTITY(1,1) NOT NULL PRIMARY = N' [Model] [varbinary](max) NULL, require(microsoftml) [ModelName] [nvarchar](150) NULL, [dt2] [datetime2](7) NOT NULL DEFAULT(GETDATE())) imagesdf <- CNNFileLocations imagesdf$file.name <- as.character(imagesdf$file.name) imagesdf$type GO <- as.character(imagesdf$type) imagesdf$label <- as.numeric(imagesdf$label) imagemodel <- rxlogisticregression( formula = Label ~ Features, data = imagesdf, NVarChar(MAX) = 'GoT') AS type = "multiclass", mltransforms = list( loadimage(vars = list(features = "file.name")), resizeimage(vars = "Features", width = 224, height = 224), extractpixels(vars = "Features"), INSERT featurizeimage(var = "Features", dnnmodel = "Resnet50")) ) CREATE PROCEDURE [dbo].[spcnnmodelinsert] (@ModeltoTrain AS TABLE (v VarBinary(MAX)) EXEC OutputDataSet <- data.frame(payload = as.raw(serialize(imagemodel, connection=null))); INSERT INTO CNNModel (Model, ModelName) SELECT = = = N'@ModeltoTrain with result sets ((model varbinary(max))); END

54 CREATE PROCEDURE [dbo].[spcnnmodelpredict] varchar(150)) AS BEGIN nvarchar(max) = CONCAT(N'SELECT [type], label FROM CNNFileLocations WHERE modelname = N''' GROUP BY [type], label ORDER BY [type]') varbinary(max); select TOP = model from CNNModel where ModelName ORDER BY dt2 DESC -- Predict species based on the specified model: exec = = N' cnn_modelu <- unserialize(cnn_model) root.directory.name.testing <- paste("c:/r/images/", ModeltoTrain, "/testing", sep="") testing.folder <- list.dirs(root.directory.name.testing) test.files <- data.frame(file.name = file.path(list.files(testing.folder, "*.*", full.names = TRUE)), stringsasfactors = FALSE) test.files[, "Label"] <- -99 # Lets use the trained model to predict the type of image prediction <- rxpredict(cnn_modelu, data = test.files, extravarstowrite = list("label", "file.name")) #Get the distinct values distinct.types <- CNNFileLocations OutputDataSet <- distinct.types #Join to find the type names prediction <- merge(prediction, distinct.types, by.x = "PredictedLabel", by.y = "label") OutputDataSet <- prediction = = = N'@cnn_model @ModeltoTrain with result sets UNDEFINED; end;

55

56 R Client: easily scale up a single server to a grid to handle more concurrent requests; load balancing across compute nodes; a shared pool of warmed-up R shells to improve scoring performance.

57 Load balancer. Server-level HA: introduce multiple web nodes for active-active backup and recovery, via a load balancer. Data store HA: leverage enterprise-grade DB, SQL Server and Postgres HA capabilities.

58

59

60 Distributed R: how does a local compute context work?
[Diagram] Microsoft R Server client (R IDE or command line): the console runs a predictive algorithm in the LOCAL CONTEXT, loading the big data one block at a time and analyzing blocks in parallel.
Microsoft R Server functions: a compute context defines where to process (e.g. a remote context such as Hadoop MapReduce); Microsoft R functions are prefixed with rx; the currently set compute context determines the processing location.
Copyright Microsoft Corporation. All rights reserved.

61 Distributed R: how does a remote compute context work?
[Diagram] Microsoft R Server client (R IDE or command line): the console packs and ships requests to remote environments. Microsoft R Server server, REMOTE CONTEXT: the algorithm master distributes work and compiles results; the big data is loaded one block at a time and blocks are analyzed in parallel; the predictive algorithm results return to the console.
Microsoft R Server functions: a compute context defines where to process (e.g. a remote context such as Hadoop MapReduce); Microsoft R functions are prefixed with rx; the currently set compute context determines the processing location.
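The "load a block at a time, analyze blocks in parallel, compile results" pattern on slides 60 and 61 can be sketched as follows. This is an illustration under assumptions, not how RevoScaleR is implemented: the function names are hypothetical, a thread pool stands in for the parallel workers, and the statistic is a simple mean and population variance built from per-block partial sums.

```python
from concurrent.futures import ThreadPoolExecutor


def blocks(values, block_size):
    """Split the data into consecutive blocks of at most block_size items."""
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def analyze_block(block):
    """Per-block partial result: count, sum, and sum of squares."""
    return len(block), sum(block), sum(x * x for x in block)


def combine(partials):
    """Compile the partial results into a global mean and population variance."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    total_sq = sum(p[2] for p in partials)
    mean = total / n
    variance = total_sq / n - mean * mean
    return n, mean, variance


def summarize(values, block_size, workers=4):
    """Analyze blocks in parallel, then combine the partials on the caller side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyze_block, blocks(values, block_size)))
    return combine(partials)


values = [float(x) for x in range(10)]
print(summarize(values, block_size=3))
```

The key property is that `analyze_block` never needs the whole data set, so the same algorithm can run wherever the blocks live; only the small partial results travel back to be combined.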

62 ScaleR models can be deployed from a server or edge node to run in Hadoop without any functional R model re-coding for map-reduce. The compute context R script sets where the model will run; the functional model R script does not need to change to run in Hadoop.
Local parallel processing (Linux or Windows):
### SETUP LOCAL ENVIRONMENT VARIABLES ###
mylocalcc <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(mylocalcc)
### CREATE LINUX, DIRECTORY AND FILE OBJECTS ###
linuxfs <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall/AirlineDemoSmall.xdf", fileSystem = linuxfs)
In Hadoop:
myhadoopcc <- RxHadoopMR()
rxSetComputeContext(myhadoopcc)
hdfsfs <- RxHdfsFileSystem()
hdfsfs
### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear Model and plot
hdfsxdfarrlatelinmod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsxdfarrlatelinmod$coefficients)

63 ScaleR models can be deployed from a server or edge node to run in SQL Server without any functional R model re-coding for in-database computations. The compute context R script sets where the model will run; the functional model R script does not need to change to run in either DB.
Local parallel processing (Linux or Windows):
### SETUP LOCAL ENVIRONMENT VARIABLES ###
mysqlcon <- "Driver=SQL;SERVER=localhost;Database=RevoTester;Uid=RevoTester;pwd=######"
mylocalcc <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(mylocalcc)
### CREATE SQL SERVER DATA SOURCE ###
AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
AirlineDataSet <- RxOdbcData(connectionString = mysqlcon, sqlQuery = AirlineDemoQuery)
In SQL Server:
### SETUP SQL Server ENVIRONMENT VARIABLES ###
mysqlcc <- "Driver=SQL;SERVER=localhost;Database=RevoTester;Uid=RevoTester;pwd=######"
### SQL SERVER COMPUTE CONTEXT ###
rxSetComputeContext(RxInSqlServer(connectionString = mysqlcc))
### CREATE SQL SERVER DATA SOURCE ###
AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
AirlineDataSet <- RxSqlServerData(connectionString = mysqlcc, sqlQuery = AirlineDemoQuery)
### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear Model and plot
hdfsxdfarrlatelinmod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsxdfarrlatelinmod$coefficients)

64 ScaleR models can be deployed from a server or edge node to run in Teradata without any functional R model re-coding for in-database computations. The compute context R script sets where the model will run; the functional model R script does not need to change to run in Teradata.
Local parallel processing (Linux or Windows):
### SETUP LOCAL ENVIRONMENT VARIABLES ###
mylocalcc <- "localpar"
### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext(mylocalcc)
### CREATE LOCAL FILE-SYSTEM POINTER AND FILE OBJECT ###
localfs <- RxNativeFileSystem()
AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localfs)
In Teradata:
### SETUP TERADATA ENVIRONMENT VARIABLES ###
mytdcc <- "Driver=Teradata;DBCNAME=TeradataProd;Database=RevoTester;Uid=RevoTester;pwd=######"
### TERADATA COMPUTE CONTEXT ###
rxSetComputeContext(mytdcc)
### CREATE TERADATA DATA SOURCE ###
AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
AirlineDataSet <- RxTeradata(connectionString = mytdcc, sqlQuery = AirlineDemoQuery)
### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
### Linear Model and plot
hdfsxdfarrlatelinmod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(hdfsxdfarrlatelinmod$coefficients)

65 R R R R R R R R R R R Server

66

67

68

69

70 Anomaly Detection

71

72

73

74

75

76

77 Train on what is normal (single class). The model learns what it is like to be normal. When it encounters an item that does not fit its idea of normal, that item is counted as an anomaly.
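The single-class idea on slide 77 can be illustrated with a deliberately simple stand-in. The deck does not say which algorithm it uses, so this sketch assumes a z-score rule: the "model" is just the mean and standard deviation of normal readings, and anything further than a chosen number of standard deviations from the mean is flagged. The function names, the threshold, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def fit_normal_profile(normal_values):
    """Learn what 'normal' looks like from examples of the normal class only."""
    return {"mean": mean(normal_values), "stdev": stdev(normal_values)}


def is_anomaly(profile, value, max_z=3.0):
    """Flag a value whose distance from the normal mean exceeds max_z deviations."""
    z = abs(value - profile["mean"]) / profile["stdev"]
    return z > max_z


normal_readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
profile = fit_normal_profile(normal_readings)
print(is_anomaly(profile, 10.05))
print(is_anomaly(profile, 15.0))
```

No anomalous examples are needed for training, which is the practical appeal of the single-class approach when anomalies are rare or unknown in advance.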

78

79

80

81 BIG DATA & ADVANCED ANALYTICS AT A GLANCE
Stages: Ingest, Store, Prep & Train, Model & Serve, Intelligence
Data Factory (data movement, pipelines & orchestration)
Sources: business apps, custom apps, sensors and devices
Services shown: Kafka, Event Hub, IoT Hub, Blobs, Data Lake, Databricks, HDInsight, Data Lake Analytics, Machine Learning, Cosmos DB, SQL Database, SQL Data Warehouse, Analysis Services
Outputs: predictive apps, operational reports, analytical dashboards

82 APACHE SPARK
A unified, open source, parallel data processing framework for big data analytics
Libraries: Spark SQL (interactive queries), Spark MLlib (machine learning), Spark Streaming and Spark Structured Streaming (stream processing), GraphX (graph computation)
Spark Core Engine
Cluster managers: YARN, Mesos, standalone Spark scheduler

83 SPARK - BENEFITS
Performance: Using in-memory computing, Spark is considerably faster than Hadoop (100x in some tests). Can be used for batch and real-time data processing.
Developer productivity: Easy-to-use APIs for processing large datasets. Includes 100+ operators for transforming.
Unified engine: Integrated framework includes higher-level libraries for interactive SQL queries, stream analytics, ML and graph processing. A single application can combine all types of processing.
Ecosystem: Spark has built-in support for many data sources, a rich ecosystem of ISV applications and a large dev community. Available on multiple public clouds (AWS, Google and Azure) and from multiple on-premises distributors.

84 ADVANTAGES OF A UNIFIED PLATFORM
Spark Streaming, Spark Machine Learning, Spark SQL

85 DATABRICKS - COMPANY OVERVIEW

86 AZURE DATABRICKS
Microsoft Azure

87 AZURE DATABRICKS
Inputs: IoT / streaming data, cloud storage, data warehouses, Hadoop storage
Azure Databricks collaborative workspace: data engineer, data scientist, business analyst
Deploy production jobs & workflows: multi-stage pipelines, job scheduler, notification & logs
Optimized Databricks Runtime Engine: Databricks I/O, Apache Spark, serverless, REST APIs
Outputs: machine learning models, BI tools, data exports, data warehouses
Enhance productivity. Build on a secure & trusted cloud. Scale without limits.

88 KNOWING THE VARIOUS BIG DATA SOLUTIONS
Axis: from control to ease of use, with reduced administration
Big data analytics:
IaaS clusters: Azure Marketplace (HDP, CDH, MapR); any Hadoop technology, any distribution
Managed clusters: Azure HDInsight (workload optimized, managed clusters); Azure Databricks (frictionless & optimized Spark clusters)
Big data as-a-service: Azure Data Lake Analytics (data engineering in a job-as-a-service model)
Big data storage: Azure Data Lake Store, Azure Storage

89 GENERAL SPARK CLUSTER ARCHITECTURE
Driver program (SparkContext), cluster manager, worker nodes, data sources (HDFS, SQL, NoSQL, ...)

90 SECURE COLLABORATION
Azure Databricks enables secure collaboration between colleagues, who can securely share key artifacts such as clusters, notebooks, jobs and workspaces.
Secure collaboration is enabled through a combination of:
Fine-grained permissions: define who can do what on which artifacts (access control).
AAD-based user authentication: ensures that users are actually who they claim to be.

91

92 JOBS
Jobs are the mechanism to submit Spark application code for execution on Azure Databricks clusters.
Jobs execute either notebooks or JARs.
Azure Databricks provides a comprehensive set of graphical tools to create, manage and monitor jobs.

93 DATABRICKS SPARK IS FAST
Benchmarks have shown Databricks to often have better performance than alternatives.
Source: Benchmarking Big Data SQL Platforms in the Cloud

94 SPARK ML ALGORITHMS

95 DEEP LEARNING
Azure Databricks supports and integrates with a number of deep learning libraries and frameworks to make it easy to build and deploy deep learning applications.
Supported libraries and frameworks include: Microsoft Cognitive Toolkit (CNTK) (an article explains how to install CNTK on Azure Databricks), TensorFlowOnSpark, BigDL.
Offers Spark Deep Learning Pipelines, a suite of tools for working with and processing images using deep learning and transfer learning. It includes high-level APIs for common aspects of deep learning so they can be done efficiently in a few lines of code: distributed hyperparameter tuning, transfer learning.

96

97

98

99

100

101

102

103

104

105 Visual Studio Tools for AI: a Visual Studio extension with deep integration to Azure ML. End-to-end development environment, from new project through training. Support for remote training. Job management. On top of all of the goodness of Visual Studio (Python, Jupyter, Git, etc.).

106

107

108

109 THE FASTEST TOOLKIT

110 MOST SCALABLE

111 This material is provided for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED.

Franck Mercier. Technical Solution Professional Data + AI Azure Databricks

Franck Mercier. Technical Solution Professional Data + AI Azure Databricks Franck Mercier Technical Solution Professional Data + AI http://aka.ms/franck @FranmerMS Azure Databricks Thanks to our sponsors Global Gold Silver Bronze Microsoft JetBrains Rubrik Delphix Solution OMD

More information

Data and AI LATAM 2018

Data and AI LATAM 2018 Data and AI LATAM 2018 La parte de imagen con el identificador de relación rid5 no se encontró en el archivo. La parte de imagen con el identificador de relación rid5 no se encontró en el archivo. La parte

More information

Modeling. Preparation. Operationalization. Profile Explore. Model Testing & Validation. Feature & Algorithm Selection. Transform Cleanse Denormalize

Modeling. Preparation. Operationalization. Profile Explore. Model Testing & Validation. Feature & Algorithm Selection. Transform Cleanse Denormalize Preparation Modeling Ingest Transform Cleanse Denormalize Profile Explore Visualize Feature & Algorithm Selection Model Testing & Validation Operationalization Models Visualizations Deploy Apps, Services

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

Microsoft. Advanced Analytics. Juan Carlos Rodriguez García Data Platform Solution Architect

Microsoft. Advanced Analytics. Juan Carlos Rodriguez García Data Platform Solution Architect Microsoft Advanced Analytics Juan Carlos Rodriguez García jurodr@microsoft.com Data Platform Solution Architect VALOR Fuente: Gartner DIFICULTAD Banca Omnicanal MAX Maximizar el Tiempo, Todo el Tiempo

More information

Understanding the latent value in all content

Understanding the latent value in all content Understanding the latent value in all content John F. Kennedy (JFK) November 22, 1963 INGEST ENRICH EXPLORE Cognitive skills Data in any format, any Azure store Search Annotations Data Cloud Intelligence

More information

Boost your Analytics with ML for SQL Nerds

Boost your Analytics with ML for SQL Nerds Boost your Analytics with ML for SQL Nerds SQL Saturday Spokane Mar 10, 2018 Julie Koesmarno @MsSQLGirl mssqlgirl.com jukoesma@microsoft.com Principal Program Manager in Business Analytics for SQL Products

More information

Microsoft, Open Source, R: You Gotta be Kidding Me!

Microsoft, Open Source, R: You Gotta be Kidding Me! Microsoft, Open Source, R: You Gotta be Kidding Me! Bio - Niels Berglund Software Specialist - Derivco lots of production dev. plus figuring out ways to "use and abuse" existing and new technologies Author

More information

DATA SCIENCE USING SPARK: AN INTRODUCTION

DATA SCIENCE USING SPARK: AN INTRODUCTION DATA SCIENCE USING SPARK: AN INTRODUCTION TOPICS COVERED Introduction to Spark Getting Started with Spark Programming in Spark Data Science with Spark What next? 2 DATA SCIENCE PROCESS Exploratory Data

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

Overview of Data Services and Streaming Data Solution with Azure

Overview of Data Services and Streaming Data Solution with Azure Overview of Data Services and Streaming Data Solution with Azure Tara Mason Senior Consultant tmason@impactmakers.com Platform as a Service Offerings SQL Server On Premises vs. Azure SQL Server SQL Server

More information

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud Microsoft Azure Databricks for data engineering Building production data pipelines with Apache Spark in the cloud Azure Databricks As companies continue to set their sights on making data-driven decisions

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
