#Azure #MicrosoftAIJourney
- Emery Douglas
- 5 years ago
SQL Server + R
A short history of R: development started in the 1990s, based on the S language (created around 1980). In 1993 R began as a research project at the University of Auckland, New Zealand. Releases progressed from a first alpha through a stable beta to v1.0 (considered production-ready by its developers), then v2.0 (around the first useR! conference) and on to v3.0.
CRAN: the Comprehensive R Archive Network
R is the open-source lingua franca of analytics, computing and modeling (see the CRAN Task View chart by Barry Rowlingson); more packages live on GitHub and in the Bioconductor project.

Built-in plot types include box plots, bar plots, histograms, contour plots, dot plots, mosaic plots, scatter plots and Latticist.
Vectors
Investment insurance:
- R is an abstract language, so code is safe across platforms and versions, unlike other big data tools such as Spark.
- It runs on platforms already in production: SQL Server 2016, Spark clusters, Teradata.
- It is an approachable language; no computer science degree needed.
- A free version is available for learning.
- Stable deployment and familiar tooling.
Benchmark: open-source R compared to Microsoft R Server, running a linear regression on arrival delay over 20 years of US flight data, on a 4-core laptop with 16 GB RAM and a 500 GB SSD.
SQL Server Machine Learning Services brings the application to the database:
- Better collaboration and sharing of insights
- Faster time to insight
- Streamlined productivity and deployment
- Better security and compliance
Architecture: the MSSQLSERVER service (sqlservr.exe) receives a call to sp_execute_external_script and, over a named pipe, asks the MSSQLLAUNCHPAD service (launchpad.exe) to start an external runtime. The R/Python launcher decides what to launch and how, spawning R/Python satellite processes inside a Windows Job Object; the satellites exchange data with SQL Server over TCP via sqlsatellite.dll.
Pushing compute to the data

Pattern 1 - pull the data to the client. The data scientist's workstation (any R/Python IDE) (1) pulls the data from the database, (2) executes the script locally, and (3) outputs the model:

    train <- sqlQuery(connection, "SELECT * FROM nyctaxi_sample")
    model <- glm(formula, data = train)
Pattern 2 - push the compute to SQL Server. The workstation (1) sends the script, (2) SQL Server 2017 Machine Learning Services executes it in its R/Python runtime, (3) the rx* output is returned, and (4) the result is a model or predictions:

    cc <- RxInSqlServer(connectionString = connectionString)
    rxSetComputeContext(cc)
    rxLogit(formula, data = sqlServerDataSource)
    # Set compute context
    cc <- RxInSqlServer(connectionString = connection_string, numTasks = num_tasks);
    rxSetComputeContext(cc);

    # Define data source
    visitor_interests <- RxSqlServerData(sqlQuery = input_query,
        colClasses = c(book_category = "numeric", college_education = "numeric",
                       male = "numeric", clicks_in_1 = "numeric"),
        connectionString = connection_string, useFastRead = TRUE);

    # Train model on SQL Server, i.e. push the rxLogit compute to the remote server
    logit_model <- rxLogit(book_category ~ college_education + male + clicks_in_1,
                           data = visitor_interests);
SQL Server
    CREATE TABLE iris_rx_data (
        "Sepal.Length" float not null, "Sepal.Width" float not null,
        "Petal.Length" float not null, "Petal.Width" float not null,
        "Species" varchar(100))

    INSERT INTO iris_rx_data
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'iris_data <- iris;',
        @output_data_1_name = N'iris_data'
    --WITH RESULT SETS (("Sepal.Length" float not null, "Sepal.Width" float not null,
    --                   "Petal.Length" float not null, "Petal.Width" float not null,
    --                   "Species" varchar(100)));

    ALTER TABLE iris_rx_data ADD ID INT PRIMARY KEY NOT NULL IDENTITY (1,1)
    BEGIN TRY
        CREATE TABLE [dbo].[iris_rx_models] (
            [model_name] [varchar](30) NOT NULL,
            [model] [varbinary](max) NOT NULL)
    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE()
    END CATCH

    CREATE PROCEDURE [dbo].[generate_iris_rxbtrees_model]
    AS
    BEGIN
        DELETE FROM [dbo].[iris_rx_models] WHERE model_name = 'iris_rxbtrees_model'
        DECLARE @model varbinary(max);
        EXECUTE sp_execute_external_script
            @language = N'R',
            @script = N'
                iris.sub <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))
                iris.dtree <- rxDTree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                                      data = iris[iris.sub, ])
                model <- rxSerializeModel(iris.dtree, realtimeScoringOnly = FALSE)
                # realtimeScoringOnly could reduce the model size, but then
                # rxUnserializeModel can no longer retrieve the RevoScaleR model
                rxUnserializeModel(model)
                cat(paste0("R process ID = ", Sys.getpid()))
                cat("\n")',
            @params = N'@model varbinary(max) OUTPUT',
            @model = @model OUTPUT
        INSERT [dbo].[iris_rx_models] VALUES ('iris_rxbtrees_model', @model);
    END;
    ALTER PROCEDURE [dbo].[predict_species] (@model_name varchar(100))
    AS
    BEGIN
        DECLARE @nb_model varbinary(max) =
            (SELECT model FROM iris_rx_models WHERE model_name = @model_name)
        -- Predict species based on the specified model:
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                require("RevoScaleR");
                irismodel <- rxUnserializeModel(nb_model)
                species <- rxPredict(irismodel, iris_rx_data[, 2:5]);
                OutputDataSet <- cbind(iris_rx_data[1], species, iris_rx_data[6]);
                cat(paste0("R process ID = ", Sys.getpid()))
                cat("\n")
                OutputDataSet <- merge(iris_rx_data, OutputDataSet)
                colnames(OutputDataSet) <- c("id", "1", "2", "3", "4",
                                             "Species.Actual", "6", "7", "Species.Expected");',
            @input_data_1 = N'SELECT id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" FROM iris_rx_data',
            @input_data_1_name = N'iris_rx_data',
            @params = N'@nb_model varbinary(max)',
            @nb_model = @nb_model
        WITH RESULT SETS UNDEFINED;
    END;
    GO
    EXEC [predict_species] 'iris_rxbtrees_model'
Streaming: instead of loading the entire dataset into the R session at once, SQL Server can feed the external script in batches - here 5,000 rows at a time.
    ALTER PROCEDURE [dbo].[predict_species_stream] (@model_name varchar(100))
    AS
    BEGIN
        DECLARE @nb_model varbinary(max) =
            (SELECT model FROM [dbo].[iris_rx_models] WHERE model_name = @model_name)
        -- Predict species based on the specified model, streaming the input in batches:
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                require("RevoScaleR");
                irismodel <- rxUnserializeModel(nb_model)
                species <- rxPredict(irismodel, iris_rx_data[, 2:5]);
                OutputDataSet <- cbind(iris_rx_data[1], species, iris_rx_data[6]);
                colnames(OutputDataSet) <- c("id", "Species.Actual", "Species.Expected");
                cat(paste0("R process ID = ", Sys.getpid()))
                cat("\n")',
            @input_data_1 = N'SELECT id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" FROM iris_rx_data',
            @input_data_1_name = N'iris_rx_data',
            @params = N'@nb_model varbinary(max), @r_rowsPerRead int',
            @nb_model = @nb_model,
            @r_rowsPerRead = 5000
        WITH RESULT SETS UNDEFINED;
    END;
    GO
    EXEC [predict_species_stream] 'iris_rxbtrees_model'
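The streaming behaviour - scoring a table in fixed-size batches rather than materializing it all at once - can be sketched in plain Python. This is an illustration only, not the SQL Server mechanism; the `model` function and data are made up, and the batch size mirrors the 5,000-row chunks used here:

```python
from itertools import islice

def batches(rows, batch_size=5000):
    """Yield successive fixed-size batches from any iterable of rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def score_stream(rows, model, batch_size=5000):
    """Score rows batch by batch, so memory stays bounded by batch_size."""
    for batch in batches(rows, batch_size):
        for row in batch:
            yield model(row)

# Toy model standing in for the serialized decision tree above.
model = lambda row: row["petal_length"] > 2.5
rows = ({"petal_length": x / 10} for x in range(100000))
predictions = list(score_stream(rows, model))
```

Only one batch is resident at a time, which is why the SQL Server side can keep memory flat regardless of how large the input table is.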
Parallel execution: sp_execute_external_script can be run with @parallel = 1, and the input query can be hinted with OPTION (MAXDOP = 2) to control the degree of parallelism.
    --Create fake data
    SELECT TOP 0 * INTO iris_rx_data_big FROM iris_rx_data
    GO
    INSERT INTO iris_rx_data_big ([Sepal.Length], [Sepal.Width], [Petal.Length], [Petal.Width])
    VALUES (RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
            RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
            RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
            RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0)
    GO 1000
    INSERT INTO iris_rx_data_big ([Sepal.Length], [Sepal.Width], [Petal.Length], [Petal.Width])
    SELECT RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
           RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
           RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0,
           RIGHT(ABS(CHECKSUM(NEWID())), 2)/10.0
    FROM iris_rx_data_big
    GO 10
    CREATE INDEX [IXiris_rx_data_big] ON iris_rx_data_big(id)
    ALTER PROCEDURE [dbo].[predict_species_parallel] (@model_name varchar(100))
    AS
    BEGIN
        DECLARE @nb_model varbinary(max) =
            (SELECT model FROM iris_rx_models WHERE model_name = @model_name)
        -- Predict species based on the specified model:
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                require("RevoScaleR");
                irismodel <- rxUnserializeModel(nb_model)
                species <- rxPredict(irismodel, iris_rx_data[, 2:5]);
                OutputDataSet <- cbind(iris_rx_data[1], species, iris_rx_data[6]);
                colnames(OutputDataSet) <- c("id", "Species.Actual", "Species.Expected");
                cat(paste0("R process ID = ", Sys.getpid()))
                cat("\n")',
            @input_data_1 = N'SELECT id, "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"
                              FROM iris_rx_data_big
                              WHERE LEFT(ID, 1) BETWEEN 1 AND 5
                              /*Add this bit to make parallel*/ OPTION (MAXDOP = 2)',
            @input_data_1_name = N'iris_rx_data',
            @params = N'@nb_model varbinary(max)',
            @nb_model = @nb_model,
            @parallel = 1
        WITH RESULT SETS (("ID" INT, "Setosa" INT, "versicolor" INT, "virginica" INT, "Species" varchar(150)));
    END;
    GO
    SET STATISTICS XML ON
    EXEC [predict_species_parallel] 'iris_rxbtrees_model'
    SET STATISTICS XML OFF
    DECLARE @model varbinary(max) =
        (SELECT TOP 1 model FROM [dbo].[iris_rx_models] WHERE model_name = 'iris_rxbtrees_model');
    DROP TABLE IF EXISTS #tmp
    SELECT [Sepal.Length] * .9 AS [Sepal.Length],
           [Sepal.Width]  * .9 AS [Sepal.Width],
           [Petal.Length] * .9 AS [Petal.Length],
           [Petal.Width]  * .9 AS [Petal.Width],
           Species, ID
    INTO #tmp FROM iris_rx_data
    SELECT * FROM PREDICT(MODEL = @model, DATA = #tmp AS d)
    WITH (setosa_pred float, versicolor_pred float, virginica_pred float) AS p;
    DECLARE @model varbinary(max) = 0x626C6F62... /* serialized model bytes shortened */
    DROP TABLE IF EXISTS #tmp
    SELECT * INTO #tmp FROM (
        SELECT 5.1 [Sepal.Length], 3.4 [Sepal.Width], 1.5 [Petal.Length], 0.3 [Petal.Width]
        UNION SELECT 4.3, 3.1, 1.2, 0.2
        UNION SELECT 6.6, 1.4, 5.3, 2.2) AS A
    SELECT d.*, p.* FROM PREDICT(MODEL = @model, DATA = #tmp AS d)
    WITH (setosa_pred float, versicolor_pred float, virginica_pred float) AS p;
DMVs for monitoring external scripts:
- sys.dm_exec_requests - new column external_script_request_id
- sys.dm_external_script_requests - returns running external scripts, DOP and the assigned user account
- sys.dm_external_script_execution_stats - number of executions of rx* functions in the RevoScaleR package
- sys.dm_os_performance_counters - new External Scripts performance counters
    SELECT * FROM sys.resource_governor_external_resource_pools
    GO
    ALTER EXTERNAL RESOURCE POOL [default] WITH (
        MAX_CPU_PERCENT = 90,
        AFFINITY CPU = AUTO,
        MAX_MEMORY_PERCENT = 25);
    GO
    ALTER RESOURCE GOVERNOR RECONFIGURE;
    GO
    SELECT * FROM sys.resource_governor_external_resource_pools
(SQL Server 2017)
    CD C:\Program Files\Microsoft SQL Server\140\Setup Bootstrap\SQL2017\x64\
    RSetup.exe /install /component MLM /version /language 1033
        /destdir "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\library\MicrosoftML\mxLibs\x64"
    CREATE TABLE CNNFileLocations (
        [file.name] nvarchar(max), type nvarchar(max), label int, modelname nvarchar(150))

    CREATE PROCEDURE [dbo].[spcnnloadfilelocationsr] (@ModeltoTrain nvarchar(max))
    AS
    BEGIN
        DELETE FROM CNNFileLocations WHERE modelname = @ModeltoTrain
        INSERT INTO CNNFileLocations
        EXECUTE sp_execute_external_script
            @language = N'R',
            @script = N'
                root.directory.name.training <- paste("c:/r/images/", ModeltoTrain, "/Training", sep = "")
                root.directory.name.testing <- paste("c:/r/images/", ModeltoTrain, "/testing", sep = "")
                training.folders <- list.dirs(root.directory.name.training)
                root.folder.length <- nchar(root.directory.name.training) + 1
                # Remove the root folder as we do not need it
                training.folders <- training.folders[-1]
                imagesdf <- data.frame(cbind(
                    file.name = file.path(list.files(training.folders, "*.*", full.names = TRUE)),
                    type = substr(dirname(list.files(training.folders, "*.*", full.names = TRUE)),
                                  root.folder.length + 1, 1000)),
                    stringsAsFactors = FALSE)
                # Create an integer label by turning type into a factor and then an integer
                imagesdf$label <- as.integer(as.factor(imagesdf[[2]])) - 1
                imagesdf$modelname <- ModeltoTrain
                OutputDataSet <- data.frame(imagesdf)',
            @params = N'@ModeltoTrain nvarchar(max)',
            @ModeltoTrain = @ModeltoTrain;
    END
    CREATE TABLE [dbo].[cnnmodel] (
        [id] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
        [Model] [varbinary](max) NULL,
        [ModelName] [nvarchar](150) NULL,
        [dt2] [datetime2](7) NOT NULL DEFAULT (GETDATE()))
    GO

    CREATE PROCEDURE [dbo].[spcnnmodelcreate] (@ModeltoTrain nvarchar(max))
    AS
    BEGIN
        DECLARE @query nvarchar(max) = CONCAT(
            N'SELECT [file.name], [type], [Label] FROM CNNFileLocations WHERE modelname = N''',
            @ModeltoTrain, N'''')
        EXECUTE sp_execute_external_script
            @language = N'R',
            @script = N'
                require(MicrosoftML)
                imagesdf <- CNNFileLocations
                imagesdf$file.name <- as.character(imagesdf$file.name)
                imagesdf$type <- as.character(imagesdf$type)
                imagesdf$label <- as.numeric(imagesdf$label)
                imagemodel <- rxLogisticRegression(
                    formula = Label ~ Features,
                    data = imagesdf,
                    type = "multiClass",
                    mlTransforms = list(
                        loadImage(vars = list(Features = "file.name")),
                        resizeImage(vars = "Features", width = 224, height = 224),
                        extractPixels(vars = "Features"),
                        featurizeImage(var = "Features", dnnModel = "Resnet50")))
                OutputDataSet <- data.frame(payload = as.raw(serialize(imagemodel, connection = NULL)))',
            @input_data_1 = @query,
            @input_data_1_name = N'CNNFileLocations',
            @params = N'@ModeltoTrain nvarchar(max)',
            @ModeltoTrain = @ModeltoTrain
        WITH RESULT SETS ((model varbinary(max)));
    END
    GO

    CREATE PROCEDURE [dbo].[spcnnmodelinsert] (@ModeltoTrain nvarchar(max) = 'GoT')
    AS
    BEGIN
        DECLARE @t AS TABLE (v varbinary(max))
        INSERT @t EXEC [dbo].[spcnnmodelcreate] @ModeltoTrain
        INSERT INTO CNNModel (Model, ModelName) SELECT v, @ModeltoTrain FROM @t
    END
    CREATE PROCEDURE [dbo].[spcnnmodelpredict] (@ModeltoTrain varchar(150))
    AS
    BEGIN
        DECLARE @query nvarchar(max) = CONCAT(
            N'SELECT [type], label FROM CNNFileLocations WHERE modelname = N''',
            @ModeltoTrain, N''' GROUP BY [type], label ORDER BY [type]')
        DECLARE @cnn_model varbinary(max);
        SELECT TOP 1 @cnn_model = model FROM CNNModel
        WHERE ModelName = @ModeltoTrain ORDER BY dt2 DESC
        -- Predict the image type based on the stored model:
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'
                cnn_modelu <- unserialize(cnn_model)
                root.directory.name.testing <- paste("c:/r/images/", ModeltoTrain, "/testing", sep = "")
                testing.folder <- list.dirs(root.directory.name.testing)
                test.files <- data.frame(
                    file.name = file.path(list.files(testing.folder, "*.*", full.names = TRUE)),
                    stringsAsFactors = FALSE)
                test.files[, "Label"] <- -99
                # Use the trained model to predict the type of each image
                prediction <- rxPredict(cnn_modelu, data = test.files,
                                        extraVarsToWrite = list("Label", "file.name"))
                # Get the distinct values and join to find the type names
                distinct.types <- CNNFileLocations
                prediction <- merge(prediction, distinct.types,
                                    by.x = "PredictedLabel", by.y = "label")
                OutputDataSet <- prediction',
            @input_data_1 = @query,
            @input_data_1_name = N'CNNFileLocations',
            @params = N'@cnn_model varbinary(max), @ModeltoTrain varchar(150)',
            @cnn_model = @cnn_model,
            @ModeltoTrain = @ModeltoTrain
        WITH RESULT SETS UNDEFINED;
    END;
R Client operationalization: easily scale up from a single server to a grid to handle more concurrent requests, load-balance across compute nodes, and keep a shared pool of warmed-up R shells to improve scoring performance.
Server-level HA: introduce multiple web nodes for active-active backup and recovery behind a load balancer. Data-store HA: leverage the HA capabilities of enterprise-grade databases such as SQL Server and Postgres.
Distributed R: how does the local compute context work? On the client (an R IDE or the command line), Microsoft R Server runs the predictive algorithm in the LOCAL context: big data is loaded one block at a time and the blocks are analyzed in parallel. Microsoft R Server functions are prefixed with rx; a compute context defines where to process (e.g., a remote context such as Hadoop MapReduce), and the currently set compute context determines the processing location. Copyright Microsoft Corporation. All rights reserved.
Distributed R: how does a remote compute context work? In the REMOTE context, the Microsoft R Server client packs and ships requests to the remote environment: an algorithm master on the server distributes the work and compiles the results, while workers load big data one block at a time and analyze the blocks in parallel; the results come back to the client console.
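The block-at-a-time pattern behind the rx* functions can be illustrated with a toy parallel aggregation in Python (hypothetical data, not RevoScaleR itself): each block is reduced to a small partial result, the partials are merged, and the final statistic comes from the merge - which is exactly what lets the same algorithm run locally or be shipped to a remote context:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_stats(block):
    """Map step: reduce one block to a small partial result (sum, count)."""
    return (sum(block), len(block))

def combine(partials):
    """Reduce step: merge partial results into the final statistic."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

data = list(range(1, 1001))                                   # stand-in for a big dataset
blocks = [data[i:i + 100] for i in range(0, len(data), 100)]  # load a block at a time

with ThreadPoolExecutor(max_workers=4) as pool:               # analyze blocks in parallel
    partials = list(pool.map(partial_stats, blocks))

mean = combine(partials)
```

Because only the tiny (sum, count) pairs travel between workers and the master, the same code shape works whether the blocks live on a laptop or across a cluster.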
ScaleR models can be deployed from a server or edge node to run in Hadoop without any functional R model re-coding for MapReduce. The compute context set in the R script determines where the model will run.

Local (parallel processing on Linux or Windows):

    ### SETUP LOCAL ENVIRONMENT VARIABLES ###
    myLocalCC <- "localpar"
    ### LOCAL COMPUTE CONTEXT ###
    rxSetComputeContext(myLocalCC)
    ### CREATE LINUX FILE-SYSTEM, DIRECTORY AND FILE OBJECTS ###
    linuxFS <- RxNativeFileSystem()
    AirlineDataSet <- RxXdfData("AirlineDemoSmall/AirlineDemoSmall.xdf",
                                fileSystem = linuxFS)

In Hadoop:

    myHadoopCC <- RxHadoopMR()
    rxSetComputeContext(myHadoopCC)
    hdfsFS <- RxHdfsFileSystem()
    AirlineDataSet <- RxXdfData("AirlineDemoSmall/AirlineDemoSmall.xdf",
                                fileSystem = hdfsFS)

The functional model R script does not need to change to run in Hadoop:

    ### ANALYTICAL PROCESSING ###
    ### Statistical summary of the data
    rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
    ### Cross-tab the data
    rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
    ### Linear model and plot
    hdfsXdfArrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
    plot(hdfsXdfArrLateLinMod$coefficients)
ScaleR models can be deployed from a server or edge node to run in SQL Server without any functional R model re-coding for in-database computations. The compute context set in the R script determines where the model will run.

Local (parallel processing on Linux or Windows):

    ### SETUP LOCAL ENVIRONMENT VARIABLES ###
    mySqlCon <- "Driver=SQL Server;SERVER=localhost;Database=RevoTester;Uid=RevoTester;Pwd=######"
    myLocalCC <- "localpar"
    ### LOCAL COMPUTE CONTEXT ###
    rxSetComputeContext(myLocalCC)
    ### CREATE SQL SERVER DATA SOURCE ###
    AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
    AirlineDataSet <- RxOdbcData(connectionString = mySqlCon, sqlQuery = AirlineDemoQuery)

In SQL Server:

    ### SETUP SQL SERVER ENVIRONMENT VARIABLES ###
    mySqlCC <- RxInSqlServer(connectionString = mySqlCon)
    ### SQL SERVER COMPUTE CONTEXT ###
    rxSetComputeContext(mySqlCC)
    ### CREATE SQL SERVER DATA SOURCE ###
    AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
    AirlineDataSet <- RxSqlServerData(connectionString = mySqlCon, sqlQuery = AirlineDemoQuery)

The functional model R script does not need to change to run in either database:

    ### ANALYTICAL PROCESSING ###
    rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
    rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
    arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
    plot(arrLateLinMod$coefficients)
ScaleR models can be deployed from a server or edge node to run in Teradata without any functional R model re-coding for in-database computations. The compute context set in the R script determines where the model will run.

Local (parallel processing on Linux or Windows):

    ### SETUP LOCAL ENVIRONMENT VARIABLES ###
    myLocalCC <- "localpar"
    ### LOCAL COMPUTE CONTEXT ###
    rxSetComputeContext(myLocalCC)
    ### CREATE LOCAL FILE-SYSTEM POINTER AND FILE OBJECT ###
    localFS <- RxNativeFileSystem()
    AirlineDataSet <- RxXdfData("AirlineDemoSmall.xdf", fileSystem = localFS)

In Teradata:

    ### SETUP TERADATA ENVIRONMENT VARIABLES ###
    myTdCon <- "Driver=Teradata;DBCNAME=TeradataProd;Database=RevoTester;Uid=RevoTester;Pwd=######"
    ### TERADATA COMPUTE CONTEXT ###
    myTdCC <- RxInTeradata(connectionString = myTdCon)
    rxSetComputeContext(myTdCC)
    ### CREATE TERADATA DATA SOURCE ###
    AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
    AirlineDataSet <- RxTeradata(connectionString = myTdCon, sqlQuery = AirlineDemoQuery)

The functional model R script does not need to change to run in Teradata:

    ### ANALYTICAL PROCESSING ###
    rxSummary(~ ArrDelay + DayOfWeek, data = AirlineDataSet, reportProgress = 1)
    rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)
    arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
    plot(arrLateLinMod$coefficients)
Anomaly Detection
Train on what is normal (a single class): the model learns what it is like to be normal. When it encounters an item that does not fit its idea of normal, that item is counted as an anomaly.
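A minimal single-class sketch of this idea in Python (toy data and threshold, not a production detector): fit only on normal observations, then flag anything too far from the learned notion of normal:

```python
import statistics

def fit_normal(samples):
    """Learn what 'normal' looks like from normal-only training data."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomaly(x, mean, stdev, z_threshold=3.0):
    """Anything too many deviations from the learned normal is an anomaly."""
    return abs(x - mean) / stdev > z_threshold

normal_traffic = [98, 101, 99, 102, 100, 97, 103, 100, 99, 101]  # requests/sec
mean, stdev = fit_normal(normal_traffic)

print(is_anomaly(100, mean, stdev))  # typical value -> False
print(is_anomaly(250, mean, stdev))  # unlike anything seen -> True
```

Note that no anomalous examples were needed for training - the model only ever saw normal behaviour, which is the defining property of one-class approaches.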
BIG DATA & ADVANCED ANALYTICS AT A GLANCE

- Ingest: Event Hub, IoT Hub, Kafka, sensors and devices; Data Factory for data movement, pipelines & orchestration
- Store: Blobs, Data Lake
- Prep & train: Databricks, HDInsight, Data Lake Analytics, Machine Learning
- Model & serve: SQL Database, SQL Data Warehouse, Cosmos DB, Analysis Services
- Intelligence: business apps, custom apps, predictive apps, operational reports, analytical dashboards
APACHE SPARK

A unified, open-source, parallel data processing framework for big data analytics. On top of the Spark core engine sit Spark SQL (interactive queries), Spark MLlib (machine learning), Spark Streaming and Spark Structured Streaming (stream processing) and GraphX (graph computation); Spark can run under YARN, Mesos or the standalone Spark scheduler.
SPARK - BENEFITS

Performance: using in-memory computing, Spark is considerably faster than Hadoop (100x in some tests). It can be used for both batch and real-time data processing.
Developer productivity: easy-to-use APIs for processing large datasets, including 100+ operators for transforming data.
Unified engine: an integrated framework with higher-level libraries for interactive SQL queries, stream analytics, machine learning and graph processing; a single application can combine all types of processing.
Ecosystem: built-in support for many data sources, a rich ecosystem of ISV applications and a large developer community. Available on multiple public clouds (AWS, Google and Azure) and from multiple on-premises distributions.
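The in-memory performance point can be illustrated with a toy Python memoization sketch (a hypothetical partition loader, not Spark itself): once a partition has been materialized in memory, later actions reuse it instead of re-reading it from storage:

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=None)
def load_and_transform(partition):
    """Stand-in for an expensive disk read plus transformation."""
    calls["n"] += 1
    return [x * 2 for x in range(partition * 5, partition * 5 + 5)]

# Two different "actions" over the same four partitions: the second pass
# hits the in-memory copies, which is the essence of Spark's caching speedup.
total = sum(sum(load_and_transform(p)) for p in range(4))
maxima = [max(load_and_transform(p)) for p in range(4)]
```

Each partition is materialized exactly once even though it is consumed by two separate computations; disk-oriented MapReduce would pay the load cost twice.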
ADVANTAGES OF A UNIFIED PLATFORM

Spark Streaming, Spark Machine Learning and Spark SQL on a single engine.
DATABRICKS - COMPANY OVERVIEW
AZURE DATABRICKS on Microsoft Azure
AZURE DATABRICKS

A collaborative workspace where data engineers, data scientists and business analysts work together. IoT/streaming data, cloud storage, data warehouses and Hadoop storage feed in; machine learning models, BI tools, data exports and REST APIs come out. Production jobs & workflows are deployed with multi-stage pipelines, a job scheduler and notifications & logs, all running on the optimized Databricks Runtime engine (Databricks I/O, Apache Spark, serverless). Enhance productivity, build on a secure & trusted cloud, scale without limits.
KNOWING THE VARIOUS BIG DATA SOLUTIONS

From most control to most ease of use (and reduced administration):
- IaaS clusters - Azure Marketplace (HDP, CDH, MapR): any Hadoop technology, any distribution
- Managed clusters - Azure HDInsight: workload-optimized, managed clusters
- Azure Databricks: frictionless & optimized Spark clusters
- Big data as-a-service - Azure Data Lake Analytics: data engineering in a job-as-a-service model
Big data storage: Azure Data Lake Store, Azure Storage
GENERAL SPARK CLUSTER ARCHITECTURE

A driver program hosts the SparkContext, which talks to a cluster manager; the cluster manager allocates worker nodes, and the workers read from data sources (HDFS, SQL, NoSQL, ...).
SECURE COLLABORATION

Azure Databricks enables secure collaboration between colleagues: they can securely share key artifacts such as clusters, notebooks, jobs and workspaces. Secure collaboration is enabled through a combination of fine-grained permissions (access control defining who can do what on which artifacts) and AAD-based user authentication (ensuring that users are actually who they claim to be).
JOBS

Jobs are the mechanism for submitting Spark application code for execution on Azure Databricks clusters. Jobs execute either notebooks or JARs, and Azure Databricks provides a comprehensive set of graphical tools to create, manage and monitor them.
DATABRICKS SPARK IS FAST

Benchmarks have shown Databricks to often have better performance than alternatives. SOURCE: Benchmarking Big Data SQL Platforms in the Cloud
SPARK ML ALGORITHMS
DEEP LEARNING

Azure Databricks supports and integrates with a number of deep learning libraries and frameworks to make it easy to build and deploy deep learning applications, including the Microsoft Cognitive Toolkit (CNTK; an article explains how to install it on Azure Databricks), TensorFlowOnSpark and BigDL. It also offers Spark Deep Learning Pipelines, a suite of tools for working with and processing images using deep learning, with high-level APIs for common tasks - such as distributed hyperparameter tuning and transfer learning - so they can be done efficiently in a few lines of code.
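Hyperparameter tuning at its core is a search over a grid of candidate settings scored on validation data; a serial toy version in Python (the loss function is a made-up stand-in for training and validating a model - Databricks simply distributes these independent evaluations across the cluster):

```python
from itertools import product

def validation_loss(learning_rate, depth):
    """Toy stand-in for training a model and measuring validation loss."""
    return (learning_rate - 0.1) ** 2 + (depth - 4) ** 2 * 0.01

grid = {
    "learning_rate": [0.01, 0.1, 0.5],
    "depth": [2, 4, 8],
}

# Every combination is evaluated independently, which is why this
# search parallelizes so naturally across cluster workers.
candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
best = min(candidates, key=lambda p: validation_loss(**p))
# best == {"learning_rate": 0.1, "depth": 4}
```

Because no candidate depends on another's result, the wall-clock time of the whole grid shrinks roughly linearly with the number of workers.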
Visual Studio Tools for AI

- A Visual Studio extension with deep integration with Azure ML
- An end-to-end development environment, from new project through training
- Support for remote training
- Job management
- On top of all of the goodness of Visual Studio (Python, Jupyter, Git, etc.)
109 THE FASTEST TOOLKIT
110 MOST SCALABLE
111 This material is provided for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED.
Franck Mercier - Technical Solution Professional, Data + AI - Azure Databricks
http://aka.ms/franck - @FranmerMS

Thanks to our sponsors: Microsoft, JetBrains, Rubrik, Delphix and OMD (Global, Gold, Silver, Bronze and Solution sponsors).
More informationArchitecting Microsoft Azure Solutions (proposed exam 535)
Architecting Microsoft Azure Solutions (proposed exam 535) IMPORTANT: Significant changes are in progress for exam 534 and its content. As a result, we are retiring this exam on December 31, 2017, and
More informationTalend Big Data Sandbox. Big Data Insights Cookbook
Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is
More informationAgenda. Spark Platform Spark Core Spark Extensions Using Apache Spark
Agenda Spark Platform Spark Core Spark Extensions Using Apache Spark About me Vitalii Bondarenko Data Platform Competency Manager Eleks www.eleks.com 20 years in software development 9+ years of developing
More informationApproaching the Petabyte Analytic Database: What I learned
Disclaimer This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may
More informationMATLAB. Senior Application Engineer The MathWorks Korea The MathWorks, Inc. 2
1 Senior Application Engineer The MathWorks Korea 2017 The MathWorks, Inc. 2 Data Analytics Workflow Business Systems Smart Connected Systems Data Acquisition Engineering, Scientific, and Field Business
More informationMapR Enterprise Hadoop
2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS
More informationPřehled novinek v SQL Server 2016
Přehled novinek v SQL Server 2016 Martin Rys, BI Competency Leader martin.rys@adastragrp.com https://www.linkedin.com/in/martinrys 20.4.2016 1 BI Competency development 2 Trends, modern data warehousing
More informationBig Data Applications with Spring XD
Big Data Applications with Spring XD Thomas Darimont, Software Engineer, Pivotal Inc. @thomasdarimont Unless otherwise indicated, these slides are 2013-2015 Pivotal Software, Inc. and licensed under a
More informationVishesh Oberoi Seth Reid Technical Evangelist, Microsoft Software Developer, Intergen
Vishesh Oberoi Technical Evangelist, Microsoft VishO@microsoft.com @ovishesh Seth Reid Software Developer, Intergen contact@sethreid.co.nz @sethreidnz Vishesh Oberoi Technical Evangelist, Microsoft VishO@microsoft.com
More informationMicrosoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo
Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 HOTSPOT You install the Microsoft Hive ODBC Driver on a computer that runs Windows
More informationOlivia Klose Technical Evangelist. Sascha Dittmann Cloud Solution Architect
Olivia Klose Technical Evangelist Sascha Dittmann Cloud Solution Architect What is Apache Spark? Apache Spark is a fast and general engine for large-scale data processing. An unified, open source, parallel,
More informationSQL Server SQL Server 2008 and 2008 R2. SQL Server SQL Server 2014 Currently supporting all versions July 9, 2019 July 9, 2024
Current support level End Mainstream End Extended SQL Server 2005 SQL Server 2008 and 2008 R2 SQL Server 2012 SQL Server 2005 SP4 is in extended support, which ends on April 12, 2016 SQL Server 2008 and
More informationIndustry-leading Application PaaS Platform
Industry-leading Application PaaS Platform Solutions Transactional Apps Digital Marketing LoB App Modernization Services Web Apps Web App for Containers API Apps Mobile Apps IDE Enterprise Integration
More informationAzure Data Factory VS. SSIS. Reza Rad, Consultant, RADACAD
Azure Data Factory VS. SSIS Reza Rad, Consultant, RADACAD 2 Please silence cell phones Explore Everything PASS Has to Offer FREE ONLINE WEBINAR EVENTS FREE 1-DAY LOCAL TRAINING EVENTS VOLUNTEERING OPPORTUNITIES
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationMicrosoft vision for a new era
Microsoft vision for a new era United platform for the modern service provider MICROSOFT AZURE CUSTOMER DATACENTER CONSISTENT PLATFORM SERVICE PROVIDER Enterprise-grade Global reach, scale, and security
More informationAzure Data Lake Store
Azure Data Lake Store Analytics 101 Kenneth M. Nielsen Data Solution Architect, MIcrosoft Our Sponsors About me Kenneth M. Nielsen Worked with SQL Server since 1999 Data Solution Architect at Microsoft
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More information28 February 1 March 2018, Trafo Baden. #techsummitch
#techsummitch 28 February 1 March 2018, Trafo Baden #techsummitch Transform your data estate with cloud, data and AI #techsummitch The world is changing Data will grow to 44 ZB in 2020 Today, 80% of organizations
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Dr. Roland Michaely 2015 The MathWorks, Inc. 1 Data Analytics Workflow Access and Explore Data Preprocess Data Develop Predictive Models Integrate Analytics
More informationCloudSwyft Learning-as-a-Service Course Catalog 2018 (Individual LaaS Course Catalog List)
CloudSwyft Learning-as-a-Service Course Catalog 2018 (Individual LaaS Course Catalog List) Microsoft Solution Latest Sl Area Refresh No. Course ID Run ID Course Name Mapping Date 1 AZURE202x 2 Microsoft
More informationTechno Expert Solutions
Course Content of Microsoft Windows Azzure Developer: Course Outline Module 1: Overview of the Microsoft Azure Platform Microsoft Azure provides a collection of services that you can use as building blocks
More informationMicrosoft. Exam Questions Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo
Microsoft Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) Version:Demo NEW QUESTION 1 You have an Azure HDInsight cluster. You need to store data in a file format that
More informationProcessing of big data with Apache Spark
Processing of big data with Apache Spark JavaSkop 18 Aleksandar Donevski AGENDA What is Apache Spark? Spark vs Hadoop MapReduce Application Requirements Example Architecture Application Challenges 2 WHAT
More informationSQT03 Big Data and Hadoop with Azure HDInsight Andrew Brust. Senior Director, Technical Product Marketing and Evangelism
Big Data and Hadoop with Azure HDInsight Andrew Brust Senior Director, Technical Product Marketing and Evangelism Datameer Level: Intermediate Meet Andrew Senior Director, Technical Product Marketing and
More informationAzure DevOps. Randy Pagels Intelligent Cloud Technical Specialist Great Lakes Region
Azure DevOps Randy Pagels Intelligent Cloud Technical Specialist Great Lakes Region What is DevOps? People. Process. Products. Build & Test Deploy DevOps is the union of people, process, and products to
More informationMAPR DATA GOVERNANCE WITHOUT COMPROMISE
MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance
More informationPrepare. Model. Operationalize
Prepare Model Operationalize Model Re-Code Validate Deploy How do we operationalize R? Turn R analytics Web services in one line of code; Swagger-based REST APIs, easy to consume, with any programming
More informationIntegrate MATLAB Analytics into Enterprise Applications
Integrate Analytics into Enterprise Applications Lyamine Hedjazi 2015 The MathWorks, Inc. 1 Data Analytics Workflow Preprocessing Data Business Systems Build Algorithms Smart Connected Systems Take Decisions
More information17/05/2017. What we ll cover. Who is Greg? Why PaaS and SaaS? What we re not discussing: IaaS
What are all those Azure* and Power* services and why do I want them? Dr Greg Low SQL Down Under greg@sqldownunder.com Who is Greg? CEO and Principal Mentor at SDU Data Platform MVP Microsoft Regional
More informationOracle Machine Learning Notebook
Oracle Machine Learning Notebook Included in Autonomous Data Warehouse Cloud Charlie Berger, MS Engineering, MBA Sr. Director Product Management, Machine Learning, AI and Cognitive Analytics charlie.berger@oracle.com
More informationOracle Big Data Discovery
Oracle Big Data Discovery Turning Data into Business Value Harald Erb Oracle Business Analytics & Big Data 1 Safe Harbor Statement The following is intended to outline our general product direction. It
More informationSwimming in the Data Lake. Presented by Warner Chaves Moderated by Sander Stad
Swimming in the Data Lake Presented by Warner Chaves Moderated by Sander Stad Thank You microsoft.com hortonworks.com aws.amazon.com red-gate.com Empower users with new insights through familiar tools
More informationSpotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
Spotfire Data Science with Hadoop Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data THE RISE OF BIG DATA BIG DATA: A REVOLUTION IN ACCESS Large-scale data sets are nothing
More informationOracle Big Data Fundamentals Ed 2
Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies
More informationGreenplum-Spark Connector Examples Documentation. kong-yew,chan
Greenplum-Spark Connector Examples Documentation kong-yew,chan Dec 10, 2018 Contents 1 Overview 1 1.1 Pivotal Greenplum............................................ 1 1.2 Pivotal Greenplum-Spark Connector...................................
More informationMicrosoft Perform Data Engineering on Microsoft Azure HDInsight.
Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight http://killexams.com/pass4sure/exam-detail/70-775 QUESTION: 30 You are building a security tracking solution in Apache Kafka to parse
More informationWITH INTEL TECHNOLOGIES
WITH INTEL TECHNOLOGIES Commitment Is to Enable The Best Democratize technologies Advance solutions Unleash innovations Intel Xeon Scalable Processor Family Delivers Ideal Enterprise Solutions NEW Intel
More informationDatabricks, an Introduction
Databricks, an Introduction Chuck Connell, Insight Digital Innovation Insight Presentation Speaker Bio Senior Data Architect at Insight Digital Innovation Focus on Azure big data services HDInsight/Hadoop,
More informationOskari Heikkinen. New capabilities of Azure Data Factory v2
Oskari Heikkinen New capabilities of Azure Data Factory v2 Oskari Heikkinen Lead Cloud Architect at BIGDATAPUMP Microsoft P-TSP Azure Advisors Numerous projects on Azure Worked with Microsoft Data Platform
More informationNew Features and Enhancements in Big Data Management 10.2
New Features and Enhancements in Big Data Management 10.2 Copyright Informatica LLC 2017. Informatica, the Informatica logo, Big Data Management, and PowerCenter are trademarks or registered trademarks
More informationBig data streaming: Choices for high availability and disaster recovery on Microsoft Azure. By Arnab Ganguly DataCAT
: Choices for high availability and disaster recovery on Microsoft Azure By Arnab Ganguly DataCAT March 2019 Contents Overview... 3 The challenge of a single-region architecture... 3 Configuration considerations...
More informationSpecialist ICT Learning
Specialist ICT Learning APPLIED DATA SCIENCE AND BIG DATA ANALYTICS GTBD7 Course Description This intensive training course provides theoretical and technical aspects of Data Science and Business Analytics.
More information20532D: Developing Microsoft Azure Solutions
20532D: Developing Microsoft Azure Solutions Course Details Course Code: Duration: Notes: 20532D 5 days Elements of this syllabus are subject to change. About this course This course is intended for students
More informationDeveloping Microsoft Azure Solutions
Developing Microsoft Azure Solutions Duration: 5 Days Course Code: M20532 Overview: This course is intended for students who have experience building web applications. Students should also have experience
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationBI ENVIRONMENT PLANNING GUIDE
BI ENVIRONMENT PLANNING GUIDE Business Intelligence can involve a number of technologies and foster many opportunities for improving your business. This document serves as a guideline for planning strategies
More informationSQL Server 2017 Power your entire data estate from on-premises to cloud
SQL Server 2017 Power your entire data estate from on-premises to cloud PREMIER SPONSOR GOLD SPONSORS SILVER SPONSORS BRONZE SPONSORS SUPPORTERS Vulnerabilities (2010-2016) Power your entire data estate
More informationUnifying Big Data Workloads in Apache Spark
Unifying Big Data Workloads in Apache Spark Hossein Falaki @mhfalaki Outline What s Apache Spark Why Unification Evolution of Unification Apache Spark + Databricks Q & A What s Apache Spark What is Apache
More informationCOURSE 10977A: UPDATING YOUR SQL SERVER SKILLS TO MICROSOFT SQL SERVER 2014
ABOUT THIS COURSE This five-day instructor-led course teaches students how to use the enhancements and new features that have been added to SQL Server and the Microsoft data platform since the release
More informationIndex. Scott Klein 2017 S. Klein, IoT Solutions in Microsoft s Azure IoT Suite, DOI /
Index A Advanced Message Queueing Protocol (AMQP), 44 Analytics, 9 Apache Ambari project, 209 210 API key, 244 Application data, 4 Azure Active Directory (AAD), 91, 257 Azure Blob Storage, 191 Azure data
More informationLatest from the Lab: What's New Machine Learning Sam Buhler - Machine Learning Product/Offering Manager
Latest from the Lab: What's New Machine Learning Sam Buhler - Machine Learning Product/Offering Manager Please Note IBM s statements regarding its plans, directions, and intent are subject to change or
More informationAgenda. Future Sessions: Azure VMs, Backup/DR Strategies, Azure Networking, Storage, How to move
Onur Dogruoz Agenda Provide an introduction to Azure Infrastructure as a Service (IaaS) Walk through the Azure portal Help you understand role-based access control Engage in an overview of the calculator
More informationBig Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture
Big Data Syllabus Hadoop YARN Setup Programming in YARN framework j Understanding big data and Hadoop Big Data Limitations and Solutions of existing Data Analytics Architecture Hadoop Features Hadoop Ecosystem
More informationThe age of Big Data Big Data for Oracle Database Professionals
The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG
More informationBig data systems 12/8/17
Big data systems 12/8/17 Today Basic architecture Two levels of scheduling Spark overview Basic architecture Cluster Manager Cluster Cluster Manager 64GB RAM 32 cores 64GB RAM 32 cores 64GB RAM 32 cores
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationTalend Big Data Sandbox. Big Data Insights Cookbook
Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is
More information#techsummitch
www.thomasmaurer.ch #techsummitch Justin Incarnato Justin Incarnato Microsoft Principal PM - Azure Stack Hyper-scale Hybrid Power of Azure in your datacenter Azure Stack Enterprise-proven On-premises
More informationScaling MATLAB. for Your Organisation and Beyond. Rory Adams The MathWorks, Inc. 1
Scaling MATLAB for Your Organisation and Beyond Rory Adams 2015 The MathWorks, Inc. 1 MATLAB at Scale Front-end scaling Scale with increasing access requests Back-end scaling Scale with increasing computational
More informationDeveloping Microsoft Azure Solutions (70-532) Syllabus
Developing Microsoft Azure Solutions (70-532) Syllabus Cloud Computing Introduction What is Cloud Computing Cloud Characteristics Cloud Computing Service Models Deployment Models in Cloud Computing Advantages
More informationAzure Data Factory. Data Integration in the Cloud
Azure Data Factory Data Integration in the Cloud 2018 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and
More informationCombine Native SQL Flexibility with SAP HANA Platform Performance and Tools
SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been
More informationIntegrating Advanced Analytics with Big Data
Integrating Advanced Analytics with Big Data Ian McKenna, Ph.D. Senior Financial Engineer 2017 The MathWorks, Inc. 1 The Goal SCALE! 2 The Solution tall 3 Agenda Introduction to tall data Case Study: Predicting
More informationOracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data
Oracle Big Data SQL Release 3.2 The unprecedented explosion in data that can be made useful to enterprises from the Internet of Things, to the social streams of global customer bases has created a tremendous
More information