SQL Server Machine Learning Marek Chmel & Vladimir Muzny

SQL Server Machine Learning
Marek Chmel & Vladimir Muzny
@VladimirMuzny & @MarekChmel
MCTs, MVPs, MCSEs, Data Enthusiasts!
vladimir@datascienceteam.cz, marek@datascienceteam.cz

Session Agenda
- Machine learning and data science
- SQL Server 2017 machine learning architecture
- Using R for machine learning with SQL Server
- Using Python for machine learning with SQL Server

Machine Learning Introduction
- Predict properties of new data by learning from a sample
  - Predict sales of stores in a region based on historical sales
  - Predict the probability of fraud on a new credit card transaction
  - Predict default of a new loan based on loan / transaction history
  - Predict the sentiment of a new tweet or review
  - Classify new image(s) based on sample images & attributes
  - Classify data into groups or clusters
- Popular ML technologies: R & Python
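As a rough illustration of "learning from a sample", the sketch below fits a straight line to a few historical sales figures and uses it to predict the next month. The data, the function names (fit_line, predict) and the choice of ordinary least squares are assumptions made for this example; they are not taken from the session.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new x value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical sample: month number -> sales of one store.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

model = fit_line(months, sales)   # learn from the sample
forecast = predict(model, 7)      # predict a property of new data
print(f"Forecast for month 7: {forecast:.1f}")
```

In practice the model would come from a library such as scikit-learn or revoscalepy; the point here is only the split between a training step and a prediction step.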

Advanced analytics, or data science or artificial intelligence?

Machine learning / data mining algorithms

Data science more than data engineering

Main Differences DS vs. BI

Data Science & Machine Learning Roles
- Data Scientist: a highly educated and skilled person who can solve complex data problems by employing deep expertise in scientific disciplines (mathematics, statistics or computer science)
- Data Professional: a skilled person who creates or maintains data systems and data solutions, or implements predictive modelling. Roles: Database Administrator, Database Developer, or BI Developer
- Software Developer: a skilled person who designs and develops programming logic, and can apply machine learning to integrate predictive functionality into applications

Machine Learning Challenges

Real World Applications

Microsoft Rs
- Microsoft R Open
- Microsoft R Open in Azure ML
- Microsoft R Client
- Microsoft R Server: for HDInsight, for Hadoop, for Linux (SUSE, Red Hat/CentOS)
- Microsoft SQL Server 2016 R Services: on-prem and for Azure SQL Database (preview)
- Microsoft SQL Server 2017 Machine Learning Services
- Microsoft Machine Learning Server

Python
- Fewer statistics/ML packages than R, but becoming just enough
- Nowhere near as vast as R in scope
- Great as glue: orchestration and scripting
- Key data science libraries
  - numpy & scipy (numeric processing and stats)
  - pandas (data frames)
  - matplotlib and ggplot (charts)
  - scikit-learn (mining)
  - microsoftml* and revoscalepy*

Machine Learning Services History
- 2015: Microsoft acquires Revolution Analytics
- 2016: SQL Server R Services
- 2017: SQL Server Machine Learning Services

Machine Learning Architecture
- Extensibility framework
  - Create a better interface between SQL Server and data science languages such as R and Python
  - Reduce the friction that occurs when data science solutions are moved into production
  - Protect data that might be exposed during the data science development process
- Executing a trusted scripting language within a secure framework
  - The database developer can maintain security while allowing data scientists to use enterprise data
- SQL Server 2016
  - Extensibility Framework
  - R Support (3.2.2)
  - Microsoft R Server
- SQL Server 2017
  - Python Support (3.5.2)
  - R Support (3.3.3)
  - Native Scoring using PREDICT
  - In-database Package Management
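To make the extensibility framework a little more concrete, the sketch below composes the T-SQL batch that hands a Python script to SQL Server through sp_execute_external_script. The stored procedure, its @language, @script and @input_data_1 parameters, and the default InputDataSet / OutputDataSet variable names are part of SQL Server; the helper function, the dbo.Sales table and the amount column are hypothetical names invented for this example.

```python
def build_external_script_call(python_script: str, input_query: str) -> str:
    """Return a T-SQL batch that runs python_script inside SQL Server."""

    def quote(text: str) -> str:
        # T-SQL Unicode string literal; embedded single quotes are doubled.
        return "N'" + text.replace("'", "''") + "'"

    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'Python',\n"
        f"    @script = {quote(python_script)},\n"
        f"    @input_data_1 = {quote(input_query)};"
    )


# By default SQL Server passes the query result to the script as the pandas
# DataFrame InputDataSet and returns whatever is assigned to OutputDataSet.
script = "OutputDataSet = InputDataSet[InputDataSet['amount'] > 100]"

# dbo.Sales and its amount column are hypothetical.
batch = build_external_script_call(script, "SELECT amount FROM dbo.Sales")
print(batch)
```

The printed batch can be pasted into a query window. It only runs on an instance where Machine Learning Services is installed and the 'external scripts enabled' option has been turned on with sp_configure.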

Architecture core concepts
- Multi-process architecture
- Full interoperability with open source R and Python
  - R and Python can function independently on SQL Server
  - Microsoft provides a set of proprietary libraries that provide integration with SQL Server
- Security
  - Support for both integrated Windows authentication and password-based SQL logins
  - SQL Server Trusted Launchpad to manage external script execution
- Scalability and performance
  - Resource governance and parallel processing using SQL Server
  - Distributed computing provided by the algorithms in RevoScaleR and revoscalepy

R Language Architecture
- RevoScaleR: includes a variety of APIs for data manipulation and analysis. The APIs have been optimized to analyze data sets that are too big to fit in memory and to distribute computations over several cores or processors.
- RevoPemaR: Parallel External Memory Algorithms, a framework for developing your own parallel algorithms

Python and SQL Server
revoscalepy is a new library provided by Microsoft to support distributed computing, remote compute contexts, and high-performance algorithms for Python. It is based on the RevoScaleR package for R, which was provided in Microsoft R Server and SQL Server R Services, and aims to provide the same functionality:
- Supports multiple compute contexts, both remote and local
- Provides functions equivalent to those in RevoScaleR for data transformation and visualization
- Provides Python versions of RevoScaleR machine learning algorithms for distributed or parallel processing
- Improved performance, including use of the Intel math libraries

Best Practices: Resources
- Memory is a key constraint for R / Python scripts
  - Use the sys.dm_resource_governor_external_resource_pools DMV with a test workload
- Leverage Resource Governance to isolate SQL & external scripts
  - New EXTERNAL RESOURCE POOL object
- Leverage Always On secondaries to offload external script execution

Best Practices: Operationalization
- Secure out-of-the-box defaults
  - Some lift-n-shift scripts may not work. Ex: installing packages or reaching out to external resources
- Leverage SQL Server data integration capabilities
  - Ex: DQ to pull data from other sources, SSIS, external tables
- Leverage SQL query processing integration
  - Batch mode execution on Columnstore data
  - Parallel execution for training (rx* functions) and scoring
  - Streaming execution of external scripts

Python in SQL Server 2017
- Anaconda distribution
  - Distribution of Python focused on data science
  - Package and environment manager
  - Installs with more than 100 packages
- Python version 3.5.2
- Jupyter notebooks

R in SQL Server 2017
- Best-in-class scientific language
- Numerous packages available
- R 3.3.3
- RStudio and external connectivity

Popular data science packages
- NumPy: N-dimensional arrays, random numbers
- Pandas: data manipulation, DataFrame object
- SciPy: scientific computing and statistical methods
- Scikit-learn: machine learning
- Matplotlib: plotting and graphics
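A small sketch of the first two packages in use: a NumPy N-dimensional array filled with random numbers, and a pandas DataFrame aggregated per group. The store names and sales values are made up for the example. It assumes a reasonably recent NumPy (np.random.default_rng is not available in the older NumPy builds shipped alongside Python 3.5).

```python
import numpy as np
import pandas as pd

# NumPy: a 2 x 3 array of normally distributed random numbers.
rng = np.random.default_rng(seed=0)
noise = rng.normal(size=(2, 3))
print(noise.shape)

# pandas: a DataFrame of hypothetical sales, averaged per store.
df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "sales": [100.0, 120.0, 80.0, 90.0],
})
mean_sales = df.groupby("store")["sales"].mean()
print(mean_sales)
```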

DEMO Machine Learning