Mike Schulte Data Scientist at the University of Pittsburgh Professor of Economics and Philosophy at Western Michigan University

Mike Schulte (mike@shrewd-owl.com)
Data Scientist at the University of Pittsburgh
Professor of Economics and Philosophy at Western Michigan University

- Advanced Analytics Introduced
- Advanced Analytics within SQL Server and Excel
- R and RStudio
- Connecting R to SQL Server
- Solution Examples

Summary Statistics: Historical View
Traditional business intelligence work already does a lot of this.

Fit Mathematical Models: Present-Day View
Captures current capabilities and performance.

Fit Statistical Models: Forward-Looking View
Captures likely future outcomes based on past and present outcomes.
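To make the three views concrete, here is a small sketch using R's built-in mtcars dataset as a stand-in (the talk does not name a dataset). The summary covers the historical view, the fitted coefficients describe the current relationship, and a prediction interval for a hypothetical 3,000 lb car gives a forward-looking estimate with its uncertainty.

```r
# Three views of the same data, sketched on R's built-in mtcars dataset
# (an illustrative stand-in; not the presenter's data).
data(mtcars)

# Historical view: summary statistics of what already happened.
hist_summary <- summary(mtcars$mpg)

# Present-day view: a fitted model describing the current relationship
# between fuel economy (mpg) and weight (wt, in 1000 lbs).
fit <- lm(mpg ~ wt, data = mtcars)
present_view <- coef(fit)

# Forward-looking view: a prediction, with a prediction interval,
# for a hypothetical 3,000 lb car.
new_car <- data.frame(wt = 3.0)
forward_view <- predict(fit, newdata = new_car, interval = "prediction")

print(hist_summary)
print(present_view)
print(forward_view)
```

The interval columns (lwr, upr) are what separate a forward-looking statistical view from a single point estimate.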

- SQL Server and basic SQL statements
- Excel Data Mining Add-In
- Analysis Services and DMX
- R and R Services
- Microsoft Azure Machine Learning

library(e1071)
# Naive Bayes classifier predicting 'class' from all other columns of 'products'
nb_model <- naiveBayes(class ~ ., data = products)
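The one-liner above relies on e1071's naiveBayes(). To show roughly what such a model estimates, here is a minimal base-R sketch for categorical features: class priors plus per-feature conditional probabilities (with Laplace smoothing), combined by summing log-probabilities. The products data and the helper names train_nb() and predict_nb() are hypothetical, and this is not e1071's implementation.

```r
# Hypothetical 'products' data with two categorical features.
products <- data.frame(
  size  = c("small", "large", "large", "small", "large", "small"),
  color = c("red", "blue", "red", "red", "blue", "blue"),
  class = c("cheap", "pricey", "pricey", "cheap", "pricey", "cheap"),
  stringsAsFactors = TRUE
)

# Estimate P(class) and P(feature value | class), Laplace-smoothed.
train_nb <- function(data, target, laplace = 1) {
  y <- data[[target]]
  features <- setdiff(names(data), target)
  tab <- table(y)
  priors <- setNames(as.numeric(tab) / sum(tab), names(tab))
  cond <- lapply(features, function(f) {
    counts <- table(y, data[[f]]) + laplace
    counts / rowSums(counts)   # each row (class) sums to 1
  })
  names(cond) <- features
  list(priors = priors, cond = cond)
}

# Score each class by its log-posterior (up to a constant); pick the best.
predict_nb <- function(model, newrow) {
  classes <- names(model$priors)
  scores <- sapply(classes, function(k) {
    s <- log(model$priors[[k]])
    for (f in names(model$cond)) {
      s <- s + log(model$cond[[f]][k, as.character(newrow[[f]])])
    }
    s
  })
  names(which.max(scores))
}

nb <- train_nb(products, "class")
predict_nb(nb, list(size = "small", color = "red"))
```

Summing logs rather than multiplying probabilities avoids numerical underflow when there are many features.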

Excel Data Mining Add-In:
- Wizard interface: no programming required!
- Contained within Excel
- Limited capabilities
- Older algorithms

Analysis Services and DMX:
- More flexible than the Excel add-in
- Integrates well with the rest of the SQL Server stack
- Limited capabilities
- Older algorithms
- Requires specialized knowledge of DMX

R:
- Statistical programming environment
- Open source
- Powerful and flexible
- Large user community
- Requires specialized knowledge of, well, R!

Who uses R?
- Academic statisticians
- Pharmaceutical companies
- Government agencies
- Professional consultants
- Business analysts
- Converted SAS users!
- And more

1. Create a System DSN for the connection
2. Connect R to your SQL Server database
3. Pull data from SQL Server into R
4. Analyze the data to create a model
5. Operationalize the model

This can still be a useful way to use R with SQL Server.

Open Administrative Tools in Control Panel

Manage the ODBC Data Sources

Create a New System Data Source Name

Choose SQL Server Native Client 11.0

Choose the SQL Server Installation You Want

Recommended: Use Windows Authentication

Install RODBC Package in R

With RODBC you can:
- Issue standard queries
- Drop, create, and fetch tables
- List available tables
- See the package documentation for more

Load RODBC Package and Connect to DSN

You can now issue queries from within R!
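If creating a System DSN is not an option, RODBC also provides odbcDriverConnect(), which takes a full ODBC connection string instead of a DSN name. The helper sqlserver_conn_str() below is hypothetical, as are the server and database names; Trusted_Connection=yes is the SQL Server ODBC keyword that requests Windows Authentication, matching the recommendation above.

```r
# Hypothetical helper: build a DSN-less connection string for SQL Server.
# "localhost" and "AutoDB" below are placeholder names.
sqlserver_conn_str <- function(server, database,
                               driver = "SQL Server Native Client 11.0") {
  # Trusted_Connection=yes requests Windows Authentication.
  sprintf("Driver={%s};Server=%s;Database=%s;Trusted_Connection=yes;",
          driver, server, database)
}

conn_str <- sqlserver_conn_str("localhost", "AutoDB")
print(conn_str)

# With RODBC installed and a reachable server, you would then connect with:
# library(RODBC)
# channel <- odbcDriverConnect(conn_str)
```

The DSN route keeps connection details out of scripts; the connection-string route avoids per-machine ODBC setup. Which is preferable depends on how the R code will be deployed.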

Bring the Data into R
library(RODBC)
# Connect through the System DSN created earlier ("rconnection")
channel <- odbcConnect("rconnection")
autodata <- sqlQuery(channel, "SELECT id, mpg, cylinders, displacement, horsepower, weight, acceleration FROM [dbo].[autodata];")
# Rows with no missing values become training data; the rest need imputation
trainingdata <- autodata[complete.cases(autodata), ]
missingdata <- autodata[!complete.cases(autodata), ]

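Build a Linear Regression Model and Impute
# Fit mpg on horsepower and weight using complete rows only
automodel <- lm(mpg ~ horsepower + weight, data = trainingdata)
# Predict the missing mpg values, rounded to one decimal place
missingdata$mpg <- round(predict(automodel, newdata = missingdata), 1)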
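Without a database at hand, the same imputation step can be tried on R's built-in mtcars data, where hp and wt play the roles of horsepower and weight. The three blanked-out mpg values are simulated missing data, not part of the original example.

```r
# Regression imputation on mtcars, with a few mpg values blanked out
# to stand in for missing database values.
data(mtcars)
autodata <- mtcars
autodata$id <- seq_len(nrow(autodata))
autodata$mpg[c(3, 10, 20)] <- NA   # simulated missing values

trainingdata <- autodata[complete.cases(autodata), ]
missingdata  <- autodata[!complete.cases(autodata), ]

# Fit on complete rows only, then predict the blanks.
automodel <- lm(mpg ~ hp + wt, data = trainingdata)
missingdata$mpg <- round(predict(automodel, newdata = missingdata), 1)

# Write the imputed values back into the full data set, matched by id.
autodata$mpg[match(missingdata$id, autodata$id)] <- missingdata$mpg
summary(automodel)$r.squared
```

Matching by id rather than by row position mirrors how the database update keys on id.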

Update Our Database with Imputed Values
# Send one UPDATE statement per imputed row
for (i in seq_len(nrow(missingdata))) {
  querystring <- paste0("UPDATE dbo.autodata SET mpg = ", missingdata$mpg[i],
                        " WHERE id = ", missingdata$id[i])
  sqlQuery(channel, querystring)
}
# Close the connection when finished
odbcClose(channel)
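As an alternative to assembling each UPDATE inside the loop above, all statements can be built at once with sprintf(), which is vectorized. This sketch also skips rows whose prediction is NA (for example, when horsepower or weight was itself missing), since "SET mpg = NA" is not valid T-SQL. The missingdata values here are hypothetical.

```r
# Hypothetical imputed rows; the second prediction failed (NA).
missingdata <- data.frame(id = c(7, 42, 105), mpg = c(23.4, NA, 17.9))

# Keep only rows with a usable prediction.
ok <- !is.na(missingdata$mpg)
queries <- sprintf("UPDATE dbo.autodata SET mpg = %.1f WHERE id = %d;",
                   missingdata$mpg[ok], as.integer(missingdata$id[ok]))
print(queries)

# With an open RODBC channel, each statement could then be sent:
# for (q in queries) sqlQuery(channel, q)
```

Formatting with %.1f and %d also guarantees that each value reaches the SQL text as a plain number.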

In-Database R with SQL Server R Services
Note that this approach is new with SQL Server 2016.

Advantages:
- Data do not have to move
- Performance improvement (scale, parallelism)

Challenges:
- Harder to code
- Harder to set up access
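With R Services, R code is passed as the @script argument of the T-SQL procedure sp_execute_external_script. The rows returned by the @input_data_1 query arrive in R as the data frame InputDataSet, and the data frame assigned to OutputDataSet is returned to SQL Server as a result set. The sketch below reworks the imputation example as such a script body; the T-SQL wrapper is not shown, and InputDataSet is simulated with hypothetical values so the script also runs in a plain R session.

```r
# R script body for sp_execute_external_script (sketch).
# Inside SQL Server, InputDataSet is supplied by @input_data_1;
# here it is simulated with hypothetical rows when run standalone.
if (!exists("InputDataSet")) {
  InputDataSet <- data.frame(
    id = 1:6,
    mpg = c(18, NA, 24, 30, NA, 15),
    horsepower = c(130, 95, 97, 67, 110, 150),
    weight = c(3504, 2833, 2130, 1850, 2600, 3693)
  )
}

# Fit on rows with a known mpg.
train <- InputDataSet[!is.na(InputDataSet$mpg), ]
model <- lm(mpg ~ horsepower + weight, data = train)

# Predict mpg for the rows where it is missing.
to_impute <- InputDataSet[is.na(InputDataSet$mpg), ]
to_impute$mpg <- round(predict(model, newdata = to_impute), 1)

# Only (id, mpg) pairs are returned; the data stay on the database server.
OutputDataSet <- to_impute[, c("id", "mpg")]
print(OutputDataSet)
```

This is where the "harder to code" challenge shows up: the R code lives inside a T-SQL string, so it is usually developed and tested as a standalone script first, as here.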

There are lots of use cases that fit into several categories:
- Association Analysis (Market Basket Analysis)
- Classification
- Estimation
- Simulation and Optimization
- Clustering
- And more

Products often sell well together. Some of these patterns are well established and may only be confirmed by the analysis. More unexpected patterns, like the apocryphal beer and diapers example, might be discovered too, providing additional insight.

- Explore associations
- Confirm expected patterns
- Find unexpected patterns
- Create actionable insights
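Association strength is usually measured with support, confidence, and lift. This base-R sketch computes all three for one candidate rule, {beer} => {diapers}, on a small hypothetical set of baskets; for real transaction data, packages such as arules (with its apriori() function) mine all rules at once.

```r
# Hypothetical shopping baskets.
baskets <- list(
  c("beer", "diapers", "chips"),
  c("beer", "diapers"),
  c("milk", "bread"),
  c("diapers", "milk"),
  c("beer", "chips"),
  c("beer", "diapers", "milk")
)

# TRUE for each basket that contains the item.
has <- function(item) vapply(baskets, function(b) item %in% b, logical(1))

# Share of baskets containing both sides of the rule.
support    <- function(lhs, rhs) mean(has(lhs) & has(rhs))
# Share of lhs baskets that also contain rhs.
confidence <- function(lhs, rhs) support(lhs, rhs) / mean(has(lhs))
# Lift > 1: the items co-occur more often than independence would predict.
lift       <- function(lhs, rhs) confidence(lhs, rhs) / mean(has(rhs))

c(support    = support("beer", "diapers"),
  confidence = confidence("beer", "diapers"),
  lift       = lift("beer", "diapers"))
```

Here 3 of 6 baskets contain both items (support 0.5), 3 of the 4 beer baskets contain diapers (confidence 0.75), and since diapers appear in 4 of 6 baskets overall, lift is 0.75 / (4/6) = 1.125.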

- Set up periodic monitoring of known rules
- Detect drops in association strength and investigate
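Monitoring can be as simple as recomputing a known rule's lift each period and flagging periods that fall well below a baseline. The monthly lift values, the choice of the first three months as the baseline, and the 20% drop threshold are all hypothetical choices for this sketch.

```r
# Hypothetical lift of one known rule, recomputed each month.
monthly_lift <- c(Jan = 1.42, Feb = 1.38, Mar = 1.45, Apr = 1.07, May = 1.40)

baseline  <- median(monthly_lift[1:3])   # reference level from early periods
threshold <- 0.8 * baseline              # flag drops of more than 20%

flagged <- names(monthly_lift)[monthly_lift < threshold]
flagged  # periods to investigate
```

A median baseline is less sensitive to one unusual month than a mean would be.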

A charter fishing company wishes to determine the optimal number of boats to have in service. Too many boats will mean wasted resources, while too few boats will mean missed opportunities.

Use historical and forecast data to fit a distribution

Use the fitted distribution to project revenue for each additional boat. Decide how many boats to keep!
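One way to carry out these two steps is sketched below, under assumptions the talk does not state: each boat runs one charter per day, daily demand follows a Poisson distribution, and the revenue per trip and daily cost per boat are hypothetical figures. The k-th boat only earns revenue on days with at least k requests, so its expected marginal profit is revenue × P(D ≥ k) − cost.

```r
# Hypothetical daily charter requests (historical plus forecast).
demand <- c(6, 9, 7, 11, 8, 5, 10, 7, 9, 8, 12, 6, 7, 9)
lambda <- mean(demand)          # maximum-likelihood estimate of the Poisson rate

revenue_per_trip <- 600         # hypothetical revenue per charter
cost_per_boat    <- 250         # hypothetical daily cost of keeping a boat

boats <- 1:20
# P(D >= k) = P(D > k - 1), i.e. the upper tail starting at k.
p_used <- ppois(boats - 1, lambda, lower.tail = FALSE)
marginal_profit <- revenue_per_trip * p_used - cost_per_boat

# Keep adding boats while the next one is expected to pay for itself.
best_fleet <- max(boats[marginal_profit > 0])
best_fleet
```

Because P(D ≥ k) falls as k grows, the marginal profit of each additional boat shrinks, which is exactly the trade-off between wasted resources and missed opportunities.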

We would like to group countries that are economically similar to one another.

We begin with data on each country:
- Median GDP growth (3 years)
- Population (in millions)
- Enabling Trade Index

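setwd("c:/users/michael/desktop/demos")
# stringsAsFactors = FALSE keeps text columns as character
dfrm <- read.csv(file = "clustering-demo-data.csv", header = TRUE, stringsAsFactors = FALSE)
# Standardize each variable (mean 0, sd 1) so no single unit dominates the distances
dfrm$scgdpg <- as.numeric(scale(dfrm$medgdpg, center = TRUE, scale = TRUE))
dfrm$scpop <- as.numeric(scale(dfrm$pop13, center = TRUE, scale = TRUE))
dfrm$sceti <- as.numeric(scale(dfrm$eti, center = TRUE, scale = TRUE))
# k-means with 5 clusters and 10 random starts on the standardized columns
kmc <- kmeans(dfrm[, c("scgdpg", "scpop", "sceti")], centers = 5, nstart = 10)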

Cluster 1: 50 Countries

Cluster 2: 42 Countries

Cluster 3: 11 Countries

Cluster 4: 2 Countries

Cluster 5: 33 Countries
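Cluster sizes like these come from kmc$size. Since the demo CSV is not available here, this sketch reruns the same pipeline on simulated data for 138 countries (the total of the five clusters above). It also shows why the columns are standardized first: without scaling, population in millions would dominate the Euclidean distances that k-means uses. All values are simulated, so the resulting clusters will not match the talk's.

```r
set.seed(42)
n <- 138
dfrm <- data.frame(
  medgdpg = rnorm(n, mean = 3, sd = 2),            # % growth, simulated
  pop13   = rlnorm(n, meanlog = 2, sdlog = 1.5),   # millions, simulated
  eti     = runif(n, min = 2.5, max = 6)           # index score, simulated
)

scaled <- scale(dfrm)             # center and scale every column
round(apply(scaled, 2, sd), 6)    # each column now has standard deviation 1

kmc <- kmeans(scaled, centers = 5, nstart = 10)
kmc$size                          # number of countries in each cluster
```

Setting nstart = 10 runs k-means from ten random starting points and keeps the best solution, which reduces the chance of a poor local optimum.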

What sale price should I use for Froot Loops?

Use historical data to determine lift for each price point.

Use lift to determine relative profit for each price point. Recommend a sale price to your marketing and sales teams!
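The calculation behind these two steps can be sketched with a small table of price points. The prices, weekly unit sales, and unit cost below are hypothetical, not figures from the talk. Lift is measured against the regular price, and profit is the margin per box times units sold.

```r
# Hypothetical history for one product at several prices.
history <- data.frame(
  price = c(3.99, 3.49, 2.99, 2.49),   # 3.99 = regular (baseline) price
  units = c(400, 500, 780, 900)        # average units sold per week
)
unit_cost <- 1.80                      # hypothetical cost per box

# Lift: sales at each price relative to sales at the regular price.
history$lift <- history$units / history$units[1]

# Profit at each price point, and relative to the regular price.
history$profit <- (history$price - unit_cost) * history$units
history$rel_profit <- history$profit / history$profit[1]

best <- history$price[which.max(history$profit)]
history
best
```

With these numbers the deepest discount has the largest lift (2.25) but not the largest profit, because its thin margin outweighs the extra volume; 2.99 comes out best.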

Two Broad Areas of Concern:
- Jobs
- Ethics