
PREDICTION OVERVIEW

In this experiment, two of the Project PEACH datasets will be used to predict the reaction of a user to atmospheric factors. This experiment represents the first iteration of the Machine Learning Process; further iterations and explorations should be performed in order to achieve the desired performance of the model. As a reference, please find a version of the Machine Learning Process described in the diagram below. This experiment can be found in the Cortana Analytics Gallery: https://gallery.cortanaanalytics.com/experiment/6bf324d25ae24ca19b20494e12c3b44d

LOAD DATA

There are multiple ways to bring data into Azure Machine Learning Studio. For this sample experiment the datasets were uploaded from local files (af_data.csv and …).

1. Upload data from a local file

Click on New at the bottom left of the page in Azure ML Studio. Go to Dataset -> From Local File.

Choose the file to upload from your local machine, check that the correct options are entered, and click OK. Repeat this for every dataset you want to upload. The uploaded datasets should now appear under Saved Datasets -> My Datasets when you start a new experiment.

2. Use the Reader module

The Reader module can be used to load data into Azure ML from various data sources: public Web URL, Hive query, Azure table, Azure blob, and data feed provider. As an example, you can load the Project PEACH datasets, which are stored in Azure blob storage (more info at http://aka.ms/uclhack). You need to insert the URI and choose the file format as shown below.
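Outside the Studio UI, the same kind of load can be sketched in plain Python. This is a hypothetical helper, not the Reader module itself, and the URI you pass it would be the blob URI from the step above:

```python
import csv
import io
import urllib.request

def parse_csv(text):
    """Parse CSV text into a list of row dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def load_csv_from_url(uri):
    """Download a CSV from a public URI (e.g. an Azure blob) and parse it."""
    with urllib.request.urlopen(uri) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

For example, `load_csv_from_url("https://<your-account>.blob.core.windows.net/.../af_data.csv")`, where the path is a placeholder for your own blob URI.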

EXPLORE DATA AND ENGINEER FEATURES

Once your datasets are in Azure ML, you can drag them into the workspace. You can visualize the data by right-clicking on the output of the dataset module and selecting Visualize. The User Profile dataset has 200,000 rows and 21 columns, and you can see a preview of the data. If you click on one of the columns, some basic statistics appear on the right-hand side. Visualize the atmospheric factors dataset. The TIMESTAMP column is a numeric feature, and the format of the date is not one recognized by Azure ML. To convert this column into a DateTime feature, you need to first convert it to a string and then use some simple R code to convert it to a format accepted by Azure ML.
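The conversion performed in the next steps can be sketched in plain Python. The yyyymmdd layout is an assumption for illustration, since the tutorial does not show the raw values:

```python
from datetime import datetime

def parse_timestamp(raw):
    """Convert a numeric timestamp such as 20160314 (assumed yyyymmdd layout)
    into an ISO date string that a DateTime parser will accept."""
    return datetime.strptime(str(raw), "%Y%m%d").date().isoformat()
```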

Search for the Metadata Editor module in the search box at the top-left of the page and drag the module onto the workspace. Connect the output of the Atmospheric dataset to the input of the Metadata Editor. Click on the Metadata Editor module and its properties will appear on the right-hand side. Click Launch column selector and include the TIMESTAMP column in the pop-up window as shown below.

Insert an Execute R Script module onto the workspace and click on it. On the right-hand side you will see a box filled with sample R code. Delete the existing code and insert the code from the text box below. This code reads the values in the TIMESTAMP column and converts them into the standard date format readable by Azure ML.

# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset1$TIMESTAMP <- as.Date(dataset1$TIMESTAMP, format="%Y%m%d")
# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("dataset1");

Click Run at the bottom of the page to run the experiment. As you progress through this sample, run the experiment after inserting new modules so you can visualize the output of the modules and check that everything is running as expected. After the experiment has finished running, right-click the output of the Execute R Script module and select Visualize. The TIMESTAMP column should now be in a different format, as shown below. The ATMOSPHERIC_CONDITION column is a string feature and the EXPOSURE_LEVEL column is a numeric feature. These columns only take a specific number of values, so their data type should be converted to categorical. This can be achieved using another Metadata Editor module.
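What the Metadata Editor does when marking a column as categorical can be sketched as recoding each distinct value to a small integer level. This is a hypothetical illustration of the idea, not Azure ML's internal representation:

```python
def to_categorical(values):
    """Map each distinct value to an integer code, in first-seen order.
    Returns (codes, levels), where levels[code] recovers the original value."""
    levels = {}
    codes = [levels.setdefault(v, len(levels)) for v in values]
    return codes, list(levels)
```

For example, an ATMOSPHERIC_CONDITION column with values ["CLEAR", "SMOG", "CLEAR"] becomes codes [0, 1, 0] over the levels ["CLEAR", "SMOG"].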

Drag another Metadata Editor module into the workspace and change its properties as shown below.

The next step is to join the two datasets so that a machine learning model can be trained on the combined data. Drag the Join module into the workspace and connect the output of the Metadata Editor module to the left-hand input and the output of the User Profile dataset to the right-hand input of the Join module. The two datasets are joined on User ID. Click on the Join module and set its properties as shown below. Make sure the Keep the right key columns in the joined table box is unticked.
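The Join module's behaviour here, an inner join on the User ID key that does not duplicate the right-hand key column, can be sketched as follows; the column names in the example are illustrative:

```python
def inner_join(left, right, key):
    """Inner-join two lists of row dicts on `key`. The right-hand copy of the
    key is absorbed rather than duplicated, mirroring the unticked
    'Keep the right key columns in the joined table' option."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            merged = dict(match)   # right-hand columns first...
            merged.update(row)     # ...then left-hand columns overwrite on clashes
            joined.append(merged)
    return joined
```

Rows whose key has no match on the other side are dropped, which is exactly what an inner join does.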

Run the experiment and, once it has finished running, right-click the output of the Join module and select Visualize. You should see a joined dataset as below.

TRAIN, SCORE AND EVALUATE MACHINE LEARNING MODELS

The data needs to be split into two sets: a training set and a testing set. The training set is used to train the machine learning models and the testing set is used to measure the performance of the trained model. To split the data, we use the Split Data module. Set the properties of the Split Data module to: Splitting mode Split Rows, Fraction of rows in the first output dataset 0.7, Random seed 0 and Stratified split False. Make sure the Randomized split checkbox is ticked. The first output of this module will contain 70% of the data, which is the training set, and the second output will contain the remaining 30%.

Next we need to choose the machine learning model for this prediction problem. We want to predict whether the user has no reaction or a negative reaction to the atmospheric factors. The user reaction is recorded in the dataset, so we will use this information to make future predictions. Hence, in this scenario we use a class of machine learning algorithms called supervised learning, as our data points are labeled. More specifically, we will use binary classification algorithms to predict the user reaction to atmospheric factors. For this sample, we select the Two-Class Boosted Decision Tree model. Search for this module and drag it into the workspace. Add a Train Model module. Connect the output of the Two-Class Boosted Decision Tree to the left-hand side input of the Train Model module and the first output of the Split Data module to the right-hand input of the Train Model module as shown below. Click on the Train Model module and set the Label column to USER'S FEEDBACK using the Launch column selector. Add a Score Model module to the workspace and connect it to the Train Model and Split Data modules as shown below. In this sample, we want to compare two different models to determine which one performs better for this particular problem. Add a Two-Class Logistic Regression module and connect it to new Train Model and Score Model modules as in the previous step. You should obtain something like the figure below.
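The Split Data settings above, a randomized 70/30 split with a fixed seed and no stratification, can be sketched as:

```python
import random

def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle with a fixed seed for reproducibility, then cut the list so the
    first part holds `fraction` of the rows (training) and the rest (testing)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]
```

With fraction=0.7 and seed=0 this mirrors the module settings: the same seed always yields the same split, which is why the experiment is repeatable.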

In order to compare and evaluate these trained models, we use the Evaluate Model module, connected as shown below. Run the experiment and, once it has finished running, right-click the output of the Evaluate Model module and select Visualize. The evaluation results page shows different metrics and performance measures for your trained algorithms. For example, the accuracy of the boosted decision tree algorithm is 0.912, very similar to the accuracy of the logistic regression algorithm, which is 0.911. The boosted decision tree performs better in terms of the ROC curve, while the logistic regression performs better in terms of precision.
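The metrics on the evaluation page reduce to simple counts over the scored test set. A minimal sketch, assuming 0/1 labels where 1 marks a negative reaction:

```python
def evaluate(actual, predicted):
    """Compute accuracy, precision and recall from paired 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Accuracy alone can hide differences between models, which is why the page also reports precision, recall and the ROC curve; two models with near-identical accuracy, as here, can still trade off differently between false positives and false negatives.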