Machine Learning - Regression. CS102 Fall 2017

Big Data Tools and Techniques
- Basic Data Manipulation and Analysis: performing well-defined computations or asking well-defined questions ("queries")
- Data Mining: looking for patterns in data
- Machine Learning: using data to make inferences or predictions
- Data Visualization: graphical depiction of data
- Data Collection and Preparation

Machine Learning: using data to make inferences or predictions
- Supervised machine learning: a set of labeled examples to learn from (the training data); develop a model from the training data, then use the model to make inferences about new data
- Unsupervised machine learning: unlabeled data; look for patterns or structure (similar to data mining)
Also:
- Reinforcement learning: improve the model as new data arrives
- Semi-supervised learning: labeled + unlabeled data
- Active learning: semi-supervised, ask the user for labels

Regression
Using data to make inferences or predictions. Supervised.
- Training data, each example: a set of predictor values (independent variables) and a numeric output value (dependent variable)
- The model is a function from predictors to output
- Use the model to predict the output value for new predictor values
Example. Predictors: mother's height, father's height, current age. Output: height.

Other Types of Machine Learning: using data to make inferences or predictions
- Classification: like regression except the output values are labels or categories. Example. Predictor values: age, gender, income, profession. Output value: buyer, non-buyer.
- Clustering: unsupervised; group data into sets of items similar to each other. Example: group customers based on spending patterns.

Back to Regression
- Set of predictor values: independent variables
- Numeric output value: dependent variable
- The model is a function from predictors to output
Training data:
  w1, x1, y1, z1 → o1
  w2, x2, y2, z2 → o2
  w3, x3, y3, z3 → o3
  ...
Model: f(w, x, y, z) = o

Back to Regression
Goal: the function f applied to the training data should produce values as close as possible, in aggregate, to the actual outputs.
Training data:
  w1, x1, y1, z1 → o1
  w2, x2, y2, z2 → o2
  w3, x3, y3, z3 → o3
  ...
Model: f(w, x, y, z) = o, with
  f(w1, x1, y1, z1) ≈ o1
  f(w2, x2, y2, z2) ≈ o2
  f(w3, x3, y3, z3) ≈ o3
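To make this setup concrete, here is a minimal Python sketch, not from the original slides, that fits a model f(w, x, y, z) = o from training rows and applies it to new predictor values; it assumes scikit-learn is available and uses made-up numbers:

    from sklearn.linear_model import LinearRegression

    # Training data: rows of predictor values (w, x, y, z) and their outputs o
    # (made-up numbers for illustration only)
    predictors = [[1.0, 2.0, 3.0, 4.0],
                  [2.0, 1.0, 4.0, 3.0],
                  [3.0, 3.0, 1.0, 2.0]]
    outputs = [10.0, 12.0, 9.0]

    # Develop the model f(w, x, y, z) = o from the training data
    model = LinearRegression().fit(predictors, outputs)

    # Use the model to predict the output for new predictor values
    print(model.predict([[2.0, 2.0, 2.0, 2.0]]))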

Simple Linear Regression
We will focus on:
- One numeric predictor value, call it x
- One numeric output value, call it y
→ Data items are points in two-dimensional space
- Functions f(x) = y that are lines (for now)

Simple Linear Regression
Functions f(x) = y that are lines: y = ax + b. Example: y = 0.8x + 2.6

Real Examples (from Overview)

Summary So Far
Given: a set of known (x, y) points.
Find: the function f(x) = ax + b that best fits the known points, i.e., f(x) is close to y.
Use the function to predict y values for new x's.
→ Can also be used to test correlation.

Correlation and Causation (from Overview)
- Correlation: values track each other. Examples: height and shoe size; grades and SAT scores.
- Causation: one value directly influences another. Examples: education level → starting salary; temperature → cold drink sales.
Regression finds the function f(x) = ax + b that best fits the known points, i.e., f(x) is close to y. The better the function fits the points, the more correlated x and y are.

Regression and Correlation
The better the function fits the points, the more correlated x and y are (linear functions only).
- Correlation: values track each other, positively: when one goes up, the other goes up.
- There is also negative correlation: when one goes up, the other goes down. Examples: latitude versus temperature, car weight versus gas mileage, class absences versus final grade.

Next
- Calculating simple linear regression
- Measuring correlation
- Regression through spreadsheets
- Shortcomings and dangers
- Polynomial regression

Calculating Simple Linear Regression
Method of least squares:
- Given a point and a line, the error for the point is its vertical distance d from the line, and the squared error is d².
- Given a set of points and a line, the sum of squared error (SSE) is the sum of the squared errors for all the points.
Goal: given a set of points, find the line that minimizes the SSE.

Calculating Simple Linear Regression
Method of least squares. [Figure: five points around a candidate line, with vertical distances d1 ... d5]
SSE = d1² + d2² + d3² + d4² + d5²
Goal: find the line that minimizes the SSE.
Good news! There are many software packages to do it for you.
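As one illustration of what those packages compute, here is a hedged NumPy sketch of the closed-form least-squares solution, using made-up points rather than course data:

    import numpy as np

    def fit_line(x, y):
        """Least-squares fit of y = a*x + b; returns (a, b)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b = y.mean() - a * x.mean()
        return a, b

    # Made-up points for illustration
    x = [1, 2, 3, 4, 5]
    y = [3.1, 4.4, 5.2, 6.1, 6.6]
    a, b = fit_line(x, y)

    # SSE for the fitted line (the quantity least squares minimizes)
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    print(f"y = {a:.2f}x + {b:.2f}, SSE = {sse:.3f}")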

Measuring Correlation
More help from software packages.
- Pearson's Product Moment Correlation (PPMC), also called the Pearson coefficient or correlation coefficient: a value r between 1 and -1, where 1 is maximum positive correlation, 0 is no correlation, and -1 is maximum negative correlation. The better the function fits the points, the more correlated x and y are. Swapping the x and y axes yields the same value.
- Coefficient of determination (r², R², R squared): measures the fit of any line/curve to a set of points; usually between 0 and 1. For simple linear regression, R² = Pearson r².
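A quick hedged check of these quantities (assuming NumPy; the data is made up for illustration):

    import numpy as np

    # Made-up data for illustration
    x = np.array([1, 2, 3, 4, 5], float)
    y = np.array([3.1, 4.4, 5.2, 6.1, 6.6], float)

    # Pearson correlation coefficient r
    r = np.corrcoef(x, y)[0, 1]

    # For simple linear regression, R^2 equals r^2
    print(f"r = {r:.4f}, R^2 = {r**2:.4f}")

    # Swapping the x and y axes yields the same value
    assert np.isclose(r, np.corrcoef(y, x)[0, 1])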

Correlation Game
http://aionet.eu/corguess (*)
Try to get: Right answers ≥ 10, Guesses ≤ Right answers + 2
Anti-cheating: Pictures = Right answers + 1
(*) An improved version of the Wilderdom correlation guessing game, thanks to Poland participant Marcin Piotrowski.
Other correlation games:
http://guessthecorrelation.com/
http://www.rossmanchance.com/applets/guesscorrelation.html
http://www.istics.net/correlations/

Regression Through Spreadsheets
City temperatures (using Cities.csv):
1. temperature (y) versus latitude (x)
2. temperature (y) versus longitude (x)
3. latitude (y) versus longitude (x)
4. longitude (y) versus latitude (x)

Regression Through Spreadsheets (2)
Spreadsheet CORREL() function.
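For the same four computations outside a spreadsheet, here is a hedged pandas sketch; the column names latitude, longitude, and temperature in Cities.csv are assumptions based on the slide, not confirmed by it:

    import pandas as pd

    # Cities.csv with latitude, longitude, temperature columns (assumed names)
    cities = pd.read_csv("Cities.csv")

    for x_col, y_col in [("latitude", "temperature"),
                         ("longitude", "temperature"),
                         ("longitude", "latitude"),
                         ("latitude", "longitude")]:
        r = cities[x_col].corr(cities[y_col])  # Pearson, like CORREL()
        print(f"{y_col} vs {x_col}: r = {r:.3f}")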

Shortcomings of Simple Linear Regression
Anscombe's Quartet (from Overview): four very different datasets with identical best-fit lines. Also identical R² values!

Reminder
Goal: the function f applied to the training data should produce values as close as possible, in aggregate, to the actual outputs.
Training data:
  w1, x1, y1, z1 → o1
  w2, x2, y2, z2 → o2
  w3, x3, y3, z3 → o3
  ...
Model: f(w, x, y, z) = o, with
  f(w1, x1, y1, z1) ≈ o1
  f(w2, x2, y2, z2) ≈ o2
  f(w3, x3, y3, z3) ≈ o3

Polynomial Regression
Given: a set of known (x, y) points.
Find: the function f that best fits the known points, i.e., f(x) is close to y:
  f(x) = a0 + a1 x + a2 x² + ... + an xⁿ
- Best fit is still the method of least squares.
- Still have the coefficient of determination R² (no r).
- Pick the smallest degree n that fits the points reasonably well.
Also exponential regression: f(x) = a bˣ
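A minimal sketch of degree selection (assuming NumPy; synthetic data, not from the slides): fit several degrees by least squares, compare R², then pick the smallest degree that fits reasonably well.

    import numpy as np

    # Synthetic points for illustration
    x = np.array([0, 1, 2, 3, 4, 5], float)
    y = np.array([2.1, 2.9, 4.8, 8.2, 13.1, 20.4], float)

    for n in (1, 2, 3):
        coeffs = np.polyfit(x, y, n)          # least-squares fit of degree n
        y_hat = np.polyval(coeffs, x)
        ss_res = np.sum((y - y_hat) ** 2)     # SSE
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2 = 1 - ss_res / ss_tot              # coefficient of determination
        print(f"degree {n}: R^2 = {r2:.4f}")
    # Pick the smallest degree that fits the points reasonably well.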

Dangers of (Polynomial) Regression
Overfitting and Underfitting (from Overview).
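To see the overfitting danger concretely, here is a hedged NumPy sketch on synthetic, roughly linear data: a high-degree polynomial drives the training SSE to nearly zero, yet its predictions between the data points can swing far from the underlying trend.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 5, 8)
    y = 2 * x + 1 + rng.normal(0, 0.5, x.size)   # roughly linear data plus noise

    for n in (1, 7):                              # sensible vs. overfit degree
        coeffs = np.polyfit(x, y, n)              # degree 7 nearly interpolates all 8 points
        sse = np.sum((y - np.polyval(coeffs, x)) ** 2)
        midpoint = np.polyval(coeffs, 2.75)       # prediction between data points
        print(f"degree {n}: training SSE = {sse:.3f}, f(2.75) = {midpoint:.2f}")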

Anscombe's Quartet in Action

Summary
Regression: supervised machine learning.
- Training data: a set of input values with a numeric output value.
- The model is a function from inputs to output.
- Use the function to predict output values for new inputs.
- Balance the complexity of the function against best fit.
- Also useful for quantifying correlation: for linear functions, the closer the function fits the points, the more correlated the measures are.