Multiple Regression White paper

Similar documents
CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

Year 10 General Mathematics Unit 2

Chapter 7: Linear regression

Section 18-1: Graphical Representation of Linear Equations and Functions

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Practice Test (page 391) 1. For each line, count squares on the grid to determine the rise and the run. Use slope = rise

Graphing Linear Equations

Chapter 1. Linear Equations and Straight Lines. 2 of 71. Copyright 2014, 2010, 2007 Pearson Education, Inc.

Math-2. Lesson 3-1. Equations of Lines

Sec 4.1 Coordinates and Scatter Plots. Coordinate Plane: Formed by two real number lines that intersect at a right angle.

Bivariate (Simple) Regression Analysis

MDM 4UI: Unit 8 Day 2: Regression and Correlation

Independent Variables

E-Campus Inferential Statistics - Part 2

8 th Grade Mathematics Unpacked Content For the new Common Core standards that will be effective in all North Carolina schools in the

Linear Functions. College Algebra

Name Class Date. Using Graphs to Relate Two Quantities

SPSS INSTRUCTION CHAPTER 9

Vocabulary Unit 2-3: Linear Functions & Healthy Lifestyles. Scale model a three dimensional model that is similar to a three dimensional object.

Chapter 8: Regression. Self-test answers

Using Excel for Graphical Analysis of Data

7 Fractions. Number Sense and Numeration Measurement Geometry and Spatial Sense Patterning and Algebra Data Management and Probability

PSY 9556B (Feb 5) Latent Growth Modeling

Chapter 4: Analyzing Bivariate Data with Fathom

8.NS.1 8.NS.2. 8.EE.7.a 8.EE.4 8.EE.5 8.EE.6

FLC Ch 3. Ex 1 Plot the points Ex 2 Give the coordinates of each point shown. Sec 3.2: Solutions and Graphs of Linear Equations

Minnesota Academic Standards for Mathematics 2007

List of Topics for Analytic Geometry Unit Test

1. Solve the following system of equations below. What does the solution represent? 5x + 2y = 10 3x + 5y = 2

Forms of Linear Equations

Scatterplot: The Bridge from Correlation to Regression

Section Graphs and Lines

slope rise run Definition of Slope

Name Per. a. Make a scatterplot to the right. d. Find the equation of the line if you used the data points from week 1 and 3.

Chapter 1 Polynomials and Modeling

Relations and Functions 2.1

Bell Ringer Write each phrase as a mathematical expression. Thinking with Mathematical Models

graphing_9.1.notebook March 15, 2019

Regression Analysis and Linear Regression Models

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Building Better Parametric Cost Models

3-1 Writing Linear Equations

Mathematics (

Section 2.1: Intro to Simple Linear Regression & Least Squares

Linear Topics Notes and Homework DUE ON EXAM DAY. Name: Class period:

GRAPHING WORKSHOP. A graph of an equation is an illustration of a set of points whose coordinates satisfy the equation.

For our example, we will look at the following factors and factor levels.

Curve Fitting with Linear Models

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.

Lesson 16: More on Modeling Relationships with a Line

SLStats.notebook. January 12, Statistics:

8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks

Example 1 of panel data : Data for 6 airlines (groups) over 15 years (time periods) Example 1

Dealing with Data in Excel 2013/2016

Direct and Partial Variation. Lesson 12

- 1 - Fig. A5.1 Missing value analysis dialog box

Fall 2012 Points: 35 pts. Consider the following snip it from Section 3.4 of our textbook. Data Description

Statistics 1 - Basic Commands. Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11}

Using Excel for Graphical Analysis of Data

Regression. Notes. Page 1 25-JAN :21:57. Output Created Comments

Algebra 2 Chapter Relations and Functions

SNAP Centre Workshop. Graphing Lines

Advanced Algebra. Equation of a Circle

ACTIVITY: Representing Data by a Linear Equation

Experimental Design and Graphical Analysis of Data

5.1 Introduction to the Graphs of Polynomials

Plotting Graphs. Error Bars

Section 3.1. Reading graphs and the Rectangular Coordinate System

Lab Activity #2- Statistics and Graphing

3-6 Lines in the Coordinate Plane

Unit Maps: Grade 8 Math

Regression. Page 1. Notes. Output Created Comments Data. 26-Mar :31:18. Input. C:\Documents and Settings\BuroK\Desktop\Data Sets\Prestige.

Unit: Quadratic Functions

2.1. Rectangular Coordinates and Graphs. 2.1 Rectangular Coordinates and Graphs 2.2 Circles 2.3 Functions 2.4 Linear Functions. Graphs and Functions

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

STA 570 Spring Lecture 5 Tuesday, Feb 1

2.1 Solutions to Exercises

Regression. Dr. G. Bharadwaja Kumar VIT Chennai

Section 4.4: Parabolas

Two-Stage Least Squares

Section 2.2 Graphs of Linear Functions

Experiment 1 CH Fall 2004 INTRODUCTION TO SPREADSHEETS

LAMPIRAN. Sampel Penelitian

The following procedures and commands, are covered in this part: Command Purpose Page

Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8

Writing and Graphing Linear Equations. Linear equations can be used to represent relationships.

Things to Know for the Algebra I Regents

Section 7D Systems of Linear Equations

Graphs and transformations, Mixed Exercise 4

Introduction to Statistical Analyses in SAS

Step 1. Use a ruler or straight-edge to determine a line of best fit. One example is shown below.

LINEAR TOPICS Notes and Homework: DUE ON EXAM

Fathom Dynamic Data TM Version 2 Specifications

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Probabilistic Analysis Tutorial

Exam 4. In the above, label each of the following with the problem number. 1. The population Least Squares line. 2. The population distribution of x.

DataSet2. <none> <none> <none>

5. Compare the volume of a three dimensional figure to surface area.

Did you ever think that a four hundred year-old spider may be why we study linear relationships today?

Unit Maps: Grade 8 Math

Transcription:

+44 (0) 333 666 7366 Multiple Regression White paper A tool to determine the impact in analysing the effectiveness of advertising spend.

Multiple Regression In order to establish if the advertising mechanisms a company employ have an impact on the web hits or sales that a company may have, an analysis that correlates advertising spend (independent variable) with a sales (dependant variable) is useful. For this we use regression analysis where we see how well a straight line fits through the data, indicating how related the two measures are. As well as understanding the relationship between spend and response it is also important to understand if a significant amount of the response is explained by the independent variable (spend). For this, we examine the relationship by analysing the variance in the data. If a significant correlation is found, we can then use the model by which we performed the regression to predict the change in response that would result from a unit change in spend This document discusses multiple regression using a dataset consisting of a single dependent variable (web hits) and two independent variables (web spend and TV spend) If you have multiple independent variables, the most straightforward approach is to look at the effect of each independent variable on the dependant variable separately and then look at the effect both independent variables have together. The value of doing the separate analyses is that it contextualises the combined analysis that takes both into account.

Multiple Regression Web hits (number) Web hits (number) Firstly we plot the numbers on a simple scatter chart to see what kind of correlation we have. The dependant measure (in this case web hits) is always plotted on the Y axis. 1800 1600 1400 1200 1000 800 600 1800 1600 1400 1200 1000 800 600 400 The relationship between web hits and total web spend 400 3000 3500 4000 4500 5000 5500 6000 Total web spend ( ) The relationship between web hits and TV cost 0 5000 10000 15000 20000 TV cost ( ) web hits vs total web spend regression line webhits v tv cost regression line A regression line is then drawn to best fit the data. A positive correlation means that both variables rise together. A negative correlation means that as one rises, the other falls. Both of these are positive meaning that the more your spend the more web hits you get which is intuitive but not very useful. Lets look at the regression figures. Web spend alone The Pearson correlation coefficient between web spend and web hits is 0.778. A value of 1.0 would mean a perfect correlation and a value of 0.0 would mean no correlation at all. The square of the correlation coefficient (R Square) explains the proportion of variation of one variable explained by the other. So in the case of web spend, 0.778 squared is 0.6053. In other words there is a 60.53% dependence of web hits on web spend. We now need to know if the correlation between web hits and web spend is a significant one; i.e. is one significantly contingent on the other? In order to test this, an analysis of the variance (ANOVA) is carried out.

Multiple Regression: Web spend only Model Sum of Squares df Mean square F Sig (p) Regression 1999743.427 1 1999743.427 44.399 0.000 Residual 1306175.993 29 45040.551 Total 3305919.419 30 Table 1: ANOVA (Web spend only) In table 1 we can see that the significance (statistically called the p value ) is 0.000. The lower the p value the more significant the result. If p < 0.05 we can accept that these two variables are related. A significant value in the ANOVA means that we can believe what the regression analysis is telling us (whether there was a strong or weak regression) i.e. the method by which we can use our current data to predict outcomes in web hits if we spend x on the web. The second part of the ANOVA analysis (table 2) provides the necessary values to create a regression model equation. The web only model The coefficients table (table 2) gives us the values we need in order to write the regression equation: Web hits = slope * independent variable + intercept or y = mx+c (the equation of a line in geometry) The intercept (c) is the value at the intersection and is -704.415 (the value of web hits if spend is 0). The slope (mx) is 4.696 (in the column labelled B for Web spend) and the unit is hits /. This means for every unit change in TV spend ( ) 4.696 web hits hits are achieved. Conversely, a one unit change in web hits would cost an increase of 21p in web spend. Model B (Intercept) Unstandardised co-efficients Std Error Standardised coefficients Beta F Sig (p) Constant -704.415 359.24 1.961 0.060 Web Spend 4.696 0.705 0.778 6.663 0.000 Table 2: Coefficients (Web spend only)

Multiple Regression: Web spend only Testing the web only model The regression equation can be used to test how good your model is. For example, if you spent 600 on the web, how many web hits might you get? Web hits = 4.696 * 600 + (-704.415) Web hits = 2113 for a 600 web spend. This can be compared with the observed data to see what the margin of error exists. The more data points you have in order to do your regression, the stronger the model will be. TV spend alone TV spend is based on a CPT (Cost per Thousand) of 3. The Pearson Correlation between web hits and TV spend is 0.792 so it would seem that web hits are a better relation (closer to 1) to TV spend than web spend. The R square is 0.628 meaning that there is a 62.8% dependence of web hits on TV spend. To test the significance ANOVA is carried out. As the p value (Sig) is < 0.05, it can be accepted that the relationship between TV spend and web hits is significant. Model Sum of Squares df Mean square F Sig (p) Regression 2076193.626 1 2076193.626 48.962 0.000 Residual 1229725.794 29 42404.338 Total 3305919.419 30 Table 3: ANOVA (TV spend only) Model B (Intercept) Unstandardised co-efficients Std Error Standardised coefficients Beta F Sig (p) Constant 1356.154 58.774 23.074 0.000 TV Spend 0.489 0.70 0.792 6.997 0.000 Table 4: Coefficients (TV spend only)

Multiple Regression: TV spend only The TV only model With TV alone the slope is 0.489 hits per spent on TV adverts. The intercept (c) is the value at the intersection and is -1356.15 (the value of web hits if spend is 0). The slope (mx) is 0.489 (in the column labelled B for TV spend) and the unit is hits /. This means for every unit change in TV spend ( ) 0.489 hits are achieved. Multiplied to sensible numbers, for every 10 spent, 4.89 web hits are achieved. Conversely, one unit change in web hits would require a 2.05 spend on TV advertising. Testing the TV only model The regression equation can be used to test how good your model is. For example, if an TV ad cost 300, how many web hits might you get? Web hits = 0.489 * 300 + (1356.15) Web hits = 1502.85 for a 300 TV spend. This can be compared with the observed data to see what margin of error exists. Of course the analysis will be more exact when all of the variables are taken into account. To take both independent variables into account we run a similar exercise but input both TV and web spend into the model. With both variables applied the Pearson correlation for each variable does not change however the R square becomes 0.788: i.e. there is a 78.8% dependence of web hits on both TV spend and web spend (up from the individual variables alone). As before the ANOVA is performed to test the the significance of the relationship.

Multiple Regression: TV & web spend Model Sum of Squares df Mean square F Sig (p) Regression 2605100.568 2 1302550.284 52.041 0.000 Residual 700818.852 28 25029.245 Total 3305919.419 30 Table 5: ANOVA Model B (Intercept) Unstandardised co-efficients Std Error Standardised coefficients Beta F Sig (p) Constant -16.921 302.089-0.056 0.956 TV Spend 3.20 0.065 0.519 4.918 0.000 Web Spend 2.927 0.637 0.485 4.597 0.000 Table 6: Coefficients As before the p value (Sig) is < 0.05, it can therefore be accepted that the relationship between TV and we spend on web hits is significant. The model The slope for web spend is 2.927 hits per when both independent variables are included. If TV spend is held constant, every spent on the web will result in 2.9 more hits. Conversely, every hit costs 0.34p. The units for the variables in a multiple regression should be comparable. If they are not, the Beta coefficients standardise all of the variables in the regression, including the dependent and independent variables, putting all the variables on the same scale. They tell us the number of standard deviations that the outcome will change as a result of one standard deviation change in the predictor. Therefore, you can compare the magnitude of the coefficients to see which one has more effect. Larger betas will give you larger t values and lower p values. The regression equation for both web and TV spend can now be modelled: web hits = (slope web spend * Google spend) + (slope TV spend * TV spend) + intercept If we now substitute in web and TV spend of 600 and 300 respectively we get: Web hits = (2.927 * 600) + (3.20 * 300) + (-16.921) Web hits = 2699.28

Multiple Regression: TV & web spend Conclusion In conclusion, a regression analysis requires steps to not only investigate the correlation between variables, but to test the significance of this correlation. Once the correlation has been proven to provide a useful model then a regression equation can be built to predict the value of the dependant variable given arbitrary values of the independent variable.

+44 (0) 333 666 7366 Get in touch with Adalyser GIVE US A CALL + 44 (0) 333 666 7366 PAY US A VISIT Manchester Business Park, 3000 Aviator Way, Manchester, M22 5TG, United Kingdom SEND US AN E-MAIL sales@adalyser.com