Scatterplot: The Bridge from Correlation to Regression

We have already seen how a histogram is a useful technique for graphing the distribution of one variable. Here is the histogram depicting the distribution of the Age Category (agecat) variable in the voter.xlsx file we have used in class. Now let's look at histograms for the rates of two violent crimes, forcible rape and robbery, in the United States in 2012.

These histograms show us the distribution of each crime rate individually, but they don't show us how the two crime rates are related to each other. We can see that the distribution of forcible rape rates is positively skewed; the distribution of robbery rates is also positively skewed, although less so than the distribution of forcible rape rates. But these histograms do not tell us what happens to the robbery rate when the forcible rape rate increases or decreases. Because it is important to know how (or whether) two or more variables are related to each other, we need to go beyond simply describing their distributions in samples or populations. We need techniques that capture, in numbers and in graphs, how variables are related to each other. We have previously covered correlation.

Simple Linear Regression

With correlation, we have seen that the strength and direction of the association between two variables can be summarized in one number: the correlation coefficient. (See the slide show Correlation to refresh your understanding of this procedure.) Often, we would like to go beyond such description and predict what the values of one variable are likely to be when we know the values of one or more related variables. Regression procedures allow us to make such predictions.

Let us clarify what we mean by prediction. We do not mean some psychic ability to foresee a future event based on readings of an individual's palms or his/her corona. Nor do we mean an ability to forecast changes in the weather from someone's aching bones or some other physical ailment or quirk. We definitely do not mean a person's intuition, feelings, or some other supernatural capability. Rather, by prediction we mean deriving acceptably accurate estimates of the value of one variable from its known relationship with one or more other variables. Many sets of variables have linear relationships that can be graphed using the standard X and Y axes of a two-dimensional grid. To refresh your memory of two-dimensional graphing, the next slide reviews the basic structure of such graphs.

Values of Y increase as we move up the Y axis from the origin to the top of the axis. Values of X increase as we move across the X axis from left to right, away from the origin.

The graph used in regression is called a scatterplot. A scatterplot has a horizontal axis, designated by the letter X, and a vertical axis, designated by the letter Y. Each point in the scatterplot is represented by its X value (that is, its value on the horizontal axis) and its corresponding Y value (its value on the vertical axis). For example, the numerical pair (1, 2) indicates that the X value is 1 and the Y value is 2. On the next slide, we will see how this point is plotted on a scatterplot.

This is a very simple example of a scatterplot. Note that the point lies directly above the value 1.00 on the x (horizontal) axis and level with the value 2.00 on the y (vertical) axis. Now let us add a second point to this graph, with coordinates x = 3 and y = 5. The scatterplot is on the next slide.
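To make the mapping from a coordinate pair to a position on the grid concrete, here is a minimal text-only sketch. The helper name ascii_scatter is hypothetical, and it assumes small, non-negative integer coordinates so that each value lands exactly on a grid cell: the x value picks the column (left to right) and the y value picks the row (bottom to top).

```python
def ascii_scatter(points, x_max, y_max):
    """Return text rows for a tiny integer-grid scatterplot (hypothetical helper).

    x chooses the column, counted left to right from 0; y chooses the row,
    counted bottom to top from 0, mirroring the X and Y axes of a scatterplot.
    """
    # Empty grid; the FIRST row in the list is the TOP of the plot (y = y_max).
    grid = [["." for _ in range(x_max + 1)] for _ in range(y_max + 1)]
    for x, y in points:
        row = y_max - y  # larger y values sit higher up, i.e. earlier in the list
        grid[row][x] = "*"
    return ["".join(cells) for cells in grid]


# The two points from the slides: (1, 2) and (3, 5).
for line in ascii_scatter([(1, 2), (3, 5)], x_max=4, y_max=5):
    print(line)
```

The point (3, 5) appears in the top row, three columns in from the left edge (column 0), and (1, 2) appears lower down and further left, just as on the slide's graph.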

Here is the scatterplot of these two points, (1,2) and (3,5), and the straight line that connects them. Remember that two points determine a straight line and, as with all straight lines, this line can be described by the formula y = mx + b, where: y is the value of the y variable; m is the slope of the line; x is the value of the x variable; and b is the y-intercept, the value of y where the line crosses the y (vertical) axis (that is, where x = 0.00). Before discussing the line in more detail, let us add a third point to the scatterplot, with coordinates x = 1 and y = 4.
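For the two points (1,2) and (3,5), the slope and y-intercept in y = mx + b can be worked out directly: the slope is rise over run, (5 - 2) / (3 - 1) = 1.5, and the intercept follows from 2 = 1.5(1) + b, so b = 0.5. The sketch below does the same arithmetic; the function name line_through is our own illustration, not part of any statistics package.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b passing through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Two points with the same x value define a vertical line, which has no slope.
        raise ValueError("points share an x value; the line is vertical")
    m = (y2 - y1) / (x2 - x1)  # slope = rise / run
    b = y1 - m * x1            # solve y1 = m*x1 + b for the y-intercept b
    return m, b


m, b = line_through((1, 2), (3, 5))
print(f"y = {m}x + {b}")  # prints: y = 1.5x + 0.5
```

The vertical-line check matters for the next step: the third point (1,4) has the same x value as (1,2), so no line of the form y = mx + b can pass through both.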

This third point (1,4) will not lie on the same straight line as the other two points, (1,2) and (3,5): it has the same x value as (1,2) but a different y value, so no line of the form y = mx + b can pass through both. However, there is a line that can be drawn through this graph that comes close to all three points. The next slide illustrates this line.

This line appears to pass through only one of the points on this graph. However, of all the lines that we could draw through this graph, this is the one that comes closest to all three points, in the sense that it makes the sum of the squared vertical distances between the points and the line as small as possible. This line is called the least squares line (or line of least squares), and it will be very handy for predicting values of the dependent variable when we know its relationship with the independent variable. For now, let us consider the correlation between forcible rape and robbery rates.
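The "closest" line can be computed with the standard least squares formulas: slope m = Sxy / Sxx (the sum of cross-products of deviations from the means, divided by the sum of squared x deviations) and intercept b = (mean of y) - m(mean of x). The sketch below applies them to the three slide points and compares the result with the line through (1,2) and (3,5). The function names least_squares and sse are illustrative only.

```python
def least_squares(xs, ys):
    """Ordinary least squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar  # the least squares line passes through (x_bar, y_bar)
    return m, b


def sse(xs, ys, m, b):
    """Sum of squared vertical distances (residuals) from each point to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [1, 3, 1]
ys = [2, 5, 4]
m, b = least_squares(xs, ys)
print(f"least squares: slope = {m:.3f}, intercept = {b:.3f}, SSE = {sse(xs, ys, m, b):.3f}")
print(f"line through (1,2) and (3,5): SSE = {sse(xs, ys, 1.5, 0.5):.3f}")
```

The least squares line here is y = x + 2. It passes through (3,5) only and misses (1,2) and (1,4) by 1 each, for a sum of squared distances of 2. The line through (1,2) and (3,5) hits two points exactly but misses (1,4) by 2, giving a larger total of 4.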

Here is an abbreviated correlation matrix showing the correlation between forcible rape and robbery rates for the United States in 2012 (based on data compiled in the FBI Uniform Crime Reports). The correlation coefficient of -.255 indicates a weak negative relationship between the two variables. In words, as the robbery rate increases, the forcible rape rate tends to decrease slightly. Now let's look at the graph, the scatterplot, for the relationship between these variables.
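The state-level crime data are not reproduced in these slides, so the sketch below computes the Pearson correlation coefficient for the three toy points used earlier instead. It follows the usual definition r = Sxy / sqrt(Sxx * Syy); r always lies between -1 and +1, its sign gives the direction of the association, and its size gives the strength. The function name pearson_r is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r([1, 3, 1], [2, 5, 4])
print(f"r = {r:.3f}")  # prints: r = 0.756 (a fairly strong positive association)
```

Flipping the sign of every y value flips the sign of r but leaves its size unchanged, which is why -.255 is read as "weak" and "negative" separately.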

In this scatterplot, robbery rates are treated as the independent variable and plotted along the x (horizontal) axis, and forcible rape rates are treated as the dependent variable and plotted on the y (vertical) axis. It is not immediately evident whether the least squares line passes through any of the points in the graph, but it is the one line that comes closest to all of the points. [The equation for the line is presented above the scatterplot; we will return to this equation in a subsequent slide.] Note that the line has a negative slope: as robbery rates increase along the x axis, forcible rape rates tend to decrease on the y axis. On the next slide, we present the correlation matrix for these variables along with this scatterplot.

The correlation matrix shows a correlation coefficient of -.255; the scatterplot shows a least squares line with a negative slope. In the same way, the scatterplot for a positive correlation will have a least squares line with a positive slope: the sign of the slope always matches the sign of the correlation coefficient. Now let's take a closer look at the equation above the scatterplot.

Recall the equation for a straight line: y = mx + b. In research terms: 1) y is the value of the dependent variable; 2) m is the slope of the least squares line; 3) x is the value of the independent variable; 4) b is the y-intercept. SPSS presents the linear equation in a slightly different order, effectively y = b + mx. In other words, SPSS gives the y-intercept first, followed by the product of the slope and the value of the x (independent) variable. Here is the equation for this least squares line:

Forcible rape = 36.80 + (-0.07)*robbery
(dependent variable) = (y-intercept) + (slope)*(independent variable)

We've almost come full circle. The equation for the least squares line through a scatterplot depicting a linear relationship between two variables allows a researcher to predict values of the dependent variable from values of the independent variable. Let's see how this is done on the next slide.

Suppose we want to know what the forcible rape rate is, given that we know the robbery rate. Further, suppose we know that a state's robbery rate is 100.00 per 100,000 inhabitants. With our equation, Forcible rape = 36.80 + (-0.07)*robbery, we only have to plug in one number (100.00). Solving the equation:

Forcible rape = 36.80 + (-0.07)*100.00
Forcible rape = 36.80 - 7.00
Forcible rape = 29.80

With these data and this linear relationship between these variables, we would predict that a state with a robbery rate of 100.00 per 100,000 inhabitants will have a forcible rape rate of 29.80 per 100,000 inhabitants. Now suppose that a state's robbery rate is 175.00 per 100,000 inhabitants. Solving the equation:

Forcible rape = 36.80 + (-0.07)*175.00
Forcible rape = 36.80 - 12.25
Forcible rape = 24.55

So we would predict a forcible rape rate of 24.55 per 100,000 inhabitants for this state. With this presentation, we can see how we use linear regression to predict values of one variable from known values of an associated variable. This presentation is also intended to reinforce the importance of identifying INDEPENDENT and DEPENDENT variables. Remember: the INDEPENDENT variable is plotted along the horizontal (X) axis of a graph, while the DEPENDENT variable is plotted up and down the vertical (Y) axis.
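The two predictions above are just the equation evaluated at two robbery rates. The sketch below writes the equation in the SPSS order (intercept first) with the rounded coefficients 36.80 and -0.07 from the slides; the function name predict_forcible_rape is hypothetical and only wraps that arithmetic.

```python
def predict_forcible_rape(robbery_rate, intercept=36.80, slope=-0.07):
    """Predicted forcible rape rate per 100,000 inhabitants.

    Uses the least squares equation in SPSS order: y = intercept + slope * x,
    with the rounded coefficients shown on the slides as defaults.
    """
    return intercept + slope * robbery_rate


for robbery in (100.00, 175.00):
    predicted = predict_forcible_rape(robbery)
    print(f"robbery = {robbery:.2f} -> predicted forcible rape = {predicted:.2f}")
# prints 29.80 for a robbery rate of 100.00 and 24.55 for 175.00
```

Formatting to two decimals hides the tiny floating-point error in values such as 36.80 - 12.25; the results match the hand calculations.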

The preceding slides showing the scatterplot for the relationship between these variables were prepared using an earlier version of SPSS. Newer versions of IBM SPSS Statistics produce the scatterplot but may not show the least squares line through the data points, nor the equation for the line. However, we can still produce the scatterplot, and we can generate the linear equation using the Analyze > Regression > Linear command sequence. The following slides demonstrate these procedures.

First, here is the scatterplot for the relationship between robbery rates and forcible rape rates, constructed without the least squares line. In the next few slides, we will demonstrate the SPSS command sequence that generates this scatterplot. For now, note that the data points show a weak downward trend, consistent with a line of negative slope. Let us see how this graph was generated.

Here is the command sequence to generate a scatterplot using Legacy Dialogs.

On this screen, choose "Simple Scatter" and click the Define button.

Move the independent variable (in this illustration, "robbery") to the X-axis field; move the dependent variable (here, "Forcible rape") to the Y-axis field. Click Titles.

Enter a title for the scatterplot; graphs are typically labeled "Figure x". Click Continue.

Returning to the Simple Scatterplot screen, click OK.

Here is the simple scatterplot for the association between robbery rates (the independent variable) and forcible rape rates (the dependent variable). Now let's generate the equation for the least squares line through this scatterplot.

Here is the command sequence to begin the procedure: Analyze > Regression > Linear.

Move the independent variable (robbery) to the Independent(s) field; move the dependent variable (Forcible rape) to the Dependent field. [This command sequence can also be used for multiple regression, in which more than one variable can be entered in the Independent(s) field; for our illustration, we will do a simple regression of one variable on a second variable.]

Most of the other buttons on this screen can be ignored since we are only doing a simple regression. Click OK.

Here is the Regression output; from this screen, we can generate the equation for the least squares line. Recall the linear equation: y = mx + b.

In the output on the previous slide, the part we need to generate the linear equation is the Coefficients table at the bottom of the screen. In this table: 1) the cell in the column headed B and the row headed (Constant) contains the y-intercept, or b in the equation y = mx + b; in this case, b = 36.805. 2) The cell in the same column in the row headed robbery contains the slope of the least squares line; in this case, m = -.069. The equation for the least squares line through this scatterplot is therefore: y = -.069x + 36.805
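Once b and m have been read from the Coefficients table, the prediction step works exactly as before. The sketch below copies the two values from column B by hand (it does not read SPSS output files) and compares predictions made with them against predictions made with the rounded coefficients 36.80 and -0.07 used on the earlier slides. The names INTERCEPT, SLOPE and predict are illustrative.

```python
# Values copied by hand from the SPSS Coefficients table, column B.
INTERCEPT = 36.805  # row "(Constant)": the y-intercept b
SLOPE = -0.069      # row "robbery": the slope m


def predict(x, b=INTERCEPT, m=SLOPE):
    """Least squares prediction y = m*x + b."""
    return m * x + b


print(f"y = {SLOPE}x + {INTERCEPT}")  # prints: y = -0.069x + 36.805
for robbery in (100.0, 175.0):
    full = predict(robbery)
    rounded = predict(robbery, b=36.80, m=-0.07)  # rounded coefficients from the earlier slides
    print(f"robbery = {robbery:.0f}: table coefficients -> {full:.3f}, rounded -> {rounded:.3f}")
```

With the table's coefficients the predictions are 29.905 and 24.730 rather than 29.80 and 24.55; the gap comes entirely from rounding the slope from -.069 to -0.07, and it grows as the robbery rate grows.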