Student Activity: Finding the Least Squares Regression Line By Exploring the Relationship between Slope and Residuals

Objective: How does one determine a "best" best-fit line for a set of data? Eyeballing it may be a good place to start, but there is a more exact way. In this activity, you will not only find a "best" best-fit line for a data set, you will discover why it must be the best.

What to Do:

1. Open the Fathom file Parabolic_Path_LSRL.ftm.

   Figure 1: Fathom Case Table

2. In the Case Table (upper left corner of screen), input the data set as given in Figure 1.

3. Notice the data points are displayed in a scatter plot (upper right of screen) as in Figure 2. Also on the scatter plot, a blue horizontal line has appeared that we can use to find a best-fit line to model the data.

   Figure 2: Fathom Scatter Plot

   a. The general equation for this line can be found just below the scatter plot. What is it? Write the equation using $\bar{x}$ for mean(x) and $\bar{y}$ for mean(y).
   b. What specific slope does the initial horizontal (blue) line have?
   c. Based on that slope, simplify the general linear equation for the specific line graphed.

4. Below the Case Table (Figure 1) on the Fathom screen is a slider (Figure 3) that changes the value of parameter b in the linear equation. Try moving the slider to the left and right.

   Figure 3: Slider for Parameter b

   a. What effect does changing the value of b have on the blue line? Why?
   b. By changing b, can the line be vertically or horizontally translated?
   c. By changing b, what type of transformation does occur?

5. In order to find an equation for a line, at least one point on the line must be known. For a best-fit line, a reasonable point to begin with is $(\bar{x}, \bar{y})$, also called the center of gravity for the data set.

   a. Why would the center of gravity be a good point to include on a best-fit line?
   b. Around what point does the line on the scatter plot appear to rotate when the slope, b, is changed with the slider?
   c. In the Fathom window, a Summary Table as in Figure 4 displays statistics on the data, including the slope (b), $\bar{x}$, $\bar{y}$, the sum of the residuals, and the sum of squared residuals. Based on the data's statistics, what are the specific coordinates of the point of rotation?

   Figure 4: Fathom Summary Table

   d. Looking at the line graphed on the scatter plot, do these coordinates appear to be correct?

2003, rev. 2008 J. Reinhardt & J. Simons

   e. Now look closely at the Case Table, as in Figure 5, that contains the data points. It also includes other useful information: YFitted values and Residual values. How are the YFitted values (also known as predicted values) determined?

   Figure 5: Fathom Case Table

   f. How are the Residuals determined? What is a residual?

6. Knowing only one point (in this case, the center of gravity) is not sufficient information to determine any line, much less a best-fit line. We also need to find the best slope to fit the data.

   a. Adjust the slope of the line by using the slider to change the value of b. Try to find a line that is a good fit to the data. Record the value of the slope (b) for this initial estimate.
   b. How did you decide what the slope of the best-fit line should be?
   c. Do you think someone else would choose the same slope as you? Why or why not?

7. Is there a more accurate method for determining the best slope? Go to the Object menu and select Show Hidden Objects as in Figure 6. A graph of a function appears.

   Figure 6: Fathom's Object Menu

   a. What is the explanatory (independent) variable of this function? [Hint: How are the axes labeled?]
   b. What is the response (dependent) variable?
   c. What type of function models the relationship between these two variables?
   d. What is the shape of this function's graph?

8. Try adjusting the slope (b) of the line once again. This time notice how the mysterious point on the parabola moves in response. This point and the slope of our best-fit line are connected.

   a. Adjust the slope to move the mysterious point closer to the vertex of the parabola. What effect does this have on how well the line fits the data?
   b. Use the parabola and its vertex to determine the best slope for your linear model. Once you are satisfied with your choice, record the coordinate values of the mysterious point to 2 decimal places. (Refer to the Summary Table for values.)
   c. At the vertex point, notice that the response variable has a minimum value. Considering the meaning of the response variable, why is the vertex helpful in determining the slope of our best-fit line?

9. Determine an equation for the best-fit line for the data set.

   a. Use the center of gravity and the slope found with the parabola to write an equation for the best-fit line in point-slope form, that is, $y - y_1 = b(x - x_1)$.
   b. Rewrite your equation above in slope-intercept form, $y = bx + a$.
   c. The type of best-fit line found in this activity is known as the Least Squares Regression Line, or LSRL for short. Why is it referred to by this name?
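
For readers who want to try the idea in steps 8 and 9 outside Fathom, the following is a minimal Python sketch, not part of the Fathom file. It pivots a line about the center of gravity and reports the sum of squared residuals for a few slopes. The data values are hypothetical (the Figure 1 data set is not reproduced here), and the function names `sum_squared_residuals` and `vertex_slope` are made up for illustration.

```python
# Hypothetical data for illustration only; this is NOT the Figure 1 data set.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]


def mean(values):
    return sum(values) / len(values)


def sum_squared_residuals(b, xs, ys):
    """Sum of squared residuals for the line through the center of
    gravity with slope b, that is, y = mean(y) + b * (x - mean(x))."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((y - (y_bar + b * (x - x_bar))) ** 2 for x, y in zip(xs, ys))


def vertex_slope(xs, ys):
    """Slope at the vertex of the parabola traced by sum_squared_residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


best_b = vertex_slope(xs, ys)

# Play the role of the slider: try slopes on either side of the vertex.
for b in (0.0, 0.4, best_b, 1.2):
    total = sum_squared_residuals(b, xs, ys)
    print(f"b = {b:.2f}   sum of squared residuals = {total:.2f}")
```

With this hypothetical data the totals fall as b approaches the vertex slope and rise again past it, which is the behavior the mysterious point on the parabola shows as the slider moves.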

10. Now that we have determined the LSRL, we can check it against the LSRL as computed by Fathom.

   a. Highlight (click on) the scatter plot. Then select Least-Squares Line from the Graph menu. Fathom graphs the LSRL (green line) on the scatter plot with ours (blue line). Are they close? What is Fathom's LSRL equation, and how does it compare to ours?
   b. Return to the Graph menu and select the Show Squares option. Notice that squares connected to the LSRL appear as in Figure 7. What do these squares represent? And what does the total sum of the areas of these squares represent?

   Figure 7: Fathom Scatter Plot and LSRL

   c. Below the equation for Fathom's LSRL is a value for its Sum of squares. What is this value? Where can this value be found on the graph of the parabola?
   d. Change the value of the slope (b) of the line and notice the effect on the size of the squares. To determine a "best" best-fit line for a data set, we found a special line through the center of gravity that also does what to the sum of the areas of the green squares?

11. What two primary characteristics must a Least Squares Regression Line (a "best" best-fit line) have to model a set of data, and why is each characteristic significant?
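
The check in step 10 can also be sketched in Python. The example below is an illustration under stated assumptions rather than Fathom's own computation: it uses hypothetical data (not the Figure 1 data set) and a made-up helper name, `least_squares_line`, to build a least squares line in slope-intercept form, then lists the fitted values, the residuals, and the areas of the squares.

```python
# Hypothetical data for illustration only; this is NOT the Figure 1 data set.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]


def least_squares_line(xs, ys):
    """Return (b, a) for the least squares line y = b*x + a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # forces the line through the center of gravity
    return b, a


b, a = least_squares_line(xs, ys)

fitted = [a + b * x for x in xs]                  # predicted values
residuals = [y - f for y, f in zip(ys, fitted)]   # observed minus predicted
square_areas = [r ** 2 for r in residuals]        # one square per data point

print(f"slope b = {b:.2f}, intercept a = {a:.2f}")
print(f"sum of residuals = {sum(residuals):.2f}")
print(f"sum of square areas = {sum(square_areas):.2f}")
```

Changing `b` by hand while keeping `a = y_bar - b * x_bar` and recomputing `square_areas` mimics dragging the slider with Show Squares turned on.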

Extensions

1. Why is there a quadratic relationship between the slope of a best-fit line and the sum of the squared residuals? Investigate this relationship in the following:

   a. Use a line in the form $y = b(x - \bar{x}) + \bar{y}$ that contains the center of gravity, $\bar{x} = 0.8$ and $\bar{y} = 1.0$. Find the residual in terms of b for each of the five data points in the activity. (It may help to organize the results in a table, and you may want to collaborate with others.)
   b. Square each residual. Then find the sum of the squared residuals in terms of b (the slope).
   c. What type of function describes the relationship between the slope and the sum of squared residuals for a line that includes the center of gravity?
   d. Verify that the general quadratic function (given in the parabola's window in Fathom and also below) works for the data set in the activity.

      $$\sum (\text{residuals})^2 = \sum (y_i - \bar{y})^2 - 2b \cdot \sum y_i (x_i - \bar{x}) + b^2 \cdot \sum (x_i - \bar{x})^2$$

   e. Show that the above holds true for any general data set.

2. In the activity, we determined an LSRL for a data set of five specific points. What if the data set changed?

   a. How does changing a point in the data set affect the quadratic relationship between the slope and the sum of the squared residuals for a possible best-fit line? Choose any one of the five data points and drag it around to new locations. In what ways does this affect the parabola used to determine the LSRL? How can the effects on the parabola caused by changing a point in the data set be explained in terms of the parabola's equation?
   b. How does changing a point in the data set affect the Least Squares Regression Line? Highlight (click on) the scatter plot. Then under the Graph menu, de-select Show Squares. Choose any one of the five data points and drag it around to new locations. In what ways does this affect the LSRL (the green line in the Fathom window)? Specifically, how does changing a point in the data set affect each of the two defining characteristics of the Least Squares Regression Line?
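
Before attempting the algebra in Extension 1, it may help to test the quadratic formula numerically. The Python sketch below is a spot check, not a proof. It compares the sum of squared residuals computed point by point with the value given by the quadratic in b, using randomly generated data (an assumption for illustration, not the activity's data set) and made-up function names.

```python
import random


def mean(values):
    return sum(values) / len(values)


def direct_sum(b, xs, ys):
    """Sum of squared residuals computed point by point for the line
    y = b * (x - mean(x)) + mean(y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((y - (b * (x - x_bar) + y_bar)) ** 2 for x, y in zip(xs, ys))


def quadratic_sum(b, xs, ys):
    """The same quantity written as a quadratic in b:
    sum((y_i - y_bar)^2) - 2b * sum(y_i * (x_i - x_bar)) + b^2 * sum((x_i - x_bar)^2)."""
    x_bar, y_bar = mean(xs), mean(ys)
    constant = sum((y - y_bar) ** 2 for y in ys)
    cross = sum(y * (x - x_bar) for x, y in zip(xs, ys))
    spread = sum((x - x_bar) ** 2 for x in xs)
    return constant - 2 * b * cross + b ** 2 * spread


# Randomly generated data: an assumption for illustration only.
random.seed(0)
xs = [random.uniform(-5, 5) for _ in range(8)]
ys = [random.uniform(-5, 5) for _ in range(8)]

slopes = [-2.0, -0.5, 0.0, 0.75, 3.0]
largest_gap = max(
    abs(direct_sum(b, xs, ys) - quadratic_sum(b, xs, ys)) for b in slopes
)
print(f"largest difference between the two computations: {largest_gap:.2e}")
```

A difference near zero for every slope tried is consistent with the formula, but only the algebra in parts a through e shows why it holds for every data set.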