Slides by John Loucks St. Edward s University Slide 1
Chapter 2, Part B Descriptive Statistics: Tabular and Graphical Presentations Exploratory Data Analysis: Stem-and-Leaf Display Crosstabulations and Scatter Diagrams Slide 2
Exploratory Data Analysis The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display. Slide 3
Stem-and-Leaf Display A stem-and-leaf display shows both the rank order and shape of the distribution of the data. It is similar to a histogram on its side, but it has the advantage of showing the actual data values. The first digits of each data item are arranged to the left of a vertical line. To the right of the vertical line we record the last digit for each item in rank order. Each line in the display is referred to as a stem. Each digit on a stem is a leaf. Slide 4
Example: Hudson Auto Repair The manager of Hudson Auto would like to gain a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide. Slide 5
Stem-and-Leaf Display Example: Hudson Auto Repair Sample of Parts Cost ($) for 50 Tune-ups 91 78 93 57 75 52 99 80 97 62 71 69 72 89 66 75 79 75 72 76 104 74 62 68 97 105 77 65 80 109 85 97 88 68 83 68 71 69 67 74 62 82 98 101 79 105 79 69 62 73 Slide 6
Stem-and-Leaf Display Example: Hudson Auto Repair 5 6 7 8 9 10 2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9 a stem a leaf Slide 7
Stretched Stem-and-Leaf Display If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9. Slide 8
Stretched Stem-and-Leaf Display Example: Hudson Auto Repair 5 5 6 6 7 7 8 8 9 9 10 10 2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9 Slide 9
Stem-and-Leaf Display Leaf Units A single digit is used to define each leaf. In the preceding example, the leaf unit was 1. Leaf units may be 100, 10, 1, 0.1, and so on. Where the leaf unit is not shown, it is assumed to equal 1. The leaf unit indicates how to multiply the stemand-leaf numbers in order to approximate the original data. Slide 10
Example: Leaf Unit = 0.1 If we have data with values such as 8.6 11.7 9.4 9.1 10.2 11.0 8.8 a stem-and-leaf display of these data will be Leaf Unit = 0.1 8 6 8 9 10 11 1 4 2 0 7 Slide 11
Example: Leaf Unit = 10 If we have data with values such as 1806 1717 1974 1791 1682 1910 1838 a stem-and-leaf display of these data will be Leaf Unit = 10 16 8 17 18 19 1 9 0 3 1 7 The 82 in 1682 is rounded down to 80 and is represented as an 8. Slide 12
Crosstabulations and Scatter Diagrams Thus far we have focused on methods that are used to summarize the data for one variable at a time. Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables. Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously. Slide 13
Crosstabulation A crosstabulation is a tabular summary of data for two variables. Crosstabulation can be used when: one variable is qualitative and the other is quantitative, both variables are qualitative, or both variables are quantitative. The left and top margin labels define the classes for the two variables. Slide 14
Crosstabulation Example: Finger Lakes Homes The number of Finger Lakes homes sold for each style and price for the past two years is shown below. quantitative variable categorical variable Price Home Style Range Colonial Log Split A-Frame Total < $200,000 > $200,000 18 6 19 12 55 12 14 16 3 45 Total 30 20 35 15 100 Slide 15
Crosstabulation Example: Finger Lakes Homes Insights Gained from Preceding Crosstabulation The greatest number of homes (19) in the sample are a split-level style and priced at less than $200,000. Only three homes in the sample are an A-Frame style and priced at $200,000 or more. Slide 16
Crosstabulation Example: Finger Lakes Homes Frequency distribution for the price range variable Price Home Style Range Colonial Log Split A-Frame Total < $200,000 > $200,000 18 6 19 12 55 12 14 16 3 45 Total 30 20 35 15 100 Frequency distribution for the home style variable Slide 17
Using Excel s PivotTable Report to Create a Crosstabulation Excel Worksheet (showing partial data) A B C D E 1 Home Price ($) Style 2 1 >200K Colonial 3 2 <200K Log 4 3 >200K Log 5 4 <200K A-Frame 6 5 <200K Colonial 7 6 <200K Split-Level 8 7 >200K A-Frame 9 8 >200K Colonial Note: Rows 10-101 are not shown. Slide 18
Using Excel s PivotTable Report to Create a Crosstabulation Displaying the Initial PivotTable Field List and PivotTable Report Step 1 Click the Insert tab on the Ribbon Step 2 In the Tables group, click the icon above the word PivotTable Step 3 When the Create PivotTable dialog box appears: Choose Select a Table or Range Enter A1:C101 in the Table/Range box Choose New Worksheet as the location for the PivotTable Report Click OK Slide 19
Using Excel s PivotTable Report to Create a Crosstabulation Setting Up the PivotTable Field List Step 1 In the PivotTable Field List, go to Choose Fields to add to report Drag the Price ($) field to Row Labels area Drag the Style field to Column Labels area Drag the Home field to the Values area Step 2 Click on Sum of Home in the Values area Step 3 Click Value Field Settings from the list of options Step 4 When the Value Field Settings dialog box appears: Under Summarize value field by, choose Count Click OK Slide 20
Using Excel s PivotTable Report to Create a Crosstabulation Value Worksheet A B C D E F G 1 Count of Home Style 2 Price ($) Colonial Log Split-Level A-Frame Grand Total 3 <200K 18 6 19 12 55 4 >200K 12 14 16 3 45 5 Grand Total 30 20 35 15 100 6 Slide 21
Crosstabulation: Row or Column Percentages Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables. Slide 22
Crosstabulation: Row Percentages Example: Finger Lakes Homes Price Home Style Range Colonial Log Split A-Frame Total < $200,000 > $200,000 32.73 10.91 34.55 21.82 100 26.67 31.11 35.56 6.67 100 Note: row totals are actually 100.01 due to rounding. (Colonial and > $200K)/(All > $200K) x 100 = (12/45) x 100 Slide 23
Crosstabulation: Column Percentages Example: Finger Lakes Homes Price Home Style Range Colonial Log Split A-Frame < $200,000 > $200,000 Total 60.00 30.00 54.29 80.00 40.00 70.00 45.71 20.00 100 100 100 100 (Colonial and > $200K)/(All Colonial) x 100 = (12/30) x 100 Slide 24
Crosstabulation: Simpson s Paradox Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation. We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation. Simpson Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data. suggests the overall relationship between the variables. Slide 25
Scatter Diagram and Trendline A scatter diagram is a graphical presentation of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis. The general pattern of the plotted points suggests the overall relationship between the variables. A trendline provides an approximation of the relationship. Slide 26
Scatter Diagram A Positive Relationship y x Slide 27
Scatter Diagram A Negative Relationship y x Slide 28
Scatter Diagram No Apparent Relationship y x Slide 29
Scatter Diagram Example: Panthers Football Team The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored. x = Number of Interceptions 1 3 2 1 3 y = Number of Points Scored 14 24 18 17 30 Slide 30
Number of Points Scored Scatter Diagram 35 30 25 20 15 10 5 0 y 0 1 2 3 4 Number of Interceptions x Slide 31
Example: Panthers Football Team Insights Gained from the Preceding Scatter Diagram The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored. Higher points scored are associated with a higher number of interceptions. The relationship is not perfect; all plotted points in the scatter diagram are not on a straight line. Slide 32
Using Excel s Chart Wizard to Construct a Scatter Diagram and Trendline Excel Worksheet (showing data) A B C Number of Number of 1 Interceptions Points Scored 2 1 14 3 3 24 4 2 18 5 1 17 6 3 30 7 Slide 33
Using Excel s Chart Tools to Construct a Scatter Diagram and Trendline Step 1 Select cells A2:B6 Step 2 Click the Insert tab on the Excel Ribbon Step 3 In the Charts group, click Scatter Step 4 When the list of scatter diagram subtypes appears: Click Scatter with only Markers Step 5 In the Chart Layout group, click Layout 1 Step 6 Select the Chart Title and replace it with Scatter Diagram for the Panthers Step 7 Select the Horizontal Axis (Value) Title and replace it with Number of Interceptions... continue Slide 34
Using Excel s Chart Tools to Construct a Scatter Diagram and Trendline Step 8 Select the Vertical (Value) Axis Title and replace it with Number of Points Scored Step 9 Right click Series 1 Legend Entry and click Delete - - - - - - - - - - - - - - - - To Add a Trendline - - - - - - - - - - - - - - - Step 10 Position the pointer over any data point in the scatter diagram and right-click to display options Step 11 Choose Add Trendline Step 12 When the Format Trendline dialog box appears: Select Trendline Options Choose Linear from Trend/Regression Type list Click Close Slide 35
Using Excel s Chart Tools to Construct a Scatter Diagram and Trendline Number of Points Scored. 8 9 10 11 12 13 14 15 16 17 18 19 20 35 30 25 20 15 10 5 0 A B C Scatter Diagram for the Panthers 0 1 2 3 4 Num ber of Interceptions Slide 36
Tabular and Graphical Methods Categorical Data Data Quantitative Data Tabular Methods Graphical Methods Tabular Methods Graphical Methods Frequency Distribution Rel. Freq. Dist. Percent Freq. Distribution Crosstabulation Bar Graph Pie Chart Frequency Distribution Rel. Freq. Dist. % Freq. Dist. Cum. Freq. Dist. Cum. Rel. Freq. Distribution Cum. % Freq. Distribution Crosstabulation Dot Plot Histogram Ogive Stem-and- Leaf Display Scatter Diagram Slide 37
End of Chapter 2, Part B Slide 38