Psychology 312: Lecture 7 Descriptive Statistics

Slide #1 Descriptive Statistics
Descriptive statistics & pictorial representations of experimental data.
In this lecture we will discuss descriptive statistics.

Slide #2 Outline
o Frequency distributions
o Measures of central tendency
o Pictorial presentation of data
In doing so, we will begin with a discussion of frequency distributions, then talk about measures of central tendency, and finally review some pictorial presentations of data.

Slide #3 Frequency Distributions
o Often first step for getting a handle on your data
o Indicates how frequently each score/value appears in your data set.
o Can be presented as a table or as a graph
Once you have conducted your study and obtained data, frequency distributions are often used as the first step for getting an idea of what your data look like. This type of distribution indicates how frequently each score or value appears in your data set. Distributions of this nature can be presented either as a table or as a graph. I'm going to show you examples of each of these.

Slide #4 Frequency Distribution
Example of a frequency table.
Let's imagine that we survey the students in this class and ask them to report their shoe size. This would be an example of a frequency table built on those data. In the left column we have the various shoe sizes represented in the class, from 5 ½ to 11. In the right column we have the number of students who reported each of those shoe sizes. Notice that one student reported a size 5 ½, five students reported a size eight, and two students reported a size ten.

Slide #5 Frequency Distribution
Then translate it into a frequency polygon.
o Scores on X axis
o Frequency of each score on Y axis
Those same data could be translated into a frequency polygon, a graphical representation of the previous table. Frequency polygons are often attractive because they give you a quick visual impression of what your distribution looks like. Notice that in a frequency polygon the scores are always on the X axis and the frequency is always on the Y axis.

Slide #6 Additional information about our data set is provided by descriptive statistics.
While frequency distributions are often an attractive first step in getting a handle on our data, they are rarely used alone. Additional information about the data set can be provided by a set of descriptive statistics, or more specifically what we call measures of central tendency. Those include the mean, the median, and the mode. Let's walk through each of these.

Slide #7 Mean (x̄): the arithmetic average.
The mean represents the arithmetic average of the data set. It is calculated by summing the individual scores in the data set and then dividing by the total number of scores. In this rather small data set the mean is a score of five.

Slide #8 No Title
Table of data of shoe size vs. frequency vs. size x freq. Mean = 200/25 = 8.
Here is a larger data set, but the same principle applies. In the far left column we have shoe size. In the middle column we have the frequency of each shoe size in our data set, and in the third column we have each shoe size times its frequency. Adding up those products gives the sum of all scores in the distribution, which is then divided by the total number of scores, in this instance twenty-five, yielding a mean score of eight. The average shoe size in this class is a size eight.

Slide #9 Things to remember about the mean:
1. It is sensitive to all scores in the set.
o Change 1 score, change the mean.
o Will be affected by extreme scores
There are a couple of things that you need to remember about the mean. These will be important for later discussions in this course. The first is that the mean is sensitive to all scores in the set. That means if you change one score, you will change the mean. Another way of thinking about this is that the mean will be particularly affected by extreme scores, which will pull the mean either up or down depending upon their value.

Slide #10 2. The sum of all scores' deviations from the mean will equal zero.
The other thing that I would like you to remember about the mean is that the sum of all scores' deviations from the mean will always equal zero. This will be important for our discussion of variability in the next lecture. This table demonstrates what I mean. In the far left column we have each score in the distribution. In the middle column we have the mean score of that distribution. If you subtract the mean from each score and then add those deviation scores, you will notice that the sum is zero.

Slide #11 Measure of Central Tendency
Median (Md): the middle score (the score that cuts the distribution into two equal halves)
The second measure of central tendency is the median, or the middle score. Said another way, the median is the score that cuts the distribution into two equal halves.

Slide #12 To find the median:
o Put scores in order of magnitude.
o If odd # of scores: take middle score (Md = 5)
o If even # of scores: add two scores on either side of midpoint, then divide the sum by 2. (3 + 5)/2 = 8/2 = 4 (Md)
To find the median of a distribution, you need to start by putting your scores in order of magnitude. If your distribution includes an odd number of scores, as this one does, finding the median is quite easy, because it is simply the middle score.
However, if your distribution includes an even number of scores, as this one does, then you must add the two scores on either side of the midpoint and then divide that sum by two. This calculation will render the median
score.

Slide #13 No Title
Image of a table of numbers.
Here is a slightly larger distribution, again presented as a frequency distribution. We have twelve scores at the top and twelve scores at the bottom, and in this case the median is a score of eight.

Slide #14 Things to remember about the median:
o It is not sensitive to each individual score.
o It is not affected by extreme scores in the distribution.
As with the mean, there are a few things to remember about the median. For instance, the median is not sensitive to each individual score in the distribution. This means that it is also not affected by extreme scores. This is an important point, because you will see later that in some distributions the median is actually a more valid measure of central tendency than the mean.

Slide #15 Mode (Mo): the most frequent score
Example scores:
o Distribution: 1, 2, 2, 3, 4, 5, 2, 4, 2
o Mode = 2
Note: Can be more than one mode in a distribution.
o EX: Bimodal distribution
The final measure of central tendency is the mode, the most frequent score in your distribution. For instance, if your distribution looked like this, the mode would be a value of two. While we often talk about the mode of a distribution, you should note that it is possible for a distribution to have more than one mode. For instance, if you have ever heard the term bimodal distribution, it refers to a distribution like the one shown here, in which there are two peaks, and the value of each of those peaks represents a mode of that distribution.

Slide #16 No Title
Image of a table of numbers. Mode = 8.
Returning to our frequency distribution for shoe sizes in the course, we can look to the right column to quickly identify the mode. It is the shoe size with the highest frequency; in this case, a shoe size of eight.

Slide #17 Measure Of Central Tendency
Note: Not all measures of CT are appropriate for all types of data.
o Nominal data: only mode can be used.
o Ordinal data: only median & mode can be used.
We can relate our current discussion back to issues we discussed in the previous lectures. Specifically, you should note that not all measures of central tendency are appropriate for all types of data. For instance, if you are dealing with nominal data, only the mode can be used. It isn't meaningful to talk about mean or median scores when dealing with categorical data. If, however, your data are ordinal, you can use the median or the mode. The mean is not meaningful for this type of data, because we cannot assume that there are equal differences between the values on this scale.

Slide #18 Measures of central tendency can tell us about the shape of the data set.
o Normal?
o Skewed?
We use measures of central tendency to tell us something about the overall shape of our data set. Specifically, we use them to determine whether our data set looks normal or skewed.

Slide #19 If the mean = median = mode, the distribution is normal.
In a normal distribution the mean, median, and mode will all be either identical or very similar to one another. Visually, the normal distribution looks something like this: a beautifully balanced bell-shaped curve.

Slide #20 If the mean > median > mode, the distribution is positively skewed.
If, however, we have a data set in which the mean is larger than the median, which in turn is larger
than the mode, then we say that we have a distribution that is positively skewed. Visually, a positively skewed distribution looks like this, with a number of scores clustered on the left and some extreme high scores on the right.

Slide #21 If the mean < median < mode, the distribution is negatively skewed.
In contrast, if the mean is less than the median, which in turn is less than the mode, the distribution is negatively skewed. In that case the distribution would look something like this, in which a number of scores are clustered on the right with a few extreme scores on the left.

Slide #22 NOTE: it is better to use the median versus the mean for description when a set of scores is heavily skewed.
o Mean is pulled in the direction of the skew.
o The median is not affected by extreme scores.
Now that you understand the difference between a normal and a skewed distribution, you should keep in mind that it is better to use the median than the mean for purposes of description when your data set is heavily skewed. As you will recall, the mean will be pulled in the direction of the skew, that is, in the direction of the extreme scores. In contrast, the median will not be affected by those extreme scores and will therefore render a more valid measure of central tendency.

Slide #23 Graphing Data
Ways to display your data graphically as:
o Bar graph
o Line graph
In either case
o IV on X axis
o DV on Y axis
In addition to frequency distributions and measures of central tendency, you might also choose to display your data graphically using either a bar graph or a line graph. Both of these are quite common in social science research. In either case the independent variable will always be presented on the X axis, while the dependent variable will always be displayed on the Y axis. Let's look at an example.

Slide #24 Graphing Data
Example 1: Music experiment
o Plotting mean accuracy of performance in the rock, classical & no music condition.
o Hypothetical results: Rock = 25% accuracy; Classical = 50% accuracy; No music = 60% accuracy
In the previous lecture we designed a hypothetical music experiment. In that experiment we intended to test for the potential effect of background music on problem solving ability. Let's imagine that we actually conducted that experiment and obtained some data. We would now like to present those data graphically. In doing so we plan to plot the mean accuracy of performance across our three conditions: the rock, classical, and no music conditions. Let's imagine that our hypothetical results look something like the following. The rock condition rendered 25% accuracy on the problem solving task, the classical condition rendered 50% accuracy, and the no music condition rendered 60% accuracy.

Slide #25 Bar Graph
Bar graph is used because IV is nominal.
Those data presented graphically would look something like this. Notice that in our graph the independent variable, in this case our three different music conditions, is represented on the X axis, and the dependent variable, in this case mean performance accuracy, is presented on the Y axis. A bar graph is most appropriate for this type of study because the independent variable is nominal, or categorical, in nature.

Slide #26 Graphing Data
Example 2:
o Memory accuracy recall under different doses of same drug (0, 50, or 100 mg)
o Results (mean accuracy): 0 mg dose = 25%; 50 mg dose = 50%; 100 mg dose = 60%
Now let's look at a slightly different example. Let's imagine that we would like to conduct a different experiment, and in this experiment we plan to measure memory recall accuracy across three different dosages of the same drug. Let's further imagine that our results from this experiment look something like the following.

Slide #27 Line Graph
Line graph is used because IV lies on a continuum.
Those data could then be presented graphically in a line graph such as this. In contrast to the first experiment, a line graph makes more sense in this case because the independent variable lies on a continuum.

Slide #28 Next Lecture
That concludes this lecture. Next we will discuss Inferential Statistics.
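
As a recap of the measures of central tendency covered in this lecture, here is a minimal Python sketch using the standard library's statistics module. It is an illustration and not part of the original slides. The scores come from the mode example on Slide #15. The extreme score of 50, the even-length set, and the bimodal set are hypothetical values chosen only to illustrate the points made about the mean, the median, and the mode.

```python
import statistics

# Scores from the lecture's mode example (Slide #15).
scores = [1, 2, 2, 3, 4, 5, 2, 4, 2]

mean = statistics.mean(scores)        # sum of scores / number of scores (25 / 9)
median = statistics.median(scores)    # middle score once the scores are sorted
modes = statistics.multimode(scores)  # list of every most-frequent score

# The deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean for x in scores)

# Hypothetical change: replace the score 5 with an extreme score of 50.
extreme = [50 if x == 5 else x for x in scores]
extreme_mean = statistics.mean(extreme)      # pulled toward the extreme score
extreme_median = statistics.median(extreme)  # same middle score as before

# Hypothetical even-length set: the two scores around the midpoint are averaged.
even_median = statistics.median([1, 3, 5, 7])

# Hypothetical bimodal set: both peaks are reported as modes.
bimodal_modes = statistics.multimode([1, 1, 2, 3, 3])

print(f"mean={mean:.2f}, median={median}, modes={modes}")
print(f"sum of deviations is about {deviation_sum:.10f}")
print(f"with an extreme score: mean={extreme_mean:.2f}, median={extreme_median}")
print(f"even-length median={even_median}, bimodal modes={bimodal_modes}")
```

Changing a single score moves the mean but leaves the median where it was, which is why the median is the better description of a heavily skewed data set.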