Normal (Gaussian) Distribution

One of the most important distributions in statistics is a continuous distribution called the normal distribution, or Gaussian distribution. Consider the following: if we measured cells, people, plants, or biochemical reactions (as absorbance values in a spectrophotometer), we would find a range of variation. If we put these measurements into appropriate categories or class intervals and plotted the number in each category as a histogram, it would look like Figure (a).

Fig. Gradation of a histogram (a) into the normal curve (b)

As the number of observations n used to construct the histogram approaches infinity and the width of the class intervals goes to zero, a curve drawn over the histogram becomes a smooth, continuous, bell-shaped curve with no gaps between the histogram and the curve (Fig. b). Data of this sort are said to be normally distributed. Most of our measurements (data points) would be close to the mean, and progressively fewer would depart widely from the mean.

Normal Curve Equation

There is an infinite number of possible normal curves, all described by the same algebraic expression, the probability density function:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

where y is the height of the curve for a given value x, e is the base of the natural logarithms (approximately 2.71828), and π is the well-known constant (about 3.14159). The parameters µ and σ² are the mean and the variance, respectively, of the normal random variable.

[Figure: normal curve plotted against x, centered at µ]

The mathematical equation for the probability distribution of the continuous normal variable therefore depends on the two parameters µ and σ. Once µ and σ are specified, the normal curve is completely determined.
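As a rough illustration of the density formula above, the following Python sketch evaluates the height y of the curve directly from µ and σ. The function name `normal_pdf` and the values µ = 100, σ = 15 are illustrative assumptions, not part of the notes; the standard-library `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height y of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# Illustrative (assumed) parameter values.
mu, sigma = 100.0, 15.0

peak = normal_pdf(mu, mu, sigma)           # height at the mean
left = normal_pdf(mu - 10.0, mu, sigma)    # height 10 units below the mean
right = normal_pdf(mu + 10.0, mu, sigma)   # height 10 units above the mean
reference = NormalDist(mu, sigma).pdf(mu)  # library value, for comparison

print(peak, left, right, reference)
```

The curve is highest at x = µ, and points equally far on either side of the mean have the same height, which is the symmetry the bell shape implies.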
Properties of the Normal Distribution

1. It has the appearance of a bell-shaped curve extending infinitely in both directions.
2. It is unimodal and symmetrical about the mean µ, and its mode occurs at x = µ.
3. The normal curve approaches the horizontal axis as we proceed in either direction away from the mean.
4. The total area under the curve and above the horizontal axis is equal to 1.
5. Whether the mean or standard deviation is large or small, the relative area between any two designated points, measured in standard deviations from the mean, is always the same. Let's look at three commonly used points along the abscissa. In the figure below (where µ = 100 and σ = 15), we see the following:
   a. about 68.27% of the area (data) is contained within µ ± 1σ,
   b. about 95.45% of the area is contained within µ ± 2σ,
   c. about 99.73% of the area is contained within µ ± 3σ.
   These rules are commonly known as the Empirical Rule.

   [Figure: normal curve with µ = 100 and σ = 15; x axis marked at 55, 70, 85, 100, 115, 130, 145, corresponding to µ - 3σ, µ - 2σ, µ - 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ; central areas of 68.27%, 95.45%, and 99.73% indicated]

   Example on the Empirical Rule
   Women participating in a three-day experimental diet regime have been demonstrated to have normally distributed weight loss with a mean of 600 g and a standard deviation of 200 g.
   a) What percentage of these women will have a weight loss between 400 and 800 g?
   The interval from 400 to 800 g equals 600 ± 200 g, that is, µ ± 1σ. Since about 68.27% of the area (data) is contained within µ ± 1σ, approximately 68% of these women will have a weight loss between 400 and 800 g.
6. The normal distribution depends on the values of the parameters µ, the population mean, and σ², the population variance.
   a. Two normal curves that have the same standard deviation but different means.
   [Figure: two normal curves of the same shape centered at µ₁ and µ₂ on the x axis, with µ₁ < µ₂]
   The two curves are identical in form but are centered at different positions along the horizontal axis.
   b. Two normal curves with the same mean but different standard deviations.
   [Figure: two normal curves with µ₁ = µ₂ and σ₁ < σ₂]
   The two curves are centered at exactly the same position on the horizontal axis, but the curve with the larger standard deviation is lower and spreads out farther.
   c. Two normal curves that have different means and different standard deviations.
   [Figure: two normal curves with µ₁ < µ₂ and σ₁ < σ₂]
   The two curves are centered at different positions on the horizontal axis, and their shapes reflect the two different values of σ.

Areas under the Normal Curve

The area under the curve bounded by the two ordinates x = x₁ and x = x₂ equals the probability that the random variable x falls between x = x₁ and x = x₂. Thus, for the normal curve in the figure below, P(x₁ < x < x₂) is represented by the area of the shaded region.
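To make the link between area and probability concrete, the sketch below approximates the area under a normal curve between two ordinates with the trapezoidal rule and compares it with the probability from the cumulative distribution. The helper name `area_under_curve`, the step count, and the values µ = 100, σ = 15, x₁ = 85, x₂ = 115 are assumptions chosen for illustration.

```python
from statistics import NormalDist

def area_under_curve(dist, x1, x2, steps=10_000):
    """Approximate the area under dist's density between x1 and x2 (trapezoidal rule)."""
    width = (x2 - x1) / steps
    total = 0.5 * (dist.pdf(x1) + dist.pdf(x2))   # end points count half
    for i in range(1, steps):
        total += dist.pdf(x1 + i * width)          # interior points count fully
    return total * width

dist = NormalDist(mu=100, sigma=15)   # illustrative (assumed) parameters
x1, x2 = 85, 115                      # here, one standard deviation either side of the mean

area = area_under_curve(dist, x1, x2)
prob = dist.cdf(x2) - dist.cdf(x1)    # P(x1 < X < x2) from the cumulative distribution

print(round(area, 4), round(prob, 4))
```

The two numbers agree to several decimal places, which is just the statement that the shaded area is the probability.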
The area under the curve between any two ordinates depends on the values of µ and σ; consequently, the probabilities associated with distributions that differ in mean and standard deviation are different. Consider the figure below, where we have shaded regions corresponding to P(x₁ < x < x₂) for two curves with different means and variances:

a. P(x₁ < x < x₂), where x is the random variable describing distribution I, is indicated by the cross-hatched area.
b. P(x₁ < x < x₂), where x is the random variable describing distribution II, is given by the entire shaded region.

Obviously, the probabilities associated with distribution I and distribution II are different from each other.

Standard Normal Distribution

The standard normal distribution is a normal distribution with mean µ = 0 and variance σ² = 1.

Converting a normal distribution into the standard normal distribution (standardizing the normal curve)

Any normal distribution can be transformed into the standard normal distribution, with mean zero and variance 1, by applying the following formula:

$$z = \frac{x - \mu}{\sigma}$$

This standardization is called the z-transformation, and z is sometimes referred to as the z score, z value, or normal deviate.
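The z-transformation can be sketched in a few lines of Python. The helper `z_score` and the values µ = 50, σ = 10, x = 65 are hypothetical, chosen only to illustrate the formula; `statistics.NormalDist` is used to show that shifting by µ and scaling by σ yields mean 0 and standard deviation 1.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical normal variable X with mu = 50 and sigma = 10.
mu, sigma = 50.0, 10.0
X = NormalDist(mu, sigma)
Z = (X - mu) / sigma           # NormalDist supports shifting and scaling by constants

x = 65.0
z = z_score(x, mu, sigma)      # (65 - 50) / 10

print(Z.mean, Z.stdev, z)
print(X.cdf(x), NormalDist().cdf(z))   # area to the left, on the x scale and on the z scale
```

The area to the left of x under the original curve matches the area to the left of z under the standard normal curve, which is why one table is enough.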
Importance of the standard normal distribution

In discussing the normal distribution, we have seen that there are many normal distributions, which depend on the values of µ and σ². To determine areas under a normal curve, that is, the probability that a normally distributed random variable falls in a given interval, it would be necessary to set up separate tables of normal-curve areas for every conceivable pair of values of µ and σ, which would be a hopeless task. The standard normal distribution, on the other hand, has µ = 0 and σ² = 1, which allows us to reduce the required number of tables of normal-curve areas to only one, that of the standard normal distribution, because P(x₁ < x < x₂) = P(z₁ < z < z₂).

When x is between the values x = x₁ and x = x₂, the random variable z will fall between the corresponding values

$$z_1 = \frac{x_1 - \mu}{\sigma} \quad\text{and}\quad z_2 = \frac{x_2 - \mu}{\sigma}$$

The original and transformed distributions are illustrated in the two figures below.

Fig. Normal distribution
Fig. Standard normal distribution

Since all the values of x falling between x₁ and x₂ have corresponding z values between z₁ and z₂, the area under the x curve between the ordinates x = x₁ and x = x₂ equals the area under the z curve between the transformed ordinates z = z₁ and z = z₂. Hence we have P(x₁ < x < x₂) = P(z₁ < z < z₂).

The use of the table of areas under the normal curve

The appendix of most statistics textbooks gives a table of the area under the standard normal curve lying to the left of any specified z value, for z ranging from -3.8 to 3.8.

Computing Normal Probabilities

1. State the problem.
2. Write the appropriate probability statement.
3. Sketch a curve and shade the required area.
4. Convert to the standard normal distribution.
5. Find the probability in the standard normal table.
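Steps 4 and 5 above can be sketched in Python, with the standard normal cumulative distribution standing in for the printed table. The function name `probability_between` and the problem values (µ = 10, σ = 2, interval 8 to 13) are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)   # plays the role of the printed table

def probability_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for a normal X with mean mu and standard deviation sigma."""
    z1 = (x1 - mu) / sigma     # step 4: convert both ordinates to z values
    z2 = (x2 - mu) / sigma
    # step 5: area to the left of z2 minus area to the left of z1
    return STANDARD_NORMAL.cdf(z2) - STANDARD_NORMAL.cdf(z1)

# Hypothetical problem: X is normal with mu = 10 and sigma = 2; find P(8 < X < 13).
p_z = probability_between(8, 13, mu=10, sigma=2)

# The same area computed directly on the x scale, for comparison.
p_x = NormalDist(10, 2).cdf(13) - NormalDist(10, 2).cdf(8)

print(round(p_z, 4), round(p_x, 4))
```

Both routes give the same probability, illustrating P(x₁ < x < x₂) = P(z₁ < z < z₂).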
Example

Suppose that the hemoglobin levels for healthy adult males are approximately normally distributed with a mean of 16 and a variance of 0.81. Find the probability that a randomly chosen healthy adult male has a hemoglobin level less than 14.

Solution

We are to find P(x < 14). Sketch a curve and evaluate the area under the normal curve to the left of x = 14. This can be done by transforming x = 14 to the corresponding z value and obtaining the area to the left of z from the table.

Since µ = 16 and σ = √0.81 = 0.9, we find

$$z = \frac{14 - 16}{0.9} = -2.22$$

P(x < 14) = P(z < -2.22) = 0.0132

[Figure: normal curve with σ = 0.9, area to the left of x = 14 shaded; x axis marked 14 and 16, z axis marked -2.22 and 0]

Therefore, only about 1.3% of healthy adult males have hemoglobin levels less than 14.

Example

Suppose that the serum cholesterol levels of Palestinian women aged 21-30 are approximately normally distributed with a mean of 4.7 mmol/l and a variance of 0.25. Find the probability that a randomly chosen Palestinian woman aged 21-30 has a serum cholesterol level of more than 5.6 mmol/l.

Solution

In this example we are to find P(x > 5.6). Again, sketch a curve, this time following the model of the figure below. To find P(x > 5.6), we need to evaluate the area under the normal curve to the right of x = 5.6. This can be done by transforming x = 5.6 to the corresponding z value, obtaining the area to the left of z from the table, and then subtracting this area from 1.

Since µ = 4.7 and σ = √0.25 = 0.5, we find

$$z = \frac{x - \mu}{\sigma} = \frac{5.6 - 4.7}{0.5} = \frac{0.9}{0.5} = 1.8$$

Hence P(x > 5.6) = P(z > 1.8) = 1 - P(z < 1.8) = 1 - 0.9641 = 0.0359.

[Figure: normal curve with σ = 0.5, area to the right of x = 5.6 shaded; x axis marked 4.7 and 5.6, z axis marked 0 and 1.8]

So about 3.6% of Palestinian women aged 21-30 have a serum cholesterol level of more than 5.6 mmol/l.
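The two tail probabilities above can be checked numerically. This is a minimal sketch that uses Python's `statistics.NormalDist` in place of the printed table; the variable names are illustrative, and because z is not rounded to two decimals here, the last digit may differ slightly from the table-based answers.

```python
import math
from statistics import NormalDist

Z = NormalDist()   # standard normal, used in place of the printed table

# Hemoglobin example: mean 16, variance 0.81, lower tail P(x < 14).
sigma_hb = math.sqrt(0.81)
z_hb = (14 - 16) / sigma_hb
p_hb = Z.cdf(z_hb)                 # area to the left of z

# Cholesterol example: mean 4.7, variance 0.25, upper tail P(x > 5.6).
sigma_ch = math.sqrt(0.25)
z_ch = (5.6 - 4.7) / sigma_ch
p_ch = 1 - Z.cdf(z_ch)             # area to the right = 1 - area to the left

print(round(z_hb, 2), round(p_hb, 4))
print(round(z_ch, 2), round(p_ch, 4))
```

Note that the variance is converted to a standard deviation before standardizing; dividing by the variance instead is a common slip in problems of this kind.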
Example

IQ scores (IQ is Intelligence Quotient) on the Wechsler Adult Intelligence Scale are approximately normally distributed with mean µ = 100 and standard deviation σ = 15.

a. What is the proportion of persons having IQs between 80 and 120?

Solution

Let x₁ = 80 and x₂ = 120. We are to find the probability P(x₁ < IQ < x₂) = P(80 < IQ < 120). Sketch a normal curve like the one in the figure below and shade the desired area.

[Figure: normal curve with the area between IQ = 80 and IQ = 120 shaded; IQ axis marked 80, 100, 120; z axis marked -1.33, 0, 1.33]

Find the z values corresponding to x₁ = 80 and x₂ = 120:

$$z_1 = \frac{x_1 - \mu}{\sigma} = \frac{80 - 100}{15} = \frac{-20}{15} = -1.33$$

$$z_2 = \frac{x_2 - \mu}{\sigma} = \frac{120 - 100}{15} = \frac{20}{15} = 1.33$$

Since P(x₁ < x < x₂) = P(z₁ < z < z₂), we have P(80 < IQ < 120) = P(-1.33 < z < 1.33), which is given by the area of the shaded region in the figure. This area may be found by subtracting the area to the left of the ordinate z = -1.33 from the entire area to the left of z = 1.33. Using the table of areas under the normal curve, we have

P(80 < IQ < 120) = P(-1.33 < z < 1.33) = P(z < 1.33) - P(z < -1.33) = 0.9082 - 0.0918 = 0.8164.

Therefore, the proportion of persons having IQs between 80 and 120 is 0.8164, or about 82%.

Inverse Normal Distribution, or finding z values when probabilities (areas) are given

1. State the problem.
2. Draw a picture.
3. Use the table to find the probability closest to the one you need.
4. Read off the z value.
5. Unstandardize, i.e. x = µ + zσ.

Example

Given a normal distribution with µ = 40 and σ = 6, find the value of x that has
a. 45% of the area below it
b. 14% of the area above it.

Solution

In this problem we reverse the process: we begin with a known area or probability, find the z value, and then determine x by rearranging the formula

$$z = \frac{x - \mu}{\sigma} \quad\text{to give}\quad x = \sigma z + \mu$$

a. 45% of the area below it.
An area of 0.45 to the left of the desired x value is shaded in the figure below. We require a z value that leaves an area of 0.45 to the left. From the table we find P(z < ?) = 0.45, so the desired z value is -0.13. Hence

x = (6)(-0.13) + 40 = 39.22

[Figure: normal curve with σ = 6 centered at 40; area of 0.45 shaded to the left of x]

b. 14% of the area above it.
In the figure below, we shade an area equal to 0.14 to the right of the desired x value. This time we require a z value that leaves 0.14 of the area to the right and hence an area of 0.86 to the left. Again from the table we find P(z < ?) = 0.86, so the desired z value is 1.08 and

x = (6)(1.08) + 40 = 46.48

[Figure: normal curve with σ = 6 centered at 40; area of 0.14 shaded to the right of x]
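The reverse lookup can also be sketched in Python, where `NormalDist().inv_cdf` stands in for reading the table backwards. The helper name `x_for_area_below` is an illustrative assumption; the values µ = 40 and σ = 6 are those of the example.

```python
from statistics import NormalDist

def x_for_area_below(area, mu, sigma):
    """Value x that has the given area to its left, using x = mu + z * sigma."""
    z = NormalDist().inv_cdf(area)   # z value whose left-hand area equals `area`
    return mu + z * sigma

mu, sigma = 40, 6
x_a = x_for_area_below(0.45, mu, sigma)       # 45% of the area below x
x_b = x_for_area_below(1 - 0.14, mu, sigma)   # 14% above x means 86% below x

print(round(x_a, 2), round(x_b, 2))
```

The results are close to the table-based answers of 39.22 and 46.48; any small difference in the second decimal comes from the table rounding z to two places.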