STAT10010 Introductory Statistics Lab 2

1. Aims of Lab 2

By the end of this lab you will be able to:
i. Recognize the type of recorded data.
ii. Construct summaries of recorded variables.
iii. Calculate and interpret the margin of error in a survey.
iv. Draw a stratified random sample of data.
v. Calculate some descriptive statistics for a data set.

2. Survey data

In this lab we will work with some survey data collected in a political study in the USA. The researcher wanted to assess if there was an association between age or gender and candidate preference (Democrats, Republicans, and Others) in a presidential election. The researcher randomly selected 400 individuals and asked them the following 3 questions:
1) What gender are you?
2) What age are you in years?
3) Is the candidate you will back in the upcoming presidential election a: (a) Democrat (b) Republican or (c) other?

Q1: Is each question asked by the researcher an open question or a closed question?

From Blackboard, download the Minitab worksheet file called PoliticalPoll.mtw to your computer and open it in Minitab. (Recall from lab 1 how to open a Minitab worksheet.) Your worksheet should then look like the screen below:
Clearly the first column contains the gender of each person in the survey, the second column contains their political preference and the third, their age. Scroll down to double check that there are 400 observations/people in your data set (i.e. there should be 400 rows of data in your worksheet).

Q2: The data recorded for the gender variable are categorical data; are the data ordinal or nominal?
Q3: What type of data is recorded for the preference variable?
Q4: What type of data is recorded for the age variable?

Note that the Gender column and the Preference column are both in text format (again, recall lab 1). To analyse the data it will often be easier to work with the data in numerical format. To change the format of the data, in the menu bar go to Data, then Code, then Text to Numeric. Code the gender variable as 0 for female and 1 for male, and save the new data in column C4, say, in your worksheet. Give your new column of data a label. Re-code the preference data column in the same way.
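If it helps to see what the Text to Numeric step does, the recoding can be sketched in a few lines of Python. This is only an illustration and not part of the Minitab lab: the text labels ("Female", "Male", "Democrat", and so on) and the preference codes 0, 1 and 2 are assumptions, since the lab fixes only the gender coding, and the four rows of data are made up.

```python
# Hypothetical text labels; the actual labels in PoliticalPoll.mtw may differ.
GENDER_CODES = {"Female": 0, "Male": 1}
# Assumed coding for preference; the lab does not fix these values.
PREFERENCE_CODES = {"Democrat": 0, "Republican": 1, "Other": 2}


def recode(values, codes):
    """Map each text label to its numeric code (raises KeyError on an unknown label)."""
    return [codes[v] for v in values]


# Made-up rows for illustration only, not taken from the real data set.
gender = ["Female", "Male", "Male", "Female"]
preference = ["Democrat", "Other", "Republican", "Democrat"]

gender_num = recode(gender, GENDER_CODES)
preference_num = recode(preference, PREFERENCE_CODES)
print(gender_num)      # [0, 1, 1, 0]
print(preference_num)  # [0, 2, 1, 0]
```

Raising an error on an unknown label, rather than silently skipping it, makes a misspelt category in the data easy to spot.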
Let's look at some tables which summarise the information in our data set. In the menu bar go to Stat, then Tables, then Tally Individual Variables. Choose your new numerically expressed gender data, and your new numerically expressed preference data.

Q5: How many females were in the sample of 400 people?
Q6: How many people in the sample supported neither the Democrats nor the Republicans?
Q7: What proportion support the Democrats?

3. The margin of error

One way of assessing the uncertainty in our estimate of the proportion which supports the Democrats is through the margin of error. Recall from lectures that the margin of error in a survey in which the sample is of size n is 1 divided by the square root of n, i.e.

MoE = 1/√n

Let's calculate the margin of error for the political poll data set. In the menu bar, go to Calc, then Calculator. Store your result in the next free column (probably column C6). In the Expression box, enter 1/SQRT(400). You should be able to find the SQRT function in the list of arithmetic functions.

Q8: What is the margin of error (in %) of the study?
Q9: What is the interval in which the true proportion which supports the Democrats lies?

4. Stratified random sampling

Let's now draw a stratified random sample from the political poll data. Recall from lectures what is meant by a stratified random sample. Let's treat the two gender categories as our two strata. Say we wish to draw a random sample of size 10 from each stratum.

Let's first organise our data so that all the female data is grouped together, and then all the male data. Go to Data, then Sort. You want to sort all the columns of data, and you want to sort them by gender. Check the original columns option.
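As an aside, the margin-of-error formula from section 3 (1 divided by the square root of n) can be checked outside Minitab. The Python sketch below is a minimal illustration, not part of the lab: the function names are hypothetical, and the sample size of 100 and the proportion of 0.40 are made-up values chosen so that the lab questions are left for you to answer.

```python
import math


def margin_of_error(n):
    """Rule-of-thumb margin of error for a survey of size n: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def interval(p_hat, n):
    """Interval p_hat +/- margin of error, clipped to the range [0, 1]."""
    moe = margin_of_error(n)
    return max(0.0, p_hat - moe), min(1.0, p_hat + moe)


# Made-up example: a survey of 100 people in which 40% back some candidate.
moe = margin_of_error(100)
low, high = interval(0.40, 100)
print(f"Margin of error: {moe:.1%}")          # Margin of error: 10.0%
print(f"Interval: {low:.1%} to {high:.1%}")   # Interval: 30.0% to 50.0%
```

Replacing 100 with your own sample size and 0.40 with your own estimated proportion gives the corresponding values for any survey.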
Your worksheet should now be organised such that all the female data are in the first rows, followed by all the male data. The female observations are numbered 1 up to 204. Let's choose a random sample of size 10 from this set of observations. Select Calc, then Random Data and then Integer. Ask Minitab to generate 10 rows and to store the resulting sample in your next free column. Enter 1 as the minimum value and 204 as the maximum.

The numbers generated are a set of random numbers. Each observation in our original (female) data set whose row number appears in the list of random numbers will be an observation in our stratified random sample. To construct a new data set consisting of the randomly sampled female observations go to Data, Subset Worksheet. Check the row numbers box, enter the list of randomly generated numbers and press OK. In the next window, tell Minitab to include all the columns containing data. A new worksheet of data should pop up. Save this worksheet in your STAT10010 folder on your H drive.

Repeat this to draw a sample of 10 observations from the male stratum and save your worksheet.

Q10: Based on your new stratified random samples, which stratum has the higher proportion of support for Republicans?
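The sampling steps above can also be sketched in Python, purely as an illustration of the idea. Everything in the sketch is an assumption rather than part of the lab: the function name, the (gender, preference, age) row layout, the preference code 1 for Republican, and the randomly generated rows standing in for the real worksheet. Note one deliberate difference: Minitab's Random Data, Integer option can repeat a row number, whereas this sketch uses random.sample, which draws without replacement so no observation is picked twice.

```python
import random


def stratified_sample(rows, stratum_of, size_per_stratum, seed=None):
    """Draw size_per_stratum rows at random, without replacement, from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        # Group rows by stratum; this plays the role of the Sort step.
        strata.setdefault(stratum_of(row), []).append(row)
    return {key: rng.sample(members, size_per_stratum)
            for key, members in strata.items()}


# Made-up stand-in data: 204 rows coded 0 (female), then 196 coded 1 (male).
# Each row is (gender, preference, age); preference 1 is assumed to mean Republican.
data_rng = random.Random(1)
rows = [(0 if i < 204 else 1, data_rng.choice([0, 1, 2]), data_rng.randint(18, 90))
        for i in range(400)]

sample = stratified_sample(rows, stratum_of=lambda row: row[0],
                           size_per_stratum=10, seed=2)

for gender, members in sorted(sample.items()):
    republicans = sum(1 for row in members if row[1] == 1)
    print(f"Stratum {gender}: {republicans / len(members):.0%} Republican support")
```

Passing a seed makes the draw repeatable, which is convenient when you want to compare answers with a classmate; leave it out to get a fresh sample each time.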
5. Some basic descriptive statistics

To calculate some descriptive statistics we can use the Stat, Basic Statistics, Display Descriptive Statistics option. Click the Statistics box, and ensure that only the mean, minimum and maximum boxes are checked.

Q11: Which stratum has the higher average age?
Q12: Which stratum has the higher maximum age?

You have now worked with some categorical and numerical data, drawn some conclusions based on sampled data and drawn a stratified random sample. Some more steps into the world of a statistician.