Excel 2010 with XLSTAT
Jennifer Lewis Priestley, Ph.D.

Introduction to Excel 2010 with XLSTAT
The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with a little practice, students should find that the new layout is logical and easy to use. The tabs across the top can be thought of as toolboxes, where the options within each tab represent the individual tools available in Excel. Once XLSTAT has been installed, its functionality can be found under the XLSTAT tab. When the XLSTAT tab is selected, you will see the following screen:
The options for analysis are grouped into logical categories, such as Discover, explain and predict and Test a hypothesis.

Descriptive Statistics and Confidence Intervals for Means
Use the following procedure to find a complete description of a variable, including the mean, median, and standard deviation.
1. Select XLSTAT>Describing Data>Descriptive Statistics.
2. The screen should default to the General tab. Place your cursor inside the first box, Quantitative data:. Click on the first cell of the column of the variable to be analyzed; this cell should contain the variable name. Highlight the entire variable column.
3. The Options and Charts tabs will default to the required statistics; explore these tabs to see the full range of options.
4. To generate the 95% confidence interval around the mean, select the Outputs tab. Scroll through the list of output options available for Quantitative data:.
5. Check the two boxes corresponding to the lower and upper bounds of the confidence interval. Select OK.
Typical output with descriptive statistics is shown. Notice that the variable name from the first row appears in the output. This is particularly useful when analyzing multiple variables simultaneously.
To generate descriptive statistics for multiple variables simultaneously, select all of the variables of interest, following the instructions from step 2.
Ensure that the variable names from the first row are captured. Select OK.

Scatterplot
1. Select XLSTAT>Visualizing Data>Scatter plots. The following screen will appear:
2. Place your cursor inside the X: box. Click on the first row of the variable to be assigned to the (horizontal) x-axis and highlight the full range of data for that variable. Do the same for the Y: box; this variable will be assigned to the (vertical) y-axis. Note that both variables must be quantitative and have the same number of rows.
3. The Options tab will default to the required statistics. Click OK.

ISBN-13: 978-0-321-74775-4
ISBN-10: 0-321-74775-5
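The Descriptive Statistics procedure at the start of this guide reports the mean, median, standard deviation and, when the two boxes in step 5 are checked, a 95% confidence interval for the mean. The Python sketch below reproduces those quantities under one stated assumption: the interval is the usual t-based interval, mean ± t × s/√n, using the sample (n − 1) standard deviation. We have not confirmed that this matches XLSTAT's exact formula. Python's standard library has no Student t quantile, so the helpers t_pdf, t_cdf and t_ppf are hypothetical names introduced here. They integrate the t density numerically with Simpson's rule and invert it by bisection. The heights are made-up data.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # The density is symmetric about 0, so P(T <= 0) = 0.5.
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def describe(values, confidence=0.95):
    """Mean, median, sample SD and a t-based confidence interval for the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    half_width = t_ppf(1 - (1 - confidence) / 2, n - 1) * sd / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "ci_lower": mean - half_width,
        "ci_upper": mean + half_width,
    }


heights = [64, 66, 68, 70, 72]  # hypothetical data
print(describe(heights))
```

For the five heights, the critical value is t(0.975, 4) ≈ 2.776, so the interval is roughly 68 ± 3.93.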
Correlation
1. Select XLSTAT>Test a hypothesis>Correlation/Association tests>Correlation tests. The following screen will appear:
2. Place your cursor inside the Observations/variables table: box. Click on the first row of the first variable to be analyzed and highlight all of the columns of data to be included. Note: The quantitative variables to be included in the correlation analysis should be in adjacent columns. If this is not the case, reposition the variables in the data set.
3. On the General tab, the type of correlation will default to Pearson, which is typically used with quantitative data and sample sizes greater than 30. The other options are the Spearman and Kendall correlation coefficients, which are typically used with ordinal data or data with fewer than 30 observations. The significance level will default to 5% (95% confidence) but can easily be changed.
4. Select the Missing data tab. The first option, Do not accept missing data, will prevent the analysis from running if there are any missing values in the data set. The second option, Remove the observations, will drop any observation that has a missing value from all of the correlations, even those that do not involve the variable with the missing value. The third option, Pairwise deletion, will drop an observation only from the correlations that require the missing value. The fourth option, Estimate missing data, will replace any missing values using the selected imputation method (mean or mode, or nearest neighbor). Be certain to select the option most appropriate for your data.
5. The Outputs and Charts tabs will default to the required statistics and plots. Select OK.

Regression Modeling, Finding the Equation of the Regression Line, and Residual Plots
Use the following procedure to generate a linear regression model.
1. Select XLSTAT>Modeling Data>Linear regression. The following screen will appear:
2. Place your cursor inside the Y/Dependent variables: box. Click on the first row of the variable to be assigned as the dependent variable (the variable to be predicted or explained) and highlight the entire data range for the variable. Do the same with the X/Explanatory variables: box. Note that several variables can be included here. All of the variables selected for these two roles must be quantitative; a third box is reserved for qualitative variables, should one be required for a model.
3. The Options and Validation tabs will default to the required statistics and plots. Select OK.
Typical output from a linear regression model is provided. At the bottom of the output, several plots of residuals can be found to help assess the stability and generalizability of the model.

Displaying Categorical Data Using Frequency Counts, Bar Charts, and Pie Charts
Use the following procedure to generate a frequency table, a bar chart, and a pie chart for a categorical variable.
1. Select XLSTAT>Describing Data>Descriptive Statistics.
2. Place your cursor inside the Qualitative data: box. Click on the first row of the variable to be analyzed and highlight the entire column of data.
3. The Options and Outputs tabs will default to the required statistics and plots.
4. Select the Charts (2) tab and check the boxes to generate both the bar chart and the pie chart. Select OK.
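The frequency table behind the bar chart and pie chart is simply a count of each category and that category's share of the total. The Python sketch below uses collections.Counter to build it. The function name frequency_table, the text-bar rendering and the data are hypothetical. Blank (None) entries are skipped here, which may differ from how XLSTAT treats missing categories.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, relative frequency) rows, sorted by category."""
    counts = Counter(v for v in values if v is not None)  # skip blank cells
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in sorted(counts.items())]


genders = ["F", "M", "F", "F", "M", None]  # hypothetical data with one blank
for cat, n, share in frequency_table(genders):
    # A crude text bar chart: one '#' per observation.
    print(f"{cat:<3}{n:>3}  {share:6.1%}  {'#' * n}")
```

The relative frequencies are the slice proportions a pie chart would show.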
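The Correlation and Linear regression procedures described earlier in this section report quantities that can be sketched in a few lines of Python. The sketch below computes a Pearson correlation and contrasts the Remove the observations (listwise) and Pairwise deletion treatments of a missing value. It then fits the least-squares line y = b0 + b1x, whose residuals are what the residual plots display. It assumes ordinary least squares with a single explanatory variable, although XLSTAT accepts several. The function names are hypothetical and the data are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def complete_rows(columns):
    """Keep only rows with no missing (None) value in any of the given columns."""
    rows = [r for r in zip(*columns) if None not in r]
    return [list(col) for col in zip(*rows)]


def least_squares(xs, ys):
    """Slope, intercept and residuals of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, residuals


a = [1, 2, 3, 4, 5]
b = [2, 4, 5, 4, 5]
c = [3, None, 1, 2, 5]  # one missing value in a third variable

# Pairwise deletion: r(a, b) uses every row where both a and b are present.
print("pairwise r(a, b):", pearson_r(*complete_rows([a, b])))
# Listwise removal: the row missing c is dropped from r(a, b) as well.
a_l, b_l, _ = complete_rows([a, b, c])
print("listwise r(a, b):", pearson_r(a_l, b_l))

slope, intercept, residuals = least_squares(a, b)
print(f"y = {intercept:.2f} + {slope:.2f}x, residuals = {[round(e, 2) for e in residuals]}")
```

The two correlations differ (about 0.77 versus 0.83) only because listwise removal discarded a row that r(a, b) did not need. This is the distinction between the second and third Missing data options.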
Histogram
1. Select XLSTAT>Visualizing Data>Histogram. The following screen will appear:
2. Place your cursor inside the Data: box. Click on the first row of the variable to be analyzed and highlight the entire column of data. The Options and Missing data, etc. tabs will default to the required statistics and plots. Select OK.
3. Typical output is provided:

Boxplot
1. Select XLSTAT>Visualizing Data>Univariate plots.
2. Place your cursor inside the Quantitative data: box. Click on the first row of the variable to be analyzed and highlight the entire column of data. The Options and Outputs tabs will default to the required statistics and plots. Select OK.
3. Typical output is provided:
4. To create side-by-side boxplots of a quantitative variable across the values of a qualitative variable (such as gender), check the Subsamples: box after selecting the quantitative variable in step 2. Place your cursor in the larger box and select the qualitative variable to be included. Select the Outputs tab and check the Group plots box. Select OK. Typical output is provided:

Assessing Normality and Goodness of Fit
Use the following procedure to assess the normality of a quantitative variable or its goodness of fit to a particular distribution.
1. Select XLSTAT>Modeling Data>Distribution fitting.
2. Place your cursor inside the Data: box. Click on the first row of the variable to be analyzed and highlight the entire range of the data. To test whether the data follow a normal distribution, ensure that Normal appears in the Distribution: box.
3. Select the Options tab. XLSTAT provides two tests to assess the fit of the data to the theoretical distribution selected on the General tab. The Chi-square goodness of fit test is a parametric test based on the distance between the histogram of the theoretical distribution and the histogram of the empirical distribution of the sample. The histograms are calculated using the number of intervals, k, entered in the Number: box.
This test is better suited to discrete data. The Kolmogorov-Smirnov goodness of fit test is an exact nonparametric test based on the maximum distance between the theoretical distribution function and the empirical distribution function of the sample. This test can be used only for continuous distributions. Select the test most appropriate for the data.
4. The Options, Missing data, Outputs, and Charts tabs will default to the required statistics and charts. Select OK.
5. Typical results for the Kolmogorov-Smirnov test and the Chi-square test are shown:
Note: The Kolmogorov-Smirnov and Chi-square test results indicate whether the distribution is normal. For these tests, the null hypothesis is that the data follow the selected distribution, so a low p-value (less than the alpha value) indicates that the distribution is not normal.

Sampling
1. Select XLSTAT>Preparing Data>Data sampling. Place your cursor inside the Data: box and highlight the entire range of the data (all variables and all observations).
2. The Sampling: box includes several options. If the data have been sorted in any way, the first two options, N first rows and N last rows, may not be appropriate. For a simple random sample, the third option, Random without replacement, may be most appropriate. Review all options before making a selection.
3. Enter the required number of observations in the Sample size: box. Select OK.
The resulting output will be a random subset of the original data set.

Hypothesis Test and Confidence Interval for a Single Proportion
1. Select XLSTAT>Test a hypothesis>Parametric tests>Tests for one proportion. The following screen will appear:
2. In the Frequency: box, enter the frequency of the condition of interest. For example, if the sample includes 119 people, of whom 62 are women, and the test and confidence interval concern the proportion of women, enter a value of 62. In the Sample size: box, enter the total number of individuals in the sample. Note that this information can be obtained by generating the descriptive statistics for the qualitative variable of interest (Describing Data>Descriptive Statistics). In the Test proportion: box, enter the theoretical proportion against which the sample proportion is being tested. If no test is being conducted, enter a value of .50.
3. Select the Options tab.
The Alternative hypothesis: box provides three options for hypothesis testing: a two-tailed test (the default) and a one-tailed test in each direction (less than and greater than the hypothesized difference between the population proportion and the Test proportion). Select the most appropriate option. In the Hypothesized difference: box, enter the value of the hypothesized difference (typically, but not always, 0). In the Significance level (%): box, enter the alpha value for the test (typically, but not always, 5). A 5% significance level corresponds to a 95% confidence interval. The confidence interval options represent slightly different methods of calculating the interval. Review the differences and select the option most appropriate for your data. Select OK.
Typical output is provided below:
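The one-proportion procedure in this section compares the sample proportion with the Test proportion. The Python sketch below shows the textbook large-sample z-test and a simple Wald confidence interval, applied to the 62-of-119 example from step 2. It rests on several assumptions: the hypothesized difference is 0, the test statistic's standard error uses the test proportion, and the interval is the Wald interval. XLSTAT offers other variance and interval options on the Options tab, so its figures may differ slightly. The function one_proportion_test and its parameters are hypothetical. statistics.NormalDist is part of Python's standard library (3.8+).

```python
import math
from statistics import NormalDist


def one_proportion_test(frequency, n, p0, alternative="two-sided", alpha=0.05):
    """Large-sample z-test of H0: p = p0, plus a Wald confidence interval."""
    p_hat = frequency / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SE under H0 uses p0
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)  # Wald interval uses p_hat
    return p_hat, z, p_value, (p_hat - margin, p_hat + margin)


p_hat, z, p_value, ci = one_proportion_test(62, 119, 0.50)
print(f"p-hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.3f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Here z ≈ 0.46 and the two-tailed p-value is well above 0.05. At the 5% level, the data give no evidence that the proportion of women differs from .50.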
Hypothesis Test and Confidence Interval for the Difference between Proportions
1. Select XLSTAT>Test a hypothesis>Parametric tests>Tests for two proportions.
2. Follow step 2 of Hypothesis Test and Confidence Interval for a Single Proportion for each sample.
3. Select the Options tab. Follow step 3 of Hypothesis Test and Confidence Interval for a Single Proportion. The two variance options represent an unpooled (unequal variance) approach and a pooled (equal variance) approach, respectively. Review the differences and select the option most appropriate for your data. If you are unsure, select the more conservative unpooled approach, which is the default. Select OK.

Hypothesis Test and Confidence Interval for One Sample Mean
1. Select XLSTAT>Test a hypothesis>Parametric tests>One sample t-test and z-test.
2. Place your cursor inside the Data: box. Click inside the first row of the variable to be analyzed and highlight the full range of the variable.
3. Select the Options tab. Follow step 3 of Hypothesis Test and Confidence Interval for a Single Proportion.
4. The Missing data tab provides options for handling missing data. If there are any missing values, select the second option, Remove the observations.
5. The Outputs tab will default to the required statistics. Select OK.
6. Typical output is provided below:

Hypothesis Test and Confidence Interval for Mean of Paired Differences
1. Select XLSTAT>Test a hypothesis>Parametric tests>Two sample t-test and z-test. The following screen will appear:
2. Place your cursor in the Sample 1: box. Click inside the first row of the first variable of the pair to be analyzed and highlight the full range of the variable. Place your cursor in the Sample 2: box and highlight the full range of the second variable.
3. Under the Data format: options, specify that the data are Paired samples.
4. Follow steps 3 and 4 of Hypothesis Test and Confidence Interval for One Sample Mean.
5. The Outputs and Charts tabs will default to the statistics that are required (the defaults will not produce charts). Select OK.
6. Typical output is provided below:
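A paired-samples t-test is a one-sample t-test applied to the within-pair differences. This is why the One Sample Mean steps carry over. The Python sketch below assumes a two-tailed test of a zero mean difference and uses made-up before/after measurements. The function names are hypothetical. Because Python's standard library has no Student t distribution, the sketch includes a small numerical t CDF that integrates the density with Simpson's rule.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def paired_t_test(sample1, sample2):
    """Two-tailed t-test of H0: mean difference = 0 for paired observations."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(sample1, sample2)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    t = mean_d / se
    p_value = 2 * (1 - t_cdf(abs(t), n - 1))
    return mean_d, t, n - 1, p_value


before = [200, 190, 210, 220, 205]  # hypothetical paired measurements
after = [195, 185, 205, 212, 200]
mean_d, t, df, p = paired_t_test(before, after)
print(f"mean difference = {mean_d}, t = {t:.3f}, df = {df}, p = {p:.4f}")
```

With differences of 5, 5, 5, 8 and 5, the mean difference is 5.6 with a standard error of 0.6. This gives t ≈ 9.33 on 4 degrees of freedom and a p-value well below 0.05.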
Hypothesis Tests and Confidence Interval for Difference of Means in Two Independent Samples
1. Select XLSTAT>Test a hypothesis>Parametric tests>Two sample t-test and z-test.
2. Before selecting the data for analysis, you must identify its Data format. For a hypothesis test of two independent samples, the data can exist in one of two formats. The data reflect the first format, one column per sample, if the two samples are in different columns (e.g., Female Heights and Male Heights). The data reflect the second format, one column per variable, if all of the quantitative data for both samples are in a single column (e.g., Height) and the sample identifiers or categories are in a separate column (e.g., Gender).
3. Place your cursor in the Sample 1: box. Click inside the first row of the first variable to be analyzed and highlight the full range of the variable. Place your cursor in the Sample 2: box and highlight the full range of the second variable.
4. Follow step 3 of Hypothesis Test and Confidence Interval for a Single Proportion.
5. The Missing data tab provides options for handling missing data. Review them and select the most appropriate option. The Outputs and Charts tabs will default to the required statistics (the defaults will not produce charts). Select OK.
6. Typical output is provided:

Finding the Area Under the Normal Curve and Inverse Normality
1. Select the quantitative variable to be analyzed. At the bottom of that variable's column, calculate the mean and the standard deviation by entering the formulas =AVERAGE(A2:A41) and =STDEV(A2:A41), where A2 through A41 is the range of the data.
2. Insert a new blank column next to the variable of interest. To do this, click on the top of the column where you want to insert a new column, then select Home>Insert.
3. To find the cumulative area under the normal curve associated with each value of the variable, place your cursor in the second row of the new column (the first cell should be used to name the column). Click the fx button. In the Search for a Function box, type Normal Distribution. From the Select a Function list, click on NORM.DIST. You will see the following screen:
4. In the X box, click on the first value in your variable of interest. In the Mean box, click on the cell where you calculated the AVERAGE. Type a $ in front of the letter and in front of the number of that cell reference, so that the reference stays fixed when the formula is copied. In the Standard_dev box, click on the cell where you calculated the STDEV. Again, type a $ in front of the letter and the number of the cell reference. In the Cumulative box, type TRUE and click OK.
5. The resulting value will be the cumulative probability (the area from negative infinity up to that value) associated with the value in that row of the variable of interest, assuming a normal distribution. Copy this function to the bottom of the column.
6. To find the inverse (the value of the variable associated with a given cumulative normal probability), create a new column to the right of the data of interest. Click the fx button. In the Search for a Function box, type Normal Distribution. From the Select a Function list, click on NORM.INV. You will see the following screen:
7. In the Probability box, enter the cumulative probability from the normal curve in which you are interested. Repeat step 4 to enter the necessary values into the Mean and Standard_dev boxes. Click OK.
8. The resulting value will be the observation associated with the indicated cumulative probability, assuming a normal distribution.

Generating Random Numbers
1. Create a new column on the right of your dataset titled RANDOM.
2. Inside the first open cell (row 2), type the following function: =RAND().
3. After you press Enter, a random number, following a uniform distribution between 0 and 1, will be generated. Once your random number has been generated, click Copy on the Home tab. Highlight the remainder of the column to the end of the data and click Paste.
4. You may have noticed that the first value in row 2 changed. RAND is a volatile function in Excel, meaning that its result is recalculated whenever a change is made in the spreadsheet. To fix the values, highlight the entire RANDOM column, select Copy, and then under the Paste options, select Paste Values.
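The worksheet functions in the normal-curve and random-number procedures have close counterparts in Python's standard library. statistics.mean and statistics.stdev correspond to =AVERAGE and =STDEV; both compute the sample standard deviation. NormalDist(mu, sigma).cdf(x) gives the same cumulative area as NORM.DIST(x, mean, standard_dev, TRUE), and inv_cdf(p) plays the role of NORM.INV(p, mean, standard_dev). The sketch below uses made-up data. The seeded random.Random generator is an assumption added only so that reruns are repeatable. Storing the draws in a list is the analogue of Paste Values, because the list is not recalculated afterwards.

```python
import random
import statistics
from statistics import NormalDist

data = [64, 66, 68, 70, 72]  # hypothetical column, like A2:A6

mean = statistics.mean(data)  # like =AVERAGE(A2:A6)
sd = statistics.stdev(data)   # like =STDEV(A2:A6), sample standard deviation
dist = NormalDist(mean, sd)

# Cumulative area for each value, like NORM.DIST(x, mean, sd, TRUE) copied down a column.
areas = [dist.cdf(x) for x in data]

# Value below which 90% of this normal distribution falls, like NORM.INV(0.9, mean, sd).
p90 = dist.inv_cdf(0.90)

# Uniform numbers in [0, 1), like =RAND(). Once stored in a list they stay fixed,
# much as Paste Values freezes the RANDOM column.
rng = random.Random(2010)
random_column = [rng.random() for _ in data]

print([round(a, 4) for a in areas], round(p90, 2), random_column)
```

Because 68 is the mean of these data, its cumulative area is exactly 0.5, just as NORM.DIST would return for a value equal to the mean.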