Intermediate SAS: Statistics OIT TSS 293-4444 oithelp@mail.wvu.edu oit.wvu.edu/training/classmat/sas/ Table of Contents Procedures... 2 Two-sample t-test:... 2 Paired differences t-test:... 2 Chi Square Test... 3 Correlations and Reliability... 3 Regression... 4 ANOVA: Simple 1-way, unbalanced data... 4 Using SAS from menus (SAS/Insight)... 5 Data Windows... 5 Opening a Data Window... 5 Variables... 6 Autofill with numbers... 8 Observations... 9 Shortcut Menu Items... 10 Edit Menu... 11 Analyze menu... 12 Histograms/Bar Charts... 12 Distribution Analyses... 14 Regression... 15 Multivariate Analyses... 16 Plots... 16 Query... 18 Analyst... 19 Copyright 2003 West Virginia University
Procedures Two-sample t-test: For 2 independent samples... PROC TTEST data=people; CLASS gender; VAR age wt paid score; TITLE '2 Sample T-Test'; PROC TTEST data=computer; where manufacturer='dell' manufacturer = 'APPLE'; class manufacturer; var age cost x y; T-Test Output Variable Method Variances DF t Value Pr > t cost Pooled Equal 444-1.00 0.3183 cost Satterthwaite Unequal 110-2.45 0.0159 x Pooled Equal 444-0.98 0.3294 x Satterthwaite Unequal 37.9-1.02 0.3129 Equality of Variances Variable Method Num DF Den DF F Value Pr > F cost Folded F 412 32 11.39 <.0001 x Folded F 412 32 1.11 0.7361 Paired differences t-test: 1. Create a new difference variable in the data step (e.g. DIFF=POST-PRE;). Insert DIFF=z-y; in data step. 2. Use PROC MEANS with the T & PRT option to see if the difference is significantly different from zero. PROC MEANS MEAN N STD T PRT data=people; VAR DIFF; TITLE 'Paired Differences T-Test'; Analysis Variable : DIFF Mean t Value Pr > t 0.1884453 4.51 0.0015 2 SAS Statistics
Chi Square Test Use Proc Freq to obtain a Chi Square Test of Homogeneity: PROC FREQ; TABLES trt*grp / NOROW CHISQ; trt TRFrequency grp Percent Col Pct 1 2 Total ---------+--------+--------+ 0 3 3 6 18.75 18.75 37.50 75.00 25.00 ---------+--------+--------+ 1 0 5 5 0.00 31.25 31.25 0.00 41.67 ---------+--------ˆ--------ˆ 2 1 4 5 6.25 25.00 31.25 25.00 33.33 ---------+--------+--------+ˆ Total 4 12 16 25.00 75.00 100.00 STATISTICS FOR TABLE OF TRT BY GRP Statistic DF Value Prob ---------------------------------------- Chi-Square 2 3.733 0.155 Correlations and Reliability For Pearson's correlation coefficients and Cronbach's Alpha: PROC CORR alpha nomiss nosimple data=people; VAR x y age score paid wt; Cronbach Coefficient Alpha Variables Alpha Raw -.002782 Standardized 0.116268 x y age score paid wt x 1.00000 0.99963-0.74018 0.99963 0.42158-0.59007 <.0001 0.0144 <.0001 0.2250 0.0725 SAS Statistics 3
Regression PROC REG data=people; MODEL y=z; TITLE 'Simple Linear Regression'; Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > t Intercept 1 2.20180 1.24321 1.77 0.1145 z 1 0.47970 0.27050 1.77 0.1141 ANOVA: Simple 1-way, unbalanced data PROC GLM data=computer; where building ne 'Brooks' and building ne 'GCS' and building ne 'Wise'; CLASS building; MODEL cost age = building; MEANS building / DUNCAN; TITLE 'Simple ANOVA with Duncan Post-Hoc test'; Source DF Type III SS Mean Square F Value Pr > F Building 5 102.9530830 20.5906166 18.55 <.0001 Duncan Grouping Mean N Building A 5.2989 57 Colson A B A 4.8979 72 Lair B B 4.6705 108 EvLib B B 4.5296 159 1WP C 3.3934 22 Braxton C C 3.2693 20 Emoore 4 SAS Statistics
Using SAS from menus (SAS/Insight) Data Windows A data window displays a SAS data set as a table, with columns of the table containing variables and rows containing observations. In a data window, you can sort, search, edit, and extract subsets of your data. You can also assign measurement levels and default roles that determine how your variables are used in graphs and analyses. Opening a Data Window You can open data windows in several ways. Technique One: Solutions > Analysis > Interactive Data Analysis This dialog displays two lists: Library and Data Set. A library is a location where data sets are stored. The Library list always contains the standard libraries WORK, MAPS, SASHELP, and SASUSER. You can define other libraries using the LIBNAME statement. By default, SASUSER is selected in the Library list. To see the data sets in any other library, click on the library's name. This causes the Data Set list to display all data sets in that library. By default, the first data set in the Data Set list is selected. To select another data set, click on its name. Then click on OK to display the data window. The Options button on the dialog enables you to enter WHERE clauses and other SAS data set options. SAS Statistics 5
You can open any number of data windows on different data sets, but you can open only one data window on each data set. Variables The column headings in a data window give information on each variable, including the name, label, default roles, and measurement level. The number of variables appears in the upper left corner of the data window. A variable's default role assigns the role a variable plays by default in graphs and analyses. Click in the upper left portion of the variable header to display a pop-up menu of variable roles. Variable Roles Pop-up Menu You can assign four default roles: Group: enables you to process your data by groups. You can use multiple group variables to process your data by groups for each unique combination of values of the group variables. Label:labels observations in scatter plots, rotating plots, and box plots. Frequency:represents the frequency of occurrence for other values in each observation. Weight: supplies weights for each observation. You can assign Freq, Weight, and Label roles to only one variable at a time. You can assign the Group role to more than one variable. The order in which you assign the group role determines the order in which the variables are used to define groups. 6 SAS Statistics
A variable's measurement level determines the way it is treated in graphs and analyses. Measurement Levels Pop-up Menu You can assign two measurement levels: Interval: contains values that vary across a continuous range. For example, a variable measuring temperature would likely be an interval variable. Numeric variables default to the interval measurement level but can be changed to nominal. Nominal: contains a discrete set of values. For example, a variable indicating gender would be a nominal variable. Character variables can use only the nominal measurement level. Up to 250 variable measurement levels can be stored with a data set. Default roles and measurement levels are displayed in the column headings above the variable names. The default role appears at the upper left of the column heading and the measurement level appears at the upper right. If a variable has more than one default role, then only the first character of each role appears. You can use the data pop-up menu to create new variables or to change the default role or measurement level of existing variables. You can use the Edit > Variables menu to create new variables that are transformations of existing variables. SAS Statistics 7
Autofill with numbers 1. Right click on data or click on the triangle in the upper right corner. 2. Select New Variables from the shortcut menu 3. A new column named A will be added to the rightmost position 4. To change its name, click on the A 5. Bring up the shortcut menu again and select Define Variables 6. Give the variable a better name and click on OK 7. Bring up the shortcut menu one more time and select Fill Values. 8. Supply an initial value and an increment 9. When you click on OK, that column will be filled with sequential numbers. Shortcut menu 8 SAS Statistics
Observations The row headings in a data window give information on each observation, including the observation states and observation number. The total number of observations appears in the upper left corner of the data window. SAS/INSIGHT software supports the following observation states: Marker: shows the shape of the marker used in scatter plots, rotating plots, and box plots. Color: shows the color of the observation. o An observation's marker and color appear at the left side of the row heading Label/UnLabel: tells whether a label is displayed by default. o An observation's Label/UnLabel state is shown by a picture of a label around the observation number if the observation's label is displayed by default. Show/Hide: tells whether an observation is displayed in graphs. o An observation's Show/Hide state is shown by whether or not a marker is displayed in the row heading. Include/Exclude: tells whether an observation is included in calculations for curves and analysis tables. o An observation's Include/Exclude state is shown by the way the observation number is displayed. The observation number is grayed-out for observations that are excluded from calculations. Select: tells whether an observation is selected. o An observation's select state is shown by whether the row heading is highlighted or not. You can use the Edit > Observations menu to set all of these observation states. This menu also enables you to find observations meeting a specific search criterion or to examine observations in detail. You can also use the observation pop-up menu to set observation states. To see this menu for a particular observation, click on the observation's marker. Observation Pop-up Menu SAS Statistics 9
Shortcut Menu Items You can sort the data or search it to select observation(s) that meet a criteria. You can select Data Options that control how your environment will work. You can right click on an observation and choose Examine to see its values in a different format. 10 SAS Statistics
Edit Menu SAS Statistics 11
Analyze menu Histograms/Bar Charts Bar charts are pictorial representations of the distribution of values of a variable. You can use bar charts to show distributions of interval or nominal variables. Bar charts of interval variables are also called histograms. You can label the heights of the bars in a bar chart, control the orientation, and control the information shown on the axes. For bar charts of interval variables, you can also control the width and offset of the bars. Choose: a numeric variable Analyze > Histogram choose a character or nominal variable Analyze > Histogram For nominal variables, bars are distinguished by different colors. For interval variables, all bars have the same color. If you click on a bar, you will see its value appear at the top and those rows will be selected in the data window. 12 SAS Statistics
Histograms continued If you do not select a variable first, you will get a dialog box. Click on the variable name then the appropriate button to assign the variable to that role. Click on the Output button to modify the appearance of the histogram. SAS Statistics 13
Distribution Analyses Click on: continuous numeric variable Analyze > Distribution character or nominal variable. Analyze > Distribution Click on the little triangle at the bottom left corner of each picture to choose to choose whether to display values and reference lines. If you do not select the variable first, you will get a dialog box similar to the one for creating histograms. The Methods and Output buttons allow you to select additional options. 14 SAS Statistics
Regression Click on the Y variable. Hold down the Ctrl key and click on the X variable. Or just select the menu item and complete the dialog box. Analyze > Fit SAS Statistics 15
Multivariate Analyses Select multiple variables by Ctrl clicking them or use the dialog box. Analyze > Multivariate. If you use the dialog, you can choose different Methods and Output options. Plots Select a Y followed by an X variable. Analyze > Line Plot Analyze > Scatter Plot 16 SAS Statistics
Plots continued Select a numeric variable Analyze > Box Plot Select Z, Y, and X variables in that order Analyze > Rotating Plot Edit> Windows > Tools If you select the hand tool, you can click in the plot area to rotate the graph in 3D. Select Z, Y, and X variables in that order Analyze > Contour Plot SAS Statistics 17
Query Right click on saved system file in library and select Query. Or Tools> Query and select a dataset and table from the dialog box. Then select columns for your query. You can change their labels and formats using the buttons in the center. Summary Functions: Sample Output Building Number AVG(age) ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ 1WP 159 4.5 Braxton 22 3.4 Brooks 1 5.4 Colson 57 5.3 Emoore 20 3.3 EvLib 108 4.7 GCS 5 4.1 Lair 72 4.9 Wise 11 3.4 18 SAS Statistics
Analyst Solutions > Analysis > Analyst Open a SAS data set then use the menu to select analyses. Menu Example: Statistics > Regression > Linear SAS Statistics 19