DATA REDUCTION: AN INTRODUCTION TO PRINCIPAL COMPONENTS ANALYSIS

Jennifer L. Caveny, M.S.
James F. Murray, Ph.D.
U.S. Quality Algorithms, Inc.

This paper is an introduction to the method of Principal Components (PC) Analysis and the SAS procedure PRINCOMP. First, we will give a quick overview of the method. The second section of the paper will introduce the SAS procedure and outline the minimum required coding. In the third section, we'll present an example. Finally, we'll demonstrate through the example some code which can be used to graph the principal components.

Section I. Introduction to Principal Components Analysis

PC Analysis has been around for nearly a hundred years. The method was introduced by Pearson around the turn of the century and further developed by Hotelling. It is one of many empirical approaches to data reduction. The main goal of PC Analysis is to reduce the variables in a data set to a minimum number of measurable dimensions while retaining the maximum information. These minimum dimensions are the principal components.

The idea of PC Analysis is to manipulate the original variables (by applying weights) into a new set of "composites" which contain as much of the information from the original data as possible. In order to retain the original relationships, these composites are linear combinations of the original variables. Each of the composites will explain a unique proportion of the variability in the original variables, so that the set of all composites explains all of the original variability.

PC Analysis is commonly used in the field of Psychology to create scoring algorithms for psychometric instruments. For example, patients respond to the questions on a standardized personality inventory. PC Analysis will collapse the original questions into a smaller number of composite scores (say 5 or 10) representing major personality characteristics. These scores would be produced by applying a set of weights to the original responses.
For any one characteristic, those questions which relate similar information are given large weights, and those questions which do not contribute any information are weighted less for that characteristic. PC Analysis is also beneficial with survey data, where it is used to produce scoring algorithms. The original variables can be replaced with the principal components in any analysis or summarization which was appropriate for the original data but not feasible because of the large number of variables.

We would like to choose each composite so that it accounts for as much of the original variability as possible. In other words, we want each composite to be highly correlated with the original variables. The degree of correlation between a composite and the original variables is directly related to the amount of variability contained within that single composite. Therefore, to maximize the variability of a composite is to maximize the correlation between that composite and the original variables. Thus, PC Analysis searches for those sets of weights which produce linear composites explaining the maximum possible variance.

Consider the set of example data plotted as x and y in Figure 1 below.

[Figure 1: scatterplot of the example data, y plotted against x]

We see that the data seem to fall in a somewhat elliptical pattern. Imagine a rotating diameter line through the center point of the ellipse and consider the variability in the data at each angle. Figure 2 shows that the dimension with the most variability falls along the major axis of the ellipse (line a). Assume this line explains 85% of the variance. We can account for the remaining 15% of the variability by constructing line b. These two lines could be considered the two principal components of the original data.

[Figure 2: the same scatterplot of y against x, with line a drawn along the major axis of the ellipse and line b accounting for the remaining variability]

Now we can see that with this set of data there are two possible linear combinations and therefore two sets of weights to be applied to the original variables. But how do we come up with these weights? Through the work of many famous (and not so famous) statisticians and researchers, it has been shown that the eigenvectors of the covariance matrix make a good choice for these weights. Because this paper is intended to be an introduction to the method through SAS, we will not go into the details of the theoretical derivation. We will only go so far as to state that the covariance matrix can be decomposed into latent vectors which can be used as the sets of weights to produce the principal components. In addition, associated with each latent vector is a latent root (or eigenvalue) which, taken as a share of the sum of all the latent roots, can be interpreted as the proportion of variability in the set of original variables that is explained by this linear combination.

Principal components are extracted in the order of the amount of variability explained. Therefore, the first component identified is that which explains the largest amount of variability. As the components are identified, each will explain the largest proportion of the remaining variance but less variance than the previous components.
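
The extraction described above can be summarized in symbols. This is the standard textbook formulation rather than notation taken from the paper: the covariance (or correlation) matrix S, the weight vectors w_k, and the latent roots lambda_k for p original variables are symbols introduced here purely for illustration.

```latex
\begin{aligned}
c_k &= \mathbf{w}_k^{\top}\mathbf{x}
    && \text{(the $k$th composite is a linear combination of the original variables)} \\
S\,\mathbf{w}_k &= \lambda_k\,\mathbf{w}_k, \qquad \mathbf{w}_k^{\top}\mathbf{w}_k = 1
    && \text{(the weights are unit-length eigenvectors of $S$)} \\
\operatorname{Var}(c_k) &= \lambda_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p
    && \text{(components are ordered by variance explained)} \\
\text{proportion explained by } c_k &= \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
\end{aligned}
```

When S is the correlation matrix, the latent roots sum to p, so each proportion reduces to lambda_k / p.
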
Also, because each component explains a unique piece of the original variability, all components will be uncorrelated with the previous components. Principal components can be produced until 100% of the variability in the original data is explained. The number of principal components extracted may be as large as the number of original variables. However, the first few composites usually explain a sufficient proportion of the variability from the original data.

Determining how many and which components to retain is the most difficult part of PC Analysis. The SAS System will compute as many principal components as required to explain 100% of the variability in the original variables. It is up to the user to determine how many of those components are practically useful and what "construct" each represents. The idea is to choose the smallest number of principal components which account for the largest amount of variability. A rule of thumb is to choose enough components to explain at least 80% of the variability. Scatterplots of the principal components are sometimes helpful in identifying the dimension each component represents.

Another aspect which should be considered in PC Analysis is the scaling of the original variables. When principal components are extracted from the covariance matrix, those variables with the largest variance will be given the highest weights. Remember that a variance is expressed in the squared units of its variable, so it is directly affected by the unit of measure. When the magnitudes of the variances are related to the units of measure, the principal component weights will also be a function of the units of measure. The common remedy is to extract principal components from the correlation matrix rather than the covariance matrix. Because correlations have no units and take on values in the range of -1 to 1, the effect of units has been removed. Another option is to transform all original variables to the same scale.
For example, all variables could be transformed to standard normal variates.

The major rationale for performing a PC analysis is to reduce the complexity of dealing with all of the original data. A key result of the PC analysis is to understand the important constructs represented within the original data.

Section II. SAS PROC PRINCOMP

The SAS System for Statistical Analysis provides several approaches to Principal Components Analysis. The following table identifies the available procedures along with the type of data needed and the analyses performed.

Procedure    Analysis         Type of Data
PRINCOMP     PC               continuous
FACTOR       Common Factor    continuous
             PC               continuous
PRINQUAL     PC               categorical
CORRESP      Weighted PC      categorical

PROC PRINCOMP provides a straightforward approach to PC Analysis and is the topic of this paper. There are six main statements for PROC PRINCOMP: the PROC call and the VAR, BY, FREQ, PARTIAL, and WEIGHT statements. Each of these statements will be discussed in greater detail.

The PROC call has three options that are used to specify data sets. First, the common DATA= option tells the system which data set holds the set of original variables. As with other procedures, if the DATA= option is not specified, the most recently created data set will be used. The input data set can be a regular SAS data set or output from many other SAS procedures, such as a correlation matrix, a covariance matrix, or a sums of squares and crossproducts matrix. There are two available options for requesting output data sets. The OUT= option produces a data set holding all of the original observations and variables plus the new principal components. The OUTSTAT= option creates a data set containing summary statistics. Both of these data sets will be described in more detail later.

The user can tailor his or her analysis with several available options in the PROC call. First, the N= option specifies the exact number of principal components to be computed. Remember, SAS will produce as many principal components as needed to explain all of the variability in the original variables. It is the researcher's responsibility to decide how many components are actually useful. It should be noted that the weightings and values of the component scores will not be affected by this option. Only the number of components output and the amount of variability explained will be limited.
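
The data set options and the N= option described above might be combined as in the following minimal sketch. The data set names MYDATA, SCORES, and STATS and the variables X1-X10 are hypothetical placeholders, not names used by the authors.

```sas
/* Sketch only: MYDATA, SCORES, STATS, and X1-X10 are hypothetical names. */
PROC PRINCOMP DATA = MYDATA      /* data set holding the original variables      */
              OUT = SCORES       /* original observations plus component scores  */
              OUTSTAT = STATS    /* summary statistics for the analysis          */
              N = 3;             /* compute only the first three components      */
   VAR X1-X10;                   /* numeric variables to be analyzed             */
RUN;
```

Dropping N=3 from this sketch would leave the first three components unchanged; SAS would simply compute and report the remaining ones as well.
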
If an output data set containing the principal component scores is to be saved and used in future analyses, the user may want to control the names of the variables holding the component scores. This can be accomplished through the PREFIX= option. The components will be assigned names beginning with the value of PREFIX and ending with the number of the component. As with all SAS variables, the name (PREFIX + number) cannot exceed eight characters. For example, if the user suspects that there will be no more than nine principal components, then the value of PREFIX can hold up to seven characters and the last character should be reserved for the component number. If more than nine components are expected, then at least two characters should be reserved for the component number.

Another useful option is NOINT, which forces SAS to use the correlation (or covariance) matrix without correction for the mean. This means that no intercept term is included in the model. The NOINT option is only used when prior work has convinced the analyst that the intercept should equal zero.

A similar option is VARDEF=, which allows the user to specify the denominator to be used in variance and standard deviation calculations. The possible values are DF, N, WEIGHT (or WGT), and WDF. The default value is DF, which indicates that the error degrees of freedom are to be used. The value N requests that the number of observations be used as the denominator. When relative weights are given to individual observations (through the WEIGHT statement), the divisor can be either the sum of the weights (VARDEF=WGT) or the sum of the weights minus one (VARDEF=WDF).

The COV (or COVARIANCE) option requests that the principal components be extracted from the covariance matrix (rather than the default correlation matrix).
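
As a hedged illustration of the naming, divisor, and matrix options just described, the sketch below requests scores named COMP1, COMP2, and so on, with variances computed using N as the divisor and extraction from the covariance matrix. The names MYDATA, SCORES, COMP, and X1-X10 are hypothetical.

```sas
/* Sketch only: MYDATA, SCORES, and X1-X10 are hypothetical names. */
PROC PRINCOMP DATA = MYDATA OUT = SCORES
              PREFIX = COMP      /* score variables become COMP1, COMP2, ...     */
              VARDEF = N         /* divide by the number of observations         */
              COV;               /* extract from the covariance matrix           */
   VAR X1-X10;
RUN;
```

With ten variables, the four-character prefix leaves room for the two-digit component number within the eight-character limit on variable names.
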
Recall that variables with large variances will be weighted more heavily when the covariance matrix is used. Thus, this option should not be used unless the original variables are all measured in like units or have equal variance.

The STD (or STANDARD) option requests that the resulting principal components be standardized to have unit variance. The decomposition method ensures that the eigenvectors of the correlation matrix will have unit length. This option causes the eigenvectors to be divided by the square roots of the eigenvalues to produce principal component scores which have unit variance.

Finally, the NOPRINT option can be used to suppress printed output. This option is useful when many PC analyses are being produced. It is usually used only when the OUT= or OUTSTAT= option is specified as an alternative means of receiving the results.

The next most common statement in PROC PRINCOMP is the VAR statement, which identifies the set of original variables. Because we are expecting SAS to perform numeric calculations, the variables listed in this statement must be numeric. Character or other nonnumeric types of variables will cause an error message. SAS does not limit the number of variables which can be specified; however, the number of observations in the data set will limit the number of variables that can be considered. The number of observations should always exceed the number of variables. While no hard rule exists, it is prudent to follow the guide of at least ten observations per variable. If the VAR statement is omitted, SAS will perform the PC Analysis on all numeric variables in the input data set.

As with all SAS procedures, a BY statement can be included to request a separate analysis for each unique value of the BY variables. The input data should be sorted in order of the BY variables. The output data sets will also include the BY variables.

The WEIGHT statement can be used to introduce relative weights for individual observations in the original set of variables. In some cases, the researcher may have prior knowledge that the reliability of certain variables differs greatly across observations and may wish to weight the observations differentially. For example, if the observations in the data set are estimated means, then weighting each observation by the inverse of its variance would provide optimal results. Another situation where weights may be useful is in the case of missing values. If missing values are replaced with mean or median values, those observations with a substitute value may be weighted less than observations with actual values.

Sometimes, the input data may be previously summarized. In other words, each observation in the input data set may represent multiple occurrences of that unique combination of values of the original variables. In this case, the FREQ statement can be used to identify the variable which holds the frequencies. Theoretically, this frequency variable should be an integer.
If noninteger values are included, SAS will only use the integer portion of the value. Likewise, SAS will exclude observations whose FREQ value is missing or less than one.

The user can request that the effect of certain variables be removed from the correlation matrix prior to the extraction of principal components. Those variables can be identified with the PARTIAL statement. As with the VAR statement, all variables named should be numeric. PRINCOMP uses the PARTIAL variables to predict the VAR variables and then computes residuals. In output data sets, these residuals are named with the characters "R_" prefixed to the first six characters of the VAR variable names. The principal components are extracted from the correlation matrix of these residuals.

Now that we have covered all of the statements and options, let's go back to the output data sets. First, the OUT= data set contains all of the original variables plus the new variables holding the principal component scores. These new variables will be named PRIN1, PRIN2, etc. if the PREFIX= option is not used to customize the names. If the N= option is specified, only that number of component score variables will be included in this data set. The number of observations in this data set will equal that of the input data set. It should be noted that an OUT= data set cannot be created if the input data set is a summary or statistic type of data set (i.e., TYPE = CORR, COV, EST, SSCP, etc.). If the PARTIAL statement is used, then this data set also contains the residuals in variables named by the convention discussed previously.

The OUTSTAT= data set contains summary statistics for each of the variables listed in the VAR statement. This data set contains variables with the same names as those used in the PC analysis (i.e., those listed in the VAR statement). However, in this data set those variables hold the values of the summary statistics, not the raw data.
Since the statistics are contained as observations, they can be identified with the _TYPE_ variable. The following table lists the available statistics.

Statistic                       _TYPE_
means                           MEAN
standard deviations             STD
number of observations          N
sum of weights                  SUMWGT
correlations                    CORR
covariances                     COV
eigenvalues                     EIGENVAL
weights (eigenvectors)          SCORE
uncorrected std deviations      USTD
uncorrected correlations        UCORR
uncorrected covariances         UCOV
uncorrected weights             USCORE

There will be one _TYPE_ = MEAN observation for each value of the BY variables, or one observation when there is no BY statement. These observations will be omitted if the PARTIAL statement is used.

The _TYPE_ = STD observations contain the standard deviations of the original variables. The OUTSTAT= data set will contain one of these observations for each value of the BY variables. When the PARTIAL statement is used, these observations hold the square root of the mean square error from the prediction of the original variables by the PARTIAL variables. These observations are excluded when the COV option is used.

The observations with _TYPE_ = N will contain the same value across all variables. Again, there will be one observation per BY value. When the PARTIAL statement and the default value of VARDEF (DF) are used, the degrees of freedom for the PARTIAL variables are removed from the number of observations.

The _TYPE_ = SUMWGT records will only be output when the sum of the weights differs from the number of observations. The values contained in these observations will be equal across all variables. If the PARTIAL statement is used along with the VARDEF=WDF option, then this value is decremented by the degrees of freedom of the PARTIAL variables.

There will be as many _TYPE_ = CORR observations in the OUTSTAT= data set as there are variables being analyzed. These observations contain the correlations between pairs of the original variables. The _NAME_ variable identifies the second variable of the pair. When the PARTIAL statement is used, the partial correlations are output instead of the raw correlations. These observations are excluded when the COV option is used.

Similarly, the _TYPE_ = COV observations are only included when the COV option is specified. The number of these observations is equal to the number of original variables. The partial covariances will be output when the PARTIAL statement is used.

The observations containing the eigenvalues are identified by _TYPE_ = EIGENVAL. There will be one for each value of the BY variables. The eigenvalues are found in the variables named after the original variables. However, it should be noted that this does not mean that an individual eigenvalue is directly related to any individual input variable.
The eigenvalues will be assigned to these variables based on the ordering of the variables in the VAR statement. The first eigenvalue can be found in the first variable listed, and so on. When the N= option is used to limit the number of principal components, only that number of eigenvalues will be output and the remaining variables will hold missing values.

There are as many observations with _TYPE_ = SCORE as there are principal components. Thus, if a certain number of components is requested, only that number of SCORE observations will be included in the OUTSTAT= data set. The corresponding principal component can be identified with the _NAME_ variable, which will hold the values assigned with the PREFIX= option or the default names.

The OUTSTAT= data set will have different TYPE values depending upon various options. The default value is CORR. When the components are extracted from the covariance matrix (i.e., the COV option is used), the resulting data set will have TYPE=COV. When the NOINT option is used, the output data set will be unadjusted, so the value of TYPE will be UCORR or UCOV depending upon the use of the COV option.

Armed with this knowledge of the options and statements available in PROC PRINCOMP, we will now proceed to an example.

Section III. Example

For illustration purposes, let's consider a contrived example. Suppose a leading health care plan develops a compensation system which will encourage providers to improve quality of care. Each provider will be measured on twelve variables.

1. hospital cost per member per month
2. specialist physician cost per member per month
3. emergency room cost per member per month
4. primary care physician encounters per member per year
5. laboratory tests per member per year
6. radiology encounters per member per year
7. proportion of members transferring to other primary care providers during the year

Average scores from survey questions where members rate their physicians on the following qualities:

8. "Ability to make appointments for checkups"
9. "Ability to make appointments for illness"
10. "Ability to contact doctor when office is closed"
11. "Ability to obtain referrals to specialists after evaluation by primary doctor"
12. "Response to an emergency call within 3 minutes"

It is suspected that these twelve measures will actually represent two or three constructs which will capture the quality of care these providers are furnishing. We want to identify these constructs and develop composite scores for each. Thus, we'll perform a PC analysis.

For each measure, the providers will be ranked and assigned scores on a common scale, where high numbers indicate better performance. The input data set will contain one observation per provider and twelve variables measuring the provider's relative performance in the designated areas. For simplicity's sake, we will refer to these scores as MEAS1 - MEAS12, where the number corresponds to the items listed above. The table below displays ten example observations.

[Table: ten example observations with columns ID and MEAS1 through MEAS12]

First, we will consider missing values. All twelve of the measures should contribute to a provider's scores on the final constructs; therefore, missing values will be replaced with the average value for that measure.

We want to perform a very simple PC analysis with no weights or partialling. The components will be extracted from the correlation matrix with an intercept term, and no standardization of scores will be requested. The following code produces this simple analysis.

   PROC PRINCOMP DATA = DOCS OUT = PCOUT;
      VAR MEAS1-MEAS12;
      TITLE 'PC Analysis on Twelve Quality Measures';
   RUN;

An example of SAS output which would be produced from this code is shown on the following pages. The printed output includes summary statistics (namely the mean, standard deviation and number of observations) for each of the twelve original variables.

The next section of the printed output is a display of the correlation matrix. All elements along the diagonal are equal to one. Each off-diagonal element tells the degree of the linear relationship between those two original variables. In our contrived example, MEAS1 and MEAS2 have a strong positive linear relationship with a correlation of .813. On the other hand, MEAS1 and MEAS12 do not appear to have a linear relationship since their correlation is very small.

The third section lists the eigenvalues associated with each principal component in order of the proportion of variability explained.
We see that SAS produced twelve principal components in our example. The first component explains about 41% of the variability in the set of original variables. The second and third components add an additional 21% and 20% respectively. Thus the first three principal components account for approximately 82% of the variance of the original variables.

The final section of the printed output displays the eigenvectors of the correlation matrix. These vectors of constants are used as weights to create the component scores. For example, let's consider the first component. From the first eigenvector (listed under the column labeled PRIN1 on the output), we can compute a value for the first principal component for each observation in the original data set through the following equation, in which w1 through w12 stand for the twelve weights in the PRIN1 column.

   PRIN1 = w1 * MEAS1 + w2 * MEAS2 + ... + w12 * MEAS12

However, because we did request an OUT= data set, these scores were computed for each principal component and saved in the PCOUT data set as PRIN1 - PRIN12.
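
The weighted sum above can also be reproduced outside PROC PRINCOMP. The following is a minimal sketch, assuming the DOCS data set and MEAS1-MEAS12 from the example; the data set names PCSTAT and RESCORED are hypothetical. It saves the eigenvectors in an OUTSTAT= data set and applies them with PROC SCORE. Note that when the components come from the correlation matrix, as here, the scores are computed from standardized variables, so PROC SCORE uses the MEAN and STD observations in PCSTAT before applying the weights.

```sas
/* Sketch only: PCSTAT and RESCORED are hypothetical data set names. */

/* Save the means, standard deviations, and eigenvectors. */
PROC PRINCOMP DATA = DOCS OUTSTAT = PCSTAT NOPRINT;
   VAR MEAS1-MEAS12;
RUN;

/* Apply the _TYPE_ = 'SCORE' observations as weights.            */
/* The new variables take their names (PRIN1, PRIN2, ...) from    */
/* the _NAME_ variable in PCSTAT.                                  */
PROC SCORE DATA = DOCS SCORE = PCSTAT OUT = RESCORED;
   VAR MEAS1-MEAS12;
RUN;
```

This approach is mainly useful when the weights from one set of providers need to be applied to a different data set with the same variables.
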

From these weights, we see that the original variables MEAS7 through MEAS12 contribute the most to this component while all of the other original variables have lower weights. Consider what these six original variables are measuring: the rate of member transfers and the five survey responses. This first principal component appears to be a construct of member satisfaction.

Let's consider the second principal component. From the second eigenvector (column labeled PRIN2) we see that MEAS1, MEAS2 and MEAS3 all have weights around .5, which are the heaviest weights for this component. These three variables measure the per member per month costs which a provider incurred during the year for hospital, specialist and emergency room use. This principal component might be representing resource utilization.

Now consider the third component and its weightings. As we've seen before, three of the original variables (MEAS3, MEAS4 and MEAS5) have the highest weightings for this component. These three variables measure a physician's use of network services. We consider this component to be the dimension of access to available services.

None of the nine remaining principal components uniquely explains any significant amount of the variability in the original data. Recall, the goal of PC Analysis is to reduce the number of original variables while retaining as much of the original information as possible. We have accounted for more than 80% of the variance in the twelve measures with three principal components.

MODIFIED SAS OUTPUT

[SAS output, Principal Component Analysis: Simple Statistics (Mean and StD for MEAS1 through MEAS12), followed by the Correlation Matrix of MEAS1 through MEAS12]

[SAS output, Principal Component Analysis, continued: Correlation Matrix (remaining rows); Eigenvalues of the Correlation Matrix (Eigenvalue, Difference, Proportion, Cumulative for PRIN1 through PRIN12); Eigenvectors (columns PRIN1 through PRIN12, rows MEAS1 through MEAS12)]
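
The Proportion and Cumulative columns of the printed eigenvalue table can also be rebuilt from an OUTSTAT= data set, which is convenient when applying the 80% rule of thumb across many analyses. The sketch below assumes the DOCS data set and MEAS1-MEAS12 from the example and that N= is not used, so all twelve eigenvalues are present; PCSTAT and EIGEN are hypothetical data set names.

```sas
/* Sketch only: PCSTAT and EIGEN are hypothetical data set names. */
PROC PRINCOMP DATA = DOCS OUTSTAT = PCSTAT NOPRINT;
   VAR MEAS1-MEAS12;
RUN;

DATA EIGEN (KEEP = COMPONENT EIGENVAL PROPORTION CUMULATIVE);
   SET PCSTAT;
   WHERE _TYPE_ = 'EIGENVAL';        /* the single eigenvalue observation      */
   ARRAY EV {*} MEAS1-MEAS12;        /* eigenvalues are stored in VAR order    */
   TOTAL = SUM(OF EV{*});            /* total variance; assumes N= is not used */
   CUMULATIVE = 0;
   DO COMPONENT = 1 TO DIM(EV);
      EIGENVAL   = EV{COMPONENT};
      PROPORTION = EIGENVAL / TOTAL;
      CUMULATIVE = CUMULATIVE + PROPORTION;
      OUTPUT;                        /* one row per principal component        */
   END;
RUN;

PROC PRINT DATA = EIGEN NOOBS;
RUN;
```

The first row in which CUMULATIVE reaches 0.80 gives the number of components suggested by the rule of thumb; the analyst still has to judge whether those components are interpretable.
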

Section IV. Graphing Principal Components

Now suppose we would like to evaluate the fictitious providers from the contrived example based on their scores on the three principal components. Scatterplots of the components help to visualize how a provider is doing in all three areas. The following code computes confidence intervals around the components and creates a scatter plot of the first two components.

   PROC UNIVARIATE DATA = PCOUT NOPRINT;
      VAR PRIN1 PRIN2 PRIN3;
      OUTPUT OUT = STATS MEAN = MEAN1 MEAN2 MEAN3
                         STD = STD1 STD2 STD3;
   RUN;

   DATA PCOUT;
      IF _N_ EQ 1 THEN SET STATS;
      SET PCOUT;
      * set up variables for plot annotation *;
      LENGTH FUNCTION STYLE $8 TEXT $4 POSITION XSYS YSYS HSYS $1;
      RETAIN MEAN1 MEAN2 MEAN3 STD1 STD2 STD3
             SIZE 2 FUNCTION 'label' POSITION '5'
             XSYS '2' YSYS '2' HSYS '2' STYLE 'zapf';
      * annotation coordinates for the first plot (assumed here since the Annotate facility needs X and Y) *;
      X = PRIN1;
      Y = PRIN2;
      * choose two providers to highlight *;
      IF ID IN ('1367', '1394') THEN TEXT = ID;
      ELSE TEXT = '*';
      * compute 95% confidence bounds, taken here as mean +/- 1.96 standard deviations *;
      ARRAY L {*} L1 L2 L3;
      ARRAY U {*} U1 U2 U3;
      ARRAY M {*} MEAN1 MEAN2 MEAN3;
      ARRAY S {*} STD1 STD2 STD3;
      DO I = 1 TO 3;
         L{I} = M{I} - 1.96 * S{I};
         U{I} = M{I} + 1.96 * S{I};
      END;
   RUN;

   GOPTIONS RESET = ALL DEVICE = HPLJ3SI;

   PROC GPLOT DATA = PCOUT;
      AXIS1 WIDTH = 3 LABEL = (FONT=ZAPF 'COMPONENT 1');
      AXIS2 WIDTH = 3 LABEL = (FONT=ZAPF 'COMPONENT 2');
      SYMBOL1 VALUE = NONE;
      SYMBOL2 VALUE = POINT I = JOIN;
      SYMBOL3 VALUE = POINT I = JOIN;
      SYMBOL4 VALUE = POINT I = JOIN;
      SYMBOL5 VALUE = POINT I = JOIN;
      PLOT PRIN2*PRIN1 PRIN2*L1 PRIN2*U1 L2*PRIN1 U2*PRIN1 /
           ANNOTATE = PCOUT OVERLAY FRAME
           HAXIS = AXIS1 VAXIS = AXIS2;
      TITLE FONT = ZAPF JUSTIFY = CENTER 'FIRST TWO PRINCIPAL COMPONENTS';
   RUN;
   QUIT;

The graphs are shown on the following page. The four lines on each graph represent the 95% confidence bounds for the two components being graphed. This is a simplistic approach, but it does give us an idea of relative performance. Recall, the original variables were all measured on a scale where higher numbers represent better performance.
Therefore, the components will be interpreted in the same manner. The numbers on the graphs identify fictitious providers who have been highlighted solely for illustration purposes. In a real-life situation, points would be identified based on some predefined criteria. For example, an analyst may want to identify all subjects falling outside a certain confidence band.

First, consider the first of the two highlighted providers. From the first two graphs we see that this provider's score falls outside of the lower confidence bound on the first principal component. In fact, he has the lowest score for this component. However, from the third graph we see that his scores on the second and third principal components are not significantly different from the overall mean scores. From this analysis, we learn that this provider could stand to improve upon member satisfaction.

Now consider the second highlighted provider. From the graphs we see that this provider performs well in all three areas. In addition, he seems to be outstanding in the area of utilization.

These three components will be useful in evaluating the quality of care each physician provides. In this example, PC Analysis proved effective in reducing the number of measures from twelve to three.
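
The idea of identifying all subjects that fall outside a confidence band can be sketched in a short DATA step. This is an illustration rather than part of the original analysis: it assumes PCOUT already holds the bounds L1-L3 and U1-U3 created by the Section IV code, and the data set name FLAGGED and the variables FLAG1-FLAG3 are hypothetical.

```sas
/* Sketch only: FLAGGED and FLAG1-FLAG3 are hypothetical names.         */
/* Assumes PCOUT contains PRIN1-PRIN3 plus the bounds L1-L3 and U1-U3.  */
DATA FLAGGED;
   SET PCOUT;
   ARRAY P {*} PRIN1-PRIN3;
   ARRAY L {*} L1-L3;
   ARRAY U {*} U1-U3;
   ARRAY FLAG {3} $ 4 FLAG1-FLAG3;
   DO I = 1 TO 3;
      IF P{I} < L{I} THEN FLAG{I} = 'LOW';        /* below the lower bound */
      ELSE IF P{I} > U{I} THEN FLAG{I} = 'HIGH';  /* above the upper bound */
      ELSE FLAG{I} = ' ';
   END;
   /* keep only providers outside the band on at least one component */
   IF FLAG1 NE ' ' OR FLAG2 NE ' ' OR FLAG3 NE ' ';
   DROP I;
RUN;

PROC PRINT DATA = FLAGGED NOOBS;
   VAR ID PRIN1-PRIN3 FLAG1-FLAG3;
RUN;
```

The resulting IDs could replace the two hard-coded values in the annotation step so that the highlighted points follow a predefined criterion.
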

[Figures: three scatterplots, each with four lines marking the 95% confidence bounds: "First Two Principal Components", "Second and Third Principal Components", and "First and Third Principal Components"]

SAS is a registered trademark of SAS Institute, Inc., Cary, NC.


More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010 THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

Basics of Multivariate Modelling and Data Analysis

Basics of Multivariate Modelling and Data Analysis Basics of Multivariate Modelling and Data Analysis Kurt-Erik Häggblom 9. Linear regression with latent variables 9.1 Principal component regression (PCR) 9.2 Partial least-squares regression (PLS) [ mostly

More information

Chapter 3 Analyzing Normal Quantitative Data

Chapter 3 Analyzing Normal Quantitative Data Chapter 3 Analyzing Normal Quantitative Data Introduction: In chapters 1 and 2, we focused on analyzing categorical data and exploring relationships between categorical data sets. We will now be doing

More information

Multiple Regression White paper

Multiple Regression White paper +44 (0) 333 666 7366 Multiple Regression White paper A tool to determine the impact in analysing the effectiveness of advertising spend. Multiple Regression In order to establish if the advertising mechanisms

More information

The EMCLUS Procedure. The EMCLUS Procedure

The EMCLUS Procedure. The EMCLUS Procedure The EMCLUS Procedure Overview Procedure Syntax PROC EMCLUS Statement VAR Statement INITCLUS Statement Output from PROC EMCLUS EXAMPLES-SECTION Example 1: Syntax for PROC FASTCLUS Example 2: Use of the

More information

SAS/STAT 13.1 User s Guide. The SCORE Procedure

SAS/STAT 13.1 User s Guide. The SCORE Procedure SAS/STAT 13.1 User s Guide The SCORE Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

Chapter 1. Using the Cluster Analysis. Background Information

Chapter 1. Using the Cluster Analysis. Background Information Chapter 1 Using the Cluster Analysis Background Information Cluster analysis is the name of a multivariate technique used to identify similar characteristics in a group of observations. In cluster analysis,

More information

Chapter 13 Multivariate Techniques. Chapter Table of Contents

Chapter 13 Multivariate Techniques. Chapter Table of Contents Chapter 13 Multivariate Techniques Chapter Table of Contents Introduction...279 Principal Components Analysis...280 Canonical Correlation...289 References...298 278 Chapter 13. Multivariate Techniques

More information

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann Forrest W. Young & Carla M. Bann THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA CB 3270 DAVIE HALL, CHAPEL HILL N.C., USA 27599-3270 VISUAL STATISTICS PROJECT WWW.VISUALSTATS.ORG

More information

CREATING THE ANALYSIS

CREATING THE ANALYSIS Chapter 14 Multiple Regression Chapter Table of Contents CREATING THE ANALYSIS...214 ModelInformation...217 SummaryofFit...217 AnalysisofVariance...217 TypeIIITests...218 ParameterEstimates...218 Residuals-by-PredictedPlot...219

More information

Multivariate Normal Random Numbers

Multivariate Normal Random Numbers Multivariate Normal Random Numbers Revised: 10/11/2017 Summary... 1 Data Input... 3 Analysis Options... 4 Analysis Summary... 5 Matrix Plot... 6 Save Results... 8 Calculations... 9 Summary This procedure

More information

SAS/STAT 14.2 User s Guide. The SIMNORMAL Procedure

SAS/STAT 14.2 User s Guide. The SIMNORMAL Procedure SAS/STAT 14.2 User s Guide The SIMNORMAL Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

Chapter 1 Introduction. Chapter Contents

Chapter 1 Introduction. Chapter Contents Chapter 1 Introduction Chapter Contents OVERVIEW OF SAS/STAT SOFTWARE................... 17 ABOUT THIS BOOK.............................. 17 Chapter Organization............................. 17 Typographical

More information

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics.

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics. 2018 by Minitab Inc. All rights reserved. Minitab, SPM, SPM Salford

More information

Linear Methods for Regression and Shrinkage Methods

Linear Methods for Regression and Shrinkage Methods Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

SAS/STAT 13.1 User s Guide. The NESTED Procedure

SAS/STAT 13.1 User s Guide. The NESTED Procedure SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

More information

MAT 110 WORKSHOP. Updated Fall 2018

MAT 110 WORKSHOP. Updated Fall 2018 MAT 110 WORKSHOP Updated Fall 2018 UNIT 3: STATISTICS Introduction Choosing a Sample Simple Random Sample: a set of individuals from the population chosen in a way that every individual has an equal chance

More information

Year 8 Review 1, Set 1 Number confidence (Four operations, place value, common indices and estimation)

Year 8 Review 1, Set 1 Number confidence (Four operations, place value, common indices and estimation) Year 8 Review 1, Set 1 Number confidence (Four operations, place value, common indices and estimation) Place value Digit Integer Negative number Difference, Minus, Less Operation Multiply, Multiplication,

More information

Section 4 General Factorial Tutorials

Section 4 General Factorial Tutorials Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One

More information

Error Analysis, Statistics and Graphing

Error Analysis, Statistics and Graphing Error Analysis, Statistics and Graphing This semester, most of labs we require us to calculate a numerical answer based on the data we obtain. A hard question to answer in most cases is how good is your

More information

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Graphical Analysis of Data using Microsoft Excel [2016 Version] Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Year 10 General Mathematics Unit 2

Year 10 General Mathematics Unit 2 Year 11 General Maths Year 10 General Mathematics Unit 2 Bivariate Data Chapter 4 Chapter Four 1 st Edition 2 nd Edition 2013 4A 1, 2, 3, 4, 6, 7, 8, 9, 10, 11 1, 2, 3, 4, 6, 7, 8, 9, 10, 11 2F (FM) 1,

More information

Chapter Two: Descriptive Methods 1/50

Chapter Two: Descriptive Methods 1/50 Chapter Two: Descriptive Methods 1/50 2.1 Introduction 2/50 2.1 Introduction We previously said that descriptive statistics is made up of various techniques used to summarize the information contained

More information

Statistics, Data Analysis & Econometrics

Statistics, Data Analysis & Econometrics ST009 PROC MI as the Basis for a Macro for the Study of Patterns of Missing Data Carl E. Pierchala, National Highway Traffic Safety Administration, Washington ABSTRACT The study of missing data patterns

More information

Statistics 1 - Basic Commands. Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11}

Statistics 1 - Basic Commands. Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11} Statistics 1 - Basic Commands http://mathbits.com/mathbits/tisection/statistics1/basiccommands.htm Page 1 of 3 Entering Data: Basic Commands Consider the data set: {15, 22, 32, 31, 52, 41, 11} Data is

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

IBM SPSS Categories 23

IBM SPSS Categories 23 IBM SPSS Categories 23 Note Before using this information and the product it supports, read the information in Notices on page 55. Product Information This edition applies to version 23, release 0, modification

More information

The NESTED Procedure (Chapter)

The NESTED Procedure (Chapter) SAS/STAT 9.3 User s Guide The NESTED Procedure (Chapter) SAS Documentation This document is an individual chapter from SAS/STAT 9.3 User s Guide. The correct bibliographic citation for the complete manual

More information

StatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved.

StatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved. StatCalc User Manual Version 9 for Mac and Windows Copyright 2018, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Introduction... 4 Getting Help... 4 Uninstalling StatCalc...

More information

Effective Forecast Visualization With SAS/GRAPH Samuel T. Croker, Lexington, SC

Effective Forecast Visualization With SAS/GRAPH Samuel T. Croker, Lexington, SC DP01 Effective Forecast Visualization With SAS/GRAPH Samuel T. Croker, Lexington, SC ABSTRACT A statistical forecast is useless without sharp, attractive and informative graphics to present it. It is really

More information

The SIMNORMAL Procedure (Chapter)

The SIMNORMAL Procedure (Chapter) SAS/STAT 12.1 User s Guide The SIMNORMAL Procedure (Chapter) SAS Documentation This document is an individual chapter from SAS/STAT 12.1 User s Guide. The correct bibliographic citation for the complete

More information

The DMINE Procedure. The DMINE Procedure

The DMINE Procedure. The DMINE Procedure The DMINE Procedure The DMINE Procedure Overview Procedure Syntax PROC DMINE Statement FREQ Statement TARGET Statement VARIABLES Statement WEIGHT Statement Details Examples Example 1: Modeling a Continuous

More information

Suggested Foundation Topics for Paper 2

Suggested Foundation Topics for Paper 2 Suggested Foundation Topics for Paper 2 Number N a N b N b N c N d Add, subtract, multiply and divide any positive and negative integers Order decimals and integers Order rational numbers Use the concepts

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

Chapter 52 The PRINCOMP Procedure. Chapter Table of Contents

Chapter 52 The PRINCOMP Procedure. Chapter Table of Contents Chapter 52 The PRINCOMP Procedure Chapter Table of Contents OVERVIEW...2737 GETTING STARTED...2738 SYNTAX...2743 PROC PRINCOMP Statement......2744 BYStatement...2746 FREQStatement...2747 PARTIALStatement...2747

More information

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95 A Statistical Analysis Macro Library in SAS Carl R. Haske, Ph.D., STATPROBE, nc., Ann Arbor, M Vivienne Ward, M.S., STATPROBE, nc., Ann Arbor, M ABSTRACT Statistical analysis plays a major role in pharmaceutical

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data EXERCISE Using Excel for Graphical Analysis of Data Introduction In several upcoming experiments, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Math 227 EXCEL / MEGASTAT Guide

Math 227 EXCEL / MEGASTAT Guide Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf

More information

For our example, we will look at the following factors and factor levels.

For our example, we will look at the following factors and factor levels. In order to review the calculations that are used to generate the Analysis of Variance, we will use the statapult example. By adjusting various settings on the statapult, you are able to throw the ball

More information

Descriptive Statistics Descriptive statistics & pictorial representations of experimental data.

Descriptive Statistics Descriptive statistics & pictorial representations of experimental data. Psychology 312: Lecture 7 Descriptive Statistics Slide #1 Descriptive Statistics Descriptive statistics & pictorial representations of experimental data. In this lecture we will discuss descriptive statistics.

More information

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by Paper CC-016 A macro for nearest neighbor Lung-Chang Chien, University of North Carolina at Chapel Hill, Chapel Hill, NC Mark Weaver, Family Health International, Research Triangle Park, NC ABSTRACT SAS

More information

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses

More information

Statistics Case Study 2000 M. J. Clancy and M. C. Linn

Statistics Case Study 2000 M. J. Clancy and M. C. Linn Statistics Case Study 2000 M. J. Clancy and M. C. Linn Problem Write and test functions to compute the following statistics for a nonempty list of numeric values: The mean, or average value, is computed

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,

More information

1. What specialist uses information obtained from bones to help police solve crimes?

1. What specialist uses information obtained from bones to help police solve crimes? Mathematics: Modeling Our World Unit 4: PREDICTION HANDOUT VIDEO VIEWING GUIDE H4.1 1. What specialist uses information obtained from bones to help police solve crimes? 2.What are some things that can

More information

Data Annotations in Clinical Trial Graphs Sudhir Singh, i3 Statprobe, Cary, NC

Data Annotations in Clinical Trial Graphs Sudhir Singh, i3 Statprobe, Cary, NC PharmaSUG2010 - Paper TT16 Data Annotations in Clinical Trial Graphs Sudhir Singh, i3 Statprobe, Cary, NC ABSTRACT Graphical representation of clinical data is used for concise visual presentations of

More information

Year 8 Set 2 : Unit 1 : Number 1

Year 8 Set 2 : Unit 1 : Number 1 Year 8 Set 2 : Unit 1 : Number 1 Learning Objectives: Level 5 I can order positive and negative numbers I know the meaning of the following words: multiple, factor, LCM, HCF, prime, square, square root,

More information

It s Not All Relative: SAS/Graph Annotate Coordinate Systems

It s Not All Relative: SAS/Graph Annotate Coordinate Systems Paper TU05 It s Not All Relative: SAS/Graph Annotate Coordinate Systems Rick Edwards, PPD Inc, Wilmington, NC ABSTRACT This paper discusses the SAS/Graph Annotation coordinate systems and how a combination

More information

1. Determine the population mean of x denoted m x. Ans. 10 from bottom bell curve.

1. Determine the population mean of x denoted m x. Ans. 10 from bottom bell curve. 6. Using the regression line, determine a predicted value of y for x = 25. Does it look as though this prediction is a good one? Ans. The regression line at x = 25 is at height y = 45. This is right at

More information

Statistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010

Statistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010 Statistical Models for Management Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon February 24 26, 2010 Graeme Hutcheson, University of Manchester Principal Component and Factor Analysis

More information

Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11}

Basic Commands. Consider the data set: {15, 22, 32, 31, 52, 41, 11} Entering Data: Basic Commands Consider the data set: {15, 22, 32, 31, 52, 41, 11} Data is stored in Lists on the calculator. Locate and press the STAT button on the calculator. Choose EDIT. The calculator

More information

1. Descriptive Statistics

1. Descriptive Statistics 1.1 Descriptive statistics 1. Descriptive Statistics A Data management Before starting any statistics analysis with a graphics calculator, you need to enter the data. We will illustrate the process by

More information

DESIGN OF EXPERIMENTS and ROBUST DESIGN

DESIGN OF EXPERIMENTS and ROBUST DESIGN DESIGN OF EXPERIMENTS and ROBUST DESIGN Problems in design and production environments often require experiments to find a solution. Design of experiments are a collection of statistical methods that,

More information

INTRODUCTION TO THE SAS ANNOTATE FACILITY

INTRODUCTION TO THE SAS ANNOTATE FACILITY Improving Your Graphics Using SAS/GRAPH Annotate Facility David J. Pasta, Ovation Research Group, San Francisco, CA David Mink, Ovation Research Group, San Francisco, CA ABSTRACT Have you ever created

More information

Example Using Missing Data 1

Example Using Missing Data 1 Ronald H. Heck and Lynn N. Tabata 1 Example Using Missing Data 1 Creating the Missing Data Variable (Miss) Here is a data set (achieve subset MANOVAmiss.sav) with the actual missing data on the outcomes.

More information

General Instructions. Questions

General Instructions. Questions CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

More information

Integrated Mathematics I Performance Level Descriptors

Integrated Mathematics I Performance Level Descriptors Limited A student performing at the Limited Level demonstrates a minimal command of Ohio s Learning Standards for Integrated Mathematics I. A student at this level has an emerging ability to demonstrate

More information

Figure 1. Paper Ring Charts. David Corliss, Marketing Associates, Bloomfield Hills, MI

Figure 1. Paper Ring Charts. David Corliss, Marketing Associates, Bloomfield Hills, MI Paper 16828 Ring Charts David Corliss, Marketing Associates, Bloomfield Hills, MI Abstract Ring Charts are presented as a new, graphical technique for analyzing complex relationships between tables in

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

In-Database Procedures with Teradata: How They Work and What They Buy You David Shamlin and David Duling, SAS Institute, Cary, NC

In-Database Procedures with Teradata: How They Work and What They Buy You David Shamlin and David Duling, SAS Institute, Cary, NC Paper 337-2009 In-Database Procedures with Teradata: How They Work and What They Buy You David Shamlin and David Duling, SAS Institute, Cary, NC ABSTRACT SAS applications are often built to work with large

More information

1-2 9 Measures and accuracy

1-2 9 Measures and accuracy Year Term Week Chapter Ref Lesson 9.1 Estimation and approximation Year 2 m Autumn Term 1-2 9 Measures and accuracy 3-4 (Number) 9.2 Calculator methods 9.3 Measures and accuracy Assessment 9 10.1 Solving

More information

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM

1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1. NUMBER SYSTEMS USED IN COMPUTING: THE BINARY NUMBER SYSTEM 1.1 Introduction Given that digital logic and memory devices are based on two electrical states (on and off), it is natural to use a number

More information

STEPHEN WOLFRAM MATHEMATICADO. Fourth Edition WOLFRAM MEDIA CAMBRIDGE UNIVERSITY PRESS

STEPHEN WOLFRAM MATHEMATICADO. Fourth Edition WOLFRAM MEDIA CAMBRIDGE UNIVERSITY PRESS STEPHEN WOLFRAM MATHEMATICADO OO Fourth Edition WOLFRAM MEDIA CAMBRIDGE UNIVERSITY PRESS Table of Contents XXI a section new for Version 3 a section new for Version 4 a section substantially modified for

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

More information

Creating Maps in SAS/GRAPH

Creating Maps in SAS/GRAPH Creating Maps in SAS/GRAPH By Jeffery D. Gilbert, Trilogy Consulting Corporation, Kalamazoo, MI Abstract This paper will give an introduction to creating graphs using the PROC GMAP procedure in SAS/GRAPH.

More information

Edexcel Linear GCSE Higher Checklist

Edexcel Linear GCSE Higher Checklist Number Add, subtract, multiply and divide whole numbers integers and decimals Multiply and divide fractions Order integers and decimals Order rational numbers Use the concepts and vocabulary of factor

More information

Chapter 1 Histograms, Scatterplots, and Graphs of Functions

Chapter 1 Histograms, Scatterplots, and Graphs of Functions Chapter 1 Histograms, Scatterplots, and Graphs of Functions 1.1 Using Lists for Data Entry To enter data into the calculator you use the statistics menu. You can store data into lists labeled L1 through

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using

More information

Generalized Additive Model

Generalized Additive Model Generalized Additive Model by Huimin Liu Department of Mathematics and Statistics University of Minnesota Duluth, Duluth, MN 55812 December 2008 Table of Contents Abstract... 2 Chapter 1 Introduction 1.1

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

Exam Review: Ch. 1-3 Answer Section

Exam Review: Ch. 1-3 Answer Section Exam Review: Ch. 1-3 Answer Section MDM 4U0 MULTIPLE CHOICE 1. ANS: A Section 1.6 2. ANS: A Section 1.6 3. ANS: A Section 1.7 4. ANS: A Section 1.7 5. ANS: C Section 2.3 6. ANS: B Section 2.3 7. ANS: D

More information

General Program Description

General Program Description General Program Description This program is designed to interpret the results of a sampling inspection, for the purpose of judging compliance with chosen limits. It may also be used to identify outlying

More information

SHAPE, SPACE & MEASURE

SHAPE, SPACE & MEASURE STAGE 1 Know the place value headings up to millions Recall primes to 19 Know the first 12 square numbers Know the Roman numerals I, V, X, L, C, D, M Know the % symbol Know percentage and decimal equivalents

More information

CREATING THE DISTRIBUTION ANALYSIS

CREATING THE DISTRIBUTION ANALYSIS Chapter 12 Examining Distributions Chapter Table of Contents CREATING THE DISTRIBUTION ANALYSIS...176 BoxPlot...178 Histogram...180 Moments and Quantiles Tables...... 183 ADDING DENSITY ESTIMATES...184

More information

Stage 1 (intervention) Stage 2 Stage 3 Stage 4. Advanced 7-8. Secure 4-6

Stage 1 (intervention) Stage 2 Stage 3 Stage 4. Advanced 7-8. Secure 4-6 Stage 1 (intervention) Stage 2 Stage 3 Stage 4 YEAR 7 LAT Grade Emerging (not secondary read) 1-3 Secure 4-6 Advanced 7-8 Advanced + 9 YEAR 8 1 Emerging 2-3 Secure 4-6 Advanced 7-9 Autumn 1 Place Value

More information

Facial Expression Detection Using Implemented (PCA) Algorithm

Facial Expression Detection Using Implemented (PCA) Algorithm Facial Expression Detection Using Implemented (PCA) Algorithm Dileep Gautam (M.Tech Cse) Iftm University Moradabad Up India Abstract: Facial expression plays very important role in the communication with

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

EXST SAS Lab Lab #6: More DATA STEP tasks

EXST SAS Lab Lab #6: More DATA STEP tasks EXST SAS Lab Lab #6: More DATA STEP tasks Objectives 1. Working from an current folder 2. Naming the HTML output data file 3. Dealing with multiple observations on an input line 4. Creating two SAS work

More information

PCOMP http://127.0.0.1:55825/help/topic/com.rsi.idl.doc.core/pcomp... IDL API Reference Guides > IDL Reference Guide > Part I: IDL Command Reference > Routines: P PCOMP Syntax Return Value Arguments Keywords

More information

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value.

( ) = Y ˆ. Calibration Definition A model is calibrated if its predictions are right on average: ave(response Predicted value) = Predicted value. Calibration OVERVIEW... 2 INTRODUCTION... 2 CALIBRATION... 3 ANOTHER REASON FOR CALIBRATION... 4 CHECKING THE CALIBRATION OF A REGRESSION... 5 CALIBRATION IN SIMPLE REGRESSION (DISPLAY.JMP)... 5 TESTING

More information

SAS Training Spring 2006

SAS Training Spring 2006 SAS Training Spring 2006 Coxe/Maner/Aiken Introduction to SAS: This is what SAS looks like when you first open it: There is a Log window on top; this will let you know what SAS is doing and if SAS encountered

More information

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Online Learning Centre Technology Step-by-Step - Minitab Minitab is a statistical software application originally created

More information

Creating Forest Plots Using SAS/GRAPH and the Annotate Facility

Creating Forest Plots Using SAS/GRAPH and the Annotate Facility PharmaSUG2011 Paper TT12 Creating Forest Plots Using SAS/GRAPH and the Annotate Facility Amanda Tweed, Millennium: The Takeda Oncology Company, Cambridge, MA ABSTRACT Forest plots have become common in

More information

SAS Graphics Macros for Latent Class Analysis Users Guide

SAS Graphics Macros for Latent Class Analysis Users Guide SAS Graphics Macros for Latent Class Analysis Users Guide Version 2.0.1 John Dziak The Methodology Center Stephanie Lanza The Methodology Center Copyright 2015, Penn State. All rights reserved. Please

More information

BODMAS and Standard Form. Integers. Understand and use coordinates. Decimals. Introduction to algebra, linear equations

BODMAS and Standard Form. Integers. Understand and use coordinates. Decimals. Introduction to algebra, linear equations HIGHER REVISION LIST FOUNDATION REVISION LIST Topic Objectives Topic Objectives BODMAS and Standard Form * Add, subtract, multiply and divide whole numbers, integers and decimals * Order integers and decimals

More information