Loglinear Models for Categorical Data. Michael Friendly

Size: px
Start display at page:

Download "Loglinear Models for Categorical Data. Michael Friendly"

Transcription

1 Admit?: Yes Sex: Male Admit?: No Right Eye Grade High 2 3 Unaided distant vision data Brown Hazel Green Blue Sex: Female Low High 2 3 Left Eye Grade Low Black Brown Red Blond Michael Friendly York University, <friendly@yorku.ca> SPIDA 2004

2 outline Lecture Outline Introduction Graphical methods for Categorical Data Loglinear models: Prelimaries Overview of loglinear models Visualizing contingency tables and loglinear models 2 2 tables Larger tables n-way tables Structured tables Ordinal variables Square tables SPIDA Michael Friendly

3 outline Resources SAS macro programs, from SAS System for Statistical Graphics (Friendly, 1991): SAS macro programs, from Visualizing Categorical Data (Friendly, 2000): Workshop notes: SCS Short Course notes, Categorical Data Analysis with Graphics SPIDA Michael Friendly

4 overview2 Graphical Methods for Categorical Data If I can t picture it, I can t understand it. Albert Einstein Getting information from a table is like extracting sunlight from a cucumber. Farquhar & Farquhar, 1891 Tables vs. Graphs Tables are best suited for look-up read off exact numbers Graphs are better for showing patterns, trends, anomalies, making comparisons Visual presentation as communication: what do you want to say? SPIDA Michael Friendly

5 overview2 Visual metaphors Quantitative data: magnitude position along an axis Frequency data: count area (Friendly, 1995) Model: (DeptGender)(Admit) Admit?: Yes Sex: Male Admit?: No A B C D E F Sex: Female Admitted Male Rejected Female Fourfold display for 2 2 table Mosaic plot for 3-way table SPIDA Michael Friendly

6 Auto data: PC2/1 order Displa Gratio Hroom Length MPG Price Rep77 Rep78 Rseat Trunk Turn Weight Gratio MPG Rep78 Rep77 Price Hroom Trunk Rseat Length Weight Displa Turn Turn Displa Weight Length Rseat Trunk Hroom Price Rep77 Rep78 MPG Gratio Weight Turn Trunk Rseat Rep78 Rep77 Price MPG Length Hroom Gratio Displa Loglinear Models for Categorical Data overview2 Principles of Graphical Displays Effect ordering (Friendly and Kwan, 2003) In tables and graphs, sort unordered factors according to the effects you want to see/show. Auto data: Alpha order Corrgrams: Exploratory displays for correlation matrices (Friendly, 2002) SPIDA Michael Friendly

7 overview2 Comparisons Make visual comparisons easy Visual grouping connect with lines, make key comparisons contiguous Baselines compare data to model against a line, preferably horizontal Frequency Sqrt(frequency) Number of Occurrences Number of Occurrences SPIDA Michael Friendly

8 overview2 Small multiples combine stratified graphs into coherent displays (Tufte, 1983) e.g., scatterplot matrix for quantitative data: all pairwise scatterplots 87.2 Prestige Educ Income Women 0 SPIDA Michael Friendly

9 overview2 e.g., mosaic matrix for quantitative data: all pairwise mosaic plots Admit Male Female A B C D E F Admit Reject A B C D E F Male Female Gender Admit Reject Male Female A B A B C C Dept SPIDA Michael Friendly D D E E F F Male Female Admit Reject Admit Reject

10 prelim Loglinear models: Preliminaries Categorical data: Variables with a finite number of discrete values Binary: pass/fail, live/die, vote yes/vote no Nominal: unordered categories Religion: Christian, Jewish, Muslim,... Party preference: Conservative, Liberal, NDP, Bloc Québécois, Green Party Ordinal: ordered categories Education: Primary, Secondary, University, Post-graduate Income ranges: 0-20, 20-30, 30-40,... Likert scale: strongly disagree strongly agree Interval: Counts: number of children 0, 1, 2,... SPIDA Michael Friendly

11 prelim Loglinear models: Preliminaries Data representations: Case form: One row per case (observation), one column per variable standard data table (Quantitative data is always represented in case form) Table form: A set of n variables represented as an n-way table Table 1: Admissions to Berkeley graduate programs Admitted Rejected Total Males Females Total SPIDA Michael Friendly

12 prelim 3+-way tables Table 2: Berkeley data: 3-way table Male Female Dept Admit Reject Admit Reject A B C D E F SPIDA Michael Friendly

13 prelim Frequency form: One row per combination of variable categories 1 Obs Admit Gender Dept count Admitted Male A Admitted Male B Admitted Male C Admitted Male D Admitted Male E Admitted Male F Admitted Female A Admitted Female B Admitted Female C Admitted Female D Admitted Female E Admitted Female F Rejected Female E Rejected Female F 317 SPIDA Michael Friendly

14 prelim Loglinear models: Preliminaries Data transformations: All SAS procedures take input in case form. Most SAS procedures take input in frequency form need to use a FREQ or WEIGHT statement to identify the frequency (count) variable. Categorical variables can be represented by numeric or character values Character variables are ordered alphabetically by default often not what you want (e.g., Low, Medium, High High, Low, Medium) Often easiest to use numeric values, and create a SAS format to attach value labels: 1 proc format; 2 value inc 1= Low 2= Medium 3= High ; 3 proc freq; 4 tables educ * income; 5 format income inc.; SPIDA Michael Friendly

15 prelim Recoding categories Categories can be collapsed (frequencies summed) using SAS formats 1 proc format; 2 value edu 1-8= Primary 9-12= Secondary = University 16-high= Post-grad ; 4 proc freq; 5 tables educ * income; 6 format educ edu. income inc.; Reordering categories Many SAS procedures allow several different ways to order the categories of a discrete variable ORDER=INTERNAL by numeric or character value (default). ORDER=DATA by order of appearance in the dataset. ORDER=FORMATTED by order of formatted value. ORDER=FREQ by order of frequency of occurrence. The table macro Collapse or recode variables SPIDA Michael Friendly

16 overview Related perspectives Loglinear models: Overview Loglinear models can be developed as an analog of classical ANOVA and regression models, where multiplicative relations (under independence) are re-expressed in additive form as models for log(frequency). Alternatively, but somewhat more generally, loglinear models are also GLMs for log(frequency), with a Poisson distribution for the cell counts. Two-way tables: Loglinear approach For two discrete variables, A and B, suppose a multinomial sample of total size n over the IJ cells of a two-way I J contingency table, with cell frequencies n ij, and cell probabilities π ij = n ij /n. The table variables are statistically independent when the cell (joint) probability equals the product of the marginal probabilities, Pr(A = i & B = j) = Pr(A = i) Pr(B = j), or, π ij = π i+ π +j. An equivalent model in terms of expected frequencies, m ij = nπ ij is m ij = n m i+ m +j. SPIDA Michael Friendly

17 overview This multiplicative model can be expressed in additive form as a model for log m ij, log m ij = log n + log m i+ + log m +j. (1) By anology with ANOVA models, the independence model (1) can be expressed as log m ij = µ + λ A i + λ B j, (2) where µ is the grand mean of log m ij and the parameters λ A i and λ B j express the marginal frequencies of variables A and B, and are typically defined so that i λa i = j λb j = 0. Dependence between the table variables is expressed by adding association parameters, λ AB ij, giving the saturated model, log m ij = µ + λ A i + λ B j + λ AB ij. (3) The saturated model fits the table perfectly ( m ij = n ij ): there are as many parameters as cell frequencies. A global test for association tests H 0 : λ AB ij = 0. For ordinal variables, the λ AB may be structured more simply, giving tests for ordinal association. ij SPIDA Michael Friendly

18 overview Two-way tables: GLM approach In the GLM approach, the vector of cell frequencies, n = {n ij } is specified to have a Poisson distribution with means m = {m ij } given by log m = Xβ where X is a known design (model) matrix and β is a column vector containing the unknown λ parameters. For example, for a 2 2 table, the saturated model (3) with the usual zero-sum constraints can be represented as log m 11 log m 12 log m 21 log m 22 = µ λ A 1 λ B 1 λ AB 11 Note that only the linearly independent parameters are represented. λ A 2 = λ A 1, because λa 1 + λ A 2 = 0, and so forth. An advantage of the GLM formulation is that it makes it somewhat easier to express models with ordinal or quantitative variables. SPIDA Michael Friendly

19 overview Three-way Tables Saturated model: For a 3-way table, of size I J K for variables A, B, C, the saturated loglinear model includes associations between all pairs of variables, as well as a 3-way association term, λ ABC ijk log m ijk = µ + λ A i + λ B j + λ C k + λ AB ij + λ AC ik + λ BC jk + λ ABC ijk. (4) As before, one-way terms (λ A i, λb j, λc k ij, λ AC ik, λbc jk ) pertain to differences in the marginal frequencies of the table variables. Two-way terms (λ AB ) pertain to the partial association for each pair of variables, controlling for the remaining variable. The three-way term, λ ABC allows the partial association between any pair of ijk variables to vary over the categories of the third variable. Such models are usually hierarchical: the presence of a high-order term, such as λ ABC implies that all low-order relatives are automatically included. ijk Thus, a short-hand notation for a loglinear model lists only the high-order terms, i.e., model (4) [ABC] SPIDA Michael Friendly

20 overview Reduced models: The usual goal is to fit the smallest model (fewest high-order terms) that is sufficient to explain/describe the observed frequencies. All representative subset models are shown in the table below Table 3: Log-linear Models for Three-Way Tables Model Model symbol Interpretation Mutual independence [A][B][C] A B C Joint independence [AB][C] (A B) C Conditional independence [AC][BC] (A B) C All two-way associations [AB][AC][BC] homogeneous association Saturated model [ABC] interaction e.g., [AB][C] log m ijk = µ + λ A i + λ B j + λ C k + λ AB ij SPIDA Michael Friendly

21 overview Assessing goodness of fit Goodness of fit of a specified model may be tested by the likelihood ratio G 2, G 2 = 2 i n i log(n i / m i ), (5) or the Pearson χ 2, χ 2 = i (n i m i ) 2 m i, (6) with degrees of freedom = # cells - # estimated parameters. E.g., for the model of mutual independence, [A][B][C], df = IJK (I 1) (J 1) (K 1) = (I 1)(J 1)(K 1) The terms summed in (5) and (6) are the squared cell residuals Other measures of balance goodness of fit against parsimony, e.g., Akaike s Information Criterion (smaller is better) AIC = G 2 2df or AIC = G # parameters SPIDA Michael Friendly

22 vismethods Visualizing Contingency tables Two-way tables 2 2 tables Visualize odds ratio (FFOLD macro) 2 2 k tables Homogeneity of association r 3 tables Trilinear plots (TRIPLOT macro) r c tables Visualize association (SIEVE program) r c tables Visualize association (MOSAIC macro) Square r r tables Visualize agreement (AGREE program) n-way tables Fit loglinear models, visualize lack-of-fit (MOSAIC macro) Test & visualize partial association (MOSAIC macro) Visualize pairwise association (MOSMAT macro) Visualize conditional association (MOSMAT macro) Visualize loglinear structure (MOSMAT macro) Correspondence analysis and MCA (CORRESP macro) SPIDA Michael Friendly

23 2by2 2 2 tables 2 2 tables provide the simplest context to illustrate loglinear models and their relations to other tests and modeling techniques. Example: Bickel et al. (1975): admissions to graduate depatments at U. C. Berkeley in 1973 Aggregate data for the six largest departments: Table 4: Admissions to Berkeley graduate programs Admitted Rejected Total % Admitted Males Females Total Equivalent questions: Proportions: Are the proportions of males and females admitted the same? Method: Test of independent proportions, H 0 : p 1 = p 2 Answer: z = ˆp 1 ˆp 2 = p(1 p)(1/n1 +1/n 2 ) = (.6163)(1/2691+1/1835) SPIDA Michael Friendly

24 2by2 2 2 tables Odds: Are the odds of admission the same for males and females? Odds(Admit Male) Method: Odds ratio, H 0 : θ = Odds(Admit = 1 Female) Odds(Admit Male) Answer: θ = Odds(Admit = 1198/1493 Female) 557/1276 = 1.84 Males 84% more likely to be admitted. Is admission independent of gender? Method: χ 2 test of independence, H 0 : p ij = p i p j Answer: χ 2 (1) = (n ij m ij ) 2 m ij = 92.21, p < Does cell frequency depend only on marginal frequencies of gender and admission? Method: Loglinear model, log(m ij ) = µ + λ A i + λ G j + λag ij Answer: H 0 : λ AG ij = 0 LR G 2 (1) = 93.45, p < SPIDA Michael Friendly

25 2by2 Standard analysis: PROC FREQ proc freq data=berkeley; weight freq; tables gender*admit / chisq; Output: Statistics for Table of gender by admit Statistic DF Value Prob Chi-Square <.0001 Likelihood Ratio Chi-Square <.0001 Continuity Adj. Chi-Square <.0001 Mantel-Haenszel Chi-Square <.0001 Phi Coefficient How to visualize and interpret? SPIDA Michael Friendly

26 2by2 Fourfold displays for 2 2 tables Quarter circles: radius n ij area frequency Independence: Adjoining quadrants align Odds ratio: ratio of areas of diagonally opposite cells Confidence rings: Visual test of H 0 : θ = 1 adjoining rings overlap Sex: Male Admit?: Yes Admit?: No Sex: Female Confidence rings do not overlap: θ 1 SPIDA Michael Friendly

27 2by2 Fourfold displays for 2 2 k tables Data in Table 4 had been pooled over departments Stratified analysis: one fourfold display for each department Each 2 2 table standardized to equate marginal frequencies Shading: highlight departments for which H a : θ i 1 Department: A Sex: Male Department: B Sex: Male Department: C Sex: Male Admit?: Yes Admit?: No Admit?: Yes Admit?: No Admit?: Yes Admit?: No Sex: Female Department: D Sex: Male Sex: Female Department: E Sex: Male Sex: Female Department: F Sex: Male Admit?: Yes Admit?: No Admit?: Yes Admit?: No Admit?: Yes Admit?: No Sex: Female Sex: Female Sex: Female Only one department (A) shows association; θ A = women (0.349) 1 = 2.86 times as likely as men to be admitted. SPIDA Michael Friendly

28 2by2 What happened here? Simpson s paradox: Aggregate data are misleading because they falsely assume men and women apply equally in each field. But: Large differences in admission rates across departments. Men and women apply to these departments differentially. Women applied in large numbers to departments with low admission rates. (This ignores possibility of structural bias against women: differential funding of fields to which women are more likely to apply.) Other graphical methods can show these effects. SPIDA Michael Friendly

29 2by2 The FOURFOLD program and the FFOLD macro The FOURFOLD program is written in SAS/IML. The FFOLD macro provides a simpler interface. Printed output: (a) significance tests for individual odds ratios, (b) tests of homogeneity of association (here, over departments) and (c) conditional association (controlling for department). Plot by department: 1 %include catdata(berkeley); 2 berk4f.sas 3 %ffold(data=berkeley, 4 var=admit Gender, /* panel variables */ 5 by=dept, /* stratify by dept */ 6 down=2, across=3, /* panel arrangement */ 7 htext=2); /* font size */ Aggregate data: first sum over departments, using the TABLE macro: 8 %table(data=berkeley, out=berk2, 9 var=admit Gender, /* omit dept */ 10 weight=count, /* frequency variable */ 11 order=data); 12 %ffold(data=berk2, var=admit Gender); SPIDA Michael Friendly

30 2by2 The FOURFOLD program and the FFOLD macro Printed output: Odds (Admitted Male) / (Admitted Female) Odds Ratio Log Odds SE(LogOdds) Z Pr> Z A B C D E F SPIDA Michael Friendly

31 2by2 For a 2 2 k table, additional tests are also of interest: Homogeneity of odds ratios: H 0 : θ 1 = θ 2 = θ k (This is equivalent to the loglinear model of no 3-way association, [AG][AD][DG].) Conditional independence: Admit Gender Dept Test of Homogeneity of Odds Ratios (no 3-Way Association) TEST CHISQ DF PROB Homogeneity of Odds Ratios Conditional Independence of Admit and Gender Dept (assuming Homogeneity) TEST CHISQ DF PROB Likelihood-Ratio Cochran-Mantel-Haenszel SPIDA Michael Friendly

32 twoway Two-way frequency tables Table 5: Hair-color eye-color data Eye Hair Color Color Black Brown Red Blond Total Green Hazel Blue Brown Total SPIDA Michael Friendly

33 twoway Two-way frequency tables: PROC FREQ proc freq data=haireye; weight count; tables Eye * HAIR / chisq; Output: Statistics for Table of Eye by HAIR Statistic DF Value Prob Chi-Square <.0001 Likelihood Ratio Chi-Square <.0001 Mantel-Haenszel Chi-Square <.0001 Phi Coefficient SPIDA Michael Friendly

34 twoway Two-way frequency tables: PROC GENMOD We can fit the same model as a Poisson GLM for count: proc genmod data=haireye; class hair eye; model count = hair eye hair*eye / dist=poisson type3; Output: LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq hair <.0001 eye <.0001 hair*eye <.0001 How to visualize and interpret? SPIDA Michael Friendly

35 twoway count area Two-way frequency tables: Sieve diagrams When row/col variables are independent, n ij n i+ n +j each cell can be represented as a rectangle, with area = height width frequency, n ij Green Expected frequencies: Hair Eye Color Data Hazel Eye Color Blue Brown Black 286 Brown Hair Color 71 Red 127 Blond 592 SPIDA Michael Friendly

36 twoway Sieve diagrams Height/width marginal frequencies, n i+, n +j Area expected frequency, n i+ n +j Shading observed frequency, n ij, color: sign(n ij ˆm ij ). Independence: Shown when density of shading is uniform. Green Hazel Eye Color Blue Brown Black Brown Red Blond Hair Color SPIDA Michael Friendly

37 twoway Sieve diagrams Effect ordering: Reorder rows/cols to make the pattern coherent Blue Eye Color Green Hazel Brown Black Brown Red Blond Hair Color SPIDA Michael Friendly

38 twoway Sieve diagrams Vision classification data for 7477 women Unaided distant vision data High Right Eye Grade 2 3 Low High 2 3 Left Eye Grade Low SPIDA Michael Friendly

39 nway2 Mosaic displays and Log-linear Models Hartigan and Kleiner (1981), Friendly (1994, 1999): Width one set of marginals, n i+ Height relative proportions of other variable, p j i = n ij /n i+ area frequency, n ij = n i+ p j i Model: (Gender)(Admit) Admitted Rejected Standardized residuals: <-4-4:-2-2:-0 0:2 2:4 >4 Male Female SPIDA Michael Friendly

40 nway2 Shading: Sign and magnitude of Pearson χ 2 residual, d ij = (n ij ˆm ij )/ ˆm ij (or L.R. G 2 ) Sign: negative in red; + positive in blue Magnitude: intensity of shading: d ij > 0, 2, 4,... Independence: Rows align, or cells are empty! E.g., aggregate Berkeley data, independence model: Model: (Gender)(Admit) Admitted Rejected Standardized residuals: <-4-4:-2-2:-0 0:2 2:4 >4 Male Female SPIDA Michael Friendly

41 nway2 Mosaic displays Departments Gender: Did departments differ in the total number of applicants? Did men and women apply differentially to departments? Model: (Dept)(Gender) A B C D E F Model [Dept] [Gender]: G 2 (5) = Note: Departments ordered A F by overall rate of admission. Male Female SPIDA Michael Friendly

42 nway2 Mosaic displays: Hair color and eye color Brown Hazel Green Blue Dark hair goes with dark eyes, light hair with light eyes Red hair, hazel eyes an exception? Effect ordering: Rows/cols permuted by CA Dimension Black Brown Red Blond SPIDA Michael Friendly

43 nway2 Mosaic displays for multiway tables Generalizes to n-way tables: divide cells recursively Can fit any log-linear model (e.g., 3-way), Table 6: Log-linear Models for Three-Way Tables Model Model symbol Independence interpretation Mutual independence [A][B][C] A B C Joint independence [AB][C] (A B) C Conditional independence [AC][BC] (A B) C All two-way associations [AB][AC][BC] (none) Saturated model [ABC] (none) e.g., the model for conditional independence (A C B): [AB][BC] log m ijk = µ + λ A i + λ B j + λ C k + λ AB ij + λ BC jk Each mosaics shows: DATA (size of tiles) (some) marginal frequencies (spacing visual grouping) RESIDUALS (shading) what associations have been omitted? SPIDA Michael Friendly

44 nway2 E.g., Joint independence, [DG][A] (null model, Admit as response) [G 2 (11) = 877.1]: Model: (DeptGender)(Admit) A B C D E F Admitted Male Rejected Female SPIDA Michael Friendly

45 nway2 Mosaic displays for multiway tables Visual fitting: Pattern of lack-of-fit (residuals) better model smaller residuals cleaning the mosaic better model empty cells best done interactively! Model: (DeptGender)(DeptAdmit) A B C D E F E.g., Add [Dept Admit] association Conditional independence: Fits poorly, overall (G 2 (6) = 21.74) But, only in Department A! Admitted Male Rejected Female SPIDA Michael Friendly

46 nway2 Sequential plots and models Mosaic for an n-way table hierarchical decomposition of association in a way analogous to sequential fitting in regression Joint cell probabilities are decomposed as p ijkl = First 2 terms mosaic for v 1 and v 2 {v 1 v 2 } {}}{ p i p j i p k ij p l ijk p n ijk }{{} {v 1 v 2 v 3 } First 3 terms mosaic for v 1, v 2 and v 3 Sequential models of joint independence additive decomposition of the total association, G 2 [v 1 ][v 2 ]...[v p ] (mutual independence), G 2 [v 1 ][v 2 ]...[v p ] = G2 [v 1 ][v 2 ] +G2 [v 1 v 2 ][v 3 ] +G2 [v 1 v 2 v 3 ][v 4 ] + +G2 [v 1...v p 1 ][v p ] SPIDA Michael Friendly

47 nway2 Sequential plots and models: Example Hair color x Eye color marginal table (ignoring Sex) (Hair)(Eye), G2 (9) = Brown Hazel Green Blue Black Brown Red Blond SPIDA Michael Friendly

48 nway2 Sequential plots and models: Example 3-way table, Joint Independence Model [Hair Eye] [Sex] (HairEye)(Sex), G2 (15) = Brown Hazel Green Blue M F Black Brown Red Blond SPIDA Michael Friendly

49 nway2 Sequential plots and models: Example 3-way table, Mutual Independence Model [Hair] [Eye] [Sex] (Hair)(Eye)(Sex), G2 (24) = Brown Hazel Green Blue M F Black Brown Red Blond SPIDA Michael Friendly

50 nway2 Sequential plots and models: Example (Hair)(Eye), G2 (9) = (HairEye)(Sex), G2 (15) = (Hair)(Eye)(Sex), G2 (24) = Brown Hazel Green Blue + Brown Hazel Green Blue = Brown Hazel Green Blue Black Brown Red Blond M F Black Brown Red Blond M F Black Brown Red Blond [Hair] [Eye] G 2 (9) = [Hair Eye] [Sex] G 2 (15) = [Hair] [Eye] [Sex] G 2 (24) = SPIDA Michael Friendly

51 nway2 Mosaic matrices Analog of scatterplot matrix for categorical data (Friendly, 1999) Shows all p(p 1) pairwise views in a coherent display Each pairwise mosaic shows bivariate (marginal) relation Fit: marginal independence Residuals: show marginal associations Direct visualization of the Burt matrix analyzed in multiple correspondence analysis for p categorical variables Hair Black Brown Red Blond Black Brown Red Blond Brown Haz Grn Blue Male Female Blue Grn Brown Haz Eye Grn Blue Brown Haz Black Brown Red Blond Male Female Male Female Male Female Sex Black Brown Red Blond Brown Haz Grn Blue SPIDA Michael Friendly

52 nway2 Hair Brown Haz Grn Blue Male Female Black Brown Red Blond Brown Haz Grn Blue Eye Male Female Black Brown Red Blond Brown Haz Grn Blue Male Female Male Female Sex SPIDA Michael Friendly Brown Haz Grn Blue Black Brown Red Blond Black Brown Red Blond

53 nway2 Berkeley data: Admit Male Female A B C D E F Admit Reject Male Female Gender A B C D E F Admit Reject Male Female A B C D A B C D E Dept SPIDA Michael Friendly E F F Male Female Admit Reject Admit Reject

54 nway2 Partial association, Partial mosaics Stratified analysis: How does the association between two (or more) variables vary over levels of other variables? Mosaic plots for the main variables show partial association at each level of the other variables. E.g., Hair color, Eye color BY Sex TABLES sex * hair * eye; Sex: Male Sex: Female Brown Hazel Green Blue Brown Hazel Green Blue Black Brown Red -3.3 Blond Black Brown Red Blond SPIDA Michael Friendly

55 nway2 Partial association, Partial mosaics Stratified analysis: For models of partial independence, A B at each level of (controlling for) C A B C k, partial G 2 s add to the overall G 2 for conditional independence, G 2 A B C = k G 2 A B C(k) Table 7: Partial and Overall conditional tests, Hair Eye Sex Model df G 2 p-value [Hair][Eye] Male [Hair][Eye] Female [Hair][Eye] Sex SPIDA Michael Friendly

56 nway2 Software for Mosaic Displays Demonstration web applet: Runs the current version of mosaics via a cgi-script Can run sample data, upload a data file, enter data in a form. Choose model fitting and display options (not all supported). SPIDA Michael Friendly

57 mossoft SPIDA Michael Friendly

58 mossoft SPIDA Michael Friendly

59 mossoft SPIDA Michael Friendly

60 mossoft Software for Mosaic Displays SAS software & documentation: Examples: Many in VCD and on web site SAS/IML modules: mosaics.sas SAS/IML program Enter frequency table directly in SAS/IML, or read from a SAS dataset. Most flexible: Select, collapse, reorder, re-label table levels using SAS/IML statements Specify structural 0s, fit specialized models (e.g., quasi-independence) Interface to models fit using PROC GENMOD SPIDA Michael Friendly

61 mossoft Software for Mosaic Displays Macro interface: mosaic macro, table macro, mosmat macro mosaic macro Easiest to use: Direct input from a SAS dataset No knowledge of SAS/IML required Reorder table variables; collapse, reorder table levels with table macro Convenient interface to partial mosaics (BY=) table macro Create frequency table from raw data Collapse, reorder table categories Re-code table categories using SAS formats, e.g., 1= Male 2= Female mosmat macro Mosaic matrices analog of scatterplot matrix (Friendly, 1999) SPIDA Michael Friendly

62 mossoft mosaic macro example: Berkeley data berkeley.sas 1 title Berkeley Admissions data ; 2 proc format; 3 value admit 1="Admitted" 0="Rejected" ; 4 value dept 1="A" 2="B" 3="C" 4="D" 5="E" 6="F"; 5 value $sex M = Male F = Female ; 6 data berkeley; 7 do dept = 1 to 6; 8 do gender = M, F ; 9 do admit = 1, 0; 10 input 11 output; 12 end; end; end; 13 /* -- Male -- - Female- */ 14 /* Admit Rej Admit Rej */ 15 datalines; /* Dept A */ /* B */ /* C */ /* D */ /* E */ /* F */ 22 ; SPIDA Michael Friendly

63 mossoft Data set berkeley: dept gender admit freq 1 M M F F M M F F M M F F M M F F M M F F M M F F SPIDA Michael Friendly

64 mossoft mosaic macro example: Berkeley data mosaic9m.sas 1 goptions hsize=7in vsize=7in; 2 3 %include catdata(berkeley); 4 5 *-- apply character formats to numeric table variables; 6 %table(data=berkeley, 7 var=admit Gender Dept, 8 weight=freq, 9 char=y, format=admit admit. gender $sex. dept dept., 10 order=data, out=berkeley); %mosaic(data=berkeley, 13 vorder=dept Gender Admit, /* reorder variables */ 14 plots=2:3, /* which plots? */ 15 fittype=joint, /* fit joint indep. */ 16 split=h V V, htext=3); /* options */ SPIDA Michael Friendly

65 mossoft mosaic macro example: Berkeley data Model: (Dept)(Gender) Model: (DeptGender)(Admit) A B C D E F A B C D E F Male Two-way, Dept. by Gender Female Admitted Rejected Male Female Three-way, Dept. by Gender by Admit SPIDA Michael Friendly

66 mossoft mosmat macro: Mosaic matrices mosmat9m.sas 1 %include catdata(berkeley); 2 %mosmat(data=berkeley, 3 vorder=admit Gender Dept, sort=no); Admit Admit Reject Admit Reject Male Female A B C D E F Male Female Gender Male Female Admit Reject A B C D E F F A B C D E A B C D E F Dept Admit Reject Male Female SPIDA Michael Friendly

67 mossoft Partial mosaics mospart3.sas 1 %include catdata(hairdat3s); 2 %gdispla(off); 3 %mosaic(data=haireye, 4 vorder=hair Eye Sex, by=sex, 5 htext=2, cellfill=dev); 6 %gdispla(on); 7 %panels(rows=1, cols=2); /* make 2 figs -> 1 */ Sex: Male Sex: Female Brown Hazel Green Blue Brown Hazel Green Blue Black Brown Red -3.3 Blond Black Brown Red Blond SPIDA Michael Friendly

68 corresp Correspondence analysis and MCA Correspondence analysis (CA): Analog of PCA for frequency data: account for maximum % of χ 2 in few (2-3) dimensions finds scores for row (x im ) and column (y jm ) categories on these dimensions uses Singular Value Decomposition of residuals from independence, d ij = (n ij m ij )/ m ij d ij n = M m=1 λ m x im y jm optimal scaling: each pair of scores, x im ) and column (y jm, have highest possible correlation (= λ m ). plots of the row (x im ) and column (y jm ) scores show associations MCA: Extends CA to n-way tables, but only uses bivariate associations (like mosaic matrix) SPIDA Michael Friendly

69 corresp Hair color, Eye color data: 0.5 * Eye color * HAIR COLOR RED Green Dim 2 (9.5%) 0.0 Brown BLACK Hazel BROWN Blue BLOND Dim 1 (89.4%) Interpretation: row/column points near each other are positively associated Dim 1: 89.4% of χ 2 (dark light) Dim 2: 9.5% of χ 2 (RED/Green vs. others) SPIDA Michael Friendly

70 corresp PROC CORRESP and the CORRESP macro Two forms of input dataset: dataset in contingency table form column variables are levels of one factor, observations (rows) are levels of the other. Obs Eye BLACK BROWN RED BLOND 1 Brown Blue Hazel Green Raw category responses (case form), or cell frequencies (frequency form), classified by 2 or more factors (e.g., output from PROC FREQ) Obs Eye HAIR Count 1 Brown BLACK 68 2 Brown BROWN Brown RED 26 4 Brown BLOND Green RED Green BLOND 16 SPIDA Michael Friendly

71 corresp PROC CORRESP PROC CORRESP and the CORRESP macro Handles 2-way CA, extensions to n-way tables, and MCA Many options for scaling row/column coordinates and output statistics OUTC= option output dataset for plotting (PROC CORRESP doesn t do plots itself) CORRESP macro Uses PROC CORRESP for analysis Produces labeled plots of the category points in either 2 or 3 dimensions Many graphic options; can equate axes automatically See: SPIDA Michael Friendly

72 corresp Example: Hair and Eye Color Input the data in contingency table form 1 data haireye; corresp2a.sas 2 input EYE $ BLACK BROWN RED BLOND ; 3 datalines; 4 Brown Blue Hazel Green ; SPIDA Michael Friendly

73 corresp Example: Hair and Eye Color Using PROC CORRESP directly labeled printer plot proc corresp data=haireye outc=coord short; id eye; /* row variable */ var black brown red blond; /* col variables */ proc plot data=coord vtoh=2; /* plot step */ plot dim2 * dim1 = * $eye / box haxis=by.1 vaxis=by.1; /* plot options */ Using the CORRESP macro labeled high-res plot %corresp (data=haireye, id=eye, /* row variable */ var=black brown red blond, /* col variables */ dimlab=dim); /* options */ SPIDA Michael Friendly

74 corresp Example: Hair and Eye Color Printed output: The Correspondence Analysis Procedure Inertia and Chi-Square Decomposition Singular Principal Chi- Values Inertias Squares Percents % ************************* % *** % (Degrees of Freedom = 9) Row Coordinates Dim1 Dim2 Brown Blue Hazel Green Column Coordinates Dim1 Dim2 BLACK BROWN RED BLOND SPIDA Michael Friendly

75 corresp Example: Hair and Eye Color Output dataset(selected variables): Obs _TYPE_ EYE DIM1 DIM2 1 INERTIA.. 2 OBS Brown OBS Blue OBS Hazel OBS Green VAR BLACK VAR BROWN VAR RED VAR BLOND Row and column points are distinguished by the _TYPE_ variable: OBS vs. VAR SPIDA Michael Friendly

76 corresp Example: Hair and Eye Color Graphic output from CORRESP macro: * Eye color * HAIR COLOR 0.5 RED Green Dim 2 (9.5%) 0.0 Brown BLACK Hazel BROWN Blue BLOND Dim 1 (89.4%) Top legend produced with Annotate data set and the INANNO= option to the CORRESP macro SPIDA Michael Friendly

77 corresp Multi-way tables Stacking approach: van der Heijden and de Leeuw (1985) three-way table, of size I J K can be sliced and stacked as a two-way table, of size (I J) K I x J x K table J (I J )x K table I J J K J The variables combined are treated interactively Each way of stacking corresponds to a loglinear model (I J) K [AB][C] I (J K) [A][BC] J (I K) [B][AC] K SPIDA Michael Friendly

78 corresp Multi-way tables: Stacking PROC CORRESP: Use TABLES statement and option CROSS=ROW or CROSS=COL. E.g., for model [A B] [C], proc corresp cross=row; tables A B, C; weight count; CORRESP macro: Can use / instead of, %corresp( options=cross=row, tables=a B/ C, weight count); SPIDA Michael Friendly

79 corresp Example: Suicide Rates Suicide rates in West Germany, by Age, Sex and Method of suicide Sex Age POISON GAS HANG DROWN GUN JUMP M M M M M F F F F F CA of the [Age Sex] by [Method] table: Shows associations between the Age-Sex combinations and Method Ignores association between Age and Sex SPIDA Michael Friendly

80 corresp Example: Suicide Rates suicide5.sas 1 %include catdata(suicide); 2 *-- equate axes!; 3 axis1 order=(-.7 to.7 by.7) length=6.5 in label=(a=90 r=0); 4 axis2 order=(-.7 to.7 by.7) length=6.5 in; 5 %corresp(data=suicide, weight=count, 6 tables=%str(age sex, method), 7 options=cross=row short, 8 vaxis=axis1, haxis=axis2); Output: Inertia and Chi-Square Decomposition Singular Principal Chi- Values Inertias Squares Percents % ************************* % ************** % ** % % (Degrees of Freedom = 45) SPIDA Michael Friendly

81 corresp 0.7 Gas * F * F * M Gun Dimension 2 (33.0%) 0.0 Poison * M * F * M Jump * F * F Hang * M Drown * M Dimension 1 (60.4%) SPIDA Michael Friendly

82 MALE FEMALE Gun Gas Hang Poison Jump Drown Loglinear Models for Categorical Data corresp Compare with mosaic display: Suicide data - Model (SexAge)(Method) >65 SPIDA Michael Friendly Standardized residuals: <-8-8:-4-4:-2-2:-0 0:2 2:4 4:8 >8

83 structured Ordered categories More structured tables Tables with ordered categories allow more parsimonious representations of association Can represent λ AB ij by a small number of parameters more focused and more powerful tests of lack of independence Square tables For square I I tables, where row and column variables have the same categories: Can ignore diagonal cells, where association is expected and test remaining association (quasi-independence) Can test whether association is symmetric around the diagonal cells. SPIDA Michael Friendly

84 structured Ordered categories Ordinal scores In many cases it may be reasonable to assign numeric scores, {a i } to an ordinal row variable and/or numeric scores, {b i } to an ordinal column variable. Typically, scores are equally spaced and sum to zero, {a i } = i (I + 1)/2, e.g., {a i } = { 1, 0, 1} for I=3. Linear-by-Linear (Uniform) Association: When both variables are ordinal, the simplest model posits that any association is linear in both variables. λ AB ij = γ a i b j This model adds one additional parameter to the independence model (γ = 0). It is similar to For integer scores, the local log odds ratio for any contiguous 2 2 table are all equal, log θ ij = γ This is a model of uniform association SPIDA Michael Friendly

85 structured Row Effects and Column Effects: When only one variable is assigned scores, we have the row effects model or the column effects model. E.g., in the row effects model, the row variable (A) is treated as nominal, while the column variable (B) is assigned ordered scores {b j }. log m ij = µ + λ A i + λ B j + α i b j where the row parameters, α i, are defined so they sum to zero. This model has (I 1) more parameters than the independence model. A Row Effects + Column Effects model allows both variables to be ordered, but not necessarily with linear scores. Fitting models for ordinal variables Create numeric variables for category scores PROC GENMOD: Use as quantitative variables in MODEL statement, but not listed as CLASS variables SPIDA Michael Friendly

86 structured Example: Mental impairment and parents SES Srole et al. (1978) Data on mental health status of 1600 young NYC residents in relation to parents SES. Mental health: Well, mild symptoms, moderate symptoms, Impaired SES: 1 (High) 6 (Low) Mental Parents SES health High Low 1: Well : Mild : Moderate : Impaired SPIDA Michael Friendly

87 structured Mental Impairment and SES: Independence Linear x Linear Mental Well Mild Moderate Impaired Mental Well Mild Moderate Impaired High Low SES High Low SES Visual assessment: Residuals from the independence model show an opposite-corner pattern. This is consistent with both: Linear linear model: equi-spaced scores for both Mental and SES Row effects model: equi-spaced scores for SES, ordered scores for Mental SPIDA Michael Friendly

88 structured Statistical assesment: Table 8: Mental health data: Goodness-of-fit statistics for ordinal loglinear models Model G 2 df Pr(> G 2 ) AIC AIC-best Independence Col effects (SES) Row effects (mental) Lin x Lin Both the Row Effects and Linear linear models are significantly better than the Independence model AIC indicates a slight preference for the Linear linear model In the Linear linear model, the estimate of the coefficient of a i b j is ˆγ = = log θ, so ˆθ = exp(0.0907) = each step down the SES scale increases the odds of being classified one step poorer in mental health by 9.5%. SPIDA Michael Friendly

89 structured Fitting these models with PROC GENMOD: mentgen2.sas 1 %include catdata(mental); 2 data mental; 3 set mental; 4 m_lin = mental; *-- copy m_lin and s_lin for; 5 s_lin = ses; *-- use non-class variables; 6 7 title Independence model ; 8 proc genmod data=mental; 9 class mental ses; 10 model count = mental ses / dist=poisson obstats residuals; 11 format mental mental. ses ses.; 12 ods output obstats=obstats; 13 %mosaic(data=obstats, vorder=mental SES, resid=stresdev, 14 title=mental Impairment and SES: Independence, split=h V); Row Effects model: mentgen2.sas 16 proc genmod data=mental; 17 class mental ses; 18 model count = mental ses mental*s_lin / dist=poisson obstats; Linear linear model: mentgen2.sas 21 proc genmod data=mental; 22 class mental ses; 23 model count = mental ses m_lin*s_lin / dist=poisson obstats; SPIDA Michael Friendly

90 square Square-ish tables Tables where two (or more) variables have the same category levels: Employment categories of related persons (mobility tables) Multiple measurements over time (panel studies; longitudinal data) Repeated measures on the same individuals under different conditions Related/repeated measures are rarely independent, but may have simpler forms than general association E.g., vision data: Left and right eye acuity grade for 7477 women Independence, G2(9)= Low RightEye High 2 3 High 2 3 LeftEye Low SPIDA Michael Friendly

91 square Quasi-Independence Related/repeated measures are rarely independent most observations often fall on diagonal cells. Quasi-independence ignores diagonals, tests independence in remaining cells (λ ij = 0 for i j). The model dedicates one parameter (δ i ) to each diagonal cell, fitting them exactly, where I( ) is the indicator function. log m ij = µ + λ A i + λ B j + δ i I(i = j) This model may be fit as a GLM by including indicator variables for each diagonal cell. DIAG 4 rows 4 cols SPIDA Michael Friendly

92 square Using PROC GENMOD mosaic10g.sas 1 title Quasi-independence model (women) ; 2 proc genmod data=women; 3 class RightEye LeftEye diag; 4 model Count = LeftEye RightEye diag / 5 dist=poisson link=log obstats residuals; 6 ods output obstats=obstats; 7 %mosaic(data=obstats, vorder=righteye LeftEye,...); Mosaic: Quasi-Independence, G2(5)=199.1 Low RightEye High 2 3 High 2 3 LeftEye Low SPIDA Michael Friendly

93 square Symmetry Tests whether the table is symmetric around the diagonal, i.e., m ij = m ji As a loglinear model, symmetry is log m ij = µ + λ A i + λ B j + λ AB ij, subject to the conditions λ A i = λ B j and λ AB ij = λ AB ji. This model may be fit as a GLM by including indicator variables with equal values for symmetric cells, and indicators for the diagonal cells (fit exactly) SYMMETRY 4 rows 4 cols) SPIDA Michael Friendly

94 square Using PROC GENMOD mosaic10g.sas 1 proc genmod data=women; 2 class symmetry; 3 model Count = symmetry / 4 dist=poisson link=log obstats residuals; 5 ods output obstats=obstats; 6 %mosaic(data=obstats, vorder=righteye LeftEye,...); Mosaic: Symmetry, G2(6)=19.25 Low RightEye High 2 3 High 2 3 LeftEye Low SPIDA Michael Friendly

95 square Quasi-Symmetry Symmetry is often too restrictive: equal marginal frequencies (λ A i = λ B i ) PROC GENMOD: Use the usual marginal effect parameters + symmetry: mosaic10g.sas 1 proc genmod data=women; 2 class LeftEye RightEye symmetry; 3 model Count = LeftEye RightEye symmetry / 4 dist=poisson link=log obstats residuals; 5 ods output obstats=obstats; Quasi-Symmetry, G2(3)=7.27 Low RightEye High 2 3 High 2 3 LeftEye Low SPIDA Michael Friendly

96 square Table 9: Summary of models fit to vision data Model G 2 df Pr(> G 2 ) AIC AIC - min(aic) Independence Linear*Linear Row+Column Effects Quasi-Independence Symmetry Quasi-Symmetry Ordinal Quasi-Symmetry Only the quasi-symmetry models provide an acceptable fit The ordinal quasi-symmetry model is most parsimonious SPIDA Michael Friendly

97 loglin References Bickel, P. J., Hammel, J. W., and O Connell, J. W. Sex bias in graduate admissions: Data from Berkeley. Science, 187: , Friendly, M. SAS System for Statistical Graphics. SAS Institute, Cary, NC, 1st edition, Friendly, M. Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89: , Friendly, M. Conceptual and visual models for categorical data. The American Statistician, 49: , Friendly, M. Extending mosaic displays: Marginal, conditional, and partial views of categorical data. Journal of Computational and Graphical Statistics, 8(3): , Friendly, M. Visualizing Categorical Data. SAS Institute, Cary, NC, Friendly, M. Corrgrams: Exploratory displays for correlation matrices. The American Statistician, 56(4): , Friendly, M. and Kwan, E. Effect ordering for data displays. Computational Statistics and Data Analysis, 43(4): , Hartigan, J. A. and Kleiner, B. Mosaics for contingency tables. In Eddy, W. F., editor, Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface, pp Springer-Verlag, New York, NY, Srole, L., Langner, T. S., Michael, S. T., Kirkpatrick, P., Opler, M. K., and Rennie, T. A. C. Mental Health in the Metropolis: The Midtown Manhattan Study. NYU Press, New York, SPIDA Michael Friendly

98 loglin Tufte, E. R. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, van der Heijden, P. G. M. and de Leeuw, J. Correspondence analysis used complementary to loglinear analysis. Psychometrika, 50: , SPIDA Michael Friendly

Applied Multivariate Analysis

Applied Multivariate Analysis Department of Mathematics and Statistics, University of Vaasa, Finland Spring 2017 Choosing Statistical Method 1 Choice an appropriate method 2 Cross-tabulation More advance analysis of frequency tables

More information

Loglinear and Logit Models for Contingency Tables

Loglinear and Logit Models for Contingency Tables Chapter 8 Loglinear and Logit Models for Contingency Tables Loglinear models comprise another special case of generalized linear models designed for contingency tables of frequencies. They are most easily

More information

Two-way contingency tables

Two-way contingency tables Chapter 4 Two-way contingency tables The analysis of two-way frequency tables concerns the association between two variables. A variety of specialized graphical displays help to visualize the pattern of

More information

ANALYSIS OF NOMINAL DATA MULTI-WAY CONTINGENCY TABLE

ANALYSIS OF NOMINAL DATA MULTI-WAY CONTINGENCY TABLE M A T H E M A T I C A L E C O N O M I C S No. 7(14) 2011 ANALYSIS OF NOMINAL DATA MULTI-WAY CONTINGENCY TABLE Abstract. Presented in this paper the method of graphical presentation of the relationship

More information

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975.

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975. Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975. SPSS Statistics were designed INTRODUCTION TO SPSS Objective About the

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

Michael Friendly. Categorical Data Analysis with Graphics

Michael Friendly. Categorical Data Analysis with Graphics -5 0 5 Sqrt(frequency) 10 15 20 25 30 35 40 April, 2003 SUGI 28 York University, Toronto, Michael Friendly Black Brown Red Blond 0 2 4 6 8 10 12 Number of males Left Eye Grade High

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 4, 2016 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

Version 3.6. Michael Friendly Psychology Department York University

Version 3.6. Michael Friendly Psychology Department York University User s Guide for MOSAICS Version 3.6 Michael Friendly Psychology Department York University Contents 1 Introduction 1 2 Installation Guide 2 2.1 How to obtain MOSAICS.... 3 2.2 Installing MOSAICS.......

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values Chapter 500 Introduction This procedure produces tables of frequency counts and percentages for categorical and continuous variables. This procedure serves as a summary reporting tool and is often used

More information

Modelling Proportions and Count Data

Modelling Proportions and Count Data Modelling Proportions and Count Data Rick White May 5, 2015 Outline Analysis of Count Data Binary Data Analysis Categorical Data Analysis Generalized Linear Models Questions Types of Data Continuous data:

More information

Categorical Data Analysis with Graphics. Michael Friendly. April, 2003 SUGI 28.

Categorical Data Analysis with Graphics. Michael Friendly.   April, 2003 SUGI 28. SUGI 8 Michael Friendly Color version of these slides: http://www.math.yorku.ca/scs/sugi/sugi8.pdf Logit plots, effect plots Diagnostic plots Logistic and logit regression Mosaic displays Mosaic matrices

More information

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

AND NUMERICAL SUMMARIES. Chapter 2

AND NUMERICAL SUMMARIES. Chapter 2 EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

Intermediate SAS: Statistics

Intermediate SAS: Statistics Intermediate SAS: Statistics OIT TSS 293-4444 oithelp@mail.wvu.edu oit.wvu.edu/training/classmat/sas/ Table of Contents Procedures... 2 Two-sample t-test:... 2 Paired differences t-test:... 2 Chi Square

More information

General Factorial Models

General Factorial Models In Chapter 8 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 34 It is possible to have many factors in a factorial experiment. In DDD we saw an example of a 3-factor study with ball size, height, and surface

More information

8. MINITAB COMMANDS WEEK-BY-WEEK

8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information

STATS PAD USER MANUAL

STATS PAD USER MANUAL STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11

More information

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or  me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

More information

UNIT 4. Research Methods in Business

UNIT 4. Research Methods in Business UNIT 4 Preparing Data for Analysis:- After data are obtained through questionnaires, interviews, observation or through secondary sources, they need to be edited. The blank responses, if any have to be

More information

SAS/STAT 13.1 User s Guide. The NESTED Procedure

SAS/STAT 13.1 User s Guide. The NESTED Procedure SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

General Factorial Models

General Factorial Models In Chapter 8 in Oehlert STAT:5201 Week 9 - Lecture 1 1 / 31 It is possible to have many factors in a factorial experiment. We saw some three-way factorials earlier in the DDD book (HW 1 with 3 factors:

More information

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement

More information

4 Displaying Multiway Tables

4 Displaying Multiway Tables 4 Displaying Multiway Tables An important subset of statistical data comes in the form of tables. Tables usually record the frequency or proportion of observations that fall into a particular category

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

Chapter Two: Descriptive Methods 1/50

Chapter Two: Descriptive Methods 1/50 Chapter Two: Descriptive Methods 1/50 2.1 Introduction 2/50 2.1 Introduction We previously said that descriptive statistics is made up of various techniques used to summarize the information contained

More information

Strategies for Modeling Two Categorical Variables with Multiple Category Choices

Strategies for Modeling Two Categorical Variables with Multiple Category Choices 003 Joint Statistical Meetings - Section on Survey Research Methods Strategies for Modeling Two Categorical Variables with Multiple Category Choices Christopher R. Bilder Department of Statistics, University

More information

POLS 205 Political Science as a Social Science. Analyzing Tables of Data

POLS 205 Political Science as a Social Science. Analyzing Tables of Data POLS 205 Political Science as a Social Science Analyzing Tables of Data Christopher Adolph University of Washington, Seattle May 17, 2010 Chris Adolph (UW) Tabular Data May 17, 2010 1 / 67 Review Inference

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked

Points Lines Connected points X-Y Scatter. X-Y Matrix Star Plot Histogram Box Plot. Bar Group Bar Stacked H-Bar Grouped H-Bar Stacked Plotting Menu: QCExpert Plotting Module graphs offers various tools for visualization of uni- and multivariate data. Settings and options in different types of graphs allow for modifications and customizations

More information

The NESTED Procedure (Chapter)

The NESTED Procedure (Chapter) SAS/STAT 9.3 User s Guide The NESTED Procedure (Chapter) SAS Documentation This document is an individual chapter from SAS/STAT 9.3 User s Guide. The correct bibliographic citation for the complete manual

More information

GRAPHICAL METHODS FOR CATEGORICAL DATA. Michael Friendly, York University. Seive diagrams

GRAPHICAL METHODS FOR CATEGORICAL DATA. Michael Friendly, York University. Seive diagrams GRAPHCAL METHODS FOR CATEGORCAL DATA Michael Friendly York University Abstract Statistical methods for categorical data such as loglinear models and logistic regression represent discrete analogs of the

More information

ARTIFICIAL INTELLIGENCE (CS 370D)

ARTIFICIAL INTELLIGENCE (CS 370D) Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-18) LEARNING FROM EXAMPLES DECISION TREES Outline 1- Introduction 2- know your data 3- Classification

More information

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users

Unit 1 Review of BIOSTATS 540 Practice Problems SOLUTIONS - Stata Users BIOSTATS 640 Spring 2018 Review of Introductory Biostatistics STATA solutions Page 1 of 13 Key Comments begin with an * Commands are in bold black I edited the output so that it appears here in blue Unit

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

Excel 2010 with XLSTAT

Excel 2010 with XLSTAT Excel 2010 with XLSTAT J E N N I F E R LE W I S PR I E S T L E Y, PH.D. Introduction to Excel 2010 with XLSTAT The layout for Excel 2010 is slightly different from the layout for Excel 2007. However, with

More information

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Feature Selection Using Modified-MCA Based Scoring Metric for Classification 2011 International Conference on Information Communication and Management IPCSIT vol.16 (2011) (2011) IACSIT Press, Singapore Feature Selection Using Modified-MCA Based Scoring Metric for Classification

More information

Inference for loglinear models (contd):

Inference for loglinear models (contd): Stat 504, Lecture 25 1 Inference for loglinear models (contd): Loglinear/Logit connection Intro to Graphical Models Stat 504, Lecture 25 2 Loglinear Models no distinction between response and explanatory

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 327 Geometric Regression Introduction Geometric regression is a special case of negative binomial regression in which the dispersion parameter is set to one. It is similar to regular multiple regression

More information

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95 A Statistical Analysis Macro Library in SAS Carl R. Haske, Ph.D., STATPROBE, nc., Ann Arbor, M Vivienne Ward, M.S., STATPROBE, nc., Ann Arbor, M ABSTRACT Statistical analysis plays a major role in pharmaceutical

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these

More information

Introduction to Mixed Models: Multivariate Regression

Introduction to Mixed Models: Multivariate Regression Introduction to Mixed Models: Multivariate Regression EPSY 905: Multivariate Analysis Spring 2016 Lecture #9 March 30, 2016 EPSY 905: Multivariate Regression via Path Analysis Today s Lecture Multivariate

More information

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC.

Mixed Effects Models. Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC. Mixed Effects Models Biljana Jonoska Stojkova Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC March 6, 2018 Resources for statistical assistance Department of Statistics

More information

book 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition

book 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition book 2014/5/6 15:21 page v #3 Contents List of figures List of tables Preface to the second edition Preface to the first edition xvii xix xxi xxiii 1 Data input and output 1 1.1 Input........................................

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010 THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

Hierarchical Clustering / Dendrograms

Hierarchical Clustering / Dendrograms Chapter 445 Hierarchical Clustering / Dendrograms Introduction The agglomerative hierarchical clustering algorithms available in this program module build a cluster hierarchy that is commonly displayed

More information

Box-Cox Transformation for Simple Linear Regression

Box-Cox Transformation for Simple Linear Regression Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are

More information

CS 4460 Intro. to Information Visualization Sep. 18, 2017 John Stasko

CS 4460 Intro. to Information Visualization Sep. 18, 2017 John Stasko Multivariate Visual Representations 1 CS 4460 Intro. to Information Visualization Sep. 18, 2017 John Stasko Learning Objectives For the following visualization techniques/systems, be able to describe each

More information

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM * Which directories are used for input files and output files? See menu-item "Options" and page 22 in the manual.

More information

Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Introduction

Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Introduction Graphical Presentation for Statistical Data (Relevant to AAT Examination Paper 4: Business Economics and Financial Mathematics) Y O Lam, SCOPE, City University of Hong Kong Introduction The most convenient

More information

Subset Selection in Multiple Regression

Subset Selection in Multiple Regression Chapter 307 Subset Selection in Multiple Regression Introduction Multiple regression analysis is documented in Chapter 305 Multiple Regression, so that information will not be repeated here. Refer to that

More information

SAS data statements and data: /*Factor A: angle Factor B: geometry Factor C: speed*/

SAS data statements and data: /*Factor A: angle Factor B: geometry Factor C: speed*/ STAT:5201 Applied Statistic II (Factorial with 3 factors as 2 3 design) Three-way ANOVA (Factorial with three factors) with replication Factor A: angle (low=0/high=1) Factor B: geometry (shape A=0/shape

More information

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska

Preprocessing Short Lecture Notes cse352. Professor Anita Wasilewska Preprocessing Short Lecture Notes cse352 Professor Anita Wasilewska Data Preprocessing Why preprocess the data? Data cleaning Data integration and transformation Data reduction Discretization and concept

More information

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

CS 237: Probability in Computing

CS 237: Probability in Computing CS 237: Probability in Computing Wayne Snyder Computer Science Department Boston University Lecture 25: Logistic Regression Motivation: Why Logistic Regression? Sigmoid functions the logit transformation

More information

Predict Outcomes and Reveal Relationships in Categorical Data

Predict Outcomes and Reveal Relationships in Categorical Data PASW Categories 18 Specifications Predict Outcomes and Reveal Relationships in Categorical Data Unleash the full potential of your data through predictive analysis, statistical learning, perceptual mapping,

More information

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research Liping Huang, Center for Home Care Policy and Research, Visiting Nurse Service of New York, NY, NY ABSTRACT The

More information

Zero-Inflated Poisson Regression

Zero-Inflated Poisson Regression Chapter 329 Zero-Inflated Poisson Regression Introduction The zero-inflated Poisson (ZIP) regression is used for count data that exhibit overdispersion and excess zeros. The data distribution combines

More information

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques.

Raw Data is data before it has been arranged in a useful manner or analyzed using statistical techniques. Section 2.1 - Introduction Graphs are commonly used to organize, summarize, and analyze collections of data. Using a graph to visually present a data set makes it easy to comprehend and to describe the

More information

Interactive Math Glossary Terms and Definitions

Interactive Math Glossary Terms and Definitions Terms and Definitions Absolute Value the magnitude of a number, or the distance from 0 on a real number line Addend any number or quantity being added addend + addend = sum Additive Property of Area the

More information

Hierarchical Generalized Linear Models

Hierarchical Generalized Linear Models Generalized Multilevel Linear Models Introduction to Multilevel Models Workshop University of Georgia: Institute for Interdisciplinary Research in Education and Human Development 07 Generalized Multilevel

More information

Analysis of Complex Survey Data with SAS

Analysis of Complex Survey Data with SAS ABSTRACT Analysis of Complex Survey Data with SAS Christine R. Wells, Ph.D., UCLA, Los Angeles, CA The differences between data collected via a complex sampling design and data collected via other methods

More information

Brief Guide on Using SPSS 10.0

Brief Guide on Using SPSS 10.0 Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new

More information

NOTES TO CONSIDER BEFORE ATTEMPTING EX 1A TYPES OF DATA

NOTES TO CONSIDER BEFORE ATTEMPTING EX 1A TYPES OF DATA NOTES TO CONSIDER BEFORE ATTEMPTING EX 1A TYPES OF DATA Statistics is concerned with scientific methods of collecting, recording, organising, summarising, presenting and analysing data from which future

More information

Using Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers

Using Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers Using Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers Why enhance GLM? Shortcomings of the linear modelling approach. GLM being

More information

LESSON 1: INTRODUCTION TO COUNTING

LESSON 1: INTRODUCTION TO COUNTING LESSON 1: INTRODUCTION TO COUNTING Counting problems usually refer to problems whose question begins with How many. Some of these problems may be very simple, others quite difficult. Throughout this course

More information

Downloaded from

Downloaded from UNIT 2 WHAT IS STATISTICS? Researchers deal with a large amount of data and have to draw dependable conclusions on the basis of data collected for the purpose. Statistics help the researchers in making

More information

Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table

Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table Q Cheat Sheets What to do when you cannot figure out how to use Q What to do when the data looks wrong Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

Introducing Categorical Data/Variables (pp )

Introducing Categorical Data/Variables (pp ) Notation: Means pencil-and-paper QUIZ Means coding QUIZ Definition: Feature Engineering (FE) = the process of transforming the data to an optimal representation for a given application. Scaling (see Chs.

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

User Services Spring 2008 OBJECTIVES Introduction Getting Help Instructors

User Services Spring 2008 OBJECTIVES  Introduction Getting Help  Instructors User Services Spring 2008 OBJECTIVES Use the Data Editor of SPSS 15.0 to to import data. Recode existing variables and compute new variables Use SPSS utilities and options Conduct basic statistical tests.

More information

SAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure

SAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure SAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure This document is an individual chapter from SAS/STAT 14.3 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute

More information

Biostat Methods STAT 5820/6910 Handout #4: Chi-square, Fisher s, and McNemar s Tests

Biostat Methods STAT 5820/6910 Handout #4: Chi-square, Fisher s, and McNemar s Tests Biostat Methods STAT 5820/6910 Handout #4: Chi-square, Fisher s, and McNemar s Tests Example 1: 152 patients were randomly assigned to 4 dose groups in a clinical study. During the course of the study,

More information

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242

Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-31-017 Outline Background Defining proximity Clustering methods Determining number of clusters Comparing two solutions Cluster analysis as unsupervised Learning

More information

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

IQR = number. summary: largest. = 2. Upper half: Q3 =

IQR = number. summary: largest. = 2. Upper half: Q3 = Step by step box plot Height in centimeters of players on the 003 Women s Worldd Cup soccer team. 157 1611 163 163 164 165 165 165 168 168 168 170 170 170 171 173 173 175 180 180 Determine the 5 number

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Advances in Visualizing Categorical Data Using the vcd, gnm and vcdextra Packages in R

Advances in Visualizing Categorical Data Using the vcd, gnm and vcdextra Packages in R intro vcd gnm 3D odds References Advances in Visualizing Categorical Data Using the vcd, gnm and vcdextra Packages in R Michael Friendly 1 Heather Turner 2 David Firth 2 Achim Zeileis 3 1 Psychology Department

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

ECLT 5810 Clustering

ECLT 5810 Clustering ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping

More information

Section 4 General Factorial Tutorials

Section 4 General Factorial Tutorials Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One

More information

1 RefresheR. Figure 1.1: Soy ice cream flavor preferences

1 RefresheR. Figure 1.1: Soy ice cream flavor preferences 1 RefresheR Figure 1.1: Soy ice cream flavor preferences 2 The Shape of Data Figure 2.1: Frequency distribution of number of carburetors in mtcars dataset Figure 2.2: Daily temperature measurements from

More information

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion

More information

MATH 117 Statistical Methods for Management I Chapter Two

MATH 117 Statistical Methods for Management I Chapter Two Jubail University College MATH 117 Statistical Methods for Management I Chapter Two There are a wide variety of ways to summarize, organize, and present data: I. Tables 1. Distribution Table (Categorical

More information

HYPERVARIATE DATA VISUALIZATION

HYPERVARIATE DATA VISUALIZATION HYPERVARIATE DATA VISUALIZATION Prof. Rahul C. Basole CS/MGT 8803-DV > January 25, 2017 Agenda Hypervariate Data Project Elevator Pitch Hypervariate Data (n > 3) Many well-known visualization techniques

More information

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview

D-Optimal Designs. Chapter 888. Introduction. D-Optimal Design Overview Chapter 888 Introduction This procedure generates D-optimal designs for multi-factor experiments with both quantitative and qualitative factors. The factors can have a mixed number of levels. For example,

More information

MINITAB 17 BASICS REFERENCE GUIDE

MINITAB 17 BASICS REFERENCE GUIDE MINITAB 17 BASICS REFERENCE GUIDE Dr. Nancy Pfenning September 2013 After starting MINITAB, you'll see a Session window above and a worksheet below. The Session window displays non-graphical output such

More information

Bar Charts and Frequency Distributions

Bar Charts and Frequency Distributions Bar Charts and Frequency Distributions Use to display the distribution of categorical (nominal or ordinal) variables. For the continuous (numeric) variables, see the page Histograms, Descriptive Stats

More information

Medoid Partitioning. Chapter 447. Introduction. Dissimilarities. Types of Cluster Variables. Interval Variables. Ordinal Variables.

Medoid Partitioning. Chapter 447. Introduction. Dissimilarities. Types of Cluster Variables. Interval Variables. Ordinal Variables. Chapter 447 Introduction The objective of cluster analysis is to partition a set of objects into two or more clusters such that objects within a cluster are similar and objects in different clusters are

More information

Mosaic displays for n-way tables

Mosaic displays for n-way tables Chapter 5 Mosaic displays for n-way tables Mosaic displays help to visualize the pattern of associations among variables in two-way and larger tables. Extensions of this technique can reveal partial associations,

More information