Categorical Data Analysis with Graphics. Michael Friendly. April, 2003 SUGI 28.


Michael Friendly, York University, Toronto <friendly@yorku.ca>
SUGI 28, April 2003
Color version of these slides: [link not preserved]

[Title-slide figures: hanging rootogram (Sqrt(frequency) vs. Number of males); sieve diagram of the unaided distant vision data (Right Eye Grade by Left Eye Grade); hair color by eye color display]

Outline
- Overview: Categorical Data and Graphics
- Methods for discrete distributions
  - Robust distribution plots
  - Hanging rootograms
- Methods for two-way frequency tables
  - Sieve diagrams
  - Fourfold displays
- Mosaic displays and loglinear models for n-way tables
  - Mosaic matrices
  - Mosaic displays
- Logistic and logit regression
  - Diagnostic plots
  - Logit plots, effect plots

Categorical Data Analysis
Methods of analysis for categorical data fall into two main categories:
- Non-parametric, randomization-based methods
  - make minimal assumptions
  - useful for hypothesis testing
  - SAS: PROC FREQ
    - Pearson chi-square
    - Fisher's exact test (for small expected frequencies)
    - Mantel-Haenszel tests (ordered categories: test for linear association)
- Model-based methods
  - must assume a random sample (possibly stratified)
  - useful for estimation purposes
  - greater flexibility; fitting specialized models (e.g., symmetry)
  - more suitable for multi-way tables
  - SAS: PROC LOGISTIC, CATMOD, GENMOD, INSIGHT (Fit YX)
    - estimate standard errors, covariances for model parameters
    - confidence intervals for parameters, predicted Pr{response}

Graphical Methods for Categorical Data
"If I can't picture it, I can't understand it." (Albert Einstein)
"Getting information from a table is like extracting sunlight from a cucumber." (Farquhar & Farquhar, 1891)

Tables vs. Graphs
- Tables are best suited for look-up: read off exact numbers
- Graphs are better for showing patterns, trends, anomalies, and for making comparisons
- Visual presentation as communication: what do you want to say?

Principles of Graphical Displays
Effect ordering (Friendly and Kwan, 2003)
- In tables and graphs, sort unordered factors according to the effects you want to see/show.
- Corrgrams: exploratory displays for correlation matrices (Friendly, 2002)

[Figures: corrgrams of the Auto data in alphabetical order and in principal-component order; variables Displa, Gratio, Hroom, Length, MPG, Price, Rep77, Rep78, Rseat, Trunk, Turn, Weight]

Effect ordering and highlighting for tables (Friendly)
[Table: Hair color - Eye color data, effect ordered (hair: Black, Brown, Red, Blond; eye: Brown, Hazel, Green, Blue); cell values not preserved]
[Table: Hair color - Eye color data, alpha ordered (hair: Blond, Black, Brown, Red; eye: Blue, Brown, Green, Hazel); cell values not preserved]
- Model: Independence, [Hair][Eye], $\chi^2(9) = 138.29$
- Color coding: cells are shaded by the size and sign of the residual (thresholds not preserved)
- n in each cell: marked according to whether n < expected or n > expected

Comparisons: make visual comparisons easy
- Visual grouping: connect with lines, make key comparisons contiguous
- Baselines: compare data to model against a line, preferably horizontal

[Figures: histogram with fitted curve (Frequency vs. Number of Occurrences) and hanging rootogram (Sqrt(frequency) vs. Number of Occurrences)]

Comparisons: make visual comparisons easy
- Small multiples (Tufte, 1983): combine stratified graphs into coherent displays

[Figure: mosaic matrix of the Berkeley data, panels for Admit (Admit, Reject), Gender, and Dept]

Visual metaphors
- Quantitative data: magnitude is shown by position along an axis
- Frequency data: count is shown by area

[Figures: fourfold display for a 2 x 2 table (Sex by Admit?); mosaic plot for a 3-way table, Model: (DeptGender)(Admit)]

Exploratory methods
- Minimal assumptions (like non-parametric methods)
- Show the data, not just summaries
- Help detect patterns, trends, anomalies, suggest hypotheses

Plots for model-based methods
- Residual plots: departures from model, omitted terms, ...
- Effect plots: estimated probabilities of response or log odds
- Diagnostic plots: influence, violation of assumptions

Goals
- Tomorrow's goal: dynamic, interactive graphics for categorical data
- Today's goal: take-home knowledge
- Practical power = Statistical power × Probability of Use
- VCD and SSSG: make these methods available and accessible in SAS

VCD Macros & SAS/IML programs
Macros, datasets available at: [link not preserved]

Discrete distributions
  DISTPLOT   Plots for discrete distributions
  GOODFIT    Goodness-of-fit for discrete distributions
  ORDPLOT    Ord plot for discrete distributions
  POISPLOT   Poissonness plot
  ROOTGRAM   Hanging rootograms

Two-way and n-way tables
  AGREE      Observer agreement chart
  CORRESP    Plot PROC CORRESP results
  FFOLD      Fourfold displays for 2 x 2 x k tables (macro)
  FOURFOLD   Fourfold displays for 2 x 2 x k tables (SAS/IML)
  SIEVE      Sieve diagrams (SAS/IML)
  MOSAIC     Mosaic displays (macro)
  MOSAICS    SAS/IML modules for mosaic displays
  MOSMAT     Mosaic matrices (macro)
  TABLE      Construct a grouped frequency table, with recoding
  TRIPLOT    Trilinear plots for n x 3 tables

Model-based methods
  ADDVAR     Added variable plots for logistic regression
  CATPLOT    Plot results from PROC CATMOD
  HALFNORM   Half-normal plots for generalized linear models
  INFLGLIM   Influence plots for generalized linear models
  INFLOGIS   Influence plots for logistic regression
  LOGODDS    Plot empirical logits and probabilities for binary data
  POWERLOG   Power calculations for logistic regression
  POWERRxC   Power calculations for two-way frequency table
  POWER2x2   Power calculations for a 2 x 2 table
  ROBUST     Robust fitting for linear models

Utility macros
  DUMMY      Create dummy variables
  LAGS       Calculate lagged frequencies for sequential analysis
  PANELS     Arrange multiple plots in a panelled display
  SORT       Sort a dataset by the value of a statistic or formatted value
  Utility    Graphics utility macros: BARS, EQUATE, GDISPLA, GENSYM, GSKIP, LABEL, POINTS, PSCALE

VCD Archive (vcdprog.zip) available to purchasers at: support.sas.com/publishing/bbu/5657_sample.html

Discrete distributions
Counts of occurrences: accidents, words in text, blood cells with some characteristic.
Data: basic outcome value, k, k = 0, 1, ..., and number of observations, n_k, with that value.

Example: distributions of key marker words ("from", "may", "whilst", ...) in Federalist Papers by James Madison, e.g., blocks of words with "may":

  Occurrences (k)     0    1    2    3    4    5    6
  Blocks (n_k)      156   63   29    8    4    1    1

Example: Saxony families with 12 children having k = 0, 1, ..., 12 sons.
[Table of k and n_k: values not preserved]

Discrete distributions: Questions
- What process gave rise to the distribution?
- Form of distribution: uniform, binomial, Poisson, negative binomial, geometric, etc.?
- Estimate parameters
- Visualize goodness of fit
For example:
- Federalist Papers: might expect a Poisson(λ) distribution.
- Families in Saxony: might expect a Bin(n, p) distribution with n = 12. Perhaps p = .5 as well.

Fitting and graphing discrete distributions
VCD methods to fit, visualize, and diagnose discrete distributions:
- Fitting: the GOODFIT macro fits uniform, binomial, Poisson, negative binomial, geometric, and logarithmic series distributions (or any specified multinomial)
- Hanging rootograms: sensitively assess departures between observed and fitted counts (ROOTGRAM macro)
- Ord plots: diagnose the form of a discrete distribution (ORDPLOT macro)
- Poissonness plots: robust fitting and diagnostic plots for the Poisson (POISPLOT macro)
- Robust distribution plots (DISTPLOT macro)

GOODFIT macro: Fitting discrete distributions
E.g., try fitting a Poisson model (madfit.sas):

    title "Instances of 'may' in Federalist papers";
    data madison;
       input count blocks;
       label count='Number of Occurrences'
             blocks='Blocks of Text';
    datalines;
    0 156
    1  63
    2  29
    3   8
    4   4
    5   1
    6   1
    ;
    %goodfit(data=madison, var=count, freq=blocks,
             dist=poisson);

Fitting discrete distributions
The GOODFIT macro gives a table of observed and fitted frequencies, Pearson χ² residuals (CHI) and likelihood-ratio deviance residuals (DEV).

[Output table "Instances of may in Federalist papers": columns COUNT, BLOCKS, PHAT, EXP, CHI, DEV; values not preserved]

In addition, it provides the overall goodness-of-fit tests:

    Goodness-of-fit test for data set MADISON
    Analysis variable:     COUNT  Number of Occurrences
    Distribution:          POISSON
    Estimated Parameters:  lambda = 0.6565
    Degrees of freedom = 5
    [Pearson chi-square, likelihood ratio G2, and their p-values: not preserved]

The Poisson model does not fit! Why?

What's wrong with histograms?
Discrete distributions are often graphed as histograms, with a theoretical fitted distribution superimposed.

    %goodfit(data=madison, var=count, freq=blocks, dist=poisson);

Problems:
- largest frequencies dominate the display
- must assess deviations vs. a curve

[Figure: histogram with fitted Poisson curve, Frequency vs. Number of Occurrences]

Hanging rootograms
Tukey (1972, 1977): hang and root them.
- plot √freq: smaller frequencies are emphasized
- shift histogram bars to the fitted curve: judge deviations vs. a horizontal line

    %goodfit(data=madison, var=count, freq=blocks, dist=poisson, out=fit);
    %rootgram(data=fit, var=count, obs=blocks);

[Figure: hanging rootogram, Sqrt(frequency) vs. Number of Occurrences]
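As a rough cross-check of the quantities discussed above (PHAT, EXP, CHI, DEV, and the hanging-rootogram bars), here is a minimal Python sketch. It is not the GOODFIT or ROOTGRAM macro. The counts are assumed to be the widely published Federalist "may" data, which reproduce the slide's lambda = 0.6565, and the deviance residual uses the usual signed Poisson form, which is an assumption about what the macro reports.

```python
import math

# ASSUMPTION: the widely published Federalist "may" counts for k = 0..6.
k_values = [0, 1, 2, 3, 4, 5, 6]
n_k = [156, 63, 29, 8, 4, 1, 1]


def fit_poisson(k_values, n_k):
    """Fit a Poisson by maximum likelihood and return per-cell diagnostics."""
    total = sum(n_k)
    lam = sum(k * n for k, n in zip(k_values, n_k)) / total  # MLE = sample mean
    rows = []
    for k, n in zip(k_values, n_k):
        phat = math.exp(-lam) * lam ** k / math.factorial(k)
        expected = total * phat
        chi = (n - expected) / math.sqrt(expected)  # Pearson residual
        # Signed square root of the Poisson deviance contribution.
        log_term = n * math.log(n / expected) if n > 0 else 0.0
        dev_sq = 2.0 * (log_term - (n - expected))
        dev = math.copysign(math.sqrt(max(dev_sq, 0.0)), n - expected)
        # Hanging rootogram: each bar hangs from sqrt(expected) and has
        # length sqrt(observed); its bottom is judged against the zero line.
        bar_bottom = math.sqrt(expected) - math.sqrt(n)
        rows.append({"k": k, "obs": n, "phat": phat, "exp": expected,
                     "chi": chi, "dev": dev, "bar_bottom": bar_bottom})
    return lam, rows


lam, rows = fit_poisson(k_values, n_k)
print(f"lambda = {lam:.4f}")
for r in rows:
    print(f"k={r['k']}  obs={r['obs']:4d}  exp={r['exp']:8.3f}  "
          f"chi={r['chi']:7.3f}  dev={r['dev']:7.3f}  bottom={r['bar_bottom']:7.3f}")
```

Bars whose bottoms fall well away from zero mark the cells where the Poisson fit is poor, which is the comparison against a horizontal line that the rootogram is designed to make easy.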

Ord plots: Diagnose form of discrete distribution
How to tell which discrete distributions are likely candidates?
Ord (1967): for each of the Poisson, binomial, negative binomial, and logarithmic series distributions,
- a plot of $k\,p_k / p_{k-1}$ against $k$ is linear
- the signs of the intercept and slope determine the form and give rough estimates of the parameters

  Slope (b)   Intercept (a)   Distribution (parameter)   Parameter estimate
     0             +          Poisson (λ)                λ = a
     -             +          Binomial (n, p)            p = b/(b - 1)
     +             +          Neg. binomial (n, p)       p = 1 - b
     +             -          Log. series (θ)            θ = b, θ = -a

Fit the line by WLS, using $\sqrt{n_k - 1}$ as weights.

Ord plots: ORDPLOT macro

    %ordplot(data=madison, count=count, freq=blocks);

- Diagnoses the distribution as negative binomial
- Estimates $\hat{p} = 0.576$

[Figure: Ord plot, Frequency Ratio (k n(k) / n(k-1)) vs. Occurrences of may; annotated with slope, intercept, type: Negative binomial, and the estimate of p]

Ord plots: Other distributions
- Butterfly species collected in Malaya: diagnosed as logarithmic series
- Families in Saxony: diagnosed as binomial

[Figures: Ord plots, Frequency Ratio (k n(k) / n(k-1)) vs. Number collected and vs. Number of males; slope, intercept, and parameter values not preserved]

Robust distribution plots: Poisson
- Ord plots lack robustness: one discrepant frequency, $n_k$, affects the points for both $k$ and $k + 1$
- Robust plots for the Poisson distribution (Hoaglin and Tukey, 1985)
  - For the Poisson, plot the count metameter $\phi(n_k) = \log_e(k!\, n_k / N)$ vs. $k$
  - A linear relation indicates a Poisson distribution; the slope estimates $\log \lambda$, so $\exp(\text{slope})$ gives $\hat{\lambda}$
  - CI for points, diagnostic (influence) plot
  - POISPLOT macro
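The Ord-plot diagnosis described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the ORDPLOT macro: the data are assumed to be the standard Federalist "may" counts, the weights are taken as sqrt(n_k - 1), and the tolerance `tol` used to decide when a slope or intercept counts as "zero" is a hypothetical choice.

```python
import math

# ASSUMPTION: the widely published Federalist "may" counts for k = 0..6.
k_values = [0, 1, 2, 3, 4, 5, 6]
n_k = [156, 63, 29, 8, 4, 1, 1]


def ord_fit(k_values, n_k):
    """Weighted least squares fit of k * n_k / n_(k-1) on k."""
    points = []
    for i in range(1, len(n_k)):
        if n_k[i] > 0 and n_k[i - 1] > 0:
            k = k_values[i]
            ratio = k * n_k[i] / n_k[i - 1]
            weight = math.sqrt(n_k[i] - 1)  # assumed weighting
            points.append((k, ratio, weight))
    sw = sum(w for _, _, w in points)
    swk = sum(w * k for k, _, w in points)
    swy = sum(w * y for _, y, w in points)
    swkk = sum(w * k * k for k, _, w in points)
    swky = sum(w * k * y for k, y, w in points)
    slope = (sw * swky - swk * swy) / (sw * swkk - swk ** 2)
    intercept = (swy - slope * swk) / sw
    return intercept, slope


def diagnose(intercept, slope, tol=0.1):
    """Map the signs of (intercept, slope) to a distribution, per Ord's table."""
    if abs(slope) < tol:
        return "Poisson", {"lambda": intercept}
    if slope < 0:
        return "binomial", {"p": slope / (slope - 1)}
    if intercept > -tol:
        return "negative binomial", {"p": 1 - slope}
    return "logarithmic series", {"theta": slope}


intercept, slope = ord_fit(k_values, n_k)
kind, estimate = diagnose(intercept, slope)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(kind, estimate)
```

With these assumptions the intercept is slightly negative but within the tolerance, so the sketch reports a negative binomial, in line with the diagnosis on the slide.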

POISPLOT macro: example

    title "Instances of 'may' in Federalist papers";
    data madison;
       input count blocks;
       label count='Number of Occurrences'
             blocks='Blocks of Text';
    datalines;
    0 156
    1  63
    2  29
    3   8
    4   4
    5   1
    6   1
    ;
    %poisplot(data=madison, count=count, freq=blocks);

POISPLOT macro: output
- Poissonness plot: a curvilinear relation indicates that the distribution is not Poisson
- Influence plot for the change in λ

[Figures: Poissonness plot, Poisson metameter ln(k! n(k) / N) vs. Number of Occurrences, annotated with slope, intercept, the mean (lambda), and exp(slope); influence plot, Parameter change vs. Scaled Leverage]

Generalized robust distribution plots
- Other distributions: analogous plots of a suitable count metameter, $\phi(n_k)$, vs. $k$
- DISTPLOT macro
- A linear relation indicates the correct distribution; the slope gives parameter estimates
- CIs reflect the variability of the individual counts, $n_k$

    %distplot(data=madison, count=count, freq=blocks, dist=negbin);

[Figure: negative binomial distribution plot, Count metameter vs. Number of Occurrences; annotated values of slope (b), intercept (a), and the estimates n = a/log(p) and p = 1 - e^b are not preserved]

Contingency tables
Two-way tables
- 2 x 2 tables: visualize odds ratio (FFOLD macro)
- 2 x 2 x k tables: homogeneity of association (FFOLD macro)
- r x 3 tables: trilinear plots (TRIPLOT macro)
- r x c tables: visualize association (SIEVE program)
- r x c tables: visualize association (MOSAIC macro)
- Square r x r tables: visualize agreement (AGREE program)
n-way tables
- Visualize loglinear structure (MOSMAT macro)
- Visualize conditional association (MOSMAT macro)
- Visualize pairwise association (MOSMAT macro)
- Test & visualize partial association (MOSAIC macro)
- Fit loglinear models, visualize lack-of-fit (MOSAIC macro)
- Correspondence analysis and MCA (CORRESP macro)
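The Poissonness plot from the POISPLOT slides on this page comes down to a simple calculation, sketched here in Python. It is a simplified illustration, not the POISPLOT macro: it uses the raw metameter ln(k! n_k / N) with an ordinary least squares line, leaves out the Hoaglin-Tukey count adjustments and confidence intervals, and assumes the standard Federalist "may" counts.

```python
import math

# ASSUMPTION: the widely published Federalist "may" counts for k = 0..6.
k_values = [0, 1, 2, 3, 4, 5, 6]
n_k = [156, 63, 29, 8, 4, 1, 1]


def poissonness(k_values, n_k):
    """Return metameter points and the OLS line of phi(n_k) on k."""
    total = sum(n_k)
    points = [(k, math.log(math.factorial(k) * n / total))
              for k, n in zip(k_values, n_k) if n > 0]
    mean_k = sum(k for k, _ in points) / len(points)
    mean_phi = sum(phi for _, phi in points) / len(points)
    sxy = sum((k - mean_k) * (phi - mean_phi) for k, phi in points)
    sxx = sum((k - mean_k) ** 2 for k, _ in points)
    slope = sxy / sxx
    intercept = mean_phi - slope * mean_k
    return points, intercept, slope


points, intercept, slope = poissonness(k_values, n_k)
mean_count = sum(k * n for k, n in zip(k_values, n_k)) / sum(n_k)

for k, phi in points:
    print(f"k={k}  metameter={phi:7.3f}")
print(f"slope = {slope:.3f}, exp(slope) = {math.exp(slope):.3f}, mean = {mean_count:.4f}")
```

For a true Poisson sample the points fall near a line and exp(slope) is close to the sample mean. Here the two estimates of λ disagree substantially and the points bend, which is the curvilinear pattern the slide reads as "not Poisson".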

Methods for 2 x 2 tables
Bickel et al. (1975): data on admissions to graduate departments at U. C. Berkeley in 1973. Aggregate data for the six largest departments:

  Table: Admissions to Berkeley graduate programs
             Admitted   Rejected   Total   % Admitted
  Males        1198       1493      2691      44.5
  Females       557       1278      1835      30.4
  Total        1755       2771      4526      38.8

Evidence for gender bias?
- Both the likelihood-ratio $G^2(1)$ and Pearson $\chi^2(1)$ statistics exceed 90, $p < .0001$
- Odds ratio, $\theta = \mathrm{Odds}(\text{Admit} \mid \text{Male}) / \mathrm{Odds}(\text{Admit} \mid \text{Female}) = 1.84$
- The odds of admission for males are 84% higher than for females.

Standard analysis: PROC FREQ

    proc freq data=berkeley;
       weight freq;
       tables gender*admit / chisq;
    run;

Output:
[Statistics for Table of gender by admit: Chi-Square, Likelihood Ratio Chi-Square, Continuity Adj. Chi-Square, Mantel-Haenszel Chi-Square (each with DF, Value, Prob < .0001) and Phi Coefficient; numeric values not preserved]

How to visualize and interpret?

Fourfold displays for 2 x 2 tables
- Quarter circles: radius ~ $\sqrt{n_{ij}}$, so area ~ frequency
- Independence: adjoining quadrants approximately align
- Odds ratio: ratio of areas of diagonally opposite cells
- Confidence rings: visual test of $H_0: \theta = 1$, which holds when adjoining rings overlap

[Figure: fourfold display of Sex (Male, Female) by Admit? (Yes, No) for the aggregate data. Confidence rings do not overlap: $\theta \neq 1$]

Fourfold displays for 2 x 2 x k tables
- The data in the table above had been pooled over departments
- Stratified analysis: one fourfold display for each department
- Each 2 x 2 table is standardized to equate marginal frequencies
- Shading: highlights departments for which $H_a: \theta_i \neq 1$

[Figure: fourfold displays of Sex by Admit? for Departments A to F]

Only one department (A) shows association; $\theta_A = 0.349$, so women are $(0.349)^{-1} = 2.86$ times as likely as men to be admitted.
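The odds ratios and the overlap test behind the confidence rings can be checked numerically. The following Python sketch is an illustration only: the cell counts are assumed to be the standard published Berkeley admissions figures, and the interval is the usual Wald interval on the log odds ratio, which is not necessarily how the FOURFOLD program constructs its rings.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]] with a Wald CI."""
    theta = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(theta) - z * se_log)
    upper = math.exp(math.log(theta) + z * se_log)
    return theta, lower, upper


# ASSUMPTION: standard published counts, rows = (Male, Female),
# columns = (Admitted, Rejected).
aggregate = odds_ratio(1198, 1493, 557, 1278)
dept_a = odds_ratio(512, 313, 89, 19)

for label, (theta, lower, upper) in [("Aggregate", aggregate), ("Dept A", dept_a)]:
    verdict = "excludes 1" if lower > 1 or upper < 1 else "includes 1"
    print(f"{label}: theta = {theta:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), {verdict}")
```

An interval that excludes 1 corresponds to confidence rings that do not overlap. Note the reversal of direction: the aggregate odds ratio is above 1 while the Department A odds ratio is well below 1.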

What happened here?
Simpson's paradox: aggregate data are misleading because they falsely assume that men and women apply equally in each field. But:
- There are large differences in admission rates across departments.
- Men and women apply to these departments differentially.
- Women applied in large numbers to departments with low admission rates.
(This ignores the possibility of structural bias against women: differential funding of fields to which women are more likely to apply.)
Other graphical methods can show these effects.

The FOURFOLD program and the FFOLD macro
- The FOURFOLD program is written in SAS/IML.
- The FFOLD macro provides a simpler interface.
- Printed output: (a) significance tests for individual odds ratios, (b) tests of homogeneity of association (here, over departments), and (c) conditional association (controlling for department).

Plot by department (berk4f.sas):

    %include catdata(berkeley);
    %ffold(data=berkeley,
           var=admit Gender,   /* panel variables                        */
           by=dept,            /* stratify by dept                       */
           down=, across=,     /* panel arrangement (values not preserved) */
           htext=);            /* font size (value not preserved)        */

Aggregate data: first sum over departments, using the TABLE macro:

    %table(data=berkeley, out=berk,
           var=admit Gender,   /* omit dept          */
           weight=count,       /* frequency variable */
           order=data);
    %ffold(data=berk, var=admit Gender);

Two-way frequency tables: Sieve diagrams
- count ~ area
- When the row and column variables are independent, $n_{ij} \approx \hat{m}_{ij} \propto n_{i+} n_{+j}$
- Each cell can then be represented as a rectangle, with area = height × width ~ frequency, $n_{ij}$

[Figure: expected frequencies, Hair Eye Color Data; Hair Color (Black, Brown, Red, Blond) by Eye Color (Brown, Blue, Hazel, Green)]

Sieve diagrams
- Height/width ~ marginal frequencies, $n_{i+}$, $n_{+j}$, so area ~ expected frequency, $\hat{m}_{ij} \propto n_{i+} n_{+j}$
- Shading ~ observed frequency, $n_{ij}$; color: $\mathrm{sign}(n_{ij} - \hat{m}_{ij})$
- Independence: shown when the density of shading is uniform

[Figure: sieve diagram, Hair Color (Black, Brown, Red, Blond) by Eye Color (Brown, Blue, Hazel, Green)]
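The quantities a sieve diagram encodes (expected frequencies under independence as areas, and the sign of observed minus expected as color) can be computed directly. This Python sketch is illustrative only and is not the SIEVE program; the cell counts are assumed to be the standard published hair color by eye color data, which reproduce the independence chi-square of about 138.3 on 9 df.

```python
# ASSUMPTION: standard published hair/eye color counts (n = 592).
hair = ["Black", "Brown", "Red", "Blond"]
eye = ["Brown", "Blue", "Hazel", "Green"]
observed = [          # rows = eye color, columns = hair color
    [68, 119, 26, 7],
    [20, 84, 17, 94],
    [15, 54, 14, 10],
    [5, 29, 14, 16],
]


def independence_fit(table):
    """Expected counts m_ij = n_i+ * n_+j / n and Pearson chi-square."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_square = sum((o - e) ** 2 / e
                     for obs_row, exp_row in zip(table, expected)
                     for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi_square, df


expected, chi_square, df = independence_fit(observed)
print(f"chi-square({df}) = {chi_square:.2f}")
for name, obs_row, exp_row in zip(eye, observed, expected):
    signs = ["+" if o > e else "-" for o, e in zip(obs_row, exp_row)]
    print(f"{name:6s}", " ".join(f"{e:7.1f}{s}" for e, s in zip(exp_row, signs)))
```

Cells marked "+" would be drawn with denser cross-hatching than their area suggests (for example blue eyes with blond hair), and cells marked "-" with sparser cross-hatching.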

Sieve diagrams
Effect ordering: reorder rows/columns to make the pattern coherent.

[Figure: sieve diagram, Hair Color (Black, Brown, Red, Blond) by Eye Color (Brown, Hazel, Green, Blue)]

Sieve diagrams
Vision classification data for 7477 women.

[Figure: sieve diagram, Unaided distant vision data; Right Eye Grade (High to Low) by Left Eye Grade (High to Low)]

Sieve diagrams: Example (sieve.sas)

    proc iml;
    %include iml(sieve);
    *-- frequency table;
    tab = { , , , };          /* 4 x 4 cell counts not preserved */
    *-- variable and level names;
    vnames = {'Right Eye Grade' 'Left Eye Grade'};
    lnames = {'High' '2' '3' 'Low',
              'High' '2' '3' 'Low'};
    title  = 'Unaided distant vision data';
    *-- Global options;
    font = 'hwpsl009';
    run sieve(tab, vnames, lnames, title);
    quit;

Mosaic displays and Log-linear Models
Hartigan and Kleiner (1981), Friendly (1994, 1999):
- Width ~ one set of marginals, $n_{i+}$
- Height ~ relative proportions of the other variable, $p_{j|i} = n_{ij} / n_{i+}$
- So area ~ frequency, $n_{ij} = n_{i+}\, p_{j|i}$

[Figure: mosaic of Gender by Admit (Admitted, Rejected), Model: (Gender)(Admit), shaded by standardized residuals]

Mosaic displays: shading
- Shading: sign and magnitude of the Pearson $\chi^2$ residual, $d_{ij} = (n_{ij} - \hat{m}_{ij}) / \sqrt{\hat{m}_{ij}}$ (or the L.R. $G^2$ residual)
- Sign: negative in red; positive in blue
- Magnitude: intensity of shading, $|d_{ij}| > 0, 2, 4, \ldots$
- Independence: rows approximately align, or cells are empty!
- E.g., aggregate Berkeley data, independence model:

[Figure: mosaic of Gender by Admit (Admitted, Rejected), Model: (Gender)(Admit), shaded by standardized residuals]

Mosaic displays: Departments by Gender
- Did departments differ in the total number of applicants?
- Did men and women apply differentially to departments?
- Model [Dept][Gender]: $G^2(5)$ [value not preserved]
- Note: departments are ordered A to F by overall rate of admission.

[Figure: mosaic, Model: (Dept)(Gender)]

Mosaic displays for multiway tables
- Generalizes to n-way tables: divide cells recursively
- Can fit any log-linear model (e.g., 3-way)

  Table: Log-linear Models for Three-Way Tables
  Model                        Model symbol      Independence interpretation
  Mutual independence          [A][B][C]         A ⊥ B ⊥ C
  Joint independence           [AB][C]           (A B) ⊥ C
  Conditional independence     [AC][BC]          (A ⊥ B) | C
  All two-way associations     [AB][AC][BC]      (none)
  Saturated model              [ABC]             (none)

e.g., the model for conditional independence, A ⊥ C | B:

$$[AB][BC] \equiv \log m_{ijk} = \mu + \lambda_i^A + \lambda_j^B + \lambda_k^C + \lambda_{ij}^{AB} + \lambda_{jk}^{BC}$$

Each mosaic shows:
- DATA (size of tiles)
- (some) marginal frequencies (spacing: visual grouping)
- RESIDUALS (shading): what associations have been omitted?

E.g., joint independence (null model, Admit as response) [$G^2(11) = 877.1$]:

[Figure: mosaic, Model: (DeptGender)(Admit); Admitted, Rejected]
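To make the conditional independence model in the table above concrete, the sketch below fits Admit ⊥ Gender | Dept to the Berkeley data in Python. Under this model the fitted values within each department are those of ordinary two-way independence, so the overall likelihood-ratio statistic is the sum of six per-department statistics. The counts are assumed to be the standard published Berkeley admissions data; this is an illustration, not the MOSAIC macro's fitting code.

```python
import math

# ASSUMPTION: standard published counts per department,
# ((male admitted, male rejected), (female admitted, female rejected)).
berkeley = {
    "A": ((512, 313), (89, 19)),
    "B": ((353, 207), (17, 8)),
    "C": ((120, 205), (202, 391)),
    "D": ((138, 279), (131, 244)),
    "E": ((53, 138), (94, 299)),
    "F": ((22, 351), (24, 317)),
}


def g2_independence(table):
    """Likelihood-ratio G^2 for independence in a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / expected)
    return g2


g2_by_dept = {dept: g2_independence(table) for dept, table in berkeley.items()}
total_g2 = sum(g2_by_dept.values())
df = len(berkeley) * (2 - 1) * (2 - 1)  # one df per department

for dept, g2 in g2_by_dept.items():
    print(f"Dept {dept}: G2(1) = {g2:6.2f}")
print(f"Total: G2({df}) = {total_g2:.2f}")
```

The per-department breakdown shows where the lack of fit comes from: nearly all of the total is contributed by a single department.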

Mosaic displays for multiway tables
Conditional independence: e.g., add the [Dept Admit] association.
- Model: (DeptGender)(DeptAdmit)
- Fits poorly overall ($G^2(6) = 21.7$)
- But, only in Department A!

[Figure: mosaic, Model: (DeptGender)(DeptAdmit); Admitted, Rejected]

Visual fitting:
- The pattern of lack-of-fit (residuals) suggests a "better" model: smaller residuals
- "Cleaning the mosaic": a "better" model gives empty cells
- Best done interactively!

Software for Mosaic Displays
Demonstration web applet:
- Runs the current version of mosaics via a cgi-script
- Can run sample data, upload a data file, or enter data in a form
- Choose model fitting and display options (not all supported)
Documentation & software: [link not preserved]
Examples: many in VCD and on the web site
SAS/IML modules: mosaics.sas program
- Most flexible
- Enter the frequency table directly in SAS/IML, or read it from a SAS dataset
- Select, collapse, reorder, re-label table levels using SAS/IML statements
- Specify structural 0s, fit specialized models (e.g., quasi-independence)
- Interface to models fit using PROC GENMOD

Software for Mosaic Displays
Macro interface: mosaic macro, table macro, mosmat macro
- mosaic macro
  - Easiest to use
  - Direct input from a SAS dataset
  - No knowledge of SAS/IML required
  - Reorder table variables; collapse, reorder table levels with the table macro
  - Convenient interface to partial mosaics (BY=)
- table macro
  - Create a frequency table from raw data
  - Collapse, reorder table categories
  - Re-code table categories using SAS formats, e.g., 1='Male' 2='Female'
- mosmat macro
  - Mosaic matrices: analog of the scatterplot matrix (Friendly, 1999)

Software for Mosaic Displays
Other implementations:
- JMP and SAS/INSIGHT both provide rudimentary mosaic displays (two-way only, with no interface to model-fitting engines; shame!).
- The R-Project now provides the vcd package, implementing most of the graphical methods from VCD.
- Truly interactive mosaic displays have been implemented in:
  - Lisp-Stat (ViSta)
  - Java (Mondrian)

mosaic macro example: Berkeley data (berkeley.sas)

    title 'Berkeley Admissions data';
    proc format;
       value admit 1="Admitted" 0="Rejected";   /* numeric codes assumed; digits lost in transcript */
       value dept  1="A" 2="B" 3="C" 4="D" 5="E" 6="F";
       value $sex  'M'='Male' 'F'='Female';
    data berkeley;
       do dept = 1 to 6;
          do gender = 'M', 'F';
             do admit = 1, 0;
                input freq @@;                  /* variable name assumed from weight=freq below */
                output;
       end; end; end;
    /* Rows are Dept A to F.  Columns:  -- Male --    - Female -  */
    /*                                  Admit  Rej    Admit  Rej  */
    datalines;
    512  313    89   19
    353  207    17    8
    120  205   202  391
    138  279   131  244
     53  138    94  299
     22  351    24  317
    ;

mosaic macro example: Berkeley data (mosaic9m.sas)

    goptions hsize=7in vsize=7in;
    %include catdata(berkeley);
    *-- apply character formats to numeric table variables;
    %table(data=berkeley,
           var=admit Gender Dept,
           weight=freq,
           char=y, format=admit admit. gender $sex. dept dept.,
           order=data, out=berkeley);
    %mosaic(data=berkeley,
            vorder=dept Gender Admit,   /* reorder variables   */
            plots=2:3,                  /* which plots?        */
            fittype=joint,              /* fit joint indep.    */
            split=H V V, htext=);       /* options (htext value not preserved) */

mosaic macro example: Berkeley data
- Two-way, Dept. by Gender. Model: (Dept)(Gender)
- Three-way, Dept. by Gender by Admit. Model: (DeptGender)(Admit)

[Figures: the two mosaics produced by plots=2:3; Admitted, Rejected]

SAS/IML example (mosaic9.sas)

    proc iml;
       *-- Berkeley Admissions data;
       dim = {2 2 6};
       vnames = {"Admit" "Gender" "Dept"};      /* var names   */
       lnames = {                               /* level names */
          "Admitted" "Rejected" " " " " " " " ",
          "Male" "Female" " " " " " " " ",
          "A" "B" "C" "D" "E" "F"};
       /*        Admit Not     Admit Not  */
       table = { 512 313,       89  19,
                 353 207,       17   8,
                 120 205,      202 391,
                 138 279,      131 244,
                  53 138,       94 299,
                  22 351,       24 317};
       reset storage=mosaic.mosaic;
       load module=_all_;          *-- Load mosaic modules;
       title = 'Model: &MODEL';
       plots = 2:3;
       fittype = 'joint';
       run mosaic(dim, table, vnames, lnames, plots, title);
    quit;

mosmat macro: Mosaic matrices

    %include catdata(berkeley);
    %mosmat(data=berkeley,
            vorder=admit Gender Dept, sort=no);

[Figure: mosaic matrix of Admit (Admit, Reject), Gender, and Dept]

Logit models
For a binary response, each loglinear model is equivalent to a logit model (logistic regression, with categorical predictors).

Admit ⊥ Gender | Dept (conditional independence):

$$\log m_{ijk} = \mu + \lambda_i^A + \lambda_j^D + \lambda_k^G + \lambda_{ij}^{AD} + \lambda_{jk}^{DG}$$
$$\equiv \quad L_{ij} = \log(m_{1ij} / m_{2ij}) = \alpha + \beta_i^{Dept}$$

Admit ⊥ Gender | Dept, except for Dept. A:

$$\log m_{ijk} = \mu + \lambda_i^A + \lambda_j^D + \lambda_k^G + \lambda_{ij}^{AD} + \lambda_{jk}^{DG} + \delta_{j=1} \lambda_{ik}^{AG}$$
$$\equiv \quad L_{ij} = \log(m_{1ij} / m_{2ij}) = \alpha + \beta_i^{Dept} + \delta_{j=1} \beta^{Gender}$$

where
- $L_{ij} = \log(m_{1ij} / m_{2ij})$: log odds of admission, compared for males vs. females
- $\beta_i^{Dept}$: effect of department on admissions
- $\delta_{j=1} \beta^{Gender}$: effect of gender in Dept. A

Logit models
Fitting procedures:
- PROC CATMOD
- PROC LOGISTIC
- PROC GENMOD / dist=poisson
- SAS/INSIGHT (Fit Y X): Options, Distribution, poisson
Visualization procedures:
- CATPLOT macro: plot predicted, observed log odds from CATMOD
- INFLGLIM macro: influence plots for generalized linear models
- HALFNORM macro: half-normal plot of residuals for generalized linear models

Plots for logit models
Fit: PROC CATMOD; plot: CATPLOT macro
Admit ⊥ Gender | Dept, except for Dept. A

    proc catmod order=data data=berkeley;
       ...
       response / out=predict;
       model admit = dept deptag / ml;
    %catplot(data=predict, xc=dept, class=gender,
             type=function, z=1.96, legend=legend);

[Figure: logit(Admit) = Dept DeptA*Gender; Log Odds (Admitted) vs. Department, by Gender (Female, Male)]

Plots for logit models
Fit: PROC CATMOD; plot: CATPLOT macro
Admit ⊥ Gender | Dept, except for Dept. A (catberk6.sas)

    %include catdata(berkeley);
    data berkeley;
       set berkeley;
       *-- Dummy variable for Gender in Dept A;
       deptag = (gender='F') * (dept=1);
       format dept dept.;

    proc catmod order=data
                data=berkeley;
       weight freq;
       population dept gender;
       direct deptag;
       response / out=predict;
       model admit = dept deptag / ml;
    run;
    ...

Plots for logit models
PROC CATMOD output:
[Maximum Likelihood Analysis of Variance: rows Intercept, dept, deptag, Likelihood Ratio; columns DF, Chi-Square, Pr > ChiSq; numeric values not preserved]
[Analysis of Maximum Likelihood Estimates: rows Intercept, dept A to E, deptag; columns Estimate, Standard Error, Chi-Square, Pr > ChiSq; numeric values not preserved]

How to interpret?

Plots for logit models
PROC CATMOD: observed and predicted logits (catberk6.sas)

    proc print data=predict;
       id dept gender;
       var _obs_ _pred_ _sepred_;
       format _numeric_ 6. dept dept.;
       where (_type_='FUNCTION');

[Output listing: one row for each dept (A to F) and gender (M, F), with columns _OBS_, _PRED_, _SEPRED_; values not preserved]

Plots for logit models (catberk6.sas)

    title 'logit(Admit) = Dept DeptA*Gender';
    %catplot(data=predict, x=dept, class=gender,
             type=function,       /* plot the log odds */
             z=1.96);             /* 95% error bars    */

[Figure: logit(Admit) = Dept DeptA*Gender; Log Odds (Admitted) vs. Department, by Gender (Female, Male)]

Diagnostic plots for Generalized Linear Models
INFLGLIM macro: influence plots for generalized linear models (Williams, 1987)
- Which cells have undue impact on the fitted model?
- Fit: PROC GENMOD; the macro calculates additional diagnostic measures (hat value, Cook's D, etc.)
- Plot: a measure of residual (GY=, e.g., a χ² residual) vs. leverage (GX=hat value), with bubble size (area, radius) ~ Cook's D

INFLGLIM macro: Example
Berkeley data, model [AD][GD], i.e., $L_{ij} = \alpha + \beta_i^{Dept}$ (genberk.sas)

    %include catdata(berkeley);
    *-- make a cell ID variable, joining factors;
    data berkeley;
       set berkeley;
       cell = trim(put(dept,dept.)) ||
              gender ||
              trim(put(admit,yn.));

    %inflglim(data=berkeley,
              class=dept gender admit,
              resp=freq,
              model=admit|dept gender|dept,
              dist=poisson, id=cell,
              gx=hat, gy=streschi);

INFLGLIM macro: Example
- All cells which do not fit ($|r_i| > 2$) are for department A.
- Males applying to dept A have large leverage, hence large influence (Cook's D).

[Figure: influence plot, Adjusted Pearson residual vs. Leverage (H value); points labeled by cell, e.g., AM+, AM-, AF+, AF-]

Diagnostic plots for Generalized Linear Models
HALFNORM macro: half-normal plot of residuals (Atkinson, 1981)
- Plot ordered absolute residuals, $|r|_{(i)}$, vs. expected normal values, $|z|_{(i)}$
- The standard normal confidence envelope is not suitable for GLMs
- Simulate the reference line and envelope, with simulated confidence intervals

(genberk.sas)

    %halfnorm(data=berkeley,
              class=dept gender admit,
              resp=freq,
              model=dept|gender dept|admit,
              dist=poisson, id=cell);

HALFNORM macro: Example
- The model fits well, except in department A.
- Points with the largest residuals are labeled.

[Figure: half-normal plot, Absolute Std Deviance Residual vs. Expected value of half normal quantile; labeled points AF+, AM-, AM+, EF+, AF-]

Logistic regression models
Response variable:
- Binary response: success/failure, vote: yes/no
- Ordinal response: none, some, severe depression
- Polytomous response: vote Liberal, Tory, Alliance, NDP
Explanatory variables:
- Quantitative regressors: age, dose
- Transformed regressors: √age, log(dose)
- Polynomial regressors: age², age³, ...
- Categorical predictors: treatment, sex
- Interaction regressors: treatment × age, sex × age

Logistic regression models: Binary response
For a binary response, $Y \in \{0, 1\}$, let $x$ be a vector of $p$ regressors, and $\pi_i$ be the probability, $\Pr(Y = 1 \mid x)$.
The logistic regression model is a linear model for the log odds, or logit, that $Y = 1$, given the values in $x$:

$$\mathrm{logit}(\pi_i) \equiv \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \alpha + x_i^T \beta = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}$$

An equivalent (non-linear) form of the model may be specified for the probability, $\pi_i$, itself:

$$\pi_i = \left\{ 1 + \exp\left(-[\alpha + x_i^T \beta]\right) \right\}^{-1}$$

The logistic model is a linear model for the log odds, but also a multiplicative model for the odds of success:

$$\frac{\pi_i}{1 - \pi_i} = \exp(\alpha + x_i^T \beta) = \exp(\alpha) \exp(x_i^T \beta)$$

so increasing $x_{ij}$ by 1 increases $\mathrm{logit}(\pi_i)$ by $\beta_j$, and multiplies the odds by $e^{\beta_j}$.

Logistic regression models: Binary response
Fitting: PROC LOGISTIC (or the ROBUST macro, for M-estimation)
Data:
- Frequency form (from PROC FREQ) when all predictors are discrete
- Case form when any predictors are quantitative
Models:
- CLASS statement (V7+): no need for dummy variables for discrete predictors; can specify order and parameterization (effect, polynomial, reference cell)
- MODEL statement allows GLM syntax, e.g., model Better = Sex Treat
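The three equivalent forms of the binary logistic model given above (logit, probability, and odds) can be verified numerically. The short Python sketch below uses purely hypothetical coefficient and regressor values; it shows only that a one-unit increase in a regressor adds its coefficient to the logit and multiplies the odds by the exponential of that coefficient.

```python
import math


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Probability corresponding to a log odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(alpha, beta, x):
    """alpha + x' beta for one observation."""
    return alpha + sum(b * v for b, v in zip(beta, x))


# HYPOTHETICAL values, for illustration only.
alpha = -1.0
beta = [0.5, 0.05]
x = [1.0, 40.0]

p0 = inv_logit(linear_predictor(alpha, beta, x))
x_plus = [x[0], x[1] + 1.0]          # increase the second regressor by 1
p1 = inv_logit(linear_predictor(alpha, beta, x_plus))

odds0 = p0 / (1.0 - p0)
odds1 = p1 / (1.0 - p1)

print(f"p0 = {p0:.4f}, p1 = {p1:.4f}")
print(f"change in logit = {logit(p1) - logit(p0):.4f} (beta_2 = {beta[1]})")
print(f"odds multiplier = {odds1 / odds0:.4f} (exp(beta_2) = {math.exp(beta[1]):.4f})")
```

The change in probability depends on where on the curve the observation sits, while the change in logit and the odds multiplier are constant. That is why the slides plot fitted models on the logit scale.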

Logistic regression models: Binary response
Visualization:
- Goal: see and understand the data and the fitted model
- LOGODDS macro: plot observed responses, fitted and smoothed probabilities
- Model plots:
  - OUTPUT statement: fitted $\hat{\pi}_i$, lower/upper $(1 - \alpha)$ CI, and/or fitted logit, $(\hat{\alpha} + x_i^T \hat{\beta}) \pm z_{1-\alpha/2}\, \mathrm{se}(\text{logit})$
  - Plot with standard procedures (PROC GCHART, GPLOT)
  - Utility macros (BARS, LABEL, POINTS, PSCALE, etc.) for custom displays
- Effect plots: plot a hierarchical subset of effects, averaging over those not included
- INFLOGIS macro: influence plots for logistic regression models
- ADDVAR macro: added variable plots for new predictors or transformations of old ones

Example: Arthritis treatment data
- Predictors: Sex, Treatment (treated, placebo), Age
- Response: improvement (none, some, marked)
- Consider first as a binary response: None vs. (Some or Marked) = "Better"
Data in case form (arthrit.sas):

    data arthrit;
       length treat $7. sex $6. ;
       input id treat $ sex $ age improve @@;   /* "improve @@" assumed; lost in transcript */
       case = _n_;
       better = (improve > 0);    *-- Make binary response;
    datalines ;
    ... (data lines garbled in this transcript; observations omitted) ...
    ;

LOGODDS macro: Empirical logit plots
- Linearity: is a linear relation realistic?
- Smoothing: discrete data often require smoothing to see!
The LOGODDS macro:
- Show the data: plot the (0/1) responses [stacked or jittered]
- Divide X into groups (e.g., deciles) and compute the empirical logit, $\log\left(\frac{y_i + 1/2}{n_i - y_i + 1/2}\right)$, for each
- Linear logistic regression, plus a smoothed curve (LOWESS macro)

    %include catdata(arthrit);
    %logodds(data=arthrit,
             x=age, y=better,   /* vars to plot                               */
             smooth=.5,         /* LOWESS smoothing parameter (leading digits may be lost) */
             plot=logit);       /* plot on logit scale                        */

[Figure: Log Odds Better=1 vs. AGE]
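The empirical logit used by the LOGODDS macro, log((y + 1/2) / (n - y + 1/2)), is easy to compute by hand. The Python sketch below is an illustration of that formula with a simple equal-size grouping of the x values; the grouping rule and the small data set are hypothetical and are not taken from the arthritis data or from the macro.

```python
import math


def empirical_logit(successes, trials):
    """log((y + 1/2) / (n - y + 1/2)); finite even when y = 0 or y = n."""
    return math.log((successes + 0.5) / (trials - successes + 0.5))


def grouped_empirical_logits(x, y, n_groups):
    """Sort by x, cut into roughly equal-sized groups, and return
    (mean x, empirical logit) for each group."""
    pairs = sorted(zip(x, y))
    size = math.ceil(len(pairs) / n_groups)
    result = []
    for start in range(0, len(pairs), size):
        chunk = pairs[start:start + size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        result.append((sum(xs) / len(xs), empirical_logit(sum(ys), len(ys))))
    return result


# HYPOTHETICAL ages and 0/1 responses, for illustration only.
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 72, 74]
better = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

points = grouped_empirical_logits(ages, better, 3)
for mean_age, elogit in points:
    print(f"mean age = {mean_age:5.1f}, empirical logit = {elogit:6.3f}")
```

Adding 1/2 to both counts keeps the logit defined for groups in which every response is 0 or every response is 1, which is common with small groups of binary data.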

PROC LOGISTIC: Model fitting and plotting
- Specify the ordering of response levels (order= or descending options)
- Specify parameterizations for CLASS variables
- OUTPUT statement to get fitted logits and probabilities

(glogist1c.sas)

    proc logistic data=arthrit descending;
       class sex (ref=last) treat (ref=first) / param=ref;
       model better = sex treat age;
       output out=results
              p=prob l=lower u=upper
              xbeta=logit stdxbeta=selogit / alpha=.33;

The output includes:
[Type III Analysis of Effects: rows sex, treat, age; columns DF, Wald Chi-Square, Pr > ChiSq; numeric values not preserved]
[Analysis of Maximum Likelihood Estimates: rows Intercept, sex Female, treat Treated, age; columns DF, Estimate, Standard Error, Wald Chi-Square, Pr > ChiSq; numeric values not preserved]
[Odds Ratio Estimates: rows sex Female vs Male, treat Treated vs Placebo, age; columns Point Estimate, 95% Wald Confidence Limits; numeric values not preserved]

Parameter estimates (reference cell coding):
- $\beta_1 = 1.49$: the odds of being better are about $e^{1.49} \approx 4.4$ times higher for females than for males
- $\beta_2 = 1.76$: the odds of being better are $e^{1.76} = 5.8$ times higher for treated patients than for placebo
- $\beta_3 = 0.0487$: odds ratio = 1.05, so the odds of improvement increase 5% with each year of age. Over 10 years, the odds of improvement are multiplied by $e^{0.487} = 1.63$, a 63% increase.

PROC LOGISTIC: Model plots
- Plots of fitted values from the dataset specified on the OUTPUT statement
- Plot either predicted probabilities or logits
The first few observations from the results dataset:
[Listing: columns id, sex, treat, age, better, prob, lower, upper, logit, selogit; values not preserved]

PROC LOGISTIC: Model plots
Basic plots:
- Plot either logit or probability vs. one predictor (continuous, or the one with the most levels)
- Separate curves for one factor
- Separate panels for all others (BY statement)

    proc gplot data=results;
       plot (logit prob) * age = treat;
       by sex;
       symbol1 v=circle i=join l= c=black;   /* line type not preserved */
       symbol2 v=dot    i=join l= c=red;     /* line type not preserved */
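The interpretation of the parameter estimates above is just exponentiation of the coefficients. The following Python sketch redoes that arithmetic; the coefficient values are assumptions based on the partly garbled estimates on this slide (1.49 for sex, 1.76 for treatment, 0.0487 per year of age), and the labels are descriptive names chosen for this example.

```python
import math

# ASSUMPTION: coefficients as reconstructed from the slide; treat as illustrative.
coefficients = {
    "sex (Female vs Male)": 1.49,
    "treat (Treated vs Placebo)": 1.76,
    "age (per year)": 0.0487,
}


def odds_multiplier(beta, delta=1.0):
    """Factor by which the odds change when a predictor increases by delta."""
    return math.exp(beta * delta)


for name, beta in coefficients.items():
    print(f"{name:28s} beta = {beta:7.4f}  odds ratio = {odds_multiplier(beta):.2f}")

beta_age = coefficients["age (per year)"]
ten_year = odds_multiplier(beta_age, delta=10)
print(f"10 years of age: odds multiplied by {ten_year:.2f} "
      f"({(ten_year - 1) * 100:.0f}% increase)")
```

Note that the ten-year effect compounds: it is exp(10 × 0.0487), not 10 times the one-year percentage increase.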

PROC LOGISTIC: Model plots

Enhanced plots:

- Plot on logit scale, with probability scale at right (PSCALE macro)
- Show 67% error bars, ± 1 se (BARS macro)
- Custom legend and panel labels (LABEL macro)

[Figure: two panels (Female, Male) of Log Odds Improved against Age, with curves for Treated and Placebo and a Probability Improved scale at right]

glogistc.sas:

    *-- Error bars, on logit scale;
    %bars(data=results, var=logit, class=age, cvar=treat, by=age,
          barlen=selogit, out=bars);
    *-- Custom legends and panel labels;
    %label(data=results, y=logit, x=age, xoff=..., cvar=treat,
           by=sex, subset=last.treat, out=label1, pos=6, text=treat);
    %label(data=results, y=.5, x=..., size=...,
           by=sex, subset=first.sex, out=label2, pos=6, text=sex);
    *-- Probability scales at right;
    %pscale(out=pscale, byvar=sex, byval=%str(Female, Male));
    *-- Join ANNOTATE datasets;
    data bars;
       set label1 label2 bars pscale;
    proc sort;
       by sex;

    title h=.8 a=-90 'Probability Improved'   /* right axis label */
          h=.5 a=-90 ' ';                     /* extra space */
    goptions hby=0;                           /* suppress BY values */
    proc gplot data=results;
       plot logit * age = treat /
            vaxis=axis1 haxis=axis2 hm=... vm=...
            nolegend anno=bars frame;
       by sex;
       axis1 label=(a=90 'Log Odds Improved') order=(...);
       axis2 order=(...) offset=(...,6);
       symbol1 v=+ i=join l=... c=black;
       symbol2 v=- i=join l=... c=red;
       label age='Age';
    run;

Plotting fitted values: Models with interactions

- Only need to change the MODEL statement
- Output dataset automatically incorporates all model terms
- Plotting steps remain exactly the same

    proc logistic data=arthrit descending;
       class sex (ref=last) treat (ref=first) / param=ref;
       model better = sex treat ... ;
       output out=results p=prob l=lower u=upper
              xbeta=logit stdxbeta=selogit / alpha=.33;

[Figure: two panels (Female, Male) of Log Odds Improved against Age for the model with interactions, with curves for Treated and Placebo and a Probability Improved scale at right]

Effect plots for generalized linear models (Fox, 1987)

For complex models, we often wish to plot a specific main effect or interaction (including lower-order relatives).

- Fit the full model to the data, with linear predictor (e.g., logit) $\eta = X\beta$ and link function $g(\mu) = \eta$. This gives the estimate $b$ of $\beta$ and the covariance matrix $V(b)$ of $b$.
- Vary each predictor in the term over its range.
- Fix the other predictors at typical values (mean, median, proportion in the data). This gives the effect model matrix, $X^*$.
- Calculate the fitted effect values, $\hat{\eta}^* = X^* b$.
- Standard errors are the square roots of $\mathrm{diag}(X^* V(b) X^{*T})$.
- Plot $\hat{\eta}^*$, or the values transformed back to the scale of the response, $g^{-1}(\hat{\eta}^*)$.

Note: This provides a general means to visualize interactions in all linear and generalized linear models.

Effect plots in SAS

- Create a grid of values for the predictors in the effect (EXPGRID macro)
- Fix other predictors at typical values (mean, median, proportion in the data)
- Concatenate the grid with the data
- Fit the model; the output data set then contains fitted values in the grid
- Standard errors are calculated automatically
- Plot the fitted values in the grid (not yet a macro)

Effect plots: Example

Cowles and Davis (1987): Volunteering for a psychology experiment

- Predictors: Sex, Neuroticism, Extraversion
- Strong interaction, Neuroticism × Extraversion

[Figure: Probability of Volunteering against Extraversion, with one curve per level of Neuroticism]

cowles.sas:

    %include catdata(cowles);
    %expgrid(sex=.5,                          /* Fix Sex at mid value */
             Extraver=... to ... by ...,      /* range of predictors */
             Neurot=... to ... by 6,          /*   "        "        */
             _in_=...);                       /* grid select value */
    *-- catenate grid values with data;
    data cowles;
       set cowles _grid_;
    *-- fit model, output fitted values;
    proc logistic data=cowles outest=parm covout;
       model Volunter = Sex Extraver | Neurot / covb;
       output out=predicted xbeta=logit stdxbeta=selogit
              p=prob u=upper l=lower / alpha=...;
    *-- select grid, replace labels;
    data effect;
       set predicted (where=(_in_=...));
       label logit='log odds of Volunteering'
             prob ='Probability of Volunteering';
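The effect-plot calculation (fitted effect values from the effect model matrix, with standard errors from the coefficient covariance matrix) can be sketched in plain Python. This is an illustrative sketch only: the coefficients, the diagonal covariance matrix, and the variable names are hypothetical and are not taken from the Cowles and Davis fit.

```python
import math

def effect_values(X_star, b, V):
    """For each row x of the effect model matrix X*, return (eta, se) with
    eta = x'b and se = sqrt(x' V x), i.e. the square roots of the
    diagonal of X* V X*^T."""
    results = []
    for x in X_star:
        eta = sum(xj * bj for xj, bj in zip(x, b))
        var = sum(x[j] * V[j][k] * x[k]
                  for j in range(len(x)) for k in range(len(x)))
        results.append((eta, math.sqrt(var)))
    return results

def inv_logit(eta):
    """Inverse of the logit link: transform back to the probability scale."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical estimates for a model with an intercept, a predictor z that is
# held at a typical value, and a predictor x that is varied over a grid.
b = [-1.0, 0.5, 0.2]
V = [[0.04, 0.0, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.0, 0.0025]]
z_typical = 2.0  # e.g. the sample mean of z
X_star = [[1.0, z_typical, x] for x in (0.0, 5.0, 10.0)]

for eta, se in effect_values(X_star, b, V):
    print(round(eta, 3), round(se, 3), round(inv_logit(eta), 3))
```

The SAS approach in the slides reaches the same quantities without matrix code: appending the grid rows to the data lets the OUTPUT statement compute the fitted logit and its standard error for every grid point.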

Effect plots: Example

cowles.sas (continued):

    *-- Custom legend;
    %label(data=effect,
           x=extraver, y=prob,
           subset=extraver=...,      /* at last.extraver */
           text=put(neurot, ...),    /* label text */
           pos=6, xoff=..., out=labels);
    *-- Plot step;
    proc gplot data=effect;
       plot prob * Extraver = Neurot /
            vaxis=axis1 haxis=axis2 vm=...
            anno=labels nolegend;
       symbol1 v=dot i=spline r=5;
       axis1 label=(a=90 r=...) order=(... to .9 by ...);
       axis2 order=(... to ... by 6) offset=(...);
    run; quit;

[Figure: Probability of Volunteering against Extraversion, with one labeled curve per level of Neuroticism]

Influence measures and diagnostic plots

- Leverage: potential impact of an individual case; distance from the centroid in the space of the predictors
- Residuals: which observations are poorly fitted?
- Influence: actual impact of an individual case; leverage × residual
  - C, CBAR: analogs of Cook's D in OLS; standardized change in the regression coefficients when the i-th case is deleted
  - DIFCHISQ, DIFDEV: $\Delta\chi^2$ when the i-th case is deleted

PROC LOGISTIC provides printed output with the influence and iplots options:

    proc logistic data=arthrit ;
       model better = sex treat age / influence iplots;

[Output: The LOGISTIC Procedure, Regression Diagnostics. For each case number, the value and a line-printer index plot of the Deviance Residual and of the Hat Matrix Diagonal, continuing for all cases and all measures]

Problems:

- Way too much output
- Doesn't highlight unusual cases well
- Index plots don't consider combinations of measures
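To make the relation "influence = leverage × residual" concrete, here is a small Python sketch of the one-step case-deletion measures. It assumes the usual definitions documented for PROC LOGISTIC (C and CBAR from the Pearson residual and hat value, DIFCHISQ = CBAR / h, DIFDEV = squared deviance residual + CBAR); the function name and the input values are hypothetical.

```python
def influence_measures(pearson_resid, deviance_resid, hat):
    """One-step case-deletion diagnostics for a single case, given its
    Pearson residual, deviance residual and hat-matrix diagonal (leverage)."""
    if not 0.0 < hat < 1.0:
        raise ValueError("hat value must lie strictly between 0 and 1")
    chi2 = pearson_resid ** 2
    c = chi2 * hat / (1.0 - hat) ** 2      # C: influence on the coefficients
    cbar = chi2 * hat / (1.0 - hat)        # CBAR: variant of C
    return {
        "C": c,
        "CBAR": cbar,
        "DIFCHISQ": cbar / hat,            # change in Pearson chi-square
        "DIFDEV": deviance_resid ** 2 + cbar,  # change in deviance
    }

# Hypothetical case: a fairly large residual with moderate leverage.
m = influence_measures(pearson_resid=2.0, deviance_resid=1.8, hat=0.2)
for name, value in m.items():
    print(name, round(value, 3))
```

Because C and CBAR grow with both the squared residual and the leverage, a case needs to be unusual on both to be highly influential, which is the combination the index plots do not show.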

INFLOGIS macro

- Specialized version of the INFLGLIM macro for logistic regression
- Plots a measure of change in $\chi^2$ (DIFCHISQ or DIFDEV) vs. predicted probability or leverage
- Bubble symbols show actual influence (C or CBAR)
- Shows standard cutoffs for large values
- Labels outlying cases

INFLOGIS macro: Example

    %include data(arthrit);
    %inflogis(data=arthrit,
              class=sex treat,     /* CLASS variables */
              y=better,            /* response */
              x=sex treat age,     /* predictors */
              id=case,             /* case ID */
              gy=difchisq,         /* graph ordinate */
              gx=pred HAT);        /* graph abscissas */

Printed output lists the cases with large leverage, residual or influence, with the columns case, better, sex, treat, age, pred, hat, difchisq, difdev and c.

[Figure: Arthritis treatment data. Change in Pearson Chi Square against Estimated Probability; bubble size shows influence on coefficients (C)]

[Figure: Arthritis treatment data. Change in Pearson Chi Square against Leverage (Hat value); bubble size shows influence on coefficients (C)]

Conclusions

Summarization & exposure

- Effective data analysis requires summarization: hypothesis tests, model fits (& comparisons!), parameter estimates (& precision!)
- It also requires exposure: displays to help the viewer see (& understand!) patterns, trends, and anomalies

Graphical methods for categorical data

- Many new methods developed over the last 5 years
- Some novel, others extend familiar methods for quantitative data
- Described and illustrated in VCD

Theory into practice

- To be useful, statistical methods must be:
  - available: implemented in standard software
  - accessible: easy to use (or at least easier)
- VCD provides general macros and SAS/IML programs

References

Atkinson, A. C. Two graphical displays for outlying and influential observations in regression. Biometrika, 68:13–20, 1981.

Bickel, P. J., Hammel, E. A., and O'Connell, J. W. Sex bias in graduate admissions: Data from Berkeley. Science, 187:398–403, 1975.

Cowles, M. and Davis, C. The subject matter of psychology: Volunteers. British Journal of Social Psychology, 26:97–102, 1987.

Fox, J. Effect displays for generalized linear models. In Clogg, C. C., editor, Sociological Methodology, 1987. Jossey-Bass, San Francisco, 1987.

Friendly, M. Mosaic displays for multi-way contingency tables. Journal of the American Statistical Association, 89:190–200, 1994.

Friendly, M. Extending mosaic displays: Marginal, conditional, and partial views of categorical data. Journal of Computational and Graphical Statistics, 8(3):373–395, 1999.

Friendly, M. Multidimensional arrays in SAS/IML. In Proceedings of the SAS User's Group International Conference, volume 25, pp. 1420–1427. SAS Institute, 2000.

Friendly, M. Corrgrams: Exploratory displays for correlation matrices. The American Statistician, 56(4):316–324, 2002.

Friendly, M. and Kwan, E. Effect ordering for data displays. Computational Statistics and Data Analysis. In press.

Hartigan, J. A. and Kleiner, B. Mosaics for contingency tables. In Eddy, W. F., editor, Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface. Springer-Verlag, New York, NY, 1981.

Hoaglin, D. C. and Tukey, J. W. Checking the shape of discrete distributions. In Hoaglin, D. C., Mosteller, F., and Tukey, J. W., editors, Exploring Data Tables, Trends and Shapes, chapter 9. John Wiley and Sons, New York, 1985.

Ord, J. K. Graphical methods for a class of discrete distributions. Journal of the Royal Statistical Society, Series A, 130:232–238, 1967.

Tufte, E. R. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, 1983.

Tukey, J. W. Some graphic and semigraphic displays. In Bancroft, T. A., editor, Statistical Papers in Honor of George W. Snedecor. Iowa State University Press, Ames, IA, 1972.

Tukey, J. W. Exploratory Data Analysis. Addison Wesley, Reading, MA, 1977.

Williams, D. A. Generalized linear model diagnostics using the deviance and single case deletions. Applied Statistics, 36:181–191, 1987.


More information

Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA

Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA ECL 290 Statistical Models in Ecology using R Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA Datasets in this problem set adapted from those provided

More information

Box-Cox Transformation for Simple Linear Regression

Box-Cox Transformation for Simple Linear Regression Chapter 192 Box-Cox Transformation for Simple Linear Regression Introduction This procedure finds the appropriate Box-Cox power transformation (1964) for a dataset containing a pair of variables that are

More information

Multivariate Capability Analysis

Multivariate Capability Analysis Multivariate Capability Analysis Summary... 1 Data Input... 3 Analysis Summary... 4 Capability Plot... 5 Capability Indices... 6 Capability Ellipse... 7 Correlation Matrix... 8 Tests for Normality... 8

More information

ST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.

ST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false. ST512 Fall Quarter, 2005 Exam 1 Name: Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false. 1. (42 points) A random sample of n = 30 NBA basketball

More information

Using Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers

Using Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers Using Multivariate Adaptive Regression Splines (MARS ) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers Why enhance GLM? Shortcomings of the linear modelling approach. GLM being

More information

ANALYSIS OF NOMINAL DATA MULTI-WAY CONTINGENCY TABLE

ANALYSIS OF NOMINAL DATA MULTI-WAY CONTINGENCY TABLE M A T H E M A T I C A L E C O N O M I C S No. 7(14) 2011 ANALYSIS OF NOMINAL DATA MULTI-WAY CONTINGENCY TABLE Abstract. Presented in this paper the method of graphical presentation of the relationship

More information

Introduction to Mixed Models: Multivariate Regression

Introduction to Mixed Models: Multivariate Regression Introduction to Mixed Models: Multivariate Regression EPSY 905: Multivariate Analysis Spring 2016 Lecture #9 March 30, 2016 EPSY 905: Multivariate Regression via Path Analysis Today s Lecture Multivariate

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

Product Catalog. AcaStat. Software

Product Catalog. AcaStat. Software Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,

More information

Regression Analysis and Linear Regression Models

Regression Analysis and Linear Regression Models Regression Analysis and Linear Regression Models University of Trento - FBK 2 March, 2015 (UNITN-FBK) Regression Analysis and Linear Regression Models 2 March, 2015 1 / 33 Relationship between numerical

More information

Handbook of Statistical Modeling for the Social and Behavioral Sciences

Handbook of Statistical Modeling for the Social and Behavioral Sciences Handbook of Statistical Modeling for the Social and Behavioral Sciences Edited by Gerhard Arminger Bergische Universität Wuppertal Wuppertal, Germany Clifford С. Clogg Late of Pennsylvania State University

More information

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values Chapter 500 Introduction This procedure produces tables of frequency counts and percentages for categorical and continuous variables. This procedure serves as a summary reporting tool and is often used

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

BUSINESS ANALYTICS. 96 HOURS Practical Learning. DexLab Certified. Training Module. Gurgaon (Head Office)

BUSINESS ANALYTICS. 96 HOURS Practical Learning. DexLab Certified. Training Module. Gurgaon (Head Office) SAS (Base & Advanced) Analytics & Predictive Modeling Tableau BI 96 HOURS Practical Learning WEEKDAY & WEEKEND BATCHES CLASSROOM & LIVE ONLINE DexLab Certified BUSINESS ANALYTICS Training Module Gurgaon

More information

Chapter 1. Using the Cluster Analysis. Background Information

Chapter 1. Using the Cluster Analysis. Background Information Chapter 1 Using the Cluster Analysis Background Information Cluster analysis is the name of a multivariate technique used to identify similar characteristics in a group of observations. In cluster analysis,

More information

Dealing with Categorical Data Types in a Designed Experiment

Dealing with Categorical Data Types in a Designed Experiment Dealing with Categorical Data Types in a Designed Experiment Part II: Sizing a Designed Experiment When Using a Binary Response Best Practice Authored by: Francisco Ortiz, PhD STAT T&E COE The goal of

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:

More information

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

Then Log-odds with respect to a baseline category (eg, the last category) exp( ß ji = 0j + 1j X 1i + + kj X ki ) 1 + J 1 X exp( 0` + i`x 1i + + kj X k

Then Log-odds with respect to a baseline category (eg, the last category) exp( ß ji = 0j + 1j X 1i + + kj X ki ) 1 + J 1 X exp( 0` + i`x 1i + + kj X k Connections between -linear and istic regression models Any -linear model can be expressed as a istic regression model Logisitic regression requires specication of a response variable Log-linear models

More information

Estimation of Item Response Models

Estimation of Item Response Models Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:

More information

Chapter Two: Descriptive Methods 1/50

Chapter Two: Descriptive Methods 1/50 Chapter Two: Descriptive Methods 1/50 2.1 Introduction 2/50 2.1 Introduction We previously said that descriptive statistics is made up of various techniques used to summarize the information contained

More information

AND NUMERICAL SUMMARIES. Chapter 2

AND NUMERICAL SUMMARIES. Chapter 2 EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Online Learning Centre Technology Step-by-Step - Minitab Minitab is a statistical software application originally created

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010

Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2010 1 / 26 Additive predictors

More information

Correctly Compute Complex Samples Statistics

Correctly Compute Complex Samples Statistics PASW Complex Samples 17.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample

More information

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement

More information

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see

More information