Inference for loglinear models (cont'd)


Stat 504, Lecture 25 1

Inference for loglinear models (cont'd):
- Loglinear/Logit connection
- Intro to Graphical Models

Stat 504, Lecture 25 2

Loglinear models:
- no distinction between response and explanatory variables
- distribution = Poisson, link = log

Logit models:
- model how a binary response variable depends on a set of explanatory variables
- distribution = binomial, link = logit

They are related in the sense that loglinear models are more general than logit models:
- Some logit models are equivalent to certain loglinear models (e.g., consider the admissions data example and HW9).
- If you have a binary response variable in the loglinear model, you can construct the logits to help with the interpretation of the loglinear model.
- Some logit models with only categorical variables have equivalent loglinear models. (See the sketch below.)
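
A minimal sketch in R of this equivalence, using R's built-in UCBAdmissions table as a stand-in for the admissions example (the dataset and variable names here are illustrative, not the lecture's):

    d <- as.data.frame(UCBAdmissions)            # columns Admit, Gender, Dept, Freq
    ## loglinear model (AG, AD, GD): Admit as response, Gender and Dept explanatory
    loglin.fit <- glm(Freq ~ Admit*Gender + Admit*Dept + Gender*Dept,
                      family = poisson, data = d)
    ## equivalent logit model for Admit
    wide <- reshape(d, idvar = c("Gender", "Dept"), timevar = "Admit",
                    direction = "wide")          # columns Freq.Admitted, Freq.Rejected
    logit.fit <- glm(cbind(Freq.Admitted, Freq.Rejected) ~ Gender + Dept,
                     family = binomial, data = wide)
    deviance(loglin.fit)                         # the two deviances are identical:
    deviance(logit.fit)                          # the formulations give the same fit
    ## logit effects are contrasts of the loglinear lambda terms:
    coef(logit.fit)["GenderFemale"]                    # logit effect of Gender
    -coef(loglin.fit)["AdmitRejected:GenderFemale"]    # the same number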

Stat 504, Lecture 25 3

A key to using logit models to interpret loglinear models: differences between λ's equal log odds, and functions of λ's equal odds ratios.

Let's consider the blue collar worker data and the homogeneous association model (MW, MS, SW).

Loglinear model:

    log(μ_ijk) = λ + λ_i^M + λ_j^S + λ_k^W + λ_ij^MS + λ_ik^MW + λ_jk^SW

and the probability of worker's job satisfaction:

    π_ij = P(high worker satisfaction | M = i, S = j)

Logit model:

    log(π_ij / (1 - π_ij)) = (λ_1^W - λ_2^W) + (λ_i1^MW - λ_i2^MW) + (λ_j1^SW - λ_j2^SW)

with constant (λ_1^W - λ_2^W), effect of management quality (λ_i1^MW - λ_i2^MW), and effect of supervisor's job satisfaction (λ_j1^SW - λ_j2^SW).


Stat 504, Lecture 25 5

Model Selection: Ref. Ch. 9 (Agresti); more advanced topics on model selection with ordinal data are in Sec. 9.4 and 9.5.

One response variable:
- Logit models can be fit directly and are simpler because they have fewer parameters than the equivalent loglinear model.
- If the response variable has more than two levels, you can use a polytomous logit model.
- If you use loglinear models, the highest-way association among the explanatory variables should be included in all models.
- Whichever formulation you use, logit or loglinear, the results will be the same.

Two or more response variables: use loglinear models because they are more general.

Stat 504, Lecture 25 6

Model selection strategies with loglinear models:
- Determine whether some variables are responses and some explanatory. Include association terms for the explanatory variables in the model, and focus your model search on models that relate the responses to the explanatory variables.
- If a margin is fixed by design, include the appropriate term in the loglinear model (to ensure that the marginal fitted values from the model equal the observed margin).
- Try to determine the necessary level of complexity by fitting models with: marginal/main effects only; all two-way associations; all three-way associations; etc.; up to the highest-way association.
- Use a backward elimination strategy (analogous to the one discussed for logit models) or a stepwise procedure. Be careful in using computer algorithms; you are better off doing likelihood ratio tests (e.g., the blue collar data, or the 4-way table handout from Fienberg on detergent use). See the sketch after this list.
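
A minimal sketch in R of the complexity ladder and backward elimination, again using the built-in UCBAdmissions table as a stand-in (any data frame of counts with three factors would do):

    d  <- as.data.frame(UCBAdmissions)                  # Admit, Gender, Dept, Freq
    m1 <- glm(Freq ~ Admit + Gender + Dept,     family = poisson, data = d)  # main effects only
    m2 <- glm(Freq ~ (Admit + Gender + Dept)^2, family = poisson, data = d)  # all two-way
    m3 <- glm(Freq ~ Admit * Gender * Dept,     family = poisson, data = d)  # saturated
    anova(m1, m2, m3, test = "LRT")   # likelihood ratio tests between complexity levels
    drop1(m2, test = "LRT")           # backward elimination: test each two-way term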

Stat 504, Lecture 25 7

Classes of nested models:
- loglinear models
- hierarchical loglinear models
- graphical loglinear models
- decomposable loglinear models
- conditional independence models

Stat 504, Lecture 25 8

More on model building/selection: Graphical Models

Graphical models are useful and widely applicable:
- Graphs visually represent the scientific content of models and thus facilitate communication.
- Graphs break complex problems/models down into smaller, simpler pieces that can be studied separately.
- Graphs are natural data structures for programming.

History:
- Statistical physics (Gibbs, 1902)
- Genetics and path analysis (Wright, 1921, 1923, 1934)
- Contingency tables (Bartlett, 1935)

Stat 504, Lecture 25 9

References:
- Cowell, R.G., Dawid, A.P., Lauritzen, S.L., and Spiegelhalter, D.J. (1999). Probabilistic Networks and Expert Systems. NY: Springer-Verlag.
- Darroch, J.N., Lauritzen, S.L., and Speed, T.P. (1980). Markov fields and log-linear interaction models for contingency tables. Annals of Statistics, 8, 522-539.
- Edwards, D. (2000). Introduction to Graphical Modelling, 2nd edition. NY: Springer-Verlag. (Includes MIM software.)
- Lauritzen, S.L. (1996). Graphical Models. NY: Oxford Science Publications.
- Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Chichester: Wiley.

Stat 504, Lecture 25 10

Some on-line references:
- Murphy, K. (1998). A brief introduction to graphical models and Bayesian networks. http://www.cs.ubc.ca/~murphyk/bayes/bayes.html
- Graphical models in R: http://www.r-project.org/gr/
- MIM, a graphical modeling software: http://www.hypergraph.dk/

Stat 504, Lecture 25 11

Graphical Models and Contingency Tables
- Determine when marginal and partial associations are the same, so that we can collapse a multi-way table into a smaller table (or tables) to study certain associations.
- Represent substantive theories and hypotheses, which correspond to certain loglinear/logit models.

Two basic types of graphs:
- undirected graphical models
- directed graphical models (or Bayesian networks)
Others: chain graphs, ancestral graphs, etc.

Stat 504, Lecture 25 12

Example 1: The Blue Collar Worker Data.

Example 2: The Ries-Smith Detergent Study (Fienberg, 1980).

Example 3: The Czech autoworkers data come from a prospective epidemiological study of 1841 Czech car factory workers (Edwards and Havranek, 1985). The table cross-classifies the autoworkers by prognostic factors for coronary heart disease. The variable labels are: A, whether or not the worker smokes; B, strenuous mental work; C, strenuous physical work; D, systolic blood pressure; E, the ratio of β to α lipoproteins; and F, family anamnesis of coronary heart disease.

Stat 504, Lecture 25 13

                                   B:     no            yes
    F     E      D        C     A:    no    yes      no    yes
    neg   < 3    < 140    no          44    40       112   67
                          yes         129   145      12    23
                 ≥ 140    no          35    12       80    33
                          yes         109   67       7     9
          ≥ 3    < 140    no          23    32       70    66
                          yes         50    80       7     13
                 ≥ 140    no          24    25       73    57
                          yes         51    63       7     16
    pos   < 3    < 140    no          5     7        21    9
                          yes         9     17       1     4
                 ≥ 140    no          4     3        11    8
                          yes         14    17       5     2
          ≥ 3    < 140    no          7     3        14    14
                          yes         9     16       2     3
                 ≥ 140    no          4     0        13    11
                          yes         5     14       4     4

Table 1: Prognostic factors in coronary heart disease (rows indexed by F, E, D, C; columns by B crossed with A). Source: Edwards and Havranek (1985).

Stat 504, Lecture 25 14

Some terminology and definitions (common to all graphical models):
- Vertices (or nodes), V, are points that represent variables.
- Edges, E, are lines that connect two vertices.
- A graph, G = {V, E}, consists of a finite set of vertices V and a set of edges E.
- The presence of an edge between two vertices indicates that an association may exist between the two variables.
- The absence of an edge between two vertices indicates that the two variables are independent.

Stat 504, Lecture 25 15

Undirected graph: no arrows; edges represent undirected relationships.
Directed graph: edges carry arrows, implying a direction (e.g., a causal relationship).

Stat 504, Lecture 25 16

- Adjacent vertices: vertices directly connected by an edge.
- Path: a sequence of distinct edges that takes you from one vertex to another.
- Separated: two variables are separated if all paths between them are intersected by another variable (or a set of variables).
- Complete: there is an edge between every pair of vertices.
- Clique: a maximally complete subset of vertices.
- Boundary of a vertex: all vertices adjacent to the vertex of interest, excluding the vertex itself.

A sketch of these notions follows.
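
A minimal sketch in R, for a small hypothetical graph (the vertices and edges here are made up for illustration):

    V <- c("A", "B", "C", "D", "E")
    adj <- matrix(0, 5, 5, dimnames = list(V, V))      # adjacency matrix
    edges <- rbind(c("A","B"), c("B","C"), c("B","D"), c("C","D"), c("D","E"))
    adj[edges] <- 1
    adj[edges[, 2:1]] <- 1                             # undirected: symmetric
    boundary <- function(v) V[adj[v, ] == 1]           # all vertices adjacent to v
    boundary("D")                                      # "B" "C" "E"
    ## a vertex set is complete if every pair in it is joined by an edge
    is.complete <- function(S) all(adj[S, S][upper.tri(adj[S, S])] == 1)
    is.complete(c("B", "C", "D"))                      # TRUE: {B,C,D} is a clique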

Stat 504, Lecture 25 17

Example: [graph figure not transcribed]

Stat 504, Lecture 25 18

Independence Graphs

Key result: Two variables are conditionally independent given any subset of variables that separates them.

Def: A graph is a (conditional) independence graph if there is no edge between two vertices whenever the variables they represent are conditionally independent given all the remaining variables. This definition links partial associations to the absence of an edge.

Notes:
1) If all random variables are categorical, then all graphical models for them are loglinear models (but not vice versa).
2) Different loglinear models may have the same graphical representation; e.g., for three variables, the homogeneous association model (AB, AC, BC) and the saturated model (ABC) both correspond to the complete triangle.

Stat 504, Lecture 25 19

Examples: [graph figures not transcribed]

Stat 504, Lecture 25 20

The link between loglinear models and graphs is the partial association model for k variables: a model with a two-way term (and all higher-order terms containing it) set to zero. E.g., A independent of B given ALL the other variables is equivalent to NO edge between A and B in the corresponding graph.

Markov properties:
- Pairwise Markov property: setting a two-way association term to zero specifies a pairwise Markov relationship in the graph: the two variables are conditionally independent given all the rest.
- Global Markov property: any two sets of variables separated by a third set in the graph are conditionally independent given that third set.
- Local Markov property: each variable is conditionally independent of all remaining variables given its boundary.
- Equivalence Theorem: describes the equivalence between the different Markov properties.

Stat 504, Lecture 25 21

Partial association models in 4 dimensions (Example 2, the detergent study, from Fienberg). Let S = 1, U = 2, T = 3, P = 4. Suppose we analyze the data without regard for the response, using loglinear models. There are 6 PA models, one for setting each of the 6 first-order interaction terms equal to zero. There is only one significant two-way term. We also eliminated the higher-order terms! (A sketch of fitting all six PA models follows.)
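
A minimal sketch in R of fitting all six PA models for a 4-way table. For four factors, the PA model that excludes edge (i, j) keeps every term not containing both i and j, i.e. the model [ikl][jkl]. The data frame `d` (factors X1-X4 and counts `n`) is hypothetical here:

    vars  <- c("X1", "X2", "X3", "X4")
    pairs <- combn(vars, 2)                        # the 6 candidate edges
    for (k in seq_len(ncol(pairs))) {
      ij   <- pairs[, k]
      rest <- setdiff(vars, ij)
      f    <- reformulate(c(paste(c(ij[1], rest), collapse = "*"),
                            paste(c(ij[2], rest), collapse = "*")),
                          response = "n")          # n ~ Xi*Xk*Xl + Xj*Xk*Xl
      fit  <- glm(f, family = poisson, data = d)
      cat(ij[1], "--", ij[2], ": G2 =", round(deviance(fit), 2),
          "df =", df.residual(fit), "\n")          # edge exclusion test statistic
    }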

Stat 504, Lecture 25 22

A graphical approach to model selection:
- Fit all partial association models and compute p-values (i.e., unconditional edge exclusion tests).
- Drop the non-significant edges from the graph, and form the resulting independence graph.
- Check for the inclusion of edges, one at a time, using conditional tests, and add back significant edges (i.e., conditional edge inclusion tests).
- Check again for the exclusion of edges. (See the sketch below.)
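
A minimal sketch in R of the conditional edge tests, assuming `fit0` is the current poisson-glm fit after the exclusion stage and X2:X3 is a candidate edge (both names hypothetical):

    add1(fit0, scope = ~ . + X2:X3, test = "LRT")   # conditional edge inclusion test
    drop1(fit0, test = "LRT")                       # conditional edge exclusion tests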

Stat 504, Lecture 25 23

For our 4-way example, begin with the model that includes only the edge between U and P, because u_24 (= λ^UP_jk) is the only nonzero two-way term. Testing for the inclusion of the other edges leads us to add edges (1,3) and (3,4). Now we have a decomposable graph, and we don't gain much by dropping an edge. Next, we can check for the inclusion of edge (2,3) by checking for the inclusion of two terms, u_23 and u_234. We get G² = 0.7, df = 1 and G² = 3.5, df = 2. This is not significant. But if we use AIC = G² - df = 3.5 - 2 > 0, it says include the edge in the model!

Stat 504, Lecture 25 24

Now use BIC = G² - df · log(n) = -7.24 < 0, which says do NOT include the edge. The global search using BIC yields [1][3][24] as the best model. If you want the logit version of the model, where the preference is the response and the first three variables are explanatory, then you need to include those in the model, e.g. [123][34][24], with G² = 8.4, df = 9. The conditional test for [34] gives G² = 3.8, df = 1, p-value = 0.0512.
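
The conditional-test p-value can be checked in R:

    pchisq(3.8, df = 1, lower.tail = FALSE)   # 0.0512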

Stat 504, Lecture 25 25

Example 3: Begin by fitting all PA models and computing G² for the exclusion of each edge. Each G² has df = 16, so the critical chi-square value for p-value = 0.05 is 26.30. If we drop the non-significant edges, we get the following graph: G² = 83.75, df = 51, p-value = 0.0026. Thus we deleted too many edges!
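
Both the cutoff and the overall fit can be checked in R:

    qchisq(0.95, df = 16)                        # 26.30, the edge-exclusion critical value
    pchisq(83.75, df = 51, lower.tail = FALSE)   # 0.0026, the model with dropped edges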

Stat 504, Lecture 25 26

There are about 32,768 decomposable models! How about the model [bf][bc][ace][ade], whose graph is triangulated and decomposable? But this model doesn't fit too well either.

Stat 504, Lecture 25 27

But if we do stepwise model selection in MIM, for example, we get the following as the best graphical model, which is not decomposable: G² = 58.28, df = 49, p-value = 0.17.