Inference for loglinear models (cont'd)


Stat 504, Lecture 25 1

Inference for loglinear models (cont'd):
- Loglinear/Logit connection
- Intro to Graphical Models

Stat 504, Lecture 25 2

Loglinear models:
- no distinction between response and explanatory variables
- distribution = Poisson, link = log

Logit models:
- model how a binary response variable depends on a set of explanatory variables
- distribution = binomial, link = logit

They are related in the sense that loglinear models are more general than logit models:
- Some logit models are equivalent to certain loglinear models (e.g., consider the admissions data example and HW9).
- If you have a binary response variable in the loglinear model, you can construct the logits to help with the interpretation of the loglinear model.
- Some logit models with only categorical variables have equivalent loglinear models. (See the sketch below.)
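
A minimal sketch in R of this equivalence, using R's built-in UCBAdmissions table as a stand-in for the admissions example (the dataset and variable names here are illustrative, not the lecture's):

    d <- as.data.frame(UCBAdmissions)            # columns Admit, Gender, Dept, Freq
    ## loglinear model (AG, AD, GD): Admit as response, Gender and Dept explanatory
    loglin.fit <- glm(Freq ~ Admit*Gender + Admit*Dept + Gender*Dept,
                      family = poisson, data = d)
    ## equivalent logit model for Admit
    wide <- reshape(d, idvar = c("Gender", "Dept"), timevar = "Admit",
                    direction = "wide")          # columns Freq.Admitted, Freq.Rejected
    logit.fit <- glm(cbind(Freq.Admitted, Freq.Rejected) ~ Gender + Dept,
                     family = binomial, data = wide)
    deviance(loglin.fit)                         # the two deviances are identical:
    deviance(logit.fit)                          # the formulations give the same fit
    ## logit effects are contrasts of the loglinear lambda terms:
    coef(logit.fit)["GenderFemale"]                    # logit effect of Gender
    -coef(loglin.fit)["AdmitRejected:GenderFemale"]    # the same number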

Stat 504, Lecture 25 3

A key to using logit models to interpret loglinear models: differences between λ's equal log odds, and functions of λ's equal odds ratios.

Let's consider the blue collar worker data and the homogeneous association model (MW, MS, SW).

Loglinear model:

    log(μ_ijk) = λ + λ_i^M + λ_j^S + λ_k^W + λ_ij^MS + λ_ik^MW + λ_jk^SW

and the probability of worker's job satisfaction:

    π_ij = P(high worker satisfaction | M = i, S = j)

Logit model:

    log(π_ij / (1 - π_ij)) = (λ_1^W - λ_2^W) + (λ_i1^MW - λ_i2^MW) + (λ_j1^SW - λ_j2^SW)

with constant (λ_1^W - λ_2^W), effect of management quality (λ_i1^MW - λ_i2^MW), and effect of supervisor's job satisfaction (λ_j1^SW - λ_j2^SW).


Stat 504, Lecture 25 5

Model Selection: Ref. Ch. 9 (Agresti); more advanced topics on model selection with ordinal data are in Sec. 9.4 and 9.5.

One response variable:
- Logit models can be fit directly and are simpler because they have fewer parameters than the equivalent loglinear model.
- If the response variable has more than two levels, you can use a polytomous logit model.
- If you use loglinear models, the highest-way association among the explanatory variables should be included in all models.
- Whichever formulation you use, logit or loglinear, the results will be the same.

Two or more response variables: use loglinear models because they are more general.

Stat 504, Lecture 25 6

Model selection strategies with loglinear models:
- Determine whether some variables are responses and some explanatory. Include association terms for the explanatory variables in the model, and focus your model search on models that relate the responses to the explanatory variables.
- If a margin is fixed by design, include the appropriate term in the loglinear model (to ensure that the marginal fitted values from the model equal the observed margin).
- Try to determine the necessary level of complexity by fitting models with: marginal/main effects only; all two-way associations; all three-way associations; etc.; up to the highest-way association.
- Use a backward elimination strategy (analogous to the one discussed for logit models) or a stepwise procedure. Be careful in using computer algorithms; you are better off doing likelihood ratio tests (e.g., the blue collar data, or the 4-way table handout from Fienberg on detergent use). See the sketch after this list.
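
A minimal sketch in R of the complexity ladder and backward elimination, again using the built-in UCBAdmissions table as a stand-in (any data frame of counts with three factors would do):

    d  <- as.data.frame(UCBAdmissions)                  # Admit, Gender, Dept, Freq
    m1 <- glm(Freq ~ Admit + Gender + Dept,     family = poisson, data = d)  # main effects only
    m2 <- glm(Freq ~ (Admit + Gender + Dept)^2, family = poisson, data = d)  # all two-way
    m3 <- glm(Freq ~ Admit * Gender * Dept,     family = poisson, data = d)  # saturated
    anova(m1, m2, m3, test = "LRT")   # likelihood ratio tests between complexity levels
    drop1(m2, test = "LRT")           # backward elimination: test each two-way term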

Stat 504, Lecture 25 7

Classes of nested models:
- loglinear models
- hierarchical loglinear models
- graphical loglinear models
- decomposable loglinear models
- conditional independence models

Stat 504, Lecture 25 8

More on model building/selection: Graphical Models

Graphical models are useful and widely applicable:
- Graphs visually represent the scientific content of models and thus facilitate communication.
- Graphs break complex problems/models down into smaller, simpler pieces that can be studied separately.
- Graphs are natural data structures for programming.

History:
- Statistical physics (Gibbs, 1902)
- Genetics and path analysis (Wright, 1921, 1923, 1934)
- Contingency tables (Bartlett, 1935)

Stat 504, Lecture 25 9

References:
- Cowell, R.G., Dawid, A.P., Lauritzen, S.L., and Spiegelhalter, D.J. (1999). Probabilistic Networks and Expert Systems. NY: Springer-Verlag.
- Darroch, J.N., Lauritzen, S.L., and Speed, T.P. (1980). Markov fields and log-linear interaction models for contingency tables. Annals of Statistics, 8, 522-539.
- Edwards, D. (2000). Introduction to Graphical Modelling, 2nd edition. NY: Springer-Verlag. (Includes MIM software.)
- Lauritzen, S.L. (1996). Graphical Models. NY: Oxford Science Publications.
- Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Chichester: Wiley.

Stat 504, Lecture 25 10

Some on-line references:
- Murphy, K. (1998). A brief introduction to graphical models and Bayesian networks. http://www.cs.ubc.ca/~murphyk/bayes/bayes.html
- Graphical models in R: http://www.r-project.org/gr/
- MIM, a graphical modeling software: http://www.hypergraph.dk/

Stat 504, Lecture 25 11

Graphical Models and Contingency Tables
- Determine when marginal and partial associations are the same, so that we can collapse a multi-way table into a smaller table (or tables) to study certain associations.
- Represent substantive theories and hypotheses, which correspond to certain loglinear/logit models.

Two basic types of graphs:
- undirected graphical models
- directed graphical models (or Bayesian networks)
Others: chain graphs, ancestral graphs, etc.

Stat 504, Lecture 25 12

Example 1: The Blue Collar Worker Data.

Example 2: The Ries-Smith Detergent Study (Fienberg, 1980).

Example 3: The Czech autoworkers data come from a prospective epidemiological study of 1841 Czech car factory workers (Edwards and Havranek, 1985). The table cross-classifies the autoworkers by prognostic factors for coronary heart disease. The variable labels are: A, whether or not the worker smokes; B, strenuous mental work; C, strenuous physical work; D, systolic blood pressure; E, the ratio of β to α lipoproteins; and F, family anamnesis of coronary heart disease.

Stat 504, Lecture 25 13

                                   B:     no            yes
    F     E      D        C     A:    no    yes      no    yes
    neg   < 3    < 140    no          44    40       112   67
                          yes         129   145      12    23
                 ≥ 140    no          35    12       80    33
                          yes         109   67       7     9
          ≥ 3    < 140    no          23    32       70    66
                          yes         50    80       7     13
                 ≥ 140    no          24    25       73    57
                          yes         51    63       7     16
    pos   < 3    < 140    no          5     7        21    9
                          yes         9     17       1     4
                 ≥ 140    no          4     3        11    8
                          yes         14    17       5     2
          ≥ 3    < 140    no          7     3        14    14
                          yes         9     16       2     3
                 ≥ 140    no          4     0        13    11
                          yes         5     14       4     4

Table 1: Prognostic factors in coronary heart disease (rows indexed by F, E, D, C; columns by B crossed with A). Source: Edwards and Havranek (1985).

Stat 504, Lecture 25 14

Some terminology and definitions (common to all graphical models):
- Vertices (or nodes), V, are points that represent variables.
- Edges, E, are lines that connect two vertices.
- A graph, G = {V, E}, consists of a finite set of vertices V and a set of edges E.
- The presence of an edge between two vertices indicates that an association may exist between the two variables.
- The absence of an edge between two vertices indicates that the two variables are independent.

Stat 504, Lecture 25 15

Undirected graph: no arrows; edges represent undirected relationships.
Directed graph: edges carry arrows, implying a direction (e.g., a causal relationship).

Stat 504, Lecture 25 16

- Adjacent vertices: vertices directly connected by an edge.
- Path: a sequence of distinct edges that takes you from one vertex to another.
- Separated: two variables are separated if all paths between them are intersected by another variable (or a set of variables).
- Complete: there is an edge between every pair of vertices.
- Clique: a maximally complete subset of vertices.
- Boundary of a vertex: all vertices adjacent to the vertex of interest, excluding the vertex itself.

A sketch of these notions follows.
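
A minimal sketch in R, for a small hypothetical graph (the vertices and edges here are made up for illustration):

    V <- c("A", "B", "C", "D", "E")
    adj <- matrix(0, 5, 5, dimnames = list(V, V))      # adjacency matrix
    edges <- rbind(c("A","B"), c("B","C"), c("B","D"), c("C","D"), c("D","E"))
    adj[edges] <- 1
    adj[edges[, 2:1]] <- 1                             # undirected: symmetric
    boundary <- function(v) V[adj[v, ] == 1]           # all vertices adjacent to v
    boundary("D")                                      # "B" "C" "E"
    ## a vertex set is complete if every pair in it is joined by an edge
    is.complete <- function(S) all(adj[S, S][upper.tri(adj[S, S])] == 1)
    is.complete(c("B", "C", "D"))                      # TRUE: {B,C,D} is a clique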

Stat 504, Lecture 25 17

Example: [graph figure not transcribed]

Stat 504, Lecture 25 18

Independence Graphs

Key result: Two variables are conditionally independent given any subset of variables that separates them.

Def: A graph is a (conditional) independence graph if there is no edge between two vertices whenever the variables they represent are conditionally independent given all the remaining variables. This definition links partial associations to the absence of an edge.

Notes:
1) If all random variables are categorical, then all graphical models for them are loglinear models (but not vice versa).
2) Different loglinear models may have the same graphical representation; e.g., for three variables, the homogeneous association model (AB, AC, BC) and the saturated model (ABC) both correspond to the complete triangle.

Stat 504, Lecture 25 19

Examples: [graph figures not transcribed]

Stat 504, Lecture 25 20

The link between loglinear models and graphs is the partial association model for k variables: a model with a two-way term (and all higher-order terms containing it) set to zero. E.g., A independent of B given ALL the other variables is equivalent to NO edge between A and B in the corresponding graph.

Markov properties:
- Pairwise Markov property: setting a two-way association term to zero specifies a pairwise Markov relationship in the graph: the two variables are conditionally independent given all the rest.
- Global Markov property: any two sets of variables separated by a third set in the graph are conditionally independent given that third set.
- Local Markov property: each variable is conditionally independent of all remaining variables given its boundary.
- Equivalence Theorem: describes the equivalence between the different Markov properties.

Stat 504, Lecture 25 21

Partial association models in 4 dimensions (Example 2, the detergent study, from Fienberg). Let S = 1, U = 2, T = 3, P = 4. Suppose we analyze the data without regard for the response, using loglinear models. There are 6 PA models, one for setting each of the 6 first-order interaction terms equal to zero. There is only one significant two-way term. We also eliminated the higher-order terms! (A sketch of fitting all six PA models follows.)
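
A minimal sketch in R of fitting all six PA models for a 4-way table. For four factors, the PA model that excludes edge (i, j) keeps every term not containing both i and j, i.e. the model [ikl][jkl]. The data frame `d` (factors X1-X4 and counts `n`) is hypothetical here:

    vars  <- c("X1", "X2", "X3", "X4")
    pairs <- combn(vars, 2)                        # the 6 candidate edges
    for (k in seq_len(ncol(pairs))) {
      ij   <- pairs[, k]
      rest <- setdiff(vars, ij)
      f    <- reformulate(c(paste(c(ij[1], rest), collapse = "*"),
                            paste(c(ij[2], rest), collapse = "*")),
                          response = "n")          # n ~ Xi*Xk*Xl + Xj*Xk*Xl
      fit  <- glm(f, family = poisson, data = d)
      cat(ij[1], "--", ij[2], ": G2 =", round(deviance(fit), 2),
          "df =", df.residual(fit), "\n")          # edge exclusion test statistic
    }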

Stat 504, Lecture 25 22

A graphical approach to model selection:
- Fit all partial association models and compute p-values (i.e., unconditional edge exclusion tests).
- Drop the non-significant edges from the graph, and form the resulting independence graph.
- Check for the inclusion of edges, one at a time, using conditional tests, and add back significant edges (i.e., conditional edge inclusion tests).
- Check again for the exclusion of edges. (See the sketch below.)
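
A minimal sketch in R of the conditional edge tests, assuming `fit0` is the current poisson-glm fit after the exclusion stage and X2:X3 is a candidate edge (both names hypothetical):

    add1(fit0, scope = ~ . + X2:X3, test = "LRT")   # conditional edge inclusion test
    drop1(fit0, test = "LRT")                       # conditional edge exclusion tests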

Stat 504, Lecture 25 23

For our 4-way example, begin with the model that includes only the edge between U and P, because u_24 (= λ^UP_jk) is the only nonzero two-way term. Testing for the inclusion of the other edges leads us to add edges (1,3) and (3,4). Now we have a decomposable graph, and we don't gain much by dropping an edge. Next, we can check for the inclusion of edge (2,3) by checking for the inclusion of two terms, u_23 and u_234. We get G² = 0.7, df = 1 and G² = 3.5, df = 2. This is not significant. But if we use AIC = G² - df = 3.5 - 2 > 0, it says include the edge in the model!

Stat 504, Lecture 25 24

Now use BIC = G² - df · log(n) = -7.24 < 0, which says do NOT include the edge. The global search using BIC yields [1][3][24] as the best model. If you want the logit version of the model, where the preference is the response and the first three variables are explanatory, then you need to include those in the model, e.g. [123][34][24], with G² = 8.4, df = 9. The conditional test for [34] gives G² = 3.8, df = 1, p-value = 0.0512.
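
The conditional-test p-value can be checked in R:

    pchisq(3.8, df = 1, lower.tail = FALSE)   # 0.0512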

Stat 504, Lecture 25 25

Example 3: Begin by fitting all PA models and computing G² for the exclusion of each edge. Each G² has df = 16, so the critical chi-square value for p-value = 0.05 is 26.30. If we drop the non-significant edges, we get the following graph: G² = 83.75, df = 51, p-value = 0.0026. Thus we deleted too many edges!
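
Both the cutoff and the overall fit can be checked in R:

    qchisq(0.95, df = 16)                        # 26.30, the edge-exclusion critical value
    pchisq(83.75, df = 51, lower.tail = FALSE)   # 0.0026, the model with dropped edges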

Stat 504, Lecture 25 26

There are about 32,768 decomposable models! How about the model [bf][bc][ace][ade], whose graph is triangulated and decomposable? But this model doesn't fit too well either.

Stat 504, Lecture 25 27

But if we do stepwise model selection in MIM, for example, we get the following as the best graphical model, which is not decomposable: G² = 58.28, df = 49, p-value = 0.17.