Bayesian Networks and Decision Graphs


Bayesian Networks and Decision Graphs, Chapter 7 (slide 1/27)

Learning the structure of a Bayesian network

We have: a complete database of cases over a set of variables.
We want: a Bayesian network structure representing the independence properties in the database.

Cases:
  #   Pr    BT    UT
  1.  yes   pos   pos
  2.  yes   neg   pos
  3.  yes   pos   neg
  4.  yes   pos   neg
  5.  no    neg   neg

(Figure: a candidate network structure over Pr, BT, and UT.)

Bayesian networks from databases I

Example: transmission of symbol strings, P(U):

  First 2 \ Last 3   aaa    aab    aba    abb    baa    bab    bba    bbb
  aa                 0.017  0.021  0.019  0.019  0.045  0.068  0.045  0.068
  ab                 0.033  0.040  0.037  0.038  0.011  0.016  0.010  0.015
  ba                 0.011  0.014  0.010  0.010  0.031  0.046  0.031  0.045
  bb                 0.050  0.060  0.057  0.057  0.016  0.023  0.015  0.023

Consider the chain model N as representing the database:

  T1 -> T2 -> T3 -> T4 -> T5

It is simpler than the database, and it is an (approximate) representation of the database. The chain rule yields the joint probability, which we can compare to the actual database:

  P_N(U) = P(T1, T2, T3, T4, T5) = P(T1) P(T2 | T1) P(T3 | T2) P(T4 | T3) P(T5 | T4).

Bayesian networks from databases II

Example: transmission of symbol strings, P(U):

  First 2 \ Last 3   aaa    aab    aba    abb    baa    bab    bba    bbb
  aa                 0.017  0.021  0.019  0.019  0.045  0.068  0.045  0.068
  ab                 0.033  0.040  0.037  0.038  0.011  0.016  0.010  0.015
  ba                 0.011  0.014  0.010  0.010  0.031  0.046  0.031  0.045
  bb                 0.050  0.060  0.057  0.057  0.016  0.023  0.015  0.023

Consider the chain model N (T1 -> T2 -> T3 -> T4 -> T5) as representing the database; it yields P_N(U):

  First 2 \ Last 3   aaa    aab    aba    abb    baa    bab    bba    bbb
  aa                 0.016  0.023  0.018  0.021  0.044  0.067  0.050  0.061
  ab                 0.030  0.044  0.033  0.041  0.011  0.015  0.012  0.014
  ba                 0.010  0.016  0.012  0.014  0.029  0.045  0.033  0.041
  bb                 0.044  0.067  0.059  0.061  0.016  0.023  0.017  0.021

Are P_N(U) and P(U) sufficiently identical?
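The comparison on this slide can be reproduced numerically. The sketch below is one way to do it, under stated assumptions: symbols are encoded as a = 0, b = 1, and the KL divergence is used as the distance (the slides do not commit to a particular distance measure). It builds P(U) from the table, forms the chain approximation P_N(U) from pairwise marginals, and measures the discrepancy.

```python
import numpy as np
from itertools import product

# P(U) over T1..T5 (binary: 0 = 'a', 1 = 'b'), transcribed from the table
# above. Rows: first two symbols (aa, ab, ba, bb); columns: last three (aaa..bbb).
rows = np.array([
    [0.017, 0.021, 0.019, 0.019, 0.045, 0.068, 0.045, 0.068],
    [0.033, 0.040, 0.037, 0.038, 0.011, 0.016, 0.010, 0.015],
    [0.011, 0.014, 0.010, 0.010, 0.031, 0.046, 0.031, 0.045],
    [0.050, 0.060, 0.057, 0.057, 0.016, 0.023, 0.015, 0.023],
])
P = rows.reshape(2, 2, 2, 2, 2)  # P[t1, t2, t3, t4, t5]
P = P / P.sum()                  # the printed table is rounded; renormalise

def marg(joint, keep):
    """Marginal of `joint` onto the axes listed in `keep`."""
    return joint.sum(axis=tuple(i for i in range(joint.ndim) if i not in keep))

# Pairwise marginals needed by the chain T1 -> T2 -> T3 -> T4 -> T5.
p12, p23 = marg(P, (0, 1)), marg(P, (1, 2))
p34, p45 = marg(P, (2, 3)), marg(P, (3, 4))
p2, p3, p4 = p23.sum(axis=1), p34.sum(axis=1), p45.sum(axis=1)

# P_N(U) = P(T1, T2) P(T3 | T2) P(T4 | T3) P(T5 | T4).
PN = np.empty_like(P)
for t1, t2, t3, t4, t5 in product(range(2), repeat=5):
    PN[t1, t2, t3, t4, t5] = (p12[t1, t2] * p23[t2, t3] / p2[t2]
                              * p34[t3, t4] / p3[t3] * p45[t4, t5] / p4[t4])

# One possible distance: the KL divergence from P to P_N.
kl = float((P * np.log(P / PN)).sum())
```

The entries of PN should come out close to the second table above, and kl is small but nonzero: the chain is an approximate, compressed representation of the database.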

A (naïve) way to look at it

Some agent produces samples of cases from a Bayesian network M over the universe U. These cases are handed over to you, and you should now reconstruct M from the cases.

Assumptions:
- The sample is fair: P(U) reflects the distribution determined by M.
- All links in M are essential.

A naïve procedure: for each Bayesian network structure N, calculate the distance between P_N(U) and P(U); return the network N that minimizes the distance and in which all links are essential.

But this is hardly feasible!

The space of network structures is huge!

The number of DAG structures, as a function of the number of nodes n, is given by the recursion

  f(n) = sum_{i=1}^{n} (-1)^(i+1) * n! / ((n-i)! i!) * 2^(i(n-i)) * f(n-i),   f(0) = 1.

Some example calculations:

  Nodes  Number of DAGs      Nodes  Number of DAGs
  1      1                   13     1.9 * 10^31
  2      3                   14     1.4 * 10^36
  3      25                  15     2.4 * 10^41
  4      543                 16     8.4 * 10^46
  5      29281               17     6.3 * 10^52
  6      3.8 * 10^6          18     9.9 * 10^58
  7      1.1 * 10^9          19     3.3 * 10^65
  8      7.8 * 10^11         20     2.35 * 10^72
  9      1.2 * 10^15         21     3.5 * 10^79
  10     4.2 * 10^18         22     1.1 * 10^87
  11     3.2 * 10^22         23     7.0 * 10^94
  12     5.2 * 10^26         24     9.4 * 10^102
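The recursion can be checked directly. A short Python sketch (the function name num_dags is ours) that reproduces the first rows of the table:

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of DAG structures over n labelled nodes (Robinson's recursion):
    f(n) = sum_{i=1}^{n} (-1)^(i+1) * C(n, i) * 2^(i(n-i)) * f(n-i), f(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * num_dags(n - i)
               for i in range(1, n + 1))

# The first few values match the table: 1, 3, 25, 543, 29281, ...
```

Already at 10 nodes there are about 4.2 * 10^18 structures, so exhaustive search over the space of structures is hopeless.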

Two approaches to structural learning

Score-based learning: produces a series of candidate structures and returns the structure with the highest score.

Constraint-based learning: establishes a set of conditional independence statements for the data, and builds a structure with d-separation properties corresponding to the independence statements found.

Constraint-based learning

Some notation: to denote that A is conditionally independent of B given X in the database, we shall use I(A, B, X).

Some assumptions:
- The database is a faithful sample from a Bayesian network M: A and B are d-separated given X in M if and only if I(A, B, X).
- We have an oracle that correctly answers questions of the type: Is I(A, B, X)?

The algorithm: use the oracle's answers to first establish the skeleton of a Bayesian network (the skeleton is the undirected graph obtained by removing the directions on the arcs). Next, when the skeleton is found, we start looking for the directions on the arcs.

Finding the skeleton

The idea: if there is a link between A and B in M, then they cannot be d-separated, and since the data is faithful this can be checked by asking questions to the oracle: the link A - B is part of the skeleton if and only if not I(A, B, X), for all X.

Assume that the only conditional independence found is a single marginal independence between two nodes. (Figures: the resulting skeleton, and the only possible DAG.)

Assume instead that two conditional independences are found, one conditional on a single variable and one conditional on a pair. (Figure: the resulting skeleton.)

Setting the directions on the links I

Rule 1: if you have three nodes A, B, C such that A - C and B - C, but not A - B, then introduce the v-structure A -> C <- B if there exists an X (possibly empty) such that I(A, B, X) and C is not in X.

Example: assume that we get twelve independence statements: two marginal, three conditional on a single variable, four conditional on pairs, and three conditional on triples. (Figures: the skeleton and the v-structures introduced by Rule 1.)

Setting the directions on the links II

Rule 2 [Avoid new v-structures]: when Rule 1 has been exhausted, and you have A -> B and B - C (and no link between A and C), then direct B -> C.

Rule 3 [Avoid cycles]: if A -> B introduces a directed cycle in the graph, then do A <- B instead.

Rule 4 [Choose randomly]: if none of the Rules 1-3 can be applied anywhere in the graph, choose an undirected link and give it an arbitrary direction.

Example: (Figures: the skeleton, after Rule 1, and after Rule 4.)

The rules: an overview

Rule 1 [Find v-structures]: if you have three nodes A, B, C such that A - C and B - C, but not A - B, then introduce the v-structure A -> C <- B if there exists an X (possibly empty) such that I(A, B, X) and C is not in X.

Rule 2 [Avoid new v-structures]: when Rule 1 has been exhausted, and you have A -> B and B - C (and no link between A and C), then direct B -> C.

Rule 3 [Avoid cycles]: if A -> B introduces a directed cycle in the graph, then do A <- B instead.

Rule 4 [Choose randomly]: if none of the Rules 1-3 can be applied anywhere in the graph, choose an undirected link and give it an arbitrary direction.
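Rule 1 can be turned into code once the skeleton and the stored separating sets are available. A minimal sketch, with assumed conventions (undirected edges as frozenset pairs, sepset mapping a non-adjacent pair to the conditioning set that separated it; the function name is ours):

```python
from itertools import combinations

def find_v_structures(nodes, edges, sepset):
    """Rule 1: for nodes A, C, B with links A - C and B - C but no A - B,
    orient A -> C <- B when C is not in the set X that gave I(A, B, X)."""
    oriented = set()  # directed arcs as (tail, head) pairs
    for c in nodes:
        neighbours = [v for v in nodes if frozenset((v, c)) in edges]
        for a, b in combinations(neighbours, 2):
            pair = frozenset((a, b))
            if pair not in edges and pair in sepset and c not in sepset[pair]:
                oriented.add((a, c))
                oriented.add((b, c))
    return oriented
```

For a skeleton A - C - B where I(A, B) held with an empty conditioning set, this orients both links towards C; for a chain skeleton whose endpoints were separated by the middle node, it orients nothing.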

Another example

Consider the graph shown on the slide, and apply the four rules to learn a Bayesian network structure.

Another example I

Step 1: Rule 1. Step 2: Rule 2. Step 3: Rule 4. Step 4: Rule 2. Step 5: Rule 2. Step 6: Rule 4.

However, we are not guaranteed a unique solution!

Another example II

Step 1: Rule 1. Step 2: Rule 2. Step 3: Rule 4. Step 4: Rule 4. Step 5: Rule 2. Step 6: Rules 2+3.

Although the solution is not necessarily unique, all solutions have the same d-separation properties!

From independence tests to skeleton

Until now, we have assumed that all questions of the form "Is I(A, B, X)?" can be answered (allowing us to establish the skeleton). However, questions come at a price, and we would therefore like to ask as few questions as possible. To reduce the number of questions we exploit the following property:

Theorem: the nodes A and B are not linked if and only if I(A, B, pa(A)) or I(A, B, pa(B)).

Hence it is sufficient to ask questions I(A, B, X) where X is a subset of A's or B's neighbours: an active path from A to B must go through a parent of B.

The PC algorithm

1. Start with the complete graph;
2. i := 0;
3. while a node has at least i + 1 neighbours:
   - for all nodes A with at least i + 1 neighbours:
     - for all neighbours B of A:
       - for all neighbour sets X such that |X| = i and X is a subset of nb(A) \ {B}:
         - if I(A, B, X), then remove the link A - B and store "I(A, B, X)"
   - i := i + 1
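The loop above can be sketched as a small Python function. This is a sketch of the skeleton phase only; the oracle interface (a callable answering "Is I(A, B, X)?") and the data structures are our assumptions:

```python
from itertools import combinations

def pc_skeleton(nodes, oracle):
    """Skeleton phase of the PC algorithm.

    `oracle(a, b, X)` answers the question "Is I(a, b, X)?".
    Returns the surviving undirected links and the stored statements."""
    adj = {v: set(nodes) - {v} for v in nodes}  # start with the complete graph
    sepset = {}
    i = 0
    while any(len(adj[v]) >= i + 1 for v in nodes):
        for a in nodes:
            for b in list(adj[a]):
                if len(adj[a] - {b}) < i:
                    continue  # not enough other neighbours to condition on
                for X in combinations(sorted(adj[a] - {b}), i):
                    if oracle(a, b, frozenset(X)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[frozenset((a, b))] = frozenset(X)
                        break
        i += 1
    edges = {frozenset((a, b)) for a in nodes for b in adj[a]}
    return edges, sepset
```

With a d-separation oracle for the chain A -> B -> C (only I(A, C, {B}) holds), the link A - C is removed and the links A - B and B - C survive.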

Example

We start with the complete graph over the five nodes A, B, C, D, E and ask the questions I(A, B)?, I(A, C)?, I(A, D)?, I(A, E)?, I(B, C)?, I(B, D)?, I(B, E)?, I(C, D)?, I(C, E)?, I(D, E)?. (Figures: the original model and the complete graph.)

We get a yes for two of the questions, and the two corresponding links are therefore removed.

Example

We now condition on one variable and, for each remaining link, ask the questions I(A, B, C)? for every single conditioning variable C. (Figures: the original model and the graph after one iteration.)

One of the questions has the answer "yes": we therefore remove the corresponding link.

Example

We now condition on two variables and ask questions like I(A, B, {C, D})?. (Figures: the original model and the graph after two iterations.)

Two of the questions have the answer "yes": we therefore remove the two corresponding links.

Example

We now condition on three variables, but since no node has four neighbours we are finished. (Figures: the original model and the graph after three iterations.)

The identified set of independence statements consists of two marginal independences, one independence conditional on a single variable, and two conditional on pairs of variables. They are sufficient for applying Rules 1-4.

Real-world data

The oracle is a statistical test, e.g. conditional mutual information:

  CMI(A, B | X) = sum_X P(X) sum_{A,B} P(A, B | X) log [ P(A, B | X) / (P(A | X) P(B | X)) ],

with I(A, B, X) if and only if CMI(A, B | X) = 0.

However, all tests have false positives and false negatives, which may produce false results/causal relations! Similarly, false results may also be caused by hidden variables. (Figure.)
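The conditional mutual information test can be sketched numerically. A minimal Python version (the function name and the axis layout p[a, b, x] are our choices) computing CMI(A, B | X) from a joint table P(A, B, X):

```python
import numpy as np

def cmi(p):
    """CMI(A, B | X) for a joint table p[a, b, x] of three discrete variables:
    sum_X P(X) sum_{A,B} P(A,B|X) log( P(A,B|X) / (P(A|X) P(B|X)) )."""
    px = p.sum(axis=(0, 1))                   # P(X)
    pab_x = p / px                            # P(A, B | X)
    pa_x = pab_x.sum(axis=1, keepdims=True)   # P(A | X)
    pb_x = pab_x.sum(axis=0, keepdims=True)   # P(B | X)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_term = np.where(p > 0, np.log(pab_x / (pa_x * pb_x)), 0.0)
    return float((p * log_term).sum())        # P(X) P(A,B|X) = P(A,B,X)
```

In practice one declares I(A, B, X) when the estimated CMI falls below a threshold calibrated by a statistical test (e.g. the G-test), since sampled data never gives exactly zero.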

Properties

If the database is a faithful sample from a Bayesian network and the oracle is reliable, then a solution exists.

You can only infer causal relations if no hidden variables exist.