Approximate Inference 1. Forward Sampling


Approximate Inference

This section on approximate inference relies on samples / particles. Full particles are complete assignments to all network variables, e.g. (X_1 = x_1, X_2 = x_2, ..., X_N = x_N).

Topological sort or order: an ordering of the nodes in the DAG in which X comes before Y whenever there is a directed path from X to Y in the graph. A topological order is a total order consistent with the partial order induced by the graph's edges, and there may be several topological orderings. For the three-node example graph on the slide (nodes A, B, C), two topological orders are A, B, C and B, A, C.

Student Example

The network has five variables: Difficulty (D), Intelligence (I), Grade (G), SAT (S), and Letter (L), with edges D -> G, I -> G, I -> S, and G -> L. Its CPDs are given below (entries that were illegible in the source are reconstructed so that each conditional distribution sums to 1):

D     P(D)            I     P(I)
low   0.6             low   0.7
high  0.4             high  0.3

I     S      P(S|I)
low   low    0.95
low   high   0.05
high  low    0.2
high  high   0.8

D     I      G   P(G|D,I)
low   low    C   0.3
low   low    B   0.4
low   low    A   0.3
low   high   C   0.02
low   high   B   0.08
low   high   A   0.9
high  low    C   0.7
high  low    B   0.25
high  low    A   0.05
high  high   C   0.2
high  high   B   0.3
high  high   A   0.5

G     L       P(L|G)
C     weak    0.99
C     strong  0.01
B     weak    0.4
B     strong  0.6
A     weak    0.1
A     strong  0.9

Topological ordering: D, I, G, S, L.

1. Sample D from P(D) (say you get D=high).
2. Sample I from P(I) (say you get I=low).
3. Sample G from P(G | I=low, D=high) (say you get G=C).
4. Sample S from P(S | I=low) (say you get S=low).
5. Sample L from P(L | G=C) (say you get L=weak).

You now have a sample (D=high, I=low, G=C, S=low, L=weak).

Suppose you want to calculate P(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) using forward sampling on a Bayesian network. The algorithm (see the sketch below):

1. Do a topological sort of the nodes in the Bayesian network.
2. For j = 1 to NUM_SAMPLES: for each node i in the ordering (starting from the top of the Bayesian network down), sample the value x̂_i from the distribution P(X_i | Parents(X_i)), then add {x̂_1, x̂_2, ..., x̂_n} to your collection of samples.
3. Let P̂(X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = (# of samples with X_1 = x_1, X_2 = x_2, ..., X_n = x_n) / NUM_SAMPLES.
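Below is a minimal Python sketch of this forward-sampling procedure on the student network above. The dictionary encoding of the CPTs and the helper names (cpds, sample_from, forward_sample) are illustrative choices, not part of the slides.

import random

# CPTs from the student example: each variable maps to (parent list, CPT),
# where the CPT is keyed by the parent assignment (a tuple).
cpds = {
    "D": ([], {(): {"low": 0.6, "high": 0.4}}),
    "I": ([], {(): {"low": 0.7, "high": 0.3}}),
    "G": (["D", "I"], {("low", "low"):   {"C": 0.3,  "B": 0.4,  "A": 0.3},
                       ("low", "high"):  {"C": 0.02, "B": 0.08, "A": 0.9},
                       ("high", "low"):  {"C": 0.7,  "B": 0.25, "A": 0.05},
                       ("high", "high"): {"C": 0.2,  "B": 0.3,  "A": 0.5}}),
    "S": (["I"], {("low",):  {"low": 0.95, "high": 0.05},
                  ("high",): {"low": 0.2,  "high": 0.8}}),
    "L": (["G"], {("C",): {"weak": 0.99, "strong": 0.01},
                  ("B",): {"weak": 0.4,  "strong": 0.6},
                  ("A",): {"weak": 0.1,  "strong": 0.9}}),
}
order = ["D", "I", "G", "S", "L"]              # a topological ordering

def sample_from(dist):
    """Draw one value from a {value: probability} dictionary."""
    r, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if r < cum:
            return value
    return value                                # guard against rounding

def forward_sample():
    """Sample every variable in topological order given its parents."""
    x = {}
    for var in order:
        parents, cpt = cpds[var]
        x[var] = sample_from(cpt[tuple(x[p] for p in parents)])
    return x

# Estimate P(D=high, I=low) from NUM_SAMPLES forward samples.
NUM_SAMPLES = 10000
samples = [forward_sample() for _ in range(NUM_SAMPLES)]
hits = sum(s["D"] == "high" and s["I"] == "low" for s in samples)
print(hits / NUM_SAMPLES)                       # should be near 0.4 * 0.7 = 0.28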

How do you sample from P(X_i | Parents(X_i))? Note that P(X_i | Parents(X_i)) is a multinomial distribution over the values x_{i1}, ..., x_{ik} with parameters θ_1, ..., θ_k.

How do you sample from a multinomial distribution with parameters θ_1, ..., θ_k?

1. Generate a sample s uniformly from [0, 1).
2. Partition the interval into k subintervals: [0, θ_1), [θ_1, θ_1 + θ_2), ... More generally, the i-th interval is [θ_1 + ... + θ_{i-1}, θ_1 + ... + θ_i).
3. If s lies in the i-th interval, the sample value is x_i.

Use binary search to find the interval containing s in time O(log k).
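Here is one way this could look in Python, using the standard-library bisect module for the binary search; sample_multinomial and its arguments are illustrative names, not from the slides.

import bisect
import random

def sample_multinomial(values, thetas):
    """Draw one value x_i with probability theta_i: partition [0, 1) into the
    cumulative subintervals described above and locate a uniform draw s with
    binary search, O(log k) per sample after an O(k) cumulative-sum setup."""
    cum = []
    total = 0.0
    for theta in thetas:
        total += theta
        cum.append(total)                       # cum[i] = theta_1 + ... + theta_{i+1}
    s = random.random()                         # s ~ Uniform[0, 1)
    i = min(bisect.bisect_right(cum, s), len(values) - 1)   # guard against rounding
    return values[i]

# Example: one draw from P(G | D=high, I=high) in the student network.
print(sample_multinomial(["C", "B", "A"], [0.2, 0.3, 0.5]))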

Suppose your list of samples looks like the following table:

D     I     G   S     L
low   low   B   low   weak
low   high  A   high  strong
low   high  A   high  weak
high  high  A   high  strong
high  low   C   low   weak

Then P̂(I=high) = 3/5 = 0.6. Note that this estimate becomes much more accurate as the number of samples goes to infinity.

From a set of samples {ξ[1], ..., ξ[M]}, we can estimate the expectation of any function f as

  Ê(f) = (1/M) Σ_m f(ξ[m])

To estimate P(y), use

  P̂(y) = (1/M) Σ_m I{y[m] = y}

where the indicator I{y[m] = y} is 1 if the sample ξ[m] matches the values in y (e.g. y is I=high) and 0 otherwise.
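As a small check, this estimator can be written directly against the table above; the function and the dictionary representation of the samples are illustrative, not from the slides.

def estimate_probability(samples, query):
    """Monte Carlo estimate of P(y): the fraction of samples whose values
    agree with the query assignment y (the indicator I{y[m] = y} above)."""
    matches = sum(all(s[var] == val for var, val in query.items())
                  for s in samples)
    return matches / len(samples)

# The five samples from the table above, written as dictionaries.
samples = [
    {"D": "low",  "I": "low",  "G": "B", "S": "low",  "L": "weak"},
    {"D": "low",  "I": "high", "G": "A", "S": "high", "L": "strong"},
    {"D": "low",  "I": "high", "G": "A", "S": "high", "L": "weak"},
    {"D": "high", "I": "high", "G": "A", "S": "high", "L": "strong"},
    {"D": "high", "I": "low",  "G": "C", "S": "low",  "L": "weak"},
]
print(estimate_probability(samples, {"I": "high"}))    # 3/5 = 0.6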

Rejection Sampling

What if we want to estimate P(y | E=e)? Rejection sampling: do forward sampling, but throw out samples where E ≠ e.

Example: P(I=high | L=weak) = 1/3, using the same table of samples:

D     I     G   S     L
low   low   B   low   weak
low   high  A   high  strong
low   high  A   high  weak
high  high  A   high  strong
high  low   C   low   weak

What if the evidence E=e is very rare? For example, if P(e) = 0.001, then out of 10,000 samples we expect only about 10 to survive rejection. To obtain at least M* unrejected samples, we need to generate on average M = M*/P(e) samples. If the evidence is rare, we end up generating a lot of samples, which wastes time (see the sketch below).

Bad news: rare evidence is the norm! As the number of evidence variables k = |E| grows, the probability of the evidence decreases exponentially with k. We need something better than rejection sampling!
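A sketch of the rejection-sampling loop, again with illustrative names; it assumes a sampler like the forward_sample function from the earlier sketch is available.

import math

def rejection_sample_estimate(forward_sample, evidence, query, num_samples):
    """Estimate P(query | evidence) by forward sampling and discarding every
    sample that disagrees with the evidence E = e."""
    accepted = matched = 0
    for _ in range(num_samples):
        s = forward_sample()
        if any(s[var] != val for var, val in evidence.items()):
            continue                            # reject: sample has E != e
        accepted += 1
        if all(s[var] == val for var, val in query.items()):
            matched += 1
    return matched / accepted if accepted else math.nan

# Usage with the forward_sample function from the earlier sketch:
#   rejection_sample_estimate(forward_sample, {"L": "weak"}, {"I": "high"}, 10000)
# If the evidence is rare, almost every iteration falls into the reject branch,
# which is exactly the wasted work described above.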

Likelihood Weighting

Intuition: weight samples according to the probability of the evidence. Consider the Intelligence -> SAT fragment of the network:

I     P(I)            I     S      P(S|I)
low   0.7             low   low    0.95
high  0.3             low   high   0.05
                      high  low    0.2
                      high  high   0.8

Drawing I = high when the evidence is S = high should count as 80% of a sample (since P(S=high | I=high) = 0.8), while drawing I = low should count as only 5% of a sample (since P(S=high | I=low) = 0.05).

Likelihood Weighting

Likelihood weighting works with weighted particles ⟨ξ[1], w[1]⟩, ..., ⟨ξ[M], w[M]⟩ and uses the estimate

  P̂(y | e) = ( Σ_m w[m] I{y[m] = y} ) / ( Σ_m w[m] )

Procedure LW-Sample(B, Z=z)      // B: Bayesian network over X; Z=z: observed event in the network
 1. Let X_1, ..., X_n be a topological ordering of X
 2. w <- 1
 3. for i = 1, ..., n
 4.   u_i <- x⟨Pa_{X_i}⟩         // assignment to Pa_{X_i} in x_1, ..., x_{i-1}
 5.   if X_i ∉ Z then
 6.     sample x_i from P(X_i | u_i)
 7.   else
 8.     x_i <- z⟨X_i⟩            // assignment to X_i in z
 9.     w <- w · P(x_i | u_i)    // multiply weight by probability of the observed value
10. return (x_1, ..., x_n), w
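A possible Python rendering of LW-Sample and the weighted estimate, assuming the cpds/order dictionary representation from the forward-sampling sketch earlier; all function names are illustrative.

import random

def sample_from(dist):
    """Draw one value from a {value: probability} dictionary."""
    r, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if r < cum:
            return value
    return value                                # guard against rounding

def lw_sample(cpds, order, evidence):
    """One run of LW-Sample: evidence variables are clamped to their observed
    values and the weight accumulates P(x_i | u_i) for each clamped variable."""
    x, w = {}, 1.0
    for var in order:                           # topological ordering
        parents, cpt = cpds[var]
        dist = cpt[tuple(x[p] for p in parents)]   # P(X_i | u_i)
        if var in evidence:
            x[var] = evidence[var]              # clamp to the observed value
            w *= dist[x[var]]                   # multiply in its likelihood
        else:
            x[var] = sample_from(dist)
    return x, w

def lw_estimate(cpds, order, evidence, query, num_samples):
    """Weighted estimate of P(query | evidence) from num_samples particles."""
    num = den = 0.0
    for _ in range(num_samples):
        x, w = lw_sample(cpds, order, evidence)
        den += w
        if all(x[v] == val for v, val in query.items()):
            num += w
    return num / den

# Usage with the cpds/order defined in the forward-sampling sketch:
#   lw_estimate(cpds, order, {"L": "weak"}, {"I": "high"}, 10000)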
