Forward Sampling. Forward Sampling. Forward Sampling. Approximate Inference 1

Size: px

Start display at page:

Download "Forward Sampling. Forward Sampling. Forward Sampling. Approximate Inference 1"

Dwight Pitts
6 years ago
Views:

1 Approximate Inference This section on approximate inference relies on samples / particles Full particles: complete assignment to all network variables eg. ( = x, = x,..., N = x N ) Topological sort or order: An ordering of the nodes in the AG where comes before Y in the ordering if there is a directed path from to Y in the graph. A topological order is equivalent to a partial order on the nodes of the graph There may be several topological orderings A C B Examples of Topological orders: A,B,C, B,A,C, Student Example P() low 0.6 high 0.4 I G P(G,I) low low C 0.3 low low B 0.4 low low A 0.3 low high C 0.0 low high B 0.08 low high A 0.9 high low C 0.7 high low B 0.5 high low A 0.05 high high C 0. high high B 0.3 ifficulty Grade Letter Intelligence SAT I P(I) low 0.7 high 0.3 I S P(S I) low low 0.95 low high 0.05 high low 0. high high 0.8 G L P(L G) C weak 0.99 C strong 0.0 B weak 0.4 B strong 0.6 A weak 0. A strong high high A 0.5 4

2 ifficulty Grade Letter Intelligence SAT Topological ordering:, I, G, S, L. Sample from P() (Say you get =high). Sample I from P(I) (Say you get I=low) 3. Sample G from P(G I=low,=high) (Say you get G=C) 4. Sample S from P(S I=low) (Say you get S=low) 5. Sample L from P(L G=C) (Say you get L=weak) Suppose you want to calculate P( = x, = x,, n = x n ) using forward sampling on a Bayesian network. The algorithm:. o a topological sort of the nodes in the Bayesian network.. For j = to NU_SAPLES For each node i in the ordering (starting from the top of the Bayesian network down) Sample the value xˆi from the distribution P( i Parents( i )) Add { xˆ ˆ ˆ, x,..., xn} to your collection of samples 3. Let P( x, x,..., n xn ) # of samples with x, x,..., NU _ SAPLES You now have a sample (=high, I=low, G=C, S=low, L=weak) 5 6 n x n How do you sample from P( i Parents( i ))? Note: P( i Parents( i )) is a multinomial distribution P(x i,..., x ik,..., k )? How do you sample from a multinomial distribution P(x i,..., x ik,..., k )? Generate a sample s uniformly from [0,] Partition interval into k subintervals: [0, ), [, + ),... ore generally, the ith interval is i i j, j j j If s is in the ith interval, the sample value is x i. Use binary search to find the interval for s in time O(log k) 7 8

3 Suppose your list of samples looks like the following table: I G S L low low B low weak low high A high strong low high A high weak high high A high strong high low C low weak P(I=high) = 3/5 = 0.6 Note that this value becomes a lot more accurate as the number of samples heads to infinity. 9 From a set of particles = {[],..., []}, we can estimate the expectation of any function f as: Eˆ ( f ) m To estimate P( Pˆ ( I{ y[ y} m f ( [ ) This is the values of the variables in Y in the particle [ 0 efine = total # of particles generated n = p = max i Pa i d = max i Val( i ) Overall cost of sampling is O( n p log(d)) To get the CP entry for given Pa, it costs O(p) Sampling process for P( Pa ) costs O(log(d)) How accurate is this estimate? Using the Hoeffding bound: ˆ P ( P ( [ P(, P( ]) e How many samples are required to achieve an estimate whose error is bounded by, with probability at least (-)? Setting e ln( / ) we get 3

4 How accurate is this estimate? Using the Chernoff bound: ˆ P( /3 P ( P ( P( ( )) e How many samples are required to achieve an estimate whose error is bounded by, with probability at least (-)? ln( / ) 3 P( Note: This requires us to know P( Rejection Sampling 3 4 Rejection Sampling What if we want to estimate P(y E=e)? Rejection sampling: do forward sampling but throw out samples where Ee Example: P(I=high L=weak) = /3 I G S L low low B low weak low high A high strong low high A high weak high high A high strong high low C low weak Rejection Sampling What if the evidence E=e is very very rare? For example, if P(e) = 0.00, then for 0,000 samples, we get 0 unrejected samples To obtain at least * unrejected samples, we need to generate on average = */P(e) samples If evidence is rare, we end up generating a lot of samples which wastes time 5 6 4

5 Rejection Sampling Bad news: Rare evidence is the norm! As # of evidence variables k = E grows, the probability of the evidence decreases exponentially with k Likelihood Weighting Need something better than rejection sampling! 7 8 Likelihood Weighting Likelihood Weighting Intuition: Weight samples according to probability of the evidence Intelligence SAT I P(I) low 0.7 high 0.3 I S P(S I) low low 0.95 low high 0.05 high low 0. high high 0.8 rawing I = high and S = high should be 80% of a sample rawing I = low and S = high should be 5% of a sample 9 Weighted particles: [ ], w[],..., [ ], w[ ] Estimate: Pˆ ( y e) m w[ I{ y[ m w[ y} 0 5

6 Likelihood Weighting Procedure LW-Sample(, // Bayesian network over Z=z // Event in the network ). Let,, n be a topological ordering of. w 3. for i =,, n 4. u i x<pa i > // Assignment to Pa i in x,, x i- 5. if i Zthen 6. Sample x i from P( i u i ) 7. else 8. x i z< i > // Assignment to i in z 9. w w P(x i u i ) // ultiply weight by probability of desired value 0. return (x,, x n ), w 6

Approximate Inference 1. Forward Sampling

Approximate Inference 1. Forward Sampling Approximate Inference This section on approximate inference relies on samples / particles Full particles: complete assignment to all network variables eg. (X = x, X = x,..., X N = x N ) Topological sort