Lifted Stochastic Planning, Belief Propagation and Marginal MAP


Hao Cui and Roni Khardon
Department of Computer Science, Tufts University, Medford, Massachusetts, USA

Abstract

It is well known that the problems of stochastic planning and probabilistic inference are closely related. This paper makes several contributions in this context for factored spaces where the complexity of solutions is challenging. First, we analyze the recently developed SOGBOFA heuristic, which performs stochastic planning by building an explicit computation graph capturing an approximate aggregate simulation of the dynamics. It is shown that the values computed by this algorithm are identical to the approximation provided by Belief Propagation (BP). Second, as a consequence of this observation, we show how ideas on lifted BP can be used to develop a lifted version of SOGBOFA. Unlike implementations of lifted BP, Lifted SOGBOFA has a very simple implementation as a dynamic programming version of the original graph construction. Third, we show that the idea of graph construction for aggregate simulation can be used to solve marginal MAP (MMAP) problems in Bayesian networks, where MAP variables are constrained to be at the roots of the network. This yields a novel algorithm for MMAP for this subclass. An experimental evaluation illustrates the advantage of Lifted SOGBOFA for planning.

Introduction

The connection between planning and inference is well known both for the logical setting and the probabilistic setting. Over the last decade multiple authors have introduced explicit reductions showing how stochastic planning can be solved using probabilistic inference (Domshlak and Hoffmann 2006; Toussaint and Storkey 2006; Furmston and Barber 2010; Lang and Toussaint 2010; Furmston and Barber 2011; Neumann 2011; Kober and Peters 2011; Sabbadin, Peyrard, and Forsell 2012; Liu and Ihler 2012; Cheng et al. 2013; Kiselev and Poupart 2014; Nitti, Belle, and Raedt 2015; van de Meent et al. 2016; Lee, Marinescu, and Dechter 2016; Nguyen, Kumar, and Lau 2017), with applications in robotics, scheduling and environmental problems. However, heuristic methods and search are still the best performing approaches for planning in large combinatorial state and action spaces (Kolobov et al. 2012; Keller and Helmert 2013; Cui and Khardon 2016). This paper makes several contributions in this context.

We analyze a recent heuristic algorithm that was shown to be very effective in practice for large action spaces. SOGBOFA (Cui and Khardon 2016) builds an approximate computation graph capturing marginals of state and reward variables under independence assumptions. It then uses gradient-based search to optimize the action choice. Our analysis shows that the value computed by SOGBOFA's computation graph is identical to the solution of Belief Propagation (BP) when conditioned on actions. Abstracting this suggests a novel approach to planning and marginal MAP problems, where approximations are often used: if we can compute a closed-form representation of the value (or marginal probability) computed by the chosen approximation, when conditioned on the actions (max variables), then we can perform the optimization through automatic differentiation (Griewank and Walther 2008). In the case of SOGBOFA we have a closed form for the result of BP for a sub-family of directed graphs.
This captures marginal MAP (MMAP) problems for Bayesian networks with max variables at the roots and evidence restricted to a single leaf node. Given this analysis it is natural to ask whether we can improve SOGBOFA further by taking advantage of improvements to BP, for example Lifted BP (Singla and Domingos 2008; Kersting, Ahmadi, and Natarajan 2009), where repeated messages are avoided in the computation. As we show, this can be done, and the form of the lifted algorithm is extremely simple. For the sub-family of directed graphs, lifted BP is achieved simply by using a dynamic programming implementation of the construction of the SOGBOFA graph. This is an important advantage because typical implementations of lifted inference are complex, whereas here the implementation comes almost for free. An experimental evaluation shows that the computation time of SOGBOFA is improved and that, under the same time constraints, Lifted SOGBOFA improves planning performance over SOGBOFA.

Finally, while SOGBOFA was developed for planning, it can be used to solve MMAP problems. The original construction for SOGBOFA assumes one final Q-value node which is being optimized. It is not hard to see that this can be adapted to solve any MMAP problem with decision variables at the roots of the Bayesian network. This provides a novel alternative to the mixed-BP algorithm of Liu and Ihler (2013) for MMAP, where our algorithm combines lifted BP conditioned on MAP nodes with gradient search.

Preliminaries

We assume familiarity with basic notions in Bayesian networks and stochastic planning. This section recalls some details of BP and SOGBOFA.

Belief Propagation in Bayesian Networks

For our results it is convenient to refer to the BP algorithm for directed graphs (Pearl 1989). A Bayesian network is given by a directed acyclic graph where each node $x$ is associated with a random variable and a corresponding conditional probability table (CPT) capturing $p(x \mid \mathrm{parents}(x))$. The joint probability of all variables is given by $\prod_i p(x_i \mid \mathrm{parents}(x_i))$. We restrict the presentation in this paper to Bayesian networks with binary variables, i.e., $x_i \in \{0, 1\}$.

BP calculates an approximation of the marginal probability of each node by sending messages among adjacent nodes. Assume first that the directed graph is a polytree (no underlying undirected cycles). The marginal of $x$, as calculated by BP, is denoted by $\mathrm{BEL}(x) = p(x \mid e)$, where $e$ is the total evidence in the graph. Let $\pi(x) = p(x \mid e^+)$ and $\lambda(x) = p(e^- \mid x)$, where $e^+$ is the evidence that can reach $x$ through its parents and $e^-$ is the evidence that can be reached from $x$ through its children. We use $\alpha$ to represent a normalization constant, for example, $\alpha(2, 3, 5) = (0.2, 0.3, 0.5)$, and use $\beta$ to denote some constant. For a polytree, $x$ separates its parents from its children and

$$\mathrm{BEL}(x) = \alpha\,\pi(x)\,\lambda(x) \qquad (1)$$

where

$$\lambda(x) = \prod_{j=1}^{n} \lambda_{z_j}(x) \qquad (2)$$

and

$$\pi(x) = \sum_{w} p(x \mid w) \prod_{k=1}^{n} \pi_x(w_k) \qquad (3)$$

incorporate evidence through children and parents respectively. In (3) the sum variable $w$ ranges over all assignments to the parents of $x$ and $w_k$ is the induced value of the $k$th parent. $\lambda_{z_j}(x)$ is the message that each child $z_j$ sends to its parent $x$, and $\pi_x(w_k)$ is the message that each parent $w_k$ sends to $x$. The messages are given by

$$\lambda_x(w_i) = \beta \sum_{x} \lambda(x) \sum_{w_k : k \neq i} p(x \mid w) \prod_{k \neq i} \pi_x(w_k) \qquad (4)$$

$$\pi_{z_j}(x) = \alpha \prod_{k \neq j} \lambda_{z_k}(x)\,\pi(x) \qquad (5)$$

where in (4) $w_i$ is fixed and the sum is over the values of the other parents $w_k$. Since the nodes are binary the messages have two values.

The algorithm is initialized as follows. If $x = 0$ is observed, we set both $\pi(x)$ and $\lambda(x)$ to $(0, 1)$, and if $x = 1$ is observed we set them to $(1, 0)$. Otherwise, if a node $x$ is a root (parent-less), we set $\pi(x)$ to be the prior probability of $x$. If $x$ is child-less, we set $\lambda(x)$ to be $(1, 1)$, i.e., an uninformative value. A node can send a message along an edge if all messages from its other edges have been received. If the graph is a tree (or polytree) then one pass on the graph from leaves to root and one pass from root to leaves provide $\mathrm{BEL}(x) = p(x)$ for any $x$ in the tree, where $p(x)$ is the true marginal probability.

The loopy BP algorithm applies the same updates even if the graph is not a polytree. In this case we initialize all messages to $(1, 1)$ and follow the same update rules for messages according to some schedule. The algorithm is not guaranteed to converge but it often does and it is known to perform well in many cases. The following property of BP is well known:

Lemma 1. If loopy BP is applied to a graph $G$ with no evidence, i.e., $\lambda(x) = (1, 1)$ for all $x$ at initialization, then for any order of message updates and at any time in the execution of loopy BP, for any node $x$, $\lambda(x) \propto (1, 1)$ and $\lambda_x(w)$ is a constant independent of $w$ for any parent $w$ of $x$.
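To make the no-evidence case concrete, the following minimal sketch (ours, not the paper's implementation; the function and data-structure names are illustrative) computes marginals by a single forward pass over the network in topological order, using the consequence of Lemma 1 that the $\lambda$ messages stay uninformative so that $\mathrm{BEL}(x) = \pi(x)$:

```python
import itertools

# `parents` maps node -> ordered list of parent names; `cpt` maps node ->
# dict from parent-assignment tuples (0/1 per parent) to p(node = 1 | assignment).

def forward_bp(nodes_in_topo_order, parents, cpt):
    bel = {}  # node -> approximate p(node = 1)
    for x in nodes_in_topo_order:
        ps = parents[x]
        if not ps:                       # root: the prior is stored under the empty tuple
            bel[x] = cpt[x][()]
            continue
        p1 = 0.0
        # Sum over all joint parent assignments, weighting each by the product
        # of parent beliefs (parents treated as independent, as in BP).
        for w in itertools.product([0, 1], repeat=len(ps)):
            weight = 1.0
            for wk, pk in zip(w, ps):
                weight *= bel[pk] if wk == 1 else 1.0 - bel[pk]
            p1 += cpt[x][w] * weight
        bel[x] = p1
    return bel

# Example: two independent roots a, b with a common child c.
parents = {"a": [], "b": [], "c": ["a", "b"]}
cpt = {"a": {(): 0.3}, "b": {(): 0.6},
       "c": {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 0.95}}
print(forward_bp(["a", "b", "c"], parents, cpt))  # exact here: the graph is a polytree
```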
AROLLOUT and SOGBOFA

Stochastic planning is typically captured as a Markov decision process (MDP) (Puterman 1994) with factored state and action spaces. An MDP is specified by $\{S, A, T, R, \gamma\}$, where $S$ is a finite state space, $A$ is a finite action space, $T(s, a, s') = p(s' \mid s, a)$ defines the transition probabilities, $R(s, a)$ is the immediate reward of taking action $a$ in state $s$, and $\gamma$ is the discount factor. In factored spaces (Boutilier, Dean, and Hanks 1995) the state space $S$ is specified by a set of binary variables $\{x_1, \ldots, x_l\}$ so that $|S| = 2^l$. Similarly, $A$ is specified by a set of binary variables $\{y_1, \ldots, y_m\}$ so that $|A| = 2^m$.

In off-line planning the task is to compute a policy that optimizes the long-term reward. More formally, any policy $\pi : S \to A$ induces a distribution over trajectories, and the task in planning is to optimize the expected discounted total reward $E[\sum_i \gamma^i R(s_i, \pi(s_i)) \mid \pi]$, where $s_i$ is the $i$th state visited by following $\pi$ (and $s_0 = s$). For finite horizon problems the sum goes up to the fixed horizon and we use $\gamma = 1$, i.e., discounting is not needed.

In on-line planning we are given a fixed limited time $t$ per step and cannot compute a policy in advance. Instead, given the current state, the algorithm must decide on the next action consuming at most $t$ time. Then the action is performed in the MDP, a transition and reward are observed, and the algorithm is presented with the next state. This process repeats and the long-term performance of the algorithm is evaluated. On-line planning has been the standard evaluation method in recent planning competitions.

The algorithms discussed in this paper perform on-line planning by estimating the value of every initial action, where a rollout policy, typically a random policy, is used in future steps. They differ in how they estimate the value of an initial action through a forward simulation and in how they search over possibilities for the initial action. The AROLLOUT algorithm (Cui et al. 2015) introduced the idea of algebraic simulation to estimate values but optimized over actions in a brute-force manner. The work of Cui and Khardon (2016) showed how rollouts can be computed symbolically and that the optimization can be done using automatic differentiation, leading to the SOGBOFA algorithm.
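For concreteness, here is a minimal sketch (ours; the simulator interface `step(state, action) -> (next_state, reward)` and all names are assumptions, not the AROLLOUT code) of the concrete-rollout baseline that these algorithms refine: each candidate initial action is scored by Monte Carlo rollouts under a random rollout policy, and the best-scoring action is executed.

```python
import random

def rollout_value(step, state, action, actions, horizon, n_rollouts=100):
    """Monte Carlo estimate of the value of taking `action` in `state`."""
    total = 0.0
    for _ in range(n_rollouts):
        s, r = step(state, action)            # fix the initial action
        ret = r
        for _ in range(horizon - 1):          # random rollout policy afterwards
            s, r = step(s, random.choice(actions))
            ret += r                          # finite horizon, gamma = 1
        total += ret
    return total / n_rollouts

def next_action(step, state, actions, horizon):
    # One on-line planning step: estimate every initial action, act greedily.
    return max(actions, key=lambda a: rollout_value(step, state, a, actions, horizon))
```

AROLLOUT replaces the inner sampling loop with a single algebraic (aggregate) simulation of marginals, and SOGBOFA replaces the enumeration over initial actions with gradient search.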

Finite horizon planning can be translated from a high-level language (e.g., RDDL (Sanner 2010)) to a dynamic Bayesian network (DBN). SOGBOFA transforms the CPT of a node $x$ into disjoint sum form. The CPT is split into $n$ disjoint portions $g_i$, each sharing the same value of $p(x \mid \mathrm{parents})$. Each portion is further split into mutually exclusive clauses $c_{i1}, \ldots, c_{ij}$. In other words, the CPT is represented in the form: if $(c_{11} \vee c_{12} \vee \ldots)$ then $p_1$; $\ldots$; if $(c_{n1} \vee c_{n2} \vee \ldots)$ then $p_n$, where the elements $c_{i_1 j_1}$ and $c_{i_2 j_2}$ are mutually exclusive and exhaustive.

AROLLOUT performs a forward pass calculating $\hat{p}(x)$, an approximation of the true marginal $p(x)$, for every node in the graph. $\hat{p}(x)$ is calculated as a function of $\hat{p}(c_{ij})$, an estimate of the probability that $c_{ij}$ is true:

$$\hat{p}(x) = \sum_{ij} p(x \mid c_{ij})\,\hat{p}(c_{ij}) = \sum_{ij} p_i\,\hat{p}(c_{ij}) \qquad (6)$$

where $\hat{p}(c_{ij})$ is calculated assuming the parents are independent:

$$\hat{p}(c_{ij}) = \prod_{w_k \in c_{ij}} \hat{p}(w_k) \prod_{\neg w_k \in c_{ij}} (1 - \hat{p}(w_k)). \qquad (7)$$

The main observation in SOGBOFA is that instead of using this process to calculate concrete values, we can use the expressions to construct an explicit data structure representing the computation graph, where the last node represents the accumulated reward. Once this is done, we can use automatic differentiation and gradient-based search to optimize the action variables. SOGBOFA includes several additional heuristics, including dynamic control of simulation depth, dynamic selection of gradient step size, maintaining domain constraints, and a balance between gradient search and random restarts. Most of this is orthogonal to the topic of this paper and we omit the details. We discuss the control of simulation depth with the experimental section, where it bears on the results presented.

AROLLOUT is Equivalent to BP

We first show that the computation of AROLLOUT can be rewritten as a sum over assignments.

Lemma 2. AROLLOUT's calculation in Eq (6) and (7) is equivalent to

$$\hat{p}(x) = \sum_{W} p(x \mid W) \prod_{k} \hat{p}(w_k)^{w_k} (1 - \hat{p}(w_k))^{1 - w_k} \qquad (8)$$

where $W$ ranges over the assignments to the parents of $x$ and $w_k \in \{0, 1\}$ is the value assigned to the $k$th parent.

Proof. The sum in (8) can be divided into disjoint sets of assignments according to the $c_{ij}$ they satisfy. Let $w_l$ be a parent of $x$ which is not in $c_{ij}$. Let $W(c_{ij})$ be the assignments to the parents of $x$ which satisfy $c_{ij}$, and $W_{\setminus w_l}(c_{ij})$ be the assignments to the parents of $x$ except for $w_l$ which satisfy $c_{ij}$. Since $w_l$ is not in $c_{ij}$, $W_{\setminus w_l}(c_{ij})$ is well defined. We have

$$\sum_{W(c_{ij})} p(x \mid W) \prod_{k} \hat{p}(w_k)^{w_k} (1 - \hat{p}(w_k))^{1 - w_k} \qquad (9)$$
$$= \sum_{W_{\setminus w_l}(c_{ij})} \big( p(x \mid W, w_l = T)\,\hat{p}(w_l) + p(x \mid W, w_l = F)\,(1 - \hat{p}(w_l)) \big) \prod_{k \neq l} \hat{p}(w_k)^{w_k} (1 - \hat{p}(w_k))^{1 - w_k}.$$

Since $w_l$ is not in $c_{ij}$, the assignment of $w_l$ does not affect the probability of $x$. So for $W \in W_{\setminus w_l}(c_{ij})$ we have $p(x \mid W, w_l = T) = p(x \mid W, w_l = F) = p(x \mid W)$ and therefore the above sum can be simplified to

$$\sum_{W_{\setminus w_l}(c_{ij})} p(x \mid W) \prod_{k \neq l} \hat{p}(w_k)^{w_k} (1 - \hat{p}(w_k))^{1 - w_k}.$$

Applying the same reasoning to all individual $w_l \notin c_{ij}$, we get that Eq (9) is equal to

$$\sum_{W \in w_p(c_{ij})} p(x \mid W) \prod_{k \in c_{ij}} \hat{p}(w_k)^{w_k} (1 - \hat{p}(w_k))^{1 - w_k}$$

where $w_p(c_{ij})$ is the set of assignments to the variables in $c_{ij}$ which satisfy $c_{ij}$. Any assignment to the $w_k \in c_{ij}$ which satisfies $c_{ij}$ must have $w_k = T$ when $w_k$ appears positively in $c_{ij}$ and $w_k = F$ when it appears negated, so the last expression simplifies to

$$\sum_{W \in w_p(c_{ij})} p(x \mid W) \prod_{w_k \in c_{ij}} \hat{p}(w_k) \prod_{\neg w_k \in c_{ij}} (1 - \hat{p}(w_k)).$$

Finally, because the $c_{ij}$ are mutually exclusive and exhaustive, the sum over the disjoint sets of assignments is identical to the sum of (6).

With Lemma 1 and Lemma 2 we can prove the equivalence.

Proposition 1. The marginals calculated by AROLLOUT are identical to the marginals calculated by BP on the DBN generated by the planning problem, conditioned on the initial state, initial action, and rollout policy, and with no evidence.

Proof.
By Lemma 1, $\lambda(x)$ and $\lambda_x(w_i)$ are proportional to $(1, 1)$ for all nodes. Therefore, backward messages do not affect the result of BP and we can argue inductively going forward from roots to leaves in the DBN. By Eq (1) and Lemma 1 we have

$$\mathrm{BEL}(x) = \alpha\,\pi(x)\,\lambda(x) = \pi(x) \qquad (10)$$

where the last equality is true because $\pi(x)$ is always normalized. Therefore from Eq (3) we have

$$\mathrm{BEL}(x) = \sum_{w} p(x \mid w) \prod_{k=1}^{n} \pi_x(w_k). \qquad (11)$$

Now, from Eq (5) and Lemma 1 we have

$$\pi_x(w_k) = \pi(w_k) = \mathrm{BEL}(w_k) \qquad (12)$$

and substituting (12) into (11) we get

$$\mathrm{BEL}(x) = \sum_{w} p(x \mid w) \prod_{k=1}^{n} \mathrm{BEL}(w_k). \qquad (13)$$

Inductively assuming $\mathrm{BEL}(w_k = T) = \hat{p}(w_k)$ and $\mathrm{BEL}(w_k = F) = 1 - \hat{p}(w_k)$, we can rewrite Eq (13) as

$$\mathrm{BEL}(x = T) = \hat{p}(x) = \sum_{W} p(x \mid W) \prod_{k} \hat{p}(w_k)^{w_k} (1 - \hat{p}(w_k))^{1 - w_k},$$

which is identical to Eq (8).

Lifted SOGBOFA

Given the equivalence of the previous section, we next show how SOGBOFA can be improved using ideas from lifted inference. Lifted BP aims to calculate exactly the same solutions as BP but to do so more quickly by avoiding identical messages. The idea was first introduced by Singla and Domingos (2008), who showed how to extract the equivalence classes of messages from a structured model, specifically a Markov logic network. Each type of message is then computed only once and propagated as a power of the original individual message, replacing the product of identical messages. The work of Kersting, Ahmadi, and Natarajan (2009) showed how repeated structure can be extracted from any ground Markov network by simulating only the process of BP without calculating the messages. The messages are then grouped and propagated similarly.

The details of Lifted BP were developed for undirected models but the same principles apply to directed models. Thanks to the property of BP exposed in Lemma 1 we know that in SOGBOFA graphs there are no backward messages, and we can focus on the structure of forward messages. Further, forward messages are the marginal distributions of the corresponding variables, and since we are focusing on binary variables each message can be captured with one number or expression for $p(x_i = 1)$. In Lifted BP, two nodes or factors send identical messages when all their input messages are identical and in addition the local factor is identical. With the forward structure of SOGBOFA this has a clear analogy: two nodes in the SOGBOFA graph are identical when their parents are identical and the local algebraic expressions capturing the local factors are identical. This suggests a straightforward implementation of Lifted SOGBOFA using dynamic programming.

Lifted SOGBOFA algorithm: We run the algorithm in exactly the same manner as SOGBOFA except for the construction of the computation graph. (1) When constructing a node in the explicit computation graph we check if an identical node, with the same parents and the same operation, has been constructed before. If so we return that node instead of constructing a new one. Otherwise we create a new node in the graph. (2) If we only use the idea above we may end up with multiple edges between the same two nodes. Such structures are simplified on the fly during construction: every plus node with $k$ incoming identical edges is turned into a multiply node with constant multiplier $k$, and every multiply node with $k$ incoming identical edges is turned into a power node with constant exponent $k$. This portion of the construction generates the counting expressions, often seen in lifted inference, automatically from the DBN structure.

It is important to identify two aspects of this algorithm and its relation to lifted BP. First, similar to Kersting, Ahmadi, and Natarajan (2009) and unlike Singla and Domingos (2008), our algorithm for constructing the lifted graph must first process the ground network, i.e., its complexity is linear in the size of the original ground computation graph.
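The dynamic-programming construction just described can be sketched as follows (ours, using an assumed expression-node representation rather than the released SOGBOFA code): a cache returns an existing node whenever the same operation is requested on the same children, and repeated children of $+$ and $*$ nodes are collapsed into $k \cdot c$ and $c^k$ counting expressions (here per repeated child, a slight generalization of the per-node rule above).

```python
class Node:
    """An expression-graph node: op is '+', '*', 'pow', 'const' or 'var'."""
    def __init__(self, op, children=(), value=None):
        self.op, self.children, self.value = op, tuple(children), value

_cache = {}  # (op, children, value) -> Node; this table is the dynamic program

def make(op, children=(), value=None):
    if op in ("+", "*"):
        counts = {}
        for c in children:                     # group identical incoming edges
            counts[c] = counts.get(c, 0) + 1
        children = []
        for c, k in counts.items():
            if k == 1:
                children.append(c)
            elif op == "+":                    # k identical summands -> k * c
                children.append(make("*", (make("const", value=k), c)))
            else:                              # k identical factors  -> c ** k
                children.append(make("pow", (c, make("const", value=k))))
        if len(children) == 1:
            return children[0]
        # Hash-consed children sort identically across structurally equal calls,
        # so sorting by id canonicalizes the commutative operations for caching.
        children = tuple(sorted(children, key=id))
    key = (op, tuple(children), value)
    if key not in _cache:                      # reuse an identical node if built before
        _cache[key] = Node(op, children, value)
    return _cache[key]

p = make("var", value="p(rain)")
q = make("+", (p, p, p))   # built once as the counting expression 3 * p(rain)
r = make("+", (p, p, p))   # cache hit: no new node is created
assert r is q and q.op == "*"
```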
It is interesting to explore constructions that go directly from the relational planning problem description to the lifted SOGBOFA graph, without considering each ground node, but we do not pursue this here. On its face the saving from this construction does not look significant because we only have one pass of messages in the computation of BP, so that spending linear time in advance is already expensive. This would be correct if we only ran BP once. However, because we evaluate the same lifted SOGBOFA graph and calculate gradients over it many times in the course of optimization, the effect of a compressed computation graph is significant.

The second difference is due to the inputs to the two algorithms. SOGBOFA uses algebraic expressions whereas lifted BP uses tabular CPTs. Therefore, SOGBOFA's computation is potentially more structured than lifted BP's, both because the input CPTs are given as algebraic expressions and because subexpressions are also represented as nodes in the graph. In other words, lifting can occur at the sub-CPT level as well as at the full-CPT level as in Lifted BP. Summarizing this discussion we have:

Observation 1. The computation graph of Lifted SOGBOFA is at least as compressed as the message structure in Lifted BP when run on the DBN generated by the planning problem, conditioned on the initial state, initial action, and rollout policy, and with no evidence.

Lifted SOGBOFA shrinks the computation graph and leads to faster estimates of action values. This can be used in two ways: given a fixed time constraint, we can aim for the same number of updates and deeper search, or, keeping the same depth, we can perform more updates or random restarts. Both of these have the potential to improve planning performance.

Experimental Validation

For the evaluation we need to discuss one additional aspect of SOGBOFA. The algorithm aims at on-line planning where one has to decide on the next action in a short amount of time, and then the process repeats. The algorithm unrolls the DBN given by a planning problem to a horizon $h$ and then performs the optimization as described. If $h$ is too deep the computation is slow and there is not enough time for the optimization. Therefore SOGBOFA dynamically decides on the depth in order to trade off accuracy with available computation time. A speedup in construction means that the algorithm will search to a greater depth. This potentially complicates the comparison of the original and lifted versions of the algorithm. For the experiments in this paper, we manually fix the depth for each problem to a reasonable setting, and then compare the results of the two algorithms. This evaluates the potential improvement from having more updates in the search process.

[Figure 1: Comparison of SOGBOFA and Lifted SOGBOFA on 20 problems in 3 domains (tamarisk, traffic, skill_teaching). Each panel plots average reward against instance index for Random, HandCode, Lifted SOGBOFA (Fixed Depth), and SOGBOFA (Fixed Depth).]

The work of Cui and Khardon (2016) evaluated the algorithms on problems of varying sizes from 6 planning domains, and the code and details are available at github.com/hcui01/sogbofa. The domains are taken from the International Probabilistic Planning Competition (IPPC) 2011 (sysadmin, traffic), IPPC 2014 (skill teaching, academic advising, tamarisk) and the RDDL distribution (elevators) (Sanner 2010). We follow that evaluation in using the original code and the same problems.

We first evaluate the contribution of lifting to SOGBOFA in terms of graph size and number of updates on problem 11 in each domain. Following exploratory experiments, we fixed the depth manually to a value that was expected to yield reasonable results for SOGBOFA; in particular, we pick the average depth value in a run of SOGBOFA with dynamic depth on problem 11 in each domain. The depth used, average graph size (over multiple steps), and average number of updates within 10 seconds are reported in Table 1. As can be seen, on 3 of the domains the lifted algorithm creates smaller graphs and performs more updates. On the other domains there is a small improvement but the differences are not significant. Therefore in the following experiments we focus on the 3 domains where the lifting compression looks promising.

[Table 1: Graph size and number-of-updates statistics for SOGBOFA and Lifted SOGBOFA on problem 11 of each domain (elevators, sysadmin, academic advising, tamarisk, traffic, skill teaching); rows report the fixed depth, the average graph size of each algorithm, and the average number of updates of each within 10 seconds.]

We next turn to evaluate whether the compression translates to better planning performance. For this we follow the experimental methodology from (Cui and Khardon 2016). We run all problems with horizon 40 and each problem instance is run 20 times. The results reported represent the average cumulative reward (no discounting) and standard deviations from these 20 runs. To control the run time needed for these experiments the algorithms are given 10 seconds per time step (compared to 60 seconds in (Cui and Khardon 2016)). The results are shown in Figure 1. For visual clarity, raw scores of cumulative reward are normalized relative to two fixed algorithms: the random policy is normalized as 0, a simple human-coded policy is normalized as 1, and the raw scores are linearly scaled relative to these.

As can be seen in the figure, lifting provides an improvement in performance across a range of problems. For skill teaching a significant improvement can be seen. For tamarisk and traffic, the improvement is more modest, and there are three ranges of problem sizes. The smaller and easier problems are already solved well by SOGBOFA and we do not get a significant improvement. The largest problems are too large for both algorithms (with 10 seconds per step), i.e., they do not get enough updates and their performance degrades. In the middle range of problems on tamarisk and traffic, lifting provides performance improvements. This analysis is corroborated by looking at graph sizes and the number of gradient updates from the same runs. Overall, the lifted algorithm does not degrade performance on any problem and improves performance in a large number of cases.
The range of problems where lifting provides an improvement will change with the setting of the time limit; we expect the gap to move to the right and to increase for large problems when more time is given.

Algebraic Solver for Marginal MAP

Marginal MAP is computationally hard, and state-of-the-art exact algorithms use branch and bound techniques (Lee et al. 2016). Approximation algorithms include mixed belief propagation (Liu and Ihler 2013), an extension of BP that directly addresses MMAP.

The SOGBOFA algorithm uses the DBN model to solve the planning problem, but the analogy to inference does not require the DBN structure. The main conditions that are required are that the input is a directed model, that the action nodes are roots of that model, and that there is a single value node at a leaf of the directed model. The corresponding requirements for MMAP are that the max nodes are at the roots of the model and that there is a single evidence node at a leaf of the graph. Given a MMAP problem with multiple evidence nodes, we can first disconnect the edges to children of evidence nodes by substituting the evidence node values directly. We can then connect all evidence leaves to a new leaf node (denoted L* below) which captures the logical conjunction requiring that all those nodes have their observed values; see (Mauá 2016).
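As a sketch of the merging step (ours; the names are illustrative, not from the paper's code): under the independence approximation used throughout, the algebraic expression for L* is just a product with one factor per evidence leaf.

```python
def conjoin_evidence(marginal, evidence):
    """marginal: node -> approximate p(node = 1), as produced by the forward
    aggregate simulation; evidence: leaf node -> observed value in {0, 1}.
    Returns p_hat(L* = 1), the approximate probability that every evidence
    leaf takes its observed value (leaves treated as independent)."""
    p_star = 1.0
    for v, val in evidence.items():
        p_star *= marginal[v] if val == 1 else 1.0 - marginal[v]
    return p_star
```

In the actual construction this product would be built as a node of the algebraic graph rather than evaluated numerically, so that gradients of $\hat{p}(L^* = 1)$ with respect to the root (MAP) marginals are available by automatic differentiation.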

This leads to the following algorithm.

Algebraic MMAP Algorithm: (1) Given a MMAP problem with decision nodes at the roots, perform the above translation to make it suitable for inference by SOGBOFA. (2) Generate a lifted algebraic graph from this MMAP problem, where MAP nodes are treated as action nodes and L* is the reward node of the planning problem. This graph satisfies the requirements for the SOGBOFA optimizer. (3) Optimize the decision nodes using the same procedure as in SOGBOFA, in particular, using gradient ascent on the prior marginals of the decision nodes, where gradients are computed using automatic differentiation and where gradient updates use an adaptive step size.

Preliminary experiments with this algorithm suggest that it is competitive with or better than mixed belief propagation (Liu and Ihler 2013) for this class of problems. We leave a more thorough experimental evaluation to future work. An interesting question is whether these ideas can be used to solve general MMAP problems. Mauá (2016) has shown that one can translate any MMAP problem into a MMAP (or maximum expected utility, MEU) problem with MAP variables at roots and a single evidence (or reward) variable. Unfortunately, while that construction is correct for exact inference, it does not preserve the advantage of the forward search in SOGBOFA. We leave an investigation into alternative constructions for future work.

Related Work

The connection between planning and inference is not new. In the context of influence diagrams, the connection between MEU and marginal MAP, shown to be equivalent by Mauá (2016), was pointed out by Cooper (1988). Starting from an AI planning formulation, Domshlak and Hoffmann (2006) have shown how the conformant planning problem can be solved using weighted model counting for CNF, and Lee, Marinescu, and Dechter (2016) have shown how the problem can be translated to MMAP and solved directly by MMAP algorithms.

Another major direction has been MDP formulations of factored state and action spaces, where symbolic versions of dynamic programming algorithms were developed (Hoey et al. 1999; Boutilier et al. 1995; Raghavan et al. 2012; 2013). It is easy to see that symbolic value iteration corresponds to alternating variable elimination, where we alternate between eliminating the state variables of a time slice and the action variables of the same time slice. These algorithms cleverly avoid an explicit representation of the potentially exponential size policy. Both exact and approximate algorithms have been developed. The main difficulty with this approach is the space requirement of representing value functions or policies, which can become prohibitive.

A different approach to MDPs has been through formulations as variational inference. Starting with (Dayan and Hinton 1997), several formulations define a reward-weighted posterior distribution over trajectories, where identifying the MAP over actions gives the optimal policy. Solutions include the expectation maximization (EM) or variational EM algorithms (Toussaint and Storkey 2006; Furmston and Barber 2010; Neumann 2011; Kober and Peters 2011), policy gradients (Kober and Peters 2011; van de Meent et al. 2016) and BP (Liu and Ihler 2012; Cheng et al. 2013).
As discussed above, one of the main differences between these approaches and ours is that they condition on the reward whereas our approach simply evaluates the marginal of the reward. While in principle these are equivalent, the computational properties of different approximation algorithms on the different graphs are not necessarily the same.

The work of Lang and Toussaint (2010) introduces an algorithm in the context of planning for robotics which, with the hindsight of this paper, looks close to AROLLOUT. Their algorithm performs action evaluation by forward simulation using the factored frontier algorithm, which is equivalent to BP. The details differ, however, because they sample concrete trajectories and do not take advantage of the symbolic aggregate simulation. Lifted SOGBOFA could potentially improve planning performance in that approach.

Conclusions

The paper identifies a connection between a successful heuristic for planning in large factored state and action spaces and belief propagation. The SOGBOFA heuristic performs its estimation symbolically and through that performs its search using gradients. This suggests a general scheme for approximation algorithms for MMAP: any approximation scheme whose value or objective can be represented using an explicit computation graph can be optimized directly with gradient search through automatic differentiation. Two immediate consequences of this connection are shown to derive new interesting algorithms. The first, taking ideas from inference to planning, is a lifted version of SOGBOFA which improves planning performance. The second, taking ideas from planning to inference, is a new algorithm that applies SOGBOFA inference to a subclass of MMAP problems. These connections expose the potential for improving algorithms further in both fields.

Acknowledgments

This work was partly supported by NSF under grant IIS. Some of the experiments in this paper were performed on the Tufts Linux Research Cluster supported by Tufts Technology Services.

References

Boutilier, C.; Dearden, R.; and Goldszmidt, M. 1995. Exploiting structure in policy construction. In Proc. of the International Joint Conference on Artificial Intelligence, volume 14.

Boutilier, C.; Dean, T.; and Hanks, S. 1995. Planning under uncertainty: Structural assumptions and computational leverage. In Proceedings of the Second European Workshop on Planning.

Cheng, Q.; Liu, Q.; Chen, F.; and Ihler, A. T. 2013. Variational planning for graph-based MDPs. In NIPS.

Cooper, G. F. 1988. A method for using belief networks as influence diagrams. In UAI.

Cui, H., and Khardon, R. 2016. Online symbolic gradient-based optimization for factored action MDPs. In Proc. of the International Joint Conference on Artificial Intelligence.

Cui, H.; Khardon, R.; Fern, A.; and Tadepalli, P. 2015. Factored MCTS for large scale stochastic planning. In Proc. of the AAAI Conference on Artificial Intelligence.

Dayan, P., and Hinton, G. E. 1997. Using expectation-maximization for reinforcement learning. Neural Computation 9(2).

Domshlak, C., and Hoffmann, J. 2006. Fast probabilistic planning through weighted model counting. In Proc. of the International Conference on Automated Planning and Scheduling.

Furmston, T., and Barber, D. 2010. Variational methods for reinforcement learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics, AISTATS.

Furmston, T., and Barber, D. 2011. Efficient inference in Markov control problems. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI).

Griewank, A., and Walther, A. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (2nd ed.). SIAM.

Hoey, J.; St-Aubin, R.; Hu, A.; and Boutilier, C. 1999. SPUDD: Stochastic planning using decision diagrams. In Proceedings of the Workshop on Uncertainty in Artificial Intelligence.

Keller, T., and Helmert, M. 2013. Trial-based heuristic tree search for finite horizon MDPs. In Proc. of the International Conference on Automated Planning and Scheduling.

Kersting, K.; Ahmadi, B.; and Natarajan, S. 2009. Counting belief propagation. In UAI.

Kiselev, I., and Poupart, P. 2014. A novel single-DBN generative model for optimizing POMDP controllers by probabilistic inference. In AAAI. AAAI Press.

Kober, J., and Peters, J. 2011. Policy search for motor primitives in robotics. Machine Learning 84(1-2).

Kolobov, A.; Dai, P.; Mausam; and Weld, D. S. 2012. Reverse iterative deepening for finite-horizon MDPs with large branching factors. In Proc. of the International Conference on Automated Planning and Scheduling.

Lang, T., and Toussaint, M. 2010. Planning with noisy probabilistic relational rules. Journal of Artificial Intelligence Research 39:1-49.

Lee, J.; Marinescu, R.; Dechter, R.; and Ihler, A. T. 2016. From exact to anytime solutions for marginal MAP. In AAAI. AAAI Press.

Lee, J.; Marinescu, R.; and Dechter, R. 2016. Applying search based probabilistic inference algorithms to probabilistic conformant planning: Preliminary results. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM).

Liu, Q., and Ihler, A. T. 2012. Belief propagation for structured decision making. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI).

Liu, Q., and Ihler, A. T. 2013. Variational algorithms for marginal MAP. Journal of Machine Learning Research 14(1).

Mauá, D. D. 2016. Equivalences between maximum a posteriori inference in Bayesian networks and maximum expected utility computation in influence diagrams. International Journal of Approximate Reasoning 68.

Neumann, G. 2011. Variational inference for policy search in changing situations. In Proceedings of the 28th International Conference on Machine Learning, ICML.

Nguyen, D. T.; Kumar, A.; and Lau, H. C. 2017. Collective multiagent sequential decision making under uncertainty. In AAAI.

Nitti, D.; Belle, V.; and De Raedt, L. 2015. Planning in discrete and continuous Markov decision processes by probabilistic programming. In ECML/PKDD (2), volume 9285 of Lecture Notes in Computer Science. Springer.

Pearl, J. 1989. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley.

Raghavan, A.; Joshi, S.; Fern, A.; Tadepalli, P.; and Khardon, R. 2012. Planning in factored action spaces with symbolic dynamic programming. In Proc. of the AAAI Conference on Artificial Intelligence.

Raghavan, A.; Khardon, R.; Fern, A.; and Tadepalli, P. 2013. Symbolic opportunistic policy iteration for factored-action MDPs. In Advances in Neural Information Processing Systems.

Sabbadin, R.; Peyrard, N.; and Forsell, N. 2012. A framework and a mean-field algorithm for the local control of spatial processes. International Journal of Approximate Reasoning 53(1).

Sanner, S. 2010. Relational dynamic influence diagram language (RDDL): Language description. Unpublished manuscript, Australian National University.

Singla, P., and Domingos, P. M. 2008. Lifted first-order belief propagation. In AAAI.

Toussaint, M., and Storkey, A. 2006. Probabilistic inference for solving discrete and continuous state Markov decision processes. In ICML.

van de Meent, J.; Paige, B.; Tolpin, D.; and Wood, F. 2016. Black-box policy search with probabilistic programs. In Proceedings of the International Conference on Artificial Intelligence and Statistics, AISTATS.


More information

Belief propagation in a bucket-tree. Handouts, 275B Fall Rina Dechter. November 1, 2000

Belief propagation in a bucket-tree. Handouts, 275B Fall Rina Dechter. November 1, 2000 Belief propagation in a bucket-tree Handouts, 275B Fall-2000 Rina Dechter November 1, 2000 1 From bucket-elimination to tree-propagation The bucket-elimination algorithm, elim-bel, for belief updating

More information

Information Processing Letters

Information Processing Letters Information Processing Letters 112 (2012) 449 456 Contents lists available at SciVerse ScienceDirect Information Processing Letters www.elsevier.com/locate/ipl Recursive sum product algorithm for generalized

More information

Summary: A Tutorial on Learning With Bayesian Networks

Summary: A Tutorial on Learning With Bayesian Networks Summary: A Tutorial on Learning With Bayesian Networks Markus Kalisch May 5, 2006 We primarily summarize [4]. When we think that it is appropriate, we comment on additional facts and more recent developments.

More information

COS 513: Foundations of Probabilistic Modeling. Lecture 5

COS 513: Foundations of Probabilistic Modeling. Lecture 5 COS 513: Foundations of Probabilistic Modeling Young-suk Lee 1 Administrative Midterm report is due Oct. 29 th. Recitation is at 4:26pm in Friend 108. Lecture 5 R is a computer language for statistical

More information

Bayesian Classification Using Probabilistic Graphical Models

Bayesian Classification Using Probabilistic Graphical Models San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 2014 Bayesian Classification Using Probabilistic Graphical Models Mehal Patel San Jose State University

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 17 EM CS/CNS/EE 155 Andreas Krause Announcements Project poster session on Thursday Dec 3, 4-6pm in Annenberg 2 nd floor atrium! Easels, poster boards and cookies

More information

D-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.

D-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C. D-Separation Say: A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked by C if it contains a node such that either a) the arrows on the path meet either

More information

Approximate Discrete Probability Distribution Representation using a Multi-Resolution Binary Tree

Approximate Discrete Probability Distribution Representation using a Multi-Resolution Binary Tree Approximate Discrete Probability Distribution Representation using a Multi-Resolution Binary Tree David Bellot and Pierre Bessière GravirIMAG CNRS and INRIA Rhône-Alpes Zirst - 6 avenue de l Europe - Montbonnot

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Bayesian Networks Directed Acyclic Graph (DAG) Bayesian Networks General Factorization Bayesian Curve Fitting (1) Polynomial Bayesian

More information

Markov Decision Processes (MDPs) (cont.)

Markov Decision Processes (MDPs) (cont.) Markov Decision Processes (MDPs) (cont.) Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University November 29 th, 2007 Markov Decision Process (MDP) Representation State space: Joint state x

More information

A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs

A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs A Transformational Characterization of Markov Equivalence for Directed Maximal Ancestral Graphs Jiji Zhang Philosophy Department Carnegie Mellon University Pittsburgh, PA 15213 jiji@andrew.cmu.edu Abstract

More information

Exam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012

Exam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012 Exam Topics Artificial Intelligence Recap & Expectation Maximization CSE 473 Dan Weld BFS, DFS, UCS, A* (tree and graph) Completeness and Optimality Heuristics: admissibility and consistency CSPs Constraint

More information

Using Domain-Configurable Search Control for Probabilistic Planning

Using Domain-Configurable Search Control for Probabilistic Planning Using Domain-Configurable Search Control for Probabilistic Planning Ugur Kuter and Dana Nau Department of Computer Science and Institute for Systems Research University of Maryland College Park, MD 20742,

More information

Binary Decision Diagrams

Binary Decision Diagrams Logic and roof Hilary 2016 James Worrell Binary Decision Diagrams A propositional formula is determined up to logical equivalence by its truth table. If the formula has n variables then its truth table

More information

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity

Comparing Dropout Nets to Sum-Product Networks for Predicting Molecular Activity 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Tree-structured approximations by expectation propagation

Tree-structured approximations by expectation propagation Tree-structured approximations by expectation propagation Thomas Minka Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213 USA minka@stat.cmu.edu Yuan Qi Media Laboratory Massachusetts

More information

Q-learning with linear function approximation

Q-learning with linear function approximation Q-learning with linear function approximation Francisco S. Melo and M. Isabel Ribeiro Institute for Systems and Robotics [fmelo,mir]@isr.ist.utl.pt Conference on Learning Theory, COLT 2007 June 14th, 2007

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling

More information

Lecture 5: Exact inference

Lecture 5: Exact inference Lecture 5: Exact inference Queries Inference in chains Variable elimination Without evidence With evidence Complexity of variable elimination which has the highest probability: instantiation of all other

More information

LAO*, RLAO*, or BLAO*?

LAO*, RLAO*, or BLAO*? , R, or B? Peng Dai and Judy Goldsmith Computer Science Dept. University of Kentucky 773 Anderson Tower Lexington, KY 40506-0046 Abstract In 2003, Bhuma and Goldsmith introduced a bidirectional variant

More information

An Effective Upperbound on Treewidth Using Partial Fill-in of Separators

An Effective Upperbound on Treewidth Using Partial Fill-in of Separators An Effective Upperbound on Treewidth Using Partial Fill-in of Separators Boi Faltings Martin Charles Golumbic June 28, 2009 Abstract Partitioning a graph using graph separators, and particularly clique

More information

Partially Observable Markov Decision Processes. Silvia Cruciani João Carvalho

Partially Observable Markov Decision Processes. Silvia Cruciani João Carvalho Partially Observable Markov Decision Processes Silvia Cruciani João Carvalho MDP A reminder: is a set of states is a set of actions is the state transition function. is the probability of ending in state

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014 Suggested Reading: Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Probabilistic Modelling and Reasoning: The Junction

More information

VOILA: Efficient Feature-value Acquisition for Classification

VOILA: Efficient Feature-value Acquisition for Classification VOILA: Efficient Feature-value Acquisition for Classification Mustafa Bilgic and Lise Getoor University of Maryland College Park, MD {mbilgic,getoor}@cs.umd.edu Abstract We address the problem of efficient

More information

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation

Module 1 Lecture Notes 2. Optimization Problem and Model Formulation Optimization Methods: Introduction and Basic concepts 1 Module 1 Lecture Notes 2 Optimization Problem and Model Formulation Introduction In the previous lecture we studied the evolution of optimization

More information

CS242: Probabilistic Graphical Models Lecture 2B: Loopy Belief Propagation & Junction Trees

CS242: Probabilistic Graphical Models Lecture 2B: Loopy Belief Propagation & Junction Trees CS242: Probabilistic Graphical Models Lecture 2B: Loopy Belief Propagation & Junction Trees Professor Erik Sudderth Brown University Computer Science September 22, 2016 Some figures and materials courtesy

More information

Stochastic Anytime Search for Bounding Marginal MAP

Stochastic Anytime Search for Bounding Marginal MAP Stochastic Anytime Search for Bounding Marginal MAP Radu Marinescu 1, Rina Dechter 2, Alexander Ihler 2 1 IBM Research 2 University of California, Irvine radu.marinescu@ie.ibm.com, dechter@ics.uci.edu,

More information

A Parallel Algorithm for Exact Structure Learning of Bayesian Networks

A Parallel Algorithm for Exact Structure Learning of Bayesian Networks A Parallel Algorithm for Exact Structure Learning of Bayesian Networks Olga Nikolova, Jaroslaw Zola, and Srinivas Aluru Department of Computer Engineering Iowa State University Ames, IA 0010 {olia,zola,aluru}@iastate.edu

More information

Multiple Constraint Satisfaction by Belief Propagation: An Example Using Sudoku

Multiple Constraint Satisfaction by Belief Propagation: An Example Using Sudoku Multiple Constraint Satisfaction by Belief Propagation: An Example Using Sudoku Todd K. Moon and Jacob H. Gunther Utah State University Abstract The popular Sudoku puzzle bears structural resemblance to

More information