Planning and Reinforcement Learning through Approximate Inference and Aggregate Simulation


Hao Cui
Department of Computer Science, Tufts University, Medford, MA 02155, USA

Roni Khardon
Department of Computer Science, Indiana University, Bloomington, IN, USA

Infer to Control: NIPS 2018 Workshop on Probabilistic Reinforcement Learning and Structured Control.

Abstract

The paper reviews the SOGBOFA algorithm for stochastic planning in high-dimensional state and action spaces and its connections to reinforcement learning. SOGBOFA builds an explicit computation graph capturing an approximation of the Q value for the current state and any action. The computation of this graph was recently shown to be equivalent to running belief propagation inference on the corresponding dynamic Bayesian network. This construction is combined with automatic differentiation over the graph to search for the best action in the current state, and the result is used for on-line planning. The algorithm therefore combines value function approximation through probabilistic inference with gradient search, both of which are closely related to algorithms in reinforcement learning. The key difference is that instead of the widely used idea of approximating Q values using function approximation, SOGBOFA uses a novel graph construction and approximate inference to compute Q values. We argue that these ideas have useful analogues for reinforcement learning.

1 Introduction

SOGBOFA is an algorithm for stochastic planning in high-dimensional state and action spaces. This paper gives an overview of the SOGBOFA algorithm and its connection to probabilistic inference, and then explores the connections to model based reinforcement learning and policy gradients.

SOGBOFA [2] extends the well-known rollout algorithm [12], which chooses its actions through simulation. The rollout algorithm uses a simulator to estimate the quality of each possible action for the current step by taking that action and then continuing the simulation with some fixed policy. This method is both model free and policy free, and it works well in some cases. Two limitations are the need for a good rollout policy and the need for many simulations. For the second point, note that one must enumerate all actions and estimate the value of each action separately and, in addition, each action requires many simulations to obtain an accurate estimate.

SOGBOFA is model based and uses the model to improve over this algorithm in two important ways. The first is that, instead of using concrete simulation of trajectories, the algorithm builds an explicit computation graph capturing an approximation of the corresponding value when rolling out the random policy. Therefore a single symbolic simulation suffices. The second is that, because the simulation is given as an explicit computation graph, one can use automatic differentiation and gradients to search for the best action, avoiding the action enumeration required by rollout.
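To make the contrast with aggregate simulation concrete, the following is a minimal sketch of the rollout scheme described above. It assumes a generative simulator simulate(state, action) returning a next state and reward, and a fixed rollout_policy; all names are illustrative and not from the paper.

```python
def rollout_q(simulate, rollout_policy, state, action, horizon, num_trials):
    """Monte Carlo estimate of Q(state, action): take `action` once, then
    follow `rollout_policy` for the remaining steps, averaging the return."""
    total = 0.0
    for _ in range(num_trials):
        s, a, ret = state, action, 0.0
        for _ in range(horizon):
            s, r = simulate(s, a)      # one stochastic transition
            ret += r
            a = rollout_policy(s)      # fixed policy for all future steps
        total += ret
    return total / num_trials

def rollout_action(simulate, rollout_policy, state, actions, horizon, num_trials):
    """Enumerate candidate actions and return the one with the best estimate."""
    return max(actions, key=lambda a: rollout_q(simulate, rollout_policy,
                                                state, a, horizon, num_trials))
```

Both limitations mentioned above are visible here: the quality of the estimate depends on rollout_policy, and the cost grows with the number of candidate actions times the number of trials per action.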

Several recent observations expose the connection to RL more clearly [3]. The first is that the computation graph of SOGBOFA calculates exactly the same solution as the one that would be computed by belief propagation (BP) on the corresponding inference problem [5], and that this allows for improvements through lifted inference. The second is that the rollout policy can be optimized to some extent in the same manner and simultaneously with action selection. This approach was shown to be competitive in large scale planning problems, and lifted-conformant SOGBOFA was the runner-up at the recent international probabilistic planning competition. In addition, the most recent version of SOGBOFA handles hard action constraints represented in first order logic, which is an important aspect of some recent RL applications. Finally, [5] developed a reduction that allows a variant of SOGBOFA to solve any marginal MAP inference problem and showed that this yields a state of the art MMAP solver. This might allow us to use the same approach to solve a wider range of decision problems, including POMDPs.

Unlike most methods in RL, SOGBOFA assumes that an explicit representation of the model is given as input. However, like recent RL methods (e.g., [11]), it embeds its model inside a simulation whose output is being optimized. In addition, estimation of Q values is a fundamental question for model-free RL methods such as policy gradient and actor-critic methods. SOGBOFA can provide an alternative way of constructing critics with more stable estimates. The interesting point is that it performs its computations in a manner that is different from current methods in RL, and it offers opportunities for improvement through the use of aggregate simulation. The paper reviews some aspects of SOGBOFA and provides further discussion of the similarities to work in RL and potential future work.

2 The SOGBOFA Algorithm

Stochastic planning can be formalized using Markov decision processes [8] in factored state and action spaces. In factored spaces [1] the state is specified by a set of variables and the number of states is exponential in the number of variables. Similarly, in factored action spaces an action is specified by a set of variables. We assume that all state and action variables are binary. Finite horizon planning can be captured using a dynamic Bayesian network (DBN) where state and action variables at each time step are represented explicitly and the CPTs of variables are given by the transition probabilities.

SOGBOFA works in the setup of on-line planning. In on-line planning we are given a fixed limited time t per step and cannot compute a policy in advance. Instead, given the current state, the algorithm must decide on the next action within time t. Then the action is performed, a transition and reward are observed, and the algorithm is presented with the next state. This process repeats and the long term performance of the algorithm is evaluated. SOGBOFA performs on-line planning by estimating the value of initial actions where a fixed rollout policy, typically a random policy, is used in future steps. The AROLLOUT algorithm [4] introduced the idea of algebraic simulation to estimate values but optimized over actions by enumeration. SOGBOFA then showed how algebraic rollouts can be computed symbolically and how the optimization can be done using automatic differentiation.

Graph Construction: The algorithm starts by translating from a high level language (e.g., RDDL [9]) to a DBN. AROLLOUT transforms the CPT of a node $x$ into a disjoint sum form. In particular, the CPT is represented in the form: if $(c_{11} \vee c_{12} \vee \ldots)$ then $p_1$; $\ldots$; if $(c_{n1} \vee c_{n2} \vee \ldots)$ then $p_n$, where $p_i$ is $p(x=1)$ and the $c_{ij}$ are conjunctions of parent values which are mutually exclusive and exhaustive.
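As an illustration, a CPT in this disjoint sum form can be stored as a list of (conjunction, probability) rows; the layout and names below are assumptions made for exposition, not the paper's data structures.

```python
# A row is (conjunction, p_i): "if the conjunction holds then p(x=1) = p_i".
# A conjunction is a list of (parent_name, polarity) literals; disjuncts
# c_i1, c_i2, ... that share the same p_i simply become separate rows here.
# Example CPT: p(x'=1) = 0.9 if (y and z), 0.2 if (y and not z), 0.2 if (not y);
# the three conjunctions are mutually exclusive and exhaustive.
x_cpt = [
    ([("y", True), ("z", True)],  0.9),
    ([("y", True), ("z", False)], 0.2),
    ([("y", False)],              0.2),
]
```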
It then performs a forward pass calculating $\hat{p}(x)$, an approximation of the true marginal $p(x)$, for any node $x$ in the graph. $\hat{p}(x)$ is calculated as a function of $\hat{p}(c_{ij})$, an estimate of the probability that $c_{ij}$ is true, which assumes the parents are independent. This is done using the following equations, where nodes are processed in the topological order of the graph (a small sketch of this forward pass is given below):

$$\hat{p}(x) = \sum_{ij} p(x \mid c_{ij})\,\hat{p}(c_{ij}) = \sum_{ij} p_i\,\hat{p}(c_{ij}) \qquad (1)$$

$$\hat{p}(c_{ij}) = \prod_{w_k \in c_{ij}} \hat{p}(w_k) \prod_{\neg w_k \in c_{ij}} \bigl(1 - \hat{p}(w_k)\bigr). \qquad (2)$$
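The two equations translate directly into code. The sketch below is illustrative (it assumes the list-of-rows CPT layout shown earlier and uses invented function names); it processes nodes in topological order and computes each approximate marginal from the marginals of its parents.

```python
def conj_prob(conj, marginals):
    """Equation (2): probability that conjunction `conj` holds, treating the
    parent variables as independent with the given marginals."""
    p = 1.0
    for var, positive in conj:
        p *= marginals[var] if positive else (1.0 - marginals[var])
    return p

def node_marginal(cpt, marginals):
    """Equation (1): approximate marginal of a node whose CPT is a list of
    (conjunction, p_i) rows that are mutually exclusive and exhaustive."""
    return sum(p_i * conj_prob(conj, marginals) for conj, p_i in cpt)

def forward_pass(cpts, marginals):
    """One layer of aggregate simulation: `cpts` maps node name -> CPT and is
    assumed to be ordered topologically; returns the updated marginals."""
    marginals = dict(marginals)
    for node, cpt in cpts.items():
        marginals[node] = node_marginal(cpt, marginals)
    return marginals
```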

[Figure 1: Example of SOGBOFA graph construction.]

The following example from [2] illustrates this construction. The problem has three state variables s(1), s(2) and s(3), and three action variables a(1), a(2) and a(3). In addition we have two intermediate variables cond1 and cond2 which are not part of the state. The transitions and reward are given by the following RDDL [9] expressions, where primed variants of variables represent the value of the variable after performing the action:

cond1 = Bernoulli(0.7)
cond2 = Bernoulli(0.5)
s'(1) = if (cond1) then ~a(3) else false
s'(2) = if (s(1)) then a(2) else false
s'(3) = if (cond2) then s(2) else false
reward = s(1) + s(2) + s(3)

AROLLOUT translates the RDDL code into algebraic expressions using standard transformations from a logical to a numerical representation. In our example this yields:

s'(1) = (1 - a(3)) * 0.7
s'(2) = s(1) * a(2)
s'(3) = s(2) * 0.5
r = s(1) + s(2) + s(3)

These expressions are used to calculate an approximation of marginal distributions over state and reward variables. The distribution at each time step is approximated using a product distribution over the state variables. To illustrate, assume that the state is s_0 = {s(1)=0, s(2)=1, s(3)=0}, which we take to be a product of marginals. At the first step AROLLOUT uses a concrete action, for example a_0 = {a(1)=1, a(2)=0, a(3)=0}. This gives the reward r_0 = 0 + 1 + 0 = 1 and state variables s_1 = {s(1)=(1-0)*0.7=0.7, s(2)=0*0=0, s(3)=1*0.5=0.5}. In future steps it calculates marginals for the action variables and uses them in a similar manner. For example, if a_1 = {a(1)=0.33, a(2)=0.33, a(3)=0.33} we get r_1 = 0.7 + 0 + 0.5 = 1.2 and s_2 = {s(1)=(1-0.33)*0.7, s(2)=0.7*0.33, s(3)=0*0.5}. Summing the rewards from all steps gives an estimate of the Q value for a_0. AROLLOUT randomly enumerates values for a_0 and then selects the one with the highest estimate.

The main observation in SOGBOFA is that instead of calculating concrete values, we can use the expressions to construct an explicit directed acyclic graph representing the computation steps, where the last node represents the expectation of the cumulative reward. SOGBOFA uses a symbolic representation for the first action and assumes that the rollout uses the random policy. In our example, if the action variables are mutually exclusive (such constraints are often imposed in high level domain descriptions), this gives marginals of a_1 = {a(1)=0.33, a(2)=0.33, a(3)=0.33} over these variables. The SOGBOFA graph for our example expanded to depth 1 is shown in Figure 1. The bottom layer represents the current state and action variables. Each node at the next level represents the expression that AROLLOUT would have calculated for that marginal. To expand the planning horizon we simply duplicate the second layer construction multiple times.
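The numbers in this example can be reproduced with a few lines of code; this is a small illustrative script using the algebraic expressions above, not the AROLLOUT implementation.

```python
def step(s, a):
    """Apply the example's algebraic transition and reward expressions once."""
    r = s["s1"] + s["s2"] + s["s3"]            # reward = s(1) + s(2) + s(3)
    s_next = {"s1": (1.0 - a["a3"]) * 0.7,     # s'(1) = (1 - a(3)) * 0.7
              "s2": s["s1"] * a["a2"],         # s'(2) = s(1) * a(2)
              "s3": s["s2"] * 0.5}             # s'(3) = s(2) * 0.5
    return r, s_next

s0 = {"s1": 0.0, "s2": 1.0, "s3": 0.0}         # current state
a0 = {"a1": 1.0, "a2": 0.0, "a3": 0.0}         # concrete first action
r0, s1 = step(s0, a0)                          # r0 = 1.0, s1 = {0.7, 0.0, 0.5}
a1 = {"a1": 0.33, "a2": 0.33, "a3": 0.33}      # random-policy marginals
r1, s2 = step(s1, a1)                          # r1 = 1.2, s2 = {0.469, 0.231, 0.0}
q_a0 = r0 + r1                                 # summed rewards: Q estimate for a0
print(q_a0)                                    # 2.2 at depth 2
```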

Now, given concrete marginal probabilities for the action variables at the first step, i.e., a_0, an evaluation of the graph captures the same calculation as AROLLOUT.

Calculating the gradient: The main observation is that once the graph is built, representing Q as a function of the action variables, we can calculate gradients using the method of automatic differentiation [6]. Our implementation uses the reverse accumulation method since it is more efficient, having linear complexity in the size of the DAG for the calculation of all gradients (a small illustrative sketch is given at the end of this section).

Lifted SOGBOFA Graph: As mentioned above, the computation graph was shown to calculate the same value as would be computed by belief propagation. Building on that, one can build a lifted graph based on ideas from lifted BP [10, 7]. In lifted BP, two nodes or factors send identical messages when all their input messages are identical and, in addition, the local factor is identical. With the computational structure of SOGBOFA this has a clear analogy: two nodes in SOGBOFA's graph are identical when their parents are identical and the local algebraic expressions capturing the local factors are identical. This suggests a straightforward implementation of Lifted SOGBOFA using dynamic programming, where the graph construction tracks nodes and avoids duplication. Moreover, by joining identical edges into grouped calculations (every plus node with k incoming identical edges is turned into a multiply node with constant multiplier k, and every multiply node with k incoming identical edges is turned into a power node with constant power k), the graph directly captures counting expressions that are often seen in lifted inference.

Optimizing the Rollout Policy: Note that, just like rollout, SOGBOFA does not calculate an explicit policy. It optimizes values for the current step and uses a random policy for trajectory actions. However, one can try to improve on this by optimizing the rollout policy as well. It turns out that optimizing a full-fledged policy is not realistic, due to the non-transferability of the policy across planning problems with different goals as well as the approximation inherent in aggregate simulation. However, one can optimize rollout actions to some extent. In particular, we can directly optimize the marginal probability for each action variable in future steps. This gives the conformant variant of SOGBOFA (following a similar name for straight-line plans in stochastic planning). This type of optimization is already supported by the construction described above, by not assigning marginal probabilities to future action variables and instead turning them into symbolic variables. Then automatic differentiation and gradient ascent for all variables can be performed at the same time cost. Lifted conformant SOGBOFA was shown to be very effective, especially in problems with large action spaces and a high degree of stochasticity, and was runner-up in the recent planning competition.
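To make the gradient-based search concrete, the following sketch writes the Q estimate for the running example as a function of the first-step action marginals and improves it by projected gradient ascent. The actual implementation uses reverse-mode automatic differentiation over the explicit graph; finite-difference gradients are used here only to keep the sketch self-contained, and the names, horizon, and step size are illustrative assumptions.

```python
def q_estimate(a_first, depth=3):
    """Aggregate-simulation Q estimate for the running example as a function of
    the first-step action marginals; later steps use the random policy (1/3 each)."""
    s = {"s1": 0.0, "s2": 1.0, "s3": 0.0}
    a, total = dict(a_first), 0.0
    for _ in range(depth):
        total += s["s1"] + s["s2"] + s["s3"]
        s = {"s1": (1.0 - a["a3"]) * 0.7,
             "s2": s["s1"] * a["a2"],
             "s3": s["s2"] * 0.5}
        a = {"a1": 1 / 3, "a2": 1 / 3, "a3": 1 / 3}   # rollout with the random policy
    return total

def ascent_step(a, lr=0.2, eps=1e-5):
    """One projected gradient-ascent step on the first-step action marginals."""
    base = q_estimate(a)
    grad = {}
    for k in a:
        bumped = dict(a)
        bumped[k] += eps
        grad[k] = (q_estimate(bumped) - base) / eps   # finite-difference gradient
    return {k: min(1.0, max(0.0, a[k] + lr * grad[k])) for k in a}

a = {"a1": 1 / 3, "a2": 1 / 3, "a3": 1 / 3}
for _ in range(50):
    a = ascent_step(a)
print(a, q_estimate(a))   # a(3) is driven toward 0, which increases the Q estimate
```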
3 Connection to Reinforcement Learning

While the scheme above assumes a given symbolic model, one can combine it with model learning to get a model based RL algorithm. The main requirement is that the representation of the learned model allows for an algebraic representation (or can be transformed into one) so that it can be included in the aggregate simulation. On the other hand, because the planner assumes independence in the graph construction, one can learn an independent model for each state variable and ignore correlations among them (a small sketch of such a per-variable model is given at the end of this section). This will give a novel model based RL algorithm that plans via approximate inference and first action gradients. This approach can also yield an interesting model based actor-critic algorithm where the critic is constructed through the graph based on the learned model. Note that it is not necessary to learn the model off-line; the model learning can be done incrementally, as is typical in RL.

Our algorithm also has aspects that are similar to policy gradients. Here the main distinction is that a policy is not explicitly constructed by the algorithm. This is partly due to its use in on-line planning and the domain and goal independent nature of planning (i.e., because policies do not transfer across different reward functions). However, allowing for long term learning, and maintaining a richer policy representation which can be reused over multiple steps, can yield a novel variant of policy gradients. Here too the simulation and gradients of policy gradients can be done through the graph construction. We note, however, that such a construction is non-trivial because aggregate simulations do not allow the policy to access the state directly. This discussion suggests that the techniques in SOGBOFA can be useful for new model based RL methods.
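As a hedged sketch of the per-variable model mentioned at the start of this section (a simple counting estimator with assumed known parent sets and invented names, not a design prescribed by the paper), each state variable gets its own conditional probability estimate that can be updated incrementally from observed transitions and later compiled into the algebraic form used by the planner.

```python
from collections import defaultdict

class IndependentTransitionModel:
    """One Bernoulli estimate p(x'=1 | parents) per state variable, ignoring
    correlations among next-state variables.  `parents` maps each state
    variable to the (assumed known) state/action variables it depends on."""

    def __init__(self, parents):
        self.parents = parents
        self.counts = defaultdict(lambda: [0.0, 0.0])  # (var, cfg) -> [n_false, n_true]

    def update(self, state, action, next_state):
        """Incremental update from a single observed transition."""
        sa = {**state, **action}
        for var, pa in self.parents.items():
            cfg = tuple(sa[p] for p in pa)
            self.counts[(var, cfg)][int(next_state[var])] += 1.0

    def prob(self, var, state, action):
        """Estimated p(var'=1 | parent configuration), with a Laplace prior."""
        sa = {**state, **action}
        cfg = tuple(sa[p] for p in self.parents[var])
        n_false, n_true = self.counts[(var, cfg)]
        return (n_true + 1.0) / (n_false + n_true + 2.0)
```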

Acknowledgments

This work was partly supported by NSF under grant IIS.

References

[1] Craig Boutilier, Thomas Dean, and Steve Hanks. Planning under uncertainty: Structural assumptions and computational leverage. In Proceedings of the Second European Workshop on Planning.
[2] Hao Cui and Roni Khardon. Online symbolic gradient-based optimization for factored action MDPs. In Proc. of the International Joint Conference on Artificial Intelligence.
[3] Hao Cui and Roni Khardon. Stochastic planning, lifted inference, and marginal MAP. In Workshop on Planning and Inference, held with AAAI.
[4] Hao Cui, Roni Khardon, Alan Fern, and Prasad Tadepalli. Factored MCTS for large scale stochastic planning. In Proc. of the AAAI Conference on Artificial Intelligence.
[5] Hao Cui, Radu Marinescu, and Roni Khardon. From stochastic planning to marginal MAP. In Advances in Neural Information Processing Systems.
[6] Andreas Griewank and Andrea Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (2nd ed.). SIAM.
[7] Kristian Kersting, Babak Ahmadi, and Sriraam Natarajan. Counting belief propagation. In UAI.
[8] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley.
[9] Scott Sanner. Relational Dynamic Influence Diagram Language (RDDL): Language Description. Unpublished manuscript, Australian National University.
[10] Parag Singla and Pedro M. Domingos. Lifted first-order belief propagation. In AAAI.
[11] Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, and Pieter Abbeel. Value iteration networks. In Advances in Neural Information Processing Systems.
[12] Gerald Tesauro and Gregory R. Galperin. On-line policy improvement using Monte-Carlo search. In Advances in Neural Information Processing Systems.
