Click to edit Master title style Approximate Models for Batch RL Click to edit Master subtitle style Emma Brunskill 2/18/15 2/18/15 1 1

Size: px
Start display at page:

Download "Click to edit Master title style Approximate Models for Batch RL Click to edit Master subtitle style Emma Brunskill 2/18/15 2/18/15 1 1"

Transcription

1 Approximate Click to edit Master titlemodels style for Batch RL Click to edit Emma Master Brunskill subtitle style 11

2 FVI / FQI PI Approximate model planners Policy Iteration maintains both an explicit representation of a policy and the value of that policy Image from David Silver 2

3 Forward Search w/generative Model a1 a2 s2 s2 Slide modified from David Silver 3

4 Exact/Exhaustive Forward Search a1 max exp a2 s2 exp s2 Slide modified from David Silver 4

5 How many nodes in a H-depth tree (as a function of state space S and action space A )? a1 max exp a2 s2 exp s2 Slide modified from David Silver 5

6 How many nodes in a H-depth tree (as a function of state space S and action space A )? ( S A )H a1 max exp a2 s2 exp s2 Slide modified from David Silver 6

7 Sparse Sampling: Don t Enumerate All Next States, Instead Sample Next States s ~ P(s s,a) a1 max exp a2 s2 exp s37 s2 Sample n next states, si ~ P(s s,a) Compute (1/n) Sumi V(s_i) Converges to expected future reward: Sums p(s s,a)v(s ) Slide modified from David Silver 7

8 Sparse Sampling: # nodes if sample n states at each action node? Independent of S! O(n A )H a1 max exp a2 s2 exp s37 s2 Sample n next states, si ~ P(s s,a) Compute (1/n) Sumi V(s_i) Converges to expected future reward: Sums p(s s,a)v(s ) Slide modified from David Silver 8

9 Sparse Sampling: # nodes if sample n states at each action node? Independent of S! O(n A )H a1 max exp a2 s2 exp s37 s2 Upside: Can choose n to achieve bounds on the accuracy of the value function at the root state, independent of state space size Downside: Still exponential in horizon, n still large for good bounds Slide modified from David Silver 9

10 Limitation of Sparse Sampling Slide modified from Alan Fern 10

11 Monte Carlo Tree Search Combine ideas of sparse sampling with an adaptive method for focusing on more promising parts of the ree Here more promising means the actions that are seem likely to yield higher long term reward Uses the idea of simulation search 11

12 Simulation Based Search Slide modified from David Silver 12

13 Simulation based Search Slide modified from David Silver 13

14 Simple Monte Carlo Search rollout policy dafa greedy improvement with respect to fixed rollout policy Slide modified from David Silver 14

15 Upper Confidence Tree (UCT) [Kocsis & Szepesvari, 2006] Combine forward search and simulation search Instance of Monte-Carlo Tree Search Repeated Monte Carlo simulation of rollout policy Rollouts add one or more nodes to search tree UCT Uses optimism under uncertainty idea Some nice theoretical properties Much better realtime performance than sparse sampling Slide modified from Alan Fern 15

16 a1 Slide modified from Alan Fern 16

17 a1 a2 Slide modified from Alan Fern 17

18 a1 a2 Slide modified from Alan Fern 18

19 a1 a2 Slide modified from Alan Fern 19

20 a1 s2 a2 1 Slide modified from Alan Fern 20

21 (Upper Confidence Bound) Slide modified from Alan Fern 21

22 Slide modified from Alan Fern 22

23 Requires us to have a simulator/ generative model Each pass down the tree, follow tree policy until reach a state leaf where not all actions have been tried. Then need to simulate starting from that state leaf the result of taking another action Slide modified from Alan Fern 23

24 Guarantees on UCT Slide modified from Remi Munos 24

25 Computer Go Previous game tree approaches faired poorly Slide modified from slides from Alan Fern & David Silver 25

26 Rules of Go Slide modified from David Silver 26

27 Position Evaluation in Go Slide modified from David Silver 27

28 Monte Carlo Evaluation in Go: Planning problem, just a very very hard one Slide modified from David Silver 28

29 Enormous Progress. MCTS Huge Impact Slide modified from David Silver 29

30 Going Back to Batch RL... Use supervised learning method to compute model Use learned model with MCTS planning Note: error in model will impact error in estimated values! Computes an action for current state, take action, then redo planning for next state 30

31 Autonomous Driving using Texplore (Hester and Stone 2013) 31

32 FVI / FQI API Approximate model planners Image from David Silver 32

33 33

Monte Carlo Tree Search PAH 2015

Monte Carlo Tree Search PAH 2015 Monte Carlo Tree Search PAH 2015 MCTS animation and RAVE slides by Michèle Sebag and Romaric Gaudel Markov Decision Processes (MDPs) main formal model Π = S, A, D, T, R states finite set of states of the

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search Branislav Bošanský PAH/PUI 2016/2017 MDPs Using Monte Carlo Methods Monte Carlo Simulation: a technique that can be used to solve a mathematical or statistical problem using repeated

More information

44.1 Introduction Introduction. Foundations of Artificial Intelligence Monte-Carlo Methods Sparse Sampling 44.4 MCTS. 44.

44.1 Introduction Introduction. Foundations of Artificial Intelligence Monte-Carlo Methods Sparse Sampling 44.4 MCTS. 44. Foundations of Artificial ntelligence May 27, 206 44. : ntroduction Foundations of Artificial ntelligence 44. : ntroduction Thomas Keller Universität Basel May 27, 206 44. ntroduction 44.2 Monte-Carlo

More information

Approximation Algorithms for Planning Under Uncertainty. Andrey Kolobov Computer Science and Engineering University of Washington, Seattle

Approximation Algorithms for Planning Under Uncertainty. Andrey Kolobov Computer Science and Engineering University of Washington, Seattle Approximation Algorithms for Planning Under Uncertainty Andrey Kolobov Computer Science and Engineering University of Washington, Seattle 1 Why Approximate? Difficult example applications: Inventory management

More information

Bandit-based Search for Constraint Programming

Bandit-based Search for Constraint Programming Bandit-based Search for Constraint Programming Manuel Loth 1,2,4, Michèle Sebag 2,4,1, Youssef Hamadi 3,1, Marc Schoenauer 4,2,1, Christian Schulte 5 1 Microsoft-INRIA joint centre 2 LRI, Univ. Paris-Sud

More information

CS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8

CS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8 CS885 Reinforcement Learning Lecture 9: May 30, 2018 Model-based RL [SutBar] Chap 8 CS885 Spring 2018 Pascal Poupart 1 Outline Model-based RL Dyna Monte-Carlo Tree Search CS885 Spring 2018 Pascal Poupart

More information

Neural Networks and Tree Search

Neural Networks and Tree Search Mastering the Game of Go With Deep Neural Networks and Tree Search Nabiha Asghar 27 th May 2016 AlphaGo by Google DeepMind Go: ancient Chinese board game. Simple rules, but far more complicated than Chess

More information

Planning and Reinforcement Learning through Approximate Inference and Aggregate Simulation

Planning and Reinforcement Learning through Approximate Inference and Aggregate Simulation Planning and Reinforcement Learning through Approximate Inference and Aggregate Simulation Hao Cui Department of Computer Science Tufts University Medford, MA 02155, USA hao.cui@tufts.edu Roni Khardon

More information

CME 213 SPRING Eric Darve

CME 213 SPRING Eric Darve CME 213 SPRING 2017 Eric Darve MPI SUMMARY Point-to-point and collective communications Process mapping: across nodes and within a node (socket, NUMA domain, core, hardware thread) MPI buffers and deadlocks

More information

Christian Späth summing-up. Implementation and evaluation of a hierarchical planning-system for

Christian Späth summing-up. Implementation and evaluation of a hierarchical planning-system for Christian Späth 11.10.2011 summing-up Implementation and evaluation of a hierarchical planning-system for factored POMDPs Content Introduction and motivation Hierarchical POMDPs POMDP FSC Algorithms A*

More information

10703 Deep Reinforcement Learning and Control

10703 Deep Reinforcement Learning and Control 10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture

More information

Action Selection for MDPs: Anytime AO* Versus UCT

Action Selection for MDPs: Anytime AO* Versus UCT Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Action Selection for MDPs: Anytime AO* Versus UCT Blai Bonet Universidad Simón Bolívar Caracas, Venezuela bonet@ldc.usb.ve Hector

More information

Master Thesis. Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces

Master Thesis. Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces Master Thesis Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces Andreas ten Pas Master Thesis DKE 09-16 Thesis submitted in partial fulfillment

More information

Deep Reinforcement Learning

Deep Reinforcement Learning Deep Reinforcement Learning 1 Outline 1. Overview of Reinforcement Learning 2. Policy Search 3. Policy Gradient and Gradient Estimators 4. Q-prop: Sample Efficient Policy Gradient and an Off-policy Critic

More information

Large Scale Parallel Monte Carlo Tree Search on GPU

Large Scale Parallel Monte Carlo Tree Search on GPU Large Scale Parallel Monte Carlo Tree Search on Kamil Rocki The University of Tokyo Graduate School of Information Science and Technology Department of Computer Science 1 Tree search Finding a solution

More information

Trial-based Heuristic Tree Search for Finite Horizon MDPs

Trial-based Heuristic Tree Search for Finite Horizon MDPs Trial-based Heuristic Tree Search for Finite Horizon MDPs Thomas Keller University of Freiburg Freiburg, Germany tkeller@informatik.uni-freiburg.de Malte Helmert University of Basel Basel, Switzerland

More information

Diagnose and Decide: An Optimal Bayesian Approach

Diagnose and Decide: An Optimal Bayesian Approach Diagnose and Decide: An Optimal Bayesian Approach Christopher Amato CSAIL MIT camato@csail.mit.edu Emma Brunskill Computer Science Department Carnegie Mellon University ebrun@cs.cmu.edu Abstract Many real-world

More information

15-780: MarkovDecisionProcesses

15-780: MarkovDecisionProcesses 15-780: MarkovDecisionProcesses J. Zico Kolter Feburary 29, 2016 1 Outline Introduction Formal definition Value iteration Policy iteration Linear programming for MDPs 2 1988 Judea Pearl publishes Probabilistic

More information

arxiv: v3 [cs.ai] 19 Sep 2017

arxiv: v3 [cs.ai] 19 Sep 2017 Journal of Artificial Intelligence Research 58 (2017) 231-266 Submitted 09/16; published 01/17 DESPOT: Online POMDP Planning with Regularization Nan Ye ACEMS & Queensland University of Technology, Australia

More information

Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes

Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes Michael W. Boldt, Robert P. Goldman, David J. Musliner, Smart Information Flow Technologies (SIFT) Minneapolis,

More information

Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic

Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic Truong-Huy Dinh Nguyen, Wee-Sun Lee, and Tze-Yun Leong National University of Singapore, Singapore 117417 trhuy, leews, leongty@comp.nus.edu.sg

More information

On the Parallelization of UCT

On the Parallelization of UCT On the Parallelization of UCT Tristan Cazenave 1 and Nicolas Jouandeau 2 1 Dept. Informatique cazenave@ai.univ-paris8.fr 2 Dept. MIME n@ai.univ-paris8.fr LIASD, Université Paris 8, 93526, Saint-Denis,

More information

Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes

Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes Michael W. Boldt, Robert P. Goldman, David J. Musliner, Smart Information Flow Technologies (SIFT) Minneapolis,

More information

Monte Carlo Tree Search: From Playing Go to Feature Selection

Monte Carlo Tree Search: From Playing Go to Feature Selection Monte Carlo Tree Search: From Playing Go to Feature Selection Michèle Sebag joint work: Olivier Teytaud, Sylvain Gelly, Philippe Rolet, Romaric Gaudel TAO, Univ. Paris-Sud Planning to Learn, ECAI 2010,

More information

Markov Decision Processes. (Slides from Mausam)

Markov Decision Processes. (Slides from Mausam) Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology

More information

LinUCB Applied to Monte Carlo Tree Search

LinUCB Applied to Monte Carlo Tree Search LinUCB Applied to Monte Carlo Tree Search Yusaku Mandai and Tomoyuki Kaneko The Graduate School of Arts and Sciences, the University of Tokyo, Tokyo, Japan mandai@graco.c.u-tokyo.ac.jp Abstract. UCT is

More information

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning

Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes

More information

Slides credited from Dr. David Silver & Hung-Yi Lee

Slides credited from Dr. David Silver & Hung-Yi Lee Slides credited from Dr. David Silver & Hung-Yi Lee Review Reinforcement Learning 2 Reinforcement Learning RL is a general purpose framework for decision making RL is for an agent with the capacity to

More information

Reinforcement Learning (INF11010) Lecture 6: Dynamic Programming for Reinforcement Learning (extended)

Reinforcement Learning (INF11010) Lecture 6: Dynamic Programming for Reinforcement Learning (extended) Reinforcement Learning (INF11010) Lecture 6: Dynamic Programming for Reinforcement Learning (extended) Pavlos Andreadis, February 2 nd 2018 1 Markov Decision Processes A finite Markov Decision Process

More information

A Series of Lectures on Approximate Dynamic Programming

A Series of Lectures on Approximate Dynamic Programming A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)

More information

Applications of Reinforcement Learning. Ist künstliche Intelligenz gefährlich?

Applications of Reinforcement Learning. Ist künstliche Intelligenz gefährlich? Applications of Reinforcement Learning Ist künstliche Intelligenz gefährlich? Table of contents Playing Atari with Deep Reinforcement Learning Playing Super Mario World Stanford University Autonomous Helicopter

More information

Markov Decision Processes and Reinforcement Learning

Markov Decision Processes and Reinforcement Learning Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence

More information

Artificial Intelligence, CS, Nanjing University Spring, 2016, Yang Yu. Lecture 5: Search 4.

Artificial Intelligence, CS, Nanjing University Spring, 2016, Yang Yu. Lecture 5: Search 4. Artificial Intelligence, CS, Nanjing University Spring, 2016, Yang Yu Lecture 5: Search 4 http://cs.nju.edu.cn/yuy/course_ai16.ashx Previously... Path-based search Uninformed search Depth-first, breadth

More information

arxiv: v1 [cs.ai] 13 Oct 2017

arxiv: v1 [cs.ai] 13 Oct 2017 Combinatorial Multi-armed Bandits for Real-Time Strategy Games Combinatorial Multi-armed Bandits for Real-Time Strategy Games Santiago Ontañón Computer Science Department, Drexel University Philadelphia,

More information

Heuristic Search in MDPs 3/5/18

Heuristic Search in MDPs 3/5/18 Heuristic Search in MDPs 3/5/18 Thinking about online planning. How can we use ideas we ve already seen to help with online planning? Heuristics? Iterative deepening? Monte Carlo simulations? Other ideas?

More information

Backtracking. Chapter 5

Backtracking. Chapter 5 1 Backtracking Chapter 5 2 Objectives Describe the backtrack programming technique Determine when the backtracking technique is an appropriate approach to solving a problem Define a state space tree for

More information

DESPOT: Online POMDP Planning with Regularization

DESPOT: Online POMDP Planning with Regularization APPEARED IN Advances in Neural Information Processing Systems (NIPS), 2013 DESPOT: Online POMDP Planning with Regularization Adhiraj Somani Nan Ye David Hsu Wee Sun Lee Department of Computer Science National

More information

Balancing Exploration and Exploitation in Classical Planning

Balancing Exploration and Exploitation in Classical Planning Proceedings of the Seventh Annual Symposium on Combinatorial Search (SoCS 2014) Balancing Exploration and Exploitation in Classical Planning Tim Schulte and Thomas Keller University of Freiburg {schultet

More information

Schema bandits for binary encoded combinatorial optimisation problems

Schema bandits for binary encoded combinatorial optimisation problems Schema bandits for binary encoded combinatorial optimisation problems Madalina M. Drugan 1, Pedro Isasi 2, and Bernard Manderick 1 1 Artificial Intelligence Lab, Vrije Universitieit Brussels, Pleinlaan

More information

Foundations of Artificial Intelligence

Foundations of Artificial Intelligence Foundations of Artificial Intelligence 45. AlphaGo and Outlook Malte Helmert and Gabriele Röger University of Basel May 22, 2017 Board Games: Overview chapter overview: 40. Introduction and State of the

More information

Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall

Uninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes

More information

An Analysis of Monte Carlo Tree Search

An Analysis of Monte Carlo Tree Search Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) An Analysis of Monte Carlo Tree Search Steven James, George Konidaris, Benjamin Rosman University of the Witwatersrand,

More information

Parallel Monte-Carlo Tree Search

Parallel Monte-Carlo Tree Search Parallel Monte-Carlo Tree Search Guillaume M.J-B. Chaslot, Mark H.M. Winands, and H. Jaap van den Herik Games and AI Group, MICC, Faculty of Humanities and Sciences, Universiteit Maastricht, Maastricht,

More information

Heuristic (Informed) Search

Heuristic (Informed) Search Heuristic (Informed) Search (Where we try to choose smartly) R&N: Chap., Sect..1 3 1 Search Algorithm #2 SEARCH#2 1. INSERT(initial-node,Open-List) 2. Repeat: a. If empty(open-list) then return failure

More information

Improving UCT Planning via Approximate Homomorphisms

Improving UCT Planning via Approximate Homomorphisms Improving UCT Planning via Approximate Homomorphisms Nan Jiang Computer Science & Eng. University of Michigan nanjiang@umich.edu Satinder Singh Computer Science & Eng. University of Michigan baveja@umich.edu

More information

Trial-based Heuristic Tree Search for Finite Horizon MDPs

Trial-based Heuristic Tree Search for Finite Horizon MDPs Trial-based Heuristic Tree Search for Finite Horizon MDPs Thomas Keller University of Freiburg Freiburg, Germany tkeller@informatik.uni-freiburg.de Malte Helmert University of Basel Basel, Switzerland

More information

Planning and Control: Markov Decision Processes

Planning and Control: Markov Decision Processes CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What

More information

Algorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10

Algorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10 Algorithms for Solving RL: Temporal Difference Learning (TD) 1 Reinforcement Learning Lecture 10 Gillian Hayes 8th February 2007 Incremental Monte Carlo Algorithm TD Prediction TD vs MC vs DP TD for control:

More information

CS 188: Artificial Intelligence Spring Recap Search I

CS 188: Artificial Intelligence Spring Recap Search I CS 188: Artificial Intelligence Spring 2011 Midterm Review 3/14/2011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the

More information

Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search

Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search Tom Pepels 1, Tristan Cazenave 2, Mark H.M. Winands 1, and Marc Lanctot 1 1 Department of Knowledge Engineering, Maastricht University

More information

A MONTE CARLO ROLLOUT ALGORITHM FOR STOCK CONTROL

A MONTE CARLO ROLLOUT ALGORITHM FOR STOCK CONTROL A MONTE CARLO ROLLOUT ALGORITHM FOR STOCK CONTROL Denise Holfeld* and Axel Simroth** * Operations Research, Fraunhofer IVI, Germany, Email: denise.holfeld@ivi.fraunhofer.de ** Operations Research, Fraunhofer

More information

Apprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang

Apprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Apprenticeship Learning for Reinforcement Learning with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Table of Contents Introduction Theory Autonomous helicopter control

More information

An Analysis of Virtual Loss in Parallel MCTS

An Analysis of Virtual Loss in Parallel MCTS An Analysis of Virtual Loss in Parallel MCTS S. Ali Mirsoleimani 1,, Aske Plaat 1, Jaap van den Herik 1 and Jos Vermaseren 1 Leiden Centre of Data Science, Leiden University Niels Bohrweg 1, 333 CA Leiden,

More information

Artificial Intelligence. Game trees. Two-player zero-sum game. Goals for the lecture. Blai Bonet

Artificial Intelligence. Game trees. Two-player zero-sum game. Goals for the lecture. Blai Bonet Artificial Intelligence Blai Bonet Game trees Universidad Simón Boĺıvar, Caracas, Venezuela Goals for the lecture Two-player zero-sum game Two-player game with deterministic actions, complete information

More information

OGA-UCT: On-the-Go Abstractions in UCT

OGA-UCT: On-the-Go Abstractions in UCT OGA-UCT: On-the-Go Abstractions in UCT Ankit Anand and Ritesh Noothigattu and Mausam and Parag Singla Indian Institute of Technology, Delhi New Delhi, India {ankit.anand, riteshn.cs112, mausam, parags}@cse.iitd.ac.in

More information

University of Huddersfield Repository

University of Huddersfield Repository University of Huddersfield Repository Naveed, Munir, Kitchin, Diane E., Crampton, Andrew, Chrpa, Lukáš and Gregory, Peter A Monte-Carlo Path Planner for Dynamic and Partially Observable Environments Original

More information

10703 Deep Reinforcement Learning and Control

10703 Deep Reinforcement Learning and Control 10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient II Used Materials Disclaimer: Much of the material and slides for this lecture

More information

Optimal Sequential Drilling for Hydrocarbon Field Development Planning

Optimal Sequential Drilling for Hydrocarbon Field Development Planning Proceedings of the Twenty-Ninth AAAI Conference on Innovative Applications (IAAI-17)) Optimal Sequential Drilling for Hydrocarbon Field Development Planning Ruben Rodriguez Torrado Repsol S.A. Mostoles,

More information

CSC 373: Algorithm Design and Analysis Lecture 4

CSC 373: Algorithm Design and Analysis Lecture 4 CSC 373: Algorithm Design and Analysis Lecture 4 Allan Borodin January 14, 2013 1 / 16 Lecture 4: Outline (for this lecture and next lecture) Some concluding comments on optimality of EST Greedy Interval

More information

Adversarial Policy Switching with Application to RTS Games

Adversarial Policy Switching with Application to RTS Games Adversarial Policy Switching with Application to RTS Games Brian King 1 and Alan Fern 2 and Jesse Hostetler 2 Department of Electrical Engineering and Computer Science Oregon State University 1 kingbria@lifetime.oregonstate.edu

More information

Monte Carlo Tree Search with Bayesian Model Averaging for the Game of Go

Monte Carlo Tree Search with Bayesian Model Averaging for the Game of Go Monte Carlo Tree Search with Bayesian Model Averaging for the Game of Go John Jeong You A subthesis submitted in partial fulfillment of the degree of Master of Computing (Honours) at The Department of

More information

Recap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems.

Recap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems. CS 188: Artificial Intelligence Spring 011 Midterm Review 3/14/011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the

More information

Improving Exploration in UCT Using Local Manifolds

Improving Exploration in UCT Using Local Manifolds Improving Exploration in Using Local Manifolds Sriram Srinivasan University of Alberta ssriram@ualberta.ca Erik Talvitie Franklin and Marshal College erik.talvitie@fandm.edu Michael Bowling University

More information

Incremental Query Optimization

Incremental Query Optimization Incremental Query Optimization Vipul Venkataraman Dr. S. Sudarshan Computer Science and Engineering Indian Institute of Technology Bombay Outline Introduction Volcano Cascades Incremental Optimization

More information

Introduction to Mobile Robotics Multi-Robot Exploration

Introduction to Mobile Robotics Multi-Robot Exploration Introduction to Mobile Robotics Multi-Robot Exploration Wolfram Burgard, Cyrill Stachniss, Maren Bennewitz, Kai Arras Exploration The approaches seen so far are purely passive. Given an unknown environment,

More information

CS450 - Structure of Higher Level Languages

CS450 - Structure of Higher Level Languages Spring 2018 Streams February 24, 2018 Introduction Streams are abstract sequences. They are potentially infinite we will see that their most interesting and powerful uses come in handling infinite sequences.

More information

Neural Episodic Control. Alexander pritzel et al (presented by Zura Isakadze)

Neural Episodic Control. Alexander pritzel et al (presented by Zura Isakadze) Neural Episodic Control Alexander pritzel et al. 2017 (presented by Zura Isakadze) Reinforcement Learning Image from reinforce.io RL Example - Atari Games Observed States Images. Internal state - RAM.

More information

Scalable Distributed Monte-Carlo Tree Search

Scalable Distributed Monte-Carlo Tree Search Proceedings, The Fourth International Symposium on Combinatorial Search (SoCS-2011) Scalable Distributed Monte-Carlo Tree Search Kazuki Yoshizoe, Akihiro Kishimoto, Tomoyuki Kaneko, Haruhiro Yoshimoto,

More information

Introduction to Reinforcement Learning. J. Zico Kolter Carnegie Mellon University

Introduction to Reinforcement Learning. J. Zico Kolter Carnegie Mellon University Introduction to Reinforcement Learning J. Zico Kolter Carnegie Mellon University 1 Agent interaction with environment Agent State s Reward r Action a Environment 2 Of course, an oversimplification 3 Review:

More information

3 SOLVING PROBLEMS BY SEARCHING

3 SOLVING PROBLEMS BY SEARCHING 48 3 SOLVING PROBLEMS BY SEARCHING A goal-based agent aims at solving problems by performing actions that lead to desirable states Let us first consider the uninformed situation in which the agent is not

More information

Dilemma First Search for Effortless Optimization of NP-hard Problems

Dilemma First Search for Effortless Optimization of NP-hard Problems for Effortless Optimization of NP-hard Problems Julien Weissenberg Hayko Riemenschneider Ralf Dragon Luc Van Gool Abstract To tackle the exponentiality associated with NPhard problems, two paradigms have

More information

REAL-TIME PLANNING AS DECISION-MAKING UNDER UNCERTAINTY ANDREW MITCHELL. BS Computer Science, University of New Hampshire, 2017 THESIS

REAL-TIME PLANNING AS DECISION-MAKING UNDER UNCERTAINTY ANDREW MITCHELL. BS Computer Science, University of New Hampshire, 2017 THESIS REAL-TIME PLANNING AS DECISION-MAKING UNDER UNCERTAINTY BY ANDREW MITCHELL BS Computer Science, University of New Hampshire, 2017 THESIS Submitted to the University of New Hampshire in Partial Fulfillment

More information

Partially Observable Markov Decision Processes. Mausam (slides by Dieter Fox)

Partially Observable Markov Decision Processes. Mausam (slides by Dieter Fox) Partially Observable Markov Decision Processes Mausam (slides by Dieter Fox) Stochastic Planning: MDPs Static Environment Fully Observable Perfect What action next? Stochastic Instantaneous Percepts Actions

More information

Searching with Partial Information

Searching with Partial Information Searching with Partial Information Above we (unrealistically) assumed that the environment is fully observable and deterministic Moreover, we assumed that the agent knows what the effects of each action

More information

On-line search for solving Markov Decision Processes via heuristic sampling

On-line search for solving Markov Decision Processes via heuristic sampling On-line search for solving Markov Decision Processes via heuristic sampling Laurent Péret and Frédérick Garcia Abstract. In the past, Markov Decision Processes (MDPs) have become a standard for solving

More information

Nested Rollout Policy Adaptation for Monte Carlo Tree Search

Nested Rollout Policy Adaptation for Monte Carlo Tree Search Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Nested Rollout Policy Adaptation for Monte Carlo Tree Search Christopher D. Rosin Parity Computing, Inc. San Diego,

More information

Go in numbers 3,000. Years Old

Go in numbers 3,000. Years Old Go in numbers 3,000 Years Old 40M Players 10^170 Positions The Rules of Go Capture Territory Why is Go hard for computers to play? Brute force search intractable: 1. Search space is huge 2. Impossible

More information

Cell. x86. A Study on Implementing Parallel MC/UCT Algorithm

Cell. x86. A Study on Implementing Parallel MC/UCT Algorithm 1 MC/UCT 1, 2 1 UCT/MC CPU 2 x86/ PC Cell/Playstation 3 Cell 3 x86 10%UCT GNU GO 4 ELO 35 ELO 20 ELO A Study on Implementing Parallel MC/UCT Algorithm HIDEKI KATO 1, 2 and IKUO TAKEUCHI 1 We have developed

More information

Estimation of Item Response Models

Estimation of Item Response Models Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:

More information

Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm

Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm Distributed Gibbs: A Memory-Bounded Sampling-Based DCOP Algorithm Duc Thien Nguyen, William Yeoh, and Hoong Chuin Lau School of Information Systems Singapore Management University Singapore 178902 {dtnguyen.2011,hclau}@smu.edu.sg

More information

looking ahead to see the optimum

looking ahead to see the optimum ! Make choice based on immediate rewards rather than looking ahead to see the optimum! In many cases this is effective as the look ahead variation can require exponential time as the number of possible

More information

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011

Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions

More information

UCT Algorithm Circle: Probabilistic Algorithms

UCT Algorithm Circle: Probabilistic Algorithms UCT Algorithm Circle: 24 February 2011 Outline 1 2 3 Probabilistic algorithm A probabilistic algorithm is one which uses a source of random information The randomness used may result in randomness in the

More information

Guiding Combinatorial Optimization with UCT

Guiding Combinatorial Optimization with UCT Guiding Combinatorial Optimization with UCT Ashish Sabharwal, Horst Samulowitz, and Chandra Reddy IBM Watson Research Center, Yorktown Heights, NY 10598, USA {ashish.sabharwal,samulowitz,creddy}@us.ibm.com

More information

Monte Carlo Tree Search

Monte Carlo Tree Search Monte Carlo Tree Search 2-15-16 Reading Quiz What is the relationship between Monte Carlo tree search and upper confidence bound applied to trees? a) MCTS is a type of UCB b) UCB is a type of MCTS c) both

More information

Trees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington

Trees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington Trees (Part 1, Theoretical) CSE 2320 Algorithms and Data Structures University of Texas at Arlington 1 Trees Trees are a natural data structure for representing specific data. Family trees. Organizational

More information

Copyright 2000, Kevin Wayne 1

Copyright 2000, Kevin Wayne 1 Guessing Game: NP-Complete? 1. LONGEST-PATH: Given a graph G = (V, E), does there exists a simple path of length at least k edges? YES. SHORTEST-PATH: Given a graph G = (V, E), does there exists a simple

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Constraint Satisfaction Problems II and Local Search Instructors: Sergey Levine and Stuart Russell University of California, Berkeley [These slides were created by Dan Klein

More information

Informed State Space Search B4B36ZUI, LS 2018

Informed State Space Search B4B36ZUI, LS 2018 Informed State Space Search B4B36ZUI, LS 2018 Branislav Bošanský, Martin Schaefer, David Fiedler, Jaromír Janisch {name.surname}@agents.fel.cvut.cz Artificial Intelligence Center, Czech Technical University

More information

PLite.jl. Release 1.0

PLite.jl. Release 1.0 PLite.jl Release 1.0 October 19, 2015 Contents 1 In Depth Documentation 3 1.1 Installation................................................ 3 1.2 Problem definition............................................

More information

Large Margin Classification Using the Perceptron Algorithm

Large Margin Classification Using the Perceptron Algorithm Large Margin Classification Using the Perceptron Algorithm Yoav Freund Robert E. Schapire Presented by Amit Bose March 23, 2006 Goals of the Paper Enhance Rosenblatt s Perceptron algorithm so that it can

More information

Deep Learning of Visual Control Policies

Deep Learning of Visual Control Policies ESANN proceedings, European Symposium on Artificial Neural Networks - Computational Intelligence and Machine Learning. Bruges (Belgium), 8-3 April, d-side publi., ISBN -9337--. Deep Learning of Visual

More information

CS 540: Introduction to Artificial Intelligence

CS 540: Introduction to Artificial Intelligence CS 540: Introduction to Artificial Intelligence Midterm Exam: 7:15-9:15 pm, October, 014 Room 140 CS Building CLOSED BOOK (one sheet of notes and a calculator allowed) Write your answers on these pages

More information

Adaptive Building of Decision Trees by Reinforcement Learning

Adaptive Building of Decision Trees by Reinforcement Learning Proceedings of the 7th WSEAS International Conference on Applied Informatics and Communications, Athens, Greece, August 24-26, 2007 34 Adaptive Building of Decision Trees by Reinforcement Learning MIRCEA

More information

DFS* and the Traveling Tournament Problem. David C. Uthus, Patricia J. Riddle, and Hans W. Guesgen

DFS* and the Traveling Tournament Problem. David C. Uthus, Patricia J. Riddle, and Hans W. Guesgen DFS* and the Traveling Tournament Problem David C. Uthus, Patricia J. Riddle, and Hans W. Guesgen Traveling Tournament Problem Sports scheduling combinatorial optimization problem. Objective is to create

More information

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

More information

Heuristic Search Value Iteration Trey Smith. Presenter: Guillermo Vázquez November 2007

Heuristic Search Value Iteration Trey Smith. Presenter: Guillermo Vázquez November 2007 Heuristic Search Value Iteration Trey Smith Presenter: Guillermo Vázquez November 2007 What is HSVI? Heuristic Search Value Iteration is an algorithm that approximates POMDP solutions. HSVI stores an upper

More information

AP Computer Science Unit 3. Programs

AP Computer Science Unit 3. Programs AP Computer Science Unit 3. Programs For most of these programs I m asking that you to limit what you print to the screen. This will help me in quickly running some tests on your code once you submit them

More information

LECTURE 20: SWARM INTELLIGENCE 6 / ANT COLONY OPTIMIZATION 2

LECTURE 20: SWARM INTELLIGENCE 6 / ANT COLONY OPTIMIZATION 2 15-382 COLLECTIVE INTELLIGENCE - S18 LECTURE 20: SWARM INTELLIGENCE 6 / ANT COLONY OPTIMIZATION 2 INSTRUCTOR: GIANNI A. DI CARO ANT-ROUTING TABLE: COMBINING PHEROMONE AND HEURISTIC 2 STATE-TRANSITION:

More information

Structural Graph Matching With Polynomial Bounds On Memory and on Worst-Case Effort

Structural Graph Matching With Polynomial Bounds On Memory and on Worst-Case Effort Structural Graph Matching With Polynomial Bounds On Memory and on Worst-Case Effort Fred DePiero, Ph.D. Electrical Engineering Department CalPoly State University Goal: Subgraph Matching for Use in Real-Time

More information