CS885 Reinforcement Learning Lecture 9 (May 30, 2018): Model-based RL [SutBar] Chap. 8

1 CS885 Reinforcement Learning
Lecture 9 (May 30, 2018): Model-based RL [SutBar] Chap. 8
Pascal Poupart, CS885 Spring 2018

2 Outline
- Model-based RL
- Dyna
- Monte Carlo Tree Search

3 Model-free RL
- No explicit transition or reward models
- Q-learning: value-based method
- Policy gradient: policy-based method
- Actor critic: policy- and value-based method
[Diagram: agent (updates policy/value function) selects actions; environment returns state and reward]

4 Model-based RL
- Learn an explicit transition and/or reward model
- Plan based on the model
- Benefit: increased sample efficiency
- Drawback: increased complexity
[Diagram: agent updates a model and plans with it to update the policy/value function; environment returns state and reward in response to actions]

5 Maze Example
[Figure: the 4x3 maze; top row policy r r r into terminal +1; middle row u (wall) u next to terminal -1; bottom row u l l l; γ = 1; reward is -0.04 for non-terminal states]
We need to learn all the transition probabilities! Observed episodes:
(1,1) → (1,2) → (1,3) → (1,2) → (1,3) → (2,3) → (3,3) → (4,3) +1
(1,1) → (1,2) → (1,3) → (2,3) → (3,3) → (3,2) → (3,3) → (4,3) +1
(1,1) → (2,1) → (3,1) → (3,2) → (4,2) -1
Counting the moves out of (1,3) under action r gives P((2,3) | (1,3), r) = 2/3 and P((1,2) | (1,3), r) = 1/3.
Use this information in V*(s) = max_a R(s,a) + γ Σ_{s'} Pr(s'|s,a) V*(s').
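
To make the counting concrete, here is a minimal sketch (the list of extracted transitions is an assumption for illustration, not from the slides) that recovers the two estimates above from the recorded moves out of (1,3) under action r:

```python
# Estimate Pr(s'|s,a) by counting, for s = (1,3), a = r, from the episodes above.
from collections import Counter

# Next states observed after taking r in (1,3): twice in episode 1, once in episode 2.
next_states = [(1, 2), (2, 3), (2, 3)]

counts = Counter(next_states)
n = len(next_states)
for s2, k in counts.items():
    print(f"P({s2} | (1,3), r) = {k}/{n}")  # -> 1/3 for (1,2), 2/3 for (2,3)
```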

6 Model-based RL
Idea: at each step,
- execute an action
- observe the resulting state and reward
- update the transition and/or reward model
- update the policy and/or value function

7 Model-based RL (with Value Iteration)
ModelBasedRL(s)
  Repeat
    Select and execute a; observe s' and r
    Update counts: n(s,a) ← n(s,a) + 1;  n(s,a,s') ← n(s,a,s') + 1
    Update transition: Pr(s'|s,a) ← n(s,a,s') / n(s,a)  ∀ s'
    Update reward: R(s,a) ← R(s,a) + [r - R(s,a)] / n(s,a)
    Solve: V*(s) = max_a R(s,a) + γ Σ_{s'} Pr(s'|s,a) V*(s')  ∀ s
    s ← s'
  Until convergence of V*
  Return V*
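
A minimal tabular sketch of this loop, assuming a toy random MDP and ε-greedy exploration (both assumptions; the slide does not fix the environment or exploration scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 2, 0.9
true_P = rng.dirichlet(np.ones(S), size=(S, A))    # hidden dynamics, shape (S, A, S)
true_R = rng.normal(size=(S, A))                   # hidden rewards

n_sa  = np.zeros((S, A))                           # counts n(s,a)
n_sas = np.zeros((S, A, S))                        # counts n(s,a,s')
P_hat = np.full((S, A, S), 1.0 / S)                # learned transition model
R_hat = np.zeros((S, A))                           # learned reward model
V = np.zeros(S)

s = 0
for step in range(3000):
    # epsilon-greedy action w.r.t. the current model and value function
    a = rng.integers(A) if rng.random() < 0.1 else int(np.argmax(R_hat[s] + gamma * P_hat[s] @ V))
    s2 = rng.choice(S, p=true_P[s, a])             # execute a, observe s' and r
    r = true_R[s, a]
    n_sa[s, a] += 1
    n_sas[s, a, s2] += 1
    P_hat[s, a] = n_sas[s, a] / n_sa[s, a]         # Pr(s'|s,a) <- n(s,a,s') / n(s,a)
    R_hat[s, a] += (r - R_hat[s, a]) / n_sa[s, a]  # running-average reward
    for _ in range(50):                            # "solve": value-iteration sweeps
        V = np.max(R_hat + gamma * P_hat @ V, axis=1)
    s = int(s2)

print("V* estimate:", np.round(V, 2))
```

The fifty value-iteration sweeps per step stand in for "solve until convergence"; in practice one would sweep until the Bellman residual is small.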

8 Complex Models
Use function approximation for the transition and reward models:
- Linear Gaussian model: Pr(s'|s,a) = N(s'; W_a s + b_a, Σ_a)
- Non-linear models:
  - Stochastic (e.g., Gaussian process): Pr(s'|s,a) = N(s'; μ(s,a), Σ(s,a))
  - Deterministic (e.g., neural network): s' = f(s,a) = NN(s,a)
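
As one concrete instance, a linear model per action can be fit by least squares from observed (s, a, s') triples; this sketch uses synthetic data and is an illustration, not a procedure given on the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
d, A, N = 3, 2, 500
W_true = rng.normal(size=(A, d, d))          # hidden dynamics, one matrix per action

S_in = rng.normal(size=(N, d))               # observed states
acts = rng.integers(A, size=N)               # observed actions
S_out = np.stack([W_true[a] @ s for a, s in zip(acts, S_in)])
S_out += 0.01 * rng.normal(size=(N, d))      # transition noise

W_hat = np.zeros((A, d, d))
for a in range(A):
    X, Y = S_in[acts == a], S_out[acts == a]
    W_hat[a] = np.linalg.lstsq(X, Y, rcond=None)[0].T  # solves Y ≈ X Wᵀ

print("model error:", np.abs(W_hat - W_true).max())
```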

9 Partial Planning
- With complex models, fully optimizing the policy or value function at each time step is intractable
- Instead, consider partial planning:
  - a few steps of Q-learning
  - a few steps of policy gradient

10 Model-based RL (with Q-learning)
ModelBasedRL(s)
  Repeat
    Select and execute a; observe s' and r
    Update transition: θ_T ← θ_T - α_T ∇_{θ_T} ‖f_{θ_T}(s,a) - s'‖²
    Update reward: θ_R ← θ_R - α_R ∇_{θ_R} (R_{θ_R}(s,a) - r)²
    Repeat a few times:
      sample (ŝ, â) arbitrarily
      δ ← R(ŝ, â) + γ max_{â'} Q(f(ŝ, â), â') - Q(ŝ, â)
      Update Q: θ_Q ← θ_Q + α_Q δ ∇_{θ_Q} Q(ŝ, â)
    s ← s'
  Until convergence of Q
  Return Q
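
A sketch of the inner planning loop with a linear Q-function; the feature map phi, the learned models f_hat and R_hat, and all names here are illustrative assumptions:

```python
import numpy as np

def planning_updates(theta_q, phi, f_hat, R_hat, actions, sample_sa,
                     n_updates=10, gamma=0.99, alpha=0.01):
    """A few Q-learning steps on transitions simulated from the learned model."""
    for _ in range(n_updates):
        s, a = sample_sa()                      # sample (s, a) arbitrarily
        s2 = f_hat(s, a)                        # model-predicted next state
        q_next = max(theta_q @ phi(s2, a2) for a2 in actions)
        delta = R_hat(s, a) + gamma * q_next - theta_q @ phi(s, a)
        theta_q = theta_q + alpha * delta * phi(s, a)   # gradient of a linear Q
    return theta_q

# Toy usage: 1-D state, 2 actions, hand-picked features
rng = np.random.default_rng(2)
phi = lambda s, a: np.array([s, s * s, 1.0, float(a)])
theta = planning_updates(np.zeros(4), phi,
                         f_hat=lambda s, a: 0.9 * s + 0.1 * a,
                         R_hat=lambda s, a: -abs(s),
                         actions=[0, 1],
                         sample_sa=lambda: (float(rng.normal()), int(rng.integers(2))),
                         n_updates=200)
print("theta_Q:", np.round(theta, 3))
```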

11 Partial Planning vs Replay Buffer
- The previous algorithm is very similar to model-free Q-learning with a replay buffer
- Instead of updating the Q-function from samples in a replay buffer, we generate samples from the model
- Replay buffer: simple, real samples, no generalization to other state-action pairs
- Partial planning with a model: complex, simulated samples, generalization to other state-action pairs (can help or hurt)

12 Dyna
- Learn an explicit transition and/or reward model and plan based on the model
- Also learn directly from real experience
[Diagram: as in model-based RL, but the state and reward feed both the model update and the direct policy/value update]

13 Dyna-Q
Dyna-Q(s)
  Repeat
    Select and execute a; observe s' and r
    Update transition: θ_T ← θ_T - α_T ∇_{θ_T} ‖f_{θ_T}(s,a) - s'‖²
    Update reward: θ_R ← θ_R - α_R ∇_{θ_R} (R_{θ_R}(s,a) - r)²
    δ ← r + γ max_{a'} Q(s',a') - Q(s,a)
    Update Q: θ_Q ← θ_Q + α_Q δ ∇_{θ_Q} Q(s,a)
    Repeat a few times:
      sample (ŝ, â) arbitrarily
      δ ← R(ŝ, â) + γ max_{â'} Q(f(ŝ, â), â') - Q(ŝ, â)
      Update Q: θ_Q ← θ_Q + α_Q δ ∇_{θ_Q} Q(ŝ, â)
    s ← s'
  Return Q
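
A runnable tabular Dyna-Q sketch in the style of [SutBar] Chap. 8: direct Q-learning on real experience plus several planning updates on remembered transitions. The chain environment is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
S, A, gamma, alpha, n_plan = 6, 2, 0.95, 0.5, 20
Q = np.zeros((S, A))
model = {}                                   # (s,a) -> (r, s') for visited pairs

def step(s, a):                              # toy chain: a=1 moves right, a=0 left
    s2 = min(s + 1, S - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == S - 1 else 0.0), s2

s = 0
for t in range(2000):
    a = rng.integers(A) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    r, s2 = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # direct RL
    model[(s, a)] = (r, s2)                                  # model learning
    for _ in range(n_plan):                                  # planning on replayed pairs
        (ps, pa), (pr, ps2) = list(model.items())[rng.integers(len(model))]
        Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
    s = 0 if s2 == S - 1 else s2             # restart the episode at the goal

print(np.round(Q.max(axis=1), 2))
```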

14 Dyna-Q
[Figure: gridworld maze; task: reach G from S]

15 Planning from the Current State
- Instead of planning at arbitrary states, plan from the current state
- This helps improve the choice of the next action
- Monte Carlo Tree Search

16 Tree Search
[Figure: a search tree rooted at the current state s1; each state node branches on actions a and b, and each action leads to several possible next states s2, ..., s21]

17 Tractable Tree Search
Combine 3 ideas:
- Leaf nodes: approximate leaf values with the value of a default policy π:
  Q(s,a) ≈ Q^π(s,a) ≈ (1/n(s,a)) Σ_k G_k, the average return of n(s,a) rollouts of π from (s,a)
- Chance nodes: approximate the expectation by sampling from the transition model:
  Q(s,a) ≈ R(s,a) + γ (1/n(s,a)) Σ_{s' ~ Pr(s'|s,a)} V(s')
- Decision nodes: expand only the most promising actions:
  a* = argmax_a Q(s,a) + c √(2 ln n(s) / n(s,a))  and  V(s) = Q(s,a*)
Resulting algorithm: Monte Carlo Tree Search
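
The decision-node rule is plain UCB1. A minimal sketch (the dict-based bookkeeping is an assumption):

```python
import math

# UCB1 selection at a decision node: a* = argmax_a Q(s,a) + c sqrt(2 ln n(s) / n(s,a)).
def ucb_action(Q, n_s, n_sa, actions, c=1.0):
    def score(a):
        if n_sa[a] == 0:
            return math.inf            # always try untried actions first
        return Q[a] + c * math.sqrt(2 * math.log(n_s) / n_sa[a])
    return max(actions, key=score)

# Toy usage: two actions at a node visited 40 times
Q = {'a': 0.4, 'b': 0.6}
n_sa = {'a': 10, 'b': 30}
print(ucb_action(Q, n_s=40, n_sa=n_sa, actions=['a', 'b']))  # bonus favors 'a'
```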

18 Computer Go
- Deep RL + Monte Carlo Tree Search
- Oct 2015: AlphaGo defeats the European Go champion Fan Hui 5-0

19 Monte Carlo Tree Search (with upper confidence bound)
UCT(s_0)
  create root node_0 with state(node_0) ← s_0
  while within computational budget do
    node_l ← TreePolicy(node_0)
    value ← DefaultPolicy(state(node_l))
    Backup(node_l, value)
  return action(BestChild(node_0, 0))

TreePolicy(node)
  while node is nonterminal do
    if node is not fully expanded then
      return Expand(node)
    else
      node ← BestChild(node, c)
  return node
(A runnable sketch combining slides 19-21 follows slide 21.)

20 Monte Carlo Tree Search (continued)
Expand(node)
  choose a among the untried actions of state(node)
  add a new child node' to node with state(node') ← f(state(node), a)   [deterministic transition]
  return node'

BestChild(node, c)
  return argmax_{node' ∈ children(node)} V(node') + c √(2 ln n(node) / n(node'))

DefaultPolicy(node)
  s ← state(node)
  while s is not terminal do
    sample a ~ π(a | s);  s ← f(s, a)
  return R(s, a)

21 Monte Carlo Tree Search (continued)
Single player:
Backup(node, value)
  while node is not null do
    V(node) ← [n(node) V(node) + value] / (n(node) + 1)
    n(node) ← n(node) + 1
    node ← parent(node)

Two players (adversarial):
BackupMinMax(node, value)
  while node is not null do
    V(node) ← [n(node) V(node) + value] / (n(node) + 1)
    n(node) ← n(node) + 1
    value ← -value
    node ← parent(node)
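
Putting slides 19-21 together, a compact runnable UCT sketch for a toy deterministic game (the counting game, node fields, and budget below are illustrative assumptions, not from the lecture): start at a number, actions add 1 or 2, reaching exactly 10 scores 1 and overshooting scores 0.

```python
import math, random

ACTIONS = (1, 2)
def is_terminal(s): return s >= 10
def next_state(s, a): return s + a                 # deterministic f(s, a)
def reward(s): return 1.0 if s == 10 else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.untried = {}, list(ACTIONS)
        self.n, self.value = 0, 0.0

def tree_policy(v, c):                              # descend, expanding if possible
    while not is_terminal(v.state):
        if v.untried:                               # not fully expanded
            a = v.untried.pop()
            child = Node(next_state(v.state, a), parent=v)
            v.children[a] = child
            return child
        v = best_child(v, c)
    return v

def best_child(v, c):                               # UCB over the children of v
    def ucb(child):
        return child.value + c * math.sqrt(2 * math.log(v.n) / child.n)
    return max(v.children.values(), key=ucb)

def default_policy(s):                              # random rollout to a terminal state
    while not is_terminal(s):
        s = next_state(s, random.choice(ACTIONS))
    return reward(s)

def backup(v, delta):                               # single-player backup
    while v is not None:
        v.value = (v.n * v.value + delta) / (v.n + 1)
        v.n += 1
        v = v.parent

def uct(s0, budget=2000, c=1.0):
    root = Node(s0)
    for _ in range(budget):
        leaf = tree_policy(root, c)
        backup(leaf, default_policy(leaf.state))
    best = best_child(root, 0)                      # exploit only at the end
    return next(a for a, ch in root.children.items() if ch is best)

print("best first move from 8:", uct(8))            # +2 reaches 10 exactly
```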

22 AlphaGo
Four steps:
1. Supervised learning of policy networks
2. Policy gradient with policy networks
3. Value gradient with value networks
4. Searching with policy and value networks (Monte Carlo Tree Search variant)

23 Search Tree
At each edge store Q(s,a), P(a|s), N(s,a), where N(s,a) is the visit count of (s,a).
[Figure: a sampled trajectory through the search tree]

24 Simulation
At each node, select the edge a* that maximizes
  a* = argmax_a Q(s,a) + u(s,a)
where
  u(s,a) ∝ P(a|s) / (1 + N(s,a)) is an exploration bonus
  Q(s,a) = (1/N(s,a)) Σ_i 1_i(s,a) [λ v_θ(s_i) + (1 - λ) z_i]
  1_i(s,a) = 1 if (s,a) was visited at simulation i, 0 otherwise
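
A sketch of this selection rule; the exact shape of the bonus (including the √ΣN factor used in AlphaGo) goes beyond the proportionality stated on the slide and is an assumption here:

```python
import numpy as np

def select_edge(Q, P, N, c=1.0):
    """Pick a* = argmax_a Q(s,a) + u(s,a), with u ∝ P(a|s) / (1 + N(s,a))."""
    u = c * P * np.sqrt(N.sum()) / (1.0 + N)   # exploration bonus (assumed form)
    return int(np.argmax(Q + u))

# Toy usage: 3 candidate moves at a node
Q = np.array([0.2, 0.5, 0.1])   # mixed rollout/value-network estimates
P = np.array([0.6, 0.3, 0.1])   # prior from the policy network
N = np.array([10, 40, 2])       # visit counts
print("selected action:", select_edge(Q, P, N))
```

Note how the bonus decays with N(s,a): heavily visited moves must justify themselves through Q, while the policy-network prior P steers early exploration.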
