CS885 Reinforcement Learning Lecture 9: May 30, Model-based RL [SutBar] Chap 8
|
|
- Samson Wilson
- 5 years ago
- Views:
Transcription
1 CS885 Reinforcement Learning Lecture 9: May 30, 2018 Model-based RL [SutBar] Chap 8 CS885 Spring 2018 Pascal Poupart 1
2 Outline Model-based RL Dyna Monte-Carlo Tree Search CS885 Spring 2018 Pascal Poupart 2
3 Model-free RL No explicit transition or reward models Q-learning: value-based method Policy gradient: policy-based method Actor critic: policy and value based method state reward Agent: update policy/value function Environment action CS885 Spring 2018 Pascal Poupart 3
4 Model-based RL Learn explicit transition and/or reward model Plan based on the model Benefit: Increased sample efficiency Drawback: Increased complexity plan Agent: update model Agent: update policy/value function state reward Environment action CS885 Spring 2018 Pascal Poupart 4
5 Maze Example r r r +1 u u -1 u l l l g = 1 Reward is for non-terminal states We need to learn all the transition probabilities! (1,1)à (1,2)à (1,3)à (1,2)à (1,3)à (2,3)à (3,3)à (4,3) +1 (1,1)à (1,2)à (1,3)à (2,3)à (3,3)à (3,2)à (3,3)à (4,3) +1 (1,1)à (2,1)à (3,1)à (3,2)à (4,2) -1 P((2,3) (1,3),r) =2/3 P((1,2) (1,3),r) =1/3 Use this information in # % = max ) * %,, Pr % 4 %,, # (% 4 ) CS885 Spring 2018 Pascal Poupart 5
6 Model-based RL Idea: at each step Execute action Observe resulting state and reward Update transition and/or reward model Update policy and/or value function CS885 Spring 2018 Pascal Poupart 6
7 Model-based RL (with Value Iteration) ModelBasedRL(!) Repeat Select and execute " Observe! and $ Update counts: %!, " %!, " + 1, %!, ",! * %!, ",! * + 1 Update transition: Pr! *!, " -(/,0,/1 )! Update reward: 5!, " -(/,0) /,0 89 :(/,0) -(/,0) Solve: ; (!) = max 0 5!, " + A / 1 Pr! *!, " ;! *!!! Until convergence of ; Return ; CS885 Spring 2018 Pascal Poupart 7
8 Complex models Use function approx. for transition and reward models Linear model:!"# $ % $, ' = )($ -. $, / 0 1) ' Non-linear models: Stochastic (e.g. Gaussian process): $, / 0 1) ' Deterministic (e.g., neural network): $ % = 5 $, ' = ))($, ')!"# $ % $, ' = 34($ -. CS885 Spring 2018 Pascal Poupart 8
9 Partial Planning In complex models, fully optimizing the policy or value function at each time step is intractable Consider partial planning A few steps of Q-learning A few steps of policy gradient CS885 Spring 2018 Pascal Poupart 9
10 Model-based RL (with Q-learning) ModelBasedRL(!) Repeat Select and execute ", observe! and $ Update transition: % & % & ) & *!, "!,./ *(!, ") Update reward:% 2 % 2 ) 2 3!, " $.4 3(!, ") Repeat a few times: sample!, 6" arbitrarily 7 3!, 6" + 9 max? *(!, 6"), 6",?!, 6" 6= > Update?: 7.A?(!, 6")!! Until convergence of? Return? CS885 Spring 2018 Pascal Poupart 10
11 Partial Planning vs Replay Buffer Previous algorithm is very similar to Model-free Q- learning with a replay buffer Instead of updating Q-function based on samples from replay buffer, generate samples from model Replay buffer: Simple, real samples, no generalization to other sate-action pairs Partial planning with a model Complex, simulated samples, generalization to other state-action pairs (can help or hurt) CS885 Spring 2018 Pascal Poupart 11
12 Dyna Learn explicit transition and/or reward model Plan based on the model Learn directly from real experience plan Agent: update model Agent: update policy/value function state reward state reward Environment action CS885 Spring 2018 Pascal Poupart 12
13 Dyna-Q Dyna-Q(!) Repeat Select and execute ", observe! and $ Update transition: % & % & ) & *!, "!,./ *(!, ") Update reward: % 2 % 2 ) 2 3!, " $.4 3(!, ") 5 $ + 7 max ; < =!, " =!, " Update =: %? %? )? 5.@ =(!, ") Repeat a few times: sample!, B" arbitrarily 5 3!, B" + 7 max = *(!, B"), B", =!, B" B; < Update =: %? %? )? 5.@ =(!, B")!! Return = CS885 Spring 2018 Pascal Poupart 13
14 Task: reach G from S Dyna-Q CS885 Spring 2018 Pascal Poupart 14
15 Planning from current state Instead of planning at arbitrary states, plan from the current state This helps improve next action Monte Carlo Tree Search CS885 Spring 2018 Pascal Poupart 15
16 Tree Search a s1 b s2 s3 s12 s13 a b a b a b a b s4 s5 s6 s7 s8 s9 s10 s11 s14 s15 s16 s17 s18 s19 s20 s21 CS885 Spring 2018 Pascal Poupart 16
17 Tractable Tree Search Combine 3 ideas: Leaf nodes: approximate leaf values with value of default policy! " $, & " ( $, & 1 *($, &) -./0 Chance nodes: approximate expectation by sampling from transition model " 1 $, & 3 $, & =($ > ) *($, &) ~ 9:(6 7 6,<) Decision nodes: expand only most promising actions & = &@AB&C < " $, & + D E FG ,< and = $ = "($, & ) Resulting algorithm: Monte Carlo Tree Search CS885 Spring 2018 Pascal Poupart 17
18 Computer Go Deep RL Monte Carlo Tree Search Oct 2015: CS885 Spring 2018 Pascal Poupart 18
19 Monte Carlo Tree Search (with upper confidence bound) UCT(! " ) create root #$%& " with state!/0/& #$%& "! " while within computational budget do #$%& < =>&&?$@ABC(#$%& " ) F0@G& H&I0G@/?$@ABC!/0/& #$%& < J0BKGL(#$%& <, F0@G&) return 0B/A$#(J&!/OhA@% #$%& ", 0 ) TreePolicy(#$%&) while #$%& is nonterminal do if #$%& is not fully expanded do return UVL0#%(#$%&) else #$%& J&!/OhA@%(#$%&, O) return #$%& CS885 Spring 2018 Pascal Poupart 19
20 Monte Carlo Tree Search (continued) Expand(!"#$) choose % untried actions of '()*%*$!"#$ ) add a new child!"#$ to!"#$ with )*%*$!"#$ 9 ;()*%*$!"#$, %) return!"#$ deterministic transition BestChild(!"#$,?) return arg max M CDEF G HIJKELFC CDEF!"#$9 +? O PQ C CDEF C(CDEF G ) DefaultPolicy(!"#$) while!"#$ is not terminal do sample % ~ U % )*%*$!"#$ ) ;()*%*$!"#$, %) return V(), %) CS885 Spring 2018 Pascal Poupart 20
21 Monte Carlo Tree Search (continued) Single Player Backup(!"#$,%&'($) while!"#$ is not null do 7 789: ; 789: <=>?@: 5!"#$ 7 789: <A!!"#$!!"#$ + 1!"#$ D&E$!F(!"#$) Two Players (adversarial) BackupMinMax(!"#$,%&'($) while!"#$ is not null do 7 789: ; 789: <=>?@: 5!"#$ 7 789: <A!!"#$!!"#$ + 1 %&'($ %&'($!"#$ D&E$!F(!"#$) CS885 Spring 2018 Pascal Poupart 21
22 AlphaGo Four steps: 1. Supervised Learning of Policy Networks 2. Policy gradient with Policy Networks 3. Value gradient with Value Networks 4. Searching with Policy and Value Networks Monte Carlo Tree Search variant CS885 Spring 2018 Pascal Poupart 22
23 Search Tree At each edge store!(#, %), '(% #), )(#, %) Where ) #, % is the visit count of (#, %) Sample trajectory CS885 Spring 2018 Pascal Poupart 23
24 Simulation At each node, select edge! that maximizes! =!$%&!' ( ) *,! + -(*,!) where - *,! 1(( 3) is an exploration bonus 456 3,( ) *,! = 4 6 3,( *,! [;< = * + 1 ;? 8 ] 1 BC *,! D!* EB*BFGH!F BFG$!FBIJ B 1 8 *,! = A 0 IFhG$DB*G CS885 Spring 2018 Pascal Poupart 24
Neural Networks and Tree Search
Mastering the Game of Go With Deep Neural Networks and Tree Search Nabiha Asghar 27 th May 2016 AlphaGo by Google DeepMind Go: ancient Chinese board game. Simple rules, but far more complicated than Chess
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient I Used Materials Disclaimer: Much of the material and slides for this lecture
More informationTopics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 12: Deep Reinforcement Learning
Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 12: Deep Reinforcement Learning Types of Learning Supervised training Learning from the teacher Training data includes
More informationA Brief Introduction to Reinforcement Learning
A Brief Introduction to Reinforcement Learning Minlie Huang ( ) Dept. of Computer Science, Tsinghua University aihuang@tsinghua.edu.cn 1 http://coai.cs.tsinghua.edu.cn/hml Reinforcement Learning Agent
More informationReinforcement Learning (INF11010) Lecture 6: Dynamic Programming for Reinforcement Learning (extended)
Reinforcement Learning (INF11010) Lecture 6: Dynamic Programming for Reinforcement Learning (extended) Pavlos Andreadis, February 2 nd 2018 1 Markov Decision Processes A finite Markov Decision Process
More informationA Series of Lectures on Approximate Dynamic Programming
A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)
More informationDeep Reinforcement Learning
Deep Reinforcement Learning 1 Outline 1. Overview of Reinforcement Learning 2. Policy Search 3. Policy Gradient and Gradient Estimators 4. Q-prop: Sample Efficient Policy Gradient and an Off-policy Critic
More informationClick to edit Master title style Approximate Models for Batch RL Click to edit Master subtitle style Emma Brunskill 2/18/15 2/18/15 1 1
Approximate Click to edit Master titlemodels style for Batch RL Click to edit Emma Master Brunskill subtitle style 11 FVI / FQI PI Approximate model planners Policy Iteration maintains both an explicit
More informationMarkov Decision Processes and Reinforcement Learning
Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence
More informationValue Iteration. Reinforcement Learning: Introduction to Machine Learning. Matt Gormley Lecture 23 Apr. 10, 2019
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Reinforcement Learning: Value Iteration Matt Gormley Lecture 23 Apr. 10, 2019 1
More informationApplications of Reinforcement Learning. Ist künstliche Intelligenz gefährlich?
Applications of Reinforcement Learning Ist künstliche Intelligenz gefährlich? Table of contents Playing Atari with Deep Reinforcement Learning Playing Super Mario World Stanford University Autonomous Helicopter
More informationSlides credited from Dr. David Silver & Hung-Yi Lee
Slides credited from Dr. David Silver & Hung-Yi Lee Review Reinforcement Learning 2 Reinforcement Learning RL is a general purpose framework for decision making RL is for an agent with the capacity to
More informationCME 213 SPRING Eric Darve
CME 213 SPRING 2017 Eric Darve MPI SUMMARY Point-to-point and collective communications Process mapping: across nodes and within a node (socket, NUMA domain, core, hardware thread) MPI buffers and deadlocks
More informationReinforcement Learning and Optimal Control. ASU, CSE 691, Winter 2019
Reinforcement Learning and Optimal Control ASU, CSE 691, Winter 2019 Dimitri P. Bertsekas dimitrib@mit.edu Lecture 1 Bertsekas Reinforcement Learning 1 / 21 Outline 1 Introduction, History, General Concepts
More informationMonte Carlo Tree Search PAH 2015
Monte Carlo Tree Search PAH 2015 MCTS animation and RAVE slides by Michèle Sebag and Romaric Gaudel Markov Decision Processes (MDPs) main formal model Π = S, A, D, T, R states finite set of states of the
More informationMore Realistic Adversarial Settings. Virginia Tech CS5804 Introduction to Artificial Intelligence Spring 2015
More Realistic Adversarial Settings Virginia Tech CS5804 Introduction to Artificial Intelligence Spring 2015 Review Minimax search How to adjust for more than two agents, for non-zero-sum Analysis very
More informationGradient Reinforcement Learning of POMDP Policy Graphs
1 Gradient Reinforcement Learning of POMDP Policy Graphs Douglas Aberdeen Research School of Information Science and Engineering Australian National University Jonathan Baxter WhizBang! Labs July 23, 2001
More informationIntroduction to Reinforcement Learning. J. Zico Kolter Carnegie Mellon University
Introduction to Reinforcement Learning J. Zico Kolter Carnegie Mellon University 1 Agent interaction with environment Agent State s Reward r Action a Environment 2 Of course, an oversimplification 3 Review:
More informationReinforcement Learning: A brief introduction. Mihaela van der Schaar
Reinforcement Learning: A brief introduction Mihaela van der Schaar Outline Optimal Decisions & Optimal Forecasts Markov Decision Processes (MDPs) States, actions, rewards and value functions Dynamic Programming
More informationReinforcement Learning (2)
Reinforcement Learning (2) Bruno Bouzy 1 october 2013 This document is the second part of the «Reinforcement Learning» chapter of the «Agent oriented learning» teaching unit of the Master MI computer course.
More informationGradient Descent. 1) S! initial state 2) Repeat: Similar to: - hill climbing with h - gradient descent over continuous space
Local Search 1 Local Search Light-memory search method No search tree; only the current state is represented! Only applicable to problems where the path is irrelevant (e.g., 8-queen), unless the path is
More information10703 Deep Reinforcement Learning and Control
10703 Deep Reinforcement Learning and Control Russ Salakhutdinov Machine Learning Department rsalakhu@cs.cmu.edu Policy Gradient II Used Materials Disclaimer: Much of the material and slides for this lecture
More informationMonte Carlo Tree Search
Monte Carlo Tree Search Branislav Bošanský PAH/PUI 2016/2017 MDPs Using Monte Carlo Methods Monte Carlo Simulation: a technique that can be used to solve a mathematical or statistical problem using repeated
More informationTwo-player Games ZUI 2016/2017
Two-player Games ZUI 2016/2017 Branislav Bošanský bosansky@fel.cvut.cz Two Player Games Important test environment for AI algorithms Benchmark of AI Chinook (1994/96) world champion in checkers Deep Blue
More information6.231 DYNAMIC PROGRAMMING LECTURE 15 LECTURE OUTLINE
6.231 DYNAMIC PROGRAMMING LECTURE 15 LECTURE OUTLINE Rollout algorithms Cost improvement property Discrete deterministic problems Sequential consistency and greedy algorithms Sequential improvement ROLLOUT
More informationPlanning and Control: Markov Decision Processes
CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What
More informationMarkov Decision Processes. (Slides from Mausam)
Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology
More information44.1 Introduction Introduction. Foundations of Artificial Intelligence Monte-Carlo Methods Sparse Sampling 44.4 MCTS. 44.
Foundations of Artificial ntelligence May 27, 206 44. : ntroduction Foundations of Artificial ntelligence 44. : ntroduction Thomas Keller Universität Basel May 27, 206 44. ntroduction 44.2 Monte-Carlo
More informationFoundations of Artificial Intelligence
Foundations of Artificial Intelligence 45. AlphaGo and Outlook Malte Helmert and Gabriele Röger University of Basel May 22, 2017 Board Games: Overview chapter overview: 40. Introduction and State of the
More informationSum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015
Sum-Product Networks STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015 Introduction Outline What is a Sum-Product Network? Inference Applications In more depth
More informationArtificial Intelligence. Game trees. Two-player zero-sum game. Goals for the lecture. Blai Bonet
Artificial Intelligence Blai Bonet Game trees Universidad Simón Boĺıvar, Caracas, Venezuela Goals for the lecture Two-player zero-sum game Two-player game with deterministic actions, complete information
More informationAccelerating Reinforcement Learning in Engineering Systems
Accelerating Reinforcement Learning in Engineering Systems Tham Chen Khong with contributions from Zhou Chongyu and Le Van Duc Department of Electrical & Computer Engineering National University of Singapore
More informationADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL
ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY BHARAT SIGINAM IN
More informationNeural Episodic Control. Alexander pritzel et al (presented by Zura Isakadze)
Neural Episodic Control Alexander pritzel et al. 2017 (presented by Zura Isakadze) Reinforcement Learning Image from reinforce.io RL Example - Atari Games Observed States Images. Internal state - RAM.
More informationBandit-based Search for Constraint Programming
Bandit-based Search for Constraint Programming Manuel Loth 1,2,4, Michèle Sebag 2,4,1, Youssef Hamadi 3,1, Marc Schoenauer 4,2,1, Christian Schulte 5 1 Microsoft-INRIA joint centre 2 LRI, Univ. Paris-Sud
More informationGeneralized Inverse Reinforcement Learning
Generalized Inverse Reinforcement Learning James MacGlashan Cogitai, Inc. james@cogitai.com Michael L. Littman mlittman@cs.brown.edu Nakul Gopalan ngopalan@cs.brown.edu Amy Greenwald amy@cs.brown.edu Abstract
More informationWhen Network Embedding meets Reinforcement Learning?
When Network Embedding meets Reinforcement Learning? ---Learning Combinatorial Optimization Problems over Graphs Changjun Fan 1 1. An Introduction to (Deep) Reinforcement Learning 2. How to combine NE
More informationLecture 13: Learning from Demonstration
CS 294-5 Algorithmic Human-Robot Interaction Fall 206 Lecture 3: Learning from Demonstration Scribes: Samee Ibraheem and Malayandi Palaniappan - Adapted from Notes by Avi Singh and Sammy Staszak 3. Introduction
More informationHeuristic (Informed) Search
Heuristic (Informed) Search (Where we try to choose smartly) R&N: Chap., Sect..1 3 1 Search Algorithm #2 SEARCH#2 1. INSERT(initial-node,Open-List) 2. Repeat: a. If empty(open-list) then return failure
More informationCS Deep Reinforcement Learning HW4: Model-Based RL due October 18th, 11:59 pm
CS294-112 Deep Reinforcement Learning HW4: Model-Based RL due October 18th, 11:59 pm 1 Introduction The goal of this assignment is to get experience with model-learning in the context of RL, and to use
More informationUniversality in a Large Ensemble of Geometries
Universality in a Large Ensemble of Geometries Jim Halverson Northeastern University String Phenomenology 2017 Based on work with Cody Long and Ben Sung. Some Advertisements Cody Long s Talk: Weak Coupling
More informationCSE 203A: Randomized Algorithms
CSE 203A: Randomized Algorithms (00:30) The Minimax Principle (Ch 2) Lecture on 10/18/2017 by Daniel Kane Notes by Peter Greer-Berezovsky Game Tree Evaluation (2.1) - n round deterministic 2-player game
More informationCS 188: Artificial Intelligence. Recap Search I
CS 188: Artificial Intelligence Review of Search, CSPs, Games DISCLAIMER: It is insufficient to simply study these slides, they are merely meant as a quick refresher of the high-level ideas covered. You
More informationPartially Observable Markov Decision Processes. Silvia Cruciani João Carvalho
Partially Observable Markov Decision Processes Silvia Cruciani João Carvalho MDP A reminder: is a set of states is a set of actions is the state transition function. is the probability of ending in state
More informationApprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang
Apprenticeship Learning for Reinforcement Learning with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Table of Contents Introduction Theory Autonomous helicopter control
More informationPractical Tutorial on Using Reinforcement Learning Algorithms for Continuous Control
Practical Tutorial on Using Reinforcement Learning Algorithms for Continuous Control Reinforcement Learning Summer School 2017 Peter Henderson Riashat Islam 3rd July 2017 Deep Reinforcement Learning Classical
More informationAlgorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10
Algorithms for Solving RL: Temporal Difference Learning (TD) 1 Reinforcement Learning Lecture 10 Gillian Hayes 8th February 2007 Incremental Monte Carlo Algorithm TD Prediction TD vs MC vs DP TD for control:
More informationOutline. CS38 Introduction to Algorithms. Approximation Algorithms. Optimization Problems. Set Cover. Set cover 5/29/2014. coping with intractibility
Outline CS38 Introduction to Algorithms Lecture 18 May 29, 2014 coping with intractibility approximation algorithms set cover TSP center selection randomness in algorithms May 29, 2014 CS38 Lecture 18
More informationKnowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay
Knowledge Transfer for Deep Reinforcement Learning with Hierarchical Experience Replay Haiyan (Helena) Yin, Sinno Jialin Pan School of Computer Science and Engineering Nanyang Technological University
More informationUnsupervised Learning: Clustering
Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning
More informationCS 188: Artificial Intelligence Spring Recap Search I
CS 188: Artificial Intelligence Spring 2011 Midterm Review 3/14/2011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the
More informationCMU-Q Lecture 2: Search problems Uninformed search. Teacher: Gianni A. Di Caro
CMU-Q 15-381 Lecture 2: Search problems Uninformed search Teacher: Gianni A. Di Caro RECAP: ACT RATIONALLY Think like people Think rationally Agent Sensors? Actuators Percepts Actions Environment Act like
More informationRecap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems.
CS 188: Artificial Intelligence Spring 011 Midterm Review 3/14/011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the
More informationEfficient Learning in Linearly Solvable MDP Models A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA.
Efficient Learning in Linearly Solvable MDP Models A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Ang Li IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE
More informationLecture 18: Video Streaming
MIT 6.829: Computer Networks Fall 2017 Lecture 18: Video Streaming Scribe: Zhihong Luo, Francesco Tonolini 1 Overview This lecture is on a specific networking application: video streaming. In particular,
More informationSelf Programming Networks
Self Programming Networks Is it possible for to Learn the control planes of networks and applications? Operators specify what they want, and the system learns how to deliver CAN WE LEARN THE CONTROL PLANE
More informationMaster Thesis. Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces
Master Thesis Simulation Based Planning for Partially Observable Markov Decision Processes with Continuous Observation Spaces Andreas ten Pas Master Thesis DKE 09-16 Thesis submitted in partial fulfillment
More informationAlternatives to Direct Supervision
CreativeAI: Deep Learning for Graphics Alternatives to Direct Supervision Niloy Mitra Iasonas Kokkinos Paul Guerrero Nils Thuerey Tobias Ritschel UCL UCL UCL TUM UCL Timetable Theory and Basics State of
More informationarxiv: v1 [cs.cv] 2 Sep 2018
Natural Language Person Search Using Deep Reinforcement Learning Ankit Shah Language Technologies Institute Carnegie Mellon University aps1@andrew.cmu.edu Tyler Vuong Electrical and Computer Engineering
More informationGo in numbers 3,000. Years Old
Go in numbers 3,000 Years Old 40M Players 10^170 Positions The Rules of Go Capture Territory Why is Go hard for computers to play? Brute force search intractable: 1. Search space is huge 2. Impossible
More informationA fast point-based algorithm for POMDPs
A fast point-based algorithm for POMDPs Nikos lassis Matthijs T. J. Spaan Informatics Institute, Faculty of Science, University of Amsterdam Kruislaan 43, 198 SJ Amsterdam, The Netherlands {vlassis,mtjspaan}@science.uva.nl
More informationProbabilistic Belief. Adversarial Search. Heuristic Search. Planning. Probabilistic Reasoning. CSPs. Learning CS121
CS121 Heuristic Search Planning CSPs Adversarial Search Probabilistic Reasoning Probabilistic Belief Learning Heuristic Search First, you need to formulate your situation as a Search Problem What is a
More informationAutoencoder. Representation learning (related to dictionary learning) Both the input and the output are x
Deep Learning 4 Autoencoder, Attention (spatial transformer), Multi-modal learning, Neural Turing Machine, Memory Networks, Generative Adversarial Net Jian Li IIIS, Tsinghua Autoencoder Autoencoder Unsupervised
More informationLecture 19: Generative Adversarial Networks
Lecture 19: Generative Adversarial Networks Roger Grosse 1 Introduction Generative modeling is a type of machine learning where the aim is to model the distribution that a given set of data (e.g. images,
More informationMarco Wiering Intelligent Systems Group Utrecht University
Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment
More informationNeuro-Dynamic Programming An Overview
1 Neuro-Dynamic Programming An Overview Dimitri Bertsekas Dept. of Electrical Engineering and Computer Science M.I.T. May 2006 2 BELLMAN AND THE DUAL CURSES Dynamic Programming (DP) is very broadly applicable,
More informationGPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING M. Babaeizadeh,, I.Frosio, S.Tyree, J. Clemons, J.Kautz University of Illinois at Urbana-Champaign, USA NVIDIA, USA An ICLR 2017 paper A github project GPU-BASED
More informationAdvanced A* Improvements
Advanced A* Improvements 1 Iterative Deepening A* (IDA*) Idea: Reduce memory requirement of A* by applying cutoff on values of f Consistent heuristic function h Algorithm IDA*: 1. Initialize cutoff to
More informationCS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation
CS 687 Jana Kosecka Reinforcement Learning Continuous State MDP s Value Function approximation Markov Decision Process - Review Formal definition 4-tuple (S, A, T, R) Set of states S - finite Set of actions
More informationPlanning and Reinforcement Learning through Approximate Inference and Aggregate Simulation
Planning and Reinforcement Learning through Approximate Inference and Aggregate Simulation Hao Cui Department of Computer Science Tufts University Medford, MA 02155, USA hao.cui@tufts.edu Roni Khardon
More informationGUNREAL: GPU-accelerated UNsupervised REinforcement and Auxiliary Learning
GUNREAL: GPU-accelerated UNsupervised REinforcement and Auxiliary Learning Koichi Shirahata, Youri Coppens, Takuya Fukagai, Yasumoto Tomita, and Atsushi Ike FUJITSU LABORATORIES LTD. March 27, 2018 0 Deep
More informationTwo-player Games ZUI 2012/2013
Two-player Games ZUI 2012/2013 Game-tree Search / Adversarial Search until now only the searching player acts in the environment there could be others: Nature stochastic environment (MDP, POMDP, ) other
More informationMonte Carlo Hierarchical Model Learning
Monte Carlo Hierarchical Model Learning Jacob Menashe and Peter Stone The University of Texas at Austin Austin, Texas {jmenashe,pstone}@cs.utexas.edu ABSTRACT Reinforcement learning (RL) is a well-established
More informationCS 4700: Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Fall 2017 Instructor: Prof. Haym Hirsh Lecture 8 Today Informed Search (R&N Ch 3,4) Adversarial search (R&N Ch 5) Adversarial Search (R&N Ch 5) Homework
More informationThe exam is closed book, closed calculator, and closed notes except your one-page crib sheet.
CS Summer Introduction to Artificial Intelligence Midterm You have approximately minutes. The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. Mark your answers
More informationHeuristic Search in MDPs 3/5/18
Heuristic Search in MDPs 3/5/18 Thinking about online planning. How can we use ideas we ve already seen to help with online planning? Heuristics? Iterative deepening? Monte Carlo simulations? Other ideas?
More informationSearch : Lecture 2. September 9, 2003
Search 6.825: Lecture 2 September 9, 2003 1 Problem-Solving Problems When your environment can be effectively modeled as having discrete states and actions deterministic, known world dynamics known initial
More informationPascal De Beck-Courcelle. Master in Applied Science. Electrical and Computer Engineering
Study of Multiple Multiagent Reinforcement Learning Algorithms in Grid Games by Pascal De Beck-Courcelle A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of
More information1 Minimum Cut Problem
CS 6 Lecture 6 Min Cut and Karger s Algorithm Scribes: Peng Hui How, Virginia Williams (05) Date: November 7, 07 Anthony Kim (06), Mary Wootters (07) Adapted from Virginia Williams lecture notes Minimum
More informationMarkov Decision Processes (MDPs) (cont.)
Markov Decision Processes (MDPs) (cont.) Machine Learning 070/578 Carlos Guestrin Carnegie Mellon University November 29 th, 2007 Markov Decision Process (MDP) Representation State space: Joint state x
More informationIntroduction to Deep Q-network
Introduction to Deep Q-network Presenter: Yunshu Du CptS 580 Deep Learning 10/10/2016 Deep Q-network (DQN) Deep Q-network (DQN) An artificial agent for general Atari game playing Learn to master 49 different
More information2SAT Andreas Klappenecker
2SAT Andreas Klappenecker The Problem Can we make the following boolean formula true? ( x y) ( y z) (z y)! Terminology A boolean variable is a variable that can be assigned the values true (T) or false
More informationClustering. k-mean clustering. Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein
Clustering k-mean clustering Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein A quick review The clustering problem: homogeneity vs. separation Different representations
More information19: Inference and learning in Deep Learning
10-708: Probabilistic Graphical Models 10-708, Spring 2017 19: Inference and learning in Deep Learning Lecturer: Zhiting Hu Scribes: Akash Umakantha, Ryan Williamson 1 Classes of Deep Generative Models
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Bayesian Networks Directed Acyclic Graph (DAG) Bayesian Networks General Factorization Bayesian Curve Fitting (1) Polynomial Bayesian
More informationReal-Time Solving of Quantified CSPs Based on Monte-Carlo Game Tree Search
Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Real-Time Solving of Quantified CSPs Based on Monte-Carlo Game Tree Search Baba Satomi, Yongjoon Joe, Atsushi
More informationPart II. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS
Part II C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Converting Directed to Undirected Graphs (1) Converting Directed to Undirected Graphs (2) Add extra links between
More informationMission Command Analysis Using Monte Carlo Tree Search
TRAC-M-TR-13-050 14 June 2013 Mission Command Analysis Using Monte Carlo Tree Search TRADOC Analysis Center - Monterey 700 Dyer Road Monterey, California 93943-0692 This study cost the Department of Defense
More informationˆ The exam is closed book, closed calculator, and closed notes except your one-page crib sheet.
CS Summer Introduction to Artificial Intelligence Midterm ˆ You have approximately minutes. ˆ The exam is closed book, closed calculator, and closed notes except your one-page crib sheet. ˆ Mark your answers
More informationReddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011
Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions
More informationIn Homework 1, you determined the inverse dynamics model of the spinbot robot to be
Robot Learning Winter Semester 22/3, Homework 2 Prof. Dr. J. Peters, M.Eng. O. Kroemer, M. Sc. H. van Hoof Due date: Wed 6 Jan. 23 Note: Please fill in the solution on this sheet but add sheets for the
More informationUnsupervised Learning
Deep Learning for Graphics Unsupervised Learning Niloy Mitra Iasonas Kokkinos Paul Guerrero Vladimir Kim Kostas Rematas Tobias Ritschel UCL UCL/Facebook UCL Adobe Research U Washington UCL Timetable Niloy
More informationIntroduction to Fall 2014 Artificial Intelligence Midterm Solutions
CS Introduction to Fall Artificial Intelligence Midterm Solutions INSTRUCTIONS You have minutes. The exam is closed book, closed notes except a one-page crib sheet. Please use non-programmable calculators
More informationMachine Learning Techniques at the core of AlphaGo success
Machine Learning Techniques at the core of AlphaGo success Stéphane Sénécal Orange Labs stephane.senecal@orange.com Paris Machine Learning Applications Group Meetup, 14/09/2016 1 / 42 Some facts... (1/3)
More informationProgramming Reinforcement Learning in Jason
Programming Reinforcement Learning in Jason Amelia Bădică 1, Costin Bădică 1, Mirjana Ivanović 2 1 University of Craiova, Romania 2 University of Novi Sad, Serbia Talk Outline Introduction, Motivation
More informationSimulated Annealing. Slides based on lecture by Van Larhoven
Simulated Annealing Slides based on lecture by Van Larhoven Iterative Improvement 1 General method to solve combinatorial optimization problems Principle: Start with initial configuration Repeatedly search
More information0.1. alpha(n): learning rate
Least-Squares Temporal Dierence Learning Justin A. Boyan NASA Ames Research Center Moett Field, CA 94035 jboyan@arc.nasa.gov Abstract TD() is a popular family of algorithms for approximate policy evaluation
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2016 Lecturer: Trevor Cohn 21. Independence in PGMs; Example PGMs Independence PGMs encode assumption of statistical independence between variables. Critical
More informationMIND: Machine Learning based Network Dynamics. Dr. Yanhui Geng Huawei Noah s Ark Lab, Hong Kong
MIND: Machine Learning based Network Dynamics Dr. Yanhui Geng Huawei Noah s Ark Lab, Hong Kong Outline Challenges with traditional SDN MIND architecture Experiment results Conclusion Challenges with Traditional
More information3.6.2 Generating admissible heuristics from relaxed problems
3.6.2 Generating admissible heuristics from relaxed problems To come up with heuristic functions one can study relaxed problems from which some restrictions of the original problem have been removed The
More information