Function Approximation. Pieter Abbeel UC Berkeley EECS

Size: px
Start display at page:

Download "Function Approximation. Pieter Abbeel UC Berkeley EECS"

Transcription

1 Function Approximation Pieter Abbeel UC Berkeley EECS

2 Outline Value iteration with function approximation Linear programming with function approximation

3 Value Iteration Algorithm: Start with for all s. For i=1,, H For all states s 2 S: Impractical for large state spaces = the expected sum of rewards accumulated when starting from state s and acting optimally for a horizon of i steps = the optimal action when in state s and getting to act for a horizon of i steps

4 Example: tetris state: board configuration + shape of the falling piece ~2 200 states! action: rotation and translation applied to the falling piece 22 features aka basis functions Á i Ten basis functions, 0,..., 9, mapping the state to the height h[k] of each of the ten columns. Nine basis functions, 10,..., 18, each mapping the state to the absolute difference between heights of successive columns: h[k+1] h[k], k = 1,..., 9. One basis function, 19, that maps state to the maximum column height: max k h [k] One basis function, 20, that maps state to the number of holes in the board. One basis function, 21, that is equal to 1 in every state. ˆV θ (s) = 21 i=0 θ i φ i (s) =(s) [Bertsekas & Ioffe, 1996 (TD); Bertsekas & Tsitsiklis 1996 (TD);Kakade 2002 (policy gradient); Farias & Van Roy, 2006 (approximate LP)]

5 Function Approximation V(s) = + distance to closest ghost + θ 2 distance to closest power pellet + θ 3 in dead-end + θ 4 closer to power pellet than ghost is + = θ 0 θ 1 n θ i φ i (s) =(s) i=0

6 Function Approximation 0 th order approximation (1-nearest neighbor):. s x1 x2 x3 x x5 x6 x7 x8.... x9 x10 x11 x12 φ(s) = ˆV (s) = ˆV (x4) = θ 4 ˆV (s) =(s) Only store values for x1, x2,, x12 call these values θ 1, θ 2,...,θ 12 Assign other states value of nearest x state

7 Function Approximation 1 th order approximation (k-nearest neighbor interpolation): ˆV (s) =φ 1 (s)θ 1 + φ 2 (s)θ 2 + φ 5 (s)θ 5 + φ 6 (s)θ x1. x2 x3 x4 s.... x5 x6 x7 x8.... x9 x10 x11 x12 φ(s) = ˆV (s) =(s) Only store values for x1, x2,, x12 call these values θ 1, θ 2,...,θ 12 Assign other states interpolated value of nearest 4 x states

8 Function Approximation Examples: S = R, ˆV (s) =θ1 + θ 2 s S = R, ˆV (s) =θ1 + θ 2 s + θ 3 s 2 S = R, ˆV (s) = n i=0 θ i s i S, ˆV (s) = log( exp((s)) )

9 Function Approximation Main idea: Use approximation of the true value function, θ ˆV θ is a free parameter to be chosen from its domain V Θ S Representation size: à downto: Θ + : less parameters to estimate - : less expressiveness, typically there exist many V for which there is no θ such that ˆVθ = V

10 Supervised Learning Given: set of examples (s (1),V(s (1) )),,(s (2),V(s (2) )),...,(s (m),v(s (m) )) Asked for: best ˆV θ Representative approach: find through least squares: min θ Θ m ( ˆV θ (s (i) ) V (s (i) )) 2 i=1 θ

11 Supervised Learning Example Linear regression Observation Error or residual Prediction min θ 0,θ 1 n (θ 0 + θ 1 x (i) y (i) ) 2 i=

12 Overfitting To avoid overfitting: reduce number of features used Practical approach: leave-out validation Perform fitting for different choices of feature sets using just 70% of the data Pick feature set that led to highest quality of fit on the remaining 30% of data

13 Status Function approximation through supervised learning BUT: where do the supervised examples come from?

14 Value Iteration with Function Approximation Pick some (typically ) Initialize by choosing some setting for Iterate for i = 0, 1, 2,, H: Step 1: Bellman back-ups s S : S S Vi+1 (s) max a s T (s, a, s ) S << S θ (0) R(s, a, s )+γ ˆV θ (i)(s ) Step 2: Supervised learning find θ (i+1) as the solution of: min ˆVθ (i+1)(s) V 2 i+1 (s) θ s S

15 Value Iteration with Function Approximation --- Example Mini-tetris: two types of blocks, can only choose translation (not rotation) Example state: Reward = 1 for placing a block Sink state / Game over is reached when block is placed such that part of it extends above the red rectangle If you have a complete row, it gets cleared

16 Value Iteration with Function Approximation --- Example S = {,,, }

17 Value Iteration with Function Approximation --- Example S = {,,, } 10 features aka basis functions Á i Four basis functions, 0,..., 3, mapping the state to the height h[k] of each of the four columns. Three basis functions, 4,..., 6, each mapping the state to the absolute difference between heights of successive columns: h[k+1] h[k], k = 1,..., 3. One basis function, 7, that maps state to the maximum column height: max k h[k] One basis function, 8, that maps state to the number of holes in the board. One basis function, 9, that is equal to 1 in every state. Init \theta^{(0)} = ( -1, -1, -1, -1, -2, -2, -2, -3, -2, 10)

18 Value Iteration with Function Approximation --- Example Bellman back-ups for the states in S : V( ) = max {0.5 *(1+ V( ))+0.5*(1 + V( ) ), 0.5 *(1+ V( ))+0.5*(1 + V( ) ), 0.5 *(1+ V( ))+0.5*(1 + V( ) ), 0.5 *(1+ V( ))+0.5*(1 + V( ) ),

19 Value Iteration with Function Approximation --- Example Bellman back-ups for the states in S : V( ) = max {0.5 *(1+ V( ))+0.5*(1 + V( ) ), 0.5 *(1+ V( ))+0.5*(1 + V( ) ), 0.5 *(1+ V( ))+0.5*(1 + V( ) ), 0.5 *(1+ V( ))+0.5*(1 + V( ) ),

20 Value Iteration with Function Approximation --- Example S = {,,, } 10 features aka basis functions Á i Four basis functions, 0,..., 3, mapping the state to the height h[k] of each of the four columns. Three basis functions, 4,..., 6, each mapping the state to the absolute difference between heights of successive columns: h[k+1] h[k], k = 1,..., 3. One basis function, 7, that maps state to the maximum column height: max k h[k] One basis function, 8, that maps state to the number of holes in the board. One basis function, 9, that is equal to 1 in every state. Init θ (0) =( 1, 1, 1, 1, 2, 2, 2, 3, 2, 20)

21 Value Iteration with Function Approximation --- Example Bellman back-ups for the states in S : V( )=max {0.5 *(1+ ( ))+0.5*(1 + ( ) ), (6,2,4,0, 4, 2, 4, 6, 0, 1) (6,2,4,0, 4, 2, 4, 6, 0, 1) 0.5 *(1+ ( ))+0.5*(1 + ( ) ), (2,6,4,0, 4, 2, 4, 6, 0, 1) (2,6,4,0, 4, 2, 4, 6, 0, 1) 0.5 *(1+ ( ))+0.5*(1 + ( ) ), (sink-state, V=0) (sink-state, V=0) 0.5 *(1+ ( ))+0.5*(1 + ( ) )} (0,0,2,2, 0,2,0, 2, 0, 1) (0,0,2,2, 0,2,0, 2, 0, 1)

22 Value Iteration with Function Approximation --- Example Bellman back-ups for the states in S : V( )=max {0.5 *( )+0.5*( ), 0.5 *( )+0.5*( ), 0.5 *(1+ 0 )+0.5*(1 + 0 ), 0.5 *(1+ 6 )+0.5*(1 + 6 ), = 6.4 (for = 0.9)

23 Value Iteration with Function Approximation --- Example Bellman back-ups for the second state in S : θ (0) =( 1, 1, 1, 1, 2, 2, 2, 3, 2, 20) V( )=max {0.5 *(1+ ( ))+0.5*(1 + ( ) ), (sink-state, V=0) (sink-state, V=0) 0.5 *(1+ ( ))+0.5*(1 + ( ) ), (sink-state, V=0) (sink-state, V=0) 0.5 *(1+ ( ))+0.5*(1 + ( ) ), (sink-state, V=0) (sink-state, V=0) 0.5 *(1+ ( ))+0.5*(1 + ( ) )} = 19 (0,0,0,0, 0,0,0, 0, 0, 1) (0,0,0,0, 0,0,0, 0, 0, 1) -> V = 20 -> V = 20

24 Value Iteration with Function Approximation --- Example Bellman back-ups for the third state in S : θ (0) =( 1, 1, 1, 1, 2, 2, 2, 3, 2, 20) V( )=max {0.5 *(1+ ( ))+0.5*(1 + ( ) ), (4,4,0,0, 0,4,0, 4, 0, 1) (4,4,0,0, 0,4,0, 4, 0, 1) -> V = -8 -> V = *(1+ ( ))+0.5*(1 + ( ) ), (2,4,4,0, 2,0,4, 4, 0, 1) (2,4,4,0, 2,0,4, 4, 0, 1) -> V = -14 -> V = *(1+ ( ))+0.5*(1 + ( ) ), (0,0,0,0, 0,0,0, 0, 0, 1) (0,0,0,0, 0,0,0, 0, 0, 1) -> V = 20 -> V = 20 = 19

25 Value Iteration with Function Approximation --- Example Bellman back-ups for the fourth state in S : θ (0) =( 1, 1, 1, 1, 2, 2, 2, 3, 2, 20) V( )=max {0.5 *(1+ ( ))+0.5*(1 + ( ) ), (6,6,4,0, 0,2,4, 6, 4, 1) (6,6,4,0, 0,2,4, 6, 4, 1) -> V = -34 -> V = *(1+ ( ))+0.5*(1 + ( ) ), (4,6,6,0, 2,0,6, 6, 4, 1) (4,6.6,0, 2,0,6, 6, 4, 1) -> V = -38 -> V = *(1+ ( ))+0.5*(1 + ( ) ), (4,0,6,6, 4,6,0, 6, 4, 1) (4,0,6,6, 4,6,0, 6, 4, 1) -> V = -42 -> V = -42 = -29.6

26 Value Iteration with Function Approximation --- Example After running the Bellman back-ups for all 4 states in S we have: We now run supervised learning on these 4 examples to find a new µ: V( )= 6.4 (2,2,4,0, 0,2,4, 4, 0, 1) V( )= 19 (4,4,4,0, 0,0,4, 4, 0, 1) V( )= 19 (2,2,0,0, 0,2,0, 2, 0, 1) V( )= (4,0,4,0, 4,4,4, 4, 0, 1) min θ (6.4 ( )) 2 +(19 ( )) 2 +(19 ( )) 2 +(( 29.6) ( )) 2 à Running least squares gives new µ θ (1) =(0.195, 6.24, 2.11, 0, 6.05, 0.13, 2.11, 2.13, 0, 1.59)

27 Potential guarantees?

28 Simple example** µ 2µ Function approximator: [1 2] * µ

29 Simple example**

30 Composing operators** Definition. An operator G is a non-expansion with respect to a norm. if Fact. If the operator F is a contraction with respect to a norm. and the operator G is a non-expansion with respect to the same norm, then the sequential application of the operators G and F is a -contraction, i.e., Corollary. If the supervised learning step is a nonexpansion, then iteration in value iteration with function approximation is a -contraction, and in this case we have a convergence guarantee.

31 Averager function approximators are non-expansions** Examples: nearest neighbor (aka state aggregation) linear interpolation over triangles (tetrahedrons, )

32 Averager function approximators are non-expansions**

33 Linear regression L ** [Example taken from Gordon, 1995.]

34 Guarantees for fixed point** I.e., if we pick a non-expansion function approximator which can approximate J* well, then we obtain a good value function estimate. To apply to discretization: use continuity assumptions to show that J* can be approximated well by chosen discretization scheme

35 Outline Value iteration with function approximation Linear programming with function approximation

36 Infinite Horizon Linear Program min V µ 0 (s)v (s) s S s.t. V(s) s T (s, a, s )[R(s, a, s )+γv (s )], s S, a A µ 0 is a probability distribution over S, with µ 0 (s)> 0 for all s 2 S. Theorem. V * is the solution to the above LP.

37 Infinite Horizon Linear Program min V µ 0 (s)v (s) s S s.t. V(s) s T (s, a, s )[R(s, a, s )+γv (s )], s S, a A V (s) =(s) Let:, and consider S rather than S: min θ s S µ 0 (s)(s) s.t. (s) s T (s, a, s ) R(s, a, s )+γ(s ), s S,a A à Linear program that finds ˆV θ (s) =(s)

38 Approximate Linear Program Guarantees** min θ s S µ 0 (s)(s) s.t. (s) s T (s, a, s ) R(s, a, s )+γ(s ), s S,a A LP solver will converge Solution quality: [de Farias and Van Roy, 2002] Assuming one of the features is the feature that is equal to one for all states, and assuming S =S we have that: V Φθ 1,µ0 2 1 γ min V Φθ θ (slightly weaker, probabilistic guarantees hold for S not equal to S, these guarantees require size of S to grow as the number of features grows)

Func%on Approxima%on. Pieter Abbeel UC Berkeley EECS

Func%on Approxima%on. Pieter Abbeel UC Berkeley EECS Func%on Approxima%on Pieter Abbeel UC Berkeley EECS Value Itera4on Algorithm: Start with for all s. For i=1,, H For all states s 2 S: Imprac4cal for large state spaces = the expected sum of rewards accumulated

More information

CS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation

CS 687 Jana Kosecka. Reinforcement Learning Continuous State MDP s Value Function approximation CS 687 Jana Kosecka Reinforcement Learning Continuous State MDP s Value Function approximation Markov Decision Process - Review Formal definition 4-tuple (S, A, T, R) Set of states S - finite Set of actions

More information

TD(0) with linear function approximation guarantees

TD(0) with linear function approximation guarantees CS 287: Advanced Robotics Fall 2009 Lecture 15: LSTD, LSPI, RLSTD, imitation learning Pieter Abbeel UC Berkeley EECS TD(0) with linear function approximation guarantees Stochastic approximation of the

More information

LSTD(0) in policy iteration LSTD(0) Page 1. TD(0) with linear function approximation guarantees. TD(0) with linear function approximation guarantees

LSTD(0) in policy iteration LSTD(0) Page 1. TD(0) with linear function approximation guarantees. TD(0) with linear function approximation guarantees TD(0) with linear function approximation guarantees Stochastic approximation of the following operations: Back-up: (T π V)(s)= s T(s,π(s),s )[R(s,π(s),s )+γv(s )] CS 287: Advanced Robotics Fall 2009 Lecture

More information

15-780: MarkovDecisionProcesses

15-780: MarkovDecisionProcesses 15-780: MarkovDecisionProcesses J. Zico Kolter Feburary 29, 2016 1 Outline Introduction Formal definition Value iteration Policy iteration Linear programming for MDPs 2 1988 Judea Pearl publishes Probabilistic

More information

Linear Regression and K-Nearest Neighbors 3/28/18

Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression and K-Nearest Neighbors 3/28/18 Linear Regression Hypothesis Space Supervised learning For every input in the data set, we know the output Regression Outputs are continuous A number,

More information

Planning and Control: Markov Decision Processes

Planning and Control: Markov Decision Processes CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What

More information

A Brief Introduction to Reinforcement Learning

A Brief Introduction to Reinforcement Learning A Brief Introduction to Reinforcement Learning Minlie Huang ( ) Dept. of Computer Science, Tsinghua University aihuang@tsinghua.edu.cn 1 http://coai.cs.tsinghua.edu.cn/hml Reinforcement Learning Agent

More information

Scan Matching. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics

Scan Matching. Pieter Abbeel UC Berkeley EECS. Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Scan Matching Pieter Abbeel UC Berkeley EECS Many slides adapted from Thrun, Burgard and Fox, Probabilistic Robotics Scan Matching Overview Problem statement: Given a scan and a map, or a scan and a scan,

More information

CS 188: Artificial Intelligence Spring Recap Search I

CS 188: Artificial Intelligence Spring Recap Search I CS 188: Artificial Intelligence Spring 2011 Midterm Review 3/14/2011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the

More information

Recap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems.

Recap Search I. CS 188: Artificial Intelligence Spring Recap Search II. Example State Space Graph. Search Problems. CS 188: Artificial Intelligence Spring 011 Midterm Review 3/14/011 Pieter Abbeel UC Berkeley Recap Search I Agents that plan ahead à formalization: Search Search problem: States (configurations of the

More information

Discre'za'on. Pieter Abbeel UC Berkeley EECS

Discre'za'on. Pieter Abbeel UC Berkeley EECS Discre'za'on Pieter Abbeel UC Berkeley EECS Markov Decision Process AssumpCon: agent gets to observe the state [Drawing from Su;on and Barto, Reinforcement Learning: An IntroducCon, 1998] Markov Decision

More information

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.

Announcements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday. CS 188: Artificial Intelligence Spring 2011 Lecture 21: Perceptrons 4/13/2010 Announcements Project 4: due Friday. Final Contest: up and running! Project 5 out! Pieter Abbeel UC Berkeley Many slides adapted

More information

Robust Value Function Approximation Using Bilinear Programming

Robust Value Function Approximation Using Bilinear Programming Robust Value Function Approximation Using Bilinear Programming Marek Petrik Department of Computer Science University of Massachusetts Amherst, MA 01003 petrik@cs.umass.edu Shlomo Zilberstein Department

More information

Q-learning with linear function approximation

Q-learning with linear function approximation Q-learning with linear function approximation Francisco S. Melo and M. Isabel Ribeiro Institute for Systems and Robotics [fmelo,mir]@isr.ist.utl.pt Conference on Learning Theory, COLT 2007 June 14th, 2007

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Kernels and Clustering Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information

Apprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang

Apprenticeship Learning for Reinforcement Learning. with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Apprenticeship Learning for Reinforcement Learning with application to RC helicopter flight Ritwik Anand, Nick Haliday, Audrey Huang Table of Contents Introduction Theory Autonomous helicopter control

More information

Announcements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron

Announcements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running

More information

The Problem of Overfitting with Maximum Likelihood

The Problem of Overfitting with Maximum Likelihood The Problem of Overfitting with Maximum Likelihood In the previous example, continuing training to find the absolute maximum of the likelihood produced overfitted results. The effect is much bigger if

More information

Unsupervised Learning. CS 3793/5233 Artificial Intelligence Unsupervised Learning 1

Unsupervised Learning. CS 3793/5233 Artificial Intelligence Unsupervised Learning 1 Unsupervised CS 3793/5233 Artificial Intelligence Unsupervised 1 EM k-means Procedure Data Random Assignment Assign 1 Assign 2 Soft k-means In clustering, the target feature is not given. Goal: Construct

More information

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric. CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley 1 1 Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

More information

Learning from Data: Adaptive Basis Functions

Learning from Data: Adaptive Basis Functions Learning from Data: Adaptive Basis Functions November 21, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Neural Networks Hidden to output layer - a linear parameter model But adapt the features of the model.

More information

Using Machine Learning to Optimize Storage Systems

Using Machine Learning to Optimize Storage Systems Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation

More information

Algorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10

Algorithms for Solving RL: Temporal Difference Learning (TD) Reinforcement Learning Lecture 10 Algorithms for Solving RL: Temporal Difference Learning (TD) 1 Reinforcement Learning Lecture 10 Gillian Hayes 8th February 2007 Incremental Monte Carlo Algorithm TD Prediction TD vs MC vs DP TD for control:

More information

Classification: Feature Vectors

Classification: Feature Vectors Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12

More information

Markov Decision Processes and Reinforcement Learning

Markov Decision Processes and Reinforcement Learning Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence

More information

Machine Learning. Supervised Learning. Manfred Huber

Machine Learning. Supervised Learning. Manfred Huber Machine Learning Supervised Learning Manfred Huber 2015 1 Supervised Learning Supervised learning is learning where the training data contains the target output of the learning system. Training data D

More information

Clustering Lecture 5: Mixture Model

Clustering Lecture 5: Mixture Model Clustering Lecture 5: Mixture Model Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics

More information

Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule. CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your

More information

The Simplex Algorithm for LP, and an Open Problem

The Simplex Algorithm for LP, and an Open Problem The Simplex Algorithm for LP, and an Open Problem Linear Programming: General Formulation Inputs: real-valued m x n matrix A, and vectors c in R n and b in R m Output: n-dimensional vector x There is one

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description

More information

Convergence of reinforcement learning with general function approximators

Convergence of reinforcement learning with general function approximators Convergence of reinforcement learning with general function approximators Vassilis A-Papavassiliou and Stuart Russell Computer Science Division, U. of California, Berkeley, CA 94720-1776 {vassilis^russell}

More information

CS 188: Artificial Intelligence. Recap Search I

CS 188: Artificial Intelligence. Recap Search I CS 188: Artificial Intelligence Review of Search, CSPs, Games DISCLAIMER: It is insufficient to simply study these slides, they are merely meant as a quick refresher of the high-level ideas covered. You

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu

More information

Unlabeled Data Classification by Support Vector Machines

Unlabeled Data Classification by Support Vector Machines Unlabeled Data Classification by Support Vector Machines Glenn Fung & Olvi L. Mangasarian University of Wisconsin Madison www.cs.wisc.edu/ olvi www.cs.wisc.edu/ gfung The General Problem Given: Points

More information

Markov Decision Processes. (Slides from Mausam)

Markov Decision Processes. (Slides from Mausam) Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology

More information

Introduction to Reinforcement Learning. J. Zico Kolter Carnegie Mellon University

Introduction to Reinforcement Learning. J. Zico Kolter Carnegie Mellon University Introduction to Reinforcement Learning J. Zico Kolter Carnegie Mellon University 1 Agent interaction with environment Agent State s Reward r Action a Environment 2 Of course, an oversimplification 3 Review:

More information

Sampling-Based Motion Planning

Sampling-Based Motion Planning Sampling-Based Motion Planning Pieter Abbeel UC Berkeley EECS Many images from Lavalle, Planning Algorithms Motion Planning Problem Given start state x S, goal state x G Asked for: a sequence of control

More information

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2017 CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2017 Assignment 3: 2 late days to hand in tonight. Admin Assignment 4: Due Friday of next week. Last Time: MAP Estimation MAP

More information

CS489/698: Intro to ML

CS489/698: Intro to ML CS489/698: Intro to ML Lecture 14: Training of Deep NNs Instructor: Sun Sun 1 Outline Activation functions Regularization Gradient-based optimization 2 Examples of activation functions 3 5/28/18 Sun Sun

More information

Robust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson

Robust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson Robust Regression Robust Data Mining Techniques By Boonyakorn Jantaranuson Outline Introduction OLS and important terminology Least Median of Squares (LMedS) M-estimator Penalized least squares What is

More information

Unit #13 : Integration to Find Areas and Volumes, Volumes of Revolution

Unit #13 : Integration to Find Areas and Volumes, Volumes of Revolution Unit #13 : Integration to Find Areas and Volumes, Volumes of Revolution Goals: Beabletoapplyaslicingapproachtoconstructintegralsforareasandvolumes. Be able to visualize surfaces generated by rotating functions

More information

Kernels and Clustering

Kernels and Clustering Kernels and Clustering Robert Platt Northeastern University All slides in this file are adapted from CS188 UC Berkeley Case-Based Learning Non-Separable Data Case-Based Reasoning Classification from similarity

More information

CS 188: Artificial Intelligence

CS 188: Artificial Intelligence CS 188: Artificial Intelligence Constraint Satisfaction Problems II and Local Search Instructors: Sergey Levine and Stuart Russell University of California, Berkeley [These slides were created by Dan Klein

More information

CSE 573: Artificial Intelligence Autumn 2010

CSE 573: Artificial Intelligence Autumn 2010 CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine

More information

Approximate Q-Learning 3/23/18

Approximate Q-Learning 3/23/18 Approximate Q-Learning 3/23/18 On-Policy Learning (SARSA) Instead of updating based on the best action from the next state, update based on the action your current policy actually takes from the next state.

More information

Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

More information

E0005E - Industrial Image Analysis

E0005E - Industrial Image Analysis E0005E - Industrial Image Analysis The Hough Transform Matthew Thurley slides by Johan Carlson 1 This Lecture The Hough transform Detection of lines Detection of other shapes (the generalized Hough transform)

More information

Divide and Conquer Kernel Ridge Regression

Divide and Conquer Kernel Ridge Regression Divide and Conquer Kernel Ridge Regression Yuchen Zhang John Duchi Martin Wainwright University of California, Berkeley COLT 2013 Yuchen Zhang (UC Berkeley) Divide and Conquer KRR COLT 2013 1 / 15 Problem

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University September 20 2018 Review Solution for multiple linear regression can be computed in closed form

More information

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule.

Feature Extractors. CS 188: Artificial Intelligence Fall Some (Vague) Biology. The Binary Perceptron. Binary Decision Rule. CS 188: Artificial Intelligence Fall 2008 Lecture 24: Perceptrons II 11/24/2008 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit

More information

CSE 40171: Artificial Intelligence. Learning from Data: Unsupervised Learning

CSE 40171: Artificial Intelligence. Learning from Data: Unsupervised Learning CSE 40171: Artificial Intelligence Learning from Data: Unsupervised Learning 32 Homework #6 has been released. It is due at 11:59PM on 11/7. 33 CSE Seminar: 11/1 Amy Reibman Purdue University 3:30pm DBART

More information

Linear Regression & Gradient Descent

Linear Regression & Gradient Descent Linear Regression & Gradient Descent These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online.

More information

When Network Embedding meets Reinforcement Learning?

When Network Embedding meets Reinforcement Learning? When Network Embedding meets Reinforcement Learning? ---Learning Combinatorial Optimization Problems over Graphs Changjun Fan 1 1. An Introduction to (Deep) Reinforcement Learning 2. How to combine NE

More information

Reinforcement Learning: A brief introduction. Mihaela van der Schaar

Reinforcement Learning: A brief introduction. Mihaela van der Schaar Reinforcement Learning: A brief introduction Mihaela van der Schaar Outline Optimal Decisions & Optimal Forecasts Markov Decision Processes (MDPs) States, actions, rewards and value functions Dynamic Programming

More information

CSEP 573: Artificial Intelligence

CSEP 573: Artificial Intelligence CSEP 573: Artificial Intelligence Machine Learning: Perceptron Ali Farhadi Many slides over the course adapted from Luke Zettlemoyer and Dan Klein. 1 Generative vs. Discriminative Generative classifiers:

More information

Stanford University. A Distributed Solver for Kernalized SVM

Stanford University. A Distributed Solver for Kernalized SVM Stanford University CME 323 Final Project A Distributed Solver for Kernalized SVM Haoming Li Bangzheng He haoming@stanford.edu bzhe@stanford.edu GitHub Repository https://github.com/cme323project/spark_kernel_svm.git

More information

Midterm Examination CS 540-2: Introduction to Artificial Intelligence

Midterm Examination CS 540-2: Introduction to Artificial Intelligence Midterm Examination CS 54-2: Introduction to Artificial Intelligence March 9, 217 LAST NAME: FIRST NAME: Problem Score Max Score 1 15 2 17 3 12 4 6 5 12 6 14 7 15 8 9 Total 1 1 of 1 Question 1. [15] State

More information

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013

Machine Learning. Topic 5: Linear Discriminants. Bryan Pardo, EECS 349 Machine Learning, 2013 Machine Learning Topic 5: Linear Discriminants Bryan Pardo, EECS 349 Machine Learning, 2013 Thanks to Mark Cartwright for his extensive contributions to these slides Thanks to Alpaydin, Bishop, and Duda/Hart/Stork

More information

CS 457 Networking and the Internet. What is Routing. Forwarding versus Routing 9/27/16. Fall 2016 Indrajit Ray. A famous quotation from RFC 791

CS 457 Networking and the Internet. What is Routing. Forwarding versus Routing 9/27/16. Fall 2016 Indrajit Ray. A famous quotation from RFC 791 CS 457 Networking and the Internet Fall 2016 Indrajit Ray What is Routing A famous quotation from RFC 791 A name indicates what we seek An address indicates where it is A route indicates how we get there

More information

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601

Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network. Nathan Sun CIS601 Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network Nathan Sun CIS601 Introduction Face ID is complicated by alterations to an individual s appearance Beard,

More information

In Homework 1, you determined the inverse dynamics model of the spinbot robot to be

In Homework 1, you determined the inverse dynamics model of the spinbot robot to be Robot Learning Winter Semester 22/3, Homework 2 Prof. Dr. J. Peters, M.Eng. O. Kroemer, M. Sc. H. van Hoof Due date: Wed 6 Jan. 23 Note: Please fill in the solution on this sheet but add sheets for the

More information

Lecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM)

Lecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM) School of Computer Science Probabilistic Graphical Models Structured Sparse Additive Models Junming Yin and Eric Xing Lecture 7, April 4, 013 Reading: See class website 1 Outline Nonparametric regression

More information

CPSC 340: Machine Learning and Data Mining

CPSC 340: Machine Learning and Data Mining CPSC 340: Machine Learning and Data Mining Fundamentals of learning (continued) and the k-nearest neighbours classifier Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart.

More information

Clustering with Reinforcement Learning

Clustering with Reinforcement Learning Clustering with Reinforcement Learning Wesam Barbakh and Colin Fyfe, The University of Paisley, Scotland. email:wesam.barbakh,colin.fyfe@paisley.ac.uk Abstract We show how a previously derived method of

More information

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of

More information

A Series of Lectures on Approximate Dynamic Programming

A Series of Lectures on Approximate Dynamic Programming A Series of Lectures on Approximate Dynamic Programming Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology Lucca, Italy June 2017 Bertsekas (M.I.T.)

More information

CP365 Artificial Intelligence

CP365 Artificial Intelligence CP365 Artificial Intelligence Example Problem Problem: Does a given image contain cats? Input vector: RGB/BW pixels of the image. Output: Yes or No. Example Problem Problem: What category is a news story?

More information

Machine Learning: Think Big and Parallel

Machine Learning: Think Big and Parallel Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least

More information

CMU-Q Lecture 9: Optimization II: Constrained,Unconstrained Optimization Convex optimization. Teacher: Gianni A. Di Caro

CMU-Q Lecture 9: Optimization II: Constrained,Unconstrained Optimization Convex optimization. Teacher: Gianni A. Di Caro CMU-Q 15-381 Lecture 9: Optimization II: Constrained,Unconstrained Optimization Convex optimization Teacher: Gianni A. Di Caro GLOBAL FUNCTION OPTIMIZATION Find the global maximum of the function f x (and

More information

Residual Advantage Learning Applied to a Differential Game

Residual Advantage Learning Applied to a Differential Game Presented at the International Conference on Neural Networks (ICNN 96), Washington DC, 2-6 June 1996. Residual Advantage Learning Applied to a Differential Game Mance E. Harmon Wright Laboratory WL/AAAT

More information

Graph Algorithms (part 3 of CSC 282),

Graph Algorithms (part 3 of CSC 282), Graph Algorithms (part of CSC 8), http://www.cs.rochester.edu/~stefanko/teaching/10cs8 1 Schedule Homework is due Thursday, Oct 1. The QUIZ will be on Tuesday, Oct. 6. List of algorithms covered in the

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

Announcements. CS 188: Artificial Intelligence Fall Robot motion planning! Today. Robotics Tasks. Mobile Robots

Announcements. CS 188: Artificial Intelligence Fall Robot motion planning! Today. Robotics Tasks. Mobile Robots CS 188: Artificial Intelligence Fall 2007 Lecture 6: Robot Motion Planning 9/13/2007 Announcements Project 1 due (yesterday)! Project 2 (Pacman with ghosts) up in a few days Reminder: you are allowed to

More information

CS 188: Artificial Intelligence Fall Announcements

CS 188: Artificial Intelligence Fall Announcements CS 188: Artificial Intelligence Fall 2007 Lecture 6: Robot Motion Planning 9/13/2007 Dan Klein UC Berkeley Many slides over the course adapted from either Stuart Russell or Andrew Moore Announcements Project

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #19: Machine Learning 1

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #19: Machine Learning 1 CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #19: Machine Learning 1 Supervised Learning Would like to do predicbon: esbmate a func3on f(x) so that y = f(x) Where y can be: Real number:

More information

CPSC 340: Machine Learning and Data Mining. Logistic Regression Fall 2016

CPSC 340: Machine Learning and Data Mining. Logistic Regression Fall 2016 CPSC 340: Machine Learning and Data Mining Logistic Regression Fall 2016 Admin Assignment 1: Marks visible on UBC Connect. Assignment 2: Solution posted after class. Assignment 3: Due Wednesday (at any

More information

Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions

Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions Stable Linear Approximations to Dynamic Programming for Stochastic Control Problems with Local Transitions Benjamin Van Roy and John N. Tsitsiklis Laboratory for Information and Decision Systems Massachusetts

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

Nearest Neighbor Classification. Machine Learning Fall 2017

Nearest Neighbor Classification. Machine Learning Fall 2017 Nearest Neighbor Classification Machine Learning Fall 2017 1 This lecture K-nearest neighbor classification The basic algorithm Different distance measures Some practical aspects Voronoi Diagrams and Decision

More information

Marco Wiering Intelligent Systems Group Utrecht University

Marco Wiering Intelligent Systems Group Utrecht University Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment

More information

CPSC 340: Machine Learning and Data Mining. Density-Based Clustering Fall 2016

CPSC 340: Machine Learning and Data Mining. Density-Based Clustering Fall 2016 CPSC 340: Machine Learning and Data Mining Density-Based Clustering Fall 2016 Assignment 1 : Admin 2 late days to hand it in before Wednesday s class. 3 late days to hand it in before Friday s class. 0

More information

k-means Clustering Todd W. Neller Gettysburg College

k-means Clustering Todd W. Neller Gettysburg College k-means Clustering Todd W. Neller Gettysburg College Outline Unsupervised versus Supervised Learning Clustering Problem k-means Clustering Algorithm Visual Example Worked Example Initialization Methods

More information

Basis Function Adaptation in Temporal Difference Reinforcement Learning

Basis Function Adaptation in Temporal Difference Reinforcement Learning Basis Function Adaptation in Temporal Difference Reinforcement Learning Ishai Menache Department of Electrical Engineering Technion, Israel Institute of Technology Haifa 32000, Israel imenache@tx.technion.ac.il

More information

Announcements. Homework 4. Project 3. Due tonight at 11:59pm. Due 3/8 at 4:00pm

Announcements. Homework 4. Project 3. Due tonight at 11:59pm. Due 3/8 at 4:00pm Announcements Homework 4 Due tonight at 11:59pm Project 3 Due 3/8 at 4:00pm CS 188: Artificial Intelligence Constraint Satisfaction Problems Instructor: Stuart Russell & Sergey Levine, University of California,

More information

Clustering in R d. Clustering. Widely-used clustering methods. The k-means optimization problem CSE 250B

Clustering in R d. Clustering. Widely-used clustering methods. The k-means optimization problem CSE 250B Clustering in R d Clustering CSE 250B Two common uses of clustering: Vector quantization Find a finite set of representatives that provides good coverage of a complex, possibly infinite, high-dimensional

More information

Decision Trees: Representation

Decision Trees: Representation Decision Trees: Representation Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning

More information

Locally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005

Locally Weighted Learning for Control. Alexander Skoglund Machine Learning Course AASS, June 2005 Locally Weighted Learning for Control Alexander Skoglund Machine Learning Course AASS, June 2005 Outline Locally Weighted Learning, Christopher G. Atkeson et. al. in Artificial Intelligence Review, 11:11-73,1997

More information

EE368 Project: Visual Code Marker Detection

EE368 Project: Visual Code Marker Detection EE368 Project: Visual Code Marker Detection Kahye Song Group Number: 42 Email: kahye@stanford.edu Abstract A visual marker detection algorithm has been implemented and tested with twelve training images.

More information

Tracking system. Danica Kragic. Object Recognition & Model Based Tracking

Tracking system. Danica Kragic. Object Recognition & Model Based Tracking Tracking system Object Recognition & Model Based Tracking Motivation Manipulating objects in domestic environments Localization / Navigation Object Recognition Servoing Tracking Grasping Pose estimation

More information

A Comparative study of Clustering Algorithms using MapReduce in Hadoop

A Comparative study of Clustering Algorithms using MapReduce in Hadoop A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering

More information

Automatic Image Alignment

Automatic Image Alignment Automatic Image Alignment with a lot of slides stolen from Steve Seitz and Rick Szeliski Mike Nese CS194: Image Manipulation & Computational Photography Alexei Efros, UC Berkeley, Fall 2018 Live Homography

More information

Methods and algorithms for contour analysis of the optical and infrared images

Methods and algorithms for contour analysis of the optical and infrared images Methods and algorithms for contour analysis of the optical and infrared images Kuptsov I.V. Lavrov V.V, Luchkin R.S. Nemykin O.I. Prokhorov M.E. Ryndin Y.G. Sukhanov S.A. The basic algorithms of contour

More information

EN1610 Image Understanding Lab # 4: Corners, Interest Points, Hough Transform

EN1610 Image Understanding Lab # 4: Corners, Interest Points, Hough Transform EN1610 Image Understanding Lab # 4: Corners, Interest Points, Hough Transform The goal of this fourth lab is to ˆ Learn how to detect corners, and use them in a tracking application ˆ Learn how to describe

More information

EE795: Computer Vision and Intelligent Systems

EE795: Computer Vision and Intelligent Systems EE795: Computer Vision and Intelligent Systems Spring 2012 TTh 17:30-18:45 FDH 204 Lecture 10 130221 http://www.ee.unlv.edu/~b1morris/ecg795/ 2 Outline Review Canny Edge Detector Hough Transform Feature-Based

More information

Distributed and Asynchronous Policy Iteration for Bounded Parameter Markov Decision Processes

Distributed and Asynchronous Policy Iteration for Bounded Parameter Markov Decision Processes Distributed and Asynchronous Policy Iteration for Bounded Parameter Markov Decision Processes Willy Arthur Silva Reis 1, Karina Valdivia Delgado 2, Leliane Nunes de Barros 1 1 Departamento de Ciência da

More information

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999

Text Categorization. Foundations of Statistic Natural Language Processing The MIT Press1999 Text Categorization Foundations of Statistic Natural Language Processing The MIT Press1999 Outline Introduction Decision Trees Maximum Entropy Modeling (optional) Perceptrons K Nearest Neighbor Classification

More information

Quadruped Robots and Legged Locomotion

Quadruped Robots and Legged Locomotion Quadruped Robots and Legged Locomotion J. Zico Kolter Computer Science Department Stanford University Joint work with Pieter Abbeel, Andrew Ng Why legged robots? 1 Why Legged Robots? There is a need for

More information

Natural Language Processing

Natural Language Processing Natural Language Processing Classification III Dan Klein UC Berkeley 1 Classification 2 Linear Models: Perceptron The perceptron algorithm Iteratively processes the training set, reacting to training errors

More information