Function Approximation. Pieter Abbeel UC Berkeley EECS
1 Function Approximation Pieter Abbeel UC Berkeley EECS
2 Outline
Value iteration with function approximation
Linear programming with function approximation
3 Value Iteration
Algorithm: start with $V_0^*(s) = 0$ for all s. For i = 1, ..., H, for all states s ∈ S:
$V_i^*(s) = \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_{i-1}^*(s') \right]$
Impractical for large state spaces.
$V_i^*(s)$ = the expected sum of rewards accumulated when starting from state s and acting optimally for a horizon of i steps.
$\pi_i^*(s)$ = the optimal action when in state s and getting to act for a horizon of i steps.
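Stated as code, the tabular back-up is a few lines. A minimal sketch (the `T` and `R` arrays, indexed as T[s, a, s'] and R[s, a, s'], are hypothetical stand-ins for the MDP):

```python
import numpy as np

def value_iteration(T, R, H, gamma=1.0):
    n_states = T.shape[0]
    V = np.zeros(n_states)                    # V_0(s) = 0 for all s
    for _ in range(H):
        # Q[s, a] = sum_{s'} T(s,a,s') * (R(s,a,s') + gamma * V(s'))
        Q = (T * (R + gamma * V[None, None, :])).sum(axis=2)
        V = Q.max(axis=1)                     # Bellman back-up
    return V
```

The per-iteration cost is O(|S|² |A|), which is exactly what becomes impractical for large state spaces.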
4 Example: Tetris
state: board configuration + shape of the falling piece; ~2^200 states!
action: rotation and translation applied to the falling piece
22 features aka basis functions φ_i:
Ten basis functions, 0, ..., 9, mapping the state to the height h[k] of each of the ten columns.
Nine basis functions, 10, ..., 18, each mapping the state to the absolute difference between heights of successive columns: |h[k+1] − h[k]|, k = 1, ..., 9.
One basis function, 19, that maps state to the maximum column height: max_k h[k].
One basis function, 20, that maps state to the number of holes in the board.
One basis function, 21, that is equal to 1 in every state.
$\hat{V}_\theta(s) = \sum_{i=0}^{21} \theta_i \phi_i(s) = \theta^\top \phi(s)$
[Bertsekas & Ioffe, 1996 (TD); Bertsekas & Tsitsiklis, 1996 (TD); Kakade, 2002 (policy gradient); Farias & Van Roy, 2006 (approximate LP)]
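A hedged sketch of these 22 features, assuming the board is summarized by its ten column heights plus a hole count (the slide fixes no board encoding, so `heights` and `n_holes` are assumptions):

```python
import numpy as np

def tetris_features(heights, n_holes):
    h = np.asarray(heights, dtype=float)   # ten column heights h[0..9]
    diffs = np.abs(np.diff(h))             # |h[k+1] - h[k]|, nine values
    return np.concatenate([h, diffs, [h.max()], [float(n_holes)], [1.0]])

def v_hat(theta, heights, n_holes):
    return theta @ tetris_features(heights, n_holes)   # θᵀφ(s)
```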
5 Function Approximation
V̂(s) = θ₀ + θ₁·(distance to closest ghost) + θ₂·(distance to closest power pellet) + θ₃·(in dead-end) + θ₄·(closer to power pellet than ghost is) + ...
$= \sum_{i=0}^{n} \theta_i \phi_i(s) = \theta^\top \phi(s)$
6 Function Approximation
0th order approximation (1-nearest neighbor):
[figure: a 2D state space with twelve stored states x1, ..., x12; the query state s is nearest to x4]
φ(s) is the one-hot indicator of the nearest stored state, so here $\hat V(s) = \hat V(x_4) = \theta_4$; in general $\hat V(s) = \theta^\top \phi(s)$.
Only store values for x1, x2, ..., x12; call these values θ₁, θ₂, ..., θ₁₂. Assign every other state the value of the nearest x state.
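As code, the 0th-order feature map is a one-hot indicator of the nearest stored state, so θᵀφ(s) just copies that anchor's value. A sketch (`anchors`, standing for the stored states x1, ..., x12, is an assumption):

```python
import numpy as np

def phi_nearest(s, anchors):
    j = np.argmin(np.linalg.norm(anchors - s, axis=1))  # index of nearest x
    phi = np.zeros(len(anchors))
    phi[j] = 1.0                     # one-hot: V̂(s) = θᵀφ(s) = θ_j
    return phi
```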
7 Function Approximation
1st order approximation (k-nearest-neighbor interpolation):
[figure: the same grid of stored states; s falls in the cell with corners x1, x2, x5, x6]
$\hat V(s) = \phi_1(s)\theta_1 + \phi_2(s)\theta_2 + \phi_5(s)\theta_5 + \phi_6(s)\theta_6$, i.e., $\hat V(s) = \theta^\top \phi(s)$ with φ(s) holding the interpolation weights.
Only store values for x1, x2, ..., x12; call these values θ₁, θ₂, ..., θ₁₂. Assign every other state the interpolated value of the nearest 4 x states.
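For the 1st-order version, φ(s) holds nonnegative interpolation weights that sum to one, which makes the approximator an averager (relevant for slide 31). A sketch with inverse-distance weights over the k nearest anchors (the slide's figure suggests bilinear weights on a grid; this weighting is an assumption):

```python
import numpy as np

def phi_interp(s, anchors, k=4, eps=1e-9):
    d = np.linalg.norm(anchors - s, axis=1)
    idx = np.argsort(d)[:k]          # the k nearest stored states
    w = 1.0 / (d[idx] + eps)         # closer anchors get larger weight
    phi = np.zeros(len(anchors))
    phi[idx] = w / w.sum()           # weights are >= 0 and sum to 1
    return phi                       # V̂(s) = θᵀφ(s)
```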
8 Function Approximation
Examples:
S = ℝ, $\hat V_\theta(s) = \theta_1 + \theta_2 s$
S = ℝ, $\hat V_\theta(s) = \theta_1 + \theta_2 s + \theta_3 s^2$
S = ℝ, $\hat V_\theta(s) = \sum_{i=0}^{n} \theta_i s^i$
S arbitrary, $\hat V_\theta(s) = \log\left(\sum_i \exp(\theta_i \phi_i(s))\right)$
9 Function Approximation
Main idea: use $\hat V_\theta$, an approximation of the true value function V; θ is a free parameter to be chosen from its domain Θ.
Representation size goes from |S| down to |Θ|:
+ : fewer parameters to estimate
− : less expressiveness; typically there exist many V for which there is no θ such that $\hat V_\theta = V$
10 Supervised Learning
Given: a set of examples $(s^{(1)}, V(s^{(1)})),\ (s^{(2)}, V(s^{(2)})),\ \ldots,\ (s^{(m)}, V(s^{(m)}))$
Asked for: the best $\hat V_\theta$
Representative approach: find θ through least squares:
$\min_{\theta \in \Theta} \sum_{i=1}^{m} \left( \hat V_\theta(s^{(i)}) - V(s^{(i)}) \right)^2$
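In code, the least-squares step is one call. A minimal sketch, assuming the examples have been arranged into a design matrix `Phi` (row i is φ(s^(i))) and a target vector `v` (entry i is V(s^(i))); both names are assumptions:

```python
import numpy as np

def fit_theta(Phi, v):
    # Solve min_theta ||Phi @ theta - v||^2 by linear least squares.
    theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    return theta
```

This same call is reused as Step 2 of fitted value iteration later in the deck.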
11 Supervised Learning Example
Linear regression:
[figure: scatter plot with a fitted line; each point is an observation, its vertical distance to the line the error or residual, and the line's value the prediction]
$\min_{\theta_0, \theta_1} \sum_{i=1}^{n} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$
12 Overfitting
To avoid overfitting: reduce the number of features used.
Practical approach: hold-out validation (sketched below):
Perform the fit for different choices of feature sets using just 70% of the data.
Pick the feature set that led to the highest quality of fit on the remaining 30% of the data.
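A hedged sketch of the 70/30 procedure (the names `Phi`, `v`, and `feature_sets`, a list of column-index arrays into Phi, are assumptions, not from the slides):

```python
import numpy as np

def select_features(Phi, v, feature_sets, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(v))
    split = int(0.7 * len(v))            # fit on 70% of the data...
    tr, te = idx[:split], idx[split:]    # ...validate on the held-out 30%
    best_cols, best_err = None, np.inf
    for cols in feature_sets:
        theta, *_ = np.linalg.lstsq(Phi[tr][:, cols], v[tr], rcond=None)
        err = np.mean((Phi[te][:, cols] @ theta - v[te]) ** 2)
        if err < best_err:               # keep the feature set that fits
            best_cols, best_err = cols, err    # the validation data best
    return best_cols
```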
13 Status
Function approximation through supervised learning. BUT: where do the supervised examples come from?
14 Value Iteration with Function Approximation
Pick some S' ⊆ S (typically |S'| << |S|).
Initialize by choosing some setting for θ^(0).
Iterate for i = 0, 1, 2, ..., H:
Step 1: Bellman back-ups. For all s ∈ S':
$\bar V_{i+1}(s) \leftarrow \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \hat V_{\theta^{(i)}}(s') \right]$
Step 2: Supervised learning. Find θ^(i+1) as the solution of:
$\min_\theta \sum_{s \in S'} \left( \hat V_\theta(s) - \bar V_{i+1}(s) \right)^2$
(A runnable sketch of the loop follows.)
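A minimal sketch of the whole loop, reusing the tabular `T`, `R` arrays and the least-squares fit from above (`Phi` is a |S| x n feature matrix, `Sp` an index array for the sampled states S'; all layouts are assumptions):

```python
import numpy as np

def fitted_value_iteration(T, R, Phi, Sp, H, gamma, theta0):
    theta = theta0
    for _ in range(H):
        V_hat = Phi @ theta              # current estimate V̂_θ(i) at all states
        # Step 1: Bellman back-ups, but only on the sampled states S'
        targets = np.array([
            max(T[s, a] @ (R[s, a] + gamma * V_hat) for a in range(T.shape[1]))
            for s in Sp
        ])
        # Step 2: supervised learning on the pairs (s, target), s in S'
        theta, *_ = np.linalg.lstsq(Phi[Sp], targets, rcond=None)
    return theta
```

Note that successor states s' may lie outside S'; their values come from the current fit V̂_θ(i), which is the whole point of the function approximator.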
15 Value Iteration with Function Approximation --- Example
Mini-tetris: two types of blocks; we can only choose the translation (not the rotation) of the falling piece.
[figure: an example board state]
Reward = 1 for placing a block.
The sink state / game over is reached when a block is placed such that part of it extends above the red rectangle.
If you have a complete row, it gets cleared.
16 Value Iteration with Function Approximation --- Example
S' = { four sample board states, shown as figures on the slide }
17 Value Iteration with Function Approximation --- Example
S' = { the four sample board states }
10 features aka basis functions φ_i:
Four basis functions, 0, ..., 3, mapping the state to the height h[k] of each of the four columns.
Three basis functions, 4, ..., 6, each mapping the state to the absolute difference between heights of successive columns: |h[k+1] − h[k]|, k = 1, ..., 3.
One basis function, 7, that maps state to the maximum column height: max_k h[k].
One basis function, 8, that maps state to the number of holes in the board.
One basis function, 9, that is equal to 1 in every state.
Init θ^(0) = (-1, -1, -1, -1, -2, -2, -2, -3, -2, 20).
18–19 Value Iteration with Function Approximation --- Example
Bellman back-ups for the states in S'. Each of the four translations places the current block deterministically; the 0.5/0.5 split is over which of the two block types falls next. For the first state:
V(s₁) = max over the four translations a of 0.5·(1 + γV(s'ₐ)) + 0.5·(1 + γV(s''ₐ)),
where s'ₐ and s''ₐ are the two successor states of translation a (shown as board figures on the slides).
20 Value Iteration with Function Approximation --- Example
(Recap of the setup from slide 17: S' = { the four sample board states }, the 10 basis functions φ_i above, and Init θ^(0) = (-1, -1, -1, -1, -2, -2, -2, -3, -2, 20).)
21 Value Iteration with Function Approximation --- Example
Bellman back-ups for the first state in S', with each successor's feature vector φ' written beneath it (the two successors of an action share the same board, hence the same features):
V(s₁) = max {
0.5·(1 + γ·θ^(0)ᵀφ') + 0.5·(1 + γ·θ^(0)ᵀφ'),  φ' = (6,2,4,0, 4,2,4, 6, 0, 1);
0.5·(1 + γ·θ^(0)ᵀφ') + 0.5·(1 + γ·θ^(0)ᵀφ'),  φ' = (2,6,4,0, 4,2,4, 6, 0, 1);
0.5·(1 + γ·0) + 0.5·(1 + γ·0),  (sink state, V = 0);
0.5·(1 + γ·θ^(0)ᵀφ') + 0.5·(1 + γ·θ^(0)ᵀφ') },  φ' = (0,0,2,2, 0,2,0, 2, 0, 1)
22 Value Iteration with Function Approximation --- Example
Bellman back-ups for the first state in S', plugging in θ^(0)ᵀφ' = −30, −30, 0 (sink), and 6 for the four successors:
V(s₁) = max { 0.5·(1 + 0.9·(−30)) + 0.5·(1 + 0.9·(−30)),
0.5·(1 + 0.9·(−30)) + 0.5·(1 + 0.9·(−30)),
0.5·(1 + 0.9·0) + 0.5·(1 + 0.9·0),
0.5·(1 + 0.9·6) + 0.5·(1 + 0.9·6) }
= max { −26, −26, 1, 6.4 } = 6.4 (for γ = 0.9)
23 Value Iteration with Function Approximation --- Example
Bellman back-ups for the second state in S', θ^(0) = (-1, -1, -1, -1, -2, -2, -2, -3, -2, 20):
Three of the four translations lead to the sink state (V = 0), each contributing 0.5·(1 + γ·0) + 0.5·(1 + γ·0) = 1.
The fourth completes a row, which clears, leaving the empty board: φ' = (0,0,0,0, 0,0,0, 0, 0, 1) → V = θ^(0)ᵀφ' = 20.
V(s₂) = max { 1, 1, 1, 0.5·(1 + 0.9·20) + 0.5·(1 + 0.9·20) } = 19
24 Value Iteration with Function Approximation --- Example
Bellman back-ups for the third state in S', θ^(0) = (-1, -1, -1, -1, -2, -2, -2, -3, -2, 20):
φ' = (4,4,0,0, 0,4,0, 4, 0, 1) → V = −8, contributing 0.5·(1 + 0.9·(−8)) + 0.5·(1 + 0.9·(−8)) = −6.2;
φ' = (2,4,4,0, 2,0,4, 4, 0, 1) → V = −14, contributing −11.6;
φ' = (0,0,0,0, 0,0,0, 0, 0, 1) (row cleared) → V = 20, contributing 19.
V(s₃) = max { −6.2, −11.6, 19 } = 19
25 Value Iteration with Function Approximation --- Example
Bellman back-ups for the fourth state in S', θ^(0) = (-1, -1, -1, -1, -2, -2, -2, -3, -2, 20):
φ' = (6,6,4,0, 0,2,4, 6, 4, 1) → V = −34, contributing 0.5·(1 + 0.9·(−34)) + 0.5·(1 + 0.9·(−34)) = −29.6;
φ' = (4,6,6,0, 2,0,6, 6, 4, 1) → V = −38, contributing −33.2;
φ' = (4,0,6,6, 4,6,0, 6, 4, 1) → V = −42, contributing −36.8.
V(s₄) = max { −29.6, −33.2, −36.8 } = −29.6
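The arithmetic of slides 21–25 can be checked mechanically. A sketch that recomputes the first state's back-up from the printed feature vectors (the list layout is an assumption; `None` marks a sink successor with V = 0):

```python
import numpy as np

theta0 = np.array([-1, -1, -1, -1, -2, -2, -2, -3, -2, 20.0])
gamma = 0.9

def V(phi):
    return 0.0 if phi is None else theta0 @ np.asarray(phi, dtype=float)

def backup(successor_pairs):
    # Each action: 0.5/0.5 over the two next block types.
    return max(0.5 * (1 + gamma * V(p)) + 0.5 * (1 + gamma * V(q))
               for p, q in successor_pairs)

first_state = [((6,2,4,0,4,2,4,6,0,1), (6,2,4,0,4,2,4,6,0,1)),
               ((2,6,4,0,4,2,4,6,0,1), (2,6,4,0,4,2,4,6,0,1)),
               (None, None),                       # sink state
               ((0,0,2,2,0,2,0,2,0,1), (0,0,2,2,0,2,0,2,0,1))]
print(backup(first_state))   # -> 6.4, matching slide 22
```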
26 Value Iteration with Function Approximation --- Example
After running the Bellman back-ups for all 4 states in S' we have (targets with each state's own feature vector):
V(s₁) = 6.4,   φ(s₁) = (2,2,4,0, 0,2,4, 4, 0, 1)
V(s₂) = 19,    φ(s₂) = (4,4,4,0, 0,0,4, 4, 0, 1)
V(s₃) = 19,    φ(s₃) = (2,2,0,0, 0,2,0, 2, 0, 1)
V(s₄) = −29.6, φ(s₄) = (4,0,4,0, 4,4,4, 4, 0, 1)
We now run supervised learning on these 4 examples to find a new θ:
$\min_\theta\ (6.4 - \theta^\top\phi(s_1))^2 + (19 - \theta^\top\phi(s_2))^2 + (19 - \theta^\top\phi(s_3))^2 + ((-29.6) - \theta^\top\phi(s_4))^2$
→ Running least squares gives the new θ (a sketch follows):
θ^(1) = (0.195, 6.24, 2.11, 0, 6.05, 0.13, 2.11, 2.13, 0, 1.59)
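The supervised-learning step on these four examples, as a sketch. Note that with 10 parameters and only 4 examples the least-squares minimizer is not unique; `np.linalg.lstsq` returns the minimum-norm solution, which need not coincide with the θ^(1) printed on the slide:

```python
import numpy as np

Phi = np.array([(2,2,4,0, 0,2,4, 4, 0, 1),
                (4,4,4,0, 0,0,4, 4, 0, 1),
                (2,2,0,0, 0,2,0, 2, 0, 1),
                (4,0,4,0, 4,4,4, 4, 0, 1)], dtype=float)
targets = np.array([6.4, 19.0, 19.0, -29.6])
theta1, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
print(Phi @ theta1)   # reproduces all four targets (zero residual)
```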
27 Potential guarantees?
28 Simple example**
Two states with a linear function approximator over a single parameter θ: V̂_θ = [1 2]ᵀ·θ, i.e., V̂_θ(s₁) = θ and V̂_θ(s₂) = 2θ.
[figure: the two-state MDP shown as a diagram]
29 Simple example**
[figure-only slide continuing the example]
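The figures on slides 28–29 did not survive transcription, but the setup matches the classic two-state divergence example. A hedged sketch under those assumptions (zero rewards, both states deterministically transition to s₂, and exact least-squares fitting of the backed-up values each iteration):

```python
import numpy as np

phi = np.array([[1.0], [2.0]])      # V̂_θ(s1) = θ, V̂_θ(s2) = 2θ
gamma, theta = 0.9, np.array([1.0])
for _ in range(10):
    # Back-up: both states move to s2 with reward 0 -> target γ·V̂(s2).
    targets = gamma * (phi @ theta)[[1, 1]]
    # Fit: least squares multiplies θ by 6γ/5, an expansion when γ > 5/6.
    theta, *_ = np.linalg.lstsq(phi, targets, rcond=None)
print(theta)   # ≈ 1.08**10 ≈ 2.16: θ grows without bound even though V* = 0
```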
30 Composing operators**
Definition. An operator G is a non-expansion with respect to a norm ‖·‖ if ‖GV − GV̄‖ ≤ ‖V − V̄‖ for all V, V̄.
Fact. If the operator F is a γ-contraction with respect to a norm ‖·‖ and the operator G is a non-expansion with respect to the same norm, then the sequential application of the operators G and F is a γ-contraction, i.e.,
‖F(GV) − F(GV̄)‖ ≤ γ‖GV − GV̄‖ ≤ γ‖V − V̄‖.
Corollary. If the supervised learning step is a non-expansion, then the iteration in value iteration with function approximation is a γ-contraction, and in this case we have a convergence guarantee.
31 Averager function approximators are non-expansions**
Examples:
nearest neighbor (aka state aggregation)
linear interpolation over triangles (tetrahedra, ...)
32 Averager function approximators are non-expansions**
(An averager's output at any state is a convex combination of the stored values, so a change in the stored values cannot be amplified in max norm.)
33 Linear regression is not a non-expansion**
Linear regression is not an averager, and the fitting step can amplify changes in the backed-up values, so value iteration with linear regression as the supervised step can diverge.
[Example taken from Gordon, 1995.]
34 Guarantees for fixed point**
[The slide bounds the error of the fixed point in terms of how well the chosen approximator can represent J*.]
I.e., if we pick a non-expansion function approximator which can approximate J* well, then we obtain a good value function estimate.
To apply this to discretization: use continuity assumptions to show that J* can be approximated well by the chosen discretization scheme.
35 Outline
Value iteration with function approximation
Linear programming with function approximation
36 Infinite Horizon Linear Program
$\min_V \sum_{s \in S} \mu_0(s) V(s)$
s.t. $V(s) \ge \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V(s') \right]$,  ∀ s ∈ S, a ∈ A
μ₀ is a probability distribution over S, with μ₀(s) > 0 for all s ∈ S.
Theorem. V* is the solution to the above LP.
37 Infinite Horizon Linear Program
Let $\hat V_\theta(s) = \theta^\top \phi(s)$, and consider S' rather than S:
$\min_\theta \sum_{s \in S'} \mu_0(s)\, \theta^\top \phi(s)$
s.t. $\theta^\top \phi(s) \ge \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma\, \theta^\top \phi(s') \right]$,  ∀ s ∈ S', a ∈ A
→ a linear program in θ that finds $\hat V_\theta(s) = \theta^\top \phi(s)$; a solver sketch follows.
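Both the objective and the constraints are linear in θ, so the approximate LP can go straight to an off-the-shelf solver. A sketch using `scipy.optimize.linprog` (the `T`, `R`, `Phi`, `mu0`, `Sp` arrays are hypothetical stand-ins for a small MDP):

```python
import numpy as np
from scipy.optimize import linprog

def approximate_lp(T, R, Phi, mu0, Sp, gamma):
    # Objective: min_theta sum_{s in S'} mu0(s) * theta^T phi(s)
    c = mu0[Sp] @ Phi[Sp]
    A_ub, b_ub = [], []
    for s in Sp:
        for a in range(T.shape[1]):
            # theta^T phi(s) >= sum_{s'} T(s,a,s') [R(s,a,s') + gamma theta^T phi(s')]
            # rearranged into the <= form that linprog expects:
            A_ub.append(gamma * (T[s, a] @ Phi) - Phi[s])
            b_ub.append(-(T[s, a] @ R[s, a]))
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * Phi.shape[1])  # theta is unbounded
    return res.x
```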
38 Approximate Linear Program Guarantees**
$\min_\theta \sum_{s \in S'} \mu_0(s)\, \theta^\top \phi(s)$
s.t. $\theta^\top \phi(s) \ge \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma\, \theta^\top \phi(s') \right]$,  ∀ s ∈ S', a ∈ A
The LP solver will converge.
Solution quality [de Farias and Van Roy, 2002]: assuming one of the features is the constant feature (equal to one in every state), and assuming S' = S, we have:
$\|V^* - \Phi\theta^*\|_{1,\mu_0} \le \frac{2}{1-\gamma} \min_\theta \|V^* - \Phi\theta\|_\infty$
(Slightly weaker, probabilistic guarantees hold for S' ≠ S; these guarantees require the size of S' to grow as the number of features grows.)