Function Approximation. Pieter Abbeel UC Berkeley EECS


1 Function Approximation Pieter Abbeel UC Berkeley EECS

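2 Value Iteration
Algorithm:
Start with $V_0^*(s) = 0$ for all $s$.
For $i = 1, \dots, H$:
  For all states $s \in S$:
    $V_i^*(s) \leftarrow \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_{i-1}^*(s') \right]$
    $\pi_i^*(s) \leftarrow \arg\max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V_{i-1}^*(s') \right]$
Impractical for large state spaces.
$V_i^*(s)$ = the expected sum of rewards accumulated when starting from state $s$ and acting optimally for a horizon of $i$ steps
$\pi_i^*(s)$ = the optimal action when in state $s$ and getting to act for a horizon of $i$ steps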
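
The tabular algorithm above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the data layout (`T[s][a]` as a list of `(next_state, probability)` pairs, `R` as a function) and the two-state example MDP are assumptions made for the sketch.

```python
def value_iteration(states, actions, T, R, gamma, H):
    """Finite-horizon value iteration.

    Assumed interfaces (not from the slides): T[s][a] is a list of
    (next_state, probability) pairs and R(s, a, s_next) returns a reward.
    """
    V = {s: 0.0 for s in states}              # V_0(s) = 0 for all s
    policy = {}
    for _ in range(H):
        V_new = {}
        for s in states:
            best_value, best_action = float("-inf"), None
            for a in actions:
                # Expected reward plus discounted value of the successor.
                q = sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[s][a])
                if q > best_value:
                    best_value, best_action = q, a
            V_new[s] = best_value
            policy[s] = best_action
        V = V_new
    return V, policy


# Hypothetical two-state MDP: "stay" keeps the state, "go" switches it,
# and landing in state "B" pays a reward of 1.
states = ["A", "B"]
actions = ["stay", "go"]
T = {
    "A": {"stay": [("A", 1.0)], "go": [("B", 1.0)]},
    "B": {"stay": [("B", 1.0)], "go": [("A", 1.0)]},
}
reward = lambda s, a, s2: 1.0 if s2 == "B" else 0.0
V, policy = value_iteration(states, actions, T, reward, gamma=0.5, H=2)
```

Every iteration loops over all states, which is exactly what becomes impractical when the state space is large.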


3 Example: Tetris
State: board configuration + shape of the falling piece, ~$2^{200}$ states!
Action: rotation and translation applied to the falling piece
22 features, aka basis functions $\phi_i$:
Ten basis functions, $\phi_0, \dots, \phi_9$, mapping the state to the height h[k] of each of the ten columns.
Nine basis functions, $\phi_{10}, \dots, \phi_{18}$, each mapping the state to the absolute difference between heights of successive columns: $|h[k+1] - h[k]|$, k = 1, ..., 9.
One basis function, $\phi_{19}$, that maps the state to the maximum column height: $\max_k h[k]$.
One basis function, $\phi_{20}$, that maps the state to the number of holes in the board.
One basis function, $\phi_{21}$, that is equal to 1 in every state.
$\hat V_\theta(s) = \sum_{i=0}^{21} \theta_i \phi_i(s) = \theta^\top \phi(s)$
[Bertsekas & Ioffe, 1996 (TD); Bertsekas & Tsitsiklis 1996 (TD); Kakade 2002 (policy gradient); Farias & Van Roy, 2006 (approximate LP)]
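
As an illustration, the 22 features could be computed as in the sketch below. The board encoding (a list of rows, top row first, with 1 for a filled cell) and the definition of a hole (an empty cell with a filled cell somewhere above it in the same column) are assumptions; the slides do not specify either.

```python
def tetris_features(board):
    """Return the 22 basis-function values for a board.

    Assumed encoding: board is a list of rows (top row first), each a list
    of 10 ints where 1 means filled. A "hole" is assumed to be an empty cell
    lying below the topmost filled cell of its column.
    """
    n_rows, n_cols = len(board), len(board[0])
    heights, holes = [], 0
    for c in range(n_cols):
        column = [board[r][c] for r in range(n_rows)]
        if 1 in column:
            top = column.index(1)              # first filled cell from the top
            heights.append(n_rows - top)
            holes += column[top:].count(0)     # empty cells under the column top
        else:
            heights.append(0)
    diffs = [abs(heights[k + 1] - heights[k]) for k in range(n_cols - 1)]
    # phi_0..phi_9, phi_10..phi_18, phi_19, phi_20, phi_21
    return heights + diffs + [max(heights), holes, 1]


def value(theta, board):
    """Linear value estimate theta^T phi(s)."""
    return sum(t * f for t, f in zip(theta, tetris_features(board)))


# Small 4-row example board: column 0 has height 1, column 1 has height 3
# with two holes underneath its single filled cell.
board = [[0] * 10 for _ in range(4)]
board[3][0] = 1
board[1][1] = 1
phi = tetris_features(board)
```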

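4 Function Approximation
$V(s) = \theta_0 + \theta_1 \cdot$ "distance to closest ghost" $+ \theta_2 \cdot$ "distance to closest power pellet" $+ \theta_3 \cdot$ "in dead-end" $+ \theta_4 \cdot$ "closer to power pellet than ghost is" $+ \dots = \sum_{i=0}^{n} \theta_i \phi_i(s) = \theta^\top \phi(s)$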

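5 Function Approximation
0th order approximation (1-nearest neighbor):
(Figure: grid of sample states x1, ..., x12; the query state s lies closest to x4.)
$\phi(s)$ = indicator vector with a 1 in entry 4 and 0 elsewhere
$\hat V(s) = \hat V(x_4) = \theta_4$
$\hat V(s) = \theta^\top \phi(s)$
Only store values for x1, x2, ..., x12; call these values $\theta_1, \theta_2, \dots, \theta_{12}$.
Assign other states the value of the nearest "x" state.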

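6 Function Approximation
1st order approximation (k-nearest neighbor interpolation):
(Figure: grid of sample states x1, ..., x12; the query state s lies between x1, x2, x5 and x6.)
$\hat V(s) = \phi_1(s)\theta_1 + \phi_2(s)\theta_2 + \phi_5(s)\theta_5 + \phi_6(s)\theta_6$
$\phi(s)$ = vector of interpolation weights, nonzero only in entries 1, 2, 5 and 6
$\hat V(s) = \theta^\top \phi(s)$
Only store values for x1, x2, ..., x12; call these values $\theta_1, \theta_2, \dots, \theta_{12}$.
Assign other states the interpolated value of the nearest 4 "x" states.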
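
Both nearest-neighbor schemes are linear in $\theta$; only the feature vector changes. The sketch below assumes the 12 stored states sit on a regular 4-by-3 grid with unit spacing and uses bilinear weights for the 4-neighbor interpolation. The grid layout, the coordinate convention, and the choice of bilinear weights are assumptions for illustration. Indices are 0-based, so `phi[3]` corresponds to x4.

```python
import math

N_COLS, N_ROWS = 4, 3        # assumed layout: x1..x4 in row 0, x5..x8 in row 1, ...


def index(col, row):
    return row * N_COLS + col


def phi_nearest(x, y):
    """0th order: indicator vector of the closest grid point."""
    phi = [0.0] * (N_COLS * N_ROWS)
    col = min(max(int(math.floor(x + 0.5)), 0), N_COLS - 1)
    row = min(max(int(math.floor(y + 0.5)), 0), N_ROWS - 1)
    phi[index(col, row)] = 1.0
    return phi


def phi_bilinear(x, y):
    """1st order: bilinear weights on the 4 surrounding grid points.

    Assumes 0 <= x <= N_COLS - 1 and 0 <= y <= N_ROWS - 1.
    """
    phi = [0.0] * (N_COLS * N_ROWS)
    c0 = min(int(math.floor(x)), N_COLS - 2)
    r0 = min(int(math.floor(y)), N_ROWS - 2)
    fx, fy = x - c0, y - r0
    phi[index(c0, r0)] = (1 - fx) * (1 - fy)
    phi[index(c0 + 1, r0)] = fx * (1 - fy)
    phi[index(c0, r0 + 1)] = (1 - fx) * fy
    phi[index(c0 + 1, r0 + 1)] = fx * fy
    return phi


def v_hat(theta, phi):
    return sum(t * f for t, f in zip(theta, phi))


# Stored values theta_j at the grid points; here a made-up linear function.
theta = [col + 10.0 * row for row in range(N_ROWS) for col in range(N_COLS)]
near = phi_nearest(2.9, 0.2)       # closest grid point is x4
interp = phi_bilinear(0.25, 0.5)   # nonzero weights on x1, x2, x5, x6
```

The interpolation weights are nonnegative and sum to one, so the estimate is a convex combination of the stored values.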

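7 Function Approximation
Examples:
$S = \mathbb{R}$, $\hat V(s) = \theta_1 + \theta_2 s$
$S = \mathbb{R}$, $\hat V(s) = \theta_1 + \theta_2 s + \theta_3 s^2$
$S = \mathbb{R}$, $\hat V(s) = \sum_{i=0}^{n} \theta_i s^i$
$S$, $\hat V(s) = \log(\exp(\theta^\top \phi(s)))$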

8 Function Approximation
Main idea:
Use an approximation $\hat V_\theta$ of the true value function $V$.
$\theta$ is a free parameter to be chosen from its domain $\Theta$.
Representation size: $|S|$ → down to $|\Theta|$
+ : fewer parameters to estimate
- : less expressiveness; typically there exist many $V$ for which there is no $\theta$ such that $\hat V_\theta = V$

9 Supervised Learning
Given: a set of examples $(s^{(1)}, V(s^{(1)})), (s^{(2)}, V(s^{(2)})), \dots, (s^{(m)}, V(s^{(m)}))$
Asked for: the "best" $\hat V_\theta$
Representative approach: find $\theta$ through least squares:
$\min_{\theta \in \Theta} \sum_{i=1}^{m} \left( \hat V_\theta(s^{(i)}) - V(s^{(i)}) \right)^2$

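10 Supervised Learning Example
Linear regression
(Figure: fitted line through data points; labels: observation, prediction, error or residual.)
$\min_{\theta_0, \theta_1} \sum_{i=1}^{n} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$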
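
For a single input feature, the least-squares problem above has a well-known closed-form solution. The sketch below implements it in plain Python; the function name and the example data are made up for illustration.

```python
def fit_line(xs, ys):
    """Minimize sum_i (theta0 + theta1 * x_i - y_i)^2 in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx                  # slope
    theta0 = mean_y - theta1 * mean_x   # intercept
    return theta0, theta1


# Made-up observations that lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = fit_line(xs, ys)
residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
```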

11 Overfitting
To avoid overfitting: reduce the number of features used.
Practical approach: leave-out validation
Perform fitting for different choices of feature sets using just 70% of the data.
Pick the feature set that led to the highest quality of fit on the remaining 30% of the data.
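
A minimal sketch of this 70/30 procedure follows. The helper names, the two candidate feature sets, and the synthetic data are assumptions for illustration; the fit uses the normal equations solved by Gaussian elimination so that the block needs no external libraries. In practice the data would usually be shuffled before splitting.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit(features, xs, ys):
    """Least squares via the normal equations (Phi^T Phi) theta = Phi^T y."""
    Phi = [[f(x) for f in features] for x in xs]
    k = len(features)
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(k)]
    return solve(A, b)


def mse(features, theta, xs, ys):
    preds = [sum(t * f(x) for t, f in zip(theta, features)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


def select_feature_set(candidates, xs, ys, train_fraction=0.7):
    """Fit each candidate on the first 70% and score it on the remaining 30%."""
    split = int(train_fraction * len(xs))
    errors = {}
    for name, features in candidates.items():
        theta = fit(features, xs[:split], ys[:split])
        errors[name] = mse(features, theta, xs[split:], ys[split:])
    return min(errors, key=errors.get), errors


# Made-up data on the line y = 1 + 2x and two hypothetical feature sets.
xs = [i / 10 for i in range(20)]
ys = [1 + 2 * x for x in xs]
candidates = {
    "constant": [lambda x: 1.0],
    "linear": [lambda x: 1.0, lambda x: x],
}
best, errors = select_feature_set(candidates, xs, ys)
```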

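12 Value Iteration with Function Approximation
Pick some $S' \subseteq S$ (typically $|S'| \ll |S|$).
Initialize by choosing some setting for $\theta^{(0)}$.
Iterate for $i = 0, 1, 2, \dots, H$:
Step 1: Bellman back-ups
$\forall s \in S'$: $\bar V_{i+1}(s) \leftarrow \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \hat V_{\theta^{(i)}}(s') \right]$
Step 2: Supervised learning
Find $\theta^{(i+1)}$ as the solution of: $\min_\theta \sum_{s \in S'} \left( \hat V_\theta(s) - \bar V_{i+1}(s) \right)^2$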
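
The two steps can be sketched as follows for a toy problem. Everything specific here is an assumption chosen to keep the example small: a deterministic, unbounded integer chain (so the sum over $s'$ collapses to a single successor), features $\phi(s) = (1, s)$, a reward equal to the index of the state that is reached, and four sampled states.

```python
def fit_line(xs, ys):
    """Closed-form least squares for v_hat(s) = theta0 + theta1 * s."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


def fitted_value_iteration(sample_states, actions, step, reward, gamma, H):
    theta = (0.0, 0.0)                              # theta^(0)
    v_hat = lambda s, th: th[0] + th[1] * s
    for _ in range(H):
        # Step 1: Bellman back-ups, only on the sampled states S'.
        targets = [
            max(reward(s, a, step(s, a)) + gamma * v_hat(step(s, a), theta)
                for a in actions)
            for s in sample_states
        ]
        # Step 2: supervised learning, fit theta^(i+1) to the backed-up values.
        theta = fit_line(sample_states, targets)
    return theta


# Hypothetical chain: action a in {-1, +1} moves from s to s + a, and the
# reward is the index of the state reached.
step = lambda s, a: s + a
reward = lambda s, a, s2: float(s2)
theta = fitted_value_iteration([2, 4, 6, 8], [-1, 1], step, reward, gamma=0.5, H=60)
```

In this toy problem the backed-up values are exactly linear in $s$, so the fit is exact and the parameters approach $(4, 2)$. In general the regression step introduces error at every iteration.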

13 Infinite Horizon Linear Program
$\min_V \sum_{s \in S} \mu_0(s) V(s)$
s.t. $V(s) \ge \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V(s') \right], \quad \forall s \in S, a \in A$
$\mu_0$ is a probability distribution over $S$, with $\mu_0(s) > 0$ for all $s \in S$.
Theorem. $V^*$ is the solution to the above LP.

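14 Infinite Horizon Linear Program
$\min_V \sum_{s \in S} \mu_0(s) V(s)$
s.t. $V(s) \ge \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V(s') \right], \quad \forall s \in S, a \in A$
Let $V(s) = \theta^\top \phi(s)$, and consider $S'$ rather than $S$:
$\min_\theta \sum_{s \in S'} \mu_0(s) \theta^\top \phi(s)$
s.t. $\theta^\top \phi(s) \ge \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \theta^\top \phi(s') \right], \quad \forall s \in S', a \in A$
→ Linear program that finds $\hat V_\theta(s) = \theta^\top \phi(s)$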

15 Approximate Linear Program Guarantees**
$\min_\theta \sum_{s \in S'} \mu_0(s) \theta^\top \phi(s)$
s.t. $\theta^\top \phi(s) \ge \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \theta^\top \phi(s') \right], \quad \forall s \in S', a \in A$
The LP solver will converge.
Solution quality: [de Farias and Van Roy, 2002]
Assuming one of the features is the feature that is equal to one for all states, and assuming $S' = S$, we have that:
$\| V^* - \Phi\theta \|_{1,\mu_0} \le \frac{2}{1-\gamma} \min_\theta \| V^* - \Phi\theta \|_\infty$
(Slightly weaker, probabilistic guarantees hold for $S'$ not equal to $S$; these guarantees require the size of $S'$ to grow as the number of features grows.)

16 Sampling-Based Motion Planning Pieter Abbeel UC Berkeley EECS Many images from LaValle, Planning Algorithms

17 Motion Planning Problem
Given start state $x_S$, goal state $x_G$
Asked for: a sequence of control inputs that leads from start to goal
Why tricky?
Need to avoid obstacles
For systems with underactuated dynamics: can't simply move along any coordinate at will
E.g., car, helicopter, airplane, but also robot manipulator hitting joint limits

18 Solve by Nonlinear Optimization for Control?
Could try, for example, an optimal control formulation in which the state constraint sets $X_t$ encode the obstacles.
Or a formulation with explicit constraints (which would require using an infeasible method).
Can work surprisingly well, but for more complicated problems with longer horizons, the optimizer often gets stuck in local optima that don't reach the goal.

19 Examples Helicopter path planning Swinging up cart-pole Acrobot

20 Examples

21 Examples

22 Examples

23 Motion Planning: Outline
Configuration Space
Probabilistic Roadmap
  Boundary Value Problem
  Sampling
  Collision checking
Rapidly-exploring Random Trees (RRTs)
Smoothing

24 Configuration Space (C-Space)
= { x | x is a pose of the robot }
obstacles → configuration space obstacles
(Figure: workspace and the corresponding configuration space for 2 DOF, translation only, no rotation; labels: free space, obstacles.)

25 Motion planning

26 Probabilistic Roadmap (PRM) Space $\mathbb{R}^n$; forbidden space; free/feasible space

27 Probabilistic Roadmap (PRM) Configurations are sampled by picking coordinates at random

28 Probabilistic Roadmap (PRM) Configurations are sampled by picking coordinates at random

29 Probabilistic Roadmap (PRM) Sampled configurations are tested for collision

30 Probabilistic Roadmap (PRM) The collision-free configurations are retained as milestones

31 Probabilistic Roadmap (PRM) Each milestone is linked by straight paths to its nearest neighbors

32 Probabilistic Roadmap (PRM) Each milestone is linked by straight paths to its nearest neighbors

33 Probabilistic Roadmap (PRM) The collision-free links are retained as local paths to form the PRM

34 Probabilistic Roadmap (PRM) The start and goal configurations, s and g, are included as milestones

35 Probabilistic Roadmap (PRM) The PRM is searched for a path from s to g

36 Probabilistic Roadmap
Initialize the set of points with $x_S$ and $x_G$
Randomly sample points in configuration space
Connect nearby points if they can be reached from each other
Find a path from $x_S$ to $x_G$ in the graph
Alternatively: keep track of connected components incrementally, and declare success when $x_S$ and $x_G$ are in the same connected component
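
The steps above can be sketched for a holonomic point robot in the unit square. The disc obstacles, the connection radius, the sampled straight-line collision check, and the breadth-first graph search are all assumptions made for this illustration, not details given in the slides.

```python
import math
import random
from collections import deque


def collision_free(p, obstacles):
    """obstacles: list of (center, radius) discs (an assumed representation)."""
    return all(math.dist(p, c) > r for c, r in obstacles)


def edge_free(p, q, obstacles, step=0.01):
    """Check a straight segment by testing points spaced about `step` apart."""
    n = max(1, int(math.dist(p, q) / step))
    return all(
        collision_free((p[0] + (q[0] - p[0]) * i / n,
                        p[1] + (q[1] - p[1]) * i / n), obstacles)
        for i in range(n + 1)
    )


def prm(start, goal, obstacles, n_samples=300, radius=0.25, seed=0):
    rng = random.Random(seed)
    nodes = [start, goal]                        # initialize with x_S and x_G
    while len(nodes) < n_samples + 2:            # sample collision-free milestones
        p = (rng.random(), rng.random())
        if collision_free(p, obstacles):
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}   # connect nearby milestones
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if (math.dist(nodes[i], nodes[j]) <= radius
                    and edge_free(nodes[i], nodes[j], obstacles)):
                edges[i].append(j)
                edges[j].append(i)
    parent = {0: None}                           # breadth-first search from x_S
    queue = deque([0])
    while queue:
        u = queue.popleft()
        if u == 1:                               # reached x_G: walk back to x_S
            path = []
            while u is not None:
                path.append(nodes[u])
                u = parent[u]
            return path[::-1]
        for v in edges[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None                                  # x_S and x_G are not connected


obstacles = [((0.5, 0.5), 0.2)]
start, goal = (0.05, 0.05), (0.95, 0.95)
path = prm(start, goal, obstacles)
```

Breadth-first search returns the path with the fewest edges, not the shortest one; a weighted search such as Dijkstra's algorithm would be the natural replacement if path length matters.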

37 PRM example

38 PRM example 2

39 Sampling
How to sample uniformly at random from $[0,1]^n$?
Sample uniformly at random from [0,1] for each coordinate
How to sample uniformly at random from the surface of the n-d unit sphere?
Sample from an n-d Gaussian → isotropic; then just normalize
How to sample uniformly at random for orientations in 3-D?
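
The first two answers translate directly into code. For the third question the slide gives no answer; one standard option, shown here as an addition rather than as the lecture's method, is to represent the orientation as a unit quaternion and sample it uniformly from the unit sphere in 4-D using the same Gaussian trick.

```python
import math
import random


def sample_unit_cube(n, rng):
    """Uniform sample from [0, 1]^n: one uniform draw per coordinate."""
    return [rng.random() for _ in range(n)]


def sample_unit_sphere(n, rng):
    """Uniform sample from the surface of the unit sphere in n dimensions.

    An n-d standard Gaussian is isotropic, so normalizing it gives a
    direction that is uniform over the sphere.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sample_orientation(rng):
    """Uniform random 3-D orientation as a unit quaternion (w, x, y, z).

    Not from the slides: a uniform point on the 4-d unit sphere corresponds
    to a uniformly distributed rotation.
    """
    return tuple(sample_unit_sphere(4, rng))


rng = random.Random(0)
cube_point = sample_unit_cube(3, rng)
sphere_point = sample_unit_sphere(5, rng)
quaternion = sample_orientation(rng)
```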

40 PRM: Challenges
1. Connecting neighboring points: Only easy for holonomic systems (i.e., for which you can move each degree of freedom at will at any time). Generally requires solving a Boundary Value Problem. Typically solved without collision checking; later verified if valid by collision checking.
2. Collision checking: Often takes the majority of time in applications (see LaValle).

41 PRM's Pros and Cons
Pro: Probabilistically complete, i.e., with probability one, if run for long enough the graph will contain a solution path if one exists.
Cons:
Required to solve a two-point boundary value problem
Builds a graph over the state space but with no particular focus on generating a path

42 Rapidly-exploring Random Trees
Basic idea: Build up a tree by generating next states in the tree through executing random controls
However: the actual algorithm does not work exactly as above, in order to ensure good coverage

43 Rapidly-exploring Random Trees (RRT)
RANDOM_STATE(): often samples uniformly at random over the space with probability 99%, and returns the goal state with probability 1%. This ensures the tree attempts to connect to the goal semi-regularly.
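
A minimal RRT sketch for a holonomic point robot in the unit square, using the 1% goal bias described above. The straight-line extension with a fixed step, the linear-scan nearest neighbor, the disc obstacle, and all parameter values are assumptions for illustration. Only the new point is collision-checked, which is reasonable only because the step is small relative to the obstacle.

```python
import math
import random


def rrt(start, goal, is_free, step=0.05, goal_bias=0.01, max_iters=5000, seed=0):
    rng = random.Random(seed)
    nodes, parent = [start], {0: None}
    for _ in range(max_iters):
        # RANDOM_STATE: the goal with small probability, otherwise uniform.
        if rng.random() < goal_bias:
            x_rand = goal
        else:
            x_rand = (rng.random(), rng.random())
        # NEAREST_NEIGHBOR: linear scan (a KD-tree would be used at scale).
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x_rand))
        x_near = nodes[i_near]
        d = math.dist(x_near, x_rand)
        if d == 0.0:
            continue
        # Extend from x_near toward x_rand by at most `step`.
        t = min(1.0, step / d)
        x_new = (x_near[0] + t * (x_rand[0] - x_near[0]),
                 x_near[1] + t * (x_rand[1] - x_near[1]))
        if not is_free(x_new):
            continue
        nodes.append(x_new)
        parent[len(nodes) - 1] = i_near
        if math.dist(x_new, goal) <= step:       # close enough: connect to goal
            path, i = [goal], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


# Hypothetical scene: one disc obstacle in the middle of the unit square.
is_free = lambda p: math.dist(p, (0.5, 0.5)) > 0.2
start, goal = (0.1, 0.1), (0.9, 0.9)
path = rrt(start, goal, is_free)
```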

44 RRT Practicalities
NEAREST_NEIGHBOR($x_{rand}$, T): need to find the (approximate) nearest neighbor efficiently
KD-tree data structure (up to 20-D) [e.g., FLANN]
Locality Sensitive Hashing
SELECT_INPUT($x_{rand}$, $x_{near}$)
Two-point boundary value problem
If too hard to solve, often just select the best out of a set of control sequences. This set could be random, or some well-chosen set of primitives.

45 RRT Extension
No obstacles, holonomic: (figure)
With obstacles, holonomic: (figure)
Non-holonomic: approximately solve the two-point boundary value problem (sometimes as approximately as picking the best of a few random control sequences)

46 Growing RRT Demo:

47 Bi-directional RRT
Volume swept out by unidirectional RRT: (figure with $x_S$, $x_G$)
Volume swept out by bi-directional RRT: (figure with $x_S$, $x_G$)
Difference becomes far more pronounced in higher dimensions

48 Multi-directional RRT Planning around obstacles or through narrow passages can often be easier in one direction than the other


49 Resolution-Complete RRT (RC-RRT)
Issue: nearest points chosen for expansion are (too) often the ones stuck behind an obstacle
RC-RRT solution:
Choose a maximum number of times, m, you are willing to try to expand each node
For each node in the tree, keep track of its Constraint Violation Frequency (CVF)
Initialize CVF to zero when the node is added to the tree
Whenever an expansion from the node is unsuccessful (e.g., per hitting an obstacle):
  Increase the CVF of that node by 1
  Increase the CVF of its parent node by 1/m, its grandparent by 1/m^2, ...
When a node is selected for expansion, skip over it with probability CVF/m
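
The CVF bookkeeping is small enough to sketch directly. The `Node` class and the function names below are hypothetical; only the update rule and the skip probability come from the slide.

```python
import random


class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.cvf = 0.0      # CVF is initialized to zero when the node is added


def record_failed_expansion(node, m):
    """Add 1 to the node's CVF, 1/m to its parent's, 1/m^2 to its grandparent's, ..."""
    increment = 1.0
    while node is not None:
        node.cvf += increment
        increment /= m
        node = node.parent


def should_skip(node, m, rng=random):
    """Skip the node with probability CVF / m."""
    return rng.random() < node.cvf / m


# Three-node chain: root -> child -> grandchild, with m = 4.
m = 4
root = Node((0.0, 0.0))
child = Node((0.1, 0.0), parent=root)
grandchild = Node((0.2, 0.0), parent=child)
record_failed_expansion(grandchild, m)
```

After m failed expansions a node's CVF reaches m, so it is skipped with probability one and the planner stops wasting effort on it.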

50 RRT* Source: Karaman and Frazzoli

51 RRT*
Asymptotically optimal
Main idea: Swap the new point in as parent for nearby vertices that can be reached along a shorter path through the new point than through their original (current) parent

52 RRT* RRT RRT* Source: Karaman and Frazzoli

53 RRT* RRT RRT* Source: Karaman and Frazzoli

54 LQR-trees (Tedrake, IJRR 2010)
Idea: grow a randomized tree of stabilizing controllers to the goal
Like RRT
Can discard sample points in the already stabilized region

55 LQR-trees (Tedrake) $C_k$: stabilized region after iteration k

56 LQR-trees (Tedrake)

57 Smoothing
Randomized motion planners tend to find paths that are not so great for execution: very jagged, often much longer than necessary.
→ In practice: do smoothing before using the path
Shortcutting: along the found path, pick two vertices $x_{t_1}$, $x_{t_2}$ and try to connect them directly (skipping over all intermediate vertices)
Nonlinear optimization for optimal control: allows specifying an objective function that includes smoothness in state, control, small control inputs, etc.
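
Shortcutting can be sketched as below. The function signature, the random choice of vertex pairs, and the `segment_free` callback (which stands in for whatever local planner and collision checker the system uses) are assumptions for illustration.

```python
import random


def shortcut(path, segment_free, n_attempts=100, seed=0):
    """Repeatedly try to replace a sub-path by a direct connection.

    segment_free(p, q) is an assumed callback that returns True when p and q
    can be connected directly without collision.
    """
    rng = random.Random(seed)
    path = list(path)
    for _ in range(n_attempts):
        if len(path) < 3:
            break                                   # nothing left to skip
        i, j = sorted(rng.sample(range(len(path)), 2))
        if j - i < 2:
            continue                                # adjacent vertices
        if segment_free(path[i], path[j]):
            path = path[:i + 1] + path[j:]          # drop the vertices in between
    return path


# A made-up jagged path in free space (no obstacles, so every shortcut works).
jagged = [(0.0, 0.0), (0.1, 0.3), (0.2, 0.0), (0.3, 0.3), (0.4, 0.0), (0.5, 0.0)]
smoothed = shortcut(jagged, lambda p, q: True)
blocked = shortcut(jagged, lambda p, q: False)
```

The endpoints are never removed, so the smoothed path still leads from the start to the goal.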
