Can we quantify the hardness of learning manipulation? Kris Hauser Department of Electrical and Computer Engineering Duke University


1 Can we quantify the hardness of learning manipulation? Kris Hauser Department of Electrical and Computer Engineering Duke University

2 Robot Learning!

3 Robot Learning! Google used 14 identical robots: 800,000 real grasp demonstrations, 80% success rate (Levine et al., arXiv 2016)

4 Robot Learning! Berkeley Dex-Net: million synthetic grasps, 80% success rate (Mahler et al., RSS 2017)

5 How much data... can be practically gathered? ~1,000 human-guided demonstrations; ~10,000 autonomous demonstrations (safe problems); ~1 million simulation demonstrations; >1 billion online images. ...is needed to learn well?

6 What to learn? Partial dynamic information Parameters for physics sims Parameters for mathematical models Dynamics Dynamics function Rigid body simulations Partial decisions Simplified grasps Grasp poses Target corrections Decisions Policies Force limit surfaces Pressure distributions Grasp metrics Tracking controllers Optimization margins Trajectories

7 What to learn? Partial dynamic information Parameters for physics sims Parameters for mathematical models Force limit surfaces Pressure distributions Dynamics Dynamics function Rigid body simulations Grasp metrics Partial decisions Simplified grasps Grasp poses Target corrections Tracking controllers Optimization margins Decisions Policies Trajectories. Rely on optimization/planning to achieve manipulation.

8 What to learn? Partial dynamic information Parameters for physics sims Parameters for mathematical models Dynamics Force limit surfaces Dynamics function Rigid body simulations Pressure distributions Grasp metrics Partial decisions Simplified grasps Grasp poses Target corrections Tracking controllers Optimization margins Decisions Policies Trajectories. Learn control directly (may need a perception system, or might learn end-to-end). Zhou and Hauser, RSS 2017 Revisiting Contact workshop.

9 What to learn? Partial dynamic information Parameters for physics sims Parameters for mathematical models Dynamics Dynamics function Rigid body simulations Partial decisions Simplified grasps Grasp poses Target corrections Decisions Policies Tracking controllers Force limit surfaces Pressure distributions Grasp metrics Optimization margins Trajectories. Learning full dynamics / decisions requires less analysis from a human engineer. Typically requires a lot of data.

10 What to learn? Partial dynamic information Parameters for physics sims Parameters for mathematical models Dynamics Dynamics function Rigid body simulations Partial decisions Simplified grasps Grasp poses Target corrections Decisions Policies Tracking controllers Force limit surfaces Pressure distributions Grasp metrics Optimization margins Trajectories. Partial learning combines data with prior knowledge of physics. Typically generalizes better with less data.


11 What to learn? Partial dynamic information Parameters for physics sims Parameters for mathematical models Force limit surfaces Pressure distributions Dynamics Dynamics function Rigid body simulations Grasp metrics Partial decisions Simplified grasps Grasp poses Target corrections Tracking controllers Optimization margins Decisions Policies Trajectories. Partial learning combines data with prior knowledge of physics. Typically generalizes better with less data. Hauser, Wang, and Cutkosky, RSS 2017; Luo and Hauser, RSS 2015 & AuRo 2017.


12 What to learn? Partial dynamic information Parameters for physics sims Parameters for mathematical models Dynamics Dynamics function Rigid body simulations Partial decisions Simplified grasps Grasp poses Target corrections Decisions Policies Tracking controllers Force limit surfaces Pressure distributions Grasp metrics Optimization margins Trajectories. Partial learning combines data with prior knowledge of physics. Typically generalizes better with less data. Hauser ISRR 2013, Rocchi and Hauser 2016: Klamp't simulator.

13 What parameter space? Friction coefficient: 1 parameter $\mu$. Force limit surface: 2 parameters $\theta, \phi$. 2D overhand parallel grasp poses: $(x, y, \theta)$, maybe height and width. Full 6D pose + preshape: $(x, y, z, \theta, \phi, \psi, q_1, \ldots, q_n)$. Objects: occupancy grids $O(x, y, z)$ or depth maps $h(x, y)$. Clutter / distractor objects. Feedback control strategy? Policy gradient methods: fixed policy function or value function class, $V(x)$ or $\pi(x) = \sum_i \theta_i f_i(x)$. Deep methods: network weights. Is there enough latent structure in the data to make learning perform well?

14 Complexity within parameter space Nonlinearity Discontinuous derivatives Discontinuity

15 Complexity within parameter space Nonlinearity Discontinuous derivatives Discontinuity

16 Complexity within parameter space Nonlinearity Discontinuous derivatives Discontinuity. Can complexity be predicted from a problem specification? What algorithm best learns these complex functions? Tables; nearest neighbors; locally weighted regression; linear combinations of feature functions; Gaussian processes; (deep) neural networks.


17 Context Robot optimization tasks: optimal control, trajectory optimization, motion planning, grasp planning, inverse kinematics. Global optimization is much slower than local optimization. Learning to accelerate global optimization using examples [Goldfeder and Allen 2001] [Berenson et al 2012] [Jetchev and Toussaint 2013] [Pan et al 2014] [He et al 2016]

18 Problem space formulation Consider a parameterized family of optimization problems, e.g., legged robot steps to different targets, or objects of different sizes to be grasped. Each problem parameter (P-parameter) $\theta$ maps to an optimization problem, which has a global optimum $x^*(\theta)$ [Hauser, TRO 2017]

19 Formal definition Let an optimization problem in the family $P(\theta)$ be expressed as $\min_x f(x, \theta)$ s.t. $g(x, \theta) \le 0$, $h(x, \theta) = 0$. Then $x^*(\theta)$ is an optimal solution (or nil) and $f^*(\theta)$ is the optimal value (or $\infty$). $f$, $g$, $h$ are assumed to have (bounded) 1st and 2nd derivatives in both $x$ and $\theta$. Parametric programming [Poore and Tiahrt 1987, Jongen and Weber 1990, Bemporad 2000, Buskens and Maurer 2001]

20 Database goodness (similar to PAC-learning) Define a selection function $S(\theta', D)$ that yields one or more example solutions for a novel problem $\theta'$. Adaptation function $A(\theta, x, \theta')$ adapts a prior solution pair $(x, \theta)$ to $\theta'$ (e.g., via local optimization). The $\epsilon$-adaptation region of pair $(x, \theta)$ is the set of $\theta'$ for which $f(A(\theta, x, \theta'), \theta') \le f^*(\theta') + \epsilon$. Learner $L = (D, S, A)$ is $(\alpha, \beta)$-good if no more than a $\beta$ fraction of problem space is left uncovered in the union of $\alpha$-adaptation regions of each example in $D$.
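The goodness definition can be made concrete on a toy problem family. The sketch below is illustrative only: the double-well objective, the gradient-descent adaptation, and all function names are assumptions introduced here, not taken from the talk. It estimates the uncovered fraction $\beta$ for a fixed $\alpha$ by checking, for each test problem, whether any example in the database adapts to within $\alpha$ of the global optimum.

```python
# Toy family (an assumption for illustration): f(x, theta) = (x^2 - 1)^2 + theta * x.
# For |theta| <= 1 it has two local minima, near x = -1 and x = +1.

def f(x, theta):
    return (x * x - 1.0) ** 2 + theta * x

def grad_f(x, theta):
    return 4.0 * x * (x * x - 1.0) + theta

def global_opt(theta, n=2001, lo=-2.0, hi=2.0):
    """Brute-force stand-in for a global optimizer: best point on a dense grid."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=lambda x: f(x, theta))

def adapt(x, theta_new, step=0.05, iters=200):
    """Adaptation function A: local gradient descent on f(., theta_new) seeded at x.
    (The slide's A also receives the example's own theta; this sketch ignores it.)"""
    for _ in range(iters):
        x -= step * grad_f(x, theta_new)
    return x

def uncovered_fraction(database, alpha, test_thetas):
    """Fraction of test problems outside the union of alpha-adaptation regions."""
    uncovered = 0
    for t in test_thetas:
        f_star = f(global_opt(t), t)
        best = min(f(adapt(x, t), t) for (_theta, x) in database)
        if best > f_star + alpha:
            uncovered += 1
    return uncovered / len(test_thetas)

test_thetas = [-1.0 + 2.0 * i / 200 for i in range(201)]
# Examples drawn only from theta > 0 all sit in the left well.
one_sided = [(t, global_opt(t)) for t in (0.25, 0.5, 0.75)]
# Adding examples from theta < 0 puts seeds in the right well as well.
two_sided = one_sided + [(t, global_opt(t)) for t in (-0.25, -0.5, -0.75)]

beta_one = uncovered_fraction(one_sided, 0.05, test_thetas)
beta_two = uncovered_fraction(two_sided, 0.05, test_thetas)
print(f"one-sided database: beta ~ {beta_one:.3f}")
print(f"two-sided database: beta ~ {beta_two:.3f}")
```

In this toy, local adaptation cannot cross between wells, so coverage depends on the database holding at least one example on each side of the boundary rather than on the raw number of examples.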

21 Pendulum swing-up example (figure: target problem; local adaptation succeeds / is suboptimal / fails)

22 Pendulum swing-up example (figure: target problem; local adaptation succeeds / is suboptimal / fails)

23 Pendulum swing-up example (figure: target problem; local adaptation succeeds / is suboptimal / fails)

24 Observations The problem-optimum map is only piecewise continuous and piecewise differentiable. Boundaries occur when the active set changes, and at saddle points in the Lagrangian. This violates the assumptions of traditional function approximation [Cassioli et al 2010, Pan et al 2014]
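A one-dimensional toy makes the piecewise behavior visible. The sketch below is an illustration under assumed names and an assumed objective (a tilted double well, not an example from the talk). As $\theta$ crosses zero the global optimizer jumps between wells, while the optimal value stays continuous and only its slope changes.

```python
# Assumed toy objective: a double well tilted by theta.
def f(x, theta):
    return (x * x - 1.0) ** 2 + theta * x

def global_opt(theta, n=4001, lo=-2.0, hi=2.0):
    """Brute-force global optimum on a dense grid; returns (x*, f*)."""
    best_x, best_f = None, float("inf")
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        v = f(x, theta)
        if v < best_f:
            best_x, best_f = x, v
    return best_x, best_f

# x* switches sign across theta = 0; f* is continuous there but has a kink.
for theta in (-0.2, -0.1, 0.1, 0.2):
    x_star, f_star = global_opt(theta)
    print(f"theta={theta:+.1f}  x*={x_star:+.3f}  f*={f_star:+.4f}")
```

A smooth regressor fit to $x^*(\theta)$ in this toy would interpolate through the jump and predict points near $x = 0$, which are far from optimal for every $\theta$.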

25 Within-region continuity Let $\theta$ be a point with a unique optimum with a unique KKT active set $K$. Results: In a neighborhood of $\theta$, both $f^*(\theta)$ and $x^*(\theta)$ are smooth. Can determine derivatives using the active-set Jacobian $J_K$. Both are bounded as long as $J_K$ is nonsingular. Smoothness extends to problem space regions where $K$ is constant.
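For reference, the standard sensitivity computation behind these results can be written out. This is a sketch under stated assumptions: $c_K$ stacks the equality constraints and the inequality constraints in the active set $K$, and $J_K$ is identified here with the KKT matrix of the active-set problem, which may differ in detail from the talk's definition.

```latex
% Assumption: c_K(x,\theta) = 0 collects h and the active rows of g.
\begin{aligned}
L(x,\lambda,\theta) &= f(x,\theta) + \lambda^\top c_K(x,\theta), \\
\nabla_x L(x^*,\lambda^*,\theta) &= 0, \qquad c_K(x^*,\theta) = 0, \\
\underbrace{\begin{bmatrix} \nabla^2_{xx} L & \nabla_x c_K^\top \\ \nabla_x c_K & 0 \end{bmatrix}}_{J_K}
\begin{bmatrix} \mathrm{d}x^*/\mathrm{d}\theta \\ \mathrm{d}\lambda^*/\mathrm{d}\theta \end{bmatrix}
&= -\begin{bmatrix} \nabla^2_{x\theta} L \\ \nabla_\theta c_K \end{bmatrix}, \\
\frac{\mathrm{d}f^*}{\mathrm{d}\theta} &= \nabla_\theta L(x^*,\lambda^*,\theta).
\end{aligned}
```

The third line follows from differentiating the two optimality conditions with respect to $\theta$. When $J_K$ is nonsingular the linear system has a unique solution, which is where the boundedness of the derivatives comes from.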

26 Analytical Results #1 (assume no equality constraints) Let $A$ just return the same solution. Let $S$ be nearest neighbors. Then $(\alpha, \beta)$-goodness holds with $\alpha \to 0$ and $\beta \to 0$ as $D$ grows increasingly dense. Convergence rate is inversely dependent on: Lipschitz constants; problem space dimensions (exponentially!)
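The trend can be checked empirically on a small example. The following sketch is an assumption-laden illustration, not the analysis from the talk: it uses a one-dimensional toy objective, a nearest-neighbor selection function, and an adaptation function that reuses the stored solution unchanged, then measures the uncovered fraction $\beta$ at a fixed $\alpha$ as the database grows.

```python
# Assumed toy family with a 1-D problem parameter theta in [-1, 1].
def f(x, theta):
    return (x * x - 1.0) ** 2 + theta * x

X_GRID = [-2.0 + 4.0 * i / 2000 for i in range(2001)]

def global_opt(theta):
    """Brute-force stand-in for a global optimizer."""
    return min(X_GRID, key=lambda x: f(x, theta))

def build_database(num_examples):
    """Evenly spaced example problems, each solved to global optimality."""
    thetas = [-1.0 + 2.0 * (i + 0.5) / num_examples for i in range(num_examples)]
    return [(t, global_opt(t)) for t in thetas]

def select_nearest(database, theta):
    """Selection function S: the example whose theta is closest to the query."""
    return min(database, key=lambda ex: abs(ex[0] - theta))

def uncovered_fraction(database, alpha, tests):
    """A returns the stored solution as-is; count queries it misses by more than alpha."""
    bad = 0
    for theta, f_star in tests:
        _, x_reused = select_nearest(database, theta)
        if f(x_reused, theta) > f_star + alpha:
            bad += 1
    return bad / len(tests)

# Test problems with their reference optimal values, computed once.
tests = []
for i in range(201):
    theta = -1.0 + 2.0 * i / 200
    tests.append((theta, f(global_opt(theta), theta)))

for n in (2, 8, 32):
    beta = uncovered_fraction(build_database(n), alpha=0.002, tests=tests)
    print(f"|D|={n:3d}  alpha=0.002  uncovered fraction beta={beta:.3f}")
```

With a single problem parameter, the number of examples needed scales with the inverse of the spacing. In a $d$-dimensional problem space the same spacing needs a number of examples that grows exponentially in $d$, which is the dimensionality dependence the slide flags.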

27 Analytical Results #2 Let $D$ contain an example in each region. Let $A$ obtain the global optimum when seeded with the correct region. Let $S$ select an example in the correct region. Result: a perfect learner ($(0, 0)$-goodness). But the # of partitions is exponential in the # of constraints, and we can't determine regions a priori.

28 Practical results Issues: noise, imperfect selection, undersampling. Effective method: k-Nearest Neighbors Optimization. Try local optimization from all of the k-NN; stop when the first solution is found. Other enhancements: Fast adaptation: just solve active equality + inequality constraints rather than local optimization. Metric learning helps, 2-3% in experiments (incremental LogDet method [Jain 2008]). Sensitivity analysis [Tang and Hauser, 2017]. (Figure: query problem Q and example problems.)
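The k-NN optimization loop can be sketched on a small feasibility problem. Everything specific below is an assumption chosen for illustration: a 2-link planar arm with unit links, joint limits that restrict the elbow to one side, projected gradient descent as the local optimizer, and Euclidean distance between targets as the selection metric. The talk's experiments use different problems and solvers.

```python
import math

L1 = L2 = 1.0
Q_LO = (-math.pi, 0.0)      # assumed joint limits: elbow bends one way only
Q_HI = (math.pi, math.pi)

def fk(q):
    """Forward kinematics of the assumed 2-link planar arm."""
    q1, q2 = q
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2),
            L1 * math.sin(q1) + L2 * math.sin(q1 + q2))

def local_optimize(q_seed, target, step=0.2, iters=500, tol=1e-3):
    """Projected gradient descent on squared end-effector error.
    Returns joint angles within tol of the target, or None on failure."""
    q = list(q_seed)
    for _ in range(iters):
        x, y = fk(q)
        ex, ey = x - target[0], y - target[1]
        if math.hypot(ex, ey) < tol:
            return tuple(q)
        s1, c1 = math.sin(q[0]), math.cos(q[0])
        s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
        # Gradient J^T e, with J the Jacobian of fk.
        g = ((-L1 * s1 - L2 * s12) * ex + (L1 * c1 + L2 * c12) * ey,
             (-L2 * s12) * ex + (L2 * c12) * ey)
        for i in range(2):
            q[i] = min(max(q[i] - step * g[i], Q_LO[i]), Q_HI[i])
    return None

def knn_optimize(database, target, k=3):
    """Seed local optimization from the k nearest examples; stop at first success."""
    neighbors = sorted(database, key=lambda ex: math.dist(ex[0], target))[:k]
    for attempts, (_theta, q_seed) in enumerate(neighbors, start=1):
        q = local_optimize(q_seed, target)
        if q is not None:
            return q, attempts
    return None, len(neighbors)

# Database of (problem, solution) pairs: each sampled configuration solves
# the IK problem whose target is its own end-effector position.
database = []
for i in range(8):
    for j in range(1, 6):
        q = (-math.pi + 2.0 * math.pi * (i + 0.5) / 8, math.pi * j / 6)
        database.append((fk(q), q))

solution, attempts = knn_optimize(database, (1.2, 0.5))
print("solution:", solution, "after", attempts, "attempt(s)")
```

Stopping at the first success keeps the common case cheap, and the remaining neighbors are only tried when the nearest seed fails, for instance by running into a joint limit.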

29 Globally optimal collision-free IK example: 1 million examples. Double-integrator car-like vehicle.

30 Summary Learning solutions (grasp, trajectory, policy): handle discontinuities. Learning value function: handle discontinuous derivatives. Size of dataset needed for performance guarantees is worst-case exponential in the # of constraints and in the dimensionality of the parameter space.

31 What might save us? Exploiting problem structure. Learners that account for piecewise discontinuities. Efficient, reliable massive precomputation. Fast tests of problem formulation complexity.

32 Final comments Successful learning architectures (e.g., CNNs) emerged from familiarity with application structure (vision, NLP). The structure in manipulation problems (and decision problems in general) is relatively unknown and inevitably foreign. Let us endeavor to understand the structure in manipulation first before blindly applying learning.

33 Thanks! Students: Jingru Luo, Gao Tang, Yilun Zhou. Acknowledgements: NSF award #IIS
