Sub-Optimal Heuristic Search ARA* and Experience Graphs


Robot Autonomy (16-662, S13)
Lecture #09 (Wednesday, Feb 13, 2013)
Lecture by: Michael Phillips
Sub-Optimal Heuristic Search: ARA* and Experience Graphs
Scribes: S. Selvam Raju and Adam Lebovitz

Contents
1 Introduction
2 Typical approaches to motion planning
3 ARA* Heuristic Search
  3.1 Why the sub-optimal A*
  3.2 ARA* - Re-use of search results
  3.3 ARA* Algorithm
  3.4 Applications: Planning for manipulation
    3.4.1 ARA*-based motion planning for a single 7-DOF arm
    3.4.2 ARA*-based motion planning for dual-arm motion
  3.5 Discussion
4 Experience Graphs
  4.1 Planning with E-Graphs
  4.2 Applications in Mobile Manipulation
5 Conclusion

1 Introduction

These notes introduce planning for mobile manipulation and describe sub-optimal heuristic search algorithms in detail. The drawbacks of optimal heuristic search are also discussed, motivating the need for sub-optimal variants.

Planning for Mobile Manipulation

Many tasks in robotics involve planning. A bartender robot, for example, would need:

- task planning (the order in which to get items)
- planning for navigation
- planning motions for an arm
- planning coordinated full-body motions (base + arms)
- planning how to grasp an object

These notes chiefly deal with the planning algorithms used for arm motions and coordinated full-body motion.

Figure 1: A bartender robot after pouring a drink into a glass.

2 Typical approaches to motion planning

Consider a 20-DOF arm for which each state is defined by 20 discretized joint angles {q1, q2, ..., q20}. Each action for this arm (illustrated in 2D) changes one angle, or a set of angles, at a time. Planning a motion for this arm requires searching the very high-dimensional state space that these angles span: storing it explicitly would require m^20 states, where m is the number of discretized values per joint. Instead, states are usually computed on the fly as the search expands them.

Figure 2: A 20-DOF arm drawn in 2D to illustrate the number of states available for planning.

A few families of methods for this kind of planning are:

1. Planning with optimal heuristic search-based approaches, e.g. A* and D*. Their advantages:
   - they are complete and guarantee optimality
   - they have excellent cost minimization
   - they give consistent solutions to similar motion queries
   However, they are slow and memory intensive, and hence are used mainly for low-dimensional (e.g. 2D {x, y}) path-planning applications.

2. Planning with sampling-based approaches, e.g. RRT, PRM, etc.

These approaches are typically very fast and use little memory, and hence are used for high-dimensional path/motion planning applications. However, they have the following disadvantages:
   - they are complete only in the limit
   - they often have poor cost minimization
   - it is hard to provide solution consistency

3. Planning with sub-optimal heuristic search-based approaches. These can be fast enough for high-dimensional motion planning while returning consistently good-quality solutions. We cover two such searches:
   - ARA* (an anytime version of A*) graph search, which reuses previously computed search results to solve motion planning problems within a preset span of time
   - Experience Graphs, a heuristic search that learns from its planning experiences

3 ARA* Heuristic Search

In real-world planning problems, time for deliberation is often limited. Anytime planners are well suited to such problems: they find a feasible solution quickly and then continually improve it until the (predetermined) time runs out. ARA* is an anytime heuristic search that tunes its performance bound to the available search time. It starts by quickly finding a sub-optimal solution under a loose bound, then tightens the bound progressively as time allows. Given enough time, it finds a provably optimal solution. While improving its bound, ARA* reuses previous search efforts and, as a result, is significantly more efficient than other anytime search methods.

3.1 Why the sub-optimal A*

Dijkstra's algorithm expands states in order of f = g values. This leads to huge memory usage and is not applicable to high-dimensional problems.

A* search expands states in order of f(s) = g(s) + h(s) values. Though it expands far fewer states, it still becomes slow for high-dimensional applications.

Weighted A* expands states in order of f = g + ε·h values, where ε ≥ 1 biases the search toward states that are closer to the goal. The solution becomes less optimal as ε increases: weighted A* trades off optimality for speed, but the cost of the solution it returns is at most ε times the cost of the optimal solution.

A more detailed description of the above algorithms is available in the previous class notes [4].

In the above searches, particularly weighted A*, the results of one search are often similar to those of the next as the path is optimized, so we should be able to reuse them. ARA* is an efficient version of weighted A* that reuses state values across search iterations.

3.2 ARA* - Re-use of search results

ARA* works by executing A* multiple times, starting with a large ε and decreasing ε prior to each execution until ε = 1. As a result, after each search the solution is guaranteed to be within a factor ε of optimal. Running A* from scratch every time we decrease ε, however, would be very expensive; ARA* instead reuses the results of previous searches to save computation.

Figure 3 shows the operation of A* with a heuristic inflated by ε = 2.5, ε = 1.5, and ε = 1 (no inflation) on a simple grid world. An eight-connected grid is used, with black cells being obstacles. S denotes the start state and G the goal state. The cost of moving from one cell to its neighbor is one. The heuristic is the larger of the x and y distances from the cell to the goal. The expanded cells are shown in grey. (A* can stop as soon as it is about to expand the goal state, without actually expanding it; thus the goal state is not shown in grey.) The paths found by these searches are shown with grey arrows. The A* searches with inflated heuristics expand substantially fewer cells than A* with ε = 1, but their solutions are sub-optimal.

Figure 3: Weighted A* searches with decreasing ε [3].
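As a concrete illustration of the f = g + ε·h priority, here is a minimal weighted A* sketch on a 4-connected grid (a toy example for these notes, not the planners discussed in the lecture):

```python
import heapq

def weighted_astar(grid, start, goal, eps=1.0):
    """Weighted A* on a 4-connected grid of 0 (free) / 1 (obstacle) cells.

    With eps = 1 this is plain A*; eps > 1 biases the search toward the goal
    and the returned cost is at most eps times the optimal cost.
    """
    def h(s):  # Manhattan distance: admissible on a 4-connected grid
        return abs(s[0] - goal[0]) + abs(s[1] - goal[1])

    g = {start: 0}
    open_list = [(eps * h(start), start)]
    closed = set()
    while open_list:
        _, s = heapq.heappop(open_list)
        if s == goal:
            return g[s]
        if s in closed:
            continue
        closed.add(s)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            t = (s[0] + dx, s[1] + dy)
            if (0 <= t[0] < len(grid) and 0 <= t[1] < len(grid[0])
                    and grid[t[0]][t[1]] == 0
                    and g[s] + 1 < g.get(t, float("inf"))):
                g[t] = g[s] + 1
                heapq.heappush(open_list, (g[t] + eps * h(t), t))
    return None  # goal unreachable

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(weighted_astar(grid, (0, 0), (2, 3), eps=1.0))  # optimal cost: 5
print(weighted_astar(grid, (0, 0), (2, 3), eps=5.0))  # still <= 5 * 5 = 25
```

Increasing eps typically shrinks the number of expansions dramatically while keeping the cost within the ε bound.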
Figure 4 shows how ARA* reduces the number of states expanded in subsequent iterations as it improves optimality. It uses an ImprovePath function that

Figure 4: ARA* search iterations for the searches shown in Figure 3 [3].

recomputes a path for a given ε. The Main function of ARA* then repeatedly calls ImprovePath with a series of decreasing εs, using the results of previous searches in all subsequent searches to save computation.

3.3 ARA* Algorithm

The ARA* search is divided into two functions: ImprovePath() [Algorithm 1] and Main() [Algorithm 2]. A detailed explanation of these algorithms is available on the class website [3].

Algorithm 1: ImprovePath()

    while f(s_goal) > minimum f-value in OPEN do
        remove s with the smallest [g(s) + ε·h(s)] from OPEN;
        insert s into CLOSED;
        for every successor s' of s do
            if g(s') > g(s) + c(s, s') then
                g(s') = g(s) + c(s, s');
                if s' not in CLOSED then
                    insert s' into OPEN;
                else
                    insert s' into INCONS;
                end
            end
        end
    end

Algorithm 2: Main()

    set ε to a large value;
    g(s_start) = 0; OPEN = {s_start};
    while ε ≥ 1 do
        CLOSED = {}; INCONS = {};
        ImprovePath();
        publish current ε-suboptimal solution;
        decrease ε;
        OPEN = OPEN ∪ INCONS;
    end

3.4 Applications: Planning for manipulation

3.4.1 ARA*-based motion planning for a single 7-DOF arm

[Cohen et al., ICRA '10, ICRA '11] The ARA* search can be used to find a path for a 7-DOF arm from its current state to an object in 3D space [Figure 5]. For this purpose:
- the goal is given as the 6D (x, y, z, r, p, y) pose of the end-effector
- each state is: the 4 non-wrist joint angles {q1, ..., q4} when far from the goal, and all 7 joint angles {q1, ..., q7} when close to the goal
- the actions are: moving one joint at a time by a fixed increment, plus an IK-based motion to snap to the goal when close to it
- the heuristic is: 3D BFS distances for the object

Figure 5: ARA* search used in planning for a single 7-DOF arm picking up a ball.
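The adaptive state representation above can be sketched as follows (a toy illustration with hypothetical names and threshold; the actual planner decides "close to the goal" from the end-effector's workspace distance):

```python
def state_of(joints, dist_to_goal, near_threshold=0.2):
    """Return the search state for a 7-DOF arm configuration.

    Far from the goal only the 4 non-wrist joints are kept, so many arm
    configurations collapse into one low-dimensional state; near the goal
    all 7 joints are kept so the end-effector pose can be matched exactly.
    """
    if dist_to_goal > near_threshold:
        return tuple(joints[:4])   # coarse 4-DOF state
    return tuple(joints)           # full 7-DOF state

far = state_of([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], dist_to_goal=1.0)
near = state_of([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7], dist_to_goal=0.05)
print(len(far), len(near))   # 4 7
```

This keeps the bulk of the search low-dimensional while still reaching exact 6D end-effector poses.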

3.4.2 ARA*-based motion planning for dual-arm motion

[Cohen et al., ICRA '12] The ARA* search is used for coordinated motion of the two arms of the PR2 to pick up a goal object. The following parameters are used [Figure 6]:
- the goal is given as the 4D (x, y, z, yaw) pose of the object
- each state is 6D: the 4D (x, y, z, yaw) pose of the object plus the two elbow angles {q1, q2}
- the actions are: changing the 4D pose of the object, or changing q1 or q2
- the heuristic is: 3D BFS distances for the object

Figure 6: ARA* search used in planning dual-arm motion.

3.5 Discussion

The pros and cons of the ARA* search are listed below.

Pros:
- explicit cost minimization
- consistent planning (similar inputs generate similar outputs)
- can handle arbitrary goal sets (such as a set of grasps from a grasp planner)
- many path constraints are easy to implement (e.g. keeping the gripper upright)

Cons:
- requires good graph and heuristic design to plan fast
- the number of motions available from a state affects speed and path quality
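Algorithms 1 and 2 can be combined into a compact runnable sketch. This is a simplified illustration (generic `succ`/`h` interfaces are assumptions of this sketch, not the lecture's notation): g-values and the remaining OPEN states, together with INCONS, are carried across iterations as ε decreases.

```python
import heapq

def arastar(succ, h, start, goal, eps0=3.0, step=1.0):
    """Simplified ARA*: repeated ImprovePath() calls that reuse g-values and
    the OPEN/INCONS sets across iterations (Algorithms 1 and 2).

    succ(s) yields (neighbor, cost) pairs; h(s) is a consistent heuristic.
    Yields (eps, current solution cost) after each iteration.
    """
    g = {start: 0.0}
    eps = eps0
    open_set = {start}
    while True:
        # ImprovePath(): weighted A* pass; re-discovered closed states -> INCONS
        open_h = [(g[s] + eps * h(s), s) for s in open_set]
        heapq.heapify(open_h)
        closed, incons = set(), set()
        while open_h and open_h[0][0] < g.get(goal, float("inf")):
            _, s = heapq.heappop(open_h)
            if s in closed:
                continue
            closed.add(s)
            for t, c in succ(s):
                if g[s] + c < g.get(t, float("inf")):
                    g[t] = g[s] + c
                    if t in closed:
                        incons.add(t)          # inconsistent: defer to next pass
                    else:
                        heapq.heappush(open_h, (g[t] + eps * h(t), t))
        yield eps, g.get(goal)                 # within a factor eps of optimal
        if eps <= 1.0:
            return
        eps = max(1.0, eps - step)
        open_set = {s for _, s in open_h} | incons   # reuse previous effort

grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
def succ(s):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        t = (s[0] + dx, s[1] + dy)
        if 0 <= t[0] < 3 and 0 <= t[1] < 3 and grid[t[0]][t[1]] == 0:
            yield t, 1.0
goal = (2, 0)
h = lambda s: abs(s[0] - goal[0]) + abs(s[1] - goal[1])
for eps, cost in arastar(succ, h, (0, 0), goal):
    print(eps, cost)   # eps decreases 3.0 -> 1.0; cost stays 6.0 (optimal here)
```

In later iterations most states are already consistent, so each ImprovePath pass expands only a small number of states, which is exactly the saving ARA* provides.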

4 Experience Graphs

The Experience Graph is a sub-optimal heuristic search with the distinct advantage of learning from its previous planning experiences. Many tasks in robotics are repetitive but not exactly the same each time (for example, loading dishes into a dishwasher), so it is advantageous to reuse prior experience to speed up planning for subsequent motions. This method is extremely useful in high-dimensional problems like those found in mobile manipulation.

4.1 Planning with E-Graphs

Planning with an Experience Graph (E-Graph, or G^E) uses a specially designed algorithm with a bias toward searching G^E as much as possible rather than the entire graph G, thereby avoiding the exploration of large portions of G. Given a problem with an explicitly defined start and goal state (s_start, s_goal), the desired output is a path that connects s_start to s_goal [2]. Figure 7 shows a graph G with obstacles colored in black.

Figure 7: Three images of a graph G overlaid with previous paths (experiences) shown in red, green, and blue. These paths can be combined to create an Experience Graph G^E [1].

Previous paths are shown as colored segments which can be put together to create an Experience Graph G^E; the compilation of these paths can be seen in Figure 8. This collection of paths becomes a sub-graph of the original graph G. When given a search query, the planner uses the previously computed paths to speed up planning. The heuristic is stated in equation 1:

    h^E(s_0) = min over paths π of Σ_{i=0}^{N-2} min{ ε^E · h^G(s_i, s_{i+1}), c^E(s_i, s_{i+1}) }    (1)

Figure 8: The compilation of previously computed path experiences pieced together from those seen in Figure 7 [1].

In equation 1, π = ⟨s_0, ..., s_{N-1}⟩ is a path with s_{N-1} = s_goal, and ε^E is a scalar ≥ 1. The heuristic guides the search toward expanding states on the E-Graph within the sub-optimality bound. The heuristic computation finds a minimum-cost path using two kinds of edges: ε^E · h^G(s_i, s_{i+1}) describes traveling off the E-Graph with an inflated heuristic, while c^E(s_i, s_{i+1}) describes traveling on the E-Graph using actual costs. For repetitive tasks, this heuristic makes planning much faster.

The planner based on equation 1 guarantees the following theorems [1]:

1. Completeness with respect to the original graph G: for a finite graph G, the planner terminates and finds a path in G that connects s_start to s_goal if one exists.

2. Bounds on sub-optimality: for a finite graph G, the planner terminates and the solution it returns is guaranteed to be no worse than ε · ε^E times the optimal solution cost in graph G.

It is important to note that these properties hold even if the quality or placement of the experiences is arbitrarily bad.

4.2 Applications in Mobile Manipulation

Planning with E-Graphs has been shown to decrease planning times dramatically compared to weighted A*. In the experiments discussed by M. Phillips et al. [2], a PR2 was used to move objects around a kitchen. The PR2 was planned for with 10 degrees of freedom: 3 DOFs for base navigation, 1 DOF for the spine, and 6 DOFs for the arm.

Figure 9: The heuristic search using the E-Graph. c^E represents path segments using the actual costs of previously generated experiences; h^G represents travel off the E-Graph using the inflated heuristic [1].

The experiment was conducted on 40 goals within the kitchen. The results, compared to weighted A*, are shown in Table 1.

Table 1: Success rates and planning times for mobile manipulation using weighted A* and E-Graphs.

    Method        Success (of 40)   Mean Time (s)   Std. Dev. (s)   Max (s)
    E-Graphs      40                0.33            0.25            1.00
    Weighted A*   37                34.62           87.74           87.74

The following is a list of pros and cons of E-Graph planning.

Pros:
- can lead the search around local minima (obstacles)
- the provided paths can be of arbitrary quality
- can reuse many parts of multiple experiences
- the user can control the bound on solution quality

Cons:
- local minima can be created getting onto the E-Graph
- the E-Graph can grow arbitrarily large
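The heuristic of equation 1 can be sketched directly. This is an illustration only (the names and the 1-D toy states are assumptions of this sketch, and it assumes a symmetric h^G): equation 1 reduces to a shortest-path problem over the goal plus the E-Graph vertices, where any pair of states can be connected at inflated cost ε^E · h^G and E-Graph edges additionally offer their actual cost c^E.

```python
import heapq

def egraph_heuristic(s, goal, egraph_nodes, egraph_edges, hG, epsE):
    """Evaluate the E-Graph heuristic h^E(s) of equation (1).

    hG(a, b): the original heuristic between states a and b (symmetric).
    egraph_edges: dict mapping (a, b) -> actual edge cost c^E on the E-Graph.
    Runs Dijkstra from `goal` over {goal} plus the E-Graph vertices, where
    every pair costs epsE * hG (travel off the E-Graph) and E-Graph edges
    also offer their true cost; h^E(s) is the cheapest way to reach s.
    """
    nodes = [goal] + [n for n in egraph_nodes if n != goal]
    dist = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v in nodes:
            if v == u:
                continue
            c = epsE * hG(u, v)                # travel off the E-Graph
            for e in ((u, v), (v, u)):         # or along an E-Graph edge
                if e in egraph_edges:
                    c = min(c, egraph_edges[e])
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(pq, (d + c, v))
    # connect the query state s to its best entry point
    return min(dist[v] + epsE * hG(s, v) for v in nodes)

# 1-D toy example: states are integers, hG is plain distance
hG = lambda a, b: abs(a - b)
edges = {(0, 10): 10.0}   # one previous experience from state 0 to state 10
print(egraph_heuristic(0, 10, [0, 10], edges, hG, epsE=2.0))  # 10.0 via E-Graph
print(egraph_heuristic(5, 10, [0, 10], edges, hG, epsE=2.0))  # 10.0
```

Note how the experience pays off: from state 0 the inflated heuristic alone would give 2 · 10 = 20, but riding the E-Graph edge at its actual cost yields 10, pulling the search onto the remembered path.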

5 Conclusion

Heuristic search within sub-optimal bounds can be more suitable for high-dimensional manipulation planning than optimal heuristic planning. A sub-optimal approach still provides good cost minimization, consistent behavior, and rigorous theoretical guarantees. Because planning with experience dramatically speeds up search, performance can improve over time with learning.

References

[1] Mike Phillips (adapted from Maxim Likhachev). Sub-optimal heuristic search and its application to planning for manipulation. http://personalrobotics.ri.cmu.edu/courses/16662/notes/discrete-search/heuristic_search.pdf, 2012.

[2] M. Phillips, B. J. Cohen, S. Chitta, and M. Likhachev. E-Graphs: Bootstrapping planning with experience graphs.

[3] Maxim Likhachev, Geoff Gordon, and Sebastian Thrun. ARA*: Anytime A* with provable bounds on sub-optimality. http://personalrobotics.ri.cmu.edu/courses/papers/ara_nips03.pdf, 2003.

[4] Kavya Suresh, Siddhartha Srinivasa, and David Matten. Lecture on discrete search algorithms, Lecture 08, Monday Feb 11, 2013. http://personalrobotics.ri.cmu.edu/courses/16662/notes/discrete-search/16662_lecture08.pdf, 2013.