Planning in Domains with Cost Function Dependent Actions

Mike Phillips and Maxim Likhachev
Carnegie Mellon University

Content Areas: Constraints, Satisfiability, and Search::Heuristic Search; Robotics::Motion and Path Planning; Reasoning about Plans, Processes and Actions::Planning Algorithms

Abstract

In a number of graph search-based planning problems, the value of the cost function that is being minimized also affects the set of possible actions at some or all of the states in the graph. For example, in path planning for a robot with limited battery power, a common cost function is energy consumption, while the level of remaining energy affects the navigational capabilities of the robot. Similarly, in path planning for a robot navigating dynamic environments, total traversal time is a common cost function, while the current time affects whether a particular transition is valid. In such planning problems, the cost function typically becomes one of the state variables, thereby increasing the dimensionality of the planning problem and, consequently, the size of the graph that represents it. In this paper, we show how to avoid this increase in dimensionality whenever the availability of actions is monotonically non-increasing as the cost function increases. We present three variants of A* search for dealing with such planning problems: a provably optimal version, a suboptimal version that scales to larger problems while maintaining a bound on suboptimality, and finally a version that relaxes our assumption on the relationship between the cost function and the action space. Our experimental analysis on several domains shows that the presented algorithms achieve up to several orders of magnitude speedup over alternative approaches to planning.

Introduction

In certain planning problems, the resource whose usage is being minimized also affects the availability of actions.
Moreover, it affects it in such a way that a smaller resource level leads to the same or a smaller set of available actions. For example, a robot with a limited battery is able to perform fewer and fewer actions, especially actions that involve moving uphill, as its battery level approaches zero. Consequently, minimizing energy consumption is a common cost function in planning for mobile robots. Another example is planning for a downhill racecar that runs on its initial potential energy but loses it to friction as it moves. Yet another example is planning for unpowered gliders: minimizing energy loss allows the glider to stay in the air longer and travel farther.

Copyright © Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

All of these planning problems possess an important property: with every action, the resource, and consequently the set of available actions, is monotonically non-increasing. In planning problems represented as a graph search, this property enables us to remove the value of the remaining resource from the variables defining each state in the search space without affecting the completeness and optimality of the planner (Tompkins 2005). This can be done because the cost function being minimized is equal to the value of the variable being removed, and when planning with an optimal graph search such as A*, we already keep all and only states with the smallest value of the cost function; there is no need to also generate states whose cost function, and consequently remaining resource level, is suboptimal. This, however, is not true for planning with suboptimal graph searches such as weighted A*, which are important tools for achieving real-time performance and scalability to large problems. Unfortunately, dropping the state variable that represents the objective function then leads to the loss of guarantees on completeness and sub-optimality.
In this paper, we develop an optimal search called Cost Function Dependent Actions A* (CFDA-A*) and then its suboptimal version, called Weighted CFDA-A*, which does allow us to drop the state variable that represents the cost function without sacrificing completeness or sub-optimality bounds. Furthermore, we extend the algorithm to problems in which our assumption on the relationship between the cost function and the action space may sometimes be violated. The theoretical analysis shows that all three variants of CFDA-A* possess all the properties normally expected of A* and weighted A*. Our experimental analysis on two different domains shows that CFDA-A* and its variants can be orders of magnitude faster than searching the original state space with A* or its weighted variant.

Related Work

Somewhat related to our work is the existing work on planning in domains with state dominance relations (Fujino and Fujiwara 1990; Bhattacharya, Roy, and Bhattacharya 1998; Yu and Wah 1988). A state s dominates another state s′ if knowing that s is reachable guarantees that s′ cannot be in the

optimal solution. In other words, the cost of an optimal solution that passes through s must be guaranteed to be less than that of an optimal solution that passes through s′. This concept allows dominated states to be pruned, reducing the number of states to be examined. Most of the work on planning with state dominance relations, however, was not done in the context of heuristic search-based planning. Within the area of planning with heuristic searches, several graph searches have been developed that exploit dominance relations to gain speedups (Tompkins 2005; Korsah, Stentz, and Dias 2007; Gonzalez and Stentz 2005; Xu and Yue 2009). For instance, similarly to our algorithm, the ISE framework (Tompkins 2005) reduces the dimensionality of a graph search problem by removing the cost function from the set of state variables under certain conditions. DD* Lite (Korsah, Stentz, and Dias 2007), on the other hand, does not remove any dimensions but prunes dominated states as it discovers them during the search. The other algorithms (Gonzalez and Stentz 2005; Xu and Yue 2009) exploit state dominance relations to speed up a heuristic search in the context of specific planning domains. To the best of our knowledge, however, none of these algorithms address the use of suboptimal heuristic searches such as weighted A*, which are highly beneficial in scaling to large planning problems and/or achieving real-time performance (Gaschnig 1979; Zhou and Hansen 2002; Likhachev, Gordon, and Thrun 2003; Bonet and Geffner 2001). Filling this gap is the main contribution of our paper.

Notations and Assumptions

We assume the planning problem is represented as a search over a directed graph G = (S, E), where S is the state space, or set of vertices, and E is the set of directed edges in the graph. We use A(s), s ∈ S, to denote the set of actions available at state s. The function succ(s, a) takes a state s and action a and returns the resulting state. In other words, (s, succ(s, a)) corresponds to an edge in E.
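To make this notation concrete, here is a minimal sketch of such a graph interface for the battery-limited robot discussed below; the terrain, costs, and battery capacity are all invented for illustration.

```python
# Hypothetical battery-robot domain: states are (cell, energy_used), so the
# cost function (energy consumed so far) is itself the state variable x_d
# that gates which actions are available.

# Energy needed to climb each edge of a tiny 4-cell terrain (invented values).
CLIMB_COST = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 5.0}
BATTERY_CAPACITY = 6.0

def A(state):
    """A(s): actions available at s. A climb is only possible while enough
    battery remains, so the action set shrinks monotonically as x_d grows."""
    cell, energy_used = state
    remaining = BATTERY_CAPACITY - energy_used
    return [(cell, nxt) for (c, nxt), e in CLIMB_COST.items()
            if c == cell and e <= remaining]

def succ(state, action):
    """succ(s, a): apply action a = (cell, next_cell) to state s."""
    cell, energy_used = state
    c, nxt = action
    return (nxt, energy_used + CLIMB_COST[(c, nxt)])
```

For a fixed cell, a larger x_d (energy used) can only shrink the returned action set, which is exactly the monotonicity assumption stated below: for example, the steep climb (2, 3) is available at (2, 0.0) but not at (2, 2.0).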
We assume strictly positive costs: c(s, s′) > 0 for every (s, s′) ∈ E. The objective of the search is to find a least-cost path from state s_start to state s_goal (or to a set of goal states). For any s, s′ ∈ S, we define c*(s, s′) to be the cost of a least-cost path from state s to state s′. We denote by x_d the state variable that corresponds to the cost function, and by x_i the set of all other state variables. To model the fact that one of the state variables is the cost function, we define the set of state variables x as x = {x_i, x_d}, where x_d = c*(s_start, s). The main assumption we make is that for any given x_i, the set of actions is monotonically non-increasing as the cost increases. Formally, for any s, s′ ∈ S such that x_i(s) = x_i(s′), if x_d(s) < x_d(s′), then A(s′) ⊆ A(s) (Assumption 1). An example of when this assumption holds is planning for a robot with a limited battery: the cost function is the amount of energy consumed, and as the battery is used up there are fewer actions the robot can perform. For instance, there may be steeper parts of the terrain that can only be climbed if the robot has enough charge left.

1  g(ŝ_start) = 0; OPEN = ∅;
2  insert ŝ_start into OPEN with f(ŝ_start) = h(ŝ_start);
3  while (ŝ_goal is not expanded)
4      remove ŝ with the smallest f-value from OPEN;
5      s = {x_i(ŝ), g(ŝ)};
6      for each a in A(s)
7          s′ = succ(s, a);
8          ŝ′ = {x_i(s′)};
9          if ŝ′ was not visited before then
10             f(ŝ′) = g(ŝ′) = ∞;
11         if g(ŝ′) > g(ŝ) + c(s, s′)
12             g(ŝ′) = g(ŝ) + c(s, s′);
13             f(ŝ′) = g(ŝ′) + h(ŝ′);
14             insert ŝ′ into OPEN with f(ŝ′);

Figure 1: Optimal CFDA-A*

Optimal CFDA-A*

In A*, discovered states s are put into a priority queue called the OPEN list and are sorted by the function f(s) = g(s) + h(s), where g(s) is the cost of the best path found so far from the start state to s, and h(s) is a heuristic estimate of the cost from s to the goal. The heuristic must not overestimate the actual cost to the goal and must satisfy the triangle inequality.
Initially, the start state is put into OPEN. On each iteration, the state s with the lowest f(s) is removed from OPEN and expanded. During expansion, A* iterates over the successors of s. If any of these states has not been discovered previously, or if a cheaper path to it has been found, it is inserted (or re-inserted) into OPEN with its updated f-value. Every state expanded by A* has the optimal (minimum) g-value; therefore, when the goal is expanded, A* has found an optimal solution. In optimal A* with such a heuristic, a state cannot be re-expanded, since a shorter path to a state cannot be found after the state has been expanded. Therefore, it is not necessary to maintain a list of states that have already been expanded (often called the CLOSED list). An optimal version of CFDA-A*, shown in Figure 1, is basically an A* search except that it is executed on a new graph Ĝ, which has a reduced state space Ŝ. In Ŝ, the dimension x_d is removed, and therefore each state ŝ ∈ Ŝ is defined by x_i alone. In other words, the search maps each state s = {x_i, x_d} in the original state space S onto a state ŝ = {x_i} in the new state space Ŝ (line 8), and maps each state ŝ ∈ Ŝ onto the s ∈ S with the smallest g-value that is reached during the graph search (line 5). The transitions between ŝ and ŝ′ are defined as the transitions between the corresponding states s and s′ in the original state space (line 7).

Theorem 1. The algorithm terminates, and when it does, the found path from ŝ_start to ŝ_goal corresponds to the least-cost path from s_start to s_goal in S.

Proof sketch: The path p̂ in Ĝ from ŝ_start to ŝ_goal corresponds to the path p in G from s_start to s_goal obtained by taking each state ŝ_k = {x_i} on p̂ and making it s_k = {x_i, g(ŝ_k)} on p. Assume p̂ does not correspond to a least-cost path p*. Then p* must contain a state s that CFDA-A* did not consider, or else it would have found the same path. However, the only states from S that CFDA-A* ignores are ones that have a larger (less optimal) x_d.
In other words, there is a cheaper path to x_i(s) than the one in p*. More detailed proofs can be found in our tech report (Phillips and Likhachev 2011).

[Figure 2 graphic: two small graphs, each containing an edge out of s′ that exists only while g(s′) < α; the second graph also offers an alternative long path to the goal.]

Figure 2: Running normal weighted A* on this graph may return no solution if s′ is expanded sub-optimally, such that its g-value exceeds α. For a similar reason, the suboptimality bound can be violated on this graph if the cost of the long path is greater than ɛ times the minimum cost of the lower path.

Weighted CFDA-A* with bounded sub-optimality

Weighted A* is a common generalization of the A* algorithm that inserts states s into the OPEN list according to f(s) = g(s) + ɛ·h(s) (Pohl 1970). The scalar ɛ ≥ 1 inflates the heuristic. As ɛ increases, the f function becomes more heavily driven by the heuristic component and will often find a solution by expanding far fewer states. While the solution is no longer guaranteed to be optimal, its cost is guaranteed to be no more than ɛ times the cost of an optimal solution (Pohl 1970). While normal weighted A* allows a state to be expanded more than once (whenever a shorter path to the state is found), it turns out that the same guarantees can be provided even if each state is allowed to be expanded only once (Likhachev, Gordon, and Thrun 2003). This limit helps bound the run-time complexity; without it, the number of state expansions can be exponential in the number of states. Running CFDA-A* as described in the previous section, but with the inflated heuristic, destroys its guarantees on completeness and solution quality. Figure 2 demonstrates this problem. The issue occurs when a state s′ is expanded sub-optimally (due to the inflated heuristic) and its suboptimal g-value leads to the absence of a critical action out of s′. Figure 2 also shows how the ɛ bound on the solution quality may be violated.
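The failure mode of Figure 2 can be reproduced on a tiny hypothetical graph (all node names, costs, heuristic values, and the α-like threshold below are invented for illustration): plain weighted A* run on the reduced, cost-dimension-free graph, with each state expanded at most once, misses the only solution once the heuristic is inflated.

```python
import heapq

# Toy graph in the spirit of Figure 2: the edge T->G only exists while
# g(T) < 3, i.e. action availability depends on the cost accumulated so far.
H = {'S': 3, 'B': 2, 'T': 1, 'G': 0}  # a consistent heuristic (invented)

def edges(node, g):
    """Successors of `node` given the cost-so-far g (cost-dependent edge)."""
    out = {'S': [('T', 5), ('B', 1)], 'B': [('T', 1)], 'T': [], 'G': []}[node]
    if node == 'T' and g < 3:
        out = out + [('G', 1)]
    return out

def weighted_astar(eps):
    """Plain weighted A*, each state expanded at most once, run directly
    on the reduced (cost-dimension-free) graph."""
    g = {'S': 0}
    open_list = [(eps * H['S'], 'S')]
    closed = set()
    while open_list:
        _, s = heapq.heappop(open_list)
        if s in closed:
            continue
        closed.add(s)
        if s == 'G':
            return g['G']
        for t, c in edges(s, g[s]):
            if t not in closed and g[s] + c < g.get(t, float('inf')):
                g[t] = g[s] + c
                heapq.heappush(open_list, (g[t] + eps * H[t], t))
    return None  # no path found
```

With ɛ = 1 the search returns the optimal cost 3 via S-B-T-G. With ɛ = 10 it expands T first along the expensive direct edge (g(T) = 5), the cost-dependent edge to G disappears, T is never re-expanded, and no solution is found.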
The example shows that if there is more than one path to the goal, and the lower-cost path is not found due to sub-optimal expansions, then the algorithm will find a longer path to the goal and terminate, regardless of how much costlier that path is. Weighted CFDA-A*, shown in Figure 3, solves this problem by introducing two versions of each state ŝ: an optimal version and a sub-optimal version. To achieve this, we redefine ŝ as ŝ = {x_i, o}, where o is a boolean variable representing the optimality of ŝ. The invariant that Weighted CFDA-A* maintains is that when expanding any state ŝ with o(ŝ) = true, the algorithm guarantees that g(ŝ) is optimal; similarly, when expanding any state ŝ with o(ŝ) = false, the algorithm guarantees that g(ŝ) is no more than ɛ times the optimal g-value. When expanded, optimal states generate both optimal and sub-optimal successors, but sub-optimal states generate only sub-optimal successors. Sub-optimal states go into the OPEN list with the usual f = g + ɛ·h, but optimal states go in with an inflated f = ɛ·(g + h). The following can be said about this ordering. First, sub-optimal states tend to come out earlier than optimal ones, because optimal states have both terms of the f-value inflated instead of just the heuristic. This is beneficial for runtime, because a sub-optimal search usually finds a path to the goal much faster due to the drive of the inflated heuristic. Second, the optimal states, with respect to each other, are expanded in exactly the same order as they would be in an optimal A* search: their f-values are merely scaled by a positive constant, which has no effect on the relative ordering of the optimal states.
This property restores the completeness guarantee of our algorithm: even if the sub-optimal states get blocked from the goal, as in Figure 2, the optimal states will be expanded in exactly the order they would be in the optimal algorithm and will therefore get past this trap. As soon as an optimal state is expanded at the blocking point, it generates optimal and sub-optimal successors past the block, and the sub-optimal states can again move quickly towards the goal. For the same reasons, this approach also restores the bound on sub-optimality, as stated in Theorem 2. Figure 3 shows the weighted CFDA-A* variant. There are a few differences from the pseudocode of optimal CFDA-A* explained previously. The addition of a CLOSED list on lines 1, 4, and 13 ensures that a state can be expanded at most once. On lines 6, 9, and 10 we see that if an optimal state is being expanded, we generate both optimal and suboptimal versions of each successor, whereas if a suboptimal state is being expanded, only suboptimal successors are generated. Also, lines 15-18 now reflect how the f-value is set in one of two ways, depending on whether the successor is optimal or not.

1  g(ŝ_start) = 0; O(ŝ_start) = true; OPEN = ∅; CLOSED = ∅;
2  insert ŝ_start into OPEN with f(ŝ_start) = ɛ·h(ŝ_start);
3  while (ŝ_goal is not expanded)
4      remove ŝ with the smallest f-value from OPEN and insert ŝ in CLOSED;
5      s = {x_i(ŝ), g(ŝ)};
6      if O(ŝ) = true then Opt = {true, false}, else Opt = {false};
7      for each a in A(s)
8          s′ = succ(s, a);
9          for o in Opt
10             ŝ′ = {x_i(s′), o};
11             if ŝ′ was not visited before then
12                 f(ŝ′) = g(ŝ′) = ∞;
13             if g(ŝ′) > g(ŝ) + c(s, s′) and ŝ′ ∉ CLOSED
14                 g(ŝ′) = g(ŝ) + c(s, s′);
15                 if O(ŝ′)
16                     f(ŝ′) = ɛ·(g(ŝ′) + h(ŝ′));
17                 else
18                     f(ŝ′) = g(ŝ′) + ɛ·h(ŝ′);
19                 insert ŝ′ into OPEN with f(ŝ′);

Figure 3: Weighted CFDA-A*

Theorem 2. The algorithm terminates, and when it does, the found path from ŝ_start to ŝ_goal corresponds to a path from s_start to s_goal that has a cost no greater than ɛ·c*(s_start, s_goal).

Proof: If we can show that the path from ŝ_start to ŝ_goal has a cost no greater than ɛ·c*(ŝ_start, ŝ_goal), then we can apply Theorem 1 to prove the claim. Let g*(ŝ) = c*(ŝ_start, ŝ). Assume, for the sake of contradiction, that upon expansion of ŝ_goal, g(ŝ_goal) is greater than ɛ times the optimal g-value (g(ŝ_goal) > ɛ·g*(ŝ_goal)). Then g(ŝ_goal)/ɛ > g*(ŝ_goal), because ɛ is positive. Now, any state ŝ on an optimal path satisfies g*(ŝ) + h(ŝ) ≤ g*(ŝ_goal). By substitution, g*(ŝ) + h(ŝ) < g(ŝ_goal)/ɛ, and therefore ɛ·(g*(ŝ) + h(ŝ)) < g(ŝ_goal). This ɛ·(g + h) is exactly how optimal states are sorted in the OPEN list. Since ŝ_start is in the OPEN list from the beginning with its optimal g-value, by induction the optimal version of ŝ_goal would be discovered and expanded before any version of ŝ_goal with g(ŝ_goal) > ɛ·g*(ŝ_goal). Therefore, when the goal state is expanded, its g-value is within the bound, which is a contradiction.

The following theorem characterizes the complexity of Weighted CFDA-A*: in effect, it reduces the dimension x_d to a binary value, since for every x_i we now have only two states, an optimal state and a suboptimal state.

Theorem 3. For every collection of states with the same x_i in S, Weighted CFDA-A* performs at most two expansions.

[Figure 4 graphic: top graph with states s1 through s5 plus duplicates s2′ and s5′, annotated with heuristic values decreasing toward the goal; bottom graph with dimension x_d removed and a g-dependent edge out of s2.]

Figure 4: The original graph (top) is reduced (bottom) by removing dimension x_d. In this example, running normal weighted A* on the reduced graph will return no solution because of the edge that depends on the g-value. Weighted CFDA-A* resolves this problem.
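Under the same toy-graph assumptions as before, the two-copy scheme of Figure 3 can be sketched as follows; this is an illustrative sketch (the graph, costs, and heuristic are invented), not the paper's exact pseudocode.

```python
import heapq

# Weighted CFDA-A* sketch: each reduced state gets an "optimal" copy
# (o=True, sorted by eps*(g+h)) and a "suboptimal" copy (o=False,
# sorted by g + eps*h). The toy graph repeats the earlier failure case:
# the edge T->G exists only while g(T) < 3.
H = {'S': 3, 'B': 2, 'T': 1, 'G': 0}

def edges(node, g):
    out = {'S': [('T', 5), ('B', 1)], 'B': [('T', 1)], 'T': [], 'G': []}[node]
    if node == 'T' and g < 3:
        out = out + [('G', 1)]
    return out

def wcfda_astar(eps):
    g = {('S', True): 0, ('S', False): 0}
    open_list = [(eps * H['S'], 'S', True), (eps * H['S'], 'S', False)]
    closed = set()
    while open_list:
        _, s, o = heapq.heappop(open_list)
        if (s, o) in closed:
            continue
        closed.add((s, o))
        if s == 'G':
            return g[(s, o)]
        # Optimal states generate both copies; suboptimal states only
        # generate suboptimal copies.
        for o2 in ({True, False} if o else {False}):
            for t, c in edges(s, g[(s, o)]):
                if (t, o2) in closed:
                    continue
                if g[(s, o)] + c < g.get((t, o2), float('inf')):
                    g[(t, o2)] = g[(s, o)] + c
                    f = (eps * (g[(t, o2)] + H[t]) if o2
                         else g[(t, o2)] + eps * H[t])
                    heapq.heappush(open_list, (f, t, o2))
    return None
```

Here wcfda_astar(10) still returns the cost-3 solution: after the suboptimal copies dead-end at T with g(T) = 5, the optimal copies, ordered by ɛ·(g + h), reach T with its optimal g-value of 2, re-enabling the cost-dependent edge to G.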
Example. Using the example in Figure 4, we demonstrate how our algorithm avoids the pitfalls shown in Figure 2 by means of optimal states. We assume the heuristic is inflated by some fixed ɛ > 1; the heuristic function is simply the number of edges to the goal. The top graph represents the original graph. The states grouped in dotted boxes have the same x_i but different x_d. In the bottom graph, this dimension has been removed. Thus, s2 and s2′ have become the same state, as have s5 and s5′. However, this has created a cost-dependent edge between s2 and s3. The start state is s1 and the goal is s5. Weighted CFDA-A* first expands s1^opt, s1^sub, and s2^sub, where the superscript opt denotes an optimal state and sub denotes a suboptimal state. Since the g-value of s2^sub at the time of its expansion is already too large for the cost-dependent edge, it does not generate any successors; the suboptimal copy reached along the other path also generates no successors, because s2^sub has already been expanded. If we were running regular weighted A*, the search would have ended here and declared that there is no solution. However, in Weighted CFDA-A*, s1^opt and s2^opt are still in the OPEN list. s1^opt is expanded first, updating the g-value of s2^opt to its optimal value. Then, when s2^opt is expanded, it generates both optimal and sub-optimal versions of state s3. Now the lower-priority sub-optimal states are again expanded, starting with s3^sub and followed by the goal state s5^sub.

Weighted CFDA-A* with Proper Intervals

The final extension of our algorithm allows it to work on problems where Assumption 1 may sometimes be violated. We assumed that for any pair of states s = {x_i, x_d}, s′ = {x_i′, x_d′}, if x_i = x_i′ and x_d < x_d′, then A(s′) ⊆ A(s) in G, and since s′ has a worse g-value, we can replace both of these states with a single ŝ in Ĝ. This is not true if for some x_i the valid values of x_d do not form one contiguous interval, because they are separated by invalid states in G. In these cases, the broken assumption leads to incompleteness. In Figure 5, s1 represents the reachable state with the smallest g-value at its location.
When expanding this state, the action that moves to x_i2 generates only s2 but not s3. In some scenarios, the later interval may be the only way to the goal, and not generating this successor causes incompleteness. However, if the planning problem includes a spend action that increments x_d without changing x_i, then CFDA-A* regains its guarantees. In Figure 5, this allows s1 to reach a higher (downward in the figure) g-value before executing the action, so that it can generate both s2 and s3. We now define a proper interval as a maximal contiguous series of valid values of x_d for a particular x_i. In Figure 5, location x_i2 has two proper intervals. With the addition of a spend operation that allows positive movement along x_d, the only state in a proper interval that matters is the one with the smallest g-value. We can therefore augment the states of Ĝ with a variable that identifies the proper interval for the given x_i. More formally, a state is now defined as ŝ = {x_i, p, o}, where p is the proper interval index. In Figure 5(c), the intervals have been converted into a graph. There are two states for x_i2, with interval index p being 1 and 2. State s1 has an action that goes to s2 if g(s1) < 4; the same action always goes to s3, but sometimes requires a spend operation to get past the collision period. With the intervals and the spend operation, we once again have the guarantee that the set of successors is monotonically non-increasing as the

cost increases.

[Figure 5 graphic: (a) three locations x_i1, x_i2, x_i3 with a train crossing x_i2 between t = 4 and t = 6; (b) the valid intervals for each location; (c) the resulting graph with states s1 (x_i1, p=1), s2 (x_i2, p=1), s3 (x_i2, p=2), and s4 (x_i3, p=1).]

Figure 5: (a) A scenario with three locations (x_i1, x_i2, x_i3) and a train which arrives at x_i2 at time 4 and leaves at time 6. (b) The intervals shown for the three locations. Since x_i1 and x_i3 are free for all time, they each have one interval (and therefore one state). Location x_i2 has a safe interval before the train passes (t < 4) and after (t > 6), resulting in two states for this location (s2, s3). (c) The intervals converted into a graph with a cost function dependent action. The edge (s1, s2) only exists if g(s1) < 4. The edge going to s3 always exists, due to the spend operation.

Figure 6 shows the algorithm for CFDA-A* with proper intervals. On line 9, we see that after getting a successor state s′ ∈ S, we iterate over the proper intervals of its x_i. After forming ŝ′ for an interval, on line 14 we compute c_spend, the amount by which the g-value needs to be incremented before executing action a in order to arrive in interval p at the smallest possible g-value. This value may not exist (lines 15-16), either because the interval we are trying to reach ends before g(ŝ) + c(s, s′) (we have already missed it), or because arriving in interval p at its smallest g-value would require incrementing g(ŝ) past the end of its own proper interval before executing the action. If such a c_spend does exist, then on lines 17-18 it is used as an additional cost in the g-value update step.

Theorem 4. The algorithm terminates, and when it does, the found path from ŝ_start to ŝ_goal corresponds to a path from s_start to s_goal that has a cost no greater than ɛ·c*(s_start, s_goal).

Proof sketch: We can apply Theorem 2 if Assumption 1 holds. For some s, s′ ∈ S with x_i = x_i′ and x_d < x_d′, s does not necessarily have a superset of the actions of s′, because an action a could lead to an invalid state from s but a valid one from s′, due to invalid intervals along the minimized dimension.
This is remedied by adding the proper interval index to the state, together with a spend operation, so that state s can obtain all the successors of s′ by spending until it reaches s′ before executing a.

1  g(ŝ_start) = 0; O(ŝ_start) = true; OPEN = ∅; CLOSED = ∅;
2  insert ŝ_start into OPEN with f(ŝ_start) = ɛ·h(ŝ_start);
3  while (ŝ_goal is not expanded)
4      remove ŝ with the smallest f-value from OPEN and insert ŝ in CLOSED;
5      s = {x_i(ŝ), g(ŝ)};
6      if O(ŝ) = true then Opt = {true, false}, else Opt = {false};
7      for each a in A(s)
8          s′ = succ(s, a);
9          for each proper interval p in x_i(s′)
10             for o in Opt
11                 ŝ′ = {x_i(s′), p, o};
12                 if ŝ′ was not visited before then
13                     f(ŝ′) = g(ŝ′) = ∞;
14                 c_spend = extra increment before a to arrive in p with the smallest g-value;
15                 if c_spend does not exist
16                     continue;
17                 if g(ŝ′) > g(ŝ) + c(s, s′) + c_spend and ŝ′ ∉ CLOSED
18                     g(ŝ′) = g(ŝ) + c(s, s′) + c_spend;
19                     if O(ŝ′)
20                         f(ŝ′) = ɛ·(g(ŝ′) + h(ŝ′));
21                     else
22                         f(ŝ′) = g(ŝ′) + ɛ·h(ŝ′);
23                     insert ŝ′ into OPEN with f(ŝ′);

Figure 6: Weighted CFDA-A* with Proper Intervals

[Figure 7 graphic: two log-scale plots comparing Weighted A* and Weighted CFDA-A* over various ɛ.]

Figure 7: Planning for a limited battery, comparing the full state space and our reduced state space over various ɛ. Average expansions (on a log scale). Average seconds (on a log scale). When a curve drops below the axis break, the planning time was too small to measure.

Experimental Analysis

We performed simulation experiments in two domains to show the benefits of using our algorithms.

Robot with a Limited Battery

The domain of the robot with a limited battery exhibits the benefits of Weighted CFDA-A* (without the need for proper intervals). The original state space of this domain is (x, y, energy consumed). There is an energy consumption limit (at which the battery is exhausted). The environments are randomly generated fractal costmaps, commonly used to model outdoor environments (Yahja et al. 1998).
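Returning briefly to line 14 of Figure 6, the c_spend computation can be sketched as a small helper; the interval representation here (numeric [lo, hi] bounds per proper interval) is an assumption of this sketch, not the paper's data structure.

```python
def c_spend(g, hi_cur, c, lo_tgt, hi_tgt):
    """Extra 'spend' cost (line 14 of Figure 6) applied before executing an
    action of cost c, so that we arrive in the target proper interval
    [lo_tgt, hi_tgt] at the smallest possible g-value.

    g:      current g-value of the state being expanded
    hi_cur: end of the state's own proper interval
    Returns None when no valid spend exists."""
    if g + c > hi_tgt:          # target interval already missed
        return None
    spend = max(0.0, lo_tgt - (g + c))
    if g + spend > hi_cur:      # spending would leave the current interval
        return None
    return spend
```

For instance, with the train scenario of Figure 5 and a hypothetical action cost of 1: from g = 2 the second safe interval [6, ∞) requires spending 3.0 first, while from g = 5 the first interval (ending at 4) is already unreachable and the helper returns None.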
The robot had to move from the upper left corner to the lower right. The cost of a cell ranges up to 6, and the cost of traversing a cell is the distance traveled (in cells) times the cost of the cell; this cost is taken to be the energy consumed by that action. The heuristic used was Euclidean distance. We ran two sets of experiments. The first experiments compare the full state space

weighted A* against our Weighted CFDA-A*. We generated 5, 5-by-5 random fractal environments. For much larger environments, the full state space search did not find solutions most of the time (especially for low ɛ). Figure 7 shows the average number of expansions for various values of ɛ on a log scale. CFDA-A* performs substantially fewer expansions for smaller ɛ. After ɛ = 4, the two are similar, because the cost function becomes so dominated by the heuristic (particularly because the maximum cell cost is only 6) that both methods end up expanding a straight line from the start to the goal. Figure 7 also shows that the actual planning times exhibit roughly the same speedup as the expansions; after ɛ = 4 the planning times become too small to measure. Our second experiment shows the advantage of using Weighted CFDA-A* over optimal CFDA-A*. To test this, we compared Weighted CFDA-A* (with an inflated ɛ) against the optimal version on maps of varying size, up to 49-by-49. For each map size we had 5 trials (to average out noise), across 7 different map sizes. Figure 8 shows the average number of expansions for the various map sizes.

[Figure 8 graphic: average expansions and planning time vs. map side length for Weighted CFDA-A* with an inflated ɛ and with the optimal setting.]

Figure 8: Planning for a limited battery with the reduced state space, comparing an inflated ɛ against the optimal search over various map sizes. Average expansions. Average seconds.

[Figure 10 graphic: average expansions (log scale) vs. ɛ for the four planner/environment combinations.]

Figure 10: Average expansions (log scale) for indoor (black) and outdoor (gray) dynamic environments, comparing the full state space (dashed lines) to CFDA-A* (solid lines) for varying ɛ.

Figure 9: Examples of the random indoor and outdoor maps used for planning in dynamic environments.
This plot shows that the inflated heuristic produces a significant speedup over optimal planning, in spite of the fact that the state space is twice as large (optimal and sub-optimal states) in order to maintain the guarantees. Figure 8 shows similar results for the corresponding planning times.

Planning in Dynamic Environments

Planning in dynamic environments is the problem of finding a collision-free path from a start location and time to a goal location. There are both static and moving obstacles in the environment, and in order to find an optimal path, planning must be done in the space-time state space. Also, for many robots the direction they are facing affects where their motions take them, so we have incorporated θ into the search space as well. The planning in dynamic environments domain requires proper intervals in order for our weighted A* variant to maintain its guarantees; this experiment shows the benefits of using Weighted CFDA-A* with proper intervals. The original state space is (x, y, θ, time). The maps we ran on were 5-by-5 with θ discretized into 6 angles and time discretized into fixed-length timesteps. We assumed the trajectories of the dynamic obstacles were known in advance. For each map, the dynamic obstacles followed randomly generated time-parameterized paths. We ran two types of experiments, on indoor maps and on outdoor maps. The indoor maps were comprised of randomly generated narrow hallways and rooms; the halls were narrow enough that some dynamic obstacles filled the entire width and could not be passed by the robot. An example of this type of environment is shown in Figure 9. The outdoor maps were more open, with part of the map covered by circles of varying size; an example is also shown in Figure 9. A backward 2D Dijkstra search that accounted for static obstacles (but not dynamic obstacles) was used as the heuristic.
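A backward Dijkstra heuristic of this kind can be sketched as a standard search from the goal over the static 2D grid; the grid encoding and cost convention below are assumptions of this sketch.

```python
import heapq

def dijkstra_heuristic(grid, goal):
    """Backward Dijkstra from the goal over the static 2D map, yielding an
    admissible heuristic for the higher-dimensional (x, y, theta, time)
    search. Grid cells hold traversal costs; None marks static obstacles."""
    rows, cols = len(grid), len(grid[0])
    h = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if d > h.get((r, c), float('inf')):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                nd = d + grid[nr][nc]
                if nd < h.get((nr, nc), float('inf')):
                    h[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return h  # h[(r, c)] = cost of a cheapest static path to the goal

# Tiny invented map: uniform cost 1, one static obstacle in the middle.
grid = [[1, 1, 1],
        [1, None, 1],
        [1, 1, 1]]
h = dijkstra_heuristic(grid, (0, 0))
```

Because the table ignores dynamic obstacles, it never overestimates the true space-time cost; cells unreachable through the static map simply get no entry (an infinite heuristic).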
Our experiments compared a planner on the original full state space against Weighted CFDA-A* with proper intervals. The black curves in Figure 10 show the average expansions for the indoor trials across various values of ɛ. The plot shows substantially fewer expansions, not only for the optimal case but also for weighted CFDA-A*. It is also important to note that Weighted CFDA-A* runs significantly faster than the optimal CFDA-A* search. The gray curves in Figure 10 show the average expansions for the outdoor trials across various values of ɛ. There is an improvement here as well, though it is not as dramatic. This is because in the indoor case the robot must very often wait for obstacles to pass (because of the tight hallways). For our algorithm, a wait (i.e., spend

operation) of arbitrary length is handled intelligently with a single successor, whereas for the full state space, waiting in place requires many expansions along the time dimension (one for each timestep). In the outdoor case this matters less, because the environment is so open that it is almost always faster to dodge around dynamic obstacles instead of waiting them out. As Figure 11 shows, the actual planning times exhibit roughly the same speedup as the expansions.

[Figure 11 graphic: average planning time (log scale) vs. ɛ for the four planner/environment combinations.]

Figure 11: Average seconds (log scale) for indoor (black) and outdoor (gray) dynamic environments, comparing the full state space (dashed lines) to CFDA-A* (solid lines) for varying ɛ.

Acknowledgements

This research was partially sponsored by the ONR grant N and DARPA grant NAP. We also thank Willow Garage for their partial support of this work.

Conclusions

Many domains minimize a cost function that is also a dimension in the search space, because it affects the possible actions. Our algorithm allows for the removal of this dimension from the search space, resulting in a graph with cost function dependent actions. CFDA-A* can solve these types of graphs optimally. Weighted CFDA-A* uses optimal and suboptimal states with a careful expansion order to continue to provide this reduction in the size of the state space while maintaining the theoretical properties of completeness and bounded sub-optimality. Finally, our algorithm can be extended to handle more complicated problems that violate our key assumption globally but still maintain it within proper intervals; this replaces the dimension in question with a more compact representation that works with both optimal and weighted CFDA-A*. We have shown strong theoretical guarantees for all of our algorithms, as well as significant experimental results from two domains: planning with a limited battery and planning in dynamic environments.
While not addressed directly in our paper, the guarantee of bounded suboptimality allows our method to be easily extended to anytime planning methods such as ARA* (Likhachev, Gordon, and Thrun 2003).

References

Bhattacharya, S.; Roy, R.; and Bhattacharya, S. An exact depth-first algorithm for the pallet loading problem. European Journal of Operational Research.

Bonet, B., and Geffner, H. 2001. Planning as heuristic search. Artificial Intelligence 129(1-2):5-33.

Fujino, T., and Fujiwara, H. An efficient test generation algorithm based on search state dominance. In FTCS.

Gaschnig, J. 1979. Performance measurement and analysis of certain search algorithms. Tech. Rep. CMU-CS-79-4, Carnegie Mellon University.

Gonzalez, J. P., and Stentz, A. 2005. Planning with uncertainty in position: An optimal and efficient planner. In Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS).

Korsah, G. A.; Stentz, A. T.; and Dias, M. B. 2007. DD* Lite: Efficient incremental search with state dominance. Technical Report CMU-RI-TR-07-, Robotics Institute, Pittsburgh, PA.

Likhachev, M.; Gordon, G.; and Thrun, S. 2003. ARA*: Anytime A* with provable bounds on sub-optimality. In Advances in Neural Information Processing Systems (NIPS) 16. Cambridge, MA: MIT Press.

Phillips, M., and Likhachev, M. Planning in domains with cost function dependent actions: Formal analysis.

Pohl, I. 1970. First results on the effect of error in heuristic search. Machine Intelligence 5.

Tompkins, P. 2005. Mission-Directed Path Planning for Planetary Rover Exploration. Ph.D. Dissertation, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.

Xu, Y., and Yue, W. 2009. A generalized framework for BDD-based replanning A* search. In SNPD.

Yahja, A.; Stentz, A.; Singh, S.; and Brumitt, B. L. 1998. Framed-quadtree path planning for mobile robots operating in sparse environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA).

Yu, C. F., and Wah, B. W. 1988. Learning dominance relations in combinatorial search problems. IEEE Trans. Software Eng. 14(8).

Zhou, R., and Hansen, E. A. 2002. Multiple sequence alignment using A*. In Proceedings of the National Conference on Artificial Intelligence (AAAI). Student abstract.


More information

A Provably Good Approximation Algorithm for Rectangle Escape Problem with Application to PCB Routing

A Provably Good Approximation Algorithm for Rectangle Escape Problem with Application to PCB Routing A Provably Good Approximation Algorithm for Rectangle Escape Problem with Application to PCB Routing Qiang Ma Hui Kong Martin D. F. Wong Evangeline F. Y. Young Department of Electrical and Computer Engineering,

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Information Systems and Machine Learning Lab (ISMLL) Tomáš Horváth 10 rd November, 2010 Informed Search and Exploration Example (again) Informed strategy we use a problem-specific

More information

Common Misconceptions Concerning Heuristic Search

Common Misconceptions Concerning Heuristic Search Common Misconceptions Concerning Heuristic Search Robert C. Holte Computing Science Department, University of Alberta Edmonton, Canada T6G 2E8 (holte@cs.ualberta.ca Abstract This paper examines the following

More information

Final Exam Solutions

Final Exam Solutions Introduction to Algorithms December 14, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Final Exam Solutions Final Exam Solutions Problem

More information

FastA and the chaining problem, Gunnar Klau, December 1, 2005, 10:

FastA and the chaining problem, Gunnar Klau, December 1, 2005, 10: FastA and the chaining problem, Gunnar Klau, December 1, 2005, 10:56 4001 4 FastA and the chaining problem We will discuss: Heuristics used by the FastA program for sequence alignment Chaining problem

More information

COMP9414: Artificial Intelligence Informed Search

COMP9414: Artificial Intelligence Informed Search COMP9, Wednesday March, 00 Informed Search COMP9: Artificial Intelligence Informed Search Wayne Wobcke Room J- wobcke@cse.unsw.edu.au Based on slides by Maurice Pagnucco Overview Heuristics Informed Search

More information

Any-Angle Search Case Study: Theta* slides by Alex Nash / with contributions by Sven Koenig

Any-Angle Search Case Study: Theta* slides by Alex Nash / with contributions by Sven Koenig Any-Angle Search Case Study: Theta* slides by Alex Nash anash@usc.edu / alexwnash@gmail.com with contributions by Sven Koenig skoenig@usc.edu Table of Contents o Introduction o Analysis of Path Lengths

More information

Algorithms and Data Structures

Algorithms and Data Structures Algorithms and Data Structures Spring 2019 Alexis Maciel Department of Computer Science Clarkson University Copyright c 2019 Alexis Maciel ii Contents 1 Analysis of Algorithms 1 1.1 Introduction.................................

More information

Clustering Using Graph Connectivity

Clustering Using Graph Connectivity Clustering Using Graph Connectivity Patrick Williams June 3, 010 1 Introduction It is often desirable to group elements of a set into disjoint subsets, based on the similarity between the elements in the

More information

Efficient Cost Computation in Cost Map Planning for Non-Circular Robots

Efficient Cost Computation in Cost Map Planning for Non-Circular Robots Efficient Cost Computation in Cost Map Planning for Non-Circular Robots Jennifer King Computer and Information Science University of Pennsylvania Philadelphia, PA 904 jeking@seas.upenn.edu Maxim Likhachev

More information

Chapter 3: Informed Search and Exploration. Dr. Daisy Tang

Chapter 3: Informed Search and Exploration. Dr. Daisy Tang Chapter 3: Informed Search and Exploration Dr. Daisy Tang Informed Search Definition: Use problem-specific knowledge beyond the definition of the problem itself Can find solutions more efficiently Best-first

More information