Generalized Adaptive A*


Xiaoxun Sun, USC Computer Science, Los Angeles, California, xiaoxuns@usc.edu
Sven Koenig, USC Computer Science, Los Angeles, California, skoenig@usc.edu
William Yeoh, USC Computer Science, Los Angeles, California, wyeoh@usc.edu

ABSTRACT

Agents often have to solve a series of similar search problems. Adaptive A* is a recent incremental heuristic search algorithm that solves a series of similar search problems faster than A* because it updates the h-values using information from previous searches. It basically transforms consistent h-values into more informed consistent h-values. This allows it to find shortest paths in state spaces where the action costs can increase over time, since consistent h-values remain consistent after action cost increases. However, it is not guaranteed to find shortest paths in state spaces where the action costs can decrease over time, because consistent h-values do not necessarily remain consistent after action cost decreases. Thus, the h-values need to get corrected after action cost decreases. In this paper, we show how to do that, resulting in Generalized Adaptive A* (GAA*), which finds shortest paths in state spaces where the action costs can increase or decrease over time. Our experiments demonstrate that Generalized Adaptive A* outperforms breadth-first search, A* and D* Lite for moving-target search, where D* Lite is an alternative state-of-the-art incremental heuristic search algorithm that finds shortest paths in state spaces where the action costs can increase or decrease over time.

Categories and Subject Descriptors: I.2 [Artificial Intelligence]: Problem Solving

General Terms: Algorithms

Keywords: A*; D* Lite; Heuristic Search; Incremental Search; Shortest Paths; Moving-Target Search

1. INTRODUCTION

Most research on heuristic search has addressed one-time search problems. However, agents often have to solve a series of similar search problems as their state spaces or their knowledge of the state spaces changes.
Cite as: Generalized Adaptive A*, Xiaoxun Sun, Sven Koenig and William Yeoh, Proc. of 7th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2008), Padgham, Parkes, Müller and Parsons (eds.), May 12-16, 2008, Estoril, Portugal, pp. XXX-XXX. Copyright © 2008, International Foundation for Autonomous Agents and Multiagent Systems. All rights reserved.

Adaptive A* [7] is a recent incremental heuristic search algorithm that solves a series of similar search problems faster than A* because it updates the h-values using information from previous searches. Adaptive A* is simple to understand and easy to implement because it basically transforms consistent h-values into more informed consistent h-values. Adaptive A* can easily be generalized to moving-target search [5], where the goal state changes over time. Consistent h-values do not necessarily remain consistent when the goal state changes. Adaptive A* therefore makes the h-values consistent again when the goal state changes. Adaptive A* is able to find shortest paths in state spaces where the start state changes over time, since consistent h-values remain consistent when the start state changes. It is also able to find shortest paths in state spaces where the action costs can increase over time, since consistent h-values remain consistent after action cost increases. However, it is not guaranteed to find shortest paths in state spaces where the action costs can decrease over time, because consistent h-values do not necessarily remain consistent after action cost decreases, which limits its applicability. Thus, the h-values need to get corrected after action cost decreases. In this paper, we show how to do that, resulting in Generalized Adaptive A* (GAA*), which finds shortest paths in state spaces where the action costs can increase or decrease over time.

2. NOTATION

We use the following notation: S denotes the finite set of states. s_start ∈ S denotes the start state, and s_goal ∈ S denotes the goal state.
A(s) denotes the finite set of actions that can be executed in state s ∈ S. c(s, a) > 0 denotes the action cost of executing action a ∈ A(s) in state s ∈ S, and succ(s, a) ∈ S denotes the resulting successor state. We refer to an action sequence as a path. The search problem is to find a shortest path from the start state to the goal state, knowing the state space.

3. HEURISTICS

Heuristic search algorithms use h-values (= heuristics) to focus their search. The h-values are derived from user-supplied H-values H(s, s') that estimate the distance from any state s to any state s'. The user-supplied H-values H(s, s') have to satisfy the triangle inequality (= be consistent), namely satisfy H(s', s') = 0 and H(s, s') ≤ c(s, a) + H(succ(s, a), s') for all states s and s' with s ≠ s' and all actions a that can be executed in state s. If the user-supplied H-values H(s, s') are consistent, then the h-values h(s) = H(s, s_goal) are consistent with respect to the goal state s_goal (no matter what the goal state is), namely satisfy h(s_goal) = 0 and h(s) ≤ c(s, a) + h(succ(s, a)) for all states s with s ≠ s_goal and all actions a that can be executed in state s [1].

4. A*

Adaptive A* is based on a version of A* [2] that uses consistent h-values with respect to the goal state to focus its search. We therefore assume in the following that the h-values are consistent with respect to the goal state.

4.1 Variables

A* maintains four values for all states s that it encounters during the search: a g-value g(s) (which is infinity initially), which is the length of the shortest path from the start state to state s found by the A* search and thus an upper bound on the distance from the start state to state s; an h-value h(s) (which is H(s, s_goal) initially and does not change during the A* search), which estimates the distance of state s to the goal state; an f-value f(s) := g(s) + h(s), which estimates the distance from the start state via state s to the goal state; and a tree-pointer tree(s) (which is undefined initially), which is used to identify a shortest path after the A* search.

4.2 Data Structures and Algorithm

A* maintains an OPEN list (a priority queue which contains only the start state initially). A* identifies a state s with the smallest f-value in the OPEN list [Line 14 of Figure 1]. A* terminates once the f-value of state s is no smaller than the f-value or, equivalently, the g-value of the goal state. (It holds that f(s_goal) = g(s_goal) + h(s_goal) = g(s_goal) since h(s_goal) = 0 for consistent h-values with respect to the goal state.) Otherwise, A* removes state s from the OPEN list and expands it, meaning that it performs the following operations for all actions that can be executed in state s and result in a successor state whose g-value is larger than the g-value of state s plus the action cost [Lines 18-21]: First, it sets the g-value of the successor state to the g-value of state s plus the action cost [Line 18].
Second, it sets the tree-pointer of the successor state to (point to) state s [Line 19]. Finally, it inserts the successor state into the OPEN list or, if it was there already, changes its priority [Lines 20-21]. (We say that it generates a state when it inserts the state for the first time into the OPEN list.) It then repeats the procedure.

4.3 Properties

We use the following known properties of A* searches [1]. Let g(s), h(s) and f(s) denote the g-values, h-values and f-values, respectively, after the A* search: First, the g-values of all expanded states and the goal state after the A* search are equal to the distances from the start state to these states. Following the tree-pointers from these states to the start state identifies shortest paths from the start state to these states in reverse. Second, an A* search expands no more states than an (otherwise identical) other A* search for the same search problem if the h-values used by the first A* search are no smaller for any state than the corresponding h-values used by the second A* search (= the former h-values dominate the latter h-values at least weakly).

5. ADAPTIVE A*

Adaptive A* [7] is a recent incremental heuristic search algorithm that solves a series of similar search problems faster than A* because it updates the h-values using information from previous searches. Adaptive A* is simple to understand and easy to implement because it basically transforms consistent h-values with respect to the goal state into more informed consistent h-values with respect to the goal state. We describe Lazy Moving-Target Adaptive A* (short: Adaptive A*) in the following.

5.1 Improving the Heuristics

Adaptive A* updates (= overwrites) the consistent h-values with respect to the goal state of all expanded states s after an A* search by executing

h(s) := g(s_goal) − g(s). (1)

This principle was first used in [4] and later resulted in the independent development of Adaptive A*.
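The update h(s) := g(s_goal) − g(s) can be illustrated with a minimal Python sketch (not the authors' implementation; the gridworld, state names and function names are our own) of an A* search followed by the Adaptive A* h-value update for the expanded states:

```python
import heapq

def astar(succ, h, start, goal):
    """A* search. succ(s) yields (cost, successor) pairs; h is a consistent
    heuristic with respect to goal. Returns the final g-values and the set
    of expanded states."""
    g = {start: 0.0}
    expanded = set()
    heap = [(h(start), start)]
    while heap:
        f, s = heapq.heappop(heap)
        if s == goal:                  # goal popped: g[goal] is the distance
            return g, expanded
        if s in expanded:              # stale queue entry
            continue
        expanded.add(s)
        for cost, t in succ(s):
            if g[s] + cost < g.get(t, float("inf")):
                g[t] = g[s] + cost
                heapq.heappush(heap, (g[t] + h(t), t))
    raise ValueError("goal unreachable")

def adaptive_update(g, expanded, goal):
    """Assignment (1): h(s) := g(s_goal) - g(s) for all expanded states s."""
    return {s: g[goal] - g[s] for s in expanded}

# Hypothetical 5x5 four-neighbor gridworld; '#' marks blocked cells.
GRID = ["..#..",
        "..#..",
        "..#..",
        ".....",
        "....."]

def succ(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 5 and 0 <= nc < 5 and GRID[nr][nc] == ".":
            yield 1.0, (nr, nc)

goal = (0, 4)
manhattan = lambda s: abs(s[0] - goal[0]) + abs(s[1] - goal[1])
g, expanded = astar(succ, manhattan, (0, 0), goal)
h_new = adaptive_update(g, expanded, goal)
# The updated h-values dominate the Manhattan distances at least weakly.
assert all(h_new[s] >= manhattan(s) for s in expanded)
```

The final assertion reflects the dominance property from Section 4.3: every expanded state satisfies g(s) + h(s) ≤ g(s_goal) under consistent h-values, so g(s_goal) − g(s) ≥ h(s).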
The updated h-values are again consistent with respect to the goal state [7]. They also dominate the immediately preceding h-values at least weakly [7]. Thus, they are no less informed than the immediately preceding h-values, and an A* search with the updated h-values thus cannot expand more states than an A* search with the immediately preceding h-values (up to tie-breaking).

5.2 Maintaining Consistency of the Heuristics

Adaptive A* solves a series of similar but not necessarily identical search problems. However, it is important that the h-values remain consistent with respect to the goal state from search problem to search problem. The following changes can occur from search problem to search problem:

The start state changes. In this case, Adaptive A* does not need to do anything since the h-values remain consistent with respect to the goal state.

The goal state changes. In this case, Adaptive A* needs to correct the h-values. Assume that the goal state changes from s_goal to s'_goal. Adaptive A* then updates (= overwrites) the h-values of all states s by assigning

h(s) := max(H(s, s'_goal), h(s) − h(s'_goal)). (2)

The updated h-values are consistent with respect to the new goal state [9]. However, they are potentially less informed than the immediately preceding h-values. Taking the maximum of h(s) − h(s'_goal) and the user-supplied H-value H(s, s'_goal) with respect to the new goal state ensures that the h-values used by Adaptive A* dominate the user-supplied H-values with respect to the new goal state at least weakly.

At least one action cost changes. If no action cost decreases, Adaptive A* does not need to do anything since the h-values remain consistent with respect to the goal state. This is easy to see. Let c denote the original action costs and c' the new action costs after the changes. Then, h(s) ≤ c(s, a) + h(succ(s, a)) for s ∈ S \ {s_goal} implies that h(s) ≤ c(s, a) + h(succ(s, a)) ≤ c'(s, a) + h(succ(s, a)) for s ∈ S \ {s_goal}. However, if at least one action cost decreases, then the h-values are not guaranteed to remain consistent with respect to the goal state. This is easy to see by example. Consider a state space with two states, a non-goal state s and a goal state s', and one action whose execution in the non-goal state incurs action cost two and results in a transition to the goal state. Then, an h-value of two for the non-goal state and an h-value of zero for the goal state are consistent with respect to the goal state but do not remain consistent with respect to the goal state after the action cost decreases to one. Thus, Adaptive A* needs to correct the h-values, which is the issue addressed in this paper.

5.3 Updating the Heuristics Lazily

So far, the h-values have been updated in an eager way, that is, right away. However, Adaptive A* updates the h-values in a lazy way, which means that it spends effort on the update of an h-value only when it is needed during a search, thus avoiding wasted effort. It remembers some information during the A* searches (such as the g-values of states) [Lines 18 and 30] and some information after the A* searches (such as the g-value of the goal state, that is, the distance from the start state to the goal state) [Lines 35 and 37] and then uses this information to calculate the h-value of a state only when it is needed by a future A* search [9].

5.4 Pseudocode

Figure 1 contains the pseudocode of Adaptive A* [9]. Adaptive A* does not initialize all g-values and h-values up front but uses the variables counter, search(s) and pathcost(x) to decide when to initialize them: The value of counter is x during the xth execution of ComputePath(), that is, the xth A* search. The value of search(s) is x if state s was generated last by the xth A* search (or is the goal state). Adaptive A* initializes these values to zero [Lines 25-26].
The value of pathcost(x) is the length of the shortest path from the start state to the goal state found by the xth A* search, that is, the distance from the start state to the goal state. Adaptive A* executes ComputePath() to perform an A* search [Line 33]. (The minimum over an empty set is infinity on Line 13.) Adaptive A* executes InitializeState(s) when the g-value or h-value of a state s is needed [Lines 16, 28, 29 and 42]. Adaptive A* initializes the g-value of state s (to infinity) if state s either has not yet been generated by some A* search (search(s) = 0) [Line 9] or has not yet been generated by the current A* search (search(s) ≠ counter) but was generated by some A* search (search(s) ≠ 0) [Line 7]. Adaptive A* initializes the h-value of state s with its user-supplied H-value with respect to the goal state if the state has not yet been generated by any A* search (search(s) = 0) [Line 10].

1  procedure InitializeState(s)
2    if search(s) ≠ counter AND search(s) ≠ 0
3      if g(s) + h(s) < pathcost(search(s))
4        h(s) := pathcost(search(s)) − g(s);
5      h(s) := h(s) − (deltah(counter) − deltah(search(s)));
6      h(s) := max(h(s), H(s, s_goal));
7      g(s) := ∞;
8    else if search(s) = 0
9      g(s) := ∞;
10     h(s) := H(s, s_goal);
11   search(s) := counter;

12 procedure ComputePath()
13   while g(s_goal) > min_{s' ∈ OPEN} (g(s') + h(s'))
14     delete a state s with the smallest f-value g(s) + h(s) from OPEN;
15     for all actions a ∈ A(s)
16       InitializeState(succ(s, a));
17       if g(succ(s, a)) > g(s) + c(s, a)
18         g(succ(s, a)) := g(s) + c(s, a);
19         tree(succ(s, a)) := s;
20         if succ(s, a) is in OPEN then delete it from OPEN;
21         insert succ(s, a) into OPEN with f-value g(succ(s, a)) + h(succ(s, a));

22 procedure Main()
23   counter := 1;
24   deltah(1) := 0;
25   for all states s ∈ S
26     search(s) := 0;
27   while s_start ≠ s_goal
28     InitializeState(s_start);
29     InitializeState(s_goal);
30     g(s_start) := 0;
31     OPEN := ∅;
32     insert s_start into OPEN with f-value g(s_start) + h(s_start);
33     ComputePath();
34     if OPEN = ∅
35       pathcost(counter) := ∞;
36     else
37       pathcost(counter) := g(s_goal);
38     change the start and/or goal states, if desired;
39     set s_start to the start state;
40     set s_newgoal to the goal state;
41     if s_goal ≠ s_newgoal
42       InitializeState(s_newgoal);
43       if g(s_newgoal) + h(s_newgoal) < pathcost(counter)
44         h(s_newgoal) := pathcost(counter) − g(s_newgoal);
45       deltah(counter + 1) := deltah(counter) + h(s_newgoal);
46       s_goal := s_newgoal;
47     else
48       deltah(counter + 1) := deltah(counter);
49     counter := counter + 1;
50     update the increased action costs (if any);

Figure 1: Adaptive A*

Adaptive A* updates the h-value of state s according to Assignment 1 [Line 4] if all of the following conditions are satisfied: First, the state has not yet been generated by the current A* search (search(s) ≠ counter). Second, the state was generated by a previous A* search (search(s) ≠ 0). Third, the state was expanded by the A* search that generated it last (g(s) + h(s) < pathcost(search(s))). Thus, Adaptive A* updates the h-value of a state when an A* search needs its g-value or h-value for the first time during the current A* search and a previous A* search has expanded the state already. Adaptive A* sets the h-value of state s to the difference of the distance from the start state to the goal state during the last A* search that generated and, according to the conditions, also expanded the state (pathcost(search(s))) and the g-value of the state after the same A* search (g(s), since the g-value has not changed since then). Adaptive A* corrects the h-value of state s according to Assignment 2 for the new goal state. The update for the new goal state decreases the h-values of

1  update the increased and decreased action costs (if any);
2  OPEN := ∅;
3  for all state-action pairs (s, a) with s ≠ s_goal whose c(s, a) decreased
4    InitializeState(s);
5    InitializeState(succ(s, a));
6    if h(s) > c(s, a) + h(succ(s, a))
7      h(s) := c(s, a) + h(succ(s, a));
8      if s is in OPEN then delete it from OPEN;
9      insert s into OPEN with h-value h(s);
10 while OPEN ≠ ∅
11   delete a state s' with the smallest h-value h(s') from OPEN;
12   for all states s ∈ S \ {s_goal} and actions a ∈ A(s) with succ(s, a) = s'
13     InitializeState(s);
14     if h(s) > c(s, a) + h(succ(s, a))
15       h(s) := c(s, a) + h(succ(s, a));
16       if s is in OPEN then delete it from OPEN;
17       insert s into OPEN with h-value h(s);

Figure 2: Consistency Procedure

all states by the h-value of the new goal state. This h-value is added to a running sum of all corrections [Line 45]. In particular, the value of deltah(x) during the xth A* search is the running sum of all corrections up to the beginning of the xth A* search. If a state s was generated by a previous A* search (search(s) ≠ 0) but not yet generated by the current A* search (search(s) ≠ counter), then Adaptive A* updates its h-value by the sum of all corrections between the A* search when state s was generated last and the current A* search, which is the same as the difference of the value of deltah during the current A* search (deltah(counter)) and during the A* search that generated state s last (deltah(search(s))) [Line 5]. It then takes the maximum of this value and the user-supplied H-value with respect to the new goal state [Line 6].

6. GENERALIZED ADAPTIVE A*

We generalize Adaptive A* to the case where action costs can increase and decrease. Figure 2 contains the pseudocode of a consistency procedure that eagerly updates consistent h-values with respect to the goal state with a version of Dijkstra's algorithm [1] so that they remain consistent with respect to the goal state after action cost increases and decreases.
The pseudocode is written so that it replaces Line 50 of Adaptive A* in Figure 1, resulting in Generalized Adaptive A* (GAA*). InitializeState(s) performs all updates of the h-value of state s and can otherwise be ignored.

Theorem 1. The h-values remain consistent with respect to the goal state after action cost increases and decreases if the consistency procedure is run.

Proof: Let c denote the action costs before the changes, and c' denote the action costs after the changes. Let h(s) denote the h-values before running the consistency procedure and h'(s) the h-values after its termination. Thus, h(s_goal) = 0 and h(s) ≤ c(s, a) + h(succ(s, a)) for all non-goal states s and all actions a that can be executed in state s, since the h-values are consistent with respect to the goal state for the action costs before the changes. (Line 3 can be simplified to "for all state-action pairs (s, a) whose c(s, a) decreased" since h(s_goal) = 0 for consistent h-values with respect to the goal state and the condition on Line 6 is thus never satisfied for s = s_goal. Similarly, Line 12 can be simplified to "for all states s ∈ S and actions a ∈ A(s) with succ(s, a) = s'" for the same reason.) The h-value of the goal state remains zero since it is never updated. The h-value of any non-goal state is monotonically non-increasing over time since it only gets updated to an h-value that is smaller than its current h-value. Thus, we distinguish three cases for a non-goal state s and all actions a that can be executed in state s: First, the h-value of state succ(s, a) never decreased (and thus h(succ(s, a)) = h'(succ(s, a))) and c(s, a) ≤ c'(s, a). Then, h'(s) ≤ h(s) ≤ c(s, a) + h(succ(s, a)) = c(s, a) + h'(succ(s, a)) ≤ c'(s, a) + h'(succ(s, a)). Second, the h-value of state succ(s, a) never decreased (and thus h(succ(s, a)) = h'(succ(s, a))) and c(s, a) > c'(s, a). Then, c(s, a) decreased and s ≠ s_goal, and Lines 4-9 were just executed. Let ĥ(s) be the h-value of state s after the execution of Lines 4-9. Then, h'(s) ≤ ĥ(s) ≤ c'(s, a) + h'(succ(s, a)). Third, the h-value of state succ(s, a) decreased, and thus h(succ(s, a)) > h'(succ(s, a)). Then, the state was inserted into the priority queue and later retrieved on Line 11. Consider the last time that it was retrieved on Line 11. Then, Lines 12-17 were executed. Let ĥ(s) be the h-value of state s after the execution of Lines 12-17. Then, h'(s) ≤ ĥ(s) ≤ c'(s, a) + h'(succ(s, a)). Thus, h'(s) ≤ c'(s, a) + h'(succ(s, a)) in all three cases, and the h-values remain consistent with respect to the goal state.

Adaptive A* updates the h-values in a lazy way. However, the consistency procedure currently updates the h-values in an eager way, which means that it is run whenever action costs decrease and can thus update a large number of h-values that are not needed during future searches. It is therefore important to evaluate experimentally whether Generalized Adaptive A* is faster than A*.

7. EXAMPLE

We use search problems in four-neighbor gridworlds of square cells (such as, for example, gridworlds used in video games) as examples. All cells are either blocked (= black) or unblocked (= white). The agent always knows its current cell, the current cell of the target and which cells are blocked. It can always move from its current cell to one of the four neighboring cells. The action cost for moving from an unblocked cell to an unblocked cell is one. All other action costs are infinity. The agent has to move so as to occupy the same cell as a stationary target (resulting in stationary-target search) or a moving target (resulting in moving-target search). The agent always identifies a shortest path from its current cell to the current cell of the target after its current path might no longer be a shortest path because the target left the path or action costs changed. The agent then begins to move along the path given by the tree-pointers. We use the Manhattan distances as consistent user-supplied H-values. Figure 3 shows an example.
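Before walking through the example, the repair performed by the consistency procedure of Figure 2 can be sketched in Python (a minimal eager version; the graph representation and names are our own, and the two-state example from Section 5.2 serves as the demonstration):

```python
import heapq

def repair_h(pred, h, goal, decreased):
    """Dijkstra-like consistency repair after action-cost decreases.
    pred(t) yields (s, cost) pairs with succ(s, a) = t under the new costs;
    decreased lists (s, cost, t) triples whose action cost just decreased.
    Lowers h-values until h(s) <= cost + h(succ(s, a)) holds everywhere."""
    heap = []
    for s, cost, t in decreased:           # seed with the decreased actions
        if s != goal and h[s] > cost + h[t]:
            h[s] = cost + h[t]
            heapq.heappush(heap, (h[s], s))
    while heap:                            # propagate along predecessors
        hval, t = heapq.heappop(heap)
        if hval > h[t]:                    # stale queue entry
            continue
        for s, cost in pred(t):
            if s != goal and h[s] > cost + h[t]:
                h[s] = cost + h[t]
                heapq.heappush(heap, (h[s], s))
    return h

# Two-state example from Section 5.2: the action cost from the non-goal
# state 'A' to the goal 'B' decreases from two to one.
h = {"A": 2.0, "B": 0.0}
edges = {("A", "B"): 1.0}                  # new action costs
pred = lambda t: [(s, c) for (s, u), c in edges.items() if u == t]
repair_h(pred, h, "B", [("A", 1.0, "B")])
# h('A') is lowered from 2 to 1, restoring consistency.
```

As in Figure 2, h-values only ever decrease here, which is what the monotonicity argument in the proof of Theorem 1 relies on.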
After the A* search from the current cell of the agent (cell 3) to the current cell of the target (cell 5), the target moves to cell 5 and then cells 4 and 2 become unblocked (which decreases the action costs between these cells and their neighboring unblocked cells from infinity to one).

Figure 3: Example

Figure 4: Ideas behind Generalized Adaptive A*

Figure 4 shows the h-values (in the upper right corners of the cells) using the ideas from Sections 5.1, 5.2 and 6, which all update the h-values in an eager way. The first gridworld shows the Manhattan distances as user-supplied H-values, the second gridworld shows the improved h-values after the A* search, the third gridworld shows the updated h-values after the current cell of the target changed, and the fourth gridworld shows the updated h-values determined by the consistency procedure after cells 4 and 2 became unblocked. (When a cell becomes unblocked, we initialize its h-value to infinity before running the consistency procedure.) Grey cells contain h-values that were updated. Figure 5 shows the g-values (in the upper left corners of the cells), h-values (in the upper right corners) and search-values (in the lower left corners) of Generalized Adaptive A*, which updates the h-values in a lazy way except for the consistency procedure. The first gridworld shows the values before the A* search, the second gridworld shows the values after the A* search, the third gridworld shows the values after the current cell of the target changed, and the fourth gridworld shows the values determined by the consistency procedure after cells 4 and 2 became unblocked. Grey cells are arguments of calls to InitializeState(s). The h-value of a cell s in the fourth gridworld of Figure 5 is equal to the h-value of the same cell in the fourth gridworld of Figure 4 if search(s) = 2. An example is cell 2. The h-value of a cell s in the fourth gridworld of Figure 5 is uninitialized if search(s) = 0. In this case, the Manhattan distance with respect to the current cell of the target is equal to the h-value of the same cell in the fourth gridworld of Figure 4. An example is cell 2. Otherwise, the h-value of a cell s in the fourth gridworld of Figure 5 needs to get updated. It needs to get updated according to Sections 5.1 and 5.2 to be equal to the h-value of the same cell in the fourth gridworld of Figure 4 if g(s) + h(s) < pathcost(search(s)). An example is cell 3. It needs to get updated according to Section 5.2 to be equal to the h-value of the same cell in the fourth gridworld of Figure 4 if g(s) + h(s) ≥ pathcost(search(s)). An example is cell .

Figure 6: Example Test Gridworlds

8. EXPERIMENTAL EVALUATION

We perform experiments in the same kind of four-neighbor gridworlds, one independent stationary-target or moving-target search problem per gridworld. The agent and target move on 100 randomly generated four-connected gridworlds that are shaped like a torus for ease of implementation (that is, moving north in the northern-most row results in being in the southern-most row and vice versa, and moving west in the western-most column results in being in the eastern-most column and vice versa). Figure 6 (left) shows an example of a smaller size. We generate their corridor structures with depth-first searches. We choose the initial cells of the agent and target randomly among the unblocked cells. We unblock k blocked cells and block k unblocked cells every tenth time step in a way so that there always remains a path from the current cell of the agent to the current cell of the target, where k is a parameter whose value we vary from one to fifty. Figure 6 (right) shows that these changes create shortcuts and decrease the distances

Figure 5: Generalized Adaptive A*

between cells over time. For moving-target search, the target moves randomly but does not return to its immediately preceding cell unless this is the only possible move. It does not move every tenth time step, which guarantees that the agent can reach the target. We compare Generalized Adaptive A* (GAA*) against breadth-first search (BFS), A* and a heavily optimized D* Lite (Target-Centric Map) [9]. Breadth-first search, A* and Generalized Adaptive A* can search either from the current cell of the agent to the current cell of the target (= forward) or from the current cell of the target to the current cell of the agent (= backward). Forward searches incur a runtime overhead over backward searches since the paths given by the tree-pointers go from the goal states to the start states for forward searches and thus need to be reversed. D* Lite is an alternative incremental heuristic search algorithm that finds shortest paths in state spaces where the action costs can increase or decrease over time. D* Lite can be understood as changing the immediately preceding A* search tree into the current A* search tree, which requires the root node of the A* search tree to remain unchanged and is fast only if the number of action cost changes close to the root node of the A* search tree is small and the immediately preceding and current A* search trees are thus similar. D* Lite thus searches from the current cell of the target to the current cell of the agent for stationary-target search but is not fast for large k since the number of action cost changes close to the root node of the A* search tree is then large.
D* Lite can move the map to keep the target centered for moving-target search and then again searches from the current cell of the target to the current cell of the agent, but is not fast since moving the map causes the number of action cost changes close to the root node to be large [9]. Thus, D* Lite is fast only for stationary-target search with small k. We use comparable implementations for all heuristic search algorithms. For example, A*, D* Lite and Generalized Adaptive A* all use binary heaps as priority queues and break ties among cells with the same f-values in favor of cells with larger g-values, which is known to be a good tie-breaking strategy. We run our experiments on a Pentium 3.0 GHz PC with 2 GBytes of RAM. We report two measures for the difficulty of the search problems, namely the number of moves of the agent until it reaches the target and the number of searches run to determine these moves. All heuristic search algorithms determine the same paths, and their numbers of moves and searches are thus approximately the same. They differ slightly since the agent can follow different trajectories due to tie-breaking. We report two measures for the efficiency of the heuristic search algorithms, namely the number of expanded cells (= expansions) per search and the runtime per search in microseconds. We calculate the runtime per search by dividing the total runtime by the number of A* searches. For Generalized Adaptive A*, we also report the number of h-value updates (= propagations) per search by the consistency procedure as a third measure, since its runtime per search depends on both its number of expansions and its number of propagations per search. We calculate the number of propagations per search by dividing the total number of h-value updates by the consistency procedure by the number of A* searches. We also report the standard deviation of the mean for the number of expansions per search (in parentheses) to demonstrate the statistical significance of our results.
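The standard deviation of the mean reported in parentheses can be computed as follows (a generic sketch with made-up sample values, not the authors' code):

```python
import math

def mean_and_sem(xs):
    """Mean and standard deviation of the mean (standard error) of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    return mean, math.sqrt(var / n)

# Hypothetical per-search expansion counts for one algorithm and one k.
expansions = [120, 95, 143, 110, 132]
mean, sem = mean_and_sem(expansions)       # reported as "mean (sem)"
```

Dividing the sample standard deviation by the square root of the sample size gives the standard error, which shrinks as more independent gridworlds are averaged.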
We compare the heuristic search algorithms using their runtime per search. Unfortunately, the runtime per search depends on low-level machine and implementation details, such as the instruction set of the processor, the optimizations performed by the compiler and coding decisions. This point is especially important since the gridworlds fit into memory and the resulting state spaces are thus small. We do not know of any better method for evaluating heuristic search algorithms than to implement them as well as possible, publish their runtimes, and let other researchers validate them with their own and thus potentially slightly different implementations. For example, it is difficult to compare them using proxies, such as their number of expansions per search, since they perform different basic operations and thus differ in their runtime per expansion. Breadth-first search has the smallest runtime per expansion, followed by A*, Generalized Adaptive A* and D* Lite (in this order). This is not surprising: Breadth-first search leads due to the fact that it does not use a priority queue. D* Lite and Generalized Adaptive A* trail due to their runtime overhead as incremental heuristic search algorithms and need to make up for it by decreasing their number of expansions per search sufficiently to result in smaller runtimes per search. Table 1 shows the following trends: D* Lite is fast only for stationary-target search with small k. In the other cases, its number of expansions per search is too large (as explained already) and dominates some of the effects discussed below, which is why we do not list D* Lite in the following. The number of searches and moves until the target is reached decreases as k increases since the distance

Table 1: Experimental Results (for k = 1, 5, 10, 20, 50: the number of searches and moves until the target is reached, expansions per search with the standard deviation of the mean in parentheses, runtime per search, and propagations per search, for BFS (Forward/Backward), A* (Forward/Backward), D* Lite and GAA* (Forward/Backward), under stationary-target and moving-target search; the individual values did not survive extraction)

between two cells then decreases more quickly (as explained already), which decreases more quickly the length of the path from the current cell of the agent to the current cell of the target. This decreases the number of moves and also the number of searches, since the action costs change every tenth time step and a new search thus needs to be performed (at least) every tenth time step.
The number of searches until the target is reached is larger for moving-target search than stationary-target search since additional searches need to be performed every time the target leaves the current path. The number of propagations per search of Generalized Adaptive A* increases as k increases since a larger number of action costs then decrease. The number of propagations per search of Generalized Adaptive A* is smaller for moving-target search than stationary-target search for two reasons: First, the agent performs an A* search for stationary-target search whenever the action costs change. Thus, the number of searches equals the number of runs of the consistency procedure. The agent performs an A* search for moving-target search whenever the action costs change or the target leaves the current path. Thus, the number of searches is larger than the number of runs of the consistency procedure and the propagations are now amortized over several searches. Second, the additional updates of the h-values from Section 5.2 keep the h-values smaller. Thus, the h-values of more cells are equal to their Manhattan distances, and cells whose h-values are equal to the Manhattan distances stop the updates of the h-values by the consistency procedure. (This effect also explains why the number of propagations per search of Generalized Adaptive A* is smaller for backward search than forward search.) The number of expansions per search of Generalized Adaptive A* is smaller than the one of A* since its h-values cannot be less informed than the ones of A*. Similarly, the number of expansions per search of A* is smaller than the one of breadth-first search since its h-values cannot be less informed than uninformed.
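The consistency procedure discussed above (a Dijkstra-style propagation of lowered h-values after action costs decrease) can be sketched as follows. This is our illustrative reconstruction, not the authors' implementation; the function and variable names are ours, and the graph representation is an assumption made for the sake of a self-contained example:

```python
import heapq

def repair_consistency(edges, h, changed):
    """Sketch: after action costs decrease, h-values may become inconsistent
    (too large). Starting from edges whose costs decreased, propagate lowered
    h-values until h(u) <= c(u, v) + h(v) holds again for every edge (u, v).
    `edges` maps u -> list of (v, cost) with the NEW costs; `changed` lists
    (u, v, new_cost) for decreased edges. Returns the number of propagations."""
    # Predecessor lists: corrections are pushed backwards, away from the goal.
    incoming = {}
    for u, outs in edges.items():
        for v, c in outs:
            incoming.setdefault(v, []).append((u, c))
    pq = []
    for (u, v, c_new) in changed:
        # A decreased edge may create a shortcut that lowers h(u).
        if c_new + h.get(v, float("inf")) < h.get(u, float("inf")):
            h[u] = c_new + h[v]
            heapq.heappush(pq, (h[u], u))
    propagations = 0
    while pq:
        hu, u = heapq.heappop(pq)
        if hu > h[u]:
            continue  # stale queue entry; h[u] was lowered again meanwhile
        propagations += 1
        for (p, c) in incoming.get(u, []):
            if c + h[u] < h.get(p, float("inf")):
                h[p] = c + h[u]  # lower the predecessor's h-value
                heapq.heappush(pq, (h[p], p))
    return propagations
```

In the gridworlds of the paper, propagation additionally stops at cells whose h-values already equal their Manhattan distances, since h-values never drop below that lower bound; in this sketch, propagation simply stops wherever no h-value improves.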
The runtime per search of breadth-first search, A* and Generalized Adaptive A* is smaller for moving-target search than stationary-target search, which is an artifact of calculating the runtime per search by dividing the total runtime by the number of searches, since the runtime for generating the initial gridworlds and initializing the data structures is then amortized over a larger number of searches. We say in the following that one heuristic search algorithm is better than another one if its runtime per search is smaller for the same search direction, no matter what the search direction is. We included breadth-first search in our comparison since our gridworlds have few cells and small branching factors, which makes breadth-first search better than A*. Generalized Adaptive A* mixes A* with

Dijkstra's algorithm for the consistency procedure, and Dijkstra's algorithm has a larger runtime per expansion than breadth-first search. The number of propagations per search of Generalized Adaptive A* increases as k increases. Generalized Adaptive A* cannot be better than breadth-first search when this number is about as large as the number of expansions of breadth-first search, which happens only for stationary-target search with large k. Overall, for stationary-target search, D* Lite is best for small k and breadth-first search is best for large k. Generalized Adaptive A* is always worse than D* Lite. It is better than breadth-first search and A* for small k but worse than them for large k since its number of propagations is then larger and the improved h-values remain informed for shorter periods of time. On the other hand, for moving-target search, Generalized Adaptive A* is best (which is why it advances the state of the art in heuristic search), followed by breadth-first search, A* and D* Lite (in this order). 9. RELATED WORK Real-time heuristic search [10] limits A* searches to a part of the state space around the current state (= local search space) and then follows the resulting path. It repeats this procedure once it leaves the local search space (or, if desired, earlier), until it reaches the goal state. After each search, it uses a consistency procedure to update some or all of the h-values in the local search space to avoid cycling without reaching the goal state [12, 6, 3]. Real-time heuristic search thus typically solves a search problem by searching repeatedly in the same state space and following a trajectory from the start state to the goal state that is not a shortest path. Adaptive A* and Generalized Adaptive A* (GAA*), on the other hand, solve a series of similar but not necessarily identical search problems. For each search problem, they search once and determine a shortest path from the start state to the goal state.
This prevents the consistency procedure of Generalized Adaptive A* (GAA*) from restricting the updates of the h-values to small parts of the state space because all of its updates are necessary to keep the h-values consistent with respect to the goal state and thus find a shortest path from the start state to the goal state. However, Adaptive A* has recently been modified to perform real-time heuristic search [8]. 10. CONCLUSIONS In this paper, we showed how to generalize Adaptive A* to find shortest paths in state spaces where the action costs can increase or decrease over time. Our experiments demonstrated that Generalized Adaptive A* outperforms breadth-first search, A* and D* Lite for moving-target search. It is future work to extend our experimental results to understand better when Generalized Adaptive A* (GAA*) runs faster than breadth-first search, A* and D* Lite. It is also future work to combine the principles behind Generalized Adaptive A* (GAA*) and D* Lite to speed it up even more. 11. ACKNOWLEDGMENTS All experimental results are the responsibility of Xiaoxun Sun. This research has been partly supported by an NSF award to Sven Koenig under contract IIS. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations, agencies, companies or the U.S. government. 12. REFERENCES [1] E. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269-271, 1959. [2] P. Hart, N. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100-107, 1968. [3] C. Hernández and P. Meseguer. LRTA*(k). In Proceedings of the International Joint Conference on Artificial Intelligence, 2005. [4] R. Holte, T. Mkadmi, R. Zimmer, and A. MacDonald. Speeding up problem solving by abstraction: A graph oriented approach. Artificial Intelligence, 85(1-2):321-361, 1996. [5] T. Ishida and R. Korf. Moving target search. In Proceedings of the International Joint Conference on Artificial Intelligence, 1991. [6] S. Koenig. A comparison of fast search methods for real-time situated agents. In Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems, 2004. [7] S. Koenig and M. Likhachev. A new principle for incremental heuristic search: Theoretical results. In Proceedings of the International Conference on Automated Planning and Scheduling, 2005. [8] S. Koenig and M. Likhachev. Real-Time Adaptive A*. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, 2006. [9] S. Koenig, M. Likhachev, and X. Sun. Speeding up moving-target search. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, 2007. [10] R. Korf. Real-time heuristic search. Artificial Intelligence, 42(2-3):189-211, 1990. [11] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, 1985. [12] J. Pemberton and R. Korf. Incremental path planning on graphs with cycles. In Proceedings of the International Conference on Artificial Intelligence Planning Systems, pages 179-188, 1992.


More information

ANA*: Anytime Nonparametric A*

ANA*: Anytime Nonparametric A* ANA*: Anytime Nonparametric A* Jur van den Berg 1 and Rajat Shah 2 and Arthur Huang 2 and Ken Goldberg 2 1 University of North Carolina at Chapel Hill. E-mail: berg@cs.unc.edu. 2 University of California,

More information

Informed search algorithms

Informed search algorithms Informed search algorithms This lecture topic Chapter 3.5-3.7 Next lecture topic Chapter 4.1-4.2 (Please read lecture topic material before and after each lecture on that topic) Outline Review limitations

More information

A Hybrid Recursive Multi-Way Number Partitioning Algorithm

A Hybrid Recursive Multi-Way Number Partitioning Algorithm Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence A Hybrid Recursive Multi-Way Number Partitioning Algorithm Richard E. Korf Computer Science Department University

More information

Searching for Shortest Path in A Large, Sparse Graph under Memory Limitation: A Successive Mixed Bidirectional Search Method

Searching for Shortest Path in A Large, Sparse Graph under Memory Limitation: A Successive Mixed Bidirectional Search Method Searching for Shortest Path in A Large, Sparse Graph under Memory Limitation: A Successive Mixed Bidirectional Search Method Xugang Ye Department of Applied Mathematics and Statistics, The Johns Hopkins

More information

CS 331: Artificial Intelligence Uninformed Search. Real World Search Problems

CS 331: Artificial Intelligence Uninformed Search. Real World Search Problems S 331: rtificial Intelligence Uninformed Search 1 Real World Search Problems 2 1 Simpler Search Problems 3 Static Observable Discrete Deterministic Single-agent ssumptions bout Our Environment 4 2 Search

More information

Search-based Planning with Motion Primitives. Maxim Likhachev Carnegie Mellon University

Search-based Planning with Motion Primitives. Maxim Likhachev Carnegie Mellon University Search-based Planning with Motion Primitives Maxim Likhachev Carnegie Mellon University generate a graph representation of the planning problem search the graph for a solution What is Search-based Planning

More information

CS 331: Artificial Intelligence Uninformed Search. Real World Search Problems

CS 331: Artificial Intelligence Uninformed Search. Real World Search Problems S 331: rtificial Intelligence Uninformed Search 1 Real World Search Problems 2 1 Simpler Search Problems 3 ssumptions bout Our Environment Fully Observable Deterministic Sequential Static Discrete Single-agent

More information

CSE 473: Artificial Intelligence

CSE 473: Artificial Intelligence CSE 473: Artificial Intelligence Constraint Satisfaction Luke Zettlemoyer Multiple slides adapted from Dan Klein, Stuart Russell or Andrew Moore What is Search For? Models of the world: single agent, deterministic

More information

Uninformed Search Strategies

Uninformed Search Strategies Uninformed Search Strategies Alan Mackworth UBC CS 322 Search 2 January 11, 2013 Textbook 3.5 1 Today s Lecture Lecture 4 (2-Search1) Recap Uninformed search + criteria to compare search algorithms - Depth

More information

CS 331: Artificial Intelligence Informed Search. Informed Search

CS 331: Artificial Intelligence Informed Search. Informed Search CS 331: Artificial Intelligence Informed Search 1 Informed Search How can we make search smarter? Use problem-specific knowledge beyond the definition of the problem itself Specifically, incorporate knowledge

More information

Activity 13: Amortized Analysis

Activity 13: Amortized Analysis Activity 3: Amortized Analysis Model : Incrementing a binary counter 5 4 3 2 9 8 7 6 5 4 3 2 2 3 4 5 Model shows a binary counter, stored as an array of bits with the 2 i place stored at index i, undergoing

More information

CPS 170: Artificial Intelligence Search

CPS 170: Artificial Intelligence   Search CPS 170: Artificial Intelligence http://www.cs.duke.edu/courses/spring09/cps170/ Search Instructor: Vincent Conitzer Search We have some actions that can change the state of the world Change resulting

More information

Lecture 3 - States and Searching

Lecture 3 - States and Searching Lecture 3 - States and Searching Jesse Hoey School of Computer Science University of Waterloo January 15, 2018 Readings: Poole & Mackworth Chapt. 3 (all) Searching Often we are not given an algorithm to

More information

To earn the extra credit, one of the following has to hold true. Please circle and sign.

To earn the extra credit, one of the following has to hold true. Please circle and sign. CS 188 Spring 2011 Introduction to Artificial Intelligence Practice Final Exam To earn the extra credit, one of the following has to hold true. Please circle and sign. A I spent 3 or more hours on the

More information

CS 331: Artificial Intelligence Informed Search. Informed Search

CS 331: Artificial Intelligence Informed Search. Informed Search CS 331: Artificial Intelligence Informed Search 1 Informed Search How can we make search smarter? Use problem-specific knowledge beyond the definition of the problem itself Specifically, incorporate knowledge

More information

Class Overview. Introduction to Artificial Intelligence COMP 3501 / COMP Lecture 2: Search. Problem Solving Agents

Class Overview. Introduction to Artificial Intelligence COMP 3501 / COMP Lecture 2: Search. Problem Solving Agents Class Overview COMP 3501 / COMP 4704-4 Lecture 2: Search Prof. 1 2 Problem Solving Agents Problem Solving Agents: Assumptions Requires a goal Assume world is: Requires actions Observable What actions?

More information

CMU-Q Lecture 2: Search problems Uninformed search. Teacher: Gianni A. Di Caro

CMU-Q Lecture 2: Search problems Uninformed search. Teacher: Gianni A. Di Caro CMU-Q 15-381 Lecture 2: Search problems Uninformed search Teacher: Gianni A. Di Caro RECAP: ACT RATIONALLY Think like people Think rationally Agent Sensors? Actuators Percepts Actions Environment Act like

More information

CS 520: Introduction to Artificial Intelligence. Lectures on Search

CS 520: Introduction to Artificial Intelligence. Lectures on Search CS 520: Introduction to Artificial Intelligence Prof. Louis Steinberg Lecture : uninformed search uninformed search Review Lectures on Search Formulation of search problems. State Spaces Uninformed (blind)

More information

Computer Vision & Digital Image Processing. Global Processing via Graph-Theoretic Techniques

Computer Vision & Digital Image Processing. Global Processing via Graph-Theoretic Techniques Computer Vision & Digital Image Processing Edge Linking Via Graph Theoretic Techniques Dr. D. J. Jackson Lecture 19-1 Global Processing via Graph-Theoretic Techniques The previous method for edge-linking

More information

CS 771 Artificial Intelligence. Informed Search

CS 771 Artificial Intelligence. Informed Search CS 771 Artificial Intelligence Informed Search Outline Review limitations of uninformed search methods Informed (or heuristic) search Uses problem-specific heuristics to improve efficiency Best-first,

More information

Route planning / Search Movement Group behavior Decision making

Route planning / Search Movement Group behavior Decision making Game AI Where is the AI Route planning / Search Movement Group behavior Decision making General Search Algorithm Design Keep a pair of set of states: One, the set of states to explore, called the open

More information

Algorithm Design (8) Graph Algorithms 1/2

Algorithm Design (8) Graph Algorithms 1/2 Graph Algorithm Design (8) Graph Algorithms / Graph:, : A finite set of vertices (or nodes) : A finite set of edges (or arcs or branches) each of which connect two vertices Takashi Chikayama School of

More information

Lecture Summary CSC 263H. August 5, 2016

Lecture Summary CSC 263H. August 5, 2016 Lecture Summary CSC 263H August 5, 2016 This document is a very brief overview of what we did in each lecture, it is by no means a replacement for attending lecture or doing the readings. 1. Week 1 2.

More information