Graph and A* Analysis
Kyle Ray
Entertainment Arts and Engineering, University of Utah, Salt Lake City, UT 84102
kray@eng.utah.edu

Abstract: Pathfinding in video games is an essential tool for both the player and the AI to reach a destination efficiently, and A* is the standard algorithm for finding the shortest path. This paper discusses implementing a graph to represent spatial points and a pathfinding algorithm to traverse it, evaluates their performance on significantly large graphs, and considers ways to improve the performance of both the graph and the pathfinder.

I. Introduction

Representing spatial distance is essential in video games, and finding the quickest path from one space to the next must be done in a timely manner: the path needs to be found quickly so that the program waits as little as possible, and every millisecond counts. Real-time games such as StarCraft II demand paths almost instantly; a lost millisecond can be the difference between a successful attack and a failed one. This would not be a problem on small graphs, but a grid is rarely smaller than the size of a monitor, and grid sizes greater than 1000 x 1000 are common. We will test the efficiency of A* on large graphs with varying resolutions. This is a much simplified version of a game map, nowhere near the complexity of a AAA title; if the algorithm is too slow on the simplified version, it will be even slower in a AAA game, and if it is fast enough, it can be scaled up to meet those demands. The map is a simple n x m grid subdivided into cells of a given size, with 1 being the size of a pixel. The current algorithms used for the graph and the pathfinding are naive approaches and suffer considerably when the grid size reaches 1000 x 1000.

II. Graph

The graph is designed for a rectangular grid of n x m size with variable cell size.
Each vertex represents a spatial state in the grid, and the graph is stored as an adjacency list. Although the graph itself is undirected, the adjacency list is structured as if the graph were directed: two connected vertices each store an edge pointing to the other. This uses more space than is strictly needed. Each Vertex stores an (x, y) point that specifies its spatial location on the grid and identifies it to the graph. The other property a Vertex has is a list of the Vertices it is connected to and
the weight associated with traversing the spatial distance between them. In this example all horizontal and vertical movement has the same weight W, while diagonal movement has weight W2 = W * sqrt(2). This uniformity makes it easy to auto-generate a graph of n x m size with reliable heuristics, without dealing with random weights. A next step would be to add varied weights to signify different terrain and slopes.

III. A*

A* was developed in 1968 by Peter Hart, Nils Nilsson, and Bertram Raphael as a generalization of Edsger Dijkstra's algorithm, which Dijkstra developed in 1959. Dijkstra's algorithm was developed to find the shortest path on a weighted graph. It uses the cost to get from the starting vertex to the current vertex (the G cost) in a priority queue to determine which vertex to explore next. When a vertex comes out of the priority queue, it is fully explored and cannot receive a better G cost. Since Dijkstra's algorithm only looks at the G cost, it explores radially from the source and can be expensive when a single known destination vertex is the goal.

function Dijkstra(Graph, source):
    for each v in Graph:
        dist[v] := infinity
        prev[v] := v
        visited[v] := False
    dist[source] := 0
    pq := priority queue
    pq.insertorupdate(source, 0)
    while pq is not empty:
        current := pq.pop()
        visited[current] := True
        for each v in current.neighbors:
            if visited[v] == True:
                continue
            endif
            distance := dist[current] + distance_between(current, v)
            if distance < dist[v]:
                dist[v] := distance
                prev[v] := current
                pq.insertorupdate(v, distance)
            endif
    endwhile
    return (dist, prev)
    // dist holds the distance from the source to any vertex,
    // and prev can be used to retrace the path from the source to any vertex.
endfunction
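As a concrete illustration, here is a minimal runnable sketch of Dijkstra's algorithm in Python. The graph representation (a plain dict mapping each vertex to a list of (neighbor, weight) pairs) is an assumption for brevity; the paper's own Vertex class differs, and the lazy-deletion priority queue stands in for the pseudocode's insertorupdate operation.

```python
import heapq

def dijkstra(graph, source):
    """graph: dict mapping vertex -> list of (neighbor, weight) pairs."""
    dist = {v: float("inf") for v in graph}
    prev = {v: v for v in graph}          # each vertex initially points to itself
    dist[source] = 0
    pq = [(0, source)]                    # (distance, vertex) min-heap
    visited = set()
    while pq:
        d, current = heapq.heappop(pq)
        if current in visited:
            continue                      # stale heap entry; vertex already finalized
        visited.add(current)
        for v, w in graph[current]:
            if v in visited:
                continue
            distance = dist[current] + w
            if distance < dist[v]:
                dist[v] = distance
                prev[v] = current
                heapq.heappush(pq, (distance, v))
    return dist, prev
```

Pushing a duplicate heap entry and skipping stale pops is a common substitute when the priority queue has no decrease-key operation.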
A* is a general-purpose algorithm that adds a heuristic to Dijkstra's algorithm. The heuristic cost (H cost) estimates the cost from the current vertex to the destination. While the H cost can be computed in any way that suits the graph, there are three common ways to estimate spatial distance:

Manhattan Distance. Used to estimate the cost when only horizontal and vertical movements are possible, but no others (such as diagonal). The formula is W * (|x2 - x1| + |y2 - y1|), where W is the expected weight per step, included to better estimate the cost.

Diagonal Distance. Used when horizontal, vertical, and diagonal movements are possible, but no angles in between. With dx = |x2 - x1| and dy = |y2 - y1|, the formula is W * (dx + dy) + (W2 - 2 * W) * min(dx, dy), where W is the expected weight of a horizontal or vertical step and W2 is the expected cost of a diagonal step.

Euclidean Distance. Used when movement in any direction is allowed, so a straight line from the source to the destination is an accurate estimate. The formula is W * sqrt((x2 - x1)^2 + (y2 - y1)^2), where W is the expected weight of moving in any direction.

These estimates do not take into account variable weights or impassable vertices, which makes them underestimates and can hurt performance when impassable terrain means the shortest path is not the obvious one; in the worst case the wasted computation degrades the pathfinding into a simple greedy search. The other thing A* adds is the notion of a closed and an open set. The open set is the priority queue from Dijkstra's algorithm, which supplies vertices to evaluate; the closed set holds the vertices that have been fully evaluated. The implementation also makes use of lambdas so the user can decide which heuristic to use, making the algorithm more abstract.
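The three heuristics above translate directly into code. A minimal Python sketch, assuming each vertex is an (x, y) tuple and using the paper's uniform weights W and W2 = W * sqrt(2):

```python
import math

W = 1.0                    # assumed cost of a horizontal or vertical step
W2 = math.sqrt(2) * W      # assumed cost of a diagonal step, as in the paper

def manhattan(a, b):
    """Estimate when only horizontal/vertical moves are allowed."""
    return W * (abs(b[0] - a[0]) + abs(b[1] - a[1]))

def diagonal(a, b):
    """Estimate when horizontal, vertical, and diagonal moves are allowed."""
    dx, dy = abs(b[0] - a[0]), abs(b[1] - a[1])
    return W * (dx + dy) + (W2 - 2 * W) * min(dx, dy)

def euclidean(a, b):
    """Estimate when movement in any direction is allowed."""
    return W * math.hypot(b[0] - a[0], b[1] - a[1])
```

Because these functions share a signature, any of them can be passed to an A* implementation as the heuristic lambda the paper describes.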
If the user wants to use Dijkstra's algorithm instead of a heuristic, then h_cost simply needs to be 0 for all vertices so that the priority queue orders by the g cost alone.

function A*(Graph, source, destination, heuristic):
    closedset := an empty set
    openset := an empty priority queue
    for each v in Graph:
        g_cost[v] := infinity
        previous[v] := v
    openset.insertorupdate(source, 0)
    g_cost[source] := 0
    while openset is not empty:
        current := openset.pop()
        closedset.insert(current)
        if current is the destination:
            break
        endif
        for each v in current.getneighbors():
            if v is in closedset:
                continue
            endif
            tentative := g_cost[current] + distance_from(current, v)
            if tentative < g_cost[v]:
                g_cost[v] := tentative
                previous[v] := current
                h_cost := heuristic(v, destination)
                openset.insertorupdate(v, g_cost[v] + h_cost)
            endif
    endwhile
    path := empty list
    current_vertex := destination
    while previous[current_vertex] is not current_vertex:
        path.insert(current_vertex)
        current_vertex := previous[current_vertex]
    endwhile
    return path
    // Built this way, the path is in reverse order, with the destination at
    // index 0; the source is not in the path, but that should not be a
    // problem since the source is already known.
endfunction

IV: Timing

There are two parts to finding the shortest path. In order to get the path, the graph must be constructed and linked: constructing the graph is simply creating each vertex, while linking the graph is creating each edge for each vertex. With the graph constructed, the A* algorithm can be timed to see how long it takes to find the shortest path from the source to the destination. The timing algorithm checks for doubling behavior to get a sense of big-O asymptotic complexity. It does that by creating an n x n grid that starts at 32 and doubles until it reaches 1024 or 2048. While timing the linking and the pathfinding, it runs the simulation repeatedly, starting at 1 repetition and doubling until the duration of all repetitions exceeds a second. This gives a better sense of the true time over a larger average and compensates for the time it takes to load external libraries the first few times the algorithm is run. After the overall time is taken, the timing loop is run again without the actual algorithm in order to account for loop overhead. This overhead is then subtracted from the overall time to get the true time to run the algorithm. The timing algorithm times how long it takes to link the graph and find the shortest path from
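The A* pseudocode above can be sketched as runnable Python. As with the earlier example, the dict-based graph and the lazy-deletion heap are assumptions standing in for the paper's Vertex class and insertorupdate queue; unlike the pseudocode, this sketch reverses the retraced path into source-to-destination order for readability.

```python
import heapq

def a_star(graph, source, destination, heuristic):
    """graph: dict vertex -> list of (neighbor, weight) pairs.
    heuristic(v, destination) -> estimated remaining cost."""
    g_cost = {v: float("inf") for v in graph}
    previous = {v: v for v in graph}      # each vertex initially points to itself
    g_cost[source] = 0
    open_set = [(heuristic(source, destination), source)]   # (f = g + h, vertex)
    closed_set = set()
    while open_set:
        _, current = heapq.heappop(open_set)
        if current in closed_set:
            continue                      # stale heap entry
        closed_set.add(current)
        if current == destination:
            break                         # destination fully evaluated; stop early
        for v, w in graph[current]:
            if v in closed_set:
                continue
            tentative = g_cost[current] + w
            if tentative < g_cost[v]:
                g_cost[v] = tentative
                previous[v] = current
                heapq.heappush(open_set, (tentative + heuristic(v, destination), v))
    # retrace from the destination back toward the source
    path = []
    node = destination
    while previous[node] != node:
        path.append(node)
        node = previous[node]
    path.reverse()
    return path                           # source excluded, as in the paper
```

Passing `heuristic=lambda v, d: 0` reduces this to Dijkstra's algorithm, exactly as the paper notes.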
one end of the graph to the other with varying resolutions, where the cell sizes are 10, 5, and 1.

for i = [32, 1024] where i *= 2:
    repetitions := 1.0
    do:
        elapsed := 0
        start := get current time
        for t in [0, repetitions) where t++:
            // run the algorithm
        end := get current time
        total_time := end - start
        start := get current time
        for t in [0, repetitions) where t++:
            // empty loop to measure the loop overhead
        end := get current time
        overhead_time := end - start
        elapsed := (total_time - overhead_time) in milliseconds / repetitions
        repetitions *= 2
    while (repetitions * elapsed < DURATION)

V: Results

Fig. 1 Time in milliseconds to link an n x n graph of cell size 10.
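The repetition-doubling and overhead-subtraction scheme above might look like the following in Python. The function name, the use of time.perf_counter, and the stopping condition (total measured time exceeding the duration budget) are assumptions; the paper's harness is pseudocode only.

```python
import time

DURATION = 1.0   # seconds; keep doubling repetitions until the run exceeds this

def time_with_overhead(func, duration=DURATION):
    """Time func(), doubling repetitions until the total run time exceeds
    `duration`, then subtract empty-loop overhead.
    Returns milliseconds per call."""
    repetitions = 1
    while True:
        start = time.perf_counter()
        for _ in range(repetitions):
            func()                        # the algorithm under test
        total = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(repetitions):
            pass                          # empty loop: measures loop overhead
        overhead = time.perf_counter() - start

        elapsed_ms = (total - overhead) * 1000.0 / repetitions
        if total >= duration:
            return elapsed_ms
        repetitions *= 2
```

A caller would invoke this once per grid size, e.g. `time_with_overhead(lambda: a_star(g, src, dst, h))`, and tabulate the results to look for doubling behavior.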
Fig. 2 Time in milliseconds to find the shortest path in an n x n graph of cell size 10.

The graphs in Figure 1 and Figure 2 show that the bottleneck of the program is linking the graph together. Finding a path is relatively quick, reaching about 20 milliseconds in a graph with approximately 12,000 vertices.

Fig. 3 Time in milliseconds to link a graph of cell size 5.
Fig. 4 Time in milliseconds to find the path in a graph of cell size 5.

A graph with finer resolution shows the same pattern as the graph with cell size 10. The time to find a path has increased because the number of vertices has grown to approximately 48,000. The time to link the vertices is far too long: at its worst it is nearly 280 seconds, over four minutes, for a sufficiently large graph.

Fig. 5 Time in milliseconds to link a graph of cell size 1.
Fig. 6 Time in milliseconds to find a path in a graph of cell size 1.

This is the first break from the pattern established at the larger cell sizes, and it suggests that even though linking the graph takes longer in the short run, the time to find a path will overtake the linking asymptotically. Between finding a path and linking the graph, it takes up to 30 minutes just to find a path on a graph with just short of 100,000 vertices, and 10 minutes just to link it. This is obviously unacceptable: a basic A* algorithm will not cut it, since real graph sizes will be much larger than those used in this testing and the weights will not be as uniform, adding to the cost.

VI: Conclusion

Even with relatively small graphs, the bottleneck presented by linking the graph or finding a path is unacceptable when an optimal solution must be found quickly. There are many optimizations that could increase the performance of the pathfinding algorithm. One is to change the architecture of the graph. The current graph uses an adjacency list with each vertex stored as if directed, so the linking algorithm must visit each vertex and link it to up to eight others using a search that finds each vertex in the graph. A first step could be to switch the list to a matrix, which would make finding each adjacent vertex constant time instead of linear. Also, since the graph is undirected, the matrix would be symmetric, so the algorithm could store only one half of it, saving memory and potentially improving cache behavior. Since the introduction of A* in 1968, many papers have improved and optimized it, such as ANA*, HPA*, Anytime A*, Weighted A*, and many others.
Any of these could provide the needed optimization, but HPA* fits this particular problem best, since it abstracts the graph into larger cell sizes to find the general shortest path and then recurses into each cell to find the shortest path at a smaller grid size. Another approach would be to build and link the graph at the same time as finding the path, up to a certain point such as 150 vertices or more;
while this would limit movement, it would increase the performance of the algorithm. There are clever ways to work around the limited movement, such as only allowing movement to vertices that are visible. There are many ways to improve performance, and many of them depend on what the graph and pathfinding algorithm will be used for.
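The constant-time adjacency suggested in the conclusion can be taken one step further for a uniform grid: since the graph's structure is fully determined by the grid dimensions, the neighbors of a vertex can be computed directly from its coordinates, with no stored adjacency structure at all. A minimal sketch of this idea (the function name and signature are assumptions, not the paper's implementation):

```python
def grid_neighbors(x, y, n, m):
    """Yield the (up to eight) neighbors of cell (x, y) in an n x m grid.
    No adjacency list lookup is needed: each candidate follows directly
    from the coordinates, so enumeration is constant time per vertex."""
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                  # skip the cell itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < m:
                yield nx, ny
```

With this approach the expensive linking phase disappears entirely for uniform-weight grids, though an explicit adjacency structure would still be needed once per-edge terrain weights are introduced.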