Searching for Shortest Path in A Large, Sparse Graph under Memory Limitation: A Successive Mixed Bidirectional Search Method

Xugang Ye
Department of Applied Mathematics and Statistics, The Johns Hopkins University

Shortest Path Problems: Past Results and New Challenges
Classical Problems
  Finite Graphs, Networks
  One-to-One Shortest Path Problem
  One-to-All Shortest Path Problem
  All Pairs Shortest Path Problem
Algorithms
  Label Setting Algorithms (e.g., Dijkstra's Algorithm)
  Label Correcting Algorithms (e.g., Bellman-Ford Algorithm)
  Auction Algorithms (e.g., Bertsekas Bidding Algorithm)

Shortest Path Problems: Past Results and New Challenges (Contd.)
New Challenges
  Large/Infinite Graphs/Networks
  Memory Limitation
  Nondeterministic Arcs/Edges
  Dynamic Graphs/Networks

Problem Statement
We consider a directed, positively weighted graph denoted as $D = (s, t, V, A, W)$, where $s$ is the starting node, $t$ is the destination node, $V$ is the set of other nodes, $A$ is the set of arcs, and $W: A \to \mathbb{R}^+$ is the weight function, which satisfies $\delta < W(u, v) < +\infty$ for every $(u, v) \in A$, where $\delta > 0$ is a constant.
Definition. A directed, positively weighted graph $D = (s, t, V, A, W)$ is called locally finite if for each node $u \in V \cup \{s, t\}$, the set $N_u = \{v \mid (u, v) \in A \text{ or } (v, u) \in A\}$ is finite. Furthermore, $D$ is called locally very sparse if there exists a small positive integer $B$ such that the size of $N_u$, denoted $|N_u|$, is bounded above by $B$ for every $u \in V \cup \{s, t\}$.
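Both standing assumptions (every arc weight strictly between δ and +∞, and |N_u| ≤ B for every node) can be checked mechanically on an explicit arc list. The following is a minimal Python sketch that assumes a hypothetical encoding of A and W as a dict from arcs (u, v) to weights; in the large-graph setting of this talk the graph would be generated on demand, so the sketch only illustrates the definitions.

```python
from collections import defaultdict
from math import inf


def check_graph(arcs, delta, B):
    """Check the weight bound and local very sparsity of D.

    arcs: dict mapping each arc (u, v) in A to W(u, v) (hypothetical encoding).
    Returns (weights_ok, locally_very_sparse).
    """
    # delta < W(u, v) < +inf must hold for every arc
    weights_ok = all(delta < w < inf for w in arcs.values())
    # N_u collects out-neighbors and in-neighbors together
    neighbors = defaultdict(set)
    for u, v in arcs:
        neighbors[u].add(v)
        neighbors[v].add(u)
    sparse = all(len(nbrs) <= B for nbrs in neighbors.values())
    return weights_ok, sparse
```

For example, with arcs s→a, a→t and s→t every node has exactly two neighbors, so the graph is locally very sparse for B = 2 but not for B = 1.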

Problem Statement (Contd.)
We assume that the graph D = (s, t, V, A, W, B) is locally very sparse, and we consider only directed paths. For any two nodes u and v, dist(u, v) denotes the distance from u to v: if there is no u-v path, dist(u, v) = +∞; otherwise, dist(u, v) is the length of a shortest u-v path.
Goal: Find a shortest s-t path in D.
It is easy to see that if there is at least one s-t path, then there is at least one shortest s-t path.

Methods for Large-Scale Problems
Classical Best-First Search: restricted by the memory limit.
Depth-First Search: only applicable to graphs with very few cycles.
Classical Best-First Search + External Storage: low efficiency in duplicate detection.
Frontier Search (FS) + Divide-and-Conquer Bidirectional Frontier Search (DCBFS): a very good method for large, sparse graphs.
Frontier Search (FS) + Divide-and-Conquer Unidirectional Frontier Search (DCUFS): a very good method for large, sparse graphs, but needs a good heuristic.
Successive Mixed Bidirectional Search: an alternative to the divide-and-conquer technique, and advantageous in utilizing external storage.

Dijkstra's Algorithm
Definition. An algorithm for finding a shortest s-t path in D is called complete if it finds an s-t path whenever one exists in D.
Definition. A complete algorithm for finding a shortest s-t path in D is called optimal if it finds a shortest s-t path whenever an s-t path exists in D.
Dijkstra's Algorithm in Best-First Search Version (Algorithm 1)
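The listing of Algorithm 1 did not survive extraction. As a stand-in, here is a minimal Python sketch of the textbook best-first Dijkstra with an Open list (a binary heap with lazy deletion) and a Closed list. The adjacency encoding `succ` and the function name are assumptions, and the control flow need not match the numbered steps of the original Algorithm 1.

```python
import heapq
from itertools import count
from math import inf


def dijkstra(succ, s, t):
    """Best-first Dijkstra from s to t.

    succ: dict node -> dict(successor -> arc weight) (assumed encoding).
    Returns (dist(s, t), path), or (inf, None) if t is unreachable.
    """
    d = {s: 0.0}                          # distance labels of generated nodes
    pred = {s: None}                      # predecessor pointers for path recovery
    closed = set()                        # the Closed list
    tie = count()                         # FIFO tie-breaking among equal labels
    open_heap = [(0.0, next(tie), s)]     # the Open list (lazy deletion)
    while open_heap:
        du, _, u = heapq.heappop(open_heap)
        if u in closed or du > d[u]:
            continue                      # stale heap entry
        closed.add(u)
        if u == t:                        # t selected: its label is final
            path = []
            while u is not None:
                path.append(u)
                u = pred[u]
            return du, path[::-1]
        for v, w in succ.get(u, {}).items():
            if v not in closed and du + w < d.get(v, inf):
                d[v] = du + w
                pred[v] = u
                heapq.heappush(open_heap, (d[v], next(tie), v))
    return inf, None                      # Open exhausted: t is unreachable from s
```

On the small graph s→a (1), s→b (4), a→b (2), a→t (6), b→t (1), the call `dijkstra(succ, "s", "t")` returns `(4.0, ["s", "a", "b", "t"])`.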

Dijkstra's Algorithm (Contd.)

Dijkstra's Algorithm (Contd.)
Well-Known Properties of Dijkstra's Algorithm
If the algorithm terminates at its Step 2, then t is unreachable from s.
Dijkstra's algorithm in best-first search version is complete.
Dijkstra's algorithm in best-first search version is optimal.

Dijkstra's Algorithm (Contd.)
Supplemental Properties of Dijkstra's Algorithm (E: Closed list; O: Open list)
Theorem 2.2.1 (Monotonicity 1). Let $P: v_1 (= s) \to v_2 \to \cdots \to v_k (= t)$ be a shortest s-t path. At any time when $t \notin E \neq \emptyset$, there exists an index $i$ with $1 \le i < k$ such that $v_h \in E$ for every $1 \le h \le i$ and $v_{i+1} \in O$. Moreover, $d(v_h) = \mathrm{dist}(s, v_h) = L(P(s, v_h))$ for every $1 \le h \le i+1$, where $P(s, v_h)$ denotes the subpath of $P$ from $s$ to $v_h$.
Theorem 2.2.2 (Monotonicity 2). At any time when $E \neq \emptyset$ and $O \neq \emptyset$, for any $u \in E$ and $v \in O$, $\mathrm{dist}(s, u) = d(u) \le \mathrm{dist}(s, v) \le d(v)$.
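A direct consequence of Theorem 2.2.2 is that nodes are closed in nondecreasing order of their labels, and each closing label equals dist(s, ·). The Python sketch below checks this consequence on a small graph by recording the closing order and comparing against independently computed Bellman-Ford distances. The helper names are hypothetical.

```python
import heapq
from math import inf


def closing_labels(succ, s):
    """Run Dijkstra from s to exhaustion; record (node, d(node)) in closing order."""
    d, closed, order = {s: 0.0}, set(), []
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u in closed:
            continue                      # stale entry of an already closed node
        closed.add(u)
        order.append((u, du))
        for v, w in succ.get(u, {}).items():
            if v not in closed and du + w < d.get(v, inf):
                d[v] = du + w
                heapq.heappush(heap, (d[v], v))
    return order


def bellman_ford(succ, s):
    """Reference distances dist(s, .) computed independently of Dijkstra."""
    nodes = set(succ) | {v for nb in succ.values() for v in nb}
    dist = {v: inf for v in nodes}
    dist[s] = 0.0
    for _ in range(len(nodes) - 1):
        for u, nb in succ.items():
            for v, w in nb.items():
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
    return dist
```

On the example graph used for Algorithm 1, the closing order is s (0), a (1), b (3), t (4): nondecreasing, and each label matches the reference distance.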

Frontier Search: Algorithm
Idea: reduce the memory requirement by not storing the Closed list.
Frontier Dijkstra's Algorithm (Algorithm 2)
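The listing of Algorithm 2 is also missing. The Python sketch below shows one way to drop the Closed list on a directed graph, assuming the frontier-search bookkeeping of Korf and Zhang as we understand it. Each Open node remembers which neighbors are already closed ("used" operators), and an in-neighbor of a closing node that has never been generated enters Open as a dummy with label +∞, so that it can never regenerate the closed node. This is a reconstruction, not the author's Algorithm 2, and the graph is held in memory only to keep the example runnable.

```python
import heapq
from itertools import count
from math import inf


def frontier_dijkstra(succ, s, t):
    """Dijkstra without a Closed list (frontier search on a directed graph).

    succ: dict node -> dict(successor -> weight) (assumed encoding).
    Returns dist(s, t) only: closed nodes and their pointers are discarded.
    """
    pred = {}                                  # in-neighbors, so N_u can be listed
    for u, nb in succ.items():
        for v, w in nb.items():
            pred.setdefault(v, {})[u] = w
    label = {s: 0.0}      # Open list: node -> d label (inf marks a dummy node)
    used = {s: set()}     # per Open node: neighbors that are already closed
    tie = count()
    heap = [(0.0, next(tie), s)]
    while heap:
        du, _, u = heapq.heappop(heap)
        if u not in label or du > label[u]:
            continue                           # stale entry
        if du == inf:
            break                              # only dummies remain: t unreachable
        if u == t:
            return du
        del label[u]                           # close u by simply forgetting it
        done = used.pop(u)
        out = succ.get(u, {})
        for v in set(out) | set(pred.get(u, {})):   # N_u
            if v == u or v in done:
                continue                       # closed neighbor: never regenerate it
            if v in label:
                used[v].add(u)                 # v must never regenerate u
                if v in out and du + out[v] < label[v]:
                    label[v] = du + out[v]
                    heapq.heappush(heap, (label[v], next(tie), v))
            else:
                # new node: a successor gets a finite label, a pure
                # in-neighbor becomes a dummy with label inf
                label[v] = du + out[v] if v in out else inf
                used[v] = {u}
                heapq.heappush(heap, (label[v], next(tie), v))
    return inf
```

Because closed nodes and their predecessor pointers are discarded, this function returns dist(s, t) but no path. Recovering the path is exactly the job of the divide-and-conquer technique or of the successive method discussed later.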

Frontier Search: Theoretical Results
Equivalence Relation (Korf and Zhang, 2005)
Theorem 3.1.1 (Equivalence 1). If a node u is selected in Step 3 of Algorithm 2, then after u is closed in Step 6, it is never reopened.
Theorem 3.1.2 (Equivalence 2). With the same tie-breaking rule, Algorithm 1 and Algorithm 2 select, in each iteration, the same node with the same d label and the same predecessor.

Divide-and-Conquer Technique
[Figure: DCBFS illustrated on nodes s, u1, u2, u3, t; panels: "Find first intermediate node" and "Find second and third intermediate nodes".]
Drawback: the structure of the algorithm is complicated, and considerably many nodes are visited multiple times.

Divide-and-Conquer Technique (Contd.)
One-pass termination condition for the bidirectional FS:
$$c \le \min_{x \in O_s} d_s(x) + \min_{y \in O_t} d_t(y),$$
where $O_s$ and $O_t$ are the Open lists of the forward and backward searches, $d_s(v)$ is the distance label of $v$ in the forward search, $d_t(v)$ is the distance label of $v$ in the backward search, and $c$ is the length of the shortest s-t path found so far.
Drawback: the quantity $c$ may reach the global minimum much earlier than this event is detected.
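The termination rule can be illustrated with a plain bidirectional Dijkstra. Closed sets are kept for brevity, so this is not a frontier search. In this minimal Python sketch, which assumes the same hypothetical `succ` encoding as the earlier examples, c is updated whenever an arc links a node labeled by one search to a node labeled by the other, and the loop stops as soon as c is at most the sum of the two minimum Open labels.

```python
import heapq
from math import inf


def bidirectional_dijkstra(succ, s, t):
    """Forward search from s and backward search from t until
    c <= min_{x in O_s} d_s(x) + min_{y in O_t} d_t(y). Returns c."""
    pred = {}
    for u, nb in succ.items():
        for v, w in nb.items():
            pred.setdefault(v, {})[u] = w
    adj = (succ, pred)                      # forward / backward arc sets
    d = ({s: 0.0}, {t: 0.0})                # d_s and d_t labels
    heaps = ([(0.0, s)], [(0.0, t)])        # Open lists O_s and O_t
    closed = (set(), set())
    c = 0.0 if s == t else inf              # shortest s-t length found so far

    def top(i):
        # minimum label in Open list i, dropping stale entries first
        h = heaps[i]
        while h and h[0][1] in closed[i]:
            heapq.heappop(h)
        return h[0][0] if h else inf

    while top(0) + top(1) < c:
        i = 0 if top(0) <= top(1) else 1    # expand the side with the smaller minimum
        du, u = heapq.heappop(heaps[i])
        closed[i].add(u)
        for v, w in adj[i].get(u, {}).items():
            if du + w < d[i].get(v, inf):
                d[i][v] = du + w
                heapq.heappush(heaps[i], (d[i][v], v))
            if v in d[1 - i]:               # this arc connects the two searches
                c = min(c, du + w + d[1 - i][v])
    return c
```

On the example graph, c already equals 4 after the third expansion, and the loop ends only when the two Open minima (3 and 1) sum to at least c. On larger graphs the gap between the two moments is the drawback noted on the slide.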

Solving the Technical Difficulties
Solution of Korf et al.
  Divide-and-Conquer Unidirectional Frontier Search (DCUFS)
  Midline heuristics
Our Solution
  Successive Mixed Bidirectional Search

Mixed Bidirectional Search
Idea: Find a nontrivial path $P: v_1 (= s) \to v_2 \to \cdots \to v_k\ (\neq s)$ such that $P$ is part of a shortest s-t path in D.
Technical detail: A forward version of Algorithm 1 starts from s and proceeds as long as the allocated memory allows; a backward version of Algorithm 2 starts from t and proceeds to meet the forward search.
One-pass termination condition: the backward search selects, for the first time, a node that has already been closed by the forward search.

Mixed Bidirectional Search (Contd.)
[Figure: mixed bidirectional search illustrated on nodes s, u, u*, t.]

Mixed Bidirectional Search: Algorithm
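The listing of Algorithm 3 is missing. The Python sketch below follows the slide's description under stated assumptions. The memory limit is modeled by a hypothetical `budget` on the number of forward closings. The backward search is a plain Dijkstra on reversed arcs; for brevity it keeps a closed set and is not a true frontier search. When the termination condition fires, u* is chosen as the minimizer of d_s(v) + d_t(v) over nodes that are closed forward and open backward. If u* = s, the arc from s to its backward next hop is returned so that the path is nontrivial. The step numbering of Algorithm 3 is not reproduced.

```python
import heapq
from math import inf


def _trace(pred, v):
    """Follow predecessor pointers back to the root; return the path."""
    path = []
    while v is not None:
        path.append(v)
        v = pred[v]
    return path[::-1]


def mixed_bidirectional_step(succ, s, t, budget):
    """One pass: return a path [s, ..., u] with at least one arc that is part of
    a shortest s-t path (the whole path if the forward search closes t), or None
    if t is unreachable. `budget` (>= 1) caps the number of forward closings."""
    assert budget >= 1
    # Forward best-first Dijkstra with a Closed list E_s and predecessor pointers.
    d_s, pred_s, E_s = {s: 0.0}, {s: None}, set()
    heap = [(0.0, s)]
    while heap and len(E_s) < budget:
        du, u = heapq.heappop(heap)
        if u in E_s:
            continue
        E_s.add(u)
        if u == t:
            return _trace(pred_s, t)           # entire shortest s-t path found
        for v, w in succ.get(u, {}).items():
            if du + w < d_s.get(v, inf):
                d_s[v], pred_s[v] = du + w, u
                heapq.heappush(heap, (d_s[v], v))
    if not heap:
        return None                            # forward search exhausted
    # Backward Dijkstra from t on reversed arcs; next_t[v] is v's next hop toward t.
    pred = {}
    for u, nb in succ.items():
        for v, w in nb.items():
            pred.setdefault(v, {})[u] = w
    d_t, next_t, E_t = {t: 0.0}, {t: None}, set()
    heap = [(0.0, t)]
    while heap:
        dx, x = heapq.heappop(heap)
        if x in E_t:
            continue
        if x in E_s:
            # Termination: first selection of a forward-closed node. Minimize
            # d_s + d_t over E_s intersect O_t (x itself still counts as open).
            cands = [v for v in E_s if v in d_t and v not in E_t]
            u_star = min(cands, key=lambda v: d_s[v] + d_t[v])
            path = _trace(pred_s, u_star)
            if u_star == s:
                path.append(next_t[s])         # make the returned path nontrivial
            return path
        E_t.add(x)
        for v, w in pred.get(x, {}).items():
            if dx + w < d_t.get(v, inf):
                d_t[v], next_t[v] = dx + w, x
                heapq.heappush(heap, (d_t[v], v))
    return None                                # backward search exhausted
```

On the example graph with a budget of one or two forward closings, the pass returns the prefix `["s", "a"]` of the shortest path s→a→b→t. With a large budget, the forward search closes t and returns the whole path.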

Mixed Bidirectional Search: Correctness
Theorem 4.1.1. If there exists an s-t path in D, then Algorithm 3 terminates within finitely many steps. Upon termination, it returns a path P that is part of a shortest s-t path and satisfies $L(P) > \delta$.
Sketch of Proof.
1. Algorithm 3 terminates within finitely many steps: either it terminates with $t \in E_s$ at Step 7, or it finds $u^*$ at Step 9 and jumps to Step 13.
2. In the first scenario, an entire shortest s-t path is found; in the second scenario, a part of a shortest s-t path is found.
3. Show that $u^* = \arg\min_{v \in E_s \cap O_t} \left( d_s(v) + d_t(v) \right)$ lies on a shortest s-t path.
4. Consider the case $u^* \neq s$ and the case $u^* = s$ separately.

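Successive Mixed Bidirectional Search: Algorithm
Idea: Apply Algorithm 3 successively. Starting from $s_1 = s$, the $k$-th pass returns a partial path $P_k$ from $s_k$ to a node $s_{k+1}$, and the next pass restarts from $s_{k+1}$, until $s_{K+1} = t$. The resulting algorithm is named Algorithm 4.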
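The outer loop of Algorithm 4 only needs a one-pass routine with the guarantee of Theorem 4.1.1. The following is a minimal Python sketch of that loop. It assumes the routine is passed in as a hypothetical callable `step(s_k, t)` that returns a nontrivial path from s_k lying on a shortest s_k-t path, or None if t is unreachable. A one-pass mixed bidirectional search would play that role.

```python
def successive_search(s, t, step):
    """Apply a one-pass routine repeatedly and concatenate the partial paths.

    step(s_k, t) (hypothetical callable) must return [s_k, ..., s_{k+1}] with at
    least one arc, lying on a shortest s_k-t path, or None if t is unreachable.
    Returns the concatenated s-t path, or None.
    """
    path = [s]
    while path[-1] != t:
        piece = step(path[-1], t)
        if piece is None:
            return None                        # t unreachable from s
        assert piece[0] == path[-1] and len(piece) >= 2, "step must make progress"
        path.extend(piece[1:])                 # append P_k without repeating s_k
    return path
```

For a quick check, a stand-in `step` that follows a known next-hop table, such as `lambda u, t: [u, {"s": "a", "a": "b", "b": "t"}[u]]`, makes `successive_search("s", "t", step)` return `["s", "a", "b", "t"]` after three passes.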

Successive Mixed Bidirectional Search: Correctness
Theorem 4.2.1. Algorithm 4 is both complete and optimal; i.e., if there exists an s-t path in D, then Algorithm 4 returns a shortest s-t path after a finite number of iterations.
Sketch of Proof.
1. Note that $\mathrm{dist}(s, t) < +\infty$.
2. Note that $L(P_k) = \mathrm{dist}(s_k, t) - \mathrm{dist}(s_{k+1}, t)$, with $s_1 = s$.
3. Note that $k\delta < \sum_{i=1}^{k} L(P_i) = \mathrm{dist}(s, t) - \mathrm{dist}(s_{k+1}, t) \le \mathrm{dist}(s, t)$. Hence $k < \mathrm{dist}(s, t)/\delta$, so the number of iterations is finite.
4. Note that $L(P) = \sum_{i=1}^{K} L(P_i) = \mathrm{dist}(s, t)$, where $s_{K+1} = t$.

Successive Mixed Bidirectional Search: Acceleration
Idea: Reduce the repeated node expansions incurred by overlapping backward searches.
Technical detail
1. Initially, run a full backward FS as stated in Algorithm 2. During the lifetime of this full backward FS, strategically save some intermediate fronts to an external storage device (e.g., a hard drive).
2. During each pass of mixed bidirectional search, load a saved front as needed and move this front when necessary.
3. Select an appropriate heuristic to determine which fronts to save.
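Item 1 can be sketched as a full backward Dijkstra from t that writes a snapshot of its Open list to disk each time the selected label reaches the next multiple of a hypothetical `spacing` parameter. Evenly spaced radii are only a placeholder for the heuristic of item 3, and `pickle` files in a temporary directory stand in for the external storage device. For brevity, the search keeps all labels in memory and saves only labels and next hops. A front that a frontier search is meant to resume would also need that search's duplicate-detection data.

```python
import heapq
import os
import pickle
import tempfile
from math import inf


def save_backward_fronts(succ, t, spacing, out_dir=None):
    """Full backward Dijkstra from t that pickles its Open front whenever the
    selected label reaches the next multiple of `spacing` (> 0).

    Returns a list of (smallest front label, file path) pairs.
    """
    out_dir = out_dir or tempfile.mkdtemp()
    pred = {}                                  # reversed arcs
    for u, nb in succ.items():
        for v, w in nb.items():
            pred.setdefault(v, {})[u] = w
    d, nxt, closed = {t: 0.0}, {t: None}, set()
    heap, saved, next_radius = [(0.0, t)], [], spacing
    while heap:
        dx, x = heapq.heappop(heap)
        if x in closed:
            continue                           # stale entry
        if dx >= next_radius:
            # every node with dist(., t) < dx is closed; save the current front
            front = {v: (d[v], nxt[v]) for v in d if v not in closed}
            path = os.path.join(out_dir, f"front_{len(saved)}.pkl")
            with open(path, "wb") as f:
                pickle.dump(front, f)
            saved.append((dx, path))
            while next_radius <= dx:
                next_radius += spacing
        closed.add(x)
        for v, w in pred.get(x, {}).items():
            if dx + w < d.get(v, inf):
                d[v], nxt[v] = dx + w, x
                heapq.heappush(heap, (d[v], v))
    return saved
```

On the example graph with `spacing=2.0`, two fronts are written: {a, s} at radius 3 and {s} at radius 4. A later mixed bidirectional pass could load the front whose radius best matches its needs instead of restarting its backward search from t.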

Preliminary Numerical Tests
[Figure: the test graph in local view (left) and global view (right).]

Preliminary Numerical Tests (Contd.)
[Figure: visualization of a test on Algorithm 4 (left: the first partial solution path; right: the third partial solution path).]

Preliminary Numerical Tests (Contd.)
[Figure: visualization of a test on Algorithm 4, continued (left: the entire solution path; right: the CPU time for finding each partial solution path).]

Preliminary Numerical Tests (Contd.)
[Figure: visualization of a test on accelerated Algorithm 4 (left: the effective backward fronts and the entire solution path; right: the CPU time for the initial full backward FS and for finding each partial solution path).]

Preliminary Numerical Tests (Contd.)
Performance summary of five algorithms (the algorithms are coded in Matlab 7.1, and the programs are executed on a PC with an Intel dual-core T2050 CPU at 1.60 GHz and 1.0 GB of RAM).

Conclusions and Perspectives
Our method is an alternative to the divide-and-conquer technique.
If memory saving is the main concern and the goal is to attack extremely large problems, then the divide-and-conquer technique is probably the better option.
If considerably large memory is available and computational efficiency matters more, then our method is more advantageous.
Our algorithm has a simpler structure than a divide-and-conquer algorithm.
We suggest investigating the possibility of designing a mixed bidirectional A* algorithm and then considering how to apply it successively.

Questions? Thanks very much!