New Settings in Heuristic Search. Roni Stern


New Settings in Heuristic Search
Roni Stern
September 23, 2011

This work was carried out under the supervision of Dr. Ariel Felner and Dr. Meir Kalech at the Department of Information Systems Engineering, Ben-Gurion University.

Acknowledgments

I would like to thank my supervisors Dr. Ariel Felner and Dr. Meir Kalech... I would like to thank the staff of the ISE department... (Shosh and Anat)... I would like to thank Dr. Uzi Zahavi for providing the template for this dissertation, and for allowing me to use sentences from his dissertation; this is especially evident in the introduction chapter. I would like to thank my parents... I would like to thank my wife Adi...

Contents

1 Introduction & Overview
    Background
    Thesis Overview
    Related Publications
    Further Publications
2 Potential-based Search
    The Potential Search Algorithm
    Calculating the Potential
    Heuristic Models
    Constant Gap h-model
    Additive h-model
    Linear Relative h-model
    General h-model
    Bounded-cost Experimental Results
    Puzzle
    Key Player Problem in Communication (KPP-COM)
    Potential-based Anytime Search
    Experimental Results
    Conclusions and Future Work
3 Probably Approximately Correct Heuristic Search
    Related Work
    PAC Heuristic Search
    Formal Definition
    Framework for a PAC Heuristic Search Algorithm
    Identifying a PAC Solution
    Trivial PAC Condition
    Ratio-based PAC Condition
    Exploiting a Lower Bound
    Learning from the Open List

    3.4 Experimental Results
    Conclusions and Future Work
4 Searching for Patterns in an Unknown Graph
    Problem Definition
    Related Work
    Pathfinding
    Searching for a Pattern in a Graph
    Best-First Search in an Unknown Graph
    Computational complexity
    Deterministic Heuristics
    KnownDegree
    Pattern
    Probabilistic Heuristic
    MDP Approach
    RPattern
    Theoretical Analysis
    Generalizing to Any Given Pattern
    Experimental Results
    RLS-LTM
    Evaluating the Deterministic Heuristics
    A Real Domain of Unknown Graphs from the Web
    Simulated Graphs with Probabilistic Knowledge
    Conclusion and Future Work
5 Conclusions and Future Work 91
Bibliography 103

List of Figures

2.1 Example of an expansion dilemma
Manhattan distance heuristic vs. true distance
KPP-COM optimal solution vs. heuristic
puzzle, solution quality vs. runtime
Anytime KPP-COM, 600 nodes, density
h h distribution for the additive 7-8 PDB heuristic
Example of exploring an unknown graph
Example of a subgraph that is not an induced subgraph
Example of KnownDegree
Example of a matching extension and the Pattern heuristic algorithm
Example of potential patterns
An example of the incremental update of the set of potential k-cliques
Examples of complete bipartite graphs
Scenarios for best and worst case exploration cost
Non-probabilistic heuristic algorithms on random graphs
Non-probabilistic heuristics on scale-free graphs
Citation web graph from a random walk in GS
Runtime of exploring a web page
Random graphs, various levels of noise
Random graphs, different desired clique size

List of Tables

2.1 h models and their corresponding cost functions
Expanded nodes as a percentage of nodes expanded by A*. Fixed desired cost C
puzzle expanded nodes. The desired cost C was based on a suboptimality degree
puzzle results with an oracle. Varying degree of suboptimality
Average runtime in seconds on KPP-COM instances
Average number of nodes expanded until PAC-SF returned a solution
Performance of different PAC conditions
Runtime in milliseconds, on random graphs
Number of instances where the desired clique was found
Online search results of the GS web citation graph, searching for a 4-clique
Exploration cost and runtime, different max sample depth
Runtime (in sec.) when searching for cliques of different sizes

List of Algorithms

1 PAC search algorithm framework
2 Procedure Explore
3 Best-first search in an unknown graph
4 Pattern
5 Procedure IncrementalUpdate
6 RPattern

Abstract

Heuristic search is a general problem-solving method in Artificial Intelligence. As such, there are various settings for search problems. For example, in some search settings any solution found is acceptable, while in other settings only the best solution (according to some metric) is good enough. This dissertation presents three new settings for heuristic search: bounded-cost search, PAC search, and pattern search in unknown graphs. Bounded-cost search and PAC search extend the traditional notion of how the desired solution quality is defined. Searching for patterns in unknown graphs considers a different measure of search cost, namely exploration cost. The bounded-cost search problem is a search problem in which the task is to find a path in the state space with cost under a constant bound. Existing search algorithms are not well suited to solving a bounded-cost search problem efficiently, since they are not designed to consider a cost-bound parameter. In this dissertation the Potential Search (PTS) algorithm is introduced to address this setting. PTS is a best-first search that expands nodes according to the probability that they will be part of a solution of cost less than the required bound. This probability is termed the potential of a node. Calculating the potential of each node is challenging. However, it is possible to identify and expand the node with the highest potential without explicitly calculating the potential of any node. Identifying the node with the highest potential can be done by using a heuristic function that estimates the cost of the shortest path from a state to a goal state, and considering the relation between this heuristic function and the cost that it estimates (i.e., the lowest-cost path to a goal). One of the major contributions of this dissertation is in introducing probabilistic arguments such as the potential into a heuristic search algorithm, and demonstrating how this can be done efficiently.
Solving bounded-cost search problems has applications beyond the bounded-cost setting. Specifically, an anytime algorithm can be constructed by solving a sequence of bounded-cost search problems with a decreasing cost bound. An anytime algorithm is an algorithm that continues to run after the first solution is found, finding solutions of better quality until it is halted. Anytime algorithms are very common in time-critical applications such as robotics, where waiting until the optimal solution is found is often infeasible. The resulting algorithm, named

APTS, is a robust anytime algorithm that is competitive with, and often outperforms, state-of-the-art anytime search algorithms. In the second setting introduced in this dissertation, named PAC search, the concept of suboptimal search is generalized. In PAC search the task is to return a solution whose cost is, with high probability, at most a constant factor times the cost of the optimal solution. Defining PAC search is by itself a major contribution of this dissertation, as it is the first time that the desired solution quality of a search problem is defined probabilistically. Furthermore, a simple but general framework is given for constructing a PAC search algorithm. A key challenge in a PAC search algorithm is to identify when the incumbent solution is, with high enough probability, within the desired suboptimality bound. Several methods for this are given, based on learning a priori the cumulative distribution of the ratio between an existing heuristic function and the real value that it estimates. Empirical evaluation confirms the research hypothesis that finding a solution that is, with high probability, within a desired suboptimality bound can be done much faster than finding a solution that is guaranteed to be within that bound. The third search setting researched in this dissertation is the unknown graph. Previous work has described the unknown graph setting, in which the task is to minimize search effort. In this dissertation, the general problem of finding a given pattern in an unknown graph is addressed. A best-first search framework is given for solving this problem. Several heuristic algorithms in this best-first search framework are presented, including an algorithm (RPattern) that can exploit additional probabilistic knowledge of the searched graph. The problem of searching for a pattern in an unknown graph is theoretically analyzed. The theoretical analysis shows that searching with a random walk or with any other algorithm will have similar best-case, worst-case and competitive-ratio bounds.
However, experimental results show that the proposed heuristic algorithms are much more efficient in terms of exploration cost than random exploration. These experiments were performed on various types of graphs, including simulated random and scale-free graphs, as well as on an online web crawler application that searches for a k-clique pattern in Google Scholar. Most parts of this dissertation have been previously published in [Stern et al., 2010a; 2010b; 2010d; 2011b; 2011a]. Other parts are currently under review.

Chapter 1

Introduction & Overview

Heuristic search is a general problem-solving method in Artificial Intelligence. The problem is solved by trial-and-error exploration of alternatives. Heuristics are used by algorithms to determine which of many paths has the most potential to solve the problem. Heuristics represent compromises between the need for simple principles and the desire to see them distinguish correctly between good and bad choices. The search is performed on a problem space graph. The nodes, edges and weights of the problem space graph represent the different states of the problem, the legal moves, and the cost of moves, respectively. A problem instance is a pair of an initial state and at least one goal state in the problem space. A solution of a problem instance is a path from the initial state to a goal state. An optimal solution is a lowest-cost path from the initial state to one of the goal states. The term generating a node refers to creating a data structure representing the node, while expanding a node means generating all its children. Traditional heuristic search algorithms find a solution by starting at the initial state and traversing the state space graph until a goal node is found. To simplify, one can view the search space as a search tree whose root is the initial state. The various search algorithms differ in the order in which they traverse the search tree, by deciding which node to expand next. The rest of this chapter gives relevant background and definitions for the terms and concepts used in this dissertation. Section 1.1 gives a short background on the basic search algorithms (especially the best-first search algorithm extended in this thesis). In Section 1.2, an overview of this dissertation is provided.

1.1 Background

Different search algorithms can be compared to each other according to the following criteria:

Soundness and completeness: An algorithm is said to be sound if the solution it returns is guaranteed to be correct. In the context of search algorithms, a sound algorithm is one that returns a valid path from the initial state to a goal state. An algorithm is said to be complete if it is guaranteed to return a solution if such exists.

Solution quality: An algorithm can return solutions of different qualities. In the context of search, the solution quality is the cost of the path found by the algorithm from the initial state to a goal state. A solution of a search problem is said to be optimal if it is the lowest-cost path between the initial state and a goal state.

Search effort: The search effort is the cost incurred by the problem solver from the time the problem instance is given until the problem solver halts. The search effort often consists of two components: time complexity and space complexity. In search, time complexity is usually proportional to the number of generated nodes, since generating a node usually takes constant time. When searching combinatorially large state spaces, the space complexity is usually proportional to the number of states stored during the search. Chapter 4 extends the notion of search effort to be minimized to scenarios where the search effort also includes resources other than time or memory, e.g., fuel or network load.

This dissertation focuses only on sound algorithms, since in search problems checking whether a given path is valid is trivial: simply verify that applying the operators in the returned solution indeed results in reaching a goal state. Thus any unsound algorithm can easily be made sound by checking the returned solution and not returning it if it is invalid.
Search algorithms can be divided into two classes: local search and systematic search.

Local Search

In general, local search algorithms search the state space by moving from a state to one of its neighboring states, until a goal state is found.

To demonstrate this process, consider the Hill Climbing algorithm [Russell and Norvig, 2010], one of the classical local search algorithms. In Hill Climbing, a current state is maintained. Initially, the current state is the initial state of the search problem. In every iteration of Hill Climbing, the neighbors of the current state are considered. If one of the neighbors is a goal state, the search terminates and the goal is returned. Otherwise, the neighbors of the current state are evaluated with an evaluation function that estimates how far each neighbor is from a goal state. Then, the best neighbor according to this evaluation function is chosen to be the current state in the next iteration. This continues until a solution is found. There are many variants of local search. In some local search algorithms, the search process restarts if a solution is not found after a given number of iterations. Restarting the search means setting the current state back to the initial state. Stochastic local search algorithms [Hoos and Stützle, 2004] introduce randomness into the search to add diversity. Tabu local search [Glover and Laguna, 1999] stores a limited number of previously visited states, to avoid searching the same state more than once. However, Tabu local search, and any local search in general, does not keep track of all the paths that were previously searched. As a result, there are paths in the search space that may be searched more than once. Furthermore, a local search cannot guarantee that it will reach all of the states in the search space. Consequently, a local search cannot guarantee that a solution will be found (i.e., it is not complete), nor can it identify when a solution does not exist. Furthermore, a local search algorithm cannot guarantee that the optimal solution will be found, since the optimal solution might lie in a part of the search space that was not visited.
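The hill-climbing loop described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the dissertation; `neighbors`, `estimate_distance`, and `is_goal` are hypothetical stand-ins for the domain's successor function, evaluation function, and goal test.

```python
def hill_climbing(initial_state, neighbors, estimate_distance, is_goal,
                  max_iterations=1000):
    """Greedy local search: repeatedly move to the best-looking neighbor."""
    current = initial_state
    for _ in range(max_iterations):
        if is_goal(current):
            return current
        candidates = neighbors(current)
        if not candidates:
            return None  # dead end: no neighbor to move to
        # Move to the neighbor the evaluation function deems closest to a goal
        current = min(candidates, key=estimate_distance)
    return None  # no goal found within the iteration budget
```

A random-restart variant would wrap this loop, re-invoking it from the initial state whenever `None` is returned.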
Despite these limitations, local search algorithms are widely used in many applications, including SAT [Pham et al., 2008], TSP [Merz and Huhse, 2008] and MAX-Clique [Battiti and Mascia, 2009]. Some even claim that for most practical search problems, local search is the method of choice. However, the inability to guarantee that a solution will be returned if one exists is a major shortcoming of local search algorithms. Furthermore, being able to guarantee the quality of the returned solution is a desired property of a search algorithm in many scenarios. For example, information on the quality of the returned solution can be used by a control mechanism to decide whether the returned solution is good enough or whether further search is required. Thus, completeness and guaranteed solution quality are important properties that cannot be provided by local search algorithms. By contrast, systematic algorithms, which are described in the following section, can be complete and can give guarantees on the quality of the solution they return.

Systematic Search

Systematic search algorithms systematically go over the entire search space. As such, given enough time a systematic search algorithm is able to identify when all the states in the search space have been searched. This is done by keeping track of the paths in the search space that were traversed by the search algorithm. There are several mechanisms for doing this, which vary in their memory requirements. One of the common mechanisms for keeping track of the paths that were previously searched is to store all states in the perimeter of the search. The perimeter of the search is the set of states that have been searched but have neighboring states that have not been searched yet. Classical examples of systematic search algorithms that use this mechanism are breadth-first and depth-first search [Cormen, 2001]. In breadth-first search the states in the perimeter of the search are stored in a queue, while in depth-first search they are stored in a stack. Systematic search algorithms can have two beneficial properties: completeness and optimality. Systematic search algorithms are complete because a systematic search guarantees that every state in the search space will be searched. Thus if a goal state exists, it will be found by a systematic search algorithm. For similar reasons, a systematic search can identify when a solution does not exist and terminate. Some systematic search algorithms can also guarantee that the returned solution is optimal. In the extreme case, a systematic search can do this by searching all the states in the search space and returning the goal state with the lowest cost (i.e., the optimal solution).
There are search algorithms that can find and identify an optimal solution without searching the entire search space. Well-known examples of search algorithms that return the optimal solution without necessarily searching the entire search space are Dijkstra's algorithm [Dijkstra, 1959] and A* [P. E. Hart and Raphael, 1968].

Best-First Search

One of the most widely used systematic search algorithms is the best-first search algorithm [Pearl, 1984a]. Best-first search (BFS) keeps two lists of states: an open list, which contains all the generated states that have not yet been expanded (this is the perimeter of the search mentioned before), and a closed list, which contains all the states that have

been previously expanded. Every generated state is assigned a value by an evaluation function. The value assigned to a state is called the cost of the state, and the corresponding evaluation function is called the cost function. In every iteration of BFS, the state in the open list with the lowest cost is chosen to be expanded.[1] This lowest-cost state is moved from the open list to the closed list, and its children are inserted into the open list. The purpose of the closed list is to avoid inserting states into the open list that have already been expanded. Once a goal state is chosen for expansion, i.e., it is the lowest-cost state in the open list, BFS halts and that goal is returned.[2] Special cases of best-first search include breadth-first search, Dijkstra's single-source shortest-path algorithm [Dijkstra, 1959] and the A* algorithm [P. E. Hart and Raphael, 1968], differing only in their cost function. If the cost of a node is its depth in the tree, then best-first search becomes breadth-first search, expanding all nodes at a given depth before any deeper node. If the edges in the graph have different costs, then taking g(n), the shortest known distance from the initial state to state n, as the cost results in Dijkstra's algorithm. If the cost is f(n) = g(n) + h(n), where h(n) is a heuristic function estimating the cost from state n to a goal state, then best-first search becomes the A* algorithm. Search algorithms that use heuristic functions are known as informed search algorithms or heuristic search algorithms. The A* algorithm has two famous properties. First, if h(n) is admissible, i.e., it never overestimates the actual cost from node n to a goal, then A* is guaranteed to return the optimal solution, if one exists. The second famous property of A* is that any other equally informed search algorithm will have to expand all the nodes expanded by A* before identifying the optimal solution [Dechter and Pearl, 1985].
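To make the role of the cost function concrete, here is a minimal best-first search sketch (illustrative, not the dissertation's pseudocode): the same loop yields Dijkstra's algorithm or A* depending only on which cost function is plugged in. Duplicate detection is simplified to lazy checking at expansion time.

```python
import heapq

def best_first_search(start, successors, is_goal, cost_fn):
    """Generic best-first search. `cost_fn(g, state)` orders the open list;
    `successors(state)` yields (child, edge_cost) pairs."""
    open_list = [(cost_fn(0, start), 0, start, [start])]
    closed = {}                      # state -> cheapest g at expansion time
    while open_list:
        _, g, state, path = heapq.heappop(open_list)
        if is_goal(state):
            return g, path           # goal chosen for expansion: halt
        if state in closed and closed[state] <= g:
            continue                 # lazy duplicate detection
        closed[state] = g
        for child, w in successors(state):
            heapq.heappush(open_list,
                           (cost_fn(g + w, child), g + w, child, path + [child]))
    return None

# The cost function alone distinguishes the classical algorithms:
dijkstra_cost = lambda g, state: g          # Dijkstra's algorithm
def astar_cost(h):                          # A* with heuristic h
    return lambda g, state: g + h(state)
```

With unit costs and a depth cost function, the same loop behaves like breadth-first search.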
This second property is important, since a search algorithm's runtime is usually proportional to the number of nodes it expands. Thus, A* is said to be optimally efficient if one wants to find a solution that is guaranteed to be optimal. However, even with A* and a highly accurate heuristic function, it is often difficult to find the optimal solution [Helmert and Röger, 2008]. Local search algorithms and A* address two extreme problem settings. On the one hand, local search algorithms address search problems where neither completeness nor any guarantee on the quality of the returned solution is needed. On the other hand, A* addresses search problems where both completeness and optimality are required. This dissertation addresses three new problem settings in the range between these two extremes.

[1] An alternative definition of BFS is that every state is assigned a utility, and the state with the highest utility is expanded in every iteration of BFS.
[2] This is the textbook version of BFS [Russell and Norvig, 2010]. However, there are variants of BFS where the search is halted earlier [Stern et al., 2010c].

1.2 Thesis Overview

The following chapters are the main parts of the research in this dissertation. In every chapter a new setting for heuristic search is presented. Each chapter includes an intuitive and formal description of the new setting, along with search algorithms specifically designed to address it. Chapter 2 introduces the bounded-cost search problem, where the task is to find a path in the state space with cost under a constant bound. For this setting, the Potential Search (PTS) algorithm is introduced, which expands nodes according to the probability that they will be part of a solution of cost less than the required bound. This chapter also introduces a new efficient anytime search algorithm built by solving a sequence of bounded-cost search problems with PTS. Chapter 3 extends the connection between probability and search by introducing the probably approximately correct search problem, or PAC search for short. In PAC search, the problem is to find a solution that is approximately optimal with high probability. We argue that it is often possible to find such a solution much faster than finding a solution that is guaranteed to be approximately optimal. Several methods for creating a PAC search algorithm are presented, based on the probabilistic relation between an existing heuristic function and the real value that it estimates. Chapter 4 discusses searching in the unknown graph setting. Specifically, the general problem of finding a given pattern in an unknown graph is addressed, where the task is to minimize the total exploration cost. This problem is theoretically analyzed, and a best-first search framework is given for solving it.
Several heuristic algorithms in this best-first search framework are presented, including an algorithm (RPattern) that can exploit additional probabilistic knowledge of the searched graph. In addition, an online web crawler application is described that uses the proposed algorithms to search for a k-clique pattern in Google Scholar. Finally, Chapter 5 concludes this dissertation by summarizing the main contributions and discussing future work.

1.3 Related Publications

The majority of the theories and results appearing in this dissertation were published in the proceedings of the conferences listed below, or are currently under review as journal papers. Chapter 2 contains material that was published in the proceedings of ICAPS-11 and SoCS-10. This chapter is currently undergoing final stages of editing and will be submitted as a journal article that summarizes this topic combined with supplemental research performed by Jur van den Berg et al. as part of a paper that was presented at AAAI-11 [van den Berg et al., 2011].[3] Chapter 3 contains material that was published in the proceedings of SoCS-11. This chapter too is currently being edited and will be submitted as a journal article in the near future. Chapter 4 is almost an exact copy of a journal article currently under review at JAIR. Preliminary versions of Chapter 4 have also appeared in the proceedings of AAMAS-10 and SoCS-10. The exact list of publications, divided between the three main parts of this thesis, is provided below:

Chapter 2: Potential Search

[Stern et al., 2010d] - Roni Stern, Rami Puzis and Ariel Felner. Potential Search: a new greedy anytime heuristic search. In SoCS, 2010.
[Stern et al., 2011b] - Roni Stern, Rami Puzis and Ariel Felner. Potential Search: A Bounded-Cost Search Algorithm. In ICAPS, 2011.
Roni Stern, Jur van den Berg, Rami Puzis, Rajat Shah, Ariel Felner, Arthur Huang and Ken Goldberg. Potential-based Non-Parametric Anytime and Bounded-Cost Search. (To be submitted)

Chapter 3: Probably Approximately Correct Heuristic Search

[Stern et al., 2011a] - Roni Stern, Ariel Felner and Robert Holte. Probably Approximately Correct Heuristic Search. In SoCS, 2011.
Roni Stern, Ariel Felner and Robert Holte. Probably Approximately Correct Heuristic Search: Theory and Applications. (To be submitted)

[3] This chapter focuses on the potential search algorithm for both bounded-cost and anytime search problems.
The work by Jur van den Berg et al. has been developed independently and was published after the Potential Search publications. Their work considers a single variant of the potential search algorithm, which they called ANA*, and discusses several properties of this variant.

Chapter 4: Searching for Patterns in an Unknown Graph

[Stern et al., 2010a] - Roni Stern, Meir Kalech and Ariel Felner. Searching for a k-clique in unknown graphs. In AAMAS, 2010.
[Stern et al., 2010b] - Roni Stern, Meir Kalech and Ariel Felner. Searching for a k-clique in unknown graphs. In SoCS, 2010.
Roni Stern, Meir Kalech and Ariel Felner. Searching for Patterns in an Unknown Graph. Submitted to JAIR.

1.4 Further Publications

The following published material is beyond the scope of this thesis.

[Sharon et al., 2011a] - Guni Sharon, Roni Stern, Meir Goldenberg and Ariel Felner. The Increasing Cost Tree Search for Optimal Multi-Agent Pathfinding. In IJCAI, 2011.
[Sharon et al., 2011b] - Guni Sharon, Roni Stern, Meir Goldenberg and Ariel Felner. Pruning Techniques for the Increasing Cost Tree Search for Optimal Multi-agent Pathfinding.
[Lelis et al., 2011] - Levi Lelis, Roni Stern and Shahab Jabbari Arfaee. Predicting Solution Cost with Conditional Probabilities. In SoCS, 2011.
[Stern et al., 2010c] - Roni Stern, Tamar Kulberis, Ariel Felner and Robert Holte. Using Lookaheads with Optimal Best-First Search. In AAAI, 2010.
[Stern and Kalech, 2010] - Roni Stern and Meir Kalech. MBD Techniques for Internet Delay Diagnosis. In DX, 2010.

Chapter 2

Potential-based Search

Most heuristic search algorithms measure the quality of their solution by comparing it to the optimal solution. They can be classified into four major classes according to the quality of the solution that they return.

(1) Optimal algorithms. Optimal algorithms return a solution that is guaranteed to be optimal. Algorithms of this type are usually variants of the well-known A* [Pearl, 1984b] or IDA* [Korf, 1985a] algorithms. In many real-life problems it is not practical to use optimal algorithms, as many problems are very hard to solve optimally.

(2) Suboptimal algorithms. Suboptimal algorithms guarantee that the cost of the returned solution is no more than w times the cost of the optimal solution, where w > 1 is a predefined parameter. These algorithms are also called w-admissible. Weighted A* [Pohl, 1970] and Optimistic Search [Thayer and Ruml, 2008] are examples of algorithms of this class. Suboptimal algorithms usually run faster than optimal algorithms, trading solution quality for running time.

(3) Any-solution algorithms. Any-solution algorithms return a solution, but they have no guarantee about the quality of the solutions they find. Such algorithms usually find solutions faster than algorithms of the first two classes, but possibly with lower quality. Examples of any-solution algorithms include Depth-first branch-and-bound (DFBnB) [Zhang and Korf, 1995], beam search variants [Furcy and Koenig, 2005], Hill Climbing and Simulated Annealing.

(4) Anytime algorithms. Anytime algorithms are algorithms whose quality of results improves gradually as computation time increases [Zilberstein, 1996]. An anytime search algorithm starts as an any-solution algorithm. After the first solution is found, an anytime search algorithm continues to run, finding solutions of better quality (with or without a guarantee on their suboptimality). Some anytime algorithms are guaranteed to converge to the optimal solution if enough time is given.
Prominent examples of anytime search algorithms are Anytime Weighted A* [Hansen and Zhou, 2007] and Anytime Repairing A* [Likhachev et al., 2003]. This chapter deals with a fifth type of search algorithm, addressing the following scenario. Assume that a user has a given constant budget C to execute a plan. The cost of the optimal solution and the degree of suboptimality are of no interest and not relevant. Instead, a plan with cost less than or equal to C is needed as fast as possible. We call this problem the bounded-cost search problem. For example, consider an application server for an online travel agency such as Expedia, and a customer that requests a flight to a specific destination arriving before a given time and date (at which the customer has a meeting). This is clearly a bounded-cost problem, where the cost is the arrival time. The task of Expedia is to build, as fast as possible, an itinerary in which the user will arrive on time. The user is not concerned with the optimality or suboptimality of the resulting plan, and Expedia would like to respond quickly with a fitting solution. Furthermore, once a flight plan has been found that fits the cost bound provided by the user, it is more important for Expedia to divert its computing resources (e.g., CPU usage) to the requests of other users than to further optimize the flight plan of the current user. In addition to practical scenarios such as the one described above, we show in Section 2.4 how an efficient anytime search algorithm can be constructed by solving a series of bounded-cost search problems. Ideally, one might solve a bounded-cost search problem by running an optimal algorithm: if the optimal solution cost is less than or equal to C then return it, otherwise return failure, as no solution of cost ≤ C exists. One could even use C for pruning purposes, and prune any node n with f(n) > C.
However, this technique for solving the bounded-cost search problem might be very inefficient, as finding a solution with cost ≤ C can be much easier than finding the optimal solution. Similarly, it is not clear how to tune any of the suboptimal algorithms (for example, which weight to use in Weighted A* and its variants), as the cost of the optimal solution is not known and therefore the ratio between the desired solution cost C and the optimal cost is unknown too. A possible direction for solving a bounded-cost search problem is to run an anytime search algorithm and halt it when a good enough solution is found. However, solutions with costs higher than C may be found first even though they are of no use. The main problem with all these variants is that the desired goal cost is not used to guide the search, i.e., C is not considered when choosing which node to expand next. It is possible to view a bounded-cost search problem as a CSP, where the desired cost bound is simply a constraint on the solution cost. However, for many problems there are powerful domain-specific heuristics,

and it is not clear if general CSP solvers can use such heuristics. The potential-based approach described next is somewhat reminiscent of CSP solvers based on solution counting and solution density, where assignments that are estimated to allow the maximal number of solutions are preferred [Zanarini and Pesant, 2009]. In this chapter, an algorithm called Potential Search (PTS) is introduced, which is specifically designed to solve the bounded-cost search problem. PTS is designed to focus on a solution with cost less than or equal to C, and the first solution it provides meets this requirement. PTS is a best-first search algorithm that expands nodes according to the probability that they will be part of a plan of cost less than or equal to the given budget C. We denote this probability as the potential of a node. Of course, the exact potential of a node is unknown. Instead, we show how any given heuristic function can be used to simulate the exact potential. This is possible as long as we have a model of the relation between the heuristic function and the cost of the optimal plan. Several such models are analyzed, and a general method for implementing PTS given such a model is proposed. We prove that with this method, nodes are expanded in a best-first order according to their potential. In the second part of this chapter, we show how solving bounded-cost problems can be used to create an anytime search algorithm. This is done by iteratively solving bounded-cost search problems with decreasing cost bounds. As a result, we present Anytime Potential Search (APTS), which solves these bounded-cost problems with PTS. Experimental results on the standard 15-puzzle as well as on the Key Player Problem in Communication (KPP-COM) demonstrate the effectiveness of our approach. APTS is competitive with state-of-the-art anytime and suboptimal heuristic search algorithms; it outperforms them in most cases and is more robust.
Most of the research reported in Chapter 2 was published in the conferences SoCS-10 and ICAPS-11. This chapter is currently undergoing final stages of editing and will be submitted as a journal article that summarizes this topic, combined with supplemental research performed by Jur van den Berg et al. as part of a paper presented at AAAI-11 [van den Berg et al., 2011].¹

Figure 2.1: Example of an expansion dilemma (s is the start node and g the goal; g(b)=10, h(b)=90, g(a)=100, h(a)=3).

2.1 The Potential Search Algorithm

We now turn to describe the PTS algorithm in detail. Consider the graph presented in Figure 2.1. Assume that we are searching for a path from node s to node g and that we are asked to find a path of cost less than or equal to 120 (C = 120). After expanding s, the search algorithm needs to decide which node to expand next, node a or node b.² If the task were to find the optimal path from s to g, then clearly node b should be expanded first, since there may be a path from s to g through b that is shorter than the path through a (as g(b) + h(b) = 100 < g(a) + h(a) = 103). However, since any path that is shorter than 120 is acceptable in our case, expanding node b is not necessarily the best option. For example, it might be better to expand node a, which is probably very close to a goal of cost less than 120 (as h(a) = 3). We propose the Potential Search algorithm (denoted PTS), which is specifically designed to find solutions with costs less than or equal to C. We define the potential of a node as the probability (Pr) that this node is part of a path to a goal with cost less than or equal to C. This potential is formally defined as follows. Let g(n) be the cost of the shortest path found so far from the initial state to n, and let h*(n) be the real cost of the shortest path from n to a goal.

Definition 2.1 (Potential). The potential of node n, denoted PT(n), is Pr(g(n) + h*(n) ≤ C).

¹ This chapter focuses on the potential search algorithm for both bounded-cost and anytime search problems. The work by Jur van den Berg et al. was developed independently and published after the Potential Search publications. Their work considers a single variant of the potential search algorithm, which they call ANA*, and discusses several properties of this variant.
² Throughout this chapter we use the standard heuristic search terminology, where the cost of the shortest known path between the start node s and a node n is denoted by g(n), and a heuristic estimate of the distance from a node n to a goal is denoted by h(n).

PTS is simply a best-first search (or any of its variants) which orders the nodes in the open list (denoted hereafter as OPEN) according to their potential and chooses to expand the node with the highest PT(n). If h*(n) is known then PT(n) is easy to calculate: it is a binary function, returning 1 if g(n) + h*(n) ≤ C and 0 otherwise. Of course, h*(n) is usually not known in advance, and the exact potential of a node cannot be calculated. However, we show that it is possible to order the nodes according to their potential even without knowing or calculating the exact potential. This can be done by using the heuristic function h coupled with a model of the distribution of its values. Next we show how we can reason about the exact potential for several such heuristic models. We then show how these can be extended to the general case.

2.2 Calculating the Potential

Many years of research in the field of heuristic search have produced powerful methods for creating sophisticated heuristics, such as abstractions [Larsen et al., 2010], constraint relaxation and memory-based heuristics [Felner et al., 2004a; Sturtevant et al., 2009], as well as heuristics for planning domains [Katz and Domshlak, 2010]. Next, we show how it is possible to use any given heuristic and still choose to expand the node with the highest potential, even without explicitly calculating it. All that is needed is knowledge of the relation between the given heuristic and the optimal cost, as defined in the next section.

Heuristic Models

Let h be a given heuristic function, estimating the cost of reaching a goal from a node, and consider the relation between h and h*. In some domains, this relation is a known property of the available heuristic function (e.g., a precision parameter of a sensor). In other domains, it is possible to evaluate the model of a heuristic function, i.e., how close h is to h*, from attributes of the domain.
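As an illustration of such an empirical evaluation (our sketch; the dissertation performs an analogous linear fit for the 15-puzzle Manhattan distance heuristic in Section 2.3), a least-squares fit of h* against h over optimally solved training instances can be computed as follows. The function name and data are hypothetical:

```python
def fit_linear_h_model(samples):
    """Least-squares fit of h* = a*h + b from (h, h*) pairs collected by
    solving training instances optimally. A slope near 1 with a nonzero
    offset suggests an additive model; an offset near 0 with slope > 1
    suggests a linear relative model h* ~ a*h."""
    n = len(samples)
    sx = sum(h for h, _ in samples)
    sy = sum(hs for _, hs in samples)
    sxx = sum(h * h for h, _ in samples)
    sxy = sum(h * hs for h, hs in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    b = (sy - a * sx) / n                           # intercept
    return a, b
```

For example, points lying exactly on y = 1.34x yield a slope of 1.34 and an intercept of 0, which would indicate a linear relative h-model.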
In order to preserve a domain-independent perspective, we focus on several general models of this h-to-h* relation. We call this relation the heuristic model, or h-model, and define it as follows:

Definition (h-model). The function e is the h-model of h if h*(n) = e(h(n)) for every node n.

Note that the h-model is not necessarily a deterministic function, since there can be nodes with the same h but different h* values. Next, we show that it is possible to implement PTS as a best-first search for

a number of common special cases of h-models. The potential function PT(n) is not known; however, for these cases we provide a cost function that is easy to calculate and prove that this cost function orders the nodes exactly in the order of the potential function, that is, the node with the smallest cost is also the node with the highest potential. Therefore, it is possible to simulate the potential of a node with this cost function.

Constant Gap h-model

We demonstrate the relation between an h-model and the potential of a node on the following simple h-model, called the constant gap model, in which h*(n) = h(n) + K for some constant K. In the constant gap model, the potential of a node is a simple binary function:

PT(n) = 1 if g(n) + h(n) + K ≤ C, and 0 otherwise.

It is easy to see that for domains with the constant gap model, finding the optimal path to a goal is trivial, as a perfect cost function is f(n) = g(n) + h(n) + K. An A* using this cost function, with tie-breaking favoring nodes with smaller h-values, will expand only the nodes on the shortest path to the goal.

Additive h-model

Consider the following h-model: h*(n) = h(n) + X, where X is an independent identically distributed (i.i.d.) random variable. This does not imply that the distribution of X is uniform, but just that the additive error of every node is drawn from the same distribution (independently). We call this type of h-model an additive h-model.³

Lemma 2.2.1. For any i.i.d. random variable X, if the h-model is h*(n) = h(n) + X and f(n) = g(n) + h(n), then for any pair of nodes n1, n2 we have:

f(n1) ≤ f(n2) ⟺ PT(n1) ≥ PT(n2)

Proof: Assume that PT(n1) ≥ PT(n2), and let h_i, g_i and h*_i denote h(n_i), g(n_i) and h*(n_i) respectively.
According to the potential definition (Definition 2.1), this means that:

Pr(g1 + h*1 ≤ C) ≥ Pr(g2 + h*2 ≤ C)
⟺ Pr(h*1 ≤ C − g1) ≥ Pr(h*2 ≤ C − g2)

³ This is reminiscent of the bounded constant absolute error model described by [Pearl, 1984b], where the difference between h and h* is bounded by a constant (i.e., h*(n) ≤ h(n) + K). Here, K is the largest value of X.

According to the h-model, this is equivalent to:

Pr(h1 + X ≤ C − g1) ≥ Pr(h2 + X ≤ C − g2)
⟺ Pr(X ≤ C − g1 − h1) ≥ Pr(X ≤ C − g2 − h2)

Since X is i.i.d., this is equivalent to:

C − g1 − h1 ≥ C − g2 − h2 ⟺ f(n1) ≤ f(n2)

Consequently, for any problem with an additive h-model, standard A*, which expands the node with the smallest f-value, will always expand the node with the highest potential. This result is summarized in Theorem 2.2.2:

Theorem 2.2.2. For any i.i.d. random variable X, if h* = h + X, then PTS can be implemented as a best-first search using the standard cost function of A*, f = g + h.

Therefore, for an additive h-model we can order the nodes in OPEN according to their potential, even without knowing the exact potential and regardless of the distribution of X.

Linear Relative h-model

An additive h-model may not fit many real problems. Consider, for example, a shortest-path problem on a map, using the air distance as a heuristic. If the air distance between two nodes is very large, it is more likely that obstacles lie between them, and more obstacles imply a larger difference between the air distance and the real shortest path. We therefore propose the following more realistic h-model: h*(n) = h(n) · X for any i.i.d. random variable X. We call this type of model the linear relative h-model⁴ and present the following cost function:

f_lnr(n) = h(n) / (C − g(n))

If X is a constant K, then the potential is again a binary function:

PT(n) = 1 if g(n) + h(n) · K ≤ C, and 0 otherwise.

Again, for such models a perfect cost function exists: f(n) = g(n) + K · h(n). For the more general case of a linear relative h-model

⁴ This model is reminiscent of the constant relative error [Pearl, 1984b], where h*(n) ≤ K · h(n) for some constant K.

where X is any i.i.d. random variable, we present the following cost function:

f_lnr(n) = h(n) / (C − g(n))

Nodes with smaller f_lnr(n) are more likely to lead to a path with cost ≤ C. The intuition is as follows: C − g(n) is an upper bound on the cost that may still be spent from node n while keeping the total cost of the path within C, and h(n) is a lower-bound estimate of the remaining cost. Therefore, nodes with a smaller ratio h(n) / (C − g(n)) are more likely to lead to a path within the bound. This can also be proven formally:

Lemma 2.2.3. For any i.i.d. random variable X, if the heuristic model is h*(n) = h(n) · X, then for any pair of nodes n1, n2 we have:

f_lnr(n1) ≤ f_lnr(n2) ⟺ PT(n1) ≥ PT(n2)

Proof: Assume that PT(n1) ≥ PT(n2). According to the potential definition this means that:

Pr(g1 + h*1 ≤ C) ≥ Pr(g2 + h*2 ≤ C)
⟺ Pr(h*1 ≤ C − g1) ≥ Pr(h*2 ≤ C − g2)

According to the h-model, this is equivalent to:

Pr(h1 · X ≤ C − g1) ≥ Pr(h2 · X ≤ C − g2)
⟺ Pr(X ≤ (C − g1)/h1) ≥ Pr(X ≤ (C − g2)/h2)

Since X is i.i.d., this is equivalent to:

(C − g1)/h1 ≥ (C − g2)/h2 ⟺ h1/(C − g1) ≤ h2/(C − g2) ⟺ f_lnr(n1) ≤ f_lnr(n2)

Consequently, for any problem with a heuristic that has a linear relative h-model, a best-first search that uses f_lnr as a cost function will expand nodes exactly according to their potential. This result is summarized in Theorem 2.2.4:

Theorem 2.2.4. For any i.i.d. random variable X, if h* = h · X, then PTS can be implemented as a best-first search using f_lnr as a cost function.

General h-model

Consider the more general h-model, which is some function of h and an i.i.d. random variable X. We denote this function by e and write h* = e(h, X). Let e_r be the inverse function of e, such that e_r(h*, h) = X. We denote an h-model as invertible if such an inverse function e_r exists, and define a general function

P_gen(n) = e_r(C − g(n), h(n))

which simulates the potential of a node as follows.

Lemma 2.2.5. Let X be an i.i.d. random variable and h* = e(h, X) an invertible h-model, where e_r is monotonic. Then, for any pair of nodes n1, n2:

P_gen(n1) ≥ P_gen(n2) ⟺ PT(n1) ≥ PT(n2)

Proof: Assume that PT(n1) ≥ PT(n2). According to the potential definition this means that:

Pr(g1 + h*1 ≤ C) ≥ Pr(g2 + h*2 ≤ C)
⟺ Pr(h*1 ≤ C − g1) ≥ Pr(h*2 ≤ C − g2)

Since e is invertible and e_r is monotonic, we apply e_r(·, h) to both sides:

Pr(X ≤ e_r(C − g1, h1)) ≥ Pr(X ≤ e_r(C − g2, h2))

Since X is i.i.d., this is equivalent to:

e_r(C − g1, h1) ≥ e_r(C − g2, h2) ⟺ P_gen(n1) ≥ P_gen(n2)

Thus, for any problem with an invertible h-model, a best-first search that expands the node with the highest P_gen will exactly simulate PTS, as this is the node with the highest potential. P_gen can easily be converted (e.g., by multiplying it by −1) into a cost function f_gen such that an equivalent best-first search will choose to expand the node with the smallest f_gen. This is summarized in Theorem 2.2.6:

Theorem 2.2.6. For any i.i.d. random variable X, if h* = e(h, X), e is invertible, and e_r is monotonic, then PTS can be implemented as a best-first search using the cost function f_gen(n).

Notice that Theorems 2.2.2 and 2.2.4 are in fact special cases of Theorem 2.2.6. Table 2.1 presents a number of examples of how Theorem 2.2.6 can be used to obtain cost functions for various h-models.

h-model (e)              | e_r(h*, h)  | P_gen        | f_gen
h + X (additive)         | h* − h      | C − g − h    | f = g + h
h · X (linear relative)  | h*/h        | (C − g)/h    | h/(C − g) = f_lnr
h^X                      | log_h(h*)   | log_h(C − g) | −log_h(C − g)

Table 2.1: h-models and their corresponding cost functions.

C  | OS-1.50 | OS-2.00 | OS-3.00 | PTS | AWA*-1.50 | AWA*-2.00 | AWA*-3.00
55 |   –     |  28%    |  63%    | 23% |   14%     |   29%     |   68%
60 |  11%    |  21%    |  74%    | 12% |   11%     |   25%     |   60%
65 |   6%    |  17%    |  40%    |  4% |    6%     |   14%     |   41%
70 |   6%    |   3%    |   9%    |  3% |    6%     |    4%     |    9%
75 |   6%    |   3%    |   3%    |  2% |    6%     |    3%     |    4%
80 |   6%    |   3%    |   3%    |  2% |    6%     |    3%     |    2%
85 |   6%    |   3%    |   2%    |  1% |    6%     |    3%     |    2%
90 |   6%    |   3%    |   2%    |  1% |    6%     |    3%     |    1%

Table 2.2: Expanded nodes as a percentage of the nodes expanded by A*, with a fixed desired cost C.

These cost functions can then be used in a best-first search to implement PTS. The exact h-model is domain dependent. Analyzing a heuristic in a given domain and identifying its h-model may be done analytically in some domains, using explicit knowledge of the domain attributes. Another option for identifying an h-model is to add a preprocessing stage in which a set of problem instances is solved optimally, and the h-model is discovered using curve-fitting methods.

2.3 Bounded-cost Experimental Results

To show the applicability and robustness of PTS we present experimental results on two domains: the 15-puzzle and the Key Player Problem in Communication (KPP-COM).

15-Puzzle

There are many advanced heuristics for the 15-puzzle, but we chose the simple Manhattan distance heuristic (MD), as our goal is to compare search algorithms and not different heuristics. In order to choose the correct h-model for MD, we sampled 50 of the standard 100 random instances [Korf, 1985a] and solved each instance optimally. For every such instance we considered all the states on the optimal path, for a total of 2,507 states. Each of these states was assigned a 2-dimensional point, where the x value denotes the MD of the state and the y value denotes its optimal cost to the goal. The plot of these points is presented in

Figure 2.2: Manhattan distance heuristic vs. true distance (x-axis: Manhattan distance h; y-axis: optimal path cost h*; linear fit y = 1.34x).

Figure 2.2. The dashed red line indicates the best linear fit for the plot, which is the line y = 1.34x. It is easy to observe that the linear fit is very close, and we therefore solved the 15-puzzle with Potential Search using a linear relative h-model and its corresponding implementation using f_lnr (Theorem 2.2.4). PTS was implemented on the 15-puzzle to solve the bounded-cost search problem. In addition, we implemented a number of state-of-the-art anytime algorithms, but focus here on the two that performed best: Anytime Weighted A* [Hansen and Zhou, 2007] and Optimistic Search [Thayer and Ruml, 2008], denoted AWA* and OS respectively. AWA* and OS are anytime algorithms, but they can easily be modified to solve the bounded-cost search problem by halting them when a solution with cost less than or equal to the desired cost C is found, and pruning nodes with g + h > C. AWA* is a simple extension of Weighted A* [Pohl, 1970] (WA*). WA* orders the nodes in OPEN according to the cost function f = g + w·h, where w is a predefined parameter. After the first solution has been found, AWA* simply continues to run WA*, finding goals with better costs. OS is a suboptimal search algorithm which uses two cost functions: an admissible heuristic h and an inadmissible (but possibly more accurate) heuristic ĥ. In our experiments, ĥ was implemented as a

Table 2.3: 15-puzzle expanded nodes for each algorithm (columns: Subopt., PTS, OS-1.50, OS-2.00, OS-3.00, AWA*-1.50, AWA*-2.00, AWA*-3.00). The desired cost C was based on a suboptimality degree.

weighted version of h, i.e., ĥ = w·h, where w is a predefined parameter. OS chooses to expand the node with the lowest g + ĥ, but switches to using g + h if none of the nodes in OPEN can improve the incumbent solution (the best solution found so far) according to ĥ. OS was shown to be highly effective in many domains [Thayer and Ruml, 2008]. In its basic form, OS is a suboptimal algorithm, halting the search when the ratio between the lowest g + h in OPEN and the incumbent solution is below the desired suboptimality. However, it can easily be extended to an anytime algorithm by simply continuing the search process until the desired goal cost is reached. We performed two sets of experiments, which differ in the way the desired cost C was calculated. In the first set of experiments we ran PTS, OS and AWA* (with different w) on 75 random 15-puzzle instances with a fixed desired cost C of 90, 85, 80, down to 55. The exact same cost C was set for all instances, regardless of the optimal solution cost. Table 2.2 presents the average number of nodes expanded until a goal with a cost less than or equal to the desired cost was found, as a percentage of the number of nodes expanded by A* when finding the optimal solution. Every row corresponds to a different desired goal cost. The algorithm with the lowest number of expanded nodes in every row is marked in bold. For OS and AWA*, we experimented with w = 1.5, 2 and 3.
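For concreteness, the PTS variant evaluated here, a best-first search on f_lnr(n) = h(n)/(C − g(n)), can be sketched as follows (an illustrative re-implementation, not the experimental code; the toy graph reuses the numbers of Figure 2.1, and the start node's heuristic value is our own):

```python
import heapq
import itertools

def f_lnr(g, h, C):
    """Cost function for the linear relative h-model: h / (C - g).
    Nodes whose g-value already reaches the bound get infinite cost."""
    return h / (C - g) if g < C else float("inf")

def pts(start, is_goal, neighbors, h, C):
    """Potential Search sketch: best-first search ordered by f_lnr.
    Returns the cost of the first solution found with cost <= C."""
    tie = itertools.count()                  # FIFO tie-breaking in the heap
    open_list = [(f_lnr(0, h(start), C), next(tie), 0, start)]
    best_g = {start: 0}
    while open_list:
        _, _, g, node = heapq.heappop(open_list)
        if is_goal(node):
            return g                         # g <= C by construction
        for succ, cost in neighbors(node):
            ng = g + cost
            if ng <= C and ng < best_g.get(succ, float("inf")):
                best_g[succ] = ng
                heapq.heappush(
                    open_list,
                    (f_lnr(ng, h(succ), C), next(tie), ng, succ))
    return None
```

On the example of Figure 2.1 with C = 120, f_lnr(a) = 3/(120 − 100) = 0.15 while f_lnr(b) = 90/(120 − 10) ≈ 0.82, so PTS expands a before b, whereas A* (f(b) = 100 < f(a) = 103) would expand b first.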
As can be seen, for desired goal costs of 55 and 60, OS-1.5 expands the fewest nodes; in all other cases PTS outperforms both algorithms. Furthermore, even for costs 55 and 60, PTS performs relatively well, expanding only 23% and 12%, respectively, of the nodes expanded by A*. This demonstrates the robustness of PTS. In the second set of experiments, the desired cost C was different for each individual instance and was based on the optimal solution cost, as follows. First, we found the optimal solution with A* and MD for 75 random instances. Then, for every instance, we ran PTS, OS and AWA* with a desired cost C that was set to a factor of 1.1, ..., 1.9 times the optimal cost. All algorithms were halted when a solution of

cost less than or equal to C was found. Both OS and AWA* have a parameter w, which was set to 1.5, 2 and 3. Table 2.3 presents the average number of expanded nodes for the different algorithms (using MD as a heuristic) and different bounds. Bold fonts mark the best algorithm in terms of the minimum number of nodes expanded. Since C was different for each instance, the Subopt. column in the table gives the degree of suboptimality used to calculate C (C = suboptimality × optimal cost). Runtime results are omitted since they show exactly the same trends as the number of expanded nodes. This is reasonable, since all algorithms use a cost function that is a simple arithmetic combination of g and the same heuristic function (MD), and all algorithms are best-first searches implemented with a priority-queue-based OPEN, so the time per expanded node is practically the same for all the algorithms.⁵ As can be seen, different algorithms perform best for different desired cost bounds C. For bounds close to the optimal cost (suboptimality of 1 up to 1.3), OS-1.50 or OS-2.00 are the best. For looser bounds (high suboptimality levels), either PTS or OS-3.00 performs best. All this is meaningful if the cost of the optimal solution is known, in which case one can choose the best variant of either OS or AWA*. In reality, however, the optimal cost is often not known, and one would therefore not know which weight will result in the best performance. One is thus forced to guess a value for w without knowing the cost of the optimal solution, and consequently without knowing the degree of suboptimality. This means that each individual column should be compared as a stand-alone algorithm to any other column. The table clearly shows that for both OS and AWA*, any given value of w we tried (individual columns) performs best, or reasonably well, in only a very limited range.
For example, OS-2 performs well only when the desired solution is within a degree of suboptimality of 1.2 or 1.3. Guessing w = 2, and thus using OS-2, will perform well only in that range of solutions but much worse in other ranges. By contrast, PTS is much more robust. PTS is clearly superior when compared to any given fixed setting (any possible column) of both OS and AWA* across the different values of the desired cost C (rows). It is only outperformed by OS-1.50 for C that corresponds to suboptimality of 1 and 1.1, by OS-2.00 for suboptimality of 1.2 and 1.3, and by AWA* for the suboptimality levels between 1.5 and 1.9. In all other cases PTS outperforms all other variants. Furthermore, PTS was the best variant, outperforming all others, for suboptimality levels 1.4, 1.5 and 1.9, and in all other suboptimality levels PTS was relatively close to the best algorithm. Therefore, we can conclude from this set of experiments that

⁵ OS is a slight exception, since it maintains two open lists. However, we have seen that the overall runtime trends remain the same for OS as well.

Table 2.4: 15-puzzle results with an oracle, for varying degrees of suboptimality (columns: Subopt., OS-Oracle, AWA*-Oracle, PTS).

if the suboptimality level is not known, PTS is clearly the algorithm of choice. Note that PTS is the only algorithm where the number of expanded nodes continues to decrease as the desired cost bound increases, speeding up the search with respect to the desired bound. By contrast, for any weight shown in Table 2.3, the number of nodes expanded by either AWA* or OS decreases with the cost bound only until a certain point, after which the number of expanded nodes remains constant.⁶ The explanation is as follows. Let AWA*(w) be AWA* with weight w, and let C_w be the cost of the first goal found by AWA*(w). Clearly, AWA*(w) will expand the same set of nodes until the first goal is found, regardless of the desired bound C. Thus the number of nodes expanded by AWA*(w) will be exactly the same for any desired cost C ≥ C_w. A similar argument applies to OS. Now assume an oracle that provides the optimal solution cost. In this case, one could calculate the exact ratio between the desired cost bound C and the optimal cost, and then set this ratio as input to a suboptimal algorithm. Table 2.4 presents the average number of nodes expanded by AWA* and OS with such an oracle. The OS-Oracle and AWA*-Oracle columns in Table 2.4 present the results where we set w for AWA* and OS to be exactly the desired suboptimality, e.g., for suboptimality of 1.5 we set w = 1.5. The PTS column contains the same results as the PTS column in Table 2.3, and is provided to ease the comparison between the oracle-equipped OS and AWA* and standard PTS. PTS clearly outperforms these oracle variants.
The reason can be explained by the known phenomenon for WA* variants whereby, for a given weight w, the quality of the returned solution is much better than just w times the optimal [Hansen and Zhou, 2007]. Therefore, if one wants to find

⁶ For OS-3.0 and AWA*-3.0, the number of expanded nodes for C = 1.8 and C = 1.9 is practically the same.

a solution with suboptimality of w, then a parameter larger than w should be used in order to get the best performance. For example, if one wants a solution with guaranteed suboptimality of 1.5, then OS with w = 3 (displayed in the column OS-3.00) is a better choice (20,199 nodes) than OS with w = 1.5 (108,557 nodes).

Key Player Problem in Communication (KPP-COM)

Figure 2.3: KPP-COM optimal solution vs. heuristic (x-axis: heuristic value h; y-axis: optimal solution h*; linear fit y = 0.73x).

The 15-puzzle domain discussed above represents a class of domains where one wants to find a path (or solution) of minimal cost. In order to test the algorithms on a diversity of domains, for our second domain we chose a problem where the search is aimed at finding a solution with maximal utility. We thus implemented the three algorithms on the Key Player Problem in Communication (KPP-COM) [Puzis et al., 2007], which was shown to be NP-complete. KPP-COM is the problem of finding a set of k nodes in a graph with the highest group betweenness centrality (GBC). GBC is a metric for the centrality of a group of nodes [Everett and Borgatti, 1999]. It is a generalization of the betweenness metric, which measures the centrality of a node with respect to the number of shortest paths that pass through it [Freeman, 1977]. Formally, the betweenness of a node n is

C_b(n) = Σ_{s,t ∈ V, s ≠ t ≠ n} σ_st(n) / σ_st

where σ_st is the number of shortest paths between s and t, and σ_st(n) is the number of shortest paths between s and t that pass through n. The betweenness of a group of nodes A, termed

group betweenness, is defined as

C_b(A) = Σ_{s,t ∈ V\A} σ_st(A) / σ_st

where σ_st(A) is the number of shortest paths between s and t that pass through at least one of the nodes in A. KPP-COM can be solved as a search problem. Let G = (V, E) be the input graph, in which we are searching for a group of k nodes with the highest GBC. A state in the search space consists of a set of vertices N ⊆ V, which are considered to be the group of vertices with the highest GBC. The initial state of the search is the empty set, and the children of a state correspond to adding a single vertex to the set of vertices of the parent state. Instead of a cost, every state has a utility, which is the GBC of the set of vertices it contains. Note that since in this problem the optimal solution has the maximal GBC, an admissible heuristic is required to be an upper bound on the optimal utility. Similarly, a suboptimal solution is one with a smaller utility than the optimal. A number of efficient admissible heuristics for this problem exist [Puzis et al., 2007], and in our experiments we used the best one, which is calculated as follows. Consider a node that consists of a set of m vertices V_m. First, the contribution of every individual vertex v ∈ V that is not in V_m is calculated; this is the GBC of V_m ∪ {v} minus the GBC of V_m. Then, the contributions of the topmost k − m vertices are summed and used as an admissible heuristic. We denote this heuristic h_GBC (see [Puzis et al., 2007] for a more detailed discussion of this heuristic). Since the main motivation for the KPP-COM problem lies in communication network domains, all our experiments were performed on graphs generated by the model of [Barabasi and Albert, 1999], a widely used model of Internet topology and the web graph. First, we searched for a fitting h-model for the h_GBC heuristic. Figure 2.3 shows h* (which here is the maximal utility that can still be added to a node) as a function of the h_GBC heuristic.
This was calculated by solving 100 random instances optimally, and backtracking from the solution node to the root node. The real distance to the goal (in terms of utility) of the nodes on the optimal paths, as a function of their heuristic values, is plotted in Figure 2.3. The dashed red line is a linear fit of the data. As can be seen, this heuristic also exhibits a clear linear relative h-model. Thus, we used the f_lnr cost function to implement PTS. We performed the following experiments on this problem. First, a graph with 600 nodes was generated according to the Barabási-Albert model, with a density factor of 2. Then we ran PTS, AWA* and OS (the latter two with different weights) with desired costs of 250,000, 260,000, ..., 320,000, limiting the size of the searched group of vertices to 20

Table 2.5: Average runtime in seconds on KPP-COM instances (columns: C, PTS, DFBnB, OS-0.7, OS-0.8, OS-0.9, AWA*-0.7, AWA*-0.8, AWA*-0.9).

(i.e., k = 20). The average optimal utility was 326,995. Since the depth of the solution is known in advance (the size of the searched group, k), we also ran depth-first branch and bound (DFBnB), which is known to be highly effective when the depth of the solution is known and many solutions exist. This experiment was repeated 25 times, and Table 2.5 presents the average runtime in seconds until a node with utility larger than or equal to the desired utility was found. Indeed, PTS is shown to be effective for all of the desired utilities, performing slightly better than DFBnB, which is known to perform very well on this domain [Puzis et al., 2007]. Notice that in this problem, the low weights used by AWA* also achieve very good performance and converge to DFBnB. This is because a low weight on the heuristic causes the search to focus more on the g part of the cost function f = g + w·h, resulting in DFBnB-like behavior where deeper nodes are preferred.

2.4 Potential-based Anytime Search

In time-critical applications such as robotics, waiting until the optimal solution is found is often infeasible. A common approach for such applications is to use anytime algorithms, which quickly produce an initial, suboptimal solution and then improve it over time. While solving bounded-cost search problems is important in its own right, it can also be useful as part of an anytime search algorithm, described next. PTS can be modified into an anytime search algorithm which we call Anytime Potential Search (APTS). APTS uses the following greedy approach to anytime search: focus on finding a solution that is better than the incumbent solution (the best solution found so far). This can be naturally implemented using PTS.
Simply set C to the cost of the incumbent solution minus a small constant ε, which can be the smallest edge cost in the graph. This is repeated iteratively until APTS fails to find a better solution, in which case the optimal path to a goal has been found. It is even possible to transfer the OPEN and CLOSED lists of APTS between iterations, recalculating the potential cost function for all the nodes in OPEN whenever a new goal node with a better cost is found. Of course, this has a time overhead, as all the nodes must be reinserted into OPEN with their new cost (e.g., f_lnr).
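Abstractly, the APTS loop just described can be sketched as follows (our simplification: the bounded-cost solver is passed in as a function, and ε is a fixed parameter):

```python
def apts(solve_bounded, initial_C, epsilon=1):
    """Anytime Potential Search skeleton: solve_bounded(C) is any
    bounded-cost solver (e.g., PTS) that returns the cost of some
    solution <= C, or None if no such solution exists. Each iteration
    tightens the bound below the incumbent; when the solver finally
    fails, no cheaper solution exists, so the incumbent is optimal."""
    incumbent, C = None, initial_C
    while True:
        cost = solve_bounded(C)
        if cost is None:
            return incumbent
        incumbent = cost
        C = cost - epsilon      # e.g., epsilon = smallest edge cost
```

For instance, with a stub solver over a problem whose solutions cost 12, 9 and 7, apts(solver, 20) finds 12 and 9 along the way and settles on the optimal 7 (a real implementation would report each incumbent as it is found).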

Experimental Results

[Figure 2.4: 15-puzzle, solution quality vs. runtime, comparing PTS and OS with weights 2.0, 1.3 and 1.1.]

We experimented on all the standard 100 random 15-puzzle instances. Figure 2.4 shows the results of APTS vs. OS. The x-axis denotes the runtime and the y-axis displays the solution quality (the depth of the goal found divided by the optimal goal depth). The rightmost point on the x-axis denotes the average runtime required for A* to find the optimal solution. Again, we used f = g + w·h as an inadmissible cost function for OS. As explained above, APTS performs a series of PTS iterations, each with a different desired cost C. To initialize C for the first iteration, we first ran WA* with the same parameter w used for OS until the first solution was found. Then, APTS was activated. As can be seen in Figure 2.4, when using the same weight APTS always outperforms OS. We also compared APTS to AWA* with various weights, and found that AWA* and APTS have very similar performance in this domain. We also performed experiments with APTS on the KPP-COM problem. Since any subset of vertices has a GBC, every generated node induces a (probably suboptimal) utility. Thus, even before A* expands a goal (and returns the optimal solution), it can return suboptimal solutions. Therefore, in KPP-COM APTS does not require any parameter in order to find an initial goal fast (as opposed to the 15-puzzle, in which an initial WA* run was performed). Figure 2.5 displays average results on 100 different graphs with 600 nodes, density of 2 and a desired group size of 20. The x-axis denotes the runtime and the y-axis the solution quality (best utility found divided

by the optimal utility).

[Figure 2.5: Anytime KPP-COM, 600 nodes, density 2.]

As can be seen in Figure 2.5, APTS (without any parameter) outperforms every other algorithm. OS with a weight of 0.78 was relatively very close. Note that while in the 15-puzzle the overhead per node of calculating the heuristic was very small, the h_GBC heuristic described above requires significant time. This reduces the relative overhead of maintaining the two OPEN lists required by OS, which explains the improved runtime of OS in comparison with the 15-puzzle results. It is important to note that APTS does not use any parameter, while both AWA* and OS are very sensitive to the weight of the heuristic.[7] This can be seen in the degrading results of AWA*-1.3 and OS-1.3.

[7] Although OS can be used with any inadmissible heuristic, finding an efficient aparametric inadmissible heuristic is challenging.

2.5 Conclusions and Future Work

In this chapter we introduced a best-first search algorithm, Potential Search (PTS), which is specifically designed to solve bounded-cost search problems. PTS orders the nodes according to their potential. Several ways to estimate the potential of a node were described. Specifically, we use the relation between a given heuristic and the optimal cost to develop a cost function that can order the nodes in OPEN according to their potential, without actually calculating it. In addition, PTS can be modified into an anytime search algorithm variant, APTS. Empirical results show that both PTS variants are very efficient and outperformed the other algorithms in most settings of our experiments. The main advantage of PTS over the other algorithms we tried is that it does not require parameter tuning (such as w in the WA*-based algorithms) and is thus much more robust across different instances. The other algorithms were shown to be very sensitive to the parameter used. In many cases, e.g., when the optimal solution is unknown, one would not be able to determine the correct value for the parameter for these algorithms. Future work will investigate how to incorporate an estimate of the search effort until the desired solution is found, in addition to the potential of a node. For example, a node that is very close to a goal might be preferred to a node that has a slightly higher potential but is farther from a goal. In addition, we are currently pursuing the use of machine learning techniques to learn the potential function, instead of the indirect potential calculation described in this chapter.

Chapter 3

Probably Approximately Correct Heuristic Search

Consider a standard search problem of finding a path in a state space from a given initial state s to a goal state. As explained in Section 1.1, the optimal solution can be found using the A* algorithm [Hart et al., 1968], a best-first search algorithm that uses a cost function of f(n) = g(n) + h(n). Furthermore, if h*(s) is the cost of the optimal solution, then all the nodes with g + h < h*(s) must be expanded in order to verify that no better path exists [Dechter and Pearl, 1985]. While developing accurate heuristics can greatly reduce the number of nodes with g + h < h*(s), it has been shown that in many domains, even with an almost perfect heuristic, expanding all the nodes with g + h < h*(s) is not feasible within reasonable computing resources [Helmert and Röger, 2008]. When finding an optimal solution is not feasible, a range of search algorithms have been proposed that return suboptimal solutions. In particular, when an algorithm is guaranteed to return a solution that is at most w times the optimal solution, we say that this algorithm is w-admissible. Weighted A* [Pohl, 1970], A*ɛ [Pearl and Kim, 1982], Anytime Weighted A* [Hansen and Zhou, 2007] and Optimistic Search [Thayer and Ruml, 2008] are known examples of w-admissible algorithms. In general, w-admissible search algorithms achieve w-admissibility by using an admissible heuristic to obtain a lower bound on the optimal solution. When the ratio between the incumbent solution (i.e., the best solution found so far) and this lower bound is below w, then w-admissibility is guaranteed. Efficient w-admissible algorithms often introduce a natural tradeoff between solution quality and search runtime. When w is high, solutions are returned quickly but have poorer quality.[1]

[1] In some domains, increasing w above some point also degrades the search runtime.

By contrast, setting low w values will increase the search runtime, but solutions of higher quality will be returned. In this chapter we argue that it is possible to develop a search algorithm that runs much faster than traditional w-admissible algorithms by relaxing the strict w-admissibility requirement. Specifically, search problems can be solved much faster if one allows the returned solution to be w-admissible in most cases instead of always. For example, consider Weighted A* (WA*), a best-first search algorithm which uses the cost function f(n) = g(n) + w·h(n). It has been proven that WA* with weight w is w-admissible. For example, if the requirement is that a solution must be 1.25-admissible, then w will be set to 1.25. However, it is often the case that setting a higher weight may return a solution that is also 1.25-admissible with high probability. For example, on the standard 100 random 15-puzzle instances [Korf, 1985b], we have found that running WA* with w = 1.5 always returns a solution that is 1.25-admissible. Importantly, running WA* with w = 1.5 often returns a solution faster than WA* with w = 1.25. Consequently, if one requires a 1.25-admissible solution with high probability for a 15-puzzle instance, running WA* with w = 1.5 would be a better choice. Inspired by the Probably Approximately Correct (PAC) learning framework from machine learning [Valiant, 1984], we formalize the notion of finding a w-admissible solution with high probability. We call this concept Probably Approximately Correct Heuristic Search, or PAC search in short. A PAC search algorithm is given two parameters, ɛ and δ, and is required to return a solution that is at most 1 + ɛ times the optimal solution, with probability higher than 1 − δ. We call 1 + ɛ the desired suboptimality, and 1 − δ the required confidence. We first introduce and formally define what PAC search is. Then, we present a general framework for a PAC search algorithm.
This framework can be built on top of any algorithm that produces a sequence of solutions. When the search algorithm produces a solution that achieves the desired suboptimality with the required confidence, the search halts and the incumbent solution (the best solution found so far by the search algorithm) is returned. This results in a PAC search algorithm that can accept any positive values of ɛ and δ and return a solution within these bounds. An empirical evaluation was performed on the 15-puzzle using our PAC search algorithm framework on top of Anytime Weighted A* [Hansen and Zhou, 2007]. The results show that decreasing the required confidence (1 − δ) indeed yields a reduction in the number of expanded nodes. For example, setting δ = 0.05 decreases the number of expanded nodes by a factor of 3 when compared to regular Anytime Weighted A* (where δ = 0) for various values of ɛ.

A large portion of the research reported in Chapter 3 was presented at the SoCS-11 conference and published in the conference proceedings.

3.1 Related Work

The possible connection between the PAC learning framework and heuristic search has been previously pointed out [Ernandes and Gori, 2004]. They used an artificial neural network (ANN) to generate a heuristic function ĥ that is only likely admissible, i.e., admissible with high probability. They showed experimentally that A* with ĥ as its heuristic can solve the 15-puzzle quickly and return the optimal solution in many instances. This can be viewed as a special case of the PAC search concept presented in this chapter, where ɛ = 0 and δ is allowed values larger than zero. In addition, they bound the quality of the returned solution as a function of two parameters: 1) P(ĥ > h*), which is the probability that ĥ overestimates the optimal cost, and 2) d, the length (number of hops) of the optimal path to a goal. Specifically, the probability that the path found by A* with ĥ as a heuristic is optimal is given by (1 − P(ĥ > h*))^d. Unfortunately, this formula is only given as a theoretical observation. In practice, the length of the optimal path to a goal d is not known until the problem is solved optimally, and thus this bound cannot be used to identify whether a solution is probably optimal in practice. Other algorithms that use machine learning techniques to generate accurate heuristics have also been proposed [Samadi et al., 2008; Jabbari Arfaee et al., 2010]. The resulting heuristics are not guaranteed to be admissible but were shown to be very effective. No theoretical analysis of the amount of suboptimality was performed for these algorithms. The PAC framework has been borrowed from machine learning by other fields as well. For example, a data mining algorithm that is probably approximately correct has been proposed [Cox et al., 2009].
A framework for probably approximately optimal strategy selection has also been proposed, for the problem of finding an efficient execution order of a sequence of experiments with probabilistic outcomes [Greiner and Orponen, 1990]. To our knowledge, the PAC framework has not yet been adapted to heuristic search.

3.2 PAC Heuristic Search

PAC learning is a framework for analyzing machine learning algorithms and the complexity of learning classes of concepts. A learning algorithm

is said to be a PAC learning algorithm if it generates, with probability higher than 1 − δ, a hypothesis with an error rate lower than ɛ, where δ and ɛ are parameters of the learning algorithm. Similarly, we say that a search algorithm is a PAC search algorithm if it returns, with high probability (1 − δ), a solution that has the desired suboptimality (of 1 + ɛ).

3.2.1 Formal Definition

We now formally define what a PAC search algorithm is. Let M be the set of all possible start states in a given domain, and let D be a distribution over M. Correspondingly, we define a random variable S to be a state drawn randomly from M according to distribution D. For a search algorithm A and a state s ∈ M, we denote by cost(A, s) the cost of the solution returned by A given s as a start state. We denote by h*(s) the cost of the optimal solution for state s. Correspondingly, cost(A, S) is a random variable that consists of the cost of the solution returned by A for a state randomly drawn from M according to distribution D. Similarly, h*(S) is a random variable that consists of the cost of the optimal solution for a random state S.

Definition 3.2.1 [PAC search algorithm] An algorithm A is a PAC search algorithm iff

Pr(cost(A, S) ≤ (1 + ɛ) · h*(S)) > 1 − δ

Classical search algorithms can be viewed as special cases of a PAC search algorithm. Algorithms that always return an optimal solution, such as A* and IDA*, are simply PAC search algorithms that set both ɛ and δ to zero. w-admissible algorithms are PAC search algorithms where w = 1 + ɛ and δ = 0. In this work we propose a general framework for a PAC search algorithm that is suited for any non-negative values of ɛ and δ.

3.2.2 Framework for a PAC Heuristic Search Algorithm

We now introduce a general PAC search framework (PAC-SF) for obtaining PAC search algorithms. The main idea of PAC-SF is to generate better and better solutions and halt whenever the incumbent solution has the desired suboptimality (1 + ɛ) with the required confidence (1 − δ).
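To make the definition concrete, the Python sketch below checks, over a sample of start states with known optimal costs, whether an algorithm's solution costs satisfy the PAC requirement; the function name and the paired costs are invented for illustration.

```python
# Empirical check of the PAC-search definition: on a sample of start states,
# an algorithm A behaves as a PAC search algorithm for (eps, delta) if
# cost(A, s) <= (1 + eps) * h*(s) on more than a (1 - delta) fraction.

def satisfies_pac(solver_costs, optimal_costs, eps, delta):
    within = sum(1 for c, opt in zip(solver_costs, optimal_costs)
                 if c <= (1.0 + eps) * opt)
    return within / len(solver_costs) > 1.0 - delta

solver_costs = [55, 60, 100, 44]   # costs returned by some algorithm A
optimal_costs = [50, 50, 90, 44]   # h* for the same start states

print(satisfies_pac(solver_costs, optimal_costs, eps=0.2, delta=0.3))  # all four within 1.2x
print(satisfies_pac(solver_costs, optimal_costs, eps=0.1, delta=0.3))  # only two within 1.1x
```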
The pseudo-code for PAC-SF is listed in Algorithm 1. The incumbent solution U is initially set to ∞ (line 1). In every iteration of PAC-SF, a search algorithm is run until a solution with cost lower than U is found

Algorithm 1 PAC search algorithm framework
Input: 1 + ɛ, the required suboptimality
Input: 1 − δ, the required confidence
1: U ← ∞
2: while improving U is possible do
3:   NewSolution ← search for a solution with cost lower than U
4:   if NewSolution < U then
5:     U ← NewSolution
6:   if U is a PAC solution then return U

(line 3), or it is verified that such a solution does not exist (line 2). When the incumbent solution U is a PAC solution, i.e., it is (1 + ɛ)-admissible with high probability (above 1 − δ), the search halts and U is returned (line 6). Implementing PAC-SF introduces two challenges:

1. Choosing a search algorithm (line 3). This is the fundamental challenge in a search problem: how to find a solution.

2. Identifying when to halt (line 6). This is the challenge of identifying when the incumbent solution U ensures that the desired suboptimality has been achieved with the required confidence.

Anytime search algorithms can address the first challenge. When searching with an anytime algorithm, after the first solution is found the search continues, finding solutions of better quality.[2] This is exactly what is needed for PAC-SF (line 3 in Algorithm 1). Prominent examples of anytime search algorithms are Anytime Weighted A* [Hansen and Zhou, 2007], Beam-Stack Search [Zhou and Hansen, 2005] and Anytime Window A* [Aine et al., 2007]. Note that the incumbent solution U can be passed to an anytime search algorithm for pruning purposes, e.g., pruning all the nodes with g + h ≥ U when h is admissible. Some anytime algorithms are guaranteed to eventually find the optimal solution. Consequently, if PAC-SF is built on top of such an algorithm, then it is guaranteed to return a solution for any values of ɛ and δ. This is because whenever the optimal solution is found, the incumbent solution U is optimal and can be safely returned for any ɛ and δ.
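The framework can be sketched in a few lines of Python. Here the anytime search is modeled as an iterator of ever-better solution costs and the PAC test is a pluggable predicate; both, and the numbers in the toy run, are invented for this illustration.

```python
# Minimal sketch of Algorithm 1 (PAC-SF): consume solutions from an anytime
# search, keep the incumbent U, and halt once the supplied sufficient PAC
# condition accepts U.

def pac_sf(anytime_solutions, is_pac_solution):
    U = float("inf")                       # line 1
    for solution in anytime_solutions:     # lines 2-3: search while improving
        if solution < U:                   # lines 4-5: adopt a better solution
            U = solution
        if is_pac_solution(U):             # line 6: sufficient PAC condition met
            return U
    return U  # search converged: U is optimal and hence trivially PAC

# Toy run: costs improve 80 -> 70 -> 60; with a known lower bound of 50 and
# eps = 0.25, the (delta = 0) condition U <= (1 + eps) * 50 fires at U = 60.
print(pac_sf(iter([80, 70, 60]), lambda u: u <= 1.25 * 50))
```

The sufficient PAC conditions developed in the next section are candidates for the `is_pac_solution` predicate.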
[2] For some anytime algorithms, it is not guaranteed that a returned solution is necessarily of higher quality than a previously returned solution. However, this can easily be remedied by continuing to run such an anytime algorithm until a solution that is better than all of the previously returned solutions is found.

For simplicity, we assume hereinafter that the search algorithm used

in PAC-SF is an anytime search algorithm that converges to the optimal solution.

3.3 Identifying a PAC Solution

We now turn to the second challenge in PAC-SF: how to identify when the incumbent solution U is a PAC solution, so that the search can terminate (line 6 in Algorithm 1).

Definition [Sufficient PAC Condition] A sufficient PAC condition is a termination condition for PAC-SF that ensures that PAC-SF will return a solution for a randomly drawn state that is (1 + ɛ)-admissible with probability of at least 1 − δ.

PAC-SF with an anytime search algorithm that converges to the optimal solution is guaranteed to be a PAC search algorithm (as defined in Definition 3.2.1) iff it halts (and returns a solution) when a sufficient PAC condition has been met. How do we recognize when a sufficient PAC condition has been met? For a given start state s, a solution of cost U is (1 + ɛ)-admissible if the following equation holds:

U ≤ h*(s) · (1 + ɛ)   (3.1)

However, Equation 3.1 cannot be used in practice as a sufficient PAC condition in PAC-SF, because h*(s) is known only when an optimal solution has been found. Next, we present several practical sufficient PAC conditions that are based on Equation 3.1.

3.3.1 Trivial PAC Condition

Recall that S denotes a randomly drawn start state. The following condition is a sufficient PAC condition:

Pr(U ≤ h*(S) · (1 + ɛ)) > 1 − δ   (3.2)

The key idea here is to assume we know nothing about a given start state s except that it was drawn from the same distribution as S (i.e., drawn from M according to distribution D). With this assumption, the random variable h*(S) can be used in place of h*(s) in Equation 3.1. To use the sufficient PAC condition depicted in Equation 3.2, the distribution of h*(S) is required. Pr(h*(S) ≤ X) can be estimated in a preprocessing stage by randomly sampling states from S. Each of the sampled states is solved optimally, resulting in a set of h* values. The

cumulative distribution function Pr(h*(S) ≤ X) can then be estimated by simply counting the number of instances with h* ≤ X, or using any statistically valid curve-fitting technique. A reminiscent approach was used in the KRE formula [Korf et al., 2001] for predicting the number of nodes generated by IDA*, where the state space was sampled to estimate the probability that a random state has a heuristic value h ≤ X. Note that the procedure used to sample the state space should be designed so that the distribution of the sampled states is as similar as possible to the real distribution of start states. In some domains this may be difficult, while in other domains sampling states from the same distribution is easy. For example, sampling 15-puzzle instances from a uniform distribution over the state space can be done by generating a random permutation of the 15 tiles and verifying mathematically that the resulting permutation represents a solvable 15-puzzle instance [Johnson, 1879]. Sampling random states can also be done in some domains by performing a sequence of random walks from a set of known start states. Clearly, a sufficient PAC condition based on Pr(h*(S) > X) is very crude, as it ignores all the state attributes of the initial state s. For example, if for a given start state s we have h(s) = 40 and h is admissible, then h*(s) cannot be below 40. However, if one of the randomly sampled states has h* of 35, then we will have Pr(h*(S) < 40) > 0.

3.3.2 Ratio-based PAC Condition

Next, we propose an alternative sufficient PAC condition that regards the heuristic value of the start state s. Instead of considering the distribution of h*(S), consider the distribution of the ratio between h* and h for a random start state S. We denote this as h*/h(S). Similarly, the cumulative distribution function Pr(h*/h(S) > Y) is the probability that a random start state S (i.e., drawn from M according to distribution D) has h*/h larger than a value Y.
This allows the following sufficient PAC condition:

Pr( h*/h(S) ≥ U / (h(s) · (1 + ɛ)) ) > 1 − δ   (3.3)

Equation 3.3 can be seen as a simple extension of Equation 3.2, where both sides are divided by the heuristic estimate of the random state, given the heuristic value of the specific start state s. It is easy to see that Equation 3.3 is indeed a sufficient PAC condition. The benefit of this condition is that the heuristic estimate of the given start state (h(s)) is considered.
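Checking this condition against a preprocessing sample of h*/h ratios might look like the Python sketch below; the sample values and the function name are invented for illustration.

```python
# Sketch of the ratio-based PAC condition (Equation 3.3): halt when the
# empirical probability that h*/h >= U / (h(s) * (1 + eps)) exceeds 1 - delta.

def ratio_pac_condition(ratios, U, h_start, eps, delta):
    threshold = U / (h_start * (1.0 + eps))
    p = sum(1 for r in ratios if r >= threshold) / len(ratios)
    return p > 1.0 - delta

# Invented sample: on 95% of optimally solved training instances h*/h was
# 1.2, on the rest 1.05. With h(s) = 50, U = 60 and eps = 0.1 the threshold
# is 60/55 ~ 1.09, so the condition holds for delta = 0.1; with U = 70 the
# threshold rises to ~1.27 and the condition fails.
ratios = [1.05] * 5 + [1.2] * 95
print(ratio_pac_condition(ratios, U=60, h_start=50, eps=0.1, delta=0.1))
print(ratio_pac_condition(ratios, U=70, h_start=50, eps=0.1, delta=0.1))
```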

[Figure 3.1: h*/h distribution (PDF and CDF) for the additive 7-8 PDB heuristic.]

Estimating Pr(h*/h(S) ≥ Y) in practice can be done in a similar manner to that described above for estimating Pr(h*(S) ≤ X). First, random problem instances are sampled. Then, collect h*/h values instead of h* values, and generate the corresponding cumulative distribution function Pr(h*/h(S) ≥ Y). Note that if h is admissible then Pr(h*/h(S) ≥ 1) = 1.

Our experiments (detailed below) were performed on the 15-puzzle domain, a standard search benchmark, using the additive 7-8 PDB as a heuristic [Korf and Felner, 2002; Felner et al., 2004b]. We use these experiments also to demonstrate how the sampling process described above can be done. First, the distribution of h*/h for the 15-puzzle and the additive 7-8 PDB heuristic was learned as follows. The standard 1,000 random 15-puzzle instances [Felner et al., 2004b] were solved optimally using A*. The ratio h*/h was calculated for the start state of every instance. Figure 3.1 presents the resulting cumulative and probability distribution functions. The x-axis displays values of h*/h. The blue bars, which correspond to the left y-axis, show the probability of a problem instance having a specific h*/h value; in other words, the blue bars show the probability distribution function (PDF) of h*/h, which is Pr(h*/h = X). The red curve, which corresponds to the right y-axis, shows the cumulative distribution function (CDF) of h*/h, i.e., given X the curve shows Pr(h*/h ≤ X). Assume that we are given as input the start state s, ɛ = 0.1 and δ = 0.1. Also assume that h(s) = 50 and a solution of cost 60 has been found (i.e., U = 60). According to the sufficient PAC condition depicted in

Equation 3.3, the search can halt when:

Pr( h*/h(S) ≥ U / (h(s) · (1 + ɛ)) ) > 1 − δ

Setting U = 60, h(s) = 50, ɛ = 0.1 and δ = 0.1, we have:

Pr( h*/h(S) ≥ 1.09 ) > 0.9

The probability that h*/h(S) ≥ 1.09 can be estimated with the CDF displayed in Figure 3.1. As indicated by the red dot above the 1.1 point of the x-axis (according to the right y-axis), Pr(h*/h(S) < 1.09) ≈ 0.1, and consequently Pr(h*/h(S) ≥ 1.09) > 0.9. Therefore, the sufficient PAC condition from Equation 3.3 is met and the search can safely return the incumbent solution (60) and halt. By contrast, if the incumbent solution were 70, then U / (h(s) · (1 + ɛ)) = 1.27, and according to the CDF in Figure 3.1, h*/h is lower than 1.27 with probability higher than 90%. Therefore, in this case the sufficient PAC condition is not met, and the search will continue, seeking a better solution than 70. It is important to note that the process of obtaining the distribution of h*/h is done in a preprocessing stage, as it requires solving a set of instances optimally. This is crucial since optimally solving a set of instances may be computationally expensive (if finding optimal solutions were easy there would be no need for PAC search). This expensive preprocessing stage is done only once per domain and heuristic. By contrast, the actual search is performed per problem instance. Usually one implements a search algorithm to be used for many problem instances; therefore, the cost of the expensive preprocessing stage is amortized over the gain achieved for all the instances that will be solved by the implemented algorithm.

3.3.3 Exploiting a Lower Bound

PAC-SF with the ratio-based PAC condition (Equation 3.3) is very general. Finding new incumbent solutions (line 3 in Algorithm 1) can even be done with depth-first branch-and-bound (DFBnB) or local search algorithms (e.g., hill climbing). All that is required is a (not necessarily admissible) heuristic estimate of the start state (h(s)), and the distribution of h*/h. However, using the ratio-based PAC condition means that PAC-SF will only consider halting the search when a new incumbent solution has been found. Several anytime best-first search algorithms provide additional knowledge during the search that can be used to construct a better

PAC condition, one that may cause PAC-SF to halt even without finding a new incumbent solution. Specifically, some search algorithms provide and gradually improve a lower bound on the optimal cost. Let L be a lower bound on the optimal cost, obtained during the search. Clearly, L ≤ h*(s). Therefore, the PAC condition in Equation 3.3 can be refined, resulting in the following sufficient PAC condition:

Corollary 3.3.1 [Ratio-based PAC condition with a lower bound] The following equation is a sufficient PAC condition:

Pr( L/h(s) ≤ h*/h(S) < U / (h(s) · (1 + ɛ)) ) ≤ δ

Proof: Recall the PAC condition in Equation 3.3:

Pr( h*/h(S) ≥ U / (h(s) · (1 + ɛ)) ) > 1 − δ

which is equivalent to:

Pr( h*/h(S) < U / (h(s) · (1 + ɛ)) ) ≤ δ

Since L is a lower bound on the optimal cost, clearly L ≤ h*(s). Therefore, we can bound h*/h(S) from below by L/h(s).

Any search algorithm that maintains all the generated nodes in an open list can use f_min = min_{n ∈ OPEN} (g(n) + h(n)) as a lower bound on the optimal cost, if h is admissible. Note that even if the search algorithm does not expand nodes according to the lowest g(n) + h(n), the value of f_min is still a lower bound on the optimal cost [Hansen and Zhou, 2007]. As the search progresses, f_min increases, and the condition in Corollary 3.3.1 can be met even if a new incumbent solution has not been found yet. Consequently, a PAC solution will be identified faster than when using the condition in Equation 3.3.

3.3.4 Learning from the Open List

The PAC condition in Corollary 3.3.1 can be met when either a better incumbent solution is found (decreasing U), or when the lower bound on the optimal cost increases (increasing L). A lower bound L that is obtained from f_min as explained above will only increase after all the nodes with g + h ≤ L are expanded. In combinatorially large state spaces with a heuristic that is not perfect, there may be an exponential number of such nodes.
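For concreteness, the lower-bound refinement of Corollary 3.3.1 might be checked as in the Python sketch below, with L supplied, e.g., by f_min over OPEN; the sample ratios, numbers and function name are invented.

```python
# Sketch of the ratio-based PAC condition with a lower bound L <= h*(s):
# halt when Pr(L/h(s) <= h*/h < U / (h(s) * (1 + eps))) <= delta on the
# preprocessing sample, since the true ratio is known to be at least L/h(s).

def ratio_pac_with_lower_bound(ratios, U, L, h_start, eps, delta):
    lo = L / h_start
    hi = U / (h_start * (1.0 + eps))
    p = sum(1 for r in ratios if lo <= r < hi) / len(ratios)
    return p <= delta

ratios = [1.05] * 5 + [1.2] * 95  # invented h*/h sample
# With U = 60, h(s) = 50 and eps = 0.1 the upper threshold is ~1.09; a lower
# bound of L = 51 restricts attention to ratios in [1.02, 1.09), i.e., only
# the 5% mass at 1.05, so the condition holds for delta = 0.1.
print(ratio_pac_with_lower_bound(ratios, U=60, L=51, h_start=50, eps=0.1, delta=0.1))
```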
The following sufficient PAC condition, named the open-based PAC condition, is based on the knowledge gained from the nodes in OPEN, and can be met even before f_min increases or U decreases.

Corollary 3.3.2 [Open-based PAC condition] Let

P(U, n) = Pr( h*/h(S) ≥ (1/h(n)) · (U/(1 + ɛ) − g(n)) )

Then the following is a sufficient PAC condition:

Σ_{n ∈ OPEN} log P(U, n) ≥ log(1 − δ)

Proof: The shortest path from s to the goal must pass through one of the nodes in OPEN. Therefore:

h*(s) = min_{n ∈ OPEN} (g(n) + h*(n))

Consequently, if none of the nodes in OPEN leads to a solution of cost lower than U/(1 + ɛ), then U is a PAC solution. A node n ∈ OPEN is not part of a path from s to a goal of cost less than U/(1 + ɛ) if:

g(n) + h*(n) ≥ U/(1 + ɛ)

h*(n) ≥ U/(1 + ɛ) − g(n)

Thus, the probability that a node n ∈ OPEN is not part of a path from s to a goal of cost less than U/(1 + ɛ) is given by:

Pr( h*/h(S) ≥ (1/h(n)) · (U/(1 + ɛ) − g(n)) ) = P(U, n)

Therefore, the probability that every node in OPEN is not part of a path from s to a goal of cost less than U/(1 + ɛ) is given by:

Π_{n ∈ OPEN} P(U, n)

Once the expression above is at least 1 − δ, a PAC condition is met. A logarithm is applied to both sides to avoid precision issues, resulting in the expression displayed in Corollary 3.3.2.

The great benefit of the open-based PAC condition is that it can be met after every expansion. Let P(U) denote the sum Σ_{n ∈ OPEN} log P(U, n). Whenever P(U) ≥ log(1 − δ), the open-based PAC condition is met. Whenever a better incumbent solution is found and U decreases, P(U) must be recalculated for all the nodes in OPEN, incurring O(|OPEN|) overhead. However, this occurs only when a new incumbent solution is found. By contrast, it is possible to calculate P(U) incrementally and efficiently after expanding each node, incurring only O(1) operations per node. Whenever node n is expanded, P(U) must

decrease by log P(U, n), and increase by log P(U, n′) for every child n′ of n that enters OPEN. If a child node n′ is already in OPEN and its g-value has decreased (i.e., a better path to n′ has been found), then, where g_old(n′) is the old g-value of n′ and g_new(n′) is the new g-value of n′, P(U) is updated by subtracting log P(U, n′) calculated according to g_old(n′) and adding log P(U, n′) calculated according to g_new(n′).

3.4 Experimental Results

[Table 3.1: Average number of nodes expanded until PAC-SF returned a solution (columns: A* for reference, and AWA*+PAC with w = 1.1, 1.25, 1.5 and 2; rows: values of 1 + ɛ and 1 − δ; bracketed values give the ratio of instances on which the desired suboptimality was achieved).]

Next, we demonstrate empirically the benefits of PAC-SF on the 15-puzzle, which is a standard search benchmark. A solution was returned when the sufficient PAC condition described in Equation 3.3 was met, using the distribution of h*/h shown in Figure 3.1. In this set of experiments, we used only the sufficient PAC condition described in Equation 3.3 for clarity of presentation. For producing solutions (line 3 in Algorithm 1) we used Anytime Weighted A* [Hansen and Zhou, 2007] described in Section 2.3, which is an anytime variant of Weighted A* [Pohl, 1970]. While WA* halts when a goal is expanded, AWA* continues to search, returning better and better solutions.
Eventually, AWA* will converge to the optimal solution and halt. In the experiments we varied the following parameters:

Weights (w for AWA*): 1.1, 1.25, 1.5 and 2.
Desired suboptimality (1 + ɛ): 1, 1.1, 1.25 and 1.5.
Required confidence (1 − δ): 0.5, 0.8, 0.9, 0.95 and 1.

Table 3.1 shows the number of nodes expanded until PAC-SF returned a solution. The data in every cell of the table is the average over the standard 100 random 15-puzzle instances [Korf, 1985b]. For reference, the average number of nodes expanded by A*, which is 37,688, is given in the column denoted A*. Note that these 100 instances are not included in the standard 1,000 random instances [Felner et al., 2004b] that were used to estimate Pr(h*/h(S) > X). This was done to separate the training set from the test set. The values in brackets show the ratio of instances where the returned solution indeed achieved the desired suboptimality, i.e., where the cost of the solution was no more than 1 + ɛ times the optimal solution. Until the first solution has been found, AWA* behaves exactly like Weighted A*. Therefore it is guaranteed that the first solution found by AWA* with weight w is no larger than w times the optimal solution [Pohl, 1970]. For example, if the desired suboptimality is 1.25, 1.5 or 2, the number of nodes expanded by AWA* with w = 1.25 is exactly the same, and the confidence that the found solution achieves the desired suboptimality is 1.0. Therefore, for w = 1 + ɛ we report only results with confidence 1.0 (for reference) and omit the results for w < 1 + ɛ since, as explained above, they are exactly the same as the results for w = 1 + ɛ. As can be seen, all the values in the brackets exceed the required confidence significantly. Therefore, in this domain PAC-SF succeeds in returning solutions that achieve the desired suboptimality with confidence higher than the required confidence. Note that in this domain PAC-SF with AWA* is very conservative. For example, AWA* with w = 1.5 achieved a suboptimality of 1.25 for all 100 instances, even when the required confidence was only 0.5 (1 − δ = 0.5).
This suggests that it may be possible to further improve the proposed PAC identification technique in future work.

Now, consider the number of nodes expanded when w > 1 + ɛ. Clearly, for every value of suboptimality (i.e., every value of 1 + ɛ), decreasing the required confidence (1 − δ) reduced the number of nodes expanded. This means that relaxing the required confidence indeed allowed returning solutions of the desired quality faster. For example, consider AWA* with w = 1.5, where the desired suboptimality is 1.25 (1 + ɛ = 1.25). If the required confidence is 1.0, AWA* with w = 1.5 expanded 7,079 nodes. By relaxing the required confidence to 95% (i.e., 1 − δ = 0.95), AWA* expanded only 1,786 nodes. Interestingly, even when the required confidence is set to 95%, AWA* with w = 1.5 was able to find a 1.25-admissible solution in 100% of the instances (see the value in brackets).

Table 3.2 presents a comparison between the three sufficient PAC conditions that are based on the cumulative distribution of h*/h. These experiments were also performed on the 15-puzzle domain. The PAC conditions used were the ratio-based PAC condition given in Equation 3.3, the ratio-based PAC condition augmented with the lower bound given in Corollary 3.3.1, and the open list-based PAC condition given in Corollary. These conditions are denoted in Table 3.2 by RPAC, RPAC+LB and RPAC+OPEN, respectively. Every data cell of Table 3.2 corresponds to a specific value of ɛ, δ and AWA* weight w. The values in every data cell are of the same type as in Table 3.1: the average number of nodes expanded until a solution was returned (meaning that the relevant PAC condition was met) and, in brackets, the ratio of instances where the desired suboptimality was achieved.

Similar to the results in Table 3.1, the values in brackets exceed the required confidence significantly for all of the sufficient PAC conditions. Furthermore, for all of the proposed sufficient PAC conditions, PAC-SF with AWA* in this domain is very conservative, again suggesting that it might be possible to develop even better PAC conditions in future work. In terms of expanded nodes, it is clear that RPAC+LB is superior to RPAC (i.e., the ratio-based PAC condition in Equation 3.3), and that RPAC+OPEN is superior to all of the other sufficient PAC conditions. When the desired suboptimality is 1 (1 + ɛ = 1), the advantage is minor. However, for higher values of ɛ the advantage of RPAC+OPEN over the other sufficient PAC conditions is greater. For example, consider the number of nodes expanded with RPAC, RPAC+LB and RPAC+OPEN for 1 + ɛ = 1.10 and 1 − δ = 0.99, using AWA* with w = 1.2.
Using RPAC, a solution was found after expanding 10,125 nodes on average, while RPAC+LB required only 6,306 nodes and RPAC+OPEN required 4,269. In the same setting, decreasing 1 − δ to 0.9 led RPAC to expand 9,257 nodes, RPAC+LB 5,581 nodes, and RPAC+OPEN only 3,377 nodes.

3.5 Conclusions and Future Work

In this chapter, the probably approximately correct (PAC) concept from machine learning is adapted to heuristic search. A PAC heuristic search algorithm is defined as an algorithm that returns, with high probability, a solution whose cost is w-admissible.

1 − δ (from lowest to highest required confidence)

1 + ɛ = 1.00, AWA* w = 1.1
  RPAC       21,782 (0.99)   21,789 (1.00)   21,799 (1.00)   21,801 (1.00)   21,801 (1.00)
  RPAC+LB    15,320 (0.96)   21,180 (0.99)   21,190 (0.99)   21,801 (1.00)   21,801 (1.00)
  RPAC+OPEN  14,983 (0.96)   20,132 (0.98)   21,189 (0.99)   21,801 (1.00)   21,801 (1.00)
1 + ɛ = 1.00, AWA* w = 1.2
  RPAC       21,876 (0.99)   21,911 (1.00)   21,924 (1.00)   21,930 (1.00)   21,930 (1.00)
  RPAC+LB    18,745 (0.97)   21,453 (0.98)   21,745 (0.98)   21,930 (1.00)   21,930 (1.00)
  RPAC+OPEN  18,402 (0.96)   21,385 (0.98)   21,744 (0.99)   21,927 (1.00)   21,929 (1.00)
1 + ɛ = 1.00, AWA* w = 1.3
  RPAC       28,163 (0.98)   28,185 (1.00)   28,204 (1.00)   28,210 (1.00)   28,210 (1.00)
  RPAC+LB    27,310 (0.96)   27,892 (0.99)   28,176 (0.99)   28,210 (1.00)   28,210 (1.00)
  RPAC+OPEN  26,730 (0.97)   27,519 (0.99)   27,730 (1.00)   27,962 (1.00)   28,109 (1.00)
1 + ɛ = 1.10, AWA* w = 1.1
  RPAC        9,339 (1.00)
  RPAC+LB     9,339 (1.00)
  RPAC+OPEN   9,339 (1.00)
1 + ɛ = 1.10, AWA* w = 1.2
  RPAC        8,019 (1.00)    9,257 (1.00)    9,697 (1.00)   10,125 (1.00)   10,125 (1.00)
  RPAC+LB     3,617 (1.00)    5,581 (1.00)    6,165 (1.00)    6,306 (1.00)    6,327 (1.00)
  RPAC+OPEN   3,340 (1.00)    3,377 (1.00)    3,857 (1.00)    4,269 (1.00)    4,344 (1.00)
1 + ɛ = 1.10, AWA* w = 1.3
  RPAC       17,692 (0.96)   22,115 (0.98)   23,834 (1.00)   25,917 (1.00)   25,917 (1.00)
  RPAC+LB    10,826 (0.96)   15,458 (0.98)   18,377 (1.00)   19,271 (1.00)   19,759 (1.00)
  RPAC+OPEN   6,946 (0.96)    8,670 (1.00)   10,076 (1.00)   12,042 (1.00)   12,819 (1.00)
1 + ɛ = 1.20, AWA* w = 1.2
  RPAC        2,988 (1.00)
  RPAC+LB     2,988 (1.00)
  RPAC+OPEN   2,988 (1.00)
1 + ɛ = 1.20, AWA* w = 1.3
  RPAC        1,921 (0.97)    2,672 (0.98)    3,594 (1.00)    5,669 (1.00)    5,669 (1.00)
  RPAC+LB     1,882 (0.97)    2,318 (0.98)    2,970 (1.00)    3,307 (1.00)    3,899 (1.00)
  RPAC+OPEN   1,545 (0.97)    2,080 (1.00)    2,216 (1.00)    2,449 (1.00)    2,907 (1.00)
1 + ɛ = 1.30, AWA* w = 1.3
  RPAC        1,051 (1.00)    1,105 (1.00)    1,275 (1.00)    1,696 (1.00)    1,696 (1.00)
  RPAC+LB     1,024 (1.00)    1,040 (1.00)    1,241 (1.00)    1,271 (1.00)    1,348 (1.00)
  RPAC+OPEN   1,002 (1.00)    1,024 (1.00)    1,045 (1.00)    1,084 (1.00)    1,131 (1.00)

Table 3.2: Performance of different PAC conditions. Each entry is the average number of nodes expanded until the PAC condition was met, with the ratio of instances that achieved the desired suboptimality in brackets. Where only one value appears, a single result was reported for that setting.

A general framework for a PAC search algorithm is presented. This framework can take any algorithm that returns a sequence of improving solutions and obtain a PAC search algorithm. A major challenge in PAC search is to identify when a found solution is good enough. We propose an easy-to-implement technique for identifying such a solution, based on sampling and estimating the ratio between the optimal solution cost and the heuristic estimate of the start state. Empirical evaluation on the 15-puzzle demonstrates that the proposed PAC search framework indeed finds solutions with the desired suboptimality with probability higher than the required confidence. Furthermore, by allowing AWA* to halt when the desired suboptimality is reached with high probability (but not 100%), AWA* finds solutions faster (i.e., expanding fewer nodes) than regular AWA*, which halts only when the desired suboptimality is guaranteed.

We currently plan to extend this work in several directions. One direction that we are currently pursuing is to develop better sufficient PAC conditions that exploit the knowledge gained during the search (e.g., exploiting the open list in a best-first search). Another research direction is to obtain a more accurate h*/h distribution by using an abstraction of the state space, similar to the type system concept used in the CDP formula to predict the number of states generated by IDA* [Zahavi et al., 2010]. States in the state space would be grouped into types, and each type would have a corresponding h*/h distribution. A third research direction is to adapt the choice of which node to expand next to incorporate the value of information gained by expanding each node.
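The framework and the ratio-based stopping idea described above can be sketched in Python. This is an illustrative sketch, not the dissertation's implementation: the function names, the sorted-list representation of the empirical h*/h distribution, and the generator interface for the underlying anytime search are all assumptions.

```python
import bisect

def ratio_pac_ok(cost, h_start, ratios, epsilon, delta):
    """Ratio-based PAC check (sketch). The incumbent solution of cost `cost`
    is accepted when, under the empirical distribution of h*(s)/h(s) ratios
    (`ratios`, a sorted list collected from training instances), the
    probability that the optimal cost is below cost/(1+epsilon) is <= delta."""
    threshold = cost / ((1.0 + epsilon) * h_start)
    # Fraction of training ratios strictly below the threshold: these are the
    # cases where the incumbent would NOT be (1+epsilon)-admissible.
    p_violation = bisect.bisect_left(ratios, threshold) / len(ratios)
    return p_violation <= delta

def pac_search(anytime_solutions, h_start, ratios, epsilon, delta):
    """Run any anytime search (an iterable of improving solution costs) and
    halt as soon as the PAC condition holds for the incumbent solution."""
    incumbent = None
    for cost in anytime_solutions:
        incumbent = cost
        if ratio_pac_ok(cost, h_start, ratios, epsilon, delta):
            return incumbent
    return incumbent  # search ran to completion: the incumbent is optimal
```

The higher the required confidence (smaller δ), the smaller the tolerated violation mass, so the search keeps improving the incumbent longer before halting.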

Chapter 4

Searching for Patterns in an Unknown Graph

Many real-life problems can be modeled as problems on graphs, where one needs to find subgraphs that have a specific structure or attributes. Examples of such subgraph structures are shortest paths, paths in general, shortest traveling salesperson tours and cliques. Most classical algorithms that solve such graph problems assume that the structure of the graph is given either explicitly, in data structures such as an adjacency list or adjacency matrix, or implicitly, with a start state and a set of computable operators (e.g., moving a tile in a sliding-tile puzzle state). The complexity of such search algorithms is therefore measured with respect to CPU time and memory demands. We refer to such problems as problems on known graphs.

By contrast, there are domains that can be modeled as graphs where the graph structure is not known a priori, and exploring vertices and edges requires a different type of resource, neither CPU nor memory. One example is a robot navigating in an unknown terrain, where vertices and edges correspond to physical locations and roads, respectively. Acquiring knowledge about the vertices and edges of the searched graph may require activating a physical sensor and possibly mobilizing the robot, incurring a cost of fuel (or some other energy resource). Another example is an agent searching the World Wide Web, where web sites and hypertext links represent the vertices and edges of the searched graph, respectively. Since the web is extremely large and dynamic, accessing vertices requires sending and receiving network packets (e.g., HTTP request/response). We refer to such problems as problems on unknown graphs.

Solving problems in unknown graphs requires exploring some parts of the graph. We define an exploration action for a vertex as an action that discovers all its outgoing edges and neighboring vertices. Such

exploration actions are associated with a cost, denoted hereafter as the exploration cost. This exploration cost is often conceptually different from the traditional computational effort (CPU and memory). In the web graph domain, for example, exploration can correspond to sending an HTTP request, retrieving an HTML page and parsing all the hypertext links in it. The hypertext links are the outgoing edges, and the linked web sites are the neighboring vertices. The associated exploration cost includes the network I/O of sending and receiving IP packets. For a physical domain, where a robot navigates an unknown terrain, exploration is done by using sensors at a location to discover the nearby area, i.e., the outgoing edges and the neighboring vertices in the map. The associated exploration cost includes the cost of activating the sensors at a vertex. In both cases the CPU and memory costs are often negligible in comparison to the exploration costs. An important task, which is addressed in this chapter, is to solve the problem while minimizing the exploration cost. This is especially important when the computational (CPU) cost is of lesser importance and can be neglected, as long as the running time remains tractable.

In this chapter, we address the problem of searching for a specific pattern of vertices and edges in an unknown graph while aiming to minimize the exploration cost. Starting from a single known vertex, the search is performed in a best-first search manner. In every step, if the desired pattern does not exist in the known part of the graph, a single best vertex is chosen and explored. This process is repeated until the desired pattern is found or the entire unknown graph has been explored. Several general heuristic algorithms are proposed for choosing which vertex to explore next: KnownDegree, Pattern and RPattern.
KnownDegree is a straightforward adaptation of a common known-graph heuristic, in which the vertex with the highest degree is explored first. Pattern exploits the structure of the searched pattern by choosing to explore the vertex that is closest to being part of the searched pattern. A metric for the closeness of a vertex to a pattern is presented. With this closeness metric, Pattern has the property of returning a tight lower bound on the number of exploration steps required to find the searched pattern. For scenarios where probabilistic knowledge of the unknown graph is available, we propose the RPattern heuristic algorithm. RPattern is a randomized heuristic algorithm that chooses the next vertex to explore by applying a Monte-Carlo sampling procedure, with Pattern as a default heuristic. To demonstrate the applicability of the proposed heuristic algorithms, we describe how to implement them for two specific patterns: a k-clique and a complete (p-q)-bipartite graph. We develop the concepts of a potential k-clique and a potential complete (p-q)-bipartite graph,

along with supporting corollaries that allow an efficient implementation of the Pattern heuristic algorithm for these patterns. An empirical evaluation was performed on the k-clique pattern, applying the proposed heuristic algorithms to search random and scale-free graphs. Results show that the performance of Clique and RClique (the k-clique variants of Pattern and RPattern, respectively), in terms of exploration cost, is equal to and often much better than an adaptation of the state-of-the-art clique search algorithm. The strengths and weaknesses of the different heuristic algorithms are evaluated. We also implemented and evaluated the algorithms in a web crawler application, where the papers accessible via Google Scholar are the vertices of the searched unknown graph. Results show that using Clique, cliques are found more often and with a lower exploration cost compared to KnownDegree and random exploration.

Beyond the value of investigating such a basic problem in the unknown graph setting, finding patterns in an unknown graph has practical applications in real-world domains. For example, finding a set of physical locations forming a clique suggests the existence of a metropolitan area. Another example is a set of scientific papers, where finding a set of papers that reference each other suggests resemblance in content. Therefore, finding such a cluster of referencing papers can be useful in a data mining context (complemented by a textual data mining approach), where the goal is to find a set of scientific papers on a given subject. Section 4.7 describes experimental results of such a web crawler application, where a k-clique of referencing papers is searched for in Google Scholar.

This chapter is organized as follows. First, the problem of finding a pattern in an unknown graph is formally defined (Section 4.1) and related work is surveyed (Section 4.2). Then, a best-first search framework for solving this problem is described (Section 4.3).
Several deterministic heuristic algorithms are given for this best-first search framework (Section 4.4), as well as a heuristic algorithm that can exploit probabilistic knowledge of the searched graph (Section 4.5). Next, we analyze the proposed best-first search framework theoretically (Section 4.6) and compare the described heuristic algorithms experimentally (Section 4.7). The chapter concludes with a discussion and future work (Section 4.8). Most of the research reported in Chapter 2 was published in the conferences AAMAS-10 and SoCS-10. The material presented in this chapter has been submitted to the Journal of Artificial Intelligence Research (JAIR) and is currently under review.

4.1 Problem Definition

Following are several definitions and notations required for formally describing the problem of finding a specific pattern in an unknown graph. Some graph problems are given a graph as input explicitly, in data structures such as an adjacency list or adjacency matrix. All the vertices and edges of the graph are easily accessible by searching the given data structure. We call such problems explicitly known graph problems. In other graph problems, the graph is given as input implicitly, by an initial set of vertices and a set of computational operators. These operators can be applied to a vertex to discover its neighbors. Consequently, the vertices and edges of the graph can be discovered by applying the given operators. We call such problems implicitly known graph problems. Prominent examples of implicitly known graph problems are various combinatorial puzzles, such as the sliding-tile puzzles and Rubik's cube. Similarly, the state graphs of planning problems are also given implicitly. In general, we regard both explicitly known and implicitly known graph problems as known graph problems.

In this chapter a different type of problem is considered, which we call unknown graph problems. As in implicitly known graph problems, in unknown graph problems the vertices and edges of the graph are not given in advance, except for an initial vertex. By contrast, in an unknown graph problem the vertices and edges of the graph cannot be accessed by any computational operator alone. Exploring vertices and edges of the graph requires applying exploration actions, which incur a different cost than CPU cycles. In an unknown graph problem we aim at minimizing this cost.

Let G = (V, E) be the initially unknown graph, and let G_P = (V_P, E_P) be the searched pattern. We use the terms the searched graph and the pattern graph to refer to G and G_P, respectively.
The input to the problem addressed in this chapter is the pattern graph G_P, a single vertex s from the searched graph G, and an exploration action, which is defined next.

Definition [Explore] Explore: V → 2^V is a function that returns all the neighbors of a given vertex in the searched graph.

This exploration model is inspired by the fixed graph model [Kalyanasundaram and Pruhs, 1994]. Each exploration action has a corresponding exploration cost, which depends on the vertex:

Definition [Exploration Cost] The function ExpCost: V → R+ returns the cost of exploring a given vertex.
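The Explore and ExpCost definitions can be made concrete with a small sketch. The class name and its accounting are illustrative assumptions, and unit cost is used for simplicity, matching the constant-cost model the chapter adopts.

```python
class UnknownGraph:
    """The searched graph G, whose adjacency structure is hidden from the
    search algorithm: the only way to discover edges is the Explore action."""

    def __init__(self, hidden_adj, start):
        self._adj = hidden_adj   # adjacency dict; never inspected directly by the searcher
        self.start = start       # the single initially known vertex s
        self.total_cost = 0      # accumulated exploration cost

    def explore(self, v):
        """Explore: V -> 2^V, returning all neighbors of v and charging
        ExpCost(v), taken here as the constant 1."""
        self.total_cost += 1
        return set(self._adj[v])
```

Note that the cost is charged per call to `explore`, reflecting the additivity of the cost function over exploration actions.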

The cost function is additive, meaning that the total cost of multiple exploration actions is the sum of the exploration costs of the explored vertices. Now we can define the problem of finding a pattern in an unknown graph:

Definition [Subgraph isomorphism problem in an unknown graph] Given a pattern graph G_P, a vertex s in the searched graph G, and an exploration action Explore() with an associated exploration cost ExpCost(), the goal is to find a subgraph of G that is isomorphic to G_P with minimal exploration cost.

In this chapter we simplify the problem by assuming a constant exploration cost C, i.e., ∀v ∈ V: ExpCost(v) = C for some C. This is motivated by a number of real-world scenarios, such as the following examples: (1) a central controller that can be queried to provide the exploration data, and (2) querying web pages from a single host (this example is further justified in Section 4.7). For simplicity and without loss of generality, we assume in the rest of the chapter that C = 1. This focuses the problem on minimizing the number of vertices that are explored before the desired pattern is found in the searched graph. Also, for clarity of presentation, we assume that the searched graph is connected and undirected. Extending the results in this chapter to directed graphs is straightforward. In Section 4.3 we briefly discuss the case of an unknown graph with more than a single connected component.

4.2 Related Work

Several problems have already been researched in the context of an unknown domain that can be represented as a graph. A prominent example is the exploration problem, where the goal is to visit all the locations (vertices) in an unknown environment. Much work has been done on exploration [Gkasieniec et al., 2008; Kim, 2006; Christian A. Duncan and Kumar, 2006; Fleischer and Trippen, 2005; Panaite and Pelc, 1998; Awerbuch et al., 1995; Kalyanasundaram and Pruhs, 1994] for various types of graphs and agents.
The key difference between our work and exploration is that while in an exploration task the goal is to explore all the vertices, in our work we try to avoid exploring the entire graph in order to minimize the exploration cost. Indeed, as the results in Section 4.7 show, the desired pattern can often be found without exploring large parts of the unknown graph.

4.2.1 Pathfinding

Another related topic is pathfinding in an unknown environment, where the goal is to find a path between two locations. This challenge has been researched extensively in the fields of robotics and artificial intelligence. Navigation is a special variant of pathfinding in an unknown environment, in which exploring a vertex requires moving to it, and the goal is to minimize the movements until a path is found. There are many navigation algorithms for numerous variations of the navigation problem [Cucka et al., 1996; Korf, 1990; Gabriely and Rimon, 2005; Bruce and Veloso, 2003]. Examples of navigation variants include performing a navigation task between two vertices repeatedly, with a single agent [Argamon-Engelson et al., 1999] and with multiple agents and various communication paradigms [Meshulam et al., 2005]. Another work studied the problem of distributed navigation of multiple agents [Gilboa et al., 2006].

Pathfinding problems are often regarded in the context of real-time search. In real-time search the goal is to find an efficient path to a goal, but the amount of runtime allowed until a movement has to be performed is limited. As a result, navigation planning and execution are interleaved. There are many real-time search algorithms, such as Real-Time A* and Learning Real-Time A* [Korf, 1990], Prioritized LRTA* [Rayner et al., 2007], Time-Bounded A* [Björnsson et al., 2009] and more. In the unknown graph context we do not assume any real-time constraints on the runtime between exploration actions.

Many navigation algorithms try to find any path to a goal location. Roadmap-A* [Shmoulian and Rimon, 1998] is a pathfinding algorithm that does impose constraints on the length of the resulting path, while Physical A* [Felner et al., 2004c; 2002] finds the shortest path in an unknown physical graph.
Notice that Physical A* and Roadmap-A*, as well as all the navigation algorithms described above, are designed for a physical unknown graph, i.e., one where the cost of exploring a vertex is the distance from the last vertex explored (as a physical entity needs to move from the last explored vertex to the new one). In this work we focus on a different problem (finding a pattern in a graph) with a different exploration cost model (constant exploration cost). While navigation and pathfinding problems are very important, the goal of the work presented in this chapter is to find a specific pattern of vertices and edges, and not a path to a single vertex.

4.2.2 Searching for a Pattern in a Graph

Finding patterns in a graph, also known as subgraph isomorphism, is a well-known NP-complete problem [Garey and Johnson, 1979]. Ullmann presented the classical backtracking-and-pruning algorithm [Ullmann, 1976]. Much work has followed that improves this algorithm with better vertex ordering [Shang et al., 2008] or by partitioning the graph into pieces and applying dynamic programming [Valiente and Martínez, 1997]. For the special case of planar graphs it is even possible to find patterns in linear time [Eppstein, 1995].

An important and well-studied special case of searching for a pattern in a graph is the problem of finding a clique in a graph. This problem is NP-complete as well [Garey and Johnson, 1979]. Bron and Kerbosch presented the classical algorithm for finding all the maximal cliques in a graph [Bron and Kerbosch, 1973]. Many algorithms exist for finding the largest clique in a graph [Östergård, 2002; Tomita and Kameda, 2007; Pardalos et al., 1997]. However, most clique algorithms are designed for explicitly known graph problems and exploit a priori knowledge of the graph. For example, a common heuristic when searching for a clique is to start the search with the vertex that has the highest degree, or to prune vertices that have a low degree. In an unknown graph, pruning all the vertices with such a property (a low degree) requires exploring the entire graph, which defeats the purpose if we wish to minimize the exploration cost. The state-of-the-art algorithms for finding the largest clique in a known graph are based on local search [Battiti and Protasi, 1997; 2001; Pullan and Hoos, 2006; Battiti and Mascia, 2009]. In Section we describe these in more detail. Furthermore, we propose how to adapt them to the unknown graph setting and discuss the limitations of these local search algorithms.
A closely related work is the preliminary work on searching for k-cliques in physical unknown graphs with a swarm of agents [Altshuler et al., 2005]. In that work, each agent was directed to explore the closest largest clique in the known graph. First, this chapter goes beyond the specific clique pattern and addresses the more general problem of searching for any specific pattern. Second, there is a key difference between the physical unknown graph setting and the setting described in this chapter. We address unknown graphs that are not necessarily physical, which means that a vertex is not necessarily a physical location. Therefore, the requirement that an agent be physically located at a vertex v in order to explore it is dropped. Thus the two-level approach that is used for physical unknown graphs is not appropriate, as the lower level is redundant. Third, the focus of that work was mainly on task allocation for multiple agents. In this chapter we consider a single agent, and provide a theoretical analysis of our problem in addition to experimental results.

4.3 Best-First Search in an Unknown Graph

According to our problem definition (Definition 4.1), the task in an unknown graph problem is to minimize the exploration cost and not the computational effort. Therefore, it is worthwhile to store all the parts of the searched graph that have been returned by exploration actions. Let V_known be a set containing the initial vertex s and all of the vertices that have been returned by an exploration action.

Definition [Known Subgraph] G_known = (V_known, E_known) is the subgraph of the searched graph that contains all the vertices in V_known and the edges between them.

In this chapter, G_known is referred to as the known subgraph of G, or simply the known subgraph. For ease of notation, we borrow the terms expand and generate from classical search terminology in the following way. Applying an exploration action (Definition 4.1) to vertex v will be referred to as expanding v and generating the neighbors of v. During the search, we denote by V_exp ⊆ V_known the set of all the vertices that have already been expanded, and by V_gen = V_known \ V_exp the set of all the vertices that have been generated but not expanded. Notice that new vertices and edges are added to G_known only when expanding vertices from V_gen, as all the other vertices in G_known have already been expanded. A typical vertex goes through the following stages. First it is unknown. Then, when it is generated, it is added to G_known. Finally, when it is expanded, it is moved to V_exp and its incident edges and neighboring vertices are also added to G_known.

Procedure Explore
Input: v ∈ V_gen, the vertex to explore
1: V_exp ← V_exp ∪ {v}
2: V_gen ← (V_gen \ {v}) ∪ (neighbors(v) \ V_exp)
3: V_known ← V_known ∪ neighbors(v)
4: E_known ← E_known ∪ {e ∈ E | e = (v, u), u ∈ neighbors(v)}
Procedure Explore() shows how the sets described above (V_exp, V_gen, V_known and E_known) are updated when a vertex v from V_gen is expanded. Vertex v is inserted into V_exp (line 1). All neighbors of v that have not been

expanded yet are added to V_gen, while v itself is removed from V_gen (line 2). Finally, G_known is updated with the vertices and edges that are connected to v (lines 3-4).

Figure 4.1: Example of exploring an unknown graph.

As an example of the exploration process, consider the graphs displayed in Figure 4.1. Initially, only vertex S is known. Then, S is explored. The known subgraph G_known after exploring vertex S is shown in the middle of Figure 4.1: all the neighbors of S are added to V_gen and S is added to V_exp. Then, A is explored, and the corresponding known subgraph is shown on the right of Figure 4.1. When vertex A is explored, vertex D is discovered and added to V_gen. Since vertex A has just been explored, it is moved from V_gen to V_exp. The edge between A and C and the edge between A and B are also added to G_known.

Searching for a pattern in an unknown graph is inherently an iterative process, in which the graph is explored vertex by vertex. Since the goal is to minimize the exploration cost and not the computational effort, it is worthwhile to trade computational effort for saving unnecessary explorations. Therefore, we propose to search an unknown graph in a best-first search manner, as listed in Algorithm 2. First, the algorithm checks whether there is a subgraph of the known subgraph (i.e., G_known) that is isomorphic to G_P (test() in line 3). If not, it chooses the next vertex to explore (line 4) and explores it (line 5). This process is repeated until the desired pattern is found or the entire graph has been explored (in case the desired pattern does not exist in G). In regular best-first search terms, V_gen is the open list, V_exp is the closed list and the test() action is the goal test.

Algorithm 2: Best-first search in an unknown graph
Input: s, the initial known vertex
Input: G_P, the desired pattern
1: V_gen ← {s}
2: V_exp ← ∅
3: while (V_gen ≠ ∅) and (test(G_P, G_known) = False) do
4:   v ← chooseNext(V_gen)
5:   Explore(v)
6: end while
7: if test(G_P, G_known) = True then
8:   return True
9: else
10:  return False
11: end if

Recall that the problem addressed in this chapter is defined for the case where only a single vertex s is initially known, and the number of vertices in the searched graph is not known (Definition 4.1). Under this setting, if the searched graph is composed of a single connected component, Algorithm 2 is complete. This is because G_known is initialized with s and in every iteration a vertex is expanded. Hence, eventually all the vertices in the same connected component as s will be expanded, finding the searched pattern or verifying that such a pattern does not exist in the searched graph. However, Algorithm 2 can easily be extended to a partially known graph, by simply initializing G_known with the set of initially known vertices. For example, if all the vertices in the unknown graph are known but the edges are not, then G_known is initialized as G_known = (V, {}) (where V is the set of all the vertices in the searched graph). In such a case, every vertex in the graph can be chosen for exploration. If the graph is composed of more than a single connected component, then it is easy to see that G_known must be initialized with at least one member of each connected component of G.

4.3.1 Computational Complexity

The goal in this chapter is to minimize the number of exploration actions until the searched pattern is found. However, we also analyze the computational complexity of the algorithms presented in this chapter. The computational complexity of an iteration of Algorithm 2 is composed of the computational complexity of:

1. Checking whether the desired pattern is isomorphic to a subgraph of G_known (test() in line 3).

2. Choosing the next vertex to explore (chooseNext() in line 4).

3. Performing the exploration action and updating the known subgraph (Explore() in line 5).

First, consider the computational complexity of the test() action (line 3). Searching for a pattern in a known graph can be performed with a backtracking algorithm [Ullmann, 1976] that is polynomial in the number of vertices in the graph, where the degree of the polynomial is, in the worst case, the size of the pattern graph. Taking the k-clique pattern as an example, the computational complexity of searching for a k-clique is O(|V_known|^k) in the worst case.¹ This is a worst-case analysis; there are many heuristic algorithms that are much more effective in practice [Shang et al., 2008], as well as more efficient algorithms for special types of graphs [Eppstein, 1995]. In addition, this search can be done incrementally, searching in every iteration only the subgraph of G_known that contains the newly added vertices and their neighbors. Notice also that this search is performed only on the known subgraph (G_known), and thus it incurs no exploration cost. Although the focus of this work is on minimizing the exploration cost and not the computational cost, in all our experimental settings (Section 4.7) we found that the computational effort of the test() actions was less than a second for all the k values we tested.

Next, consider the chooseNext() action (line 4). This is the main challenge when searching in an unknown graph, as it is the only part of the best-first search that affects the exploration cost. In the following sections we propose several heuristic algorithms for choosing the next vertex to explore. These heuristic algorithms have different computational complexities, ranging from O(1) heuristics to more computationally intensive heuristic algorithms (e.g., Sections and 4.5.2).
Finally, the computational complexity of updating the known subgraph, displayed in Procedure Explore(), is linear in the number of neighbors of the explored vertex: simply update G_known (and supporting data structures) with the newly added vertices and edges.

4.4 Deterministic Heuristics

Next we describe several heuristic algorithms for choosing the next vertex to explore and discuss their analytical properties.

4.4.1 KnownDegree

Consider again the example of finding a k-clique in a graph. A very common and effective heuristic used for the k-clique problem in known

graphs is to search first vertices with a high degree [Tomita and Kameda, 2007]. Vertices with a high degree are more likely to be part of a k-clique than vertices with a lower degree [Bollobás and Erdős, 1976]. This is also true for any pattern: vertices with a high degree are more likely to be part of any specific pattern. This is because the problem we address in this chapter is to find a subgraph of G that is isomorphic to the pattern graph (see Definition 4.1 above), and not an induced subgraph. A graph G' = (V', E') is a subgraph of a graph G = (V, E) if all the vertices and edges in G' exist in G, i.e., V' ⊆ V and E' ⊆ E. Thus, if a vertex in the searched graph has more edges than its corresponding vertex in the pattern graph, it can still be matched to it.[2] For example, consider the pattern G_P and graph G displayed in Figure 4.2. G contains a subgraph that is isomorphic to G_P (marked by red circles), but G does not contain an induced subgraph that is isomorphic to G_P.

[Figure 4.2: Example of a subgraph that is not an induced subgraph.]

Since the real degree of a vertex v ∈ V_gen is not known, as it has not been expanded yet, we consider its known degree, which is the number of expanded vertices that are adjacent to v, i.e., v was seen when these vertices were expanded. We denote by KnownDegree the algorithm that chooses to expand the vertex with the highest known degree in V_gen. For example, consider the graph in Figure 4.3. Throughout this chapter, we mark expanded vertices in gray and generated vertices in white. The generated vertex G has a known degree of 4 (it was seen from vertices A, B, C, and D when they were expanded), vertices H, I, and J have a known degree of 3, and vertices K, L, M, and N have a known degree of 2. Hence, KnownDegree will choose to expand vertex G.

[2] By contrast, G' is an induced subgraph of G if all the vertices and edges in G' exist in G, and all the edges in G between the vertices in V' exist in E'.
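As a concrete illustration, the exploration loop of Algorithm 2 with KnownDegree as choosenext() can be sketched in Python. This is a minimal sketch under our own conventions, not code from the thesis: the hidden graph is an adjacency dict that the code reads only inside the explore step, and `test` is a caller-supplied check of the known subgraph.

```python
def search_known_degree(adjacency, s, test):
    """Best-first search in an unknown graph (Algorithm 2 sketch).

    adjacency: the hidden graph; only read when a vertex is explored.
    test(known_edges): True when the known subgraph contains the pattern.
    Returns (pattern_found, exploration_cost).
    """
    known_degree = {s: 0}      # known degree of each generated vertex
    v_exp = set()              # expanded vertices
    v_gen = {s}                # generated, not yet expanded
    known_edges = set()
    cost = 0
    while v_gen and not test(known_edges):
        # KnownDegree: expand the generated vertex with highest known degree
        v = max(v_gen, key=lambda u: known_degree[u])
        v_gen.remove(v)
        v_exp.add(v)
        cost += 1                              # one exploration action
        for u in adjacency[v]:                 # explore(v): reveal neighbors
            known_edges.add(frozenset((v, u)))
            if u not in v_exp and u not in v_gen:
                v_gen.add(u)
                known_degree[u] = 0
            if u in v_gen:
                known_degree[u] += 1
    return test(known_edges), cost
```

For simplicity this sketch scans V_gen for the maximum; the O(D log |V|) bound discussed in the text instead assumes a priority queue keyed by known degree.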

[Figure 4.3: Example of KnownDegree.]

In terms of computational complexity, it is possible to implement KnownDegree with only O(D log |V|) overhead in each iteration, where D is the maximum degree of a vertex in G. This can be done by storing all the vertices in V_gen in a priority queue ordered by the known degree of the vertices. In every iteration at most D vertices have their known degree updated, requiring O(D log |V_gen|) operations to maintain the priority queue, and |V_gen| is bounded by |V|.

An obvious shortcoming of KnownDegree is that it ignores the actual pattern that is searched. Next we propose a more sophisticated heuristic algorithm that considers the structure of the searched pattern.

4.4.2 Pattern

The next heuristic algorithm that we present is called Pattern. Pattern estimates the number of exploration actions required until the searched pattern is found. It then uses this estimation to choose the next vertex to expand. We first define the concept of extending the known subgraph.

Definition [Extension of the Known Subgraph] A graph G' is an m-extension of G_known if it is possible that after expanding m vertices, G_known will be equal to G'. We say that a graph G' is an extension of G_known if there exists a number m such that G' is an m-extension of G_known.

Next, we define the set of all m-extensions that contain a subgraph that is isomorphic to the searched pattern G_P.

Definition [Matching Extensions] The set of all m-extensions of G_known that include a subgraph that is isomorphic to the searched pattern G_P is referred to as the set of matching m-extensions and denoted by ME_m.

[Figure 4.4: Example of a matching extension and the Pattern heuristic algorithm. Panels: the pattern G_P, the known graph G_known, and G' (a matching 1-extension).]

For a given vertex v, we refer to the subset of matching m-extensions in which v is part of the subgraph that is isomorphic to G_P as the matching m-extensions of vertex v, and denote it by ME_m[v]. For every vertex v, the minimal m such that ME_m[v] ≠ ∅ is referred to as the pattern distance of v.

As an example, consider the graphs displayed in Figure 4.4. The left graph is the searched pattern G_P, and the graph in the middle is the known subgraph G_known. The rightmost graph in Figure 4.4, denoted by G', is a possible 1-extension. G' is a 1-extension because it might be discovered after one exploration action: if vertex C is expanded next and vertex X is generated. In addition, G' is a matching 1-extension, since it contains a subgraph that is isomorphic to the searched pattern, namely the subgraph with the vertices {S, A, B, C, X}. Thus, in this case the pattern distance of vertex C is one. Using the notations described above, we have that G' ∈ ME_1 and G' ∈ ME_1[C].

Informally, the Pattern heuristic algorithm presented next chooses to expand the vertex that is most likely to be part of the searched pattern, among all the vertices that are closest to the searched pattern. Closeness of a vertex v is measured by the pattern distance of v, and likeliness is measured by the number of matching m-extensions of v. Algorithm 3 describes Pattern in detail. m (the pattern distance) is initialized to one. If no vertex has any matching m-extension, m is incremented. Otherwise, Pattern returns the vertex with the largest number of matching m-extensions (line 2). Note that after at most |G_P| iterations a vertex must be returned, since after expanding |G_P| vertices there is an extension where new vertices (i.e., vertices that were not in G_known) form the desired pattern.
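Pattern's selection rule can be sketched as follows, with the counting of |ME_m[v]| abstracted behind a caller-supplied oracle (`count_matching_extensions` is our hypothetical name, since counting the extensions exactly is intractable in general):

```python
def pattern_choose_next(v_gen, pattern_size, count_matching_extensions):
    """Sketch of the Pattern selection rule: lexicographic choice by
    (pattern distance, number of matching m-extensions).

    count_matching_extensions(v, m): an abstract oracle returning
    |ME_m[v]|, the number of matching m-extensions of vertex v.
    Returns (chosen vertex, its pattern distance), or (None, None).
    """
    for m in range(1, pattern_size + 1):
        best = max(v_gen, key=lambda v: count_matching_extensions(v, m))
        if count_matching_extensions(best, m) > 0:
            return best, m        # m is the pattern distance of best
    return None, None
```

With counts as in the Figure 4.4 discussion (C having more matching 1-extensions than D or E), the sketch picks C at pattern distance 1.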
Pattern can be viewed as choosing the vertices according to a lexicographic ordering: first according to the pattern distance, and then according to |ME_m[v]|, where m is the minimal pattern distance.

Algorithm 3: Pattern
  Input: G_P, the searched pattern
  Input: G_known, the known subgraph
  Input: V_gen, the list of vertices that can be expanded next
  Output: the next vertex to expand
  1: for m = 1 to |G_P| do
  2:    v_best ← argmax_{v ∈ V_gen} |ME_m[v]|; if |ME_m[v_best]| > 0 then return v_best
  3: end for

A key challenge when implementing Pattern is how to generate and count the matching m-extensions. In general, this is intractable, as the number of matching m-extensions can be infinite, since we do not know the number of vertices in the searched graph. Nonetheless, it is sometimes possible to choose the vertex with the largest number of extensions without explicitly counting the matching m-extensions. As an example, consider again the pattern and the known subgraph displayed in Figure 4.4. In the next iteration, either vertex D, C, or E will be chosen for expansion. It is easy to see that every matching 1-extension of vertex D, i.e., every member of ME_1[D], requires that an edge be found between vertices D and C, resulting in the subgraph {S, A, B, C, D} matching G_P. Therefore, the extensions in the matching 1-extensions of D also exist in the matching 1-extensions of C. Formally, this means that ME_1[D] ⊆ ME_1[C]. Similarly, ME_1[E] ⊆ ME_1[C]. Furthermore, ME_1[C] contains extensions that are not in ME_1[E] or in ME_1[D]. For example, the graph G' in Figure 4.4 is such a matching 1-extension. Consequently, C has the largest set of matching 1-extensions with respect to D and E, and C will be chosen by Pattern.

Pattern has the following property.

Theorem The pattern distance of the vertex that is returned by Pattern is a tight lower bound on the number of expansions required until the searched pattern is found. This lower bound is tight in the sense that no larger lower bound exists.

Proof: Assume by negation that it is possible to reach the searched pattern after C expansions, while the vertex v returned by Pattern has a pattern distance C' > C.
If the searched pattern can be reached after C expansions, then there exists a C-extension of G_known, denoted by G', that contains a subgraph that is isomorphic to the searched pattern G_P. Therefore, there exists a vertex v' ∈ V_gen that has a matching C-extension. This contradicts the fact that v has been returned by Pattern, since by the definition of Pattern there is no vertex in V_gen with a matching m-extension for any m < C'.

[Figure 4.5: Example of potential patterns.]

Pattern is clearly more computationally intensive than KnownDegree, and is not tractable in its general form. However, it can be implemented more efficiently for specific patterns. Next, we demonstrate such an implementation for the k-clique pattern and for the pattern of a complete bipartite graph.

The k-clique Pattern

In order to implement Pattern for the k-clique, we introduce the concept of a potential k-clique and supporting terms.

Definition [Potential k-clique] A set of vertices PC_k ⊆ V_exp is a potential k-clique if there exists an extension of G_known that contains a k-clique C_k such that PC_k = C_k ∩ V_exp.

As an example, consider the graph in Figure 4.5. The set {S, F, G} is a potential 5-clique, since if vertices Y and Z are expanded they might turn out to be connected to each other, in which case the set of vertices {S, F, G, Y, Z} would form a 5-clique. By contrast, the set {S, H, I} is not a potential 5-clique, since vertices H and I will never have an edge between them: they do not have an edge in the known subgraph, and they have already been expanded.

For ease of notation, we define the gcn function, which can be applied to any set of expanded vertices.

Definition [Generated Common Neighbors] Let V' be a set of expanded vertices. Then

  gcn(V') = (⋂_{v ∈ V'} neighbors(v)) ∩ V_gen

For example, gcn({S, F, G}) = {Y, Z} for the graph in Figure 4.5.

The relation between a potential k-clique and finding a matching m-extension for the k-clique pattern is as follows. A generated vertex v has a matching m-extension if there is a potential k-clique PC_k such that m + |PC_k| + 1 ≥ k and v is connected to all the vertices in PC_k (i.e., v ∈ gcn(PC_k)). Conversely, it is possible to check if a vertex v has a matching m-extension for the k-clique pattern by checking if there exists a potential k-clique PC_k such that v ∈ gcn(PC_k) and m ≥ k − |PC_k| − 1.

The following corollary presents easy-to-compute conditions for checking if a set of vertices is a potential k-clique.

Corollary A set of vertices PC_k ⊆ V_exp, where |PC_k| < k, is a potential k-clique if and only if
1. PC_k is a clique.
2. |gcn(PC_k)| + |PC_k| ≥ k.

Proof: (⇐) By definition, all the vertices in gcn(PC_k) have been generated but have not been expanded yet. Therefore, there exists an extension G' where the vertices in gcn(PC_k) form a clique. Hence, in G' the vertices in gcn(PC_k) ∪ PC_k also form a clique, since all the vertices in gcn(PC_k) are neighbors of all the vertices in PC_k. Since |gcn(PC_k)| + |PC_k| ≥ k, we have that G' contains a k-clique within gcn(PC_k) ∪ PC_k, and thus PC_k is a potential k-clique as required.

(⇒) Assume that a set of vertices PC_k is a potential k-clique. By definition, this means that there exists a k-clique C_k in an extension of G_known such that PC_k ⊆ C_k. Every subset of the vertices of a clique also forms a (smaller) clique; thus PC_k must also be a clique. Furthermore, every vertex in C_k \ PC_k must be connected to every vertex in PC_k (or else C_k would not be a clique). Thus PC_k must also have at least k − |PC_k| generated common neighbors, as required.

Recall that the Pattern heuristic algorithm chooses to expand the vertex with the lowest pattern distance.
Implementing Pattern for the k-clique pattern can therefore be done by choosing to expand a vertex that is in the gcn of the largest potential k-clique. This vertex is the vertex with the minimal pattern distance in V_gen. For clarity, and to conform with previous publications, we denote by Clique this implementation of Pattern. In Pattern, tie breaking among vertices with the same (minimal) pattern distance is performed according to the number of matching m-extensions. For efficiency reasons, we implemented random tie breaking in Clique.
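The corollary's two conditions, together with the gcn function, translate directly into code over the known subgraph. A minimal sketch (function names are ours; `known_adj` maps each vertex to its set of known neighbors):

```python
def gcn(vertices, known_adj, v_gen):
    """Generated common neighbors: the generated vertices adjacent to
    every vertex in the given set of expanded vertices."""
    common = set.intersection(*(known_adj[v] for v in vertices))
    return common & v_gen

def is_potential_k_clique(pc, k, known_adj, v_gen):
    """Corollary check: pc (a set of expanded vertices, |pc| < k) is a
    potential k-clique iff (1) it is a clique in the known subgraph and
    (2) |gcn(pc)| + |pc| >= k."""
    is_clique = all(u in known_adj[v] for v in pc for u in pc if u != v)
    return is_clique and len(gcn(pc, known_adj, v_gen)) + len(pc) >= k
```

On the Figure 4.5 example, {S, F, G} passes the check for k = 5 (Y and Z are generated common neighbors), while {S, H, I} fails condition 1.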

Procedure IncrementalUpdate
  Input: v, the vertex that was just expanded
  1: foreach PC in PC_k do
  2:    if v is a neighbor of all the vertices in PC then
  3:       if |gcn(PC ∪ {v})| ≥ k − |PC ∪ {v}| then add PC ∪ {v} to PC_k
  4:       if |gcn(PC)| < k − |PC| then remove PC from PC_k
  5: end foreach
  6: if |gcn({v})| ≥ k − 1 then add {v} to PC_k

There are several ways to implement Clique, with different computational complexities. We give details on our own implementation, which was used in all our experiments (described in Section 4.7). Clique was implemented by maintaining a global list of potential k-cliques, denoted by PC_k. Every set of expanded vertices that forms a potential k-clique is stored in PC_k. Note that a vertex may be part of more than one potential k-clique. When a vertex is expanded, PC_k is updated incrementally, as follows. In the beginning of the search, the initial vertex s is expanded. If s has at least k − 1 neighbors, then {s} is a potential k-clique, and PC_k is set to be {{s}}. Otherwise, PC_k is initialized as an empty set. Subsequently, after a vertex v is expanded, PC_k is updated according to the pseudocode listed in Procedure IncrementalUpdate(), described next.

After a vertex v is expanded, every potential k-clique PC is examined. If v is not a neighbor of all the vertices in PC, then PC remains unchanged. Otherwise, there are two (not mutually exclusive) options:
1. PC ∪ {v} is a new potential k-clique that should be added to PC_k (line 3).
2. PC is no longer a potential k-clique, and should be removed from PC_k (line 4).
Checking these two options can be done easily according to the corollary above. The first option is checked as follows. Since v is a neighbor of all the vertices in PC, clearly PC ∪ {v} is a clique. Thus, if |gcn(PC ∪ {v})| ≥ k − |PC ∪ {v}|, then PC ∪ {v} is a new potential k-clique and should be added to PC_k (line 3 in Procedure IncrementalUpdate()). For the second option, PC was a clique before v was expanded, and thus PC is still a clique.
However, gcn(p C) has decreased by 1 since v has now been expanded. Thus, P C will be removed from PC k if gcn(p C) < k P C (line 4). Finally, v itself might start a new potential k-clique without any other expanded vertex. Thus if gcn({v} k 1 then {v} is a new potential k-clique (line 6). 62

[Figure 4.6: An example of the incremental update of the set of potential k-cliques.]

As an example of Procedure IncrementalUpdate(), consider the graphs in Figure 4.6 and assume that the size of the searched clique is 3. Initially, the known subgraph is the graph displayed in Figure 4.6(a). At this stage PC_3 = {{S}}, i.e., there is only a single potential 3-clique, consisting of vertex S. {S} is a potential 3-clique since |gcn({S})| = |{A, B}| = 2 ≥ k − |{S}| = 2. Next, vertex A is expanded, resulting in the known subgraph becoming the graph in Figure 4.6(b). Adding A to the potential 3-clique {S} does not create a new potential 3-clique, since gcn({S, A}) = ∅. Furthermore, after expanding A the set {S} is no longer a potential 3-clique, since now |gcn({S})| = |{B}| = 1. However, {A} is a new potential 3-clique, since |gcn({A})| = |{C, D}| = 2. Thus, after expanding vertex A we have PC_3 = {{A}}. Next, assume that vertex C is expanded, resulting in the graph in Figure 4.6(c). Now, the set {A, C} is a potential 3-clique, since C and A form a 2-clique and |gcn({C, A})| = |{D}| = 1 ≥ k − |{C, A}| = 1. In fact, at this stage we can see that {A, C, D} is a 3-clique and the search can halt.

The total computational complexity of the implementation of Clique described above is composed of two parts: (1) updating PC_k according to Procedure IncrementalUpdate(), and (2) returning a vertex that is in the gcn of the largest member of PC_k. The first step (Procedure IncrementalUpdate()) requires O(|PC_k| · |neighbors(v)|). The second step can be easily integrated into Procedure IncrementalUpdate(), by storing the largest potential k-clique and returning one of the vertices in its gcn. The computational complexity of this implementation of Clique is therefore O(|PC_k| · |neighbors(v)|).

Complete Bipartite Graph

Next, we demonstrate a method for applying Pattern to the pattern of a complete bipartite graph.
A bipartite graph is a graph whose vertex set can be partitioned into two subsets X and Y, so that each edge has one

end in X and one end in Y. Such a partition is called a bipartition of the graph. A complete bipartite graph is a bipartite graph with bipartition (X, Y) in which each vertex of X is joined to all vertices of Y. If |X| = p and |Y| = q, such a graph is denoted by K_{p,q} [Bondy and Murty, 1976]. Figure 4.7 shows K_{3,2} and K_{2,5} [Weisstein, 2011].

[Figure 4.7: Examples of complete bipartite graphs, K_{3,2} and K_{2,5}.]

Similar to the definition of a potential k-clique, we can define a potential K_{p,q} as follows.

Definition [Potential K_{p,q}] A pair of sets of vertices V_p, V_q ⊆ V_exp is a potential K_{p,q} if there exists an extension of G_known that contains a pair of sets of vertices C_p and C_q such that C_p ∪ C_q forms a K_{p,q} and:

  V_p = C_p ∩ V_exp
  V_q = C_q ∩ V_exp

As an example, consider again the graph displayed in Figure 4.5. While the vertices {S, F, G} are a potential 5-clique, there is no partition of {S, F, G} into two sets that is a potential K_{2,4}. By contrast, the pair ({S}, {H, I}) is a potential K_{2,4}, since if vertex W is connected to vertices V and T, then the two sets {S, W} and {H, I, V, T} will form a complete bipartite graph K_{2,4}.

The relation between a potential K_{p,q} and finding a matching m-extension is similar to the relation between a matching m-extension and a potential k-clique when searching for a k-clique pattern. A vertex v has a matching m-extension if there is a potential K_{p,q} with bipartition (V_p, V_q) such that m + |V_p| + |V_q| + 1 ≥ p + q and either v is adjacent to all the vertices in V_p or v is adjacent to all the vertices in V_q (i.e., v ∈ gcn(V_p) or v ∈ gcn(V_q)). The following corollary provides easy-to-compute conditions for checking if a pair of sets of vertices is a potential K_{p,q}.

Corollary A pair of sets of vertices V_p, V_q ⊆ V_exp, where |V_p| ≤ p and |V_q| ≤ q, is a potential K_{p,q} if and only if
1. V_p ∪ V_q forms a K_{|V_p|,|V_q|}.
2. |gcn(V_p)| + |V_q| ≥ q.
3. |gcn(V_q)| + |V_p| ≥ p.

The proof of this corollary is a straightforward extension of the proof of the corollary for potential k-cliques. Note that since the problem we address is to find an isomorphic subgraph and not an induced subgraph, two sets of vertices V_p, V_q may be a potential K_{p,q} even if there is an edge between vertices in V_p or an edge between vertices in V_q.

Using this corollary to implement Pattern for the complete bipartite graph pattern K_{p,q} can be done as explained for the k-clique pattern. The purpose of the above discussion on the complete bipartite graph pattern is to demonstrate another pattern for which the concepts described for the k-clique pattern can be applied. In this chapter we present experimental results (Section 4.7) only for the k-clique pattern.

4.5 Probabilistic Heuristic

The heuristic algorithms described in Section 4.4 assumed that nothing is known about the searched graph besides an initial vertex. However, there are many domains where the exact searched graph is unknown but some knowledge of the probability of the existence of edges is available. Formally, for every two vertices u and v, assume that a function P̂r(u, v) is available that estimates the probability of having an edge between u and v. For example, if the searched graph is the World Wide Web, then it is well known that it behaves like a scale-free graph [Bonato and Laurier, 2004], which is a graph where the degree distribution of its vertices follows a power law (i.e., the probability of a vertex having degree x is proportional to x^(−β) for some exponent β). Furthermore, it is possible to classify web pages according to their URL [Kan, 2004; Kan and Thi, 2005]. Another example, common in robotics, is a navigating robot in an environment represented by a graph.
The robot may have an imperfect view of the environment, wrongly identifying routes as passable with some probability [Thrun et al., 2005]. Such probabilistic knowledge should affect the choice of which vertex to expand next. In this section we present heuristics for such graphs.

4.5.1 MDP Approach

By adding probabilistic knowledge, the problem of searching for a pattern in an unknown graph can be modeled as a finite-horizon Markov Decision Problem (MDP) [Puterman, 1994] as follows. The states are all possible pairs (G_known, V_gen). The actions are the exploration actions applied to each vertex in V_gen. The transition function between two states (old and new) is determined by the existence probabilities of the edges that were added to G_known in the new state. Finally, the reward of every action is the negation of the exploration cost of the expanded vertex (in our case, −1). A policy is an algorithm for choosing which vertex to expand next in every iteration of Algorithm 2. An optimal policy could theoretically be computed, such that the expected cost would be minimized.

An alternative formulation of our problem is as a Deterministic Partially Observable MDP (DET-POMDP) [Littman, 1996; Bonet, 2009], where the partially observable state is the entire searched graph along with the list of explored vertices, (G, V_exp), and an observation is the known subgraph G_known. An action corresponds to expanding a single vertex. Note that in a DET-POMDP, unlike in standard POMDPs, the outcome of performing an action from a state is completely deterministic; in our problem this is simply adding the expanded vertex to V_exp. Furthermore, in a DET-POMDP, an observation is a deterministic function of a state and the performed action. Similarly, in our problem, given a graph G and a set of expanded vertices V_exp, the known subgraph (i.e., the observation) is the subgraph of G that contains all the vertices and edges that are connected to at least a single vertex in V_exp.

The main problem in using off-the-shelf MDP or POMDP algorithms is the number of possible states. Initially, only a single vertex is known. Since we do not know the number of vertices in the searched graph, the number of states is infinite.
Thus, it is impossible to explicitly store the entire belief state or enumerate all the states in the state space. If the number of vertices in the searched graph is known, then it is theoretically possible to employ an MDP or POMDP solver [Pineau et al., 2003; Hansen and Zilberstein, 2001; Cassandra et al., 1997; Bellman, 1962] in order to find the optimal policy, i.e., the policy that minimizes the expected exploration cost. Unfortunately, the size of the MDP state space grows exponentially with the number of vertices in the graph, as it contains all possible subgraphs of G. If the number of vertices in the unknown graph is n, then the number of states in the corresponding MDP will be (n − 1)! · 2^C(n,2): (n − 1)! for every possible order of expanding n − 1 vertices, and 2^C(n,2) for all the possible graphs with n

vertices (an undirected graph with n vertices has at most C(n,2) edges). For example, a graph with 10 vertices requires 9! · 2^45 MDP states. This combinatorial explosion prohibits any algorithm that requires enumerating a large part of the state space, as well as algorithms that require an explicit representation of the belief state.

4.5.2 RPattern

In order to still exploit the available probabilistic knowledge, we propose a Monte-Carlo based sampling technique that combines sampling of the MDP state space with Pattern as a default heuristic. We call this heuristic algorithm Randomized Pattern, or RPattern in short. Like the previously presented heuristic algorithms, RPattern is invoked when choosing which vertex to expand next (line 4 in Algorithm 2). The basic idea is to estimate, for every generated vertex v, the future exploration cost until the desired pattern is found, by sampling the possible extensions of G_known in which v was expanded. RPattern then chooses to expand the vertex with the lowest estimated future exploration cost.

Algorithm 4: RPattern
  Input: G_P, the desired pattern
  Input: MaxDepth, the maximum sample depth
  Input: NumOfSampling, the number of samples
  Output: the next vertex to expand
  1: foreach v in V_gen do
  2:    Q[v] ← 0
  3:    loop NumOfSampling times
  4:       G'_known ← G_known
  5:       V'_gen ← V_gen
  6:       d ← 1
  7:       simulatedExplore(v, G'_known, V'_gen)
  8:       while d < MaxDepth and G'_known does not contain G_P do
  9:          v' ← choosenext(V'_gen)
  10:         simulatedExplore(v', G'_known, V'_gen)
  11:         d ← d + 1
  12:      end while
  13:      Q[v] ← Q[v] + d
  14:      if G'_known does not contain G_P then Q[v] ← Q[v] + the pattern distance of v'
  15:   end loop
  16:   Q[v] ← Q[v] / NumOfSampling
  17: end foreach
  18: return argmin_{v ∈ V_gen} Q[v]
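The state-count formula above is easy to sanity-check numerically; a quick sketch for n = 10:

```python
import math

n = 10
# (n-1)! possible orders of expanding the remaining n-1 vertices,
# times 2^C(n,2) possible graphs over n vertices
num_states = math.factorial(n - 1) * 2 ** math.comb(n, 2)
assert math.comb(n, 2) == 45   # C(10,2) = 45, as in the text
print(num_states)              # 9! * 2^45, about 1.28e19
```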

Algorithm 4 presents the pseudocode for RPattern. RPattern requires the following parameters: (1) MaxDepth, the maximum depth of every sample, and (2) NumOfSampling, the number of samples to construct. Every vertex v in V_gen is assigned a value Q[v], initialized to zero (line 2 in Algorithm 4). The value of Q[v] is updated by the sampling process described next, and it is eventually used to estimate the cost of finding the desired pattern given that v is expanded next.

The value of Q[v] is updated as follows. In each sample, G'_known is initialized to G_known (line 4) and V'_gen is initialized to V_gen (line 5). Then, the outcome of expanding v is simulated (line 7) using the available probabilistic knowledge (i.e., the P̂r function described earlier). G'_known and V'_gen are updated according to the outcome of the simulated exploration. Next, a vertex v' is chosen from V'_gen for simulated exploration using the Pattern heuristic (line 9). The outcome of exploring v' is then simulated, again possibly adding new edges from v' to other vertices (line 10), and G'_known and V'_gen are updated accordingly. This process continues until either G'_known contains a subgraph that is isomorphic to G_P or MaxDepth iterations have been performed. If the desired pattern has been found after d simulated exploration actions, then Q[v] is incremented by d (line 13). Otherwise, Q[v] is incremented by MaxDepth plus the pattern distance of the vertex chosen by Pattern (line 14), which is the best lower bound on the remaining exploration cost (Theorem 4.4.1). All this (lines 4-14) is done for a single sample. The average value over NumOfSampling samples is then stored in Q[v] (line 16).[3] This is repeated for all vertices in V_gen, and the vertex with the smallest Q value is chosen for expansion by RPattern.
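The simulated exploration step used in lines 7 and 10 of Algorithm 4 can be sketched as follows, under the assumptions that P̂r is exposed as a function `pr_edge(u, v)` (our name) and that edges are sampled only between already-generated vertices:

```python
import random

def simulated_explore(v, known_edges, v_gen, v_exp, pr_edge, rng):
    """One simulated exploration action (sketch): v moves from the
    generated set to the expanded set, and an edge between v and every
    other generated vertex u is added with probability pr_edge(v, u).
    No new vertices are introduced by the simulation."""
    v_gen.discard(v)
    v_exp.add(v)
    for u in list(v_gen):
        if rng.random() < pr_edge(v, u):
            known_edges.add(frozenset((v, u)))
```

Each call costs O(|V_gen|); RPattern runs up to MaxDepth such calls per sample and averages NumOfSampling samples into Q[v].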
Note that the pattern distance of a vertex v is a lower bound on Q[v], since the pattern distance is a lower bound on the cost of finding the desired pattern given that v is expanded next (Theorem 4.4.1).

An important part of RPattern is how the outcome of an exploration is simulated (lines 7 and 10). This greatly depends on the available probabilistic knowledge of the graph. As described above, in this chapter we assume that the available probabilistic knowledge is the probability of any two generated vertices u, v ∈ V_gen having an edge between them (denoted by P̂r(u, v)). By contrast, we do not assume any knowledge about vertices that have not been generated. We therefore take a myopic approach, considering only future connections between generated vertices; no new vertices are added during a simulated exploration.[4] The simulated exploration of vertex v is therefore performed by adding an edge between v and any other generated vertex u with probability P̂r(v, u). Consequently, the computational complexity of the simulated exploration of vertex v is O(|V_gen|). RPattern performs MaxDepth · NumOfSampling simulated explorations (in the worst case). After every simulated exploration, Pattern is executed to choose on which vertex to perform a simulated exploration next. Thus, the computational complexity of RPattern is MaxDepth · NumOfSampling times O(|V_gen|), plus MaxDepth · NumOfSampling times the computational complexity of Pattern.

In summary, RPattern exploits the available probabilistic knowledge to generate samples that are extensions of G_known (lines 7 and 10), and utilizes Pattern both for choosing the next vertex to explore during the sampling (line 9) and for estimating the exploration cost in case the maximum depth has been reached (line 14). Note that RPattern can be easily implemented as an anytime algorithm: one can always add more samples in order to improve the estimate of the expected exploration cost.[5]

[3] Actually, dividing Q[v] by NumOfSampling is redundant, as this is constant for all the vertices. We leave it for clarity of presentation, to emphasize that Q[v] is designed to estimate the expected exploration cost.
[4] If knowledge is available on the existence of new vertices, it can be easily incorporated into the simulated exploration.

4.6 Theoretical Analysis

Next, we analyze the effect of the heuristic algorithms described in the previous sections for performing the choosenext() action in Algorithm 2 on the exploration cost of searching for a pattern in an unknown graph. For pedagogical reasons we first focus on the k-clique pattern, and then generalize to any pattern.

Algorithms for choosenext() can be viewed as online algorithms. An online algorithm is an algorithm that must process each input in turn, without detailed knowledge of future input [Laplante, 2001]. An offline algorithm for choosenext() would be an algorithm that receives the entire searched graph G as input (and could potentially detect whether a k-clique exists) but is still required to choose vertices for exploration until G_known contains a k-clique.
This is because the search only halts when G_known contains the desired pattern, and choosenext() can only choose a generated vertex for exploration. An optimal offline algorithm for choosenext() is an algorithm that, if used in the best-first search described in Algorithm 2, will result in finding the searched pattern with minimum exploration cost. Using any other algorithm for choosenext() will require expanding at least the same number of vertices as the optimal offline algorithm.

Let d(u, v) be the length of the shortest path between u and v in the searched graph, and denote by d(v, V') the distance between a vertex v

80 (a) (b) s s (c) s 2 s Figure 4.8: Scenarios for best and worst case exploration cost and a set of vertices V, which is defined as d(v, V ) = min u V d(v, u). Theorem [Optimal offline algorithm] Let C be the k-clique in the unknown graph that is closest to s, where closeness is measured by d(s, C). The optimal offline algorithm for choosenext() will choose to expand the vertices that are on the shortest path from s to C, as well as k 1 members of C. Proof: Let C k be the first k-clique that will be found using the optimal offline algorithm. It is easy to see that G known contains a the k- clique C k only after k 1 members of C k have been expanded. Since only vertices from V gen may be expanded, C k cannot be explored until k 1 of its members have been added to V gen. With the exception of the initial vertex s, a vertex is inserted into V gen only when one of its neighbors is expanded. Therefore, the minimal set of vertices that must be expanded until C k is in G known contains the shortest path from s to C k plus k 1 members of C k. Consequently, the minimal exploration cost will be achieved when C k is the k-clique that is closest to s. Theorem [Exploration cost analysis] Given an unknown graph G = (V, E) that contains a k-clique and given an initial vertex s the optimal offline algorithm will expand k 1 vertices in the best case, and V 1 vertices in the worst case. Proof: We first prove the best case result. Consider a graph in which s is a part of a clique of the desired size, denoted by C k. Figure 4.8(a) 70

shows an example of such a graph for k = 5. If only k − 2 vertices were expanded, then there are two vertices v, u ∈ C_k that have not been expanded. Since neither of them has been expanded, the edge between them cannot be in E_known, and thus the desired k-clique has not been found. On the other hand, if k − 1 members of C_k have been expanded, all the edges and vertices of C_k are in G_known and therefore C_k is found.

For the worst case result, consider a graph that is composed of a chain of vertices starting from s and ending with a clique of the desired size, denoted by C_k. An example of such a graph is presented in Figure 4.8(b), where k = 5. C_k is the only k-clique in G and is therefore the closest k-clique to s. Hence, according to Theorem 4.6.1, all |V| − k vertices in the chain must be expanded by the optimal offline algorithm, along with k − 1 members of C_k. This totals |V| − 1 vertices that will be expanded by the optimal offline algorithm until the k-clique is found in G_known (causing the search to halt).

Online algorithms are commonly analyzed using competitive analysis [Borodin and El-Yaniv, 1998]. In competitive analysis we try to calculate (or at least estimate or bound) the competitive ratio: the maximal ratio, over all possible inputs, between the performance of the evaluated algorithm and the performance of the optimal offline algorithm. For the problem of finding a k-clique with a minimum number of exploration actions, this corresponds to the maximal ratio between the exploration cost of an evaluated algorithm and the exploration cost of the optimal offline algorithm, over all possible unknown graphs and initial vertices. Theorem 4.6.3 presents a lower bound on the competitive ratio of any algorithm for this problem.

Theorem 4.6.3 [Competitive ratio of exploration cost] For any k > 2, there is no deterministic online algorithm for choosenext() that finds a k-clique in an unknown graph G = (V, E) with a competitive ratio of less than (|V| − 1)/(k − 1).
Proof: Assume by negation that A is an algorithm with a competitive ratio smaller than (|V| − 1)/(k − 1). Let G = (V, E) be a star-shaped graph without a k-clique, in which all the vertices of the graph are connected only to s. The left graph in Figure 4.8(c) is an example of such a graph. Assuming k > 2, running algorithm A on G starting from s will expand all the vertices in V before concluding that no k-clique exists. Let C_{k−1} be the last k − 1 vertices chosen for exploration by A. Let G' be a graph that is identical to G except for having additional edges between all pairs of vertices in C_{k−1}. Thus, the vertices of C_{k−1} form a k-clique together with the initial vertex. The right graph in Figure 4.8(c) is an example of such a graph for k = 4. G and G' provide exactly the same input to A until

one of the last k − 1 vertices is expanded. Thus, if A is deterministic, it will choose to expand exactly the same |V| − (k − 1) vertices in G' as it chose for G until one of the vertices in C_{k−1} is expanded. Then, A will have to expand at least k − 2 vertices from C_{k−1} (as s has already been expanded) before finally exploring the k-clique, totaling |V| − 1 vertices expanded until the k-clique has been found. On the other hand, the optimal offline algorithm will only expand the k − 1 members of the clique. Thus, the competitive ratio of algorithm A is at least (|V| − 1)/(k − 1), contradicting the assumption that A has a competitive ratio lower than (|V| − 1)/(k − 1).

Generalizing to Any Given Pattern

The above theorems can be generalized to any specific pattern G_P = (V_P, E_P), using the following supporting claims.

Lemma 4.6.4 G_known contains a subgraph C_P that is isomorphic to G_P iff V_exp contains a (not necessarily minimal) vertex cover of C_P.

Proof: (⇐) Let C be a vertex cover of C_P that has been expanded. By the definition of Explore() (Definition 4.1), all the connecting edges and neighboring vertices of the members of C have been inserted into G_known. Since C is a vertex cover of C_P, all of C_P's edges and vertices are in G_known. (⇒) Assume by negation that V_exp does not contain a vertex cover of C_P. Then there exists an edge (u, v) in C_P such that u and v are not in V_exp (i.e., have not been expanded yet). This means that (u, v) cannot be in G_known, contradicting the definition of C_P.

Note that a vertex cover of the k-clique pattern is any k − 1 vertices of the clique. Therefore, as stated in Theorem 4.6.1, even the optimal offline algorithm requires exploring at least k − 1 vertices until a k-clique is found. For ease of notation, we use the term "a vertex cover of G_P in G" to denote a vertex cover of a subgraph of G that is isomorphic to G_P.
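The vertex-cover characterization above is easy to check computationally. A minimal sketch (the function and variable names are ours, not the thesis'):

```python
from itertools import combinations

def pattern_visible(pattern_edges, expanded):
    """An occurrence of the pattern is fully contained in the known subgraph
    iff every one of its edges has at least one expanded endpoint, i.e., the
    expanded set is a vertex cover of that occurrence (Lemma 4.6.4)."""
    return all(u in expanded or v in expanded for (u, v) in pattern_edges)

# A 4-clique on vertices 0..3: any 3 of its vertices form a vertex cover.
clique4 = list(combinations(range(4), 2))
```

Expanding any three vertices of the 4-clique reveals all six of its edges, while expanding only two leaves the edge between the remaining pair unseen.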
Definition 4.6.5 [Smallest connected subgraph with a vertex cover] Let minConnectedVC(s, G, G_P) be a function that returns the smallest connected subgraph of G (in terms of the number of vertices) that contains the vertex s and a vertex cover of G_P in G.

Corollary 4.6.6 extends the result of Theorem 4.6.1 to the general case of searching for any pattern.

Corollary 4.6.6 [Any pattern optimal offline algorithm] The optimal offline algorithm will choose to expand only the vertices in minConnectedVC(s, G, G_P).

Proof: First we prove that there is an algorithm that expands only the vertices in minConnectedVC(s, G, G_P) and finds the desired pattern. Then we prove that no algorithm can find the desired pattern without expanding at least the same number of vertices. Let G' = (V', E') = minConnectedVC(s, G, G_P). Since V' contains a vertex cover of G_P, after expanding the vertices in V' a subgraph of G_known that is isomorphic to G_P has been found (Lemma 4.6.4). Since G' is connected and contains s, all the vertices in G' can be expanded in breadth-first order, starting from s. This expansion order ensures that a vertex is expanded only after it has previously been generated. Therefore, there is an algorithm that finds the desired pattern by choosing to expand only the vertices in G'.

Assume by negation that the optimal offline algorithm finds the desired pattern by expanding a set of vertices V'' such that |V''| < |V'|. Recall that G' is the minimal connected subgraph that contains the initial vertex s and a vertex cover of G_P. Clearly, any algorithm will expand the initial vertex s. Thus, either V'' does not contain a vertex cover of G_P in G, or there is no connected subgraph of G that contains only the vertices in V''. If V'' does not contain a vertex cover of G_P in G, then after expanding only the vertices in V'' the known subgraph does not contain a subgraph isomorphic to G_P (Lemma 4.6.4). This contradicts the fact that the desired pattern has been found after expanding only the vertices in V''. On the other hand, assume that V'' does contain such a vertex cover but there is no connected subgraph of G that contains only the vertices in V''. Then there are vertices in V'' that cannot be expanded without previously expanding a vertex that is not in V''. It is easy to show that a vertex cannot be chosen for expansion if there is no path from s to it that contains only expanded vertices. Thus it is not possible to expand only the vertices in V'', resulting in a contradiction.
Concluding the proof: there is an algorithm that finds the desired pattern by expanding only the vertices in minConnectedVC(s, G, G_P), and no algorithm can find the desired pattern by expanding fewer vertices. Thus it is optimal.

The worst case and best case results for the k-clique pattern shown in Theorem 4.6.2 can be seen as a special case of Corollary 4.6.6. As explained above, a vertex cover of a k-clique is any k − 1 vertices from that clique. When there is a k-clique containing the start vertex, minConnectedVC(s, G, G_P) contains exactly k − 1 vertices. This corresponds to the best case result stated in Theorem 4.6.2. On the other hand, if the searched graph contains only a single k-clique that is |V| − k vertices away from the initial vertex s, then minConnectedVC(s, G, G_P) contains exactly |V| − 1 vertices, corresponding to the worst case result in Theorem 4.6.2.

Consequently, Theorems 4.6.2 and 4.6.3 can be generalized as follows.

Theorem 4.6.7 [Exploration cost analysis, searching for any pattern] Given an unknown graph G = (V, E) that contains a subgraph isomorphic to G_P, the optimal offline algorithm will expand a number of vertices equal to the size of a minimal vertex cover of G_P in the best case, and |V| − (|G_P| − max_{v ∈ G_P} |minConnectedVC(v, G_P, G_P)|) vertices in the worst case.

Proof: The best case scenario occurs when s is part of the minimal vertex cover of G_P in G and has edges to all the other vertices in the minimal vertex cover. The optimal offline algorithm will then expand s and only the vertices in the minimal vertex cover, until the desired pattern is found. Since the desired pattern cannot be found without expanding a vertex cover of G_P (Lemma 4.6.4), this is the best case. The worst case scenario occurs in a graph that is similar to the graph presented in Figure 4.8(b): a chain of vertices starting from s and ending with the desired pattern. The desired pattern can be connected to the chain via any of its vertices. In the worst case graph, the desired pattern is connected to the chain via the vertex that maximizes the exploration cost of the optimal offline algorithm. Thus only |G_P| − max_{v ∈ G_P} |minConnectedVC(v, G_P, G_P)| vertices will not be expanded.

Using a straightforward generalization of the proof of Theorem 4.6.3, it is possible to conclude that no algorithm can achieve a competitive ratio that is better (i.e., lower) than (|G| − |G_P|)/|G_P|. 6 As described in Theorem 4.6.3, for the specific pattern of a k-clique the competitive ratio is at least (|V| − 1)/(k − 1). This is even higher than (|V| − k)/k, because k − 1 is at least one and |V| ≥ k. When the size of the searched graph is much larger than the size of the pattern graph, the lower bound on the competitive ratio of any algorithm is approximately |G|/|G_P|.
Note that an algorithm that chooses which vertex to expand at random also has a competitive ratio of approximately |G|/|G_P|. Hence, one might presume that all algorithms are as effective in finding the searched pattern as random exploration. However, this analysis is based on a worst case scenario. Next, we demonstrate experimentally that the heuristic algorithms presented in Sections 4.4 and 4.5 incur significantly lower exploration cost than random exploration in various settings.

6 An exception to this analysis is the trivial pattern of two vertices connected by an edge, which will always be discovered after a single exploration action.
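The adversarial construction behind this lower bound can be replayed in a few lines. In the sketch below (the function name and the lexicographic exploration order are our assumptions), a fixed deterministic strategy explores a star graph while the adversary hides the k-clique on the last k − 1 leaves that strategy would touch:

```python
def adversarial_star_ratio(n, k):
    """Star graph on n vertices: center s = 0, leaves 1..n-1.
    The deterministic strategy explores s first, then leaves in increasing
    order; the adversary places the k-clique on s plus the last k-1 leaves."""
    explore_order = list(range(n))                       # s, then the leaves
    hidden_clique = {0} | set(explore_order[-(k - 1):])  # adversary's choice
    expanded, online_cost = set(), 0
    for v in explore_order:
        expanded.add(v)
        online_cost += 1
        # the clique becomes visible once a vertex cover of it (k-1 of its
        # vertices, per Lemma 4.6.4) has been expanded
        if len(hidden_clique & expanded) >= k - 1:
            break
    offline_cost = k - 1    # the offline optimum expands k-1 clique members
    return online_cost, offline_cost

on, off = adversarial_star_ratio(100, 5)
```

For n = 100 and k = 5 this yields 99 online expansions against 4 offline ones, matching the (|V| − 1)/(k − 1) bound.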

4.7 Experimental Results

We empirically compared the different heuristic algorithms presented in this chapter for the k-clique pattern by running experiments on various graphs. We chose the k-clique pattern because it is a well-known pattern in the computer science literature. In every experiment the searched graph was constructed according to the following parameters: (1) the graph structure (random or scale-free), (2) the number of vertices in the graph (100, 200, 300, and 400), (3) the initial vertex, and (4) the size of the desired clique (5, ..., 9). We verified that the constructed graph contained a clique of the desired size; if it did not, a new graph was generated. We chose to experiment with relatively small cliques (i.e., the size of the desired clique is substantially smaller than the size of the searched graph), as we are interested in scenarios where the k-clique can be found without exploring almost the entire unknown graph. The performance of the different heuristic algorithms was evaluated by running them on the constructed graphs and comparing the exploration cost required until the desired pattern was found. For comparison, we also ran the following algorithms:

Random. An algorithm in which the next vertex to expand is chosen randomly from V_gen. This algorithm serves as a baseline for comparison.

Lower bound. The optimal offline algorithm described in Section 4.6. This algorithm is used as a lower bound on the exploration cost of the optimal algorithm, since no algorithm can do better than the optimal offline algorithm.

RLS-LTM. An adaptation of a clique search algorithm from the known graph setting to the unknown graph setting. This algorithm is described next.

4.7.1 RLS-LTM

The state-of-the-art algorithms for finding the largest clique in a known graph are based on local search [Battiti and Protasi, 1997; 2001; Pullan and Hoos, 2006; Battiti and Mascia, 2009].
These algorithms begin with a trivial clique containing a single, randomly chosen vertex. Then an iterative improvement process begins, in which the current clique is extended by adding a vertex that is connected to all other vertices in the current clique. This process continues until the clique cannot be extended any further, i.e., no vertex in the graph is connected to all the vertices in the current clique. Then a

single vertex is removed from the current clique, possibly allowing further extensions of the new current clique (without the removed vertex). After performing this process several times, the search restarts, discarding the current clique and choosing a different initial vertex from which the search continues. The various local search algorithms differ in the policy they employ to choose which vertex to remove or add, and in the policy for deciding when to restart the search. Recently, it has been shown empirically that for random graphs and scale-free graphs the best algorithm in this local search framework is Reactive Local Search with Long Term Memory (RLS-LTM) [Battiti and Mascia, 2009]. In RLS-LTM, whenever a vertex is removed from the current clique, it is prohibited from being added back to the current clique for the next T iterations. T is a parameter adjusted reactively during the search: T is increased whenever the current clique has already been visited, and decreased when the current clique is new. This requires storing all cliques visited throughout the search (this is the long term memory). Among the vertices that are not prohibited, RLS-LTM chooses to add to the current clique the vertex with the highest degree. If the size of the current clique does not increase within a fixed number of iterations, the search restarts.

We adapted the RLS-LTM clique algorithm to the unknown graph setting as follows. The current clique is initialized with the initially known vertex s. Vertices are added and removed according to the RLS-LTM algorithm, and when a generated vertex is chosen to be added to the current clique, it is first expanded (incurring an exploration cost). Furthermore, for vertices that have not been expanded yet, the known degree is used instead of the actual (but unknown) degree, which is used for tie-breaking in RLS-LTM. Note that there are two shortcomings to using RLS-LTM in the unknown graph setting:

1.
RLS-LTM is not complete. RLS-LTM is a local search, and it is therefore not complete. In other words, a k-clique might exist in the graph and RLS-LTM will not find it. Due to the prohibition mechanism used by RLS-LTM, this rarely occurs for small cliques. It can be remedied by forcing RLS-LTM to expand a generated vertex after a fixed number of restarts; thus the entire unknown graph will eventually be expanded and the search will halt.

2. RLS-LTM focuses on runtime. RLS-LTM was developed to find cliques fast, not to reduce the number of vertices encountered (i.e., explored) during the search. Thus it ignores the distinction between expanded and generated vertices.
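The core RLS-LTM loop (prohibition, reactive adaptation of T, restarts) can be sketched as follows. This is a simplified reconstruction over a known adjacency structure; the function name, tie-breaking details, and restart policy are our simplifications, not the exact algorithm of Battiti and Mascia:

```python
import random

def rls_ltm_sketch(adj, k, max_iters=10000, seed=0):
    """adj: dict vertex -> set of neighbors. Returns a k-clique or None."""
    rng = random.Random(seed)
    tabu = {}                    # vertex -> first iteration it may rejoin
    T = 1                        # prohibition period, adapted reactively
    visited = set()              # long-term memory of cliques seen
    clique = {rng.choice(sorted(adj))}
    for it in range(max_iters):
        if len(clique) == k:
            return clique
        # candidates: non-tabu vertices adjacent to every clique member
        cand = [v for v in adj
                if v not in clique and clique <= adj[v] and tabu.get(v, 0) <= it]
        if cand:
            clique.add(max(cand, key=lambda v: len(adj[v])))   # highest degree
        elif len(clique) > 1:
            v = rng.choice(sorted(clique))
            clique.remove(v)
            tabu[v] = it + 1 + T                  # prohibit for T iterations
        else:
            clique = {rng.choice(sorted(adj))}    # restart
        key = frozenset(clique)
        if key in visited:
            T += 1               # revisited clique: lengthen the prohibition
        else:
            T = max(1, T - 1)    # new clique: shorten the prohibition
            visited.add(key)
    return None
```

In the unknown-graph adaptation, the `clique.add(...)` step is where an exploration cost would be incurred whenever the chosen vertex is still only generated.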

Nonetheless, we provide experimental results for this algorithm as well.

4.7.2 Evaluating the Deterministic Heuristics

First, we evaluated the proposed deterministic algorithms on random graphs. In random graphs, the probability that an edge exists between any two vertices is constant. These graphs have been used extensively in computer science research, both as an analytical model and as benchmarks for evaluating algorithms' efficiency [Erdős and Rényi, 1959; Bollobás, 2001; Santo et al., 2003]. In our experiments we generated random graphs as follows. Let n be the number of vertices in the graph, and k the size of the desired clique. First, a graph with n vertices was generated. Then, edges were added between random pairs of vertices until a given number of edges had been added. The number of edges added to the graph was calculated such that the expected number of k-cliques in the graph would be one. This can be easily calculated by the linearity of expectation [Bollobás and Erdős, 1976]. If no k-clique existed in the generated graph, the graph was discarded. This process ensures that the graph contains a k-clique, but the expected number of k-cliques in the graph is not large, making the search for a k-clique more challenging.7

In Figure 4.9 we compare the average total exploration cost of searching for a 5-clique using the heuristic algorithms KnownDegree, RLS-LTM, and Clique on random graphs. Every data point is the average over 50 random graphs generated as described above. The x-axis shows the number of vertices in the graph and the y-axis shows the average exploration cost, i.e., the number of vertices expanded until a clique of the desired size was found. The black brackets denote an error bar of one standard deviation. The figure shows that all the heuristic algorithms significantly outperform the random approach.
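The edge count used when generating these random graphs follows directly from linearity of expectation: a graph with edge probability p contains C(n, k) · p^C(k,2) k-cliques in expectation, and setting this to one fixes p. A sketch of the calculation (the function name is ours):

```python
from math import comb

def edges_for_one_expected_kclique(n, k):
    """Edge count m such that a random n-vertex graph with m edges contains,
    in expectation, one k-clique:
        E[#k-cliques] = C(n,k) * p^C(k,2) = 1  =>  p = C(n,k)^(-1/C(k,2)),
    and m = p * C(n,2) edges match edge probability p."""
    p = comb(n, k) ** (-1.0 / comb(k, 2))
    return round(p * comb(n, 2)), p

m, p = edges_for_one_expected_kclique(100, 5)
```

For n = 100 and k = 5 this gives an edge probability of roughly 0.16, i.e., well below the density at which 5-cliques become abundant.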
This improvement grows with the size of the graph, but the gap between the lower bound (computed by the optimal offline algorithm described in Theorem 4.6.1) and the total exploration cost of all the heuristic algorithms also increases. Another observation is that Clique is more effective than KnownDegree and RLS-LTM on random graphs, by up to 20%. This difference also increases as the number of vertices in the graph grows.

7 In a graph with many k-cliques, finding a k-clique is easy, and as a result it would have been harder to distinguish between the performance of the different algorithms.

Figure 4.9: Non-probabilistic heuristic algorithms on random graphs.

Although the focus of the algorithms proposed in this chapter is to minimize the exploration cost, we provide runtime results as well to demonstrate the feasibility of the proposed algorithms. Table 4.1 displays the average runtime in milliseconds until a k-clique was found in random graphs, for the same experiment set described for Figure 4.9. The values in the Total column are the average total runtime in milliseconds and the values in the Vertex column are the average runtime per vertex, also in milliseconds.

Table 4.1: Runtime in milliseconds, on random graphs (Total and per-Vertex runtime for Clique, KnownDegree, RLS-LTM, and Random, by number of vertices).

As can be seen, all algorithms found the k-clique in under four seconds. Note that the reported runtime covers all the steps of our best-first search algorithm (Algorithm 2), including the test() step, in which a search for the desired pattern is performed in the known subgraph. Indeed, searching for a 5-clique in a graph with several hundred vertices can be done very efficiently. The runtime complexity per expanded vertex of KnownDegree, as well as of random exploration, is very small (see Section 4.4), and thus their total runtime in Table 4.1 is small (less than 300 milliseconds). On the other hand, the runtime of RLS-LTM is relatively large (over a second for random graphs with 300 and 400 vertices). In every iteration of RLS-LTM, a local search is performed in the known subgraph, and a vertex is chosen for exploration only when a generated vertex is added to the current clique (see Section 4.7.1 for details). Consequently, the runtime until RLS-LTM chooses which vertex to expand next is larger than that of all the other heuristic algorithms. Although in a worst case analysis the runtime of Clique is large (explained in Section 4.4.2), under these settings it is the fastest. This is because the actual number of potential k-cliques, which dominates the runtime of Clique, is in practice much smaller than the worst case number (which is exponential in the size of the searched clique).

Figure 4.10: Non-probabilistic heuristics on scale-free graphs.

The second set of experiments was performed on scale-free graphs. In scale-free graphs, the degree distribution of the vertices can be modeled by a power law, i.e., Pr(degree(v) = x) is proportional to x^(−β) for some β > 1. Many networks, including social networks and the Internet, exhibit a degree distribution that can be modeled by power laws [Barabasi and Albert, 1999]. Since the Internet is one of the domains in which we are interested, it is natural to run experiments on this class of graphs as well. A number of scale-free graph models exist [Donato et al., 2004; Donnet and Friedman, 2007; Siganos et al., 2003; Barabasi and Albert, 1999]. We chose a simple scale-free graph generator model [Eppstein and Wang, 2002] requiring two parameters: (1) the number of vertices n and (2) the number of edges m. According to this model, a graph is generated in two stages. First, a connected graph with n vertices is generated. This is done incrementally, starting with an empty graph and adding vertices one at a time. A new vertex v is added to the graph by connecting it to an old vertex u that is selected with probability proportional to its degree. Then, after all the vertices have been added to the graph, m edges are added by selecting a vertex at random and connecting it to a vertex that is selected with probability proportional to its degree. Figure 4.10 presents results obtained by performing a set of experiments with scale-free graphs under settings similar to those of the random graph experiments described above. Two interesting phenomena

can be observed. First, the improvement in exploration cost of all the heuristic algorithms over the random approach grows significantly as the size of the graph increases; the improvement is more than a factor of 6 in graphs with 400 vertices. Second, according to a paired t-test, KnownDegree outperforms RLS-LTM on all sizes of graphs, and it is even significantly better (p-value < 0.1) than Clique on graphs with 400 vertices. Moreover, the average exploration cost of these three algorithms (Clique, KnownDegree, and RLS-LTM) is almost the same, and very close to the lower bound (calculated by the optimal offline algorithm described in Theorem 4.6.1).

The improved performance of KnownDegree and RLS-LTM on scale-free graphs can be explained as follows. In scale-free graphs, a vertex is more likely to be connected to a vertex with a high degree (this is known as preferential attachment). Thus, vertices with a high known degree are more likely to be connected to other generated vertices or to new vertices. Vertices with a high known degree are chosen by KnownDegree by definition. Furthermore, vertices with a high known degree are considered more often by RLS-LTM, since such vertices are connected to more vertices than vertices with a low known degree. These two arguments explain the improved performance of KnownDegree and RLS-LTM on scale-free graphs in comparison with their results on random graphs.

In terms of runtime, all the algorithms except random exploration found a 5-clique in under 100 milliseconds. Furthermore, the differences between the runtimes of the algorithms were negligible, so we omit these results. This is reasonable since, as seen in Figure 4.10, all the heuristic algorithms (except random exploration) found the desired 5-clique with a very small number of exploration actions, almost equal to that of the optimal offline algorithm.
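The scale-free instances in these experiments come from the two-stage generator described above. A simplified sketch of that construction (the function name and the rejection of duplicate edges and self-loops are our simplifications of the Eppstein-Wang-style model):

```python
import random

def scale_free_graph(n, m, seed=0):
    """Stage 1: grow a connected n-vertex graph, attaching each new vertex
    to an old one chosen with probability proportional to its degree.
    Stage 2: add m more edges between a uniformly random vertex and a
    degree-proportional one. Returns a set of (u, v) edges with u < v."""
    rng = random.Random(seed)
    edges = set()
    weighted = [0]                  # vertices repeated roughly per their degree
    for v in range(1, n):
        u = rng.choice(weighted)    # preferential attachment
        edges.add((u, v))           # u < v, since u is an older vertex
        weighted += [u, v]
    added = 0
    while added < m:
        a = rng.randrange(n)        # uniform endpoint
        b = rng.choice(weighted)    # degree-biased endpoint
        e = (min(a, b), max(a, b))
        if a != b and e not in edges:
            edges.add(e)
            weighted += [a, b]
            added += 1
    return edges
```

The `weighted` list is the standard trick for degree-proportional sampling: every time an edge is added, both endpoints are appended, so a vertex's multiplicity tracks its degree.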
In conclusion, on both random and scale-free graphs we have seen that all the heuristic algorithms (Clique, KnownDegree, and RLS-LTM) significantly outperform random exploration. On random graphs, Clique is superior to all the other algorithms in terms of exploration cost, while on scale-free graphs all the heuristic algorithms required very similar exploration cost, with a slight advantage for KnownDegree.

4.7.3 A Real Domain of Unknown Graphs from the Web

We also evaluated the deterministic heuristic algorithms on a real unknown graph: the World Wide Web. This was done by implementing an online search engine designed to search for a k-clique in the web. Specifically, we implemented a web crawler designed to search for a k-clique in academic papers available via the Google Scholar web interface

(denoted hereafter as GS). Each paper found in GS represents a vertex, and citations between papers represent edges. We call the resulting graph the citation web graph. Naturally, the contextual connection between papers is bidirectional (although two papers can never cross-cite each other), so we model the citation web graph as an undirected graph. The motivation behind finding cliques in the citation web graph is to find the relevant significant papers discussing a given subject (as discussed in the introduction). This can be done by starting the clique search with a query of the name of the desired subject or term. In our experiments we used computer science related terms (e.g., "Subgraph-Isomorphism", "Online Algorithms").

The web crawler we implemented operates as follows. An initial query with the name of a subject or a scientific term is sent to GS. The result is parsed, and a list of hypertext links referencing academic work in GS is extracted. The crawler then selects which link to crawl next and follows that link. The resulting web page is similarly parsed for links. This process is repeated, allowing the web crawler to explore more and more of the citation web graph. Figure 4.11 shows an example of a citation web graph generated by a random crawl, starting with a GS query of "Sublinear Algorithms".

Figure 4.11: Citation web graph from a random walk in GS.

In order to gather descriptive statistics of this process, 25 graphs were generated by the process described above, starting from queries on 25 different computer science topics and generating 25 corresponding citation web graphs. As could be expected, the distribution of node degrees

in the graphs followed a power law distribution. Interestingly, many of the citation web graphs contained 4-cliques (95% of the generated web graphs) and 5-cliques (70% of the generated web graphs). On the other hand, very few (only 25% of the generated web graphs) contained larger cliques. Note that every random walk was halted after exploring several hundred vertices, and larger cliques may be found by further exploration. In addition, we measured the runtime of every exploration, over more than 2,500 random explorations of web pages in GS. The exploration of a vertex included sending an HTTP request, waiting for the corresponding HTML page to return, and parsing it. Figure 4.12 presents the histogram of the runtime required for exploring a single vertex, grouped into bins of 200 milliseconds. As can be seen, over 50% of the explorations fall in the same runtime bin. Moreover, 90% of the explorations are performed in the range of the 0.6 and 0.8 second bins. This result agrees to some extent with our simplifying assumption of a constant exploration cost.

Figure 4.12: Runtime of exploring a web page.

For finding a k-clique in the citation web graph, we implemented a best-first search (as described in Algorithm 2) on top of our web crawler as follows. The initial vertex s is an initial query that is sent to GS. The Explore() action (line 5 in Algorithm 2) consists of sending a query to GS and extracting, from the HTML of the resulting web page, the list of hypertext links that reference academic work. Each new link is added to G_known as a new generated vertex. The crawler then selects which link to crawl next (line 4 in Algorithm 2) and follows that link. The resulting web page is similarly parsed for links. This process is repeated until a k-clique is found (line 3 in Algorithm 2) or until 100 web pages have been explored.
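The crawl loop just described is an instance of the generic best-first scheme of Algorithm 2. A sketch with pluggable callbacks (the callback names and signatures are our assumptions, with explore() standing in for the fetch-and-parse step):

```python
def best_first_pattern_search(s, explore, contains_pattern, choose_next, limit=100):
    """explore(v): one exploration action, returns the edges incident to v;
    contains_pattern(edges): the test() step on the known subgraph;
    choose_next(generated): picks the next generated vertex to expand."""
    known, expanded, generated = set(), set(), {s}
    cost = 0
    while generated and cost < limit:
        if contains_pattern(known):
            return known, cost
        v = choose_next(generated)
        generated.discard(v)
        expanded.add(v)
        cost += 1
        for (a, b) in explore(v):              # reveal v's edges and neighbors
            known.add(frozenset((a, b)))
            for u in (a, b):
                if u not in expanded:
                    generated.add(u)
    return (known if contains_pattern(known) else None), cost
```

On a toy hidden graph where vertices 1, 2, and 3 form a triangle reachable from s = 0, expanding 0, 1, and then 2 already reveals the triangle, so the search halts after three exploration actions.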
The number of explorations was limited to 100 in order to prevent the crawler from being classified as a denial of service (DOS)

attack (and as a result be blocked from accessing the web page).

k    Random    KnownDegree    Clique

Table 4.2: Number of instances where the desired clique was found.

Table 4.2 presents the results of 22 online GS web crawls, performed using the KnownDegree and Clique heuristic algorithms, as well as the random baseline. As mentioned above, we used computer science related terms to start the crawl in GS. The values in the k column represent the size of the desired clique. The values in the other columns represent the number of instances where the desired clique was found before reaching the exploration limit of 100 described above. As can be seen in Table 4.2, with Clique the desired clique was found in substantially more instances than with both KnownDegree and random exploration. For example, with Clique a 5-clique was found in almost three times more instances than with KnownDegree (17 vs. 6), and random exploration did not manage to find a 5-clique in any of the instances (before reaching the exploration limit).

Algorithm      Cost       Runtime    Time per vertex
Clique         (7.51)     (6.38)     1.29 (0.41)
KnownDegree    (30.52)    (13.43)    0.81 (0.32)
Random         (27.71)    (26.36)    0.98 (0.16)

Table 4.3: Online search results of the GS web citation graph, searching for a 4-clique.

Table 4.3 presents detailed results for the instances in which all the algorithms successfully found the desired clique. There were 9 such instances (instances solved by Random, KnownDegree, and Clique), and the desired clique size in all of them was 4. The Cost column is the average number of vertices expanded until the desired clique was found, and the Runtime column is the average runtime in seconds. The Time per vertex column displays the average number of seconds required to explore a vertex. The values in brackets in each column are the standard deviations. As can be seen in Table 4.3, Clique requires expanding significantly fewer vertices than both KnownDegree and Random.
For example, finding a 4-clique using Clique requires expanding, on average, four times fewer vertices than Random, and more than two times fewer than KnownDegree. The difference in runtime between KnownDegree and Clique (11.33 seconds on average for Clique) is smaller than the difference in their cost. This is

due to the larger overhead per vertex incurred by Clique, as shown in the Time per vertex column. When searching for a 4-clique, the average number of seconds required to explore a vertex was 0.81 for KnownDegree, while it was 1.29 for Clique. This corresponds to the complexity analysis given in Sections 4.4.1 and 4.4.2: KnownDegree requires only O(log(|V_gen|)) operations to choose the next vertex to expand, while Clique requires higher computational effort, due to the overhead of maintaining the set of potential k-cliques. Surprisingly, the average runtime per vertex of KnownDegree is even smaller than that of Random. This is counter-intuitive, as the complexity of choosing the next vertex to expand in Random is O(1). However, it can be explained as follows. In every iteration of the search (Algorithm 2), we first test whether the searched pattern has been found in the known subgraph (G_known), and then run a heuristic algorithm for choosing the next vertex to expand. As more vertices are expanded, G_known grows, demanding more time to test whether the searched pattern has been found. Since KnownDegree finds a k-clique by expanding fewer vertices than Random, the resulting runtime per vertex of KnownDegree is slightly smaller than that of Random.

In summary, although all the proposed heuristic algorithms are similar in terms of their theoretical competitive ratio (Theorem 4.6.3), they performed much better than random exploration in all our experimental settings. We have also seen that Clique outperforms all the other algorithms in terms of exploration cost, except on scale-free graphs, where all the heuristic algorithms exhibited similar performance with a slight advantage for KnownDegree. Furthermore, the runtime of Clique has always remained feasible.

4.7.4 Simulated Graphs with Probabilistic Knowledge

RClique assumes that probabilistic knowledge about the existence of an edge connecting two vertices in V_gen is available.
This knowledge is used by RClique when simulating the outcome of expanding vertices. To empirically evaluate RClique we simulated this probabilistic knowledge as follows. Let noise be a real number in the range [0, 1], and let rand(0, noise) denote a number drawn uniformly at random from [0, noise]. For a graph G = (V, E) we define the following function:

    P(e) = 1 - rand(0, noise)   if e ∈ E
    P(e) = rand(0, noise)       if e ∉ E

In our experiments RClique uses P(e) when performing simulated exploration, assuming that an edge e ∈ V × V exists with probability P(e).
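As a concrete illustration, the noise model above can be sketched as follows. This is a minimal Python sketch of our own: the function name edge_probability is illustrative, edges are represented as ordered pairs for simplicity, and rand(0, noise) is taken to be a uniform draw from [0, noise].

```python
import random

def edge_probability(e, E, noise):
    """Simulated probabilistic knowledge P(e) for a candidate edge e.

    An edge of the real graph gets a probability drawn from [1 - noise, 1];
    a non-edge gets a probability drawn from [0, noise].
    """
    if e in E:
        return 1 - random.uniform(0, noise)
    return random.uniform(0, noise)

# noise = 0: exact knowledge (P(e) is 1 for edges, 0 for non-edges);
# noise = 1: P(e) is uniform on [0, 1] regardless of whether e exists.
E = {(0, 1), (1, 2)}
p_edge = edge_probability((0, 1), E, noise=0.5)      # lies in [0.5, 1]
p_non_edge = edge_probability((0, 2), E, noise=0.5)  # lies in [0, 0.5]
```

With noise = 0.5 this reproduces the behaviour described below: existing edges receive probabilities in [0.5, 1] and non-existing edges probabilities in [0, 0.5].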

Consider the effect of the noise parameter. If noise = 0 then RClique is given exact knowledge about all edges in G. That is, P(e) = 1 for all existing edges and P(e) = 0 for edges that do not exist in the real searched graph. In this case a simulated exploration of a vertex v reveals all the real edges connecting v to the other generated vertices. By contrast, if noise = 1 then the probabilistic knowledge given to RClique is completely random: P(e) assigns a random value to every possible edge, whether it exists or not. If noise is between these two extremes, then edges that do exist in the real searched graph are assigned high probabilities, while edges that do not exist are assigned low probabilities. For example, if noise = 0.5 then existing edges are assigned P(e) from the range [0.5, 1] and non-existing edges are assigned P(e) from the range [0, 0.5]. We performed experiments with different levels of uncertainty by using different values of noise.

Figure 4.13: Random graphs, various levels of noise (y-axis: number of explored vertices).

In the first set of experiments we compared RClique with various levels of noise to the non-probabilistic approaches. Figure 4.13 shows the average exploration cost of searching for a 5-clique in random graphs with 100 vertices, generated as described earlier. Every data point is the average over 50 randomly generated graphs. The bars denote the different heuristics: (1) random exploration, (2) KnownDegree, (3) RLS-LTM, (4) Clique, (5) RClique with various settings, and (6) the lower bound provided by the optimal offline algorithm. In all RClique settings we set NumOfSampling to 250 and MaxDepth to 3, values which we found empirically to be effective. RClique with N=0%, 25% and 50% denotes RClique with the corresponding level of noise (0%, 25%, 50%).
RClique(p) denotes RClique when it is given only the average vertex degree of the graph. Therefore, in its simulated explorations RClique(p) assumes that the probability of having an edge between any two vertices is determined by the average vertex degree of the graph.
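The per-iteration structure discussed above — first test whether the searched pattern already appears in G_known, then let a heuristic choose the next vertex to expand — can be sketched as follows. This is an illustrative sketch, not the implementation evaluated here: the clique test is brute force, and choose_next stands in for heuristics such as KnownDegree or Clique (RClique would additionally simulate candidate expansions using P(e)).

```python
from itertools import combinations

def has_k_clique(adj, k):
    """Brute-force test for a k-clique in the known subgraph.
    Because this test runs over the growing known graph, the time
    spent per expanded vertex increases as the search proceeds."""
    nodes = [v for v, nbrs in adj.items() if len(nbrs) >= k - 1]
    return any(
        all(b in adj[a] for a, b in combinations(group, 2))
        for group in combinations(nodes, k)
    )

def explore(start, expand, choose_next, k):
    """Generic exploration loop in the spirit of Algorithm 2.

    expand(v) queries the real, unknown graph and returns v's neighbours;
    choose_next(frontier, adj) is the pluggable exploration heuristic.
    Returns the exploration cost: the number of expanded vertices.
    """
    adj = {start: set()}   # known subgraph G_known as an adjacency dict
    frontier = {start}     # vertices that are known but not yet expanded
    cost = 0
    while frontier and not has_k_clique(adj, k):
        v = choose_next(frontier, adj)
        frontier.remove(v)
        cost += 1
        for u in expand(v):          # reveal v's true neighbours
            adj[v].add(u)
            if u not in adj:         # newly discovered vertex
                adj[u] = set()
                frontier.add(u)
            adj[u].add(v)
    return cost

# Toy usage: a 4-vertex graph containing the triangle {0, 1, 2}.
real = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cost = explore(0, lambda v: real[v], lambda f, adj: min(f), k=3)
# the triangle is revealed after expanding vertices 0 and 1, so cost == 2
```

The sketch also makes the runtime observation above concrete: the pattern test is rerun on a steadily growing G_known, so a heuristic that finds the pattern after fewer expansions also spends less time per expanded vertex on average.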


A Comparison of Fast Search Methods for Real-Time Situated Agents A Comparison of Fast Search Methods for Real-Time Situated Agents Sven Koenig Computer Science Department, University of Southern California 941 W 37th Street, Los Angeles, California 90089-0781 skoenig@usc.edu

More information

Domain-Dependent Heuristics and Tie-Breakers: Topics in Automated Planning

Domain-Dependent Heuristics and Tie-Breakers: Topics in Automated Planning Domain-Dependent Heuristics and Tie-Breakers: Topics in Automated Planning Augusto B. Corrêa, André G. Pereira, Marcus Ritt 1 Instituto de Informática Universidade Federal do Rio Grande do Sul (UFRGS)

More information

A Visual Treatment of the N-Puzzle

A Visual Treatment of the N-Puzzle A Visual Treatment of the N-Puzzle Matthew T. Hatem University of New Hampshire mtx23@cisunix.unh.edu Abstract Informed search strategies rely on information or heuristics to find the shortest path to

More information

COMP9414: Artificial Intelligence Informed Search

COMP9414: Artificial Intelligence Informed Search COMP9, Wednesday March, 00 Informed Search COMP9: Artificial Intelligence Informed Search Wayne Wobcke Room J- wobcke@cse.unsw.edu.au Based on slides by Maurice Pagnucco Overview Heuristics Informed Search

More information

Hill Climbing. Assume a heuristic value for each assignment of values to all variables. Maintain an assignment of a value to each variable.

Hill Climbing. Assume a heuristic value for each assignment of values to all variables. Maintain an assignment of a value to each variable. Hill Climbing Many search spaces are too big for systematic search. A useful method in practice for some consistency and optimization problems is hill climbing: Assume a heuristic value for each assignment

More information

COMP9414: Artificial Intelligence Informed Search

COMP9414: Artificial Intelligence Informed Search COMP9, Monday 9 March, 0 Informed Search COMP9: Artificial Intelligence Informed Search Wayne Wobcke Room J- wobcke@cse.unsw.edu.au Based on slides by Maurice Pagnucco Overview Heuristics Informed Search

More information

Exponential Deepening A* for Real-Time Agent-Centered Search

Exponential Deepening A* for Real-Time Agent-Centered Search Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence Exponential Deepening A* for Real-Time Agent-Centered Search Guni Sharon ISE Department Ben-Gurion University Israel gunisharon@gmail.com

More information

ARTIFICIAL INTELLIGENCE LECTURE 3. Ph. D. Lect. Horia Popa Andreescu rd year, semester 5

ARTIFICIAL INTELLIGENCE LECTURE 3. Ph. D. Lect. Horia Popa Andreescu rd year, semester 5 ARTIFICIAL INTELLIGENCE LECTURE 3 Ph. D. Lect. Horia Popa Andreescu 2012-2013 3 rd year, semester 5 The slides for this lecture are based (partially) on chapter 4 of the Stuart Russel Lecture Notes [R,

More information

Compressing Pattern Databases

Compressing Pattern Databases Compressing Pattern Databases Ariel Felner and Ram Meshulam Computer Science Department Bar-Ilan University Ramat-Gan, Israel 92500 Email: felner,meshulr1 @cs.biu.ac.il Robert C. Holte Computing Science

More information

University of Alberta. Richard Valenzano. Master of Science. Department of Computing Science

University of Alberta. Richard Valenzano. Master of Science. Department of Computing Science University of Alberta SIMULTANEOUSLY SEARCHING WITH MULTIPLE ALGORITHM SETTINGS: AN ALTERNATIVE TO PARAMETER TUNING FOR SUBOPTIMAL SINGLE-AGENT SEARCH by Richard Valenzano A thesis submitted to the Faculty

More information