Probabilistic Double-Distance Algorithm of Search after Static or Moving Target by Autonomous Mobile Agent


2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel

Eugene Kagan, Dept. of Industrial Engineering; Gal Goren, Dept. of Mechanical Engineering; Irad Ben-Gal, Dept. of Industrial Engineering

Abstract. We propose a real-time algorithm of search and path planning after a static or a moving target in a discrete probability space. The search is conducted by an autonomous mobile agent that is given an initial probability distribution of the target's location and at each search step obtains information regarding the target's location in the agent's local neighborhood. The suggested algorithm implements a decision-making procedure of a probabilistic version of local search with estimated global distances and results in the agent's path over the domain. The algorithm efficiently finds both static and moving targets, as well as targets that change their movement patterns during the search. Additional information regarding the target's location, which is unknown at the beginning of the search, can be integrated into the search in real time as well. It is found that for the search after a static target the algorithm's actions depend on the global estimation at all stages of the search, while for the search after a moving target the global estimations mostly affect the initial search steps. Preliminary analysis shows that for the search after a static target the obtained average number of steps is close to optimal, while for the Markovian target the average number of steps is at least within the bounds that are provided by known search methods.

Index terms: Search and screening, static and moving target, autonomous mobile agent

1. INTRODUCTION

Consider a target moving in a bounded discrete domain and a mobile agent that searches for the target. The agent starts with an initial probability distribution of the target's possible location over the domain.
At each search step the agent obtains information regarding the target's location in the agent's local neighborhood, and it terminates the search when the target is located in this neighborhood. The target's movements are governed by a definite Markov process that is known to the agent, while the target is not informed about the agent's behavior. The goal of the search agent is to find the target in the minimum number of steps.

In the presented formulation, the problem of search after a static or mobile target originated with Koopman [14] during the Second World War, in response to the German submarine threat in the Atlantic [20]. The initial studies of the problem dealt with the probabilistic search for a static target or a target that moves relatively slowly, while in the last decades the main interest has been focused on the search for targets with a velocity comparable to that of the search agent, where the target's location probabilities change over time [5], [8], [21].

There are two main approaches to the considered search problem. Following the first approach, an optimal path of the agent is created off-line by the use of global optimization methods. It is assumed that the search effort [14] is infinitely divisible over the points of the domain and that the search period is defined and finite. The path planning is based on considerations of optimal allocations that represent optimal distributions of search efforts over the domain with respect to the location probabilities and the search agent's abilities [4]. For search in continuous time and space, the existence of optimal allocations has been proved and optimal paths of the agent were obtained [19]. For search in discrete time and space, the existence of optimal allocations was proven in the most general cases, while the algorithms that generate such allocations were obtained only for a search after a static target [1], [19].
For the search after a moving target, optimal algorithms have been built for concave detection probability functions [2] and, for any detection probability function, some necessary (but not sufficient) conditions of optimality were derived [22], [23]. In variants of the above problem, methods of partially observable Markov decision processes were applied and resulted in optimal allocations when the evaluation function is piecewise linear and convex [7] and when dynamic restrictions on search efforts are applied [18].

The second approach to the above problem addresses a real-time search, such that the path planning is generated during the search process. Decision-making regarding each next step of the search agent is conducted on the basis of the obtained information, and after choosing the next step, the agent updates the available information about the target's location. Such an approach follows the line of A* algorithms and local optimization for the search over graphs or grid domains. The basic real-time A* algorithm results in a path of the search agent in the graph and implements a search after a static target [15]. Later, a similar approach was applied in the moving-target search algorithm [9]. For the decision-making, both algorithms use local distances and heuristic distance estimations without utilizing probabilistic information regarding the target's location. Further studies of real-time algorithms and methods of learning led to a number of different procedures, some of which implemented heuristic learning methods of target location probabilities [3], [17].

A certain progress in the unification of off-line and real-time search was obtained by the use of information theory methods and the application of real-time search procedures with information metrics [10], [11]. The obtained algorithms use admissible informational distances both for local decision-making and as global distance estimations, and they demonstrated Huffman-like optimal and near-optimal results in the search after static and moving targets. The decision-making in these algorithms is conducted by the use of partitions of the considered domain on the basis of the target's location probabilities. In further developments of these informational algorithms, a topology of the domain was specified [12] and an optimal algorithm of search after a static target by a number of cooperative and non-cooperative searchers was found [13].

Despite the significant advancement that has been achieved in the development of real-time search algorithms and the considerable progress in the studies of optimal allocation algorithms, no effective procedures for real-time allocation and on-line path planning have been reported so far. The objective of this work is to develop and analyze a real-time algorithm of search after static and moving targets that implements methods of allocation and of on-line path planning. The suggested algorithm is based on a probabilistic version of a local search with estimated global distances. Decision-making at each step of the search applies a probabilistic distance estimation that utilizes distance estimations and the characteristics of the target's movement.
The algorithm is applicable to the search after static or moving targets and to the search after a target that changes its movement pattern during the search process. Additionally, obtained information regarding the target's location, which was unknown to the agent at the beginning of the search, can be taken into account in real time as well.

2. PROBLEM FORMULATION AND BACKGROUND

Let X = {x_1, x_2, ..., x_n} be a set that represents a rectangular geographical domain of size (n_x, n_y), n = n_x * n_y. Each point x_i, i = 1, 2, ..., n, represents a cell in which the target can be located at any time moment t = 0, 1, 2, ..., and is called a location. For simplicity, we index the points of X in linear order such that for the Cartesian coordinates (l, j) of the point x_i its index is given by i = (j - 1) n_x + l, where l = 1, ..., n_x and j = 1, ..., n_y.

Assume that at each time moment t there is a defined probability mass function p_t : X -> [0, 1] that specifies the probabilities p_t(x_i) = Pr{x^t = x_i}, i = 1, 2, ..., n, of the target's location in the points of the set X, where x^t denotes the target's location at time moment t. The probabilities p_t(x_i) are called location probabilities, and for any t it holds true that 0 <= p_t(x_i) <= 1 and sum_i p_t(x_i) = 1. The set X with the defined probability mass function is the sample space.

The target's movement is governed by a discrete Markov process with transition probabilities matrix rho = (rho_ij), where rho_ij = Pr{x^{t+1} = x_j | x^t = x_i} and sum_j rho_ij = 1 for each i = 1, 2, ..., n. If the matrix rho is a unit matrix, then the target is static; otherwise, we say that it is moving. Given target location probabilities p_t(x_i), x_i in X, at time moment t, the location probabilities at the next time moment t + 1 are calculated as p_{t+1}(x_j) = sum_i p_t(x_i) rho_ij, j = 1, 2, ..., n.

The search agent moves over the sample space and looks for the target by testing a neighboring observed area A of a certain radius r > 0. The ability of the agent to detect the target in the observed area is defined by a detection function phi that, for any observed area A, specifies a probability phi(A, kappa) of detecting the target in the area, given that the target is located in A and the agent applies to A a search effort kappa [14], [19].
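The linear indexing of the grid cells and the one-step propagation of the location probabilities through the transition matrix can be sketched in Python. The paper's own implementation was in MatLab; this is an illustrative sketch, and all variable names are our assumptions:

```python
import numpy as np

def linear_index(l, j, nx):
    """Linear index of the cell with Cartesian coordinates (l, j),
    l = 1..nx, j = 1..ny, following i = (j - 1) * nx + l."""
    return (j - 1) * nx + l

def propagate(p, rho):
    """One step of the target's Markov movement:
    p_{t+1}(x_j) = sum_i p_t(x_i) * rho_ij."""
    return p @ rho

# Toy example: a 2x2 domain and a target that moves cyclically to the
# next cell, so rho is a cyclic-shift matrix.
nx, ny = 2, 2
n = nx * ny
rho = np.roll(np.eye(n), 1, axis=1)   # row i sends all mass to cell i+1
p0 = np.array([1.0, 0.0, 0.0, 0.0])   # target surely in the first cell
p1 = propagate(p0, rho)
print(linear_index(2, 2, nx))  # -> 4
print(p1)                      # -> [0. 1. 0. 0.]
```

A static target corresponds to `rho = np.eye(n)`, under which `propagate` leaves the distribution unchanged.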
In this paper, we assume that for any search effort kappa in [0, infinity) the probability phi(A, kappa) = eta(A) is defined as an indicator eta(A) of the area A, i.e., eta(A) = 1 if x^t in A and eta(A) = 0 otherwise. In addition, we assume that for all time moments t the observed areas have the same size.

The search process is conducted as follows [10], [11]. The search agent deals with estimated location probabilities p̂_t(x_i) that for the initial time moment t = 0 are equal to the initial location probabilities, p̂_0(x_i) = p_0(x_i), i = 1, 2, ..., n. At each time moment t, the agent chooses an observed area A_t and observes it. If the observation results in eta(A_t) = 1, the search terminates. Otherwise, the agent updates the estimated location probabilities p̂_t(x_i) over the sample space X according to the Bayes rule. The updated probabilities are called observed probabilities and are denoted by p̃_t(x_i). To the observed probabilities, the agent applies some known or estimated rule of the target's movement and obtains the estimated location probabilities p̂_{t+1}(x_i), i = 1, 2, ..., n, for the next time moment t + 1. E.g., if the target is moving according to a Markov process with a transition probabilities matrix rho, then the estimated location probabilities for the next time moment t + 1 are defined as p̂_{t+1}(x_j) = sum_i p̃_t(x_i) rho_ij, j = 1, 2, ..., n. Given the estimated probabilities p̂_{t+1}(x_i), x_i in X, the search agent applies decision-making regarding the next observed area to select, and the process continues.

The goal is to find a policy for choosing the areas A_t such that the search procedure terminates in a minimal average (over possible target moves) number of steps, while the centers of the areas form a continuous trajectory over the sample space X with respect to the Manhattan metric. The existence of such a policy in the most general form and the convergence of the search procedure are provided by the A*-type algorithms [9], [15] and their informational successors [10], [11], while the existence of optimal allocations is supported by the Washburn theorems [23].
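The Bayes update after an unsuccessful observation and the subsequent prediction through the target's transition matrix can be sketched as follows. This is a minimal Python illustration under the notation of this section; the function names are ours, not the authors':

```python
import numpy as np

def bayes_unsuccessful(p_hat, area):
    """Observed probabilities after eta(A_t) = 0: zero the probabilities
    inside the observed area and renormalize the rest (Bayes rule)."""
    q = p_hat.copy()
    q[area] = 0.0
    s = q.sum()
    return q / s if s > 0 else q

def predict(p_obs, rho):
    """Estimated probabilities for time t + 1 obtained by propagating
    the observed probabilities through the transition matrix rho."""
    return p_obs @ rho

# Example on a toy space of 4 cells with a static target (rho = I).
p_hat = np.array([0.25, 0.25, 0.25, 0.25])
area = [0, 1]                       # cells just observed; target not found
p_obs = bayes_unsuccessful(p_hat, area)
p_next = predict(p_obs, np.eye(4))
print(p_next)                       # -> [0.  0.  0.5 0.5]
```

After the unsuccessful observation the mass concentrates on the unobserved cells, which is exactly what drives the agent away from already-searched regions.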
Here, we present a real-time algorithm that, as simulation results indicate, creates the required trajectory of the agent.

3. DOUBLE-DISTANCE SEARCH ALGORITHM

As indicated above, the suggested algorithm is based on a probabilistic version of local search with estimated global distances, while the decision-making utilizes distance estimations and the characteristics of the target's movement. We assume that at each time moment t, both the search agent and the target can apply one of the following movements: V = {v_north, v_south, v_east, v_west, v_stay}, where v_north stands for "move north", v_south for "move south", v_east for "move east", v_west for "move west" and v_stay for "stay in the current point". These are the simplest movements that correspond to the Manhattan metric; for other metrics additional movements can be considered. The target's choice of movement is conducted according to its transition probabilities matrix rho, while the agent's choice follows the decision that is obtained according to the result eta(A_t) of the observation of the area around the agent.

The outline of the suggested search algorithm includes the following steps. Given initial target location probabilities p_0(x_i), i = 1, 2, ..., n, and the transition probabilities matrix rho:

1. The search agent starts with estimated probabilities p̂_0(x_i) = p_0(x_i), i = 1, 2, ..., n, and an initial observed area A_0.
2. If eta(A_0) = 1, the target is found and the process terminates. Otherwise, the following steps are conducted.
3. While eta(A_t) != 1, do:
   3.1. Calculate the observed probabilities p̃_t(x_i), i = 1, 2, ..., n:
        For all x_i in A_t do: set p̃_t(x_i) = 0.
        For all x_i not in A_t do: set p̃_t(x_i) = p̂_t(x_i) / (1 - sum over x in A_t of p̂_t(x)).
   3.2. Calculate the estimated probabilities p̂_{t+1}(x_i), i = 1, 2, ..., n:
        For all x_j do: set p̂_{t+1}(x_j) = sum_i p̃_t(x_i) rho_ij.
   3.3. Calculate weights for the movements v_k in V:
        For all v_k in V do: set w_k = sum over x in A_k of p̂_{t+1}(x), where A_k is the area that will be observed if the agent moves according to the movement v_k.
   3.4. Estimate global probabilities if needed:
        If w_k = 0 for each k, then set p̂* = E(p̂_{t+1}, y), where y is the agent's current location; else set p̂* = p̂_{t+1}.
   3.5. Apply local decision-making: set A_{t+1} = D(p̂*).
   3.6. Move to the area A_{t+1}, set t = t + 1 and obtain the observation result eta(A_t).
4. Return the obtained area A_t, for which eta(A_t) = 1.

As indicated above, at each time moment t the target moves to its next location according to the transition probabilities matrix rho.
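The outlined loop can be condensed into the following Python skeleton. Here `decision` and `estimate_global` stand in for the two key functions of the algorithm; all names are illustrative assumptions rather than the authors' code, and the toy run below uses a one-dimensional domain with single-cell observed areas:

```python
import numpy as np

def search(p0, rho, eta, area0, neighbors, decision, estimate_global,
           max_steps=10000):
    """Skeleton of the double-distance search loop. `eta(area)` is the
    error-free observation result, `neighbors(area)` lists the areas
    reachable by the feasible movements, and `decision`/`estimate_global`
    are hypothetical callables standing in for D(.) and E(.)."""
    p_hat, area = p0.copy(), area0
    for t in range(max_steps):
        if eta(area):                        # target inside the observed area
            return area, t
        p_hat[area] = 0.0                    # Bayes update for eta = 0
        p_hat /= p_hat.sum()
        p_hat = p_hat @ rho                  # prediction through rho
        cand = neighbors(area)
        weights = [p_hat[a].sum() for a in cand]
        if all(w == 0 for w in weights):     # local uncertainty: fall back
            p_star = estimate_global(p_hat)  # on the global estimation
        else:
            p_star = p_hat
        area = decision(p_star, cand)        # local decision-making
    return None, max_steps

# Toy run: 5 cells on a line, static target in the last cell, the agent
# starts at cell 0 and may move left, stay, or move right.
n, target = 5, 4
p0 = np.full(n, 1.0 / n)
rho = np.eye(n)
eta = lambda a: target in a
neighbors = lambda a: [[max(a[0] - 1, 0)], [a[0]], [min(a[0] + 1, n - 1)]]
decision = lambda p, cand: max(cand, key=lambda a: p[a].sum())
found, steps = search(p0, rho, eta, [0], neighbors, decision, lambda p: p)
print(found, steps)   # -> [4] 4
```

In this toy instance the greedy weight-maximizing decision already sweeps the line left to right, finding the target in four steps; the full algorithm replaces the greedy rule by the Ornstein-distance criterion described below.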
Since the target is not aware of the agent's activities up to the successful termination of the search, the target's movements do not depend on the searcher's actions.

The algorithm includes two key functions: the global probabilities estimation function E(·) and the local decision-making function D(·). The first function resolves an uncertainty for the local decision-making, while the second conducts the local decision-making and points to the next observed area.

Let us start with the D(·) function. In the algorithm we implement local decision-making that is based on a discrete analogue of the maximum gradient over the expected probabilities p̂_{t+1}(x_i) from the current location of the search agent to the possible observed areas. In other words, the agent chooses the next feasible observed area, according to the defined movements V, for which the observation will maximally affect the probabilities p̂_{t+1}(x_i) as they are determined in line 3.4 of the algorithm. The distances between the probability mass functions p̂_k, where k corresponds to the indices of the movements, and the probability mass function p̂ are calculated according to the Ornstein distance metric [16], which is defined as follows: d(p, q) = sum_i |p(x_i) - q(x_i)|.

Given the set of possible movements V, the outline of the local decision-making function is the following.

D(p̂):
1. For all movements v_k in V do:
   1.1. Determine A_k according to v_k.
   1.2. Set w_k = sum over x in A_k of p̂(x).
   1.3. For all x_i not in A_k do: set p̂⁺_k(x_i) = 0.
   1.4. For all x_i in A_k do: set p̂⁺_k(x_i) = p̂(x_i) / w_k.
   1.5. For all x_i in A_k do: set p̂⁻_k(x_i) = 0.
   1.6. For all x_i not in A_k do: set p̂⁻_k(x_i) = p̂(x_i) / (1 - w_k).
   1.7. Set D(v_k) = w_k d(p̂⁺_k, p̂) + (1 - w_k) d(p̂⁻_k, p̂).
2. Set k* = argmax{D(v_k) over all v_k in V}; ties are broken randomly.
3. Return A = A_{k*}.

The calculations in the function lines have the same meaning as in the corresponding lines of the algorithm. In particular, line 1.2 defines the weights for the movements, while in lines 1.3-1.6 the observed probabilities for successful and unsuccessful observations are calculated.
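Assuming the Ornstein distance is taken as the sum of pointwise absolute differences, the averaged criterion of line 1.7 can be sketched as follows (function names are ours):

```python
import numpy as np

def ornstein(p, q):
    """Ornstein distance between two pmfs on the same finite space,
    taken here as the sum of pointwise absolute differences."""
    return np.abs(p - q).sum()

def move_score(p_hat, area):
    """Ornstein distance averaged over successful and unsuccessful
    observations of `area`, as in line 1.7 of the decision function."""
    w = p_hat[area].sum()
    q_pos = np.zeros_like(p_hat)             # posterior if eta = 1
    if w > 0:
        q_pos[area] = p_hat[area] / w
    q_neg = p_hat.copy()                     # posterior if eta = 0
    q_neg[area] = 0.0
    if w < 1:
        q_neg /= (1.0 - w)
    return w * ornstein(q_pos, p_hat) + (1.0 - w) * ornstein(q_neg, p_hat)

p = np.array([0.5, 0.3, 0.1, 0.1])
print(move_score(p, [0]))   # observing the most likely cell scores highest
print(move_score(p, [3]))
```

For an indicator observation of an area of weight w the score works out to 4 w (1 - w), so the criterion favors areas whose observation outcome is the most informative, i.e. whose weight is closest to 1/2.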
In line 1.7 the Ornstein distance averaged over successful and unsuccessful observations is calculated, and in line 2 the area for which this distance is maximal is chosen.

The local decision-making function D acts on given probabilities p̂(x_i), i = 1, 2, ..., n. If there is no uncertainty as defined in line 3.4 of the algorithm, then this function is applied to the probabilities p̂_{t+1}(x_i). If, in contrast, for all possible movements the corresponding weights of the probabilities p̂_{t+1}(x_i) are equal to zero (see line 3.3 of the algorithm), then to resolve the uncertainty the global probabilities estimation is applied.

The probabilities estimation is implemented by the use of the function E(·) that starts from the probabilities p̂_{t+1}(x_i) and results in the probabilities p̂*(x_i), i = 1, 2, ..., n. The probabilities p̂*(x_i) are calculated by the use of the target's transition probabilities matrix rho as the target location probabilities after a certain number of steps m, while the calculation starts from the agent's expected probabilities p̂_{t+1}(x_i), i = 1, 2, ..., n. The indicated number of steps m is defined by the use of the estimated target's location and by the entropy measure [6] that specifies the lack of information available to the searcher.

Since the search agent observes areas A that include an equal number of points, the entropy is calculated by using the size s of the areas. Then, the required or missing amount of information is defined as follows: I(p̂) = log_s(n) + sum_i p̂(x_i) log_s p̂(x_i). Notice that the logarithms are calculated to the base s, the size of the observed areas A. Such a calculation follows the assumption that the areas A have the same size and that the observations are error free.

The estimated target's location is defined by a center of distribution that is calculated as a "center of gravity" for the probabilities p̂(x_i), i = 1, 2, ..., n. E.g., over the rectangular domain of size (n_x, n_y), n = n_x * n_y, the coordinates (c_x, c_y) of the estimated target's location are calculated as follows.

C(p̂):
1. Set c_x = 0 and c_y = 0.
2. For l = 1, ..., n_x do: for j = 1, ..., n_y do: set c_x = c_x + l * p̂(x_i), i = (j - 1) n_x + l.
3. For l = 1, ..., n_x do: for j = 1, ..., n_y do: set c_y = c_y + j * p̂(x_i), i = (j - 1) n_x + l.
4. Set c_x = round(c_x) and c_y = round(c_y).
5. Return c = (c_x, c_y).

Denote by y the current location of the search agent. Then, by the use of the information I(p̂) and the defined estimated target's location, the function E(·) is outlined as follows.

E(p̂, y):
1. Calculate the estimated target location: set c = C(p̂).
2. Calculate the distance between c and the agent's location y: set d = dist(c, y).
3. Calculate the number of estimation steps: m = ceil(d / I(p̂)).
4. For all x_i do: set p̂*(x_i) = p̂(x_i).
5. Do m times: for all x_j do: set p̂*(x_j) = sum_i p̂*(x_i) rho_ij.
6. Return p̂*.

Note that round(b) stands for the closest integer to the real number b, and ceil(b) stands for the smallest integer that is greater than b. For two points c = (c_x, c_y) and y = (y_x, y_y), the function dist(c, y) calculates a geographical distance between the points. In the considered algorithm of search in discrete space, the distance is defined by the Manhattan metric, dist(c, y) = |c_x - y_x| + |c_y - y_y|, which is applied in the numerical simulations below.

4. NUMERICAL ANALYSIS AND SIMULATION RESULTS

The performance of the suggested algorithm was studied by the use of simulated trials with different initial distributions of the target's location and different types of target movement.
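The center-of-gravity estimate and the missing-information measure used by the global estimation function can be sketched in Python as follows. The base-s logarithm convention follows the text; the function names and array layout are our assumptions:

```python
import numpy as np

def center_of_gravity(p_hat, nx, ny):
    """Estimated target location: the "center of gravity" of the
    location probabilities over the (nx, ny) grid, rounded to the
    nearest cell coordinates (1-based, as in the text)."""
    grid = p_hat.reshape(ny, nx)              # row j, column l
    ls, js = np.arange(1, nx + 1), np.arange(1, ny + 1)
    cx = (grid.sum(axis=0) * ls).sum()
    cy = (grid.sum(axis=1) * js).sum()
    return int(round(cx)), int(round(cy))

def missing_information(p_hat, s):
    """Missing information I = log_s(n) + sum_i p_i log_s(p_i), with
    logarithms to the base s (the size of the observed areas)."""
    n = p_hat.size
    nz = p_hat[p_hat > 0]                     # 0 * log 0 taken as 0
    return (np.log(n) + (nz * np.log(nz)).sum()) / np.log(s)

# Example: all mass in the corner cell of a 10x10 grid, areas of size s = 9.
p = np.zeros(100)
p[0] = 1.0
print(center_of_gravity(p, 10, 10))   # -> (1, 1)
print(missing_information(p, 9))      # log_9(100), about 2.096
```

With a point-mass distribution the entropy term vanishes and I reduces to log_s(n), the maximal amount of missing information over an n-point domain.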
The algorithm was implemented in the form of MatLab scripts. Below, we present an example of the simulation results that have been obtained for the search over a square domain of size n = 2500 points. Each trial included 1000 sessions. For each session, initial target location probabilities were generated by a combination of ten binormal distributions with random centers and unit standard deviations. Initial locations of the target were chosen randomly according to the initial location probabilities. The trials were executed both for a static target and for two types of Markovian target. In each session the search was executed up to the finding of the target by the agent. The results of the trials over 1000 sessions for each type of target movement are presented in Table I.

TABLE I
SIMULATION RESULTS

Target type | Radius r = 1, obs. area 3x3 = 9 | Radius r = 2, obs. area 5x5 = 25 | Radius r = 3, obs. area 7x7 = 49
Static      |
Markov      |
Brown       |

In the table, mu stands for the average number of search steps and sigma for the standard deviation. The possible movements of the Markovian target are: move north, move south, move east, move west and stay in the current point, while the Brownian target was prohibited from staying in its current position. In all trials the average distance between the initial target location and the agent's starting point was approximately 37.0. A relation between the statistics of the initial distance and the statistics of the number of search steps is discussed below.

As expected, from the simulation results it follows that the average number of search steps strongly depends on the radius of the observed area; however, both for static and for moving targets this dependence is not linearly proportional. In addition, it can be seen that the search after the Markovian target results in a lower average number of steps (at least for small observed areas) than the search after the Brownian one.
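The initial location probabilities used in the trials, an equally weighted mixture of ten bivariate ("binormal") densities with random centers and unit standard deviations, can be generated along the following lines. This is an illustrative sketch, not the authors' MatLab script:

```python
import numpy as np

def mixture_initial_probs(nx, ny, n_centers=10, sigma=1.0, seed=0):
    """Initial target location probabilities: an equally weighted mixture
    of bivariate normal densities with random centers and the given
    standard deviation, normalized over the (nx, ny) grid."""
    rng = np.random.default_rng(seed)
    xs, ys = np.meshgrid(np.arange(nx), np.arange(ny))
    p = np.zeros((ny, nx))
    for cx, cy in zip(rng.uniform(0, nx, n_centers),
                      rng.uniform(0, ny, n_centers)):
        p += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return (p / p.sum()).ravel()

# The 2500-point (50x50) domain of the reported trials.
p0 = mixture_initial_probs(50, 50)
print(p0.size)               # -> 2500
print(round(p0.sum(), 6))    # -> 1.0
```

Because the mixture is normalized over the grid rather than over the plane, the result is a valid probability mass function regardless of how close the random centers fall to the boundary.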
This property is similar to the property of general real-time local search algorithms in a probabilistic space [10], [11].

Let us comment on the algorithm's efficiency. Consider a static target search with the observation radius r = 1, so that the agent at each time moment covers a square area A of 9 points and obtains an observation result eta(A) in {0, 1}. The target's location probability distribution is combined of ten equivalent binormal distributions with unit standard deviations, i.e., the target is located in one of ten areas that can be checked by the agent in one time moment, while the agent is located exactly in the center of such an area. The average initial distance between the search agent and the target in this scenario represents an average cost of finding the target in one of the ten areas. The minimal number of search steps for finding the target in one of the equiprobable areas is bounded by the entropy of the number of areas. Hence, the minimal average
number of search steps in the considered scenario is proportional to log_9(10), and the obtained average number of steps mu (see Table I) is rather close to this bound. Now let us take into account that the agent moves sequentially, so it finds the target in the area of its location in two steps. Hence, the theoretical standard deviation of the number of search steps, which is proportional to the standard deviation of the initial distances, is close to the obtained standard deviation sigma (see Table I). The same statistical results are obtained in the simulations for the other initial target location probabilities and sizes of the observed area.

To evaluate the effectiveness of the algorithm for the search after a moving target, let us consider its approximate comparison with the known near-optimal Forward and Backward (FAB) algorithm [23]. In particular, it is known [23] that for the search after a target over 67 cells the FAB algorithm requires a certain number of steps, while at each step the searcher checks a single cell and obtains an observation result that may be erroneous. For the considered case the sample space contains 2500 points, while the agent observes an area that contains 9 points; hence an approximate evaluation of the number of steps required by the FAB algorithm for this search is obtained by scaling the reported number by the ratio 2500 / (9 * 67). This value is greater than the average number of steps mu (see Table I) that is required by the suggested algorithm. However, note that the FAB algorithm allows erroneous detections, while in the suggested algorithm the detections have to be accurate.

The obtained preliminary results support the claim that the suggested algorithm is close to optimal in the case of the static target search, and that for the search after a moving target the obtained average number of steps is within the bounds that are provided by known search methods. Notice again that the algorithm can be applied to various types of target movement and both to static and to moving targets.
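As a quick numeric check of the entropy bound for ten equiprobable areas, with the logarithm taken to the base s = 9 (the observed-area size) as the text prescribes:

```python
import math

# Entropy of ten equiprobable areas, logarithm to the base s = 9.
steps_per_area_check = math.log(10, 9)
print(round(steps_per_area_check, 3))   # -> 1.048
```

That is, informationally, barely more than one area observation is needed on average to discriminate among the ten candidate areas; the remaining search cost is the travel distance between them.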
Additional simulations were conducted for different types of local decision-making, such as integrating additional weighted global distance estimations of the expected target's location. Nonetheless, the obtained results, for both static and moving targets, were found to be worse than or statistically equivalent to the results presented in Table I.

5. CONCLUSION

In this paper, we suggested a real-time algorithm of search after static and moving targets in discrete time and space. The algorithm's inputs are the initial target location probabilities over the considered domain and the transition matrix of the possible target's moves, and its output is the path of the agent that minimizes the average number of search steps. The algorithm is based on a probabilistic version of local search with estimated global distances. Decision-making at each step of the search applies a probabilistic distance estimation that utilizes distance estimations and the characteristics of the target's movement. The suggested algorithm efficiently finds both static and moving targets, as well as targets that change their movement patterns during the search. For the search after a static target, the algorithm's steps depend on the global estimation at most stages of the search, while for the search after a moving target the global estimations mostly affect the initial search steps. Preliminary analysis and simulation results show that for the search after a static target the obtained average number of steps is close to optimal, while for the search after a moving target it is within the bounds that are provided by known near-optimal search methods.

REFERENCES

[1] R. Ahlswede and I. Wegener, Search Problems. John Wiley & Sons: New York.
[2] S. S. Brown, "Optimal search for a moving target in discrete time and space," Operations Research, vol. 28, no. 6, 1980.
[3] V. Bulitko and G. Lee, "Learning in real-time search: a unifying framework," J. Artificial Intelligence Research, vol. 25, 2006.
[4] A. P. Ciervo, "Search for moving targets," Pacific-Sierra Research Corporation, Santa Monica, CA, PSR Report 619 for the Office of Naval Research, Arlington, VA.
[5] D. C. Cooper, J. R. Frost, and R. Quincy Robe, "Compatibility of land SAR procedures with search theory," US Department of Homeland Security, Washington.
[6] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons: New York.
[7] J. N. Eagle, "The optimal search for a moving target when the search path is constrained," Operations Research, vol. 32, 1984.
[8] J. R. Frost and L. D. Stone, "Review of search theory: advances and applications to search and rescue decision support," US Coast Guard Research and Development Center, Groton.
[9] T. Ishida and R. E. Korf, "Moving target search: a real-time search for changing goals," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 6, 1995.
[10] E. Kagan and I. Ben-Gal, "An informational search for a moving target," in Proc. IEEE 24th Conv. of Electrical and Electronics Engineers in Israel, e-publ.
[11] E. Kagan and I. Ben-Gal, "A MDP test for dynamic targets via informational distance measure," in Proc. Conf. IIERM.
[12] E. Kagan and I. Ben-Gal, "Topology of the sample space in the problem of informational search for a moving target," in Proc. Conf. IIERM.
[13] E. Kagan and I. Ben-Gal, "Search after a static target by multiple searchers by the use of informational distance measures," in Proc. Conf. IE&M, p. 72 & e-publ.
[14] B. O. Koopman, "The theory of search, I-III," Operations Research, vol. 4, 1956; vol. 5, 1957.
[15] R. E. Korf, "Real-time heuristic search," Artificial Intelligence, vol. 42, no. 2-3, 1990.
[16] D. S. Ornstein, Ergodic Theory, Randomness, and Dynamical Systems. Yale University Press: New Haven and London.
[17] M. Shimbo and T. Ishida, "Controlling the learning process of real-time heuristic search," Artificial Intelligence, vol. 146, 2003.
[18] S. Singh and V. Krishnamurthy, "The optimal search for a moving target when the search path is constrained: the infinite-horizon case," IEEE Trans. Automatic Control, vol. 48, no. 3, 2003.
[19] L. D. Stone, Theory of Optimal Search. Academic Press: New York, San Francisco, London.
[20] L. D. Stone, "The process of search planning: current approaches and continuing problems," Operations Research, vol. 31, no. 2, 1983.
[21] L. D. Stone, "What's happened in search theory since the 1975 Lanchester prize?" Operations Research, vol. 37, no. 3, 1989.
[22] A. R. Washburn, Search and Detection. ORSA Books: Arlington, VA.
[23] A. R. Washburn, "Search for a moving target: the FAB algorithm," Operations Research, vol. 31, no. 4, 1983.


More information

6. Concluding Remarks

6. Concluding Remarks [8] K. J. Supowit, The relative neighborhood graph with an application to minimum spanning trees, Tech. Rept., Department of Computer Science, University of Illinois, Urbana-Champaign, August 1980, also

More information

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY BHARAT SIGINAM IN

More information

Branch & Bound (B&B) and Constraint Satisfaction Problems (CSPs)

Branch & Bound (B&B) and Constraint Satisfaction Problems (CSPs) Branch & Bound (B&B) and Constraint Satisfaction Problems (CSPs) Alan Mackworth UBC CS 322 CSP 1 January 25, 2013 P&M textbook 3.7.4 & 4.0-4.2 Lecture Overview Recap Branch & Bound Wrap up of search module

More information

1 Project: Fast Trajectory Replanning for Computer Games

1 Project: Fast Trajectory Replanning for Computer Games Project: Fast rajectory Replanning for omputer Games Figure : otal nnihilation by avedog onsider characters in real-time computer games, such as otal nnihilation shown in Figure. o make them easy to control,

More information

Detection and Mitigation of Cyber-Attacks using Game Theory

Detection and Mitigation of Cyber-Attacks using Game Theory Detection and Mitigation of Cyber-Attacks using Game Theory João P. Hespanha Kyriakos G. Vamvoudakis Correlation Engine COAs Data Data Data Data Cyber Situation Awareness Framework Mission Cyber-Assets

More information

Revision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems

Revision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems 4 The Open Cybernetics and Systemics Journal, 008,, 4-9 Revision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems K. Kato *, M. Sakawa and H. Katagiri Department of Artificial

More information

Tabu search and genetic algorithms: a comparative study between pure and hybrid agents in an A-teams approach

Tabu search and genetic algorithms: a comparative study between pure and hybrid agents in an A-teams approach Tabu search and genetic algorithms: a comparative study between pure and hybrid agents in an A-teams approach Carlos A. S. Passos (CenPRA) carlos.passos@cenpra.gov.br Daniel M. Aquino (UNICAMP, PIBIC/CNPq)

More information

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T.

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T. Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement

More information

Exploring Unknown Environments with Real-Time Search or Reinforcement Learning

Exploring Unknown Environments with Real-Time Search or Reinforcement Learning Exploring Unknown Environments with Real-Time Search or Reinforcement Learning Sven Koenig College of Computing, Georgia Institute of Technology skoenig@ccgatechedu Abstract Learning Real-Time A* (LRTA*)

More information

Planning and Control: Markov Decision Processes

Planning and Control: Markov Decision Processes CSE-571 AI-based Mobile Robotics Planning and Control: Markov Decision Processes Planning Static vs. Dynamic Predictable vs. Unpredictable Fully vs. Partially Observable Perfect vs. Noisy Environment What

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Eric Xing Lecture 14, February 29, 2016 Reading: W & J Book Chapters Eric Xing @

More information

INTRODUCTION TO HEURISTIC SEARCH

INTRODUCTION TO HEURISTIC SEARCH INTRODUCTION TO HEURISTIC SEARCH What is heuristic search? Given a problem in which we must make a series of decisions, determine the sequence of decisions which provably optimizes some criterion. What

More information

Approximately Uniform Random Sampling in Sensor Networks

Approximately Uniform Random Sampling in Sensor Networks Approximately Uniform Random Sampling in Sensor Networks Boulat A. Bash, John W. Byers and Jeffrey Considine Motivation Data aggregation Approximations to COUNT, SUM, AVG, MEDIAN Existing work does not

More information

One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition

One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition One Dim~nsional Representation Of Two Dimensional Information For HMM Based Handwritten Recognition Nafiz Arica Dept. of Computer Engineering, Middle East Technical University, Ankara,Turkey nafiz@ceng.metu.edu.

More information

Q-learning with linear function approximation

Q-learning with linear function approximation Q-learning with linear function approximation Francisco S. Melo and M. Isabel Ribeiro Institute for Systems and Robotics [fmelo,mir]@isr.ist.utl.pt Conference on Learning Theory, COLT 2007 June 14th, 2007

More information

Estimating the Information Rate of Noisy Two-Dimensional Constrained Channels

Estimating the Information Rate of Noisy Two-Dimensional Constrained Channels Estimating the Information Rate of Noisy Two-Dimensional Constrained Channels Mehdi Molkaraie and Hans-Andrea Loeliger Dept. of Information Technology and Electrical Engineering ETH Zurich, Switzerland

More information

H = {(1,0,0,...),(0,1,0,0,...),(0,0,1,0,0,...),...}.

H = {(1,0,0,...),(0,1,0,0,...),(0,0,1,0,0,...),...}. II.4. Compactness 1 II.4. Compactness Note. Conway states on page 20 that the concept of compactness is an extension of benefits of finiteness to infinite sets. I often state this idea as: Compact sets

More information

Task Scheduling Using Probabilistic Ant Colony Heuristics

Task Scheduling Using Probabilistic Ant Colony Heuristics The International Arab Journal of Information Technology, Vol. 13, No. 4, July 2016 375 Task Scheduling Using Probabilistic Ant Colony Heuristics Umarani Srikanth 1, Uma Maheswari 2, Shanthi Palaniswami

More information

Statistical Testing of Software Based on a Usage Model

Statistical Testing of Software Based on a Usage Model SOFTWARE PRACTICE AND EXPERIENCE, VOL. 25(1), 97 108 (JANUARY 1995) Statistical Testing of Software Based on a Usage Model gwendolyn h. walton, j. h. poore and carmen j. trammell Department of Computer

More information

Markov Decision Processes. (Slides from Mausam)

Markov Decision Processes. (Slides from Mausam) Markov Decision Processes (Slides from Mausam) Machine Learning Operations Research Graph Theory Control Theory Markov Decision Process Economics Robotics Artificial Intelligence Neuroscience /Psychology

More information

Stability of Marriage and Vehicular Parking

Stability of Marriage and Vehicular Parking Stability of Marriage and Vehicular Parking Daniel Ayala, Ouri Wolfson, Bo Xu, Bhaskar DasGupta, and Jie Lin University of Illinois at Chicago Abstract. The proliferation of mobile devices, location-based

More information

MIT Programming Contest Team Contest 1 Problems 2008

MIT Programming Contest Team Contest 1 Problems 2008 MIT Programming Contest Team Contest 1 Problems 2008 October 5, 2008 1 Edit distance Given a string, an edit script is a set of instructions to turn it into another string. There are four kinds of instructions

More information

Discrete geometry. Lecture 2. Alexander & Michael Bronstein tosca.cs.technion.ac.il/book

Discrete geometry. Lecture 2. Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Discrete geometry Lecture 2 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 The world is continuous, but the mind is discrete

More information

Marco Wiering Intelligent Systems Group Utrecht University

Marco Wiering Intelligent Systems Group Utrecht University Reinforcement Learning for Robot Control Marco Wiering Intelligent Systems Group Utrecht University marco@cs.uu.nl 22-11-2004 Introduction Robots move in the physical environment to perform tasks The environment

More information

1 Introduction and Examples

1 Introduction and Examples 1 Introduction and Examples Sequencing Problems Definition A sequencing problem is one that involves finding a sequence of steps that transforms an initial system state to a pre-defined goal state for

More information

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2

Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 Surrogate Gradient Algorithm for Lagrangian Relaxation 1,2 X. Zhao 3, P. B. Luh 4, and J. Wang 5 Communicated by W.B. Gong and D. D. Yao 1 This paper is dedicated to Professor Yu-Chi Ho for his 65th birthday.

More information

Points covered an odd number of times by translates

Points covered an odd number of times by translates Points covered an odd number of times by translates Rom Pinchasi August 5, 0 Abstract Let T be a fixed triangle and consider an odd number of translated copies of T in the plane. We show that the set of

More information

CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM

CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM 20 CHAPTER 2 CONVENTIONAL AND NON-CONVENTIONAL TECHNIQUES TO SOLVE ORPD PROBLEM 2.1 CLASSIFICATION OF CONVENTIONAL TECHNIQUES Classical optimization methods can be classified into two distinct groups:

More information

5/30/13. Guarding, Searching and Pursuing Evaders using Multiagent Systems. Todays topics. Example Scenario. Todays topics

5/30/13. Guarding, Searching and Pursuing Evaders using Multiagent Systems. Todays topics. Example Scenario. Todays topics //13 Guarding, Searching and Pursuing Evaders using Multiagent Systems Petter Ögren Computer Vision and Active Perception (CVAP) Airport Power plant Military base Port Factory Example Scenario This field

More information

A Simple and Strong Algorithm for Reconfiguration of Hexagonal Metamorphic Robots

A Simple and Strong Algorithm for Reconfiguration of Hexagonal Metamorphic Robots 50 A Simple and Strong Algorithm for Reconfiguration of Hexagonal Metamorphic Robots KwangEui Lee Department of Multimedia Engineering, Dongeui University, Busan, Korea Summary In this paper, we propose

More information

Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing

Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing Optimal Configuration of Compute Nodes for Synthetic Aperture Radar Processing Jeffrey T. Muehring and John K. Antonio Deptartment of Computer Science, P.O. Box 43104, Texas Tech University, Lubbock, TX

More information

Mobility Control for Complete Coverage in Wireless Sensor Networks

Mobility Control for Complete Coverage in Wireless Sensor Networks Mobility Control for Complete Coverage in Wireless Sensor Networks Zhen Jiang Computer Sci. Dept. West Chester University West Chester, PA 9383, USA zjiang@wcupa.edu Jie Wu Computer Sci. & Eng. Dept. Florida

More information

A new method for determination of a wave-ray trace based on tsunami isochrones

A new method for determination of a wave-ray trace based on tsunami isochrones Bull. Nov. Comp. Center, Math. Model. in Geoph., 13 (2010), 93 101 c 2010 NCC Publisher A new method for determination of a wave-ray trace based on tsunami isochrones An.G. Marchuk Abstract. A new method

More information

Topology and Topological Spaces

Topology and Topological Spaces Topology and Topological Spaces Mathematical spaces such as vector spaces, normed vector spaces (Banach spaces), and metric spaces are generalizations of ideas that are familiar in R or in R n. For example,

More information

A PARAMETRIC SIMPLEX METHOD FOR OPTIMIZING A LINEAR FUNCTION OVER THE EFFICIENT SET OF A BICRITERIA LINEAR PROBLEM. 1.

A PARAMETRIC SIMPLEX METHOD FOR OPTIMIZING A LINEAR FUNCTION OVER THE EFFICIENT SET OF A BICRITERIA LINEAR PROBLEM. 1. ACTA MATHEMATICA VIETNAMICA Volume 21, Number 1, 1996, pp. 59 67 59 A PARAMETRIC SIMPLEX METHOD FOR OPTIMIZING A LINEAR FUNCTION OVER THE EFFICIENT SET OF A BICRITERIA LINEAR PROBLEM NGUYEN DINH DAN AND

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

A Message Passing Strategy for Decentralized. Connectivity Maintenance in Multi-Agent Surveillance

A Message Passing Strategy for Decentralized. Connectivity Maintenance in Multi-Agent Surveillance A Message Passing Strategy for Decentralized Connectivity Maintenance in Multi-Agent Surveillance Derya Aksaray Boston University, Boston, MA, 225 A. Yasin Yazıcıoğlu 2 Massachusetts Institute of Technology,

More information

6. Learning Partitions of a Set

6. Learning Partitions of a Set 6. Learning Partitions of a Set Also known as clustering! Usually, we partition sets into subsets with elements that are somewhat similar (and since similarity is often task dependent, different partitions

More information

Bilinear Programming

Bilinear Programming Bilinear Programming Artyom G. Nahapetyan Center for Applied Optimization Industrial and Systems Engineering Department University of Florida Gainesville, Florida 32611-6595 Email address: artyom@ufl.edu

More information

Path Planning. Marcello Restelli. Dipartimento di Elettronica e Informazione Politecnico di Milano tel:

Path Planning. Marcello Restelli. Dipartimento di Elettronica e Informazione Politecnico di Milano   tel: Marcello Restelli Dipartimento di Elettronica e Informazione Politecnico di Milano email: restelli@elet.polimi.it tel: 02 2399 3470 Path Planning Robotica for Computer Engineering students A.A. 2006/2007

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

Theoretical Concepts of Machine Learning

Theoretical Concepts of Machine Learning Theoretical Concepts of Machine Learning Part 2 Institute of Bioinformatics Johannes Kepler University, Linz, Austria Outline 1 Introduction 2 Generalization Error 3 Maximum Likelihood 4 Noise Models 5

More information

Partial Differential Equations

Partial Differential Equations Simulation in Computer Graphics Partial Differential Equations Matthias Teschner Computer Science Department University of Freiburg Motivation various dynamic effects and physical processes are described

More information

Automatic synthesis of switching controllers for linear hybrid systems: Reachability control

Automatic synthesis of switching controllers for linear hybrid systems: Reachability control Automatic synthesis of switching controllers for linear hybrid systems: Reachability control Massimo Benerecetti and Marco Faella Università di Napoli Federico II, Italy Abstract. We consider the problem

More information

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-"&"3 -"(' ( +-" " " % '.+ % ' -0(+$,

A *69>H>N6 #DJGC6A DG C<>C::G>C<,8>:C8:H /DA 'D 2:6G, ()-&3 -(' ( +-   % '.+ % ' -0(+$, The structure is a very important aspect in neural network design, it is not only impossible to determine an optimal structure for a given problem, it is even impossible to prove that a given structure

More information

Markov Decision Processes and Reinforcement Learning

Markov Decision Processes and Reinforcement Learning Lecture 14 and Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Slides by Stuart Russell and Peter Norvig Course Overview Introduction Artificial Intelligence

More information

Multiagent Planning with Factored MDPs

Multiagent Planning with Factored MDPs Appeared in Advances in Neural Information Processing Systems NIPS-14, 2001. Multiagent Planning with Factored MDPs Carlos Guestrin Computer Science Dept Stanford University guestrin@cs.stanford.edu Daphne

More information

Approximate Linear Programming for Average-Cost Dynamic Programming

Approximate Linear Programming for Average-Cost Dynamic Programming Approximate Linear Programming for Average-Cost Dynamic Programming Daniela Pucci de Farias IBM Almaden Research Center 65 Harry Road, San Jose, CA 51 pucci@mitedu Benjamin Van Roy Department of Management

More information

COMP SCI 5400 SP2017 Exam 1 Key

COMP SCI 5400 SP2017 Exam 1 Key COMP SCI 5400 SP2017 Exam 1 Key This is a closed-book, closed-notes exam. The only items you are permitted to use are writing implements. Mark each sheet of paper you use with your name and the string

More information

arxiv: v1 [math.co] 4 Apr 2011

arxiv: v1 [math.co] 4 Apr 2011 arxiv:1104.0510v1 [math.co] 4 Apr 2011 Minimal non-extensible precolorings and implicit-relations José Antonio Martín H. Abstract. In this paper I study a variant of the general vertex coloring problem

More information

Stochastic branch & bound applying. target oriented branch & bound method to. optimal scenario tree reduction

Stochastic branch & bound applying. target oriented branch & bound method to. optimal scenario tree reduction Stochastic branch & bound applying target oriented branch & bound method to optimal scenario tree reduction Volker Stix Vienna University of Economics Department of Information Business Augasse 2 6 A-1090

More information

An Eternal Domination Problem in Grids

An Eternal Domination Problem in Grids Theory and Applications of Graphs Volume Issue 1 Article 2 2017 An Eternal Domination Problem in Grids William Klostermeyer University of North Florida, klostermeyer@hotmail.com Margaret-Ellen Messinger

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

Artificial Intelligence p.1/49. n-queens. Artificial Intelligence p.2/49. Initial state: the empty board or a board with n random

Artificial Intelligence p.1/49. n-queens. Artificial Intelligence p.2/49. Initial state: the empty board or a board with n random Example: n-queens Put n queens on an n n board with no two queens on the same row, column, or diagonal A search problem! State space: the board with 0 to n queens Initial state: the empty board or a board

More information

Three-Dimensional Off-Line Path Planning for Unmanned Aerial Vehicle Using Modified Particle Swarm Optimization

Three-Dimensional Off-Line Path Planning for Unmanned Aerial Vehicle Using Modified Particle Swarm Optimization Three-Dimensional Off-Line Path Planning for Unmanned Aerial Vehicle Using Modified Particle Swarm Optimization Lana Dalawr Jalal Abstract This paper addresses the problem of offline path planning for

More information

Constructive floorplanning with a yield objective

Constructive floorplanning with a yield objective Constructive floorplanning with a yield objective Rajnish Prasad and Israel Koren Department of Electrical and Computer Engineering University of Massachusetts, Amherst, MA 13 E-mail: rprasad,koren@ecs.umass.edu

More information

Outline for today s lecture. Informed Search. Informed Search II. Review: Properties of greedy best-first search. Review: Greedy best-first search:

Outline for today s lecture. Informed Search. Informed Search II. Review: Properties of greedy best-first search. Review: Greedy best-first search: Outline for today s lecture Informed Search II Informed Search Optimal informed search: A* (AIMA 3.5.2) Creating good heuristic functions Hill Climbing 2 Review: Greedy best-first search: f(n): estimated

More information

Optimal Crane Scheduling

Optimal Crane Scheduling Optimal Crane Scheduling IonuŃ Aron Iiro Harjunkoski John Hooker Latife Genç Kaya March 2007 1 Problem Schedule 2 cranes to transfer material between locations in a manufacturing plant. For example, copper

More information

COS Lecture 13 Autonomous Robot Navigation

COS Lecture 13 Autonomous Robot Navigation COS 495 - Lecture 13 Autonomous Robot Navigation Instructor: Chris Clark Semester: Fall 2011 1 Figures courtesy of Siegwart & Nourbakhsh Control Structure Prior Knowledge Operator Commands Localization

More information

To earn the extra credit, one of the following has to hold true. Please circle and sign.

To earn the extra credit, one of the following has to hold true. Please circle and sign. CS 188 Spring 2011 Introduction to Artificial Intelligence Practice Final Exam To earn the extra credit, one of the following has to hold true. Please circle and sign. A I spent 3 or more hours on the

More information

Heuristic Search and Advanced Methods

Heuristic Search and Advanced Methods Heuristic Search and Advanced Methods Computer Science cpsc322, Lecture 3 (Textbook Chpt 3.6 3.7) May, 15, 2012 CPSC 322, Lecture 3 Slide 1 Course Announcements Posted on WebCT Assignment1 (due on Thurs!)

More information

Generalized Coordinates for Cellular Automata Grids

Generalized Coordinates for Cellular Automata Grids Generalized Coordinates for Cellular Automata Grids Lev Naumov Saint-Peterburg State Institute of Fine Mechanics and Optics, Computer Science Department, 197101 Sablinskaya st. 14, Saint-Peterburg, Russia

More information

A Game-Theoretic Framework for Congestion Control in General Topology Networks

A Game-Theoretic Framework for Congestion Control in General Topology Networks A Game-Theoretic Framework for Congestion Control in General Topology SYS793 Presentation! By:! Computer Science Department! University of Virginia 1 Outline 2 1 Problem and Motivation! Congestion Control

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-

More information

Kapitel 4: Clustering

Kapitel 4: Clustering Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.

More information

Heuristic Search in Cyclic AND/OR Graphs

Heuristic Search in Cyclic AND/OR Graphs From: AAAI-98 Proceedings. Copyright 1998, AAAI (www.aaai.org). All rights reserved. Heuristic Search in Cyclic AND/OR Graphs Eric A. Hansen and Shlomo Zilberstein Computer Science Department University

More information

Planar Graphs. 1 Graphs and maps. 1.1 Planarity and duality

Planar Graphs. 1 Graphs and maps. 1.1 Planarity and duality Planar Graphs In the first half of this book, we consider mostly planar graphs and their geometric representations, mostly in the plane. We start with a survey of basic results on planar graphs. This chapter

More information

The Randomized Shortest Path model in a nutshell

The Randomized Shortest Path model in a nutshell Panzacchi M, Van Moorter B, Strand O, Saerens M, Kivimäki I, Cassady St.Clair C., Herfindal I, Boitani L. (2015) Predicting the continuum between corridors and barriers to animal movements using Step Selection

More information

IE 5531: Engineering Optimization I

IE 5531: Engineering Optimization I IE 5531: Engineering Optimization I Lecture 3: Linear Programming, Continued Prof. John Gunnar Carlsson September 15, 2010 Prof. John Gunnar Carlsson IE 5531: Engineering Optimization I September 15, 2010

More information

Conflict based Backjumping for Constraints Optimization Problems

Conflict based Backjumping for Constraints Optimization Problems Conflict based Backjumping for Constraints Optimization Problems Roie Zivan and Amnon Meisels {zivanr,am}@cs.bgu.ac.il Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, 84-105,

More information

Learning Where You Are Going and from Whence You Came: h- and g-cost Learning in Real-Time Heuristic Search

Learning Where You Are Going and from Whence You Came: h- and g-cost Learning in Real-Time Heuristic Search Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence Learning Where You Are Going and from Whence You Came: h- and g-cost Learning in Real-Time Heuristic Search Nathan

More information

Final Exam. Introduction to Artificial Intelligence. CS 188 Spring 2010 INSTRUCTIONS. You have 3 hours.

Final Exam. Introduction to Artificial Intelligence. CS 188 Spring 2010 INSTRUCTIONS. You have 3 hours. CS 188 Spring 2010 Introduction to Artificial Intelligence Final Exam INSTRUCTIONS You have 3 hours. The exam is closed book, closed notes except a two-page crib sheet. Please use non-programmable calculators

More information

Modeling Robot Path Planning with CD++

Modeling Robot Path Planning with CD++ Modeling Robot Path Planning with CD++ Gabriel Wainer Department of Systems and Computer Engineering. Carleton University. 1125 Colonel By Dr. Ottawa, Ontario, Canada. gwainer@sce.carleton.ca Abstract.

More information

SIMULATION OF ARTIFICIAL SYSTEMS BEHAVIOR IN PARAMETRIC EIGHT-DIMENSIONAL SPACE

SIMULATION OF ARTIFICIAL SYSTEMS BEHAVIOR IN PARAMETRIC EIGHT-DIMENSIONAL SPACE 78 Proceedings of the 4 th International Conference on Informatics and Information Technology SIMULATION OF ARTIFICIAL SYSTEMS BEHAVIOR IN PARAMETRIC EIGHT-DIMENSIONAL SPACE D. Ulbikiene, J. Ulbikas, K.

More information

Supplementary material: Planning with Abstract Markov Decision Processes

Supplementary material: Planning with Abstract Markov Decision Processes Supplementary material: Planning with Abstract Markov Decision Processes Nakul Gopalan ngopalan@cs.brown.edu Marie desjardins mariedj@umbc.edu Michael L. Littman mlittman@cs.brown.edu James MacGlashan

More information

Maximum number of edges in claw-free graphs whose maximum degree and matching number are bounded

Maximum number of edges in claw-free graphs whose maximum degree and matching number are bounded Maximum number of edges in claw-free graphs whose maximum degree and matching number are bounded Cemil Dibek Tınaz Ekim Pinar Heggernes Abstract We determine the maximum number of edges that a claw-free

More information

NOT(Faster Implementation ==> Better Algorithm), A Case Study

NOT(Faster Implementation ==> Better Algorithm), A Case Study NOT(Faster Implementation ==> Better Algorithm), A Case Study Stephen Balakirsky and Thomas Kramer Intelligent Systems Division National Institute of Standards and Technology Gaithersburg, MD 0899-830

More information

Parameterization of Triangular Meshes with Virtual Boundaries

Parameterization of Triangular Meshes with Virtual Boundaries Parameterization of Triangular Meshes with Virtual Boundaries Yunjin Lee 1;Λ Hyoung Seok Kim 2;y Seungyong Lee 1;z 1 Department of Computer Science and Engineering Pohang University of Science and Technology

More information

A NEW MILP APPROACH FOR THE FACILITY LAYOUT DESIGN PROBLEM WITH RECTANGULAR AND L/T SHAPED DEPARTMENTS

A NEW MILP APPROACH FOR THE FACILITY LAYOUT DESIGN PROBLEM WITH RECTANGULAR AND L/T SHAPED DEPARTMENTS A NEW MILP APPROACH FOR THE FACILITY LAYOUT DESIGN PROBLEM WITH RECTANGULAR AND L/T SHAPED DEPARTMENTS Yossi Bukchin Michal Tzur Dept. of Industrial Engineering, Tel Aviv University, ISRAEL Abstract In

More information

COMPLETE AND SCALABLE MULTI-ROBOT PLANNING IN TUNNEL ENVIRONMENTS. Mike Peasgood John McPhee Christopher Clark

COMPLETE AND SCALABLE MULTI-ROBOT PLANNING IN TUNNEL ENVIRONMENTS. Mike Peasgood John McPhee Christopher Clark COMPLETE AND SCALABLE MULTI-ROBOT PLANNING IN TUNNEL ENVIRONMENTS Mike Peasgood John McPhee Christopher Clark Lab for Intelligent and Autonomous Robotics, Department of Mechanical Engineering, University

More information

Adaptive Robotics - Final Report Extending Q-Learning to Infinite Spaces

Adaptive Robotics - Final Report Extending Q-Learning to Infinite Spaces Adaptive Robotics - Final Report Extending Q-Learning to Infinite Spaces Eric Christiansen Michael Gorbach May 13, 2008 Abstract One of the drawbacks of standard reinforcement learning techniques is that

More information

Using Linear Programming for Bayesian Exploration in Markov Decision Processes

Using Linear Programming for Bayesian Exploration in Markov Decision Processes Using Linear Programming for Bayesian Exploration in Markov Decision Processes Pablo Samuel Castro and Doina Precup McGill University School of Computer Science {pcastr,dprecup}@cs.mcgill.ca Abstract A

More information