Decision Making for Autonomous Navigation

Size: px

Start display at page:

Download "Decision Making for Autonomous Navigation"

Ashley Rogers
6 years ago
Views:

1 Decision Making for Autonomous Navigation Abstract Robots can automate a wide range of physical tasks from warehouse management to space exploration. Many of these tasks require robots to navigate autonomously. Such a robot should continuously choose and execute actions from a set of available actions until it reaches the destination. The robot s decision should avoid collisions and other damaging outcomes while it ensures favorable ones, such as reduced travel distance and time. This paper reviews decision-making techniques for effective navigation. In particular, it reviews different methods to acquire information about the environment and how this information can be used for effective decision making. 1. Introduction Robots are machines that effect changes in the world inspired by human or animal physical capabilities. A robot is autonomous if it can complete an assigned task with no human intervention. Autonomous robots are increasingly used to automate routine tasks in areas such as industrial automation and warehouse management, or tasks in areas that are dangerous for people such as space exploration, military missions, and search and rescue operations. Mobile robots are robots that can move in a two-dimensional (2-D) or a threedimensional (3-D) world. This paper will focus on mobile robots that move in a 2-D world. The success of a mobile robot depends on its ability to navigate, that is, to ascertain its current position in the world and reach an assigned location. Mobile robots that navigate autonomously are used to move goods in a warehouse (Sun et al. 2014), 1

2 carry out mundane tasks such as vacuuming (Forlizzi & Disalvo 2006), or transport people in vehicles (Petrovskaya & Thrun 2009). Mobile robots are also used to explore dangerous or unknown locations in space missions (Maimone et al. 2007) or search and rescue operations (Davids 2002). Although these applications have different requirements and criteria for success, they all rely on autonomous navigation. Most commercially available mobile robots have a hardware/firmware layer and a software layer. The hardware layer includes sensors and actuators. Sensors measure or detect quantities in the environment, (e.g., light, heat) and convert them into digital information. Actuators are motors that cause physical actions. The software layer accepts sensor information as input, and generates commands to the robot actuators such that the robot completes its assigned task. This separation of hardware systems from the autonomous decision-making software has significant advantages. It allows robot manufacturers to build standardized robots without application-specific details, and allows robot users the flexibility to implement customized decision-making algorithms based on the application. This paper focuses on such decision-making algorithms for autonomous navigation. In computer science, decision making involves several subfields of artificial intelligence; including planning, reasoning, and machine learning. Planning organizes future actions to achieve an assigned goal. Reasoning infers the truth-value of statements from known facts. Learning uses past experience to improve decision making, so that the performance of a robot on a task improves with its experience. The remainder of this paper describes decision making, planning, learning, and other techniques used to build autonomous navigation systems for mobile robots, with a 2

3 particular focus on how knowledge about the environment is acquired and used. Section 2 describes the basic hardware components of a mobile robot and classifies various mobile robot settings. It also provides an overview of different decision-making approaches and describes the role of knowledge acquisition in decision making. Section 3 focuses on knowledge acquisition, that is, how raw sensor data can be converted into information useful for decision making. Sections 4, 5, and 6 address deliberative, reactive, and hybrid decision-making approaches to navigation, respectively. Section 7 briefly describes spatial knowledge and decision making in humans. Finally, Section 8 discusses important challenges in current autonomous navigation systems. 2. Mobile robots This section introduces basic components of a mobile robot and describes the different class of mobile robot systems based on their hardware capabilities, the environment they inhabit, the tasks they are assigned, and the decision-making approach they adopt. Mobile robots use a broad variety of sensors and actuators. Sensors can be classified as active or passive. Active sensors use their own energy source. They emit radiation that is directed toward the target to be investigated. The radiation reflected from that target is then detected and measured by the sensor. The key advantage of an active sensor is that it is less sensitive to external changes, such as time of day and season. Range-finding sensors (e.g., LiDAR, SONAR, and UltraSonic sensors) are active sensors. They measure the distance to the nearest obstacle by the emission of light or sound waves. LiDAR uses light waves; SONAR and Ultrasonic sensors uses sound waves. The reflected light or sound wave is detected, and the distance between the sensor and the obstacle is calculated 3

based on the delay until the arrival of the reflected wave. In robotics, range-finding sensors are used to detect obstacles, map the environment, and avoid collision.

Motion sensors are primarily used as an odometer to estimate the position of the robot. Motion sensors measure various aspects of robot movements.

Vision-based sensors measure electromagnetic radiation in the environment. They are most commonly used to identify landmarks along a route, but they can also serve as odometers.

4 based on the delay until the arrival of the reflected wave. In robotics, range-finding sensors are used to detect obstacles, map the environment, and avoid collision. A passive sensor measures the naturally available signals in the environment. Motion sensors and vision-based sensors are the commonly used passive sensors. Motion sensors are primarily used as an odometer to estimate the position of the robot. Motion sensors measure various aspects of robot movements. Inertial motion sensors measure the inertial force experienced by a moving object; wheel encoders estimate the rotation of the wheel. Vision-based sensors measure electromagnetic radiation in the environment. They are most commonly used to identify landmarks along a route, but they can also serve as odometers. There are also hybrid sensors that are both passive and active. Kinnect is one such hybrid sensor. Kinnect passively captures visual information with a camera. It also makes active depth measurements by its emission and detection of infrared radiation. Kinnect sensors are often used to localize and identify landmarks (Veloso et al. 2012). (a) (b) (c) Figure 1: Robot models with different actuators and sensors. (a) Surveyor with treaded wheels. (b) Turtlebot with a Kinnect Sensor. (c) Sony s AiBO robot with legs. 4

5 Wheels and legs are the two primary kinds of actuators used in mobile robots. As in Figure 1, Sony s AiBO 1 robots have legs where each joint is controlled by a motor. Other robots (e.g., Surveyors 2 and Turtlebots 3 ) use motorized wheels. Indeed, any standard vehicle (e.g., a car or a truck) can also be viewed as a wheeled robot. Wheeled robots predominate among mobile robots because of their simplicity. Compared to a robot with legs, a wheeled robot has fewer moving parts, which makes decision making more tractable. Robots with legs often have more motors, whose simultaneous control requires complex decision making. Mobile robot systems can be classified, based on the number of robots, as singlerobot or multi-robot systems. Single-robot systems do not require task allocation, teamwork, or collaboration. Multi-robot systems can parallelize tasks and hence reduce the task completion time. They are also more robust because hardware or software failure on a single robot does not necessarily result in task failure. Multi-robot systems are classified based on whether the robots complete tasks individually or through collaboration. They can also be classified by whether a central server makes decisions for all robots or if each robot controls itself. The sensing capabilities of a robot can be classified as local or global. Local sensing captures information about the immediate environment through onboard sensors (e.g., cameras, range-finding sensors). Global sensing captures a global perspective of the environment with an external sensor like GPS or an overhead camera. In many cases, mobile robot systems rely on both kinds of sensing

6 Navigational tasks assigned to the robots can be classified as exploratory or target-based. Exploratory tasks focus on the exploration of a given area without any particular target location. Target-based tasks expect the robot to travel to one or more given targets. In navigation tasks with multiple targets, either the target set is known in advance and is given to the robot before it begins execution, or the target set is not known in advance and is assigned to robots as and when new targets arrive. Finally, a robot s environment can be classified as structured or unstructured. Structured environments, such as offices or warehouses, have certain geometric regularities (e.g., rooms, corridors) that allow a robot to make simplifying assumptions that assist in navigation. In an unstructured setting these assumptions do not hold, and more domain-independent approaches are required. Environments can also be classified as static or dynamic. Environments where no features change during task execution are called static environments. If features of the environment change during the execution of the task the environment is called dynamic. The acquisition of knowledge is central to decision making in autonomous navigation. The knowledge required for autonomous navigation can be classified as knowledge about the environment, knowledge about the position and orientation of the robot in the environment, knowledge about an effective set of parameters for a robot in the given environment, and knowledge about the robot s goal. This knowledge can be acquired continuously with sensors. Such knowledge will be up to date and need not be stored internally. Another approach is one-time acquisition of knowledge that is then stored internally for future use. 6

7 Reactive systems use only continuous sensor data without any stored internal knowledge. Reactive systems are rule-based, fast, and can quickly respond to changes in the environment. They can also be sub-optimal because they make decisions based only on the immediate situation and ignore the future implications of a decision. In contrast, deliberative systems require an internal representation of knowledge. That knowledge can be either pre-defined or learned through sensor data. A deliberative system uses its knowledge to predict the future states of a robot and to select actions that would lead the robot to its goal. These methods work well only when the external environment remains consistent with robot s internal representation. Most approaches are hybrids; they acquire some components of knowledge continuously from sensors and store others. Whether or not it stores knowledge, every navigation system must acquire and apply knowledge about the environment, about the position and orientation of the robot, about its parameter settings, and about the goal. 3 Knowledge acquisition This section describes the components of knowledge required for navigation. Mapping generates a map of the environment. Localization determines the position and orientation of the robot (robot state) in the environment. Simultaneous localization and mapping is a knowledge acquisition problem where both mapping and localization is done simultaneously. Estimation of parameter values that optimize robot performance in a given environment is also discussed. Finally, this section ends with a survey of different approaches to order tasks assigned to a robot. 7

8 3.1 Mapping A map is a diagrammatic representation of an environment that represents features useful for navigation. Metric maps capture the geometric properties of the environment; topological maps capture the connectivity among different regions. Two important approaches to mapping based on metric maps are those that use geometric shapes (Chatila & Laumond 1985) and those that use probabilistic grids called occupancy grids (Elfes 1989). Early mapping work used convex polygons to model empty space (Chatila & Laumond 1985). Service robots in museums or factories could use these techniques to build a map of the environment. The environment can be initially controlled to ensure that there are no moving obstacles; once an accurate map is built, the robot can then start to navigate. This approach has two main difficulties: geometric, shape-based mapping provides no mechanism to incorporate uncertainties due to sensor or actuator errors or changes in the environment, and it applies only to environments with regular geometric structures. In contrast, occupancy grids try to address both these problems with a probabilistic grid that models a maps (Elfes 1989). This approach partitions the map into a grid, each of whose cells stores a probabilistic estimate of whether it contains an obstacle. This allows the use of techniques such as Bayesian estimation, where an initial belief about the occupancy of the cell is continuously revised as more evidence is collected from sensor measurements. Topological approaches collect information about the connections among the various regions in the map (Mataric 1992; Kuipers & Byun 1991). In one approach, robots use sensors to detect and follow walls (Mataric 1992). A compass mounted on the 8

9 robot detects the orientation of the walls. As the robot tries to follow one wall after another, it creates a network of such landmarks, and uses the network as a topological map for navigation. In another approach, the robot randomly explores the environment while it simultaneously identifies places with distinctive characteristics based on its sensor measurements (Kuipers & Byun 1991). For example, a robot in a room could use equal-distance-from-nearby-obstacles as its metric to correctly identify the middle of the room. After the robot reaches a distinctive place, it randomly chooses and follows an applicable control strategy such as follow-the-midline-in-a-corridor or walk-alongthe-edge-of-a-large-space. If a new distinctive place is found, then the two distinctive places are treated as nodes and the control strategy between them is recorded as the edge that connects the nodes. Hybrid approaches use both topological and metric information for navigation (Thrun 1998). Metric information is stored as an occupancy grid; a trained neural network converts the sensor measurements into the values in the grid. Planning over an occupancy grid depends on the granularity and can be suboptimal for fine granularity. A topological map makes planning more efficient, since it has fewer regions and hence a smaller state space compared to an occupancy grid. A Voronoi diagram converts the occupancy grid into a topological map. A Voronoi diagram (Figure 2(a)) is a set of points in free space that have at least two distinct nearest points. Critical points are points in the set where the distance to its nearest neighbor reaches a local minimum. Critical lines connect critical points to its nearest neighbors (Figure 2(b)). These critical lines partition a map into topological regions (Figure 2(c)). These topological regions are then 9

10 converted into a graph where nodes represent the regions and edges represent adjacencies between regions (Figure 2(d)). (a) Voronoi diagram (b) Critical lines (c) Regions (d) Topological graph Figure 2: Extracting the topological graph from the map (Thrun 1998, p. 44). 3.2 Localization Localization is a fundamental ability for navigation. There are two kinds of localization: position tracking, which finds the position of the robot relative to its initial position, and global localization, which finds the global co-ordinates of the robots. Dead reckoning can be used for position tracking. It continuously measures the motion of the robot and estimates a position relative to the robot s initial position. The robot s motion can be estimated either by odometers that measure the speed of the wheels or by inertial sensors, such as accelerometers, that measure the inertial forces experienced 10

11 by the robot during motion. Probabilistic approaches can improve the performance of dead reckoning. One approach is to estimate the probability distribution of the next position of the robot in the map, given the sensor measurements and its previous positions. The Markov assumption reduces this dependency to only the previous position. Kalman filters are commonly used to estimate this probability distribution (Montemerlo et al. 2002). Kalman filters assume that a Gaussian distribution can model the actuator and sensor errors. The two most common approaches to global localization are landmarks and sensor matching. A landmark is defined as a distinctive and recognizable feature of the environment. The robot identifies and stores landmarks encountered in a map. The robot is localized when a new landmark matches a stored landmark. Landmark-based approaches are common in robots that use vision-based sensors. The robot can use scaleinvariant features (Lowe 1999) of the environment as landmarks (Se et al. 2001). Scaleinvariant features do not change under image translation, scaling, or rotation. This makes them suitable landmarks for mobile robot localization. Given a database of such features, a matching algorithm can be used to find the global coordinates of the robot. Sensor matching uses all available sensor information to estimate the robot s global position. Monte Carlo Localization (MCL) is one such global localization technique. It computes the probability distribution of the robot s position in the environment given all sensor measurements and a global map (Thrun et al. 2001). MCL applies sampling-based methods to approximate the distribution. The key idea is to maintain a probability distribution of the robot s position as a set of random position samples (particles) drawn from the probability distribution. Each particle is a possible 11

12 position for the robot. The set of particles constitute a discrete approximation of the probability distribution. Weights are then assigned to each particle based on the sensor information. These weights reflect changes in the probabilities of the possible positions of the robot, given the new evidence from the sensors. Once the robot moves, it computes a new particle set that represents the robot s latest position. Both the old particle set and its weights are used to compute the new particle set. This ensures that the new particle set incorporates the evidence from the sensors. Because the number of particles can be adjusted, MCL allows a tradeoff between accuracy and computational cost, which makes it a popular localization algorithm. 3.3 Simultaneous localization and mapping Simultaneous localization and mapping (SLAM) generates a map of the environment and simultaneously determines the robot s location in that map. This is accomplished by detection of new landmarks and identification of observed landmarks as the robot moves in the environment. When a robot moves, a noisy odometer estimates the new position. Landmarks from the environment are extracted from this new position. The robot attempts to associate these just-identified landmarks with earlier observed landmarks. Reobserved landmarks are then used to improve the estimation of robot s position. The initial approach to SLAM proposed the use of an extended Kalman filter (EKF) to estimate the distribution of the robot s position and the distribution of landmark positions (Smith et al. 1987). Key limitations of EKF-based approaches are that they are slow and scale quadratically with the number of landmarks (Montemerlo et al. 2002). FASTSLAM, a popular SLAM method based on particle filters, has been shown to scale logarithmically with the number of landmarks (Montemerlo et al. 2002). An important 12

13 requirement of the SLAM-based approach is the detection and identification of enough landmarks to ensure accurate localization. A large number of landmarks, however, imply a large computation time, which might be inappropriate for real-time applications. This inherent tradeoff between speed and accuracy remains an important issue in SLAM. 3.4 Parameter estimation The identification of optimal parameter values to control robot navigation is called parameter estimation. In reactive navigation systems, the parameters are often used to weight different behaviors (Maes & Brooks 1990; Ram et al. 1994). In one approach, the correlation between behaviors and conditions that result in positive reinforcements are learned, and the behaviors with a high probability of positive reinforcement are preferred (Maes & Brooks 1990). Another approach to parameter estimation uses genetic algorithms (Ram et al. 1994). Initially, random weights are assigned to the behaviors of the robots. The robots are run in simulation and their performance on a navigational task is recorded. Many such sets of weights are generated and their performance is evaluated in simulation. A new generation of weights is created from the combination of weights with better performance from the previous generation and they are again tested in simulation. The process is repeated until a termination condition is reached; then the weights with best performance are selected. A robot that learns to adapt to changes in its environment greatly improves its utility. For example, a robot that can switch from one kind of sensor to another to account for variations in lighting conditions (e.g., learn to use a vision camera during the day and an infrared sensor at night) can be more useful. A robot that can adapt its behavior based 13

14 on feedback from its users is also useful. Such a robot could learn to navigate differently in different settings, and optimize for time or distance travelled. 3.5 Goal Knowledge Navigation goals set targets and order them, that is, they specify when a visit to one target should precede a visit to another target. For a single robot with multiple unordered targets, target ordering seeks an ordering that maximizes efficiency. In such cases, target ordering can be modeled as a traveling salesperson problem. In other cases, target sets are partially ordered or completely ordered. In scenarios such as manufacturing where the robot has to move to different parts of a factory floor to follow a predefined workflow that combines components to build products, the target set might be completely ordered and the robot s task is only to reach the targets in the specified order. Other scenarios might specify the target set as a partial order. For example, if the goal is to shift identical chairs from one room to another, the robots must visit the first room to pick a chair up and then visit the destination room to place that chair. There are no restrictions, however on which chair to pick up first or where in the destination room to place the chair. A partially or unordered target set can also be dynamic. The target set is dynamic either if the target s position changes over time (the moving target tracking problem) (Huang 2009) or if targets are dynamically added and deleted. To operate successfully in such an environment, the robot should be able to re-compute the optimal target order during the task. Another way to address dynamism is to rely on heuristics (e.g., visit the closest point first) that can provide possibly suboptimal but quick solutions. In multi-robot systems, target allocation partitions a set of targets among the robots, in a way that improves performance. Optimal target allocation is computationally 14

15 expensive, because it must find both an optimal partition and an optimal target order for every allocation. When both the robot s environment and the set of targets are static, combinatorial auctions can be used to allocate targets (Berhault et al. 2003). Each robot bids for a bundle of targets. The targets are allocated to the lowest bidder. Since the number of possible bundles increases exponentially with the targets, finding an optimal bidding strategy is a hard problem. 4. Deliberative decision making This section describes several planning mechanisms relevant for navigation. Given an internal representation of the map, the robot state, parameter values, and a goal, a planner should be able to generate a sequence of actions (a plan) that will accomplish its goal, provided the robot s actuators execute the action accurately. When the task is navigation, a plan can be a sequence of waypoints on a map. The identification of this sequence of waypoints is called pathfinding. There are many different ways to plan including classical planning, graph-based approaches, sampling-based approaches, and planning in nondeterministic settings. 4.1 Classical Planning Classical planning uses formal logic to represent and create a plan. In classical planning methods, the map, robot state information, robot action repertoire, and goal information are stored as logical formulas, and the planner uses theorem proving to prove that execution of the plan will satisfy the goal conditions. Classical planning can address planning in a state space, a plan space, a planning graph, or as a constraint satisfaction problem. 15

16 In state-space planning, the search space is a set of world states, transitions between states are actions, and a path through the state space that begins at the start state and ends in a goal state is a plan. STRIPS was an early a state-space planner that used theorem proving to test if the state that eventually results from the execution of actions listed in a plan would satisfy the goal condition (Fikes & Nilsson 1971). STRIPS used means-ends analysis as a heuristic to guide search. ABSTRIPS improved on STRIPS; it built an abstract plan first, and moved towards a more detailed version of the plan only if the abstract plan succeeded (Sacerdoti 1974). This detection and elimination of impossible search spaces allowed ABSTRIPS to plan faster. In plan-space planning, the search space is a set of plans (including partial plans). The initial state is a null plan and transitions are plan operators that add or reorder actions to modify plans. Plan-space planning methods follow the principle of least commitment and create a partial order of actions. Actions are ordered only when necessary; this reduces backtracking, as it avoids a situation seen in state space planning called Sussman s anomaly (Sacerdoti 1975). For example, NOAH, a non-linear planner, fulfills unsatisfied preconditions by adding actions to the plan (Sacerdoti 1975). It then uses procedures called critics to confirm that the additions do not harm previously satisfied preconditions. Actions that cause conflicts are ordered to resolve any conflicts. Because actions are ordered only when required (in this case is to resolve conflicts), NOAH follows the principle of least commitment. TWEAK is an improvement over NOAH; it is a simpler, provably complete and correct non-linear planner that used a STRIPS-like action representation (Chapman 1987). TWEAK, however, is restrictive; its representation does not allow conditions and universal quantifiers. UCPOP, another 16

17 partial-order planner, allows universal quantification and conditional effects (Penberthy & Weld 1992). The efficiency of UCPOP is further improved in RePOP, where heuristics ensure that the partial-order plan most likely to lead to a solution is explored first (Nguyen & Kambhampati 2001). A planning graph is a directed graph where levels alternate between proposition nodes and action nodes. An example appears in Figure 3. An edge connects an action in one level to a proposition in the next level whenever the proposition forms a postcondition of the action. An edge also connects a proposition in one level to an action in the next whenever a proposition satisfies a pre-condition of an action. Two actions interfere with one another if the outcome of one action reverses the outcome of the other, or if the outcome of one action negates the preconditions of the other. Planning graphs are more compact than their corresponding state-space based graphs, that is, they have fewer nodes because they allow actions to interfere. To produce a plan, a search algorithm is run on the planning graph in stages. Figure 3: Example of a planning graph with alternate levels of actions and propositions (Blum 1997, p. 5). 17

18 GraphPlan is a planner based on a planning graph and a STRIPS-like representation (Blum 1997). GraphPlan begins in stage 1 with a planning graph whose single proposition level represents the initial conditions. In stage i, GraphPlan takes the planning graph from stage i 1, extends it one time step (the next action level and the following propositional level), and then searches the extended planning graph for a valid plan of length i. GraphPlan s search either finds a valid plan (in which case it halts) or determines that the goals are not all achievable by time i (in which case it iterates to the next stage). In each iteration through this extend/search loop, the algorithm either discovers a plan or else proves that no plan with that many time steps or fewer is possible. Planners based on heuristic-search focus on how to guide the search through the state space. Heuristics evaluate future states on their distance to the goal; the state closest to the goal is explored first. The number of actions in a plan that solves a relaxed version of the planning problem is used as the measure distance. The Fast-Forward (FF) planning system uses planning graphs to construct relaxed planning problems whose solutions are then used to estimate the distance to the goal (Hoffmann & Nebel 2001). The Fast- Downward (FD) planning system combines heuristic search with hierarchal problem decomposition (Helmert 2006). Like FF, FD computes plans by heuristic search in the state space. Unlike FF, however, FD s heuristic function uses hierarchal decomposition of the planning task to construct relaxed planning problems. The number of actions in the plans for these relaxed planning problems is then used to estimate the distance from a state to the goal. The LAMA planner extends FD with landmarks, propositional formulas 18

19 that must be true at some point in every plan for a given task (Richter & Westphal 2010). LAMA then uses landmarks to guide the search to states with many landmarks. Some approaches have sought to transform the planning problem into a constraint satisfaction problem, to exploit existing constraint satisfaction solvers. A planning graph can be formulated as a SAT problem, and general-purpose SAT solvers can be used to solve the problem (Kautz et al. 1996). In another approach, Local Planning Graph (LPG), the planning graph is formulated as a propositional constraint satisfaction problem (Gerevini & Serina 2003). LPG treats the stages of the planning graph as variables, and the detection of a partial plan in a stage as variable assignment. LPG relies on heuristics to improve its speed. 4.2 Graph-based Search In graph-based search, an undirected graph represents the topology and distance information of a fixed map. The map is divided into a regular grid, where nodes represent cells and edges connect adjacent cells. Edge weights represent obstacles between the cells. A route through the map from the initial position of the robot to the goal can be represented as a path in the undirected graph. This reduces the path-planning problem to a shortest path problem. A* is the most common approach to search for a path to the goal state (Nilsson 1980). A* estimates the total cost from the start node to the goal as the sum of the cost to get to the current node from the start node and a heuristic estimate of the cost from the current node to the goal node. The algorithm returns an optimal path from the start to the goal when the heuristic always underestimates the true distance. A* search keeps a queue of states to be explored and the state with the lowest total cost is explored first. 19

20 On large maps, A* consumes considerable memory, because the number of states in the queue grows at an exponential rate. In contrast, Iterative Deepening A* (IDA*) only requires memory linear in the length of the solution that it constructs (Korf 1996). IDA* is an iterative algorithm that begins with an initial cutoff for the total estimated cost of the solution. Each iteration begins with a depth-first search for a solution whose estimated total cost is less than the cutoff. If a solution is not found, the cutoff is incremented and depth first search is repeated. Hierarchal planning A* (HPA*) is a nearoptimal algorithm that uses the hierarchal properties of path planning to save computation (Botea et al. 2004). HPA* partitions the map into neighborhoods, and finds a path to the border of the neighborhood that contains the start location. Next, it finds a path from the border of the start neighborhood to the border of the neighborhood around the goal. Because there are fewer neighborhoods than nodes in a grid-based graph, computation of such a path is fast. Finally, HPA* finds a path from the border of the goal neighborhood to the goal. All three paths combined forms a solution that is both quick to compute and near-optimal. 4.3 Sampling-based approaches Unlike simple path planning, motion planning does not assume that the robot is a single point in space. Instead, motion planning considers the problems that can arise when robots are treated as solid-bodied objects. The high dimensionality of motion planning renders several of the standard planning approaches ineffective. Sampling-based approaches are commonly used instead. Rather that an explicit representation of the environment, sampling-based algorithms rely on collision checking to test whether two randomly sampled states have a feasible trajectory. A graph of feasible trajectories is 20

21 built from multiple sampled states, which is then used to construct a solution to the original motion-planning problem. Although sampling-based algorithms are not complete, they provide probabilistic completeness guarantees, that is, the probability that the planner fails to return a solution, if one exists, decays to zero as the number of sampled points approaches infinity. There are two principal sampling-based approaches: Probabilistic RoadMaps (PRMs) (Kavraki et al. 1996) and Rapidly exploring Random Trees (RRT) (Zucker et al. 2007; Ferguson et al. 2006). As shown in Figure 4, PRM is a graph whose nodes represent valid states, chosen at random from a state space. In motion planning, a state includes not only the position of the robot, but also the orientation of the robot s body. Each state is represented as a node in a graph. A set of such nodes is chosen randomly. A local planner checks for a path from one state to other nearby states (within distance d from the given state, where d is pre-defined) and, if a path is found, the nodes that represent the states are connected with an edge. The resultant graph becomes the input to a graph search algorithm that then computes the shortest path from the initial state to the goal state. The RRT algorithm randomly builds a tree to search a non-convex, highdimensional space. RRT chooses a state at random, and connects it to the nearest state in the tree, provided that a collision-free path exists between them. A local planner that searches for a plan of pre-defined length l is used to test for the existence of a collisionfree path. Each iteration adds a new state to the tree, and the tree grows with a bias toward unexplored areas of the search space. An example appears in Figure 5. Growth of the tree stops when the goal state is randomly chosen and added to the tree. The tree is 21

22 then used as an input to a search algorithm, which returns a path from the initial state to a goal state. Figure 4: PRMs (Choset 2005, p. 245). The circles denote randomly sampled states and the lines denote feasible trajectory. The dark lines form a path from initial state (q init ) to final state (q goal ). Figure 5: Growth of RRTs biased towards unexplored spaces (Kuffner and LaValle 2000, p. 3). 4.4 Planning in non-deterministic settings In many domains a robot s actuators are imperfect, which leads to uncertain outcomes. In such domains the state of a robot after an action can be represented as a probability 22

23 distribution over the possible states. A Markov decision process (MDP) can be used to model such domains. An MDP framework, however, assumes that there is complete observability (i.e. the robot can accurately detect its current state). An MDP can be described as a tuple <S, A, T, R, π >, where: S is a finite set of states of the world. A continuous world can be converted into a finite set of states by discretization. A is a finite set of actions. T : S A p(s) is a state-transition function, which for every state s S and action a A, the function returns a probability distribution over the world states. R : S A R is the reward function that returns expected reward gained by the robot when it chooses an action in a specific state. π: S A is policy function that specifies the action to be chosen in any given state. The objective of an MDP framework is to find an optimal policy function that results in the maximum reward accumulation, as defined by the reward function. Value Iteration (VI) is an iterative algorithm to find such an optimal policy. A value function returns the maximum reward that the robot could get from a particular state if it is allowed to make a limited number of decisions. VI initially computes the value function assuming the robot is allowed make only one decision. In the next step, it iteratively computes the value function if the robot is allowed to make two decisions with the previously computed value function. This is repeated until the value function is found for the desired number of decisions. The optimal policy is then to choose actions that should lead to states with a maximum value as defined by the value function. 23

24 In many domains, a robot with noisy sensors may not able to determine its current state with complete reliability. Partially Observable MDP (POMDP) provides a framework to obtain optimal policies in such domains (Kaelbling et al. 1998). A POMDP can be described as a tuple <S, A, T, R, O, Of> where, S, A, T and R describe a Markov decision process; O is a finite set of observations the robot can experience through its sensors. A continuous valued sensor can be discretized to generate a finite observation set. Of : S A p(o) is the observation function which returns for each action and resulting state a probability distribution over possible observations Because POMDPs do not know the current state accurately, they maintain a probability distribution over states called a belief state. Some approaches have a separate component called State Estimator (SE) that updates the belief state based on the last action, the current observation, and the previous belief state (Figure 6). The goal is to generate a policy π that could suggest optimal actions from any belief state. An important aspect of POMDPs is that there is no difference between an action chosen to change the state of the world and action chosen to gain information. POMDPs have several drawbacks, however. First, the generation of an optimal policy is computationally expensive, so such a policy cannot be generated for large domains. Second, POMDPs are based on a stochastic model of the transition and reward function, so any policy generated is optimal only for that model; any change in the stochastic nature of the environment results in suboptimal results. 24

25 Figure 6: Relationship between state estimator, policy and the world (Kaelbling, Littman, and Cassandra 1998, p. 106). Online POMDPs, unlike offline POMDPs, interleave policy construction and execution (Figure 7) (Ross et al. 2008). The online approach tries only to find a good local policy for the current belief state of the robot, so it considers only states reachable from the current belief state. This focuses computation on a small number of states. Once this local planning phase terminates, the execution phase executes the best action from the local policy and updates the current belief based on the observation obtained. Figure 7: Policy construction and execution in online and offline POMDPs (Ross et al. 2008, p. 671). An improvement in performance of online POMDPs is made by the application of Monte Carlo sampling (Silver & Veness 2010). Instead of search over all possible observations, a search over a sampled set of observations, and beliefs reached by the sampled observations allows for a deeper search within a given planning time. 25

26 5. Reactive decision making Figure 8: Subsumption Architecture (R. Brooks 1986, p. 7). This section reviews various architectures built for reactive decision-making. Rod Brooks, in 1991, argued that intelligence can be achieved without any internal representation (Brooks 1991), based on two important observations. First, the acquisition of accurate world information through noisy sensors is difficult. Second, even if an accurate internal representation were built, it would have to be updated constantly because most real-world robotics deals with dynamic environments. Given these arguments, Brooks approach was to replace internal representations with sensor information and predefined rules that computed the decision reactively. Without any internal representation, the problems of accurate knowledge acquisition and stale knowledge are avoided. His reactive subsumption architecture is based on behaviors organized into layers, where decisions from higher-layer behaviors override decisions from lower-layer behaviors, as shown in Figure 8 (Brooks 1986). Behaviors at each layer receive sensor information and make independent decisions that are sent to the actuators. 26

27 Toto, a mobile robot system, was built on the subsumption architecture (Brooks 1986). Its low-level behaviors enabled it to wander without collision. An intermediate level of behavior recognized such landmarks as walls and clutters. Each time it recognized a landmark, a behavior allocated itself as the 'place' of that particular landmark. Behaviors that corresponded to physically adjacent landmarks had neighbor relationship links activated between them. A graph structure was thus formed, where the nodes of the graph were active computational elements rather than static data structures. As the robot moved in the environment, the nodes tracked its location. Nodes became active if they could detect that they corresponded to the robot s current location. Thus the robot had both a map, and a sense of where it was on the map, from a distributed computational model. AuRA also builds robot decision systems as a collection of parallel behaviors (Arkin 1990). It includes motor schemas that are primitive movement behaviors, and perceptual schemas that are behaviors to sense actively the required information from the environment. Each motor schema recommends a possible move in the form a vector with direction and magnitude. The sum of all such vectors represents the final decision that is then executed by the robot. Pengi consists of a collection of rules (Agre & Chapman 1987). These rules were guided by the properties of the immediate situation. With its simple scheme of rules, Pengi plays a fairly complicated game without an explicit internal representation of the environment. Although reactive methods are fast and can handle dynamism well, the fact that they do not exploit available knowledge to make long-range plans limits their efficiency. 27

28 6. Hybrid decision making This section reviews different approaches used for robot navigation in a dynamic environment such as dynamic obstacles (e.g., people moving on a factory floor) and dynamic maps (e.g., roads blocked for construction). Given a navigation plan as a sequence of subgoals, changes in the environment could render it useless. Because not all environmental changes can be predicted, robots have to rely on their sensors to gather information about them. Thus the reliance on only an internal representation does not serve well in realistic scenarios. Instead, hybrid architectures try to adapt deliberative behavior in the light of sensor information about the changes in the world. Such architectures are also reviewed in this section. 6.1 Obstacle Avoidance If there are moving obstacles in the environment, the robot must avoid them (obstacle avoidance) and continue to follow its earlier plan, revise it, or create a new plan. It is typically assumed that the robot can sense the location and the velocity of the obstacle. A common approach to obstacle avoidance is first to plan a path to avoid static obstacles and then use local information about dynamic obstacles to divert minimally from that plan. How to minimize the cost of diversion is treated as an optimization problem (Chhugani et al. 2009). Other approaches to obstacle avoidance include reactive rules (Sun et al. 2014) and potential fields (Khatib 1985). Although such rules are computationally fast, they suffer from local minima and do not guarantee optimal solutions (Koren et al. 1991). 6.2 Dynamic Maps 28

29 When maps are partially known or change over time, A*-generated plans can fail. Using A* to replan whenever a plan fails is computationally expensive. To address this need for repeated planning, incremental search algorithms like D* (Stentz 1994), focused D* (Stentz 1995), LPA* (Koenig et al. 2004), D* lite (Koenig & Likhachev 2002), and MPGAA* (Hernández, Baier, and Asin 2014) have been proposed. Incremental search algorithms assume that the unknown parts of the map have no obstacles, and find a shortest path to the goal under this assumption. The robot then follows the path. When the robot observes new map information it adds the information to the map and, if necessary, generates a new shortest path to the target from the current position. This new path generation is more efficient because it uses information from the earlier planning process. D* search starts from the goal state and explores until the robot s initial state is reached. D* search maintains a list of states to be evaluated, along with the shortest distance from each state to the goal state. The list begins with the goal state. In every iteration, D* chooses the state closest to the goal state from the list and adds its neighbors into the list. When the start state is added to the list, the search stops, and the path to the goal can be found by backtracking. The robot then follows this path to the goal. When the robot encounters a new obstacle, cost updates are propagated from the position of the obstacle to the robot. D* is efficient because most changes in the map are detected near the robot, and therefore require fewer cost updates. Focused D* improves upon D* with a heuristic that focuses the propagation of cost changes toward the robot. LPA* returns the shortest path in a changing graph, but with its start and end positions held constant. Unlike D*, which is similar to uniform cost 29

30 search, LPA* is similar to A* search. Each node in LPA* uses heuristics to maintain an estimate of the path length through that node. When edge costs change, these estimates are updated and the updated estimates are used to re-compute the plan. D* lite adapts LPA* to work when the initial position of the robot changes as the robot moves. MPGAA*, introduced in 2014, is an empirically faster yet simpler algorithm than D* lite. MPGAA* initially finds an optimal path from the robot to the goal using the A* algorithm. After the robot s environment changes, MPGAA* efficiently re-plans with two main ideas. First, it uses the information from the initial A* search procedure to improve the heuristic estimate to the goal. This makes subsequent searches faster. Second, it saves all the previous optimal paths and tries to reuse them instead of repeatedly planning. Another important search variation is anytime A*, an incremental algorithm that allows the user to query for a possibly suboptimal solution at any time during search. This is useful for robots with real-time requirements (Likhachev et al. 2005). 6.3 Hybrid Architectures Many hybrid architectures incorporate both sensor information and static internal representations of the environment in decision making (Firby 1987; Connell 1992; Gat 1997; Goodwin et al. 1997; Ranganathan & Koenig 2003; Rosenblatt 1997; Arkin 1990). Reactive planning selects plans based only on the current situation without any lookahead (Firby 1987). Reactive planning uses a set of pre-defined procedures called Reactive Action Planners (RAP). Every RAP is a specially designed search procedure that chooses actions only based on the current world state, until the assigned goal state is reached. The goal-dependent design of its search procedures makes the search more efficient than brute 30

31 force, while dependency on only the current state ensures that the search algorithm works despite changes in the environment. Many three-tier hybrid architectures have been proposed with the following structure: a reactive tier of behaviors that directly map sensor values to actuators with no internal state, a sequencer whose algorithms sequence the behaviors to achieve complex objectives, and a slow deliberative planner to find a long-term plan. In 3T, for example, the reactive layer has primitive behaviors such as wall finding, wall following, and collision avoidance (Gat 1997). Algorithms in its sequence layer combined the reactive behaviors to achieve complex objectives. For example, self-localization is achieved through a combination of wall following and collision avoidance. Figure 9: DAMN Architecture (Rosenblatt 1997, p. 2). Rather than a top-down structure to achieve the desired symbiosis of reactive and deliberative behavior, DAMN, shown in Figure 9, takes a horizontal approach, where reactive and deliberative modules concurrently vote to control the robot (Rosenblatt 1997). DAMN s slower deliberative modules vote at a slower rate, while the faster, reactive modules vote at a faster rate. Votes from different modules are combined based on a pre-defined set of weights given to the modules. These weights control the reactive and deliberative tendencies of decision making in DAMN. 31

Mobile Robots: An Introduction.

Mobile Robots: An Introduction Amirkabir University of Technology Computer Engineering & Information Technology Department http://ce.aut.ac.ir/~shiry/lecture/robotics-2004/robotics04.html Introduction