Acquisition of Intermediate Goals for an Agent Executing Multiple Tasks


IEEE TRANSACTIONS ON ROBOTICS, VOL. 22, NO. 5, OCTOBER 2006

Acquisition of Intermediate Goals for an Agent Executing Multiple Tasks

Yusuke Fukazawa, Chomchana Trevai, Jun Ota, and Tamio Arai

Abstract: In this paper, an algorithm that acquires intermediate goals between the initial and goal states is proposed for an agent executing multiple tasks. We demonstrate the algorithm on the problem of rearranging multiple objects. The result shows that the moving distance required to transfer all of the objects to their goal configuration is 1/15 of that without using intermediate goals. We also experiment with a real robot to confirm that the intermediate goals can be adapted to a real environment. The experimental results show that an agent could adapt the intermediate goals, which were acquired in simulation, to the experimental environment.

Index Terms: Autonomous agent, intermediate goal, multitask, rearrangement task, subgoal.

I. INTRODUCTION

Autonomous agents that perform multiple tasks have attracted a great deal of attention because of the technological advance this represents and the potential for reducing personnel expenses. For an agent executing multiple tasks, there are two steps to completing the whole set of tasks. The first step is selecting a task from multiple task candidates. The second step is selecting the action the robot should take so as to complete the task selected in the first step.

Regarding action selection, reinforcement learning such as Q-learning [1] has been used to learn the action of a single task. However, the learned action is not always appropriate for completing multiple tasks, even if the agent learns the appropriate action for each single task. An example of the rearrangement of multiple objects [2] is shown in Fig. 1. The agent transfers these objects to their goal positions as quickly as possible. There are three tasks: transferring object 1 to its goal state, transferring object 2 to its goal state, and transferring object 3 to its goal state. However, if an agent learns the action of transferring object 1 to its goal configuration without considering the existence of objects 2 and 3, the learned action is not appropriate for transferring all of objects 1–3 to their goals. Therefore, task selection is a significant issue for an agent executing multiple tasks. When a robot has multiple tasks, there are usually order restrictions among them. In the case of the rearrangement of multiple objects shown in Fig. 1, an agent must determine the order in which each object is to be transported for effective completion of the task.

Manuscript received November 17, 2004; revised November 15, 2005, and January 23. This paper was recommended for publication by Associate Editors F. Y. Wang and G. Antonelli and Editors I. Walker and L. Parker. This work was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas "Emergence of Adaptive Motor Function through Interaction between Body, Brain and Environment" from the Japanese Ministry of Education, Culture, Sports, Science and Technology. This paper was presented at the Conference on Intelligent and Autonomous Systems, Amsterdam, The Netherlands, in March.

Y. Fukazawa is with the Service and Solution Development Department, NTT DoCoMo, Inc., R&D Center, Kanagawa, Japan (e-mail: fukazawayuu@nttdocomo.co.jp). C. Trevai, J. Ota, and T. Arai are with the Department of Precision Engineering, University of Tokyo, Tokyo, Japan (e-mail: ctrevai@prince.pe.u-tokyo.ac.jp; ota@prince.pe.u-tokyo.ac.jp; arai-tamio@prince.pe.u-tokyo.ac.jp).
Color versions of Figs. 1, 4, 6, 7, and 8 are available online.

Fig. 1. Example of multiple tasks (rearrangement of multiple objects). The robot has multiple tasks: transferring objects 1–3 from their (a) initial state to their corresponding (b) goal state.

The objective of this study is to propose an algorithm that learns intermediate goals between the initial and goal states for an agent executing multiple tasks. The algorithm described in this paper can restrict the executable tasks between intermediate goals. As a result, the order restrictions among tasks can be resolved.

A. Past Approach

Here, past approaches related to task selection or action selection for an agent executing multiple tasks are described. Mahadevan et al. developed an algorithm in which an agent learns the actions of three tasks, such as finding a box, pushing a box, and avoiding a deadlock, and then integrates the learning results of each task [3]. Q-learning [1] is used to learn each task. The researchers assumed that each task is independent of the others. Therefore, such a method cannot be used to solve a problem with unknown order restrictions, such as conveying multiple pieces of luggage to various destinations.

In the following, past approaches related to acquiring intermediate goals (or subgoals) between the initial and goal states are described. McGovern et al. developed a method in which an agent can automatically discover subgoals online [4]. They discover the subgoals in reinforcement learning by using diverse density in a two-room grid-world environment. However, this architecture is aimed not at discovering order restrictions among multiple tasks but at improving the agent's action for a single task. Kaelbling developed landmark networks on a grid world [5]. This algorithm keeps the exploration state space small by setting landmarks between the initial and goal states. However, Kaelbling used a Voronoi graph to generate landmarks, and, therefore, it is not possible for an agent to discover the landmarks automatically. Lane et al. proposed an algorithm for acquiring coverage actions, such as approaching an object or bringing back an object, from basic actions, such as moving in the x- or y-direction, based on the concept of semi-Markov decision processes (SMDPs) [6]. This algorithm enables the agent to explore a smaller action space. However, they presuppose that the agent can hold and bring back the object once it finds the object, and they do not suppose that there are order restrictions among tasks. Takahashi et al. proposed a hierarchical learning architecture that was demonstrated in a soccer simulation [7]. This architecture consists of multiple learning modules with different goal states, which are arranged hierarchically. A set of modules in the lower layer learns to control the velocity from sensor data. A set of modules in the upper layer learns to activate the modules of the lower layer from an indicator that judges whether a module of the lower layer has reached its goal state. This algorithm enables the agent to find an optimal policy in the long term.

However, the dimensions of the state-action space required in a soccer simulation are smaller than those required for the conveyance of multiple pieces of luggage. Therefore, if this algorithm is used for the conveyance of luggage, the learning of the upper layer will not easily progress.

From the above, algorithms that acquire order restrictions have rarely been proposed. In this paper, we define that every task (1, 2, ..., n) has its own state space (S_1, S_2, ..., S_n) and action space (A_1, A_2, ..., A_n). A state space within S_1, S_2, ..., S_n can be affected by every action space of A_1, A_2, ..., A_n.

II. ACQUISITION AND USE OF INTERMEDIATE GOALS

Here, an algorithm to acquire intermediate goals and an algorithm to use intermediate goals are stated. The flowchart of the acquisition of intermediate goals, shown in Fig. 2, is described in the following. First, the time series of state transitions is acquired between the initial and goal states through trial and error. The term trial and error means that an agent selects a task at random and then selects an action of the selected task at random. The agent backtracks the time series of state transitions and acquires the state in which each task is accomplished as an intermediate goal. These intermediate goals are referred to as the first group of intermediate goals. More details are provided in Section II-A. The second group of intermediate goals is acquired by considering the last acquired intermediate goal as the goal state of the entire task (problem), and the acquisition of intermediate goals is repeated. Similarly, the third, fourth, and further groups of intermediate goals can be acquired. More details are provided in Section II-B. The entire set of intermediate goals is referred to as a series of intermediate goals.

Fig. 2. Flowchart of acquiring intermediate goals by backtracking the time series of state transitions of trial and error.

A. Acquisition of the First Group of Intermediate Goals

The state of the agent at time t is defined as (f_1(t), f_2(t), ..., f_i(t), ..., f_N(t)), where i represents the task number. Similarly, an intermediate goal is referred to as (g_1(j), g_2(j), ..., g_N(j)), where j represents the number of the intermediate goal. An agent acquires the state in which each task is accomplished as an intermediate goal. In order to find the states in which each task is accomplished, the state of accomplishment of a task i at time t is described by the function F_i(t) as follows:

F_i(t) = 1, if f_i(t) is a nongoal state; F_i(t) = 0, if f_i(t) is a goal state.   (1)

Define ΔF_i(t) as follows:

ΔF_i(t) = F_i(t) − F_i(t − 1)   (2)

where the step intervals are fixed (Δt = 1). Next, the time t̄_i at which task i is finally accomplished is obtained as follows:

t̄_1 = max(t; ΔF_1(t) = −1), t̄_2 = max(t; ΔF_2(t) = −1), ..., t̄_N = max(t; ΔF_N(t) = −1)   (3)

where max(variable; term1, term2, ...) outputs the maximum value of the variable among those that meet all of the conditions term1, term2, .... The reason why we take the maximum of t is that there is the possibility of reaching a goal state several times throughout the state transitions of trial and error, as the order restrictions among tasks are unknown to the agent in the trial-and-error phase. In the case of ΔF_i(t) = 0 throughout the state transitions of trial and error, t̄_i is set to 0. Rearrange the t̄_i in ascending order to obtain τ_j, so that τ_1 ≤ τ_2 ≤ ... ≤ τ_N. An agent can acquire both the intermediate goals (g_1(j), g_2(j), ..., g_N(j)) and the states of task accomplishment (G_1(j), G_2(j), ..., G_N(j)) by substituting j into the following equation in the order j = 1, j = 2, ..., j = N:

(g_1(j), g_2(j), ..., g_N(j)) = (f_1(τ_j), f_2(τ_j), ..., f_N(τ_j))
(G_1(j), G_2(j), ..., G_N(j)) = (F_1(τ_j), F_2(τ_j), ..., F_N(τ_j)).   (4)

B. Acquiring the Second, Third, etc., Groups of Intermediate Goals

Set the last acquired intermediate goal in the first group (the state at time τ_1) as a new goal state, and then renew F_i(t) (0 ≤ t ≤ τ_1) as follows:

F_1(t) = 0, if f_1(t) = f_1(τ_1); F_1(t) = 1, if f_1(t) ≠ f_1(τ_1)
F_2(t) = 0, if f_2(t) = f_2(τ_1); F_2(t) = 1, if f_2(t) ≠ f_2(τ_1)
...
F_N(t) = 0, if f_N(t) = f_N(τ_1); F_N(t) = 1, if f_N(t) ≠ f_N(τ_1).   (5)

The intermediate goals g'_i(j) of the second group and the states of task accomplishment G'_i(j) can be acquired from (4). Renew both the intermediate goals and the accomplishment states of the intermediate goals as follows:

(g_1(j), g_2(j), ..., g_N(j)) = (g'_1(j), g'_2(j), ..., g'_N(j)), for 0 < j ≤ N
(g_1(j), g_2(j), ..., g_N(j)) = (g_1(j − N), g_2(j − N), ..., g_N(j − N)), for N < j ≤ N + N'
(G_1(j), G_2(j), ..., G_N(j)) = (G'_1(j), G'_2(j), ..., G'_N(j)), for 0 < j ≤ N
(G_1(j), G_2(j), ..., G_N(j)) = (G_1(j − N), G_2(j − N), ..., G_N(j − N)), for N < j ≤ N + N'   (6)

where N' and N represent the total number of intermediate goals of the first group and the total number of intermediate goals of the second group, respectively. The third, fourth, etc., groups of intermediate goals can be acquired by repeating the same process until the initial state is reached.
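To make the backtracking of (1)-(6) concrete, the following Python sketch extracts a series of intermediate goals from one recorded trial-and-error trajectory. It is an illustration only, not the authors' implementation: the trajectory format (a list of per-task states), the is_goal predicate, and the equality test used for the later groups are assumptions of this sketch.

```python
from typing import Callable, List, Sequence

def first_group(trajectory: List[Sequence], is_goal: Callable[[int, object], bool]) -> List[int]:
    """Times tau_j (ascending) at which each task last enters its goal state.
    trajectory[t][i] is f_i(t); is_goal(i, s) tells whether s is a goal state
    of task i, as in (1)."""
    num_tasks = len(trajectory[0])
    taus = []
    for i in range(num_tasks):
        # F_i(t) = 0 in a goal state, 1 otherwise (eq. (1))
        F = [0 if is_goal(i, states[i]) else 1 for states in trajectory]
        tau_i = 0  # stays 0 if Delta F_i(t) is never -1 during the trial
        for t in range(1, len(F)):
            if F[t] - F[t - 1] == -1:  # eq. (2): task i just reached its goal
                tau_i = t              # keep the largest such t (eq. (3))
        taus.append(tau_i)
    return sorted(taus)                # tau_1 <= tau_2 <= ... <= tau_N

def acquire_series(trajectory: List[Sequence], is_goal) -> List[Sequence]:
    """Whole series of intermediate goals (Sections II-A and II-B): each new
    group is obtained on the prefix of the trajectory ending at the last
    acquired goal and is prepended to the series, as in (6)."""
    series: List[Sequence] = []
    end = len(trajectory) - 1
    goal_test = is_goal                # eq. (1) for the first group
    while end > 0:
        taus = first_group(trajectory[:end + 1], goal_test)
        series = [trajectory[t] for t in taus] + series   # eqs. (4) and (6)
        if taus[0] <= 0 or taus[0] >= end:
            break                      # reached the initial state, or no progress
        end = taus[0]                  # last acquired goal becomes the new goal
        new_goal = trajectory[end]
        # for later groups, "goal" means equality with the new goal state (eq. (5))
        goal_test = lambda i, s, g=new_goal: s == g[i]
    return series
```

Given a recorded trajectory, acquire_series(trajectory, is_goal) returns the intermediate goals ordered from the one closest to the initial state to the final goal state, which matches the order in which the agent later visits them.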

C. Use of Acquired Intermediate Goals

An agent can reach the goal state by reaching each intermediate goal one by one. The agent can also drop the completed tasks between intermediate goals. We call the group of tasks i that satisfy T_i < j at intermediate goal j the completed tasks. Here, T_i represents the number of the intermediate goal at which task i is completed, and it can be obtained as follows:

T_1 = max(j; ΔG_1(j) < 0), T_2 = max(j; ΔG_2(j) < 0), ..., T_N = max(j; ΔG_N(j) < 0).   (7)

Consequently, an agent can reduce its load: it need not consider the completed tasks that satisfy T_i < j when it aims at intermediate goal j + 1 from intermediate goal j. This rule holds within every group of intermediate goals. Here, we call task i with index T_i the target task between intermediate goal T_i − 1 and intermediate goal T_i. This means that the agent aims to transfer the state of task i to intermediate goal T_i from intermediate goal T_i − 1. The flowchart of the usage part of the algorithm is shown in Fig. 3. Note that in between the intermediate goals, the agent selects a task at random, excluding the dropped tasks, and then selects an action of the selected task at random.

Fig. 3. Flowchart of using intermediate goals acquired in Section II-A.
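The bookkeeping of (7) can be written down in a few lines. The sketch below is again an illustration under assumptions: the accomplishment flags are stored as a list G, with G[j][i] holding G_i(j) and G[0] taken at the state before the first intermediate goal; the helper names are not from the paper.

```python
def completion_indices(G):
    """T_i from eq. (7): the largest intermediate-goal number j at which the
    accomplishment flag of task i drops, i.e. G_i(j) - G_i(j-1) < 0.
    G[j][i] is G_i(j) (0 = task i is in its goal state, 1 = it is not)."""
    num_goals, num_tasks = len(G), len(G[0])
    T = [0] * num_tasks
    for i in range(num_tasks):
        for j in range(1, num_goals):
            if G[j][i] - G[j - 1][i] < 0:
                T[i] = j
    return T

def tasks_between(T, j):
    """Tasks to consider when heading from intermediate goal j to j + 1:
    completed tasks (T_i < j) are dropped, and the task with T_i = j + 1 is
    the target task of this segment (Section II-C)."""
    remaining = [i for i, t in enumerate(T) if t >= j]
    target = [i for i, t in enumerate(T) if t == j + 1]
    return remaining, target
```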
D. Discussion of Computational Complexity

The proposed algorithm can solve the problem with the following calculation cost. There are two kinds of calculation costs: the cost of acquiring intermediate goals, and the cost of using intermediate goals to complete the whole task. The former cost corresponds to the total number of order restrictions between one task and the other tasks. The order of this cost is less than O(C(N, 2)) = O(N(N − 1)/2) = O(N^2), which is the number of all possible combinations of tasks, where N represents the number of tasks. The latter cost is the total number of intermediate goals multiplied by the calculation cost to reach an intermediate goal from the previous intermediate goal. The first factor bears the same cost as acquiring intermediate goals, O(N^2). The second factor bears the same cost as completing a single task between intermediate goals. This is because intermediate goals are designed, in our proposed method, so that a single task, which we call the target task as described in Section II-C, is completed between consecutive intermediate goals. The calculation cost to complete a single task depends on the application. In the case of the application described in the following section, the rearrangement problem of multiple objects, the calculation cost to complete a single task is the total number of actions of a task (N_a) multiplied by the cost of calculating an action (C_a). Therefore, the total cost to solve the rearrangement problem with our approach becomes O(N^2 N_a C_a). More detailed definitions of both N_a and C_a can be found in [2].

In the case of the heuristic approach proposed in [2], the total cost to complete the rearrangement problem becomes O(N^3 N N_a C_a) = O(N^4 N_a C_a), because the cost of calculating a heuristic function using a large-arc algorithm [8], which is one of the solutions of the stacker crane problem, is O(N^3), and calculating the cost of actions for all of the tasks costs O(N N_a C_a). Therefore, our approach has the capability to reach the goal with a smaller computational cost than the heuristic approach.

III. REARRANGEMENT OF MULTIPLE OBJECTS

Here, the method is adapted to the task of rearranging multiple objects. An example is shown in Fig. 1. The agent transfers each object to its own goal position as quickly as possible. This task has multiple applications, such as the conveyance of objects in warehouses or factories.

A. Defining the Task

The task is defined for each object (Table I).

TABLE I. Definition of the task for the rearrangement problem of multiple objects.

The state space consists of three state variables (x, y, θ), which represent the location of one single object: the X and Y coordinates and the angle with respect to the X- and Y-axes. Therefore, the rearrangement problem of three objects consists of nine state variables. The action space consists of two actions: transferring an object to the (intermediate) goal state and transferring an object to a temporary configuration. The temporary configuration is selected at random from the free space of the environment. The moving algorithm of the agent is based on the cell-decomposition method in the following simulation and experiment [9].
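To make the task definition of Table I concrete, the sketch below encodes the state and action spaces of Section III-A in Python. The class and function names, and the way the free space is represented, are illustrative assumptions, not the paper's implementation.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class ObjectState:
    x: float      # X coordinate of the object
    y: float      # Y coordinate of the object
    theta: float  # angle with respect to the X- and Y-axes

# The rearrangement problem with three objects therefore has a joint state of
# nine variables: one ObjectState per object.

def transfer_to_goal(goal: ObjectState) -> ObjectState:
    """Action 1: transfer the object to its (intermediate) goal configuration."""
    return goal

def transfer_to_temporary(free_space: List[ObjectState]) -> ObjectState:
    """Action 2: transfer the object to a temporary configuration chosen at
    random from the free space of the environment (here, a sampled list)."""
    return random.choice(free_space)
```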

B. Simulation Results

We simulate the rearrangement problem under the above definition of the task and the abilities of the agent. The initial and goal states of the simulation are shown in Fig. 1(a) and (b), respectively. The computer is a Pentium IV 2.8-GHz PC, the simulation environment is Linux, and the programming language is C++. The following matters are discussed: 1) the effects of intermediate goals; 2) the properties of the acquired intermediate goals.

C. Effects of Intermediate Goals

TABLE II. Comparison of the moving distance required to accomplish the rearrangement problem; average of 100 trials.

The comparison of the moving distance required to accomplish the task, averaged over 100 trials, is shown in Table II. When using intermediate goals, the agent could accomplish the task with a moving distance of 17 m; without intermediate goals, the agent required roughly fifteen times that distance (Table II). When the agent drops the completed tasks, the moving distance to transfer all of the objects to their goal configuration is 2/3 of that without dropping the completed tasks. These findings demonstrate the efficacy of using intermediate goals and of dropping completed tasks.

D. Properties of Acquired Intermediate Goals

Fig. 4. Series of intermediate goals of the rearrangement problem with three objects.

TABLE III. Properties of the acquired intermediate goals shown in Fig. 4.

Fig. 4(a) is the initial state, and Fig. 4(b)–(n) shows the acquired intermediate goals. The first group of intermediate goals [Fig. 4(l), (m), (n)] is acquired first; then, the state of Fig. 4(l) is set as a new goal state, and the second group of intermediate goals [Fig. 4(j), (k), (l)] is acquired so as to reach the new goal state [Fig. 4(l)]. After repeating a similar process, all of the intermediate goals are acquired. The detailed structure of the intermediate goals is shown in Table III. The meanings of the terms completed task and target task are described in Section II-C. In the case of this problem, the objects must be rearranged in the order of objects 1, 2, and 3 in order to reach the goal state shown in Fig. 4(n). Fig. 4(l), (m), (n) shows this order of rearranging the objects. In addition, objects 1–3 must first be transferred to configurations from which the robot can hold them in the order 1, 2, and 3 so as to follow the order of rearrangement. The configuration of the objects shown in Fig. 4(k) shows that the objects have been transferred to configurations where the robot can hold them in the order of 1, 2, and 3. From the above, appropriate intermediate goals for rearranging objects 1–3 are considered to have been acquired.

IV. ADAPTATION TO THE REAL-WORLD ENVIRONMENT

Here, we carry out the rearrangement task of multiple objects with a real robot, using the intermediate goals acquired in simulation. This experiment has a couple of important issues. One is that the number of objects the robot recognizes changes dynamically throughout the experiment, because the robot does not know of the existence of the movable objects at the beginning of the experiment. Therefore, the robot must learn the intermediate goals for environments with one, two, and three movable objects, respectively, in a series of simulations in advance. The other is that the locations of the objects that the robot recognizes differ from the simulation, and the robot must adapt the acquired intermediate goals to the real environment.

A. Simulation to Acquire Intermediate Goals

Fig. 5. Four different goal states. We acquired the series of intermediate goals for each of these four goal states in advance of the experiment.

We experiment using a real robot to confirm that the intermediate goals can be adapted to a real environment. In advance of the experiment, we acquired the series of intermediate goals for the four different goal states shown in Fig. 5. We acquired 8000 series of intermediate goals by simulating 2000 different initial states for each of the four goal states. In the experimental phase, the robot selects the intermediate goals closest to the current state from the set of acquired intermediate goals whose goal state corresponds to the number of recognized objects. For example, in the state where the robot recognizes the two objects A and B, the robot selects the intermediate goals from the set of intermediate goals whose goal state is shown in Fig. 5(a). If the robot recognizes the three objects A, B, and C, it selects the intermediate goals from the set of those whose goal state is shown in Fig. 5(d). In the case where the robot recognizes only one object, the robot simply transfers that object to its goal position.

B. Experimental Equipment

Fig. 6. Mobile robot used in our experiment.

The mobile robot used in this experiment (Fig. 6), which was developed by the Institute of Physical and Chemical Research [10], has an omnidirectional mechanism as its motion system. In order to grasp an object, the robot is equipped with a lift-up mechanism. As a sensory system, the robot has a charge-coupled device (CCD) camera, a QR-code decoder, and a SICK laser range sensor. The agent can localize itself in the environment using the laser range finder. A mark, which consists of a reflector and a QR-code, is attached to the four faces of each movable object. The following information is written in the QR-code: 1) the target configuration of the object; 2) the number of the face of the object; 3) the shape of the object; and 4) the holding position of the object. The agent first detects the position of the reflectors attached to an object within its sensing area, which is a circle with radius 0.8 m. Second, the agent approaches the object to read the QR-code and thereby recognizes the object.
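Before turning to the experimental results, the selection rule of Section IV-A, which picks from the stored series the intermediate goal closest to the current state among the series whose goal state matches the number of recognized objects, can be sketched as follows. The library layout, the per-object (x, y, theta) tuples, and the use of a planar Euclidean distance that ignores orientation are assumptions of this illustration, not the paper's implementation.

```python
import math

def configuration_distance(state_a, state_b):
    """Distance between two joint states, summed over the recognized objects.
    Each per-object state is (x, y, theta); orientation is ignored here,
    which is an assumption of this sketch."""
    return sum(math.hypot(a[0] - b[0], a[1] - b[1])
               for a, b in zip(state_a, state_b))

def select_intermediate_goals(current_state, library):
    """Pick, from the library of series acquired in simulation, the series
    whose goal state matches the number of recognized objects and whose
    closest intermediate goal is nearest to the current state."""
    n = len(current_state)                      # number of recognized objects
    candidates = [series for series in library if len(series[-1]) == n]
    best_series, best_index, best_d = None, None, float("inf")
    for series in candidates:
        for idx, goal in enumerate(series):
            d = configuration_distance(current_state, goal)
            if d < best_d:
                best_series, best_index, best_d = series, idx, d
    # the robot then follows best_series from best_index onward
    return best_series, best_index
```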
C. Experimental Results

Fig. 7. (a) Initial state and (b) goal state of our experiment.

Both the initial and goal states are shown in Fig. 7. The size of the environment is 2.7 m × 3.5 m. The sequence of the experiment is shown in Fig. 8(a)–(h). The configurations of the objects shown in these figures were measured during the experiment using the laser range finder.

Fig. 8. View of our experiment, panels (a)–(h).

First, the robot explores the environment in order to find an object. We used the point-distributing algorithm proposed in [11] to generate the exploration path. While following the exploration path, the robot finds a new object A [Fig. 8(a)]. The robot obtains specific information from the attached mark, such as the relative configuration to object A, the goal configuration of object A, the shape of object A, and the grasping configuration of object A. The robot interrupts the exploration of the environment and tries to transfer object A to its goal configuration, since there is only one recognized object. While carrying object A to its goal configuration, the robot encounters object B [Fig. 8(b)]. The motion is interrupted, and the appropriate intermediate goals are selected from the set of series of intermediate goals. The agent transfers object A to its intermediate goal configuration [Fig. 8(c)] and then transfers object B to its intermediate goal configuration [Fig. 8(d)]. While rearranging object B, the agent finds object C and then selects the intermediate goals adaptable to the current state [Fig. 8(d)]. The agent transfers object B to its intermediate goal configuration [Fig. 8(e)], then transfers object C to its goal configuration [Fig. 8(f)], and then transfers objects B and A to their goal configurations one by one [Fig. 8(g) and (h)]. Since the entire working area has been recognized and no other objects have been found, the robot returns to its initial position. The above experimental results demonstrate that an agent could adapt the intermediate goals, which had been acquired in simulation, for use in an experimental environment.

V. CONCLUSION

In this paper, we proposed a method that acquires intermediate goals satisfying the order restrictions among tasks, so that an agent can learn actions through the execution of multiple tasks. In the simulation of the problem of rearranging multiple objects, the result showed that the moving distance to transfer all of the objects to their goal configuration was 1/15 of that without using intermediate goals.

We experimented using a real robot to confirm that the intermediate goals can be adapted to a real environment. Our experimental results showed that an agent could adapt the intermediate goals, which were acquired in simulation, to the experimental environment. Future work includes the topics of the inability to acquire optimal actions between intermediate goals, and the sensitivity of the intermediate goals to the particular time series of state transitions of trial and error. The first topic can be addressed by applying an action-selection module, such as reinforcement learning or planning, to the selection of actions between intermediate goals. The second topic can be dealt with by acquiring intermediate goals from multiple time series of state transitions of trial and error.

ACKNOWLEDGMENT

The authors would like to thank Prof. H. Asama for his valuable comments and assistance on this work.

REFERENCES

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
[2] J. Ota, "Rearrangement planning of multiple mobile objects by using real-time search methodology," in Proc. IEEE Int. Conf. Robot. Autom., 2002.
[3] S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," in Proc. Nat. Conf. Artif. Intell., 1991.
[4] A. McGovern and A. G. Barto, "Automatic discovery of subgoals in reinforcement learning using diverse density," in Proc. 18th Int. Conf. Mach. Learning, 2001.

[5] L. P. Kaelbling, "Hierarchical learning in stochastic domains: Preliminary results," in Proc. Int. Conf. Mach. Learning, 1993.
[6] T. Lane and L. P. Kaelbling, "Nearly deterministic abstractions of Markov decision processes," in Proc. 18th Nat. Conf. Artif. Intell., 2002.
[7] Y. Takahashi, K. Edazawa, and M. Asada, "Multi-controller fusion in multi-layered reinforcement learning," in Proc. Int. Conf. Multisensor Fusion Integr. Intell. Syst., 2001.
[8] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. New York: Wiley.
[9] S. Kambhampati and L. S. Davis, "Multiresolution path planning for mobile robots," IEEE J. Robot. Autom., vol. RA-2, no. 3, Jun.
[10] H. Asama, M. Sato, L. Bogoni, H. Kaetsu, A. Matsumoto, and I. Endo, "Development of an omni-directional mobile robot with 3DOF decoupling drive mechanism," in Proc. IEEE Int. Conf. Robot. Autom., 1995.
[11] Y. Fukazawa, C. Trevai, J. Ota, H. Yuasa, T. Arai, and H. Asama, "Region exploration path planning for a mobile robot expressing working environment by grid points," in Proc. IEEE Int. Conf. Robot. Autom., 2003.


Receding Horizon Planning for Dubins Traveling Salesman Problems Receding Horizon Planning for Dubins Traveling Salesman Problems Xiang Ma and David A. Castañón Abstract In this paper we study the problem of finding the shortest global path of non-holonomic vehicles

More information

Continuous Valued Q-learning for Vision-Guided Behavior Acquisition

Continuous Valued Q-learning for Vision-Guided Behavior Acquisition Continuous Valued Q-learning for Vision-Guided Behavior Acquisition Yasutake Takahashi, Masanori Takeda, and Minoru Asada Dept. of Adaptive Machine Systems Graduate School of Engineering Osaka University

More information

Proc. 14th Int. Conf. on Intelligent Autonomous Systems (IAS-14), 2016

Proc. 14th Int. Conf. on Intelligent Autonomous Systems (IAS-14), 2016 Proc. 14th Int. Conf. on Intelligent Autonomous Systems (IAS-14), 2016 Outdoor Robot Navigation Based on View-based Global Localization and Local Navigation Yohei Inoue, Jun Miura, and Shuji Oishi Department

More information

Stochastic Petri Nets Supporting Dynamic Reliability Evaluation

Stochastic Petri Nets Supporting Dynamic Reliability Evaluation International Journal of Materials & Structural Reliability Vol.4, No.1, March 2006, 65-77 International Journal of Materials & Structural Reliability Stochastic Petri Nets Supporting Dynamic Reliability

More information

Behavior Learning for a Mobile Robot with Omnidirectional Vision Enhanced by an Active Zoom Mechanism

Behavior Learning for a Mobile Robot with Omnidirectional Vision Enhanced by an Active Zoom Mechanism Behavior Learning for a Mobile Robot with Omnidirectional Vision Enhanced by an Active Zoom Mechanism Sho ji Suzuki, Tatsunori Kato, Minoru Asada, and Koh Hosoda Dept. of Adaptive Machine Systems, Graduate

More information

Hand-Eye Calibration from Image Derivatives

Hand-Eye Calibration from Image Derivatives Hand-Eye Calibration from Image Derivatives Abstract In this paper it is shown how to perform hand-eye calibration using only the normal flow field and knowledge about the motion of the hand. The proposed

More information

AR Cultural Heritage Reconstruction Based on Feature Landmark Database Constructed by Using Omnidirectional Range Sensor

AR Cultural Heritage Reconstruction Based on Feature Landmark Database Constructed by Using Omnidirectional Range Sensor AR Cultural Heritage Reconstruction Based on Feature Landmark Database Constructed by Using Omnidirectional Range Sensor Takafumi Taketomi, Tomokazu Sato, and Naokazu Yokoya Graduate School of Information

More information

Robotics. Chapter 25. Chapter 25 1

Robotics. Chapter 25. Chapter 25 1 Robotics Chapter 25 Chapter 25 1 Outline Robots, Effectors, and Sensors Localization and Mapping Motion Planning Chapter 25 2 Mobile Robots Chapter 25 3 Manipulators P R R R R R Configuration of robot

More information

Jo-Car2 Autonomous Mode. Path Planning (Cost Matrix Algorithm)

Jo-Car2 Autonomous Mode. Path Planning (Cost Matrix Algorithm) Chapter 8.2 Jo-Car2 Autonomous Mode Path Planning (Cost Matrix Algorithm) Introduction: In order to achieve its mission and reach the GPS goal safely; without crashing into obstacles or leaving the lane,

More information

Robot Motion Planning in Eight Directions

Robot Motion Planning in Eight Directions Robot Motion Planning in Eight Directions Miloš Šeda and Tomáš Březina Abstract In this paper, we investigate the problem of 8-directional robot motion planning where the goal is to find a collision-free

More information

Realtime Omnidirectional Stereo for Obstacle Detection and Tracking in Dynamic Environments

Realtime Omnidirectional Stereo for Obstacle Detection and Tracking in Dynamic Environments Proc. 2001 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems pp. 31-36, Maui, Hawaii, Oct./Nov. 2001. Realtime Omnidirectional Stereo for Obstacle Detection and Tracking in Dynamic Environments Hiroshi

More information

A Fuzzy Reinforcement Learning for a Ball Interception Problem

A Fuzzy Reinforcement Learning for a Ball Interception Problem A Fuzzy Reinforcement Learning for a Ball Interception Problem Tomoharu Nakashima, Masayo Udo, and Hisao Ishibuchi Department of Industrial Engineering, Osaka Prefecture University Gakuen-cho 1-1, Sakai,

More information

Motion Planning. Howie CHoset

Motion Planning. Howie CHoset Motion Planning Howie CHoset What is Motion Planning? What is Motion Planning? Determining where to go Overview The Basics Motion Planning Statement The World and Robot Configuration Space Metrics Algorithms

More information

Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks

Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks 60 IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 3, NO. 1, MARCH 2002 Adaptations of the A* Algorithm for the Computation of Fastest Paths in Deterministic Discrete-Time Dynamic Networks

More information

Lecture 11 Combinatorial Planning: In the Plane

Lecture 11 Combinatorial Planning: In the Plane CS 460/560 Introduction to Computational Robotics Fall 2017, Rutgers University Lecture 11 Combinatorial Planning: In the Plane Instructor: Jingjin Yu Outline Convex shapes, revisited Combinatorial planning

More information

Robot Localization based on Geo-referenced Images and G raphic Methods

Robot Localization based on Geo-referenced Images and G raphic Methods Robot Localization based on Geo-referenced Images and G raphic Methods Sid Ahmed Berrabah Mechanical Department, Royal Military School, Belgium, sidahmed.berrabah@rma.ac.be Janusz Bedkowski, Łukasz Lubasiński,

More information

Lecture 3: Motion Planning 2

Lecture 3: Motion Planning 2 CS 294-115 Algorithmic Human-Robot Interaction Fall 2017 Lecture 3: Motion Planning 2 Scribes: David Chan and Liting Sun - Adapted from F2016 Notes by Molly Nicholas and Chelsea Zhang 3.1 Previously Recall

More information

A Quantitative Stability Measure for Graspless Manipulation

A Quantitative Stability Measure for Graspless Manipulation A Quantitative Stability Measure for Graspless Manipulation Yusuke MAEDA and Tamio ARAI Department of Precision Engineering, School of Engineering The University of Tokyo 7-3-1 Hongo, Bunkyo-ku, Tokyo

More information

Autonomous and Mobile Robotics Prof. Giuseppe Oriolo. Motion Planning 1 Retraction and Cell Decomposition

Autonomous and Mobile Robotics Prof. Giuseppe Oriolo. Motion Planning 1 Retraction and Cell Decomposition Autonomous and Mobile Robotics Prof. Giuseppe Oriolo Motion Planning 1 Retraction and Cell Decomposition motivation robots are expected to perform tasks in workspaces populated by obstacles autonomy requires

More information

A High Speed Face Measurement System

A High Speed Face Measurement System A High Speed Face Measurement System Kazuhide HASEGAWA, Kazuyuki HATTORI and Yukio SATO Department of Electrical and Computer Engineering, Nagoya Institute of Technology Gokiso, Showa, Nagoya, Japan, 466-8555

More information

Path Planning with Motion Optimization for Car Body-In-White Industrial Robot Applications

Path Planning with Motion Optimization for Car Body-In-White Industrial Robot Applications Advanced Materials Research Online: 2012-12-13 ISSN: 1662-8985, Vols. 605-607, pp 1595-1599 doi:10.4028/www.scientific.net/amr.605-607.1595 2013 Trans Tech Publications, Switzerland Path Planning with

More information

Neuro-fuzzy Learning of Strategies for Optimal Control Problems

Neuro-fuzzy Learning of Strategies for Optimal Control Problems Neuro-fuzzy Learning of Strategies for Optimal Control Problems Kaivan Kamali 1, Lijun Jiang 2, John Yen 1, K.W. Wang 2 1 Laboratory for Intelligent Agents 2 Structural Dynamics & Control Laboratory School

More information

Intermediate Desired Value Approach for Continuous Transition among Multiple Tasks of Robots

Intermediate Desired Value Approach for Continuous Transition among Multiple Tasks of Robots 2 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-3, 2, Shanghai, China Intermediate Desired Value Approach for Continuous Transition among Multiple

More information

Wave front Method Based Path Planning Algorithm for Mobile Robots

Wave front Method Based Path Planning Algorithm for Mobile Robots Wave front Method Based Path Planning Algorithm for Mobile Robots Bhavya Ghai 1 and Anupam Shukla 2 ABV- Indian Institute of Information Technology and Management, Gwalior, India 1 bhavyaghai@gmail.com,

More information

Short Papers. Adaptive Navigation of Mobile Robots with Obstacle Avoidance

Short Papers. Adaptive Navigation of Mobile Robots with Obstacle Avoidance 596 IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL., NO. 4, AUGUST 997 Short Papers Adaptive Navigation of Mobile Robots with Obstacle Avoidance Atsushi Fujimori, Peter N. Nikiforuk, and Madan M. Gupta

More information