Robot behaviour learning using Topological Gaussian Adaptive Resonance Hidden Markov Model
ORIGINAL ARTICLE

Farhan Dawood (1) · Chu Kiong Loo (2)

Received: 14 January 2015 / Accepted: 9 August 2015. The Natural Computing Applications Forum 2015

Abstract Behaviour learning by robots is an emerging research field. Behaviour learning in a realistic environment requires the robot to learn incrementally so that it can adapt to its dynamic surroundings. Such an incremental learning system should remain stable while acquiring new information, without corrupting previously acquired knowledge. Probabilistic approaches have become the prevalent paradigm in robotic learning systems. In probabilistic models such as the Hidden Markov Model, estimating the optimal number of states is difficult, which restricts the model by requiring the number of motion patterns to be fixed a priori. In this paper, we present an approach for learning behaviour patterns through continuous observation. We propose a novel architecture for learning spatio-temporal sequences: the Topological Gaussian Adaptive Resonance Hidden Markov Model. The proposed model dynamically generates a graph-based structure for the observed patterns through a novel topological mapping architecture. The structure (number of states) of the probabilistic model is not fixed but is adjusted according to the topological map of the acquired motion features. The topological map consists of nodes connected by edges, where each node encodes motion features so as to adaptively generalize the observed behaviour patterns.
The model combines self-organizing and self-stabilizing properties with time series processing to learn spatio-temporal sequences. To demonstrate the properties of the proposed algorithm, we have performed a set of experiments in simulation on the DARwIn-OP humanoid robot.

Keywords Behaviour learning · Adaptive resonance theory · Gaussian distribution · Hidden Markov Model · Incremental learning · Topological map

Chu Kiong Loo: ckloo.um@gmail.com; ckloo.um@um.edu.my. Farhan Dawood: farhan.dawood@gmail.com. (1) Advanced Robotics Lab, University of Malaya, Lembah Pantai, Kuala Lumpur, Malaysia. (2) Department of Artificial Intelligence, University of Malaya, Lembah Pantai, Kuala Lumpur, Malaysia.

1 Introduction

Due to recent developments in robotics, robots are able to move and act in human-centred environments, taking part in our daily lives. This has introduced the need for robotic systems equipped with behaviour learning capabilities. When a robot operates in an environment, it may encounter novel events that were not learned previously, which renders a fixed model ineffective in such dynamic settings. The robot must therefore be able to incorporate these changes so that it can adapt efficaciously to novel situations encountered during learning. A natural way for the robot to represent such behaviours is through probabilistic models. Hidden Markov Models (HMM) [1], originally developed for speech recognition and synthesis, have been widely used for human motion recognition and generation. An HMM takes sequential data as input and outputs probabilities, and its use is ubiquitous in time series signal processing. The ability of HMMs to generalize human actions has led to several methods for HMM-based movement recognition and generation [2, 3]. However, most of these methods operate offline, where the model structure is static and determined beforehand. In offline or
batch learning methods, the motion patterns are processed sequentially. Such interaction can seem artificial, since the patterns are pre-defined by the designer and cannot accommodate the new properties of various tasks. The stochastic model should therefore allow new information to be incorporated incrementally. One major dilemma associated with incrementally learning and memorizing observed behavioural actions is the stability-plasticity dilemma [4]: how can a system retain old memories while still learning new ones? For life-long learning in a dynamic environment, the system must adapt to changing situations by observing new patterns while keeping previous knowledge stable. Another issue with well-established stochastic learning algorithms lies in estimating the structure of the model. Setting an appropriate number of states is difficult because the number of motion patterns in the data is unknown. If there are fewer states than observed trajectory components, the model cannot adequately explain the patterns. Conversely, if the number of states is large, the system requires too much training data and excessive computation. The central question is thus how to find a model structure that best approximates the given environment without loss of information. In the literature, the structure of an HMM, i.e. its number of states, is either chosen manually or estimated using methods such as the Bayesian information criterion (BIC) [5] or the Akaike information criterion (AIC) [6]. This limits the number of motion patterns that can be learned a priori and results in a trade-off between model fitness and number of parameters. In this paper, we propose a novel probabilistic model for encoding spatio-temporal motion patterns in a self-stabilizing and self-scaling architecture.
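To make the model-selection trade-off concrete, BIC scores a candidate model by its fit minus a penalty on parameter count. The following sketch is our own illustration (not code from the paper, and the numbers are hypothetical); it shows how a better-fitting but larger HMM can still lose to a smaller one:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: lower is better.
    The n_params * log(n_obs) term penalizes larger models."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical scores for two candidate HMMs fit to the same 500 observations:
score_small = bic(log_likelihood=-1200.0, n_params=24, n_obs=500)  # 3 states
score_large = bic(log_likelihood=-1190.0, n_params=60, n_obs=500)  # 5 states
# the larger model fits slightly better, yet BIC prefers the smaller one
```

Selecting the number of states this way requires re-fitting and scoring every candidate, which is exactly the offline procedure that incremental approaches seek to avoid.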
The major contribution of our work is a novel incremental learning architecture with an adaptive structure that addresses the following problems: (1) how to select the structure of the probabilistic model, i.e. estimate the number of states needed to efficiently encode the sensory data without restricting the robot's learning capabilities; (2) how the model can retain previously learned data while acquiring newly observed knowledge in a self-organizing, self-adaptive manner; and (3) how the robot can interact with the environment incrementally, so that newly observed patterns are incorporated into the model. For this purpose, we propose a novel Topological Gaussian Adaptive Resonance Hidden Markov Model (TGAR-HMM). In contrast to a conventional HMM, the developed algorithm incrementally learns spatio-temporal behavioural sequences by building a graph-based structure of the behaviour patterns in the form of a topological map, using the Topological Gaussian Adaptive Resonance Map (TGARM). The topological model incrementally updates the number of states and the parameters required by the probabilistic model to encode the observed motion elements, compactly describing the environment as a collection of nodes linked by edges. TGARM creates a stable representation while incorporating new knowledge. After the data are organized, the temporal sequence is learned through the TGAR-HMM, where each node represents a state of the probabilistic model and an edge between two nodes represents a transition between the corresponding states. This state-space distribution allows transitions between neighbouring states only. The structure of the model and the parameters representing the probabilities are updated as new observations are acquired. Figure 1 shows a graphical representation of the proposed architecture.
To evaluate the performance of the proposed architecture, we tested our algorithm on two different types of datasets and analysed its performance. These tests were simulated on the DARwIn-OP humanoid robot. The actions involved range from simple behavioural gestures to the complex action of kicking a ball. The rest of the paper is organized as follows: the next section reviews related work. In Sect. 3, we discuss the proposed approach for learning motion patterns through TGAR-HMM. The experimental results and analysis are described in Sect. 4. Finally, in Sect. 5, we conclude with a discussion of the proposed model.

2 Related work

Adaptive resonance theory (ART) models the cognitive and neural theory of how the brain independently learns to categorize and recognize events in a dynamic environment [7, 8]. The prominent features of this family of neural networks led engineers to design various models such as ART-1 [9] (for binary input patterns), ART-2 [10] (for analog and binary input patterns) and Fuzzy ART [11] (a combination of fuzzy logic and ART). However, learning complex spatio-temporal sequences with ART networks is still under development. To learn the temporal order of sequential events, Bradski et al. [12] presented a working memory model called the Sustained Temporal Order REcurrent (STORE) model, which uses an ART network to recognize temporal event sequences. Growing neural gas (GNG) is a well-known unsupervised learning method. Furao and Hasegawa [13, 14] suggested several enhancements to GNG, focusing in particular on life-long learning of non-stationary data for problems such as topology learning of images and clustering. They applied a two-
layered network, where the first layer generates a topological structure of the input data and the second layer determines the number of clusters.

[Fig. 1 Overview of the TGAR-HMM architecture. The observed behaviour sequence is first organized into a topological map, which is then used to update the state structure, estimating the optimal number of states and the transition probabilities among these states.]

Unfortunately, such neural networks do not work with spatio-temporal data such as speech or motion patterns. The high node-insertion requirement of the SOINN (self-organizing incremental neural network) and its use of a utility parameter restrain topographical formation and organization of the input data. Botzheim et al. [15] presented a structured learning approach combining GNG and a Spiking Neural Network (SNN), suitable for spatio-temporal learning. The approach contains two stages: a topology learning phase, in which GNG is applied for information extraction, and a spatio-temporal learning phase, in which the SNN is applied to interpret the spatio-temporal information and recognize the gesture. Seyhan et al. [16] developed a behaviour learning model for simple and complex robot actions using HMM and a correlation-based ART (CobART) network, an ART-2-type network. The motion primitives acquired from CobART are modelled through an HMM that represents the relations between them. The model generates different categories for the same behaviours with slight variations between them, thus providing a correlation between the motion patterns. A similar model was presented in [17] by hierarchically integrating CobART networks.
The model learns spatio-temporal sequences; however, the structure of the behaviour HMM they propose is fixed and cannot grow incrementally. Kulic et al. [6] proposed an approach for online learning of motion primitives while observing and interacting with a human collaborator. The method is based on online segmentation of the observed motion patterns into motion primitives, with an incremental learning algorithm providing the relationships between primitives for the motion generation process. A hierarchical tree structure is formed incrementally, representing the motions learned by the robot, with each node in the tree representing a motion primitive. The observed behaviours are encoded in HMMs and then clustered based on an inter-model distance measured using the relative log-likelihood. In addition to the tree structure, a directed graph model is built representing the sequential relationships between motion primitives, which is then used for motion generation [18, 19]. A similar approach has been used by Kulic et al. [20] with Factorial HMMs (FHMM). An FHMM is a generalization of the HMM in which
multiple HMM chains interact to generate a single output. Each dynamic chain models the observed motion pattern with its own transition and output probability parameters, and at each time step the outputs of the chains are summed to generate the observed output. A symbol space is constructed to discriminate between behaviour patterns. In all of these methods the number of HMMs encoding the motion patterns is not fixed, but the number of states in each HMM is pre-defined. Calinon et al. [21] presented two incremental learning approaches, a direct method and a generative method, used to update the model's parameters when novel skills are demonstrated. The probabilistic model is first learned from joint-angle trajectories observed using motion sensors and is progressively refined through a kinaesthetic process. In the direct update method, the idea is to adapt the EM algorithm by separating the part dedicated to the data already used to train the model from the part dedicated to newly available data. The model is first created from data points and updated iteratively in the EM step until the parameters converge. When new data become available, regression is used to stochastically generate new data from the current GMR model. Cho et al. [22] describe an incremental approach for learning behaviours by teaching a humanoid robot kinaesthetically, i.e. by manually moving the robot's joints. The motion patterns are encoded in a GMM described in a latent space of reduced dimensionality computed using PCA. When a new behaviour is taught, the latent space is modified and the GMM is updated. Since complete data are assumed to be unavailable, an approximation of the BIC is utilized to estimate the GMM parameters. Generalized motion patterns are produced using Gaussian Mixture Regression (GMR), and new motions can be generated by fusing two previously learned motion patterns with GMR.
Similarly, new motions can also be generated by combining motions spatially or from different time intervals [23]. Vasquez et al. [24] developed an incremental model for learning and prediction using a growing HMM. The model updates the structure of the HMM through a topological mapping algorithm called Instantaneous Topological Mapping (ITM) and was designed for vehicle motion learning and prediction. Similar to our approach, the growing HMM architecture uses the topological mapping structure to define the HMM structure; an earlier version of the algorithm employed growing neural gas (GNG). The TopoART algorithm [25] builds a topological structure for unsupervised learning using Fuzzy ART, creating a stable representation of the data; however, the model was used only for clustering and is not suitable for spatio-temporal learning. Fung et al. [26] developed an ART-network-based, task-independent robot behaviour learning mechanism called the Behaviour Learning Operating Modular (BLOM) architecture, in which the vigilance parameter of the ART network is updated through a game-theoretic adaptation strategy. The perceptual sensory inputs and actions are categorized separately through the ART network, and the association between these two category types is established through associative memories. The BLOM architecture was tested on simple path-planning experiments; however, it is not suitable for learning spatio-temporal behavioural sequences. Finally, [27] proposed an adaptive neural architecture for robot behaviour learning which grows when required (GWR); the GWR model clusters the sensory-motor data through a radial basis function (RBF) network.

3 Topological Gaussian Adaptive Resonance Hidden Markov Model

In this section, we explain the proposed incremental spatio-temporal sequence learning algorithm.
The motion features (joint angle values) acquired from the robot's sensors are first organized into a topological map using the TGARM algorithm. This topological structure is then used to update and estimate the structure of the probabilistic model that encodes the acquired features. Structurally, TGAR-HMM is similar to a standard HMM, except that the transition structure and the number of states are not constant but change as more input observation sequences are processed. In addition, the learning algorithm is able to update the model incrementally. In our architecture, an HMM can be considered a graph whose nodes represent states attainable by the object and whose edges represent transitions between these states. The system is assumed to be in a particular state and to evolve stochastically at discrete time steps by following the graph edges according to transition probabilities. TGAR-HMM is thus a time-evolving HMM with continuous observation variables, in which the number of states, the structure and the probability parameters are updated every time a new observation sequence becomes available. The main idea behind the proposed probabilistic model is that the structure of the model should reflect the spatial structure of the state-space discretization, where transitions among discrete states are permitted only if the corresponding regions are neighbours. Hence, structure learning essentially consists of estimating a suitable space discretization from the observed data and identifying neighbouring regions. We address this problem with a topological map built by the Topological Gaussian Adaptive Resonance Map (TGARM) algorithm. For parameter learning, we use an incremental expectation maximization (EM) approach to handle the changing number of states and continuous observations.
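The correspondence above (node = Gaussian = HMM state, edge = allowed transition) can be sketched as a small data structure. This is our own minimal illustration, not code from the paper; the class names and the default values for the initial prior and self-transition probability are assumptions:

```python
import numpy as np

class Node:
    """One topological-map node: a Gaussian category that doubles as an HMM state."""
    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=float)   # mu_j
        self.cov = np.asarray(cov, dtype=float)     # Sigma_j
        self.count = 0                              # n_j, patterns absorbed so far

class TGARHMM:
    """Growing HMM whose states and transitions mirror the topological map."""
    def __init__(self):
        self.nodes = []        # one entry per HMM state
        self.edges = set()     # neighbour pairs (i, j); transitions allowed only here
        self.prior = []        # pi_i for each state
        self.trans = {}        # a_ij, kept sparse: only self-loops and edges

    def add_state(self, mean, cov, pi0=0.1, a0=0.1):
        self.nodes.append(Node(mean, cov))
        i = len(self.nodes) - 1
        self.prior.append(pi0)        # new state initialized with a prior value
        self.trans[(i, i)] = a0       # and a self-transition probability
        return i

    def add_edge(self, i, j, a0=0.1):
        self.edges.add((i, j))
        self.trans.setdefault((i, j), a0)   # a new edge gets an initial a_ij

model = TGARHMM()
s0 = model.add_state(mean=[0.0, 0.0], cov=np.eye(2))
s1 = model.add_state(mean=[1.0, 0.5], cov=np.eye(2))
model.add_edge(s0, s1)
```

Keeping the transition table sparse reflects the neighbourhood constraint: a transition probability exists only where the topological map has an edge (or a self-loop).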
3.1 Motion primitive modelling: topological Gaussian adaptive resonance map

In this phase, we extract useful motion sequences, which are represented as nodes linked to each other through edges. Through the topological map, the continuous environment is represented as a state-space distribution organized as a graph in which each node is connected through edges; the function of the topological map is to build a discrete structure of the continuous environment. The input to the learning algorithm consists of a series of discrete observations (i.e. joint angle values from sensor readings) describing the motion features. The observations are arranged in sequences O_{1:T} = [O_1, ..., O_T] such that every sequence describes the trajectory of an action. We adopt the competitive Hebbian rule proposed by Martinetz [28] for topology-preserving networks to build connections between neural nodes. The competitive Hebbian rule can be stated as: for each input signal, connect the two closest nodes (measured by distance) by an edge. It has been proved that each edge of the generated graph belongs to the Delaunay triangulation corresponding to the given set of reference vectors, and that the graph is optimally topology-preserving in a very general sense (Fig. 2). We design a topological map of the observed data through a novel algorithm called TGARM. The TGARM model is based on a Gaussian mixture model of the input space in which each Gaussian component represents a category node. TGARM has the following properties: the structure grows incrementally, incorporating new knowledge without corrupting previously learned data, and responds adaptively to information acquired from the environment. This avoids catastrophic forgetting, i.e. previously learned knowledge is retained effectively. Each node in the topological map is defined as a Gaussian distribution with a mean and a covariance.
The parameters of each node in the topological map are updated for each observed input sequence rather than requiring an entire data set, which makes TGARM more resistant to noise and lets it take into account the statistical distribution of the input patterns. Figure 3 shows the structure of the topological map model. The motion features, such as joint angle values, are provided as input to the architecture. Each input neuron is connected to the output neurons through bottom-up weights; conversely, each neuron in the output layer is connected to the input layer through top-down weights. The bottom-up weights, through the activation function, give the likelihood that an input pattern is a probable candidate for an existing category node, whereas the top-down weights, through the matching function, give a confidence measure for adding the selected candidate as a node in the network. This confidence measure is controlled by the vigilance parameter ρ; if the match fails, the feature space is searched for a new candidate. The output layer creates a topological structure of the input data. Each node's weights are defined by a vector μ_j and a matrix Σ_j representing its mean and covariance, respectively. A further parameter associated with each node is the node count, or learning rate, n_j, representing the number of input patterns learned by that node. The network is initialized with two parameters: the baseline vigilance parameter, which takes values in the interval (0, 1), and the initial covariance matrix. The variables that define the contents of a node are summarized in Table 1. At the beginning of learning, the mean is initialized with the input pattern and the covariance matrix is initialized with a suitable value (determined through experiments).

[Fig. 2 Example of space distribution through topological mapping, showing nodes connected through edges]

[Fig. 3 Architecture for creation of the topological map]

As time
progresses, nodes are added to the network and their associated weight values are updated. The learning algorithm grows its neural structure starting from the first node. The motion features, represented by joint angle values, are encoded as Gaussian nodes in the structure, and nodes are connected by edges, allowing the flow of information among neighbouring nodes. This algorithm allows the observer to incrementally learn and update the structure of the model based on the observed motion patterns. The topological mapping procedure is summarized in Algorithm 1.

Table 1 Topological Gaussian Adaptive Resonance Map (TGARM) parameters

  ω_J : winning node
  μ_j : mean value weight parameter
  Σ_j : covariance matrix weight parameter, initialized from the initial covariance matrix
  ρ   : vigilance parameter, initialized by the baseline vigilance value in (0, 1)
  n_j : node count, or learning rate

Algorithm 1: Topological Gaussian Adaptive Resonance Structure
Require: observation vector O_t, initial covariance matrix Σ, baseline vigilance parameter ρ
Ensure: nodes N, edges E
1: Input the observation vector O_t.
2: if there is no node in the network then
3:   Add O_t to the network as a new node: N ← N ∪ {O_t}; n_i = 0
4:   Update the weights of the node N_i(n, μ, Σ) using Eqs. (4)-(6)
5: else
6:   Determine the winner node ω_J using Eqs. (1) and (2)
7:   Evaluate the vigilance criterion for ω_J using Eq. (3)
8:   if the vigilance criterion is satisfied then
9:     Update the weights of the winner node ω_J(n_J, μ_J, Σ_J) using Eqs. (4)-(6)
10:    Add an edge between the previous and current winner nodes: E ← E ∪ {(prevWinner, ω_J)}
11:  else
12:    Reset the winner node and search for a new winner; if no node passes the vigilance test, add O_t as a new node: N ← N ∪ {O_t}
13:  end if
14: end if
15: Obtain the new observation vector O_t; if learning is not complete, go to Step 1.
During learning, a winning node ω_J is selected for an input pattern based on the highest probability. Since each node is represented by a Gaussian component defined by its mean μ_j and covariance matrix Σ_j, the conditional density of O_t given node j, i.e. the bottom-up activation value of the node, is calculated as:

p(O_t | j) = (2π)^{-M/2} |Σ_j|^{-1/2} exp( -(1/2) (O_t - μ_j)^T Σ_j^{-1} (O_t - μ_j) )    (1)

where M is the dimensionality of the input motion patterns. For each input pattern, the activation value is calculated using (1), and the neuron with the highest activation value, i.e. the node with the highest probability, is selected:

J = argmax_j p(O_t | j)    (2)

However, the node, whose weights are determined by its mean and covariance, is only updated if the vigilance criterion, i.e. the match between the given input and the selected winner node, is fulfilled. A node ω_J passes the vigilance criterion if its match function value exceeds the vigilance parameter ρ, that is, if:

exp( -(1/2) (O_t - μ_J)^T Σ_J^{-1} (O_t - μ_J) ) ≥ ρ    (3)

The vigilance is a measure of similarity between the input and the node's mean relative to its standard deviation. If the winning node fails the vigilance test (3), the current winner is disqualified and its activation value is reset; the remaining nodes are then searched for a new best-matching neuron. If no satisfactory neuron is found, a new neuron representing the input pattern, with n_J = 0, is created so that the vigilance criterion is satisfied. When a winning neuron representing the input pattern and satisfying the vigilance condition has been selected, its parameters, i.e. count, mean and covariance, are updated using (4)-(6):

n_J = n_J + 1    (4)

μ_J = (1 - 1/n_J) μ_J + (1/n_J) O_t    (5)

Σ_J = (1 - 1/n_J) Σ_J + (1/n_J) (O_t - μ_J)(O_t - μ_J)^T    (6)
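The TGARM step described above can be sketched as follows. This is our own illustrative implementation, not the paper's code; the node layout is ours, and, unlike the text, a brand-new node is created here with n = 1 and the initial covariance so that its covariance matrix stays invertible on the next activation call:

```python
import numpy as np

def activation(o, mean, cov):
    """Eq. (1): Gaussian likelihood of observation o under one node."""
    m = len(o)
    diff = o - mean
    quad = diff @ np.linalg.inv(cov) @ diff
    norm = (2 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def match(o, mean, cov):
    """Left-hand side of the vigilance test, Eq. (3)."""
    diff = o - mean
    return np.exp(-0.5 * (diff @ np.linalg.inv(cov) @ diff))

def tgarm_step(o, nodes, edges, rho, init_cov, prev_winner=None):
    """One TGARM iteration on observation o.

    `nodes` is a list of dicts {'mean', 'cov', 'n'} (our layout)."""
    o = np.asarray(o, dtype=float)
    # candidates ordered by bottom-up activation, Eq. (2)
    order = sorted(range(len(nodes)),
                   key=lambda j: -activation(o, nodes[j]['mean'], nodes[j]['cov']))
    winner = None
    for j in order:                      # reset-and-search until Eq. (3) passes
        if match(o, nodes[j]['mean'], nodes[j]['cov']) >= rho:
            winner = j
            break
    if winner is None:                   # no node resonates: grow the map
        nodes.append({'mean': o.copy(),
                      'cov': np.array(init_cov, dtype=float),
                      'n': 1})           # n = 1 here (the text uses n_J = 0)
        winner = len(nodes) - 1
    else:                                # resonance: update via Eqs. (4)-(6)
        node = nodes[winner]
        node['n'] += 1
        n = node['n']
        node['mean'] = (1 - 1 / n) * node['mean'] + (1 / n) * o
        diff = o - node['mean']
        node['cov'] = (1 - 1 / n) * node['cov'] + (1 / n) * np.outer(diff, diff)
    if prev_winner is not None and prev_winner != winner:
        edges.add((prev_winner, winner))  # competitive Hebbian edge
    return winner
```

Feeding a sequence through `tgarm_step` one observation at a time, passing the previous winner along, grows the node set and the edge set exactly in the incremental fashion the algorithm describes.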
When the resonating neuron is determined, a lateral connection or an edge is established
between the current and the previous winner node. This mechanism provides a stable architecture that links previously learned knowledge with newly observed data, mapping the temporal correlation between them. The performance of TGARM depends on two parameters: the vigilance parameter ρ and the initial covariance matrix. The vigilance parameter directly influences the formation of new nodes when novel information is detected. For higher values of the vigilance parameter, the system becomes more sensitive to changes in the input and the network becomes more complex; for lower values, the system becomes less sensitive and faster. The choice of vigilance value therefore greatly influences the convergence and recognition properties of the system. Furthermore, the generalization performance of the network is strongly affected by the choice of initial covariance matrix.

3.2 Incremental motion learning

After the structure of the model has been updated, motion patterns are learned through the probabilistic module, a Hidden Markov Model (HMM). An HMM is a doubly stochastic model composed of states which are not directly observed. The HMM explicitly includes time, which makes it efficient at learning temporal sequences and compensating for uncertainties. Each state in the HMM emits an observation as output, from which the most likely underlying dynamical system is inferred. The states are connected by transitions, and each state generates an output pattern. Figure 4 shows an example of a left-to-right HMM in which each motion feature is encoded in an HMM state {s_1, s_2, ..., s_N}, each state emits an observation symbol b_i(O_t), and the probability of moving from one state to another is given by the transition probability a_ij.
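A left-to-right transition structure of the kind shown in Fig. 4 can be written down directly; the sketch below (our own illustration, with an assumed self-transition probability) builds the banded transition matrix in which a state may only stay put or advance:

```python
import numpy as np

def left_to_right_A(n_states, self_p=0.6):
    """Banded transition matrix for a left-to-right HMM: every state may
    either stay (self-transition loop) or advance to the next state."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = self_p             # self-transition
        A[i, i + 1] = 1.0 - self_p   # forward move only; no backward jumps
    A[-1, -1] = 1.0                  # final state absorbs
    return A

A = left_to_right_A(4)   # rows sum to 1; lower triangle stays zero
```

In TGAR-HMM the allowed transitions come from the topological map's edges rather than from a fixed band, but the same principle applies: entries outside the neighbourhood structure are held at zero.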
In order to select the appropriate structure of the HMM, i.e. the optimal number of HMM states, the TGARM algorithm is employed. After the model structure is updated, the remaining HMM parameters, such as the transition probabilities and prior probabilities, are updated using the expectation maximization (EM) algorithm [1, 29]. In our model, the transitions between states are characterized by the edges connecting neighbouring states, so the system can only move between neighbouring states. Since behaviour patterns are ordered sequences composed of atomic actions or motion primitives, each motion element is encoded as a Gaussian. We therefore use a left-to-right HMM structure for the observed motion patterns, which lets the data flow sequentially forward in time; self-transition loops are also allowed. An HMM is characterized by the following parameters:

- State prior probability π_i = P(s_0 = i): the prior probability of the corresponding state.
- State transition probability matrix a_ij = P(s_{t+1} = j | s_t = i): the probability of a transition from state i to state j.
- Observation probability distribution B = P(O_t | s_t = i): the probability distribution of the observation vector in state i, represented by a Gaussian N(O_t | m_i, C_i), where m_i and C_i are the mean vector and covariance matrix of the i-th state.

These HMM parameters are denoted λ = {π, A, B} = {π, A, m, C}. Each hidden state encodes and abstracts an observed motion pattern, and a sequence of such patterns is estimated via the transitions between the hidden states. Table 2 summarizes the contents of an HMM state.

Table 2 Hidden Markov Model parameters

  π_i : state prior probability; every node added to the map is initialized with a prior value
  a_ij : state transition probability; updated through the edges connecting two nodes in TGARM
  m_i : mean of the Gaussian in the HMM; taken from the topological map for each node (m_i = μ_i)
  C_i : covariance matrix for each node in the HMM; taken from the topological map for each node (C_i = Σ_i)
  O_t : observation, distributed as a Gaussian N(O_t | m_i, C_i)

[Fig. 4 Motion sequences encoded in a Hidden Markov Model (HMM)]
3.2.1 Updating the structure and parameters of the HMM

After the topological map is updated, the structure of the HMM is updated based on the added nodes and edges; this happens whenever a new behaviour is observed. For every node added to the topological map, a corresponding state is added to the HMM. Each added state i is initialized with prior probability π_i = π_0 and self-transition probability a_{i,i} = a_0. Similarly, for every new edge (i, j) connecting two nodes, the corresponding transition probability is initialized to a_{i,j} = a_0. After the HMM structure is updated, the HMM parameters are updated as well. The mean and covariance of each Gaussian observation distribution are updated during the structure (topological map) update discussed in the previous section, and the same values are used by the HMM. The remaining parameters, i.e. the transition and state prior probabilities, must be re-estimated; this is done with the expectation maximization algorithm. Traditionally, the Baum-Welch algorithm [1], a type of EM algorithm, is used for learning the initial state probability distribution and the state transition model. The transition probabilities are re-estimated using Eq. (7):

a_ij = [ Σ_{t=1}^{T-1} α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) ] / [ Σ_{t=1}^{T-1} α_t(i) β_t(i) ]    (7)

π_i = α_1(i) β_1(i) / P(O | λ)    (8)

In Eqs. (7) and (8), α_i and β_i denote the forward and backward variables [1]; Table 3 summarizes their computation. P(O | λ) in (8) is the joint observation probability.
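The forward-backward recursions and the re-estimation formulas (7) and (8) can be sketched as below. This is our own illustrative implementation, not the paper's code; the per-state Gaussian emission likelihoods are assumed to be precomputed into a T×N matrix B, with B[t, i] = b_i(O_t):

```python
import numpy as np

def forward_backward(pi, A, B):
    """Forward/backward recursions for one observation sequence of length T."""
    T, N = B.shape
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[0]                          # alpha_i(1) = pi_i b_i(O_1)
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[t + 1]  # sum_i alpha_i(t) a_ij b_j(O_{t+1})
    beta[-1] = 1.0                                # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])    # sum_j a_ij b_j(O_{t+1}) beta_j(t+1)
    return alpha, beta

def reestimate(pi, A, B):
    """One EM re-estimation pass, Eqs. (7) and (8)."""
    alpha, beta = forward_backward(pi, A, B)
    T, N = B.shape
    num = np.zeros_like(A)
    den = np.zeros(N)
    for t in range(T - 1):
        # numerator of (7): alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j)
        num += A * np.outer(alpha[t], B[t + 1] * beta[t + 1])
        den += alpha[t] * beta[t]                 # denominator of (7)
    A_new = num / den[:, None]
    prob_O = alpha[-1].sum()                      # P(O | lambda)
    pi_new = alpha[0] * beta[0] / prob_O          # Eq. (8)
    return pi_new, A_new
```

In the incremental setting, the values produced by `reestimate` would then be blended with the previous parameters using the rule in Eqs. (9) and (10), rather than replacing them outright.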
Table 3 Recursive computation of forward and backward variables

Forward variable:
  α_i(1) = π_i b_i(O_1)
  α_j(t+1) = Σ_{i=1}^{N} α_i(t) a_ij b_j(O_{t+1})

Backward variable:
  β_i(T) = 1
  β_i(t) = Σ_{j=1}^{N} a_ij b_j(O_{t+1}) β_j(t+1)

In order to update the parameters incrementally for newly observed data, an incremental learning rule is applied as follows:

a_ij = [ â_ij + (N_p − 1) a_ij ] / N_p    (9)

π_i = [ π̂_i + (N_p − 1) π_i ] / N_p    (10)

where â_ij and π̂_i are the estimates obtained from the current input pattern via Eqs. (7) and (8), and N_p is the number of input patterns that have been observed up to the current time.

4 Experimental results

In this section, we discuss the experimental setup and the datasets used, along with an analysis of the proposed work on these datasets. The proposed algorithm was tested on the open humanoid platform DARwIn-OP [30] developed by Robotis Co. Ltd. For simulation purposes, we used the Webots simulator. DARwIn-OP has 20 degrees of freedom and a 3-axis accelerometer (Fig. 5). The proposed approach for robot behaviour learning was tested on a dataset containing a series of different action sequences obtained through the robot's sensors. The dataset contains joint angle values for multiple observations; these values are used as input to the learning algorithm. In the experiments, the performance of the incremental learning algorithm is measured in terms of reproducing the joint angle values, and the number of hidden states for each behaviour is estimated using TGARM during learning. The experiments test the efficacy of the proposed model for learning and reproducing motion patterns. Motion sequences are presented to the algorithm one at a time, simulating online, sequential acquisition. The algorithm was tested on two datasets. The first experiment consists of different kinds of behavioural gestures; the dataset comprises a variety of actions involving the upper part of the body.
These actions include standing up; raising and lowering the left and right arm by 180° one at a time; raising and lowering both arms by 180° simultaneously; and raising and lowering the left and right arm by 90° one at a time. All these gestures reach different goal positions, each representing one category of behaviour. The observed behavioural trajectory is acquired from different motion sensors and mapped onto the categories used by the model. The first experiment was conducted on behaviour gestures generated by the robot; the actions are generated by moving the joints in Webots. The action features consist of the rotational angle values of the respective joints, measured in radians. Figure 6 visualizes the actions performed; each image in the figure shows different frames extracted from the action sequences.
Neural Comput & Applic

Fig. 5 Kinematic information of the open humanoid platform DARwIn-OP

Fig. 6 Different samples of behavioural actions performed by the robot during the experimentation
Fig. 7 Different frames from the walk-approach-kick experiment performed on the DARwIn-OP humanoid robot in Webots

The second experiment consists of the complex behavioural pattern of finding the ball, walking towards the ball and then kicking it. In these experiments, the robot first stands up and searches for the ball using image processing techniques. A discussion of these processing algorithms is outside the scope of this paper; they are assumed to be already available for experimentation, as they have no direct effect on the experiments. After finding the ball, the robot walks towards the ball and kicks it with either the left or the right foot. The joint angle values of the joints involved in the action are recorded. The experiment was conducted for different positions and distances of the ball with respect to the robot; for example, placing the ball behind rather than in front of the robot requires the robot to turn in the direction of the ball. Figure 7 shows random frames extracted from the sequences of dataset 2. As discussed earlier, the performance of TGARM depends strongly on the values chosen for the vigilance parameter and the initial covariance matrix. In the current experiments, the vigilance parameter values are selected in an ad hoc fashion by trial and error, choosing the value that most efficiently generalizes the observed patterns. Similarly, the initial covariance matrix determines the isotropic spread in feature space of a new node's distribution: for large c, learning will be slow with fewer nodes, while for smaller values of c, training will be faster with more nodes. The initial covariance matrix is selected in the same manner as the vigilance parameter.
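The ad hoc trial-and-error search described above can be sketched as a simple scoring loop. This is a hypothetical illustration: `train_tgarm` and `mse_of` stand in for the paper's learning and evaluation routines, and the node penalty encodes the compromise between model size and error:

```python
def select_vigilance(candidates, train_tgarm, mse_of, node_penalty=1e-3):
    """Try each candidate vigilance value and keep the best-scoring one.

    train_tgarm(vigilance=rho) is assumed to return a trained model exposing
    an 'n_nodes' count; mse_of(model) returns its mean square error.
    """
    best, best_score = None, float('inf')
    for rho in candidates:
        model = train_tgarm(vigilance=rho)
        # Trade generalization error off against the number of nodes.
        score = mse_of(model) + node_penalty * model['n_nodes']
        if score < best_score:
            best, best_score = rho, score
    return best
```

The same loop applies unchanged to the initial covariance parameter, which the paper selects in the same manner.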
We evaluated the performance of the system using the error between the demonstrated and the generalized motion. This error metric provides a measure of the generalization capability of the proposed model; the mean error is used to evaluate how closely the generalized motion follows the demonstrated motion. We use heuristics to find suitable parameters for TGAR-HMM, applying different values of the vigilance parameter and the initial covariance matrix to find suitable candidates. For the behaviour dataset, a vigilance value of 0.95 is observed to give good generalization performance and a compromise between the number of nodes and the mean square error; this value generates motion patterns as close as possible to the original and results in the selection of a near-optimal number of states (nodes) during learning. Similarly, a value of 0.3 is chosen for the initial covariance matrix. Table 4 summarizes the selected parameters.

Table 4 Selected parameters and mean square error

Algorithm   Parameters           MSE
GHMM        ρ = 0.95, c = 0.…    …
TGAR-HMM    c = 0.3, s = 0.…     …

Figure 8a shows the effect of different values of the vigilance parameter on the mean square error and the number of nodes. As the vigilance parameter increases, the number of nodes also increases and the mean square error decreases, showing that a higher vigilance value generalizes the learned behaviour patterns more closely. Figure 8b shows the effect of the vigilance parameter on the compression ratio (CR) of Eq. (11), defined as the ratio between the number of original motion samples and the number of nodes generated by the learning algorithm. As the vigilance parameter is increased, the compression ratio is reduced by the addition of nodes, which minimizes the mean square error and better generalizes the observed behaviour patterns.

CR = No. of samples / No. of nodes    (11)

Fig. 8 a Plot of MSE and number of nodes for different values of the vigilance parameter. b Effect of the vigilance parameter on the compression ratio

Fig. 9 Computation times

Figure 9 plots the processing time taken by the learning algorithm with respect to the number of trajectories; the model size, represented as the number of nodes in the TGAR-HMM structure, is given as a reference. As may be anticipated, the learning time appears to depend linearly on the model size. An interesting observation is that the time per observation is below 1 ms; the algorithm is therefore well adapted for online applications in environments where it is likely to observe a few trajectories per second. The motions generated from the RBA and LBA for the behaviour dataset are summarized in Fig. 10. Here
we show only the original and generalized motion patterns acquired from the left/right shoulder and left/right elbow joints of the DARwIn-OP robot. These experiments show that the proposed algorithm can learn the behaviours and reproduce them efficiently.

Fig. 10 Plot of original and generated motion patterns for different parts of the robot's joints during the behavioural gestures experiment: a RBA, b LBA

4.1 Performance evaluation

Using the same heuristics applied to the behavioural gestures data, we selected the vigilance parameter value for the find-walk-kick ball experiment; a value of 0.95 was chosen as a suitable candidate. Figure 11 shows the generalization result for the second experiment, involving the complex behaviour of walking, approaching and kicking the ball. Owing to the large dimensionality of the dataset, for visualization we show only the right hip joint angle values, since these play the major role in the walking motion.

Fig. 11 Generalization result for the walk-approach-kick motion pattern. The figure displays the joint angle pattern for the right hip yaw joint during the experiment

In order to assess the effectiveness of the proposed architecture, we compared our approach against the most similar approach, the Growing Hidden Markov Model (GHMM) proposed by Vasquez et al. [24]. Comparing heterogeneous techniques, even when they solve a common problem, is a difficult task: their theoretical bases are often too different to evaluate them fairly. Here we propose a measure that, despite its simplicity, still provides a useful indicator of the parsimony of the models produced by the different approaches. Vasquez et al. [24] developed an incremental model for learning and prediction through a growing HMM; the model updates the structure of the HMM through the use of a topological mapping algorithm called instantaneous topological mapping (ITM).
In ITM, the insertion of nodes is controlled by the Mahalanobis distance between the nodes and the insertion threshold parameter s. The model was designed for vehicle motion learning and prediction. Similar to our approach, the growing HMM architecture uses a topological mapping structure to define the HMM structure. The structure of the transition matrix is the fundamental complexity factor for inference in HMMs; we therefore measure the size of a model by the number of nodes in its transition graph. The parameters chosen for each algorithm are shown in Table 4. Owing to the difficulty of choosing adequate parameters for comparing the approaches, we started from an educated guess and refined it by trial and error. The growth of the model size with respect to the number of trajectories in the learning dataset is displayed in Fig. 12. As can be seen in the figure, the size of the TGAR-HMM models is small compared with the discussed approach.

Fig. 12 Model size (number of nodes) comparison between TGAR-HMM and GHMM for two different datasets: a behaviour dataset, b walk-approach-kick dataset

The proposed approach is related to the growing HMM (GHMM) model. However, unlike GHMM, the proposed algorithm uses a Gaussian distribution for adding and updating new nodes. Furthermore, instead of using a fixed covariance matrix to model the observation probabilities, the TGAR-HMM model uses a covariance matrix that is updated based on the input patterns.

5 Conclusion

In this paper, we have developed a novel architecture, the Topological Gaussian Adaptive Resonance Hidden Markov Model (TGAR-HMM), for incremental and online learning of a continuous flow of motion patterns. The observed motion patterns are encoded through TGAR-HMM, and the structure of the model is updated incrementally through a novel topological map. Based on this topological map, the model aggregates information as it is observed and organizes it in an efficient, growing and self-organizing manner. The dynamic architecture grows incrementally to adapt to new data from the environment. The fundamental characteristics of the network are stable learning, which resolves the stability-plasticity dilemma, and fast convergence. As each new sequence of data is learned, the network converges to a stable topological structure of the input trajectory. Owing to this property, the proposed architecture does not suffer from catastrophic forgetting: it retains previously learned data while accommodating new information. The TGAR-HMM model efficiently learns and encodes spatio-temporal patterns and computes the probability that the observation sequences could be generated.
Secondly, the novel HMM architecture adaptively selects the model's structure based on the observed data rather than pre-defining it from prior knowledge. The proposed model has been tested for different kinds of behaviour on the open humanoid platform DARwIn-OP through simulation. The model can encode and generalize simple behaviours (such as behavioural gestures) as well as complex behaviour patterns consisting of combinations of different simple behaviours. The aim of the proposed algorithm is not to trace back the original trajectory sequence exactly, as this would result in a large number of nodes or HMM states; instead, a near-optimal number of states (nodes) is selected from the measured data. This selection is controlled by the vigilance parameter. The vigilance parameter is currently selected manually by trial and error, but the experiments show that once a suitable value is found for a particular dataset, it can be applied efficiently to different kinds of motion sequences. An improved method for selecting, or adaptively modifying, the vigilance parameter would provide better generalization performance.

Acknowledgments This research is supported by High Impact Research Grant UM.C/625/1/HIR/MoE/FCSIT/10 from the Ministry of Education Malaysia.

References

1. Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2)
2. Inamura T, Toshima I, Tanie H, Nakamura Y (2004) Embodied symbol emergence based on mimesis theory. Int J Robot Res 23(4-5)
3. Calinon S, Billard A (2004) Stochastic gesture production and recognition model for a humanoid robot. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS 2004)
4. Grossberg S (1987) Competitive learning: from interactive activation to adaptive resonance. Cogn Sci 11(1)
5. Billard AG, Calinon S, Guenter F (2006) Discriminative and adaptive imitation in uni-manual and bi-manual tasks. Robot Auton Syst 54(5)
6. Kulic D, Takano W, Nakamura Y (2007) Incremental on-line hierarchical clustering of whole body motion patterns. In: The 16th IEEE international symposium on robot and human interactive communication (RO-MAN 2007)
7. Grossberg S (2013) Adaptive resonance theory: how a brain learns to consciously attend, learn, and recognize a changing world. Neural Netw 37
8. Jain L, Seera M, Lim C, Balasubramaniam P (2014) A review of online learning in supervised neural networks. Neural Comput Appl 25(3-4)
9. Carpenter GA, Grossberg S (1987) A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput Vision Graph Image Process 37(1)
10. Carpenter GA, Grossberg S (1987) ART 2: self-organization of stable category recognition codes for analog input patterns. Appl Opt 26(23)
11. Carpenter G, Grossberg S, Rosen D (1991) ART 2-A: an adaptive resonance algorithm for rapid category learning and recognition. In: International joint conference on neural networks (IJCNN 91)
12. Bradski G, Carpenter G, Grossberg S (1994) STORE working memory networks for storage and recall of arbitrary temporal sequences. Biol Cybern 71(6)
13. Furao S, Hasegawa O (2006) An incremental network for on-line unsupervised classification and topology learning. Neural Netw 19(1)
14. Furao S, Hasegawa O (2008) A fast nearest neighbor classifier based on self-organizing incremental neural network. Neural Netw 21(10)
15. Botzheim J, Kubota N (2012) Growing neural gas for information extraction in gesture recognition and reproduction of robot partners. In: IEEE international symposium on micro-nanomechatronics and human science (MHS)
16. Seyhan SS, Alpaslan FN, Yava M (2013) Simple and complex behavior learning using behavior hidden Markov model and CobART. Neurocomputing 103
17. Yavas M, Alpaslan FN (2012) Hierarchical behavior categorization using correlation based adaptive resonance theory. Neurocomputing 77(1)
18. Kulic D, Lee D, Ott C, Nakamura Y (2008) Incremental learning of full body motion primitives for humanoid robots. In: IEEE-RAS 8th international conference on humanoid robots (Humanoids 2008)
19. Kulic D, Ott C, Lee D, Ishikawa J, Nakamura Y (2012) Incremental learning of full body motion primitives and their sequencing through human motion observation. Int J Robot Res 31
20. Kulic D, Takano W, Nakamura Y (2011) Towards lifelong learning and organization of whole body motion patterns. In: Kaneko M, Nakamura Y (eds) Robotics research, Springer tracts in advanced robotics, vol 66. Springer, Berlin
21. Calinon S, Billard A (2007) Incremental learning of gestures by imitation in a humanoid robot. In: Proceedings of the ACM/IEEE international conference on human-robot interaction (HRI 07). ACM, New York, NY, USA
22. Cho S, Jo S (2011) Kinesthetic learning of behaviors in a humanoid robot. In: International conference on control, automation and systems (ICCAS)
23. Cho S, Jo S (2012) Incremental motion learning through kinesthetic teachings and new motion production from learned motions by a humanoid robot. Int J Control Autom Syst 10(1)
24. Vasquez D, Fraichard T, Laugier C (2009) Incremental learning of statistical motion patterns with growing hidden Markov models. IEEE Trans Intell Transp Syst 10(3)
25. Tscherepanow M, Kortkamp M, Kammer M (2011) A hierarchical ART network for the stable incremental learning of topological structures and associations from noisy data. Neural Netw 24(8)
26. Fung WK, Liu YH (2003) Adaptive categorization of ART networks in robot behavior learning using game-theoretic formulation. Neural Netw 16(10)
27. Jun L, Duckett T (2003) Robot behavior learning with a dynamically adaptive RBF network: experiments in offline and online learning. In: Proceedings of the second international conference on computational intelligence, robotics and autonomous systems, Singapore
28. Martinetz T, Schulten K (1994) Topology representing networks. Neural Netw 7(3)
29. Neal RM, Hinton GE (1999) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI (ed) Learning in graphical models. MIT Press, Cambridge
30. Ha I, Tamura Y, Asama H (2013) Development of open platform humanoid robot DARwIn-OP. Adv Robot 27(3)
Analysis about Classification Techniques on Categorical Data in Data Mining Assistant Professor P. Meena Department of Computer Science Adhiyaman Arts and Science College for Women Uthangarai, Krishnagiri,
More informationCS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs
CS 543: Final Project Report Texture Classification using 2-D Noncausal HMMs Felix Wang fywang2 John Wieting wieting2 Introduction We implement a texture classification algorithm using 2-D Noncausal Hidden
More informationLearning the Three Factors of a Non-overlapping Multi-camera Network Topology
Learning the Three Factors of a Non-overlapping Multi-camera Network Topology Xiaotang Chen, Kaiqi Huang, and Tieniu Tan National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy
More informationPattern Recognition. Kjell Elenius. Speech, Music and Hearing KTH. March 29, 2007 Speech recognition
Pattern Recognition Kjell Elenius Speech, Music and Hearing KTH March 29, 2007 Speech recognition 2007 1 Ch 4. Pattern Recognition 1(3) Bayes Decision Theory Minimum-Error-Rate Decision Rules Discriminant
More informationHuman Motion Database with a Binary Tree and Node Transition Graphs
Human Motion Database with a Binary Tree and Node Transition Graphs Katsu Yamane Disney Research, Pittsburgh kyamane@disneyresearch.com Yoshifumi Yamaguchi Dept. of Mechano-Informatics University of Tokyo
More informationNeural Network Weight Selection Using Genetic Algorithms
Neural Network Weight Selection Using Genetic Algorithms David Montana presented by: Carl Fink, Hongyi Chen, Jack Cheng, Xinglong Li, Bruce Lin, Chongjie Zhang April 12, 2005 1 Neural Networks Neural networks
More informationOnline labelling strategies for growing neural gas
Online labelling strategies for growing neural gas Oliver Beyer and Philipp Cimiano Semantic Computing Group, CITEC, Bielefeld University, obeyer@cit-ec.uni-bielefeld.de http://www.sc.cit-ec.uni-bielefeld.de
More informationClassification of Printed Chinese Characters by Using Neural Network
Classification of Printed Chinese Characters by Using Neural Network ATTAULLAH KHAWAJA Ph.D. Student, Department of Electronics engineering, Beijing Institute of Technology, 100081 Beijing, P.R.CHINA ABDUL
More informationCS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed
More informationBased on Raymond J. Mooney s slides
Instance Based Learning Based on Raymond J. Mooney s slides University of Texas at Austin 1 Example 2 Instance-Based Learning Unlike other learning algorithms, does not involve construction of an explicit
More informationSelf-Organizing Maps for cyclic and unbounded graphs
Self-Organizing Maps for cyclic and unbounded graphs M. Hagenbuchner 1, A. Sperduti 2, A.C. Tsoi 3 1- University of Wollongong, Wollongong, Australia. 2- University of Padova, Padova, Italy. 3- Hong Kong
More informationNeural Networks. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Neural Networks CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Biological and artificial neural networks Feed-forward neural networks Single layer
More informationOnline Learning for Object Recognition with a Hierarchical Visual Cortex Model
Online Learning for Object Recognition with a Hierarchical Visual Cortex Model Stephan Kirstein, Heiko Wersing, and Edgar Körner Honda Research Institute Europe GmbH Carl Legien Str. 30 63073 Offenbach
More informationMoving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial Region Segmentation
IJCSNS International Journal of Computer Science and Network Security, VOL.13 No.11, November 2013 1 Moving Object Segmentation Method Based on Motion Information Classification by X-means and Spatial
More informationMachine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves
Machine Learning A 708.064 11W 1sst KU Exercises Problems marked with * are optional. 1 Conditional Independence I [2 P] a) [1 P] Give an example for a probability distribution P (A, B, C) that disproves
More informationMachine Learning. Unsupervised Learning. Manfred Huber
Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training
More informationCHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES
CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES 6.1 INTRODUCTION The exploration of applications of ANN for image classification has yielded satisfactory results. But, the scope for improving
More informationComparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem
Comparison of Recommender System Algorithms focusing on the New-Item and User-Bias Problem Stefan Hauger 1, Karen H. L. Tso 2, and Lars Schmidt-Thieme 2 1 Department of Computer Science, University of
More informationEstimating Human Pose in Images. Navraj Singh December 11, 2009
Estimating Human Pose in Images Navraj Singh December 11, 2009 Introduction This project attempts to improve the performance of an existing method of estimating the pose of humans in still images. Tasks
More informationAssignment 2. Unsupervised & Probabilistic Learning. Maneesh Sahani Due: Monday Nov 5, 2018
Assignment 2 Unsupervised & Probabilistic Learning Maneesh Sahani Due: Monday Nov 5, 2018 Note: Assignments are due at 11:00 AM (the start of lecture) on the date above. he usual College late assignments
More informationSpatio-Temporal Stereo Disparity Integration
Spatio-Temporal Stereo Disparity Integration Sandino Morales and Reinhard Klette The.enpeda.. Project, The University of Auckland Tamaki Innovation Campus, Auckland, New Zealand pmor085@aucklanduni.ac.nz
More informationAutomatic Linguistic Indexing of Pictures by a Statistical Modeling Approach
Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach Abstract Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in content-based
More informationLinear Discriminant Analysis in Ottoman Alphabet Character Recognition
Linear Discriminant Analysis in Ottoman Alphabet Character Recognition ZEYNEB KURT, H. IREM TURKMEN, M. ELIF KARSLIGIL Department of Computer Engineering, Yildiz Technical University, 34349 Besiktas /
More informationHow to Use the SOINN Software: User s Guide (Version 1.0)
How to Use the SOINN Software: User s Guide (Version 1.0) Kazuhiro Yamasaki 1, Naoya Makibuchi 1, Furao Shen 2, and Osamu Hasegawa 1 1 Department of Computational Intelligence and Systems Science, Tokyo
More informationTracking of Human Body using Multiple Predictors
Tracking of Human Body using Multiple Predictors Rui M Jesus 1, Arnaldo J Abrantes 1, and Jorge S Marques 2 1 Instituto Superior de Engenharia de Lisboa, Postfach 351-218317001, Rua Conselheiro Emído Navarro,
More informationRandom projection for non-gaussian mixture models
Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,
More informationCOMBINING NEURAL NETWORKS FOR SKIN DETECTION
COMBINING NEURAL NETWORKS FOR SKIN DETECTION Chelsia Amy Doukim 1, Jamal Ahmad Dargham 1, Ali Chekima 1 and Sigeru Omatu 2 1 School of Engineering and Information Technology, Universiti Malaysia Sabah,
More information10-701/15-781, Fall 2006, Final
-7/-78, Fall 6, Final Dec, :pm-8:pm There are 9 questions in this exam ( pages including this cover sheet). If you need more room to work out your answer to a question, use the back of the page and clearly
More informationAn Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework
IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster
More informationCIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, :59pm, PDF to Canvas [100 points]
CIS 520, Machine Learning, Fall 2015: Assignment 7 Due: Mon, Nov 16, 2015. 11:59pm, PDF to Canvas [100 points] Instructions. Please write up your responses to the following problems clearly and concisely.
More informationMotion Interpretation and Synthesis by ICA
Motion Interpretation and Synthesis by ICA Renqiang Min Department of Computer Science, University of Toronto, 1 King s College Road, Toronto, ON M5S3G4, Canada Abstract. It is known that high-dimensional
More informationSome questions of consensus building using co-association
Some questions of consensus building using co-association VITALIY TAYANOV Polish-Japanese High School of Computer Technics Aleja Legionow, 4190, Bytom POLAND vtayanov@yahoo.com Abstract: In this paper
More informationFeature Extraction in Wireless Personal and Local Area Networks
Feature Extraction in Wireless Personal and Local Area Networks 29. October 2003, Singapore Institut für Praktische Informatik Johannes Kepler Universität Linz, Austria rene@soft.uni-linz.ac.at < 1 > Content
More informationFacial Expression Classification with Random Filters Feature Extraction
Facial Expression Classification with Random Filters Feature Extraction Mengye Ren Facial Monkey mren@cs.toronto.edu Zhi Hao Luo It s Me lzh@cs.toronto.edu I. ABSTRACT In our work, we attempted to tackle
More informationBackground Subtraction in Video using Bayesian Learning with Motion Information Suman K. Mitra DA-IICT, Gandhinagar
Background Subtraction in Video using Bayesian Learning with Motion Information Suman K. Mitra DA-IICT, Gandhinagar suman_mitra@daiict.ac.in 1 Bayesian Learning Given a model and some observations, the
More informationModeling pigeon behaviour using a Conditional Restricted Boltzmann Machine
Modeling pigeon behaviour using a Conditional Restricted Boltzmann Machine Matthew D. Zeiler 1,GrahamW.Taylor 1, Nikolaus F. Troje 2 and Geoffrey E. Hinton 1 1- University of Toronto - Dept. of Computer
More information3D Human Motion Analysis and Manifolds
D E P A R T M E N T O F C O M P U T E R S C I E N C E U N I V E R S I T Y O F C O P E N H A G E N 3D Human Motion Analysis and Manifolds Kim Steenstrup Pedersen DIKU Image group and E-Science center Motivation
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationAnalyzing and Segmenting Finger Gestures in Meaningful Phases
2014 11th International Conference on Computer Graphics, Imaging and Visualization Analyzing and Segmenting Finger Gestures in Meaningful Phases Christos Mousas Paul Newbury Dept. of Informatics University
More informationCHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION
CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant
More informationRobustness of Selective Desensitization Perceptron Against Irrelevant and Partially Relevant Features in Pattern Classification
Robustness of Selective Desensitization Perceptron Against Irrelevant and Partially Relevant Features in Pattern Classification Tomohiro Tanno, Kazumasa Horie, Jun Izawa, and Masahiko Morita University
More informationSimulation of Zhang Suen Algorithm using Feed- Forward Neural Networks
Simulation of Zhang Suen Algorithm using Feed- Forward Neural Networks Ritika Luthra Research Scholar Chandigarh University Gulshan Goyal Associate Professor Chandigarh University ABSTRACT Image Skeletonization
More informationUnsupervised Human Members Tracking Based on an Silhouette Detection and Analysis Scheme
Unsupervised Human Members Tracking Based on an Silhouette Detection and Analysis Scheme Costas Panagiotakis and Anastasios Doulamis Abstract In this paper, an unsupervised, automatic video human members(human
More informationNeural Networks (Overview) Prof. Richard Zanibbi
Neural Networks (Overview) Prof. Richard Zanibbi Inspired by Biology Introduction But as used in pattern recognition research, have little relation with real neural systems (studied in neurology and neuroscience)
More informationHMM-Based Handwritten Amharic Word Recognition with Feature Concatenation
009 10th International Conference on Document Analysis and Recognition HMM-Based Handwritten Amharic Word Recognition with Feature Concatenation Yaregal Assabie and Josef Bigun School of Information Science,
More informationIntroduction to Mobile Robotics
Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,
More informationTeaching a robot to perform a basketball shot using EM-based reinforcement learning methods
Teaching a robot to perform a basketball shot using EM-based reinforcement learning methods Tobias Michels TU Darmstadt Aaron Hochländer TU Darmstadt Abstract In this paper we experiment with reinforcement
More informationAmbiguity Detection by Fusion and Conformity: A Spectral Clustering Approach
KIMAS 25 WALTHAM, MA, USA Ambiguity Detection by Fusion and Conformity: A Spectral Clustering Approach Fatih Porikli Mitsubishi Electric Research Laboratories Cambridge, MA, 239, USA fatih@merl.com Abstract
More informationIntroduction to Trajectory Clustering. By YONGLI ZHANG
Introduction to Trajectory Clustering By YONGLI ZHANG Outline 1. Problem Definition 2. Clustering Methods for Trajectory data 3. Model-based Trajectory Clustering 4. Applications 5. Conclusions 1 Problem
More information