

Thesis Proposal: Switching Linear Dynamic Systems with Higher-order Temporal Structure

Sang Min Oh
sangmin@cc.gatech.edu
28th May 2008

Contents

1 Introduction
  1.1 Automated Temporal Sequence Analysis
  1.2 Learning from examples
  1.3 Model-based Approach: SLDSs
  1.4 Beyond standard SLDSs
    1.4.1 Duration modeling for LDSs
    1.4.2 Explicit modeling of global parameters
    1.4.3 Hierarchical SLDSs
  1.5 Summary of Contributions
2 Background: Switching Linear Dynamic Systems
  2.1 Linear Dynamic Systems
  2.2 Switching Linear Dynamic Systems
  2.3 Inference in SLDS
  2.4 Learning in SLDS
  2.5 Related Work
3 Segmental Switching Linear Dynamic Systems
  3.1 Need for Improved Duration Modeling for Markov Models
  3.2 Segmental SLDS
    3.2.1 Conceptual View on the Generative Process of S-SLDS
    3.2.2 Summary of Contributions on S-SLDS
4 Parametric Switching Linear Dynamic Systems
  4.1 Need for the Modeling of Global Parameters
  4.2 Parametric Switching Linear Dynamic Systems
    4.2.1 Graphical representation of P-SLDS
    4.2.2 Summary of Contributions on P-SLDS
5 Automated analysis of honey bee dances using PS-SLDS
  5.1 Motivation
  5.2 Modeling of Honey Bee Dance using PS-SLDS
  5.3 Experimental Results
    5.3.1 Learning from Training Data
    5.3.2 Inference on Test Data
    5.3.3 Qualitative Results
    5.3.4 Quantitative Results
  5.4 Conclusion

6 Hierarchical SLDS
  6.1 Need for Hierarchical SLDSs
  6.2 Graphical model representation of H-SLDS
  6.3 Inference in H-SLDS
  6.4 Learning in H-SLDSs
  6.5 Dataset: Wearable sensor data
  6.6 Summary: Planned Future Work
  6.7 Related Work
    6.7.1 Hierarchical HMMs
    6.7.2 Hierarchical Models with Continuous Dynamics
7 Evaluation
  7.1 Honey bee dance dataset
  7.2 Wearable exercise dataset
  7.3 Summary
8 Summary of Contribution

Abstract

Automated analysis of temporal data is a task of utmost importance for intelligent machines. For example, ubiquitous computing systems need to understand human intentions from streams of sensory information, and healthcare monitoring systems can assist patients and doctors by providing automatically annotated daily health reports. Moreover, a huge amount of multimedia data such as video awaits analysis and indexing for search purposes, while scientific data such as recordings of animal behavior and evolving brain signals are being collected in the hope of delivering new scientific discoveries about life.

The contribution highlighted in this thesis proposal is the development of switching linear dynamic systems (SLDSs) with higher-order temporal structure. SLDSs have been used to model continuous multivariate temporal data under the assumption that the characteristics of complex temporal sequences can be captured by Markov switching between a set of simpler primitives, which are linear dynamic systems (LDSs). The SLDS models presented in this proposal go beyond conventional SLDSs by introducing additional model structure that encodes more descriptive higher-order temporal structure of the data.

Specifically, we introduce three SLDS extensions. First, parametric SLDSs (P-SLDSs) explicitly model the global parameters which induce systematic temporal and spatial variations in the data. The additional structure of P-SLDSs allows us to address the global parameter quantification task, which standard SLDSs could not, in addition to providing more accurate labeling. Second, segmental SLDSs (S-SLDSs) capture descriptive duration models within LDS regimes. The encoded duration models are more descriptive than the exponential duration models induced by standard SLDSs; they allow us to avoid the severe problem of over-segmentation and demonstrate superior labeling accuracy. Third, we introduce hierarchical SLDSs (H-SLDSs), a generalization of standard SLDSs with hierarchical Markov chains. H-SLDSs can encode temporal data which exhibits hierarchical structure, where underlying low-level temporal patterns appear repeatedly in different higher-level contexts. Accordingly, H-SLDSs can be used to analyze temporal data at multiple temporal granularities, and make it easy to build a more complex H-SLDS model by combining underlying H-SLDSs.

The developed SLDSs have been applied to two real-world problems. The first is to automatically analyze the honey bee dance dataset, where the goal is to correctly segment the dance sequences into different regimes and to parse the messages about the location of food sources embedded in the data. We show that a combination of the P-SLDS and S-SLDS models demonstrates improved labeling accuracy and message parsing results. The second is to analyze wearable exercise data, where we aim to provide an automatically generated exercise record at multiple temporal resolutions. H-SLDSs, still in progress, are currently used to encode the exercise dataset. We show preliminary results and discuss future directions in this problem domain.

Chapter 1

Introduction

The thesis proposed in this research proposal is the following: switching linear dynamic systems with higher-order temporal structure demonstrate superior accuracy over standard SLDSs for the labeling, classification, and quantification of time-series data.

In the following sections, we describe the core problems in temporal sequence analysis, our approach towards these problems, and how it advances the state of the art in this field.

1.1 Automated Temporal Sequence Analysis

Temporal sequences are abundant. Examples of temporal data include motion trajectories, voice, video frames, medical sensor signals (e.g., fMRI signals), wearable sensor data, and economic indices, to name only a few. Temporal data in its most general form is a sequence of multivariate vectors.

In contrast to the abundance of temporal data, analysis still often relies on manual interpretation by humans. Manual interpretation of temporal data is time-consuming, and in some cases very challenging due to the complexity of the data. For example, sound technicians study complex sound waves, medical doctors conduct diagnoses based on signals recorded from medical monitoring systems, and investment bank analysts analyze stock price histories. On other occasions the tasks seem simpler, but the data is still interpreted and labeled by humans, simply due to the lack of automated analysis tools. For example, computer graphics experts in the animation industry often invest a substantial amount of time searching for a particular motion sequence in a database, and field biologists label tracked motion sequences of animals w.r.t. the corresponding motion regimes by exhaustively examining the tracks frame by frame.

The development of automated tools to analyze temporal sequences can contribute to such diverse fields by assisting knowledge workers and improving their productivity through the automation of diverse manual tasks. Moreover, these tools can provide the ability to explore large temporal sequence databases, which was previously impractical due to the substantial amount of manual work required. In addition, advances in temporal sequence analysis can contribute to emerging real-time applications which proactively respond to the needs of users based on sensor signals, e.g., medical monitoring devices and wearable gesture recognition systems, among many others. Such systems are designed to improve people's quality of life by providing necessary assistance at all times and helping them conduct their tasks more easily.

Among the numerous tasks in the temporal data analysis domain, this thesis proposal focuses on the following: labeling, quantification, and classification, which we describe in more detail below.

Labeling

Labeling is the task of categorizing every part of a sequence into different classes based on the properties it exhibits. Labeling of temporal sequences is a common task that appears in many fields. The classes can be defined by domain experts or be discovered in an unsupervised manner.

Figure 1.1: (a) A bee dance consists of three patterns: waggle, left turn, and right turn. (b) The box in the middle is a tracked bee. (c) An example honey bee dance trajectory. The track is automatically obtained using a vision-based tracker and automatically labeled afterwards. Key: waggle, right-turn, left-turn.

We can describe the labeling problem in the biology domain in more detail. Honey bees communicate the location of and distance to a food source through a dance that takes place within the hive. The dance is decomposed into three different regimes: left turn, right turn and waggle, as shown in Fig. 1.1(a). The length (duration) and orientation of the waggle phase correspond to the distance and the orientation to the food source. Figure 1.1(b) shows a dancer bee that was tracked by a previously developed vision-based tracker [25]. The labeling problem in this domain is to automatically segment the trajectory into the three categories. An example result obtained by a developed automated sequence analysis tool is shown in Fig. 1.1(c) with color-coded motion patterns.

Quantification

Quantification is the task of inferring the global parameters that underlie the systematic variations in the data. Quantification problems appear in many domains where we are more interested in the global parameters that vary the behavior of the signals than in the exact categorization of the sub-regimes. The quantification task assumes that temporal data can be represented by an underlying template, and that a particular datum is an instantiation of that template, transformed systematically depending on the global parameters. For example, the intended pointing direction of a person changes the overall trajectory of the person's gesture. In this problem, people are less interested in the actual labeling of the sub-motion patterns than in the pointing direction indicated by the gesture. In such cases, we need a systematic approach to model the function between the global parameters and the variations within the data. Other examples are the dance trajectories of honey bees, which vary substantially depending on the distance and orientation to the indicated food source. Examples of varying honey bee dance trajectories in stylized forms are shown in Figure 1.2, where we can observe that the underlying template is parameterized by global variables. The quantification of global variables provides users with the high-level information they need. Moreover, the quantification task has an interesting aspect: the information it produces can be used to improve the accuracy of the labeling task, and, vice versa, improved low-level labeling results can improve quantification. An approach which solves these two problems simultaneously is presented in the latter part of this proposal.

Classification

Classification is the task of categorizing temporal sequences into one of many classes. It is also a crucial task for the retrieval of data of particular types from a large database. For example, a surveillance system can track a huge number of people and produce a large database of trajectories. Classifying all the trajectories automatically to find tracks which exhibit a particular behavior is essential to fully exploit such collected data.
The general approach to the sequence classification problem is to build models of the distinct classes, where domain experts usually provide training examples corresponding to the classes. Then, a novel sequence is tested against the created class models to produce similarity measures. Finally, users provide the analysis system with classification rules, which are used to determine the class of every novel sequence based on the similarity measures. In this proposal, we focus on the structural learning problem, a crucial step in the pipeline of a temporal sequence classification system; we describe it in more detail in the following sections. A minimal sketch of the pipeline is shown below.
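The sketch uses per-frame Gaussian class models as hypothetical stand-ins for the SLDS likelihoods developed later in this proposal; the decision rule is simply maximum log-likelihood over the class models.

```python
import numpy as np

# A minimal sketch of likelihood-based sequence classification.
# Each "class model" here is a stand-in: a Gaussian fitted to all training
# frames of a class. A real system would use SLDS likelihoods instead.

def fit_gaussian_model(sequences):
    """Fit a per-frame Gaussian to all frames of the training sequences."""
    frames = np.vstack(sequences)                     # (N, D) stacked frames
    cov = np.cov(frames.T) + 1e-6 * np.eye(frames.shape[1])
    return frames.mean(axis=0), cov

def log_likelihood(seq, model):
    """Sum of per-frame Gaussian log-densities under a class model."""
    mu, cov = model
    d = seq - mu
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ti,ij,tj->t', d, inv, d)        # per-frame Mahalanobis
    return float(-0.5 * (quad + logdet + seq.shape[1] * np.log(2 * np.pi)).sum())

def classify(seq, class_models):
    """Assign the sequence to the class with the highest log-likelihood."""
    scores = {name: log_likelihood(seq, m) for name, m in class_models.items()}
    return max(scores, key=scores.get)

# Usage (hypothetical data): models = {"walk": fit_gaussian_model(walk_seqs)}
#                            label = classify(new_seq, models)
```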

Figure 1.2: Examples of various honey bee dance trajectories in stylized forms. The upper row shows trajectory variations which depend on the duration of the waggle phase. The bottom row shows trajectory variations which depend on the orientation of the waggle phase. Waggle duration and angle indicate the distance and orientation to the food source.

1.2 Learning from examples

The tasks of labeling, quantification, and classification entail associated learning problems. Class descriptions, underlying temporal templates, the function between global parameters and the template, similarity measures, and classification rules are often not available even to the domain experts. Hence, it is unlikely that a knowledge engineer who tries to design an automated analysis system but is unfamiliar with the data would possess any such description. In such cases, learning the characteristics of the data from provided examples becomes the crucial step in building a working system. To solve the associated learning problem, we adopt a divide-and-conquer approach where we assume the following properties for multivariate temporal sequences:

1. Temporal sequences consist of multiple primitive patterns.

2. The temporal ordering structure of the primitives provides the characteristic signature of the time-series data.

A learning system built under these assumptions should have the following functionalities:

1. The system should be able to learn the distinct characteristics of the primitive temporal patterns. In the case where the primitive patterns are obvious and labeled examples are available, we adopt a supervised learning approach. Otherwise, the system should be able to discover the repetitive primitive patterns in an unsupervised manner.

2. The system should be able to learn the temporal ordering structures of the primitives across time. The learned structures are important parts of the class signatures which can be used to encode the patterns of different classes of sequences.

Accordingly, we propose a model-based approach where the model consists of sub-models which switch over time with a certain temporal ordering structure.

1.3 Model-based Approach: SLDSs

We adopt a model-based approach to characterize temporal sequences. In other words, parametric models are built from training sequences and used to analyze novel temporal sequences. The basic generative model we adopt is the switching linear dynamic system (SLDS). The SLDS model assumes that complex temporal sequences can be described by Markov switching between a set of simpler primitives which are linear dynamic systems (LDSs), often called Kalman filter models. Hence, the distinctive feature of SLDSs is that they use LDSs as their primitives, in contrast to the piecewise constant approximators used by the popular hidden Markov models (HMMs) [48].

LDS primitives have potential advantages over the piecewise constant approximators of HMMs in several respects. First, LDSs can be more descriptive and concise for diverse types of temporal sequences.

Figure 1.3: Modeling of a 1D temporal sequence by an LDS and a piecewise approximation. The blue x's indicate the original training sequence changing over time (x-axis). The red and green lines represent the approximation results of an LDS model and a constant approximator. It can be observed that an LDS can model the example dynamic sequence more accurately than a constant approximator.

Every LDS encodes the associated dynamics, not absolute values, exhibited by the corresponding portion of the data. Hence, an LDS can describe the primitive patterns of temporal sequences more concisely and accurately when the data exhibits certain dynamics. In contrast, the piecewise approximation primitives used by HMMs discard the temporal dynamic information and extract key values from the data. For example, the 1D sequence in Fig. 1.3 can be modeled concisely by an LDS, while a piecewise approximator simply learns the mean of the data and fails to capture the distinctive dynamic information.

Additionally, LDSs can effectively deal with high-dimensional data through dimensionality reduction. An LDS is a dynamic extension of the factor analysis (FA) model [6, 22], a dimensionality reduction technique. In certain domains, data may have very high dimensionality, e.g., the video data of the computer vision community. While a huge number of piecewise approximators would be needed to extract informative key points from high-dimensional sequences, a relatively small number of LDSs may extract informative dynamic patterns from a low-dimensional subspace.

In terms of temporal ordering structure, standard SLDSs capture such structure in a pairwise probabilistic transition matrix under the Markov assumption, identical to HMMs. The Markov assumption simplifies the temporal structure learning problem for switching models by assuming that the short-term switching patterns of discrete modes are sufficiently descriptive to encode distinct classes of data.

1.4 Beyond standard SLDSs

While standard SLDSs have the promising properties mentioned in Section 1.3, they can be extended to produce more accurate results for the tasks of labeling, quantification, and classification. Such improvement can be achieved by extending the model to capture more descriptive temporal structures, and by enhancing the model to explicitly encode global parameters.

1.4.1 Duration modeling for LDSs

While SLDSs adopt LDSs as powerful primitives, the Markov assumption limits SLDSs from capturing descriptive duration patterns of LDSs. The fact that LDSs represent dynamics suggests that every LDS is expected to be active for a longer time-span than the piecewise constant approximators of HMMs. However, the duration model induced by the Markov assumption assigns the highest probability to a duration of one. Consequently, accurate segmentation results can be expected only when the observations are minimally noisy. Sensory data, however, can contain a substantial amount of noise. In the presence of noise, the simple Markov assumption can result in over-segmentation of the data, where false labels with very short durations are inserted due to the noise at those particular frames. Capturing the duration patterns of LDSs is therefore especially important for improving the accuracy of labeling tasks.
An extension of SLDS whose LDSs carry more accurate duration models would be able to effectively discount such short-term strong noise and produce more accurate labeling results, guided by prior knowledge of the durations of the LDS models. An extension of SLDS with this ability, named segmental SLDSs, is presented in

Chapter 3 as one of the contributions of this proposal.

1.4.2 Explicit modeling of global parameters

The standard SLDS does not provide a principled way to model parameterized temporal sequences, i.e., data which exhibits systematic temporal and spatial variations. We adopt an approach where we explicitly model the function which deforms an underlying canonical template based on the global parameters to produce data with systematic variations. An illustrative example is the honey bee dance: bees communicate the orientation of and distance to food sources through the dance angles and waggle durations of their stylized dances. In this example, the canonical underlying template is the prototype dance trajectory illustrated in Fig. 1.1(a), and the resulting example trajectories are shown in Fig. 1.2.

We introduce a new representation for the class of parameterized multivariate temporal sequences: parametric SLDSs (P-SLDSs). Our work is partly inspired by the previous work of Wilson & Bobick, who presented parametric HMMs [59], an extension of HMMs with which they successfully interpreted human gestures. As in P-HMMs, the P-SLDS model incorporates global parameters that underlie systematic spatial variations of the overall target motion. In addition, while P-HMMs only introduced global observation parameters which describe spatial variations in the outputs, we additionally introduce dynamic parameters which capture temporal variations. As mentioned earlier, the problems of global parameter quantification and labeling can be solved simultaneously. Hence, we formulate expectation-maximization (EM) methods for learning and inference in P-SLDSs and present them in Chapter 4 of this thesis proposal.

1.4.3 Hierarchical SLDSs

We propose to address the learning of a hierarchical extension of SLDSs as future work of this proposed research. The use of hierarchical models for automated temporal sequence analysis, instead of flat Markov models, is motivated by the fact that hierarchical models can capture correlations between events appearing far apart in a sequence, while flat Markov models can only capture local correlations. Additionally, hierarchical models can encode repetitive patterns more precisely than flat Markov models, which tend to learn averaged switching patterns.

For example, consider the two different training sequences in Fig. 1.4 and Fig. 1.5, where the dynamic patterns are color-coded. The sequence in Fig. 1.4(a) starts with the upward triangle pattern and finishes with the downward triangle pattern. In contrast, the training sequence in Fig. 1.5(a) starts with the downward triangle pattern and finishes with the upward triangle pattern. On the right side of Fig. 1.4 and Fig. 1.5, the transition tables of the corresponding hierarchical models and flat Markov models are shown. It can be observed that the hierarchical models capture the long-term temporal structure along with the detailed repetitive patterns, which allows the two different models to conduct reliable classification between the two structurally different training sequences. Moreover, it is expected that the captured repetitive patterns of the upward and downward triangles in the bottom layer of the hierarchical model will help produce more accurate labeling results even in the presence of observation noise.
On the other hand, the inability of flat Markov models to capture both the long-term and the repetitive patterns can be seen in the transition tables in Fig. 1.4(c) and Fig. 1.5(c), which are identical: Markov models only learn the averaged switching patterns (a toy verification of this averaging behavior follows below).

However, although hierarchical models offer the possibility of superior accuracy in automated sequence analysis tasks, their success crucially depends on the availability of learning algorithms which can induce representative models from data. The substantially increased number of parameters of hierarchical models, in comparison to flat Markov models, renders the optimization-based learning approach very challenging: in the enlarged parameter space, many local minima can hinder the model from capturing meaningful structure unless the given structure and the initial parameters are excellent, which is a very hard problem in its own right. Accordingly, we plan to address the learning problem of hierarchical SLDS models as future work. In particular, we plan to study an effective way to induce hierarchical structures from data, a relatively little-studied problem compared to the inference and parameter learning problems with a given structure. The details of this proposed research are described in the latter part of this proposal.
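The averaging behavior of flat Markov models can be checked directly. In the hypothetical sketch below, two label sequences built from the same repetitive sub-patterns, but in opposite long-term orders (mirroring Figs. 1.4 and 1.5), yield flat transition counts that differ only in a single boundary transition:

```python
from collections import Counter

def transition_counts(labels):
    """Count pairwise transitions: the statistic a flat Markov model learns."""
    return Counter(zip(labels[:-1], labels[1:]))

# Two training sequences: the same repetitive sub-patterns (AB and CD),
# appearing in opposite long-term orders.
seq1 = list("ABABAB" + "CDCDCD")
seq2 = list("CDCDCD" + "ABABAB")

print(transition_counts(seq1))
print(transition_counts(seq2))
# The two count tables differ only in the single boundary transition
# (B->C vs. D->A); after normalization the flat models are nearly
# identical, so the long-term ordering of the patterns is lost.
```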

Figure 1.4: (a) A training sequence. (b) A two-level hierarchical model and (c) a flat Markov model, learned from the training data on the left.

Figure 1.5: (a) A training sequence. (b) A two-level hierarchical model and (c) a flat Markov model, learned from the training data on the left.

1.5 Summary of Contributions

In this research proposal, we present three contributions which extend SLDSs to produce superior accuracy over the standard SLDS model for automated temporal sequence analysis tasks:

1. SLDSs with duration models, segmental SLDSs, and the associated learning and inference algorithms are presented in Chapter 3. S-SLDSs produce more accurate labeling results than standard SLDSs as a result of encoding the duration patterns of LDS primitives.

2. Parametric SLDSs, a new representation which explicitly models the deformation function between the canonical template and the global parameters, are presented in Chapter 4. P-SLDSs provide a principled way to estimate high-level global parameters from data. The associated learning and inference algorithms are presented in this work.

3. Hierarchical SLDSs provide a principled way to conduct sequence analysis tasks for temporal data with nontrivial temporal structure. We note that one of the core issues in making hierarchical models practical for real-world problems is the structure learning problem, and we plan to study this issue accordingly. A reliable structural learning method will help induce a hierarchical model which effectively encodes the descriptive temporal structure of the data and accordingly demonstrates superior accuracy on sequence analysis tasks.

The rest of this thesis proposal describes the above-mentioned extensions in detail, with experimental results where available. Otherwise, we outline the proposed research plan for the uncompleted portion of the work.

Chapter 2

Background: Switching Linear Dynamic Systems

Switching linear dynamic system (SLDS) models have been studied in a variety of problem domains. Representative examples include computer vision [46, 45, 47, 38, 9, 42], computer graphics [32, 49], control systems [58], econometrics [26], speech recognition [44, 50], tracking [5], machine learning [30, 21, 40, 39, 24], signal processing [15, 16], statistics [55] and visualization [60]. While there are several versions of SLDS in the literature, this proposal addresses the model structure depicted in Figure 2.2.

An SLDS model represents the nonlinear dynamic behavior of a complex system by switching among a set of linear dynamic models over time. In contrast to HMMs [48], the Markov process in an SLDS selects from a set of continuously-evolving linear Gaussian dynamics, rather than from a fixed Gaussian mixture density. As a consequence, an SLDS has potentially greater descriptive power. Offsetting this advantage is the fact that exact inference in an SLDS is intractable when the continuous states remain coupled across switches, which complicates inference and parameter learning [29].

The rest of this chapter is organized as follows. First, we review linear dynamic systems (LDSs) and their extension, switching LDSs (SLDSs). Then, we review a set of approximate inference techniques and an EM-based learning method for SLDSs. Finally, related work is reviewed and the chapter concludes.

2.1 Linear Dynamic Systems

Figure 2.1: A linear dynamic system (LDS)

A linear dynamic system (LDS) is a time-series state-space model consisting of a linear Gaussian dynamics model and a linear Gaussian observation model. The graphical representation of an LDS is shown in Fig. 2.1. The Markov chain at the top represents the evolution of the continuous hidden states $x_t$. The prior density $p_1$ on the initial state $x_1$ is assumed to be normal with mean $\mu_1$ and covariance $\Sigma_1$, i.e., $x_1 \sim \mathcal{N}(\mu_1, \Sigma_1)$. The state $x_t$ is obtained as the product of the state transition matrix $F$ and the previous state $x_{t-1}$, corrupted by zero-mean white noise $w_t$ with covariance matrix $Q$:

$$x_t = F x_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, Q) \tag{2.1}$$

In addition, the measurement $z_t$ is generated from the current state $x_t$ through the observation matrix $H$, corrupted by zero-mean observation noise $v_t$:

$$z_t = H x_t + v_t, \qquad v_t \sim \mathcal{N}(0, V) \tag{2.2}$$

Thus, an LDS model $M$ is defined by the tuple $M = \{(\mu_1, \Sigma_1), (F, Q), (H, V)\}$. Exact inference in an LDS can be performed using the RTS smoother [4], an efficient variant of belief propagation for linear Gaussian models. Further details on LDSs can be found in [4, 33, 51].

Linear dynamic systems have often been used for tracking problems [4, 33]. In addition, LDSs have been used to compactly model the overall texture of video scenes, with the video as a sequence of observations, and to generate an infinitely long video similar to the training sequences [14]. In other work, multiple LDSs were used to segment video w.r.t. the associated temporal texture patterns [11].

2.2 Switching Linear Dynamic Systems

Figure 2.2: Switching linear dynamic systems (SLDS)

In a switching LDS (SLDS), we assume the existence of $n$ distinct LDS models $M = \{M_l \mid 1 \le l \le n\}$. The graphical model corresponding to an SLDS is shown in Fig. 2.2. The middle chain, representing the hidden state sequence $X = \{x_t \mid 1 \le t \le T\}$, together with the observations $Z = \{z_t \mid 1 \le t \le T\}$ at the bottom, is identical to an LDS in Fig. 2.1. However, we now have an additional discrete Markov chain $L = \{l_t \mid 1 \le t \le T\}$ that determines which of the $n$ models $M_l$ is used at every time-step. We call $l_t \in M$ the label at time $t$ and $L$ a label sequence.

The switching state $l_t$ is obtained from the previous label $l_{t-1}$ based on the Markov transition model $P(l_t \mid l_{t-1}, B)$, represented as an $n \times n$ transition matrix $B$ that defines the switching behavior between the $n$ distinct LDS models:

$$l_t \sim P(l_t \mid l_{t-1}, B) \tag{2.3}$$

The state $x_t$ is obtained as the product of the corresponding state transition matrix $F_{l_t}$ and the previous state $x_{t-1}$, corrupted by zero-mean white noise $w_t$ with covariance matrix $Q_{l_t}$:

$$x_t = F_{l_t} x_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, Q_{l_t}) \tag{2.4}$$

In addition, the measurement $z_t$ is generated from the current state $x_t$ through the corresponding observation matrix $H_{l_t}$, corrupted by zero-mean observation noise $v_t$ with covariance $V_{l_t}$:

$$z_t = H_{l_t} x_t + v_t, \qquad v_t \sim \mathcal{N}(0, V_{l_t}) \tag{2.5}$$

Finally, in addition to the set of LDS models $M$ and the transition matrix $B$, we specify a multinomial distribution $\pi(l_1)$ over the initial label $l_1$:

$$l_1 \sim \pi(l_1) \tag{2.6}$$
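The generative model of Eqs. 2.3-2.6 can be summarized by a short forward-sampling sketch. The two LDS regimes below (a slow rotation and a gentle contraction) are hypothetical placeholders, not parameters from any dataset:

```python
import numpy as np

def sample_slds(models, B, pi, T, rng):
    """Forward-sample labels, states, and observations per Eqs. 2.3-2.6."""
    n = len(models)
    l = rng.choice(n, p=pi)                          # l_1 ~ pi(l_1)      (2.6)
    m = models[l]
    x = rng.multivariate_normal(m["mu1"], m["S1"])   # x_1 ~ N(mu_1, Sigma_1)
    labels, X, Z = [], [], []
    for t in range(T):
        if t > 0:
            l = rng.choice(n, p=B[l])                # l_t ~ P(.|l_{t-1}) (2.3)
            m = models[l]
            x = m["F"] @ x + rng.multivariate_normal(np.zeros(x.size), m["Q"])  # (2.4)
        z = m["H"] @ x + rng.multivariate_normal(np.zeros(m["H"].shape[0]), m["V"])  # (2.5)
        labels.append(l)
        X.append(x)
        Z.append(z)
    return labels, np.array(X), np.array(Z)

# Two hypothetical 2-D regimes: a slow rotation and a gentle contraction.
def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

I2 = np.eye(2)
base = dict(mu1=np.zeros(2), S1=I2, H=I2, V=0.05 * I2)
models = [dict(base, F=rot(0.3), Q=0.01 * I2),
          dict(base, F=0.9 * I2, Q=0.01 * I2)]
B = np.array([[0.95, 0.05],
              [0.05, 0.95]])
labels, X, Z = sample_slds(models, B, pi=np.array([0.5, 0.5]), T=200,
                           rng=np.random.default_rng(0))
```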

Algorithm 1 Expectation-maximization (EM) based learning algorithm for SLDSs

1. Initialize the learning process with an initial model parameter tuple $\Theta^0$.

2. E-step: infer the posterior distribution

$$f_i(L, X) = P(L, X \mid Z, \Theta^i) \tag{2.7}$$

over the hidden variables $L$ and $X$, using the current guess for the SLDS parameters $\Theta^i$.

3. M-step: obtain the updated $\Theta^{i+1}$ which maximizes the expected log-likelihood:

$$\Theta^{i+1} \leftarrow \arg\max_{\Theta} \left\langle \log P(L, X, Z \mid \Theta) \right\rangle_{f_i(L,X)} \tag{2.8}$$

4. Check convergence via log-likelihood monitoring. If converged, stop. Otherwise, go back to Step 2 and repeat.

In summary, a standard SLDS model is defined by a tuple $\Theta = \{\pi, B, M = \{M_l \mid 1 \le l \le n\}\}$. It is worth noting that previous work on SLDSs often adopts simplified versions of the SLDS model described above by introducing different assumptions on parameter tying. Variations include [55, 21], where only the observation models ($H$ and $V$) switch; [46, 45, 47, 38], where only the dynamics models ($F$ and $Q$) switch with a single observation model; and [44], where all the parameters ($F, Q, H, V$) switch but with the additional assumption that successive continuous states decouple when switching occurs. Work with the most generic model, without any parameter tying, includes [4, 26, 50, 42].

2.3 Inference in SLDS

Inference in an SLDS model involves computing the posterior distribution of the hidden states, which consist of the (discrete) switching states $L$ and the (continuous) dynamic states $X$. More formally, the inference procedure in SLDSs corresponds to the computation of the posterior $P(L, X \mid Z, \Theta)$ over the hidden variables, the label sequence $L$ and the state sequence $X$, given the observation sequence $Z$ and known parameters $\Theta$. In application domains such as behavior recognition, users are typically interested in inferring the switching states $L$ [42], i.e., the labeling problem. On the other hand, the continuous state sequence $X$ is the variable of interest in applications such as tracking or signal processing. It is worth noting that inference is also the crucial step in parameter learning via the EM algorithm, in addition to its central role in state estimation.

However, exact inference in SLDSs has been proven intractable [29], with the exception of [44], where the authors assumed that the successive continuous states $(x_t, x_{t+1})$ decouple when switching occurs. Consequently, an array of approximate inference methods has been developed, focused primarily on three classes of techniques:

1. Stage-wise methods such as approximate Viterbi or GPB2, which maintain a constant representational size for each time step as data is processed sequentially [46, 45, 47, 4].

2. Structured variational methods which approximate the intractable exact model with a tractable, decoupled model [47, 21, 39]. Expectation propagation [35, 60] belongs to this class of algorithms since it maintains a constant representational size.

3. Sampling-based methods which sample the hidden variables using Monte Carlo techniques [10, 15, 16, 50, 40].

2.4 Learning in SLDS

The maximum-likelihood (ML) SLDS model parameters $\hat{\Theta}$ can be obtained using the expectation-maximization (EM) algorithm [12]. The hidden variables in EM are the label sequence $L$ and the state sequence $X$. Given the observation

data $Z$, EM proceeds as described in Algorithm 1. Above, $\langle \cdot \rangle_W$ denotes the expectation of a function $(\cdot)$ under a distribution $W$. The E-step in Eq. 2.7 corresponds to the inference procedure for SLDSs. As mentioned in Section 2.3, exact inference in SLDSs is intractable [29]. Hence, we must rely on one of the approximate inference methods described in Section 2.3 for the E-step, unless we use the variation in [44].

The learning procedure in Algorithm 1 is simplified when the ground-truth LDS labels for the training sequences are known. In that case, each LDS's parameters are learned separately from the corresponding parts of the data. Then, the parameters of the discrete process, the initial distribution $\pi(l_1)$ and the Markov switching matrix $B$, are learned separately. On the other hand, if the ground-truth labels are unavailable, the underlying LDS models must be learned in an unsupervised way. In that case, the labels inferred in Algorithm 1 are used to partition the data, which provides the basis for learning distinct LDS models. A sketch of the supervised case is given at the end of this chapter.

2.5 Related Work

The development of SLDSs is closely related to work on dimensionality reduction [57, 52, 22] for static (non-time-series) data as well. In particular, the SLDS model can be viewed as a dynamic parallel to work on modeling complex static datasets as mixtures of low-dimensional linear models [57, 22]. This analogy can be made since SLDSs aim to model non-linear, high-dimensional data as mixtures of locally linear dynamic systems which switch over time in the latent space.
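To close the chapter, here is a minimal sketch of the supervised learning case from Section 2.4. It assumes, for illustration only, that the continuous states are observed (in practice they come from the E-step); the observation models ($H$, $V$) would be fit analogously from state-observation pairs, and every label is assumed to occur in runs of length at least two:

```python
import numpy as np

def learn_supervised_slds(X, labels, n):
    """Supervised SLDS learning sketch: fit each LDS's dynamics (F_l, Q_l) by
    least squares on its own segments, and estimate the switching matrix B
    by counting label transitions. X: (T, D) states; labels: length-T ints."""
    F, Q = [None] * n, [None] * n
    for l in range(n):
        # Consecutive state pairs (x_{t-1}, x_t) that stay within label l.
        idx = [t for t in range(1, len(X)) if labels[t] == l == labels[t - 1]]
        X0, X1 = X[[t - 1 for t in idx]], X[idx]
        A = np.linalg.lstsq(X0, X1, rcond=None)[0]   # solves X0 @ A ~= X1
        F[l] = A.T                                   # so x_t ~= F_l x_{t-1}
        R = X1 - X0 @ A                              # dynamics residuals
        Q[l] = (R.T @ R) / max(len(idx) - 1, 1)      # residual covariance
    B = np.full((n, n), 1e-6)                        # smoothed transition counts
    for t in range(1, len(labels)):
        B[labels[t - 1], labels[t]] += 1
    return F, Q, B / B.sum(axis=1, keepdims=True)
```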

Chapter 3

Segmental Switching Linear Dynamic Systems

3.1 Need for Improved Duration Modeling for Markov Models

The duration modeling capability of any first-order Markov model is limited by its assumption on the transitions between the discrete switching states. As a consequence of the Markov assumption, the probability of remaining in a given switching state follows a geometric distribution:

$$P(d) = a^{d-1}(1 - a) \tag{3.1}$$

Above, $d$ denotes the duration of a given switching state and $a$ denotes the probability of a self-transition. One consequence of this model is that a duration of one time-step carries the largest probability mass. This can be seen in Fig. 3.1, where the red curve depicts the geometric distribution.

In contrast, many natural temporal phenomena exhibit regularity in the duration over which a given model or regime is active. In such cases the geometric duration models of Markov models do not effectively encode the regularity of the data. A honey bee dance is a good example: a dancer bee will attempt to stay in the waggle regime for a certain duration to effectively communicate a message. Another example is a walking human: humans exhibit walking cycles of a certain duration, which would be modeled better using a Gaussian density. In such cases, it is clear that the actual duration distribution diverges from a geometric distribution.

Figure 3.1: A Gaussian model is closer to the duration distribution of training data (shown as the overlaid histogram) than a geometric duration model.

To illustrate this point, we learned a duration model for the waggle phase using a Gaussian density and a conventional geometric distribution, using one of the available manually labeled dance sequences. Figure 3.1 shows the learned geometric and Gaussian distributions for comparison. It can be observed that the Gaussian model is much closer to the training data than the conventional geometric model.
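A fit of the kind shown in Fig. 3.1 takes only a few lines. Below is a minimal sketch with hypothetical duration data standing in for the labeled waggle segments; the maximum-likelihood geometric fit for Eq. 3.1 (support $d \ge 1$) uses $a = 1 - 1/\bar{d}$:

```python
import numpy as np

def fit_durations(durations):
    """Fit a geometric model P(d) = a^(d-1)(1-a) (Eq. 3.1) and a Gaussian
    to the same set of observed segment durations."""
    d = np.asarray(durations, dtype=float)
    a = 1.0 - 1.0 / d.mean()        # ML self-transition prob. (mean = 1/(1-a))
    return a, d.mean(), d.std()

def geometric_pmf(d, a):
    return a ** (d - 1) * (1 - a)

# Hypothetical waggle-phase durations (frames): concentrated around ~25,
# far from the mode-at-one shape of the geometric fit.
durations = [22, 27, 24, 26, 23, 25, 28, 24]
a, mu, sigma = fit_durations(durations)
print(geometric_pmf(np.arange(1, 5), a))   # geometric mass is highest at d = 1
```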

The limitations of the geometric distribution have previously been addressed for HMMs, and HMM variants with enhanced duration modeling capabilities have been developed [17, 31, 53, 44]. The variable-duration HMM (VD-HMM) was introduced in [17]: state durations are modeled explicitly, in a variety of PDF forms. Later, a different parameterization of state durations was introduced where the state transition probabilities are modeled as functions of time; these are referred to as non-stationary HMMs (NS-HMMs) [31]. It has since been shown that the VD-HMM and the NS-HMM are duals [13]. In addition, segmental HMMs with random effects were developed in the data mining community [20, 27]. Ostendorf et al. provide an excellent discussion of segmental HMMs in [44]. We adopt similar ideas to arrive at SLDS models with enhanced duration modeling.

3.2 Segmental SLDS

We introduce the segmental SLDS (S-SLDS) model, which improves upon the standard SLDS model by relaxing the Markov assumption from the time-step level to a coarser segment level. The development of the S-SLDS model is motivated by the regularity in durations that is often exhibited in nature. For example, as discussed in Section 3.1, a dancer bee will attempt to stay in the waggle regime for a certain duration to effectively communicate the distance to the food source. In such a case, the geometric distribution induced by a standard SLDS is not an appropriate choice: Fig. 3.1 shows that a geometric distribution assigns the highest probability to a duration of a single time step. As a result, label inference in standard SLDSs is susceptible to over-segmentation.

In an S-SLDS, durations are first modeled explicitly [17] and then non-stationary duration functions [31] are derived from them; both are learned from data. As a consequence, the S-SLDS model has more descriptive power and can yield more accurate results than standard SLDSs. Nonetheless, we show that one can always convert a learned S-SLDS model into an equivalent standard SLDS operating in a different label space. This approach has the significant advantage of allowing us to reuse the large array of approximate inference and learning techniques developed for SLDSs.

3.2.1 Conceptual View on the Generative Process of S-SLDS

Figure 3.2: A schematic view of an S-SLDS with explicit duration models.

In an S-SLDS, we deal with segments of finite duration, with a pre-selected maximum duration $D^{max}_l$ for every label $l$. Each segment $s_i = (l_i, d_i)$ is described by a tuple consisting of a label $l_i$ and a duration $d_i$. Within each segment, a fixed LDS model $M_{l_i}$ is used to generate the continuous state sequence for the duration $d_i$, which follows the associated duration model $D_{l_i}$. As in SLDSs, we take an S-SLDS to have an $n \times n$ semi-Markov label transition matrix $\tilde{B}$ that defines the switching behavior between segment labels, and an initial distribution $\tilde{\pi}(l_1)$ over the label $l_1$ of the first segment $s_1$. The tilde denotes that the matrix is a semi-Markov transition matrix between segments rather than between time-steps. Additionally, we associate each label $l$ with a fixed duration model $D_l$. We denote the set of $n$ duration models as $D = \{D_l(d) \mid 1 \le l \le n\}$, and refer to them in what follows as explicit duration models:

$$l_i \sim P(l_i \mid l_{i-1}, \tilde{B}) \qquad \text{and} \qquad d_i \sim D_{l_i}$$

In summary, an S-SLDS is defined by a tuple $\tilde{\Theta} = \{\tilde{\pi}, \tilde{B}, D = \{D_l \mid 1 \le l \le n\}, M = \{M_l \mid 1 \le l \le n\}\}$.

A schematic depiction of an S-SLDS is illustrated in Fig. 3.2. The top chain in the figure is a series of segments, where each segment is depicted as a rounded box. In the model, the current segment $s_i = (l_i, d_i)$ generates the next segment $s_{i+1}$ in the following manner: first, the current label $l_i$ generates the next label $l_{i+1}$ based on the label transition matrix $\tilde{B}$; then, the next duration $d_{i+1}$ is generated from the duration model for the label $l_{i+1}$, i.e., $d_{i+1} \sim D_{l_{i+1}}(d)$. The dynamics of the continuous hidden states and observations are identical to a standard SLDS: a segment $s_i$ evolves the set of continuous hidden states $X$ with the corresponding LDS model $M_{l_i}$ for the duration $d_i$; then the observations $Z$ are generated given the labels $L$ and the set of continuous states $X$.

3.2.2 Summary of Contributions on S-SLDS

The conceptual S-SLDS model described in Section 3.2.1 has been formulated formally within a graphical model framework [43, 42], where appropriate learning and inference methods have been developed. The developed learning and inference techniques can be summarized as follows:

- A learning method is developed within the EM framework.

- The inference method builds on the previously developed inference techniques for standard SLDSs. Specifically, we introduced a technique to convert an S-SLDS model into a standard SLDS model, which lets us reuse the existing array of inference techniques.

- Techniques to speed up the inference process have been developed. The resulting algorithm has complexity $O(T D_{max} L^2)$, compared to the complexity $O(T D_{max}^2 L^2)$ that results from a blind reuse of the existing methods. For reference, the complexity of inference in a standard SLDS is $O(T L^2)$.

For more details, please refer to the published work [43, 42].
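The segment-level generative loop of Section 3.2.1 can be sketched directly. The sketch below samples only the (label, duration) layer, with hypothetical Gaussian duration models and a zero-diagonal semi-Markov matrix $\tilde{B}$ (self-transitions are unnecessary at the segment level); the per-segment continuous roll-out would follow the standard SLDS equations:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_segments(pi, B_tilde, duration_models, T):
    """Sample (label, duration) segments until T time-steps are covered.
    duration_models[l]() draws d_i ~ D_l; B_tilde is the semi-Markov matrix."""
    l = rng.choice(len(pi), p=pi)                 # l_1 ~ pi
    segments, t = [], 0
    while t < T:
        d = max(1, int(duration_models[l]()))     # d_i ~ D_{l_i}, at least 1
        segments.append((l, min(d, T - t)))       # clip the final segment
        t += d
        l = rng.choice(len(pi), p=B_tilde[l])     # l_{i+1} ~ P(. | l_i, B~)
    return segments

# Three hypothetical dance regimes with Gaussian duration models.
dur = [lambda: rng.normal(25, 3),
       lambda: rng.normal(15, 2),
       lambda: rng.normal(15, 2)]
B_tilde = np.array([[0.0, 0.5, 0.5],   # zero diagonal: a segment never
                    [0.5, 0.0, 0.5],   # transitions to its own label
                    [0.5, 0.5, 0.0]])
print(sample_segments(np.ones(3) / 3, B_tilde, dur, T=200))
```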

Chapter 4

Parametric Switching Linear Dynamic Systems

4.1 Need for the Modeling of Global Parameters

Temporal sequences are often parameterized by global factors. For example, people walk, but at different paces and with different styles. Sound fluctuates, but with different frequencies and amplitudes. Hence, one important problem in temporal sequence analysis is quantification, by which we mean the identification of the global parameters that underlie the behavior of the signals, e.g., the direction of a pointing gesture. However, most switching system models, e.g., HMMs or SLDSs, are only designed to label temporal sequences into different regimes. Nonetheless, these two inference tasks are not independent: a better understanding of the systematic variations in the data can improve labeling results, and vice versa. For instance, a speech recognition system which is aware of the general utterance pattern of a specific person would be able to recognize that person's speech more accurately.

The consideration of global parameters is motivated by two observations: (1) temporal sequences can often be described as the combination of a representative template and underlying global variations, and (2) we are often more interested in estimating the global parameters than in the exact categorization of the sub-regimes. Unfortunately, the standard SLDS model does not provide a principled way to quantify temporal and spatial variations w.r.t. a fixed (canonical) underlying behavioral template.

Previously, Wilson & Bobick addressed this problem by presenting parametric HMMs (P-HMMs) [59]. In a P-HMM, the learned parametric observation models are conditioned on global observation parameters, so that globally parameterized gestures can be recognized. P-HMMs have been used to interpret human gestures and demonstrated superior recognition performance in comparison to standard HMMs. Inspired by P-HMMs, we extend the standard SLDS model, resulting in the parametric SLDS (P-SLDS). As in a P-HMM, the P-SLDS model incorporates global parameters that underlie systematic spatial variations of the overall target motion, and it is able to estimate the associated global parameters from data. In addition, while P-HMMs only introduced global observation parameters which describe spatial variations in the outputs, we additionally introduce global dynamic parameters which capture temporal variations. As mentioned earlier, the problems of global parameter quantification and labeling can be solved simultaneously. Hence, we formulate expectation-maximization (EM) methods for learning and inference in P-SLDSs and present them in Section 4.2. A preliminary version of the P-SLDS work appeared in [41].

4.2 Parametric Switching Linear Dynamic Systems

As discussed in Section 4.1, the standard SLDS does not provide a principled means to quantify global variations in motion patterns. For example, honey bees communicate the orientation of and distance to food sources through the (spatial) dance angles and (temporal) waggle durations of their stylized dances, which take place in the hive. These global motion parameters, which encode the message of the bee dance, are the variables that we are most interested in estimating.

In this section, we present the parametric SLDS (P-SLDS) model, which makes it possible to quantify the global variables and solve both the labeling and quantification problems in an iterative manner. The resulting P-SLDS learns canonical behavioral templates from data annotated with the associated global parameters. A P-SLDS effectively decodes the global parameters while it simultaneously labels the sequences. This is done using an expectation-maximization (EM) algorithm [12, 34], presented in the following sections.

Figure 4.1: Parametric SLDS (P-SLDS)

4.2.1 Graphical representation of P-SLDS

In P-SLDSs, a set of global parameters $\Phi = \{\Phi_d, \Phi_o\}$ parameterizes the discrete state transition probabilities and output probabilities. The parameters $\Phi$ are global in that they systematically affect the entire sequence. The graphical model of the P-SLDS is shown in Fig. 4.1. Note that there are two classes of global parameters: the dynamics parameters $\Phi_d$ and the observation parameters $\Phi_o$.

The dynamics parameters $\Phi_d$ represent the factors that cause temporal variations. Different values of $\Phi_d$ result in different switching behavior between behavioral modes. In the case of the honey bee dance, a food source that is far away leads a dancer bee to stay in each dance regime longer, resulting in a dance with a larger radius and less frequent transitions between dance regimes. In terms of the S-SLDS model, the global dynamics parameters are associated with the duration models. In contrast, the observation parameters $\Phi_o$ represent factors that cause spatial variations. A good example is a pointing gesture, where the indicated direction changes the overall arm motion. In the honey bee dance case, one can view the standard SLDS as a behavioral template that can be stretched in time by the global dynamics parameters $\Phi_d$ and spatially rotated by the global observation parameters $\Phi_o$.

The common underlying behavioral template is defined by the canonical parameters $\Theta$. Note that the canonical parameters $\Theta$ are embedded in the conditional dependency arcs in Fig. 4.1. In the bee dance example, the canonical parameters describe the prototypical stylized bee dance. However, the individual dynamics of different bee dances systematically vary from the prototypical dance due to the changing food source locations, which are represented by the global parameters $\Phi$. Notice that the discrete state transitions in the top chain of Fig. 4.1 are instantiated by $\Theta$ and $\Phi_d$, and the observation model at the bottom is instantiated by $\Theta$ and $\Phi_o$, while the continuous state transitions in the middle chain are instantiated solely by the canonical parameters $\Theta$. In other words, the dynamics parameters $\Phi_d$ vary the prototypical switching behavior, and the observation parameters $\Phi_o$ vary the prototypical observation model.

The intuition behind the quantification of the global parameters is that they can be effectively discovered by finding the values that best describe the discrepancies between the new observations and the behavioral template. In other words, the global parameters are estimated by minimizing the residual error that remains between the template and the observation sequence.

Parameterizing the SLDS model results in additional conditioning variables in the initial state distribution $P(l_1 \mid \Theta, \Phi_d)$, the discrete state transition table $P(l_t \mid l_{t-1}, \Theta, \Phi_d)$, and the observation model $P(z_t \mid l_t, x_t, \Theta, \Phi_o)$. There are three possibilities for the nature of the parameterization: (a) the PDF is a linear function of the global parameters $\Phi$, (b) the PDF is a non-linear function of $\Phi$, and (c) no functional form for the PDF is available. In case (c), general function approximators such as neural networks may be used, as suggested in [59].

4.2.2 Summary of Contributions on P-SLDS

In addition to introducing the graphical model of the P-SLDS, appropriate learning and inference methods have been developed [41, 42]. The developed learning and inference techniques can be summarized as follows:

- A learning method for the P-SLDS model is developed within the EM framework. It is assumed that the global parameters are available as part of the training data and that the functional forms of the parameterization are known. Since the global variables are known, learning for P-SLDSs is largely analogous to learning for standard SLDSs with additional parameters.

- The inference method is based on EM as well. The intuition behind EM-based inference is straightforward: our goal is to find the unknown global parameter settings which maximize the likelihood of the model given the test data. Hence, the inference procedure can be formulated within the EM framework by treating the global parameters as missing variables. With reasonable initial values, we have observed that the global variables are refined through the EM iterations and mostly converge to values close to the ground truth.

For more details, please refer to the published work [41, 42].
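As a concrete instance of case (b) above, the honey bee observation model can be viewed as a rotation of the canonical template's outputs, with the dance angle playing the role of $\Phi_o$. The sketch below is an illustrative simplification, not the full EM procedure of [41, 42]: assuming a known template-to-observation correspondence and isotropic noise, the rotation minimizing the residual between template and observations has a closed form (the 2-D orthogonal Procrustes solution):

```python
import numpy as np

def rotate(angle):
    """2-D rotation matrix: the global observation parameter Phi_o
    acts on the template's outputs."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def estimate_phi_o(template, observed):
    """Least-squares estimate of the global rotation Phi_o between a
    canonical template trajectory and an observed instance (both T x 2),
    i.e., the angle minimizing ||observed - template @ R(phi).T||_F."""
    M = observed.T @ template
    return np.arctan2(M[1, 0] - M[0, 1], M[0, 0] + M[1, 1])

# Hypothetical check: rotate a template by 0.7 rad, add noise, recover Phi_o.
rng = np.random.default_rng(2)
template = np.cumsum(rng.normal(size=(100, 2)), axis=0)
observed = template @ rotate(0.7).T + 0.1 * rng.normal(size=(100, 2))
print(estimate_phi_o(template, observed))   # close to 0.7
```

In the full P-SLDS inference, this residual-minimizing update plays the role of the M-step for $\Phi_o$, alternating with label and state inference.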

Chapter 5

Automated analysis of honey bee dances using PS-SLDS

In this chapter, we evaluate our thesis by applying the theory of S-SLDSs developed in Chapter 3 and the theory of P-SLDSs developed in Chapter 4 to the real-world honey bee dataset, for the labeling and quantification tasks. To take advantage of both models, we combine them and use the resulting parametric segmental SLDS (PS-SLDS) model to learn the temporal patterns from data, and then use the learned model to conduct the labeling and quantification tasks. The resulting PS-SLDS model demonstrates superior accuracy over the standard SLDS model for both tasks.

5.1 Motivation

The application domain which motivates the work in this chapter is a new research area which enlists visual tracking and AI modeling techniques in the service of biology [2, 3, 8, 54]. The current state of biological field work is still dominated by manual data interpretation, a time-consuming process. Automatic interpretation methods can provide field biologists with new tools for the quantitative study of animal behavior.

A classical example of animal behavior and communication is the honey bee dance [19], depicted in stylized form in Fig. 5.1(a). Honey bees communicate the location of and distance to a food source through a dance that takes place within the hive. The dance is decomposed into three different regimes: left turn, right turn and waggle. The length (duration) and orientation of the waggle phase correspond to the distance and the orientation to the food source. Figure 5.1(b) shows a dancer bee that was tracked by the previously developed vision-based tracker described in [25]. After tracking, the obtained trajectories of the dancing bees were manually labeled as left turn, right turn or waggle, and are shown in Figure 5.2(1-6).

In this domain, our work on SLDS models is in support of three goals for automatic bee dance analysis. First, we aim to learn the motion patterns of honey bee dances from the labeled training sequences. Second, we should be able to reliably and automatically segment novel sequences into the three dance modes, i.e., the labeling problem. Finally, we face a quantification problem, where the goal is to automatically deduce the message communicated, in this case the distance and orientation to the food source. Note that both the labels and the global parameters are unknown; hence the problem is one of simultaneously inferring these hidden variables.

Figure 5.1: (a) A bee dance consists of three patterns: waggle, left turn, and right turn. (b) The box in the middle is a tracked bee. (c) An example honey bee dance trajectory. The track is automatically obtained using a vision-based tracker and manually labeled afterward.

Figure 5.2: Bee dance sequences (1)-(6) used in the experiments. Each dance trajectory is the output of a vision-based tracker. Tables 5.1 and 5.2 give the global motion parameters for each of the numbered sequences. Key: waggle, right-turn, left-turn.

5.2 Modeling of Honey Bee Dance using PS-SLDS

We describe a model of the honey bee dance based on our parametric segmental SLDS (PS-SLDS), a combination of the P-SLDS and S-SLDS models. We expect the PS-SLDS model to capture both the global spatial and temporal variations of the honey bee dances, which arise from the distinct food source locations, and the non-exponential characteristics of the duration patterns of the dance regimes. The bee dance is parameterized by both classes of global parameters: dynamics parameters for the dance durations and observation parameters for the dance orientations. The global dynamics parameter set $\Phi_d = \{\Phi_{d,l} \mid 1 \le l \le n\}$ is chosen to be correlated with the average duration of each dance regime, where $n = 3$. The global observation parameter $\Phi_o$ is chosen to be the angular orientation of the bee dance. For more details, please refer to [42].

5.3 Experimental Results

The experimental results show that PS-SLDS provides reliable global parameter quantification along with improved recognition in comparison to the standard SLDS. The six dancer bee tracks obtained from the videos are shown in Fig. 5.2. Sample output from our vision-based tracker [25] is shown in Fig. 5.1(b), where the dancer bee is automatically tracked inside the white rectangle. We performed experiments on six video sequences of length 1058, 1125, 1054, 757, 609 and 814 frames, respectively. Once the sequence observations $Z$ were obtained, the trajectories were preprocessed so that the mean of each track is located at (100, 100). Note from Fig. 5.2 that the tracks are noisy and much more irregular than the idealized stylized dance prototype shown in Fig. 5.1(a). The red, green and blue colors represent the right-turn, waggle and left-turn phases. The ground-truth labels were marked manually for comparison and learning purposes. The dimensionality of the continuous hidden states was set to four.

We adopted a leave-one-out (LOO) strategy for evaluation. The parameters are learned from five of the six datasets, and the learned model is applied to the left-out dataset to perform labeling and simultaneous quantification of the angle and average waggle duration. Six experiments were performed using both PS-SLDS and the standard SLDS, so that we test on each sequence once. The PS-SLDS estimates of the angle and average waggle duration (AWD) are obtained directly from the results of global parameter quantification. On the other hand, the SLDS estimates are obtained heuristically, by averaging the transition numbers or averaging the heading angles over the inferred waggle segments.

5.3.1 Learning from Training Data

The parameters of both PS-SLDS and the standard SLDS are learned from the data sequences depicted in Fig. 5.2. The standard SLDS model parameters were learned as described in Section 2.4, based on the

5.3.1 Learning from Training Data

The parameters of both PS-SLDS and the standard SLDS are learned from the data sequences depicted in Fig. 5.2. The standard SLDS model parameters were learned as described in the section on learning in SLDSs, based on the given ground truth labels. The canonical parameter tuple, described in Chapter 4, is learned solely based on the observations Z without any parameter tying; however, the prior distribution π on the first label was set to a uniform distribution.

To learn the PS-SLDS model parameters, the ground truth waggle angles and AWDs were evaluated from the data. Then, each sequence was preprocessed (rotated) so that the waggles head in the same direction, based on the evaluated ground truth waggle angles. This preprocessing was performed to allow the PS-SLDS model to learn the canonical parameters which represent the behavioral template of the dance. Note that six sets of model parameters are learned through the LOO approach, and the global angle of the test sequence is not known a priori during the learning phase. In addition to the model parameters learned by the standard SLDS, PS-SLDS learns the additional duration models D and the semi-Markov transition matrix B, as described in Chapter 3.

5.3.2 Inference on Test Data

During the testing phase, the learned parameter set was used to infer the labels of the left-out test sequence. An approximate Viterbi (VI) method [45, 47] and variational approximation (VA) methods [39] were used to infer the labels in standard SLDSs. The initial probability distributions for the VA method were initialized based on the VI labels: our initialization scheme assigns the VI label at every time-step a probability of 0.8 and each of the other two labels a probability of 0.1. For PS-SLDS, we used the VI method due to its simplicity and speed. Our experiments compare the performance of the SLDS and PS-SLDS models based on the VI and VA methods.

5.3.3 Qualitative Results

Our experimental results demonstrate the superior recognition capabilities of the proposed PS-SLDS model over the original SLDS model. The label inference results on all data sequences are shown in Fig. 5.3. The four color-coded strips in each figure represent, from top to bottom, SLDS VI, SLDS VA, PS-SLDS VI and the ground-truth labels. The x-axis represents time, and the color indicates the label at the corresponding video frame.

The superior recognition abilities of PS-SLDS can be observed from the presented results: the PS-SLDS results are closer to the ground truth than, or comparable to, the SLDS results in all sequences. Sequences 1, 2 and 3 are particularly challenging because the tracking results obtained from the vision-based tracker are noisier, and the patterns of switching between dance modes and the durations of the dance regimes are more irregular than in the other sequences. It can be observed that most of the over-segmentations that appear in the SLDS labeling results disappear in the PS-SLDS labeling results. PS-SLDS still introduces some errors, especially in sequences 1 and 3. However, keeping in mind that even a human expert can introduce labeling noise, the labeling capabilities of PS-SLDS are fairly good.

5.3.4 Quantitative Results

The quantitative results on the angle and average waggle duration quantification show the robust global parameter quantification capabilities of PS-SLDS. Table 5.1 shows, from top to bottom, the absolute errors of the PS-SLDS estimates along with the error rates (%) in parentheses, the SLDS estimates based on the VI and the VA methods, and the ground truth angles. The best estimates are marked in bold. The SLDS estimates are obtained by the heuristic of averaging the heading angles over the frames that were labeled as waggle in the inference step.
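As a concrete illustration of this post-processing heuristic, the following sketch computes both SLDS-style estimates from a per-frame label sequence. The per-frame heading input and the use of a circular mean are our own assumptions for this sketch, not a verbatim account of the implementation.

    import numpy as np

    # Sketch of the SLDS post-processing heuristics; assumes per-frame labels
    # ('waggle', 'left', 'right') and per-frame heading angles in radians.
    def slds_heuristic_estimates(labels, headings):
        """Estimate waggle angle and average waggle duration from SLDS labels."""
        labels = np.asarray(labels)
        headings = np.asarray(headings)

        # Angle estimate: circular mean of headings over waggle-labeled frames.
        waggle = labels == 'waggle'
        angle = np.arctan2(np.sin(headings[waggle]).mean(),
                           np.cos(headings[waggle]).mean())

        # AWD estimate: mean length of contiguous waggle segments (in frames).
        run_lengths, run = [], 0
        for w in waggle:
            if w:
                run += 1
            elif run > 0:
                run_lengths.append(run)
                run = 0
        if run > 0:
            run_lengths.append(run)
        return angle, float(np.mean(run_lengths))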
All of the error values are differences between the estimated results and the known ground truth values. Based on the six tests, PS-SLDS and SLDS show comparable waggle angle estimation capabilities, and there is no distinguishable gap in performance between the VI and VA methods. Our hypothesis is that the over-segmentation errors do not affect the waggle angle estimates as much as they affect the average waggle duration estimates. Note that the maximum error of the PS-SLDS angle estimates was 0.11 radians, for the fifth dataset, which is fairly good considering the noise in the tracking results.

The quantitative results on average waggle duration (AWD) quantification show the advantages of PS-SLDS in quantifying the global dynamics parameters of interest. AWD is an indicator of the distance from the hive to the food source and is valuable data for insect biologists. Table 5.2 shows, from top to bottom, the absolute errors and error rates of the PS-SLDS estimates, the SLDS estimates of the VI and VA methods, and the ground truth AWDs. Again, the best estimates are marked in bold; the PS-SLDS estimates are consistently superior to the SLDS estimates.

Figure 5.3: Label inference results for sequences 1-6 (panels (a)-(f); x-axis: time). Estimates from the SLDS and PS-SLDS models are compared to manually obtained ground truth (GT) labels. Key: waggle, right-turn, left-turn.

The SLDS estimates are obtained by evaluating the means of the waggle durations in the inferred segments. The results again show that the PS-SLDS estimates match the ground truth closely. In particular, we want to highlight the quality of the PS-SLDS AWD estimates for sequences 2, 3, 4 and 5. In contrast, the SLDS estimates in these cases are inaccurate; more specifically, the SLDS estimates deviate far from the ground truth in most cases, with the exception of sequence 4. The reliability of the AWD estimates of the PS-SLDS model shows the benefit of the duration modeling and the canonical parameters supported by the enhanced models.

Sequence        1           2           3           4           5           6
PS-SLDS         0.09 (30)   0.01 (4)    0.03 (3)    0.11 (8)    0.11 (5)    0.06 (8)
SLDS VI         0.05 (16)   0.03 (12)   0.02 (2)    0.09 (7)    0.18 (9)    0.09 (11)
SLDS VA         0.05 (16)   0.03 (12)   0.02 (2)    0.09 (7)    0.18 (9)    0.09 (11)
Ground Truth

Table 5.1: Absolute errors in the global rotation angle estimates from PS-SLDS and SLDS, in radians. The numbers in parentheses are error rates (%). The last row contains the ground truth rotation angles. Sequence numbers refer to Fig. 5.2.

Sequence        1           2           3           4           5           6
PS-SLDS         13.7 (27)   0.91 (2)    1.9 (9)     0.22 (<1)   0.4 (2)     5.6 (17)
SLDS VI         40.8 (79)   28.9 (62)   11.1 (52)   0.44 (1)    3.6 (19)    8 (25)
SLDS VA         40.7 (79)   28.9 (62)   11.1 (52)   0.44 (1)    3.6 (19)    8 (25)
Ground Truth

Table 5.2: Absolute errors in the Average Waggle Duration (AWD) estimates for PS-SLDS and SLDS, in frames. The numbers in parentheses are error rates (%). The last row contains the ground truth AWDs. Sequence numbers refer to Fig. 5.2.

Sequence        1     2     3     4     5     6
PS-SLDS
SLDS DD-MCMC
SLDS VI
SLDS VA

Table 5.3: Accuracy of label inference, in percent. Sequence numbers refer to Fig. 5.2.

Finally, Table 5.3 shows the overall accuracy of the inferred labels for the PS-SLDS, SLDS DD-MCMC, SLDS VI, and SLDS VA results. It can be observed that PS-SLDS provides very accurate labeling results w.r.t. the ground truth. Moreover, PS-SLDS consistently improves upon the standard SLDSs across all six datasets. The overall experimental results show that the PS-SLDS model is promising and provides a robust framework for the bee application. It should be noted that SLDS DD-MCMC is the most computationally intensive method, and PS-SLDS still improves on SLDS DD-MCMC in a consistent manner.

5.4 Conclusion

In this chapter, we presented experimental results on real-world honey bee dance sequences, where the honey bee dances were modeled using a parametric segmental SLDS (PS-SLDS) model, i.e., a combination of P-SLDS and S-SLDS. Both the qualitative and quantitative results show that the enhanced SLDS model can robustly infer the labels and global parameters. A large number of the over-segmentations in labeling which appeared with standard SLDSs are not present in the PS-SLDS results. In addition, the results on the quantification abilities of PS-SLDS show that it can provide estimates which are very close to the ground truth. It was also shown that PS-SLDS consistently improves on SLDSs in overall accuracy. These consistent results show that PS-SLDS improves upon SLDS for the honey bee dance data and suggest that it may be promising for other applications.

Chapter 6

Hierarchical SLDS

We propose to develop a hierarchical SLDS (H-SLDS) model as future work of this proposed research. The H-SLDS model is a generalization of standard SLDSs and is designed to model data with hierarchical structure. For example, a hierarchical automaton model that corresponds to the upward-downward triangle sequence in Fig. 6.1(a) is illustrated in Fig. 6.1(b). It can be observed that the three shared primitive dynamic patterns appear in two different top-level substructures, depending on their relative position in the sequence. In flat Markov models, such repeating sub-structures (in the middle layer) or primitive dynamic patterns (at the bottom level) would need to be redundantly duplicated under the left-to-right modeling assumption. It is the ability to reuse sub-models in different contexts that makes hierarchical models more powerful than flat Markov models.

Figure 6.1: (a) An example upward-downward triangle sequence. (b) An example 3-level hierarchical automaton representing the triangle sequence. Solid lines represent horizontal transitions, dotted lines represent vertical transitions. Double-circled nodes represent production states on which the corresponding dynamic patterns are overlaid.

Good examples of real-world temporal data which exhibit intrinsic hierarchy are honey bees and soccer players, shown in Fig. 6.2. The honey bee community shown in Fig. 6.2(a) consists of three different types of members: a queen, worker bees and drones. While the primitive short-term motion patterns of all three types of bees are rather similar, it is the clear difference in longer-term temporal patterns that lets us identify the role of each bee simply from its motion trajectory. For example, a queen bee has a relatively limited motion range over time, while worker bees show the greatest range of motion, such as dancing within the hive and moving on and off the hive. An automated honey bee role classification system could greatly help field biologists studying the quantitative behavior of bees by saving them the large amount of time needed for the hand-labeling process.

Another example is soccer games, shown in Fig. 6.2(b). A soccer team consists of multiple players whose roles differ substantially depending on their positions, e.g., attacking midfielders, defenders, a goalkeeper, and center forwards. Again, the motion trajectory of each player over a short duration does not provide a strong cue about his role, but the longer-term trajectories provide a relatively strong clue. Moreover, a player in a fixed position may exhibit multiple different patterns depending on whether the team is in an attack phase or a defense phase. Hence, a hierarchical model can be used to identify the top-level role of every player as well as the phases the player is transitioning through. In other words, hierarchical models can be used to label the play of every player at multiple semantic/temporal resolutions.
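Returning to the triangle example of Fig. 6.1(b), the reuse of primitives can be made concrete with the following hypothetical encoding of a 3-level automaton. The state names and primitive labels (rise, fall) are illustrative assumptions only, not the actual model.

    # Hypothetical 3-level automaton in the spirit of Fig. 6.1(b): the
    # primitive dynamics are defined once and reused by both mid-level
    # triangle patterns; all names here are illustrative.
    hierarchical_automaton = {
        'C': ['A', 'A', 'A', 'B', 'B', 'B'],  # top: upward then downward triangles
        'A': ['rise', 'fall'],                # upward triangle reuses primitives
        'B': ['fall', 'rise'],                # downward triangle reuses them too
    }

    def expand(state, automaton):
        """Recursively expand a state into its primitive (production) sequence."""
        if state not in automaton:            # production state: a primitive LDS
            return [state]
        out = []
        for child in automaton[state]:
            out.extend(expand(child, automaton))
        return out

    print(expand('C', hierarchical_automaton))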

Figure 6.2: (a) A scene of a honey bee hive: a queen bee is color-marked in the middle, surrounded by drones and worker bees. (b) A shot of a soccer game where each team consists of multiple players with different roles.

In the following sections, we describe the potential benefits obtained by the use of hierarchical models and introduce the H-SLDS model formally. Then, the structure learning problem for H-SLDSs is discussed, with the anticipated approach to discover the sub-structures and their hierarchy from the data. Finally, the dataset obtained from motion sensors attached to the human body is described as a real-world application.

6.1 Need for Hierarchical SLDSs

The use of hierarchical models for automated temporal sequence analysis, instead of flat Markov models, is motivated by the fact that hierarchical models can capture higher-level temporal patterns in addition to the pairwise LDS switching patterns captured by standard SLDSs. Moreover, hierarchical models provide the ability to construct an agglomerate model from a set of child hierarchical models. This characteristic leads to the advantage of sharing sub-structures and re-using repeating underlying structure, and provides the possibility to model data concisely. Due to these advantages, hierarchical models, when designed appropriately, allow users to solve several sequence analysis tasks more accurately.

Bottom LDS layer labeling. Bottom LDS layer labeling is the task of labeling temporal sequences according to the LDS dynamics exhibited by the data. The labeling task at this granularity is inherently low-level and is apt to produce over-segmentations or incorrect labels when the data exhibits non-trivial noise. Nonetheless, hierarchical models offer the possibility of more accurate low-level labeling, guided by the high-level temporal structure encoded in the model; the expectation is that local data noise is filtered out effectively via the higher-level prior. This advantage suggests that the labeling results at the bottom LDS layer obtained with hierarchical models can be superior to those obtained with flat Markov models.

Classification. Classification tasks on temporal sequences can also benefit from hierarchical models. One of the primary goals of classification is to give users the ability to categorize low-level multivariate signals into semantically meaningful classes. However, there can be a substantial gap between the representational power of a single LDS and the high-level concept classes. In such cases, mappings between the low-level LDSs and the high-level target concepts need to be built. For example, multivariate sequences may be obtained from wearable sensors attached to an athlete. While a trainer would need a system which can automatically categorize the sequences into high-level motion patterns, e.g., jumping, walking, and arm-swings, such motion patterns can hardly be captured by a single LDS, since each high-level motion is rather complex and is expected to consist of sub-motions, i.e., multiple LDSs. In such cases, our model needs to encode the temporal ordering structure between the multiple LDSs.

In the case where different classes are characterized by differences in their long-term ordering structure, e.g., the example in Fig. 6.1, hierarchical models can produce superior classification results by taking these descriptive temporal ordering structures into account.

Sharing of sub-structures. As mentioned above, an important advantage of hierarchical models is that they can share underlying sub-structures. In the case where there are multiple classes of sequences and we are interested in conducting classification or labeling tasks, the sharing of sub-structures can boost the re-use of computation, resulting in a speed-up of inference; this is especially advantageous for classification tasks. The LDS primitives at the bottom level, or the short-term repetitive patterns of LDSs at the intermediate levels, can be shared. Moreover, the learning process over multiple classes can discover interesting shared structure from the data and provide us with more knowledge about the differences between the classes. Flat Markov models, on the other hand, have difficulty with structure sharing and do not provide a principled way to combine multiple models into a single agglomerate model.

Labeling with semantically meaningful concepts. The labeling task with semantically meaningful concepts is to label sequences w.r.t. the high-level semantics they exhibit. In fact, this task is equivalent to the sequential classification task with unknown segment boundaries. Hierarchical models have recursive structure, i.e., they incorporate hierarchical models as their sub-models at lower layers. Hence, an agglomerate hierarchical model can be built from a set of existing models and used for the labeling tasks. Such labeling tasks can hardly be conducted by standard SLDSs in principle, unless the classes correspond to low-level primitives.

As discussed above, hierarchical models offer several ways to improve on flat Markov models for sequence analysis tasks. There has been previous work on hierarchical models within the HMM framework; the related modeling approach is the hierarchical HMM (H-HMM) [18, 37]. The H-SLDS model presented in the following section shares a similar architecture with H-HMMs in that H-SLDSs adopt a hierarchical Markov chain to encode the higher-order structure within the data. The well-studied formalisms developed for H-HMMs are adopted where necessary, with modifications to encode the continuous domain that SLDSs address. In the following sections, a definition of hierarchical SLDSs is introduced and the proposed bottom-up learning approach is discussed.

6.2 Graphical model representation of H-SLDS

We can represent an H-SLDS as a dynamic Bayesian network (DBN), as shown in Fig. 6.3(c); the example model has a two-layer hierarchy. The H-SLDS model, with additional arcs and finish variables denoted by the symbol F, is built incrementally upon the standard SLDS model (shown in Fig. 6.3(a)) through the reasoning steps illustrated in Fig. 6.3(a)-(c). In detail, Fig. 6.3(b) shows an SLDS model with a hierarchical Markov chain, where the downward arcs from the upper layers to the lower layers encode the memberships and the state transitions within each regime; this model has been used in previous SLDS extensions [24, 61]. The proposed H-SLDS model in Fig. 6.3(c) has two types of additional structure: diagonal upward arcs and finish variables. The diagonal upward arcs are designed to identify the lower-level states which trigger the state transitions at the upper levels.
The finish variable structure helps the inference phase to decode the label (segmentation) boundaries more accurately, and it enforces the property that transitions in the upper layers occur only when there is a transition in the lower level. The closest work in terms of hierarchical Markov chain modeling is the H-HMM work by Murphy and Paskin [37].

In more detail, each discrete node is denoted by l^(h)_t, where the superscript (h) is the height of the layer, with the bottom level corresponding to layer 1 and the index increasing upward, and the subscript t denotes the corresponding time-step. A slice of the DBN is marked by the dashed rectangle in Fig. 6.3. The state of the whole H-SLDS at time t is represented by the discrete state vector L^{1:N}_t = [l^1_t, l^2_t, ..., l^N_t], where N is the height of the hierarchy, together with the continuous state x_t and the corresponding observation z_t. The finish variable f^(h)_t at the h-th layer is a Boolean variable which indicates that the current regime, conditioned on the ancestor variables L^{(h+1):N}_t, finishes at time t when f^(h)_t = true. In the case where f^(h)_t = false, the current regime at the h-th layer continues, with the ancestor variables remaining constant, i.e., L^{(h+1):N}_t = L^{(h+1):N}_{t-1}. Note that if f^(h)_t = true, then the finish variables at all levels h' < h below are also true, i.e., f^(h')_t = true.
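As an illustrative sketch of the resulting switching semantics (our own formulation, consistent with the call-return behavior described above and with the H-HMM semantics of [37]; the exact parameterization is deferred to Appendix A), the transition model at layer h can be written as

    P(l^(h)_t | l^(h)_{t-1}, l^(h+1)_t, f^(h-1)_{t-1}, f^(h)_{t-1}) =
        δ(l^(h)_t, l^(h)_{t-1})                     if f^(h-1)_{t-1} = false,
        A^(h)_{l^(h+1)_t}(l^(h)_{t-1}, l^(h)_t)     if f^(h-1)_{t-1} = true and f^(h)_{t-1} = false,
        π^(h)_{l^(h+1)_t}(l^(h)_t)                  if f^(h-1)_{t-1} = true and f^(h)_{t-1} = true,

where A^(h)_k and π^(h)_k denote the within-regime transition matrix and initial distribution of layer h under parent state k, δ is the Kronecker delta, and f^(0)_t ≡ true at the bottom layer. In words: a state persists until its child regime finishes, makes a horizontal transition when only the child finishes, and is re-entered from the (possibly new) parent's initial distribution when its own regime finishes as well.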

Figure 6.3: Graphical model representations of the hierarchical SLDS models; the model of interest in this proposal is (c). The series of figures demonstrates the role of each piece of new structure. (a) A graphical model of a standard SLDS. (b) A hierarchical Markov chain (green rectangle) extends the conventional flat Markov chain of the standard SLDS model; the downward arcs from the upper layers to the lower layers encode the memberships and state transitions within each regime. (c) The proposed H-SLDS model. The additional structure of finish variables and diagonal upward arcs descriptively indicates the lower-level states which trigger the state transitions at the upper level. The finish variables allow the inference phase to decode the segmentation boundaries more accurately and enforce the property that transitions in the upper layers occur only when there is a transition in the lower level.

Figure 6.4: An example three-layer H-SLDS model corresponding to dumbbell exercise routines. In the hierarchic Markov chain, the bottom layer corresponds to primitive LDS components, the second layer corresponds to exercise classes, and the top layer corresponds to exercise plans/routines.

The downward arrows between the discrete states represent the fact that a parent state invokes a child sub-state. The upward-going arcs between the f variables capture the fact that a parent state can only switch when the sub-regime of its child states finishes, effectively enforcing call-return semantics. In contrast to standard SLDSs, there is an additional dependency arrow from the bottom discrete state l^(1)_{t-1} to the continuous state x_t. The two arcs from l^(1)_{t-1} and l^(1)_t onto the continuous state x_t represent our modeling assumption that the two continuous variables x_{t-1} and x_t are decoupled when a switch occurs, i.e., when l^(1)_{t-1} ≠ l^(1)_t. The details are described in the following sections.

The main difference between the structure of the hierarchy presented in this proposal and the work in [37] is that every discrete state l^(h)_t of our model has a single incoming arrow from its parent state l^(h+1)_t, while the model in [37] includes incoming arrows from all the ancestor variables L^{(h+1):N}_t. The simpler structure in this proposal implies that the behavior of a particular layer depends only on the parent state, independent of the rest of the ancestor states, enforcing more modular call-return semantics. This modeling assumption provides two main advantages. First, it reduces the number of parameters, which eases learning as well as inference. Second, the modularized architecture facilitates the bottom-up structure learning problem discussed in the following sections. For brevity, more details on the H-SLDS parameterization are given in Appendix A; for related work, please refer to [37].

An example three-layer H-SLDS model corresponding to dumbbell exercise routines is shown in Fig. 6.4. At the very bottom of the figure, a sample signal sequence is shown. At the first layer of the hierarchic Markov chain, there are LDSs which can generate simple motion trajectory segments. At the second layer, several exercise patterns that we recognize, such as flat curl and back, are built from the primitive LDS components. Finally, the top layer can be thought of as exercise routines that might be recommended by gym instructors.

6.3 Inference in H-SLDS

The inference algorithm for the H-SLDS model can in general be obtained via two approaches. First, an H-SLDS model can be converted to an equivalent standard SLDS model, so that the existing inference algorithms listed in Section 2.3 can be re-used; the techniques used for S-SLDSs [43, 42] can be used to build an efficient version. Second, generic inference algorithms for graphical models can be used to conduct inference in H-SLDSs. For example, a belief propagation (sum-product) algorithm can be used to infer the marginal probabilities at every node of an H-SLDS, with slight modifications necessary to approximate the exponentially growing number of continuous states.

Additionally, a max-sum algorithm can be used to find the (approximately) most likely configurations of an H-SLDS. For details on the belief propagation (sum-product) and max-sum algorithms, see [7]. Currently, the first approach, reusing existing inference methods after a model conversion, is the inference method used in our research.
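To make the model-conversion approach concrete, the following is a minimal sketch of flattening a two-layer hierarchy into a single Markov chain over product states, so that standard SLDS inference can be reused. Replacing the finish-variable mechanism with a per-state termination probability p_fin is a simplifying assumption of this sketch, not the full model.

    import numpy as np

    # Sketch: flatten a two-layer hierarchy into one Markov chain over product
    # states (parent k, child m). The per-state termination probability p_fin
    # is an illustrative simplification of the finish variables.
    def flatten(P_parent, A_child, pi_child, p_fin):
        """P_parent: (K,K); A_child, pi_child, p_fin are indexed by parent k."""
        K = P_parent.shape[0]
        M = A_child[0].shape[0]
        T = np.zeros((K * M, K * M))
        for k in range(K):
            for m in range(M):
                s = k * M + m
                stay = 1.0 - p_fin[k][m]
                # Child continues within the current parent regime.
                T[s, k * M:(k + 1) * M] += stay * A_child[k][m]
                # Child finishes: the parent switches, and the new child regime
                # is re-entered from the new parent's initial distribution.
                for k2 in range(K):
                    T[s, k2 * M:(k2 + 1) * M] += (
                        p_fin[k][m] * P_parent[k, k2] * pi_child[k2])
        return T  # rows sum to 1 when the inputs are properly normalized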

6.4 Learning in H-SLDSs

The learning problem for H-SLDSs consists of two sub-problems: (1) the parameter learning problem given a fixed model structure, and (2) the structure learning problem. For the parameter learning problem, we plan to develop an EM-based method to iteratively improve the parameter setting under the maximum-likelihood principle. Based on the previous success of EM-based parameter learning for H-HMMs [18, 37, 28, 56] and SLDSs [24, 61, 43, 42], the EM-based method is chosen as the candidate learning method for the proposed H-SLDS model. For the structure learning problem, we plan to develop an algorithm which incrementally builds hierarchies in a bottom-up manner. The structure learning problem consists of several sub-problems in itself, and the details and planned research are described separately in the following section.

Bottom-up approach for structure learning

Figure 6.5: A structure learning example with a flipping triangle sequence. Through the procedure, a four-layer H-SLDS is incrementally built by adding hierarchies. The standard SLDS labels are aggregated into hierarchies as parent aggregation states are created for every discovered repetitive pattern. The sizes of the multi-scale resolution time windows grow with the hierarchy. In the right-most column, the newly created aggregation nodes and the assigned child sequence patterns are displayed at the corresponding levels of the hierarchy.

The problem of selecting a model structure is crucial since the space of possible candidate structures is huge: the number of layers, the number of states at every layer, and the parent-child memberships between the states of adjacent layers all need to be specified. In this section, we focus on structure learning from unaligned data obtained from a single class. This is not a loss of generality, since the labeling problem with high-level concept labels can be addressed by building an agglomerate H-SLDS model from a set of already-built H-SLDSs.

The approach for structure learning we propose is a bottom-up approach. Intuitively, an additional layer of hierarchy is built upon the existing model by creating a parent state for frequently appearing sub-string patterns of child states. In other words, a parent state is created with call-return semantics, where it returns when the assigned child string pattern is observed. The bottom-up learning approach consists of three sub-problems: (1) discovery of the primitive LDSs, (2) an incremental layer building algorithm, and (3) a stopping criterion to finalize the bottom-up structure learning.

First, a sufficient (even redundant) number of LDS models are learned from data through either a greedy learning method (e.g., [32]) or a mixture estimation method [11]. Second, once a set of LDSs is available, a single-layer H-SLDS model, i.e., a standard SLDS model, is initialized, where the additional Markov switching model is estimated from data using EM. Then, a second duration layer which abstracts the durations of the regimes is introduced, transforming the single-layer SLDS into a two-layer H-SLDS model, i.e., the S-SLDS model presented in Chapter 3. Consequently, the top-layer segmental labels of the two-layer H-SLDS can now be treated as an alphabet, since distinct symbols correspond to different dynamic patterns. Afterwards, additional layers are added one by one through sub-string analysis, where a new parent state is created for every sub-string that appears frequently. There are existing candidate algorithms to detect frequently occurring sub-string patterns [23]. Among them, the suffix tree algorithm (see [23] for references) looks promising since it finds all the sub-strings and their frequencies in time linear in the length of the data. Finally, we need to decide the appropriate number of levels for the hierarchy and finalize the incremental learning procedure. We plan to investigate an information-theoretic approach where hierarchy learning is finalized once the information gain from additional hierarchies becomes very limited.

A structure learning example with a triangle sequence is shown in Fig. 6.5. The sequence starts with upward triangle patterns and ends with downward triangles, each repeated three times. The figure shows that a hierarchical model can be built via the bottom-up approach. At the bottom level, the most likely labels of the standard SLDS model are obtained. Then, we add a second layer with duration information on top of the standard SLDS model, transforming it into an S-SLDS model, i.e., a two-layer H-SLDS model. It can be observed on the right side of Fig. 6.5 that successive occurrences of identical symbols are grouped into segment symbols (1, 2, 3). Then, repetitive sub-string patterns are searched for in the segmental symbol sequence to build the third layer of the H-SLDS model. If the search is successful, the system finds the two distinct repetitive patterns of upward and downward triangles, where the symbols A and B respectively represent the individual triangle patterns, and one more level of hierarchy is built to discover the two regimes (A, B), where these two top states prefer upward triangles (A) and downward triangles (B) respectively. Finally, a single top state C is created at the top of the hierarchy to encode C = A B, and learning concludes since there is no more information left to be learned.
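The two symbolic steps of this procedure can be sketched as follows. Plain n-gram counting stands in here for the suffix tree algorithm cited above, and the example sequence is a stand-in for the flipping-triangle labels of Fig. 6.5.

    from collections import Counter
    from itertools import groupby

    # Sketch of the two symbolic steps in the bottom-up procedure: grouping
    # repeated labels into segment symbols, then counting repeated sub-strings.
    def to_segments(labels):
        """Collapse runs of identical labels into single segment symbols."""
        return [symbol for symbol, _ in groupby(labels)]

    def frequent_substrings(symbols, max_len=4, min_count=2):
        """Count all sub-strings up to max_len; keep the repeating ones."""
        counts = Counter()
        for n in range(2, max_len + 1):
            for i in range(len(symbols) - n + 1):
                counts[tuple(symbols[i:i + n])] += 1
        return {s: c for s, c in counts.items() if c >= min_count}

    # e.g., a stand-in for the flipping-triangle label sequence of Fig. 6.5:
    seq = to_segments(list("112233112233445566445566"))
    print(frequent_substrings(seq))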
6.5 Dataset: Wearable sensor data

Figure 6.6: A sample of the six-dimensional exercise dataset collected from an inertial sensor during a mock exercise routine. The routine consists of six different exercise classes, and the manually marked labels are shown: flat curl, shoulder extension, back, twist curl, shoulder press, and tricep are labeled by the letters A to F, respectively.

The dataset to which we plan to apply the hierarchical models is wearable sensor data collected through sensors attached to human bodies. The six-dimensional data was captured by Minnen et al. [36] from a mock exercise routine composed of six different dumbbell exercises. An XSens MT9 inertial motion sensor was attached to the subject's wrist by fitting it into a pouch sewn to the back of a thin glove. The original data is sub-sampled at 12.5 Hz, with three-axis accelerometer and gyroscope readings recorded. In total, 27.5 minutes of data over 32 sequences were collected, accounting for 20,711 frames. The data contains six different exercises and 864 total repetitions (144 occurrences of each exercise). Each frame is composed of the raw accelerometer and gyroscope readings, leading to a six-dimensional feature vector.

Figure 6.7: Panels (a)-(g) correspond to the following categories: (a) back, (b) flat curl, (c) shoulder extension, (d) shoulder press, (e) tricep, (f) twist curl, (g) unknown. The top row shows a snapshot of each exercise pattern. The second row shows the multivariate time series data in each category. The third row shows the Markov chain models of the SLDSs learned from data, where dark regions correspond to high-probability areas. The bottom row shows the labeling results for the data in each category using the corresponding learned SLDS model, with the LDS labels color-coded. See text for more details.

An example sequence is shown in Fig. 6.6, which contains all six classes of exercise data with the manually marked labels (A to F) shown along the time axis. It can be observed that a certain dynamic structure repeats in each segment. To aid understanding of the data, visualizations of each exercise category and the corresponding cropped signals are given in Fig. 6.7. As mentioned above, it can be expected that each exercise class decomposes into multiple LDS regions, with certain LDS components shared across exercise classes. To illustrate these properties, standard SLDSs were learned for every exercise class with a fixed set of 14 LDSs (learned by treating the data as a mixture of LDSs). The resulting Markov chain transition matrices are shown in the third row of Fig. 6.7; a minimal estimation sketch follows below. In each image, the first column shows the prior distribution, and the matrix to its right represents the transition probabilities; dark regions correspond to high probability.

Interesting sharing properties can be observed from this result. For example, back and flat curl share similar LDS components in the beginning, since both exercises start by lifting an arm from a low position. Additionally, the flat curl and twist curl exercises show relatively similar Markov chain structure. The similar but distinct Markov chain structures of these two physically similar exercise patterns suggest that even subtle differences can be captured through high-level modeling of the data in which primitive patterns are shared, promoting the conciseness of the model. To illustrate the sharing characteristics more explicitly, all occurrences of each exercise pattern were labeled using the corresponding SLDS models, and the color-coded LDS labels are shown in the bottom row of Fig. 6.7. It can be clearly observed that back and flat curl share common LDS components at the beginning of the exercises, and that flat curl and twist curl generally share LDS components in the middle of each occurrence.
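A minimal sketch of how such per-class Markov chains can be estimated from the LDS label sequences is given below; the smoothing constant alpha and the function name are our own illustrative choices, not the actual implementation.

    import numpy as np

    # Sketch: estimate the per-class prior and transition matrix (third row
    # of Fig. 6.7) by counting LDS label transitions and normalizing.
    # num_lds = 14 matches the shared codebook size mentioned above.
    def estimate_markov_chain(label_sequences, num_lds=14, alpha=1e-3):
        """ML estimate (with a small smoothing prior alpha) of pi and A."""
        pi = np.full(num_lds, alpha)
        A = np.full((num_lds, num_lds), alpha)
        for seq in label_sequences:
            pi[seq[0]] += 1
            for a, b in zip(seq[:-1], seq[1:]):
                A[a, b] += 1
        return pi / pi.sum(), A / A.sum(axis=1, keepdims=True)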
Interesting high-level structures in this data can be observed by plotting all the sequences with every exercise class color-coded, as seen in Fig. 6.8(a). Every row corresponds to an individual sequence, and the x-axis corresponds to time; each colored rectangle denotes one execution of an exercise pattern. It can easily be observed that certain structures appear across the sequences. At the top level, there are four top-level routines, as shown on the vertical bar in Fig. 6.8(b). Moreover, there are four sub-routines which appear in different parts of the top-level routines, identified by the colored rectangles overlaid on Fig. 6.8(b). Accordingly, we plan to learn H-SLDS models for the four different top-level routines, reusing the repeating sub-routines as the intermediate hierarchy layer.

Figure 6.8: (a) Color plot of the exercises occurring across the data sequences. Each row corresponds to a different sequence, and colored rectangles denote executions of the corresponding exercises. (b) Four different top-level routines are marked on the left vertical bar, and four different sub-routines are overlaid as colored rectangles.

With the learned H-SLDS models, we plan to analyze the learned structure to examine whether H-SLDSs can learn meaningful structure from data. Additionally, we plan to conduct classification and labeling tasks based on the learned models, to distinguish sequences from different routines and to provide automatic exercise annotation at multiple granularities. The results would provide measures of the hierarchical analysis power of the proposed H-SLDS model. Standard SLDSs would be used as the baseline model where appropriate, e.g., for classification tasks.

6.6 Summary: Planned Future Work

In summary, we plan to address the following remaining issues for H-SLDSs:

- development of an EM-based parameter learning algorithm;
- development of a bottom-up learning algorithm which can learn hierarchic model structure from data;
- testing of the developed methods on the real-world wearable sensor data for labeling and classification tasks.

6.7 Related Work

HMMs are by far the most studied models for temporal sequence analysis, and the study of the hierarchical aspects of SLDSs in this proposal is partly inspired by previous work on H-HMMs. We review the related work on HMMs first, then describe previous work on hierarchical models with continuous dynamics.

6.7.1 Hierarchical HMMs

The modeling work in the HMM community most closely related to H-SLDS is hierarchical HMMs (H-HMMs). H-HMMs were introduced by Fine et al. [18], where the hierarchical structure was developed to capture the long-term correlations between elementary observations. For example, Fine et al. [18] applied H-HMMs to the problems of English text modeling and cursive handwriting recognition, where they hand-crafted the model and learned the parameters from the data. Murphy and Paskin [37] introduced a DBN representation for H-HMMs and an inference algorithm that is linear, O(T), in the length T of the observations. The graphical model representation of H-SLDS in Fig. 6.3 follows their DBN representation. H-HMMs have been used for play/break labeling in soccer videos [28], and two-layer H-HMMs have been used for video-based activity recognition within home environments [56].

6.7.2 Hierarchical Models with Continuous Dynamics

We now review previous work on SLDS models with hierarchies. The work of Howard and Jebara on Dynamical System Trees (DSTs) [24] uses a tree structure where the leaves correspond to individual agents and the whole tree represents group behaviors. A branch of a DST from the top node to a leaf node is analogous to the model shown in Fig. 6.3(b). DSTs have been used to classify attack/defense modes in American football video. While their work provides a novel inference algorithm based on structured variational methods, the structures of the models are hand-crafted by the users, and the best model is selected through trials.

Another closely related work is Multi-scale SLDSs (MS-SLDS) by Zoeter and Heskes [61]. The MS-SLDS graphical model is likewise analogous to the model shown in Fig. 6.3(b); hence, the parent state can switch at any time and does not have the call-return semantics of the H-SLDS model. They introduce an inference algorithm based on the Expectation Propagation method [35], but do not address the structure learning issues.

Finally, the segmental SLDS (S-SLDS) model [43, 42] presented in Chapter 3 of this proposal is a special case of H-SLDS: the S-SLDS is a two-layer model where the top layer switches between regimes and the bottom layer counts durations.

Chapter 7

Evaluation

We propose to evaluate the SLDS extensions on real-world datasets to verify their applicability and robustness. The data, their characteristics, the associated analysis tasks, and the anticipated evaluation plan are described below. Where implementations and sample results are already available, they are described with a brief summary and the corresponding publications.

              Dim   Classes   Sequences   Total (Avg) frames   LDS labeled   Aligned / Unaligned
Honey bee     4     3         6           4,292 (715)          O             O / -
Wearable      6     6         32          20,711 (647)         -             - / O

Table 7.1: Honey bee dance and wearable exercise sequences with their corresponding characteristics. The columns give the dimensionality, the total number of classes, the total number of sequences, the total number of frames in the whole dataset (with the average number of frames per sequence in parentheses), the availability of labels at the LDS granularity, the availability of aligned data, and the availability of data in unaligned format. Where the corresponding format exists, it is marked with O, and with - otherwise.

We plan to apply the developed theories to two real-world datasets: (1) the honey bee dance dataset, and (2) the wearable exercise dataset. Table 7.1 shows the characteristics of these datasets. The availability of LDS labels indicates that the labels of the data are specified by the users at the granularity of the LDS primitives; in such cases, multiple LDSs can be learned separately based on the existing labels. For example, the honey bee dance data clearly exhibits three different dynamic regimes and was labeled at the LDS granularity w.r.t. the corresponding dynamics. On the other hand, if LDS labels are not provided, as for the wearable exercise dataset, the LDS primitives must be learned in an unsupervised manner. The availability of aligned labeling indicates that each labeled segment within the data contains exactly one occurrence of the corresponding class pattern. In contrast, unaligned data are labeled, but an unknown number of repeated occurrences can exist within each segment; an example of such repetition can be observed in Fig. 6.6 in Section 6.5, where each segment contains more than three repetitions of the same exercise pattern.

During the proposed evaluation phase, important temporal sequence analysis tasks are conducted in both domains using the proposed models. The standard SLDS model is used as the baseline, and the validity of the proposed models is assessed by measuring accuracy and comparing the results of the SLDS extensions against the results obtained by the standard SLDS model. Below, the details of the planned analysis tasks are described, together with a summary of the experimental results where available. Note that the experimental setup and results for the honey bee dance dataset are described in detail in Chapter 5; hence, those details are omitted for brevity in the following sections.

                            S-SLDS                      P-SLDS                            PS-SLDS                             H-SLDS
Honey bee dance dataset     Superior labeling           Superior global variable          Superior labeling and               -
                            [43], Chapter 3             quantification [41], Chapter 4    quantification [43, 42], Chapter 5
Wearable exercise dataset   A part of the H-SLDS work   -                                 -                                   In Chapter 6, October 2008

Table 7.2: Evaluation grid showing the various extensions of SLDSs against the two datasets. Citations, corresponding chapters and summaries of the evaluation results are provided for the already-implemented cases, while an expected completion date is provided otherwise.

7.1 Honey bee dance dataset

Labeling. Standard SLDS (Chapter 2) and PS-SLDS (Chapters 3 and 4) models are learned from the data, where the available labels are used to conduct supervised learning. Then, the accuracy of the labeling results obtained from PS-SLDSs is measured w.r.t. the existing ground-truth labels to verify the benefits of the additional duration and global parameter modeling in the PS-SLDS model. The experimental results showed that the learned PS-SLDSs substantially and consistently outperform standard SLDSs [41, 43, 42].

Global parameter quantification. A PS-SLDS (Chapters 3 and 4) model is learned from the data, where the labels and the known global parameters, e.g., the angles and durations of the dances, are used to learn the canonical template of honey bee dances and the global parameterization. The learned model was used to find the angles and durations of the honey bee dances, where the labeling and quantification tasks were conducted simultaneously. Another set of global parameter estimates was obtained using standard SLDSs through a post-processing step designed to estimate the global parameters from the SLDS labels. Then, the accuracy of the quantified global parameters and the obtained labels is compared to the available ground truth values. The experimental results demonstrated that the global parameters and labels obtained by PS-SLDSs are consistently superior to those of SLDSs, both qualitatively and quantitatively [41, 42].

7.2 Wearable exercise dataset

As proposed in Chapter 6, we plan to develop a structure learning algorithm for H-SLDSs and apply it to the wearable exercise dataset. The plan is to assess the benefits of the H-SLDS model over standard SLDSs on the wearable exercise dataset described in Section 6.5. Both the H-SLDS and standard SLDS models are to be learned from the routine and out-of-routine datasets.

Classification. The learned H-SLDS and standard SLDS models are to be tested on classification tasks, where both the routine and out-of-routine data are planned to be used as test datasets.

Labeling. Once H-SLDSs and SLDSs are learned for the classification tasks, they can be used for labeling. In principle, H-SLDSs provide a natural way to combine the learned sub-models into a single agglomerate model which


More information

Level-set MCMC Curve Sampling and Geometric Conditional Simulation

Level-set MCMC Curve Sampling and Geometric Conditional Simulation Level-set MCMC Curve Sampling and Geometric Conditional Simulation Ayres Fan John W. Fisher III Alan S. Willsky February 16, 2007 Outline 1. Overview 2. Curve evolution 3. Markov chain Monte Carlo 4. Curve

More information

Expectation Maximization (EM) and Gaussian Mixture Models

Expectation Maximization (EM) and Gaussian Mixture Models Expectation Maximization (EM) and Gaussian Mixture Models Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 2 3 4 5 6 7 8 Unsupervised Learning Motivation

More information

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework IEEE SIGNAL PROCESSING LETTERS, VOL. XX, NO. XX, XXX 23 An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework Ji Won Yoon arxiv:37.99v [cs.lg] 3 Jul 23 Abstract In order to cluster

More information

Lecture 21 : A Hybrid: Deep Learning and Graphical Models

Lecture 21 : A Hybrid: Deep Learning and Graphical Models 10-708: Probabilistic Graphical Models, Spring 2018 Lecture 21 : A Hybrid: Deep Learning and Graphical Models Lecturer: Kayhan Batmanghelich Scribes: Paul Liang, Anirudha Rayasam 1 Introduction and Motivation

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 2014-2015 Jakob Verbeek, November 28, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15

More information

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms for Inference Fall 2014

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms for Inference Fall 2014 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms for Inference Fall 2014 1 Course Overview This course is about performing inference in complex

More information

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 9: Descriptive Modeling Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Descriptive model A descriptive model presents the main features of the data

More information

Conditional Random Fields for Object Recognition

Conditional Random Fields for Object Recognition Conditional Random Fields for Object Recognition Ariadna Quattoni Michael Collins Trevor Darrell MIT Computer Science and Artificial Intelligence Laboratory Cambridge, MA 02139 {ariadna, mcollins, trevor}@csail.mit.edu

More information

Complex Prediction Problems

Complex Prediction Problems Problems A novel approach to multiple Structured Output Prediction Max-Planck Institute ECML HLIE08 Information Extraction Extract structured information from unstructured data Typical subtasks Named Entity

More information

Generalized Inverse Reinforcement Learning

Generalized Inverse Reinforcement Learning Generalized Inverse Reinforcement Learning James MacGlashan Cogitai, Inc. james@cogitai.com Michael L. Littman mlittman@cs.brown.edu Nakul Gopalan ngopalan@cs.brown.edu Amy Greenwald amy@cs.brown.edu Abstract

More information

A Fast Learning Algorithm for Deep Belief Nets

A Fast Learning Algorithm for Deep Belief Nets A Fast Learning Algorithm for Deep Belief Nets Geoffrey E. Hinton, Simon Osindero Department of Computer Science University of Toronto, Toronto, Canada Yee-Whye Teh Department of Computer Science National

More information

08 An Introduction to Dense Continuous Robotic Mapping

08 An Introduction to Dense Continuous Robotic Mapping NAVARCH/EECS 568, ROB 530 - Winter 2018 08 An Introduction to Dense Continuous Robotic Mapping Maani Ghaffari March 14, 2018 Previously: Occupancy Grid Maps Pose SLAM graph and its associated dense occupancy

More information

Dynamic Human Shape Description and Characterization

Dynamic Human Shape Description and Characterization Dynamic Human Shape Description and Characterization Z. Cheng*, S. Mosher, Jeanne Smith H. Cheng, and K. Robinette Infoscitex Corporation, Dayton, Ohio, USA 711 th Human Performance Wing, Air Force Research

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

An Efficient Approach for Color Pattern Matching Using Image Mining

An Efficient Approach for Color Pattern Matching Using Image Mining An Efficient Approach for Color Pattern Matching Using Image Mining * Manjot Kaur Navjot Kaur Master of Technology in Computer Science & Engineering, Sri Guru Granth Sahib World University, Fatehgarh Sahib,

More information

Chapter 10. Conclusion Discussion

Chapter 10. Conclusion Discussion Chapter 10 Conclusion 10.1 Discussion Question 1: Usually a dynamic system has delays and feedback. Can OMEGA handle systems with infinite delays, and with elastic delays? OMEGA handles those systems with

More information

Deep Generative Models Variational Autoencoders

Deep Generative Models Variational Autoencoders Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative

More information

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION

CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION CHAPTER 6 MODIFIED FUZZY TECHNIQUES BASED IMAGE SEGMENTATION 6.1 INTRODUCTION Fuzzy logic based computational techniques are becoming increasingly important in the medical image analysis arena. The significant

More information

Ambiguity Detection by Fusion and Conformity: A Spectral Clustering Approach

Ambiguity Detection by Fusion and Conformity: A Spectral Clustering Approach KIMAS 25 WALTHAM, MA, USA Ambiguity Detection by Fusion and Conformity: A Spectral Clustering Approach Fatih Porikli Mitsubishi Electric Research Laboratories Cambridge, MA, 239, USA fatih@merl.com Abstract

More information

An Introduction to Pattern Recognition

An Introduction to Pattern Recognition An Introduction to Pattern Recognition Speaker : Wei lun Chao Advisor : Prof. Jian-jiun Ding DISP Lab Graduate Institute of Communication Engineering 1 Abstract Not a new research field Wide range included

More information

The Anatomical Equivalence Class Formulation and its Application to Shape-based Computational Neuroanatomy

The Anatomical Equivalence Class Formulation and its Application to Shape-based Computational Neuroanatomy The Anatomical Equivalence Class Formulation and its Application to Shape-based Computational Neuroanatomy Sokratis K. Makrogiannis, PhD From post-doctoral research at SBIA lab, Department of Radiology,

More information

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning

More information

Translation Symmetry Detection: A Repetitive Pattern Analysis Approach

Translation Symmetry Detection: A Repetitive Pattern Analysis Approach 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops Translation Symmetry Detection: A Repetitive Pattern Analysis Approach Yunliang Cai and George Baciu GAMA Lab, Department of Computing

More information

CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION

CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION 122 CHAPTER 5 GLOBAL AND LOCAL FEATURES FOR FACE RECOGNITION 5.1 INTRODUCTION Face recognition, means checking for the presence of a face from a database that contains many faces and could be performed

More information

CHAPTER 4: CLUSTER ANALYSIS

CHAPTER 4: CLUSTER ANALYSIS CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis

More information

Unsupervised Learning

Unsupervised Learning Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised

More information

Expectation-Maximization. Nuno Vasconcelos ECE Department, UCSD

Expectation-Maximization. Nuno Vasconcelos ECE Department, UCSD Expectation-Maximization Nuno Vasconcelos ECE Department, UCSD Plan for today last time we started talking about mixture models we introduced the main ideas behind EM to motivate EM, we looked at classification-maximization

More information

Fast trajectory matching using small binary images

Fast trajectory matching using small binary images Title Fast trajectory matching using small binary images Author(s) Zhuo, W; Schnieders, D; Wong, KKY Citation The 3rd International Conference on Multimedia Technology (ICMT 2013), Guangzhou, China, 29

More information

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C, Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative

More information

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER

MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER MULTIVARIATE TEXTURE DISCRIMINATION USING A PRINCIPAL GEODESIC CLASSIFIER A.Shabbir 1, 2 and G.Verdoolaege 1, 3 1 Department of Applied Physics, Ghent University, B-9000 Ghent, Belgium 2 Max Planck Institute

More information

Tai Chi Motion Recognition Using Wearable Sensors and Hidden Markov Model Method

Tai Chi Motion Recognition Using Wearable Sensors and Hidden Markov Model Method Tai Chi Motion Recognition Using Wearable Sensors and Hidden Markov Model Method Dennis Majoe 1, Lars Widmer 1, Philip Tschiemer 1 and Jürg Gutknecht 1, 1 Computer Systems Institute, ETH Zurich, Switzerland

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim,

More information

Bilevel Sparse Coding

Bilevel Sparse Coding Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional

More information

Applying Supervised Learning

Applying Supervised Learning Applying Supervised Learning When to Consider Supervised Learning A supervised learning algorithm takes a known set of input data (the training set) and known responses to the data (output), and trains

More information

Short Survey on Static Hand Gesture Recognition

Short Survey on Static Hand Gesture Recognition Short Survey on Static Hand Gesture Recognition Huu-Hung Huynh University of Science and Technology The University of Danang, Vietnam Duc-Hoang Vo University of Science and Technology The University of

More information

Modeling time series with hidden Markov models

Modeling time series with hidden Markov models Modeling time series with hidden Markov models Advanced Machine learning 2017 Nadia Figueroa, Jose Medina and Aude Billard Time series data Barometric pressure Temperature Data Humidity Time What s going

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications

Evaluation of Moving Object Tracking Techniques for Video Surveillance Applications International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Evaluation

More information

arxiv: v1 [cond-mat.dis-nn] 30 Dec 2018

arxiv: v1 [cond-mat.dis-nn] 30 Dec 2018 A General Deep Learning Framework for Structure and Dynamics Reconstruction from Time Series Data arxiv:1812.11482v1 [cond-mat.dis-nn] 30 Dec 2018 Zhang Zhang, Jing Liu, Shuo Wang, Ruyue Xin, Jiang Zhang

More information

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques

Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Sea Chen Department of Biomedical Engineering Advisors: Dr. Charles A. Bouman and Dr. Mark J. Lowe S. Chen Final Exam October

More information

ECSE-626 Project: An Adaptive Color-Based Particle Filter

ECSE-626 Project: An Adaptive Color-Based Particle Filter ECSE-626 Project: An Adaptive Color-Based Particle Filter Fabian Kaelin McGill University Montreal, Canada fabian.kaelin@mail.mcgill.ca Abstract The goal of this project was to discuss and implement a

More information

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu

FMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

CS 223B Computer Vision Problem Set 3

CS 223B Computer Vision Problem Set 3 CS 223B Computer Vision Problem Set 3 Due: Feb. 22 nd, 2011 1 Probabilistic Recursion for Tracking In this problem you will derive a method for tracking a point of interest through a sequence of images.

More information

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013

Outlier Ensembles. Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY Keynote, Outlier Detection and Description Workshop, 2013 Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

Improving Recognition through Object Sub-categorization

Improving Recognition through Object Sub-categorization Improving Recognition through Object Sub-categorization Al Mansur and Yoshinori Kuno Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi, Saitama 338-8570,

More information

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision

Object Recognition Using Pictorial Structures. Daniel Huttenlocher Computer Science Department. In This Talk. Object recognition in computer vision Object Recognition Using Pictorial Structures Daniel Huttenlocher Computer Science Department Joint work with Pedro Felzenszwalb, MIT AI Lab In This Talk Object recognition in computer vision Brief definition

More information

Clustering Sequences with Hidden. Markov Models. Padhraic Smyth CA Abstract

Clustering Sequences with Hidden. Markov Models. Padhraic Smyth CA Abstract Clustering Sequences with Hidden Markov Models Padhraic Smyth Information and Computer Science University of California, Irvine CA 92697-3425 smyth@ics.uci.edu Abstract This paper discusses a probabilistic

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Overview of Part Two Probabilistic Graphical Models Part Two: Inference and Learning Christopher M. Bishop Exact inference and the junction tree MCMC Variational methods and EM Example General variational

More information

Hidden Markov decision trees

Hidden Markov decision trees Hidden Markov decision trees Michael I. Jordan*, Zoubin Ghahramanit, and Lawrence K. Saul* {jordan.zoubin.lksaul}~psyche.mit.edu *Center for Biological and Computational Learning Massachusetts Institute

More information

Learning to bounce a ball with a robotic arm

Learning to bounce a ball with a robotic arm Eric Wolter TU Darmstadt Thorsten Baark TU Darmstadt Abstract Bouncing a ball is a fun and challenging task for humans. It requires fine and complex motor controls and thus is an interesting problem for

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

ECE521 Lecture 18 Graphical Models Hidden Markov Models

ECE521 Lecture 18 Graphical Models Hidden Markov Models ECE521 Lecture 18 Graphical Models Hidden Markov Models Outline Graphical models Conditional independence Conditional independence after marginalization Sequence models hidden Markov models 2 Graphical

More information

Latent Variable Models for Structured Prediction and Content-Based Retrieval

Latent Variable Models for Structured Prediction and Content-Based Retrieval Latent Variable Models for Structured Prediction and Content-Based Retrieval Ariadna Quattoni Universitat Politècnica de Catalunya Joint work with Borja Balle, Xavier Carreras, Adrià Recasens, Antonio

More information

An Introduction To Automatic Tissue Classification Of Brain MRI. Colm Elliott Mar 2014

An Introduction To Automatic Tissue Classification Of Brain MRI. Colm Elliott Mar 2014 An Introduction To Automatic Tissue Classification Of Brain MRI Colm Elliott Mar 2014 Tissue Classification Tissue classification is part of many processing pipelines. We often want to classify each voxel

More information

Clustering & Dimensionality Reduction. 273A Intro Machine Learning

Clustering & Dimensionality Reduction. 273A Intro Machine Learning Clustering & Dimensionality Reduction 273A Intro Machine Learning What is Unsupervised Learning? In supervised learning we were given attributes & targets (e.g. class labels). In unsupervised learning

More information