Modeling time series with hidden Markov models

Similar documents
Hidden Markov Model for Sequential Data

ECE521: Week 11, Lecture March 2017: HMM learning/inference. With thanks to Russ Salakhutdinov

Hidden Markov Models. Slides adapted from Joyce Ho, David Sontag, Geoffrey Hinton, Eric Xing, and Nicholas Ruozzi

Using Hidden Markov Models to analyse time series data

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Hidden Markov Models. Gabriela Tavares and Juri Minxha Mentor: Taehwan Kim CS159 04/25/2017

ADVANCED MACHINE LEARNING. Mini-Project Overview

Assignment 2. Unsupervised & Probabilistic Learning. Maneesh Sahani Due: Monday Nov 5, 2018

Machine Learning (BSMC-GA 4439) Wenke Liu

An Introduction to Hidden Markov Models

Clustering Relational Data using the Infinite Relational Model

Clustering Lecture 5: Mixture Model

Introduction to Mobile Robotics

Hidden Markov Models in the context of genetic analysis

Machine Learning (BSMC-GA 4439) Wenke Liu

Biology 644: Bioinformatics

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

Clustering web search results

Structured Learning. Jun Zhu

An Introduction to Pattern Recognition

18 October, 2013 MVA ENS Cachan. Lecture 6: Introduction to graphical models Iasonas Kokkinos

Introduction to Machine Learning CMU-10701

Multimedia Databases. Wolf-Tilo Balke Younès Ghammad Institut für Informationssysteme Technische Universität Braunschweig

Time series, HMMs, Kalman Filters

Multimedia Databases. 9 Video Retrieval. 9.1 Hidden Markov Model. 9.1 Hidden Markov Model. 9.1 Evaluation. 9.1 HMM Example 12/18/2009

Introduction to SLAM Part II. Paul Robertson

Parallel HMMs. Parallel Implementation of Hidden Markov Models for Wireless Applications

Conditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,

Motivation: Shortcomings of Hidden Markov Model. Ko, Youngjoong. Solution: Maximum Entropy Markov Model (MEMM)

Machine Learning A W 1sst KU. b) [1 P] Give an example for a probability distributions P (A, B, C) that disproves

Lecture 11: Clustering Introduction and Projects Machine Learning

HIDDEN MARKOV MODELS AND SEQUENCE ALIGNMENT

A reevaluation and benchmark of hidden Markov Models

Latent Variable Models and Expectation Maximization

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Incremental learning of motion primitives for full body motions

COMS 4771 Clustering. Nakul Verma

Expectation Maximization. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University

Graphical Models & HMMs

Applied Bayesian Nonparametrics 5. Spatial Models via Gaussian Processes, not MRFs Tutorial at CVPR 2012 Erik Sudderth Brown University

ModelStructureSelection&TrainingAlgorithmsfor an HMMGesture Recognition System

Clustering Sequences with Hidden. Markov Models. Padhraic Smyth CA Abstract

NOVEL HYBRID GENETIC ALGORITHM WITH HMM BASED IRIS RECOGNITION

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

COMP90051 Statistical Machine Learning

Conditional Random Fields - A probabilistic graphical model. Yen-Chin Lee 指導老師 : 鮑興國

Summary: A Tutorial on Learning With Bayesian Networks

application of learning vector quantization algorithms. In Proceedings of the International Joint Conference on

Overview Citation. ML Introduction. Overview Schedule. ML Intro Dataset. Introduction to Semi-Supervised Learning Review 10/4/2010

RJaCGH, a package for analysis of

Warped Mixture Models

ADVANCED MACHINE LEARNING MACHINE LEARNING. Kernel for Clustering kernel K-Means

What is machine learning?

Machine Learning Lecture 3

Speech Recognition Lecture 8: Acoustic Models. Eugene Weinstein Google, NYU Courant Institute Slide Credit: Mehryar Mohri

Machine Learning Lecture 3

Sum-Product Networks. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 15, 2015

Tai Chi Motion Recognition Using Wearable Sensors and Hidden Markov Model Method

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning

08 An Introduction to Dense Continuous Robotic Mapping

Representability of Human Motions by Factorial Hidden Markov Models

Vorhersage von Schadzuständen durch Nutzung von Hidden Markov Modellen Master-Thesis von Zahra Fardhosseini Tag der Einreichung: 15.

Machine Learning. Sourangshu Bhattacharya

Homework. Gaussian, Bishop 2.3 Non-parametric, Bishop 2.5 Linear regression Pod-cast lecture on-line. Next lectures:

ε-machine Estimation and Forecasting

Exam Topics. Search in Discrete State Spaces. What is intelligence? Adversarial Search. Which Algorithm? 6/1/2012

Thesis Proposal : Switching Linear Dynamic Systems with Higher-order Temporal Structure. Sang Min Oh

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

Lecture 21 : A Hybrid: Deep Learning and Graphical Models

10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2

Exercise 5. Deadlines: Monday (final, no student correction) Matlabs Statistics Toolbox contains the following functions for HMM :

Constraints in Particle Swarm Optimization of Hidden Markov Models

Acoustic to Articulatory Mapping using Memory Based Regression and Trajectory Smoothing

OnLine Handwriting Recognition

the number of states must be set in advance, i.e. the structure of the model is not t to the data, but given a priori the algorithm converges to a loc

Invariant Recognition of Hand-Drawn Pictograms Using HMMs with a Rotating Feature Extraction

Tillämpad Artificiell Intelligens Applied Artificial Intelligence Tentamen , , MA:8. 1 Search (JM): 11 points

Dynamic Thresholding for Image Analysis

Machine Learning Lecture 3

K-Means and Gaussian Mixture Models

CSCI 599 Class Presenta/on. Zach Levine. Markov Chain Monte Carlo (MCMC) HMM Parameter Es/mates

DISCRETE HIDDEN MARKOV MODEL IMPLEMENTATION

Structured Perceptron. Ye Qiu, Xinghui Lu, Yue Lu, Ruofei Shen

Humanoid Robotics. Monte Carlo Localization. Maren Bennewitz

Machine Learning. Unsupervised Learning. Manfred Huber

27: Hybrid Graphical Models and Neural Networks

Bayes Net Learning. EECS 474 Fall 2016

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

CONTENT analysis of video is to find meaningful structures

Quadruped Robots and Legged Locomotion

Smooth Image Segmentation by Nonparametric Bayesian Inference

Human Motion Synthesis by Motion Manifold Learning and Motion Primitive Segmentation

Combining PGMs and Discriminative Models for Upper Body Pose Detection

Biostatistics 615/815 Lecture 23: The Baum-Welch Algorithm Advanced Hidden Markov Models

Computer Project #2 (Matrix Operations)

low bias high variance high bias low variance error test set training set high low Model Complexity Typical Behaviour Lecture 11:

Heterotachy models in BayesPhylogenies

10-701/15-781, Fall 2006, Final

error low bias high variance test set training set high low Model Complexity Typical Behaviour 2 CSC2515 Machine Learning high bias low variance

Nonparametric Bayesian Texture Learning and Synthesis

Transcription:

Modeling time series with hidden Markov models Advanced Machine learning 2017 Nadia Figueroa, Jose Medina and Aude Billard

Time series data Barometric pressure Temperature Data Humidity Time What s going on here? Modeling time series with HMMs 2

Time series data Data What s the problem setting? Time We don t care about time We have several trajectories with identical duration We have unstructured trajectory(ies)! Hum. Data Time Explicit time dependency Consider dependency on the past Too complex! Modeling time series with HMMs 3

Unstructured time series data How to simplify this problem? Sunny Cloudy Rainy Consider dependency on the past Markov assumption Modeling time series with HMMs 4

Outline First part (10:15 11:00): Recap on Markov chains Hidden Markov Model (HMM) - Recognition of time series - ML Parameter estimation Data Time Second part (11:15 12:00): Time series segmentation Bayesian nonparametrics for HMMs https://github.com/epfl-lasa/ml_toolbox Modeling time series with HMMs 5

Outline first part Sunny Cloudy Rainy Modeling time series with HMMs 6

Markov chains Sunny Cloudy Transition matrix Sunny Cloudy Rainy Sunny Cloudy Rainy Rainy Initial probabilities Sunny Cloudy Rainy Modeling time series with HMMs 7

Likelihood of a Markov chain Sunny Sunny Cloudy Sunny Cloudy Rainy Transition matrix Sunny Cloudy Rainy Initial probabilities Sunny Cloudy Rainy Modeling time series with HMMs 8

Learning Markov chains Topologies Sunny Sunny Cloudy Periodic Left-to-right Ergodic Modeling time series with HMMs 9

Outline first part Sunny Cloudy Rainy Modeling time series with HMMs 10

Hidden Markov model Sunny Cloudy Sunny Cloudy Rainy Transition matrix Sunny Cloudy Rainy Rainy Initial probabilities Sunny Cloudy Rainy Modeling time series with HMMs 11

Likelihood of an HMM Sunny Transition matrix Sunny Cloudy Rainy Cloudy Rainy Initial probabilities Sunny Cloudy Rainy Forward variable Modeling time series with HMMs 12

Likelihood of an HMM Sunny Transition matrix Sunny Cloudy Rainy Cloudy Rainy Initial probabilities Sunny Cloudy Rainy Forward variable Modeling time series with HMMs 13

Likelihood of an HMM Sunny Transition matrix Sunny Cloudy Rainy Cloudy Rainy Initial probabilities Sunny Cloudy Rainy Backward variable Modeling time series with HMMs 14

Learning an HMM Baum-Welch algorithm (Expectation-Maximization for HMMs) - Iterative solution - Converges to local minimum Starting from an initial find a such that E-step: Given an observation sequence and a model, find the probabilities of the states to have produced those observations. M-step: Given the output of the E-step, update the model parameters to better fit the observations. Modeling time series with HMMs 15

Learning an HMM E-step: Probability of being in state i at time k and transition to state j Probability of being in state i at time k Modeling time series with HMMs 16

Learning an HMM M-step: Probability of being in state i at time k and transition to state j Probability of being in state i at time k Modeling time series with HMMs 17

Learning an HMM HMM is a parametric technique (Fixed number of states, fixed topology) Heuristics to determining the optimal number of states X : dataset; N : number of datapoints; K: number of free parameters - Aikaike Information Criterion: AIC= 2ln L+ 2K ( ) - Bayesian Information Criterion: BIC = 2ln L + K ln N L: maximum likelihood of the model given K parameters Choosing AIC versus BIC depends on the application: Lower Is the BIC purpose implies of either analysis fewer explanatory make predictions, variables, or better to decide fit, or which both. model best represents reality? As the number of datapoints (observations) increase, BIC assigns more weights AIC may have better predictive ability than BIC, but BIC finds a computationally to more simpler efficient models solution. than AIC. Modeling time series with HMMs 18

Applications of HMMs State estimation: What is the most probable state/state sequence of the system? Prediction: What are the most probable next observations/state of the system? Model selection: What is the most likely model that represents these observations? Modeling time series with HMMs 19

Examples Speech recognition: Left-to-right model States are phonemes Observations in frequency domain D.B. Paul., Speech Recognition Using Hidden Markov Models, The Lincoln laboratory journal, 1990 Modeling time series with HMMs 20

Examples Motion prediction: Periodic model Observations are observed joints Simulate/predict walking patterns Karg, Michelle, et al. "Human movement analysis: Extension of the f-statistic to time series using hmm." Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on. IEEE, 2013. Modeling time series with HMMs 21

Examples Motion prediction: Left-to-right models Autonomous segmentation Recognition + prediction Modeling time series with HMMs 22

Examples Motion prediction: Left-to-right models Autonomous segmentation Recognition + prediction Modeling time series with HMMs 23

Examples Motion prediction: Left-to-right model Each state is a dynamical system Modeling time series with HMMs 24

Examples Motion recognition: Toy training set 1 player 7 actions 1 Hidden Markov model per action Recognition of most likely motion and prediction of next step. MATLAB demo Modeling time series with HMMs 25

Outline First part (10:15 11:00): Recap on Markov chains Hidden Markov Model (HMM) - Recognition of time series - ML Parameter estimation Data Time Second part (11:15 12:00): Time series segmentation Bayesian non-parametrics for HMMs https://github.com/epfl-lasa/ml_toolbox Modeling time series with HMMs 26

Time series Segmentation Times-series = Sequence of discrete segments Why is this an important problem? Modeling time series with HMMs 27

Segmentation of Speech Signals Segmenting a continuous speech signal into sets of distinct words. Modeling time series with HMMs 28

Segmentation of Speech Signals Segmenting a continuous speech signal into sets of distinct words. I am on a diet. seafood I see food and I eat it! Modeling time series with HMMs 29

Segmentation of Human Motion Data Segmention of Continuous Motion Capture data from exercise routines into motion categories Jumping Jacks Knee Raises Arm Circles Squats Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009 Modeling time series with HMMs 30

Segmentation in Human Motion Data 12 Variables - Torso position - Waist Angles (2) - Neck Angle - Shoulder Angles -.. Emily Fox et al.., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009 Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009 Modeling time series with HMMs 31

Segmentation in Robotics Learning Complex Sequential Tasks from Demonstration 7 Variables - Position - Orientation Reach Grate Trash Modeling time series with HMMs 32

HMM for Time series Segmentation Assumptions: The time-series has been generated by a system that transitions between a set of hidden states: At each time step, a sample is drawn from an emission model associated to the current hidden state: Modeling time series with HMMs 33

HMM for Time series Segmentation How do we find these segments? Modeling time series with HMMs 34

HMM for Time series Segmentation Steps for Segmentation with HMM: 1. Learn the HMM parameters through Maximum Likelihood Estimate (MLE): Initial State Probabilities Transition Matrix Emission Model Parameters HMM Likelihood Baum-Welch algorithm (Expectation-Maximization for HMMs) - Iterative solution - Converges to local minimum Hyper-parameter: Number of states possible K Modeling time series with HMMs 35

HMM for Time series Segmentation Steps for Segmentation with HMM: 2. Find the most probable sequence of states generating the observations through the Viterbi algorithm: HMM Joint Probability Distribution Modeling time series with HMMs 36

HMM for Time series Segmentation Modeling time series with HMMs 37

HMM for Time series Segmentation Modeling time series with HMMs 38

Model Selection for HMMs Modeling time series with HMMs

Model Selection for HMMs? Modeling time series with HMMs

Limitations of classical finite HMMs for Segmentation Undefined # hidden states Cardinality: Choice of hidden states is based on Model Selection heuristics, there is little understanding of the strengths and weaknesses of such methods in this Fixed setting Transition [1]. Matrix Topology: We assume that all time series share the same set of emission models and switch among them in exactly the same manner [2]. [1] Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008 [2] Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009 Solution: Bayesian Non-Parametrics Modeling time series with HMMs 41

Bayesian Non-Parametrics Bayesian: Use Bayesian inference to estimate the parameters; i.e. priors on model parameters! Prior Prior Non-parametric: Does NOT mean methods with no parameters, rather models whose complexity (# of states, # Gaussians) is inferred from the data. 1. Number of parameters grows with sample size. 2. Infinite-dimensional parameter space! Modeling time series with HMMs

BNP for HMMs: HDP-HMM Cardinality Hierarchical Dirichlet Process (HDP) prior on the transition Matrix! Hyperparameters! Normal Inverse Wishart (NIW) prior on emission parameters! Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008 Modeling time series with HMMs

BNP for HMMs: HDP-HMM Cardinality The Dirichlet Process (DP) is a prior distribution over distributions. Used for clustering with infinite mixture models; i.e. instead of setting K in a GMM, the K is learned from data. This only gives us an estimate of the K clusters! Cannot be use directly on transition matrix: The Hierarchical Dirichlet Process (HDP) is a hierarchy of DPs! Modeling time series with HMMs

Segmentation with HDP-HMM Emily Fox et al., An HDP-HMM for Systems with State Persistence, ICML, 2008 Modeling time series with HMMs

Compare to Model Selection with Classical HMMs Modeling time series with HMMs

BNP for HMMs: BP-HMM Cardinality Topology The Beta Process (BP) prior on the transition Matrix! Normal Inverse Wishart (NIW) prior on emission parameters! Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009 Modeling time series with HMMs

BNP for HMMs: BP-HMM Cardinality Topology The Beta Process (BP) prior on the transition Matrix! Time-Series Features (i.e. shared HMM States) Modeling time series with HMMs

Segmentation in Human Motion Data 12 Variables - Torso position - Waist Angles (2) - Neck Angle - Shoulder Angles -.. Emily Fox et al.., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009 Emily Fox et al., Sharing Features among Dynamical Systems with Beta Processes, NIPS, 2009 Modeling time series with HMMs 49

Applications in Robotics Learning Complex Sequential Tasks from Demonstration Modeling time series with HMMs 50