An Introduction to Hidden Markov Models


An Introduction to Hidden Markov Models
Max Heimel
Fachgebiet Datenbanksysteme und Informationsmanagement, Technische Universität Berlin
http://www.dima.tu-berlin.de/
07.10.2010

Agenda
- Motivation
- An Introduction to Hidden Markov Models: What is a Hidden Markov Model?
- Algorithms, Algorithms, Algorithms: What are the main problems for HMMs? What are the algorithms to solve them?
- Hidden Markov Models for Apache Mahout: a short overview
- Outlook: Hidden Markov Models and Map/Reduce
- Take-Home Messages

Motivation
Pattern recognition: finding structure in sequences.

Goals of this talk
- Demonstrate how sequences can be modeled using so-called Markov chains.
- Present the statistical tool of Hidden Markov Models: a tool for finding the underlying process behind a given sequence.
- Give an understanding of the main problems associated with Hidden Markov Models, and of the corresponding applications.
- Present the Apache Mahout implementation of Hidden Markov Models.
- Give an outlook on implementing Hidden Markov Models for Map/Reduce.


Markov Chains I
Markov chains model sequential processes:
- Consider a discrete random variable q with n states h_1, ..., h_n.
- The state of q changes randomly in discrete time steps.
- The transition probability depends only on the k previous states (the Markov property).
[Diagram: a Markov chain of random transitions q_1 = h_i → q_2 = h_j → q_3 = h_k over time; each transition probability depends only on the prior state(s).]

Markov Chains II
The simplest Markov chain: the transition probability depends only on the previous state (i.e. k = 1) and is time invariant. In this case, the Markov chain is defined by:
- an n × n matrix T containing the state-transition probabilities, and
- an n-dimensional vector π containing the initial state probabilities.
Since T and π contain probabilities, they have to be normalized: every row of T and the vector π must each sum to one.
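Written out, and reconstructing the formulas that were lost in the slide transcription from the definitions above:

    % First-order Markov property, time invariant:
    P(q_{t+1} = h_j \mid q_t = h_i, q_{t-1}, \ldots, q_1) = P(q_{t+1} = h_j \mid q_t = h_i) =: T_{ij}
    % Initial state probabilities:
    \pi_i = P(q_1 = h_i)
    % Normalization constraints:
    \sum_{j=1}^{n} T_{ij} = 1 \;\; \forall i, \qquad \sum_{i=1}^{n} \pi_i = 1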

Markov Chains III
Markov chains are used to model sequences of states. Consider the weather: each day is either rainy or sunny. If a day is rainy, there is a 60% chance the next day will also be rainy; if a day is sunny, there is an 80% chance the next day will also be sunny. We can model the weather as a two-state Markov chain with transition matrix T = [[0.6, 0.4], [0.2, 0.8]] (rows: today rainy/sunny; columns: tomorrow rainy/sunny).
Examples for the use of Markov chains:
- Google PageRank
- Random number generation, random text generation
- Queuing theory
- Modeling DNA sequences
- Physical processes from thermodynamics and statistical mechanics
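As an illustration, here is a minimal, self-contained Java sketch (not part of the original slides) that simulates this two-state weather chain; the ten-day horizon and the random seed are arbitrary choices:

    import java.util.Random;

    /** Simulates the rainy/sunny Markov chain from the example above. */
    public class WeatherChain {
        // States: 0 = rainy, 1 = sunny. T[i][j] = P(tomorrow = j | today = i).
        static final double[][] T = {
            {0.6, 0.4},  // rainy -> rainy 60%, rainy -> sunny 40%
            {0.2, 0.8}   // sunny -> rainy 20%, sunny -> sunny 80%
        };

        public static void main(String[] args) {
            Random rng = new Random(42);
            int state = 1; // start on a sunny day
            for (int day = 0; day < 10; day++) {
                System.out.println("Day " + day + ": " + (state == 0 ? "rainy" : "sunny"));
                state = rng.nextDouble() < T[state][0] ? 0 : 1; // draw next state from row T[state]
            }
        }
    }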

Hidden Markov Models I
Now consider a hidden Markov chain: we cannot directly observe the states of the hidden chain, but we can observe effects of the hidden states. Hidden Markov Models (HMMs) are used to model such a situation:
- Consider a Markov chain q and a random, not necessarily discrete, variable p.
- The state of p is chosen randomly, based only on the current state of q.
[Diagram: hidden variable q moving through states q_1 = h_i → q_2 = h_j → q_3 = h_k, each hidden state emitting one value p_1, p_2, p_3 of the observed variable p.]

Hidden Markov Models II
The simplest HMM has a discrete observable variable p whose states are taken from a finite set of m output symbols. In this case, the HMM is defined by the following parameters:
- the matrix T and vector π of the underlying Markov chain, and
- an n × m matrix O containing the output (emission) probabilities. Again, O needs to be normalized: every row must sum to one.
Consider a prisoner in solitary confinement: the prisoner cannot directly observe the weather, but he can observe the condition of the boots of the prison guards.
[Diagram: the weather chain from before (rainy → rainy with 0.6, rainy → sunny with 0.4, sunny → sunny with 0.8, sunny → rainy with 0.2); a rainy day produces dirty boots with probability 0.9 and clean boots with 0.1, a sunny day produces dirty boots with 0.2 and clean boots with 0.8.]


Problems for Hidden Markov Models

Problem                   Given                           Wanted
-----------------------   -----------------------------   ----------------------------------------------------------------
Evaluation                Model, observed sequence        Likelihood that the model produced the observed sequence
Decoding                  Model, observed sequence        Most likely hidden sequence
Learning (unsupervised)   Observed sequence               Most likely model that produced the observed sequence
Learning (supervised)     Observed and hidden sequence    Most likely model that produced the observed and hidden sequence
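In symbols, writing M for the model, O for the observed sequence and H for the hidden sequence (a formalization consistent with the table above, not taken verbatim from the slides):

    % Evaluation: compute the likelihood
    P(O \mid M)
    % Decoding: find the most likely hidden sequence
    H^{*} = \arg\max_{H} P(H \mid O, M)
    % Learning, unsupervised and supervised:
    M^{*} = \arg\max_{M} P(O \mid M), \qquad M^{*} = \arg\max_{M} P(O, H \mid M)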

Evaluation
Compute the likelihood that a given model M produced a given observation sequence O. The likelihood can be calculated efficiently using dynamic programming:
- Forward algorithm: reproduces the observation through the HMM front to back, computing the forward probabilities α_t(i) = P(o_1, ..., o_t, q_t = h_i | M).
- Backward algorithm: traces the observation through the HMM back to front, computing the backward probabilities β_t(i) = P(o_{t+1}, ..., o_T | q_t = h_i, M).
Typical application: selecting the most likely out of several competing models, e.g. customer behavior modeling (select the most likely customer profile) or physics (select the most likely thermodynamic process).
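To make the recursion concrete, here is a minimal forward-algorithm sketch in plain Java (my own illustration, not the Mahout code; the model numbers are the weather/boots example, and the uniform initial distribution is an assumption):

    /** Forward algorithm on the weather/boots HMM: 0 = rainy/dirty, 1 = sunny/clean. */
    public class ForwardDemo {
        static final double[][] T  = {{0.6, 0.4}, {0.2, 0.8}}; // transition probabilities
        static final double[][] O  = {{0.9, 0.1}, {0.2, 0.8}}; // emission probabilities
        static final double[]   PI = {0.5, 0.5};               // assumed initial distribution

        /** Returns P(obs | model) via the forward recursion. */
        static double likelihood(int[] obs) {
            int n = PI.length;
            double[] alpha = new double[n];
            // Initialization: alpha_1(i) = pi_i * O[i][obs_1]
            for (int i = 0; i < n; i++) alpha[i] = PI[i] * O[i][obs[0]];
            // Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * T[i][j]) * O[j][obs_{t+1}]
            for (int t = 1; t < obs.length; t++) {
                double[] next = new double[n];
                for (int j = 0; j < n; j++) {
                    double s = 0;
                    for (int i = 0; i < n; i++) s += alpha[i] * T[i][j];
                    next[j] = s * O[j][obs[t]];
                }
                alpha = next;
            }
            double p = 0;
            for (double a : alpha) p += a; // termination: sum over the final alphas
            return p;
        }

        public static void main(String[] args) {
            int[] obs = {0, 0, 1}; // dirty, dirty, clean
            System.out.println("Likelihood: " + likelihood(obs));
        }
    }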

Decoding
Compute the most likely sequence of hidden states for a given model M and a given observation sequence O. The most likely hidden path can be computed efficiently using the Viterbi algorithm:
- The Viterbi algorithm is based on the forward algorithm: it replaces the sum over predecessor states with a maximization and traces the most likely hidden states while reproducing the output sequence.
Typical applications:
- POS tagging (observed: sentence, hidden: part-of-speech tags)
- Speech recognition (observed: frequencies, hidden: phonemes)
- Handwritten letter recognition (observed: pen patterns, hidden: letters)
- Genome analysis (observed: genome sequence, hidden: structures)
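A matching plain-Java Viterbi sketch (again my own illustration on the assumed weather/boots numbers, not the Mahout implementation):

    /** Viterbi decoding on the weather/boots HMM: 0 = rainy/dirty, 1 = sunny/clean. */
    public class ViterbiDemo {
        static final double[][] T  = {{0.6, 0.4}, {0.2, 0.8}};
        static final double[][] O  = {{0.9, 0.1}, {0.2, 0.8}};
        static final double[]   PI = {0.5, 0.5};

        /** Returns the most likely hidden state sequence for obs. */
        static int[] viterbi(int[] obs) {
            int n = PI.length, len = obs.length;
            double[][] delta = new double[len][n]; // best path probability ending in state j at time t
            int[][] psi = new int[len][n];         // back-pointers to the best predecessor
            for (int i = 0; i < n; i++) delta[0][i] = PI[i] * O[i][obs[0]];
            for (int t = 1; t < len; t++) {
                for (int j = 0; j < n; j++) {
                    // Like the forward recursion, but with max instead of sum.
                    int best = 0;
                    for (int i = 1; i < n; i++)
                        if (delta[t - 1][i] * T[i][j] > delta[t - 1][best] * T[best][j]) best = i;
                    delta[t][j] = delta[t - 1][best] * T[best][j] * O[j][obs[t]];
                    psi[t][j] = best;
                }
            }
            // Backtrack from the most likely final state.
            int[] path = new int[len];
            for (int j = 1; j < n; j++)
                if (delta[len - 1][j] > delta[len - 1][path[len - 1]]) path[len - 1] = j;
            for (int t = len - 1; t > 0; t--) path[t - 1] = psi[t][path[t]];
            return path;
        }

        public static void main(String[] args) {
            int[] path = viterbi(new int[] {0, 0, 1}); // dirty, dirty, clean
            for (int s : path) System.out.print(s == 0 ? "rainy " : "sunny ");
            System.out.println();
        }
    }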

Learning
1. Supervised learning: given an observation sequence O and the corresponding hidden sequence H, compute the most likely model M that produces those sequences.
- Solved using instance counting: count the hidden state transitions and the output state emissions, and use the relative frequencies as estimates for the transition and emission probabilities of M.
2. Unsupervised learning: given only an observation sequence O, compute the most likely model M that produces this sequence.
- Solved using the Baum-Welch algorithm: a rather expensive iterative algorithm, but it comes with the EM guarantee of monotonically increasing the likelihood towards a local optimum. It requires a forward step and a backward step through the model per iteration.
- Alternative: Viterbi training. Not as expensive as Baum-Welch, but without the EM guarantee; it requires only a forward step through the model per iteration.
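A small sketch of the supervised counting estimate for the transition matrix (illustrative only; the emission matrix is estimated the same way from (hidden, observed) pairs, and the pseudo-count smoothing mirrors the pseudoCount parameter of Mahout's trainSupervised shown later):

    import java.util.Arrays;

    /** Supervised HMM learning by instance counting, transition part only. */
    public class CountingDemo {
        static double[][] estimateTransitions(int[] hidden, int numStates, double pseudoCount) {
            double[][] T = new double[numStates][numStates];
            for (double[] row : T) Arrays.fill(row, pseudoCount);        // pseudo-counts avoid zero probabilities
            for (int t = 1; t < hidden.length; t++) T[hidden[t - 1]][hidden[t]] += 1.0;
            for (double[] row : T) {                                     // relative frequencies per row
                double sum = 0;
                for (double v : row) sum += v;
                for (int j = 0; j < row.length; j++) row[j] /= sum;
            }
            return T;
        }

        public static void main(String[] args) {
            int[] hidden = {0, 0, 1, 1, 1, 0, 1};
            System.out.println(Arrays.deepToString(estimateTransitions(hidden, 2, 0.1)));
        }
    }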


Hidden Markov Models for Apache Mahout
Apache Mahout will contain an implementation of Hidden Markov Models in its upcoming 0.4 release.
- The implementation is currently sequential (i.e. not Map/Reduce enabled).
- The implementation covers Hidden Markov Models with discrete output states.
The overall implementation structure is given by three main and two helper classes:
- HmmModel: container for the HMM parameters
- HmmTrainer: methods to train an HmmModel from given observations
- HmmEvaluator: methods to analyze (evaluate/decode) a given HmmModel
- HmmUtils: helper methods, e.g. to validate and normalize an HmmModel
- HmmAlgorithms: helper methods containing implementations of the Forward, Backward and Viterbi algorithms

HmmModel
HmmModel is the main class for defining a Hidden Markov Model. It contains the transition matrix K, the emission matrix O and the initial probability vector π.
Construction from given Mahout matrices K, O and a Mahout vector pi:

    HmmModel model = new HmmModel(K, O, pi);

Construction of a random model with n hidden and m observable states:

    HmmModel model = new HmmModel(n, m);

Offers serialization and deserialization from/to JSON:

    HmmModel model = HmmModel.fromJson(String json);
    String json = model.toJson();
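For example, the weather/boots model from earlier could be passed in like this (a sketch: DenseMatrix and DenseVector from org.apache.mahout.math are my assumption for the concrete matrix and vector types):

    import org.apache.mahout.math.DenseMatrix;
    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Matrix;
    import org.apache.mahout.math.Vector;

    // Transition matrix K, emission matrix O, initial distribution pi
    Matrix K = new DenseMatrix(new double[][] {{0.6, 0.4}, {0.2, 0.8}});
    Matrix O = new DenseMatrix(new double[][] {{0.9, 0.1}, {0.2, 0.8}});
    Vector pi = new DenseVector(new double[] {0.5, 0.5});
    HmmModel model = new HmmModel(K, O, pi);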

HmmTrainer
Offers a collection of learning algorithms.
Supervised learning from hidden and observed state sequences (the pseudo-count is used to avoid zero probabilities):

    HmmModel trainSupervised(int hiddenStates, int observedStates,
                             int[] hiddenSequence, int[] observedSequence,
                             double pseudoCount);

Unsupervised learning using the Viterbi algorithm (scaled selects the log-scaled implementation: slower, but numerically more stable):

    HmmModel trainViterbi(HmmModel initialModel, int[] observedSequence,
                          double pseudoCount, double epsilon,
                          int maxIterations, boolean scaled);

Unsupervised learning using the Baum-Welch algorithm (again, scaled selects the log-scaled implementation):

    HmmModel trainBaumWelch(HmmModel initialModel, int[] observedSequence,
                            double epsilon, int maxIterations, boolean scaled);

HmmEvaluator
Offers algorithms to evaluate an HmmModel.
Generating a sequence of output states from the given model:

    int[] predict(HmmModel model, int steps);

Computing the model likelihood for a given observation (scaled selects the log-scaled implementation: slower, but numerically more stable):

    double modelLikelihood(HmmModel model, int[] observations, boolean scaled);

Computing the most likely hidden path for a given model and observation:

    int[] decode(HmmModel model, int[] observations, boolean scaled);
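Putting the three classes together, a hypothetical end-to-end usage could look as follows (a sketch based only on the signatures shown on these slides; the package path and the static nature of the methods are assumptions):

    import org.apache.mahout.classifier.sequencelearning.hmm.HmmEvaluator;
    import org.apache.mahout.classifier.sequencelearning.hmm.HmmModel;
    import org.apache.mahout.classifier.sequencelearning.hmm.HmmTrainer;

    public class MahoutHmmDemo {
        public static void main(String[] args) {
            int[] observed = {0, 0, 1, 1, 0, 1, 1, 1}; // toy observation sequence
            // Random initial model with 2 hidden and 2 observable states.
            HmmModel initial = new HmmModel(2, 2);
            // Unsupervised Baum-Welch training; scaled = true for numerical stability.
            HmmModel trained = HmmTrainer.trainBaumWelch(initial, observed, 0.0001, 100, true);
            // Decode the most likely hidden path and score the observation.
            int[] hidden = HmmEvaluator.decode(trained, observed, true);
            double likelihood = HmmEvaluator.modelLikelihood(trained, observed, true);
            System.out.println("Likelihood: " + likelihood);
            System.out.println("Hidden path: " + java.util.Arrays.toString(hidden));
        }
    }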


Outlook
How can we make HMMs Map/Reduce enabled?
Problem: all the presented algorithms are highly sequential; there is no easy way of parallelizing them.
However:
- Hidden Markov Models are often compact (n and m are not too large).
- The most typical application of a trained HMM is decoding, which can be performed fairly efficiently on a single machine.
- Trained HMMs can therefore typically be used efficiently within Mappers/Reducers.
- The most expensive and data-intensive application for HMMs is training.
Main goal: parallelize HMM training. Approaches to parallelizing learning:
- For supervised learning: trivial, we only need to count state changes.
- For unsupervised learning: tricky, ongoing research:
  - merging HMMs trained on subsequences
  - alternative representations that allow training via parallelizable algorithms (e.g. SVD)

Take-Home Messages I
Hidden Markov Models (HMMs) are a statistical tool to model processes that produce observations based on a hidden state sequence:
- HMMs consist of a discrete hidden variable that randomly and sequentially changes its state, and a random observable variable.
- The hidden state-change probability depends only on the k prior hidden states; a typical value for k is 1.
- The probability of the observable variable depends only on the current hidden state.
Three main problems for HMMs: evaluation, decoding and training:
- Evaluation: likelihood that a given model generated a given observed sequence. → Forward algorithm
- Decoding: most likely hidden sequence for a given observed sequence and model. → Viterbi algorithm
- Training: most likely model that generated a given observed sequence (unsupervised) or a given observed and hidden sequence (supervised). → Baum-Welch algorithm

Take-Home Messages II
HMMs can be applied whenever an underlying process generates sequential data: speech recognition, handwritten letter recognition, part-of-speech tagging, genome analysis, customer behavior analysis, context-aware search, ...
Mahout contains an HMM implementation in its upcoming 0.4 release:
- Three main classes: HmmModel, HmmTrainer, HmmEvaluator.
- HmmModel is a container class for representing the model parameters.
- HmmTrainer contains implementations for the learning problem.
- HmmEvaluator contains implementations for the evaluation and decoding problems.
Porting HMMs to Map/Reduce is non-trivial:
- Typically, HMMs can be used within a Mapper/Reducer.
- The most data-intensive task for HMMs is training.
- Porting HMM training to Map/Reduce is ongoing research.

The end, my only friend, the end. Thank you!