The concept of HMMs


Hidden Markov Models

Machine Learning

Sotiris Manitsaris

Robotics Lab | Dep. of Mathematics and Systems | MINES ParisTech

The concept of HMMs

« The future is independent of the past, given the present »

Andrey Andreyevich Markov (Андрей Андреевич Марков), 2 June 1856 - 20 July 1921


Reasoning over Time or Space

• We want to reason about a sequence of observations

• Gesture recognition in Human-Robot Collaboration

• Visual-speech recognition

• Gesture control of robots

• We need to introduce time or space into our models

Motivations-Applications


Model definition in a Markov Chain

• Set of N states, {S_1, S_2, …, S_N}

• Sequence of states Q = {q_1, q_2, …}

• Initial probabilities π = {π_1, π_2, …, π_N}

• π_i = P(q_1 = S_i)

• Transition matrix A of size N×N

• a_ij = P(q_{t+1} = S_j | q_t = S_i)

We need observations to update our beliefs
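As a minimal sketch of the definitions above: the probability of a state sequence under a Markov chain is the initial probability of its first state times the product of the transition probabilities along it. The three-state numbers below are hypothetical, chosen only for illustration, and states are referred to by their index (0 for S_1, 1 for S_2, and so on).

```python
# Hypothetical 3-state Markov chain; the numbers are illustrative, not from the slides.
PI = [0.5, 0.3, 0.2]            # PI[i] = P(q_1 = S_i)
A = [                           # A[i][j] = P(q_{t+1} = S_j | q_t = S_i)
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]


def sequence_probability(seq, pi, a):
    """P(q_1, ..., q_T) = pi[q_1] * product over t of a[q_t][q_{t+1}]."""
    prob = pi[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        prob *= a[prev][nxt]
    return prob


# Probability of the path S_1 -> S_1 -> S_2 -> S_2 -> S_1
p = sequence_probability([0, 0, 1, 1, 0], PI, A)
```

Here `p` is 0.5 × 0.7 × 0.2 × 0.4 × 0.3.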

Example of Markov Chain

Weather model:

• 3 states {sunny, rainy, cloudy}

Problem:

• Forecast the weather state, based on the current weather state
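The forecast can be sketched by propagating a distribution through the transition matrix. The transition probabilities below are made up for illustration (the slides give none), and `forecast` is a hypothetical helper name.

```python
# Hypothetical weather chain; the transition probabilities are assumptions.
WEATHER = ["sunny", "rainy", "cloudy"]
TRANSITIONS = [
    [0.8, 0.05, 0.15],   # from sunny
    [0.2, 0.6, 0.2],     # from rainy
    [0.2, 0.3, 0.5],     # from cloudy
]


def step(dist, a):
    """One forecast step: new[j] = sum over i of dist[i] * a[i][j]."""
    n = len(a)
    return [sum(dist[i] * a[i][j] for i in range(n)) for j in range(n)]


def forecast(current_state, a, days=1):
    """Distribution over states `days` steps after a known current state index."""
    dist = [1.0 if i == current_state else 0.0 for i in range(len(a))]
    for _ in range(days):
        dist = step(dist, a)
    return dist


# Distribution over (sunny, rainy, cloudy) two days after a rainy day
two_days_after_rain = forecast(WEATHER.index("rainy"), TRANSITIONS, days=2)
```

With a one-day horizon the forecast is simply the row of the transition matrix for the current state; longer horizons apply the same step repeatedly.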



Definition of Gaussian Mixture Model

• n states observed through an observation x

• Model parameters Θ = {Θ_1, Θ_2, …, Θ_n}

[Figure: mixture components labelled Θ_1^mg, Θ_2^mg, Θ_3^mg]

Example of Mixture Models

Weather model:

• 3 “hidden” states: {rainy, cloudy, sunny}

• Measure weather-related variables (e.g. temperature, humidity, barometric pressure)

Problem:

• Given the values of the weather variables, what is the state?
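One way to answer this question is Bayes' rule over the mixture components. The sketch below assumes a single measured variable (humidity, in %) and one univariate Gaussian per state; the weights, means and standard deviations are hypothetical and would normally be estimated from data.

```python
import math

# Hypothetical mixture over humidity (%); all numbers are assumptions.
COMPONENTS = {
    # state: (weight, mean, standard deviation)
    "rainy": (0.3, 85.0, 8.0),
    "cloudy": (0.3, 65.0, 10.0),
    "sunny": (0.4, 40.0, 12.0),
}


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def state_posterior(x, components):
    """P(state | x): weight * density, normalised over all states."""
    joint = {s: w * gaussian_pdf(x, m, sd) for s, (w, m, sd) in components.items()}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}


def most_likely_state(x, components):
    """State with the highest posterior probability for observation x."""
    post = state_posterior(x, components)
    return max(post, key=post.get)
```

Note that this model treats each observation independently: unlike a Markov chain or an HMM, it carries no information from one time step to the next.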


Definition of an HMM

λ = (A, B, π): Hidden Markov Model

• A = {a_ij}: transition probability distribution

• a_ij = P(q_{t+1} = S_j | q_t = S_i)

• B = {b_i(x)}: emission probability distribution

• b_i(x) = P(O_t = x | q_t = S_i)

• π = {π_i}: initial state probability distribution

• π_i = P(q_1 = S_i)
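The triple λ = (A, B, π) can be held in a small container. The sketch below assumes discrete observation symbols, so that B is a table with one row per state; the class name `HMM` and its layout are choices of this example, not something prescribed by the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMM:
    """Discrete-emission HMM, lambda = (A, B, pi). The layout is an assumption."""
    A: List[List[float]]   # A[i][j] = P(q_{t+1} = S_j | q_t = S_i)
    B: List[List[float]]   # B[i][k] = P(O_t = symbol k | q_t = S_i)
    pi: List[float]        # pi[i]   = P(q_1 = S_i)

    def validate(self, tol=1e-9):
        """Check shapes and that every row is a probability distribution."""
        n = len(self.pi)
        if len(self.A) != n or any(len(row) != n for row in self.A):
            raise ValueError("A must be N x N")
        if len(self.B) != n:
            raise ValueError("B must have one row per state")
        for name, rows in (("A", self.A), ("B", self.B), ("pi", [self.pi])):
            for row in rows:
                if any(p < 0 for p in row) or abs(sum(row) - 1.0) > tol:
                    raise ValueError(f"{name} has a row that is not a distribution")
        return True
```

For continuous observations (for example Gaussian emissions), B would instead store the parameters of one density per state.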

[Figure: the trellis graph of a six-state left-to-right HMM, with states S₁ to S₆, self-transitions a₁₁ to a₆₆, forward transitions a₁₂ to a₅₆, and emission distributions b₁(x) to b₆(x)]


Conditional independence

• Basic conditional independence:

• Past and future are independent given the present

• Each time step depends only on the previous one

• This is called the first-order Markov property
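As a worked equation: combining the first-order Markov property with the HMM assumption that each observation depends only on the current state, the joint probability of a state sequence Q = {q_1, …, q_T} and an observation sequence O = {O_1, …, O_T} factorises into initial, transition and emission terms. The sequence length T is a symbol introduced here for the illustration.

```latex
P(O, Q \mid \lambda)
  = \pi_{q_1}\, b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(O_t)
```

Summing this product over all state sequences Q gives P(O | λ); maximising it over Q gives the most likely hidden sequence.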

Topologies of HMMs

[Figure: four HMM topologies: left to right (A), left to right (B), left to right (C), and ergodic]
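A topology amounts to a pattern of allowed (non-zero) entries in the transition matrix A. The sketch below builds uniform starting matrices for an ergodic model and for a generic left-to-right model; `max_jump` is a hypothetical parameter of this example, and the exact variants (A), (B) and (C) of the slides are not reproduced.

```python
def ergodic(n):
    """Fully connected topology: every transition is allowed (uniform start)."""
    return [[1.0 / n] * n for _ in range(n)]


def left_to_right(n, max_jump=1):
    """Left-to-right topology: state i may stay or move forward by up to max_jump."""
    a = []
    for i in range(n):
        allowed = range(i, min(i + max_jump, n - 1) + 1)
        row = [0.0] * n
        for j in allowed:
            row[j] = 1.0 / len(allowed)
        a.append(row)
    return a


# Four-state chain where each state can only stay or advance to the next one
chain = left_to_right(4)
```

In a left-to-right matrix every entry below the diagonal is zero, so the model can never return to an earlier state; the last state is absorbing.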


Example of HMM

Weather model:

• 2 “hidden” states: {rainy, cloudy}

• Measure weather-related variables (e.g. humidity)

[Figure: humidity plotted over time t, with axis marks at 10% and 70%]

Problem:

• Forecast the weather state, given the current weather variables

Basic problems of HMMs

• Evaluation: O, λ → P(O | λ)

• Uncover the hidden part: O, λ → the Q for which P(Q | O, λ) is maximum

• Learning: {O} → the λ for which P(O | λ) is maximum


Evaluation

O, λ → P(O | λ)

• Solved by the Forward algorithm

Applications:

• Find some likely samples

• Evaluation of a sequence of observations

• Change detection

[Figure: trellis of a six-state left-to-right HMM with states S₁ to S₆, transitions a_ij and emissions b₁(x) to b₆(x); the observations x are conditionally independent given the states]
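A minimal sketch of the forward recursion for the evaluation problem, assuming discrete observation symbols (so `b[i][k]` is a table) and a short sequence (no scaling is applied, so long sequences would underflow). The two-state model is hypothetical.

```python
# Hypothetical two-state, two-symbol model; the numbers are assumptions.
A = [[0.7, 0.3], [0.4, 0.6]]    # A[i][j] = P(q_{t+1} = S_j | q_t = S_i)
B = [[0.9, 0.1], [0.2, 0.8]]    # B[i][k] = P(O_t = k | q_t = S_i)
PI = [0.6, 0.4]                 # PI[i] = P(q_1 = S_i)


def forward_likelihood(obs, a, b, pi):
    """P(O | lambda) for a discrete-emission HMM via the forward recursion."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


likelihood = forward_likelihood([0, 1], A, B, PI)
```

The recursion costs on the order of N² operations per time step, instead of enumerating all N^T state sequences.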

Uncover the hidden part

O, λ → the Q for which P(Q | O, λ) is maximum

• Solved by the Viterbi algorithm

• No « correct » sequence to be found

How to solve it:

• Use an optimality criterion that depends on the use of the uncovered state sequence

Possible uses:

• Learn about the structure of the model

• Get average statistics of the states

Applications:

• Find the real states by maximising the likelihood until a given state

• Find some recursion given an arbitrary state

• Used in the learning problem

[Figure: trellis of a six-state left-to-right HMM with states S₁ to S₆, transitions a_ij and emissions b₁(x) to b₆(x), illustrating the recursion given a state]
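A minimal sketch of the Viterbi recursion, assuming discrete observation symbols and a short sequence (probabilities are multiplied directly rather than summed in log space). Since P(O | λ) does not depend on Q, maximising the joint P(Q, O | λ) yields the same path as maximising P(Q | O, λ). The two-state model is hypothetical.

```python
# Hypothetical two-state, two-symbol model; the numbers are assumptions.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
PI = [0.6, 0.4]


def viterbi(obs, a, b, pi):
    """Most likely state path and its joint probability P(Q, O | lambda)."""
    n = len(pi)
    # delta_1(i) = pi_i * b_i(O_1)
    delta = [pi[i] * b[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = delta
        # Best predecessor of each state j at this time step
        best_prev = [max(range(n), key=lambda i: prev[i] * a[i][j])
                     for j in range(n)]
        delta = [prev[best_prev[j]] * a[best_prev[j]][j] * b[j][o]
                 for j in range(n)]
        backpointers.append(best_prev)
    # Backtrack from the best final state
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    path.reverse()
    return path, best_prob


path, prob = viterbi([0, 1, 1], A, B, PI)
```

The structure mirrors the forward recursion, with the sum over predecessors replaced by a maximum plus a stored backpointer.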


Learning

• {O} → the λ for which P(O | λ) is maximum

• No analytic solution

• Solved by Baum-Welch (a variation of EM), which applies when some data is missing (here, the states)

• Applications

• Unsupervised Learning (single HMM)

• Supervised Learning (multiple HMMs)
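A minimal sketch of one Baum-Welch re-estimation step, under several simplifying assumptions: a single observation sequence, discrete symbols, strictly positive starting parameters, and no scaling (so it is only suitable for short sequences). The function names are choices of this example.

```python
def forward(obs, a, b, pi):
    """alphas[t][i] = P(O_1..O_t, q_t = S_i | lambda)."""
    n = len(pi)
    alphas = [[pi[i] * b[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alphas[-1]
        alphas.append([sum(prev[i] * a[i][j] for i in range(n)) * b[j][o]
                       for j in range(n)])
    return alphas


def backward(obs, a, b):
    """betas[t][i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    n = len(a)
    betas = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = betas[0]
        betas.insert(0, [sum(a[i][j] * b[j][o] * nxt[j] for j in range(n))
                         for i in range(n)])
    return betas


def baum_welch_step(obs, a, b, pi):
    """One EM update of (A, B, pi); also returns P(O | old lambda)."""
    n, m, T = len(pi), len(b[0]), len(obs)
    alphas, betas = forward(obs, a, b, pi), backward(obs, a, b)
    lik = sum(alphas[-1])
    # gamma[t][i] = P(q_t = S_i | O, lambda)
    gamma = [[alphas[t][i] * betas[t][i] / lik for i in range(n)]
             for t in range(T)]
    # xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = [[[alphas[t][i] * a[i][j] * b[j][obs[t + 1]] * betas[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_a = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_b = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_a, new_b, new_pi, lik
```

Repeating the step until the likelihood stops improving gives a local maximum of P(O | λ); the result depends on the starting parameters, which is consistent with the absence of an analytic solution.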

Piano-like finger gesture recognition

[Figure: processing pipeline. Capturing (optical camera, depth camera); hand segmentation and fingertip identification (skin modelling or distance slicing; computer vision and machine learning); gesture modelling (deterministic/stochastic modelling with machine learning: HMMs, GMMs, DTW); recognition from likelihoods (static recognition, dynamic recognition, early recognition and prediction)]


A concrete example

• Let’s consider a gesture dictionary GD with the following gestures: ascending scale, descending scale, ascending arpeggio, descending arpeggio

• A set of ergodic HMMs, one per gesture

• The parameters λ_i = (A_i, B_i, π_i) of all the HMMs

What to recognize

• We want to recognize a gesture that is an ascending arpeggio with its inversion


How to model the gesture

• We consider an alphabet of fingerings

State S₁: DO with the 1st finger

State S₂: MI with the 2nd finger

State S₃: SOL with the 3rd finger

State S₄: DO with the 5th finger

[Figure: state-transition diagram over the four states, emitting x₁ to x₄, with transition probabilities 0.6, 0.3 and 0.05]

• We assume:

• A = {a_ij}, and

• That Q = {q₁, q₂, q₃, q₄, q₅, q₆, q₇} constitutes the ascending arpeggio with its inversion

• π₁ = P(q₁ = S₁) = 1

[Figure: the states S₁, S₂, S₃ and S₄ along the state sequence]

Could another modelling lead to a better physical meaning (e.g. rest state, start state, attack state)?


How to model the observations

• With Gaussian distributions. How many for M₃?

A priori knowledge

• That the sequence of observations O(t)_1:7 (visible sequence) is known

• We assume that M₃ has the maximum likelihood, since it is the only ergodic model

• That S(t)_1:7 is the state sequence (hidden sequence) that generated O(t)_1:7


HMM representation

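[Figure: HMM representation. The hidden state sequence S(t)_1:7 (q₁, q₂, q₃, q₄, …) sits above the observation sequence O(t)_1:7 (x₁, x₂, …, x₆, x₇), with transition probabilities such as P(q₂ = S₂ | q₁ = S₁) and emission probabilities such as P(O₆ = x₆ | q₂ = S₆)]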

Problem 3: Learning

We know:

• the model M₃

• the sequence O(t)_1:7

What are:

• the parameters λ = (A, B, π) of M₃ that maximize P(O | λ)?


Problem 2: Uncover the hidden part

Viterbi

We know:

• the model M₃

• the sequence O(t)_1:7

What is:

• the Q(t)_1:7 that generated O(t)_1:7 and maximizes P(Q | O, λ)?

[Figure: hidden state sequence Q(t)_1:7 (q₁, q₂, q₃, q₄, …) above the observation sequence O(t)_1:7 (x₁, x₂, …, x₆, x₇)]

Problem 1: Evaluation

Forward-Backward

We know:

• the model M₃

• the sequence O(t)_1:7

How to:

• calculate P(O(t)_1:7 | M₃)?

[Figure: hidden state sequence Q(t)_1:7 (q₁, q₂, q₃, q₄, …) above the observation sequence O(t)_1:7 (x₁, x₂, …, x₆, x₇)]


Maximum likelihood computation

[Figure: gesture recognition. The sequence of observations O(t)_1:7 is passed to each model M₁, M₂, …, M₄ for likelihood computation; a maximum likelihood computation over these likelihoods then gives the recognized gesture]
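The recognition step can be sketched as scoring the observation sequence with every gesture model and keeping the best one. The sketch below assumes discrete observation symbols and uses two hypothetical two-state models; the gesture names and all probabilities are illustrative, not the models of the slides.

```python
def forward_likelihood(obs, a, b, pi):
    """P(O | lambda) for a discrete-emission HMM via the forward recursion."""
    n = len(pi)
    alpha = [pi[i] * b[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return sum(alpha)


# Hypothetical left-to-right models, one per gesture: name -> (A, B, pi)
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "ascending": (LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]], [1.0, 0.0]),
    "descending": (LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]], [1.0, 0.0]),
}


def recognize(obs, models):
    """Return the model name with the highest P(O | lambda), plus all scores."""
    scores = {name: forward_likelihood(obs, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores


best, scores = recognize([0, 0, 1, 1], MODELS)
```

Each model is trained separately on examples of its own gesture, which is the supervised, multiple-HMM setting.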

Precision, Recall & Jackknife

[Figure: jackknife procedure. The data are split into Set₁, Set₂, …, Set_n. For k = 1, 2, …, n, Set_k is left out, and learning and recognition give a statistic: t₁* = t(Set₂, …, Set_n), t₂* = t(Set₁, Set₃, …, Set_n), …, t_n* = t(Set₁, …, Set_{n-1}). Repeated n times, the statistics t₁*, t₂*, …, t_n* give the estimate of the statistic t]
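A minimal sketch of the evaluation loop, assuming that for each k the models are learned on every set except Set_k and then evaluated on the left-out set. The `train` and `evaluate` callables are hypothetical hooks supplied by the caller; precision and recall are computed per gesture label.

```python
def precision_recall(truth, predicted, label):
    """Precision and recall of `label` from parallel lists of labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def jackknife(sets, train, evaluate):
    """For each k: learn on all sets except Set_k, evaluate on Set_k."""
    statistics = []
    for k, held_out in enumerate(sets):
        training = [s for i, s in enumerate(sets) if i != k]
        model = train(training)
        statistics.append(evaluate(model, held_out))
    return statistics


def mean(values):
    """Average of the n leave-one-out statistics."""
    return sum(values) / len(values)
```

For example, `evaluate` could run the recognizer on the left-out set and return the precision for one gesture, and `mean` would then summarise the n values.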
