Hidden Markov Models
Machine Learning
Sotiris Manitsaris
Robotics Lab | Dep. of Mathematics and Systems | MINES ParisTech
The concept of HMMs
« The future is independent of the past, given the present »
Andrey Andreyevich Markov (Андрей Андреевич Марков), 14 June 1856 - 20 July 1922
Reasoning over Time or Space
• We want to reason about a sequence of observations
  • Gesture recognition in Human-Robot Collaboration
  • Visual-speech recognition
  • Gesture control of robots
• We need to introduce time or space into our models
Motivations-Applications
Model definition in a Markov Chain
• Set of N states, {S_1, S_2, ..., S_N}
• Sequence of states Q = {q_1, q_2, ...}
• Initial probabilities π = {π_1, π_2, ..., π_N}
  • π_i = P(q_1 = S_i)
• Transition matrix A (N×N)
  • a_ij = P(q_{t+1} = S_j | q_t = S_i)
• We need observations to update our beliefs
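As a worked form of the definition above, the probability of a particular state sequence factorizes into the initial probability and a product of transitions. The sequence length T is notation introduced here, and π and a are indexed directly by the state taken at each step:

```latex
P(q_1, q_2, \dots, q_T) = \pi_{q_1} \prod_{t=1}^{T-1} a_{q_t q_{t+1}},
\qquad
\sum_{j=1}^{N} a_{ij} = 1 \quad \text{for every } i .
```

The second condition states that each row of A is a probability distribution over the next state.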
Example of Markov Chain
Weather model:
• 3 states {sunny, rainy, cloudy}
Problem:
• Forecast the weather state, based on the current weather state
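The forecast can be sketched by propagating a distribution over states through the transition matrix. This is a minimal illustration only: the slides give no transition probabilities, so every number in `A` is an assumed value, and `step` and `forecast` are hypothetical helper names.

```python
# Hypothetical 3-state weather chain; every probability below is an assumed value.
STATES = ["sunny", "rainy", "cloudy"]

# A[i][j] = P(next state is STATES[j] | current state is STATES[i]); each row sums to 1.
A = [
    [0.7, 0.1, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
]

def step(dist, A):
    """Propagate a distribution over states one time step forward."""
    n = len(A)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]

def forecast(current_state, A, steps=1):
    """Distribution over states `steps` days ahead, given today's state."""
    dist = [1.0 if s == current_state else 0.0 for s in STATES]
    for _ in range(steps):
        dist = step(dist, A)
    return dict(zip(STATES, dist))

tomorrow = forecast("sunny", A, steps=1)
in_two_days = forecast("sunny", A, steps=2)
```

With no observations, the forecast depends only on the current state and on A, which is exactly the Markov assumption.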
Definition of Gaussian Mixture Model
• n states observed through an observation x
• Model parameters Θ = {Θ_1, Θ_2, ..., Θ_n}
Example of Mixture Models
Weather model:
• 3 “hidden” states
  • {rainy, cloudy, sunny}
• Measure weather-related variables (e.g. temperature, humidity, barometric pressure)
Problem:
• Given the values of the weather variables, what is the state?
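One way to answer this question with a mixture model is Bayes' rule: weight each state's Gaussian density by its mixture weight and normalise. The sketch below assumes a single measured variable (temperature) for brevity, and all weights, means and standard deviations are made-up values; `gaussian_pdf` and `posterior` are hypothetical helper names.

```python
import math

# Hypothetical 1-D mixture over temperature in degrees Celsius; weights, means
# and standard deviations are assumed values, not estimates from data.
COMPONENTS = {
    "rainy":  {"weight": 0.3, "mean": 12.0, "std": 3.0},
    "cloudy": {"weight": 0.3, "mean": 17.0, "std": 3.0},
    "sunny":  {"weight": 0.4, "mean": 25.0, "std": 4.0},
}

def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior(x, components):
    """P(state | x) by Bayes' rule: weight * density, normalised over states."""
    joint = {name: c["weight"] * gaussian_pdf(x, c["mean"], c["std"])
             for name, c in components.items()}
    total = sum(joint.values())
    return {name: p / total for name, p in joint.items()}

post = posterior(26.0, COMPONENTS)
most_likely = max(post, key=post.get)
```

Note that each measurement is classified on its own here; the mixture model has no notion of what the state was at the previous time step.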
Definition of an HMM
λ = (A, B, π): Hidden Markov Model
• A = {a_ij}: Transition probability distribution
  • a_ij = P(q_{t+1} = S_j | q_t = S_i)
• B = {b_i(x)}: Emission probability distribution
  • b_i(x) = P(O_t = x | q_t = S_i)
• π = {π_i}: Initial state probability distribution
  • π_i = P(q_1 = S_i)
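The three parameter sets describe a generative process: draw the first state from π, emit an observation from B, then move according to A. The following sketch samples from a small discrete-emission HMM; the model size and all probabilities are assumed values, and `draw` and `sample` are hypothetical helper names.

```python
import random

# Hypothetical 2-state HMM with 3 discrete observation symbols (assumed values).
pi = [0.6, 0.4]                            # pi[i]   = P(q_1 = S_i)
A = [[0.7, 0.3], [0.4, 0.6]]               # A[i][j] = P(q_{t+1} = S_j | q_t = S_i)
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]     # B[i][k] = P(O_t = k | q_t = S_i)

def draw(probs, rng):
    """Draw an index according to a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample(pi, A, B, length, rng):
    """Generate a hidden state sequence and the observations it emits."""
    states, observations = [], []
    state = draw(pi, rng)
    for _ in range(length):
        states.append(state)
        observations.append(draw(B[state], rng))
        state = draw(A[state], rng)
    return states, observations

states, observations = sample(pi, A, B, length=7, rng=random.Random(0))
```

In a recognition setting only `observations` would be visible; `states` is the hidden part.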
The trellis graph
[Figure: left-to-right trellis over states S1 to S6, with self-transitions a_11 to a_66, forward transitions a_12 to a_56, and emissions b_1(x) to b_6(x)]
Conditional independence
• Basic conditional independence:
  • Past and future are independent given the present
  • Each time step depends only on the previous one
  • This is called the first-order Markov property
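Written out with the symbols of the HMM definition, the first-order Markov property says that conditioning on the whole history adds nothing beyond the current state. The second line is the matching assumption usually made for the observations of an HMM; it is stated here as a standard assumption rather than something spelled out on the slide:

```latex
P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1}, \dots, q_1) = P(q_{t+1} = S_j \mid q_t = S_i) = a_{ij}

P(O_t = x \mid q_t = S_i, \text{all other states and observations}) = P(O_t = x \mid q_t = S_i) = b_i(x)
```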
Topologies of HMMs
[Figure: four HMM topologies: left-to-right (A), left-to-right (B), left-to-right (C), and ergodic]
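A topology is simply a pattern of zeros in the transition matrix. The sketch below builds the simplest left-to-right structure (stay or move to the next state) and a fully connected ergodic one. The exact differences between variants (A), (B) and (C) are not recoverable from the figure, so they are not reproduced; the probabilities and the function names `left_to_right` and `ergodic` are assumptions.

```python
def left_to_right(n, stay=0.6):
    """Each state either stays put or moves to the next state; the last state absorbs."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i] = stay
        A[i][i + 1] = 1.0 - stay
    A[n - 1][n - 1] = 1.0
    return A

def ergodic(n):
    """Every state can reach every other state in one step (uniform here)."""
    return [[1.0 / n] * n for _ in range(n)]

A_lr = left_to_right(4)
A_erg = ergodic(4)
```

In `A_lr` every entry below the diagonal is zero, so the model can never go back to an earlier state; in `A_erg` no entry is zero.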
Example of HMM
Weather model:
• 2 “hidden” states
• {rainy, cloudy}
• Measure weather-related variables (e.g. humidity)
[Figure: humidity over time t, with levels marked at 10% and 70%]
Problem:
• Forecast the weather state, given the current weather variables
Basic problems of HMMs
• Evaluation
  • O, λ → P(O | λ)
• Uncover the hidden part
  • O, λ → Q such that P(Q | O, λ) is maximum
• Learning
  • {O} → λ such that P(O | λ) is maximum
Evaluation
O, λ → P(O | λ)
• Solved by the Forward algorithm
Applications
• Find some likely samples
• Evaluation of a sequence of observations
• Change detection
[Figure: left-to-right trellis over states S1 to S6, with self-transitions a_11 to a_66, forward transitions a_12 to a_56, and emissions b_1(x) to b_6(x); the observations x are conditionally independent given the states]
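A minimal sketch of the Forward algorithm for a discrete-emission HMM follows. It assumes observations are integer symbol indices and uses a small made-up model; the unscaled recursion shown here underflows on long sequences, where scaling or log-probabilities are normally used.

```python
def forward(pi, A, B, obs):
    """Return P(obs | lambda) for a discrete-emission HMM lambda = (A, B, pi)."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final states
    return sum(alpha)

# Hypothetical 2-state model with 2 observation symbols (assumed values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
likelihood = forward(pi, A, B, [0, 1])
```

The cost is proportional to N² times the sequence length, instead of the N to the power T terms of a naive sum over all state sequences.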
Uncover the hidden part
O, λ → Q such that P(Q | O, λ) is maximum
• Solved by the Viterbi algorithm
• No « correct » sequence to be found
How to solve it:
• Use an optimality criterion that depends on the use of the uncovered state sequence
Possible uses:
• Learn about the structure of the model
• Get average statistics of the states
Applications
• Find the most likely state sequence by maximising the likelihood up to a given state
• Find some recursion given an arbitrary state
• Used in the learning problem
[Figure: left-to-right trellis over states S1 to S6, with self-transitions a_11 to a_66, forward transitions a_12 to a_56, and emissions b_1(x) to b_6(x)]
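A minimal sketch of the Viterbi recursion for a discrete-emission HMM is given below. It uses the "single best state sequence" criterion, which is only one possible optimality criterion; the model values are assumed, and observations are taken to be integer symbol indices.

```python
def viterbi(pi, A, B, obs):
    """Most likely state sequence for obs, and its joint probability with obs."""
    n = len(pi)
    # delta[i]: probability of the best path so far that ends in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        # For each target state j, remember the best predecessor
        prev = [max(range(n), key=lambda i: delta[i] * A[i][j]) for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        backpointers.append(prev)
    # Backtrack from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for prev in reversed(backpointers):
        path.append(prev[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical 2-state model with 2 observation symbols (assumed values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
path, probability = viterbi(pi, A, B, [0, 1])
```

The structure mirrors the Forward recursion, with the sum over predecessors replaced by a maximum and a backpointer.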
Learning
• {O} → λ such that P(O | λ) is maximum
• No analytic solution
• Solved by Baum-Welch (an EM variant), which treats the states as missing data
• Applications
  • Unsupervised learning (single HMM)
  • Supervised learning (multiple HMMs)
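One Baum-Welch iteration can be sketched as follows for a discrete-emission HMM and a single observation sequence. This is an unscaled, illustrative version under assumed starting values; practical implementations use scaling or log-probabilities and usually several training sequences. The names `forward_backward` and `baum_welch_step` are hypothetical.

```python
def forward_backward(pi, A, B, obs):
    """Forward (alpha) and backward (beta) variables, plus P(obs | model)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])

def baum_welch_step(pi, A, B, obs):
    """One EM iteration; returns re-estimated (pi, A, B) and P(obs | old model)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, likelihood = forward_backward(pi, A, B, obs)
    # E-step: gamma[t][i] = P(q_t = S_i | O); xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    # M-step: turn expected counts into probabilities
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood

# Hypothetical starting model and observation sequence (assumed values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1, 0, 1, 1, 0]
history = []
for _ in range(5):
    pi, A, B, likelihood = baum_welch_step(pi, A, B, obs)
    history.append(likelihood)
```

`history` records P(O | λ) before each update; EM guarantees that it never decreases, but only convergence to a local maximum.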
Piano-like finger gesture recognition
[Figure: processing pipeline]
• Capturing: optical camera, depth camera
• Computer vision: hand segmentation & fingertip identification (skin modelling or distance slicing)
• Machine learning: deterministic / stochastic gesture modelling (HMMs, GMMs, DTW)
• Recognition from likelihoods: static recognition, dynamic recognition, early recognition & prediction
A concrete example
• Let’s consider a gesture dictionary GD with the following gestures:
  • ascending scale, descending scale
  • ascending arpeggio, descending arpeggio
• A set of ergodic HMMs, one per gesture:
  • The parameters λ_i = (A_i, B_i, π_i) of all the HMMs
What to recognize
• We want to recognize
• It is an ascending arpeggio with its inversion
How to model the gesture
• We consider an alphabet of fingerings
  • State S1: DO with the 1st finger
  • State S2: MI with the 2nd finger
  • State S3: SOL with the 3rd finger
  • State S4: DO with the 5th finger
[Figure: state diagram over S1 to S4 emitting x1 to x4, with transition probabilities 0.6, 0.3, 0.05 and 0.05 out of each state]
• We assume:
  • A = {a_ij}, and
  • That Q = {q_1, q_2, q_3, q_4, q_5, q_6, q_7} constitutes the ascending arpeggio with its inversion
  • π_1 = P(q_1 = S_1) = 1
Could another model have a better physical meaning?
• Rest state
• Start state
• Attack state
How to model the observations
• With Gaussian distributions. How many for M3?
A priori knowledge
• That the sequence of observations O(t)1:7 (visible sequence) is the following:
• We assume that M3 has the maximum likelihood, since it is the only ergodic model
• That S(t)1:7 is the state sequence (hidden sequence) that generated O(t)1:7 :
HMM representation
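[Figure: hidden state sequence S(t)1:7 (q1, q2, q3, q4, ...) above the observation sequence O(t)1:7 (x1, x2, ..., x6, x7), annotated with the transition probability P(q2=S2 | q1=S1) and an emission probability for the observation O6=x6]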
Problem 3: Learning
We know:
• the model M3
• the sequence O(t)1:7
What are:
• the λ=(A, B, π) of M3 that maximize P(O|λ)?
Problem 2: Uncover the hidden part
Viterbi
We know:
• the model M3
• the sequence O(t)1:7
What are:
• the Q(t)1:7 that generated O(t)1:7 and maximizes P(Q|O, λ)?
[Figure: hidden sequence Q(t)1:7 (q1, q2, q3, q4, ...) above the observation sequence O(t)1:7 (x1, x2, ..., x6, x7)]
Problem 1: Evaluation
Forward-Backward
We know:
• the model M3
• the sequence O(t)1:7
How to:
• calculate P(O(t)1:7 | M3)?
[Figure: hidden sequence Q(t)1:7 (q1, q2, q3, q4, ...) above the observation sequence O(t)1:7 (x1, x2, ..., x6, x7)]
Maximum likelihood computation
[Figure: the sequence of observations O(t)1:7 is passed to each model M1, M2, ..., M4 for likelihood computation; the maximum likelihood computation over the resulting likelihoods gives the gesture recognition result]
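The recognition step can be sketched as: evaluate the observation sequence under every gesture model with the Forward algorithm, then keep the model with the highest likelihood. The two models below are made-up discrete-emission HMMs (the slides use four gesture models and Gaussian observations), and `recognize` is a hypothetical helper name.

```python
def forward(pi, A, B, obs):
    """P(obs | model) for a discrete-emission HMM, via the Forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def recognize(models, obs):
    """Return the name of the model with maximum likelihood, and all likelihoods."""
    likelihoods = {name: forward(*params, obs) for name, params in models.items()}
    return max(likelihoods, key=likelihoods.get), likelihoods

# Two hypothetical gesture models, each given as (pi, A, B); all values are assumed.
models = {
    # M1 tends to emit symbol 0 first, then symbol 1
    "M1": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    # M2 tends to emit symbol 1 first, then symbol 0
    "M2": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}
best, likelihoods = recognize(models, [0, 0, 1, 1])
```

Comparing raw likelihoods implicitly assumes equal prior probability for every gesture in the dictionary.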
Precision, Recall & Jackknife
• Statistic t, estimated by t(Set_1, Set_2, ..., Set_n)
• Set_k, k = 1, 2, ..., n, is left out
• Learning on the remaining sets, recognition on the left-out set
• Repeat n times, giving the statistics t_1*, t_2*, ..., t_n*
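The procedure can be sketched as a leave-one-set-out loop that learns on n-1 sets and computes a statistic, such as precision or recall, on the set left out. The classifier below is a toy nearest-mean rule standing in for the HMM-based recognizer, and the data, labels and function names are all hypothetical.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Precision and recall for one target class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if p == target and t == target)
    fp = sum(1 for t, p in pairs if p == target and t != target)
    fn = sum(1 for t, p in pairs if p != target and t == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def jackknife(sets, train, evaluate):
    """Leave each set out in turn: learn on the others, evaluate on the left-out set."""
    stats = []
    for k in range(len(sets)):
        training = [s for i, s in enumerate(sets) if i != k]
        model = train(training)
        stats.append(evaluate(model, sets[k]))
    return stats, sum(stats) / len(stats)

def train(sets):
    """Toy learning step: mean of a 1-D feature per class."""
    sums = {}
    for s in sets:
        for x, label in s:
            total, count = sums.get(label, (0.0, 0))
            sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}

def evaluate(model, test_set):
    """Recall of the class "up" on the left-out set, using a nearest-mean rule."""
    truth = [label for _, label in test_set]
    predicted = [min(model, key=lambda lab: abs(x - model[lab])) for x, _ in test_set]
    return precision_recall(truth, predicted, target="up")[1]

# Hypothetical data: three sets of (feature, gesture label) pairs.
sets = [
    [(0.1, "up"), (0.9, "down")],
    [(0.2, "up"), (0.8, "down")],
    [(0.3, "up"), (0.7, "down")],
]
stats, mean_recall = jackknife(sets, train, evaluate)
```

Each entry of `stats` plays the role of one t_k*, and their mean summarises the recognizer over all n repetitions.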