
1.7 Markov and hidden Markov models

A hidden Markov model (HMM) is a stochastic model that is used to model time-varying random phenomena. It is based upon a Markov model, and can be understood in terms of the state-space models already derived. We now present the basic concepts; the issues raised here are resolved in chapters 17 and 19. Placement here serves several purposes: it provides a demonstration of the utility of the state-space formulation for yet another system; it smooths the development of HMM algorithms in later chapters; and it provides introduction and motivation for two important algorithms, the EM algorithm and the Viterbi algorithm.

1.7.1 Markov models

The Markov model is used to model the evolution of random phenomena that can be in discrete states as a function of time, where the transition from one state to the next is random. Suppose that a system can be in one of S distinct states, and that at each step of discrete time it can move to another state at random, with the probability of the transition at time t dependent only upon the state of the system at time t. It is convenient to represent this concept using a probabilistic state diagram, as shown in figure 1.16. In this figure, the Markov model has three states. From state 1, transitions to each of the states are possible: from state 1 to state 1 with probability 0.5, and so forth. Let S[t] denote the state at time t, where S[t] takes on one of the values 1, 2, . . . , S. The initial state is selected according to an initial probability π, with π_i = P(S[0] = i).


Figure 1.16: A simple Markov model

By the foregoing description, the probability of transition depends only upon the current state:

P(S[t + 1] = j | S[t] = i, S[t - 1] = k, S[t - 2] = l, . . .) = P(S[t + 1] = j | S[t] = i).

This structure on the probabilities is called the Markov property, and the random sequence of state values S[0], S[1], S[2], . . . is called a Markov sequence or a Markov chain. This sequence is the output of the Markov model.
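To make the chain mechanics concrete, here is a minimal simulation sketch. The three-state transition matrix and initial distribution are hypothetical stand-ins (only the 0.5 probability from state 1 to state 1 is given in the text for figure 1.16), and states are numbered from 0 for indexing convenience:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical column-stochastic transition matrix: A[i, j] = P(S[t+1] = i | S[t] = j).
# Only the 0.5 self-transition of state 1 (index 0 here) comes from the text.
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
pi = np.array([1.0, 0.0, 0.0])  # assumed initial state probabilities

def sample_chain(A, pi, T):
    """Draw a Markov sequence S[0], S[1], ..., S[T-1]."""
    s = rng.choice(len(pi), p=pi)              # S[0] drawn from pi
    states = [s]
    for _ in range(T - 1):
        s = rng.choice(A.shape[0], p=A[:, s])  # next state drawn from column s
        states.append(s)
    return states

print(sample_chain(A, pi, 10))  # a random walk over the three states
```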

We can determine the probability of arriving in the next state by adding up all the probabilities of the ways of arriving there:

P(S[t + 1] = j) = P(S[t + 1] = j | S[t] = 1)P(S[t] = 1)
               + P(S[t + 1] = j | S[t] = 2)P(S[t] = 2)
               + · · ·
               + P(S[t + 1] = j | S[t] = S)P(S[t] = S).     (1.56)

The computation in (1.56) can be made conveniently in matrix notation. Let

p[t] = [P(S[t] = 1), P(S[t] = 2), . . . , P(S[t] = S)]^T

be the vector of probabilities for each state, and let the matrix A contain the transition probabilities

A = [P(i | j)],     (1.57)

where P(i | j) is an abbreviation for P(S[t + 1] = i | S[t] = j); that is, a_ij = P(S[t + 1] = i | S[t] = j). For example, for the Markov model of figure 1.16, the entries of A can be read directly from the transition probabilities labeled on the edges of the state diagram.

A steady-state probability assignment is one that does not change from one time step to the next, so the probability must satisfy the equation Ap = p. This is a particular eigenequation, with an eigenvalue of 1. (More will be said about eigenvalue problems in chapter 6.)

By the law of total probability, each column of A must sum to 1.
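As a numerical sketch of the eigenequation Ap = p (using the same hypothetical A as in the earlier simulation), the steady-state distribution can be found by extracting the eigenvector associated with eigenvalue 1 and normalizing it to sum to 1:

```python
import numpy as np

A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])  # hypothetical; each column sums to 1

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmin(np.abs(eigvals - 1.0))  # index of the eigenvalue closest to 1
p_ss = np.real(eigvecs[:, k])
p_ss = p_ss / p_ss.sum()              # scale so the probabilities sum to 1

print(p_ss)
print(np.allclose(A @ p_ss, p_ss))    # True: A p = p
```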


Definition 1.2 An m × m matrix P, such that ∑_{j=1}^{m} p_ij = 1 (each row sums to 1) and each element of P is nonnegative, is called a stochastic matrix. If the rows and columns each sum to 1, then P is doubly stochastic. □
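A direct transcription of definition 1.2 as a small checking routine (a sketch; the numerical tolerance is an arbitrary choice):

```python
import numpy as np

def is_stochastic(P, tol=1e-12):
    """True if P is nonnegative and each row sums to 1 (definition 1.2)."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

def is_doubly_stochastic(P, tol=1e-12):
    """True if both the rows and the columns of P sum to 1."""
    return is_stochastic(P, tol) and is_stochastic(np.asarray(P).T, tol)

# The transition matrix A used above is the transpose of a stochastic matrix:
# is_stochastic(A.T) is True, while is_stochastic(A) is generally False.
```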

The matrix A of (1.57) is the transpose of a stochastic matrix. The vector π contains the initial probabilities. Thus, we can write the probabilistic update equation as

p[t + 1] = A p[t],   p[0] = π.     (1.58)

Or, to put it another way,

p[t + 1] = A p[t] + π δ[t + 1],     (1.59)

with p[t] = 0 for t ≤ 0. The similarity of (1.59) to the first equation of (1.21) should be apparent. In comparing these two, it should be noted that the "state" represented by (1.59) is actually the vector of probabilities p[t], not the state of the Markov sequence S[t].
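A brief sketch of iterating the update equation (1.58), again with the hypothetical A and π used above; as t grows, p[t] approaches the steady-state vector found from the eigenequation:

```python
import numpy as np

A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])  # hypothetical transition matrix
pi = np.array([1.0, 0.0, 0.0])   # assumed initial probabilities, p[0] = pi

p = pi.copy()
for t in range(50):
    p = A @ p                    # p[t+1] = A p[t], equation (1.58)

print(p)  # numerically close to the steady-state solution of A p = p
```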

1.7.2 Hidden Markov models

The idea behind the HMM can be illustrated using the urn problems of elementary probability, as shown in figure 1.17. Suppose we have S different urns, each of which contains its own set of colored balls. At each instant of time, an urn is selected at random according to the state it was in at the previous instant of time. (That is, according to a Markov model.) Then, a ball is drawn at random from the urn selected at time t. The ball is what we observe as the output, and the actual state is hidden.

Figure 1.17: The concept of a hidden Markov model

The distinction between Markov models and hidden Markov models can be further clarified by continuing the analogy with the state-space equations in (1.21). Equation (1.59) provides for the state update of the Markov system. In most linear systems, however, the state vector is not directly observable; instead, it is observed only through the observation matrix C (assuming for the moment that D is zero), so the state is hidden from direct observation. Similarly, in the HMM we do not observe the state directly. Instead, each state has a probability distribution associated with it. When the HMM moves into state s at time t, the observed output y[t] is an outcome of a random variable Y[t] that is selected according to the distribution f(y[t] | S[t] = s), which we will represent using the notation

f(y | S[t] = s) = f_s(y).

(This idea is illustrated in figure 1.18.) In the urn example of the preceding paragraph, the output probabilities depend on the contents of the urns. A sequence of outputs from an HMM is y[0], y[1], y[2], . . . .

The underlying state information is not seen directly; it is hidden. The probability distribution in each state can be of any type and, in general, each state could have its own type of distribution. Most often in practice, however, each state has the same type of distribution, but with different parameters.

Figure 1.18: An HMM with four states
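To illustrate (this is a sketch, not an algorithm from the text), the following generates an output sequence from an HMM in the spirit of figures 1.17 and 1.18. The output matrix uses the urn probabilities given at the end of this section (c_ij = P(Y[t] = i | S[t] = j)); the transition matrix and initial distribution are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

A = np.array([[0.5, 0.2, 0.3],     # hypothetical: A[i, j] = P(S[t+1]=i | S[t]=j)
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
C = np.array([[1/2, 1/3,  1/3],    # urn output probabilities from the text:
              [1/3, 7/15, 1/3],    # C[i, j] = P(Y[t]=i | S[t]=j), with outputs
              [1/6, 1/5,  1/3]])   # 0, 1, 2 = black, green, red
pi = np.array([1.0, 0.0, 0.0])     # assumed initial state distribution

def generate(T):
    """Return hidden states and observed outputs for T time steps."""
    s = rng.choice(3, p=pi)
    states, outputs = [], []
    for _ in range(T):
        states.append(s)
        outputs.append(rng.choice(3, p=C[:, s]))  # draw a ball from urn s
        s = rng.choice(3, p=A[:, s])              # Markov step to the next urn
    return states, outputs

states, outputs = generate(10)
print("hidden:  ", states)   # not available to an observer
print("observed:", outputs)
```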

Let M denote the number of possible outcomes from all of the states, and let Y[t] be the random variable output at time t, with outcome y[t]. We can determine the probability of each possible output by adding up all the probabilities:

P(Y[t] = j) = P(Y[t] = j | S[t] = 1)P(S[t] = 1)
            + P(Y[t] = j | S[t] = 2)P(S[t] = 2)
            + · · ·
            + P(Y[t] = j | S[t] = S)P(S[t] = S).

Let

q[t] = [P(Y[t] = 1), P(Y[t] = 2), . . . , P(Y[t] = M)]^T

be the vector of output probabilities, and let C be the M × S matrix of output probabilities with entries

c_ij = P(Y[t] = i | S[t] = j),

so that q[t] = C p[t]. For the urns shown in figure 1.17, with the ball colors black, green, and red corresponding to values 1, 2, and 3, respectively,

    [ 1/2   1/3   1/3 ]
C = [ 1/3   7/15  1/3 ]
    [ 1/6   1/5   1/3 ]
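As a quick numerical check of q[t] = C p[t] with this matrix (the state probabilities p[t] below are assumed values, chosen only to exercise the formula):

```python
import numpy as np

C = np.array([[1/2, 1/3,  1/3],
              [1/3, 7/15, 1/3],
              [1/6, 1/5,  1/3]])   # c_ij = P(Y[t]=i | S[t]=j)

p_t = np.array([0.5, 0.3, 0.2])    # assumed P(S[t]=j) for the three urns
q_t = C @ p_t                      # output probabilities, q[t] = C p[t]

print(q_t, q_t.sum())              # a valid distribution: entries sum to 1
```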
