Approximate Viterbi decoding for 2D-hidden Markov models


Bernard Merialdo, Stephane Marchand-Maillet and Benoit Huet

Institut Eurecom 2229 Route des cretes, 06904 Sophia-Antipolis, France.

[email protected]

ABSTRACT

While one-dimensional Hidden Markov Models have been very successfully applied to numerous problems, their extension to two dimensions has been shown to be exponentially complex, and this has very much restricted their usage for problems such as image analysis.

In this paper we propose a novel algorithm which is able to approximate the search for the best state path (Viterbi decoding) in a 2D HMM. This algorithm makes certain assumptions which lead to tractable computations, at the price of a loss of full optimality. We detail our algorithm and its implementation, and present some experiments on handwritten character recognition.

Because the Viterbi algorithm serves as a basis for many applications, and 1D HMMs have shown great flexibility in their usage, our approach has the potential to make 2D HMMs as useful for 2D data as 1D HMMs are for 1D data such as speech.

1. INTRODUCTION

Hidden Markov Models (HMM) have long been used to efficiently model uni-dimensional data (sequences of symbols), in particular in speech recognition systems. While it is tempting to benefit from this efficiency for the modeling of two-dimensional data such as images, efforts to achieve this have been limited by the increase in complexity that occurs when going from one to two dimensions.

In previous work [6], using Hidden Markov Models for image analysis, it was shown that these tools allowed for flexibility in designing image models. However, these models also exhibited a lack of geometrical consistency which generated problems for creating `semi-rigid' models. This was due to the fact that the two-dimensional structure of the image was taken at two successive levels (i.e. pixel and line levels).

The work of Levin and Pieraccini [5] introduced the idea of planar Hidden Markov Models, where the complexity of the problem is reduced by applying image alignment constraints. Miller and Hunt [12] described a sub-optimal 2D Viterbi algorithm for bilevel image reconstruction. Their approach is in fact based on a 1D "column-based" Viterbi that uses decision feedback from previous rows. Recently, Li et al. [11] proposed a 2D-HMM algorithm for image classification in which HMM parameters are estimated using the EM algorithm and the 2D Viterbi.

In this paper, we present a new approximation of the 2D Viterbi algorithm that computes an approximation of the best path within a 2D HMM, while avoiding the exponential growth in computation. In section 2, we introduce the concept and theory of the procedure. We present implementation details in section 3. We illustrate the use of this procedure on some simple real-world examples in section 4. Finally, in section 5 we conclude on our findings and present our future research efforts.

2. A NOVEL 2D VITERBI PROCEDURE

One-dimensional Hidden Markov Models (1D HMM) are finite state machines which emit sequences of symbols according to a probabilistic mechanism. The state occupied by the machine at time $t$ is a random variable $q_t$, and the evolution is controlled by the output probability distributions $P[o_t \mid q_t = s] = P[o_t \mid s]$ and the transition probabilities $P[q_t = s_j \mid q_{t-1} = s_i] = P[s_j \mid s_i]$.
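For reference, the exact 1D Viterbi search that the rest of the paper generalizes can be sketched as follows. This is a minimal illustration in pure Python, not the authors' code: the function name `viterbi_1d`, the initial distribution `init` (which the text above does not name), and the table layouts are assumptions made for the example. Log-probabilities are used to avoid numerical underflow.

```python
import math


def viterbi_1d(obs, init, trans, emit):
    """Return (best_log_prob, best_path) for a 1D HMM.

    init[s]     : P[q_1 = s]  (assumed name, not defined in the text)
    trans[i][j] : P[q_t = s_j | q_{t-1} = s_i]
    emit[s][o]  : P[o_t = o | q_t = s]
    """
    n = len(init)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any state path ending in state s
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + log(trans[i][j]))
            score.append(prev[best_i] + log(trans[best_i][j]) + log(emit[j][o]))
            ptr.append(best_i)
        back.append(ptr)

    # Trace the stored predecessors back from the best final state.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path
```

The cost is proportional to the sequence length times the square of the number of states, which is what makes the 1D case tractable.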

Two-dimensional Hidden Markov Models (2D HMM) can be defined in a similar way. The output observation is an array of symbols $o_{xy}$, for example the pixels of an image scanned line by line, which are emitted based on the current state $q_{xy}$. The output probability distributions of the model are now $P[o_{xy} \mid q_{xy} = s]$. With two dimensions, we expect the transition probability to depend on the horizontal and vertical neighbors, such as $P[q_{xy} = s_k \mid q_{x-1,y} = s_i, q_{x,y-1} = s_j] = P[s_k \mid s_i, s_j]$, for local context dependency.

It is easy to see that such a model has theoretical properties similar to those of a 1D HMM, because a 2D HMM is equivalent to a 1D HMM whose states are $(q_{1,y}, q_{2,y}, \ldots, q_{x-1,y}, q_{x,y-1}, \ldots, q_{X,y-1})$. However, this equivalent model has $N^X$ states (where $N$ is the number of states in the model and $X$ is the length of a line), and this leads to an exponential increase in the amount of computation needed for the regular Baum-Welch and Viterbi algorithms.
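To give a feel for the size of the equivalent 1D model, the short sketch below counts (and, for tiny cases, enumerates) the composite states. The function names are hypothetical and the numbers are only illustrative.

```python
from itertools import product


def composite_state_count(n_states, line_length):
    """Number of states of the equivalent 1D HMM: N ** X."""
    return n_states ** line_length


def composite_states(n_states, line_length):
    """Enumerate every composite state as a tuple of X per-pixel states.

    Only practical for very small N and X; shown to make the count concrete.
    """
    return product(range(n_states), repeat=line_length)


if __name__ == "__main__":
    # A 5-state model on lines of only 10 pixels already needs
    # 5 ** 10 composite states.
    print(composite_state_count(5, 10))
```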

2.1. Notations

The two-dimensional data is represented by the array of symbols $o_{xy}$, for example the pixels of an image. The 2D HMM has states $(s_1, s_2, \ldots, s_S)$ and probability distributions $b_i(o) = P(o \mid s_i)$ and $a_{ijk} = P[s_k \mid s_i, s_j]$. The random variable indicating the state used to emit symbol $o_{xy}$ is $q_{xy}$.

In the intermediate steps of the algorithm, we are interested in the emission of the block of pixels $O_{xy} = \{o_{uv} ; 1 \le u \le x, 1 \le v \le y\}$ corresponding to the state block $Q_{xy} = \{q_{uv} ; 1 \le u \le x, 1 \le v \le y\}$. Note that computing $P[O_{xy} \mid Q_{xy}]$ is straightforward because the states are known (after a state has been chosen as local context for the border pixels).
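As an illustration of why this quantity is easy to obtain once the states are fixed, the sketch below evaluates the emission term as a product of per-pixel output probabilities. It assumes, as is standard for HMMs, that each pixel is emitted independently given its own state; the function name and the 0-based list layout are choices made for this example.

```python
import math


def block_emission_log_prob(obs, states, emit, x, y):
    """log P[O_xy | Q_xy] for the block of columns 1..x and rows 1..y.

    obs[v][u], states[v][u] : symbol and state at column u + 1, row v + 1
    emit[s][o]              : b_s(o), the output probability of o in state s
    Assumes pixels are emitted independently given their states.
    """
    total = 0.0
    for v in range(y):
        for u in range(x):
            p = emit[states[v][u]][obs[v][u]]
            if p <= 0.0:
                return float("-inf")  # an impossible emission rules out the block
            total += math.log(p)
    return total
```

The cost is linear in the number of pixels of the block, in contrast with the search over all state blocks.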

We will compute the best path for block $O_{xy}$ from the combination of previously obtained values. For that purpose, we need to introduce the following intermediate notations:

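$p = (x, y)$, $q = q_{xy}$, $o = o_{xy}$,

$p' = (x-1, y-1)$, $q' = q_{x-1,y-1}$, $o' = o_{x-1,y-1}$,

$p_1 = (x, y-1)$, $q_1 = q_{x,y-1}$, $o_1 = o_{x,y-1}$, $Q_1 = \{q_{uv} ; 1 \le u \le x, 1 \le v \le y-1\}$,

$p_2 = (x-1, y)$, $q_2 = q_{x-1,y}$, $o_2 = o_{x-1,y}$, $Q_2 = \{q_{uv} ; 1 \le u \le x-1, 1 \le v \le y\}$.

We will also need the following: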

$R_{p'}$ is one state configuration at positions $\{(u, y-1) ; 1 \le u < x-1\}$ (i.e. along the row containing $p' = (x-1, y-1)$, until the position $(x-2, y-1)$).

$C_{p'}$ is one state configuration at positions $\{(x-1, v) ; 1 \le v < y-1\}$ (i.e. along the column containing $p' = (x-1, y-1)$, until the position $(x-1, y-2)$).

These notations are graphically illustrated in Figure 1.

[Diagram: a grid running from (1,1) to (X,Y), with columns x-2, x-1, x and rows y-2, y-1, y marked. Labels: $O_{xy}$, $Q_{xy}$, $R_{p'}$, $C_{p'}$, $V_H$, $V_V$, $a_{ijk}$, and the four positions $(p, o, q = s_i)$, $(p_1, o_1, q_1 = s_j)$, $(p_2, o_2, q_2 = s_k)$ and $(p', o', q' = s_l)$.]

Figure 1: Notation for the 2D Viterbi procedure.

2.2. Approximation

Let us define $V_{xy}(k)$ to be the maximum probability of the emission of output block $O_{xy}$ over all possible state blocks $Q_{xy}$ with $q_{xy} = s_k$. We can decompose

$$V_{xy}(k) = \max_{i,j \le N} V_{2D}(x, y, i, j)\, a_{ijk}\, b_k(o_{xy}),$$

where $V_{2D}(x, y, i, j)$ is the maximum probability over all state blocks $Q_{xy} \setminus \{q_{xy}\}$ with $q_{x,y-1} = s_i$ and $q_{x-1,y} = s_j$ corresponding to the emission of the pixels in $O_{xy} \setminus \{o_{xy}\}$. In other words, we first produce the output block except the last pixel $o_{xy}$, with states $s_i$ and $s_j$ as neighbors of the last pixel, then move to state $s_k$ at position $(x, y)$ and emit the last pixel $o_{xy}$.

In turn, $V_{2D}(x, y, i, j)$ can be decomposed as

$$V_{2D}(x, y, i, j) = \max_{l \le N,\, R_{p'},\, C_{p'}} V_A(p', l, R_{p'}, C_{p'})\; T_{2D}(q_1 = s_i \mid p', l, R_{p'}, C_{p'})\; T_{2D}(q_2 = s_j \mid p', l, R_{p'}, C_{p'}) \qquad (1)$$

where

$V_A(p', l, R_{p'}, C_{p'})$ is the maximum probability over all state blocks $Q_{x-1,y-1}$ corresponding to the emission of $O_{x-1,y-1}$ with state $s_l$ at position $p' = (x-1, y-1)$ and containing the configurations of states $R_{p'}$ and $C_{p'}$ on its borders.

$T_{2D}(q_1 = s_i \mid p', l, R_{p'}, C_{p'})$ is the maximum probability over all state blocks $Q_1 = Q_{x,y-1}$ corresponding to $O_{x,y-1}$ with $q_{x,y-1} = s_i$, given the value $s_l$ at position $p'$, and given that $Q_1 = Q_{x,y-1}$ should comprise the state configurations $R_{p'}$ and $C_{p'}$.

The previous formula relates the construction of the best state block for output block $O_{xy}$ to the state blocks corresponding to the smaller output block $O_{x-1,y-1}$. However, an exact usage of Equation 1 is difficult to implement in practice, since it requires storing all possible configurations for $q'$, $R_{p'}$ and $C_{p'}$.

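We therefore introduce the following approximation:

$$V_{2D}(x, y, i, j) \simeq \max_{k \le N} \left[ \max_{R_{p'}, C_{p'}} V_A(p', k, R_{p'}, C_{p'}) \cdot \max_{R_{p'}, C_{p'}} T_{2D}(q_1 = s_i \mid p', k, R_{p'}, C_{p'}) \cdot \max_{R_{p'}, C_{p'}} T_{2D}(q_2 = s_j \mid p', k, R_{p'}, C_{p'}) \right]$$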

Let us define $V_H(q_1 = s_i \mid q' = s_k)$ as the maximum probability over all state blocks $Q_1$ containing the state $q' = s_k$ at position $p' = (x-1, y-1)$ and the state $q_1 = s_i$ at position $p_1 = (x, y-1)$, while emitting $O_{x,y-1}$:

$$V_H(q_1 = s_i \mid q' = s_k) \simeq \max_{R_{p'}, C_{p'}} V_A(p', k, R_{p'}, C_{p'}) \cdot \max_{R_{p'}, C_{p'}} T_{2D}(q_1 = s_i \mid p', k, R_{p'}, C_{p'}) \qquad (2)$$

Now, by definition,

$$\max_{R_{p'}, C_{p'}} V_A(p', k, R_{p'}, C_{p'}) = V_{p'}(k) = V_{x-1,y-1}(k)$$

so that

$$\max_{R_{p'}, C_{p'}} T_{2D}(q_1 = s_i \mid p', k, R_{p'}, C_{p'}) \simeq \frac{V_H(q_1 = s_i \mid q' = s_k)}{V_{p'}(k)}$$

When detailing the case of $p_2 = (x-1, y)$, we define in the same manner $V_V(q_2 = s_i \mid q' = s_k)$ as the maximum probability over all state blocks $Q_2$ containing the state $q' = s_k$ at position $p' = (x-1, y-1)$ and the state $q_2 = s_i$ at position $p_2 = (x-1, y)$, while emitting $O_{x-1,y}$. In this context,

$$\max_{R_{p'}, C_{p'}} T_{2D}(q_2 = s_i \mid p', k, R_{p'}, C_{p'}) \simeq \frac{V_V(q_2 = s_i \mid q' = s_k)}{V_{p'}(k)}$$

We therefore obtain,

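$$V_{2D}(x, y, i, j) \simeq \max_{k \le N} V_{x-1,y-1}(k) \cdot \frac{V_H(q_{x,y-1} = s_i \mid q_{x-1,y-1} = s_k)}{V_{x-1,y-1}(k)} \cdot \frac{V_V(q_{x-1,y} = s_j \mid q_{x-1,y-1} = s_k)}{V_{x-1,y-1}(k)} = \max_{k \le N} \frac{V_H(q_{x,y-1} = s_i \mid q_{x-1,y-1} = s_k)\; V_V(q_{x-1,y} = s_j \mid q_{x-1,y-1} = s_k)}{V_{x-1,y-1}(k)}$$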

Finally, combining all equations, we obtain the 2D Viterbi recursive formula,

$$V_{xy}(i) = \max_{j,k,l \le N} \frac{V_H(q_1 = s_j \mid q' = s_l)\; V_V(q_2 = s_k \mid q' = s_l)}{V_{x-1,y-1}(l)}\; P[q_{xy} = s_i \mid q_{x,y-1} = s_j, q_{x-1,y} = s_k]\; P[o_{xy} \mid q_{xy} = s_i]$$

which can be equivalently rewritten as

$$V_{xy}(i) = \max_{l \le N} \max_{j,k \le N} \delta_{xy}(i, j, k, l)\; b_i(o_{xy}), \qquad (3)$$

where

$$\delta_{xy}(i, j, k, l) = a_{ijk}\; \frac{V_H(q_{x,y-1} = s_j \mid q_{x-1,y-1} = s_l)\; V_V(q_{x-1,y} = s_k \mid q_{x-1,y-1} = s_l)}{V_{x-1,y-1}(l)} \qquad (4)$$

These equations now create a link between the resulting most likely state at position $p' = (x-1, y-1)$ and that at position $p = (x, y)$ via the enumeration of all combinations of states at positions $p_1 = (x, y-1)$ and $p_2 = (x-1, y)$.

Equations 3 and 4 will form the core of the procedure we propose for retrieving the most likely state block from a given output block $O$.
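A single application of Equations 3 and 4 at one position can be sketched as below. This is an illustrative sketch rather than the authors' implementation: the function name and table layouts are assumptions, and the quantity maximized is taken to be $a_{ijk} V_H V_V / V_{x-1,y-1}(l)$, the form that follows from the derivation in this section. The tables `v_h`, `v_v` and `v_diag` are assumed to be already available from earlier steps.

```python
def viterbi_2d_step(a, b_o, v_h, v_v, v_diag):
    """Apply Equations 3 and 4 once, at a single position (x, y).

    a[i][j][k] : P[q_xy = s_i | q_{x,y-1} = s_j, q_{x-1,y} = s_k]
    b_o[i]     : b_i(o_xy), output probability of the observed pixel
    v_h[j][l]  : V_H(q_{x,y-1} = s_j | q_{x-1,y-1} = s_l)
    v_v[k][l]  : V_V(q_{x-1,y} = s_k | q_{x-1,y-1} = s_l)
    v_diag[l]  : V_{x-1,y-1}(l)
    Returns (V, arg) with V[i] = V_xy(i) and arg[i] the maximizing
    (j, k, l) triple, or None if no configuration has positive score.
    """
    n = len(b_o)
    V = [0.0] * n
    arg = [None] * n
    for i in range(n):
        for l in range(n):
            if v_diag[l] == 0.0:
                continue  # state s_l cannot occur at p'; avoids dividing by zero
            for j in range(n):
                for k in range(n):
                    delta = a[i][j][k] * v_h[j][l] * v_v[k][l] / v_diag[l]
                    score = delta * b_o[i]
                    if score > V[i]:
                        V[i], arg[i] = score, (j, k, l)
    return V, arg
```

Each call costs on the order of $N^4$ operations, independent of the image size. On real images the raw probabilities would underflow quickly, so a practical version would probably work with log-probabilities (sums and differences instead of products and quotients).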

3. 2D VITERBI PROCEDURE

3.1. Forward pass

We now detail the practical implementation of our 2D Viterbi procedure. A model is represented by the following parameters:

$N$, the number of states in the model.

$\mathcal{V}$, the vocabulary (i.e. the set of all possible values of an observation). $\mathcal{V}$ can possibly be of infinite size.

$A = \{a_{ijk} = P[q_{xy} = s_i \mid q_{x,y-1} = s_j, q_{x-1,y} = s_k] ; 1 \le i \le N, 1 \le j \le N, 1 \le k \le N\}$, the 2D-transition probability values.

$B = \{b_i(v) = P[o_{xy} = v \mid q_{xy} = s_i] ; 1 \le i \le N, \forall v \in \mathcal{V}\}$, the set of state output probability values.

$\Pi = \{\pi_i = P[q_{xy} = s_i \mid x = 1 \text{ or } y = 1] ; 1 \le i \le N\}$, the set of initial state probabilities, which determine the (upper and left) border constraints.
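The parameter set above maps naturally onto a small container. The sketch below is one possible layout, not the authors' data structure: the class name `Hmm2D`, its field names, the helper `uniform_model`, and the restriction to a finite vocabulary (so that the output probabilities fit in a table) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hmm2D:
    n_states: int                  # N
    a: List[List[List[float]]]     # a[i][j][k] = P[q_xy = s_i | s_j above, s_k to the left]
    b: List[List[float]]           # b[i][v] = P[o_xy = v | q_xy = s_i], finite vocabulary assumed
    pi: List[float]                # pi[i] = P[q_xy = s_i | x = 1 or y = 1]

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if any distribution does not sum to 1."""
        n = self.n_states
        if abs(sum(self.pi) - 1.0) > tol:
            raise ValueError("pi must sum to 1")
        for i in range(n):
            if abs(sum(self.b[i]) - 1.0) > tol:
                raise ValueError(f"b[{i}] must sum to 1")
        for j in range(n):
            for k in range(n):
                if abs(sum(self.a[i][j][k] for i in range(n)) - 1.0) > tol:
                    raise ValueError(f"a[.][{j}][{k}] must sum to 1 over i")


def uniform_model(n_states: int, vocab_size: int) -> Hmm2D:
    """Build a model in which every distribution is uniform (a neutral starting point)."""
    n = n_states
    a = [[[1.0 / n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    b = [[1.0 / vocab_size] * vocab_size for _ in range(n)]
    pi = [1.0 / n] * n
    return Hmm2D(n, a, b, pi)
```

Checking that each conditional distribution sums to one is a cheap way to catch indexing mistakes, since the three indices of $a_{ijk}$ are easy to permute by accident.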

The following intermediate variables are calculated during the forward process:

$V_{xy}(i)$ is the maximum probability of obtaining a state block $Q_{xy}$ containing the state $q = s_i$ at position $p = (x, y)$, while emitting $O_{xy}$,

$V_H(q_{x,y-1} = s_j \mid q_{x-1,y-1} = s_l)$ is the maximum probability of obtaining a state block $Q_1$ containing the state $q' = s_l$ at position $p' = (x-1, y-1)$ and the state $q_1 = s_j$ at position $p_1 = (x, y-1)$, while emitting $O_{x,y-1}$,

$V_V(q_{x-1,y} = s_k \mid q_{x-1,y-1} = s_l)$ is the maximum probability of obtaining a state block $Q_2$ containing the state $q' = s_l$ at position $p' = (x-1, y-1)$ and the state $q_2 = s_k$ at position $p_2 = (x-1, y)$, while emitting $O_{x-1,y}$,

$\delta_{xy}(i, j, k, l)$ is the probability of obtaining one state block $Q$ containing the states $s_i$, $s_j$, $s_k$ and $s_l$ at positions $p = (x, y)$, $p_1 = (x, y-1)$, $p_2 = (x-1, y)$ and $p' = (x-1, y-1)$, respectively, while emitting $O_{xy} \setminus \{o_{xy}\}$.

Similarly to the 1D Viterbi procedure, the values of $V$, $V_H$ and $V_V$ are defined recursively and calculated online during the forward pass.

3.2. Backward pass

The retrieval of the best state block $Q$ which emits the output $O$ is possible via the storage of the states which lead to maximal values during the forward pass. For example, we have

$$P[O \mid \lambda] = \max_i V_{XY}(i) \qquad (5)$$

so that the index which achieves the maximum defines the final state $q_{XY}$ of the best state block. From the final state, we can recover the previous two neighbors (given the final state), then retrieve backwards all states from the last row and column, and finally all internal states (in reverse order, given their successors), through the array of indices

$$\psi_{xy}(i, j, k) = \arg\max_l \left[ \delta_{xy}(i, j, k, l) \right], \quad \text{with } 1 \le i \le N,\; 2 \le x \le X-1,\; 2 \le y \le Y-1.$$
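The two ingredients of the backward pass described above, choosing the final state (Equation 5) and storing, for each triple of states, the diagonal state that maximized the score, can be sketched as follows. Function names and the nested-list layout `delta[i][j][k][l]` are assumptions for the example; the full traversal order over rows and columns is not reproduced here.

```python
def final_state(v_last):
    """Equation 5: return (max_i V_XY(i), argmax_i V_XY(i))."""
    best = max(range(len(v_last)), key=v_last.__getitem__)
    return v_last[best], best


def backpointers(delta):
    """Return psi with psi[i][j][k] = argmax over l of delta[i][j][k][l].

    delta is the table of scores for one position (x, y); ties are
    resolved in favor of the smallest index l.
    """
    psi = []
    for d_i in delta:          # d_i[j][k][l], for a fixed state i at (x, y)
        psi_i = []
        for d_ij in d_i:       # d_ij[k][l], for fixed i and j
            psi_i.append(
                [max(range(len(d_ijk)), key=d_ijk.__getitem__) for d_ijk in d_ij]
            )
        psi.append(psi_i)
    return psi
```

Storing only the index of the maximizing state, rather than the full score table, keeps the memory needed per pixel at $N^3$ integers.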

4. EXPERIMENTS

4.1. Segmentation

We have used our procedure to train 2D HMMs on images of handwritten characters. One application is the segmentation of such images. In the following example, we use a 5-state 2D HMM, with initial transition probabilities computed from the transitions between the five regions shown in Figure 2. Note that this initialization induces some constraints on the boundaries between regions, as certain transition probabilities will be set to zero and will prevent certain combinations of state neighborhoods from occurring. After training on character images, the best state block provides a segmentation of the training images, shown in Figure 3.

It is visible that the initial constraints produce a model which can easily fit the shapes in the first row, but is less adequate for the shapes in the second row, resulting in a sometimes awkward segmentation of these images.

Figure 2: Initial segmentation

Figure 3: Segmentation using the completed model

4.2. Recognition

In this experiment, we use a portion of the NIST database (see Figure 4) to train a generic 10x10 model on a set of instances for each digit from 0 to 9. Ten instances of each digit are taken as the training set, to build 10 HMM models $\lambda_i$, $i = 0, \ldots, 9$.

Figure 4: Handwritten character training database

Recognition is performed by searching for the best model which emits an unknown character image $I$:

$$n = \arg\max_i P[I \mid \lambda_i]$$
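The decision rule above amounts to scoring the image under each digit model and keeping the best label. A minimal sketch, assuming a hypothetical `score(image, model)` callable that returns the (approximate) value of $P[I \mid \lambda_i]$, for instance the maximum of Equation 5 at the end of a forward pass:

```python
def recognise(image, models, score):
    """Return the label whose model gives the highest score for the image.

    models : dict mapping a label (e.g. a digit) to its trained model
    score  : hypothetical callable, score(image, model) -> float
    """
    return max(models, key=lambda label: score(image, models[label]))


if __name__ == "__main__":
    # Toy stand-ins: each "model" is a number and the score is a product.
    toy_models = {0: 0.2, 1: 0.7, 2: 0.1}
    print(recognise(1.0, toy_models, lambda img, m: img * m))
```

Because the rule only compares scores, log-probabilities can be used in place of probabilities without changing the result.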

As an example, the sequence of handwritten characters shown in Figure 5 is recognized as 9 7 7 3 8 6 9 7 2 3. The system incorrectly identified three of the ten characters. The errors occurred on the first characters "4", "1", "3". When compared with the database, these errors are consistent with the training set (for example, the first "4" is very similar to some instances of "9").

5. CONCLUSION

We have presented a procedure which approximates the computation of the Viterbi path in a 2D HMM. This opens new possibilities for the modeling of 2D data such as images, using probabilistic procedures which take into account the local context of each pixel.

Figure 5: Example of a test figure string

We have illustrated how this technique can be applied to handwritten optical character recognition. In the future, we intend to evaluate the sensitivity and performance of our algorithm on larger problems, and extend other 1D HMM procedures to this 2D HMM framework.

6. REFERENCES

[1] H. Bourlard and N. Morgan. Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1994.

[2] A. Kosmala and G. Rigoll. On-line handwritten formula recognition using statistical methods. In Proceedings of ICPR'98, pages PR33, 1998.

[3] S.-S. Kuo and O. E. Agazzi. Automatic keyword recognition using Hidden Markov Models. Journal of Visual Communication and Image Representation, 5(3):265–272, 1994.

[4] S.-S. Kuo and O. E. Agazzi. Keyword spotting in poorly printed documents using Pseudo 2-D Hidden Markov models. IEEE PAMI, 16(8):842–848, 1994.

[5] E. Levin and R. Pieraccini. Dynamic planar warping for optical character recognition. In Proceedings of ICASSP'92, volume III, pages 149–152, 1992.

[6] S. Marchand-Maillet and B. Merialdo. Stochastic models for face image analysis. In Proceedings of the 1st European Workshop on Content-Based Multimedia Indexing, October 1999.

[7] A. P. Pentland, R. W. Picard and S. Sclaroff. Photobook: Content-Based Manipulation of Image Databases. International Journal of Computer Vision, 18(3):233–254, June 1996.

[8] R. W. Picard. A Society of Models for Video and Image Libraries. IBM Systems Journal, MIT Media Lab Special Issue, Vol. 35, Nos. 3 & 4, pp. 292–312, 1996.

[9] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, 1989.

[10] L. R. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ, 1993.

[11] J. Li, A. Najmi and R. M. Gray. Image Classification by a Two Dimensional Hidden Markov Model. In Proceedings of ICASSP'99, volume VI, pages 3313–3316, 1999.

[12] C. L. Miller, B. R. Hunt, M. A. Neifeld, and M. W. Marcellin. Bi-level image reconstructions via 2-D Viterbi search. In Proceedings of IEEE ICIP'97, pages 181–184, 1997.
