
Finding Piecewise Constant Functions with RJMCMC Methods

Marko Salmenkivi and Heikki Mannila

5.2 Bayesian Approach and MCMC Methods

5.2.1 Finding Piecewise Constant Functions with RJMCMC Methods

We are interested in piecewise constant models with the number of pieces as a model parameter. That is, we want to consider models with different numbers of parameters. In a Bayesian framework this can be done by specifying a hierarchical model, which assigns a prior distribution to the number of pieces.

Figure 5.2 displays a graphical representation of two models: a piecewise constant model with a fixed number of pieces and its extension to variable dimensions, with the number of pieces m as a hyperparameter of the model.

To exploit MCMC methods in the extended model, we must be able to move from one state to another state with a different number of dimensions. In the following we present the reversible jump MCMC (RJMCMC) algorithm, which is a generalization of the Metropolis-Hastings algorithm to state spaces of variable dimension. The form of the presentation is tailored to our purposes, modeling sequential data with piecewise constant representations. For more general conditions and a detailed theoretical description of variable-dimension MCMC, see [155]. The main sources of the presentation are [419] and [155].

To use RJMCMC simulation, we need to specify proposal distributions that update the existing parameter vector. The updates are divided into

90 Data Mining in Bioinformatics

Fig. 5.2. Graphical representation of (left) a model with a fixed number of pieces, here m = 4, with parameters α_1, ..., α_4 and c_1, ..., c_3, and (right) a model with the number of pieces m not fixed, with parameters α_1, ..., α_m and c_1, ..., c_{m−1}. S is sequential data.

two groups: those that maintain the dimension and those that do not. The proposal distributions that maintain the dimension can change the function value in one segment and modify the start or end point of a segment. The operations that modify the dimension can create a new segment by splitting an existing one or merge two adjacent segments into one.
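As a sketch (the function and move names below are our own, not from the chapter), the four update types can be organized so that a split is disallowed when the number of pieces m has reached an assumed upper bound K, and a merge is disallowed when m = 1:

```python
import random

# The four proposal types described above (names are illustrative).
DIM_PRESERVING = ["update_level", "move_change_point"]
DIM_CHANGING = ["split_segment", "merge_segments"]

def valid_moves(m, K):
    """Return the proposal types available in a state with m pieces,
    given an assumed upper bound K on the number of pieces."""
    moves = list(DIM_PRESERVING)
    if m < K:
        moves.append("split_segment")   # increases the dimension
    if m > 1:
        moves.append("merge_segments")  # decreases the dimension
    if m == 1:
        # A single piece has no interior change point to move.
        moves.remove("move_change_point")
    return moves

def propose_move(m, K, rng=None):
    """Pick one of the currently valid proposal types at random."""
    rng = rng or random.Random()
    return rng.choice(valid_moves(m, K))
```

How moves are mixed at the boundaries is an implementation choice; the chapter only requires that each proposal have a matching reverse proposal.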

Let us divide the parameter vector into two components θ = (m, z), where m ∈ {1, 2, ..., K} determines the dimension n_m of the model, and z is the vector of the other parameters of the model, possibly of varying length. Given m, z can take only values in E_{n_m} ⊆ R^{n_m}. To enable the use of reversibility, the following condition has to be met for all pairs z, z′:

n_m + n_{mm′} = n_{m′} + n_{m′m},    (5.2)

where n_{mm′} is the dimension of the vector of random numbers used in determining the candidate state z′ in state z, and n_{m′m} is the dimension of the vector of random numbers used in the reverse operation. This condition ensures that both sides of the reversibility equation have densities on spaces of equal dimension:

f(z | m) q_{mm′}(z, z′) = f(z′ | m′) q_{m′m}(z′, z).    (5.3)

Here f(z | m) is the density of z in the target distribution, given the dimension n_m, and q_{mm′}(z, z′) is the proposal density of proposing z′ in state z, given that a move is proposed from model dimension n_m to n_{m′}.

For our purposes it is enough to consider the situation where n_{mm′} > 0 and n_{m′m} = 0, when m′ > m. That is, we use a stochastic proposal only when moving to a state with higher dimension. In the reverse operation a deterministic proposal is used instead; thus n_{m′m} = 0.

Piecewise constant model. Consider a model consisting of a single piecewise constant function. Using the notation discussed, denote the number of pieces of the piecewise constant function by m; then the other parameters are c_1, ..., c_{m−1}, α_1, ..., α_m, where c_1, ..., c_{m−1} are the change points of the function and α_1, ..., α_m are the function levels. The dimension of the model is n_m = 2m − 1.
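With n_m = 2m − 1, the dimension-matching condition (5.2) can be checked mechanically for the split move, which uses two random numbers (n_{mm′} = 2) and a deterministic reverse move (n_{m′m} = 0):

```python
def n_params(m):
    """Dimension of a piecewise constant model with m pieces:
    m levels plus m - 1 change points, i.e. n_m = 2m - 1."""
    return 2 * m - 1

def dimensions_match(m, n_forward=2, n_reverse=0):
    """Check condition (5.2) for a split from m to m + 1 pieces:
    n_m + n_forward == n_{m+1} + n_reverse."""
    return n_params(m) + n_forward == n_params(m + 1) + n_reverse

# The condition holds for every m: (2m - 1) + 2 == (2(m + 1) - 1) + 0.
```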

We update each change point and each level componentwise. These updates do not change the dimension of the model. For the number of pieces we first choose between increasing or decreasing the value with probabilities q_add and q_del = 1 − q_add, and then propose to insert or delete one piece at a time. Thus,

q(m, m′) =
    q_add        if m′ = m + 1,
    1 − q_add    if m′ = m − 1,        (5.4)
    0            otherwise.

Inserting a piece means adding one change point, which splits one of the existing pieces into two. In addition to the location of the new change point, the function levels around the change point must be determined. We employ a strategy introduced by Green [155], which maintains the integral of the piecewise constant function in the adding and removing operations. This property of the proposal distribution is often useful for achieving good convergence and for covering the target distribution exhaustively in feasible time.
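Equation 5.4 can be transcribed directly. How the boundaries m = 1 and m = K are handled is our assumption here: an out-of-range proposal is simply returned and left for the acceptance step to reject.

```python
import random

def propose_m(m, q_add=0.5, rng=None):
    """Propose a new number of pieces m' according to Equation 5.4.
    Out-of-range values (m' = 0 or m' > K) are not filtered here;
    handling them in the acceptance step is our assumption."""
    rng = rng or random.Random()
    if rng.random() < q_add:
        return m + 1   # propose inserting a piece
    return m - 1       # propose deleting a piece

def q(m, m_new, q_add=0.5):
    """Proposal probability q(m, m') of Equation 5.4."""
    if m_new == m + 1:
        return q_add
    if m_new == m - 1:
        return 1.0 - q_add
    return 0.0
```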

Denote the function value in the piece to be split by α_0, the candidate for the new change point by c′, the candidate for the function value to the left of c′ by α_l′, and the candidate for the function value to the right of c′ by α_r′ (Figure 5.3). We must define how the parameters of the new state (c′, α_l′, α_r′) are determined from the current state (α_0) and a vector of random numbers.

We use two random numbers, u = (u_1, u_2), in constructing the new state; thus n_{mm′} = 2. We draw u_1 uniformly from the range ]S_s, S_e[ and set c′ = u_1. For the function values, we draw u_2 from the normal distribution N(0, σ·α_0), where σ is a constant that controls how close the new function values are to the original value.
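The two draws of Equation 5.6 can be sketched with Python's standard library. One assumption is ours: σ·α_0 is treated as the standard deviation of the normal distribution, which the notation N(0, σ·α_0) leaves ambiguous.

```python
import random

def draw_u(S_s, S_e, sigma, alpha0, rng=None):
    """Draw the random vector u = (u1, u2) of Equation 5.6:
    u1 ~ U(S_s, S_e) locates the new change point c', and
    u2 ~ N(0, sigma * alpha0) perturbs the function levels
    (sigma * alpha0 taken as the standard deviation; an assumption)."""
    rng = rng or random.Random()
    u1 = rng.uniform(S_s, S_e)
    u2 = rng.gauss(0.0, sigma * alpha0)
    return u1, u2
```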

Accordingly, we have the following function g for determining the new state:

g(α_0, u_1, u_2) = (g_1(α_0, u_1, u_2), g_2(α_0, u_1, u_2), g_3(α_0, u_1, u_2)), where


Fig. 5.3. Changing the number of pieces of a piecewise constant function. When adding a piece, the new change point c′ is inserted between the existing change points c_l and c_r; w_1 and w_2 are the distances of c′ from c_l and c_r, respectively. The function value before the operation is α_0. The new levels α_l′ and α_r′ are determined according to Equation 5.5. The remove operation is carried out similarly in reverse order.

g_1(α_0, u_1, u_2) = α_0 + u_2/w_1 = α_l′,
g_2(α_0, u_1, u_2) = α_0 − u_2/w_2 = α_r′,    (5.5)
g_3(α_0, u_1, u_2) = u_1 = c′,

and

u_1 ∼ U(S_s, S_e),
u_2 ∼ N(0, σ·α_0).    (5.6)

Here w_1 is the distance of the new change point c′ from the previous change point c_l (or from the start point of the observation period if there is no change point before c′), and w_2 is the distance of the new change point from the next change point c_r (or from the end point of the observation period if there is no change point after c′). Thus w_1 = c′ − c_l and w_2 = c_r − c′.
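The split move of Equation 5.5 can be transcribed directly; the final comment records the integral-preserving property that motivates Green's construction, since the u_2 terms cancel.

```python
def split_piece(alpha0, u1, u2, c_l, c_r):
    """Apply g of Equation 5.5: split the piece (c_l, c_r), which has
    level alpha0, at the new change point c' = u1."""
    w1 = u1 - c_l          # distance from the previous change point
    w2 = c_r - u1          # distance to the next change point
    alpha_l = alpha0 + u2 / w1
    alpha_r = alpha0 - u2 / w2
    return alpha_l, alpha_r, u1

# Integral preservation: w1 * alpha_l + w2 * alpha_r
#   = (w1 + w2) * alpha0 + u2 - u2 = (w1 + w2) * alpha0.
```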

Reversibility condition. Now we can write the reversibility condition more explicitly. For the transition into a higher dimension, the following conditions should be met:

1. m′ = m + 1 is proposed.
2. Given the proposal of m′, z′ = g(z, u) is proposed.
3. (m′, z′) is accepted.

Hence, we can write the reversibility condition as follows:

∫_{A_m × U} I_{mm′} p(m) f(z | m) q_add a[(m, z), (m′, z′)] q_{mm′}(z, u) dz du

= ∫_{B_{m′}} I_{m′m} p(m′) f(z′ | m′) q_del a[(m′, z′), (m, z)] dz′,

where A_m ⊆ E_{n_m}, B_{m′} ⊆ E_{n_{m′}}, p(m) is the probability of m, and I_{mm′} and I_{m′m} are the indicator functions I_{mm′} = 1(z ∈ A_m, g(z, u) ∈ B_{m′}) and I_{m′m} = 1(g^{−1}(z′) ∈ A_m).

Now we want to present the right-hand side in terms of the variables z and u instead of z′. Since the function g of Equation 5.5 is a differentiable bijection, we can substitute the variables z = (α_0) and u = (u_1, u_2) for z′ = g(z, u), where the Jacobian |J| of the transformation is

|J| = |∂g(z, u) / ∂(z, u)|.

The reversibility condition is met if

p(m) f(z | m) q_add a[(m, z), (m′, g(z, u))] q_{mm′}(z, u)
= p(m′) f(g(z, u) | m′) q_del a[(m′, g(z, u)), (m, z)] |J|.

Hence the acceptance ratio for inserting a piece is

a[(m, z), (m′, z′)] = min { 1, [p(m′) f(g(z, u) | m′) q_del] / [p(m) f(z | m) q_add q_{mm′}(z, u)] · |J| }.

Here Pr(m) is the prior probability of m being the number of pieces, Pr(c_1, ..., c_{m−1}) is the prior density of the change points c_1, ..., c_{m−1}, and Pr(α_0) is the prior density of the function value α_0. Further, L(z′) and L(z) are the likelihoods in the candidate state and the current state, respectively, and q_add and q_del are the probabilities of proposing to insert a new segment and to delete an existing one. The probability of removing a specific piece is 1/(m + 1), and the density of inserting a specific piece is 1/(S_e − S_s) times the density of u_2 ∼ N(0, σ·α_0).

Finally, (w_1 + w_2)/(w_1 w_2) is the Jacobian of the deterministic transformation of the random variables when changing the dimension of the model. Notice also the exception of the potential initial state where the denominator is zero; then the acceptance rate is 1.
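The Jacobian value (w_1 + w_2)/(w_1 w_2) can be verified numerically; this check is ours, not from the chapter. It differentiates the map (α_0, u_1, u_2) ↦ (α_l′, α_r′, c′) by central finite differences, noting that w_1 and w_2 themselves depend on u_1.

```python
def g(alpha0, u1, u2, c_l, c_r):
    """The split map of Equation 5.5; w1 and w2 depend on u1."""
    w1, w2 = u1 - c_l, c_r - u1
    return (alpha0 + u2 / w1, alpha0 - u2 / w2, u1)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    p, q, r = m[2]
    return a * (e * r - f * q) - b * (d * r - f * p) + c * (d * q - e * p)

def numerical_jacobian(alpha0, u1, u2, c_l, c_r, h=1e-6):
    """Central-difference estimate of |det dg/d(alpha0, u1, u2)|."""
    x = [alpha0, u1, u2]
    cols = []
    for j in range(3):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp = g(xp[0], xp[1], xp[2], c_l, c_r)
        fm = g(xm[0], xm[1], xm[2], c_l, c_r)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(3)])
    # Transpose the columns into rows: J[i][j] = d f_i / d x_j.
    J = [[cols[j][i] for j in range(3)] for i in range(3)]
    return abs(det3(J))

# Analytical value for comparison: (w1 + w2) / (w1 * w2).
```

With c_l = 1, c_r = 9, and u_1 = 4, the distances are w_1 = 3 and w_2 = 5, so the analytical Jacobian is 8/15.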

By reversibility, the acceptance probability for the reverse remove operation is a[(m′, z′), (m, z)] = min(1, 1/a[(m, z), (m′, z′)]).
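The pairing of min(1, r) for the insert move with min(1, 1/r) for the remove move is what makes the move pair reversible: for any ratio r > 0 the two acceptance probabilities satisfy a_add(r)/a_del(r) = r. A minimal illustration:

```python
def accept_add(r):
    """Acceptance probability for inserting a piece, where r is the
    ratio inside the min of the acceptance formula."""
    return min(1.0, r)

def accept_del(r):
    """Acceptance probability of the reverse remove operation:
    min(1, 1/r) with the same ratio r. The case r == 0 corresponds
    to the zero-denominator exception, where the rate is 1."""
    return 1.0 if r == 0.0 else min(1.0, 1.0 / r)

# For any r > 0: accept_add(r) / accept_del(r) == r, which is exactly
# the detailed-balance requirement for the insert/remove move pair.
```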

5.3 Examples

In this section we give some simple examples of applying the framework to biological data. Our focus is on describing the applicability of the method, not on interpretation of the results.