
Finding Piecewise Constant Functions with RJMCMC Methods

Marko Salmenkivi and Heikki Mannila

5.2 Bayesian Approach and MCMC Methods

5.2.1 Finding Piecewise Constant Functions with RJMCMC Methods

We are interested in piecewise constant models with the number of pieces as a model parameter. That is, we want to consider models with different numbers of parameters. In a Bayesian framework this can be done by specifying a hierarchical model, which assigns a prior distribution to the number of pieces.

Figure 5.2 displays a graphical representation of two models: a piecewise constant model with a fixed number of pieces and its extension to variable dimensions, with the number of pieces m as a hyperparameter of the model.

To exploit MCMC methods in the extended model, we must be able to move from one state to another state with a different number of dimensions. In the following we present the reversible jump MCMC (RJMCMC) algorithm, which is a generalization of the Metropolis-Hastings algorithm to state spaces of variable dimension. The form of the presentation is tailored to our purposes, modeling sequential data with piecewise constant representations. For more general conditions and a detailed theoretical description of variable-dimension MCMC, see [155]. The main sources of the presentation are [419] and [155].

To use RJMCMC simulation, we need to specify proposal distributions that update the existing parameter vector. The updates are divided into

90 Data Mining in Bioinformatics

Fig. 5.2. Graphical representation of (left) a model with a fixed number of pieces, here m = 4, with parameters α_1, ..., α_4 and c_1, ..., c_3, and (right) a model with the number of pieces m not fixed, with parameters α_1, ..., α_m and c_1, ..., c_{m−1}. S is sequential data.

two groups: those that maintain the dimension and those that do not. The proposal distributions that maintain the dimension can change the function value in one segment and modify the start or end point of a segment. The operations that modify the dimension can create a new segment by splitting an existing one or merge two adjacent segments into one.
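As a sketch (the function and move names below are our own, not from the chapter), the four update types can be organized so that a split is disallowed when the number of pieces m has reached an assumed upper bound K, and a merge is disallowed when m = 1:

```python
import random

# The four proposal types described above (names are illustrative).
DIM_PRESERVING = ["update_level", "move_change_point"]
DIM_CHANGING = ["split_segment", "merge_segments"]

def valid_moves(m, K):
    """Return the proposal types available in a state with m pieces,
    given an assumed upper bound K on the number of pieces."""
    moves = list(DIM_PRESERVING)
    if m < K:
        moves.append("split_segment")   # increases the dimension
    if m > 1:
        moves.append("merge_segments")  # decreases the dimension
    if m == 1:
        # A single piece has no interior change point to move.
        moves.remove("move_change_point")
    return moves

def propose_move(m, K, rng=None):
    """Pick one of the currently valid proposal types at random."""
    rng = rng or random.Random()
    return rng.choice(valid_moves(m, K))
```

How moves are mixed at the boundaries is an implementation choice; the chapter only requires that each proposal have a matching reverse proposal.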

Let us divide the parameter vector into two components θ = (m, z), where m ∈ {1, 2, ..., K} determines the dimension n_m of the model, and z is the vector of the other parameters of the model, possibly of varying length. Given m, z can take only values in E_{n_m} ⊆ R^{n_m}. To enable the use of reversibility, the following condition has to be met for all pairs z, z′:

n_m + n_{mm′} = n_{m′} + n_{m′m},    (5.2)

where n_{mm′} is the dimension of the vector of random numbers used in determining the candidate state z′ in state z, and n_{m′m} is the dimension of the vector of random numbers used in the reverse operation. This condition ensures that both sides of the reversibility equation have densities on spaces of equal dimension:

f(z | m) q_{mm′}(z, z′) = f(z′ | m′) q_{m′m}(z′, z).    (5.3)

Here f(z | m) is the density of z in the target distribution, given the dimension n_m, and q_{mm′}(z, z′) is the proposal density of proposing z′ in state z, given that a move is proposed from model dimension n_m to n_{m′}.

For our purposes it is enough to consider the situation where n_{mm′} > 0 and n_{m′m} = 0, when m′ > m. That is, we use a stochastic proposal only when moving to a state with higher dimension. In the reverse operation a deterministic proposal is used instead; thus n_{m′m} = 0.

Piecewise constant model. Consider a model consisting of a single piecewise constant function. Using the notation discussed, denote the number of pieces of the piecewise constant function by m; then the other parameters are c_1, ..., c_{m−1}, α_1, ..., α_m, where c_1, ..., c_{m−1} are the change points of the function and α_1, ..., α_m are the function levels. The dimension of the model is n_m = 2m − 1.
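With n_m = 2m − 1, the dimension-matching condition (5.2) can be checked mechanically for the split move, which uses two random numbers (n_{mm′} = 2) and a deterministic reverse move (n_{m′m} = 0):

```python
def n_params(m):
    """Dimension of a piecewise constant model with m pieces:
    m levels plus m - 1 change points, i.e. n_m = 2m - 1."""
    return 2 * m - 1

def dimensions_match(m, n_forward=2, n_reverse=0):
    """Check condition (5.2) for a split from m to m + 1 pieces:
    n_m + n_forward == n_{m+1} + n_reverse."""
    return n_params(m) + n_forward == n_params(m + 1) + n_reverse

# The condition holds for every m: (2m - 1) + 2 == (2(m + 1) - 1) + 0.
```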

We update each change point and each level componentwise. These updates do not change the dimension of the model. For the number of pieces we first choose between increasing or decreasing the value with probabilities q_add and q_del = 1 − q_add, and then propose to insert or delete one piece at a time. Thus,

q(m, m′) =
    q_add        if m′ = m + 1,
    1 − q_add    if m′ = m − 1,        (5.4)
    0            otherwise.

Inserting a piece means adding one change point, which splits one of the existing pieces into two. In addition to the location of the new change point, the function levels around the change point must be determined. We employ a strategy introduced by Green [155], which maintains the integral of the piecewise constant function in the adding and removing operations. This property of the proposal distribution is often useful for achieving good convergence and for covering the target distribution exhaustively in feasible time.
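Equation 5.4 can be transcribed directly. How the boundaries m = 1 and m = K are handled is our assumption here: an out-of-range proposal is simply returned and left for the acceptance step to reject.

```python
import random

def propose_m(m, q_add=0.5, rng=None):
    """Propose a new number of pieces m' according to Equation 5.4.
    Out-of-range values (m' = 0 or m' > K) are not filtered here;
    handling them in the acceptance step is our assumption."""
    rng = rng or random.Random()
    if rng.random() < q_add:
        return m + 1   # propose inserting a piece
    return m - 1       # propose deleting a piece

def q(m, m_new, q_add=0.5):
    """Proposal probability q(m, m') of Equation 5.4."""
    if m_new == m + 1:
        return q_add
    if m_new == m - 1:
        return 1.0 - q_add
    return 0.0
```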

Denote the function value in the piece to be split by α_0, the candidate for the new change point by c′, the candidate for the function value to the left of c′ by α_l′, and the candidate for the function value to the right of c′ by α_r′ (Figure 5.3). We must define how the parameters of the new state (c′, α_l′, α_r′) are determined from the current state (α_0) and a vector of random numbers.

We use two random numbers, u = (u_1, u_2), in constructing the new state; thus n_{mm′} = 2. We draw u_1 uniformly from the range ]S_s, S_e[ and set c′ = u_1. For the function values, we draw u_2 from the normal distribution N(0, σ·α_0), where σ is a constant that controls how close the new function values are to the original value.
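The two draws of Equation 5.6 can be sketched with Python's standard library. One assumption is ours: σ·α_0 is treated as the standard deviation of the normal distribution, which the notation N(0, σ·α_0) leaves ambiguous.

```python
import random

def draw_u(S_s, S_e, sigma, alpha0, rng=None):
    """Draw the random vector u = (u1, u2) of Equation 5.6:
    u1 ~ U(S_s, S_e) locates the new change point c', and
    u2 ~ N(0, sigma * alpha0) perturbs the function levels
    (sigma * alpha0 taken as the standard deviation; an assumption)."""
    rng = rng or random.Random()
    u1 = rng.uniform(S_s, S_e)
    u2 = rng.gauss(0.0, sigma * alpha0)
    return u1, u2
```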

Accordingly, we have the following function g for determining the new state:

g(α_0, u_1, u_2) = (g_1(α_0, u_1, u_2), g_2(α_0, u_1, u_2), g_3(α_0, u_1, u_2)), where


Fig. 5.3. Changing the number of pieces of a piecewise constant function. When adding a piece, the new change point c′ is inserted between the existing change points c_l and c_r; w_1 and w_2 are the distances of c′ from c_l and c_r, respectively. The function value before the operation is α_0. The new levels α_l′ and α_r′ are determined according to Equation 5.5. The remove operation is carried out similarly in reverse order.

g_1(α_0, u_1, u_2) = α_0 + u_2/w_1 = α_l′,
g_2(α_0, u_1, u_2) = α_0 − u_2/w_2 = α_r′,    (5.5)
g_3(α_0, u_1, u_2) = u_1 = c′,

and

u_1 ∼ U(S_s, S_e),
u_2 ∼ N(0, σ·α_0).    (5.6)

Here w_1 is the distance of the new change point c′ from the previous change point c_l (or from the start point of the observation period if there is no change point before c′), and w_2 is the distance of the new change point from the next change point c_r (or from the end point of the observation period if there is no change point after c′). Thus w_1 = c′ − c_l and w_2 = c_r − c′.
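The split move of Equation 5.5 can be transcribed directly; the final comment records the integral-preserving property that motivates Green's construction, since the u_2 terms cancel.

```python
def split_piece(alpha0, u1, u2, c_l, c_r):
    """Apply g of Equation 5.5: split the piece (c_l, c_r), which has
    level alpha0, at the new change point c' = u1."""
    w1 = u1 - c_l          # distance from the previous change point
    w2 = c_r - u1          # distance to the next change point
    alpha_l = alpha0 + u2 / w1
    alpha_r = alpha0 - u2 / w2
    return alpha_l, alpha_r, u1

# Integral preservation: w1 * alpha_l + w2 * alpha_r
#   = (w1 + w2) * alpha0 + u2 - u2 = (w1 + w2) * alpha0.
```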

Reversibility condition. Now we can write the reversibility condition more explicitly. For the transition into a higher dimension, the following conditions should be met:

1. m′ = m + 1 is proposed.
2. Given the proposal of m′, z′ = g(z, u) is proposed.
3. (m′, z′) is accepted.

Hence, we can write the reversibility condition as follows:

∫_{A_m × U} I_{mm′} p(m) f(z | m) q_add a[(m, z), (m′, z′)] q_{mm′}(z, u) dz du

= ∫_{B_{m′}} I_{m′m} p(m′) f(z′ | m′) q_del a[(m′, z′), (m, z)] dz′,

where A_m ⊆ E_{n_m}, B_{m′} ⊆ E_{n_{m′}}, p(m) is the probability of m, and I_{mm′} and I_{m′m} are the indicator functions I_{mm′} = 1(z ∈ A_m, g(z, u) ∈ B_{m′}) and I_{m′m} = 1(g^{−1}(z′) ∈ A_m).

Now we want to present the right-hand side in terms of the variables z and u instead of z′. Since the function g of Equation 5.5 is a differentiable bijection, we can substitute the variables z = (α_0) and u = (u_1, u_2) for z′ = g(z, u), where the Jacobian |J| of the transformation is

|J| = |∂g(z, u) / ∂(z, u)|.

The reversibility condition is met if

p(m) f(z | m) q_add a[(m, z), (m′, g(z, u))] q_{mm′}(z, u)
= p(m′) f(g(z, u) | m′) q_del a[(m′, g(z, u)), (m, z)] |J|.

Hence the acceptance ratio for inserting a piece is

a[(m, z), (m′, z′)] = min { 1, [p(m′) f(g(z, u) | m′) q_del] / [p(m) f(z | m) q_add q_{mm′}(z, u)] · |J| }.

Here Pr(m) is the prior probability of m being the number of pieces, Pr(c_1, ..., c_{m−1}) is the prior density of the change points c_1, ..., c_{m−1}, and Pr(α_0) is the prior density of the function value α_0. Further, L(z′) and L(z) are the likelihoods in the candidate state and the current state, respectively, and q_add and q_del are the probabilities of proposing to insert a new segment and to delete an existing one. The probability of removing a specific piece is 1/(m + 1), and the density of inserting a specific piece is 1/(S_e − S_s) times the density of u_2 ∼ N(0, σ·α_0).

Finally, (w_1 + w_2)/(w_1 w_2) is the Jacobian of the deterministic transformation of the random variables when changing the dimension of the model. Notice also the exception of the potential initial state where the denominator is zero; then the acceptance rate is 1.
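The Jacobian value (w_1 + w_2)/(w_1 w_2) can be verified numerically; this check is ours, not from the chapter. It differentiates the map (α_0, u_1, u_2) ↦ (α_l′, α_r′, c′) by central finite differences, noting that w_1 and w_2 themselves depend on u_1.

```python
def g(alpha0, u1, u2, c_l, c_r):
    """The split map of Equation 5.5; w1 and w2 depend on u1."""
    w1, w2 = u1 - c_l, c_r - u1
    return (alpha0 + u2 / w1, alpha0 - u2 / w2, u1)

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    a, b, c = m[0]
    d, e, f = m[1]
    p, q, r = m[2]
    return a * (e * r - f * q) - b * (d * r - f * p) + c * (d * q - e * p)

def numerical_jacobian(alpha0, u1, u2, c_l, c_r, h=1e-6):
    """Central-difference estimate of |det dg/d(alpha0, u1, u2)|."""
    x = [alpha0, u1, u2]
    cols = []
    for j in range(3):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp = g(xp[0], xp[1], xp[2], c_l, c_r)
        fm = g(xm[0], xm[1], xm[2], c_l, c_r)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(3)])
    # Transpose the columns into rows: J[i][j] = d f_i / d x_j.
    J = [[cols[j][i] for j in range(3)] for i in range(3)]
    return abs(det3(J))

# Analytical value for comparison: (w1 + w2) / (w1 * w2).
```

With c_l = 1, c_r = 9, and u_1 = 4, the distances are w_1 = 3 and w_2 = 5, so the analytical Jacobian is 8/15.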

By reversibility, the acceptance probability for the reverse remove operation is a[(m′, z′), (m, z)] = min(1, 1/a[(m, z), (m′, z′)]).
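The pairing of min(1, r) for the insert move with min(1, 1/r) for the remove move is what makes the move pair reversible: for any ratio r > 0 the two acceptance probabilities satisfy a_add(r)/a_del(r) = r. A minimal illustration:

```python
def accept_add(r):
    """Acceptance probability for inserting a piece, where r is the
    ratio inside the min of the acceptance formula."""
    return min(1.0, r)

def accept_del(r):
    """Acceptance probability of the reverse remove operation:
    min(1, 1/r) with the same ratio r. The case r == 0 corresponds
    to the zero-denominator exception, where the rate is 1."""
    return 1.0 if r == 0.0 else min(1.0, 1.0 / r)

# For any r > 0: accept_add(r) / accept_del(r) == r, which is exactly
# the detailed-balance requirement for the insert/remove move pair.
```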

5.3 Examples

In this section we give some simple examples of applying the framework to biological data. Our focus is on describing the applicability of the method, not on interpretation of the results.