A HYBRID METROPOLIS-HASTINGS CHAIN

(1)

on the occasion of his 75th birthday

A HYBRID METROPOLIS-HASTINGS CHAIN

UDREA P ˘AUN

We consider a hybrid Metropolis-Hastings chain on a known finite state space; its design is based on theGmethod (because this method can perform some interesting things, see Sections 1 and 2 and also [17]) and its analysis (the convergence rate) is based on theGmethod and ergodicity coefficients. Finally, we give two special cases of the hybrid chain, namely, when the state space isSn,the set of permutations of ordern (e.g., the Mallows model is defined on this space), and when this is{0,1, . . . , h}ⁿ,h, n≥1 (e.g., the Ising model is defined on{0,1}ⁿ).

AMS 2010 Subject Classification: 60J10, 60J20, 65C05, 68Q25.

Key words: ∆-ergodic theory, Gmethod, ergodicity coefficient, Metropolis-Has- tings chain, hybrid Metropolis-Hastings chain, convergence rate.

1. G∆1,∆2 AND G∆1,∆2 IN ACTION

To design and analyze our hybrid Metropolis-Hastings chain we need to extend certain notions and results from the ∆-ergodic theory in a more general framework, namely, that of nonnegative matrices, and, then, to give, at least for the ∆-ergodic theory, other results. (See mainly [17] for this section; see [13–17] and references therein for the general ∆-ergodic theory.)

Set

Par(E) ={∆|∆ is a partition ofE},

where E is a nonempty set. We shall agree that the partitions do not contain the empty set.

Definition 1.1. Let ∆₁,∆₂∈Par(E).We say that ∆₁ is finer than ∆₂ if

∀V ∈∆1,∃W ∈∆2 such that V ⊆W.

Write ∆₁ ∆₂ when ∆₁ is finer than ∆₂. Set

hmi={1,2, . . . , m}, m≥1,

REV. ROUMAINE MATH. PURES APPL.,56(2011),3, 207–228

(2)

Nm,n ={F |F is a nonnegativem×nmatrix}, Sm,n ={F |F is a stochastic m×nmatrix},

N_n=N_n,n and S_n=S_n,n.

LetF = (F_ij) ∈N_m,n. (The entries of a matrix Z will be denotedZ_ij.) Let ∅ 6=U ⊆ hmi and ∅ 6=V ⊆ hni. Define the matrices

F_U = (F_ij)_{i∈U, j∈hni}, F^V = (F_ij)_{i∈hmi, j∈V}, and F_U^V = (F_ij)i∈U, j∈V. Definition 1.2. LetP ∈Nm,n. We say thatP is a generalized stochastic matrix if∃a≥0,∃Q∈S_m,n such thatP =aQ.

Definition 1.3 ([17]). LetP ∈Nm,n. Let ∆∈Par(hmi) and Σ∈Par(hni).

We say that P is a [∆]-stable matrix on Σ if P_K^L is a generalized stochastic matrix, ∀K ∈ ∆, ∀L ∈Σ. In particular, a [∆]-stable matrix on ({i})_i∈hni is called [∆]-stable for short (({i})_i∈hni:= ({1},{2}, . . . ,{n})).

Definition 1.4 ([17]). LetP ∈Nm,n. Let ∆∈Par(hmi) and Σ∈Par(hni).

We say thatP is a ∆-stable matrix on Σ if ∆ is the least fine partition for which P is a [∆]-stable matrix on Σ. In particular, a ∆-stable matrix on ({i})_i∈hni is called ∆-stable while a (hmi)-stable matrix on Σ is called stable on Σ for short. A stable matrix on ({i})_i∈hni is called stable for short.

Let ∆₁∈Par(hmi) and ∆₂∈Par(hni). Define

G_∆₁_,∆₂ ={P |P ∈S_m,n and P is a [∆₁]-stable matrix on ∆₂} (see [17] and, for an equivalent definition, [12]),

G∆1,∆2 ={P |P ∈Nm,n andP is a [∆1]-stable matrix on ∆2}, and, if m=n,

G_∆=G_∆,∆

(see [11] for an equivalent definition) and G_∆=G_∆,∆.

Let P ∈ G∆1,∆2. Let K ∈ ∆1 and L ∈ ∆2. Then ∃a_K,L ≥ 0, ∃Q_K,L ∈ S_|K|,|L| such thatP_K^L=a_K,LQ_K,L.Set

P⁻⁺= (P_KL⁻⁺)K∈∆₁, L∈∆₂, P_KL⁻⁺=a_K,L, ∀K ∈∆₁, ∀L∈∆₂

(see also [17]). If confusion can arise we write P^−+(∆¹^,∆²⁾ instead of P⁻⁺. In this article, when we work with the operator (·)⁻⁺ = (·)⁻⁺(∆₁,∆₂) we suppose, for labelling the rows and columns of matrices, that ∆1 and ∆2 are

(3)

ordered sets, even if we omit to precise this. To give an example, let P =





2 3 7 0 5 0 6 1 1 0 0 2



.

Obviously, P ∈ G_∆₁_,∆₂, where ∆₁ = ({1,2},{3}) and ∆₂ = ({1,2},{3,4}).

Further, we have

P⁻⁺=P^−+(∆¹^,∆²⁾=

5 7 1 2

.

({1,2} and {3} are the first and the second element of ∆₁, respectively; on the basis of this order, the first and the second row of P⁻⁺ are labelled{1,2}

and {3}, respectively. The columns ofP⁻⁺ are labelled similarly.)

The next result is the main one of this section; it is a generalization of Theorem 2.3 in [17].

Theorem 1.5. Let P ∈G_∆₁_,∆₂ ⊆N_m,n and Q∈G_∆₂_,∆₃ ⊆N_n,p. Then (i)P Q∈G∆1,∆3 ⊆Nm,p;

(ii) (P Q)⁻⁺=P⁻⁺Q⁻⁺.

Proof. (i) Let P ∈G_∆₁_,∆₂ and Q ∈G_∆₂_,∆₃. Then ∀K ∈∆₁,∀U ∈∆₂,

∀L∈ ∆3, ∃a_K,U ≥0, ∃A_K,U ∈ S_|K|,|U|, ∃b_U,L ≥0, ∃B_U,L∈ S_|U|,|L| such that P_K^U =aK,UAK,U and Q^L_U =bU,LBU,L.

LetK ∈∆₁ and L∈∆₃.Leti∈K.We have X

l∈L

(P Q)_il=X

l∈L

X

k∈hni

P_ikQ_kl= X

k∈hni

P_ikX

l∈L

Q_kl= X

W∈∆2

X

k∈W

P_ikX

l∈L

Q_kl=

= X

W∈∆2

X

k∈W

PikbW,L= X

W∈∆2

bW,L

X

k∈W

Pik= X

W∈∆2

aK,WbW,L.

It follows that P

l∈L

(P Q)_il only depends on constants a_K,W, b_W,L, W ∈ ∆₂,

∀i∈K.Therefore, P Q∈G∆1,∆3. (ii) See the proof of (i).

In this article, a vector x is a row vector and x⁰ denotes its transpose.

Set e=e(n) = (1,1, . . . ,1)∈Rⁿ,∀n≥1.

The next result is a generalization of Theorem 2.10 in [17]; it is another main result of this section.

Theorem 1.6. Let P₁ ∈ G_(hm₁_i),∆₂ ⊆N_m₁_,m₂, P₂ ∈G_∆₂_,∆₃ ⊆N_m₂_,m₃, . . . , Pn−1∈G_∆_n−1_,∆_n ⊆N_m_n−1_,m_n, P_n∈G_∆_n_,({i})_i∈h

mn+1i ⊆N_m_n_,m_n+1. Then (i)P₁P₂. . . P_n is a stable matrix;

(ii) π = P₁⁻⁺P₂⁻⁺. . . P_n⁻⁺, where e⁰π := P1P2. . . Pn (i.e., π = (P1P2

. . . P_n)_{i} for a fixed i).

(4)

Proof. (i) By Theorem 1.5(i) and induction we have P1P2. . . Pn ∈ G(hm₁i),({i})_i∈h_mn+1_i. Therefore, P1P2. . . Pn is a stable matrix. (A matrix P ∈ Nq,r is a stable matrix if and only ifP ∈G(hqi),({i})_i∈hri.)

(ii) By (i), π = (P₁P₂. . . P_n)⁻⁺. Now, (ii) follows by Theorem 1.5(ii) and induction.

LetP ∈N_m,n.Define

α(P) = min

1≤i,j≤m n

X

k=1

min(Pik, Pjk)

(if P ∈ S_m,n, then α(P) is called the Dobrushin ergodicity coefficient of P ([4]; see, e.g., also [6, p. 56])) and

α(P) = 1 2 max

1≤i,j≤m n

X

k=1

|P_ik−P_jk|.

Remark 1.7 (see, e.g., [7, p. 143]). IfP ∈Sm,n, thenα(P) = 1−α(P).

Theorem 1.6(i) can be used, e.g., to see if a finite Markov chain has a finite convergence time (see also [17]). Below we give other applications (Theo- rem 1.8 and Remark 1.9) – the best results of this section – of G∆1,∆2 and of G∆1,∆2; they give bounds forα(P1P2. . . Pn),whereP1, P2, . . . , Pnare nonnegative matrices (an interesting case is that when these matrices are sparse large stochastic). When we study products of nonnegative matrices using G_∆₁_,∆₂ and/or G_∆₁_,∆₂ we shall refer this as the Gmethod.

Theorem 1.8. Let P1 ∈Nm1,m2, P2 ∈Nm2,m3, . . . , Pn∈Nmn,mn+1.Let

∆₁ = (hm₁i), ∆₂ ∈ Par(hm₂i), . . . ,∆_n ∈ Par(hm_ni), ∆_n+1 = ({i})_i∈hm_n+1_i. Consider the matrices L_l= ((L_l)_{V W})_V∈∆_l, W∈∆_l+1((L_l)_{V W} is the entry (V, W) of matrix L_l),U_l= ((U_l)V W)V∈∆_l, W∈∆_l+1,l∈ hni, where

(L_l)_{V W} := min

i∈V

X

j∈W

(P_l)_ijand (U_l)_{V W} := max

i∈V

X

j∈W

(P_l)_ij,

∀l∈ hni,∀V ∈∆_l,∀W ∈∆_l+1. Then X

K∈∆n+1

(L1L2. . . Ln)hm₁iK ≤α(P1P2. . . Pn)≤ X

K∈∆n+1

(U1U2. . . Un)hm₁iK. (Since L1L2. . . Ln and U1U2. . . Un are 1× |hm_n+1i| matrices, they can be thought as being row vectors, but above we used and below we shall use the matrix notation for entries instead of the vector one. E.g., above the matrix notation (L₁L₂. . . L_n)_hm₁_iK was used instead of the vector one (L₁L₂. . . L_n)_K because, in this article, the notation A_U, where A ∈ N_p,q and ∅ 6= U ⊆ hpi, means something different.)

(5)

Proof. Consider the matricesE1, F1 ∈G∆1,∆2, E2,F2 ∈G∆2,∆3, . . . , En, F_n ∈ G_∆_n_,∆_n+1 such that E_l⁻⁺ = L_l and F_l⁻⁺ = U_l, ∀l ∈ hni. Since (see Theorem 1.6)

(E₁E₂. . . E_n)_{i}= (E₁E₂. . . E_n)⁻⁺=E₁⁻⁺E₂⁻⁺· · ·E_n⁻⁺=L₁L₂. . . L_n and

(F₁F₂. . . F_n)_{i}= (F₁F₂. . . F_n)⁻⁺=F₁⁻⁺F₂⁻⁺· · ·F_n⁻⁺ =U₁U₂. . . U_n,

∀i∈ hm₁i,we have

(E1E2. . . En)ij = (L1L2. . . Ln)_hm₁_i{j}

and

(F₁F₂. . . F_n)_ij = (U₁U₂. . . U_n)_hm₁_i{j},

∀i∈ hm₁i,∀j ∈ hm_n+1i.

We choose the matrices E_l, F_l, l∈ hni,such that

(E_l)ij ≤(P_l)ij ≤(F_l)ij, ∀l∈ hni, ∀i∈ hm_li, ∀j ∈ hm_l+1i;

this is possible as follows. Let l∈ hni.LetV ∈∆_l and W ∈∆_l+1. Leti∈V.

Case 1. (P_l)_ij = 0, ∀j ∈ W. No problem. (We must take (E_l)_ij = 0,

∀j∈W,and we can take, e.g., (Fl)ij = 0, ∀j∈W.)

Case 2. ∃j ∈ W such that (P_l)_ij > 0. First, we construct the entries (El)ij, j ∈ W. We take (El)ij = 0, ∀j ∈ W, if (Ll)V W = 0. (Recall that E⁻⁺_l = L_l and (L_l)_{V W} = 0 imply (E_l)ij = 0, ∀j ∈ W.) Suppose, now, that (L_l)_{V W} > 0. Let (P_l)_ij₁,(P_l)_ij₂, . . . ,(P_l)_ij_k be all the nonnegative entries of (Pl)^W_{i}.Suppose that (Pl)ij1 ≤(Pl)ij2 ≤ · · · ≤(Pl)ijk.We know thatE_l⁻⁺=Ll

and (L_l)_{V W} ≤ P^k

t=1

(P_l)_ij_t. If (L_l)_{V W} ≤ (P_l)_ij_k, we take (E_l)_ij_k = (L_l)_{V W} and (E_l)_ij = 0, ∀j ∈ W, j 6= j_k. Otherwise, if (L_l)_{V W} ≤ (P_l)_ij_k−1 + (P_l)_ij_k, we take (E_l)ij_k = (P_l)ij_k, (E_l)ijk−1 = (L_l)V W −(E_l)ij_k,and (E_l)ij = 0,∀j ∈W, j 6= jk−1, j_k. Otherwise, if (L_l)_{V W} ≤ (P_l)_ij_k−2 + (P_l)_ij_k−1 + (P_l)_ij_k, we take (E_l)_ij_k = (P_l)_ij_k, (E_l)_ij_k−1 = (P_l)_ij_k−1, (E_l)_ij_k−2 = (L_l)_{V W}−(E_l)_ij_k−1−(E_l)_ij_k, and (El)ij = 0, ∀j ∈ W, j 6= jk−2, jk−1, jk. Etc. Second, we construct the entries (F_l)_ij, j ∈ W. Using the above notation, we take (F_l)_ij_k = (P_l)_ij_k+

(Ul)V W −

k

P

t=1

(Pl)ijt

and (Fl)ij = (Pl)ij,∀j∈W,j6=jk. Finally, by

(L1L2. . . Ln)_hm₁_i{j}= (E1E2. . . En)ij ≤(P1P2. . . Pn)ij ≤

≤(F₁F₂. . . F_n)_ij = (U₁U₂. . . U_n)_hm₁_i{j}, ∀i∈ hm₁i, ∀j∈ hm_n+1i,

(6)

we have X

K∈∆n+1

(L₁L₂. . . L_n)_hm₁_iK ≤α(P₁P₂. . . P_n)≤ X

K∈∆n+1

(U₁U₂. . . U_n)_hm₁_iK. (We used the fact that α(P)≤α(Q) if P ≤Q (P, Q∈Np,q).)

LetP ∈N_m,n. Let ∆∈Par(hmi) and Σ∈Par(hni).Define P⁺= (P_iJ⁺)i∈hmi, J∈Σ, P_iJ⁺ =X

k∈J

Pik, ∀i∈ hmi, ∀J ∈Σ

(see also [15]). If confusion can arise we write P^+Σ instead of P⁺. In this article, when we work with the operator (·)⁺ = (·)⁺(Σ) we suppose, for labelling the columns of matrices, that Σ is an ordered set, even if we omit to precise this.

Remark 1.9. (a) By Theorem 1.8 we have b=b(P1, P2, . . . , Pn) := max

∆1=(hm1i),

∆2∈Par(hm₂i),...,

∆n∈Par(hmni)

∆n+1=({i})_i∈h_mn+1_i

X

K∈∆n+1

(L1L2. . . Ln)hm₁iK ≤

≤α(P1P2. . . Pn)≤

≤ min

∆1=(hm₁i),

∆2∈Par(hm2i),...,

∆n∈Par(hm_ni)

∆n+1=({i})_i∈h_mn+1_i

X

K∈∆_n+1

(U₁U₂. . . U_n)_hm₁_iK:=b(P₁, P₂, . . . , P_n) =b.

(b) If P1, P2, . . . , Pn are stochastic matrices, then b ≥ 1 while if P1, P2, . . . , Pn are substochastic matrices, then it is possible that b be smaller or equal to 1. Further, as to the products of stochastic matrices, since α(P)≤1,

∀P ∈ Sm,n, it follows that the first inequality from Theorem 1.8 (also, the first inequality from (a)) is only interesting. This inequality can be even an equation in some special cases. E.g., let

P =





3 4

1 4 0 0 ³₄ ¹₄ 0 0 1



.

If we take ∆₁= (h3i), ∆₂ = ({1},{2,3}),and ∆₃= ({i})_i∈h3i = ({1},{2},{3}), then

L1L2 = 0 ¹₄ ₃

4 1 4 0 0 0 ¹₄

= 0 0 ₁₆¹ .

(7)

By Theorem 1.8, 0 + 0 + ₁₆¹ = ₁₆¹ ≤α(P²).On the other hand, since

P² =





9 16

6 16

1 16

0 ₁₆⁹ ₁₆⁷

0 0 1



,

we obtainα(P²) =₁₆¹ by direct computation. To give other examples, we can, e.g., use some examples from [17].

(c) Let P₁ ∈ N_m₁_,m₂, P₂ ∈ N_m₂_,m₃, . . . , P_n ∈ N_m_n_,m_n+1. Let ∆₁ ∈ Par(hm₁i),∆2,∆⁰₂ ∈Par(hm₂i), . . . ,∆n,∆⁰_n∈Par(hm_ni),∆n+1∈Par(hm_n+1i).

By the proof of Theorem 1.8 there are matrices E1, F1 ∈ G_∆₁_,∆₂, E2, F2 ∈ G_∆⁰

2,∆3, . . . , En, Fn∈G_∆⁰_n_,∆_n+1 with (E_l)⁻⁺_KL= min

i∈K

X

j∈L

(P_l)ij and (F_l)⁻⁺_KL = max

i∈K

X

j∈L

(P_l)ij,

∀l∈ hni, ∀K ∈∆⁰_l, ∀L∈∆_l+1,where ∆⁰₁ := ∆₁,such that E_l≤P_l≤F_l, ∀l∈ hni.

It follows that

E₁E₂. . . E_n≤P₁P₂. . . P_n≤F₁F₂. . . F_n and, therefore,

α(E₁E₂. . . E_n)≤α(P₁P₂. . . P_n)≤α(F₁F₂. . . F_n).

Obviously, the conclusion of Theorem 1.8 is a special case of the latter sequence of above inequalities, namely, when E1E2. . . En and F1F2. . . Fn are stable matrices.

(d) By (c) we have, in particular,

P₁E₂. . . E_n≤P₁P₂. . . P_n≤P₁F₂. . . F_n and, therefore,

α(P₁E₂. . . E_n)≤α(P₁P₂. . . P_n)≤α(P₁F₂. . . F_n).

Suppose that ∆₂,∆₃, . . . ,∆_n+1 and E₂, F₂, E₃, F₃, . . . , E_n, F_n are as in Theo- rem 1.8 and its proof, respectively. Further (see Theorem 1.6 and the proof of

(8)

Theorem 1.8),

P1E2. . . En=







(P1)_{1}E2. . . En

(P1)_{2}E2. . . En

...

(P₁)_{m₁_}E₂. . . E_n







=







((P₁)_{1}E₂. . . E_n)^{−+(({1}),∆}ⁿ⁺¹⁾ ((P1){2}E2. . . En)^{−+(({2}),∆}ⁿ⁺¹⁾

...

((P₁)_{m₁_}E₂. . . E_n)^−+(({m¹^}),∆ⁿ⁺¹⁾







=







((P₁)_{1})^{−+(({1}),∆}²⁾E₂^−+(∆²^,∆³⁾· · ·E_n^−+(∆ⁿ^,∆ⁿ⁺¹⁾ ((P₁)_{2})^{−+(({2}),∆}²⁾E₂^−+(∆²^,∆³⁾· · ·E_n^−+(∆ⁿ^,∆ⁿ⁺¹⁾

...

((P1){m₁})^−+(({m¹^}),∆²⁾E₂^−+(∆²^,∆³⁾· · ·En^−+(∆ⁿ^,∆ⁿ⁺¹⁾







=







((P1){1})^+∆²L2. . . Ln

((P1)_{2})^+∆²L2. . . Ln

...

((P1)_{m₁_})^+∆²L2. . . Ln







=P₁^+∆²L₂. . . L_n.

We also have

P1F2. . . Fn=P₁^+∆²U2. . . Un. Finally, we obtain

α(P₁^+∆²L₂. . . L_n)≤α(P₁P₂. . . P_n)≤α(P₁^+∆²U₂. . . U_n).

Obviously, the bounds α(P₁^+∆²L₂. . . L_n) and α(P₁^+∆²U₂. . . U_n) are better than α(E1E2. . . En) and α(F1F2. . . Fn) from (c), respectively.

Although below we do not give applications of Remark 1.9, this could be used to improve the convergence rate of our hybrid Metropolis-Hastings chain from the next section.

2. OUR HYBRID METROPOLIS-HASTINGS CHAIN We design a hybrid Metropolis-Hastings chain on a known finite state space based on the G method because this method can perform some interesting things, such as

1) to see if a finite Markov chain has a finite convergence time (see [17]

and also Section 1);

2) to see if a product of stochastic (more generally, nonnegative) matrices is positive (see, e.g., Theorem 2.3 below; see also Theorem 1.6 (we can replace the stochastic (more generally, nonnegative) matrices with their incidence matrices));

(9)

3) to design positive products of stochastic matrices, the latter having certain given properties (see, e.g., this section);

4) to give bounds for the ergodicity coefficientsα andα of the products of stochastic matrices

(see also [17]) for other applications). The analysis of our hybrid Metropolis- Hastings chain is based on the Gmethod and ergodicity coefficients.

Definition 2.1 (see, e.g., [20, p. 80]). LetP ∈N_m,n.

(a) We say thatP is arow-allowable matrix if it has at least one positive entry in each row.

(b) We say that P is a column-allowable matrix if it has at least one positive entry in each column.

Below we also use notation from Theorem 1.8 and its proof (see also Remark 1.9). Let S=hri.

Theorem 2.2. Let P₁, P₂, . . . , P_t∈S_r.Let ∆₁,∆₂, . . . ,∆_t+1 ∈Par(S),

∆1 = (S), ∆t+1 = ({i})i∈S. If Ll (see Theorem 1.8) is a column-allowable matrix, ∀l∈ hti,then P₁P₂. . . P_t>0.

Proof. Obviously,L₁>0.It follows, by induction, that L₁L₂. . . L_t>0.

Since (see the proof of Theorem 1.8)

(P₁P₂. . . P_t)_{i} ≥(E₁E₂. . . E_t)_{i} =L₁L₂. . . L_t>0, ∀i∈S, we have P1P2. . . Pt>0.

LetP ∈Nm,n. Define

P = (Pij)∈Nm,n, Pij =

1 ifP_ij >0, 0 ifP_ij = 0,

∀i∈ hmi,∀j∈ hni.We callP theincidence matrix of P (see, e.g., [6, p. 222]).

Let π = (πi)i∈S = (π1, π2, . . . , πr) be a probability distribution on S.

One way to sample approximately from S is by means of the well-known Metropolis-Hastings chain ([10] and [5]). Let Q∈Sr be an irreducible matrix such that Qis a symmetric matrix. Define

P = (Pij)∈Sr, Pij =











0 ifj 6=iand Q_ij = 0, Qijmin(1,^π_π^j^Q^ji

iQij) ifj 6=iand Qij >0, 1−P

k6=i

Pik ifj =i.

Since πP =π (πiPij =πjPji,∀i, j ∈ S,implies πP =π), we have Pⁿ →e⁰π as n → ∞.The Markov chain with transition matrix P is called Metropolis- Hastings (see also [2–3] and [18] for some historical notes, open problems, and results on this chain).

Further, we define our hybrid Metropolis-Hastings chain.

(10)

Let ∆1,∆2, . . . ,∆t+1 ∈ Par(S) with ∆1 = (S) ∆2 · · · ∆t+1 = ({i})_i∈S. (We set ∆∆⁰ if ∆⁰ ∆ and ∆⁰ 6= ∆, where ∆,∆⁰ ∈ Par(E), see Section 1.) LetQ1, Q2, . . . , Qt∈Sr such that

(C1)Q₁, Q₂, . . . , Q_tare symmetric matrices;

(C2) (Q_l)^L_K = 0, ∀l ∈ hti − {1}, ∀K, L ∈ ∆_l, K 6= L (this assumption implies thatQ2, Q3, . . . , Qt are block diagonal matrices);

(C3) (Q_l)^U_K is a row-allowable matrix, ∀l ∈ hti, ∀K ∈ ∆_l, ∀U ∈ ∆_l+1, U ⊆K.

Although Q_l, l ∈ hti, are not irreducible matrices if l ≥ 2, we define the matrices Pl,l∈ hti, as in the Metropolis-Hastings case, namely,

P_l= ((P_l)_ij)∈S_r, (P_l)_ij =











0 ifj 6=iand (Q_l)_ij = 0, (Ql)ijmin(1,^π_π^j^(Q^l⁾^ji

i(Ql)ij) ifj 6=iand (Ql)ij >0, 1−P

k6=i

(Pl)ik ifj =i,

∀l∈ hti.Set P =P₁P₂. . . P_t.

Theorem 2.3. Concerning P above we have πP =π and P >0.

Proof. Since πi(P_l)ij = πj(P_l)ji, ∀l ∈ hti, ∀i, j ∈ S, we have πP_l = π,

∀l ∈ hti. Further, πP_l = π, ∀l ∈ hti, implies πP = π. By Theorem 2.2, P =P1P2. . . Pt>0.

By Theorem 2.3, Pⁿ → e⁰π as n → ∞. P determines a Markov chain;

we call this chain the hybrid Metropolis-Hastings chain. In particular, we call this chain the hybrid Metropolis chain when Q1, Q2, . . . , Qt are symmetric matrices.

We need the next result for the analysis of hybrid Metropolis-Hastings chain.

Theorem 2.4. (A less or more known result.) Let P ∈Sr be an ape- riodic irreducible matrix. Consider a Markov chain with transition matrix P and limit probability distribution π. Let p_n be the probability distribution of chain at time n,∀n≥0.Then

kp_n−πk₁ ≤2α(Pⁿ).

Proof. (The proof is less or more known.) Letµandν be two probability distributions on hmiand Q∈S_m,n. It is known that

kµQ−νQk₁≤ kµ−νk₁α(Q)

(see, e.g., [7, p. 147]). Using the above result,pn=p0Pⁿ,andπP =π, we have kp_n−πk₁=kp₀Pⁿ−πPⁿk₁≤ kp₀−πk₁α(Pⁿ)≤2α(Pⁿ).

(11)

Now, bounds forkp_n−πk₁ of the hybrid Metropolis-Hastings chain can follow from Theorems 1.8 and 2.4 and Remark 1.9. This analysis can be reali- zed for each hybrid chain or for certain collections of hybrid chains. Obviously, it is necessary that t be as small as possible.

Suppose that ∆_l = K₁^(l), K₂^(l), . . . , Ku^(l)_l

, ∀l ∈ ht+ 1i. Below we consider a case which can be analyzed more easily than the others, namely, that satisfying, moreover, the conditions:

(c1)|K₁^(l)|=|K₂^(l)|=· · ·=|K_u^(l)_l|,∀l∈ ht+ 1i withu_l ≥2;

(c2) r=r₁r₂. . . r_t withr₁r₂. . . r_l =|∆_l+1|,∀l∈ ht−1i, and r_t=|K₁^(t)| (this is compatible with ∆1∆2 · · · ∆t+1);

(c3) (c3.1)Q_l is a symmetric matrix such that (c3.2) (Q_l)ii>0,∀i∈S, and (Q_l)_i₁_j₁ = (Q_l)_i₂_j₂, ∀i₁, i₂, j₁, j₂ ∈ S with i₁ 6= j₁, i₂ 6= j₂, and (Q_l)_i₁_j₁, (Q_l)i2j2 > 0, ∀l ∈ hti ((c3.2) says, to put it otherwise, that all the positive entries of Q_l, excepting the entries (Q_l)_ii,∀i∈S,are equal,∀l∈ hti);

(c4) (Q_l)^U_K has in each row just one positive entry, ∀l ∈ hti, ∀K ∈ ∆_l,

∀U ∈ ∆_l+1 with U ⊆ K (this is compatible with (c3.1) because (Q_l)^W_V is a square matrix, ∀l∈ hti,∀V, W ∈∆_l+1).

By (C2), (c1), (c2), and (c4),

|{j |j∈S and (Q_l)ij >0}|=r_l, ∀i∈S, ∀l∈ hti, and by (c1) and (c2),

|K₁^(l+1)|= r

|∆_l+1| = r₁r₂. . . r_t

r1r2. . . r_l =rl+1rl+2· · ·rt, ∀l∈ ht−1i.

To give an example for which the conditions (C1)–(C3) and (c1)–(c4) hold, we considerS=h6i(r= 6 = 3·2), ∆₁= (h6i),∆₂= ({1,2},{3,4},{5,6}),

∆3 = ({i})_i∈h6i= ({1},{2}, . . . ,{6}),

Q₁ =







2

4 0 ¹₄ 0 0 ¹₄ 0 ²₄ 0 ¹₄ ¹₄ 0

1

4 0 ²₄ 0 ¹₄ 0 0 ¹₄ 0 ²₄ 0 ¹₄ 0 ¹₄ ¹₄ 0 ²₄ 0

1

4 0 0 ¹₄ 0 ²₄







and Q₂ =







1 4

3

4 0 0 0 0

3 4

1

4 0 0 0 0

0 0 ¹₄ ³₄ 0 0 0 0 ³₄ ¹₄ 0 0 0 0 0 0 ¹₄ ³₄ 0 0 0 0 ³₄ ¹₄





 .

Further, we choose the positive entries of matrices Ql, l ∈ hti; these are not choose at random (it is interesting to bear in mind this fact). Let l∈ hti.Set

f_l= min

i,j∈S, (Ql)ij>0

π_j

π_i, g_l= 1 f_l, and (see (c3) again)

xl = (Ql)ij,

(12)

where i, j ∈ S are fixed such that i 6= j and (Ql)ij > 0. Obviously, fl ≤ 1, g_l≥1,and

g_l = max

i,j∈S,(Ql)ij>0

π_j πi

.

We have

(P_l)_ii= 1−X

j6=i

(P_l)_ij ≥1−(r_l−1)x_l. We impose

1−(rl−1)xl≥xlfl. It follows that

xl≤ 1

f_l+r_l−1. We choose the matrix Q_l such that

x_l= 1

fl+rl−1.

To see that this choice is possible, we need to prove that (Q_l)ii>0 (see (c3) above). Indeed,

(Q_l)ii= 1− X

j∈S,j6=i

(Q_l)ij = 1−(r_l−1)x_l = 1− rl−1

f_l+r_l−1 >0.

Below we give another main result of this article.

Theorem 2.5. Under the above assumptions we have (i)α(P)≤1−r_f ^f¹

1+r1−1 f2

f2+r2−1· · ·_f ^f^t

t+rt−1 =

= 1−r_1+(r¹

1−1)g1

1

1+(r2−1)g2 · · ·_1+(r¹

t−1)gt; (ii)kp_n−πk₁ ≤2(1−r_f ^f¹

1+r1−1 f2

f2+r2−1· · ·_f ^f^t

t+rt−1)ⁿ=

= 2(1−r_1+(r¹

1−1)g₁ 1

1+(r2−1)g₂ · · ·_1+(r¹

t−1)g_t)ⁿ, ∀n≥0, where pnis the probability distribution at timenof the hybrid Metropolis chain with the transition matrix P =P₁P₂. . . P_t,∀n≥0;

(iii)an upper bound of the minimum number nof steps such that kp_n− πk₁≤εis [(ln^ε₂)/(lna(P))] if 0< ε≤2 (recall that kp_n−πk₁ ≤2,∀n≥0), a(P)>0, where

a(P) := 1−r f1

f₁+r₁−1 f2

f₂+r₂−1· · · ft

f_t+r_t−1.

Proof. (i) The inequality follows from Remark 1.7 and Theorem 1.8 (we have

(P_l)_ij ≥x_lf_l = f_l

f_l+r_l−1, ∀i, j∈S with (P_l)_ij >0, ∀l∈ hti).

The equation is obvious.

(13)

(ii) It follows from Theorem 2.4, (i), and the well-known inequality α(T₁T₂)≤α(T₁)α(T₂), ∀T₁ ∈S_m₁_,m₂, ∀T₂∈S_m₂_,m₃

([4]; see, e.g., also [6, p. 58] or [7, p. 145]).

(iii) We impose 2(a(P))ⁿ≤ε. Then kp_n−πk₁ ≤εif n≥[(ln^ε₂)/(lna(P))].

Remark 2.6. (a) Theorem 2.5(ii) gives a geometric convergence rate of the hybrid Metropolis chain. Obviously, the chain has a fast convergence if fl

is close to 1, ∀l∈ hti.

(b) By Theorem 2.5(ii), if π = (πi)i∈S is the uniform distribution on S =hri (in this case, f_l = 1, ∀l∈ hti), then kp₁−πk₁ = 0 (i.e., we have an exact sampling from uniform distribution in one step due toP or, equivalently, in tsteps due toP1, P2, . . . , Pt).

(c) The bound from Theorem 2.5(ii) could not be the best one; this is an open problem.

To speed up the hybrid Metropolis-Hastings chain, we can replace the product Ps+1Ps+2· · ·Pt (1 ≤ s < t) from P = P1P2. . . PsPs+1· · ·Pt by the

∆s+1-stable matrix (recall that ∆l= (K₁^(l), K₂^(l), . . . , Ku^(l)l),∀l∈ ht+ 1i)

P^∗ =P^∗(s) :=







A^∗(s+1)₁

A^∗(s+1)₂ . ..

A^∗(s+1)us+1





 ,

where

A^∗(s+1)_z := 1 P

i∈K_z^(s+1)

πi

e⁰(|K_z^(s+1)|)(π_i)_i∈K(s+1)

z , ∀z∈ hu_s+1i

(recall that e=e(n) = (1,1, . . . ,1)∈Rⁿ and e⁰ is its transpose). E.g., ifS= h8i, ∆₁= (h8i), ∆₂= ({1,2,3,4},{5,6,7,8}), ∆₃= ({1,2},{3,4},{5,6},{7,8}),

∆4 = ({i})_i∈h8i= ({1},{2}, . . . ,{8}), then, fors= 2, P =P1P2P^∗ with

P^∗ =P^∗(2) =







1

a⁽³⁾₁ W₁^∗(3) 0 0 0

0 ¹

a⁽³⁾₂ W₂^∗(3) 0 0

0 0 ¹

a⁽³⁾₃ W₃^∗(3) 0

0 0 0 ¹

a⁽³⁾₄ W₄^∗(3)





 ,

(14)

where a⁽³⁾₁ := π₁ +π₂, a⁽³⁾₂ := π₃ +π₄, a⁽³⁾₃ := π₅ +π₆, a⁽³⁾₄ := π₇ +π₈, W₁^∗(3) := e⁰(2)(π₁, π₂), W₂^∗(3) := e⁰(2)(π₃, π₄), W₃^∗(3) := e⁰(2)(π₅, π₆), and W₄^∗(3) := e⁰(2)(π7, π8) (obviously, A^∗(3)_k = ¹

a⁽³⁾_k W_k^∗(3), ∀k ∈ h4i), while, for s= 1, P =P₁P^∗ with

P^∗ =P^∗(1) =





1

a⁽²⁾₁ W₁^∗(2) 0

0 ¹

a⁽²⁾₂ W₂^∗(2)



,

wherea⁽²⁾₁ :=π1+π2+π3+π4, a⁽²⁾₂ :=π5+π6+π7+π8, W₁^∗(2) :=e⁰(4)(π1, π2, π₃, π₄),and W₂^∗(2):=e⁰(4)(π₅, π₆, π₇, π₈).Moreover, setting

C1=





1

a⁽²⁾₁ U₁^∗(2) 0

0 ¹

a⁽²⁾₂ U₂^∗(2)



,

where U₁^∗(2) :=e⁰(4)(π₁+π₂,0, π₃+π₄,0),and U₂^∗(2) :=e⁰(4)(π₅+π₆,0, π₇+ π8,0),we haveP^∗(1) =C1P^∗(2).(This result can be generalized easily.) Con- sequently, we can work with C₁ and P^∗(2) instead of P^∗(1).This is an interesting thing because the matrices C1 and P^∗(2) are sparser thanP^∗(1).

To unify the theory of the hybrid chain with right ∆-stable matrices as above and that without right ∆-stable matrices as above, below we consider that s∈ hti and setrt+1 = 1,

P =P(s) =

P₁P₂. . . P_t ifs=t, P1P2. . . PsP^∗ if 1≤s < t, and

a(P) =

(1−r_f ^f¹

1+r1−1 f2

f2+r2−1· · ·_f ^f^t

t+rt−1 ifP =P1P2. . . Pt, 1−r_f ^f¹

1+r1−1 f2

f2+r2−1· · ·_f ^f^s

s+rs−1 1

rs+1rs+2···r_t+1 ifP =P₁P₂. . . P_sP^∗. The two results below are generalizations of Theorems 2.3 and 2.5, respectively.

Theorem 2.7. Concerning P above we have πP =π and P >0.

Proof. See the proof of Theorem 2.3.

Theorem 2.8. Concerning P above we have (recall that |K₁^(l+1)| = r_l+1r_l+2· · ·r_t+1,∀l∈ hti)

(i)α(P)≤1−r_f ^f¹

1+r1−1 f2

f2+r2−1· · ·_f ^f^s

s+rs−1 1

rs+1rs+2···rt+1 =

= 1−r_1+(r¹

1−1)g₁ 1

1+(r2−1)g₂ · · ·_1+(r¹

s−1)gs

1 rs+1rs+2···rt+1; (ii)kp_n−πk₁ ≤2(1−r_f ^f¹

1+r1−1 f2

f2+r2−1· · ·_f ^f^s

s+rs−1 1

rs+1rs+2···rt+1)ⁿ=