MARKOV CHAINS WITH MAXIMUM IOSIFESCU-THEODORESCU ENTROPY
VASILE PREDA and COSTEL BĂLCĂU
Using the standard maximum entropy method and some basic properties of conditional entropy we solve the problem of maximization without explicit constraints of the Iosifescu-Theodorescu entropy for discrete-time homogeneous stationary Markov chains of order r with countable state space.
AMS 2000 Subject Classification: 94A17, 60J22.
Key words: homogeneous stationary multiple Markov chain, Iosifescu-Theodorescu entropy, standard maximum entropy method.
1. INTRODUCTION
The reconstruction of a discrete-time homogeneous simple or multiple Markov chain with countable state space, when only partial information is known, arises in many practical applications in various sciences such as economics, psychology and biology (see, e.g., Iosifescu [11], Iosifescu and Grigorescu [14]).
The goal of our paper is to solve the problem of maxentropic reconstruction of a discrete-time homogeneous stationary Markov chain of order r with countable state space. The transition probabilities are to be found from the knowledge of the stationary distribution only, i.e., from the estimated values of the joint probabilities of the states at r consecutive times. The entropy involved was first introduced and studied by Iosifescu and Theodorescu [15] for the general class of chains with complete connections. Iosifescu [10] also derived the asymptotic behaviour of the entropy for homogeneous systems with complete connections with a finite set of states, his results being further completed by Gabrielli, Galvez and Guiol [4].
In Section 2 we describe some preliminary results about homogeneous stationary multiple Markov chains and their Iosifescu-Theodorescu entropy (IT-entropy).
For solving our problem, we apply the standard maximum entropy (SME) method. This method, introduced by Jaynes [16, 17], states that one should choose the transition probabilities that maximize the entropy of the chain.
REV. ROUMAINE MATH. PURES APPL., 53 (2008), 1, 55–61
The SME method was used for finite or denumerable simple Markov chains by Gerchak [5] and Bălcău [1, 2]. We would like to point out that the SME method was also used in the reconstruction of probability distributions (see [3, 7, 8, 18]). In Section 3 we use the SME method and some basic properties of conditional entropy to solve the problem of maximization, without explicit constraints, of the IT-entropy of discrete-time homogeneous stationary Markov chains of order r with countable state space. We derive an expected result, namely, that the IT-entropy of a chain {X(t), t ∈ N} reaches a maximum when X(r) is independent of (X(0), . . . , X(r−1)). This property justifies the usage of IT-entropy as a measure of mobility for social processes modelled as multiple Markov chains.
2. IT-ENTROPY OF HOMOGENEOUS STATIONARY MULTIPLE MARKOV CHAINS
Let $\{X(t),\ t \in \mathbb{N}\}$ be a homogeneous stationary Markov chain of order $r$, $r \in \mathbb{N}^*$, with a countable (finite or infinite) state space $I$. Denote by $P^{(r)} = \big(P^{(r)}_{i_1,\dots,i_r;j}\big)_{i_1,\dots,i_r,j \in I}$ the array of $r$-step transition probabilities, i.e.,
$$P^{(r)}_{i_1,\dots,i_r;j} = P(X(t+r) = j \mid X(t) = i_1, \dots, X(t+r-1) = i_r), \quad \forall t \in \mathbb{N},$$
for any $j, i_1, \dots, i_r \in I$. Denote also by $\pi^{(r)} = \big(\pi^{(r)}_{i_1,\dots,i_r}\big)_{i_1,\dots,i_r \in I}$ the joint probability of the states at $r$ consecutive times, i.e.,
$$\pi^{(r)}_{i_1,\dots,i_r} = P(X(t) = i_1, \dots, X(t+r-1) = i_r), \quad \forall t \in \mathbb{N},$$
for any $i_1, \dots, i_r \in I$. Obviously, the chain $\{X(t),\ t \in \mathbb{N}\}$ is completely characterized by the distribution $\pi^{(r)}$ and the transition probability array $P^{(r)}$, where
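$$\pi^{(r)}_{i_1,\dots,i_r} > 0, \quad \forall i_1, \dots, i_r \in I, \tag{1}$$
$$\sum_{i_1,\dots,i_r \in I} \pi^{(r)}_{i_1,\dots,i_r} = 1, \tag{2}$$
$$\sum_{i_1,\dots,i_k \in I} \pi^{(r)}_{i_1,\dots,i_k,i_{k+1},\dots,i_r} = \sum_{i_1,\dots,i_k \in I} \pi^{(r)}_{i_{k+1},\dots,i_r,i_1,\dots,i_k}, \quad \forall\, 1 \le k \le r-1, \tag{3}$$
$$P^{(r)}_{i_1,\dots,i_r;j} \ge 0, \quad \forall j, i_1, \dots, i_r \in I, \tag{4}$$
$$\sum_{j \in I} P^{(r)}_{i_1,\dots,i_r;j} = 1, \quad \forall i_1, \dots, i_r \in I, \tag{5}$$
$$\sum_{i_1,\dots,i_r \in I} \pi^{(r)}_{i_1,\dots,i_r} P^{(r)}_{i_1,\dots,i_r;j} = \sum_{i_1,\dots,i_{r-1} \in I} \pi^{(r)}_{i_1,\dots,i_{r-1},j}, \quad \forall j \in I. \tag{6}$$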
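At any time $t \in \mathbb{N}$, the state probability distribution $\pi^{(1)} = \big(\pi^{(1)}_j\big)_{j \in I}$ is given by
$$\pi^{(1)}_j = P(X(t) = j) = \sum_{i_1,\dots,i_{r-1} \in I} \pi^{(r)}_{i_1,\dots,i_{r-1},j}, \quad \forall j \in I, \tag{7}$$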
for $r \ge 2$ the (1-step) transition probability matrix $P^{(1)} = \big(P^{(1)}_{ij}\big)_{i,j \in I}$ is given by
$$P^{(1)}_{ij} = P(X(t+1) = j \mid X(t) = i) = \frac{\sum_{i_1,\dots,i_{r-2} \in I} \pi^{(r)}_{i_1,\dots,i_{r-2},i,j}}{\sum_{i_1,\dots,i_{r-1} \in I} \pi^{(r)}_{i_1,\dots,i_{r-1},i}}, \tag{8}$$
for any $i, j \in I$, and for $r = 1$ this matrix is just the transition probability array.
We use the following well-known results concerning the Shannon entropy [19] and the conditional entropy (see, e.g., Guiaşu [6], Ihara [9]).
Lemma 2.1. Let $X$, $Y$ and $Z$ be random variables with values in a countable set $I$. Let $H(Z)$ be the Shannon entropy of $Z$, $H(Z \mid X)$ the conditional entropy of $Z$ given $X$, and $H(Z \mid X, Y)$ the conditional entropy of $Z$ given $(X, Y)$. Then
$$H(Z \mid X, Y) \le H(Z \mid X) \le H(Z).$$
The first inequality becomes an equality if and only if $Z$ and $Y$ are independent when $X$ is given, while the second one becomes an equality if and only if $Z$ and $X$ are independent.
Remark 2.1. By definition,
$$H(Z) = -\sum_{z \in I} P(Z = z) \ln P(Z = z),$$
$$H(Z \mid X) = -\sum_{z,x \in I} P(Z = z, X = x) \ln P(Z = z \mid X = x),$$
and
$$H(Z \mid X, Y) = -\sum_{z,x,y \in I} P(Z = z, X = x, Y = y) \ln P(Z = z \mid X = x, Y = y).$$
For (notational) convenience, $0 \ln 0 = 0$.
Remark 2.2. Lemma 2.1 still holds if X, Y and Z are discrete random vectors.
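The chain of inequalities in Lemma 2.1 can be checked numerically on a small example. The sketch below is only an illustration, not part of the original argument: the joint distribution of (Z, X, Y) on {0, 1}^3 is a made-up assumption, and the helper names `marginal` and `cond_entropy` are hypothetical. Entropies are computed directly from the formulas of Remark 2.1, in nats.

```python
import math
from collections import defaultdict

def marginal(joint, keep):
    """Marginalize a joint pmf {(z, x, y): prob} onto the coordinates in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[k] for k in keep)] += p
    return dict(out)

def cond_entropy(joint, target, given=()):
    """H(target | given) in nats, following the definitions of Remark 2.1."""
    p_both = marginal(joint, tuple(target) + tuple(given))
    p_given = marginal(joint, tuple(given))
    n = len(target)
    # Zero-probability terms are skipped, which implements 0 ln 0 = 0.
    return -sum(p * math.log(p / p_given[key[n:]])
                for key, p in p_both.items() if p > 0)

# Hypothetical joint pmf of (Z, X, Y), keyed as (z, x, y); values are illustrative.
joint = {
    (0, 0, 0): 0.25, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.10,
    (1, 0, 0): 0.05, (1, 0, 1): 0.20, (1, 1, 0): 0.15, (1, 1, 1): 0.10,
}

h_z = cond_entropy(joint, target=(0,))                   # H(Z)
h_z_x = cond_entropy(joint, target=(0,), given=(1,))     # H(Z | X)
h_z_xy = cond_entropy(joint, target=(0,), given=(1, 2))  # H(Z | X, Y)
print(h_z_xy, h_z_x, h_z)
```

In this example Z depends on both X and Y, so both inequalities are expected to be strict; replacing `joint` by a product distribution should turn them into equalities up to rounding.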
Theorem 2.1. Let $\{X(t),\ t \in \mathbb{N}\}$ be a Markov chain of order $r$ with a countable state space $I$. Then
$$H(X(t+r) \mid X(t+r-1), \dots, X(t), X(t-1), \dots, X(0)) = H(X(t+r) \mid X(t+r-1), \dots, X(t)), \quad \forall t \in \mathbb{N}.$$
Proof. Let $t \in \mathbb{N}$. It follows from the (multiple) Markov property, namely,
$$P(X(t+r) = j \mid X(t+r-1) = i_r, \dots, X(0) = i_{1-t}) = P(X(t+r) = j \mid X(t+r-1) = i_r, \dots, X(t) = i_1), \quad j, i_r, \dots, i_{1-t} \in I,$$
that $X(t+r)$ and $(X(t-1), \dots, X(0))$ are independent when $(X(t+r-1), \dots, X(t))$ is given. The conclusion follows from Lemma 2.1 and Remark 2.2.
Definition 2.1 (see [15]). Let $\{X(t),\ t \in \mathbb{N}\}$ be a homogeneous stationary Markov chain of order $r$, $r \in \mathbb{N}^*$. The Iosifescu-Theodorescu entropy (IT-entropy) of this chain is defined as
$$H^{(r)}(P^{(r)}) = -\sum_{i_1,\dots,i_r \in I} \sum_{j \in I} \pi^{(r)}_{i_1,\dots,i_r} P^{(r)}_{i_1,\dots,i_r;j} \ln P^{(r)}_{i_1,\dots,i_r;j}.$$
Remark 2.3. According to Theorem 2.1, we have
$$H^{(r)}(P^{(r)}) = H(X(t+r) \mid X(t+r-1), \dots, X(t)) = H(X(t+r) \mid X(t+r-1), \dots, X(t), X(t-1), \dots, X(0)), \quad \forall t \in \mathbb{N}.$$
Therefore, the IT-entropy measures the amount of uncertainty that remains about the chain at an arbitrary time once its behaviour at the r most recent times or, equivalently, at all previous times is known.
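As an illustration of Definition 2.1, the IT-entropy of a small chain of order r = 2 can be evaluated numerically. The following is a minimal sketch under stated assumptions: the two-state transition array `P` is invented for the example, and the stationary joint distribution is approximated by iterating the chain induced on r-tuples of states (a standard numerical device, not a method taken from the paper). The function names `stationary_joint` and `it_entropy` are hypothetical.

```python
import math
from itertools import product

def stationary_joint(P, states, r, iters=2000):
    """Approximate pi^(r) by iterating the chain induced on r-tuples of states."""
    histories = list(product(states, repeat=r))
    pi = {h: 1.0 / len(histories) for h in histories}
    for _ in range(iters):
        new = dict.fromkeys(histories, 0.0)
        for h, mass in pi.items():
            for j in states:
                # (i_1, ..., i_r) moves to (i_2, ..., i_r, j) with probability P[h][j].
                new[h[1:] + (j,)] += mass * P[h][j]
        pi = new
    return pi

def it_entropy(pi, P):
    """IT-entropy of Definition 2.1 in nats, with the convention 0 ln 0 = 0."""
    return -sum(pi[h] * p * math.log(p)
                for h in pi for p in P[h].values() if p > 0)

# Hypothetical order-2 chain on {0, 1}: P[(i1, i2)][j] = P(next = j | i1, i2).
states = (0, 1)
P = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.4, 1: 0.6},
    (1, 1): {0: 0.2, 1: 0.8},
}

pi2 = stationary_joint(P, states, r=2)
h_it = it_entropy(pi2, P)

# State distribution pi^(1) obtained by summing out the first coordinate, as in (7).
pi1 = {j: sum(pi2[(i, j)] for i in states) for j in states}
h_pi1 = -sum(p * math.log(p) for p in pi1.values() if p > 0)
print(h_it, h_pi1)
```

Since the next state here depends on the two previous ones, the computed IT-entropy should come out below the Shannon entropy of the state distribution, in line with Lemma 2.1.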
3. MAXIMUM IT-ENTROPY
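Let $\{X(t),\ t \in \mathbb{N}\}$ be a homogeneous stationary Markov chain of order $r$ with a countable state space $I$. On account of (4), (5) and (6), we consider the following optimization problem, according to the SME method:
$$(P):\quad \begin{cases} \max\ H^{(r)}(P^{(r)}) = -\displaystyle\sum_{i_1,\dots,i_r \in I} \sum_{j \in I} \pi^{(r)}_{i_1,\dots,i_r} P^{(r)}_{i_1,\dots,i_r;j} \ln P^{(r)}_{i_1,\dots,i_r;j} \\ \text{such that } \displaystyle\sum_{j \in I} P^{(r)}_{i_1,\dots,i_r;j} = 1, \quad \forall i_1, \dots, i_r \in I; \\ \displaystyle\sum_{i_1,\dots,i_r \in I} \pi^{(r)}_{i_1,\dots,i_r} P^{(r)}_{i_1,\dots,i_r;j} = \sum_{i_1,\dots,i_{r-1} \in I} \pi^{(r)}_{i_1,\dots,i_{r-1},j}, \quad \forall j \in I; \\ P^{(r)}_{i_1,\dots,i_r;j} \ge 0, \quad \forall j, i_1, \dots, i_r \in I, \end{cases}$$
where $\pi^{(r)} = \big(\pi^{(r)}_{i_1,\dots,i_r}\big)_{i_1,\dots,i_r \in I}$ is a given stationary distribution which satisfies (1), (2) and (3).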
We assume that the state probability distribution $\pi^{(1)}$ given by (7) has finite Shannon entropy, i.e.,
$$H(\pi^{(1)}) = -\sum_{i \in I} \pi^{(1)}_i \ln \pi^{(1)}_i < \infty.$$
The next theorem is the main result of this paper.
Theorem 3.1. Problem (P) has a unique optimal solution $P^{*(r)}$ given by
$$P^{*(r)}_{i_1,\dots,i_r;j} = \pi^{(1)}_j, \quad \forall j, i_1, \dots, i_r \in I,$$
while the maximum IT-entropy is
$$H^{(r)}(P^{*(r)}) = H(\pi^{(1)}).$$
Proof. According to Lemma 2.1, Remarks 2.2 and 2.3, we have
$$H^{(r)}(P^{(r)}) = H(X(r) \mid X(0), \dots, X(r-1)) \le H(X(r)) = H(\pi^{(1)}),$$
for any feasible solution $P^{(r)}$ of problem (P). Moreover, the inequality becomes an equality if and only if $X(r)$ is independent of $(X(0), \dots, X(r-1))$, i.e.,
$$P^{(r)}_{i_1,\dots,i_r;j} = \pi^{(1)}_j, \quad \forall j, i_1, \dots, i_r \in I,$$
and the proof is complete.
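The statement of Theorem 3.1 can be illustrated numerically in the simplest case r = 1. The sketch below is an illustration under assumptions, not the authors' method: the distribution `pi` is made up, and the one-parameter family P_a = (1 − a)·Id + a·(rows equal to pi), with 0 ≤ a ≤ 1, is a hypothetical choice of feasible matrices. Each P_a is stochastic and keeps pi stationary, and a = 1 gives the solution P* of the theorem.

```python
import math

def shannon(p):
    """Shannon entropy in nats of a probability vector, with 0 ln 0 = 0."""
    return -sum(q * math.log(q) for q in p if q > 0)

def it_entropy_order1(pi, P):
    """IT-entropy for r = 1: sum over i of pi_i times the entropy of row i of P."""
    return sum(pi[i] * shannon(P[i]) for i in range(len(pi)))

def lazy_mixture(pi, a):
    """P_a = (1 - a) * identity + a * (matrix whose rows all equal pi).

    For 0 <= a <= 1 every row is a probability vector and pi P_a = pi,
    so P_a is feasible for problem (P) with r = 1.
    """
    n = len(pi)
    return [[(1 - a) * (i == j) + a * pi[j] for j in range(n)] for i in range(n)]

# Hypothetical stationary distribution on three states.
pi = [0.5, 0.3, 0.2]
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [it_entropy_order1(pi, lazy_mixture(pi, a)) for a in alphas]
h_max = shannon(pi)

for a, v in zip(alphas, values):
    print(f"a = {a:.2f}  IT-entropy = {v:.4f}")
print(f"H(pi) = {h_max:.4f}")
```

Along this family the IT-entropy goes from 0 (the identity matrix, no mobility) up to H(pi) at a = 1, where the next state is independent of the current one.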
Remark 3.1. Theorem 3.1 has the following interpretation: the IT-entropy of a chain {X(t), t ∈ N} is maximum when X(r) is independent of (X(0), . . . , X(r−1)). This property justifies the usage of IT-entropy as a measure of mobility for social processes modelled by multiple Markov chains: one has strong mobility when X(t+r) is independent of (X(t), . . . , X(t+r−1)) or, equivalently, of (X(0), . . . , X(t+r−1)), at any time t.
As a direct consequence of Theorem 3.1, we have the next result.
Corollary 3.1. $P^{(r)}$ is the unique optimal solution of problem (P) if and only if $X(r)$ is independent of $(X(0), \dots, X(r-1))$ and
$$P^{(1)}_{ij} = \pi^{(1)}_j, \quad \forall i, j \in I,$$
where $P^{(1)} = \big(P^{(1)}_{ij}\big)_{i,j \in I}$ is the (1-step) transition probability matrix given by (8).
Remark 3.2. Theorem 3.1 extends similar results obtained by Gerchak [5] (via Lagrangian duality) and Bălcău [1, 2] (via geometric programming) for simple Markov chains in the finite or infinite countable state case, respectively.
REFERENCES
[1] C. Bălcău, L’optimisation de l’entropie des chaînes de Markov avec un ensemble dénombrable d’états. Stud. Cerc. Mat. 48 (1996), 235–244.
[2] C. Bălcău, Maxentropic reconstruction of some homogeneous Markov chain in the countable case. Math. Rep. (Bucur.) 6(56) (2004), 9–19.
[3] S. Erlander, Entropy in linear programs. Math. Programming 21 (1981), 137–151.
[4] D. Gabrielli, A. Galvez and D. Guiol, Fluctuations of the empirical entropies of a chain of infinite order. Math. Phys. Electron. J. 9 (2003), 1–17.
[5] Y. Gerchak, Maximal entropy of Markov chains with common steady-state probabilities. J. Oper. Res. Soc. 32 (1981), 233–234.
[6] S. Guiaşu, Quantum Mechanics. Nova Science Publ., Huntington, New York, 2001.
[7] H. Gzyl, Maxentropic reconstruction of some probability distributions. Stud. Appl. Math. 105 (2000), 235–243.
[8] H. Gzyl and Y. Velásquez, Reconstruction of transition probabilities by maximum entropy in the mean. In: R. Fry (Ed.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering (Baltimore, MD, 2001), pp. 192–203. AIP Conf. Proc. 617, Amer. Inst. Phys., Melville, NY, 2002.
[9] S. Ihara, Information Theory for Continuous Systems. World Scientific Publ., Singapore, 1993.
[10] M. Iosifescu, Sampling entropy for random homogeneous systems with complete connections. Ann. Math. Statist. 36 (1965), 1433–1436.
[11] M. Iosifescu, Finite Markov Processes and their Applications. Wiley Series in Probability and Mathematical Statistics, Chichester & Ed. Tehnică, Bucharest, 1980; republication Dover, 2007.
[12] M. Iosifescu, On U-statistics and von Mises statistics for a special class of Markov chains. J. Statist. Plann. Inference 30 (1992), 395–400.
[13] M. Iosifescu, A generalization of semi-Markov processes. In: J. Janssen and N. Limnios (Eds.), Semi-Markov Models and Applications (Compiègne, 1998), pp. 23–32. Kluwer, Dordrecht, 1999.
[14] M. Iosifescu and S. Grigorescu, Dependence with Complete Connections and its Applications. Cambridge Univ. Press, Cambridge, 1990.
[15] M. Iosifescu and R. Theodorescu, On the entropy of chains with complete connections. Com. Acad. R.P. Romîne 11 (1961), 821–824. (Romanian)
[16] E.T. Jaynes, Information theory and statistical mechanics. Phys. Rev. (2) 106 (1957), 620–630.
[17] E.T. Jaynes, Information theory and statistical mechanics. II. Phys. Rev. (2) 108 (1957), 171–190.
[18] V. Preda and C. Bălcău, On maxentropic reconstruction of countable Markov chains and matrix scaling problems. Stud. Appl. Math. 111 (2003), 85–100.
[19] C.E. Shannon, A mathematical theory of communication, Part III. Bell System Tech. J. 27 (1948), 623–656.
[20] R. Theodorescu, On the functional relations of multiple Markov processes. Stud. Cerc. Mat. 17 (1965), 1367–1375.
Received 5 March 2007
University of Bucharest
Faculty of Mathematics and Computer Science
Str. Academiei, 14
010014 Bucharest, Romania
[email protected]
and
University of Piteşti
Faculty of Mathematics and Computer Science
Str. Târgu din Vale 1
110440 Piteşti, Romania
[email protected]