Testing stationary processes for independence

(1)

www.imstat.org/aihp 2011, Vol. 47, No. 4, 1219–1225

DOI:10.1214/11-AIHP426

Testing stationary processes for independence

Gusztáv Morvai

^a,1

and Benjamin Weiss

^b

aMTA-BME Stochastics Research Group, Egry József utca 1, Building H, Budapest, 1111 Hungary. E-mail:morvai@math.bme.hu bInstitute of Mathematics, Hebrew University of Jerusalem, Jerusalem 91904, Israel. E-mail:weiss@math.huji.ac.il

Received 27 October 2009; revised 21 December 2010; accepted 15 March 2011

Abstract. LetH₀denote the class of all real valued i.i.d. processes andH₁all other ergodic real valued stationary processes. In spite of the fact that these classes are not countably tight we give a strongly consistent sequential test for distinguishing between them.

Résumé. SoitH₀la classe de tous les processus indépendants et équidistribués à valeurs réelles, etH₁la classe complémentaire dans l’ensemble des processus ergodiques. Nous donnons un test séquentiel fortement consistant pour les distinguer.

MSC:62M07

Keywords:Independent processes; Hypothesis testing

1. Introduction

In sequential testing for distinguishing between two competing hypotheses, the usual framework consists of two classes of stochastic processesH0andH1, and a sequence ofnobservationsX1, X2, . . . , Xnwhich represent a sampling from a single process. A test is usually a sequence of functionsg_nofnarguments that take the values zero and one and it is said to be weakly consistentif the sequence converges to 0/1 in probability according to whether the process belongs toH₀/H1. It is said to beconsistentif the convergence is pointwise with probability one. Much clas- sical work (see [2,3,5]) was done in the case where the classesHi consisted of i.i.d. processes with the classes being distinguished by the type of the underlying distribution. More recently (see [11], [6], [12] and [10]) the more general problem was considered where the classesH_iare no longer restricted to being i.i.d. but are assumed to contain certain ergodic stationary processes. In particular in the latter two papers some general sufficient conditions were given on the classes which ensure the existence of consistent tests.

Perhaps the simplest question of this type that one can put is to distinguish between H0, the class of all i.i.d.

processes and its complement in the class of ergodic stationary processses, namelyH₁is taken to be all such processes that are not independent. If we restrict attention to finite valued processes with a fixed number of states then D. Bailey in his thesis [1] gave just such a consistent test that was based on his universal scheme for estimating the entropy of an unknown process. If however we takeH0to be the class of all real valued independent processes then none of these earlier results apply. For example, the main result in [10] assumes that the union of the two classes is contained in a countable union of uniformly tight processes whereas the class of all distribution functions on the real line is not countably tight. Our purpose in this note is to answer just this question by providing a consistent sequential test for deciding whether an arbitrary stationary ergodic process is independent or not. In fact our result applies to vector-

1Supported by OTKA grant No. K75143 and Bolyai János Research Scholarship.

(2)

valued processes as well, indeed all that one needs to assume is that the values taken by the process lie in a countably generated measurable space which is known a priori.

In the first section we shall show how an appropriate quantization enables one to adapt any good set of independence tests for finite valued i.i.d. processes to produce an independence test for general processes. In the second section we shall give a specific example of such a good test. Similar questions can be asked about classes of Markov processes and we have answered some of these in earlier work. In [8] we showed that even when we restrict to binary processes the class of all finite order Markov chains cannot be distinguished from its complement by any weakly consistent test.

On the other hand for anykthe order-kMarkov chains can be distinguished from their complement by a consistent test even in the setting of countable alphabets (see [7] and [9]).

2. A general test for independence

Let{X,F}be a countably generated measurable space, such asRorR^dor any Polish space with its Borelσ-algebra.

Our process{X_n}^∞_n_=−∞will be stationary and ergodic taking values inX. (Note that all stationary time series{X_n}^∞_n₌₀ can be thought to be a two sided time series, that is,{X_n}^∞_n_=−∞.) The reader can keep in mind real valued processes without loss of generality.

For notational convenience, letX_mⁿ =(X_m, . . . , X_n), wherem≤n. Note that ifm > nthenX_mⁿ is the empty string.

LetAkbe a refining sequence of finite partitions ofXand denote byAˆkthe corresponding algebras of sets. Assume that_∞

k=1Aˆk generates the wholeσ-algebraF. For a countably generated measurable space such sequences always exist. For exampleAkcan be the partition ofRinto intervals of the form[j/2^k, (j+1)/2^k)for all integers|j|< k2^k and their complement. We will denote by

Q_k(x)=A_i ifx∈A_i∈Ak

the function which assigns tox the set of the partition to which it belongs. For any stationary and ergodicX-valued process{X_n}, let

Y_n^(k)=Q_k(X_n)

denote the process obtained by quantizing at levelk.

Lemma 1. A stationary and ergodicX-valued process{X_n}is independent if and only if the correspondingAk-valued processY_n^(k)is independent for eachk.

Proof. If the{X_n}is independent then as functions of theX_n’s so are all theY_n^(k). In the opposite direction assume that for allktheY_n^(k)’s are independent. Denote byAthe union of the finite algebras_∞

k=1Aˆk and byA^Zthe productσ- algebra generated byA. On thisσ-algebra the probability measure defined by the{Xn}process is a product measure.

Since the algebra_∞

k=1Aˆk generates the fullσ-algebraFand since measures that agree on an algebra agree also on theσ-algebra it generates this implies that the probability measure defined by the{Xn}process is a product measure

onF^Zas was to be shown.

LetY be a finite set. A sequence of functionsgndefined onYⁿ⁺¹is a consistent test for independence if for any Y-valued stationary and ergodic process{Yn}, eventually almost surely

g_n(Y₀, . . . , Y_n)=IND if the process is independent, DEP otherwise.

Define

En=supP

gn(Y0, . . . , Yn)=DEP ,

where the sup is over allY-valued independent and identically distributed processes.

(3)

If in addition ∞ n=0

E_n<∞

then we will say thatg_nis astrongly consistenttest. In the next section we will show how to construct such tests.

Now letYk =Ak as above. We will show how to use any family of strongly consistent tests{g_n^(k)}to devise a consistent test for the class of allX-valued processes. For an increasing sequenceb= {b_n}such thatb_n→ ∞define

TEST^b_n X₀ⁿ

=

IND ifg_n^(k)

Y₀^(k), . . . , Y_n^(k)

=INDfor all 1≤k≤b_n, DEP otherwise.

Theorem 1. Let{g_n^(k)}be a family of strongly consistent tests and definem_kbe so large that_∞

k=1

_∞

n=m_kE_n^(k)<∞ (e.g._∞

n=m_kE_n^(k)<2⁻^k).Forn≥m₁letb_n=max{1≤k: m_l≤nfor all1≤l≤k}.Then TEST^b_n(Xⁿ₀)is consistent for all stationary and ergodicX-valued processes{X_n}.

Proof. If theX-valued process{Xn}is dependent then by Lemma1for somek,g^(k)n (Y₀^(k), . . . , Yn^(k))=DEPeventually almost surely and so TEST^b_n(X₀ⁿ)=DEPeventually almost surely. Now assume that{Xn}is an independent X-valued process. Then by Lemma1, for eachk,{Yn^(k)}is independent and so

∞ n=m1

b_n

k=1

E_n^(k)=^∞

k=1

∞ n=mk

E_n^(k)<∞.

Thus ∞ n=m₁

P TEST^b_n

Xⁿ₀

=DEP

= ∞ n=m₁

P g^(k)_n

Y₀^(k), . . . , Y_n^(k)

=DEPfor some 1≤k≤bn

≤ ∞ n=m₁

bn

k=1

P g_n^(k)

Y₀^(k), . . . , Y_n^(k)

=DEP

≤ ^∞

n=m1

b_n

k=1

E_n^(k)<∞.

By the Borel–Cantelli Lemma,g^(k)_n (Y₀^(k), . . . , Y_n^(k))=INDfor all 1≤k≤b_neventually almost surely. The proof of

Theorem1is complete.

3. Construction of strongly consistent tests

In this section we will show how to construct a strongly consistent test for independence for the class of all processes taking values in a fixed finite set Y. Assume {Y_n}is a stationary and ergodic process taking values in Y. First let us define a number which will measure the degree of independence of the process. It will be zero if and only if the process is independent.

For convenience let p(y₋⁰_k₊₁) andp(y|y₋⁰_k₊₁)denote the distribution P (Y₋⁰_k₊₁=y₋⁰_k₊₁)and the conditional distributionP (Y1=y|Y₋⁰_k₊₁=y₋⁰_k₊₁), respectively.

(4)

Define Γ = sup

1≤k<∞ sup

{z⁰₋_k₊₁∈Y^k,y∈Y:p(z₋⁰_k₊₁,y)>0}

p(y)−p

y|z⁰₋_k₊₁ .

We proceed to define an empirical version of this based on the observation of a finite data segmentY₀ⁿ. To this end first define the empirical version ofp(y)andp(y|w₋⁰_k₊₁)as

ˆ

p_n(y)=#{0≤t≤n−1:Y_t₊₁=y} n

and ˆ pn

y|w₋⁰_k₊₁

=#{k−1≤t≤n−1:Y_t^t₋⁺_k¹₊₁=(w⁰₋_k₊₁, y)}

#{k−1≤t≤n−1:Y_t^t₋_k₊₁=w⁰₋_k₊₁} .

These empirical distributions, as well as the sets we are about to introduce are functions ofY₀ⁿ, but we suppress the dependence to keep the notation manageable.

For a fixed 0< γ <1 letLⁿ_k denote the set of strings with lengthkwhich appear more thann¹⁻^γ times inY₀ⁿ. That is,

Lⁿ_k=

y₋⁰_k₊₁∈Y^k: #

k−1≤t≤n−1:Y_t^t₋_k₊₁=y₋⁰_k₊₁

> n¹⁻^γ . Finally, define the empirical version ofΓ as follows:

Γˆ_n=max

y∈Y max

1≤k<∞ max

z⁰_−k+₁∈Lⁿk

pˆ_n(y)− ˆp_n

y|z⁰₋_k₊₁ .

Let us agree by convention that if the set over which we are maximizing is empty then the maximum is zero.

Observe, that by ergodicity, the ergodic theorem implies that almost surely the empirical distributionspˆ converge to the true distributionspand so

lim inf

n→∞ Γˆn≥Γ almost surely.

If the process is dependent then clearlyΓ >0. If the process is independent thenΓ =0 and we will show that not justΓˆn→0 almost surely but it converges to zero at a certain rate.

Let 0< β <¹⁻₂^γ be arbitrary. Let gn=

IND ifΓˆ_n≤n⁻^β, DEP otherwise.

Note thatg_ndepends onY₀ⁿ.

Theorem 2. Let{Y_n}be an arbitrary stationary and ergodic process taking values from a finite setY.Assume that 0< γ <1and0< β <¹⁻₂^γ.If the process is independent then eventually almost surely,g_n=IND and if the process is not independent then eventually almost surely,gn=DEP.Furthermore,if the process is independent and identically distributed then forn >2^1/(1⁻^γ⁻^2β)

P (g_n=DEP)≤14|Y|n⁴e⁻ⁿ⁻^2β⁺¹⁻^γ^/2 and the right-hand side is summable.

Proof. If the process is not independent then there is a string z⁰₋_k₊₁ ∈ Y^k and a letter y ∈ Y such that p(y)=p(y|z⁰₋_k₊₁) and p(z⁰₋_k₊₁y) >0. By ergodicity, pˆ_n(y)→p(y) and pˆ_n(y|z⁰₋_k₊₁)→p(y|z⁰₋_k₊₁) and so lim infn→∞Γˆn>0 almost surely which in turn implies thatgn=DEPeventually almost surely.

(5)

Assume that the process is independent and identically distributed. Then PΓˆ_n> n⁻^β

=P

maxy∈Y max

1≤k<∞ max

(z⁰₋_k₊₁,y)∈Lⁿk

pˆ_n(y)− ˆp_n

y|z₋⁰_k₊₁ > n⁻^β

≤P

max

y∈Y pˆ_n(y)−p(y) > n⁻^β/2 +P

maxy∈Y max

1≤k<n max

z⁰₋_k₊₁∈Lⁿk

p

y|z₋⁰_k₊₁

− ˆpn

y|z⁰₋_k₊₁ > n⁻^β/2

≤P

For somey∈Y: pˆn(y)−p(y) > n⁻^β/2 +P

For somey∈Y,1≤k < n, k−1≤l≤n−1:Y_l^l₋_k₊₁∈Lⁿ_k, pˆn

y|Y_l^l₋_k₊₁

−p

y|Y_l^l₋_k₊₁ > n⁻^β/2 .

By the union bound this in turn is:

PΓˆ_n> n⁻^β

≤

y∈Y

P pˆ_n(y)−p(y) > n⁻^β/2

+

y∈Y n−1

k=1 n−1

l=k−1

P

Y_l^l₋_k₊₁∈Lⁿ_k, pˆ_n

y|Y_l^l₋_k₊₁

−p

y|Y_l^l₋_k₊₁ > n⁻^β/2 .

By Hoeffding’s inequality (cf. Theorem 2 in [4]) for sums of bounded independent random variables, for a given y∈Y,

P pˆ_n(y)−p(y) > n⁻^β/2

≤2e⁻^0.5n^−2βⁿ. Summing overywe get that

y∈Y

P pˆ_n(y)−p(y) > n⁻^β/2

≤2|Y|e⁻^0.5n⁻^2βⁿ.

Now for a given 0≤l≤n−1, 1≤k≤n−1 andy∈Y we will give an upper bound on the probability P

Y_l^l₋_k₊₁∈Lⁿk, pˆ_n

y|Y_l^l₋_k₊₁

−p

y|Y_l^l₋_k₊₁ > n⁻^β/2 .

Letl+θ⁺(l, j, i)andl−θ⁻(l, j, i)denote the position of theith occurrence of the patternY_l^l₋_j (with lengthj and positionl) going in positive and negative directions respectively. Formally, setθ⁺(l, j,0)=0,θ⁻(l, j,0)=0 and define

θ⁺(l, j, i)=θ⁺(l, j, i−1)+min

t >0: Y_l^l₊⁺_θ^θ₊⁺_(l,j,i^(l,j,i₋⁻₁₎¹⁾₋⁺_j^t₊₁₊_t=Y_l^l₊⁺_θ^θ₊⁺_(l,j,i^(l,j,i₋⁻₁₎¹⁾₋_j₊₁ and

θ⁻(l, j, i)=θ⁻(l, j, i−1)+min

t >0: Y_l^l₋⁻_θ^θ₋⁻_(l,j,i^(l,j,i₋⁻₁₎¹⁾₋⁻_j^t₊₁₋_t=Y_l^l₋⁻_θ^θ₋⁻_(l,j,i^(l,j,i₋⁻₁₎¹⁾₋_j₊₁ .

For a given 0≤l≤n−1, 1≤k≤n−1,y∈Y,r≥0 ands≥0, by Hoeffding’s inequality (cf. Theorem 2 in [4]) for sums of bounded independent random variables,

P

_r

h=11_{Y_l₋_θ−(l,k,h)₊₁=y}+_s

h=01_{Y_l₊_θ+(l,k,h)₊₁=y}

r+s+1 −p

y|Y_l^l₋_k₊₁ ≥0.5n⁻^β Y_l^l₋_k₊₁=y₋⁰_k₊₁

≤2e⁻^0.5n⁻^2β^(r⁺^s⁺¹⁾.

(6)

Multiplying both sides byP (Y_l^l₋_k₊₁=y₋⁰_k₊₁)and summing over all possibley₋⁰_k₊₁∈Y^kwe get that

P

Y_l^l₋_k₊₁∈Lⁿk,

r

h=11_{Yl−θ−(l,k,h)+1=y}+s

h=01_{Yl+θ+(l,k,h)+1=y}

r+s+1 −p

y|Y_l^l₋_k₊₁ >0.5n⁻^β

≤P

_r

h=11_{Y_l₋_θ−(l,k,h)₊₁=y}+_s

h=01_{Y_l₊_θ+(l,k,h)₊₁=0}

r+s+1 −p

y|Y_l^l₋_k₊₁ >0.5n⁻^β

≤2e⁻^0.5n⁻^2β^(r⁺^s⁺¹⁾.

Summing over all 0≤l≤n−1 and over all pairs(r, s)such thatr≥0,s≥0,r+s+1≥ n¹⁻^γ we get that

n−1

l=0

P

Y_l^l₋_k₊₁∈Lⁿ_k, pˆ_n

y|Y_l^l₋_k₊₁

−p

y|Y_l^l₋_k₊₁ > n⁻^β/2

≤n ∞ h=n¹⁻^γ

h2e⁻^0.5n⁻^2β^h.

Thus

y∈Y n−1

k=0 n−1

l=k−1

P

Y_l^l₋_k₊₁∈Lⁿk, pˆ_n

y|Y_l^l₋_k₊₁

−p

y|Y_l^l₋_k₊₁ > n⁻^β/2

≤2n²|Y|

∞ h=n^1−γ

he⁻ⁿ⁻^2β^h/2.

Now we give an upper bound on the sum on the right-hand side. Observe thathe⁻ⁿ^−2β^h/2 is monotone decreasing inh as soon as the derivative e⁻ⁿ⁻^2β^h/2−h0.5n⁻^2βhe⁻ⁿ⁻^2β^h/2 is negative forh > n¹⁻^γ which is the case forn >

2^1/(1⁻^γ⁻^2β). Using this fact, we bound the sum by the integral ∞

h=n¹⁻^γ

he⁻ⁿ⁻^2β^h/2≤ _∞

n¹⁻^γ

he⁻ⁿ⁻^2β^h/2dh.

Integrating by parts we get that _∞

n¹⁻^γ

he⁻ⁿ^−2β^h/2dh=

h −1

n⁻^2β/2e⁻ⁿ^−2β^h/2 _∞

n¹^−γ

− _∞

n¹⁻^γ

−1

n⁻^2β/2e⁻ⁿ^−2β^h/2dh

= n¹⁻^γ

n⁻^2β/2e⁻ⁿ^−2βⁿ^1−γ^/2− 1

(n⁻^2β/2)²e⁻ⁿ^−2β^h/2 _∞

n^1−γ

= n¹⁻^γ

n⁻^2β/2e⁻ⁿ^1−γ−2β^/2+ 1

(n⁻^2β/2)²e⁻ⁿ^1−γ−2β^/2

=

2n¹⁻^γ⁺^2β+4n^4β

e⁻ⁿ¹⁻^γ⁻^2β^/2

≤

2n²+4n²

e⁻ⁿ¹⁻^γ⁻^2β^/2 since by assumption 0< γ <1 and 0< β <¹⁻₂^γ.

Combining all these we get that forn >2^1/(1⁻^γ⁻^2β) P (gn=DEP)=PΓˆn> n⁻^β

≤2|Y|e⁻^0.5n¹⁻^2β+12n⁴|Y|e⁻ⁿ¹^−γ−^2β^/2

≤14n⁴|Y|e⁻ⁿ¹⁻^γ⁻^2β^/2.

The right-hand side is summable provided 2β+γ <1 and the Borel–Cantelli Lemma yields that PΓˆn≤n⁻^β eventually

=1

(7)

and sog_n=INDeventually almost surely. The proof of Theorem2is complete.

This test can be used as the input needed for Theorem 1. Now let(X,F)be a countably generated measurable space. As is well known any countably generated measurable space has a refining sequence of finite partitionsAk

such that _∞

k=1Aˆk generates the whole σ-algebraF on X whereAˆk denotes the algebra which is generated by partitionAk. Letg^(k)_n be the test constructed above for partitionAk. Then choosingbso as to satisfy the condition in Theorem1we have the following corollary.

Corollary 1. For all stationary and ergodicX-valued process{Xn}eventually almost surely,TEST^b_n(X₀ⁿ)=IND if the process is independent and eventually almost surely TEST^b_n(X₀ⁿ)=DEP if the process is not independent.

References

[1] D. H. Bailey. Sequential schemes for classifying and predicting ergodic processes. Ph.D. thesis, Stanford Univ., 1976.MR2626644 [2] A. Berger. On uniformly consistent tests.Ann. Math. Statist.22(1951) 289–293.MR0042653

[3] A. Dembo and Y. Peres. A topological criterion for hypothesis testing.Ann. Statist.22(1994) 106–117.MR1272078

[4] W. Hoeffding. Probability inequalities for sums of bounded random variables.J. Amer. Statist. Assoc.58(1963) 13–30.MR0144363 [5] W. Hoeffding and J. Wolfowitz. Distinguishability of sets of distributions.Ann. Math. Statist.29(1958) 700–718.MR0095555

[6] Ch. Kraft. Some conditions for consistency and uniform consistency of statistical procedures.Univ. California Publ. Statist.2(1955) 125–141.

MR0073896

[7] G. Morvai and B. Weiss. Order estimation of Markov chains.IEEE Trans. Inform. Theory51(2005) 1496–1497.MR2241507 [8] G. Morvai and B. Weiss. On classifying processes.Bernoulli11(2005) 523–532.MR2146893

[9] G. Morvai and B. Weiss. Estimating the lengths of memory words. IEEE Transactions on Information Theory54(2008) 3804–3807.

MR2451043

[10] A. Nobel. Hypothesis testing for families of ergodic processes.Bernoulli12(2006) 251–269.MR2218555 [11] D. Ornstein and B. Weiss. How sampling reveals a process.Ann. Probab.18(1990) 905–930.MR1062052

[12] B. Weiss. Some remarks on filtering and prediction of stationary processes.Israel J. Math.149(2005) 345–360.MR2191220