The law of series


HAL Id: hal-00016627

https://hal.archives-ouvertes.fr/hal-00016627

Preprint submitted on 9 Jan 2006



Tomasz Downarowicz, Yves Lacroix


ccsd-00016627, version 1 - 9 Jan 2006

THE LAW OF SERIES

T. Downarowicz and Y. Lacroix

January 8, 2006

Abstract. We prove a general ergodic-theoretic result concerning the return time statistic, which, properly understood, sheds some new light on the common sense phenomenon known as the law of series. Let $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ be an ergodic process on finitely many states, with positive entropy. We show that the distribution function of the normalized waiting time for the first visit to a small cylinder set $B$ is, for the majority of such cylinders and up to epsilon, dominated by the exponential distribution function $1 - e^{-t}$. This fact has the following interpretation: The occurrences of such a “rare event” $B$ can deviate from purely random in only one direction, so that for any length of an “observation period” of time, the first occurrence of $B$ “attracts” its further repetitions in this period.

Note

This paper resulted from studying asymptotic laws for return/hitting time statistics in stationary processes, a field in ergodic theory that has developed rapidly in recent years (see e.g. [A-G], [C], [C-K], [D-M], [H-L-V], [L] and the references therein). Our result contributes significantly to this area due to both its generality and the strength of the assertion. After having completely written the proof, during a free-ranging discussion, the authors discovered an astonishing interpretation of the result, clear even in terms of the common sense understanding of random processes. The organization of the paper is aimed at emphasizing this discovery. The consequences for the field of asymptotic laws are moved toward the end of the paper.

Introduction

The phenomenon known as the law of series appears in many aspects of everyday life, science and technology. In the common sense understanding it signifies a sudden increase of frequency of a rare event, seemingly violating the rules of probability. Let us quote from Jean Moisset ([Mo]):

This law can be defined as: the repetition of identical or analogous events, things or symbols in space or time; for example: the announcement of several similar accidents on the same day, a series of strange events experienced by someone on the same day which are either happy ones (a period of good luck) or unfavorable (disastrous) ones, or the repetition of unexpected similar events. For example, you are invited to dinner and you are served a roast beef and you note that you were served the same menu the day before at your uncle’s home and the day before that at your cousin’s home.

1991 Mathematics Subject Classification. 37A50, 37A35, 37A05, 60G10.

Key words and phrases. stationary random process, positive entropy, return time statistic, hitting time statistic, repelling, attracting, limit law, the law of series.

This paper was written during the first author’s visit at CPT/ISITV, supported by CNRS. The research of the first author is supported by DBN grant 1 P03A 021 29.


Another proverb describing more or less the same (with regard to unwanted events) is misfortune seldom comes alone. Both expressions exist in many languages, proving that the phenomenon has been commonly noticed throughout the world. In this setting it has been assigned to the category of unexplained mysteries, paraphysics and parapsychology, together with “malignancy of fate”, “Murphy laws”, etc. Many pseudoscientific experiments have been conducted to prove an obvious violation of statistical laws, where such laws were usually identified with the statistics of a series of independent trials ([St], [Km]). Equally many texts have been devoted to explaining the anomaly within the framework of an independent process (see e.g., [Mi]), or merely as a weakness of our memory, which tends to register unusual events as more frequent just because they are more distinctive.

The phenomenon is also known in more serious science. In modeling and statistics it is sometimes called “clustering of data”. It is experimentally observed in many real processes, such as traffic jams, telecommunication network overloads, power consumption peaks, demographic peaks, stock market fluctuations, etc., as periods of increased frequency of occurrences of certain rare events. These anomalies are usually explained in terms of physical dependence (periods of propitious conditions) and complicated algorithms are implemented in modeling these processes to simulate them.

But, to our knowledge, there was no logical construction proving, in full generality, that there exists a “natural” tendency of rare events to appear in series, and the result we will present in this paper, or any of its possible variants, remained until now unnoticed by the specialists. We prove an ergodic-theoretic theorem on stationary stochastic processes, in which a wide range of “rare events” is shown to behave either in a way which we call “unbiased” (i.e., as in an independent process), or else exactly as it is specified in the law of series, i.e., so that the first occurrence increases the chances of untimely repetitions. Roughly speaking, we prove that rare events appear in series whenever the unbiased behavior is perturbed: there is no other choice. Besides ergodicity (which is automatically satisfied if we observe a single realization of any process, chosen at random) we make only one essential, but obviously necessary, assumption on the process: it must maintain a “touch of randomness”, i.e., the future must not be completely determined by the past, which is equivalent to assuming positive entropy. Without this assumption a rotation of a compact group is an immediate example where events never appear in series. Of course, not every interesting rare event in reality can be modeled by the type of set we describe (cylinder over a long block), and we do not claim that our theorem fully explains the common sense phenomenon, but it certainly sheds some new light on it.

In terms of ergodic theory, we define two elementary antagonistic properties of the return times called “attracting” and “repelling”, and we prove that they behave quite differently in processes of zero and of positive entropy: attracting can persist for arbitrarily long blocks in both cases, while repelling must decay (as the length of blocks grows to infinity) in positive entropy processes. Many properties are known to differentiate between positive and zero entropy, but most of them involve a passage via measure-theoretic isomorphism, i.e., change of a generator, or require some additional structure. Our “decay of repelling” holds in general and for any finite generator, or even partition, as long as it generates positive entropy.

It is impossible not to mention here the theorem of Ornstein and Weiss [O-W2] which relates the return times of long blocks to entropy. However, this theorem says nothing about attracting or repelling, because the limit appearing in the statement is insensitive to the proportions between gap sizes. Nevertheless, this remarkable result is very useful and it will also help in our proof. The proof of our theorem is entirely contained within the classics of ergodic theory; it relies on basic facts on entropy for partitions and sigma-fields, some elements of the Ornstein theory ($\epsilon$-independence), the Shannon-McMillan-Breiman Theorem, the Ornstein-Weiss Theorem on return times, the Ergodic Theorem, basics of probability and calculus. We do not invoke any specialized machinery of stochastic processes or statistics.

The authors would like to thank Dan Rudolph for a hint leading to the construction of Example 2 and, in effect, to the discovery of the attracting/repelling asymmetry. We also thank Jean-Paul Thouvenot for his interest in the subject, substantial help, and the challenge to find a purely combinatorial proof (which we save for the future).

Rigorous definition and statement

We establish the notation necessary to formulate the main result. Let $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ be an ergodic process on finitely many symbols, i.e., $\#\mathcal{P} < \infty$, $\sigma$ is the standard left shift map and $\mu$ is an ergodic shift-invariant probability measure on $\mathcal{P}^{\mathbb{Z}}$. Most of the time, we will identify finite blocks with their cylinder sets, i.e., we agree that $\mathcal{P}^n = \bigvee_{i=0}^{n-1} \sigma^{-i}(\mathcal{P})$. Depending on the context, a block $B \in \mathcal{P}^n$ is attached to some coordinates or it represents a “word” which may appear in different places along the $\mathcal{P}$-names. We will also use the probabilistic language of random variables. Then $\mu\{R \in A\}$ ($A \subset \mathbb{R}$) will abbreviate $\mu(\{x \in \mathcal{P}^{\mathbb{Z}} : R(x) \in A\})$. Recall that if the random variable $R$ is nonnegative and $F(t) = \mu\{R \le t\}$ is its distribution function, then the expected value of $R$ equals $\int_0^\infty (1 - F(t))\,dt$.

For a set $B$ of positive measure let $\overline{R}_B$ and $R_B$ denote the random variables defined on $B$ (with the conditional measure $\mu_B = \frac{\mu}{\mu(B)}$) as the absolute and normalized first return time to $B$, respectively, i.e.,

$$\overline{R}_B(y) = \min\{i > 0 : \sigma^i(y) \in B\}, \qquad R_B(y) = \mu(B)\,\overline{R}_B(y).$$

Notice that, by the Kac Theorem ([Kc]), the expected value of $\overline{R}_B$ equals $\frac{1}{\mu(B)}$, hence that of $R_B$ is 1 (that is why we call it “normalized”). Let $F_B$ denote the distribution function of $R_B$. We also define

$$G_B(t) = \int_0^t (1 - F_B(s))\,ds.$$

(The interpretation of this function is discussed in the following section.) Clearly, $G_B(t) \le \min\{t, 1\}$ and the equality holds when $F_B = 1_{[1,\infty)}$, that is, when $B$ occurs precisely with equal gaps, i.e., periodically; the gap size then equals $\frac{1}{\mu(B)}$.
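The Kac Theorem can be illustrated on a finite sequence read cyclically, which serves here only as a stand-in for a typical orbit. The following Python sketch is an illustration under that assumption; the function names `visit_positions` and `return_times` are hypothetical and do not come from the paper. It lists the absolute return times of a block and makes visible that they sum to the length of the sequence, so that their mean is the inverse of the frequency of the block.

```python
def visit_positions(seq, block):
    """Start positions at which `block` occurs in `seq`, read cyclically.

    Assumes 1 <= len(block) <= len(seq).
    """
    n, m = len(seq), len(block)
    wrapped = seq + seq[:m - 1]  # lets a block straddle the end of seq
    return [i for i in range(n) if wrapped[i:i + m] == block]


def return_times(seq, block):
    """Absolute return times (gaps between consecutive cyclic visits)."""
    n = len(seq)
    pos = visit_positions(seq, block)
    gaps = []
    for k, current in enumerate(pos):
        nxt = pos[(k + 1) % len(pos)]
        # cyclic distance in 1..n (a single visit returns after n steps)
        gaps.append((nxt - current - 1) % n + 1)
    return gaps


def mean_return_time(seq, block):
    """Mean gap; for a cyclic sequence it equals len(seq) / number of visits."""
    gaps = return_times(seq, block)
    return sum(gaps) / len(gaps)
```

Dividing each gap by the mean gap gives the normalized return times, whose mean is 1 by construction.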

The key notions of this work are defined below:

Definition 1. We say that the visits to $B$ repel (resp. attract) each other with intensity $\epsilon$ from a distance $t > 0$, if

$$G_B(t) \ge 1 - e^{-t} + \epsilon \quad (\text{resp. if } G_B(t) \le 1 - e^{-t} - \epsilon).$$

We abbreviate that $B$ repels (attracts) with intensity $\epsilon$ if its visits repel (attract) each other with intensity $\epsilon$ from some distance $t$.
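As an illustration of Definition 1, the function $G_B$ can be evaluated on an empirical sample of return times through the identity $\int_0^t (1 - F(s))\,ds = E[\min(R, t)]$, valid for a nonnegative random variable $R$ with distribution function $F$. The sketch below is not part of the original argument; it assumes the gaps are given as a plain list, normalizes them by their empirical mean, and uses hypothetical names (`G`, `deviation`).

```python
import math


def normalized_gaps(gaps):
    """Rescale the gaps so that their mean equals 1."""
    mean_gap = sum(gaps) / len(gaps)
    return [g / mean_gap for g in gaps]


def G(gaps, t):
    """Empirical G(t), computed as the mean of min(R, t) over normalized gaps."""
    r = normalized_gaps(gaps)
    return sum(min(x, t) for x in r) / len(r)


def deviation(gaps, t):
    """G(t) - (1 - e^{-t}): positive means repelling from distance t,
    negative means attracting, and the absolute value is the intensity."""
    return G(gaps, t) - (1.0 - math.exp(-t))


periodic = [5] * 100            # equal gaps
clustered = [1] * 99 + [901]    # one cluster followed by a long pause
```

With equal gaps, `deviation(periodic, 1.0)` evaluates $\min\{1, 1\} - (1 - e^{-1})$, while the clustered sample gives a negative value at the same distance.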

Obviously, occurrences of an event may simultaneously repel from one distance and attract from another. Notice that the maximal intensity of repelling is $e^{-1}$, achieved at $t = 1$ when $B$ appears periodically. The intensity of attracting can be arbitrarily close to 1 (when $B$ appears in enormous clusters separated by huge pauses; see the next section). The main result follows:

Theorem 1. If $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is ergodic and has positive entropy, then for every $\epsilon > 0$ the measure of the union of all $n$-blocks $B \in \mathcal{P}^n$ which repel with intensity $\epsilon$ converges to zero as $n$ grows to infinity.

We also provide an example (Example 2) in which, for a substantial collection of lengths, the majority of cylinders display strong attracting. Moreover, the process of Example 2 is isomorphic to a Bernoulli process, which implies that a partition with such strong attracting properties can be found in any measure-preserving transformation of positive entropy (see Remark 3). Let us mention that it is easy to find zero entropy examples with either persistent repelling (discrete spectrum) or attracting (see Example 3), or even both at a time (see Remark 4).

Interpretation and its limits

Let us elaborate a bit on the meaning of attracting and repelling for an event $B$. Let $\overline{V}_B$ be the random variable defined on $X = \mathcal{P}^{\mathbb{Z}}$ as the hitting time statistic, i.e., the waiting time for the first visit in $B$ (the defining formula is the same as for $\overline{R}_B$, but this time it is regarded on $X$ with the measure $\mu$). Further, let $V_B = \mu(B)\overline{V}_B$, called, by analogy, the normalized hitting time (although the expected value of this variable need not be equal to 1). By ergodicity, $\overline{V}_B$ and $V_B$ are well defined. By an elementary consideration of the skyscraper above $B$, one easily verifies that the distribution function $\tilde F_B$ of $V_B$ satisfies the inequalities:

$$G_B(t) - \mu(B) \le \tilde F_B(t) \le G_B(t)$$

(see [H-L-V] for more details). Because we deal with long blocks (so that, by the Shannon-McMillan-Breiman Theorem, $\mu(B)$ is, with high probability, very small), for the sake of the interpretation, we will simply assume that $\tilde F_B = G_B$. Thus, attracting and repelling can be considered properties of the hitting rather than return time statistic. In fact, if we replace $G_B$ by $\tilde F_B$ in the definition of attracting/repelling, the formulation of Theorem 1 remains exactly the same, because it admits tolerance up to a fixed $\epsilon$.

It is easy to see that if $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is an independent Bernoulli process, then, for any long block $B$, $F_B(t) \approx 1 - e^{-t}$ (and also $\tilde F_B(t) \approx 1 - e^{-t}$) with high uniform accuracy. We will call such behavior “unbiased” (neither attracting nor repelling), and attracting and repelling can be viewed as deviations from the unbiased pattern. Fix some $t > 0$. Consider the random variable $I$ counting the number of occurrences of $B$ in the time period $[0, \frac{t}{\mu(B)}]$. The expected value of $I$ equals $\mu(B)\lfloor \frac{t}{\mu(B)} \rfloor \approx t$ (up to the ignorable error $\mu(B)$). On the other hand, $\mu\{I > 0\} = \mu\{\overline{V}_B \le \frac{t}{\mu(B)}\} = \tilde F_B(t)$. Attracting from the distance $t$ occurs when the last value is smaller than it would be (say, for the same cylinder $B$) in an independent process. Because the expected value of $I$ in the independent process is maintained (and equals approximately $t$), the conditional expected value of $I$ on the set $\{I > 0\}$ must be larger in $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ than in the independent process. This fact can be interpreted as follows: If we observe the process for time $\frac{t}{\mu(B)}$ (which is our “memory length” or “lifetime of the observer”) and we happen to see the event $B$ during this time at least once, then the expected number of times we will observe the event $B$ is larger than the analogous value in the independent process. The first occurrence of $B$ “attracts” further repetitions (misfortune seldom comes alone).
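The comparison of conditional expectations made above rests on the identity $E[I] = E[I \mid I > 0] \cdot \mu\{I > 0\}$, which holds for any nonnegative integer-valued $I$. The following sketch only illustrates this arithmetic; it takes $E[I] = t$ and $\mu\{I > 0\} = \tilde F_B(t)$ as given, ignores the error of order $\mu(B)$, and uses hypothetical function names and a sample intensity of 0.2 chosen purely for illustration.

```python
import math


def unbiased_hit_prob(t):
    """Value of 1 - e^{-t}, the unbiased probability of at least one visit."""
    return 1.0 - math.exp(-t)


def conditional_mean_count(t, hit_prob):
    """E[I | I > 0] = E[I] / mu{I > 0}, with E[I] taken to be t."""
    return t / hit_prob


t = 1.0
intensity = 0.2  # assumed intensity of attracting, for illustration only
unbiased = conditional_mean_count(t, unbiased_hit_prob(t))
attracting = conditional_mean_count(t, unbiased_hit_prob(t) - intensity)
```

Lowering the probability of seeing the event at all, while keeping the mean count fixed, necessarily raises the mean count among the observers who do see it.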

repelling     ...B...B...B...B...B...B...B....B...B....B..B....B...B..
unbiased      ...B...B....B..B....B...B...B..B...B..B.B...B...B..
attracting    ...B...B..B.B..B...B...BB...B.BB...B...B..
strong attr.  ...BBB.BB...B...BBB.BB.BB...

Figure 1: Comparison between unbiased, repelling and attracting distributions of copies of a block.

Attracting with intensity close to 1 occurs when $G_B$ is very “flat” (close to zero on a long initial interval). Then $F_B$ is immediately very close to 1, indicating that on most of $B$ the first return time is much smaller than $\frac{1}{\mu(B)}$. Of course, this must be compensated on a small part of $B$ by extremely large values of the return time. This means that the visits to $B$ occur in enormous clusters of very high frequency, compensated by huge pauses with no (or very few) visits. Such a pattern will be called “strong attracting” and it will take place in some of our examples.

Repelling from the distance $t$ means exactly the opposite: The first occurrence lowers the expected number of repetitions within the observation period, i.e., repels them. If we have a mixed behavior, our impression about whether the event attracts or repels its repetitions depends on the length of our “memory”. Attracting not assisted by repelling (or assisted by repelling of an ignorably small intensity) means that no matter what memory length we apply, either we see a nearly unbiased behavior or the first occurrence visibly attracts further repetitions. Our Theorem 1 asserts that if we observe longer and longer blocks $B$, repelling from any distance must decay in both measure and intensity (while attracting can persist), so that for the majority of long blocks we will see the behavior as described above.

We also note that by pushing the graph of $\tilde F_B$ downward (compared to $1 - e^{-t}$), attracting contributes to increasing the expected value of the associated random variable, i.e., of the hitting time. In the case of attracting assisted by repelling of only very small intensity, the average waiting time for the first occurrence of the event $B$ is increased in comparison to unbiased (it may even fail to exist). Thus, instinctively judging the probability of the event by (the inverse of) the waiting time for the first occurrence, we will typically underestimate it. We are all the more surprised when the following occurrences happen after a considerably shorter time. This additionally strengthens the phenomenon’s appearance.

Another consequence of attracting not assisted by repelling (or assisted by repelling of a very small intensity) is an increased variance of the return time statistic (the variance may even cease to exist). Thus, again, the gaps between the occurrences of $B$ are driven away from the expected value, toward the extremities 0 and $\infty$, and hence, into the pattern of clusters separated by longer pauses. We skip the elementary estimations of the variance.

A reminder is in order: we do not claim, for any class of processes, that occurrences of long blocks will actually deviate from unbiased. There are conditions, weaker than full independence, under which the distributions of the normalized return times of long blocks converge almost surely to the exponential law. It is so, for instance, in Markov processes (with finite memory). In fact, such convergence is implied by a sufficient rate of mixing ([A-G], [H-S-V]). Yet, such processes seem to be somewhat exceptional and we expect that attracting prevails in the majority of processes (see Question 5 at the end of the paper). As we have already mentioned, at least this much is true: in any dynamical system with positive entropy there exist partitions with strong attracting properties.


It is important not to be misled by an oversimplified approach. The “decay of repelling” in positive entropy processes appears to agree with the intuitive understanding of entropy as chaos: repelling is a “self-organizing” property; it leads to a more uniform, hence less chaotic, distribution of an event along a typical orbit. Thus one might expect that repelling with intensity $\epsilon$ revealed by a fraction $\xi$ of all $n$-blocks contributes to lowering an upper estimate of the entropy by some percentage proportional to $\xi$ and depending increasingly on $\epsilon$. If this happens for infinitely many lengths $n$ with the same parameters $\xi$ and $\epsilon$, the entropy should be driven to zero by a geometric progression. Surprisingly, it is not quite so, and the phenomenon has more subtle grounds. We will present an example which exhibits the incorrectness of such intuition (see Example 1 and the preceding discussion). Also, it will become obvious from the proof that there is no gradual reduction of the entropy. The entropy is “killed completely in one step”, that is, positive entropy and persistent repelling lead to a contradiction by examining the blocks of one sufficiently large length $n$; we do not use any iterated procedure requiring repelling for infinitely many lengths.

Notation and preliminary facts

We now establish further notation and preliminaries needed in the proof. If $A \subset \mathbb{Z}$ then we will write $\mathcal{P}^A$ to denote the partition or sigma-field $\bigvee_{i\in A} \sigma^{-i}(\mathcal{P})$. We will abbreviate $\mathcal{P}^n = \mathcal{P}^{[0,n)}$, $\mathcal{P}^{-n} = \mathcal{P}^{[-n,-1]}$, $\mathcal{P}^{-} = \mathcal{P}^{(-\infty,-1]}$ (a “finite future”, a “finite past”, and the “full past” of the process).

We assume familiarity of the reader with the basics of entropy for finite partitions and sigma-fields in a standard probability space. Our notation is compatible with [P] and we refer the reader to this book, as well as [Sh] and [W], for background and proofs. In particular, we will be using the following:

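* The entropy of a partition equals $H(\mathcal{P}) = -\sum_{A\in\mathcal{P}} \mu(A) \log_2(\mu(A))$.

* For two finite partitions $\mathcal{P}$ and $\mathcal{B}$, the conditional entropy $H(\mathcal{P}|\mathcal{B})$ is equal to $\sum_{B\in\mathcal{B}} \mu(B) H_B(\mathcal{P})$, where $H_B$ is the entropy evaluated for the conditional measure $\mu_B$ on $B$.

* The same formula holds for conditional entropy given a sub-sigma-field $\mathcal{C}$, i.e., $\sum_{B\in\mathcal{B}} \mu(B) H_B(\mathcal{P}|\mathcal{C}) = H(\mathcal{P}|\mathcal{B} \vee \mathcal{C})$.

* The entropy of the process is given by any one of the formulas below: $h = H(\mathcal{P}|\mathcal{P}^{-}) = \frac{1}{r} H(\mathcal{P}^r|\mathcal{P}^{-}) = \lim_{r\to\infty} \frac{1}{r} H(\mathcal{P}^r)$.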
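The first two formulas in the list above can be evaluated directly for finite partitions. The following Python sketch is only an illustration; it assumes the two partitions are described by a table `joint[b][a]` holding $\mu(A_a \cap B_b)$, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """H = -sum of p * log2(p), with the convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(P|B) as the sum over B of mu(B) * H_B(P).

    joint[b][a] is the measure of the intersection of A_a and B_b.
    """
    total = 0.0
    for row in joint:
        mass = sum(row)  # mu(B_b)
        if mass > 0:
            # entropy of P for the conditional measure on B_b
            total += mass * entropy([x / mass for x in row])
    return total


independent = [[0.25, 0.25], [0.25, 0.25]]  # P independent of B
determined = [[0.5, 0.0], [0.0, 0.5]]       # P determined by B
```

For the independent table the conditional entropy equals the entropy of the marginal, and for the determined table it vanishes, in agreement with the remark that independence is equivalent to $H(\mathcal{P}|\mathcal{B}) = H(\mathcal{P})$.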

We will exploit the notion of $\epsilon$-independence for partitions and sigma-fields. The definition below is an adaptation from [Sh], where it concerns finite partitions only. See also [Sm] for treatment of countable partitions. Because “$\epsilon$” is reserved for the intensity of repelling, we will speak about $\beta$-independence.

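Definition 2. Fix $\beta > 0$. A partition $\mathcal{P}$ is said to be $\beta$-independent of a sigma-field $\mathcal{B}$ if for any $\mathcal{B}$-measurable countable partition $\mathcal{B}'$ holds

$$\sum_{A\in\mathcal{P},\,B\in\mathcal{B}'} |\mu(A \cap B) - \mu(A)\mu(B)| \le \beta.$$

A process $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is called a $\beta$-independent process if $\mathcal{P}$ is $\beta$-independent of the past $\mathcal{P}^{-}$.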
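For two finite partitions the quantity controlled in Definition 2 can be computed from the joint table. The sketch below assumes the standard form of the defect from [Sh], namely the sum of $|\mu(A \cap B) - \mu(A)\mu(B)|$ over all pairs; both this form and the function name `independence_defect` are assumptions made for illustration.

```python
def independence_defect(joint):
    """Sum over all pairs (A, B) of |mu(A ∩ B) - mu(A) * mu(B)|.

    joint[b][a] is the measure of the intersection of A_a and B_b.
    """
    mu_B = [sum(row) for row in joint]          # marginal of B
    mu_A = [sum(col) for col in zip(*joint)]    # marginal of A
    return sum(
        abs(joint[b][a] - mu_A[a] * mu_B[b])
        for b in range(len(mu_B))
        for a in range(len(mu_A))
    )


def is_beta_independent(joint, beta):
    """True if the defect does not exceed beta (finite partitions only)."""
    return independence_defect(joint) <= beta
```

The defect vanishes exactly for independent partitions, so the table with all entries 0.25 is $\beta$-independent for every $\beta > 0$, while a partition fully determined by the other one is not for small $\beta$.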

A partition P is independent of another partition or a sigma-field B if and only if H(P|B) = H(P). The following approximate version of this fact holds (see [Sh, Lemma 7.3] for finite partitions, from which the case of a sigma-field is easily derived).

Fact 1. A partition $\mathcal{P}$ is $\beta$-independent of another partition or a sigma-field $\mathcal{B}$ if $H(\mathcal{P}|\mathcal{B}) \ge H(\mathcal{P}) - \xi$, for $\xi$ sufficiently small.

In the course of the proof, a certain lengthy condition will be in frequent use. Let us introduce an abbreviation:

Definition 3. Given a partition $\mathcal{P}$ of a space with a probability measure $\mu$ and $\delta > 0$, we will say that a property $\Phi(A)$ holds for $A \in \mathcal{P}$ with $\mu$-tolerance $\delta$ if

$$\mu\Big(\bigcup\{A \in \mathcal{P} : \Phi(A)\}\Big) \ge 1 - \delta.$$

We shall also need an elementary estimate, whose proof is an easy exercise.

Fact 2. For each $A \in \mathcal{P}$, $H(\mathcal{P}) \le (1 - \mu(A)) \log_2(\#\mathcal{P}) + 1$.
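Fact 2 is easy to test numerically. The following sketch is a sanity check on a few hand-picked probability vectors, not a proof; the names `fact2_bound` and `fact2_holds` are hypothetical.

```python
import math


def entropy(probs):
    """H = -sum of p * log2(p), with the convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def fact2_bound(probs, a):
    """Right-hand side of Fact 2 for the element with index a."""
    return (1.0 - probs[a]) * math.log2(len(probs)) + 1.0


def fact2_holds(probs):
    """Check H(P) <= (1 - mu(A)) * log2(#P) + 1 for every element A."""
    h = entropy(probs)
    return all(h <= fact2_bound(probs, a) for a in range(len(probs)))


one_big_atom = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25] * 4
```

The estimate is useful when one element carries nearly all the mass, as in `one_big_atom`: the bound taken for that element is then close to 1, regardless of how many small elements there are.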

In addition to the random variables of absolute and normalized return times $\overline{R}_B$ and $R_B$, we will also use the analogous notions of the $k$th absolute return time

$$\overline{R}^{(k)}_B(y) = \min\{i : \#\{0 < j \le i : \sigma^j(y) \in B\} = k\},$$

and of the normalized $k$th return time $R^{(k)}_B = \mu(B)\overline{R}^{(k)}_B$ (both defined on $B$), with $F^{(k)}_B$ always denoting the distribution function of the latter. Clearly, the expected value of $R^{(k)}_B$ equals $k$.

The idea of the proof and the basic lemma

Before we pass to the formal proof of Theorem 1, we would like to orient the reader in the main idea behind it. We intend to estimate (from above, by $1 - e^{-t} + \epsilon$) the function $G_{BA}$, for long blocks of the form $BA \in \mathcal{P}^{[-n,r)}$. The “positive” part $A$ has a fixed length $r$, while we allow the “negative” part $B$ to be arbitrarily long. There are two key ingredients leading to the estimation. The first one, contained in Lemma 3, is the observation that for a fixed typical $B \in \mathcal{P}^{-n}$, the part of the process induced on $B$ (with the conditional measure $\mu_B$) generated by the partition $\mathcal{P}^r$ is not only a $\beta$-independent process, but it is also $\beta$-independent of many return times $R^{(k)}_B$ of the cylinder $B$ (see Figure 2).

              coordinate 0
                   ↓
... B A_{-1} ... B A_0 .. B A_1 ... B A_2 .... B A_3 ....

Figure 2: The process $\ldots A_{-1}A_0A_1A_2\ldots$ of $r$-blocks following the copies of $B$ is a $\beta$-independent process.

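This allows us to decompose (with high accuracy) the distribution function $F_{BA}$ of the normalized return time of $BA$ as follows:

$$
\begin{aligned}
F_{BA}(t) &= \mu_{BA}\{R_{BA} \le t\} = \mu_{BA}\{\overline{R}_{BA} \le \tfrac{t}{\mu(BA)}\} \\
&= \sum_{k\ge1} \mu_{BA}\{\overline{R}^{(B)}_A = k,\ \overline{R}^{(k)}_B \le \tfrac{t}{p\mu(B)}\} \\
&\approx \sum_{k\ge1} \mu_{BA}\{\overline{R}^{(B)}_A = k\} \cdot \mu_B\{R^{(k)}_B \le \tfrac{t}{p}\} \\
&\approx \sum_{k\ge1} p(1-p)^{k-1} \cdot F^{(k)}_B(\tfrac{t}{p}),
\end{aligned}
$$

where $\overline{R}^{(B)}_A$ denotes the first (absolute) return time of $A$ in the process induced on $B$, and $p = \mu_B(A)$.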

The second key observation is, assuming for simplicity full independence, that when trying to model some repelling for the blocks $BA$, we ascertain that it is largest when the occurrences of $B$ are purely periodic. Any deviation from periodicity of the $B$’s may only lead to increasing the intensity of attracting between the copies of $BA$, never that of repelling. We will explain this phenomenon more formally in a moment. Now, if $B$ does appear periodically, then the normalized return time of $BA$ is governed by the same geometric distribution as the normalized return time of $A$ in the independent process induced on $B$. If $p$ is small, this geometric distribution function becomes nearly the unbiased exponential law $1 - e^{-t}$. The smallness of $p$ is a priori regulated by the choice of the parameter $r$ (Lemma 1).

The phenomenon that, assuming full independence, the repelling of $BA$ is maximized by periodic occurrences of $B$, and that even then there is nearly no repelling, is captured by the following elementary lemma, which will also be useful later, near the end of the rigorous proof.

Lemma 0. Fix some $p \in (0, 1)$. Let $F^{(k)}$ ($k \ge 1$) be a sequence of distribution functions on $[0, \infty)$ such that the expected value of the distribution associated to $F^{(k)}$ equals $k$. Define

$$F(t) = \sum_{k\ge1} p(1-p)^{k-1} F^{(k)}(\tfrac{t}{p}), \quad\text{and}\quad G(t) = \int_0^t (1 - F(s))\,ds.$$

Then

$$G(t) \le \frac{1}{\log e_p}\,(1 - e_p^{-t}),$$

where $e_p = (1-p)^{-\frac{1}{p}}$.

Proof. We have

$$G(t) = \sum_{k\ge1} p(1-p)^{k-1} \int_0^t \big(1 - F^{(k)}(\tfrac{s}{p})\big)\,ds.$$

We know that $F^{(k)}(t) \in [0, 1]$ and that $\int_0^\infty (1 - F^{(k)}(s))\,ds = k$ (the expected value). With such constraints, it is the indicator function $1_{[k,\infty)}$ that maximizes the integrals from 0 to $t$ simultaneously for every $t$ (because the “mass” $k$ above the graph is for such choice of the function swept maximally to the left). The rest follows by direct calculations:

$$G(t) \le \sum_{k\ge1} p(1-p)^{k-1} \int_0^t 1_{[0,k)}(\tfrac{s}{p})\,ds = \int_0^t \sum_{k=\lceil \frac{s}{p}\rceil}^{\infty} p(1-p)^{k-1}\,ds = \int_0^t (1-p)^{\lceil \frac{s}{p}\rceil}\,ds \le \frac{(1-p)^{\frac{t}{p}} - 1}{\log (1-p)^{\frac{1}{p}}}.$$


Recall that the maximizing distribution functions $F^{(k)}_B = 1_{[k,\infty)}$ occur, for the normalized return time of a set $B$, precisely when $B$ is visited periodically. This explains our former statement on this subject.
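In the periodic, fully independent case discussed above, the normalized return time of $BA$ equals $pK$, where $K$ is geometric with parameter $p$, so its distribution function is $1 - (1-p)^{\lfloor t/p \rfloor}$. The sketch below, an illustration with hypothetical names and an arbitrarily chosen grid, measures how far this function is from the exponential law $1 - e^{-t}$ for a given $p$.

```python
import math


def geometric_df(t, p):
    """mu{pK <= t} for K geometric: mu{K = k} = p * (1 - p)^(k - 1), k >= 1."""
    return 1.0 - (1.0 - p) ** math.floor(t / p)


def exponential_df(t):
    """The unbiased law 1 - e^{-t}."""
    return 1.0 - math.exp(-t)


def max_gap_to_exponential(p, t_max=10.0, steps=10000):
    """Largest |geometric_df - exponential_df| over a uniform grid on [0, t_max]."""
    grid = (t_max * i / steps for i in range(steps + 1))
    return max(abs(geometric_df(t, p) - exponential_df(t)) for t in grid)
```

For small values such as `p = 0.001` the two functions are nearly indistinguishable on the grid, whereas for `p = 0.5` the steps of the geometric law are clearly visible; this is the sense in which the smallness of $p$ matters.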

Let us comment a bit more on the first key ingredient, the $\beta$-independence. Establishing it is the most complicated part of the argument. The idea is to prove conditional (given a “finite past” $\mathcal{P}^{-n}$) $\beta$-independence of the “present” $\mathcal{P}^r$ from jointly the full past and a large part of the future, responsible for the return times of the majority of the blocks $B \in \mathcal{P}^{-n}$. But the future part must not be too large. Let us mention the existence of “bilaterally deterministic” processes with positive entropy (first discovered by Gurevič [G], see also [O-W1]), in which the sigma-fields generated by the coordinates $(-\infty, -m] \cup [m, \infty)$ do not decrease with $m$ to the Pinsker factor; they are all equal to the entire sigma-field. (Coincidentally, our Example 1 has precisely this property; see Remark 2.) Thus, in order to maintain any trace of independence of the “present” from our sigma-field already containing the entire past, its part in the future must be selected with extreme care. Let us also remark that an attempt to save on the future sigma-fields by adjusting them individually to each block $B_0 \in \mathcal{P}^{-n}$ falls short, mainly because of the “off diagonal effect”: suppose $\mathcal{P}^r$ is conditionally (given $\mathcal{P}^{-n}$) nearly independent of a sigma-field which determines the return times of only one selected block $B_0 \in \mathcal{P}^{-n}$. The independence still holds conditionally given any cylinder $B \in \mathcal{P}^{-n}$ from a collection of a large measure, but unfortunately, this collection can always miss the selected cylinder $B_0$. In Lemmas 2 and 3, we succeed in finding a sigma-field (containing the full past and a part of the future), of which $\mathcal{P}^r$ is conditionally $\beta$-independent, and which “nearly determines”, for the majority of blocks $B \in \mathcal{P}^{-n}$, some finite number of their sequential return times (probably not all of them). This finite number is sufficient to allow the decomposition of the distribution function $F_{BA}$ described earlier.

The proof

Throughout the sequel we assume ergodicity and that the entropy $h$ of $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is positive. We begin our computations with an auxiliary lemma allowing us to assume (by replacing $\mathcal{P}$ by some $\mathcal{P}^r$) that the elements of the “present” partition are small, relatively in most of $B \in \mathcal{P}^n$ and for every $n$. Note that the Shannon-McMillan-Breiman Theorem is insufficient: for the conditional measure the error term depends increasingly on $n$, which we do not fix.

Lemma 1. For each $\delta$ there exists an $r \in \mathbb{N}$ such that for every $n \in \mathbb{N}$ the following holds for $B \in \mathcal{P}^{-n}$ with $\mu$-tolerance $\delta$: for every $A \in \mathcal{P}^r$, $\mu_B(A) \le \delta$.

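Proof. Let $\alpha$ be so small that $\sqrt{\alpha} \le \delta$ and $\frac{h - 3\sqrt{\alpha}}{h + \alpha} \ge 1 - \frac{\delta}{2}$, and set $\gamma = \frac{\alpha}{\log_2(\#\mathcal{P})}$. Let $r$ be so big that

$$\frac{1}{r} \le \alpha, \qquad \frac{1}{r(h+\alpha)} \le \frac{\delta}{2},$$

and that there exists a collection $\underline{\mathcal{P}^r}$ of no more than $2^{r(h+\alpha)} - 1$ elements of $\mathcal{P}^r$ whose joint measure $\mu$ exceeds $1 - \gamma$ (by the Shannon-McMillan-Breiman Theorem). Let $\widetilde{\mathcal{P}^r}$ denote the partition into the elements of $\underline{\mathcal{P}^r}$ and the complement of their union, and let $\mathcal{R}$ be the partition into the remaining elements of $\mathcal{P}^r$ and the complement of their union, so that $\mathcal{P}^r = \widetilde{\mathcal{P}^r} \vee \mathcal{R}$. For any $n$ we have

$$
\begin{aligned}
rh = H(\mathcal{P}^r|\mathcal{P}^{-}) &\le H(\mathcal{P}^r|\mathcal{P}^{-n}) = H(\widetilde{\mathcal{P}^r} \vee \mathcal{R}|\mathcal{P}^{-n}) \\
&= H(\widetilde{\mathcal{P}^r}|\mathcal{R} \vee \mathcal{P}^{-n}) + H(\mathcal{R}|\mathcal{P}^{-n}) \le H(\widetilde{\mathcal{P}^r}|\mathcal{P}^{-n}) + H(\mathcal{R}) \\
&\le \sum_{B\in\mathcal{P}^{-n}} \mu(B) H_B(\widetilde{\mathcal{P}^r}) + \gamma r \log_2(\#\mathcal{P}) + 1
\end{aligned}
$$

(we have used Fact 2 for the last passage). After dividing by $r$, we obtain

$$\sum_{B\in\mathcal{P}^{-n}} \mu(B)\,\tfrac{1}{r} H_B(\widetilde{\mathcal{P}^r}) \ge h - \gamma \log_2(\#\mathcal{P}) - \tfrac{1}{r} \ge h - 2\alpha.$$

Because each term $\frac{1}{r} H_B(\widetilde{\mathcal{P}^r})$ is not larger than $\frac{1}{r}\log_2(\#\widetilde{\mathcal{P}^r})$, which was set to be at most $h + \alpha$, we deduce that

$$\tfrac{1}{r} H_B(\widetilde{\mathcal{P}^r}) \ge h - 3\sqrt{\alpha}$$

holds for $B \in \mathcal{P}^{-n}$ with $\mu$-tolerance $\sqrt{\alpha}$, hence also with $\mu$-tolerance $\delta$. On the other hand, by Fact 2, for any $B$ and $A \in \widetilde{\mathcal{P}^r}$, holds:

$$H_B(\widetilde{\mathcal{P}^r}) \le (1 - \mu_B(A)) \log_2(\#\widetilde{\mathcal{P}^r}) + 1 \le (1 - \mu_B(A))\,r(h+\alpha) + 1.$$

Combining the last two displayed inequalities we establish that, with $\mu$-tolerance $\delta$ for $B \in \mathcal{P}^{-n}$ and then for every $A \in \widetilde{\mathcal{P}^r}$, holds

$$1 - \mu_B(A) \ge \frac{h - 3\sqrt{\alpha}}{h + \alpha} - \frac{1}{r(h+\alpha)} \ge 1 - \delta.$$

So, $\mu_B(A) \le \delta$. Because $\mathcal{P}^r$ refines $\widetilde{\mathcal{P}^r}$, the elements of $\mathcal{P}^r$ are not larger.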

We continue the proof with a lemma which can be deduced from [R1, Lemma 3]. We provide a direct proof. For $\alpha > 0$ and $M \in \mathbb{N}$ let

$$S(M, \alpha) = \bigcup_{m\in\mathbb{Z}} [mM + \alpha M, (m+1)M - \alpha M) \cap \mathbb{Z}.$$

Lemma 2. For fixed $\alpha$ and $r$ there exists $M_0$ such that for every $M \ge M_0$ holds

$$H(\mathcal{P}^r|\mathcal{P}^{-} \vee \mathcal{P}^{S(M,\alpha)}) \ge rh - \alpha$$

(see Figure 3).

∗∗∗∗∗∗∗∗∗∗∗◦◦..∗∗∗∗∗∗∗∗∗∗∗∗...∗∗∗∗∗∗∗∗∗∗∗∗...∗∗∗∗∗∗∗∗∗∗∗∗...

Figure 3. The circles indicate the coordinates 0 through $r - 1$; the conditioning sigma-field is over the coordinates marked by stars, which include the entire past and part of the future with gaps of size $2\alpha M$ repeated periodically with period $M$ (the first gap is half the size).
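The set $S(M, \alpha)$ is periodic with period $M$, so membership of an integer depends only on its residue modulo $M$. The following sketch is an illustration with hypothetical function names; it uses exact rational arithmetic for $\alpha$ (an implementation choice, to avoid rounding at the endpoints) and lists the part of $S(M, \alpha)$ inside a window.

```python
from fractions import Fraction


def in_S(j, M, alpha):
    """True if the integer j belongs to S(M, alpha)."""
    r = j % M  # position of j inside its period; nonnegative in Python
    return alpha * M <= r < M - alpha * M


def S_window(M, alpha, lo, hi):
    """Elements of S(M, alpha) in the integer interval [lo, hi)."""
    return [j for j in range(lo, hi) if in_S(j, M, alpha)]


future = S_window(10, Fraction(1, 5), 0, 20)
past = S_window(10, Fraction(1, 5), -10, 0)
```

With $M = 10$ and $\alpha = 1/5$, the excluded coordinates between two consecutive pieces form a gap of size $2\alpha M = 4$, while the first gap to the right of coordinate 0 has size $\alpha M = 2$, as described in the caption of Figure 3.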


Proof. First assume that $r = 1$. Denote also

$$S'(M, \alpha) = \bigcup_{m\in\mathbb{Z}} [mM + \alpha M, (m+1)M) \cap \mathbb{Z}.$$

Let $M$ be so large that $H(\mathcal{P}^{(1-\alpha)M}) < (1-\alpha)M(h + \gamma)$, where $\gamma = \frac{\alpha^2}{2(1-\alpha)}$. Then, for any $m \ge 1$,

$$H(\mathcal{P}^{S'(M,\alpha)\cap[0,mM)}|\mathcal{P}^{-}) \le H(\mathcal{P}^{S'(M,\alpha)\cap[0,mM)}) < (1-\alpha)mM(h + \gamma).$$

Because $H(\mathcal{P}^{[0,mM)}|\mathcal{P}^{-}) = mMh$, the complementary part of entropy must exceed $mMh - (1-\alpha)mM(h+\gamma)$ (which equals $\alpha mM(h - \frac{\alpha}{2})$), i.e., we have

$$H(\mathcal{P}^{[0,mM)\setminus S'(M,\alpha)}|\mathcal{P}^{-} \vee \mathcal{P}^{S'(M,\alpha)\cap[0,mM)}) > \alpha mM(h - \tfrac{\alpha}{2}).$$

Breaking the last entropy term as a sum over $j \in [0, mM) \setminus S'(M, \alpha)$ of the conditional entropies of $\sigma^{-j}(\mathcal{P})$ given the sigma-field over all coordinates left of $j$ and all coordinates from $S'(M, \alpha) \cap [0, mM)$ right of $j$, and because every such term is at most $h$, we deduce that more than half of these terms reach or exceed $h - \alpha$. So, a term not smaller than $h - \alpha$ occurs for a $j$ within one of the gaps in the left half of $[0, mM)$. Shifting by $j$, we obtain

$$H(\mathcal{P}|\mathcal{P}^{-} \vee \sigma^{i}(\mathcal{P}^{S'(M,\alpha)\cap[0,\frac{mM}{2})})) \ge h - \alpha,$$

where $i \in [0, \alpha M)$ denotes the relative position of $j$ in the gap. As we increase $m$, one value $i$ will repeat in this role along a subsequence $m'$. The operation $\vee$ is continuous for increasing sequences of sigma-fields, hence $\mathcal{P}^{-} \vee \sigma^{i}(\mathcal{P}^{S'(M,\alpha)\cap[0,\frac{m'M}{2})})$ converges over $m'$ to $\mathcal{P}^{-} \vee \sigma^{i}(\mathcal{P}^{S'(M,\alpha)})$. The entropy is continuous for such passage, hence $H(\mathcal{P}|\mathcal{P}^{-} \vee \sigma^{i}(\mathcal{P}^{S'(M,\alpha)})) \ge h - \alpha$. The assertion now follows because $S(M, \alpha)$ is contained in $S'(M, \alpha)$ shifted to the left by any $i \in [0, \alpha M)$.

Finally, if $r > 1$, we can simply argue for $\mathcal{P}^r$ replacing $\mathcal{P}$. This will impose that $M_0$ and $M$ are divisible by $r$, but it is not hard to see that for large $M$ the argument works without divisibility at a cost of a slight adjustment of $\alpha$.

For a long block $B \in \mathcal{P}^{-n}$ let $((\mathcal{P}^r_B)^{\mathbb{Z}}, \mu_B, \sigma_B)$ denote the process induced on $B$ generated by the restriction $\mathcal{P}^r_B$ of $\mathcal{P}^r$ to $B$ ($\sigma_B$ is the first return time map on $B$).

The following lemma is the crucial item in our argument.

Lemma 3. For every $\beta > 0$, $r \in \mathbb{N}$ and $K \in \mathbb{N}$ there exists $n_0$ such that for every $n \ge n_0$, with µ-tolerance β for $B \in \mathcal{P}^{-n}$, with respect to $\mu_B$, $\mathcal{P}^r$ is β-independent of jointly the past $\mathcal{P}^-$ and the first $K$ return times to $B$, $R^{(k)}_B$ ($k \in [1,K]$). In particular, $((\mathcal{P}^r_B)^{\mathbb{Z}}, \mu_B, \sigma_B)$ is a β-independent process.

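Proof. We choose ξ according to Fact 1, so that $\frac{\beta}{2}$-independence is implied. Let α satisfy

$$0 < \frac{2\alpha}{h-\alpha} < 1, \qquad 18K\sqrt{\alpha} < 1, \qquad \sqrt{2\alpha} < \xi, \qquad K\sqrt[4]{\alpha} < \frac{\beta}{2}.$$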

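Let $n_0$ be so large that $H(\mathcal{P}^r \mid \mathcal{P}^{-n}) < rh + \alpha$ for every $n \ge n_0$ and that for every $k \in [1,K]$, with µ-tolerance α for $B \in \mathcal{P}^{-n}$, holds

$$\mu_B\{2^{n(h-\alpha)} \le R^{(k)}_B \le 2^{n(h+\alpha)}\} > 1 - \alpha$$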

(we are using the Ornstein-Weiss Theorem [O-W2]; the multiplication by $k$ is consumed by α in the exponent). Let $M_0 \ge 2^{n_0(h-\alpha)}$ be so large that the assertion of Lemma 2 holds for α, $r$ and $M_0$, and that for every $M \ge M_0$,

$(M+1)^{1+\frac{2\alpha}{h-\alpha}} < \alpha M^2$ and $\log_2(M+1)$ […]

We can now redefine (enlarge) $n_0$ and $M_0$ so that $M_0 = \lfloor 2^{n_0(h-\alpha)} \rfloor$. Similarly, for each $n \ge n_0$ we set $M_n = \lfloor 2^{n(h-\alpha)} \rfloor$. Observe that the interval where the first $K$ returns of most n-blocks $B$ may occur (up to probability α) is contained in $[M_n, \alpha M_n^2]$ (because $2^{n(h+\alpha)} \le (M_n+1)^{1+\frac{2\alpha}{h-\alpha}} < \alpha M_n^2$).
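The chain of inequalities placing the return times inside $[M_n, \alpha M_n^2]$ can be illustrated numerically. The sketch below uses floating-point arithmetic and sample values of h, α and n that are assumptions made for illustration; the second inequality only holds once n (hence $M_n$) is large enough, which is what the choice of $M_0$ is meant to guarantee.

```python
import math

def return_window(h, alpha, n):
    """Return (M_n, lower, middle, upper) for the chain
    2^{n(h+alpha)} <= (M_n + 1)^{1 + 2*alpha/(h-alpha)} < alpha * M_n^2,
    where M_n = floor(2^{n(h-alpha)}).  Hypothetical helper for illustration."""
    M_n = math.floor(2 ** (n * (h - alpha)))
    exponent = 1 + 2 * alpha / (h - alpha)   # equals (h+alpha)/(h-alpha)
    lower = 2 ** (n * (h + alpha))           # largest typical return time
    middle = (M_n + 1) ** exponent
    upper = alpha * M_n ** 2
    return M_n, lower, middle, upper

# Illustrative values only: h = 1, alpha = 0.1, n = 15.
M_n, lower, middle, upper = return_window(h=1.0, alpha=0.1, n=15)
print(M_n, lower <= middle < upper)
```

The first inequality is just $M_n + 1 > 2^{n(h-\alpha)}$ raised to the power $(h+\alpha)/(h-\alpha)$; the second is where the largeness of $M_n$ is used.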

At this point we fix some $n \ge n_0$. The idea is to carefully select an $M$ between $M_n$ and $2M_n$ (hence not smaller than $M_0$), such that the initial $K$ returns of nearly every n-block happen most likely inside (with all its n symbols) the set $S(M,\alpha)$, so that they are “controlled” by the sigma-field $\mathcal{P}^{S(M,\alpha)}$. Let $\alpha' = \alpha + \frac{n}{M_n}$, so that every n-block overlapping with $S(M,\alpha')$ is completely covered by $S(M,\alpha)$. By the second assumption on $M \ge M_0$ and by the formula connecting $M_n$ and $n$, we have $\alpha' < 2\alpha$.

To define $M$ we will invoke the triple Fubini Theorem. Fix $k \in [1,K]$ and consider the probability space

$$\mathcal{P}^{-n} \times [M_n, 2M_n] \times \mathbb{N}$$

equipped with the (discrete) measure $\mathcal{M}$ whose marginal on $\mathcal{P}^{-n} \times [M_n, 2M_n]$ is the product of µ (more precisely, of its projection onto $\mathcal{P}^{-n}$) with the uniform distribution on the integers in $[M_n, 2M_n]$, while, for fixed $B$ and $M$, the measure on the corresponding $\mathbb{N}$-section is the distribution of the random variable $R^{(k)}_B$. In this space let $S$ be the set whose $\mathbb{N}$-section for a fixed $M$ (and any fixed $B$) is the set $S(M,\alpha')$. We claim that for every $l \in [M_n, \alpha M_n^2] \cap \mathbb{N}$ (and any fixed $B$) the $[M_n, 2M_n]$-section of $S$ has measure exceeding $1 - 16\alpha$. This is quite obvious (even for every $l \in [M_n, \infty)$ and with $1 - 15\alpha$) if $[M_n, 2M_n]$ is equipped with the normalized Lebesgue measure (see Figure 4).

Figure 4: The complement of $S$ splits into thin skew strips shown in the picture. The normalized Lebesgue measure of any vertical section of the $j$th strip (starting at $jM_n$ with $j \ge 1$) is at most $\frac{4\alpha' j}{j^2 - \alpha'^2} \le \frac{5\alpha'}{j} \le \frac{10\alpha}{j}$. Each vertical line at $l \ge M_n$ intersects strips with indices $j, j+1, j+2$ up to at most $2j$ (for some $j$), so the joint measure of the complement of the section of $S$ does not exceed $15\alpha$.

Figure 5: The discretization replaces the Lebesgue measure by the uniform measure on $M_n$ integers, thus the measure of any interval can deviate from its Lebesgue measure by at most $\frac{1}{M_n}$. For $l \le \alpha M_n^2$ the corresponding section of $S$ (in this picture drawn horizontally) consists of at […]

In the discrete case, however, a priori it might happen that the integers along some $[M_n, 2M_n]$-section often “miss” the section of $S$, leading to a decreased measure value. (For example, it is easy to see that for $l = (2M_n)!$ the measure of the section of $S$ is zero.) But because we restrict to $l \le \alpha M_n^2$, the discretization does not affect the measure of the section of $S$ by more than α, and the estimate with $1 - 16\alpha$ holds (see Figure 5 above).
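The factorial example can be made concrete. The sketch below assumes a simplified model of $S(M,\alpha')$, namely that it omits every integer lying closer than $\alpha' M$ to a multiple of $M$ (gaps of size $2\alpha' M$ around the multiples of $M$); the function names and the sample parameters are hypothetical. Since $(2M_n)!$ is divisible by every integer $M \le 2M_n$, it falls exactly on a multiple of each admissible $M$, so the discrete section of $S$ is empty.

```python
import math

def in_S(l, M, alpha_prime):
    """True if l lies in S(M, alpha') under the ASSUMED model: S(M, alpha')
    omits all integers closer than alpha'*M to a multiple of M."""
    r = l % M
    return min(r, M - r) >= alpha_prime * M

def discrete_section_measure(l, M_n, alpha_prime):
    """Uniform measure, over the integers M in [M_n, 2*M_n], of {M : l in S(M, alpha')}."""
    Ms = range(M_n, 2 * M_n + 1)
    return sum(in_S(l, M, alpha_prime) for M in Ms) / len(Ms)

M_n, alpha_prime = 50, 0.02                 # illustrative values only
bad_l = math.factorial(2 * M_n)             # divisible by every integer M <= 2*M_n
print(discrete_section_measure(bad_l, M_n, alpha_prime))   # section of S is missed entirely
print(discrete_section_measure(1237, M_n, alpha_prime))    # a generic l is mostly inside S
```

Python integers have arbitrary precision, so the factorial and the remainders are computed exactly.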

Taking into account all other inaccuracies (the smaller than α part of $S$ outside $[M_n, \alpha M_n^2]$ and the smaller than α part of $S$ projecting onto blocks $B$ which do not obey the Ornstein-Weiss return time estimate) it is safe to claim that $\mathcal{M}(S) > 1 - 18\alpha$.

This implies that for every $M$ from a set of measure at least $1 - 18\sqrt{\alpha}$ the measure of the $(\mathcal{P}^{-n} \times \mathbb{N})$-section of $S$ is larger than or equal to $1 - \sqrt{\alpha}$. For every such $M$, with µ-tolerance $\sqrt[4]{\alpha}$ for $B \in \mathcal{P}^{-n}$, the probability $\mu_B$ that the $k$th repetition of $B$ falls in $S(M,\alpha')$ (hence with all its n terms inside the set $S(M,\alpha)$) is at least $1 - \sqrt[4]{\alpha}$.
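The two passages from a joint measure to its sections are instances of Markov's inequality applied to the complement of $S$. A sketch in LaTeX follows; the symbols $\lambda$ (the uniform distribution on the integers of $[M_n, 2M_n]$) and $c(M)$ (the measure of the complement of the section of $S$ at a fixed $M$) are introduced here only for illustration.

```latex
% c(M) = \sum_{B} \mu(B)\, \mu_B\{ R^{(k)}_B \notin S(M,\alpha') \},
% so that the average of c over M equals \mathcal{M}(S^c) < 18\alpha.
\begin{align*}
\lambda\bigl\{ M : c(M) > \sqrt{\alpha} \bigr\}
  &\le \frac{\mathcal{M}(S^c)}{\sqrt{\alpha}}
   < \frac{18\alpha}{\sqrt{\alpha}} = 18\sqrt{\alpha}, \\
\mu\bigl\{ B : \mu_B\{ R^{(k)}_B \notin S(M,\alpha') \} > \sqrt[4]{\alpha} \bigr\}
  &\le \frac{c(M)}{\sqrt[4]{\alpha}}
   \le \frac{\sqrt{\alpha}}{\sqrt[4]{\alpha}} = \sqrt[4]{\alpha}
  \qquad \text{whenever } c(M) \le \sqrt{\alpha}.
\end{align*}
```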

Because $18K\sqrt{\alpha} < 1$, there exists at least one $M$ for which the above holds for every $k \in [1,K]$. This is our final choice of $M$. For this $M$, with µ-tolerance $K\sqrt[4]{\alpha}$, all considered $K$ returns of $B$ are, with probability $1 - \sqrt[4]{\alpha}$ (each), determined by the sigma-field $\mathcal{P}^{S(M,\alpha)}$. More precisely, for each $k \in [1,K]$ there is a set $U_k$ of measure $\mu_B$ at most $\sqrt[4]{\alpha}$ such that the sets $\{R^{(k)}_B = i\} \setminus U_k$ agree with some $Q^{(k)}_i \setminus U_k$, where each $Q^{(k)}_i$ is $\mathcal{P}^{S(M,\alpha)}$-measurable. Thus, we can modify the variable $R^{(k)}_B$ so it is $\mathcal{P}^{S(M,\alpha)}$-measurable and equal to the original except on $U_k$. We denote such a modification by $\tilde R^{(k)}_B$.

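Let us go back to our entropy estimates. We have, by Lemma 2,

$$\begin{aligned}
\sum_{B\in\mathcal{P}^{-n}} \mu(B)\, H_B(\mathcal{P}^r \mid \mathcal{P}^- \vee \mathcal{P}^{S(M,\alpha)})
 &= H(\mathcal{P}^r \mid \mathcal{P}^{-n} \vee \mathcal{P}^- \vee \mathcal{P}^{S(M,\alpha)})
  = H(\mathcal{P}^r \mid \mathcal{P}^- \vee \mathcal{P}^{S(M,\alpha)}) \\
 &\ge rh - \alpha \ge H(\mathcal{P}^r \mid \mathcal{P}^{-n}) - 2\alpha
  = \sum_{B\in\mathcal{P}^{-n}} \mu(B)\, H_B(\mathcal{P}^r) - 2\alpha.
\end{aligned}$$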

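Because $H_B(\mathcal{P}^r \mid \mathcal{P}^- \vee \mathcal{P}^{S(M,\alpha)}) \le H_B(\mathcal{P}^r)$ for every $B$, we deduce that with µ-tolerance $\sqrt{2\alpha}$ for $B \in \mathcal{P}^{-n}$ must hold

$$H_B(\mathcal{P}^r \mid \mathcal{P}^- \vee \mathcal{P}^{S(M,\alpha)}) \ge H_B(\mathcal{P}^r) - \sqrt{2\alpha} \ge H_B(\mathcal{P}^r) - \xi.$$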

Combining this with the preceding arguments, with µ-tolerance $K\sqrt[4]{\alpha} + \sqrt{2\alpha} < \beta$ for $B \in \mathcal{P}^{-n}$ both the above entropy inequality holds, and we have the $\mathcal{P}^{S(M,\alpha)}$-measurable modifications $\tilde R^{(k)}_B$ of the return times. By the choice of ξ, we obtain that with respect to $\mu_B$, $\mathcal{P}^r$ is jointly $\frac{\beta}{2}$-independent of the past and the modified return times $\tilde R^{(k)}_B$ ($k \in [1,K]$). Because $\mu(\bigcup_{k\in[1,K]} U_k) \le K\sqrt[4]{\alpha} < \frac{\beta}{2}$, this clearly implies β-independence if each $\tilde R^{(k)}_B$ is replaced by $R^{(k)}_B$.

To complete the proof of Theorem 1 it now remains to put the items together.

Proof of Theorem 1. Fix an $\varepsilon > 0$. On $[0,\infty)$, the functions

[…]

where $e_p = (1-p)^{-\frac{1}{p}}$, decrease uniformly to $1 - e^{-t}$ as $p \to 0^+$. So, let δ be such that $g_\delta(t) \le 1 - e^{-t} + \varepsilon$ for every $t$. We also assume that

$$(1-2\delta)(1-\delta) \ge 1 - \varepsilon.$$

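Let $r$ be specified by Lemma 1, so that $\mu_B(A) \le \delta$ for every $n \ge 1$, every $A \in \mathcal{P}^r$ and for $B \in \mathcal{P}^{-n}$ with µ-tolerance δ. On the other hand, once $r$ is fixed, the partition $\mathcal{P}^r$ has at most $(\#\mathcal{P})^r$ elements, so with $\mu_B$-tolerance δ for $A \in \mathcal{P}^r$, $\mu_B(A) \ge \delta(\#\mathcal{P})^{-r}$. Let $\mathcal{A}_B$ be the subfamily of $\mathcal{P}^r$ (depending on $B$) where this inequality holds. Let $K$ be so large that for any $p \ge \delta(\#\mathcal{P})^{-r}$,

$$\sum_{k=K+1}^{\infty} p(1-p)^k < \frac{\delta}{2},$$

and choose $\beta < \delta$ so small that

$$(K^2 + K + 1)\beta < \frac{\delta}{2}.$$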
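The choice of $K$ and β above is explicit enough to be computed. The sketch below relies on the closed form $\sum_{k>K} p(1-p)^k = (1-p)^{K+1}$, which is largest at the smallest admissible $p$; the function name, the particular formula for β and the sample values of δ, #P and r are assumptions made for illustration.

```python
import math

def choose_K_and_beta(delta, num_symbols, r):
    """Pick K with sum_{k>K} p(1-p)^k < delta/2 for every p >= p_min = delta*(#P)^(-r),
    then a beta < delta with (K^2 + K + 1)*beta < delta/2.  Hypothetical helper."""
    p_min = delta * num_symbols ** (-r)
    # Tail of the geometric series: sum_{k>K} p(1-p)^k = (1-p)^(K+1), decreasing in p,
    # so it suffices to handle p = p_min.
    K = max(1, math.ceil(math.log(delta / 2) / math.log(1 - p_min)))
    beta = delta / (4 * (K * K + K + 1))   # one admissible choice; any smaller value works
    return p_min, K, beta

# Illustrative values only: delta = 0.1, #P = 2, r = 3.
p_min, K, beta = choose_K_and_beta(delta=0.1, num_symbols=2, r=3)
print(p_min, K, beta)
```

Note how quickly $K$ grows as $p_{\min} = \delta(\#\mathcal{P})^{-r}$ shrinks, which in turn forces β to be of order $\delta / K^2$.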

The application of Lemma 3 now provides an $n_0$ such that for any $n \ge n_0$, with µ-tolerance β for $B \in \mathcal{P}^{-n}$, the process induced on $B$ generated by $\mathcal{P}^r$ has the desired β-independence properties involving the initial $K$ return times of $B$. So, with tolerance $\delta + \beta < 2\delta$ we have both the above β-independence and the estimate $\mu_B(A) < \delta$ for every $A \in \mathcal{P}^r$. Let $\mathcal{B}_n$ be the subfamily of $\mathcal{P}^{-n}$ where these two conditions hold. Fix some $n \ge n_0$.

Let us consider a cylinder set $B \cap A \in \mathcal{P}^{[-n,r)}$ (or, equivalently, the block $BA$), where $B \in \mathcal{B}_n$, $A \in \mathcal{A}_B$. The length of $BA$ is $n + r$, which represents an arbitrary integer larger than $n_0 + r$. Notice that the family of such sets $BA$ covers more than $(1-2\delta)(1-\delta) \ge 1 - \varepsilon$ of the space.

We will examine the distribution of the normalized first return time for $BA$. In addition to our customary notations of return times, let $R^{(B)}_A$ be the first (absolute) return time of $A$ in $((\mathcal{P}^r_B)^{\mathbb{Z}}, \mu_B, \sigma_B)$, i.e., the variable defined on $BA$, counting the number of visits to $B$ until the first return to $BA$. Let $p = \mu_B(A)$. We have

$$F_{BA}(t) = \mu_{BA}\{\overline{R}_{BA} \le t\} = \mu_{BA}\Bigl\{R_{BA} \le \frac{t}{\mu(BA)}\Bigr\} = \sum_{k\ge1} \mu_{BA}\Bigl\{R^{(B)}_A = k,\ R^{(k)}_B \le \frac{t}{p\mu(B)}\Bigr\}.$$

The $k$th term of this sum equals

$$\frac{1}{p}\,\mu_B\Bigl(\{A_k = A\} \cap \{A_{k-1} \ne A\} \cap \dots \cap \{A_1 \ne A\} \cap \{A_0 = A\} \cap \Bigl\{R^{(k)}_B \le \frac{t}{p\mu(B)}\Bigr\}\Bigr),$$

where $A_i$ is the r-block following the $i$th copy of $B$ (the counting starts from 0 at the copy of $B$ positioned at $[-n,-1]$).

By Lemma 3, for $k \le K$, in this intersection of sets each term is β-independent of the intersection right from it. So, proceeding from the left, we can replace the probabilities of the intersections by products of probabilities, allowing an error of β. Note that the last term equals $\mu_B\{\overline{R}^{(k)}_B \le \frac{t}{p}\} = F^{(k)}_B(\frac{t}{p})$. Jointly, the inaccuracy will not exceed $(K+1)\beta$:

$$\Bigl|\mu_{BA}\Bigl\{R^{(B)}_A = k,\ R^{(k)}_B \le \frac{t}{p\mu(B)}\Bigr\} - p(1-p)^{k-1} F^{(k)}_B\Bigl(\frac{t}{p}\Bigr)\Bigr| \le (K+1)\beta.$$

Similarly, we also have $\bigl|\mu_{BA}\{R^{(B)}_A = k\} - p(1-p)^{k-1}\bigr| \le K\beta$, hence the tail of the series $\mu_{BA}\{R^{(B)}_A = k\}$ above $K$ is smaller than $K^2\beta$ plus the tail of the geometric series $p(1-p)^{k-1}$, which, by the fact that $p \ge \delta(\#\mathcal{P})^{-r}$, is smaller than $\frac{\delta}{2}$. Therefore

$$F_{BA}(t) \approx \sum_{k\ge1} p(1-p)^{k-1} F^{(k)}_B\Bigl(\frac{t}{p}\Bigr),$$

up to $(K^2 + K + 1)\beta + \frac{\delta}{2} \le \delta$, uniformly for every $t$. By the application of Lemma 0, $G_{BA}$ satisfies

$$G_{BA}(t) \le \min\Bigl\{1,\ \frac{1}{\log e_p}\bigl(1 - e_p^{-t}\bigr) + \delta t\Bigr\} \le g_\delta(t) \le 1 - e^{-t} + \varepsilon$$

(because $p \le \delta$). We have proved that for our choice of ε and an arbitrary length $m \ge n_0 + r$, with µ-tolerance ε for the cylinders $C \in \mathcal{P}^m$, the intensity of repelling between visits to $C$ is at most ε. This concludes the proof of Theorem 1.

Consequences for limit laws

The studies of limit laws for return/hitting time statistics are based on the following approach: For $x \in \mathcal{P}^{\mathbb{Z}}$ define $F_{x,n} = F_B$ (and $\tilde F_{x,n} = \tilde F_B$), where $B$ is the block $x[0,n)$ (or the cylinder in $\mathcal{P}^n$ containing $x$). Because for nondecreasing functions $F : [0,\infty) \to [0,1]$, the weak convergence coincides with the convergence at continuity points, and it makes the space of such functions metric and compact, for every $x$ there exists a well defined collection of limit distributions for $F_{x,n}$ (and for $\tilde F_{x,n}$) as $n \to \infty$. They are called limit laws for the return (hitting) times at $x$. Due to the integral relation ($\tilde F_B \approx G_B$) a sequence of return time distributions converges weakly if and only if the corresponding hitting time distributions converge pointwise (see [H-L-V]), so the limit laws for the return times completely determine those for hitting times and vice versa. A limit law is essential if it appears along some subsequence $(n_k)$ for $x$'s in a set of positive measure. In particular, the strongest situation occurs when there exists an almost sure limit law along the full sequence $(n)$.

Most of the results concerning the limit laws, obtained so far, can be classified in three major groups: a) characterizations of possible essential limit laws for specific zero entropy processes (e.g. [D-M], [C-K]; these limit laws are usually atomic for return times or piecewise linear for hitting times), b) finding classes of processes with an almost sure exponential limit law along $(n)$ (e.g. [A-G], [H-S-V]), and c) results concerning not essential limit laws, limit laws along sets other than cylinders (see [L]; every probabilistic distribution with expected value not exceeding 1 can occur in any process as the limit law for such general return times), or other very specific topics. As a consequence of our Theorem 1, we obtain, for the first time, a serious bound on the possible essential limit laws for the hitting time statistics along cylinders in the general class of ergodic positive entropy processes. The statement (1) below is even slightly stronger, because we require, for a subsequence, convergence on a positive measure set, but not necessarily to a common limit.

Theorem 2. Assume ergodicity and positive entropy of the process $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$.

(1) If a subsequence $(n_k)$ is such that $\tilde F_{x,n_k}$ converge pointwise to some limit laws $\tilde F_x$ on a positive measure set $A$ of points $x$, then almost surely on $A$, $\tilde F_x(t) \le 1 - e^{-t}$ at each $t \ge 0$.

(2) If $(n_k)$ grows sufficiently fast, then there is a full measure set, such that for every $x$ in this set holds: $\limsup_k \tilde F_{x,n_k}(t) \le 1 - e^{-t}$

Proof. The implication from Theorem 1 to Theorem 2 is obvious and we leave it to the reader. For (2) we hint that $(n_k)$ must grow fast enough to ensure summability of the measures of the sets where the intensity of repelling persists.

Examples

The first construction will show that for each $\delta > 0$ and $n \in \mathbb{N}$ there exists $N \in \mathbb{N}$ and an ergodic process on $N$ symbols with entropy $\log_2 N - \delta$, such that the n-blocks from a collection of joint measure equal to $\frac{1}{n}$ repel with nearly the maximal possible intensity $e^{-1}$. Because δ can be extremely small compared to $\frac{1}{n}$, this construction illustrates that there is no “reduction of entropy” by an amount proportional to the fraction of blocks which reveal strong repelling.

Example 1. Let $\mathcal{P}$ be an alphabet of a large cardinality $N$. Divide $\mathcal{P}$ into two disjoint subsets, one, denoted $\mathcal{P}_0$, of cardinality $N_0 = N 2^{-\delta}$ and the relatively small (but still very large) rest which we denote by $\{1, 2, \dots, r\}$ (we will refer to these symbols as “markers”). For $i = 1, 2, \dots, r$, let $\mathcal{B}_i$ be the collection of all n-blocks whose first $n - 1$ symbols belong to $\mathcal{P}_0$ and the terminal symbol is the marker $i$. The cardinality of $\mathcal{B}_i$ is $N_0^{n-1}$. Let $\mathcal{C}_i$ be the collection of all blocks of length $nN_0^{n-1}$ obtained as concatenations of blocks from $\mathcal{B}_i$ using each of them exactly once. The cardinality of $\mathcal{C}_i$ is $(N_0^{n-1})!$. Let $X$ be the subshift whose points are infinite concatenations of blocks from $\bigcup_{i=1}^r \mathcal{C}_i$, in which every block belonging to $\mathcal{C}_i$ is followed by a block from $\mathcal{C}_{i+1}$ ($1 \le i < r$) and every block belonging to $\mathcal{C}_r$ is followed by a block from $\mathcal{C}_1$. Let µ be the shift-invariant measure of maximal entropy on $X$. It is immediate to see that the entropy of µ is $\frac{1}{nN_0^{n-1}} \log_2\bigl((N_0^{n-1})!\bigr)$, which, for large $N$, nearly equals $\log_2 N_0 = \log_2 N - \delta$. Finally observe that the measure of each $B \in \mathcal{B}_i$ equals $\frac{1}{nrN_0^{n-1}}$, the joint measure of $\bigcup_{i=1}^r \mathcal{B}_i$ is exactly $\frac{1}{n}$, and every block $B$ from this family appears in any $x \in X$ with gaps ranging between $\frac{1 - \frac{1}{r}}{\mu(B)}$ and $\frac{1 + \frac{1}{r}}{\mu(B)}$, revealing strong repelling.
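The gap estimate at the end of Example 1 can be observed on a toy instance. The sketch below builds a finite piece of a point of $X$ for very small, purely illustrative parameters (the names `build_point` and `gaps`, the integer encoding of symbols and markers, and the values $N_0 = 3$, $n = 3$, $r = 4$ are all assumptions), orders the blocks of each $\mathcal{B}_i$ at random, and checks that the gaps between consecutive occurrences of a fixed block stay within $(1 \pm \frac{1}{r})/\mu(B)$ with $1/\mu(B) = nrN_0^{n-1}$.

```python
import itertools
import random

def build_point(N0, n, r, cycles, seed=0):
    """Concatenate C_1 C_2 ... C_r C_1 ...; each C_i lists, in random order and exactly
    once, all n-blocks whose first n-1 symbols lie in P0 = {0,...,N0-1} and whose
    last symbol is the marker i (encoded here as the negative integer -i)."""
    rng = random.Random(seed)
    prefixes = list(itertools.product(range(N0), repeat=n - 1))
    x = []
    for _ in range(cycles):
        for i in range(1, r + 1):
            order = prefixes[:]
            rng.shuffle(order)
            for prefix in order:
                x.extend(prefix)
                x.append(-i)
    return x

def gaps(x, block):
    """Distances between consecutive occurrences of `block` in the sequence x."""
    n = len(block)
    hits = [k for k in range(len(x) - n + 1) if tuple(x[k:k + n]) == tuple(block)]
    return [b - a for a, b in zip(hits, hits[1:])]

N0, n, r = 3, 3, 4                      # illustrative values only
x = build_point(N0, n, r, cycles=50)
inv_mu = n * r * N0 ** (n - 1)          # 1 / mu(B) for a single block B
g = gaps(x, (0, 1, -2))                 # a block from B_2
print(min(g), max(g), inv_mu)
```

The random ordering is only one convenient way to pick a point; the gap bounds hold for every admissible concatenation, since each block occurs exactly once per cycle of length $nrN_0^{n-1}$.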

Remark 1. Viewing blocks of length $nrN_0^{n-1}$ starting with a block from $\mathcal{C}_1$ as a new alphabet, and repeating the above construction inductively, we can produce an example (with the measure of maximal entropy on the intersection of systems created in consecutive steps) with entropy $\log_2 N - 2\delta$, in which the strong repelling will occur with probability $\frac{1}{n_k}$ for infinitely many lengths $n_k$.

Remark 2. The process described in the above remark is (somewhat coincidentally; it was not designed for that) bilaterally deterministic: for every $m \in \mathbb{N}$ the sigma-field $\mathcal{P}^{(-\infty,-m]\cup[m,\infty)}$ equals the full (product) sigma-field. Indeed, suppose we see all entries of a point $x$ except on the interval $(-m, m)$. In a typical point, this interval is contained between a pair of successive markers $i$ for some level $k$ of the inductive construction. Then, by examining this point's entries far enough to the left and right we will see completely all but one of the blocks from the family $\mathcal{B}_i$ which constitute the block $C \in \mathcal{C}_i$ covering the considered interval. Because every block from $\mathcal{B}_i$ is used in $C$ exactly once, by elimination, we will be able to determine the missing block and hence all symbols in $(-m, m)$.

The next construction shows that there exists a process isomorphic to a Bernoulli process with an almost sure limit law $\tilde F \equiv 0$ for the normalized hitting times (strong attracting), achieved along a subsequence of upper density 1. In particular, this answers in the negative a question of Zaqueu Coelho ([C]), whether all processes isomorphic to Bernoulli processes have necessarily the exponential limit law for the hitting (and return) times. The idea of the construction was suggested to us by D. Rudolph ([R2]), who attributes the method to Arthur Rothstein.

Example 2. We will build a decreasing sequence of subshifts of finite type (SFT's). In each we will regard the measure of maximal entropy. Begin with the full shift $X_0$ on a finite alphabet. Select $r$ words $W_1, W_2, \dots, W_r$ of some length $l$ and create $r$ SFT's: $X_0^{(1)}, X_0^{(2)}, \dots, X_0^{(r)}$, forbidding one of these words in each of them, respectively. Choose a length $n$ so large that in the majority of blocks of this length in $X_0^{(i)}$ all words $W_j$ except $W_i$ will appear at least once. Now choose another length $m$, such that in the majority of blocks of this length every block $C$ of length between $n$ and $n^2$ will appear many times. Now define $X_1$ as the subshift each of whose points is a concatenation of the form $\dots B_1B_2 \dots B_rB_1 \dots$, where $B_i$ is a block appearing in $X_0^{(i)}$ of length either $m$ or $m + 1$. Obviously, a typical block $C$ of any length between $n$ and $n^2$ appearing in $X_1$ comes from some $X_0^{(i)}$, hence contains all $W_j$'s except $W_i$, therefore in a typical $x \in X_1$, $C$ will appear many times within each component $B_i$ representing $X_0^{(i)}$, and then it will be absent for a long time, until the next representative of $X_0^{(i)}$. So, every such block will reveal strong attracting. It is not hard to see that $X_1$ is a mixing SFT and its d-bar distance from the full shift is small whenever the length $l$ of the (few) forbidden words $W_i$ is large. We can now repeat the construction starting with $X_1$, and radically increasing all parameters. We can arrange that the d-bar distances are summable, so the limit system $X$ (the intersection of the $X_k$'s), more precisely its measure of maximal entropy, is also a d-bar limit. Each mixing SFT is isomorphic to a Bernoulli process and this property passes via d-bar limits (see [O], [Sh]), hence $X$ is also isomorphic to a Bernoulli process. This system has the almost sure limit law $\tilde F \equiv 0$ for hitting times (or $F \equiv 1$ for the return times) achieved along a sequence containing infinitely many intervals of the form $[n, n^2]$. Such a sequence has upper density 1.

Remark 3. It is also possible to construct a process $X_h$ as above with any preassigned entropy $h$. On the other hand, it is well known ([Si]) that every measure-preserving transformation with positive entropy $h$ possesses a Bernoulli factor of the same entropy. By the Ornstein Theorem ([O]) this factor is isomorphic to $X_h$. The generator of $X_h$ appears as a partition of the space on which the initial measure-preserving transformation is defined. This proves the universality of “the law of series”: in every measure-preserving transformation there exists a partition generating the full entropy, which has the “strong attracting properties” (i.e., almost sure limit law $\tilde F \equiv 0$ along a sequence of lengths of upper density 1).

Various zero entropy processes with persistent repelling or attracting are implicit in the existing literature. Extreme repelling (with intensity converging to $e^{-1}$ as the length of blocks grows) occurs for example in odometers, or, more generally, in rank one systems ([C-K]). For completeness, we sketch two zero entropy processes with features of positive entropy: repelling, and the unbiased behavior.

Example 3. Take the product of the independent Bernoulli process on two symbols with an odometer (modeled by an adequate process, for example a regular Toeplitz subshift; see [D] for details on Toeplitz flows). Call this product process $X_0$. The odometer factor provides, for each $k \in \mathbb{N}$, markers dividing each element into so-called k-blocks of equal lengths $p_k$. Each $p_k$ is a multiple of $p_{k-1}$ and each k-block is a concatenation of (k−1)-blocks. Now we create a new process $X_1$ by “stuttering”: . . . AABBCCDD . . ., with the number of repetitions $q_1 = 2$. In $X_1$ the lengths of the k-blocks for $k > 1$ have doubled. Repeating the stuttering for 2-blocks of $X_1$ with a number of repetitions $q_2 \ge 2$, we obtain a process $X_2$. And so on. Because in each step we reduce the entropy by at least half, the limit process has entropy zero. If the $q_k$'s grow sufficiently fast, we obtain, like in the previous example, a system with strong attracting for a set of lengths of upper density 1.

Consider a modification of this example where $q_k = 2$ for each $k$ and each pair $AA$ (also $BB$, etc.) is substituted by $A\bar{A}$ ($B\bar{B}$, etc.), where $\bar{A}$ is the “mirror” of $A$, i.e., with the symbols 0 replaced by 1 and vice versa. It is not very hard to compute that such a process (although it has entropy zero) has the same limit law properties as the independent process: almost sure convergence along the full sequence $(n)$ to the unbiased (exponential) limit law.
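The two block operations used in Example 3 can be sketched on finite strings. The code below is a toy illustration only: it treats a finite 0-1 string as a concatenation of blocks of a fixed length (playing the role of the k-blocks cut out by the odometer markers), and the function names are assumptions introduced here.

```python
def split_blocks(x, block_len):
    """Cut the string x into consecutive blocks of length block_len."""
    return [x[i:i + block_len] for i in range(0, len(x), block_len)]

def stutter(x, block_len, q=2):
    """Repeat every block q times: ABCD -> AABBCCDD for q = 2."""
    return "".join(block * q for block in split_blocks(x, block_len))

def mirror(block):
    """Exchange the symbols 0 and 1."""
    return block.translate(str.maketrans("01", "10"))

def stutter_with_mirror(x, block_len):
    """Replace every block A by A followed by its mirror."""
    return "".join(block + mirror(block) for block in split_blocks(x, block_len))

print(stutter("ABCD", 1))               # each 1-symbol block doubled
print(stutter("0110", 2))               # blocks 01, 10 doubled
print(stutter_with_mirror("0110", 2))   # blocks 01, 10 followed by their mirrors
```

Iterating `stutter` on longer and longer blocks mimics the passage from $X_k$ to $X_{k+1}$; the mirrored variant keeps the block lengths of the stuttered process while destroying the exact repetitions.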

Remark 4. It is not hard to construct zero entropy processes with persistent mixed behavior. For example, applying the “stuttering technique” to an odometer one obtains a process in which a typical block $B$ occurs in periodically repeated pairs: BB...BB...BB..., i.e., with the function $G_B \approx \min\{1, \frac{t}{2}\}$ (which reveals attracting with intensity $\frac{1 - \log 2}{2}$ at $t_1 = \log 2$ and repelling with intensity $e^{-2}$ at $t_2 = 2$). We skip the details.
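The two intensities in Remark 4 can be evaluated numerically. The sketch below assumes that the intensity at $t$ is measured as the difference between $G_B(t)$ and the exponential law $1 - e^{-t}$ (positive values read as repelling, negative values as attracting); this sign convention and the function names are assumptions made for illustration, with log denoting the natural logarithm.

```python
import math

def G(t):
    """G_B for periodically repeated pairs BB...BB...: approximately min{1, t/2}."""
    return min(1.0, t / 2.0)

def deviation(t):
    """G_B(t) - (1 - e^{-t}); ASSUMED convention: > 0 repelling, < 0 attracting."""
    return G(t) - (1.0 - math.exp(-t))

t1, t2 = math.log(2.0), 2.0
attracting_at_t1 = -deviation(t1)     # compare with (1 - log 2) / 2
repelling_at_t2 = deviation(t2)       # compare with e^{-2}
print(attracting_at_t1, repelling_at_t2)
```

On $[0, 2]$ the derivative of $1 - e^{-t} - \frac{t}{2}$ vanishes exactly at $t = \log 2$, which is why the attracting is strongest there.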

Questions

Question 1. Is there a speed of the convergence to zero of the joint measure of the “bad” blocks in Theorem 1? More precisely, does there exist a positive function $s(n, \varepsilon, \#\mathcal{P})$ converging to zero as $n$ grows, such that if for some ε and infinitely many $n$'s, the joint measure of the n-blocks which repel with intensity ε exceeds $s(n, \varepsilon, \#\mathcal{P})$, then the process has necessarily entropy zero? (By Example 1, $\frac{1}{n}$ is not enough.)

Question 2. Can one strengthen Theorem 2 as follows: $\limsup_{n\to\infty} \tilde F_{x,n} \le 1 - e^{-t}$ µ-almost everywhere?

Question 3. In Lemma 3, can one obtain $\mathcal{P}^r$ conditionally β-independent of jointly the past and all return times $R^{(k)}_B$ ($k \ge 1$) (for sufficiently large $n$, with µ-tolerance β for $B \in \mathcal{P}^{-n}$)? In other words, can the β-independent process $((\mathcal{P}^r_B)^{\mathbb{Z}}, \mu_B, \sigma_B)$ be obtained β-independent of the factor-process generated by the partition into $B$ and its complement?

Question 4. (suggested by J-P. Thouvenot) Find a purely combinatorial proof of Theorem 1, by counting the quantity of very long strings (of length m) inside which a positive fraction (in measure) of all n-blocks repel with a fixed intensity. For sufficiently large n this quantity should be eventually (as m → ∞) smaller than hm for any preassigned positive h.

Question 5. As we have mentioned, we only know about conditions which ensure that the limit law for the return time is exponential. It would be interesting to find a (large) class of positive entropy processes for which the distributions of return times are essentially deviated from exponential for bounded away from zero in measure collections of arbitrarily long blocks, i.e., a class of processes with persistent attracting. Can one prove that persistent attracting is, in some reasonable sense, a “typical” property in positive entropy, or that for a fixed measure-preserving transformation with positive entropy, a “typical” generator (partition) leads to persistent attracting?


References

[A-G] Abadi, M. and Galves, A., Inequalities for the occurrence times of rare events in mixing processes. The state of the art. Inhomogeneous random systems, Markov Process. Related Fields 7 (2001), 97–112.

[C-K] Chaumoître, V. and Kupsa, M., Asymptotics for return times of rank one systems, Stochastics and Dynamics 5 (2005), 65–73.

[C] Coelho, Z., Asymptotic laws for symbolic dynamical systems, Topics in symbolic dynamics and applications, London Math. Soc. Lecture Note Ser., vol. 279, Cambridge Univ. Press, 2000, pp. 123–165.

[D] Downarowicz, T., Survey of odometers and Toeplitz flows, Algebraic and Topological Dynamics (Kolyada, Manin, Ward eds), Contemporary Mathematics 385 (2005), 7–38.

[D-M] Durand, F. and Maass, A., Limit laws for entrance times for low-complexity Cantor minimal systems, Nonlinearity 14 (2001), 683–700.

[G] Gurevič, B. M., One- and two-sided regularity of stationary random processes, Dokl. Akad. Nauk SSSR 210 (1973), 763–766.

[H-L-V] Haydn, N., Lacroix, Y. and Vaienti, S., Hitting and return times in ergodic dynamical systems, Ann. Probab. 33 (2005), 2043–2050.

[H-S-V] Hirata, N., Saussol, B. and Vaienti, S., Statistics of return times: A general framework and new applications, Commun. Math. Phys. 206 (1999), 33–55.

[Kc] Kac, M., On the notion of recurrence in discrete stochastic processes, Bull. Amer. Math. Soc. 53 (1947), 1002–1010.

[Km] Kammerer, P., Das Gesetz der Serie, eine Lehre von den Wiederholungen im Lebens und im Weltgeschehen, Stuttgart und Berlin, 1919.

[L] Lacroix, Y., Possible limit laws for entrance times of an ergodic aperiodic dynamical system, Israel J. Math. 132 (2002), 253–263.

[Mi] Von Mises, Probability, Statistics and Truth, 2nd ed. rev., New York, Dover, 1981.

[Mo] Moisset, J., La loi des séries, JMG Editions, 2000.

[O] Ornstein, D.S., Bernoulli shifts with the same entropy are isomorphic, Adv. Math. 4 (1970), 337–352.

[O-W1] Ornstein, D.S. and Weiss, B., Every transformation is bilaterally deterministic, Israel J. Math. 21 (1975), 154–158.

[O-W2] Ornstein, D.S. and Weiss, B., Entropy and recurrence rates for stationary random fields, IEEE Trans. Inform. Theory 48 (2002), 1694–1697.

[P] Petersen, K., Ergodic Theory, Cambridge Univ. Press, Cambridge, 1983.

[R1] Rudolph, D.J., If a two-point extension of a Bernoulli shift has an ergodic square, then it is Bernoulli, Israel J. Math. 30 (1978), 159–180.

[R2] Rudolph, D.J., private communication.

[Sh] Shields, P., The theory of Bernoulli shifts, Chicago Lectures in Mathematics, The University of Chicago Press, Chicago, 1973.

[Si] Sinai, Y., Weak isomorphism of transformations with an invariant measure, Sov. Math., Dokl. 3 (1963), 1725–1729; translation from Dokl. Akad. Nauk SSSR 147 (1962), 797–800.

[Sm] Smorodinsky, M., On Ornstein's isomorphism theorem for Bernoulli shifts, Advances in Math. 9 (1972), 1–9.

[St] Sterzinger, O., Zur Logik und Naturphilosophie der Wahrscheinlichkeitslehre, Leipzig, 1919.

[W] Walters, P., Ergodic theory–introductory lectures, Lecture Notes in Mathematics, vol. 458, Springer-Verlag, Berlin, 1975.

Institute of Mathematics and Computer Science, Wroclaw University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland

E-mail address: downar@pwr.wroc.pl

Institut des Sciences de l’Ingénieur de Toulon et du Var, Avenue G. Pompidou, B.P. 56, 83162 La Valette du Var Cedex, France