A combinatorial proof of S-adicity for sequences with linear complexity


Julien Leroy1

LAMFA, CNRS UMR 6410, Univ. de Picardie Jules Verne, 33 Rue Saint Leu, 80039 cedex 01, France

julien.leroy@uni.lu

Gwenaël Richomme

LIRMM (CNRS, Univ. Montpellier 2), UMR 5506 - CC 477, 161 rue Ada, 34095 Montpellier Cedex 5 and Univ. Montpellier 3, Route de Mende, 34199 Montpellier Cedex 5, France

gwenael.richomme@lirmm.fr


Abstract

Using Rauzy graphs, Ferenczi proved that if a symbolic dynamical system has linear complexity then it is S-adic. Being more specific, the result can also be proved for infinite words. We provide a new proof of this latter result using the notion of return words to a set of words.

1. Introduction

Since Morse and Hedlund’s works [18], Sturmian words have received a lot of attention because of their numerous properties and applications (see for instance the surveys and studies in [1, 3, 8, 15, 20]). In this paper, we are concerned with two of these properties. First, Sturmian words are examples of words with linear complexity [18]. Recall that the complexity of an infinite word is the function that associates with each integer n the number of factors (blocks of consecutive letters) of length n that the infinite word contains. Secondly, Sturmian words have a multi-scale property, since each Sturmian word is the image of another Sturmian word by a morphism (a map, also called a substitution, defined on letters and compatible with the concatenation operation) taken from a finite set $S_{Sturm}$ consisting of 4 morphisms [17]. More precisely, given a Sturmian word s, one can find a sequence $(f_i)_{i \ge 0}$ of morphisms of $S_{Sturm}$ and a sequence of letters $(a_i)_{i \ge 0}$ such that $s = \lim_{n \to \infty} f_0 \cdots f_n(a_{n+1}^{\omega})$. Sturmian words are then called $S_{Sturm}$-adic. More generally, an infinite word w is said to be S-adic when S is a finite set of morphisms and the previous equality holds for w instead of s and for a suitable sequence of morphisms of S.

1 This paper was partially written while the first author was invited to Montpellier in June 2011.

Since Pansiot’s works [19], it is known that fixed points of morphisms, which are S-adic words for a singleton set S, do not necessarily have linear complexity. Also, Cassaigne showed [21] that any word $w = (w_i)_{i \ge 0}$ (where each $w_i$ is a letter) over an alphabet A is S-adic, with $S = \{\sigma_a, \varphi_a \mid a \in A\}$, where, using a letter α that does not belong to A, the morphisms $\sigma_a$ and $\varphi_a$ are defined, for all letters a, b in A, by $\sigma_a(\alpha) = a$, $\varphi_a(\alpha) = \alpha a$ and $\sigma_a(b) = b = \varphi_a(b)$: $w = \lim_{n \to \infty} \sigma_{w_0} \varphi_{w_1} \cdots \varphi_{w_n}(\alpha^{\omega})$.
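The construction above can be tried out on a finite prefix. The following Python sketch is only an illustration: the names `ALPHA`, `sigma`, `phi` and `apply_chain` are hypothetical, and the sample word is arbitrary. It applies $\sigma_{w_0} \varphi_{w_1} \cdots \varphi_{w_n}$ to the single letter α and to a letter of A.

```python
ALPHA = "α"  # extra letter, assumed not to belong to the alphabet A


def sigma(a, word):
    """sigma_a: ALPHA -> a, every other letter is fixed."""
    return [a if x == ALPHA else x for x in word]


def phi(a, word):
    """phi_a: ALPHA -> ALPHA a, every other letter is fixed."""
    out = []
    for x in word:
        out.extend([ALPHA, a] if x == ALPHA else [x])
    return out


def apply_chain(w, word):
    """Apply sigma_{w[0]} phi_{w[1]} ... phi_{w[-1]} to a finite word."""
    for a in reversed(w[1:]):  # the innermost morphism is applied first
        word = phi(a, word)
    return sigma(w[0], word)


prefix = list("abracadabra")            # arbitrary sample prefix w_0 ... w_n
image_of_alpha = apply_chain(prefix, [ALPHA])
image_of_b = apply_chain(prefix, ["b"])
print("".join(image_of_alpha))          # the prefix itself
print("".join(image_of_b))              # a letter of A is left unchanged
```

The image of α is the prefix $w_0 \cdots w_n$, while the image of a letter of A keeps length 1, whatever the number of morphisms applied.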

Nevertheless, it can be observed that (as far as we know) all infinite words with linear complexity that have been studied are S-adic for a suitable (finite) set S of morphisms. Moreover, an important result due to Ferenczi in the context of symbolic dynamical systems can be reformulated in the context of infinite words as follows.

Theorem 1.1. ([12], see also [13]) If an aperiodic and uniformly recurrent word has linear complexity then, for a finite set S of non-erasing morphisms, it is an everywhere growing S-adic sequence.

The everywhere growing property$^2$ means that there exists a sequence $(\sigma_n)_{n \in \mathbb{N}} \in S^{\mathbb{N}}$ of morphisms of S generating the considered infinite word such that for all letters a in A, the length of $\sigma_0 \sigma_1 \cdots \sigma_n(a)$ tends to infinity with n. This property

is not verified by Cassaigne’s morphisms. The previous result of Ferenczi and other considerations (as for instance, the existence of an S-adic characterization of the family of linearly recurrent words [10]) are the origin of the S-adic conjecture which states the existence of a strong definition of S-adicity that would be equivalent to having linear complexity for a symbolic dynamical system or an infinite word (see [20, Section 12.1]). Unfortunately the everywhere growing property is not sufficient to solve this conjecture. Some examples [11] tend to show that the strong expected definition (if it exists) should concern not only the set S of morphisms but also the sequence of these morphisms used to define the infinite word.

Ferenczi’s proof of Theorem 1.1 generalizes an idea of Arnoux and Rauzy [2]; it is based on the study of Rauzy graphs, a tool independently introduced by Rauzy [23] and Boshernitzan [5] and now often used to study the factor complexity of words. The Rauzy graph $G_n$ of order n of an infinite word w is the directed graph whose vertices are the factors of length n of w and such that any factor $u = u_1 \cdots u_{n+1}$ of length n + 1 of w induces an edge in $G_n$ from $u_1 \cdots u_n$ to $u_2 \cdots u_{n+1}$. This edge is labelled by the pair $(u_1, u_{n+1})$. The label of a path in the graph is the concatenation of the labels of the edges along the path.
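As an illustration of this definition, here is a minimal Python sketch that builds the Rauzy graph of order n from a finite prefix of an infinite word. The Fibonacci word (fixed point of a → ab, b → a) is used only as a convenient example, and the function names are hypothetical. The graph of a prefix coincides with the graph of the infinite word only when the prefix is long enough to contain all factors of length n + 1.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, obtained by iterating a -> ab, b -> a."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if x == "a" else "a" for x in word)
    return word[:length]


def rauzy_graph(word, n):
    """Vertices: factors of length n. Edges: one per factor of length n + 1,
    stored as {(source, target): (first letter, last letter)}."""
    vertices = {word[i:i + n] for i in range(len(word) - n + 1)}
    edges = {}
    for i in range(len(word) - n):
        u = word[i:i + n + 1]
        edges[(u[:-1], u[1:])] = (u[0], u[-1])
    return vertices, edges


vertices, edges = rauzy_graph(fibonacci_prefix(500), 3)
for (source, target), label in sorted(edges.items()):
    print(source, "->", target, "label", label)
```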

Leroy [13] revisited Ferenczi’s proof in order to provide more information on the morphisms arising in the S-adic representation. He succeeded in going deeper into the study of the underlying Rauzy graphs. He observed, without going further, that some important paths (called segments) in the Rauzy graphs used by Ferenczi are related to the notion of return words to a set. More precisely, the first (resp. the second) component of their labels corresponds to left (resp. right) return words to the set of left special factors. In this paper, we use this notion of return words to prove Theorem 1.1. Even though it is an adaptation of Ferenczi’s, we hope that our more compact proof will open new ideas to tackle the S-adic conjecture.

2 This property was called ω-growth property in [13]. However, when S is a singleton, the

An interest of our approach may be to provide non-contextual definitions of short or long return words before the proof when their corresponding terms (short or long segments) in the context of Rauzy graphs are defined in Ferenczi’s proof. Furthermore, this concept of return words to a set W generalizes the one of Rauzy graphs since it does not require that the words in W share the same length.

The next two sections are preliminaries for our proof in Section 4. They allow us to present these notions of short and long return words in a general context independently of the complexity of the infinite words considered.

2. Return words to special words

The aim of this section is to recall the main combinatorial notions used in our proof. This is done through the development of an example that will be used in Section 3 to illustrate the existence of short return words that are not constant return words.

2.1. Words and morphisms

We assume that readers are familiar with combinatorics on words; for basic (possibly omitted) definitions, we follow [4, 14, 15].

Given an alphabet A, that is, a finite set of symbols called letters, we let $A^*$ denote the set of all finite words over A (that is, the set of all finite sequences of elements of A). As usual, the concatenation of two words u and v is simply denoted uv. It is well known that the set $A^*$ equipped with the concatenation operation is a free monoid with neutral element ε, the empty word.

Given two alphabets A and B, a free monoid morphism, or simply morphism, σ is a map from $A^*$ to $B^*$ such that σ(uv) = σ(u)σ(v) for all words u and v over A (observe that this implies σ(ε) = ε). It is well known that a morphism is completely determined by the images of the letters.

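From now on, denoting $\Gamma = \{e_0, e_1, e_2, e_3, b_1, b_2, b_3, b_4, a_1, a_2, 1, 2, c\}$ and $\Sigma = \{e_1, e_2, e_3, b, a, 1, 2, c\}$, we consider the following two morphisms ν and ψ:

\[
\begin{array}{ll}
\nu(e_0) = e_0\,1\,b_4\,1\,1\,e_0 & \psi(e_0) = e_1 \\
\nu(e_1) = e_1\,c\,c\,b_1\,1\,e_1 & \psi(e_1) = e_1 \\
\nu(e_2) = e_2\,c\,c\,b_2\,2\,e_2 & \psi(e_2) = e_2 \\
\nu(e_3) = e_3\,c\,c\,b_3\,2\,e_3 & \psi(e_3) = e_3 \\
\nu(b_1) = c\,c\,c\,e_1\,a_1\,b_1 & \psi(b_1) = b \\
\nu(b_2) = c\,c\,c\,e_2\,a_2\,b_2 & \psi(b_2) = b \\
\nu(b_3) = c\,c\,c\,c\,e_3\,b_3 & \psi(b_3) = b \\
\nu(b_4) = c\,c\,c\,c\,c\,b_4 & \psi(b_4) = b \\
\nu(1) = c\,c\,c\,c\,b_1\,1 & \psi(1) = 1 \\
\nu(2) = c\,c\,c\,c\,b_1\,2 & \psi(2) = 2 \\
\nu(a_1) = c\,a_1\,c\,c\,e_0\,e_1 & \psi(a_1) = a \\
\nu(a_2) = c\,a_2\,c\,c\,e_2\,e_3 & \psi(a_2) = a \\
\nu(c) = c\,c\,c\,c\,a_1\,a_2 & \psi(c) = c
\end{array}
\]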

One can observe that ν is uniform, that is, the images of the letters all have the same length (let us recall that the length of a word u, denoted by |u|, is the number of letters of u). The morphism ν is also primitive, that is, there exists an integer k (here k = 5) such that for all letters x, y in Γ, the letter y occurs in $\nu^k(x)$.

Note also that the morphisms ν and ψ ∘ ν are injective. Moreover, they are synchronizing morphisms; a morphism f from $A^*$ to $B^*$ is synchronizing if for all letters x, y, z in the alphabet A and words p, s over B, the equality f(x)f(y) = p f(z) s implies p = ε or s = ε.
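The uniformity and primitivity claims can be checked mechanically. The sketch below is a hedged illustration: the tables transcribe the images of ν and ψ as read from the definition above, letters are encoded as strings such as "e0" or "b1", and all function names are hypothetical. It also computes a prefix of the fixed point u starting with c.

```python
# Images of the letters under nu, transcribed from the definition above.
NU_TABLE = {
    "e0": "e0 1 b4 1 1 e0", "e1": "e1 c c b1 1 e1",
    "e2": "e2 c c b2 2 e2", "e3": "e3 c c b3 2 e3",
    "b1": "c c c e1 a1 b1", "b2": "c c c e2 a2 b2",
    "b3": "c c c c e3 b3",  "b4": "c c c c c b4",
    "1": "c c c c b1 1",    "2": "c c c c b1 2",
    "a1": "c a1 c c e0 e1", "a2": "c a2 c c e2 e3",
    "c": "c c c c a1 a2",
}
PSI_TABLE = {
    "e0": "e1", "e1": "e1", "e2": "e2", "e3": "e3",
    "b1": "b", "b2": "b", "b3": "b", "b4": "b",
    "1": "1", "2": "2", "a1": "a", "a2": "a", "c": "c",
}
NU = {x: tuple(img.split()) for x, img in NU_TABLE.items()}
PSI = {x: (img,) for x, img in PSI_TABLE.items()}


def apply_morphism(morphism, word):
    """Image of a word (a tuple of letters) under a morphism given on letters."""
    return tuple(y for x in word for y in morphism[x])


def letters_after(letter, k):
    """Set of letters occurring in nu^k(letter)."""
    word = (letter,)
    for _ in range(k):
        word = apply_morphism(NU, word)
    return set(word)


def all_letters_reached(k):
    """True if every letter occurs in nu^k(x) for every letter x."""
    return all(letters_after(x, k) == set(NU) for x in NU)


def fixed_point_prefix(length):
    """Prefix of the fixed point of nu starting with c (nu(c) starts with c)."""
    word = ("c",)
    while len(word) < length:
        word = apply_morphism(NU, word)
    return word[:length]


is_uniform = len({len(img) for img in NU.values()}) == 1
u_prefix = fixed_point_prefix(200)
print(is_uniform, all_letters_reached(5))
print(" ".join(apply_morphism(PSI, u_prefix[:24])))
```

With this transcription, the exponent 5 cannot be lowered: the letter 2 does not occur in $\nu^4(e_0)$.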

2.2. Infinite words and factors

A (right) infinite word (or sequence) over an alphabet A is an element of $A^{\omega}$. When a morphism is not erasing, that is, the images of letters are never the empty word, the notion of morphism extends naturally to infinite words. From now on, u will be the fixed point of ν (u = ν(u)) starting with the letter c.

A word v is a factor of a finite or infinite word w over an alphabet A if it occurs in w or, equivalently, if w = xvy for some words x, y in $A^* \cup A^{\omega}$. When x = ε (resp. y = ε), v is a prefix (resp. suffix) of w. We let F(w) denote the set of all factors of w.

Observe that for all n ≥ 0, the words $\nu^n(1)$ and $\nu^n(2)$ are factors of u (let us recall that, given a morphism f and an integer n ≥ 0, $f^n$ is the identity if n = 0, and is $f^{n-1} \circ f$ otherwise).

Fact 2.1. Let n ≥ 0 be an integer and v be a factor of u. If $\psi(v) = \psi(\nu^n(1))$ then $v = \nu^n(1)$. If $\psi(v) = \psi(\nu^n(2))$ then $v = \nu^n(2)$.

Proof. Let i ∈ {1, 2}. The result is immediate for n = 0 from the definitions of ν and ψ. Assume n ≥ 1. Since ψ ∘ ν is synchronizing and ψ(v) is a factor of ψ(u) = ψ ∘ ν(u), there must exist a word v′ such that ψ(v) = ψ ∘ ν(v′). The

all words x in $\Gamma^*$, the equality ψ(x) = ψ ∘ ν(α) implies x = ψ(α). This extends to α in $\Gamma^*$. Hence $v = \nu^n(i)$.

As with any fixed point of a primitive morphism [22], the word u is an example of a uniformly recurrent infinite word, i.e., its factors occur infinitely often and with bounded gaps.

The complexity function $p_w$ of an infinite word w is the function that counts the number of factors of each length (see [7] for a recent survey on this function). An infinite word w is said to have linear complexity if there is a constant C such that $p_w(n) \le Cn$ for all n ≥ 1. Again, as with any fixed point of a primitive morphism [22], the word u is an example of an infinite word with linear complexity.

2.3. Special factors

Let us recall that a factor u of an infinite word w over an alphabet A is left special if there exist at least two different letters a and b in A such that the words au and bu are factors of w.
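The definition of left special factors is easy to experiment with on a finite prefix. The following Python sketch uses the Fibonacci word purely as a familiar example (it is not the word u of this section), and the helper names are hypothetical. A factor is reported as left special when two different letters precede it inside the prefix, so the result is only meaningful for lengths that are small compared with the prefix.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, obtained by iterating a -> ab, b -> a."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if x == "a" else "a" for x in word)
    return word[:length]


def left_special_factors(word, n):
    """Factors u of length n such that au and bu occur for two letters a != b."""
    preceding = {}
    for i in range(len(word) - n):
        extended = word[i:i + n + 1]          # one letter followed by a factor
        preceding.setdefault(extended[1:], set()).add(extended[0])
    return {u for u, letters in preceding.items() if len(letters) >= 2}


w = fibonacci_prefix(1000)
for n in range(1, 9):
    print(n, sorted(left_special_factors(w, n)))
```

For the Fibonacci word, each length has exactly one left special factor, namely the prefix of that length.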

For all n ≥ 1, observing that the words $\nu^n(1)$ and $\nu^n(2)$ differ only by their last letters, we let $v_n$ denote their longest common prefix; $v_0 = \varepsilon$ and $v_n = \nu(v_{n-1})\,c\,c\,c\,c\,b_1$ for n ≥ 1. We also let $u_n$ denote the word $\psi(v_n)$.

Fact 2.2. For all n ≥ 1, the words $a b u_n$, $b u_n 2$ and $u_n 1 e_1$ are examples of left special factors of ψ(u).

Proof. As the words $b_1 1 e_1$, $b_2 2 e_2$, $1 1 e_0$ and $b_3 2 e_3$ are factors of u, for all n ≥ 1, the following words are also factors of u: $e_1 a_1 b_1 v_n 1 e_1 c$, $e_2 a_2 b_2 v_n 2 e_2 c$, $1 v_n 1 e_0 1$, $e_3 b_3 v_n 2 e_3$.

Thus, the words $e_1 a b u_n 1 e_1 c$, $e_2 a b u_n 2 e_2 c$, $1 u_n 1 e_1 1$ and $e_3 b u_n 2 e_3$ are factors of ψ(u). Hence the fact.

As a consequence of the previous fact, the word ψ(ν(u)) has arbitrarily large special factors. Thus it is aperiodic, that is, denoting by $x_i$ the letter in position i of ψ(ν(u)), there do not exist positive integers m, k such that $x_i = x_{i+k}$ for all i ≥ m.

We will also need the next property.

Lemma 2.3. For all n ≥ 1, the following words are factors of ψ(u) but are not left special factors of ψ(u): $b u_n 1$, $u_n 1 e_1 c$, $u_n 1 c$, $u_n 1 e_1 1$, $u_n 2 e_2$, $u_n 2 e_3$.

Proof. First, let $x = b u_n 1 = b\,\psi(\nu^n(1))$ and assume that, for a letter α, αx is a factor of ψ(u). As n ≥ 1, $u_n$ starts with ccc. This implies that αb must be a suffix of a word in ψ(ν(Γ)) and so $\alpha \in \{a, e_3\}$. If $\alpha = e_3$, then u must contain as a factor the word $e_3 b_3 v_n 1 = e_3 b_3 \nu^n(1)$ (see Fact 2.1). But one can observe that u contains

Now let $x = u_n 1 e_1 c = \psi(\nu^n(1))\,e_1 c$. If αx is a factor of ψ(u) for a letter α, then there exists a letter β such that (we use here Fact 2.1 and the fact that $e_0 c$ is not a prefix of a word in ν(Γ)) $\beta v_n 1 e_1 c$ is a factor of u with α = ψ(β). We let the reader verify (for instance by induction) that if, for an integer ℓ and a letter δ, $\delta \nu^{\ell}(1) e_1$ is a factor of u, then $\delta = b_1$. Hence $\beta = b_1$ and α = b.

The proof for other words is similar. Let us just state the following facts (with the same notation as above).

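\[
\begin{array}{lll}
x = u_n 1 c & \Rightarrow\ \beta \in \{b_1, b_4\} & \text{and } \alpha = b \\
x = u_n 1 e_1 1 & \Rightarrow\ \beta = b_1 & \text{and } \alpha = b \\
x = u_n 2 e_2 & \Rightarrow\ \beta \in \{b_1, b_2\} & \text{and } \alpha = b \\
x = u_n 2 e_3 & \Rightarrow\ \beta \in \{b_1, b_3\} & \text{and } \alpha = b
\end{array}
\]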

2.4. Return words

Given a set L of factors of an infinite word w, a word r is called a (left) return word to L in w if there exist two words u and v in L such that rv is a factor of w, rv admits u as a prefix and rv contains no other occurrences of words of L. Clearly if a set L contains a prefix of w and w is recurrent (all of its factors occur infinitely often), then w can be decomposed over the set of left return words to L.

In this paper, given an infinite word w, we will mostly consider the set $\mathcal{R}_n(w)$ of return words to the set $LS_n(w)$ composed of the prefix of length n ≥ 1 of w and of all left special factors of length n of w. At the end of this section, we show that a, b and ab are elements of $\mathcal{R}_n(\psi(u))$ (where u is the word defined in the previous sections), with a more precise description of the values of n for which this membership holds.

Lemma 2.4. For all n ≥ 1, $\{a, b\} \subseteq \mathcal{R}_n(\psi(u))$.

For all n ≥ 0, $ab \in \mathcal{R}_{|u_n|+2}(\psi(u)) = \mathcal{R}_{6^n+1}(\psi(u))$.

Proof. Let n ≥ 1; then $|u_n| = 6^n - 1 \ge n$. Let $p_i$ be the prefix of length i of $u_n$, where $0 \le i \le |u_n|$. From Fact 2.2, we know that $a b p_n$ is a factor of ψ(u) and that the words $a b p_{n-2}$ (when n ≥ 2), $b p_{n-1}$ and $p_n$ are all left special factors (and all of length n). As $p_{n-2}$ is a prefix of $p_{n-1}$, itself a prefix of $p_n$, a and b belong to $\mathcal{R}_n(\psi(u))$.

The word $a b u_n 1 e_1$ is also a factor of ψ(u) with $a b u_n$ and $u_n 1 e_1$ left special factors. As, by Lemma 2.3, $b u_n 1$ is not a left special factor, $ab \in \mathcal{R}_{|u_n|+2}(\psi(u)) = \mathcal{R}_{6^n+1}(\psi(u))$.

The last part of Lemma 2.4 admits a converse.

Lemma 2.5. Let n ≥ 2 be an integer. If $ab \in \mathcal{R}_n(\psi(u))$ then $n = 6^k + 1$ for some k ≥ 0.

Although this lemma will allow the important Remark 3.1, we provide its proof in an appendix (see Section 5) as it is rather long, technical, and a bit far from the main aim of the paper.


3. Short, constant and long return words

In this section, we consider particular return words that will be useful in the next section where we present our proof of Theorem 1.1. Note that, as already said in the introduction, the terms short and long were already used in Ferenczi’s paper [12] but, even though our concepts are related, they are differently defined. For instance the fact that the length of long return words tends to infinity is a consequence of the definition, not part of it.

It is important to observe that all results in this section are independent of the complexity of the words considered.

3.1. Definitions

We will need to distinguish various types of return words to the sets $LS_n(w)$. A factor u of w is called a short return word of w if for all n ≥ 0, there exists an integer N ≥ n such that u is an element of $\mathcal{R}_N(w)$. When a factor u is a return word of w (i.e. belongs to $\mathcal{R}_n(w)$ for some integer n) without being a short return word, we say that u is a long return word. Amongst short return words, we say that a factor u of w is a constant return word of w if there exists an integer N ≥ 0 such that for all n ≥ N, u is an element of $\mathcal{R}_n(w)$. For n ≥ 0, we let $\mathcal{R}^{(L)}_n(w)$, $\mathcal{R}^{(S)}_n(w)$ and $\mathcal{R}^{(C)}_n(w)$ denote respectively the set of long return words to $LS_n(w)$, the set of short return words to $LS_n(w)$ and the set of constant return words to $LS_n(w)$.

Remark 3.1. Our example u studied in Section 2 shows that not all short return words are constant return words. More precisely, Lemma 2.4 and Lemma 2.5 prove that a, b and ab are examples of short return words, with a and b constant return words and ab not a constant return word. Indeed, for all n we have $a, b \in \mathcal{R}^{(C)}_n(\psi(u)) \subseteq \mathcal{R}^{(S)}_n(\psi(u))$ and $ab \in \bigcup_{n \ge 1} \mathcal{R}^{(S)}_n(\psi(u)) \setminus \bigcup_{n \ge 1} \mathcal{R}^{(C)}_n(\psi(u))$.

As c, cc and ccc are left special factors of ψ(u), and as cccc is a factor of ψ(u), c is a return word of ψ(u), and more precisely, $c \in \mathcal{R}_1(\psi(u)) \cap \mathcal{R}_2(\psi(u)) \cap \mathcal{R}_3(\psi(u))$.

Observe that for all words x of length at least four such that cx is a factor of ψ(u), x is always preceded in ψ(u) by the letter c. Thus, for all n ≥ 4, c is not a return word to $LS_n(\psi(u))$: $c \in (\mathcal{R}^{(L)}_1(\psi(u)) \cap \mathcal{R}^{(L)}_2(\psi(u)) \cap \mathcal{R}^{(L)}_3(\psi(u))) \setminus \mathcal{R}^{(L)}_4(\psi(u))$.

One would like to have examples of short or constant return words for an infinite word defined over a smaller alphabet than Σ. We let the reader check that this can be obtained by recoding ψ(u), for instance with the following morphism f defined from Σ to {a, b} by $f(a) = a^8 b$, $f(b) = a^7 b^2$, $f(e_1) = a^6 b^3$, $f(e_2) = a^5 b^4$, $f(e_3) = a^4 b^5$, $f(1) = a^3 b^6$, $f(2) = a^2 b^7$, $f(c) = a b^8$.

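We will see that for infinite words with linear complexity the set of short return words is finite. Thus it is convenient to let $\mathcal{R}^{(S)}(w)$ (resp. $\mathcal{R}^{(C)}(w)$) denote the set $\bigcup_{n \ge 1} \mathcal{R}^{(S)}_n(w)$ (resp. $\bigcup_{n \ge 1} \mathcal{R}^{(C)}_n(w)$).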

3.2. Coherent decompositions

From our example, we can observe that the sets $\mathcal{R}_n(w)$ may not be codes (a code is a set C of words such that any word in $C^*$ has a unique decomposition over C). This leads us to introduce the notion of coherent decomposition.

A sequence $(r_k)_{k \in \mathbb{N}} \in (\mathcal{R}_n(w))^{\mathbb{N}}$ such that $w = r_0 r_1 r_2 \cdots$ is called a decomposition of w. The unique decomposition $(r_k)_{k \in \mathbb{N}}$ of w such that a factor of w belongs to $LS_n(w)$ if and only if it is the prefix of length n of $r_i r_{i+1} \cdots$ for some i ≥ 0 is called the coherent decomposition of w of level n.
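The coherent decomposition can be computed on a finite prefix by cutting the word at every position where an element of $LS_n(w)$ starts. The Python sketch below is an illustration only: it uses the Fibonacci word as a sample word, the function names are hypothetical, and the blocks obtained are reliable only for levels n that are small compared with the prefix.

```python
def fibonacci_prefix(length):
    """Prefix of the Fibonacci word, obtained by iterating a -> ab, b -> a."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if x == "a" else "a" for x in word)
    return word[:length]


def ls_set(word, n):
    """LS_n: the prefix of length n together with the left special factors
    of length n (as seen inside the finite word)."""
    preceding = {}
    for i in range(1, len(word) - n + 1):
        preceding.setdefault(word[i:i + n], set()).add(word[i - 1])
    special = {u for u, letters in preceding.items() if len(letters) >= 2}
    return special | {word[:n]}


def coherent_decomposition(word, n):
    """Blocks between consecutive occurrences of elements of LS_n."""
    ls = ls_set(word, n)
    cuts = [i for i in range(len(word) - n + 1) if word[i:i + n] in ls]
    return [word[i:j] for i, j in zip(cuts, cuts[1:])]


w = fibonacci_prefix(500)
blocks = coherent_decomposition(w, 3)
return_words = set(blocks)
print(blocks[:8])
print(sorted(return_words))
```

The set of blocks is the part of $\mathcal{R}_3(w)$ that is visible in the prefix; the incomplete block after the last cut is dropped.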

Let $s_1, \dots, s_k$ be words. Let n be an integer and let w be an infinite word. We say that $(s_1, \dots, s_k)$ is a factor of the coherent decomposition of w of level n if there exists an integer m ≥ 0 such that for all i in {1, …, k}, $s_i = r_{m+i-1}$. Note that if $(s_1, \dots, s_k)$ is a factor of the coherent decomposition of w of level n, then $\{s_1, \dots, s_k\} \subseteq \mathcal{R}_n(w)$.

The next result shows a connection between the coherent decompositions of w for two successive levels.

Lemma 3.2. Let $(r_i)_{i \ge 0}$ and $(r'_i)_{i \ge 0}$ be the coherent decompositions of an infinite word w respectively of level n and level n + 1 (for an integer n ≥ 1). There exists a strictly increasing sequence of integers $(i_j)_{j \ge 0}$ such that $i_0 = 0$ and for all j ≥ 0, $r'_j = r_{i_j} \cdots r_{i_{j+1}-1}$.

Proof. This is a direct consequence of the definition of coherent decompositions and the observation that the prefix of length n of any element of $LS_{n+1}(w)$ is an element of $LS_n(w)$.
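Lemma 3.2 says that the cut positions of level n + 1 form a subset of the cut positions of level n. The following Python sketch checks this on a prefix of the Thue-Morse word, chosen here only as a sample word with several left special factors per length; the function names are hypothetical.

```python
def thue_morse_prefix(length):
    """Prefix of the Thue-Morse word, obtained by iterating a -> ab, b -> ba."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if x == "a" else "ba" for x in word)
    return word[:length]


def ls_set(word, n):
    """Prefix of length n and left special factors of length n."""
    preceding = {}
    for i in range(1, len(word) - n + 1):
        preceding.setdefault(word[i:i + n], set()).add(word[i - 1])
    special = {u for u, letters in preceding.items() if len(letters) >= 2}
    return special | {word[:n]}


def cut_positions(word, n):
    """Positions where an element of LS_n starts."""
    ls = ls_set(word, n)
    return [i for i in range(len(word) - n + 1) if word[i:i + n] in ls]


def regroup(word, n):
    """For each block of level n + 1, the list of level-n blocks it is made of."""
    fine = cut_positions(word, n)
    coarse = cut_positions(word, n + 1)
    groups = []
    for p, q in zip(coarse, coarse[1:]):
        inner = [c for c in fine if p <= c <= q]
        groups.append([word[a:b] for a, b in zip(inner, inner[1:])])
    return groups


w = thue_morse_prefix(2048)
refines = all(
    set(cut_positions(w, n + 1)) <= set(cut_positions(w, n)) for n in range(1, 9)
)
print(refines)
print(regroup(w, 2)[:5])
```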

3.3. Short return words are concatenations of constant return words

In this section, we prove the following proposition, which states more precisely that any short return word of an infinite word w is a concatenation of constant return words to $LS_n(w)$.

Proposition 3.3. Given an infinite word w, for any short return word s of w, there exist infinitely many integers n such that, for the coherent decomposition $(r_i)_{i \ge 0}$ of level n of w, $s = r_i \cdots r_j$ for some integers i and j such that all words $r_\ell$, $i \le \ell \le j$, are constant return words.

Actually Proposition 3.3 is a particular case of the following technical lemma (Proposition 3.3 corresponds to the case k = 1 of Lemma 3.4).

Lemma 3.4. Let w be an infinite word and let $s_1, \dots, s_k$ be words such that $(s_1, \dots, s_k)$ is a factor of the coherent decomposition of w of level n for infinitely many integers n (this implies in particular that all words $s_1, \dots, s_k$ are short return words). Then, for some integers $0 = i_0 < i_1 < \dots < i_k$, there exist constant return words $r_1, \dots, r_{i_k}$ such that $(r_1, \dots, r_{i_k})$ is a factor of the coherent decomposition of w of level m for infinitely many integers m, and for all j ∈ {1, …, k}, $s_j = r_{i_{j-1}+1} \cdots r_{i_j}$.

Proof. We start by defining a partial order on tuples of words. Given any words $t_1, \dots, t_\ell, s_1, \dots, s_k$ such that $t_1 \cdots t_\ell = s_1 \cdots s_k$, we let $(t_1, \dots, t_\ell) < (s_1, \dots, s_k)$ denote the fact that there exists an integer j in {1, …, min(ℓ, k)} such that $t_i = s_i$ for all i = 1, …, j − 1 and $|t_j| < |s_j|$. Observe that, given a word s, this partial order is a total order on the set of all tuples $(s_1, \dots, s_k)$ such that $s = s_1 \cdots s_k$. This allows us to prove Lemma 3.4 by induction.

Let $s_1, \dots, s_k$ and w be as in the lemma. Let D be the set of all integers n such that $(s_1, \dots, s_k)$ is a factor of the coherent decomposition of w of level n. By hypothesis, this set is infinite.

First, consider the case where for all j, $s_j$ is of length 1. Let j, 1 ≤ j ≤ k. For all n in D, by definition of this set, there exists a word $u_n$ of length n such that both $u_n$ and the prefix of length n of $s_j u_n$ belong to $LS_n(w)$. As the prefix of length n − 1 of any element of $LS_n(w)$ is an element of $LS_{n-1}(w)$, we deduce that for all n ≥ 1, there exists a word $v_n$ of length n such that both $v_n$ and the prefix of length n of $s_j v_n$ are elements of $LS_n(w)$. As D is infinite, this shows that $s_j$ is a constant return word.

We now turn to the case where there exists a word $s_j$ (with j ∈ {1, …, k}) which is not a constant return word. It follows that there exists an infinite subset D′ of D such that for all n in D′, $(s_1, \dots, s_k)$ is not a factor of the coherent decomposition of w of level n − 1, whereas $(s_1, \dots, s_k)$ is a factor of the coherent decomposition of w of level n. By Lemma 3.2, there exist some words $t_1(n), \dots, t_{\ell(n)}(n)$ and some integers $i_0 = 0 < i_1 < \dots < i_k = \ell(n)$ such that for all j = 1, …, k, we have $s_j = t_{i_{j-1}+1}(n) \cdots t_{i_j}(n)$ and $(t_1(n), \dots, t_{\ell(n)}(n))$ is a factor of the coherent decomposition of w of level n − 1: each word $s_j$ is a concatenation of return words to $LS_{n-1}(w)$, and at least one is the concatenation of at least two such words. Moreover, we have $(t_1(n), \dots, t_{\ell(n)}(n)) < (s_1, \dots, s_k)$. Even though the words $t_1(n), \dots, t_{\ell(n)}(n)$ depend on n, as the length $|t_1(n) \cdots t_{\ell(n)}(n)| = |s_1 \cdots s_k|$ is fixed, there must exist an integer ℓ, some words $t_1, \dots, t_\ell$, and an infinite set D′′ such that for all n in D′′, ℓ(n) = ℓ and for all i = 1, …, ℓ, $t_i(n) = t_i$. The proof of the lemma ends by induction applied to $(t_1, \dots, t_\ell)$.

3.4. More on long return words

The next result justifies a posteriori the terminology long for long return words.

Proposition 3.5. For an infinite word w, if $\mathcal{R}^{(L)}_n(w) \neq \emptyset$ for infinitely many integers n, then
\[
\lim_{n \to \infty} \min\{|r| \mid r \in \mathcal{R}^{(L)}_n(w)\} = +\infty.
\]

Proof. This result is a direct consequence of the definitions of short and long return words.

Assume by contradiction that $\lim_{n \to \infty} \min\{|r| \mid r \in \mathcal{R}^{(L)}_n(w)\}$ is finite. Let D be an infinite subset of $\mathbb{N}$ such that $\mathcal{R}^{(L)}_n(w) \neq \emptyset$ for all n in D. For all i ∈ D, let $r_i$ be a word with minimal length in $\mathcal{R}^{(L)}_i(w)$. As the length of $r_i$ is bounded, the set $\{r_i \mid i \in D\}$ is finite. In particular, there exists a word r such that $r \in \mathcal{R}^{(L)}_i(w)$ for all i in an infinite subset of D. Thus, r is a short return word of w, which is a contradiction as long return words cannot be short by definition.

3.5. Return words to a word

In almost the entire paper we are concerned with the set of return words to the set of left special words of an infinite word. The next important and well-known result uses the notion of return word to a single word.

A nonempty word r is a return word to a factor u of an infinite word w if ru is a factor of w starting with u and containing exactly two occurrences of u as a factor (one as a prefix and one as a suffix).

Proposition 3.6. Let w be an aperiodic and uniformly recurrent infinite word. We have
\[
\lim_{n \to +\infty} \ \min_{\substack{u \text{ factor of } w \\ |u| = n}} \{|r| \mid r \text{ return word to } u\} = +\infty.
\]

Although the previous result is certainly well known (see for instance [9] for a proof of a similar result), we provide a short proof in order to get a self-contained paper. Let us recall one fundamental lemma of combinatorics on words (see [16] and for instance [14, Prop. 1.3.4]).

Lemma 3.7. For two nonempty words x and y, there exists a word z such that xz = zy if and only if there exist two words u and v such that x = uv, $z \in (uv)^* u$ and y = vu.
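A small numerical check may help to read Lemma 3.7. The Python sketch below is an illustration with arbitrarily chosen words u = ab and v = a and hypothetical function names: it builds the solutions $z \in (uv)^* u$ of xz = zy, and compares them with a brute-force search over short words.

```python
from itertools import product


def lemma_solutions(u, v, max_power):
    """Words x = uv, y = vu and the solutions z = (uv)^k u for k <= max_power."""
    x, y = u + v, v + u
    return x, y, [(u + v) * k + u for k in range(max_power + 1)]


def brute_force_solutions(x, y, alphabet, max_length):
    """All words z with |z| <= max_length such that xz = zy."""
    found = set()
    for length in range(max_length + 1):
        for letters in product(alphabet, repeat=length):
            z = "".join(letters)
            if x + z == z + y:
                found.add(z)
    return found


x, y, zs = lemma_solutions("ab", "a", 3)
checks = [x + z == z + y for z in zs]
short_solutions = brute_force_solutions(x, y, "ab", 6)
print(x, y, zs)
print(sorted(short_solutions, key=len))
```

In the proof of Proposition 3.6, the lemma is applied with x = r and z = $u_n$: every solution z is a prefix of a power of x, which is why long words $u_n$ force high powers of r.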

Proof of Proposition 3.6. Assume by contradiction that the limit in Proposition 3.6 is finite. This implies that there exists a word r which is a return word to infinitely many words $(u_n)_{n \ge 0}$. This means that there exist some words $(x_n)_{n \ge 0}$ such that $u_n x_n = r u_n$. By Lemma 3.7, this implies that $r^k$ is a factor of w for arbitrarily large k, which contradicts the fact that w is aperiodic and uniformly recurrent.

4. S-adicity of infinite words with linear complexity

In this section, we provide our proof of Theorem 1.1. Each subsection corresponds to a step or a key element of the proof. We first clarify the notion of S-adicity. Let A be an alphabet. Recall that with the product topology, the set $A^{\omega}$ of infinite words over A is a compact metric space. As usual, for a finite word u, $u^{\omega}$ denotes the infinite word obtained by concatenating infinitely many copies of u.

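Let w be an infinite word over A. An adic representation of w is given by a sequence $(A_n)_{n \in \mathbb{N}}$ of alphabets, a sequence $(\sigma_n : A_{n+1}^* \to A_n^*)_{n \in \mathbb{N}}$ of morphisms and a sequence $(a_n)_{n \in \mathbb{N}}$ of letters such that $a_i \in A_i$ for all i ≥ 0, $A_0 = A$, $\lim_{n \to +\infty} |\sigma_0 \sigma_1 \cdots \sigma_n(a_{n+1})| = +\infty$ and
\[
w = \lim_{n \to +\infty} \sigma_0 \sigma_1 \cdots \sigma_n(a_{n+1}^{\omega}).
\]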
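To make the limit concrete, the following Python sketch computes a prefix of $\sigma_0 \sigma_1 \cdots \sigma_n(a_{n+1}^{\omega})$ for a finite list of morphisms. It is only an illustration: the function names are hypothetical, and the example takes every $\sigma_n$ equal to the Fibonacci morphism a → ab, b → a, so that S is a singleton.

```python
def apply_morphism(morphism, word):
    """Image of a finite word under a morphism given by a dict on letters."""
    return "".join(morphism[x] for x in word)


def adic_prefix(morphisms, last_letter, length):
    """Prefix of sigma_0 ... sigma_n(a^omega), where morphisms lists
    sigma_0, ..., sigma_n and a is last_letter."""
    word = last_letter
    for sigma in reversed(morphisms):   # sigma_n is applied first
        word = apply_morphism(sigma, word)
    # The image of a^omega is the periodic word word^omega.
    repeats = length // len(word) + 1
    return (word * repeats)[:length]


FIB = {"a": "ab", "b": "a"}
approximations = [adic_prefix([FIB] * n, "a", 30) for n in (4, 8, 12)]
for prefix in approximations:
    print(prefix)
```

Longer directive sequences stabilise longer prefixes, which is what convergence in the product topology means here.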

When all morphisms $\sigma_n$, $n \in \mathbb{N}$, belong to a given finite set S of morphisms, we say that w is S-adic (often omitting the sequence of morphisms and letters).

4.1. The number of left special factors is bounded

The starting point of the proof of Theorem 1.1 is the following breakthrough result of Cassaigne on complexity.

Theorem 4.1. [6] An infinite word w has linear complexity if and only if the first difference of its complexity, $p_w(n+1) - p_w(n)$, is bounded.

For a finite word u and an infinite word w, let $\delta^-_w u$ denote the number of letters a such that au is a factor of w. Recall that a word u is a left special factor of w if and only if $\delta^-_w u \ge 2$. Since Rauzy’s works [23], it is well known that the first difference of complexity is related to special factors by the following Equation (1). For a recurrent infinite word w,
\[
p_w(n+1) - p_w(n) = \sum_{\substack{|u| = n \\ u \text{ left special factor of } w}} (\delta^-_w u - 1). \qquad (1)
\]
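Equation (1) can be observed numerically. The Python sketch below is a hedged illustration on a finite prefix of the Thue-Morse word, chosen only as a sample recurrent word; the function names are hypothetical. On a finite prefix, the identity is exact as long as every factor of length n occurs at least once away from position 0, which is why only small lengths are tested.

```python
def thue_morse_prefix(length):
    """Prefix of the Thue-Morse word, obtained by iterating a -> ab, b -> ba."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if x == "a" else "ba" for x in word)
    return word[:length]


def factors(word, n):
    """Set of factors of length n of a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def left_extensions(word, n):
    """Map each factor u of length n to the set of letters a with au a factor."""
    ext = {u: set() for u in factors(word, n)}
    for i in range(1, len(word) - n + 1):
        ext[word[i:i + n]].add(word[i - 1])
    return ext


w = thue_morse_prefix(4096)
table = []
for n in range(1, 11):
    first_difference = len(factors(w, n + 1)) - len(factors(w, n))
    special_sum = sum(
        len(letters) - 1
        for letters in left_extensions(w, n).values()
        if len(letters) >= 2
    )
    table.append((n, first_difference, special_sum))
    print(n, first_difference, special_sum)
```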

Thus Theorem 4.1 has the following corollary (let us recall from Section 2.4 that $LS_n(w)$ is the set consisting of the prefix of length n and the left special factors of length n of w).

Corollary 4.2. A recurrent infinite word w has linear complexity if and only if the sets $LS_n(w)$, n ≥ 0, have bounded cardinality.

4.2. The number of return words is bounded

Let us recall that $\mathcal{R}_n(w)$ is the set of return words to $LS_n(w)$, where w is an infinite word and n ≥ 1 is an integer.

Proposition 4.3. If w is a recurrent infinite word with linear complexity, then the cardinality of the sets $\mathcal{R}_n(w)$ is bounded.

Proof. Let r be a return word to the set $LS_n(w)$. By definition of this set, there exists a word u in $LS_n(w)$ (u is of length n and is a left special factor or a prefix of w) such that ru is a factor of w. Moreover, for each factor sp of ru of length n with 1 ≤ |s| < |r|, s a suffix of r and p a prefix of u, there exists a unique letter γ such that γsp is a factor of w. Hence r is entirely determined by the pair (u, α) with α its last letter (but the pair may not be unique). Thus, $\mathrm{Card}(\mathcal{R}_n(w)) \le \sum_{u \in LS_n(w)} \delta^-_w u = p_w(n+1) - p_w(n) + \mathrm{Card}(LS_n(w))$. The proposition follows from Theorem 4.1 and Corollary 4.2.
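The counting bound obtained in this proof can be compared with actual values. The following Python sketch is an illustration on a prefix of the Thue-Morse word, with hypothetical function names; the return words found in a prefix are only those visible in it, so the left-hand side may be underestimated for levels that are large compared with the prefix.

```python
def thue_morse_prefix(length):
    """Prefix of the Thue-Morse word, obtained by iterating a -> ab, b -> ba."""
    word = "a"
    while len(word) < length:
        word = "".join("ab" if x == "a" else "ba" for x in word)
    return word[:length]


def factors(word, n):
    """Set of factors of length n of a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ls_set(word, n):
    """Prefix of length n and left special factors of length n."""
    preceding = {}
    for i in range(1, len(word) - n + 1):
        preceding.setdefault(word[i:i + n], set()).add(word[i - 1])
    special = {u for u, letters in preceding.items() if len(letters) >= 2}
    return special | {word[:n]}


def return_words(word, n):
    """Return words to LS_n visible in the finite word."""
    ls = ls_set(word, n)
    cuts = [i for i in range(len(word) - n + 1) if word[i:i + n] in ls]
    return {word[i:j] for i, j in zip(cuts, cuts[1:])}


w = thue_morse_prefix(4096)
rows = []
for n in range(1, 9):
    bound = len(factors(w, n + 1)) - len(factors(w, n)) + len(ls_set(w, n))
    rows.append((n, len(return_words(w, n)), bound))
    print(n, rows[-1][1], "<=", bound)
bound_holds = all(count <= bound for _, count, bound in rows)
```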

The reader will note that the cardinality of a set X is denoted by Card(X).

4.3. The number of constant return words is finite

Let us recall from Section 3.1 that $\mathcal{R}^{(C)}(w) = \bigcup_{n \ge 1} \mathcal{R}^{(C)}_n(w)$ is the set of all constant return words to the sets $LS_n(w)$.

Proposition 4.4. For any recurrent infinite word w with linear complexity, the set $\mathcal{R}^{(C)}(w)$ is finite.

Proof. This is a direct consequence of Proposition 4.3. Let us consider $K = \max_{n \ge 1} \mathrm{Card}(\mathcal{R}_n(w))$. Assume by contradiction that $\mathcal{R}^{(C)}(w)$ is not finite. Let $r_1, \dots, r_{K+1}$ be distinct elements of $\mathcal{R}^{(C)}(w)$. There are some integers $N_1, \dots, N_{K+1}$ such that for all i in {1, …, K + 1} and for all $n \ge N_i$, $r_i$ belongs to $\mathcal{R}^{(C)}_n(w)$. Thus, for $n \ge \max\{N_i \mid 1 \le i \le K + 1\}$, we have $\{r_1, \dots, r_{K+1}\} \subseteq \mathcal{R}_n(w)$, which contradicts $\mathrm{Card}(\mathcal{R}_n(w)) \le K$.

4.4. The number of short return words is finite

Let us recall from Section 3.1 that $\mathcal{R}^{(S)}(w) = \bigcup_{n \ge 1} \mathcal{R}^{(S)}_n(w)$ is the set of all short return words to the sets $LS_n(w)$.

Proposition 4.5. For any uniformly recurrent infinite word w with linear complexity, the set $\mathcal{R}^{(S)}(w)$ is finite.

This result is a consequence of the following technical lemma and Proposition 3.3.

Lemma 4.6. Let w be an aperiodic and uniformly recurrent infinite word with linear complexity. Let $(L_i)_{i \ge 1}$ be a family of sets of words such that $L_i \subseteq \mathcal{R}_i(w)$ for all i ≥ 1 and $R = \bigcup_{i \ge 1} L_i$ is finite.

There exist two integers K and N such that for all n ≥ N and for any coherent decomposition $(r_i)_{i \in \mathbb{N}}$ of w into words of $\mathcal{R}_n(w)$, the number of consecutive elements of R in $(r_i)_{i \in \mathbb{N}}$ is bounded by K.

In particular, the concatenation of such return words in a coherent decomposition of level greater than N has length bounded by Kℓ, where ℓ is the maximal length of words in R.

Proof. We let ℓ be, as defined in the lemma, the maximal length of words in R. By Corollary 4.2 and Proposition 4.3, there exists an integer K such that for all n, $\mathrm{Card}(\mathcal{R}_n(w) \times LS_n(w)) \le K$. By Proposition 3.6, there exists an integer N such that for all factors u of length n ≥ N of w, all return words to u have length greater than Kℓ. Finally, consider an integer n ≥ N and let $(r_i)_{i \ge 0}$ be a coherent decomposition of w of level n.

Assume that there exist K + 1 consecutive elements of $L_n$ in the considered decomposition of w, say $r_m, \dots, r_{m+K}$ for an integer m ≥ 1. For i ≥ 0, let $l_i$ be the prefix of length n of the word $r_{i+1} r_{i+2} \cdots$. From the choice of K, there exist two integers i and j such that m ≤ i < j ≤ m + K and $(r_i, l_i) = (r_j, l_j)$. Consequently, the word $r_{i+1} \cdots r_j$ is a return word to $l_i$ and $|r_{i+1} \cdots r_j| \le K\ell$, which contradicts the choice of N.

Proof of Proposition 4.5. By Proposition 3.3, any short return word is the concatenation of constant return words $r_1, \dots, r_k$ such that $(r_1, \dots, r_k)$ is a factor of infinitely many coherent decompositions of w. By Proposition 4.4, the set $\mathcal{R}^{(C)}(w)$ is finite. We can then apply Lemma 4.6 with $L_n = \mathcal{R}^{(C)}(w) \cap \mathcal{R}_n(w)$ for all n ≥ 0 to deduce that short return words have bounded length. Consequently, the set of short return words is finite.

4.5. About long return words

As an immediate consequence of Lemma 4.6, the next result states that any sufficiently long factor of a coherent decomposition (of sufficiently large level n) contains a long return word.

Lemma 4.7. Let w be a uniformly recurrent infinite word with linear complexity. There exist some integers K and N such that for all n ≥ N, if k ≥ 1 and $(r_1, \dots, r_k)$ is a factor of the coherent decomposition of w of level n with $|r_1 \cdots r_k| \ge K$, then at least one of the words $r_i$ (1 ≤ i ≤ k) belongs to $\mathcal{R}^{(L)}_n(w)$.

Proof. By Lemma 4.6 (taking $L_i = \mathcal{R}^{(S)}_i(w)$ for all i ≥ 0), there exist some integers $K_1$ and N such that for all n ≥ N, at most $K_1$ short return words can occur successively in the coherent decomposition of w of level n. Let $\ell = \max\{|r| \mid r \in \mathcal{R}^{(S)}(w)\}$. Lemma 4.7 follows with $K = K_1 \ell + 1$.

Remark 4.8. With the same notation as in Lemma 4.7, we let M denote the smallest integer such that M ≥ N and $\min\{|u| \mid u \in \mathcal{R}^{(L)}_M(w)\} \ge K$. Due to Lemma 3.2, we have $\min\{|u| \mid u \in \mathcal{R}^{(L)}_n(w)\} \ge K$ for all n ≥ M.

4.6. Morphisms

Up to the end of Section 4, w always denotes a recurrent infinite word with linear complexity. For all n ≥ 0, let $U_n(w)$ be the set of tuples $((r_1, \ell_1), \dots, (r_k, \ell_k))$, k ≥ 1, such that

• there exists an integer j, 1 ≤ j ≤ k, such that $r_j \in \mathcal{R}^{(L)}_n(w)$ and for all i ≠ j, $r_i \in \mathcal{R}^{(S)}_n(w)$ (by abuse of notation, we write $(r_1, \dots, r_k) \in (\mathcal{R}^{(S)}_n(w))^* \mathcal{R}^{(L)}_n(w) (\mathcal{R}^{(S)}_n(w))^*$);

• with $(s_i)_{i \ge 0}$ the coherent decomposition of w of level n, there exists an integer i such that $(r_1, \dots, r_k) = (s_i, \dots, s_{i+k-1})$ (i.e. $(r_1, \dots, r_k)$ is a factor of the coherent decomposition of w of level n), and for all j = 1, …, k, $\ell_j$ is the prefix of length n of $s_{i+j} \cdots$ ($\ell_j \in LS_n(w)$).

Lemma 4.9. For any uniformly recurrent infinite word $w$ with linear complexity, the cardinality of the sets $(U_n(w))_{n \ge 0}$ is ultimately bounded, i.e., there exist some integers $K$ and $N$ such that for all $n \ge N$, $\mathrm{Card}(U_n(w)) \le K$.

Proof. By Lemma 4.6 (taking $L_i = R^{(S)}_i(w)$ for all $i \ge 0$), there exist some integers $K$ and $N$ such that for all $n \ge N$, at most $K$ short return words can occur successively in the coherent decomposition of $w$ of level $n$. Note that, by definition of $U_n(w)$, for any of its elements $((r_1, \ell_1), \ldots, (r_k, \ell_k))$, $r_1 \ldots r_k \ell_k$ is a factor of $w$ and for all $j = 1, \ldots, k-1$, $\ell_j$ is the prefix of length $n$ of $r_{j+1} \ldots r_k \ell_k$ ($\ell_j$ is uniquely determined by $r_{j+1}, \ldots, r_k$ and $\ell_k$). By Proposition 4.5 and Proposition 4.3, it follows that, for all $n \ge N$, the cardinality of $U_n(w)$ admits the upper bound $(\mathrm{Card}(R^{(S)}(w)))^{2(K+1)} K' \, \mathrm{Card}(LS_n(w))$ with $K' = \max_{n \ge 0} \mathrm{Card}(R^{(L)}_n(w))$.
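The counting behind this upper bound can be spelled out as follows. This is only a sketch under two assumptions that are not stated in the text: we write $c = \mathrm{Card}(R^{(S)}(w))$ and suppose $c \ge 2$, and we write an element of $U_n(w)$ as a run $u$ of at most $K$ short return words, followed by the unique long return word $r$, followed by a run $v$ of at most $K$ short return words, together with the last left special factor $\ell_k$ (the other $\ell_j$ being determined by these data).

```latex
% Assumption: c = Card(R^{(S)}(w)) >= 2; u and v are runs of at most K short return words.
\begin{align*}
\#\{u\},\ \#\{v\} &\le \sum_{j=0}^{K} c^{\,j} = \frac{c^{K+1}-1}{c-1} \le c^{K+1},\\
\mathrm{Card}(U_n(w)) &\le c^{K+1}\cdot K' \cdot c^{K+1}\cdot \mathrm{Card}(LS_n(w))
  = c^{2(K+1)}\, K'\, \mathrm{Card}(LS_n(w)).
\end{align*}
```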

Let $K$ and $M$ be integers as in Lemma 4.7 and Remark 4.8. Let $n \ge M$ and let $((r_1, \ell_1), \ldots, (r_k, \ell_k))$ be an element of $U_{n+1}(w)$. Let $(s_i)_{i \ge 0}$ be the coherent decomposition of $w$ of level $n$. By Lemma 3.2, there exists a strictly increasing sequence of integers $(i_j)_{1 \le j \le k+1}$ such that for all $j$, $1 \le j \le k$, $r_j = s_{i_j} \cdots s_{i_{j+1}-1}$ and $\ell_k$ is a prefix of $s_{i_{k+1}} \cdots$. By the choice of $M$, as $|s_{i_1} \cdots s_{i_{k+1}-1}| = |r_1 \cdots r_k| \ge K$, at least one of the words $s_i$ ($i_1 \le i \le i_{k+1}-1$) is a long return word to $LS_n(w)$. Let $k'$ be the number of integers $i$ between $i_1$ and $i_{k+1}-1$ such that $s_i \in R^{(L)}_n(w)$ and let $j_1 < j_2 < \ldots < j_{k'}$ be these integers. For each index $j$, let $\ell'_j$ be the prefix of length $n$ of $s_{j+1} s_{j+2} \cdots$.

We have just described a way to associate to each element $((r_1, \ell_1), \ldots, (r_k, \ell_k))$ of $U_{n+1}(w)$ a sequence

$((s_{i_1}, \ell'_{i_1}), \ldots, (s_{j_1}, \ell'_{j_1}))$, $((s_{j_1+1}, \ell'_{j_1+1}), \ldots, (s_{j_2}, \ell'_{j_2}))$, $\ldots$, $((s_{j_{k'-2}+1}, \ell'_{j_{k'-2}+1}), \ldots, (s_{j_{k'-1}}, \ell'_{j_{k'-1}}))$, $((s_{j_{k'-1}+1}, \ell'_{j_{k'-1}+1}), \ldots, (s_{i_{k+1}-1}, \ell'_{i_{k+1}-1}))$

of elements of $U_n(w)$.
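The cutting rule just described (cut right after every long return word except the last one, so that the final block also keeps the short return words following the last long one) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name and the `is_long` predicate are hypothetical, the words are given as a plain list, and the left special components $\ell'_j$ are omitted.

```python
def split_after_long(words, is_long):
    """Split `words` into consecutive blocks containing exactly one long word each.

    A cut is made right after every long word except the last one, so the final
    block also contains the short words that follow the last long word.
    """
    long_positions = [i for i, w in enumerate(words) if is_long(w)]
    if not long_positions:
        raise ValueError("the sequence must contain at least one long word")
    blocks, start = [], 0
    for pos in long_positions[:-1]:
        blocks.append(words[start:pos + 1])
        start = pos + 1
    blocks.append(words[start:])
    return blocks


# Hypothetical toy data: words of length greater than 3 are treated as "long".
example = ["a", "b", "LONG1", "c", "LONG2", "d", "e"]
blocks = split_after_long(example, lambda w: len(w) > 3)
```

Each block has the shape (short)* long (short)*, which is the shape required of the elements of the sets of tuples defined above; only the last block can end with short words.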

For $n \ge M$, let $A_n = \{0, 1, \ldots, \mathrm{Card}(U_n(w)) - 1\}$ and let $\Theta_n$ be a bijection from $A_n$ to $U_n(w)$. Now we define our morphisms $\sigma_n$ from $A_{n+1}^*$ to $A_n^*$ by $\sigma_n(a) = a_1 \ldots a_k$ whenever $\Theta_{n+1}(a)$ can be associated as previously to $\Theta_n(a_1) \ldots \Theta_n(a_k)$.

We also let $\tau$ be the morphism from $A_n^*$ to $A^*$ (where $A$ is the alphabet of $w$) defined for all $a \in A_n$ by $\tau(a) = u$ if $\Theta_n(a) = ((r_1, \ell_1), \ldots, (r_k, \ell_k))$ and $u = r_1 \cdots r_k$.

Of course our morphisms depend on our construction. Other morphisms can be suitable.
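As an illustration of how such morphisms act, the following sketch stores a morphism as a dictionary from letters to images and composes two of them. The data are hypothetical toy values chosen only to show the mechanics; they are not computed from an actual word $w$, and the function names are assumptions of this sketch.

```python
def apply_morphism(morphism, word):
    """Image of `word` under a morphism given as a dict: letter -> sequence of letters."""
    return tuple(x for letter in word for x in morphism[letter])


def compose_apply(morphisms, word):
    """Compute f_0 f_1 ... f_m (word) for morphisms = [f_0, f_1, ..., f_m]."""
    for f in reversed(morphisms):
        word = apply_morphism(f, word)
    return word


# Hypothetical toy data (NOT derived from an actual word w):
# sigma plays the role of a morphism from A_{n+1}^* to A_n^*,
# tau the role of the morphism from A_n^* to A^*.
sigma = {0: (0, 1), 1: (2,)}
tau = {0: "ab", 1: "a", 2: "abb"}

image = "".join(compose_apply([tau, sigma], (0,)))  # tau(sigma(0))
```

Letters of the intermediate alphabets are integers, as in the definition of the alphabets $A_n$, while the last morphism maps them to words over the alphabet of $w$.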


4.7. The number of morphisms is finite

We continue with the notation introduced in the previous section. Let $S = \{\sigma_n \mid n \ge M\}$.

Fact 4.10. The set S is finite.

Proof. We start with a technical observation. Let $(r, l)$ in $R_{n+1}(w) \times LS_{n+1}(w)$ be such that $rl$ is a factor of $w$. Following Lemma 3.2, there exist $(r_1, l_1), (r_2, l_2), \ldots, (r_k, l_k)$, elements of $R_n(w) \times LS_n(w)$, such that $r\, l_{[1,n]} = r_1 r_2 \cdots r_k l_k$ (where $l_k = l_{[1,n]}$ denotes the prefix of length $n$ of $l$) and $l_i$ is a prefix of $r_{i+1} \ldots r_k l_k$ for all $i$ in $\{1, \ldots, k\}$. We are going to show that, for $i, j$ in $\{2, \ldots, k-1\}$, $i \ne j$, we have $(r_i, l_i) \ne (r_j, l_j)$ and, consequently, $k < 2 + \mathrm{Card}(R_n(w)) \times \mathrm{Card}(LS_n(w))$.

Let $l' \in LS_{n+1}(w)$ be the prefix of $rl$ of length $n+1$. Assume by contradiction that $(r_i, l_i) = (r_j, l_j)$ with $1 < i < j < k$. Since there are no left special factors of length $n+1$ in $rl$ (except its prefix $l'$ and its suffix $l$), for all integers $h \in \{1, \ldots, i\}$, the words $r_h \cdots r_i l_i$ can be uniquely extended to the left in $L(w)$ by $r_{h-1}$. As $r_i l_i = r_j l_j$, this implies that $r_1 \ldots r_i l_i$ is also a suffix of $r_1 \ldots r_j l_j$. Thus $l'$ has an internal occurrence in $rl$, which is a contradiction.

We now turn to the proof of the fact. We need to go back to the construction of the morphisms and so we let $((r_1, \ell_1), \ldots, (r_k, \ell_k))$ and integers $(i_j)_{1 \le j \le k+1}$ be as in this construction. The previous observation shows that for all $j$, $1 \le j \le k$, we have $i_{j+1} - i_j \le 2 + \mathrm{Card}(R_n(w)) \times \mathrm{Card}(LS_n(w))$, and so $i_{k+1} - i_1 \le k(2 + \mathrm{Card}(R_n(w)) \times \mathrm{Card}(LS_n(w)))$. By Corollary 4.2, Proposition 4.3 and Lemma 4.9, there exists an integer $K_1$ such that for all $n \ge M$, $\mathrm{Card}(R_n(w)) \le K_1$, $\mathrm{Card}(LS_n(w)) \le K_1$ and $k \le K_1$. As the length of $\sigma_n(\Theta^{-1}_{n+1}((r_1, \ell_1), \ldots, (r_k, \ell_k)))$ is bounded by $i_{k+1} - i_1$, it is also bounded above by $K_1(2 + K_1^2)$.

In other words, there exists an integer $K$ such that for all $n \ge M$ and for all letters $a \in A_{n+1}$, $|\sigma_n(a)| \le K$. As also $\mathrm{Card}(A_n) \le K$, the set $S$ is finite.
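For convenience, the chain of inequalities used in this proof can be written in one line, followed by a crude count of the possible morphisms. The count is only a sketch: it assumes that every alphabet $A_n$ with $n \ge M$ is a subset of $\{0, \ldots, K-1\}$ (which follows from $\mathrm{Card}(A_n) \le K$ and the definition of $A_n$) and makes no attempt at a sharp estimate.

```latex
% Length bound for the images of letters (notation of the proof of Fact 4.10).
\[
|\sigma_n(a)| \le i_{k+1} - i_1 = \sum_{j=1}^{k} (i_{j+1} - i_j)
  \le k\,\bigl(2 + \mathrm{Card}(R_n(w)) \cdot \mathrm{Card}(LS_n(w))\bigr)
  \le K_1\,(2 + K_1^2).
\]
% Crude count (assumption: A_n \subseteq \{0,\dots,K-1\} for all n \ge M).
% A morphism on d \le K letters is determined by d images, each a word of
% length at most K over at most K letters.
\[
\mathrm{Card}(S) \le \sum_{d=1}^{K} \Bigl(\sum_{m=0}^{K} K^m\Bigr)^{d}
  \le K \Bigl(\sum_{m=0}^{K} K^m\Bigr)^{K} < \infty .
\]
```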

4.8. Everywhere growing property

We continue with the notation introduced in the two previous sections.

Fact 4.11. For any sequence $(a_n \in A_n)_{n \ge M}$, we have

$$\lim_{n \to \infty} |\tau \sigma_M \ldots \sigma_n(a_{n+1})| = +\infty.$$

Proof. By construction, for any $n \ge M$, $a_{n+1} = \Theta^{-1}_{n+1}((r_1, \ell_1), \ldots, (r_k, \ell_k))$ for some $((r_1, \ell_1), \ldots, (r_k, \ell_k))$ in $U_{n+1}(w)$ and it can be verified by induction that $|\tau \sigma_M \ldots \sigma_n(a_{n+1})| = |r_1 \cdots r_k|$. By Proposition 3.5, the length of minimal elements of $R^{(L)}_n(w)$ grows to infinity. So the length of minimal elements of $U_n(w)$ also grows to infinity.

4.9. Proof of Theorem 1.1

Let $n \ge 0$. Let $(r_i)_{i \ge 0}$ be the coherent decomposition of $w$ over $R_n(w)$ of level $n$, and for all $i \ge 0$, let $\ell_i$ be the prefix of length $n$ of $r_{i+1} r_{i+2} \cdots$. For $n \ge M$, we know that at least one of the elements of $R_n(w)$ belongs to $R^{(L)}_n(w)$. Let $j$ be the smallest integer such that $r_j \in R^{(L)}_n(w)$ and let $a_n = \Theta_n^{-1}((r_0, \ell_0), \ldots, (r_j, \ell_j))$. By definition, for $n \ge M$, $\tau \sigma_M \ldots \sigma_n(a_{n+1})$ is a prefix of $w$.

By Fact 4.11, it follows that

$$w = \lim_{n \to +\infty} \tau \sigma_M \cdots \sigma_n(a_{n+1}^{\omega}). \qquad (2)$$

This, with Fact 4.10, ends the proof of Theorem 1.1.
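The shape of such a limit can be observed on a classical example. The sketch below does not use the morphisms constructed in this paper: it assumes the constant directive sequence given by the Fibonacci morphism 0 -> 01, 1 -> 0 (so the finite set of morphisms has a single element) and checks that the successive images of the letter 0 are longer and longer prefixes of one another. The names `PHI` and `sadic_prefix` are hypothetical.

```python
def apply_morphism(morphism, word):
    """Image of the string `word` under a morphism given as a dict: letter -> string."""
    return "".join(morphism[c] for c in word)


def sadic_prefix(directive, last_letter):
    """Compute f_0 f_1 ... f_m (last_letter) for directive = [f_0, ..., f_m]."""
    word = last_letter
    for f in reversed(directive):
        word = apply_morphism(f, word)
    return word


# Assumption: a constant directive sequence made of the Fibonacci morphism.
PHI = {"0": "01", "1": "0"}
approximations = [sadic_prefix([PHI] * n, "0") for n in range(8)]
lengths = [len(u) for u in approximations]
```

Here each approximation is a prefix of the next one and the lengths tend to infinity, which are the two ingredients (prefix property and everywhere growing property) used above to obtain the limit.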

Acknowledgement

The authors would like to thank Fabien Durand for many fruitful discussions concerning return words and S-adicity. Many thanks also to Narad Rampersad for his careful reading and to the referees for their valuable comments.

References

[1] J.-P. Allouche and J. Shallit. Automatic sequences: Theory, applications, generalizations. Cambridge University Press, Cambridge, 2003.

[2] P. Arnoux and G. Rauzy. Représentation géométrique de suites de complexité 2n + 1. Bull. Soc. Math. France, 119:199–215, 1991.

[3] J. Berstel, A. Lauve, C. Reutenauer, and F. Saliola. Combinatorics on Words: Christoffel Words and Repetitions in Words, volume 27 of CRM Monograph Series. American Mathematical Society, 2008.

[4] V. Berthé and M. Rigo, editors. Combinatorics, Automata and Number Theory, volume 135 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2010.

[5] M. Boshernitzan. A condition for minimal interval exchange maps to be uniquely ergodic. Duke Math. J., 52(3):723–752, 1985.

[6] J. Cassaigne. Special factors of sequences with linear subword complexity. In Developments in language theory, II (Magdeburg, 1995), pages 25–34. World Sci. Publ., River Edge, NJ, 1996.


[7] J. Cassaigne and F. Nicolas. Factor complexity. Chapter 3 in [4].

[8] A. de Luca. Sturmian words: structure, combinatorics, and their arithmetics. Theoretical Computer Science, 183:45–82, 1997.

[9] F. Durand. A characterization of substitutive sequences using return words. Discrete Math., 179:89–101, 1998.

[10] F. Durand. Linearly recurrent subshifts have a finite number of non-periodic subshift factors. Ergodic Theory Dynam. Systems, 20:1061–1078, 2000.

[11] F. Durand, J. Leroy, and G. Richomme. Do properties of an S-adic representation determine factor complexity? To appear in J. Integer Seq. ArXiv e-prints:1208.6376v1.

[12] S. Ferenczi. Rank and symbolic complexity. Ergodic Theory Dynam. Systems, 16:663–682, 1996.

[13] J. Leroy. Some improvements of the S-adic conjecture. Advances in Applied Mathematics, 48(1):79 – 98, 2012.

[14] M. Lothaire. Combinatorics on Words, volume 17 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, 1983. Reprinted in the Cambridge Mathematical Library, Cambridge University Press, UK, 1997.

[15] M. Lothaire. Algebraic combinatorics on words, volume 90 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2002.

[16] R. C. Lyndon and M. P. Schützenberger. The equation $a^M = b^N c^P$ in a free group. Michigan Math. J., 9:289–298, 1962.

[17] F. Mignosi and P. Séébold. Morphismes sturmiens et règles de Rauzy. J. Théor. Nombres Bordeaux, 5:221–233, 1993.

[18] M. Morse and G. A. Hedlund. Symbolic dynamics II. Sturmian trajectories. Amer. J. Math., 62:1–42, 1940.

[19] J.-J. Pansiot. Complexité des facteurs des mots infinis engendrés par morphismes itérés. In Automata, languages and programming (Antwerp, 1984), volume 172 of Lecture Notes in Comput. Sci., pages 380–389. Springer, Berlin, 1984.

[20] N. Pytheas Fogg. Substitutions in dynamics, arithmetics and combinatorics, volume 1794 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2002. Edited by V. Berthé, S. Ferenczi, C. Mauduit and A. Siegel.


[21] N. Pytheas Fogg, Terminologie S-adique et propriétés, Preprint available at http://tinyurl.com/8opdb8s, 2011.

[22] M. Queffélec. Substitution dynamical systems—spectral analysis, volume 1294 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1987.

[23] G. Rauzy. Suites à termes dans un alphabet fini. Séminaire de Théorie des nombres de Bordeaux, exposé 25:1–16, 1982-1983.

5. Appendix: proof of Lemma 2.5

This entire section is dedicated to the proof of Lemma 2.5. Two words x and y are said to be prefix-comparable if x is a prefix of y or y is a prefix of x.

Fact 5.1. For any morphism f and any prefix-comparable words x and y, f (x) and f(y) are also prefix-comparable.

Fact 5.2. If two words x and y are prefix-comparable and z is a prefix of y, then x and z are also prefix-comparable.
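Both facts are easy to test experimentally. The sketch below is an illustration only: it uses a hypothetical morphism on the two-letter alphabet {a, b}, not the morphism $\nu$ of Section 2, and the function names are assumptions of this sketch.

```python
from itertools import product


def prefix_comparable(x, y):
    """True if x is a prefix of y or y is a prefix of x."""
    return x.startswith(y) or y.startswith(x)


def image(morphism, word):
    """Image of the string `word` under a morphism given as a dict: letter -> string."""
    return "".join(morphism[c] for c in word)


def words_up_to(alphabet, max_length):
    """All words over `alphabet` of length at most `max_length`, the empty word included."""
    return ["".join(p) for n in range(max_length + 1) for p in product(alphabet, repeat=n)]


# Hypothetical morphism used only for the experiment (it is NOT the morphism nu).
f = {"a": "ab", "b": "a"}
words = words_up_to("ab", 4)

# Fact 5.1: images of prefix-comparable words are prefix-comparable.
fact_5_1_holds = all(
    prefix_comparable(image(f, x), image(f, y))
    for x in words for y in words if prefix_comparable(x, y)
)
```

The exhaustive check covers only words of length at most 4 over two letters; it is a sanity test, while the general statement follows directly from the compatibility of morphisms with concatenation.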

In what follows, let us recall that u, introduced in Section 2, is the fixed point of ν starting with the letter c.

Lemma 5.3. Any factor of $u$ of length at least two and starting with $b_1$ is prefix-comparable to one of the words $b_1\nu^{\ell}(1)e_1c$, $b_1\nu^{\ell}(1)c$, $b_1\nu^{\ell}(1)e_0 1$, $b_1\nu^{\ell}(2)e_2$ and $b_1\nu^{\ell}(2)e_3$ for some $\ell \ge 0$.

Proof. We proceed by induction. For factors of length two, three or four, the result is true with $\ell \in \{0, 1\}$ since

$b_1\Gamma \cap F(u) = \{b_1 2, b_1 c\}$,

$b_1\Gamma^2 \cap F(u) = \{b_1 1 e_1, b_1 c c, b_1 1 c, b_1 1 e_0, b_1 2 e_2, b_1 2 e_3\}$,

$b_1\Gamma^3 \cap F(u) \subseteq \{b_1 1 c, b_1 2 e_2, b_1 2 e_3\}\Gamma \cup \{b_1 1 e_1 c, b_1 1 e_0 1, b_1 c c c\}$.

Let $b_1 x$ be a factor of length at least five of $u$. The result is immediately true if $x$ starts with $1c$, $2e_2$, $2e_3$, $1e_1c$ or $1e_0 1$ (with $\ell = 0$). Thus, assume that $x$ starts with $ccc$. By the definition of $\nu$, this implies the existence of a factor $y$ of $u$ such that $b_1 y$ is a factor of $u$, $x$ is a prefix of $\nu(y)$ and $|y| < |x|$. By the inductive hypothesis, $b_1 y$ is prefix-comparable to one of the words $b_1\nu^{\ell}(1)e_1c$, $b_1\nu^{\ell}(1)c$, $b_1\nu^{\ell}(1)e_0 1$, $b_1\nu^{\ell}(2)e_2$ or $b_1\nu^{\ell}(2)e_3$ for some $\ell \ge 0$. So, by Fact 5.1 and Fact 5.2, $b_1 x$ is prefix-comparable to

Corollary 5.4. Any factor of $u$ of length at least three and starting with $a_1b_1$ is prefix-comparable to one of the words $a_1b_1\nu^{\ell}(1)e_1c$, $a_1b_1\nu^{\ell}(1)c$, $a_1b_1\nu^{\ell}(1)e_0 1$, $a_1b_1\nu^{\ell}(2)e_2$ or $a_1b_1\nu^{\ell}(2)e_3$ for some $\ell \ge 1$.

Proof. After Lemma 5.3, it is sufficient to observe that the words $a_1b_1 1 e_1 c$, $a_1b_1 1 c$, $a_1b_1 1 e_0$, $a_1b_1 2 e_2$ and $a_1b_1 2 e_3$ are not factors of words in $\nu(\Gamma^*)$.

Lemma 5.5. Any factor $a_2b_2m$ of $u$ of length at least three is prefix-comparable to $a_2b_2\nu^{\ell}(2)e_2$ for some $\ell \ge 1$.

Proof. The proof is similar to that of Corollary 5.4.

Proof of Lemma 2.5. Let $n \ge 2$ be an integer such that $ab \in R_n(u)$. This means that there exist a word $v$ and two letters $x$, $y$ such that $|v| = n - 2$, $abvxy$ is a factor of $\psi(u)$, both words $abv$ and $vxy$ are left special factors of $\psi(u)$ and $bvx$ is not a left special factor of $\psi(u)$. We aim to prove that, for some integer $k$, $v = u_k$ (and so $n = 6^k + 1$), $x = 1$ and $y = e_1$ (see before Fact 2.2 for the definition of the words $u_k$).

By construction of ψ(u), two cases can hold (maybe not exclusively).

• Case I: There exist a word $w_1$ and two letters $x_1$, $y_1$ such that $abvxy = \psi(a_1b_1w_1x_1y_1)$ and $a_1b_1w_1x_1y_1$ is a factor of $u$.

• Case II: There exist a word $w_2$ and two letters $x_2$, $y_2$ such that $abvxy = \psi(a_2b_2w_2x_2y_2)$ and $a_2b_2w_2x_2y_2$ is a factor of $u$.

First, we prove that Case II is impossible. By Lemma 5.5, the word $a_2b_2w_2x_2y_2$ is prefix-comparable to $a_2b_2\nu^k(2)e_2$ for an integer $k$. If $|a_2b_2w_2x_2y_2| \le |a_2b_2\nu^k(2)e_2|$, then $b_2w_2x_2$ is a prefix of $b_2v_k 2$. Thus, the non-left special word $bvx = \psi(b_2w_2x_2)$ is a prefix of $bu_k 2$. This contradicts the fact that $bu_k 2$ is a left special factor of $\psi(u)$ (the words $abu_k 2 = \psi(a_2b_2v_k 2)$ and $e_3bu_k 2 = \psi(e_3b_3v_k 2)$ are factors of $\psi(u)$). Hence $|a_2b_2w_2x_2y_2| > |a_2b_2\nu^k(2)e_2|$ and $\nu^k(2)e_2$ is a prefix of $w_2x_2y_2$. Consequently, the non-left special factor $u_k 2 e_2$ (see Lemma 2.3) is a prefix of the left special factor $vxy = \psi(w_2x_2y_2)$, which is a contradiction.

Therefore, Case I holds. By Corollary 5.4, the word $a_1b_1w_1x_1y_1$ is prefix-comparable to one of the words $a_1b_1\nu^k(1)e_1c$, $a_1b_1\nu^k(1)c$, $a_1b_1\nu^k(1)e_0 1$, $a_1b_1\nu^k(2)e_2$ and $a_1b_1\nu^k(2)e_3$ for some $k \ge 1$.

If $|a_1b_1w_1x_1y_1| \le |a_1b_1\nu^k(1)|$, then $b_1w_1x_1$ is a prefix of $b_1v_k$. Thus the non-left special factor $bvx = \psi(b_1w_1x_1)$ of $\psi(u)$ is a prefix of the left special factor $bu_k$, which is a contradiction.

If $|a_1b_1w_1x_1y_1| \ge |a_1b_1\nu^k(1)| + 2$, then one of the words $a_1b_1\nu^k(1)e_1c$, $a_1b_1\nu^k(1)c$, $a_1b_1\nu^k(1)e_0 1$, $a_1b_1\nu^k(2)e_2$ or $a_1b_1\nu^k(2)e_3$ is a prefix of $a_1b_1w_1x_1y_1$. Thus $u_k 1 e_1 c$, $u_k 1 c$, $u_k 1 e_1 1$, $u_k 2 e_2$ or $u_k 2 e_3$ is a prefix of the left special factor $vxy$ of $\psi(u)$. This

Thus, $|a_1b_1w_1x_1y_1| = |a_1b_1\nu^k(1)| + 1$. Consequently, we have $w_1 = v_k$ and $x_1y_1 \in \{1e_1, 1c, 1e_0, 2e_2, 2e_3\}$, that is, $xy \in \{1e_1, 1c, 2e_2, 2e_3\}$. Noting that $vxy$ is left special whereas by Lemma 2.3, the words $u_k 1 c$, $u_k 2 e_2$ and $u_k 2 e_3$ are not left