WITH LINEAR COMPLEXITY
Julien Leroy1
LAMFA, CNRS UMR 6410, Univ. de Picardie Jules Verne, 33 Rue Saint Leu, 80039 cedex 01, France
julien.leroy@uni.lu Gwenaël Richomme
LIRMM (CNRS, Univ. Montpellier 2), UMR 5506 - CC 477, 161 rue Ada, 34095 Montpellier Cedex 5 and Univ. Montpellier 3, Route de Mende, 34199 Montpellier
Cedex 5, France
gwenael.richomme@lirmm.fr
Received: , Revised: , Accepted: , Published:
Abstract
Using Rauzy graphs, Ferenczi proved that if a symbolic dynamical system has linear complexity then it is S-adic. Being more specific, the result can also be proved for infinite words. We provide a new proof of this latter result using the notion of return words to a set of words.
1. Introduction
Since Morse and Hedlund’s works [18], due to their numerous properties and applications, Sturmian words were given a lot of attention (see for instance surveys and studies in [1, 3, 8, 15, 20]). In this paper, we are concerned with two of these properties. First, Sturmian words are examples of words with linear complexity [18]. Recall that the complexity of an infinite word is the function that associates to each integer n the number of factors (blocks of consecutive letters) of length n that the infinite word contains. Secondly, Sturmian words have a multi-scale property since each Sturmian word is the image of another Sturmian word by a morphism (a map, also called substitution, defined on letters and compatible with the concatenation operation) taken from a finite set Ssturm consisting of 4 morphisms [17]. More
precisely, given a Sturmian word s, one can find a sequence (fi)i≥0of morphisms of
SSturm and a letter (ai)i≥0, such that s = limn→∞f0· · · fn(aωn+1). Sturmian words
1This paper was partially written while the first author was invited to Montpellier in June 2011
are then called SSturm-adic. More generally, an infinite word w is said to be S-adic
when S is a finite set of morphisms and the previous equality holds for w instead of s and for a suitable sequence of morphisms of S.
Since Pansiot’s works [19], it is known that fixed points of morphisms that are S-adic words for a singleton set S do not necessarily have linear complexity. Also, Cassaigne showed [21] that any word w = (wi)i≥0 (where each wi is a letter) over
an alphabet A is S-adic, with S = {σa, ϕa | a ∈ A}, where, using a letter α that
does not belong to A, the morphisms σaand ϕaare defined by, for all letters a, b in
A, σa(α) = a, ϕa(α) = αa, σa(b) = b = ϕa(b): w = limn→∞σw0ϕw1· · · ϕwn(α ω).
Nevertheless, it can be observed that (as far as we know) all infinite words with linear complexity that have been studied are S-adic for a suitable (finite) set S of morphisms. Moreover, an important result due to Ferenczi in the context of symbolic dynamical systems can be reformulated in the context of infinite words as follows.
Theorem 1.1. ([12], see also [13]) If an aperiodic and uniformly recurrent word has linear complexity then, for a finite set S of non-erasing morphisms, it is an everywhere growing S-adic sequence.
The everywhere growing property2 means that there exists a sequence (σ
n)n∈N∈
SN
of morphisms of S generating the considered infinite word such that for all letters a in A, the length of σ0σ1· · · σn(a) tends to infinity with n. This property
is not verified by Cassaigne’s morphisms. The previous result of Ferenczi and other considerations (as for instance, the existence of an S-adic characterization of the family of linearly recurrent words [10]) are the origin of the S-adic conjecture which states the existence of a strong definition of S-adicity that would be equivalent to having linear complexity for a symbolic dynamical system or an infinite word (see [20, Section 12.1]). Unfortunately the everywhere growing property is not sufficient to solve this conjecture. Some examples [11] tend to show that the strong expected definition (if it exists) should concern not only the set S of morphisms but also the sequence of these morphisms used to define the infinite word.
Ferenczi’s proof of Theorem 1.1 generalizes an idea of Arnoux and Rauzy [2]; it is based on the study of Rauzy graphs, a tool independently introduced by Rauzy [23] and Boshernitzan [5] and now often used to study factor complexity of words. The Rauzy graph Gnof order n of an infinite word w is the directed graph whose vertices
are the factors of length n of w and such that any factor u = u1· · · un+1 of length
n+ 1 of w induces an edge in Gn from u1· · · unto u2· · · un+1. This edge is labelled
by the pair (u1, un+1). The label of a path in the graph is the concatenation of
labels of edges along the path.
Leroy [13] revisited Ferenczi’s proof in order to provide more information on the 2This property was called ω-growth property in [13]. However, when S is a singleton, the
morphisms arising in the S-adic representation. He succeeded in going deeper into the study of underlying Rauzy graphs. He observed, without going further, that some important paths (called segments) in the Rauzy graphs used by Ferenczi are related to the notion of return words to a set. More precisely, the first (resp. the second) component of their labels corresponds to left (resp. right) return words to the set of left special factors. In this paper, we use this notion of return words to prove Theorem 1.1. Even though it is an adaptation of Ferenczi’s, we hope that our more compact proof will open new ideas to tackle the S-adic conjecture.
An interest of our approach may be to provide non-contextual definitions of short or long return words before the proof when their corresponding terms (short or long segments) in the context of Rauzy graphs are defined in Ferenczi’s proof. Furthermore, this concept of return words to a set W generalizes the one of Rauzy graphs since it does not require that the words in W share the same length.
The next two sections are preliminaries for our proof in Section 4. They allow us to present these notions of short and long return words in a general context independently of the complexity of the infinite words considered.
2. Return words to special words
The aim of this section is to recall the main combinatorial notions used in our proof. This is done through the development of an example that will be used in Section 3 to illustrate the existence of short return words that are not constant return words.
2.1. Words and morphisms
We assume that readers are familiar with combinatorics on words; for basic (possibly omitted) definitions, we follow [4, 14, 15].
Given an alphabet A, that is, a finite set of symbols called letters, we let A∗
denote the set of all finite words over A (that is, the set of all finite sequences of elements of A). As usual, the concatenation of two words u and v is simply denoted uv. It is well known that the set A∗ equipped with the concatenation operation is
a free monoid with neutral element ε, the empty word.
Given two alphabets A and B, a free monoid morphism, or simply morphism σ, is a map from A∗ to B∗ such that σ(uv) = σ(u)σ(v) for all words u and v over A
(observe that this implies σ(ε) = ε). It is well known that a morphism is completely determined by the images of the letters.
From now on, denoting Γ = {e0, e1, e2, e3, b1, b2, b3, b4, a1, a2, 1, 2, c} and
Σ = {e1, e2, e3, b, a,1, 2, c}, we consider the following two morphisms ν and ψ
ν(e0) = e01 b41 1 e0 ψ(e0) = e1 ν(e1) = e1c c b11 e1 ψ(e1) = e1 ν(e2) = e2c c b22 e2 ψ(e2) = e2 ν(e3) = e3c c b32 e3 ψ(e3) = e3 ν(b1) = c c c e1a1b1 ψ(b1) = b ν(b2) = c c c e2a2b2 ψ(b2) = b ν(b3) = c c c c e3b3 ψ(b3) = b ν(b4) = c c c c c b4 ψ(b4) = b ν(1) = c c c c b11 ψ(1) = 1 ν(2) = c c c c b12 ψ(2) = 2 ν(a1) = c a1c c e0e1 ψ(a1) = a ν(a2) = c a2c c e2e3 ψ(a2) = a ν(c) = c c c c a1a2 ψ(c) = c
One can observe that ν is uniform, that is, the images of the letters have the same length (let us recall that the length of a word u, denoted by|u|, is the number of letters of u). The morphism ν is also primitive, that is, there exists an integer k (here k = 5) such that for all letters x, y in Γ, the letter y occurs in νk(x).
Note also that the morphisms ν and ψ◦ ν are injective. Moreover, they are synchronizing morphisms; a morphism f from A∗ to B∗ is synchronizing if for all
letters x, y, z in the alphabet A and words p, s over A, the equality f (x)f (y) = pf(z)s implies p = ε or s = ε.
2.2. Infinite words and factors
A (right) infinite word (or sequence) over an alphabet A is an element of Aω.
When a morphism is not erasing, that is, the images of letters are never the empty word, the notion of morphism extends naturally to infinite words. From now on, u will be the fixed point of ν (u = ν(u)) starting with the letter c.
A word v is a factor of a finite or infinite word w over an alphabet A if it occurs in w or equivalently, if w = xvy for some words x, y in A∗∪ Aω. When x = ε (resp.
y= ε), v is a prefix (resp. suffix ) of w. We let F(w) denote the set of all factors of w.
Observe that for all n≥ 0, words νn(1) and νn(2) are factors of u (let us recall
that, given a morphism f and an integer n≥ 0, fn is the identity if n = 0, and is
fn−1◦ f otherwise).
Fact 2.1. Let n≥ 0 be an integer and v be a factor of u. If ψ(v) = ψ(νn(1)) then
v= νn(1). If ψ(v) = ψ(νn(2)) then v = νn(2).
Proof. Let i ∈ {1, 2}. The result is immediate for n = 0 from the definitions of ν and ψ. Assume n ≥ 1. Since ψ ◦ ν is synchronizing and ψ(v) is a factor of ψ(u) = ψ ◦ ν(u), there must exist a word v′ such that ψ(v) = ψ ◦ ν(v′). The
all words x in Γ∗, the equality ψ(x) = ψ ◦ ν(α) implies x = ψ(α). This extends to
αin Γ∗. Hence v = νn(i).
As with any fixed point of a primitive morphism [22], the word u is an example of a uniformly recurrent infinite word, i.e., its factors occur infinitely often and with bounded gaps.
The complexity function pwof an infinite word w is the function that counts the
number of factors of each length (See [7] for a recent survey on this function). An infinite word w is said to have linear complexity if there is a constant C such that pw(n) ≤ Cn for all n ≥ 1. Again, as with any fixed point of a primitive morphism
[22], the word u is an example of an infinite word with linear complexity. 2.3. Special factors
Let us recall that a factor u of an infinite word w over an alphabet A is left special if there exist at least two different letters a and b in A such that the words au and bu are factors of w.
For all n≥ 1, observing that the words νn(1) and νn(2) differ only by their last
letters, we let vn denote their longest common prefix; v0 = ε, vn = ν(vn−1)ccccb1
for n≥ 1. We also let un denote the word ψ(vn).
Fact 2.2. For all n ≥ 1, the words abun, bun2 and un1e1 are examples of left
special factors of ψ(u).
Proof. As the words b11e1, b22e2, 11e0, and b32e3 are factors of u, for all n ≥ 1,
the following words are also factors of u: e1a1b1vn1e1c, e2a2b2vn2e2c, 1vn1e01,
e3b3vn2e3.
Thus, the words e1abun1e1c, e2abun2e2c, 1un1e11, e3bun2e3 are factors of ψ(u).
Hence the fact.
As a consequence of the previous fact, the word ψ(ν(u)) has arbitrarily large special factors. Thus it is aperiodic, that is, denoting by xi the letter in position i
of ψ(ν(u)), there does not exist any positive integers m, k such that, for all i ≥ m, xi= xi+k.
We will also need the next property.
Lemma 2.3. For all n≥ 1, the following words are factors of ψ(u) but are not left special factors of ψ(u): bun1, un1e1c, un1c, un1e11, un2e2, un2e3.
Proof. First, let x = bun1 = bψ(νn(1)) and assume that, for a letter α, αx is a
factor of ψ(u). As n ≥ 1, un starts with ccc. This implies that αb must be a suffix
of a word in ψ(ν(Γ)) and so α ∈ {a, e3}. If α = e3, then u must contain as a factor
the word e3b3vn1 = e3b3νn(1) (see Fact 2.1). But one can observe that u contains
Now let x = un1e1c= ψ(νn(1))e1c. If αx is a factor of ψ(u) for a letter α, then
there exists a letter β such that (we use here Fact 2.1 and the fact that e0cis not a
prefix of a word in ν(Γ)) βvn1e1cis a factor of u with α = ψ(β). We let the reader
verify (for instance by induction) that if, for an integer ℓ and a letter δ, δνℓ(1)e 1is
a factor of u, then δ = b1. Hence β = b1 and α = b.
The proof for other words is similar. Let us just state the following facts (with the same notation as above).
x= un1c ⇒ β ∈ {b1, b4} and α= b
x= un1e11 ⇒ β = b1 and α= b
x= un2e2 ⇒ β ∈ {b1, b2} and α= b
x= un2e3 ⇒ β ∈ {b1, b3} and α= b
2.4. Return words
Given a set L of factors of an infinite word w, a word r is called a (left) return word to L in w if there exist two words u and v in L such that rv is a factor of w, rv admits u as a prefix and rv contains no other occurrences of words of L. Clearly if a set L contains a prefix of w and w is recurrent (all of its factors occur infinitely often), then w can be decomposed over the set of left return words to L.
In this paper, given an infinite word w, we will mostly consider the setRn(w) of
return words to the set LSn(w) composed of the prefix of length n ≥ 1 of w and of
all left special factors of length n of w. At the end of this section, we show that a, b and ab are elements ofRn(u) (where u is the word defined in the previous sections)
with a more precise description of the values of n for which this membership holds. Lemma 2.4. For all n≥ 1, {a, b} ⊆ Rn(u).
For all n≥ 0, ab ∈ R|un|+2(u) = R6n+1(u).
Proof. Let n≥ 1, |un| = 6n− 1 ≥ n. Let pi be the prefix of length i of un where
0 ≤ i ≤ |un|. From Fact 2.2, we know that abpn is a factor of ψ(u) and that the
words abpn−2 (when n ≥ 2), bpn−1 and pn are all left special factors (and all of
length n). As pn−2is a prefix of pn−1, itself a prefix of pn, a and b belong toRn(u).
The word abun1e1is also a factor of ψ(u) with abunand un1e1left special factors.
As, by Lemma 2.3, bun1 is not a left special factor, ab ∈ R|un|+2(u) = R6n+1(u).
The last part of Lemma 2.4 admits a converse.
Lemma 2.5. Let n≥ 2 be an integer. If ab ∈ Rn(u) then n = 6k+ 1 for some
k≥ 0.
Although this lemma will allow the important Remark 3.1, we provide its proof in an appendix (see Section 5) as it is rather long, technical, and a bit far from the main aim of the paper.
3. Short, constant and long return words
In this section, we consider particular return words that will be useful in the next section where we present our proof of Theorem 1.1. Note that, as already said in the introduction, the terms short and long were already used in Ferenczi’s paper [12] but, even though our concepts are related, they are differently defined. For instance the fact that the length of long return words tends to infinity is a consequence of the definition, not part of it.
It is important to observe that all results in this section are independent of the complexity of the words considered.
3.1. Definitions
We will need to distinguish various types of return words to the set LSn(w). A
factor u of w is called a short return word of w if for all n ≥ 0, there exists an integer N ≥ n such that u is an element of RN(w). When a factor u is a return
word of w (i.e. belongs toRn(w) for some integer n) without being a short return
word, we said that u is a long return word. Amongst short return words, we say that a factor u of w is a constant return word of w if there exists an integer N ≥ 0 such that for all n≥ N , u is an element of Rn(w). For n ≥ 0, we let respectively
R(L)n (w), R(S)n (w) and R(C)n (w) denote the set of long return words to LSn(w),
the set of short return words to LSn(w) and the set of constant return words to
LSn(w).
Remark 3.1. Our example u studied in Section 2 shows that not all short return words are constant return words. More precisely Lemma 2.4 and Lemma 2.5 prove that a, b and ab are examples of short return words with a and b con-stant return words and ab not a concon-stant return word. Indeed, for all n we have a, b∈ R(C)n (ψ(u)) ⊆ R(S)n (ψ(u)); ab ∈Sni=1R(S)n (ψ(u)) \Sni=1R(C)n (ψ(u)).
As c, cc and ccc are left special factors of ψ(u), and as cccc is a factor of ψ(u), c is a return word of ψ(u), and more precisely, c ∈ R1(ψ(u)) ∩ R2(ψ(u)) ∩ R3(ψ(u)).
Observe that for all words x of length at least four such that cx is a factor of ψ(u), xis always preceded in ψ(u) by the letter c. Thus, for all n ≥ 4, c is not a return word to LSn(ψ(u)); c ∈ (R (L) 1 (ψ(u)) ∩ R (L) 2 (ψ(u)) ∩ R (L) 3 (ψ(u))) \ R (L) 4 (ψ(u)).
One would like to have examples of short or constant return words for an infinite word defined over a smaller alphabet than Σ. We let the reader check that this can be obtained by recoding ψ(u) for instance with the following morphism f defined from Σ to {a, b} by f (a) = a8b, f (b) = a7b2, f (e
1) = a6b3, f (e2) = a5b4, f (e3) =
a4b5, f (1) = a3b6, f (2) = a2b7, f (c) = ab8.
We will see that for infinite words with linear complexity the set of short return words is finite. Thus it is convenient to letR(S)(w) (resp. R(C)(w)) denote the set
S
n≥1R (S)
3.2. Coherent decompositions
From our example, we can observe that the setsRn(w) may not be codes (a code
is a set of words such that any word w in C∗ has a unique decomposition over C).
This leads us to introduce the notion of coherent decomposition.
A sequence (rk)k∈N∈ (Rn(w))Nsuch that w = r0r1r2· · · is called a
decomposi-tion of w. The unique decomposidecomposi-tion (rk)k∈Nof w such that a factor of w belongs
to LSn(w) if and only if it is the prefix of length n of riri+1· · · for some i ≥ 0 is
called the coherent decomposition of w of level n.
Let s1, . . . , sk be words. Let n be an integer and let w be an infinite word.
We say that (s1, . . . , sk) is a factor of the coherent decomposition of w of level n if
there exists an integer m ≥ 0 such that for all i in {1, . . . , k}, si = rm+i−1. Note
that if (s1, . . . , sk) is a factor of the coherent decomposition of w of level n, then
{s1, . . . , sk} ⊆ Rn(w).
The next result shows a connection between the coherent decompositions of w for two successive levels.
Lemma 3.2. Let (ri)i≥0 and (ri′)i≥0 be the coherent decompositions of an infinite
word w respectively of level n and level n + 1 (for an integer n ≥ 1). There exists a strictly increasing sequence of integers (ij)j≥0 such that i0= 0 and for all j ≥ 0,
r′
j= rij· · · rij+1−1.
Proof. This is a direct consequence of the definition of coherent decompositions and the observation that the prefix of length n of any element of LSn+1(w) is an element
of LSn(w).
3.3. Short return words are concatenations of constant return words In this section, we prove the following proposition that claims in a more precise way that any short return word of an infinite word w is a concatenation of constant return words to LSn(w).
Proposition 3.3. Given an infinite word w, for any short return word s of w, there exist infinitely many integers n such that, for the coherent decomposition (ri)i≥0 of
level n of w, s = ri· · · rj for some integers i and j such that all words rℓ, i≤ ℓ ≤ j,
are constant return words.
Actually Proposition 3.3 is a particular case of the following technical lemma (Proposition 3.3 corresponds to the case k = 1 of Lemma 3.4).
Lemma 3.4. Let w be an infinite word and let s1, . . . , sk be words such that
(s1, . . . , sk) is a factor of the coherent decomposition of w of level n for infinitely
many integers n (this implies in particular that all words s1, . . . , sk are short
there exist constant return words r1, . . . , rik such that (r1, . . . , rik) is a factor of
the coherent decomposition of w of level m for infinitely many integers m, and for all j ∈ {1, . . . , k}, sj= rij−1+1· · · rij.
Proof. We start by defining a partial order on tuples of words. Given any words t1, . . . , tℓ, s1, . . . , sk such that t1. . . tℓ= s1. . . sk, we let (t1, . . . , tℓ) < (s1, . . . , sk)
denote the fact that there exists an integer j in {1, . . . , min(ℓ, k)} such that for all i= 1, . . . , j − 1, ti = si and |tj| < |sj|. Observe that, given a word s, this partial
order is a total order on the set of all tuples (s1, . . . , sk) such that s = s1. . . sk.
This allows us to prove Lemma 3.4 by induction.
Let s1, . . . , sk and w be as in the lemma. Let D be the set of all integers n
such that (s1, . . . , sk) is a factor of the coherent decomposition of w of level n. By
hypothesis, this set is infinite.
First, consider the case where for all j, sj is of length 1. Let j, 1 ≤ j ≤ k. For all
nin D, by definition of this set, there exists a word un of length n such that both
un and the prefix of length n of sjun belong to LSn(w). As the prefix of length
n− 1 of any element of LSn(w) is an element of LSn−1(w), we deduce that for all
n≥ 1, there exists a word vnof length n such that both vn and the prefix of length
nof sjvnis an element of LSn(w). As D is infinite, this shows that sjis a constant
return word.
We now turn to the case where there exists a word sj (with j ∈ {1, . . . , k})
which is not a constant return word. It follows that there exists an infinite sub-set D′ of D such that for all n in D′, (s
1, . . . , sk) is not a factor of the coherent
decomposition of w of level n− 1, whereas (s1, . . . , sk) is a factor of the
coher-ent decomposition of w of level n. By Lemma 3.2, there exist some words t1(n),
. . . , tℓ(n)(n) and some integers i0 = 0 < i1 < . . . < ik = ℓ(n) such that for all
j = 1, . . . , k, we have sj = tij−1+1(n) · · · tij(n) and (t1(n), . . . , tℓ(n)(n)) is a factor
of the coherent decomposition of w of level n− 1: each word sj is a composition
of return words to LSn−1(w), and at least one is the concatenation of at least two
such words. Moreover, we have (t1(n), . . . , tℓ(n)(n)) < (s1, . . . , sk). Even though the
words t1(n), . . . , tℓ(n)(n) depend on n, as the length of |t1(n) · · · tℓ(n)(n)| = |s1. . . sk|
is fixed, there must exist an integer ℓ, some words t1, . . . , tℓ, and an infinite setD′′
such that for all n in D′′, ℓ(n) = ℓ and for all i = 1, . . . , ℓ, t
i(n) = ti. The lemma
ends by induction applied to (t1, . . . , tℓ).
3.4. More on long return words
The next result justifies a posteriori the terminology long for long return words. Proposition 3.5. For an infinite word w, if R(L)n (w) 6= ∅ for infinitely many
integers n, then
lim
n→∞min{|r| | r ∈ R (L)
Proof. This result is a direct consequence of the definitions of short and long return words.
Assume by contradiction that limn→∞min{|r| | r ∈ R(L)n (w)} is finite. Let D
be an infinite subset of N such thatR(L)n (w) 6= ∅ for all n in D. For all i ∈ D, let
ri be a word with minimal length in R (L)
i (w). As the length of ri is bounded, the
set{ri| i ∈ D} is finite. In particular, there exists a word r such that r ∈ R(L)i (w)
for all i in an infinite subset ofD. Thus, r is a short return word of w, which is a contradiction as long return words cannot be short by definition.
3.5. Return words to a word
In almost the entire paper we are concerned with the set of return words to the set of left special words of an infinite word. The next important and well-known result uses the notion of return word to a single word.
A nonempty word r is a return word to a factor u of an infinite word w if ru is a factor of w starting with u and containing exactly two occurrences of u as a factor (one as a prefix and one as a suffix).
Proposition 3.6. Let w be an aperiodic and uniformly recurrent infinite word. We have
lim
n→+∞ u factor of wmin |u|=n
{|r| | r return word to u} = +∞.
Although the previous result is certainly well known (see for instance [9] for a proof of a similar result), we provide a short proof in order to get a self-contained paper. Let us recall one fundamental lemma of combinatorics on words (see [16] and for instance [14, Prop. 1.3.4]).
Lemma 3.7. For two nonempty words x and y, there exists a word z such that xz = zy if and only if there exist two words u and v such that x = uv, z ∈ (uv)∗u
and y = vu.
Proof of Proposition 3.6. Assume by contradiction that the limit in Proposition 3.6 is finite. This implies that there exists a word r which is the return word to infinitely many words (un)n≥0. This means that there exist some words (xn)n≥0 such that
unxn = run. By Lemma 3.7, this implies that rk is a factor of w for arbitrarily
large k, which is a contradiction with the fact that w is aperiodic and uniformly recurrent.
4. S-adicity of infinite words with linear complexity
In this section, we provide our proof of Theorem 1.1. Each subsection corresponds to a step or a key element of the proof. We first clarify the notion of S-adicity. Let
A be an alphabet. Recall that with the product topology, the set Aω of infinite
words over A is a compact metric space. As usual, for a finite word u, uω denotes
the infinite word obtained by concatenating infinitely many copies of u.
Let w be an infinite word over A. An adic representation of w is given by a sequence (An)n∈N of alphabets, a sequence (σn : A∗n+1 → A∗n)n∈N of morphisms
and a sequence (an)n∈N of letters such that ai ∈ Ai for all i ≥ 0, A0 = A,
limn→+∞|σ0σ1· · · σn(an+1)| = +∞ and
w= lim
n→+∞σ0σ1· · · σn(a ω n+1).
When all morphisms σn, n∈ N, belong to a given finite set S of morphisms, we say
that w is S-adic (often omitting the sequence of morphisms and letters). 4.1. The number of left special factors is bounded
The starting point of the proof of Theorem 1.1 is the following breakthrough result of Cassaigne result on complexity.
Theorem 4.1. [6] An infinite word w has linear complexity if and only if the first difference of its complexity (pw(n + 1) − pw(n)) is bounded.
For a finite word u and an infinite word w, let δ−
wudenote the number of letters
a such that au is a factor of w. Recall that a word u is a left special factor of w if and only if δ−
wu≥ 2. Since Rauzy’s works [23], it is well known that the first
difference of complexity is related to special factors by the following Equation (1). For a recurrent infinite word w,
pw(n + 1) − pw(n) =
X
|u|=n, u left special factor of w
(δw−u− 1) (1)
Thus Theorem 4.1 has the following corollary (let us recall from Section 2.4 that LSn(w) is the set consisting of the prefix of length n and the left special factors of
length n of w).
Corollary 4.2. A recurrent infinite word w has linear complexity if and only if the sets LSn(w), n ≥ 0, have bounded cardinality.
4.2. The number of return words is bounded
Let us recall that Rn(w) is the set of return words to LSn(w) where w is an
infinite word and n≥ 1 is an integer.
Proposition 4.3. If w is a recurrent infinite word with linear complexity, then the cardinality of the sets Rn(w) is bounded.
Proof. Let r be a return word to the set LSn(w). By definition of this set, there
exists a word u in LSn(w) (u is of length n and is a left special factor or a prefix of
w) such that ru is a factor of w. Moreover for each factor sp of ru of length n with 1 ≤ |s| < |r|, s suffix of r and p prefix of u, there exists a unique letter γ such that γsp is a factor of w. Hence r is entirely determined by the pair (u, α) with α its last letter (but the pair may not be unique). Thus, Card(Rn(w)) ≤Pu∈LSn(w)δ−u= p(n +
1) − p(n) + Card(LSn(w)). The lemma follows from Theorem 4.1 and Corollary 4.2.
The reader will note that the cardinality of a set X is denoted by Card(X). 4.3. The number of constant return words is finite
Let us recall from Section 3.1 that R(C)(w) = ∪
n≥1R(C)n (w) is the set of all
constant return words to the sets LSn(w).
Proposition 4.4. For any recurrent infinite word w with linear complexity, the set R(C)(w) is finite.
Proof. This is a direct consequence of Proposition 4.3. Let us consider K = maxn≥1Card(Rn(w)). Assume by contradiction that R(C)(w) is not finite. Let
r1, . . . , rK+1be distinct elements of R(C)(w). There are some integers N1, . . . , NK+1
such that for all i in{1, . . . , K + 1} and for all n ≥ Ni, ribelongs to R (C)
n (w). Thus,
for n ≥ max{Ni | 1 ≤ i ≤ K + 1}, we have {r1, . . . , rK+1} ⊆ Rn(w), which is a
contradiction with Card(Rn(w)) ≤ K.
4.4. The number of short return words is finite Let us recall from Section 3.1 thatR(S)(w) = ∪
n≥1R(S)n (w) is the set of all short
return words to the sets LSn(w).
Proposition 4.5. For any uniformly recurrent infinite word w with linear com-plexity, the setR(S)(w) is finite.
This result is a consequence of the following technical lemma and Proposition 3.3. Lemma 4.6. Let w be an aperiodic and uniformly recurrent infinite word with linear complexity. Let (Li)i≥1 be a family of sets of words such that Li ⊆ Ri(w)
for all i≥ 1 and R = ∪i≥1Li is finite.
There exist two integers K and N such that for all n≥ N and for any coherent decomposition (ri)i∈Nof w into words ofRn(w), the number of consecutive elements
of R in (ri)i∈N is bounded by K.
In particular, the concatenation of such return words in a coherent decomposition of level greater than N has length bounded by Kℓ where ℓ is the maximal length of words in R.
Proof. We let ℓ be, as defined in the lemma, the maximal length of words in R. By Corollary 4.2 and Proposition 4.3, there exists an integer K such that for all n, Card(Rn(w) × LSn(w)) ≤ K. By Proposition 3.6, there exists an integer N
such that for all factors u of length n≥ N of w, all return words to u have length greater than Kℓ. Finally, consider an integer n≥ N and let (ri)i≥0 be a coherent
decomposition of w of level n.
Assume that there exist K + 1 consecutive elements of Ln in the considered
decomposition of w, say rm, . . . , rm+K for an integer m≥ 1. For i ≥ 0, let li be
the prefix of length n of the word ri+1ri+2· · · . From the choice of K, there exist two
integers i and j such that m≤ i < j ≤ m + K and (ri, li) = (rj, lj). Consequently,
the word ri+1· · · rj is a return word to li and|ri+1· · · rj| ≤ Kℓ, which contradicts
the choice of N .
Proof of Proposition 4.5. By Proposition 3.3, any short return word is the concate-nation of constant return words r1, . . . , rk such that (r1, . . . , rk) is a factor of
infinitely many coherent decompositions of w. By Proposition 4.4, the setR(C)(w)
is finite. We can then apply Lemma 4.6 with Ln= R(C)(w) ∩ Rn(w) for all n ≥ 0
to deduce that short return words have bounded length. Consequently, the set of short return words is finite.
4.5. About long return words
As an immediate consequence of Lemma 4.6, the next result states that any sufficiently long factor of a coherent decomposition (of sufficiently large level n) contains a long return word.
Lemma 4.7. Let w be a uniformly recurrent infinite word with linear complexity. There exist some integers K and N such that for all n≥ N , if k ≥ 1 and (r1, . . . , rk)
is a factor of the coherent decomposition of w of level n with|r1· · · rk| ≥ K, then
at least one of the words ri (1 ≤ i ≤ k) belongs to R (L) n (w).
Proof. By Lemma 4.6 (taking Li= R(S)i (w) for all i ≥ 0), there exist some integers
K1 and N such that for all n ≥ N , at most K1 short return words can occur
successively in the coherent decomposition of w of level n. Let ℓ = max{r | r ∈ R(S)(w)}. Lemma 4.7 follows with K = K
1ℓ+ 1.
Remark 4.8. With the same notation as in Lemma 4.7, we let M denote the smallest integer such that M ≥ N and min{|u| | u ∈ R(L)M (w)} ≥ K. Due to Lemma 3.2, we have min{|u| | u ∈ R(L)n (w)} ≥ K for all n ≥ M .
4.6. Morphisms
Up to the end of Section 4, w always denotes a recurrent infinite word with linear complexity. For all n≥ 0, let Un(w) be the set of tuples ((r1, ℓ1), . . . , (rk, ℓk)), k ≥ 1,
such that
• there exists an integer j, 1 ≤ j ≤ k, such that rj ∈ R (L)
n (w) and for
all i 6= j, ri ∈ R (S)
n (w) (by abuse of notation, we write (r1, . . . , rk) ∈
(R(S)n (w))∗R(L)n (w)(R(S)n (w))∗);
• with (si)i≥0the coherent decomposition of w of level n, there exists an integer
isuch that (r1, . . . , rk) = (si, . . . , si+k−1) (i.e. (r1, . . . , rk) is a factor of the
coherent decomposition of w of level n), and for all j = 1, . . . , k, ℓj is the
prefix of length n of si+j· · · (ℓj ∈ LSn(w)).
Lemma 4.9. For any uniformly recurrent infinite word w with linear complexity, the cardinality of the sets (Un(w))n≥0 is ultimately bounded, i.e., there exist some
integers K and N such that for all n≥ N , Card(Un(w)) ≤ K.
Proof. By Lemma 4.6 (taking Li= R(S)i (w) for all i ≥ 0), there exist some integers
K and N such that for all n ≥ N , at most K short return words can occur suc-cessively in the coherent decomposition of w of level n. Note that, by definition of Un(w), for any of its elements ((r1, ℓ1), . . . , (rk, ℓk)), r1. . . rkℓk is a factor of w and
for all j = 1, . . . , k − 1, ℓj is the prefix of length n of rj+1. . . rkℓk (ℓj is uniquely
determined by rj+1, . . . , rk and ℓk). By Proposition 4.5 and Proposition 4.3, it
follows that, for all n ≥ N , the cardinality of Un(w) admits the upper bound
(Card(R(S)(w)))2(K+1)K′Card(LSn(w)) with K′= maxn≥0Card(R(L)n (w)).
Let K and M be integers as in Lemma 4.7 and Remark 4.8. Let n ≥ M and let ((r1, ℓ1), . . . , (rk, ℓk)) be an element of Un+1(w). Let (si)i≥0 be the coherent
decomposition of w of level n. By Lemma 3.2, there exists a strictly increasing sequence of integers (ij)1≤j≤k+1such that for all j≥ 0, rj = sij· · · sij+1−1and ℓkis
a prefix of sik+1· · · . By the choice of M , as |si1· · · sik+1−1| = |r1· · · rk| ≥ K,
at least one of the words sj is a long return word to LSn(w). Let k′ be the
number of integers i between i1 and ik+1 − 1 such that si ∈ R(L)n (w) and let
j1< j2< . . . < jk′be these integers. Let ℓ′jbe the prefix of length n of sj+1sj+2· · · .
We have just described a way to associate to each element ((r1, ℓ1), . . . , (rk, ℓk))
of Un+1(w) a sequence ((si1, ℓ ′ i1), . . . , (sj1, ℓ ′ j1)), ((sj1+1, ℓ ′ j1+1), . . . , (sj2, ℓ ′ j2)), . . . , ((sjk′ −2+1, ℓ′jk′ −2+1), . . . , (sjk′ −1, ℓj′k′ −1)), ((sjk′ −1+1, ℓ′jk′ −1+1), . . . , (sik, ℓ ′ ik)) of ele-ments ofUn(w).
For n≥ M , let An = {0, 1, . . . , Card(Un)(w) − 1} and let Θn be a bijection from
An toUn(w). Now we define our morphisms from A∗n+1toA∗n by σn(a) = a1. . . ak
whenever Θn(a) can be associated as previously to Θn(a1) . . . Θn(ak).
We also let τ be the morphism fromA∗
n to A∗ (the alphabet of w) defined for
all a∈ An by τ (a) = u if Θn(a) = ((r1, ℓ1), . . . , (rk, ℓk)) and u = r1· · · rk.
Of course our morphisms depend on our construction. Other morphisms can be suitable.
4.7. The number of morphisms is finite
We continue with the notation introduced in the previous section. Let S = {σn|
n≥ M }.
Fact 4.10. The set S is finite.
Proof. We start with a technical observation. Let (r, l) in Rn+1(w)×LSn+1(w) such
that rl is a factor of w. Following Lemma 3.2, there exist (r1, l1), (r2, l2), . . . , (rk, lk),
elements of Rn(w) × LSn(w), such that rl[1,n] = r1r2· · · rklk (where lk = l[1,n]
denotes the prefix of length n of l) and li is a prefix of ri+1. . . rklk for all i in
{1, . . . , k}. We are going to show that, for i, j in {2, . . . , k − 1}, i 6= j, we have (ri, li) 6= (rj, lj) and, consequently, k < 2 + Card(Rn(w)) × Card(LSn(w)).
Let l′ ∈ LS
n+1(w) be the prefix of rl of length n + 1. Assume by contradiction
that (ri, li) = (rj, lj) with 1 < i < j < k. Since there are no left special factors of
length n + 1 in rl (except its prefix l′ and its suffix l), for all integers h∈ {1, . . . , i},
the words rh· · · rili can be uniquely extended to the left in L(w) by rh−1. As
rili = rjlj, this implies that r1. . . rili is also a suffix of r1. . . rjlj. Thus l′ has an
internal occurrence in rl, which is a contradiction.
We now turn to the proof of the fact. We need to go back to the construc-tion of the morphisms and so we let ((r1, ℓ1), . . . , (rk, ℓk)) and integers (ij)1≤j≤k+1
be as in this construction. The previous observation shows that for all j, we have 1 ≤ j ≤ k, ij+1− ij ≤ 2 + Card(Rn(w)) × Card(LSn(w)), and so ik+1 −
i1 ≤ k(2 + Card(Rn(w)) × Card(LSn(w))). By Corollary 4.2, Proposition 4.3 and
Lemma 4.9, there exists an integer K1such that for all n≥ M , Card(Rn(w)) ≤ K1,
Card(LSn(w)) ≤ K1 and k≤ K1. As the length of σn(Θ−1n+1((r1, ℓ1), . . . , (rk, ℓk)))
is bounded by ik+1− i1, it is also bounded above by K1(2 + K12).
In other words, there exists an integer K, such that for all n ≥ M and for all letters a∈ An+1, |σn(a)| ≤ K. As also Card(An) ≤ K, the set S is finite.
4.8. Everywhere growing property
We continue with the notation introduced in the two previous sections. Fact 4.11. For any sequence (an∈ An)n≥M, we have
lim
n→∞|τ σM. . . σn(an+1)| = +∞
Proof. By construction, for any n ≥ M , an+1 = Θ−1n+1((r1, ℓ1), . . . , (rk, ℓk)) for
some (r1, ℓ1), . . . , (rk, ℓk) in Un+1(w) and it can be verified by induction that
|τ σM. . . σn(an+1)| = |r1· · · rk|. By Proposition 3.5, the length of minimal elements
ofR(L)n (w) grows to infinity. So the length of minimal elements of Un(w) also grows
4.9. Proof of Theorem 1.1
Let n≥ 0. Let (ri)i≥0 be the coherent decomposition of w overRn(w) of level
n, and for all i≥ 0, let ℓibe the prefix of ri· · · . For n ≥ M , we know that at least
one of the elements of Rn(w) belongs to R(L)n (w). Let j be the smallest integer
such that rj ∈ R (L)
n (w) and let an = Θ−1n ((r0, ℓ0), . . . , (rj, ℓj)). By definition, for
n≥ M , τ σM. . . σn(an+1) is a prefix of w.
By Fact 4.11, it follows that
w= lim
n→+∞τ σM· · · σn(a ω
n+1). (2)
This, with Fact 4.11, ends the proof of Theorem 1.1.
Acknowledgement
The authors would like to thank Fabien Durand for many fruitful discussions concerning return words and S-adicity. Many thanks also to Narad Rampersad for his careful reading and to the referees for their valuable comments.
References
[1] J.-P. Allouche and J. Shallit. Automatic sequences: Theory, applications, gen-eralizations. Cambridge University Press, Cambridge, 2003.
[2] P. Arnoux and G. Rauzy. Représentation géométrique de suites de complexité 2n + 1. Bull. Soc. Math. France, 119:199–215, 1991.
[3] J. Berstel, A. Lauve, C. Reutenauer, and F. Saliola. Combinatorics on Words: Christoffel Words and Repetitions in Words, volume 27 of CRM Monograph Series. American Mathematical Society, 2008.
[4] V. Berthé and M. Rigo, editors. Combinatorics, Automata and Number Theory, volume 135 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2010.
[5] M. Boshernitzan. A condition for minimal interval exchange maps to be uniquely ergodic. Duke Math. J., 52(3):723–752, 1985.
[6] J. Cassaigne. Special factors of sequences with linear subword complexity. In Developments in language theory, II (Magdeburg, 1995), pages 25–34. World Sci. Publ., River Edge, NJ, 1996.
[7] J. Cassaigne and F. Nicolas. Factor complexity. Chapter 3 in [4].
[8] A. de Luca. Sturmian words: structure, combinatorics, and their arithmetics. Theoretical Computer Science, 183:45–82, 1997.
[9] F. Durand. A characterization of substitutive sequences using return words. Discrete Math., 179:89–101, 1998.
[10] F. Durand. Linearly recurrent subshifts have a finite number of non-periodic subshift factors. Ergodic Theory Dynam. Systems, 20:1061–1078, 2000. [11] F. Durand, J. Leroy, and G. Richomme. Do properties of an S-adic
repre-sentation determine factor complexity? To appear in J. Integer Seq. ArXiv e-prints:1208.6376v1.
[12] S. Ferenczi. Rank and symbolic complexity. Ergodic Theory Dynam. Systems, 16:663–682, 1996.
[13] J. Leroy. Some improvements of the S-adic conjecture. Advances in Applied Mathematics, 48(1):79 – 98, 2012.
[14] M. Lothaire. Combinatorics on Words, volume 17 of Encyclopedia of Mathe-matics and its Applications. Addison-Wesley, 1983. Reprinted in the Cambridge Mathematical Library, Cambridge University Press, UK, 1997.
[15] M. Lothaire. Algebraic combinatorics on words, volume 90 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2002.
[16] R. C. Lyndon and M. P. Schützenberger. The equation aM = bNcP in a free
group. Michigan Math. J., 9:289–298, 1962.
[17] F. Mignosi and P. Séébold. Morphismes sturmiens et règles de Rauzy. J. Théor. Nombres Bordeaux, 5:221–233, 1993.
[18] M. Morse and G. A. Hedlund. Symbolic dynamics II. Sturmian trajectories. Amer. J. Math., 62:1–42, 1940.
[19] J.-J. Pansiot. Complexité des facteurs des mots infinis engendrés par mor-phismes itérés. In Automata, languages and programming (Antwerp, 1984), volume 172 of Lecture Notes in Comput. Sci., pages 380–389. Springer, Berlin, 1984.
[20] N. Pytheas Fogg. Substitutions in dynamics, arithmetics and combinatorics, volume 1794 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 2002. Edited by V. Berthé, S. Ferenczi, C. Mauduit and A. Siegel.
[21] N. Pytheas Fogg, Terminologie S-adique et propriétés, Preprint available at http://tinyurl.com/8opdb8s, 2011.
[22] M. Queffélec. Substitution dynamical systems—spectral analysis, volume 1294 of Lecture Notes in Mathematics. Springer-Verlag, Berlin, 1987.
[23] G. Rauzy. Suites à termes dans un alphabet fini. Séminaire de Théorie des nombres de Bordeaux, exposé 25:1–16, 1982-1983.
5. Appendix: proof of Lemma 2.5
This entire section is dedicated to the proof of Lemma 2.5. Two words x and y are said to be prefix-comparable if x is a prefix of y or y is a prefix of x.
Fact 5.1. For any morphism f and any prefix-comparable words x and y, f (x) and f(y) are also prefix-comparable.
Fact 5.2. If two words x and y are prefix-comparable and z is a prefix of y, then x and z are also prefix-comparable.
In what follows, let us recall that u, introduced in Section 2, is the fixed point of ν starting with the letter c.
Lemma 5.3. Any factor of u of length at least two and starting with b1 is
prefix-comparable to one of the words b1νℓ(1)e1c, b1νℓ(1)c, b1νℓ(1)e01, b1νℓ(2)e2 and
b1νℓ(2)e3for some ℓ≥ 0.
Proof. We proceed by induction. For factors of length two, three or four, the result is true with ℓ∈ {0, 1} since
b1Γ ∩ F (u) = {b12, b1c},
b1Γ2∩ F (u) = {b11e1, b1cc, b11c, b11e0, b12e2, b12e3},
b1Γ3∩ F (u) ⊆ {b11c, b12e2, b12e3}Γ ∪ {b11e1c, b11e01, b1ccc}.
Let b1xbe a factor of length at least five of u. The result is immediately true if x
starts with 1c, 2e2, 2e3, 1e1cor 1e01, (with ℓ = 0). Thus, assume that x starts with
ccc. By the definition of ν, this implies the existence of a factor y of u such that b1y
is a factor of u, x is a prefix of ν(y) and |y| < |x|. By the inductive hypothesis, b1yis
prefix-comparable to one of the words b1νℓ(1)e1c, b1νℓ(1)c, b1νℓ(1)e01, b1νℓ(2)e2or
b1νℓ(2)e3for some ℓ≥ 0. So, by Fact 5.1 and Fact 5.2, b1xis prefix-comparable to
Corollary 5.4. Any factor of u of length at least three and starting with a1b1
is prefix-comparable to one of the words a1b1νℓ(1)e1c, a1b1νℓ(1)c, a1b1νℓ(1)e01,
a1b1νℓ(2)e2 or a1b1νℓ(2)e3 for some ℓ≥ 1.
Proof. After Lemma 5.3, it is sufficient to observe that the words a1b11e1c, a1b11c,
a1b11e0, a1b12e2and a1b12e3 are not factors of words in ν(Γ∗).
Lemma 5.5. Any factor a2b2mof u of length at least three is prefix-comparable to
a2b2νℓ(2)e2 for some ℓ≥ 1.
Proof. The proof is similar to that of Corollary 5.4.
Proof of Lemma 2.5. Let n≥ 2 be an integer such that ab ∈ Rn(u). This means
that there exists a word v and two letters x, y such that |v| = n − 2, abvxy is a factor of ψ(u), both words abv and vxy are left special factors of ψ(u) and bvx is not a left special factor of ψ(u). We aim to prove that, for some integer k, v = uk
(and so n = 6k+ 1), x = 1 and y = e
1 (see before Fact 2.2 for the definition of
words uk).
By construction of ψ(u), two cases can hold (maybe not exclusively).
• Case I: There exists a word w1 and two letters x1, y1 such that abvxy =
ψ(a1b1w1x1y1) and a1b1w1x1y1is a factor of u.
• Case II: There exists a word w2 and two letters x2, y2 such that abvxy =
ψ(a2b2w2x2y2) and a2b2w2x2y2is a factor of u.
First, we prove that Case II is impossible. By Lemma 5.5, the word a2b2w2x2y2is
prefix-comparable to a2b2νk(2)e2for an integer k. If |a2b2w2x2y2| ≤ |a2b2νk(2)e2|,
then b2w2x2is a prefix of b2vk2. Thus, the non-left special word bvx = ψ(bw2x2) is a
prefix of buk2. This contradicts the fact that buk2 is a left special factor of ψ(u) (the
words abuk2 = ψ(a2b2vk2) and e3buk2 = ψ(e3b3vk2) are factors of ψ(u)). Hence
|a2b2w2x2y2| > |a2b2νk(2)e2| and νk(2)e2 is a prefix of w2x2y2. Consequently, the
non-left special factor uk2e2 (see Lemma 2.3) is a prefix of the left special factor
vxy= ψ(w2x2y2), which is a contradiction.
Therefore, Case I holds. By Corollary 5.4, the word a1b1w1x1y1 is
prefix-comparable to one of the words a1b1νk(1)e1c, a1b1νk(1)c, a1b1νk(1)e01, a1b1νk(2)e2
and a1b1νk(2)e3for some k≥ 1.
If |a1b1w1x1y1| ≤ |a1b1νk(1)|, then b1w1xis a prefix of b1vk. Thus the non-left
special factor bvx = ψ(b1w1x) of ψ(u) is a prefix of the left special factor buk, which
is a contradiction.
If|a1b1w1x1y1| ≥ |a1b1νk(1)|+2, then one of the words a1b1νk(1)e1c, a1b1νk(1)c,
a1b1νk(1)e01, a1b1νk(2)e2 or a1b1νk(2)e3 is a prefix of a1b1w1x1y1. Thus uk1e1c,
uk1c, uk1e11, uk2e2or uk2e3is a prefix of the left special factor vxy of ψ(u). This
Thus, |a1b1w1x1y1| = |a1b1νk(1)| + 1. Consequently, we have w1 = vk and
x1y1 ∈ {1e1,1c, 1e0,2e2,2e3}, that is, xy ∈ {1e1,1c, 2e2,2e3}. Noting that vxy is
left special whereas by Lemma 2.3, the words uk1c, uk2e2 and uk2e3 are not left