On the syntactic complexity of tree series

(1)

DOI:10.1051/ita/2010014 www.rairo-ita.org

ON THE SYNTACTIC COMPLEXITY OF TREE SERIES

Symeon Bozapalidis

¹

and Antonios Kalampakas

²^,³

Abstract. We display a complexity notion based on the syntax of a tree series which yields two distinct hierarchies, one within the class of recognizable tree series and another one in the class of non-recognizable tree series.

Mathematics Subject Classification.68Q01, 68Q15.

1. Introduction

The notion of syntactic complexity of a recognizable graph language has orig- inated in [5]. It is a structural complexity measure giving rise to a syntactic classiﬁcation inside the class of recognizable graph languages. The syntactic complexity of a recognizable graph language L is given by a function mapping any couple of natural numbers, representing the type of a graph, to the number of the distinct syntactic classes at this type. We have displayed graph languages with various types of complexities and showed that the set of connected graphs has Bellian complexity. Furthermore, the language of Eulerian graphs is syntactically more complicated than that of connected graphs (cf. [8]).

This notion has been also investigated in the setup of tree languages. More precisely, in [6], the syntactic complexity of a tree language is defined to be the number of the distinct syntactic classes of all trees with a fixed yield length. This allows a classification of tree languages according to their structural properties and several interesting results are obtained. The class of recognizable tree languages is properly contained in that of languages with bounded complexity. The Dyck tree

Keywords and phrases.Tree series, syntactic complexity, recognizability.

1 Aristotle University of Thessaloniki, Department of Mathematics, 54124 Thessaloniki, Greece;bozapali@math.auth.gr

2Democritus University of Thrace, Department of Production Engineering and Management, 67100 Xanti, Greece;akalamp@math.auth.gr

3Technical Institute of Kavala, Department of Exact Sciences, 65404 Kavala, Greece.

Article published by EDP Sciences c EDP Sciences 2010

(2)

language of orderk

D_k={t|t∈TΓ,|t|f1 =· · ·=|t|fk,rank(f_i) = 2},

has polynomial syntactic complexity of degreek−1, and the diagonal language L_d={f(t, t)|t∈TΓ}, f a ﬁxed binary symbol,

has exponential syntactic complexity.

In the present paper we develop a structural complexity theory for tree series.

LetTΓ be the set of all trees over a ranked alphabet Γ and PΓ the set of all trees with just one occurrence of the variablex. ThenPΓ becomes a free monoid with operation the substitution atx. Moreover,PΓ acts on the setTΓby the same way.

A formal power series on trees with coefficients in a field K, is just a function S :TΓ →K. The right derivative of S att and the left derivative of S atτ are defined respectively by

St⁻¹:PΓ→K, (St⁻¹, τ) = (S, τt) for everyτ∈PΓ, τ⁻¹S:TΓ →K, (τ⁻¹S, t) = (S, τ t) for everyt∈TΓ.

We denote byV(S) (resp. F(S)) the subspace ofK^P^Γ (resp. K^T^Γ) generated by the right (resp. left derivatives) ofS.

Tree series are interpreted byK−Γ-algebrasA= (A, α) whereAis aK-vector space andαis a family of multilinear functions

α_f :A^m→A, f ∈Γ_m, m≥0.

We can endowV(S), in a canonical way, with the structure of aK−Γ-algebra, called thesyntacticK−Γ-algebra of S.

A tree series S : TΓ →K is said to be recognizable if it can be realized by a pair (A, ϕ) where A is a ﬁnitely dimensional K−Γ-algebra and ϕ : A → K a linear form. This means thatS =ϕ◦h_A, whereh_Ais the uniqueK−Γ-algebra morphism fromTΓ toA. We have the following characterization result from [4]:

a tree seriesS :TΓ→Kis recognizable if and only if dimF(S)<∞if and only if dimV(S)<∞. In this case we have dimF(S) = dimV(S).

Syntactic complexity of tree series is introduced in Section4, it is described by a function SC_S : N→ N which sends every natural number n to the maximum number of linearly independent right derivativesSt⁻¹ where truns over all trees whose width (i.e., yield length) is n. It is proved that for every tree language L, the syntactic complexity of its characteristic series does not exceed that of L.

We say that a tree series S : TΓ → K has bounded, polynomial or exponential syntactic complexity according to the explicit formula deﬁning the functionSC_S. Although every recognizable series has bounded syntactic complexity this does not characterize recognizability. Indeed, the series that sends every tree t to _|t|¹ (where|t|is the size oft) has bounded complexity but is not recognizable. More- over, we display a tree series (the characteristic series of D_k) with polynomial

(3)

syntactic complexity, and show that from every seriesS with non zero coeﬃcients we can construct a seriesS_d with exponential complexitySC_S_d(n) =C_n−1, where

C_n= 1 n+ 1

2n n

is thenth Catalan number.

Berstel and Reutenauer demonstrated in [1], by using a pumping lemma technique, that the series height sending every tree to its height is not recognizable.

In Section5we calculate its syntactic complexity SCheight(n) =n−log₂n,

where r denotes the greatest integer not exceeding r. On the other hand, it is rather surprising that the series height_p that sends every tree to its height modulop (pprime) is recognizable. Moreover, we present a tree language, over the alphabet Γ consisting of a binary symbolf and a unary symbol α, that has the same syntactic complexity withheight, namely the setL_avlof all AV Ltrees i.e., all treestsuch that eithert=αort=f(t1, t2), witht1, t2∈L_avl and

|height(t1)−height(t2)| ≤1.

Since all recognizable tree series have bounded syntactic complexity this notion is only appropriate for a classiﬁcation of non-recognizable tree series. In the last section we provide a complexity measure suitable for the class of recognizable tree series. It is based on a reﬁned context namelyP_Γ⁽ⁿ⁾which is the set of all trees in TΓ(X) where the variablesx1, . . . , x_n occur in the yield of the tree (in this order from left to right) exactly once. Two dual notions of derivatives are obtained for allτ ∈P_Γ⁽ⁿ⁾andt1, . . . , t_n∈TΓ

τ⁻¹S :T_Γⁿ→K, (τ⁻¹S,(t1, . . . , t_n)) = (S, τ[t1, . . . , t_n]), and

S(t1, . . . , t_n)⁻¹:P_Γ⁽ⁿ⁾→K, (S(t1, . . . , t_n)⁻¹, τ) = (S, τ[t1, . . . , t_n]), and their corresponding sets are

F(S)⁽ⁿ⁾=τ⁻¹S|τ∈P_Γ⁽ⁿ⁾andV(S)⁽ⁿ⁾=S(t1, . . . , t_n)⁻¹|t1, . . . , t_n∈TΓ. We prove that a seriesS :TΓ →K is recognizable if and only if dimF(S)⁽ⁿ⁾ <

∞, for all n, if and only if dimV(S)⁽ⁿ⁾ < ∞, for all n. In this case it holds dimF(S)⁽ⁿ⁾= dimV(S)⁽ⁿ⁾. The reﬁned syntactic complexityRSC_S of a recognizable tree seriesS is described by the function sending every natural numbernto the dimension ofF(S)⁽ⁿ⁾. Since for anynon recognizable tree seriesS it holds

RSC_S(n) =∞, for alln

(4)

the above notion of complexity is suitable to classify only recognizable tree series.

The recognizable seriestree size, branch enumeration, and branch lengthhave the same linear complexity,

RSC_S(n) =n+ 1.

Moreover, the recognizable tree series height_p has exponential reﬁned syntactic complexity namely,

RSCheight_p(n) =pⁿ.

The presented syntactic complexities yield the following hierarchy within the class of tree series

RSCBOU N D⊂RSCP OL⊂RSCEXP ⊂REC ⊂

⊂SCBOU N D⊂SCP OL⊂SCEXP, whereRECis the class of recognizable tree series andU BOU N D,U P OL,U EXP denote respectively the classes of tree series with bounded, polynomial and expo- nentialU-complexity (U =SC, RSC).

2. Tree languages and their syntactic complexity

To construct trees we need a (ﬁnite) ranked alphabet Γ =

k≥0

Γ_k and a set X of variables. We poseX_n ={x1, . . . , x_n}, forn≥1.

The set of trees over Γ andX is the smallest setTΓ(X) inductively deﬁned by the items

• Γ0∪X ⊆TΓ(X);

• t1, . . . , t_k ∈TΓ(X) andf ∈Γ_k impliesf(t1, . . . , t_k)∈TΓ(X).

Often the treef(t1, . . . , t_k) is depicted as

hence the denomination tree. We denote TΓ the set of all trees over Γ without variables (i.e.,TΓ=TΓ(∅)). Subsets ofTΓ(X) are refereed to astree languages.

The height of a treet∈TΓ(X) is the length of its longest branch. Formally the function height :TΓ(X)→Nis inductively deﬁned by

• height(α) = 0, for α∈Γ0∪X;

• height(f(t1, . . . , t_k)) = 1 + max{height(t1), . . . ,height(t_k)}, f ∈ Γ_k and t1, . . . , t_k ∈TΓ(X).

Theyield of a treet∈TΓ is the wordy(t) obtained by concatenating from left to right the leaves (i.e., the symbols of Γ0 occurring int). Formally,y:TΓ→Γ^∗₀, is

(5)

inductively deﬁned by

y(c) =c, (c∈Γ0), y(f(t1, . . . , t_k)) =y(t1). . . y(t_k), (f ∈Γ_k, t_i∈TΓ).

The width oft∈TΓ(X) is the length of its yield: |y(t)|= width(t).

The basic operation on trees is substitution. Given t, t1, . . . , t_n ∈TΓ(X_n), we denote by t[t1, . . . , t_n] the result of substituting t_i at the place of x_i, inside t, 1≤i≤n. Denote byPΓ the subset ofTΓ(x) consisting of all trees with just one occurrence of the variablex. PΓ becomes a monoid with the substitution atxas operation: forτ, π∈PΓ,τ·π=τ[π]. This monoid isfree over the set of trees of the form

f(t1, . . . , t_i−1, x, t_i+1, . . . , t_k), f ∈Γ_k, k≥1, t_j∈TΓ. (1) This means that every τ ∈ PΓ, τ = x, is uniquely decomposed as a product of trees of the form (1) and the number of these trees is denoted by ||τ||, actually

||τ||is the length of the unique path starting from the root ofτ and ending to its unique leaf labeled byx.

The monoidPΓ acts, again by substitution atx, on the setTΓ: PΓ×TΓ→TΓ, (τ, t)→τ·t=τ[t].

LetL⊆TΓ be a tree language,t∈TΓ, and τ∈PΓ. Theright derivative of Latt and theleft derivative ofL atτ are

Lt⁻¹={τ|τ ∈PΓ, τ ·t∈L}, τ⁻¹L={t|t∈TΓ, τ·t∈L}, respectively. The equivalence relation∼_L on (the algebra)TΓ

t∼Lt ifLt⁻¹=Lt⁻¹ is actually a congruence,i.e.,

t1∼Lt1, . . . , t_k∼Lt_k andf ∈Γ_k, imply f(t1, . . . , t_k)∼Lf(t1, . . . , t_k).

It is called the syntactic congruence of the languageL.

Proposition 2.1(cf. [7]). The following conditions are equivalent for a language L⊆TΓ

(i) Lis recognized by a ﬁnite tree automaton.

(ii) card{Lt⁻¹|t∈TΓ}<∞.

(iii) card{τ⁻¹L|τ ∈PΓ}<∞.

(iv) The syntactic congruence ∼L has ﬁnite index (i.e., a ﬁnite number of classes).

In [6] we develop a complexity theory in order to investigate and classify tree languages according to their syntactic structure. Syntactic complexity is a way to measure the complexity of the syntax of a tree language. It counts the number

(6)

of distinct syntactic classes of trees with a ﬁxed width. Formally the syntactic complexity of a tree languageL⊆TΓ is the function

SC_L:N→N∞, SC_L(n) = card{¯t|t∈TΓ,width(t) =n}, n∈N, where ¯tstands for the ∼L-class oft. Alternatively we have

SC_L(n) = card{Lt⁻¹|t∈TΓ,width(t) =n}, n∈N.

It should be noticed thatSC_L(n)<∞, for alln∈N, whenever the alphabet Γ has no unary symbols, because in this case the set of trees withwidth=nis ﬁnite.

A tree languageL⊆TΓ is said to have

• bounded syntactic complexity, whenever there is a constantc∈Nsuch that SC_L(n)≤c, for alln;

• polynomial syntactic complexity whenever there is a polynomial

p(n) = m k=0

a_kn^k, a_k ∈N

such that

SC_L(n)≤p(n), for alln;

• exponential syntactic complexity whenever there is a constantc∈Nsuch that

SC_L(n)≤cⁿ, for alln.

By Proposition2.1every recognizable tree language has bounded syntactic complexity, but the converse is not true. The language

L_bal={tk|k≥0}, witht0=α,t_k+1=f(t_k, t_k),k≥0,

where rank(α) = 0, rank(f) = 2, is an instance of a non-recognizable tree language with bounded syntactic complexity: SC_L_bal(n)≤2, for alln.

Proposition 2.2 ([6], Prop. 5). Given the ranked alphabet Γ = {f1, . . . , f_k, α}, rank(f_i) = 2,1≤i≤k,rank(α) = 0, the Dyck tree language of order k

D_k={t|t∈TΓ,|t|f1 =· · ·=|t|fk} has polynomial syntactic complexity of degreek−1, namely

SC_D_k(n) = 1

(k−1)!n(n+ 1)· · ·(n+k−2).

Throughout this paper we will often use the alphabet ˆΓ consisting of a binary symbolf and a nullary symbolα, ˆΓ ={f, α}.

(7)

Proposition 2.3([6], Prop. 6). The diagonal language L_d={f(t, t)|t∈TΓˆ}, has exponential syntactic complexity, namely,

SC_L_d(n+ 1) = 1 n+ 1

2n n

4ⁿ n³^/²√

π·

3. Formal power series on trees

Aformal power series on trees (tree series for sort), with coefficients in a field K, is just a functionS:TΓ→K. The support ofSis the set{t∈TΓ|(S, t)= 0}. For t ∈ TΓ the element (S, t) is referred to as the coefficient of S at t. In the set KTΓ of all such tree series, addition, scalar multiplication and Hadamard product are defined pointwise

(S1+S2, t) = (S1, t) + (S2, t), (λS, t) =λ(S, t), (S1S2, t) = (S1, t)(S2, t), forS1, S2, S∈KTΓ,λ∈K andt∈TΓ.

We denote byKTΓthe set of all tree seriesS :TΓ →K with ﬁnite support, called polynomials. Thus any polynomial p∈ KTΓ can be written as a ﬁnite formal sum

p= n j=1

λ_jt_j for somen≥1,λ_j ∈K, andt_j ∈TΓ.

Thederivatives of a tree seriesS :TΓ →K at t∈TΓ and τ∈PΓ, are deﬁned by

St⁻¹:PΓ→K, (St⁻¹, τ) = (S, τt) for everyτ∈PΓ, τ⁻¹S:TΓ →K, (τ⁻¹S, t) = (S, τ t) for everyt∈TΓ,

which belong to theK-spacesK^P^Γ andK^T^Γ of all functionsPΓ →KandTΓ→K respectively with the pointwise addition and scalar multiplication.

We denote byV(S) (resp. F(S)) the subspaceK^P^Γ (resp. K^T^Γ) generated by all the right (resp. left derivatives) ofS:

V(S) =St⁻¹|t∈TΓ (resp. F(S) =τ⁻¹S|τ ∈PΓ).

Tree series are interpretedviamultilinear algebras. AK−Γ-algebraA= (A, α) is aK-vector spaceAtogether with a family of multilinear functions

α_f :A^m→A, f ∈Γ_m, m≥0.

A morphism from A = (A, α) to B = (B, β) is a linear function h : A → B compatible with Γ-operations

h(α_f(q1, . . . , q_m)) =β_f(h(q1), . . . , h(q_m)), f ∈Γ_m, m≥0, q_i∈A.

(8)

The spaceKTΓ is converted, in a natural way, into aK−Γ-algebra with the property: for any K−Γ-algebra A = (A, α) there is a function h_A : TΓ → A inductively deﬁned by

h_A(f(t1, . . . , t_m)) =α_f(h_A(t1), . . . , h_A(t_m)), f ∈Γ_m, m≥0, t_i∈TΓ. Its linear extension

¯h_A:KTΓ →A, h¯_A

i

λ_it_i

=

i

λ_ih_A(t_i)

is the unique existingK−Γ-algebra morphism fromKTΓtoA. CallAreachable, whenever ¯h_Ais a surjective function. The monoidPΓ acts on eachK−Γ-algebra A= (A, α)

PΓ×A→A, (τ, q)→τ·q by induction on||τ||as follows

− x·q=qfor allq∈A;

− ifτ =f(t1, . . . , t_i−1, x, t_i+1, . . . , t_k),f∈Γ_k,t_j∈TΓ, q∈A, τ·q=α_f(h_A(t1), . . . , h_A(t_i−1), q, h_A(t_i+1), . . . , h_A(t_k));

− ifτ =τ1·τ2,τ1=x=τ2, q∈A,

τ·q=τ1·(τ2·q).

Lemma 3.1. For anyA= (A, α), it holds

h_A(τ t) =τ·h_A(t), for allτ ∈PΓ, t∈TΓ.

We can endowV(S) with the structure of aK−Γ-algebra if for eachf ∈Γ_k we deﬁne the operation (V(S))_f :V(S)^k→V(S) by setting

(V(S))_f(St⁻₁¹, . . . , St⁻_k¹) =Sf(t1, . . . , t_k)⁻¹, for allt1, . . . , t_k∈TΓ. The so obtained pairVS= (V(S), V(S)) is called thesyntacticK−Γ-algebra ofS.

Clearly,VS is reachable since the function

¯h_V_S :KTΓ →V(S), ¯h_V_S

⎛

⎝

j

λ_jt_j

⎞

⎠=

j

λ_jSt⁻_j¹

is surjective.

A tree series S : TΓ → K is said to be recognizable if there exists a pair (A = (A, α), ϕ), called a realization of S, with A a ﬁnitely dimensional K−Γ- algebra and ϕ: A →K a linear form such thatS =ϕ◦h_A. We have the next characterization result.

(9)

Theorem 3.1(cf. [4]). A tree seriesS:TΓ→K is recognizable iﬀdimF(S)<∞ iﬀdimV(S)<∞. In this case we have dimF(S) = dimV(S).

Thesyntactic representation of S:TΓ →K is (VS = (V(S), V(S)), ϕ_S) where ϕ_S is the linear form

ϕ_S :V(S)→K, ϕ_S(St⁻¹) = (S, t).

This representation is universal in the following sense.

A realization (A= (A, α), ϕ) ofS is reachable if the underline K−Γ-algebra (A, α) is reachable.

Proposition 3.1 (cf. [3]). For every reachable realization (A= (A, α), ϕ) of S there is a unique surjective morphism of K−Γ-algebras h : A → VS such that h_V_S =h◦h_Aandϕ_S◦h=ϕ. The functionhis deﬁned as follows: if(h_A(t_i))_i∈I is a basis forA, then

h(h_A(t_i)) =St⁻_i¹, i∈I.

A characterization of tree series recognizability through matrix representations is presented in [3] whereas an eﬀective construction ofVS is given in [2].

Given a language L ⊆ TΓ its characteristic series ch(L) : TΓ → K, has as coeﬃcients

(ch(L), t) = 1 if t∈L,

= 0, otherwise.

Proposition 3.2. If L ⊆ TΓ is recognizable, then so is its characteristic series ch(L).

Proof. The identity

τ⁻¹ch(L) = ch(τ⁻¹L), holds for everyτ∈PΓ. Indeed, for all t∈TΓ we have

(τ⁻¹ch(L), t) = 1⇔(ch(L), τ t) = 1⇔τ t∈L

⇔t∈τ⁻¹L⇔(ch(τ⁻¹L), t) = 1.

Thus card{τ⁻¹(ch(L))|τ ∈PΓ} ≤card{τ⁻¹L|τ∈PΓ}<∞, and so dim(ch(L))<

∞,i.e., ch(L) is recognizable (Thm.3.1).

Proposition 3.3. If the series S, S :TΓ →K are recognizable, then so are the seriesS+S, λS (λ∈K),SS,τ⁻¹S (τ ∈PΓ).

Proof. We only exhibit the caseSS. Consider the syntactic realizations (V(S), V(S), ϕ_S), (V_S, v_S, ϕ_S) ofS, S respectively. Then the triple

(V(S)⊗V_S, V(S)⊗v_S, ϕ_S⊗ϕ_S)

(10)

is a realization ofSS (cf. [1]). HereV(S)⊗V_S denotes the tensor product of the spacesV(S) andV_S,

(V(S)⊗v_S)_f : (St⁻¹)^k →V(S)⊗V_S, f ∈Γ_k, k≥0, is given by

(V(S)⊗v_S)_f(St⁻_i₁¹⊗St⁻_j₁¹, . . . , St⁻_i ¹

k ⊗St⁻_j ¹

k ) =

Sf(t_i₁, . . . , t_i_k)⁻¹⊗Sf(t_j₁, . . . , t_j

k)⁻¹ whereas the linear formϕ_S⊗ϕ_S :V(S)⊗V_S →kis given by

(ϕ_S⊗ϕ_S)(St⁻¹⊗St⁻¹) = (S, t)(S, t).

The result follows from the next inequality

dimV_SS ≤dim(V(S)⊗V_S)≤dimV(S)·dimV_S <∞.

4. Syntactic complexity of tree series

Syntactic complexity is a tool to study the syntax of a tree seriesS :TΓ→K.

It is described by a function SC_S : N→ N_∞ which sends every natural number nto the maximum number of linearly independent right derivativesSt⁻¹ wheret runs over all trees whose width isn. Thus, by setting

V_n(S) =St⁻¹|t∈TΓ,width(t) =n, we get

SC_S(n) = dimV_n(S).

Notice that in the case Γ deprives unary symbols (Γ1=∅), then dimVn(S)<∞, i.e., SC_S ranges overN. The bounded, polynomial or exponential syntactic complexity classes for tree seriesS:TΓ→Kare deﬁned analogously to the case of tree languages. The corresponding classes of tree series are denotedSCBOU N D(Γ), SCP OL(Γ),SCEXP(Γ).

By Proposition3.2, it is clear that for every tree languageL⊆TΓ, the syntactic complexity of ch(L) does not exceed that ofL.

Proposition 4.1. For everyL⊆TΓ it holds SCch(L)≤SC_L.

According to Theorem3.1every recognizable series S :TΓ →K has bounded syntactic complexity, but this fact does not characterize recognizability.

Theorem 4.1.There is a non-recognizable tree series which has bounded syntactic complexity.

Proof. Consider the series

S :TˆΓ→Q, (S, t) = 1

|t|,

(11)

where |t| is the number of symbols of ˆΓ = {f, α} occurring in t. We construct inductively the sequence of trees

t0=α, t_k+1=f(t_k, α),

and observe that|tk|= 2k+ 1,k≥0. Now, the derivatives St⁻₀¹, . . . , St⁻_k−¹₁, are linearly independent. Indeed, let

λ0St⁻₀¹+λ1St⁻₁¹+· · ·+λ_k−1St⁻_k−¹₁= 0, that is

λ0

|τ|+|t0|+_|τ|^λ₊¹_|t

1|+· · ·+_|τ|^λ₊^k−1_|t

k−1|= 0, for allτ∈PˆΓ,

where|τ| is the number of symbols of ˆΓ occurring inτ. As τ ranges over PΓˆ we may view|τ|as a variablexso that the previous relation can be written as

λ0

x+1 +_x^λ₊₃¹ +· · ·+_x₊₂^λ^k−1_k−₁ = 0, for allx.

By diﬀerentiating the above relationptimes (p= 0,1, . . . , k−1) we get the system λ0

(x+ 1)^p + λ1

(x+ 3)^p +· · ·+ λ_k−1

(x+ 2k−1)^p = 0, for allx.

Takingx= 0 above, we ﬁnally obtain the Vandermonde system λ0

1^p +λ1

3^p +· · ·+ λ_k−1

(2k−1)^p = 0, p= 0,1, . . . , k−1,

from which we getλ0=λ1=· · ·=λ_k−1= 0 as wanted. Consequently, by virtue of Theorem3.1the seriesS is not recognizable.

Let us calculate its syntactic complexity. For everynall treestwithwidth(t) = n satisfy |t| = 2n−1. It turns out that all derivatives St⁻¹, width(t) = n are equal to each other and so the subspace they generate has dimension 1, that is SC_S(n) = 1 for alln. HenceS has bounded syntactic complexity.

The previous result states thatSCBOU N D(Γ) strictly contains the classREC(Γ) of recognizable tree series over Γ. Hence, we get the hierarchy

REC(Γ)⊂SCBOU N D(Γ)⊂SCP OL(Γ)⊂SCEXP(Γ).

Lemma 4.1. Let (A= (A, α), ϕ) be a realization of the series S : TΓ →K and put

V_n(A) =hA(t)|t∈TΓ,width(t) =n.

There exists a linear surjective functionh_n:V_n(A)→V_n(S)deﬁned as follows: if (h_A(t_i))1≤i≤k is a basis of the spaceV_n(A), then

h_n(h_A(t_i)) =St⁻_i ¹, 1≤i≤k.

Consequently,dimV_n(S)≤dimV_n(A).

(12)

Proof. Immediate from Proposition 3.1.

Proposition 4.2. Let S, S:TΓ→K,λ∈K andτ∈PΓ, then SC_S+S ≤SC_S+SC_S, SC_λS =SC_S (λ= 0),

SC_SS ≤SC_S·SC_S, SC_τ−1S=SC_S.

Therefore, the classesSCBOU N D(Γ),SCP OL(Γ),SCEXP(Γ)are closed under sum, scalar product, Hadamard product and left derivatives.

Proof. We only treat the seriesSS. Applying Lemma4.1forA=V(S)⊗V_S and taking into account the proof of Proposition3.3, we get a surjective function h_n :V_n(V(S)⊗V_S)→V_n(SS) which maps the vector

h_V(S)⊗VS(t) =h_V(S)(t)⊗h_V

S(t) to the vector (SS)t⁻¹. Therefore,

dimV_n(SS)≤dimV_n(V(S)⊗V_S)

≤dim(V_n(S)⊗V_n(S)) = dimV_n(S)·dimV_n(S)

that isSC_SS ≤SC_S·SC_S as wanted.

It is not hard to see that the characteristic series of the tree languagesD_k, introduced in Section3, has the same syntactic complexity asD_k which is polynomial of degreek−1 (see Prop.2.2).

A series S : TΓ → K will be called syntactically hard if it has the highest possible complexity. This means that

dimV_n(S) = card{t|t∈TΓ,width(t) =n}.

From any seriesS : TΓ → K such that (S, t) = 0 for all t ∈ TΓ, a syntactically hard seriesS_d:TΓ→K can be derived by setting

(S_d, s) = (S, t)², ifs=f(t, t),

= 0, otherwise,

provided Γ has a binary symbolf. For the proof of the fact thatS_d is syntactically hard, we have to show that, for all n, the list of derivativesS_dt⁻¹, t ∈TΓ, width(t) =n, is linearly independent. Indeed, assume that

t∈TΓ,width(t)=n

λ_tS_dt⁻¹= 0

or that

t∈TΓ,width(t)=n

λ_t(S_d, τ t) = 0, for allτ∈PΓ.

(13)

Setting aboveτ =f(t, x) for somet∈TΓ, we ﬁnd λ_t(S_d, f(t, t)) = 0 orλ_t(S, t)²= 0, or

λ_t= 0, for allt∈TΓ, width(t) =n, as wanted.

In the case that we deal with the simple alphabet ˆΓ = {f, α}, then it is well known from Combinatorics thatcard{t| width(t) =n} is the (n−1)th Catalan numberC_n−1, where

C_n= 1 n+ 1

2n n

. Hence,

Proposition 4.3. For everyS :TˆΓ→K, with(S, t)= 0 for allt∈TΓˆ, the series S_d:TΓˆ→K deﬁned above is syntactically hard, i.e.,SC_S_d(n) =C_n−1.

5. Tree height

Berstel and Reutenauer, using a pumping lemma technique, demonstrated that the series sending every tree to its height is not recognizable (cf. [1]). Our task in the sequel will be to compute the syntactic complexity of height :TΓ→Q.

For the sake of simplicity, in this section, we assume that Γ is the ranked alphabet ˆΓ consisting of a binary symbol f and a nullary symbolα. We denote byrthe the greatest integer less than or equal torand byrthe least integer greater than or equal tor.

Theorem 5.1. The syntactic complexity of height is given by the formula SCheight(n) =n−log2n, for all n.

Proof. First of all we observe that, for allt∈TΓ, it holds height(t) + 1≤width(t)≤2^height(^t⁾ from which we get that

log₂(width(t))≤height(t)≤width(t)−1. (2) Our next step will be to compute the dimension of the subspace

V_n(height) =heightt⁻¹|t∈TΓ,width(t) =n, for alln.

According to (2) if width(t) =n, then

log₂n≤height(t)≤n−1. (3)

(14)

Let t1, . . . , t_k be trees with diﬀerent heights, height(t_i) = m+i, 1 ≤ i ≤ k, satisfying (3) (m=log2n−1andk=n−log₂n). We are going to show that the derivatives

heightt⁻₁¹, . . . ,heightt⁻_k¹ form a basis ofV_n(height). Assume that

λ1(heightt⁻₁¹) +· · ·+λ_k(heightt⁻_k¹) = 0 or that

λ1height(τ t1) +· · ·+λ_kheight(τ t_k) = 0

for allτ ∈PΓ. Chooseτ1, . . . , τ_k ∈PΓ such that height(τ_i) =m+kand||τ_i||=i.

By evaluating the above equation for τ = τ_i, 1 ≤ i ≤ k, we obtain the linear homogeneous system

λ1height(τ_it1) +· · ·+λ_kheight(τ_it_k) = 0 (4) fori= 1, . . . , k.

Now we need to calculate height(τ_it_j) for everyi andj. Leti= 1, this means that the path from the root ofτ to the unique leaf labeled by the variablexhas length 1 while the height of τ is m+k. Hence, the height of the tree τ1t_j will remainm+kwhen the height oft_j is smaller thanm+k(i.e., forj= 1, . . . , k−1) and only forj =k, which means that height(t_k) =m+k, the height of τ1t_k will become 1 +m+k. Thus, we get that

height(τ1t_j) =

m+k, forj = 1, . . . , k−1;

m+k+ 1, forj =k.

Using the same arguments we get that

height(τ2t_j) =

⎧⎨

⎩

m+k, forj = 1, . . . , k−2;

m+k+ 1, forj =k−1;

m+k+ 2, forj =k, and similarly fori >2. Finally fori=kwe get

height(τ_kt_j) =m+k+j, forj= 1, . . . , k.

Therefore, the system (4) becomes

λ1(m+k) +λ2(m+k) +· · ·+λ_k−1(m+k) +λ_k(m+k+ 1) = 0 λ1(m+k) +λ2(m+k) +· · ·+λ_k−1(m+k+ 1) +λ_k(m+k+ 2) = 0

· · · = 0

λ1(m+k+ 1) +λ2(m+k+ 2) +· · ·+λ_k−1(m+ 2k−1) +λ_k(m+ 2k) = 0

(15)

whose determinant is non vanishing and soλ1=· · ·=λ_k = 0. This shows that heightt⁻₁¹, . . . ,heightt⁻_k¹

are linearly independent. On the other hand we observe that ift, t∈TΓ are such that height(t) = height(t), then heightt⁻¹ = heightt⁻¹ and so heightt⁻₁¹, . . ., heightt⁻_k¹ generates the spaceV_n(height). We conclude that

dimV_n(height) =n−log₂n, for alln

hence the proposed formula forSCheight.

Remark. In the case that Γ is an arbitrary ﬁnite ranked alphabet with no unary symbols then the inequality (2) becomes

log_k₂width(t)≤height(t)≤width(t)−1 k1−1

wherek1(resp. k2) is the smallest (resp. greatest) positive rank such that Γ_k₁ =∅ (resp Γ_k₂ =∅). However, the computed syntactic complexity is in the same class as above.

In the sequel we display a tree language having the same syntactic complexity with the series height. The languageL_avlofAV L-trees consists of all treest∈TΓ

such that eithert=αort=f(t1, t2) witht1, t2∈L_avland

|height(t1)−height(t2)| ≤1.

If t, t ∈ L_avl and height(t) = height(t), then L_avlt⁻¹ = L_avlt⁻¹. Moreover, choosing a sequence of AV L-trees t_n with height(t_n) = n, (n ≥ 0), it is easy to see that the corresponding derivativesL_avlt⁻¹ are distinct. HenceL_avl is not recognizable.

For a ﬁxedn, we have

card{Lavlt⁻¹|width(t) =n}= card{height(t)|width(t) =n}

and the last number, as we have previously seen, isn−log₂n. We conclude that Proposition 5.1. It holdsSC_L_avl =SCheight.

Recall that forpprime the ringZpof modulopintegers is a ﬁeld. We notice the rather surprising result that the series sending every treetto its height modulo p (p prime) is recognizable. Indeed, we have to show that the space generated by the right derivatives of the series

height_p:TΓ→Zp, height_p(t) = height(t)(modp) has ﬁnite dimension. Actually we shall show that

dimVheight_p=p.

As alwaysZp ={0,1, . . . , p−1}stands for the ﬁeld ofmod pintegers.

(16)

Let us choose treest_i, with height(t_i) =i, 0≤i≤p−1 and suppose that λ0height_pt⁻₀¹+λ1height_pt⁻₁¹+· · ·+λ_p−1height_pt⁻_p−¹₁= 0 or that

λ0height_p(τ t0) +λ1height_p(τ t1) +· · ·+λ_p−1height_p(τ t_p−1) = 0

for allτ∈PΓ. Chooseτ1, . . . , τ_p−1∈PΓsuch that height(τ_i) =p−1 and||τi||=i.

Evaluating the previous equality for τ = τ_i, 0 ≤ i ≤ p−1 we ﬁnd a linear homogeneous system whose determinant

p−1 p−1 · · · p−1 p−1 p−1 p−1 p−1 · · · p−1 p−1 0 p−1 p−1 · · · p−1 0 1 ... ... ... · · · · · · · · · p−1 p−1 0 1 · · · p−3 p−1 0 1 · · · p−3 p−2

=

0 · · · 0 p−1 ... · · · p−1 ... 0 · · · · · · ... p−1 · · · · · · ...

is (−1)^p(p−1)^p. Since pis prime, p does not divide (p−1)^p, that is the above determinant is non vanishing modulo p. Thus λ_i = 0, for i= 0,1, . . . , p−1 and so the considered list of right derivatives is linearly independent. Moreover, it generates the spaceVheight_p since for everyt∈TΓ we have

height_pt⁻¹= height_pt⁻_k¹, withk= height_pt.

We conclude:

Proposition 5.2. For every prime numberpthe tree seriesheight_pis recognizable.

Moreover,dimVheight_p=pand socardVheight_p=p^p.

6. The complexity of recognizable tree series

The hierarchy formed by the previously introduced syntactic complexity locates all recognizable tree series into its ﬁrst level, the class of bounded tree series.

Our intention in the present section is to built a hierarchy within the class of recognizable series by providing an eﬃcient complexity measure for these series.

As we have seen the functionSC_S introduced in the previous section gives no information for recognizable tree series. Our intention in the present section is to provide an eﬃcient complexity measure for these series.

Let us denote byP_Γ⁽ⁿ⁾the subset ofTΓ(X_n) formed by all trees wherex1, . . . , x_n occur in the yield of the tree (in this order from left to right) exactly once.

(17)

For instance the tree

For everyn≥1 there is a function

P_Γ⁽ⁿ⁾×T_Γⁿ→TΓ, (τ, t1, . . . , t_n)→τ[t1, . . . , t_n].

With respect toS:TΓ→K, two dual notions of derivatives can be deﬁned:

τ⁻¹S :TΓⁿ→K, (τ⁻¹S,(t1, . . . , t_n)) = (S, τ[t1, . . . , t_n]), and

S(t1, . . . , t_n)⁻¹:P_Γ⁽ⁿ⁾→K, (S(t1, . . . , t_n)⁻¹, τ) = (S, τ[t1, . . . , t_n]), for allτ ∈P_Γ⁽ⁿ⁾andt1, . . . , t_n∈TΓ. For everynwe set

F(S)⁽ⁿ⁾=τ⁻¹S|τ∈P_Γ⁽ⁿ⁾andV(S)⁽ⁿ⁾=S(t1, . . . , t_n)⁻¹|t1, . . . , t_n∈TΓ. In particularF(S)⁽¹⁾=F(S) andV(S)⁽¹⁾=V(S).

We need some additional notation. Given aK−Γ-algebraA= (A, α), for every t∈TΓ(X_n) and every q1, . . . , q_n ∈A, the element t[q1, . . . , q_n]∈A is inductively deﬁned as follows

• fort=x_i,x_i[q1, . . . , q_n] =q_i, 1≤i≤n;

• fort=c∈Γ0,c[q1, . . . , q_n] =α_c;

• fort=f(t1, . . . , t_k),f ∈Γ_k,k≥1,t_i∈TΓ(X_n)

f(t1, . . . , t_k)[q1, . . . , q_n] =α_f(t1[q1, . . . , q_n], . . . , t_k[q1, . . . , q_n]).

Lemma 6.1. For anyK−Γ-algebra A= (A, α)it holds:

h_A(τ[t1, . . . , t_n]) =τ[h_A(t1), . . . , h_A(t_n)]

for allτ∈P_Γ⁽ⁿ⁾ andt1, . . . , t_n∈TΓ.

Proof. Straightforward.

Lemma 6.2. Let A= (A, α) be a ﬁnitely dimensional K−Γ-algebra. Then, for every t ∈ TΓ, height(t) > dimA, the element h_A(t) can be written as a linear combination of elements of the formh_A(s), for s∈TΓ withheight(s)≤dimA.