• Aucun résultat trouvé

Additional Analytical Support for a New Method to Compute the Likelihood of Diversification Models

N/A
N/A
Protected

Academic year: 2021

Partager "Additional Analytical Support for a New Method to Compute the Likelihood of Diversification Models"

Copied!
23
0
0

Texte intégral

(1)

HAL Id: hal-02941245

https://hal.archives-ouvertes.fr/hal-02941245

Submitted on 19 Nov 2020

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Additional Analytical Support for a New Method to Compute the Likelihood of Diversification Models

Giovanni Laudanno, Bart Haegeman, Rampal Etienne

To cite this version:

Giovanni Laudanno, Bart Haegeman, Rampal Etienne. Additional Analytical Support for a New

Method to Compute the Likelihood of Diversification Models. Bulletin of Mathematical Biology,

Springer Verlag, 2020, 82 (22), 22 p. �10.1007/s11538-020-00698-y�. �hal-02941245�

(2)

(will be inserted by the editor)

Additional analytical support for a new method to compute

1

the likelihood of diversification models

2

Giovanni Laudanno · Bart Haegeman ·

3

Rampal S. Etienne

4

5

Received: date / Accepted: date

6

Abstract

7

Molecular phylogenies have been increasingly recognized as an important source of

8

information on species diversification. For many models of macro-evolution, analyt-

9

ical likelihood formulas have been derived to infer macro-evolutionary parameters

10

from phylogenies. A few years ago, a general framework to numerically compute

11

such likelihood formulas was proposed, which accommodates models that allow spe-

12

ciation and/or extinction rates to depend on diversity. This framework calculates the

13

likelihood as the probability of the diversification process being consistent with the

14

phylogeny from the root to the tips. However, while some readers found the frame-

15

work presented in Etienne et al. (2012) convincing, others still questioned it (personal

16

communication), despitenumerical evidence that for special cases the framework

17

yields the same (i.e. within double precision) numerical value for the likelihood as

18

analytical formulas do that were independently derived for these special cases. Here

19

we proveanalyticallythat the likelihoods calculated in the new framework are correct

20

for all special cases with known analytical likelihood formula. Our results thus add

21

substantial mathematical support for the overall coherence of the general framework.

22

Keywords Diversification·Birth-death process

23

Giovanni Laudanno

Groningen Institute for Evolutionary Life Sciences, Box 11103, 9700 CC, Groningen, The Netherlands E-mail: glaudanno@gmail.com

Bart Haegeman

Centre for Biodiversity Theory and Modelling, Theoretical and Experimental Ecology Station, CNRS and Paul Sabatier University, Moulis, France

Rampal S. Etienne

Groningen Institute for Evolutionary Life Sciences, Box 11103, 9700 CC, Groningen, The Netherlands

(3)

Mathematics Subject Classification (2000):Primary: 60J80; Secondary: 92D15,

24

92B10, 60J85, 92D40

25

1 Introduction

26

One of the major challenges in the field of macro-evolution is understanding the

27

mechanisms underlying patterns of diversity and diversification. A very fruitful ap-

28

proach has been to model macro-evolution as a birth-death process which reduces the

29

problem to the specification of macroevolutionary events (i.e. speciation and extinc-

30

tion). However, providing likelihood expressions for these models given empirical

31

data on speciation and extinction events is quite challenging, for the following reason.

32

While such a likelihood is very easy to derive when full information is available for

33

all events, typically the data involves phylogenetic trees constructed with molecular

34

data collected from extant species alone. Hence, no extinction events and speciation

35

events leading to extinct species are recorded in such phylogenetic trees. For a vari-

36

ety of models this problem can be overcome by considering a reconstructed process,

37

whereby the phylogeny of extant species can be regarded as a pure-birth process with

38

time-dependent speciation rate [9]. But this approach is not generally valid.

39

Thus, the methods employed to derive likelihood expressions are usually applica-

40

ble to a limited set of models. They do not apply to models that assume that speciation

41

and extinction rates depend on the number of species in the system. Hence, potential

42

feedback of diversity itself on diversification rates, due to interspecific competition

43

or niche filling, is completely ignored. The first to incorporate such feedbacks were

44

Rabosky & Lovette [10], who made rates dependent on the number of species present

45

at every given moment in time, analogously to logistic growth models used in pop-

46

ulation biology. However, their model had to assume that there is no extinction for

47

mathematical tractability, which stands in stark contrast to the empirical data: the

48

fossil record provides us with many examples of extinct species.

49

Etienne et al. [3] presented a framework to compute the likelihood of phyloge-

50

netic branching times under a diversity-dependent diversification process that explic-

51

itly accounts for the influence of species that are not in the phylogeny, because they

52

have become extinct. Some of our colleagues have doubts that this framework con-

53

tains a formal argument that the solution of the set of ordinary differential equa-

54

tions that together constitute the framework gives the likelihood of the model for

55

a given phylogenetic tree. Instead, only numerical evidence for a small set of pa-

56

rameter combinations has been provided that the method yields, in the appropri-

57

ate limit, the known likelihood for the standard diversity-independent (i.e. using

58

constant-rates) birth-death model. This likelihood was first provided by Nee et al. [9],

59

using a breaking-the-tree approach. Later, Lambert & Stadler [8] used coalescent

60

point process theory to provide an approach to obtain likelihood formulas for a wider

61

(4)

set of models. These models did not include diversity-dependence. For example,

62

Lambert et al. [4, 7] applied their framework to the protracted birth-death model,

63

which is a generalization of the diversity-independent model where speciation is no

64

longer an instantaneous event [5]. For this model they provided an explicit likelihood

65

expression.

66

Here we provide an analytical proof that the likelihood of Etienne et al. [3] re-

67

duces to the likelihood of Lambert et al. [7] – and hence to that of Nee et al. [9] – for

68

the case of diversity-independent diversification.

69

The extant species belonging to a clade are often not all available for sequenc-

70

ing, because some species are difficult to obtain tissue from (either because they are

71

difficult to find/catch, or because they are endangered, or because they have recently

72

become extinct due to anthropogenic rather than natural causes) or because it is diffi-

73

cult to extract their DNA. This means that our data consists of a phylogenetic tree of

74

an incomplete sample of species, and thus of an incomplete set of speciation events,

75

even for those that lead to the species that we observe today. Models have been de-

76

veloped to deal with this incomplete sampling. One model assumes that we know

77

exactly how many extant species are missing from the phylogeny. This is the case for

78

well-described taxonomic groups, such as birds, where we have a good idea of the

79

species that are evolutionarily related, but we are simply missing some data points for

80

the reasons mentioned above. This sampling model is calledn-sampling [7]. Another

81

model assumes that we do not know exactly how many species are missing from the

82

phylogeny, but rather that we have an idea of the probabilityρ of a species being

83

sampled. This sampling scheme is calledρ-sampling [7], but is also referred to as

84

f-sampling [9]. The framework of Etienne et al. [3] assumesn-sampling, but in this

85

paper we show that it can also be extended to incorporateρ-sampling.

86

In the next section we summarize the framework of Etienne et al. [3] and we pro-

87

vide the likelihood formula analytically derived by Lambert et al. [7] for the special

88

case of diversity-independent but time-dependent diversification with n-sampling.

89

Then we proceed by showing that the probability generating functions of these two

90

likelihoods are identical. We end with a discussion where we point out how the frame-

91

work of Etienne et al. [3] can be extended to includeρ-sampling and how it relates to

92

the likelihood formula of Rabosky & Lovette [10] for the diversity-dependent birth-

93

death model without extinction.

94

2 The diversity-dependent diversification model

95

Diversification models are birth-death processes in which “birth” and “death” cor-

96

respond to speciation and extinction events, respectively. In the simplest case, the

97

per-species speciation rateλand the per-species extinction rateµare constants. Here

98

(5)

we consider diversification models in which the speciation and extinction rates de-

99

pend on the number of speciesnpresent at timet, i.e., diversity-dependent, which we

100

denote byλnandµn. We also allow the speciation and extinction rates to depend on

101

timet, i.e.,λn(t)andµn(t), although the latter dependence is often not explicit in our

102

notation.

103

We assume that the diversification process starts at timetcfrom a crown, i.e., from two ancestor species. Assuming that at a later timet>tcthe process hasnspecies, the transition probabilities in the infinitesimal time interval[t,t+dt]are

fromnton+1 species with probabilityλn(t)dt fromnton−1 species with probabilityµn(t)dt

number of speciesnunchanged with probability 1−λn(t)dt−µn(t)dt.

The diversification process runs until the present timetp.

104

We denote byPn(t)the probability that the process hasnspecies at timet. This

105

probability satisfies the following ordinary differential equation (ODE, called master

106

equation or forward Kolmogorov equation [1]),

107

dPn(t)

dt =µn+1(n+1)Pn+1(t) +λn−1(n−1)Pn−1(t)−(λnn)nPn(t), (2.1)

where we omit in the notation the time dependence of the speciation and extinction

108

rates.

109

Sampling models

110

At the present timetpa subset of thenextant species are observed and sampled. This

111

sampling process can been modelled in two different ways (see introduction). The

112

first model assumes that a fixed number of species is unsampled, which corresponds

113

to then-sampling scheme of Ref. [7]. That is, the number of extant species attp

114

that are not sampled, a number we denote by mp, is part of the data. The second

115

model assumes that each extant species at the present time is sampled with a given

116

probability, which has been called f-sampling [9] orρ-sampling [7]. In this case the

117

number of unsampled speciesmpis a random variable, and the probability with which

118

extant species are sampled a parameter to estimate, which we denote by fp.

119

(6)

02468Complete tree

a)

b)

c)

number of lineages t

c t2 t3 t4 t5tp

Reconstructed tree

Fig. 1 a) Full tree where missing species are plotted as red dashed lines: the ones ending in a cross become extinct before the present, whereas the ones ending with a red dot are unsampled species at the present;

b) Corresponding reconstructed tree in which only extant species are present. This is the type of tree we usually work with because actual phylogenetic trees are usually obtained from molecular data taken from extant species; c) Lineages-through-time plot: the green line represents the number of lineages leading to extant species (k), the red line represents lineages leading to extinct or unsampled species (m), and the blue line represents the total number of lineages (n=k+m).

Reconstructed tree

120

A realization of the diversification process fromtctotpcan be represented graphically

121

as a tree, see Figure 1. The complete tree shows all the species that have originated in

122

(7)

the process (Fig. 1a). However, in practice we have only access to the reconstructed

123

tree, i.e., the complete tree from which we remove all the species that became extinct

124

before the present or that were not sampled (Fig. 1b). While it would be straight-

125

forward to infer information about the diversification process based on the complete

126

tree, this task is much more challenging when only the reconstructed tree is available.

127

This paper deals with likelihood formulas for a reconstructed phylogenetic tree.

128

The number of tips equals the number of sampled extant specieskp. We assume that

129

also the number of unsampled extant species is known, a number we denote bymp.

130

The information contained in a phylogenetic tree consists of a topology and a set

131

of branching times. For a large set of diversification models, including the diversity-

132

dependent one, all trees having the same branching times but different topologies are

133

equally probable [8]. Hence, we can discard the topology for the likelihood compu-

134

tation. We denote the vector of branching times byt= (t2,t3, . . . ,tkp−1), wheretkis

135

the branching time at which the phylogenetic tree changes fromktok+1 branches.

136

It will be convenient to sett0=t1=tcandtkp =tp.

137

Likelihood conditioning

138

It is common practice to condition the likelihood on the survival of both ancestor lin-

139

eages to the present time [9]. Indeed, we would only do an analysis on trees that have

140

actually survived to the present. In particular, we impose the crown age of the recon-

141

structed tree. This corresponds to conditioning on the event that both ancestor species

142

have extant descendant species. We do not require that these descendant species have

143

been sampled.

144

3 The framework of Etienne et al.

145

Etienne et al. [3] presented an approach to compute the likelihood of a phylogeny for

146

the diversity-dependent model. It is based on a new variable,Qkm(t), which they de-

147

scribed as “the probability that a realization of the diversification process is consistent

148

with the phylogeny up to timet, and hasn=m+kspecies at timet” (Ref. [3], Box

149

1), whereklineages are represented in the phylogenetic tree (because they are ances-

150

tral to one of thekpspecies extant and sampled at present) andmadditional species

151

are present but unobserved (Fig. 1c). These species might not be in the phylogenetic

152

tree because they became extinct before the present or because they are either not

153

discovered or not sampled yet (see introduction). From hereon we will refer to these

154

species denoted bymas missing species. We cannot ignore these missing species, be-

155

cause in a diversity-dependent speciation process, they can influence the speciation

156

and extinction rates.

157

We start by describing the computation of the variableQkmpp(tp), which proceeds from the crown agetcto the present timetp. It is convenient to arrange the values

(8)

Qkm(t), withm=0,1,2, . . ., into the vectorQk(t). The initial vectorQk=2(tc)is trans- formed into the vector Qk(t) at a later timet as follows (Ref. [3], Appendix S1, Eq. (S1)):

Qk(t) =Ak(tk−1,t)Bk−1(tk−1)Ak−1(tk−2,tk−1). . . A3(t2,t3)B2(t2)A2(tc,t2)Qk=2(tc),

withtk−1≤t≤tk. The operators Ak and Bk are infinite-dimensional matrices that operate along the tree, on branches and nodes, respectively (Fig. 2). Continuing this computation until the present timetp, we get

Qkp(tp) =Akp(tkp−1,tp)Bkp−1(tkp−1)Akp−1(tkp−2,tkp−1). . .

A3(t2,t3)B2(t2)A2(tc,t2)Qk=2(tc). (3.1) Note that Eq. (3.1) generalizes Eq. (S1) of Ref. [3] to the case in which the rates are

158

time-dependent.

159

We specify the different terms appearing in the right-hand side of Eq. (3.1):

160161

– For the initial vector Qk=2(tc)we assume that there are no missing species at

162

crown age, that is,Qk=2m (tc) =δm,0.

163

– The matrix Akcorresponds to the dynamics ofQkm(t)in the time interval[tk−1,tk], during which the phylogenetic tree has k branches. Etienne et al. [3] argued that these dynamics are given by the following ODE system (Ref. [3], Box 1, Eq. (B2)):

dQkm(t)

dt =µk+m+1(m+1)Qkm+1(t) +λk+m−1(m−1+2k)Qkm−1(t)

−(λm+km+k)(m+k)Qkm(t), ∀m>0, dQk0(t)

dt =µk+1Qk1(t)−(λkk)kQk0(t), ifm=0.

(3.2) The quantityQk0p(tp)is the likelihood for the tree at the present time, assuming

164

that all the species survived to the present have been sampled. We can collect the

165

coefficients ofQkm(t)on the right-hand side of the ODE system in a matrix Vk(t).

166

If we do so, the system can be rewritten as

167

dQk(t)

dt =Vk(t)Qk(t), which has solution

168

Qk(t) =expZ t tk−1

Vk(s)ds

Qk(tk−1), so that

169

Ak(tk−1,t) =exp Z t

tk−1

Vk(s)ds

. (3.3)

(9)

– The matrix Bk transforms the solution of the ODE system ending attk into the

170

initial condition of the ODE system starting at tk. It is a diagonal matrix with

171

componentskλk+mdt, so that

172

Qk+1m (tk) = Bk(tk)

m,mQkm(tk) =kλk+mdt Qkm(tk). (3.4) The multiplication byλk+mdtcorresponds to the probability that a speciation oc-

173

curs in the time interval[tk,tk+dt]. In the likelihood expressions we will omit the

174

differential (a choice that is widely adopted across the vast majority of this kind

175

of models in the literature) as it is actually not essential in parameter estimation.

176

Therefore, we will work with a likelihood density, but for simplicity we will refer

177

to it as a likelihood.

178

We are then ready to formulate the claim made by Etienne et al. [3] (in particular, see

179

their Eqs. (S2) and (S6) in Appendix S1).

180

Claim 3.1 Consider the diversity-dependent diversification model, given by specia-

181

tion ratesλn(t)and extinction ratesµn(t). The diversification process starts at crown

182

age tc with two ancestor species, and ends at the present time tp, at which a fixed

183

number of species mpare not sampled. A phylogenetic tree is constructed for the

184

sampled species. Then, the likelihood that the phylogenetic tree has kptips and vec-

185

tor of branching timest= (t1,t2, . . . ,tkp−1), conditional on the event that both crown

186

lineages survive until the present, is equal to

187

Lkp,t,mp= Qkmpp(tp)

kp+mp

mp

Pc(tc,tp). (3.5)

The term Qkmpp(tp)in the numerator of this expression is obtained from Eq. (3.1). The

188

term Pc(tc,tp)in the denominator, where the subscript c stands for conditioning, is

189

equal to

190

Pc(tc,tp) =

m=0

6

(m+2)(m+3)Qk=2m (tp), (3.6) where Qk=2m (tp)is again obtained from Eq. (3.1).

191

The structure of the likelihood expression (3.5) can be understood intuitively. It

192

is proportional toQkmpp(tp), which in Etienne et al.’s interpretation is the probability

193

that the diversification process generates the phylogenetic tree withkptips andmp

194

missing species at present timetp. The combinatorial factor kpm+mp

p

accounts for the

195

number of ways to selectmpmissing species out of a total pool ofkp+mpspecies.

196

The factorPc(tc,tp)is the probability that both ancestor species at crown agetchave

197

descendant species at the present timetp. Hence, this factor applies the likelihood

198

conditioning.

199

(10)

Etienne et al. [3] provided numerical evidence that Claim 3.1 is in agreement

200

with the likelihood provided by Nee et al. [9] under the hypothesis of diversity-

201

independent speciation and extinction rates and no missing species at the present.

202

However, a rigorous analytical proof, even for this specific case, has not yet been

203

given. In this paper we show that Claim 3.1 holds (1) for the diversity-independent

204

(but possibly time-dependent) case and (2) for the diversity-dependent case without

205

extinction (i.e., extinction rateµ=0).

206

Reconstructed tree

A

2

( t

2

,t

c

) B

2

Q

2

( t

c

) Q

4

( t

p

)

k=2 k=3 k=4

B

3

A

3

( t

3

,t

2

) A

4

( t

p

,t

3

)

t

t

c

t

2

t

3

t

p

Fig. 2 An example of how to build a likelihood for a tree withkp=4 tips. We start with a vec- tor Q2(tc) at the crown age. We use Ak(tk,tk−1) and Bk(tk) to evolve the vector across the en- tire tree (on branches and nodes, respectively) up to the present time tp according to Q4(t4) = A4(t4,t3)B3(t3)A3(t3,t2)B2(t2)A2(t2,tc)Q2(tc). At the present time the likelihood accounting formpmiss- ing species will be proportional to themp-th component of the vectorL4,mpQ4m

p(tp).

(11)

4 The likelihood for the diversity-independent case

207

Claim 3.1 proposes a likelihood expression for the case with a known number of

208

unsampled species at the present, i.e., it accounts forn-sampling. For the diversity-

209

independent case, i.e.,λn(t) =λ(t)andµn(t) =µ(t), the likelihood is contained in

210

a more general result established by Lambert et al. [7]. In the following proposition

211

we derive an explicit likelihood expression by restricting the result of Lambert et al.

212

to the diversity-independent case.

213

Proposition 4.1 Consider the diversity-independent diversification model, given by speciation ratesλ(t)and extinction rate µ(t). The diversification process starts at crown age tcwith two ancestor species, and ends at the present time tp, at which a fixed number of species mpare not sampled. A phylogenetic tree is constructed for the sampled species. Then, the likelihood that the phylogenetic tree has kptips and vector of branching times t= (t1,t2, . . . ,tkp−1), conditional on the event that both crown lineages survive until the present, is equal to

L(div-indep)

kp,t,mp =(kp−1)!

kp+mp

mp

(1−η(tc,tp))2

kp−1

i=2

λ(ti)(1−ξ(ti,tp))(1−η(ti,tp))

m|m

p

kp−1

j=0

(mj+1)(η(tj,tp))mj, (4.1) where we used the convention t0=t1=tc. The components mj(with j=0,1, . . . ,kp

214

1) of the vectorsm, in the sum on the second line, are non-negative integers satisfying

215

kj=0p−1mj=mp. The functionsξ(t,tp)andη(t,tp)are given by

216

ξ(t,tp) =1− 1

α(t,tp) +Rttpλ(s)α(t,s)ds

=1− 1

1+Rttpµ(s)α(t,s)ds (4.2) η(t,tp) =1− α(t,tp)

α(t,tp) +Rttpλ(s)α(t,s)ds

=1− α(t,tp)

1+Rttpµ(s)α(t,s)ds, (4.3) with

217

α(t,s) =expZ s t

(µ(s0)−λ(s0))ds0 .

The functionsξ(t,tp)andη(t,tp)are those appearing in Kendall’s solution of the

218

birth-death model (see Ref. [6], Eqs. (10–12)), and are useful to describe the process

219

when time-dependent rates are involved. Given the probabilityPn(t,tp)of realizing a

220

(12)

process starting with 1 species at timetand ending withnspecies at timetp, we have

221

thatξ(t,tp) =P0(t,tp)andη(t,tp) =Pn?+1(t,tp)

Pn?(t,tp) for anyn?>0.

222

Proof The likelihood forn-sampling was originally provided by Ref. [7], Eq. (7), but we start from the explicit version provided in Ref. [4], Eq. (1), see corrigendum in Ref. [2],

L(div-indep)

kp,t,mp =(kp−1)!

kp+mp

mp

(g(tc,tp))2

kp−1

i=2

f(ti,tp)

m|mp

kp−1

j=0

(mj+1)(1−g(tj,tp))mj. (4.4) Lambert et al. [4, 7] specify the functions f(t,tp)andg(t,tp)as the solution of a system of ODEs for the case of protracted speciation, a model where speciation does not take place instantaneously but is initiated and needs time to complete. The stan- dard diversification model is then obtained by taking the limit in which the speciation- completion rate tends to infinity. In this limit the four-dimensional system of Lambert et al. [4], Eq. (2), reduces to a two-dimensional system of ODEs,

f(t,tp) =dg(t,tp)

dt =λ(t) (1−q(t,tp))g(t,tp) dq(t,tp)

dt =−µ(t) + (µ(t) +λ(t))q(t,tp)−λ(t)q2(t,tp).

Note that in this paper timetruns from past to present rather than from present to past

223

as in Lambert et al. [4]. The conditions at the present timetpare given byg(tp,tp) =1

224

andq(tp,tp) =0.

225

The solution of this system of ODEs can be expressed in terms ofη(t,tp)and ξ(t,tp),

f(t,tp) =λ(t) (1−ξ(t,tp)) (1−η(t,tp)) g(t,tp) =1−η(t,tp)

q(t,tp) =ξ(t,tp),

which can be checked using the derivatives of the expressions 4.3 and 4.2

∂ η(t,tp)

∂t =−λ(t) (1−ξ(t,tp)) (1−η(t,tp))

∂ ξ(t,tp)

∂t =−(µ(t)−λ(t)ξ(t,tp)) (1−ξ(t,tp)).

Substituting the functions f(t,tp)andg(t,tp)into the likelihood expression (4.4) con-

226

cludes the proof.

227

The functions ξ(t,tp)andη(t,tp)are directly related to the functions used by

228

Nee et al. [9]. In particular, the functions they denoted byP(t,tp)andut correspond

229

in our notation to 1−ξ(t,tp)andη(t,tp), respectively.

230

(13)

This correspondence allows us to get an intuitive understanding of the likelihood

231

expression (4.1). First consider the case without missing species. Settingmp=0, we

232

get

233

L(div-indep)

kp,t,0 = (kp−1)!(1−η(tc,tp))2

kp−1

i=2

λ(ti)(1−ξ(ti,tp))(1−η(ti,tp)), which is identical to the breaking-the-tree likelihood of Nee et al. (Ref. [9], Eq. (20)).

234

In the latter approach the phylogenetic tree is broken into single branches: two for

235

the interval [tc,tp] and one for each interval [ti,tp] with i=2,3, . . . ,kp−1. Each

236

branch contributes a factor(1−ξ(ti,tp))(1−η(ti,tp)), equal to the probability that

237

the branch starting atti has a single descendant species attp. For the two branches

238

originating attc, the factor(1−ξ(ti,tp)), equal to the probability of having (one or

239

more) descendant species, drops due to the conditioning. For the other branches, there

240

is an additional factorλ(ti)for the speciation events.

241

Next consider the case with missing species. Each of the branches resulting from

242

breaking the tree can contribute species to the pool ofmpmissing species. For the

243

branch over the interval[tj,tp], there aremj such species, each contributing a factor

244

η(tj,tp)to the likelihood. Indeed,(1−ξ(tj,tp))(1−η(tj,tp))(η(tj,tp))mj is equal

245

to probability of having exactlymj+1 descendant species at the present time. One

246

of these species is represented in the phylogenetic tree, justifying the combinatorial

247

factor(mj+1)in the second line of Eq. (4.1).

248

Finally, we recall the expressions for the functionsξ(t,tp)andη(t,tp)in the case of constant rates,λ(t) =λ andµ(t) =µ,

ξ(t,tp) =µ 1−e−(λ−µ)(tp−t) λ−µe−(λ−µ)(tp−t) η(t,tp) =λ 1−e−(λ−µ)(tp−t)

λ−µe−(λ−µ)(tp−t) . 5 Equivalence for the diversity-independent case

249

Likelihood formula (4.1) allows speciation and extinction rates to be arbitrary func-

250

tions of time,λ(t)andµ(t). Here we show that, for the diversity-independent case,

251

we find the same likelihood formula with the approach of Etienne et al. [3].

252

Theorem 5.1 Claim 3.1 holds for the diversity-independent case.

253

Proof The proof relies heavily on generating functions. First, we introduce the

254

generating function for the variablesQkm(t),

255

Fk(z,t) =

m=0

zmQkm(t). (5.1)

(14)

The set of ODEs satisfied byQkm(t), Eq. (3.2), transforms into a partial differential equation (PDE) for the generating functionFk(z,t),

∂Fk(z,t)

∂t = (µ(t)−zλ(t))(1−z)∂Fk(z,t)

∂z +k(2zλ(t)−λ(t)−µ(t))Fk(z,t)

=c(z,t)∂Fk(z,t)

∂z +k∂c(z,t)

∂z Fk(z,t), (5.2)

with

256

c(z,t) = (µ(t)−zλ(t))(1−z).

Note that the number of brancheskchanges at each branching time, so that the PDE

257

forFk(z,t)is valid only fortk−1≤t≤tk(corresponding to the operatorAk). At branch-

258

ing timetk, the solutionFk(z,tk)has to be transformed to provide the initial condition

259

for the PDE forFk+1(z,t)at timetk(corresponding to the operatorBk). Using Eq. (3.4)

260

and dropping the differential, we get

261

Fk+1(z,tk) =kλ(tk)Fk(z,tk). (5.3) The initial condition at crown age isF2(z,tc) =1 becauseQk=2m (tc) =δm,0.

262

Next, we definePn(s,t)as the probability that the birth-death process that started

263

with one species at time s has n species at time t. The corresponding generating

264

function is defined as,

265

G(z,s,t) =

n=0

znPn(s,t). (5.4)

The set of ODEs satisfied byPn(s,t), Eq. (2.1), transforms into a PDE,

266

∂G(z,s,t)

∂t =c(z,t)∂G(z,s,t)

∂z . (5.5)

Its solution was given by Kendall (Ref. [6], Eq. (9)),

267

G(z,s,t) =ξ(s,t) + (1−ξ(s,t)−η(s,t))z

1−zη(s,t) , (5.6) whereξ(s,t)andη(s,t)are given in Eqs. (4.2) and (4.3).

268

The generating functionFk(z,t)can be expressed in terms of the generating func-

269

tionG(z,s,t), as shown in the following lemma.

270

Lemma 5.1 The generating function Fk(z,t)of the variables Qkm(t)is given by

271

Fk(z,t) =H2(z,tc,t)

k−1

j=2

j(tj)H(z,tj,t) (5.7) with

272

H(z,s,t) =∂G(z,s,t)

∂z =(1−ξ(s,t)) (1−η(s,t))

(1−zη(s,t))2 . (5.8)

(15)

To prove the lemma, let us suppose that the solution of Eq. (5.2) is of the form,

273

Fk(z,t) =Ck(t)

k−1

j=0

zG(z,tj,t) (5.9)

whereCk(t)is a constant depending on the branching times. We used the convention

274

t0=t1=tcand the short-hand notation∂zfor the partial derivative with respect toz.

275

This expression can be rewritten as,

276

Fk(z,t) =Ck(t)1 k

k−1 i=0

zG(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t).

The partial derivatives ofFkcan now be computed,

zFk=Ck(t)

k−1

i=0

z2G(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t)

tFk=Ck(t)

k−1 i=0

tzG(z,ti,t)

k−1 j6=i,j=0

zG(z,tj,t).

We substitute these expressions into the PDE, Eq. (5.2),

k−1

i=0

tzG(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t)

=c(z,t)

k−1 i=0

z2G(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t)

+k∂zc(z,t)1 k

k−1

i=0

zG(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t).

This equation is satisfied if the following equation is satisfied for everyi=0,1, . . . ,k,

tzG(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t)

=c(z,t)∂z2G(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t)

+∂zc(z,t)∂zG(z,ti,t)

k−1 j6=i,

j=0

zG(z,tj,t).

This is the case if

277

tzG(z,ti,t) =c(z,t)∂z2G(z,ti,t) +∂zc(z,t)∂zG(z,ti,t), or, equivalently, if

278

z

tG(z,ti,t)

=∂z

c(z,t)∂zG(z,ti,t) .

(16)

This is an identity becauseG(z,ti,t) satisfies Eq. (5.5).

279

Next, we verify that the constantsCk(t)can be determined such that initial con-

280

ditions (5.3) are satisfied. This is indeed the case if we take

281

Ck(t) =

k−1

j=2

jλ(tj).

Introducing the functionH(z,s,t)and usingt0=t1=tccomplete the proof of the

282

lemma.

283

Next, we use Eq. (5.7) to derive an explicit expression for the likelihood (3.5) of

284

Claim 3.1. It will be useful to have explicit expressions for derivatives of the function

285

H(z,s,t). It follows from Eq. (5.8) that

286

1 a!∂za

Hb(z,tj,t)

=

a+2b−1 a

Hb(z,tj,t)

η(tj,t) 1−zη(tj,t)

a

, (5.10)

whereaandbare positive integers.

287

To evaluate the numerator of Eq. (3.5), we have to extractQkmpp(tp)from the gen- erating functionFkp(z,tp). Using Leibniz’ rule,

Qkmpp(tp) = 1 mp!∂zmp

Fkp(z,tp)

z=0

=Ckp(t) mp! ∂zmp

"k

p−1

j=0

H(z,tj,tp)

#

z=0

=Ckp(t) mp!

m|mp

mp m0,m1, . . . ,mkp−1

kp−1

j=0

zmj[H(z,tj,tp)]z=0

=Ckp(t) mp!

m|mp

mp!

imi!

kp−1

j=0

(mj+1)!

H(z,tj,tp)

η(tj,tp) 1−zη(tj,tp)

mj

z=0

=Ckp(t)

kp−1

j=0

H(0,tj,tp)

m|mp

1

kp

−1 i=0 mi!

kp−1

j=0

(mj+1)!ηmj(tj,tp)

=

kp−1

j=2

jλ(tj)

kp−1

j=0

(1−ξ(tj,tp))(1−η(tj,tp))

m|mp kp−1

j=0

(mj+1)ηmj(tj,tp)

= (kp−1)!(1−ξ(tc,tp))2(1−η(tc,tp))2

kp−1

j=2

λ(tj) (1−ξ(tj,tp))(1−η(tj,tp))

m|m

p

kp−1

j=0

(mj+1)ηmj(tj,tp). (5.11)

To evaluate the denominator of Eq. (3.5), we have to extractQk=2m (tp)from the

288

generating function,

289

Qk=2m (tp) = 1 m!∂zm

F2(z,tp)

z=0= 1 m!∂zm

H2(z,tc,tp)

z=0.

(17)

Substituting into Eq. (3.6) and using Eq. (5.10), we get Pc(tc,tp) =

m=0

6 (m+2)(m+3)

1 m!∂zm

H2(z,tc,tp)

z=0

=H2(0,tc,tp)

m=0

(m+1)ηm(tc,tp)

= (1−ξ(tc,tp))2. (5.12)

Finally, substituting Eqs. (5.11) and (5.12) into the likelihood formula (3.5) of Claim 3.1,

Lkp,t,mp=(kp−1)!

kp+mp mp

(1−η(t1,tp))2

kp−1

j=2

λ(tj)(1−ξ(tj,tp))(1−η(tj,tp))

m|m

p

kp−1

j=0

(mj+1)ηmj(tj,tp), (5.13)

which is identical to likelihood formula (4.1). This concludes the proof of the theo-

290

rem.

291

6 A note on sampling a fraction of extant species

292

Nee et al. [9] noted that one way to model the sampling of extant species is equivalent

293

to a mass extinction just before the present. This sampling model corresponds to

294

sampling each extanct species with a given probability fp, which has also been called

295

ρ-sampling [7]. We use the link with mass extinction to extend the previous formula

296

forn-sampling to the case ofρ-sampling.

297

First, we formulate theρ-sampling version of Claim 3.1.

298

Claim 6.1 Consider the diversity-dependent diversification model, given by specia-

299

tion ratesλn(t)and extinction ratesµn(t). The diversification process starts at crown

300

age tc with two ancestor species, and ends at the present time tp, at which extant

301

species are sampled with probability fp. Then, the likelihood of a phylogenetic tree

302

with kptips and branching timest, conditional on the event that both crown lineages

303

survive until the present, is equal to

304

Lkp,t=Ps(tc,t,tp,fp)

Pc(tc,tp) . (6.1)

The term Ps(tc,t,tp,fp)in the numerator, where the subscript s stands for sampling,

305

is equal to

306

Ps(tc,t,tp,fp) =

m=0

fpkp(1−fp)mQkmp(tp), (6.2)

(18)

where Qkmp(tp)is obtained from Eq. (3.1). The term Pc(tc,tp)in the denominator, where

307

the subscript c stands for conditioning, is equal to

308

Pc(tc,tp) =

m=0

6

(m+2)(m+3)Qk=2m (tp), (6.3) where Qk=2m (tp)is again obtained from Eq. (3.1).

309

Next, we establish as a reference the likelihood formula forρ-sampling in the

310

diversity-independent case.

311

Proposition 6.1 Consider the diversity-independent diversification model, given by

312

speciation ratesλ(t)and extinction ratesµ(t). The diversification process starts at

313

crown age tc with two ancestor species, and ends at the present time tp, at which

314

extant species are sampled with probability fp. Then, the likelihood of a phylogenetic

315

tree with kp tips and branching times t, conditional on the event that both crown

316

lineages survive until the present, is equal to

317

L(div-indep)

kp,t = (kp−1)!(1−ηe(tc,tp))2

kp−1

i=2

λ(ti)(1−ξe(ti,tp))(1−η(te i,tp)). (6.4) The functionsξe(t,tp)andη(t,te p)are given by

ξe(t,tp) =1− fp

α(t,tp) +fpRttpλ(s)α(t,s)ds

=1− fp

fp+ (1−fp)α(t,tp) +fpRtp

t µ(s)α(t,s)ds (6.5) η(t,te p) =1− α(t,tp)

α(t,tp) +fpRtp

t λ(s)α(t,s)ds

=1− α(t,tp)

fp+ (1−fp)α(t,tp) +fpRtp

t µ(s)α(t,s)ds, (6.6) with

318

α(t,s) =expZ s

t

(µ(s0)−λ(s0))ds0 .

Proof We use the equivalence betweenρ-sampling and a mass extinction, see

319

Ref. [9], Eq. (31). We introduce a modified extinction rate µ(t)containing a delta

320

function just before the present,

321

eµ(t) =µ(t)−lnfpδ(t−tp). (6.7) The likelihood formula is then obtained by settingmp=0 in Eq. (4.1), while evaluat- ing the functionsξ(t,tp)andη(t,tp)with the modified extinction rateeµ(t,tp). This establishes Eq. (6.4); it remains to be proven that the modified functionsξe(t,tp)and

Références

Documents relatifs

Faith healers may succeed in offering positive support for their urban clients, but they are not usually well versed in the nature of modern psy- chosocial disorders. As a result,

On the same cover sheet, please put your name, student number, and name of the degree (Maths/TP/TSM), and staple all the sheets together?. (Failure to do that may result

Among these models, intestinal perfusion is the most common experiment used to study the in vivo drug permeability and intestinal metabolism in different regions of the

F or example, let's onsider again the task of representing all numbers from 0 through deimal 999,999: in base 10 this obviously requires a width of six digits, so that.. rw = 60

Then we can conclude: in the worst case the method of undetermined coefficients gives an exponential number, in d, of reducible Darboux polynomials.... The problem comes from

In this paper, we adopt a global (in space) point of view, and aim at obtaining uniform estimates, with respect to initial data and to the blow-up points, which improve the results

Values near 0 indicate no spatial autocorrelation (no spatial pattern - random spatial distribution) and values in the interval (0, 1) indicate positive spatial

A second scheme is associated with a decentered shock-capturing–type space discretization: the II scheme for the viscous linearized Euler–Poisson (LVEP) system (see section 3.3)..