HAL Id: hal-00894201
https://hal.archives-ouvertes.fr/hal-00894201
Submitted on 1 Jan 1998
An overview of the Weitzman approach to diversity
Caroline Thaon d’Arnoldi, Jean-Louis Foulley, Louis Ollivier
Original article
Station de génétique quantitative et appliquée, Institut national de la recherche agronomique, 78352 Jouy-en-Josas cedex, France
(Received 10 July 1997; accepted 30 January 1998)
Abstract - The diversity of a set of breeds or species is defined in the Weitzman approach by a recursion formula using the pairwise genetic distances between the elements of the set. The algorithm for computing the diversity function of Weitzman is described. It also provides a taxonomy of the set which is interpreted as the maximum likelihood phylogeny. The theory is illustrated by an application to 19 European cattle breeds. The possible uses of the method for defining optimal conservation strategies are briefly discussed. © Inra/Elsevier, Paris
diversity / taxonomy / conservation / phylogeny / genetic distance
Résumé - An overview of diversity according to Weitzman's approach. The diversity of a set of species, or breeds, is defined recursively by Weitzman; the starting data are the pairwise genetic distances between the elements of the set. The algorithm for computing diversity yields, as an intermediate result, a classification tree of the species present, which is interpreted as a maximum likelihood phylogeny. The theory is illustrated by an example of application to 19 European cattle breeds, and the possible uses of the method for defining optimal conservation strategies are briefly discussed. © Inra/Elsevier, Paris
1. INTRODUCTION
The question of preserving biological diversity is currently attracting a great deal of attention. Choices are necessary when it comes to deciding which endangered species must be protected and which not. Conserving breeds of farm animals, or domestic animal diversity, presents strong analogies with the more general question of preserving biological diversity. In both cases, owing to the limited resources which can be devoted to conservation, the central question is 'what to preserve' [6].
The choices are difficult and it would be much easier if an operational theoretical framework based on this concept of 'diversity' were available. As noted by Solow et al. [5], this concept of diversity itself appears not to have been precisely defined so far, apart from a few attempts which can be traced back to May [3].
An analytical framework able to guide actual conservation policy in a diversity-improving direction through the use of a diversity function has been provided by Weitzman, an economist, who has given an example of application to the problem of crane species conservation [8-10]. Since his theory is recent and almost unknown to animal geneticists (see, however, Cunningham [1] and Ollivier [4]), and as it has not yet been used in the context of livestock breed diversity, we found it useful to describe it briefly and, as an illustration, to apply it to a set of cattle breeds.
2. THEORY
The method applies to 'elements' which may represent species, breeds, subspecies or any other operational taxonomic unit. Pairwise distances between elements are given, presenting the basic properties of positivity, symmetry and nil distance of an element to itself. The theory is concerned with diversity between units; it ignores diversity due to variation within units.
2.1. Computing diversity
Computing diversities is straightforward if one knows how much the addition of one element, say $j$, increases the diversity of a given set $Q$. Intuitively, the magnitude of the gain should be related to how different the new element is from the set $Q$; the more different $j$ is from $Q$, the greater the gain. This difference is measured by the distance $d(j, Q)$. Here, the distance from a point $j$ to a set $Q$ is defined, as usual in set theory, by $\min_{i \in Q} d(i, j)$, in other words, the distance between $j$ and its closest neighbour in $Q$. More precisely, the intuitive property of the diversity function (which will be called $V$ from now on) is the 'monotonicity in species': the gain of one element increases the diversity by at least $d(j, Q)$:
$$V(Q \cup j) \ge V(Q) + d(j, Q) \qquad (1)$$
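As a minimal sketch of the point-to-set distance $d(j, Q)$ defined above, the following Python fragment uses a small, purely hypothetical distance table on four elements A to D. The names `PAIRS`, `D`, `d` and `dist_to_set` are illustrative assumptions and do not come from Weitzman's papers.

```python
# Hypothetical pairwise distances between four elements (illustrative values only).
PAIRS = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 9,
         ("B", "C"): 4, ("B", "D"): 8, ("C", "D"): 7}
D = {frozenset(pair): value for pair, value in PAIRS.items()}


def d(i, j):
    """Symmetric pairwise distance, nil between an element and itself."""
    return 0 if i == j else D[frozenset((i, j))]


def dist_to_set(j, Q):
    """d(j, Q): distance between j and its closest neighbour in Q."""
    return min(d(i, j) for i in Q)


print(dist_to_set("D", {"A", "B", "C"}))  # 7, the distance from D to C
```

Storing each pair under a `frozenset` key is one simple way to get symmetry for free; a square matrix would serve equally well.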
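However, this is too loose a property to define a unique function. In fact, we will consider (1) as general conditions to satisfy for any member $i$ withdrawn from the whole set $S$, i.e.
$$V(S) \ge V(S \setminus i) + d(i, S \setminus i), \quad \forall i \in S \qquad (2)$$
where $\setminus$ is the complement set symbol, i.e. here $S \setminus i$ stands for $S$ without $i$. Let $V'_i$ be defined as $V'_i = V(S \setminus i) + d(i, S \setminus i)$. For a given set $S$, the value of $V'_i$ depends on the element $i$ withdrawn. If such a condition holds for the largest $V'_i$, it will also be true for all the other ones since:
$$\max_{k \in S} V'_k \ge V'_i, \quad \forall i \in S$$
According to (2), all the functions having larger values than $\max_{i \in S} V'_i$ also meet the criterion; to make the definition of $V(S)$ unique, it will be restricted to the lowest one (minimum of $V$), i.e. precisely to that equal to $\max_{i \in S} V'_i$. This leads to the recursive definition of the Weitzman diversity function as:
$$V(S) = \max_{i \in S} \left[ V(S \setminus i) + d(i, S \setminus i) \right] \qquad (4)$$
with the initial conditions
$$V(i) = K, \quad \forall i.$$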
The value of $K$ is taken by Weitzman [8, 9] as a normalizing constant which computationally can be set to zero. Equation (4) provides a unique function having some interesting properties:
- the 'twin property': the addition of an element which is identical to an element of $S$ does not increase $V$;
- the monotonicity in species [see (1)];
- the continuity in distances: if the pairwise distances in set $S$ are slightly modified, the modification of diversity is slight too;
- the monotonicity in distances: if every pairwise distance in set $S$ is increased, the diversity of $S$ increases too.
These properties are fundamental. They have the merit of removing ambiguity and of laying down the definition of diversity on simple and rigorous principles. In particular, the property of continuity in distances is of critical importance for any utilization of the results, given that there is some uncertainty on the real values of the pairwise distances.
2.2. The fundamental representation theorem
The dynamic programming recursion of equation (4) involves $n!$ calculations, $n$ being the number of elements. Fortunately, the following property allows us to reduce this computation to $2^n$ calculations. The dynamic programming recursion produces, as a secondary result, a graphical representation of the relations between the elements.
2.2.1. Link property
By definition, and as shown previously, there exists an element $i$ in any set $S$ for which the maximum of equation (4) is achieved:
$$V(S) = V(S \setminus i) + d(i, S \setminus i) \qquad (5)$$
Weitzman has shown that the element $i$ in $d(i, S \setminus i)$ is one of the two closest elements in $S$; it is the element of $S$ the loss of which involves a minimal reduction of diversity, equal to $d(i, S \setminus i)$. This element is called the link.
2.2.2. Theorem
Having identified such a pair $(i, j)$, how will we know which one is the link? Remember from (3) that $V(S) = \max(V'_i, V'_j)$. Now $V'_i = d(i, j) + V(S \setminus i)$, and $V'_j = d(i, j) + V(S \setminus j)$, so that the link is the element achieving $\max\{V(S \setminus i), V(S \setminus j)\}$. The dynamic programming recursion becomes:
$$V(S) = d(g, h) + \max\left[V(S \setminus g), V(S \setminus h)\right] \qquad (6)$$
where, using Weitzman's notations, the element $g(S)$, achieving $\max[V(S \setminus g), V(S \setminus h)]$, is called the link, and the other one, $h(S)$, is the representative. A proof of the theorem can easily be written by mathematical induction with respect to the size of the set $S$.
2.2.3. Algorithm and graphical representation by a taxonomic tree
Applying equation (6) recursively generates a rooted directed tree whose twig-tips are the elements of the set $S$ and whose nodes are the unknown 'ancestors'. The different steps of the algorithm to be applied recursively are (beginning with the value of diversity set to zero):
i) find the two closest neighbours $i$ and $j$ among the elements of $S$ and add $d(i, j)$ to diversity;
ii) determine the link $g$ and the representative $h$ by using the property:
$$V(S \setminus g) \ge V(S \setminus h);$$
iii) given $V(S) = d(g, h) + V(S \setminus g)$, consider a new set without the link $g$, i.e. $S \setminus g$;
iv) return to i) until the size of the current set reaches 1; then add the constant $K$ defined in (4) to diversity and stop.
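Steps i) to iv) can be sketched as a short recursive function. This is an illustrative implementation under stated assumptions, not Weitzman's own code: the distance table is hypothetical, ties between the two candidate links are broken arbitrarily in favour of the first element, and the function name `weitzman` is ours.

```python
from itertools import combinations

# Hypothetical pairwise distances between four elements (illustrative values only).
PAIRS = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 9,
         ("B", "C"): 4, ("B", "D"): 8, ("C", "D"): 7}
D = {frozenset(pair): value for pair, value in PAIRS.items()}


def d(i, j):
    """Symmetric pairwise distance, nil between an element and itself."""
    return 0 if i == j else D[frozenset((i, j))]


def weitzman(S, K=0):
    """Return (V(S), steps); each step is (link, representative, distance)."""
    S = frozenset(S)
    if len(S) == 1:
        return K, []
    # Step i: the two closest neighbours of the current set.
    i, j = min(combinations(sorted(S), 2), key=lambda pair: d(*pair))
    # Step ii: the link is the element whose removal leaves the larger diversity.
    v_i, steps_i = weitzman(S - {i}, K)
    v_j, steps_j = weitzman(S - {j}, K)
    if v_i >= v_j:
        g, h, v, rest = i, j, v_i, steps_i
    else:
        g, h, v, rest = j, i, v_j, steps_j
    # Steps iii and iv: V(S) = d(g, h) + V(S without g), then continue without g.
    return d(i, j) + v, [(g, h, d(i, j))] + rest


value, steps = weitzman("ABCD")
print(value)   # 16
print(steps)   # links removed in turn: B, then C, then A
```

On this toy table the result agrees with the exhaustive recursion (4), while only the two closest neighbours are examined at each step.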
While drawing the tree, it is useful to place the link $g$ between the representative $h$ and the closest neighbour of $h$ in $Q \setminus g$, $Q$ being the subset whose diversity is computed at this step. Intuitively, it means that the loss of the link is less consequential for the diversity than the loss of any other element. It presents the advantage of allowing only one symmetry among the possible representations of the tree, while most hierarchical clustering methods result in a number of possible representations by rotation of the branches. The diversity of the set $S$ can be read on the tree as the sum of the branch lengths, or the sum of the ancestor ordinates. Weitzman also showed that the particular tree generated by the dynamic recursion algorithm in (6) and steps i-iv can be interpreted as the tree maximizing the likelihood (see Appendix).
2.2.4. Example
Let us consider a set of four primate species. Pairwise distances are given in the following matrix (data are provided by Weitzman [9]):
[...]
The closest neighbours to be found in the set {Go, Or, HyL, HyS} are HyL and HyS.
V{Go, Or, HyL, HyS} = max [V{Go, Or, HyL}, V{Go, Or, HyS}] + d(HyS, HyL)
Now we need to know which element is the link in the couple (HyL, HyS). The following matrices contain pairwise distances for the subsets {Go, Or, HyL} and {Go, Or, HyS}:
[...]
V{Go, Or, HyL} = d(Go, Or) + max [V{Go, HyL}, V{Or, HyL}]
= d(Go, Or) + d(Go, HyL) (so Or is the link element in {Go, Or, HyL})
= 889
V{Go, Or, HyS} = d(Go, Or) + max [V{Or, HyS}, V{Go, HyS}]
= d(Go, Or) + d(Go, HyS) (so Or is the link element in {Go, Or, HyS})
= 855
V{Go, Or, HyL} > V{Go, Or, HyS}, thus we have determined that the link element in the couple (HyL, HyS) is HyS, and consequently the representative is HyL. Considering the remaining set after the suppression of the link element, i.e. {Go, Or, HyL}, we already know that Or is the link element. This information then makes it possible to compute the total diversity, which is worth 1015 = d(Go, HyL) + d(Go, Or) + d(HyL, HyS), and to draw the corresponding taxonomic tree (figure 1). The link HyS in {Go, Or, HyL, HyS} is placed between the representative HyL and the closest neighbour Or of HyL in {Go, Or, HyL}. The link Or in {Go, Or, HyL} is then placed between the representative Go and the closest neighbour HyL of Go in {Go, HyL}, resulting in a final order of Go, Or, HyS, HyL.
3. APPLICATION: EXAMPLE OF EUROPEAN CATTLE BREEDS
3.1. Evaluation of diversity
The Weitzman method has been applied to data collected by F. Grosclaude [2] on biochemical polymorphisms (11 blood group loci and the locus of blood serum transferrin and that of beta-casein) of 19 European cattle breeds, including 18 French breeds and the British Shorthorn. The latter was included because its Durham ancestor was introduced in some French regions during the last century. The authors calculated the Nei standard distances considering the 13 polymorphic loci (table I). Results of the different steps of the computations of diversity are shown in table II.
The graphical representation of the result is shown in figure 2. A clear discrimination is observed between two groups, i.e. i) a first group made of Northern dairy breeds (Frisonne, Flamande, Maine Anjou, Shorthorn) and ii) another group involving beef and hardy breeds of the Center and West part of France (Salers, Aubrac, Limousine, Charolais, Ferrandaise, Blonde d'Aquitaine) as well as Western and Eastern dual purpose breeds (e.g. Pie Rouge, Abondance, Tarentaise, Brune des [...]
Current population sizes in some of those breeds are so restricted that they are said to be endangered: e.g. Bretonne Pie Noire, Ferrandaise, Vosgienne or the Shorthorn.
The Weitzman method allows us to quantify the loss of diversity caused by the extinction of any subset among the 19 original breeds. By looking at the tree it is evident that the extinction of the Shorthorn causes a much greater loss of diversity than the extinction of the Flamande, whose distance from its closest neighbour, the Frisonne Pie Noire, is quite small. By computing the diversities of the initial set of breeds and the set minus the Flamande, or the Shorthorn, or both the Flamande and the Shorthorn, one finds that the loss of the set Flamande + Shorthorn induces a reduction of diversity equal to the sum of the reductions caused by the loss of each of these breeds. This property of additivity is related to the degree of 'independence' between the two breeds. On [...]
The loss of diversity caused by the extinction of a set of breeds can be estimated by the sum of the ordinates of the nodes that would disappear from the tree if the extinct breeds were to be removed, without any other change. Thus, just by looking at the tree, it is obvious that the loss of the Normande would decrease the diversity eight or nine times more than the loss of the Blonde d'Aquitaine, and even more [...]
3.2. Further considerations on conservation strategies
The algorithm may be applied to evaluate the relative merit of breeds with small or medium population sizes regarding diversity. Let us consider the whole set (say $Q$) of the 18 French cattle breeds analysed in this study, and that (say $L$) of the six largest dairy (Française Frisonne, Montbéliarde and Normande) and beef breeds (Blonde d'Aquitaine, Charolaise and Limousine). The relative loss due to keeping those six breeds only is 57.2%.
Now one may ask which is the most interesting breed to select among the rest if any of them has to be preserved. This can be evaluated by considering the relative loss of diversity between $Q$ and $L$ plus each of those 12 breeds. Results based on Nei and Cavalli-Sforza distances are the following:
[...]
The breed providing the lowest loss of diversity is the Salers breed followed by the Aubrac. The ranking is consistent across the two distances used. Although this is only an illustration which would deserve further analysis including additional markers, this example is a significant one as those breeds have long been recognized as key hardy breeds [7].
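The comparison made in this section can be sketched in a few lines. The sketch assumes that the relative loss is measured as $[V(Q) - V(L)]/V(Q)$, which is our reading of the text rather than a formula stated by the authors, and it runs on a hypothetical four-element distance table, not on the cattle data.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical pairwise distances between four elements (illustrative values only).
PAIRS = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 9,
         ("B", "C"): 4, ("B", "D"): 8, ("C", "D"): 7}
D = {frozenset(pair): value for pair, value in PAIRS.items()}


def d(i, j):
    """Symmetric pairwise distance, nil between an element and itself."""
    return 0 if i == j else D[frozenset((i, j))]


@lru_cache(maxsize=None)
def V(S):
    """Weitzman diversity of a frozenset S via recursion (6), with K = 0."""
    if len(S) < 2:
        return 0
    i, j = min(combinations(sorted(S), 2), key=lambda pair: d(*pair))
    return d(i, j) + max(V(S - {i}), V(S - {j}))


def relative_loss(whole, kept):
    """Assumed definition: share of V(whole) lost when only `kept` remains."""
    return 1 - V(frozenset(kept)) / V(frozenset(whole))


Q = {"A", "B", "C", "D"}   # whole set
L = {"A", "B"}             # elements kept in any case
# Rank each remaining element by the loss left after adding it to L.
ranking = sorted((relative_loss(Q, L | {b}), b) for b in Q - L)
for loss, b in ranking:
    print(f"L + {b}: relative loss = {loss:.4f}")
```

The element at the top of `ranking` is the one whose addition to $L$ leaves the smallest relative loss, which is the criterion used to single out breeds in the text.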
4. DISCUSSION AND CONCLUSION
The method presented provides several results with different degrees of robustness and different potential applications.
As indicated above, the value of diversity possesses a useful property of continuity in distances. The results may be considered as relevant to support decisions affecting the breeds or species to be preserved. The choice would be based only on objective computations, without relying on such subjective characteristics as beauty, interest for future or present generations or any other intrinsic criterion. Experience has shown that it is difficult to base priorities on such criteria.
The Weitzman approach to diversity allows further developments. Weitzman [10] suggests defining a diversity expected after a given period of time, based on the extinction probability of each element of the set considered. If $n$ elements are endangered, $2^n$ survival-extinction patterns may occur with given probabilities, and for each pattern the resulting diversity may be calculated. Weitzman then defines a 'marginal diversity' of each element, obtained as the partial derivative of the expected diversity with respect to the extinction probability of this element. The marginal diversity of breed $i$ measures the relative gain in expected diversity (after 50 years, say) from improving the survival probability of breed $i$. In a similar fashion, one could assume that the extinction of a breed can be completely avoided by using cryopreservation and calculate the gain in expected diversity obtained. [...] and the risk status of a given set of endangered breeds as expressed through their respective probabilities of extinction, an order of priority for a cryopreservation programme could thus be established.
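Expected and marginal diversity can be illustrated with a short sketch. It assumes that extinctions are independent events and that an empty or single-element set has zero diversity; both are assumptions of this illustration, as are the function names and the hypothetical distances and extinction probabilities.

```python
from functools import lru_cache
from itertools import combinations, product

# Hypothetical pairwise distances between four elements (illustrative values only).
PAIRS = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 9,
         ("B", "C"): 4, ("B", "D"): 8, ("C", "D"): 7}
D = {frozenset(pair): value for pair, value in PAIRS.items()}


def d(i, j):
    """Symmetric pairwise distance, nil between an element and itself."""
    return 0 if i == j else D[frozenset((i, j))]


@lru_cache(maxsize=None)
def V(S):
    """Weitzman diversity of a frozenset S via recursion (6), with K = 0."""
    if len(S) < 2:
        return 0
    i, j = min(combinations(sorted(S), 2), key=lambda pair: d(*pair))
    return d(i, j) + max(V(S - {i}), V(S - {j}))


def expected_diversity(extinction):
    """Probability-weighted sum of V over all survival-extinction patterns.

    `extinction` maps each element to its extinction probability;
    extinctions are assumed independent (an assumption of this sketch).
    """
    elements = sorted(extinction)
    total = 0.0
    for pattern in product((True, False), repeat=len(elements)):
        prob = 1.0
        for e, survives in zip(elements, pattern):
            prob *= (1 - extinction[e]) if survives else extinction[e]
        survivors = frozenset(e for e, s in zip(elements, pattern) if s)
        total += prob * V(survivors)
    return total


def marginal_diversity(extinction, i):
    """Derivative of expected diversity w.r.t. the extinction probability of i.

    Expected diversity is linear in each probability, so the derivative is the
    difference between the 'i extinct' and 'i safe' expectations.
    """
    extinct = expected_diversity({**extinction, i: 1.0})
    safe = expected_diversity({**extinction, i: 0.0})
    return extinct - safe


risks = {"A": 0.0, "B": 0.5, "C": 0.0, "D": 0.5}  # hypothetical probabilities
print(expected_diversity(risks))        # 10.5
print(marginal_diversity(risks, "D"))   # -9.0
print(marginal_diversity(risks, "B"))   # -2.0
```

The derivative is negative because a higher extinction probability lowers expected diversity; its magnitude is the gain obtained by securing the element, so D would come before B in this toy ranking.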
Because diversity is computed recursively, it involves very long calculations when the size $n$ of the set is larger than 25. The approximation proposed in this study relies on a random choice of the link at each stage of the recursive algorithm, i.e. on sampling trees among the $2^{n-1}$ possible trees. The procedure can be applied as follows:
i) compute $V$ among the elements of $S$ by choosing at each step the link not from the formula in (6), but at random out of the pair of closest neighbours,
ii) repeat i) $m$ times such as to generate $m$ different values of $V$,
iii) take as the estimated value of $V(S)$ the maximum value of $V$ over all values computed.
This can be performed by choosing at random $m$ integers smaller than $2^{n-1}$, converting them into their binary expression and using the convention that the link will be the first element if the value is 0 and the second if it is 1. This procedure was tested on a set of 29 cattle breeds using data from Moazami-Goudarzi (pers. comm.). For $m$ = 10 000, the estimated value of $V$ was at least 13 200 as compared to a real value of 13 722, i.e. a bias lower than 4%. This approximation is quite good considering the computation time required by this estimation (20 min), while the complete algorithm needed more than 8 days.
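The sampling procedure described above can be sketched as follows. The bit convention (bit $k$ of the random integer selects the link at step $k$, with the closest pair listed in alphabetical order) is one possible reading of the text, and the distance table, the function names and the use of a seeded generator are assumptions of this illustration; $K$ is taken as zero.

```python
import random
from itertools import combinations

# Hypothetical pairwise distances between four elements (illustrative values only).
PAIRS = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 9,
         ("B", "C"): 4, ("B", "D"): 8, ("C", "D"): 7}
D = {frozenset(pair): value for pair, value in PAIRS.items()}


def d(i, j):
    """Symmetric pairwise distance, nil between an element and itself."""
    return 0 if i == j else D[frozenset((i, j))]


def diversity_for_code(S, code):
    """Sum of closest-pair distances when the links are dictated by `code`.

    Bit k of `code` picks the link at step k: 0 for the first element of the
    closest pair, 1 for the second.
    """
    S = set(S)
    total, step = 0, 0
    while len(S) > 1:
        i, j = min(combinations(sorted(S), 2), key=lambda pair: d(*pair))
        total += d(i, j)
        S.remove(j if (code >> step) & 1 else i)
        step += 1
    return total


def approximate_diversity(S, m, seed=0):
    """Largest value over m randomly sampled trees; never exceeds V(S)."""
    rng = random.Random(seed)
    n = len(S)
    return max(diversity_for_code(S, rng.randrange(2 ** (n - 1)))
               for _ in range(m))


# With n = 4 there are only 2**3 = 8 codes, so all of them can be listed.
print(max(diversity_for_code("ABCD", code) for code in range(8)))  # 16
print(approximate_diversity("ABCD", m=5) <= 16)                     # True
```

Each sampled code yields a value that cannot exceed $V(S)$, since recursion (6) takes the maximum over exactly these choices; this is why the largest sampled value is retained as the estimate.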
On the other hand, the graphical representation might be sensitive to slight modifications of the distance matrix if the values of diversity are close for certain subsets. Simulation procedures to evaluate the robustness of clades have been proposed by Weitzman [8]. Although the clustering power looks satisfying on the examples we considered, any phylogenetic interpretation of the results should be used with caution. It should also be emphasized that the use made of genetic distances in this approach differs from their use in deriving genealogical trees. Though trees are useful geometric representations of diversity (the diversity function defined above is indeed equal to the total branch length of the corresponding tree), they must be considered as telling the evolutionary story that best fits the diversity observed, but not necessarily as telling the 'true' story. In fact, as emphasized by Weitzman [9], there is no need for the elements to have been generated by any real evolutionary phylogeny. This has to be kept in mind particularly when sets of domestic breeds are considered. Given the exchanges known to have occurred in their past histories, domestic breeds are indeed not likely to have resulted from a strict tree-like branching process. Whereas taxonomists are essentially interested in finding the evolutionary story behind a given observed diversity, conservationists, especially breed conservationists, do not need that type of information as they are more concerned with the future evolution of diversity.
The main use of the Weitzman method is to determine preservation strategies. It supposes, however, that the elements of the set considered are and remain distinct. If this constraint can be removed, it may be suggested that certain endangered breeds be amalgamated with other ones. The population size would increase, no additional costs would be incurred, and the direct loss of alleles that results from an extinction could be avoided. Of course, this implies that the breed standards should be relaxed for a while, but it is a dynamic conception of preservation that [...]
Despite the criticisms which can be raised against the Weitzman approach, including that it ignores the differences in within-unit variation, it should be kept in mind that it does satisfy certain basic properties which do not always hold with traditional criteria. The principle (1) of 'monotonicity in species' means that the change in diversity $V(S \setminus i) - V(S)$ due to the loss of some population $i$ is always negative or nil (for $i$ being a twin element). In contrast, this property does not apply to variance, for it can be easily shown that the total variance of a mixture of populations can increase after some of them are deleted.
ACKNOWLEDGMENTS
This work was conducted while Caroline Thaon was on a 'stage de fin d'études' at the Station de génétique quantitative et appliquée (SGQA), Inra, Jouy-en-Josas, as a student from the Ecole Polytechnique, Palaiseau. She gratefully acknowledges the support of both institutions in making this stay feasible. Special thanks are expressed to F. Grosclaude and K. Moazami-Goudarzi (Laboratoire de génétique biochimique, Jouy-en-Josas) for providing the data on cattle analysed in this study. We are also grateful to C. Dillmann and P. Dubreuil (Inra, Station de génétique végétale, Le Moulon) and S. Lemarié (Inra-SERD, Grenoble) for having provided additional test examples, and to Bruce Southey and an anonymous referee for their valuable comments which helped to improve the manuscript. E. Thompson is also thanked for her English revision of the text.
REFERENCES
[1] Cunningham P., Genetic diversity in domestic animals: strategies for conservation and development, in: Miller R.H., Pursel V.G., Norman H.D. (Eds.), XX Biotechnology's Role in the Genetic Improvement of Farm Animals, American Society of Animal Science, Savoy, IL, USA, 1996, pp. 13-23.
[2] Grosclaude F., Aupetit R.Y., Lefebvre J., Mériaux J.C., Essai d'analyse des relations génétiques entre les races bovines françaises à l'aide du polymorphisme biochimique, Genet. Sel. Evol. 22 (1990) 317-338.
[3] May R.M., Taxonomy as destiny, Nature 347 (1990) 129-130.
[4] Ollivier L., Génétique et conservation animales, in: Matassino D., Boyazoglu J., Capuccio A. (Eds.), International Symposium on Mediterranean Animal Germplasm and Future Human Challenges, EAAP publication no. 85, Wageningen Pers, Wageningen, 1997, pp. 211-219.
[5] Solow A., Polasky S., Broadus J., On the measurement of biological diversity, J. Environ. Econom. Manag. 24 (1993) 60-68.
[6] Vane-Wright R.I., Humphries C.J., Williams P.H., What to protect? Systematics and the agony of choice, Biol. Cons. 55 (1991) 235-254.
[7] Vissac B., Etude génétique de la race d'Aubrac, in: L'Aubrac, CNRS, Paris, I, 1970, pp. 29-102.
[8] Weitzman M., A reduced form approach to maximum likelihood estimation of evolutionary trees, Harvard Institute of Economic Research, Paper No. 1569, 1991.
[9] Weitzman M., On diversity, Quart. J. Econ. 107 (1992) 363-405.
[10] Weitzman M., What to preserve? An application of diversity theory to crane [...]
APPENDIX: the maximum likelihood tree
Weitzman [8] provides the following phylogenetic interpretation. Let us note $p(i, j)$ the conditional probability $P(i \mid j)$ that a species $i$ exists given that a species $j$ exists. Assume that this probability is a function of the genetic distance between $i$ and $j$. The hypothesis underlying this assumption is that the distance $d(i, j)$ between two species $i$ and $j$ measures the time since their separation. More precisely, we will suppose that $p(i, j) = \exp[-\lambda d(i, j)]$ where $\lambda$ is a 'universal extinction rate'.
The maximum likelihood tree is the evolution scheme (i.e. the set of unknown ancestors) which maximizes the probability that every element of $S$ exists at the current time.
Let $P(j \mid i)$ be the conditional probability that species $j$ exists given $i$ exists. Assuming that the evolution scheme is known, it can be shown that, for any subset $Q \subset S$, and $j \in S \setminus Q$, the conditional probability $P(j \mid Q)$ that $j$ survived given $Q$ exists satisfies
[...] (A.1)
Note $p(j, Q) = \max_{i \in Q} P(j \mid i)$. Now, from basic probability theory, $P(j \mid Q) = P(Q \cup j)/P(Q)$, and combining this with (A.1) leads to:
[...] (A.2)
Let us note $\Pi^*(S)$ the largest probability that $S$ exists, i.e. the probability of existence under the most favourable evolution scheme. Equation (A.2) applied for $Q = S \setminus i$ and $j = i$ implies
[...] (A.3)
Any evolution scheme that would induce a value of $P(S) = \Pi^*$ would be identified as the scheme under which the probability that $S$ exists is maximal, i.e. the maximum likelihood tree. Taking the logarithm of equation (A.3) and normalizing $\lambda$ to 1, it becomes:
[...]
Since