HAL Id: hal-02702484
https://hal.inrae.fr/hal-02702484
Submitted on 1 Jun 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Genetic distances: description, limits and future
Jacques Cabaret
To cite this version:
Jacques Cabaret. Genetic distances: description, limits and future. Veterinary Research, BioMed Central, 1994, 25 (6), pp.609-611. ⟨hal-02702484⟩
de biologie moléculaire, hôpital Saint-Antoine, 75012 Paris, France)
Despite major progress in both diagnosis and treatment, Pneumocystis carinii remains a leading opportunistic agent, and many puzzles persist, particularly in epidemiology. The mode of transmission has yet to be elucidated, a question made all the more topical because the nosocomial nature of this infection has been suggested by several authors: Chave et al (1991) and Goesch et al (1990). An answer could come from demonstrating differences between strains. With this in mind, we looked for genomic variation between P carinii isolated from different patients, and we present the preliminary results of this study.
We performed PCR according to the technique described by Wakefield et al (1990) on DNA extracted from 21 bronchoalveolar lavages from patients with pneumocystosis confirmed by the usual diagnostic techniques. After purification of the amplification products, direct sequencing was carried out. The sequences obtained were compared with one another and with those initially described by Sinclair et al (1991) and by Lee et al (1993).
The results obtained indicate that the nucleotide sequences of P carinii differ from one another and from the published sequences.
References
Chave JP, Davis S, Melle GV, Francioli P (1991) Transmission of Pneumocystis carinii from AIDS patients to other immunosuppressed patients: a cluster of Pneumocystis carinii pneumonia in renal transplant recipients. AIDS 5, 927-932
Goesch TR, Gotz G, Stellbrinck KH, Albrecht H, Hossfeld DK (1990) Possible transfer of Pneumocystis carinii between immunodeficient patients. Lancet 336, 627
Lee CH, Lu JJ, Bartlett MS et al (1993) Nucleotide sequence variation in Pneumocystis carinii strains that infect humans. J Clin Microbiol 31, 754-757
Sinclair K, Wakefield AE, Banerji S, Hopkin JM (1991) Pneumocystis carinii organisms derived from rat and human hosts are genetically distinct. Mol Biochem Parasitol 45, 183-184
Wakefield AE, Pixley F, Banerji S et al (1990) Detection of Pneumocystis carinii with DNA amplification. Lancet 336, 451-453
Genetic distances: description, limits and future. J Cabaret (INRA, station de pathologie aviaire et de parasitologie, laboratoire d'écologie des parasites, 37380 Nouzilly, France)
Assessing the resemblance between objects has preoccupied many researchers involved in morphological, ecological or genetic investigations. Legendre and Legendre (1979) described a large array of critical distances used in the field of ecology. Geneticists have also developed their own distances, which were reviewed by de Vienne and Damerval (1985). Since that review, advances in molecular biology and in statistics (resampling methods have become practical thanks to progress in computing capacity) justify re-examining distances. The use of distances as the basic material for phylogenetic reconstruction is now widespread (Darlu and Tassy, 1993), and further investigation of the limits of distances is needed. The present work focuses on current developments in the use of distances.
From the mathematical point of view there are 'good' and 'bad' distances. The 'bad' ones do not allow comparisons between several populations. The 'good' distances obey the triangle inequality: if the distances between 3 taxa or populations (A, B, C) are compared, distance (AC) must be less than or equal to distance (AB) plus distance (BC). Distances used by ecologists, such as Jaccard, Sokal and Sneath, and chi-squared, obey this triangle inequality. The genetic distances of Rogers, Edwards and Gregorius also obey it, whereas the Nei distance does not. The latter, although very widely used in genetic investigations, is therefore the least suited, mathematically, to comparing populations. Euclidean distances derived from multivariate analyses (eg, principal component analysis of phenotypic frequencies between populations) may prove of wider interest when two-by-two independent comparisons are required (Gasnier et al, 1992). The multivariate approach is probably best when the aim is to compare populations on several kinds of data at once (morphological, ecological and genetic) (Hoste and Cabaret, 1992).
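The triangle inequality can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a single locus, three hypothetical populations with invented allele frequencies, the single-locus form of the Rogers distance, and Nei's standard distance (minus the logarithm of the normalized identity) as the 'Nei distance'. The function names are ours and do not come from any cited work.

```python
import math
from itertools import permutations

def rogers_distance(p, q):
    """Rogers distance for one locus: sqrt(0.5 * sum of squared frequency differences)."""
    return math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(p, q)))

def nei_distance(p, q):
    """Nei's standard distance for one locus: -ln(Jxy / sqrt(Jx * Jy))."""
    jxy = sum(a * b for a, b in zip(p, q))
    jx = sum(a * a for a in p)
    jy = sum(b * b for b in q)
    return -math.log(jxy / math.sqrt(jx * jy))

def obeys_triangle_inequality(distance, populations, tol=1e-12):
    """True if d(A, C) <= d(A, B) + d(B, C) for every ordered triple of populations."""
    return all(
        distance(a, c) <= distance(a, b) + distance(b, c) + tol
        for a, b, c in permutations(populations, 3)
    )

# Hypothetical allele frequencies (3 alleles at one locus); not real data.
pop_a = (0.9, 0.1, 0.0)
pop_b = (0.4, 0.4, 0.2)
pop_c = (0.1, 0.9, 0.0)
populations = [pop_a, pop_b, pop_c]

rogers_ok = obeys_triangle_inequality(rogers_distance, populations)
nei_ok = obeys_triangle_inequality(nei_distance, populations)
print("Rogers obeys the triangle inequality:", rogers_ok)
print("Nei obeys the triangle inequality:", nei_ok)
```

With these invented frequencies the Nei distance between A and C exceeds the sum of the distances through the intermediate population B, whereas the Rogers distance, which is a rescaled Euclidean distance, cannot violate the inequality.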
One of the most important drawbacks in the use of distances is the absence of an accurate estimate of their variability. Gregorius (1984), Katz (1986) and Katz and Goux (1986) attempted to estimate the variability of the Gregorius or Nei distances; formal estimates were not obtained and only simulations could assess the extent of variability. Rogers (1991) also assessed the variability of several genetic distances by means of computer simulations. Resampling methods (jackknife and bootstrap) should be applied in the future if genetic distances are to be used further. Genetic distances should also be related to the F statistics of Wright, which are widely used by geneticists.
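As an illustration of how resampling could supply the missing estimate of variability, the sketch below bootstraps a distance between two samples. It is a minimal example under assumptions of ours: one locus, hypothetical samples of allele copies, the single-locus Rogers distance, resampling of allele copies with replacement within each population, and a simple percentile interval. None of these choices is prescribed by the works cited above.

```python
import math
import random

def rogers_distance(p, q):
    """Rogers distance for one locus from two allele-frequency vectors."""
    return math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(p, q)))

def frequencies(sample, alleles):
    """Allele frequencies of a sample, in the order given by `alleles`."""
    n = len(sample)
    return [sample.count(allele) / n for allele in alleles]

def bootstrap_distance(sample_x, sample_y, n_boot=1000, seed=0):
    """Point estimate, bootstrap standard error and 95% percentile interval."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    alleles = sorted(set(sample_x) | set(sample_y))
    estimate = rogers_distance(frequencies(sample_x, alleles),
                               frequencies(sample_y, alleles))
    replicates = []
    for _ in range(n_boot):
        # Resample allele copies with replacement within each population.
        boot_x = rng.choices(sample_x, k=len(sample_x))
        boot_y = rng.choices(sample_y, k=len(sample_y))
        replicates.append(rogers_distance(frequencies(boot_x, alleles),
                                          frequencies(boot_y, alleles)))
    replicates.sort()
    mean = sum(replicates) / n_boot
    std_error = math.sqrt(sum((r - mean) ** 2 for r in replicates) / (n_boot - 1))
    interval = (replicates[int(0.025 * n_boot)], replicates[int(0.975 * n_boot) - 1])
    return estimate, std_error, interval

# Hypothetical samples of 20 allele copies per population; not real data.
sample_x = ["a"] * 14 + ["b"] * 6
sample_y = ["a"] * 7 + ["b"] * 13

estimate, std_error, interval = bootstrap_distance(sample_x, sample_y)
print("distance:", estimate)
print("bootstrap standard error:", std_error)
print("95% percentile interval:", interval)
```

A jackknife version would instead leave out one allele copy (or one locus, with multilocus data) at a time; the choice of resampling unit is itself an assumption that should be stated with the results.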
Distances are always estimated for a particular purpose: identification or phylogenetic construction. In the latter case, patristic distance (the number of changes of state during evolution) is most often used. Genetic distances such as the Rogers, Manhattan and Edwards distances underestimate patristic distances. Conversely, genetic distances overestimate differences between populations, mostly when small samples are studied (Rogers, 1991).
Resemblance-divergence patterns between populations or taxa are established using different approaches: phenetic (general resemblance between taxa) or phylogenetic (evolution of descriptive characters). The grouping of taxa relies on hypotheses described in Darlu and Tassy (1993). The best match between genetic distance (patristic distance) and classification algorithm is still to be found. Most of the algorithms used in phenetic classification are described in Roux (1985). Cluster analyses are either ascending or descending. They minimize the within-group distances or variance. Cladists refer to parsimony models (minimization of changes of state), among which the most common are Wagner (convergence and reversion are accepted), Camin-Sokal (reversion towards the ancestral state is not accepted), and Dollo (convergence is not accepted).
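To make the idea of an ascending cluster analysis concrete, the sketch below groups four hypothetical populations from a matrix of pairwise distances. It assumes average linkage (the distance between two groups is the mean of all pairwise distances between their members), which is only one of several possible criteria and is not claimed to be the algorithm of any cited reference. The distance values are invented.

```python
from itertools import combinations

def average_linkage(names, pairwise):
    """Ascending clustering; returns the list of merges as (group1, group2, height)."""
    clusters = [(name,) for name in names]  # start with one group per population

    def between(group1, group2):
        # Mean of all pairwise distances between members of the two groups.
        total = sum(pairwise[frozenset((a, b))] for a in group1 for b in group2)
        return total / (len(group1) * len(group2))

    merges = []
    while len(clusters) > 1:
        # Merge the two closest groups.
        group1, group2 = min(combinations(clusters, 2), key=lambda pair: between(*pair))
        height = between(group1, group2)
        clusters = [c for c in clusters if c not in (group1, group2)]
        clusters.append(group1 + group2)
        merges.append((group1, group2, height))
    return merges

# Hypothetical pairwise distances between four populations; not real data.
pairwise = {
    frozenset(("A", "B")): 0.2,
    frozenset(("A", "C")): 0.6,
    frozenset(("A", "D")): 0.7,
    frozenset(("B", "C")): 0.5,
    frozenset(("B", "D")): 0.8,
    frozenset(("C", "D")): 0.3,
}

merges = average_linkage(["A", "B", "C", "D"], pairwise)
for group1, group2, height in merges:
    print(group1, "+", group2, "at", height)
```

The resulting dendrogram reflects only overall resemblance; reading it as a phylogeny adds the assumption that the characters change at a constant rate.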
The choice of distances and the estimation of their variability have evolved little over the last few years. The same observation can be made for algorithms. Multifactorial analyses, although more easily available on ordinary computers, have not developed as much as one could have expected. This could be related to the apparent difficulty of statistical inference in such analyses. Discriminant analysis should easily overcome this drawback: Mahalanobis distances have a well-known distribution and should satisfy the needs of geneticists. Multivariate analyses could provide (directly or indirectly) an efficient array of distances between data of different kinds (morphological, ecological and genetic); these metrics could provide good-quality estimates of mean and variability when combined with resampling procedures.
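For readers unfamiliar with the Mahalanobis distance, the sketch below computes it between two groups described by two quantitative variables. It is a minimal illustration under assumptions of ours: two variables only (so that the covariance matrix can be inverted by hand), a pooled within-group covariance matrix, and invented measurements. It computes the distance only and does not perform the associated significance test.

```python
import math

def mean_vector(rows):
    """Column means of a list of observations."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(group1, group2):
    """Pooled within-group covariance matrix (n1 + n2 - 2 degrees of freedom)."""
    p = len(group1[0])
    scatter = [[0.0] * p for _ in range(p)]
    for rows in (group1, group2):
        centre = mean_vector(rows)
        for row in rows:
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += (row[i] - centre[i]) * (row[j] - centre[j])
    dof = len(group1) + len(group2) - 2
    return [[value / dof for value in line] for line in scatter]

def mahalanobis_2d(group1, group2):
    """Mahalanobis distance between two group means, for two variables only."""
    (a, b), (c, d) = pooled_covariance(group1, group2)
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of a 2 x 2 matrix
    m1, m2 = mean_vector(group1), mean_vector(group2)
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    squared = (dx * (inv[0][0] * dx + inv[0][1] * dy)
               + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(squared)

# Hypothetical measurements (two variables per individual); not real data.
group1 = [(0, 0), (2, 1), (0, 1), (2, 2)]
group2 = [(3, 4), (5, 5), (3, 5), (5, 6)]

distance = mahalanobis_2d(group1, group2)
print("Mahalanobis distance:", distance)
```

Unlike the Euclidean distance between the two means, this value takes the within-group variances and the correlation between the two variables into account.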
References
Darlu P, Tassy P (1993) Reconstruction phylogénétique. Concepts et méthodes. Masson, collection Biologie théorique 7, Paris, 245 p
De Vienne D, Damerval C (1985) Mesures de la divergence génétique. 3. Distances calculées à partir de marqueurs moléculaires. In: Les distances génétiques. Estimations et applications (Lefort-Busson M, de Vienne D, eds), INRA, Paris, 39-57
Gasnier N, Cabaret J, Moulia C (1992) Allozyme variation between laboratory reared and wild populations of Teladorsagia circumcincta. Int J Parasitol 22, 581-587
Gregorius HR (1984) A unique genetic distance. Biom J 26, 13-18
Hoste H, Cabaret J (1992) Intergeneric relations between nematodes of the digestive tract in lambs: a multivariate approach. Int J Parasitol 22, 173-179
Katz M (1986) Étude des propriétés de certains indices de distance génétique et de leurs estimateurs. Thèse de sciences, Paris VII
Katz M, Goux JM (1986) The statistical properties of genetic absolute distance. Biom J 28, 729-739
Legendre L, Legendre P (1979) Écologie numérique. 2. La structure des données écologiques. Masson, collection d'Écologie 13, Paris, 254 p
Rogers JS (1991) A comparison of the suitability of the Rogers, modified Rogers, Manhattan and Cavalli-Sforza and Edwards distances for inferring phylogenetic trees from allele frequencies. Syst Zool 40, 63-73
Roux M (1985) Algorithmes de classification. Masson, collection Méthodes + Programmes, Paris, 151 p
Taxonomic sampling, sequence length and the robustness of molecular phylogenies. G Lecointre (Laboratoire d'ichtyologie générale et appliquée, et service commun de systématique moléculaire du muséum (GDR 1005), muséum national d'Histoire naturelle, 43, rue Cuvier, 75231 Paris cedex 05, France)
Before presenting original results on the robustness of molecular phylogenies, I will make a few remarks about the way trees are used in applied biology, which lead to 2 paradoxes.
First, in both applied and fundamental sciences, the 'operational taxonomic units' compared (strains, populations, species, genera, families, etc) are biological entities, ie entities with historical links between them. The comparison of the characters studied (molecular, morphological, etc) makes sense only in the light of the concept of descent (of organisms) with modification (of characters). This concept makes a large difference between classifying living organisms (which means producing a phylogeny) and classifying, for instance, toys or cartoon characters (which means producing groupings based on non-historical criteria). This concept implies that every tree built from biological entities concerns phylogeny*. The first paradox is that sometimes, in applied sciences symposia, people present dendrograms, claiming that they are not phylogenies but just 'classifications', while their conclusions about these trees are explicitly historical. Even in agronomy, the classification of biological entities is about phylogenies. Thus everybody who produces trees from molecular data sets necessarily practises molecular phylogeny. As stressed by Ernst Mayr, nothing makes sense in biology, except in the light of evolution.
The second paradox is that the trees that should represent phylogenies are not built with optimal tools, but sometimes with obsolete tools and, in the worst cases, with misused tools (eg UPGMA (unweighted pair group method using arithmetic averaging) applied without knowing anything about the constancy of the rate of change of the characters used). To avoid this, some basic books are available, for instance those of Hillis and Moritz (1990), Li and Graur (1991), or Darlu and Tassy (1993). Moreover, in the great majority of studies in applied sciences, no information is given on the reliability of the trees produced. A very widely used tool in fundamental molecular phylogeny studies, the bootstrap (Felsenstein, 1985, 1988; Hillis and Bull, 1993), should be used. In this way I follow the opinion of Penny and Hendy
*