Université libre de Bruxelles Institutional Repository
PhD Thesis
Citation (APA):
Cassens, I. (2004). Molecular evolutionary biology of cetaceans : phylogeny, phylogeography and conservation genetics (Unpublished doctoral dissertation). Université libre de Bruxelles, Faculté des sciences, Bruxelles.
Available at permalink: https://dipot.ulb.ac.be/dspace/bitstream/2013/211152/1/1d6f9916-9e16-41db-8bf9-e9b379685f2c.txt
This Ph.D. thesis has been digitized by Université libre de Bruxelles. The author who would disagree on its online availability in DI-fusion is invited to contact the University ([email protected]).
If a native electronic version of the thesis exists, the University can guarantee neither that the present digitized version is identical to the native electronic version, nor that it is the definitive official version of the thesis.
DI-fusion is the Institutional Repository of Université libre de Bruxelles; it collects the research output of the University, available on open access as much as possible. The works included in DI-fusion are protected by the Belgian legislation relating to authors’ rights and neighbouring rights.
Any user may, without prior permission from the authors or copyright owners, for private usage or for educational or scientific research purposes, to the extent justified by the non-profit activity, read, download or reproduce on paper or on any other media, the articles or fragments of other works, available in DI-fusion, provided:
The authors, title and full bibliographic details are credited in any copy;
The unique identifier (permalink) for the original metadata page in DI-fusion is indicated;
The content is not changed in any way.
It is not permitted to store the work in another database in order to provide access to it; the unique identifier (permalink) indicated above must always be used to provide access to the work. Any other use not mentioned above requires the authors’ or copyright owners’ permission.
UNIVERSITE LIBRE DE BRUXELLES
FACULTY OF SCIENCES
DEPARTMENT OF MOLECULAR BIOLOGY
LABORATORY OF EVOLUTIONARY GENETICS
Molecular Evolutionary Biology of Cetaceans:
Phylogeny, Phylogeography, and Conservation Genetics
Thesis presented for the degree of Doctor of Sciences
INSA CASSENS
THESIS SUPERVISOR: PROF. MICHEL C. MILINKOVITCH
ACADEMIC YEAR 2003 / 2004
In the very first place, I wish to thank my supervisor Michel C. Milinkovitch, who - without really knowing me - took me on as a PhD student. I immensely appreciated your confidence during all these years and all the efforts that made the realization of this doctoral thesis in your laboratory possible. Thanks a lot for your amity. Thanks for your enthusiasm, ideas, and constructive criticism, for constantly encouraging and pushing me. You introduced me to the fascinating field of evolutionary genetics and guided me to independent research. I especially enjoyed the freedom I had in developing my projects. There is only one question in my mind that still awaits an answer: Why did it take me so many years to finish my work, when everything in your lab is urgent and should be done asap???
Many thanks also to Gisèle Van de Vyver for helpful advice during these years, especially with regard to administrative matters ... a territory where I am completely lost!
I thank Koen Van Waerebeek for many stimulating discussions by email or over a coffee near Brussels or Gent railway station. Surrounded by genes and molecules, you were my most important link to the “real” world of South American small cetaceans, reminding me of the importance of biologically meaningful hypotheses! Hopefully, your fight for the long-term conservation of whales and dolphins will bear fruit!
I was fortunate to have found very good friends and I will always have special thoughts about the years in Belgium. Arnaud was the first person I met in the lift the first day in the lab and my life would not have been the same without him. Thanks, Monsieur Termonia, for your untiring patience in listening to my broken French in the first months (years?), for your humor, for our endless discussions about everything and nothing, for your unconditional support (not only in realizing my “creative” interior decoration and furnishing ideas), for countless hours playing billiards & backgammon, running, and fighting on the squash court, for standing my chaotic nature. I was happy in Belgium because we are friends and I thank you and Sophie for inviting me to become part of your family.
I am very thankful to Patrick for his encouragement and friendship from the very early days and especially during the last two years. Thanks, Pat, for the many scientific discussions on all aspects of population genetics and our productive collaborative work, for your confidential ear and moral support in difficult moments, and for remaining incredibly patient when I was in a bad or, maybe even worse, in a good mood (because then you had to put up with my sometimes special sense of humor). Due to long days in the lab, my fridge was empty all the time. I survived because you approached hero status in the final months in feeding me with barrels of soup, kilos of stuffed peppers (and other delicious food), and tons of apples, mangos, kiwis, and mandarins. Thanks so much!
I very much enjoyed the company of Nasia since she took one of her best decisions and came back to Belgium. Thanks for your Greek cooking and for being the perfect organizing committee of our summer holidays in Greece, for help in uncomfortable situations (e.g., arriving in Brussels at 6 a.m. without any keys!), for plenty of weekends spent in the lab, and for the entertaining rides to and from university (and sorry for the number of times you had to wait because I did not hear the alarm clock!). Thanks for trusting me.
From our lab, Daniel also deserves special mention. You have been a great source of practical information and provided constant assistance: from the very beginning (first registration at ULB), over the years (equivalence papers), and at the very end (preparation for the big printing day). Thanks for your invaluable help! I should also mention here that I am deeply sorry for all the times I took Mayka for a walk in the forest full of mud...
I am indebted to many other colleagues (including those who arrived recently or have passed through) in the Laboratory of Evolutionary Genetics for providing the very friendly environment in which I was happy to carry out my doctoral studies. Special thanks go to Laurent and Sabrina for the nice time in the same office, to the computer specialists Colas and Pascal for providing technical support, to Raphaël for the hobbits’ quotes, to Jehanne, Marie-Anne, and Delphine for providing me with the necessary amount of sugar in the last weeks, to Justine for assistance in the right moments, to Cedric for standing the never-ending discussions with Pat in his office, and to Carole for organizing beach-volleyball tournaments. Thanks to all (not to forget the “yeast” people from the other side of the floor!) for your camaraderie and friendship.
I wish to thank my friends for helping me get through the difficult times, and for all the emotional support, entertainment, and caring they provided. Thanks to Lassina, Raquel, Christof and Myriam for the invitations and the nice moments spent together. Thanks to Gerald for assuming from the very beginning that I would do fine; this helped me more than you know. Other friends whose encouragement and understanding have remained constant in spite of the distance are Björn, Kristin, and most importantly, Jule. Thanks for the place in your life.
My family, and especially my parents, have been a constant source of support - emotional, moral, and financial - during the last six years, and this thesis would certainly not have existed without them. I thank you for your absolute confidence in me. The knowledge that you will always be there to pick up the pieces was what allowed me to repeatedly risk getting shattered.
And last but not least Jörn, for the very special person he is and whose presence in my life is so important. Thanks so much for your encouragement and understanding through all the ups and downs and for lots of wise words in the right moments. And most importantly for the incredible amount of patience you had with me, thanks for waiting so many years!
Furthermore, I would like to express my thanks to the members of the jury for kindly accepting to evaluate my work.
I acknowledge financial support from the “Deutscher Akademischer Austauschdienst” (DAAD).
Thanks to all of you!
Insa
TABLE OF CONTENTS
GENERAL INTRODUCTION
  Molecular biology in Cetaceans
    Cetaceans - origin and evolution
    Cetaceans - population genetics
    Cetaceans - threats and conservation efforts
  Molecular phylogenies
    Phylogeny inference methods
    Genetic distances and models of sequence evolution
    Use of reticulated graphs in phylogeny reconstruction
    Rooting
  Species trees, gene trees, and the coalescent process
    Stochastic lineage sorting
    Coalescence theory
    Inferences using the coalescent
  Bayesian methods
    Metropolis-Hastings, Markov chain Monte Carlo sampling
    Applications of Bayesian methods
  Numts (Nuclear mitochondrial-like sequences)
  Objectives
PART I: PHYLOGENY IN CETACEANS
  Chapter 1. Independent Adaptation to Riverine Habitats Allowed Survival of Ancient Cetacean Lineages
    Introduction
    Material & Methods
    Results & Discussion
PART II: EVALUATION OF GRAPH CONSTRUCTION METHODS
  Chapter 2. The phylogeography of dusky dolphins (Lagenorhynchus obscurus): a critical examination of network methods and rooting procedures
    Introduction
    Material & Methods
      Data Collection
      Genetic diversity and population structure
      Network estimation
      Rooting techniques
    Results
      MtDNA sequence diversity & population structure
      Comparison of network estimation methods
      Rooting of the intraspecific genealogy
    Discussion
      Population structure & management implications
      Network methods
      Rooting
      Species origin & dispersion
  Chapter 3. Evaluating Intraspecific “Network” Construction Methods Using Simulated Sequence Data: Do Existing Algorithms Outperform the Global Maximum Parsimony Approach?
    Introduction
    Material & Methods
      Simulation of Data
      Network Construction
      Comparison of Graphs & Statistical Analyses
    Results
      Compatibility
      Ambiguity
      Tree length
    Discussion
      General Performance of the UMP Approach
      Comparative Analysis of Graph Construction Methods
PART III: PHYLOGEOGRAPHY AND CONSERVATION GENETICS IN SMALL COASTAL CETACEANS FROM SOUTHERN HEMISPHERE WATERS
  Chapter 4. Male dispersal along the coasts but no migration in pelagic waters in dusky dolphins (Lagenorhynchus obscurus)
    Introduction
    Material & Methods
      Tissue collection and DNA extraction
      Sequencing and genotyping
      Genetic diversity
      Population structure & phylogeographic patterns
      Sex-biased dispersal
      Coalescent-based estimates of migration rates and population divergence times
    Results
      Genetic diversity
      Phylogeographic patterns
      F-statistics
      Bayesian clustering approaches & microsatellite data
      Gender-biased dispersal
      Estimating population sizes, migration rates, and divergence time in Atlantic waters
    Discussion
      Ancient versus recent reduction of genetic diversity in Peruvian dusky dolphins
      Population structure and dispersal pattern along the South American coast
      Population structure and dispersal patterns across the Atlantic
      Population substructuring and the phylogenetic position of New Zealand dusky dolphins
  Chapter 5. Population structure of nuclear and mitochondrial DNA variation among South American Burmeister’s porpoises (Phocoena spinipinnis)
    Introduction
    Material & Methods
      Sample collection and DNA extraction
      Microsatellite isolation and genotyping
      Microsatellite analysis
      Mitochondrial sequence analysis
    Results
      Microsatellite variability
      Nuclear genetic differentiation
      Cross-species amplifications
      Mitochondrial sequence diversity and population structure
    Discussion
      Nuclear and mitochondrial population structure
      Conservation and management perspectives
      Cross-species amplification
PART IV: NUCLEAR MITOCHONDRIAL-LIKE SEQUENCES (NUMTS) IN CETACEANS
  Chapter 6. Phylogenetic analysis of ancient mtDNA integrations into the nuclear genome of cetaceans
    Introduction
    Material & Methods
      Samples
      DNA extraction, amplification, cloning, and sequencing
      Sequence analysis
    Results & Discussion
      Presence of multiple cytochrome b-like sequences in individual dolphins and porpoises
      A minimum of three ancient integrations of 2.9 kb mtDNA into the nuclear genome
      Comparisons of substitution pattern among clades
      Utility of numt sequences in rooting analysis
CONCLUSIONS AND PERSPECTIVES
  Phylogeny in cetaceans (part I)
  Evaluation of graph construction methods (part II)
  Phylogeography and conservation genetics in small coastal cetaceans from Southern Hemisphere waters (part III)
  Nuclear mitochondrial-like sequences (numts) in cetaceans
SUMMARY
BIBLIOGRAPHY
APPENDICES
  Appendix I: Classification of extant Cetacea
  Appendix II: F-statistics & genetic variability
    Measures of genetic variation levels
    Measures of population differentiation
  Appendix III: Abbreviations
  Appendix IV: Glossary
  Appendix V: Description of the UnionTree algorithm for graph construction (chapter 3)
General Introduction
This dissertation explores the evolutionary biology of cetaceans (whales, dolphins, and porpoises; cf. Appendix I) through different research areas in the field of molecular evolution. The application of molecular markers in this mammalian order is extremely promising with respect to (i) phylogenetic questions, because the highly derived morphology of cetaceans makes homology statements and polarization of morphological characters difficult, and (ii) the assessment of population structure and dispersal behavior, because their elusive behavior in the aquatic environment renders direct observation laborious.
In part I, interspecific phylogeny reconstruction methods are applied to multi-locus sequence data to infer the evolutionary history of “river dolphins”.
Part II focuses on the evaluation of “network construction methods” that are widely used in intraspecific studies for depicting the genealogical relationships among non-recombined DNA fragments. A new analytical approach is developed and differences among methods are studied using empirical (mitochondrial sequence data from dusky dolphins) and simulated data sets.
Part III describes population genetic studies in two small cetacean species (dusky dolphin, Lagenorhynchus obscurus, and Burmeister’s porpoise, Phocoena spinipinnis) that occur sympatrically in South American waters.
Finally, in Part IV, I provide evidence for the existence of multiple nuclear copies of mitochondrial DNA fragments (“Numts”, sensu Lopez et al. 1994) in several cetacean species, and identify the corresponding insertion and duplication events using phylogenetic analyses.
Although the specific questions that have been explored are described in detail in the respective introductions of each chapter, this first section presents some general ideas and interpretive tools relevant to the inferences made in this study.
[Figure 1: graphic not recoverable from the scan. Surviving labels: a time axis running from 55 to 25 with the epochs Eocene, Oligocene, and Miocene; panel (a) tip labels: camels, pigs, peccaries, cows, deer, hippos, and whales, dolphins and porpoises.]
Figure 1. Reconstructing the past of cetaceans. Dashed lines show the approximate evolutionary timescale relevant to the different studies; (a) Phylogenetic hypothesis regarding the origin of whales, dolphins, and porpoises based on mitochondrial and nuclear data (see text for details); (b) evolutionary relationships among “river dolphins” (Part I); (c) and (d) population genetics and phylogeography of dolphins and porpoises (Part III); and (e) study of multiple nuclear insertions of mitochondrial-like sequences in cetaceans (Part IV).
Molecular biology in Cetaceans
Cetaceans - origin and evolution
Cetaceans are considered some of the most derived among extant mammals, as their evolution from terrestrial animals to an entirely aquatic life form was accompanied by fast and radical morphological and physiological adaptations. Some obvious morphological similarities shared by all whales, dolphins and porpoises are, for example, a streamlined, fusiform body shape (with shortened neck and telescoped skull) and the reduction or internalisation of protuberances such as hind limbs, external ears, and genitals, all adaptations reducing drag for fast swimming in an aquatic environment. Likewise, a powerful horizontal tail fluke and flipper-like forelimbs are used for propulsion and steering/stability, respectively. Large body size and the presence of a thick subcutaneous blubber layer filled with fat and oil aid in thermoregulation, whereas external nares (blowholes) that are located on the top of the head facilitate breathing during swimming. In addition, numerous profound physiological adaptations are less conspicuous, including those involved in buoyancy control, diving, water balance, sensory reception, and underwater communication (Berta & Sumich 1999).
Given their highly derived morphology that makes homology statements and polarization of morphological characters difficult, phylogenetic analyses in cetaceans are challenging. The oldest fossils that can be recognized as whales come from rocks in the Himalayas and are about 55 million years old (Thewissen et al. 2001). With respect to whales’ origin and their position in the mammalian tree (Fig. 1a), analyses of DNA sequences and other molecular characters have suggested that cetaceans are not only closely related to, but are nested within, the order Artiodactyla (cows, deer, hippos, pigs, peccaries, and camels) as the sister group of hippopotamuses (Sarich 1985; Arnason & Gullberg 1996; Gatesy et al. 1996; Gatesy et al. 1999; Graur & Higgins 1994; Milinkovitch et al. 1998; Milinkovitch & Thewissen 1997; Nikaido et al. 1999). While the hippo + whale hypothesis is increasingly accepted, phylogenetic relationships among the major cetacean lineages remain highly disputed, including long-standing debates on the paraphyly (cf. Appendix IV) of toothed whales (Arnason et al. 2004; Cerchio & Tucker 1998; Hasegawa et al. 1997; Messenger & McGuire 1998; Milinkovitch 1995; Milinkovitch et al. 1996; Milinkovitch et al. 1994; Milinkovitch et al. 1993; 1995; Nikaido et al. 2001; Smith et al. 1996) and the phylogenetic positions of four extant “river dolphin” species (Fig. 1b).
“River dolphins” have often been placed into a single taxon, as all show a peculiar morphology with a characteristic long and narrow rostrum, a low triangular dorsal fin, and broad and flexible flippers (among other, mostly cranial, characters). However, similarities could be ancestral or explained by convergent (cf. Appendix IV) adaptations to the riverine environment; in either case these characters would be phylogenetically uninformative.
Moreover, there is evidence that the ancestors of recent cetaceans had an explosive evolutionary radiation 30-40 million years before present. This makes the phylogenetic reconstruction of successive speciation events extremely difficult, as (i) the short internal branches are unlikely to bear many informative changes, and (ii) the probability of having multiple conflicting gene trees increases (cf. below for a more detailed description of “stochastic lineage sorting”).
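The effect of a short internal branch on gene-tree conflict can be made concrete with a small sketch. It assumes the simplest textbook case, which is not taken from this thesis: three species, a single internal branch, and a diploid population of constant effective size, for which the probability that a gene tree differs in topology from the species tree is (2/3)·exp(-t/2N). The function name and its parameters are hypothetical and chosen only for illustration.

```python
import math


def discordance_probability(internal_branch_generations: float,
                            effective_size: float) -> float:
    """Probability that a gene tree disagrees with a three-species tree.

    Assumption (illustrative, standard coalescent result): the internal
    branch lasts `internal_branch_generations` generations in a diploid
    population of constant effective size `effective_size`.
    """
    # Branch length expressed in coalescent units of 2N generations.
    t_coalescent = internal_branch_generations / (2.0 * effective_size)
    # With probability exp(-t) the two lineages fail to coalesce on the
    # internal branch; two of the three possible topologies are then wrong.
    return (2.0 / 3.0) * math.exp(-t_coalescent)


# Shorter internal branches give a higher chance of conflicting gene trees.
for generations in (0, 2_000, 20_000):
    p = discordance_probability(generations, effective_size=1_000)
    print(generations, round(p, 4))
```

Under these assumptions the probability of conflict approaches two thirds as the internal branch shrinks to zero, which is the situation expected after a rapid radiation.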
Cetaceans - population genetics
The study of genetic differentiation among contemporary populations of dusky dolphins (Fig. 1c) and Burmeister’s porpoises (Fig. 1d) focuses on microevolutionary changes at or below the species level. In intraspecific studies, a major challenge in the analysis is to work out the relative contributions of distinct processes such as genetic drift, gene flow, and selection (all likely to be variable both in time and space). For several reasons, this task is particularly complex in cetaceans. First, many cetacean species show high dispersal abilities and are distributed across habitats where movements are difficult to record and barriers to migration are seldom understood. Second, strongly biased sex-specific dispersal can result in incongruent population histories for males and females (Baker et al. 1998; Bérubé et al. 1998; Brown Gladden et al. 1999; Escorza-Trevino & Dizon 2000; Hoelzel et al. 1998b; O'Corry-Crowe et al. 1997). And third, complex behaviors such as philopatry and social organization into kinship groups can cause significant population subdivision, even in sympatry (cf. Appendix IV) or on a small geographical scale (Hoelzel 1998; Hoelzel et al. 1998a).
Two examples below illustrate that the complex patterns of population structure found in some cetacean species cannot easily be determined from an intuitive assessment of the species’ geographic distribution.
• A number of globally distributed baleen whale species are highly migratory, spending summers on feeding grounds in temperate or near-polar waters, and wintering on breeding/calving grounds in shallow tropical waters. To date, a comprehensive picture of global genetic structure is available only for humpback whales (Megaptera novaeangliae): while three isolated populations in the North Pacific, North Atlantic, and Southern oceans have been identified (Baker et al. 1990; Baker et al. 1993; Baker et al. 1994), further substructuring into discrete breeding and feeding grounds was observed within populations (Baker et al. 1998; Baker et al. 1990; Baker et al. 1994; Holm Larsen et al. 1996; Medrano-Gonzalez et al. 1995; Palumbi & Baker 1994). In the North Atlantic, for example, high-latitude feeding grounds (Gulf of Maine, Gulf of St. Lawrence, Newfoundland, Labrador, Greenland, Iceland, and Norway) are considerably structured in terms of mitochondrial DNA, but almost homogeneous with respect to nuclear loci. This is the consequence of (i) a strong maternally directed fidelity to feeding grounds (i.e., calves are born during winter, then accompany their mother on her annual migration during the first year, and continue in subsequent summers to migrate back to the same feeding ground, although then independently from their mother) (Larsen et al. 1996; Palsboll et al. 1995) and (ii) congregation on a common breeding ground in the West Indies, where humpbacks of the different feeding areas mate and calve (Clapham et al. 1993). A “genetic tagging” method showed that migratory movements of this species represent the longest known migration of any mammal, being almost 5000 miles one way (Palsboll et al. 1997).
• The problem of “hidden” genetic structure in cetacean populations is highlighted by studies on two forms of killer whales (Orcinus orca) found in the northeastern Pacific. Observations of naturally marked individuals during the past 20 years have led to the characterization of two distinct ecotypes: the so-called “resident” pods that are fish specialists, and the “transient” pods that feed primarily on mammals (e.g., pinnipeds, mustelids, and cetaceans). Residents and transients are highly distinct in both mitochondrial and nuclear DNA (Hoelzel et al. 1998a; Hoelzel & Dover 1991; Stevens et al. 1989), suggesting that the populations - although occurring in sympatry - are socially and reproductively isolated.
Cetaceans - threats and conservation efforts
For decades, conservation efforts in marine mammals primarily focused on the international regulation of the whaling industry, which was close to exterminating many of the world’s stocks of great whales (i.e., baleen and sperm whales). Unfortunately, whaling cannot be put forward as an example of successful sustainable management of a renewable resource (Perrin et al. 2002). Although some species are now recovering despite past exploitation (e.g., humpback whales in the North Atlantic, North Pacific, and parts of the Southern oceans), many other severely depleted stocks persist in low numbers and are still highly endangered (e.g., the North Atlantic right whale, most blue whale populations). Nowadays, cetaceans are less affected by whaling activities and direct catch (though with noticeable exceptions, cf. chapters 4 & 5 in this study), but increasingly threatened by incidental mortality (e.g., entanglement in fishing gear), pollution, habitat destruction, and injuries due to collisions with ships. Given that human activities are most intense in continental shelf waters, populations of small inshore and riverine species are most heavily impacted (Brown Gladden et al. 1999; Dawson et al. 2001; Escorza-Trevino & Dizon 2000; McMillan & Bermingham 1996; Pichler et al. 1998; Rosel et al. 1999a; Secchi et al. 2003; Secchi et al. 1998). Mortality levels can be high enough to reduce or eliminate local populations of more widely distributed species (Twiss & Reeves 1999), and even threaten the survival of entire species, especially when their geographic distribution is restricted. Indeed, two of the most critically endangered marine mammal species (i.e., they have a high risk of going extinct in the next 10 years) are the Chinese river dolphin (Lipotes vexillifer), which inhabits the lower and middle parts of the Yangtze River, and the vaquita (Phocoena sinus), a porpoise species that is limited to the upper part of the Gulf of California in Mexico.
Life history parameters in cetaceans can explain why these organisms are extremely vulnerable to elevated exploitation levels. Cetaceans are so-called “K-selected” species: they maintain their population sizes around the carrying capacity (cf. Appendix IV) of the local environment and are characterized by delayed sexual maturity, long life span, low reproductive rate, developed parental care, high infant survival rate, and low colonizing capacity (in contrast, r-selected species have a short life span, early reproduction, low biomass, and the potential to produce large numbers of usually small offspring in a short period of time). Consider, as an extreme example, the highly endangered North Atlantic right whale (Eubalaena glacialis): longevity is several decades (and may approach or even exceed a century) and the reproduction rate is very low (with an average age of sexual maturity of more than 10 years and an average interval of 3-6 years between successive calves) (Perrin et al. 2002).
Although many species of smaller toothed whales (porpoises and dolphins) can produce offspring annually, estimates of age at sexual maturity also range from 4 to 14 years, depending on the species (Berta & Sumich 1999). With the resulting low population growth rates, cetaceans generally adapt poorly to changing conditions and, because of their slow potential for recovery, are extremely susceptible to high human-caused mortality levels.
Despite the high mobility of cetaceans and the often apparent absence of geographic barriers in the marine environment, the identification of population structuring in marine mammals is more the rule than the exception (Baker et al. 1998; Baker et al. 1994; Brown Gladden et al. 1999; Escorza-Trevino & Dizon 2000; Hoelzel 1998; Hoelzel et al. 1998b; O'Corry-Crowe et al. 1997; Rosel et al. 1994; Rosel et al. 1999b). Hence, because substantial biological diversity can be lost even if a single differentiated stock is wiped out, conservation biologists generally agree that stocks, and not the entire species, should be the focus of management efforts to preserve a species’ evolutionary potential. Even if there were agreement on a single stock definition (but see Avise & Ball 1990; Dizon et al. 1992; Moritz 1994; Sites & Crandall 1997; Vogler & DeSalle 1994), the identification of stocks would still not be a trivial task. As already mentioned above, a number of factors can set hurdles to the delineation of stock boundaries in cetaceans (leaving aside the difficulties in species delineation; Milinkovitch et al. 2002), including seasonal movements, dispersal, differences between sexes in philopatric behaviour, and diversity in mating strategies and social structure. Nevertheless, understanding of stock structure is critical for many exploited or threatened species, as it is the first step in stock assessment, i.e., the attempt to estimate the productivity of a stock, to predict its resistance in the face of removals due to incidental catches, directed harvests, or natural causes, and to measure its capacity to recover from these removals (Perrin et al. 2002).
Molecular phylogenies
Phylogeny inference methods
In molecular phylogenetics, we look for the “best estimate” of the evolutionary relationships among sampled entities (genes, individuals, species, etc.). Phylogeny inference methods seek to accomplish this goal either (1) by defining a series of steps (an algorithm) that leads to the determination of a tree (i.e., “purely algorithmic” methods, sensu Swofford et al. 1996) or (2) by defining an optimality criterion for choosing, among the set of all possible phylogenies, the best tree (or those that are equally good).
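To give a feeling for the size of “the set of all possible phylogenies” that an optimality criterion has to rank, the sketch below counts the distinct unrooted, strictly bifurcating trees for a given number of labelled taxa, using the standard product 3 × 5 × ... × (2n - 5). The function name is hypothetical and the calculation is offered only as an illustration; it is not part of the analyses in this thesis.

```python
def count_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, strictly bifurcating trees for n_taxa labelled tips.

    Uses the standard formula (2n - 5)!! = 3 * 5 * ... * (2n - 5),
    which equals 1 for three taxa or fewer.
    """
    count = 1
    # Odd factors from 3 up to and including 2n - 5.
    for factor in range(3, 2 * n_taxa - 4, 2):
        count *= factor
    return count


for n in (4, 5, 10, 20):
    print(n, count_unrooted_trees(n))
```

Four taxa allow only three unrooted topologies, but ten taxa already allow more than two million, which is why exhaustive evaluation of an optimality criterion quickly becomes impractical and heuristic searches are used instead.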
An example of a “purely algorithmic” approach is Neighbour Joining (NJ): first, genetic distances between each pair of taxa are estimated; second, the two taxa separated by the lowest distance are clustered into a new internal node; third, new distances between this new node and the other taxa are computed. This process is iterated such that successively more distant taxa, or groups of taxa, are clustered into a final, fully connected and strictly bifurcating tree. Unlike other clustering methods, such as UPGMA (“Unweighted Pair Group Method using Arithmetic averages”), the neighbour joining method produces an unrooted tree and does not assume a molecular clock (i.e., the same rate of substitution along all branches). Indeed, the pairwise distances (from which the least distant pair of taxa is selected at each iteration) are corrected for their average divergence from all other nodes (Fig. 2). Note that, although distance matrices are usually used in clustering (algorithmic) methods, there is no necessary conceptual link between distances and clustering approaches. For example, the minimum-evolution and Fitch-Margoliash methods (Fitch & Margoliash 1967) find the best tree by optimizing an objective function related to the deviation between the distances observed between sequences and those implied by the tree being evaluated.
(b)
        A     B     C     D
A       -     5     4     8
B     -10     -     7    11
C      -9    -6     -     6
D      -5    -2    -9     -
(c)
$D(i,j) = d(i,j) - (r_i + r_j)$, where $r_i = \frac{1}{L-2} \sum_{k \neq i,j} d(i,k)$ and $L$ = total number of taxa.
Figure 2. The neighbour joining procedure. (a) A four-taxon tree that violates the assumption of a molecular clock (numbers of substitutions are indicated next to each branch); (b, above diagonal) uncorrected distances d(i, j) between pairs of taxa; (b, below diagonal) pairwise distances D(i, j) corrected for the average divergence between each member of the pair and all other nodes, using the formula in (c). Note that the use of uncorrected distances would lead to the incorrect clustering of A and C.
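The correction described in Figure 2 can be checked numerically. The Python sketch below is an illustration only, not code from the thesis: the function name `corrected_distances` is hypothetical, and d(A, C) = 4 is the value implied by the corrected distances reported in panel (b).

```python
from itertools import combinations

def corrected_distances(taxa, d):
    """Return D(i, j) = d(i, j) - (r_i + r_j), where r_i is the mean distance
    from i to every taxon other than i and j (formula of Figure 2c)."""
    def dist(a, b):
        return d[(a, b)] if (a, b) in d else d[(b, a)]

    n = len(taxa)
    corrected = {}
    for i, j in combinations(taxa, 2):
        others = [k for k in taxa if k not in (i, j)]
        r_i = sum(dist(i, k) for k in others) / (n - 2)
        r_j = sum(dist(j, k) for k in others) / (n - 2)
        corrected[(i, j)] = dist(i, j) - (r_i + r_j)
    return corrected

# Uncorrected distances from Figure 2b (above the diagonal).
taxa = ["A", "B", "C", "D"]
d = {("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 8,
     ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6}

D = corrected_distances(taxa, d)
closest_raw = min(d, key=d.get)        # pair with the smallest uncorrected distance
closest_corrected = min(D, key=D.get)  # pair with the smallest corrected distance
print(closest_raw, closest_corrected)
```

With these values, the smallest uncorrected distance joins A and C, whereas the smallest corrected distance joins A and B, in line with the note in the caption.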
In contrast to distance methods, discrete methods (often called “character-based” methods) consider each character separately, avoiding the loss of information that occurs when multiple characters are converted into pairwise distances. All character-based methods developed to date for phylogeny inference use an optimality criterion. The two most widely used optimality criteria are maximum parsimony (MP) and maximum likelihood (ML).
MP identifies the tree (or trees) that requires the lowest number of evolutionary
changes to explain the observed data (Fig.3). Parsimony is based on the implicit assumption
that evolutionary changes are rare.
Molecular Evolutionary Biology of Cetaceans General Introduction
Figure 3. The maximum parsimony principle. In the upper part, a character matrix for four taxa (A-D) and five nucleotide sites (ch1-ch5) is shown; in the lower part, the most parsimonious reconstructions on each of the three possible tree topologies connecting four taxa are given. Bars with numbers indicate character state changes at the corresponding sites. Optimization is performed on the total number of evolutionary changes on the tree (the tree length, TL) and yields the left-most tree as the most parsimonious tree.
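The parsimony criterion can be sketched in a few lines of Python. The block below counts changes with the Fitch algorithm, which is one standard way to obtain the tree length for unordered, equally weighted characters; the thesis does not state which algorithm was used. The alignment is hypothetical (the matrix of Figure 3 is not reproduced here), and the names `fitch` and `tree_length` are illustrative.

```python
def fitch(tree, states):
    """Fitch pass for one site: return (candidate ancestral states, changes)."""
    if isinstance(tree, str):           # a tip: its observed state, no change
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                          # children agree: no extra change needed
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Sum the minimum number of changes over all sites (tree length, TL)."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for site in range(n_sites):
        states = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch(tree, states)[1]
    return total

# Hypothetical four-taxon, five-site alignment (not the matrix of Figure 3).
alignment = {"A": "ACAGT", "B": "ACTGT", "C": "GTAGT", "D": "GTTCT"}

# The three possible topologies for four taxa, written as nested pairs.
topologies = {
    "(A,B),(C,D)": (("A", "B"), ("C", "D")),
    "(A,C),(B,D)": (("A", "C"), ("B", "D")),
    "(A,D),(B,C)": (("A", "D"), ("B", "C")),
}
lengths = {name: tree_length(t, alignment) for name, t in topologies.items()}
most_parsimonious = min(lengths, key=lengths.get)
print(lengths, most_parsimonious)
```

For this toy alignment the three topologies require 5, 6 and 7 changes respectively, so the first one is retained as the most parsimonious tree.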
Characters or character changes that are likely to violate this assumption (i.e., those particularly prone to multiple substitutions) can be assigned less weight in the analysis, either by using step matrices with predefined costs associated with specific changes (e.g., downweighting of transitions vs. transversions) or by assigning weights to specific characters (e.g., downweighting 3rd positions in coding sequences).
Under the ML principle, one aims at finding the maximum likelihood tree (or trees), i.e., the tree that maximizes the probability of yielding the observed data (Fig. 4). For computing the likelihood of a given tree (with associated topology, branch lengths, rates of evolution at each site, etc.), a model of sequence evolution is required which specifies the probabilities of character state changes along the tree. The model parameters may be user-defined or can be estimated (through ML optimization) from the data. The fundamental equation for likelihood inference in phylogenetics is $L = P(D \mid G, \mu)$, where $L$ is the likelihood of the tree, $P$ is the probability, $D$ is the data (e.g., DNA sequences), $G$ is the tree and $\mu$ is a collection of parameters defining the substitution model.
(a)
ch1 ch2 ch3 ch4 ch5
A C A C C
B T T C C
C T A T T
D T GSi- A A T
(b)
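(c) $L(\mathrm{ch2} \mid t_1, \mathrm{BrL}) = \sum_{x} \sum_{y} P(\text{observed states at ch2}, x, y \mid t_1, \mathrm{BrL})$
(d) $L(\mathrm{ch1\text{-}5} \mid t_1, \mathrm{BrL}) = L(\mathrm{ch1}) \times L(\mathrm{ch2}) \times L(\mathrm{ch3}) \times L(\mathrm{ch4}) \times L(\mathrm{ch5})$
(e) Choose the ML estimate among: $L(\mathrm{ch1\text{-}5} \mid t_1, \mathrm{BrL})$, $L(\mathrm{ch1\text{-}5} \mid t_2, \mathrm{BrL})$, $L(\mathrm{ch1\text{-}5} \mid t_3, \mathrm{BrL})$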
Figure 4. The maximum likelihood principle. (a) Character matrix for four taxa (A-D) and five nucleotide sites (ch1-ch5); (b) one (t1) of the three possible tree topologies for four taxa; x and y represent ancestral character states; (c) the likelihood for a site (here ch2) on tree t1 (with given branch lengths, BrL) is the sum of the probabilities of all the different ways the observed nucleotides could have evolved; (d) the overall likelihood for t1 is the product over all sites; (e) ML calculations need to be done for the other possible topologies (t2, t3) in order to choose the tree (or trees) that maximizes the probability of observing the data. Note that the ML approach requires a probabilistic model for the process of sequence evolution, i.e., we specify the transition probability from one nucleotide state to another in a time interval dt.
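Steps (c) to (e) of Figure 4 can be illustrated with a minimal Python sketch. Several assumptions are made here that are not in the original: the Jukes-Cantor model is used as the substitution model, the alignment is hypothetical, all branch lengths are fixed to an arbitrary value of 0.1 substitutions per site (a full ML analysis would also optimize them for each topology), and the function names are illustrative.

```python
import math
from itertools import product

BASES = "ACGT"

def jc69_prob(a, b, t):
    """Jukes-Cantor probability of observing b after time t, starting from a."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def site_likelihood(pair1, pair2, states, t_tip, t_int):
    """Step (c): sum over all ancestral states x and y of the two internal nodes."""
    total = 0.0
    for x, y in product(BASES, repeat=2):
        p = 0.25 * jc69_prob(x, y, t_int)      # prior on x, then internal branch
        for taxon in pair1:
            p *= jc69_prob(x, states[taxon], t_tip)
        for taxon in pair2:
            p *= jc69_prob(y, states[taxon], t_tip)
        total += p
    return total

def log_likelihood(pair1, pair2, alignment, t_tip=0.1, t_int=0.1):
    """Step (d): product over sites, accumulated as a sum of logarithms."""
    n_sites = len(next(iter(alignment.values())))
    logl = 0.0
    for s in range(n_sites):
        states = {taxon: seq[s] for taxon, seq in alignment.items()}
        logl += math.log(site_likelihood(pair1, pair2, states, t_tip, t_int))
    return logl

# Hypothetical alignment (not the matrix of Figure 4).
alignment = {"A": "ACAGT", "B": "ACTGT", "C": "GTAGT", "D": "GTTCT"}
topologies = {"t1": (("A", "B"), ("C", "D")),
              "t2": (("A", "C"), ("B", "D")),
              "t3": (("A", "D"), ("B", "C"))}

# Step (e): evaluate every topology and keep the one with the highest likelihood.
scores = {name: log_likelihood(p1, p2, alignment) for name, (p1, p2) in topologies.items()}
ml_topology = max(scores, key=scores.get)
print(scores, ml_topology)
```

Working with log-likelihoods avoids numerical underflow when the per-site likelihoods are multiplied over many sites.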
Genetic distances and models of sequence evolution
Simple distance measures (e.g., counting the number of nucleotide sites at which two sequences differ) can be useful for closely related sequences. However, as genetic distance increases, measures are required that correct for multiple substitutions, i.e., substitutions that are superimposed at the same site. The simplest model of sequence evolution under which corrected distances can be obtained is the Jukes-Cantor (JC) model, which assumes that all nucleotides occur at equal frequencies and that all substitutions are equally likely. A more complex model, the Hasegawa-Kishino-Yano (HKY) model, allows both for bases to occur at different frequencies and for a transition/transversion bias (Hasegawa et al. 1985). The above models assume equal rates of substitution across all sites, although rate heterogeneity has been shown to significantly affect the difference between estimated and real levels of sequence divergence. For example, as the presence of invariant sites can mislead evolutionary analysis (Lockhart et al. 1996), their proportion can be estimated under the invariant sites model. Site-to-site variation can also be modelled in combination with the JC and HKY models using a gamma distribution of rates across sites, with the shape parameter α estimated from the data: while low α values correspond to large rate variation, all sites have the same substitution rate when α approaches infinity. A further complication is introduced if the base composition is significantly different among lineages, as this can cause artefactual grouping of lineages with similar composition bias. The LogDet transformation (Lockhart et al. 1994) is less sensitive to base composition heterogeneity and has been used successfully in combination with distance methods.
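As an illustration of the correction for multiple substitutions, the Python sketch below computes the uncorrected proportion of differing sites (p) and the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3). The formula is the standard JC correction; the function names and the example sequences are hypothetical.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)

def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical example: two of ten sites differ.
p = p_distance("ACGTACGTAC", "ACGTACGTTT")
d = jc69_distance(p)
print(p, d)
```

The corrected distance is always at least as large as p, and the gap widens as sequences diverge, which is why uncorrected counts are adequate only for closely related sequences.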
Use of reticulated graphs in phylogeny reconstruction
All tree construction methods described above (NJ, MP, ML) were initially developed for phylogeny estimation among well-differentiated species and assume non-reticulated relationships (i.e., absence of recombination) among the gene sequences under scrutiny. For portraying instances of recombination or horizontal gene transfer among lineages, reticulated graphs with cycles (often called “networks”) are valuable tools.
Moreover, intraspecific “network” construction methods, most of them purely algorithmic, have been developed to infer the genealogical relationships among closely related sequences (e.g., minimum spanning network (Excoffier & Smouse 1994), median-joining network (Bandelt et al. 1999), statistical parsimony network (Templeton et al. 1992)). When recombination events are unlikely to have taken place (as is the case for mitochondrial sequence data), inferred cycles are still helpful in indicating the level of ambiguity, as alternative genealogical pathways can exist due to parallel or convergent (homoplasious; cf. Appendix IV) character changes (Fig. 5). In sharp contrast to most tree reconstruction methods, whose accuracy, consistency, and efficiency have been extensively studied using both mathematical and simulation approaches (e.g., Felsenstein 2004; Swofford et al. 1996 and references therein), little is known about the performance of different “network” construction methods. Given that scientific hypotheses may strongly depend on the topology of the inferred genealogies, comparative analyses are warranted to better understand the assumptions underlying some of the existing methods and to test whether differences can be generalized as indicative of systematic artefacts. Simulation studies are particularly helpful in exploring performance under finite sequence lengths, and have become the main tool for the evaluation of reconstruction methods. Usually, a model phylogeny is generated, then a set of sequences is evolved down the edges of the model phylogeny according to some chosen model of sequence evolution, and the sequences thus obtained at the tips are given as input to the reconstruction method under study. The resulting phylogeny is then compared to the model “true” tree to assess the accuracy of the reconstruction method.
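The simulation protocol just described can be sketched as follows. This is a deliberately minimal Python illustration under assumptions that are not in the original: the model tree is a fixed four-taxon tree ((A,B),(C,D)), sequences evolve under the Jukes-Cantor model, and the "reconstruction method" is a simple four-point criterion on uncorrected distances that stands in for whatever method is under study. All names are hypothetical.

```python
import math
import random

BASES = "ACGT"
SPLITS = {"AB|CD": (("A", "B"), ("C", "D")),
          "AC|BD": (("A", "C"), ("B", "D")),
          "AD|BC": (("A", "D"), ("B", "C"))}

def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t (substitutions/site) under JC."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def simulate_tips(n_sites, t_tip, t_int, rng):
    """Evolve sequences down the model tree ((A,B),(C,D))."""
    x = "".join(rng.choice(BASES) for _ in range(n_sites))  # ancestor of A and B
    y = evolve(x, t_int, rng)                               # ancestor of C and D
    return {"A": evolve(x, t_tip, rng), "B": evolve(x, t_tip, rng),
            "C": evolve(y, t_tip, rng), "D": evolve(y, t_tip, rng)}

def p_dist(s1, s2):
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def infer_split(tips):
    """Stand-in reconstruction: pick the split with the smallest within-pair distances."""
    scores = {}
    for name, ((a, b), (c, d)) in SPLITS.items():
        scores[name] = p_dist(tips[a], tips[b]) + p_dist(tips[c], tips[d])
    best = min(scores.values())
    winners = [name for name, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None        # ties count as unresolved

def accuracy(n_reps, n_sites, t_tip, t_int, seed=0):
    """Fraction of replicates in which the true split AB|CD is recovered."""
    rng = random.Random(seed)
    hits = sum(infer_split(simulate_tips(n_sites, t_tip, t_int, rng)) == "AB|CD"
               for _ in range(n_reps))
    return hits / n_reps

print(accuracy(n_reps=200, n_sites=100, t_tip=0.3, t_int=0.05))
```

Varying `n_sites` in such a setup is one way to explore how accuracy depends on sequence length; comparing several inference functions on the same replicates would extend the sketch towards a comparative study.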
Figure 5. Alternative interpretations of a cycle (loop) in a reticulated graph (middle). If the loop is cut between nodes U and W, a homoplasious character change at the first character is accepted (left side); if, in contrast, we cut between nodes U and V, parallel substitutions occurred at the second nucleotide site (right side).
Rooting
Most tree construction methods do not specify character polarity (i.e., determination of ancestral vs. derived character states) as they provide unrooted trees. The most commonly used method for rooting (cf. Appendix IV) a phylogenetic tree is the outgroup comparison, where a closely related species or group of species (outgroup) is included in the analysis (because outgroup taxa are likely to exhibit ancestral states). However, outgroup taxa must be chosen carefully. Although they should lie outside the group of interest (assumption of ingroup monophyly; cf. Appendix IV), the use of too distantly related outgroup taxa is problematic. Indeed, the likelihood that molecular character states shared by one taxon and the outgroup will be based on random similarity rather than on history increases with increasing divergence between the outgroup and the ingroup taxa (Milinkovitch et al. 1996; Milinkovitch & Lyons-Weiler 1998; Smith 1994; Templeton 1992; 1993; Wheeler 1990).
Rooting should become more reliable the closer the outgroup is to the ancestral sequence. While in general there has been only limited success in directly obtaining molecular sequences from fossil ancestors, one potential solution for outgroup rooting is to use paralogous genes (cf. Appendix IV), or paralogous duplicated fragments within genes (Löytynoja & Milinkovitch 2001a), with one paralogue serving as the phylogenetic root for the other (Donoghue & Mathews 1998; Telford & Holland 1997). This can be done when the gene duplication occurred after the divergence between the closest organismal outgroup and the ingroup taxa, but before ingroup diversification. Using duplicated genes in rooting analysis can be even more promising when one of the gene copies is subjected to very low substitution rates, such that ancestral states are preserved over long time scales. In mammals, for example, where mitochondrial DNA sequences evolve more rapidly than nuclear sequences (Brown et al. 1979; Brown et al. 1982), paralogous copies of mitochondrial genes that have been transferred to the nucleus (“Numts”, sensu Lopez et al. 1994) can serve as “molecular fossils” in rooting mitochondrial trees (Bensasson et al. 2001b; Perna & Kocher 1996; Zischler et al. 1995).
Species trees, gene trees, and the coalescent process
Under the neutral Wright-Fisher model, a population has a finite and constant size N, generations are discrete (i.e., non-overlapping), and each new generation is formed by randomly sampling, with replacement, alleles from the N parents of the current generation (Fig. 6a). If we start with a sample of alleles and go backwards in time, we will encounter points at which two gene lineages coalesce, and eventually find the most recent common ancestor (MRCA) of the sample.
With each copy of a gene being connected to a randomly chosen copy from the previous generation, genealogies of sampled alleles (i.e., the so-called “gene trees”) are random with respect to shape and evolutionary depth (Fig. 6b). Moreover, because of stochastic variance in the reproductive success of individuals, gene lineages can go extinct when an individual produces no offspring (Fig. 6c).
Figure 6. A representation of the complete genealogy for a population of ten haploid individuals is shown in (a). The ancestries of four and three extant lineages back to their most recent common ancestor are traced by the black and dashed lines and are shown in (b) & (c), respectively. Gene genealogies (inner lines) that are embedded within a three-species phylogeny (outer frames) are drawn in (d) and (e). In (d), gene and species divergences are almost concurrent such that the gene tree closely reflects the species tree; (e) illustrates the problem of differential lineage sorting (the time interval between the two speciation events is much shorter than in (d), and the distribution of retained ancestral polymorphism in species 1 & 2 is discordant with the species phylogeny). N_e, effective population size in the ancestral species of 1 & 2; t, time interval between two speciation events; x, gene lineages that go extinct.
Stochastic lineage sorting
Species trees that reflect the temporal succession of speciation events are constructed by estimating embedded gene trees, i.e., the genealogy of homologous sequences sampled in different species (Nordborg 2001; Rosenberg & Nordborg 2002). As long as time intervals between species-branching events are much greater than within-species coalescence times, gene and species divergences are nearly concurrent and gene trees are likely to reflect species trees accurately (Fig. 6d). However, when the time between speciation events is short and the effective population size in ancestral species is large, gene trees can differ from species trees due to incomplete lineage sorting of ancestral polymorphism (Avise & Wollenberg 1997; Felsenstein 2004; Maddison 1997; Moore 1995; Nei 1987; Nordborg 2001; Pamilo & Nei 1988; Takahata 1989). Such coalescent phenomena become obvious in studies near the species level (e.g., Moranda et al. 2004; Slade et al. 1994), but discordance is also expected in phylogenies of clades (cf. Appendix IV) that experienced an ancient but explosive radiation, as the rapid succession of nodes is prone to yield multiple, potentially conflicting, gene trees (Albertson et al. 1999; Löytynoja & Milinkovitch 2001a; Takahashi et al. 2001).
An assessment of genetic variation at multiple independent loci might then overcome the effects of incomplete sorting of alleles between speciation events (Albertson et al. 1999; Pamilo & Nei 1988; Slade et al. 1994), and several techniques have been proposed to determine whether sequences from different genes have the same evolutionary history and can be concatenated into a combined data set (e.g., partition homogeneity test (Cunningham 1997; Farris et al. 1994), and RASA tests (Lyons-Weiler & Milinkovitch 1997)).
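The dependence of gene tree/species tree discordance on the interval between speciation events and on ancestral population size can be made concrete with a standard result of neutral coalescent theory, which is not stated explicitly in the text above: for three species, one sampled gene copy per species and a diploid nuclear locus, the probability that the gene tree topology differs from the species tree is (2/3) exp(-t / (2 N_e)), where t is the time (in generations) between the two speciation events and N_e the effective size of the ancestral species. The Python sketch below simply evaluates this expression; the function name and the parameter values are hypothetical.

```python
import math

def prob_discordance(t_generations, n_e):
    """P(gene tree topology differs from the species tree) for three species,
    one gene copy per species, diploid ancestral effective size n_e."""
    return (2.0 / 3.0) * math.exp(-t_generations / (2.0 * n_e))

# Illustrative values only: the same interval t with a small or a large ancestral N_e.
for n_e in (1_000, 10_000, 100_000):
    print(n_e, round(prob_discordance(20_000, n_e), 4))
```

The probability tends to 2/3 when t is very short relative to N_e (the three topologies become equally likely) and to zero when t is long, matching the contrast between panels (e) and (d) of Figure 6.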
Coalescence theory
In population genetics, the uncertainty caused by the inherent randomness of evolution is increasingly addressed in data analysis by using coalescent-based approaches. The coalescent process (Hudson 1990; Kingman 1982) merges ancestral lineages going backwards in time, allowing random genealogies to be modelled under different demographic histories, including changes in population size and dispersal (Donnelly & Tavaré 1986; 1995; Hudson 1990; Nordborg 2001).
In a more formal description, we assume a diploid, panmictic (i.e., randomly mating) population of constant (through time) effective size $N$ (i.e., with $2N$ alleles). Under neutrality (no selection), lineages can be viewed as randomly picking their parents (cf. Fig. 6a). Let us first consider a sample of two lineages. In a retrospective view, the two sequences coalesce in the previous generation (i.e., derive from the same ancestral sequence in the preceding generation) with a probability of $1/(2N)$, and it takes on average $2N$ generations until they share a common ancestor (MRCA). A comparable reasoning can be followed when modelling the genealogy of a sample of $n$ lineages (Fig. 7a): the first two sequences in the sample to coalesce will do so in the previous generation with a probability of $n(n-1)/(4N)$ (because $n(n-1)/2$ sequence pairs can be compared), and this first coalescence event will be observed, on average, after $T_n = 4N/(n(n-1))$ generations. After the first merging, $n-1$ lineages are left and $(n-1)(n-2)/2$ pairs of sequences can coalesce in the preceding generation. The mean time needed to pass from $n-1$ to $n-2$ lineages is therefore $T_{n-1} = 4N/((n-1)(n-2))$. In this manner, the coalescence times $T_n, T_{n-1}, \ldots, T_2$ are estimated, going stepwise from $n$ lineages back to the unique common ancestor of the whole sample. Hence, the rate at which lineages coalesce depends on how many lineages are picking their parents (the more lineages, the faster the rate) and on the size of the population (the more parents to choose from, the slower the rate).
With constant population size, the mean time between two successive coalescence events becomes increasingly longer as we approach the MRCA of the sample. In fact, the time necessary to pass from n to 2 lineages is almost equivalent to the time needed for the last 2 lineages to coalesce, suggesting that the age estimate of the MRCA will not change significantly with larger sample sizes. Differences in mean coalescence times between nuclear and mitochondrial genes (Fig. 7b) can be explained by the fourfold smaller effective population size of the haploid and maternally inherited mitochondrial genome. One fundamental insight of coalescent theory is that, since selectively neutral mutations do not affect the reproductive success of individuals, it is possible to separate the coalescent (which is influenced by demography and life-history factors) from the mutational process. Genealogies are first modelled backward in time, and only afterwards are mutations superimposed on the tree.
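The mean coalescence times given above, 4N/(k(k-1)) generations while k lineages remain, can be tabulated exactly. The Python sketch below is an illustration with hypothetical function names; times are expressed in units of N generations so that the results are exact fractions.

```python
from fractions import Fraction

def expected_intervals(n):
    """Mean time (in units of N generations) spent with k lineages, for k = n..2,
    in a diploid population: 4 / (k (k - 1))."""
    return {k: Fraction(4, k * (k - 1)) for k in range(n, 1, -1)}

def expected_tmrca(n):
    """Mean time to the MRCA of the whole sample (in units of N generations)."""
    return sum(expected_intervals(n).values())

intervals = expected_intervals(5)   # k=5: 1/5, k=4: 1/3, k=3: 2/3, k=2: 2
for n in (5, 20, 100):
    total = expected_tmrca(n)
    last_two = expected_intervals(n)[2]
    print(n, total, total - last_two, last_two)
```

For five lineages the intervals are N/5, N/3, 2N/3 and 2N generations. The sum over all intervals equals 4N(1 - 1/n), so the time needed to go from n to 2 lineages approaches, but never exceeds, the 2N generations expected for the last two lineages.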
[Figure 7a branch labels: mean coalescence intervals of 2N, 2N/3, 2N/6 and 2N/10 generations]
Figure 7. (a) Example of a gene genealogy relating five nuclear gene copies, with branch lengths being proportional to average coalescence times expected in a population of N diploid individuals. Note that, going backwards in time, the next mean coalescence time increases with decreasing number of gene lineages that can coalesce. (b) Gene genealogy relating five mitochondrial or Y-chromosome gene copies sampled in the same population as in (a); average coalescence times for haploid, uniparentally inherited loci are four times shorter than for nuclear loci, simply because four times fewer gene copies are present in the population. (c) Four actual realizations (simulations), all drawn to the same scale, of the coalescent for six gene copies.
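Realizations such as those in panel (c) can be generated by drawing each waiting time at random. The Python sketch below is an assumption-laden illustration, not the procedure used to draw the figure: it uses the usual continuous-time approximation in which the time spent with k lineages is exponentially distributed with mean 4N/(k(k-1)) generations, records only the waiting times (not the topology), and uses hypothetical names and an arbitrary population size.

```python
import random

def simulate_coalescent_times(n, N, rng):
    """Draw the waiting times T_n, ..., T_2 (in generations) for n sampled gene
    copies in a diploid population of size N (continuous-time approximation)."""
    times = {}
    for k in range(n, 1, -1):
        rate = k * (k - 1) / (4.0 * N)     # coalescence rate with k lineages
        times[k] = rng.expovariate(rate)   # exponential waiting time, mean 1/rate
    return times

rng = random.Random(42)
N = 10_000                                 # arbitrary illustrative value
realizations = [simulate_coalescent_times(6, N, rng) for _ in range(4)]
for times in realizations:
    print(round(sum(times.values())), {k: round(t) for k, t in times.items()})
```

Individual realizations differ widely in total depth, mostly because of the long and highly variable final interval with two lineages, whereas the average over many replicates converges to 4N(1 - 1/n).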
Inferences using the coalescent
Genealogies modelled under the coalescent demonstrate the considerable variability of trees, with respect to both topology and branch lengths (Fig. 7c). In contrast to likelihood inference in phylogenetics, where the main objective of the analysis is to estimate the tree topology (cf. above), the genealogy itself is not of much interest in a coalescent setting. The analogous equation is therefore $L = \sum_{G} P(D \mid G)\,P(G \mid \Theta)$, where $\Theta$ is the collection of