1.2 Environmental diversity of eukaryotes

1.2.2 The molecular revolution

1.2.2.1 The ribosomal RNA gene marker

The ribosomal RNA gene (rDNA) is the main marker for the taxonomic assignment and diversity analysis of environmental sequencing data. The rDNA is present in all living organisms and thus conveys a universal evolutionary signal. It is used as a surrogate for the species entities defined by the phylogenetic species concept, which was developed on the basis of analyses of the small subunit of the rDNA (SSU rDNA) (Woese et al., 1990). The rDNA provided the basis for a classification scheme relying on the taxonomic nomenclature of the Linnean system, including phylogenetically erected lineages in which HTS metabarcoding sequences could be placed. Prokaryotes are now classified according to a phylogenomic species concept built from metagenomic data. Indeed, complex metagenomic data have revealed new phyla that provide support for deep branching relationships (Brown et al., 2015a), and have yielded rDNA sequence tags that allow the accurate description of environmental communities (Logares et al., 2014).

The situation is more controversial for microbial eukaryotes because informative variation at the morphological or ecological level is not necessarily corroborated by variation in the rDNA sequence (Grattepanche et al., 2014). Indeed, it seems that the evolutionary signal conveyed by SSU rDNA sequences is scrambled by uneven rates of evolution and mutational saturation (Philippe and Laurent, 1998), and for phylogenetic purposes it has been largely replaced by the signal contained in arrays of protein-coding genes (Katz and Grant, 2015; Sierra et al., 2013). Even though the taxonomic assignment of environmental sequences and the phylogenetic analysis of rDNA sequences are different procedures, the former may rely on methods and knowledge developed for the latter. For instance, I apply phylogeny-based methods for the assignment of metazoan sequences in the study investigating metabarcoding for environmental monitoring using faunal biotic indices (Chapter 10).

The rDNA encodes the rRNA, which folds into stem-loop secondary structures during the maturation of the ribosome machinery. The primary structures of these stem-loops correspond to hypervariable sequence regions that are usually named by the letter V followed by a number indicating the rank of the stem-loop within the full rRNA secondary structure. Usually, one or a few rDNA hypervariable regions are targeted by PCR amplification in order to describe the diversity of environmental communities. Various regions have been tested for eukaryotes [e.g. V1-V3 (Pochon et al., 2013), V2 (Geisen et al., 2015) or V9 (Amaral-Zettler et al., 2009; Pawlowski et al., 2011a; Brannock and Halanych, 2015)]. The V4 region has been recommended because it offers the highest taxonomic coverage (Pawlowski et al., 2012) and because it is present in most reference sequences (Hu et al., 2015). Moreover, the V4 region has been shown to better reflect the variation contained within the full-length SSU rDNA (Dunthorn et al., 2012) as well as the results of shotgun sequencing metagenomics (Tremblay et al., 2015). Hence, we employed this region for our HTS metabarcoding survey of Metazoa (Chapter 10).

HTS metabarcoding essentially delivers taxonomic information, which allows not only the assignment of environmental sequences but also the delineation of environmental species, often referred to as Operational Taxonomic Units (OTUs) or Molecular Operational Taxonomic Units (MOTUs) (Blaxter et al., 2005). Intense research efforts have been devoted to reconciling or refining the morphological species classification by taking advantage of the taxonomic resolution of markers such as the rDNA or the COI genes (Puillandre et al., 2012a,b), independently for various taxa (e.g. brachiopods: Bitner and Cohen, 2015; sponges: Dohrmann et al., 2012). Indeed, before applying a candidate marker region to metabarcoding, it is wise to evaluate its taxonomic signal at various taxonomic levels, as different hypervariable regions offer different resolutions across taxa (Hadziavdic et al., 2014; Pernice et al., 2013). The taxonomic resolution of rDNA markers is being described in great detail for foraminiferans (Göker et al., 2010; Morard et al., 2015; Pawlowski and Lecroq, 2010), haptophytes (Egge et al., 2015b), ciliates (Dunthorn et al., 2012; Stoeck et al., 2014), diatoms (Luddington et al., 2012), marine stramenopiles (Massana et al., 2014), radiolarians (Decelle et al., 2014), or even at a coarse taxonomic level for all major eukaryotic lineages (Pernice et al., 2013). Extensive evaluations performed independently for each lineage are highly relevant because environmental sequences could first be roughly sorted into unambiguous, non-overlapping bins thanks to higher-level taxonomic sequence signatures (Lejzerowicz et al., 2014; Pawlowski et al., 2014a), sequence base composition (or k-mer frequencies) (Cole et al., 2007) or sets of diagnostic positions (Sarkar et al., 2008). Indeed, the distances derived from sequence alignments might confound taxonomic classification and lump together sequences from different higher-level taxa, as could be predicted for ciliates (Fig. 1.2).
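The idea of coarse binning by k-mer frequencies can be illustrated with a minimal Python sketch. It is not the classifier of Cole et al. (2007) nor the procedure used in this work: the function names (`kmer_profile`, `cosine_similarity`, `assign_bin`), the choice of cosine similarity, the default `k=3` and the toy sequences in the usage note are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Relative frequencies of overlapping k-mers made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip k-mers with ambiguous bases
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse k-mer frequency profiles."""
    dot = sum(v * q.get(kmer, 0.0) for kmer, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def assign_bin(query, references, k=3):
    """Return (bin name, score) of the reference most similar to the query.

    `references` is a hypothetical mapping: bin name -> list of sequences.
    """
    query_profile = kmer_profile(query, k)
    best_bin, best_score = None, -1.0
    for name, seqs in references.items():
        for ref in seqs:
            score = cosine_similarity(query_profile, kmer_profile(ref, k))
            if score > best_score:
                best_bin, best_score = name, score
    return best_bin, best_score
```

With toy references such as `{"bin_AT": ["ATATATATATAT"], "bin_GC": ["GCGCGCGCGCGC"]}`, calling `assign_bin("ATATATATAT", references, k=2)` places the query in `bin_AT`. Because no alignment is involved, such a step could serve as a pre-sorting stage before the pairwise alignments are computed within each bin.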

Figure 1.2: Non-metric multidimensional scaling of the aligned sequence distances for ciliates. This figure demonstrates that a distance computed from the alignment of sequences belonging to completely different families could be smaller than that computed within a family, supporting the taxonomic binning of sequences prior to the computation of pairwise sequence alignments for clustering. Unpublished results.

Moreover, for each taxon bin, an optimal set of parameters could be defined in order to optimize the sequence alignments undertaken for precise, species-level assignments as well as for the clustering of sequences into OTUs. Indeed, the treatment of gaps in pairwise sequence alignments is generally set to some default behavior by different clustering algorithms (e.g. consecutive gaps counted as one gap in Swarm (Mahé et al., 2015) or each gap counted separately as in mothur (Schloss and Westcott, 2011)). This is highly relevant as some groups such as the ciliates and foraminiferans comprise naturally polymorphic species that only diverge in the length of homopolymer stretches, for which treating multiple contiguous gaps as one mutational event would be appropriate (Grattepanche et al., 2014). For ciliates, the primary structure of the SSU rDNA V4 region is valuable (Dunthorn et al., 2012), but its secondary structure also carries useful information at the genus level (Wang et al., 2015). In fact, ongoing advances towards the characterization of the evolution of ribosomal RNA folding and structure hold great promise, and may even allow the resurgence of SSU rDNA-based phylogenies. Indeed, recent releases of phylogenetic software now incorporate secondary structure information as discrete characters for inference (Stamatakis, 2014). Documenting how evolution shaped the primary (and if applicable the secondary) structure of diverse rDNA markers with respect to phylogenetic systematics and ecological preferences observed in the environment represents a long-term common thread that will form a strong basis for future descriptions of HTS marker taxonomic and ecological resolutions.
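The two gap-counting conventions contrasted in this paragraph can be made concrete with a small Python sketch. It assumes that a pairwise alignment is already available as two equal-length strings using `-` as the gap character. The function names `count_gap_events` and `count_differences` are hypothetical, and the code does not reproduce the implementation of Swarm or mothur.

```python
import re


def count_gap_events(aligned_a, aligned_b, collapse_runs=True):
    """Count gaps between two aligned sequences of equal length.

    collapse_runs=True  : each run of contiguous gaps counts as one event.
    collapse_runs=False : every gap position counts separately.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # 'a' marks a gap in the first sequence, 'b' a gap in the second,
    # '.' any other column (including columns gapped in both sequences).
    mask = "".join(
        "a" if x == "-" and y != "-" else "b" if y == "-" and x != "-" else "."
        for x, y in zip(aligned_a, aligned_b)
    )
    if collapse_runs:
        return len(re.findall(r"a+|b+", mask))
    return mask.count("a") + mask.count("b")


def count_differences(aligned_a, aligned_b, collapse_runs=True):
    """Mismatches plus gaps, under the chosen gap-counting convention."""
    mismatches = sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x != y and "-" not in (x, y)
    )
    return mismatches + count_gap_events(aligned_a, aligned_b, collapse_runs)
```

For two toy sequences that differ only in the length of a homopolymer stretch, `ACGTTTTTGCA` and `ACGTT---GCA`, `count_differences` returns 1 when contiguous gaps are collapsed and 3 when each gap position is counted separately. The same pair of sequences could therefore fall on either side of a clustering threshold depending on the convention chosen.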