

V. Presentation of the research project

2. Article: Expansion of banana gene families involved in ethylene biosynthesis and signalling after

Cyril Jourda1, Céline Cardi1, Didier Mbéguié-A-Mbéguié2,3, Stéphanie Bocs1, Olivier Garsmeur1, Angélique D’Hont1, Nabila Yahiaoui1*

1CIRAD, UMR AGAP, F-34398 Montpellier, France

2CIRAD, UMR QUALISUD, F-97130 Capesterre-Belle-Eau, Guadeloupe, France

3CIRAD, UMR QUALISUD, F-34398 Montpellier, France

*Corresponding author: Nabila Yahiaoui

CIRAD, UMR AGAP, Avenue Agropolis, TA A96/03, F-34398 Montpellier, France

Phone: +33 4 67 61 44 01

E-mail: nabila.yahiaoui@cirad.fr

Note: The supplementary information for the article is available in Appendix 1, pages 303-335.


SUMMARY

• Whole genome duplications (WGD) are widespread in plants and three lineage-specific WGDs occurred in the banana (Musa acuminata) genome. Here, we analysed the impact of WGDs on the evolution of banana gene families involved in ethylene biosynthesis and signalling, a key pathway for banana fruit ripening.

• Banana ethylene pathway genes were identified using comparative genomics approaches and their duplication modes and expression profiles were analysed.

• Seven out of ten banana ethylene gene families evolved through WGD and four of them (ACS, EIL, EBF, ERF) were preferentially retained. Banana orthologs to AtEIN3 and AtEIL1, two major genes for ethylene signalling in Arabidopsis, were particularly expanded. This expansion was paralleled by that of EBF genes, which are responsible for the control of EIL protein levels. Gene expression profiles in banana fruits suggested functional redundancy for several MaEBF and MaEIL genes derived from WGD and subfunctionalisation for some of them.

• We propose that EIL and EBF genes were co-retained after WGD in banana to maintain a balanced control of EIL protein levels and thus avoid detrimental effects of constitutive ethylene signalling. In the course of evolution, subfunctionalisation was favoured to promote a finer control of ethylene signalling.

Key words: whole genome duplication; banana; Musa acuminata; gene families; ethylene;


INTRODUCTION

Gene duplication is a major mechanism generating new templates for evolutionary innovation in eukaryotes (Ohno, 1970; Lynch & Conery, 2000). Gene duplicates may originate from single gene duplications such as tandem and proximal duplications, or from large-scale duplications (Maere et al., 2005). Tandem duplicates are adjacent and mainly result from unequal crossing-over whereas, in proximal duplications, duplicates are separated by a few other genes and might result from unequal crossing-over or transposon activities (Wang et al., 2012). Large-scale gene duplications, including whole genome duplications (WGD) and segmental duplications (i.e. duplications of a chromosomal region; Koszul & Fischer, 2009), are frequent in the history of angiosperm genomes. A hexaploidisation event (γ triplication) occurred near the origin of Eudicots (Jiao et al., 2012). It was followed by lineage-specific duplications in some taxa, such as two tetraploidisation events in Arabidopsis thaliana (α and β; Blanc et al., 2003; Bowers et al., 2003) and a hexaploidisation in Solanaceae (“T” triplication; The Tomato Genome Consortium, 2012). Within Monocots, two WGD events were characterised in sequenced Poaceae genomes. The ρ WGD occurred around 50-70 million years ago (MYA) (Paterson et al., 2004; Salse et al., 2008) and the σ WGD was dated earlier in the monocot lineage (Tang et al., 2010). In addition, a recent WGD occurred in Zea mays around 5 to 12 MYA (Schnable et al., 2009). Recently, the genome of banana (Musa acuminata), a monocotyledon from the order Zingiberales, was sequenced using DH-Pahang, a doubled-haploid (523 Mb, 2n=22) derived from a seedy diploid of the subspecies malaccensis (D’Hont et al., 2012). This subspecies contributed one of the three M. acuminata genomes of the sterile triploid cultivar Cavendish (AAA genome), which accounts for half of worldwide banana production (Lescot, 2011). Analyses of the banana genome revealed three rounds of WGD (α, β and γ) that were not shared with the Poales or the Arecales (palms) (D’Hont et al., 2012). The α and β WGDs were estimated to have occurred within a short time frame around 65 MYA whereas the γ WGD was dated around 100 MYA. The availability of the banana genome sequence offers the opportunity to study the evolution of banana gene families in the context of the three WGDs.

Following duplication, paralogous genes can have different fates. They can become pseudogenes or be lost, and it is now well established that, over evolutionary time, most WGD duplicate genes are lost through fractionation (Lockton & Gaut, 2005). This process has a major impact on the evolution of plant genes as some of them are preferentially retained after WGD or are found preferentially in a singleton status (Freeling, 2009). In addition, it has been observed that functional categories of genes that were more likely to be retained after WGD were less likely to be retained after tandem duplication and vice versa (Freeling, 2009; Woodhouse et al., 2011; Rodgers-Melnick et al., 2012). In banana, the gene categories most retained after WGD included transcription factors, signal transduction genes and translational elongation genes, similar to findings in A. thaliana (Blanc & Wolfe, 2004; Maere et al., 2005; D’Hont et al., 2012). Retention of these gene categories has been explained by the gene balance hypothesis (Birchler et al., 2001; Papp et al., 2003; Freeling & Thomas, 2006), which states that genes encoding products that are in a balanced interacting relationship, such as those encoding members of a protein complex or involved in multiple steps in regulatory cascades, will tend to be dosage sensitive because changes in the stoichiometry of individual components will be detrimental. These genes are thus more prone to be co-retained after WGD (Birchler & Veitia, 2007). Other models for duplicate gene retention include neofunctionalisation, where one of the duplicates acquires a new function, and subfunctionalisation, where the two copies share the function of the ancestral gene (Force et al., 1999).

To analyse gene family evolution, we focused on a key pathway for banana fruit ripening, the ethylene biosynthesis and signalling pathway (Fig. S1). Banana fruits are climacteric: they are characterised by drastic changes in ethylene production with an increased respiration burst during ripening (Burg & Burg, 1965; Liu et al., 1999). In addition, export bananas are ripened by exogenous application of ethylene. In A. thaliana, ethylene is perceived by a family of five ethylene receptors (ETR1 (ethylene response 1), ETR2, ERS1 (ethylene response sensor 1), ERS2 and EIN4 (ethylene insensitive 4); reviewed by Shakeel et al., 2013). Ethylene receptors act as negative regulators of signalling through constitutive activation of the Ser/Thr kinase CTR1 (constitutive triple response 1; Kieber et al., 1993). The RAN1 (response-to-antagonist 1) protein is a copper transporter that is essential for biogenesis of ethylene receptors (Binder et al., 2010) and RTE1 (reversion-to-ethylene sensitivity 1) is involved in the function of the ETR1 receptor (Resnick et al., 2008). In the presence of ethylene, receptors inactivate CTR1, thus relieving suppression of downstream signalling components. The EIN2 (ethylene insensitive 2) protein, an endoplasmic reticulum-bound protein (Alonso et al., 1999), is processed and its C-terminal domain migrates into the nucleus (Qiao et al., 2012; Wen et al., 2012). There, it activates the EIN3 (ethylene insensitive 3)/EIN3-like (EIL) transcription factors which, in turn, initiate the ethylene transcriptional responses by binding to specific elements in promoter regions of genes encoding ethylene response factors (ERF). Additional regulation of ethylene signalling occurs at the post-transcriptional level: EIN3 protein levels are regulated through EIN3-binding F-box (EBF) proteins, which are components of SCF (Skp, Cullin, F-box containing) complexes (Guo & Ecker, 2003; Potuschak et al., 2003; Gagne et al., 2004). In banana, genes encoding the two main enzymes of the ethylene biosynthesis pathway, 1-aminocyclopropane-1-carboxylate synthases and oxidases (ACS and ACO), were identified based on cDNA amplification (two ACO and four ACS genes; Liu et al., 1999; Inaba et al., 2007). In addition, three ERS genes (Yan et al., 2011), one CTR1-like (Hu et al., 2012), five EIL (Mbéguié-A-Mbéguié et al., 2008), two EBF (Chen et al., 2011; Kuang et al., 2013), and fifteen ERF (Xiao et al., 2013) genes were identified, but no complete inventory of these gene families could be performed.

Here, we identified all members of ten gene families of the banana ethylene pathway using genome-scale approaches. We analysed their evolutionary patterns with a specific focus on the EIL and EBF gene families, which play a central role in the control of ethylene signalling. Our results showed a co-expansion of several ethylene gene families after banana WGD. Based on expression data, the co-expansion of a specific subgroup of EIL genes and of EBF genes can be partly associated with functional redundancy; however, subfunctionalisation also occurred in both families.

MATERIALS AND METHODS

Identification of ethylene pathway genes

For each gene family, clusters of protein sequences from twelve plant species (Table S1) were identified using Pathway Tools databases (Karp et al., 2002) including MusaCyc (http://banana-genome.cirad.fr/musacyc; Droc et al., 2013); the Greenphyl database (http://www.greenphyl.org/cgi-bin/index.cgi; Rouard et al., 2011), InterProScan (Quevillon et al., 2005) and BLASTP clustering using a protein reference list (Table S2). Only the longest sequence of each gene was kept. To identify Musa ERF, APETALA2/ethylene-responsive element-binding proteins (AP2/EREBP) were identified by searching the Musa proteome for InterPro domain IPR001471 and were classified following a specific approach (Methods S1, Table S3). For other plants, ERF numbers were retrieved from published data. Gene family expansion was detected by comparing the number of family members in banana to the numbers of gene family members in other species after standardisation using the predicted proteome size of each species. A chi-square (χ²) test was applied to test the significance of the observed difference.
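The comparison described above can be sketched as a 2×2 chi-square test on family members versus the rest of the predicted proteome. This is only an illustration under stated assumptions: the exact layout of the test is not given in the text, the function names are hypothetical, and the rice proteome size below is a placeholder (the real values are in Table S1). Only the gene counts (11 banana and 5 rice ACS genes, Table 1) and the banana total of 36,542 predicted protein-coding genes come from the article.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def per_10k(family_size, proteome_size):
    """Family size standardised to a proteome of 10,000 proteins."""
    return 10000.0 * family_size / proteome_size


MUSA_PROTEOME = 36542              # banana predicted protein-coding genes (article)
RICE_PROTEOME_PLACEHOLDER = 40000  # hypothetical value, NOT from the article

musa_acs, rice_acs = 11, 5         # ACS gene counts from Table 1

stat, p = chi2_2x2(musa_acs, MUSA_PROTEOME - musa_acs,
                   rice_acs, RICE_PROTEOME_PLACEHOLDER - rice_acs)
print(f"ACS per 10k proteins: banana {per_10k(musa_acs, MUSA_PROTEOME):.2f}, "
      f"rice {per_10k(rice_acs, RICE_PROTEOME_PLACEHOLDER):.2f}")
print(f"chi2 = {stat:.2f}, P = {p:.3f}")
```

Because the comparison proteome size is a placeholder, the printed P value is not meaningful in itself; the sketch only shows how standardisation by proteome size enters the test.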


Phylogenetic tree reconstruction

Protein sequences were aligned using MAFFT version 6.717b (Katoh & Toh, 2008). To improve alignments, genes were manually curated when necessary. Maximum-likelihood phylogenetic analysis was performed using PhyML version 3.0 (Guindon et al., 2010) under the LG evolution model. Tree topology was reconstructed using the best of the NNI (nearest neighbour interchange) and SPR (subtree pruning and regrafting) methods. Branch supports were estimated using an approximate likelihood ratio test with a Shimodaira-Hasegawa-like procedure (Guindon et al., 2010). Phylogenetic trees were visualised with FigTree v.1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).

Identification of gene duplication modes

Duplicated copies of genes were identified by an all-by-all comparison of Musa predicted proteins using BLASTP (E-value cut-off of 1e-10) and the five best non-self protein matches were selected. WGD gene pairs were identified based on Musa ancestral blocks available at http://banana-genome.cirad.fr/dotplot (D’Hont et al., 2012) and in the Plant Genome Duplication Database (http://chibba.agtec.uga.edu/duplication/index/downloads; Lee et al., 2013). Additional small paralogous relationships that could correspond to potential segmental duplications were detected using SynMap (http://genomevolution.org/CoGe/SynMap.pl) with a 3 to 3 quota-align ratio and default parameters (Tang et al., 2011). Fine analysis of duplicated regions was carried out with SynFind (http://genomevolution.org/CoGe/SynFind.pl) and the most conserved pairs deriving from the Musa α WGD were identified using SynMap with a quota-align ratio of 1 to 1 (Tang et al., 2011). Tandem and proximal duplications were considered when two duplicated genes were consecutive in the genome or separated by twenty or fewer gene loci, respectively. A χ² test was used to identify retention bias for ethylene pathway gene families compared to genome-wide retention. For genes from other plant species, gene duplication modes were identified using published WGD data (Bowers et al., 2003; Tang et al., 2010; Schnable et al., 2011b; The Tomato Genome Consortium, 2012) and the approach described above. Duplication modes for ACS, EBF, EIL and ERF banana gene families were visualised with Circos (Krzywinski et al., 2009).
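The tandem/proximal rule stated above can be sketched as a small function. This is an illustration, not the authors' script: the function name and the input representation (0-based ranks of genes along one chromosome) are hypothetical, and reading "separated by twenty or fewer gene loci" as 1 to 20 intervening genes is an assumption.

```python
def classify_local_duplication(index_a, index_b, max_intervening=20):
    """Classify two duplicated genes located on the same chromosome.

    index_a, index_b: 0-based ranks of the genes in chromosome gene order.
    Returns "tandem" if the genes are consecutive, "proximal" if they are
    separated by 1 to max_intervening gene loci, otherwise None.
    """
    if index_a == index_b:
        raise ValueError("the two genes must be distinct")
    intervening = abs(index_a - index_b) - 1
    if intervening == 0:
        return "tandem"
    if intervening <= max_intervening:
        return "proximal"
    return None


# Hypothetical gene ranks, for illustration only.
print(classify_local_duplication(10, 11))  # consecutive genes
print(classify_local_duplication(10, 31))  # 20 intervening genes
print(classify_local_duplication(10, 32))  # 21 intervening genes
```

Pairs that fall outside both categories return None here and would be handled by the WGD and segmental analyses described in the same paragraph.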


Gene structure and molecular evolution analyses

Exon/intron structures of EIL and EBF genes were retrieved from http://www.arabidopsis.org/ (TAIR10), http://rice.plantbiology.msu.edu/ (MSU7 version) and http://banana-genome.cirad.fr/ (Gaze version 1) and were manually curated if necessary. Protein domains were identified using InterProScan and published data (Gagne et al., 2004). For molecular evolution analysis, coding sequence alignments were guided by protein sequence alignments using PAL2NAL (Suyama et al., 2006). To estimate variation in selective pressure for the EBF and EIL gene families, branch models of CODEML in PAML were constructed to estimate ω (= dN/dS), the ratio of non-synonymous (dN) to synonymous (dS) substitution rates, under two different assumptions (Yang, 2007). The model assuming a single ω for all branches (the one-ratio model: M0) was compared to the free-ratio model M2, which assumes an independent ω for each selected lineage. A likelihood-ratio test was used to compare the fit of the two models (Yang, 1998).
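The model comparison described above rests on the standard likelihood-ratio statistic, 2(lnL1 - lnL0), compared to a χ² distribution whose degrees of freedom equal the number of additional free parameters in the richer model. The sketch below is a minimal standard-library implementation; the function names are hypothetical and the log-likelihood values in the usage line are placeholders, not results from the article.

```python
import math


def chi2_sf(x, df):
    """Survival function P(X >= x) of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(x | k + 2) = Q(x | k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2:
        k, q = 1, math.erfc(math.sqrt(half))
    else:
        k, q = 2, math.exp(-half)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, extra_params):
    """Return (2 * delta lnL, P value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, extra_params)


# Placeholder log-likelihoods for a one-ratio model and a richer branch model
# with two additional omega parameters (illustrative values only).
stat, p = likelihood_ratio_test(-1000.0, -995.0, 2)
print(f"2*delta(lnL) = {stat:.2f}, P = {p:.4f}")
```

In practice the two log-likelihoods would be read from the CODEML output of each run, and the degrees of freedom from the difference in parameter counts reported there.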

Plant materials for gene expression analysis

Banana (Musa acuminata Colla) fruits of the Cavendish cultivar, grown at a banana farm in Guadeloupe, were harvested at the immature green, early mature green and late mature green developmental stages (40, 60 and 90 days after flowering (DAF), respectively; Mbéguié-A-Mbéguié et al., 2007). After harvest, one fruit per bunch was sampled (T0 condition) and all other harvested fruits were kept for 24 h at 20°C in chambers ventilated with humidified air. Half of the harvested fruits were treated for 24 h with 10,000 ppm of acetylene, an ethylene analogue, followed by storage in ventilated chambers with humidified air at 20°C. One fruit per bunch and per condition was sampled at 2 days (T3 condition) and 4 days (T5 condition) after treatment. The physiological state of fruits was monitored by measuring colour change and extent of softening as in Mbéguié-A-Mbéguié et al. (2008). Pulp and peel tissues corresponding to the median part of the fruit were separately frozen in liquid nitrogen and stored at -80°C.

RNA extraction and quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis

Total RNA was extracted from 600 mg of fruit tissue using a TE3D/MATAB protocol (Argout et al., 2008) followed by a lithium chloride (2 M) precipitation step. RNA was treated with RQ1 DNase (Promega, Madison, WI, USA) and purified using the RNeasy® MinElute™ Cleanup Kit (Qiagen®, Hilden, Germany). Quantity and quality of RNA were analysed through agarose gel electrophoresis and with an Agilent Bioanalyzer 2100 and RNA 6000 Nano LabChips (Agilent Technologies, Waldbronn, Germany). First-strand cDNA was synthesised from 1 µg RNA using SuperScript® III reverse transcriptase (Invitrogen™, Carlsbad, CA, USA). Primers were designed using Primer Blast and Primer Designer tools (Droc et al., 2009, http://banana-genome.cirad.fr/) and are listed in Table S4. Primer specificities were confirmed by amplicon sequencing and melting-curve analysis. The qRT-PCR experiments (see Methods S2) were performed in duplicate for four biological replicates per condition in 384-well plates using a LightCycler® 480 system (Roche Applied Sciences, Switzerland). Normalised transcript abundances ($A = E_{target}^{-Cp_{target}} / E_{reference}^{-Cp_{reference}}$) were calculated using LightCycler® 480 SW software version 1.5 and MaActin2 (GSMUA_Achr1G05990_001) as a reference gene. Statistical analysis was performed with an ANOVA after a logarithmic transformation of raw data, followed by a Tukey’s test.

Table 1 Gene numbers of ethylene pathway gene families in twelve plant genomes

                    Monocots                                  Eudicots
Gene family         Ma     Os     Bd     Sb     Zm     Pd     Tp     At     Vv     Sl     Pp     Fv
ACS                 11     5      3      3      4      5      9      10     7      11     6      8
ACO                 12     8      6      8      12     9      5      5      3      7      5      5
ERS/ETR/EIN4-like   7      5      6      6      5      6      6      5      4      8      4      4
RAN1-like           2      2      2      2      2      1      1      1      1      1      1      1
RTH                 2      3      2      3      4      1      2      2      2      3      2      2
CTR1-like           3      2      2      2      1      1      1      1      1      3      1      1
EIN2-like           3      2      3      3      3      2      1      1      1      3      1      1
EBF                 5      2      2      2      4      2      2      2      2      6      2      2
EIL                 17     7      6      6      9      6      6      6      4      9      4      5
ERF                 122    82(a)  NA     53(b)  84(c)  NA     NA     65(a)  82(d)  68(e)  59(f)  NA

Ma, Musa acuminata; Os, Oryza sativa; Bd, Brachypodium distachyon; Sb, Sorghum bicolor; Zm, Zea mays; Pd, Phoenix dactylifera; Tp, Thellungiella parvula; At, Arabidopsis thaliana; Vv, Vitis vinifera; Sl, Solanum lycopersicum; Pp, Prunus persica; Fv, Fragaria vesca; NA, not available; (a) Nakano et al., 2006; (b) Yan et al., 2013; (c) Zhuang et al., 2010; (d) Licausi et al., 2010; (e) Sharma et al., 2010; (f) Zhang et al., 2012.

Table 2 Modes of duplication of banana genes

Number of genes involved in different duplication modes (a)

Gene family         Duplication modes(b)  WGD        Segmental  Tandem     Proximal   Unknown(c) Unique(d)
Genome-wide         -                     14,771     2,717      1,258      1,537      10,728     8,225
ACS                 10                    9          3          0          0          1          0
ACO                 5                     2          0          3          2          7          0
ERS/ETR/EIN4-like   5                     4          2          0          0          2          0
RAN1-like           2                     2          0          0          0          0          0
RTH                 0                     0          0          0          0          2          0
CTR1-like           2                     1          0          0          2          1          0
EIN2-like           3                     3          0          0          0          0          0
EBF                 5                     5          0          0          0          0          0
EIL                 12                    12         6          0          0          5          0
ERF                 92                    91         4          4          0          30         0
Total               136                   129        15         7          4          48         0

(a) The same gene can be involved in different types of duplication

(b) Number of genes duplicated by at least one identified duplication mode

(c) Number of genes involved only in unknown duplications
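The normalised transcript abundance defined in the qRT-PCR methods, A = E_target^(-Cp_target) / E_reference^(-Cp_reference), can be written as a short function. This is a sketch only: the function name is hypothetical, the efficiencies and crossing points in the usage line are invented for illustration, and the LightCycler software may apply additional corrections that are not described in the text.

```python
def normalised_abundance(e_target, cp_target, e_reference, cp_reference):
    """Return A = E_target^(-Cp_target) / E_reference^(-Cp_reference).

    e_target, e_reference: amplification efficiencies (2.0 means perfect doubling).
    cp_target, cp_reference: crossing points of the target and reference genes.
    """
    if e_target <= 0 or e_reference <= 0:
        raise ValueError("amplification efficiencies must be positive")
    return e_target ** (-cp_target) / e_reference ** (-cp_reference)


# Hypothetical values: perfect doubling for both amplicons, target crossing
# point five cycles later than the reference gene.
print(normalised_abundance(2.0, 25.0, 2.0, 20.0))  # 0.03125, i.e. 2**-5
```

With equal efficiencies the expression reduces to E^(Cp_reference - Cp_target), so each additional cycle needed by the target halves its abundance relative to the reference when E = 2.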

RESULTS

Expansion of banana gene families involved in the ethylene pathway

Members of ten gene families involved in the core ethylene biosynthesis and signalling pathway (Fig. S1) were identified using predicted proteomes of twelve plant species (Table S1) including M. acuminata, and representatives of Monocots (rice, Brachypodium, sorghum, maize, date palm) and Eudicots (Arabidopsis, Thellungiella parvula, grapevine, tomato, peach and woodland strawberry). For each gene family, the total number of genes was compared (Table 1). ACS genes were more numerous in Musa than in the Poaceae (P = 0.007). The eleven banana ACS genes encode proteins belonging to Type I, Type II and Type III ACS as defined in Arabidopsis and tomato (Yamagami et al., 2003; Yoshida et al., 2005; Fig. S2). Banana and maize showed the highest ACO gene number with 12 members. Banana and tomato had three CTR1-like genes compared to one or two in other species. They also had five and six EBF genes, respectively, whereas other species had two members (except maize with four EBF genes). In addition, banana gene families involved in transcriptional regulation of ethylene signalling were significantly expanded, with seventeen EIL members (P < 0.001) and 122 ERF members (P < 0.001, Fig. S3). Thus, six out of ten ethylene pathway gene families (ACS, ACO, CTR1-like, EBF, EIL and ERF) showed high gene numbers in banana, with a particular expansion of EIL and ERF transcription factors compared to other species. The genes encoding ethylene receptors, RAN1-like and RTH (RTE1-homolog) proteins were not particularly expanded in banana.

Figure 1 Duplication modes for ACS, EBF, EIL and ERF banana gene families visualised with Circos.

Genes are located on the eleven DH-Pahang chromosomes and on Musa ancestral blocks coloured as previously described in D’Hont et al., 2012. Gene duplication through WGD, potential segmental duplications or tandem/proximal duplications are indicated in blue, green and red, respectively. Genes present on scaffolds that are not anchored to the 11 Musa chromosomes are not represented here (e.g. MaACS8, MaEIL16).


The expansion of four banana gene families of the ethylene pathway is due to preferential retention after whole genome duplications

To elucidate the origin of banana gene family expansions, gene duplications were classified into four modes: WGD, potential segmental duplications, tandem and proximal duplications (Table 2). A total of 28,317 out of 36,542 banana predicted protein-coding genes were found to be duplicates. The main duplication mode is WGD, with 40% of banana genes involved in WGD gene pairs. Potential segmental duplicates corresponded to 7% of banana genes whereas tandem and proximal duplications involved 3.4% and 4.2% of banana genes, respectively.
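The percentages quoted above follow directly from the genome-wide row of Table 2 and the total of 36,542 predicted genes. The short check below recomputes them; all counts are taken from the article, while the variable and function names are illustrative. Treating the duplicate total as all genes minus the "Unique" column is an assumption, although it is consistent with the 28,317 duplicates reported.

```python
TOTAL_GENES = 36542  # banana predicted protein-coding genes

# Genome-wide counts from Table 2 (a gene can appear in several modes).
genome_wide = {"WGD": 14771, "segmental": 2717, "tandem": 1258, "proximal": 1537}
UNIQUE_GENES = 8225


def percent(count, total=TOTAL_GENES):
    """Share of all predicted genes, in percent, rounded to one decimal."""
    return round(100.0 * count / total, 1)


duplicates = TOTAL_GENES - UNIQUE_GENES
print("duplicated genes:", duplicates)
for mode, count in genome_wide.items():
    print(f"{mode}: {percent(count)}%")
```

Because the same gene can be involved in several duplication modes, the four percentages are not expected to add up to the overall share of duplicated genes.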

Duplication modes were identified for a majority (74%) of the 184 genes from the ethylene gene families (Table 2). For seven of the ten gene families (ACS, ERS/ETR/EIN4-like, RAN1-like, EIN2-like, EIL, EBF, ERF), nearly all duplicates originated from WGD events, including all RAN1-like, EIN2-like and EBF genes (Table 2, Fig. S2, Fig. S4-S6, Fig. 1). For the three remaining families (ACO, CTR1-like and RTH), it was not possible to identify a major duplication mode (Table 2, Fig. S7-S9). In addition, it was not possible to infer WGD relationships for genes not anchored to the Musa chromosomes (e.g. MaACS8, MaEIL16).

Among the ten ACS genes anchored on Musa chromosomes, nine showed WGD gene pair relationships (Table 2) and originated from five different Musa ancestral blocks as defined in D’Hont et al., 2012 (Fig. 1). Relationships between MaACS4, MaACS2 and MaACS3 involved ancestral blocks G1 (in dark blue, Fig. 1) and G10 (in beige, Fig. 1). Further investigations using the Plant Genome Duplication Database detected additional paralogous relationships around MaACS4 on G1 and MaACS2 and MaACS3 on G10, suggesting that the three genes are present on previously undetected paralogous regions from the same ancestral block. In addition, two potential segmental gene pairs were found involving three ACS genes (MaACS4, MaACS6 and MaACS7) (Table 2, Fig. 1, Fig. S2). All EBF genes (MaEBF1-5) were related to each other by WGD relationships and probably originated from a unique Zingiberales ancestral gene (Fig. 1). Among the sixteen EIL genes anchored on Musa chromosomes, twelve were involved in 23 WGD gene pair relationships and resulted from duplication of three ancestral blocks; six of them were additionally involved in three potential segmental gene pairs (Table 2, Fig. 1). Finally, 101 ERF genes were found distributed on the twelve Musa ancestral blocks and, among the 90 identified ERF gene pairs, 86 showed WGD relationships involving 91 different ERF genes and only two were tandem gene pairs (Table 2, Fig. 1). The ACS, EBF, EIL and ERF gene families showed significant preferential retention of their members after WGD (P = 0.005, P = 0.007, P = 0.011, P < 0.001, respectively), indicating that their expansions are due to duplicate retention after the three banana WGD rounds. Thus, ethylene pathway banana gene families evolved mostly through WGD, which is the main duplication mode in the banana genome.
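One way to read the retention-bias test is as a one-degree-of-freedom χ² goodness-of-fit test of the WGD/non-WGD split within a family against the genome-wide WGD fraction. The sketch below follows that reading; it is an assumption, since the text does not give the exact layout of the test, and the function name is hypothetical. The counts come from Table 2 (WGD genes and family sizes) and from the genome-wide totals.

```python
import math

# Genome-wide share of genes involved in WGD pairs (Table 2 and total gene count).
GENOME_WGD_FRACTION = 14771 / 36542


def retention_bias_p(wgd_genes, family_size, expected_fraction=GENOME_WGD_FRACTION):
    """One-df chi-square goodness-of-fit P value for WGD retention in a family."""
    expected_wgd = family_size * expected_fraction
    expected_other = family_size - expected_wgd
    observed_other = family_size - wgd_genes
    stat = ((wgd_genes - expected_wgd) ** 2 / expected_wgd
            + (observed_other - expected_other) ** 2 / expected_other)
    # Survival function of chi-square with one degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))


for family, wgd, size in [("ACS", 9, 11), ("EBF", 5, 5), ("EIL", 12, 17)]:
    print(family, round(retention_bias_p(wgd, size), 3))
```

Under these assumptions the three families shown give P values close to those reported in the text (0.005, 0.007 and 0.011), although this agreement does not establish that the same test layout was used. With expected counts this small, an exact binomial test would be a reasonable alternative.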

Parallel acquisition of ultra-paralogs for EIL and EBF genes in banana

The EIL transcription factors are central positive regulators of ethylene responses and their regulation by EBF proteins is a key step of ethylene signalling control (Chao et al., 1997; Potuschak et al., 2003; Gagne et al., 2004; Binder et al., 2007). Both families were over-retained after WGD in banana raising the question of their phylogenetic evolution. The maximum likelihood phylogenetic tree of EIL proteins (85 sequences) showed three main groups which likely originated before the Eudicot/Monocot divergence as they all contained representatives from analysed species (Fig. 2). The largest group (group I, 42 genes) corresponded to homologs of AtEIN3 and AtEIL1, two Arabidopsis genes that have been shown to be necessary and sufficient for activation of ethylene-response genes (Chao et al., 1997). Group I comprised the majority of banana EIL members (13 out of 17) and also genes encoding functional EIL proteins from tomato (SlEIL1-4, Tieman et al., 2001) and rice (OsEIL1, Mao et al., 2006). These banana genes were subdivided into two subgroups: I-m1 which comprises orthologous genes from all analysed monocots and I-m2 which only