
Mechanistic insights into the evolution of DUF26-containing proteins in land plants





VAATTOVAARA, Aleksia, et al.

Abstract

Large protein families are a prominent feature of plant genomes and their size variation is a key element for adaptation. However, gene and genome duplications pose difficulties for functional characterization and translational research. Here we infer the evolutionary history of the DOMAIN OF UNKNOWN FUNCTION (DUF) 26-containing proteins. The DUF26 emerged in secreted proteins. Domain duplications and rearrangements led to the appearance of CYSTEINE-RICH RECEPTOR-LIKE PROTEIN KINASES (CRKs) and PLASMODESMATA-LOCALIZED PROTEINS (PDLPs). The DUF26 is land plant-specific but structural analyses of PDLP ectodomains revealed strong similarity to fungal lectins and thus may constitute a group of plant carbohydrate-binding proteins. CRKs expanded through tandem duplications and preferential retention of duplicates following whole genome duplications, whereas PDLPs evolved according to the dosage balance hypothesis. We propose that new gene families mainly expand through small-scale duplications, while fractionation and genetic drift after whole genome multiplications drive families towards dosage balance.

VAATTOVAARA, Aleksia, et al. Mechanistic insights into the evolution of DUF26-containing proteins in land plants. Communications Biology, 2019, vol. 2, no. 1

DOI: 10.1038/s42003-019-0306-9

Available at: http://archive-ouverte.unige.ch/unige:121409

Disclaimer: layout of this document may differ from the published version.


Aleksia Vaattovaara 1, Benjamin Brandt 2, Sitaram Rajaraman 1, Omid Safronov 1, Andres Veidenberg 3, Markéta Luklová 1,5, Jaakko Kangasjärvi 1, Ari Löytynoja 3, Michael Hothorn 2, Jarkko Salojärvi 1,4 & Michael Wrzaczek 1


https://doi.org/10.1038/s42003-019-0306-9 OPEN

1 Organismal and Evolutionary Biology Research Programme, Viikki Plant Science Centre, VIPS, Faculty of Biological and Environmental Sciences, University of Helsinki, Viikinkaari 1 (POB65), FI-00014 Helsinki, Finland. 2 Structural Plant Biology Laboratory, Department of Botany and Plant Biology, University of Geneva, Geneva, Switzerland. 3 Institute of Biotechnology, University of Helsinki, Viikinkaari 5 (POB56), FI-00014 Helsinki, Finland. 4 School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore. 5 Present address: Laboratory of Plant Molecular Biology, Institute of Biophysics AS CR, v.v.i. and CEITEC–Central European Institute of Technology, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic. These authors contributed equally: Benjamin Brandt, Sitaram Rajaraman. Correspondence and requests for materials should be addressed to J.Sär. (email: jarkko@ntu.edu.sg) or to M.W. (email: michael.wrzaczek@helsinki.)


Gene duplication and loss events constitute the main factor of gene family evolution [1]. Duplications occur by two major processes, whole-genome multiplications (WGM) and small-scale duplications (SSD), including tandem, segmental, and transposon-mediated duplications [2]. They appear to be distinct modes of expansion, since gene families evolving through WGMs rarely experience SSD events [3]. The division is also visible on the functional level, since genes duplicated in WGMs are enriched for transcriptional and developmental regulation as well as signal transduction functions, whereas SSDs occur preferentially on secondary metabolism and environmental response genes [3]. The prevailing hypothesis for the phenomenon is dosage balance: in regulatory networks and protein complexes the stoichiometric balance between different components needs to be preserved; thus, selection acts against losses following WGMs and against duplications in SSDs [4]. Families retained after WGMs are typically stable across different species whereas highly variable families evolve through SSDs [5], suggesting high turnover rates. These results have been obtained by analyzing the extremes, the top families displaying pure WGM retention or SSD characteristics [3], while most gene families likely evolve in an intermediate manner.

Plants and other eukaryotes have developed a wide range of signal transduction mechanisms for controlling cellular functions and coordinating responses on cell, tissue, organ, and organismal level. Plants in particular encode large gene families of secreted proteins [6–8] and proteins with extracellular domains to respond to environmental and developmental cues. However, the large numbers make it difficult to dissect conserved or specialized functions [4], and therefore a detailed understanding of their evolution and duplication history in different plant lineages is needed. Signaling proteins with extracellular domains include receptor-like protein kinases (RLKs) [9,10] and receptor-like proteins (RLPs) [11]. In RLKs, extracellular domains are involved in signal perception and protein–protein interactions [12] while the intracellular kinase domain transduces signals to substrate proteins. RLKs are involved in essential mechanisms including stress responses, hormone signaling, cell wall monitoring, and plant development [12]. The large number of secreted proteins, RLKs, and RLPs in plants may reflect their sessile lifestyle and need for meticulous monitoring of signals from other cells, tissues, or the environment. Phylogenetic relations between different groups of RLKs and RLPs have been described [9,13–16] but few have been physiologically and biochemically characterized [17].

The Domain of Unknown Function 26 (DUF26; Gnk2 or stress-antifungal domain; PF01657) [18,19] is an extracellular domain harboring a conserved cysteine motif (C-8X-C-2X-C) in its core. It is present in three types of plant proteins. The first class is CYSTEINE-RICH RECEPTOR-LIKE SECRETED PROTEINs (CRRSPs). The best characterized CRRSP is Gnk2 from Ginkgo biloba with a single DUF26, which acts as a mannose-binding lectin in vitro with antifungal activity [18,19]. Two maize CRRSPs have been shown to also bind mannose and participate in defence against a fungal pathogen [20]. The second class, CYSTEINE-RICH RECEPTOR-LIKE PROTEIN KINASES (CRKs), has a typical configuration of two DUF26 in the extracellular region and forms a large subgroup of RLKs in plants. CRKs participate in the control of stress responses and development in Arabidopsis and in rice [21–31]. The third class of DUF26 domain-containing proteins is the PLASMODESMATA-LOCALIZED PROTEINS (PDLPs). PDLPs contain two DUF26 domains in their extracellular region and a transmembrane helix, but lack a kinase domain. They associate with plasmodesmata and are involved in symplastic intercellular signaling [32], pathogen response [33], systemic signaling [34], and control of callose deposition [35], and are targets for viral movement proteins [36]. However, the precise biochemical functions of plant DUF26-containing proteins remain unclear.
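The core cysteine motif described above (C-8X-C-2X-C) can be written as a regular expression. The following is a minimal sketch in Python, assuming that X stands for any residue. The function name and the toy sequence are hypothetical and do not come from the study, and a motif match alone does not establish the presence of a DUF26 domain.

```python
import re

# C-8X-C-2X-C: Cys, any 8 residues, Cys, any 2 residues, Cys.
# The lookahead lets overlapping occurrences be reported as well.
CORE_MOTIF = re.compile(r"(?=C.{8}C.{2}C)")


def find_core_motifs(sequence: str) -> list[int]:
    """Return the 0-based start positions of every C-8X-C-2X-C match."""
    return [match.start() for match in CORE_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence (not a real protein) carrying two motif copies.
toy = "MKT" + "CAAAAAAAACGGC" + "LLLLL" + "CSSSSSSSSCTTC" + "EE"
print(find_core_motifs(toy))  # prints [3, 21]
```

Two hits in one sequence would be consistent with a double DUF26 configuration, but only as a first screen under the assumptions stated above.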

Tandem expansions drive the evolution of, for example, F-Box proteins [37], and transcription factors [38], but also RLKs [16] and RLPs [11]. Diversification processes include sub-functionalization, where paralogs retain a subset of their ancestral functions, and neofunctionalization, where duplicated proteins acquire new functions [38]. CRKs and CRRSPs typically exist in clusters on plant chromosomes [24], suggesting relatively recent tandem expansions. This makes DUF26-containing proteins a perfect dataset for sequence-based evolutionary investigation.

Here we carry out an in-depth analysis of the DUF26-containing proteins, a protein family involved in signaling, to explore the dynamics and effect of the different duplication mechanisms on overall gene family evolution. We combine phylogenetic analyses with experimental structural biology to gain insight into the evolution of DUF26-containing proteins. While sequence analysis indicates that the DUF26 domain is specific to land plants, the domain shows structural similarity to fungal carbohydrate-binding lectins. Our results suggest that DUF26-containing proteins constitute a group of carbohydrate-binding proteins in plants. CRKs and CRRSPs experienced both ancestral and recent lineage-specific tandem duplications. In contrast to the general pattern of gene families expanding by small-scale duplication events, these gene families experienced expansion also during or after WGMs. Our work illustrates that detailed understanding of the evolution of large protein families is a prerequisite for translating findings from model plants to different species and for dissecting conserved or specialized functions of proteins.

Results

DUF26-containing proteins have diverse domain compositions. We selected 32 species with high-quality genome assemblies representing major plant lineages. After manual curation, de novo annotation, and exclusion of partial gene models and pseudogenes, we obtained 1409 high-quality gene models (Fig. 1a, Supplementary Figure 1, Supplementary Note 1, Supplementary Data 1). The PFAM protein domain database [39] identifies DUF26 as specific to embryophytes. Accordingly, we identified no DUF26 or DUF26-like domains from genomes of algae, charophytes, diatoms, fungi, insects or vertebrates. DUF26-containing proteins are grouped into three categories: CRRSPs, PDLPs, and CRKs (Fig. 1b). CRRSPs consist of a signal peptide followed by one or more DUF26 domains, separated by a variable region. CRRSPs with a single DUF26 (sdCRRSPs) were identified from most land plants, including the liverwort (Marchantia polymorpha) and moss (Physcomitrella patens) lineages (Fig. 1). CRRSPs with two DUF26 domains (ddCRRSPs) were identified from vascular plants including the lycophyte Selaginella moellendorffii and represent the predominant type in vascular plants (Fig. 1). Rice as well as Brassicaceae display lineage-specific evolution with a large number of ddCRRSPs while sdCRRSPs are absent (Fig. 1a, Supplementary Figure 2).

CRKs contain a signal peptide, two DUF26 domains, and a transmembrane region followed by an intracellular protein kinase domain. Similar to ddCRRSPs, CRKs were identified from vascular plants but not from bryophytes (Fig. 1a). The CRKs likely originate from a fusion of sdCRRSPs with the transmembrane region and kinase domain from LRR_clade_3 RLKs in the common ancestor of vascular plants [15]. The Selaginella genome uniquely encodes single DUF26 CRKs (sdCRKs; Fig. 1b) and only a few CRKs from eudicots contain more than two DUF26 domains.

PDLPs were identified from all seed plants and are composed of a signal peptide, two DUF26, and a transmembrane region followed by a short cytoplasmic extension. We also identified several angiosperm CRKs lacking signal peptide, extracellular domain, and transmembrane region. These are subsequently referred to as CYSTEINE-RICH RECEPTOR-LIKE CYTOPLASMIC KINASEs (CRCKs).

[Figure 1. Panel a: species tree with whole-genome duplication (WGD) and triplication (WGT) events and a bar chart of the number of genes (axis 0 to 100) per species, by class: sdCRRSP, CRRSP, sdCRK, bCRK, vCRK, PDLP. Species shown: Solanum tuberosum, Vitis vinifera, Medicago truncatula, Arabidopsis lyrata, Picea abies, Selaginella moellendorffii, Oryza sativa, Theobroma cacao, Hordeum vulgare, Spirodela polyrhiza, Marchantia polymorpha, Nelumbo nucifera, Betula pendula, Physcomitrella patens, Micromonas pusilla, Solanum lycopersicum, Volvox carteri, Amborella trichopoda, Ostreococcus lucimarinus, Brachypodium distachyon, Zea mays, Cucumis sativus, Arabidopsis thaliana, Capsella rubella, Aquilegia coerulea, Sorghum bicolor, Prunus persica, Chlamydomonas reinhardtii, Coccomyxa subellipsoidea, Solanum melongena, Klebsormidium flaccidum, Populus trichocarpa. Panel b: table of domain compositions (CRCK, sdCRRSP, ddCRRSP, PDLP, sdCRK, CRK_I, CRK_II, tdCRK, qdCRK, qdCRRSP; legend: signal peptide, DUF26 domain, transmembrane region, kinase domain type I, kinase domain type II) across lineages: Charophytes (1), Bryophytes (2), Lycophytes (1), Monilophytes (0), Gymnosperms (1), Basal Angiosperms (1), Monocots (6), Eudicots (15).]

Fig. 1 Overview and distribution of DUF26-containing genes in plants. a DUF26-containing genes are absent from algae and charophytes but present in land plants. Marchantia polymorpha and Physcomitrella patens genomes encode sdCRRSPs. Selaginella moellendorffii possesses sdCRRSPs, sdCRKs, and canonical CRKs. Seed plant (gymnosperm and angiosperm) genomes encode the whole set of DUF26-containing genes. CRKs were defined as basal group CRKs (bCRKs) or variable group CRKs (vCRKs) based on their phylogenetic positions. Whole-genome duplication (WGD) events are presented with a green circle and whole-genome triplication (WGT) events with a dark blue circle. Ferns were omitted from analyses due to lack of available genome assemblies. b Overview of different domain compositions of proteins containing DUF26 in different plant lineages. The number of representative species in the analyses is given in brackets after the name of the group. Numbers in the table present the number of species in each lineage in which the domain structure was found. Abbreviations sd (single domain), dd (double domain), td (triple domain), and qd (quadruple domain) refer to the number of the DUF26 domains
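The domain compositions described in this section can be summarized as a small decision rule. The sketch below is illustrative only: the `Architecture` fields and the `classify` function are hypothetical names, the input features are assumed to come from separate signal peptide, domain, and transmembrane predictions, and the CRCK branch assumes the input has already been restricted to CRK-related kinases (a cytoplasmic kinase cannot be assigned to the family from these features alone).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    """Predicted features of one protein (all field names are illustrative)."""
    signal_peptide: bool
    duf26_count: int
    transmembrane: bool
    kinase: bool


def classify(arch: Architecture) -> str:
    """Map a domain architecture to the class names used in the text."""
    secreted_part = arch.signal_peptide and arch.duf26_count >= 1
    if secreted_part and arch.transmembrane and arch.kinase:
        # Signal peptide, DUF26, transmembrane region, kinase domain.
        return "sdCRK" if arch.duf26_count == 1 else "CRK"
    if secreted_part and arch.transmembrane and not arch.kinase:
        # PDLPs carry two DUF26 domains and no kinase domain.
        return "PDLP" if arch.duf26_count == 2 else "unclassified"
    if secreted_part and not arch.transmembrane and not arch.kinase:
        if arch.duf26_count == 1:
            return "sdCRRSP"
        if arch.duf26_count == 2:
            return "ddCRRSP"
        return "CRRSP with more than two DUF26"
    kinase_only = arch.kinase and arch.duf26_count == 0
    if kinase_only and not (arch.signal_peptide or arch.transmembrane):
        # Assumption: the input is already restricted to CRK-related kinases.
        return "CRCK"
    return "unclassified"


print(classify(Architecture(True, 2, True, False)))  # prints PDLP
```

Keeping the rule as one pure function makes it easy to see which feature separates neighbouring classes, for example the kinase domain between CRKs and PDLPs.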

Evolution of CRKs, PDLPs, and ddCRRSPs from small sdCRRSPs. We investigated the evolution of CRRSPs, CRKs, and PDLPs by estimating phylogenetic trees (Supplementary Note 2). Overall, the tree of DUF26-containing proteins split into two distinct groups, basal group α and variable group β (Fig. 2a, b), where α is paraphyletic with respect to β. To increase the number of informative sites and obtain better resolution, we estimated separate phylogenetic trees for both groups (Supplementary Figure 2a, b). The subgroupings of the basal α- and variable β-groups remained conserved also in trees estimated for each subfamily of DUF26-containing proteins (Supplementary Figure 2c–e). Subsequently, we reconciled the gene trees with the species tree, and estimated ancestral gene contents and duplication/loss events for the subfamilies in 11 species (Supplementary Figure 3). To identify significant expansions we fitted birth–death rate models for DUF26-containing protein families and compared the rates against computationally derived gene families (orthogroups) for RLKs, all protein kinases, and plasmodesmal proteins [40] using Badirate [41]. Finally, we assessed selective pressure by estimating amino acid conservation patterns around the main cysteine motif of the DUF26 domains for major subfamilies, and found conserved sites specific to α- or β-groups (Fig. 2, Supplementary Note 3).

[Figure 2, panels a–d: phylogenetic trees (scale bar 0.6) with clades labelled sdCRRSPs, Selaginella sdCRRSP, Selaginella sdCRK, Selaginella ddCRK, bCRK-I, bCRK-II, PDLP-I, PDLP-II, spruce-specific CRK/CRRSP, and monocot, eudicot, and Amborella vCRKs and CRRSPs, grouped into the basal (α) group and the variable (β) group; an unrooted tree separating DUF26-A from DUF26-B; MEME sequence logos (y-axis in bits, 0 to 4) for sdCRRSPs and for the DUF26-A and DUF26-B domains of PDLPs, bCRKs, dicot vCRKs, monocot vCRKs, and CRRSPs.]

The α-group is likely older, containing sequences from all vascular plants. Proteins in this group are conserved on sequence level and identification of putative orthologs from different species is frequently possible. Purifying selection, the selective removal of deleterious variants, is likely the main force acting on this clade, as suggested by low dN/dS values (one-rate model for whole groups: bCRK-I 0.184, bCRK-II 0.192, PDLPs 0.267, sdCRRSPs 0.162, CRCK 0.134; a more flexible model with branch-specific dN/dS within each group yielded similar results). The subgroups within the basal α-group evolved independently and their DUF26 domains share features distinguishing them from the β-group (Fig. 2c, Supplementary Note 3). Since the sdCRRSPs are located close to the root of the α-group (Fig. 2b) and form a monophyletic subclade at the root of the CRRSP tree (Supplementary Figure 2c), they are likely the ancestral type of DUF26 proteins in land plants. Furthermore, sdCRRSPs are present in various early diverging plant lineages such as the gymnosperm Ginkgo biloba (including Gnk2 [18,19]) and the liverwort Marchantia polymorpha (Supplementary Figure 2f). Turnover rates of sdCRRSPs do not differ from those of all gene families, and sdCRRSPs show lineage-specific expansions in early diverging species (Supplementary Figure 3a).
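The dN/dS values quoted above can be read with the usual rule of thumb that ratios well below 1 point to purifying selection, ratios near 1 to neutral evolution, and ratios above 1 to positive selection. The short sketch below applies that rule to the reported one-rate values. The `selection_regime` function and its tolerance band around 1 are illustrative assumptions, not part of the study's analysis, which relied on fitted codon models.

```python
# dN/dS (omega) values quoted in the text for the one-rate model.
ONE_RATE_OMEGA = {
    "bCRK-I": 0.184,
    "bCRK-II": 0.192,
    "PDLPs": 0.267,
    "sdCRRSPs": 0.162,
    "CRCK": 0.134,
}


def selection_regime(omega: float, tolerance: float = 0.05) -> str:
    """Label an omega value; the tolerance band is an arbitrary illustration."""
    if omega < 1.0 - tolerance:
        return "purifying"
    if omega > 1.0 + tolerance:
        return "positive"
    return "near neutral"


for group, omega in ONE_RATE_OMEGA.items():
    print(f"{group}: omega = {omega:.3f} -> {selection_regime(omega)}")
```

All five groups fall far below 1, which is the pattern the text interprets as purifying selection acting on the α-group.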

The placement of Selaginella sdCRKs to the root of the CRK phylogeny (Supplementary Figure 2d) and as sister to sdCRRSPs in the α-group (Fig. 2b) suggests an ancient origin. The DUF26 domain likely duplicated after fusing with transmembrane region and kinase domain, thus establishing the typical double DUF26 CRK configuration (Supplementary Figure 2d). Following duplication, the two DUF26 domains diverged into distinct, evolutionarily conserved, forms, DUF26-A and DUF26-B (Fig. 2d). All CRKs expanded significantly in the branches leading to lycophytes (P = 0.0017600) and to angiosperms (P = 0.0151412) compared to all RLKs (Fig. 3), and in the branch from lycophytes to angiosperms compared to all protein kinases (Supplementary Figure 4a).

A monophyletic group of CRKs with representatives from gymnosperms and angiosperms is located near the base of the CRK phylogeny (Supplementary Figure 2d) in the α-group (Fig. 2a, b). This group, referred to as basal CRKs (bCRKs), likely represents the ancient CRKs in seed plants. Following the initial innovation in ancestral vascular plants, the group evolved at rates similar to orthogroups containing all protein kinases or all RLKs (Fig. 3b, Supplementary Figures 3b and 4b). The bCRKs split into two subgroups, bCRK-I and bCRK-II (Fig. 2b and Supplementary Figure 5), in gymnosperms and angiosperms, suggesting divergence in early seed plants. The larger bCRK-I subclade further divides into distinct branches with tandemly duplicated Amborella bCRKs at their roots (Supplementary Figures 5 and 6a, b) suggesting rapid differentiation following duplication in ancestral angiosperms [42]. The number of the bCRK-Is is conserved, excluding an expansion specific to Solanaceae, while the small bCRK-II subclade is absent from Brassicaceae.

PDLPs were only found in seed plants. PDLPs belong to the α-group (Fig. 2a, b) and represent the most conserved class of DUF26-containing genes. PDLPs do not display different expansion rates compared to plasmodesmata-related orthogroups [40] (Fig. 3c). PDLPs split into two clades, PDLP-I and PDLP-II (Supplementary Figure 2e), which both contain Amborella trichopoda and eudicot and monocot PDLPs, suggesting divergence in ancestral angiosperms. PDLPs and ddCRRSPs originate from CRKs through loss of kinase domains and/or transmembrane regions. The loss can be a two-step process, as exemplified by an atypical PDLP from Amborella trichopoda located at the root of the main ddCRRSP clade (Supplementary Figure 7). PDLPs were possibly present already in ferns since database searches identified a partial gene model lacking a transmembrane region in Marsilea quadrifolia [43] (see Methods) with similarity to PDLPs, placed at the root of a phylogenetic tree for PDLPs (Supplementary Figure 8).

In the α-group, a group of spruce-specific CRKs (spruce vCRKs) is more closely related to PDLPs than to other CRKs (Fig. 2a, b). They form a distinct group between bCRKs and a large group of angiosperm variable CRKs (vCRKs; Fig. 2a, b, Supplementary Figures 2d and 3d). Angiosperm vCRKs form the β-group together with ddCRRSPs and atypical monocot sdCRRSPs (Fig. 2a, b). These CRRSPs likely evolved from vCRKs through loss of transmembrane regions and kinase domains and, in case of sdCRRSPs, also DUF26-B domains. The β-group is less conserved compared to the α-group and branches into two eudicot-specific and one monocot-specific group with a few Amborella trichopoda vCRKs at the root of the groups. Members of the β-group experienced several independent tandem expansions in different plant taxa (Fig. 3d, e, Supplementary Figures 3d, 4c and 6c) and expanded during the diversification of monocots and dicots. CRRSPs in the β-group are not monophyletic, suggesting independent birth from partial duplications of vCRKs. Hence, expansion rates and extrapolation of ancestral gene counts for ddCRRSPs could not be reliably predicted. Lineage-specific expansions in the β-group make ortholog identification challenging.

Fig. 2 Phylogenetic tree of CRRSPs, CRKs, and PDLPs. a The phylogenetic tree was estimated with the maximum-likelihood method using all high-quality full-length DUF26-containing sequences from lycophytes onwards. CRCKs and concA-CRKs were excluded while GNK2 from Ginkgo biloba was included. Overall, DUF26-containing genes split into basal and variable groups. Detailed phylogenetic trees with bootstrap support (1000 replicates) and filtered sequence alignments are available at http://was.bi?id=IaroP (full tree), http://was.bi?id=wpEHGt (basal group separately), and http://was.bi?id=aIJe_D (variable group separately). b The same phylogenetic tree as in panel a, rooted to ancestral sdCRRSPs and sdCRKs from Selaginella moellendorffii, showing that the variable group branches out from the basal group. c The MEME figures present the conservation pattern of amino acid positions around the main cysteine motif within the DUF26 domains for sdCRRSPs, bCRKs, and PDLPs from the basal group and CRRSPs and vCRKs from the variable group. The features specific only to genes either in the basal group or in the variable group are highlighted. d The DUF26-A and DUF26-B domains are clearly separated in an unrooted phylogenetic tree containing DUF26 domain sequences. The MEME figures present differences in the conservation of the AA sequence surrounding the conserved cysteines in DUF26-A and DUF26-B

DUF26 is related to fungal lectins and assembles as tandems.

High sequence divergence of DUF26 proteins and the lineage-specific expansions raise the question whether their overall structure is conserved. The consensus DUF26 domain as defined in PFAM comprises ~90–110 amino acids. Structural information is currently available for the sdCRRSP Gnk2 [19] but not for ddCRRSPs, CRKs, or PDLPs, proteins with a double DUF26 configuration. Mechanistic constraints restrict the evolution of protein structures, and therefore understanding structural conservation provides clues for protein function.

We defined the structural relationship of tandem DUF26 domains by determining crystal structures of AtPDLP5 (residues 26–241) and AtPDLP8 (21–253) ectodomains to 1.25 and 1.95 Å resolution, respectively (Supplementary Table 1). Individual DUF26 domains feature two small α-helices folding on top of a central antiparallel β-sheet (Fig. 4). The PDLP5 DUF26-A domain is N-glycosylated at positions Asn69 and Asn132 in our crystals (Fig. 4a). The secondary structure elements of DUF26 are covalently linked by three disulfide bridges, formed by six conserved Cys residues, part of which belong to the C-8X-C-2X-C motif (Figs. 2d and 4a). We previously suggested that tandem DUF26-containing proteins could be involved in ROS or redox sensing [24,26]. To assess the functional roles of the invariant disulfide bridges in PDLPs, we mutated the partially solvent exposed PDLP5 Cys101, PDLP5 Cys148, and PDLP5 Cys191 to alanine. While the wild-type PDLP5 ectodomain behaves as a monomer in solution (Supplementary Figure 9), the mutant protein tends to aggregate in our biochemical preparations (Supplementary Figure 9) and displays reduced structural stability in thermofluor assays (Supplementary Figure 10). These experiments and crystallographic data (Fig. 4a) suggest that the conserved disulfide bonds in PDLPs and potentially in other DUF26-containing proteins are involved in structural stabilization rather than redox signaling.

[Figure 3: gene-family size trees, scale bar 0.2. Tip labels recovered from the figure (species abbreviation, gene count, change relative to the most recent ancestral node):
a All CRKs vs. RLKs: Ptr 64 (+28), Vvi 51 (+9), Osa 51 (+11), Zma 33 (+3), Ath 44 (+8), Ppa 0, Aco 39 (+10), Bdi 40, Smo 5, Spo 21, Atr 17 (+1); P = 0.0017600 and P = 0.0151412.
b bCRKs vs. RLKs: Bdi 2 (-1), Smo 5 (+1), Vvi 11 (+4), Spo 4 (+1), Aco 10 (+5), Ppa 0, Ath 5 (-2), Ptr 9 (+2), Atr 9 (+4), Osa 3, Zma 3.
c PDLPs vs. plasmodesmata-related proteins: Ath 8 (+3), Atr 3, Smo 0, Bdi 5, Zma 7 (+2), Spo 4, Vvi 6 (+1), Ptr 10 (+5), Osa 6 (+1), Ppa 0, Aco 5 (+1).
d vCRKs vs. RLKs: Ath 39 (+6), Zma 30 (+4), Ptr 55 (+22), Spo 17, Bdi 38, Osa 48 (+10), Vvi 40 (+7), Aco 29 (+10), Atr 8 (+1), Smo 0, Ppa 0.
e vCRKs: clades labelled Monocots, Eudicots, and Amborella.
Other P values printed in the figure: 0.0051119, 0.0032154, 0.0567543.]

Fig. 3 Comparison of evolutionary rates between gene families. Analyses were carried out with Badirate for 11 species (Physcomitrella patens, Selaginella moellendorffii, Amborella trichopoda, Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Aquilegia coerulea, Spirodela polyrhiza, Zea mays, Oryza sativa, and Brachypodium distachyon). Neutral branches are reported as bold black lines; branches involving gene family expansion are reported as bold purple lines and branches with contraction as blue dashed lines. Branches with significant differences (false discovery rate adjusted p < 0.05) to birth–death rate model estimates are marked with arrows. Node labels present the ancestral gene family sizes estimated by Badirate. Tip labels contain species abbreviations and the change in numbers compared to the most recent ancestral node. a All CRKs compared to other receptor-like kinases (RLKs). b bCRKs compared to RLKs. c PDLPs compared to other plasmodesmata-related orthogroups. d vCRKs compared to RLKs. e Phylogenetic maximum-likelihood tree showing differences in lineage-specific expansions in monocot and dicot vCRKs following the split of Amborella trichopoda. Species-specific expansions (at least two genes from same species) are marked with red and clades including sequences from only Brassicaceae or Solanaceae are marked with blue

[Figure 4, panels a–f: structural views of PDLP5 and PDLP8 and comparisons with Gnk2, LDL, and Y3. Labels recovered from the figure: DUF26-A, DUF26-B, N and C termini, Asn69, Asn132, positions numbered 1–6, a 90° rotation, "Mannose bound to Gnk2", "Globotriose bound to LDL", and the residue pairs Y31/Y33, I30/I32, Y224/Y224, Y229/Y229, Q190/Q190, R129/R129, D92/D92, Q88/Q88, F135/F135, A35/V35, N37/S37, R132/K133, R119, E130/D131, Q128/R129.]

The N-terminal DUF26-A (PDLP5 residues 30–132) and the C-terminal DUF26-B (residues 143–236) domains are connected by a structured loop (residues 133–142) and make extensive contacts with each other (Fig. 4a). The resulting ectodomain has a claw-like shape with the β-sheets of DUF26-A and DUF26-B facing each other (Fig. 4a). The DUF26-A and DUF26-B domains in PDLP5 and PDLP8 closely align, with root mean square deviations (r.m.s.d.s) of 1.6 and 1.2 Å when comparing 89 corresponding Cα atoms, respectively (Supplementary Figure 11a). DUF26-A is more variable than DUF26-B on the sequence level (Fig. 2d). The DUF26-A and DUF26-B domains in PDLP5 and PDLP8 have 24% and 30% of their residues in common, most of which map to the hydrophobic core of the domain (including the six cysteine residues forming intra-molecular disulfide bonds) and to the DUF26-A–DUF26-B interface (Fig. 4b). This interface is formed by a line of aromatic and hydrophobic residues originating from the proximal face of the β-sheet in DUF26-A and DUF26-B (Fig. 4b, Supplementary Figure 12). Importantly, many of the interface residues are conserved among different PDLPs, but also among CRKs and ddCRRSPs (Supplementary Figure 12). Consistently, the ectodomains of PDLP5 and PDLP8, which belong to different phylogenetic clades (Supplementary Figure 8), closely align with an r.m.s.d. of ~1.6 Å when comparing 198 corresponding Cα atoms (Supplementary Figure 11b). These observations suggest that evolutionarily distant DUF26 tandem proteins likely share a conserved three-dimensional structure.
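The r.m.s.d. values quoted above summarize how far corresponding Cα atoms lie from each other after two structures have been superimposed. As a minimal sketch of the quantity itself, the function below computes the r.m.s.d. for two equally long coordinate lists. It assumes the structures are already optimally superimposed and the atoms already paired, which in practice is done by structural alignment software; the function name and the toy coordinates are hypothetical.

```python
import math

Coordinate = tuple[float, float, float]


def rmsd(coords_a: list[Coordinate], coords_b: list[Coordinate]) -> float:
    """Root mean square deviation between paired, pre-superimposed atoms."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom is displaced by exactly 1 Å along the z axis.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))  # prints 1.0
```

Because the deviations are squared before averaging, a few poorly matching atoms raise the value more than many small differences do, so the number of compared atoms (89 or 198 in the text) should always be reported alongside it.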

The physiological ligands for PDLPs are currently unknown. Therefore, we performed structural homology searches [44] to obtain insights into the biochemical function of plant DUF26 domains (see Methods). Top hits include the single DUF26 domain protein Gnk2 [19]. Despite moderate sequence similarity, the overall fold of Gnk2 and PDLP5 DUF26-A and DUF26-B as well as the disulfide-bond arrangement is conserved (Fig. 4c). Notably, Glu130 and Arg132 implicated in mannose binding in Gnk2 are replaced by Asp131 and Lys133 in the DUF26-A of PDLP5, respectively (Fig. 4d). A similar pocket is found in the DUF26-A domain of PDLP8, but not in the DUF26-B domains of either PDLP5 or PDLP8. Despite these structural homologies of Gnk2, PDLP5 DUF26-A, and PDLP8 DUF26-A, we could not detect binding of mannose to the isolated PDLP5 ectodomain in vitro (Supplementary Figure 13a). We were also unable to detect any binding of other water-soluble cell wall-derived carbohydrates to the PDLP5 ectodomain (Supplementary Figure 13b). The PDLP5 DUF26 domains share strong structural homology also with two fungal lectins, the α-galactosyl-binding Lyophyllum decastes lectin (LDL) [45] and a glycan-binding Y3 lectin from Coprinus comatus [46]. Both proteins closely align with the plant DUF26 domain, and share one of the three disulfide bridges (Fig. 4e–f). The surface areas involved in globotriose and glycan binding, respectively, are not conserved in PDLPs, but the structural similarity of plant DUF26 domains with different eukaryotic lectins could suggest a common evolutionary origin and a role as carbohydrate recognition modules [45].

We next explored potential binding sites in the two molecules by analyzing sitewise ω for orthologs of PDLP5 and PDLP8. Low ω values were observed in structural context, indicating conservation of residues buried inside the DUF26 domain, while variable residues (under more relaxed selection) appear on the surface of the structure (Fig. 5a). The variability of the PDLP5 and PDLP8 DUF26 domain surface may be central to their ability to interact with other proteins or ligands (Supplementary Figure 13c). Selection patterns may differ between young lineage-specific and evolutionarily conserved proteins. For PDLP5, high ω values on the surface could indicate fast evolution leading to sub- or neofunctionalization, as PDLP5 orthologs originate from the recent lineage-specific duplication in Brassicaceae.
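Sitewise ω is the ratio of nonsynonymous to synonymous substitution rates (dN/dS): values well below 1 indicate purifying selection and values near 1 indicate close-to-neutral evolution. The sketch below shows how such values could be binned for interpretation. The function names and the tolerance of 0.2 around ω = 1 are assumptions made for illustration, not thresholds used in this study, and the example values are simply chosen within the range shown in Fig. 5a.

```python
def omega(dn, ds):
    """Return the dN/dS ratio; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def classify_omega(w, tolerance=0.2):
    """Bin an omega value; `tolerance` is an arbitrary illustrative cutoff."""
    if w < 0:
        raise ValueError("omega cannot be negative")
    if w < 1.0 - tolerance:
        return "purifying"      # amino acid changes are selected against
    if w > 1.0 + tolerance:
        return "positive"       # amino acid changes are favored
    return "near-neutral"       # little constraint either way


# Illustrative sitewise values (hypothetical sites).
site_omegas = [0.15, 0.66, 1.03]
labels = [classify_omega(w) for w in site_omegas]
print(labels)
```

A fixed cutoff is a simplification: a real analysis would rely on the statistical support for each site rather than on the point estimate alone.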

The different surface charge properties of related PDLPs from Arabidopsis (Fig. 5b) suggest that different PDLPs and other DUF26-containing proteins sense a diverse set of ligands. While the nature of these molecules is currently unknown, cell-wall-derived carbohydrates or small extracellular molecules represent candidate ligands. Notably, we observed typical lectin-dimers in PDLP5 and PDLP8 crystals, in which two lectin domains dimerized along an extended antiparallel β-sheet (Fig. 5c)47. In principle, this mode of dimerization could form an extended binding cleft for a carbohydrate polymer, and presents an attractive activation mechanism for PDLPs and CRKs, where a monomeric ground state forms ligand-induced oligomers, as previously seen with plant LysM-domain-containing carbohydrate receptors48.

The CRK kinase domain is related to LRR and S-locus RLKs. Kinase domains transduce signals by phosphorylating substrate proteins. Typically, the kinase domain has been used to investigate phylogenetic relationships between RLKs9,15,16. The CRK kinase domain is similar to the kinase domain of S-locus lectin and LRR RLKs from LRR_clade_3 (ref. 15; Supplementary Data 2). Based on catalytic motifs in the kinase domains49 most CRKs seem to be active kinases and in vitro activity of several CRKs has been experimentally confirmed25,28,30. Most CRKs belong to the RD type50,51 considered capable of auto-activation but few non-RD CRKs are present in plants49.

Fig. 4 The crystal structures of the PDLP5 and PDLP8 ectodomains reveal a conserved tandem architecture of two lectin-like domains. a Overview of the PDLP5 ectodomain. The two DUF26 domains are shown as ribbon diagrams, colored in blue (DUF26-A) and orange (DUF26-B), respectively. N-glycans are located at Asn69 and Asn132 of DUF26-A and are depicted in bonds representation (in cyan). The DUF26-A and DUF26-B domains each contain three disulfide bridges labeled 1 (Cys89–Cys98), 2 (Cys101–Cys126), 3 (Cys36–Cys113), 4 (Cys191–Cys200), 5 (Cys203–Cys228), and 6 (Cys148–Cys215). b Close-up view of the DUF26-A–DUF26-B interface in PDLP5 (orange) and PDLP8 (blue), shown in bonds representation. c Superimposition of the Gnk2 extracellular DUF26 domain (PDB-ID 4XRE) with either PDLP5 DUF26-A (r.m.s.d. is ~1.4 Å comparing 100 aligned Cα atoms) or PDLP5 DUF26-B (r.m.s.d. is ~2.0 Å comparing 93 corresponding Cα atoms). Corresponding disulfide bridges shown in bonds representation (PDLP5 in green, Gnk2 in yellow) are highlighted in gray. Gnk2-bound mannose is shown in magenta (in bonds representation). d Close-up view of the residues involved in the binding of mannose of Gnk2 (bonds representation, in blue and magenta, respectively) and putative residues involved in ligand binding of PDLP5 DUF26-A (in orange). e The fungal LDL DUF26 domain (Cα trace in blue; PDB-ID 4NDV) and PDLP5 DUF26-A (in orange) superimposed with an r.m.s.d. of ~2.4 Å (comparing 75 aligned Cα atoms). Disulfide bridges (LDL in yellow and PDLP5 in green; aligned disulfide bridges highlighted in gray) and the LDL bound globotriose (magenta) are shown in bonds representation. f Cα traces of the structural superimposition of the fungal Y3 protein (PDB-ID 5V6I) and PDLP5 DUF26-A (r.m.s.d. is ~2.6 Å comparing 78 corresponding Cα atoms). Disulfide bridges of Y3 (yellow) and PDLP5 DUF26-A (green) are shown alongside, one corresponding disulfide pair is highlighted in gray


Analyzing ectodomains and kinase domains of CRKs separately suggests that Selaginella ddCRKs share an ancestor with bCRKs, while Selaginella sdCRKs share an ancestor with vCRKs (Fig. 6a). The separation of DUF26-A and DUF26-B (Fig. 2d) and the timing of those events do not reveal whether duplication of the DUF26 domain in the CRK extracellular region occurred more than once or whether functional constraints in the kinase domain led to the similarity of Selaginella sdCRKs and vCRKs. Juxtaposition of phylogenetic trees based on ectodomains and kinase domains suggests several exchanges of kinase or extracellular regions among CRKs during evolution (Fig. 6a). Most strikingly, a group of monocot-specific CRKs separates from other CRKs in a phylogenetic tree based on the kinase domain (Fig. 6a). Those CRKs have a different exon–intron structure (Fig. 6b, Supplementary Figure 1b) and a kinase domain with high similarity to concanavalin-A-like lectin protein kinase domains (Supplementary Data 2), altogether suggestive of chimeric gene formation following tandem duplication52. This kinase domain switch is specific to grasses (Poaceae) and has likely resulted in a different set of substrates. In addition, loss of ectodomains and transmembrane regions has established CRCKs at least three times; one group is specific to angiosperms (CRCK-I clade), one is specific to Brassicaceae and one only to Arabidopsis thaliana.

Mixed-mode evolution of large gene families. For detailed analyses of gene family dynamics we analyzed the synteny, conservation of gene order between species, and tandem duplications in Amborella trichopoda, tomato (Solanum lycopersicum), Arabidopsis, rice, and maize (Zea mays; Fig. 7, Supplementary Figure 7), and estimated the timing of duplication events by reconciliation of gene trees with species trees (Supplementary Figure 14a).

Within the rapidly diverging β-group the vCRKs show large lineage-specific expansions. The ancestral origins for monocot and eudicot vCRKs differ, and neither synteny nor orthology can be identified (Fig. 7c, Supplementary Figure 14c). This suggests that this subfamily has a high birthrate and expands rapidly by tandem duplications. Additionally, many tandems are lost or fractionated after WGMs. Similarly, CRRSPs demonstrate little synteny between species (Fig. 7a, Supplementary Figure 14d), and CRRSPs in rice and Arabidopsis experienced lineage-specific


Fig. 5 PDLP5 and PDLP8 may have drastically different oligomerisation modes; surface charge distributions and surface-exposed residues are not widely conserved. a The conservation of amino acid residues illustrated on the molecular surface of the PDLP5 or PDLP8 crystallization dimers, respectively. Sitewise ω (dN/dS) values, indicating the intensity and direction of selection on amino acid changing mutations, illustrated on the molecular surfaces and in ribbon diagrams of PDLP5 or PDLP8. The ω-values range from 0.15 (green) to slightly over 1.0 (magenta), reflecting conserved sites under purifying selection and sites evolving close to a neutral process, respectively. b Electrostatic potential mapped onto molecular surfaces of the putative PDLP5 and PDLP8 dimers, respectively, in the same orientation as in c. c Ribbon diagrams of PDLP5 (orange) and PDLP8 (blue) crystallographic dimers. In both dimers large, antiparallel β-sheets are formed, using different protein–protein interaction surfaces


[Fig. 6 panel labels. a: ectodomain tree plotted against kinase domain tree; labeled groups: monocot CRKs with atypical kinase, Selaginella ddCRKs, Selaginella sdCRKs, bCRK-I, bCRK-II, spruce CRKs, Amborella CRKs, monocot CRKs, eudicot CRKs. b: gene models shown with DUF26-A, DUF26-B, TMR and kinase regions: LOC_Os12g41530 (atypical monocot vCRK), LOC_Os07g35004 (monocot vCRK), LOC_Os05g03920 (monocot bCRK), AT1G70520 (dicot bCRK), Smo_113941 (lycophyte bCRK), Smo_444507 (lycophyte sdCRK); scale bar 100 bases.]

Fig. 6 CRKs experienced domain rearrangements. a Comparison of phylogenetic trees based on ectodomain region and kinase domain of 880 CRKs. Phylogenetic maximum-likelihood trees are presented as a tanglegram where the tree of the CRK ectodomain region is plotted against the tree of the kinase domain. The kinase tree is rooted to atypical monocot CRKs with a Concanavalin-A type kinase domain and the ectodomain tree is rooted to CRKs from Selaginella moellendorffii. The ectodomain tree was detangled based on the kinase domain tree. Lines connect the ectodomain and kinase domain belonging to the same gene, and connections are drawn in different colors for better visibility. Juxtaposition of the trees shows rearrangements and domain swaps of ecto- and kinase domains. Black circles highlight the difference between the ectodomains and kinase domains of the Selaginella sdCRKs and ddCRKs and also the group of the atypical monocot CRKs which have exchanged the kinase domain. b The exon–intron structure of the CRKs. Usually CRKs contain seven exons: one encoding DUF26 domains, one encoding the transmembrane region (TMR), and five exons encoding the kinase domain. In atypical monocot CRKs with exchanged kinase domain, the whole gene is encoded by one or two exons. The scale bar for each gene represents 100 bases. Regions encoding the DUF26-A are colored with blue, the DUF26-B with orange, the transmembrane region (TMR) with pink, and the kinase domain with green


tandem duplications (Supplementary Figure 14d). In Brassicaceae this expansion can be traced to Amborella CRRSP (AtrCRRSP2), suggesting a tandem mode of expansion.

Tandem duplications evolve through unequal crossover or homologous recombination events53. Unequal crossover produces copy number variation, whereas homologous recombination such as gene conversion plays a role in concerted evolution, which can maintain the similarity between gene copies over long periods54. Gene conversion depends on genomic distance as well as sequence homology. Accordingly, we observed several gene conversion events among the lineage-specific tandem vCRK expansions (Supplementary Data 3), whereas for bCRKs gene conversion was only observed in the tandem expansion in Amborella. Thus, gene conversion is important for maintaining the similarity between recent tandem duplicates but conversion events become rare as sequences diverge over time.
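Tandem arrays such as the vCRK expansions discussed above can, in principle, be recognized from gene order alone. The sketch below groups family members that lie on the same chromosome and are separated by at most a given number of unrelated genes. This criterion, the function name `tandem_clusters`, the `max_intervening` parameter and the gene identifiers are all assumptions chosen for illustration; this section does not state the criteria actually used to call tandem duplicates.

```python
def tandem_clusters(gene_order, family, max_intervening=1):
    """Group family genes that sit close together on a chromosome.

    gene_order: dict mapping chromosome name -> list of gene IDs in
                chromosomal order.
    family:     set of gene IDs belonging to the gene family of interest.
    Returns a list of (chromosome, [gene IDs]) for clusters of two or more.
    """
    clusters = []
    for chrom, ordered_ids in gene_order.items():
        current = []        # family genes in the cluster being built
        last_index = None   # position of the previous family gene
        for index, gene_id in enumerate(ordered_ids):
            if gene_id not in family:
                continue
            gap = None if last_index is None else index - last_index - 1
            if gap is not None and gap <= max_intervening:
                current.append(gene_id)
            else:
                if len(current) >= 2:
                    clusters.append((chrom, current))
                current = [gene_id]
            last_index = index
        if len(current) >= 2:
            clusters.append((chrom, current))
    return clusters


# Hypothetical gene order: crkA, crkB and crkC form one array; crkD is isolated.
gene_order = {
    "chr1": ["g1", "crkA", "crkB", "g2", "crkC", "g3", "g4", "g5", "crkD"],
}
family = {"crkA", "crkB", "crkC", "crkD"}
print(tandem_clusters(gene_order, family))
```

Allowing a small number of intervening genes keeps arrays together when an unrelated gene has been inserted between duplicates; setting `max_intervening=0` restricts clusters to strictly adjacent copies.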

The CRCK-I genes are present in most genomes as single copy genes within conserved syntenic genome segments, suggesting a

[Figure panels; captions not recovered in this extract. Recoverable labels: species Zma, Osa, Atr, Sly, Ath and Smo; gene classes CRK, PDLP and CRRSP; WGD and WGT events; per-species gene counts with gains and losses; gene identifiers from A. thaliana, S. lycopersicum, Z. mays and O. sativa plotted against treatments grouped as fungal pathogens, bacterial pathogens, abiotic stress and development; and a comparison of bCRKs and vCRKs by standard deviation of phenotypic variation of crk mutants.]
