
Evidence of Pathogen-Induced Immunogenetic Selection across the Large Geographic Range of a Wild Seabird


Academic year: 2021



HAL Id: hal-02988644

https://hal.archives-ouvertes.fr/hal-02988644

Submitted on 25 Nov 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Hila Levy, Steven Fiddaman, Juliana Vianna, Daly Noll, Gemma Clucas, Jasmine Sidhu, Michael Polito, Charles Bost, Richard Phillips, Sarah Crofts, et al.


PDF Proof: Mol. Biol. Evol.


Article (Discoveries)

Evidence of pathogen-induced immunogenetic selection across the large geographic range of a wild seabird

Authors: Hila Levy*1, Steven R. Fiddaman*1, Juliana A. Vianna2, Daly Noll2,3, Gemma V. Clucas4,5, Jasmine K. H. Sidhu1, Michael J. Polito6, Charles A. Bost7, Richard A. Phillips8, Sarah Crofts9, Gary D. Miller10, Pierre Pistorius11, Francesco Bonnadonna12, Céline Le Bohec13,14, Andrés A. Barbosa15, Phil Trathan8, Andrea Raya Rey16,17,18, Laurent A.F. Frantz19, Tom Hart1, Adrian L. Smith1

* These authors contributed equally and share first authorship.

[1] Department of Zoology, University of Oxford, South Parks Road, Oxford, OX1 3PS, United Kingdom; [2] Pontificia Universidad Católica de Chile, Departamento de Ecosistemas y Medio Ambiente, Vicuña Mackenna 4860, Macul, Santiago, Chile; [3] Instituto de Ecología y Biodiversidad, Universidad de Chile, Departamento de Ciencias Ecológicas, Santiago, Chile; [4] Cornell Atkinson Center for a Sustainable Future, Cornell University, Ithaca, New York 14850, USA; [5] Cornell Lab of Ornithology, Cornell University, Ithaca, New York 14850, USA; [6] Department of Oceanography and Coastal Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA; [7] Centre d'Etudes Biologiques de Chizé (CEBC), UMR 7372 du CNRS‐Université de La Rochelle, Villiers‐en‐Bois, 79630, France; [8] British Antarctic Survey, High Cross, Madingley Road, Cambridge, CB3 0ET, United Kingdom; [9] Falklands Conservation, PO Box 26, Stanley, Falkland Islands, FIQQ 1ZZ, United Kingdom; [10] Microbiology and Immunology, PALM, University of Western Australia, Crawley, WA, 6009, Australia; [11] DST/NRF Centre of Excellence at the Percy FitzPatrick Institute for African Ornithology, Department of Zoology, Nelson Mandela University, Port Elizabeth, 6031, South Africa; [12] CEFE UMR 5175, CNRS, Université de Montpellier, Université Paul-Valéry Montpellier, EPHE, 1919 route de Mende, 34293 Montpellier cedex 5, France; [13] Université de Strasbourg, CNRS, IPHC UMR 7178, 23 rue Becquerel, F-67000 Strasbourg, France; [14] Centre Scientifique de Monaco, Département de Biologie Polaire, 8 quai Antoine 1er, MC 98000 Monaco, Principality of Monaco; [15] Museo Nacional de Ciencias Naturales, Departamento de Ecología Evolutiva, CSIC, C/José Gutiérrez Abascal, 2, 28006 Madrid, Spain; [16] Centro Austral de Investigaciones Científicas – Consejo Nacional de Investigaciones Científicas y Técnicas (CADIC-CONICET), Bernardo Houssay 200, Ushuaia, Tierra del Fuego, Argentina; [17] Instituto de Ciencias Polares, Ambiente y Recursos Naturales, Universidad Nacional de Tierra del Fuego, Yrigoyen 879, Ushuaia, Argentina; [18] Wildlife Conservation Society,


All sequence data generated in this study were deposited in GenBank under accession numbers MN394222 - MN394376 (TLR4), MN313018 - MN313169 (TLR5), MN312870 - MN313017 (TLR7), and MN566362 - MN566421 (mitochondrial control region, HVR1).


Abstract

Over evolutionary time, pathogen challenge shapes the immune phenotype of the host to better respond to an incipient threat. The extent and direction of this selection pressure depend on the local pathogen composition, which is in turn determined by biotic and abiotic features of the environment. However, little is known about adaptation to local pathogen threats in wild animals. The Gentoo penguin (Pygoscelis papua) is a species complex that lends itself to the study of immune adaptation because of its circumpolar distribution over a large latitudinal range, with little or no admixture between different clades. In this study, we examine the diversity in a key family of innate immune genes, the Toll-like receptors (TLRs), across the range of the Gentoo. The three TLRs that we investigated show varying levels of diversity, with TLR4 and TLR5 greatly exceeding the diversity of TLR7. We present evidence of positive selection in TLR4 and TLR5, which points to adaptation to the local pathogen milieu. Finally, we demonstrate that two positively selected, co-segregating sites in TLR5 are sufficient to alter the responsiveness of the receptor to its bacterial ligand, flagellin. Taken together, these results suggest that Gentoo penguins have experienced distinct pathogen-driven selection pressures in different environments, which may be important given the role of the Gentoo as a sentinel species in some of the world's most rapidly changing environments.


Introduction

All organisms are challenged by pathogens in their environment, but pathogen pressure can vary by location. As in free-living metazoans, a latitudinal species richness gradient has been identified in several parasitic and pathogenic taxa, which may be driven by temperature and other abiotic and biotic factors (Rohde and Heap 1998; Guernier, et al. 2004; Dionne, et al. 2007). Given this gradient in pathogen pressure, natural selection on the host should favour distinct immune phenotypes in different environments, as suggested by MHC II diversity patterns in Humboldt penguins that are associated with higher pathogen diversity at lower latitudes (Sallaberry-Pincheira, et al. 2015; Sallaberry-Pincheira, et al. 2016). In this study, we tested the hypothesis that pathogen-driven selection can generate distinct patterns of host immune-system genotype and phenotype, using the Gentoo penguin (Pygoscelis papua ssp.) as a model species complex.

The Gentoo penguin complex (Vianna, et al. 2017; Clucas, et al. 2018) is ideally suited for investigating pathogen-driven selection on the immune system. First, it has a circumpolar distribution spanning the largest latitudinal range of any penguin species (46-66° S), with breeding colonies on most of the Southern Ocean's sub-Antarctic islands, as well as on the islands off Tierra del Fuego in South America, South Georgia, the Scotia Arc, and the Western Antarctic Peninsula (Stonehouse 1970). Second, population monitoring has shown the species to be growing at the southern end of its range (Lynch, et al. 2012), while colonies in the South Atlantic and Indian Oceans have fluctuated strongly over time (Lescroel and Bost 2006; Trathan, et al. 2007). Third, the Gentoo penguin is a highly philopatric seabird known to remain close to its breeding colonies year-round (Trivelpiece, et al. 1987; Wilson, et al. 1998; Clausen and Putz 2003; Hinke, et al. 2017), limiting gene flow across breeding regions (Levy, et al. 2016; Vianna, et al. 2017; Clucas, et al. 2018).

Furthermore, across its range, the Gentoo penguin overlaps with (and occasionally co-occurs in mixed colonies with) King (Aptenodytes patagonicus), Magellanic (Spheniscus magellanicus), Macaroni (Eudyptes chrysolophus) and Southern Rockhopper penguins (Eudyptes chrysocome) in sub-Antarctic colonies, as well as congeneric Adélie (P. adeliae) and Chinstrap penguins (P. antarcticus) in its Antarctic range. Gentoo penguin colonies are also frequented by a number of wide-ranging flying birds, including albatrosses and petrels (Order: Procellariiformes), as well as predator-scavengers such as skuas (genus Stercorarius) and sheathbills (genus Chionis), which could introduce and/or spread novel avian pathogens. Levels of human interaction also vary across the range, from permanent settlements with livestock near colonies in the Falkland/Malvinas Islands, to seasonal or year-round scientific research stations in sub-Antarctic and Antarctic colonies, to the growing presence of Antarctic tourism. Differences in sympatric interactions with other species across the range of the Gentoo are likely to result in different pathogen challenges and therefore different selective pressures.

To investigate genetic diversity in the immune system, many immunogenetic studies on penguins have focused on the major histocompatibility complex (MHC; Tsuda, et al. 2001; Bollmer, et al. 2007; Knafler, et al. 2012; Sallaberry-Pincheira, et al. 2016). Increasingly, however, the Toll-like receptors (TLRs) are recognised as important monogenic determinants of disease-resistance phenotypes, and are therefore important targets of natural selection (Grueber, et al. 2014). Toll-like receptors are the best-studied family of pattern-recognition receptors in the vertebrate innate immune system, representing the front line of detection of pathogen challenge (Kawai and Akira 2006). TLRs respond to microbe-associated molecular patterns (MAMPs), molecular structures that are conserved across large groups of pathogens. Upon binding a MAMP, TLRs dimerize and initiate an intracellular signalling cascade that culminates in the production of anti-pathogen effector molecules (Akira, et al. 2001; Botos, et al. 2011).

Vertebrates have six major families of TLRs, which are typically conserved over evolutionary time and retain specificity for a particular MAMP or family of MAMPs. Most avian species have ten recognized Toll-like receptors (Roach, et al. 2005; Boyd, et al. 2007; Brownlie and Allan 2011). Of these, TLR4 and TLR5 respond to the bacterial agonists lipopolysaccharide and flagellin, respectively, while TLR7 responds to viral single-stranded RNA in the endosomal compartment (Chow, et al. 1999; Gewirtz, et al. 2001; Lund, et al. 2004).

To investigate TLR diversity across the range of the Gentoo penguin complex, we sequenced the full coding sequences of TLR4, TLR5 and TLR7, rather than the targeted exon fragments used in previous studies (Dalton, et al. 2016a). These three genes encode bacterial- and viral-sensing Toll-like receptors that are present in almost all vertebrates. Samples (n = 155) were obtained from across the species' geographic range (Figure 1), representing the largest geospatial scale of any immunogenetics study outside of humans. We describe patterns of TLR diversity that have a clear spatial component, and provide evidence that some of the diversity in TLR4 and TLR5 is driven by positive selection that differs between locations. We also demonstrate that two of the positively selected residues in TLR5 produce a phenotypic difference in the response of the receptor to flagellin, providing further evidence that Gentoo penguins have experienced differential pathogen-driven selection pressures in different environments.


Results

Amplification of TLR genes in the Gentoo penguin

Using PCR amplification or whole-genome sequencing, we confirmed that Gentoo penguins have clear homologs of TLR4, TLR5 and TLR7, and found no evidence of the gene loss or pseudogenization that has been reported for TLR5 in other avian lineages (Velová, et al. 2018) and for TLR7 in African penguins (Dalton, et al. 2016a).

The length of the P. papua TLR4 coding sequence matches the longest length reported in other bird species (2550 bp/849 aa). For TLR5, there is a start codon that yields an open reading frame (ORF) in line with the length of previously published TLR5 sequences (2589 bp/862 aa; Velová, et al. 2018), but the ORF continues upstream of this putative start codon, yielding a complete ORF of 2643 bp/880 aa, 54 bp longer than other reported avian TLR5 sequences. Both the longer and the shorter ORF respond to flagellin in our in vitro system (data not shown), suggesting that both could be functional in vivo. The length of the TLR7 coding exon, at 3126 bp/1042 aa, falls within the reported range of avian coding-sequence lengths.
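
The upstream extension of the TLR5 ORF can be illustrated with a simple scan: starting from a putative ATG, step back one codon at a time in the same reading frame until an in-frame stop codon is reached, keeping the most upstream ATG encountered. The Python sketch below is a minimal illustration on a toy sequence, assuming the standard genetic code and an uppercase DNA string; the function names are hypothetical, and this is not the annotation procedure used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def extend_orf_upstream(seq: str, start: int) -> int:
    """Return the index of the most upstream in-frame ATG reachable from `start`.

    The scan moves 5' one codon at a time in the frame of `start` and stops at
    the first in-frame stop codon (or at the start of the sequence).
    """
    if seq[start:start + 3] != "ATG":
        raise ValueError("start must point at an ATG codon")
    best = start
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an in-frame stop bounds the ORF on its 5' side
        if codon == "ATG":
            best = pos  # a more upstream in-frame start codon
        pos -= 3
    return best


def orf_length(seq: str, start: int) -> int:
    """Length in nt from `start` up to and including the first in-frame stop codon."""
    for pos in range(start, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos + 3 - start
    raise ValueError("no in-frame stop codon found")


# Toy sequence: upstream stop, extended start, putative start, short CDS, stop.
toy = "TAA" + "ATG" + "GCC" * 2 + "ATG" + "GCC" * 3 + "TGA"
putative = 12
extended = extend_orf_upstream(toy, putative)
extra_codons = (orf_length(toy, extended) - orf_length(toy, putative)) // 3
print(extended, extra_codons)
```

For P. papua TLR5, the reported extension of 2643 − 2589 = 54 bp corresponds to 18 additional in-frame codons.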

Diversity Indices and Population Differentiation

Mitochondrial Hypervariable Region (HVR1)

All colonies with more than two sampled individuals showed high haplotype diversity at mitochondrial hypervariable region 1 (Hd = 0.60-1.00; Figure 2 and Supplementary Table S1), as calculated in DnaSP 6.12.10 (Rozas, et al. 2017). Differentiation between colonies at this locus was significant and in line with previous data for this species (see Supplementary Table S2D), with population-level analyses in Arlequin v3.5.1.3 (Excoffier and Lischer 2010) identifying four clearly differentiated clades: (1) a southern clade consisting of colonies south of the Polar Front in South Georgia, the South Orkneys, the South Shetlands and the Western Antarctic Peninsula; (2) a South American/Falklands/Malvinas clade; (3) a Kerguelen clade; and (4) a North Indian Ocean clade (Marion and Crozet Islands).
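
Haplotype diversity is commonly estimated with Nei's unbiased formula, Hd = n/(n − 1) × (1 − Σ pᵢ²), where n is the number of sequences and pᵢ the frequency of haplotype i. The Python sketch below implements that formula; it is assumed here, not verified against the DnaSP source, that it matches DnaSP's Hd for complete, unambiguous haplotype calls. The colony data are hypothetical. The formula also shows why a small sample in which every sequence is distinct reaches the upper bound of 1.00.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: Hd = n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sequences are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical colonies, each a list of phased haplotype labels.
print(haplotype_diversity(["H1"] * 6))              # a single haplotype
print(haplotype_diversity(["H1", "H2", "H3"]))       # every sequence distinct
print(haplotype_diversity(["H1"] * 8 + ["H2"] * 2))  # one common, one rare haplotype
```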

Our BEAST 2 phylogenetic analysis of HVR1 supported a division within the Gentoo penguin complex approximately 3.36 Mya (1.72-4.88 Mya), when the North Indian Ocean clade diverged from all others. The Kerguelen lineage diverged from the Atlantic lineages 2.36 Mya (1.14-3.51 Mya), and the populations north and south of the Polar Front within the Atlantic Ocean appear to have diverged 1.19 Mya (0.51-1.75 Mya; Figure 3). Within each clade, no clear site-specific mtDNA structure was detected.

TLR4

For TLR4, 13 polymorphic sites were identified, with a total of 21 distinct phased haplotypes coding for 9 unique protein variants (Supplementary Table S1). Indian Ocean colonies (CR, MAR, COU and MO) showed the highest haplotype diversity (Hd = 0.66-0.90), while the South American colony in Tierra del Fuego (MT) and the Falkland/Malvinas Islands colonies (CB and BR) had very low diversity (Hd = 0.00-0.05; Figure 2). Among Gentoo penguin colonies south of the Polar Front, diversity was highest in the central colonies of SIG, COP, SP and BO (Hd = 0.51-0.63), and decreased sharply at BI in South Georgia (Hd = 0.00) as well as southward down the Western Antarctic Peninsula (GGV and JP; Hd = 0.08-0.26).

Comparing pairwise genetic distances (FST and ΦST) for TLR4 in Arlequin (Figure 5 and Supplementary Table S2A), the Indian Ocean populations of Crozet and Marion differed significantly (FST > 0.3, p < 0.01) or near-significantly (corrected p ~ 0.01) from all colonies in the Atlantic. Crozet and Marion did not differ significantly from each other (FST = 0.001, p = 0.40), while some haplotypes were shared between CR/MAR and the Kerguelen Island colonies of COU and MO, yielding some non-significant pairings among the four Indian Ocean colonies. In the Atlantic, one haplotype, also present at lower frequencies in the Indian Ocean, dominated across all colonies (Figure 4). The central sites in the Scotia Arc (COP/SP on King George Island, Signy Island, and BO at the northern end of the Western Antarctic Peninsula) exhibited greater diversity than other Atlantic sites and contained private alleles. The overall pattern of significance for FST and ΦST comparisons is shown in Figure 5. Unsurprisingly, colonies in close proximity within the same island group (COU/MO, CB/BR, COP/SP) did not differ significantly, despite variations in sample size.
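
Arlequin estimates pairwise FST within its AMOVA framework and assesses significance by permuting haplotypes between populations. The Python sketch below is a deliberately simplified stand-in, not Arlequin's estimator: a Hudson-style FST = 1 − Hw/Hb computed from haplotype labels alone (two sequences either differ or do not), with a label-permutation p-value. The colony data and function names are hypothetical.

```python
import random
from itertools import combinations


def _mean_diff_within(pop):
    """Proportion of distinct within-population pairs carrying different haplotypes."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        raise ValueError("each population needs at least two sequences")
    return sum(a != b for a, b in pairs) / len(pairs)


def _mean_diff_between(pop1, pop2):
    """Proportion of between-population pairs carrying different haplotypes."""
    return sum(a != b for a in pop1 for b in pop2) / (len(pop1) * len(pop2))


def hudson_fst(pop1, pop2):
    """Hudson-style F_ST = 1 - H_w / H_b using a 0/1 haplotype distance."""
    h_w = (_mean_diff_within(pop1) + _mean_diff_within(pop2)) / 2
    h_b = _mean_diff_between(pop1, pop2)
    if h_b == 0:
        return 0.0  # both samples fixed for the same single haplotype
    return 1.0 - h_w / h_b


def permutation_p(pop1, pop2, n_perm=999, seed=1):
    """One-sided p-value: share of label permutations with F_ST >= observed."""
    rng = random.Random(seed)
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    k = len(pop1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical haplotype calls for two colonies.
colony_a = ["H1"] * 8 + ["H2"] * 2
colony_b = ["H3"] * 9 + ["H1"]
print(round(hudson_fst(colony_a, colony_b), 3), permutation_p(colony_a, colony_b))
```

Unlike ΦST, this version ignores how many mutations separate two haplotypes; weighting pairs by sequence distance instead of a 0/1 mismatch would move it closer to a ΦST-like measure.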


Hierarchical population structure was detected for TLR4 across Gentoo penguin colonies using Arlequin (AMOVA, global FST = 0.32, p < 0.0001). The proportion of variation resulting from differences among groups was 24.01% (FCT = 0.24, p = 0.003) when colonies were placed into four groups coinciding with the four mtDNA clades: (1) Marion/Crozet archipelagos; (2) Kerguelen Is.; (3) Falkland/Malvinas and Tierra del Fuego; and (4) south of the Polar Front in the Scotia Arc and Antarctic Peninsula. FCT increased further, to 28.19% (p = 0.007), when Bird Island was grouped separately from the other Southern Gentoo penguin colonies, a pattern also suggested by genomic-level analyses (Clucas, et al. 2018).
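
The among-group statistics above come from a hierarchical AMOVA, which partitions the total genetic variance into components among groups (σa²), among populations within groups (σb²) and within populations (σc²). In the standard formulation, the fixation indices are the following ratios, which is why an FCT of 0.2401 is reported as 24.01% of the variation lying among groups:

```latex
\begin{aligned}
\sigma_T^2 &= \sigma_a^2 + \sigma_b^2 + \sigma_c^2,\\
F_{CT} &= \frac{\sigma_a^2}{\sigma_T^2}, \qquad
F_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}, \qquad
F_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2},\\
1 - F_{ST} &= (1 - F_{SC})(1 - F_{CT}).
\end{aligned}
```

The last identity follows directly from the definitions, since both sides equal σc²/σT².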

TLR5

TLR5 was the most diverse TLR locus analysed. Twenty polymorphic sites were identified, with a total of 46 distinct phased haplotypes coding for 32 unique protein variants (Supplementary Table S1). Five Gentoo penguin colonies north of the Polar Front (COU/MO in Kerguelen, CB/BR in the Falklands/Malvinas and MT in Tierra del Fuego) exhibited the highest diversity (6-17 haplotypes, Hd = 0.77-0.92; Figure 2), despite differences in sample size. This is unexpected for Martillo Island (MT), the smallest known population of this species (Nc = 12 breeding pairs; Ghys, et al. 2008), which nevertheless maintained high diversity at this locus (5 unique protein variants in a sample of n = 5). Interestingly, the Crozet and Marion Island colonies exhibited substantially lower variation at this locus, with only two haplotypes (Hd = 0.13-0.44), though these were shared with the Kerguelen colonies (Hd = 0.86-0.87; Figure 4). Southern colonies exhibited moderate diversity (Hd = 0.51-0.65), with the exception of SP in the South Shetland Islands and the Western Antarctic Peninsula colonies at the edge of the range, which had only two or three haplotypes (Hd = 0.12-0.38). Strikingly, Atlantic colonies


TLR7

TLR7 was the least diverse TLR locus, with 9 polymorphic sites, 10 phased haplotypes and 8 unique protein variants across the study colonies (Figure 2 and Supplementary Table S1). One haplotype predominated in all colonies (frequency of 70-100%), and 7 of the 10 haplotypes were private alleles found in only a single colony (Figure 4). In the Indian Ocean, the Kerguelen colonies (COU/MO) had relatively more diversity (Hd = 0.34-0.50) than the Crozet and Marion Island colonies (Hd = 0.00), which had only one haplotype. In the Atlantic, South Georgia (BI) had only one haplotype (Hd = 0.00), while all northern Atlantic colonies (CB/BR/MT) and SP in the South Shetland Islands had two haplotypes (Hd = 0.06-0.47). Other southern colonies contained 3-4 haplotypes (Hd = 0.38-0.60), while the southernmost colony, Jougla Point (JP) on the Western Antarctic Peninsula, had the largest number of haplotypes (H = 5; Hd = 0.38), including two private alleles.

Unsurprisingly, only 1 of 91 pairwise comparisons between colonies was significant for FST, and 3 of 91 for ΦST (p < 0.01), with no pattern to this differentiation (Figure 5 and Supplementary Table S2C). Different AMOVA groupings yielded among-group variation (FCT) no higher than 2.85% (four-clade grouping), further highlighting the lack of structure at this locus.

Effect of Population Size on Diversity

Census population size was not significantly correlated with TLR haplotype diversity (see Supplementary Figure S1 and Supplementary Tables S3 and S4: TLR4 p = 0.067; TLR5 p = 0.75; TLR7 p = 0.64).

Isolation by Distance

Mantel tests using the shortest by-sea distances between colonies detected significant isolation by distance for both TLR4 (r = 0.515, p = 0.001) and TLR5 (r = 0.593, p = 0.001; Supplementary Table S5A). The correlation was weaker for mtDNA HVR1 (r = 0.312), although still marginally significant (p = 0.015). In contrast, the lack of diversity and structure at the TLR7 locus yielded no significant correlation across the sampled Gentoo penguin colonies.

Analysis of Positive Selection

We investigated the P. papua TLR4, TLR5 and TLR7 genes for evidence of positive selection, which could indicate adaptation to local pathogen environments across the species' natural range. Neutrality tests (Tajima's D and Fu's Fs) did not detect significant deviation from neutrality across the full length of these genes (Supplementary Table S1). Using a codon-specific approach, we applied the site models of the codeml program in PAML v4.9 (Yang 1997; Yang 2007) to test for signatures of positive selection in P. papua TLR loci. The non-synonymous sites observed and analysed are shown at their relative positions on the proteins in Figure 6.
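
Tajima's D, one of the two neutrality tests above, compares two estimates of θ: the mean number of pairwise differences (π) and Watterson's estimator (S/a₁, from the number of segregating sites S). The Python sketch below implements the textbook formula from summary statistics; it is an illustration only, not a reproduction of DnaSP, which may treat gaps, missing data and sample sizes differently. Fu's Fs is not shown, and the input values are hypothetical.

```python
import math


def tajimas_d(n: int, s: int, pi: float) -> float:
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences pi.

    pi is the average number of nucleotide differences per pair of sequences
    (not per site). The variance terms vanish for n < 4, so n >= 4 is required.
    """
    if n < 4 or s < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimator of theta
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical summary statistics: 20 sequences with 10 segregating sites.
print(tajimas_d(20, 10, pi=1.5))  # negative: excess of rare variants
print(tajimas_d(20, 10, pi=4.0))  # positive: excess of intermediate-frequency variants
```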

As expected, the majority of codons (TLR4, 98.8%; TLR5, 98.4%) were predicted to be under purifying selection, with a ratio of non-synonymous to synonymous substitution rates (dN/dS) below 1. Interestingly, 1.2% (TLR4) and 1.6% (TLR5) of codons in the alignment were found to be positively selected using model M2a, with similar frequencies, 1.2% (TLR4) and 1.7% (TLR5), under M8. For TLR7, 99.9% of sites were found to be under purifying selection, while the remaining 0.1% were predicted to be evolving neutrally. We tested whether models that permit positive selection fit the multiple alignments significantly better than models in which dN/dS ≤ 1 by performing likelihood ratio tests between pairs of nested models. For TLR4 and TLR5, all model comparisons (M1a vs. M2a, M7 vs. M8, and M8a vs. M8) significantly favoured the positive-selection model over the neutral model (TLR4, p ≤ 0.017; TLR5, p ≤ 7.2 × 10⁻²³ for all comparisons), indicating that P. papua TLR4 and TLR5 have likely undergone positive selection. In contrast, for TLR7 the positive-selection model did not fit the data significantly better than the neutral model (p = 1), so the null hypothesis that codons are under purifying or neutral selection was not rejected.
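
Each comparison above is a likelihood ratio test between nested codeml site models: the statistic 2ΔlnL = 2(lnL_alt − lnL_null) is compared with a χ² distribution, conventionally with 2 degrees of freedom for M1a vs. M2a and M7 vs. M8, and 1 degree of freedom for M8a vs. M8. The Python sketch below computes the statistic and p-value using closed-form χ² tail probabilities for 1 and 2 degrees of freedom, so it needs only the standard library. The log-likelihoods are hypothetical, not values from this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable; closed forms for df = 1 and 2."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))  # P(|Z| > sqrt(x))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def site_model_lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a nested pair of codeml site models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: (null model, alternative model, degrees of freedom).
comparisons = {
    "M1a vs M2a": (-5012.4, -5004.1, 2),
    "M7 vs M8": (-5011.9, -5003.8, 2),
    "M8a vs M8": (-5008.0, -5003.8, 1),
}
for name, (lnl0, lnl1, df) in comparisons.items():
    stat, p = site_model_lrt(lnl0, lnl1, df)
    print(f"{name}: 2dlnL = {stat:.2f}, df = {df}, p = {p:.3g}")
```

Some analyses halve the M8a vs. M8 p-value because the null model lies on a parameter boundary; that adjustment is not applied in this sketch.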

For TLR4 and TLR5, we then used the Bayes Empirical Bayes (BEB) method to infer the posterior probability that a particular codon has experienced positive selection. For TLR4, three codons were predicted to have undergone positive selection with a posterior probability of >0.90 under model M2a (Supplementary Tables S6A-C). All but one of the TLR4 polymorphic residues were located in the extracellular (LRR) domain (residues 12, 82, 236, 316 and 445), two of which (12 and 236) were positively selected. The third positively selected site (659) was located in the transmembrane domain (Figure 6). Of the three selected sites, one (12V/A) is a relatively non-conservative change (see Supplementary Table S7 for amino acid distance metrics). One site in TLR4 (12) has previously been found to be under positive selection in birds (Velová, et al. 2018), while the remaining two (236 and 659) are novel selected sites in penguins (Supplementary Table S8).

For TLR5, there were 13 amino acid variants, of which nine were predicted to have undergone positive selection with a posterior probability of >0.90 under both M2a and M8 (Supplementary Tables S6D-F). Of these, four were located in the extracellular domain (10, 285, 442 and 535), one in the transmembrane domain (667), and four in the TIR (intracellular) domain (698, 747, 788 and 845; Figure 6). Two sites (442 and 698) have previously been reported as positively selected in other birds (Grueber, et al. 2014; Velová, et al. 2018), while the remaining seven are novel selected sites in penguins. Interestingly, one site in the TLR5 extracellular domain (285) is adjacent to two residues known to be important for flagellin binding at Interface B (Yoon, et al. 2012; Song, et al. 2017), and so could be important for ligand preference (Supplementary Table S8; Supplementary Figure S2). Four positively selected sites in TLR5 were non-conservative changes (10C/Y, 442A/V, 667S/F and 788S/C). Notably, the non-conservative TLR5 667S/F polymorphism is located in the transmembrane domain, a region that is typically constrained by the physicochemical requirement to embed in the cell membrane. Furthermore, two homology-based methods for predicting the consequences of amino acid substitutions, SIFT (Ng and Henikoff 2003) and PolyPhen-2 (Adzhubei, et al. 2013), both predict the S667F change to be of high functional consequence (SIFT score = 0.01; PolyPhen-2 score = 0.998; Supplementary Table S9).

Despite this non-conservative substitution, transmembrane integrity is predicted to remain intact: the Phobius tool (Käll, et al. 2004) predicts that the transmembrane domain is unchanged by the polymorphism. While the vast majority of avian TLR5 sequences (57/64, 89% of available sequences) have a serine at this transmembrane position, only one other bird, the Northern Fulmar (Fulmarus glacialis), has a phenylalanine (Supplementary Figure S3), suggesting that the S667F change is likely to have had a functional consequence in the Gentoo penguin.

Functional Analysis of Selected TLR5 Residues

While in silico methods can be useful indicators of protein residues under selection, only functional studies can isolate the selected phenotype and establish its relevance. To assess whether key sites identified in the positive selection analyses have functional consequences, we developed an in vitro assay using transient expression of TLRs in a reporter cell line.

Because extracellular domain polymorphisms in TLRs are likely to give rise to preferences in ligand type (Faber, et al. 2018), we focused on TMD/TIR domain polymorphisms, which are likely to give rise to differential signalling in response to the same agonist. Of the five TLR5 polymorphisms located in the TMD/TIR domains, we tested the two with the highest posterior probability of selection in the PAML analysis: residues 667 (TMD) and 845 (TIR).

Polymorphisms at these positions segregate closely with geographical location: birds from Crozet and Marion were all homozygous for the 667S/845I haplotype, while 87.3% (n = 71) of birds from colonies south of the Polar Front were homozygous for the derived haplotype, 667F/845V (Figure 7). In the Kerguelen Islands, the ancestral 667S/845I haplotype predominated, but variants at both positions were present at lower frequencies. South American/Falklands/Malvinas colonies were the most diverse at the positions of interest, with 46.7% (n = 14) of birds heterozygous at one or both sites. Overall, however, polymorphisms at these loci tended to co-segregate: 72.4% (n = 110) of birds were homozygous for either the ancestral (667S/845I) or the derived (667F/845V) haplotype.
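
The genotype categories used above can be tallied with a few lines of code. The Python sketch below assumes, hypothetically, that each bird is represented by its two phased haplotypes, each reduced to the residues at TLR5 positions 667 and 845. It classifies birds as homozygous ancestral, homozygous derived, heterozygous, or homozygous for another combination, and reports the co-segregating fraction (the quantity given as 72.4% above). The genotypes and category names are invented for illustration.

```python
from collections import Counter

ANCESTRAL = ("S", "I")  # residues at TLR5 positions 667 and 845
DERIVED = ("F", "V")


def classify(genotype):
    """Classify one bird from its two phased (residue667, residue845) haplotypes."""
    hap_a, hap_b = genotype
    if hap_a != hap_b:
        return "heterozygous"  # differs at one or both positions
    if hap_a == ANCESTRAL:
        return "hom_ancestral"
    if hap_a == DERIVED:
        return "hom_derived"
    return "hom_other"  # homozygous for another combination, e.g. 667S/845V


def cosegregation_summary(genotypes):
    """Return category counts and the fraction homozygous for either parental haplotype."""
    counts = Counter(classify(g) for g in genotypes)
    fraction = (counts["hom_ancestral"] + counts["hom_derived"]) / len(genotypes)
    return counts, fraction


# Hypothetical sample of ten birds.
birds = (
    [(ANCESTRAL, ANCESTRAL)] * 4
    + [(DERIVED, DERIVED)] * 3
    + [(ANCESTRAL, DERIVED)] * 2
    + [(("S", "V"), ("S", "V"))]
)
counts, fraction = cosegregation_summary(birds)
print(dict(counts), fraction)
```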

Given the strong tendency for the alleles at these positions to co-segregate at the extremes of the range of P. papua, and the fact that these were the TIR/TMD polymorphisms with the highest likelihood of positive selection, we investigated the functional consequences of altering both residues together. FLAG-tagged constructs of the two TLR5 TIR/TMD variants, sharing the same LRR domain, were transiently expressed in TLR5-/- HEK-Blue™ Null1 NF-κB reporter cells, and protein expression levels were normalised using an anti-FLAG ELISA. Cells expressing either construct were then treated with Salmonella enterica serovar Typhimurium-derived flagellin or a PBS control, and the NF-κB response was measured. Cells expressing either construct responded to flagellin, demonstrating that the P. papua TIR domain interacts efficiently with human adapter molecules. Interestingly, there was a marked enhancement (~1.5-fold, p = 0.04) of the flagellin response in the variant found predominantly in the Southern Gentoo compared with the variant found in the Indian Ocean clades, suggesting that the derived haplotype 667F/845V has enhanced signalling capability compared with the ancestral haplotype 667S/845I (Figure 7). These data provide further evidence that P. papua TLR5 has undergone positive selection for different immune capabilities during the expansion of the species south of the Polar Front towards the Antarctic Peninsula.
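
One simple way to express the comparison above is sketched below in Python: subtract the PBS background from the flagellin-stimulated reporter signal for each construct, divide by that construct's anti-FLAG ELISA signal, and take the ratio of derived to ancestral. This ratio-based normalisation is an assumption for illustration; the text states only that expression was normalised with an anti-FLAG ELISA, not the exact calculation or statistical test, and all readouts below are hypothetical.

```python
from statistics import mean


def normalised_response(flagellin_reads, pbs_reads, flag_elisa):
    """Background-subtracted reporter signal per unit of anti-FLAG ELISA signal.

    flagellin_reads / pbs_reads: replicate reporter readouts (arbitrary units).
    flag_elisa: anti-FLAG ELISA signal for the same construct (expression proxy).
    """
    if flag_elisa <= 0:
        raise ValueError("ELISA signal must be positive")
    return (mean(flagellin_reads) - mean(pbs_reads)) / flag_elisa


# Hypothetical replicate readouts for the two TLR5 constructs.
derived = normalised_response([1.20, 1.25, 1.15], [0.10, 0.10, 0.10], flag_elisa=1.00)
ancestral = normalised_response([0.80, 0.85, 0.75], [0.10, 0.10, 0.10], flag_elisa=0.95)
fold = derived / ancestral
print(f"derived/ancestral fold change: {fold:.2f}")
```

Dividing by the ELISA signal assumes the reporter output scales linearly with receptor expression over the range tested; equalising expression experimentally (for example by titrating plasmid amounts) avoids relying on that assumption.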


Discussion

All vertebrates are challenged by a plethora of pathogens that can exert strong selective pressures on host populations. The innate immune system, and in particular the Toll-like receptors, is responsible for both recognizing and responding to a pathogen threat, by inducing inflammation and priming the adaptive immune response. To investigate TLR adaptation in Gentoo penguins, we sequenced the entire coding regions of TLR4, TLR5 and TLR7, which recognize products of bacterial and viral pathogens. Multiple individuals were sequenced from colonies at the extremes of the species' range (~8000 km between the most distant colonies), providing an extensive geospatial component to our analysis. We found spatially associated patterns of diversity in the TLRs, with greater diversity in TLR4 and TLR5 than in TLR7. Furthermore, we identified clear evidence of positive selection in both TLR4 and TLR5, reinforced by the demonstration that two of the TLR5 TMD/TIR domain polymorphisms are sufficient to alter the magnitude of responsiveness to flagellin. To our knowledge, no other TLR study outside of humans has supported predictions of positive selection with confirmation of a functional polymorphism.

Patterns of Diversification and Selection

Most studies of TLR genetic diversity in wild populations have investigated small, bottlenecked and/or endangered avian populations (Grueber, et al. 2012; Grueber, et al. 2013; Hartmann, et al. 2014; González-Quevedo, et al. 2015; Dalton, et al. 2016a; Dalton, et al. 2016b). In these populations, drift rather than selection is suspected to have been the main driver of sampled diversity, owing to recent bottlenecks or pronounced founder effects. Several studies have documented TLR diversity in domesticated animals such as pigs (Darfour-Oduro, et al. 2016), cows (Novak, et al. 2019) and chickens (Świderská, et al. 2018), but large-scale studies of whole-gene TLR variation in wild populations are lacking. Although study design can dramatically affect the diversity detected, it is noteworthy that TLR5 was more diverse than TLR4 in Gentoo penguins, whereas TLR4 was more diverse than TLR5 in domestic chicken breeds (n = 110, 25 breeds; Świderská, et al. 2018) and grey partridge (n = 10; Vinkler, et al. 2015) (Supplementary Table S10). The diversity of TLR7 relative to TLR4 and/or TLR5 was low in Gentoo penguins, domestic chicken breeds and grey partridges alike (Supplementary Table S10).

To provide an internal reference for TLR diversity, we sequenced the mitochondrial hypervariable region (HVR1) as a neutral marker in the same individuals. In line with previous studies, we found evidence of at least four deeply divergent clades in P. papua based on HVR1 sequence (Vianna, et al. 2017; Clucas, et al. 2018; Pertierra, et al. in review). These recent analyses support a revision, first proposed by de Dinechin et al. (2012), of the previously accepted two-subspecies model based on morphological characteristics (Stonehouse 1970). These four clades are likely to have diverged over much longer timescales (millions of years) than would be expected at the intraspecific level. We found evidence of differentiation according to this underlying population structure in TLR4 and TLR5. This further supports the argument for taxonomic revision of the species, with particular focus on the classification of colonies in South Georgia and the Indian Ocean. Conversely, TLR7 was highly conserved across the species' range and is clearly not subject to the same selection pressures as TLR4 and TLR5. Overall, our study highlights that genetic differentiation across Gentoo penguin clades is driven not only by drift but also by clear population-specific adaptations to the environment.

Diversity has been widely reported to vary between different families of TLRs, particularly when comparing extracellular and intracellular TLRs. Some authors have reported that TLRs that respond to viral ligands are more likely to be under purifying selection, at least in mammals (Barreiro, et al. 2009; Wlasiuk and Nachman 2010; Wang, et al. 2016; Kloch, et al. 2018). The pattern may not be consistent in birds, however. TLR3 displayed the greatest number of non-synonymous variants of the four TLRs tested in different chicken breeds (Świderská, et al. 2018), and TLR7 diversity in the house finch (Carpodacus mexicanus) far exceeded that of TLR4 and TLR5 (Alcaide and Edwards 2011). The pattern in Gentoo penguins is consistent with that observed in mammals (and also in the lesser kestrel, Falco naumanni; Alcaide and Edwards 2011). Overall nucleotide diversity was several times higher for the extracellular/bacterial TLRs 4 and 5 (TLR4: 6.1 × 10⁻⁴ ± 0.5 × 10⁻⁴; TLR5: 14.8 × 10⁻⁴ ± 0.6 × 10⁻⁴) than for the intracellular/viral TLR7 (1.0 × 10⁻⁴ ± 0.1 × 10⁻⁴). This indicates strong purifying selection for maintenance of function in TLR7. In addition, we found no evidence of any codons under selection in TLR7, compared with three and nine sites in TLR4 and TLR5, respectively. This is similar to the pattern of positively selected residues reported in several avian species (Alcaide and Edwards 2011).
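Nucleotide diversity (π) is the average proportion of sites that differ between two randomly drawn sequences. The following is a minimal Python sketch, not the DnaSP/Arlequin implementation used in this study. It assumes aligned, equal-length haplotypes without gaps or ambiguity codes, and the function name and toy sequences are hypothetical. It also reproduces the fold differences implied by the reported values.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Nei's pi: mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Count mismatches for every unordered pair of sequences.
    pair_diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]
    return sum(pair_diffs) / len(pair_diffs) / length


# Overall pi values reported in the text for each locus.
reported_pi = {"TLR4": 6.1e-4, "TLR5": 14.8e-4, "TLR7": 1.0e-4}
# Fold difference relative to TLR7: roughly 6x for TLR4 and 15x for TLR5.
fold_vs_tlr7 = {gene: reported_pi[gene] / reported_pi["TLR7"] for gene in ("TLR4", "TLR5")}

if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACGA"]  # hypothetical haplotypes
    print(nucleotide_diversity(toy))  # (1 + 1 + 0) differences / 3 pairs / 4 sites
    print(fold_vs_tlr7)
```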

Within TLR sequences, variation is not uniformly distributed across the domains of the receptor. TLRs are type I integral membrane glycoproteins with a highly conserved architecture across large phylogenetic distances (Botos, et al. 2011). A typical TLR has three parts:

- an N-terminal extracellular (or intraluminal, for intracellular TLRs) leucine-rich repeat (LRR) domain for ligand binding;
- a single transmembrane helix;
- a C-terminal cytoplasmic signalling (Toll/interleukin-1 receptor, TIR) domain that interacts with intracellular adapter proteins (Bell, et al. 2003).

The LRR domain directly binds microbe-derived ligands in all known vertebrate TLRs. The exception is TLR4, which recognizes lipopolysaccharide via an accessory molecule, myeloid differentiation factor 2 (MD-2; Park, et al. 2009). The LRR domain represents the interface between host and pathogen, and pathogens exhibit variable microbe-associated molecular patterns (MAMPs) to evade detection (Andersen-Nissen, et al. 2005). As a result, there is often an excess of diversity in the LRR domain compared with the TIR domain (Świderská, et al. 2018; Velová, et al. 2018). In contrast, TIR domains interact with adapter proteins such as MyD88 (myeloid differentiation primary-response protein 88) that are shared between several TLRs. A MyD88-independent pathway also facilitates TLR3 and TLR4 signalling (Akira and Takeda 2004). Unsurprisingly, a study of 366 vertebrate TLRs from 96 species found TIR domains to be much more highly conserved than their corresponding extracellular domains, with the exception of TLR10 (Mikami, et al. 2012). The same trend is evident within species. In a study of TLR3, 4, 5 and 7 diversity across domestic chicken breeds, only three of the 46 non-synonymous polymorphisms (two in chTLR3 and one in chTLR7) were located in the TIR domain (Świderská, et al. 2018).

In line with previous evidence for the asymmetric distribution of polymorphisms across TLR domains, we identified an excess of polymorphisms in the LRR domain compared with the TIR domain in two of the three TLRs studied (TLR4: 8 LRR vs. 0 TIR; TLR7: 6 LRR vs. 0 TIR). Interestingly, P. papua TLR5 contained more TIR domain polymorphisms than would be expected from other species (11 LRR vs. 7 TIR). This is notable given that the LRR domain is more than three times the length of the TIR domain. Of the seven TLR5 TIR domain polymorphisms, five were non-synonymous substitutions, suggesting that the TIR domain of TLR5 has been under selection to modulate signalling intensity.

Somewhat surprisingly, we also identified non-synonymous polymorphic sites in the transmembrane domains of both TLR4 (659 Ala/Thr) and TLR5 (667 Ser/Phe). The transmembrane domain is an uncommon location for TLR polymorphisms, presumably because the region is highly constrained by chemical and functional requirements. As such, polymorphisms in this region often have large effects. For instance, the human TLR1 602S variant is associated with disrupted cell-surface localization of the receptor but is protective against leprosy-associated pathology. It is also noteworthy that the Gentoo penguin TLR5 transmembrane polymorphic site (667) identified in this study is highly conserved elsewhere in the avian phylogeny. Among the other birds with published TLR5 sequences displayed in the alignment (Supplementary Figure S3), 57 (89%) have a serine at this position in the transmembrane domain (the ancestral P. papua genotype). Only one other bird, the Northern Fulmar (Fulmarus glacialis), has a phenylalanine (the derived P. papua genotype). The high conservation of serine at this position points to widespread pressure for maintenance of function across the avian phylogeny. It therefore provides further evidence that this positively selected residue has functional consequences.

417 Functional polymorphisms in TLR5 support positive selection

We identified a number of positively selected codons in both TLR4 and TLR5, making both receptors candidates for further functional investigation. Polymorphisms in TLR LRR domains have the potential to yield preferences for subtly different microbial ligands, such as LPS or flagellins from different bacterial species (Nahori, et al. 2005; Faber, et al. 2018). However, very limited data are available on which pathogens are present in the environments of each of the Gentoo clades. Elucidating any differences in ligand preference will therefore require further study. We did, however, identify one positively selected site in TLR5 (285) that is adjacent to two residues important for flagellin binding in Interface B (Yoon, et al. 2012; Song, et al. 2017). This site would be a good candidate for functional investigation of changes in flagellin preference. Such work would be difficult, however, in the absence of known flagellin variants in candidate P. papua pathogens. Instead, we investigated TIR and transmembrane domain polymorphisms for functional consequences, because these can yield differences in signalling intensity in response to the same agonist (Faber, et al. 2018). The TIR domain of TLR4 did not show any non-synonymous polymorphisms, so we focused on the TLR5 TIR/transmembrane region. In particular, we examined the two residues with the strongest signature of positive selection (667 and 845). Site 667 was likely to be of significant functional consequence for three reasons: its transmembrane location, the non-conservative amino acid change (serine to phenylalanine), and the prediction by both SIFT and PolyPhen-2 that the change is of high functional importance.

The Gentoo penguin is reported to have undergone a circumpolar expansion. Ancestral populations in the Indian Ocean seeded northern populations that expanded into the Atlantic. Further expansions followed south of the Polar Front and to the West Antarctic Peninsula, the southernmost extreme of the range (de Dinechin, et al. 2012; Peña, et al. 2014). Notably, one of the ancestral Indian Ocean clades (Marion and Crozet archipelagos) is completely dominated by birds of the 667S/845I genotype. By contrast, the most derived clade of Gentoo penguins, south of the Polar Front, is almost entirely dominated by the 667F/845V genotype. These data may reflect an incipient selective sweep of the 667F/845V genotype in Southern Gentoo penguins.

An alternative explanation for the reduction in diversity at residues 667 and 845 could be genetic bottlenecking during the expansion of P. papua south of the Polar Front. However, neutral markers in this study and a previous study (Levy, et al. 2016) show that neutral variation is maintained in the Southern Gentoo penguin colonies, up to the southernmost extreme of the range. These levels are comparable with those in the northern Atlantic Gentoo penguin clade. In TLR5, we also saw no correlation between census population size and haplotype diversity. These findings support the hypothesis that a selective sweep, rather than a bottleneck, is responsible for the near-fixation of the TLR5 667F/845V haplotype in the Southern Gentoo penguin. Perhaps more importantly, the finding that the 667F/845V haplotype has enhanced signalling capability provides a functional basis for selection of this TLR5 haplotype.

454 Potential drivers of selection

Toll-like receptors, like other genes of the immune system, are subject to competing types of selection. Balancing selection maintains diversity at the population level in response to the diversity of pathogens in the environment, as was recently proposed for TLRs of the bank vole (Myodes glareolus; Kloch, et al. 2018). Alternatively, purifying selection may predominate to retain key functionality, as described in large-scale studies of human TLRs across different ethnic backgrounds (Mukherjee, et al. 2014). Finally, positive selection may promote the fixation of novel variants that confer a fitness advantage in the response to pathogens. In the present study, we found evidence of positive selection in TLR4 and TLR5. This likely indicates that pathogen composition differs substantially between locations across the Gentoo penguin's range.

Spatial heterogeneity in the profile of pathogens that afflict Gentoo penguins would be a key driver of the patterns of selection identified in the TLR variants. Latitudinal species diversity gradients have been described for pathogens and their hosts (Rohde and Heap 1998; Guégan, et al. 2008; Dionne, et al. 2007), which might suggest that Antarctic species face fewer pathogens. However, a diverse range of pathogens is found in these environments (discussed below). Moreover, diverse biotic and abiotic factors vary spatially within the Gentoo range (Trathan, et al. 2007; Barbosa, et al. 2009; Barbosa, et al. 2013; Lamont, et al. 2018; Chown, et al. 2015), and these factors will affect the transmission of pathogens. Indeed, the regionalized selection of TLR alleles in different sectors of the Gentoo range supports the premise that different pathogens are more prevalent or more pathogenic in different populations.

The dense colonial conditions and ubiquitous guano (faeces) that characterise Gentoo penguin habitats are ideal for a wide range of pathogens transmitted by direct contact or faeces. Furthermore, penguins as a group are known to be highly susceptible to a variety of infectious diseases, including avian cholera (Jaeger, et al. 2018), avian pox (Kane, et al. 2012), avian malaria (Fix, et al. 1988; Grilo, et al. 2016) and aspergillosis (Flach, et al. 1990). A number of infection-associated mass mortality events have been documented in both wild and captive penguin populations (Grimaldi, et al. 2015). However, little is known about the pathogens that exist in sub-Antarctic and Antarctic regions, their prevalence, or their fitness costs to penguin populations. Limited data are available on the prevalence of diseases in penguin populations (Clarke and Kerry 2000; Barbosa and Palacios 2009; Woods, et al. 2009; Grimaldi, et al. 2015). Most studies rely on short notes, observations and case reports closely tied to obvious signs of disease and mass mortality in well-studied and highly visited penguin colonies. Studies that survey environmental and host microbiomes to characterise pathogen presence in polar regions share several limitations. They are restricted to sites near major polar research stations, have small sample sizes, and/or do not cover large spatial and temporal ranges (Zdanowski, et al. 2004; Dewar, et al. 2013; Fan, et al. 2013; Ma, et al. 2013; Dewar, et al. 2014).

The presence of Gram-negative bacteria bearing both lipopolysaccharide and flagella, including Campylobacter, Escherichia, Salmonella and others, has been demonstrated in Gentoo penguin colonies (Dimitrov, et al. 2009; Bonnedahl 2011; Barbosa, et al. 2013; González-Acuna, et al. 2013; García-Peña, et al. 2017). However, it is not known whether any of these (or other bacterial pathogens) vary across the Gentoo penguin's range. Nor is it known whether they have played a role in the selection of Gentoo penguin TLR4 or TLR5 variants.

Studies of single-stranded RNA (ssRNA) viruses, which would typically be recognised by TLR7, are similarly lacking in Gentoo penguins. ssRNA viruses, including the causative agents of Newcastle disease and avian influenza, have occasionally been detected in Pygoscelis penguins through immunological assays and direct isolation (Morgan and Westbury 1988; Wallensten, et al. 2006; Neira, et al. 2017; Olivares, et al. 2019; Wille, et al. 2019). Nevertheless, the fitness consequences of viral infection for penguin populations are unknown. We know of only one case report, from Signy Island, describing evidence of a puffinosis outbreak (normally caused by a coronavirus) in Gentoo penguin chicks (Mac Donald and Conroy 1971). The viral drivers behind the strong purifying selection we observed in TLR7 are unknown. One possibility is that the ssRNA viruses affecting Gentoo penguins are less diverse across the species' range than flagellated Gram-negative bacteria.

Sympatric interactions with a diversity of migratory flying birds and their parasites may contribute substantially to pathogen diversity in Gentoo penguin colonies. Birds such as albatrosses, petrels, shearwaters, sheathbills, shags, gulls, terns and skuas are often observed in close proximity to Gentoo penguin colonies, and 46 species are recognised in Antarctica alone (Lepage, et al. 2014). There is some evidence that ectoparasites and blood parasites are transmitted between co-occurring bird species (Barbosa, et al. 2011; Levin, et al. 2013). Cross-species transmission events may therefore play an important role in transmission between colonies. They may also shape the profile of pathogens afflicting particular Gentoo penguin populations.

It remains unclear why the two functionally tested TMD/TIR residues in TLR5 would confer increased responsiveness to flagellin in Gentoo penguins south of the Polar Front. TLR signalling must be tightly controlled, and aberrant TLR-induced inflammation can lead to immune pathology, toxic shock syndrome and death. It is unsurprising, therefore, that TLR polymorphisms have been described that confer reduced sensitivity to their agonist and a state of tolerance. For instance, the replacement of a highly conserved proline residue by a histidine at position 712 of TLR4 confers endotoxin resistance in certain strains of mice (Qureshi, et al. 1999). The Southern Gentoo penguin clade may experience lower exposure to (or diversity of) pathogens than other Gentoo penguin clades. If so, individuals could tolerate enhanced signalling in response to a prevailing infection. Alternatively, the enhanced signalling could reflect adaptation of the Southern Gentoo penguin to a particular pathogen that is present in the West Antarctic Peninsula and absent elsewhere. The significance of these adaptive changes in the Gentoo penguin immune system will only become clear once the pathogen threats faced by Gentoo penguins are much better understood.

528 Concluding remarks

This wide-ranging immunogenetic study of Toll-like receptors in wild Gentoo penguins reveals differential selection and adaptation to local pathogen pressure. The drivers behind the observed patterns of diversity and selection remain unclear given currently available data. Nonetheless, it is clear that the Gentoo penguin has undergone adaptation to local pathogen assemblages across its range. Infectious disease threats to penguins are likely to become ever more severe in the coming decades given the rapidly changing polar climate (Mayewski 2012; Lynch, et al. 2012). There is also evidence of reverse zoonotic transmission of enteric bacteria from humans to seabird species in Antarctica (Cerda-Cuellar, et al. 2019). Such transmission could increase further given the growing presence of tourists and scientific research programs in Antarctica. The Gentoo penguin is not currently one of the 13 of 18 penguin species with a conservation status of threatened or near-threatened. Even so, certain sub-Antarctic populations have experienced sharp declines (Crawford, et al. 2003; Lescroel and Bost 2006; Crawford, et al. 2009; Crawford, et al. 2014). Consequently, the vulnerability of pathogen-naïve penguin populations should not be underestimated, nor should the importance of the Gentoo penguin as a sentinel species in the Southern Ocean (Carpenter-Kling, et al. 2019).

Our findings have important implications for the conservation of not just Gentoo penguins but also many other vertebrate species, both in the wild and in captivity. Until now, efforts to genetically delineate conservation units have relied mostly on neutral markers. The ever-increasing availability of genomic data allows targeted analysis of pathogen-recognition and other immune genes. Such analysis can assess whether different populations possess specific functional adaptations to their environments and should therefore be conserved separately. Combined with pathogen discovery and surveillance systems, the approach used here could better define conservation units in species that occupy varied habitats and ecological niches. Conservation resources could then be focused on potentially susceptible populations.


This study used 155 blood samples from Gentoo penguins that had previously been obtained in the framework of other projects. Samples were collected between 1999 and 2017 at the 14 sites shown in Figure 1 (details in Supplementary Table S11). To minimize stress during blood sampling, penguins were restrained in one of two ways (Lemaho, et al. 1992). Either the flippers were restrained and the head placed under the arm of the handler, or the bird was wrapped in cushioned material covering the head and preventing movement. A second handler cleaned the area with an alcohol swab and then took up to 1 mL of blood from the brachial, intertarsal or jugular vein, using a 25G or 23G needle and a 1 mL syringe or capillary. Total restraint time was generally two to three minutes. The animal was then released at the edge of the colony and observed to ensure that it returned to normal behavior. Blood was stored in 95% ethanol or Queen's Lysis Buffer at -20 °C, transported at room temperature, and stored again at -20 °C upon arrival. All blood samples were imported under the appropriate animal by-product import licenses.

Sampling was conducted under permits from each site's territorial government or governing agency. These permits for animal handling were issued following independent institutional ethical review of the sampling protocols, in accordance with Scientific Committee on Antarctic Research (SCAR) guidelines.

569 DNA extraction

DNA for samples from MO, CB, BI, COP and JP was extracted from blood using QIAGEN DNeasy Blood and Tissue kits. For blood samples, the digestion step was modified to include 40 μL proteinase K and extended to 3 hours. Details of the modifications made to the protocols for tissue samples are available in Younger, Emmerson, et al. (2015) and Younger, Clucas, et al. (2015). All these samples were treated with 1 μL Riboshredder (Epicentre) to reduce RNA contamination. DNA was visualized on a 1% agarose gel to confirm that high-molecular-weight DNA was present. DNA concentration and purity were measured on a Qubit and a Nanodrop (Thermo Fisher Scientific), respectively. These samples are stored at the University of Oxford for future analysis. DNA from all other sampling sites was isolated using a modified salt protocol (Aljanabi and Martinez 1997), as detailed in Vianna, et al. (2017). These samples are stored at the Pontificia Universidad Católica de Chile for future analysis.

581 TLR Genotyping

Primers for the TLR4, 5 and 7 coding regions were designed using the Primer3 2.3.7-based primer design feature (Untergasser, et al. 2012) in Geneious v.11.0 (Biomatters, http://www.geneious.com). Reference coding sequences for primer design were derived from two sources: the reference genome of the congeneric Adélie penguin (Pygoscelis adeliae; GenBank accession JMFP01000000) and unpublished Gentoo penguin genomic data.

PCR amplification for samples from MO, CB, BI, COP and JP used the following reaction recipes:

- TLR4: 12 µL reactions containing 9 µL Qiagen Taq PCR Master Mix, 2 µL of 10 µM forward and reverse primer mix, and 1 µL of template DNA diluted 1:100.
- TLR5 and TLR7: 25 µL reactions containing 5 µL 5X Phusion High Fidelity (HF) Buffer (New England Biolabs, UK), 0.5 µL 10 mM dNTPs, 1.25 µL of 10 µM forward primer, 1.25 µL of 10 µM reverse primer, 2 µL of template DNA diluted 1:100, 0.25 µL of Phusion Hot Start Flex DNA Polymerase (New England Biolabs, UK) and 14.75 µL nuclease-free water. One GC-rich region in TLR7 required Phusion GC buffer, rather than HF buffer, for amplification.

PCR products were visualized on a 1% agarose gel stained with SYBR Safe. The primers and PCR reaction conditions are fully detailed in the Supplementary Information (Supplementary Table S12). PCR products were purified and Sanger sequenced using Macrogen Europe's EZ-Seq (TLR4) or Eco-Seq (TLR5 and TLR7) services (http://www.macrogen.com, Netherlands). The PCR primers were also used as sequencing primers.
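The reaction recipes above can be sanity-checked with simple C1V1 = C2V2 arithmetic. This hypothetical Python sketch assumes that the 10 mM dNTP stock refers to each dNTP and that the 5X buffer is meant to reach 1X. It is an illustration, not part of the published protocol.

```python
def final_concentration(stock, volume_added_ul, total_volume_ul):
    """C1 * V1 / V_total: concentration of a component after dilution into the reaction."""
    return stock * volume_added_ul / total_volume_ul


# TLR5/TLR7 Phusion reaction (volumes in µL), as listed in the text.
phusion_reaction_ul = {
    "5X HF buffer": 5.0,
    "10 mM dNTPs": 0.5,
    "10 µM forward primer": 1.25,
    "10 µM reverse primer": 1.25,
    "template DNA (1:100)": 2.0,
    "Phusion Hot Start Flex polymerase": 0.25,
    "nuclease-free water": 14.75,
}
total_ul = sum(phusion_reaction_ul.values())  # 25 µL

buffer_x = final_concentration(5, 5.0, total_ul)     # 1X buffer
dntp_mm = final_concentration(10, 0.5, total_ul)     # 0.2 mM (200 µM), assumed per dNTP
primer_um = final_concentration(10, 1.25, total_ul)  # 0.5 µM of each primer

# TLR4 reaction: 9 µL master mix + 2 µL primer mix + 1 µL template.
tlr4_total_ul = 9 + 2 + 1

if __name__ == "__main__":
    print(total_ul, buffer_x, dntp_mm, primer_um, tlr4_total_ul)
```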

Samples from all remaining sampling sites underwent whole-genome sequencing. For these, 100 ng of genomic DNA was fragmented to an average of 350 base pairs to construct paired-end libraries using the Illumina TruSeq Nano kit, with the included indexed adapters and barcodes. Libraries were enriched with six PCR cycles, purified with Sample Purification Beads and quantified using a Qubit fluorometer. They were then sequenced to ~20x coverage with 150 bp paired-end reads on an Illumina HiSeq X platform at MedGenome (USA).

Sequences for each TLR coding region were assembled, edited and aligned using Geneious v.11.0. For TLR4, which has multiple exons along the coding region, exon sequences were extracted and concatenated for further analysis. Heterozygous sites and single nucleotide polymorphisms (SNPs) were detected by visual examination of chromatograms. In cases of doubt, samples were resequenced so that only high-quality reads from multiple sequencing runs were called as SNPs. All heterozygous sites also had homozygous individuals within the dataset, and each gene had at least one haplotype that was homozygous across the full length of the gene. All alleles were verified using multiple independent Sanger sequencing runs and, where available, the whole-genome sequencing data. Heterozygous positions were labelled using the International Union of Pure and Applied Chemistry (IUPAC) codes for degenerate nucleotides.

615 mtDNA Genotyping

For samples from MO, CB, BI, COP and JP, the hypervariable region of the mitochondrial control region (HVR1), also known as Domain I, was amplified using the primers GPPAIR3F and GPPAIR3R (Clucas, et al. 2014). Amplifications were conducted in 25 µL volumes according to the manufacturer's instructions, using Phusion Hot Start Flex DNA Polymerase (New England Biolabs, UK) and 2 µL of template DNA diluted 1:100. Amplification used a two-step PCR: an initial denaturation at 98 °C for 30 seconds, 40 cycles of 98 °C for 10 seconds and 72 °C for 20 seconds, and a final 10-minute extension at 72 °C. PCR products were visualized on a 1% agarose gel stained with SYBR Safe. PCR fragments were purified by ethanol/sodium acetate precipitation. Sequencing was performed using the Applied Biosystems BigDye Terminator v3.1 sequencing kit (Applied Biosystems), with the PCR primers also used as sequencing primers.

Published mtDNA HVR1 sequences for the samples from BI (GenBank accessions KJ646314-KJ646330, n = 16) and COP (KJ646361-KJ646382, n = 21) were included in the analysis for the relevant individuals. mtDNA data were not available for the individuals from two Antarctic sites. Other individuals from those sites do have sequence data available, but we only included data from individuals sequenced for both the TLRs and HVR1.

Individual mtDNA fragments from all remaining sites (CR, MAR, COU, BR, MT, SIG, SP) were amplified using the primers tRNAGlu and AH530 (Roeder, et al. 2002). These reactions included 0.4 µM of each primer, 1X PCR reaction buffer, 1.5 mM MgCl2, 200 µM of each dNTP and 1 U of Platinum Taq DNA Polymerase (Invitrogen). Amplification used a two-phase touchdown program (Korbie and Mattick 2008):

1. 10 minutes at 95 °C, followed by 11 touchdown cycles of 95 °C for 15 seconds, annealing for 30 seconds (decreasing from 60 °C to 50 °C in 1 °C steps, one cycle per step) and 72 °C for 45 seconds;
2. 35 amplification cycles of 95 °C for 15 seconds, 50 °C for 30 seconds and 72 °C for 45 seconds, followed by a final extension of 30 minutes at 72 °C.

PCR products were purified and sequenced by Macrogen using an ABI PRISM 3730XL.

HVR1 sequences were aligned and edited in Geneious v11.0, and only the overlapping segment common to all samples was retained. Consensus sequences for the resulting 273 bp region of interest were extracted for analysis.

643 Population-Level Analyses

Haplotypes were inferred for each of the diploid TLR loci using the PHASE algorithm (Stephens, et al. 2001) implemented in DnaSP 6.12.10 (Rozas, et al. 2017), with 10,000 iterations and 1,000 burn-in iterations. Phased haplotype data were used to calculate standard genetic diversity measures for each population in DnaSP 6.12.10 and Arlequin v3.5.1.3 (Excoffier and Lischer 2010). These measures included the number of polymorphic sites, number of haplotypes, haplotype diversity and nucleotide diversity. Minimum spanning haplotype networks were constructed and visualized for each locus using PopART 1.7 (Bandelt, et al. 1999; Leigh and Bryant 2015). DnaSP was also used to identify synonymous and non-synonymous polymorphic sites and their frequencies. Arlequin was used to calculate Tajima's D (Tajima 1989) and Fu's Fs (Fu 1997). FSTAT v.2.9.3 (Goudet 1995) was used to calculate allelic richness, taking into account differences in sample size.
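As a minimal sketch of two of the statistics listed above, the following Python code computes Nei's unbiased haplotype diversity and Tajima's D from phased, aligned haplotypes. It is not the DnaSP or Arlequin code. It assumes gap-free alignments, the function names and toy data are hypothetical, and Fu's Fs and rarefied allelic richness are not shown.

```python
import math
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def tajimas_d(seqs):
    """Tajima's (1989) D from gap-free aligned sequences; returns None if S == 0."""
    n = len(seqs)
    # Segregating sites: alignment columns with more than one nucleotide state.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None
    # Mean number of pairwise differences (per sequence, not per site).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    print(haplotype_diversity(["H1", "H1", "H2", "H3"]))
    print(tajimas_d(["AAAA", "AAAA", "AAAA", "TAAA"]))  # singleton excess: D < 0
    print(tajimas_d(["AAAA", "AAAA", "TAAA", "TAAA"]))  # intermediate frequency: D > 0
```

Negative D indicates an excess of rare variants (as after a sweep or expansion), while positive D indicates an excess of intermediate-frequency variants (as under balancing selection or population subdivision).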

TLR nucleotide diversity has been observed to correlate with population size in some island bird populations (Gilroy, et al. 2017). We therefore evaluated the relationship between Gentoo penguin census population sizes (Nc) and haplotype diversity (Hd) at each locus. Gentoo penguins are philopatric, but they also show evidence of admixture within island groups and along adjacent coastlines (Levy, et al. 2016; Vianna, et al. 2017). For this reason, the census population sizes used were the numbers of breeding pairs from the most recent available survey of each archipelago, or of each region in the case of the South Shetland Islands and Western Antarctic Peninsula. The Nc values and source survey references are available in Supplementary Table S3. Spearman's rank correlations and p values were calculated for each diversity-size comparison.
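The diversity-size comparison amounts to a rank correlation between per-population Nc and Hd values. The sketch below computes Spearman's rho as the Pearson correlation of tie-averaged ranks. For the p-value it uses a two-sided permutation test, which is a swapped-in choice rather than the method used in the study. Function names, the seed and the permutation count are hypothetical, and no census or diversity values from the study are reproduced.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_test(x, y, permutations=9999, seed=0):
    """Spearman's rho with a two-sided permutation p-value (hypothetical helper)."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = ry[:]
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Small tolerance so permutations tying the observed |rho| are counted.
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            extreme += 1
    return rho, (extreme + 1) / (permutations + 1)
```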

For population differentiation comparisons, Arlequin was used to calculate two sets of pairwise distances: FST based on haplotype frequencies (Weir and Cockerham 1984), and ΦST for the TLR and mtDNA sequences (Excoffier, et al. 1992). FindModel (Posada and Crandall 1998) was used to find the best-fit substitution model for use in Arlequin. ΦST values for the TLR loci were obtained using the Tamura and Nei substitution model (Tamura and Nei 1993). mtDNA ΦST analysis was carried out using the Kimura 2-parameter model (Kimura 1980) with a gamma of 0.27. Analysis of molecular variance (AMOVA) was used to compute hierarchical F-statistics, with 10,000 permutations, to evaluate likely patterns of genetic structure (Excoffier, et al. 1992). The aim was to identify the population grouping that maximized the among-group variation (FCT) and minimized the variation among colonies within groups (FSC). The significance of overall and pairwise genetic distances was assessed using 1,000,000 permutations. We used the SGoF+ method (Carvajal-Rodriguez and de Una-Alvarez 2011) within the Myriads software (Carvajal-Rodriguez 2018) to correct for multiple tests.
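The mtDNA ΦST analysis above relies on pairwise Kimura 2-parameter (K2P) distances:

d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q)

Here P and Q are the proportions of transitional and transversional differences. A minimal Python sketch of that distance follows. It omits the gamma correction (0.27) applied in the study and the AMOVA partitioning that Arlequin performs, and the function name is hypothetical.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(seq1, seq2):
    """Kimura (1980) two-parameter distance, without a gamma correction."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and IUPAC ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance is undefined (sequences too divergent)")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in 10 sites
```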

To test for isolation by distance, we first computed the shortest distance by sea (at summer ice extent), in km, between each pair of sampling locations using Google Earth v7.3.2.5776 (Supplementary Table S5B). These distances were then related to pairwise FST in a Mantel test implemented in Arlequin.
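A Mantel test asks whether two distance matrices are correlated. It assesses significance by jointly permuting the rows and columns of one matrix. The sketch below is a generic one-sided version for isolation by distance (a positive association between sea distance and FST), not Arlequin's implementation. The function names, permutation count, seed and toy coordinates are hypothetical.

```python
import math
import random


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def mantel_test(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test (positive association) between two symmetric distance matrices."""
    n = len(geo)
    upper = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [geo[i][j] for i, j in upper]
    r_obs = _pearson(x, [gen[i][j] for i, j in upper])
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of the genetic matrix together
        y = [gen[order[i]][order[j]] for i, j in upper]
        if _pearson(x, y) >= r_obs:
            at_least_as_large += 1
    return r_obs, (at_least_as_large + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical colonies along a line; FST is set proportional to distance.
    positions = [0, 1, 3, 7, 12, 20]
    geo = [[abs(a - b) for b in positions] for a in positions]
    gen = [[0.001 * abs(a - b) for b in positions] for a in positions]
    print(mantel_test(geo, gen))
```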

Population Divergence and Phylogeography

Phylogenetic reconstruction and estimation of divergence times were carried out using BEAST 2.5.2 (Bouckaert, et al. 2019). The evolutionary model for the mtDNA analysis was selected using jModelTest v. 2.1.10 (Darriba, et al. 2012), testing 88 candidate models and selecting the best fit using the Bayesian Information Criterion (BIC). All 88 models were within the 100% confidence interval, and HKY+G (Hasegawa, et al. 1985) was selected for the divergence analyses. A total of 15 Adélie penguin (P. adeliae) and 15 Chinstrap penguin (P. antarcticus) mitochondrial HVR1 sequences from GenBank (Clucas, et al. 2014), aligned and trimmed to the length of the Gentoo sequences to avoid bias, were included in the analysis as outgroups.
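BIC-based model selection reduces to computing k ln(n) - 2 ln L for each candidate model and keeping the lowest value. The sketch below assumes the caller supplies each model's maximized log-likelihood and free-parameter count; how parameters are counted (for example, whether branch lengths are included) and which sample size n is used (often the number of alignment sites) must follow the conventions of the program actually used. Function names and any example values are hypothetical.

```python
import math


def bic(log_likelihood, n_params, sample_size):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(sample_size) - 2.0 * log_likelihood


def best_model(fits, sample_size):
    """Return the name of the model with the lowest BIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: bic(*fits[name], sample_size))
```

The ln(n) penalty grows with sample size, so BIC tends to favour simpler models than AIC when extra parameters bring only small likelihood gains.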

The prior on the age of the most recent common ancestor of the genus Pygoscelis was set at 7.6 Mya (Subramanian, et al. 2013), derived from the fossil calibration for Pygoscelis grandis (Walsh and Suarez 2006), with a normal distribution and a standard deviation (σ) of 1.3 Mya. A strict molecular clock with a starting prior of 1.0 and a Yule speciation process for branching rates, with uniform priors for the birth and clock rates of 1.0, were applied. Four independent MCMC runs of 30 million generations each were performed, logging parameters every 3,000 steps. The four runs were then combined using LogCombiner v.2.5.2 and assessed for convergence in Tracer v.1.7.1 (Rambaut, et al. 2018). All parameters converged, with effective sample size (ESS) values greater than 6,000. A maximum clade credibility tree was then generated using TreeAnnotator v.2.5.2 (part of the BEAST software distribution) and visualized in FigTree v.1.4.3.
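The convergence criterion above relies on the effective sample size. A simplified estimator divides the number of logged samples by the integrated autocorrelation time, summing sample autocorrelations until the first non-positive lag. The sketch below illustrates that general idea under these assumptions; it is not Tracer's exact algorithm, and both function names, including the synthetic-trace helper, are hypothetical.

```python
import random


def effective_sample_size(samples):
    """n / tau, with tau = 1 + 2 * (sum of positive-lag autocorrelations)."""
    n = len(samples)
    mean = sum(samples) / n
    dev = [x - mean for x in samples]
    var = sum(d * d for d in dev) / n
    if var == 0:
        return float(n)
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n * var)
        if rho <= 0:  # truncate once autocorrelation is lost in noise
            break
        tau += 2.0 * rho
    return n / tau


def simulate_ar1(n, phi, seed=42):
    """Synthetic trace x_t = phi * x_{t-1} + Gaussian noise, for testing only."""
    rng = random.Random(seed)
    trace = [0.0]
    for _ in range(n - 1):
        trace.append(phi * trace[-1] + rng.gauss(0.0, 1.0))
    return trace
```

For an AR(1) trace the ESS is roughly n(1 - phi)/(1 + phi), so a strongly autocorrelated chain (phi = 0.9) yields only about one-nineteenth as many effectively independent samples as it logs.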

Selection Analyses

Phylogenetic inference was carried out on phased sequence data for each TLR locus using RAxML-NG with a GTR substitution matrix (Kozlov, et al. 2019). To detect selection, maximum likelihood analysis of the ratio of non-synonymous to synonymous nucleotide substitution rates (dN/dS; ω) was performed with the codeml program in PAML v. 4.9 (Yang 1997; Yang 2007). Several models were fitted to the multiple alignments: M1a (neutral model; two site classes: 0 < ω0 < 1 and ω1 = 1); M2a (positive selection; three site classes: 0 < ω0 < 1, ω1 = 1 and ω2 > 1); M7 (neutral model; ω values follow a beta distribution, with ω > 1 disallowed); M8 (positive selection; as M7 but with an additional codon class with ω > 1); and M8a (neutral model; as M8 but with the additional codon class fixed at ω = 1). Likelihood ratio tests were performed on pairs of models to assess whether models allowing positively selected codons gave a significantly better fit to the data than neutral models (M1a vs. M2a, M7 vs. M8, and M8a vs. M8). Where the null hypothesis of neutral codon evolution could be rejected (p < 0.05), the posterior probabilities of codons being under selection in M2a and M8 were inferred using the Bayes Empirical Bayes algorithm (Yang, et al. 2005).
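Each likelihood ratio test compares twice the log-likelihood difference between nested codeml models against a chi-square distribution. The sketch below assumes the degrees of freedom conventionally used for these comparisons (2 for M1a vs. M2a and M7 vs. M8, 1 for M8a vs. M8), which the source does not state; for M8a vs. M8 a 50:50 mixture of a point mass at zero and chi-square with 1 df is sometimes used instead, which halves the p value. The log-likelihood inputs are whatever codeml reports; the names here are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms)."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """2 * delta lnL between nested models and its chi-square p value."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, df)


# Assumed degrees of freedom for each (null, alternative) comparison.
COMPARISONS = {("M1a", "M2a"): 2, ("M7", "M8"): 2, ("M8a", "M8"): 1}
```

Using the closed-form survival functions for 1 and 2 df keeps the sketch dependency-free; a general chi-square routine would be needed for other comparisons.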

In Silico Prediction of Polymorphism Functional Consequences

Physicochemical distances between amino acid variants were assessed using distance matrices from several sources (Sneath 1966; Epstein 1967; Grantham 1974; Miyata 1979; Urbina, et al. 2006; Supplementary Table S7). The predicted functional consequences of amino acid substitutions were assessed with the homology-based tools SIFT (Ng and Henikoff 2003) and PolyPhen-2 (Adzhubei, et al. 2013), using their online servers (https://sift.bii.a-star.edu.sg/ and http://genetics.bwh.harvard.edu/pph2/; both accessed December 2019). Transmembrane domain positions were predicted using Phobius (Käll, et al. 2004; http://phobius.sbc.su.se/; accessed December 2019).
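Applying a published physicochemical distance matrix amounts to a symmetric lookup per substitution. The sketch below includes only three Grantham (1974) pairs as an excerpt; the remaining entries would have to be filled in from the original table. The binning into conservative and radical classes follows the widely used convention of Li, Wu and Luo (1984), which is an assumption and not a step described in the source; the names are hypothetical.

```python
# Excerpt of Grantham (1974) distances; fill in the full table before use.
GRANTHAM_EXCERPT = {
    frozenset("IL"): 5,
    frozenset("KR"): 26,
    frozenset("CW"): 215,
}


def grantham_distance(aa1, aa2, table=GRANTHAM_EXCERPT):
    """Symmetric lookup of the distance between two one-letter residues."""
    if aa1 == aa2:
        return 0
    return table[frozenset((aa1, aa2))]  # KeyError if the pair is not loaded


def classify(distance):
    """Li, Wu and Luo (1984) bins (a convention assumed here, not from the source)."""
    if distance <= 50:
        return "conservative"
    if distance <= 100:
        return "moderately conservative"
    if distance <= 150:
        return "moderately radical"
    return "radical"
```

Keying the table on a frozenset makes the lookup order-independent, so a substitution and its reverse share a single entry.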

Functional Analysis of TLR5 Genotype Expression in CRISPR-Cas9 edited HEK-Blue Cells

To functionally assess positively selected TLR5 polymorphisms in vitro, two full-length TLR5 sequences were synthesised (gBlocks, IDT) incorporating the two signalling-domain polymorphisms that had the strongest signature of selection. Synthetic genes were cloned by Gibson assembly (Gibson, et al. 2009) into the p3XFLAG-CMV™-14 expression vector (Sigma), which adds a 3x-FLAG sequence to the C-terminus of the expressed construct. Insert-containing vector was purified using the ZymoPURE II Plasmid Maxiprep kit with the optional endotoxin-removal step (Zymo). Both constructs were transiently expressed using TransIT®-2020 (Mirus Bio) in custom HEK-Blue™ Null1 cells (InvivoGen) in which endogenous human TLR5 had been disrupted by CRISPR-Cas9 genome editing. Cells expressing Gentoo TLR5 constructs were challenged with Salmonella Typhimurium-derived flagellin (FLA-ST; InvivoGen) at 100 ng/mL and incubated for 24 h. Cell supernatants were harvested, and NF-κB activity was assessed by measuring absorbance at 405 nm on a FLUOstar® Omega microplate reader (BMG Labtech) following the addition of p-nitrophenyl phosphate substrate, according to the manufacturer's instructions (SIGMAFAST™, Sigma). Expression levels were monitored by subjecting cell lysates to a direct anti-FLAG ELISA. Cell lysates were harvested in ice-cold RIPA buffer (ThermoFisher), and proteins were immobilised on high-bind ELISA plates (VWR) using coating buffer (BioLegend) overnight at 4 °C. Wells were blocked using StartingBlock™ (PBS) blocking buffer (ThermoFisher) for 1 h and then incubated with mouse monoclonal anti-FLAG M2 antibody (Sigma, 1:1000) at 37 °C for 1 h, followed by goat anti-mouse IgG-HRP (ThermoFisher, 1:10000) for 1 h at 37 °C. The amount of reactive protein was then assessed by adding 3,3',5,5'-tetramethylbenzidine substrate and measuring absorbance at 650 nm. Expression data were then used to normalise the signalling data. Statistical differences between means were determined by a two-tailed Student's t-test, with significance set at p < 0.05. Transfections were carried out in three independent wells per condition, and the experiment was conducted on at least three independent occasions.
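The normalisation and comparison steps can be sketched as follows, assuming one NF-κB absorbance (405 nm) and one anti-FLAG absorbance (650 nm) reading per well, and assuming that normalisation means dividing the former by the latter; the source does not give the exact normalisation formula, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def normalise(signal, expression):
    """Per-well NF-kB signal (A405) divided by FLAG expression (A650)."""
    return [s / e for s, e in zip(signal, expression)]


def students_t(a, b):
    """Two-sample Student's t with pooled (equal) variance; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df
```

The two-tailed p value follows from the t distribution with df degrees of freedom; for three values per group (df = 4), |t| must exceed about 2.776 for p < 0.05. SciPy's `scipy.stats.ttest_ind`, whose default assumes equal variances, returns the same statistic together with the p value.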

Acknowledgments

Financial support for this study was provided by an Oxford Clarendon Fund scholarship to HL and by the Biotechnology and Biological Sciences Research Council (BBSRC) [grant number BB/M011224/1] to SRF. Sample collection was funded in part by CONICYT PIA ACT172065 GAB and Spanish Ministry of Science projects CGL2007-60369 and CTM2015-64720. Logistic and field costs for sampling at the Crozet and Kerguelen archipelagos were supported by the Institut Polaire Français Paul-Emile Victor (IPEV: Programme 137 – C.L.B. and Programme 354 – F.B.,
