Comparaison de différentes stratégies spatiales d'échantillonnage d'une core collection parmi des populations naturelles d'espèces fourragères

(1)

HAL Id: hal-02838498

https://hal.inrae.fr/hal-02838498

Submitted on 7 Jun 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Comparaison de différentes stratégies spatiales d’échantillonnage d’une core collection parmi des

populations naturelles d’espèces fourragères

François Balfourier, Gilles Charmet, Jean-Marie Prosperi, Michel Goulard, Pascal Monestiez

To cite this version:

François Balfourier, Gilles Charmet, Jean-Marie Prosperi, Michel Goulard, Pascal Monestiez. Com- paraison de différentes stratégies spatiales d’échantillonnage d’une core collection parmi des populations naturelles d’espèces fourragères. Colloque BRG/USTL. Méthodologie de gestion et de conservation des ressources génétiques, Oct 1997, Lille, France. �hal-02838498�

(2)

Original ^article

Comparison

of different

spatial

strategies

^for

sampling

^{a core}^collection

of natural

populations

^{of fodder}^crops

François ^Balfourier Gilles Charmeta Jean-Marie Prosperib

Michel Goulard Pascal Monestiez a

Institut national de la recherche agronomique, Station d’amélioration des plantes, 234, ^avenue^duBrézet, ⁶³⁰³⁹Clermont-Ferrand cedex 2, ^France

b

Station de génétique ^etamélioration de plantes, ^Institut^national

de la recherche agronomique, Domaine de Melgueil ³⁴¹³⁰Mauguio, ^France

’ Station de biométrie, Institut national de la recherche agronomique,

BP 91, 84143 Montfavet cedex, ^France

Abstract - Seven different strategies ^forsampling ^a^corecollection are compared ⁱⁿ large collections of natural populations ^of^two^foddercrop species (Lolium perenne and Medicago truncatula) previously evaluated for agronomic ^traits.The first five

strategies consist of random ôrstratified sampling methods with ônelevel of classification of the population (classification ^basedônagronomic ^traits^withôr^without

a geographic contiguity constraint, ^basedôn^thecountry ôrregion ôforigin, ^{or on} spatial structures of populations). ^{The last}ône^{of these}strategies îs^basedôngeosta- tistical analysis, which allows studies of the spatial pattern ôfgenetic diversity. ^The

two remaining strategies ^are^based^on^theprincipal component ^scorestrategy (PCSS),

which consists of maximising â^distance^criterion^betweenpopulations, ^this^criterion being applied ^toagronomic ^traits^{of the}populations ôr^to^theirspatial components after geostatistical analysis. The different strategies ârecompared using ^two^criteria:

i) ^theircapacity ^tocapture ^thephenotypic diversity of the whole collection, ^{as mea-}

sured by the Shannon index; ii) ^theirability to restore the spatial structuration of the initial collection, ^measuredby distance between _mapsfor agronomic ^traits.^The comparisons ^{for the}^twospecies ^{show that}strategies that take into account spatial

structuration of diversity give ^thebest results for both criteria. © Inra/Elsevier, ^Paris

core collection / geostatistics / Lolium perenne / Medicago truncatula / spatial

structuration

* Correspondence ^andreprints

(3)

Résumé - Comparaison de différentes stratégies spatiales d’échantillonnage ^d’une

core collection parmi des populations naturelles d’espèces fourragères. Sept stratégies d’échantillonnage ^d’une^corecollection sont comparées ^sur^deux^collec-

tions de populations naturelles d’espèces fourragères (Lolium perenne et Medicago truncatula) préalablement ^évaluéespour des caractéristiques agronomiques. ^Lescinq premières stratégies consistent en des tirages aléatoires simples ^oustratifiés selon

un seul niveau de classification des populations: (i) âléatoiresimple (stratégie ^Ran- dom); (ii) stratifié selon une classification basée sur les caractéristiques agronomiques (stratégie Stratagro); (iii) stratifié selon la région ôu^lepays d’origine (stratégie Stratreg); (iv) ^stratifiéselon les caractéristiques agronomiques âveccontrainte de

contiguïté géographique (stratégie Straconti); (v) stratifié selon les structures spatiales des populations (stratégie Stratspa), déterminées _parles méthodes de la

géostatistique. ^Ils’agit ^de^méthodesstatistiques adaptées ^àl’analyse ^de^donnéesspa- tiales et à la mise en évidence de structure ou distributions spatiales ^nonaléatoires.

Enfin, ^deuxâutresstratégies déterministes sont testées; êlles^sont^basées^surûn

critère de maximisation de distance entre populations, ^cecritère étant appliqué ^aux

données agronomiques ^de^{base des}populations (stratégie PCSS) ^et^à^leurs^com-

posantes spatiales (stratégie SPPCSS). ^Lesdifférentes stratégies ^sontcomparées ^selon

deux critères: leur capacité ^àcapturer la diversité phénotypique ^contenuedans la collection de départ, ^mesuréepar ^unindice de Shannon, ^et^leuraptitude ^à^restituer^la

structuration spatiale de la collection globale, ^mesuréepar ^unedistance entre cartes établies _pourdes caractères agronomiques. ^Lescomparaisons montrent que, chez les deux espèces, ^lesstratégies ^tenantcompte ^desstructures spatiales (Stratspa ^et^SP- PCSS) donnent les meilleurs résultats, aussi bien _pourl’indice de diversité globale

que pour la représentativité ^des^cartes.@ Inra/Elsevier, ^Paris

core collection / géostatistique / Lolium perenne / Medicago truncatula / ^struc-

turation spatiale

1. INTRODUCTION

Genetic resources from natural populations ^arevaluable materials for plant breeders, especially ⁱⁿ^foddercrops for which adaptation ^toenvironmental and

farming conditions are fundamental factors of diversity. However, ^theincreasing

size of genetic ^resourcescollections often limits their distribution, optimal

utilisation and sometimes conservation. In addition to practical difficulties that _maybe encountered by ^curatorsⁱⁿ^themultiplication ^andregeneration

of accessions, ^users^wish^toobtain information about the genetic ^material

available. For this, evaluation of the genetic ^resources^foragronomic ^{traits is} required, ^butîsâlmostimpossible in very large collections. For this _reason, several authors (Frankel, 1984; Brown, 1989b) ^proposed^the^concept^{of ’core} collections’, which consists of identifying âsubsample ôfmanageable ^size^from

a large collection, ^asrepresentative of the total genetic diversity ^aspossible.

In this _case,evaluation and pre-breeding ^workmay be managed ^{with the}^core collection, ^{while the}^restof the introductions _maybe maintained, âtâ^lower cost, ⁱⁿ^{a reserve}collection. Reviews on the topic ôf^corecollections have been

published by ^Hamon^et^al.(1994) ^{and Brown}(1995).

Recent studies in France on fodder species of Lolium (Balfourier ^and

Charmet, 1994; ^Monestiezêtal., 1997) ândMedicago genera (Chaulet ^{and Pros-} peri, 1994; ^Bonninêtal., 1996a, b) ^showedthat, whatever the species, ^traitsôf agronomic înterest^submitted^to^naturalselection, âs^well^{as some}presumably

neutral markers, often exhibit strong geographical (i.e. spatial) structuration.

(4)

The aim of the present study ^was^todemonstrate the value, ^forspontaneous species ^suchâs^foddercrops, of considering ^thespatial pattern ôfgenetic diversity of collected populations âsâsampling ^criterion^forâ^corecollection, using geostatistical techniques. This involves a series of statistical methods

applied ^tospatial ^dataanalysis ^{and used}^todisplay ^spatialstructures and nonrandom distributions.

2. MATERIALS AND METHODS 2.1. Plant material

The study ^was^carriedôutôn^two^modelspecies ^{of fodder}crops: ânoutbreed-

ing grass, Lolium _perenneL., perennial ryegrass, and an inbreeding legume, Medicago truncatula Gaertn, ^which^is^aMediterranean annual Medicago. ^These

two species ^were^chosenbecause it is well known that reproductive ^behaviour

has considerable effects on gene flow, ^{and hence}^onspatial structuration of

diversity. ^Perennialryegrass is widespread ⁱⁿEurope, while annual Medicago

or medic is a species mainly ^localisedⁱⁿ^warmMediterranean areas or at low altitudes. These species, which exhibit a large diversity ⁱⁿtheir natural _pop-

ulations, ^areconsidered as models for population genetics ^studies^on^fodder

crops. Furthermore, medic is also one of the model species used for studies of

plant-microorganisms relationships (Barker ^et^al.,1990).

A collection of 550 natural populations, previously collected from all over

France (21 administrative regions), ^was^{used for}^perennial^ryegrass(Charmet

and Balfourier, 1991), while, ^formedic, ^asample of 110 natural popula-

tions from six countries or regions ôverthe Mediterranean area (Algeria: 33, Spain: 38, Portugal: 5, ^France:19, ^Greece:¹¹and Crete: 4) ^wasused. Because of the varying âbundanceôf^thisspecies ^fromônecountry ^toanother, ^the^dis-

tribution of populations may ^nothave been homogeneous for the whole ^area

analysed.

2.2. Agronomic ^traits

Agronomic evaluation was carried out on the 550 _ryegrasspopulations, using spaced plant ^nurseriesⁱⁿâmultilocal evaluation network. Several traits ^were observed and scored according ^toâ^visual¹^to9 scale: seasonal vigour ^traits (spring, ^summer,autumn), ^{frost and}^rustsusceptibilities, persistency, âlterna- tivity (i.e. ability ^toproduce spikes ⁱⁿsowing year), heading date, âftermath heading ândgrowth habit. To describe diversity ^{for these}traits, taking întoâc-

count genotype by environment interactions, ²⁸agronomic ^variables(Charmet

et al., 1990) ^wereconsidered (table I).

For the medic, ^a^trialplot ^of^two^1-m^rowsper population, ⁱⁿtriplicate,

was planted ⁱⁿ¹⁹⁹⁵^atMontpellier, according ^to^arandomised block design.

Observations were made of plant vigour, illustrated by plot density ^andsize, phenological stages, pathogenic susceptibility, pod ^{and seed}production ^and

natural regeneration scored 3 weeks after the first autumn rainfalls. Thus, ^a

total of 29 agronomic ^traits^was^observed(table 1!.

(5)

(6)

2.3. Statistical development

2.3.1. Principal component analysis (PCA) of agronomic ^data

and clustering of populations

A basic PCA was carried out on the two sets of centred and scaled agronomic data, î.e. ²⁸variables for 550 _ryegrasspopulations, and 29 variables for 110 medic populations. Following this, âmultivariate hierarchical clustering analysis ^was^carriedôutôn^the^firstcomponents of each PCA whose eigenvalue

was greater ^than¹(i.e. ^nine^axes^forryegrass, noted compl r to comp9r, ^{and five}

axes for medics, compl m ^tocomp5m). ^After^cutting^trees,this makes it possible

to define agronomic ^clustersamong populations ^{for the}^twospecies. Using ^the

same ratio of between-cluster variance over total variance ⁼0.5, ¹¹agronomic

clusters were identified for _ryegrass(Charmet ^et^al.,1990) and four clusters for medic populations.

Finally, âmultivariate hierarchical clustering analysis, using âgeographical contiguity constraint (Charmet êtal., 1994), ^wasapplied ^to^theryegrass data set only, ^{which led}ûs^toconsider 25 ’contiguous’ clusters for the same ratio of 0.5. This method permits grouping ôfpopulations according ^to^{both their} similarity ^foragronomic traits and to their geographical proximity.

2.3.2. Geostatistical analyses

Spatial structuration of diversity ^foragronomic ^traits^wasanalysed using geostatistical ^methods(Cressie, 1991; Wackernagel, 1995). Application ^torye- grass populations ^hasalready ^beenreported (Monestiez ^etal., 1994). ^Most^com-

putations, figures ândmaps ^weremade using the S-Plus programming package (Becker êtâl.,1988).

2.3.!.1. Univariate analysis

In geostatistics, stochastic modelling ^is^used^todescribe the spatial ^data.

Suppose that the studied variable is a realisation of a random field and that the observed data are a spatial sample ^of^somesites of this realisation. Let Z(x)

denote the random field at location x and z(x) denote the observed data at the same location _x,which is a realisation of Z(x). ^Fromdifferent observations

(z(zi ) , ... , z(xn)), ^we^can^define^for^each^couple(i, j), ^thedissimilarity ^between

two points xi and xj ^by:

Assuming stationarity ^{of order}^two(mean and variance of the variable are

supposed ^tobe invariant by translation), ^wedefine the variogram:

where x + h is the translation of _x,and E denotes the expectation. ^For^the isotropic case, the variogram g(h) ^can^beestimated, using previous dissimilar- ities, by:

where n! is the number of couples, ^{and the}^distanced(x!, xi) ⁼^h.

(7)

The analysis ^ofspatial structuration of a variable will be made with its

variogram ^which^measuresthe semivariance of the difference between points separated by ^agiven distance h. Structural analysis consists of describing ^and modelling the estimated variogram g*(h). Classically, three different items are

described by such functions:

o First, ^thediscontinuity ^at^theorigin. ^Thevariogram equals ^zero^at^the origin by definition. For a distance close _{to zero,}a significant ^valueusually

called the ’nugget effect’ is often observed. This initial variance ^canbe seen as measurement error or microscale variation.

o Second, ^thevariogram usually ^increases^withincreasing distances. The shorter the distances between the sites, ^the^moredependent the observations.

o Third, the ’sill’ is the steady-state value reached by the function. When the distance between sites is larger ^than^agiven value, ^thevariogram ^function

becomes constant and the sites should be considered ^tobe independent.

This distance value, called the ’range’, îsânêssentialparameter ⁱⁿspatial description. The value of the sill is close to that of the global population

variance.

For a two-dimensional _space,spatial structuration of a variable ^canbe visualised on maps. To obtain smoothed _maps,we need to predict ^values^at unsampled locations. In geostatistics, ^theprediction method is called ‘kriging’.

Kriging ^uses^theobserved values of the variable in conjunction ^{with the}

information provided by ^thevariogram (Wackernagel, 1995).

In the univariate case, four variables, considered as realisations of random

variables, ^wereindependently analysed ⁱⁿ^thepresent study: the first two

components ^{of PCA}of ryegrass ^data(complr, comp2r) ^and^{of medics}(complm, comp2m).

2.3.2.2. Multivariate spatial ^method

Multivariate spatial analysis îsâgeneralisation, ⁱⁿthe multivariate _case,of the study ôfvariograms ândrespective cross-variograms for each variable. The

cross-variogram ^between^two^variablesZl ândZ2 îs^definedâs:

These cross-variograms provide information about the spatial ^relations^{of de-} pendence ^betweendifferent variables. A linear co-regionalisation ^model^is^used

as a tool for simultaneously modelling variograms ^andcross-variograms. ^The

basic physical îdeaîsthat the variables under study âregenerated by ^differ-

ent processes acting additively with various specific spatial structures. Finally,

in ’kriging analysis’, these different structures are themselves decomposed by

PCA into spatial components, ^whichâreâllmutually independent. ÂsⁱⁿPCA,

the relationships ^betweenoriginal variables and spatial components ^can^{be de-} scribed and spatial components ^can^bemapped. ^A^more^detaileddescription ^of

the model and its fittings îsgiven by Goulard and Voltz (1992), while Monestiez et al. (1994) provide âûsefuldescription ôfkriging analysis.

To summarise the spatial information given by ^numerousoriginal agronomic variables, kriging analysis ^was^not^carried^out^on^theoriginal variables, ^but

rather on the first nine components ^of^PCA^forryegrass (complr to comp9r),

and on the first five components ^of^PCAfor medics (complm ^tocomp5m).

(8)

2.3.3. Sampling strategies

Seven core collection sampling strategies ^weretested, using various propor-

tions, p, of the whole collection (p ⁼5, 10, ²⁰^{and 40}%), ^{for the}^two^species.^For

the first five strategies, 200 random samples ^weregenerated ^by^computer^for

each of the four proportions. ^Thesestrategies consisted of simple ^or^stratified

random sampling:

o random sampling (RANDOM): ^simple^random^sampling,^without^replace-

ment, ôfâproportion p in the whole collection of 550 ryegrass populations ând

110 medic populations;

9 random sampling stratified by agronomic clustering (STRATAGRO):

random sampling ^of^aproportion p in each of the q agronomic clusters, previously defined after PCA of the original agronomic ^variables(q ⁼¹¹^for

ryegrass and ⁼4 for medics);

o random sampling stratified by regions ^oforigin (STRATREG): ^random

sampling ^of^aproportion p in each French administrative region ^forryegrass

(21), ôrêach^countryôfôriginfor medics (6);

. random sampling stratified by contiguous ândagronomic clustering (STRATCONTI): ^random^samplingôfâproportion p in each of the 25 ^con-

tiguous clusters, âspreviously defined for _ryegrassonly (Charmet êtâl.,1994);

o random sampling stratified by spatial structures (STRATSPA): ^random sampling ^of^aproportion p in each of the ten clusters derived from clustering

on the first component ôf^PCAôfspatial structures, âspreviously ôbtainedby kriging analysis.

Thus, all these stratified samples ^weregenerated according ^to^thep strategy of Brown (1989b), ^i.e.^wereproportional ^to^cluster^size.

The last two strategies ^are^based^on^theprincipal component ^scorestrategy

(PCSS) (Hamon êtâl., ^1995;^Noirotêtâl., 1996). ^PCSS^consists^{of three}

steps: ^theapplication ^of^PCA^to^thequantitative data of the collection; ^the computerisation ^{of the}weighted Euclidean distance between individuals, using

new coordinates on PCA _axes;and the selection of the accession that has the highest relative contribution (RC) ^to^cloud^inertia.^Then,^step^by^step,

we add to the core the accession that maximises the ^sumof RC within the

core. Finally, all the accessions of the collection are classified according ^to^a

maximisation distance criterion, ^thenⁱⁿselecting ^{the first}p%. Îtîs^thereforeâ

deterministic method. We applied this method first using components ôf^PCA of original agronomic variables, and second using components ôf^PCAôfspatial

structures:

o PCSS: selection, ⁱⁿ^{the whole}collection, ôfâsample ôfp accessions, according ^to^their^RC^scoreâfter^PCAôforiginal agronomic variables;

o spatial principal component ^scorestrategy (SPPCSS): selection, ⁱⁿ^the

whole collection, ôfâsample ôfp accessions according ^to^{their RC}^scoreâfter

PCA of spatial structures.

2.4. Evaluation criterion and comparison of the different strategies

At first, ^the^sevensampling strategies ^werecompared for their capacity

to capture the initial diversity ^oforiginal collection, ^which^wasquantified by

a Shannon diversity ^indexcomputed ^{for all}agronomic ^traits(Shannon ^and

(9)

Weaver, 1963). For each undeterministic method, ²⁰⁰samples ^werecomputed

to calculate Shannon indices. Then, ^to^makecomparisons, only the lower 95 % confidence limit of the index distribution was taken into account. Considering

the importance ^ofspatial structuration of diversity ^foragronomic traits, ^the

different strategies ^werethen tested for their ability ^toproduce subsamples

of populations, which make it possible ^to^estimate^akriged map which is the closest to the original map obtained for the whole collection.

The reference _mapsused here were those of the first two components ^of PCA of original agronomic variables for _ryegrass( compl r, comp2r) ^{and medics} (complm, comp2m). The ryegrass reference maps ^wereobtained, from the 550 observations of the whole collection, by kriging ^{on a}regular grid ^{of 1 264} equally-spaced points covering France; ^formedics, ^thekriging ^method^was performed ^{from the}¹¹⁰observed values on a grid ^{of 2 479}points covering

the six Mediterranean countries.

These reference _mapswere then compared ^withmaps obtained from subsam-

ples ôfp ⁼5, 10, ²⁰ând⁴⁰% of the whole collection, ândresulting ^{from the}

different strategies. Comparisons ^were^madethrough ^amap distance d simply defined by ^d⁼1/N !ilZrefei - ^Ztestiwhere ^Nis the number of regular grid points, Zre f ei the reference _mapvalue at location i, calculated with the whole set of accessions, ^andZtesti the value of the tested _map,at the same location i, calculated with only ^the^coreaccessions.

The first five strategies being undeterministic, ^wecalculated a distance _map for each of the 200 samples ^{of each}strategy ^{and each}proportion p. We then kept

the dsup distance, ^which^was^theupper 95 % confidence limit of the distance distribution.

In the same way, ^tocalculate a confidence interval on the reference _map, 200 bootstrap resamplings ^were^madeamong the 550 _ryegrassand 110 medic

accessions, ^andused, ^aspreviously, ^tocalculate the dsup distance for each reference _map.

3. RESULTS

3.1. PCA of original agronomic ^variables

The first nine components ^{of PCA of}²⁸agronomic variables of _ryegrass summarise 65 % of the total variance. However, from table III, giving ^correla-

tions between initial variables and the first two components ^{of PCA}(complr, comp2r), ^weobserve that the first axis is positively correlated with alternativ-

ity (ALT), ^aftermathheading (AH) ^{and frost}susceptibility ⁱⁿseveral locations

(FROl, FR03), ^andnegatively correlated with persistency ^andvigour ^traits.

This axis can be globally interpreted ^asrepresentative ^ofvigour. ^{The second}

axis is rather negatively correlated with heading ^date(HD), ^andpositively ^with

rust susceptibility (RUS1, RUS4); it is thus an axis linked with earliness and

plant physiology.

As for medics, the first five components of PCA of the original ^data^set

account for 78 % of the total variance. Table IV gives correlation coefficients between the 29 variables studied and the first two components, complm ând comp2m. The first axis (42 %) ^can^beeasily interpreted âsâvigour axis since all

(10)

height plots (HI ^toH6) ^{and width}^plots(Wl ^toW7) ^arepositively ^correlated

with it. The second axis (15 %) ^islinked with reproductive traits, ^such^aspod

and seed weights (Ws, Wp), ^orpod and seed number produced per square meter (Np-m2, Ns-m2). ^Flowering^date(DFL) correlation coefficients with the two axes are quite similar, ⁱⁿcontrast to the seed weight/pod weight ^ratio

(Ws-Wp), ^henceapproximately ^seedquality.

3.2. Geostatistical analysis

Although the univariate analysis ^was^carried^out^on^{the first}^twocompo- nents of each original PCA, ^we^present^hereonly ^resultsconcerning ^the^first component ^{for each}species.

Figure ^{1 shows}variograms ^{of these}components, respectively ^forryegrass

( compl r) and medics ( compl m). ^Thecompl variogram has been fitted with

a combination of two spherical models, corresponding ^{to two}^differentspatial

structures with _rangesof 120 and 300 km, respectively. ^Inaddition, ^since^we

observe a discontinuity ^at^theorigin (nugget effect), ^a^thirdpepitic ^structure

was added to the model. For medics (complm), ^two^models^werealso used: ^a

spherical model, ^withârange of 600 km, ândâ^linearmodel, ⁱⁿâddition^toâ pepitic structure, leading âlso^to^threespecific spatial structures.

Figures 2 and 3 show interpolated ^mapsôfcomplr ândcomplm, î.e.maps

resulting ^fromkriging of the first component of each PCA on regular grid points. Thus, figure 2, for ryegrass, displays ^thestrong spatial structuration of component complr. ⁱⁿ^browncolour, ^we^findpopulations, rather localised in the south of France, ^withhigh ^scores^ofalternativity, ^aftermathheading,

frost susceptibility ^{and low}vigour ^scores.^Incontrast, ^northernpopulations

(11)

appear ^morevigorous (green colour). Âsâgeneral rule, ⁱⁿâddition^toâ^clear

north-south cline, ^somepatches ^ofhomogeneous values and various radius may be noted, explained by the addition of two specific processes of spatial

structuration: one with a range of 120 km, the other with a 300 km range.

Figure 3 displays ^{values of}complm for medics. The regionalisation ^{of this}

new variable is clearly perceptible ^on^themap. French and south Algerian populations exhibiting ^lowgrowth ^scores(brown colour) ^contrast^withSpanish

and Cretan populations, ând^toâ^lesserêxtentwith those from the coastal

area of Algeria, ^whichexpress ^astronger growth (green colour). ^It^seems^that

populations originating from borders of the distribution _area,whether in the north (France: âreas^withhigh rainfall, ^{but cold}climate) ôrⁱⁿ^{the south}(dry Algerian ârea,^withvery low rainfall), clearly ^contrast^{with the}populations

from more favourable _areas,with moderate temperatures and sufficient rainfall.

(12)

(13)