Estimation of relatedness in natural populations using highly polymorphic genetic markers

(1)

HAL Id: hal-00893889

https://hal.archives-ouvertes.fr/hal-00893889

Submitted on 1 Jan 1991

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Estimation of relatedness in natural populations using

highly polymorphic genetic markers

P Capy, Jfy Brookfield

To cite this version:

(2)

Original

article

Estimation

of

relatedness

in natural

_populations

using highly polymorphic

genetic

markers

P Capy

JFY Brookfield

2

1

Centre National de la Recherche

_Scientifique

Laboratoire de

_Biologie

et

Génétique

Evolutives

91198

_Gifsur

Yvette

_{Cedex, France;}

2

University of Nottingham, Department of Genetics,

Queen’s

Medical

Center, Nottingham

NG7

2UH,

UK

(Received

7

_{Nlay 1990; accepted}

2

_{August 1991)}

Summary - This _reportaddresses 3 _important

_questions

in

_{population biology: 1),}

Is it

_possible

to determine the actual

kinship

between individuals taken at random from

a natural

_population?

_2),

Is it

_possible

to estimate an _average

_degree

of kinship in a

population

in terms of the

_probability

that 2 individuals drawn at random are related?

3),

Is it

_possible

to estimate a

_{population’s family}

structure in terms of the number and the relative size of the different families? To answer these

_questions

the estimation of

kinship

between 2 individuals is first considered. To do this, identity

probabilities,

based

upon 2 sets of assumptions

concerning

the

_genetic

markers

_used,

were derived for different

cases of

kinship.

The use of VNTRs

_(variable

number of tandem

_repeats)

shows that for

multilocus

_probes,

all distributions of

_{identity broadly overlap}

even when the number of

loci is about 20. Therefore

_by

VNTRs _alone,it is difficult to define the true

_kinship

between 2 individuals when

only

their DNA

_fingerprints

are

_compared.

More accurate estimations

can be achieved with monolocus

_probes.

However, to estimate a

_{population’s}

structure or

the average

degree

of

kinship

between

individuals,

it is not _necessaryto

_{identify precisely}

each individual

_sampled,

but rather,

only

to determine whether individuals are related or

not. For _this,it is _{necessary to}define a threshold

_identity

value which

depends

on the

common _patternsthat can be observed between unrelated individuals. Below this

_value,

individuals are considered to be unrelated

_and,

above

_{it, they}

are considered to be related.

Finally,

a

_{sequential sampling procedure}

is

proposed.

natural _populations

_/

relatedness

_/

genetic marker

_/

multilocus probes

/

monolocus

probes

*

(3)

Résumé — Estimation de la parenté au sein des populations naturelles à l’aide de marqueurs génétiques hautement _polymorphes.Peut-on déterminer les liens de

parenté

entre 2 individus pris au hasard dans une

population

naturelle ? Peut-on estimer la

_parenté

_moyenne,c’est-à-dire la

probabilité

de tirer au hasard 2 individus

apparentés,

au sein d’une

population

naturelle ? Ou bien encore _peut-ondéterminer la structure d’une

population,

à savoir le nombre et la taille relative des

_{différentes familles qui}

la

_{composent ?}

#

Pour

_répondre

à ces

_questions,

l’estimation de la

parenté

entre 2 individus a été tout d’abord

_envisagée.

A

_partir

de 2 séries

_{d’hypothèses}

relatives aux _marqueurs

génétiques

utilisés,

les

_{probabilités}

d’identité entre 2 individus ont été

_définies

pour des liens de

parenté simples. L’application

de ces 2 modèles aux VNTR montre que pour les sondes

multilocus, les distributions des

_{probabilités}

d’identité se recouvrent très

_largement,

même

lorsqu’une

vingtaine de locus sont détectés. Par

_conséquent,

il est

_difficile,

voire

impossible,

de déterminer

_{précisément}

la

_parenté

entre 2 individus en se basant exclusivement sur ce _typede données. Par _contre,l’utilisation simultanée de

plusieurs

sondes monolocus

permet d’obtenir des estimations

_{plus précises.}

Pour estimer la structure d’une

population

ou la

_parenté

_{moyenne entre}individus, il n’est pas nécessaire

d’identifier précisément

chaque

individu, mais uniquement de déterminer si 2 individus sont

apparentés

ou non. Pour

_cela,

un seuil d’identité est

défini en fonction

des valeurs d’identité observées entre

individus non

apparentés.

En

deçà

de cette valeur

seuil,

les individus ne sont pas considérés comme

apparentés

et

au-delà,

il est admis

qu’ils

le

sont. Enfin,

une

procédure séquentielle

d’échantillonnage

est

_proposée.

population naturelle

_/

relation de parenté / marqueur génétique

/

sonde multilocus

_/

sonde monolocus

INTRODUCTION

In

_population

_genetics

_many

_problems

of natural

populations

cannot be solved without a better

_knowledge

of the

_kinship

structure at

_present

and in a small

number of

_generations

in the recent

_past.

The effective size of the

_population,

its

number of founders and the

possible

existence of _groupsof related individuals may

be of

_great

_importance,

but it is

_usually

_verydifficult to obtain such data or even

to make accurate estimates.

For

_instance,

in

_{Drosophila melanogaster, analyses}

of _enzyme

_polymorphism

often show a deficit in

_{heterozygotes}

in natural

_populations.

The

_Wright

fixation index

_(Fis)

can reach 0.6-0.7

(Danielli

and

Costa, 1977;

David et

al, 1989;

Vouidibio

et

_al,

_1989).

Several

hypotheses

are

_{frequently proposed}

to

_explain

such results: selection

_against

_{heterozygotes,}

_inbreeding,

_and/or

the

_mixing

of

populations

with different allelic

_frequencies

_{(Wahlund effect).}

_However,

it remains difficult to

determine the relative

_importance

of each _process.

_Indeed,

in

_Drosophila

_species,

it is almost

impossible

to estimate the

_size,

the

_geographical

limits and the

kinship

structure

_(number

of _groupsof related individuals or

_families)

of a

_population.

During

the last few years, new

_techniques

have been

_developed

for estimates

of relatedness between two individuals chosen from a natural

population.

These

techniques

rest _uponthe detection of

_{highly polymorphic}

DNA sequences, such

(4)

the main

_problem

lies in

_finding

a

highly polymorphic

_system

or a combination of

systems.

The

_principal

characteristic of these

_systems

must allow the

_definition,

for

each

_individual,

of a

_{&dquo;genetic identity card&dquo;,}

or a

fingerprint,

sufficiently

accurate

to avoid 2 unrelated individuals

_possessing

the same

_pattern.

Such

_genetic

systems exist in numerous vertebrates. One

example

is the

major

histocompatibility complex

(Dausset,

1958;

Vaiman, 1970; Klein,

1987)

which determines

_{transplant rejection.}

This

system

consists of 4

_loci,

_having

an _{average of}

10-20 alleles.

_However,

in several natural

populations,

strong

linkage disequilibria

are found

(Dausset

and

Svejgaard,

1977).

Thus,

the

probability

that unrelated

individuals possess the same

_haplotype

can be

_high.

For

_{invertebrates, only enzymatic}

data are

presently

available.

However,

these

techniques

do not detect _manyalleles. For

_instance,

in

_{Drosophila melanogaster,}

the

Amylase

locus has

_{approximately}

13 described alleles

_(Dainou

et

al,

1987)

and is among the most

_highly

_polymorphic

loci. For other enzymes such as Esterase-6 and

Xanthine

_{dehydrogenase,}

it is often

_possible

to detect _manymore

_alleles,

ie between

20 and 30

_alleles,

when

_{electrophoresis}

conditions like buffer

_pH

or

_gel

concentration are modified

(Coyne,

_1976;

Singh

et

_al,

_1976;

Modiano et

_al,

_1979;

Ramshaw et

al, 1979;

Singh,

1979; Keith,

1983).

However,

the

_geographical

distribution of the alleles is not

_homogeneous

and it is rare for all the alleles to exist in a

single

_region.

In other

_words,

at a

_{given place,}

unrelated individuals _mayhave similar

_genotypes.

Moreover,

this

_disadvantage

is reinforced

by

the fact

that,

in a

_{given population,}

the allele

frequencies

are far from uniform with

generally

1 or 2

frequent

alleles and

several alleles at low

_frequencies.

Such

problems

can be

partially

avoided when several

enzymatic

loci are

consid-ered

_together.

This solution has

_already

been

_proposed

for

_paternity

determination

(Chakraborty

et

_al,

_1988),

for estimates of relatedness between colonies of social insects

_(Pamilo

and

_Crozier,

_{1982; Pamilo, 1984;}

Queller

et

_{al, 1988; Queller}

and

Goodnight,

1989)

and between individuals in vertebrates

_(Schartz

and

_Armitage,

1983;

Wilkinson and

McCraken,

1985).

However,

these

_procedures

are not

_always

suitable when the social structures of

_species

are unknown or not accessible.

Recently,

several

_genetic

_systems,

such as

_transposable

elements or minisatellites

and more

generally

RFLPs

(Restriction

Fragment Length

Polymorphisms)

have

provided

new _waysof

estimating

the

kinship

between individuals and of

analysing

the structure of relatedness

_(number

of groups of related

individuals)

in natural

populations.

However,

such

_systems

as minisatellites _maystill not be accurate

enough,

and several authors have

_already

stressed the limits of these

_approaches

for the

_analysis

of natural

_populations

_(Lynch,

1988; Brookfield, 1989; Lewin,

1989).

The first aim of the

_present

work is to evaluate the difficulties in

_estimating

the kin

_relationship

between 2 individuals

accurately

when different

_parameters

of a natural

population,

such as the social _structure,the

_mating

_system,

the

age-classes,

the

_generation

turnover, and the existence of

_{overlapping generations}

among

others,

are unknown. After a brief

presentation

of the basic model and a means of

measuring

the

_degree

of

_identity

between 2

_individuals,

the distributions of

_identity

probabilities

_(using

two sets of

_assumptions

_concerning

the

genetic

systems

used)

will be

_presented

for different kin

_{relationship. Then,}

their

(5)

estimation of

kinship

structure,

ie,

the number and the size of groups of related

individuals,

and on the estimation of an _average

kinship level,

ie the

probability

that 2 individuals drawn at random are

related,

in a

population

of unknown

kinship

structure. A

_{sampling procedure}

based upon the model

proposed

by

Rouault and

Capy

(1986)

and

_{by Capy}

and Rouault

₍₁₉₈₇₎

will be

_proposed.

MATERIALS AND METHODS

Basic model and

_identity

Each individual is defined

by

a set of bands obtained after

_digestion

by

a restriction

endonuclease(s)

of total

_{DNA, hybridisation}

with a marked nucleic acid

probe

and

_{autoradiography.}

The

_resulting

set of bands

_corresponds

to the individual’s

fingerprint

and the

_segregation

of each band is Mendelian.

Identity

between 2 individuals can be calculated from the number of shared

bands;

these bands

_being

identical

_by

state or

_by

descent

(Lynch, 1988).

The

expression proposed by

Nei and Li

₍₁₉₇₉₎

will be used. In

_this,

the

_identity

between

a and b is:

where _naand nbare the number of bands of individuals a and

b,

and nab the number

of bands shared

_by

a and b. This

expression,

which

corresponds

to the

proportion

of bands shared between 2

individuals,

varies from 0

_(if

a and b have no common

bands)

to 1

_(if

a and b share all their

bands).

Identity

and relatedness

In the

_previous

_definition,

the value of

_identity

increases with the relatedness of individuals. Table I

_gives

some values of

identity

for common

_kinship.

For all

situations

_given

in this

_table,

it is assumed that

_parents

in Go do not share _any band and are

_heterozygous

at all their loci. In these

_conditions,

for a

single locus,

the

_comparison

between full sibs leads to the definition of 3 classes of

_{identity 0,}

_1/2

and 1 with the

_respective

_{probabilities}

_{4/16, 8/16}

and

_4/16.

For the

_comparison

between

_offspring

_{of a bacl:cross,}

4 classes of

identity

exist

_0,

_{1/2, 2/3}

and 1 with the

respective

probabilities

2/16,

6/16, 4/16

and

_4/16.

From these

_examples,

it is clear that for a

_given

_average

_identity,

several kin

_{relationships}

_mayexist. For

_instance,

the

_expected

values of

_identity

between

_{parent/offspring}

and between full-sibs are

identical

_(I

=

50%).

The _same

phenomenon

is observed for the

expected

identities

between F2 individuals

_(offspring

of

_FlxF1)

or between

offspring

of a backcross

(I = 60.42%).

This result is more conclusive when the distributions of

identity

are

(6)

RESULTS

Expressions

and distributions

_{of identity probabilities}

Two

_simple

models will be

_considered,

each of them

_{corresponding}

to 2 different

genetic

markers and 2 levels of

_polymorphism

detection. As discussion will be in

terms of the

_application

to

_VNTI3s,

model I is related to a monolocus _systemand model II to a multilocus _system.In both cases, to

simplify

the

presentation,

the

existence of an

identity by

state will be

_{neglected. Expressions}

for the

_{probabilities}

and distributions of

_identity

will be

_given

for 4

_kinships

ie

_{parent/offspring,}

full-sibs,

half-sibs and unrelated individuals.

_Furthermore,

the distribution of

identity

between Fl individuals of a

population,

founded

by

4 unrelated individuals

(2

males

and 2

_females),

will be calculated.

_Finally,

in the second

_model,

to illustrate

the

_{problem posed by overlapping generations,}

identities for 4 other

_kinships

(grandparent/grandchildren, uncle/nephew,

cousins and double

_cousins)

will be defined.

Model I

This model

_corresponds

to an idealized situation. It is assumed that:

1),

all loci

present

in a _{genome, for}a

_{given probe,}

are

_detected;

_2),

all individuals have the same number of loci

₍

_T

_i)

and all loci are

_heterozygous

(so

that all individuals have

2n

_{bands); 3),}

2 unrelated individuals do not share _anybands.

Under this

_model,

the

_probability

that 2 individuals share i bands

_according

to

their

_kinship,

is:

Parent/offspring

(po):

(7)

where

_CL

is the number of combinations of i _{bands among 2n}

_bands;

Half-sibs

_(hs):

-Unrelated individuals

_(nr):

The

_probability

of

_sharing

i bands if the 2 individuals

_(a

and

_b)

_compared

are

derived from the first

_generation

of a

_population

founded

_by

F females and M

males,

is

_{given by:}

where

_P0,

P1 and P2 are the

_{probabilities}

of

_drawing

2 individuals that are,

respectively, unrelated,

half-sibs and full-sibs from the

_{population. Assuming}

that all females and all males have the same

expected

number of

_offspring,

the values of these

_{probabilities}

are :

In these

_expressions

it is assumed that a

_given

female can be inseminated

by

several males and a

_given

male can inseminate several females. When

_F/M

mates

per males

exist,

ie _{monogamy when F}=

M,

these

probabilities

become:

According

to this

_model,

the

_relationship

between

_identity

_(I)

and the number of shared bands

_(i)

is:

&dquo;&dquo;

Model II

In this second

_model,

it is assumed that:

_1),

the number of bands _perindividual is

not _constant;

_2),

not all loci are

_detected;

_3),

_only

one band _perlocus is

_detected,

ie there are no allelic bands in the

_fingerprint

of a

_{given individual;}

4),

all loci are

heterozygous; 5)

2 unrelated individuals do not share _any

_bands;

_6),

the number of bands _perindividual follows a Poisson distribution with a mean of n.

Under these

_conditions,

the

_probability

that 2 individuals share i bands

_according

to their

_{kin-relationship,}

is:

(8)

where

_P!!!

is the

probability

that a

_parent

has

exactly

i bands,

e is

_exponential,

and where

_j

_max

is the

_highest

_possible

value

_{of j,}

_ie,

the maximum number of bands for an individual. The

_probability

_P(

_j

₎

is

_{given by:}

Full-sibs

_(fs):

Half-sibs

_(hs):

Grandparent-grandchildren

(pc), uncle/nephew (un),

double-cousins

_(dc):

Cousins

_(co):

Unrelated individuals

_(n

_T

_):

Finally,

if 2 individuals are taken at random in the F1

_generation

of a

_population

founded

_by

F females and All

males,

the

probability

that

they

share i bands is

given

by expression

(4). Otherwise, according

to this

_model,

the

_relationship

between

identity

(I)

and the number of shared bands

_(i)

is:

Figure

1

_gives

the theoretical distributions of identities for the 2 models and for

the first 4

_kinship

relations described here. It has been assumed that

_exactly

10 loci

(ie

exactly

20 bands _perindividual

_according

to the model

_I)

or an _averageof 10 loci

(ie

about 10 bands per individual in the model

II)

can be detected. It can be seen

firstly,

that the distributions of full-sibs and of half-sibs are

symmetrical

in model

(9)

difficult to discriminate between the distributions of half-sibs and full-sibs in the Fl _progenyof a

_{simple population}

_(see

_fig

_ID).

When successive

_{generations overlap,}

it becomes more and more difficult to

estimate the true

_kinship

between 2 individuals.

_Indeed,

the distributions of

(10)

double-cousins must all be considered. Several of these distributions have the same _average

identity.

An illustration of this last

_problem

is

_{given by}

the

_analysis

of a

_simple

hypo-thetical

_genealogy

of 3 successive

_generations

_{(fig 3).}

In this _case,6 unrelated

_pairs

of

_grandparents

_representthe first

_generation.

These

_pairs

each

_produce

between 1

and 4 children. These children

_(a

total of 15

_individuals)

form the second

genera-tion. The third

_generation

is

_composed

of the

_offspring

_(a

total of 16

_individuals)

of the

_couples

in the second

_generation.

In this

_genealogy,

8 kinds or

_relationship

exist and their relative

_proportions

are

_given

in table II.

_Finally,

figure

4

_presents

the

distributions of identities

_according

to model II. Most ot the distributions

_overlap,

making

it difficult to determine the exact kin

relationship

between 2 individuals. For

_instance,

for an

identity

of

_0.25,

the 2 individuals

compared

can be: full sibs

(3.12%),

half sibs

_(2.25%),

_uncle/nephew

_(35%),

_{parent/offspring}

_(3.75%),

grand-parent/grand

children

_(43.75%),

first cousins

_(8.75%),

double cousins

_(3.38%).

Application

to VNTR loci

(11)

pointing

out the difficulties in

_estimating

the relatedness between 2 individuals taken at random in a

_population

of unknown structure.

The 2

_systems

of

_probes

allow one to detect

_{highly polymorphic}

loci for which the mutation rate can be close to

1/100

_per

generation

and per

gamete

(Burke, 1989).

Thus,

the

_polymorphism

_{(number of alleles)}

at a

_given

locus should be much

_greater

than that

_generally

observed for an

_enzymatic

locus. In

spite

of this

_property,

the

estimation of the true

_genetic

_relationship

between 2 individuals remains hazardous with multilocus

probes,

but seems more accurate with monolocus

_probes.

The

primary

advantages

of monolocus

_probes

are that:

_1),

the number of loci is

_known;

and

_2),

the

_homozygous

and

heterozygous

states at a locus can be defined for a

given probe

(see

for

_example

Nakamura et

_al,

_1987).

As

_regards

these

_advantages,

it _appearsthat model

_I,

which was not realistic with

respect

to multilocus

probes,

becomes more valid for monolocus

probes. Indeed,

in

this context, if n monolocus

_probes

are used

_{simultaneously,}

each individual will be

defined

by

a number of bands

_lying

between n and

_2n,

and at least

50%

of these bands will be transmitted to its

_offspring

_{(table III).}

To

_improve

model

_I,

_hypothesis

2 can be

changed,

insofar as it is not _necessary to consider that all loci are

_{heterozygous.}

This is

_particularly

_important

in small

(12)

Thus,

for n monolocus

probes,

a

_given

individual

(a)

will

_present

na bands with

n < na < 2n. The number of

_homozygous

loci will be HO = _{2n -}_na.In these

conditions,

the

_expressions

of

identity probabilities

are identical to those

_given

in

(13)

heterozygous

loci.

_Thus,

if HO represents the average number of

homozygous

loci

per individual in a

_{given population, expressions}

2 and 3 become:

Full-sibs

_{( f s):}

Half-sibs

_(hs):

In these

_conditions,

the total number of shared bands HO + i will be associated

with the above

_{probabilities}

_Pfs!i!

or

_P

_hs

₍

_i

₎

_’

The

overlapping

proportion,

between

the

_identity

distributions of these 2 kin

_{relationships,}

will be related to the number of

_heterozygous

loci in their

_parents.

The

_greater

this

number,

the more the 2

distributions will

overlap.

Estimation of the _average

_{degree of kinship}

and

_{of kinship}

structure

The

_previous

models are

_simple

cases with some unrealistic

_assumptions.

One

assumption

is that 2 unrelated individuals do not share _anybands.

_Indeed,

Wetton et al

₍₁₉₈₇₎

and DT Parkin

_{(personal communication)}

have

_{shown, using}

minisatellite sequences, that unrelated birds may share between 10 and

25%

of their

bands,

which are

_probably

identical in state and not

_by

descent. For minisatellite

profiles,

this

_identity

can be due to

_{electrophoretic comigration,}

_especially

in the

upper

part

of the

_gel

_{(Lynch, 1988).}

Two other unrealistic

_assumptions

are that all

loci detected are

_heterozygous

and that in a

_fingerprint

there are no allelic bands.

For

_instance,

several allelic bands were found in the

fingerprint

analysis

of human

families

_(Jeffreys

et

_al,

₁₉₈₅₎

in

_dogs

and cats

_(Jeffreys

and

_Morton,

_1987),

and in

birds

_(Burke

and

_Bruford,

_1987).

Therefore,

a more realistic model should consider:

1),

the number of bands varies

from one individual to

_another;

_2),

there are

_homozygous

loci and

_pairs

of allelic

bands in the

_fingerprint

of an

_individual;

_3),

2 unrelated individuals _mayshare similar bands identical

by

state. Under these

assumptions,

it is obvious that an

accurate estimate of

_kinship

between 2 individuals will be even more difficult. This results from the increase in the

_{overlapping proportion}

of the different distributions of

_{identity, mainly}

due to

_{identity by}

state.

_However,

with monolocus

_probes

it

seems

possible

to choose a

sample

of

probes

which avoid or minimize these obstacles.

In

_{population genetics,}

and

_especially

in the

_analysis

of natural

_population

structure,

the aim is not

_always

to

_get

accurate estimates of

_kinship

between

different individuals

_(Gilbert

et

al, 1990;

Kuhnlein et

_al,

_1990).

In most _cases,the

purpose is the estimation of the

kinship

structure.

_Therefore,

it is

_only

_necessaryto

determine whether individuals

_belong

to the same

_family

or not. On the other

hand,

an

_identity

in state _may

_exist,

_meaning

that 2 unrelated individuals _mayshare some

of their bands. In this

situation,

it becomes necessary to define a threshold value

(14)

or not. Below this

_value,

it will be

_impossible

to determine if two individuals are

directly

related or share a recent common _ancestor,and so

_they

will be considered

to be

_unrelated;

above this

_value,

it will be considered that a

_kinship

relation

exists between these individuals. Of course, the definition of TV

depends

upon the

polymorphism

of the

genetic

system

used and _uponthe

_population

under

study.

The more

polymorphic

the

genetic

_system

and the

population,

the lower the TV

will be.

Estimates of the TV can be obtained

_{by comparing}

known unrelated individuals.

For

_instance,

in the work of Wetton et al

₍₁₉₈₇₎

on

_birds,

the TV could be chosen

between 0.044 and 0.247

_(see

table

I,

p

147).

However,

when

nothing

is known about

the

_kinship

structure of the

population,

the TV can be defined from the

_identity

of

individuals

_belonging

to different

_populations.

If

_only

a fixed TV is

defined,

errors can be made when identities are _veryclose

to the TV. For

_instance,

it will be

_possible

to

_classify

as unrelated some related

individuals and to

_classify

as related some unrelated individuals.

Thus,

it will be more correct to define a zone of

_uncertainty

around the TV in which it will be not

possible

to determine whether 2 individuals are related or not. Of course, the TV

and the

_uncertainty

zone will be defined

according

to the distribution of

_identity

between unrelated individuals.

_Moreover,

with this

_{procedure, only}

individuals who

are

_directly

parent-offspring,

full-sibs,

grandparent/grandchildren, etc)

will be classed in the same

_family;

and

_according

to the

TV,

first

_cousins,

for whom the

_{expected identity}

is

12.5%

could be considered as unrelated.

Thus, employing

an

_appropriate

TV

_{value, identity}

can indeed be used

_just

to determine whether individuals are related or not. From an

_{identity matrix,}

it

is then

possible

to estimate the

proportion

of

_pairs

of related individuals. This

corresponds

to the

_{probability, Pr,}

of

_drawing

at random 2 individuals who share

a common ancestor in the recent

_past,

ie in the

_previous

1 or 2

_generations,

or who

are

_directly

related.

_Moreover,

from the same

_identity

_matrix,

it is also

possible

to

define different groups of related individuals or families in order to estimate the

population

structure,

ie the number of families and their

respective

size.

To

get

accurate estimates of Pr and of

_population

_structure,

a

sampling

procedure

similar to that

_proposed

_by

Rouault and

_Capy

₍₁₉₈₆₎

and

_{by Capy}

and Rouault

₍₁₉₈₇₎

can be used. This is a

sequential

procedure

based on the

relationship

between the

_sample

_size,

the

_parameter

estimated and confidence intervals of

_proportions

_and/or

a

_sampling

error. In the first _case,the

_proportion

of

_pairs

of related individuals must be estimated. The

_probability

of

_observing

np

pairs

of related individuals in a

_sample

of n individuals follows a binomial.

Since a

_proportion

(Pr)

must be

estimated,

the

_{sampling procedure}

will be

_stopped

when the confidence interval of Pr will be

equal

to or below a

_given

value fixed a

priori

before

_sampling.

In the second case, the

population

structure will be defined

by

the number and the size of the different groups of related individuals.

Thus,

the

_probability

of

_drawing

ni members of each

_family

i follows a multinomial

distribution. In this latter _case,the

_{sampling procedure}

should be

_stopped

when the

_probability

of the

_sample

and the confidence interval of each

_proportion

_(here,

the relative

proportion

of each

_family)

is

_equal

to or below the

_parameters

defined

(15)

DISCUSSION AND CONCLUSIONS

The above results

_complement

those of

Lynch

(1988)

and Brookfield

₍₁₉₈₉₎

and indicate the limits of the use of

_genetic

_systems

such as minisatellites for the

analysis

of relatedness in natural

_populations

_(see

also

_Lewin,

_1989).

_{Nevertheless,}

as has been shown for

_birds,

such

_systems

_may

_provide

new information to

_complete

or

confirm that obtained

_by

other

techniques

(Wetton

et

_al,

_1987;

_Burke:1989;

Burke

et

_al,

_1989).

Without

_preliminary

data on the structure of the

population

(size,

geographical

limit,

age-classes,

etc)

and on the sexual

and/or

family

behavior of

individuals,

it is

_{quite impossible}

to estimate the exact

_kinship

relation between different individuals.

_However,

if

_only

the relatedness

_(without

accurate estimates

of the true

_kinship

_relation)

between individuals is

_considered,

it is

_possible

to

envisage

the estimation of an _averagerate of

kinship

or of a

_population

structure.

However,

with

_genetic

_systems

which show a

_high

mutation rate and for which it is

impossible

to detect the

kinship

between individuals

_having

an

identity

of

10-15%,

the

_only

individuals which can be shown to be related will be

_{parent/offspring,}

brother-sister,

individuals involved in a backcross or, more

_generally,

individuals of

inbred strains or families. The main

_advantage

of the model

_proposed

here is that

it is not _necessaryto

_identify

the different alleles and their relative

_frequencies.

However,

this can be done for monolocus

probes,

and in this case a method similar

to that

_{proposed by Queller}

and

_Goodnight

₍₁₉₈₉₎

could be used for estimation of relatedness.

In the

_present

_work,

_only

2 kinds of

_hypothetical

genetic

systems

have been considered.

_Among

the different

_systems

_{already described,}

several could be used for such an

_analysis.

The main characteristics of a suitable system would be the

_following:

₍₁₎

each individual has a

_great

number of bands

(from

10 to

_30);

(2)

heterozygosity

must be

_high;

₍₃₎

the number of bands shared

by

unrelated individuals must be as low as

possible.

With

_regard

to the multilocus

_{probes available,}

most of them do not fulfill all these conditions. The number of bands _may_varyfrom 2-3 to more than

_20;

the

heterozygosity

and the mutation rate seem to be variable but very

high

(in

some

cases, ;:

97%

for the

heterozygosity

and 0.003 per

gamete

for the mutation

_rate;

Jeffreys

et

_al,

_1988);

but the number of bands shared between unrelated individuals

may be

large

(

z

14%

in

birds;

Wetton et

al,

1987).

With the

development

of monolocus

_probes,

_manyinconveniences could be avoided or reduced. Several

probes

could be used

simultaneously,

as different

enzymatic

loci,

with the

_advantage

that most _{loci possess}a

_high

mutation rate

and

_probably

a uniform distribution of their

_respective

alleles in a

_{given population}

as well.

_Moreover,

with such

_probes

it becomes

_possible

to minimize the level of

identity

in state between bands of 2 individuals.

ACKNOWLEDGMENTS

We thank D

_Iirane,

DL

_Hartl,

A

Larson,

M Hedin and 2 anonymous referees for

_helpful

(16)

REFERENCES

Burke T

₍₁₉₈₉₎

DNA

_{fingerprinting}

and other methods for the

_study

of

_mating

success. Trends Ecol Evol

_5,

139-144

Burke

_T,

Bruford MW

₍₁₉₈₇₎

DNA

_{fingerprinting}

in birds. Nature

_(Lond)

_327,

149-152

Burke

_T,

Davies

_NB,

Bruford

MW,

Hatchwell BJ

₍₁₉₈₉₎

Parental care and

_mating

behaviour

of polyandrous

dunnocks Prunella modularis related to

_paternity

_by

DNA

fingerprinting.

Nature

_(Lond)

_338,

249-251

Brookfield JF1’

₍₁₉₈₉₎

_Analysis

of DNA

_{fingerprinting}

data in cases of

_disputed

paternity.

IMA J Math

_Appl

Med

_{Biol 6,}

11-131

Capy

P.

Rouault J

₍₁₉₈₇₎

Estimation of allele number in a natural

_{population by}

the isofemale line method. Genetics

_117,

795-801

Chakraborty

R, Meagher TR,

Smouse PE

₍₁₉₈₈₎

_{Parentage analysis}

with

_genetic

markers in natural

_populations.

I. The

_expected

_proportion

of

_offspring

with

unambiguous

paternity.

Genetics

_118,

527-536

Co3-ne

JA

₍₁₉₇₆₎

Lack of

_{genetic similarity}

between two

_sibling

_species

as revealed

by

varied

techniques.

Genetics

_84,

593-607

Dainou

_0,

Cariou

_ML,

David

JR,

Hickey

D

₍₁₉₈₇₎

_Amylase

gene

duplication:

an

ancestral trait in the

_{Drosophila melanogaster species}

_subgroup.

_{Heredity 59,}

245-251

Danielli

GA,

Costa R

₍₁₉₇₇₎

Transient

_equilibrium

at the Est-6 locus in wild

population

of

_{Drosophila melanogaster.}

Genetica

_47,

37-41

David

_JR,

_{Alonso-Morraga}

_A,

_Capy

_P,

Munoz-Serrano

_A,

Vouidibio J

₍₁₉₈₉₎

Short _rangevariations and alcohol resources in

_{Drosophila melanogaster.}

Symp

Evolutionary Biology of

Transient Unstable

_Populations

_(Fontevilla

A,

ed)

Springer-Verlag

Berlin,

130-143

Dausset J

₍₁₉₅₈₎

_{Iso-leuco-anticorps.}

Acta

_{Hematol 20,}

156

Dausset

_J,

_Svejgaard

A

₍₁₉₇₇₎

HLA and Diseases.

_Munksgaard,

_Copenhagen

Gilbert

_DA,

Lehman

_N,

O’Brien

_{SJ, Wayne}

RK

₍₁₉₉₀₎

Genetic

_{fingerprinting}

reflects

_population

differentiation in the California Channel Island fox. Nature

(Lond)

344,

764-767

Jeffreys AJ,

Wilson

_V,

Thein SL

₍₁₉₈₅₎

_{Hypervariable}

&dquo;minisatellite&dquo;

_regions

in

human DNA. Nature

_(Lond)

_{314, 67-73}

Jeffreys

AJ,

Morton DB

₍₁₉₈₇₎

DNA

_fingerprints

of

_dogs

and cats. Anim Genet

_18,

1-15

Jeffreys

AJ, Royle NJ,

Wilson

_{V, Wong}

Z

₍₁₉₈₈₎

_Spontaneous

mutation rates to

new

_length

alleles at

_{tandem-repetitive}

_{hypervariable}

loci in human DNA. Nature

(Lond)

332,

278-281

Keith TP

₍₁₉₈₃₎

_Frequency

distribution of esterase-5 alleles in two

_populations

of

Drosophila pseudoobscura.

Genetics

_105,

135-155

Klein J

₍₁₉₈₇₎

Natural

_{History of}

the

_Major

_{Histocompatibility}

_Complex.

John

Wiley

and

_Sons,

New York

Kuhnlein

_{U, Zadworny D,}

Dawe

_Y,

Fairfull

_RW,

Gavora JS

₍₁₉₉₀₎

Assessment of

inbreeding by

DNA

_{fingerprinting:}

_development

of a calibration curve

_using

defined

(17)

Lewin R

₍₁₉₈₉₎

Limits to DNA

_{fingerprinting.}

Science

_243,

1549-1551

Lynch

M

₍₁₉₈₈₎

Estimation of relatedness

_by

DNA

_{fingerprinting.}

Mol Biol

Evol 5,

584-599

Modiano

_G,

Battistuzzi

G,

Esan

_GJF,

Testa

_V,

Lazzato L

₍₁₉₇₉₎

Genetic

het-erogeneity

of &dquo;normal&dquo; human

_erythrocyte

_{glucose-6-phosphate}

_{dehydrogenase:}

an

electrophoretic polymorphism.

Proc Natl Acad Sci USA

_76,

852-856

Nakamura

_{Y, Leppert M,}

O’Connell

P,

Wolff

R,

Holm

_T,

Culver

_M,

Martin

C,

Fujimoto E,

Hoff

_M,

Kumlin

_E,

White R

₍₁₉₈₇₎

Variable number of tandem

repeat

(VNTR)

markers for human gene

mapping.

Science

235,

1616-1622

lVei

_M,

Li WH

₍₁₉₇₉₎

Mathematical model for

studying genetic

variation in terms

of restriction endonucleases. Proc Natl Acad Sci

_USA,

_76,

5269-5273

Pamilo P

₍₁₉₈₄₎

_Genotypic

correlation and

_regression

in _{social groups:}

_multiple

alleles, multiple

loci and subdivided

populations.

Genetics

_107,

307-320

Pamilo

_P,

Crozier RH

₍₁₉₈₂₎

_{Measuring genetic}

relatedness in natural

_populations:

methodology.

Tfieor

_Popul

Biol 21,

171-193

Queller DC,

Strassmann

JE,

Hughes

CR

₍₁₉₈₈₎

Genetic relatedness in colonies of

tropical

wasps with

multiple

queens. Science

242,

1155-1157

Queller

DC,

Goodnight

KF

₍₁₉₈₉₎

_Estimating

relatedness

_{using genetic}

markers. Evolution

_43,

258-275

Ramshaw

JAM,

Coyne

JA,

Lewontin RC

₍₁₉₇₉₎

The

_sensitivity

of

_gel

electrophore-sis as a detector of

_genetic

variation. Genetics

93,

1019-1037

Rouault

_J,

_Capy

P

₍₁₉₈₆₎

Comment 6valuer un nombre de

6chantillonnage.

Ann Econ

Stat 4,

111-124

Singh

RS

₍₁₉₇₉₎

Genic

heterogeneity

within

_{electrophoresis}

&dquo;alleles&dquo; and the

pattern of variation among loci in

_Drosophila

_{pseudoobscura.}

Genetics

_93,

997-1018

Singh RS,

Lewontin

_RC,

Felton AA

₍₁₉₇₆₎

Genic

heterogeneity

within

elec-trophoretic

&dquo;alleles&dquo; of xanthine

dehydrogenase

in

_{Drosophila pseudoobscura.}

Ge-netics

_84,

609-629

Schwartz

OA,

Armitage

hB

₍₁₉₈₃₎

Problems in the use of

_{genetic similarity}

to

show relatedness. Evolution

_37,

417-420

Vaiman

M,

Renard

C,

Lafage

P,

Ameteau

_J,

Nizzu P

₍₁₉₇₀₎

Evidence for a

histocompatibility

system

in

_{pig. Transplantation}

_10,

155-164

Vouidibio

_J,

_{Capy P, Defaye D,}

Pla

_E,

Sandrin

_J,

Csink

A,

David JR

₍₁₉₈₉₎

Short

range

genetic

structure of

_Drosophila

_melanogaster

_populations

in an

Afrotropical

urban area, and its

significance.

Proc Natl Acad Sci USA

86,

8442-8446

Wetton

_JH,

Carter

_RE,

Parkin

_DT,

Walters D

₍₁₉₈₇₎

_Demographic

_study

of a wild

house _sparrow

_{population by}

DNA

_{fingerprinting.}

Nature

_(Lond)

_327,

147-149

Wilkinson

GS,

lVIcCraken GF

₍₁₉₈₅₎

On

_estimating

relatedness