
On interpolations in QTL detection


HAL Id: hal-00658592

https://hal.archives-ouvertes.fr/hal-00658592

Preprint submitted on 10 Jan 2012



Céline Delmas, Charles-Elie Rabier

To cite this version:

Céline Delmas, Charles-Elie Rabier. On interpolations in QTL detection. 2012. ⟨hal-00658592⟩


Céline Delmas

INRA UR631, Station d’Amélioration Génétique des Animaux, Auzeville, France.

Charles-Elie Rabier

Université de Toulouse, Institut de Mathématiques de Toulouse, U.P.S., Toulouse, France.

INRA UR631, Station d’Amélioration Génétique des Animaux, Auzeville, France.

Summary. We consider the likelihood ratio test (LRT) process related to the test of the absence of QTL (a QTL denotes a quantitative trait locus, i.e. a gene with quantitative effect on a trait) on the interval $[0, T]$ representing a chromosome.

Recently, Azaïs et al. (2011) proved that the LRT process was the square of a non linear interpolated process. However, in their study of the same problem, Chang et al. (2009) introduced another interpolation. So why do Azaïs et al. (2011) and Chang et al. (2009) find different interpolations? We correct errors present in the interpolation of Chang et al. (2009) and establish the link between the two interpolations. We finally generalize the interpolation of Chang et al. (2009) to the alternative hypothesis of a QTL located at $t^\star \in [0, T]$.

Keywords: Gaussian process, Likelihood Ratio Test, Mixture models, Nuisance parameters present only under the alternative, QTL detection.

1. Introduction

We focus on the famous "Interval Mapping" of Lander and Botstein (1989), that is to say, we address the problem of detecting a Quantitative Trait Locus, so-called QTL (a gene influencing a measurable quantitative trait), on a given chromosome. We study a backcross population $A \times (A \times B)$, where $A$ and $B$ are purely homozygous lines. The trait is observed on $n$ individuals (progenies), and we denote by $Y_j$, $j = 1, \ldots, n$, the observations, which we will assume to be Gaussian, independent and identically distributed (i.i.d.). The mechanism of genetics, or more precisely of meiosis, implies that, of the two chromosomes of each individual, one is purely inherited from $A$, while the other (the recombined one) consists of parts originating from $A$ and parts originating from $B$, due to crossing-overs (see for instance Wu et al. (2007)).

The chromosome will be represented by the segment $[0, T]$. The distance on $[0, T]$ is called the genetic distance; it is measured in Morgans. Two genetic markers are located at fixed locations $t_1 = 0 < t_2 = T$. The genome $X(t)$ of one individual takes the value $+1$ if, for example, the recombined chromosome originates from $A$ at location $t$, and takes the value $-1$ if it originates from $B$. We use the Haldane (1919) modeling, which can be represented as follows: $X(0)$ is a random sign and $X(t) = X(0)(-1)^{N(t)}$, where $N(\cdot)$ is a standard Poisson process on $[0, T]$. We assume an analysis of variance model for the quantitative trait:

$$Y = \mu + X(t^\star)\, q + \sigma\varepsilon, \qquad (1)$$

where $\varepsilon$ is a Gaussian white noise and $t^\star$ is the true location of the QTL.

In fact, the genome information will be available only at the marker locations, and the observation will be $(Y, X(t_1), X(t_2))$. So we observe $n$ i.i.d. observations $(Y_j, X_j(t_1), X_j(t_2))$. Calculations on the Poisson distribution show that

$$r(t, t') := P\{X(t)X(t') = -1\} = P\{|N(t) - N(t')| \text{ odd}\} = \tfrac{1}{2}\left(1 - e^{-2|t - t'|}\right);$$

we set in addition $\bar r(t, t') = 1 - r(t, t')$.
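The Haldane model and the formula for $r(t, t')$ can be checked by simulation. The following is a minimal Python sketch, assuming NumPy is available; the function names `recombination_fraction` and `simulate_genotypes` are hypothetical helpers introduced for illustration, not part of the original work. It draws $X(0)$ as a random sign, adds independent Poisson increments between successive locations, and compares the empirical frequency of $X(t_1)X(t_2) = -1$ with $r(t_1, t_2)$.

```python
import numpy as np


def recombination_fraction(t, tp):
    """Haldane: r(t, t') = P(X(t) X(t') = -1) = (1 - exp(-2|t - t'|)) / 2, distances in Morgans."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def simulate_genotypes(n, positions, rng):
    """Simulate X(t) = X(0) * (-1)**N(t) at sorted positions >= 0 for n individuals.

    N is a standard (rate 1) Poisson process, so its increments over disjoint
    intervals are independent Poisson variables whose mean is the interval length.
    """
    positions = np.asarray(positions, dtype=float)
    x0 = rng.choice([-1, 1], size=n)                  # X(0): random sign
    lengths = np.diff(positions, prepend=0.0)         # interval lengths, starting from 0
    increments = rng.poisson(lengths, size=(n, positions.size))
    n_t = np.cumsum(increments, axis=1)               # N(t) at each position
    return x0[:, None] * (-1) ** n_t


rng = np.random.default_rng(0)
t1, t2 = 0.0, 1.0                                     # two markers, T = 1 Morgan
X = simulate_genotypes(200_000, [t1, 0.3, t2], rng)   # columns: X(t1), X(0.3), X(t2)
empirical = np.mean(X[:, 0] * X[:, 2] == -1)
print(f"empirical {empirical:.4f} vs r(t1, t2) = {recombination_fraction(t1, t2):.4f}")
```

With 200 000 simulated individuals the empirical frequency should agree with $r(t_1, t_2) = (1 - e^{-2})/2$ to about two decimal places.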

The challenge is that the location $t^\star$ of the QTL is unknown, so we will perform a likelihood ratio test (LRT) of the null hypothesis of no QTL (i.e. $q = 0$) at every location $t \in [0, T]$. This leads to a process $\{\Lambda_n(t),\ t \in [0, T]\}$ called the "LRT process", and taking as test statistic the maximum of this process comes down to performing an LRT in a model where the location of the QTL is an extra parameter. A key point is that, since the "genome information" is only available at marker locations, we have to deal with a mixture model when we perform a test at a location $t$ which does not correspond to a marker location. This mixture model has two Gaussian components: both components have variance $\sigma^2$, but the first component has expected value $\mu + q$, whereas the second one has expected value $\mu - q$. So, at such a location $t$, in order to obtain the weight of the first (resp. second) component of our mixture model, we have to compute the probability that $X(t) = 1$ (resp. $X(t) = -1$) given the "genome information" at the markers. In particular, according to Azaïs et al. (2011), if we call $p(t) = P\{X(t) = 1 \mid X(t_1), X(t_2)\}$ (i.e. the weight of the first component), we have:

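$$
\begin{aligned}
p(t) ={}& Q^{1,1}_t\, \mathbf{1}_{X(t_1)=1}\mathbf{1}_{X(t_2)=1} + Q^{1,-1}_t\, \mathbf{1}_{X(t_1)=1}\mathbf{1}_{X(t_2)=-1} \\
&+ Q^{-1,1}_t\, \mathbf{1}_{X(t_1)=-1}\mathbf{1}_{X(t_2)=1} + Q^{-1,-1}_t\, \mathbf{1}_{X(t_1)=-1}\mathbf{1}_{X(t_2)=-1},
\end{aligned}
$$

where

$$
Q^{1,1}_t = \frac{\bar r(t_1, t)\, \bar r(t, t_2)}{\bar r(t_1, t_2)}, \quad
Q^{1,-1}_t = \frac{\bar r(t_1, t)\, r(t, t_2)}{r(t_1, t_2)}, \quad
Q^{-1,1}_t = \frac{r(t_1, t)\, \bar r(t, t_2)}{r(t_1, t_2)}, \quad
Q^{-1,-1}_t = \frac{r(t_1, t)\, r(t, t_2)}{\bar r(t_1, t_2)}.
$$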
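The weights $Q^{x_1,x_2}_t$ can be computed directly from $r$ and $\bar r$. The Python sketch below (standard library only; all helper names are hypothetical) encodes the four formulas above and cross-checks them against a brute-force conditional probability obtained by Bayes' rule, assuming, as the Haldane model implies, that $X$ is a Markov chain which switches sign between two locations $s < u$ with probability $r(s, u)$, independently over disjoint intervals.

```python
import itertools
import math


def r(t, tp):
    """Haldane recombination fraction r(t, t') = P{X(t) X(t') = -1}."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(t - tp)))


def rbar(t, tp):
    """Complement 1 - r(t, t')."""
    return 1.0 - r(t, tp)


def q_coefficients(t, t1, t2):
    """Q^{x1,x2}_t = P{X(t) = 1 | X(t1) = x1, X(t2) = x2}, for t1 < t < t2."""
    return {
        (1, 1): rbar(t1, t) * rbar(t, t2) / rbar(t1, t2),
        (1, -1): rbar(t1, t) * r(t, t2) / r(t1, t2),
        (-1, 1): r(t1, t) * rbar(t, t2) / r(t1, t2),
        (-1, -1): r(t1, t) * r(t, t2) / rbar(t1, t2),
    }


def joint(x1, x, x2, t1, t, t2):
    """P{X(t1) = x1, X(t) = x, X(t2) = x2}: fair sign at t1, then independent switches."""
    prob = 0.5
    prob *= r(t1, t) if x1 != x else rbar(t1, t)
    prob *= r(t, t2) if x != x2 else rbar(t, t2)
    return prob


def p_brute_force(t, t1, t2, x1, x2):
    """p(t) = P{X(t) = 1 | X(t1) = x1, X(t2) = x2} by Bayes' rule on the joint law."""
    num = joint(x1, 1, x2, t1, t, t2)
    return num / (num + joint(x1, -1, x2, t1, t, t2))


t1, t2, t = 0.0, 1.0, 0.4   # markers at 0 and T = 1 Morgan, test location 0.4
Q = q_coefficients(t, t1, t2)
for x1, x2 in itertools.product([1, -1], repeat=2):
    print((x1, x2), round(Q[(x1, x2)], 4), round(p_brute_force(t, t1, t2, x1, x2), 4))
```

The same dictionary also makes it easy to check the symmetries $Q^{1,1}_t + Q^{-1,-1}_t = 1$ and $Q^{1,-1}_t + Q^{-1,1}_t = 1$, which follow from $1 - 2r(t_1, t_2) = \{1 - 2r(t_1, t)\}\{1 - 2r(t, t_2)\}$ under the Haldane model.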

This problem has been studied under some approximations by Rebaï et al. (1995), Rebaï et al. (1994), Cierco (1998), Azaïs and Cierco-Ayrolles (2002), and Azaïs and Wschebor (2009). Recently, Azaïs et al. (2011) have shown that the LRT process was the square of a non linear interpolated process. The aim of the present article is to establish the link between this non linear interpolation and another interpolation, also introduced recently by Chang et al. (2009).

The interpolation of Azaïs et al. (2011) is a non linear interpolation between test statistics at marker locations: it means that all the test statistics inside a marker interval can be deduced by interpolation from the test statistics at […] likelihood profile, which is smooth between markers. Besides, this interpolation leads to an easy formula for computing the supremum of the LRT process (see Lemma 1 of Azaïs et al. (2011)). The interpolation of Chang et al. (2009) is less interesting from a genetic point of view: it is very difficult to interpret graphically. However, it is worth understanding why the two articles Azaïs et al. (2011) and Chang et al. (2009), which study the same problem, present different results. Here we correct technical errors present in Chang et al. (2009) and establish the link between the two interpolations. Finally, we generalize the interpolation of Chang et al. (2009) to the alternative hypothesis of a QTL located at $t^\star \in [0, T]$, since, contrary to Azaïs et al. (2011), Chang et al. (2009) focused only on the null hypothesis.

We refer to the book of Van der Vaart (1998) for elements of asymptotic statistics used in proofs.

2. Two different interpolations

Let $H_0$ be the null hypothesis $q = 0$. Since Chang et al. (2009) study only the null hypothesis, we will first focus on $H_0$. Besides, it is well known that, for a regular model, the LRT statistic is asymptotically equivalent to the square of the score statistic, so without loss of generality we will limit our study to score tests, as in Chang et al. (2009).

2.1. Under the null hypothesis

Let us consider a location $t$ distinct from the marker locations, that is to say $t \in\, ]t_1, t_2[$; the result will be extended by continuity at the marker locations. Let $S_n(t)$ denote the score test statistic at location $t$. According to Theorem 1 and formula (5) of Azaïs et al. (2011), we have:

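$$S_n(t) = \frac{\{Q^{1,1}_t - Q^{-1,1}_t\}\, S_n(t_1) + \{Q^{1,1}_t - Q^{1,-1}_t\}\, S_n(t_2)}{\sqrt{E\left[\{2p(t) - 1\}^2\right]}}, \qquad (2)$$

where, for $k = 1, 2$,

$$S_n(t_k) = \sum_{j=1}^n \frac{(y_j - \mu)\,(2\,\mathbf{1}_{X_j(t_k)=1} - 1)}{\sigma\sqrt{n}}$$

and

$$E\left[\{2p(t) - 1\}^2\right] = \{Q^{1,1}_t - Q^{-1,1}_t\}^2 + \{Q^{1,1}_t - Q^{1,-1}_t\}^2 + 2\,\{Q^{1,1}_t - Q^{-1,1}_t\}\{Q^{1,1}_t - Q^{1,-1}_t\}\, e^{-2(t_2 - t_1)}.$$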
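The step from $p(t)$ to formula (2) can be made explicit. The following derivation sketch assumes, as in Azaïs et al. (2011), that the score at $t$ is the derivative at $q = 0$ of the mixture log-likelihood; with $\phi$ the standard normal density, for one observation

$$\frac{\partial}{\partial q} \log\Big[p(t)\,\phi\Big(\frac{y - \mu - q}{\sigma}\Big) + \{1 - p(t)\}\,\phi\Big(\frac{y - \mu + q}{\sigma}\Big)\Big]\Big|_{q=0} = \frac{(y - \mu)\,\{2p(t) - 1\}}{\sigma^2}.$$

Since $Q^{1,1}_t = 1 - Q^{-1,-1}_t$ and $Q^{1,-1}_t = 1 - Q^{-1,1}_t$, the quantity $2p(t) - 1$ changes sign when $(X(t_1), X(t_2))$ is replaced by $(-X(t_1), -X(t_2))$. An odd function on $\{-1, 1\}^2$ has no constant and no product term, so

$$2p(t) - 1 = \alpha_t X(t_1) + \beta_t X(t_2), \qquad \alpha_t + \beta_t = 2Q^{1,1}_t - 1, \qquad \alpha_t - \beta_t = 2Q^{1,-1}_t - 1,$$

that is, $\alpha_t = Q^{1,1}_t - Q^{-1,1}_t$ and $\beta_t = Q^{1,1}_t - Q^{1,-1}_t$. Because $X(t_k)^2 = 1$ and $E\{X(t_1)X(t_2)\} = 1 - 2r(t_1, t_2) = e^{-2(t_2 - t_1)}$,

$$E\left[\{2p(t) - 1\}^2\right] = \alpha_t^2 + \beta_t^2 + 2\alpha_t\beta_t\, e^{-2(t_2 - t_1)}.$$

Dividing $\sum_j (y_j - \mu)\{2p_j(t) - 1\}/\sigma^2$ by its standard deviation under $H_0$, namely $\sqrt{n\,E[\{2p(t) - 1\}^2]}/\sigma$, and using $X_j(t_k) = 2\,\mathbf{1}_{X_j(t_k)=1} - 1$ gives formula (2).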

Formula (2) is the non linear interpolation of Azaïs et al. (2011) between statistics at markers. Under $H_0$, the couple $(S_n(t_1), S_n(t_2))$ asymptotically follows a standard bivariate normal distribution with covariance $e^{-2(t_2 - t_1)}$. Let us now establish the link between this interpolation and the interpolation of Chang et al. (2009).

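According to formula (5) of Azaïs et al. (2011), and using the fact that $Q^{1,1}_t = 1 - Q^{-1,-1}_t$ and $Q^{1,-1}_t = 1 - Q^{-1,1}_t$, we have

$$
\begin{aligned}
S_n(t) ={}& (1 - 2Q^{-1,-1}_t) \sum_{j=1}^n \frac{(y_j - \mu)\,\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\}}{\sigma\sqrt{n}\,\sqrt{E[\{2p(t) - 1\}^2]}} \\
&+ (1 - 2Q^{-1,1}_t) \sum_{j=1}^n \frac{(y_j - \mu)\,\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}\}}{\sigma\sqrt{n}\,\sqrt{E[\{2p(t) - 1\}^2]}}.
\end{aligned}
$$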
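As a numerical sanity check of formula (2) and of its rewriting with indicator differences, the Python sketch below simulates data under $H_0$ and computes the score at $t$ in three ways: directly from $p_j(t)$, by interpolation between $S_n(t_1)$ and $S_n(t_2)$, and with the coefficients $1 - 2Q^{-1,-1}_t$ and $1 - 2Q^{-1,1}_t$. It assumes NumPy, assumes that the score at $t$ is proportional to $\sum_j (y_j - \mu)\{2p_j(t) - 1\}$, and uses arbitrary illustrative values ($\mu = 5$, $\sigma = 2$, $t = 0.4$); helper names are hypothetical. The three values should coincide up to rounding.

```python
import math
import numpy as np


def r(t, tp):
    """Haldane recombination fraction r(t, t')."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(t - tp)))


def rbar(t, tp):
    return 1.0 - r(t, tp)


def q_coefficients(t, t1, t2):
    """Q^{x1,x2}_t = P{X(t) = 1 | X(t1) = x1, X(t2) = x2}, for t1 < t < t2."""
    return {
        (1, 1): rbar(t1, t) * rbar(t, t2) / rbar(t1, t2),
        (1, -1): rbar(t1, t) * r(t, t2) / r(t1, t2),
        (-1, 1): r(t1, t) * rbar(t, t2) / r(t1, t2),
        (-1, -1): r(t1, t) * r(t, t2) / rbar(t1, t2),
    }


rng = np.random.default_rng(1)
n, mu, sigma = 100_000, 5.0, 2.0
t1, t2, t = 0.0, 1.0, 0.4

# Marker genotypes: X(t1) is a fair sign, X(t2) = -X(t1) with probability r(t1, t2).
x1 = rng.choice([-1, 1], size=n)
x2 = np.where(rng.random(n) < r(t1, t2), -x1, x1)
y = mu + sigma * rng.standard_normal(n)          # trait simulated under H0 (q = 0)

Q = q_coefficients(t, t1, t2)
alpha = Q[(1, 1)] - Q[(-1, 1)]                   # coefficient of S_n(t1) in (2)
beta = Q[(1, 1)] - Q[(1, -1)]                    # coefficient of S_n(t2) in (2)
e2p = alpha**2 + beta**2 + 2 * alpha * beta * math.exp(-2 * (t2 - t1))  # E[{2p(t)-1}^2]
norm = sigma * math.sqrt(n) * math.sqrt(e2p)

# (a) Directly from p_j(t) = Q^{X_j(t1), X_j(t2)}_t.
p = np.where(x1 == 1,
             np.where(x2 == 1, Q[(1, 1)], Q[(1, -1)]),
             np.where(x2 == 1, Q[(-1, 1)], Q[(-1, -1)]))
s_direct = np.sum((y - mu) * (2 * p - 1)) / norm

# (b) Formula (2): interpolation between the marker scores S_n(t1) and S_n(t2).
s_t1 = np.sum((y - mu) * x1) / (sigma * math.sqrt(n))
s_t2 = np.sum((y - mu) * x2) / (sigma * math.sqrt(n))
s_formula2 = (alpha * s_t1 + beta * s_t2) / math.sqrt(e2p)

# (c) Rewriting with indicator differences and coefficients 1 - 2Q.
d_same = ((x1 == 1) & (x2 == 1)).astype(float) - ((x1 == -1) & (x2 == -1))
d_diff = ((x1 == 1) & (x2 == -1)).astype(float) - ((x1 == -1) & (x2 == 1))
s_indicator = ((1 - 2 * Q[(-1, -1)]) * np.sum((y - mu) * d_same)
               + (1 - 2 * Q[(-1, 1)]) * np.sum((y - mu) * d_diff)) / norm

print(s_direct, s_formula2, s_indicator, np.mean((2 * p - 1) ** 2), e2p)
```

Only the joint law of the two markers is simulated ($X(t_2) = -X(t_1)$ with probability $r(t_1, t_2)$), which is all that formula (2) requires; the empirical mean of $\{2p_j(t) - 1\}^2$ printed last should be close to the theoretical $E[\{2p(t) - 1\}^2]$.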

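Let $G_{1n}$ and $G_{2n}$ be the quantities

$$G_{1n} = \sum_{j=1}^n \frac{(y_j - \mu)\,\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\}}{\sigma\sqrt{n\,\bar r(t_1, t_2)}},$$

$$G_{2n} = \sum_{j=1}^n \frac{(y_j - \mu)\,\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}\}}{\sigma\sqrt{n\, r(t_1, t_2)}}.$$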

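Under $H_0$, $G_{1n}$ and $G_{2n}$ are asymptotically independent standard normal variables: for each individual, the two indicator differences are never simultaneously non-zero, and their second moments are $\bar r(t_1, t_2)$ and $r(t_1, t_2)$ respectively. Besides, it is clear that we have the following relationship between $S_n(t)$, $G_{1n}$ and $G_{2n}$: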

$$S_n(t) = \frac{\sqrt{\bar r(t_1, t_2)}\,(1 - 2Q^{-1,-1}_t)\, G_{1n} + \sqrt{r(t_1, t_2)}\,(1 - 2Q^{-1,1}_t)\, G_{2n}}{\sqrt{E[\{2p(t) - 1\}^2]}}. \qquad (3)$$

We will see later that this interpolation is the interpolation of Chang et al. (2009), but rewritten without approximations. It is difficult to describe graphically because it is an interpolation between two test statistics, $G_{1n}$ and $G_{2n}$, which both include the genome information at the two markers. The main difference is that $G_{1n}$ and $G_{2n}$ are not points of the process $S_n(\cdot)$, contrary to $S_n(t_1)$ and $S_n(t_2)$ for the interpolation of Azaïs et al. (2011).
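The relationship (3) and the asymptotic independence of $G_{1n}$ and $G_{2n}$ under $H_0$ can be illustrated in the same way. The Python sketch below is again hypothetical (NumPy assumed, arbitrary parameter values and helper names): on one data set it compares formula (2) with formula (3), and over 2000 data sets simulated under $H_0$ it reports the empirical means, variances and correlation of $(G_{1n}, G_{2n})$, which should be close to 0, 1 and 0.

```python
import math
import numpy as np


def r(t, tp):
    """Haldane recombination fraction r(t, t')."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(t - tp)))


def rbar(t, tp):
    return 1.0 - r(t, tp)


def q_coefficients(t, t1, t2):
    """Q^{x1,x2}_t = P{X(t) = 1 | X(t1) = x1, X(t2) = x2}."""
    return {
        (1, 1): rbar(t1, t) * rbar(t, t2) / rbar(t1, t2),
        (1, -1): rbar(t1, t) * r(t, t2) / r(t1, t2),
        (-1, 1): r(t1, t) * rbar(t, t2) / r(t1, t2),
        (-1, -1): r(t1, t) * r(t, t2) / rbar(t1, t2),
    }


def simulate_h0(n, t1, t2, mu, sigma, rng):
    """Trait values under H0 (q = 0) and marker genotypes under the Haldane model."""
    x1 = rng.choice([-1, 1], size=n)
    x2 = np.where(rng.random(n) < r(t1, t2), -x1, x1)
    return mu + sigma * rng.standard_normal(n), x1, x2


def g_statistics(y, x1, x2, t1, t2, mu, sigma):
    """G_1n (equal marker genotypes) and G_2n (different marker genotypes)."""
    n = len(y)
    d_same = ((x1 == 1) & (x2 == 1)).astype(float) - ((x1 == -1) & (x2 == -1))
    d_diff = ((x1 == 1) & (x2 == -1)).astype(float) - ((x1 == -1) & (x2 == 1))
    g1 = np.sum((y - mu) * d_same) / (sigma * math.sqrt(n * rbar(t1, t2)))
    g2 = np.sum((y - mu) * d_diff) / (sigma * math.sqrt(n * r(t1, t2)))
    return g1, g2


def weights(t, t1, t2):
    """Q table, coefficients of formula (2) and E[{2p(t) - 1}^2]."""
    Q = q_coefficients(t, t1, t2)
    alpha, beta = Q[(1, 1)] - Q[(-1, 1)], Q[(1, 1)] - Q[(1, -1)]
    return Q, alpha, beta, alpha**2 + beta**2 + 2 * alpha * beta * math.exp(-2 * (t2 - t1))


def score_formula_2(y, x1, x2, t, t1, t2, mu, sigma):
    _, alpha, beta, e2p = weights(t, t1, t2)
    s_t1 = np.sum((y - mu) * x1) / (sigma * math.sqrt(len(y)))
    s_t2 = np.sum((y - mu) * x2) / (sigma * math.sqrt(len(y)))
    return (alpha * s_t1 + beta * s_t2) / math.sqrt(e2p)


def score_formula_3(g1, g2, t, t1, t2):
    Q, _, _, e2p = weights(t, t1, t2)
    return (math.sqrt(rbar(t1, t2)) * (1 - 2 * Q[(-1, -1)]) * g1
            + math.sqrt(r(t1, t2)) * (1 - 2 * Q[(-1, 1)]) * g2) / math.sqrt(e2p)


rng = np.random.default_rng(2)
mu, sigma, t1, t2, t = 5.0, 2.0, 0.0, 1.0, 0.4

# One data set: formulae (2) and (3) must agree up to rounding.
y, x1, x2 = simulate_h0(500, t1, t2, mu, sigma, rng)
s_formula2 = score_formula_2(y, x1, x2, t, t1, t2, mu, sigma)
s_formula3 = score_formula_3(*g_statistics(y, x1, x2, t1, t2, mu, sigma), t, t1, t2)

# Many data sets under H0: (G_1n, G_2n) should behave like independent N(0, 1) variables.
reps = np.array([g_statistics(*simulate_h0(200, t1, t2, mu, sigma, rng), t1, t2, mu, sigma)
                 for _ in range(2000)])
print(s_formula2, s_formula3, reps.mean(axis=0), reps.var(axis=0), np.corrcoef(reps.T)[0, 1])
```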

Note that, to obtain the score test, we just have to replace $\mu$ by $\hat\mu := n^{-1}\sum_{j=1}^n Y_j$ and $\sigma$ by $\hat\sigma := \{(n-1)^{-1}\sum_{j=1}^n (Y_j - \hat\mu)^2\}^{1/2}$ in formulae (2) and (3). These new expressions (i.e. with $\hat\sigma$ and $\hat\mu$) of $G_{1n}$, $G_{2n}$, $S_n(t_1)$ and $S_n(t_2)$ are asymptotically equivalent to the previous ones. We will call $\hat G_{1n}$ and $\hat G_{2n}$, respectively, the new expressions of $G_{1n}$ and $G_{2n}$.

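Let us now focus on the work of Chang et al. (2009). With our notations, the score test statistic of formula (8) of Chang et al. (2009) is

$$U_n(t) = \frac{\sqrt{\bar r(t_1, t_2)}\,(1 - 2Q^{-1,-1}_t)\, \tilde G_{1n} + \sqrt{r(t_1, t_2)}\,(1 - 2Q^{-1,1}_t)\, \tilde G_{2n}}{\sqrt{E[\{2p(t) - 1\}^2]}},$$

where

$$\tilde G_{1n} = \frac{\sum_{j=1}^n y_j\, \mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \sum_{j=1}^n y_j\, \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}}{\hat\sigma\sqrt{n\,\bar r(t_1, t_2)}},$$

$$\tilde G_{2n} = \frac{\sum_{j=1}^n y_j\, \mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \sum_{j=1}^n y_j\, \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}}{\hat\sigma\sqrt{n\,\bar r(t_1, t_2)}}.$$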


Let $o_{P_{\theta_0}}(1)$ be a sequence of random vectors that converges to zero in probability under $H_0$ (i.e. no QTL on the whole interval studied), and let $O_p(1)$ be a sequence bounded in probability. We have:

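$$\hat G_{1n} = \frac{\sum_{j=1}^n y_j\,\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\}}{\hat\sigma\sqrt{n\,\bar r(t_1, t_2)}} - \hat\mu\, \frac{\sum_{j=1}^n \{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\}}{\hat\sigma\sqrt{n\,\bar r(t_1, t_2)}}$$

$$= \tilde G_{1n} - \mu\, O_p(1) + O_p(1/\sqrt{n})\, O_p(1) = \tilde G_{1n} + O_p(1) + o_{P_{\theta_0}}(1),$$

since $\hat\mu = \mu + O_p(1/\sqrt{n})$ and the second fraction is $O_p(1)$: its summands are centered, because $P\{X(t_1) = X(t_2) = 1\} = P\{X(t_1) = X(t_2) = -1\}$. In the same way:

$$\hat G_{2n} = \tilde G_{2n} + O_p(1) + o_{P_{\theta_0}}(1).$$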

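As a consequence, since $\hat G_{2n} = G_{2n} + o_{P_{\theta_0}}(1)$ and $\hat G_{1n} = G_{1n} + o_{P_{\theta_0}}(1)$, we can remark that, in general, $S_n(t) \neq U_n(t) + o_{P_{\theta_0}}(1)$: the term $\mu\, O_p(1)$ above is not negligible when $\mu \neq 0$. So the interpolation introduced by Chang et al. (2009) is only an approximation, as said before. The interpolation of Chang et al. (2009) rewritten without approximations is presented in formula (3).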
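The source of the discrepancy can be seen numerically: $\hat G_{1n}$ centres the observations at $\hat\mu$, whereas $\tilde G_{1n}$ uses the raw $y_j$. In the Python sketch below (NumPy assumed; helper names and parameter values are hypothetical), adding a constant $c$ to every observation leaves $\hat G_{1n}$ unchanged but shifts $\tilde G_{1n}$ by $c\sum_j\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\}/(\hat\sigma\sqrt{n\,\bar r(t_1, t_2)})$, which is of order one rather than negligible. The sketch also checks the exact identity $\hat G_{1n} = \tilde G_{1n} - \hat\mu \times (\text{that same normalized sum})$ used in the expansion above.

```python
import math
import numpy as np


def rbar(t, tp):
    """1 - r(t, t') under the Haldane model."""
    return 0.5 * (1.0 + math.exp(-2.0 * abs(t - tp)))


def same_genotype_diff(x1, x2):
    """1{X(t1) = 1, X(t2) = 1} - 1{X(t1) = -1, X(t2) = -1}, per individual."""
    return ((x1 == 1) & (x2 == 1)).astype(float) - ((x1 == -1) & (x2 == -1))


def g1_hat(y, x1, x2, t1, t2):
    """G_1n with mu, sigma replaced by mu_hat, sigma_hat (observations centred at mu_hat)."""
    d = same_genotype_diff(x1, x2)
    sigma_hat = y.std(ddof=1)                 # {(n - 1)^-1 sum (y_j - mu_hat)^2}^{1/2}
    return np.sum((y - y.mean()) * d) / (sigma_hat * math.sqrt(len(y) * rbar(t1, t2)))


def g1_tilde(y, x1, x2, t1, t2):
    """Statistic in the form attributed to Chang et al. (2009): raw y_j, estimated sigma."""
    d = same_genotype_diff(x1, x2)
    return np.sum(y * d) / (y.std(ddof=1) * math.sqrt(len(y) * rbar(t1, t2)))


rng = np.random.default_rng(3)
n, t1, t2 = 1_000, 0.0, 1.0
x1 = rng.choice([-1, 1], size=n)
x2 = np.where(rng.random(n) < 1.0 - rbar(t1, t2), -x1, x1)
y = 5.0 + 2.0 * rng.standard_normal(n)        # H0 data with mu = 5, sigma = 2

c = 100.0                                     # arbitrary location shift
shift_term = np.sum(same_genotype_diff(x1, x2)) / (y.std(ddof=1) * math.sqrt(n * rbar(t1, t2)))
print("G1n hat  :", g1_hat(y, x1, x2, t1, t2), g1_hat(y + c, x1, x2, t1, t2))
print("G1n tilde:", g1_tilde(y, x1, x2, t1, t2), g1_tilde(y + c, x1, x2, t1, t2))
```

Because $\mu$ plays exactly the role of such a shift, the difference between the two statistics is driven by $\mu$ times an $O_p(1)$ term, which is the point made in the text.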

2.2. Under the alternative hypothesis

Let us define the alternative hypothesis

$H_{a,t^\star}$: the QTL is located at position $t^\star$ with effect $q = a/\sqrt{n}$, where $a \neq 0$.

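In Azaïs et al. (2011), the authors show that this alternative hypothesis is contiguous to the null hypothesis. This makes the algebra easy under $H_{a,t^\star}$: we still have the same interpolations as under $H_0$. In particular, for the non linear interpolation, according to Azaïs et al. (2011), $(S_n(t_1), S_n(t_2))$ still asymptotically follows a bivariate normal distribution with unit variances and covariance $e^{-2(t_2 - t_1)}$. However, $S_n(t_1)$ and $S_n(t_2)$ are not centered anymore: since $y_j - \mu = X_j(t^\star)\, a/\sqrt{n} + \sigma\varepsilon_j$ and $E\{X(t^\star)X(t_k)\} = e^{-2|t^\star - t_k|}$, we have $E\{S_n(t_1)\} = a e^{-2(t^\star - t_1)}/\sigma$ and $E\{S_n(t_2)\} = a e^{-2(t_2 - t^\star)}/\sigma$.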

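Let us now focus on the interpolation of Chang et al. (2009). After some calculations, and using the fact that $Q^{1,1}_{t^\star} = 1 - Q^{-1,-1}_{t^\star}$ and $Q^{1,-1}_{t^\star} = 1 - Q^{-1,1}_{t^\star}$, we obtain

$$E(G_{1n}) = a\sqrt{\bar r(t_1, t_2)}\,(2Q^{1,1}_{t^\star} - 1)/\sigma, \qquad E(G_{2n}) = a\sqrt{r(t_1, t_2)}\,(2Q^{1,-1}_{t^\star} - 1)/\sigma.$$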
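The closed-form means of $G_{1n}$ and $G_{2n}$ under $H_{a,t^\star}$ can be checked by exact enumeration of the eight configurations of $(X(t_1), X(t^\star), X(t_2))$. The Python sketch below assumes $t_1 < t^\star < t_2$, uses the Markov property of the Haldane model to compute the joint law, and assumes the noise $\varepsilon$ is independent of the genotypes; function names and parameter values are illustrative only.

```python
import itertools
import math


def r(t, tp):
    """Haldane recombination fraction r(t, t')."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(t - tp)))


def rbar(t, tp):
    return 1.0 - r(t, tp)


def q_coefficients(t, t1, t2):
    """Q^{x1,x2}_t = P{X(t) = 1 | X(t1) = x1, X(t2) = x2}."""
    return {
        (1, 1): rbar(t1, t) * rbar(t, t2) / rbar(t1, t2),
        (1, -1): rbar(t1, t) * r(t, t2) / r(t1, t2),
        (-1, 1): r(t1, t) * rbar(t, t2) / r(t1, t2),
        (-1, -1): r(t1, t) * r(t, t2) / rbar(t1, t2),
    }


def joint(x1, xs, x2, t1, ts, t2):
    """P{X(t1) = x1, X(t*) = xs, X(t2) = x2} under the Haldane model, t1 < t* < t2."""
    prob = 0.5
    prob *= r(t1, ts) if x1 != xs else rbar(t1, ts)
    prob *= r(ts, t2) if xs != x2 else rbar(ts, t2)
    return prob


def means_by_enumeration(a, sigma, t_star, t1, t2):
    """E(G_1n), E(G_2n) under H_{a,t*}.

    With y_j - mu = X_j(t*) a / sqrt(n) + sigma * eps_j, the factors sqrt(n) cancel:
    E(G_1n) = a E{X(t*) D_same} / (sigma sqrt(rbar(t1, t2))), and likewise for G_2n.
    """
    e_same = e_diff = 0.0
    for x1, xs, x2 in itertools.product([-1, 1], repeat=3):
        prob = joint(x1, xs, x2, t1, t_star, t2)
        d_same = (x1 == 1 and x2 == 1) - (x1 == -1 and x2 == -1)   # bools -> int in {-1, 0, 1}
        d_diff = (x1 == 1 and x2 == -1) - (x1 == -1 and x2 == 1)
        e_same += prob * xs * d_same
        e_diff += prob * xs * d_diff
    return (a * e_same / (sigma * math.sqrt(rbar(t1, t2))),
            a * e_diff / (sigma * math.sqrt(r(t1, t2))))


def means_closed_form(a, sigma, t_star, t1, t2):
    """a sqrt(rbar) (2Q^{1,1}_{t*} - 1) / sigma and a sqrt(r) (2Q^{1,-1}_{t*} - 1) / sigma."""
    Q = q_coefficients(t_star, t1, t2)
    return (a * math.sqrt(rbar(t1, t2)) * (2 * Q[(1, 1)] - 1) / sigma,
            a * math.sqrt(r(t1, t2)) * (2 * Q[(1, -1)] - 1) / sigma)


print(means_by_enumeration(3.0, 2.0, 0.3, 0.0, 1.0))
print(means_closed_form(3.0, 2.0, 0.3, 0.0, 1.0))
```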

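Besides, the individuals are independent and, for each $j$, the indicator differences entering $G_{1n}$ and $G_{2n}$ are never simultaneously non-zero, so the diagonal terms of $E(G_{1n}G_{2n})$ vanish and only products across distinct individuals remain. Hence $E(G_{1n}G_{2n}) = (1 - 1/n)\,E(G_{1n})\,E(G_{2n})$ and

$$\mathrm{Cov}(G_{1n}, G_{2n}) = E(G_{1n}G_{2n}) - E(G_{1n})\,E(G_{2n}) = -\frac{E(G_{1n})\,E(G_{2n})}{n} = -\frac{a^2}{n}\sqrt{\bar r(t_1, t_2)}\sqrt{r(t_1, t_2)}\,(2Q^{1,-1}_{t^\star} - 1)(2Q^{1,1}_{t^\star} - 1)/\sigma^2,$$

which tends to zero as $n \to \infty$. So, under the alternative, $G_{1n}$ and $G_{2n}$ are still asymptotically normal with unit variance and, as under the null hypothesis, asymptotically independent; only their means are shifted, in agreement with the contiguity of $H_{a,t^\star}$ to $H_0$.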

Note that here we limited our study to two genetic markers located on the chromosome, but it can easily be generalized to several markers.


3. Acknowledgements

The authors thank Jean-Marc Azaïs and Jean-Michel Elsen for fruitful discussions. This work has been supported by the Animal Genetic Department of the French National Institute for Agricultural Research, SABRE, and the National Center for Scientific Research.

Céline Delmas (celine.delmas@toulouse.inra.fr)

INRA UR631, Station d'Amélioration Génétique des Animaux,

BP 52627 - 31326 Castanet-Tolosan Cedex, France.

Charles-Elie Rabier (rabier@stat.wisc.edu)

Université de Toulouse, Institut de Mathématiques de Toulouse, U.P.S.,

F-31062 Toulouse Cedex 9, France.

INRA UR631, Station d'Amélioration Génétique des Animaux,

BP 52627 - 31326 Castanet-Tolosan Cedex, France.

References

Azaïs, J.M. and Cierco-Ayrolles, C. (2002). An asymptotic test for quantitative gene detection. Ann. I. H. Poincaré, 38(6), 1087-1092.

Azaïs, J.M. and Wschebor, M. (2009). Level sets and extrema of random processes and fields. Wiley, New York.

Azaïs, J.M., Delmas, C., Rabier, C-E. (2011). Likelihood Ratio Test process for Quantitative Trait Locus detection. Submitted to ESAIM.

Chang, M.N., Wu, R., Wu, S.S., Casella, G. (2009). Score statistics for mapping quantitative trait loci. Statistical Applications in Genetics and Molecular Biology, 8(1), 16.

Cierco, C. (1998). Asymptotic distribution of the maximum likelihood ratio test for gene detection. Statistics, 31, 261-285.

Haldane, J.B.S. (1919). The combination of linkage values and the calculation of distance between the loci of linked factors. Journal of Genetics, 8, 299-309.

Lander, E.S., Botstein, D. (1989). Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185-199.

Rabier, C-E. (2010). PhD thesis, Université Toulouse 3, Paul Sabatier.

Rebaï, A., Goffinet, B., Mangin, B. (1994). Approximate thresholds of interval mapping tests for QTL detection. Genetics, 138, 235-240.

Rebaï, A., Goffinet, B., Mangin, B. (1995). Comparing power of different methods for QTL detection. Biometrics, 51, 87-99.

Van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics.

Wu, R., Ma, C.X., Casella, G. (2007). Statistical Genetics of Quantitative Traits,
