On interpolations in QTL detection


HAL Id: hal-00658592

https://hal.archives-ouvertes.fr/hal-00658592

Preprint submitted on 10 Jan 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.



Céline Delmas, Charles-Elie Rabier

To cite this version:

Céline Delmas, Charles-Elie Rabier. On interpolations in QTL detection. 2012. ⟨hal-00658592⟩


Céline Delmas

INRA UR631, Station d’Amélioration Génétique des Animaux, Auzeville, France.

Charles-Elie Rabier

Université de Toulouse, Institut de Mathématiques de Toulouse, U.P.S., Toulouse, France.

INRA UR631, Station d’Amélioration Génétique des Animaux, Auzeville, France.

Summary. We consider the likelihood ratio test (LRT) process related to the test of the absence of QTL (a QTL denotes a quantitative trait locus, i.e. a gene with quantitative effect on a trait) on the interval $[0, T]$ representing a chromosome.

Recently, Azaïs et al. (2011) proved that the LRT process was the square of a non linear interpolated process. However, in their study of the same problem, Chang et al. (2009) introduced another interpolation. So, why do Azaïs et al. (2011) and Chang et al. (2009) find different interpolations? We correct errors present in the interpolation of Chang et al. (2009) and establish the link between the two interpolations. We finally generalize the interpolation of Chang et al. (2009) to the alternative hypothesis of a QTL located at $t^\star \in [0, T]$.

Keywords: Gaussian process, Likelihood Ratio Test, Mixture models, Nuisance parameters present only under the alternative, QTL detection.

1. Introduction

We focus on the famous "Interval Mapping" of Lander and Botstein (1989). That is to say, we address the problem of detecting a Quantitative Trait Locus, so-called QTL (a gene influencing a quantitative trait which is able to be measured), on a given chromosome. We study a backcross population $A \times (A \times B)$, where $A$ and $B$ are purely homozygous lines. The trait is observed on $n$ individuals (progenies) and we denote by $Y_j$, $j = 1, \dots, n$, the observations, which we will assume to be Gaussian, independent and identically distributed (i.i.d.). The mechanism of genetics, or more precisely of meiosis, implies that among the two chromosomes of each individual, one is purely inherited from $A$ while the other (the recombined one) consists of parts originating from $A$ and parts originating from $B$, due to crossing-overs (see for instance Wu et al. (2007)).

The chromosome will be represented by the segment $[0, T]$. The distance on $[0, T]$ is called the genetic distance; it is measured in Morgans. Two genetic markers are located at fixed locations $t_1 = 0 < t_2 = T$. The genome $X(t)$ of one individual takes the value $+1$ if, for example, the recombined chromosome originates from $A$ at location $t$, and takes the value $-1$ if it originates from $B$. We use the Haldane (1919) modeling, which can be represented as follows: $X(0)$ is a random sign and $X(t) = X(0)(-1)^{N(t)}$, where $N(\cdot)$ is a standard Poisson process on $[0, T]$. We assume an analysis of variance model for the quantitative trait:

$$Y = \mu + X(t^\star)\, q + \sigma \varepsilon \tag{1}$$


where $\varepsilon$ is a Gaussian white noise and $t^\star$ is the true location of the QTL. In fact the genome information will be available only at the marker locations and the observation will be

$$(Y, X(t_1), X(t_2)).$$

So, we observe $n$ i.i.d. observations $(Y_j, X_j(t_1), X_j(t_2))$. Calculations on the Poisson distribution show that

$$r(t, t') := P\big(X(t)X(t') = -1\big) = P\big(|N(t) - N(t')| \text{ odd}\big) = \frac{1}{2}\left(1 - e^{-2|t - t'|}\right),$$

and we set in addition

$$\bar r(t, t') = 1 - r(t, t').$$
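The recombination probability can be checked numerically. The following is a minimal Python sketch, assuming Haldane's model exactly as stated (crossovers form a unit-rate Poisson process); the function names `recomb_prob`, `non_recomb_prob`, `poisson_count` and `simulate_recomb_freq` are illustrative and do not come from the paper.

```python
import math
import random


def recomb_prob(t, t_prime):
    """Haldane recombination probability r(t, t') = (1 - exp(-2|t - t'|)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(t - t_prime)))


def non_recomb_prob(t, t_prime):
    """Complement r_bar(t, t') = 1 - r(t, t')."""
    return 1.0 - recomb_prob(t, t_prime)


def poisson_count(mean, rng):
    """Draw a Poisson(mean) count by accumulating unit-rate exponential gaps."""
    count, total = 0, rng.expovariate(1.0)
    while total <= mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_recomb_freq(t, t_prime, n_draws=20000, seed=0):
    """Empirical frequency of an odd number of crossovers between t and t'."""
    rng = random.Random(seed)
    distance = abs(t - t_prime)
    odd = sum(poisson_count(distance, rng) % 2 for _ in range(n_draws))
    return odd / n_draws
```

For two loci half a Morgan apart, `simulate_recomb_freq(0.0, 0.5)` should be close to `recomb_prob(0.0, 0.5)`, and the probability tends to 1/2 as the distance grows.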

The challenge is that the location of the QTL $t^\star$ is unknown, so we will perform a likelihood ratio test (LRT) in order to test for the presence of a QTL (i.e. to test the null hypothesis $q = 0$) at every location $t \in [0, T]$. It leads to a process $\{\Lambda_n(t), t \in [0, T]\}$ called "LRT process", and taking as test statistic the maximum of this process comes down to performing a LRT in a model where the location of the QTL is an extra parameter. A key point is that since the "genome information" is only available at marker locations, we have to deal with a mixture model when we perform a test at a location $t$ which does not correspond to a marker location. This mixture model has two Gaussian components: both components have variance $\sigma^2$, but the first component has expected value $\mu + q$ whereas the second one has expected value $\mu - q$. So, at such a location $t$, in order to obtain the weights of our mixture model for the first (resp. second) component, we have to compute the probability that $X(t) = 1$ (resp. $X(t) = -1$) given the "genome information" at markers. In particular, according to Azaïs et al. (2011), if we call $p(t) = P\{X(t) = 1 \mid X(t_1), X(t_2)\}$ (i.e. the weight for the first component), we have:

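$$p(t) = Q_t^{1,1}\,\mathbf{1}_{X(t_1)=1}\mathbf{1}_{X(t_2)=1} + Q_t^{1,-1}\,\mathbf{1}_{X(t_1)=1}\mathbf{1}_{X(t_2)=-1} + Q_t^{-1,1}\,\mathbf{1}_{X(t_1)=-1}\mathbf{1}_{X(t_2)=1} + Q_t^{-1,-1}\,\mathbf{1}_{X(t_1)=-1}\mathbf{1}_{X(t_2)=-1}$$

where:

$$Q_t^{1,1} = \frac{\bar r(t_1, t)\,\bar r(t, t_2)}{\bar r(t_1, t_2)}, \qquad Q_t^{1,-1} = \frac{\bar r(t_1, t)\, r(t, t_2)}{r(t_1, t_2)},$$

$$Q_t^{-1,1} = \frac{r(t_1, t)\,\bar r(t, t_2)}{r(t_1, t_2)}, \qquad Q_t^{-1,-1} = \frac{r(t_1, t)\, r(t, t_2)}{\bar r(t_1, t_2)}.$$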
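The four weights are straightforward to evaluate. Here is a small Python sketch, assuming $t_1 < t_2$ and $t_1 \le t \le t_2$; the helper names `r`, `r_bar`, `q_weights` and `p` are hypothetical and chosen only to mirror the notation above.

```python
import math


def r(s, t):
    """Haldane recombination probability between loci s and t."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(s - t)))


def r_bar(s, t):
    """Probability of no recombination between loci s and t."""
    return 1.0 - r(s, t)


def q_weights(t, t1, t2):
    """Map (x1, x2) to P(X(t) = 1 | X(t1) = x1, X(t2) = x2), for t1 <= t <= t2."""
    return {
        (1, 1): r_bar(t1, t) * r_bar(t, t2) / r_bar(t1, t2),
        (1, -1): r_bar(t1, t) * r(t, t2) / r(t1, t2),
        (-1, 1): r(t1, t) * r_bar(t, t2) / r(t1, t2),
        (-1, -1): r(t1, t) * r(t, t2) / r_bar(t1, t2),
    }


def p(t, x1, x2, t1, t2):
    """Mixture weight of the first component given the flanking marker genotypes."""
    return q_weights(t, t1, t2)[(x1, x2)]
```

By symmetry of the model, the weights satisfy $Q_t^{1,1} + Q_t^{-1,-1} = 1$ and $Q_t^{1,-1} + Q_t^{-1,1} = 1$, which gives a quick sanity check on any implementation.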

This problem has been studied under some approximations by Rebaï et al. (1995), Rebaï et al. (1994), Cierco (1998), Azaïs and Cierco-Ayrolles (2002), Azaïs and Wschebor (2009). Recently, Azaïs et al. (2011) have shown that the LRT process was the square of a non linear interpolated process. The aim of this present article is to establish the link between this non linear interpolation and another interpolation, also introduced recently, by Chang et al. (2009). The interpolation of Azaïs et al. (2011) is a non linear interpolation between test statistics at marker locations: it means that all the test statistics inside a marker interval can be deduced by interpolation from the test statistics at the markers [...] likelihood profile which is smooth between markers. Besides, this interpolation leads to an easy formula for computing the supremum of the LRT process (see Lemma 1 of Azaïs et al. (2011)). The interpolation of Chang et al. (2009) is less interesting from a genetic point of view: it is very difficult to interpret it graphically. However, it is always interesting to understand why the two articles Azaïs et al. (2011) and Chang et al. (2009), which study the same problem, present different results. We will correct here technical errors present in Chang et al. (2009), and establish the link between the two interpolations. Finally, we will generalize the interpolation of Chang et al. (2009) to the alternative hypothesis of a QTL located at $t^\star \in [0, T]$, since contrary to Azaïs et al. (2011), Chang et al. (2009) focused only on the null hypothesis.

We refer to the book of Van der Vaart (1998) for elements of asymptotic statistics used in proofs.

2. Two different interpolations

Let $H_0$ be the null hypothesis $q = 0$. Since the authors of Chang et al. (2009) study only the null hypothesis, we will first focus on the null hypothesis. Besides, it is well known that for a regular model, the LRT statistic is asymptotically equivalent to the square of the score test statistic, so without loss of generality, we will limit our study to score tests as in Chang et al. (2009).

2.1. Under the null hypothesis

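Let us consider a location $t$ distinct from the marker locations, that is to say $t \in\, ]t_1, t_2[$; the result will be extended by continuity at the marker locations. $S_n(t)$ will be the score test statistic at location $t$.

According to Theorem 1 and formula (5) of Azaïs et al. (2011), we have:

$$S_n(t) = \frac{\left\{Q_t^{1,1} - Q_t^{-1,1}\right\} S_n(t_1) + \left\{Q_t^{1,1} - Q_t^{1,-1}\right\} S_n(t_2)}{\sqrt{E\left[\{2p(t) - 1\}^2\right]}}, \tag{2}$$

where

$$\forall k = 1, 2 \qquad S_n(t_k) = \sum_{j=1}^{n} \frac{(y_j - \mu)\left(2\,\mathbf{1}_{X_j(t_k)=1} - 1\right)}{\sigma\sqrt{n}},$$

$$E\left[\{2p(t) - 1\}^2\right] = \left\{Q_t^{1,1} - Q_t^{-1,1}\right\}^2 + \left\{Q_t^{1,1} - Q_t^{1,-1}\right\}^2 + 2\left\{Q_t^{1,1} - Q_t^{-1,1}\right\}\left\{Q_t^{1,1} - Q_t^{1,-1}\right\} e^{-2(t_2 - t_1)}.$$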

This is the non linear interpolation of Azaïs et al. (2011), between statistics on markers. The couple $(S_n(t_1), S_n(t_2))$ follows a standard bivariate normal distribution with covariance $e^{-2(t_2 - t_1)}$. Let us now establish the link between this interpolation and the interpolation of Chang et al. (2009).
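To make formula (2) concrete, the sketch below evaluates the non linear interpolation from given values of $S_n(t_1)$ and $S_n(t_2)$. It is only an illustration under the assumptions of this section (two markers, Haldane model); the names `q_weights` and `interpolate_score` are hypothetical.

```python
import math


def r(s, t):
    """Haldane recombination probability between loci s and t."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(s - t)))


def r_bar(s, t):
    return 1.0 - r(s, t)


def q_weights(t, t1, t2):
    """Map (x1, x2) to P(X(t) = 1 | X(t1) = x1, X(t2) = x2)."""
    return {
        (1, 1): r_bar(t1, t) * r_bar(t, t2) / r_bar(t1, t2),
        (1, -1): r_bar(t1, t) * r(t, t2) / r(t1, t2),
        (-1, 1): r(t1, t) * r_bar(t, t2) / r(t1, t2),
        (-1, -1): r(t1, t) * r(t, t2) / r_bar(t1, t2),
    }


def interpolate_score(t, s1, s2, t1, t2):
    """Formula (2): S_n(t) from the marker statistics s1 = S_n(t1), s2 = S_n(t2)."""
    w = q_weights(t, t1, t2)
    alpha = w[(1, 1)] - w[(-1, 1)]   # coefficient of S_n(t1)
    beta = w[(1, 1)] - w[(1, -1)]    # coefficient of S_n(t2)
    rho = math.exp(-2.0 * (t2 - t1))  # covariance of (S_n(t1), S_n(t2))
    norm = math.sqrt(alpha ** 2 + beta ** 2 + 2.0 * alpha * beta * rho)
    return (alpha * s1 + beta * s2) / norm
```

At $t = t_1$ the coefficients reduce to $(1, 0)$ and at $t = t_2$ to $(0, 1)$, so the interpolation passes through the two marker statistics; the normalisation keeps the variance equal to one in between.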

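According to formula (5) of Azaïs et al. (2011) and using the fact that $Q_t^{1,1} = 1 - Q_t^{-1,-1}$ and $Q_t^{1,-1} = 1 - Q_t^{-1,1}$, we have

$$S_n(t) = \left(1 - 2Q_t^{-1,-1}\right) \sum_{j=1}^{n} \frac{(y_j - \mu)\left\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right\}}{\sigma\sqrt{n}\,\sqrt{E\left[\{2p(t) - 1\}^2\right]}} + \left(1 - 2Q_t^{-1,1}\right) \sum_{j=1}^{n} \frac{(y_j - \mu)\left\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}\right\}}{\sigma\sqrt{n}\,\sqrt{E\left[\{2p(t) - 1\}^2\right]}}.$$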

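Let $G_n^1$ and $G_n^2$ be the quantities such that:

$$G_n^1 = \sum_{j=1}^{n} \frac{(y_j - \mu)\left\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right\}}{\sigma\sqrt{n\,\bar r(t_1, t_2)}},$$

$$G_n^2 = \sum_{j=1}^{n} \frac{(y_j - \mu)\left\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}\right\}}{\sigma\sqrt{n\, r(t_1, t_2)}}.$$

$G_n^1$ and $G_n^2$ are asymptotically independent standard normal variables under $H_0$. Besides, it is clear that we have the following relationship between $S_n(t)$, $G_n^1$ and $G_n^2$: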

$$S_n(t) = \frac{\sqrt{\bar r(t_1, t_2)}\,\left(1 - 2Q_t^{-1,-1}\right) G_n^1 + \sqrt{r(t_1, t_2)}\,\left(1 - 2Q_t^{-1,1}\right) G_n^2}{\sqrt{E\left[\{2p(t) - 1\}^2\right]}}. \tag{3}$$

We will see later that this interpolation is the interpolation of Chang et al. (2009) but rewritten without approximations. This interpolation is difficult to describe graphically because it is an interpolation between two test statistics $G_n^1$ and $G_n^2$, which both include the genome information at the two markers. The main difference is that $G_n^1$ and $G_n^2$ are not points of the process $S_n(\cdot)$, contrary to $S_n(t_1)$ and $S_n(t_2)$ for the interpolation of Azaïs et al. (2011).
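The equivalence of formulae (2) and (3) is an exact algebraic identity, which a short simulation can confirm. The Python sketch below draws data under $H_0$ with known $\mu$ and $\sigma$ and computes $S_n(t)$ both ways; `simulate_h0` and `both_interpolations` are illustrative names, and the parameter values used in the example are arbitrary.

```python
import math
import random


def r(s, t):
    return 0.5 * (1.0 - math.exp(-2.0 * abs(s - t)))


def q_weights(t, t1, t2):
    rb = lambda u, v: 1.0 - r(u, v)
    return {
        (1, 1): rb(t1, t) * rb(t, t2) / rb(t1, t2),
        (1, -1): rb(t1, t) * r(t, t2) / r(t1, t2),
        (-1, 1): r(t1, t) * rb(t, t2) / r(t1, t2),
        (-1, -1): r(t1, t) * r(t, t2) / rb(t1, t2),
    }


def simulate_h0(n, t1, t2, mu, sigma, seed=0):
    """Draw (y, X(t1), X(t2)) for n individuals under H0 (no QTL effect)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.choice((-1, 1))
        x2 = -x1 if rng.random() < r(t1, t2) else x1
        data.append((rng.gauss(mu, sigma), x1, x2))
    return data


def both_interpolations(t, data, t1, t2, mu, sigma):
    """Return S_n(t) computed via formula (2) and via formula (3)."""
    n = len(data)
    w = q_weights(t, t1, t2)
    alpha = w[(1, 1)] - w[(-1, 1)]
    beta = w[(1, 1)] - w[(1, -1)]
    rec = r(t1, t2)
    norm = math.sqrt(alpha ** 2 + beta ** 2
                     + 2.0 * alpha * beta * math.exp(-2.0 * (t2 - t1)))
    s1 = sum((y - mu) * x1 for y, x1, _ in data) / (sigma * math.sqrt(n))
    s2 = sum((y - mu) * x2 for y, _, x2 in data) / (sigma * math.sqrt(n))
    # Non-recombinant individuals feed G^1, recombinant ones feed G^2.
    g1 = sum((y - mu) * x1 for y, x1, x2 in data if x1 == x2) \
        / (sigma * math.sqrt(n * (1.0 - rec)))
    g2 = sum((y - mu) * x1 for y, x1, x2 in data if x1 != x2) \
        / (sigma * math.sqrt(n * rec))
    via_markers = (alpha * s1 + beta * s2) / norm
    via_g = (math.sqrt(1.0 - rec) * (1.0 - 2.0 * w[(-1, -1)]) * g1
             + math.sqrt(rec) * (1.0 - 2.0 * w[(-1, 1)]) * g2) / norm
    return via_markers, via_g
```

The two returned values agree up to floating-point rounding for any data set, because $S_n(t_1) = \sqrt{\bar r(t_1,t_2)}\,G_n^1 + \sqrt{r(t_1,t_2)}\,G_n^2$ and $S_n(t_2) = \sqrt{\bar r(t_1,t_2)}\,G_n^1 - \sqrt{r(t_1,t_2)}\,G_n^2$.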

Note that if we want to obtain the score test, we just have to replace $\mu$ by $\hat\mu := \frac{1}{n}\sum_{j=1}^{n} Y_j$ and $\sigma$ by $\hat\sigma := \left\{\frac{1}{n-1}\sum_{j=1}^{n} (Y_j - \hat\mu)^2\right\}^{1/2}$ in formulae (2) and (3). These new expressions (i.e. with $\hat\sigma$ and $\hat\mu$) of $G_n^1$, $G_n^2$, $S_n(t_1)$ and $S_n(t_2)$ are asymptotically equivalent to the previous ones. We will call respectively $\hat G_n^1$ and $\hat G_n^2$ the new expressions of $G_n^1$, $G_n^2$.

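Let us now focus on the work of Chang et al. (2009). With our notations, the score test statistic of formula (8) of Chang et al. (2009) is:

$$U_n(t) = \frac{\sqrt{\bar r(t_1, t_2)}\,\left(1 - 2Q_t^{-1,-1}\right) \tilde G_n^1 + \sqrt{r(t_1, t_2)}\,\left(1 - 2Q_t^{-1,1}\right) \tilde G_n^2}{\sqrt{E\left[\{2p(t) - 1\}^2\right]}},$$

where

$$\tilde G_n^1 = \frac{\sum_{j=1}^{n} y_j\,\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \sum_{j=1}^{n} y_j\,\mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}}{\hat\sigma\sqrt{n\,\bar r(t_1, t_2)}},$$

$$\tilde G_n^2 = \frac{\sum_{j=1}^{n} y_j\,\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \sum_{j=1}^{n} y_j\,\mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}}{\hat\sigma\sqrt{n\, r(t_1, t_2)}}.$$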


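Let $o_{P_{\theta_0}}(1)$ be a sequence of random vectors that converges to zero in probability under $H_0$ (i.e. no QTL on the whole interval studied) and let $O_p(1)$ be a sequence bounded in probability. We have:

$$\hat G_n^1 = \frac{\sum_{j=1}^{n} y_j\left\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right\}}{\hat\sigma\sqrt{n\,\bar r(t_1, t_2)}} - \hat\mu\,\frac{\sum_{j=1}^{n} \left\{\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right\}}{\hat\sigma\sqrt{n\,\bar r(t_1, t_2)}}$$

$$= \tilde G_n^1 - \left\{O_p(1) + O_p(1/\sqrt{n})\right\} O_p(1) = \tilde G_n^1 + O_p(1) + o_{P_{\theta_0}}(1).$$

In the same way:

$$\hat G_n^2 = \tilde G_n^2 + O_p(1) + o_{P_{\theta_0}}(1).$$

As a consequence, since $\hat G_n^2 = G_n^2 + o_{P_{\theta_0}}(1)$ and $\hat G_n^1 = G_n^1 + o_{P_{\theta_0}}(1)$, we can remark that $S_n(t) \neq U_n(t) + o_{P_{\theta_0}}(1)$. So, the interpolation introduced by Chang et al. (2009) is only an approximation, as said before. The interpolation of Chang et al. (2009) rewritten without approximations is presented in formula (3).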
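The role of the centring term can be seen on simulated data. The sketch below, written under the assumptions of this section, computes the centred statistic $\hat G_n^1$, the uncentred statistic $\tilde G_n^1$, and the term $\hat\mu \sum_j \{\cdot\} / (\hat\sigma\sqrt{n\,\bar r(t_1,t_2)})$ that separates them; the names `simulate_markers` and `centred_vs_uncentred_g1` are hypothetical.

```python
import math
import random


def simulate_markers(n, r12, mu, sigma, seed=0):
    """Draw (y, x1, x2) under H0; x2 differs from x1 with probability r12."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.choice((-1, 1))
        x2 = -x1 if rng.random() < r12 else x1
        data.append((rng.gauss(mu, sigma), x1, x2))
    return data


def centred_vs_uncentred_g1(data, r12):
    """Return (G_hat^1, G_tilde^1, correction) with G_hat = G_tilde - correction."""
    n = len(data)
    mu_hat = sum(y for y, _, _ in data) / n
    sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y, _, _ in data) / (n - 1))
    scale = sigma_hat * math.sqrt(n * (1.0 - r12))
    # +1 for genotype (1, 1), -1 for (-1, -1), 0 for recombinants.
    signs = [x1 if x1 == x2 else 0 for _, x1, x2 in data]
    g_tilde = sum(y * s for (y, _, _), s in zip(data, signs)) / scale
    g_hat = sum((y - mu_hat) * s for (y, _, _), s in zip(data, signs)) / scale
    correction = mu_hat * sum(signs) / scale
    return g_hat, g_tilde, correction
```

The correction is a product of $\hat\mu$ (bounded in probability, not vanishing when $\mu \neq 0$) and a normalised sum of signs that is itself $O_p(1)$, which is why dropping it changes the statistic by a non negligible amount.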

2.2. Under the alternative hypothesis

Let us define the alternative hypothesis:

$H_{a t^\star}$: the QTL is located at the position $t^\star$ with effect $q = a/\sqrt{n}$ where $a \neq 0$.

In Azaïs et al. (2011), the authors show that this alternative hypothesis is contiguous to the null hypothesis. So, it makes the algebra easy under $H_{a t^\star}$. We will still have the same interpolations as under $H_0$. In particular, for the non linear interpolation, according to Azaïs et al. (2011), we still have $(S_n(t_1), S_n(t_2))$ which follows a bivariate normal distribution with covariance $e^{-2(t_2 - t_1)}$. However, $S_n(t_1)$ and $S_n(t_2)$ are not centered anymore: $E\{S_n(t_1)\} = a\,e^{-2(t^\star - t_1)}/\sigma$ and $E\{S_n(t_2)\} = a\,e^{-2(t_2 - t^\star)}/\sigma$.
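The non-centrality at the first marker can be checked by Monte Carlo. The following Python sketch simulates model (1) under the local alternative $q = a/\sqrt{n}$ and averages $S_n(t_1)$ over replicates; the function name `mean_score_at_t1` and all numerical settings are assumptions made for illustration, and the check at $t_2$ would be symmetric.

```python
import math
import random


def r(s, t):
    """Haldane recombination probability between loci s and t."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(s - t)))


def mean_score_at_t1(n, reps, a, t1, t_star, mu, sigma, seed=0):
    """Monte Carlo estimate of E{S_n(t1)} when the QTL sits at t_star."""
    rng = random.Random(seed)
    q = a / math.sqrt(n)  # local alternative: effect shrinks like 1/sqrt(n)
    total = 0.0
    for _rep in range(reps):
        s1 = 0.0
        for _ind in range(n):
            x1 = rng.choice((-1, 1))
            # Genotype at the QTL: recombines with the marker w.p. r(t1, t_star).
            x_star = -x1 if rng.random() < r(t1, t_star) else x1
            y = mu + x_star * q + sigma * rng.gauss(0.0, 1.0)
            s1 += (y - mu) * x1
        total += s1 / (sigma * math.sqrt(n))
    return total / reps
```

With $a = 2$, $\sigma = 1$, $t_1 = 0$ and $t^\star = 0.3$, the estimate should be close to $2e^{-0.6}$, up to Monte Carlo error of order $1/\sqrt{\text{reps}}$.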

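Let us focus now on the interpolation of Chang et al. (2009). After some calculations and using the fact that $Q_t^{1,1} = 1 - Q_t^{-1,-1}$ and $Q_t^{1,-1} = 1 - Q_t^{-1,1}$, we obtain

$$E\left(G_n^1\right) = a\sqrt{\bar r(t_1, t_2)}\,\left(2Q_{t^\star}^{1,1} - 1\right)/\sigma, \qquad E\left(G_n^2\right) = a\sqrt{r(t_1, t_2)}\,\left(2Q_{t^\star}^{1,-1} - 1\right)/\sigma.$$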

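Besides,

$$\mathrm{Cov}\left(G_n^1, G_n^2\right) = E\left(G_n^1 G_n^2\right) - E\left(G_n^1\right) E\left(G_n^2\right) = 0 - E\left(G_n^1\right) E\left(G_n^2\right)$$

$$= -a^2 \sqrt{\bar r(t_1, t_2)}\,\sqrt{r(t_1, t_2)}\,\left(2Q_{t^\star}^{1,-1} - 1\right)\left(2Q_{t^\star}^{1,1} - 1\right)/\sigma^2.$$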

So, under the alternative, $G_n^1$ and $G_n^2$ will still be asymptotically normal with unit variance. However, $G_n^1$ and $G_n^2$ are not independent anymore (contrary to the situation under the null hypothesis).

Note that here, we limited our study to only two genetic markers located on the chromosome, but it can easily be generalized to several markers.
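The expectations of $G_n^1$ and $G_n^2$ given in this section can be cross-checked against those of $S_n(t_1)$ and $S_n(t_2)$, since by definition $S_n(t_1) = \sqrt{\bar r(t_1,t_2)}\,G_n^1 + \sqrt{r(t_1,t_2)}\,G_n^2$ and $S_n(t_2) = \sqrt{\bar r(t_1,t_2)}\,G_n^1 - \sqrt{r(t_1,t_2)}\,G_n^2$. The Python sketch below performs this deterministic check; `expected_g` is a hypothetical helper name and the numerical values are arbitrary.

```python
import math


def r(s, t):
    return 0.5 * (1.0 - math.exp(-2.0 * abs(s - t)))


def r_bar(s, t):
    return 1.0 - r(s, t)


def expected_g(a, sigma, t_star, t1, t2):
    """Means of G_n^1 and G_n^2 under the alternative with QTL at t_star."""
    q11 = r_bar(t1, t_star) * r_bar(t_star, t2) / r_bar(t1, t2)   # Q^{1,1} at t_star
    q1m1 = r_bar(t1, t_star) * r(t_star, t2) / r(t1, t2)          # Q^{1,-1} at t_star
    e_g1 = a * math.sqrt(r_bar(t1, t2)) * (2.0 * q11 - 1.0) / sigma
    e_g2 = a * math.sqrt(r(t1, t2)) * (2.0 * q1m1 - 1.0) / sigma
    return e_g1, e_g2


def implied_marker_means(a, sigma, t_star, t1, t2):
    """E{S_n(t1)} and E{S_n(t2)} recovered from the means of G_n^1 and G_n^2."""
    e_g1, e_g2 = expected_g(a, sigma, t_star, t1, t2)
    part1 = math.sqrt(r_bar(t1, t2)) * e_g1
    part2 = math.sqrt(r(t1, t2)) * e_g2
    return part1 + part2, part1 - part2
```

For any admissible inputs, the two implied means coincide with $a\,e^{-2(t^\star - t_1)}/\sigma$ and $a\,e^{-2(t_2 - t^\star)}/\sigma$, so the two descriptions of the alternative are consistent.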


3. Acknowledgements

The authors thank Jean-Marc Azaïs and Jean-Michel Elsen for fruitful discussions. This work has been supported by the Animal Genetic Department of the French National Institute for Agricultural Research, SABRE, and the National Center for Scientific Research.

Céline Delmas (celine.delmas@toulouse.inra.fr)
INRA UR631, Station d'Amélioration Génétique des Animaux,
BP 52627-31326 Castanet-Tolosan Cedex, France.

Charles-Elie Rabier (rabier@stat.wisc.edu)
Université de Toulouse, Institut de Mathématiques de Toulouse, U.P.S.,
F-31062 Toulouse Cedex 9, France.
INRA UR631, Station d'Amélioration Génétique des Animaux,
BP 52627-31326 Castanet-Tolosan Cedex, France.

References

Azaïs, J.M. and Cierco-Ayrolles, C. (2002). An asymptotic test for quantitative gene detection. Ann. I. H. Poincaré, 38, 6, 1087-1092.

Azaïs, J.M. and Wschebor, M. (2009). Level sets and extrema of random processes and fields. Wiley, New-York.

Azaïs, J.M., Delmas, C., Rabier, C-E. (2011). Likelihood Ratio Test process for Quantitative Trait Locus detection. Submitted to ESAIM.

Chang, M.N., Wu, R., Wu, S.S., Casella, G. (2009). Score statistics for mapping quantitative trait loci. Statistical Application in Genetics and Molecular Biology, 8(1), 16.

Cierco, C. (1998). Asymptotic distribution of the maximum likelihood ratio test for gene detection. Statistics, 31, 261-285.

Haldane, J.B.S. (1919). The combination of linkage values and the calculation of distance between the loci of linked factors. Journal of Genetics, 8, 299-309.

Lander, E.S., Botstein, D. (1989). Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185-199.

Rabier, C-E. (2010). PhD thesis, Université Toulouse 3, Paul Sabatier.

Rebaï, A., Goffinet, B., Mangin, B. (1994). Approximate thresholds of interval mapping tests for QTL detection. Genetics, 138, 235-240.

Rebaï, A., Goffinet, B., Mangin, B. (1995). Comparing power of different methods for QTL detection. Biometrics, 51, 87-99.

Van der Vaart, A.W. (1998). Asymptotic statistics, Cambridge Series in Statistical and Probabilistic Mathematics.

Wu, R., Ma, C.X., Casella, G. (2007). Statistical Genetics of Quantitative Traits,