HAL Id: hal-00658592
https://hal.archives-ouvertes.fr/hal-00658592
Preprint submitted on 10 Jan 2012
On interpolations in QTL detection
Céline Delmas, Charles-Elie Rabier
To cite this version:
Céline Delmas, Charles-Elie Rabier. On interpolations in QTL detection. 2012. ⟨hal-00658592⟩
Céline Delmas
INRA UR631, Station d’Amélioration Génétique des Animaux, Auzeville, France.
Charles-Elie Rabier
Université de Toulouse, Institut de Mathématiques de Toulouse, U.P.S., Toulouse, France.
INRA UR631, Station d’Amélioration Génétique des Animaux, Auzeville, France.
Summary. We consider the likelihood ratio test (LRT) process related to the test of the absence of QTL (a QTL denotes a quantitative trait locus, i.e. a gene with quantitative effect on a trait) on the interval $[0,T]$ representing a chromosome.
Recently, Azaïs et al. (2011) proved that the LRT process is the square of a non linear interpolated process. However, in their study of the same problem, Chang et al. (2009) introduced another interpolation. So, why do Azaïs et al. (2011) and Chang et al. (2009) find different interpolations? We correct errors present in the interpolation of Chang et al. (2009) and establish the link between the two interpolations. We finally generalize the interpolation of Chang et al. (2009) to the alternative hypothesis of a QTL located at $t^\star \in [0,T]$.
Keywords: Gaussian process, Likelihood Ratio Test, Mixture models, Nuisance parameters present only under the alternative, QTL detection.
1. Introduction
We focus on the famous "Interval Mapping" of Lander and Botstein (1989). That is to say, we address the problem of detecting a Quantitative Trait Locus, so-called QTL (a gene influencing a measurable quantitative trait), on a given chromosome. We study a backcross population $A \times (A \times B)$, where $A$ and $B$ are purely homozygous lines. The trait is observed on $n$ individuals (progenies), and we denote by $Y_j$, $j = 1, \dots, n$, the observations, which we assume to be Gaussian, independent and identically distributed (i.i.d.). The mechanism of genetics, or more precisely of meiosis, implies that, among the two chromosomes of each individual, one is purely inherited from $A$ while the other (the recombined one) consists of parts originating from $A$ and parts originating from $B$, due to crossing-overs (see for instance Wu et al. (2007)).
The chromosome is represented by the segment $[0,T]$. The distance on $[0,T]$ is called the genetic distance; it is measured in Morgans. Two genetic markers are located at fixed locations $t_1 = 0 < t_2 = T$. The genome $X(t)$ of one individual takes the value $+1$ if, for example, the recombined chromosome originates from $A$ at location $t$, and takes the value $-1$ if it originates from $B$. We use the Haldane (1919) modeling, which can be represented as follows: $X(0)$ is a random sign and $X(t) = X(0)(-1)^{N(t)}$, where $N(\cdot)$ is a standard Poisson process on $[0,T]$. We assume an analysis of variance model for the quantitative trait:
$$Y = \mu + X(t^\star)\,q + \sigma\varepsilon \qquad (1)$$
where $\varepsilon$ is a Gaussian white noise and $t^\star$ is the true location of the QTL.
In fact, the genome information is available only at the marker locations, and the observation is
$$(Y, X(t_1), X(t_2)).$$
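To make the setting concrete, here is a minimal simulation sketch of one backcross data set, assuming Python with NumPy. The function `simulate_backcross` and its parameter names are hypothetical illustrations, not code from the paper. Crossover counts on $[t_1, t^\star]$ and $[t^\star, t_2]$ are drawn as independent Poisson variables with mean equal to the genetic length, which is how we assume the unit-rate Poisson process of the Haldane model is simulated here.

```python
import numpy as np


def simulate_backcross(n, t_star, q, mu=0.0, sigma=1.0, t1=0.0, t2=1.0, rng=None):
    """Simulate n i.i.d. triples (Y_j, X_j(t1), X_j(t2)) for a backcross.

    Genome: Haldane model X(t) = X(t1) * (-1)**N(t), N a unit-rate Poisson
    process (distances in Morgans). Trait: model (1), Y = mu + X(t*) q + sigma*eps.
    """
    rng = np.random.default_rng() if rng is None else rng
    # X(t1) is a random sign: +1 (segment from A) or -1 (segment from B).
    x1 = rng.choice(np.array([-1, 1]), size=n)
    # Numbers of crossovers on [t1, t_star] and [t_star, t2]: independent Poisson counts.
    n_left = rng.poisson(t_star - t1, size=n)
    n_right = rng.poisson(t2 - t_star, size=n)
    x_star = x1 * (-1) ** n_left    # genotype at the QTL (never observed)
    x2 = x_star * (-1) ** n_right   # genotype at the second marker
    y = mu + q * x_star + sigma * rng.standard_normal(n)
    # Only the trait and the marker genotypes are returned, as in the text.
    return y, x1, x2
```

For instance, `simulate_backcross(500, t_star=0.4, q=0.3, rng=np.random.default_rng(0))` returns the trait values and the two marker genotypes of 500 progenies; the QTL genotype itself is discarded.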
So, we observe $n$ i.i.d. observations $(Y_j, X_j(t_1), X_j(t_2))$. Calculations on the Poisson distribution show that
$$r(t,t') := P(X(t)X(t') = -1) = P(|N(t) - N(t')| \text{ odd}) = \frac{1}{2}\left(1 - e^{-2|t-t'|}\right);$$
we set in addition
$$\bar r(t,t') = 1 - r(t,t').$$
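The closed form of $r$ comes from the fact that, for a Poisson variable with mean $\lambda$, $P(\text{odd}) = e^{-\lambda}\sinh\lambda = (1 - e^{-2\lambda})/2$, applied with $\lambda = |t - t'|$. The sketch below (Python with NumPy; the function names are our own) compares this closed form with a Monte Carlo estimate of the parity probability.

```python
import numpy as np


def r(t, tp):
    """Haldane recombination fraction r(t, t') = (1 - exp(-2|t - t'|)) / 2."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def r_bar(t, tp):
    """Complementary probability P(X(t) X(t') = +1)."""
    return 1.0 - r(t, tp)


def r_monte_carlo(t, tp, size=200_000, seed=0):
    """Monte Carlo estimate of P(|N(t) - N(t')| odd) for a unit-rate Poisson process."""
    rng = np.random.default_rng(seed)
    increments = rng.poisson(abs(t - tp), size=size)  # N(t) - N(t') ~ Poisson(|t - t'|)
    return float(np.mean(increments % 2 == 1))
```

Note that $r$ increases from $0$ to $1/2$: markers far apart behave as if unlinked.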
The challenge is that the location $t^\star$ of the QTL is unknown, so we perform a likelihood ratio test (LRT) of the presence of a QTL (i.e. of the hypothesis $q = 0$) at every location $t \in [0,T]$. This leads to a process $\{\Lambda_n(t), t \in [0,T]\}$ called the "LRT process", and taking as test statistic the maximum of this process amounts to performing an LRT in a model where the location of the QTL is an extra parameter. A key point is that, since the "genome information" is only available at marker locations, we have to deal with a mixture model when we perform a test at a location $t$ which does not correspond to a marker location. This mixture model has two Gaussian components: both components have variance $\sigma^2$, but the first component has expected value $\mu + q$ whereas the second one has expected value $\mu - q$. So, at such a location $t$, in order to obtain the weight of the first (resp. second) component of the mixture, we have to compute the probability that $X(t) = 1$ (resp. $X(t) = -1$) given the "genome information" at the markers. In particular, according to Azaïs et al. (2011), if we call $p(t) = P\{X(t) = 1 \mid X(t_1), X(t_2)\}$ (i.e. the weight of the first component), we have:
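$$p(t) = Q^{1,1}_t\,\mathbf{1}_{X(t_1)=1}\mathbf{1}_{X(t_2)=1} + Q^{1,-1}_t\,\mathbf{1}_{X(t_1)=1}\mathbf{1}_{X(t_2)=-1} + Q^{-1,1}_t\,\mathbf{1}_{X(t_1)=-1}\mathbf{1}_{X(t_2)=1} + Q^{-1,-1}_t\,\mathbf{1}_{X(t_1)=-1}\mathbf{1}_{X(t_2)=-1},$$
where:
$$Q^{1,1}_t = \frac{\bar r(t_1,t)\,\bar r(t,t_2)}{\bar r(t_1,t_2)}, \qquad Q^{1,-1}_t = \frac{\bar r(t_1,t)\,r(t,t_2)}{r(t_1,t_2)},$$
$$Q^{-1,1}_t = \frac{r(t_1,t)\,\bar r(t,t_2)}{r(t_1,t_2)}, \qquad Q^{-1,-1}_t = \frac{r(t_1,t)\,r(t,t_2)}{\bar r(t_1,t_2)}.$$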
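The conditional probabilities $Q^{a,b}_t$ are Bayes' formula applied to the Markov chain $X(t_1) \to X(t) \to X(t_2)$, in which the sign flips with probability $r$ between two loci. The following sketch, assuming Python with NumPy and hypothetical function names, checks the closed forms above against a brute-force enumeration of the eight configurations of $(X(t_1), X(t), X(t_2))$, together with the symmetry relations $Q^{1,1}_t = 1 - Q^{-1,-1}_t$ and $Q^{1,-1}_t = 1 - Q^{-1,1}_t$.

```python
import itertools

import numpy as np


def r(t, tp):
    """Haldane recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def q_closed_form(t, t1, t2):
    """Q_t^{a,b} = P(X(t) = 1 | X(t1) = a, X(t2) = b), closed forms of the text."""
    r1, r2, r12 = r(t1, t), r(t, t2), r(t1, t2)
    return {
        (1, 1): (1 - r1) * (1 - r2) / (1 - r12),
        (1, -1): (1 - r1) * r2 / r12,
        (-1, 1): r1 * (1 - r2) / r12,
        (-1, -1): r1 * r2 / (1 - r12),
    }


def q_by_enumeration(t, t1, t2):
    """Same probabilities from the joint law of (X(t1), X(t), X(t2)).

    Markov property of the Haldane model: X(t1) is a fair random sign, and the
    sign flips between two loci with probability r of their distance.
    """
    def step(x_from, x_to, d_from, d_to):
        return r(d_from, d_to) if x_to != x_from else 1 - r(d_from, d_to)

    joint = {}
    for a, x, b in itertools.product((1, -1), repeat=3):
        joint[(a, x, b)] = 0.5 * step(a, x, t1, t) * step(x, b, t, t2)
    return {
        (a, b): joint[(a, 1, b)] / (joint[(a, 1, b)] + joint[(a, -1, b)])
        for a, b in itertools.product((1, -1), repeat=2)
    }
```

The enumeration never uses the closed forms, so agreement between the two functions is a genuine check of the formulas for $Q^{a,b}_t$.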
This problem has been studied under some approximations by Rebaï et al. (1995), Rebaï et al. (1994), Cierco (1998), Azaïs and Cierco-Ayrolles (2002), and Azaïs and Wschebor (2009). Recently, Azaïs et al. (2011) have shown that the LRT process is the square of a non linear interpolated process. The aim of the present article is to establish the link between this non linear interpolation and another interpolation, also introduced recently by Chang et al. (2009).
The interpolation of Azaïs et al. (2011) is a non linear interpolation between test statistics at marker locations: all the test statistics inside a marker interval can be deduced by interpolation from the test statistics at the markers, which yields a likelihood profile that is smooth between markers. Besides, this interpolation leads to an easy formula for computing the supremum of the LRT process (see Lemma 1 of Azaïs et al. (2011)). The interpolation of Chang et al. (2009) is less interesting from a genetic point of view: it is very difficult to interpret graphically. However, it is worth understanding why the two articles Azaïs et al. (2011) and Chang et al. (2009), which study the same problem, present different results. We correct here technical errors present in Chang et al. (2009) and establish the link between the two interpolations. Finally, we generalize the interpolation of Chang et al. (2009) to the alternative hypothesis of a QTL located at $t^\star \in [0,T]$ since, contrary to Azaïs et al. (2011), Chang et al. (2009) focused only on the null hypothesis.
We refer to the book of Van der Vaart (1998) for the elements of asymptotic statistics used in the proofs.
2. Two different interpolations
Let $H_0$ be the null hypothesis $q = 0$. Since Chang et al. (2009) study only the null hypothesis, we first focus on $H_0$. Besides, it is well known that, for a regular model, the LRT statistic is asymptotically equivalent to the square of the score statistic, so, without loss of generality, we limit our study to score tests, as in Chang et al. (2009).
2.1. Under the null hypothesis
Let us consider a location $t$ distinct from the marker locations, that is to say $t \in\, ]t_1, t_2[$; the result will be extended by continuity to the marker locations.
$S_n(t)$ denotes the score test statistic at location $t$.
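According to Theorem 1 and formula (5) of Azaïs et al. (2011), we have:
$$S_n(t) = \frac{\{Q^{1,1}_t - Q^{-1,1}_t\}\,S_n(t_1) + \{Q^{1,1}_t - Q^{1,-1}_t\}\,S_n(t_2)}{\sqrt{E\left[\{2p(t)-1\}^2\right]}}, \qquad (2)$$
where, for $k = 1, 2$,
$$S_n(t_k) = \sum_{j=1}^n \frac{(y_j - \mu)\,(2\,\mathbf{1}_{X_j(t_k)=1} - 1)}{\sigma\sqrt{n}},$$
and
$$E\left[\{2p(t)-1\}^2\right] = \{Q^{1,1}_t - Q^{-1,1}_t\}^2 + \{Q^{1,1}_t - Q^{1,-1}_t\}^2 + 2\,\{Q^{1,1}_t - Q^{-1,1}_t\}\{Q^{1,1}_t - Q^{1,-1}_t\}\,e^{-2(t_2-t_1)}.$$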
This is the non linear interpolation of Azaïs et al. (2011) between the statistics at the markers. Under $H_0$, the couple $(S_n(t_1), S_n(t_2))$ asymptotically follows a standard bivariate normal distribution with covariance $e^{-2(t_2-t_1)}$. Let us now establish the link between this interpolation and the interpolation of Chang et al. (2009).
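Formula (2) is exact, not only asymptotic, because $2p(t) - 1$ is linear in the marker genotypes: checking the four configurations gives $2p(t) - 1 = \{Q^{1,1}_t - Q^{-1,1}_t\}X(t_1) + \{Q^{1,1}_t - Q^{1,-1}_t\}X(t_2)$. The sketch below illustrates this numerically. It assumes Python with NumPy, treats $\mu$ and $\sigma$ as known, and assumes that the score at location $t$ is the usual normalized score of the mixture at $q = 0$, namely $\sum_j (y_j - \mu)\{2p_j(t) - 1\}/(\sigma\sqrt{n\,E[\{2p(t)-1\}^2]})$; the function names are hypothetical.

```python
import numpy as np


def r(t, tp):
    """Haldane recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def q_table(t, t1, t2):
    """Q_t^{a,b} = P(X(t) = 1 | X(t1) = a, X(t2) = b)."""
    r1, r2, r12 = r(t1, t), r(t, t2), r(t1, t2)
    return {(1, 1): (1 - r1) * (1 - r2) / (1 - r12),
            (1, -1): (1 - r1) * r2 / r12,
            (-1, 1): r1 * (1 - r2) / r12,
            (-1, -1): r1 * r2 / (1 - r12)}


def interpolation_weights(t, t1, t2):
    """Coefficients of S_n(t1), S_n(t2) in (2), and E[{2p(t)-1}^2]."""
    Q = q_table(t, t1, t2)
    alpha = Q[(1, 1)] - Q[(-1, 1)]
    beta = Q[(1, 1)] - Q[(1, -1)]
    e = alpha**2 + beta**2 + 2 * alpha * beta * np.exp(-2 * (t2 - t1))
    return alpha, beta, e


def score_direct(y, x1, x2, t, t1, t2, mu, sigma):
    """Normalized score at q = 0, computed from the mixture weights p_j(t)."""
    Q = q_table(t, t1, t2)
    p = np.array([Q[(int(a), int(b))] for a, b in zip(x1, x2)])
    _, _, e = interpolation_weights(t, t1, t2)
    return np.sum((y - mu) * (2 * p - 1)) / (sigma * np.sqrt(len(y) * e))


def score_interpolated(y, x1, x2, t, t1, t2, mu, sigma):
    """Formula (2): interpolation between the marker statistics S_n(t1), S_n(t2)."""
    n = len(y)
    # 2 * 1{X(tk) = 1} - 1 is simply X(tk) for genotypes coded +1 / -1.
    s1 = np.sum((y - mu) * x1) / (sigma * np.sqrt(n))
    s2 = np.sum((y - mu) * x2) / (sigma * np.sqrt(n))
    alpha, beta, e = interpolation_weights(t, t1, t2)
    return (alpha * s1 + beta * s2) / np.sqrt(e)
```

Both functions should agree up to rounding for any data set and any $t \in\, ]t_1, t_2[$; only the marker statistics are needed in the second one.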
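According to formula (5) of Azaïs et al. (2011), and using the facts that $Q^{1,1}_t = 1 - Q^{-1,-1}_t$ and $Q^{1,-1}_t = 1 - Q^{-1,1}_t$, we have
$$S_n(t) = (1 - 2Q^{-1,-1}_t)\sum_{j=1}^n \frac{(y_j - \mu)\left(\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right)}{\sigma\sqrt{n}\,\sqrt{E\left[\{2p(t)-1\}^2\right]}} + (1 - 2Q^{-1,1}_t)\sum_{j=1}^n \frac{(y_j - \mu)\left(\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}\right)}{\sigma\sqrt{n}\,\sqrt{E\left[\{2p(t)-1\}^2\right]}}.$$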
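Let $G_{1n}$ and $G_{2n}$ be the quantities
$$G_{1n} = \sum_{j=1}^n \frac{(y_j - \mu)\left(\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right)}{\sigma\sqrt{n\,\bar r(t_1,t_2)}},$$
$$G_{2n} = \sum_{j=1}^n \frac{(y_j - \mu)\left(\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}\right)}{\sigma\sqrt{n\,r(t_1,t_2)}}.$$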
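$G_{1n}$ and $G_{2n}$ are asymptotically independent standard normal variables under $H_0$. Indeed, for each individual the two indicator differences are never simultaneously non-zero, so the summands of $G_{1n}$ and $G_{2n}$ have a zero product; both being centred under $H_0$, the two statistics are uncorrelated, and the central limit theorem gives their joint asymptotic normality. Besides, we have the following relationship between $S_n(t)$, $G_{1n}$ and $G_{2n}$:
$$S_n(t) = \frac{\sqrt{\bar r(t_1,t_2)}\,(1 - 2Q^{-1,-1}_t)\,G_{1n} + \sqrt{r(t_1,t_2)}\,(1 - 2Q^{-1,1}_t)\,G_{2n}}{\sqrt{E\left[\{2p(t)-1\}^2\right]}}. \qquad (3)$$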
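A small numerical sketch of formula (3), under the same kind of assumptions as before (Python with NumPy, $\mu$ and $\sigma$ known, hypothetical function names): on a simulated data set, (3) and (2) should return the same value, and over repeated data sets simulated under $H_0$, $G_{1n}$ and $G_{2n}$ should have variance close to one and correlation close to zero.

```python
import numpy as np


def r(t, tp):
    """Haldane recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def weights(t, t1, t2):
    """Q_t^{a,b} table, the coefficients of (2), and E[{2p(t)-1}^2]."""
    r1, r2, r12 = r(t1, t), r(t, t2), r(t1, t2)
    Q = {(1, 1): (1 - r1) * (1 - r2) / (1 - r12),
         (1, -1): (1 - r1) * r2 / r12,
         (-1, 1): r1 * (1 - r2) / r12,
         (-1, -1): r1 * r2 / (1 - r12)}
    alpha = Q[(1, 1)] - Q[(-1, 1)]
    beta = Q[(1, 1)] - Q[(1, -1)]
    e = alpha**2 + beta**2 + 2 * alpha * beta * np.exp(-2 * (t2 - t1))
    return Q, alpha, beta, e


def g_statistics(y, x1, x2, mu, sigma, t1, t2):
    """G_{1n} (concordant marker genotypes) and G_{2n} (discordant ones)."""
    n, r12 = len(y), r(t1, t2)
    d = ((x1 == 1) & (x2 == 1)).astype(float) - ((x1 == -1) & (x2 == -1)).astype(float)
    f = ((x1 == 1) & (x2 == -1)).astype(float) - ((x1 == -1) & (x2 == 1)).astype(float)
    g1 = np.sum((y - mu) * d) / (sigma * np.sqrt(n * (1 - r12)))
    g2 = np.sum((y - mu) * f) / (sigma * np.sqrt(n * r12))
    return g1, g2


def score_from_g(g1, g2, t, t1, t2):
    """Formula (3): S_n(t) as a combination of G_{1n} and G_{2n}."""
    Q, _, _, e = weights(t, t1, t2)
    r12 = r(t1, t2)
    num = (np.sqrt(1 - r12) * (1 - 2 * Q[(-1, -1)]) * g1
           + np.sqrt(r12) * (1 - 2 * Q[(-1, 1)]) * g2)
    return num / np.sqrt(e)


def score_from_markers(y, x1, x2, mu, sigma, t, t1, t2):
    """Formula (2): S_n(t) as a combination of S_n(t1) and S_n(t2)."""
    n = len(y)
    _, alpha, beta, e = weights(t, t1, t2)
    s1 = np.sum((y - mu) * x1) / (sigma * np.sqrt(n))
    s2 = np.sum((y - mu) * x2) / (sigma * np.sqrt(n))
    return (alpha * s1 + beta * s2) / np.sqrt(e)
```

The agreement is exact because $1 - 2Q^{-1,-1}_t$ and $1 - 2Q^{-1,1}_t$ are the sum and the difference of the two coefficients of (2), while the two indicator differences add up to $X(t_1)$ and subtract to $X(t_2)$.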
We will see later that this interpolation is the interpolation of Chang et al. (2009), but rewritten without approximations. This interpolation is difficult to describe graphically because it is an interpolation between two test statistics, $G_{1n}$ and $G_{2n}$, which both include the genome information at the two markers. The main difference is that $G_{1n}$ and $G_{2n}$ are not points of the process $S_n(\cdot)$, contrary to $S_n(t_1)$ and $S_n(t_2)$ in the interpolation of Azaïs et al. (2011).
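Note that if we want to obtain the score test, we just have to replace $\mu$ by $\hat\mu := \frac{1}{n}\sum_{j=1}^n Y_j$ and $\sigma$ by $\hat\sigma := \left\{\frac{1}{n-1}\sum_{j=1}^n (Y_j - \hat\mu)^2\right\}^{1/2}$ in formulae (2) and (3). These new expressions (i.e. with $\hat\sigma$ and $\hat\mu$) of $G_{1n}$, $G_{2n}$, $S_n(t_1)$ and $S_n(t_2)$ are asymptotically equivalent to the previous ones. We call $\hat G_{1n}$ and $\hat G_{2n}$, respectively, the new expressions of $G_{1n}$ and $G_{2n}$.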
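Let us now focus on the work of Chang et al. (2009). With our notations, the score test statistic of formula (8) of Chang et al. (2009) is:
$$U_n(t) = \frac{\sqrt{\bar r(t_1,t_2)}\,(1 - 2Q^{-1,-1}_t)\,\tilde G_{1n} + \sqrt{r(t_1,t_2)}\,(1 - 2Q^{-1,1}_t)\,\tilde G_{2n}}{\sqrt{E\left[\{2p(t)-1\}^2\right]}},$$
where
$$\tilde G_{1n} = \frac{\sum_{j=1}^n y_j\,\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \sum_{j=1}^n y_j\,\mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}}{\hat\sigma\sqrt{n\,\bar r(t_1,t_2)}},$$
$$\tilde G_{2n} = \frac{\sum_{j=1}^n y_j\,\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=-1} - \sum_{j=1}^n y_j\,\mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=1}}{\hat\sigma\sqrt{n\,\bar r(t_1,t_2)}}.$$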
Let $o_{P_{\theta_0}}(1)$ be a sequence of random vectors that converges to zero in probability under $H_0$ (i.e. no QTL on the whole interval studied), and let $O_p(1)$ be a sequence bounded in probability. We have:
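$$\hat G_{1n} = \frac{\sum_{j=1}^n y_j\left(\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right)}{\hat\sigma\sqrt{n\,\bar r(t_1,t_2)}} - \hat\mu\,\frac{\sum_{j=1}^n \left(\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1}\right)}{\hat\sigma\sqrt{n\,\bar r(t_1,t_2)}}$$
$$= \tilde G_{1n} - \left\{O_p(1) + O_p(1/\sqrt{n})\right\}O_p(1) = \tilde G_{1n} + O_p(1) + o_{P_{\theta_0}}(1).$$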
In the same way,
$$\hat G_{2n} = \tilde G_{2n} + O_p(1) + o_{P_{\theta_0}}(1).$$
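As a consequence, since $\hat G_{2n} = G_{2n} + o_{P_{\theta_0}}(1)$ and $\hat G_{1n} = G_{1n} + o_{P_{\theta_0}}(1)$, we can remark that $S_n(t) \neq U_n(t) + o_{P_{\theta_0}}(1)$: the $O_p(1)$ term above is $-\hat\mu/\hat\sigma$ times a centred normalized sum that converges to a standard normal variable under $H_0$, so it does not vanish unless $\mu = 0$. So, the interpolation introduced by Chang et al. (2009) is only an approximation, as said before. The interpolation of Chang et al. (2009) rewritten without approximations is given by formula (3).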
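The non-vanishing $O_p(1)$ term can be seen in simulation. In the sketch below (Python with NumPy; the function names and the default sample sizes are our own choices), data are simulated under $H_0$ and $\hat G_{1n}$ is compared with $\tilde G_{1n}$. Their difference is exactly $\hat\mu/\hat\sigma$ times the normalized centred sum $\sum_j(\mathbf{1}_{X_j(t_1)=1}\mathbf{1}_{X_j(t_2)=1} - \mathbf{1}_{X_j(t_1)=-1}\mathbf{1}_{X_j(t_2)=-1})/\sqrt{n\,\bar r(t_1,t_2)}$, so when $\mu \neq 0$ the average gap stays of order $|\mu|/\sigma$ instead of shrinking with $n$.

```python
import numpy as np


def r(t, tp):
    """Haldane recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def g1_hat_and_tilde(y, x1, x2, t1, t2):
    """Return (hat G_{1n}, tilde G_{1n}), both normalized with sigma_hat."""
    n = len(y)
    mu_hat = y.mean()
    sigma_hat = y.std(ddof=1)  # {1/(n-1) * sum (Y_j - mu_hat)^2}^{1/2}
    d = ((x1 == 1) & (x2 == 1)).astype(float) - ((x1 == -1) & (x2 == -1)).astype(float)
    scale = sigma_hat * np.sqrt(n * (1 - r(t1, t2)))
    g1_hat = np.sum((y - mu_hat) * d) / scale  # centred, as in formula (3)
    g1_tilde = np.sum(y * d) / scale           # uncentred, as in U_n(t)
    return g1_hat, g1_tilde


def mean_abs_gap(mu, n=1000, reps=200, t1=0.0, t2=1.0, seed=0):
    """Average |tilde G_{1n} - hat G_{1n}| over data sets simulated under H0."""
    rng = np.random.default_rng(seed)
    gaps = np.empty(reps)
    for i in range(reps):
        x1 = rng.choice(np.array([-1, 1]), size=n)
        x2 = np.where(rng.random(n) < r(t1, t2), -x1, x1)
        y = mu + rng.standard_normal(n)  # no QTL: the trait ignores the genome
        g_hat, g_tilde = g1_hat_and_tilde(y, x1, x2, t1, t2)
        gaps[i] = abs(g_tilde - g_hat)
    return float(gaps.mean())
```

With the default settings, `mean_abs_gap(0.0)` should be small, while `mean_abs_gap(10.0)` should be of the order of $10\sqrt{2/\pi} \approx 8$.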
2.2. Under the alternative hypothesis
Let us define the alternative hypothesis
$$H_a^{t^\star}: \text{the QTL is located at position } t^\star \text{ with effect } q = a/\sqrt{n}, \text{ where } a \neq 0.$$
In Azaïs et al. (2011), the authors show that this alternative hypothesis is contiguous to the null hypothesis, which makes the algebra easy under $H_a^{t^\star}$: we still have the same interpolations as under $H_0$. In particular, for the non linear interpolation, according to Azaïs et al. (2011), $(S_n(t_1), S_n(t_2))$ still asymptotically follows a bivariate normal distribution with unit variances and covariance $e^{-2(t_2-t_1)}$. However, $S_n(t_1)$ and $S_n(t_2)$ are not centred anymore: $E\{S_n(t_1)\} = a\,e^{-2(t^\star-t_1)}/\sigma$ and $E\{S_n(t_2)\} = a\,e^{-2(t_2-t^\star)}/\sigma$.
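A Monte Carlo sketch of these means, assuming Python with NumPy, $\mu = 0$, $\sigma$ known and illustrative values $a = 2$, $t^\star = 0.4$, $[t_1, t_2] = [0, 1]$ (these values and the function names are our own choices, not the paper's): the empirical means of $S_n(t_1)$ and $S_n(t_2)$ should be close to $a\,e^{-2(t^\star - t_1)}/\sigma$ and $a\,e^{-2(t_2 - t^\star)}/\sigma$, and their correlation should stay close to $e^{-2(t_2-t_1)}$. Flipping the sign with probability $r$ between successive loci is used here as an equivalent way of simulating the parity of the Poisson crossover counts.

```python
import numpy as np


def r(t, tp):
    """Haldane recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def simulate_local_alternative(n, a, t_star, sigma, t1, t2, rng):
    """Data under H_a^{t*} with mu = 0: QTL at t_star with effect q = a / sqrt(n)."""
    q = a / np.sqrt(n)
    x1 = rng.choice(np.array([-1, 1]), size=n)
    # Haldane model: the sign flips between two loci with probability r of their distance.
    x_star = np.where(rng.random(n) < r(t1, t_star), -x1, x1)
    x2 = np.where(rng.random(n) < r(t_star, t2), -x_star, x_star)
    y = q * x_star + sigma * rng.standard_normal(n)
    return y, x1, x2


def marker_scores_under_alternative(a=2.0, t_star=0.4, sigma=1.0, t1=0.0, t2=1.0,
                                    n=400, reps=2000, seed=1):
    """Monte Carlo draws of (S_n(t1), S_n(t2)) with mu = 0 and sigma known,
    together with the theoretical means a e^{-2(t*-t1)}/sigma, a e^{-2(t2-t*)}/sigma."""
    rng = np.random.default_rng(seed)
    s = np.empty((reps, 2))
    for i in range(reps):
        y, x1, x2 = simulate_local_alternative(n, a, t_star, sigma, t1, t2, rng)
        s[i] = [np.sum(y * x1), np.sum(y * x2)]  # 2 * 1{X(tk)=1} - 1 equals X(tk)
    s /= sigma * np.sqrt(n)
    theory = np.array([a * np.exp(-2 * (t_star - t1)),
                       a * np.exp(-2 * (t2 - t_star))]) / sigma
    return s, theory
```

Under these assumptions the means are exact for every $n$, since $E\{(Y - \mu)X(t_k)\} = q\,e^{-2|t^\star - t_k|}$; only the normality is asymptotic.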
Let us now focus on the interpolation of Chang et al. (2009). After some calculations, and using the facts that $Q^{1,1}_t = 1 - Q^{-1,-1}_t$ and $Q^{1,-1}_t = 1 - Q^{-1,1}_t$, we obtain
$$E(G_{1n}) = a\sqrt{\bar r(t_1,t_2)}\,(2Q^{1,1}_{t^\star} - 1)/\sigma, \qquad E(G_{2n}) = a\sqrt{r(t_1,t_2)}\,(2Q^{1,-1}_{t^\star} - 1)/\sigma.$$
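Besides, for each individual $j$, the two indicator differences entering $G_{1n}$ and $G_{2n}$ are never simultaneously non-zero, so the product of the $j$-th summands of $G_{1n}$ and $G_{2n}$ is zero. Since the individuals are i.i.d., only these diagonal terms contribute to the covariance, and
$$\mathrm{Cov}(G_{1n}, G_{2n}) = -\frac{1}{n}\,E(G_{1n})\,E(G_{2n}) = -\frac{a^2}{n}\sqrt{\bar r(t_1,t_2)}\sqrt{r(t_1,t_2)}\,(2Q^{1,-1}_{t^\star} - 1)(2Q^{1,1}_{t^\star} - 1)/\sigma^2.$$
So, under the alternative, $G_{1n}$ and $G_{2n}$ are still asymptotically normal with unit variance, but with the non-zero means given above. Their covariance is of order $1/n$, so, in agreement with the contiguity argument above, $G_{1n}$ and $G_{2n}$ remain asymptotically independent, as under the null hypothesis.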
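The same kind of simulation (Python with NumPy, $\mu = 0$ and $\sigma$ known, illustrative parameter values and hypothetical function names) can be used to check the moments of $G_{1n}$ and $G_{2n}$ under $H_a^{t^\star}$: the empirical means should match the formulas for $E(G_{1n})$ and $E(G_{2n})$, the variances should be close to one, and the sketch compares the empirical covariance with the exact finite-sample value $-E(G_{1n})E(G_{2n})/n$, which follows from the zero product of the per-individual summands.

```python
import numpy as np


def r(t, tp):
    """Haldane recombination fraction."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.abs(t - tp)))


def simulate_local_alternative(n, a, t_star, sigma, t1, t2, rng):
    """Data under H_a^{t*} with mu = 0: QTL at t_star with effect q = a / sqrt(n)."""
    q = a / np.sqrt(n)
    x1 = rng.choice(np.array([-1, 1]), size=n)
    x_star = np.where(rng.random(n) < r(t1, t_star), -x1, x1)
    x2 = np.where(rng.random(n) < r(t_star, t2), -x_star, x_star)
    y = q * x_star + sigma * rng.standard_normal(n)
    return y, x1, x2


def g_statistics(y, x1, x2, sigma, t1, t2):
    """G_{1n} and G_{2n} with mu = 0 and sigma known."""
    n, r12 = len(y), r(t1, t2)
    d = ((x1 == 1) & (x2 == 1)).astype(float) - ((x1 == -1) & (x2 == -1)).astype(float)
    f = ((x1 == 1) & (x2 == -1)).astype(float) - ((x1 == -1) & (x2 == 1)).astype(float)
    return (np.sum(y * d) / (sigma * np.sqrt(n * (1 - r12))),
            np.sum(y * f) / (sigma * np.sqrt(n * r12)))


def g_moments_under_alternative(a=2.0, t_star=0.4, sigma=1.0, t1=0.0, t2=1.0,
                                n=400, reps=2000, seed=2):
    """Monte Carlo draws of (G_{1n}, G_{2n}) and their theoretical moments."""
    rng = np.random.default_rng(seed)
    g = np.array([
        g_statistics(*simulate_local_alternative(n, a, t_star, sigma, t1, t2, rng),
                     sigma, t1, t2)
        for _ in range(reps)
    ])
    r1, r2, r12 = r(t1, t_star), r(t_star, t2), r(t1, t2)
    q11 = (1 - r1) * (1 - r2) / (1 - r12)   # Q^{1,1}_{t*}
    q1m1 = (1 - r1) * r2 / r12              # Q^{1,-1}_{t*}
    means = np.array([a * np.sqrt(1 - r12) * (2 * q11 - 1),
                      a * np.sqrt(r12) * (2 * q1m1 - 1)]) / sigma
    cov_exact = -means[0] * means[1] / n     # exact covariance for this n
    return g, means, cov_exact
```

With these illustrative values the exact covariance is of order $10^{-3}$, well below the Monte Carlo noise of the empirical covariance.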
Note that here we limited our study to two genetic markers located on the chromosome, but it can easily be generalized to several markers.
3. Acknowledgements
The authors thank Jean-Marc Azaïs and Jean-Michel Elsen for fruitful discussions. This work has been supported by the Animal Genetic Department of the French National Institute for Agricultural Research, SABRE, and the National Center for Scientific Research.
Céline Delmas (celine.delmas@toulouse.inra.fr)
INRA UR631, Station d'Amélioration Génétique des Animaux,
BP 52627, 31326 Castanet-Tolosan Cedex, France.
Charles-Elie Rabier (rabier@stat.wisc.edu)
Université de Toulouse, Institut de Mathématiques de Toulouse, U.P.S.,
F-31062 Toulouse Cedex 9, France.
INRA UR631, Station d'Amélioration Génétique des Animaux,
BP 52627, 31326 Castanet-Tolosan Cedex, France.
References
Azaïs, J.M. and Cierco-Ayrolles, C. (2002). An asymptotic test for quantitative gene detection. Ann. I. H. Poincaré, 38, 6, 1087-1092.
Azaïs, J.M. and Wschebor, M. (2009). Level sets and extrema of random processes and fields. Wiley, New York.
Azaïs, J.M., Delmas, C., Rabier, C-E. (2011). Likelihood Ratio Test process for Quantitative Trait Locus detection. Submitted to ESAIM.
Chang, M.N., Wu, R., Wu, S.S., Casella, G. (2009). Score statistics for mapping quantitative trait loci. Statistical Applications in Genetics and Molecular Biology, 8(1), 16.
Cierco, C. (1998). Asymptotic distribution of the maximum likelihood ratio test for gene detection. Statistics, 31, 261-285.
Haldane, J.B.S. (1919). The combination of linkage values and the calculation of distance between the loci of linked factors. Journal of Genetics, 8, 299-309.
Lander, E.S., Botstein, D. (1989). Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185-199.
Rabier, C-E. (2010). PhD thesis, Université Toulouse 3, Paul Sabatier.
Rebaï, A., Goffinet, B., Mangin, B. (1994). Approximate thresholds of interval mapping tests for QTL detection. Genetics, 138, 235-240.
Rebaï, A., Goffinet, B., Mangin, B. (1995). Comparing power of different methods for QTL detection. Biometrics, 51, 87-99.
Van der Vaart, A.W. (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics.
Wu, R., Ma, C.X., Casella, G. (2007). Statistical Genetics of Quantitative Traits,