
Maximum likelihood covariance matrix estimation from two possibly mismatched data sets


HAL Id: hal-02572461

https://hal.archives-ouvertes.fr/hal-02572461

Submitted on 13 May 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Maximum likelihood covariance matrix estimation from two possibly mismatched data sets

Olivier Besson

To cite this version:

Olivier Besson. Maximum likelihood covariance matrix estimation from two possibly mismatched data sets. Signal Processing, Elsevier, 2020, 167, pp.107285-107294. ⟨10.1016/j.sigpro.2019.107285⟩. ⟨hal-02572461⟩

Author's version: https://oatao.univ-toulouse.fr/25984

https://doi.org/10.1016/j.sigpro.2019.107285

Besson, Olivier. Maximum likelihood covariance matrix estimation from two possibly mismatched data sets. (2020) Signal Processing, 167. 107285-107294. ISSN 0165-1684

Maximum likelihood covariance matrix estimation from two possibly mismatched data sets

Olivier Besson

ISAE-SUPAERO, 10 Avenue Edouard Belin, Toulouse 31055, France

Keywords:

Covariance matrix estimation, Maximum likelihood, Mismatch

Abstract

We consider estimating the covariance matrix from two data sets, one whose covariance matrix $R_1$ is the sought one and another set of samples whose covariance matrix $R_2$ slightly differs from the sought one, due e.g. to different measurement configurations. We assume however that the two matrices are rather close, which we formulate by assuming that $R_1^{1/2} R_2^{-1} R_1^{1/2} \mid R_1$ follows a Wishart distribution around the identity matrix. It turns out that this assumption results in two data sets with different marginal distributions, hence the problem becomes that of covariance matrix estimation from two data sets which are distribution-mismatched. The maximum likelihood estimator (MLE) is derived and is shown to depend on the number of samples in each set. We show that it involves whitening of one data set by the other one, shrinkage of eigenvalues and colorization, at least when one data set contains more samples than the size $p$ of the observation space. When both data sets have fewer than $p$ samples but the total number is larger than $p$, the MLE again entails eigenvalue shrinkage, but this time after a projection operation. Simulation results compare the new estimator to state-of-the-art techniques.

1. Problem statement

Analysis or processing of multichannel data most often relies on the covariance matrix, which is a fundamental tool e.g., for principal component analysis, spectral analysis, adaptive filtering, detection, and direction of arrival estimation, among others [1–3]. In practical applications, the $p \times p$ covariance matrix $R$ needs to be estimated from a finite number $n$ of samples. When the latter are independent and Gaussian distributed, the maximum likelihood estimator of $R$ is $n^{-1} S$, where $X$ is the $p \times n$ data matrix and $S = X X^T$ is the sample covariance matrix (SCM) [1]. However, in low sample support or when deviation from the Gaussian assumption is at hand, the SCM tends to behave poorly. In particular, it was observed that the sample covariance matrix is usually less well-conditioned than the true covariance matrix, and therefore considerable effort has been dedicated to regularizing it with a view to improving its performance.
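As a rough numerical illustration of the statement above, the following sketch computes the Gaussian MLE $n^{-1} S$ and compares its condition number with that of the true covariance matrix. The sizes `p` and `n`, the Toeplitz test matrix `R` and the random seed are assumptions made purely for illustration; they do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

p, n = 10, 15  # illustrative dimension and sample size (assumed)

# Hypothetical true covariance: Toeplitz matrix with entries 0.5^|i-j|
idx = np.arange(p)
R = 0.5 ** np.abs(np.subtract.outer(idx, idx))

G = np.linalg.cholesky(R)            # R = G G^T
X = G @ rng.standard_normal((p, n))  # columns are i.i.d. N(0, R)

S = X @ X.T    # sample covariance matrix, as defined in the text
R_mle = S / n  # Gaussian maximum likelihood estimate of R

cond_true = np.linalg.cond(R)
cond_mle = np.linalg.cond(R_mle)
print(f"condition number of R:   {cond_true:.1f}")
print(f"condition number of S/n: {cond_mle:.1f}")
```

With $n$ only slightly larger than $p$, the second number is typically much larger than the first, which is the conditioning issue that motivates regularization.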

One of the most important approaches in this respect is due to Stein [4–6] who, instead of maximizing the likelihood function, advocated minimizing a meaningful loss function within a given class of estimators. Stein hence introduced the concept of admissible estimation and minimax estimators under the so-called Stein’s loss. He showed that the SCM-based estimator is not minimax and derived minimax estimators in two important classes, namely estimators of the form $\hat{R} = G D G^T$, where $D$ is a diagonal matrix and $G$ is the Cholesky factor of $S$, or of the form $\hat{R} = U \,\mathrm{diag}(\varphi(\lambda))\, U^T$, where $U \,\mathrm{diag}(\lambda)\, U^T$ is the eigenvalue decomposition of $S$ and $\varphi(\lambda)$ is a non-linear function of $\lambda$. This seminal work of Stein gave rise to a great number of studies, see for instance [7–13] and references therein. A second class of robust estimates is based on linear shrinkage of the SCM to a target matrix (an approach which can be interpreted as an empirical Bayes technique), i.e., estimates of the form $\hat{R} = \alpha R_t + \beta S$, where $R_t = I$ is the most widespread choice, see e.g., [14–20]. Note that these techniques applied with $R_t = I$ achieve an affine transformation of the eigenvalues of $S$, while retaining the eigenvectors, and therefore bear resemblance to Stein’s method, although the selection of $\alpha$, $\beta$ may not be driven by the same principle. Robustness to a possibly non-Gaussian distribution has also been a topic of considerable interest, and many papers have focused on robust estimation for elliptically distributed data, see e.g., [21–30] and references therein.

E-mail address: [email protected]
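The remark that linear shrinkage towards $R_t = I$ acts as an affine map on the eigenvalues of $S$ while keeping its eigenvectors can be checked numerically. The sketch below is only an illustration: the weights `alpha` and `beta` are arbitrary hypothetical values, not selected by any of the criteria in the cited works, and the SCM is normalized by `n`, which is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed for reproducibility

p, n = 8, 12                     # illustrative sizes (assumed)
X = rng.standard_normal((p, n))
S = X @ X.T / n                  # SCM, normalized by n here (assumption)

alpha, beta = 0.3, 0.7           # hypothetical shrinkage weights
R_shrunk = alpha * np.eye(p) + beta * S

# Eigen-decomposition of S: S = U diag(lam) U^T, eigenvalues in ascending order
lam, U = np.linalg.eigh(S)
lam_shrunk = np.linalg.eigvalsh(R_shrunk)

# The shrunk estimate has the same eigenvectors as S and eigenvalues
# alpha + beta * lam, i.e. phi(lam) is affine in this special case.
R_from_eig = U @ np.diag(alpha + beta * lam) @ U.T

print("max eigenvalue mismatch:", np.max(np.abs(lam_shrunk - (alpha + beta * lam))))
print("max matrix mismatch:    ", np.max(np.abs(R_from_eig - R_shrunk)))
```

Both printed quantities should be at the level of floating-point round-off, which is what places linear shrinkage inside the class $U \,\mathrm{diag}(\varphi(\lambda))\, U^T$.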

Most of the above cited works deal with estimation of a covariance matrix from a single data set. In this paper, we consider a situation where two data sets $X_1$ and $X_2$ are available, with respective covariance matrices $R_1$ and $R_2$. This situation typically arises in radar applications when one wishes to detect a target buried in clutter with unknown statistics [31,32]. In order to infer the latter, training samples are generally used, which hopefully share the same statistics as the clutter in the cell under test (CUT). However, it has been evidenced that clutter is most often heterogeneous [31], with a discrepancy compared to the CUT that may grow with the distance to the CUT [33]. Therefore, one is led to use some clustering that separates training samples, either based on their proximity to the CUT or by means of some statistical criterion, such as the power selected training [34]. The samples so selected are deemed to be representative of the clutter in the CUT while others are less reliable, which corresponds to the situation considered herein. A second example is in the field of synthetic aperture radar, in the case where a scene is imaged on two consecutive days, with possible changes in between [35]. Finally, in hyperspectral imagery, the problem of target or anomaly detection leads to a very similar framework. Indeed, the background in a pixel under test has to be estimated from the local pixels around it and from pixels located further apart [36]. In the present paper, we assume that $R_2$ is close to $R_1$, the covariance matrix we wish to estimate. Since $R_2$ differs from but is close to $R_1$, we investigate using both $X_1$ and $X_2$ to estimate $R_1$. The reason for also using $X_2$ is that, although its covariance matrix is not $R_1$, it is close to it. Additionally, one might face situations where the number of samples in $X_1$ is very small.

This paper constitutes a first approach to this specific problem and we focus herein on the most natural approach, namely maximum likelihood estimation. The objective is to figure out the pros and cons of the latter and the conditions under which it is an accurate estimator. The paper is organized as follows. In Section 2 we formulate the statistical assumptions: more precisely, we assume that $R_1^{1/2} R_2^{-1} R_1^{1/2} \mid R_1$ is a random matrix with a Wishart distribution around the identity matrix, and we derive the joint distribution of $(X_1, X_2)$. Section 3 is devoted to the derivation of the maximum likelihood estimator of $R_1$ from $(X_1, X_2)$, taking into account the possible configurations regarding the number of samples in each data set. Numerical simulations illustrate the performance of the MLE and compare it with existing alternatives in Section 4. Conclusions and possible extensions of the present work are drawn in Section 5.

2. Data model

Let us assume that we have two sets of measurements $X_1$ ($p \times n_1$) and $X_2$ ($p \times n_2$) which are distributed according to $X_1 \overset{d}{=} \mathcal{N}(0, R_1, I)$ and $X_2 \overset{d}{=} \mathcal{N}(0, R_2, I)$, where $\mathcal{N}(0, \Sigma, \Omega)$ denotes the matrix-variate normal distribution whose density is $(2\pi)^{-pn/2} |\Sigma|^{-n/2} |\Omega|^{-p/2} \operatorname{etr}\{-\tfrac{1}{2} X^T \Sigma^{-1} X \Omega^{-1}\}$, with $|\cdot|$ the determinant and $\operatorname{etr}\{\cdot\}$ the exponential of the trace of a matrix. Note that we consider real-valued data here whereas in radar applications it is customary to consider complex-valued signals. In Appendix A we show how the results below can be readily extended to the complex case. Our goal in this paper is to estimate $R_1$, using both $X_1$ and $X_2$ even if $R_1 \neq R_2$. However we assume that the two matrices are close to each other. In order to define a model that can reflect the proximity between $R_1$ and $R_2$, we note that the natural distance between them is given by $d^2(R_1, R_2) = \sum_{k=1}^{p} \log^2 \lambda_k(G_1^T R_2^{-1} G_1)$ [37,38], where $G_1$ is a square-root of $R_1$, i.e., $R_1 = G_1 G_1^T$, and $\lambda_k(G_1^T R_2^{-1} G_1)$ stands for the $k$th eigenvalue of $G_1^T R_2^{-1} G_1$. This matrix is pivotal in adaptive detection problems also. More precisely, in the case of a covariance mismatch between the training samples and the data under test, it is shown in [39] that the performance of the well-known adaptive matched filter depends essentially on this matrix. Therefore, it becomes natural to encapsulate the difference between $R_1$ and $R_2$ through the matrix $W = G_1^T R_2^{-1} G_1$ and its proximity to the identity matrix. There are of course different ways to translate this constraint in the model. For instance a frequentist approach may be advocated where the joint probability density function of $(X_1, X_2)$ would be maximized under the constraint that the distance between $W$ and $I$ is smaller than some value. Alternatively, and this is what we elect here, one can resort to an empirical Bayes approach where the random matrix $W$ follows some prior distribution rather concentrated around $I$. For mathematical tractability, we choose a conjugate prior for $W$ and we assume that $W$ follows a Wishart distribution with $\nu$ degrees of freedom and parameter matrix $\mu^{-1} I$, i.e., $W \overset{d}{=} \mathcal{W}_p(\nu, \mu^{-1} I)$. Of course, this is a rather strong assumption whose validity would be difficult to check, e.g., on real data. However, it is in accordance with the mere knowledge we have about the relation between $R_1$ and $R_2$, and it allows for tractable derivations.
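To make the quantities of this section concrete, the following sketch computes the matrix $W = G_1^T R_2^{-1} G_1$ and the distance $d^2(R_1, R_2)$ defined above, taking the Cholesky factor as the square root $G_1$. The function names `mismatch_matrix` and `natural_distance_sq`, as well as the test matrices `R1`, `R2` and `A`, are hypothetical and serve only as an illustration. The sketch also checks two standard properties of this distance: symmetry in its arguments and invariance under a congruence $R \mapsto A R A^T$ with $A$ invertible.

```python
import numpy as np

def mismatch_matrix(R1, R2):
    """Return W = G1^T R2^{-1} G1, with G1 the Cholesky factor of R1."""
    G1 = np.linalg.cholesky(R1)          # R1 = G1 G1^T
    W = G1.T @ np.linalg.solve(R2, G1)   # G1^T R2^{-1} G1
    return (W + W.T) / 2                 # symmetrize against round-off

def natural_distance_sq(R1, R2):
    """Return d^2(R1, R2) = sum_k log^2 lambda_k(G1^T R2^{-1} G1)."""
    lam = np.linalg.eigvalsh(mismatch_matrix(R1, R2))
    return float(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(2)  # fixed seed for reproducibility
p = 5                           # illustrative dimension (assumed)
idx = np.arange(p)

# Hypothetical R1 (Toeplitz) and a nearby positive definite R2
R1 = 0.7 ** np.abs(np.subtract.outer(idx, idx))
E = rng.standard_normal((p, 3 * p))
R2 = 0.9 * R1 + 0.1 * (E @ E.T) / (3 * p)

# Hypothetical invertible matrix close to the identity
A = np.eye(p) + 0.1 * rng.standard_normal((p, p))

d_same = natural_distance_sq(R1, R1)                      # zero up to round-off
d_12 = natural_distance_sq(R1, R2)
d_21 = natural_distance_sq(R2, R1)                        # same value as d_12
d_cong = natural_distance_sq(A @ R1 @ A.T, A @ R2 @ A.T)  # same value as d_12

print(d_same, d_12, d_21, d_cong)
```

The distance depends on $R_1$ and $R_2$ only through the eigenvalues of $W$, which is consistent with modelling the mismatch through $W$ and its proximity to $I$.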

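Using the fact that $X_1 | R_1$ and $X_2 | R_2$ are independent and Gaussian distributed with respective covariance matrices $R_1$ and $R_2$, and since $R_2 = G_1 W^{-1} G_1^T$, we thus assume the following stochastic model:

$$p(X_1, X_2 | R_1, W) = (2\pi)^{-p(n_1+n_2)/2} |R_1|^{-n_1/2} |W^{-1} R_1|^{-n_2/2} \operatorname{etr}\left\{ -\tfrac{1}{2} X_1^T R_1^{-1} X_1 - \tfrac{1}{2} X_2^T G_1^{-T} W G_1^{-1} X_2 \right\} \tag{1a}$$

$$p(W) = \frac{\mu^{\nu p/2}}{2^{\nu p/2} \Gamma_p(\nu/2)} |W|^{(\nu-p-1)/2} \operatorname{etr}\left\{ -\tfrac{1}{2} \mu W \right\} \tag{1b}$$

Note that $E\{W^{-1}\} = (\nu-p-1)^{-1} \mu I$ so that $E\{R_2\} = E\{G_1 W^{-1} G_1^T\} = (\nu-p-1)^{-1} \mu R_1$: therefore, for $E\{R_2\}$ to be equal to $R_1$, one must select $\mu = \nu - p - 1$. Observe also that $W$ comes closer to $I$ as $\nu$ grows large. Indeed, $E\{W\} = \nu (\nu-p-1)^{-1} I$ and $E\{(W - E\{W\})^2\} = p \nu (\nu-p-1)^{-2} I$, which goes to zero as $\nu \to \infty$ [40].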
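The choice $\mu = \nu - p - 1$ can be illustrated by a small Monte Carlo experiment. The sketch below draws $W$ from a Wishart distribution with $\nu$ degrees of freedom and parameter matrix $\mu^{-1} I$ (for integer $\nu$, as $Z Z^T / \mu$ with $Z$ a $p \times \nu$ standard normal matrix), forms $R_2 = G_1 W^{-1} G_1^T$, and compares the empirical means of $W$ and $R_2$ with $\nu(\nu-p-1)^{-1} I$ and $R_1$. The values of `p`, `nu`, `trials` and the test matrix `R1` are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for reproducibility

p, nu, trials = 4, 50, 4000     # illustrative values (assumed)
mu = nu - p - 1                 # choice that gives E{R2} = R1

# Hypothetical R1 (Toeplitz) and its Cholesky square root G1
idx = np.arange(p)
R1 = 0.6 ** np.abs(np.subtract.outer(idx, idx))
G1 = np.linalg.cholesky(R1)

# Batched Wishart draws: W = Z Z^T / mu, Z of size p x nu with N(0, 1) entries
Z = rng.standard_normal((trials, p, nu))
W = Z @ Z.transpose(0, 2, 1) / mu

# R2 = G1 W^{-1} G1^T for every draw (matmul broadcasts over the batch axis)
R2 = G1 @ np.linalg.inv(W) @ G1.T

mean_W = W.mean(axis=0)
mean_R2 = R2.mean(axis=0)

err_W = np.max(np.abs(mean_W - nu / mu * np.eye(p)))
err_R2 = np.max(np.abs(mean_R2 - R1))
print("max |mean(W)  - nu/(nu-p-1) I| =", err_W)
print("max |mean(R2) - R1|            =", err_R2)
```

Both errors shrink as `trials` grows, in line with the two first-order moments quoted above.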

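The marginal distribution of $(X_1, X_2)$ is obtained by integrating (1) with respect to $W$, which results in

$$
\begin{aligned}
p(X_1, X_2 | R_1) &= \int_{W>0} p(X_1, X_2 | R_1, W)\, p(W)\, dW \\
&= \frac{(2\pi)^{-p(n_1+n_2)/2} \mu^{\nu p/2}}{2^{\nu p/2} \Gamma_p(\nu/2)} |R_1|^{-(n_1+n_2)/2} \operatorname{etr}\left\{ -\tfrac{1}{2} X_1^T R_1^{-1} X_1 \right\} \\
&\quad \times \int_{W>0} |W|^{(\nu+n_2-p-1)/2} \operatorname{etr}\left\{ -\tfrac{1}{2} W \left[ \mu I + G_1^{-1} X_2 X_2^T G_1^{-T} \right] \right\} dW \\
&= \frac{(2\pi)^{-p(n_1+n_2)/2} \mu^{\nu p/2}}{2^{\nu p/2} \Gamma_p(\nu/2)}\, 2^{(\nu+n_2)p/2}\, \Gamma_p((\nu+n_2)/2) \\
&\quad \times |R_1|^{-(n_1+n_2)/2} \left| \mu I + G_1^{-1} X_2 X_2^T G_1^{-T} \right|^{-(\nu+n_2)/2} \operatorname{etr}\left\{ -\tfrac{1}{2} X_1^T R_1^{-1} X_1 \right\} \\
&= (2\pi)^{-p n_1/2} |R_1|^{-n_1/2} \operatorname{etr}\left\{ -\tfrac{1}{2} X_1^T R_1^{-1} X_1 \right\} \\
&\quad \times \pi^{-p n_2/2} \frac{\Gamma_p((\nu+n_2)/2)}{\Gamma_p(\nu/2)} |\mu R_1|^{-n_2/2} \left| I + X_2^T [\mu R_1]^{-1} X_2 \right|^{-(\nu+n_2)/2}
\end{aligned}
\tag{2}
$$

In order to obtain the third equality, we made use of the fact that, if $S \overset{d}{=} \mathcal{W}_p(\nu, \Sigma)$,

$$\int_{S>0} p(S)\, dS = 1 \;\Rightarrow\; \int_{S>0} |S|^{(\nu-p-1)/2} \operatorname{etr}\left\{ -\tfrac{1}{2} S \Sigma^{-1} \right\} dS = 2^{\nu p/2} \Gamma_p(\nu/2) |\Sigma|^{\nu/2} \tag{3}$$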
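The passage from the third to the last expression in (2) relies on the determinant identity $|\mu I + G_1^{-1} X_2 X_2^T G_1^{-T}| = \mu^p\, |I + X_2^T [\mu R_1]^{-1} X_2|$ and on collecting the powers of $2$, $\pi$ and $\mu$. The sketch below evaluates the logarithm of both expressions on random data and checks that they coincide. It is only a numerical sanity check under assumed sizes and parameters; the helper names `log_multigamma`, `log_marginal_third` and `log_marginal_final` are hypothetical, and the multivariate gamma function is computed from its standard product formula $\Gamma_p(a) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma(a - (j-1)/2)$.

```python
import math
import numpy as np

def log_multigamma(a, p):
    """log Gamma_p(a) from the product formula (requires a > (p - 1) / 2)."""
    return (p * (p - 1) / 4 * math.log(math.pi)
            + sum(math.lgamma(a - j / 2) for j in range(p)))

def log_marginal_third(X1, X2, R1, nu, mu):
    """Log of the expression after integrating out W (third equality in (2))."""
    p, n1 = X1.shape
    n2 = X2.shape[1]
    G1 = np.linalg.cholesky(R1)
    Y2 = np.linalg.solve(G1, X2)                 # G1^{-1} X2
    A = mu * np.eye(p) + Y2 @ Y2.T               # mu I + G1^{-1} X2 X2^T G1^{-T}
    quad1 = np.trace(X1.T @ np.linalg.solve(R1, X1))
    logdet_R1 = np.linalg.slogdet(R1)[1]
    logdet_A = np.linalg.slogdet(A)[1]
    return (-p * (n1 + n2) / 2 * math.log(2 * math.pi)
            + nu * p / 2 * math.log(mu) - nu * p / 2 * math.log(2)
            - log_multigamma(nu / 2, p)
            + (nu + n2) * p / 2 * math.log(2) + log_multigamma((nu + n2) / 2, p)
            - (n1 + n2) / 2 * logdet_R1 - (nu + n2) / 2 * logdet_A - quad1 / 2)

def log_marginal_final(X1, X2, R1, nu, mu):
    """Log of the factored form: Gaussian term for X1 times Student-type term for X2."""
    p, n1 = X1.shape
    n2 = X2.shape[1]
    quad1 = np.trace(X1.T @ np.linalg.solve(R1, X1))
    logdet_R1 = np.linalg.slogdet(R1)[1]
    B = np.eye(n2) + X2.T @ np.linalg.solve(mu * R1, X2)
    logdet_B = np.linalg.slogdet(B)[1]
    log_f1 = -p * n1 / 2 * math.log(2 * math.pi) - n1 / 2 * logdet_R1 - quad1 / 2
    log_f2 = (-p * n2 / 2 * math.log(math.pi)
              + log_multigamma((nu + n2) / 2, p) - log_multigamma(nu / 2, p)
              - n2 / 2 * (p * math.log(mu) + logdet_R1)
              - (nu + n2) / 2 * logdet_B)
    return log_f1 + log_f2

rng = np.random.default_rng(4)         # fixed seed for reproducibility
p, n1, n2, nu = 3, 5, 4, 8             # illustrative values (assumed)
mu = nu - p - 1
idx = np.arange(p)
R1 = 0.5 ** np.abs(np.subtract.outer(idx, idx))  # hypothetical R1
X1 = rng.standard_normal((p, n1))
X2 = rng.standard_normal((p, n2))

lp_third = log_marginal_third(X1, X2, R1, nu, mu)
lp_final = log_marginal_final(X1, X2, R1, nu, mu)
print(lp_third, lp_final)  # the two values should agree up to round-off
```

Working with log-densities and `slogdet` avoids overflow in the determinants and gamma functions; the factored form is also the one that a numerical maximization over $R_1$ would evaluate.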

Note that $p(X_1, X_2 | R_1)$ in (2) can be factored as $p(X_1, X_2 | R_1) = f_1(X_1, R_1) \times f_2(X_2, R_1)$, which shows that $X_1$ and $X_2$ are marginally independent and that $p(X_1, X_2 | R_1) = p(X_1 | R_1)\, p(X_2 | R_1)$ with $p(X_1 | R_1) \propto \operatorname{etr}\{-\tfrac{1}{2} X_1^T R_1^{-1} X_1\}$ and $p(X_2 | R_1) \propto |I + X_2^T [\mu R_1]^{-1} X_2|^{-(\nu+n_2)/2}$. Due to the model adopted for the random matrix $W = G_1^T R_2^{-1} G_1$, $X_2$ follows a matrix variate Student distribution [41]. Therefore, the fact that $R_2 \neq R_1$ results here in two data sets with different distributions: one set
