HAL Id: tel-01087106
https://tel.archives-ouvertes.fr/tel-01087106
Submitted on 25 Nov 2014
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
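Propriétés fréquentistes des méthodes Bayésiennes semi-paramétriques et non paramétriques (Frequentist properties of semiparametric and nonparametric Bayesian methods)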
Jean-Bernard Salomond
To cite this version:
Jean-Bernard Salomond. Propriétés fréquentistes des méthodes Bayésiennes semi-paramétriques et non paramétriques. Mathématiques générales [math.GM]. Université Paris Dauphine - Paris IX, 2014.
French. NNT: 2014PA090034. tel-01087106.
Thesis
presented at
Université Paris-Dauphine
École doctorale de Dauphine
Centre de Recherche en Mathématiques de la Décision
for the degree of
Doctor in Applied Mathematics
Presented and defended by
Jean-Bernard Salomond
on 30 September 2014
Frequentist properties of semiparametric and
nonparametric Bayesian methods
Jury:
Ismaël Castillo, CNRS, Examiner
Fabienne Comte, Université Paris Descartes, Examiner
Cécile Durot, Université Paris Ouest Nanterre La Défense, Examiner
Elisabeth Gassiat, Université Paris-Sud, Reviewer
Dominique Picard, Université Paris Diderot - Paris 7, Examiner
Vincent Rivoirard, Université Paris Dauphine, Examiner
Judith Rousseau, Université Paris-Dauphine, Thesis advisor
Research on Bayesian nonparametric methods has grown considerably over the last twenty years, in particular since the development of simulation algorithms that make these methods usable in practice. It is therefore necessary to understand, from a theoretical point of view, how they behave. This thesis presents several contributions to the analysis of the frequentist properties of Bayesian nonparametric methods. Although working in an asymptotic framework may seem restrictive at first, it makes it possible to understand how Bayesian procedures operate in extremely complex models. In particular, it helps detect the aspects of the prior that are especially influential on the inference. Many general results have been obtained in this framework; however, as models become more complex and more realistic, they depart from the classical assumptions and are no longer covered by the existing theory. Beyond the intrinsic interest of studying a specific model that does not satisfy the classical assumptions, this also gives a better understanding of the mechanisms that govern Bayesian nonparametric methods.
Chapter 1 The introduction presents the Bayesian paradigm and the Bayesian approach to nonparametric problems. We introduce the frequentist properties of Bayesian methods and explain their importance for understanding the behaviour of the posterior distribution. We then present the main models studied in this thesis and the difficulties they raise for the study of their asymptotic properties.
Chapter 2 In this chapter, we study the consistency and the concentration rate of the posterior distribution in the decreasing density model for several metrics. This model is particularly interesting because decreasing densities can be represented as mixtures of uniforms, and are therefore a special case of mixtures in which the support of the kernel depends on the parameter. In this setting, the classical assumptions needed for posterior consistency are not satisfied. In particular, the prior does not put enough mass on Kullback-Leibler neighbourhoods of the true parameter, so the usual methods have to be adapted. For two classical families of priors, we prove that the posterior concentrates at the minimax rate for the $L_1$ and Hellinger losses. We then study the consistency of the posterior distribution of the density for the pointwise and sup-norm losses. These two metrics are in general difficult to study because they cannot be related to the natural divergence, namely the Kullback-Leibler divergence. For these two losses, we prove posterior consistency and give an upper bound on the concentration rate.
Chapter 3 We propose a Bayesian nonparametric test of whether a function is decreasing in the Gaussian regression model. In this setting, besides the fact that both hypotheses are nonparametric, the null hypothesis is included in the alternative. This is therefore a particularly difficult testing problem. Moreover, in this case the usual Bayes factor approach is not consistent. We therefore propose an alternative approach based on the idea of approximating a point hypothesis by an interval. We prove that, for a wide family of priors, the proposed test is consistent and separates the hypotheses at the minimax rate. In addition, our procedure is easy to implement and to use. We then study its behaviour on simulated data and compare the results with the classical methods in the literature. In each case considered, our results are at least as good as those of existing methods, and better in a number of cases.
Chapter 4 (Co-written with Bartek Knapik) We propose a general method for the study of ill-posed linear inverse problems in a Bayesian framework. Although there are many results on regularisation methods and on the convergence rates of classical estimators for function estimation in ill-posed inverse problems, posterior concentration rates in the Bayesian framework have received very little attention. Moreover, the few existing results only consider very limited families of priors, generally relying on the singular value decomposition of the operator. In this chapter we propose general conditions on the prior under which the posterior concentrates at a given rate. Our approach yields posterior concentration rates for many models and wide classes of priors. It is also particularly interesting because it gives a better understanding of the behaviour of the posterior distribution, and in particular of the impact of the operator on the inference.
Research on Bayesian nonparametric methods has received growing interest over the past twenty years, especially since the development of powerful simulation algorithms that make the implementation of complex Bayesian methods possible. It is therefore necessary to understand, from a theoretical point of view, the behaviour of Bayesian nonparametric methods. This thesis presents various contributions to the study of the frequentist properties of Bayesian nonparametric procedures. Although studying these methods from an asymptotic angle may seem restrictive, it allows us to grasp the operation of the Bayesian machinery in extremely complex models. Furthermore, this approach is particularly useful for detecting the characteristics of the prior that are strongly influential in the inference. Many general results have been proposed in the literature in this setting; however, the more complex and realistic the models, the further they get from the usual assumptions. Thus many models that are of great interest in practice are not covered by the general theory. The study of a model that does not fall under the general theory is of interest on its own, and it also allows for a better understanding of the behaviour of Bayesian nonparametric methods in a general setting.
Chapter 1 The introduction presents the Bayesian paradigm and the Bayesian approach to nonparametric problems. We introduce frequentist properties of Bayesian procedures and present their importance in the understanding of the behaviour of the posterior distribution. We then present the different models studied in this manuscript and the challenges faced in studying their asymptotic properties.
Chapter 2 In this chapter, we study consistency and concentration rates of the posterior distribution under several metrics in the monotone density model. This model is particularly interesting as monotone densities can be written as a mixture of uniform kernels, which is a special case of kernels for which the support depends on the parameter. In this case the usual hypotheses required to derive a posterior concentration rate are not satisfied. In particular, the prior distributions we consider do not put positive mass on Kullback-Leibler neighbourhoods of the true parameter, and we thus have to adapt the standard methods to get an upper bound on the posterior concentration rate. For two standard prior distributions, we prove that the posterior concentrates at the minimax rate for the $L_1$ and the Hellinger losses. We then study consistency of the posterior under the pointwise and supremum losses. These two metrics are in general difficult to study in the Bayesian framework as they are not related to the Kullback-Leibler divergence, which is the natural semi-metric in this setting. We nevertheless prove that the posterior is consistent for both losses and get an upper bound for the posterior concentration rate.
Chapter 3 We propose a Bayesian nonparametric procedure to test for monotonicity in the regression setting. In this case, not only are the null and the alternative hypotheses nonparametric, but one is embedded in the other, which makes the testing problem particularly difficult. In particular the Bayes factor, which is a usual Bayesian answer to testing problems, is not consistent under the null hypothesis. We propose an alternative approach that relies on the idea of approximating a point null hypothesis by shrinking intervals. The proposed procedure is consistent for a wide family of prior distributions and separates the hypotheses at the minimax rate. Furthermore, our approach is easy to implement and, unlike the existing procedures, does not require heavy computations. We then study its behaviour on simulated data: in all the cases considered, our procedure does at least as well as the classical ones, and outperforms them in some cases.
Chapter 4 (Joint work with Bartek Knapik) We propose a general approach to study nonparametric ill-posed linear inverse problems in a Bayesian setting. Although there is a wide literature on regularisation methods and on the convergence of estimators in this setting, posterior concentration in a Bayesian setting has not received much attention yet. Furthermore, the few existing results only consider very restricted families of prior distributions, mostly related to the singular value decomposition of the operator at hand. In this chapter we give general conditions on the prior such that the posterior concentrates at a certain rate. This approach allows us to derive asymptotic results for various ill-posed inverse problems and wide families of priors. Furthermore, this approach is particularly interesting in the sense that it gives some valuable insights on the behaviour of the posterior distribution in these models and on the impact of the operator on the inference.
Résumé
Summary
1 Introduction
1.1 Bayesian nonparametric approaches
1.1.1 Bayesian modeling
1.1.2 Bayesian nonparametrics
1.2 Asymptotic properties of the posterior distribution
1.2.1 Posterior consistency
1.2.2 Posterior concentration rate
1.2.3 Minimax concentration rates and adaptation
1.3 Nonparametric Bayesian testing
1.4 Challenging asymptotic properties
1.4.1 Inference under monotonicity constraints
1.4.2 Ill posed linear inverse problems
2 Monotone densities
2.1 Introduction
2.2 Main results
2.2.1 $L_1$ and Hellinger metric
2.2.2 Pointwise and supremum loss
2.3 Proofs
2.3.1 Proof of Theorems 2.1 and 2.2
2.3.2 Proof of Theorems 2.3 and 2.5
2.3.3 Proof of Theorem 2.4
2.3.4 Proof of Theorem 2.6
2.4 Technical Lemmas
2.4.1 Proof of Lemma 2.1
2.4.2 Proof of Lemma 2.2
2.5 Additional proof
2.6 Discussion
3 Bayesian testing for monotonicity
3.1 Introduction
3.1.1 Modelling with monotone constraints
3.1.2 The Bayes factor approach
3.1.3 An alternative approach
3.2 Construction of the test
3.2.1 The testing procedure
3.2.2 Theoretical results
3.2.3 A choice for the prior in the non informative case
3.3 Simulated Examples
3.4 Application to Global Warming data
3.5 Proof of Theorem 3.1
3.6 Proofs of technical lemmas
3.6.1 Proof of Lemma 3.1
3.6.2 Proof of Lemma 3.2
3.6.3 Proof of Lemma 3.3
3.7 Discussion
4 Ill-posed inverse problems
4.1 Introduction
4.2 General Theorem
4.3 Modulus of continuity
4.4 Some models
4.4.1 White noise
4.4.2 Regression
4.5 Discussion
Introduction
Perhaps I should not have been a fisherman, he thought.
But that was the thing that I was born for.
Ernest Hemingway, The Old Man and the Sea.
Abstract
The introduction presents the Bayesian paradigm and the Bayesian approach to nonparametric problems. We introduce the frequentist properties of Bayesian methods and explain their importance for understanding the behaviour of the posterior distribution. We then present the main models studied in this thesis and the difficulties they raise for the study of their asymptotic properties.
This introduction presents the main concepts common to the following chapters: statistical modelling and the Bayesian approach that we adopt in this thesis. We proceed with a quick introduction to nonparametric statistics and the construction of prior distributions on an infinite dimensional space, and we emphasize the importance of the frequentist properties of Bayesian nonparametric procedures. We then present the different statistical models studied in this manuscript.
1.1 Bayesian nonparametric approaches
The main goal of statistics is to infer on a random phenomenon given observations. The core concept of statistics is probabilistic modelling, that is, a mathematical approximation of the random phenomenon at hand. In a statistical model, an observation $X$ on an observation space $\mathcal{X}$ is assumed to be generated from a probability distribution $P$ that belongs to a model $\mathcal{P}$. Usually this distribution is characterized by a parameter $\theta$ in a parameter set $\Theta$, which gives the sampling model
\[ \{\mathcal{X}, P_\theta, \theta \in \Theta\}. \]
The aim of statistics is then to infer, and make decisions on the model, based on the observed data. To model complex data generating phenomena, the parameter space $\Theta$ may be very large and possibly infinite dimensional. As often in mathematical sciences, it is interesting to delineate regions of statistical methodology, and modern mathematical statistics tends to differentiate Bayesian versus frequentist methods, and parametric versus nonparametric models. In this section, we define Bayesian nonparametric models and underline their importance.
1.1.1 Bayesian modeling
Statistical models usually fall into either the frequentist paradigm or the Bayesian one. The frequentist paradigm considers that the data are generated from a fixed distribution $P_{\theta_0}$ associated with the true parameter $\theta_0$. Let $g$ be a function from $\Theta$ to $\Xi$, such that one is interested in making inference on $g(\theta_0)$. Frequentist statisticians look for statistics, that is functions $S : \mathcal{X} \to \Xi$, that minimize a risk $R(g(\theta_0), S(X))$. The risk is most of the time associated with a metric or semi-metric $d$, or more generally any loss function, and can be rewritten
\[ R(g(\theta_0), S(X)) = E_{\theta_0}\left[ d(g(\theta_0), S(X)) \right], \]
where $E_\theta$ is the expectation with respect to $P_\theta$.
In the Bayesian paradigm, one models the ignorance on the parameter $\theta$ through a probability distribution $\Pi$ based on prior beliefs (the prior distribution). An extensive introduction to Bayesian statistics can be found in Robert (2007). A Bayesian model is thus a sampling model
\[ X \sim P_\theta, \quad \theta \in \Theta, \]
together with a prior model
\[ \theta \sim \Pi, \]
which can be combined through Bayes' rule to get a probability distribution of the parameter given the data, called the posterior distribution, defined for all measurable $A \subset \Theta$ by
\[ \Pi(\theta \in A \mid X) = \frac{\int_A P_\theta(X)\, \Pi(d\theta)}{\int_\Theta P_\theta(X)\, \Pi(d\theta)}. \tag{1.1} \]
It is the single object on which all inference (e.g. estimation, testing, construction of credible sets, etc.) is based.
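As an illustration of (1.1), the following minimal sketch computes a posterior on a finite grid of parameter values, where both integrals reduce to sums. The Bernoulli sampling model, the uniform prior weights, the grid and the data are all assumptions made for the example; none of them comes from the thesis.

```python
def grid_posterior(x, grid, prior_weights):
    """Discrete analogue of (1.1): posterior weights on a finite grid of theta values.

    x is a list of 0/1 observations, assumed i.i.d. Bernoulli(theta).
    """
    successes = sum(x)
    failures = len(x) - successes
    # Likelihood P_theta(X) evaluated at each grid point.
    likelihood = [t ** successes * (1.0 - t) ** failures for t in grid]
    # Numerator of (1.1) for each singleton {theta}.
    unnormalised = [lik * w for lik, w in zip(likelihood, prior_weights)]
    evidence = sum(unnormalised)  # denominator of (1.1)
    return [u / evidence for u in unnormalised]


grid = [(i + 0.5) / 100 for i in range(100)]  # midpoints of 100 cells in (0, 1)
prior = [1.0 / len(grid)] * len(grid)         # uniform prior weights (assumption)
x = [1, 0, 1, 1, 0, 1, 1, 1]                  # hypothetical data: 6 successes out of 8
post = grid_posterior(x, grid, prior)

# Posterior probability of the measurable set A = {theta > 0.5}.
prob_A = sum(p for t, p in zip(grid, post) if t > 0.5)
print(round(prob_A, 3))
```

Any other inferential summary (posterior mean, credible set, test) can be derived from `post` in the same way, which is the sense in which the posterior is the single object on which inference is based.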
The Bayesian approach to statistics has become increasingly popular, especially since the 1990s, because of the development of new sampling methods such as Markov chain Monte Carlo (MCMC) algorithms that make sampling from the posterior distribution feasible, if not easy. Bayesian methods are now used in a wide variety of domains, from biology to finance, and data analysts are more and more attracted by their axiomatic view of uncertainty and their capacity to handle complex models, see Gelman et al. (2004) for instance. However, the fact that some methods are called Bayesian emphasizes the fact that there are still two philosophical approaches to statistical modeling. When the parameter space is finite dimensional, Bayesian and frequentist methods usually agree when the amount of information grows. In particular, under weak assumptions on the prior distribution, the so-called Bernstein-von Mises Theorem, as presented in Le Cam and Yang (2000), shows that Bayesian credible sets and frequentist confidence intervals are asymptotically equivalent. This result is particularly important as it indicates that Bayesian models with different priors (with a slight abuse of notation, we may say prior for prior distribution when there is no confusion) will eventually agree when the amount of information grows, and will give an answer similar to the frequentist one.
1.1.2 Bayesian nonparametrics
Nonparametric models are often defined as probabilistic models with massively many parameters (see Müller and Mitra, 2013) or with an infinite dimensional parameter space, as in Ghosh and Ramamoorthi (2003). These models offer more flexibility than parametric methods, but their mathematical complexity is in general greater.
A first problem in Bayesian nonparametrics is to define a probability distribution on an infinite dimensional space. Choosing a prior distribution is a key point in Bayesian inference, and going from prior knowledge to a prior distribution can be challenging. In particular, for infinite dimensional parameter spaces, ensuring that the prior distribution has a sufficiently large support is a difficult task, not to mention the difficulty of computing the posterior distribution for such models. A popular tool in the Bayesian nonparametric literature is the Dirichlet process introduced by Ferguson (1974). The Dirichlet process is a probability measure on the set of probability measures and can be defined as follows:
Definition 1.1 (Dirichlet process, Ferguson, 1974). Let $\alpha$ be a non null finite measure on $\mathcal{X}$. We say that $P$ follows a Dirichlet process $DP(\alpha)$ if, for all $k \in \mathbb{N}^*$ and all partitions of $\mathcal{X}$ into measurable sets $(B_1, \dots, B_k)$,
\[ (P(B_1), \dots, P(B_k)) \sim \mathcal{D}(\alpha(B_1), \dots, \alpha(B_k)), \]
where $\mathcal{D}$ is the Dirichlet distribution.
The Dirichlet process has been proved to have a large weak support (see Ferguson, 1973), namely all distributions whose support is included in the support of the base measure $\alpha$. Its hyperparameters are easily interpretable and it leads to tractable posteriors. Moreover, Sethuraman (1994) showed that the Dirichlet process can be obtained in a constructive way called the stick-breaking representation.
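The stick-breaking representation can be sketched as follows. A draw $P \sim DP(\alpha)$ is written as $P = \sum_k w_k \delta_{\theta_k}$ with $w_k = V_k \prod_{j<k} (1 - V_j)$, $V_k \sim \mathrm{Beta}(1, \alpha(\mathcal{X}))$, and atoms $\theta_k$ drawn independently from the normalised base measure. The truncation at a finite number of atoms, the standard Gaussian base measure and the function names are assumptions made for this illustration.

```python
import random


def stick_breaking(total_mass, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw from DP(alpha), with alpha = total_mass * base law."""
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet broken
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, total_mass)  # V_k ~ Beta(1, alpha(X))
        weights.append(remaining * v)         # w_k = V_k * prod_{j<k} (1 - V_j)
        atoms.append(base_sampler(rng))       # atom from the normalised base measure
        remaining *= 1.0 - v
    return weights, atoms, remaining


rng = random.Random(0)
weights, atoms, leftover = stick_breaking(
    total_mass=2.0,
    base_sampler=lambda r: r.gauss(0.0, 1.0),  # hypothetical base measure
    n_atoms=200,
    rng=rng,
)
print(len(atoms), leftover)
```

The value `leftover` is the mass ignored by the truncation; it decays geometrically on average, which is why a moderate number of atoms is usually enough for simulation purposes.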
In addition, the Dirichlet process opened the way to more flexible prior distributions on the set of density functions. Since then many prior distributions on infinite dimensional sets have been proposed. For instance Antoniak (1974) introduced mixtures of Dirichlet processes in the context of probability density estimation. Given a collection of kernels $K_\mu(\cdot)$ depending on a parameter $\mu$, we define the mixture
\[ \theta(\cdot) = \int_{\mathcal{X}} K_\mu(\cdot)\, dP(\mu), \]
where $P$ is a probability measure. Thus, choosing a prior on $P$ (e.g. a Dirichlet process prior) induces a prior on $\theta$.
Many other priors have been proposed in the literature: general classes of mixtures for the density model (Lo, 1984), hierarchical Dirichlet processes (Teh et al., 2006), Gaussian processes (Lenk, 1991), among others. It is typically difficult in nonparametric settings to quantify the impact of a prior distribution on the posterior. Although in finite dimensional models one can expect that, when the amount of information increases, inference based on different priors will merge, this does not hold easily when the parameter space is very large. Diaconis and Freedman (1986) showed that in some cases, Bayesian nonparametric procedures can lead to inconsistent results (when the data are assumed to be sampled from a distribution $P_{\theta_0}$, the posterior distribution does not accumulate its mass around the true parameter). Some other examples show that even if the posterior concentrates its mass around the true parameter, the prior still influences the rate at which this concentration occurs.
1.2 Asymptotic properties of the posterior distribution
Looking at the asymptotic behaviour of the posterior distribution helps in understanding the impact of the prior on the posterior distribution. It is also important to detect which parts of the prior influence the posterior the most. In particular, some aspects of the prior may persist when the amount of information grows to infinity, and may thus be highly influential for small sample sizes. We now define the two main asymptotic properties of the posterior distribution studied in this manuscript, namely posterior consistency and posterior concentration rate.
1.2.1 Posterior consistency
Consistency of the posterior distribution can be considered as a minimal requirement for Bayesian nonparametric procedures. Diaconis and Freedman (1986) proved that in the case of exchangeable data, consistency of the posterior distribution is equivalent to weak merging of posteriors associated with different proper prior distributions. This is particularly interesting since, as argued before, it is often difficult to go from prior knowledge on the parameter to a prior distribution, and two statisticians could come up with two different priors. We give a more detailed definition of consistency of the posterior distribution, as presented in Ghosh and Ramamoorthi (2003).
Let the observations $\mathbf{X}^n \in \mathcal{X}^n$ be random variables sampled from a distribution $P_\theta^n$ for $\theta \in \Theta$. Here $n$ is considered to be a quantification of the amount of information. Consider a prior probability distribution $\Pi$ on $\Theta$. We can thus compute the posterior distribution of $\theta$, denoted $\Pi(\cdot \mid \mathbf{X}^n)$ (see (1.1)). Assume that there exists an unknown parameter $\theta_0 \in \Theta$ such that the data are generated from the true distribution $P_{\theta_0}^n$, and define an $\epsilon$-neighbourhood of $\theta_0$ associated with the loss function $d$,
\[ B_\epsilon(\theta_0) = \{\theta : d(\theta, \theta_0) \le \epsilon\}. \]
Definition 1.2. The posterior distribution is said to be consistent at $\theta_0$ for the loss $d$ if, for any $\epsilon > 0$, the posterior probability of $B_\epsilon(\theta_0)$ satisfies
\[ \Pi(B_\epsilon(\theta_0) \mid \mathbf{X}^n) \to 1 \]
either in $P_{\theta_0}^n$-probability or $P_{\theta_0}^\infty$-almost surely.
A first result of Doob (1949) shows that when $d$ is a metric and $(\Theta, d)$ is a complete separable space, any posterior distribution is consistent at $\theta_0$, $\Pi$-almost surely, under some ergodicity conditions. This result is interesting but very weak, as it does not provide any information on the set of parameters at which consistency holds.
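Definition 1.2 can be illustrated numerically in a simple parametric model. The sketch below assumes i.i.d. Bernoulli($\theta_0$) observations and a uniform prior, so that the posterior is a Beta distribution, and estimates $\Pi(B_\epsilon(\theta_0) \mid \mathbf{X}^n)$ by sampling from it. The model, the values of $\theta_0$ and $\epsilon$, and the function name are assumptions chosen for the example.

```python
import random


def ball_mass_bernoulli(n, theta0, eps, rng, n_draws=2000):
    """Monte Carlo estimate of the posterior mass of {theta : |theta - theta0| <= eps}."""
    # Simulate n Bernoulli(theta0) observations and count the successes.
    successes = sum(1 for _ in range(n) if rng.random() < theta0)
    # Under a uniform prior the posterior is Beta(1 + successes, 1 + failures).
    a, b = 1.0 + successes, 1.0 + n - successes
    draws = (rng.betavariate(a, b) for _ in range(n_draws))
    return sum(1 for t in draws if abs(t - theta0) <= eps) / n_draws


rng = random.Random(1)
masses = [ball_mass_bernoulli(n, theta0=0.3, eps=0.05, rng=rng)
          for n in (10, 100, 1000, 10000)]
print(masses)
```

For a fixed $\epsilon$ the estimated mass should approach 1 as $n$ grows, which is what consistency at $\theta_0$ requires; the nonparametric case studied in the thesis is of course much harder than this toy example.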
A usual requirement for the posterior to be consistent is that the prior puts positive mass on neighbourhoods of $\theta_0$. More precisely, if $P_\theta$ is absolutely continuous with respect to $P_{\theta_0}$, define the Kullback-Leibler divergence as
\[ KL(P_\theta, P_{\theta_0}) = \int \log \frac{dP_\theta}{dP_{\theta_0}}\, dP_\theta; \]
one will require that $\Pi(KL(P_\theta, P_{\theta_0}) < \epsilon) > 0$ for all $\epsilon > 0$.
A second condition is that the model makes it possible to differentiate between $\theta_0$ and parameters outside $B_\epsilon(\theta_0)$. This can be formalized by the existence of a sequence of tests of
\[ H_0 : \theta = \theta_0 \quad \text{versus} \quad H_1 : \theta \in B_\epsilon^c(\theta_0). \tag{1.2} \]
We then define an exponentially consistent sequence of tests $\{\phi_n(\mathbf{X}^n)\}$ as follows.
Definition 1.3. The sequence of tests $\{\phi_n(\mathbf{X}^n)\}$ is exponentially consistent for testing (1.2) if there exists $c > 0$ such that for all $n$
\[ E_{\theta_0}(\phi_n(\mathbf{X}^n)) \lesssim e^{-cn}, \qquad \sup_{\theta \in B_\epsilon^c(\theta_0)} E_\theta(1 - \phi_n(\mathbf{X}^n)) \lesssim e^{-cn}. \]
For independent identically distributed observations $\mathbf{X}^n = (X_1, \dots, X_n)$, where the parameter of interest is the common density $f$ with respect to a measure $\lambda$, we thus have
\[ f = \frac{dP_\theta}{d\lambda}, \qquad \theta^n(\mathbf{X}^n) = \prod_{i=1}^n \theta(X_i), \]
hence, in this case, $\theta = f$. Schwartz (1965) gives general conditions on the model to achieve consistency. In this case the Kullback-Leibler divergence between $f$ and $f_0$ is
\[ KL(f, f_0) = \int_{\mathcal{X}} f(x) \log \frac{f(x)}{f_0(x)}\, dx. \tag{1.3} \]
Schwartz (1965) requires that the prior has positive mass on all $\epsilon$-neighbourhoods for the Kullback-Leibler divergence, that is, for all $\epsilon > 0$,
\[ \Pi(f : KL(f, f_0) \le \epsilon) > 0. \]
The truth $f_0$ is then said to belong to the $KL$-support of the prior $\Pi$. This condition ensures that the support of the prior is large in the sense of the Kullback-Leibler divergence.
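The divergence (1.3) can be approximated numerically for densities on $[0, 1]$. The sketch below uses a midpoint rule; the three densities and the function name are assumptions chosen for the example. It also shows that the divergence is infinite as soon as one density puts mass where the other vanishes, which is the kind of difficulty that arises when the support of a kernel depends on the parameter.

```python
import math


def kl_divergence(f, f0, n_cells=20000):
    """Midpoint-rule approximation of KL(f, f0) = int f log(f / f0) on [0, 1], as in (1.3)."""
    total = 0.0
    for i in range(n_cells):
        x = (i + 0.5) / n_cells
        fx, f0x = f(x), f0(x)
        if fx == 0.0:
            continue         # convention 0 * log 0 = 0
        if f0x == 0.0:
            return math.inf  # f is not absolutely continuous with respect to f0
        total += fx * math.log(fx / f0x) / n_cells
    return total


def uniform(x):
    return 1.0


def decreasing(x):
    return 2.0 * (1.0 - x)           # a decreasing density on [0, 1]


def half_uniform(x):
    return 2.0 if x <= 0.5 else 0.0  # uniform density on [0, 1/2]


kl_smooth = kl_divergence(decreasing, uniform)      # closed form: log(2) - 1/2
kl_nested = kl_divergence(half_uniform, uniform)    # closed form: log(2)
kl_infinite = kl_divergence(uniform, half_uniform)  # supports do not match
print(kl_smooth, kl_nested, kl_infinite)
```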
In the density setting, Schwartz's Theorem then states:
Theorem 1.1 (Schwartz (1965)). Let $\Pi$ be a prior on $\Theta$, and $\theta_0 \in \Theta$ such that
• $\theta_0$ is in the $KL$-support of $\Pi$,
• there exists an exponentially consistent sequence of tests for (1.2),
then
\[ \Pi(B_\epsilon(\theta_0) \mid \mathbf{X}^n) \to 1 \quad P_{\theta_0}^\infty\text{-almost surely.} \]
Since this result of Schwartz, other types of results have been obtained in many different settings, see for instance Walker and Hjort (2001), Walker (2003), Walker (2004), Lijoi et al. (2007).
1.2.2 Posterior concentration rate
A more refined asymptotic property is the posterior concentration rate. Loosely speaking, it is the rate at which the $\epsilon$-neighbourhoods $B_\epsilon(\theta_0)$ can shrink such that the posterior probability of $B_\epsilon(\theta_0)$ remains close to $1$. To get a better understanding of the impact of the prior on the posterior, we need to study sharper results than mere consistency. Some aspects of the prior may significantly influence the posterior concentration rate. They are thus likely to be highly influential for finite datasets and should be handled with care. We now give a precise definition of the posterior concentration rate and present some general results proposed in the literature.
Definition 1.4. Let the observations $\mathbf{X}^n$ be sampled from a distribution $P_{\theta_0}^n$ with $\theta_0 \in \Theta$ and let $\Pi$ be a prior on $\Theta$. A posterior concentration rate at $\theta_0$ with respect to a semimetric $d$ on $\Theta$ is a sequence $\epsilon_n$ such that, for all positive sequences $M_n$ going to infinity,
\[ \Pi(\theta : d(\theta, \theta_0) \le M_n \epsilon_n \mid \mathbf{X}^n) \to 1 \]
in $P_{\theta_0}^n$-probability as $n$ goes to infinity.
In their seminal paper, Ghosal et al. (2000a) (see also Shen and Wasserman) gave general conditions for deriving posterior concentration rates in the density model (i.e. independent and identically distributed observations $\mathbf{X}^n$), in a similar way to what Schwartz (1965) did for consistency. This idea has then been extended to many other models, and other approaches have been proposed, see for instance Ghosal and van der Vaart (2007). Their approach also requires that the prior puts enough mass on shrinking Kullback-Leibler neighbourhoods of the truth. However, the neighbourhoods here are more restrictive than the ones considered for consistency. If $P_\theta^n$ is absolutely continuous with respect to $P_{\theta_0}^n$, define the $k$-th centred Kullback-Leibler moment
\[ V_k(P_\theta^n, P_{\theta_0}^n) = \int_{\mathcal{X}} \left| \log \frac{dP_\theta^n}{dP_{\theta_0}^n} - KL(P_\theta^n, P_{\theta_0}^n) \right|^k dP_\theta. \]
We then define the following Kullback-Leibler neighbourhood
\[ S_n(\theta_0, \epsilon, k) = \left\{ \theta : KL(P_\theta^n, P_{\theta_0}^n) \le n\epsilon^2,\ V_k(P_\theta^n, P_{\theta_0}^n) \le n\epsilon \right\}. \]
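Definition 1.4 can be checked numerically in a conjugate parametric model, where the posterior is available in closed form. The sketch below assumes i.i.d. $N(\theta_0, 1)$ observations and a $N(0, 1)$ prior, so that the posterior is $N(n\bar{X}/(n+1), 1/(n+1))$, and takes $\epsilon_n = n^{-1/2}$ with $M_n = \log n$. These choices, and the function names, are assumptions for illustration only.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def posterior_ball_mass(xbar, n, theta0, radius):
    """Posterior mass of {theta : |theta - theta0| <= radius}, N(theta, 1) data, N(0, 1) prior."""
    post_mean = n * xbar / (n + 1.0)  # conjugate Gaussian update
    post_sd = 1.0 / math.sqrt(n + 1.0)
    upper = normal_cdf((theta0 + radius - post_mean) / post_sd)
    lower = normal_cdf((theta0 - radius - post_mean) / post_sd)
    return upper - lower


rng = random.Random(2)
theta0 = 1.0
results = []
for n in (10, 100, 1000, 10000):
    xbar = sum(rng.gauss(theta0, 1.0) for _ in range(n)) / n
    eps_n = 1.0 / math.sqrt(n)  # candidate concentration rate
    wide = posterior_ball_mass(xbar, n, theta0, math.log(n) * eps_n)  # M_n = log n
    narrow = posterior_ball_mass(xbar, n, theta0, 0.1 * eps_n)        # radius too small
    results.append((n, wide, narrow))
    print(n, round(wide, 4), round(narrow, 4))
```

The mass of the ball of radius $M_n \epsilon_n$ approaches 1, while a ball of radius $0.1\,\epsilon_n$ keeps a small posterior mass for every $n$, which suggests that in this toy model the rate cannot be taken faster than $n^{-1/2}$.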
As in Schwartz's Theorem, Ghosal and van der Vaart (2007) also require the existence of an exponentially consistent sequence of tests, but instead of testing against the complement of the shrinking ball $B_{\epsilon_n}(\theta_0)$, it is sufficient to test against sets
\[ B_n^j(\theta_0) = \{\theta \in \Theta_n : j\epsilon_n \le d(\theta, \theta_0) \le 2j\epsilon_n\}, \]
for any integer $j \ge J_0$ for some positive $J_0$, where $\Theta_n$ is an increasing sequence of sets that takes most of the prior mass of $\Pi$. Their Theorem is thus as follows:
Theorem 1.2 (Theorem 3 of Ghosal and van der Vaart (2007)). Let $d$ be a semimetric on $\Theta$ and consider a sequence $\epsilon_n$ such that $\epsilon_n \to 0$ and $n\epsilon_n^2 \to \infty$ as $n \to \infty$. For $k > 1$, $K > 0$ and $\Theta_n \subset \Theta$, if there exists a sequence of tests $\phi_n$ such that for $J_0 > 0$ and every $j \ge J_0$
\[ E_{\theta_0} \phi_n \to 0, \qquad \sup_{\theta \in B_n^j(\theta_0)} E_\theta(1 - \phi_n) \le e^{-Knj^2\epsilon_n^2}, \tag{1.4} \]
and if
\[ \frac{\Pi(B_n^j(\theta_0))}{\Pi(S_n(\theta_0, \epsilon_n, k))} \le e^{Knj^2\epsilon_n^2/2}, \tag{1.5} \]
then for every sequence $M_n \to \infty$ we have
\[ \Pi(\theta \in \Theta_n : d(\theta, \theta_0) \le M_n\epsilon_n \mid \mathbf{X}^n) \to 1 \]
in $P_{\theta_0}^n$-probability as $n$ goes to infinity.
A usual way of ensuring the existence of tests in condition (1.4), for a well-suited semimetric $d$, is to control the covering number of the sets $B_n^j(\theta_0)$. For instance, when the semimetric $d$ is the Hellinger metric, the well-known results of Le Cam (1986) or Le Cam and Yang (2000) ensure the existence of such a sequence of tests.
1.2.3 Minimax concentration rates and adaptation
The theory of concentration rates can be related to the classical optimal convergence rate of estimators. Ghosal et al. (2000a) show, in the context of density estimation, that the posterior yields a point estimate that converges at the same rate as the posterior concentration rate when the considered loss is bounded and convex. It thus makes sense to compare frequentist and Bayesian approaches based on this asymptotic property.
To study the asymptotic behaviour of the posterior distribution, we only consider some subspace $\Theta_0$ of the parameter space on which the functions are well behaved. One of the most common criteria for studying the optimality of an estimator is the minimax risk, defined as the minimum over all estimators of the maximal risk of the estimator. More precisely, if $d$ is a semimetric on $\Theta$, the minimax risk over $\Theta_0 \subset \Theta$ is defined as (see Tsybakov, 2009)
\[ R_n = \inf_{T_n} \sup_{\theta \in \Theta_0} E_\theta\left[ d(T_n, \theta) \right], \]
where the infimum is taken over all estimators $T_n$. The minimax rate in $\Theta_0$ is thus the sequence $\epsilon_n$ such that there exists a fixed positive constant $C$ with
\[ \limsup_{n \to \infty} \epsilon_n^{-1} R_n = C. \]
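The scaling in the last display can be illustrated for one particular estimator. The sketch below estimates by Monte Carlo the maximal risk $\sup_\theta E_\theta |T_n - \theta|$ of the sample mean in an assumed $N(\theta, 1)$ model over a small hypothetical set of parameter values, and rescales it by $\epsilon_n^{-1} = \sqrt{n}$. Since only one estimator is considered, this gives an upper bound on $R_n$, not the infimum over all estimators.

```python
import math
import random


def max_risk_of_sample_mean(n, thetas, rng, n_rep=2000):
    """Monte Carlo estimate of sup over thetas of E_theta |mean(X) - theta|, N(theta, 1) data."""
    worst = 0.0
    for theta in thetas:
        total = 0.0
        for _ in range(n_rep):
            xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
            total += abs(xbar - theta)
        worst = max(worst, total / n_rep)  # keep the largest estimated risk
    return worst


rng = random.Random(3)
thetas = (-1.0, 0.0, 2.0)  # hypothetical finite subset standing in for Theta_0
scaled = {n: math.sqrt(n) * max_risk_of_sample_mean(n, thetas, rng)
          for n in (25, 100)}
print(scaled)
```

In this model the exact value of $\sqrt{n}\, E_\theta |\bar{X} - \theta|$ is $\sqrt{2/\pi}$ for every $\theta$ and every $n$, so the rescaled risks should stay close to that constant, in line with a rate $\epsilon_n = n^{-1/2}$.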
We say that a Bayesian procedure concentrates at the minimax rate if the concentration rate of the posterior in the class $\Theta_0$ is the minimax convergence rate. Many models (prior and sampling models) studied in the literature have been proven to concentrate at the minimax rate in $\Theta_0$. In particular, in the density model, nonparametric mixture models are known to concentrate at the minimax rate (up to a $\log$ factor) over classes of Hölder functions for various types of kernels, see Ghosal and van der Vaart (2001), Ghosal and van der Vaart (2007), Shen et al. (2013) for Gaussian kernels, Kruijer et al. (2010) for location scale mixtures or Rousseau (2010) for beta kernels. Many other types of priors have been proven to lead to the minimax concentration rate: van der Vaart and van Zanten (2008) proved minimax concentration rates of the posterior for Gaussian process priors, Ghosal and van der Vaart (2007) and Knapik et al. (2011) show minimax concentration rates for series expansion priors for regression and the white noise model respectively, and Arbel et al. (2013), ?, Belitser and Ghosal (2003) obtained generic results for various sampling models.
The subspaces $\Theta_0$ are restricted through regularity assumptions such as Sobolev or Hölder smoothness, shape restriction, or sparsity. These classes of functions are in general indexed by a parameter, say $\beta$, that accounts for the level of regularity, and the minimax rate typically depends on this parameter. However, it is often difficult to fix $\beta$ a priori; it is thus natural to seek procedures that perform well over a wide variety of $\beta$ values, say $\beta \in I$. Such procedures are called adaptive as they automatically adapt the concentration rate over the whole collection of spaces $\Theta_{0,\beta}$, $\beta \in I$. Frequentist adaptive estimators have been well studied in the literature for the past three decades, see for instance Efroimovich (1986), Polyak and Tsybakov (1990), or Tsybakov (2009) for a review. From a Bayesian perspective, adaptive procedures have become more and more popular: see Belitser and Ghosal (2003) for infinite dimensional Gaussian distributions; Scricciolo (2006) obtained adaptive rates in the density model; van der Vaart and van Zanten (2009) considered Gaussian random field priors; De Jonge et al. (2010) considered location scale mixtures. Other examples of adaptive Bayesian procedures can be found in Rivoirard et al. (2012), Rousseau (2010) or Arbel et al. (2013) for instance.
1.3 Nonparametri Bayesian testing
Another aspect of Bayesian nonparametric inference that has been investigated in this work is the so-called testing problem or model choice. In this case, one is not interested in recovering an unknown parameter $\theta$ but rather in taking a decision on the parameter given the observations. This problem of testing in a Bayesian framework is well known and can be dated back to Laplace (1814). It can be formalized as follows: let $\Theta_0$ and $\Theta_1$ be two distinct subspaces of the parameter space $\Theta$, associated with prior probabilities $\pi_0$ and $\pi_1$; one wants to infer whether $\theta \in \Theta_0$ versus $\theta \in \Theta_1$, which can be seen as the estimation of $\mathbb{I}_{\Theta_1}(\theta)$, as argued in Robert (2007). Consider $\Pi$ a prior distribution on $\Theta = \Theta_0 \cup \Theta_1$. In this setting it is natural to consider the 0-1 loss with weights $\gamma_0, \gamma_1$, similar to the one proposed by Neyman and Pearson (1938), which is defined for a decision $\varphi$ by
$$L(\theta, \varphi) = \begin{cases} \gamma_0 & \text{if } \varphi = 0 \text{ and } \mathbb{I}_{\Theta_0}(\theta) = 0, \\ \gamma_1 & \text{otherwise.} \end{cases}$$
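The Bayesian solution to this problem (i.e. the minimizer of the Bayesian risk) is then
$$\varphi(X^n) = \begin{cases} 1 & \text{if } \Pi(\Theta_1 \mid X^n)/\pi_1 \ge \dfrac{\gamma_0}{\gamma_0 + \gamma_1}\, \Pi(\Theta_0 \mid X^n)/\pi_0, \\ 0 & \text{otherwise.} \end{cases} \tag{1.6}$$
To avoid the impact of $\Pi(\Theta_0)$ and $\Pi(\Theta_1)$ or $\gamma_0$ and $\gamma_1$, one can equivalently define the Bayes factor
$$B_{0,1} = \frac{\Pi(\Theta_0 \mid X^n)}{\Pi(\Theta_1 \mid X^n)} \times \frac{\pi_1}{\pi_0}.$$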
The testing procedure corresponds to rejecting $\Theta_0$ if $B_{0,1}$ is small, but the Bayes factor provides more information than just a 0-1 answer. Standard thresholds are given by Jeffreys' scale. A test procedure based on the Bayes factor $B_{0,1}$ is said to be consistent if $B_{0,1}$ goes to infinity in $P^n_{\theta_0}$ probability for all $\theta_0 \in \Theta_0$ and converges to 0 in $P^n_{\theta_0}$ probability for all $\theta_0 \in \Theta_1$. Bayes factors for nonparametric goodness of fit tests have been studied in terms of their asymptotic properties in the literature; see Dass and Lee (2004); McVinish et al. (2009); Rousseau (2007); Rousseau and Choi (2012) for instance. When both hypotheses are nonparametric and one is embedded in the other, the determination of Bayesian procedures that have good asymptotic properties is difficult in general.
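To illustrate the definition of the Bayes factor, consider the following minimal sketch. It relies on a toy model that is not taken from this thesis: observations $X_i \sim \mathcal{N}(\theta, 1)$, a $\mathcal{N}(0, 1)$ prior on $\theta$, $\Theta_0 = (-\infty, 0]$ and $\Theta_1 = (0, \infty)$, so that $\pi_0 = \pi_1 = 1/2$. All function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bayes_factor_01(post_0, post_1, prior_0, prior_1):
    """B_{0,1} = [Pi(Theta_0 | X) / Pi(Theta_1 | X)] * [pi_1 / pi_0]."""
    return (post_0 / post_1) * (prior_1 / prior_0)


def one_sided_normal_bayes_factor(xbar, n, prior_var=1.0):
    """Bayes factor for Theta_0 = (-inf, 0] against Theta_1 = (0, inf).

    Assumed toy model: X_i ~ N(theta, 1) with theta ~ N(0, prior_var),
    summarised by the sample mean xbar of n observations.
    """
    # Conjugate update: the posterior of theta is Gaussian.
    post_var = 1.0 / (n + 1.0 / prior_var)
    post_mean = post_var * n * xbar
    post_0 = normal_cdf((0.0 - post_mean) / math.sqrt(post_var))
    post_1 = 1.0 - post_0
    # The centred prior puts mass 1/2 on each hypothesis.
    return bayes_factor_01(post_0, post_1, 0.5, 0.5)
```

In this toy model the Bayes factor equals 1 when the sample mean is 0, and it becomes small as the sample mean moves into $\Theta_1$, which corresponds to rejecting $\Theta_0$.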
Similarly to the estimation problem, asymptotic properties of a Bayesian answer to a testing problem are of great interest from both a theoretical and a methodological point of view, since inference based on inconsistent posteriors could be highly misleading. A similar requirement should also hold for testing procedures. In this context, we will say that a procedure is consistent if it gives the right answer with probability that goes to 1 as the amount of information grows to infinity. More precisely, a testing procedure (1.6) is said to be consistent for the metric or semi-metric $d$ if, for all $\rho > 0$,
$$\sup_{\theta \in \Theta_0} E_\theta(\varphi(X^n)) = o(1), \qquad \sup_{d(\theta, \Theta_0) > \rho} E_\theta(1 - \varphi(X^n)) = o(1). \tag{1.7}$$
Similarly to the frequentist literature, we consider here uniform consistency; however, this definition of consistency slightly differs from the one usually considered in the frequentist setting, as here we do not fix a level for the type I error of the test. It is also interesting to study the counterpart of the concentration rate in the testing problem, namely the separation rate of the test. The separation rate is defined as the smallest sequence $\rho = \rho_n$ such that (1.7) is still valid. It indicates how fast the test can differentiate both hypotheses. Similarly to the concentration rate, it also indicates which part of the prior influences the Bayesian procedure even asymptotically. This is of great interest as it is well known that in testing problems, the sensitivity to the prior is a major issue.
1.4 Challenging asymptotic properties of Bayesian nonparametric procedures
We have seen that studying the asymptotic behaviour of the posterior distribution is a major tool to understand the influence of the prior in the nonparametric setting. We have also seen that there exist sufficient conditions on the model under which the procedure is known to be consistent and to have optimal asymptotic properties. However, some models do not fall under this general theory. These models present a new challenge for the Bayesian nonparametric community. In this section we present two of these problems, namely inference under monotonicity constraints and estimation in linear ill-posed inverse problems.
1.4.1 Inference under monotonicity constraints
In many statistical problems, it is useful to impose some restrictions on the parameter space to be able to carry out the inference. When modelling real world situations, shape constraints on the parameter of interest may appear naturally; this is the case for instance for drug response models or in survival analysis. Furthermore, these hypotheses are often easy to interpret, understand and explain compared to smoothness restrictions, for instance. Among different shape constraints, monotonicity restrictions have been fairly popular in the literature. In a regression setting for instance, monotonicity of a response is often granted from physical or theoretical considerations. Shape constrained inference, and monotonicity in particular, can be dated back to Brunk (1955), and most of the early works on the subject can be found in Barlow et al. (1972). Since then monotonicity constraints have been used in many applied problems: in a pharmaceutical context in Bornkamp and Ickstadt (2009) and for survival analysis in Laslett (1982); Neelon and Dunson (2004) studied monotone regression for trend analysis and Dunson (2005) considered monotonicity constraints on count data. Many other applications can be found in Robertson et al. (1988).
In this section, we present the two shape constrained problems studied in this thesis, namely the estimation of a density under monotonicity constraints and testing for monotonicity in a regression setting.
1.4.1.1 Monotone densities
Monotone densities are common in practice, especially in survival analysis. A first study of monotone densities can be attributed to Grenander (1956), who considered the maximum likelihood estimator of a monotone density. Since then many others have been interested in estimating an unknown distribution under shape restrictions. Laslett (1982) considers the problem of estimating the distribution of crack lengths on a mine wall; Sun and Woodroofe (1996) present some applications in astronomy and renewal analysis, among others. Using shape constrained procedures will ensure that the estimate follows these constraints, which could be a requirement of the analysis.
Since Williamson (1956), it is known that a density is monotone non increasing if and only if it is a mixture of uniform kernels. More precisely, let $\mathcal{F}$ be the set of monotone non increasing densities on $[0, \infty)$; then for all $f \in \mathcal{F}$ there exists a probability distribution $P$ such that
$$f(\cdot) = \int_0^\infty \frac{\mathbb{I}_{[0,\theta]}(\cdot)}{\theta}\, dP(\theta). \tag{1.8}$$
This mixture representation is particularly interesting as it allows for inference
based on the likelihood. Grenander (1956) showed that the maximum likelihood estimator coincides with the first derivative of the least concave majorant of the empirical cumulative distribution function. Its asymptotic properties were later studied in Groeneboom (1985) under the $L_1$ loss, and Prakasa Rao (1970) studied the asymptotic behaviour of the maximum likelihood estimator evaluated at a fixed point in the interior of the support. In Groeneboom (1989), it is shown that the minimax rate of convergence for this problem is of the order of $n^{-1/3}$. This shows in a way how monotonicity constraints act as regularity constraints, as in this case one obtains the same convergence rate as for Lipschitz densities. Another surprising aspect of monotone non increasing densities is that the evaluation of the maximum likelihood estimator at the boundaries of the support leads to inconsistent estimators. This problem has been studied in Sun and Woodroofe (1996), and very precise results on the behaviour of the maximum likelihood estimator at 0 can be found in Balabdaoui et al. (2011). More recently, Durot et al. (2012) obtained some asymptotic results for the maximum likelihood estimator under the supremum loss. In the Bayesian framework, monotone densities have been studied
in Brunner and Lo (1989) and Lo (1984). From a Bayesian point of view, the mixture representation (1.8) leads naturally to a mixture type prior. Choosing a prior model on $P$ in representation (1.8) naturally induces a prior on $\mathcal{F}$. This is the approach considered in Brunner and Lo (1989). In Chapter 2 we consider two types of priors on $P$, namely Dirichlet processes and finite mixtures with a random number of components. An interesting feature of these models is that the prior does not put positive mass on the Kullback-Leibler neighbourhood of the truth, and thus condition (1.5) will not hold and the standard approach based on the work of Ghosal and van der Vaart (2007) cannot be applied directly. We prove that a similar result holds when one only considers Kullback-Leibler neighbourhoods of truncated versions of the densities
$$f_n(\cdot) = \frac{f(\cdot)\, \mathbb{I}_{[0,x_n]}}{F(x_n)}, \qquad f_{\theta_0,n}(\cdot) = \frac{f_{\theta_0}(\cdot)\, \mathbb{I}_{[0,x_n]}}{F_{\theta_0}(x_n)},$$
where $x_n$ is an increasing sequence and $F$ is the cumulative distribution function of $f$. From this result, we prove that for both prior models, the posterior concentrates at the minimax rate $n^{-1/3}$ up to a $\log(n)$ term. We also study the asymptotic properties of the posterior distribution of the density at a fixed point
$x$ of its support. Standard methods are in general well suited for losses that are related to the Kullback-Leibler divergence (see Arbel et al., 2013; Hoffmann et al., 2013), which is not the case for the pointwise loss. In particular, the usual approach of LeCam (1986) for constructing exponentially consistent sequences of tests does not hold in this case. However, we prove in Chapter 2 that for the considered prior distributions the posterior distribution of $f(x)$ is consistent for every $x$ in the support of $f$, including the boundaries. The fact that the posterior distribution is consistent at the boundaries of the support when the maximum likelihood estimator is not can be attributed to the penalization induced by the prior. Another interesting feature of our Bayesian approach is that the posterior is also consistent for the supremum loss over the whole support. Here again, the supremum loss is not related to the Kullback-Leibler divergence, which makes the construction of exponentially consistent sequences of tests difficult.
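The mixture representation (1.8) and the characterisation of the maximum likelihood estimator as the slope of the least concave majorant of the empirical distribution function can be sketched as follows. This is only an illustration under stated assumptions: the mixing distribution $P$ is taken to be finitely supported, the observations are assumed positive and distinct, and the function names are hypothetical. It is not the Bayesian procedure studied in Chapter 2.

```python
import random


def sample_uniform_mixture(thetas, weights, size, rng):
    """Draw from f = sum_k w_k * Uniform[0, theta_k], a finite instance of (1.8)."""
    draws = []
    for _ in range(size):
        theta = rng.choices(thetas, weights=weights, k=1)[0]  # theta ~ P
        draws.append(rng.uniform(0.0, theta))  # X | theta ~ Uniform[0, theta]
    return draws


def grenander(sample):
    """Slopes of the least concave majorant of the empirical cdf.

    Returns (knots, slopes); the estimate equals slopes[k] on the interval
    (knots[k], knots[k + 1]]. Observations are assumed positive and distinct.
    """
    xs = sorted(sample)
    n = len(xs)
    px = [0.0] + xs  # abscissae of the empirical cdf, starting at the origin
    py = [i / n for i in range(n + 1)]  # ordinates 0, 1/n, ..., 1
    hull = [0]  # indices of the vertices of the upper hull
    for i in range(1, n + 1):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (px[b] - px[a]) * (py[i] - py[a]) \
                - (py[b] - py[a]) * (px[i] - px[a])
            if cross >= 0:  # vertex b lies on or below the chord from a to i
                hull.pop()
            else:
                break
        hull.append(i)
    knots = [px[j] for j in hull]
    slopes = [
        (py[hull[k + 1]] - py[hull[k]]) / (knots[k + 1] - knots[k])
        for k in range(len(hull) - 1)
    ]
    return knots, slopes
```

By construction the returned step function is non increasing and integrates to one, so it is itself a monotone density on the range of the data.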
1.4.1.2 Nonparametric test for monotonicity
Although there is a wide literature on the problem of estimating an unknown function under shape constraints, an important question is whether it is appropriate to impose a specific shape constraint. If it is, then the estimation procedures could in general be greatly improved by using a shape constrained estimation procedure. Conversely, imposing shape constraints in an inappropriate case could lead to dramatically erroneous results. The problem of testing for monotonicity has been widely studied in the frequentist literature. Bowman et al. (1998) introduced a test for monotonicity in the regression setting based on the idea of critical bandwidth introduced in Silverman (1981). Hall and Heckman (2000) showed that this procedure is highly sensitive to flat parts of the regression function, and proposed another test procedure based on running gradients. Baraud et al. (2003), Ghosal et al. (2000b) and Baraud et al. (2005) propose testing procedures in the fixed design regression setting and the Gaussian white noise setting. Durot (2003) and Akakpo et al. (2014) consider a test that exploits the concavity of a primitive of a monotone function. A Bayesian approach to testing monotonicity in a regression framework has been proposed in Scott et al. (2013).
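In Chapter 3, we consider the nonparametric regression model
$$Y_i = f(x_i) + \sigma \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} \mathcal{N}(0, 1), \tag{1.9}$$
and we want to test
$$H_0 : f \in \mathcal{F} \quad \text{versus} \quad H_1 : f \notin \mathcal{F}, \tag{1.10}$$
where $\mathcal{F}$ is the set of monotone non increasing functions on $[0, 1]$.
A first difficulty in testing for monotonicity in a regression setting is that both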
the null and the alternative hypotheses are nonparametric. As a general rule when testing, one should take into account the sensitivity to the prior distribution. This is true for parametric models but is critical for nonparametric ones, as in that case, as stated before, the prior can still influence the posterior asymptotically. A second and probably more important difficulty is the fact that when testing for monotonicity in a regression setting, the null hypothesis is embedded in the alternative. This problem is common in goodness of fit tests where one is interested in testing $f = f_0$ versus $f \neq f_0$. This has been investigated in Dass and Lee (2004), Ghosal et al. (2008) or McVinish et al. (2009) among others in the density setting, or Rousseau and Choi (2012) in the regression problem. In this case a main difficulty is that a parameter in the null model can also be approximated by a parameter in the alternative model. In fact it has been proved in Walker et al. (2004) that the Bayes factor will asymptotically support the model with the prior that satisfies the Kullback-Leibler property; some additional conditions may be required when both priors do.
In the case of testing for monotonicity, it seems that for a natural choice of prior, namely piecewise constant functions with a random number of bins, the Bayes factor is not consistent. We thus propose an alternative test that is asymptotically equivalent to testing for monotonicity, using an idea similar to approximating a point null hypothesis by a shrinking interval (see Rousseau, 2007). Denote by $\mathcal{F}$ the set of monotone non increasing functions with support $[0, 1]$ and let $\tilde{d}$ be a metric or a semi-metric. Consider the test
$$H_0^a : \tilde{d}(f, \mathcal{F}) \le \tau \quad \text{versus} \quad H_1^a : \tilde{d}(f, \mathcal{F}) \ge \tau, \tag{1.11}$$
where $\tilde{d}(f, \mathcal{F}) = \inf_{g \in \mathcal{F}} \tilde{d}(f, g)$ and $\tau$ is a given threshold. If $\tau$ decreases toward 0, both tests (1.10) and (1.11) are asymptotically equivalent. We propose a calibration of the threshold $\tau$ for which the Bayesian answer to the test (1.11) associated with the 0-1 loss is consistent for the initial problem of testing (1.10) and gives good results in practice compared to the frequentist procedures. Furthermore, for a specific choice of prior, the proposed Bayesian test is easy to implement, which is a great advantage compared to the existing methods.
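As an illustration of the distance $\tilde{d}(f, \mathcal{F})$ appearing in (1.11), the sketch below treats a piecewise constant function described by its bin heights $\omega_1, \dots, \omega_k$ and takes $\tilde{d}$ to be the supremum norm. Both choices are assumptions made for this example and need not coincide with those of Chapter 3. In that case the distance to the set of non increasing functions equals half of the largest upward jump, $\max_{i \le j} (\omega_j - \omega_i)/2$. The posterior probability of $H_1^a$ is then approximated from posterior draws of the heights, assumed to be supplied by a sampler that is not shown here. Function names are hypothetical.

```python
def sup_distance_to_nonincreasing(heights):
    """Sup-norm distance from a step function to the non increasing functions.

    For bin heights w_1, ..., w_k the distance is half of the largest
    upward jump max over i <= j of (w_j - w_i).
    """
    running_min = float("inf")  # smallest height seen to the left
    largest_jump = 0.0
    for h in heights:
        largest_jump = max(largest_jump, h - running_min)
        running_min = min(running_min, h)
    return largest_jump / 2.0


def posterior_prob_alternative(posterior_draws, tau):
    """Monte Carlo estimate of the posterior probability that d(f, F) > tau.

    posterior_draws is a list of height vectors, assumed to be draws from
    the posterior distribution (the sampler itself is not part of this sketch).
    """
    far = sum(
        1 for w in posterior_draws if sup_distance_to_nonincreasing(w) > tau
    )
    return far / len(posterior_draws)
```

A decision is then obtained by comparing this posterior probability with a cut-off determined by the weights of the 0-1 loss; with equal weights, a cut-off of 1/2 would be the natural choice.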
We also study the separation rate of the test, which gives insights on the efficiency of the procedure. The adaptive minimax separation rates for testing monotonicity have been derived in Baraud et al. (2005) and Dümbgen et al. (2001) over Hölder alternatives. Under similar assumptions, we prove that our procedure achieves the minimax separation rate up to a $\log(n)$ factor.
1.4.2 Ill posed linear inverse problems
Another general class of models that became popular for statistical modelling since the 1960's is the so-called inverse problems. They appear naturally when one only has indirect observations of the parameter of interest. This is the case in many fields of applications: medical imaging (computerized tomography), econometrics (instrumental variables), radio astronomy (interferometry), astronomy (blurred images of the Hubble telescope) or seismology, among many others. In the statistical setting this is modelled by considering that the data arise from a probability distribution whose parameter has been transformed by a known operator $K$ that acts on the parameter space. In most cases, we can assume that the transformation $K$ does not induce additional noise in the observations. The sampling model is thus modified to
$$X^n \sim P^n_{K\theta}, \qquad \theta \in \Theta. \tag{1.12}$$
If the operator $K$ can be inverted and if its inverse is continuous, then the general theory applies and inference on $\theta$ does not differ from the usual framework. However, in many cases, the inverse of the operator is not continuous. In this case the problem is called ill-posed with respect to Hadamard's definition, as a small noise in the data will be greatly amplified in the inference on $\theta$. An interesting class of operators which covers many applications is the class of linear operators on Hilbert spaces. It is usually assumed that the operator $K$ is compact and injective and that the Hilbert spaces are separable.
Statistical approaches to inverse problems have grown popular since the standard framework was proposed in Tikhonov (1963). A usual toy example to study such methods is the white noise model
$$X^n = K\theta + \frac{\sigma}{\sqrt{n}} W, \tag{1.13}$$
where $W$ is white noise and $\sigma > 0$ a variance parameter. In this example we can easily grasp the difficulties at hand. In Chapter 4 we treat more general inverse problem models of the form (1.12). In the following, we will recall some features of statistical inference in inverse problems and illustrate them with model (1.13).
1.4.2.1 Singular value decomposition
Consider $K$ to be a compact injective linear operator between two Hilbert spaces $\{\Theta, \langle \cdot, \cdot \rangle_\theta\}$ and $\{\Xi, \langle \cdot, \cdot \rangle_\xi\}$. For reading convenience, we shall drop the subscript for the inner product when there is no confusion. A usual approach to infer on $\theta$ is to consider its decomposition in a basis of $\Theta$. In the linear inverse problem setting, a simple choice for such a basis would be the one that diagonalizes the operator $K$. More precisely, denote by $K^*$ the adjoint operator of $K$ and suppose that the self-adjoint operator $K^*K$ is compact; then the spectral theorem states that $K^*K$ has a complete orthogonal system of eigenvectors $\{e_i\}$ with corresponding eigenvalues $\{b_i\}$. We thus have for all $\theta \in \Theta$
$$K^*K\theta = \sum_{i=1}^{\infty} b_i \langle \theta, e_i \rangle e_i = \sum_{i=1}^{\infty} \kappa_i^2 \theta_i e_i, \tag{1.14}$$
where $\kappa_i = \sqrt{b_i}$ and $\theta_i = \langle \theta, e_i \rangle$. In this case we say that $K$ admits a singular value decomposition (SVD) with singular values $\{\kappa_i\}$ and singular basis $\{e_i\}$.
Inferring on $\theta$ is thus equivalent to inferring on the infinite sequence $\{\theta_i\}$. From the observations $X^n$ from model (1.12), one can get an estimator $\hat{\eta}$ of $\eta = K\theta$. Denoting by $\{\hat{\eta}_i\}$ its projection onto the SVD basis, a simple estimator $\hat{\theta}$ of $\theta$ is given by
$$\hat{\theta}_i = \frac{\hat{\eta}_i}{\kappa_i}.$$
When the problem is ill-posed, since $\{\kappa_i\}$ goes to 0, we see that the coefficients $\theta_i$ will be over-estimated for large $i$.
To see this problem more clearly, consider the white noise example. By projecting (1.13) onto the basis $\{e_i\}$ and since $W$ is a white noise, we can rewrite the model as
$$x_i = \kappa_i \theta_i + \frac{\sigma}{\sqrt{n}} \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, 1), \quad i = 1, 2, \ldots$$
with $x_i = \langle x, e_i \rangle$. This sequence model has been a cornerstone in the study of linear inverse problems, see for instance Donoho (1995); Cavalier and Tsybakov (2002); Cavalier (2008). The case where $K$ is the identity operator (i.e. $\kappa_i = 1$ for all $i$) has been widely studied in the literature. From a Bayesian perspective, this representation is highly interesting as in this case, it is natural to consider a prior on the sequence $\{\theta_i\}$. These types of priors have been considered in Ghosal and van der Vaart (2007) when $K$ is the identity, or Knapik et al. (2011) or Agapiou et al. (2013) in the inverse problem setting. To infer on $\theta$, we consider the transformed model
$$x_i \kappa_i^{-1} = \theta_i + \kappa_i^{-1} \frac{\sigma}{\sqrt{n}} \epsilon_i, \qquad i = 1, 2, \ldots,$$
which reduces the problem to estimating the mean of an infinite Gaussian sequence. Since the problem is ill-posed, the sequence $\kappa_i^{-1} \to \infty$, hence the variance of the noise blows up.
It appears from these considerations that the difficulty of an inverse problem can be quantified by the rate at which the sequence $\{\kappa_i^{-1}\}$ goes to infinity.
Definition 1.5 (Ill-posedness). We define the degree of ill-posedness of an inverse problem as follows:
• We say that a problem is mildly ill-posed of degree $p$ if the sequence of singular values $\{\kappa_i\}$ is such that there exist constants $0 < C_d \le C_u < \infty$ such that