arXiv:1708.01974v2 [math.ST] 3 Aug 2018

Model Misspecification in ABC: Consequences and Diagnostics

David T. Frazier, Christian P. Robert, and Judith Rousseau

August 6, 2018

Abstract

We analyze the behavior of approximate Bayesian computation (ABC) when the model generating the simulated data differs from the actual data generating process; i.e., when the data simulator in ABC is misspecified. We demonstrate both theoretically and in simple, but practically relevant, examples that when the model is misspecified different versions of ABC can lead to substantially different results. Our theoretical results demonstrate that under regularity conditions a version of the accept/reject ABC approach concentrates posterior mass on an appropriately defined pseudo-true parameter value. However, under model misspecification the ABC posterior does not yield credible sets with valid frequentist coverage and has non-standard asymptotic behavior. We also examine the theoretical behavior of the popular linear regression adjustment to ABC under model misspecification and demonstrate that this approach concentrates posterior mass on a completely different pseudo-true value than that obtained by the accept/reject approach to ABC. Using our theoretical results, we suggest two approaches to diagnose model misspecification in ABC. All theoretical results and diagnostics are illustrated in a simple running example.

1 Introduction

It is now routine in the astronomical, ecological and genetic sciences, as well as in economics and finance, that the models used to describe observed data are so complex that the likelihoods associated with these models are computationally intractable. In a Bayesian inference paradigm, these settings have led to the rise of approximate Bayesian computation (ABC) methods that eschew calculation of the likelihood in favor of simulation; for reviews on ABC methods see, e.g., Marin et al. (2012) and Robert (2016).

ABC is predicated on the belief that the observed data y := (y₁, y₂, ..., y_n)′ is drawn from the class of models {θ ∈ Θ : Pθ}, where θ ∈ Θ ⊂ R^{kθ} is an unknown vector of parameters and where

Monash University, Australia. Email: david.frazier@monash.edu.
Université Paris Dauphine PSL, CEREMADE CNRS, Paris, France. Email: xian@ceremade.dauphine.fr.
Université Paris Dauphine PSL, CEREMADE CNRS, Paris, France. Email: rousseau@ceremade.dauphine.fr.


π(θ) describes our prior beliefs about θ. The goal of ABC is to conduct inference on the unknown θ by simulating pseudo-data z := (z₁, ..., z_n)⊺ ∼ Pθ and then “comparing” y and z. In most cases, this comparison is carried out using a vector of summary statistics η(·) and a metric d{·, ·}. Simulated values θ ∼ π(θ) are then accepted, and used to build an approximation to the exact posterior, if they satisfy an acceptance rule that depends on a tolerance parameter ǫ.

Algorithm 1 (ABC Algorithm)
1: Simulate $\theta^i$, i = 1, 2, ..., N, from π(θ);
2: Simulate $z^i = (z^i_1, z^i_2, ..., z^i_n)'$, i = 1, 2, ..., N, from $P_{\theta^i}$;
3: For each i = 1, ..., N, accept $\theta^i$ with probability one if $d\{\eta(z^i), \eta(y)\} \le \epsilon$, where ǫ denotes a user-chosen tolerance parameter.
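For concreteness, a minimal Python sketch of Algorithm 1 (our own illustration, not code from the paper; `prior_sampler`, `simulator` and `summary` are placeholders for user-supplied components):

```python
import numpy as np

def abc_accept_reject(y_obs, prior_sampler, simulator, summary, eps, N=50_000):
    """Basic accept/reject ABC (Algorithm 1): keep prior draws whose
    simulated summaries fall within eps of the observed summaries."""
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(N):
        theta = prior_sampler()                        # step 1: draw from the prior
        z = simulator(theta, len(y_obs))               # step 2: simulate pseudo-data
        if np.linalg.norm(summary(z) - s_obs) <= eps:  # step 3: accept/reject
            accepted.append(theta)
    return np.array(accepted)
```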

Algorithm 1 details the common accept/reject implementation of ABC, which can be augmented with additional steps to increase sampling efficiency; see, e.g., the MCMC-ABC approach of Marjoram et al. (2003), or the SMC-ABC approach of Sisson et al. (2007). Post-processing of the simulated pairs $\{\theta^i, \eta(z^i)\}$ has also been proposed as a means of obtaining more accurate posterior approximations (for reviews of ABC post-processing methods see Marin et al., 2012, and Blum et al., 2013).¹

Regardless of the ABC algorithm chosen, the very nature of ABC is such that the researcher must believe there are values of θ in the prior support that can yield simulated summaries η(z) ‘close to’ the observed summaries η(y). Therefore, in order for ABC to yield meaningful inference about θ there must exist values of θ ∈ Θ such that η(z) and η(y) are similar.

While complex models allow us to explain many features of the observed data, it is unlikely that any Pθ will be able to produce simulated data that perfectly reproduces all features of y.

In other words, by the very nature of the complex models to which ABC is applied, the class of models {θ ∈ Θ : Pθ} used to simulate pseudo-data z is likely misspecified. Even when accounting for the use of summary statistics that are not sufficient, and which might be compatible with several models, the values these summaries take for the observed data may well be incompatible with the realised values of these statistics under the model of interest.

Consequently, understanding the behavior of popular ABC approaches under model misspecification is of paramount importance for practitioners. Indeed, as the following example illustrates, when the model is misspecified the behavior of popular ABC approaches can vary drastically.

Example 1: To demonstrate the impact of model misspecification in ABC, we consider an artificially simple example where the assumed data generating process (DGP) is z ∼ N(θ, 1) but the actual DGP is y ∼ N(θ, σ̃²). That is, for σ̃² ≠ 1, the DGP for z maintains an incorrect assumption about the variance of y and thus differs from the actual DGP for y. We consider as the basis of our ABC analysis the following summary statistics:

• the sample mean $\eta_1(y) = \frac{1}{n}\sum_{i=1}^n y_i$;
• the centered summary $\eta_2(y) = \frac{1}{n-1}\sum_{i=1}^n \{y_i - \eta_1(y)\}^2 - 1$.

¹In particular, ABC post-processing approaches follow steps (1)–(2) in Algorithm 1 but replace step (3) with a step that accepts all values generated in step (2) and re-weights them according to some criterion. For example, in the case of linear regression adjustment, θ ∼ π(θ) is replaced by $\tilde\theta = \theta - \hat\beta^\top\{\eta(z) - \eta(y)\}$, with $\hat\beta$ obtained by a (weighted) least squares regression of θ on η(z) − η(y).


For this experiment we consider two separate versions of ABC: the accept/reject approach, where we take d{x, y} = ‖x − y‖ to be the Euclidean norm, and a post-processing ABC approach that uses a weighted linear regression adjustment step in place of the selection step in Algorithm 1. We refer to these approaches as ABC-AR and ABC-Reg, respectively.

To demonstrate how these ABC approaches react to model misspecification, we fix θ = 1 and simulate “observed data sets” y according to different values of σ̃². We consider one hundred simulated data sets for y, each corresponding to a different value of σ̃², with σ̃² taking values from 0.5 to 5 in evenly spaced increments. Across all the data sets we fix the random numbers used to generate the simulated data and only change the value of σ̃², to isolate the impact of model misspecification; i.e., we generate one common set of random numbers νᵢ ∼ N(0, 1), i = 1, ..., n, for all data sets, and then for a given value of σ̃² we generate observed data from yᵢ = 1 + νᵢ · σ̃. The sample size across the experiments is taken to be n = 50.
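A minimal sketch of this common-random-numbers design (our code; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
nu = rng.standard_normal(n)               # one common set of nu_i ~ N(0, 1)
sigma2_grid = np.linspace(0.5, 5.0, 100)  # 100 levels of misspecification

# Each "observed" data set reuses nu; only sigma^2 changes across data sets,
# isolating the effect of misspecification: y_i = 1 + nu_i * sigma_tilde.
datasets = [1.0 + nu * np.sqrt(s2) for s2 in sigma2_grid]
```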

Figure 1 compares the posterior mean, E_Π[θ|η(y)], of ABC-AR and ABC-Reg across different values of σ̃². The results demonstrate that misspecification in ABC can have drastic consequences, even at relatively small sample sizes.² Two useful conclusions can be drawn from Figure 1: one, the performance of the ABC-AR procedure remains stable regardless of the level of misspecification; two, the behavior of the linear regression adjustment approach to ABC, ABC-Reg, becomes volatile even at relatively small levels of misspecification. We formally explore these issues in Sections two and three but note here that when σ̃² ≈ 1 (i.e., correct model specification) the two ABC approaches give similar results.

Figure 1: Comparison of posterior means E_Π[θ|η(y)] for ABC-AR and ABC-Reg across varying levels of model misspecification, σ̃² ∈ [0.5, 5] (n = 50, N = 50,000, true value θ = 1). Note that the ABC-Reg curve is truncated: after a certain point all posterior means continued on the same trajectory and hence are not reported. Both ABC approaches used N = 50,000 simulated data sets generated according to $z^j_i \sim N(\theta^j, 1)$, with $\theta^j \sim N(0, 25)$. ABC-AR retained draws that yielded ‖η(y) − η(zʲ)‖ in the $\alpha_n = n^{-5/9}$ quantile. The bandwidth for ABC-Reg was taken as $n^{-5/9}$.

²Even though the DGP for z is misspecified, because of the nature of the model misspecification and the limiting behavior of η(y), if one were to use only the first summary statistic (the sample mean), model misspecification would have little impact in this example. However, in general both the nature of the model misspecification and the precise limiting form of η(y) are unknown. Therefore, choosing a set of summaries that can mitigate the impact of model misspecification will be difficult, if not impossible, in practical applications of ABC.


It is interesting to note that the linear regression post-processing approach behaves poorly under misspecification. This is particularly noteworthy since the post-processing linear regression adjustment approach has theoretical advantages over the standard ABC-AR approach, i.e., Algorithm 1, when the model is correctly specified; see Li and Fearnhead (2018a) for details.

In the remainder, we elaborate on the above issues and rigorously characterize the asymptotic behavior of ABC when the model generating the pseudo-data is misspecified. In Section two, we consider model misspecification in the ABC context and demonstrate that, under model misspecification and for a certain choice of the tolerance, the posterior associated with Algorithm 1 concentrates all mass on an appropriately defined pseudo-true value. In addition, we find that the asymptotic shape of the ABC posterior is non-standard under model misspecification, and will lead to credible sets with arbitrary levels of coverage. Section three demonstrates that, under model misspecification, the regression adjustment ABC approach yields a posterior that concentrates posterior mass on a completely different region of the parameter space than ABC based on Algorithm 1. We then use these theoretical results to devise an alternative regression adjustment approach that performs well regardless of model specification. Motivated by our asymptotic results, in Section four we develop two model misspecification detection procedures: a graphical detection approach based on comparing acceptance probabilities from Algorithm 1, and an approach based on comparing the output from Algorithm 1 and its linear regression adjustment counterpart. Proofs of all theoretical results are contained in the appendix.

2 Model Misspecification in ABC

2.1 On the Notion of Model Misspecification in ABC

Let y denote the observed data and define P₀ to be the true distribution generating y. Let P := {θ ∈ Θ ⊂ R^{kθ} : Pθ} be the class of model-implied distributions used in ABC to simulate pseudo-data, so that z ∼ Pθ; Z represents the space of simulated data; η(y) = (η₁(y), ..., η_{kη}(y))′ is a kη-dimensional vector of summary statistics; B := {η(z) : z ∈ Z} ⊂ R^{kη} is the range of the simulated summaries; d₁{·, ·} is a metric on Θ; d₂{·, ·} is a metric on B. When no confusion will result we simply denote a generic metric by d{·, ·}. Π(θ) denotes a prior measure and π(θ) its corresponding density.

In likelihood-based inference, model misspecification means that P₀ ∉ P. The result is that the Kullback–Leibler divergence satisfies
\[ \inf_{\theta\in\Theta} D(P_0\|P_\theta) = \inf_{\theta\in\Theta} \int \log\left\{\frac{dP_0(y)}{dP_\theta(y)}\right\} dP_0(y) > 0, \]
and
\[ \theta^* = \arg\inf_{\theta\in\Theta} D(P_0\|P_\theta) \]
is defined to be the pseudo-true value. Under regularity conditions, Bayesian procedures predicated on the likelihood of Pθ yield posteriors that concentrate on θ∗; see, e.g., Kleijn and van der Vaart (2012) and Müller (2013).

In the ABC setting we again maintain that the observed sample is generated according to y ∼ P₀. However, in contrast to previous research on ABC, we are explicitly interested in the case where P₀ ∉ P. Because ABC algorithms are not based on the full data y but on two types of approximations, the summary statistics η(y) and the threshold ǫ, even if P₀ ∉ P the model class P may be capable of generating a simulated summary that is compatible with the observed summary η(y), or is within an ǫ-neighbourhood of η(y). Therefore, the approximate nature of ABC means that D(P₀‖Pθ) does not yield a meaningful notion of model misspecification associated with the output of an ABC algorithm, or with ABC posterior distributions.

Recalling that the ABC posterior measure is given by, for A ⊂ Θ,
\[ \Pi_\epsilon[A\mid\eta(y)] = \frac{\int_A P_\theta\left[d\{\eta(y), \eta(z)\} \le \epsilon\right] d\Pi(\theta)}{\int_\Theta P_\theta\left[d\{\eta(y), \eta(z)\} \le \epsilon\right] d\Pi(\theta)}, \]
we see that misspecification in ABC will be driven by the behavior of η(y), η(z) and the set {θ ∈ Θ : d{η(y), η(z)} ≤ ǫ}. To rigorously formulate the notion of model misspecification associated with the output of a given ABC algorithm, we must study the limiting behaviour of the ABC likelihood Pθ[d{η(y), η(z)} ≤ ǫ] as the amount of information in the data accumulates.

To this end, we follow the framework of Marin et al. (2014), Frazier et al. (2018) and Li and Fearnhead (2018b), where it is assumed that the summary statistics concentrate around some fixed value, namely b₀ under P₀ and b(θ) under Pθ. In Marin et al. (2014), the authors study the case where ǫ = 0, while Frazier et al. (2018) and Li and Fearnhead (2018b) study ǫ > 0 but allow ǫ to vary with n, setting ǫ = ǫ_n. In the latter two papers the authors demonstrate that the amount of information ABC obtains about a given θ depends on: (1) the rate at which the observed (resp. simulated) summaries converge to a well-defined limit counterpart b₀ (resp. b(θ)); (2) the rate at which the tolerance ǫ_n (or bandwidth) goes to zero; (3) the link between b₀ and b(θ). When P₀ ∈ P, there exists some θ₀ such that b(θ₀) = b₀, and the results of Frazier et al. (2018) completely characterize the asymptotic behaviour of the ABC posterior distribution. Furthermore, this analysis remains correct even if P₀ ∉ P, so long as there exists some θ₀ ∈ Θ such that b₀ = b(θ₀).

Therefore, the meaningful concept of model misspecification in ABC is when there does not exist any θ₀ ∈ Θ satisfying b₀ = b(θ₀), which is precisely the notion of model incompatibility defined in Marin et al. (2014). Throughout the remainder, we say that the model is (ABC) misspecified if
\[ \epsilon^* = \inf_{\theta\in\Theta} d\{b_0, b(\theta)\} > 0, \tag{1} \]
and we note here that this condition is more likely to occur when kθ < kη.
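To fix ideas, consider the running normal example with the two summaries introduced above (the following computation is ours): under z ∼ N(θ, 1) the summaries satisfy b(θ) = (θ, 0)⊺, while under y ∼ N(θ₀, σ̃²) they converge to b₀ = (θ₀, σ̃² − 1)⊺, so that, for the Euclidean metric,
\[ \epsilon^* = \inf_{\theta\in\Theta}\sqrt{(\theta - \theta_0)^2 + (\tilde\sigma^2 - 1)^2} = |\tilde\sigma^2 - 1|, \]
attained at θ = θ₀; the model is thus ABC-misspecified in the sense of (1) precisely when σ̃² ≠ 1.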

Heuristically, the implication of misspecification in ABC is that, under regularity and since
\[ d\{\eta(y), \eta(z)\} \ge d\{b_0, b(\theta)\} - o_{P_\theta}(1) - o_{P_0}(1) \ge \epsilon^* - o_p(1), \quad \text{for all } \epsilon_n = o(1), \]
the event {θ ∈ Θ : d{η(y), η(z)} ≤ ǫ_n} becomes extremely rare, and corresponds to d{η(z), b(θ)} > ǫ∗ − o(1). Therefore, for a sequence of tolerances ǫ_n = o(1), or even if ǫ_n < ǫ∗, no draws of θ will be selected regardless of how many simulated samples from π(θ) we generate, and Πǫ[A|η(y)] will be ill-behaved.

While the above suggests that tolerances satisfying ǫ_n = o(1) are untenable under misspecification, it is possible that other choices for ǫ_n will produce a well-behaved posterior. In the following section we show that (certain) tolerance sequences satisfying ǫ_n → ǫ∗, as n → +∞, yield well-behaved posteriors that concentrate posterior mass on the pseudo-true value θ∗.

2.2 ABC Posterior Concentration Under Misspecification

Building on the intuition in the previous section, in this and the following section we rigorously characterize the asymptotic behaviour of
\[ \Pi_\epsilon[A\mid\eta(y)] = \frac{\int_A P_\theta\left[d\{\eta(y),\eta(z)\} \le \epsilon_n\right] d\Pi(\theta)}{\int_\Theta P_\theta\left[d\{\eta(y),\eta(z)\} \le \epsilon_n\right] d\Pi(\theta)} \]
when P₀ ∉ P and ǫ∗ > 0, with ǫ∗ defined by (1). To do so, we first define the following: for real-valued sequences {a_n} and {b_n}, a_n ≲ b_n denotes a_n ≤ C b_n for some C > 0, a_n ≍ b_n denotes equivalent orders of magnitude, a_n ≫ b_n indicates a larger order of magnitude, and the symbols o_P(a_n), O_P(b_n) have their usual meaning.

We consider the following assumptions.

[A0] d{η(y), b₀} = o_{P₀}(1) and there exists a positive sequence v_{0,n} → +∞ such that
\[ \liminf_{n\to+\infty} P_0\left( d\{\eta(y), b_0\} \le v_{0,n}^{-1} \right) = 1. \]

[A1] There exist a continuous, injective map b : Θ → B ⊂ R^{kη} and a function ρ_n(·) satisfying ρ_n(u) → 0 as n → +∞ for all u > 0, and ρ_n(u) monotone non-increasing in u (for any given n), such that, for all θ ∈ Θ,
\[ P_\theta\left[ d\{\eta(z), b(\theta)\} > u \right] \le c(\theta)\rho_n(u), \qquad \int_\Theta c(\theta)\, d\Pi(\theta) < +\infty, \]
where z ∼ Pθ, and we assume either of the following:

(i) Polynomial deviations: there exist a positive sequence v_n → +∞ and u₀, κ > 0 such that ρ_n(u) = v_n^{-κ} u^{-κ}, for u ≤ u₀.

(ii) Exponential deviations: there exists h_θ(·) > 0 such that $P_\theta[d\{\eta(z), b(\theta)\} > u] \le c(\theta)e^{-h_\theta(u v_n)}$, and there exist c, C > 0 such that
\[ \int_\Theta c(\theta) e^{-h_\theta(u v_n)}\, d\Pi(\theta) \le C e^{-c(u v_n)^\tau}, \quad \text{for } u \le u_0. \]

[A2] There exist D > 0 and M₀, δ₀ > 0 such that, for all δ₀ ≥ δ > 0 and M ≥ M₀, there exists S_δ ⊂ {θ ∈ Θ : d{b(θ), b₀} − ǫ∗ ≤ δ} for which:

(i) in case (i) of [A1], D < κ and
\[ \int_{S_\delta} \left(1 - \frac{c(\theta)}{M^{\kappa}}\right) d\Pi(\theta) \gtrsim \delta^D; \]

(ii) in case (ii) of [A1],
\[ \int_{S_\delta} \left(1 - c(\theta)e^{-h_\theta(M)}\right) d\Pi(\theta) \gtrsim \delta^D. \]

We then have the following result.

Theorem 1. Assume that the data generating process for y satisfies [A0] and that (1) holds. Assume also that conditions [A1] and [A2] are satisfied and that ǫ_n ↓ ǫ∗ with
\[ \epsilon_n \ge \epsilon^* + M v_n^{-1} + v_{0,n}^{-1} \]
for M large enough. Let M_n be any positive sequence going to infinity and δ_n ≥ M_n max{ǫ_n − ǫ∗, v_{0,n}^{-1}, v_n^{-1}}; then
\[ \Pi_\epsilon\left[ d\{b(\theta), b_0\} \ge \epsilon^* + \delta_n \mid \eta(y) \right] = o_{P_0}(1), \tag{2} \]
as soon as
\[ \delta_n \ge M_n v_n^{-1} u_n^{-D/\kappa} = o(1) \quad \text{in case (i) of assumption [A1]}, \]
\[ \delta_n \ge M_n v_n^{-1} |\log(u_n)|^{1/\tau} = o(1) \quad \text{in case (ii) of assumption [A1]}, \]
with u_n = ǫ_n − (ǫ∗ + M v_n^{-1} + v_{0,n}^{-1}).

Remark 1. Theorem 1 gives conditions under which the ABC posterior concentrates on arg min_{θ∈Θ} d{b(θ), b₀}, at least under the assumption that ǫ_n is slightly larger than ǫ∗. Under the more precise framework of Theorem 2, where the asymptotic shape of the posterior distribution is studied, this condition can be refined to allow ǫ_n to be slightly smaller than ǫ∗. However, if ǫ∗ − ǫ_n is bounded below by a positive constant, then the posterior distribution does not necessarily concentrate.

Corollary 1. Assume the hypotheses of Theorem 1 are satisfied and that θ∗ ∈ Θ uniquely satisfies
\[ \theta^* = \arg\inf_{\theta\in\Theta} d\{b_0, b(\theta)\}. \]
Then, for any δ > 0, Πǫ[d₁{θ, θ∗} > δ | η(y)] = o_{P₀}(1).

Remark 2. Theorem 1 and Corollary 1 demonstrate that Πǫ[·|η(y)] concentrates on θ∗ if the model is misspecified. Therefore, Theorem 1 is an extension of Theorem 1 in Frazier et al. (2018) to the case of misspecified models. In addition, we note that Theorem 1 above is similar to Theorem 4.3 in Bernton et al. (2017) for ABC inference based on the Wasserstein distance.

Remark 3. It is crucial to note that the pseudo-true value θ∗ depends directly on the choice of d₂{·, ·}. Indeed, ABC based on two different metrics d₂{·, ·} and d̃₂{·, ·} will produce different pseudo-true values, unless by happenstance arg inf_{θ∈Θ} d₂{b(θ), b₀} and arg inf_{θ∈Θ} d̃₂{b(θ), b₀} coincide. This lies in stark contrast to the posterior concentration result in Frazier et al. (2018), which demonstrated that, under correct model specification, Πǫ[·|η(y)] concentrates on the true value θ₀ irrespective of the particular metric chosen.

2.3 Shape of the Asymptotic Posterior Distribution

In this section, we analyse the asymptotic shape of the ABC posterior under model misspecification. For simplicity, we take the rate at which the simulated and observed summaries converge to their limit counterparts to be the same, i.e., we take v_{0,n} = v_n, and we consider as the distance d{η(z), η(y)} = ‖η(z) − η(y)‖, where ‖·‖ is the norm associated with a given scalar product ⟨·, ·⟩. Denote by I_k the (k × k)-dimensional identity matrix and let
\[ \Phi_k(B) = \Pr\left[ N(0, I_k) \in B \right] \]
for any measurable subset B of R^k.

The following conditions are needed to establish the results of this section.

[A0′] Assumption [A0] is satisfied, ǫ∗ = inf_{θ∈Θ} d{b(θ), b₀} > 0, and θ∗ = arg inf_{θ∈Θ} ‖b(θ) − b₀‖ exists and is unique.

[A1′] Assumption [A1] holds and, for some positive-definite matrix Σ_n(θ∗), c₀ > 0, κ > 1 and δ > 0, for all ‖θ − θ∗‖ ≤ δ,
\[ P_\theta\left[ \|\Sigma_n(\theta^*)\{\eta(z) - b(\theta)\}\| > u \right] \le c_0 u^{-\kappa} \quad \text{for all } 0 < u \le \delta v_n. \]

[A3] The map θ ↦ b(θ) is twice continuously differentiable at θ∗ and the Jacobian ∇_θ b(θ∗) has full column rank kθ. The Hessian of ‖b(θ) − b₀‖² evaluated at θ∗, denoted by H∗, is a positive-definite matrix.

[A4] There exists a sequence of (kη × kη) positive-definite matrices Σ_n(θ) such that for all M > 0 there exists u₀ > 0 for which
\[ \sup_{|x|\le M}\; \sup_{\|\theta-\theta^*\|\le u_0} \left| P_\theta\left( \langle Z_n, e \rangle \le x \right) - \Phi(x) \right| = o(1), \]
where Z_n = Σ_n(θ){η(z) − b(θ)} and e = (b(θ∗) − b₀)/‖b(θ∗) − b₀‖.

[A5] There exist v_n going to infinity and u₀ > 0 such that, for all ‖θ − θ∗‖ ≤ u₀, the sequence of functions θ ↦ Σ_n(θ) v_n^{-1} converges to some positive-definite matrix A(θ) and is equicontinuous at θ∗.

[A6] π(θ), the density of the prior measure Π(θ), is continuous and positive at θ∗.

[A7] For Z_n⁰ = Σ_n(θ∗){η(y) − b₀} and all M_n going to infinity,
\[ P_0\left( \|Z_n^0\| > M_n \right) = o(1). \]

Theorem 2. Assume that [A0′], [A1′] (with κ ≥ kθ), [A2] and [A3]–[A7] are satisfied. We then have the following results.

(i) If lim_n v_n(ǫ_n − ǫ∗) = 2c, with c ∈ R, then, for ‖·‖_{TV} the total-variation norm,
\[ \left\| \Pi_{v_n^{1/2},\epsilon} - Q_c \right\|_{TV} = o_{P_0}(1), \]
where $\Pi_{v_n^{1/2},\epsilon}$ denotes the ABC posterior distribution of $v_n^{1/2}(\theta - \theta^*)$ and Q_c has density q_c with respect to Lebesgue measure on R^{kθ}, proportional to
\[ q_c(x) \propto \Phi\left( \frac{c - \langle Z_n^0, A(\theta^*)e \rangle\, \epsilon^*}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right). \]

(ii) If lim_n v_n(ǫ_n − ǫ∗) = +∞ with u_n = ǫ_n − ǫ∗ = o(1), then, for $U_{\{\|x\|\le M\}}$ the uniform measure over the set {‖x‖ ≤ M},
\[ \left\| \Pi_{u_n^{-1},\epsilon} - U_{\{x^\top H^* x \le 2\}} \right\|_{TV} = o_{P_0}(1), \]
where $\Pi_{u_n^{-1},\epsilon}$ denotes the ABC posterior distribution of $u_n^{-1}(\theta - \theta^*)$.

Remark 4. As is true in the case where the model is correctly specified, if ǫ_n is too large, which here means that (ǫ_n − ǫ∗) ≫ 1/v_n, then the asymptotic distribution of the ABC posterior is uniform with a radius of order ǫ_n − ǫ∗. In contrast to the case of correct model specification, if ǫ∗ > 0 and if v_n(ǫ_n − ǫ∗) → 2c ∈ R, then the limiting distribution is no longer Gaussian. Moreover, this result holds even if c = 0.

Remark 5. In likelihood-based Bayesian inference, credible sets are not generally valid confidence sets if the model is misspecified (see, e.g., Kleijn and van der Vaart, 2012, and Müller, 2013). However, it remains true in likelihood-based settings that the resulting posterior is still asymptotically normal. In the case of ABC, not only will credible sets not be valid confidence sets, but the asymptotic shape of the ABC posterior is not even Gaussian.

Remark 6. In practice ǫ∗ is unknown, and it is therefore not possible to choose ǫ_n directly. However, we note that ABC is most often implemented by accepting draws of θ within some pre-specified (and asymptotically shrinking) quantile threshold; i.e., one accepts a simulated draw $\theta^i$ if $d\{\eta(z^i), \eta(y)\}$ is smaller than the α-th empirical quantile of the simulated values $d\{\eta(z^j), \eta(y)\}$, j ≤ N. However, as discussed in Section 6 of Frazier et al. (2018), the two representations of the ABC approach are dual in the sense that choosing a value of α of order $\delta v_n^{-k_\theta}$, with δ small, corresponds to choosing $|\epsilon_n - \epsilon^*| \lesssim \delta^{1/k_\theta} v_n^{-1}$, and choosing $\alpha_n \gtrsim M v_n^{-k_\theta}$ corresponds to choosing $\epsilon_n - \epsilon^* \gtrsim M v_n^{-1}$. We further elaborate on the equivalence between the two approaches in Section 4.1.
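In code, the quantile-based implementation described above amounts to the following (a sketch under our own naming; `dists` collects, as a NumPy array, the N distances d{η(zⁱ), η(y)} from the reference table):

```python
import numpy as np

def accept_by_quantile(thetas, dists, alpha):
    """Quantile-threshold ABC: keep draws whose distance falls below the
    alpha-quantile of all simulated distances; the quantile itself plays
    the role of the implied tolerance eps_n."""
    eps_n = np.quantile(dists, alpha)
    return thetas[dists <= eps_n], eps_n
```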

Interestingly, the proof of Theorem 2 demonstrates that if v_n(ǫ_n − ǫ∗) → −∞, in particular when ǫ_n = o(1) and ǫ∗ > 0, posterior concentration of Πǫ[·|η(y)] need not occur. We present an illustration of this phenomenon in the following simple example.

Example 2: Consider the case where kθ = 1 and kη = 2. Let Z̃_y = √n(η(y) − b₀) and Z̃_n = √n(η(z) − b(θ)), where Z̃_n ∼ N(0, v_θ² I₂), for v_θ some known function of θ, and b(θ) = (θ, θ)⊺. In addition, assume that b₀ = (b̄₀, −b̄₀)⊺, with b̄₀ ≠ 0. Under this setting, and when ‖·‖ is the Euclidean norm, it follows that the unique pseudo-true value is θ∗ = 0. However, depending on v_θ, the approximate posterior need not concentrate on θ∗ = 0. This is summarized in the following proposition.

Proposition 1. In the setup described above, if v_θ/v_{θ∗} = σ(θ) for some known function σ(·) that is continuous and satisfies σ(b̄₀/2)² ≥ 3, and if the prior has positive and continuous density on [−b̄₀, b̄₀], then
\[ \Pi_\epsilon\left\{ |\theta - \theta^*| \le \delta \mid \eta(y) \right\} = o\left( \Pi_\epsilon\left\{ |\theta - \bar b_0/2| \le \delta \mid \eta(y) \right\} \right) = o(1). \]

3 Regression Adjustment under Misspecification

3.1 Posterior Concentration

Linear regression post-processing methods are a common means of adjusting the ABC output. First proposed by Beaumont et al. (2002), these methods have found broad applicability among ABC practitioners.

However, as demonstrated in the introductory example, we caution against the blind application of these post-processing methods when one is willing to entertain the idea of model misspecification. In particular, the use of post-processing steps in ABC can lead to point estimators that have very different behavior than those obtained from Algorithm 1, even in small samples.

In this section, we rigorously characterize posterior concentration of the linear regression adjustment ABC approach (hereafter, ABC-Reg) under model misspecification. For simplicity, we only consider the case of scalar θ; however, we allow η(y) to be multi-dimensional.³

We consider an ABC-Reg approach that first runs Algorithm 1, with tolerance ǫ_n, to obtain a set of selected draws and summaries $\{\theta^i, \eta(z^i)\}$, and then uses a linear regression model to predict the accepted values of θ. The accepted value $\theta^i$ is artificially related to η(y) and η(zⁱ) through the linear regression model
\[ \theta^i = \mu + \beta^\top\{\eta(y) - \eta(z^i)\} + \nu_i, \]
where ν_i is the model residual. Define $\bar\theta = \sum_{i=1}^N \theta^i/N$ and $\bar\eta = \sum_{i=1}^N \eta(z^i)/N$. ABC-Reg defines the adjusted parameter draw according to
\[ \tilde\theta^i = \theta^i + \hat\beta^\top\{\eta(y) - \eta(z^i)\}, \]
\[ \hat\beta = \left[ \frac{1}{N}\sum_{i=1}^N \{\eta(z^i) - \bar\eta\}\{\eta(z^i) - \bar\eta\}^\top \right]^{-1} \left[ \frac{1}{N}\sum_{i=1}^N \{\eta(z^i) - \bar\eta\}\{\theta^i - \bar\theta\} \right] = \widehat{\mathrm{Var}}\{\eta(z^i)\}^{-1}\, \widehat{\mathrm{Cov}}\{\eta(z^i), \theta^i\}. \]
Therefore, for $\theta^i \sim \Pi_\epsilon[\theta\mid\eta(y)]$, the posterior measure for $\tilde\theta^i$ is nothing but a scaled and shifted version of Πǫ[·|η(y)]. Consequently, the asymptotic behavior of the ABC-Reg posterior, denoted by Π̃ǫ[·|η(y)], is determined by the behavior of Πǫ[·|η(y)], β̂, and {η(y) − η(zⁱ)}.

Corollary 2. Assume that [A0′], [A1] and [A2] are satisfied and that ǫ_n ↓ ǫ∗ with
\[ \epsilon_n \ge \epsilon^* + M v_n^{-1} + v_{0,n}^{-1} \]
for M large enough. Furthermore, assume that, for some β₀ with ‖β₀‖ > 0, $\|\hat\beta - \beta_0\| = o_{P_\theta}(1)$. Define
\[ \tilde\theta^* = \theta^* + \beta_0^\top\{b(\theta^*) - b_0\}. \]
Then, for any δ > 0,
\[ \tilde\Pi_\epsilon\left[ |\theta - \tilde\theta^*| > \delta \mid \eta(y) \right] = o_{P_0}(1), \]
as soon as
\[ \rho_n(\epsilon_n - \epsilon^*) \ge (\epsilon_n - \epsilon^*)^{-D/\kappa} \quad \text{in case (i)}, \qquad \rho_n(\epsilon_n - \epsilon^*) \ge |\log(\epsilon_n - \epsilon^*)|^{1/\tau} \quad \text{in case (ii)}. \]

³This result can be extended at the cost of more complicated arguments, but we refrain from doing so to simplify the interpretation of our results.

Remark 7. An immediate consequence of Theorem 1 and Corollary 2 is that Πǫ[·|η(y)] concentrates posterior mass on
\[ \theta^* = \arg\min_{\theta\in\Theta} d\{b(\theta), b_0\}, \]
while Π̃ǫ[·|η(y)] concentrates posterior mass on
\[ \tilde\theta^* = \theta^* + \beta_0^\top\{b(\theta^*) - b_0\}. \]
It is also important to realize that, for ‖β₀‖ large, the pseudo-true value θ̃∗ can lie outside Θ. Therefore, if the model is misspecified, the ABC-Reg procedure can return parameter values that do not have a sensible interpretation in terms of the assumed model.

Remark 8. An additional consequence of Theorem 1 and Corollary 2 is that Πǫ[·|η(y)] and Π̃ǫ[·|η(y)] yield different posterior expectations. We use this point in the next section to derive a procedure for detecting model misspecification.

3.2 Adjusting Regression Adjustment

The difference between accept/reject ABC and ABC-Reg under model misspecification is related to the regression adjustment's re-centering of the accepted draws $\theta^i$ by $\hat\beta^\top\{\eta(y) - \eta(z^i)\}$. Whilst useful under correct model specification, when the model is misspecified the adjustment can force $\theta^i$ away from θ∗ and towards θ̃∗, which need not lie in Θ.

The cause of this behavior is the inability of η(z) to replicate the asymptotic behavior of η(y), which in the terminology of Marin et al. (2014) means that the model is incompatible with the observed summaries. This incompatibility of the summary statistics ensures that the influence of the centering term $\hat\beta^\top\{\eta(y) - \eta(z)\}$ can easily dominate that of the accepted draws $\theta^i$, with the introductory example being just one instance of this behavior (an additional example is given in Section 4.3).

In an attempt to maintain the broad applicability of linear regression adjustment in ABC, while still ensuring it gives sensible results under model misspecification, we propose a useful modification of the regression adjustment approach. To motivate this modification, recall that, under correct model specification and regularity conditions, at first order the regression adjustment approach ensures (see Theorem 4 in Frazier et al., 2018):
\[ \tilde\theta^i = \theta^i + \hat\beta^\top\{\eta(y) - \eta(z)\} = \theta^i + \hat\beta^\top\{b_0 - b(\theta^i)\} + O_p(1/v_n) \]
\[ = \theta^i - \left\{\nabla_\theta b(\theta^*)^\top V_0^{-1} \nabla_\theta b(\theta^*)\right\}^{-1} \nabla_\theta b(\theta^*)^\top V_0^{-1} \nabla_\theta b(\bar\theta)\,(\theta^i - \theta^*) + O_p(1/v_n), \tag{3} \]
where b₀ = b(θ∗) by correct model specification, θ̄ is an intermediate value satisfying |θ̄ − θ∗| ≤ |θⁱ − θ∗|, V₀ denotes the limiting covariance matrix of the summaries, and the final equality follows from a mean-value expansion. Therefore, it follows from (3) that, even if kη > kθ, the dimension of η(y) will not affect the asymptotic variance of the ABC-Reg posterior mean. This result, at least in part, helps explain (from a technical standpoint) the popularity of the ABC-Reg approach as a dimension reduction method. However, under model misspecification, b₀ ≠ b(θ) for any θ ∈ Θ, and hence the intermediate value θ̄ will be such that
\[ b_0 - b(\theta^i) \ne \nabla_\theta b(\bar\theta)(\theta^* - \theta^i). \]
As a consequence, equation (3) cannot be valid (in general) if the model is misspecified.

The behavior of ABC-Reg under correct and incorrect model specification suggests that the method's poor behavior under the latter can be mitigated by replacing η(y) with an appropriate term. To this end, define θ̂ = E_Π[θ|η(y)] to be the posterior mean of standard ABC; let ẑᵐ, m = 1, ..., M, be pseudo-data sets of length n simulated under the assumed DGP at the value θ̂; and define
\[ \hat\eta = \frac{1}{M}\sum_{m=1}^M \eta(\hat z^m). \]
Using η̂, we can then implement the regression adjustment approach
\[ \breve\theta^i = \theta^i + \hat\beta^\top\{\hat\eta - \eta(z^i)\}. \]
The key to this approach is that under correct specification η̂ behaves like η(y), while under incorrect specification η̂ behaves like η(z). A direct consequence of this construction is that the approach avoids the incompatibility issue that arises from model misspecification.
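A sketch of the modified adjustment (our own illustration; `simulator` and `summary` are as in the earlier sketches, and the number M of pseudo-data sets drawn at θ̂ is our choice):

```python
import numpy as np

def regression_adjust_new(thetas, summaries, simulator, summary, n, M=100):
    """Modified adjustment (ABC-Reg-New, sketched): replace eta(y) with
    eta_hat, the average summary of M pseudo-data sets of length n
    simulated at the ABC-AR posterior mean theta_hat."""
    theta_hat = thetas.mean()                        # E_Pi[theta | eta(y)]
    eta_hat = np.mean([summary(simulator(theta_hat, n)) for _ in range(M)],
                      axis=0)
    design = np.column_stack([np.ones(len(thetas)), summaries])
    coef, *_ = np.linalg.lstsq(design, thetas, rcond=None)
    beta_hat = coef[1:]                              # slope of theta on eta(z)
    # theta_breve^i = theta^i + beta_hat' {eta_hat - eta(z^i)}
    return thetas + (eta_hat - summaries) @ beta_hat
```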

To demonstrate the robustness of this new regression adjustment approach to ABC, we return to the simple normal example.

Example 1 (Continued): The assumed DGP is z ∼ N(θ, 1) but the actual DGP is y ∼ N(1, σ̃²). ABC is conducted using the following summary statistics:
• the sample mean $\eta_1(y) = \frac{1}{n}\sum_{i=1}^n y_i$;
• the centered summary $\eta_2(y) = \frac{1}{n-1}\sum_{i=1}^n \{y_i - \eta_1(y)\}^2 - 1$.
We consider two different DGPs corresponding to σ̃² = 1 and σ̃² = 2, which respectively correspond to correct and incorrect specification for the DGP of z. For each of these two cases we generate 100 artificial samples for y of length n = 100 and apply three different ABC approaches: ABC-AR, ABC-Reg, and our new ABC-Reg procedure (referred to as ABC-Reg-New). Each procedure relies on N = 50,000 pseudo-data sets generated according to z ∼ N(θ, 1); for ABC-AR we take d{·, ·} to be the Euclidean norm ‖·‖; and we retain draws that yield ‖η(y) − η(zⁱ)‖ in the $\alpha_n = n^{-5/9}$ quantile.

Figure 2 plots the posterior mean of each approach across the Monte Carlo replications and across both designs. The results demonstrate that the new regression adjustment maintains stable performance across both correct and incorrect model specification. Table 1 reports the corresponding coverage and average credible set length across the two Monte Carlo designs. Jointly, these results demonstrate that under correct specification the ABC-Reg-New approach performs just as well as the other ABC approaches in terms of accuracy and precision, while under model misspecification the procedure behaves similarly to ABC-AR.

Figure 2: Posterior mean comparison of ABC-AR (AR), standard regression adjustment (Reg) and the new regression adjustment approach (Reg-New) across the two Monte Carlo designs. σ̃² = 1 (resp. σ̃² = 2) corresponds to correct (resp. incorrect) model specification.

Before concluding, we compare the Monte Carlo coverage of these methods across the two Monte Carlo designs. The results in Table 1 demonstrate that ABC-Reg and ABC-Reg-New give much shorter credible sets than ABC-AR on average. However, when the model is misspecified, this behavior gives researchers a false sense of the procedures' precision, which is reflected by the poor coverage rates (for the pseudo-true value) of both regression adjustment procedures. Therefore, even though this new regression adjustment procedure gives stable performance under correct and incorrect model specification, it still suffers from the coverage issues alluded to in the remarks following Theorem 2.

Table 1: Monte Carlo coverage (Cov.) and average credible set length (Length) for the simple normal example under correct (resp. incorrect) model specification. Cov. is the percentage of times that the 95% credible set contained θ∗ = 1. Length is the average length of the credible set across the Monte Carlo trials.

              σ̃² = 1                    σ̃² = 2
        AR   Reg-New   Reg         AR   Reg-New   Reg
Cov.    98%    95%     95%        100%    65%     35%
Length  0.97   0.38    0.38       0.86    0.38    0.38


4 Detecting Misspecification

In this section we propose two methods to detect model misspecification in ABC. The first approach is based on the behavior of the acceptance probability under correct and incorrect model specification. The second approach is based on comparing posterior expectations calculated under Πǫ[·|η(y)] (obtained from Algorithm 1) and Π̃ǫ[·|η(y)] (obtained using the linear regression adjustment approach, i.e., ABC-Reg).

4.1 A Simple Graphical Approach to Detecting Misspecification

From the results of Frazier et al. (2018), under regularity and correct model specification, the acceptance probability α_n = Pr[d{η(y), η(z)} ≤ ǫ_n] satisfies, for n large and ǫ_n ≫ v_n^{-1},
\[ \alpha_n = \Pr\left[ d\{\eta(y), \eta(z)\} \le \epsilon_n \right] \asymp \epsilon_n^{k_\theta}. \]
In this way, as ǫ_n → 0 the acceptance probability α_n → 0 in a manner that is approximately linear in $\epsilon_n^{k_\theta}$.

However, this relationship between α_n and ǫ_n does not extend to the case where lim_n ǫ_n > 0. In particular, if ǫ∗ > 0, once ǫ_n < ǫ∗ we will often obtain an acceptance probability α_n that is small or zero, even for a large number of simulations N.

The behavior of α_n under correct and incorrect model specification means that one can potentially diagnose model misspecification graphically by comparing the behavior of α_n over a decreasing sequence of tolerance values. In particular, by taking a decreasing sequence of tolerances ǫ_{1,n} ≥ ǫ_{2,n} ≥ · · · ≥ ǫ_{J,n}, we can construct and plot the resulting sequence {α_{j,n}}_j to determine whether {α_{j,n}}_j decays in an (approximately) linear fashion.

While α_n is infeasible to obtain in practice, the same procedure can be applied with α_n replaced by the estimator $\hat\alpha_n = \sum_{i=1}^N 1\!\mathrm{l}[d\{\eta(y), \eta(z^i)\} \le \epsilon_n]/N$. In this way, such a graphical check can be easily performed using the ABC reference table. The only difference is that, instead of considering a single tolerance ǫ, one considers a sequence of tolerances {ǫ_{j,n}}_j and records, for each j,
\[ \hat\alpha_{j,n} = \frac{1}{N}\sum_{i=1}^N 1\!\mathrm{l}\left[ d\{\eta(y), \eta(z^i)\} \le \epsilon_{j,n} \right]. \]
Once α̂_{j,n} has been obtained, it can be plotted against ǫ_{j,n} (in some fashion) and the relationship can be analyzed to determine whether deviations from linearity are in evidence.

To understand exactly how such a procedure can be implemented, we return to the simple normal example.

Example 1 (Continued): The assumed DGP is z ∼ N(θ, 1) but the actual DGP is y ∼ N(1, σ̃²). We again consider ABC analysis using the following summary statistics:
• the sample mean $\eta_1(y) = \frac{1}{n}\sum_{i=1}^n y_i$;
• the centered summary $\eta_2(y) = \frac{1}{n-1}\sum_{i=1}^n \{y_i - \eta_1(y)\}^2 - 1$.
Taking σ̃² ∈ {1, 1 + 2/9, 1 + 3/9, ..., 2}, we generate observed samples of size n = 100 according to the DGP above, keeping the underlying random numbers fixed and only changing σ̃². N = 50,000 simulated data sets are again generated according to $z^j_i \sim N(\theta^j, 1)$, with $\theta^j \sim N(0, 25)$, and for d{·, ·} we consider the Euclidean norm ‖·‖. The results are presented in Figure 3.

Figure 3: Graphical comparison of estimated acceptance probabilities α̂_{j,n} against decreasing tolerance values ǫ_{j,n} (plotted as 1/ǫ_n), for σ̃² ∈ {1, 1.22, 1.33, 1.44, 1.55, 1.67, 1.78, 1.89, 2}.

The figure demonstrates that for n = 100 this procedure has difficulty detecting model misspecification if |σ̃² − 1| ≤ 1/3. However, for |σ̃² − 1| ≥ 4/9 the procedure can detect model misspecification, which shows up as an exponential decay in α̂_{j,n}.

Clearly, obtaining broad conclusions about model misspecification from this graphical approach depends on many features of the underlying model, the dimension of θ,⁴ and the exact nature of the misspecification. While potentially useful, this approach should therefore only be used as a tool to help diagnose model misspecification.

⁴In this simple example correct specification would warrant a linear relationship between α_n and ǫ_n, since we are only conducting inference on one parameter. More generally, under correct model specification, we would expect a linear relationship between α_n and $\epsilon_n^{k_\theta}$.

4.2 Detecting Model Misspecification Using Regression Adjustment

Theorem 1 and Corollary 2 demonstrate that basic ABC, as described in Algorithm 1, and ABC-Reg place posterior mass in different regions of the parameter space. Therefore, the posterior expectations
\[ \hat h = \int h(\theta)\, d\Pi_\epsilon[\theta \mid \eta(y)], \qquad \tilde h = \int h(\theta)\, d\tilde\Pi_\epsilon[\theta \mid \eta(y)] \]
converge, as n → +∞ and ǫ_n ↓ ǫ∗, to distinct values. However, if the model is correctly specified, ĥ and h̃ will not differ, up to first order, so long as ǫ_n = o(1/√n). Therefore, a useful approach for detecting model misspecification is to compare various posterior expectations, such as moments or quantiles, calculated from the two posteriors.
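For instance, using the second and third non-centered moments employed in the example below, the comparison can be sketched as follows (our code; `thetas_ar` and `thetas_reg` are the accepted/adjusted draws from the two procedures):

```python
import numpy as np

def moment_gap(thetas_ar, thetas_reg, n):
    """sqrt(n) * ||h_hat - h_tilde|| for h(theta) = (theta^2, theta^3),
    comparing non-centered moments under the ABC-AR and ABC-Reg draws."""
    h_hat = np.array([np.mean(thetas_ar**2), np.mean(thetas_ar**3)])
    h_tilde = np.array([np.mean(thetas_reg**2), np.mean(thetas_reg**3)])
    return np.sqrt(n) * np.linalg.norm(h_hat - h_tilde)
```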

More specifically, under regularity conditions given in Li and Fearnhead (2018b) and Li and Fearnhead (2018a), if the model is correctly specified and if we use Algorithm 1 based on quantile thresholding with $\alpha_n = \delta n^{-k_\theta/2}$, with δ > 0 small, then
\[ \sqrt n\, \|\hat h - \tilde h\| = o_{P_0}(1). \]
However, if ǫ∗ = inf_{θ∈Θ} d{b₀, b(θ)} > 0, then, under regularity conditions, we can deduce that
\[ \liminf_n \|\hat h - \tilde h\| > 0. \]
Therefore, a value of ‖ĥ − h̃‖ that is not small is meaningful evidence that the model may be misspecified. To demonstrate this approach to diagnosing model misspecification, we return to our simple running example.

Example 1 (Continued): The assumed DGP is z ∼ N(θ, 1), but the actual DGP is y ∼ N(θ, σ̃²). We again consider the following summary statistics:
• the sample mean $\eta_1(y) = \frac{1}{n}\sum_{i=1}^n y_i$;
• the centered summary $\eta_2(y) = \frac{1}{n-1}\sum_{i=1}^n \{y_i - \eta_1(y)\}^2 - 1$.
We simulate n = 100 observed data points from a normal random variable with mean θ = 1 and variance σ̃² = 2, so as to capture a mild level of model misspecification, and generate one thousand independent Monte Carlo replications. We again take N = 50,000 simulated data sets generated according to $z^j_i \sim N(\theta^j, 1)$, with $\theta^j \sim N(0, 25)$. For d{·, ·} we take the Euclidean norm ‖·‖, and we accept values of θ that lead to distances lower than the corresponding $\alpha_n = n^{-5/9}$ quantile.

Across the Monte Carlo replications, we compare the non-centered second and third posterior moments calculated under ABC-AR and ABC-Reg:
\[ \hat h = \left( \int \theta^2\, d\Pi_\epsilon[\theta\mid\eta(y)],\; \int \theta^3\, d\Pi_\epsilon[\theta\mid\eta(y)] \right)^\top, \qquad \tilde h = \left( \int \theta^2\, d\tilde\Pi_\epsilon[\theta\mid\eta(y)],\; \int \theta^3\, d\tilde\Pi_\epsilon[\theta\mid\eta(y)] \right)^\top. \]
The sampling distribution of √n‖ĥ − h̃‖ across the Monte Carlo replications is presented in Figure 4.


Figure 4: Monte Carlo sampling distribution of √n‖ĥ − h̃‖ in the normal example with σ̃² = 2.

Recall that under correct specification √n‖ĥ − h̃‖ = o_{P₀}(1). Thus, if the model were correctly specified, we would expect a majority of the realizations of √n‖ĥ − h̃‖ to be relatively small. It is then clear from Figure 4 that there is a substantial difference between ĥ and h̃ even under moderate model misspecification.

As further evidence of the difference between the behavior of √n‖ĥ − h̃‖ under correct and incorrect model specification, Figure 5 plots, across the Monte Carlo replications, the sampling distributions of √n‖ĥ − h̃‖ when σ̃² = 1 (correct specification) and when σ̃² = 2 (incorrect specification). From Figure 5 it is clear that the distribution of √n‖ĥ − h̃‖ is drastically different under correct and incorrect specification, even at this relatively minor level of misspecification.

Figure 5: Monte Carlo sampling distributions of √n‖ĥ − h̃‖ under correct specification (σ̃² = 1) and a mild level of model misspecification (σ̃² = 2). ĥ (resp. h̃) is the vector of second and third posterior moments calculated from ABC-AR (resp. ABC-Reg). ‖·‖ is the Euclidean norm.


Before concluding this section, we note that, at least for smooth functions h(θ), a formal testing strategy based on comparing ĥ and h̃ can be constructed. For example, a test could be constructed to determine whether or not the (population) means of h(θ) under the two different sampling measures, Πǫ and Π̃ǫ, are the same, against the alternative hypothesis that these (population) means differ. A natural test statistic for such a hypothesis is an appropriately scaled version of ĥ − h̃. While potentially useful, we do not explore this topic in any formal way within this paper but leave its analysis for future research.

4.3 Additional Monte Carlo Evidence

In this section we demonstrate the consequences of model misspecification in ABC using the MA(2) model.

4.3.1 Moving Average Model

When the behavior of the observed data y displays short-memory properties, a moving average model is capable of capturing these features in a parsimonious fashion. If the researcher believes y is generated according to an MA(q) model, then ABC requires the generation of pseudo-data according to
\[ z_t = e_t + \sum_{i=1}^q \theta_i e_{t-i}, \]
where, say, e_t ∼ N(0, 1) i.i.d., and θ₁, ..., θ_q are such that the roots of the polynomial
\[ p(x) = 1 + \sum_{i=1}^q \theta_i x^i \]
all lie outside the unit circle.

Specializing this model to the case q = 2, we have
\[ z_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}, \tag{4} \]
and the unknown parameters θ = (θ₁, θ₂)⊺ are assumed to obey
\[ -2 < \theta_1 < 2, \qquad \theta_1 + \theta_2 > -1, \qquad \theta_1 - \theta_2 < 1. \tag{5} \]
Our prior information on θ = (θ₁, θ₂)⊺ is uniform over the invertibility region in (5). A useful choice of summary statistics for the MA(2) model are the sample autocovariances $\gamma_j(z) = \frac{1}{T}\sum_{t=1+j}^T z_t z_{t-j}$, for j = 0, 1, 2. Throughout the remainder of this subsection we let η(z) denote the summaries η(z) = (γ₀(z), γ₁(z), γ₂(z))⊺. It is simple to show that, under the DGP in equations (4)–(5), the limit map θ ↦ b(θ) is
\[ b(\theta) = \left( 1 + \theta_1^2 + \theta_2^2,\; \theta_1(1 + \theta_2),\; \theta_2 \right)^\top, \]
and η(z) satisfies the sufficient conditions for posterior concentration previously outlined, assuming y is sufficiently regular.
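A sketch of the MA(2) simulator and these autocovariance summaries (our code):

```python
import numpy as np

def simulate_ma2(theta1, theta2, n, rng):
    """Simulate z_t = e_t + theta1 * e_{t-1} + theta2 * e_{t-2}, e_t iid N(0, 1)."""
    e = rng.standard_normal(n + 2)
    return e[2:] + theta1 * e[1:-1] + theta2 * e[:-2]

def autocov_summaries(z):
    """Sample autocovariances gamma_j(z) = (1/T) sum_{t=1+j} z_t z_{t-j}, j = 0, 1, 2."""
    t_len = len(z)
    return np.array([np.dot(z[j:], z[:t_len - j]) / t_len for j in range(3)])
```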


While short-memory properties can exist in the levels of many economic and financial time series, the observed data y can often display conditional heteroskedasticity. In such cases, the dynamics of the level series {y_t}_{t≥1} display short-memory properties, but the autocorrelations of the squared or absolute series, {y_t²}_{t≥1} or {|y_t|}_{t≥1}, display persistence that cannot be captured by the MA(2) model. Therefore, if one disregards these conditional dynamics, the moving average model will be misspecified.

More concretely, consider the artificially simple situation where the researcher believes the data is generated according to an MA(2) model, equation (4), but the actual DGP for y evolves according to the stochastic volatility model
\[ y_t = \exp(h_t/2)\, u_t, \qquad h_t = \omega + \rho h_{t-1} + v_t \sigma_v, \tag{6} \]
with |ρ| < 1, 0 < σ_v < 1, and u_t and v_t both i.i.d. standard Gaussian. In this case, if one takes η(y) = (γ₀(y), γ₁(y), γ₂(y))⊺, it follows that, under the DGP in (6),
\[ \eta(y) \stackrel{P}{\to} b_0 = \left( \frac{\sigma_v^2}{1-\rho^2},\; 0,\; 0 \right)^\top. \]
For d{·, ·} the Euclidean norm, we then have

\[ \theta^* = \arg\inf_{\theta\in\Theta} d\{b_0, b(\theta)\} = (0, 0)^\top \qquad \text{and} \qquad \epsilon^* = \sqrt{\left( \frac{\sigma_v^2}{1-\rho^2} - 1 \right)^2}. \]
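A sketch of this stochastic volatility DGP (our code; the burn-in length and the stationary-mean initialisation are our choices):

```python
import numpy as np

def simulate_sv(omega, rho, sigma_v, n, rng, burn=500):
    """Simulate y_t = exp(h_t / 2) * u_t with h_t = omega + rho * h_{t-1}
    + sigma_v * v_t, where u_t and v_t are iid standard normal."""
    h = omega / (1.0 - rho)          # initialise at the stationary mean of h_t
    y = np.empty(n)
    for t in range(burn + n):
        h = omega + rho * h + sigma_v * rng.standard_normal()
        if t >= burn:
            y[t - burn] = np.exp(h / 2.0) * rng.standard_normal()
    return y
```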

4.3.2 Monte Carlo

We are interested in comparing the behavior of ABC-AR, ABC-Reg and ABC-Reg-New when the true model generating y is actually a stochastic volatility model, as above, but the model used for simulating pseudo-data in ABC is an MA(2) model. We carry out this comparison across two simulation designs: one, (ω, ρ, σ_v)⊺ = (−.736, .90, .363)⊺, and two, (ω, ρ, σ_v)⊺ = (−.147, .98, .0614)⊺. These particular values are related to the unconditional coefficient of variation κ of the unobserved level of volatility exp(h_t) in the observed data, with
\[ \kappa^2 = \frac{\mathrm{Var}\{\exp(h_t)\}}{\left(\mathrm{E}[\exp(h_t)]\right)^2} = \exp\left( \frac{\sigma_v^2}{1-\rho^2} \right) - 1. \]
In the first design, i.e., (ω, ρ, σ_v)⊺ = (−.736, .90, .363)⊺, we have κ² = 1, which roughly represents the behavior exhibited by lower-frequency financial returns (say, weekly or monthly returns); for the second design, i.e., (ω, ρ, σ_v)⊺ = (−.147, .98, .0614)⊺, we have κ² = .1, which roughly corresponds to higher-frequency financial returns (say, daily returns).

Across the two different designs, we generate n = 1000 observations for y and consider one hundred Monte Carlo replications. Across the replications we apply ABC-AR, with d{·, ·} = ‖·‖, ABC-Reg and ABC-Reg-New to estimate the parameters of the MA(2) model. Given the theoretical results deduced in Sections two and three, it should be the case that the ABC-AR and ABC-Reg-New approaches give estimators close to the pseudo-true value θ∗ = (0, 0)⊺, while ABC-Reg is likely to deliver point estimates with different behavior.

Figure 6 plots the resulting posterior means across the two designs, and across the ABC approaches. From Figure 6 we note that the behavior of ABC-Reg represents a substantial departure from the stable performance of ABC-AR and ABC-Reg-New. This is further evidence that standard post-processing ABC methods are more susceptible to misspecification than more basic ABC approaches.

Figure 6: Posterior mean plots for ABC-AR (AR), ABC-Reg (Reg) and ABC-Reg-New (Reg-New) across the Monte Carlo trials for designs one and two. For Design-1, observed data was simulated according to (ω, ρ, σ_v)⊺ = (−.736, .90, √.36)⊺, and for Design-2 (ω, ρ, σ_v)⊺ = (−.146, .98, √.0614)⊺.

In addition, Table 2 presents the Monte Carlo coverage and average credible set length for the three procedures, across both Monte Carlo designs. Similar to the simple normal example, we see that the regression adjustment procedures give much smaller credible sets than those obtained from ABC-AR. Again, however, we emphasize that this result is not desirable, since the smaller credible sets leave researchers with a false sense of precision regarding the uncertainty of point estimators in misspecified models.


Table 2: Monte Carlo coverage (Cov.) and average credible set length (Length) for the MA(2) model. Design-1 corresponds to data simulated under (ω, ρ, σ_v)⊺ = (−.736, .9, √.36)⊺, while Design-2 takes (ω, ρ, σ_v)⊺ = (−.146, .98, √.0614)⊺. Calculations are based on the marginal ABC posteriors.

                Design-1                   Design-2
           AR   Reg-New   Reg         AR   Reg-New   Reg
θ₁ Cov.    98%    95%     95%        100%   100%    49.6%
   Length  0.97   0.38    0.38       0.38   0.12    0.13
θ₂ Cov.    98%    95%     95%        100%   100%    49.4%
   Length  0.97   0.38    0.37       0.41   0.12    0.13

Figure 7 depicts the results of the proposed graphical check for detecting model misspecification in ABC, for an arbitrarily chosen Monte Carlo trial. It is clear from this figure that the estimated acceptance probabilities display the distinct exponential decay that is expected when the model generating the pseudo-data in ABC is not correctly specified.

Figure 7: Graphical comparison of estimated acceptance probabilities α̂_n against decreasing tolerance values (plotted against 1/ǫ_n²) for the MA(2) model; Design 1: (ω, ρ, σ_v)′ = (−.736, .90, √.36)′, Design 2: (ω, ρ, σ_v)′ = (−.146, .98, √.0614)′. In this example kθ = 2, and so α_n should decay linearly in ǫ_n² under correct specification.

References

Beaumont, M. A., Zhang, W., and Balding, D. J. (2002). Approximate Bayesian computation in population genetics. Genetics, 162(4):2025–2035.

Bernton, E., Jacob, P. E., Gerber, M., and Robert, C. P. (2017). Inference in generative models using the Wasserstein distance. arXiv preprint arXiv:1701.05146.


Blum, M. G. B., Nunes, M. A., Prangle, D., and Sisson, S. A. (2013). A comparative review of dimension reduction methods in approximate Bayesian computation. Statist. Sci., 28(2):189–208.

Frazier, D., Martin, G., Robert, C., and Rousseau, J. (2018). Asymptotic properties of approximate Bayesian computation. Biometrika, page asy027.

Kleijn, B. and van der Vaart, A. (2012). The Bernstein-von-Mises theorem under misspecification. Electron. J. Statist., 6:354–381.

Li, W. and Fearnhead, P. (2018a). Convergence of regression-adjusted approximate Bayesian computation. Biometrika, page asx081.

Li, W. and Fearnhead, P. (2018b). On the asymptotic efficiency of approximate Bayesian com-putation estimators. Biometrika, page asx078.

Marin, J.-M., Pillai, N. S., Robert, C. P., and Rousseau, J. (2014). Relevant statistics for Bayesian model choice. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(5):833–859.

Marin, J.-M., Pudlo, P., Robert, C. P., and Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6):1167–1180.

Marjoram, P., Molitor, J., Plagnol, V., and Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 100(26):15324–15328.

Müller, U. K. (2013). Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix. Econometrica, 81(5):1805–1849.

Robert, C. P. (2016). Approximate Bayesian Computation: A Survey on Recent Results, pages 185–205. Springer International Publishing, Cham.

Sisson, S. A., Fan, Y., and Tanaka, M. M. (2007). Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760–1765.

A Proofs

Theorem 1

Proof. This theorem is an adaptation of Frazier et al. (2018).

Let δ_n ≥ M_n(ǫ_n − ǫ∗) ≥ 3M_n v_{0,n}^{-1}; then P₀(Ω_d) = 1 + o(1) for Ω_d := {y : d{η(y), b₀} ≤ δ_n/2}. Assume that y ∈ Ω_d. Consider the event
\[ A_d(\delta_n) := \left\{ (z, \theta) : d\{\eta(z), \eta(y)\} \le \epsilon_n,\; d\{b(\theta), b_0\} \ge \epsilon^* + \delta_n \right\}. \]
Note that, by definition, d{b(θ), b₀} ≥ ǫ∗, with ǫ∗ > 0. For all (z, θ) ∈ A_d(δ_n) and y ∈ Ω_d,
\[ \delta_n < d\{b(\theta), b_0\} - \epsilon^* \le d\{b(\theta), \eta(z)\} + d\{\eta(z), \eta(y)\} + d\{\eta(y), b_0\} - \epsilon^* \le d\{b(\theta), \eta(z)\} + \epsilon_n - \epsilon^* + \delta_n/2, \]
so that
\[ \delta_n \le 4\, d\{b(\theta), \eta(z)\}. \]
This implies in particular that
\[ \Pr(A_d(\delta_n)) = \int_{d\{b(\theta), b_0\} \ge \epsilon^* + \delta_n} P_\theta\left[ d\{\eta(z), \eta(y)\} \le \epsilon_n \right] d\Pi(\theta) \le \int_\Theta P_\theta\left( d\{b(\theta), \eta(z)\} \ge \delta_n/4 \right) d\Pi(\theta). \tag{7} \]

In case (i) of polynomial deviations,
\[ \Pr(A_d(\delta_n)) \le (v_n \delta_n)^{-\kappa} \int_\Theta c(\theta)\, d\Pi(\theta) = o(1) \tag{8} \]
as soon as v_n δ_n → +∞, while in case (ii) of exponential deviations,
\[ \Pr(A_d(\delta_n)) \le C e^{-c(\delta_n v_n)^\tau}. \tag{9} \]

Moreover, we can bound $\alpha_n = \int_\Theta P_\theta[d\{\eta(z), \eta(y)\} \le \epsilon_n]\, d\Pi(\theta)$ from below. Note that, on {d{η(z), b(θ)} ≤ M v_n^{-1}/2} ∩ Ω_d,
\[ d\{\eta(y), \eta(z)\} \le d\{\eta(z), b(\theta)\} + d\{\eta(y), b_0\} + d\{b(\theta), b_0\} \le v_{0,n}^{-1} + M v_n^{-1}/2 + d\{b(\theta), b_0\} \le \epsilon_n \]
as soon as ǫ∗ ≤ d{b(θ), b₀} ≤ ǫ_n − v_{0,n}^{-1} − M v_n^{-1}/2. Since ǫ_n − ǫ∗ ≥ v_{0,n}^{-1} + M v_n^{-1}, on Ω_d,
\[ \int_\Theta P_\theta\left( d\{\eta(z), \eta(y)\} \le \epsilon_n \right) d\Pi(\theta) \ge \int_{d\{b(\theta), b_0\} - \epsilon^* \le (\epsilon_n - \epsilon^*)/4 \vee M v_n^{-1}/2} \left( 1 - P_\theta\left[ d\{\eta(z), b(\theta)\} \ge M v_n^{-1}/2 \right] \right) d\Pi(\theta) \]
\[ \ge \int_{d\{b(\theta), b_0\} - \epsilon^* \le (\epsilon_n - \epsilon^*)/4 \vee M v_n^{-1}/2} \left( 1 - \frac{c(\theta)\, 2^\kappa}{M^\kappa} \right) d\Pi(\theta) \gtrsim \left( (\epsilon_n - \epsilon^*) \vee v_n^{-1} \right)^D \gtrsim (\epsilon_n - \epsilon^*)^D \]
in case (i) of [A1], under [A2]. If case (ii) of [A1] holds, under [A2] we similarly have
\[ \int_\Theta P_\theta\left( d\{\eta(z), \eta(y)\} \le \epsilon_n \right) d\Pi(\theta) \ge \int_{d\{b(\theta), b_0\} - \epsilon^* \le (\epsilon_n - \epsilon^*)/4 \vee M v_n^{-1}/2} \left( 1 - c(\theta)e^{-h_\theta(M/2)} \right) d\Pi(\theta) \gtrsim (\epsilon_n - \epsilon^*)^D. \]

Combining these two inequalities with the upper bounds (8) or (9) leads to
\[ \Pi_\epsilon\left[ d\{b(\theta), b_0\} \ge \epsilon^* + \delta_n \mid \eta(y) \right] \lesssim (\epsilon_n - \epsilon^*)^{-D} (v_n \delta_n)^{-\kappa} \]
in case (i), and
\[ \Pi_\epsilon\left[ d\{b(\theta), b_0\} \ge \epsilon^* + \delta_n \mid \eta(y) \right] \lesssim (\epsilon_n - \epsilon^*)^{-D} e^{-c(\delta_n v_n)^\tau} \]
in case (ii). These are of order o(1) if
\[ \delta_n \ge M_n v_n^{-1} (\epsilon_n - \epsilon^*)^{-D/\kappa} \quad \text{in case (i)}, \qquad \delta_n \ge M_n v_n^{-1} |\log(\epsilon_n - \epsilon^*)|^{1/\tau} \quad \text{in case (ii)}. \]

Corollary 1

Proof. Define Q(θ) = |d{b(θ), b₀} − d{b(θ∗), b₀}|. From the continuity of θ ↦ b(θ) and the definition of θ∗, for any δ > 0 there exists γ(δ) > 0 such that
\[ \inf_{\theta : d_1\{\theta, \theta^*\} > \delta} Q(\theta) \ge \gamma(\delta) > 0. \]
Then,
\[ \Pi_\epsilon\left[ d_1\{\theta, \theta^*\} > \delta \mid \eta(y) \right] \le \Pi_\epsilon\left[ |Q(\theta) - Q(\theta^*)| > \gamma(\delta) \mid \eta(y) \right] = \Pi_\epsilon\left[ |d\{b(\theta), b_0\} - d\{b(\theta^*), b_0\}| > \gamma(\delta) \mid \eta(y) \right] \]
\[ = \Pi_\epsilon\left[ d\{b(\theta), b_0\} > \epsilon^* + \gamma(\delta) \mid \eta(y) \right]. \]
The result follows if Πǫ[d{b(θ), b₀} > ǫ∗ + γ(δ) | η(y)] = o_{P₀}(1). For δ_n > 0 and δ_n = o(1) as defined in Theorem 1, by the conclusion of Theorem 1, the result follows once γ(δ) ≥ δ_n.

Proof of Theorem 2

Proof. For the sake of simplicity, and without loss of generality, we write v_n = √n, Z̃_n = √n{η(z) − b(θ)} and Z̃_y = √n{η(y) − b₀}. Denote B_n(K) = {‖θ − θ∗‖ ≤ K}. Throughout the proof, C denotes a generic constant which may vary from line to line. We have, for all θ,
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) = P_\theta\left( \|\tilde Z_n - \tilde Z_y + \sqrt n\{b(\theta) - b_0\}\|^2 \le n\epsilon_n^2 \right) \]
\[ = P_\theta\left( \|\tilde Z_n\|^2 + 2\langle \tilde Z_n, \sqrt n\{b(\theta) - b_0\} - \tilde Z_y \rangle \le n\left[\epsilon_n^2 - \|b(\theta) - b_0 - \tilde Z_y/\sqrt n\|^2\right] \right) \]
\[ = P_\theta\left( \langle \tilde Z_n, b(\theta) - b_0 \rangle \le \frac{\sqrt n\left[\epsilon_n^2 - \|b(\theta) - b_0 - \tilde Z_y/\sqrt n\|^2\right]}{2} - \frac{\|\tilde Z_n\|^2 - 2\langle \tilde Z_n, \tilde Z_y \rangle}{2\sqrt n} \right). \]

Now, on Ω_n = {‖Z̃_y‖ ≤ M_n/2}, with M_n a sequence going to infinity arbitrarily slowly and such that M_n = o(n^{1/4}),
\[ \sqrt n\, \|b(\theta) - b_0 - \tilde Z_y/\sqrt n\|^2 = \sqrt n\, \|b(\theta^*) - b_0\|^2 + \frac{\sqrt n\, (\theta - \theta^*)^\top H^* (\theta - \theta^*)}{2} - 2\langle b(\theta^*) - b_0, \tilde Z_y \rangle + O(M_n^2/\sqrt n) + O\left( \sqrt n\, \|\theta - \theta^*\|^3 + \|\theta - \theta^*\| M_n \right), \]
where H∗ is the second derivative of θ ↦ ‖b(θ) − b₀‖² at θ∗, noting that the first derivative vanishes at θ∗. Let ǫ∗ = ‖b(θ∗) − b₀‖, e′ = b(θ∗) − b₀, and ǫ > 0. If ‖θ − θ∗‖ ≤ ǫ, then, on the event Ω_n = {‖Z̃_y‖ ≤ M_n} with M_n² = o(√n),
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \le P_\theta\left( \langle \tilde Z_n, e' \rangle \le \frac{\sqrt n\left[\epsilon_n^2 - (\epsilon^*)^2 - (1 + C\epsilon)(\theta - \theta^*)^\top H^*(\theta - \theta^*)/2\right]}{2} + \langle \tilde Z_y, e' \rangle + C\epsilon + \|\theta - \theta^*\| M_n \right) + P_\theta\left( \|\tilde Z_n\|^2 > \epsilon^2\sqrt n/4 \right) \]
and
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \ge P_\theta\left( \langle \tilde Z_n, e' \rangle \le \frac{\sqrt n\left[\epsilon_n^2 - (\epsilon^*)^2 - (1 - C\epsilon)(\theta - \theta^*)^\top H^*(\theta - \theta^*)/2\right]}{2} + \langle \tilde Z_y, e' \rangle - C\epsilon - \|\theta - \theta^*\| M_n \right) - P_\theta\left( \|\tilde Z_n\|^2 > \epsilon^2\sqrt n/4 \right). \tag{10} \]
Consider the case where √n(ǫ_n² − (ǫ∗)²) → 2c ∈ R. We split Θ into {‖θ − θ∗‖²√n ≤ M}, {ǫ > ‖θ − θ∗‖ > M/n^{1/4}} and {ǫ ≤ ‖θ − θ∗‖}, where ǫ is arbitrarily small. First, if ‖θ − θ∗‖²√n ≤ M,
\[ \frac{\sqrt n\left[\epsilon_n^2 - (\epsilon^*)^2 - (1 - C\epsilon)(\theta - \theta^*)^\top H^*(\theta - \theta^*)/2\right]}{2} + \langle \tilde Z_y, e' \rangle + \frac{M_n^2}{\sqrt n} + \|\theta - \theta^*\| M_n \le c + \langle \tilde Z_y, e' \rangle - \frac{(1 - C\epsilon)\sqrt n\,(\theta - \theta^*)^\top H^*(\theta - \theta^*)}{4} + C\epsilon \]
and
\[ \frac{\sqrt n\left[\epsilon_n^2 - (\epsilon^*)^2 - (1 + C\epsilon)(\theta - \theta^*)^\top H^*(\theta - \theta^*)/2\right]}{2} + \langle \tilde Z_y, e' \rangle + \frac{M_n^2}{\sqrt n} + \|\theta - \theta^*\| M_n \ge c + \langle \tilde Z_y, e' \rangle - \frac{(1 + C\epsilon)\sqrt n\,(\theta - \theta^*)^\top H^*(\theta - \theta^*)}{4} - C\epsilon. \]

First ifkθ − θk2√n≤ M √ n[ǫ2n− (ǫ∗)2− (1 − Cǫ)(θ − θ∗)⊺H∗(θ− θ∗)/2] 2 + < ˜Zy, e ′ > +Mn2 √ n +kθ − θ ∗kM n ≤ c+ < ˜Zy, e′ >− (1− Cǫ)√n(θ− θ∗)H− θ) 4 + Cǫ and √ n[ǫ2 n− (ǫ∗)2− (1 + Cǫ)(θ − θ∗)⊺H∗(θ− θ∗)/2] 2 + M2 √ n +kθ − θ ∗kM n ≥ c+ < ˜Zy, e >− (1 + Cǫ)√n(θ− θ∗)H− θ) 4 − Cǫ

Moreover, using assumption [A5],
\[ \langle \tilde Z_n, e' \rangle = \sqrt n\, \langle \Sigma_n(\theta)^{-1} Z_n, e' \rangle = \langle Z_n, \sqrt n\, \Sigma_n(\theta)^{-1} e' \rangle = \langle Z_n, A(\theta^*) e' \rangle + o(\|Z_n\|). \]
We then have, with c′ = c + ⟨Z̃_y, e′⟩ and x = n^{1/4}(θ − θ∗), when n is large enough,
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \le P_\theta\left( \langle Z_n, A(\theta^*) e' \rangle \le c' - \frac{x^\top H^* x}{4} + M\epsilon \right) + c_0 \epsilon^{-\kappa} n^{-\kappa/4} \]
\[ \le \Phi\left( \frac{c' + M\epsilon}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) + c_0 \epsilon^{-\kappa} n^{-\kappa/4} + \sup_{\|\theta' - \theta^*\| \le u_0} \left| P_{\theta'}\left( \langle Z_n, A(\theta^*) e' \rangle \le c' - \frac{x^\top H^* x}{4} + M\epsilon \right) - \Phi\left( \frac{c' + M\epsilon}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) \right| \]
\[ \le \Phi\left( \frac{c' + M\epsilon}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) + o(1) + c_0 \epsilon^{-\kappa} n^{-\kappa/4}. \]
Similarly, with y = n^{1/4}(1 + ǫ)^{1/2}(θ − θ∗),
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \ge \Phi\left( \frac{c' - M\epsilon}{\|A(\theta^*)e\|} - \frac{y^\top H^* y}{4\|A(\theta^*)e\|} \right) - o(1) - c_0 \epsilon^{-\kappa} n^{-\kappa/4}. \]
Moreover, for all t ∈ R, writing x(θ) to emphasize its dependence on θ,
\[ \Delta_1 = \int_{B_n(M n^{-1/4})} \sup_{\|\theta' - \theta^*\| \le u_0} \left| P_{\theta'}\left( \langle Z_n, A(\theta^*) e' \rangle \le t - \frac{x(\theta)^\top H^* x(\theta)}{4} \right) - \Phi\left( \frac{t}{\|A(\theta^*)e\|} - \frac{x(\theta)^\top H^* x(\theta)}{4\|A(\theta^*)e\|} \right) \right| d\theta \]
\[ = n^{-k_\theta/4} \int_{\|x\| \le M} \sup_{\|\theta' - \theta^*\| \le u_0} \left| P_{\theta'}\left( \langle Z_n, A(\theta^*) e' \rangle \le t - \frac{x^\top H^* x}{4} \right) - \Phi\left( \frac{t}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) \right| dx = n^{-k_\theta/4}\, M\, o(1), \]
where the last equality follows from the dominated convergence theorem. We then have
\[ \int_{B_n(M/n^{1/4})} P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \pi(\theta)\, d\theta \le \pi(\theta^*)(1 + o(1))\, n^{-k_\theta/4} \int_{\|x\| \le M} \Phi\left( \frac{c' + M\epsilon}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) dx + o(n^{-k_\theta/4} M). \tag{11} \]

Similarly,
\[ \int_\Theta P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \pi(\theta)\, d\theta \ge \int_{\|\theta - \theta^*\| \le n^{-1/4} M} P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \pi(\theta)\, d\theta \]
\[ \ge \pi(\theta^*)(1 + o(1))\, n^{-k_\theta/4} \int_{\|x\| \le M} \Phi\left( \frac{c' - M\epsilon}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) dx\; (1 + o_{P_0}(1)). \tag{12} \]

Also, if ǫ > ‖θ − θ∗‖ > M/n^{1/4}, since there exists a > 0 such that z⊺H∗z ≥ a‖z‖² for all z, if M is large enough,
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \le P_\theta\left( \langle Z_n, A(\theta^*) e' \rangle \le -\frac{aM^2}{8} \right) \le P_\theta\left( \|Z_n\| > \frac{aM^2}{8\|A(\theta^*)e\|} \right) \lesssim M^{-2\kappa}. \]
Now let j ≥ 0 and set M_j = 2^j M. On M_j n^{-1/4} ≤ ‖θ − θ∗‖ ≤ M_{j+1} n^{-1/4},
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \le P_\theta\left( \langle Z_n, A(\theta^*) e' \rangle \le -\frac{aM_j^2}{8} \right) \le P_\theta\left( \|Z_n\| > \frac{aM_j^2}{8\|A(\theta^*)e\|} \right) \lesssim M_j^{-2\kappa}, \]
so that
\[ \int_{M n^{-1/4} \le \|\theta - \theta^*\| \le \epsilon} P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \pi(\theta)\, d\theta \lesssim n^{-k_\theta/4} \sum_{j=0}^{J_n} M_j^{-2\kappa}(M_{j+1} - M_j) \lesssim n^{-k_\theta/4} \sum_{j=0}^{J_n} M_j^{-2\kappa + 1} \lesssim n^{-k_\theta/4} M^{-2\kappa + 1}. \tag{13} \]
Finally, if ‖θ − θ∗‖ > ǫ, then ‖b(θ) − b₀ − Z̃_y/√n‖² − (ǫ∗)² ≥ Cǫ on Ω_n, and when n is large enough √n(ǫ_n² − (ǫ∗)²) ≤ c + Cǫ, so that
\[ P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \le P_\theta\left( \langle \tilde Z_n, b(\theta) - b_0 \rangle \le -\sqrt n\, C\epsilon + c' + C\epsilon \right) + O(n^{-\kappa/4}) \]
\[ \le P_\theta\left( \|Z_n\| > C n^{1/2} \|b(\theta) - b_0\|^{-1}/2 \right) + O(n^{-\kappa/4}) \le c(\theta)\, C^{-\kappa} n^{-\kappa/2} \|b(\theta) - b_0\|^{\kappa} + O(n^{-\kappa/4}). \]

Therefore,
\[ \int_{\|\theta - \theta^*\| \ge \epsilon} P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \pi(\theta)\, d\theta \le C^{-\kappa} n^{-\kappa/2} \int_{\|\theta - \theta^*\| \ge \epsilon} c(\theta) \|b(\theta) - b_0\|^{\kappa} \pi(\theta)\, d\theta + C n^{-\kappa/4} \int_{\|\theta - \theta^*\| \ge \epsilon} c(\theta) \pi(\theta)\, d\theta. \tag{14} \]

Finally, combining (11), (13), (14) and (12), we obtain that, if κ > kθ,
\[ \int_\Theta P_\theta\left( \|\eta(z) - \eta(y)\|^2 \le \epsilon_n^2 \right) \pi(\theta)\, d\theta = n^{-k_\theta/4} \int_{\mathbb R^{k_\theta}} \Phi\left( \frac{c'}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) dx + o(n^{-k_\theta/4}), \]
and, for all x = n^{1/4}(θ − θ∗) ∈ R^{kθ} fixed, writing $\pi_{n^{1/4},\epsilon}(\cdot)$ for the density of $\Pi_{n^{1/4},\epsilon}$, the ABC posterior distribution of n^{1/4}(θ − θ∗),
\[ \pi_{n^{1/4},\epsilon}(x) = \frac{\Phi\left( \frac{c'}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right)}{\int_{\mathbb R^{k_\theta}} \Phi\left( \frac{c'}{\|A(\theta^*)e\|} - \frac{x^\top H^* x}{4\|A(\theta^*)e\|} \right) dx} + o(1) := q_c(x) + o(1), \]
so that $\|\pi_{n^{1/4},\epsilon} - q_c\|_1 = o(1)$.

We now study the case where √n(ǫ_n² − (ǫ∗)²) := √n u_n² → +∞ with u_n = o(1), and we show that the limiting distribution is uniform. Using (10), we have that, if B_{0,n} = {(θ − θ∗)⊺H∗(θ − θ∗) ≤ 2u_n² − 4M_n/√n}, with M_n < u_n²√n ǫ going to infinity,
\[ P_\theta\left( \|\eta(z) - \eta(y)\| \le \epsilon_n \right) \le 1 \]
and
\[ P_\theta\left( \|\eta(z) - \eta(y)\| \le \epsilon_n \right) \ge P_\theta\left( \langle \tilde Z_n, e' \rangle \le 2M_n + \langle \tilde Z_y, e' \rangle - \epsilon - \|\theta - \theta^*\| M_n \right) \ge P_\theta\left( \langle \tilde Z_n, e' \rangle \le M_n/2 \right) \ge 1 - \frac{c_1}{M_n^{\kappa}} \]
for some c₁ > 0, on the event {|⟨Z̃_y, e′⟩| ≤ M_n/2}, which has probability going to 1. This implies in particular that
\[ \pi(\theta^*)(1 + o(1))\,\mathrm{Vol}(B_{0,n})\left(1 - c_1 M_n^{-\kappa}\right) \le \int_{B_{0,n}} P_\theta\left( \|\eta(z) - \eta(y)\| \le \epsilon_n \right) \pi(\theta)\, d\theta \le \pi(\theta^*)(1 + o(1))\,\mathrm{Vol}(B_{0,n}). \tag{15} \]
Also,
\[ \mathrm{Vol}(B_{0,n}) \asymp u_n^{k_\theta}. \tag{16} \]

Moreover, if (θ − θ∗)⊺H∗(θ − θ∗) > 2u_n²(1 − Cǫ)^{-1} + 4M_n/√n, there exists C′ > 0 such that
\[ \frac{\sqrt n\left[ u_n^2 - (1 - C\epsilon)(\theta - \theta^*)^\top H^*(\theta - \theta^*)/2 \right]}{2} + \langle \tilde Z_y, e' \rangle - \epsilon - \|\theta - \theta^*\| M_n \le -2M_n + \langle \tilde Z_y, e' \rangle - \epsilon - C'\epsilon^2 M_n \le -M_n \]
on the event {|⟨Z̃_y, e′⟩| ≤ M_n/2}. Therefore, writing B_{1,n} = {K_n u_n² ≥ (θ − θ∗)⊺H∗(θ − θ∗) > 2u_n²(1 − Cǫ)^{-1} + 4M_n/√n},
\[ \int_{B_{1,n}} P_\theta\left( \|\eta(z) - \eta(y)\| \le \epsilon_n \right) \pi(\theta)\, d\theta \lesssim M_n^{-\kappa}\,\mathrm{Vol}(B_{1,n}) \lesssim M_n^{-\kappa} K_n^{k_\theta/2}\,\mathrm{Vol}(B_{0,n}). \tag{17} \]

Moreover,
\[ \mathrm{Vol}\left( \left\{ (\theta - \theta^*)^\top H^*(\theta - \theta^*) \le 2u_n^2(1 - C\epsilon)^{-1} + 4M_n/\sqrt n \right\} \right) - \mathrm{Vol}(B_{0,n}) \lesssim \epsilon\,\mathrm{Vol}(B_{0,n}). \tag{18} \]
If K_n u_n² ≤ (θ − θ∗)⊺H∗(θ − θ∗) ≤ ǫ², then
\[ \frac{\sqrt n\left[ u_n^2 - (1 - C\epsilon)(\theta - \theta^*)^\top H^*(\theta - \theta^*)/2 \right]}{2} + \langle \tilde Z_y, e' \rangle - \epsilon - \|\theta - \theta^*\| M_n \le -\frac{\sqrt n\,(\theta - \theta^*)^\top H^*(\theta - \theta^*)}{8} \]
when n is large enough, and there exists b > 0 such that
\[ \int_{B_{1,n}^c} 1\!\mathrm{l}_{(\theta - \theta^*)^\top H^*(\theta - \theta^*) \le \epsilon^2}\; P_\theta\left( \|\eta(z) - \eta(y)\| \le \epsilon_n \right) \pi(\theta)\, d\theta \lesssim n^{-\kappa/2} \int_{B_n(A\epsilon^2)} 1\!\mathrm{l}_{\|\theta - \theta^*\| \ge b\sqrt{K_n}\, u_n}\, \|\theta - \theta^*\|^{-2\kappa}\, d\theta \lesssim n^{-\kappa/2} \int_{b\sqrt{K_n} u_n}^{\sqrt A\, \epsilon} r^{k_\theta - 2\kappa - 1}\, dr. \tag{19} \]
Since kθ < 2κ, the above term is of order
\[ K_n^{(k_\theta - 2\kappa)/2} (\sqrt n\, u_n^2)^{-\kappa/2}\, u_n^{k_\theta} \asymp K_n^{(k_\theta - 2\kappa)/2} (\sqrt n\, u_n^2)^{-\kappa/2}\,\mathrm{Vol}(B_{0,n}) = o(\mathrm{Vol}(B_{0,n})). \]

Finally, if ‖θ − θ∗‖ ≥ ǫ, then, similarly to the case where √n u_n² → c ∈ R, we obtain (14), and this term is o(Vol(B_{0,n})) as soon as $n^{-\kappa/4} = o(u_n^{k_\theta})$. Since n^{-1/4} = o(u_n), the latter is true as soon as κ ≥ kθ. Combining (15), (17), (19), (18) and (14), we obtain
\[ \int_\Theta P_\theta\left( \|\eta(z) - \eta(y)\| \le \epsilon_n \right) \pi(\theta)\, d\theta = \pi(\theta^*)\,\mathrm{Vol}(\tilde B_{0,n})\,(1 + o_p(1)), \tag{20} \]

where B̃_{0,n} = {(θ − θ∗)⊺H∗(θ − θ∗) ≤ 2u_n²}. Let x = u_n^{-1}(θ − θ∗) be fixed with x⊺H∗x < 2; then, for n large enough, x⊺H∗x ≤ 2 − 4M_n²/(√n u_n²), and using
\[ \pi_{u_n^{-1},\epsilon}(x) = \pi_\epsilon(\theta^* + u_n x \mid y)\, u_n^{k_\theta}, \]
we then have
\[ \pi_{u_n^{-1},\epsilon}(x) = \frac{1 + o_p(1)}{\mathrm{Vol}\left(\{x : x^\top H^* x \le 2\}\right)}. \]
If x⊺H∗x > 2, then, if ǫ > 0 is small enough and n is large enough, x⊺H∗x ≥ 2(1 − Cǫ)^{-1} + 4M_n²/(√n u_n²) and $\pi_{u_n^{-1},\epsilon}(x) = o_p(1)$. This implies that the ABC posterior distribution of u_n^{-1}(θ − θ∗) converges to the uniform distribution over the ellipsoid {x : x⊺H∗x ≤ 2} in total variation.

≤ 2} in total variation.

Proposition 1

Proof. To prove Proposition 1, we prove that the approximate likelihood Pθ  Z − Zy+√n(b(θ)− b0) 2 ≤ ǫ2n 

is highly peaked around θ6= 0, and, as such, concentration around θ= 0 can not result.

As in the proof of Theorem 2, writing Zn = Z = (Z1, Z2)⊺, we can define W = Z/kZk and

R = kZk/vθ and we have that W and R are independent and that their distribution does not

depend on θ. In particular R2 ∼ χ2(2). Now, set h = b(θ)− b

0− Zy/√n, so that Z − Zy +√n(b(θ)− b0) 2 − nǫ2n =v2θR2+ 2 √ nRvθ < W, h > +n(khk2− ǫ2n)≤ 0 (21) if and only if ∆(W ) = vθ2n(< W, h >2 −khk2+ ǫ2n) = vθ2n ˜∆(W )≥ 0, R ∈ (r1(W ), r2(W ))∩ R+, where r1(W ) = √ n vθ h − < W, h > −p∆(W )˜ i, r2(W ) = √ n vθ h − < W, h > +p∆(W )˜ i. Note that ˜∆(W ) ≤ ǫ2

n so that if ˜∆(W ) ≥ 0 then | < W, h > | = khk(1 + O(ǫn)), given that

khk ≍ 1 on the event kZyk ≤ M for some arbitrarily large M. Therefore if < −W, h >≤ 0,

then < −W, h >≍ −khk and there is no solution for R in (21). Hence (21) holds if and only if <−W, h >≥ 0, ˜∆(W )≥ 0 and R ∈ (r1(W ), r2(W )).

By symmetry we can set W = −W and, on the set < W, h >≥ 0, using the fact that

R2 ∼ χ2(2), Pθ  Z − Zy +√n(b(θ)− b0) 2 ≤ nǫ2 n|W  = e−r1(W )2/2(1− e− n ˜∆(W ) v2θ ) r1(W )∈ √nkhk/vθ{1 − 2ǫn} ,√nkhk/vθ  To derive an approximation of Pθ  kZ − Zy+√n(b(θ)− b0)k2 ≤ nǫ2n 

we study more precisely r1(W ). For the sake of simplicity we assume that√nǫn = o(1), since the case where√nǫn= O(1)
