Overall Method

From the document Distributional National Accounts Guidelines (pages 106-109)

5.2 Preparing Survey Microdata

5.2.1 Interpolation of Tax Tabulations

5.2.1.2 Overall Method

The method of Blanchet, Flores, and Morgan (2019) is a consistent approach, rooted in calibration theory, for incorporating tax information into the survey. It has three stages: the choice of the “merging point”, the reweighting of survey observations above and below this point, and the expansion of the survey’s support by including the highest incomes from tax data.

Firstly, the choice of the merging point between the survey and tax distributions requires a comparison of the number of observations within each percentile. The input tax data for this exercise should take the form of the tabulation produced from the interpolation described in section 5.2.1. This should indicate (at least) bracket percentile, bracket thresholds and bracket averages in the form described by table 5.1.

p          thr           bracketavg
0          0             190
0.01       378           568
0.02       757           947
...        ...           ...
0.99997    10,597,050    12,059,701
0.99998    13,889,397    17,180,048
0.99999    22,006,175    64,444,108

Table 5.1: Example of Tabulation Used to Correct Survey Data
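As an illustration of the expected input, the visible rows of table 5.1 can be held as plain tuples and checked for internal consistency before use. This is a minimal sketch: the names `TABULATION` and `check_tabulation` are hypothetical, and the checks (increasing p, increasing thresholds, each bracket average lying between its own threshold and the next one) are basic sanity conditions assumed here, not requirements stated in the guidelines.

```python
# Rows of (p, thr, bracketavg), copied from the visible rows of table 5.1.
# The rows elided in the table are simply absent here.
TABULATION = [
    (0.0, 0, 190),
    (0.01, 378, 568),
    (0.02, 757, 947),
    (0.99997, 10_597_050, 12_059_701),
    (0.99998, 13_889_397, 17_180_048),
    (0.99999, 22_006_175, 64_444_108),
]


def check_tabulation(rows):
    """Return a list of problems found; an empty list means the rows look consistent."""
    problems = []
    for i, (p, thr, avg) in enumerate(rows):
        if not 0.0 <= p < 1.0:
            problems.append(f"row {i}: p={p} is outside [0, 1)")
        if avg < thr:
            problems.append(f"row {i}: average {avg} is below threshold {thr}")
        if i + 1 < len(rows):
            next_p, next_thr, _ = rows[i + 1]
            if next_p <= p:
                problems.append(f"row {i}: p does not increase")
            if next_thr < thr:
                problems.append(f"row {i}: thresholds do not increase")
            # The average of a bracket cannot exceed any later threshold,
            # so this check stays valid even when rows are missing in between.
            if avg > next_thr:
                problems.append(f"row {i}: average {avg} exceeds next threshold {next_thr}")
    return problems


issues = check_tabulation(TABULATION)
```

For the rows shown in table 5.1, `issues` is an empty list.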

The column p should always express percentiles of the same total population as the survey whose distribution is to be corrected. The method automatically selects a “merging point” between the two distributions by comparing the interpolated distribution of incomes from the tax data above a percentile p to the distribution of survey income.

This point p should correspond to the beginning of the “trustable span” of the tax data, that is, the point beyond which the distribution can be considered reliable. This can be defined as the share of the population covered in the tax data or the share of taxpayers in the population.[3]

[3] In practice, taxpayers typically constitute a subset of the population that declares income to the tax authorities.

We stress that the aim is never to use the full distribution from tax data above p to correct the survey. This is because the beginning of its trustable span is not always sharp: the reliability of the tax data increases with income but is not always well defined. Therefore it is more prudent to restrict its use to the minimum that is necessary. Moreover, once we are past the point where there is clear evidence of a bias, we prefer to avoid distorting the survey in unnecessary ways.

Therefore, we select a merging point that is usually above p.[4]

Figure 5.2: Choice of the Merging Point (densities f_Y(y) and f_X(y))

The precise method for choosing the merging point seeks to preserve the continuity of the underlying income density function, and to realistically account for the form of the bias present in figure 5.1. In cases where the trustable span is large enough, such that the survey and tax distributions overlap, the procedure for endogenously choosing the merging point is presented in figure 5.2.

[4] The method is flexible enough to also handle cases where the trustable span may be too small to observe an overlap between the densities. For these, additional parameters need to be defined from neighboring years or countries to extrapolate the form of the bias and thus the correction to the whole distribution. See Appendix B in Blanchet, Flores, and Morgan (2019).

Let θ(y) = f_X(y)/f_Y(y) be the ratio of the survey density to the tax density at the income level y. This ratio is approximated empirically by comparing the frequencies in the percentiles of the interpolated tax distribution described in section 5.2.1 with the corresponding frequencies in the survey distribution. The value of θ(y) can be interpreted as a relative probability. If θ(y) < 1, then people with income y are underrepresented in the survey. Conversely, if θ(y) > 1, then they are overrepresented.
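The empirical approximation of this ratio can be sketched as follows: for each tax bracket, divide the weighted share of survey observations falling in the bracket by the population share of the bracket in the tax tabulation. The function name `bracket_ratios`, the toy data, and the bracket convention [thr[k], thr[k+1]) are assumptions made for illustration, not part of the published method.

```python
from bisect import bisect_right


def bracket_ratios(p, thr, incomes, weights):
    """Estimate the survey-to-tax ratio for each bracket [thr[k], thr[k+1]).

    p[k] is the population share below thr[k] in the tax tabulation;
    the last bracket is open-ended and runs up to p = 1.
    Survey incomes below thr[0] are ignored.
    """
    survey_weight = [0.0] * len(thr)
    for y, w in zip(incomes, weights):
        k = bisect_right(thr, y) - 1  # index of the bracket containing y
        if k >= 0:
            survey_weight[k] += w
    total = sum(weights)
    upper_p = list(p[1:]) + [1.0]
    tax_share = [hi - lo for lo, hi in zip(p, upper_p)]
    return [(s / total) / t for s, t in zip(survey_weight, tax_share)]


# Toy example (made-up numbers): three brackets, ten equally weighted observations.
p = [0.0, 0.5, 0.8]
thr = [0, 100, 300]
incomes = [10, 20, 30, 40, 50, 60, 150, 200, 250, 400]
weights = [1.0] * 10
theta = bracket_ratios(p, thr, incomes, weights)
```

In this toy example the ratios are approximately 1.2, 1.0 and 0.5: the bottom bracket is overrepresented in the survey and the top bracket is underrepresented.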

Note that what matters for the correction is the probability of response at a given income level relative to the average response rate. Intuitively, if some people are underrepresented in the survey, then mechanically others have to be overrepresented, since the weights must ultimately sum to the population size. This basic constraint has important consequences for how we think about the adjustment of distributions. Any modification of one part of the distribution is bound to have repercussions on the rest. In particular, it makes little sense to assume that the survey is not representative of the rich, and at the same time that it is representative of the non-rich.

Let Θ(y) be the cumulative-density counterpart to θ(y), i.e., Θ(y) = F_X(y)/F_Y(y). Like θ(y), it can be estimated directly from the data. As figure 5.2 shows, in order to ensure a plausible form for θ(y), and for the density function to remain continuous, the merging point has to be chosen as the highest value of y such that Θ(y) = θ(y). Tax data is discarded below this point. We call this point ȳ.
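With bracketed data the two ratios rarely coincide exactly, so one possible discrete approximation is to scan the brackets from the top down and keep the lowest bracket from which the density ratio stays below the cumulative ratio all the way to the top. The sketch below takes this approach. It assumes that p[0] = 0 and that all survey mass falls inside the brackets, so that the survey share of bracket k equals theta[k] times the tax share of that bracket. The name `merging_bracket` is hypothetical, and the published implementation may differ (for instance by smoothing the ratios first).

```python
def merging_bracket(p, theta):
    """Return the index m of the first bracket to be corrected, or None.

    p[k] is the tax population share below bracket k (assumes p[0] == 0);
    theta[k] is the survey-to-tax density ratio in bracket k.
    """
    upper_p = list(p[1:]) + [1.0]
    cum_survey = 0.0  # survey share below the lower threshold of bracket k
    big_theta = [None] * len(p)
    for k, (lo, hi) in enumerate(zip(p, upper_p)):
        if lo > 0:
            big_theta[k] = cum_survey / lo  # cumulative ratio F_X / F_Y
        cum_survey += theta[k] * (hi - lo)
    m = None
    for k in range(len(p) - 1, 0, -1):  # scan from the top bracket down
        if theta[k] < big_theta[k]:
            m = k
        else:
            break  # highest bracket where the density ratio is not below the cumulative one
    return m


# Toy example (made-up numbers; the implied survey shares sum to one).
p = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
theta = [0.9, 1.0, 1.1, 1.2, 1.0, 0.6]
m = merging_bracket(p, theta)
```

Here the cumulative ratio at the lower bounds of brackets 3, 4 and 5 is about 1.00, 1.05 and 1.04, so the density ratio falls below it from bracket 4 onward and `m` is 4.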

Since the corrected survey weights must still sum to the total population, the increase in density above the merging point has to be offset: densities below the merging point must be lowered. Given the lack of tax data below the merging point, we choose to uniformly reweight observations below ȳ. This corresponds to the dotted blue line in figure 5.2. The implicit assumption made is that the relative probability of response stays constant over this part of the distribution, which seems to be satisfied in practice in countries with sufficient tax data (see Blanchet, Flores, and Morgan, 2019, section 3.2.2).
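The reweighting logic can be sketched in a few lines: weights at or above the merging bracket are divided by the estimated ratio of survey to tax density, and weights below it are multiplied by a single constant chosen so that the total is unchanged. This is a simplified illustration with hypothetical names (`reweight`, `brackets`, `m`); it ignores the calibration on other covariates that the full method supports.

```python
def reweight(weights, brackets, theta, m):
    """Reweight survey observations around the merging bracket m.

    weights[i]  : original survey weight of observation i
    brackets[i] : index of the tax bracket containing observation i
    theta[k]    : survey-to-tax density ratio in bracket k (positive)
    """
    total = sum(weights)
    # Corrected mass at or above the merging point.
    top = sum(w / theta[k] for w, k in zip(weights, brackets) if k >= m)
    below = sum(w for w, k in zip(weights, brackets) if k < m)
    if below <= 0:
        raise ValueError("no survey weight below the merging point")
    scale = (total - top) / below  # uniform factor that preserves the total
    return [
        w / theta[k] if k >= m else w * scale
        for w, k in zip(weights, brackets)
    ]


# Toy example (made-up numbers): one observation per bracket.
weights = [18.0, 20.0, 22.0, 24.0, 10.0, 6.0]
brackets = [0, 1, 2, 3, 4, 5]
theta = [0.9, 1.0, 1.1, 1.2, 1.0, 0.6]
new_weights = reweight(weights, brackets, theta, m=4)
```

In this example the top observation's weight rises from 6 to about 10, every weight below the merging bracket is scaled by 80/84, and the total stays at 100.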

The reweighting stage of the correction can be interpreted as addressing the survey’s non-sampling error, that is, error from heterogeneous nonresponse rates across the population. Such a calibration procedure has several benefits. It allows users to adjust weights while respecting the aggregate consistency of other variables known to be already representative (e.g., age, gender). But it also allows users to incorporate other distributional dimensions from the tax data, such as population characteristics or income composition, if they exist, as well as to deal with differing income concepts. These additional dimensions are elaborated in section 5.2.1.3.

After applying the calibration, the survey should be statistically indistinguishable from the tax data. This would be sufficient if the goal were to use the weighted survey to run regressions on the relationship between different variables. However, the precision that we get at the top of the income distribution may still be insufficient for some purposes, namely, accurately estimating inequality indicators, especially those that emphasize the top of the distribution, like top income shares. Indeed, they suffer from severe small-sample biases, which in general lead to an underestimation of inequality.

This sampling error — which comes from the limited size of the survey sample — can be overcome by expanding the survey’s support. Practically, the method involves replacing the survey distribution beyond the merging point with the tax distribution, matching the survey covariates to it by preserving the rank of each observation. Statistically, this implies using the tax data for the marginal distribution of income, but keeping the survey data when it comes to other variables (and their dependency, by which we mean their copula, with income). This step ensures that the income distribution from the tax data is reproduced, while the survey’s covariate distribution (including the household structure) is preserved. Although the socio-demographic totals for variables other than income are maintained, their conditional distribution can vary.
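The rank-preserving replacement can be illustrated with a deliberately simplified sketch. Each survey observation is given a weighted midpoint rank in the population. Observations ranked at or above the merging percentile then take the average income of the tax bracket containing that rank, while all their other variables are left untouched. The names `replace_top_incomes` and `p_merge` are hypothetical, and the sketch only overwrites incomes: it does not create the additional observations that the full method uses to expand the survey's support.

```python
from bisect import bisect_right


def replace_top_incomes(incomes, weights, p, bracketavg, p_merge):
    """Replace incomes ranked at or above p_merge by tax bracket averages.

    p[k] is the lower population share of tax bracket k and bracketavg[k]
    its average income. Returns a new list in the original observation
    order, so covariates stay attached to the same rows.
    """
    total = sum(weights)
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    new_incomes = list(incomes)
    cum = 0.0
    for i in order:
        rank = (cum + weights[i] / 2) / total  # weighted midpoint rank
        cum += weights[i]
        if rank >= p_merge:
            k = bisect_right(p, rank) - 1  # tax bracket containing this rank
            new_incomes[i] = bracketavg[k]
    return new_incomes


# Toy example (made-up numbers): five equally weighted observations.
p = [0.0, 0.5, 0.8]
bracketavg = [50, 200, 1000]
incomes = [10, 300, 40, 120, 90]
weights = [1.0] * 5
corrected = replace_top_incomes(incomes, weights, p, bracketavg, p_merge=0.8)
```

Only the highest-ranked observation (income 300, rank 0.9) lies above the merging percentile, so it takes the top bracket average of 1000 and the other incomes are unchanged.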
