Putting Robust Statistical Methods into Practice: Poverty Analysis in Tunisia


AYADI, Mohamed, MATOUSSI, Mohamed Salah, VICTORIA-FESER, Maria-Pia


AYADI, Mohamed, MATOUSSI, Mohamed Salah, VICTORIA-FESER, Maria-Pia. Putting Robust Statistical Methods into Practice: Poverty Analysis in Tunisia. Swiss Journal of Economics and Statistics, 2001, vol. 3, no. 14, p. 463-482

Available at:

http://archive-ouverte.unige.ch/unige:6453


Mohamed Ayadi¹, Mohamed Salah Matoussi² and Maria-Pia Victoria-Feser³

May 18, 2001

¹ ISG, Université de Tunis.

² Université de Tunis III.

³ Université de Genève.

Abstract

Poverty analysis often results in the computation of poverty indexes based on so-called poverty lines, which can be region-specific. The poverty lines are made of two components, namely the amounts of income needed to satisfy food and non food needs. For both components, one needs to estimate quantities such as local prices or the consumers' average basket, and this is often done through a parametric model. The resulting estimates depend on the data at hand and on the type of estimators that are used.

Classical estimators (and testing procedures) such as the maximum likelihood estimator (MLE) or the least squares (LS) estimator are extremely sensitive to model deviations such as contamination in the data, and hence are said to be non-robust. The resulting analysis can therefore give a picture which is far from reality. A robust statistical approach to the estimation of the poverty lines is therefore very important, especially because these lines serve to compute poverty indices. The main purpose of this paper is to show how robust statistical procedures can be used in poverty analysis, and the different picture of poverty comparisons they can give in the Tunisian case.

The computation of poverty lines is an important component of poverty analysis. Parametric models are normally used for this purpose. The classical estimation of such models is strongly influenced by outliers and other deviations. To avoid this problem, we propose robust methods in this article. Using data from Tunisia, the application compares the results of the classical and robust methods.

Keywords: poverty lines, robust estimation, local prices, evaluation of basic non food needs, poverty comparisons.

JEL classification: C13, I32.

1 Introduction

Poverty analysis is based on the determination of poverty lines from which one then computes poverty indexes such as the head count ratio or more sophisticated ones (see e.g. Sen 1976, Clark, Hemming, and Ulph 1981, Foster, Greer, and Thorbecke 1984). These indexes can then be used by economists and policy makers for temporal or spatial comparisons in a relatively easy manner. Although the choice of the poverty index is an important issue, we concentrate in this paper on the computation of the poverty lines. The literature distinguishes two procedures for the computation of poverty lines: the method of uniformity and the one which puts the accent on specificity.

Several authors (see e.g. Foster and Shorrocks 1991) argue that, if we retain the criterion of a nationwide poverty alleviation policy, the establishment of poverty lines needs to be independent of the subgroup to which the household belongs. Such an approach has often led to an overestimation of the number of poor in some areas, such as rural areas, since it does not take into account the differences in the cost of basic needs, which are higher in other areas, such as urban areas, especially for non food goods such as housing. On the other hand, the approach focusing on specificity insists that the choice of the consumption basket should reflect the local behavior of the consumer. It therefore indirectly includes the relative perception of poverty, in that locally the poor will behave in a similar manner. In our opinion, the second approach is more appropriate to explain the differences between regions, since it allows one to better target the poorest.

In this paper, we therefore consider the general approach for the estimation of poverty lines proposed by Ravallion and Bidani (1994). This approach consists of first determining the minimum income to satisfy basic food needs and second estimating the minimum income to satisfy basic non food needs. These minimum incomes constitute respectively the food and non food poverty lines. Basic food needs are computed on a regional basis depending on the local food consumption behavior, so that the typical consumption basket ensures a minimal calorific intake as determined by nutritionists. This basket is then evaluated using local prices so that the food poverty line can be calculated. The non food poverty line is then estimated using an AIDS type model (see e.g. Deaton and Muellbauer 1980).

As will be detailed below, the process that leads to the determination of the poverty lines goes through the estimation of different quantities. The resulting estimates depend on the data at hand and, without being too pessimistic, it would not be very reasonable to fully trust their reliability. A question that might arise is whether a few contaminated data can have a large influence on the results of the analysis. This is what is usually understood by the term statistical robustness. Contaminated data can be of any form, such as clearly (or less clearly) outlying observations, in other words households which are very different from the others in terms of their consumption behavior. They can also be crude mistakes such as recording errors. If the amount of contamination is very small, one would expect the resulting analysis not to be (too much) influenced. Unfortunately, it is now well known that classical estimators (and testing procedures) such as the maximum likelihood estimator (MLE) or the least squares (LS) estimator are extremely sensitive to contamination in the data. These estimators are the ones traditionally used to estimate the different quantities needed for the evaluation of the poverty lines. The resulting analysis can therefore give a picture which is far from reality. Indeed, Cowell and Victoria-Feser (1996) showed that almost any poverty index is robust to data contamination provided that the poverty line is either exogenous to the data or robustly estimated. It is therefore important to at least consider, as an alternative, a procedure which uses robust estimators.

The methodology we propose here will be illustrated by means of the case of Tunisia. We will make use of the household survey conducted by the INS (Tunisian Statistical Institute) in 1990, involving 7734 representative households from different parts of the country. Unfortunately the household survey does not provide direct information on prices. Instead, it gives detailed information on expenditures (including consumption of own products) and quantities, so that local prices can be estimated. Half of the sampled households were also included in another survey from which we can get information about the calorific content of the goods.

Each of the following sections is devoted to one of the steps in the determination of the poverty lines. Section 2 starts with an introduction to robust statistics. In Section 3 we discuss the estimation of the poverty line, which goes through the estimation of local prices for the food components of the consumption basket and the estimation of the food consumption basket (i.e. the proportion of goods consumed by the typical household in each region). The non food component of the consumption basket is also estimated. Section 4 is devoted to the case of Tunisia, with a rural-urban and spatial comparison. Finally, Section 5 concludes.

2 Introduction to robust statistics

Roughly speaking, robust statistics is an extension of classical statistics that takes into account the possibility of model misspecification. Model misspecification has a broad meaning and includes, among others, outliers in the data. Actually one is in the presence of model misspecification every time the classical assumptions underlying the model and justifying the method of estimation or testing procedure are violated, i.e. very often.

It is well known that in normal regression models, for example, a single observation (leverage point) can attract the regression line, so that the estimated regression parameters give a wrong picture of reality. Removing leverage points when they can be detected is not enough because of, for example, so-called masking effects. Robust statistics aims at developing statistical methods that automatically take into account the possibility of model misspecification and that permit, among other things, the automatic detection of outliers. They are a generalization of classical procedures because, when the specified model is exact, they behave like classical methods, and when the model is misspecified (or contaminated), they guarantee to some extent the stability of the estimators or the test statistics. Compared to classical methods, robust methods have the advantage of being more stable (or less influenced) when the model is contaminated. In return, they lose some of the classical properties, like efficiency for estimators or power for tests, when the model is exact.

2.1 Formalization

Suppose we have the observations $x_1, \ldots, x_n$ belonging to some sample space $\mathcal{X}$ (of any dimension) and a parametric model $F_\theta$ with density $f(\cdot;\theta)$ on the sample space, where the unknown parameter $\theta$ belongs to some parameter space $\Theta \subseteq \mathbb{R}^p$. In classical statistics, one then assumes that the observations are distributed according to $F_\theta$, and undertakes to estimate $\theta$ based on the data at hand. In robustness theory the model $F_\theta$ is considered as a mathematical abstraction which is only an ideal approximation of reality. Statistical procedures are developed which still behave fairly well under deviations from the assumed model. We identify the sample $x_1, \ldots, x_n$ with its empirical distribution $F^{(n)}$ given by

$$F^{(n)}(x) = \frac{1}{n}\sum_{i=1}^{n} \Delta_{x_i}(x)$$

where

$$\Delta_{x_i}(x) = \begin{cases} 0 & x < x_i \\ 1 & x \geq x_i \end{cases}$$

As estimators of $\theta$ we consider real-valued statistics $T_n = T_n(x_1, \ldots, x_n)$. We represent them as functionals of the empirical distribution, $T_n(x_1, \ldots, x_n) = T(F^{(n)})$. At the model, if the estimator is consistent, we have $T(F_\theta) = \theta$.

A neighborhood of the parametric model $F_\theta$ in which robust estimators will be stable is then defined by the set of distributions $\{G_\varepsilon \mid G_\varepsilon = (1-\varepsilon)F_\theta + \varepsilon W\}$, where $W$ is an arbitrary distribution function. The distribution $G_\varepsilon$ can be considered as a mixture distribution between $F_\theta$ and $W$, lying in a neighborhood of radius $\varepsilon$ from $F_\theta$. The data generated by $G_\varepsilon$ are actually generated from the parametric model $F_\theta$ with a usually large probability $(1-\varepsilon)$ and from the contamination distribution $W$ with small probability $\varepsilon$. One particular case is when $W = \Delta_z$, leading to the neighborhood $\{F_\varepsilon \mid F_\varepsilon = (1-\varepsilon)F_\theta + \varepsilon\Delta_z\}$. $F_\varepsilon$ is often referred to as an $\varepsilon$-type contamination distribution. Its interpretation is easy, in that one can say that $F_\varepsilon$ generates data from $F_\theta$ with probability $(1-\varepsilon)$ and gross errors $z$ with probability $\varepsilon$.

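As an illustration that is not part of the original analysis, the following minimal Python sketch draws data from a point-mass contamination model with a standard normal central model. The function name `sample_contaminated` and the values of the contamination proportion (5%) and of the gross error (50) are assumptions chosen only for this example.

```python
import random
import statistics

def sample_contaminated(n, eps, z, mu=0.0, sigma=1.0, seed=0):
    """Draw n points from (1 - eps) * N(mu, sigma^2) + eps * (point mass at z)."""
    rng = random.Random(seed)
    return [z if rng.random() < eps else rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical settings: 5% of gross errors located at z = 50.
data = sample_contaminated(n=2000, eps=0.05, z=50.0)
mean_est = statistics.fmean(data)      # classical estimate of the centre
median_est = statistics.median(data)   # simple robust alternative
print(f"mean = {mean_est:.3f}, median = {median_est:.3f}")
```

Under these assumed settings the mean is pulled away from the centre of the central model (which is 0), while the median moves only slightly.
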
2.2 The Influence Function

There are different ways of assessing the robustness properties of estimators evaluated in a neighborhood of the model. One well-known way, called the infinitesimal approach, looks at the behavior of the estimator when a very small or infinitesimal amount of contamination is introduced in the model. To do this one uses the Influence Function (IF), a tool that was introduced by Hampel (1968, 1974) and further developed by Hampel, Ronchetti, Rousseeuw, and Stahel (1986). It describes how the estimator responds to a small amount of contamination at any point $z$. The IF at $z$ can be thought of as an approximation to the relative change in an estimate caused by the addition of a small proportion of spurious observations at $z$. It is defined at the model $F_\theta$ by

$$IF(z; T, F_\theta) = \lim_{\varepsilon \to 0} \left[ \frac{T((1-\varepsilon)F_\theta + \varepsilon\Delta_z) - T(F_\theta)}{\varepsilon} \right]$$

Hence, it describes the effect of a small contamination at the point $z$ ($\varepsilon\Delta_z$) on the estimate, standardized by the mass of the contamination. Although the IF appears to be a particular case of a measure of influence, it is sufficient to describe the maximal asymptotic bias of an estimator over a neighborhood of the model. Indeed, it can be shown that

$$\sup_W \| T(G_\varepsilon) - T(F_\theta) \| \approx \varepsilon \sup_x \| IF(x; T, F_\theta) \|$$

see Hampel et al. (1986). Therefore, by studying the IF of an estimator, one assesses the robustness properties of this estimator under the worst type of contamination.
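
A finite-sample analogue of the IF, often called the sensitivity curve, can be sketched as follows. This is an illustration under assumptions: the helper `sensitivity_curve` and the artificial sample are hypothetical, and the standardization by $n+1$ mimics the division by the contamination mass $\varepsilon = 1/(n+1)$.

```python
import statistics

def sensitivity_curve(estimator, sample, z):
    """Finite-sample analogue of the IF: (n + 1) * (T(sample + [z]) - T(sample))."""
    n = len(sample)
    return (n + 1) * (estimator(sample + [z]) - estimator(sample))

# Hypothetical clean sample: 99 equally spaced points centred at 0.
sample = [(i - 49) / 10 for i in range(99)]

for z in (1.0, 10.0, 100.0, 1000.0):
    sc_mean = sensitivity_curve(statistics.fmean, sample, z)
    sc_median = sensitivity_curve(statistics.median, sample, z)
    print(f"z = {z:7.1f}  mean: {sc_mean:9.2f}  median: {sc_median:6.2f}")
```

For the mean the curve grows linearly with $z$ (unbounded influence), whereas for the median it stays constant once $z$ lies beyond the centre of the sample (bounded influence).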


2.3 General class of estimators

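Huber (1964) introduced a general class of estimators generalizing the MLE in which robust estimators can be defined. This class, the class of M-estimators, defines estimators as the solution $T_\psi$ in $\theta$ of the minimization problem

$$\min_\theta \sum_{i=1}^{n} \rho(x_i; \theta)$$

where $\rho$ is some convex function on $\mathcal{X} \times \Theta$. Suppose that $\rho$ has a derivative

$$\psi(x;\theta) = \left[ \frac{\partial}{\partial\theta_1}\rho(x;\theta), \ldots, \frac{\partial}{\partial\theta_p}\rho(x;\theta) \right]'$$

so that $T_\psi$ solves the estimating equation $\sum_{i=1}^{n} \psi(x_i;\theta) = 0$. The MLE is included in this class: with $\rho(x;\theta) = -\log f(x;\theta)$, the estimating equation can be written (up to a sign, which does not change its solution) with

$$\psi(x;\theta) = s(x;\theta) = \left[ \frac{\partial}{\partial\theta_1}\log f(x;\theta), \ldots, \frac{\partial}{\partial\theta_p}\log f(x;\theta) \right]'$$

One of the most important properties of the IF is that it is proportional to the function $\psi$ defining the M-estimators. Hence, for Fisher consistent M-estimators (i.e. $\psi$ is such that $\int \psi(x; T_\psi)\,dF_\theta(x) = 0$), we have

$$IF(z; T_\psi, F_\theta) = M(\psi, F_\theta)^{-1}\psi(z;\theta) \tag{1}$$

where $M(\psi, F_\theta) = -\int \frac{\partial}{\partial\theta'}\psi(x;\theta)\,dF_\theta(x)$. Choosing a bounded function $\psi$ defines a robust M-estimator. In general, $\psi$ is a function of the scores function itself ($y = s(x;\theta)$), or of the standardized residuals ($y = r/\sigma$) in the regression problem. A very popular $\psi$ function is Huber's function

$$\psi_c(y) = H_c(y) = y \cdot \min\left(1, \frac{c}{\|y\|}\right)$$

The bounding constant $c$ is chosen so that extreme values are indeed bounded and that, at the model, the M-estimator does not lose too much efficiency compared to the MLE. For example, when the underlying model is normal, the value $c = 1.345$ leads to a relative efficiency of 95%.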
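
To make the role of Huber's function concrete, here is a minimal Python sketch of a location M-estimator computed by iteratively reweighted averaging. It is an illustration only: the function names, the choice of fixing the scale at the normalized MAD, and the data are assumptions of this sketch and do not come from the paper.

```python
import statistics

def huber_psi(y, c=1.345):
    """Huber's function for a scalar y: y * min(1, c / |y|)."""
    return y if abs(y) <= c else c * (1 if y > 0 else -1)

def huber_location(x, c=1.345, tol=1e-10, max_iter=200):
    """Location M-estimate: solves sum(psi_c((x_i - t) / s)) = 0 in t."""
    med = statistics.median(x)
    # The scale s is fixed at the normalised MAD (an assumption of this sketch).
    s = 1.4826 * statistics.median(abs(xi - med) for xi in x)
    t = med
    for _ in range(max_iter):
        # Weights psi(r) / r: 1 in the centre of the data, c / |r| in the tails.
        w = [1.0 if abs(xi - t) <= c * s else c * s / abs(xi - t) for xi in x]
        t_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

# Hypothetical data: nine plausible values and one gross error.
x = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.1, 1000.0]
print("mean:", statistics.fmean(x), "Huber estimate:", huber_location(x))
```

The gross error enters the estimating equation only through the bounded value $c$, so the estimate stays close to the bulk of the data, whereas the mean does not.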

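Finally, under regularity conditions (see Huber 1981), an M-estimator has an asymptotic normal distribution, i.e. $\sqrt{n}(T_\psi - \theta) \xrightarrow{D} N(0, V(\theta, F_\theta))$, with the asymptotic covariance matrix

$$V(\theta, F_\theta) = M(\psi, F_\theta)^{-1} Q(\psi, F_\theta) M'(\psi, F_\theta)^{-1} \tag{2}$$

where $M(\psi, F_\theta) = -\int \frac{\partial}{\partial\theta'}\psi(x;\theta)\,dF_\theta(x)$ and $Q(\psi, F_\theta) = \int \psi(x;\theta)\psi(x;\theta)'\,dF_\theta(x)$.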


2.4 Robust statistics in some particular models

Apart from M-estimators, ad hoc robust estimators have been proposed in simple problems. In the univariate location and scale problem, it is very common practice to replace the sample mean by the sample median to estimate the center of the data. To estimate the scale, the sample standard deviation can be replaced by a very popular measure called the median absolute deviation (MAD), given by

$$1.4826 \,\operatorname{med}_i \left| x_i - \operatorname{med}_j x_j \right|$$
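
The MAD is straightforward to compute. The following short Python sketch uses hypothetical unit values, one of which is a gross error, to compare it with the sample standard deviation; the function name `mad` and the data are assumptions for illustration.

```python
import statistics

def mad(x):
    """Normalised median absolute deviation: 1.4826 * med|x_i - med(x)|."""
    med = statistics.median(x)
    return 1.4826 * statistics.median(abs(xi - med) for xi in x)

# Hypothetical unit values; the last one is a gross error.
unit_values = [1.2, 1.3, 1.25, 1.4, 1.35, 1.3, 160.0]
print("standard deviation:", statistics.stdev(unit_values))
print("MAD:", mad(unit_values))
```

The factor 1.4826 makes the MAD comparable to the standard deviation when the data are normally distributed.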

In multivariate settings, the problems cannot be translated from the classical to the robust so easily. For example, with multivariate normal data that are supposed to be contaminated, taking the vector of the medians to estimate the center is not a good solution (see Rousseeuw and Leroy 1987). Moreover, most M-estimators with a bounded IF suffer from what is called a low breakdown point when the dimension is large. Indeed, in contrast to the IF, which is a local concept, the breakdown point is a measure of the global reliability of the estimators, which describes up to what distance (in probability) from the model distribution the estimators still give some relevant information (the estimators are still near the true value of the parameters).

The breakdown point is associated with quantitative robustness. The simplest definition of the breakdown point $\varepsilon^*$ is the minimum proportion $\varepsilon$ of contamination at a point $z$ for which the estimator $T((1-\varepsilon)F_\theta + \varepsilon\Delta_z)$ is unbounded in $z$. The formal definition is given in Hampel et al. (1986), p. 97. With most M-estimators for the multivariate location and scale problem, $\varepsilon^* = 1/(p+1)$ (see e.g. Huber 1977). Several authors have proposed high breakdown point estimators (see e.g. Stahel 1981, Donoho 1982, Rousseeuw 1984, Tamura and Boos 1986, Davies 1987, Lopuhaä 1991, Woodruff and Rocke 1994, Tyler 1994, Kent and Tyler 1996, Cheng and Victoria-Feser 2000) and some are even implemented in the Splus statistical package. This is the case, for example, for the minimum volume ellipsoid (MVE) estimator of Rousseeuw (1984). Broadly speaking, the MVE finds the centre of the smallest ellipsoid containing half of the data. It is very robust in that it can withstand up to nearly 50% of outliers. It is however not very efficient. Therefore it is common practice to use the MVE to detect extreme observations, which are then discarded, and averages are then computed for each variable. Extreme values are those whose (Mahalanobis) distance from the centre is relatively large. Although more recent research proposes more sophisticated high breakdown point estimators, this is the estimator we used in our calculations because it is readily available.

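The detect-then-average practice described above can be sketched in Python for two variables. This is not the Splus routine used for the paper's calculations: it is a simple resampling approximation of the MVE, written under assumptions (random triples of points, a chi-square cutoff with 2 degrees of freedom, artificial data), and all function names are hypothetical.

```python
import math
import random

def mean_cov(points):
    """Mean vector and 2x2 sample covariance (sxx, sxy, syy) of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def sq_dist(p, centre, cov):
    """Squared Mahalanobis distance of p for a 2x2 covariance (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def approx_mve(points, n_trials=500, seed=0):
    """Among ellipses built from random triples and inflated to cover half
    of the data, keep the one with the smallest area."""
    rng = random.Random(seed)
    h = len(points) // 2 + 1
    best = None
    for _ in range(n_trials):
        centre, cov = mean_cov(rng.sample(points, 3))
        det = cov[0] * cov[2] - cov[1] ** 2
        if det <= 1e-12:
            continue  # degenerate (nearly collinear) triple
        m2 = sorted(sq_dist(p, centre, cov) for p in points)[h - 1]
        area = m2 * math.sqrt(det)  # proportional to the area of the ellipse
        if best is None or area < best[0]:
            best = (area, centre, cov, m2)
    _, centre, cov, m2 = best
    # Rescale so that squared distances are comparable to a chi-square(2),
    # whose median is 2 * ln(2) = 1.386.
    scale = m2 / 1.386
    return centre, tuple(c * scale for c in cov)

# Hypothetical data: 45 regular observations and 5 outlying ones.
rng = random.Random(1)
clean = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(45)]
outliers = [(20 + rng.gauss(0, 0.5), 20 + rng.gauss(0, 0.5)) for _ in range(5)]
data = clean + outliers

centre, cov = approx_mve(data)
cutoff = 7.378  # 97.5% quantile of a chi-square with 2 degrees of freedom
kept = [p for p in data if sq_dist(p, centre, cov) <= cutoff]
classical_centre, _ = mean_cov(data)
robust_centre, _ = mean_cov(kept)
print("classical:", classical_centre, "robust:", robust_centre)
```

Observations whose robust distance exceeds the cutoff are discarded, and the averages are then computed on the remaining ones, as in the two-step practice described in the text.
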
In the regression problem, the estimators can be influenced by extreme observations either in the $y$-direction (influence of residual) or in the $x$-direction (influence of position). One important purpose of robust estimation is to bound both influences. The simplest estimators have been defined by considering the influence on $x$ and on $y$ separately (for a review see e.g. Hampel et al. 1986, Marazzi 1993). We propose here to use a robust estimator of the so-called Mallows class of M-estimators. They generalize the MLE, which is given for the normal regression model $Y = X\beta + \varepsilon$ by

$$\frac{1}{n}\sum \frac{r_i(\beta)}{\sigma}\, x_i = 0$$

$r_i$ being the residuals. Mallows class is defined by

$$\frac{1}{n}\sum \psi\left(\frac{r_i}{\sigma}\right) w(x_i)\, x_i = 0 \tag{3}$$

Therefore, by choosing $\psi$ and $w$ appropriately, the influence of large residuals ($r_i$) and extreme values in the independent variables ($x_i$) is limited. There are different possible choices for $\psi$ and $w$ depending on several optimality criteria (see Hampel et al. 1986). Actually the differences lie mainly in the choice of $w(x)$. We choose here a weighting scheme based on the diagonal elements $h_{ii}$ of the hat matrix $H = X(X'X)^{-1}X'$. Without leverage points, $h_{ii} \approx \frac{p}{n}$ ($p = \dim(\beta)$), so that one downweights cases for which $h_{ii}$ exceeds a value $b\frac{p}{n}$. Thus

$$w_b(x_i) = \min\left\{1, \frac{bp/n}{h_{ii}}\right\}$$

A benchmark is given by $b = 1.5$. For the function $\psi$, an efficient choice is the Huber function $\psi_c\left(\frac{r}{\sigma}\right) = \frac{r}{\sigma}\min\left\{1, \frac{c}{|r/\sigma|}\right\}$, which downweights standardized residuals lying far away, depending on the choice of the constant $c$.

Choosing $c = \infty$ leads to no downweighting, and choosing $c = 1.345$ leads to an M-estimator achieving 95% of the MLE's efficiency (see e.g. Hampel et al. 1986). One also needs to estimate $\sigma$, and once again the estimator should be robust. A relatively simple one is given by the MAD of the residuals. Finally, the asymptotic covariance matrix of an M-estimator defined by (3) is found using (2) and is estimated by

$$\hat{\sigma}^2 \frac{\int \psi_c^2(x)\,d\Phi(x)}{\left[\int \psi_c'(x)\,d\Phi(x)\right]^2} \left[X^T \operatorname{diag}(w(x_i)) X\right]^{-1} \left[X^T \operatorname{diag}(w^2(x_i)) X\right] \left[X^T \operatorname{diag}(w(x_i)) X\right]^{-1}$$

$\Phi$ being the standard normal distribution.
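
For a regression with an intercept and a single covariate, equation (3) can be solved by iteratively reweighted least squares, as in the following Python sketch. It is an illustration under assumptions rather than the implementation used for the paper: the function names, the least squares starting value, the update of $\sigma$ by the MAD of the residuals at each iteration, and the artificial data are all choices made for this example.

```python
import random
import statistics

def wls(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ym - b1 * xm, b1

def mallows_fit(x, y, b=1.5, c=1.345, n_iter=100, tol=1e-10):
    """Mallows-type estimate with Huber psi_c and hat-matrix weights w_b."""
    n, p = len(x), 2
    xm = statistics.fmean(x)
    sxx = sum((xi - xm) ** 2 for xi in x)
    # Diagonal of the hat matrix for an intercept and one covariate.
    h = [1 / n + (xi - xm) ** 2 / sxx for xi in x]
    wx = [min(1.0, (b * p / n) / hi) for hi in h]
    b0, b1 = wls(x, y, [1.0] * n)  # least squares starting value
    for _ in range(n_iter):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        med = statistics.median(r)
        sigma = max(1.4826 * statistics.median(abs(ri - med) for ri in r), 1e-12)
        # Huber weights psi_c(r / sigma) / (r / sigma).
        wr = [min(1.0, c * sigma / abs(ri)) if ri != 0 else 1.0 for ri in r]
        new_b0, new_b1 = wls(x, y, [a * g for a, g in zip(wx, wr)])
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1

# Hypothetical data: a line with slope 0.5 plus one bad leverage point.
rng = random.Random(0)
x = [float(i) for i in range(1, 21)] + [60.0]
y = [2 + 0.5 * xi + rng.gauss(0, 0.3) for xi in x[:-1]] + [0.0]
ls = wls(x, y, [1.0] * len(x))
rob = mallows_fit(x, y)
print("LS slope:", ls[1], "Mallows-type slope:", rob[1])
```

In this artificial example the single leverage point is enough to flatten the least squares line, while the reweighted fit should remain close to the slope of the bulk of the data.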


3 Estimation of poverty lines

3.1 Food poverty lines

To determine the food poverty lines, one has to estimate the local consumer's basket and evaluate it at local prices. Consumers' habits, and especially the heterogeneity of the products available on local markets, lead to very different consumer's baskets having the same calorific content. Moreover, the prices of the same product can vary considerably across regions and are not necessarily directly available. The aim of this section is therefore to determine the procedures for the evaluation of the cost of a consumer's basket of food with minimum calorific content. We start with the estimation of local prices and continue with the determination of the local consumer's basket.

3.1.1 Estimation of local prices

The non existence of official regional prices is a widespread problem and usually “proxies” are used. For example, Chatterjee and Bhattacharya (1974) approximate rural price indices in India with the consumer price index of agricultural workers, and the urban price index with the consumer price index of industrial workers and non agricultural employees. In the case of Tunisia, the INS survey gives information on quantities and expenditures for the food products consumed by households. For this case and similar ones we propose to take a summary value of the unit values of food products, which can be estimated directly from the survey data (see also Deaton, Parikh, and Subramanian 1993).

It is well established in the recent literature that unit values suffer from major problems that lead to non negligible biases (see Deaton and Paxson 1995, Ayadi and Matoussi 1999). The most serious sources of bias are, in our opinion, measurement errors in the quantities bought by the households and the quality effects associated with the unit values. Indeed, the unit values do not only reflect prices but also the quality choice for the relevant product. Despite these limits, there is a persistent tradition of using the concept of unit value (see e.g. Deaton 1988, 1990, Ayadi, Krishnakumar, and Matoussi 1997). Given a set of different partially aggregated food products, a vector of unit values is computed for each household. The question then is how to obtain, for each region, an estimate of the centre of the unit values which is not influenced by extreme observations, while at the same time avoiding the problems due to the aggregation of heterogeneous products, which is unavoidable in order to keep the number of categories manageable.

For the problems induced by goods heterogeneity we can only propose an ad hoc procedure. One should avoid too much aggregation, in which a category would include different products with considerably different prices. For example, the meat category is too heterogeneous, since lamb and beef usually do not have similar unit values. To prevent this problem, we propose to aggregate goods whose unit values have similar distributions, which can be checked graphically using, for example, histograms. This procedure results in a compromise between a minimum number of categories and minimal unit value heterogeneity.

For the computation of the summary value of the unit values, a classical and natural choice would be to simply take the average of the unit values to estimate the “centre”. However, such a measure is well known to be very sensitive to extreme values, hence not robust. With the Tunisian data, the number of extreme values is not negligible and, since the summary measure should represent the price an average person pays for the product, we feel that a robust measure such as the median is more appropriate here. For example, if one looks at the boxplots of the distribution of the unit values of some goods, one sees that the number and the magnitude of outlying (i.e. extreme) unit values are large (see Figure 1). These extreme values might be legitimate, although in our case we believe they can be taken as gross errors, for several reasons, among others:

• Errors in the measurement units of the quantities during data entry: for instance, the surveyor may register the quantities in kilograms instead of grams. This is the case of the observed 160 000 Dinars per kg of olive oil, which is due only to this type of error.

• A badly cleaned basic data file: this is very uncommon in our household survey. However, we have found for the category semolina and flour the figure 99999.9, which inevitably indicates that the corresponding data file has not been appropriately cleaned.

{Figure 1 approximately here}

We therefore propose to use median unit values as an estimator of the center of the unit values' distribution. In the case of Tunisia, a comparison of some of the Tunis retail prices during the month of December 1990 (the INS survey ran between May 1990 and April 1991) with the corresponding median unit values computed using our data reveals a perfect match, in particular for the very homogeneous categories (see Ayadi and Matoussi 1995). Furthermore, an analysis of the medians of the unit values shows that, for several categories, the unit values derived from the survey are integers. This suggests that the estimates are equal to the true market prices.
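
The computation of median unit values by region and category can be sketched as follows. The record layout, the function name `median_unit_values` and all figures are hypothetical; only the idea of dividing expenditure by quantity and taking the median per region comes from the text.

```python
import statistics
from collections import defaultdict

def median_unit_values(records):
    """Median of expenditure / quantity per (region, good).

    Records with a non-positive quantity are skipped."""
    values = defaultdict(list)
    for region, good, expenditure, quantity in records:
        if quantity > 0:
            values[(region, good)].append(expenditure / quantity)
    return {key: statistics.median(v) for key, v in values.items()}

# Hypothetical records: (region, good, expenditure in Dinars, quantity in kg).
records = [
    ("Tunis", "olive oil", 2.60, 1.0),
    ("Tunis", "olive oil", 5.10, 2.0),
    ("Tunis", "olive oil", 2.55, 1.0),
    ("Tunis", "olive oil", 160.0, 0.001),  # quantity entered in the wrong unit
    ("South", "olive oil", 4.80, 2.0),
    ("South", "olive oil", 2.45, 1.0),
    ("South", "olive oil", 7.20, 3.0),
]

prices = median_unit_values(records)
tunis_mean = statistics.fmean(e / q for r, g, e, q in records if r == "Tunis")
print(prices, tunis_mean)
```

In this made-up example the single unit error dominates the regional average, while the median stays at the level of the other observations.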


3.1.2 Estimation of the consumers’ basket

To determine the cost of food to meet the basic needs, we have to estimate the consumer's basket for the households around the poverty line in each region. The choice of the households that fall into the category of the ones “around the poverty line” can be made by considering estimated poverty lines from past and/or alternative analyses. We do not think that the relative arbitrariness in the choice of the bounds should lead to dramatic differences in the estimation of the consumer's basket, especially if one adopts a robust approach for its estimation. For example, in the case of Tunisia we considered an interval into which all previously computed poverty lines fall.

Given the set of households around the poverty line, to determine the typical consumer's basket we are actually looking for the centre point in the space of all food goods purchased by the households. Note that the food goods can be measured indifferently by their quantities or by their values once unit values have been determined. If the purchased goods have a spherical distribution such as the multivariate normal, then the best (classical) estimator of the centre is the mean vector. If the distribution is not spherical, for example because there are outliers, then the classical estimator can be seriously biased. It is therefore necessary to at least check for potential outliers by means of a robust estimator of location. We consider here the MVE defined in section 2.

In the case of Tunisia we found different estimates of the consumer's basket between a classical and a robust approach. By looking at the boxplots of the budget shares allocated to different products (see Figure 2), one notices that some households consume relatively large quantities of some products, leading (if a classical analysis is performed) to an overestimation of the food basket cost. These features can be explained by the fact that, for example, during the survey a poor household has a large consumption of meat because of an exceptional celebration like a wedding.

{Figure 2 approximately here}

Finally, given the typical basket of the poor in each region, its value is then adjusted so that it ensures a minimal calorific content. Since the consumer's basket is made of aggregated products, the calorific value of each category is not necessarily readily available, and usually needs to be estimated using information provided by, for example, a survey on the calorific content of household consumption. This is the case in our example, for which we explain the methodology we have chosen in section 4. The food poverty line, $z^f$, is then obtained by evaluating the resulting basket by means of the median unit values. This is done region by region.
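
The last two steps (adjusting the basket to a minimal calorific content and valuing it at median unit values) amount to simple arithmetic, sketched below in Python. The proportional rescaling of the basket is one possible reading of the adjustment, and the function name, the goods, the calorific contents, the prices and the requirement of 2200 kcal are all hypothetical values chosen for illustration.

```python
def food_poverty_line(basket_kg, kcal_per_kg, price_per_kg, kcal_required):
    """Rescale a typical basket to reach kcal_required, then value it."""
    kcal = sum(q * kcal_per_kg[g] for g, q in basket_kg.items())
    scale = kcal_required / kcal  # proportional adjustment of all quantities
    return sum(scale * q * price_per_kg[g] for g, q in basket_kg.items())

# Hypothetical daily basket per person for one region.
basket = {"cereals": 0.4, "oil": 0.03, "vegetables": 0.3}          # kg
calories = {"cereals": 3500, "oil": 9000, "vegetables": 300}       # kcal per kg
median_prices = {"cereals": 0.3, "oil": 1.2, "vegetables": 0.4}    # Dinars per kg

z_f = food_poverty_line(basket, calories, median_prices, kcal_required=2200)
print(round(z_f, 3))
```

Here the basket provides 1760 kcal, so all quantities are scaled by 1.25 before being valued, which gives a daily cost of 0.345 Dinars in this made-up example.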

3.2 Non food poverty lines

The natural approach would be to begin by constructing a consumer's basket of non food goods associated with a poor household and then calculate its value by means of local prices. There are however two serious impediments to this approach. The first one is that usually one does not have data on non food products, and the second is that it is almost impossible to elaborate a homogeneous measure for the quantities of non food products and to deduce representative unit values. We therefore choose to approximate the non food budget share of the poverty line by looking at the behavior of the households with income equal to the food poverty line. The share they are ready to sacrifice in order to satisfy their basic needs in non food products will serve to estimate the non food part of the poverty line.

The valuation of the non food component is done using a method presented in Ravallion (1994). This approach is based on the intuitive argument that the definition of “basic non food needs” requires the valuation of the willingness to give up a necessary food product in order to purchase the good in question. Ravallion models the food share by an AIDS class of functions:

$$s_{ij} = \alpha_j^0 + \beta_j \log\left(\frac{y_{ij}}{z_j^f}\right) + \sum_k \delta_j^k d_{ij}^k + \varepsilon_{ij} \tag{4}$$

where $s_{ij}$ is the food share of household $i$ belonging to the region and/or area $j$, $y_{ij}$ is its total per capita expenditure, $z_j^f$ is the already established food poverty line for area $j$, $d_{ij}^k$ are socioeconomic variables such as age of the household head, household size, etc., and $\varepsilon_{ij}$ is a disturbance term. A quadratic term in $\log\left(\frac{y_{ij}}{z_j^f}\right)$ can be added to improve the fit (see Ravallion and Bidani 1994). The value of $\alpha_j = \alpha_j^0 + \sum_k \delta_j^k d_j^k$ estimates the expected food share of households with per capita expenditure that reaches the food poverty line, i.e. $y_{ij} = z_j^f$, so that $1 - \alpha_j$ is their expected non food share. The evaluation of $d_j^k$ is made by means of the subsample with per capita expenditure around the poverty line. The poverty line is then given by

$$z_j = (2 - \alpha_j)\, z_j^f$$

and includes the minimum expenditure to satisfy basic food and non food needs. This is actually the so-called lower poverty line; one can also compute a so-called upper poverty line, which then leads to an interval of possible poverty lines (see Ayadi, Matoussi, and Victoria-Feser 1998).
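
Once model (4) has been estimated, the lower poverty line follows from a short calculation, sketched here in Python. The function name, the coefficient values and the single socioeconomic variable are hypothetical; they are not estimates from the Tunisian data.

```python
def lower_poverty_line(alpha0, delta, d_typical, z_food):
    """Lower poverty line z_j = (2 - alpha_j) * z_j^f.

    alpha_j = alpha0 + sum_k delta_k * d_k is the intercept of model (4)
    evaluated at the characteristics d_k of households around the poverty line."""
    alpha_j = alpha0 + sum(delta[k] * d_typical[k] for k in delta)
    return (2 - alpha_j) * z_food

# Hypothetical estimates for one region.
z = lower_poverty_line(
    alpha0=0.50,
    delta={"household_size": 0.01},
    d_typical={"household_size": 6},
    z_food=200.0,  # food poverty line, Dinars per person per year (made up)
)
print(round(z, 2))
```

With these made-up values $\alpha_j = 0.56$, so the lower poverty line equals $1.44$ times the food poverty line, i.e. 288 Dinars.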

The model given in (4) is a theoretical model that assumes independence and normality of the error term. If these hypotheses are met, then the least squares (LS) estimator is the best estimator. However, when they are violated, even slightly, for example by the presence of a few outlying data points, the LS estimator can be seriously biased (see for example Hampel et al. 1986). This induces not only biased estimates of the regression coefficients but also of their standard errors. We therefore propose the use of robust methods of estimation for the evaluation of the poverty lines derived through the estimation of (4), and have done so using the Mallows estimator defined in Section 2. The classical and robust estimates resulting from the Tunisian data are presented and interpreted in Section 4.
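The sensitivity of LS to a few outlying observations can be illustrated with a small simulation. The sketch below is not the Mallows estimator used in this paper, which also downweights outlying covariate values. It substitutes a simpler Huber-type M-estimator, computed by iteratively reweighted least squares, as a stand-in. The data, the tuning constant `c = 1.345` and the function name `huber_irls` are assumptions made for the illustration.

```python
import numpy as np


def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber M-estimator of regression via iteratively reweighted LS.

    Stand-in for illustration only: unlike a Mallows-type estimator,
    it does not downweight leverage points in X.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the LS fit
    for _ in range(n_iter):
        r = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0.0:
            break
        u = np.abs(r) / scale
        w = c / np.maximum(u, c)  # Huber weights: 1 if u <= c, c/u otherwise
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta


# Simulated food shares regressed on a single covariate (hypothetical data).
rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.70, -0.05])
y = X @ true_beta + rng.normal(0.0, 0.01, n)
y[-5:] += 0.25  # contaminate 10% of the sample with gross errors

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
beta_rob = huber_irls(X, y)
print("LS slope:", beta_ls[1], "robust slope:", beta_rob[1])
```

In this setting the contaminated observations pull the LS slope away from its true value, while the reweighted fit stays much closer to it.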

4 Poverty comparisons in Tunisia

The aim in this example is not only to compare different regions in Tunisia, but also to compare poverty in rural versus urban regions.

4.1 Robust poverty lines in Tunisian regions

As said in the introduction, the data we use come from the household survey conducted by the INS in 1990, involving 7734 households. The sampling scheme and the results of the survey are explained in INS (1990). The survey also provides the demographic characteristics of households. In order to take into account the different geographical and socioeconomic characteristics of the regions in Tunisia, we separated the country into 5 different homogenous regions (see the Appendix), three of which are urban areas.

To compute unit values to determine local prices, we had to put the food products into several categories. The categories of goods that we have retained for the calculation of the medians of the unit values are a tight compromise between unit value heterogeneity and a minimum number of categories. We had for example to separate olive oil from other oils, which were initially put in the category of cooking oil, because of the relatively large difference in unit values. This was also the case for items such as “milk and derivatives”, “vegetables”, “cereals” and “other food products”. In particular, we decomposed the cereals into six sub-categories and the milk category into milk and derivatives, and kept “poultry” and “eggs” in the same category.

Once the categories were determined, we computed the median of the unit values. Then, using the quantities of products consumed by each household around the poverty line, we computed the consumer’s basket by means of the MVE estimator. The households we consider are all households with per capita expenditure between 135 and 290 Dinars per year, whatever their location, since all the poverty lines computed until then for Tunisia are situated in this interval.
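The reason for taking the median rather than the mean of the unit values can be seen in a toy example. The records below are hypothetical (expenditure, quantity) pairs for one food category in one region, with one gross recording error. The MVE step for the basket is not reproduced here.

```python
from statistics import mean, median

# Hypothetical (expenditure, quantity) records; the last one is a gross error.
records = [(2.0, 4.0), (2.6, 5.0), (1.5, 3.0), (2.2, 4.0), (50.0, 1.0)]

# Unit value = expenditure divided by quantity purchased.
unit_values = [expenditure / quantity for expenditure, quantity in records]

price_median = median(unit_values)  # barely affected by the erroneous record
price_mean = mean(unit_values)      # dragged upwards by the erroneous record
print(price_median, price_mean)
```

A single erroneous record is enough to push the mean far from the bulk of the unit values, whereas the median stays among the typical ones.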

Since some of the categories of food products contain products with different calorific contents, we computed an average calorific content for each category by means of a regression of the calories on the quantities. Least squares estimation is used here, since for homogenous products the fit is perfect, and for less homogenous products we cannot do better than “averaging”. The quantities of the typical basket are then adjusted so that the resulting basket ensures a standard calorific content. The minimal calorific need is on average 2090 calories per person per day, with a variation depending on age, sex and type of activity. It corresponds to the standards recommended by nutritionists.
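The two steps described above, an LS regression of calories on quantities followed by an adjustment of the basket to the calorific norm, can be sketched as follows. Several choices here are assumptions made for the illustration and are not stated in the text: the regression is taken through the origin, the adjustment is a proportional rescaling of all quantities, the basket is expressed per person per day, and all numerical values and names are hypothetical.

```python
import numpy as np


def calories_per_unit(quantities, calories):
    """LS slope through the origin: average calorific content per unit."""
    q = np.asarray(quantities, dtype=float)
    c = np.asarray(calories, dtype=float)
    return float(q @ c / (q @ q))


def rescale_basket(basket, kcal_per_unit, target_kcal=2090.0):
    """Scale all quantities by one factor so the basket yields target_kcal."""
    total = sum(basket[g] * kcal_per_unit[g] for g in basket)
    factor = target_kcal / total
    return {g: qty * factor for g, qty in basket.items()}


# For a perfectly homogenous product the fit is exact (hypothetical values).
slope = calories_per_unit([1.0, 2.0, 3.0], [100.0, 200.0, 300.0])

# Hypothetical daily basket (kg per person) and calorific contents (kcal per kg).
basket = {"cereals": 0.40, "oil": 0.03}
kcal = {"cereals": 3500.0, "oil": 9000.0}
adjusted = rescale_basket(basket, kcal)
adjusted_kcal = sum(adjusted[g] * kcal[g] for g in adjusted)
print(slope, adjusted, adjusted_kcal)
```

A proportional rescaling keeps the composition of the basket unchanged, which is one simple way to reach the norm without altering relative consumption patterns.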

The estimated food poverty lines for the 5 regions of Tunisia are presented in the first column of Table 1¹. For comparison purposes, we also give the classically computed food poverty lines in the third column. Compared with the classical method of estimation, the robust food poverty lines are almost the same for Greater Tunis and the Urban Littoral, but they are considerably lower for the other regions. This can be explained by the fact that the typical food basket in these regions, if estimated classically, is very much influenced by not so typical households.

                   Robust poverty lines     Classical poverty lines
                   z_j^f       z_j          z_j^f       z_j
Greater Tunis      191         280          188         278
Littoral Urban     180         252          178         255
Interior Urban     145         212          173         246
Littoral Rural     126         172          139         191
Interior Rural     122         169          137         188

Table 1: Estimated poverty lines in Tunisia (1990)

¹ We grouped Greater Tunis and Littoral Urban together to compute the consumer’s basket because the respective samples were relatively small. We however considered different unit values, and therefore different poverty lines, for the two regions.

Using the AIDS model, we then computed the non food poverty lines. We first present the results of the robust regression. Table 2 gives the estimated coefficients and their standard errors (bold numbers correspond to significant values) for the five regions of Tunisia for the model (4), with $d^1_{ij}$ the household size, $d^2_{ij}$ the number of children, $d^3_{ij}$ the number of working women and $d^4_{ij}$ the age of the household head. Note that the choice of the explanatory variables to enter the model is restricted to the variables provided by the survey. They nevertheless appear to be relevant ones. Indeed, in a preliminary study we performed a robust variable selection using the results of Ronchetti and Staudte (1994) and found that the variables we use here were those that were selected most of the time across regions.

              Greater     Urban       Urban       Rural       Rural
              Tunis       Littoral    Interior    Littoral    Interior
$\alpha^0$    0.638       0.713       0.620       0.780       0.759
              (0.023)     (0.019)     (0.02)      (0.018)     (0.015)
$\beta$       -0.0536     -0.059      -0.0032     -0.0399     -0.015
              (0.016)     (0.014)     (0.013)     (0.014)     (0.01)
$\gamma$      -0.0192     -0.0263     -0.040      -0.0343     -0.0482
              (0.005)     (0.004)     (0.005)     (0.005)     (0.005)
$d^1$         -0.0678     -0.0927     -0.0688     -0.111      -0.111
              (0.009)     (0.006)     (0.007)     (0.007)     (0.006)
$d^2$         0.0019      0.0005      0.0059      0.0129      0.0155
              (0.004)     (0.003)     (0.003)     (0.003)     (0.002)
$d^3$         -0.014      -0.03       -0.0272     -0.003      -0.003
              (0.005)     (0.004)     (0.006)     (0.005)     (0.004)
$d^4$         0.0003      0.00082     0.00091     0.0009      0.00077
              (0.0003)    (0.0002)    (0.0002)    (0.0002)    (0.0002)

Table 2: Robust regression estimates of the AIDS model

These estimates are then used to compute the poverty lines, which are given in the second column of Table 1 and can be compared to their classical counterparts in the fourth column. The robustly estimated poverty lines $z_j$ are lower in the poorest regions (UIN, RLT, RIN) than their classical counterparts.

However, what attracts our attention is the difference between the urban and rural lines. The urban/rural ratio is greater in the littoral region with the robust approach (1.47) than with the classical one (1.34). On the other hand, the urban/rural ratio in the Interior is similar between the two approaches (1.25 compared to 1.31), and is also similar to the classically computed urban/rural ratio in the littoral region. We believe that the difference found with the robust approach makes more sense when one looks at other economic indicators. Indeed, the urban littoral has seen rapid economic development compared to the interior, which has led to an increase in living costs and which explains why the urban/rural difference in the littoral region should be greater than the one in the interior region.
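The urban/rural ratios quoted above follow directly from the overall poverty lines $z_j$ of Table 1. The short Python check below recomputes them. The dictionary layout and the function name are illustrative choices, and the region codes are those defined in the Appendix.

```python
# Overall poverty lines z_j from Table 1 (second and fourth columns).
robust = {"ULT": 252, "RLT": 172, "UIN": 212, "RIN": 169}
classical = {"ULT": 255, "RLT": 191, "UIN": 246, "RIN": 188}


def urban_rural_ratio(lines, urban, rural):
    """Ratio of the urban to the rural poverty line, rounded to 2 decimals."""
    return round(lines[urban] / lines[rural], 2)


littoral = (urban_rural_ratio(robust, "ULT", "RLT"),
            urban_rural_ratio(classical, "ULT", "RLT"))
interior = (urban_rural_ratio(robust, "UIN", "RIN"),
            urban_rural_ratio(classical, "UIN", "RIN"))
print("Littoral (robust, classical):", littoral)
print("Interior (robust, classical):", interior)
```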

4.2 Regional poverty levels in Tunisia

In order to summarize our results on poverty analysis, Table 3 presents several poverty indexes for the different regions of Tunisia. They are all special cases of the Foster-Greer-Thorbecke (FGT) (Foster et al. 1984) class of poverty measures given by

$$P_{FGT}^{\alpha} = \frac{1}{n} \sum_{y_i \le z} \left(\frac{z - y_i}{z}\right)^{\alpha}$$

where $y_i$ is the total expenditure of household $i$, $n$ is the number of households and $z$ is the poverty line. The simple headcount ratio (HCR) is obtained when $\alpha = 0$, and the poverty gap (PG) when $\alpha = 1$. We also computed the FGT with $\alpha = 2$. It should be stressed that the larger the value of $\alpha$, the more the measure penalizes large poverty gaps. It should also be stressed that Cowell and Victoria-Feser (1996) showed that these indexes are in principle robust provided that the poverty lines on which they are based are computed in a robust fashion. This is what we did in the previous sections, so that the poverty analysis we now conduct based on these poverty indexes is also robust.
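A direct transcription of the FGT formula into Python may help fix ideas. The expenditure values and the poverty line below are hypothetical, and the function name `fgt` is an illustrative choice.

```python
def fgt(expenditures, z, alpha):
    """FGT index: (1/n) * sum over y_i <= z of ((z - y_i) / z) ** alpha."""
    n = len(expenditures)
    return sum(((z - y) / z) ** alpha for y in expenditures if y <= z) / n


# Hypothetical expenditures and poverty line, for illustration only.
y = [50.0, 100.0, 150.0, 200.0]
z = 100.0

hcr = fgt(y, z, 0)   # headcount ratio: share of households with y_i <= z
pg = fgt(y, z, 1)    # poverty gap: average relative shortfall
fgt2 = fgt(y, z, 2)  # squared gaps: weights the poorest more heavily
print(hcr, pg, fgt2)
```

In this toy sample two of the four households are at or below the line, and only the poorer one contributes to the measures with $\alpha > 0$, because a household exactly at the line has a zero gap.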

                   Robust analysis           Classical analysis
                   HCR      PG      FGT      HCR      PG      FGT
Greater Tunis      4.4      .81     .24      4.3      .79     .24
Urban Littoral     3.6      .57     .15      3.6      .61     .16
Urban Interior     9.1      2.12    .75      13.1     3.36    1.26
Rural Littoral     8.4      1.63    .54      10.9     2.42    .82
Rural Interior     11.9     3.15    1.18     16.0     4.28    1.66

Table 3: Poverty indexes for Tunisia

Inspecting Table 3, one first remarks that the differences in estimated poverty indexes between the classical and robust approaches lie in lower robust estimates, especially for less developed regions, i.e. rural littoral and urban and rural interior. We can interpret this difference by the difference in the estimates we got for the consumer’s basket and/or for the costs of non food goods. However, even if poverty is less important than classically expected, the results show that poverty in Tunisia during the year 1990 is mainly a phenomenon that affects the rural areas more severely than the urban ones. In each region the rural poverty index exceeds the urban one. It should be noted that this is true for both classical and robust procedures, although the robust approach moderates the differences in the Littoral region. If we take the example of the simplest and most commonly used measure in empirical studies, namely the HCR, we can easily observe that the ratio of the rural over the urban in the interior region is 131% when we retain the robust poverty line and 122% for the classical one. Moreover, it goes up to 233% and 303% when we consider the corresponding poverty lines for the littoral region. We note that this ratio remains practically the same when we take other measures. Finally, it should also be stressed that poverty is a problem that affects mostly the Interior regions, both rural and urban, whatever the poverty measure.
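The rural over urban percentages quoted for the HCR can be recomputed from Table 3, as in the following check. The variable and function names are illustrative, and the region codes are those of the Appendix.

```python
# Headcount ratios (HCR) from Table 3.
hcr_robust = {"ULT": 3.6, "UIN": 9.1, "RLT": 8.4, "RIN": 11.9}
hcr_classical = {"ULT": 3.6, "UIN": 13.1, "RLT": 10.9, "RIN": 16.0}


def rural_over_urban(hcr, rural, urban):
    """Rural HCR as a percentage of the urban HCR, rounded to an integer."""
    return round(100 * hcr[rural] / hcr[urban])


interior_pct = (rural_over_urban(hcr_robust, "RIN", "UIN"),
                rural_over_urban(hcr_classical, "RIN", "UIN"))
littoral_pct = (rural_over_urban(hcr_robust, "RLT", "ULT"),
                rural_over_urban(hcr_classical, "RLT", "ULT"))
print("Interior (robust, classical):", interior_pct)
print("Littoral (robust, classical):", littoral_pct)
```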

5 Conclusion

In this paper we have investigated poverty in Tunisia, paying special attention to urban versus rural comparisons. We have adopted a robust approach, not only to the definition of the poverty lines by region, but also to the statistical methods used to compute these lines. We showed that it is important to be robust in all steps of the analysis in order for the final results to be robust. We believe that this approach is safe because it is not influenced by either gross errors (as in unit values) or legitimate data from households not behaving (in their consumption habits) like the majority. We may say at least that the results reflect the behavior of the majority of the households, and not some compromise between the majority and extreme ones.

The results showed that poverty in Tunisia is clearly a rural phenomenon, and this contradicts the findings of governmental institutions. This remains true even if one adopts the robust approach, which tends to moderate the differential between rural and urban areas. We also noted that poverty affects the Interior regions more severely.


References

Ayadi, M., J. Krishnakumar, and M. S. Matoussi (1997). Combining spatial and temporal variations in the estimation of price elasticities and application to Tunisian households. Cahier du Département d’Econometrie 97.01, University of Geneva.

Ayadi, M. and M. S. Matoussi (1995). Analyse et comparaison de la pauvreté dans les milieux urbain et rural en Tunisie en 1990. Paper presented at the 11th World Congress of the International Economic Association, Tunis.

Ayadi, M. and M. S. Matoussi (1999). Analyse de la pauvreté en Tunisie: Comparaisons spatiales et temporelles utilisant des enquêtes ménage. Paper presented at the 11th ERF conference, Cairo.

Ayadi, M., M. S. Matoussi, and M.-P. Victoria-Feser (1998). Urban rural poverty comparisons in Tunisia: A robust statistical approach. Cahiers du Département d’Econométrie 98.09, University of Geneva.

Chatterjee, G. S. and N. Bhattacharya (1974). Between state variation in consumer prices and per capita household consumption in rural India. In T. N. Srinivasan and P. Bardhan (Eds.), Poverty and income distribution in India. Calcutta: Statistical Publishing Society.

Cheng, T.-C. and M. Victoria-Feser (2000). Robust correlation estimation with missing data. Cahiers du département d’econométrie no 2000.5, University of Geneva, CH-1211 Geneva 4.

Clark, S., R. Hemming, and D. Ulph (1981). On indices for the measurement of poverty. Economic Journal 91, 515–526.

Cowell, F. A. and M.-P. Victoria-Feser (1996). Poverty measurement with contaminated data: A robust approach. European Economic Review 40, 1761–1771.

Davies, P. L. (1987). Asymptotic behaviour of S-estimators of multivariate location parameters and dispersion matrices. The Annals of Statistics 15, 1269–1292.

Deaton, A. and J. Muellbauer (1980). Economics and consumer behavior. Cambridge: Cambridge University Press.

Deaton, A. S. (1988). Quality, quantity and spatial variation of price. American Economic Review 78, 418–430.

Deaton, A. S. (1990). Price elasticities from survey data: Extension and Indonesian results. Journal of Econometrics 44, 281–309.

Deaton, A. S., K. Parikh, and S. Subramanian (1993). Food demand patterns and pricing policy in Maharashtra: An analysis using household level survey data. Research program in development studies, Princeton University.

Deaton, A. S. and C. Paxson (1995). On urban versus rural poverty in India. Research program in development studies, Princeton University.

Donoho, D. L. (1982). Breakdown properties of multivariate location estimators. Ph.D. qualifying paper, Department of Statistics, Harvard University.

Foster, J., J. Greer, and E. Thorbecke (1984). A class of decomposable poverty measures. Econometrica 52, 761–765.

Foster, J. J. and A. F. Shorrocks (1991). Subgroup consistent poverty indices. Econometrica 59, 687–709.

Hampel, F. R. (1968). Contribution to the Theory of Robust Estimation. Ph.D. thesis, University of California, Berkeley.

Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association 69, 383–393.

Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel (1986). Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley.

Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics 35, 73–101.

Huber, P. J. (1977). Robust covariances. In S. S. Gupta and D. S. Moore (Eds.), Statistical Decision Theory and Related Topics, Volume 2, pp. 1753–1758. New York: Academic Press.

Huber, P. J. (1981). Robust Statistics. New York: John Wiley.

INS (1990). Enquête sur le budget et la consommation des ménages en Tunisie. Ministère du plan, Tunis.

Kent, J. T. and D. E. Tyler (1996). Constrained M-estimation for multivariate location and scatter. The Annals of Statistics 24, 1346–1370.

Lopuhaä, H. P. (1991). τ-estimators for location and scatter. Canadian Journal of Statistics 19, 307–321.

Marazzi, A. (1993). Algorithms, Routines and S-Functions for Robust Statistics. Belmont, California: Wadsworth and Brooks/Cole.

Ravallion, M. (1994). Poverty comparison, Volume 56 of Fundamentals of Pure and Applied Economics. Chur, Switzerland: Harwood Academic Press.

Ravallion, M. and B. Bidani (1994). How robust is a poverty profile? The World Bank Economic Review 8, 75–102.

Ronchetti, E. and R. G. Staudte (1994). A robust version of Mallows’s Cp. Journal of the American Statistical Association 89, 550–559.

Rousseeuw, P. J. (1984). Least median of squares regression. Journal of the American Statistical Association 79, 871–880.

Rousseeuw, P. J. and A. M. Leroy (1987). Robust Regression and Outlier Detection. New York: John Wiley.

Sen, A. K. (1976). Poverty: An ordinal approach to measurement. Econometrica 44, 219–231.

Stahel, W. A. (1981). Breakdown of covariance estimators. Technical Report 31, Fachgruppe für Statistik, ETH, Zurich.

Tamura, R. and D. Boos (1986). Minimum Hellinger distance estimation for multivariate location and covariance. Journal of the American Statistical Association 81, 223–229.

Tyler, D. E. (1994). Finite sample breakdown points of projection based multivariate location and scatter statistics. Annals of Statistics 22, 1024–1044.

Woodruff, D. L. and D. M. Rocke (1994). Computable robust estimation of multivariate location and shape in high dimension using compound estimators. Journal of the American Statistical Association 89, 888–896.

Appendix: Determination of the five homogenous regions in Tunisia

To make use of the characteristics of different regions in Tunisia, we separated the households according to their location with respect to 5 different homogenous regions. Tunisia is traditionally subdivided into three natural regions: North, Center and South. This decomposition is motivated by the geographical characteristics of the country. However, from an economic point of view, it is more appropriate to divide Tunisia into three parts: the Greater Tunis, and two homogenous sets, namely the Littoral and the Interior.

The Greater Tunis area, which contains almost 25% of the total population, is characterized by very special administrative, social and economic properties. The Tunisian Littoral (Bizerte, Cap-Bon, Sahel, Sfax, and Gabes) has enjoyed economic and social prosperity since independence. This coastal fringe extending from North to South contains, together with the Greater Tunis area, most of the tourist, industrial and urban activity of the economy. Despite a certain economic progress, the Interior region has several acute social and economic problems which distinguish it from the other two regions. If one compares the per capita expenditure (during 1990), one sees that this subdivision is justified. In addition to this regional decomposition, it is necessary to take into account the rural-urban distinction. We also aggregated the rural parts of the Greater Tunis and the Littoral. Two reasons support this aggregation. First, the size of the rural Greater Tunis sample is very small, only 167 households. Second, the rural households of Greater Tunis and those of the rest of the Littoral are very similar and can be lumped together to form a homogenous spatial set. This leads us to five homogenous regions, namely the urban Greater Tunis (UGT), the urban Littoral (ULT), the urban Interior (UIN), the rural Littoral (RLT) and the rural Interior (RIN).

Figure 1: Unit value distribution of some food products (panels: U.V. Fruits, U.V. Sugar, U.V. Pasta & Semolina, U.V. Bread & Flour)

Figure 2: Budget share distribution of some food products (panels: Red meat, Vegetables, Fruits, Pasta and semolina)