Robust methods for personal income distribution models


Thesis

Reference

Robust methods for personal income distribution models

VICTORIA-FESER, Maria-Pia


VICTORIA-FESER, Maria-Pia. Robust methods for personal income distribution models. Thèse de doctorat : Univ. Genève, 1993, no. SES 384

URN : urn:nbn:ch:unige-64509

DOI : 10.13097/archive-ouverte/unige:6450

Available at:

http://archive-ouverte.unige.ch/unige:6450


Robust Methods for Personal Income Distribution Models

Maria-Pia Victoria Feser

Submitted for the degree of Ph.D. in Econometrics and Statistics

Faculty of Economic and Social Sciences, University of Geneva, Switzerland

Accepted on the recommendation of

Dr. A.C. Atkinson, professor, London,
Dr. P. Balestra, professor, Geneva,
Dr. U. Kohli, professor, Geneva,
Dr. E. Ronchetti, professor, Geneva, supervisor,
Dr. P. Rousseeuw, professor, Brussels.

Thesis No. 384 May 1993


To Johannes, with love.


Abstract

In the present thesis, robust statistical techniques are applied and developed for the economic problem of the analysis of personal income distributions and inequality measures. We follow the approach based on influence functions in order to develop robust estimators for the parametric models describing personal income distributions when the data are censored and when they are grouped. We also build a robust procedure for a test of choice between two models and analyse the robustness properties of goodness-of-fit tests.

The link between economic and robustness properties is studied through the analysis of inequality measures.

We begin our discussion by presenting the economic framework from which the statistical developments are made, namely the study of the personal income distribution and inequality measures. We then discuss the robust concepts that serve as basis for the following steps and compute optimal bounded-influence estimators for different personal income distribution models when the data are continuous and complete. In a third step, we study the case of censored data and propose a generalization of the EM algorithm with robust estimators. For grouped data, Hampel’s theorem is extended in order to build optimally bounded-influence estimators for grouped data.

We then focus on tests for model choice and develop a robust generalized Cox-type statistic. We also analyse the robustness properties of a wide class of goodness-of-fit statistics by computing their level influence functions. Finally, we study the robustness properties of inequality measures and relate our findings to some economic properties these measures should fulfil.

Our motivation for the development of these new robust procedures comes from our interest in the field of income distribution and inequality measurement. However, it should be stressed that the new estimators and test procedures we propose do not apply only in this particular field: they can be used in, or extended to, any parametric problem in which density estimation, incomplete information, grouped or discrete data, model choice, goodness-of-fit or concentration indices are among the key words.

Résumé

In this thesis, we develop and apply certain techniques of robust statistics to the economic problem of analysing the personal income distribution and inequality measures. We use the approach based on influence functions to develop robust estimators for the parametric models describing the personal income distribution when the data are censored and when they are grouped.

We also build robust procedures for testing the choice between two models and analyse the robustness properties of goodness-of-fit tests.

The link between certain economic properties and robustness properties is studied by means of inequality measures.

We begin our discussion with a presentation of the economic framework in which we work, namely the study of the personal income distribution and the associated inequality measures. We then set out the concepts of robust statistics that are useful in what follows and compute optimal bounded-influence estimators for different personal income distribution models when the data are continuous and complete, simulated or real. In a third step, we study the case of censored data and propose a generalization of the EM algorithm with robust estimators. Hampel's theorem is then extended to the case of grouped data, and optimally bounded-influence robust estimators are proposed. Later, we concentrate on model choice procedures and develop a robust Cox-type test statistic. We also analyse the robustness properties of a wide class of goodness-of-fit test statistics by computing the corresponding level influence functions. Finally, we study the robustness properties of inequality measures in relation to the economic properties that these measures should satisfy.

The development of new estimators and new test procedures was motivated by our interest in the study of personal income distributions and inequality measures. However, it is worth emphasizing that the new estimators and test procedures we propose are not applicable only in this particular field. Indeed, they can be applied or extended to parametric problems in which terms such as density estimation, incomplete information, grouped or discrete data, model choice, goodness-of-fit tests and concentration indices are key words.


Acknowledgement

I would like to express my gratitude to Prof. E. Ronchetti for his valuable suggestions and his generous guidance throughout the course of this research.

His encouragement, as an expert and as a friend, has made this work possible.

I am also grateful to Prof. A. C. Atkinson and Dr. F. Cowell for their support during my research at the London School of Economics and to Prof. P. Balestra, Prof. U. Kohli and Prof. P. Rousseeuw for their comments during the defense.

My thanks also go to my friends and colleagues of the Faculty of Economic and Social Sciences of the University of Geneva for their stimulating discussions and their moral support, especially to S. Héritier for his helpful comments during the preparation of the defense.

Finally, I would like to express my grateful thanks to my parents, for their love, encouragement and support during most of my student life.


List of Figures

2.1 A typical representation of the Lorenz curve
2.2 The Gamma density as a model for PID
3.1 Value of the mean IF around the true parameter θ
3.2 MLE and OBRE of the Gamma distribution on PSID data
3.3 MLE and OBRE of the Dagum I model on PSID data
3.4 Gamma (OBRE) and Dagum (MLE) fit on PSID data
3.5 OBRE of the Gamma and Dagum I model on FES data
3.6 Efficiency of the OBRE for the Gamma model
3.7 Efficiency of the OBRE for the Pareto model
3.8 Sensitivity of the MLE and the OBRE to outliers for the Pareto model
3.9 Sensitivity of MLE and OBRE to different proportions of contamination
3.10 Bias of Theil index estimates when the data are contaminated
4.1 Weights given by the OBRE with 10% of information loss
4.2 Weights given by the OBRE with 30% of information loss
7.1 Behaviour of goodness-of-fit statistics with model contamination

List of Tables

3.1 Some examples of occurrence and frequency of gross errors
3.2 MLE and OBRE for the Gamma model 1 (non contaminated)
3.3 MLE and OBRE for the Gamma model 2 (1% of ‘bad’ contamination)
3.4 MLE and OBRE for the Gamma model 3 (3% of contamination)
3.5 MLE and OBRE for the Gamma model 4 (5% of contamination)
3.6 MLE and OBRE for the Pareto model 1 (non contaminated)
3.7 MLE and OBRE for the Pareto model 2 (2% of contamination)
3.8 MLE and OBRE for the Pareto model 3 (5% of contamination)
3.9 MLE and OBRE for the Gamma and Dagum models on PSID data
3.10 MLE and OBRE for the Gamma and Dagum models on FES data
4.1 OBRE on non contaminated data, with the EMM algorithm and the CD estimation
4.2 OBRE on contaminated data at 1%, with the EMM algorithm and the CD estimation
4.3 OBRE on contaminated data at 3%, with the EMM algorithm and the CD estimation
4.4 OBRE and MLE on non contaminated data, with the EMM algorithm
4.5 OBRE and MLE on contaminated data at 1%, with the EMM algorithm
4.6 OBRE and MLE on non contaminated data, with the EMM algorithm, when we ignore truncation
5.1 MLE and OBRE (c = 5.0) for the Pareto model with grouped data
6.1 Finite sample level of Cox and Atkinson statistics (Gamma against Lognormal)
6.2 Finite sample level of Cox and Atkinson statistics (Exponential against Pareto)
6.3 Finite sample level of Cox and Atkinson statistics (Pareto against Exponential)
6.4 Finite sample level of LKR statistic (Gamma against Lognormal)
6.5 Finite sample level of LKR statistic (Exponential against Pareto)
6.6 Finite sample level of LKR statistic (Pareto against Exponential)
6.7 Power (in %) of the Cox statistic (Exponential against Pareto)
6.8 Power (in %) of the LKR statistic (Exponential against Pareto)
6.9 Actual levels (in %) of Cox and Atkinson statistics under model contamination (ε = 1%) (Gamma against Lognormal)
6.10 Actual levels (in %) of Cox and Atkinson statistics under model contamination (ε = 2%) (Gamma against Lognormal)
6.11 Actual levels (in %) of the robust Atkinson statistic (c = 2.0) with contamination (Pareto against Exponential)
6.12 Actual levels (in %) of the Atkinson statistic with contamination (Pareto against Exponential)
8.1 Empirical Theil index when a random proportion of data are multiplied by 10
8.2 Empirical Theil index when a random proportion of data are multiplied by 4
8.3 MLE and Theil index with and without data contamination
8.4 OBRE and Theil index with and without data contamination

Chapter 1

Introduction

A great number of philosophers, scientists, politicians, economists, writers, humanists and religious people through the ages have spent a lot of energy trying to understand the reasons for human inequalities. It is hard to believe that there will one day be an answer.

Our work, however, was motivated by this kind of question: why are there such great differences in people’s wealth? The present dissertation is not a philosophical essay, but a modest scientific contribution to the study of one of the several aspects of human wealth, the distribution of income among people. Moreover, its aim is not even to give some elements of an answer to the question, but to provide the economist with new statistical tools, developed especially for the matter.

The distribution of income among people is also called the personal income distribution (PID). In economics, its study has several aims. One of them is to understand how the total income in a given society is distributed among the people, the households or other economic units, that is, to determine which economic and social factors influence the distribution of income.

Another aim is to provide a measure which represents a judgement of the degree of inequality in the distribution of personal income, not only by itself but also when compared with the same measure computed on the basis of data from different populations.

The space for the statistician is then wide. There are (a) stochastic models to build (for explaining how the PID is generated), (b) econometric models to define and estimate (for determining the factors influencing the generation and distribution of income), (c) statistical distributions to define and estimate (for describing PID) and (d) inequality indexes to build and estimate (for measuring income inequality). In the present work, we concentrate on the last two aspects.

The regularities displayed by observed PID over time and space provide sufficient justification to describe them with the help of some statistical distribution functions. This provides not only a useful summary of the phenomenon, but also a technique to study the effects of alternative redistributive policies. In particular, the estimated distribution can serve as a basis for the computation of inequality measures. The phenomenon of income inequality has been a source of world-wide social upheaval. It has become a weapon in the hands of social reformers and a point of intellectual debate among academics. It is therefore necessary to invest energy in the development of appropriate statistical tools.

The two main aspects of this debate, ethical evaluation and statistical measurement, are not always clearly distinguishable. In our work, however, the statistical tools we propose apply not only to the study of PID or inequality measures but also to a wider range of similar problems. On the other hand, our work was directed by the specificity of the economic problem; that is, the developments we made were motivated by their usefulness in the study of PID and income inequality measures.

Moreover, in the case of inequality measures we take a closer look at the relations between the economic and statistical properties. This is important because drawing inferences about economic inequality plays an important part in political debates about economic and social trends, and in a variety of applied studies in the field of welfare economics. However, the statistical basis on which the inferences are drawn is not always spelt out, and so the relationship between the numbers observed in a particular sample and the supposed underlying concept of inequality within the target population may be different from that suggested by superficial appearances.

The statistical innovation we propose in this field is the use of robust methods. Robustness is a statistical concept which in a sense measures a “qualitative” aspect of any estimator, more precisely its stability under non-standard conditions. It also conveys the idea that theoretical models, whether simple or very complex, are only able to reflect the behaviour of the majority of the data. That is, a robust statistical tool is built so that the influence of data that may not belong to the stated theoretical model is limited.

It is well known that economic data in particular are far from being clean; this usually means that some observations may be present which in a sense have nothing to do with the majority of the data. These rogue data can be a result of the collection procedures. A simple example is the “decimal point error”: the coder inadvertently puts the decimal point in the wrong place and thus multiplies an observation by a factor of 10. More subtle is the week-month confusion, where data are supposedly collected on weekly income, but some respondents actually report income per month.

If those observations, which we also call contaminations or outliers, have negligible impact upon the analysis, then obviously there is nothing to worry about. Unfortunately, in most cases they are extreme and can therefore drive the value of the estimators by themselves. It is arguable that an outlier of this sort should be treated as exceptional and dropped from the sample. Such extreme values may of course be picking up true information; but very often in empirical work a case can be made for dropping an “obviously” inappropriate or suspect observation that may be the result of recording error or other contamination. This type of ad hoc procedure is unsatisfactory, but if it is not done then the result of the analysis may be seriously biased.
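The effect of a single decimal point error can be illustrated with a small numerical sketch. The incomes below are hypothetical values chosen for illustration, not data used in this thesis, and the Theil index is computed with its usual formula, T = (1/n) Σ (x_i/µ) log(x_i/µ); the function names are ours.

```python
import math
import statistics


def theil_index(incomes):
    """Theil index T = (1/n) * sum((x/mean) * log(x/mean)) for positive incomes."""
    mean = statistics.fmean(incomes)
    return sum((x / mean) * math.log(x / mean) for x in incomes) / len(incomes)


def with_decimal_point_error(incomes, index, factor=10.0):
    """Return a copy of the sample with one observation multiplied by `factor`."""
    corrupted = list(incomes)
    corrupted[index] *= factor
    return corrupted


# Hypothetical weekly incomes (illustrative values only).
clean = [180.0, 210.0, 250.0, 260.0, 300.0, 320.0, 350.0, 400.0, 450.0, 900.0]
dirty = with_decimal_point_error(clean, index=9)  # 900 is recorded as 9000

for label, sample in (("clean", clean), ("one error", dirty)):
    print(label,
          "mean:", statistics.fmean(sample),
          "median:", statistics.median(sample),
          "Theil:", round(theil_index(sample), 3))
```

In this sketch one corrupted observation out of ten more than triples the sample mean and sharply increases the Theil index, while the median is unchanged. This is the kind of sensitivity that the robust procedures are designed to limit.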

The robust methods we propose automatically take into account the presence of extreme observations during the estimation procedure. Indeed, these robust estimation procedures are built in such a way that they provide at the same time, and in an optimal way, robust estimators and weights corresponding to each observation according to its ‘distance’ from the bulk of the data. There is therefore no need for a preliminary subjective data screening.

The developments we make are organized in the following way. In chapter 2 we present the framework of PID and inequality measures. We first discuss very briefly the different theories explaining the generation and distribution of income. We also present the Lorenz curve, a statistical tool for representing and comparing inequality in the PID, and review the best-known income inequality measures. The different parametric models proposed in the literature to describe the PID are then analyzed. A discussion of the statistical aspects involved in the analysis of PID concludes this chapter.

In chapter 3, we compute robust estimators for PID models when the data are continuous and complete. We begin by presenting the robustness concepts we need to later develop the theory, in particular the influence function (IF). The IF is the main robustness tool we use in our developments. It gives the influence of an infinitesimal amount of contamination introduced in the data on the value of any statistic (e.g. an estimator, a test statistic or an inequality measure). Since the case of continuous and complete data is simple, we use optimal robust bounded-influence estimators already developed in the literature. However, our contribution is the application of the general theory to the case of PID models, in particular with real data.
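For orientation, the IF is usually defined as follows (Hampel's definition). The notation is assumed here and may differ from that of chapter 3: $T$ is the statistic written as a functional of the distribution $F$, $\Delta_x$ is the point mass at $x$, and $\varepsilon$ is the proportion of contamination.

```latex
\[
  IF(x; T, F) \;=\; \lim_{\varepsilon \downarrow 0}
  \frac{T\bigl((1-\varepsilon)F + \varepsilon\,\Delta_x\bigr) - T(F)}{\varepsilon}
\]
% Example: for the mean, T(F) = \int y \, dF(y) = \mu, the contaminated value is
% (1-\varepsilon)\mu + \varepsilon x, so that
\[
  IF(x; T, F) \;=\; x - \mu ,
\]
% which is unbounded in x.
```

The example of the mean shows why boundedness matters: a single observation placed far enough from µ can move the mean by an arbitrary amount.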

In chapter 4 we widen the framework to censored data. To compute robust estimators in this case we propose a generalization of the EM algorithm, namely the EMM algorithm. The EM algorithm allows one to compute maximum likelihood estimators (MLE) when the data are censored; the EMM algorithm allows one to compute robust estimators in the same situation. After a presentation of the EM algorithm, we discuss its generalization. We then compare the EMM algorithm when the data are truncated with the classical approach which considers the conditional distribution.
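As background, the classical EM iteration can be sketched for a deliberately simple case. The example below assumes exponentially distributed incomes with right censoring, a model chosen here only because both steps have closed forms. It is not the EMM algorithm and not one of the PID models treated in chapter 4; the function name and the data are hypothetical.

```python
def em_censored_exponential(times, observed, n_iter=200, tol=1e-12):
    """EM estimate of the rate of an exponential model with right censoring.

    times[i] is the recorded value; observed[i] is False when the true value
    is only known to exceed times[i].
    """
    n = len(times)
    n_censored = sum(1 for flag in observed if not flag)
    if n_censored == n:
        raise ValueError("at least one uncensored observation is required")
    total = sum(times)
    rate = n / total  # starting value: treat every record as uncensored
    for _ in range(n_iter):
        # E-step: by lack of memory, E[X | X > c] = c + 1/rate for a censored X,
        # so the expected complete-data total is:
        expected_total = total + n_censored / rate
        # M-step: complete-data MLE of the rate.
        new_rate = n / expected_total
        converged = abs(new_rate - rate) < tol
        rate = new_rate
        if converged:
            break
    return rate


# Hypothetical data: the last two records are censored at 5.0.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
observed = [True, True, True, True, False, False]
print(em_censored_exponential(times, observed))
```

In this simple model the iteration converges to the known closed-form MLE, the number of uncensored observations divided by the sum of all recorded values, which gives a convenient check of the sketch.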

Since the data on PID are numerous, they are often presented in a grouped form. In chapter 5 we build robust estimators for this case. We first present a large class of classical estimators and compute their IF. Although we find that the IF is bounded, that is, the influence of infinitesimal amounts of contamination on the value of the estimators is limited, we show that it can nevertheless be large. Therefore, after defining a more general class of estimators, we find estimators which are less influenced by contamination.

We conclude chapter 5 with a simulation study in which we compare the classical and robust estimators for grouped data.

At this stage, we will have provided the necessary tools to compute robust estimators for a given parametric PID model. However, this work would be incomplete without the development of a robust procedure for choosing one PID model. This is the subject of chapter 6. We concentrate on tests between non-nested hypotheses. We begin by presenting the best-known test statistics, particularly the Cox-type statistics. We then highlight one of their disadvantages, namely the inaccuracy of the approximation of their distribution by their asymptotic distribution, even when the sample sizes are relatively large. We also study their robustness properties, that is, the influence on the asymptotic level of the test of infinitesimal amounts of contamination. Finally, we propose a robust test statistic based on a robust parametric test, which avoids the problems raised previously, and show its performance when computed from contaminated samples.

In order to be as complete as possible, we also study goodness-of-fit tests in chapter 7. Actually, we only show that the classical goodness-of-fit tests can be badly influenced by infinitesimal amounts of contamination among the data. In particular, we compute the influence of such contaminations on the asymptotic level of the test and present a numerical example.

Finally, in chapter 8, we study the case of inequality measures in more detail. Indeed, these measures can be thought of as estimators of the true underlying inequality, which depends on the distribution of income. Therefore, we can compute their IF. However, the aim is not the same as with PID models, in that this time we want to relate the behaviour of the IF to the economic properties the different inequality measures fulfil. We find that in some special cases some inequality measures have a bounded IF, but unfortunately we must conclude that in the more realistic cases, the IF of a very large class of inequality measures is unbounded. We conclude this chapter by proposing robust inequality measures obtained via robust estimates of parametric PID models and present a simulation study.

Finally, chapter 9 concludes the present work.


Chapter 2

Income Distribution and Inequality Measures

2.1 Introduction

The topic of income distribution can be placed in the general economic framework, as Harold Lydall did: ‘The essential problem of economics is how to increase economic welfare. In a broad sense, this problem can be divided into two parts: how to increase total output from given resources; and how to distribute the resulting goods and services in such way as to give the community the most benefit from them. These two aspects are sometimes described as the problem of ‘production’ and the problem of ‘distribution’, respectively. The two parts are not, of course, independent; and many of the most difficult questions arise out of the interdependence of production and distribution. Nevertheless, it is possible to identify some influences which bear primarily on the side of production and others which primarily affect distribution. No progress could be made in the discussion unless we abstracted, at least temporarily, from some of the considerations which might eventually be shown to be relevant to one or other side’ (Lydall 1968).

Research on income distribution has followed two main directions. The first deals with factor price formation and the corresponding factor shares, i.e. the distribution of income among the factors of production. This approach was initiated by Ricardo (1819), and further developed by several schools of economic thought. The second approach deals with the distribution of a mass of income among the members of a set of economic units (family, household, individual), considering either the total income of each economic unit or its disaggregation by source of income, such as wages and salaries, property income, self-employment income, transfers, etc. The topic of the latter approach is commonly called the size distribution of income or personal income distribution (PID). This chapter is only concerned with this topic.

According to Slottje (1989), the theory of PID can be divided into three major categories. Models explaining the generation of income distributions are one of the important aspects of the theory. Why are incomes in a given society at a given time different? What are the determinants influencing the particular aspects of the income distribution? These are the main questions that researchers interested in the generation of income have tried to answer. In section 2.2 we briefly present the different theories.

Another important aspect regards the measures of inequality given the income distribution. Since it can be argued that the principal indicator of social welfare is given by the income level and the PID, it is important to develop tools to compare different societies on a social welfare basis. This is the role of the Lorenz curve and inequality measures, developed respectively in sections 2.3 and 2.4.

Finally, an alternative way of studying PID is to describe it by means of statistical tools. As will be argued below, this approach has many advantages. Moreover, it can serve as a basis for the study of the effect of policies on the PID. This is why a number of authors have concentrated their research on modelling the PID by means of parametric models. This approach is developed in section 2.5 and serves as the basis of the research presented in the following chapters.

2.2 The generation and distribution of income

The various theories that have been proposed to explain the distribution of income among individuals have emerged from two main schools of thought. The first may be called the stochastic theory of distribution and is represented by such authors as Gibrat (1931), Champernowne (1937, 1953), Aitchison and Brown (1954), Rutherford (1955), Mandelbrot (1960, 1961) and Steindl (1965). These authors explain the generation of income with the help of stochastic processes, that is, the actual form of the distribution is the stationary state of a stochastic process. For example, Gibrat (1931) formulated his theory based on the law of proportionate effect, and proposed a model which generates a positively skewed distribution. Gibrat’s model is a first-order Markov chain model. The variables are expressed in logarithms, with the log of income dependent on the log of income lagged one period and on random events. The theory shows that, as time goes by, the distribution of the log of income approaches that of the accumulated random disturbances, which tends toward normality. Hence, he proposed the lognormal distribution as a suitable income distribution.
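The mechanism can be sketched in a short simulation. This is a minimal illustration, assuming independent uniform shocks to log income and arbitrary values for the number of units, the number of periods and the shock size; none of these choices come from Gibrat (1931).

```python
import math
import random
import statistics


def simulate_log_incomes(n_units=2000, n_periods=50, half_width=0.2, seed=0):
    """Law of proportionate effect: log income = previous log income + shock."""
    rng = random.Random(seed)
    log_incomes = [0.0] * n_units  # every unit starts with income 1
    for _ in range(n_periods):
        # Assumed shock: uniform on [-half_width, half_width], independent
        # of the current income level.
        log_incomes = [y + rng.uniform(-half_width, half_width)
                       for y in log_incomes]
    return log_incomes


def skewness(values):
    """Sample skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


log_incomes = simulate_log_incomes()
incomes = [math.exp(y) for y in log_incomes]
print("skewness of log income:", round(skewness(log_incomes), 3))
print("skewness of income:    ", round(skewness(incomes), 3))
```

Although each shock is uniform rather than normal, the accumulated log income is close to symmetric, while income itself is markedly skewed to the right, as expected for an approximately lognormal variable.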

In 1937, Champernowne based his stochastic theory on Markov chains which generated a Pareto distribution. Later, in 1953, he showed that under suitable conditions, his stochastic process tends toward a unique equilibrium distribution dependent upon the transition matrix but not on the initial distribution. Typically, his models specify transitional probabilities, that is, the probabilities that units belonging to a certain class will move up or down to another class in the following period of time. The income distribution at a certain time is then linked to the income distribution one period before by a transition equation through these transition probabilities. Champernowne made some quite strong assumptions which considerably simplified his models. He supposed firstly that no income unit moves up by more than one income class in a period of time, and secondly that the transitional probabilities are constant with respect to time and independent of the income level. He then proposed different models able to generate the Pareto distribution, a two-tailed distribution obeying the Pareto law (see section 2.5) and other distributions, by relaxing his initial hypotheses.
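The link between constant transition probabilities and the Pareto law can be seen in a strongly simplified sketch. It assumes, beyond what is stated above, that units move by at most one class in either direction, that there is a finite number of classes, and that class boundaries grow geometrically; the function names and parameter values are hypothetical. NumPy is used for the linear algebra.

```python
import math

import numpy as np


def transition_matrix(n_classes, p_up, p_down):
    """Chain with constant probabilities of moving up or down one class."""
    P = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        if i + 1 < n_classes:
            P[i, i + 1] = p_up
        if i - 1 >= 0:
            P[i, i - 1] = p_down
        P[i, i] = 1.0 - P[i].sum()  # remaining mass: stay in the same class
    return P


def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def pareto_exponent(p_up, p_down, class_ratio):
    """Tail exponent implied when class i starts at y0 * class_ratio**i."""
    return math.log(p_down / p_up) / math.log(class_ratio)


P = transition_matrix(n_classes=20, p_up=0.2, p_down=0.3)
pi = stationary_distribution(P)
print(pi[1:] / pi[:-1])  # constant ratio p_up / p_down
print(pareto_exponent(0.2, 0.3, class_ratio=1.25))
```

The equilibrium class frequencies decline geometrically with the class index. Because income grows geometrically with the class index as well, the frequency is a power function of income, which is the Pareto form. The equilibrium depends only on the transition matrix, in line with the 1953 result quoted above.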

Finally, similar variations of these stochastic models were compiled by the authors mentioned above. We should also say that one main criticism about the simple stochastic models is that the process requires an incredibly long period of time to attain an equilibrium or a stationary state distribution (Shorrocks 1973).

The second school of thought, which may be called the socioeconomic school, seeks the explanation of PID by means of economic and institutional factors, such as sex, age, occupation, education, geographical differences, and the distribution of wealth. Three groups of authors belong to this school.

The first follows the human capital approach, based on the hypothesis of lifetime income maximization. The authors concentrate their attention on the supply side of labour, which is the result of the maximization of the personal utility function. Typically, they specify a utility function in which variables such as the general price level, the salary, education, family size, etc., are entered. This approach was initiated by Mincer (1958) and Becker (1962, 1967) and subsequently developed by Chiswick (1968, 1971, 1974).

The second group of authors, who mainly link the PID to education levels, is referred to as the education planning school by Tinbergen (1975). It is represented by such authors as Bowles (1969), Dougherty (1971, 1972) and Psacharopoulos and Hinchliffe (1972). They concentrate their attention on the demand side, deriving the demand for various types of labour from production functions. Hence, in this case a production function is specified, and it includes not only the classical production factors such as land, labour and capital, but also technical development, with different types of labour often being measured by the degree of schooling.

Finally, the third group of authors is called the supply and demand school. The major contribution to this approach is that of Tinbergen (1975), who considers PID a result of the supply and demand of different kinds of labour.

2.3 The Lorenz curve and analysis of income inequality

The Lorenz curve, widely used to represent and analyze the size distribution of income and wealth, is defined as the relationship between the cumulative proportion of income units and the cumulative proportion of income received when units are arranged in ascending order of their income. Lorenz (1905) proposed this curve in order to compare and analyse in a non-parametric way, inequalities of wealth in a country during different epochs, or in different countries during the same epoch. The curve has been used principally as a convenient graphical device to represent size distributions of income and wealth. It is a very useful tool not only for the nonparametric description of the observed PID but also for the comparison in terms of social welfare between income distribution states. In the first subsection we present the technical aspect of the Lorenz curve and in the second subsection we explain the link between the Lorenz curve and the concept of social welfare.

2.3.1 Definition and construction

The graph of the curve is represented in a unit square (see figure 2.1). The straight line joining the points (0,0) and (1,1) is called the egalitarian line, because along this line 10% of income receivers get 10% of income, 20% get 20% of income, and so on. Typically, actual distributions lie below the egalitarian line: the greater the convexity, the greater the degree of inequality.

Figure 2.1: A typical representation of the Lorenz Curve (horizontal axis: proportion of income receivers; vertical axis: proportion of income received)

More formally, the Lorenz curve can be defined by means of the probability distribution function associated to the income variable. Let $F$ be this function, and define $p = F(x) = \int_0^x dF(t)$, where $p$ can be interpreted as the proportion of units having an income less than or equal to $x$. Pietra (1915) (see also Gastwirth (1971)) presented a definition of the Lorenz curve in terms of the inverse of the cumulative distribution function given by $F^{-1}(t) = \inf\{x : F(x) \ge t\}$. The Lorenz curve is then written as

$$L(p) = \frac{1}{\mu}\int_0^p F^{-1}(t)\,dt \qquad (0 \le p \le 1) \tag{2.1}$$

where the mean $\mu$ is given by $\mu = \int x\,dF(x)$ (see also Kakwani 1980 for another definition).
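For a finite sample, definition (2.1) reduces to cumulative income shares of the ordered incomes. The following is a minimal sketch under that reading; the function name `lorenz_curve` and the choice to return the curve only at the points $p = k/n$ are illustrative assumptions, not part of the original text.

```python
def lorenz_curve(incomes):
    """Return the points (p_k, L(p_k)), k = 0..n, of the empirical Lorenz curve.

    Assumption: the empirical distribution puts mass 1/n on each income,
    so L(k/n) is the share of total income held by the k poorest units.
    """
    xs = sorted(incomes)            # arrange units in ascending order of income
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    points = [(0.0, 0.0)]           # the curve always starts at the origin
    cumulative = 0.0
    for k, x in enumerate(xs, start=1):
        cumulative += x
        points.append((k / n, cumulative / total))
    return points


if __name__ == "__main__":
    for p, share in lorenz_curve([40, 10, 30, 20]):
        print(f"p = {p:.2f}   L(p) = {share:.2f}")
```

Between two consecutive points the empirical curve is taken to be linear, which is the usual convention when the curve is drawn as in figure 2.1.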

The Lorenz curve can also be interpreted as a tool to measure the concentration of the income variable. It provides a point comparison measure, as opposed to the synthetic comparison measure given by inequality measures (see Zenga 1989).

The Lorenz curve satisﬁes the following conditions (Kakwani 1980):

1. if $p = 0$ then $L(p) = 0$

2. if $p = 1$ then $L(p) = 1$

3. $L'(p) = \frac{x}{\mu} \ge 0$ and $L''(p) = \frac{1}{\mu f(x)} > 0$

4. $L(p) \le p$

where $f(x)$ is the density function associated to $F(x)$.
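The third condition can be recovered by differentiating the definition of the Lorenz curve as the normalised integral of $F^{-1}$. A short derivation, writing $x = F^{-1}(p)$ and assuming non-negative incomes and a strictly positive density $f$ at $x$ (both assumptions are made here for the sketch):

```latex
\begin{align*}
L(p)   &= \frac{1}{\mu}\int_0^p F^{-1}(t)\,dt, \\
L'(p)  &= \frac{1}{\mu}\,F^{-1}(p) = \frac{x}{\mu} \;\ge\; 0, \\
L''(p) &= \frac{1}{\mu}\,\frac{dx}{dp} = \frac{1}{\mu f(x)} \;>\; 0,
\qquad \text{since } \frac{dp}{dx} = f(x).
\end{align*}
```

In particular $L'(p) = 1$ exactly when $x = \mu$, so the vertical distance $p - L(p)$ between the egalitarian line and the curve is largest at $p = F(\mu)$.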

It is possible to use the Lorenz curve as a parametric tool. Two approaches have been considered. The more obvious is simply to choose a parametric distribution for the income variable and derive analytically the corresponding Lorenz curve (see Dagum 1980a for the derivation of the Lorenz curve corresponding to some well-known parametric models for PID).

The second approach was formulated by Kakwani and Podder (1973, 1976). They specified the functional form of the Lorenz curve directly (for the general case) as

$$L(p) = p^{\alpha} e^{-\beta(1-p)} \tag{2.2}$$

where $0 \le p \le 1$, a curve which satisfies the properties described above.
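The claim that this functional form behaves like a Lorenz curve can be checked numerically on a grid. The sketch below does so for a curve of the form $p^{\alpha}e^{-\beta(1-p)}$; the parameter values used in the checks ($\alpha \ge 1$, $\beta \ge 0$) and the helper names are assumptions chosen for illustration, since the text does not state the admissible parameter range.

```python
import math


def kakwani_podder(p, alpha, beta):
    """Lorenz curve of the Kakwani-Podder form: p**alpha * exp(-beta * (1 - p))."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p ** alpha * math.exp(-beta * (1.0 - p))


def check_lorenz_conditions(curve, steps=1000, tol=1e-12):
    """Grid check: L(0)=0, L(1)=1, L(p)<=p, L increasing, L convex."""
    grid = [k / steps for k in range(steps + 1)]
    values = [curve(p) for p in grid]
    endpoints = abs(values[0]) <= tol and abs(values[-1] - 1.0) <= tol
    below_diagonal = all(v <= p + tol for p, v in zip(grid, values))
    increasing = all(b >= a - tol for a, b in zip(values, values[1:]))
    # Convexity is checked through non-negative second differences.
    convex = all(
        values[k - 1] - 2.0 * values[k] + values[k + 1] >= -tol
        for k in range(1, steps)
    )
    return endpoints and below_diagonal and increasing and convex


if __name__ == "__main__":
    print(check_lorenz_conditions(lambda p: kakwani_podder(p, 1.0, 0.5)))
    print(check_lorenz_conditions(lambda p: kakwani_podder(p, 0.5, 0.0)))
```

A grid check of this kind is only a sanity test: it can reject a parameter choice, but it does not prove the conditions for all $p$.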

Other authors proposed new parametric families as models for the Lorenz curve. The list includes models due to Rasche et al. (1980), Gupta (1984), Villaseñor and Arnold (1989), and more recently Basmann et al. (1990). We present these functions in appendix B.

As we see in the next section, the Lorenz curve is a very useful tool for the analysis of inequality. It can be used (a) to measure income inequality (see Gini 1910), (b) to perform a partial ordering of social welfare states (see Atkinson AB 1970), (c) to study the effect of income taxes (see Latham 1988), (d) to derive goodness-of-fit tests for exponential distribution functions, as well as upper and lower bounds for the Gini ratio (see Gastwirth 1972 and Gail and Gastwirth 1978b).

Finally, it should be stressed that recently another measure of point concentration has been proposed in the literature: the $Z(p)$ concentration curve (see Zenga 1984). It is based on the comparison of the inverse cumulative distribution function $F^{-1}(p)$ with the inverse first incomplete moment function $F_1^{-1}(p)$, where $F_1(x) = \frac{1}{\mu}\int_0^x t\,dF(t)$. The $Z(p)$ concentration curve is given by

$$Z(p) = 1 - \frac{F^{-1}(p)}{F_1^{-1}(p)} \tag{2.3}$$

According to Zenga (1989), its advantage is that it does not have a 'forced behaviour', that is, it does not fulfil property 3 of the Lorenz curve. Indeed, this property implies that the difference function $p - L(p)$ assumes its maximum value for $p = F(\mu)$, and there is no reason why the relative inequality must be greater for the middle classes than for the richest or the poorest classes. (For examples of application see Dancelli 1989 and Salvaterra 1989).

2.3.2 An ordering tool

As we have seen, the Lorenz curve displays the deviation of each individual income from perfect equality, and hence it captures the essence of inequality.

The nearer the Lorenz curve is to the egalitarian line, the more equal the distribution of income will be. Consequently, the Lorenz curve can be used as a criterion for ranking PID.

The income ranking uses the concept of "Lorenz-domination". An income profile is said to Lorenz dominate another income profile in the weak sense if the Lorenz curve of the former lies nowhere below the Lorenz curve of the latter. We have strict Lorenz domination if, in addition, the Lorenz curve of the former lies strictly above that of the latter at some point. However, this ordering is only a quasi-ordering: if the Lorenz curves intersect, neither of the income profiles is said to be preferred.
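The dominance criterion can be made concrete for two finite income profiles. The sketch below compares the two piecewise-linear empirical Lorenz curves at the union of their breakpoints, which is sufficient because both curves are linear between consecutive points of that union. The function names and the four return labels are illustrative assumptions; exact rational arithmetic is used so that no numerical tolerance is needed.

```python
from fractions import Fraction


def lorenz_points(incomes):
    """Breakpoints (k/n, cumulative income share) of the empirical Lorenz curve."""
    xs = sorted(Fraction(x) for x in incomes)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    points = [(Fraction(0), Fraction(0))]
    cumulative = Fraction(0)
    for k, x in enumerate(xs, start=1):
        cumulative += x
        points.append((Fraction(k, n), cumulative / total))
    return points


def lorenz_value(points, p):
    """Linear interpolation of the piecewise-linear Lorenz curve at p."""
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        if p0 <= p <= p1:
            return l0 + (l1 - l0) * (p - p0) / (p1 - p0)
    raise ValueError("p must lie in [0, 1]")


def lorenz_compare(first, second):
    """Return 'equal', 'first', 'second' (the dominating profile) or 'crossing'."""
    a, b = lorenz_points(first), lorenz_points(second)
    grid = sorted({p for p, _ in a} | {p for p, _ in b})
    diffs = [lorenz_value(a, p) - lorenz_value(b, p) for p in grid]
    if all(d == 0 for d in diffs):
        return "equal"
    if all(d >= 0 for d in diffs):
        return "first"      # nowhere below, strictly above somewhere
    if all(d <= 0 for d in diffs):
        return "second"
    return "crossing"       # curves intersect: the profiles are not ranked


if __name__ == "__main__":
    print(lorenz_compare([2, 2, 2, 2], [1, 1, 1, 5]))
    print(lorenz_compare([1, 5, 6], [2, 2, 8]))
```

The 'crossing' outcome is exactly the case in which the quasi-ordering is silent.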

There is also a link between the Lorenz curve ranking and social welfare.

Atkinson AB (1970) proved a theorem which implies that if the Lorenz curve corresponding to one PID is above the Lorenz curve of another PID (and both have the same mean), then the social welfare function (or social evaluation function, see Chakravarty 1990, for a definition) is greater for the first population, regardless of the form of the utility function except that it is increasing and concave. Later, Dasgupta, Sen and Starrett (1973) and Rothschild and Stiglitz (1973) generalized Atkinson's theorem by weakening the hypotheses. (For a relation between social welfare functions and PID, see also Dagum 1990).

In practice, international comparisons usually involve different population sizes and different means, as do intertemporal comparisons for the same country. Therefore, Shorrocks (1983) generalized the Lorenz curve by scaling it up by the mean income. He also proved that an unambiguous ranking of income profiles (under suitable conditions on the social welfare function) can be obtained if and only if the generalized Lorenz curves do not intersect. This is likely to be true in many important practical situations, since differences between Lorenz curves tend to be relatively small compared with variations in mean incomes. However, the welfare judgement captured by the generalized Lorenz domination may come into conflict with the social desire for more equally distributed income. For example, if we increase the income of the richest individual, keeping all other incomes fixed, the total income increases, as does inequality. It is nevertheless possible to have a PID ranking which avoids this kind of conflict. As Shorrocks (1983) showed, if we assume that an improvement of each income by a constant improves the social welfare function, then the comparison of the generalized Lorenz curves discounted for each individual by the mean income gives a quasi-ordering of the income profiles.
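The conflict described above can be reproduced with a small numerical example. The sketch below computes the generalized Lorenz curve as the ordinary Lorenz curve scaled up by the mean, evaluated at $p = k/n$; it assumes two profiles of the same size, and the function names and the income values are hypothetical.

```python
def generalized_lorenz(incomes):
    """Ordinates GL(k/n) = mean * L(k/n), k = 0..n; the last ordinate is the mean."""
    xs = sorted(incomes)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one income")
    ordinates = [0.0]
    cumulative = 0.0
    for x in xs:
        cumulative += x
        ordinates.append(cumulative / n)
    return ordinates


def lorenz(incomes):
    """Ordinary Lorenz ordinates L(k/n), obtained by dividing GL by the mean."""
    gl = generalized_lorenz(incomes)
    return [value / gl[-1] for value in gl]


if __name__ == "__main__":
    before = [1, 2, 3, 4]
    after = [1, 2, 3, 10]       # only the richest income is increased
    print(generalized_lorenz(before), generalized_lorenz(after))
    print(lorenz(before), lorenz(after))
```

In this example the generalized Lorenz curve of the second profile lies nowhere below that of the first, while its ordinary Lorenz curve lies nowhere above: the welfare ranking and the inequality ranking point in opposite directions.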

2.4 Income inequality measures

While the Lorenz criterion provides only a quasi-ordering of income profiles, an alternative statistic that completely orders the set of all income profiles is an inequality index. An inequality index is a scalar representation of interpersonal income differences within a given population, and hence can be considered as a measure of inequality (if inequality is defined in terms of income).

One of the first inequality indexes was developed by Corrado Gini. It is still one of the most widely used to analyse the size distribution of income and wealth. Gini (1912) specified the Gini mean difference which is by definition

$$\Delta = \sum_{j=1}^{n}\sum_{i=1}^{n} \frac{|x_j - x_i|}{n(n-1)} \tag{2.4}$$

where $0 \le \Delta \le 2\mu$. The last expression can also be written

$$\Delta = \int\!\!\int |x - y|\,dF(x)\,dF(y) \tag{2.5}$$

where $X$ and $Y$ are identically and independently distributed variables.

When all the incomes are equal, then ∆ = 0 and when the last income is equal to the total income, then ∆ = 2µ. Since ∆ is a monotonic increasing function of the degree of income inequality, Gini deﬁned

$$I_G = \frac{\Delta}{2\mu} \tag{2.6}$$

with $0 \le I_G \le 1$, as an income inequality measure, known as the Gini ratio or Gini index. Gini also proved that his index is equal to twice the area between the egalitarian line and the Lorenz curve $L(p)$ and hence can be given by

$$I_G = 1 - \frac{2}{\mu}\int_0^1\!\int_{-\infty}^{F^{-1}(p)} u\,dF(u)\,dp = 1 - \frac{2}{\mu}\int_{-\infty}^{\infty}\!\int_{-\infty}^{x} u\,dF(u)\,dF(x) \tag{2.7}$$
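For a sample, the Gini mean difference and the resulting Gini index can be computed directly from all pairwise income differences. The following is a minimal sketch using the $n(n-1)$ normalisation of the mean difference; the function names are illustrative, and the quadratic-time double sum is chosen for transparency rather than speed.

```python
def gini_mean_difference(incomes):
    """Gini mean difference: sum over all ordered pairs of |x_j - x_i| / (n(n-1))."""
    n = len(incomes)
    if n < 2:
        raise ValueError("need at least two incomes")
    total = sum(abs(xj - xi) for xj in incomes for xi in incomes)
    return total / (n * (n - 1))


def gini_index(incomes):
    """Gini index as the mean difference divided by twice the mean income."""
    mu = sum(incomes) / len(incomes)
    if mu <= 0:
        raise ValueError("mean income must be positive")
    return gini_mean_difference(incomes) / (2.0 * mu)


if __name__ == "__main__":
    print(gini_index([5, 5, 5, 5]))     # all incomes equal
    print(gini_index([0, 0, 0, 12]))    # one unit receives everything
    print(gini_index([1, 2, 3, 4]))
```

The two extreme profiles in the usage example correspond to the bounds $\Delta = 0$ and $\Delta = 2\mu$ discussed in the text.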

A number of intuitive interpretations of the Gini index are possible. For example, Pyatt (1976) interpreted the Gini index as equal to the average gain expected by an individual from the option of being someone else in the population. Sen (1973) suggested another interesting interpretation. In any pairwise comparison, the individual with the lower income may suffer from depression upon discovering that his income is lower. If it is assumed that this depression is proportional to the difference in incomes, the average of all such depressions in all possible pairwise comparisons leads to the Gini index.

The Gini index can be expressed in several forms. For example Dorfman (1979) showed that

$$I_G = 1 - \mu^{-1}\int_0^{\infty} \big(1 - F(x)\big)^2\,dx \tag{2.8}$$

Stuart (1954) and Lerman and Yitzhaki (1984) showed that

$$I_G = \frac{2\,\mathrm{Cov}\big(x, F(x)\big)}{\mu} \tag{2.9}$$

and Gastwirth (1972) pointed out that

$$I_G = \frac{1}{\mu}\int F(x)\big(1 - F(x)\big)\,dx \tag{2.10}$$

$$= \frac{2}{\mu}\int x\left[F(x) - \frac{1}{2}\right] dF(x) \tag{2.11}$$

The above result is obtained by making use of Fubini's theorem (Apostol 1975)

$$\int_0^{\infty}\!\int_x^{\infty} dF(u)\,x\,dF(x) = \int_0^{\infty}\!\int_0^{x} u\,dF(u)\,dF(x) \tag{2.12}$$

Moreover, many authors have suggested a generalization of the Gini index. For a discussion, see Chakravarty (1990).
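The equivalence between the covariance form of the Gini index and the pairwise-difference form can be checked on a sample. The sketch below estimates $F$ at the $i$-th ordered income by the mid-rank $(i - 0.5)/n$ and uses covariances divided by $n$; both conventions are assumptions made here, under which the covariance form coincides with the pairwise form normalised by $n^2$ (which is $(n-1)/n$ times the index based on the $n(n-1)$ mean difference).

```python
def gini_covariance(incomes):
    """2 * Cov(x, F(x)) / mean, with F estimated by mid-ranks (i - 0.5) / n."""
    xs = sorted(incomes)
    n = len(xs)
    mu = sum(xs) / n
    ranks = [(i - 0.5) / n for i in range(1, n + 1)]
    mean_rank = sum(ranks) / n
    cov = sum((x - mu) * (r - mean_rank) for x, r in zip(xs, ranks)) / n
    return 2.0 * cov / mu


def gini_pairwise(incomes):
    """Sum of |x_j - x_i| over all ordered pairs, divided by 2 * n**2 * mean."""
    n = len(incomes)
    mu = sum(incomes) / n
    total = sum(abs(a - b) for a in incomes for b in incomes)
    return total / (2.0 * n * n * mu)


if __name__ == "__main__":
    sample = [1, 2, 3, 4]
    print(gini_covariance(sample), gini_pairwise(sample))
```

The covariance form needs only one sort, whereas the pairwise form requires a double sum, which is one practical reason the covariance expression is convenient for large samples.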

Following Gini's work, a considerable number of income inequality indexes have been proposed. Measures of inequality can be divided into normative and positive categories. The normative measures are concerned with measuring inequality in terms of the social welfare so that a higher degree of inequality corresponds to a lower level of social welfare. The main contributors to this approach are Dalton (1920), Aigner and Heinz (1967), Atkinson AB (1970), Sen (1973), Kolm (1976), Blackorby and Donaldson (1978, 1980), Ebert (1987), Bossert and Pfingsten (1989) and Chakravarty (1983, 1990).

On the other hand, the positive measures are statistical devices that measure either the relative dispersion or the relative entropy of the observed distribution, without reference to the normative notion of social welfare.

Among the most used indexes, there are the Gini index, the coefficient of variation, the relative mean deviation, the relative median deviation, the Bonferroni (1930) index, the variance of the logarithm of income (Gibrat 1931), the Hirschman (1945) index, the Theil (1967) indexes, the Eltetö and Frigyes (1968) inequality measures and the Kakwani (1980) inequality measure. In appendix B we present the mathematical forms of these indexes as well as some normative indexes. It should be noted, as Sen (1973) argued, that the distinction between normative and positive measures is not a firm one, and that every positive measure embodies some form of the social welfare function. For example, the welfare implications of the Gini index have been debated by a large number of authors (for a brief review, see Kakwani 1980).

The main problem that arises in a concrete income inequality analysis is that the different inequality measures may lead to conflicting results. This is especially the case with normative measures, where the form of the welfare function plays a crucial role. (For a detailed simulation study to judge the relative merits of various inequality measures, see Champernowne 1974).

These conflicts have led to the axiomatic approach to analyzing inequality. Among the pioneering works are Atkinson AB (1970), Cowell (1977), Foster (1985) and Sen (1973). This group considers that welfare should serve as a basis for inequality analysis, and that inequality measurement (or more specifically an income inequality measure) should fulfil certain criteria; a measure consistent with these criteria should then be found. We list briefly some of these properties (for a discussion of these properties, see the reference list given in Chakravarty 1990, p. 32). Denoting the income inequality measure by $I$, we have:

1. Pigou-Dalton Principle of transfers (Pigou 1912 and Dalton 1920): the transfer of a pound from a richer person to a poorer one decreases inequality.

2. Principle of proportional addition to incomes (scale independence): proportional addition or subtraction to all incomes should leave $I$ unaffected.

3. Principle of equal addition to incomes: equal additions to all incomes should diminish and equal subtractions should increase $I$.

4. Principle of proportional addition to persons: $I$ should be invariant to proportional additions to the population of income receivers.

5. Principle of symmetry: $I$ should be invariant with respect to any permutation of income among the income receivers.

6. Principle of normalization: the range of $I$ should be in the interval [0,1], with zero (one) for perfect equality (inequality).
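The first five principles can be checked numerically for a given index on a given income profile. The sketch below does this for a sample Gini index computed from ordered incomes (normalised by $n^2$, so that it is invariant to replication of the population); the helper `check_axioms`, the size of the test transfer and the size of the equal addition are illustrative assumptions, and a check on one profile is of course not a proof.

```python
def gini(incomes):
    """Sample Gini index from ordered incomes: 2*sum(i*x_(i))/(n*sum(x)) - (n+1)/n."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def check_axioms(index, incomes, tol=1e-12):
    """Evaluate principles 1-5 for `index` on one profile with unequal incomes."""
    xs = sorted(incomes)
    base = index(xs)
    # 1. Transfer from the richest to the poorest that does not reverse their order.
    delta = (xs[-1] - xs[0]) / 4.0
    transferred = [xs[0] + delta] + xs[1:-1] + [xs[-1] - delta]
    shift = sum(xs) / len(xs)           # equal addition of one mean income
    return {
        "transfers": index(transferred) < base,
        "scale_independence": abs(index([2.0 * x for x in xs]) - base) <= tol,
        "equal_addition": index([x + shift for x in xs]) < base,
        "population_replication": abs(index(xs + xs) - base) <= tol,
        "symmetry": abs(index(list(reversed(xs))) - base) <= tol,
    }


if __name__ == "__main__":
    for name, holds in check_axioms(gini, [1, 2, 3, 10]).items():
        print(f"{name}: {holds}")
```

With this normalisation the index of a finite sample is bounded above by $(n-1)/n$ rather than by one, so the sixth principle holds only in the limit of a large population.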

The problem then becomes, of course, one of finding a class of inequality measures that satisfies these criteria. Of all the forms proposed, only one class comes close. This class is known as the generalized entropy family. The relationship between the axioms and entropy can be traced to Shorrocks (1980, 1983), Cowell (1980) and Maasoumi (1986). The concept of using entropy to measure inequality, however, goes back to Theil (1967). This family can be defined very generally by

$$I_E^{\beta} = \frac{1}{\beta(\beta+1)} \int \left[\left(\frac{x}{\mu}\right)^{\beta+1} - 1\right] dF(x) \tag{2.13}$$

where $F$ denotes either the empirical distribution function or a parametric model (Theil's indexes are the limiting cases corresponding to $\beta = -1$ and $\beta = 0$). The entropy class of measures not only satisfies most welfare-based axioms but also has members with powerful decomposition properties (see Theil 1989).
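With $F$ taken as the empirical distribution function, the generalized entropy family becomes a simple sample average. The sketch below implements it together with the two limiting cases, which are obtained by letting $\beta \to 0$ and $\beta \to -1$ in the general expression; the function name, the tolerance used to detect the limits and the restriction to strictly positive incomes are assumptions of this sketch.

```python
import math


def generalized_entropy(incomes, beta, eps=1e-12):
    """Sample generalized entropy index: mean of ((x/mu)**(beta+1) - 1) / (beta*(beta+1))."""
    n = len(incomes)
    if n == 0 or any(x <= 0 for x in incomes):
        raise ValueError("incomes must be non-empty and strictly positive")
    mu = sum(incomes) / n
    ratios = [x / mu for x in incomes]
    if abs(beta) < eps:
        # Limit beta -> 0: mean of (x/mu) * log(x/mu)
        return sum(r * math.log(r) for r in ratios) / n
    if abs(beta + 1.0) < eps:
        # Limit beta -> -1: mean of log(mu/x)
        return -sum(math.log(r) for r in ratios) / n
    return sum(r ** (beta + 1.0) - 1.0 for r in ratios) / (n * beta * (beta + 1.0))


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    for beta in (-1.0, 0.0, 1.0):
        print(beta, generalized_entropy(sample, beta))
```

For $\beta = 1$ the index equals half the squared coefficient of variation, which gives a convenient hand check of the implementation.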

In the same line of thought, Slottje, Basmann and Nieswiadomy (1989) developed the weighted geometric mean (WGM) measure first proposed by Basmann and Slottje (1987). This inequality measure, based on the underlying Lorenz curve, actually represents a class of inequality measures because it depends on a parameter-vector of weights given to income classes. According to the choice of these weights, several well-known measures of inequality can be approximated. Their argument is the following: when using one measure of income inequality it is important to know the relative importance or weights that the measure assigns to the various quantiles or shares. Indeed, inequality measures are frequently appealed to in connection with policy questions concerning distributive justice, and when one of them is used as a policy target, its implicit weight structure is more likely to be causally related to the resulting levels of social and political unrest than the mere magnitude of the inequality measure itself. However, for most inequality measures, the weights given to the different income quantiles do not appear explicitly in their formulation. This can be solved by the WGM measures.

To conclude, it should be stressed that inequality indexes have also been used by different authors interested in decomposition of the overall degree of inequality. In this context, two particular applications stand out. The first concerns a partition of the population into disjoint subgroups (age, sex, race, religion, etc.), where the researcher is interested in examining how the overall degree of inequality can be appropriately resolved into contributions due to inequality within each group and inequality between groups (see e.g. Silber 1989). The other main application disaggregates the total income of each individual into amounts earned from different sources and examines the impact of each of these sources on the overall degree of inequality. (For a review and a reference list, see Chakravarty 1990, section 2.6).

2.5 Parametric models for income distributions

The idea of hypothesizing and then estimating the size distribution of income was initiated by Vilfredo Pareto in 1895. This research was motivated by his controversy with the French and Italian socialists who were pressing for institutional reforms to reduce the inequality in the PID. Pareto (1896) specified his model as a quantitative tool to support the viewpoint that income inequality can only be reduced by sustaining economic growth and not by redistributing actual income levels. In this way Pareto analyzed the characteristic of regularity in the upper tail of the observed income distribution. This showed that the income elasticity of the survival distribution function was constant, i.e. there was a decreasing linear relation between the income-power (i.e. the logarithm of the income variable) and the logarithm of the backward cumulative distribution function. Formally, let $n$ be the number of economic units (households) and $n(x)$ the number of households having an income greater than $x$; then the relation observed by Pareto is:

$$\log n(x) = b - a \log(x) \tag{2.14}$$

with $0 < x_0 < x < \infty$, $n = n(x_0)$, where $x_0$ is the minimum observed income. He deduced the cumulative distribution function

$$F(x) = 1 - \left(\frac{x}{x_0}\right)^{-\alpha} \tag{2.15}$$

which is commonly called the Pareto law.
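The link between the log-linear relation and the Pareto law can be illustrated numerically: if the number of households above $x$ is $n(x) = n\,(1 - F(x))$ with $F$ the Pareto law, then $\log n(x)$ is exactly linear in $\log x$ with slope $-\alpha$ and intercept $\log n + \alpha \log x_0$. The sketch below recovers both by ordinary least squares; the function names, the income grid and the parameter values are hypothetical, and least squares on the log-log plot is used only as an illustration of Pareto's observation, not as a recommended estimator.

```python
import math


def pareto_cdf(x, x0, alpha):
    """Pareto law: F(x) = 1 - (x / x0)**(-alpha) for x >= x0, and 0 below x0."""
    if x < x0:
        return 0.0
    return 1.0 - (x / x0) ** (-alpha)


def fit_log_survival_line(xs, survivors):
    """Least-squares fit of log n(x) = b - a * log(x); returns (b, a)."""
    u = [math.log(x) for x in xs]
    v = [math.log(s) for s in survivors]
    m = len(u)
    u_bar, v_bar = sum(u) / m, sum(v) / m
    slope = (
        sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
        / sum((ui - u_bar) ** 2 for ui in u)
    )
    intercept = v_bar - slope * u_bar
    return intercept, -slope


if __name__ == "__main__":
    n, x0, alpha = 10000, 1000.0, 2.5            # hypothetical values
    grid = [x0 * 1.5 ** k for k in range(8)]     # incomes at or above x0
    counts = [n * (1.0 - pareto_cdf(x, x0, alpha)) for x in grid]
    b, a = fit_log_survival_line(grid, counts)
    print(f"a = {a:.4f}, b = {b:.4f}")
```

With real data the counts are noisy and the straight line is only approximate, most visibly outside the upper tail.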

Theoretical and empirical research (see e.g. Mandelbrot 1960) led to the acceptance of the Pareto law as the model of high-income groups. Indeed, at that time, the type of data used consisted of personal incomes that exceeded a certain limit $x_0$ fixed by taxation rules. This reinforced the Pareto assumption of a zero modal distribution.

Pareto (1896, 1897) proposed two other models (type II and type III, see appendix A), based upon the same initial observation. Later, with data of a larger spectrum, the statisticians observed unimodal and asymmetric distributions. Nevertheless, the decreasing part of the income distribution systematically behaves like the Pareto law. Thereby an important property to be fulfilled by alternative PID models is their convergence to the Pareto law for high levels of income.

Pareto’s initial work stimulated other researchers in the direction of model specifications purporting to offer an accurate and elementary description of the PID. A large number of new models have since been proposed (see below). In this section we present these models following the classification proposed by Dagum (1980b). We begin with the development of generating systems, from which almost all PID models can be deduced. We then discuss the properties that PID models should fulfill, and finally give a list of the existing models and discuss some of their properties. The functional forms of the PID models discussed in this section are given in appendix A.

2.5.1 Generating systems

According to Dagum (1980b), all PID models except the Champernowne (1953) model can be deduced from the following three generating systems:

1. Pearson's system (Elderton and Johnson 1969)

2. D'Addario's system (D'Addario 1949)

3. Dagum's system (Dagum 1980b, 1980c)

Pearson (1894) specified a differential equation from which an important family of probability distribution functions is derived. All members of the Pearson family are determined by means of the first four moments of the distribution. Although the Pearson system was not designed with the specific purpose of generating PID models, several of them belong to this system, such as the Pareto types I-II, the Pearson type I or Beta distributions of first and second kind (Thurow 1970, McDonald and Ransom 1979, McDonald 1984), the Gamma distribution or Pearson type III (March 1898, Amoroso 1925, Salem and Mount 1974, Bartels 1977), the generalized four-parameter Gamma distribution (Amoroso 1925), the Pearson type V or Vinci (1921) distribution, the Pearson type IV (Bartels 1977, Kloek and Van Dijk 1977, 1978), the Pearson type VI (McDonald 1984) and the Student distribution or Pearson type VII (Kloek and Van Dijk 1977, 1978).

Following the idea of transformation function applied by Edgeworth (1898) and Fréchet (1939), D'Addario specified his system by means of a probability generating function and a transformation function (see Dagum 1980a, 1980b). Among the well-known income distributions derived from this system, we have the Pareto type I-II, the Lognormal type I (Gibrat 1931, Aitchison and Brown 1957), the Lognormal type II (Dagum 1980c), the Amoroso (1925) distribution, the Gamma distribution, the Vinci (1921) distribution, the Davis (1941) distribution and the Weibull (1951) distribution.

For the third system, Dagum (1980c) observed that the income elasticity of the cumulative distribution function was a decreasing, and in general a concave, function of the distribution function. This pattern led Dagum to the specification of a new generating system of income and wealth distribution. Among the well-known PID derived from this system, we have the three Pareto types, the Benini (1906) distribution, the Weibull (1951) distribution, the logistic distribution (Fisk 1961), the Singh-Maddala (1976) distribution, the log-Gompertz distribution (Dagum 1980c) and the three Dagum type distributions (Dagum 1977, 1980c).

2.5.2 Properties of income distribution models

The models developed for PID are not based on a causal explanation: they are simply univariate models that purport to achieve an accurate statistical description of the phenomenon. The choice of a particular mathematical form for the model can be made dependent on the set of properties it is supposed to fulﬁl.

While Edgeworth (1898) was the first to propose some desired properties for income distribution models, further developments can be found in Fréchet (1939), Aitchison and Brown (1957), Mandelbrot (1960) and Dagum (1977, 1980b). The following properties are widely accepted in the literature (see Dagum 1980b):

• Model foundation

• Convergence to the Pareto distribution

• Principle of parsimony

• Relation to income inequality measures

• Good ﬁt over the whole income range

By model foundation is meant the extent to which the mathematical form of a PID model is derived from realistic elementary assumptions. Accordingly, the PID models can be grouped into the following three main classes: (a) stochastic, (b) logico-empirical, (c) ad hoc. A PID model has a stochastic foundation when its mathematical form is the outcome of an a priori set of probability assumptions (for example the Singh-Maddala distribution). The model has a logico-empirical foundation if it is the theoretical counterpart of observed regularities, that is if the functional form is the solution of a specified differential equation which captures the characteristics of regularity observed in an empirical PID (for example those specified by Pareto). Finally, an ad hoc model is a model whose sole purpose is to fit the observed PID, providing neither a plausible probability theory basis nor a logico-empirical foundation (for example the Gamma distribution). This property is very controversial, since for some authors it may be important to have a model with a theoretical basis, while for others the fit to the data is the only important property.

Convergence to the Pareto law is a property that has emerged from empirical evidence (see Davis 1941, Mandelbrot 1960, Budd 1970). This property is called the weak form of the Pareto law and means that for large values of income the income distribution can be approximated by Pareto’s formula.

The principle of parsimony is a well-known statistical property. A model with few parameters has to be preferred to a complicated one. It always costs time and money to include in the model new parameters which might improve the goodness-of-ﬁt over the whole income range. Moreover, simple models can be easily interpreted. The models commonly used contain two or three parameters.

The property 'relation to income inequality measures' is useful for analytical and statistical purposes. The explicit mathematical solution of the Gini ratio or other inequality measures becomes an important tool of analysis. They can be obtained directly from the fitted model and used in the specification of macroeconomic models of income determination and income policy. The simpler the relation between the distribution and the inequality measure is, the more the model is to be preferred.

For a long time, the lack of a model which could describe the whole range of incomes was a stumbling block in the study of income distribution. It has been frequently suggested that several different models may be required to explain the empirical distribution. However, models have been proposed recently which provide a 'reasonable' description of observed income distributions. The term 'reasonable' has to be taken cautiously, as in the literature the quality is measured by means of goodness-of-fit tests. These tests, according for example to Ransom and Cramer (1983), are in most cases not appropriate because of their dependence on data set size. Moreover, the large choice between the different statistics can be very confusing (see chapter 7). There is also the problem of robustness of both the estimators calculated with classical methods and the classical tests (see chapter 3). Moreover, it is well known that the more parameters the model has, the better the fit of the estimated distribution is. But, on the other hand, a model with many parameters is not only difficult to estimate, but the variability of its estimators is also very large. The property of 'good fit over the whole income range' is then an a priori subjective property, even if some models are always preferred to others.

We believe that although these properties are very important, the choice of a PID model should be made in accordance with the data. Therefore, it is very important to choose first an appropriate model choice statistic, then a good estimation technique and appropriate goodness-of-fit tests, in order to be able to check whether the model under consideration does reflect the behaviour of the data at hand. The main purpose of our research is precisely to develop and analyse such statistical tools.

2.5.3 Most commonly used models

In this subsection, we discuss the properties of the models cited in the previous section. The functional forms are presented in appendix A.

Historically these models can be classiﬁed as follows:

1. Three Pareto types (Pareto 1896)

2. Pearson type III or Gamma distribution (March 1898, Amoroso 1925, Salem and Mount 1974, Bartels 1977)

3. Benini (1906) distribution

4. Pearson type V or Vinci (1921) distribution

5. The generalized four-parameter Gamma distribution (Amoroso 1925)

6. Lognormal type I (Gibrat 1931, Aitchison and Brown 1957)

7. Davis (1941) distribution

8. Weibull (1951) distribution

9. Logistic distribution (Fisk 1961)

10. Pearson type I or Beta of first and second kind (Thurow 1970, McDonald and Ransom 1979, McDonald 1984)

11. Singh-Maddala (1976) distribution

12. Lognormal type II (Dagum 1980b)

13. Three Dagum types (Dagum 1977, 1980b)

14. Log-Gompertz distribution (Dagum 1980b)

15. Majumder and Chakravarty (1990) distribution

Among these models, the most frequently used in applied work are the Pareto, the Lognormal, the Gamma, and recently the Dagum and Singh-Maddala models. The generalised Beta distribution of the second kind is also used in practice as a PID model (Slottje 1989), partly because it includes as special cases the Gamma and the Singh-Maddala distributions. In Figure 2.2 we give a graphical representation of the Gamma model as a typical unimodal and asymmetric density used as a PID model.

Figure 2.2: The Gamma density as a model for PID (horizontal axis: income variable; vertical axis: Gamma density)

All these functions constitute a set of possible statistical models for describing the PID. However, the problem of the choice of the functional form
