Machine learning and statistical learning

Machine learning and statistical learning

2.2 Quantile Regression
Ewen Gallic (ewen.gallic@gmail.com)

MASTER in Economics - Track EBDS - 2nd Year 2020-2021

1. Quantiles

Quantiles

Figure 1: Total Income Growth by Percentile Across All World Regions, 1980-2016. Source: Alvaredo et al. (2018).

• Quantiles are used to describe a distribution

• Very useful for highly dispersed distributions (e.g., income)

• Useful if we are interested in relative variations (rather than absolute variations)

Ewen Gallic Machine learning and statistical learning 3/53


Some references

Arellano, M. (2009). Quantile Methods. Class notes.

Charpentier, A. (2020). Introduction à la science des données et à l'intelligence artificielle. https://github.com/freakonometrics/INF7100

Charpentier, A. (2018). Big Data for Economics, Lecture 3.

Davino, C., Furno, M., and Vistocco, D. (2014). Quantile Regression. John Wiley & Sons, Ltd.

Givord, P. and D'Haultfœuille, X. (2013). La régression quantile en pratique. INSEE.

He, X. and Wang, H. J. (2015). A Short Course on Quantile Regression.

Koenker, R. and Bassett Jr, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, 46(1), 33-50.

Koenker, R. (2005). Quantile Regression. Number 38. Cambridge University Press.


1.1 Median


Median: a specific quantile

• The median m of a real-valued probability distribution separates this probability distribution into two halves:
  - 50% of the values below m, 50% of the values above m

• For a random variable Y with cumulative distribution F, the median is such that:

P(Y ≤ m) ≥ 1/2 and P(Y ≥ m) ≥ 1/2

Figure 2: Probability density function of an F(5,2) distribution; the median m splits the area under the density into two 50% halves.


Median: a specific quantile

• From the cumulative distribution function F, which gives F(y) = P(Y ≤ y):
  - find the antecedent of 0.5
  - if the cumulative distribution function is strictly increasing, the antecedent is unique
  - if not, all antecedents of 0.5 are possible values for the median

Figure 3: Cumulative distribution function of an F(5,2) distribution.


Median and symmetric distribution

Figure 4: PDF of the N(0,1) distribution; the median m = E(X) splits the density into two 50% halves.

If the probability distribution is symmetric, the median m and the mean (if it exists) are the same (the point at which the symmetry occurs).


Numeric Optimization

It may be useful to think of the median as the solution of a minimization problem:

• The median minimizes the expected absolute deviation:

median(Y) ∈ arg min_m E|Y − m|

• An analogy can be made with the mean µ of a random variable:
  - it can be obtained from the following numerical optimization, where m is defined as the center of the distribution:

µ = arg min_m E[(Y − m)²]
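This characterization can be checked numerically; a minimal sketch with NumPy (the simulated sample is illustrative), using the fact that the minimizer of the empirical sum of absolute deviations is always attained at one of the sample values:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(1001)  # odd sample size: the sample median is unique

# Empirical version of E|Y - m|: evaluate sum_i |y_i - m| at every
# sample value and keep the minimizer.
sad = np.array([np.abs(y - m).sum() for m in y])
m_star = y[np.argmin(sad)]
# m_star coincides with np.median(y)
```

For the mean, replacing the absolute deviation by the squared deviation in the same search would instead select the value closest to `np.mean(y)`.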


1.2 Quantiles: definition


What is a quantile?

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created (Wikipedia)

We thus define a probability threshold τ and then look for the value Q(τ) (i.e., the τth quantile) such that:

• a proportion τ of the observations have a value lower than or equal to Q(τ);

• a proportion 1 − τ of the observations have a value greater than or equal to Q(τ).


Formal definition

The quantile Q_Y(τ) of level τ ∈ (0,1) of a random variable Y is such that:

P{Y ≤ Q_Y(τ)} ≥ τ and P{Y ≥ Q_Y(τ)} ≥ 1 − τ

We can also define it as:

Q_Y(τ) = F⁻¹(τ) = inf{y ∈ ℝ : F_Y(y) ≥ τ}

The most used quantiles are:

• τ = 0.5: the median

• τ = {0.1, 0.9}: the first and last deciles

• τ = {0.25, 0.75}: the first and last quartiles
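The inf-based definition translates directly to the empirical CDF; a small sketch (the helper `quantile_inf` is ours, not a standard API):

```python
import numpy as np

def quantile_inf(y, tau):
    """Q(tau) = inf{ y : F(y) >= tau }, with F the empirical CDF."""
    ys = np.sort(np.asarray(y, dtype=float))
    n = len(ys)
    # F(ys[k]) = (k + 1) / n, so we need the smallest k with (k + 1)/n >= tau
    k = int(np.ceil(tau * n)) - 1
    return ys[max(k, 0)]

y = [3, 1, 4, 1, 5, 9, 2, 6]
# sorted: [1, 1, 2, 3, 4, 5, 6, 9], n = 8
print(quantile_inf(y, 0.5))   # 3.0  (F(3) = 4/8 >= 0.5)
print(quantile_inf(y, 0.9))   # 9.0  (F(6) = 7/8 < 0.9, so inf is 9)
```

This matches NumPy's "inverted CDF" convention rather than the default interpolating one.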


Illustration

Figure 5: Quantiles of a probability distribution (here, F(5,2)). Panel A: probability density function, with Q(0.25) leaving 25% of the mass to its left and 75% to its right. Panel B: the same quantile read off the cumulative distribution function.


Numerical optimization

Once again, quantiles can be viewed as particular centers of the distribution that minimize a weighted absolute sum of deviations. For the τth quantile:

Q_τ(Y) ∈ arg min_m E[ρ_τ(Y − m)]

where ρ_τ(u) is a loss function defined as follows:

ρ_τ(u) = (τ − 1(u < 0)) × u,   (1)

for 0 < τ < 1.
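A quick numerical check of Eq. (1): minimizing the empirical mean of ρ_τ recovers the sample quantile (the sample and the value of τ are arbitrary choices for the sketch):

```python
import numpy as np

def rho(u, tau):
    """Check (pinball) loss of Eq. (1): rho_tau(u) = (tau - 1(u < 0)) * u."""
    u = np.asarray(u, dtype=float)
    return (tau - (u < 0)) * u

rng = np.random.default_rng(1)
y = rng.exponential(size=2000)
tau = 0.75

# Empirical risk m -> mean rho_tau(y - m); as with the median, its
# minimizer over the sample values is the tau-th sample quantile.
risks = np.array([rho(y - m, tau).mean() for m in y])
m_star = y[np.argmin(risks)]
# m_star is (up to interpolation conventions) np.quantile(y, tau)
```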


Loss function

This loss function is:

• a continuous piecewise linear function

• non-differentiable at u = 0

Then:

• a (1 − τ) weight is assigned to the negative deviations

• a τ weight is assigned to the positive deviations

Figure 6: Quantile regression loss function ρ_τ (here with τ = 0.1).


Numerical optimization

The minimization program Q_τ(Y) = arg min_m E[ρ_τ(Y − m)] can be written as:

• For a discrete Y variable with probability mass function f(y) = P(Y = y):

Q_τ(Y) = arg min_m { (1 − τ) Σ_{y ≤ m} |y − m| f(y) + τ Σ_{y > m} |y − m| f(y) }

• For a continuous Y variable with probability density function f(y):

Q_τ(Y) = arg min_m { (1 − τ) ∫_{−∞}^{m} |y − m| f(y) dy + τ ∫_{m}^{+∞} |y − m| f(y) dy }


1.3 Graphical Representation


Boxplots

A famous graphical representation of the distribution of data relies on quantiles: the boxplot.

Figure 7: Boxplot of sepal width. The box spans the interquartile range, from Q1 to Q3, with the median Q2 inside; the whiskers extend up to 1.5 × IQR beyond the box.

IQR: interquartile range
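The quantities behind a boxplot can be computed directly from the quantiles; a sketch with NumPy (the simulated data stand in for the sepal-width measurements):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(3.0, 0.4, size=150)  # stand-in for the sepal-width sample

q1, q2, q3 = np.quantile(x, [0.25, 0.50, 0.75])
iqr = q3 - q1                        # interquartile range: the box
lo = q1 - 1.5 * iqr                  # whisker fences: 1.5 x IQR beyond the box
hi = q3 + 1.5 * iqr
outliers = x[(x < lo) | (x > hi)]    # points drawn individually beyond the whiskers
```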


Choropleth maps

Note: Example adapted from Arthur Charpentier's slides (INF7100, Été 2020, slide 224).

Figure 8: Proportion of individuals between the ages of 18 and 25 (inclusive) registered on the electoral lists in Marseilles, by voting center. Panel A bins the values into equal-width classes; panel B uses classes based on quantiles, which balance the number of voting centers in each color class.

2. Quantile regression


2.1 Principles


Quantile regression

In the linear regression context, we have focused on the conditional distribution of y, but only paid attention to the mean effect.

In many situations, we only look at the effects of a predictor on the conditional mean of the response variable. But there might be some asymmetry in the effects across the quantiles of the response variable:

• the effect of a variable may not be the same for all observations (e.g., if we increase the minimum wage, the effect on wages may affect low wages differently than high wages).

Quantile regression offers a way to account for these possible asymmetries.


Figure 9: BMI Percentile Calculator for Child and Teen: BMI-for-age percentile charts for boys and girls aged 2 to 20, with curves at the 5th, 10th, 25th, 50th, 75th, 85th, 90th and 95th percentiles. Source: Centers for Disease Control and Prevention; developed by the National Center for Health Statistics in collaboration with the National Center for Chronic Disease Prevention and Health Promotion (2000), http://www.cdc.gov/growthcharts


Principles of Quantile Regression

Let Y be the response variable we want to predict using p + 1 predictors X (including the constant).

In the linear model, using least squares, we write:

Y = X⊤β + ε,

with ε a zero-mean error term with variance σ². Thus, the conditional mean writes:

E(Y | X = x) = x⊤β

Here, β represents the marginal change in the mean of the response Y following a marginal change in x.


Principles of Quantile Regression

Instead of looking at the mean effect, we can look at the effect at a given quantile. The conditional quantile is defined as:

Q_τ(Y | X = x) = inf{y : F(y | x) ≥ τ}

The linear quantile regression model assumes:

Q_τ(Y | X = x) = x⊤β_τ   (2)

where β_τ = [β_{0,τ}, β_{1,τ}, β_{2,τ}, ..., β_{p,τ}]⊤ is the vector of quantile coefficients:

• it corresponds to the marginal change in the τth quantile following a marginal change in x.


Principles of Quantile Regression

If we assume that Q_τ(ε_τ | X) = 0, then Eq. (2) is equivalent to:

Y = X⊤β_τ + ε_τ   (3)

Asymmetric absolute loss

Recall that the asymmetric absolute loss function is defined as:

ρ_τ(u) = (τ − 1(u < 0)) × u, for 0 < τ < 1.

Figure 10: Quantile regression loss function ρ_τ (here with τ = 0.1).

Asymmetric absolute loss

With ρ_τ(u) used as a loss function, it can be shown that Q_τ minimizes the expected loss, i.e.:

Q_τ(Y) ∈ arg min_m {E[ρ_τ(Y − m)]}   (4)

We can note that in the case in which τ = 1/2, this corresponds to the median.

Quantiles may not be unique:

• any element of {x ∈ ℝ : F_Y(x) = τ} minimizes the expected loss

• if the solution is not unique, we have an interval of τth quantiles
  - the smallest element is chosen (this way, the quantile function remains left-continuous).


Empirically

Now, let us turn to the estimation. Let us consider a random sample {(x₁, y₁), ..., (x_n, y_n)}.

In the case of least squares estimation, the expectation E(Y) minimizes the risk associated with the quadratic loss function, i.e.:

E(Y) = arg min_m {E[(Y − m)²]}

• The sample mean solves min_m Σᵢ₌₁ⁿ (yᵢ − m)²

• The least squares estimates of the parameters are obtained by minimizing Σᵢ₌₁ⁿ (yᵢ − xᵢ⊤β)²


Empirically

The τth quantile minimizes the risk associated with the asymmetric absolute loss function:

Q_τ(Y) ∈ arg min_m {E[ρ_τ(Y − m)]}

The τth sample quantile of Y solves:

min_m Σᵢ₌₁ⁿ ρ_τ(yᵢ − m)

If we assume that Q_τ(Y | X) = X⊤β_τ, then the quantile estimator of the parameters is given by:

β̂_τ ∈ arg min_β { (1/n) Σᵢ₌₁ⁿ ρ_τ(yᵢ − xᵢ⊤β) }   (5)
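Eq. (5) can be attacked with a generic optimizer; a minimal sketch using SciPy's Nelder-Mead on simulated data (the data-generating process and starting values are ours; Koenker and Bassett's actual algorithm reformulates the problem as a linear program):

```python
import numpy as np
from scipy.optimize import minimize

def rho(u, tau):
    """Check loss of Eq. (1)."""
    return (tau - (u < 0)) * u

def fit_quantreg(X, y, tau):
    """beta minimizing (1/n) sum_i rho_tau(y_i - x_i' beta), i.e. Eq. (5)."""
    risk = lambda beta: rho(y - X @ beta, tau).mean()
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting point
    res = minimize(risk, beta0, method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-9, "maxiter": 5000})
    return res.x

# Pure location model: the true slope is the same at every quantile
rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.standard_normal(n)

beta_hat = fit_quantreg(X, y, tau=0.9)
# beta_hat[1] should be close to 2, whatever tau, in this location model
```

Nelder-Mead copes with the non-differentiable objective here, but the linear-programming formulation is what production implementations use.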


Figure 11: Food expenditure as a function of income. Source: Koenker and Bassett Jr (1982). Panel A: household food expenditure (BEF/year) against household income (BEF/year), with the fitted quantile regression line for τ = 0.1. Panel B: slope of the quantile regression for probability levels τ = 0.1, ..., 0.9.


In the previous slide, let us focus on τ = 0.1.

Left graph: the estimated slope and intercept for that quantile show that if the household income is 3,500 Belgian Francs (BEF):

- there is a 10% chance that food expenditure for the household is lower than 1,500 BEF;
- there is a 90% chance that food expenditure for the household is greater than 1,500 BEF.

Right graph: plotting the slope of the quantile regression of expenditure on income, for different values of τ:

- the slope coefficient is always positive
- the value increases with the level of the quantile: for relatively richer households, the marginal effect of income on food expenditure is higher.


Figure 12: Food expenditure (BEF/year) as a function of income (BEF/year): fitted lines of the linear quantile regression for several values of τ.


From the previous graph, we observe that the slopes change considerably depending on the level of the quantile:

• for the individuals with the lowest incomes, the slope is relatively small (for τ = 0.1, it is 0.40)

• for individuals with higher incomes, the slope is relatively higher (for τ = 0.9, it is 0.69)

Therefore, the higher the income category, the greater the effect of income on consumption.


Location-shift model

Let us consider a simple model:

Y = X⊤γ + ε   (6)

We have:

Q_τ(Y | X = x) = x⊤γ + Q_τ(ε)

In this model, known as the location model, the only coefficient that varies with τ is the coefficient associated with the constant: β_{0,τ} = γ₁ + Q_τ(ε).


Location-shift model

• The conditional distributions F_{Y|X=x} are parallel when x varies.

• As a consequence, the conditional quantiles depend linearly on X, and the only coefficient that varies with the quantile is β_{0,τ}, i.e., the coefficient associated with the constant:
  - β_{0,τ} = γ₁ + Q_τ(ε)
  - β_{j,τ} = γ_j for all the coefficients except the constant.


Location-shift model: example

Consider the following true process:

Y = β₀ + β₁x + ε, with β₀ = 3 and β₁ = −0.1, and where ε ∼ N(0,4).

Let us generate 500 observations from this process.

Figure: simulated sample with the fitted linear regression line and the fitted quantile regression lines for τ ∈ {0.01, 0.25, 0.5, 0.75, 0.9}; the quantile regression lines are parallel.


Location-scale model

Now, let us consider a location-scale model. In this model, we assume that the predictors have an impact both on the mean and on the variance of the response:

Y = X⊤β + (X⊤γ)ε, where ε is independent of X.

As Q_τ(aY + b) = aQ_τ(Y) + b for a > 0, we can write:

Q_τ(Y | X = x) = x⊤(β + γQ_τ(ε))

• Setting β_τ = β + γQ_τ(ε), the assumption (2) still holds.

• The impact of the predictors will vary across quantiles.

• The slopes of the lines corresponding to the quantile regressions are not parallel.
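The non-parallel slopes can be seen analytically; a small sketch with a single predictor (the coefficients b0, b1, g0, g1 are illustrative, with ε ∼ N(0,1) so that Q_τ(ε) = Φ⁻¹(τ)):

```python
from scipy.stats import norm

# Location-scale model Y = (b0 + b1*x) + (g0 + g1*x) * eps, eps ~ N(0,1);
# the scale g0 + g1*x is kept positive on the x-range considered.
b0, b1, g0, g1 = 1.0, 2.0, 1.0, 0.5

def cond_quantile(x, tau):
    # Q_tau(Y | X = x) = x'beta + (x'gamma) * Q_tau(eps)
    return (b0 + b1 * x) + (g0 + g1 * x) * norm.ppf(tau)

def slope(tau):
    # Slope in x of the conditional tau-quantile: b1 + g1 * Q_tau(eps)
    return cond_quantile(1.0, tau) - cond_quantile(0.0, tau)
```

The slope fans out around b1: above the median it is steeper, below the median flatter, which is exactly the non-parallel pattern described above (at τ = 0.5, Φ⁻¹(τ) = 0, so the slope equals b1).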


Location-scale model: example

Consider the following true process:

Y = β₀ + β₁x + ε, with β₀ = 3 and β₁ = −0.1, and where ε is a normally distributed error with zero mean and non-constant variance.

Let us generate 500 observations from this process, and then estimate a quantile regression at different quantiles.

Figure: simulated sample with the fitted linear regression line and the fitted quantile regression lines for τ ∈ {0.01, 0.25, 0.5, 0.75, 0.9}; here the lines are not parallel.


2.2 Inference


Asymptotic distribution

As there is no explicit form for the estimator β̂_τ, establishing its consistency is not as easy as it is for the least squares estimator.

First, we need some assumptions to achieve consistency:

• the observations {(x₁, y₁), ..., (x_n, y_n)} must be conditionally i.i.d.

• the predictors must have a bounded second moment, i.e., E[‖Xᵢ‖²] < ∞

• the error terms εᵢ must be continuously distributed given Xᵢ, and centered, i.e., ∫_{−∞}^{0} f_ε(u) du = 0.5

• E[f_ε(0) X X⊤] must be positive definite (local identification property).


Asymptotic distribution

Under those weak conditions, β̂_τ is asymptotically normal:

√n (β̂_τ − β_τ) →d N(0, τ(1 − τ) D_τ⁻¹ Ω_x D_τ⁻¹)   (7)

where

D_τ = E[f_ε(0) X X⊤] and Ω_x = E[X X⊤]

For the location-shift model (Eq. 6), the estimated asymptotic variance of β̂_τ writes:

Var̂[β̂_τ] = (τ(1 − τ) / f̂_ε(0)²) ((1/n) Σᵢ₌₁ⁿ xᵢ xᵢ⊤)⁻¹   (8)

where f̂_ε(0) can be estimated using a histogram (see Powell 1991).
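The density value f̂_ε(0) in Eq. (8) can indeed be estimated from a histogram of the residuals; a sketch on simulated N(0,1) errors (the bandwidth h is an arbitrary choice for illustration, not Powell's data-driven rule):

```python
import numpy as np

rng = np.random.default_rng(5)
eps = rng.standard_normal(100_000)   # stand-in for the fitted residuals

# Histogram-type estimate: share of residuals falling in a small bin
# around 0, divided by the bin width.
h = 0.2
f0_hat = np.mean(np.abs(eps) < h) / (2 * h)

# For N(0,1) the true value is 1/sqrt(2*pi), about 0.399
```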


Confidence intervals

If we have obtained an estimator of Var[β̂_τ], the confidence interval of level 1 − α is given by:

[β̂_τ ± z_{1−α/2} √(Var̂[β̂_τ])]   (9)

Otherwise, we can rely on bootstrap or resampling methods (see Emmanuel Flachaire's course):

1. Generate a sample {(x₁⁽ᵇ⁾, y₁⁽ᵇ⁾), ..., (x_n⁽ᵇ⁾, y_n⁽ᵇ⁾)} from {(x₁, y₁), ..., (x_n, y_n)}
2. Estimate β_τ⁽ᵇ⁾ using β̂_τ⁽ᵇ⁾ = arg min_β Σᵢ ρ_τ(yᵢ⁽ᵇ⁾ − xᵢ⁽ᵇ⁾⊤β)
3. Repeat the two previous steps B times

The bootstrap estimate of the variance of β̂_τ is then computed as:

Var̂*[β̂_τ] = (1/B) Σᵇ₌₁ᴮ (β̂_τ⁽ᵇ⁾ − β̂_τ)²
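The resampling recipe above can be sketched in its simplest setting, the unconditional sample quantile (τ, n, and B are arbitrary choices; in the regression case, step 2 would refit Eq. (5) on each bootstrap draw):

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau, B = 5000, 0.25, 500
y = rng.standard_normal(n)

q_hat = np.quantile(y, tau)

# Resample the observations with replacement, re-estimate the quantile,
# and average the squared deviations from the full-sample estimate.
q_boot = np.array([np.quantile(rng.choice(y, size=n, replace=True), tau)
                   for _ in range(B)])
var_boot = np.mean((q_boot - q_hat) ** 2)
```

For N(0,1) data, var_boot should be of the same order as the asymptotic value τ(1 − τ)/(n f(Q_τ)²), here roughly 3.7e-4.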


2.3 Example: birth data


Birth data

Let us consider an example provided in Koenker (2005), about the impact of demographic characteristics and maternal behaviour on the birth weight of infants born in the US.

Let us use the Natality Data from the National Vital Statistics System, freely available on the NBER website (https://data.nber.org/data/vital-statistics-natality-data.html). The raw data for 2018 concern more than 3M babies born in that year, from American mothers aged between 18 and 50.

For the sake of the exercise, we only use a sample of 20,000 individuals.


Variables kept

The predictors used are:

• education (less than high school (ref), high school, some college, college graduate)

• prenatal medical care (no prenatal visit, first prenatal visit in the first trimester of pregnancy (ref), first visit in the second trimester, first visit in the last trimester)

• the sex of the infant

• the marital status of the mother

• the ethnicity of the mother

• the age of the mother

• whether the mother smokes

• the number of cigarettes per day

• the mother's weight gain

Let us focus only on a couple of them.


Age

Figure: birth weight (in grams) as a function of the age of the mother (years). Panel A: scatter plot with the fitted quantile regression line for τ = 0.1. Panel B: slope of the quantile regression for probability levels τ = 0.1, ..., 0.9.


Age

• If the slopes are not parallel, and the observed differences are significant, then the effect of age on the baby's weight differs depending on where the baby is in the weight distribution.

• At the lower part of the distribution of weights (τ = 0.1, relatively light babies): the effect of age on weight is negative but not significant.

• At the upper part of the distribution of weights (τ = 0.9, relatively heavy babies): the effect of age on weight is positive and significant.


Age: nonlinear effect?

Figure 13: Birth weight (in grams) as a function of the age of the mother (years): non-linear quantile regression curves for τ ∈ {0.1, 0.25, 0.5, 0.75, 0.9}.


Age: nonlinear effect?

The influence of the mother's age on birth weight may be thought to be non-linear.

For heavy babies (and also for light babies), the predicted weight is relatively lower at the borders of the mother's age distribution.


Cigarettes

Figure 14: Birth weight (in grams) depending on whether the mother smokes. Panel A: distribution of birth weight for non-smoking and smoking mothers. Panel B: slope (smoking coefficient) of the quantile regression for probability levels τ = 0.1, ..., 0.9.


Cigarettes

• Smoking during pregnancy is associated with a decrease in birth weight.

• For relatively light babies, the slope coefficient is markedly lower than for relatively heavy babies:
  - the fact that the mother was smoking prior to pregnancy has a significant impact on the newborn's weight, especially for the relatively lighter babies.


References I

Alvaredo, F., Chancel, L., Piketty, T., Saez, E., and Zucman, G. (2018). The elephant curve of global inequality and growth. AEA Papers and Proceedings, 108:103-08.

Davino, C., Furno, M., and Vistocco, D. (2014). Quantile Regression. John Wiley & Sons, Ltd.

Koenker, R. (2005). Quantile Regression. Number 38. Cambridge University Press.

Koenker, R. and Bassett Jr, G. (1978). Regression quantiles. Econometrica: Journal of the Econometric Society, pages 33-50.

Koenker, R. and Bassett Jr, G. (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica: Journal of the Econometric Society, pages 43-61.

Powell, J. L. (1991). Estimation of monotonic regression models under quantile restrictions. Nonparametric and Semiparametric Methods in Econometrics, pages 357-384.
