Research in Applied Econometrics
Chapter 3. Contingent Valuation Econometrics
Pr. Philippe Polomé, Université Lumière Lyon 2
M1 APE Analyse des Politiques Économiques / M1 RISE Gouvernance des Risques Environnementaux
2018 – 2019
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML: Logit & Probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Purpose
- Operationalize the theoretical notions of value
- In practice, it is impossible to measure every individual's benefit
  - We resort to statistical techniques
  - Representative samples with control variables enable inference to the population
- Individual benefits are never identified
  - Econometric techniques are thus essential
- We'll need the packages
  - DCchoice
  - Ecdat, stats (should be there already)
Classification of Valuation Techniques
- Based on stated preferences
  - Contingent Valuation ("Évaluation Contingente")
  - Choice experiments / contingent choices
- Based on revealed preferences
  - Transport cost: estimating demand for transport
  - Hedonic prices: estimating demand for housing
- Based on (inferred) prices
- Others, not based on preferences
Stated Preferences Techniques
- A sample of people is surveyed directly on their preferences about a public project
  - to infer a measure of individual statistical value
  - at the population level
- Interviews can take many forms: telephone, postal mail, e-mail, website
  - preferably face-to-face, but that is more expensive
  - or combinations
- The sample depends on the objective
Contingent Valuation
- One potential environmental change z0 → z1 is described
  - together with its stated cost
  - The context of that cost matters: taxes, fees, prices...
- A single question is asked: for or against the change
  - The question is sometimes repeated with another cost
  - At this stage, we model only the first question
Questionnaire Structure: wide - precise - wide
- Opening questions
  - Possible filters to select certain respondents
- General questions on the environment, leading to the particular case of interest
  - while making the respondent think about it
  - We want informed, considered answers
- Evaluation question
- Debriefing
  - Why did the respondent answer as he or she did?
  - Did he or she not believe the scenario?
- Collect data on specific potential explanatory variables
  - e.g. in a survey on lake quality: what use does the respondent make of the lake?
  - Tourism, recreation (boating, fishing...)
- Socio-economic data
  - primarily for inference to the population
The Dichotomous Format
- This is the most popular, least controversial "elicitation" format
  - assumed least prone to untruthful answers
  - while not too demanding for the respondent
- Consider an environmental change from z0 to z1
  - To simplify, consider only an improvement
  - Respondents are proposed a "bid" b
  - They answer yes or no
  - but may also state that they don't know, or refuse to answer
- This is similar to a "posted price" market context
  - There is a good (the environmental improvement)
  - The situation is a bit like asking whether to buy it
  - Respondents are routinely in such situations
  - except not in a public-good context
- Further, "buying" cannot really be that, so we need a context; we discuss that later
The Dichotomous Format
- Formalizing, let the indirect utility be v(z, y) + ε
  - It represents the individual's preferences
  - from the point of view of the researcher
  - The error term ε reflects multiple influences the researcher does not know about
  - These influences are modeled as a random variable, but that does not mean people act randomly
  - This is called a Random Utility Model (RUM)
- If the answer is "Yes", then it must be that

  v(z1, y − b) + ε1 ≥ v(z0, y) + ε0

  and thus that WTP > b
WTP Distribution
- 4 to 6 (different) bids are proposed to different respondents
  - Each respondent ever sees only one bid
  - Consider the proportion of "Yes" for each of these bids
- Assume that
  - for a bid of 0, the proportion of Yes is 100%
  - for some high bid the proportion would be zero
  - respondents have a single form of their utility function
  - but differ according to observable data X and unobservables ε
- "Connect the dots" as an estimate of the WTP distribution
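The "connect the dots" idea can be sketched in a few lines: compute the proportion of Yes at each bid level, then interpolate linearly between bids. All data below are made up for illustration.

```python
# Sketch (hypothetical data): the "connect-the-dots" estimate of the WTP
# survival curve from the proportion of Yes answers at each bid level.
bids    = [0, 6, 12, 24, 48]           # bid amounts proposed (0 = free)
answers = {                            # hypothetical yes(1)/no(0) answers per bid
    0:  [1]*20,                        # everyone accepts a free improvement
    6:  [1]*15 + [0]*5,
    12: [1]*11 + [0]*9,
    24: [1]*6  + [0]*14,
    48: [1]*2  + [0]*18,
}

# Proportion of "Yes" at each bid: an estimate of Pr{WTP > b}
prop_yes = {b: sum(a) / len(a) for b, a in answers.items()}

def survival(b):
    """Linear interpolation ("connect the dots") between observed bids."""
    for b0, b1 in zip(bids, bids[1:]):
        if b0 <= b <= b1:
            p0, p1 = prop_yes[b0], prop_yes[b1]
            return p0 + (p1 - p0) * (b - b0) / (b1 - b0)
    return 0.0  # beyond the highest bid

print(prop_yes[6])   # 0.75
print(survival(9))   # halfway between 0.75 and 0.55
```

The interpolated curve is only a crude estimate; the parametric models below replace it with a smooth distribution.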
WTP Distribution
- Going back to the indirect utility function: the answer is "Yes" when

  v(z1, y − b) + ε1 ≥ v(z0, y) + ε0

- In other words,

  Pr{Yes|b} = Pr{v(z1, y − b) + ε1 ≥ v(z0, y) + ε0}
            = Pr{ε0 − ε1 ≤ v(z1, y − b) − v(z0, y)}
            = Pr{ε0 − ε1 ≤ g(b, y, ...)}

  where g() has some properties inherited from v(.), see later
- If we make a hypothesis on the distribution of ε0 − ε1,
  - we have a model that can be estimated by Maximum Likelihood
  - Logistic: Logit
  - Normal: Probit
Maximum Likelihood Estimation
Density
- f(y|θ): probability density function (pdf) of a random variable y
  - conditioned on a set of parameters θ
  - It represents mathematically the data generating process of each observation in a sample of data
- The joint density of n independent and identically distributed (iid) observations
  - is the product of the individual densities:

  f(y1, ..., yn|θ) = ∏_{i=1}^{n} f(yi|θ) = L(θ|y)

- This joint density is the likelihood function
  - a function of the unknown parameter vector θ
  - y is used to indicate the collection of sample data

Likelihood Function
- Intuitively, this is much the same as a joint probability
  - Consider two (6-sided) dice
  - What is the probability of rolling a 3 and a 6?
- The likelihood function extends the idea of the probability of the sample
  - except that, for continuous variables, points have probability mass zero
Conditional Likelihood
- Generalize the likelihood function to allow the density to depend on conditioning variables: f(yi|xi, θ)
  - Take the classical LRM yi = xi β + εi
  - Suppose ε is normally distributed with mean 0 and variance σ²: ε ∼ N(0, σ²)
  - Then yi ∼ N(xi β, σ²)
  - Thus the yi are not identically distributed: they have different means
  - But they are independent, so that (yi − xi β)/σ ∼ N(0, 1)
  - thus

  L(θ|y, X) = ∏_i f(yi|xi, θ) = ∏_i (2πσ²)^(−1/2) exp(−(yi − xi β)² / (2σ²))
Conditional log-Likelihood
Usually it is simpler to work with the log:

  ln L(θ|y) = ∑_{i=1}^{n} ln f(yi|θ)

thus

  ln L(θ|y, X) = ∑_i ln f(yi|xi, θ) = −(1/2) ∑_{i=1}^{n} [ln σ² + ln(2π) + (yi − xi β)²/σ²]

where X is the n × K matrix of data with ith row equal to xi

Identification
- Now that we have ln L, how do we use it to obtain estimates of the parameters θ?
  - and to test hypotheses about them?
- There is the preliminary issue of identification
  - whether estimation of the parameters is possible at all
  - This is about the formulation of the model
  - The question is: suppose we had an infinitely large sample, could we uniquely determine the values of θ from it?
  - The answer is sometimes no
  - e.g. in the LRM yi = xi β + εi when there is multicollinearity
Example: Identification via Normalization
Consider the LRM yi = β1 + β2 xi + εi, where εi|xi ∼ N(0, σ²).
- e.g. a consumer's purchase of a large commodity such as a car, where
  - xi is the consumer's income
  - yi is the difference between what the consumer is willing to pay for the car, pi*, and the price of the car, pi
- Suppose that rather than observing pi* or pi,
  - we observe only whether the consumer actually purchases the car
  - Assume this occurs when yi = pi* − pi > 0
- The model states that the consumer purchases the car if yi > 0
  - and does not purchase otherwise
- The random variable in this model is "purchase" or "not purchase"
  - there are only 2 outcomes
Example: Identification via Normalization
The probability of a purchase is

  Pr{purchase|β1, β2, σ, xi} = Pr{yi > 0}
                             = Pr{β1 + β2 xi + εi > 0}
                             = Pr{εi > −β1 − β2 xi}
                             = Pr{εi/σ > (−β1 − β2 xi)/σ}
                             = Pr{zi > (−β1 − β2 xi)/σ}

where zi has a standard normal distribution.
The probability of not purchasing is one minus this probability.
Example: Identification via Normalization (cont.)
Thus the likelihood function is

  ∏_{i=purchase} Pr{purchase|β1, β2, σ, xi} × ∏_{i=not purchase} [1 − Pr{purchase|β1, β2, σ, xi}]

This is often rewritten as

  ∏_i [Pr{purchase|β1, β2, σ, xi}]^{yi} [1 − Pr{purchase|β1, β2, σ, xi}]^{(1−yi)}

The parameters of this model are not identified:
- If β1, β2 and σ are all multiplied by the same nonzero constant,
  - then Pr{purchase} and the likelihood function do not change
- This model requires a normalization
  - The one usually used is σ = 1.
Maximum Likelihood Estimation Principle
- We see that with a discrete random variable,
  - f(yi|θ) is the probability of observing yi conditionally on θ
  - The likelihood function is then the probability of observing the sample Y conditionally on θ
- Assume that the sample we have observed is the most likely one
  - What value of θ makes the observed sample most likely?
  - Answer: the value of θ that maximizes the likelihood function,
  - since then the observed sample has maximum probability
- When y is a continuous random variable, instead of a discrete one,
  - we can no longer say that f(yi|θ) is the probability of observing yi conditionally on θ,
  - but we retain the same principle.
Maximum Likelihood Estimation Principle
- The value of the parameter vector that maximizes L(θ|data) is the maximum likelihood estimate θ̂
  - The vector that maximizes L(θ|data) is the same as the one that maximizes ln L(θ|data)
- The necessary condition for maximizing ln L(θ|data) is ∂ln L(θ|data)/∂θ = 0
  - These are called the likelihood equations

Example: assume a sample Y from an N(μ, σ²). The ln L function is

  ln L(μ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) ∑_{i=1}^{n} (yi − μ)²

- The likelihood equations are
  - ∂ln L/∂μ = (1/σ²) ∑_{i=1}^{n} (yi − μ) = 0 and
  - ∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑_{i=1}^{n} (yi − μ)² = 0
- These equations admit an explicit solution:
  - μ̂_ML = (1/n) ∑_{i=1}^{n} yi = ȳ and
  - σ̂²_ML = (1/n) ∑_{i=1}^{n} (yi − ȳ)²
- Thus the sample mean is the ML estimator
  - The ML estimator of the variance is not the OLS estimator (which has an n−1 denominator)
  - In small samples, this estimator is biased, but as n → ∞ that does not matter
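The closed-form solutions above are easy to check numerically; a sketch with made-up data, comparing the log-likelihood at the ML solution against nearby parameter values:

```python
import math

# Hypothetical sample: check the closed-form ML estimators for N(mu, sigma^2)
y = [2.0, 3.5, 1.0, 4.5, 3.0, 2.5, 4.0, 3.5]
n = len(y)

mu_hat = sum(y) / n                                # ML estimate = sample mean
s2_hat = sum((yi - mu_hat) ** 2 for yi in y) / n   # ML variance: n denominator

def lnL(mu, s2):
    """Normal log-likelihood, as in the formula above."""
    return (-n / 2 * math.log(2 * math.pi) - n / 2 * math.log(s2)
            - sum((yi - mu) ** 2 for yi in y) / (2 * s2))

# The ML solution should beat any nearby parameter values
assert lnL(mu_hat, s2_hat) >= lnL(mu_hat + 0.1, s2_hat)
assert lnL(mu_hat, s2_hat) >= lnL(mu_hat, s2_hat * 1.1)
print(mu_hat, s2_hat)  # 3.0 1.125
```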
ML Properties 1
- Conditionally on correct distributional assumptions,
  - and under regularity conditions,
  - ML has very good properties
  - in a sense, because the information supplied to the estimator is very good: not only the sample but also the full distribution
- Notation: θ̂ is the ML estimator
  - θ0 is the true value of the parameter vector
  - θ is any other value
- ML has only asymptotic properties
  - in small samples, it may be biased or inefficient
ML Properties 2
- Consistency: plim θ̂ = θ0
- Asymptotic normality: θ̂ ∼ N[θ0, {I(θ0)}⁻¹]
  - where I(θ0) = −E[∂²ln L/∂θ0 ∂θ0′] is the information matrix
  - ∂f/∂θ0 indicates ∂f/∂θ evaluated at θ0
- Asymptotic efficiency: θ̂ is asymptotically efficient
  - It achieves the Cramér-Rao lower bound for consistent estimators
- Invariance: the ML estimator of γ0 = c(θ0) is c(θ̂)
  - if c(θ) is a continuous and continuously differentiable function.
Application of ML: Logit & Probit
Specification ("how the probability is written")
- Let a class of non-linear models with dichotomous responses:

  Pr(y = 1|X) = G(Xβ)

  where G is a function taking values between zero and one: 0 ≤ G(Xβ) ≤ 1
  - This guarantees that estimated response probabilities lie between zero and one
  - That is not the case with the LRM
- Therefore, there is a non-linear relation between y and X
- Many functions could do this job
  - Two are popular: the logistic and the normal
Logit & Probit
- In the Logit model, G is the distribution function (cumulative density) of the standard logistic r.v.:

  G(Xβ) = exp(Xβ) / [1 + exp(Xβ)] = Λ(Xβ)

- In the Probit model, G is the distribution function of the standard normal r.v., whose density is noted φ(.):

  G(Xβ) = ∫_{−∞}^{Xβ} φ(t) dt = Φ(Xβ)   with   φ(t) = (2π)^{−1/2} exp(−t²/2)
Logit vs. Probit
- The logistic and normal distributions are similar
- The logistic makes computations and analysis easier and allows for simplifications in more advanced models
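The similarity of the two link functions can be seen numerically; a sketch where Φ is computed from the error function, and the logistic index is rescaled by a rule-of-thumb factor (values between roughly 1.5 and 1.8 are quoted in the literature; 1.6 is used here):

```python
import math

def logistic_cdf(x):
    """Standard logistic distribution function Λ(x)."""
    return math.exp(x) / (1.0 + math.exp(x))

def normal_cdf(x):
    """Standard normal distribution function Φ(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Compare the two curves over a grid; rescaling the logistic index by
# about 1.6 makes it track the normal closely
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, round(normal_cdf(x), 3), round(logistic_cdf(1.6 * x), 3))
```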
Latent Variable Model
- Let y* be a latent variable (that is, not directly observed) such that

  y* = Xβ + ε

  - e.g. y* is the utility from an environmental improvement Δz
- Logit & probit may be derived from a latent variable model
  - satisfying all the classical LRM hypotheses
- Utility is not observed; we only observe the consequence of the individual decision:

  y* < 0  ⟹  y = 0
  y* ≥ 0  ⟹  y = 1

- We observe whether the person is (y = 1) or is not (y = 0) willing to pay an amount b to "buy" an environmental improvement z1 − z0 = Δz
Response Probability
- Hypotheses on ε:
  - independent of X
  - standard logistic or standard normal
- Response probability for y:

  Pr{y = 1|X} = Pr{y* ≥ 0|X} = Pr{ε > −Xβ|X} = 1 − G(−Xβ) = G(Xβ)

- Since ε is normal or logistic, it is symmetric around zero,
  - thus 1 − G(−Xβ) = G(Xβ)

Maximum Likelihood Estimation
- As indicated earlier, the likelihood function for the dichotomous case is

  ∏_i [Pr{willing|β, σ, Xi}]^{yi} [1 − Pr{willing|β, σ, Xi}]^{(1−yi)}

- ML seeks the maximum of (the log of) this function
  - It does not have an explicit solution
  - but yields numerical estimates β̂_ML
  - consistent but biased in small samples
  - asymptotically efficient & normal
  - as long as the model hypotheses are true
- So, if you used Probit: is ε really normal?
  - If the distribution hypothesis is not true, sometimes we may retain the properties
  - On the other hand, endogeneity of X is as serious as usual
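As a sketch of what the numerical maximization does, here is a logit fit by Newton-Raphson on hypothetical simulated data (one regressor plus an intercept; glm does essentially this, with a more refined algorithm):

```python
import math, random

random.seed(1)

# Simulate hypothetical logit data: Pr{y=1|x} = Λ(a + b*x) with a=0.5, b=-1
a_true, b_true = 0.5, -1.0
x = [random.uniform(-2, 2) for _ in range(2000)]
lam = lambda t: 1.0 / (1.0 + math.exp(-t))
y = [1 if random.random() < lam(a_true + b_true * xi) else 0 for xi in x]

# Newton-Raphson on the logit log-likelihood: score = X'(y - p),
# observed information = X'WX with weights w = p(1 - p)
a, b = 0.0, 0.0
for _ in range(25):
    p = [lam(a + b * xi) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))                 # score wrt a
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))   # score wrt b
    w = [pi * (1 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    a += ( h11 * g0 - h01 * g1) / det    # Newton step: solve (X'WX) d = score
    b += (-h01 * g0 + h00 * g1) / det

print(round(a, 2), round(b, 2))  # should be close to 0.5 and -1
```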
Marginal Effect of a Continuous Regressor xj
- The effect of a marginal change in xj
  - on the response probability Pr{y = 1|X} = p(X)
  - is given by the partial derivative

  ∂p(X)/∂xj = ∂G(Xβ)/∂xj = g(Xβ) βj

- This is the marginal effect of xj
  - It depends on the values taken by all the regressors (not just xj)
  - Compare to the LRM: ∂y/∂xj = βj
  - It cannot bring the probability below zero or above one
Marginal Effect of a Continuous Regressor xj
- Thus, the marginal effect is a non-linear combination of the regressors
- It can be calculated at "interesting" points of X
  - e.g. X̄, the sample average point: ∂p/∂xj evaluated at X̄
  - However, that does not mean much for discrete regressors, e.g. gender
- Or it can be calculated for each i in the sample, ∂p/∂xj at Xi,
  - and then we can compute the average of these "individual" marginal effects
- In general, the two are not the same: the average of the individual effects differs from the effect at the average X̄
- Which one do we choose?
  - Often, that is too complicated for presentation
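The two summaries rarely coincide; a sketch with hypothetical logit coefficients and a small made-up sample of x values:

```python
import math

# Hypothetical logit model Pr{y=1|x} = Λ(b0 + b1*x) and a small sample of x
b0, b1 = -1.0, 2.0
xs = [0.0, 0.5, 1.0, 1.5, 2.0]

lam  = lambda t: 1.0 / (1.0 + math.exp(-t))
dens = lambda t: lam(t) * (1.0 - lam(t))     # logistic density g = Λ(1 − Λ)

# Marginal effect evaluated at the sample mean of x
x_bar = sum(xs) / len(xs)
me_at_mean = dens(b0 + b1 * x_bar) * b1

# Average of the individual marginal effects
avg_me = sum(dens(b0 + b1 * xi) * b1 for xi in xs) / len(xs)

print(me_at_mean, avg_me)   # close, but not equal
```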
Marginal Effect of a Discrete Regressor xj
- Effect of a change in a discrete xj
  - from a to b (often, from 0 to 1)
  - on the response probability Pr{y = 1|X} = p(X)
  - Write X−j for the set of all the regressors but xj, and similarly β−j
  - The discrete change in p̂(Xi) is

  Δp̂(Xi) = G(X−j,i β̂−j + b β̂j) − G(X−j,i β̂−j + a β̂j)

- Such a discrete effect differs from individual to individual
  - Even the sign could in principle differ
- In R, such effects are not calculated automatically
  - The above formula must be calculated explicitly

Compare Logit & Probit
- The sizes of the coefficients are not comparable between these models
  - Approximately, multiplying a probit coefficient by 1.5 yields the logit coefficient (rule of thumb!)
- The marginal effects should be approximately the same
Measures of the Quality of Fit
- The correctly predicted percentage may be appealing
  - For each i, compute the fitted probability that yi takes value 1: G(Xi β̂)
  - If ≥ .5 we "predict" yi = 1, and zero otherwise
  - Compute the % of correct predictions
- Problem: it is possible to see a high % correctly predicted while the model is not very useful
  - e.g. in a sample of 200 with 180 obs. of yi = 0, of which 150 are predicted zero, and 20 obs. of yi = 1 all predicted zero
  - The model is clearly poor
  - but we still have 75% correct predictions
  - while a flat prediction of 0 has 90% correct predictions
- A better measure is a 2×2 table as in the next slide
Goodness of Fit: Predictive Table

              Observed
              yi = 1   yi = 0   Total
Predicted
  ŷi = 1        350      122     472
  ŷi = 0         78      203     281
  Total         428      325     753

We'll see the R command later
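Such a table is just a cross-tabulation of predicted against observed outcomes; a minimal sketch with hypothetical fitted probabilities:

```python
# Hypothetical fitted probabilities and observed outcomes
p_hat = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7, 0.4, 0.1]
y_obs = [1,   1,   0,   1,   0,   0,   0,   1]

pred = [1 if p >= 0.5 else 0 for p in p_hat]   # classify at the 0.5 cutoff

# 2x2 predictive table: table[predicted][observed]
table = [[0, 0], [0, 0]]
for yp, yo in zip(pred, y_obs):
    table[yp][yo] += 1

correct = table[0][0] + table[1][1]
print(table, correct / len(y_obs))  # [[3, 1], [1, 3]] 0.75
```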
Goodness of Fit: Pseudo R-square
- Pseudo-R² = 1 − ln L_UR / ln L_0
  - ln L_UR: log-likelihood of the estimated model
  - ln L_0: log-likelihood of a model with only the intercept
  - i.e. forcing all β = 0 except for the intercept
- Similar to an R² for OLS regression
  - since R² = 1 − SSR_UR / SSR_0
- Other measures of the quality of fit exist
  - but the fit is not what maximum likelihood is seeking,
  - contrarily to least squares
  - I stress more the statistical significance of regressors
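The computation itself is one line (the log-likelihood values below are made up):

```python
# Hypothetical log-likelihoods: the fitted model and the intercept-only model
lnL_UR = -120.0   # estimated model
lnL_0  = -150.0   # intercept-only model

pseudo_r2 = 1.0 - lnL_UR / lnL_0
print(pseudo_r2)  # 0.2
```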
Outline
Principles of Contingent Valuation Maximum Likelihood Estimation Application of ML : Logit & probit
Single-bounded CV EstimationApplication
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Utility
- Come back to economic value
  - The Compensating Variation for a change from z0 to z1 is

  V(z1, y − CV) = V(z0, y)

  - CV is interpreted as the WTP to secure an improvement Δz = z1 − z0
- We observe Yes answers to a bid b when b < WTP
- Assuming a RUM, then

  Pr{Yes|b} = Pr{ε0 − ε1 ≤ v(z1, y − b) − v(z0, y)}

Linear Utility
- Suppose V(zj, y) = αj + βy,  j = 0, 1
  - β is the marginal utility of income
  - In principle, we would like it to decrease with income, but simplify
- The WTP is such that

  α0 + βy + ε0 = α1 + β(y − WTP) + ε1

  - Solving: WTP = (α + ε)/β, with α = α1 − α0 and ε = ε1 − ε0
  - So that e.g. E(WTP) = α/β
- The probability of Yes to bid b is then

  Pr{Yes|b} = Pr{v(z1, y − b) + ε1 − v(z0, y) − ε0 > 0} = Pr{ε ≥ βb − α}

  - This is a simple probit/logit context
  - Utility V is not identified, only the difference
Probit/Logit
- If ε ∼ N(0, 1) standard normal, then

  Pr{Yes|b} = 1 − Φ(βb − α) = Φ(α − βb)

- If ε is the standard logistic, then

  Pr{Yes|b} = 1 / [1 + exp(βb − α)]

- If we assume that WTP ≥ 0, then we can derive similarly:
  - with the log-normal specification: Pr{Yes|b} = Φ(α − β ln(b))
  - with the log-logistic specification: Pr{Yes|b} = 1 / [1 + exp(β ln(b) − α)]
- These last two are still the Probit and Logit models, respectively,
  - but the ln of the bid is used instead of the bid itself
  - and that is still compatible with the RUM
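The four response-probability curves can be sketched directly (α and β values are made up; Φ is computed from the error function):

```python
import math

alpha, beta = 2.0, 0.05   # hypothetical utility-difference parameters

Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def p_probit(b):       return Phi(alpha - beta * b)
def p_logit(b):        return 1.0 / (1.0 + math.exp(beta * b - alpha))
def p_lognormal(b):    return Phi(alpha - beta * math.log(b))
def p_loglogistic(b):  return 1.0 / (1.0 + math.exp(beta * math.log(b) - alpha))

# All four give Pr{Yes|b} decreasing in the bid b
for b in [5, 20, 40]:
    print(b, round(p_probit(b), 3), round(p_logit(b), 3),
          round(p_lognormal(b), 3), round(p_loglogistic(b), 3))
```

Note that at b = α/β the probit and logit probabilities both equal 1/2, which is the median WTP discussed later.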
Estimation
- Using these expressions, it is easy to derive the log-likelihood function
  - as in the previous section
- Usually, we want to account for individual characteristics Xi
  - those that are collected in the survey
  - This is done through αi = ∑_{k=0}^{K} γk xki
  - with x0i = 1 ∀i for an intercept
- Parametric estimation
  - glm function (in the core stats package)
  - DCchoice package
  - check or library( ) it
- Example with the NaturalPark data
  - of the Ecdat package (Croissant 2014)
Application
NaturalPark Data
- From the Ecdat package (Croissant 2014)
  - consists of WTP and other socio-demographic variables
  - for 312 individuals, regarding the preservation of the Alentejo natural park in Portugal
- See the help for NaturalPark; execute summary(NaturalPark)
  - 7 variables
  - bid1 is the first bid: min 6, max 48
  - bidh & bidl are second bids that we do not look into for the moment
  - answers is a factor {yy, yn, ny, nn} of the answers to the 2 bids
  - We'll use only the first letter, y for Yes
  - Socio-demographics are age, sex, income
Data Transformation
- Rename NP <- NaturalPark for simplicity
- Extract the answer to the first bid from "answers":
  NP$R1 <- ifelse(NP$answers == "yy" | NP$answers == "yn", 1, 0)
  - What does that do?
  - A call to the function ifelse( ) takes the form ifelse(condition, true, false)
  - It returns true where condition is true, and false otherwise
  - The vertical bar | is an "or" operator in a logical expression
  - The prefix NP$ makes it possible to access each variable contained in NP directly
- Convert bid1 to its log (English log is French ln):
  NP$Lbid1 <- log(NP$bid1)
  - summary(NP) reveals that things have gone smoothly
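The recoding step is just a vectorized condition; the same logic in Python, with a hypothetical vector of answer strings:

```python
# Hypothetical vector of answers to the two bids, as in NaturalPark$answers
answers = ["yy", "yn", "ny", "nn", "yy", "ny"]

# R1 = 1 if the FIRST answer was Yes ("yy" or "yn"), else 0
# (mirrors: NP$R1 <- ifelse(NP$answers == "yy" | NP$answers == "yn", 1, 0))
R1 = [1 if a in ("yy", "yn") else 0 for a in answers]

print(R1)  # [1, 1, 0, 0, 1, 0]
```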
Estimation Using glm
- glm is a classical function for many models of the type G(Xβ)
  - Its use is much like lm,
  - but you have to specify the link function G using the option family =
  - This is fairly flexible, but a bit complicated; see RAE2017 for the specifications
- summary works on glm, as do several usual commands
  - The output is in part similar to lm
  - coefficient values next to variable names, with their significance
  - interpreted in a way similar to lm
  - A negative (significant) coefficient implies a negative impact on Pr{Yes|b}
  - Note: "sexfemale" indicates the effect for women
  - Also gives the ln L value at the optimum
- Such models have no explicit solutions; they are solved numerically by algorithms
  - Newton-type
  - iterating until a convergence condition is met
  - Risk of a local maximum with a poor starting point
Estimation Using DCchoice
- DCchoice is designed for such data
  - but not for other contexts
- The format for single-bounded is sbchoice(formula, data, dist = "log-logistic")
  - The default dist is log-logistic
  - This is in fact logistic, but the bid variable is interpreted in log
  - formula follows Response ~ X | bid
  - | bid is mandatory
- The output is much more directly relevant for valuation purposes
  - bid is always shown last
  - log(bid) if log-logistic or log-normal was selected
  - Measures of mean WTP: we will see them later
Goodness of Fit
- table(predict(fitted model, type = "probability") > .5, NP$R1)
  - This is a contingency table that counts the number of predicted Yes
  - predicted prob > .5 (returns TRUE or FALSE)
  - against the actual Yes/No
  - per individual, so with each individual's Xi (individual predictions)

SB.NP.DC.logit          0    1
FALSE (predicted 0)    85   38
TRUE  (predicted 1)    56  133
Plotting
- The sbchoice command produces an object that can be plotted directly
  - A direct plot of the object is the fitted probability of Yes over the bid range
  - probably conditional on average age, income, sex: the package isn't explicit
- Using a predict method helps to
  - observe what is outside the range
  - compare logit & probit fitted curves (see RAE2017.R)
  - In particular, the logit has slightly fatter tails, inducing a higher WTP
- To use predict,
  - create a matrix of new data
  - choose the proper type; here we want a probability, "response"

Logit vs. Probit Predict
- As can be seen, for the same data, the logit has slightly fatter tails than the probit
Computing Welfare Measures
Expected WTP
- Recall that when we assumed V(zj, y) = αj + βy, j = 0, 1,
  - then (max) WTP is s.t. α0 + βy + ε0 = α1 + β(y − WTP) + ε1
  - so WTP = (α + ε)/β, with α = α1 − α0 and ε = ε1 − ε0
- So WTP is a random variable
  - We can compute e.g. E(WTP) = α/β
- When αi = ∑_{k=0}^{K} γk xki,
  - then E(WTP) becomes individual,
  - so we have to think about how to aggregate individual expected WTPs
Other Measures of Welfare
- E(WTP) is the most obvious measure
  - However, the expectation is strongly influenced by the tail of the distribution G(.)
  - while we do not actually have data to fit it,
  - since there are not many bids,
  - and it does not feel very serious to ask a very high bid
Truncated Expectations
- For a non-negative WTP,

  E(WTP) = ∫₀^∞ Pr{Yes|b} db

- Historically, the highest bid has been used to truncate E(WTP):

  ∫₀^{b_max} Pr{Yes|b} db

  - However, that is not a proper expectation, since the support of Pr{Yes|b} does not stop at b_max
- An alternative uses the truncated distribution:

  ∫₀^{b_max} Pr{Yes|b} / (1 − Pr{Yes|b_max}) db
Median WTP
- Finally, the median has been suggested as a more robust measure: b_median s.t.

  Pr{Yes|b_median} = 1/2

  - i.e. the bid such that 50% would be favorable
Shape of the WTP Function
- Clearly, the form of the WTP function depends on the form of V(.)
  - For some forms, some values of β lead to impossibilities

  Distribution    Expected                              Median
  Normal          α/β                                   α/β
  Logistic        α/β                                   α/β
  Log-normal      exp(α/β) exp(1/(2β²))                 exp(α/β)
  Log-logistic    exp(α/β) Γ(1 − 1/β) Γ(1 + 1/β)        exp(α/β)

- Again, if αi = ∑_{k=0}^{K} γk xki, each of these forms is individual
  - So the question arises whether to compute a sample mean or a sample median
  - DCchoice appears to compute a sample mean, but is not explicit
How Do We Choose a Welfare Measure?
- That might be the time for a debate?
- 3 welfare measures: untruncated, properly truncated, median
  - Of course we would not select an infinite one
  - The smallest estimate, to avoid criticism?
  - Do these measures differ significantly from each other? We will see that later
- 4 well-known distributions
  - (log-)normal, (log-)logistic: they do not differ substantially
  - There are others
  - There are also other estimators, e.g. non-parametric
  - The estimate of the best-fit model?
- 2 aggregation rules: sample mean or sample median
  - Researchers usually take the first, but what is the meaning of a sample mean of individual medians?
- DCchoice does not provide any guidance
Computing the Welfare Measure
- DCchoice computes it automatically
  - With glm, use the above formulas
  - See the code in RAE.R
- Much as I like DCchoice, I must note that for the data we use (NaturalPark),
  - the mean WTP does not coincide with the median
  - for the symmetrical distributions (normal & logistic)
  - That is a problem; I should write to the authors
Confidence Intervals
- In the end, WTP, under any of its forms, is an estimate
  - As such, it has a confidence interval
  - Much as for β̂, you should always report the CI,
  - at a minimum to give an idea of the variance,
  - and to show whether it is significantly different from zero
- There are 2 main methods
  - Krinsky & Robb
  - Bootstrap
- These methods are much broader than valuation
  - They are useful in all types of research in applied econometrics
Constructing Confidence Intervals: Krinsky and Robb
- Start with the estimated vector of coefficients
  - By ML, it is distributed (multivariate) normally
  - Its variance-covariance matrix has been estimated in the ML process
- Draw D times from a multivariate normal distribution with
  - mean = the vector of estimated coefficients
  - variance = the estimated variance-covariance matrix of these coefficients
- So we have D vectors of estimated coefficients
  - If D is large, the average of these D vectors is just our original coefficient vector
- Compute the welfare measure for each such replicated coefficient vector
  - Thus we have D estimated welfare measures
  - some large, some small: an empirical distribution
  - Order them from smallest to largest
  - the 5% most extreme are deemed random
  - the 95% most central are deemed reasonable
  - and constitute the 95% confidence interval
- For example, the lower and upper bounds of the 95% confidence interval
  - correspond to the 0.025 and 0.975 percentiles of the measures, respectively
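The procedure can be sketched in a few lines. The coefficient estimates and their variance-covariance matrix below are made up; the welfare measure is E(WTP) = α/β, and the multivariate normal draws use a hand-rolled 2×2 Cholesky factor:

```python
import math, random

random.seed(42)

# Hypothetical ML output: coefficients (alpha, beta) and their 2x2 vcov
alpha_hat, beta_hat = 2.0, 0.05
vcov = [[0.04,   0.0005],
        [0.0005, 0.0001]]

# Cholesky factor of the 2x2 vcov: vcov = L L'
l11 = math.sqrt(vcov[0][0])
l21 = vcov[1][0] / l11
l22 = math.sqrt(vcov[1][1] - l21 ** 2)

D = 5000
wtps = []
for _ in range(D):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    a = alpha_hat + l11 * z1               # draw from the multivariate normal
    b = beta_hat + l21 * z1 + l22 * z2
    wtps.append(a / b)                     # welfare measure for this draw

wtps.sort()
ci_low, ci_high = wtps[int(0.025 * D)], wtps[int(0.975 * D)]
print(round(ci_low, 1), round(ci_high, 1))   # 95% CI around alpha/beta = 40
```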
Krinsky and Robb: Implementation in DCchoice
- Function krCI(.)
  - constructs CIs for the 4 different WTPs
  - estimated by functions sbchoice(.) or dbchoice(.)
- Call as krCI(obj, nsim = 1000, CI = 0.95)
  - obj is an object of either the "sbchoice" or "dbchoice" class
  - nsim is the number of draws from the multidimensional normal (influences machine time)
  - CI is the percentile of the confidence intervals to be estimated
  - Returns an object "krCI":
  - a table of the simulated confidence intervals
  - vectors containing the simulated WTPs
- Is there a package that does Krinsky & Robb for glm objects?
Constructing Confidence Intervals: Bootstrap
- Similar to Krinsky & Robb,
  - except in the way the new estimated coefficients are obtained
  - Essentially, instead of simulating new estimates,
  - we simulate new data
  - and then calculate new estimates
Mediocrity Principle
- Consider that our sample is mediocre in the population
  - We mean: it does not have anything exceptional
- Then, if we could draw a new sample from that population,
  - we would surely obtain a fairly mediocre sample,
  - that is, fairly similar to the original one
Bootstrap Principle
- It's not possible to draw a new sample
- But imagine that, using the original sample, we draw one observation,
  - and we "put it back" in the sample ("replacement"),
  - then we draw again,
  - and repeat until we have the same number of obs. as in the original sample
  - Call this a bootstrap sample
  - Each original obs. appears 0 to n times
- By the mediocrity principle,
  - the bootstrap sample is fairly close to a real new sample
  - Estimate a new vector of coefficients
  - Repeat D times
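A minimal sketch of resampling with replacement and a bootstrap CI, using the sample mean as a stand-in for the re-estimated welfare measure (data are made up):

```python
import random

random.seed(7)

# Original (hypothetical) sample
sample = [12.0, 15.5, 9.0, 22.0, 18.5, 11.0, 16.0, 14.5, 20.0, 13.0]
n = len(sample)

D = 2000
stats = []
for _ in range(D):
    # Bootstrap sample: n draws WITH replacement from the original sample
    boot = [random.choice(sample) for _ in range(n)]
    stats.append(sum(boot) / n)      # re-compute the statistic on it

stats.sort()
ci_low, ci_high = stats[int(0.025 * D)], stats[int(0.975 * D)]
print(round(ci_low, 2), round(ci_high, 2))
```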
Bootstrap: Implementation in DCchoice
- Function bootCI() carries out the bootstrapping
  - and returns the bootstrap confidence intervals
  - Call as bootCI(obj, nboot = 1000, CI = 0.95)
- Longer than K&R, since each sample must be generated
  - and then new estimates computed,
  - while K&R only simulates new estimates
- In the end, the results are similar
  - See RAER
- Note: another means would be a resample command
  - applicable to glm,
  - but I don't develop it here
Differences of Welfare Measures
- Sometimes we want to know whether a welfare measure is significantly different from another
  - In other words: is their difference significantly different from zero?
  - In terms of CI: does the CI of their difference include zero?
- To compute that: bootstrap similarly
- Krinsky and Robb is also possible, but
  - if the welfare measures are independent, their difference has a variance-covariance that is the sum of each variance-covariance
  - if they are not independent, then it's difficult
Double-bounded CV Estimation
Double-bounded CV Estimation
- To increase the amount of information collected by the survey,
  - the valuation question is asked a second time
  - If the answer is Yes (No), ask again with a higher (lower) bid
- Called double-bounded dichotomous choice,
  - or dichotomous choice with follow-up
- More precisely, the phrasing could be
  - If ΔZ cost you b €, would you be willing to pay it?
  - If answered Yes: would you be willing to pay bU €? (bU > b)
  - If answered No: would you be willing to pay bL €? (bL < b)
Double-bounded CV Estimation
- There are 4 outcomes per respondent: YY, YN, NY, NN
  - YY indicates that WTP > bU
  - YN that b < WTP < bU
  - NY that bL < WTP < b
  - NN that WTP < bL
- Thus the answers are intervals
  - Probit & logit do not suffice
  - Many use ML, but the likelihood function is different
Estimation with DBDC Data
- To develop the likelihood function, it is necessary to express the probabilities first
- Write P^YY as the probability of answering Yes, Yes; with G the cdf of WTP:
  - P^YY = Pr{bU < WTP} = 1 − G(bU)
  - P^YN = Pr{b < WTP < bU} = G(bU) − G(b)
  - P^NY = Pr{bL < WTP < b} = G(b) − G(bL)
  - P^NN = Pr{WTP < bL} = G(bL)
- For a sample of n observations,

  ln L = ∑_{n=1}^{N} [dn^YY ln Pn^YY + dn^YN ln Pn^YN + dn^NY ln Pn^NY + dn^NN ln Pn^NN]

  where n indexes individuals and dn^XX indicates whether n answered XX (dichotomous variable)
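A sketch of these four probabilities and the log-likelihood contribution of one observation, under the probit specification where the WTP cdf is G(b) = Φ(−α + βb) (parameters and bids are made up):

```python
import math

alpha, beta = 2.0, 0.05       # hypothetical parameters
b, bU, bL = 30.0, 60.0, 10.0  # first bid and follow-up bids

Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
G = lambda bid: Phi(-alpha + beta * bid)    # cdf of WTP: Pr{WTP < bid}

P_YY = 1.0 - G(bU)
P_YN = G(bU) - G(b)
P_NY = G(b) - G(bL)
P_NN = G(bL)

# The four probabilities partition the WTP support
assert abs(P_YY + P_YN + P_NY + P_NN - 1.0) < 1e-12

# Log-likelihood contribution of one respondent who answered Yes then No
d = {"YY": 0, "YN": 1, "NY": 0, "NN": 0}
lnL_i = (d["YY"] * math.log(P_YY) + d["YN"] * math.log(P_YN)
         + d["NY"] * math.log(P_NY) + d["NN"] * math.log(P_NN))
print(lnL_i)
```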
Estimation with DBDC Data
- There is no direct R command corresponding to such a likelihood
  - It must be programmed; this is called Full Information Maximum Likelihood (FIML)
  - We don't do this: it is pre-programmed in DCchoice,
  - for the same basic choices of distribution as for SBDC data
- Endogeneity issue
  - The 2nd bid is not exogenous, since it depends on the previous answer
  - Thus it contains unobserved characteristics of the individuals
  - Such unobservables also determine the 2nd choice
  - This is in principle addressed by FIML
- A more general model is the bivariate probit,
  - allowing the 2 answers to have less than perfect correlation
  - not covered by DCchoice
Estimation with DBDC Data: Standard Normal cdf
- P^YY = Pr{bU < WTP} = 1 − Φ(−α + β bU)
- P^YN = Pr{b < WTP < bU} = Φ(−α + β bU) − Φ(−α + β b)
- P^NY = Pr{bL < WTP < b} = Φ(−α + β b) − Φ(−α + β bL)
- P^NN = Pr{WTP < bL} = Φ(−α + β bL)
- So: we estimate the same coefficients α and β as in SBDC,
  - but with more data, so it is more efficient,
  - assuming people answer both valuation questions in the same way
- So: the computation of the welfare measures is the same
Application: Exxon-Valdez
Context
- 1989: about 35,000 tons of crude oil spilled at sea
  - It ended up spreading over about 26,000 km² at sea
  - and soiled 1,600 km of coastline
- Most of the damage was in Prince William Sound and the Gulf of Alaska, up to the Kodiak Islands
- Several types of damages
  - professional fishing (minimal)
  - tourism (possibly a benefit)
  - environmental heritage loss
  - punitive damages (supposed to be an incentive)
- Valuation survey
Exxon-Valdez Questionnaire
I Avoid willingness-to-accept Q
I For assumed strategic behaviour
I The basic Q is
I Compensation for the loss of an environmental heritage during 10 years?
I Scenario: after 10 years the environmental damage will be fully recovered
I Convert such a “WTA” into a WTP to avoid the catastrophe happening again for 10 years
I Scenario: in 10 years’ time, a similar catastrophe will be impossible thanks to double hulls
I The scenario need not be true
I Since we are investigating human preferences for things that may never happen
I However, it must appear credible to respondents
Reading the Exxon-Valdez Data
I data("CarsonDB")
I It is only a frequency table for the DBDC survey
I Thus w/o the X data
T1 TU TL yy yn ny nn
1 10 30 5 119 59 8 78
2 30 60 10 69 69 31 98
3 60 120 30 54 75 25 101
4 120 250 60 35 53 30 139
I So there are 6 distinct bids
I 5, 10, 30, 60, 120, 250
I There is always a large proportion of nn: in part protest answers
I The proportion of yy decreases w/ b
I The proportions of yn & of ny are roughly constant w/ b
I w/ about 2 to 3 times more yn than ny
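The pattern noted above can be checked directly on the frequency table (a Python sketch; the counts are copied from the table on the previous slide):

```python
# yy/yn/ny/nn counts by first bid, copied from the CarsonDB table
counts = {10: (119, 59, 8, 78), 30: (69, 69, 31, 98),
          60: (54, 75, 25, 101), 120: (35, 53, 30, 139)}
for bid, (yy, yn, ny, nn) in counts.items():
    total = yy + yn + ny + nn
    # share of yy falls with the bid; share of nn is always large
    print(bid, round(yy / total, 2), round(nn / total, 2))
```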
Converting the Exxon-Valdez Data
I We need individual-level data
I 119 + 59 + ... + 53 + 30 = 1043 (cfr nobs in RAE.R)
I Observations are created from that frequency table in 3 steps:
1. create a new data frame db.data, filled w/ 0
I to save the reconstructed individual observations and to prepare indexes
2. then organize the 3 columns containing the first bids (bid1), and the increased and decreased second bids (bid2U and bid2L)
3. then fill in the answers to each bid corresponding to the numbers in the frequency table
I Follow the detailed code in RAE2017.R
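The reconstruction is done in R in RAE2017.R; the same idea can be sketched in Python, expanding each cell of the frequency table into that many individual rows (column names R1, R2, bid1, bid2 mirror those used by the course):

```python
import pandas as pd

# The CarsonDB frequency table from the slide (first bid, follow-up bids, counts)
freq = pd.DataFrame({
    "bid1":  [10, 30, 60, 120],
    "bid2U": [30, 60, 120, 250],
    "bid2L": [5, 10, 30, 60],
    "yy": [119, 69, 54, 35],
    "yn": [59, 69, 75, 53],
    "ny": [8, 31, 25, 30],
    "nn": [78, 98, 101, 139],
})

# Expand each count into individual observations: R1/R2 are the two
# dichotomous answers, bid2 is the follow-up actually presented
# (increased after a Yes, decreased after a No)
rows = []
for _, r in freq.iterrows():
    for pat, n in (("yy", r.yy), ("yn", r.yn), ("ny", r.ny), ("nn", r.nn)):
        r1 = 1 if pat[0] == "y" else 0
        r2 = 1 if pat[1] == "y" else 0
        bid2 = r.bid2U if r1 else r.bid2L
        rows += [{"R1": r1, "R2": r2, "bid1": r.bid1, "bid2": bid2}] * n
db = pd.DataFrame(rows)
print(len(db))  # 1043, as on the slide
```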
Estimating WTP with DBDC data
I dbchoice(formula, data, dist = "log-logistic", par = NULL)
I Usage similar to sbchoice
I Except for formula & par
I formula: R1 + R2 ~ var | bid1 + bid2
I R1 + R2: the 4 response outcomes in 2 dichotomous variables
I bid1 + bid2: the 2 bids
I var: any number of covariates
I Unfortunately, we do not have any
I par: starting values, may be NULL
I There is no guarantee that the likelihood is unimodal
I Optimization may not converge
I Different starting values may lead to different optima
I Take the one w/ the higher likelihood
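The multiple-starting-values advice can be illustrated outside R: maximize a DBDC log-likelihood from several starting points and keep the fit with the higher likelihood. A Python sketch on made-up toy data (dbchoice does the real work on the actual data):

```python
import numpy as np
from scipy.optimize import minimize

# Toy DBDC data (made up) purely to illustrate multiple starting values
b    = np.array([30.0, 30.0, 60.0, 60.0, 120.0, 120.0])
bU   = np.array([60.0, 60.0, 120.0, 120.0, 250.0, 250.0])
bL   = np.array([10.0, 10.0, 30.0, 30.0, 60.0, 60.0])
d_yy = np.array([1, 0, 0, 0, 0, 0])
d_yn = np.array([0, 1, 1, 0, 0, 0])
d_ny = np.array([0, 0, 0, 1, 1, 0])
d_nn = np.array([0, 0, 0, 0, 0, 1])

def negll(theta):
    alpha, beta = theta
    G = lambda x: 1.0 / (1.0 + np.exp(alpha - beta * x))  # Pr(WTP < x)
    p = (d_yy * (1.0 - G(bU)) + d_yn * (G(bU) - G(b))
         + d_ny * (G(b) - G(bL)) + d_nn * G(bL))
    return -np.sum(np.log(np.clip(p, 1e-300, 1.0)))  # clip guards bad regions

# Optimize from two starting values and keep the higher likelihood
# (i.e. the lower negative log-likelihood)
fits = [minimize(negll, sv, method="Nelder-Mead")
        for sv in ([0.0, 0.01], [2.0, 0.05])]
best = min(fits, key=lambda f: f.fun)
```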
Estimating WTP with DBDC data
I See the code in RAE.R
I Estimation goes smoothly
I See “convergence TRUE”
I When this is not the case: the estimated values have no meaning
I You need to specify par = a vector
I Cfr the slide “Iterations and Convergence”
I Coef of bid is neg. & highly signif. in all 4 models
I The same 4 measures of welfare are given
I E(WTP) → ∞ in the log-log model when |β| < 1
I The measure is for a one-time payment for 10 years
I As it turns out, the median is always more conservative (for probit-type models)
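The E(WTP) → ∞ statement can be seen numerically: for WTP ≥ 0, E(WTP) = ∫₀^∞ (1 − G(b)) db, and the log-logistic survival function 1/(1 + b^c) has a b^(−c) tail, so the integral diverges when the shape c (playing the role of |β|) is ≤ 1 — while the median stays finite. A Python sketch (c = 0.9 is an illustrative value):

```python
import numpy as np
from scipy.integrate import trapezoid

# Truncated means of a log-logistic WTP with shape c = 0.9 (< 1):
# the integral of the survival function keeps growing with the upper
# bound instead of converging, so the mean is infinite.
c = 0.9
surv = lambda x: 1.0 / (1.0 + x**c)   # 1 - G(x) for a log-logistic G
for upper in (1e2, 1e4, 1e6):
    grid = np.linspace(0.0, upper, 1_000_000)
    print(upper, trapezoid(surv(grid), grid))  # keeps growing: no finite mean
```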
CI
I Use exactly the same cmds as for SB
I e.g. for the log-logistic model
I The CI for the (infinite) mean WTP is not defined
I Otherwise we see similar results for Krinsky & Robb as for Bootstrap
I approx ±5 around the central value
I e.g. for the 30.4 median: [26.5 – 35.5]
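The Krinsky & Robb procedure itself is short enough to sketch: draw coefficient vectors from the estimated asymptotic normal distribution and read off percentiles of the welfare measure. A Python sketch — the coefficients and covariance below are made up, not the course estimates:

```python
import numpy as np

# Krinsky & Robb simulation of a CI for the median WTP. With
# Pr(WTP < b) = F(-alpha + beta*b), median WTP = alpha/beta.
# coef and vcov below are hypothetical, not the CarsonDB estimates.
rng = np.random.default_rng(0)
coef = np.array([1.0, 0.033])                 # hypothetical (alpha, beta)
vcov = np.array([[0.010, 0.0002],
                 [0.0002, 0.00002]])          # hypothetical covariance
draws = rng.multivariate_normal(coef, vcov, size=5000)
median_wtp = draws[:, 0] / draws[:, 1]        # alpha/beta for each draw
lo, hi = np.percentile(median_wtp, [2.5, 97.5])
print(lo, hi)  # 95% Krinsky & Robb interval
```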
Exxon-Valdez : Non-use value
I The value presented to the Courts was a median WTP of about 3.2 $/year per household for 10 years
I to avoid such accidents in the next 10 years
I That is 32$ in total
I Since the ∆z here is the natural heritage of the whole US
I this value refers to the 90.9 million US households
I that is an aggregated median of $2 800 million (2.8 billion)
I The interpretation is that this is the amount that would obtain exactly 50% approval if there was a referendum
I It is not related to the cost of the hypothetical escort-ship program
I But it can be taken as the minimal compensation for the loss of natural heritage due to the spill
I In the end, Exxon and the governor of Alaska settled out of court for 1 billion $
Exxon-Valdez : Remarks
I In US law, the ultimate responsibility for goods lies with the owner of these goods
I Otherwise, if it lay with the shipping company, it would be easy for goods owners to contract with insolvent firms
I and effectively escape responsibility
I Any tanker that calls in US territorial waters must have subscribed to an insurance
I with a $1 billion fund that can be seized by the authorities to pay for any damages
Exxon-Valdez : Remarks
I This is part of the Oil Pollution Act of 1990
I following the Exxon Valdez spill
I "A company cannot ship oil into the United States until it presents a plan to prevent spills that may occur. It must also have a detailed containment and cleanup plan in case of an oil spill emergency."
I In Europe, similar ideas advance slowly
I following the Erika (1999) and Prestige (2002) wrecks
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML: Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Context
I It is generally not easy to answer CV questions
I It is not usual to think of public/collective goods in terms of price or value
I How will my answer be used? Could it commit me?
I Often, a realistic context is described in the questionnaire
I A referendum (binding or consultative) on a local tax change
I A contribution to an association
I An entry fee to a site
I The payment vehicle is often associated with such a context
I If there is a payment, how would it be carried out?
I Changes of prices, tax raises or similar fees, voluntary contributions...
Other Formats
I Open-ended: how much are you WTP?
I Payment cards, showing several amounts
I “Bidding game”
I i.e. like an auction
I Psychologists are very critical of any valuation question
I They say that any amount stated by the researcher strongly anchors the respondents’ answers
I Within the valuation Q or anywhere else in the questionnaire
I Yet, it is an empirical regularity that the proportion of Yes decreases with the bid
I Not always very smoothly, but still
Answers that are not Preferences Revealing
I Strategic: by lying, can I get more than by telling the truth?
I Open-ended Q and the average of individual WTPs
I This is less of a concern with the dichotomous format
I Avoid willingness-to-accept Q, as in Exxon-Valdez
I Symbolic
I The respondent does not correctly identify the ∆z of interest
I Some people always answer Yes for environmental causes, or No when the word tax appears
I Some respondents answer what they think everyone (or the government) should do
I Debriefing: a series of questions after the valuation question(s) to eliminate such answers
Don’t know / refuse to answer
I Strategic or symbolic answers are not obvious
I Don’t know / don’t answer are visible
I So actually, even a Yes/No Q always has 3 or 4 answers in practice
I Distinguishing “don’t know” and “refuse” or not
I One option to treat these answers is to remove them
I ok if they are not associated w/ a specific profile
I Preliminary dich. choice model
I Answer “Yes or No” vs. sthg else
I Or directly a multinomial model
& complexity
I Control the “size” (scope) effect
I “size” of the ∆z
I e.g. it must be that the WTP to save 300 birds is < that to save 3 000 birds
I Can be done by subsampling
I Control embedding
I Is ∆z valued for itself or taken as symbolic of a larger set?
I e.g. protecting against another Exxon-Valdez at the same site or for the whole US
I This is often addressed by careful structure: broad Q first, then more and more precise
I Socio-demographic Q
I to acquire regressors for the valuation Q
I Allow inference to the population
I Interviewer effect
Bid Design
I Why should we use 4 or 6 bids?
I Trade-off: the more bids,
I the more we know about the WTP curve
I but the less precise this knowledge
I Given the total size of the sample
I d-optimality seeks to minimize the asymptotic variances of the estimators
I c-optimality minimizes the confidence interval of the estimated WTP
I In the end, the literature settled for “sequential design”
1. start w/ a focus group and ask open Q
2. use these first guesses for a first round of (100?) questionnaires
3. estimate the model: can you identify the WTP?
4. if not, adjust: higher (lower) bids if not enough “No” (“Yes”)