Research in Applied Econometrics
Chapter 3. Contingent Valuation Econometrics
Pr. Philippe Polomé, Université Lumière Lyon 2
M1 APE Analyse des Politiques Économiques
M1 RISE Gouvernance des Risques Environnementaux
2017 – 2018
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Purpose
▶ Operationalize the theoretical notions of value
▶ In practice: impossible to measure every individual's benefit
  ▶ Resort to statistical techniques
  ▶ Representative samples w/ control variables to enable inference to the population
  ▶ Individual benefits are never identified
  ▶ Econometric techniques are thus essential
▶ We'll need the packages
  ▶ DCchoice
  ▶ Ecdat, stats (should be there already)
Classification of Valuation Techniques
▶ Based on stated preferences
  ▶ Contingent Valuation ("Évaluation Contingente")
  ▶ Choice experiments / contingent choices
▶ Based on revealed preferences
  ▶ Travel cost – estimating demand for transport
  ▶ Hedonic prices – estimating demand for housing
▶ Based on (inferred) prices (not directly based on preferences)
Stated Preferences Techniques
▶ A sample of people is surveyed directly on their preferences about a public project
  ▶ To infer a measure of individual statistical value
  ▶ at the population level
▶ Interviews can take any form: telephone, postal mail, e-mail, website
  ▶ Preferably face-to-face, but that is more expensive
  ▶ or combinations
▶ The sample depends on the objective
Contingent Valuation
▶ One potential environmental change is described
  ▶ Together with its stated cost
  ▶ The context of such cost is important: taxes, fees, prices...
▶ A single question is asked: for or against said change
  ▶ The question is sometimes repeated w/ another cost
Choice experiment / contingent choices
▶ Two or more potential situations are described
  ▶ They differ according to a number of attributes
  ▶ Including the potential cost of each situation
▶ Respondents are asked to choose one such situation
  ▶ Such choices are asked several times w/ differing levels of the attributes
Questionnaire Structure
▶ Opening questions
  ▶ Possible filters to select certain respondents
▶ General questions on the environment, leading to the particular case of interest
  ▶ While making the respondent think about it
  ▶ We want informed, thought-out answers
▶ Valuation question
▶ Debriefing
  ▶ Why did the respondent answer what s/he did?
  ▶ Did s/he not believe the scenario?
▶ Collect data on specific potential explanatory variables
  ▶ e.g. if the survey is about a lake's quality, what use does the respondent make of the lake?
  ▶ Tourism, recreation (boating, fishing...)
▶ Socio-economic data
  ▶ Primarily for inference to the population
The Dichotomous Format
▶ This is the most popular, least controversial "elicitation" format
  ▶ Assumed least prone to untruthful answers
  ▶ While not too demanding for the respondent
▶ Consider an environmental change from $z_0$ to $z_1$
  ▶ To simplify, consider only an improvement
▶ Respondents are proposed a "bid" $b$
  ▶ They answer yes or no
  ▶ But may also state that they don't know or refuse to answer
▶ This is similar to a "posted price" market context
  ▶ There is a good (the environmental improvement)
  ▶ The situation is a bit like being asked whether to buy it
  ▶ Respondents are routinely in such situations
  ▶ Except, not in a public good context
  ▶ Further, "buying" cannot really be that
  ▶ So we need a context, but we discuss that later
The Dichotomous Format
▶ Formalizing, let the Indirect Utility be $v(z, y) + \varepsilon$
  ▶ It represents the individual's preferences from the point of view of the researcher
  ▶ The error term $\varepsilon$ reflects multiple influences the researcher does not know about
  ▶ These influences are modeled as a random variable, but that does not mean people act randomly
  ▶ This is called a Random Utility Model (RUM)
▶ If the answer is "Yes", then it must be that $v(z_1, y-b) + \varepsilon_1 \ge v(z_0, y) + \varepsilon_0$
  ▶ and it must also be that $WTP > b$
WTP distribution
▶ 4 to 6 (different) bids are proposed to different respondents
  ▶ Each respondent only ever sees one bid
▶ Consider the proportion of "Yes" for each of these bids
WTP distribution
▶ Assume that
  ▶ For a bid of 0, the proportion of Yes is 100%
  ▶ For some high bid the proportion would be zero
  ▶ Respondents share a single form of the utility function
  ▶ but differ according to observable data $X$ and unobservables $\varepsilon$
▶ "Connect-the-dots" gives an estimate of the WTP distribution
WTP distribution
▶ Going back to the IUF: if the answer is "Yes", then $v(z_1, y-b) + \varepsilon_1 \ge v(z_0, y) + \varepsilon_0$
▶ In other words,
  $\Pr\{Yes|b\} = \Pr\{v(z_1, y-b) + \varepsilon_1 \ge v(z_0, y) + \varepsilon_0\}$
  $= \Pr\{\varepsilon_0 - \varepsilon_1 \le v(z_1, y-b) - v(z_0, y)\}$
  $= \Pr\{\varepsilon_0 - \varepsilon_1 \le g(b, y, \ldots)\}$
▶ $g(\cdot)$ cannot be any function, because of $V(\cdot)$; see later
▶ If we make a hypothesis on the distribution of $\varepsilon_0 - \varepsilon_1$
  ▶ We have a model that can be estimated by Maximum Likelihood
  ▶ Logistic: Logit
  ▶ Normal: Probit
WTP distribution
[Figure: estimated WTP distribution from the proportions of Yes by bid]
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Density
▶ $f(y|\theta)$: probability density function (pdf) of a random variable $y$
  ▶ conditioned on a set of parameters $\theta$
  ▶ It represents mathematically the data generating process of each obs. of a sample of data
▶ The joint density of $n$ independent and identically distributed (iid) obs.
  ▶ = the product of the individual densities:
  $f(y_1, \ldots, y_n|\theta) = \prod_{i=1}^{n} f(y_i|\theta) = L(\theta|y)$
▶ This joint density is the likelihood function
  ▶ a function of the unknown parameter vector $\theta$
  ▶ $y$ is used to indicate the collection of sample data
Likelihood Function
▶ Intuitively, this is much the same as a joint probability
  ▶ Consider two (6-sided) dice
  ▶ What is the probability of rolling a 3 and a 6?
  ▶ The likelihood function is akin to the idea of the probability of the sample
  ▶ Except that points have probability mass zero
▶ Usually simpler to work with the log:
  $\ln L(\theta|y) = \sum_{i=1}^{n} \ln f(y_i|\theta)$
Conditional Likelihood
▶ Generalize the likelihood function to allow the density to depend on conditioning variables: $f(y_i|x_i, \theta)$
▶ Take the classical LRM $y_i = x_i\beta + \varepsilon_i$
  ▶ Suppose $\varepsilon$ is normally distributed with mean 0 and variance $\sigma^2$: $\varepsilon \sim N(0, \sigma^2)$
  ▶ Then $y_i \sim N(x_i\beta, \sigma^2)$
  ▶ Thus the $y_i$ are not iid: they have different means
  ▶ But they are independent, so that $(y_i - x_i\beta)/\sigma \sim N(0, 1)$
  ▶ thus $\ln L(\theta|y, X) = \sum \ln f(y_i|x_i, \theta) = -\frac{1}{2}\sum_{i=1}^{n}\left[\ln\sigma^2 + \ln(2\pi) + (y_i - x_i\beta)^2/\sigma^2\right]$
  where $X$ is the $n \times K$ matrix of data with $i$th row equal to $x_i$
Identification
▶ Now that we have $\ln L$, how do we use it to obtain estimates of the parameters $\theta$?
  ▶ and to test hypotheses about them?
▶ There is the preliminary issue of identification
  ▶ Whether estimation of the parameters is possible at all
  ▶ This is about the formulation of the model
  ▶ The question is: suppose we had an infinitely large sample,
  ▶ could we uniquely determine the values of $\theta$ from it?
  ▶ The answer is sometimes no
Identification. The parameter vector $\theta$ is identified (estimable) if for any other parameter vector $\theta^* \ne \theta$, for some data $y$: $L(\theta^*|y) \ne L(\theta|y)$
Identification Example 1: Multicollinearity
▶ LRM $y_i = x_i\beta + \varepsilon_i$
  ▶ Suppose that there is a nonzero vector $a$ such that $x_i'a = 0 \;\forall x_i$
  ▶ That is the case when there is perfect multicollinearity
  ▶ Then there is another "parameter" vector, $\gamma = \beta + a \ne \beta$, such that $x_i'\gamma = x_i'\beta \;\forall x_i$
▶ When this is the case, the log-likelihood is the same whether it is evaluated at $\beta$ or at $\gamma$
  ▶ So it is not possible to estimate $\beta$ in this model, since $\beta$ cannot be distinguished from $\gamma$
▶ Here identification is associated with the data
  ▶ But it can also be about the way the model is written
Example 2: Identification via normalization
Consider the LRM $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$, where $\varepsilon_i|x_i \sim N(0, \sigma^2)$.
▶ Consider the context of a consumer's purchase of a large commodity such as a car, where
  ▶ $x_i$ is the consumer's income
  ▶ $y_i$ is the difference between what the consumer is willing to pay for the car, $p_i^*$, and the price of the car, $p_i$
▶ Suppose that rather than observing $p_i^*$ or $p_i$, we observe only whether the consumer actually purchases the car, which, assume, occurs when $y_i = p_i^* - p_i > 0$.
▶ Thus, the model states that the consumer will purchase the car if $y_i > 0$ and not purchase otherwise
▶ The random variable in this model is "purchase" or "not purchase": there are only two outcomes
Example 2: Identification via normalization
The probability of a purchase is
$\Pr\{purchase|\beta_1, \beta_2, \sigma, x_i\} = \Pr\{y_i > 0|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{\beta_1 + \beta_2 x_i + \varepsilon_i > 0|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{\varepsilon_i > -\beta_1 - \beta_2 x_i|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{\varepsilon_i/\sigma > (-\beta_1 - \beta_2 x_i)/\sigma|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{z_i > (-\beta_1 - \beta_2 x_i)/\sigma|\beta_1, \beta_2, \sigma, x_i\}$
where $z_i$ has a standard normal distribution.
The probability of not purchasing is one minus this probability.
Example 2: Identification via normalization
Thus the likelihood function is
$\prod_{i = purchase} \Pr\{purch|\beta_1, \beta_2, \sigma, x_i\} \times \prod_{i = not\ purch} \left[1 - \Pr\{purch|\beta_1, \beta_2, \sigma, x_i\}\right]$
This is often rewritten as
$\prod_i \left[\Pr\{purchase|\beta_1, \beta_2, \sigma, x_i\}\right]^{y_i}\left[1 - \Pr\{purchase|\beta_1, \beta_2, \sigma, x_i\}\right]^{(1-y_i)}$
The parameters of this model are not identified:
▶ If $\beta_1$, $\beta_2$ and $\sigma$ are all multiplied by the same nonzero constant,
  ▶ then $\Pr\{purchase\}$ and the likelihood function do not change.
▶ This model requires a normalization.
  ▶ The one usually used is $\sigma = 1$.
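A short R check of this invariance (all numbers arbitrary): scaling $\beta_1$, $\beta_2$ and $\sigma$ by the same positive constant leaves the response probability, and hence the likelihood, unchanged.

# Pr{purchase} = Phi((b1 + b2*x)/sigma): scaling (b1, b2, sigma) changes nothing
x  <- c(10, 25, 40)                      # arbitrary incomes
b1 <- -2; b2 <- 0.1; sigma <- 1; k <- 3  # k: any positive constant
p1 <- pnorm((b1 + b2 * x) / sigma)
p2 <- pnorm((k * b1 + k * b2 * x) / (k * sigma))
all.equal(p1, p2)                        # TRUE: the parameters are not identified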
Maximum Likelihood Estimation Principle
▶ We see that with a discrete r.v.
  ▶ $f(y_i|\theta)$ is the probability of observing $y_i$ conditionally on $\theta$
  ▶ The likelihood function is then the probability of observing the sample $Y$ conditionally on $\theta$
▶ We assume that the sample we have observed is the most likely
  ▶ What value of $\theta$ makes the observed sample most likely?
  ▶ Answer: the value of $\theta$ that maximizes the likelihood function
  ▶ since then the observed sample has maximum probability
▶ When $y$ is a continuous r.v., instead of a discrete one,
  ▶ we can no longer say that $f(y_i|\theta)$ is the probability of observing $y_i$ conditionally on $\theta$,
  ▶ but we retain the same principle.
Maximum Likelihood Estimation Principle
▶ The value of the parameter vector that maximizes $L(\theta|data)$ is the maximum likelihood estimate $\hat\theta$
  ▶ Since the logarithm is a monotonic function, the vector that maximizes $L(\theta|data)$ is the same as the one that maximizes $\ln L(\theta|data)$
▶ The necessary condition for maximizing $\ln L(\theta|data)$ is $\partial \ln L(\theta|data)/\partial\theta = 0$
  ▶ This is called the likelihood equation
Example : Likelihood Function and Equations for the Normal
Assume a sample $Y$ from an $N(\mu, \sigma^2)$.
▶ The $\ln L$ function is
  $\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\sum_{i=1}^{n}\left[\frac{(y_i - \mu)^2}{\sigma^2}\right]$
▶ The likelihood equations are
  ▶ $\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0$ and
  ▶ $\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 = 0$
▶ These equations admit an explicit solution:
  ▶ $\hat\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y$ and
  ▶ $\hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2$
▶ Thus the sample mean is the ML estimator
  ▶ The ML estimator of the variance is not the OLS estimator (which has an $n-1$ denominator)
  ▶ In small samples, this estimator is biased, but as $n \to \infty$ that does not matter
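As a quick check, a minimal R sketch (simulated data, arbitrary true values) that maximizes this $\ln L$ numerically; the optimum should match the sample mean and the n-denominator variance.

set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)         # simulated sample
negLL <- function(theta) {                # theta = (mu, log(sigma^2))
  mu <- theta[1]; s2 <- exp(theta[2])     # exp() keeps the variance positive
  0.5 * sum(log(2 * pi) + log(s2) + (y - mu)^2 / s2)
}
fit <- optim(c(0, 0), negLL)
c(fit$par[1], exp(fit$par[2]))            # approx. mean(y) and mean((y - mean(y))^2)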
ML Properties 1
▶ Conditionally on correct distributional assumptions
  ▶ and under regularity conditions
  ▶ ML has very good properties
  ▶ In a sense, because the information supplied to the estimator is very good: not only the sample but also the full distribution
▶ Notation:
  ▶ $\hat\theta$ is the ML estimator
  ▶ $\theta_0$ is the true value of the parameter vector
  ▶ $\theta$ is any other value
▶ ML has only asymptotic properties
  ▶ in small samples, it may be biased or inefficient
ML Properties 2
▶ Consistency: $\text{plim}\,\hat\theta = \theta_0$
▶ Asymptotic normality: $\hat\theta \sim N\left[\theta_0, \{I(\theta_0)\}^{-1}\right]$
  ▶ where $I(\theta_0) = -E\left[\partial^2 \ln L/\partial\theta_0\partial\theta_0'\right]$ is the information matrix
  ▶ its inverse is the asymptotic covariance matrix
  ▶ $\partial f/\partial\theta_0$ indicates $\partial f/\partial\theta$ evaluated at $\theta_0$.
ML Properties 3
▶ Asymptotic efficiency: $\hat\theta$ is asymptotically efficient if
  ▶ it is consistent, asymptotically normally distributed,
  ▶ and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator.
  ▶ $\hat\theta$ achieves the Cramér–Rao lower bound for consistent estimators
▶ Invariance: the ML estimator of $\gamma_0 = c(\theta_0)$ is $c(\hat\theta)$
  ▶ if $c(\theta)$ is a continuous and continuously differentiable function.
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Specification (“how the probability is written”)
▶ Let a class of non-linear models with dichotomous responses:
  $\Pr(y = 1|X) = G(X\beta)$
▶ $G$ is a function taking values between zero and one: $0 \le G(z) \le 1$, $\forall$ real number $z$
  ▶ This guarantees that estimated response probabilities will be between zero and one
  ▶ That is not the case w/ the LRM
▶ Therefore, there is a non-linear relation between $y$ and $X$
▶ Many functions could do this job
  ▶ 2 are popular: logistic and normal
Logit & Probit
I Logit
model,
Gis the distribution function (cumulative density) of the standard
logisticr.v. :
G(z) = exp (z)/[1+ exp (z)] = (z)
I Probit
model,
Gis the distribution function of the standard
normalr.v., of which the density is noted
„(.):
G(z) =⁄ z
≠Œ
„(t)dt = (z)
with
„(z) = (2fi)≠1/2exp!≠z2/2"Logit vs. Probit
▶ The logistic and normal distributions are similar
▶ The logistic makes computations and analysis easier and allows for simplifications in more advanced models
Latent Variable Model
▶ Let $y^*$ be a latent variable (that is, not directly observed) s.t. $y^* = X\beta + \varepsilon$
  ▶ e.g. $y^*$ is the utility from an environmental improvement $z$
▶ Logit & probit may be derived from a latent variable model
  ▶ satisfying all the classical LRM hypotheses
▶ Utility is not observed, only the consequence of the individual decision:
  ▶ $y_i^* < 0 \implies y_i = 0$
  ▶ $y_i^* \ge 0 \implies y_i = 1$
  ▶ We observe whether the person is ($y = 1$) or is not ($y = 0$) willing to pay an amount $b \in X$ to "buy" $z$
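A small simulation of this mechanism (all numbers arbitrary): generate $y^*$, keep only $y = 1\{y^* \ge 0\}$, and check that a probit recovers the coefficients under the $\sigma = 1$ normalization.

set.seed(42)
n <- 5000
x <- rnorm(n)
ystar <- 0.5 + 1.2 * x + rnorm(n)   # latent utility, error variance normalized to 1
y <- as.numeric(ystar >= 0)         # only the decision is observed
coef(glm(y ~ x, family = binomial(link = "probit")))  # close to (0.5, 1.2)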
Response Probability
▶ Hypotheses on $\varepsilon$:
  ▶ independent of $X$
  ▶ standard logistic or standard normal
▶ Response probability for $y$:
  $\Pr\{y = 1|X\} = \Pr\{y^* \ge 0|X\} = \Pr\{\varepsilon > -X\beta|X\}$
  $= 1 - G(-X\beta) = G(X\beta)$
▶ Since $\varepsilon$ is normal or logistic, it is symmetrical around zero
  ▶ thus $1 - G(-z) = G(z)$ $\forall$ real number $z$
Maximum Likelihood Estimation
▶ As indicated earlier, the likelihood function for the dichotomous case is
  $\prod_i \left[\Pr\{willing|\beta, \sigma, X_i\}\right]^{y_i}\left[1 - \Pr\{willing|\beta, \sigma, X_i\}\right]^{(1-y_i)}$
▶ ML seeks the maximum of (the log of) this function
  ▶ It does not have an explicit solution
  ▶ But yields numerical estimates $\hat\beta_{ML}$
  ▶ Consistent but biased
  ▶ Asymptotically efficient & normal
  ▶ As long as the model hypotheses are true
  ▶ So, if you used probit: is $\varepsilon$ really normal?
  ▶ If the distributional hypothesis is not true, sometimes we may retain the properties
  ▶ On the other hand, endogeneity of $X$ is as serious as usual
Marginal effect of a continuous regressor $x_j$
▶ The effect of a marginal change in $x_j$
  ▶ on the response probability $\Pr\{y = 1|X\} = p(X)$
  ▶ is given by the partial derivative
  $\frac{\partial p(X)}{\partial x_j} = \frac{\partial G(X\beta)}{\partial x_j} = g(X\beta)\beta_j$
▶ This is the marginal effect of $x_j$
  ▶ it depends on the values taken by all the regressors (not just $x_j$)
  ▶ Compare to the LRM: $\partial y/\partial x_j = \beta_j$
  ▶ it cannot bring the probability below zero or above one
Marginal effect of a continuous regressor $x_j$
▶ Thus, the marginal effect is a non-linear combination of the regressors
▶ It can be calculated at "interesting" points of $X$
  ▶ e.g. $\bar X$, the sample average point: $\frac{\partial p}{\partial x_j}(\bar X)$
  ▶ However, that does not mean much for discrete regressors, e.g. gender
▶ Or it can be calculated for each $i$ in the sample: $\frac{\partial p}{\partial x_j}(X_i)$
  ▶ and then we can compute an average of these "individual" marginal effects
  ▶ In general, these are not the same: the average of $\frac{\partial p}{\partial x_j}(X_i)$ $\ne$ $\frac{\partial p}{\partial x_j}(\bar X)$
▶ Which one do we choose? (see the sketch below)
  ▶ Often, this is too complicated for presentation
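A hedged R sketch of the two computations for some fitted logit fit with a continuous regressor named x (both names are illustrative):

X   <- model.matrix(fit)                          # regressors, incl. intercept
b   <- coef(fit)
g   <- function(z) plogis(z) * (1 - plogis(z))    # logistic density g = G(1 - G)
AME <- mean(g(X %*% b)) * b["x"]                  # average of individual effects
MEM <- as.numeric(g(colMeans(X) %*% b)) * b["x"]  # effect at the sample average
c(AME = unname(AME), MEM = unname(MEM))           # in general not equal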
Marginal effect of a discrete regressor $x_j$
▶ Effect of a change in a discrete $x_j$
  ▶ from $a$ to $b$ (often, from 0 to 1)
  ▶ on the response probability $\Pr\{y = 1|X\} = p(X)$
  ▶ Write $X_{-j}$ for the set of all the regressors but $x_j$, similarly $\beta_{-j}$
  ▶ The discrete change in $\hat p(X_i)$ is
  $\Delta\hat p(X_i) = G\left(X_{-j,i}\hat\beta_{-j} + b\hat\beta_j\right) - G\left(X_{-j,i}\hat\beta_{-j} + a\hat\beta_j\right)$
▶ Such a discrete effect differs from individual to individual
  ▶ Even the sign could in principle differ
▶ In R, such effects are not calculated automatically
  ▶ The above formula must be calculated explicitly, as in the sketch below
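Continuing the sketch above (same fitted logit fit, with a hypothetical dummy regressor d going from a = 0 to b = 1):

X1 <- X0 <- model.matrix(fit)
X0[, "d"] <- 0                                    # everyone at a = 0
X1[, "d"] <- 1                                    # everyone at b = 1
dp <- plogis(X1 %*% coef(fit)) - plogis(X0 %*% coef(fit))
summary(as.numeric(dp))                           # individual discrete effects
mean(dp)                                          # their sample average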
Compare Logit & Probit
▶ The sizes of the coefficients are not comparable between these models
  ▶ This is because, with a dichotomous variable $y$,
  ▶ the set of all coefficients could be multiplied by a positive constant without changing the $y$
  ▶ Usually, the variance of $y$ is not identified
  ▶ Approximately, multiplying the probit coef. by 1.5 yields the logit coef. (rule of thumb!)
▶ The marginal effects should be approximately the same
Measures of the quality of fit
▶ The correctly predicted percentage may be appealing
  ▶ $\forall i$ compute the fitted probability that $y_i$ takes value 1: $G(X_i\hat\beta)$
  ▶ If $\ge .5$ we "predict" $y_i = 1$, and zero otherwise
  ▶ Compute the % of correct predictions
▶ Problem: it is possible to see a high % correctly predicted while the model is not very useful
  ▶ e.g. in a sample of 200 with 180 obs. of $y_i = 0$, of which 150 are predicted zero, and 20 obs. of $y_i = 1$, all predicted zero
  ▶ The model is clearly poor
  ▶ But we still have 75% correct predictions
  ▶ A flat prediction of 0 has 90% correct predictions
▶ A better measure is a 2 × 2 table, as in the next slide
Goodness of Fit: Predictive Table

                       Observed
                  y_i = 1   y_i = 0   Total
Predict  y^_i = 1     350       122     472
         y^_i = 0      78       203     281
Total                 428       325     753

We'll see the R cmd later
Goodness of Fit : Pseudo R-square
▶ Pseudo-$R^2 = 1 - \ln L_{UR}/\ln L_0$
  ▶ $\ln L_{UR}$: log-likelihood of the estimated model
  ▶ $\ln L_0$: log-likelihood of a model with only the intercept
  ▶ i.e. forcing all $\beta = 0$ except for the intercept
▶ Similar to an $R^2$ for OLS regression
  ▶ since $R^2 = 1 - SSR_{UR}/SSR_0$
▶ There exist other measures of the quality of fit
  ▶ but the fit is not what maximum likelihood is seeking
  ▶ contrary to LS
  ▶ I stress more the statistical significance of the regressors
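For a glm object (here called fit, illustratively), a two-line sketch of this pseudo-R²:

fit0 <- update(fit, . ~ 1)     # intercept-only model, same data and link
1 - as.numeric(logLik(fit)) / as.numeric(logLik(fit0))   # pseudo-R2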
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Utility
▶ Come back to economic value
  ▶ The Compensating Variation for a change from $z_0$ to $z_1$ is $V(z_1, y - CV) = V(z_0, y)$
  ▶ CV is interpreted as the (max) WTP to secure an improvement $z = z_1 - z_0$
▶ We observe a Yes to bid $b$ when $b < WTP$
▶ Assume a RUM; then $\Pr\{Yes|b\} = \Pr\{\varepsilon_0 - \varepsilon_1 \le v(z_1, y-b) - v(z_0, y)\}$
Linear utility
▶ Suppose $V(z_j, y) = \alpha_j + \beta y$, $j = 0, 1$
  ▶ $\beta$ is the marginal utility of income
  ▶ In principle, we would like it to decrease with income, but simplify
▶ The (max) WTP is s.t. $\alpha_0 + \beta y + \varepsilon_0 = \alpha_1 + \beta(y - WTP) + \varepsilon_1$
  ▶ Solving, $WTP = \frac{\alpha + \varepsilon}{\beta}$, with $\alpha = \alpha_1 - \alpha_0$ and $\varepsilon = \varepsilon_1 - \varepsilon_0$
  ▶ So that e.g. $E(WTP) = \alpha/\beta$
▶ The probability of Yes to bid $b$ is then
  $\Pr\{Yes|b\} = \Pr\{v(z_1, y-b) + \varepsilon_1 - v(z_0, y) - \varepsilon_0 > 0\} = \Pr\{\varepsilon \ge \beta b - \alpha\}$
  ▶ This is a simple probit/logit context
  ▶ Utility $V$ is not identified, only the difference
  ▶ But that may be because of the linearity
Probit/Logit
▶ If $\varepsilon \sim N(0, 1)$ std normal, then $\Pr\{Yes|b\} = 1 - \Phi(\beta b - \alpha) = \Phi(\alpha - \beta b)$
▶ If $\varepsilon$ is std logistic, then $\Pr\{Yes|b\} = 1/[1 + \exp(\beta b - \alpha)]$
▶ If we assume that $WTP \ge 0$, then we can derive similarly:
  ▶ with $\varepsilon \sim$ log-normal: $\Pr\{Yes|b\} = 1 - \Phi(\beta\ln(b) - \alpha) = \Phi(\alpha - \beta\ln(b))$
  ▶ with $\varepsilon \sim$ log-logistic: $\Pr\{Yes|b\} = 1/[1 + \exp(\beta\ln(b) - \alpha)]$
▶ These last two are still the probit and logit models, respectively,
  ▶ But the $\ln$ of the bid is used instead of the bid itself
  ▶ And that is still compatible with RUM
Estimation
▶ Using these expressions, it is easy to derive the log-likelihood function
  ▶ as in the previous section
▶ Usually, we want to account for individual characteristics $X_i$
  ▶ Those collected in the survey
  ▶ This is done through $\alpha_i = \sum_{k=0}^{K} \gamma_k x_{ki}$, with $x_{0i} = 1\;\forall i$ for an intercept
▶ Parametric estimation
  ▶ glm function (in the core stats distribution)
  ▶ DCchoice package
  ▶ Check or library( ) it
▶ Example with the NaturalPark data
  ▶ of the Ecdat package (Croissant 2014)
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
NaturalPark data
▶ From the Ecdat package (Croissant 2014)
  ▶ Consists of WTP and other socio-demographic variables
  ▶ for 312 individuals regarding the preservation of the Alentejo natural park in Portugal
▶ In R, type help(NaturalPark) and execute summary(NaturalPark)
  ▶ 7 variables
  ▶ bid1 is the first bid
  ▶ min 6, max 48
  ▶ bidh & bidl are 2nd bids that we do not look into for the moment
  ▶ answers is a factor {yy, yn, ny, nn} of answers to the 2 bids
  ▶ We'll use only the 1st letter, y for Yes
  ▶ Socio-demographics are age, sex, income
Data Transformation
▶ Rename NP <- NaturalPark for simplicity
▶ Extract the answer to the 1st bid from "answers":
  ▶ NP$R1 <- ifelse(NP$answers == "yy" | NP$answers == "yn", 1, 0)
  ▶ What does that do?
  ▶ A call to the logical function ifelse( ) takes the form ifelse(condition, true, false)
  ▶ It returns true where condition is true, and false otherwise.
  ▶ The vertical bar | is an "or" operator in a logical expression.
  ▶ The prefix NP$ makes it possible to access each variable contained in NP directly.
▶ Convert bid1 to its log (log in English is ln in French):
  ▶ NP$Lbid1 <- log(NP$bid1)
  ▶ summary(NP) reveals that things have gone smoothly
Estimation using glm
▶ glm is a classical function for many models of the type $G(X\beta)$
  ▶ Its use is much like lm
  ▶ But you have to specify the link function $G$ using the option family =
  ▶ This is fairly flexible, but a bit complicated
  ▶ See RAE2017 for the specifications
▶ summary works on glm
  ▶ As do several usual commands
▶ Output is in part similar to lm
  ▶ Coef. values next to var. names, with their significance
  ▶ This is interpreted in a way similar to lm
  ▶ A negative (signif.) coef. implies a negative impact on $\Pr\{Yes|b\}$
  ▶ Note "sexfemale" indicates the effect for women
  ▶ Also gives the $\ln L$ value at the optimum
▶ These models do not have explicit solutions; they are solved numerically by algorithms
  ▶ Newton-type (illustrated in the plot)
  ▶ Iterates until a convergence condition is met
  ▶ Risk of a local max
  ▶ with a poor starting point
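A sketch of the single-bounded fits on the transformed NP data built above (variable names sex, age, income as in the data description; check your Ecdat version):

library(Ecdat)                                  # NaturalPark data
# NP, NP$R1 and NP$Lbid1 constructed as on the Data Transformation slide
SB.NP.glm.logit  <- glm(R1 ~ sex + age + income + Lbid1,
                        family = binomial(link = "logit"),  data = NP)
SB.NP.glm.probit <- glm(R1 ~ sex + age + income + Lbid1,
                        family = binomial(link = "probit"), data = NP)
summary(SB.NP.glm.logit)   # a negative, significant Lbid1 coef. lowers Pr{Yes|b}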
Estimation using DCchoice
▶ DCchoice is designed for such data
  ▶ But not for other contexts
▶ The format for single-bounded is sbchoice(formula, data, dist = "log-logistic")
  ▶ The default dist is log-logistic
  ▶ This is in fact logistic
  ▶ But the bid variable is interpreted in log
  ▶ formula follows Response ~ X | bid
  ▶ | bid is mandatory
▶ The output is much more directly relevant for valuation purposes
  ▶ bid is always shown last
  ▶ log(bid) if log-logistic or log-normal was selected
  ▶ Measures of mean WTP, which we will see later
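A sketch of the corresponding DCchoice call; passing the bid in logs under dist = "log-logistic" is my reading of the package conventions above, so check help(sbchoice):

library(DCchoice)
SB.NP.DC.logit <- sbchoice(R1 ~ sex + age + income | log(bid1),
                           data = NP, dist = "log-logistic")
summary(SB.NP.DC.logit)    # coefficients plus mean/median WTP measures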
Goodness of fit
▶ table(predict(fitted model, type = "probability") > .5, NP$R1)
  ▶ This is a contingency table that counts the number of predicted Yes
  ▶ predicted prob > .5 (returns TRUE or FALSE)
  ▶ against the actual Yes/No
  ▶ per individual, so with each individual's $X_i$ (individual predictions)

SB.NP.DC.logit          0     1
FALSE (predicted 0)    85    38
TRUE  (predicted 1)    56   133
Plotting
▶ The sbchoice cmd produces an object that can be plotted directly
  ▶ A direct plot of the object is the fitted probability of Yes over the bid range
  ▶ probably conditionally on average age, income, sex; the package isn't explicit
▶ Using a predict method helps
  ▶ observe what is outside the range
  ▶ compare logit & probit fitted curves
  ▶ see RAE2017.R
  ▶ In particular: logit has slightly fatter tails, inducing a higher WTP
▶ To use predict (see the sketch below)
  ▶ Create a matrix of new data
  ▶ Choose the proper type; here we want a probability, "response"
Logit vs. Probit predict
▶ As can be seen, for the same data,
  ▶ logit has slightly fatter tails than probit
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Expected WTP
▶ Recall that we assumed $V(z_j, y) = \alpha_j + \beta y$, $j = 0, 1$
  ▶ Then the (max) WTP is s.t. $\alpha_0 + \beta y + \varepsilon_0 = \alpha_1 + \beta(y - WTP) + \varepsilon_1$
  ▶ So $WTP = \frac{\alpha + \varepsilon}{\beta}$, with $\alpha = \alpha_1 - \alpha_0$ and $\varepsilon = \varepsilon_1 - \varepsilon_0$
  ▶ So WTP is a r.v.
  ▶ we can compute e.g. $E(WTP) = \alpha/\beta$
▶ When $\alpha_i = \sum_{k=0}^{K} \gamma_k x_{ki}$
  ▶ then $E(WTP)$ becomes individual
  ▶ So we have to think about how to aggregate individual expected WTPs
Other measures of welfare
▶ $E(WTP)$ is the most obvious measure
▶ However, the expectation is strongly influenced by the tail of the distribution $G(\cdot)$
  ▶ While we do not actually have data to fit it
  ▶ Since there are not many bids
  ▶ And it is not very sensible to ask about a very high bid
Truncated expectations
▶ $E(WTP) = \int_0^{\infty} \Pr\{Yes|b\}\,db$
▶ Historically, the highest bid has been used to truncate $E(WTP)$:
  $\int_0^{b_{max}} \Pr\{Yes|b\}\,db$
  ▶ However, that is not a proper expectation, since the support of $\Pr\{Yes|b\}$ does not stop at $b_{max}$
▶ An alternative uses the truncated distribution:
  $\int_0^{b_{max}} \Pr\{Yes|b\}/(1 - \Pr\{Yes|b_{max}\})\,db$
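A numeric sketch of the three integrals for an illustrative linear logit survival $S(b) = \Pr\{Yes|b\} = \Lambda(\alpha - \beta b)$ ($\alpha$, $\beta$ arbitrary):

alpha <- 2; beta <- 0.05; bmax <- 48
S <- function(b) plogis(alpha - beta * b)         # Pr{Yes|b}
E.untr <- integrate(S, 0, Inf)$value              # untruncated E(WTP)
E.bmax <- integrate(S, 0, bmax)$value             # truncated at the highest bid
E.trd  <- integrate(function(b) S(b) / (1 - S(bmax)), 0, bmax)$value
c(E.untr, E.bmax, E.trd)                          # three different answers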
Median WTP
▶ Finally, the median has been suggested as a more robust measure: $b_{median}$ s.t. $\Pr\{Yes|b_{median}\} = 1/2$
  ▶ i.e. the bid s.t. 50% would be favorable
Shape of the WTP function
▶ Clearly, the form of the WTP function depends on the form of $V(\cdot)$
  ▶ For some forms, some values of $\beta$ lead to impossibilities

Distribution    Expected                                                  Median
Normal          $\alpha/\beta$                                            $\alpha/\beta$
Logistic        $\alpha/\beta$                                            $\alpha/\beta$
Log-normal      $\exp(\alpha/\beta)\exp(1/(2\beta^2))$                    $\exp(\alpha/\beta)$
Log-logistic    $\exp(\alpha/\beta)\,\Gamma(1 - 1/\beta)\Gamma(1 + 1/\beta)$   $\exp(\alpha/\beta)$

▶ Again, if $\alpha_i = \sum_{k=0}^{K} \gamma_k x_{ki}$, each of these forms is individual
  ▶ So the question arises whether to compute a sample mean or a sample median
  ▶ DCchoice appears to compute a sample mean
  ▶ But is not explicit
How do we choose a welfare measure ?
▶ That might be the time for a debate?
▶ 3 welfare measures: untruncated, properly truncated, median
  ▶ Of course we would not select an infinite one
  ▶ The smallest estimate, to avoid criticism?
  ▶ Do these measures differ significantly from each other?
  ▶ We will see that later
▶ 4 well-known distributions
  ▶ (log-)normal, (log-)logistic: they do not differ substantially
  ▶ There are others
  ▶ There are also other estimators, e.g. non-parametric
  ▶ The estimate of the best-fit model?
▶ 2 aggregation rules: sample mean or sample median
  ▶ Researchers usually take the first, but what is the meaning of a sample mean of individual medians?
▶ DCchoice does not provide any guidance
Computing the welfare measure
▶ DCchoice computes them automatically
▶ With glm, use the above formulas
  ▶ See the code in RAE2017.R
▶ Much as I like DCchoice, I must note that for the data we use (NaturalPark)
  ▶ the mean WTP does not coincide with the median
  ▶ for the symmetrical distributions (normal & logistic)
  ▶ That is a problem; I should write to the authors
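With glm and a linear-in-bid logit (so that mean = median = $\alpha_i/\beta$ for each individual), a sketch of the two aggregation rules (fit and names illustrative):

fit.lin <- glm(R1 ~ sex + age + income + bid1,
               family = binomial(link = "logit"), data = NP)
cf   <- coef(fit.lin)
beta <- -cf["bid1"]                               # the bid coef. estimates -beta
WTPi <- (model.matrix(fit.lin)[, names(cf) != "bid1"] %*%
         cf[names(cf) != "bid1"]) / beta          # individual alpha_i / beta
c(mean = mean(WTPi), median = median(WTPi))       # two aggregation rules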
Confidence Intervals
▶ In the end, WTP, under any of its forms, is an estimate
  ▶ As such it has a confidence interval
  ▶ Much as for $\hat\beta$, you should always report the CI
  ▶ At a minimum to give an idea of the variance
  ▶ and to show whether it is significantly different from zero
▶ There are 2 main methods
  ▶ Krinsky & Robb
  ▶ Bootstrap
▶ These methods are much broader than valuation
  ▶ They are useful in all types of research in applied econometrics
Constructing Confidence Intervals: Krinsky and Robb
▶ Start with the estimated vector of coefficients
▶ By ML, it is distributed (multivariate) normally
  ▶ Its variance-covariance matrix has been estimated in the ML process
▶ Draw D times from a multivariate normal distribution with
  ▶ mean = the vector of estimated coefficients
  ▶ variance = the estimated variance-covariance matrix of these estimated coefficients
▶ So, we have D vectors of estimated coefficients
  ▶ If D is large, the average of these D vectors is just our original vector of coef.
Constructing Confidence Intervals: Krinsky and Robb
▶ Compute the welfare measure for each such replicated coefficient vector
  ▶ Thus we have D estimated welfare measures
  ▶ some large, some small: an empirical distribution
  ▶ order them from smallest to largest
  ▶ the 5% most extreme are deemed random
  ▶ the 95% most central are deemed reasonable
  ▶ and constitute the 95% confidence interval
▶ For example,
  ▶ the lower and upper bounds of the 95% confidence interval
  ▶ correspond to the 0.025 and 0.975 percentiles of the measures, respectively
Krinsky and Robb: Implementation in DCchoice
▶ Function krCI(.)
  ▶ constructs CIs for the four different WTPs
  ▶ estimated by functions sbchoice(.) or dbchoice(.)
▶ Call as krCI(obj, nsim = 1000, CI = 0.95)
  ▶ obj is an object of either the "sbchoice" or "dbchoice" class
  ▶ nsim is the number of draws of the parameters
  ▶ it influences machine time
  ▶ CI is the percentile of the confidence intervals to be estimated
  ▶ Returns a "krCI" object
  ▶ a table of the simulated confidence intervals
  ▶ vectors containing the simulated WTPs
▶ Is there a package that does Krinsky & Robb for glm objects? (a hand-rolled sketch follows)
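I am not aware of one, but it is easy to do by hand with MASS::mvrnorm, here for the linear-in-bid logit fit.lin above (mean WTP evaluated at the sample average point):

library(MASS)                                     # for mvrnorm
set.seed(123)
draws <- mvrnorm(1000, mu = coef(fit.lin), Sigma = vcov(fit.lin))
Xbar  <- colMeans(model.matrix(fit.lin))
wtp.kr <- apply(draws, 1, function(cf)
  sum(Xbar[names(cf) != "bid1"] * cf[names(cf) != "bid1"]) / -cf["bid1"])
quantile(wtp.kr, c(0.025, 0.975))                 # Krinsky & Robb 95% CI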
Constructing Confidence Intervals: Bootstrap
▶ Similar to Krinsky & Robb
  ▶ except in the way the new estimated coefficients are obtained
  ▶ Essentially, instead of simulating new estimates,
  ▶ we simulate new data
  ▶ and then calculate new estimates
Mediocrity principle
▶ Consider that our sample is mediocre in the population
  ▶ We mean: it does not have anything exceptional
▶ Then, if we could draw a new sample from that population,
  ▶ we would surely obtain a fairly mediocre sample
  ▶ that is, fairly similar to the original one
Bootstrap principle
▶ It's not possible to draw a new sample
▶ But imagine that, using the original sample, we draw one obs.,
  ▶ and we "put it back" in the sample ("replace")
  ▶ then we draw again
  ▶ repeat until we have the same number of obs. as in the original sample
  ▶ call this a bootstrap sample
  ▶ Each original obs. appears 0 to n times
▶ By the mediocrity principle
  ▶ the bootstrap sample is fairly close to a real new sample
  ▶ Estimate a new vector of coefficients
  ▶ Repeat D times (see the sketch below)
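The same CI by hand with the bootstrap, for the linear-in-bid logit above (slow: D model fits):

set.seed(123)
wtp.bs <- replicate(1000, {
  bs <- NP[sample(nrow(NP), replace = TRUE), ]    # one bootstrap sample
  cf <- coef(glm(R1 ~ sex + age + income + bid1,
                 family = binomial(link = "logit"), data = bs))
  Xb <- colMeans(model.matrix(~ sex + age + income, data = bs))
  sum(Xb * cf[names(cf) != "bid1"]) / -cf["bid1"] # mean WTP at the X-bar point
})
quantile(wtp.bs, c(0.025, 0.975))                 # bootstrap 95% CI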
Bootstrap: Implementation in DCchoice
▶ Function bootCI() carries out the bootstrapping
  ▶ and returns the bootstrap confidence intervals
  ▶ call: bootCI(obj, nboot = 1000, CI = 0.95)
▶ Slower than K&R, since each sample must be generated
  ▶ and new estimates computed
  ▶ while K&R only simulates new estimates
▶ In the end, the results are similar
  ▶ See RAE2017.R
▶ Note: another option would be the resample cmd
  ▶ Applicable to glm
  ▶ But I don't develop it here
Differences of welfare measures
▶ Sometimes we want to know whether a welfare measure differs significantly from another
  ▶ In other words: is their difference significantly different from zero?
  ▶ In terms of CI: does the CI of their difference include zero?
▶ To compute that: bootstrap similarly
  ▶ Krinsky and Robb is also possible, but
  ▶ if the welfare measures are independent, their difference has a variance-covariance that is the sum of each variance-covariance
  ▶ if they are not independent, then it's difficult
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Double-bounded CV Estimation
▶ To increase the amount of information collected by the survey,
  ▶ the valuation question is asked a 2nd time
  ▶ If the respondent answered Yes (No), s/he is then asked about a higher (lower) bid
▶ Called double-bounded dichotomous choice
  ▶ or dichotomous choice with follow-up
▶ More precisely, the phrasing could be
  ▶ If Z cost you b €, would you be willing to pay it?
  ▶ If answered Yes: would you be willing to pay $b^U$ €? $b^U > b$
  ▶ If answered No: would you be willing to pay $b^L$ €? $b^L < b$
Double-bounded CV Estimation
▶ There are 4 outcomes per respondent
  ▶ YY, YN, NY, NN
  ▶ YY indicates that $WTP > b^U$
  ▶ YN that $b < WTP < b^U$
  ▶ NY that $b^L < WTP < b$
  ▶ NN that $WTP < b^L$
▶ Thus the answers are intervals
  ▶ Probit & logit do not suffice
  ▶ Many use ML
  ▶ But the likelihood function is different
Estimation with DBDC data
▶ To develop the likelihood function,
  ▶ it is necessary to express the probabilities first
▶ Write $P^{YY}$ for the probability of answering Yes, Yes:
  ▶ $P^{YY} = \Pr\{b^U < WTP\} = 1 - G(b^U)$
  ▶ $P^{YN} = \Pr\{b < WTP < b^U\} = G(b^U) - G(b)$
  ▶ $P^{NY} = \Pr\{b^L < WTP < b\} = G(b) - G(b^L)$
  ▶ $P^{NN} = \Pr\{WTP < b^L\} = G(b^L)$
▶ For a sample of $N$ obs.,
  $\ln L = \sum_{n=1}^{N}\left[d_n^{YY}\ln P_n^{YY} + d_n^{YN}\ln P_n^{YN} + d_n^{NY}\ln P_n^{NY} + d_n^{NN}\ln P_n^{NN}\right]$
  where $n$ indexes individuals and $d_n^{XX}$ indicates whether $n$ answered $XX$ (dichotomous variable)
Estimation with DBDC data
▶ There is no direct command corresponding to such a likelihood
  ▶ It must be programmed
  ▶ This is called "Full Information Maximum Likelihood" (FIML)
  ▶ We don't do this here
  ▶ It is pre-programmed in DCchoice (see the sketch below)
  ▶ For the same basic choices of distribution as for SBDC data
▶ Endogeneity issue
  ▶ The 2nd bid is not exogenous
  ▶ since it depends on the previous answer
  ▶ Thus it contains unobserved characteristics of the individuals
  ▶ Such unobservables also determine the 2nd choice
  ▶ This is in principle addressed by FIML
▶ A more general model is the bivariate probit
  ▶ allowing the 2 answers to have less than perfect correlation
  ▶ not covered by DCchoice
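A sketch of the DCchoice call on the NaturalPark data; the two-response, two-bid formula convention is my reading of the package, so check help(dbchoice):

NP$R2   <- ifelse(NP$answers == "yy" | NP$answers == "ny", 1, 0)  # 2nd answer
NP$bid2 <- ifelse(NP$R1 == 1, NP$bidh, NP$bidl)   # follow-up bid actually faced
DB.NP.DC <- dbchoice(R1 + R2 ~ sex + age + income | log(bid1) + log(bid2),
                     data = NP, dist = "log-logistic")
summary(DB.NP.DC)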
Estimation with DBDC data: Std normal cdf
▶ $P^{YY} = \Pr\{b^U < WTP\} = 1 - \Phi(-\alpha + \beta b^U)$
▶ $P^{YN} = \Pr\{b < WTP < b^U\} = \Phi(-\alpha + \beta b^U) - \Phi(-\alpha + \beta b)$
▶ $P^{NY} = \Pr\{b^L < WTP < b\} = \Phi(-\alpha + \beta b) - \Phi(-\alpha + \beta b^L)$
▶ $P^{NN} = \Pr\{WTP < b^L\} = \Phi(-\alpha + \beta b^L)$
▶ So: we estimate the same coefficients $\alpha$ and $\beta$ as in SBDC
  ▶ but with more data, so it is more efficient
  ▶ assuming people answer both valuation questions in the same way
▶ So: the computation of the welfare measures is the same
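For intuition, a minimal hand-coded version of this log-likelihood (intercept and bid only, standard normal cdf), maximized with optim; a sketch, not the DCchoice implementation:

negLL.db <- function(theta, b, bU, bL, yy, yn, ny, nn) {
  a <- theta[1]; be <- theta[2]
  P <- function(p) pmax(p, 1e-12)                 # guard against log(0)
  PYY <- P(1 - pnorm(-a + be * bU))
  PYN <- P(pnorm(-a + be * bU) - pnorm(-a + be * b))
  PNY <- P(pnorm(-a + be * b)  - pnorm(-a + be * bL))
  PNN <- P(pnorm(-a + be * bL))
  -sum(yy * log(PYY) + yn * log(PYN) + ny * log(PNY) + nn * log(PNN))
}
with(NP, optim(c(1, 0.05), negLL.db, b = bid1, bU = bidh, bL = bidl,
               yy = answers == "yy", yn = answers == "yn",
               ny = answers == "ny", nn = answers == "nn"))$par  # (alpha, beta)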
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Context
▶ 1989: about 35,000 tons of crude oil spilled at sea
▶ The slick ended up extending over about 26,000 km² at sea
  ▶ and soiled 1,600 km of coastline
▶ Most of the damage was in Prince William Sound and the Gulf of Alaska, up to the Kodiak Islands
▶ Several types of damages
  ▶ Professional fishing (minimal)
  ▶ Tourism (possibly a benefit)
  ▶ Environmental heritage loss
  ▶ Punitive damages (supposed to be an incentive)
▶ Valuation survey
Exxon-Valdez Questionnaire
▶ Avoid willingness-to-accept questions:
  ▶ because of presumed strategic behaviour
▶ The basic question is
  ▶ Compensation for the loss of an environmental heritage during 10 years?
  ▶ Scenario: after 10 years the environmental damage will be fully recovered
▶ Convert such a "WTA" into a WTP to avoid the catastrophe happening again for 10 years
  ▶ Scenario: in 10 years' time, a similar catastrophe will be impossible thanks to double hulls
▶ The scenario need not be true
  ▶ since we are investigating human preferences for things that may never happen
  ▶ However, it must appear credible to respondents