Research in Applied Econometrics
Chapter 3. Contingent Valuation Econometrics
Pr. Philippe Polomé, Université Lumière Lyon 2
M1 APE Analyse des Politiques Économiques
M1 RISE Gouvernance des Risques Environnementaux
2017 – 2018
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Purpose
▶ Operationalize the theoretical notions of value
▶ In practice: impossible to measure every individual's benefit
  ▶ Resort to statistical techniques
  ▶ Representative samples w/ control variables to enable inference to the population
  ▶ Individual benefits are never identified
  ▶ Econometric techniques are thus essential
▶ We'll need the packages
  ▶ DCchoice
  ▶ Ecdat, stats (should be there already)
Classification of Valuation Techniques
▶ Based on stated preferences
  ▶ Contingent Valuation ("Évaluation Contingente")
  ▶ Choice experiments / contingent choices
▶ Based on revealed preferences
  ▶ Travel cost – estimating demand for transport
  ▶ Hedonic prices – estimating demand for housing
▶ Based on (inferred) prices (not directly based on preferences)
Stated Preferences Techniques
▶ A sample of people is surveyed directly on their preferences about a public project
  ▶ To infer a measure of individual statistical value
  ▶ at the population level
▶ Interviews can take any form: telephone, postal mail, e-mail, website
  ▶ Preferably face-to-face, but that is more expensive
  ▶ or combinations
▶ The sample depends on the objective
Contingent Valuation
▶ One potential environmental change is described
  ▶ Together with its stated cost
  ▶ The context of such cost is important: taxes, fees, prices...
▶ A single question is asked: for or against said change
  ▶ The question is sometimes repeated w/ another cost
Choice experiment / contingent choices
▶ Two or more potential situations are described
  ▶ They differ according to a number of attributes
  ▶ Including the potential cost of each situation
▶ Respondents are asked to choose one such situation
  ▶ Such choices are asked several times w/ differing levels of the attributes
Questionnaire Structure
▶ Opening questions
  ▶ Possible filters to select certain respondents
▶ General questions on the environment, leading to the particular case of interest
  ▶ While making the respondent think about it
  ▶ We want informed, thought-out answers
▶ Valuation question
▶ Debriefing
  ▶ Why did the respondent answer what s/he did?
  ▶ Did s/he not believe the scenario?
▶ Collect data on specific potential explanatory variables
  ▶ e.g. if the survey is about a lake's quality, what use does the respondent make of the lake?
  ▶ Tourism, recreation (boating, fishing...)
▶ Socio-economic data
  ▶ Primarily for inference to the population
The Dichotomous Format
▶ This is the most popular, least controversial "elicitation" format
  ▶ Assumed least prone to untruthful answers
  ▶ While not too demanding for the respondent
▶ Consider an environmental change from $z_0$ to $z_1$
  ▶ To simplify, consider only an improvement
▶ Respondents are proposed a "bid" $b$
  ▶ They answer yes or no
  ▶ But may also state that they don't know or refuse to answer
▶ This is similar to a "posted price" market context
  ▶ There is a good (the environmental improvement)
  ▶ The situation is a bit like being asked whether to buy it
  ▶ Respondents are routinely in such situations
  ▶ Except, not in a public good context
  ▶ Further, "buying" cannot really be that
  ▶ So we need a context, but we discuss that later
The Dichotomous Format
▶ Formalizing, let the Indirect Utility be $v(z, y) + \varepsilon$
  ▶ It represents the individual's preferences from the point of view of the researcher
  ▶ The error term $\varepsilon$ reflects multiple influences the researcher does not know about
  ▶ These influences are modeled as a random variable, but that does not mean people act randomly
  ▶ This is called a Random Utility Model (RUM)
▶ If the answer is "Yes", then it must be that $v(z_1, y-b) + \varepsilon_1 \ge v(z_0, y) + \varepsilon_0$
  ▶ and it must also be that $WTP > b$
WTP distribution
▶ 4 to 6 (different) bids are proposed to different respondents
  ▶ Each respondent only ever sees one bid
▶ Consider the proportion of "Yes" for each of these bids
WTP distribution
▶ Assume that
  ▶ For a bid of 0, the proportion of Yes is 100%
  ▶ For some high bid the proportion would be zero
  ▶ Respondents share a single form of the utility function
  ▶ but differ according to observable data $X$ and unobservables $\varepsilon$
▶ "Connect-the-dots" gives an estimate of the WTP distribution
WTP distribution
▶ Going back to the IUF: if the answer is "Yes", then $v(z_1, y-b) + \varepsilon_1 \ge v(z_0, y) + \varepsilon_0$
▶ In other words,
  $\Pr\{Yes|b\} = \Pr\{v(z_1, y-b) + \varepsilon_1 \ge v(z_0, y) + \varepsilon_0\}$
  $= \Pr\{\varepsilon_0 - \varepsilon_1 \le v(z_1, y-b) - v(z_0, y)\}$
  $= \Pr\{\varepsilon_0 - \varepsilon_1 \le g(b, y, \ldots)\}$
▶ $g(\cdot)$ cannot be any function, because of $V(\cdot)$; see later
▶ If we make a hypothesis on the distribution of $\varepsilon_0 - \varepsilon_1$
  ▶ We have a model that can be estimated by Maximum Likelihood
  ▶ Logistic: Logit
  ▶ Normal: Probit
WTP distribution
[Figure: estimated WTP distribution from the proportions of Yes by bid]
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Density
▶ $f(y|\theta)$: probability density function (pdf) of a random variable $y$
  ▶ conditioned on a set of parameters $\theta$
  ▶ It represents mathematically the data generating process of each obs. of a sample of data
▶ The joint density of $n$ independent and identically distributed (iid) obs.
  ▶ = the product of the individual densities:
  $f(y_1, \ldots, y_n|\theta) = \prod_{i=1}^{n} f(y_i|\theta) = L(\theta|y)$
▶ This joint density is the likelihood function
  ▶ a function of the unknown parameter vector $\theta$
  ▶ $y$ is used to indicate the collection of sample data
Likelihood Function
▶ Intuitively, this is much the same as a joint probability
  ▶ Consider two (6-sided) dice
  ▶ What is the probability of rolling a 3 and a 6?
  ▶ The likelihood function is akin to the idea of the probability of the sample
  ▶ Except that points have probability mass zero
▶ Usually simpler to work with the log:
  $\ln L(\theta|y) = \sum_{i=1}^{n} \ln f(y_i|\theta)$
Conditional Likelihood
▶ Generalize the likelihood function to allow the density to depend on conditioning variables: $f(y_i|x_i, \theta)$
▶ Take the classical LRM $y_i = x_i\beta + \varepsilon_i$
  ▶ Suppose $\varepsilon$ is normally distributed with mean 0 and variance $\sigma^2$: $\varepsilon \sim N(0, \sigma^2)$
  ▶ Then $y_i \sim N(x_i\beta, \sigma^2)$
  ▶ Thus the $y_i$ are not iid: they have different means
  ▶ But they are independent, so that $(y_i - x_i\beta)/\sigma \sim N(0, 1)$
  ▶ thus $\ln L(\theta|y, X) = \sum \ln f(y_i|x_i, \theta) = -\frac{1}{2}\sum_{i=1}^{n}\left[\ln\sigma^2 + \ln(2\pi) + (y_i - x_i\beta)^2/\sigma^2\right]$
  where $X$ is the $n \times K$ matrix of data with $i$th row equal to $x_i$
Identification
▶ Now that we have $\ln L$, how do we use it to obtain estimates of the parameters $\theta$?
  ▶ and to test hypotheses about them?
▶ There is the preliminary issue of identification
  ▶ Whether estimation of the parameters is possible at all
  ▶ This is about the formulation of the model
  ▶ The question is: suppose we had an infinitely large sample,
  ▶ could we uniquely determine the values of $\theta$ from it?
  ▶ The answer is sometimes no
Identification. The parameter vector $\theta$ is identified (estimable) if for any other parameter vector $\theta^* \ne \theta$, for some data $y$: $L(\theta^*|y) \ne L(\theta|y)$
Identification Example 1: Multicollinearity
▶ LRM $y_i = x_i\beta + \varepsilon_i$
  ▶ Suppose that there is a nonzero vector $a$ such that $x_i'a = 0 \;\forall x_i$
  ▶ That is the case when there is perfect multicollinearity
  ▶ Then there is another "parameter" vector, $\gamma = \beta + a \ne \beta$, such that $x_i'\gamma = x_i'\beta \;\forall x_i$
▶ When this is the case, the log-likelihood is the same whether it is evaluated at $\beta$ or at $\gamma$
  ▶ So it is not possible to estimate $\beta$ in this model, since $\beta$ cannot be distinguished from $\gamma$
▶ Here identification is associated with the data
  ▶ But it can also be about the way the model is written
Example 2: Identification via normalization
Consider the LRM $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$, where $\varepsilon_i|x_i \sim N(0, \sigma^2)$.
▶ Consider the context of a consumer's purchase of a large commodity such as a car, where
  ▶ $x_i$ is the consumer's income
  ▶ $y_i$ is the difference between what the consumer is willing to pay for the car, $p_i^*$, and the price of the car, $p_i$
▶ Suppose that rather than observing $p_i^*$ or $p_i$, we observe only whether the consumer actually purchases the car, which, assume, occurs when $y_i = p_i^* - p_i > 0$.
▶ Thus, the model states that the consumer will purchase the car if $y_i > 0$ and not purchase otherwise
▶ The random variable in this model is "purchase" or "not purchase": there are only two outcomes
Example 2: Identification via normalization
The probability of a purchase is
$\Pr\{purchase|\beta_1, \beta_2, \sigma, x_i\} = \Pr\{y_i > 0|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{\beta_1 + \beta_2 x_i + \varepsilon_i > 0|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{\varepsilon_i > -\beta_1 - \beta_2 x_i|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{\varepsilon_i/\sigma > (-\beta_1 - \beta_2 x_i)/\sigma|\beta_1, \beta_2, \sigma, x_i\}$
$= \Pr\{z_i > (-\beta_1 - \beta_2 x_i)/\sigma|\beta_1, \beta_2, \sigma, x_i\}$
where $z_i$ has a standard normal distribution.
The probability of not purchasing is one minus this probability.
Example 2: Identification via normalization
Thus the likelihood function is
$\prod_{i = purchase} \Pr\{purch|\beta_1, \beta_2, \sigma, x_i\} \times \prod_{i = not\ purch} \left[1 - \Pr\{purch|\beta_1, \beta_2, \sigma, x_i\}\right]$
This is often rewritten as
$\prod_i \left[\Pr\{purchase|\beta_1, \beta_2, \sigma, x_i\}\right]^{y_i}\left[1 - \Pr\{purchase|\beta_1, \beta_2, \sigma, x_i\}\right]^{(1-y_i)}$
The parameters of this model are not identified:
▶ If $\beta_1$, $\beta_2$ and $\sigma$ are all multiplied by the same nonzero constant,
  ▶ then $\Pr\{purchase\}$ and the likelihood function do not change.
▶ This model requires a normalization.
  ▶ The one usually used is $\sigma = 1$.
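A short R check of this invariance (all numbers arbitrary): scaling $\beta_1$, $\beta_2$ and $\sigma$ by the same positive constant leaves the response probability, and hence the likelihood, unchanged.

# Pr{purchase} = Phi((b1 + b2*x)/sigma): scaling (b1, b2, sigma) changes nothing
x  <- c(10, 25, 40)                      # arbitrary incomes
b1 <- -2; b2 <- 0.1; sigma <- 1; k <- 3  # k: any positive constant
p1 <- pnorm((b1 + b2 * x) / sigma)
p2 <- pnorm((k * b1 + k * b2 * x) / (k * sigma))
all.equal(p1, p2)                        # TRUE: the parameters are not identified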
Maximum Likelihood Estimation Principle
▶ We see that with a discrete r.v.
  ▶ $f(y_i|\theta)$ is the probability of observing $y_i$ conditionally on $\theta$
  ▶ The likelihood function is then the probability of observing the sample $Y$ conditionally on $\theta$
▶ We assume that the sample we have observed is the most likely
  ▶ What value of $\theta$ makes the observed sample most likely?
  ▶ Answer: the value of $\theta$ that maximizes the likelihood function
  ▶ since then the observed sample has maximum probability
▶ When $y$ is a continuous r.v., instead of a discrete one,
  ▶ we can no longer say that $f(y_i|\theta)$ is the probability of observing $y_i$ conditionally on $\theta$,
  ▶ but we retain the same principle.
Maximum Likelihood Estimation Principle
▶ The value of the parameter vector that maximizes $L(\theta|data)$ is the maximum likelihood estimate $\hat\theta$
  ▶ Since the logarithm is a monotonic function, the vector that maximizes $L(\theta|data)$ is the same as the one that maximizes $\ln L(\theta|data)$
▶ The necessary condition for maximizing $\ln L(\theta|data)$ is $\partial \ln L(\theta|data)/\partial\theta = 0$
  ▶ This is called the likelihood equation
Example : Likelihood Function and Equations for the Normal
Assume a sample $Y$ from an $N(\mu, \sigma^2)$.
▶ The $\ln L$ function is
  $\ln L(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2}\sum_{i=1}^{n}\left[\frac{(y_i - \mu)^2}{\sigma^2}\right]$
▶ The likelihood equations are
  ▶ $\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \mu) = 0$ and
  ▶ $\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \mu)^2 = 0$
▶ These equations admit an explicit solution:
  ▶ $\hat\mu_{ML} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar y$ and
  ▶ $\hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2$
▶ Thus the sample mean is the ML estimator
  ▶ The ML estimator of the variance is not the OLS estimator (which has an $n-1$ denominator)
  ▶ In small samples, this estimator is biased, but as $n \to \infty$ that does not matter
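As a quick check, a minimal R sketch (simulated data, arbitrary true values) that maximizes this $\ln L$ numerically; the optimum should match the sample mean and the n-denominator variance.

set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)         # simulated sample
negLL <- function(theta) {                # theta = (mu, log(sigma^2))
  mu <- theta[1]; s2 <- exp(theta[2])     # exp() keeps the variance positive
  0.5 * sum(log(2 * pi) + log(s2) + (y - mu)^2 / s2)
}
fit <- optim(c(0, 0), negLL)
c(fit$par[1], exp(fit$par[2]))            # approx. mean(y) and mean((y - mean(y))^2)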
ML Properties 1
▶ Conditionally on correct distributional assumptions
  ▶ and under regularity conditions
  ▶ ML has very good properties
  ▶ In a sense, because the information supplied to the estimator is very good: not only the sample but also the full distribution
▶ Notation:
  ▶ $\hat\theta$ is the ML estimator
  ▶ $\theta_0$ is the true value of the parameter vector
  ▶ $\theta$ is any other value
▶ ML has only asymptotic properties
  ▶ in small samples, it may be biased or inefficient
ML Properties 2
▶ Consistency: $\text{plim}\,\hat\theta = \theta_0$
▶ Asymptotic normality: $\hat\theta \sim N\left[\theta_0, \{I(\theta_0)\}^{-1}\right]$
  ▶ where $I(\theta_0) = -E\left[\partial^2 \ln L/\partial\theta_0\partial\theta_0'\right]$ is the information matrix
  ▶ its inverse is the asymptotic covariance matrix
  ▶ $\partial f/\partial\theta_0$ indicates $\partial f/\partial\theta$ evaluated at $\theta_0$.
ML Properties 3
▶ Asymptotic efficiency: $\hat\theta$ is asymptotically efficient if
  ▶ it is consistent, asymptotically normally distributed,
  ▶ and has an asymptotic covariance matrix that is not larger than the asymptotic covariance matrix of any other consistent, asymptotically normally distributed estimator.
  ▶ $\hat\theta$ achieves the Cramér–Rao lower bound for consistent estimators
▶ Invariance: the ML estimator of $\gamma_0 = c(\theta_0)$ is $c(\hat\theta)$
  ▶ if $c(\theta)$ is a continuous and continuously differentiable function.
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Specification (“how the probability is written”)
▶ Let a class of non-linear models with dichotomous responses:
  $\Pr(y = 1|X) = G(X\beta)$
▶ $G$ is a function taking values between zero and one: $0 \le G(z) \le 1$, $\forall$ real number $z$
  ▶ This guarantees that estimated response probabilities will be between zero and one
  ▶ That is not the case w/ the LRM
▶ Therefore, there is a non-linear relation between $y$ and $X$
▶ Many functions could do this job
  ▶ 2 are popular: logistic and normal
Logit & Probit
I Logit
model,
Gis the distribution function (cumulative density) of the standard
logisticr.v. :
G(z) = exp (z)/[1+ exp (z)] = (z)
I Probit
model,
Gis the distribution function of the standard
normalr.v., of which the density is noted
„(.):
G(z) =⁄ z
≠Œ
„(t)dt = (z)
with
„(z) = (2fi)≠1/2exp!≠z2/2"Logit vs. Probit
▶ The logistic and normal distributions are similar
▶ The logistic makes computations and analysis easier and allows for simplifications in more advanced models
Latent Variable Model
▶ Let $y^*$ be a latent variable (that is, not directly observed) s.t. $y^* = X\beta + \varepsilon$
  ▶ e.g. $y^*$ is the utility from an environmental improvement $z$
▶ Logit & probit may be derived from a latent variable model
  ▶ satisfying all the classical LRM hypotheses
▶ Utility is not observed, only the consequence of the individual decision:
  ▶ $y_i^* < 0 \implies y_i = 0$
  ▶ $y_i^* \ge 0 \implies y_i = 1$
  ▶ We observe whether the person is ($y = 1$) or is not ($y = 0$) willing to pay an amount $b \in X$ to "buy" $z$
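A small simulation of this mechanism (all numbers arbitrary): generate $y^*$, keep only $y = 1\{y^* \ge 0\}$, and check that a probit recovers the coefficients under the $\sigma = 1$ normalization.

set.seed(42)
n <- 5000
x <- rnorm(n)
ystar <- 0.5 + 1.2 * x + rnorm(n)   # latent utility, error variance normalized to 1
y <- as.numeric(ystar >= 0)         # only the decision is observed
coef(glm(y ~ x, family = binomial(link = "probit")))  # close to (0.5, 1.2)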
Response Probability
▶ Hypotheses on $\varepsilon$:
  ▶ independent of $X$
  ▶ standard logistic or standard normal
▶ Response probability for $y$:
  $\Pr\{y = 1|X\} = \Pr\{y^* \ge 0|X\} = \Pr\{\varepsilon > -X\beta|X\}$
  $= 1 - G(-X\beta) = G(X\beta)$
▶ Since $\varepsilon$ is normal or logistic, it is symmetrical around zero
  ▶ thus $1 - G(-z) = G(z)$ $\forall$ real number $z$
Maximum Likelihood Estimation
▶ As indicated earlier, the likelihood function for the dichotomous case is
  $\prod_i \left[\Pr\{willing|\beta, \sigma, X_i\}\right]^{y_i}\left[1 - \Pr\{willing|\beta, \sigma, X_i\}\right]^{(1-y_i)}$
▶ ML seeks the maximum of (the log of) this function
  ▶ It does not have an explicit solution
  ▶ But yields numerical estimates $\hat\beta_{ML}$
  ▶ Consistent but biased
  ▶ Asymptotically efficient & normal
  ▶ As long as the model hypotheses are true
  ▶ So, if you used probit: is $\varepsilon$ really normal?
  ▶ If the distributional hypothesis is not true, sometimes we may retain the properties
  ▶ On the other hand, endogeneity of $X$ is as serious as usual
Marginal effect of a continuous regressor $x_j$
▶ The effect of a marginal change in $x_j$
  ▶ on the response probability $\Pr\{y = 1|X\} = p(X)$
  ▶ is given by the partial derivative
  $\frac{\partial p(X)}{\partial x_j} = \frac{\partial G(X\beta)}{\partial x_j} = g(X\beta)\beta_j$
▶ This is the marginal effect of $x_j$
  ▶ it depends on the values taken by all the regressors (not just $x_j$)
  ▶ Compare to the LRM: $\partial y/\partial x_j = \beta_j$
  ▶ it cannot bring the probability below zero or above one
Marginal effect of a continuous regressor $x_j$
▶ Thus, the marginal effect is a non-linear combination of the regressors
▶ It can be calculated at "interesting" points of $X$
  ▶ e.g. $\bar X$, the sample average point: $\frac{\partial p}{\partial x_j}(\bar X)$
  ▶ However, that does not mean much for discrete regressors, e.g. gender
▶ Or it can be calculated for each $i$ in the sample: $\frac{\partial p}{\partial x_j}(X_i)$
  ▶ and then we can compute an average of these "individual" marginal effects
  ▶ In general, these are not the same: the average of $\frac{\partial p}{\partial x_j}(X_i)$ $\ne$ $\frac{\partial p}{\partial x_j}(\bar X)$
▶ Which one do we choose? (see the sketch below)
  ▶ Often, this is too complicated for presentation
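A hedged R sketch of the two computations for some fitted logit fit with a continuous regressor named x (both names are illustrative):

X   <- model.matrix(fit)                          # regressors, incl. intercept
b   <- coef(fit)
g   <- function(z) plogis(z) * (1 - plogis(z))    # logistic density g = G(1 - G)
AME <- mean(g(X %*% b)) * b["x"]                  # average of individual effects
MEM <- as.numeric(g(colMeans(X) %*% b)) * b["x"]  # effect at the sample average
c(AME = unname(AME), MEM = unname(MEM))           # in general not equal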
Marginal effect of a discrete regressor $x_j$
▶ Effect of a change in a discrete $x_j$
  ▶ from $a$ to $b$ (often, from 0 to 1)
  ▶ on the response probability $\Pr\{y = 1|X\} = p(X)$
  ▶ Write $X_{-j}$ for the set of all the regressors but $x_j$, similarly $\beta_{-j}$
  ▶ The discrete change in $\hat p(X_i)$ is
  $\Delta\hat p(X_i) = G\left(X_{-j,i}\hat\beta_{-j} + b\hat\beta_j\right) - G\left(X_{-j,i}\hat\beta_{-j} + a\hat\beta_j\right)$
▶ Such a discrete effect differs from individual to individual
  ▶ Even the sign could in principle differ
▶ In R, such effects are not calculated automatically
  ▶ The above formula must be calculated explicitly, as in the sketch below
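Continuing the sketch above (same fitted logit fit, with a hypothetical dummy regressor d going from a = 0 to b = 1):

X1 <- X0 <- model.matrix(fit)
X0[, "d"] <- 0                                    # everyone at a = 0
X1[, "d"] <- 1                                    # everyone at b = 1
dp <- plogis(X1 %*% coef(fit)) - plogis(X0 %*% coef(fit))
summary(as.numeric(dp))                           # individual discrete effects
mean(dp)                                          # their sample average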
Compare Logit & Probit
▶ The sizes of the coefficients are not comparable between these models
  ▶ This is because, with a dichotomous variable $y$,
  ▶ the set of all coefficients could be multiplied by a positive constant without changing the $y$
  ▶ Usually, the variance of $y$ is not identified
  ▶ Approximately, multiplying the probit coef. by 1.5 yields the logit coef. (rule of thumb!)
▶ The marginal effects should be approximately the same
Measures of the quality of fit
▶ The correctly predicted percentage may be appealing
  ▶ $\forall i$ compute the fitted probability that $y_i$ takes value 1: $G(X_i\hat\beta)$
  ▶ If $\ge .5$ we "predict" $y_i = 1$, and zero otherwise
  ▶ Compute the % of correct predictions
▶ Problem: it is possible to see a high % correctly predicted while the model is not very useful
  ▶ e.g. in a sample of 200 with 180 obs. of $y_i = 0$, of which 150 are predicted zero, and 20 obs. of $y_i = 1$, all predicted zero
  ▶ The model is clearly poor
  ▶ But we still have 75% correct predictions
  ▶ A flat prediction of 0 has 90% correct predictions
▶ A better measure is a 2 × 2 table, as in the next slide
Goodness of Fit: Predictive Table

                       Observed
                  y_i = 1   y_i = 0   Total
Predict  y^_i = 1     350       122     472
         y^_i = 0      78       203     281
Total                 428       325     753

We'll see the R cmd later
Goodness of Fit : Pseudo R-square
▶ Pseudo-$R^2 = 1 - \ln L_{UR}/\ln L_0$
  ▶ $\ln L_{UR}$: log-likelihood of the estimated model
  ▶ $\ln L_0$: log-likelihood of a model with only the intercept
  ▶ i.e. forcing all $\beta = 0$ except for the intercept
▶ Similar to an $R^2$ for OLS regression
  ▶ since $R^2 = 1 - SSR_{UR}/SSR_0$
▶ There exist other measures of the quality of fit
  ▶ but the fit is not what maximum likelihood is seeking
  ▶ contrary to LS
  ▶ I stress more the statistical significance of the regressors
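For a glm object (here called fit, illustratively), a two-line sketch of this pseudo-R²:

fit0 <- update(fit, . ~ 1)     # intercept-only model, same data and link
1 - as.numeric(logLik(fit)) / as.numeric(logLik(fit0))   # pseudo-R2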
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Utility
▶ Come back to economic value
  ▶ The Compensating Variation for a change from $z_0$ to $z_1$ is $V(z_1, y - CV) = V(z_0, y)$
  ▶ CV is interpreted as the (max) WTP to secure an improvement $z = z_1 - z_0$
▶ We observe a Yes to bid $b$ when $b < WTP$
▶ Assume a RUM; then $\Pr\{Yes|b\} = \Pr\{\varepsilon_0 - \varepsilon_1 \le v(z_1, y-b) - v(z_0, y)\}$
Linear utility
▶ Suppose $V(z_j, y) = \alpha_j + \beta y$, $j = 0, 1$
  ▶ $\beta$ is the marginal utility of income
  ▶ In principle, we would like it to decrease with income, but simplify
▶ The (max) WTP is s.t. $\alpha_0 + \beta y + \varepsilon_0 = \alpha_1 + \beta(y - WTP) + \varepsilon_1$
  ▶ Solving, $WTP = \frac{\alpha + \varepsilon}{\beta}$, with $\alpha = \alpha_1 - \alpha_0$ and $\varepsilon = \varepsilon_1 - \varepsilon_0$
  ▶ So that e.g. $E(WTP) = \alpha/\beta$
▶ The probability of Yes to bid $b$ is then
  $\Pr\{Yes|b\} = \Pr\{v(z_1, y-b) + \varepsilon_1 - v(z_0, y) - \varepsilon_0 > 0\} = \Pr\{\varepsilon \ge \beta b - \alpha\}$
  ▶ This is a simple probit/logit context
  ▶ Utility $V$ is not identified, only the difference
  ▶ But that may be because of the linearity
Probit/Logit
▶ If $\varepsilon \sim N(0, 1)$ std normal, then $\Pr\{Yes|b\} = 1 - \Phi(\beta b - \alpha) = \Phi(\alpha - \beta b)$
▶ If $\varepsilon$ is std logistic, then $\Pr\{Yes|b\} = 1/[1 + \exp(\beta b - \alpha)]$
▶ If we assume that $WTP \ge 0$, then we can derive similarly:
  ▶ with $\varepsilon \sim$ log-normal: $\Pr\{Yes|b\} = 1 - \Phi(\beta\ln(b) - \alpha) = \Phi(\alpha - \beta\ln(b))$
  ▶ with $\varepsilon \sim$ log-logistic: $\Pr\{Yes|b\} = 1/[1 + \exp(\beta\ln(b) - \alpha)]$
▶ These last two are still the probit and logit models, respectively,
  ▶ But the $\ln$ of the bid is used instead of the bid itself
  ▶ And that is still compatible with RUM
Estimation
▶ Using these expressions, it is easy to derive the log-likelihood function
  ▶ as in the previous section
▶ Usually, we want to account for individual characteristics $X_i$
  ▶ Those collected in the survey
  ▶ This is done through $\alpha_i = \sum_{k=0}^{K} \gamma_k x_{ki}$, with $x_{0i} = 1\;\forall i$ for an intercept
▶ Parametric estimation
  ▶ glm function (in the core stats distribution)
  ▶ DCchoice package
  ▶ Check or library( ) it
▶ Example with the NaturalPark data
  ▶ of the Ecdat package (Croissant 2014)
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
NaturalPark data
▶ From the Ecdat package (Croissant 2014)
  ▶ Consists of WTP and other socio-demographic variables
  ▶ for 312 individuals regarding the preservation of the Alentejo natural park in Portugal
▶ In R, type help(NaturalPark) and execute summary(NaturalPark)
  ▶ 7 variables
  ▶ bid1 is the first bid
  ▶ min 6, max 48
  ▶ bidh & bidl are 2nd bids that we do not look into for the moment
  ▶ answers is a factor {yy, yn, ny, nn} of answers to the 2 bids
  ▶ We'll use only the 1st letter, y for Yes
  ▶ Socio-demographics are age, sex, income
Data Transformation
▶ Rename NP <- NaturalPark for simplicity
▶ Extract the answer to the 1st bid from "answers":
  ▶ NP$R1 <- ifelse(NP$answers == "yy" | NP$answers == "yn", 1, 0)
  ▶ What does that do?
  ▶ A call to the logical function ifelse( ) takes the form ifelse(condition, true, false)
  ▶ It returns true where condition is true, and false otherwise.
  ▶ The vertical bar | is an "or" operator in a logical expression.
  ▶ The prefix NP$ makes it possible to access each variable contained in NP directly.
▶ Convert bid1 to its log (log in English is ln in French):
  ▶ NP$Lbid1 <- log(NP$bid1)
  ▶ summary(NP) reveals that things have gone smoothly
Estimation using glm
▶ glm is a classical function for many models of the type $G(X\beta)$
  ▶ Its use is much like lm
  ▶ But you have to specify the link function $G$ using the option family =
  ▶ This is fairly flexible, but a bit complicated
  ▶ See RAE2017 for the specifications
▶ summary works on glm
  ▶ As do several usual commands
▶ Output is in part similar to lm
  ▶ Coef. values next to var. names, with their significance
  ▶ This is interpreted in a way similar to lm
  ▶ A negative (signif.) coef. implies a negative impact on $\Pr\{Yes|b\}$
  ▶ Note "sexfemale" indicates the effect for women
  ▶ Also gives the $\ln L$ value at the optimum
▶ These models do not have explicit solutions; they are solved numerically by algorithms
  ▶ Newton-type (illustrated in the plot)
  ▶ Iterates until a convergence condition is met
  ▶ Risk of a local max
  ▶ with a poor starting point
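A sketch of the single-bounded fits on the transformed NP data built above (variable names sex, age, income as in the data description; check your Ecdat version):

library(Ecdat)                                  # NaturalPark data
# NP, NP$R1 and NP$Lbid1 constructed as on the Data Transformation slide
SB.NP.glm.logit  <- glm(R1 ~ sex + age + income + Lbid1,
                        family = binomial(link = "logit"),  data = NP)
SB.NP.glm.probit <- glm(R1 ~ sex + age + income + Lbid1,
                        family = binomial(link = "probit"), data = NP)
summary(SB.NP.glm.logit)   # a negative, significant Lbid1 coef. lowers Pr{Yes|b}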
Estimation using DCchoice
▶ DCchoice is designed for such data
  ▶ But not for other contexts
▶ The format for single-bounded is sbchoice(formula, data, dist = "log-logistic")
  ▶ The default dist is log-logistic
  ▶ This is in fact logistic
  ▶ But the bid variable is interpreted in log
  ▶ formula follows Response ~ X | bid
  ▶ | bid is mandatory
▶ The output is much more directly relevant for valuation purposes
  ▶ bid is always shown last
  ▶ log(bid) if log-logistic or log-normal was selected
  ▶ Measures of mean WTP, which we will see later
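A sketch of the corresponding DCchoice call; passing the bid in logs under dist = "log-logistic" is my reading of the package conventions above, so check help(sbchoice):

library(DCchoice)
SB.NP.DC.logit <- sbchoice(R1 ~ sex + age + income | log(bid1),
                           data = NP, dist = "log-logistic")
summary(SB.NP.DC.logit)    # coefficients plus mean/median WTP measures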
Goodness of fit
▶ table(predict(fitted model, type = "probability") > .5, NP$R1)
  ▶ This is a contingency table that counts the number of predicted Yes
  ▶ predicted prob > .5 (returns TRUE or FALSE)
  ▶ against the actual Yes/No
  ▶ per individual, so with each individual's $X_i$ (individual predictions)

SB.NP.DC.logit          0     1
FALSE (predicted 0)    85    38
TRUE  (predicted 1)    56   133
Plotting
▶ The sbchoice cmd produces an object that can be plotted directly
  ▶ A direct plot of the object is the fitted probability of Yes over the bid range
  ▶ probably conditionally on average age, income, sex; the package isn't explicit
▶ Using a predict method helps
  ▶ observe what is outside the range
  ▶ compare logit & probit fitted curves
  ▶ see RAE2017.R
  ▶ In particular: logit has slightly fatter tails, inducing a higher WTP
▶ To use predict (see the sketch below)
  ▶ Create a matrix of new data
  ▶ Choose the proper type; here we want a probability, "response"
Logit vs. Probit predict
▶ As can be seen, for the same data,
  ▶ logit has slightly fatter tails than probit
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Expected WTP
▶ Recall that we assumed $V(z_j, y) = \alpha_j + \beta y$, $j = 0, 1$
  ▶ Then the (max) WTP is s.t. $\alpha_0 + \beta y + \varepsilon_0 = \alpha_1 + \beta(y - WTP) + \varepsilon_1$
  ▶ So $WTP = \frac{\alpha + \varepsilon}{\beta}$, with $\alpha = \alpha_1 - \alpha_0$ and $\varepsilon = \varepsilon_1 - \varepsilon_0$
  ▶ So WTP is a r.v.
  ▶ we can compute e.g. $E(WTP) = \alpha/\beta$
▶ When $\alpha_i = \sum_{k=0}^{K} \gamma_k x_{ki}$
  ▶ then $E(WTP)$ becomes individual
  ▶ So we have to think about how to aggregate individual expected WTPs
Other measures of welfare
▶ $E(WTP)$ is the most obvious measure
▶ However, the expectation is strongly influenced by the tail of the distribution $G(\cdot)$
  ▶ While we do not actually have data to fit it
  ▶ Since there are not many bids
  ▶ And it is not very sensible to ask about a very high bid
Truncated expectations
▶ $E(WTP) = \int_0^{\infty} \Pr\{Yes|b\}\,db$
▶ Historically, the highest bid has been used to truncate $E(WTP)$:
  $\int_0^{b_{max}} \Pr\{Yes|b\}\,db$
  ▶ However, that is not a proper expectation, since the support of $\Pr\{Yes|b\}$ does not stop at $b_{max}$
▶ An alternative uses the truncated distribution:
  $\int_0^{b_{max}} \Pr\{Yes|b\}/(1 - \Pr\{Yes|b_{max}\})\,db$
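A numeric sketch of the three integrals for an illustrative linear logit survival $S(b) = \Pr\{Yes|b\} = \Lambda(\alpha - \beta b)$ ($\alpha$, $\beta$ arbitrary):

alpha <- 2; beta <- 0.05; bmax <- 48
S <- function(b) plogis(alpha - beta * b)         # Pr{Yes|b}
E.untr <- integrate(S, 0, Inf)$value              # untruncated E(WTP)
E.bmax <- integrate(S, 0, bmax)$value             # truncated at the highest bid
E.trd  <- integrate(function(b) S(b) / (1 - S(bmax)), 0, bmax)$value
c(E.untr, E.bmax, E.trd)                          # three different answers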
Median WTP
▶ Finally, the median has been suggested as a more robust measure: $b_{median}$ s.t. $\Pr\{Yes|b_{median}\} = 1/2$
  ▶ i.e. the bid s.t. 50% would be favorable
Shape of the WTP function
▶ Clearly, the form of the WTP function depends on the form of $V(\cdot)$
  ▶ For some forms, some values of $\beta$ lead to impossibilities

Distribution    Expected                                                  Median
Normal          $\alpha/\beta$                                            $\alpha/\beta$
Logistic        $\alpha/\beta$                                            $\alpha/\beta$
Log-normal      $\exp(\alpha/\beta)\exp(1/(2\beta^2))$                    $\exp(\alpha/\beta)$
Log-logistic    $\exp(\alpha/\beta)\,\Gamma(1 - 1/\beta)\Gamma(1 + 1/\beta)$   $\exp(\alpha/\beta)$

▶ Again, if $\alpha_i = \sum_{k=0}^{K} \gamma_k x_{ki}$, each of these forms is individual
  ▶ So the question arises whether to compute a sample mean or a sample median
  ▶ DCchoice appears to compute a sample mean
  ▶ But is not explicit
How do we choose a welfare measure ?
▶ That might be the time for a debate?
▶ 3 welfare measures: untruncated, properly truncated, median
  ▶ Of course we would not select an infinite one
  ▶ The smallest estimate, to avoid criticism?
  ▶ Do these measures differ significantly from each other?
  ▶ We will see that later
▶ 4 well-known distributions
  ▶ (log-)normal, (log-)logistic: they do not differ substantially
  ▶ There are others
  ▶ There are also other estimators, e.g. non-parametric
  ▶ The estimate of the best-fit model?
▶ 2 aggregation rules: sample mean or sample median
  ▶ Researchers usually take the first, but what is the meaning of a sample mean of individual medians?
▶ DCchoice does not provide any guidance
Computing the welfare measure
▶ DCchoice computes them automatically
▶ With glm, use the above formulas
  ▶ See the code in RAE2017.R
▶ Much as I like DCchoice, I must note that for the data we use (NaturalPark)
  ▶ the mean WTP does not coincide with the median
  ▶ for the symmetrical distributions (normal & logistic)
  ▶ That is a problem; I should write to the authors
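With glm and a linear-in-bid logit (so that mean = median = $\alpha_i/\beta$ for each individual), a sketch of the two aggregation rules (fit and names illustrative):

fit.lin <- glm(R1 ~ sex + age + income + bid1,
               family = binomial(link = "logit"), data = NP)
cf   <- coef(fit.lin)
beta <- -cf["bid1"]                               # the bid coef. estimates -beta
WTPi <- (model.matrix(fit.lin)[, names(cf) != "bid1"] %*%
         cf[names(cf) != "bid1"]) / beta          # individual alpha_i / beta
c(mean = mean(WTPi), median = median(WTPi))       # two aggregation rules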
Confidence Intervals
▶ In the end, WTP, under any of its forms, is an estimate
  ▶ As such it has a confidence interval
  ▶ Much as for $\hat\beta$, you should always report the CI
  ▶ At a minimum to give an idea of the variance
  ▶ and to show whether it is significantly different from zero
▶ There are 2 main methods
  ▶ Krinsky & Robb
  ▶ Bootstrap
▶ These methods are much broader than valuation
  ▶ They are useful in all types of research in applied econometrics
Constructing Confidence Intervals: Krinsky and Robb
▶ Start with the estimated vector of coefficients
▶ By ML, it is distributed (multivariate) normally
  ▶ Its variance-covariance matrix has been estimated in the ML process
▶ Draw D times from a multivariate normal distribution with
  ▶ mean = the vector of estimated coefficients
  ▶ variance = the estimated variance-covariance matrix of these estimated coefficients
▶ So, we have D vectors of estimated coefficients
  ▶ If D is large, the average of these D vectors is just our original vector of coef.
Constructing Confidence Intervals: Krinsky and Robb
▶ Compute the welfare measure for each such replicated coefficient vector
  ▶ Thus we have D estimated welfare measures
  ▶ some large, some small: an empirical distribution
  ▶ order them from smallest to largest
  ▶ the 5% most extreme are deemed random
  ▶ the 95% most central are deemed reasonable
  ▶ and constitute the 95% confidence interval
▶ For example,
  ▶ the lower and upper bounds of the 95% confidence interval
  ▶ correspond to the 0.025 and 0.975 percentiles of the measures, respectively
Krinsky and Robb: Implementation in DCchoice
▶ Function krCI(.)
  ▶ constructs CIs for the four different WTPs
  ▶ estimated by functions sbchoice(.) or dbchoice(.)
▶ Call as krCI(obj, nsim = 1000, CI = 0.95)
  ▶ obj is an object of either the "sbchoice" or "dbchoice" class
  ▶ nsim is the number of draws of the parameters
  ▶ it influences machine time
  ▶ CI is the percentile of the confidence intervals to be estimated
  ▶ Returns a "krCI" object
  ▶ a table of the simulated confidence intervals
  ▶ vectors containing the simulated WTPs
▶ Is there a package that does Krinsky & Robb for glm objects? (a hand-rolled sketch follows)
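I am not aware of one, but it is easy to do by hand with MASS::mvrnorm, here for the linear-in-bid logit fit.lin above (mean WTP evaluated at the sample average point):

library(MASS)                                     # for mvrnorm
set.seed(123)
draws <- mvrnorm(1000, mu = coef(fit.lin), Sigma = vcov(fit.lin))
Xbar  <- colMeans(model.matrix(fit.lin))
wtp.kr <- apply(draws, 1, function(cf)
  sum(Xbar[names(cf) != "bid1"] * cf[names(cf) != "bid1"]) / -cf["bid1"])
quantile(wtp.kr, c(0.025, 0.975))                 # Krinsky & Robb 95% CI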
Constructing Confidence Intervals: Bootstrap
▶ Similar to Krinsky & Robb
  ▶ except in the way the new estimated coefficients are obtained
  ▶ Essentially, instead of simulating new estimates,
  ▶ we simulate new data
  ▶ and then calculate new estimates
Mediocrity principle
▶ Consider that our sample is mediocre in the population
  ▶ We mean: it does not have anything exceptional
▶ Then, if we could draw a new sample from that population,
  ▶ we would surely obtain a fairly mediocre sample
  ▶ that is, fairly similar to the original one
Bootstrap principle
▶ It's not possible to draw a new sample
▶ But imagine that, using the original sample, we draw one obs.,
  ▶ and we "put it back" in the sample ("replace")
  ▶ then we draw again
  ▶ repeat until we have the same number of obs. as in the original sample
  ▶ call this a bootstrap sample
  ▶ Each original obs. appears 0 to n times
▶ By the mediocrity principle
  ▶ the bootstrap sample is fairly close to a real new sample
  ▶ Estimate a new vector of coefficients
  ▶ Repeat D times (see the sketch below)
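The same CI by hand with the bootstrap, for the linear-in-bid logit above (slow: D model fits):

set.seed(123)
wtp.bs <- replicate(1000, {
  bs <- NP[sample(nrow(NP), replace = TRUE), ]    # one bootstrap sample
  cf <- coef(glm(R1 ~ sex + age + income + bid1,
                 family = binomial(link = "logit"), data = bs))
  Xb <- colMeans(model.matrix(~ sex + age + income, data = bs))
  sum(Xb * cf[names(cf) != "bid1"]) / -cf["bid1"] # mean WTP at the X-bar point
})
quantile(wtp.bs, c(0.025, 0.975))                 # bootstrap 95% CI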
Bootstrap: Implementation in DCchoice
▶ Function bootCI() carries out the bootstrapping
  ▶ and returns the bootstrap confidence intervals
  ▶ call: bootCI(obj, nboot = 1000, CI = 0.95)
▶ Slower than K&R, since each sample must be generated
  ▶ and new estimates computed
  ▶ while K&R only simulates new estimates
▶ In the end, the results are similar
  ▶ See RAE2017.R
▶ Note: another option would be the resample cmd
  ▶ Applicable to glm
  ▶ But I don't develop it here
Differences of welfare measures
▶ Sometimes we want to know whether a welfare measure differs significantly from another
  ▶ In other words: is their difference significantly different from zero?
  ▶ In terms of CI: does the CI of their difference include zero?
▶ To compute that: bootstrap similarly
  ▶ Krinsky and Robb is also possible, but
  ▶ if the welfare measures are independent, their difference has a variance-covariance that is the sum of each variance-covariance
  ▶ if they are not independent, then it's difficult
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Double-bounded CV Estimation
▶ To increase the amount of information collected by the survey,
  ▶ the valuation question is asked a 2nd time
  ▶ If the respondent answered Yes (No), s/he is then asked about a higher (lower) bid
▶ Called double-bounded dichotomous choice
  ▶ or dichotomous choice with follow-up
▶ More precisely, the phrasing could be
  ▶ If Z cost you b €, would you be willing to pay it?
  ▶ If answered Yes: would you be willing to pay $b^U$ €? $b^U > b$
  ▶ If answered No: would you be willing to pay $b^L$ €? $b^L < b$
Double-bounded CV Estimation
▶ There are 4 outcomes per respondent
  ▶ YY, YN, NY, NN
  ▶ YY indicates that $WTP > b^U$
  ▶ YN that $b < WTP < b^U$
  ▶ NY that $b^L < WTP < b$
  ▶ NN that $WTP < b^L$
▶ Thus the answers are intervals
  ▶ Probit & logit do not suffice
  ▶ Many use ML
  ▶ But the likelihood function is different
Estimation with DBDC data
▶ To develop the likelihood function,
  ▶ it is necessary to express the probabilities first
▶ Write $P^{YY}$ for the probability of answering Yes, Yes:
  ▶ $P^{YY} = \Pr\{b^U < WTP\} = 1 - G(b^U)$
  ▶ $P^{YN} = \Pr\{b < WTP < b^U\} = G(b^U) - G(b)$
  ▶ $P^{NY} = \Pr\{b^L < WTP < b\} = G(b) - G(b^L)$
  ▶ $P^{NN} = \Pr\{WTP < b^L\} = G(b^L)$
▶ For a sample of $N$ obs.,
  $\ln L = \sum_{n=1}^{N}\left[d_n^{YY}\ln P_n^{YY} + d_n^{YN}\ln P_n^{YN} + d_n^{NY}\ln P_n^{NY} + d_n^{NN}\ln P_n^{NN}\right]$
  where $n$ indexes individuals and $d_n^{XX}$ indicates whether $n$ answered $XX$ (dichotomous variable)
Estimation with DBDC data
▶ There is no direct command corresponding to such a likelihood
  ▶ It must be programmed
  ▶ This is called "Full Information Maximum Likelihood" (FIML)
  ▶ We don't do this here
  ▶ It is pre-programmed in DCchoice (see the sketch below)
  ▶ For the same basic choices of distribution as for SBDC data
▶ Endogeneity issue
  ▶ The 2nd bid is not exogenous
  ▶ since it depends on the previous answer
  ▶ Thus it contains unobserved characteristics of the individuals
  ▶ Such unobservables also determine the 2nd choice
  ▶ This is in principle addressed by FIML
▶ A more general model is the bivariate probit
  ▶ allowing the 2 answers to have less than perfect correlation
  ▶ not covered by DCchoice
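A sketch of the DCchoice call on the NaturalPark data; the two-response, two-bid formula convention is my reading of the package, so check help(dbchoice):

NP$R2   <- ifelse(NP$answers == "yy" | NP$answers == "ny", 1, 0)  # 2nd answer
NP$bid2 <- ifelse(NP$R1 == 1, NP$bidh, NP$bidl)   # follow-up bid actually faced
DB.NP.DC <- dbchoice(R1 + R2 ~ sex + age + income | log(bid1) + log(bid2),
                     data = NP, dist = "log-logistic")
summary(DB.NP.DC)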
Estimation with DBDC data: Std normal cdf
▶ $P^{YY} = \Pr\{b^U < WTP\} = 1 - \Phi(-\alpha + \beta b^U)$
▶ $P^{YN} = \Pr\{b < WTP < b^U\} = \Phi(-\alpha + \beta b^U) - \Phi(-\alpha + \beta b)$
▶ $P^{NY} = \Pr\{b^L < WTP < b\} = \Phi(-\alpha + \beta b) - \Phi(-\alpha + \beta b^L)$
▶ $P^{NN} = \Pr\{WTP < b^L\} = \Phi(-\alpha + \beta b^L)$
▶ So: we estimate the same coefficients $\alpha$ and $\beta$ as in SBDC
  ▶ but with more data, so it is more efficient
  ▶ assuming people answer both valuation questions in the same way
▶ So: the computation of the welfare measures is the same
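For intuition, a minimal hand-coded version of this log-likelihood (intercept and bid only, standard normal cdf), maximized with optim; a sketch, not the DCchoice implementation:

negLL.db <- function(theta, b, bU, bL, yy, yn, ny, nn) {
  a <- theta[1]; be <- theta[2]
  P <- function(p) pmax(p, 1e-12)                 # guard against log(0)
  PYY <- P(1 - pnorm(-a + be * bU))
  PYN <- P(pnorm(-a + be * bU) - pnorm(-a + be * b))
  PNY <- P(pnorm(-a + be * b)  - pnorm(-a + be * bL))
  PNN <- P(pnorm(-a + be * bL))
  -sum(yy * log(PYY) + yn * log(PYN) + ny * log(PNY) + nn * log(PNN))
}
with(NP, optim(c(1, 0.05), negLL.db, b = bid1, bU = bidh, bL = bidl,
               yy = answers == "yy", yn = answers == "yn",
               ny = answers == "ny", nn = answers == "nn"))$par  # (alpha, beta)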
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML : Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Context
▶ 1989: about 35,000 tons of crude oil spilled at sea
▶ The slick ended up extending over about 26,000 km² at sea
  ▶ and soiled 1,600 km of coastline
▶ Most of the damage was in Prince William Sound and the Gulf of Alaska, up to the Kodiak Islands
▶ Several types of damages
  ▶ Professional fishing (minimal)
  ▶ Tourism (possibly a benefit)
  ▶ Environmental heritage loss
  ▶ Punitive damages (supposed to be an incentive)
▶ Valuation survey
Exxon-Valdez Questionnaire
▶ Avoid willingness-to-accept questions:
  ▶ because of presumed strategic behaviour
▶ The basic question is
  ▶ Compensation for the loss of an environmental heritage during 10 years?
  ▶ Scenario: after 10 years the environmental damage will be fully recovered
▶ Convert such a "WTA" into a WTP to avoid the catastrophe happening again for 10 years
  ▶ Scenario: in 10 years' time, a similar catastrophe will be impossible thanks to double hulls
▶ The scenario need not be true
  ▶ since we are investigating human preferences for things that may never happen
  ▶ However, it must appear credible to respondents