Research in Applied Econometrics
Chapter 3. Contingent Valuation Econometrics
Pr. Philippe Polomé, Université Lumière Lyon 2
M1 APE Analyse des Politiques Économiques / M1 RISE Gouvernance des Risques Environnementaux
2018 – 2019
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML: Logit & Probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Purpose
- Operationalize the theoretical notions of value
- In practice, it is impossible to measure every individual's benefit
  - We resort to statistical techniques
  - Representative samples with control variables enable inference to the population
- Individual benefits are never identified
  - Econometric techniques are thus essential
- We'll need the packages
  - DCchoice
  - Ecdat, stats (should be there already)
Classification of Valuation Techniques
- Based on stated preferences
  - Contingent Valuation ("Évaluation Contingente")
  - Choice experiments / contingent choices
- Based on revealed preferences
  - Transport cost: estimating demand for transport
  - Hedonic prices: estimating demand for housing
- Based on (inferred) prices
- Others, not based on preferences
Stated Preferences Techniques
- A sample of people is surveyed directly on their preferences about a public project
  - to infer a measure of individual statistical value
  - at the population level
- Interviews can take many forms: telephone, postal mail, e-mail, website
  - preferably face-to-face, but that is more expensive
  - or combinations
- The sample depends on the objective
Contingent Valuation
- One potential environmental change z0 → z1 is described
  - together with its stated cost
  - The context of that cost matters: taxes, fees, prices...
- A single question is asked: for or against the change
  - The question is sometimes repeated with another cost
  - At this stage, we model only the first question
Questionnaire Structure: wide - precise - wide
- Opening questions
  - Possible filters to select certain respondents
- General questions on the environment, leading to the particular case of interest
  - while making the respondent think about it
  - We want informed, considered answers
- Evaluation question
- Debriefing
  - Why did the respondent answer as he or she did?
  - Did he or she not believe the scenario?
- Collect data on specific potential explanatory variables
  - e.g. in a survey on lake quality: what use does the respondent make of the lake?
  - Tourism, recreation (boating, fishing...)
- Socio-economic data
  - primarily for inference to the population
The Dichotomous Format
- This is the most popular, least controversial "elicitation" format
  - assumed least prone to untruthful answers
  - while not too demanding for the respondent
- Consider an environmental change from z0 to z1
  - To simplify, consider only an improvement
  - Respondents are proposed a "bid" b
  - They answer yes or no
  - but may also state that they don't know, or refuse to answer
- This is similar to a "posted price" market context
  - There is a good (the environmental improvement)
  - The situation is a bit like asking whether to buy it
  - Respondents are routinely in such situations
  - except not in a public-good context
- Further, "buying" cannot really be that, so we need a context; we discuss that later
The Dichotomous Format
- Formalizing, let the indirect utility be v(z, y) + ε
  - It represents the individual's preferences
  - from the point of view of the researcher
  - The error term ε reflects multiple influences the researcher does not know about
  - These influences are modeled as a random variable, but that does not mean people act randomly
  - This is called a Random Utility Model (RUM)
- If the answer is "Yes", then it must be that

  v(z1, y − b) + ε1 ≥ v(z0, y) + ε0

  and thus that WTP > b
WTP Distribution
- 4 to 6 (different) bids are proposed to different respondents
  - Each respondent ever sees only one bid
  - Consider the proportion of "Yes" for each of these bids
- Assume that
  - for a bid of 0, the proportion of Yes is 100%
  - for some high bid the proportion would be zero
  - respondents have a single form of their utility function
  - but differ according to observable data X and unobservables ε
- "Connect the dots" as an estimate of the WTP distribution
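The "connect the dots" idea can be sketched in a few lines: compute the proportion of Yes at each bid level, then interpolate linearly between bids. All data below are made up for illustration.

```python
# Sketch (hypothetical data): the "connect-the-dots" estimate of the WTP
# survival curve from the proportion of Yes answers at each bid level.
bids    = [0, 6, 12, 24, 48]           # bid amounts proposed (0 = free)
answers = {                            # hypothetical yes(1)/no(0) answers per bid
    0:  [1]*20,                        # everyone accepts a free improvement
    6:  [1]*15 + [0]*5,
    12: [1]*11 + [0]*9,
    24: [1]*6  + [0]*14,
    48: [1]*2  + [0]*18,
}

# Proportion of "Yes" at each bid: an estimate of Pr{WTP > b}
prop_yes = {b: sum(a) / len(a) for b, a in answers.items()}

def survival(b):
    """Linear interpolation ("connect the dots") between observed bids."""
    for b0, b1 in zip(bids, bids[1:]):
        if b0 <= b <= b1:
            p0, p1 = prop_yes[b0], prop_yes[b1]
            return p0 + (p1 - p0) * (b - b0) / (b1 - b0)
    return 0.0  # beyond the highest bid

print(prop_yes[6])   # 0.75
print(survival(9))   # halfway between 0.75 and 0.55
```

The interpolated curve is only a crude estimate; the parametric models below replace it with a smooth distribution.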
WTP Distribution
- Going back to the indirect utility function: the answer is "Yes" when

  v(z1, y − b) + ε1 ≥ v(z0, y) + ε0

- In other words,

  Pr{Yes|b} = Pr{v(z1, y − b) + ε1 ≥ v(z0, y) + ε0}
            = Pr{ε0 − ε1 ≤ v(z1, y − b) − v(z0, y)}
            = Pr{ε0 − ε1 ≤ g(b, y, ...)}

  where g() has some properties inherited from v(.), see later
- If we make a hypothesis on the distribution of ε0 − ε1,
  - we have a model that can be estimated by Maximum Likelihood
  - Logistic: Logit
  - Normal: Probit
Maximum Likelihood Estimation
Density
- f(y|θ): probability density function (pdf) of a random variable y
  - conditioned on a set of parameters θ
  - It represents mathematically the data generating process of each observation in a sample of data
- The joint density of n independent and identically distributed (iid) observations
  - is the product of the individual densities:

  f(y1, ..., yn|θ) = ∏_{i=1}^{n} f(yi|θ) = L(θ|y)

- This joint density is the likelihood function
  - a function of the unknown parameter vector θ
  - y is used to indicate the collection of sample data

Likelihood Function
- Intuitively, this is much the same as a joint probability
  - Consider two (6-sided) dice
  - What is the probability of rolling a 3 and a 6?
- The likelihood function extends the idea of the probability of the sample
  - except that, for continuous variables, points have probability mass zero
Conditional Likelihood
- Generalize the likelihood function to allow the density to depend on conditioning variables: f(yi|xi, θ)
  - Take the classical LRM yi = xi β + εi
  - Suppose ε is normally distributed with mean 0 and variance σ²: ε ∼ N(0, σ²)
  - Then yi ∼ N(xi β, σ²)
  - Thus the yi are not identically distributed: they have different means
  - But they are independent, so that (yi − xi β)/σ ∼ N(0, 1)
  - thus

  L(θ|y, X) = ∏_i f(yi|xi, θ) = ∏_i (2πσ²)^(−1/2) exp(−(yi − xi β)² / (2σ²))
Conditional log-Likelihood
Usually it is simpler to work with the log:

  ln L(θ|y) = ∑_{i=1}^{n} ln f(yi|θ)

thus

  ln L(θ|y, X) = ∑_i ln f(yi|xi, θ) = −(1/2) ∑_{i=1}^{n} [ln σ² + ln(2π) + (yi − xi β)²/σ²]

where X is the n × K matrix of data with ith row equal to xi

Identification
- Now that we have ln L, how do we use it to obtain estimates of the parameters θ?
  - and to test hypotheses about them?
- There is the preliminary issue of identification
  - whether estimation of the parameters is possible at all
  - This is about the formulation of the model
  - The question is: suppose we had an infinitely large sample, could we uniquely determine the values of θ from it?
  - The answer is sometimes no
  - e.g. in the LRM yi = xi β + εi when there is multicollinearity
Example: Identification via Normalization
Consider the LRM yi = β1 + β2 xi + εi, where εi|xi ∼ N(0, σ²).
- e.g. a consumer's purchase of a large commodity such as a car, where
  - xi is the consumer's income
  - yi is the difference between what the consumer is willing to pay for the car, pi*, and the price of the car, pi
- Suppose that rather than observing pi* or pi,
  - we observe only whether the consumer actually purchases the car
  - Assume this occurs when yi = pi* − pi > 0
- The model states that the consumer purchases the car if yi > 0
  - and does not purchase otherwise
- The random variable in this model is "purchase" or "not purchase"
  - there are only 2 outcomes
Example: Identification via Normalization
The probability of a purchase is

  Pr{purchase|β1, β2, σ, xi} = Pr{yi > 0}
                             = Pr{β1 + β2 xi + εi > 0}
                             = Pr{εi > −β1 − β2 xi}
                             = Pr{εi/σ > (−β1 − β2 xi)/σ}
                             = Pr{zi > (−β1 − β2 xi)/σ}

where zi has a standard normal distribution.
The probability of not purchasing is one minus this probability.
Example: Identification via Normalization (cont.)
Thus the likelihood function is

  ∏_{i=purchase} Pr{purchase|β1, β2, σ, xi} × ∏_{i=not purchase} [1 − Pr{purchase|β1, β2, σ, xi}]

This is often rewritten as

  ∏_i [Pr{purchase|β1, β2, σ, xi}]^{yi} [1 − Pr{purchase|β1, β2, σ, xi}]^{(1−yi)}

The parameters of this model are not identified:
- If β1, β2 and σ are all multiplied by the same nonzero constant,
  - then Pr{purchase} and the likelihood function do not change
- This model requires a normalization
  - The one usually used is σ = 1.
Maximum Likelihood Estimation Principle
- We see that with a discrete random variable,
  - f(yi|θ) is the probability of observing yi conditionally on θ
  - The likelihood function is then the probability of observing the sample Y conditionally on θ
- Assume that the sample we have observed is the most likely one
  - What value of θ makes the observed sample most likely?
  - Answer: the value of θ that maximizes the likelihood function,
  - since then the observed sample has maximum probability
- When y is a continuous random variable, instead of a discrete one,
  - we can no longer say that f(yi|θ) is the probability of observing yi conditionally on θ,
  - but we retain the same principle.
Maximum Likelihood Estimation Principle
- The value of the parameter vector that maximizes L(θ|data) is the maximum likelihood estimate θ̂
  - The vector that maximizes L(θ|data) is the same as the one that maximizes ln L(θ|data)
- The necessary condition for maximizing ln L(θ|data) is ∂ln L(θ|data)/∂θ = 0
  - These are called the likelihood equations

Example: assume a sample Y from an N(μ, σ²). The ln L function is

  ln L(μ, σ²) = −(n/2) ln(2π) − (n/2) ln σ² − (1/(2σ²)) ∑_{i=1}^{n} (yi − μ)²

- The likelihood equations are
  - ∂ln L/∂μ = (1/σ²) ∑_{i=1}^{n} (yi − μ) = 0 and
  - ∂ln L/∂σ² = −n/(2σ²) + (1/(2σ⁴)) ∑_{i=1}^{n} (yi − μ)² = 0
- These equations admit an explicit solution:
  - μ̂_ML = (1/n) ∑_{i=1}^{n} yi = ȳ and
  - σ̂²_ML = (1/n) ∑_{i=1}^{n} (yi − ȳ)²
- Thus the sample mean is the ML estimator
  - The ML estimator of the variance is not the OLS estimator (which has an n−1 denominator)
  - In small samples, this estimator is biased, but as n → ∞ that does not matter
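The closed-form solutions above are easy to check numerically; a sketch with made-up data, comparing the log-likelihood at the ML solution against nearby parameter values:

```python
import math

# Hypothetical sample: check the closed-form ML estimators for N(mu, sigma^2)
y = [2.0, 3.5, 1.0, 4.5, 3.0, 2.5, 4.0, 3.5]
n = len(y)

mu_hat = sum(y) / n                                # ML estimate = sample mean
s2_hat = sum((yi - mu_hat) ** 2 for yi in y) / n   # ML variance: n denominator

def lnL(mu, s2):
    """Normal log-likelihood, as in the formula above."""
    return (-n / 2 * math.log(2 * math.pi) - n / 2 * math.log(s2)
            - sum((yi - mu) ** 2 for yi in y) / (2 * s2))

# The ML solution should beat any nearby parameter values
assert lnL(mu_hat, s2_hat) >= lnL(mu_hat + 0.1, s2_hat)
assert lnL(mu_hat, s2_hat) >= lnL(mu_hat, s2_hat * 1.1)
print(mu_hat, s2_hat)  # 3.0 1.125
```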
ML Properties 1
- Conditionally on correct distributional assumptions,
  - and under regularity conditions,
  - ML has very good properties
  - in a sense, because the information supplied to the estimator is very good: not only the sample but also the full distribution
- Notation: θ̂ is the ML estimator
  - θ0 is the true value of the parameter vector
  - θ is any other value
- ML has only asymptotic properties
  - in small samples, it may be biased or inefficient
ML Properties 2
- Consistency: plim θ̂ = θ0
- Asymptotic normality: θ̂ ∼ N[θ0, {I(θ0)}⁻¹]
  - where I(θ0) = −E[∂²ln L/∂θ0 ∂θ0′] is the information matrix
  - ∂f/∂θ0 indicates ∂f/∂θ evaluated at θ0
- Asymptotic efficiency: θ̂ is asymptotically efficient
  - It achieves the Cramér-Rao lower bound for consistent estimators
- Invariance: the ML estimator of γ0 = c(θ0) is c(θ̂)
  - if c(θ) is a continuous and continuously differentiable function.
Application of ML: Logit & Probit
Specification ("how the probability is written")
- Let a class of non-linear models with dichotomous responses:

  Pr(y = 1|X) = G(Xβ)

  where G is a function taking values between zero and one: 0 ≤ G(Xβ) ≤ 1
  - This guarantees that estimated response probabilities lie between zero and one
  - That is not the case with the LRM
- Therefore, there is a non-linear relation between y and X
- Many functions could do this job
  - Two are popular: the logistic and the normal
Logit & Probit
- In the Logit model, G is the distribution function (cumulative density) of the standard logistic r.v.:

  G(Xβ) = exp(Xβ) / [1 + exp(Xβ)] = Λ(Xβ)

- In the Probit model, G is the distribution function of the standard normal r.v., whose density is noted φ(.):

  G(Xβ) = ∫_{−∞}^{Xβ} φ(t) dt = Φ(Xβ)   with   φ(t) = (2π)^{−1/2} exp(−t²/2)
Logit vs. Probit
- The logistic and normal distributions are similar
- The logistic makes computations and analysis easier and allows for simplifications in more advanced models
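The similarity of the two link functions can be seen numerically; a sketch where Φ is computed from the error function, and the logistic index is rescaled by a rule-of-thumb factor (values between roughly 1.5 and 1.8 are quoted in the literature; 1.6 is used here):

```python
import math

def logistic_cdf(x):
    """Standard logistic distribution function Λ(x)."""
    return math.exp(x) / (1.0 + math.exp(x))

def normal_cdf(x):
    """Standard normal distribution function Φ(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Compare the two curves over a grid; rescaling the logistic index by
# about 1.6 makes it track the normal closely
for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, round(normal_cdf(x), 3), round(logistic_cdf(1.6 * x), 3))
```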
Latent Variable Model
- Let y* be a latent variable (that is, not directly observed) such that

  y* = Xβ + ε

  - e.g. y* is the utility from an environmental improvement Δz
- Logit & probit may be derived from a latent variable model
  - satisfying all the classical LRM hypotheses
- Utility is not observed; we only observe the consequence of the individual decision:

  y* < 0  ⟹  y = 0
  y* ≥ 0  ⟹  y = 1

- We observe whether the person is (y = 1) or is not (y = 0) willing to pay an amount b to "buy" an environmental improvement z1 − z0 = Δz
Response Probability
- Hypotheses on ε:
  - independent of X
  - standard logistic or standard normal
- Response probability for y:

  Pr{y = 1|X} = Pr{y* ≥ 0|X} = Pr{ε > −Xβ|X} = 1 − G(−Xβ) = G(Xβ)

- Since ε is normal or logistic, it is symmetric around zero,
  - thus 1 − G(−Xβ) = G(Xβ)

Maximum Likelihood Estimation
- As indicated earlier, the likelihood function for the dichotomous case is

  ∏_i [Pr{willing|β, σ, Xi}]^{yi} [1 − Pr{willing|β, σ, Xi}]^{(1−yi)}

- ML seeks the maximum of (the log of) this function
  - It does not have an explicit solution
  - but yields numerical estimates β̂_ML
  - consistent but biased in small samples
  - asymptotically efficient & normal
  - as long as the model hypotheses are true
- So, if you used Probit: is ε really normal?
  - If the distribution hypothesis is not true, sometimes we may retain the properties
  - On the other hand, endogeneity of X is as serious as usual
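As a sketch of what the numerical maximization does, here is a logit fit by Newton-Raphson on hypothetical simulated data (one regressor plus an intercept; glm does essentially this, with a more refined algorithm):

```python
import math, random

random.seed(1)

# Simulate hypothetical logit data: Pr{y=1|x} = Λ(a + b*x) with a=0.5, b=-1
a_true, b_true = 0.5, -1.0
x = [random.uniform(-2, 2) for _ in range(2000)]
lam = lambda t: 1.0 / (1.0 + math.exp(-t))
y = [1 if random.random() < lam(a_true + b_true * xi) else 0 for xi in x]

# Newton-Raphson on the logit log-likelihood: score = X'(y - p),
# observed information = X'WX with weights w = p(1 - p)
a, b = 0.0, 0.0
for _ in range(25):
    p = [lam(a + b * xi) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))                 # score wrt a
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))   # score wrt b
    w = [pi * (1 - pi) for pi in p]
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    a += ( h11 * g0 - h01 * g1) / det    # Newton step: solve (X'WX) d = score
    b += (-h01 * g0 + h00 * g1) / det

print(round(a, 2), round(b, 2))  # should be close to 0.5 and -1
```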
Marginal Effect of a Continuous Regressor xj
- The effect of a marginal change in xj
  - on the response probability Pr{y = 1|X} = p(X)
  - is given by the partial derivative

  ∂p(X)/∂xj = ∂G(Xβ)/∂xj = g(Xβ) βj

- This is the marginal effect of xj
  - It depends on the values taken by all the regressors (not just xj)
  - Compare to the LRM: ∂y/∂xj = βj
  - It cannot bring the probability below zero or above one
Marginal Effect of a Continuous Regressor xj
- Thus, the marginal effect is a non-linear combination of the regressors
- It can be calculated at "interesting" points of X
  - e.g. X̄, the sample average point: ∂p/∂xj evaluated at X̄
  - However, that does not mean much for discrete regressors, e.g. gender
- Or it can be calculated for each i in the sample, ∂p/∂xj at Xi,
  - and then we can compute the average of these "individual" marginal effects
- In general, the two are not the same: the average of the individual effects differs from the effect at the average X̄
- Which one do we choose?
  - Often, that is too complicated for presentation
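The two summaries rarely coincide; a sketch with hypothetical logit coefficients and a small made-up sample of x values:

```python
import math

# Hypothetical logit model Pr{y=1|x} = Λ(b0 + b1*x) and a small sample of x
b0, b1 = -1.0, 2.0
xs = [0.0, 0.5, 1.0, 1.5, 2.0]

lam  = lambda t: 1.0 / (1.0 + math.exp(-t))
dens = lambda t: lam(t) * (1.0 - lam(t))     # logistic density g = Λ(1 − Λ)

# Marginal effect evaluated at the sample mean of x
x_bar = sum(xs) / len(xs)
me_at_mean = dens(b0 + b1 * x_bar) * b1

# Average of the individual marginal effects
avg_me = sum(dens(b0 + b1 * xi) * b1 for xi in xs) / len(xs)

print(me_at_mean, avg_me)   # close, but not equal
```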
Marginal Effect of a Discrete Regressor xj
- Effect of a change in a discrete xj
  - from a to b (often, from 0 to 1)
  - on the response probability Pr{y = 1|X} = p(X)
  - Write X−j for the set of all the regressors but xj, and similarly β−j
  - The discrete change in p̂(Xi) is

  Δp̂(Xi) = G(X−j,i β̂−j + b β̂j) − G(X−j,i β̂−j + a β̂j)

- Such a discrete effect differs from individual to individual
  - Even the sign could in principle differ
- In R, such effects are not calculated automatically
  - The above formula must be calculated explicitly

Compare Logit & Probit
- The sizes of the coefficients are not comparable between these models
  - Approximately, multiplying a probit coefficient by 1.5 yields the logit coefficient (rule of thumb!)
- The marginal effects should be approximately the same
Measures of the Quality of Fit
- The correctly predicted percentage may be appealing
  - For each i, compute the fitted probability that yi takes value 1: G(Xi β̂)
  - If ≥ .5 we "predict" yi = 1, and zero otherwise
  - Compute the % of correct predictions
- Problem: it is possible to see a high % correctly predicted while the model is not very useful
  - e.g. in a sample of 200 with 180 obs. of yi = 0, of which 150 are predicted zero, and 20 obs. of yi = 1 all predicted zero
  - The model is clearly poor
  - but we still have 75% correct predictions
  - while a flat prediction of 0 has 90% correct predictions
- A better measure is a 2×2 table as in the next slide
Goodness of Fit: Predictive Table

              Observed
              yi = 1   yi = 0   Total
Predicted
  ŷi = 1        350      122     472
  ŷi = 0         78      203     281
  Total         428      325     753

We'll see the R command later
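Such a table is just a cross-tabulation of predicted against observed outcomes; a minimal sketch with hypothetical fitted probabilities:

```python
# Hypothetical fitted probabilities and observed outcomes
p_hat = [0.9, 0.8, 0.3, 0.6, 0.2, 0.7, 0.4, 0.1]
y_obs = [1,   1,   0,   1,   0,   0,   0,   1]

pred = [1 if p >= 0.5 else 0 for p in p_hat]   # classify at the 0.5 cutoff

# 2x2 predictive table: table[predicted][observed]
table = [[0, 0], [0, 0]]
for yp, yo in zip(pred, y_obs):
    table[yp][yo] += 1

correct = table[0][0] + table[1][1]
print(table, correct / len(y_obs))  # [[3, 1], [1, 3]] 0.75
```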
Goodness of Fit: Pseudo R-square
- Pseudo-R² = 1 − ln L_UR / ln L_0
  - ln L_UR: log-likelihood of the estimated model
  - ln L_0: log-likelihood of a model with only the intercept
  - i.e. forcing all β = 0 except for the intercept
- Similar to an R² for OLS regression
  - since R² = 1 − SSR_UR / SSR_0
- Other measures of the quality of fit exist
  - but the fit is not what maximum likelihood is seeking,
  - contrarily to least squares
  - I stress more the statistical significance of regressors
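The computation itself is one line (the log-likelihood values below are made up):

```python
# Hypothetical log-likelihoods: the fitted model and the intercept-only model
lnL_UR = -120.0   # estimated model
lnL_0  = -150.0   # intercept-only model

pseudo_r2 = 1.0 - lnL_UR / lnL_0
print(pseudo_r2)  # 0.2
```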
Outline
Principles of Contingent Valuation Maximum Likelihood Estimation Application of ML : Logit & probit
Single-bounded CV EstimationApplication
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Utility
- Come back to economic value
  - The Compensating Variation for a change from z0 to z1 is

  V(z1, y − CV) = V(z0, y)

  - CV is interpreted as the WTP to secure an improvement Δz = z1 − z0
- We observe Yes answers to a bid b when b < WTP
- Assuming a RUM, then

  Pr{Yes|b} = Pr{ε0 − ε1 ≤ v(z1, y − b) − v(z0, y)}

Linear Utility
- Suppose V(zj, y) = αj + βy,  j = 0, 1
  - β is the marginal utility of income
  - In principle, we would like it to decrease with income, but simplify
- The WTP is such that

  α0 + βy + ε0 = α1 + β(y − WTP) + ε1

  - Solving: WTP = (α + ε)/β, with α = α1 − α0 and ε = ε1 − ε0
  - So that e.g. E(WTP) = α/β
- The probability of Yes to bid b is then

  Pr{Yes|b} = Pr{v(z1, y − b) + ε1 − v(z0, y) − ε0 > 0} = Pr{ε ≥ βb − α}

  - This is a simple probit/logit context
  - Utility V is not identified, only the difference
Probit/Logit
- If ε ∼ N(0, 1) standard normal, then

  Pr{Yes|b} = 1 − Φ(βb − α) = Φ(α − βb)

- If ε is the standard logistic, then

  Pr{Yes|b} = 1 / [1 + exp(βb − α)]

- If we assume that WTP ≥ 0, then we can derive similarly:
  - with the log-normal specification: Pr{Yes|b} = Φ(α − β ln(b))
  - with the log-logistic specification: Pr{Yes|b} = 1 / [1 + exp(β ln(b) − α)]
- These last two are still the Probit and Logit models, respectively,
  - but the ln of the bid is used instead of the bid itself
  - and that is still compatible with the RUM
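The four response-probability curves can be sketched directly (α and β values are made up; Φ is computed from the error function):

```python
import math

alpha, beta = 2.0, 0.05   # hypothetical utility-difference parameters

Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def p_probit(b):       return Phi(alpha - beta * b)
def p_logit(b):        return 1.0 / (1.0 + math.exp(beta * b - alpha))
def p_lognormal(b):    return Phi(alpha - beta * math.log(b))
def p_loglogistic(b):  return 1.0 / (1.0 + math.exp(beta * math.log(b) - alpha))

# All four give Pr{Yes|b} decreasing in the bid b
for b in [5, 20, 40]:
    print(b, round(p_probit(b), 3), round(p_logit(b), 3),
          round(p_lognormal(b), 3), round(p_loglogistic(b), 3))
```

Note that at b = α/β the probit and logit probabilities both equal 1/2, which is the median WTP discussed later.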
Estimation
- Using these expressions, it is easy to derive the log-likelihood function
  - as in the previous section
- Usually, we want to account for individual characteristics Xi
  - those that are collected in the survey
  - This is done through αi = ∑_{k=0}^{K} γk xki
  - with x0i = 1 ∀i for an intercept
- Parametric estimation
  - glm function (in the core stats package)
  - DCchoice package
  - check or library( ) it
- Example with the NaturalPark data
  - of the Ecdat package (Croissant 2014)
Application
NaturalPark Data
- From the Ecdat package (Croissant 2014)
  - consists of WTP and other socio-demographic variables
  - for 312 individuals, regarding the preservation of the Alentejo natural park in Portugal
- See the help for NaturalPark; execute summary(NaturalPark)
  - 7 variables
  - bid1 is the first bid: min 6, max 48
  - bidh & bidl are second bids that we do not look into for the moment
  - answers is a factor {yy, yn, ny, nn} of the answers to the 2 bids
  - We'll use only the first letter, y for Yes
  - Socio-demographics are age, sex, income
Data Transformation
- Rename NP <- NaturalPark for simplicity
- Extract the answer to the first bid from "answers":
  NP$R1 <- ifelse(NP$answers == "yy" | NP$answers == "yn", 1, 0)
  - What does that do?
  - A call to the function ifelse( ) takes the form ifelse(condition, true, false)
  - It returns true where condition is true, and false otherwise
  - The vertical bar | is an "or" operator in a logical expression
  - The prefix NP$ makes it possible to access each variable contained in NP directly
- Convert bid1 to its log (English log is French ln):
  NP$Lbid1 <- log(NP$bid1)
  - summary(NP) reveals that things have gone smoothly
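The recoding step is just a vectorized condition; the same logic in Python, with a hypothetical vector of answer strings:

```python
# Hypothetical vector of answers to the two bids, as in NaturalPark$answers
answers = ["yy", "yn", "ny", "nn", "yy", "ny"]

# R1 = 1 if the FIRST answer was Yes ("yy" or "yn"), else 0
# (mirrors: NP$R1 <- ifelse(NP$answers == "yy" | NP$answers == "yn", 1, 0))
R1 = [1 if a in ("yy", "yn") else 0 for a in answers]

print(R1)  # [1, 1, 0, 0, 1, 0]
```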
Estimation Using glm
- glm is a classical function for many models of the type G(Xβ)
  - Its use is much like lm,
  - but you have to specify the link function G using the option family =
  - This is fairly flexible, but a bit complicated; see RAE2017 for the specifications
- summary works on glm, as do several usual commands
  - The output is in part similar to lm
  - coefficient values next to variable names, with their significance
  - interpreted in a way similar to lm
  - A negative (significant) coefficient implies a negative impact on Pr{Yes|b}
  - Note: "sexfemale" indicates the effect for women
  - Also gives the ln L value at the optimum
- Such models have no explicit solutions; they are solved numerically by algorithms
  - Newton-type
  - iterating until a convergence condition is met
  - Risk of a local maximum with a poor starting point
Estimation Using DCchoice
- DCchoice is designed for such data
  - but not for other contexts
- The format for single-bounded is sbchoice(formula, data, dist = "log-logistic")
  - The default dist is log-logistic
  - This is in fact logistic, but the bid variable is interpreted in log
  - formula follows Response ~ X | bid
  - | bid is mandatory
- The output is much more directly relevant for valuation purposes
  - bid is always shown last
  - log(bid) if log-logistic or log-normal was selected
  - Measures of mean WTP: we will see them later
Goodness of Fit
- table(predict(fitted model, type = "probability") > .5, NP$R1)
  - This is a contingency table that counts the number of predicted Yes
  - predicted prob > .5 (returns TRUE or FALSE)
  - against the actual Yes/No
  - per individual, so with each individual's Xi (individual predictions)

SB.NP.DC.logit          0    1
FALSE (predicted 0)    85   38
TRUE  (predicted 1)    56  133
Plotting
- The sbchoice command produces an object that can be plotted directly
  - A direct plot of the object is the fitted probability of Yes over the bid range
  - probably conditional on average age, income, sex: the package isn't explicit
- Using a predict method helps to
  - observe what is outside the range
  - compare logit & probit fitted curves (see RAE2017.R)
  - In particular, the logit has slightly fatter tails, inducing a higher WTP
- To use predict,
  - create a matrix of new data
  - choose the proper type; here we want a probability, "response"

Logit vs. Probit Predict
- As can be seen, for the same data, the logit has slightly fatter tails than the probit
Computing Welfare Measures
Expected WTP
- Recall that when we assumed V(zj, y) = αj + βy, j = 0, 1,
  - then (max) WTP is s.t. α0 + βy + ε0 = α1 + β(y − WTP) + ε1
  - so WTP = (α + ε)/β, with α = α1 − α0 and ε = ε1 − ε0
- So WTP is a random variable
  - We can compute e.g. E(WTP) = α/β
- When αi = ∑_{k=0}^{K} γk xki,
  - then E(WTP) becomes individual,
  - so we have to think about how to aggregate individual expected WTPs
Other Measures of Welfare
- E(WTP) is the most obvious measure
  - However, the expectation is strongly influenced by the tail of the distribution G(.)
  - while we do not actually have data to fit it,
  - since there are not many bids,
  - and it does not feel very serious to ask a very high bid
Truncated Expectations
- For a non-negative WTP,

  E(WTP) = ∫₀^∞ Pr{Yes|b} db

- Historically, the highest bid has been used to truncate E(WTP):

  ∫₀^{b_max} Pr{Yes|b} db

  - However, that is not a proper expectation, since the support of Pr{Yes|b} does not stop at b_max
- An alternative uses the truncated distribution:

  ∫₀^{b_max} Pr{Yes|b} / (1 − Pr{Yes|b_max}) db
Median WTP
- Finally, the median has been suggested as a more robust measure: b_median s.t.

  Pr{Yes|b_median} = 1/2

  - i.e. the bid such that 50% would be favorable
Shape of the WTP Function
- Clearly, the form of the WTP function depends on the form of V(.)
  - For some forms, some values of β lead to impossibilities

  Distribution    Expected                              Median
  Normal          α/β                                   α/β
  Logistic        α/β                                   α/β
  Log-normal      exp(α/β) exp(1/(2β²))                 exp(α/β)
  Log-logistic    exp(α/β) Γ(1 − 1/β) Γ(1 + 1/β)        exp(α/β)

- Again, if αi = ∑_{k=0}^{K} γk xki, each of these forms is individual
  - So the question arises whether to compute a sample mean or a sample median
  - DCchoice appears to compute a sample mean, but is not explicit
How Do We Choose a Welfare Measure?
- That might be the time for a debate?
- 3 welfare measures: untruncated, properly truncated, median
  - Of course we would not select an infinite one
  - The smallest estimate, to avoid criticism?
  - Do these measures differ significantly from each other? We will see that later
- 4 well-known distributions
  - (log-)normal, (log-)logistic: they do not differ substantially
  - There are others
  - There are also other estimators, e.g. non-parametric
  - The estimate of the best-fit model?
- 2 aggregation rules: sample mean or sample median
  - Researchers usually take the first, but what is the meaning of a sample mean of individual medians?
- DCchoice does not provide any guidance
Computing the Welfare Measure
- DCchoice computes it automatically
  - With glm, use the above formulas
  - See the code in RAE.R
- Much as I like DCchoice, I must note that for the data we use (NaturalPark),
  - the mean WTP does not coincide with the median
  - for the symmetrical distributions (normal & logistic)
  - That is a problem; I should write to the authors
Confidence Intervals
- In the end, WTP, under any of its forms, is an estimate
  - As such, it has a confidence interval
  - Much as for β̂, you should always report the CI,
  - at a minimum to give an idea of the variance,
  - and to show whether it is significantly different from zero
- There are 2 main methods
  - Krinsky & Robb
  - Bootstrap
- These methods are much broader than valuation
  - They are useful in all types of research in applied econometrics
Constructing Confidence Intervals: Krinsky and Robb
- Start with the estimated vector of coefficients
  - By ML, it is distributed (multivariate) normally
  - Its variance-covariance matrix has been estimated in the ML process
- Draw D times from a multivariate normal distribution with
  - mean = the vector of estimated coefficients
  - variance = the estimated variance-covariance matrix of these coefficients
- So we have D vectors of estimated coefficients
  - If D is large, the average of these D vectors is just our original coefficient vector
- Compute the welfare measure for each such replicated coefficient vector
  - Thus we have D estimated welfare measures
  - some large, some small: an empirical distribution
  - Order them from smallest to largest
  - the 5% most extreme are deemed random
  - the 95% most central are deemed reasonable
  - and constitute the 95% confidence interval
- For example, the lower and upper bounds of the 95% confidence interval
  - correspond to the 0.025 and 0.975 percentiles of the measures, respectively
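The procedure can be sketched in a few lines. The coefficient estimates and their variance-covariance matrix below are made up; the welfare measure is E(WTP) = α/β, and the multivariate normal draws use a hand-rolled 2×2 Cholesky factor:

```python
import math, random

random.seed(42)

# Hypothetical ML output: coefficients (alpha, beta) and their 2x2 vcov
alpha_hat, beta_hat = 2.0, 0.05
vcov = [[0.04,   0.0005],
        [0.0005, 0.0001]]

# Cholesky factor of the 2x2 vcov: vcov = L L'
l11 = math.sqrt(vcov[0][0])
l21 = vcov[1][0] / l11
l22 = math.sqrt(vcov[1][1] - l21 ** 2)

D = 5000
wtps = []
for _ in range(D):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    a = alpha_hat + l11 * z1               # draw from the multivariate normal
    b = beta_hat + l21 * z1 + l22 * z2
    wtps.append(a / b)                     # welfare measure for this draw

wtps.sort()
ci_low, ci_high = wtps[int(0.025 * D)], wtps[int(0.975 * D)]
print(round(ci_low, 1), round(ci_high, 1))   # 95% CI around alpha/beta = 40
```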
Krinsky and Robb: Implementation in DCchoice
- Function krCI(.)
  - constructs CIs for the 4 different WTPs
  - estimated by functions sbchoice(.) or dbchoice(.)
- Call as krCI(obj, nsim = 1000, CI = 0.95)
  - obj is an object of either the "sbchoice" or "dbchoice" class
  - nsim is the number of draws from the multidimensional normal (influences machine time)
  - CI is the percentile of the confidence intervals to be estimated
  - Returns an object "krCI":
  - a table of the simulated confidence intervals
  - vectors containing the simulated WTPs
- Is there a package that does Krinsky & Robb for glm objects?
Constructing Confidence Intervals: Bootstrap
- Similar to Krinsky & Robb,
  - except in the way the new estimated coefficients are obtained
  - Essentially, instead of simulating new estimates,
  - we simulate new data
  - and then calculate new estimates
Mediocrity Principle
- Consider that our sample is mediocre in the population
  - We mean: it does not have anything exceptional
- Then, if we could draw a new sample from that population,
  - we would surely obtain a fairly mediocre sample,
  - that is, fairly similar to the original one
Bootstrap Principle
- It's not possible to draw a new sample
- But imagine that, using the original sample, we draw one observation,
  - and we "put it back" in the sample ("replacement"),
  - then we draw again,
  - and repeat until we have the same number of obs. as in the original sample
  - Call this a bootstrap sample
  - Each original obs. appears 0 to n times
- By the mediocrity principle,
  - the bootstrap sample is fairly close to a real new sample
  - Estimate a new vector of coefficients
  - Repeat D times
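A minimal sketch of resampling with replacement and a bootstrap CI, using the sample mean as a stand-in for the re-estimated welfare measure (data are made up):

```python
import random

random.seed(7)

# Original (hypothetical) sample
sample = [12.0, 15.5, 9.0, 22.0, 18.5, 11.0, 16.0, 14.5, 20.0, 13.0]
n = len(sample)

D = 2000
stats = []
for _ in range(D):
    # Bootstrap sample: n draws WITH replacement from the original sample
    boot = [random.choice(sample) for _ in range(n)]
    stats.append(sum(boot) / n)      # re-compute the statistic on it

stats.sort()
ci_low, ci_high = stats[int(0.025 * D)], stats[int(0.975 * D)]
print(round(ci_low, 2), round(ci_high, 2))
```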
Bootstrap: Implementation in DCchoice
- Function bootCI() carries out the bootstrapping
  - and returns the bootstrap confidence intervals
  - Call as bootCI(obj, nboot = 1000, CI = 0.95)
- Longer than K&R, since each sample must be generated
  - and then new estimates computed,
  - while K&R only simulates new estimates
- In the end, the results are similar
  - See RAER
- Note: another means would be a resample command
  - applicable to glm,
  - but I don't develop it here
Differences of Welfare Measures
- Sometimes we want to know whether a welfare measure is significantly different from another
  - In other words: is their difference significantly different from zero?
  - In terms of CI: does the CI of their difference include zero?
- To compute that: bootstrap similarly
- Krinsky and Robb is also possible, but
  - if the welfare measures are independent, their difference has a variance-covariance that is the sum of each variance-covariance
  - if they are not independent, then it's difficult
Double-bounded CV Estimation
Double-bounded CV Estimation
- To increase the amount of information collected by the survey,
  - the valuation question is asked a second time
  - If the answer is Yes (No), ask again with a higher (lower) bid
- Called double-bounded dichotomous choice,
  - or dichotomous choice with follow-up
- More precisely, the phrasing could be
  - If ΔZ cost you b €, would you be willing to pay it?
  - If answered Yes: would you be willing to pay bU €? (bU > b)
  - If answered No: would you be willing to pay bL €? (bL < b)
Double-bounded CV Estimation
- There are 4 outcomes per respondent: YY, YN, NY, NN
  - YY indicates that WTP > bU
  - YN that b < WTP < bU
  - NY that bL < WTP < b
  - NN that WTP < bL
- Thus the answers are intervals
  - Probit & logit do not suffice
  - Many use ML, but the likelihood function is different
Estimation with DBDC Data
- To develop the likelihood function, it is necessary to express the probabilities first
- Write P^YY as the probability of answering Yes, Yes; with G the cdf of WTP:
  - P^YY = Pr{bU < WTP} = 1 − G(bU)
  - P^YN = Pr{b < WTP < bU} = G(bU) − G(b)
  - P^NY = Pr{bL < WTP < b} = G(b) − G(bL)
  - P^NN = Pr{WTP < bL} = G(bL)
- For a sample of n observations,

  ln L = ∑_{n=1}^{N} [dn^YY ln Pn^YY + dn^YN ln Pn^YN + dn^NY ln Pn^NY + dn^NN ln Pn^NN]

  where n indexes individuals and dn^XX indicates whether n answered XX (dichotomous variable)
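A sketch of these four probabilities and the log-likelihood contribution of one observation, under the probit specification where the WTP cdf is G(b) = Φ(−α + βb) (parameters and bids are made up):

```python
import math

alpha, beta = 2.0, 0.05       # hypothetical parameters
b, bU, bL = 30.0, 60.0, 10.0  # first bid and follow-up bids

Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
G = lambda bid: Phi(-alpha + beta * bid)    # cdf of WTP: Pr{WTP < bid}

P_YY = 1.0 - G(bU)
P_YN = G(bU) - G(b)
P_NY = G(b) - G(bL)
P_NN = G(bL)

# The four probabilities partition the WTP support
assert abs(P_YY + P_YN + P_NY + P_NN - 1.0) < 1e-12

# Log-likelihood contribution of one respondent who answered Yes then No
d = {"YY": 0, "YN": 1, "NY": 0, "NN": 0}
lnL_i = (d["YY"] * math.log(P_YY) + d["YN"] * math.log(P_YN)
         + d["NY"] * math.log(P_NY) + d["NN"] * math.log(P_NN))
print(lnL_i)
```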
Estimation with DBDC Data
- There is no direct R command corresponding to such a likelihood
  - It must be programmed; this is called Full Information Maximum Likelihood (FIML)
  - We don't do this: it is pre-programmed in DCchoice,
  - for the same basic choices of distribution as for SBDC data
- Endogeneity issue
  - The 2nd bid is not exogenous, since it depends on the previous answer
  - Thus it contains unobserved characteristics of the individuals
  - Such unobservables also determine the 2nd choice
  - This is in principle addressed by FIML
- A more general model is the bivariate probit,
  - allowing the 2 answers to have less than perfect correlation
  - not covered by DCchoice
Estimation with DBDC Data: Standard Normal cdf
- P^YY = Pr{bU < WTP} = 1 − Φ(−α + β bU)
- P^YN = Pr{b < WTP < bU} = Φ(−α + β bU) − Φ(−α + β b)
- P^NY = Pr{bL < WTP < b} = Φ(−α + β b) − Φ(−α + β bL)
- P^NN = Pr{WTP < bL} = Φ(−α + β bL)
- So: we estimate the same coefficients α and β as in SBDC,
  - but with more data, so it is more efficient,
  - assuming people answer both valuation questions in the same way
- So: the computation of the welfare measures is the same
Application: Exxon-Valdez
Context
- 1989: about 35,000 tons of crude oil spilled at sea
  - It ended up spreading over about 26,000 km² at sea
  - and soiled 1,600 km of coastline
- Most of the damage was in Prince William Sound and the Gulf of Alaska, up to the Kodiak Islands
- Several types of damages
  - professional fishing (minimal)
  - tourism (possibly a benefit)
  - environmental heritage loss
  - punitive damages (supposed to be an incentive)
- Valuation survey
Exxon-Valdez Questionnaire
I Avoid willingness-to-accept Q
I For assumed strategic behaviour
I The basic Q is
I Compensation for the loss of an environmental heritage during 10 years?
I Scenario: after 10 years the environmental damage will be fully recovered
I Convert such a “WTA” into a WTP to avoid the catastrophe happening again for 10 years
I Scenario: in 10 years’ time, a similar catastrophe will be impossible thanks to double hulls
I The scenario need not be true
I Since we are investigating human preferences for things that may never happen
I However, it must appear credible to respondents
Reading the Exxon-Valdez Data
I data("CarsonDB")
I It is only a frequency table for the DBDC survey
I Thus w/o the X data
T1 TU TL yy yn ny nn
1 10 30 5 119 59 8 78
2 30 60 10 69 69 31 98
3 60 120 30 54 75 25 101
4 120 250 60 35 53 30 139
I So there are 6 distinct bids
I 5, 10, 30, 60, 120, 250
I There is always a large proportion of nn: in part protest answers
I The proportion of yy decreases w/ b
I The proportions of yn & of ny are roughly constant w/ b
I w/ about 2 to 3 times more yn than ny
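The pattern noted above can be checked directly on the frequency table (a Python sketch; the counts are copied from the table on the previous slide):

```python
# yy/yn/ny/nn counts by first bid, copied from the CarsonDB table
counts = {10: (119, 59, 8, 78), 30: (69, 69, 31, 98),
          60: (54, 75, 25, 101), 120: (35, 53, 30, 139)}
for bid, (yy, yn, ny, nn) in counts.items():
    total = yy + yn + ny + nn
    # share of yy falls with the bid; share of nn is always large
    print(bid, round(yy / total, 2), round(nn / total, 2))
```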
Converting the Exxon-Valdez Data
I We need individual-level data
I 119 + 59 + ... + 53 + 30 = 1043 (cfr nobs in RAE.R)
I Observations are created from that frequency table in 3 steps:
1. create a new data frame db.data, filled w/ 0
I to save the reconstructed individual observations and to prepare indexes
2. then organize the 3 columns containing the first bids (bid1), and the increased and decreased second bids (bid2U and bid2L)
3. then fill in the answers to each bid corresponding to the numbers in the frequency table
I Follow the detailed code in RAE2017.R
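The reconstruction is done in R in RAE2017.R; the same idea can be sketched in Python, expanding each cell of the frequency table into that many individual rows (column names R1, R2, bid1, bid2 mirror those used by the course):

```python
import pandas as pd

# The CarsonDB frequency table from the slide (first bid, follow-up bids, counts)
freq = pd.DataFrame({
    "bid1":  [10, 30, 60, 120],
    "bid2U": [30, 60, 120, 250],
    "bid2L": [5, 10, 30, 60],
    "yy": [119, 69, 54, 35],
    "yn": [59, 69, 75, 53],
    "ny": [8, 31, 25, 30],
    "nn": [78, 98, 101, 139],
})

# Expand each count into individual observations: R1/R2 are the two
# dichotomous answers, bid2 is the follow-up actually presented
# (increased after a Yes, decreased after a No)
rows = []
for _, r in freq.iterrows():
    for pat, n in (("yy", r.yy), ("yn", r.yn), ("ny", r.ny), ("nn", r.nn)):
        r1 = 1 if pat[0] == "y" else 0
        r2 = 1 if pat[1] == "y" else 0
        bid2 = r.bid2U if r1 else r.bid2L
        rows += [{"R1": r1, "R2": r2, "bid1": r.bid1, "bid2": bid2}] * n
db = pd.DataFrame(rows)
print(len(db))  # 1043, as on the slide
```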
Estimating WTP with DBDC data
I dbchoice(formula, data, dist = "log-logistic", par = NULL)
I Usage similar to sbchoice
I Except for formula & par
I formula: R1 + R2 ~ var | bid1 + bid2
I R1 + R2: the 4 response outcomes in 2 dichotomous variables
I bid1 + bid2: the 2 bids
I var: any number of covariates
I Unfortunately, we do not have any
I par: starting values, may be NULL
I There is no guarantee that the likelihood is unimodal
I Optimization may not converge
I Different starting values may lead to different optima
I Take the one w/ the higher likelihood
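The multiple-starting-values advice can be illustrated outside R: maximize a DBDC log-likelihood from several starting points and keep the fit with the higher likelihood. A Python sketch on made-up toy data (dbchoice does the real work on the actual data):

```python
import numpy as np
from scipy.optimize import minimize

# Toy DBDC data (made up) purely to illustrate multiple starting values
b    = np.array([30.0, 30.0, 60.0, 60.0, 120.0, 120.0])
bU   = np.array([60.0, 60.0, 120.0, 120.0, 250.0, 250.0])
bL   = np.array([10.0, 10.0, 30.0, 30.0, 60.0, 60.0])
d_yy = np.array([1, 0, 0, 0, 0, 0])
d_yn = np.array([0, 1, 1, 0, 0, 0])
d_ny = np.array([0, 0, 0, 1, 1, 0])
d_nn = np.array([0, 0, 0, 0, 0, 1])

def negll(theta):
    alpha, beta = theta
    G = lambda x: 1.0 / (1.0 + np.exp(alpha - beta * x))  # Pr(WTP < x)
    p = (d_yy * (1.0 - G(bU)) + d_yn * (G(bU) - G(b))
         + d_ny * (G(b) - G(bL)) + d_nn * G(bL))
    return -np.sum(np.log(np.clip(p, 1e-300, 1.0)))  # clip guards bad regions

# Optimize from two starting values and keep the higher likelihood
# (i.e. the lower negative log-likelihood)
fits = [minimize(negll, sv, method="Nelder-Mead")
        for sv in ([0.0, 0.01], [2.0, 0.05])]
best = min(fits, key=lambda f: f.fun)
```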
Estimating WTP with DBDC data
I See the code in RAE.R
I Estimation goes smoothly
I See “convergence TRUE”
I When this is not the case: the estimated values have no meaning
I You need to specify par = a vector
I Cfr the slide “Iterations and Convergence”
I Coef of bid is neg. & highly signif. in all 4 models
I The same 4 measures of welfare are given
I E(WTP) → ∞ in the log-log model when |β| < 1
I The measure is for a one-time payment for 10 years
I As it turns out, the median is always more conservative (for probit-type models)
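The E(WTP) → ∞ statement can be seen numerically: for WTP ≥ 0, E(WTP) = ∫₀^∞ (1 − G(b)) db, and the log-logistic survival function 1/(1 + b^c) has a b^(−c) tail, so the integral diverges when the shape c (playing the role of |β|) is ≤ 1 — while the median stays finite. A Python sketch (c = 0.9 is an illustrative value):

```python
import numpy as np
from scipy.integrate import trapezoid

# Truncated means of a log-logistic WTP with shape c = 0.9 (< 1):
# the integral of the survival function keeps growing with the upper
# bound instead of converging, so the mean is infinite.
c = 0.9
surv = lambda x: 1.0 / (1.0 + x**c)   # 1 - G(x) for a log-logistic G
for upper in (1e2, 1e4, 1e6):
    grid = np.linspace(0.0, upper, 1_000_000)
    print(upper, trapezoid(surv(grid), grid))  # keeps growing: no finite mean
```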
CI
I Use exactly the same cmds as for SB
I e.g. for the log-logistic model
I The CI for the (infinite) mean WTP is not defined
I Otherwise we see similar results for Krinsky & Robb as for Bootstrap
I approx ±5 around the central value
I e.g. for the 30.4 median: [26.5 – 35.5]
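The Krinsky & Robb procedure itself is short enough to sketch: draw coefficient vectors from the estimated asymptotic normal distribution and read off percentiles of the welfare measure. A Python sketch — the coefficients and covariance below are made up, not the course estimates:

```python
import numpy as np

# Krinsky & Robb simulation of a CI for the median WTP. With
# Pr(WTP < b) = F(-alpha + beta*b), median WTP = alpha/beta.
# coef and vcov below are hypothetical, not the CarsonDB estimates.
rng = np.random.default_rng(0)
coef = np.array([1.0, 0.033])                 # hypothetical (alpha, beta)
vcov = np.array([[0.010, 0.0002],
                 [0.0002, 0.00002]])          # hypothetical covariance
draws = rng.multivariate_normal(coef, vcov, size=5000)
median_wtp = draws[:, 0] / draws[:, 1]        # alpha/beta for each draw
lo, hi = np.percentile(median_wtp, [2.5, 97.5])
print(lo, hi)  # 95% Krinsky & Robb interval
```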
Exxon-Valdez : Non-use value
I The value presented to the Courts was a median WTP of about 3.2 $/year per household for 10 years
I to avoid such accidents in the next 10 years
I That is 32$ in total
I Since the ∆z here is the natural heritage of the whole US
I this value refers to the 90.9 million US households
I that is an aggregated median of $2 800 million (2.8 billion)
I The interpretation is that this is the amount that would obtain exactly 50% approval if there was a referendum
I It is not related to the cost of the hypothetical escort-ship program
I But it can be taken as the minimal compensation for the loss of natural heritage due to the spill
I In the end, Exxon and the governor of Alaska settled out of court for 1 billion $
Exxon-Valdez : Remarks
I In US law, the ultimate responsibility for goods lies with the owner of these goods
I Otherwise, if it lay with the shipping company, it would be easy for goods owners to contract with insolvent firms
I and effectively escape responsibility
I Any tanker that calls in US territorial waters must have subscribed to an insurance
I with a $1 billion fund that can be seized by the authorities to pay for any damages
Exxon-Valdez : Remarks
I This is part of the Oil Pollution Act of 1990
I following the Exxon Valdez spill
I "A company cannot ship oil into the United States until it presents a plan to prevent spills that may occur. It must also have a detailed containment and cleanup plan in case of an oil spill emergency."
I In Europe, similar ideas advance slowly
I following the Erika (1999) and Prestige (2002) wrecks
Outline
Principles of Contingent Valuation
Maximum Likelihood Estimation
Application of ML: Logit & probit
Single-bounded CV Estimation
Application
Computing Welfare Measures
Double-bounded CV Estimation
Application: Exxon-Valdez
Questionnaire Issues and Biases
Context
I It is generally not easy to answer CV questions
I It is not usual to think of public/collective goods in terms of price or value
I How will my answer be used? Could it commit me?
I Often, a realistic context is described in the questionnaire
I A referendum (binding or consultative) on a local tax change
I A contribution to an association
I An entry fee to a site
I The payment vehicle is often associated with such a context
I If there is a payment, how would it be carried out?
I Changes of prices, tax raises or similar fees, voluntary contributions...
Other Formats
I Open-ended: how much are you WTP?
I Payment cards, showing several amounts
I “Bidding game”
I i.e. like an auction
I Psychologists are very critical of any valuation question
I They say that any amount stated by the researcher strongly anchors the respondents’ answers
I Within the valuation Q or anywhere else in the questionnaire
I Yet, it is an empirical regularity that the proportion of Yes decreases with the bid
I Not always very smoothly, but still
Answers that are not Preferences Revealing
I Strategic: by lying, can I get more than by telling the truth?
I Open-ended Q and the average of individual WTPs
I This is less of a concern with the dichotomous format
I Avoid willingness-to-accept Q, as in Exxon-Valdez
I Symbolic
I The respondent does not correctly identify the ∆z of interest
I Some people always answer Yes for environmental causes, or No when the word tax appears
I Some respondents answer what they think everyone (or the government) should do
I Debriefing: a series of questions after the valuation question(s) to eliminate such answers
Don’t know / refuse to answer
I Strategic or symbolic answers are not obvious
I Don’t know / don’t answer are visible
I So actually, even a Yes/No Q always has 3 or 4 answers in practice
I Distinguishing “don’t know” and “refuse” or not
I One option to treat these answers is to remove them
I ok if they are not associated w/ a specific profile
I Preliminary dich. choice model
I Answer “Yes or No” vs. sthg else
I Or directly a multinomial model
& complexity
I Control the “size” (scope) effect
I “size” of the ∆z
I e.g. it must be that the WTP to save 300 birds is < that to save 3 000 birds
I Can be done by subsampling
I Control embedding
I Is ∆z valued for itself or taken as symbolic of a larger set?
I e.g. protecting against another Exxon-Valdez at the same site or for the whole US
I This is often addressed by careful structure: broad Q first, then more and more precise
I Socio-demographic Q
I to acquire regressors for the valuation Q
I Allow inference to the population
I Interviewer effect
Bid Design
I Why should we use 4 or 6 bids?
I Trade-off: the more bids,
I the more we know about the WTP curve
I but the less precise this knowledge
I Given the total size of the sample
I d-optimality seeks to minimize the asymptotic variances of the estimators
I c-optimality minimizes the confidence interval of the estimated WTP
I In the end, the literature settled for “sequential design”
1. start w/ a focus group and ask open Q
2. use these first guesses for a first round of (100?) questionnaires
3. estimate the model: can you identify the WTP?
4. if not, adjust: higher (lower) bids if not enough “No” (“Yes”)