Research in Applied Econometrics Chapter 1. R

Pr. Philippe Polomé, Université Lumière Lyon 2

M1 APE (Analyse des Politiques Économiques), M1 RISE (Gouvernance des Risques Environnementaux)

2017 – 2018

Outline

- SWIRL
- Data Management
- R graphics
- Linear Regressions
- Discussing Regressors and Model Building
- Document Edition Functionalities

SWIRL

- Do Course 1: R Programming, Lessons 1-9 + 14 by yourself
  - To quit a lesson: Esc
  - Answer “no” to any proposition to “register”
  - Following “...”, press Enter
  - Sometimes, much text is to be read: that is a good exercise
- Follow the commands in RAE2017.R
  - They follow the slides
- We do just Lesson 1
  - To make sure you can start the other lessons by yourself

SWIRL R programming overview

1: Basic Building Blocks
2: Workspace and Files
3: Sequences of Numbers
4: Vectors
5: Missing Values
6: Subsetting Vectors
7: Matrices and Data Frames
8: Logic
9: Functions
10: lapply and sapply
11: vapply and tapply
12: Looking at Data
13: Simulation
14: Dates and Times
15: Base Graphics

A few commands outside of SWIRL

- In R-Studio, create a new project (upper right button)
  - Call it “RAE” for example
  - Store it where you can find it again
- Execute the commands in RAE2017.R to see the output
- Usual math functions: log, exp, sign, sqrt, abs, min, max
  - log(exp(sin(pi/4)^2)*exp(cos(pi/4)^2)): type it in the Console, then press Enter
- Special vectors
  - ones <- rep(1, 10)
  - even <- seq(from = 2, to = 20, by = 2)
  - trend <- 1981:2005
- diag(4): identity mtx of size 4
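The commands of this slide can be checked in one go. The sketch below only uses the commands listed above; the object names value and id4 are added for illustration. Since log(exp(a) * exp(b)) = a + b and sin^2 + cos^2 = 1, the first expression should return 1 up to rounding.

```r
# Expression from the slide: should be 1 (up to rounding)
value <- log(exp(sin(pi/4)^2) * exp(cos(pi/4)^2))

# Special vectors
ones  <- rep(1, 10)                      # ten ones
even  <- seq(from = 2, to = 20, by = 2)  # 2, 4, ..., 20
trend <- 1981:2005                       # one value per year

# Identity matrix of size 4 (id4 is an illustrative name)
id4 <- diag(4)

print(value)
print(length(trend))    # number of years in the trend
print(sum(diag(id4)))   # trace of the identity matrix
```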

Mtx Operations

- A <- matrix(1:6, nrow = 2)
  - Type A to see what it looks like & how R gives the position of the elements
  - Look at your Environment window: A is now there
  - It remains in your project until erased (the brush)
- t(A) = transpose of A (not A')
- dim(A) = dimensions of A (R then C)
- nrow(A); ncol(A): nbr of R; of C
- A[i,j] extracts element (i,j)
  - Does not remove it from the mtx
- A[,j] extracts C j (all the R) into one vector
  - A[i,] same for R i
- A1 <- A[1:2, c(1, 3)]: A1 has 2 R containing the elts in R 1 to 2 and C 1 & 3 from A
  - For this particular mtx, same result w/ A[,-2]
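A short sketch of the indexing rules above, run on the same matrix A. The names elt, col2, row1 and A1b are only illustrative.

```r
A <- matrix(1:6, nrow = 2)   # filled column by column: (1,2), (3,4), (5,6)

elt  <- A[2, 3]              # element in row 2, column 3
col2 <- A[, 2]               # whole column 2, returned as a vector
row1 <- A[1, ]               # whole row 1, returned as a vector
A1   <- A[1:2, c(1, 3)]      # rows 1 to 2, columns 1 and 3
A1b  <- A[, -2]              # negative index: every column except the 2nd

print(A)
print(dim(t(A)))             # the transpose has 3 rows and 2 columns
print(identical(A1, A1b))    # both selections give the same matrix
```

A itself is unchanged by all these extractions.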

Mtx Operations

- det(A1): determinant
- eigen(A1): eigenvalues & eigenvectors
- chol(A1): Cholesky decomposition (type ?chol in Console)
- solve(A1): inverse
- A %*% B: mtx product
  - A*A: element-by-element product
  - kronecker(A, B): Kronecker product (type ?kronecker)
- crossprod(A, B): efficient calculation of A'B
- diag(A1): extracts the diag
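A minimal sketch of these commands on the matrix A1 built earlier. One caveat: chol( ) requires a symmetric positive definite matrix, which A1 is not, so the sketch applies it to crossprod(A1) instead. The names d, A1inv, S, U and K are illustrative.

```r
A  <- matrix(1:6, nrow = 2)
A1 <- A[1:2, c(1, 3)]            # 2 x 2 matrix with columns (1, 2) and (5, 6)

d     <- det(A1)                 # 1*6 - 5*2
A1inv <- solve(A1)               # inverse, exists because det(A1) is not 0
S     <- crossprod(A1)           # A1'A1, symmetric positive definite
U     <- chol(S)                 # upper-triangular factor such that U'U = S
K     <- kronecker(diag(2), A1)  # block-diagonal 4 x 4 matrix

print(d)
print(A1 %*% A1inv)              # identity matrix, up to rounding
print(A1 * A1)                   # element-by-element squares, not a mtx product
```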

Mtx Operations

- cbind(1, A1): “combines” one C of ones and A1 (the result has 2 R and 3 C)
- rbind(A1, diag(4, 2)): “stacks” A1 & a diag mtx of size 2 w/ 4 on the diag (the result has 4 R and 2 C)

Data Management

Dataframe

- “Frame” = “context”
  - In R, a “Dataframe” is a data mtx
  - a collection of vectors of same length
  - Stacked together horizontally
- Each vector = 1 C = “variable”
  - Possibly of different natures
  - quantitative (numeric), but also qualitative, characters, dates...
  - it may further contain meta-data
  - e.g. variable type or category names
- Each R = 1 obs in the sample
- An “array” is, in R, a more general object as it may have more than 2 dimensions

Dataframe Creation

- Several ways
  - keyboard (cf. Swirl R Programming lesson 7)
  - read an R file
  - import
- keyboard example
  - alternative 1
    - mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)
  - alternative 2
    - mydata <- as.data.frame(matrix(1:30, ncol = 3)) and names(mydata) <- c("one", "two", "three")
  - R is not very good for encoding data manually

attach

- A dataframe is “attached”
  - w/ command attach
  - then the variables' names in the dataframe may be used directly in commands
- For example
  - mean(two) produces an error message
  - attach(mydata) and then mean(two) produces the average of variable “two”
- detach(mydata) is self-explanatory
  - Why detach? e.g. to avoid confusion
- Attach for a single operation
  - with(mydata, mean(two))
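A small sketch tying together the two ways of creating mydata and the access to variable two without a permanent attach. The names mydata2, m_dollar and m_with are illustrative.

```r
# Alternative 1: named vectors of the same length
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Alternative 2: convert a matrix, then name its columns
mydata2 <- as.data.frame(matrix(1:30, ncol = 3))
names(mydata2) <- c("one", "two", "three")

# Access to a variable without attach()
m_dollar <- mean(mydata$two)          # explicit reference to the column
m_with   <- with(mydata, mean(two))   # data frame used for this call only

print(all(mydata == mydata2))         # both alternatives hold the same values
print(c(m_dollar, m_with))
```

with( ) avoids having to remember a later detach( ), which is the design reason to prefer it for one-off computations.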

Subset Selection

- As seen in swirl, a subset of a Dataframe can be accessed by [ or $
  - $ extracts a single variable
- The command subset sometimes works better (e.g. conditional selection)
  - e.g. mydata.sub <- subset(mydata, two <= 16, select = -two)
  - selects all the obs. of variables one & three
  - for which the obs. of variable two are ≤ 16
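The subset( ) call above, next to the equivalent selection with [. The object same is only there for the comparison.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Conditional selection: rows with two <= 16, all columns except two
mydata.sub <- subset(mydata, two <= 16, select = -two)

# Same selection with [ : logical condition on the rows, names for the columns
same <- mydata[mydata$two <= 16, c("one", "three")]

print(mydata.sub)
print(all(mydata.sub == same))
```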

Export (write) a dataframe

- write.table(mydata, file = "mydata.txt", col.names = TRUE)
  - creates a txt file mydata.txt in the working directory
  - normally where your project is
  - Meta-data are not passed
  - The text file format is
      "one" "two" "three"
      "1" 1 11 21
      "2" 2 12 22
      ...
  - So that it looks like the C headers are shifted left
  - Take that into account w/ the software you use to open it

Import (read) a dataframe

- From a text file (.txt or .csv)
  - newdata <- read.table("mydata.txt", header = TRUE)
  - reads a txt file in which the 1st R has the variable names
  - this is placed in a “table” called newdata
  - read.csv( ) works the same way from csv into a data frame (read_csv( ) from package readr is a variant)
- read.table accepts many options
  - C separator: , ;
  - Decimal separator: . ,
  - French is your enemy here
  - ?read.table
- The Environment window has a button that makes it very easy
  - a preview is generated
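A sketch of a write/read round trip, followed by a “French” file with ; as C separator and , as decimal separator. The temporary files and the small prices data frame are made up for the illustration; in the course, the file would be mydata.txt in the project directory.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Round trip through a temporary text file
f1 <- tempfile(fileext = ".txt")
write.table(mydata, file = f1, col.names = TRUE)
newdata <- read.table(f1, header = TRUE)   # 1st column is read as row names

# "French" file: ";" between columns, "," as decimal mark
prices <- data.frame(good = c("a", "b"), price = c(1.5, 2.25))
f2 <- tempfile(fileext = ".csv")
write.table(prices, file = f2, sep = ";", dec = ",", row.names = FALSE)
frdata <- read.table(f2, header = TRUE, sep = ";", dec = ",")

print(newdata[1:2, ])
print(frdata)
unlink(c(f1, f2))                          # remove the temporary files
```

Reading the second file without dec = "," would leave price as text rather than numbers, which is the usual symptom of a separator mismatch.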

Import a dataframe

- scan is used for data that are not in mtx form
  - ?scan
- Import from other software: Excel, Stata, SAS...
  - Easiest: if you have access to the software, export the data file in txt or csv
    - loss of meta-data
  - R-Studio proposes several formats
    - It often fails, as these software change their formats often
  - Use Google
    - e.g. “R import Stata 17 data”
  - Also www.statmethods.net/input/importingdata.html
    - for a few formats

R graphics

Plot

- First SWIRL
  - course R Programming, lesson 15 Base Graphics
- A few additional graphic elements using command plot
  - Packages lattice & ggplot2 are better
  - http://varianceexplained.org/RData/code/code_lesson2/#segment1
  - R has many publication-quality graphics
  - But they are not very intuitive
- plot( ) is the default graphic command for many objects:
  - dataframes, time series, fitted linear models
  - it is also an old, crude command

Examples with data("CPS1988")

- Data file CPS1988 is supplied with the AER package
  - Pop. survey March 1988, US Census Bureau
  - 28 155 obs., cross-section
  - Men, 18-70 y-o
  - Income > US$ 50 in 1988
  - Not self-employed, not working w/o salary
- summary(CPS1988)
- Quantitative data
  - wage: $/week
  - education & experience (= age - education - 6) in years

“Scatterplots” – dispersion – XY

- Probably the most common plots in stat (with histograms)
  - We use CPS1988: a census data file on wage and its determinants
  - From the AER package
- attach(CPS1988)
  - plot(education, log(wage))
  - The 1st argument is on the x-axis, the 2nd on the y-axis
  - rug(education)
  - rug(log(wage), side = 2)
  - rug = “tapis” in French: a 1-D plot
- detach(CPS1988)
- plot(log(subs) ~ log(citeprice), data = Journals)
  - formula & data argument: an alternative that avoids attaching the dataframe
  - Journals is another AER data set; citeprice = price/citations must be created first

R Graphic Parameters

- A plot result may be modified in many ways
  - E.g. argument type controls whether the plot is made of points (type = "p"), lines (type = "l"), both (type = "b"), steps (type = "s") or others
- Several dozen parameters may be modified
  - See ?par
  - They may be set w/ command par( ); the new values apply to the plots drawn afterwards
  - Or they can be supplied in the plot( ) command e.g. plot(log(wage)~education, data=CPS1988, pch=20, col="blue", ylim=c(4,10), xlim=c(0,20), main="Wage by education years")
- Next slide: list of par

Argument          Description
axes              should axes be drawn?
bg                background color
cex               size of a point or symbol
col               color
las               orientation of axis labels
lty, lwd          line type and line width
main, sub         main title and subtitle
mar               size of margins
mfcol, mfrow      array defining layout for several graphs on one plot
pch               plotting symbol
type              types (points, lines, steps...)
xlab, ylab        axis labels
xlim, ylim        axis ranges
xlog, ylog, log   logarithmic scales
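A few of these arguments in use, as a sketch on simulated data: educ and lwage are made-up stand-ins for the CPS1988 variables, so that the code runs without the AER package. pdf(NULL) opens a device that writes no file; in R-Studio, drop that line and the dev.off( ) to see the result in the Plots window.

```r
set.seed(1)
educ  <- sample(8:18, 200, replace = TRUE)           # simulated years of education
lwage <- 4.5 + 0.08 * educ + rnorm(200, sd = 0.5)    # simulated log wage

pdf(NULL)                                  # null device: nothing is written
old <- par(mfrow = c(1, 2), las = 1)       # 1 row, 2 columns; par() returns old values

# Parameters supplied directly in plot()
plot(educ, lwage, pch = 20, col = "blue", xlim = c(0, 20),
     xlab = "education", ylab = "log(wage)", main = "Points")

# Same data, another type and line width
plot(sort(lwage), type = "s", lwd = 2, main = "Steps")

layout_used <- par("mfrow")                # layout currently in force
par(old)                                   # restore the previous settings
invisible(dev.off())
print(layout_used)
```

Saving the value returned by par( ) and restoring it afterwards keeps later plots unaffected.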

R Graphic Parameters

- Add layer(s) to a plot: lines( ), points( ), text( ), legend( )
  - Add a straight line: abline(a, b)
  - a intercept, b slope
- 1 plot over another
  - x <- rnorm(50)
  - x2 <- rnorm(50, -1)
  - plot(ecdf(x), xlim = range(c(x, x2)))
  - ecdf: empirical cumulative distribution function
  - plot(ecdf(x2), add = TRUE, lty = "dashed")
- Barplots, pie charts, boxplots, QQ plots & histograms
  - barplot( ), pie( ), boxplot( ), qqplot( ), hist( )
  - We'll see later

Export graphics

- To use R graphics in other software
  - “Export” sends the graph to a “device”
  - In practice: just a file, e.g. .pdf or .jpg
- All devices work similarly in R, see ?Devices
  1. The device is opened by a command that bears its name, e.g. pdf( )
  2. Then, the plot is executed
  3. Finally, the device is closed: dev.off( )
- Example
  - pdf("myfile.pdf", height = 5, width = 6)
  - plot(1:20, pch = 1:20, col = 1:20, cex = 2)
  - dev.off()
  - Search for myfile.pdf on your laptop (it is in the working directory)
- Simplest: “Export” button in Plots window
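The three steps above, as a sketch that writes to R's temporary directory instead of the working directory. The path outfile is chosen for the illustration only.

```r
# Where the file goes: a temporary directory here, the project directory in practice
outfile <- file.path(tempdir(), "myfile.pdf")

pdf(outfile, height = 5, width = 6)           # 1. open the device
plot(1:20, pch = 1:20, col = 1:20, cex = 2)   # 2. execute the plot
invisible(dev.off())                          # 3. close the device

print(file.exists(outfile))                   # the file is complete only after dev.off()
print(getwd())                                # where "myfile.pdf" would land by default
```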

Math Formulas in a Plot

- R may display a formula in a plot via a LaTeX-like syntax
  - see ?plotmath
- Example
  - plot of the std normal density w/ its math definition
  - curve(dnorm, from=-5, to=5, col="slategray", lwd=3, main="Density of the Standard Normal Distribution")
  - text(-5, 0.3, expression(f(x) == frac(1, sigma ~~ sqrt(2*pi)) ~~ e^{-frac((x - mu)^2, 2*sigma^2)}), adj=0)
  - Unfortunately, you have to know some LaTeX
  - & the parameters are not easy

Histograms & boxplots

- Continue w/ CPS1988 data base on wage & its determinants
  - summary(CPS1988) reveals that some variables are categorical
  - Categorical: called factors in R
- Factors are vectors of categories
  - sometimes w/ metadata
  - e.g. category names
  - g <- rep(0:1, c(2, 4))
  - g <- factor(g, levels = 0:1, labels = c("male", "female"))
  - Names the categories (0, 1) of g “male” (= 0) & “female” (= 1)
  - so g is [1] male male female female female female
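The factor g of the slide, with a look at what is stored. The names counts and codes are illustrative.

```r
g <- rep(0:1, c(2, 4))                       # 0 0 1 1 1 1
g <- factor(g, levels = 0:1, labels = c("male", "female"))

counts <- table(g)                           # absolute frequency of each category
codes  <- as.integer(g)                      # internal codes: 1 = 1st level, 2 = 2nd level

print(g)
print(levels(g))
print(counts)
```

Note that the internal codes are 1 and 2, not the original 0 and 1: only the labels and their order are kept.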

Factors in CPS1988

- In CPS1988, the factors are
  - ethnicity: takes values caucasian “cauc” & african-american “afam”
  - smsa: residence in an urban area
  - region
  - parttime: part-time work
- Plots according to data type
  - Numerical/Quantitative or categorical
  - Single variable or 2 in relation

One numerical variable : histogram & density

- hist(wage, freq = FALSE)
  - option freq = FALSE
    - relative frequencies (density scale), else absolute (counts)
  - option breaks = zzz
    - “bin” = container: chooses the bins, i.e. the length of the base of the rectangles
- hist(log(wage), freq = FALSE)
- lines(density(log(wage)), col = 4)
  - Command density computes a non-parametric estimate of the density function (next year)
- Remarks
  - the distribution of the log is less asymmetrical than that of the raw data
  - data in log are often closer to a normal
  - That is often the case w/ econ. data & a rationale for the normal hypothesis
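A sketch of these commands on a simulated wage (log-normal by construction, an assumption made so that the code runs without CPS1988). The helper skew( ) is a plain moment-based skewness coefficient written for this example, not a base R function.

```r
set.seed(42)
wage <- exp(rnorm(5000, mean = 6, sd = 0.7))   # simulated, right-skewed "wage"

# Histogram computed without drawing: with the density scale, the areas sum to 1
h    <- hist(log(wage), breaks = 30, plot = FALSE)
area <- sum(h$density * diff(h$breaks))

# Simple skewness coefficient (illustrative helper)
skew <- function(z) mean((z - mean(z))^3) / sd(z)^3
sk   <- c(raw = skew(wage), log = skew(log(wage)))

pdf(NULL)                                      # null device, no file written
hist(log(wage), freq = FALSE, breaks = 30)
lines(density(log(wage)), col = 4)             # kernel density estimate on top
invisible(dev.off())

print(area)
print(sk)
```

In hist( ), breaks = 30 is treated as a suggestion for the number of bins; a vector of break points gives full control.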

One categorical

- W/ categorical data
  - Mean & variance have no meaning
  - But frequencies do
- summary(region): absolute frequencies (counts)
- tab <- table(region): stores these freq. in a table called tab
- prop.table(tab) computes the proportions (relative freq.)
- Barplots & pie charts often visualise cat. data quite well
  - barplot(tab)
  - pie(tab)
  - These plots can be modified using parameters

2 categorical

- Usually presented in a Contingency Table
  - xtabs( ) w/ a formula interface:
  - e.g. xtabs(~ ethnicity + region, data = CPS1988)
  - data is optional if CPS1988 is still attached
  - table(ethnicity, region): same results
- A plot of that is a “spine plot”
  - plot(ethnicity ~ region) Formula
  - plot(ethnicity, region) What differences?
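Frequencies and contingency tables, sketched on two simulated factors that mimic ethnicity and region (the data frame d and its proportions are made up, so that AER is not needed).

```r
set.seed(123)
d <- data.frame(
  ethnicity = factor(sample(c("cauc", "afam"), 300, replace = TRUE,
                            prob = c(0.9, 0.1)), levels = c("cauc", "afam")),
  region    = factor(sample(c("northeast", "midwest", "south", "west"),
                            300, replace = TRUE))
)

tab    <- table(d$region)          # absolute frequencies
shares <- prop.table(tab)          # relative frequencies

ct1 <- xtabs(~ ethnicity + region, data = d)   # formula interface
ct2 <- table(d$ethnicity, d$region)            # same counts, no formula

print(tab)
print(round(shares, 3))
print(ct1)
```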

2 numerical

- The Correlation Coefficient r is typical
  - For positive & asymmetrical variables: Spearman's rank correlation (computed on ranks instead of values) is often preferred because r is not robust to asymmetry
- cor(log(wage), education)
- cor(log(wage), education, method = "spearman")
  - Results differ a bit
- plot(log(wage) ~ education)
  - scatterplot shows little correlation
  - but log makes it difficult to see graphically
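One property worth seeing in code: because Spearman's coefficient uses ranks only, taking the log of wage does not change it, while r does change. A sketch on simulated data (the wage equation used to simulate is an assumption):

```r
set.seed(7)
education <- sample(8:18, 1000, replace = TRUE)
wage      <- exp(4.5 + 0.08 * education + rnorm(1000, sd = 0.6))

r_raw <- cor(wage, education)                           # Pearson on raw wage
r_log <- cor(log(wage), education)                      # Pearson on log wage
s_raw <- cor(wage, education, method = "spearman")      # ranks of wage
s_log <- cor(log(wage), education, method = "spearman") # ranks of log(wage)

print(c(pearson_raw = r_raw, pearson_log = r_log))
print(c(spearman_raw = s_raw, spearman_log = s_log))
```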

1 numerical & 1 categorical

- Often, conditional moments are calculated
  - e.g. average wage by ethnicity
  - tapply(log(wage), ethnicity, mean)
  - “Applies” the cmd mean to log(wage) separately for each level of ethnicity
  - mean may be replaced by any valid cmd, e.g. quantile
- Box plots & QQ (quantile-quantile) plots are often used
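A sketch of tapply( ) on simulated data (the variables lwage and ethnicity are made up; the 0.25 gap between groups is an arbitrary assumption):

```r
set.seed(11)
ethnicity <- factor(sample(c("cauc", "afam"), 500, replace = TRUE,
                           prob = c(0.9, 0.1)), levels = c("cauc", "afam"))
lwage <- 6 - 0.25 * (ethnicity == "afam") + rnorm(500, sd = 0.6)

# Conditional means: one value per level of the factor
means <- tapply(lwage, ethnicity, mean)

# Any other command works; extra arguments are passed on to it
quarts <- tapply(lwage, ethnicity, quantile, probs = c(0.25, 0.5, 0.75))

print(means)
print(quarts[["afam"]])
```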

1 numerical & 1 categorical : Box plot

- A box plot is a crude representation of an empirical distribution
  - The box is limited by “hinges” (1st & 3rd quartiles) and shows the median
  - Outside of the box, 2 lines indicate the smallest & largest obs. within 1.5 × the size of the box from the closest hinge
  - Any obs. outside is represented by a separate point
- boxplot(log(wage) ~ ethnicity)

1 numerical & 1 categorical : QQ plot

- A QQ plot matches the quantiles of 2 (empirical) distributions
  - Recall that quantiles are quantities
  - e.g. the 1st quartile of afam wage is the wage s.t. 25% of afam make less & 75% more
  - If the 2 distributions are identical: QQ plot = diagonal
  - Otherwise, if e.g. cauc make more than afam, then
    - with cauc on the x-axis, the QQ plot will be below the diag.
  - A bit like the plot of income inequality, but w/ 2 var.
  - awage <- subset(CPS1988, ethnicity == "afam")$wage
  - cwage <- subset(CPS1988, ethnicity == "cauc")$wage
  - qqplot(awage, cwage): here afam is on the x-axis, so the points lie above the diag. if cauc make more
  - abline(0, 1) overlays the diag. (intercept 0, slope 1)
- detach(CPS1988) to close CPS1988

Linear Regressions

Basic Regression Commands in R

- Linear Regression Model LRM $y_i = x_i'\beta + \varepsilon_i$ w/ $i = 1, \dots, n$
  - In mtx form $y = X\beta + \varepsilon$
- Typical Hyp. in cross-sections
  - $E(\varepsilon \mid X) = 0$ (exogeneity)
  - $\mathrm{Var}(\varepsilon \mid X) = \sigma^2 I$ (“sphericity”: homoscedasticity & no autoc.)
- In R, models are usually fitted by calling a cmd
  - For the LRM in cross-section: fm <- lm(formula, data, ...)
  - Argument ... stands for a series of arguments
    - describing the model
    - or choosing the computation mode (algorithm)
    - or options

Basic Regression Commands in R

- The lm cmd returns an object
  - Here: the fitted model under the name fm
  - May be visualised in many ways or summarized
- The lm object can be used to compute:
  - Predictions & fitted values, residuals, ... by means of fm$... see RAE2017.R
  - Tests & several post-estimation diagnostics
- Most estimation commands work the same way
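A sketch linking lm( ) to the matrix commands seen earlier, on simulated data (the data frame d and the true coefficients 1, 2, -3 are assumptions). The least squares estimate solves X'X b = X'y, which crossprod( ) and solve( ) compute directly.

```r
set.seed(2017)
n  <- 200
x1 <- rnorm(n)
x2 <- runif(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)
d  <- data.frame(y, x1, x2)

fm <- lm(y ~ x1 + x2, data = d)           # the fitted model is stored in fm

# Same estimate "by hand": solve (X'X) b = X'y
X        <- cbind(1, d$x1, d$x2)          # column of ones for the intercept
beta_hat <- solve(crossprod(X), crossprod(X, d$y))

print(coef(fm))                           # estimated coefficients
print(summary(fm)$coefficients)           # estimates, std errors, t & p values
print(head(residuals(fm)))                # first residuals
```

lm( ) itself relies on a QR decomposition rather than on this formula, so tiny numerical differences between the two are normal.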

SWIRL

- Do Lessons 1-6, course “Regression Models” in Swirl
  - The others: later
  - Concentrate on code, you know the econometrics
  - Remember to close files that may have remained open from the previous session
1. “Introduction”
  - To remember: “A coefficient will be within 2 standard errors of its estimate about 95% of the time”
2. “Residuals” is more difficult (reading + programming + concepts)
  - Explains loops
  - Forces you to re-read previous cmds
  - Make sure to execute program res_eqn.r when it shows up

SWIRL

3. “Least Squares Estimation”: nothing in particular
4. “Introduction to Multivariable Regression”
  - Install package manipulate beforehand
  - I am not sure of the stability of this lesson
  - Do not edit the function myplot which will show up
  - Warning: cor(gpa_nor, gch_nor) will be approximately equal to $\hat\beta$, SWIRL expects exact equality, so a bug
5. “Residual Variation”
  - “Gaussian elimination” shows that a k-regressor regression
  - may be seen as a succession of k 1-regressor regressions
  - DO NOT interpret this as model building or presentation or a way to select results
6. “MultiVar Examples”: nothing in particular

Multivariate Linear Regression w/ Factors

- The purpose of this example is to demonstrate various R tools
  - that are used to transform & combine regressors
- Dataframe: CPS1988 as before
- SWIRL Course “Regression Models”
  - lesson 7: “MultiVar Examples2”
  - Plots window for BoxPlot
  - sapply: use help in the Help window

Wage Equation

- Wage Equation
  - $\log(wage) = \beta_1 + \beta_2\, exp + \beta_3\, exp^2 + \beta_4\, education + \beta_5\, ethnicity + \varepsilon$
  - cps_lm <- lm(log(wage) ~ experience + I(experience^2) + education + ethnicity, data = CPS1988)
- “Insulation function” I( )
  - indicates to R that ^2 is to be understood as the square of experience
  - otherwise, R reads ^ as a formula operator (crossing of terms), experience^2 reduces to experience and the squared term is dropped
  - This might be clearer w/ a formula y ~ a + (b+c)
  - Are there 2 variables on the RHS of the formula, a and (b+c), or are there 3?
  - To clarify, write y ~ a + I(b+c)
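The effect of I( ) is easiest to see by counting coefficients. A sketch on simulated data (the data frame d and its coefficients are made up):

```r
set.seed(3)
d <- data.frame(x = runif(100, 0, 10), a = rnorm(100),
                b = rnorm(100), c = rnorm(100))
d$y <- 1 + 0.5 * d$x - 0.04 * d$x^2 + d$a + d$b + d$c + rnorm(100, sd = 0.1)

m_plain <- lm(y ~ x + x^2, data = d)      # ^ is a formula operator: x^2 collapses to x
m_insul <- lm(y ~ x + I(x^2), data = d)   # I() protects the arithmetic square

m_three <- lm(y ~ a + (b + c), data = d)  # parentheses change nothing: 3 regressors
m_sum   <- lm(y ~ a + I(b + c), data = d) # 2 regressors: a and the sum b + c

print(names(coef(m_plain)))
print(names(coef(m_insul)))
print(names(coef(m_three)))
print(names(coef(m_sum)))
```

So the answer to the question on the slide is 3: without I( ), y ~ a + (b+c) is the same model as y ~ a + b + c.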

Results & Testing

- summary(cps_lm)
  - The return to education (on the wage) is 8.57%/year
  - % interpretation because wage is in log in the model
  - Categorical variables are managed by R
  - which selects the reference cat.
- Compare Nested Models: Anova (Analysis of Variance) Table
  - Regression + constraint
  - cps_noeth <- lm(log(wage) ~ experience + I(experience^2) + education, data = CPS1988)
  - Usually, the test is on more than one variable
  - anova(cps_noeth, cps_lm)
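A sketch of the nested-model comparison on simulated data shaped like the wage equation (the data frame d and all coefficients are assumptions; the real example uses CPS1988). With a single excluded variable, the F statistic of anova( ) is the square of the t statistic of that variable in the larger model.

```r
set.seed(5)
n <- 400
d <- data.frame(
  experience = runif(n, 0, 40),
  education  = sample(8:18, n, replace = TRUE),
  ethnicity  = factor(sample(c("cauc", "afam"), n, replace = TRUE,
                             prob = c(0.8, 0.2)), levels = c("cauc", "afam"))
)
d$lwage <- 4.3 + 0.08 * d$experience - 0.0013 * d$experience^2 +
  0.09 * d$education - 0.25 * (d$ethnicity == "afam") + rnorm(n, sd = 0.5)

full  <- lm(lwage ~ experience + I(experience^2) + education + ethnicity, data = d)
restr <- lm(lwage ~ experience + I(experience^2) + education, data = d)

tab    <- anova(restr, full)              # restricted model first
F_stat <- tab$F[2]
t_stat <- summary(full)$coefficients["ethnicityafam", "t value"]

print(tab)
print(c(F = F_stat, t_squared = t_stat^2))
```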

Interactions : effects of combined regressors

- e.g. in labor econ: the combined effect of education & ethnicity
  - Does one year of education have the same return for different ethnicities?
- This is modeled w/ multiplicative terms
  - Consider $\log(wage) = \beta_1 + \beta_2\, ethnicity + \beta_3\, ethnicity \times education + \beta_4\, education + \varepsilon$
  - Then $\partial \log(wage) / \partial education = \beta_3\, ethnicity + \beta_4$
  - If ethnicity = 0, then the effect of 1 year of education is $\beta_4$
  - If ethnicity = 1, then the effect of 1 year of education is $\beta_3 + \beta_4$
- Let a, b, c be three factors
  - so that each has several discrete levels
- and x, y two continuous variables (quantitative)

Several Models/Formulas with Interactions

- y ~ a + x: no interaction
  - A single slope (of x) but one intercept for each level of factor a
- y ~ a*x: same as previous model +
  - one interaction term for each level of a with x (different slopes)
  - In a more formal notation, let $d_{a_i} = I(a = i)$: $[y \sim a * x] \equiv \left[ y = \sum_i \alpha_i d_{a_i} + x \sum_i \beta_i d_{a_i} \right]$

Formulas with Interactions

- y ~ (a+b+c)^2
  - models all the interactions between 2 variables
  - but not between 3
  - So this is like having as many dichotomous var. as there are combinations of levels, $d_{a_i - b_j} = I(a = i \wedge b = j)$ for a & b
  - and similarly for a & c and for c & b
- SWIRL course Regression Models
  - Lesson 8: MultiVar Examples3
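What each formula generates can be inspected with model.matrix( ), which builds the regressor mtx without estimating anything. A sketch with made-up factors (a has 3 levels, b and c have 2 each); one level per factor serves as the reference, so it gets no column.

```r
set.seed(8)
d <- data.frame(
  a = factor(sample(c("a1", "a2", "a3"), 60, replace = TRUE)),
  b = factor(sample(c("b1", "b2"), 60, replace = TRUE)),
  c = factor(sample(c("c1", "c2"), 60, replace = TRUE)),
  x = rnorm(60)
)

X_add   <- model.matrix(~ a + x, data = d)          # intercept shifts only
X_int   <- model.matrix(~ a * x, data = d)          # + one slope shift per level of a
X_two   <- model.matrix(~ (a + b + c)^2, data = d)  # all 2-way interactions
X_three <- model.matrix(~ a * b * c, data = d)      # adds the 3-way interaction

print(colnames(X_add))
print(colnames(X_int))
print(colnames(X_two))
```

The interaction columns are the ones whose name contains a colon.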

Interactions Wage eq. : ethnicity & education

- cps_int <- lm(log(wage) ~ experience + I(experience^2) + education*ethnicity, data = CPS1988)
  - Only one of the “+” from cps_lm has been replaced by *
- coeftest(cps_int)
  - A more compact version of summary( )
  - That can also be used on some other regression cmds
- The regression outputs the effects of education & ethnicity
  - called “main effects”
  - and the product of education & an indicator for the level “afam” of ethnicity
  - Why afam? Because R takes the first level of the factor, here cauc, as the reference category

Interactions Wage eq. : ethnicity & education

- afam has a neg. effect on the intercept
  - lower average wage for african-americans
  - AND on the slope of education
  - lower return to education for african-americans
- The effect is not very significant though
  - since significance at the 5% level with a sample of nearly 30 000 individuals is not very convincing

Predictions

- First define the values for which you want to predict.
  - We simplify the model to exp. & educ. for ease of presentation
  - Let's say we want to show the effect of Exp. at an average level of Educ.
- Create a new data frame w/ a C of average Educ & a C of all the possible values of Exp
  - Note that in the Census, some people have negative experience!
  - This is due to the way we compute Exp.
- Use the predict( ) cmd on
  - the lm object of interest: cps_lm here
  - the new data set for which we want predictions: cps2 here
  - predict( ) gives not only a prediction but also interval bounds
  - Plot that on the data
- detach(CPS1988) when you are done to avoid confusion
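The steps above can be sketched as follows. The data are simulated (cps_sim and its coefficients are assumptions standing in for CPS1988, so that AER is not required); cps2 plays the role described on the slide.

```r
set.seed(1988)
n <- 500
cps_sim <- data.frame(experience = sample(-2:45, n, replace = TRUE),
                      education  = sample(8:18, n, replace = TRUE))
cps_sim$wage <- exp(4.3 + 0.08 * cps_sim$experience -
                    0.0013 * cps_sim$experience^2 +
                    0.09 * cps_sim$education + rnorm(n, sd = 0.5))

fm <- lm(log(wage) ~ experience + I(experience^2) + education, data = cps_sim)

# New data: every value of experience, education fixed at its average
cps2 <- data.frame(experience = -2:45,
                   education  = mean(cps_sim$education))

# Prediction with the bounds of a 95% prediction interval
pred <- predict(fm, newdata = cps2, interval = "prediction")

pdf(NULL)                                     # null device, no file written
plot(log(wage) ~ experience, data = cps_sim, col = "grey")
lines(cps2$experience, pred[, "fit"], lwd = 2)
lines(cps2$experience, pred[, "lwr"], lty = 2)
lines(cps2$experience, pred[, "upr"], lty = 2)
invisible(dev.off())

print(head(pred))
```

The new data frame must use exactly the variable names of the formula; I(experience^2) is rebuilt automatically from experience. Passing data = to lm( ) also removes the need to attach and detach.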

Discussing Regressors and Model Building

When building a model, there are 2 contradictory forces

- If we omit a regressor, and it is in fact relevant
  - unobserved heterogeneity & inconsistency of LS estimators
  - we sometimes can deal w/ that using instruments or panel data
- If we include irrelevant regressors that are correlated w/ relevant ones
  - we create multicollinearity w/ the consequence that both relevant & irrelevant regressors may appear non-signif.
  - That may even occur w/ 2 relevant regressors, e.g. in a Quantity-Price relation, the prices of substitute goods are relevant, but may be correlated w/ own price

Collinearity – Endogeneity Trade-off

- From a statistical point of view, 2 collinear variables carry the same information
  - Their separate influence on the dependent variable cannot be assessed in the present sample
  - Be pragmatic: reject one of the 2 or merge them in some way that makes sense in context
- It is not really possible to escape such a trade-off
  - Especially since in a particular sample, a relevant regressor may coincidentally appear non-significant (if the sample is not large)
- Theory does not help by nature
  - since an empirical model is a trial of a theoretical model
  - theory helps interpret results, not guide them

Progressive Inclusion

- is an old way of looking at model building
1. Among potential regressors x, take the one w/ highest correlation w/ y
2. Regress y on that single regressor
  - Is it significant?
  - No: you don't have a model
  - Yes: estimate the one-regressor model & compute its residuals
3. Among the remaining regressors, take the one w/ highest correlation w/ the residuals
4. Repeat previous steps with progressively more regressors
  - Until one is non-significant

Progressive Inclusion

- The issue w/ this approach is that if there are several relevant regressors
  - then at least the first step might be inconsistent
  - because at least one relevant regressor is missing
- This is a very serious issue that leads to nonsensical results

Progressive Elimination

- Instead, consider the “largest reasonable set of regressors”
  - can be linked to the theory you want to test or to previous experience
- It is risky to just run this “encompassing” regression and report the results
  - because of multicollinearity

Progressive Elimination

- Gradually remove regressors one by one
  - Examine how the estimates of the remaining regressors evolve
  - If there is a noticeable increase in significance
    - but not so much change in estimates
    - collinearity was an issue
  - If estimated coefficients change wildly
    - omitted regressor endogeneity
- However
  - dropping a collinear regressor could lead to jumps in coef. estimates
    - after all, collinearity affects their variance
  - dropping a relevant regressor does not necessarily lead to major changes in the other coef.
    - when that regressor is not much correlated with the others
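A small simulation may make the “coefficients change wildly” case concrete. Everything here is an assumption chosen for the illustration: x2 is relevant (true coefficient 3) and correlated with x1 (slope 0.8), so dropping x2 pushes the coefficient of x1 from about 2 to about 2 + 3 × 0.8 = 4.4.

```r
set.seed(2018)
n  <- 5000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)     # x2 is correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)    # both regressors are relevant

full  <- lm(y ~ x1 + x2)                # correct model
short <- lm(y ~ x1)                     # x2 omitted
aux   <- lm(x2 ~ x1)                    # how x2 moves with x1

b_full  <- coef(full)
b_short <- coef(short)[["x1"]]

# In-sample identity: short coef = full coef + (coef of x2) * (slope of x2 on x1)
implied <- b_full[["x1"]] + b_full[["x2"]] * coef(aux)[["x1"]]

print(round(b_full, 3))
print(round(c(short = b_short, implied = implied), 3))
```

If x2 were uncorrelated with x1, the slope in aux would be close to 0 and dropping x2 would barely move the coefficient of x1, which is the second caveat on the slide.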

Summing up

- Model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ (no missing relevant regressor)
  - estimation by OLS when $x_2$ and $x_1$ are correlated
  - if they are not, there are NO serious consequences for $\hat\beta_1$
  - “not relevant but correlated to a relevant regressor” might not be empirically common

x2            in the model   Consequences on $\hat\beta_1$   on $\hat\beta_2$
relevant      included       May appear insignificant
relevant      not incl.      Inconsistent                    –
not relev.    included       May appear insignificant        should → 0
not relev.    not incl.      ? ?                             –

Document Edition Functionalities

Writing with R

- A few packages are designed to use R to write reports directly
1. The text is written directly in the script in the Editor window
  - Math formulas in LaTeX may be included
  - And of course, R commands (graphics, regressions...)
2. If the data change, or the model, everything is adjusted automatically
3. LaTeX helps choose an appropriate format
  - report, paper, presentation

SWeave – Knitr – Markdown

- Sweave simply sends the whole script to LaTeX
- knitr does the same but combines other packages and solves some issues in Sweave
- Markdown is the current standard
  - The script is directly printed using LaTeX or .doc (Word) or html (webpage)
  - Self-teach (I won't look into it)
  - http://rmarkdown.rstudio.com/lesson-1.html
  - https://www.r-bloggers.com/how-to-create-reports-with-r-markdown-in-rstudio/

Should we sum up?

- Anything?
