Research in Applied Econometrics Chapter 1. R
Pr. Philippe Polomé, Université Lumière Lyon 2
M1 APE Analyse des Politiques Économiques M1 RISE Gouvernance des Risques Environnementaux
2017 – 2018
Outline
SWIRL
Data Management
R graphics
Linear Regressions
Discussing Regressors and Model Building
Document Edition Functionalities
SWIRL
- Do Course 1: R Programming, Lessons 1-9 + 14, by yourself
  - To quit a lesson: Esc
  - Answer "no" to any proposition to "register"
  - When you see "...": press Enter
  - Sometimes there is a lot of text to read: that is a good exercise
- Follow the commands in RAE2017.R
  - They follow the slides
- We do just Lesson 1
  - To make sure you can start the other lessons by yourself
SWIRL R programming overview
1: Basic Building Blocks
2: Workspace and Files
3: Sequences of Numbers
4: Vectors
5: Missing Values
6: Subsetting Vectors
7: Matrices and Data Frames
8: Logic
9: Functions
10: lapply and sapply
11: vapply and tapply
12: Looking at Data
13: Simulation
14: Dates and Times
15: Base Graphics
A few commands outside of SWIRL
- In RStudio, create a new project (upper right button)
  - Call it "RAE" for example
  - Store it where you can find it again
- Execute the commands in RAE2017.R to see the output
- Usual math functions: log, exp, sign, sqrt, abs, min, max
  - log(exp(sin(pi/4)^2)*exp(cos(pi/4)^2)): type it in the Console, then press Enter
- Special vectors
  - ones <- rep(1, 10)
  - even <- seq(from = 2, to = 20, by = 2)
  - trend <- 1981:2005
- diag(4): identity mtx of size 4
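The commands of this slide can be checked in the Console with the short sketch below. The object names res and I4 are not in the slides; they are only used here to store the results.

```r
# The argument of log() is exp(sin^2 + cos^2) = exp(1), so the result should be 1
res <- log(exp(sin(pi/4)^2) * exp(cos(pi/4)^2))
print(res)

# Special vectors
ones  <- rep(1, 10)                       # ten ones
even  <- seq(from = 2, to = 20, by = 2)   # 2, 4, ..., 20
trend <- 1981:2005                        # consecutive years

# Identity mtx of size 4 (name I4 chosen for this sketch)
I4 <- diag(4)
print(length(trend))
```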
Mtx Operations
- A <- matrix(1:6, nrow = 2)
  - A: type it to see what it looks like & how R gives the position of the elements
  - Look at your Environment window: A is now there
  - It remains in your project until erased (the brush)
- t(A) = transpose of A (not A')
- dim(A) = dimensions of A (R then C)
- nrow(A) ; ncol(A): nbr of R ; C
- A[i,j]: extract element (i,j)
  - Does not remove it from the mtx
- A[,j]: extract C j (all the R) into one vector
  - A[i,]: same for R i
- A1 <- A[1:2, c(1, 3)]: A1 has 2 R containing the elts in R 1 to 2 and C 1 & 3 from A
  - For this particular mtx, same result w/ A[,-2]
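A small sketch of the extractions above, to run line by line. A1_alt is a name introduced here only to compare the two ways of building A1.

```r
A <- matrix(1:6, nrow = 2)   # filled column by column: C 1 = (1,2), C 2 = (3,4), C 3 = (5,6)
dim(A)                       # R then C
A[2, 3]                      # element in R 2, C 3
A[, 2]                       # C 2 as a vector
A[1, ]                       # R 1 as a vector

A1     <- A[1:2, c(1, 3)]    # keep C 1 & 3
A1_alt <- A[, -2]            # drop C 2: same result for this mtx
identical(A1, A1_alt)
```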
Mtx Operations
- det(A1): determinant
- eigen(A1): eigenvalues (& eigenvectors)
- chol(A1): Cholesky decomposition (needs a symmetric positive-definite mtx; type ?chol in Console)
- solve(A1): inverse
- A %*% B: mtx product
  - A*A: element-by-element product
  - kronecker(A, B): Kronecker product (type ?kronecker)
- crossprod(A, B): efficient calculation of A'B
- diag(A1): extract diag
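The sketch below applies these commands to the A and A1 of the previous slide. One assumption of this sketch: since A1 is not symmetric, chol( ) is applied to A1'A1 instead, which is symmetric positive definite. The names d, A1inv, ev, AtA, EE, K, S and U are only used here.

```r
A  <- matrix(1:6, nrow = 2)
A1 <- A[1:2, c(1, 3)]            # 2 x 2 mtx w/ columns (1,2) and (5,6)

d     <- det(A1)                 # 1*6 - 5*2
A1inv <- solve(A1)               # inverse: A1inv %*% A1 is the identity
ev    <- eigen(A1)$values        # eigenvalues only

AtA <- crossprod(A)              # same as t(A) %*% A, a 3 x 3 mtx
EE  <- A * A                     # element-by-element product
K   <- kronecker(diag(2), A1)    # 4 x 4 mtx w/ A1 repeated on the diagonal blocks

# chol() needs a symmetric positive-definite mtx: use A1'A1 (assumption of this sketch)
S <- crossprod(A1)
U <- chol(S)                     # upper triangular, w/ t(U) %*% U equal to S
```

The product of the eigenvalues in ev should equal the determinant d, which is a quick check of both commands.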
Mtx Operations
- cbind(1, A1): "combine" one C of ones and A1 (side by side)
- rbind(A1, diag(4, 2)): "stack" A1 & a diag mtx of size 2 w/ 4 on the diag (one above the other)
Data Management
Dataframe
- "Frame" = "context"
  - In R, a "dataframe" is a data mtx
  - a collection of vectors of the same length
  - Stacked together horizontally
- Each vector = 1 C = "variable"
  - Possibly of different natures
    - quantitative (numeric) but also qualitative, characters, dates...
  - it may further contain meta-data
    - e.g. variable type or category names
- Each R = 1 obs in the sample
- An "array" is, in R, a more general object as it may have more than 2 dimensions
Dataframe Creation
- Several ways
  - keyboard (cf. Swirl R Programming lesson 7)
  - read an R file
  - import
- keyboard example
  - alternative 1
    - mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)
  - alternative 2
    - mydata <- as.data.frame(matrix(1:30, ncol = 3)) and names(mydata) <- c("one", "two", "three")
  - R is not very good for encoding data manually
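Both keyboard alternatives can be compared with the sketch below. The second data frame is called mydata2 here (a name not in the slides) so that the two can be checked against each other.

```r
# Alternative 1: name the columns at creation
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Alternative 2: convert a mtx, then name the columns
mydata2 <- as.data.frame(matrix(1:30, ncol = 3))
names(mydata2) <- c("one", "two", "three")

str(mydata)                 # structure: nbr of obs., variables and their types
all(mydata == mydata2)      # do the two alternatives hold the same values?
```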
attach
- A dataframe is "attached"
  - w/ command attach
  - then the variables' names in the dataframe may be used directly in commands
- For example
  - mean(two) produces an error message
  - attach(mydata) and then mean(two) produces the average of variable "two"
- detach(mydata) is self-explanatory
  - Why detach? e.g. to avoid confusion
- Attach for a single operation
  - with(mydata, mean(two))
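The three ways of reaching variable "two" can be compared as follows. This is a sketch with the mydata of the previous slide; m_dollar, m_with and m_attach are names used only here.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Without attaching, the variable is reached through the dataframe
m_dollar <- mean(mydata$two)

# with() evaluates the expression inside the dataframe, for this call only
m_with <- with(mydata, mean(two))

# attach() puts the variables on the search path until detach()
attach(mydata)
m_attach <- mean(two)
detach(mydata)

c(m_dollar, m_with, m_attach)
```

with( ) leaves nothing attached afterwards, which is why it is convenient for a single operation.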
Subset Selection
- As seen in swirl, a subset of a dataframe can be accessed by [ or $
  - $ extracts a single variable
- The command subset sometimes works better (e.g. conditional selection)
  - e.g. mydata.sub <- subset(mydata, two <= 16, select = -two)
  - selects all the obs. of variables one & three
  - for which the obs. of variable two are ≤ 16
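As a sketch, the same selection can be written with subset( ) and with [ ; mydata.sub2 is a name introduced here for the comparison.

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Conditional selection w/ subset(): condition on the R, select = on the C
mydata.sub <- subset(mydata, two <= 16, select = -two)

# Same selection w/ [ : logical condition for the R, names for the C
mydata.sub2 <- mydata[mydata$two <= 16, c("one", "three")]

mydata.sub
```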
Export (write) a dataframe
- write.table(mydata, file = "mydata.txt", col.names = TRUE)
  - creates a txt file mydata.txt in the working directory
    - normally where your project is
  - Meta-data are not passed
  - The text file format is
    "one" "two" "three"
    "1" 1 11 21
    "2" 2 12 22
    ...
  - So it looks like the C headers are shifted left (the 1st C holds the R names and has no header)
  - Take that into account w/ the software you use to open it
Import (read) a dataframe
- From a text file (.txt or .csv)
  - newdata <- read.table("mydata.txt", header = TRUE)
  - reads a txt file in which the 1st R has the variable names
  - this is placed in a "table" called newdata
  - read_csv( ) (package readr) works similarly, from csv into a data frame
- read.table accepts many options
  - C separator: , ;
  - Decimal separator: . ,
  - French is your enemy here
  - ?read.table
- The Environment window has a button that makes it very easy
  - a preview is generated
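A sketch of a full export-import round trip. Two assumptions of this sketch: temporary files are used instead of the project folder, and a small data frame prices is invented to show the "French" format (; between C and , as decimal separator).

```r
mydata <- data.frame(one = 1:10, two = 11:20, three = 21:30)

# Round trip through a text file (temporary file, so nothing is left in the project)
f <- tempfile(fileext = ".txt")
write.table(mydata, file = f, col.names = TRUE)
newdata <- read.table(f, header = TRUE)

# "French" format: ; as C separator and , as decimal separator
prices <- data.frame(good = c("a", "b"), price = c(1.5, 2.25))  # hypothetical data
f2 <- tempfile(fileext = ".csv")
write.table(prices, file = f2, sep = ";", dec = ",", row.names = FALSE)
prices2 <- read.table(f2, header = TRUE, sep = ";", dec = ",")

str(newdata)
str(prices2)
```

If sep and dec are not given when reading the second file, the price column will not be read as numbers, which is the usual symptom of a separator problem.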
Import a dataframe
- scan is used for data that are not in mtx form
  - ?scan
- Import from other software: Excel, Stata, SAS...
  - Easiest: if you have access to the software, export the data file in txt or csv
    - loss of meta-data
  - RStudio proposes several formats
    - It often fails, as these software packages change their formats often
  - Use Google
    - e.g. "R import Stata 17 data"
  - Also www.statmethods.net/input/importingdata.html
    - for a few formats
R graphics
Plot
- First SWIRL
  - course R Programming, lesson 15 Base Graphics
- A few additional graphic elements using plot
  - Packages lattice & ggplot2 are better
    - http://varianceexplained.org/RData/code/code_lesson2/#segment1
  - R has many publication-quality graphics
  - But they are not very intuitive
- plot( ) is the default graphic command for many objects:
  - dataframes, time series, fitted linear models
  - it is also an old, crude command
Examples with data("CPS1988")
- Data file is CPS1988, shipped w/ the AER package
  - Pop. survey March 1988, US Census Bureau
  - 28 155 obs., cross-section
  - Men, 18-70 y-o
  - Income > US$ 50 in 1988
  - Not self-employed, not working w/o salary
- summary(CPS1988)
- Quantitative data
  - wage: $/week
  - education & experience (= age - education - 6): in years
“Scatterplots” – dispersion – XY
- Probably the most common plots in stat (with histograms)
  - We use CPS1988: a census data file on wage and its determinants
  - From the AER package
- attach(CPS1988)
  - plot(education, log(wage))
    - 1st arg is on the x-axis, 2nd on the y-axis
  - rug(education)
  - rug(log(wage), side = 2)
    - a rug is a 1-D plot drawn along an axis
- detach(CPS1988)
- plot(log(subs) ~ log(citeprice), data = Journals)
  - alternative to avoid attaching the dataframe
R Graphic Parameters
- A plot result may be modified in many ways
  - E.g. argument type controls whether the plot is made of points (type = "p"), lines (type = "l"), both (type = "b"), steps (type = "s") or others
- Several dozen parameters may be modified
  - See ?par
  - They may be set w/ command par( ), which applies to the plots drawn afterwards
  - Or they can be supplied in the plot( ) command, e.g. plot(log(wage) ~ education, data = CPS1988, pch = 20, col = "blue", ylim = c(4, 10), xlim = c(0, 20), main = "Wage by education years")
- Next slide: list of par
Argument          Description
axes              should axes be drawn?
bg                background color
cex               size of a point or symbol
col               color
las               orientation of axis labels
lty, lwd          line type and line width
main, sub         main title and subtitle
mar               size of margins
mfcol, mfrow      array defining layout for several graphs on one plot
pch               plotting symbol
type              types
xlab, ylab        axis labels
xlim, ylim        axis ranges
xlog, ylog, log   logarithmic scales
R Graphic Parameters
- Add layer(s) to a plot: lines( ), points( ), text( ), legend( )
  - Add a straight line: abline(a, b)
    - a intercept, b slope
- 1 plot over another
  - x <- rnorm(50)
  - x2 <- rnorm(50, -1)
  - plot(ecdf(x), xlim = range(c(x, x2)))
    - ecdf: empirical cumulative distribution function
  - plot(ecdf(x2), add = TRUE, lty = "dashed")
- Barplots, pie charts, boxplots, QQ plots & histograms
  - barplot( ), pie( ), boxplot( ), qqplot( ), hist( )
  - We'll see later
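The "1 plot over another" example can be completed with a few layers, as sketched below. The seed, the colors, the dotted line and the legend are choices made for this sketch, not part of the slides.

```r
set.seed(123)                 # arbitrary seed, for reproducibility only
x  <- rnorm(50)
x2 <- rnorm(50, -1)

# 1st plot, w/ an x-axis wide enough for both samples
plot(ecdf(x), xlim = range(c(x, x2)), main = "Two empirical cdf")
# 2nd plot added on top of the 1st one
plot(ecdf(x2), add = TRUE, lty = "dashed", col = "blue")

# Extra layers: a horizontal line at 0.5 and a legend
abline(h = 0.5, lty = "dotted")
legend("topleft", legend = c("x", "x2"),
       lty = c("solid", "dashed"), col = c("black", "blue"))

# ecdf() returns a function: share of obs. below or equal to a value
Fx <- ecdf(x)
Fx(max(x))
```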
Export graphics
- To use R graphics in other software
  - "Export" sends the graph to a "device"
  - Really: just a file, e.g. w/ a .pdf or .jpg extension
- All devices work similarly in R, see ?Devices
  1. The device is opened by a command that bears its name, e.g. pdf( )
  2. Then, the plot is executed
  3. Finally, the device is closed: dev.off( )
- Example
  - pdf("myfile.pdf", height = 5, width = 6)
  - plot(1:20, pch = 1:20, col = 1:20, cex = 2)
  - dev.off()
  - Search for myfile.pdf on your laptop
- Simplest: "Export" button in Plots window
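The three steps can be checked without searching the laptop, as in the sketch below. It assumes the file is written to R's temporary folder rather than to the project folder; f is a name used only here.

```r
f <- file.path(tempdir(), "myfile.pdf")       # temporary folder (assumption of this sketch)

pdf(f, height = 5, width = 6)                 # 1. open the device (size in inches)
plot(1:20, pch = 1:20, col = 1:20, cex = 2)   # 2. execute the plot
dev.off()                                     # 3. close the device: the file is written

file.exists(f)                                # is the file there?
```

Forgetting dev.off( ) is the usual reason for an empty or unreadable pdf.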
Math Formulas in a Plot
- R may pass a formula in a plot via LaTeX-like expressions
  - see ?plotmath
- Example
  - plot of the std normal density w/ its math definition
  - curve(dnorm, from = -5, to = 5, col = "slategray", lwd = 3, main = "Density of the Standard Normal Distribution")
  - text(-5, 0.3, expression(f(x) == frac(1, sigma ~~ sqrt(2*pi)) ~~ e^{-frac((x - mu)^2, 2*sigma^2)}), adj = 0)
  - Unfortunately, you have to know LaTeX
  - & the parameters are not easy
Histograms & boxplots
- Continue w/ CPS1988 data base on wage & its determinants
  - summary(CPS1988) reveals that some variables are categorical
  - Categorical: called factors in R
- Factors are vectors of categories
  - sometimes w/ metadata
    - e.g. category names
  - g <- rep(0:1, c(2, 4))
  - g <- factor(g, levels = 0:1, labels = c("male", "female"))
    - Names categories (0, 1) of g as "male" (= 0) & "female" (= 1)
    - so g is [1] male male female female female female
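A short sketch to inspect the factor g built above; nothing is assumed beyond the commands of the slide.

```r
g <- rep(0:1, c(2, 4))        # 0 0 1 1 1 1
g <- factor(g, levels = 0:1, labels = c("male", "female"))

levels(g)                     # category names
table(g)                      # counts per category
as.integer(g)                 # internal codes: they start at 1, not at 0
```

The internal codes show that the original values 0 & 1 are not kept: only the order of the levels is.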
Factors in CPS1988
- In CPS1988, the factors are
  - ethnicity: takes the values caucasian "cauc" & african-american "afam"
  - smsa: residence in an urban area
  - region
  - parttime: part-time work
- Plots according to data type
  - Numerical/Quantitative or categorical
  - Single variable or 2 in relation
One numerical variable : histogram & density
- hist(wage, freq = FALSE)
  - option freq = FALSE
    - relative frequencies (densities), else absolute (counts)
  - option breaks = zzz
    - "bin" = container: choose the break points, hence the length of the base of the rectangles
- hist(log(wage), freq = FALSE)
- lines(density(log(wage)), col = 4)
  - Command density is actually a non-parametric estimate of the density function (next year)
- Remarks
  - the distribution of the log is less asymmetrical than that of the raw data
  - data in log are often closer to a normal
  - That is often the case w/ econ. data & a rationale for the normality hypothesis
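The remarks can be illustrated without AER on simulated data. In the sketch below, wage_sim is a hypothetical log-normal wage variable, and skew( ) is a crude asymmetry measure written for this sketch (third standardized moment); neither comes from the slides.

```r
set.seed(42)                                          # arbitrary seed
wage_sim <- rlnorm(1000, meanlog = 6, sdlog = 0.6)    # hypothetical weekly wages

hist(log(wage_sim), freq = FALSE, breaks = 30, main = "Simulated log-wage")
lines(density(log(wage_sim)), col = 4)

# w/ freq = FALSE, the areas of the rectangles sum to 1
h <- hist(log(wage_sim), breaks = 30, plot = FALSE)
sum(h$density * diff(h$breaks))

# crude asymmetry measure (helper of this sketch)
skew <- function(z) mean((z - mean(z))^3) / sd(z)^3
c(raw = skew(wage_sim), log = skew(log(wage_sim)))
```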
One categorical
- W/ categorical data
  - Mean & variance have no meaning
  - But frequencies do
- summary(region): absolute frequencies (counts)
- tab <- table(region): stores these freq. in a table called tab
- prop.table(tab): computes the proportions (relative freq.)
- Barplots & pie charts often visualise cat. data quite well
  - barplot(tab)
  - pie(tab)
  - These plots can be modified using parameters
2 categorical
- Usually presented in a Contingency Table
  - xtabs( ) w/ a formula interface:
    - e.g. xtabs(~ ethnicity + region, data = CPS1988)
    - the data argument is optional if CPS1988 is still attached
  - table(ethnicity, region): same results
- A plot of that is a "spine plot"
  - plot(ethnicity ~ region): formula
  - plot(ethnicity, region): what differences?
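Frequency tables for one and two factors can be tried on simulated data, as sketched below. The data frame d and its two factors are invented for this sketch; they only mimic the names of CPS1988.

```r
set.seed(1)                                   # arbitrary seed
d <- data.frame(                              # hypothetical data, not CPS1988
  ethnicity = factor(sample(c("cauc", "afam"), 200, replace = TRUE, prob = c(0.9, 0.1)),
                     levels = c("cauc", "afam")),
  region    = factor(sample(c("northeast", "midwest", "south", "west"), 200, replace = TRUE))
)

tab1 <- table(d$region)                       # one factor: counts
prop.table(tab1)                              # proportions

tab2  <- xtabs(~ ethnicity + region, data = d)   # two factors: contingency table
tab2b <- table(d$ethnicity, d$region)            # same counts w/o the formula interface
prop.table(tab2, margin = 2)                  # shares within each region (each C sums to 1)
```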
2 numerical
- The correlation coefficient r is typical
  - For positive & asymmetrical variables: Spearman's ρ
    - the correlation of ranks, instead of values, is often preferred because r is not robust to asymmetry
- cor(log(wage), education)
- cor(log(wage), education, method = "spearman")
  - Results differ a bit
- plot(log(wage) ~ education)
  - scatterplot shows little correlation
  - but log makes it difficult to see graphically
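That Spearman's ρ is r computed on the ranks can be verified directly. The sketch below uses simulated data; educ and wage_sim are hypothetical variables, and the coefficients used to generate them are arbitrary.

```r
set.seed(7)                                                  # arbitrary seed
educ     <- sample(8:18, 500, replace = TRUE)                # hypothetical years of education
wage_sim <- exp(4.5 + 0.08 * educ + rnorm(500, sd = 0.6))    # positive & asymmetrical

r_pearson  <- cor(wage_sim, educ)
r_spearman <- cor(wage_sim, educ, method = "spearman")

# Spearman's rho = Pearson's r on the ranks
r_ranks <- cor(rank(wage_sim), rank(educ))

# ranks are unchanged by log, so rho is unchanged too
r_spearman_log <- cor(log(wage_sim), educ, method = "spearman")

c(pearson = r_pearson, spearman = r_spearman, ranks = r_ranks, spearman_log = r_spearman_log)
```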
1 numerical & 1 categorical
- Often, conditional moments are calculated
  - e.g. average wage by ethnicity
  - tapply(log(wage), ethnicity, mean)
    - "Applies" the cmd mean to log(wage) within each level of ethnicity
    - mean may be replaced by any valid cmd, e.g. quantile
- Box plots & QQ (quantile-quantile) plots are often used
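A sketch of tapply( ) on simulated data; eth and lw are hypothetical stand-ins for ethnicity and log(wage), with arbitrary means.

```r
set.seed(3)                                   # arbitrary seed
eth <- factor(sample(c("cauc", "afam"), 300, replace = TRUE), levels = c("cauc", "afam"))
lw  <- rnorm(300, mean = ifelse(eth == "cauc", 6.2, 6.0), sd = 0.5)   # hypothetical log-wages

m <- tapply(lw, eth, mean)        # one mean per level of eth
q <- tapply(lw, eth, quantile)    # a list: the quantiles for each level
m
q$afam
```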
1 numerical & 1 categorical : Box plot
- A box plot is a crude representation of an empirical distribution
  - The box is limited by "hinges" (1st & 3rd quartiles) and shows the median
  - Outside of the box, 2 lines indicate the smallest & largest obs.
    - within 1.5 × the size of the box from the closest hinge
  - Any obs. outside is represented by separate points
- boxplot(log(wage) ~ ethnicity)
1 numerical & 1 categorical : QQ plot
- A QQ plot matches the quantiles of 2 (empirical) distributions
  - Recall that quantiles are quantities
    - e.g. the 1st quartile of afam wage is the wage s.t. 25% of afam make less & 75% more
  - If the 2 distributions are identical: QQ plot = diagonal
  - Otherwise, if e.g. cauc make more than afam, then
    - with cauc on the x-axis, the QQ plot will be below the diag.
    - A bit like the plot of income inequality, but w/ 2 var.
  - awage <- subset(CPS1988, ethnicity == "afam")$wage
  - cwage <- subset(CPS1988, ethnicity == "cauc")$wage
  - qqplot(awage, cwage)
  - abline(0, 1): overlays the diag (intercept 0, slope 1)
- detach(CPS1988) to close CPS1988
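The position of a QQ plot relative to the diagonal can be checked on simulated data. In this sketch, a and b are two hypothetical wage samples of the same size, with b higher on average and placed on the y-axis, so the points should lie mostly above the diagonal.

```r
set.seed(11)                                   # arbitrary seed
a <- rlnorm(200, meanlog = 6.0, sdlog = 0.5)   # hypothetical wages, group a
b <- rlnorm(200, meanlog = 6.3, sdlog = 0.5)   # group b earns more on average

qq <- qqplot(a, b, xlab = "a", ylab = "b")     # also returns the plotted quantiles
abline(0, 1)                                   # the diag

mean(qq$y > qq$x)                              # share of points above the diag
```

Swapping the two arguments of qqplot( ) puts the higher group on the x-axis and the points below the diagonal.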
Linear Regressions
Basic Regression Commands in R
- Linear Regression Model (LRM): $y_i = x_i'\beta + \varepsilon_i$ w/ $i = 1 \dots n$
  - In mtx form: $y = X\beta + \varepsilon$
- Typical hyp. in cross-sections
  - $E(\varepsilon | X) = 0$ (exogeneity)
  - $Var(\varepsilon | X) = \sigma^2 I$ ("sphericity": homoscedasticity & no autoc.)
- In R, models are usually fitted by calling a cmd
  - For the LRM in cross-section: fm <- lm(formula, data, ...)
  - Argument ... replaces a series of arguments
    - describing the model
    - or choosing the computation mode (algorithm)
    - or options
Basic Regression Commands in R
- The lm cmd returns an object
  - Here: the fitted model under the name fm
  - May be visualised in many ways or summarized
- The lm object can be used to compute:
  - Predictions & fitted values, residuals, ... by means of fm$... (see RAE2017.R)
  - Tests & several postestimation diagnostics
- Most estimation commands work the same way
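A minimal sketch of what an lm object contains, on simulated data. The data frame d, its true coefficients (1 & 2) and the by-hand computation b are assumptions of this sketch, not taken from RAE2017.R.

```r
set.seed(2017)                                   # arbitrary seed
n <- 200
x <- rnorm(n)
d <- data.frame(x = x, y = 1 + 2 * x + rnorm(n)) # simulated, true coefficients 1 & 2

fm <- lm(y ~ x, data = d)
coef(fm)                      # same as fm$coefficients
head(fm$residuals)
head(fm$fitted.values)
summary(fm)$r.squared

# LS by hand: solve (X'X) b = X'y
X <- cbind(1, d$x)
b <- solve(crossprod(X), crossprod(X, d$y))
b
```

The by-hand coefficients should coincide with coef(fm), and the fitted values plus the residuals give back y.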
SWIRL
- Do Lessons 1-6, course "Regression Models" in Swirl
  - The others: later
  - Concentrate on the code, you know the econometrics
  - Remember to close files that may have remained open from the previous session
1. "Introduction"
  - To remember: "A coefficient will be within 2 standard errors of its estimate about 95% of the time"
2. "Residuals" is more difficult (reading + programming + concepts)
  - Explains loops
  - Forces you to re-read previous cmds
  - Make sure to execute program res_eqn.r when it shows up
SWIRL
3. "Least Squares Estimation": nothing in particular
4. "Introduction to Multivariable Regression"
  - Install package manipulate beforehand
  - I am not sure of the stability of this lesson
  - Do not edit the function myplot which will show up
  - ! cor(gpa_nor, gch_nor) will be ≈ $\hat\beta$, SWIRL expects =, so a bug
5. "Residual Variation"
  - "Gaussian elimination" shows that a k-regressors regression
    - may be seen as a succession of k 1-regressor regressions
  - DO NOT interpret this as model building or presentation or a way to select results
6. "MultiVar Examples": nothing in particular
Multivariate Linear Regression w/ Factors
- The purpose of this example is to demonstrate various R tools
  - that are used to transform & combine regressors
- Dataframe: CPS1988 as before
- SWIRL course "Regression Models"
  - lesson 7: "MultiVar Examples2"
  - Plots window for BoxPlot
  - sapply: use help in the Help window
Wage Equation
- Wage equation: $\log(wage) = \beta_1 + \beta_2 experience + \beta_3 experience^2 + \beta_4 education + \beta_5 ethnicity + \varepsilon$
  - cps_lm <- lm(log(wage) ~ experience + I(experience^2) + education + ethnicity, data = CPS1988)
- "Insulation function" I( )
  - indicates to R that ^2 is to be understood as the square of experience
  - otherwise, R reads ^ as a formula operator (crossing of terms): experience^2 reduces to experience and is dropped as redundant
  - This might be clearer w/ a formula y ~ a + (b+c)
    - Are there 2 variables on the RHS of the formula: a and (b+c), or are there 3?
    - To clarify, write y ~ a + I(b+c)
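The role of I( ) shows up in the number of estimated coefficients. The sketch below uses a simulated data frame d with hypothetical variables x and y; m_wrong and m_right are names used only here.

```r
set.seed(5)                                    # arbitrary seed
d <- data.frame(x = runif(100, 0, 40))         # hypothetical regressor
d$y <- 1 + 0.1 * d$x - 0.002 * d$x^2 + rnorm(100, sd = 0.1)

m_wrong <- lm(y ~ x + x^2, data = d)      # ^ is a formula operator here: x^2 reduces to x
m_right <- lm(y ~ x + I(x^2), data = d)   # I() protects the arithmetic: the square is a regressor

names(coef(m_wrong))
names(coef(m_right))
```

R gives no warning in the first case, so the missing squared term is easy to overlook.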
Results & Testing
- summary(cps_lm)
  - The return to education (on the wage) is 8.57%/year
    - % interpretation because wage is in log in the model
  - Categorical variables are managed by R
    - which selects the reference cat.
- Compare nested models: Anova (Analysis of Variance) table
  - Regression + constraint
    - cps_noeth <- lm(log(wage) ~ experience + I(experience^2) + education, data = CPS1988)
  - Usually, the test is on more than one variable
  - anova(cps_noeth, cps_lm)
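The comparison of nested models can be sketched on simulated data. The data frame d, the variables educ, eth and lw, and all coefficients below are hypothetical. With a single constraint, the F statistic of anova( ) is the square of the t statistic of the constrained coefficient.

```r
set.seed(8)                                    # arbitrary seed
n <- 500
d <- data.frame(educ = sample(8:18, n, replace = TRUE),
                eth  = factor(sample(c("cauc", "afam"), n, replace = TRUE),
                              levels = c("cauc", "afam")))
d$lw <- 4.5 + 0.08 * d$educ - 0.2 * (d$eth == "afam") + rnorm(n, sd = 0.5)

m_small <- lm(lw ~ educ, data = d)         # constrained model: no ethnicity
m_large <- lm(lw ~ educ + eth, data = d)   # unconstrained model

av <- anova(m_small, m_large)
av

# w/ one constraint: F = t^2
t_eth <- summary(m_large)$coefficients["ethafam", "t value"]
c(F = av$F[2], t2 = t_eth^2)
```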
Interactions : effects of combined regressors
- e.g. in labor econ: the combined effect of education & ethnicity
  - Does one year of education have the same return for different ethnicities?
- This is modeled w/ multiplicative terms
  - Consider $\log(wage) = \beta_1 + \beta_2 ethnicity + \beta_3 ethnicity \times education + \beta_4 education + \varepsilon$
  - Then $\partial \log(wage) / \partial education = \beta_3 ethnicity + \beta_4$
    - If ethnicity = 0, then the effect of 1 year of education is $\beta_4$
    - If ethnicity = 1, then the effect of 1 year of education is $\beta_3 + \beta_4$
- Let a, b, c be three factors
  - so that each has several discrete levels
- and x, y two continuous variables (quantitative)
Several Models/Formulas with Interactions
- y ~ a + x: no interaction
  - A single slope (of x) but one intercept for each level of factor a
- y ~ a*x: same as previous model +
  - one interaction term for each level of a with x (different slopes)
  - In a more formal notation, let $d_{ai} = I(a = i)$: $[y \sim a * x] \equiv \left[ y = \sum_i \beta_{ai} d_{ai} + x \sum_i \gamma_{ai} d_{ai} \right]$
Formulas with Interactions
- y ~ (a+b+c)^2
  - models all the interactions between 2 variables
  - but not between 3
  - So this is like as many dichotomous var. as the nbr of combinations of levels: $d_{ai-bj} = I(a = i \wedge b = j)$ for a & b
  - and similarly for a & c and for c & b
- SWIRL course Regression Models
  - Lesson 8: MultiVar Examples3
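How R expands these formulas can be read from the formula itself, w/o any data. In the sketch below, labels_of( ) is a small helper written for this purpose (not a base R function); it relies on terms( ), which is.

```r
# helper of this sketch: list the terms R builds from a formula
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ a + x)           # no interaction
labels_of(y ~ a * x)           # main effects + the interaction a:x
labels_of(y ~ (a + b + c)^2)   # main effects + all interactions between 2 variables, no a:b:c
```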
Interactions Wage eq. : ethnicity & education
- cps_int <- lm(log(wage) ~ experience + I(experience^2) + education*ethnicity, data = CPS1988)
  - Only one of the "+" from cps_lm has been replaced by *
- coeftest(cps_int) (package lmtest)
  - A more compact version of summary( )
  - That can also be used on some other regression cmds
- The regression outputs the effects of education & ethnicity
  - called "main effects"
  - and the product of education & an indicator for the level "afam" of ethnicity
  - Why afam? Because R takes the first level of a factor, here cauc, as the reference category
Interactions Wage eq. : ethnicity & education
- afam has a neg. effect on the intercept
  - lower average wage for african-americans
  - AND on the slope of education
    - lower return to education for african-americans
- The effect is not very significant though
  - since a 5% significance w/ a sample of nearly 30 000 individuals is not very convincing
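Reading the slopes from an interaction model can be practiced on simulated data. All variables and coefficients below (d, educ, eth, lw) are hypothetical; only the structure mimics the wage equation. The slope for afam is the slope of the reference level plus the interaction coefficient.

```r
set.seed(9)                                    # arbitrary seed
n <- 2000
d <- data.frame(educ = sample(8:18, n, replace = TRUE),
                eth  = factor(sample(c("cauc", "afam"), n, replace = TRUE),
                              levels = c("cauc", "afam")))
afam <- as.numeric(d$eth == "afam")
d$lw <- 4.5 + 0.09 * d$educ - 0.1 * afam - 0.01 * afam * d$educ + rnorm(n, sd = 0.5)

m_int <- lm(lw ~ educ * eth, data = d)
b <- coef(m_int)
slope_cauc <- b["educ"]                        # reference level
slope_afam <- b["educ"] + b["educ:ethafam"]    # reference slope + interaction

# w/ intercept & slope both interacted, this equals a separate regression on afam only
m_afam <- lm(lw ~ educ, data = subset(d, eth == "afam"))
c(from_interaction = unname(slope_afam), separate = unname(coef(m_afam)["educ"]))
```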
Predictions
- First define the values for which you want to predict.
  - We simplify the model to exp. & educ. for ease of presentation
  - Let's say we want to show the effect of Exp. at an average level of Educ.
- Create a new data frame w/ a C of average Educ & a C of all the possible values of Exp
  - Note that in the Census, some people have negative experience!
  - This is due to the way we compute Exp.
- Use a predict( ) cmd on
  - the lm object of interest: cps_lm here
  - the new data set for which we want predictions: cps2 here
  - predict( ) can give not only a prediction but also bounds
  - Plot that on the data
- detach(CPS1988) when you are done to avoid confusion
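The steps above can be sketched on simulated data. The data frame d, the model m and the new data frame new play the roles of CPS1988, cps_lm and cps2; their content is hypothetical.

```r
set.seed(10)                                   # arbitrary seed
n <- 1000
d <- data.frame(exper = sample(0:40, n, replace = TRUE),
                educ  = sample(8:18, n, replace = TRUE))
d$lw <- 4.3 + 0.08 * d$exper - 0.0013 * d$exper^2 + 0.09 * d$educ + rnorm(n, sd = 0.5)

m <- lm(lw ~ exper + I(exper^2) + educ, data = d)

# new data: all the values of experience, education at its average
new <- data.frame(exper = 0:40, educ = mean(d$educ))
p <- predict(m, newdata = new, interval = "prediction")   # C: fit, lwr, upr

# plot the prediction & its bounds on the data
plot(lw ~ exper, data = d, col = "grey", pch = 20)
lines(new$exper, p[, "fit"])
lines(new$exper, p[, "lwr"], lty = "dashed")
lines(new$exper, p[, "upr"], lty = "dashed")
```

The variable names in the new data frame must be exactly those used in the formula, otherwise predict( ) cannot find them.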
Discussing Regressors and Model Building
When building a model, there are 2 contradictory forces
- If we omit a regressor, and it is in fact relevant
  - unobserved heterogeneity & inconsistency of LS estimators
  - we sometimes can deal w/ that using instruments or panel data
- If we include irrelevant regressors that are correlated w/ relevant ones
  - we create multicollinearity w/ the consequence that both relevant & irrelevant regressors may appear non-signif.
  - That may even occur w/ 2 relevant regressors, e.g. in a Quantity-Price relation, the prices of substitute goods are relevant, but may be correlated w/ own price
Collinearity – Endogeneity Trade-off
- From a statistical point of view, 2 collinear variables carry the same information
  - Their separate influence on the dependent variable cannot be assessed in the present sample
  - Be pragmatic: reject one of the 2 or merge them in some way that makes sense in context
- It is not really possible to escape such a trade-off
  - Especially since in a particular sample, a relevant regressor may coincidentally appear non-significant (if the sample is not large)
- Theory does not help by nature
  - since an empirical model is a trial of a model
  - theory helps interpret results, not guide them
Progressive Inclusion
- is an old way of looking at model building
1. Among potential regressors x, take the one w/ highest correlation w/ y
2. Regress y on that single regressor
  - Is it significant?
    - No: you don't have a model
    - Yes: estimate the one-regressor model & compute its residuals
3. Among the remaining regressors, take the one w/ highest correlation w/ the residuals
4. Repeat previous steps with progressively more regressors
  - Until one is non-significant
Progressive Inclusion
- The issue w/ this approach is that if there are several relevant regressors
  - then at least the first step might be inconsistent
  - because at least one relevant regressor is missing
- This is a very serious issue that leads to nonsensical results
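The inconsistency of the first step can be seen in a small simulation. Everything below is an assumption of this sketch: two relevant regressors x1 & x2 with true coefficients 1, and x2 = 0.8 x1 + noise. In the one-regressor model, the coefficient of x1 also picks up the effect of x2 and should be close to 1 + 0.8 = 1.8 rather than 1.

```r
set.seed(12)                        # arbitrary seed
n  <- 5000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)           # relevant & correlated w/ x1
y  <- 1 + x1 + x2 + rnorm(n)        # true coefficients all equal to 1

b_short <- coef(lm(y ~ x1))["x1"]        # first step: x2 is missing
b_long  <- coef(lm(y ~ x1 + x2))["x1"]   # both relevant regressors included

c(short = unname(b_short), long = unname(b_long))
```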
Progressive Elimination
- Instead, consider the "largest reasonable set of regressors"
  - can be linked to the theory you want to test or to previous experience
- It is risky to just run this "encompassing" regression and report the results
  - because of multicollinearity
Progressive Elimination
- Gradually remove regressors one by one
  - Examine how the estimates of the remaining regressors evolve
  - If there is a noticeable increase in significance
    - but not so much change in estimates
    - collinearity was an issue
  - If estimated coefficients change wildly
    - omitted regressor endogeneity
- However
  - dropping a collinear regressor could lead to jumps in coef estimates
    - after all, collinearity affects their variance
  - dropping a relevant regressor does not necessarily lead to major changes in the other coef
    - when that regressor is not much correlated to the others
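The effect of collinearity on significance can be illustrated by a simulation. In this sketch (all values hypothetical), x2 is irrelevant but nearly collinear with the relevant x1; removing it should sharply reduce the standard error of the coefficient of x1. The helper se( ) is written for this sketch.

```r
set.seed(13)                        # arbitrary seed
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)       # irrelevant & nearly collinear w/ x1
y  <- 1 + x1 + rnorm(n)             # only x1 is relevant

# helper of this sketch: standard error of the coefficient of x1
se <- function(m) summary(m)$coefficients["x1", "Std. Error"]

m_full  <- lm(y ~ x1 + x2)          # "encompassing" regression
m_small <- lm(y ~ x1)               # x2 removed

c(full = se(m_full), small = se(m_small))
coef(m_full)["x1"]
coef(m_small)["x1"]
```

Comparing the two estimates of the coefficient of x1 shows the other point of the slide: the estimate itself may also jump when the collinear regressor is dropped, because its variance was large.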
Summing up
- Model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ (no missing relevant regressor)
  - estimation by OLS when $x_2$ and $x_1$ are correlated
  - if they are not, there are NO serious consequences for $\hat\beta_1$
  - "not relevant but correlated to a relevant regressor" might not be empirically common
x2                        Consequences on $\hat\beta_1$   on $\hat\beta_2$
relevant, included        May appear insignificant
relevant, not incl.       Inconsistent                    -
not relev., included      May appear insignificant        should → 0
not relev., not incl.     ? ?                             -
Document Edition Functionalities
Writing with R
- A few packages are designed to use R to write reports directly
1. The text is written directly in the script in the Editor window
  - Math formulas in LaTeX may be included
  - And of course, R commands (graphics, regressions...)
2. If the data change, or the model, everything is adjusted automatically
3. LaTeX helps choose an appropriate format
  - report, paper, presentation
SWeave – Knitr – Markdown
- SWeave simply sends the whole script to LaTeX
- knitr does the same but combines other packages and solves some issues in SWeave
- Markdown is the current standard
  - The script is directly rendered using LaTeX, .doc (Word) or html (webpage)
  - Self-teach (I won't look into it)
    - http://rmarkdown.rstudio.com/lesson-1.html
    - https://www.r-bloggers.com/how-to-create-reports-with-r-markdown-in-rstudio/
Should we sum up?