The purpose of this in-class lab16 is to use R to practice with panel data estimation methods. The lab16 should be completed in your group. To get credit, upload your .R script to the appropriate place on Canvas.
Open up a new R script (named ICL16_XYZ.R
, where XYZ
are your initials) and add the usual “preamble” to the top:
# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(broom)
library(magrittr)
library(stargazer)
library(clubSandwich)
library(plm) # You may need to install this package
Our data set will be a panel of wages for 545 men. Load the data from the wooldridge
package, format year
to be a factor, and rename the variable nr
to something more descriptive (id
):
df <- as_tibble(wagepan)
df %>% skimr::skim()
Name | Piped data |
Number of rows | 4360 |
Number of columns | 44 |
_______________________ | |
Column type frequency: | |
numeric | 44 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
nr | 0 | 1 | 5262.06 | 3496.15 | 13.00 | 2329.00 | 4569.00 | 8406.00 | 12548.00 | ▇▇▅▆▃ |
year | 0 | 1 | 1983.50 | 2.29 | 1980.00 | 1981.75 | 1983.50 | 1985.25 | 1987.00 | ▇▃▇▃▇ |
agric | 0 | 1 | 0.03 | 0.18 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
black | 0 | 1 | 0.12 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
bus | 0 | 1 | 0.08 | 0.26 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
construc | 0 | 1 | 0.08 | 0.26 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
ent | 0 | 1 | 0.02 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
exper | 0 | 1 | 6.51 | 2.83 | 0.00 | 4.00 | 6.00 | 9.00 | 18.00 | ▃▇▅▁▁ |
fin | 0 | 1 | 0.04 | 0.19 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
hisp | 0 | 1 | 0.16 | 0.36 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
poorhlth | 0 | 1 | 0.02 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
hours | 0 | 1 | 2191.26 | 566.35 | 120.00 | 2040.00 | 2080.00 | 2414.25 | 4992.00 | ▁▃▇▁▁ |
manuf | 0 | 1 | 0.28 | 0.45 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
married | 0 | 1 | 0.44 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
min | 0 | 1 | 0.02 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
nrthcen | 0 | 1 | 0.26 | 0.44 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
nrtheast | 0 | 1 | 0.19 | 0.39 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
occ1 | 0 | 1 | 0.10 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
occ2 | 0 | 1 | 0.09 | 0.29 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
occ3 | 0 | 1 | 0.05 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
occ4 | 0 | 1 | 0.11 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
occ5 | 0 | 1 | 0.21 | 0.41 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
occ6 | 0 | 1 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
occ7 | 0 | 1 | 0.09 | 0.29 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
occ8 | 0 | 1 | 0.01 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
occ9 | 0 | 1 | 0.12 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
per | 0 | 1 | 0.02 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
pro | 0 | 1 | 0.08 | 0.27 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
pub | 0 | 1 | 0.04 | 0.20 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
rur | 0 | 1 | 0.20 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
south | 0 | 1 | 0.35 | 0.48 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
educ | 0 | 1 | 11.77 | 1.75 | 3.00 | 11.00 | 12.00 | 12.00 | 16.00 | ▁▁▂▇▂ |
tra | 0 | 1 | 0.07 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
trad | 0 | 1 | 0.27 | 0.44 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▃ |
union | 0 | 1 | 0.24 | 0.43 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▂ |
lwage | 0 | 1 | 1.65 | 0.53 | -3.58 | 1.35 | 1.67 | 1.99 | 4.05 | ▁▁▁▇▁ |
d81 | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
d82 | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
d83 | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
d84 | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
d85 | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
d86 | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
d87 | 0 | 1 | 0.12 | 0.33 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 | ▇▁▁▁▁ |
expersq | 0 | 1 | 50.42 | 40.78 | 0.00 | 16.00 | 36.00 | 81.00 | 324.00 | ▇▂▁▁▁ |
df %<>%
dplyr::mutate(year=as_factor(year))
df %<>%
dplyr::rename(id = nr)
df %>% head()
## # A tibble: 6 x 44
## id year agric black bus construc ent exper fin hisp poorhlth hours
## <int> <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 13 1980 0 0 1 0 0 1 0 0 0 2672
## 2 13 1981 0 0 0 0 0 2 0 0 0 2320
## 3 13 1982 0 0 1 0 0 3 0 0 0 2940
## 4 13 1983 0 0 1 0 0 4 0 0 0 2960
## 5 13 1984 0 0 0 0 0 5 0 0 0 3071
## 6 13 1985 0 0 1 0 0 6 0 0 0 2864
## # ... with 32 more variables: manuf <int>, married <int>, min <int>,
## # nrthcen <int>, nrtheast <int>, occ1 <int>, occ2 <int>, occ3 <int>,
## # occ4 <int>, occ5 <int>, occ6 <int>, occ7 <int>, occ8 <int>, occ9 <int>,
## # per <int>, pro <int>, pub <int>, rur <int>, south <int>, educ <int>,
## # tra <int>, trad <int>, union <int>, lwage <dbl>, d81 <int>, d82 <int>,
## # d83 <int>, d84 <int>, d85 <int>, d86 <int>, d87 <int>, expersq <int>
It is important to know what your panel looks like. How many units? How many time periods? The easiest way to do this is the pdim()
function in the plm
package.
pdim(df)
## Balanced Panel: n = 545, T = 8, N = 4360
It is also helpful to “convert” the data to a cross-section of within-unit averages. Let’s do this for some of the key variables of our analysis.
df.within <- df %>%
dplyr::select(id,
year,
educ,
married,
union,
rur) %>%
group_by(id) %>%
summarize(
mean.edu = mean(educ),
var.edu = var(educ),
mean.marr = mean(married),
var.marr = var(married),
mean.union = mean(union),
var.union = var(union),
mean.rural = mean(rur),
var.rural = var(rur)
)
df.within %>% datatable()
df %>%
dplyr::select(id,
year,
educ,
married,
union,
rur) %>%
group_by(id) %>%
dplyr::summarise_at(vars(educ,married,union,rur),
list(mean = mean,var = var)) %>%
DT::datatable()
df.within %>% as.data.frame %>% stargazer(type="text")
##
## =================================================================
## Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
## -----------------------------------------------------------------
## id 545 5,262.059 3,498.960 13 2,329 8,406 12,548
## mean.edu 545 11.767 1.748 3 11 12 16
## var.edu 545 0.000 0.000 0 0 0 0
## mean.marr 545 0.439 0.377 0 0 0.8 1
## var.marr 545 0.120 0.115 0 0 0.2 0
## mean.union 545 0.244 0.329 0.000 0.000 0.375 1.000
## var.union 545 0.087 0.105 0.000 0.000 0.214 0.286
## mean.rural 545 0.204 0.359 0 0 0.2 1
## var.rural 545 0.039 0.087 0 0 0 0
## -----------------------------------------------------------------
Is there any within-person variance in the educ
variable? What about married
, union
, and rural
?
What does it mean for the married, union, or rural variables to have a positive within-person variance?
Why is it important to know if a variable has positive within-person variance?
Now estimate the following model using various options of the plm()
function:
\[ \begin{align*} \log(wage_{it}) & = \beta_0 + \beta_1 educ_{i} + \beta_2 black_{i} + \beta_3 hisp_{i} + \beta_4 exper_{it} + \beta_5 exper_{it}^2 + \beta_6 married_{it} + \\ &\phantom{=\,\,}\beta_7 union_{it} + \beta_8 rur_{it} + \sum_t \beta_{9,t}year_{it} + a_i + u_{it} \end{align*} \]
The pooled OLS model can be run from the plm()
function as follows.
est.pols <- plm(lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + rur + year,
data = df, index = c("id","year"),
model = "pooling")
est.re <- plm(lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + rur + year,
data = df, index = c("id","year"), model = "random")
Explain why the standard errors in the random effects model are larger (or equal to) those of the pooled OLS model.
What is the estimate of \(\theta\) in the RE model? (Hint: check est.re$ercomp$theta
) What does this tell you about what you expect the random effects estimates to be relative to the fixed effects estimates?
est.fe <- plm(lwage ~ I(exper^2) + married + union + rur + year,
data = df,
index = c("id","year"),
model = "within")
educ
, black
, hisp
, or exper
in the fixed effects model. (Note: the reason for not being able to estimate exper
is more nuanced)As discussed in class, the most appropriate standard errors are not those reported by plm()
. Let’s compute standard errors that account for within-person serial correlation, and that are robust to heteroskedasticity:
clust.po <- coef_test(est.pols, vcov = "CR1", cluster = "individual")
clust.re <- coef_test(est.re, vcov = "CR1", cluster = "individual")
clust.fe <- coef_test(est.fe, vcov = "CR1", cluster = "individual")
We can put these all in one stargazer
table:
stargazer(est.pols,
est.re,
est.fe,
se=list(clust.po$SE,
clust.re$SE,
clust.fe$SE),
type="text")
##
## ===========================================================================
## Dependent variable:
## --------------------------------------------------------------
## lwage
## (1) (2) (3)
## ---------------------------------------------------------------------------
## educ 0.087*** 0.091***
## (0.011) (0.011)
##
## black -0.149*** -0.141***
## (0.050) (0.051)
##
## hisp -0.016 0.016
## (0.040) (0.040)
##
## exper 0.069*** 0.106***
## (0.019) (0.016)
##
## I(exper2) -0.002** -0.005*** -0.005***
## (0.001) (0.001) (0.001)
##
## married 0.126*** 0.065*** 0.047**
## (0.026) (0.019) (0.021)
##
## union 0.182*** 0.107*** 0.079***
## (0.027) (0.021) (0.023)
##
## rur -0.138*** -0.023 0.049
## (0.035) (0.031) (0.039)
##
## year1981 0.053* 0.040 0.152***
## (0.028) (0.028) (0.026)
##
## year1982 0.055 0.030 0.254***
## (0.037) (0.035) (0.029)
##
## year1983 0.049 0.019 0.357***
## (0.046) (0.044) (0.035)
##
## year1984 0.074 0.041 0.494***
## (0.057) (0.055) (0.046)
##
## year1985 0.090 0.056 0.622***
## (0.066) (0.064) (0.057)
##
## year1986 0.120 0.089 0.771***
## (0.075) (0.075) (0.072)
##
## year1987 0.151* 0.132 0.931***
## (0.084) (0.085) (0.085)
##
## Constant 0.165 0.036
## (0.161) (0.161)
##
## ---------------------------------------------------------------------------
## Observations 4,360 4,360 4,360
## R2 0.199 0.181 0.181
## Adjusted R2 0.196 0.178 0.062
## F Statistic 72.036*** (df = 15; 4344) 958.833*** 76.530*** (df = 11; 3804)
## ===========================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Interpret the coefficient on union
across the three models.
Comment on the comparability of the pooled OLS and RE estimators now that within-person serial correlation has been addressed.