16混合OLS、随机效应和固定效应估计

The purpose of this in-class lab16 is to use R to practice with panel data estimation methods. The lab16 should be completed in your group. To get credit, upload your .R script to the appropriate place on Canvas.

16.1 For starters

Open up a new R script (named ICL16_XYZ.R, where XYZ are your initials) and add the usual “preamble” to the top:

# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(broom)
library(magrittr)
library(stargazer)
library(clubSandwich)
library(plm)   # You may need to install this package

16.1.1 Load the data

Our data set will be a panel of wages for 545 men. Load the data from the wooldridge package, format year to be a factor, and rename the variable nr to something more descriptive (id):

df <- as_tibble(wagepan)
df %>% skimr::skim()

Data summary
Name	Piped data
Number of rows	4360
Number of columns	44
_______________________
Column type frequency:
numeric	44
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
nr	1	5262.06	3496.15	13.00	2329.00	4569.00	8406.00	12548.00	▇▇▅▆▃
year	1	1983.50	2.29	1980.00	1981.75	1983.50	1985.25	1987.00	▇▃▇▃▇
agric	1	0.03	0.18	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
black	1	0.12	0.32	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
bus	1	0.08	0.26	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
construc	1	0.08	0.26	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
ent	1	0.02	0.12	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
exper	1	6.51	2.83	0.00	4.00	6.00	9.00	18.00	▃▇▅▁▁
fin	1	0.04	0.19	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
hisp	1	0.16	0.36	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▂
poorhlth	1	0.02	0.13	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
hours	1	2191.26	566.35	120.00	2040.00	2080.00	2414.25	4992.00	▁▃▇▁▁
manuf	1	0.28	0.45	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▃
married	1	0.44	0.50	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▆
min	1	0.02	0.12	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
nrthcen	1	0.26	0.44	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▃
nrtheast	1	0.19	0.39	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▂
occ1	1	0.10	0.31	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
occ2	1	0.09	0.29	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
occ3	1	0.05	0.22	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
occ4	1	0.11	0.31	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
occ5	1	0.21	0.41	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▂
occ6	1	0.20	0.40	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▂
occ7	1	0.09	0.29	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
occ8	1	0.01	0.12	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
occ9	1	0.12	0.32	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
per	1	0.02	0.13	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
pro	1	0.08	0.27	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
pub	1	0.04	0.20	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
rur	1	0.20	0.40	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▂
south	1	0.35	0.48	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▅
educ	1	11.77	1.75	3.00	11.00	12.00	12.00	16.00	▁▁▂▇▂
tra	1	0.07	0.25	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
trad	1	0.27	0.44	0.00	0.00	0.00	1.00	1.00	▇▁▁▁▃
union	1	0.24	0.43	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▂
lwage	1	1.65	0.53	-3.58	1.35	1.67	1.99	4.05	▁▁▁▇▁
d81	1	0.12	0.33	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
d82	1	0.12	0.33	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
d83	1	0.12	0.33	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
d84	1	0.12	0.33	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
d85	1	0.12	0.33	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
d86	1	0.12	0.33	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
d87	1	0.12	0.33	0.00	0.00	0.00	0.00	1.00	▇▁▁▁▁
expersq	1	50.42	40.78	0.00	16.00	36.00	81.00	324.00	▇▂▁▁▁

df %<>% 
  dplyr::mutate(year=as_factor(year))
df %<>% 
  dplyr::rename(id = nr)

df %>% head()

## # A tibble: 6 x 44
##      id year  agric black   bus construc   ent exper   fin  hisp poorhlth hours
##   <int> <fct> <int> <int> <int>    <int> <int> <int> <int> <int>    <int> <int>
## 1    13 1980      0     0     1        0     0     1     0     0        0  2672
## 2    13 1981      0     0     0        0     0     2     0     0        0  2320
## 3    13 1982      0     0     1        0     0     3     0     0        0  2940
## 4    13 1983      0     0     1        0     0     4     0     0        0  2960
## 5    13 1984      0     0     0        0     0     5     0     0        0  3071
## 6    13 1985      0     0     1        0     0     6     0     0        0  2864
## # ... with 32 more variables: manuf <int>, married <int>, min <int>,
## #   nrthcen <int>, nrtheast <int>, occ1 <int>, occ2 <int>, occ3 <int>,
## #   occ4 <int>, occ5 <int>, occ6 <int>, occ7 <int>, occ8 <int>, occ9 <int>,
## #   per <int>, pro <int>, pub <int>, rur <int>, south <int>, educ <int>,
## #   tra <int>, trad <int>, union <int>, lwage <dbl>, d81 <int>, d82 <int>,
## #   d83 <int>, d84 <int>, d85 <int>, d86 <int>, d87 <int>, expersq <int>

16.1.2 Summary statistics for panel data

It is important to know what your panel looks like. How many units? How many time periods? The easiest way to do this is the pdim() function in the plm package.

pdim(df)

## Balanced Panel: n = 545, T = 8, N = 4360

It is also helpful to “convert” the data to a cross-section of within-unit averages. Let’s do this for some of the key variables of our analysis.

df.within <- df %>% 
  dplyr::select(id,
                year,
                educ,
                married,
                union,
                rur) %>% 
             group_by(id) %>% 
             summarize(
                 mean.edu = mean(educ),
                 var.edu  = var(educ),
                mean.marr = mean(married),
                 var.marr = var(married),
               mean.union = mean(union),
                var.union = var(union),
               mean.rural = mean(rur),
                var.rural = var(rur)
             )

df.within %>% datatable()

df %>% 
  dplyr::select(id,
                year,
                educ,
                married,
                union,
                rur) %>% 
             group_by(id) %>% 
  dplyr::summarise_at(vars(educ,married,union,rur),
                      list(mean = mean,var = var)) %>% 
  DT::datatable()

df.within %>% as.data.frame %>% stargazer(type="text")

## 
## =================================================================
## Statistic   N    Mean    St. Dev.   Min  Pctl(25) Pctl(75)  Max  
## -----------------------------------------------------------------
## id         545 5,262.059 3,498.960  13    2,329    8,406   12,548
## mean.edu   545  11.767     1.748     3      11       12      16  
## var.edu    545   0.000     0.000     0      0        0       0   
## mean.marr  545   0.439     0.377     0      0       0.8      1   
## var.marr   545   0.120     0.115     0      0       0.2      0   
## mean.union 545   0.244     0.329   0.000  0.000    0.375   1.000 
## var.union  545   0.087     0.105   0.000  0.000    0.214   0.286 
## mean.rural 545   0.204     0.359     0      0       0.2      1   
## var.rural  545   0.039     0.087     0      0        0       0   
## -----------------------------------------------------------------

Is there any within-person variance in the educ variable? What about married, union, and rural?
What does it mean for the married, union, or rural variables to have a positive within-person variance?
Why is it important to know if a variable has positive within-person variance?

16.2 Pooled OLS, Random Effects, and Fixed Effects Models

Now estimate the following model using various options of the plm() function:

\[ \begin{align*} \log(wage_{it}) & = \beta_0 + \beta_1 educ_{i} + \beta_2 black_{i} + \beta_3 hisp_{i} + \beta_4 exper_{it} + \beta_5 exper_{it}^2 + \beta_6 married_{it} + \\ &\phantom{=\,\,}\beta_7 union_{it} + \beta_8 rur_{it} + \sum_t \beta_{9,t}year_{it} + a_i + u_{it} \end{align*} \]

16.2.1 Pooled OLS

The pooled OLS model can be run from the plm() function as follows.

est.pols <- plm(lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + rur + year,
                data = df, index = c("id","year"), 
                model = "pooling")

Interpret the coefficient on $\beta_7$ in the pooled OLS model

16.2.2 Random effects

est.re <- plm(lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + rur + year, 
              data = df, index = c("id","year"), model = "random")

Explain why the standard errors in the random effects model are larger (or equal to) those of the pooled OLS model.
What is the estimate of $\theta$ in the RE model? (Hint: check est.re$ercomp$theta) What does this tell you about what you expect the random effects estimates to be relative to the fixed effects estimates?

16.2.3 Fixed effects

est.fe <- plm(lwage ~ I(exper^2) + married + union + rur + year, 
              data = df, 
              index = c("id","year"), 
              model = "within")

Explain why we cannot estimate coefficients on educ, black, hisp, or exper in the fixed effects model. (Note: the reason for not being able to estimate exper is more nuanced)

16.3 Clustered standard errors

As discussed in class, the most appropriate standard errors are not those reported by plm(). Let’s compute standard errors that account for within-person serial correlation, and that are robust to heteroskedasticity:

clust.po <- coef_test(est.pols, vcov = "CR1", cluster = "individual")
clust.re <- coef_test(est.re,   vcov = "CR1", cluster = "individual")
clust.fe <- coef_test(est.fe,   vcov = "CR1", cluster = "individual")

We can put these all in one stargazer table:

stargazer(est.pols,
          est.re,
          est.fe,
          se=list(clust.po$SE,
                  clust.re$SE,
                  clust.fe$SE),
          type="text")

## 
## ===========================================================================
##                                   Dependent variable:                      
##              --------------------------------------------------------------
##                                          lwage                             
##                         (1)               (2)                (3)           
## ---------------------------------------------------------------------------
## educ                 0.087***           0.091***                           
##                       (0.011)           (0.011)                            
##                                                                            
## black                -0.149***         -0.141***                           
##                       (0.050)           (0.051)                            
##                                                                            
## hisp                  -0.016             0.016                             
##                       (0.040)           (0.040)                            
##                                                                            
## exper                0.069***           0.106***                           
##                       (0.019)           (0.016)                            
##                                                                            
## I(exper2)            -0.002**          -0.005***          -0.005***        
##                       (0.001)           (0.001)            (0.001)         
##                                                                            
## married              0.126***           0.065***           0.047**         
##                       (0.026)           (0.019)            (0.021)         
##                                                                            
## union                0.182***           0.107***          0.079***         
##                       (0.027)           (0.021)            (0.023)         
##                                                                            
## rur                  -0.138***           -0.023             0.049          
##                       (0.035)           (0.031)            (0.039)         
##                                                                            
## year1981              0.053*             0.040            0.152***         
##                       (0.028)           (0.028)            (0.026)         
##                                                                            
## year1982               0.055             0.030            0.254***         
##                       (0.037)           (0.035)            (0.029)         
##                                                                            
## year1983               0.049             0.019            0.357***         
##                       (0.046)           (0.044)            (0.035)         
##                                                                            
## year1984               0.074             0.041            0.494***         
##                       (0.057)           (0.055)            (0.046)         
##                                                                            
## year1985               0.090             0.056            0.622***         
##                       (0.066)           (0.064)            (0.057)         
##                                                                            
## year1986               0.120             0.089            0.771***         
##                       (0.075)           (0.075)            (0.072)         
##                                                                            
## year1987              0.151*             0.132            0.931***         
##                       (0.084)           (0.085)            (0.085)         
##                                                                            
## Constant               0.165             0.036                             
##                       (0.161)           (0.161)                            
##                                                                            
## ---------------------------------------------------------------------------
## Observations           4,360             4,360              4,360          
## R2                     0.199             0.181              0.181          
## Adjusted R2            0.196             0.178              0.062          
## F Statistic  72.036*** (df = 15; 4344) 958.833*** 76.530*** (df = 11; 3804)
## ===========================================================================
## Note:                                           *p<0.1; **p<0.05; ***p<0.01

Interpret the coefficient on union across the three models.
Comment on the comparability of the pooled OLS and RE estimators now that within-person serial correlation has been addressed.