The purpose of this in-class lab13 is to use R to practice with instrumental variables estimation. The lab should be completed in your group. To get credit, upload your .R script to the appropriate place on Canvas.
You may need to install the package AER. (It may have already been installed when you previously installed car and zoo.)
Open up a new R script (named ICL13_XYZ.R, where XYZ are your initials) and add the usual “preamble” to the top:
# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(broom)
library(AER)
library(magrittr)
library(stargazer)We’re going to use data on fertility of Botswanian women.
df <- as_tibble(fertil2)
df %>% glimpse()## Observations: 4,361
## Variables: 27
## $ mnthborn <int> 5, 1, 7, 11, 5, 8, 7, 9, 12, 9, 6, 10, 12, 2, 1, 6, 1, 8, ...
## $ yearborn <int> 64, 56, 58, 45, 45, 52, 51, 70, 53, 39, 46, 59, 42, 40, 53...
## $ age      <int> 24, 32, 30, 42, 43, 36, 37, 18, 34, 49, 42, 29, 45, 48, 35...
## $ electric <int> 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ radio    <int> 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ tv       <int> 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1...
## $ bicycle  <int> 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0...
## $ educ     <int> 12, 13, 5, 4, 11, 7, 16, 10, 5, 4, 15, 7, 0, 4, 12, 7, 7, ...
## $ ceb      <int> 0, 3, 1, 3, 2, 1, 4, 0, 1, 0, 3, 3, 4, 10, 3, 0, 4, 2, 0, ...
## $ agefbrth <int> NA, 25, 27, 17, 24, 26, 20, NA, 19, NA, 25, 23, 18, 19, 23...
## $ children <int> 0, 3, 1, 2, 2, 1, 4, 0, 1, 0, 3, 3, 2, 8, 3, 0, 4, 2, 0, 1...
## $ knowmeth <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ usemeth  <int> 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1...
## $ monthfm  <int> NA, 11, 6, 1, 3, 11, 5, NA, 7, 11, 6, 1, 1, 10, 1, NA, NA,...
## $ yearfm   <int> NA, 80, 83, 61, 66, 76, 78, NA, 72, 61, 70, 84, 66, 66, 74...
## $ agefm    <int> NA, 24, 24, 15, 20, 24, 26, NA, 18, 22, 24, 24, 23, 26, 21...
## $ idlnchld <int> 2, 3, 5, 3, 2, 4, 4, 4, 4, 4, 3, 6, 6, 4, 3, 4, 5, 1, 2, 3...
## $ heduc    <int> NA, 12, 7, 11, 14, 9, 17, NA, 3, 1, 16, 7, NA, 3, 16, NA, ...
## $ agesq    <int> 576, 1024, 900, 1764, 1849, 1296, 1369, 324, 1156, 2401, 1...
## $ urban    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ urb_educ <int> 12, 13, 5, 4, 11, 7, 16, 10, 5, 4, 15, 7, 0, 4, 12, 7, 7, ...
## $ spirit   <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1...
## $ protest  <int> 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0...
## $ catholic <int> 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
## $ frsthalf <int> 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0...
## $ educ0    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0...
## $ evermarr <int> 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1...We can easily compute summary statistics of our data by using the stargazer package:
df %>% as.data.frame %>% stargazer(type="text")## 
## ================================================================
## Statistic   N    Mean   St. Dev.  Min   Pctl(25) Pctl(75)  Max  
## ----------------------------------------------------------------
## mnthborn  4,361  6.331   3.323     1       3        9       12  
## yearborn  4,361 60.434   8.683     38      55       68      73  
## age       4,361 27.405   8.685     15      20       33      49  
## electric  4,358  0.140   0.347   0.000   0.000    0.000   1.000 
## radio     4,359  0.702   0.458   0.000   0.000    1.000   1.000 
## tv        4,359  0.093   0.290   0.000   0.000    0.000   1.000 
## bicycle   4,358  0.276   0.447   0.000   0.000    1.000   1.000 
## educ      4,361  5.856   3.927     0       3        8       20  
## ceb       4,361  2.442   2.407     0       1        4       13  
## agefbrth  3,273 19.011   3.092   10.000  17.000   20.000  38.000
## children  4,361  2.268   2.222     0       0        4       13  
## knowmeth  4,354  0.963   0.188   0.000   1.000    1.000   1.000 
## usemeth   4,290  0.578   0.494   0.000   0.000    1.000   1.000 
## monthfm   2,079  6.270   3.620   1.000   3.000    9.000   12.000
## yearfm    2,079 76.912   7.760   50.000  72.000   83.000  88.000
## agefm     2,079 20.686   5.002   10.000  17.000   23.000  46.000
## idlnchld  4,241  4.616   2.219   0.000   3.000    6.000   20.000
## heduc     1,956  5.145   4.803   0.000   0.000    8.000   20.000
## agesq     4,361 826.460 526.923   225     400     1,089   2,401 
## urban     4,361  0.517   0.500     0       0        1       1   
## urb_educ  4,361  3.469   4.294     0       0        7       20  
## spirit    4,361  0.422   0.494     0       0        1       1   
## protest   4,361  0.228   0.419     0       0        0       1   
## catholic  4,361  0.102   0.303     0       0        0       1   
## frsthalf  4,361  0.540   0.498     0       0        1       1   
## educ0     4,361  0.208   0.406     0       0        0       1   
## evermarr  4,361  0.477   0.500     0       0        1       1   
## ----------------------------------------------------------------or:
df %>% as.data.frame() %>% skimr::skim()| Name | Piped data | 
| Number of rows | 4361 | 
| Number of columns | 27 | 
| _______________________ | |
| Column type frequency: | |
| numeric | 27 | 
| ________________________ | |
| Group variables | None | 
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist | 
|---|---|---|---|---|---|---|---|---|---|---|
| mnthborn | 0 | 1.00 | 6.33 | 3.32 | 1 | 3 | 6 | 9 | 12 | ▇▅▇▆▆ | 
| yearborn | 0 | 1.00 | 60.43 | 8.68 | 38 | 55 | 62 | 68 | 73 | ▂▃▆▇▇ | 
| age | 0 | 1.00 | 27.41 | 8.69 | 15 | 20 | 26 | 33 | 49 | ▇▇▆▃▂ | 
| electric | 3 | 1.00 | 0.14 | 0.35 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▁ | 
| radio | 2 | 1.00 | 0.70 | 0.46 | 0 | 0 | 1 | 1 | 1 | ▃▁▁▁▇ | 
| tv | 2 | 1.00 | 0.09 | 0.29 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▁ | 
| bicycle | 3 | 1.00 | 0.28 | 0.45 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▃ | 
| educ | 0 | 1.00 | 5.86 | 3.93 | 0 | 3 | 7 | 8 | 20 | ▆▇▅▁▁ | 
| ceb | 0 | 1.00 | 2.44 | 2.41 | 0 | 1 | 2 | 4 | 13 | ▇▃▁▁▁ | 
| agefbrth | 1088 | 0.75 | 19.01 | 3.09 | 10 | 17 | 19 | 20 | 38 | ▁▇▂▁▁ | 
| children | 0 | 1.00 | 2.27 | 2.22 | 0 | 0 | 2 | 4 | 13 | ▇▃▁▁▁ | 
| knowmeth | 7 | 1.00 | 0.96 | 0.19 | 0 | 1 | 1 | 1 | 1 | ▁▁▁▁▇ | 
| usemeth | 71 | 0.98 | 0.58 | 0.49 | 0 | 0 | 1 | 1 | 1 | ▆▁▁▁▇ | 
| monthfm | 2282 | 0.48 | 6.27 | 3.62 | 1 | 3 | 6 | 9 | 12 | ▇▃▅▅▇ | 
| yearfm | 2282 | 0.48 | 76.91 | 7.76 | 50 | 72 | 78 | 83 | 88 | ▁▂▃▇▇ | 
| agefm | 2282 | 0.48 | 20.69 | 5.00 | 10 | 17 | 20 | 23 | 46 | ▅▇▂▁▁ | 
| idlnchld | 120 | 0.97 | 4.62 | 2.22 | 0 | 3 | 4 | 6 | 20 | ▇▅▁▁▁ | 
| heduc | 2405 | 0.45 | 5.14 | 4.80 | 0 | 0 | 6 | 8 | 20 | ▇▅▃▁▁ | 
| agesq | 0 | 1.00 | 826.46 | 526.92 | 225 | 400 | 676 | 1089 | 2401 | ▇▅▂▁▁ | 
| urban | 0 | 1.00 | 0.52 | 0.50 | 0 | 0 | 1 | 1 | 1 | ▇▁▁▁▇ | 
| urb_educ | 0 | 1.00 | 3.47 | 4.29 | 0 | 0 | 0 | 7 | 20 | ▇▃▂▁▁ | 
| spirit | 0 | 1.00 | 0.42 | 0.49 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▆ | 
| protest | 0 | 1.00 | 0.23 | 0.42 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▂ | 
| catholic | 0 | 1.00 | 0.10 | 0.30 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▁ | 
| frsthalf | 0 | 1.00 | 0.54 | 0.50 | 0 | 0 | 1 | 1 | 1 | ▇▁▁▁▇ | 
| educ0 | 0 | 1.00 | 0.21 | 0.41 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▂ | 
| evermarr | 0 | 1.00 | 0.48 | 0.50 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▇ | 
Suppose we want to see if education causes lower fertility (as can be seen when comparing more- and less-educated countries):
\[ children = \beta_0 + \beta_1 educ + \beta_2 age + \beta_3 age^2 + u \]
where \(children\) is the number of children born to the woman, \(educ\) is years of education, and \(age\) is age (in years).
est.ols.rob <- estimatr::lm_robust(children ~ educ + age + I(age^2), data=df)
est.ols <- lm(children ~ educ + age + I(age^2), data=df)summary(est.ols)## 
## Call:
## lm(formula = children ~ educ + age + I(age^2), data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.8351 -0.7135 -0.0054  0.7141  7.5055 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.1383066  0.2405942 -17.200   <2e-16 ***
## educ        -0.0905755  0.0059207 -15.298   <2e-16 ***
## age          0.3324486  0.0165495  20.088   <2e-16 ***
## I(age^2)    -0.0026308  0.0002726  -9.651   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.46 on 4357 degrees of freedom
## Multiple R-squared:  0.5687, Adjusted R-squared:  0.5684 
## F-statistic:  1915 on 3 and 4357 DF,  p-value: < 2.2e-16summary(est.ols.rob)## 
## Call:
## estimatr::lm_robust(formula = children ~ educ + age + I(age^2), 
##     data = df)
## 
## Standard error type:  HC2 
## 
## Coefficients:
##              Estimate Std. Error t value  Pr(>|t|)  CI Lower CI Upper   DF
## (Intercept) -4.138307  0.2438139 -16.973 1.282e-62 -4.616306 -3.66031 4357
## educ        -0.090575  0.0060502 -14.971 1.901e-49 -0.102437 -0.07871 4357
## age          0.332449  0.0192233  17.294 7.309e-65  0.294761  0.37014 4357
## I(age^2)    -0.002631  0.0003523  -7.468 9.800e-14 -0.003321 -0.00194 4357
## 
## Multiple R-squared:  0.5687 ,    Adjusted R-squared:  0.5684 
## F-statistic:  1922 on 3 and 4357 DF,  p-value: < 2.2e-16(Note: include I(age^2) puts the quadratic term in automatically without us having to use mutate() to create a new variable called age.sq.)
We can also use stargazer to examine the output. It puts the standard errors of each variable in parentheses under the estimated coefficient.
stargazer(est.ols, type="text")## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                              children          
## -----------------------------------------------
## educ                         -0.091***         
##                               (0.006)          
##                                                
## age                          0.332***          
##                               (0.017)          
##                                                
## I(age2)                      -0.003***         
##                              (0.0003)          
##                                                
## Constant                     -4.138***         
##                               (0.241)          
##                                                
## -----------------------------------------------
## Observations                   4,361           
## R2                             0.569           
## Adjusted R2                    0.568           
## Residual Std. Error      1.460 (df = 4357)     
## F Statistic         1,915.196*** (df = 3; 4357)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01We know that education is endogenous (i.e. people choose the level of education that maximizes their utility). A possible instrument for education is \(firsthalf\), which is a dummy equal to 1 if the woman was born in the first half of the calendar year, and 0 otherwise.
Let’s create this variable:
df %<>% 
  dplyr::mutate(firsthalf = mnthborn<7)We will assume that \(firsthalf\) is uncorrelated with \(u\).
est.iv1.est.iv1 <- lm(educ ~ firsthalf, data=df)
stargazer(est.iv1, type="text")## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                educ            
## -----------------------------------------------
## firsthalf                    -0.938***         
##                               (0.118)          
##                                                
## Constant                     6.363***          
##                               (0.087)          
##                                                
## -----------------------------------------------
## Observations                   4,361           
## R2                             0.014           
## Adjusted R2                    0.014           
## Residual Std. Error      3.900 (df = 4359)     
## F Statistic          62.620*** (df = 1; 4359)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01Now let’s do the IV regression:
est.iv <- ivreg(children ~ educ + age + I(age^2) | firsthalf + age + I(age^2), data=df)summary(est.iv)## 
## Call:
## ivreg(formula = children ~ educ + age + I(age^2) | firsthalf + 
##     age + I(age^2), data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -6.05272 -0.71481  0.06224  0.76236  7.23693 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.3878054  0.5481502  -6.180 6.98e-10 ***
## educ        -0.1714989  0.0531796  -3.225  0.00127 ** 
## age          0.3236052  0.0178596  18.119  < 2e-16 ***
## I(age^2)    -0.0026723  0.0002797  -9.555  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.491 on 4357 degrees of freedom
## Multiple R-Squared: 0.5502,  Adjusted R-squared: 0.5499 
## Wald test:  1765 on 3 and 4357 DF,  p-value: < 2.2e-16summary(est.ols)## 
## Call:
## lm(formula = children ~ educ + age + I(age^2), data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.8351 -0.7135 -0.0054  0.7141  7.5055 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.1383066  0.2405942 -17.200   <2e-16 ***
## educ        -0.0905755  0.0059207 -15.298   <2e-16 ***
## age          0.3324486  0.0165495  20.088   <2e-16 ***
## I(age^2)    -0.0026308  0.0002726  -9.651   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.46 on 4357 degrees of freedom
## Multiple R-squared:  0.5687, Adjusted R-squared:  0.5684 
## F-statistic:  1915 on 3 and 4357 DF,  p-value: < 2.2e-16The variables on the right hand side of the | are the instruments (including the \(x\)’s that we assume to be exogenous, like \(age\)). The endogenous \(x\) is the first one after the ~.
Now we can compare the output for each of the models:
stargazer(est.ols,est.iv1,est.iv, type="text")## 
## ==========================================================================================
##                                              Dependent variable:                          
##                     ----------------------------------------------------------------------
##                              children                     educ               children     
##                                 OLS                       OLS              instrumental   
##                                                                              variable     
##                                 (1)                       (2)                   (3)       
## ------------------------------------------------------------------------------------------
## educ                         -0.091***                                       -0.171***    
##                               (0.006)                                         (0.053)     
##                                                                                           
## age                          0.332***                                        0.324***     
##                               (0.017)                                         (0.018)     
##                                                                                           
## I(age2)                      -0.003***                                       -0.003***    
##                              (0.0003)                                        (0.0003)     
##                                                                                           
## firsthalf                                              -0.938***                          
##                                                         (0.118)                           
##                                                                                           
## Constant                     -4.138***                  6.363***             -3.388***    
##                               (0.241)                   (0.087)               (0.548)     
##                                                                                           
## ------------------------------------------------------------------------------------------
## Observations                   4,361                     4,361                 4,361      
## R2                             0.569                     0.014                 0.550      
## Adjusted R2                    0.568                     0.014                 0.550      
## Residual Std. Error      1.460 (df = 4357)         3.900 (df = 4359)     1.491 (df = 4357)
## F Statistic         1,915.196*** (df = 3; 4357) 62.620*** (df = 1; 4359)                  
## ==========================================================================================
## Note:                                                          *p<0.1; **p<0.05; ***p<0.01