The purpose of this in-class lab12 is to use R to practice estimating time series regression models with standard errors corrected for heteroskedasticity and serial correlation (HAC). The lab11 should be completed in your group. To get credit, upload your .R script to the appropriate place on Canvas.
Load the usual packages, as well as the new ones installed in Lab 11.1
Open up a new R script (named ICL12_XYZ.R
, where XYZ
are your initials) and add the usual “preamble” to the top:
# Add names of group members HERE
library(tidyverse)
library(wooldridge)
library(broom)
library(car)
library(pdfetch)
library(zoo)
library(dynlm)
library(lmtest)
library(sandwich)
library(magrittr)
We’re going to use data on US macroeconomic indicators. The wooldridge
data set is called phillips
.
df <- as_tibble(phillips)
df %>% glimpse()
## Observations: 56
## Variables: 7
## $ year <int> 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, ...
## $ unem <dbl> 3.8, 5.9, 5.3, 3.3, 3.0, 2.9, 5.5, 4.4, 4.1, 4.3, 6.8, 5.5, ...
## $ inf <dbl> 8.1, -1.2, 1.3, 7.9, 1.9, 0.8, 0.7, -0.4, 1.5, 3.3, 2.8, 0.7...
## $ inf_1 <dbl> NA, 8.1, -1.2, 1.3, 7.9, 1.9, 0.8, 0.7, -0.4, 1.5, 3.3, 2.8,...
## $ unem_1 <dbl> NA, 3.8, 5.9, 5.3, 3.3, 3.0, 2.9, 5.5, 4.4, 4.1, 4.3, 6.8, 5...
## $ cinf <dbl> NA, -9.3000002, 2.5000000, 6.6000004, -6.0000000, -1.0999999...
## $ cunem <dbl> NA, 2.1000001, -0.5999999, -2.0000002, -0.3000000, -0.099999...
df
as time series datadf.ts <- zoo(df, order.by=df$year)
Now it will be easy to include lags of various variables into our regression models.
Let’s have a look at the inflation rate and unemployment for the US over the postwar period (1948–2003):
ggplot(df.ts) +
geom_line(aes(year, inf)) +
geom_line(aes(year, unem), color="red")
The negative correlation between the two led economist William Phillips to conclude that governments could increase their inflation rate to reduce the unemployment rate. This is known as the “Phillips Curve.”
Now let’s estimate the Phillips Curve:
\[ inf_{t} = \beta_0 + \beta_1 unemp_t + u_t \]
where \(inf\) is the inflation rate and \(unem\) is the unemployment rate.
library(tseries)
adf.test(df.ts$inf) # alternative hypothesis: stationary
##
## Augmented Dickey-Fuller Test
##
## data: df.ts$inf
## Dickey-Fuller = -2.1213, Lag order = 3, p-value = 0.5257
## alternative hypothesis: stationary
adf.test(df.ts$unem) # alternative hypothesis: stationary
##
## Augmented Dickey-Fuller Test
##
## data: df.ts$unem
## Dickey-Fuller = -2.2056, Lag order = 3, p-value = 0.4917
## alternative hypothesis: stationary
est <- dynlm(inf ~ unem, data=df.ts)
dynlm(resid(est) ~ L(resid(est))) %>% coeftest()
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.11181 0.31799 -0.3516 0.7265
## L(resid(est)) 0.57247 0.10835 5.2833 2.43e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dynlm(resid(est) ~ L(resid(est))) %>% broom::tidy()
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.112 0.318 -0.352 0.727
## 2 L(resid(est)) 0.572 0.108 5.28 0.00000243
Equivalently, you can use the bgtest()
function in the lmtest
package:
bgtest(est)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: est
## LM test = 20.888, df = 1, p-value = 4.87e-06
unem
in the previous regression. What does it tell you about the idea that inflation and unemployment positively covary?Now let’s compute HAC (Heteroskedasticity and Autocorrelation Consistent) standard errors. To do so, we’ll use the NeweyWest
option in the coeftest()
function of the lmtest
package.2
coeftest(est) # re-display baseline results
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.05357 1.54796 0.6806 0.49902
## unem 0.50238 0.26556 1.8918 0.06389 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(est, vcov=NeweyWest)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.05357 1.56089 0.6750 0.5026
## unem 0.50238 0.36794 1.3654 0.1778
Another way to get rid of serial correlation is to difference the data. In this case, we will estimate the following regression:
\[ \Delta inf_{t} = \beta_0 + \beta_1 unemp_t + u_t \]
where \(\Delta inf_{t} = inf_{t}-inf_{t-1}\). Aside from addressing serial correlation, the differenced model also accounts for people’s inflationary expectations.
est.diff <- dynlm(d(inf) ~ unem, data = df.ts)
bgtest(est.diff)
##
## Breusch-Godfrey test for serial correlation of order up to 1
##
## data: est.diff
## LM test = 0.061273, df = 1, p-value = 0.8045
coeftest(est.diff)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.82820 1.22487 2.3090 0.02487 *
## unem -0.51765 0.20904 -2.4763 0.01650 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(est.diff, vcov=NeweyWest)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.82820 0.96385 2.9343 0.0049321 **
## unem -0.51765 0.14717 -3.5173 0.0009031 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1