AR Model¶

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616

AR($p$) Model¶

  • The AR($p$) model is: $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$
  • $\phi_1$, $\phi_2$, $\ldots$, $\phi_p$ are the autoregressive parameters that quantify the influence of past values on the current value
  • $\varepsilon_t$ is white noise error term (or innovation) with mean zero and constant variance, $\sigma^2$
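To make the definition concrete, here is a minimal sketch of an AR($p$) simulator; the function name simulate_ar, the coefficient values, and the sample size are illustrative, not from the slides.

# Minimal AR(p) simulator (illustrative sketch)
import numpy as np

def simulate_ar(phi, T, sigma=1.0, seed=0):
    """Simulate y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p} + eps_t."""
    phi = np.asarray(phi); p = len(phi)
    rng = np.random.default_rng(seed)
    eps = sigma*rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(p, T):
        # y[t-p:t][::-1] is (y_{t-1}, ..., y_{t-p})
        y[t] = phi @ y[t-p:t][::-1] + eps[t]
    return y

y = simulate_ar([0.5, 0.3], T=500)  # e.g., an AR(2) with phi_1 = 0.5, phi_2 = 0.3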

AR($1$) Model¶

  • An AR($1$) Model is a special case of the AR($p$)

    $$ y_t = \phi y_{t-1} + \varepsilon_t $$

  • An AR($1$) is often used to model cyclical dynamics

  • e.g., the fluctuations in an AR(1) series where $\phi = 0.95$ appear much more persistent (i.e., smoother) than those where $\phi = 0.4$

AR($1$) for any mean¶

  • An AR($1$) model is covariance stationary if $|\phi| < 1$

  • Suppose $E[y_t] = E[y_{t-1}] \equiv \bar{y}$

  • Then $E[y_t] = 0$ since $E[y_t] = \phi E[y_{t-1}] + E[\varepsilon_t] \rightarrow (1-\phi)\bar{y} = 0 \rightarrow \bar{y} = 0$

  • Instead consider the AR(1) model with a generic mean: $y_t - \mu = \phi(y_{t-1} - \mu) + \varepsilon_t$

  • Then $E[y_t] = \mu$ since

    \begin{align*} E[y_t] - \mu &= \phi E[y_{t-1}] -\phi\mu + E[\varepsilon_t] \\ \rightarrow (1-\phi)\bar{y} &= (1-\phi)\mu \\ \rightarrow \bar{y} &= \mu \end{align*}

  • You can rearrange the (any mean) AR(1) model as $y_t = (1-\phi)\mu + \phi y_{t-1} + \varepsilon_t$

AR(1) as MA($\infty$)¶

  • The AR(1) model structure is the same at all points in time

    \begin{align*} y_t &= \phi y_{t-1} + \varepsilon_t \\ y_{t-1} &= \phi y_{t-2} + \varepsilon_{t-1} \\ y_{t-2} &= \phi y_{t-3} + \varepsilon_{t-2} \end{align*}

  • Combine, i.e., recursively substitute, to get $y_t = \phi^3 y_{t-3} + \phi^2 \varepsilon_{t-2}+ \phi \varepsilon_{t-1} + \varepsilon_t$

  • Rinse and repeat to get $y_t = \sum_{i=0}^\infty \phi^i \varepsilon_{t-i}$ (i.e., the MA($\infty$) Model or Wold Representation)

  • The infinite sum converges, and the MA($\infty$) is invertible back to an AR($1$), so long as $|\phi| < 1$.

  • An AR model that can be written as an MA($\infty$) is causal (or exhibits causality).

  • That property differentiates it from an invertible MA model, which can be expressed as an AR($\infty$) model.
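A quick numerical check of this equivalence, as a sketch: it builds an AR(1) recursively (with no pre-sample shocks, so the MA sum truncates at the start of the sample) and compares it to the corresponding MA($\infty$) sum; the parameter values are arbitrary.

# Check that the recursive AR(1) equals its MA representation
import numpy as np

phi, T = 0.9, 200
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)

# AR(1) built recursively, starting from y_0 = eps_0
y = np.zeros(T); y[0] = eps[0]
for t in range(1, T):
    y[t] = phi*y[t-1] + eps[t]

# Same series built directly as y_t = sum_{i=0}^{t} phi^i eps_{t-i}
y_ma = np.array([sum(phi**i*eps[t-i] for i in range(t+1)) for t in range(T)])

print(np.allclose(y, y_ma))  # True: the two representations coincide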

Variance¶

  • Suppose $\mu=0$ without loss of generality.

  • Given $Var(\varepsilon_t) = \sigma^2$ and $|\phi| < 1$, then $\gamma(0) \equiv Var(y_t)$

    \begin{align*} &= Var\left(\sum_{i=0}^\infty \phi^i \varepsilon_{t-i}\right) \\ &= Var(\varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + \phi^3 \varepsilon_{t-3} + \phi^4 \varepsilon_{t-4} + \cdots) \\ &= \sigma^2 + \phi^2 \sigma^2 + (\phi^2)^2 \sigma^2 + (\phi^3)^2 \sigma^2 + (\phi^4)^2 \sigma^2 + \cdots \\ &= \sigma^2(1 + \phi^2 + (\phi^2)^2 + (\phi^2)^3 + (\phi^2)^4 + \cdots ) \\ &= \sigma^2/(1-\phi^2) \end{align*}

    See Geometric Series.

  • $Std(y_t) = \sqrt{\sigma^2/(1-\phi^2)}$
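A quick simulation check of this formula, as a sketch; the parameter values and sample size are arbitrary, and the sample standard deviation only approximates the theoretical one.

# Compare the sample Std(y) of a long AR(1) simulation to sigma/sqrt(1 - phi^2)
import numpy as np

phi, sigma, T = 0.9, 1.0, 200_000
rng = np.random.default_rng(seed=0)
eps = sigma*rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi*y[t-1] + eps[t]

print(f'Sample Std(y):      {y.std():.3f}')
print(f'Theoretical Std(y): {np.sqrt(sigma**2/(1 - phi**2)):.3f}')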

Autocorrelation¶

  • $\gamma(1) = E[y_t y_{t-1}]$

    \begin{align*} &= E[(\phi y_{t-1} + \varepsilon_t)y_{t-1}] \\ &= E[\phi y_{t-1}^2] + E[\varepsilon_t y_{t-1}] \\ &= \phi Var[y_{t-1}] + 0 \\ &= \phi \gamma(0) \\ &= \phi \sigma^2/(1-\phi^2) \end{align*}

  • $\rho(1) = \gamma(1)/\gamma(0) = \phi$

  • $\gamma(2) = E[y_t y_{t-2}] = E[(\phi^2y_{t-2} + \varepsilon_t + \phi \varepsilon_{t-1})y_{t-2}] = \phi^2 E[y_{t-2}^2] = \phi^2 \gamma(0)$

  • $\rho(2) \equiv \gamma(2)/\gamma(0) = \phi^2$

  • In general, $\rho(k) = \phi^k$, so the ACF decays geometrically toward zero.

  • Thus, for an AR($1$) process/model we can refer to $\phi$ as the autocorrelation (AC) parameter/coefficient.
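A sketch checking that the sample ACF of a simulated AR(1) is close to $\phi^k$; the value of $\phi$, the sample size, and the number of lags are arbitrary choices.

# Compare sample autocorrelations of a simulated AR(1) to phi^k
import numpy as np

phi, T = 0.8, 100_000
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi*y[t-1] + eps[t]

for k in range(1, 5):
    rho_k = np.corrcoef(y[k:], y[:-k])[0, 1]  # sample rho(k)
    print(f'k = {k}: sample rho(k) = {rho_k:.3f}, phi^k = {phi**k:.3f}')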

AR Model as DGP¶

Let's compare a couple of different AR models to their underlying white noise.

  1. White noise:

    $$ y_t = \mu + \varepsilon_t $$

  2. AR(1), low AC:

    $$ y_t = (1-\phi_l)\mu + \phi_l y_{t-1} + \varepsilon_t $$

  3. AR(1), high AC:

    $$ y_t = (1-\phi_h)\mu + \phi_h y_{t-1} + \varepsilon_t $$

In [1]:
# Libraries
import numpy as np
# Assign parameters
T = 501; mu = 2; phi_l = 0.5; phi_h = 0.9;
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
y = np.zeros([T,3])
for t in range(1,T):
    y[t,0] = mu + eps[t]
    y[t,1] = (1-phi_l)*mu + phi_l*y[t-1,1] + eps[t]
    y[t,2] = (1-phi_h)*mu + phi_h*y[t-1,2] + eps[t]
In [2]:
# Libraries
import pandas as pd
# Convert to DataFrame and plot
df = pd.DataFrame(y)
df.columns = ['White Noise','AR(1) low AC','AR(1) high AC']
In [3]:
# libraries
import matplotlib.pyplot as plt
# Plot variables
fig, axs = plt.subplots(len(df.columns),figsize=(6.5,3.5))
for idx, ax in enumerate(axs.flat):
    col = df.columns[idx]; y = df[col]
    ax.plot(y); ax.set_ylabel(col)
    ax.grid(); ax.autoscale(tight=True); ax.label_outer()
    print(f'{col}: \
E(y) = {y.mean():.2f}, Std(y) = {y.std():.2f}, \
AC(y) = {y.corr(y.shift(1)):.2f}')
White Noise: E(y) = 1.97, Std(y) = 1.02, AC(y) = -0.02
AR(1) low AC: E(y) = 1.94, Std(y) = 1.16, AC(y) = 0.48
AR(1) high AC: E(y) = 1.72, Std(y) = 2.18, AC(y) = 0.88
[Figure: simulated White Noise, AR(1) low AC, and AR(1) high AC series]
In [4]:
# Auto-correlation function
from statsmodels.graphics.tsaplots import plot_acf as acf
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(df['AR(1) high AC'],ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: ACF of the simulated AR(1) high AC series]
In [5]:
# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(df['AR(1) high AC'],order=(1,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
# print(tab0); print(tab1); print(tab2)

Estimation Results¶

Dep. Variable: AR(1) high AC     No. Observations: 501
Model: ARIMA(1, 0, 0)            Log Likelihood: -718.607

          coef   std err        z    P>|z|   [0.025   0.975]
const   1.6774     0.390    4.302    0.000    0.913    2.442
ar.L1   0.8843     0.021   41.211    0.000    0.842    0.926
sigma2  1.0282     0.059   17.442    0.000    0.913    1.144

Heteroskedasticity (H): 1.14     Skew: -0.15
Prob(H) (two-sided): 0.41        Kurtosis: 3.45
  • Parameter estimates are close to the DGP values and significantly different from $0$
  • Residuals have skewness near $0$ and kurtosis near $3$ so they are probably normally distributed
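As a sketch, the skewness and kurtosis can also be computed directly from the residuals of the fitted model. This assumes res from the cell above is in scope; scipy's kurtosis with fisher=False returns the raw kurtosis (3 for a normal distribution), and the values may differ slightly from the summary table, which is based on standardized residuals.

# Residual diagnostics computed directly (sketch)
from scipy.stats import skew, kurtosis
resid = res.resid  # residuals from the ARIMA fit above
print(f'Skewness = {skew(resid):.2f}')                    # near 0 under normality
print(f'Kurtosis = {kurtosis(resid, fisher=False):.2f}')  # near 3 under normality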

Modeling UR as AR($1$)¶

  • The U.S. unemployment rate is very persistent.
  • What if we model it with an AR(1)?
In [6]:
# Libraries
from fredapi import Fred
import pandas as pd
# Read Data
fred_api_key = pd.read_csv('fred_api_key.txt', header=None)
fred = Fred(api_key=fred_api_key.iloc[0,0])
data = fred.get_series('UNRATE').to_frame(name='UR')
data.index = pd.DatetimeIndex(data.UR.index.values,freq=data.UR.index.inferred_freq)
# Plot
fig, ax = plt.subplots(figsize=(6.5,2));
ax.plot(data.UR); ax.set_title('U.S. Unemployment Rate');
ax.yaxis.set_major_formatter('{x:,.1f}%')
ax.grid(); ax.autoscale(tight=True)
[Figure: U.S. Unemployment Rate]
In [7]:
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,2.5))
acf(data.UR,ax);
ax.grid(); ax.autoscale(tight=True)
[Figure: ACF of the U.S. unemployment rate]
  • The ACF decays very slowly $\rightarrow$ UR might be non-stationary.
  • Let's assume it is stationary for a moment and estimate an AR($1$) model
In [8]:
# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(data.UR,order=(1,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)

Estimation Results¶

Dep. Variable: VALUE             No. Observations: 925
Model: ARIMA(1, 0, 0)            Log Likelihood: -496.551

          coef   std err        z    P>|z|   [0.025   0.975]
const   5.5470     0.761    7.289    0.000    4.055    7.039
ar.L1   0.9704     0.008  122.023    0.000    0.955    0.986
sigma2  0.1708     0.001  159.798    0.000    0.169    0.173

Heteroskedasticity (H): 6.35     Skew: 16.87
Prob(H) (two-sided): 0.00        Kurtosis: 428.90
  • The estimate of the AR coefficient is very high at $0.9704$, but it is still less than $1$, so the estimated model is not a random walk
  • Residuals are skewed and have excess kurtosis $\rightarrow$ the model is missing nonlinearity in the data

Actual vs. Simulated Data¶

  • What if we simulate data from the AR(1) model, setting the parameters to the previous estimates?
  • What does a simulation from this estimated model look like?
In [9]:
# Assign parameters
T = len(data.UR); mu = 5.547; rho = 0.9704; sigma = np.sqrt(0.1708);
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
URsim = mu*np.ones([T,1])
for t in range(1,T):
    URsim[t,0] = (1-rho)*mu + rho*URsim[t-1,0] + sigma*eps[t];
In [10]:
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(URsim,ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: ACF of the simulated UR series]
  • This ACF approaches zero at shorter lags compared to the actual data.
  • We know an AR($1$) model with $\phi = 0.9704$ generates stationary time series.
  • Maybe the actual UR isn't stationary?
In [11]:
# Add simulation to DataFrame
data['URsim'] = URsim
#  Variables to plot and ylabel
plotme = [['UR','Actual UR'],['URsim','Simulated UR']]
# Plot variables
fig, axs = plt.subplots(len(plotme),figsize=(6.5,3))
for idx, ax in enumerate(axs.flat):
    y = data[plotme[idx][0]]; ylabel = plotme[idx][1]
    ax.plot(y); ax.set_ylabel(ylabel)
    ax.yaxis.set_major_formatter('{x:.0f}%')
    ax.grid(); ax.autoscale(tight=True); ax.label_outer()
    print(f'{ylabel}: \
E(y) = {y.mean():.2f}, Std(y) = {y.std():.2f}, \
AC(y) = {y.corr(y.shift(1)):.2f}')
Actual UR: E(y) = 5.67, Std(y) = 1.71, AC(y) = 0.97
Simulated UR: E(y) = 5.21, Std(y) = 1.34, AC(y) = 0.95
[Figure: Actual UR and Simulated UR time series]

Does it fit?¶

  • The actual UR is very smooth, but the simulated AR(1) series is not.
  • The simulated UR often falls below $2\%$, whereas the actual UR does not.
  • The actual UR meets or exceeds $10\%$ a few times, whereas the simulated UR does not.
  • The actual UR rises quickly and falls slowly, whereas the simulated UR seems to rise and fall at roughly the same rates.
  • This all suggests that the AR(1) model is unable to reproduce some important features of the actual UR.

Partial ACF¶

  • Most of the slides in this chapter are about the AR($1$) model.
  • Suppose the true DGP is an AR model, but we do not know the order.
  • When the DGP is an MA model, we can use the ACF to determine the order since it drops off to zero after a certain lag.
  • But for an AR model, the autocorrelation at higher lags is influenced by the intermediate observations between $t$ and $t-k$, and even for an AR($1$) model the ACF tapers off slowly.
  • Alternatively, the partial ACF (PACF) helps identify the order of an AR model.
  • The PACF at lag $k$ is the correlation between $y_t$ and $y_{t-k}$ that is not explained by the observations in between, $y_{t-k+1}, \ldots, y_{t-1}$ (see the regression sketch below).
  • The PACF at lag $k$ is denoted $\phi_{kk}$, and plotting $\phi_{kk}$ against lag $k$ forms the PACF plot.
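One way to see what $\phi_{kk}$ measures: it equals the coefficient on $y_{t-k}$ in a regression of $y_t$ on its first $k$ lags. A minimal regression-based sketch (the helper name pacf_by_ols is illustrative; statsmodels' PACF tools also offer a regression-based method):

# phi_kk as the last coefficient from regressing y_t on y_{t-1}, ..., y_{t-k} (sketch)
import numpy as np

def pacf_by_ols(y, k):
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    T = len(y)
    # Column j holds y_{t-(j+1)} for t = k, ..., T-1
    X = np.column_stack([y[k-j-1:T-j-1] for j in range(k)])
    beta, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    return beta[-1]  # coefficient on y_{t-k}, i.e., phi_kk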

AR($2$) Model¶

  • Let's simulate the following AR($2$) model

    $$ y_t = (1-\phi_1-\phi_2)\mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \sigma \varepsilon_t $$

    where $\mu = 2$, $\phi_1 = 0.8$, $\phi_2 = 0.15$, and $\sigma = 0.5$

  • We know the DGP and the true order of the AR model. Does the PACF suggest the same order?

In [12]:
# Assign parameters
T = 501; mu = 2; phi_1 = 0.8; phi_2 = 0.15; sigma = 0.5;
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
y = mu*np.ones([T,1])
for t in range(2,T):
    y[t,0] = (1-phi_1-phi_2)*mu + phi_1*y[t-1,0] + phi_2*y[t-2,0] + sigma*eps[t]
In [13]:
# Convert to DataFrame and plot
df = pd.DataFrame(y)
fig, ax = plt.subplots(figsize=(6.5,2.5))
ax.plot(df); ax.set_title('AR(2) Simulated Data')
ax.grid(); ax.autoscale(tight=True)
[Figure: AR(2) Simulated Data]
In [14]:
# Partial auto-correlation function
from statsmodels.graphics.tsaplots import plot_pacf as pacf
# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(df,ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: PACF of the simulated AR(2) series]
  • The PACF is significantly different from zero for $k = \{1,2\}$, which indicates the DGP is probably an AR($2$).
  • So the PACF is good at determining the order so long as the AR coefficients are large enough.

Modeling UR as AR($2$)¶

  • Is a higher order AR model a good fit for the U.S. unemployment rate?
  • Let's first plot the PACF of the UR to see if we can determine an order.
In [15]:
# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(data.UR,ax);
ax.grid(); ax.autoscale(tight=True)
[Figure: PACF of the U.S. unemployment rate]

This suggests that an AR(1) might be the best model.

  • What if we ignore the PACF and estimate an AR(2) anyway?
In [16]:
# Define model
mod = ARIMA(data.UR,order=(2,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)

Estimation Results¶

Dep. Variable: VALUE             No. Observations: 925
Model: ARIMA(2, 0, 0)            Log Likelihood: -495.314

          coef   std err        z    P>|z|   [0.025   0.975]
const   5.5611     0.740    7.510    0.000    4.110    7.012
ar.L1   1.0208     0.009  109.007    0.000    1.002    1.039
ar.L2  -0.0517     0.010   -5.057    0.000   -0.072   -0.032
sigma2  0.1703     0.001  143.540    0.000    0.168    0.173

Heteroskedasticity (H): 6.37     Skew: 16.65
Prob(H) (two-sided): 0.00        Kurtosis: 423.48
  • Both AR coefficients are significantly different from $0$
  • But the AR(2) is still a linear model with normally distributed errors, so the residuals are still skewed and fat-tailed
  • The PACF did not reveal that a second lag might improve the fit

UR First Difference¶

In [17]:
# First difference (month to month)
dUR = data.UR.diff(1); dUR = dUR.dropna()
# Plot
fig, ax = plt.subplots(figsize=(6.5,1.5));
ax.plot(dUR); ax.set_title('Change in U.S. Unemployment Rate (month to month)');
ax.yaxis.set_major_formatter('{x:,.0f}pp'); ax.grid(); ax.autoscale(tight=True)
[Figure: Change in U.S. Unemployment Rate (month to month)]

Is it stationary?

  • Yes, mean looks constant around $0$.
  • Yes, volatility looks constant, except for COVID.
  • Hard to say; whether the autocovariance is constant can't be judged from the time-series plot alone.
In [18]:
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(dUR,ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: ACF of the differenced UR]
  • The ACF indicates that the series is probably stationary.
  • What does the PACF say?
In [19]:
# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(dUR,ax);
ax.grid(); ax.autoscale(tight=True)
[Figure: PACF of the differenced UR]
  • The PACF shows that the partial autocorrelations are not significantly different from zero at any lag.
  • What is the best model to estimate? Let's try MA($1$) and AR($1$) and compare.

Estimation Results¶

In [20]:
# Define models
mod0 = ARIMA(dUR,order=(0,0,1))
mod1 = ARIMA(dUR,order=(1,0,0))
# Estimate both models
res0 = mod0.fit(); summary0 = res0.summary()
res1 = mod1.fit(); summary1 = res1.summary()
# Print summary tables (MA(1) first, then AR(1))
#print(summary0); print(summary1)
Dep. Variable: VALUE             No. Observations: 924
Model: ARIMA(0, 0, 1)            Log Likelihood: -500.699

          coef   std err        z    P>|z|   [0.025   0.975]
const   0.0007     0.025    0.027    0.979   -0.047    0.049
ma.L1   0.0398     0.008    4.893    0.000    0.024    0.056
sigma2  0.1731     0.001  185.366    0.000    0.171    0.175

Heteroskedasticity (H): 6.54     Skew: 16.40
Prob(H) (two-sided): 0.00        Kurtosis: 419.10
  • The MA coefficient estimate is statistically significantly different from $0$.
  • However, the residuals are still probably heteroskedastic and not normally distributed.
  • Thus, a linear/normal model is probably not the best choice.
Dep. Variable: VALUE             No. Observations: 924
Model: ARIMA(1, 0, 0)            Log Likelihood: -500.757

          coef   std err        z    P>|z|   [0.025   0.975]
const   0.0007     0.025    0.026    0.979   -0.048    0.049
ar.L1   0.0363     0.008    4.344    0.000    0.020    0.053
sigma2  0.1731     0.001  185.413    0.000    0.171    0.175

Heteroskedasticity (H): 6.55     Skew: 16.42
Prob(H) (two-sided): 0.00        Kurtosis: 419.51
  • The AR coefficient is small, but the estimate is statistically significantly different from $0$.
  • Like the MA($1$) model, the residuals are still probably heteroskedastic and not normally distributed.
  • This again suggests that a linear/normal model is probably not the best choice.
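Since the MA($1$) and AR($1$) are not nested, one common way to compare them is with information criteria, where lower values indicate a better penalized fit. A minimal sketch, assuming res0 (the MA(1) fit) and res1 (the AR(1) fit) from the estimation cell above are in scope:

# Compare the two fits with information criteria (lower is better)
print(f'MA(1): AIC = {res0.aic:.1f}, BIC = {res0.bic:.1f}')
print(f'AR(1): AIC = {res1.aic:.1f}, BIC = {res1.bic:.1f}')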