
AR Model

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616
Slides

AR($p$) Model

  • The AR($p$) model is: $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$

  • $\phi_1, \phi_2, \ldots, \phi_p$ are the autoregressive parameters that quantify the influence of past values on the current value

  • $\varepsilon_t$ is a white noise error term (or innovation) with mean zero and constant variance $\sigma^2$ (see the simulation sketch below)
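
For concreteness, here is a minimal sketch of simulating an AR($p$) process (the parameter values are illustrative, not from the slides):

# Simulate an AR(p) process (sketch; illustrative parameters)
import numpy as np
rng = np.random.default_rng(seed=0)
phi = [0.5, 0.2, 0.1]           # phi_1, ..., phi_p (illustrative)
p, T = len(phi), 500
eps = rng.standard_normal(T)    # white noise with sigma = 1
y = np.zeros(T)
for t in range(p, T):
    y[t] = sum(phi[i]*y[t-1-i] for i in range(p)) + eps[t]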

AR(1) Model

  • An AR(1) model is a special case of the AR($p$)

    $y_t = \phi y_{t-1} + \varepsilon_t$
  • An AR(1) is often used to model cyclical dynamics

  • e.g., the fluctuations in an AR(1) series where $\phi = 0.95$ appear much more persistent (i.e., smoother) than those where $\phi = 0.4$

AR(1) for any mean

  • An AR(1) model is covariance stationary if $|\phi| < 1$

  • Suppose $E[y_t] = E[y_{t-1}] \equiv \bar{y}$

  • Then $E[y_t] = 0$ since $E[y_t] = \phi E[y_{t-1}] + E[\varepsilon_t] \rightarrow (1-\phi)\bar{y} = 0 \rightarrow \bar{y} = 0$

  • Instead consider the AR(1) model with a generic mean: $y_t - \mu = \phi(y_{t-1} - \mu) + \varepsilon_t$

  • Then $E[y_t] = \mu$ since

    $$\begin{align*} E[y_t] - \mu &= \phi E[y_{t-1}] - \phi\mu + E[\varepsilon_t] \\ \rightarrow (1-\phi)\bar{y} &= (1-\phi)\mu \\ \rightarrow \bar{y} &= \mu \end{align*}$$
  • You can rearrange the (any mean) AR(1) model as $y_t = (1-\phi)\mu + \phi y_{t-1} + \varepsilon_t$ (verified by simulation below)
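
As a quick numerical check (a sketch with illustrative values $\mu = 2$ and $\phi = 0.9$), the sample mean of a long simulated series should be close to $\mu$:

# Check that the sample mean is close to mu (sketch; illustrative values)
import numpy as np
rng = np.random.default_rng(seed=0)
mu = 2; phi = 0.9; T = 100000
eps = rng.standard_normal(T)
y = np.empty(T); y[0] = mu      # start at the unconditional mean
for t in range(1,T):
    y[t] = (1-phi)*mu + phi*y[t-1] + eps[t]
print(y.mean())                 # approximately 2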

AR(1) as MA($\infty$)

  • The AR(1) model structure is the same at all points in time

    $$\begin{align*} y_t &= \phi y_{t-1} + \varepsilon_t \\ y_{t-1} &= \phi y_{t-2} + \varepsilon_{t-1} \\ y_{t-2} &= \phi y_{t-3} + \varepsilon_{t-2} \end{align*}$$
  • Combine, i.e., recursively substitute, to get $y_t = \phi^3 y_{t-3} + \phi^2 \varepsilon_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t$

  • Rinse and repeat to get $y_t = \sum_{i=0}^\infty \phi^i \varepsilon_{t-i}$ (i.e., the MA($\infty$) model or Wold representation; see the numerical check below)

  • And an MA($\infty$) is invertible back to an AR(1) so long as $|\phi| < 1$.

  • An AR model that can be written as an MA($\infty$) is causal (or exhibits causality).

  • That property differentiates it from an invertible MA model, which can be expressed as an AR($\infty$) model.
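
The recursion can also be verified numerically: a long truncated version of the MA($\infty$) sum reproduces a simulated AR(1) value almost exactly (a sketch with an illustrative $\phi$):

# Compare an AR(1) value to its truncated MA(infinity) sum (sketch)
import numpy as np
rng = np.random.default_rng(seed=0)
phi = 0.9; T = 700; K = 500     # K = truncation length
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1,T):
    y[t] = phi*y[t-1] + eps[t]  # zero-mean AR(1)
ma = sum(phi**i * eps[T-1-i] for i in range(K))
print(y[T-1], ma)               # nearly identical since phi^K is tiny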

Variance

  • Suppose $\mu = 0$ without loss of generality.

  • Given $Var(\varepsilon_t) = \sigma^2$ and $|\phi| < 1$, then $\gamma(0) \equiv Var(y_t)$

    $$\begin{align*} &= Var\left(\sum_{i=0}^\infty \phi^i \varepsilon_{t-i}\right) \\ &= Var(\varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + \phi^3 \varepsilon_{t-3} + \phi^4 \varepsilon_{t-4} + \cdots) \\ &= \sigma^2 + \phi^2 \sigma^2 + (\phi^2)^2 \sigma^2 + (\phi^3)^2 \sigma^2 + (\phi^4)^2 \sigma^2 + \cdots \\ &= \sigma^2(1 + \phi^2 + (\phi^2)^2 + (\phi^2)^3 + (\phi^2)^4 + \cdots) \\ &= \sigma^2/(1-\phi^2) \end{align*}$$

    (The $\varepsilon$'s are uncorrelated, so the variance of the sum is the sum of the variances.) See Geometric Series.

  • $Std(y_t) = \sqrt{\sigma^2/(1-\phi^2)}$ (checked by simulation below)
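
A simulation check of this formula (a sketch; with $\phi = 0.9$ and $\sigma = 1$, the theoretical variance is $1/(1-0.81) \approx 5.26$):

# Check Var(y) = sigma^2/(1-phi^2) by simulation (sketch)
import numpy as np
rng = np.random.default_rng(seed=0)
phi = 0.9; sigma = 1; T = 200000
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1,T):
    y[t] = phi*y[t-1] + sigma*eps[t]
print(y.var(), sigma**2/(1-phi**2))   # both close to 5.26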

Autocorrelation

  • $\gamma(1) = E[y_t y_{t-1}]$

    $$\begin{align*} &= E[(\phi y_{t-1} + \varepsilon_t) y_{t-1}] \\ &= E[\phi y_{t-1}^2] + E[\varepsilon_t y_{t-1}] \\ &= \phi Var[y_{t-1}] + 0 \\ &= \phi \gamma(0) \\ &= \phi \sigma^2/(1-\phi^2) \end{align*}$$
  • $\rho(1) = \gamma(1)/\gamma(0) = \phi$

  • $\gamma(2) = E[y_t y_{t-2}] = E[(\phi^2 y_{t-2} + \varepsilon_t + \phi \varepsilon_{t-1}) y_{t-2}] = \phi^2 E[y_{t-2}^2] = \phi^2 \gamma(0)$

  • $\rho(2) \equiv \gamma(2)/\gamma(0) = \phi^2$

  • In general, $\rho(k) = \phi^k$, which is a very nice looking ACF.

  • Thus, for an AR(1) process/model we can refer to $\phi$ as the autocorrelation (AC) parameter/coefficient (see the simulation check below).
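
A simulation check that the sample ACF matches $\rho(k) = \phi^k$ (a sketch with an illustrative $\phi$):

# Compare sample autocorrelations to rho(k) = phi^k (sketch)
import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=0)
phi = 0.9; T = 100000
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1,T):
    y[t] = phi*y[t-1] + eps[t]
s = pd.Series(y)
for k in range(1,5):
    print(k, round(s.autocorr(k),3), round(phi**k,3))   # sample vs. theory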

AR Model as DGP

Let’s compare a couple of different AR models to their underlying white noise.

  1. White noise:

    $y_t = \mu + \varepsilon_t$
  2. AR(1), low AC:

    $y_t = (1-\phi_l)\mu + \phi_l y_{t-1} + \varepsilon_t$
  3. AR(1), high AC:

    $y_t = (1-\phi_h)\mu + \phi_h y_{t-1} + \varepsilon_t$
# Libraries
import numpy as np
# Assign parameters
T = 501; mu = 2; phi_l = 0.5; phi_h = 0.9;
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
y = np.zeros([T,3])   # note: each series starts at 0, below the mean mu = 2
for t in range(1,T):
    y[t,0] = mu + eps[t]
    y[t,1] = (1-phi_l)*mu + phi_l*y[t-1,1] + eps[t]
    y[t,2] = (1-phi_h)*mu + phi_h*y[t-1,2] + eps[t]
# Libraries
import pandas as pd
# Convert to DataFrame and plot
df = pd.DataFrame(y)
df.columns = ['White Noise','AR(1) low AC','AR(1) high AC']
# Libraries
import matplotlib.pyplot as plt
# Plot variables
fig, axs = plt.subplots(len(df.columns),figsize=(6.5,3.5))
for idx, ax in enumerate(axs.flat):
    col = df.columns[idx]; y = df[col]
    ax.plot(y); ax.set_ylabel(col)
    ax.grid(); ax.autoscale(tight=True); ax.label_outer()
    print(f'{col}: \
E(y) = {y.mean():.2f}, Std(y) = {y.std():.2f}, \
AC(y) = {y.corr(y.shift(1)):.2f}')
White Noise: E(y) = 1.97, Std(y) = 1.02, AC(y) = -0.02
AR(1) low AC: E(y) = 1.94, Std(y) = 1.16, AC(y) = 0.48
AR(1) high AC: E(y) = 1.72, Std(y) = 2.18, AC(y) = 0.88
[Figure: simulated white noise, low-AC AR(1), and high-AC AR(1) series]
# Auto-correlation function
from statsmodels.graphics.tsaplots import plot_acf as acf
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(df['AR(1) high AC'],ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: ACF of the high-AC AR(1) series]
# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(df['AR(1) high AC'],order=(1,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
# print(tab0); print(tab1); print(tab2)

Estimation Results

| Dep. Variable: | AR(1) high AC | No. Observations: | 501 |
|---|---|---|---|
| Model: | ARIMA(1, 0, 0) | Log Likelihood: | -718.607 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 1.6774 | 0.390 | 4.302 | 0.000 | 0.913 | 2.442 |
| ar.L1 | 0.8843 | 0.021 | 41.211 | 0.000 | 0.842 | 0.926 |
| sigma2 | 1.0282 | 0.059 | 17.442 | 0.000 | 0.913 | 1.144 |

| Heteroskedasticity (H): | 1.14 | Skew: | -0.15 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.41 | Kurtosis: | 3.45 |
  • Parameter estimates are close to the DGP values and significantly different from 0

  • Residuals have skewness near 0 and kurtosis near 3, so they are probably normally distributed (see the test sketch below)
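
One way to formalize this check (a sketch; the test itself is not shown in the slides) is a Jarque-Bera normality test on the residuals of the fitted result `res` from above:

# Jarque-Bera normality test on the residuals (sketch)
from statsmodels.stats.stattools import jarque_bera
jb, jb_pv, skew, kurt = jarque_bera(res.resid)
print(f'JB p-value = {jb_pv:.2f}, skew = {skew:.2f}, kurtosis = {kurt:.2f}')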

Modeling UR as AR(1)

  • The U.S. unemployment rate is very persistent.

  • What if we model it with an AR(1)?

# Libraries
from fredapi import Fred
import pandas as pd
# Read Data
fred_api_key = pd.read_csv('fred_api_key.txt', header=None)
fred = Fred(api_key=fred_api_key.iloc[0,0])
data = fred.get_series('UNRATE').to_frame(name='UR')
data.index = pd.DatetimeIndex(data.UR.index.values,freq=data.UR.index.inferred_freq)
# Plot
fig, ax = plt.subplots(figsize=(6.5,2));
ax.plot(data.UR); ax.set_title('U.S. Unemployment Rate');
ax.yaxis.set_major_formatter('{x:,.1f}%')
ax.grid(); ax.autoscale(tight=True)
[Figure: U.S. unemployment rate]
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,2.5))
acf(data.UR,ax);
ax.grid(); ax.autoscale(tight=True)
[Figure: ACF of the unemployment rate]
  • The ACF decays very slowly $\rightarrow$ the UR might be non-stationary.

  • Let’s assume it is stationary for a moment and estimate an AR(1) model.

# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(data.UR,order=(1,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)

Estimation Results

| Dep. Variable: | VALUE | No. Observations: | 925 |
|---|---|---|---|
| Model: | ARIMA(1, 0, 0) | Log Likelihood: | -496.551 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 5.5470 | 0.761 | 7.289 | 0.000 | 4.055 | 7.039 |
| ar.L1 | 0.9704 | 0.008 | 122.023 | 0.000 | 0.955 | 0.986 |
| sigma2 | 0.1708 | 0.001 | 159.798 | 0.000 | 0.169 | 0.173 |

| Heteroskedasticity (H): | 6.35 | Skew: | 16.87 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 428.90 |
  • The estimate of the AR coefficient is very high at 0.9704, but it is below 1, so the estimated model is not a random walk

  • Residuals are skewed and have excess kurtosis $\rightarrow$ the model is missing nonlinearity

Actual vs. Simulated Data

  • What if we simulated data from an AR(1) model, setting the parameters to the previous estimates?

  • What does a simulation from this estimated model look like?

# Assign parameters
T = len(data.UR); mu = 5.547; rho = 0.9704; sigma = np.sqrt(0.1708);
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
URsim = mu*np.ones([T,1])
for t in range(1,T):
    URsim[t,0] = (1-rho)*mu + rho*URsim[t-1,0] + sigma*eps[t];
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(URsim,ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: ACF of the simulated UR series]
  • This ACF approaches zero at shorter lags compared to the actual data.

  • We know an AR(1) model with $\phi = 0.9704$ generates stationary time series.

  • Maybe the actual UR isn’t stationary?
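
One standard way to check (a sketch; unit-root tests are not covered in these slides) is an augmented Dickey-Fuller test, where failing to reject the null means a unit root cannot be ruled out:

# Augmented Dickey-Fuller unit-root test on the UR (sketch)
from statsmodels.tsa.stattools import adfuller
adf_stat, pvalue, *rest = adfuller(data.UR.dropna())
print(f'ADF statistic = {adf_stat:.2f}, p-value = {pvalue:.3f}')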

# Add simulation to DataFrame
data['URsim'] = URsim
#  Variables to plot and ylabel
plotme = [['UR','Actual UR'],['URsim','Simulated UR']]
# Plot variables
fig, axs = plt.subplots(len(plotme),figsize=(6.5,3))
for idx, ax in enumerate(axs.flat):
    y = data[plotme[idx][0]]; ylabel = plotme[idx][1]
    ax.plot(y); ax.set_ylabel(ylabel)
    ax.yaxis.set_major_formatter('{x:.0f}%')
    ax.grid(); ax.autoscale(tight=True); ax.label_outer()
    print(f'{ylabel}: \
E(y) = {y.mean():.2f}, Std(y) = {y.std():.2f}, \
AC(y) = {y.corr(y.shift(1)):.2f}')
Actual UR: E(y) = 5.67, Std(y) = 1.71, AC(y) = 0.97
Simulated UR: E(y) = 5.21, Std(y) = 1.34, AC(y) = 0.95
[Figure: actual vs. simulated UR]

Does it fit?

  • The actual UR is very smooth, but the simulated AR(1) series is not.

  • The simulated UR often falls below $2\%$, whereas the actual UR does not.

  • The actual UR meets or exceeds $10\%$ a few times, whereas the simulated UR does not.

  • The actual UR rises quickly and falls slowly, whereas the simulated UR seems to rise and fall at roughly the same rates.

  • This all suggests that the AR(1) model is unable to reproduce some important features of the actual UR.

Partial ACF

  • Most of the slides in this chapter are about the AR(1) model.

  • Suppose the true DGP is an AR model, but we do not know the order.

  • When the DGP is an MA model, we can use the ACF to determine the order, since it drops to zero after a certain lag.

  • But for an AR model, the autocorrelation at higher lags is influenced by the intermediate observations between $t$ and $t-k$, and even for an AR(1) model the ACF tapers off slowly.

  • Alternatively, the Partial ACF (PACF) helps to identify the order of an AR model.

  • The PACF at lag $k$ is the correlation between $y_t$ and $y_{t-k}$ that is not explained by the observations in between, $\{y_{t-k+1}, \ldots, y_{t-1}\}$

  • The PACF at lag $k$ is denoted $\phi_{kk}$, and plotting $\phi_{kk}$ against lag $k$ forms the PACF plot (see the sketch below for one way to compute it).
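
One common way to compute $\phi_{kk}$ (a sketch, equivalent to the regression-based PACF): regress $y_t$ on its first $k$ lags by OLS and take the coefficient on $y_{t-k}$:

# PACF at lag k as the last coefficient in an AR(k) regression (sketch)
import numpy as np
import statsmodels.api as sm

def pacf_at_lag(y, k):
    Y = y[k:]
    X = sm.add_constant(np.column_stack([y[k-i:-i] for i in range(1,k+1)]))
    return sm.OLS(Y, X).fit().params[-1]   # coefficient on y_{t-k}

# For an AR(1) with phi = 0.9, expect about 0.9 at lag 1 and 0 at lag 2
rng = np.random.default_rng(seed=0)
y = np.zeros(5000)
for t in range(1,5000):
    y[t] = 0.9*y[t-1] + rng.standard_normal()
print(pacf_at_lag(y,1), pacf_at_lag(y,2))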

AR(2) Model

  • Let’s simulate the following AR(2) model

    $y_t = (1-\phi_1-\phi_2)\mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \sigma \varepsilon_t$

    where $\mu = 2$, $\phi_1 = 0.8$, $\phi_2 = 0.15$, and $\sigma = 0.5$

  • We know the DGP and the true order of the AR model. Does the PACF suggest the same order?

# Assign parameters
T = 501; mu = 2; phi_1 = 0.8; phi_2 = 0.15; sigma = 0.5;
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
y = mu*np.ones([T,1])
for t in range(2,T):
    y[t,0] = (1-phi_1-phi_2)*mu + phi_1*y[t-1,0] + phi_2*y[t-2,0] + sigma*eps[t]
# Convert to DataFrame and plot
df = pd.DataFrame(y)
fig, ax = plt.subplots(figsize=(6.5,2.5))
ax.plot(df); ax.set_title('AR(2) Simulated Data')
ax.grid(); ax.autoscale(tight=True)
[Figure: AR(2) simulated data]
# Partial auto-correlation function
from statsmodels.graphics.tsaplots import plot_pacf as pacf
# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(df,ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: PACF of the AR(2) simulated data]
  • The PACF is significantly different from zero for $k = \{1, 2\}$, which indicates the DGP is probably an AR(2).

  • So the PACF is good at determining the order so long as the AR coefficients are large enough.

Modeling UR as AR(2)

  • Is a higher order AR model a good fit for the U.S. unemployment rate?

  • Let’s first plot the PACF of the UR to see if we can determine an order.

# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(data.UR,ax);
ax.grid(); ax.autoscale(tight=True)
[Figure: PACF of the unemployment rate]

This suggests that an AR(1) might be the best model.

  • What if we ignore the PACF and estimate an AR(2) anyway?

# Define model
mod = ARIMA(data.UR,order=(2,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)

Estimation Results

| Dep. Variable: | VALUE | No. Observations: | 925 |
|---|---|---|---|
| Model: | ARIMA(2, 0, 0) | Log Likelihood: | -495.314 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 5.5611 | 0.740 | 7.510 | 0.000 | 4.110 | 7.012 |
| ar.L1 | 1.0208 | 0.009 | 109.007 | 0.000 | 1.002 | 1.039 |
| ar.L2 | -0.0517 | 0.010 | -5.057 | 0.000 | -0.072 | -0.032 |
| sigma2 | 0.1703 | 0.001 | 143.540 | 0.000 | 0.168 | 0.173 |

| Heteroskedasticity (H): | 6.37 | Skew: | 16.65 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 423.48 |
  • Both AR coefficients are significantly different from 0

  • But the AR(2) is still a linear model with normal, symmetric innovations, so the residuals are still skewed and fat-tailed

  • The PACF did not help determine that a second lag might improve the fit

UR First Difference

# Month-to-month first difference
dUR = data.UR.diff(1).dropna()
# Plot
fig, ax = plt.subplots(figsize=(6.5,1.5));
ax.plot(dUR); ax.set_title('Change in U.S. Unemployment Rate (month to month)');
ax.yaxis.set_major_formatter('{x:,.0f}pp'); ax.grid(); ax.autoscale(tight=True)
[Figure: month-to-month change in the U.S. unemployment rate]

Is it stationary?

  • Yes, the mean looks constant around 0.

  • Yes, the volatility looks constant, except for COVID.

  • Maybe not: it’s impossible to tell from the plot whether the autocovariance is constant.

# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(dUR,ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: ACF of the UR first difference]
  • The ACF indicates that the series is probably stationary.

  • What does the PACF say?

# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(dUR,ax);
ax.grid(); ax.autoscale(tight=True)
[Figure: PACF of the UR first difference]
  • The PACF shows partial autocorrelations that are not significantly different from zero at any lag.

  • What is the best model to estimate? Let’s try an MA(1) and an AR(1) and compare.

Estimation Results

# Define model
mod0 = ARIMA(dUR,order=(0,0,1))
mod1 = ARIMA(dUR,order=(1,0,0))
# Estimate both models
res0 = mod0.fit(); res1 = mod1.fit()
summary0 = res0.summary(); summary1 = res1.summary()
# Print summary tables (MA(1) first, then AR(1))
#print(summary0); print(summary1)
| Dep. Variable: | VALUE | No. Observations: | 924 |
|---|---|---|---|
| Model: | ARIMA(0, 0, 1) | Log Likelihood: | -500.699 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0007 | 0.025 | 0.027 | 0.979 | -0.047 | 0.049 |
| ma.L1 | 0.0398 | 0.008 | 4.893 | 0.000 | 0.024 | 0.056 |
| sigma2 | 0.1731 | 0.001 | 185.366 | 0.000 | 0.171 | 0.175 |

| Heteroskedasticity (H): | 6.54 | Skew: | 16.40 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 419.10 |
  • The MA coefficient estimate is statistically significantly different from 0.

  • However, the residuals are still probably heteroskedastic and not normally distributed.

  • Thus, a linear/normal model is probably not the best choice.

| Dep. Variable: | VALUE | No. Observations: | 924 |
|---|---|---|---|
| Model: | ARIMA(1, 0, 0) | Log Likelihood: | -500.757 |

|  | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0007 | 0.025 | 0.026 | 0.979 | -0.048 | 0.049 |
| ar.L1 | 0.0363 | 0.008 | 4.344 | 0.000 | 0.020 | 0.053 |
| sigma2 | 0.1731 | 0.001 | 185.413 | 0.000 | 0.171 | 0.175 |

| Heteroskedasticity (H): | 6.55 | Skew: | 16.42 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 419.51 |
  • The AR coefficient is small/weak, but the estimate is statistically significantly different from 0.

  • Like the MA(1) model, the residuals are still probably heteroskedastic and not normally distributed.

  • This again confirms that a linear/normal model is probably not the best choice.
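
Since both models were fit above (`res0` for the MA(1), `res1` for the AR(1)), one way to compare them (a sketch) is with information criteria, where lower is better:

# Compare the fitted MA(1) and AR(1) by information criteria (sketch)
print(f'MA(1): AIC = {res0.aic:.1f}, BIC = {res0.bic:.1f}')
print(f'AR(1): AIC = {res1.aic:.1f}, BIC = {res1.bic:.1f}')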