AR($p$) Model
- The AR($p$) model is: $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$
- $\phi_1$, $\phi_2$, $\ldots$, $\phi_p$ are the autoregressive parameters that quantify the influence of past values on the current value
- $\varepsilon_t$ is a white noise error term (or innovation) with mean zero and constant variance $\sigma^2$
AR($1$) Model
An AR($1$) model is a special case of the AR($p$) model:
$$ y_t = \phi y_{t-1} + \varepsilon_t $$
An AR($1$) is often used to model persistent, cyclical dynamics
e.g., the fluctuations in an AR(1) series with $\phi = 0.95$ appear much more persistent (i.e., smoother) than those with $\phi = 0.4$
AR($1$) for any mean
An AR($1$) model is covariance stationary if $|\phi| < 1$
Suppose $E[y_t] = E[y_{t-1}] \equiv \bar{y}$
Then $\bar{y} = 0$, since $E[y_t] = \phi E[y_{t-1}] + E[\varepsilon_t] \rightarrow (1-\phi)\bar{y} = 0 \rightarrow \bar{y} = 0$ (given $\phi \neq 1$)
Instead consider the AR(1) model with a generic mean: $y_t - \mu = \phi(y_{t-1} - \mu) + \varepsilon_t$
Then $E[y_t] = \mu$ since
\begin{align*} E[y_t] - \mu &= \phi E[y_{t-1}] -\phi\mu + E[\varepsilon_t] \\ \rightarrow (1-\phi)\bar{y} &= (1-\phi)\mu \\ \rightarrow \bar{y} &= \mu \end{align*}
You can rearrange the (any mean) AR(1) model as $y_t = (1-\phi)\mu + \phi y_{t-1} + \varepsilon_t$
AR(1) as MA($\infty$)
The AR(1) model structure is the same at all points in time
\begin{align*} y_t &= \phi y_{t-1} + \varepsilon_t \\ y_{t-1} &= \phi y_{t-2} + \varepsilon_{t-1} \\ y_{t-2} &= \phi y_{t-3} + \varepsilon_{t-2} \end{align*}
Combine, i.e., recursively substitute, to get $y_t = \phi^3 y_{t-3} + \phi^2 \varepsilon_{t-2}+ \phi \varepsilon_{t-1} + \varepsilon_t$
Rinse and repeat to get $y_t = \sum_{i=0}^\infty \phi^i \varepsilon_{t-i}$ (i.e., the MA($\infty$) model or Wold representation), since the remainder term $\phi^k y_{t-k}$ vanishes as $k \to \infty$ when $|\phi| < 1$
And an MA($\infty$) is invertible back to an AR($1$) so long as $|\phi| < 1$.
An AR model that can be written as an MA($\infty$) is causal (or exhibits causality).
That property differentiates it from an invertible MA model, which can be expressed as an AR($\infty$) model.
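As a quick numeric check of this recursion (a minimal sketch, not part of the original derivation; it assumes a truncation lag of $200$, where $\phi^{200}$ is negligible):
# Verify the MA(infinity) representation numerically (illustrative sketch)
import numpy as np
rng = np.random.default_rng(seed=0)
phi = 0.9; T = 2000
eps = rng.standard_normal(T)
# Simulate the zero-mean AR(1) recursion with y_0 = 0
y = np.zeros(T)
for t in range(1,T):
    y[t] = phi*y[t-1] + eps[t]
# Truncated MA sum at the last date: sum of phi^i * eps_{t-i}
ma_approx = sum(phi**i * eps[T-1-i] for i in range(200))
print(y[-1], ma_approx)  # agree up to the negligible truncation error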
Variance
Suppose $\mu=0$ without loss of generality.
Given $Var(\varepsilon_t) = \sigma^2$ and $|\phi| < 1$, then $\gamma(0) \equiv Var(y_t)$
\begin{align*} &= Var\left(\sum_{i=0}^\infty \phi^i \varepsilon_{t-i}\right) \\ &= Var(\varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + \phi^3 \varepsilon_{t-3} + \phi^4 \varepsilon_{t-4} + \cdots) \\ &= \sigma^2 + \phi^2 \sigma^2 + (\phi^2)^2 \sigma^2 + (\phi^3)^2 \sigma^2 + (\phi^4)^2 \sigma^2 + \cdots \\ &= \sigma^2(1 + \phi^2 + (\phi^2)^2 + (\phi^2)^3 + (\phi^2)^4 + \cdots ) \\ &= \sigma^2/(1-\phi^2) \end{align*}
See Geometric Series.
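Specifically, for $|r| < 1$, $\sum_{i=0}^\infty r^i = \frac{1}{1-r}$; the last step above applies this with $r = \phi^2$.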
$Std(y_t) = \sqrt{\sigma^2/(1-\phi^2)}$
Autocorrelation
$\gamma(1) = E[y_t y_{t-1}]$
\begin{align*} &= E[(\phi y_{t-1} + \varepsilon_t)y_{t-1}] \\ &= E[\phi y_{t-1}^2] + E[\varepsilon_t y_{t-1}] \\ &= \phi Var[y_{t-1}] + 0 \\ &= \phi \gamma(0) \\ &= \phi \sigma^2/(1-\phi^2) \end{align*}
$\rho(1) = \gamma(1)/\gamma(0) = \phi$
$\gamma(2) = E[y_t y_{t-2}] = E[(\phi^2y_{t-2} + \varepsilon_t + \phi \varepsilon_{t-1})y_{t-2}] = \phi^2 E[y_{t-2}^2] = \phi^2 \gamma(0)$
$\rho(2) \equiv \gamma(2)/\gamma(0) = \phi^2$
In general, $\rho(k) = \phi^k$, so the ACF of an AR($1$) decays geometrically with the lag.
Thus, for an AR($1$) process/model we can refer to $\phi$ as the autocorrelation (AC) parameter/coefficient.
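As a sanity check (a minimal simulation sketch, not in the original slides, with $\mu = 0$ and $\sigma = 1$), the sample variance and autocorrelations should be close to $\sigma^2/(1-\phi^2)$ and $\phi^k$:
# Compare simulated moments of an AR(1) to the formulas above (illustrative sketch)
import numpy as np
rng = np.random.default_rng(seed=0)
phi = 0.9; T = 100000
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1,T):
    y[t] = phi*y[t-1] + eps[t]
print(y.var(), 1/(1-phi**2))                    # sample vs. theoretical variance
print(np.corrcoef(y[2:], y[:-2])[0,1], phi**2)  # sample vs. theoretical rho(2)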
AR Model as DGP
Let's compare a couple of different AR models to their underlying white noise.
White noise:
$$ y_t = \mu + \varepsilon_t $$
AR(1), low AC:
$$ y_t = (1-\phi_l)\mu + \phi_l y_{t-1} + \varepsilon_t $$
AR(1), high AC:
$$ y_t = (1-\phi_h)\mu + \phi_h y_{t-1} + \varepsilon_t $$
# Libraries
import numpy as np
# Assign parameters
T = 501; mu = 2; phi_l = 0.5; phi_h = 0.9;
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
y = np.zeros([T,3])
for t in range(1,T):
    y[t,0] = mu + eps[t]
    y[t,1] = (1-phi_l)*mu + phi_l*y[t-1,1] + eps[t]
    y[t,2] = (1-phi_h)*mu + phi_h*y[t-1,2] + eps[t]
# Libraries
import pandas as pd
# Convert to DataFrame and plot
df = pd.DataFrame(y)
df.columns = ['White Noise','AR(1) low AC','AR(1) high AC']
# libraries
import matplotlib.pyplot as plt
# Plot variables
fig, axs = plt.subplots(len(df.columns),figsize=(6.5,3.5))
for idx, ax in enumerate(axs.flat):
    col = df.columns[idx]; y = df[col]
    ax.plot(y); ax.set_ylabel(col)
    ax.grid(); ax.autoscale(tight=True); ax.label_outer()
    print(f'{col}: \
E(y) = {y.mean():.2f}, Std(y) = {y.std():.2f}, \
AC(y) = {y.corr(y.shift(1)):.2f}')
White Noise: E(y) = 1.97, Std(y) = 1.02, AC(y) = -0.02
AR(1) low AC: E(y) = 1.94, Std(y) = 1.16, AC(y) = 0.48
AR(1) high AC: E(y) = 1.72, Std(y) = 2.18, AC(y) = 0.88
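These sample statistics line up with theory (a quick check, not in the original output): with $\sigma = 1$, $Std(y_t) = 1/\sqrt{1-\phi^2}$ and $AC(1) = \phi$.
# Theoretical moments implied by the two AR(1) DGPs above (sigma = 1)
for phi in (phi_l, phi_h):
    print(f'phi = {phi}: Std(y) = {np.sqrt(1/(1-phi**2)):.2f}, AC(1) = {phi:.2f}')
The theoretical values are $1.15$ and $2.29$, close to the sample values of $1.16$ and $2.18$.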
# Auto-correlation function
from statsmodels.graphics.tsaplots import plot_acf as acf
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(df['AR(1) high AC'],ax)
ax.grid(); ax.autoscale(tight=True);
# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(df['AR(1) high AC'],order=(1,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
# print(tab0); print(tab1); print(tab2)
Estimation Results
| Dep. Variable: | AR(1) high AC | No. Observations: | 501 |
|---|---|---|---|
| Model: | ARIMA(1, 0, 0) | Log Likelihood | -718.607 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 1.6774 | 0.390 | 4.302 | 0.000 | 0.913 | 2.442 |
| ar.L1 | 0.8843 | 0.021 | 41.211 | 0.000 | 0.842 | 0.926 |
| sigma2 | 1.0282 | 0.059 | 17.442 | 0.000 | 0.913 | 1.144 |

| Heteroskedasticity (H): | 1.14 | Skew: | -0.15 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.41 | Kurtosis: | 3.45 |
- Parameter estimates are close to the DGP values and significantly different from $0$
- Residuals have skewness near $0$ and kurtosis near $3$, so they are plausibly normally distributed
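Concretely: the true $\phi_h = 0.9$ vs. the ar.L1 estimate of $0.8843$; the true mean $\mu = 2$ vs. the const estimate of $1.6774$; and the true innovation variance of $1$ vs. the sigma2 estimate of $1.0282$.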
Modeling UR as AR($1$)
- The U.S. unemployment rate is very persistent.
- What if we model it with an AR(1)?
# Libraries
from fredapi import Fred
import pandas as pd
# Read Data
fred_api_key = pd.read_csv('fred_api_key.txt', header=None)
fred = Fred(api_key=fred_api_key.iloc[0,0])
data = fred.get_series('UNRATE').to_frame(name='UR')
data.index = pd.DatetimeIndex(data.UR.index.values,freq=data.UR.index.inferred_freq)
# Plot
fig, ax = plt.subplots(figsize=(6.5,2));
ax.plot(data.UR); ax.set_title('U.S. Unemployment Rate');
ax.yaxis.set_major_formatter('{x:,.1f}%')
ax.grid(); ax.autoscale(tight=True)
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,2.5))
acf(data.UR,ax);
ax.grid(); ax.autoscale(tight=True)
- The ACF decays very slowly $\rightarrow$ UR might be non-stationary.
- Let's assume it is stationary for a moment and estimate an AR($1$) model.
# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(data.UR,order=(1,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)
Estimation Results
| Dep. Variable: | VALUE | No. Observations: | 925 |
|---|---|---|---|
| Model: | ARIMA(1, 0, 0) | Log Likelihood | -496.551 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 5.5470 | 0.761 | 7.289 | 0.000 | 4.055 | 7.039 |
| ar.L1 | 0.9704 | 0.008 | 122.023 | 0.000 | 0.955 | 0.986 |
| sigma2 | 0.1708 | 0.001 | 159.798 | 0.000 | 0.169 | 0.173 |

| Heteroskedasticity (H): | 6.35 | Skew: | 16.87 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 428.90 |
- The estimate of the AR coefficient is very high at $0.9704$, but still below $1$, so the estimated model is not a random walk
- Residuals are skewed and have excess kurtosis $\rightarrow$ the linear model is missing nonlinear dynamics
Actual vs. Simulated Data
- What if we simulate data from the AR(1) model, setting the parameters to the previous estimates?
- What does a simulation from this estimated model look like?
# Assign parameters
T = len(data.UR); mu = 5.547; rho = 0.9704; sigma = np.sqrt(0.1708);
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
URsim = mu*np.ones([T,1])
for t in range(1,T):
    URsim[t,0] = (1-rho)*mu + rho*URsim[t-1,0] + sigma*eps[t]
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(URsim,ax)
ax.grid(); ax.autoscale(tight=True);
- This ACF approaches zero at shorter lags compared to the actual data.
- We know an AR($1$) model with $\phi = 0.9704$ generates stationary time series.
- Maybe the actual UR isn't stationary?
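One way to probe that question (a hedged sketch, not part of the original slides) is a unit-root test such as the augmented Dickey-Fuller test; failing to reject the unit-root null would be consistent with non-stationarity.
# Augmented Dickey-Fuller test on the UR (null hypothesis: a unit root)
from statsmodels.tsa.stattools import adfuller
stat, pvalue, *rest = adfuller(data.UR.dropna())
print(f'ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}')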
# Add simulation to DataFrame
data['URsim'] = URsim
# Variables to plot and ylabel
plotme = [['UR','Actual UR'],['URsim','Simulated UR']]
# Plot variables
fig, axs = plt.subplots(len(plotme),figsize=(6.5,3))
for idx, ax in enumerate(axs.flat):
    y = data[plotme[idx][0]]; ylabel = plotme[idx][1]
    ax.plot(y); ax.set_ylabel(ylabel)
    ax.yaxis.set_major_formatter('{x:.0f}%')
    ax.grid(); ax.autoscale(tight=True); ax.label_outer()
    print(f'{ylabel}: \
E(y) = {y.mean():.2f}, Std(y) = {y.std():.2f}, \
AC(y) = {y.corr(y.shift(1)):.2f}')
Actual UR: E(y) = 5.67, Std(y) = 1.71, AC(y) = 0.97
Simulated UR: E(y) = 5.21, Std(y) = 1.34, AC(y) = 0.95
Does it fit?
- The actual UR is very smooth, but the simulated AR(1) series is not.
- The simulated UR often falls below $2\%$, whereas the actual UR does not.
- The actual UR meets or exceeds $10\%$ a few times, whereas the simulated UR does not.
- The actual UR rises quickly and falls slowly, whereas the simulated UR seems to rise and fall at roughly the same rates.
- This all suggests that the AR(1) model is unable to reproduce some important features of the actual UR.
Partial ACF
- Most of the slides in this chapter are about the AR($1$) model.
- Suppose the true DGP is an AR model, but we do not know the order.
- When the DGP is an MA model, we can use the ACF to determine the order, since it drops to zero after a certain lag.
- But for an AR model, the autocorrelation at higher lags is influenced by the intermediate observations between $t$ and $t-k$, so even for an AR($1$) model the ACF tapers off slowly.
- The Partial ACF (PACF) instead helps identify the order of an AR model.
- The PACF at lag $k$ is the correlation between $y_t$ and $y_{t-k}$ that is not explained by the observations in between, $\{y_{t-k+1}, \ldots, y_{t-1}\}$
- The PACF at lag $k$ is denoted as $\phi_{kk}$, and plotting $\phi_{kk}$ against lag $k$ forms the PACF plot.
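Equivalently, $\phi_{kk}$ is the coefficient on $y_{t-k}$ in a regression of $y_t$ on its first $k$ lags. A minimal sketch of that idea (the helper pacf_ols below is illustrative, not a statsmodels function; it assumes numpy is imported as np):
# Illustrative helper: PACF at lag k via OLS of y_t on y_{t-1},...,y_{t-k}
def pacf_ols(y, k):
    y = np.asarray(y, dtype=float) - np.mean(y)
    # Columns are the lagged regressors y_{t-1}, ..., y_{t-k}
    X = np.column_stack([y[k-j:len(y)-j] for j in range(1, k+1)])
    beta, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    return beta[-1]  # the coefficient on y_{t-k} is phi_kk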
AR($2$) Model
Let's simulate the following AR($2$) model
$$ y_t = (1-\phi_1-\phi_2)\mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \sigma \varepsilon_t $$
where $\mu = 2$, $\phi_1 = 0.8$, $\phi_2 = 0.15$, and $\sigma = 0.5$
We know the DGP and the true order of the AR model. Does the PACF suggest the same order?
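As a side check (not in the original slides), these parameters imply a stationary AR(2): the roots of the lag polynomial $1 - \phi_1 z - \phi_2 z^2$ must lie outside the unit circle.
# Roots of 1 - phi_1*z - phi_2*z^2 (np.roots expects highest-degree coefficients first)
print(np.abs(np.roots([-0.15, -0.8, 1])))  # both moduli exceed 1 -> stationary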
# Assign parameters
T = 501; mu = 2; phi_1 = 0.8; phi_2 = 0.15; sigma = 0.5;
# Draw random numbers
rng = np.random.default_rng(seed=0)
eps = rng.standard_normal(T)
# Simulate time series
y = mu*np.ones([T,1])
for t in range(2,T):
    y[t,0] = (1-phi_1-phi_2)*mu + phi_1*y[t-1,0] + phi_2*y[t-2,0] + sigma*eps[t]
# Convert to DataFrame and plot
df = pd.DataFrame(y)
fig, ax = plt.subplots(figsize=(6.5,2.5))
ax.plot(df); ax.set_title('AR(2) Simulated Data')
ax.grid(); ax.autoscale(tight=True)
# Partial auto-correlation function
from statsmodels.graphics.tsaplots import plot_pacf as pacf
# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(df,ax)
ax.grid(); ax.autoscale(tight=True);
- The PACF is significantly different from zero for $k \in \{1,2\}$, which indicates the DGP is probably an AR($2$).
- So the PACF is good at determining the order, so long as the AR coefficients are large enough.
Modeling UR as AR($2$)
- Is a higher order AR model a good fit for the U.S. unemployment rate?
- Let's first plot the PACF of the UR to see if we can determine an order.
# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(data.UR,ax);
ax.grid(); ax.autoscale(tight=True)
- This suggests that an AR(1) might be the best model.
- What if we ignore the PACF and estimate an AR(2) anyway?
# Define model
mod = ARIMA(data.UR,order=(2,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)
Estimation Results
| Dep. Variable: | VALUE | No. Observations: | 925 |
|---|---|---|---|
| Model: | ARIMA(2, 0, 0) | Log Likelihood | -495.314 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 5.5611 | 0.740 | 7.510 | 0.000 | 4.110 | 7.012 |
| ar.L1 | 1.0208 | 0.009 | 109.007 | 0.000 | 1.002 | 1.039 |
| ar.L2 | -0.0517 | 0.010 | -5.057 | 0.000 | -0.072 | -0.032 |
| sigma2 | 0.1703 | 0.001 | 143.540 | 0.000 | 0.168 | 0.173 |

| Heteroskedasticity (H): | 6.37 | Skew: | 16.65 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 423.48 |
- Both AR coefficients are significantly different from $0$
- But the AR(2) is still a linear model with symmetric, normal innovations, so the residuals are still skewed and fat-tailed
- The PACF did not help us determine that a second lag might improve the fit
UR First Difference
# Month-over-month change (first difference)
dUR = data.UR.diff(1); dUR = dUR.dropna()
# Plot
fig, ax = plt.subplots(figsize=(6.5,1.5));
ax.plot(dUR); ax.set_title('Change in U.S. Unemployment Rate (month to month)');
ax.yaxis.set_major_formatter('{x:,.0f}pp'); ax.grid(); ax.autoscale(tight=True)
Is it stationary?
- Yes, mean looks constant around $0$.
- Yes, volatility looks constant, except for COVID.
- Maybe not: it is hard to tell from the plot whether the autocovariance is constant.
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
acf(dUR,ax)
ax.grid(); ax.autoscale(tight=True);
- The ACF indicates that the series is probably stationary.
- What does the PACF say?
# Plot Partial Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(dUR,ax);
ax.grid(); ax.autoscale(tight=True)
- The PACF shows that the partial autocorrelations are not significantly different from zero at any lag.
- What is the best model to estimate? Let's try MA($1$) and AR($1$) and compare.
Estimation Results
# Define models
mod0 = ARIMA(dUR,order=(0,0,1))
mod1 = ARIMA(dUR,order=(1,0,0))
# Estimate both models
res0 = mod0.fit(); summary0 = res0.summary()
res1 = mod1.fit(); summary1 = res1.summary()
# Print summary tables (shown for the MA(1); repeat with summary1 for the AR(1))
tab0 = summary0.tables[0].as_html()
tab1 = summary0.tables[1].as_html()
tab2 = summary0.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)
| Dep. Variable: | VALUE | No. Observations: | 924 |
|---|---|---|---|
| Model: | ARIMA(0, 0, 1) | Log Likelihood | -500.699 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0007 | 0.025 | 0.027 | 0.979 | -0.047 | 0.049 |
| ma.L1 | 0.0398 | 0.008 | 4.893 | 0.000 | 0.024 | 0.056 |
| sigma2 | 0.1731 | 0.001 | 185.366 | 0.000 | 0.171 | 0.175 |

| Heteroskedasticity (H): | 6.54 | Skew: | 16.40 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 419.10 |
- The MA coefficient estimate is statistically significantly different from $0$.
- However, the residuals are still probably heteroskedastic and not normally distributed.
- Thus, a linear/normal model is probably not the best choice.
| Dep. Variable: | VALUE | No. Observations: | 924 |
|---|---|---|---|
| Model: | ARIMA(1, 0, 0) | Log Likelihood | -500.757 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.0007 | 0.025 | 0.026 | 0.979 | -0.048 | 0.049 |
| ar.L1 | 0.0363 | 0.008 | 4.344 | 0.000 | 0.020 | 0.053 |
| sigma2 | 0.1731 | 0.001 | 185.413 | 0.000 | 0.171 | 0.175 |

| Heteroskedasticity (H): | 6.55 | Skew: | 16.42 |
|---|---|---|---|
| Prob(H) (two-sided): | 0.00 | Kurtosis: | 419.51 |
- The AR coefficient is small/weak, but the estimate is statistically significantly different from $0$.
- Like the MA($1$) model, the residuals are still probably heteroskedastic and not normally distributed.
- This again confirms that a linear/normal model is probably not the best choice.
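To make the comparison concrete (a minimal sketch, assuming res0 and res1 from the estimation code above), one common approach is to compare information criteria, where lower is better:
# Compare the MA(1) and AR(1) fits with information criteria
print(f'MA(1): AIC = {res0.aic:.1f}, BIC = {res0.bic:.1f}')
print(f'AR(1): AIC = {res1.aic:.1f}, BIC = {res1.bic:.1f}')
Given the nearly identical log likelihoods above ($-500.699$ vs. $-500.757$) and the same number of parameters, the two fits are close to indistinguishable by these criteria.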