VAR Model#
by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616
Introduction#
A \(VAR(p)\) is a system of \(n\) linear equations that can be compactly expressed with vector/matrix notation
\[\begin{gather*} \mathbf{y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}_1 \mathbf{y}_{t-1} + \boldsymbol{\Phi}_2 \mathbf{y}_{t-2} + \cdots + \boldsymbol{\Phi}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t \end{gather*}\]where bold indicates a vector/matrix, e.g., \(\mathbf{y}_t\) has size \(n\times 1\) and each \(\boldsymbol{\Phi}_j\) is an \(n\times n\) square matrix
Now we assume \(E(\boldsymbol{\varepsilon}_t) = \mathbf{0}\) and \(E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}\), which has size \(n\times n\)
\(\boldsymbol{\Omega}\) is the covariance matrix and must be a symmetric positive definite matrix (the matrix analogue of a scalar variance being positive)
A \(VAR\) is a system of linear equations where each variable is regressed on \(p\) of its own lags as well as \(p\) lags of each other variable
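For concreteness, with \(n=2\) and \(p=1\) the system written out equation by equation is
\[\begin{gather*} y_{1,t} = \mu_1 + \phi_{11} y_{1,t-1} + \phi_{12} y_{2,t-1} + \varepsilon_{1,t} \\ y_{2,t} = \mu_2 + \phi_{21} y_{1,t-1} + \phi_{22} y_{2,t-1} + \varepsilon_{2,t} \end{gather*}\]so each variable depends on one lag of itself and one lag of the other variable.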
Growth Model Example#
Consider the following (nonlinear) growth model
\[\begin{gather*} y_t = z_{t-1} k_{t-1}^\alpha \tag{Prod.} \\ k_t = (1-\delta)k_{t-1} + i_{t-1} \tag{Capital} \\ \log z_t = (1-\rho)\log(\bar{z}) + \rho \log z_{t-1} + \varepsilon_t \tag{Tech.} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \tag{Shock} \end{gather*}\]Capital accumulation and technology are (linear) first-order difference equations, and technology is specifically an AR(\(1\))
Investment could be exogenous, or endogenous to output, e.g., \(i_t = sy_t\) where \(s\) is the savings rate.
The production function is nonlinear, but we can linearize it by taking the natural log, e.g., \(\log y_t = \log z_{t-1} + \alpha \log k_{t-1}\)
Natural log and percent change#
It would be nice if the units of all variables were easily interpretable, e.g., in percent changes
Suppose each variable is covariance stationary and has a constant long-run mean (a.k.a. steady state or long-run equilibrium). For any variable in the model, \(x_t\), define its mean as \(\bar{x}\).
Note that \(\log(x_t/\bar{x}) \approx x_t/\bar{x} - 1 \equiv \hat{x}_t\) (for more detail, see natural logs and percent changes)
This approximation works so long as percent changes are relatively small, e.g., 5% or less; for most countries, even a 5% GDP growth rate would be far in the right tail of the distribution of growth rates.
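A quick numerical check of this approximation (a minimal NumPy sketch; the deviations chosen are illustrative):

import numpy as np
# Ratios x_t/x_bar corresponding to 1%, 5%, and 25% deviations from the mean
ratios = np.array([1.01, 1.05, 1.25])
print('log(x/xbar):', np.log(ratios))  # exact log deviation
print('x/xbar - 1: ', ratios - 1)      # percent-change approximation

At a 1% deviation the two agree to about four decimal places; at 25% the gap is roughly 0.027, which is why the approximation is reserved for small deviations.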
Linearization#
Recall that natural log converts exponents to multiplication and multiplication to addition (awesome!)
Consider log output, \(\log y_t = \log z_{t-1} + \alpha \log k_{t-1}\).
Suppose that each variable has a long-run mean/steady state, then \(\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}\).
Question: How do we express those variables as percent change?
Answer: Subtract the steady state equation from the dynamic equation (note that \(\log x_t - \log \bar{x} = \log(x_t/\bar{x})\))
\[\begin{gather*} \log y_t = \log z_{t-1} + \alpha \log k_{t-1} \\ - (\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}) \\ \rightarrow \log (y_t/\bar{y}) = \log (z_{t-1}/\bar{z}) + \alpha \log (k_{t-1}/\bar{k}) \\ \rightarrow \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \end{gather*}\]
Now \(k_t = (1-\delta)k_{t-1} + i_{t-1}\) is already linear, but the variables are not expressed as percent changes.
Subtract the steady-state equation from the dynamic equation
\[\begin{gather*} k_t = (1-\delta)k_{t-1} + i_{t-1} \\ - (\bar{k} = (1-\delta)\bar{k} + \bar{i}) \\ \rightarrow (k_t-\bar{k}) = (1-\delta)(k_{t-1}-\bar{k}) + (i_{t-1}-\bar{i}) \end{gather*}\]Multiply each term by one in the form of \(\bar{x}/\bar{x}\) using the appropriate variable for each term.
\[\begin{gather*} \frac{\bar{k}}{\bar{k}}(k_t-\bar{k}) = (1-\delta)\frac{\bar{k}}{\bar{k}}(k_{t-1}-\bar{k}) + \frac{\bar{i}}{\bar{i}}(i_{t-1}-\bar{i}) \\ \rightarrow \bar{k}\hat{k}_t = (1-\delta)\bar{k}\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1} \end{gather*}\]Now it’s linear and all variables are expressed as percent changes.
Linear Growth Model#
The linear growth model with endogenous investment
\[\begin{gather*} \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \\ \hat{k}_t = (1-\delta)\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1}/\bar{k} \\ \hat z_t = \rho \hat z_{t-1} + \varepsilon_t \\ \hat i_t = \hat{y}_{t-1} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \end{gather*}\]This is a linear system of 4 dynamic equations and 4 variables (i.e., unknowns), \(\{\hat{y}_t,\hat{k}_t,\hat z_t,\hat i_t \}\) that we can map to a VAR(\(1\)), \(\mathbf{x}_t = \boldsymbol{\mu} + \boldsymbol{\Phi} \mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_t\)
\[\begin{gather*} \begin{bmatrix} \hat{y}_t \\ \hat{k}_t \\ \hat{z}_t \\ \hat{i}_t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & \alpha & 1 & 0 \\ 0 & (1-\delta) & 0 & \bar{i}/\bar{k} \\ 0 & 0 & \rho & 0 \\ 1 & 0 & 0 & 0 \\ \end{bmatrix} \begin{bmatrix} \hat{y}_{t-1} \\ \hat{k}_{t-1} \\ \hat{z}_{t-1} \\ \hat{i}_{t-1} \end{bmatrix}+ \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}\]where \(\varepsilon \sim i.i.d.~N(0,\sigma^2)\)
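A minimal NumPy sketch simulating this VAR(\(1\)); the parameter values are illustrative, not estimates (note that \(\bar{i}/\bar{k} = \delta\) follows from the steady state of the capital equation):

import numpy as np
rng = np.random.default_rng(0)
alpha, delta, rho, sigma = 0.33, 0.10, 0.90, 0.01  # illustrative parameters
Phi = np.array([[0, alpha,     1,   0    ],
                [0, 1 - delta, 0,   delta],  # i_bar/k_bar = delta in steady state
                [0, 0,         rho, 0    ],
                [1, 0,         0,   0    ]])
T = 200
x = np.zeros((4, T))  # rows: y_hat, k_hat, z_hat, i_hat
for t in range(1, T):
    eps = np.array([0, 0, rng.normal(0, sigma), 0])  # only technology is shocked
    x[:, t] = Phi @ x[:, t - 1] + eps
print(x[:, -3:].round(4))  # last few periods of the simulated percent deviations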
Covariance Matrix#
In a VAR(\(p\)), assume \(E(\boldsymbol{\varepsilon}_t) = \mathbf{0}\) and \(E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}\)
In the simple linear growth model, there is only one innovation/shock.
\[\begin{gather*} \boldsymbol{\varepsilon}_t \equiv \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}\]which clearly satisfies \(E(\boldsymbol{\varepsilon}_t) = \mathbf{0}\), since each element is either identically zero or satisfies \(E(\varepsilon_t) = 0\)
As for the covariance matrix \(E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t')=\)
\[\begin{gather*} E\left( \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & \varepsilon_t & 0 \\ \end{bmatrix}\right) = E\left(\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \varepsilon_t^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix}\right) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix} \equiv \boldsymbol{\Omega} \end{gather*}\]
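A quick Monte Carlo check of this covariance matrix (a sketch assuming \(\sigma = 0.01\) for illustration):

import numpy as np
rng = np.random.default_rng(0)
sigma, N = 0.01, 100_000
eps = np.zeros((4, N))               # draws of the 4x1 innovation vector
eps[2, :] = rng.normal(0, sigma, N)  # only the third element is shocked
print(eps @ eps.T / N)               # approximates Omega = diag(0, 0, sigma^2, 0)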
Likelihood Function#
Since a VAR(\(p\)) is a linear system of equations where the innovations are i.i.d. normally distributed, the likelihood function is a multivariate normal density function. Maximizing the likelihood is the same as maximizing the log-likelihood,
\[\begin{gather*} \mathcal{L}(\boldsymbol{\Theta}) = -\left(\dfrac{Tn}{2}\right) \log (2\pi) + \left(\dfrac{T}{2}\right) \log \left| \boldsymbol{\Omega}^{-1} \right| \\ - \left(\dfrac{1}{2}\right) \sum_{t=1}^T \boldsymbol{\varepsilon}_t' \boldsymbol{\Omega}^{-1} \boldsymbol{\varepsilon}_t, \end{gather*}\]where \(\boldsymbol{\varepsilon}_t = \mathbf{y}_t - \boldsymbol{\mu} - \boldsymbol{\Phi}_1 \mathbf{y}_{t-1} - \cdots - \boldsymbol{\Phi}_p \mathbf{y}_{t-p}\) is the innovation implied by the parameters. It turns out that the ordinary least squares (OLS) estimates are the maximum likelihood estimates (MLE) of the parameters in a VAR(\(p\)).
You can also use the estimated residuals, \(\hat{\boldsymbol{\varepsilon}}_t\), from OLS to get the MLE of the covariance matrix, \(\hat{\boldsymbol{\Omega}} = \frac{1}{T}\sum_{t=1}^T \hat{\boldsymbol{\varepsilon}}_t \hat{\boldsymbol{\varepsilon}}_t'\).
Information criteria are used to select the optimal number of lags, e.g., the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
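As a sketch of lag selection in practice, statsmodels reports AIC/BIC across candidate lag lengths; the white-noise `data` below is a placeholder for a \(T\times n\) array of stationary series.

import numpy as np
from statsmodels.tsa.api import VAR
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))            # placeholder for T x n stationary data
model = VAR(data)
print(model.select_order(maxlags=8).summary())  # AIC, BIC, etc. for each lag length
results = model.fit(maxlags=8, ic='bic')        # estimate with the BIC-optimal lag
print('chosen p =', results.k_ar)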
Structural VAR#
A structural VAR(\(p\)) is given by
\[\begin{gather*} \mathbf{A}_0 \mathbf{y}_t = \mathbf{a}_0 + \mathbf{A}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{A}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t, \end{gather*}\]where \(\mathbf{A}_0\) allows for contemporaneous relationships between variables, e.g., if there is an aggregate demand shock, then both GDP growth and the inflation rate might increase simultaneously.
Let’s assume \(\boldsymbol{\varepsilon}_t\sim i.i.d.~N(\mathbf{0},\mathbf{I}_n)\), i.e., the multivariate standard normal distribution
There are \(n\) variables; each coefficient matrix \(\mathbf{A}_j\) is an \(n \times n\) square matrix, while \(\mathbf{a}_0\), \(\mathbf{y}_{t-j}\), and \(\boldsymbol{\varepsilon}_t\) are \(n \times 1\) column vectors.
Reduced-form VAR#
Pre-multiplying by the inverse of \(\mathbf{A}_0\) yields a reduced-form \(VAR\) model
\[\begin{gather*} \mathbf{y}_t = \mathbf{b}_0 + \mathbf{B}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{B}_p \mathbf{y}_{t-p} + \boldsymbol{\upsilon}_t, \end{gather*}\]where \(\mathbf{b}_0 = \mathbf{A}_0^{-1}\mathbf{a}_0\) is an \(n\times 1\) column vector and \(\mathbf{B}_j = \mathbf{A}_0^{-1}\mathbf{A}_j\) are \(n\times n\) matrices
Note: \(\mathbf{A}_0^{-1} \mathbf{A}_0 \mathbf{y}_t = \mathbf{I}_n \mathbf{y}_t = \mathbf{y}_t\), so just \(\mathbf{y}_t\) remains on the left-hand side
\(\boldsymbol{\upsilon}_t = \mathbf{A}_0^{-1}\boldsymbol{\varepsilon}_t\) is an \(n\times 1\) vector of shocks that has a multivariate normal distribution with zero mean and variance-covariance matrix \(\boldsymbol{\Omega} = \mathbf{A}_0^{-1}(\mathbf{A}_0^{-1})'\)
The elements of \(\boldsymbol{\upsilon}_t\) are generally correlated across equations even though the structural shocks are not, so every element of the \(n\times 1\) vector \(\mathbf{y}_t\) is endogenous.
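A quick simulation check that \(\boldsymbol{\upsilon}_t = \mathbf{A}_0^{-1}\boldsymbol{\varepsilon}_t\) has covariance \(\mathbf{A}_0^{-1}(\mathbf{A}_0^{-1})'\) (the lower-triangular matrix here is arbitrary, for illustration only):

import numpy as np
rng = np.random.default_rng(0)
A0inv = np.array([[1.0, 0.0],
                  [0.5, 2.0]])           # illustrative lower-triangular A_0^{-1}
eps = rng.standard_normal((2, 100_000))  # structural shocks with identity covariance
u = A0inv @ eps                          # reduced-form shocks
print(u @ u.T / eps.shape[1])            # sample covariance of upsilon
print(A0inv @ A0inv.T)                   # theoretical Omega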
Estimation#
Estimate a reduced-form VAR(\(p\))’s parameters with the ordinary least squares (OLS) estimator, equation by equation.
However, there might be contemporaneous relationships between variables, i.e., shocks/residuals to different variables can be correlated, so some restriction on the covariance matrix is necessary for identification of a structural VAR(\(p\)).
Given the OLS estimate of the covariance matrix, \(\hat{\boldsymbol{\Omega}}\), “structural” shocks are often identified recursively, e.g., with a Cholesky decomposition, \(\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'\), where \(\hat{\mathbf{A}}_0^{-1}\) is a lower-triangular matrix.
The implication is that the ordering of the variables matters in a structural VAR, so researchers must justify the ordering as part of their identification strategy.
OLS Estimates#
A reduced-form VAR(\(p\)) can be written compactly as
\[\begin{gather*} \mathbf{Y}_{n\times(T-p)} = \mathbf{B}_{n\times(1+np)} \mathbf{X}_{(1+np)\times(T-p)} + \mathbf{U}_{n\times(T-p)} \end{gather*}\]which is a multivariate linear regression where everything in bold is a matrix defined as
\[\begin{gather*} \mathbf{Y}_{T-j} = [\mathbf{y}_{p+1-j},\ldots,\mathbf{y}_{T-j}] \\ \mathbf{B} =[\mathbf{b}_0,\mathbf{B}_1,\ldots,\mathbf{B}_p] \\ \mathbf{X} = [\mathbf{1},\mathbf{Y}_{T-1}',\ldots,\mathbf{Y}_{T-p}']' \\ \mathbf{U} = [\boldsymbol{\upsilon}_{p+1},\ldots,\boldsymbol{\upsilon}_T] \end{gather*}\]Parameter estimates are OLS, \(\hat{\mathbf{B}} = \mathbf{Y} \mathbf{X}' \left( \mathbf{X}\mathbf{X}'\right)^{-1}\).
Residual estimates are \(\hat{\mathbf{U}} = \mathbf{Y} - \hat{\mathbf{B}} \mathbf{X}\) and the covariance matrix estimate is \(\hat{\boldsymbol{\Omega}} = \hat{\mathbf{U}} \hat{\mathbf{U}}'/(T-p)\)
With \(\hat{\boldsymbol{\Omega}}\) in hand, the structural shocks are identified by a Cholesky decomposition, \(\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'\), with \(\hat{\mathbf{A}}_0^{-1}\) lower triangular
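A minimal NumPy sketch of these formulas on simulated data; the bivariate VAR(\(1\)) and its coefficients are illustrative, not from any dataset.

import numpy as np
rng = np.random.default_rng(0)
T, n, p = 500, 2, 1
B1_true = np.array([[0.5, 0.1],
                    [0.0, 0.8]])           # illustrative VAR(1) coefficients
y = np.zeros((n, T))
for t in range(1, T):
    y[:, t] = B1_true @ y[:, t - 1] + 0.1 * rng.standard_normal(n)
Y = y[:, p:]                                # n x (T-p)
X = np.vstack([np.ones(T - p), y[:, :-p]])  # (1+np) x (T-p), ones for the intercept
B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)    # OLS estimates [b0, B1]
U_hat = Y - B_hat @ X                       # residuals
Omega_hat = U_hat @ U_hat.T / (T - p)       # residual covariance
A0inv_hat = np.linalg.cholesky(Omega_hat)   # lower-triangular Cholesky factor
print(B_hat.round(2)); print(A0inv_hat.round(3))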
Recursive Identification#
Suppose the data are ordered as 1. Output (\(y_t\)), 2. Inflation (\(\pi_t\)), and 3. Interest Rate (\(r_t\))
\[\begin{gather*} \small \begin{bmatrix} y_t \\ \pi_t \\ r_t \end{bmatrix} = \begin{bmatrix} b_{0,1} \\ b_{0,2} \\ b_{0,3} \end{bmatrix} + \begin{bmatrix} b_{1,1} & b_{1,4} & b_{1,7} \\ b_{1,2} & b_{1,5} & b_{1,8} \\ b_{1,3} & b_{1,6} & b_{1,9} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ \pi_{t-1} \\ r_{t-1} \end{bmatrix} + \begin{bmatrix} a_{0,1} & 0 & 0 \\ a_{0,2} & a_{0,4} & 0 \\ a_{0,3} & a_{0,5} & a_{0,6} \end{bmatrix} \begin{bmatrix} \varepsilon_{y,t} \\ \varepsilon_{\pi,t} \\ \varepsilon_{r,t} \end{bmatrix} \end{gather*}\]Recursive identification, e.g., using a Cholesky decomposition, imposes that
Output does not respond immediately to other variables’ shocks
Inflation may respond immediately to output shocks, but does not respond to interest rate shocks
Interest Rate may respond immediately to both output and inflation shocks
This ordering reflects beliefs or theory along the following lines:
Firms and consumers don’t instantly change output/expenditure \(\rightarrow\) output responds slowly.
Prices are sticky \(\rightarrow\) inflation adjusts with a lag.
Central banks move fast \(\rightarrow\) interest rates adjust quickly to new information.
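As a sketch, statsmodels computes orthogonalized (Cholesky) impulse responses whose recursive ordering follows the column order of the data; the white-noise series below are placeholders for output, inflation, and the interest rate.

import numpy as np
from statsmodels.tsa.api import VAR
rng = np.random.default_rng(0)
data = rng.standard_normal((300, 3))  # placeholder columns: output, inflation, rate
results = VAR(data).fit(2)            # reduced-form VAR(2) estimated by OLS
irf = results.irf(10)                 # impulse responses for horizons 0-10
print(irf.orth_irfs[0].round(3))      # impact matrix: lower triangular by construction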
Cholesky Decomposition Example#
Here’s an example of a positive definite matrix
\[\begin{split} \boldsymbol{\Omega} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix} \end{split}\]Our goal is to find a lower triangular matrix, \(\mathbf{L}\), such that \(\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'\). The Cholesky factor is
\[\begin{split} \mathbf{L} = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix} \end{split}\]
import numpy as np
# Define the symmetric, positive definite matrix
Omega = np.array([
[4, 12, -16],
[12, 37, -43],
[-16, -43, 98]
])
print(Omega)
[[ 4 12 -16]
[ 12 37 -43]
[-16 -43 98]]
# Compute the Cholesky factor (lower triangular matrix L)
L = np.linalg.cholesky(Omega)
# Print the result
print('L ='); print(L)
print("L L' ="); print(L@L.T)
print('Omega ='); print(Omega)
L =
[[ 2. 0. 0.]
[ 6. 1. 0.]
[-8. 5. 3.]]
L L' =
[[ 4. 12. -16.]
[ 12. 37. -43.]
[-16. -43. 98.]]
Omega =
[[ 4 12 -16]
[ 12 37 -43]
[-16 -43 98]]
Verifying \(\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'\)
\[\begin{split} \mathbf{L} \mathbf{L}' = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix}\begin{bmatrix} 2 & 6 & -8 \\ 0 & 1 & 5 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix} = \boldsymbol{\Omega} \end{split}\]