VAR Model#

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616
Slides

Introduction#

  • A \(VAR(p)\) is a system of \(n\) linear equations that can be compactly expressed with vector/matrix notation

    \[\begin{gather*} \mathbf{y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}_1 \mathbf{y}_{t-1} + \boldsymbol{\Phi}_2 \mathbf{y}_{t-2} + \cdots + \boldsymbol{\Phi}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t \end{gather*}\]

    where bold indicates a vector/matrix, e.g., \(\mathbf{y}_t\) has size \(n\times 1\) and each \(\boldsymbol{\Phi}_j\) is an \(n\times n\) square matrix

  • Now we assume \(E(\boldsymbol{\varepsilon}_t) = \mathbf{0}\) and \(E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}\), which has size \(n\times n\)

  • \(\boldsymbol{\Omega}\) is the covariance matrix and must be a symmetric positive definite matrix (i.e., this is analogous to a scalar SD/variance being positive)

  • A \(VAR\) is a system of linear equations where each variable is regressed on \(p\) of its own lags as well as \(p\) lags of each other variable (see the simulation sketch below)
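
  • As a concrete illustration, here is a minimal simulation sketch of a bivariate VAR(\(1\)) with numpy. The coefficients, covariance matrix, and seed are made up for illustration; the eigenvalue check confirms the chosen \(\boldsymbol{\Phi}_1\) implies a stationary system.

import numpy as np

# Made-up coefficients for a bivariate VAR(1): y_t = mu + Phi @ y_{t-1} + eps_t
rng = np.random.default_rng(0)
n, T = 2, 500
mu = np.array([0.5, 0.2])
Phi = np.array([[0.5, 0.1],
                [0.2, 0.3]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])   # symmetric positive definite covariance matrix

# Stationarity check: all eigenvalues of Phi should lie inside the unit circle
print('eigenvalues of Phi:', np.linalg.eigvals(Phi))

# Simulate the system
eps = rng.multivariate_normal(np.zeros(n), Omega, size=T)
y = np.zeros((T, n))
for t in range(1, T):
    y[t] = mu + Phi @ y[t - 1] + eps[t]
print('sample means:', y.mean(axis=0))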

Growth Model Example#

  • Consider the following (nonlinear) growth model

    \[\begin{gather*} y_t = z_{t-1} k_{t-1}^\alpha \tag{Prod.} \\ k_t = (1-\delta)k_{t-1} + i_{t-1} \tag{Capital} \\ \log z_t = (1-\rho)\log(\bar{z}) + \rho \log z_{t-1} + \varepsilon_t \tag{Tech.} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \tag{Shock} \end{gather*}\]
  • Capital accumulation and technology are (linear) first-order difference equations, and technology is specifically an AR(\(1\))

  • Investment could be exogenous, or endogenous to output, e.g., \(i_t = sy_t\) where \(s\) is the savings rate.

  • The production function is nonlinear, but we can linearize it with natural log, e.g., \(\log y_t = \log z_{t-1} + \alpha \log k_{t-1}\)

Natural log and percent change#

  • It would be nice if the units of all the variables were easily interpretable, e.g., expressed as percent changes

  • Suppose each variable is covariance stationary and has a constant long-run mean (a.k.a. steady state or long-run equilibrium). For any variable in the model, \(x_t\), define its mean as \(\bar{x}\).

  • Note that \(\log(x_t/\bar{x}) \approx x_t/\bar{x} - 1 \equiv \hat{x}_t\) (for more detail, see natural logs and percent changes)

  • This approximation works so long as the percent changes are relatively small, e.g., 5% or less; for perspective, observing a 5% GDP growth rate would be far in the right tail of the distribution of growth rates for most countries (see the numerical check below).
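
  • The quick numerical check below (a sketch with arbitrary deviations from an arbitrary mean of 100) shows the approximation is very accurate for small percent changes and degrades as they grow.

import numpy as np

# Compare the log deviation log(x/xbar) with the percent deviation x/xbar - 1
xbar = 100.0
for pct in [0.01, 0.05, 0.10, 0.25]:
    x = xbar * (1 + pct)
    exact = np.log(x / xbar)    # log deviation
    approx = x / xbar - 1       # percent deviation
    print(f'{pct:>4.0%}:  log(x/xbar) = {exact:.4f},  x/xbar - 1 = {approx:.4f}')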

Linearization#

  • Recall that natural log converts exponents to multiplication and multiplication to addition (awesome!)

  • Consider log output, \(\log y_t = \log z_{t-1} + \alpha \log k_{t-1}\).

  • Suppose that each variable has a long-run mean/steady state, then \(\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}\).

  • Question: How do we express those variables as percent change?

  • Answer: Subtract the steady state equation from the dynamic equation (note that \(\log x_t - \log \bar{x} = \log(x_t/\bar{x})\))

    \[\begin{gather*} \log y_t = \log z_{t-1} + \alpha \log k_{t-1} \\ - (\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}) \\ \rightarrow \log (y_t/\bar{y}) = \log (z_{t-1}/\bar{z}) + \alpha \log (k_{t-1}/\bar{k}) \\ \rightarrow \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \end{gather*}\]
  • Now \(k_t = (1-\delta)k_{t-1} + i_{t-1}\) is already linear, but the variables are not expressed as percent changes.

  • Subtract the steady-state equation from the dynamic equation

    \[\begin{gather*} k_t = (1-\delta)k_{t-1} + i_{t-1} \\ - (\bar{k} = (1-\delta)\bar{k} + \bar{i}) \\ \rightarrow (k_t-\bar{k}) = (1-\delta)(k_{t-1}-\bar{k}) + (i_{t-1}-\bar{i}) \end{gather*}\]
  • Multiply each term by one in the form of \(\bar{x}/\bar{x}\) using the appropriate variable for each term.

    \[\begin{gather*} \frac{\bar{k}}{\bar{k}}(k_t-\bar{k}) = (1-\delta)\frac{\bar{k}}{\bar{k}}(k_{t-1}-\bar{k}) + \frac{\bar{i}}{\bar{i}}(i_{t-1}-\bar{i}) \\ \rightarrow \bar{k}\hat{k}_t = (1-\delta)\bar{k}\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1} \end{gather*}\]
  • Now it’s linear and all variables are expressed as percent changes (a numerical check of the log-linear output equation follows below).
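
  • As a quick sanity check, the sketch below compares the exact percent deviation of output implied by the production function with the log-linear approximation \(\hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1}\). The value \(\alpha = 0.33\) and the deviations are assumptions chosen purely for illustration.

import numpy as np

# Production function: y_t = z_{t-1} * k_{t-1}^alpha, with an assumed alpha
alpha = 0.33
for z_hat, k_hat in [(0.01, 0.02), (0.05, -0.03), (0.10, 0.10)]:
    exact = (1 + z_hat) * (1 + k_hat)**alpha - 1   # exact percent deviation of output
    approx = z_hat + alpha * k_hat                 # log-linear approximation
    print(f'z_hat={z_hat:+.2f}, k_hat={k_hat:+.2f}:  exact={exact:.4f}, approx={approx:.4f}')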

Linear Growth Model#

  • The linear growth model with endogenous investment

    \[\begin{gather*} \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \\ \hat{k}_t = (1-\delta)\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1}/\bar{k} \\ \hat z_t = \rho \hat z_{t-1} + \varepsilon_t \\ \hat i_t = \hat{y}_{t-1} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \end{gather*}\]
  • This is a linear system of 4 dynamic equations and 4 variables (i.e., unknowns), \(\{\hat{y}_t,\hat{k}_t,\hat z_t,\hat i_t \}\) that we can map to a VAR(\(1\)), \(\mathbf{x}_t = \boldsymbol{\mu} + \boldsymbol{\Phi} \mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_t\)

    \[\begin{gather*} \begin{bmatrix} \hat{y}_t \\ \hat{k}_t \\ \hat{z}_t \\ \hat{i}_t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & \alpha & 1 & 0 \\ 0 & (1-\delta) & 0 & \bar{i}/\bar{k} \\ 0 & 0 & \rho & 0 \\ 1 & 0 & 0 & 0 \\ \end{bmatrix} \begin{bmatrix} \hat{y}_{t-1} \\ \hat{k}_{t-1} \\ \hat{z}_{t-1} \\ \hat{i}_{t-1} \end{bmatrix}+ \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}\]

    where \(\varepsilon_t \sim i.i.d.~N(0,\sigma^2)\); a simulation sketch follows below
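
  • Here is a minimal simulation sketch of this VAR(\(1\)). The parameter values (\(\alpha = 0.33\), \(\delta = 0.10\), \(\rho = 0.90\), \(\sigma = 0.01\)) are assumptions for illustration, and the steady-state capital equation \(\bar{k} = (1-\delta)\bar{k} + \bar{i}\) implies \(\bar{i}/\bar{k} = \delta\).

import numpy as np

# Illustrative parameter values (assumptions, not estimates)
alpha, delta, rho, sigma = 0.33, 0.10, 0.90, 0.01
i_k = delta   # steady state implies ibar/kbar = delta

# VAR(1) transition matrix for x_t = [y_hat, k_hat, z_hat, i_hat]'
Phi = np.array([[0.0, alpha,     1.0, 0.0],
                [0.0, 1 - delta, 0.0, i_k],
                [0.0, 0.0,       rho, 0.0],
                [1.0, 0.0,       0.0, 0.0]])

rng = np.random.default_rng(1)
T = 200
x = np.zeros((T, 4))
for t in range(1, T):
    eps = np.array([0.0, 0.0, rng.normal(0.0, sigma), 0.0])  # only technology is shocked
    x[t] = Phi @ x[t - 1] + eps

print('eigenvalues of Phi:', np.round(np.linalg.eigvals(Phi), 3))
print('std. dev. of [y_hat, k_hat, z_hat, i_hat]:', np.round(x.std(axis=0), 4))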

Covariance Matrix#

  • In a VAR(\(p\)), assume \(E(\boldsymbol{\varepsilon}_t) = \mathbf{0}\) and \(E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}\)

  • In the simple linear growth model, there is only one innovation/shock.

    \[\begin{gather*} \boldsymbol{\varepsilon}_t \equiv \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}\]

    which clearly satisfies \(E(\boldsymbol{\varepsilon}_t) = \mathbf{0}\), since each element is either identically zero or has mean zero because \(E(\varepsilon_t) = 0\) (a quick Monte Carlo check of the mean and covariance follows this list)

  • As for the covariance matrix \(E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t')=\)

    \[\begin{gather*} E\left( \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & \varepsilon_t & 0 \\ \end{bmatrix}\right) = E\left(\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \varepsilon_t^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix}\right) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix} \equiv \boldsymbol{\Omega} \end{gather*}\]
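
  • The Monte Carlo sketch below (assuming \(\sigma = 0.01\) for illustration) draws many realizations of the shock vector and averages their outer products; the result reproduces \(\boldsymbol{\Omega}\), with \(\sigma^2\) in the \((3,3)\) position and zeros elsewhere.

import numpy as np

# Monte Carlo check: only the (3,3) element of Omega should be (approximately) sigma^2
rng = np.random.default_rng(2)
sigma, N = 0.01, 100_000
eps_vec = np.zeros((N, 4))
eps_vec[:, 2] = rng.normal(0.0, sigma, size=N)   # third element is the technology shock

print('mean of shock vector:', np.round(eps_vec.mean(axis=0), 6))
Omega_hat = eps_vec.T @ eps_vec / N              # sample average of outer products
print('Omega_hat ='); print(np.round(Omega_hat, 8))
print('sigma^2 =', sigma**2)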

Likelihood Function#

  • Since a VAR(\(p\)) is a linear system of equations where the innovations are i.i.d. normally distributed, the likelihood function is a multivariate normal density function (a numpy sketch follows this list). Maximizing the likelihood is the same as maximizing the log-likelihood,

    \[\begin{gather*} \mathcal{L}(\boldsymbol{\Theta}) = -\left(\dfrac{Tn}{2}\right) \log (2\pi) + \left(\dfrac{T}{2}\right) \log \left| \boldsymbol{\Omega}^{-1} \right| \\ - \left(\dfrac{1}{2}\right) \sum_{t=1}^T (\mathbf{y}_t - \boldsymbol{\mu})' \boldsymbol{\Omega}^{-1} (\mathbf{y}_t - \boldsymbol{\mu}). \end{gather*}\]
  • It turns out that the ordinary least squares (OLS) estimates are the maximum likelihood estimates (MLE) of the parameters in a VAR(\(p\)).

  • You can also use the estimated residuals, \(\hat{\boldsymbol{\varepsilon}}_t\), from OLS to get the MLE of the covariance matrix, \(\hat{\boldsymbol{\Omega}} = \frac{1}{T}\sum_{t=1}^{T}\hat{\boldsymbol{\varepsilon}}_t \hat{\boldsymbol{\varepsilon}}_t'\), the sample analogue of \(E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t')\).

  • Information criteria are used to select the optimal number of lags, e.g., the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
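
  • Below is a minimal numpy sketch of the log-likelihood formula above. It is an illustration under assumptions: the deviations \(\mathbf{y}_t - \boldsymbol{\mu}\) are treated as the VAR’s residuals (one-step-ahead forecast errors), and the residuals themselves are made up. Information criteria can then be built from this value plus a penalty on the parameter count.

import numpy as np

def var_loglike(resid, Omega):
    """Gaussian log-likelihood of a VAR given residuals (T x n) and covariance Omega (n x n)."""
    T, n = resid.shape
    Omega_inv = np.linalg.inv(Omega)
    _, logdet = np.linalg.slogdet(Omega)                     # log|Omega| (note log|Omega^{-1}| = -log|Omega|)
    quad = np.einsum('ti,ij,tj->', resid, Omega_inv, resid)  # sum_t e_t' Omega^{-1} e_t
    return -0.5 * T * n * np.log(2 * np.pi) - 0.5 * T * logdet - 0.5 * quad

# Example with made-up residuals and the MLE covariance estimate
rng = np.random.default_rng(3)
Omega_true = np.array([[1.0, 0.3],
                       [0.3, 0.5]])
resid = rng.multivariate_normal(np.zeros(2), Omega_true, size=250)
Omega_hat = resid.T @ resid / len(resid)
print('log-likelihood:', var_loglike(resid, Omega_hat))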

Structural VAR#

  • A structural VAR(\(p\)) is given by

    \[\begin{gather*} \mathbf{A}_0 \mathbf{y}_t = \mathbf{a}_0 + \mathbf{A}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{A}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t, \end{gather*}\]

    where \(\mathbf{A}_0\) allows for contemporaneous relationships between variables, e.g., if there is an aggregate demand shock, then both GDP growth and the inflation rate might increase simultaneously.

  • Let’s assume \(\boldsymbol{\varepsilon}_t\sim N(0,\mathbf{I}_n)\), i.e., the multivariate standard normal distribution

  • There are \(n\) variables; each coefficient matrix \(\mathbf{A}_j\) is an \(n \times n\) square matrix, while \(\mathbf{a}_0\), \(\mathbf{y}_{t-j}\), and \(\boldsymbol{\varepsilon}_t\) are \(n \times 1\) column vectors.

Reduced-form VAR#

  • Pre-multiplying by the inverse of \(\mathbf{A}_0\) yields a reduced-form \(VAR\) model

    \[\begin{gather*} \mathbf{y}_t = \mathbf{b}_0 + \mathbf{B}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{B}_p \mathbf{y}_{t-p} + \boldsymbol{\upsilon}_t, \end{gather*}\]

    where \(\mathbf{b}_0 = \mathbf{A}_0^{-1}\mathbf{a}_0\) is an \(n\times 1\) column vector and the \(\mathbf{B}_j = \mathbf{A}_0^{-1}\mathbf{A}_j\) are \(n\times n\) matrices

  • Note: \(\mathbf{A}_0^{-1} \mathbf{A}_0 \mathbf{y}_t = \mathbf{I}_n \mathbf{y}_t = \mathbf{y}_t\), so just \(\mathbf{y}_t\) remains on the left-hand side

  • \(\boldsymbol{\upsilon}_t = \mathbf{A}_0^{-1}\boldsymbol{\varepsilon}_t\) is an \(n\times 1\) vector of shocks that has a multivariate normal distribution with zero mean and variance-covariance matrix \(\boldsymbol{\Omega} = \mathbf{A}_0^{-1}(\mathbf{A}_0^{-1})'\) (see the sketch after this list)

  • Since the shocks can be correlated across equations, every variable in the \(n\times 1\) vector \(\mathbf{y}_t\) is endogenous.
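
  • The sketch below uses a made-up lower-triangular \(\mathbf{A}_0^{-1}\) to illustrate the relationship: with \(\boldsymbol{\varepsilon}_t \sim N(\mathbf{0},\mathbf{I}_n)\), the reduced-form shocks \(\boldsymbol{\upsilon}_t = \mathbf{A}_0^{-1}\boldsymbol{\varepsilon}_t\) have covariance \(\boldsymbol{\Omega} = \mathbf{A}_0^{-1}(\mathbf{A}_0^{-1})'\).

import numpy as np

# Made-up lower-triangular impact matrix A0_inv (i.e., A_0^{-1})
A0_inv = np.array([[0.8, 0.0, 0.0],
                   [0.3, 0.6, 0.0],
                   [0.1, 0.2, 0.5]])

# Theoretical covariance of the reduced-form shocks
Omega = A0_inv @ A0_inv.T
print('Omega ='); print(Omega)

# Monte Carlo confirmation: upsilon_t = A0_inv @ eps_t with eps_t ~ N(0, I)
rng = np.random.default_rng(4)
eps = rng.standard_normal((100_000, 3))
ups = eps @ A0_inv.T
print('sample covariance ='); print(np.round(np.cov(ups, rowvar=False), 3))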

Estimation#

  • Estimate a reduced-form VAR(\(p\))’s parameters with the ordinary least squares (OLS) estimator.

  • However, there might be contemporaneous relationships between variables, i.e., shocks/residuals to different variables can be correlated, so some restriction on the covariance matrix is necessary for identification of a structural VAR(\(p\)).

  • Given the OLS estimate of the covariance matrix, \(\hat{\boldsymbol{\Omega}}\), “structural” shocks are often identified recursively, e.g., with a Cholesky decomposition, \(\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'\), which ensures that \(\hat{\mathbf{A}}_0^{-1}\) is a lower-triangular matrix.

  • The implication is that the ordering of the variables matters in a structural VAR, so researchers must justify the ordering as part of their identification strategy.

OLS Estimates#

  • A reduced-form VAR(\(p\)) can be written compactly as

    \[\begin{gather*} \mathbf{Y}_{n\times(T-p)} = \mathbf{B}_{n\times(1+np)} \mathbf{X}_{(1+np)\times(T-p)} + \mathbf{U}_{n\times(T-p)} \end{gather*}\]

    which is a multivariate linear regression where everything in bold is a matrix defined as

    \[\begin{gather*} \mathbf{Y}_{T-j} = [\mathbf{y}_{p+1-j},\ldots,\mathbf{y}_{T-j}] \\ \mathbf{B} =[\mathbf{b}_0,\mathbf{B}_1,\ldots,\mathbf{B}_p] \\ \mathbf{X} = [\mathbf{1},\mathbf{Y}_{T-1}',\ldots,\mathbf{Y}_{T-p}']' \\ \mathbf{U} = [\boldsymbol{\upsilon}_{p+1},\ldots,\boldsymbol{\upsilon}_T] \end{gather*}\]
  • Parameter estimates are OLS, \(\hat{\mathbf{B}} = \mathbf{Y} \mathbf{X}' \left( \mathbf{X}\mathbf{X}'\right)^{-1}\).

  • Residual estimates are \(\hat{\mathbf{U}} = \mathbf{Y} - \hat{\mathbf{B}} \mathbf{X}\) and the covariance matrix estimate is \(\hat{\boldsymbol{\Omega}} = \hat{\mathbf{U}} \hat{\mathbf{U}}'/(T-p)\)

  • With \(\hat{\boldsymbol{\Omega}}\) in hand, the structural shocks are identified by a Cholesky decomposition, \(\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'\) (a worked numpy sketch follows this list)
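
  • The sketch below applies these formulas to data simulated from a bivariate VAR(\(1\)) with made-up coefficients, so the estimates can be compared to known true values. The parameter values and variable names are illustrative assumptions.

import numpy as np

# Simulate a bivariate VAR(1) with made-up coefficients
rng = np.random.default_rng(5)
n, p, T = 2, 1, 2000
b0 = np.array([0.5, 0.2])
B1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
L_true = np.array([[1.0, 0.0],
                   [0.4, 0.8]])   # true A_0^{-1}, lower triangular
y = np.zeros((T, n))
for t in range(1, T):
    y[t] = b0 + B1 @ y[t - 1] + L_true @ rng.standard_normal(n)

# Build Y (n x (T-p)) and X ((1+np) x (T-p)) as on this slide (one lag plus a constant)
Y = y[p:].T
X = np.vstack([np.ones(T - p), y[:-p].T])

# OLS/MLE coefficient estimates, residuals, covariance matrix, and Cholesky factor
B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)
U_hat = Y - B_hat @ X
Omega_hat = U_hat @ U_hat.T / (T - p)

print('B_hat ='); print(np.round(B_hat, 2))
print('Omega_hat ='); print(np.round(Omega_hat, 2))
print('Cholesky factor of Omega_hat ='); print(np.round(np.linalg.cholesky(Omega_hat), 2))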

Recursive Identification#

  • Suppose the data is 1. Output (\(y_t\)), 2. Inflation (\(\pi_t\)), and 3. Interest Rate (\(r_t\))

    \[\begin{gather*} \small \begin{bmatrix} y_t \\ \pi_t \\ r_t \end{bmatrix} = \begin{bmatrix} b_{0,1} \\ b_{0,2} \\ b_{0,3} \end{bmatrix} + \begin{bmatrix} b_{1,1} & b_{1,4} & b_{1,7} \\ b_{1,2} & b_{1,5} & b_{1,8} \\ b_{1,3} & b_{1,6} & b_{1,9} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ \pi_{t-1} \\ r_{t-1} \end{bmatrix} + \begin{bmatrix} a_{0,1} & 0 & 0 \\ a_{0,2} & a_{0,4} & 0 \\ a_{0,3} & a_{0,5} & a_{0,6} \end{bmatrix} \begin{bmatrix} \varepsilon_{y,t} \\ \varepsilon_{\pi,t} \\ \varepsilon_{r,t} \end{bmatrix} \end{gather*}\]
  • Recursive identification, e.g., using a Cholesky decomposition, imposes that

    • Output does not respond immediately to other variables’ shocks

    • Inflation may respond immediately to output shocks, but does not respond to interest rate shocks

    • Interest Rate may respond immediately to both output and inflation shocks

  • This ordering reflects the following beliefs or theory (a numerical sketch follows this list)

    • Firms and consumers don’t instantly change output/expenditure \(\rightarrow\) output responds slowly.

    • Prices are sticky \(\rightarrow\) inflation adjusts with a lag.

    • Central banks move fast \(\rightarrow\) interest rates adjust quickly to new information.
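
  • The short sketch below uses a made-up lower-triangular \(\hat{\mathbf{A}}_0^{-1}\) for \((y_t, \pi_t, r_t)\) to show how the zero restrictions deliver these timing assumptions: each column of \(\hat{\mathbf{A}}_0^{-1}\) is the impact response of the three variables to one structural shock.

import numpy as np

# Made-up lower-triangular impact matrix for (output, inflation, interest rate)
A0_inv = np.array([[0.9, 0.0, 0.0],    # output reacts only to its own shock on impact
                   [0.2, 0.5, 0.0],    # inflation reacts to output and its own shock
                   [0.3, 0.4, 0.6]])   # the interest rate reacts to all three shocks

shocks = ['output shock', 'inflation shock', 'interest rate shock']
for j, name in enumerate(shocks):
    e = np.zeros(3)
    e[j] = 1.0                                # one-standard-deviation structural shock
    print(f'impact of {name}:', A0_inv @ e)   # the j-th column of A0_inv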

Cholesky Decomposition Example#

  • Here’s an example of a symmetric positive definite matrix

    \[\begin{split} \boldsymbol{\Omega} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix} \end{split}\]
  • Our goal is to find a lower triangular matrix, \(\mathbf{L}\), such that \(\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'\). The Cholesky factor is

    \[\begin{split} L = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix} \end{split}\]
import numpy as np

# Define the symmetric, positive definite matrix
Omega = np.array([
    [4, 12, -16],
    [12, 37, -43],
    [-16, -43, 98]
])
print(Omega)
[[  4  12 -16]
 [ 12  37 -43]
 [-16 -43  98]]
# Compute the Cholesky factor (lower triangular matrix L)
L = np.linalg.cholesky(Omega)

# Print the result
print('L ='); print(L)
print("L L' ="); print(L@L.T)
print('Omega ='); print(Omega)
L =
[[ 2.  0.  0.]
 [ 6.  1.  0.]
 [-8.  5.  3.]]
L L' =
[[  4.  12. -16.]
 [ 12.  37. -43.]
 [-16. -43.  98.]]
Omega =
[[  4  12 -16]
 [ 12  37 -43]
 [-16 -43  98]]
  • Verifying \(\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'\) by hand (a follow-up sketch on recovering structural shocks appears below)

    \[\begin{split} \mathbf{L} \mathbf{L}' = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix}\begin{bmatrix} 2 & 6 & -8 \\ 0 & 1 & 5 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix} = \boldsymbol{\Omega} \end{split}\]
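
  • As a follow-up sketch, inverting the Cholesky factor “unmixes” the reduced-form shocks: if \(\boldsymbol{\upsilon}_t\) has covariance \(\boldsymbol{\Omega} = \mathbf{L}\mathbf{L}'\), then \(\mathbf{L}^{-1}\boldsymbol{\upsilon}_t\) has the identity covariance matrix, which is how recursively identified structural shocks are recovered in practice.

import numpy as np

# Omega and its Cholesky factor from the example above
Omega = np.array([[  4.,  12., -16.],
                  [ 12.,  37., -43.],
                  [-16., -43.,  98.]])
L = np.linalg.cholesky(Omega)

# L^{-1} Omega (L^{-1})' = I, so eps_t = L^{-1} ups_t are uncorrelated with unit variance
L_inv = np.linalg.inv(L)
print(np.round(L_inv @ Omega @ L_inv.T, 10))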