VAR¶

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616
Slides

Introduction¶

  • A $VAR(p)$ is a system of $n$ linear equations that can be compactly expressed with vector/matrix notation

    \begin{gather*} \mathbf{y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}_1 \mathbf{y}_{t-1} + \boldsymbol{\Phi}_2 \mathbf{y}_{t-2} + \cdots + \boldsymbol{\Phi}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t \end{gather*}

    where bold indicates a vector/matrix, e.g., $\mathbf{y}$ has size $n\times 1$ and $\boldsymbol{\Phi}$ are $n\times n$ square matrices

  • Now we assume $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ and $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}$, which has size $n\times n$

  • $\boldsymbol{\Omega}$ is the covariance matrix and must be symmetric and positive definite (analogous to a scalar variance being positive)

  • A $VAR$ is a system of linear equations where each variable is regressed on $p$ of its own lags as well as $p$ lags of each other variable
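The system above can be simulated directly. Below is a minimal sketch of a bivariate VAR($1$), where the values of $\boldsymbol{\mu}$, $\boldsymbol{\Phi}_1$, and $\boldsymbol{\Omega}$ are illustrative assumptions, not estimates:

```python
import numpy as np

# Minimal sketch: simulate a bivariate VAR(1) (all parameter values are
# illustrative assumptions, not estimates)
rng = np.random.default_rng(0)
mu = np.array([0.5, 0.2])                      # n x 1 intercept
Phi = np.array([[0.5, 0.1],
                [0.2, 0.6]])                   # n x n coefficient matrix
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])                 # symmetric positive definite

T = 500
y = np.zeros((T, 2))
eps = rng.multivariate_normal(np.zeros(2), Omega, size=T)
for t in range(1, T):
    y[t] = mu + Phi @ y[t - 1] + eps[t]

print(y.mean(axis=0))   # sample means near (I - Phi)^{-1} mu
```

Drawing the innovations from a multivariate normal with covariance $\boldsymbol{\Omega}$ allows the shocks to the two equations to be contemporaneously correlated.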

Growth Model Example¶

  • Consider the following (nonlinear) growth model

    \begin{gather*} y_t = z_{t-1} k_{t-1}^\alpha \tag{Prod.} \\ k_t = (1-\delta)k_{t-1} + i_{t-1} \tag{Capital} \\ \log z_t = (1-\rho)\log(\bar{z}) + \rho \log z_{t-1} + \varepsilon_t \tag{Tech.} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \tag{Shock} \end{gather*}

  • Capital accumulation and technology are (linear) first-order difference equations, and technology is specifically an AR($1$)

  • Investment could be exogenous, or endogenous to output, e.g., $i_t = sy_t$ where $s$ is the savings rate.

  • The production function is nonlinear, but we can linearize it with natural log, e.g., $\log y_t = \log z_{t-1} + \alpha \log k_{t-1}$
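To see the model in action, here is a short simulation of the nonlinear system with the endogenous investment rule $i_t = s y_t$; all parameter values are illustrative assumptions:

```python
import numpy as np

# Simulate the nonlinear growth model with endogenous investment i_t = s*y_t.
# All parameter values below are illustrative assumptions.
rng = np.random.default_rng(1)
alpha, delta, rho, s = 0.33, 0.10, 0.90, 0.20
zbar, sigma = 1.0, 0.01

T = 200
y, i = np.zeros(T), np.zeros(T)
k, z = np.ones(T), np.ones(T)
for t in range(1, T):
    z[t] = np.exp((1 - rho) * np.log(zbar) + rho * np.log(z[t - 1])
                  + sigma * rng.standard_normal())          # Tech.
    y[t] = z[t - 1] * k[t - 1] ** alpha                     # Prod.
    k[t] = (1 - delta) * k[t - 1] + i[t - 1]                # Capital
    i[t] = s * y[t]                                         # Savings rule

print(k[-1])  # capital settles near its steady state (s*zbar/delta)**(1/(1-alpha))
```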

Natural log and percent change¶

  • It would be nice if the units of all variables were easily interpretable, e.g., percent changes
  • Suppose each variable is covariance stationary and has a constant long-run mean (a.k.a. steady state or long-run equilibrium). For any variable in the model, $x_t$, define its mean as $\bar{x}$.
  • Note that $\log(x_t/\bar{x}) \approx x_t/\bar{x} - 1 \equiv \hat{x}_t$ (for more detail, see natural logs and percent changes)
  • This approximation works well so long as percent changes are relatively small, e.g., 5% or less; observing 5% GDP growth would already be far in the right tail of the distribution of growth rates for most countries.
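A quick numerical check of the approximation $\log(x_t/\bar{x}) \approx x_t/\bar{x} - 1$ at a few deviation sizes:

```python
import numpy as np

# Check log(x/xbar) vs. the percent-change approximation x/xbar - 1
xbar = 100.0
for x in [101.0, 105.0, 120.0]:
    exact = np.log(x / xbar)
    approx = x / xbar - 1
    print(f"approx: {approx:.4f}  log: {exact:.4f}  gap: {abs(approx - exact):.4f}")
```

The gap is negligible at a 1% deviation, small at 5%, and noticeably larger at 20%.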

Linearization¶

  • Recall that natural log converts exponents to multiplication and multiplication to addition (awesome!)

  • Consider log output, $\log y_t = \log z_{t-1} + \alpha \log k_{t-1}$.

  • Suppose that each variable has a long-run mean/steady state, then $\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}$.

  • Question: How do we express those variables as percent change?

  • Answer: Subtract the steady state equation from the dynamic equation (note that $\log x_t - \log \bar{x} = \log(x_t/\bar{x})$)

    \begin{gather*} \log y_t = \log z_{t-1} + \alpha \log k_{t-1} \\ - (\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}) \\ \rightarrow \log (y_t/\bar{y}) = \log (z_{t-1}/\bar{z}) + \alpha \log (k_{t-1}/\bar{k}) \\ \rightarrow \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \end{gather*}

  • Now $k_t = (1-\delta)k_{t-1} + i_{t-1}$ is already linear, but the variables are not expressed as percent changes.

  • Subtract the steady-state equation from the dynamic equation

    \begin{gather*} k_t = (1-\delta)k_{t-1} + i_{t-1} \\ - (\bar{k} = (1-\delta)\bar{k} + \bar{i}) \\ \rightarrow (k_t-\bar{k}) = (1-\delta)(k_{t-1}-\bar{k}) + (i_{t-1}-\bar{i}) \end{gather*}

  • Multiply each term by one in the form of $\bar{x}/\bar{x}$ using the appropriate variable for each term.

    \begin{gather*} \frac{\bar{k}}{\bar{k}}(k_t-\bar{k}) = (1-\delta)\frac{\bar{k}}{\bar{k}}(k_{t-1}-\bar{k}) + \frac{\bar{i}}{\bar{i}}(i_{t-1}-\bar{i}) \\ \rightarrow \bar{k}\hat{k}_t = (1-\delta)\bar{k}\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1} \\ \rightarrow \hat{k}_t = (1-\delta)\hat{k}_{t-1} + (\bar{i}/\bar{k})\hat{i}_{t-1} \end{gather*}

  • After dividing through by $\bar{k}$, the equation is linear and all variables are expressed as percent changes.

Linear Growth Model¶

  • The linear growth model with endogenous investment

    \begin{gather*} \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \\ \hat{k}_t = (1-\delta)\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1}/\bar{k} \\ \hat z_t = \rho \hat z_{t-1} + \varepsilon_t \\ \hat i_t = \hat{y}_{t-1} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \end{gather*}

  • This is a linear system of 4 dynamic equations and 4 variables (i.e., unknowns), $\{\hat{y}_t,\hat{k}_t,\hat z_t,\hat i_t \}$ that we can map to a VAR($1$), $\mathbf{x}_t = \boldsymbol{\mu} + \boldsymbol{\Phi} \mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_t$

    \begin{gather*} \begin{bmatrix} \hat{y}_t \\ \hat{k}_t \\ \hat{z}_t \\ \hat{i}_t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & \alpha & 1 & 0 \\ 0 & (1-\delta) & 0 & \bar{i}/\bar{k} \\ 0 & 0 & \rho & 0 \\ 1 & 0 & 0 & 0 \\ \end{bmatrix} \begin{bmatrix} \hat{y}_{t-1} \\ \hat{k}_{t-1} \\ \hat{z}_{t-1} \\ \hat{i}_{t-1} \end{bmatrix}+ \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}

    where $\varepsilon \sim i.i.d.~N(0,\sigma^2)$
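As a check on the mapping above, we can build $\boldsymbol{\Phi}$ and verify that the VAR($1$) is stable, i.e., all eigenvalues of $\boldsymbol{\Phi}$ lie inside the unit circle. Parameter values are illustrative; the steady-state capital equation $\bar{k} = (1-\delta)\bar{k} + \bar{i}$ implies $\bar{i}/\bar{k} = \delta$.

```python
import numpy as np

# Build Phi for the linear growth model VAR(1) (illustrative parameter values).
# The steady-state capital equation kbar = (1-delta)*kbar + ibar gives
# ibar/kbar = delta.
alpha, delta, rho = 0.33, 0.10, 0.90
i_k = delta                                   # ibar/kbar

Phi = np.array([
    [0.0, alpha,     1.0, 0.0],               # y-hat equation
    [0.0, 1 - delta, 0.0, i_k],               # k-hat equation
    [0.0, 0.0,       rho, 0.0],               # z-hat equation
    [1.0, 0.0,       0.0, 0.0],               # i-hat equation
])

# The VAR(1) is stable if all eigenvalues of Phi lie inside the unit circle
eigs = np.linalg.eigvals(Phi)
print(np.max(np.abs(eigs)))
```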

Covariance Matrix¶

  • In a VAR($p$), assume $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ and $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}$

  • In the simple linear growth model, there is only one innovation/shock.

    \begin{gather*} \boldsymbol{\varepsilon}_t \equiv \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}

    which clearly satisfies $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ since each element is either zero or has $E(\varepsilon_t) = 0$

  • As for the covariance matrix, $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = $

    \begin{gather*} E\left( \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & \varepsilon_t & 0 \\ \end{bmatrix}\right) = E\left(\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \varepsilon_t^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix}\right) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix} \equiv \boldsymbol{\Omega} \end{gather*}

Likelihood Function¶

  • Since a VAR($p$) is a linear system of equations where the innovations are i.i.d. normally distributed, the likelihood function is a multivariate normal density function. Maximizing the likelihood is the same as maximizing the log-likelihood,

    \begin{gather*} \mathcal{L}(\boldsymbol{\Theta}) = -\left(\dfrac{Tn}{2}\right) \log (2\pi) + \left(\dfrac{T}{2}\right) \log \left| \boldsymbol{\Omega}^{-1} \right| \\ - \left(\dfrac{1}{2}\right) \sum_{t=1}^T \boldsymbol{\varepsilon}_t' \boldsymbol{\Omega}^{-1} \boldsymbol{\varepsilon}_t, \end{gather*}

    where $\boldsymbol{\varepsilon}_t = \mathbf{y}_t - \boldsymbol{\mu} - \boldsymbol{\Phi}_1 \mathbf{y}_{t-1} - \cdots - \boldsymbol{\Phi}_p \mathbf{y}_{t-p}$ is the vector of residuals.

  • It turns out that the ordinary least squares (OLS) estimates are the maximum likelihood estimates (MLE) of the parameters in a VAR($p$).

  • You can also use the estimated residuals, $\hat{\boldsymbol{\varepsilon}}_t$, from OLS to get the MLE of the covariance matrix, $\hat{\boldsymbol{\Omega}} = \frac{1}{T}\sum_{t=1}^T \hat{\boldsymbol{\varepsilon}}_t \hat{\boldsymbol{\varepsilon}}_t'$.

  • Information criteria are used to select the optimal number of lags, e.g., the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
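Here is a sketch of evaluating the log-likelihood above, given a $T\times n$ matrix of residuals and a candidate $\boldsymbol{\Omega}$; the simulated inputs are illustrative:

```python
import numpy as np

# Sketch: evaluate the Gaussian log-likelihood for a T x n matrix of
# residuals eps and a candidate covariance matrix Omega (inputs are illustrative)
def var_loglik(eps, Omega):
    T, n = eps.shape
    Omega_inv = np.linalg.inv(Omega)
    quad = np.einsum('ti,ij,tj->', eps, Omega_inv, eps)  # sum_t eps_t' Omega^{-1} eps_t
    return (-0.5 * T * n * np.log(2 * np.pi)
            + 0.5 * T * np.log(np.linalg.det(Omega_inv))
            - 0.5 * quad)

rng = np.random.default_rng(2)
Omega_true = np.array([[1.0, 0.3],
                       [0.3, 0.5]])
eps = rng.multivariate_normal(np.zeros(2), Omega_true, size=300)
print(var_loglik(eps, Omega_true))
```

As a sanity check, the likelihood evaluated at a badly scaled covariance matrix (e.g., $10\boldsymbol{\Omega}$) is lower than at the covariance that generated the residuals.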

Structural VAR¶

  • A structural VAR($p$) is given by

    \begin{gather*} \mathbf{A}_0 \mathbf{y}_t = \mathbf{a}_0 + \mathbf{A}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{A}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t, \end{gather*} where $\mathbf{A}_0$ allows for contemporaneous relationships between variables, e.g., if there is an aggregate demand shock, then both GDP growth and the inflation rate might increase simultaneously.

  • Let's assume $\boldsymbol{\varepsilon}_t\sim N(0,\mathbf{I}_n)$, i.e., the multivariate standard normal distribution

  • There are $n$ variables, each coefficient matrix $\mathbf{A}_j$ is an $n \times n$ square matrix, and $\mathbf{a}_0$, $\mathbf{y}_{t-j}$, and $\boldsymbol{\varepsilon}_t$ are $n \times 1$ column vectors.

Reduced-form VAR¶

  • Pre-multiplying by the inverse of $\mathbf{A}_0$ yields a reduced-form $VAR$ model

    \begin{gather*} \mathbf{y}_t = \mathbf{b}_0 + \mathbf{B}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{B}_p \mathbf{y}_{t-p} + \boldsymbol{\upsilon}_t, \end{gather*}

    where $\mathbf{b}_0 = \mathbf{A}_0^{-1}\mathbf{a}_0$ is an $n\times 1$ column vector and $\mathbf{B}_j = \mathbf{A}_0^{-1}\mathbf{A}_j$ are $n\times n$ matrices

  • Note: $\mathbf{A}_0^{-1} \mathbf{A}_0 \mathbf{y}_t = \mathbf{I}_n \mathbf{y}_t = \mathbf{y}_t$, so just $\mathbf{y}_t$ remains on the left-hand side

  • $\boldsymbol{\upsilon}_t = \mathbf{A}_0^{-1}\boldsymbol{\varepsilon}_t$ is an $n\times 1$ vector of shocks that has a multivariate normal distribution with zero mean and variance-covariance matrix $\boldsymbol{\Omega} = \mathbf{A}_0^{-1}(\mathbf{A}_0^{-1})'$

  • Since $\mathbf{A}_0^{-1}$ mixes the structural shocks, the reduced-form shocks in $\boldsymbol{\upsilon}_t$ are generally correlated across the $n$ equations.

Estimation¶

  • Estimate a reduced-form VAR($p$)'s parameters with the ordinary least squares (OLS) estimator.
  • However, there might be contemporaneous relationships between variables, i.e., shocks/residuals to different variables can be correlated, so some restriction on the covariance matrix is necessary for identification of a structural VAR($p$).
  • Given the OLS estimate of the covariance matrix, $\hat{\boldsymbol{\Omega}}$, "structural" shocks are often identified recursively, e.g., with a Cholesky decomposition, $\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'$, which ensures that $\hat{\mathbf{A}}_0^{-1}$ is a lower-triangular matrix.
  • The implication is that the ordering of the variables matters in a structural VAR, so researchers must justify the ordering as part of their identification strategy.
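A small numerical illustration of why ordering matters: both matrices below factor the same covariance matrix, but each imposes a different zero restriction on the contemporaneous responses. The covariance values are made up.

```python
import numpy as np

# Two factorizations of the same (made-up) covariance matrix: each satisfies
# M @ M.T == Omega, but each imposes a different zero restriction on impact
Omega = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])                      # permutation swapping the variables

M1 = np.linalg.cholesky(Omega)                  # ordering (1, 2): M1 is lower triangular
M2 = P @ np.linalg.cholesky(P @ Omega @ P) @ P  # ordering (2, 1), relabeled back

print(M1)  # variable 1 does not respond to shock 2 on impact
print(M2)  # variable 2 does not respond to shock 1 on impact
```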

OLS Estimates¶

  • A reduced-form VAR($p$) can be written compactly as

    \begin{gather*} \mathbf{Y}_{n\times(T-p)} = \mathbf{B}_{n\times(1+np)} \mathbf{X}_{(1+np)\times(T-p)} + \mathbf{U}_{n\times(T-p)} \end{gather*}

    which is a multivariate linear regression where everything in bold is a matrix defined as

    \begin{gather*} \mathbf{Y}_{T-j} = [\mathbf{y}_{p+1-j},\ldots,\mathbf{y}_{T-j}] \\ \mathbf{B} =[\mathbf{b}_0,\mathbf{B}_1,\ldots,\mathbf{B}_p] \\ \mathbf{X} = [\mathbf{1},\mathbf{Y}_{T-1}',\ldots,\mathbf{Y}_{T-p}']' \\ \mathbf{U} = [\boldsymbol{\upsilon}_{p+1},\ldots,\boldsymbol{\upsilon}_T] \end{gather*}

  • Parameter estimates are OLS, $\hat{\mathbf{B}} = \mathbf{Y} \mathbf{X}' \left( \mathbf{X}\mathbf{X}'\right)^{-1}$.

  • Residual estimates are $\hat{\mathbf{U}} = \mathbf{Y} - \hat{\mathbf{B}} \mathbf{X}$ and the covariance matrix is $\hat{\boldsymbol{\Omega}} = \hat{\mathbf{U}} \hat{\mathbf{U}}'/(T-p)$

  • With $\hat{\boldsymbol{\Omega}}$ in hand, the structural shocks are identified by a Cholesky decomposition, $\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'$
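The compact OLS formulas can be implemented in a few lines. The sketch below simulates a bivariate VAR($1$) with assumed true parameters, then recovers them with $\hat{\mathbf{B}} = \mathbf{Y}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}$:

```python
import numpy as np

# Estimate a reduced-form VAR(1) by OLS on simulated data.
# True parameter values are illustrative assumptions.
rng = np.random.default_rng(3)
n, p, T = 2, 1, 2000
b0_true = np.array([0.5, 0.2])
B1_true = np.array([[0.5, 0.1],
                    [0.2, 0.6]])

y = np.zeros((T, n))
for t in range(1, T):
    y[t] = b0_true + B1_true @ y[t - 1] + 0.1 * rng.standard_normal(n)

Y = y[p:].T                                   # n x (T-p)
X = np.vstack([np.ones(T - p), y[:-p].T])     # (1+n*p) x (T-p)

B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)      # OLS: B = Y X'(X X')^{-1}
U_hat = Y - B_hat @ X                         # residuals
Omega_hat = U_hat @ U_hat.T / (T - p)         # residual covariance matrix
A0_inv_hat = np.linalg.cholesky(Omega_hat)    # lower-triangular Cholesky factor

print(B_hat)  # first column near b0_true, remaining block near B1_true
```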

Recursive Identification¶

  • Suppose the data are ordered as 1. Output ($y_t$), 2. Inflation ($\pi_t$), and 3. Interest Rate ($r_t$)

    \begin{gather*} \small \begin{bmatrix} y_t \\ \pi_t \\ r_t \end{bmatrix} = \begin{bmatrix} b_{0,1} \\ b_{0,2} \\ b_{0,3} \end{bmatrix} + \begin{bmatrix} b_{1,1} & b_{1,4} & b_{1,7} \\ b_{1,2} & b_{1,5} & b_{1,8} \\ b_{1,3} & b_{1,6} & b_{1,9} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ \pi_{t-1} \\ r_{t-1} \end{bmatrix} + \begin{bmatrix} a_{0,1} & 0 & 0 \\ a_{0,2} & a_{0,4} & 0 \\ a_{0,3} & a_{0,5} & a_{0,6} \end{bmatrix} \begin{bmatrix} \varepsilon_{y,t} \\ \varepsilon_{\pi,t} \\ \varepsilon_{r,t} \end{bmatrix} \end{gather*}

  • Recursive identification, e.g., using a Cholesky decomposition, imposes that

    • Output does not respond immediately to other variables' shocks
    • Inflation may respond immediately to output shocks, but does not respond to interest rate shocks
    • Interest Rate may respond immediately to both output and inflation shocks
  • This ordering reflects the following beliefs or theory:

    • Firms and consumers don’t instantly change output/expenditure $\rightarrow$ output responds slowly.
    • Prices are sticky $\rightarrow$ inflation adjusts with a lag.
    • Central banks move fast $\rightarrow$ interest rates adjust quickly to new information.

Cholesky Decomposition Example¶

  • Here's an example of a positive definite matrix

    $$ \boldsymbol{\Omega} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix} $$

  • Our goal is to find a lower triangular matrix, $\mathbf{L}$, such that $\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'$. The Cholesky factor is

    $$ \mathbf{L} = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix} $$

In [5]:
import numpy as np

# Define the symmetric, positive definite matrix
Omega = np.array([
    [4, 12, -16],
    [12, 37, -43],
    [-16, -43, 98]
])
print(Omega)
[[  4  12 -16]
 [ 12  37 -43]
 [-16 -43  98]]
In [13]:
# Compute the Cholesky factor (lower triangular matrix L)
L = np.linalg.cholesky(Omega)

# Print the result
print('L ='); print(L)
print("L L' ="); print(L@L.T)
print('Omega ='); print(Omega)
L =
[[ 2.  0.  0.]
 [ 6.  1.  0.]
 [-8.  5.  3.]]
L L' =
[[  4.  12. -16.]
 [ 12.  37. -43.]
 [-16. -43.  98.]]
Omega =
[[  4  12 -16]
 [ 12  37 -43]
 [-16 -43  98]]
  • Verifying $\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'$

    $$ \mathbf{L} \mathbf{L}' = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix}\begin{bmatrix} 2 & 6 & -8 \\ 0 & 1 & 5 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix} = \boldsymbol{\Omega} $$