
VAR Model

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616
Slides

Introduction

  • A VAR($p$) is a system of $n$ linear equations that can be compactly expressed with vector/matrix notation

    $$\begin{gather*} \mathbf{y}_t = \boldsymbol{\mu} + \boldsymbol{\Phi}_1 \mathbf{y}_{t-1} + \boldsymbol{\Phi}_2 \mathbf{y}_{t-2} + \cdots + \boldsymbol{\Phi}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t \end{gather*}$$

    where bold indicates a vector/matrix, e.g., $\mathbf{y}$ has size $n\times 1$ and each $\boldsymbol{\Phi}_j$ is an $n\times n$ square matrix

  • Now we assume $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ and $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}$, which has size $n\times n$

  • $\boldsymbol{\Omega}$ is the covariance matrix and must be a symmetric positive definite matrix (i.e., this is analogous to a scalar SD/variance being positive)

  • A VAR is a system of linear equations where each variable is regressed on $p$ of its own lags as well as $p$ lags of each other variable
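
As a concrete illustration of the system above, the following sketch simulates a bivariate VAR(1); `mu`, `Phi`, and `Omega` are made-up values, not estimates from any data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical VAR(1) with n = 2 variables (all coefficients illustrative)
mu = np.array([0.5, 0.2])
Phi = np.array([[0.5, 0.1],
                [0.2, 0.4]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])      # symmetric positive definite

T = 200
y = np.zeros((T, 2))
chol = np.linalg.cholesky(Omega)   # used to draw correlated innovations
for t in range(1, T):
    eps = chol @ rng.standard_normal(2)
    y[t] = mu + Phi @ y[t - 1] + eps

print(y[-1])
```

Each equation regresses a variable on one lag of itself and one lag of the other variable, exactly as in the definition above.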

Growth Model Example

  • Consider the following (nonlinear) growth model

    $$\begin{gather*} y_t = z_{t-1} k_{t-1}^\alpha \tag{Prod.} \\ k_t = (1-\delta)k_{t-1} + i_{t-1} \tag{Capital} \\ \log z_t = (1-\rho)\log(\bar{z}) + \rho \log z_{t-1} + \varepsilon_t \tag{Tech.} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \tag{Shock} \end{gather*}$$
  • Capital accumulation and technology are (linear) first-order difference equations, and technology is specifically an AR(1)

  • Investment could be exogenous, or endogenous to output, e.g., $i_t = s y_t$ where $s$ is the savings rate.

  • The production function is nonlinear, but we can linearize it with the natural log, e.g., $\log y_t = \log z_{t-1} + \alpha \log k_{t-1}$
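
The nonlinear model can be iterated directly before any linearization. A minimal simulation sketch, with illustrative parameter values (not from the slides) and endogenous investment $i_t = s y_t$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (not from the slides)
alpha, delta, rho, s = 0.33, 0.025, 0.9, 0.2
zbar, sigma = 1.0, 0.01

T = 300
k = np.ones(T); z = np.ones(T); y = np.ones(T); i = np.ones(T)
for t in range(1, T):
    logz = (1 - rho) * np.log(zbar) + rho * np.log(z[t - 1]) \
           + sigma * rng.standard_normal()
    z[t] = np.exp(logz)                    # technology AR(1) in logs
    y[t] = z[t - 1] * k[t - 1]**alpha      # production
    i[t] = s * y[t]                        # endogenous investment
    k[t] = (1 - delta) * k[t - 1] + i[t - 1]   # capital accumulation
```
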

Natural log and percent change

  • It would be nice if the units of all variables were easily interpretable, e.g., percent changes

  • Suppose each variable is covariance stationary and has a constant long-run mean (a.k.a. steady state or long-run equilibrium). For any variable in the model, $x_t$, define its mean as $\bar{x}$.

  • Note that $\log(x_t/\bar{x}) \approx x_t/\bar{x} - 1 \equiv \hat{x}_t$ (for more detail, see natural logs and percent changes)

  • This approximation works well so long as percent changes are relatively small, e.g., 5% or less; a 5% GDP growth rate would already be far in the right tail of the distribution of growth rates for most countries.
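
A quick numerical check of the approximation and how it degrades as the deviation grows:

```python
import numpy as np

xbar = 100.0
for x in [101.0, 105.0, 120.0]:   # 1%, 5%, and 20% above the mean
    exact = np.log(x / xbar)      # log deviation
    approx = x / xbar - 1         # percent deviation
    # the gap widens as the deviation gets larger
    print(f"approx {approx:.4f} vs exact {exact:.4f}  (gap {abs(approx - exact):.4f})")
```
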

Linearization

  • Recall that natural log converts exponents to multiplication and multiplication to addition (awesome!)

  • Consider log output, $\log y_t = \log z_{t-1} + \alpha \log k_{t-1}$.

  • Suppose that each variable has a long-run mean/steady state; then $\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}$.

  • Question: How do we express those variables as percent change?

  • Answer: Subtract the steady-state equation from the dynamic equation (note that $\log x_t - \log \bar{x} = \log(x_t/\bar{x})$)

    $$\begin{gather*} \log y_t = \log z_{t-1} + \alpha \log k_{t-1} \\ - (\log \bar{y} = \log \bar{z} + \alpha \log \bar{k}) \\ \rightarrow \log (y_t/\bar{y}) = \log (z_{t-1}/\bar{z}) + \alpha \log (k_{t-1}/\bar{k}) \\ \rightarrow \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \end{gather*}$$
  • Now $k_t = (1-\delta)k_{t-1} + i_{t-1}$ is already linear, but the variables are not expressed as percent changes.

  • Subtract the steady-state equation from the dynamic equation

    $$\begin{gather*} k_t = (1-\delta)k_{t-1} + i_{t-1} \\ - (\bar{k} = (1-\delta)\bar{k} + \bar{i}) \\ \rightarrow (k_t-\bar{k}) = (1-\delta)(k_{t-1}-\bar{k}) + (i_{t-1}-\bar{i}) \end{gather*}$$
  • Multiply each term by one in the form of $\bar{x}/\bar{x}$, using the appropriate variable for each term.

    $$\begin{gather*} \frac{\bar{k}}{\bar{k}}(k_t-\bar{k}) = (1-\delta)\frac{\bar{k}}{\bar{k}}(k_{t-1}-\bar{k}) + \frac{\bar{i}}{\bar{i}}(i_{t-1}-\bar{i}) \\ \rightarrow \bar{k}\hat{k}_t = (1-\delta)\bar{k}\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1} \end{gather*}$$
  • Now it’s linear and all variables are expressed as percent changes.
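
Because the capital equation is exactly linear, the percent-deviation form reproduces the original law of motion with no approximation error. A quick check, with illustrative values for $\delta$ and the steady state:

```python
import numpy as np

delta = 0.025
ibar = 1.0
kbar = ibar / delta   # steady state implied by kbar = (1-delta)*kbar + ibar

# Arbitrary starting deviations from steady state
k_prev, i_prev = 1.03 * kbar, 0.95 * ibar

# Exact law of motion
k_exact = (1 - delta) * k_prev + i_prev

# Percent-deviation form: khat_t = (1-delta)*khat_{t-1} + (ibar/kbar)*ihat_{t-1}
khat = (1 - delta) * (k_prev / kbar - 1) + (ibar / kbar) * (i_prev / ibar - 1)
k_from_hat = kbar * (1 + khat)

print(k_exact, k_from_hat)   # identical: the capital equation is exactly linear
```
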

Linear Growth Model

  • The linear growth model with endogenous investment

    $$\begin{gather*} \hat{y}_t = \hat{z}_{t-1} + \alpha \hat{k}_{t-1} \\ \hat{k}_t = (1-\delta)\hat{k}_{t-1} + \bar{i}\hat{i}_{t-1}/\bar{k} \\ \hat z_t = \rho \hat z_{t-1} + \varepsilon_t \\ \hat i_t = \hat{y}_{t-1} \\ \varepsilon \sim i.i.d.~N(0,\sigma^2) \end{gather*}$$
  • This is a linear system of 4 dynamic equations in 4 variables (i.e., unknowns), $\{\hat{y}_t,\hat{k}_t,\hat z_t,\hat i_t \}$, that we can map to a VAR(1), $\mathbf{x}_t = \boldsymbol{\mu} + \boldsymbol{\Phi} \mathbf{x}_{t-1} + \boldsymbol{\varepsilon}_t$

    $$\begin{gather*} \begin{bmatrix} \hat{y}_t \\ \hat{k}_t \\ \hat{z}_t \\ \hat{i}_t \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & \alpha & 1 & 0 \\ 0 & (1-\delta) & 0 & \bar{i}/\bar{k} \\ 0 & 0 & \rho & 0 \\ 1 & 0 & 0 & 0 \\ \end{bmatrix} \begin{bmatrix} \hat{y}_{t-1} \\ \hat{k}_{t-1} \\ \hat{z}_{t-1} \\ \hat{i}_{t-1} \end{bmatrix}+ \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}$$

    where $\varepsilon \sim i.i.d.~N(0,\sigma^2)$
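
The coefficient matrix $\boldsymbol{\Phi}$ above can be built and checked for stationarity (all eigenvalues inside the unit circle) in a few lines. Parameter values are illustrative, and $\bar{i}/\bar{k} = \delta$ follows from the capital steady state $\bar{k} = (1-\delta)\bar{k} + \bar{i}$:

```python
import numpy as np

# Illustrative parameters (not from the slides)
alpha, delta, rho = 0.33, 0.025, 0.9
ibar_over_kbar = delta   # implied by the capital steady state

# Ordering: {yhat, khat, zhat, ihat}
Phi = np.array([
    [0, alpha,     1,   0],
    [0, 1 - delta, 0,   ibar_over_kbar],
    [0, 0,         rho, 0],
    [1, 0,         0,   0],
])

eigs = np.linalg.eigvals(Phi)
print(np.abs(eigs).max())   # < 1 means the VAR(1) is stationary
```
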

Covariance Matrix

  • In a VAR($p$), assume $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ and $E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}$

  • In the simple linear growth model, there is only one innovation/shock.

    $$\begin{gather*} \boldsymbol{\varepsilon}_t \equiv \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \end{gather*}$$

    which clearly satisfies $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ since every element is either zero or has $E(\varepsilon_t) = 0$

  • As for the covariance matrix, $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') =$

    $$\begin{gather*} E\left( \begin{bmatrix} 0 \\ 0 \\ \varepsilon_t \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & \varepsilon_t & 0 \\ \end{bmatrix}\right) = E\left(\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \varepsilon_t^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix}\right) = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix} \equiv \boldsymbol{\Omega} \end{gather*}$$
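
A simulation sketch confirming that averaging the outer products $\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t'$ recovers this $\boldsymbol{\Omega}$; the sample size and $\sigma$ are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.01
T = 100_000

# Stack the shock vectors [0, 0, eps_t, 0]' as columns
eps = np.zeros((4, T))
eps[2] = sigma * rng.standard_normal(T)

# Average the outer products: only the (3,3) entry is (approximately) sigma^2
Omega_hat = eps @ eps.T / T
print(Omega_hat.round(8))
```
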

Likelihood Function

  • Since a VAR($p$) is a linear system of equations where the innovations are i.i.d. normally distributed, the likelihood function is a multivariate normal density function. Maximizing the likelihood is the same as maximizing the log-likelihood,

    $$\begin{gather*} \mathcal{L}(\boldsymbol{\Theta}) = -\left(\dfrac{Tn}{2}\right) \log (2\pi) + \left(\dfrac{T}{2}\right) \log \left| \boldsymbol{\Omega}^{-1} \right| \\ - \left(\dfrac{1}{2}\right) \sum_{t=1}^T (\mathbf{y}_t - \boldsymbol{\mu})' \boldsymbol{\Omega}^{-1} (\mathbf{y}_t - \boldsymbol{\mu}). \end{gather*}$$
  • It turns out that the ordinary least squares (OLS) estimates are the maximum likelihood estimates (MLE) of the parameters in a VAR($p$).

  • You can also use the estimated residuals, $\hat{\boldsymbol{\varepsilon}}_t$, from OLS to get the MLE of the covariance matrix, $\hat{\boldsymbol{\Omega}} = \frac{1}{T}\sum_{t=1}^T \hat{\boldsymbol{\varepsilon}}_t \hat{\boldsymbol{\varepsilon}}_t'$.

  • Information criteria are used to select the optimal number of lags, e.g., the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
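
As a sanity check on the log-likelihood formula, the sketch below evaluates it directly and compares it with SciPy's multivariate normal density (here with a fixed mean $\boldsymbol{\mu}$ for simplicity; all numbers are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n, T = 2, 50
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
mu = np.array([0.1, -0.2])   # fixed mean (illustrative)
y = rng.multivariate_normal(mu, Omega, size=T)

# Log-likelihood from the slides' formula
Oinv = np.linalg.inv(Omega)
resid = y - mu
ll = (-(T * n / 2) * np.log(2 * np.pi)
      + (T / 2) * np.log(np.linalg.det(Oinv))
      - 0.5 * np.einsum('ti,ij,tj->', resid, Oinv, resid))

# Same number from SciPy's multivariate normal density
ll_scipy = multivariate_normal(mu, Omega).logpdf(y).sum()
print(ll, ll_scipy)
```
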

Structural VAR

  • A structural VAR($p$) is given by

    $$\begin{gather*} \mathbf{A}_0 \mathbf{y}_t = \mathbf{a}_0 + \mathbf{A}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{A}_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t, \end{gather*}$$

    where $\mathbf{A}_0$ allows for contemporaneous relationships between variables, e.g., if there is an aggregate demand shock, then both GDP growth and the inflation rate might increase simultaneously.

  • Let’s assume $\boldsymbol{\varepsilon}_t\sim N(\mathbf{0},\mathbf{I}_n)$, i.e., the multivariate standard normal distribution

  • There are $n$ variables; each coefficient matrix $\mathbf{A}_j$ is an $n \times n$ square matrix, while $\mathbf{a}_0$, $\mathbf{y}_{t-j}$, and $\boldsymbol{\varepsilon}_t$ are $n \times 1$ column vectors.

Reduced-form VAR

  • Pre-multiplying by the inverse of $\mathbf{A}_0$ yields a reduced-form VAR model

    $$\begin{gather*} \mathbf{y}_t = \mathbf{b}_0 + \mathbf{B}_1 \mathbf{y}_{t-1} + \cdots + \mathbf{B}_p \mathbf{y}_{t-p} + \boldsymbol{\upsilon}_t, \end{gather*}$$

    where $\mathbf{b}_0 = \mathbf{A}_0^{-1}\mathbf{a}_0$ is an $n\times 1$ column vector and $\mathbf{B}_j = \mathbf{A}_0^{-1}\mathbf{A}_j$ are $n\times n$ matrices

  • Note: $\mathbf{A}_0^{-1} \mathbf{A}_0 \mathbf{y}_t = \mathbf{I}_n \mathbf{y}_t = \mathbf{y}_t$, so just $\mathbf{y}_t$ remains on the left-hand side

  • $\boldsymbol{\upsilon}_t = \mathbf{A}_0^{-1}\boldsymbol{\varepsilon}_t$ is an $n\times 1$ vector of shocks that has a multivariate normal distribution with zero mean and variance-covariance matrix $\boldsymbol{\Omega} = \mathbf{A}_0^{-1}(\mathbf{A}_0^{-1})'$

  • Since the shocks $\boldsymbol{\upsilon}_t$ are possibly correlated across variables, $\mathbf{y}_t$ is an $n\times 1$ vector of jointly endogenous variables.

Estimation

  • Estimate a reduced-form VAR($p$)'s parameters with the ordinary least squares (OLS) estimator.

  • However, there might be contemporaneous relationships between variables, i.e., shocks/residuals to different variables can be correlated, so some restriction on the covariance matrix is necessary for identification of a structural VAR($p$).

  • Given the OLS estimate of the covariance matrix, $\hat{\boldsymbol{\Omega}}$, “structural” shocks are often identified recursively, e.g., with a Cholesky decomposition, $\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'$, which ensures that $\hat{\mathbf{A}}_0^{-1}$ is a lower-triangular matrix.

  • The implication is that the ordering of the variables matters in a structural VAR, so researchers must justify the ordering as part of their identification strategy.

OLS Estimates

  • A reduced-form VAR($p$) can be written compactly as

    $$\begin{gather*} \mathbf{Y}_{n\times(T-p)} = \mathbf{B}_{n\times(1+np)} \mathbf{X}_{(1+np)\times(T-p)} + \mathbf{U}_{n\times(T-p)} \end{gather*}$$

    which is a multivariate linear regression where everything in bold is a matrix defined as

    $$\begin{gather*} \mathbf{Y}_{T-j} = [\mathbf{y}_{p+1-j},\ldots,\mathbf{y}_{T-j}] \\ \mathbf{B} =[\mathbf{b}_0,\mathbf{B}_1,\ldots,\mathbf{B}_p] \\ \mathbf{X} = [\mathbf{1},\mathbf{Y}_{T-1}',\ldots,\mathbf{Y}_{T-p}']' \\ \mathbf{U} = [\boldsymbol{\upsilon}_{p+1},\ldots,\boldsymbol{\upsilon}_T] \end{gather*}$$
  • Parameter estimates are OLS, $\hat{\mathbf{B}} = \mathbf{Y} \mathbf{X}' \left( \mathbf{X}\mathbf{X}'\right)^{-1}$.

  • Residual estimates are $\hat{\mathbf{U}} = \mathbf{Y} - \hat{\mathbf{B}} \mathbf{X}$ and the covariance matrix estimate is $\hat{\boldsymbol{\Omega}} = \hat{\mathbf{U}} \hat{\mathbf{U}}'/(T-p)$

  • With $\hat{\boldsymbol{\Omega}}$ in hand, the structural shocks are identified by a Cholesky decomposition, $\hat{\boldsymbol{\Omega}} = \hat{\mathbf{A}}_0^{-1}(\hat{\mathbf{A}}_0^{-1})'$
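
The stacked-matrix OLS formulas can be verified on simulated data. The sketch below uses a hypothetical bivariate VAR(1) and checks that $\hat{\mathbf{B}} = \mathbf{Y}\mathbf{X}'(\mathbf{X}\mathbf{X}')^{-1}$ recovers the true coefficients:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, T = 2, 1, 5000

# Hypothetical true parameters
b0 = np.array([0.1, -0.1])
B1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])

# Simulate a stationary reduced-form VAR(1)
y = np.zeros((T, n))
for t in range(1, T):
    y[t] = b0 + B1 @ y[t - 1] + 0.1 * rng.standard_normal(n)

# Stack as in the slides: Y is n x (T-p), X is (1+np) x (T-p)
Y = y[p:].T
X = np.vstack([np.ones(T - p), y[:-p].T])

B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)      # OLS: B = YX'(XX')^{-1}
U_hat = Y - B_hat @ X                          # residuals
Omega_hat = U_hat @ U_hat.T / (T - p)          # covariance estimate
print(B_hat.round(2))
```
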

Recursive Identification

  • Suppose the data are 1. Output ($y_t$), 2. Inflation ($\pi_t$), and 3. Interest Rate ($r_t$)

    $$\begin{gather*} \small \begin{bmatrix} y_t \\ \pi_t \\ r_t \end{bmatrix} = \begin{bmatrix} b_{0,1} \\ b_{0,2} \\ b_{0,3} \end{bmatrix} + \begin{bmatrix} b_{1,1} & b_{1,4} & b_{1,7} \\ b_{1,2} & b_{1,5} & b_{1,8} \\ b_{1,3} & b_{1,6} & b_{1,9} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ \pi_{t-1} \\ r_{t-1} \end{bmatrix} + \begin{bmatrix} a_{0,1} & 0 & 0 \\ a_{0,2} & a_{0,4} & 0 \\ a_{0,3} & a_{0,5} & a_{0,6} \end{bmatrix} \begin{bmatrix} \varepsilon_{y,t} \\ \varepsilon_{\pi,t} \\ \varepsilon_{r,t} \end{bmatrix} \end{gather*}$$
  • Recursive identification, e.g., using a Cholesky decomposition, imposes that

    • Output does not respond immediately to other variables’ shocks

    • Inflation may respond immediately to output shocks, but does not respond to interest rate shocks

    • Interest Rate may respond immediately to both output and inflation shocks

  • This ordering reflects the following beliefs or theory

    • Firms and consumers don’t instantly change output/expenditure \rightarrow output responds slowly.

    • Prices are sticky \rightarrow inflation adjusts with a lag.

    • Central banks move fast \rightarrow interest rates adjust quickly to new information.
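
A sketch of recursive identification in code: generate reduced-form residuals from a known lower-triangular impact matrix (the numbers are made up), then recover it, and the structural shocks, with a Cholesky decomposition:

```python
import numpy as np

rng = np.random.default_rng(5)

# True lower-triangular impact matrix A0^{-1} (illustrative numbers)
A0inv = np.array([[1.0, 0.0, 0.0],
                  [0.3, 0.8, 0.0],
                  [0.2, 0.4, 0.5]])

# Reduced-form residuals: v_t = A0^{-1} eps_t with eps_t ~ N(0, I)
T = 200_000
eps = rng.standard_normal((3, T))
v = A0inv @ eps

Omega_hat = v @ v.T / T
L = np.linalg.cholesky(Omega_hat)   # recovers A0^{-1} up to sampling error
eps_hat = np.linalg.solve(L, v)     # structural shocks: eps_t = A0 v_t

print(L.round(2))
```

Reordering the rows of the data changes which matrix the Cholesky factor recovers, which is why the variable ordering is part of the identification strategy.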

Cholesky Decomposition Example

  • Here’s an example of a positive definite matrix

    $$\boldsymbol{\Omega} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix}$$
  • Our goal is to find a lower-triangular matrix, $\mathbf{L}$, such that $\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'$. The Cholesky factor is

    $$\mathbf{L} = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix}$$
```python
import numpy as np

# Define the symmetric, positive definite matrix
Omega = np.array([
    [4, 12, -16],
    [12, 37, -43],
    [-16, -43, 98]
])
print(Omega)
```

```
[[  4  12 -16]
 [ 12  37 -43]
 [-16 -43  98]]
```

```python
# Compute the Cholesky factor (lower-triangular matrix L)
L = np.linalg.cholesky(Omega)

# Print the result
print('L ='); print(L)
print("L L' ="); print(L @ L.T)
print('Omega ='); print(Omega)
```

```
L =
[[ 2.  0.  0.]
 [ 6.  1.  0.]
 [-8.  5.  3.]]
L L' =
[[  4.  12. -16.]
 [ 12.  37. -43.]
 [-16. -43.  98.]]
Omega =
[[  4  12 -16]
 [ 12  37 -43]
 [-16 -43  98]]
```
  • Verifying $\boldsymbol{\Omega} = \mathbf{L} \mathbf{L}'$

    $$\mathbf{L} \mathbf{L}' = \begin{bmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{bmatrix}\begin{bmatrix} 2 & 6 & -8 \\ 0 & 1 & 5 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{bmatrix} = \boldsymbol{\Omega}$$