Review 2B Key

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616

Home prices and rents might be cointegrated. If rent is above its equilibrium level relative to home prices, then home ownership becomes more appealing, which should drive rents down and home prices up, restoring the long-run equilibrium relationship. Was that true in the lead-up to the 2007-2009 Great Recession?
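In error-correction form, this intuition can be sketched (ignoring lagged differences and deterministic terms) with \(p_t\) denoting log home prices and \(r_t\) log rents:

\[
\Delta p_t = \alpha_p \left( p_{t-1} - \beta r_{t-1} \right) + \varepsilon_{p,t}, \qquad
\Delta r_t = \alpha_r \left( p_{t-1} - \beta r_{t-1} \right) + \varepsilon_{r,t},
\]

where \(p_{t-1} - \beta r_{t-1}\) is last period's deviation from the long-run relationship. If the series are cointegrated, at least one loading coefficient (\(\alpha_p \le 0\) or \(\alpha_r \ge 0\)) pulls the system back toward equilibrium.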

1)

  • Read data on home prices (FRED: CSUSHPINSA) and rental prices (FRED: CUUR0000SEHA)

  • Put the data in a dataframe and reindex it to month end.

  • Display the head and tail of the dataframe in a table (not a plot).

# Libraries
from fredapi import Fred
import pandas as pd
# Set up access to FRED
fred_api_key = pd.read_csv('fred_api_key.txt', header=None).iloc[0,0]
fred = Fred(api_key=fred_api_key)
# Series to get
series = ['CSUSHPINSA','CUUR0000SEHA']
rename = ['home','rent']
# Get and append data to list
dl = []
for idx, string in enumerate(series):
    var = fred.get_series(string).to_frame(name=rename[idx])
    dl.append(var)
    print(var.head(2)); print(var.tail(2))
            home
1975-01-01   NaN
1975-02-01   NaN
               home
2025-04-01  329.638
2025-05-01  331.107
            rent
1914-12-01  21.0
1915-01-01   NaN
               rent
2025-05-01  433.698
2025-06-01  434.594
# Concatenate data to create data frame (time-series table)
raw = pd.concat(dl, axis=1).sort_index()
# Make all columns numeric
raw = raw.apply(pd.to_numeric, errors='coerce')
# Resample/reindex to month end; dropna keeps only dates where both series exist
raw = raw.resample('ME').last().dropna()
# Display dataframe
display(raw)
home rent
1987-01-31 63.733 121.300
1987-02-28 64.132 121.700
1987-03-31 64.468 121.800
1987-04-30 64.973 122.000
1987-05-31 65.547 122.300
... ... ...
2025-01-31 323.652 429.506
2025-02-28 325.107 430.603
2025-03-31 327.658 431.798
2025-04-30 329.638 432.956
2025-05-31 331.107 433.698

461 rows × 2 columns

2)

  • Transform the data with \(100\times \log\)

  • Remove seasonality as needed.

  • Conduct ADF unit root tests to verify that the series are I(\(1\)) over 1990-2006.
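For reference, the transformation applied in the code below is

\[
y_t = 100\ln Y_t, \qquad \Delta_{12} y_t = y_t - y_{t-12},
\]

where the 12-month difference of the log series is a year-over-year growth rate (in percent), which removes the seasonal component; a further first difference, \(\Delta\Delta_{12} y_t\), is used to check whether the differenced series still contains a unit root.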

# Scientific computing
import numpy as np
data = pd.DataFrame()
# Transform data: 100 x log levels; 12-month differences remove seasonality
# (year-over-year growth); a further first difference checks for a second unit root
data['home'] = 100*np.log(raw['home'])
data['dhome'] = data['home'].diff(12)
data['d2home'] = data['dhome'].diff(1)
data['rent'] = 100*np.log(raw['rent'])
data['drent'] = data['rent'].diff(12)
data['d2rent'] = data['drent'].diff(1)
display(data)
home dhome d2home rent drent d2rent
1987-01-31 415.470248 NaN NaN 479.826682 NaN NaN
1987-02-28 416.094346 NaN NaN 480.155900 NaN NaN
1987-03-31 416.616898 NaN NaN 480.238036 NaN NaN
1987-04-30 417.397180 NaN NaN 480.402104 NaN NaN
1987-05-31 418.276744 NaN NaN 480.647704 NaN NaN
... ... ... ... ... ... ...
2025-01-31 577.966886 4.037125 0.129897 606.263571 4.156625 -0.027557
2025-02-28 578.415436 3.866708 -0.170417 606.518655 4.005756 -0.150868
2025-03-31 579.197038 3.320426 -0.546282 606.795789 3.915179 -0.090577
2025-04-30 579.799508 2.691851 -0.628575 607.063611 3.902339 -0.012840
2025-05-31 580.244159 2.225996 -0.465856 607.234844 3.741261 -0.161078

461 rows × 6 columns

from statsmodels.tsa.stattools import adfuller
# Function to organize ADF test results (const_trend: 'c' = constant, 'ct' = constant + trend)
def adf_test(data, const_trend):
    keys = ['Test Statistic','p-value','# of Lags','# of Obs']
    values = adfuller(data, regression=const_trend)
    test = pd.DataFrame.from_dict(dict(zip(keys,values[0:4])),
                                  orient='index', columns=[data.name])
    return test
# Select sample
start_date, end_date = '01-01-1990', '12-31-2006'
sample = data[start_date:end_date]
display(sample)
# ADF unit root tests
dl = []
for column in sample.columns:
    test = adf_test(sample[column],const_trend='c')
    dl.append(test)
results = pd.concat(dl, axis=1)
display(results)
home dhome d2home rent drent d2rent
1990-01-31 433.764362 3.878123 -0.421044 491.118322 3.980999 -0.085827
1990-02-28 433.842735 3.480830 -0.397292 491.265489 3.822121 -0.158878
1990-03-31 434.107442 3.148408 -0.332422 491.632461 4.036422 0.214301
1990-04-30 434.431179 2.858522 -0.289886 491.998093 4.173482 0.137060
1990-05-31 434.765515 2.622813 -0.235709 492.216831 4.164170 -0.009312
... ... ... ... ... ... ...
2006-08-31 521.713443 4.707097 -1.152809 542.141956 3.692450 0.213446
2006-09-30 521.601127 3.640424 -1.066673 542.539045 3.814687 0.122237
2006-10-31 521.522919 2.921304 -0.719120 542.934563 3.890497 0.075810
2006-11-30 521.292831 2.177308 -0.743996 543.328523 3.965768 0.075271
2006-12-31 521.073674 1.717451 -0.459857 543.807931 4.218161 0.252393

204 rows × 6 columns

home dhome d2home rent drent d2rent
Test Statistic -0.776473 -2.059508 -2.501802 1.557229 -1.936576 -3.251748
p-value 0.825935 0.261105 0.115036 0.997723 0.315055 0.017178
# of Lags 14.000000 15.000000 14.000000 4.000000 12.000000 11.000000
# of Obs 189.000000 188.000000 189.000000 199.000000 191.000000 192.000000
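For reference, with regression='c' the test regression estimated by adfuller includes a constant,

\[
\Delta y_t = c + \gamma\, y_{t-1} + \sum_{i=1}^{k} \phi_i\, \Delta y_{t-i} + \varepsilon_t,
\]

and the null hypothesis of a unit root is \(H_0: \gamma = 0\); it is rejected when the test statistic is more negative than the critical value (equivalently, when the p-value is small).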

3)

  • What is the lag order for a VECM selected by AIC? Make sure maxlags is not constraining the result.

  • Conduct a Johansen cointegration test for the lag order. Set det_order=-1.

# The I(1) series: deseasonalized (12-month differenced) log home prices and rents
sample_I1 = sample[['dhome','drent']]
# Select number of lags in VECM
from statsmodels.tsa.vector_ar.vecm import select_order
lag_order_results = select_order(
    sample_I1, maxlags=20, deterministic='co')
print(f'Selected lag order (AIC) = {lag_order_results.aic}')
print(f'Selected lag order (BIC) = {lag_order_results.bic}')
Selected lag order (AIC) = 15
Selected lag order (BIC) = 1
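One quick check that maxlags is not binding (a sketch, assuming re-estimating the selection with a larger ceiling is feasible on this sample) is to raise the ceiling and confirm the selected orders do not change:

# Re-run lag selection with a larger ceiling; the chosen orders should not change
check = select_order(sample_I1, maxlags=30, deterministic='co')
print(f'maxlags=30: AIC selects {check.aic}, BIC selects {check.bic}')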
# Johansen cointegration tests (trace statistics)
from statsmodels.tsa.vector_ar.vecm import coint_johansen
test = coint_johansen(sample_I1, det_order=-1, k_ar_diff=lag_order_results.bic)
# Trace statistics and their 5% critical values
test_stats = test.lr1
crit_vals = test.cvt[:, 1]
# Print results
for r_0, (test_stat, crit_val) in enumerate(zip(test_stats, crit_vals)):
    print(f'H_0: r <= {r_0}')
    print(f'  Test Stat. = {test_stat:.2f}, 5% Crit. Value = {crit_val:.2f}')
    if test_stat > crit_val:
        print('  => Reject null hypothesis.')
    else:
        print('  => Fail to reject null hypothesis.')
H_0: r <= 0
  Test Stat. = 13.17, 5% Crit. Value = 12.32
  => Reject null hypothesis.
H_0: r <= 1
  Test Stat. = 0.09, 5% Crit. Value = 4.13
  => Fail to reject null hypothesis.
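As a cross-check (a sketch using statsmodels' built-in rank selection rather than the manual comparison above), select_coint_rank applies the same trace test at the 5% level and reports the implied rank directly:

# Cross-check the cointegration rank with the built-in trace-test selection
from statsmodels.tsa.vector_ar.vecm import select_coint_rank
rank_test = select_coint_rank(sample_I1, det_order=-1,
                              k_ar_diff=lag_order_results.bic,
                              method='trace', signif=0.05)
print(f'Selected cointegration rank = {rank_test.rank}')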

4)

  • Estimate a VECM given your answers to the previous questions.

  • For which variable is the weight on the error correction term significant? Interpret that result.

# Estimate VECM with the BIC lag order and cointegration rank 1
from statsmodels.tsa.vector_ar.vecm import VECM
model_vecm = VECM(sample_I1, deterministic='co',
            k_ar_diff=lag_order_results.bic,
            coint_rank=1)
results_vecm = model_vecm.fit()
display(results_vecm.summary())
Det. terms outside the coint. relation & lagged endog. parameters for equation dhome
coef std err z P>|z| [0.025 0.975]
const -0.0438 0.020 -2.138 0.033 -0.084 -0.004
L1.dhome 0.9364 0.028 33.666 0.000 0.882 0.991
L1.drent -0.0771 0.058 -1.336 0.182 -0.190 0.036
Det. terms outside the coint. relation & lagged endog. parameters for equation drent
coef std err z P>|z| [0.025 0.975]
const 0.0662 0.024 2.741 0.006 0.019 0.114
L1.dhome -0.1193 0.033 -3.636 0.000 -0.184 -0.055
L1.drent -0.1052 0.068 -1.545 0.122 -0.239 0.028
Loading coefficients (alpha) for equation dhome
coef std err z P>|z| [0.025 0.975]
ec1 -0.0042 0.002 -2.301 0.021 -0.008 -0.001
Loading coefficients (alpha) for equation drent
coef std err z P>|z| [0.025 0.975]
ec1 0.0064 0.002 2.965 0.003 0.002 0.011
Cointegration relations for loading-coefficients-column 1
coef std err z P>|z| [0.025 0.975]
beta.1 1.0000 0 0 0.000 1.000 1.000
beta.2 -5.0155 1.900 -2.640 0.008 -8.739 -1.292
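Reading the \(\beta\) coefficients off the table above, the estimated error-correction term is approximately

\[
ec_{t-1} = \text{dhome}_{t-1} - 5.02\,\text{drent}_{t-1},
\]

and the loading coefficients (\(\alpha\)) in the two equations measure how each variable responds to last period's deviation from this long-run relationship.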
  • The loading coefficient on the error-correction term in the home-price equation is negative and statistically significant, which means that short-run changes in home prices drive the system back toward the long-run equilibrium.

  • I.e., if home prices are high relative to rents, then people will prefer to rent, which will drive home prices back down toward equilibrium (and rents should follow).