Review 2B Key

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616

Home prices and rents might be cointegrated. If rent is above its equilibrium level relative to home prices, then home ownership becomes more appealing, which should drive rents down and home prices up, restoring the long-run equilibrium relationship. Was that true in the lead-up to the 2007-2009 Great Recession?
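In error-correction form, this intuition can be sketched (ignoring lagged differences and deterministic terms) with \(p_t\) denoting log home prices and \(r_t\) log rents:

\[
\Delta p_t = \alpha_p \left( p_{t-1} - \beta r_{t-1} \right) + \varepsilon_{p,t}, \qquad
\Delta r_t = \alpha_r \left( p_{t-1} - \beta r_{t-1} \right) + \varepsilon_{r,t},
\]

where \(p_{t-1} - \beta r_{t-1}\) is last period's deviation from the long-run relationship. If the series are cointegrated, at least one loading coefficient (\(\alpha_p \le 0\) or \(\alpha_r \ge 0\)) pulls the system back toward equilibrium.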

1)

  • Read data on home prices (FRED: CSUSHPINSA) and rental prices (FRED: CUUR0000SEHA)

  • Put the data in a dataframe and reindex it to month end.

  • Display the head and tail of the dataframe in a table (not a plot).

# Libraries
from fredapi import Fred
import pandas as pd
# Set up access to FRED
fred_api_key = pd.read_csv('fred_api_key.txt', header=None).iloc[0,0]
fred = Fred(api_key=fred_api_key)
# Series to get
series = ['CSUSHPINSA','CUUR0000SEHA']
rename = ['home','rent']
# Get and append data to list
dl = []
for idx, string in enumerate(series):
    var = fred.get_series(string).to_frame(name=rename[idx])
    dl.append(var)
    print(var.head(2)); print(var.tail(2))
            home
1975-01-01   NaN
1975-02-01   NaN
               home
2025-04-01  329.638
2025-05-01  331.107
            rent
1914-12-01  21.0
1915-01-01   NaN
               rent
2025-05-01  433.698
2025-06-01  434.594
# Concatenate data to create data frame (time-series table)
raw = pd.concat(dl, axis=1).sort_index()
# Make all columns numeric
raw = raw.apply(pd.to_numeric, errors='coerce')
# Resample/reindex to month end; dropna keeps only dates where both series exist
raw = raw.resample('ME').last().dropna()
# Display dataframe
display(raw)
home rent
1987-01-31 63.733 121.300
1987-02-28 64.132 121.700
1987-03-31 64.468 121.800
1987-04-30 64.973 122.000
1987-05-31 65.547 122.300
... ... ...
2025-01-31 323.652 429.506
2025-02-28 325.107 430.603
2025-03-31 327.658 431.798
2025-04-30 329.638 432.956
2025-05-31 331.107 433.698

461 rows × 2 columns

2)

  • Transform the data with \(100\times \log\)

  • Remove seasonality as needed.

  • Conduct ADF unit root tests to verify that the series are I(\(1\)) over 1990-2006.
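For reference, the transformation applied in the code below is

\[
y_t = 100\ln Y_t, \qquad \Delta_{12} y_t = y_t - y_{t-12},
\]

where the 12-month difference of the log series is a year-over-year growth rate (in percent), which removes the seasonal component; a further first difference, \(\Delta\Delta_{12} y_t\), is used to check whether the differenced series still contains a unit root.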

# Scientific computing
import numpy as np
data = pd.DataFrame()
# Transform data: 100 x log levels; 12-month differences remove seasonality
# (year-over-year growth); a further first difference checks for a second unit root
data['home'] = 100*np.log(raw['home'])
data['dhome'] = data['home'].diff(12)
data['d2home'] = data['dhome'].diff(1)
data['rent'] = 100*np.log(raw['rent'])
data['drent'] = data['rent'].diff(12)
data['d2rent'] = data['drent'].diff(1)
display(data)
home dhome d2home rent drent d2rent
1987-01-31 415.470248 NaN NaN 479.826682 NaN NaN
1987-02-28 416.094346 NaN NaN 480.155900 NaN NaN
1987-03-31 416.616898 NaN NaN 480.238036 NaN NaN
1987-04-30 417.397180 NaN NaN 480.402104 NaN NaN
1987-05-31 418.276744 NaN NaN 480.647704 NaN NaN
... ... ... ... ... ... ...
2025-01-31 577.966886 4.037125 0.129897 606.263571 4.156625 -0.027557
2025-02-28 578.415436 3.866708 -0.170417 606.518655 4.005756 -0.150868
2025-03-31 579.197038 3.320426 -0.546282 606.795789 3.915179 -0.090577
2025-04-30 579.799508 2.691851 -0.628575 607.063611 3.902339 -0.012840
2025-05-31 580.244159 2.225996 -0.465856 607.234844 3.741261 -0.161078

461 rows × 6 columns

from statsmodels.tsa.stattools import adfuller
# Function to organize ADF test results (const_trend: 'c' = constant, 'ct' = constant + trend)
def adf_test(data, const_trend):
    keys = ['Test Statistic','p-value','# of Lags','# of Obs']
    values = adfuller(data, regression=const_trend)
    test = pd.DataFrame.from_dict(dict(zip(keys,values[0:4])),
                                  orient='index', columns=[data.name])
    return test
# Select sample
start_date, end_date = '01-01-1990', '12-31-2006'
sample = data[start_date:end_date]
display(sample)
# ADF unit root tests
dl = []
for column in sample.columns:
    test = adf_test(sample[column],const_trend='c')
    dl.append(test)
results = pd.concat(dl, axis=1)
display(results)
home dhome d2home rent drent d2rent
1990-01-31 433.764362 3.878123 -0.421044 491.118322 3.980999 -0.085827
1990-02-28 433.842735 3.480830 -0.397292 491.265489 3.822121 -0.158878
1990-03-31 434.107442 3.148408 -0.332422 491.632461 4.036422 0.214301
1990-04-30 434.431179 2.858522 -0.289886 491.998093 4.173482 0.137060
1990-05-31 434.765515 2.622813 -0.235709 492.216831 4.164170 -0.009312
... ... ... ... ... ... ...
2006-08-31 521.713443 4.707097 -1.152809 542.141956 3.692450 0.213446
2006-09-30 521.601127 3.640424 -1.066673 542.539045 3.814687 0.122237
2006-10-31 521.522919 2.921304 -0.719120 542.934563 3.890497 0.075810
2006-11-30 521.292831 2.177308 -0.743996 543.328523 3.965768 0.075271
2006-12-31 521.073674 1.717451 -0.459857 543.807931 4.218161 0.252393

204 rows × 6 columns

home dhome d2home rent drent d2rent
Test Statistic -0.776473 -2.059508 -2.501802 1.557229 -1.936576 -3.251748
p-value 0.825935 0.261105 0.115036 0.997723 0.315055 0.017178
# of Lags 14.000000 15.000000 14.000000 4.000000 12.000000 11.000000
# of Obs 189.000000 188.000000 189.000000 199.000000 191.000000 192.000000
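For reference, with regression='c' the test regression estimated by adfuller includes a constant,

\[
\Delta y_t = c + \gamma\, y_{t-1} + \sum_{i=1}^{k} \phi_i\, \Delta y_{t-i} + \varepsilon_t,
\]

and the null hypothesis of a unit root is \(H_0: \gamma = 0\); it is rejected when the test statistic is more negative than the critical value (equivalently, when the p-value is small).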

3)

  • What is the lag order for a VECM selected by AIC? Make sure maxlags is not constraining the result.

  • Conduct a Johansen cointegration test for the lag order. Set det_order=-1.

# The I(1) series: deseasonalized (12-month differenced) log home prices and rents
sample_I1 = sample[['dhome','drent']]
# Select number of lags in VECM
from statsmodels.tsa.vector_ar.vecm import select_order
lag_order_results = select_order(
    sample_I1, maxlags=20, deterministic='co')
print(f'Selected lag order (AIC) = {lag_order_results.aic}')
print(f'Selected lag order (BIC) = {lag_order_results.bic}')
Selected lag order (AIC) = 15
Selected lag order (BIC) = 1
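One quick check that maxlags is not binding (a sketch, assuming re-estimating the selection with a larger ceiling is feasible on this sample) is to raise the ceiling and confirm the selected orders do not change:

# Re-run lag selection with a larger ceiling; the chosen orders should not change
check = select_order(sample_I1, maxlags=30, deterministic='co')
print(f'maxlags=30: AIC selects {check.aic}, BIC selects {check.bic}')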
# Johansen cointegration tests (trace statistics)
from statsmodels.tsa.vector_ar.vecm import coint_johansen
test = coint_johansen(sample_I1, det_order=-1, k_ar_diff=lag_order_results.bic)
# Trace statistics and their 5% critical values
test_stats = test.lr1
crit_vals = test.cvt[:, 1]
# Print results
for r_0, (test_stat, crit_val) in enumerate(zip(test_stats, crit_vals)):
    print(f'H_0: r <= {r_0}')
    print(f'  Test Stat. = {test_stat:.2f}, 5% Crit. Value = {crit_val:.2f}')
    if test_stat > crit_val:
        print('  => Reject null hypothesis.')
    else:
        print('  => Fail to reject null hypothesis.')
H_0: r <= 0
  Test Stat. = 13.17, 5% Crit. Value = 12.32
  => Reject null hypothesis.
H_0: r <= 1
  Test Stat. = 0.09, 5% Crit. Value = 4.13
  => Fail to reject null hypothesis.
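As a cross-check (a sketch using statsmodels' built-in rank selection rather than the manual comparison above), select_coint_rank applies the same trace test at the 5% level and reports the implied rank directly:

# Cross-check the cointegration rank with the built-in trace-test selection
from statsmodels.tsa.vector_ar.vecm import select_coint_rank
rank_test = select_coint_rank(sample_I1, det_order=-1,
                              k_ar_diff=lag_order_results.bic,
                              method='trace', signif=0.05)
print(f'Selected cointegration rank = {rank_test.rank}')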

4)

  • Estimate a VECM given your answers to the previous questions.

  • For which variable is the weight on the error correction term significant? Interpret that result.

# Estimate VECM with the BIC lag order and cointegration rank 1
from statsmodels.tsa.vector_ar.vecm import VECM
model_vecm = VECM(sample_I1, deterministic='co',
            k_ar_diff=lag_order_results.bic,
            coint_rank=1)
results_vecm = model_vecm.fit()
display(results_vecm.summary())
Det. terms outside the coint. relation & lagged endog. parameters for equation dhome
coef std err z P>|z| [0.025 0.975]
const -0.0438 0.020 -2.138 0.033 -0.084 -0.004
L1.dhome 0.9364 0.028 33.666 0.000 0.882 0.991
L1.drent -0.0771 0.058 -1.336 0.182 -0.190 0.036
Det. terms outside the coint. relation & lagged endog. parameters for equation drent
coef std err z P>|z| [0.025 0.975]
const 0.0662 0.024 2.741 0.006 0.019 0.114
L1.dhome -0.1193 0.033 -3.636 0.000 -0.184 -0.055
L1.drent -0.1052 0.068 -1.545 0.122 -0.239 0.028
Loading coefficients (alpha) for equation dhome
coef std err z P>|z| [0.025 0.975]
ec1 -0.0042 0.002 -2.301 0.021 -0.008 -0.001
Loading coefficients (alpha) for equation drent
coef std err z P>|z| [0.025 0.975]
ec1 0.0064 0.002 2.965 0.003 0.002 0.011
Cointegration relations for loading-coefficients-column 1
coef std err z P>|z| [0.025 0.975]
beta.1 1.0000 0 0 0.000 1.000 1.000
beta.2 -5.0155 1.900 -2.640 0.008 -8.739 -1.292
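Reading the \(\beta\) coefficients off the table above, the estimated error-correction term is approximately

\[
ec_{t-1} = \text{dhome}_{t-1} - 5.02\,\text{drent}_{t-1},
\]

and the loading coefficients (\(\alpha\)) in the two equations measure how each variable responds to last period's deviation from this long-run relationship.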
  • The loading coefficient on the error-correction term in the home-price equation is negative and statistically significant, which means that short-run changes in home prices drive the system back toward the long-run equilibrium.

  • I.e., if home prices are high relative to rents, then people will prefer to rent, which will drive home prices back down toward equilibrium (and rents should follow).