Review 1B Key#

by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616

1)#

  • Read U.S. Industrial Production: Total Index (IPB50001N) from FRED.

  • Resample/reindex the data to a quarterly frequency using the value for the last month of each quarter.

  • Print the original data and compare it to the resampled/reindexed data to verify it is correct.

# Libraries
from fredapi import Fred
import pandas as pd
# Read data
fred_api_key = pd.read_csv('fred_api_key.txt', header=None)
fred = Fred(api_key=fred_api_key.iloc[0,0])
data = fred.get_series('IPB50001N').to_frame(name='IP')
print(data.tail(3))
# Resample/reindex to quarterly frequency
data = data.resample('QE').last()
# Print data
print(f'number of rows/quarters = {len(data)}')
print(data.tail(3))
                  IP
2025-04-01  102.6547
2025-05-01  102.7865
2025-06-01  105.2800
number of rows/quarters = 426
                  IP
2024-12-31  102.5835
2025-03-31  104.2163
2025-06-30  105.2800
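
As a quick correctness check (a sketch, not part of the original key; `monthly` and `quarterly` are hypothetical names for a re-read monthly copy and its resampled version), each quarter-end value should equal the final monthly observation of that quarter:

# Verify quarter-end values match the last monthly observation (sketch)
monthly = fred.get_series('IPB50001N').to_frame(name='IP')   # re-read the monthly series
quarterly = monthly.resample('QE').last()
# e.g., 2025Q2 should carry the June 2025 monthly value
print(monthly.loc['2025-06-01', 'IP'] == quarterly.loc['2025-06-30', 'IP'])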

2)#

  • Why is this data not stationary?

  • Use appropriate transformations to remove the non-stationarity and plot the series.

  • Correctly and completely label the plot.

# Plotting
import matplotlib.pyplot as plt
# Plot
fig, ax = plt.subplots(figsize=(6.5,2));
ax.plot(data.IP);
ax.set_title('Industrial Production (Index, Quarterly)');
ax.grid(); ax.autoscale(tight=True)
[Figure: Industrial Production (Index, Quarterly)]

This data is non-stationary because it has a pronounced seasonal pattern within each year and an exponential time trend, so its mean changes over time.
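
In symbols, the transformation applied below (notation mine, matching the code's column name) is

$$\text{dlogIP}_t = 100\left(\log IP_t - \log IP_{t-4}\right),$$

the 4-quarter log difference: the log flattens the exponential trend, and differencing against the same quarter of the previous year removes the seasonal pattern, leaving (approximately) the year-over-year percent change.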

# Scientific computing
import numpy as np
# Transform data
#    Take log to flatten the exponential trend
data['logIP'] = np.log(data.IP)
#    Take the 4-quarter (year-over-year) difference to remove seasonality
data['dlogIP'] = 100*(data.logIP.diff(4))
# Plot
fig, ax = plt.subplots(figsize=(6.5,2.5));
ax.plot(data.dlogIP);
ax.set_title('Industrial Production (YoY % Change, Quarterly)');
ax.yaxis.set_major_formatter('{x:.0f}%')
ax.grid(); ax.autoscale(tight=True)
[Figure: Industrial Production (YoY % Change, Quarterly)]

The time-varying volatility suggests the transformed series may still be non-stationary, since its variance is not constant over time.
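
One way to see the time-varying volatility directly (a sketch, not part of the original key; the 20-quarter window is an arbitrary choice) is to plot a rolling standard deviation:

# Rolling volatility of the transformed series (sketch)
roll_sd = data.dlogIP.rolling(20).std()
fig, ax = plt.subplots(figsize=(6.5,2))
ax.plot(roll_sd)
ax.set_title('Rolling 20-Quarter Std. Dev. of dlogIP')
ax.grid(); ax.autoscale(tight=True)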

3)#

  • Plot the autocorrelation function of the transformed data.

  • Is it stationary? Why or why not?

# Autocorrelation function
from statsmodels.graphics.tsaplots import plot_acf
# Plot autocorrelation function
fig, ax = plt.subplots(figsize=(6.5,2))
plot_acf(data.dlogIP.dropna(), ax=ax)
ax.grid(); ax.autoscale(tight=True)
[Figure: Autocorrelation function of dlogIP]
  • The ACF is significant at long lags and does not clearly taper toward zero

  • so the series might still be non-stationary

  • The first three lags stand out as especially large (a numeric check is sketched below)
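
To back the visual read with numbers, the ACF values can be computed directly with statsmodels' acf function (a sketch; the 20-lag horizon is an arbitrary choice):

# Numeric ACF values with 95% confidence intervals (sketch)
from statsmodels.tsa.stattools import acf
vals, confint = acf(data.dlogIP.dropna(), nlags=20, alpha=0.05)
for k in (1, 2, 3):
    print(f'lag {k}: acf = {vals[k]:.2f}')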

4)#

  • Conduct a unit root test of the transformed data.

  • Is it stationary? Why or why not?

# ADF Test
from statsmodels.tsa.stattools import adfuller
# Function to organize ADF test results
def adf_test(data):
    keys = ['Test Statistic','p-value','# of Lags','# of Obs']
    values = adfuller(data)
    test = pd.DataFrame.from_dict(dict(zip(keys,values[0:4])),
                                  orient='index',columns=[data.name])
    return test
adf_test(data.dlogIP.dropna())
                       dlogIP
Test Statistic  -5.803906e+00
p-value          4.563453e-07
# of Lags        1.500000e+01
# of Obs         4.060000e+02
  • Null hypothesis: the data has a unit root

  • The p-value is near zero, so we reject the null

  • The unit root test therefore suggests the data is stationary (see the sketch below for the critical values)
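
For completeness, adfuller also returns the critical values (the fifth element of its output tuple), which can be reported alongside the test statistic; a minimal sketch:

# Critical values from the ADF test (sketch)
results = adfuller(data.dlogIP.dropna())
print(f'test statistic = {results[0]:.3f}')
for level, cv in results[4].items():
    print(f'{level} critical value = {cv:.3f}')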

5)#

If you were to pick a parsimonious AR model (i.e., the number of lags is less than 5) to fit the transformed data, what lag would you pick? Why?

# Partial autocorrelation function
from statsmodels.graphics.tsaplots import plot_pacf
# Plot partial autocorrelation function
fig, ax = plt.subplots(figsize=(6.5,1.5))
plot_pacf(data.dlogIP.dropna(), ax=ax)
ax.grid(); ax.autoscale(tight=True);
[Figure: Partial autocorrelation function of dlogIP]
  • The PACF is significantly different from zero for the first 5 lags

  • It stops being significant at lag 6

  • So an AR(5) is the most parsimonious choice (an AIC/BIC comparison is sketched below)
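
An alternative way to settle on the lag (a sketch, not required by the question) is to compare information criteria across candidate AR orders, using the same ARIMA class estimated in the next part:

# Compare AIC/BIC across parsimonious AR(p) candidates (sketch)
from statsmodels.tsa.arima.model import ARIMA
y = data.dlogIP.dropna()
for p in range(1, 6):
    fit = ARIMA(y, order=(p, 0, 0)).fit()
    print(f'AR({p}): AIC = {fit.aic:.1f}, BIC = {fit.bic:.1f}')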

6)#

  • Given your last answer, estimate an AR model given the transformed quarterly data.

  • Interpret the results.

# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(data.dlogIP,order=(5,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)
Dep. Variable:            dlogIP   No. Observations:        425
Model:            ARIMA(5, 0, 0)   Log Likelihood     -1298.814

            coef   std err        z      P>|z|   [0.025   0.975]
const     2.9265     0.975    3.002      0.003    1.016    4.837
ar.L1     1.0243     0.031   32.647      0.000    0.963    1.086
ar.L2    -0.1735     0.042   -4.163      0.000   -0.255   -0.092
ar.L3     0.1902     0.035    5.505      0.000    0.123    0.258
ar.L4    -0.6642     0.029  -22.871      0.000   -0.721   -0.607
ar.L5     0.3486     0.025   13.773      0.000    0.299    0.398
sigma2   27.8385     0.912   30.511      0.000   26.050   29.627

Heteroskedasticity (H):  0.05   Skew:      0.34
Prob(H) (two-sided):     0.00   Kurtosis: 13.72
  • All the AR coefficients are significantly different from zero

  • The residuals show heavy kurtosis (13.72), so a Gaussian AR model may not be the best choice (residual diagnostics are sketched below)
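
To probe the kurtosis point, the residuals of the fitted model can be inspected directly (a sketch; the Ljung-Box lag choice is arbitrary):

# Residual diagnostics for the AR(5) fit (sketch)
from statsmodels.stats.diagnostic import acorr_ljungbox
resid = res.resid.dropna()
print(f'residual excess kurtosis = {resid.kurtosis():.2f}')   # pandas reports excess kurtosis
print(acorr_ljungbox(resid, lags=[8]))                        # test for leftover autocorrelation
res.plot_diagnostics(figsize=(6.5,4));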