Review 1B Key
by Professor Throckmorton
for Time Series Econometrics
W&M ECON 408/PUBP 616
1)
Read U.S. Industrial Production: Total Index (IPB50001N) from FRED. Resample/reindex the data to a quarterly frequency using the value for the last month of each quarter.
Print the original data and compare it to the resampled/reindexed data to verify it is correct.
# Libraries
from fredapi import Fred
import pandas as pd
# Read data
fred_api_key = pd.read_csv('fred_api_key.txt', header=None)
fred = Fred(api_key=fred_api_key.iloc[0,0])
data = fred.get_series('IPB50001N').to_frame(name='IP')
print(data.tail(3))
# Resample/reindex to quarterly frequency
data = data.resample('QE').last()
# Print data
print(f'number of rows/quarters = {len(data)}')
print(data.tail(3))
                  IP
2025-04-01  102.6547
2025-05-01  102.7865
2025-06-01  105.2800

number of rows/quarters = 426

                  IP
2024-12-31  102.5835
2025-03-31  104.2163
2025-06-30  105.2800
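As an extra verification step (a sketch, not part of the original key; the specific dates assume the sample ends in June 2025 as shown above), the quarterly values can be checked programmatically against the last monthly observation of each quarter:
# Sketch: confirm each quarterly value equals the last monthly value
monthly = fred.get_series('IPB50001N').to_frame(name='IP')
quarterly = monthly.resample('QE').last()
# 2025Q2 (indexed 2025-06-30) should match the June 2025 monthly value
assert quarterly.loc['2025-06-30', 'IP'] == monthly.loc['2025-06-01', 'IP']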
2)
Why is this data not stationary?
Use appropriate transformations to remove the non-stationarity and plot the series.
Correctly and completely label the plot.
# Plotting
import matplotlib.pyplot as plt
# Plot
fig, ax = plt.subplots(figsize=(6.5,2));
ax.plot(data.IP);
ax.set_title('Industrial Production (Index, Quarterly)');
ax.grid(); ax.autoscale(tight=True)

This data is non-stationary because it has within-year seasonality and an exponential time trend.
# Scientific computing
import numpy as np
# Transform data
# Take log
data['logIP'] = np.log(data.IP)
# Take the 4-quarter (year-over-year) log difference, scaled to percent
data['dlogIP'] = 100*(data.logIP.diff(4))
# Plot
fig, ax = plt.subplots(figsize=(6.5,2.5));
ax.plot(data.dlogIP);
ax.set_title('Industrial Production (YoY % Change, Quarterly)');
ax.yaxis.set_major_formatter('{x:.0f}%')
ax.grid(); ax.autoscale(tight=True)

The time-varying volatility suggests the transformed series may still be non-stationary.
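One way to see the time-varying volatility directly (a sketch, not part of the original key; the 20-quarter window is an arbitrary choice) is a rolling standard deviation of the transformed series:
# Sketch: rolling standard deviation of the YoY % change
roll_sd = data.dlogIP.rolling(window=20).std()  # 20 quarters = 5 years
fig, ax = plt.subplots(figsize=(6.5,2))
ax.plot(roll_sd)
ax.set_title('Rolling 20-Quarter Std. Dev. of YoY % Change in IP')
ax.grid(); ax.autoscale(tight=True)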
3)
Plot the autocorrelation function of the transformed data.
Is it stationary? Why or why not?
# Auto-correlation function
from statsmodels.graphics.tsaplots import plot_acf as acf
# Plot Autocorrelation Function
fig, ax = plt.subplots(figsize=(6.5,2))
acf(data.dlogIP.dropna(), ax=ax)
ax.grid(); ax.autoscale(tight=True)

The ACF is significant at long lags and doesn't clearly taper toward zero, so the series might be non-stationary.
The first three lags are especially pronounced.
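To back up the visual read, the underlying ACF values can be computed numerically with statsmodels' acf function (a sketch, not part of the original key; aliased here to avoid clashing with the plot_acf alias above):
# Sketch: numeric ACF values for the first 8 lags
from statsmodels.tsa.stattools import acf as acf_values
rho = acf_values(data.dlogIP.dropna(), nlags=8)
for lag, r in enumerate(rho):
    print(f'lag {lag}: {r:.3f}')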
4)
Conduct a unit root test of the transformed data.
Is it stationary? Why or why not?
# ADF Test
from statsmodels.tsa.stattools import adfuller
# Function to organize ADF test results
def adf_test(data):
    keys = ['Test Statistic','p-value','# of Lags','# of Obs']
    values = adfuller(data)
    test = pd.DataFrame.from_dict(dict(zip(keys,values[0:4])),
                                  orient='index',columns=[data.name])
    return test
adf_test(data.dlogIP.dropna())
|                | dlogIP       |
|----------------|--------------|
| Test Statistic | -5.803906    |
| p-value        | 4.563453e-07 |
| # of Lags      | 15           |
| # of Obs       | 406          |
The null hypothesis of the ADF test is that the data has a unit root.
The p-value is near zero, so we reject the null.
The unit root test therefore says the data is probably stationary.
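The tuple returned by adfuller also includes the critical values (its fifth element), so the test statistic can be compared against them directly; a quick sketch:
# Sketch: compare the ADF test statistic to the critical values
values = adfuller(data.dlogIP.dropna())
print(f'Test Statistic: {values[0]:.3f}')
for level, cv in values[4].items():
    print(f'{level} critical value: {cv:.3f}')
The statistic of about -5.8 is well below even the 1% critical value, consistent with rejecting the unit-root null.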
5)
If you were to pick a parsimonious AR model (i.e., no more than 5 lags) to fit the transformed data, what lag would you pick? Why?
# Partial auto-correlation function
from statsmodels.graphics.tsaplots import plot_pacf as pacf
# Plot partial autocorrelation function
fig, ax = plt.subplots(figsize=(6.5,1.5))
pacf(data.dlogIP.dropna(), ax=ax)
ax.grid(); ax.autoscale(tight=True);

The PACF is significantly different from zero for the first 5 lags and stops being significant at lag 6.
That makes 5 lags a natural choice.
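As a cross-check on the lag choice (a sketch, not part of the original key), one can compare information criteria across AR orders 1 through 5; a lower AIC/BIC indicates a better trade-off between fit and parsimony:
# Sketch: compare information criteria across AR lag orders
from statsmodels.tsa.arima.model import ARIMA
for p in range(1, 6):
    res_p = ARIMA(data.dlogIP.dropna(), order=(p,0,0)).fit()
    print(f'AR({p}): AIC = {res_p.aic:.1f}, BIC = {res_p.bic:.1f}')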
6)
Given your last answer, estimate an AR model given the transformed quarterly data.
Interpret the results.
# ARIMA model
from statsmodels.tsa.arima.model import ARIMA
# Define model
mod = ARIMA(data.dlogIP,order=(5,0,0))
# Estimate model
res = mod.fit()
summary = res.summary()
# Print summary tables
tab0 = summary.tables[0].as_html()
tab1 = summary.tables[1].as_html()
tab2 = summary.tables[2].as_html()
#print(tab0); print(tab1); print(tab2)
| Dep. Variable: | dlogIP         | No. Observations: | 425       |
|----------------|----------------|-------------------|-----------|
| Model:         | ARIMA(5, 0, 0) | Log Likelihood    | -1298.814 |
|        | coef    | std err | z       | P>\|z\| | [0.025 | 0.975] |
|--------|---------|---------|---------|---------|--------|--------|
| const  |  2.9265 | 0.975   |   3.002 | 0.003   |  1.016 |  4.837 |
| ar.L1  |  1.0243 | 0.031   |  32.647 | 0.000   |  0.963 |  1.086 |
| ar.L2  | -0.1735 | 0.042   |  -4.163 | 0.000   | -0.255 | -0.092 |
| ar.L3  |  0.1902 | 0.035   |   5.505 | 0.000   |  0.123 |  0.258 |
| ar.L4  | -0.6642 | 0.029   | -22.871 | 0.000   | -0.721 | -0.607 |
| ar.L5  |  0.3486 | 0.025   |  13.773 | 0.000   |  0.299 |  0.398 |
| sigma2 | 27.8385 | 0.912   |  30.511 | 0.000   | 26.050 | 29.627 |
| Heteroskedasticity (H): | 0.05 | Skew:     | 0.34  |
|-------------------------|------|-----------|-------|
| Prob(H) (two-sided):    | 0.00 | Kurtosis: | 13.72 |
All of the AR coefficients are significantly different from zero.
The residuals show substantial excess kurtosis, though, so an AR model may not be the best choice.
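Given the heavy-tailed residuals, a natural follow-up (a sketch using SciPy's Jarque-Bera test, not part of the original key) is a formal normality check:
# Sketch: Jarque-Bera normality test on the ARIMA residuals
from scipy import stats
jb_stat, jb_pvalue = stats.jarque_bera(res.resid.dropna())
print(f'JB statistic = {jb_stat:.1f}, p-value = {jb_pvalue:.3g}')
# A p-value near zero rejects normality, consistent with the heavy tails
# (kurtosis of 13.72) reported in the summary above.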