
Dynamic Econometrics

20th of May, 2019

The prosperity of few or the wealth of many: The influence of wage on unemployment

Abstract
This report concerns the effect of the average hourly wage in the manufacturing sector on the
unemployment rate in the United States of America. After transforming the data, the causal
relationship between these two variables was researched in an empirical analysis by means of
several descriptive models. First the best fitting ARMA model and subsequently the best
fitting ARDL model were constructed, aiming to explain the unemployment rate by recent
unemployment rates and hourly earnings. Using the obtained models, forecasts were made and
compared to actual data in order to assess the value and accuracy of the models.
From the obtained models and corresponding forecasts, we conclude that average hourly
earnings in the manufacturing sector have a negative effect on the unemployment rate.

Group 28
Ettina Beiboer s3402282
Thom van Kemenade s3456803
Jorrit Wubben s3363112
Iris de Jong s3407365
Dynamic Econometrics May 2019

Introduction and problem formulation


For most people, having an income is the only way to get by. However, one’s income is not
always enough to afford a decent standard of living. Imposing a sufficiently high minimum
wage might seem an easy solution to this problem, but at what cost? This question has been
addressed by decades of research, much of which confirms the observation made by Gary Becker,
a Nobel Prize-winning economist. Becker stated that a higher minimum wage will further
reduce the employment opportunities of workers with few skills.1 Hence, increasing the
minimum wage is expected to result in a rise in unemployment. However, does this also apply
to the average hourly earnings and the unemployment rate? In other words, will an increase in
the average hourly earnings induce a rise in the unemployment rate? If this is indeed the case,
we could use this information to predict the future unemployment rate. According to Claudia
Sahm, an economist at the Federal Reserve, the unemployment rate is an excellent recession
indicator.2 In her report she states that the economy is in a recession when the three-month
average unemployment rate is at least 0.5 percentage points above its minimum from the
previous 12 months. This simple measure has correctly called every recession in America
since 1970.3 Hence, an accurate forecast of the unemployment rate could help us predict the
next recession. Therefore, we would like to predict the unemployment rate. Since wages may
have an effect on the unemployment rate, we would like to forecast the unemployment rate
based on the average hourly earnings. To do this, we pose the following question in this
paper: What is the effect of average hourly earnings on unemployment rates? We will attempt
to answer this question in multiple steps, beginning with an analysis of the unemployment
rate and average hourly earnings variables. Secondly, we will determine the best fitting
Autoregressive Moving Average (ARMA) model and Autoregressive Distributed Lag (ARDL) model
for our data. After the best fitting models have been obtained, we will analyze their
forecasting accuracy, which will lead us to the best forecasting model. Finally, we will
consider a Vector Autoregressive (VAR) model in order to determine the linear
interdependencies between the unemployment rate and the average hourly earnings. After
carefully analyzing the data and the corresponding models, a conclusion will be put forward
that answers the question posed in this paper.
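
The recession rule quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited report: the function name `sahm_signal` and the example values are hypothetical, and the comparison is made against the minimum of the three-month averages over the previous 12 months, which is one reading of the description above.

```python
def sahm_signal(unrate, threshold=0.5):
    """Flag months in which the rule described above signals a recession.

    unrate: monthly unemployment rates in percent, oldest first.
    Returns one boolean per month; months with fewer than 15 observations
    of history are reported as False because the rule cannot be evaluated.
    """
    # Three-month moving average, defined from the third observation onwards.
    ma3 = [None, None] + [
        sum(unrate[i - 2:i + 1]) / 3.0 for i in range(2, len(unrate))
    ]
    signals = []
    for i in range(len(unrate)):
        if i < 14:
            # Not enough history: need 12 earlier three-month averages.
            signals.append(False)
            continue
        previous_min = min(ma3[i - 12:i])
        signals.append(ma3[i] - previous_min >= threshold)
    return signals


# Made-up example: a flat rate followed by a sharp rise.
example = [4.0] * 14 + [4.5, 5.0, 5.5]
print(sahm_signal(example)[-3:])
```

With the made-up series, the signal stays off in the first month of the rise and switches on once the three-month average has moved far enough above its recent minimum.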

Data description
The data analyzed in this research paper are taken solely from the FRED-MD database.4 This
data set consists of monthly data from January 1959 up to and including February 2019.
However, for the research discussed in this paper, only the data from January 1959 up to
December 2006 are considered. Furthermore, we focus solely on two variables from this data
set. The first is the civilian unemployment rate, denoted by "UNRATE", which gives the
percentage of the labour force that is unemployed. It is seasonally adjusted, meaning that
the seasonal component has been removed from the data. The second is the average hourly
earnings of employees in the manufacturing sector. This variable is denoted by
"CES3000000008" in the aforementioned data set, but will from now on be referred to as
'CES'. CES is, in contrast to the unemployment rate, not seasonally adjusted. It is nominal
data and hence not adjusted for inflation.

1 Obtained from an article entitled 'High Minimum Wage Equals High Unemployment', written by
Craig Garthwaite, published in December 2003 in the Seattle Post-Intelligencer. URL:
https://www.epionline.org/oped/o76/
2 "How to spot a recession", published on June 11th 2019 on the website of The Economist.
URL: https://www.economist.com/graphic-detail/2019/06/11/how-to-spot-a-recession
3 "Direct Stimulus Payments to Individuals", written by Claudia Sahm, produced by The
Hamilton Project and published on May 6th 2019. URL:
https://www.brookings.edu/wp-content/uploads/2019/05/ES_THP_Sahm_web_20190506.pdf
4 FRED-MD: A Monthly Database for Macroeconomic Research. This database is publicly
accessible, measured in the United States of America and constructed by the Research
Department of the Federal Reserve Bank of St. Louis. Retrieved from
https://s3.amazonaws.com/real.stlouisfed.org/wp/2015/2015-012.pdf

Observing the data of both variables, figure 1a shows that the unemployment rate in the
U.S.A. both rises and falls over time, which can be explained by economic downturns and
periods of economic prosperity, respectively, in line with basic macroeconomic theory.
Figure 1b shows that the average hourly earnings increase over time. Since the data is not
adjusted for inflation, this is to be expected. The series also shows some minor setbacks,
which is plausible given the macroeconomic cycle, for example during recessions or smaller
economic downturns.

(a) Unemployment rate (b) Average hourly earnings

Figure 1: UNRATE and CES plotted over time.

Now that we have 'eyeballed' the data, we will begin to transform it. In order to be usable
for our analysis, the data has to be stationary. This can be checked by performing a
Dickey-Fuller test in R. The null hypothesis of this test is that the series has a unit root
and is therefore non-stationary. Performing this test on the UNRATE series yields a p-value
of 0.2148. We cannot reject the null hypothesis and therefore treat the series as
non-stationary. Hence a transformation is necessary. Taking the difference of the
unemployment rate between month i and month i − 1, we create a new variable named
"diffUNRATE". This new variable yields a p-value of 0.01 for the Dickey-Fuller test, meaning
that the null hypothesis of a unit root is rejected. Thus we conclude that the differenced
series is stationary. From now on, when referring to the unemployment rate variable or time
series, we actually refer to the transformed unemployment rate, i.e. "diffUNRATE", unless
stated otherwise. Plotting "diffUNRATE" makes clear that this time series has no trend, as
shown in figure 2a.
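
For reference, the regression behind an (augmented) Dickey-Fuller test can be written as follows. This is the textbook form and only a sketch: the number of lagged differences $k$, and whether a constant or a trend term is included, depend on the R function and settings used, which are not stated here.

```latex
\Delta y_t = \alpha + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0:\ \gamma = 0 \ \text{(unit root)}, \quad H_1:\ \gamma < 0 .
```

A small p-value leads to rejection of $H_0$, which is why the differenced series is treated as stationary while the level series is not.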


For our second variable CES, which denotes the average hourly earnings, we are interested
in the growth rate. Therefore, we start with a log transformation of the series. In order to
check whether the data is stationary, we again perform a Dickey-Fuller test. This leads
to a p-value of 0.99, so we cannot reject the null hypothesis. Hence, we take the first
difference of the log transformed variable. To check whether this difference of the log,
denoted as "difflogces", is stationary, we perform a second Dickey-Fuller test. This leads
to a p-value of 0.01135, which we consider still too high to reject the null hypothesis (it
lies above 0.01, although below 0.05). Hence, we take second differences in order to obtain
stationary data. These second differences are given by the variable named "diffdifflogCES".
To confirm that the data is now stationary, a third Dickey-Fuller test is performed, which
yields a p-value of 0.01. Hence, we reject the null hypothesis and conclude that the data is
stationary. Therefore, when referring to the average hourly earnings variable, we actually
refer to the transformed CES, i.e. "diffdifflogCES", unless stated otherwise. The plot in
figure 2b shows that "diffdifflogCES" is a time series without a trend.
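
The two transformations described above can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used for the report; the function names and the example values are hypothetical.

```python
import math


def diff(series):
    """First difference x[t] - x[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]


def transform_unrate(unrate):
    """First difference of the unemployment rate ("diffUNRATE")."""
    return diff(unrate)


def transform_ces(ces):
    """Second difference of the log of hourly earnings ("diffdifflogCES")."""
    logs = [math.log(v) for v in ces]
    return diff(diff(logs))


# Made-up values, only to show the shapes of the results.
unrate_example = [5.0, 5.2, 5.1, 5.4]
ces_example = [10.0, 10.1, 10.3, 10.4]
print(transform_unrate(unrate_example))  # three first differences
print(transform_ces(ces_example))        # two second differences of the logs
```

Note that each differencing step shortens the series by one observation, so the transformed CES series starts two months later than the original.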

(a) Transformed unemployment rate (b) Transformed average hourly earnings

Figure 2: Transformed UNRATE and CES.

Now that we have generated two stationary series, we investigate the presence of
structural changes in our data. For this, we perform a CUSUM test, whose null hypothesis is
that there are no structural breaks. For the unemployment rate time series this test yields a
p-value of 1, so we cannot reject the null hypothesis and find no evidence of structural
breaks. For the average hourly earnings time series, the p-value equals 0.06686. Testing at
the 5% level, we again cannot reject the null hypothesis, so there is insufficient evidence
of structural breaks in this time series either.

Empirical analysis

Now that the data has been transformed into usable data, it can be used to construct a
model. As stated in the introduction, our empirical analysis mainly focuses on the question:
What is the effect of average hourly earnings on unemployment rates? Therefore the
unemployment rate is the variable of primary interest and is first analyzed on its own,
by finding the best fitting model out of multiple candidate models. Subsequently the
unemployment rate is analyzed jointly with our leading indicator, i.e. the average hourly
earnings variable. In the end, we attempt to forecast the unemployment rate based on the
average hourly earnings.


The ARMA model


Before we can generate a forecast, we have to determine the best fitting model for the
unemployment rate. In order to consider multiple models at once, we use an ARMA(p,q) model
(Autoregressive Moving Average model of orders p and q), since it nests both AR(p)
(Autoregressive) and MA(q) (Moving Average) models: setting q or p to zero yields a pure AR
or MA model, respectively. In order to find a fitting ARMA model, we first plot the
Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of our transformed
data. These plots suggest suitable orders for our ARMA model by showing which lags of the
variable are significant.
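
As an illustration of what the ACF plot displays, the sample autocorrelations and the usual approximate 95% significance band can be computed as below. This is a sketch in plain Python with hypothetical function names; the report's plots were produced in R, and the PACF is not covered here.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    denom = sum(c * c for c in centered)
    acf = []
    for k in range(max_lag + 1):
        num = sum(centered[t] * centered[t + k] for t in range(n - k))
        acf.append(num / denom)
    return acf


def significance_band(n):
    """Approximate 95% band under white noise: +/- 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)


# Made-up alternating series: strongly negative first-order autocorrelation.
example = [1.0, -1.0] * 50
acf = sample_acf(example, max_lag=2)
band = significance_band(len(example))
print([abs(r) > band for r in acf[1:]])  # which lags fall outside the band
```

Lags whose autocorrelation falls outside the band are the ones that appear as significant spikes in an ACF plot.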

(a) ACF (b) PACF

Figure 3: ACF and PACF of diffUNRATE

Inspecting both plots, we observe significant values at lags 12 and 24. To find the "best"
model, we search for the model that attains the minimum value of the Akaike Information
Criterion (AIC) and of the Schwarz (Bayesian) Information Criterion (BIC). The table below
shows, for each criterion, the four orders (p,q) that yielded the lowest values.

Ranked by AIC              Ranked by BIC
(p,q)   AIC                (p,q)   BIC
1,2     -4879.110          1,1     -4857.571
2,1     -4878.364          1,2     -4857.347
1,3     -4877.017          2,1     -4856.601
3,1     -4876.370          2,2     -4851.237
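
For readers who want to see how the two criteria trade off fit against model size, their standard definitions are sketched below. The log-likelihood, parameter count and sample size in the example are made up for illustration and are not the values behind the table.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Schwarz (Bayesian) Information Criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical numbers: log-likelihood 100, 3 parameters, 50 observations.
print(aic(100.0, 3))
print(bic(100.0, 3, 50))
```

Because the BIC penalty per parameter, log n, exceeds the AIC penalty of 2 for any sample of more than a handful of observations, BIC tends to favour smaller models, which is consistent with the ARMA(1,1) topping the BIC ranking only.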

According to the AIC values shown in the table above, the ARMA(1,2) and the ARMA(2,1) are
the most appropriate models. The BIC ranks the more parsimonious ARMA(1,1) first, but places
the ARMA(1,2) and the ARMA(2,1) directly after it, so we continue with these two models. The
ARMA(2,1) model is of the form $Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \varepsilon_t +
\theta_1 \varepsilon_{t-1}$ and the ARMA(1,2) model is of the form $Y_t = c + \phi_1 Y_{t-1}
+ \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}$. Since we find the
AR part of an ARMA model easier to interpret than the MA part, we slightly prefer the
ARMA(2,1) model over the ARMA(1,2) model. Estimating our preferred model yields the
coefficient estimates given in the following equation

$$Y_t = \underset{(0.0149)}{-0.0041} + \underset{(0.0794)}{0.5819}\, Y_{t-1} + \underset{(0.0454)}{0.2204}\, Y_{t-2} - \underset{(0.0744)}{0.5810}\, \varepsilon_{t-1} + \varepsilon_t \qquad \text{(ARMA(2,1) model)}$$

The standard errors are given in brackets below the corresponding coefficient estimates.
The model shows a negative constant, which suggests that the month-to-month difference in
the unemployment rate is slightly negative on average, although the estimate is small
relative to its standard error. Furthermore, we see that the difference in the unemployment
rate of the previous month has a positive effect on $Y_t$. The same goes for the difference
in the unemployment rate of two months before, since $\phi_2$ is also positive. On the other
hand, a shock from the previous month has a negative impact on $Y_t$, as shown by the
negative estimate of $\theta_1$. Looking at the residuals, they appear to follow a white
noise process. In order to further test the adequacy of the model, we performed a test for
serial correlation known as the Ljung-Box test. This test yields a p-value of around 0.07,
so at the 5% level we cannot reject the null hypothesis of no autocorrelation, although the
result is borderline. Since the roots of the characteristic equation of the AR part of the
ARMA(2,1) model are equal to 3.8261 and 1.1859 in absolute value, and thus lie outside the
unit circle, we conclude that the ARMA(2,1) model is stable. Furthermore, for the MA part of
the ARMA(2,1) model, the root of the characteristic equation is equal to 1.7212, which also
lies outside the unit circle. Hence we can also conclude that the ARMA(2,1) model is
invertible. The residuals of this model and their ACF are given in figure 4.
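
The reported roots can be checked directly from the coefficient estimates. The sketch below solves the AR polynomial $1 - \phi_1 z - \phi_2 z^2 = 0$ and the MA polynomial $1 + \theta_1 z = 0$ in plain Python; the helper name `quadratic_roots` is hypothetical and the calculation is only a cross-check, not the R code used for the report.

```python
import cmath


def quadratic_roots(a, b, c):
    """Roots of a*z**2 + b*z + c = 0 (a != 0), possibly complex."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


# Coefficient estimates of the ARMA(2,1) model given above.
phi1, phi2, theta1 = 0.5819, 0.2204, -0.5810

# AR polynomial: 1 - phi1*z - phi2*z^2 = 0, i.e. -phi2*z^2 - phi1*z + 1 = 0.
ar_roots = quadratic_roots(-phi2, -phi1, 1.0)
ar_moduli = sorted(abs(r) for r in ar_roots)

# MA polynomial: 1 + theta1*z = 0, i.e. z = -1/theta1.
ma_root = -1.0 / theta1

print([round(m, 4) for m in ar_moduli])  # [1.1859, 3.8261]
print(round(abs(ma_root), 4))            # 1.7212
```

One of the two AR roots is negative (about -3.8261), so the values quoted in the text are best read as moduli; all of them exceed 1, which is the condition for stability and invertibility.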

Figure 4: Checkresiduals of ARMA(2,1)

The plots in figure 4 show some significant outliers; this may be due to the seasonal
adjustment of the data. All in all, we conclude that the best fitting ARMA model for the
unemployment rate variable is the ARMA(2,1) model, which is stable as well as invertible
and shows little evidence of residual autocorrelation.
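
For reference, the portmanteau statistics used in this report to test for residual autocorrelation have the following standard definitions, where $\hat{\rho}_k$ is the sample autocorrelation of the residuals at lag $k$, $n$ is the number of observations and $m$ is the number of lags tested. The choice of $m$ is not reported here, so it is left as a symbol.

```latex
Q_{BP} = n \sum_{k=1}^{m} \hat{\rho}_k^{\,2},
\qquad
Q_{LB} = n\,(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^{\,2}}{n-k} .
```

Under the null hypothesis of no autocorrelation both statistics are approximately $\chi^2$ distributed, with the degrees of freedom reduced by the number of estimated ARMA parameters when the test is applied to model residuals.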
The ARDL Model
Now that the best fitting ARMA model has been obtained and analyzed, we continue our analysis
of the unemployment rate time series by finding the best fitting Autoregressive Distributed
Lag (ARDL) model. In contrast to the ARMA model, the dependent variable in an ARDL model
depends not only on its own lags, but also on an explanatory variable $X_t$ and its lags. For
our ARDL model, the explanatory variable is the average hourly earnings variable CES. We
begin our search by examining models with different numbers of lags. We impose the
restriction that the lag order is the same for both variables. We also assume that there
exists no contemporaneous relation between the unemployment rate and the average hourly
earnings. Performing Breusch-Godfrey tests for serial correlation on the models with 3, 4 and
5 lags shows that the highest p-value is obtained for the model with 4 lags, i.e. this model
shows the least evidence of serial correlation. Hence, we select the model with 4 lags. Next,
we check the residuals of this ARDL(4,4) model for autocorrelation by performing a Box-Pierce
test as well as a Box-Ljung test, which both yield a p-value of 0.9971. Hence we do not
reject the null hypothesis of either test and conclude that there is no evidence of
autocorrelation in the residuals. Now that we have determined the number of lags for the
ARDL model and checked for autocorrelation, we construct the ARDL(4,4) model. An ARDL(4,4)
model is generally given by

$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \phi_3 Y_{t-3} + \phi_4 Y_{t-4} + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \beta_3 X_{t-3} + \beta_4 X_{t-4} + \varepsilon_t,$$

where $\varepsilon_t$ is an error term. Since we assumed that a contemporaneous relation
between the unemployment rate and average hourly earnings does not exist, the coefficient
$\beta_0$ is set to zero and therefore does not appear in our model. Estimation of our
ARDL(4,4) model using R then yields the following equation

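$$\begin{aligned}
Y_t = {} & \underset{(0.0071)}{-0.0002} - \underset{(0.0425)}{0.0235}\, Y_{t-1} + \underset{(0.0418)}{0.1944}\, Y_{t-2} + \underset{(0.0414)}{0.1602}\, Y_{t-3} + \underset{(0.0425)}{0.1632}\, Y_{t-4} \\
& - \underset{(1.9290)}{3.5560}\, X_{t-1} - \underset{(2.5506)}{4.7586}\, X_{t-2} - \underset{(2.5266)}{1.2433}\, X_{t-3} - \underset{(1.8840)}{1.1270}\, X_{t-4}
\end{aligned}$$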

where the standard errors are given below the corresponding coefficient estimates. Similar
to the ARMA(2,1) model, the estimated constant of our ARDL model is negative. On the other
hand, the coefficient of the previous month's difference in the unemployment rate,
$Y_{t-1}$, is now negative instead of positive as in the ARMA(2,1) model, although it is
small relative to its standard error. The coefficient of $Y_{t-2}$ is positive in the
ARMA(2,1) model as well as in the ARDL(4,4) model, and the same holds for the coefficients
of $Y_{t-3}$ and $Y_{t-4}$ in the ARDL(4,4) model. Looking at the impact of our second
explanatory variable, the average hourly earnings, we find that every coefficient estimate
of the lags of the average hourly earnings variable is negative. This means that a rise in
the second difference of the log of average hourly earnings is associated with a decline in
the difference in the unemployment rate, $Y_t$. Since we are interested in the effect of
average hourly earnings on unemployment rates, this result is very important. It should
however be noted that the standard errors of these coefficient estimates are fairly high:
none of the four estimates exceeds two standard errors in absolute value, so the estimated
effect is imprecise and the model might not be a good fit. All in all, we conclude that the
best fitting ARDL model is given by the ARDL(4,4) model shown above.
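
To make the remark about the standard errors concrete, the ratio of each earnings coefficient to its standard error can be computed from the estimates above. The sketch below is plain Python; the dictionary keys are hypothetical labels, and 1.96 is used as the usual approximate 5% two-sided critical value under a normal approximation.

```python
# (estimate, standard error) pairs for the lags of X in the ARDL(4,4) equation.
earnings_lags = {
    "X[t-1]": (-3.5560, 1.9290),
    "X[t-2]": (-4.7586, 2.5506),
    "X[t-3]": (-1.2433, 2.5266),
    "X[t-4]": (-1.1270, 1.8840),
}


def t_ratio(estimate, std_error):
    """Coefficient estimate divided by its standard error."""
    return estimate / std_error


t_ratios = {name: t_ratio(est, se) for name, (est, se) in earnings_lags.items()}

for name, t in t_ratios.items():
    # Flag whether the ratio exceeds the approximate 5% critical value.
    print(name, round(t, 3), abs(t) > 1.96)
```

All four ratios are negative but smaller than 1.96 in absolute value, so under this approximation none of the earnings lags is individually significant at the 5% level; a joint test of the four lags would be needed to judge their combined effect.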
Forecasting with the ARMA and ARDL models
Since we concluded that the ARMA(2,1) and the ARDL(4,4) are the best fitting models for
our data, we now extend our analysis by forecasting with both types of model. We begin with
the best fitting ARMA model, the ARMA(2,1). The forecasts are based on quarterly instead of
monthly data, running from 1960 up to the last quarter of 1999 for both variables UNRATE and
CES. Using this data, we forecast the period from the first year of the twenty-first century
onwards, in order to analyze the 'out-of-sample' performance of our models. We construct two
types of forecast: the 1-step and the 4-step ahead quarterly forecast. A 1-step ahead
forecast predicts the data one quarter (3 months) into the future and a 4-step ahead forecast
predicts one year into the future. We compute both the ARMA(2,1) 1-step and 4-step ahead
forecasts from the beginning of 2000 up to the end of 2006 and compare them to the realized
data for this period using the Mean Squared Error (MSE). This yields an MSE of 0.0432 for the
1-step ahead forecast and an MSE of 0.0439 for the 4-step ahead forecast. Hence, the 1-step
ahead ARMA(2,1) forecast is slightly better in the sense that it generates a lower mean
squared error.
Now that we have calculated the mean squared errors of the ARMA(2,1) 1-step and 4-step ahead
forecasts, we apply the same measure to the forecasts of an ARDL(1,1) model. The reason we
choose an ARDL(1,1) model rather than our obtained ARDL(4,4) is that we use quarterly data
for our forecasts instead of the monthly data from which the ARDL(4,4) model was obtained.
The ARDL(1,1) model yields an MSE of 0.04650332 for the 1-step ahead forecast and an MSE of
0.07848811 for the 4-step ahead forecast. Having constructed the forecasts for both the
ARMA(2,1) and the ARDL(1,1) model, we can compare them using the Diebold-Mariano test. The
null hypothesis of the Diebold-Mariano test is that both forecasts have the same predictive
accuracy. We begin by comparing the ARMA(2,1) and ARDL(1,1) 1-step ahead forecasts.
Performing the Diebold-Mariano test on these two forecasts yields a p-value of 0.4568. Hence,
we cannot reject the null hypothesis of equal predictive accuracy. Performing the same test
on the ARMA(2,1) and ARDL(1,1) 4-step ahead forecasts yields a p-value of 0.4926. Again, we
cannot reject the null hypothesis, so for neither horizon do we find a significant difference
in predictive accuracy between the two models.
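
The two accuracy measures used above can be sketched as follows. This is a simplified illustration in plain Python and not the R routine behind the reported p-values: it uses squared-error loss, a normal approximation for the p-value, and no correction for autocorrelation in the loss differential, which a 4-step ahead comparison would normally require. The function names and the example data are hypothetical.

```python
import math


def mse(actual, forecast):
    """Mean squared error between realized values and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def diebold_mariano(actual, forecast_a, forecast_b):
    """Simplified Diebold-Mariano statistic with a two-sided normal p-value.

    Assumes 1-step ahead forecasts (no autocovariance correction) and
    squared-error loss; requires a non-constant loss differential.
    """
    d = [(a - fa) ** 2 - (a - fb) ** 2
         for a, fa, fb in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n
    stat = d_bar / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Made-up data, only to show how the functions are called.
actual = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 1.9, 3.2, 3.9]
model_b = [1.0, 2.5, 2.9, 4.4]
print(mse(actual, model_a), mse(actual, model_b))
print(diebold_mariano(actual, model_a, model_b))
```

A negative statistic indicates that the first forecast has the lower average loss; a large p-value means that the difference is not statistically distinguishable from zero, which is the situation reported above for both horizons.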
The VAR model
In order to capture the linear interdependencies between the unemployment rate and the
average hourly earnings time series, we construct our last model, the Vector Autoregressive
(VAR) model. Since we obtained an ARDL(4,4) model as the best fitting ARDL model, we consider
a VAR model with the same number of lags, namely 4. In general, a VAR(4) model is constructed
as follows, where the unemployment rate time series is denoted by $UNR$ and the average
hourly earnings time series by $CES$

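$$\begin{pmatrix} UNR_t \\ CES_t \end{pmatrix} = \mu + \Phi_1 \begin{pmatrix} UNR_{t-1} \\ CES_{t-1} \end{pmatrix} + \Phi_2 \begin{pmatrix} UNR_{t-2} \\ CES_{t-2} \end{pmatrix} + \Phi_3 \begin{pmatrix} UNR_{t-3} \\ CES_{t-3} \end{pmatrix} + \Phi_4 \begin{pmatrix} UNR_{t-4} \\ CES_{t-4} \end{pmatrix} + u_t$$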

Estimation of the VAR(4) model in R yields the coefficient matrices given in the following
equation.

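$$\begin{aligned}
\begin{pmatrix} UNR_t \\ CES_t \end{pmatrix} = {} & \begin{pmatrix} -0.0002 \\ 0.0000 \end{pmatrix} + \begin{pmatrix} -0.0235 & -3.5560 \\ -0.0010 & -0.9600 \end{pmatrix} \begin{pmatrix} UNR_{t-1} \\ CES_{t-1} \end{pmatrix} + \begin{pmatrix} 0.1944 & -4.7586 \\ 0.0007 & -0.7687123 \end{pmatrix} \begin{pmatrix} UNR_{t-2} \\ CES_{t-2} \end{pmatrix} \\
& + \begin{pmatrix} 0.1602 & -1.2433 \\ 0.0029 & -0.4452 \end{pmatrix} \begin{pmatrix} UNR_{t-3} \\ CES_{t-3} \end{pmatrix} + \begin{pmatrix} 0.1632 & -1.1270 \\ 0.0001 & -0.2094 \end{pmatrix} \begin{pmatrix} UNR_{t-4} \\ CES_{t-4} \end{pmatrix} + u_t
\end{aligned}$$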

Since our VAR(4) model has the same number of lags as our previously obtained ARDL(4,4)
model and both include the lagged average hourly earnings as explanatory variable, we can
compare the coefficient estimates of both models. Placing the first row of each coefficient
matrix of the VAR(4) model, i.e. the unemployment rate equation, next to the coefficient
estimates of our ARDL(4,4) model, one can easily see that they coincide. Hence, the
coefficient estimates of the unemployment rate equation of the VAR(4) are equal to those of
our previously obtained ARDL(4,4) model. This is not very surprising, keeping in mind the
restrictions we imposed on the ARDL(4,4) model: with equal lag orders and no contemporaneous
term, the ARDL(4,4) regression contains exactly the same regressors as the unemployment rate
equation of the VAR(4). In order to study the stability of our VAR(4) model, we examine the
roots of the characteristic polynomial using R. The R code yields the eigenvalues of the
VAR(4) model (in absolute value), which are the inverses of the roots of the characteristic
polynomial. Because all reported values are less than 1, all roots lie outside the unit
circle and we conclude that the VAR(4) model is stable. Along with this stability check, we
also performed the Portmanteau / Ljung-Box test to check whether there is any evidence of
autocorrelation. Since this test yielded a p-value of 0.000001714, we reject the null
hypothesis of no autocorrelation in the residuals. Hence, there is evidence of
autocorrelation in our VAR(4) model. All in all, we conclude that the coefficient estimates
of the unemployment rate equation of the VAR(4) are equal to those of the ARDL(4,4) model,
that the VAR(4) model is stable and that there is evidence of autocorrelation in the VAR(4)
model.
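
The link between eigenvalues and characteristic roots used in the stability argument can be made explicit with the companion form of the VAR(4). The block below is a standard textbook construction, given as a sketch; $I_2$ denotes the $2 \times 2$ identity matrix and $F$ is a name introduced here for the companion matrix.

```latex
F = \begin{pmatrix}
\Phi_1 & \Phi_2 & \Phi_3 & \Phi_4 \\
I_2 & 0 & 0 & 0 \\
0 & I_2 & 0 & 0 \\
0 & 0 & I_2 & 0
\end{pmatrix},
\qquad
\det\!\left( I_2 - \Phi_1 z - \Phi_2 z^2 - \Phi_3 z^3 - \Phi_4 z^4 \right) = 0 .
```

Every nonzero eigenvalue $\lambda$ of $F$ corresponds to a root $z = 1/\lambda$ of the characteristic polynomial, so the statement that all eigenvalues have modulus below 1 is equivalent to the statement that all roots lie outside the unit circle.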

Conclusion

In this paper we attempted to describe the unemployment rate in the United States of America
based on historic unemployment rates and average hourly earnings. After transforming the
data into stationary series, we first determined the best fitting Autoregressive Moving
Average (ARMA) model, which turned out to be the ARMA(2,1) model. From this first model
we were able to conclude that the difference in the unemployment rate of the current month
is positively related to that of the previous months. We then continued our search by
introducing our leading indicator, i.e. the average hourly earnings, as an explanatory
variable, and by determining the best fitting Autoregressive Distributed Lag (ARDL) model,
which yielded the ARDL(4,4) model, a model with little evidence of autocorrelation. As a
result of the two restrictions imposed on the ARDL(4,4) model, its coefficient estimates were
equal to those of the unemployment rate equation of the Vector Autoregressive (VAR) model of
order 4 that we obtained later on. For this VAR(4) model, however, we found evidence of
autocorrelation. Furthermore, forecasts were constructed with the ARMA(2,1) model and an
ARDL(1,1) model and compared to the realized data by means of the mean squared error. For
the 1-step ahead as well as for the 4-step ahead forecast, we could not reject that the
ARMA(2,1) and the ARDL(1,1) forecasts are equally accurate. Although the accuracy of the
models may be questioned, due to high standard errors and evidence of autocorrelation, we
conclude, since the coefficient estimates of CES in both the ARDL(4,4) and the VAR(4) model
are negative, that the average hourly earnings are negatively related to the unemployment
rate.

References

Manfred Gärtner, Macroeconomics, 5th Edition, Pearson Hall, 2016.

FRED (2019, May 15). FRED-MD: A Monthly Database for Macroeconomic Research. Retrieved
from https://s3.amazonaws.com/real.stlouisfed.org/wp/2015/2015-012.pdf

Fumio Hayashi, Econometrics, 1st Edition, Princeton University Press, 2009.

An article entitled 'High Minimum Wage Equals High Unemployment', written by Craig
Garthwaite, published in December 2003 in the Seattle Post-Intelligencer. URL:
https://www.epionline.org/oped/o76/

An online article entitled "How to spot a recession", published on June 11th 2019 on the
website of The Economist. URL:
https://www.economist.com/graphic-detail/2019/06/11/how-to-spot-a-recession

A report entitled "Direct Stimulus Payments to Individuals", written by Claudia Sahm,
produced by The Hamilton Project and published on May 6th 2019. URL:
https://www.brookings.edu/wp-content/uploads/2019/05/ES_THP_Sahm_web_20190506.pdf
