
MAE 301: Applied Experimental Statistics

Professor James Middleton


Final Report

Statistical Analysis of Economic Metrics from the Bureau of Labor Statistics

Matthew Johnson
Fall 2021
Arizona State University

Executive Summary

The purpose of this project was to analyze several economic metrics published by the Bureau of Labor Statistics (BLS) and to use statistical methods to determine whether any relationships exist between them. The metrics tested are average hourly earnings, the unemployment rate, the job opening rate, the hire rate, the quit rate, the consumer price index, and the producer price index. The earnings and labor-rate metrics are separated by industry, so that the effect of the industry one works in can be analyzed as well. Together these metrics give a good general snapshot of the economy and should offer decent insight into its health and into the relationships that may govern some of the trends observable in our daily lives. A regression analysis is performed on all pairs of variables that display a possible relationship, to determine how strongly the two metrics correlate. Where possible, a mathematical model is constructed from this analysis so that one metric can be predicted from another.

Introduction

Every month the Bureau of Labor Statistics (BLS) compiles several metrics into what is referred to as the ‘jobs report’. This report is used by business leaders and by elected and government officials to inform their business and policy decisions. I think it would be interesting to apply statistical methods to determine whether there are correlations between any of the metrics in the data and, if possible, to create a mathematical model for aspects of the economy. Common economics textbooks present general assumptions and relationships that don’t always seem to correspond to reality, so it would be interesting to investigate whether these hypotheses are corroborated by the data.

The metrics tested are the ones that I believe provide a good snapshot of the general economy and labor force of the United States: average hourly earnings, the unemployment rate, the job opening rate, the hire rate, the quit rate, the consumer price index, and the producer price index. Average hourly earnings, the unemployment rate, the job opening rate, the hire rate, and the quit rate are each broken down by industry to explore how each statistic varies from industry to industry. The industries explored are construction, manufacturing, financial activities, information, education and health services, and leisure and hospitality. These industries provide a good spread of both white-collar and blue-collar jobs and give a general snapshot of each industry.

Economics textbooks teach the laws of supply and demand, which we would expect to see in how the labor market operates. In particular, we would expect a linear relationship between the job opening rate and average hourly earnings: if there is a labor shortage, workers have more bargaining power over their wages.

There has been significant uproar about inflation in the news lately, and it would be interesting to see whether the panic is warranted. Inflation is often measured through the consumer price index (CPI), which measures the change in price of a typical basket of consumer products relative to a base period; the BLS index uses 1982-84 as its reference base. It would be concerning to see anything other than a linear relationship between CPI and time, or between CPI and average hourly earnings, because CPI rising faster than a linear trend could be a sign of high inflation.

Naturally, one would expect a relationship between the quit rate and the job opening rate: as more people quit, more jobs should open, possibly in a linear relationship. One would also expect a possible inverse relationship between the hire rate and the unemployment rate: as more people become unemployed, one would expect fewer people to be hired. Anecdotally, one can observe a time dependency in average hourly earnings and the consumer price index, as both have increased across the board over the studied time period.

Procedure

I began by collecting metrics that were available over the same period. All metrics are indexed by date, so different metrics can be compared at the same moment in time. The period used for this project is about fifteen years, from March 2006, when more comprehensive statistics became available, until September 2021, around when this project was started. Time was measured in years from the start date, which puts time on a similar order of magnitude as the rates it is compared to and improves the robustness of the results.
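The time axis described above can be sketched as follows. This is a minimal illustration and not the code used for the project: the function name `years_since_start` and the choice to count each month as 1/12 of a year are assumptions.

```python
from datetime import date

# First month of the study period, as stated in the text.
START = date(2006, 3, 1)

def years_since_start(d, start=START):
    """Elapsed time in years, counting each whole month as 1/12 of a year."""
    months = (d.year - start.year) * 12 + (d.month - start.month)
    return months / 12.0

# One observation per month from March 2006 through September 2021.
observation_months = [
    date(2006 + (2 + i) // 12, (2 + i) % 12 + 1, 1) for i in range(187)
]
time_axis = [years_since_start(d) for d in observation_months]
```

With this convention the time axis runs from 0 to 15.5 years, the same order of magnitude as the percentage rates it is compared against.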

To begin analyzing the dataset, I created scatter plots of each metric against every other metric to determine visually whether any of the variables are correlated. MATLAB’s regression function was used with alpha equal to 0.1. This alpha was chosen because the data span a broad time period and are naturally prone to outliers, and because only general trends are being examined. The best-fit lines from the regression are graphed on the scatter plots along with the coefficient of determination (R²) of each fit.
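As a rough illustration of the fit and the R² value reported on each plot, the sketch below computes an ordinary least-squares line in plain Python. It is not the MATLAB routine used in the project and it omits the confidence intervals that alpha controls. The function name `fit_line` and the sample numbers are made up for illustration and are not BLS data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Made-up values for illustration only (not BLS data).
time_years = [0, 1, 2, 3, 4, 5]
earnings = [20.1, 20.4, 21.0, 21.3, 21.9, 22.2]
slope, intercept, r2 = fit_line(time_years, earnings)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```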

Any best-fit line with R² greater than 0.9 is noted; a threshold of 0.9 is used because we are looking only for general trends. Graphs that display possible nonlinear correlations are also noted. Average hourly earnings has a strong correlation (R² > 0.9) with CPI and with time across all industries. There appears to be a possible relationship with the job opening rate and the quit rate as predictors of the inverse of the unemployment rate. The inverse of the unemployment rate is therefore graphed against the job opening rate and the quit rate to determine whether this manipulation linearizes the data. Although the correlation between the variables increased with this manipulation, it is still not strong enough to constitute a linear model. There also appears to be a possible relationship with the hire rate and the job opening rate as predictors of the quit rate, so a log-log scale is applied to these variables in an attempt to linearize the data. Again, the correlation increased with this manipulation but is still not strong enough to constitute a linear model.
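The two linearizing manipulations described above (taking the inverse of one variable, and taking logarithms of both) can be sketched as below. The data are synthetic and chosen so that the transformation works perfectly, which was not the case for the BLS data. The helper name `r_squared` is an assumption.

```python
import math

def r_squared(x, y):
    """R^2 of a simple least-squares line (the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Synthetic power law, y = 2 * x^1.5: curved on linear axes, straight on log-log axes.
x = [0.5, 1.0, 2.0, 4.0, 8.0]
y = [2.0 * v ** 1.5 for v in x]
r2_raw = r_squared(x, y)
r2_loglog = r_squared([math.log(v) for v in x], [math.log(v) for v in y])

# Synthetic reciprocal relation, u = 10 / q: plotting 1/u against q gives a straight line.
q = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [10.0 / v for v in q]
r2_direct = r_squared(q, u)
r2_inverse = r_squared(q, [1.0 / v for v in u])
```

In both synthetic cases the transformed R² is essentially 1, while the untransformed R² is lower. In the report the same manipulations raised R² but left it well below the 0.9 threshold.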

Results

All tables used to make the graphs can be found in the attached appendix file, along with all other graphs. Because of the expansiveness of the data, only charts of interest are featured in this report; all other figures referenced include a page number in the appendix.

The regression analysis can be found in the Appendix on pages 68-85. The residual analysis and Q-Q plots are plotted by industry.

[Figure: Time vs. Average Hourly Earnings ($). x-axis: Time; y-axis: Average Hourly Earnings ($). R² by industry: Construction 0.970; Manufacturing 0.979; Financial Activities 0.966; Information 0.966; Education & Health Services 0.989; Leisure & Hospitality 0.923.]

[Figure: Consumer Price Index vs. Average Hourly Earnings ($). x-axis: Consumer Price Index; y-axis: Average Hourly Earnings ($). R² by industry: Construction 0.967; Manufacturing 0.970; Financial Activities 0.970; Information 0.952; Education & Health Services 0.987; Leisure & Hospitality 0.929.]

[Figure: log(Job Opening Rate) vs. log(Quit Rate). x-axis: log(Job Opening Rate); y-axis: log(Quit Rate). R² by industry: Construction 0.400; Manufacturing 0.600; Financial Activities 0.205; Information 0.172; Education & Health Services 0.598; Leisure & Hospitality 0.538.]

[Figure: log(Hiring Rate) vs. log(Quit Rate). x-axis: log(Hiring Rate); y-axis: log(Quit Rate). R² by industry: Construction 0.059; Manufacturing 0.643; Financial Activities 0.573; Information 0.372; Education & Health Services 0.649; Leisure & Hospitality 0.358.]

[Figure: Quit Rate (%) vs. 1/(Unemployment Rate). x-axis: Quit Rate (%); y-axis: 1/(Unemployment Rate). R² by industry: Construction 0.460; Manufacturing 0.291; Financial Activities 0.146; Information 0.051; Education & Health Services 0.078; Leisure & Hospitality 0.192.]

[Figure: Job Opening Rate (%) vs. 1/(Unemployment Rate). x-axis: Job Opening Rate (%); y-axis: 1/(Unemployment Rate). R² by industry: Construction 0.456; Manufacturing 0.265; Financial Activities 0.415; Information 0.207; Education & Health Services 0.274; Leisure & Hospitality 0.093.]

Conclusion

The only variables that act as predictors of one another are time, the consumer price index, and average hourly earnings. While there appears to be some correlation between other variables, there was too much error to create a linear model. The outliers tend to follow a definite pattern, with the largest errors corresponding to the years 2008 and 2020. It makes sense that outliers would appear during these times, as they correspond to the financial crisis and recession that started in 2007 and to the COVID-19 pandemic and ensuing recession.
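One simple way to locate outliers of this kind is to fit a line and flag observations whose residual is large relative to the residual standard deviation. The sketch below is an illustration under assumptions, not the residual analysis from the appendix: the function name `flag_outliers`, the threshold of two standard deviations, and the data are all made up.

```python
import math

def flag_outliers(x, y, threshold=2.0):
    """Fit a least-squares line and return the indices of points whose
    residual exceeds `threshold` residual standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * sd]

# Made-up series: a straight line with one shocked observation at index 5.
x = list(range(10))
y = [2 * v + 1 for v in x]
y[5] += 5
shocked = flag_outliers(x, y)
```

Applied to a real residual series, the flagged indices could be mapped back to dates to check whether they cluster around 2008 and 2020.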

Industry                       Linear Model

Construction                   Earnings = 0.0664(CPI) + 0.3705(Time) + 8.6323

Manufacturing                  Earnings = 0.0353(CPI) + 0.3844(Time) + 13.6703

Financial Activities           Earnings = 0.1415(CPI) + 0.3682(Time) − 5.0898

Information                    Earnings = 0.0327(CPI) + 1.0425(Time) + 18.8633

Education & Health Services    Earnings = 0.0675(CPI) + 0.3066(Time) + 8.6323

Leisure & Hospitality          Earnings = 0.0604(CPI) + 0.1277(Time) − 0.77
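To show how the models in the table are used, the sketch below evaluates them for a given CPI value and elapsed time. The coefficients are copied from the table. The function name `predict_earnings` and the example inputs (CPI of 250, 10 years) are hypothetical and chosen only for illustration.

```python
# (CPI coefficient, Time coefficient, intercept), copied from the table above.
MODELS = {
    "Construction": (0.0664, 0.3705, 8.6323),
    "Manufacturing": (0.0353, 0.3844, 13.6703),
    "Financial Activities": (0.1415, 0.3682, -5.0898),
    "Information": (0.0327, 1.0425, 18.8633),
    "Education & Health Services": (0.0675, 0.3066, 8.6323),
    "Leisure & Hospitality": (0.0604, 0.1277, -0.77),
}

def predict_earnings(industry, cpi, years):
    """Predicted average hourly earnings in dollars from the fitted linear model."""
    b_cpi, b_time, intercept = MODELS[industry]
    return b_cpi * cpi + b_time * years + intercept

# Hypothetical inputs: CPI of 250, ten years after the start date.
for name in MODELS:
    print(f"{name}: ${predict_earnings(name, 250.0, 10.0):.2f}")
```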

In these models, earnings are the average hourly earnings measured in dollars, CPI is the consumer price index, and time is measured in years since 03/01/06. The equations show that earnings are most strongly correlated with time, so the most significant predictors of earnings are the industry one is in and the years since 03/01/06. Much of the residual follows the pattern of economic downturns in this country, which makes sense: a downturn would be expected to affect earnings, and it is an effect not accounted for by this model. The most robust growth with respect to time is in the information industry, which makes sense anecdotally given the growth of the technology sector and the need for data science. Financial Activities shows the strongest dependence on the consumer price index, which could be an interesting phenomenon to investigate further. A possible source of type I error is overestimating the influence of time while not accounting for another dependency. Another source could be collinearity between the two predictors: it is possible that one predictor simply predicts the other, which increases the error and reduces our confidence in rejecting the null hypothesis.
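A quick check for the collinearity concern raised above is to correlate the two predictors with each other and compute the variance inflation factor (VIF), which for a model with exactly two predictors reduces to 1 / (1 − r²). The sketch below uses made-up CPI values that rise almost linearly with time. It was not part of the original analysis, and the function names are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Made-up predictors for illustration: CPI rising almost linearly with time.
time_years = [0, 1, 2, 3, 4, 5]
cpi = [200.0, 204.5, 207.8, 212.6, 215.9, 220.4]
vif = vif_two_predictors(time_years, cpi)
```

A VIF well above about 10 is a common rule-of-thumb sign that the individual coefficients on time and CPI cannot be separated reliably, even when the combined fit has a high R².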

References

https://www.bls.gov/

https://www.sciencedirect.com/science/article/pii/S2212567113001974

Middleton, James. Lecture 10: Simple Linear Regression. MAE 301: Applied Experimental Statistics. Arizona State University.

Middleton, James. Lecture 11: The General Linear Model. MAE 301: Applied Experimental Statistics. Arizona State University.
