The purpose of this project was to analyze several economic metrics
provided by the Bureau of Labor Statistics (BLS) and use statistical methods to determine
whether there are any relationships between them. The metrics tested are
the average hourly earnings, unemployment rate, job opening rate, hire rate, quit rate,
consumer price index, and producer price index. The labor metrics are separated by industry,
so that the industry one works in can be analyzed as well. These metrics provide a good
general snapshot of the economy and should provide decent insight into its health
and into the relationships that may govern some of the trends
observed in our daily lives. A regression analysis will be performed on all pairs of variables that
display a possible relationship, to determine the degree to which the two metrics correlate
with each other. If possible, a mathematical model will be constructed from this
analysis that allows one to predict certain metrics from another metric.
Introduction
Every month the Bureau of Labor Statistics (BLS) compiles several metrics into
what is referred to as the ‘jobs report’. This report is used by business leaders and by
elected and government officials to inform their business and policy decisions. I think it
would be interesting to apply statistical methods to determine if there are any correlations
between any of the metrics in the data and, if possible, create a mathematical model for
aspects of the economy. There are general assumptions and relationships that are
the data.
The metrics that will be tested are the ones that I believe provide a good
snapshot of the general economy and labor force of the United States: the
average hourly earnings, unemployment rate, job opening rate,
hire rate, quit rate, consumer price index, and producer price index. The average
hourly earnings, unemployment rate, job opening rate, hire rate, and quit rate are each
broken down by industry, to explore how each statistic varies
from industry to industry. The industries explored are construction, manufacturing,
financial activities, information, education and health services, and leisure and
hospitality. These industries provide a good spread of both white-collar and blue-collar
Economics textbooks teach the laws of supply and demand, which we would
expect to see in how the labor market operates. We would expect this to be
reflected in a linear relationship between the job opening rate and average
hourly earnings. The explanation is that, if there’s a labor shortage, workers have more
interesting to see if all the panic is warranted. Inflation is often measured through the
consumer price index (CPI), which measures the change in the price of a typical
basket of consumer products relative to a base period; the BLS uses 1982-84 as the base period for the index. It
would be concerning to see anything other than a linear relationship between CPI and
time, or between CPI and average hourly earnings, because if the CPI grew faster than linearly, it could be a sign
of high inflation.
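To make the idea of a basket-based index concrete, here is a minimal Python sketch. It assumes a deliberately simplified fixed-basket index (current cost of the basket divided by its base-period cost, scaled to 100); the actual BLS methodology is considerably more involved. The item names, quantities, prices, and function names are all hypothetical.

```python
def basket_cost(prices, quantities):
    """Total cost of buying the fixed basket at the given prices."""
    return sum(prices[item] * qty for item, qty in quantities.items())


def price_index(cost_now, cost_base):
    """Basket cost relative to its base-period cost, scaled so the base period = 100."""
    if cost_base <= 0:
        raise ValueError("base-period cost must be positive")
    return 100.0 * cost_now / cost_base


# Hypothetical basket (item -> quantity) and hypothetical prices in dollars.
quantities = {"bread": 10, "fuel": 20}
base_prices = {"bread": 2.00, "fuel": 3.00}
current_prices = {"bread": 2.50, "fuel": 3.75}

index = price_index(
    basket_cost(current_prices, quantities),
    basket_cost(base_prices, quantities),
)
print(index)  # 125.0: the basket costs 25% more than in the base period
```

An index above 100 means the same basket costs more than it did in the base period, which is why a rising CPI is read as a sign of inflation.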
Naturally one would expect a relationship between the quit rate
and the job opening rate: as more people quit, one would expect more jobs to open. This
would possibly be a linear relationship. There is also a possible inverse relationship between the
hire rate and the unemployment rate: as more people become unemployed, one would
expect fewer people to be hired. Anecdotally one can observe a time dependency of
average hourly earnings and consumer price index, as they have increased across the
Procedure
I began by collecting different metrics that were available over the same period.
All metrics were indexed by date, so different metrics at the same moment in time
can be compared. The time period used for this project is around fifteen years, as more
2021, around when this project was started. To improve the robustness of the results,
time was measured in years from the start date, as that put time on a similar magnitude as
To begin analyzing this dataset, I created scatter plots of each of the
metrics plotted against each other, to determine visually whether there are any correlations
between the variables. MATLAB’s regression function was used with alpha equal to
0.1; this alpha was chosen because the data cover a broad time period and are
naturally prone to outliers, and only general trends are being examined. The best-fit
lines from the regression are graphed with the scatter plots, along with the
coefficient of determination (R²).
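As a rough illustration of the fit-and-score step described above, the following is a minimal Python sketch of an ordinary least-squares line and its R², the goodness-of-fit value reported on the plots. It is not the MATLAB code used in this project; the function name `fit_line` and the sample numbers are hypothetical and are not BLS data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y values must not all be identical")
    r_squared = 1.0 - ss_res / ss_tot

    return slope, intercept, r_squared


# Hypothetical example: years since the start date vs. hourly earnings ($).
years = [0, 1, 2, 3, 4]
earnings = [20.0, 20.6, 21.1, 21.9, 22.4]

slope, intercept, r_squared = fit_line(years, earnings)
is_strong = r_squared > 0.9  # hypothetical flag for fits treated as strong
```

The same function can be run once per industry and per pair of metrics, which mirrors how one fit is drawn per series on each scatter plot.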
Any best-fit lines with an R² greater than 0.9 are noted; a
threshold of 0.9 is used because we are looking only for general trends. Graphs that
display possible nonlinear correlations are also noted. Average hourly earnings have a
strong correlation (R² > 0.9) with CPI and time, across all industries. There appears to
be a possible relationship with the job opening rate and quit rate as predictors for the inverse of
the unemployment rate. The inverse of the unemployment rate is graphed against the job
opening rate and quit rate, to determine whether this manipulation linearizes the data. Although
the correlation between the variables increased with this manipulation, it is still not
strong enough to justify a linear model. There also appears to be a
possible relationship with the hire rate and job opening rate as predictors of the quit
rate; a log-log scale is applied to the variables to attempt to linearize the data.
Although the correlation between the variables increased with this manipulation, it still
Results
All tables that were used to make the graphs can be found in the attached
appendix file, along with all other graphs. Because of the expansiveness of the data, only
charts of interest are featured in this report; all other figures referenced will
Please see the regression analysis, which can be found in the Appendix on pages 68-85.
[Figure: Time vs. Average Hourly Earnings ($). x-axis: Time; y-axis: Average Hourly Earnings ($). Best-fit R²: Construction 0.970, Manufacturing 0.979, Financial Activities 0.966, Information 0.966.]
[Figure: x-axis: Consumer Price Index. Title and legend not recovered.]
[Figure: log(Job Opening Rate) vs. log(Quit Rate). x-axis: log(Job Opening Rate); y-axis: log(Quit Rate). Best-fit R²: Construction 0.400, Manufacturing 0.600, Financial Activities 0.205, Information 0.172, Education & Health Services 0.598, Leisure & Hospitality 0.538.]
[Figure: x-axis: log(Hiring Rate); y-axis: log(Quit Rate). Best-fit R²: Construction 0.059, Manufacturing 0.643, Financial Activities 0.573, Information 0.372, Education & Health Services 0.649, Leisure & Hospitality 0.358.]
[Figure: Quit Rate (%) vs. 1/(Unemployment Rate). x-axis: Quit Rate (%); y-axis: 1/(Unemployment Rate). Best-fit R²: Construction 0.460, Manufacturing 0.291, Financial Activities 0.146, Information 0.051, Education & Health Services 0.078, Leisure & Hospitality 0.192.]
[Figure: x-axis: Job Opening Rate (%). Recovered R² values: 0.274 (series label not recovered), Leisure & Hospitality 0.093.]
Conclusion
The only variables that act as predictors for another variable are time, consumer
price index, and average hourly earnings. While there appears to be
some correlation between other variables, there was too much error to create a linear
model. The outliers tend to have a definite pattern, with the most error corresponding
to the years 2008 and 2020. It makes sense that outliers would appear during these
times, as they correspond to the financial crisis and recession starting in 2007, and the
Where earnings are the average hourly earnings measured in dollars, CPI is
the consumer price index, and time is measured in years since 03/01/06. The results
of these equations show that earnings are most strongly correlated with time, so the
most significant predictors of earnings are the industry one is in and the years since
03/01/06. Much of the residual follows the pattern of economic downturns in this country,
which makes sense, since we would expect downturns to affect the value, and it’s
information industry; this makes sense anecdotally, given the growth in the technology
sector and the need for data science. Financial Activities shows the most correlation
time, and not accounting for another dependency. Another source could be not
accounting for collinearity between the two predictors. It’s possible one of the
predictors simply predicts the other, thus increasing the error and reducing our
References
https://www.bls.gov/
https://www.sciencedirect.com/science/article/pii/S2212567113001974
Middleton, James. Lecture 10: Simple Linear Regression. MAE 301: Applied
Experimental Statistics.
Middleton, James. Lecture 11: The General Linear Model. MAE 301: Applied
Experimental Statistics.