
Investigating the correlation between GDP of individual US states and their respective incarceration rates per 100,000

Page Count: 13

Introduction

Prisons in the United States are total institutions that shape every aspect of an incarcerated individual’s life. Those who are imprisoned are expected to follow a strict schedule and adhere to different rules from those on the outside. The transition from this totally controlled environment to the outside world is jarring for both the individual and the surrounding community in a variety of ways. Individuals who have been incarcerated for long periods must catch up with a changed outside world: they must find living arrangements, learn about new technology, and find a way to sustain themselves. Oftentimes, ex-offenders are not offered the resources they need to succeed in the outside world, which results in many problems. A lack of access to resources like housing, or a lack of rehabilitation, makes it difficult for ex-offenders to adjust to society and become productive citizens. Additionally, the stigma towards those who have been previously incarcerated contributes to a lack of job opportunities for ex-inmates. This in turn increases the rate of recidivism and has been shown to have negative economic effects on previously incarcerated individuals and the community around them.

GDP, or gross domestic product, is an economic measure of the monetary value of the final goods and services produced in an area. It is often used as an overarching view of a country or area’s economic situation and growth rate. Looking at GDP against another variable allows an observer to make possible connections between the economy of a certain area and the variable in question. If GDP changes along with the variable, then there is a chance that the variable is significantly linked to the area’s economy. Therefore, I want to find the correlation between incarceration rates and GDP of US states in order to determine whether incarceration rates are significantly linked to GDP at all.

I am interested in this topic because of its connections with my IB Global Politics course. I am currently doing my engagement activity IA on the effects of inmate rehabilitation on local economies. I sought to connect my study of this subject with math by examining two factors that relate to the idea of prisons impacting the economy. Through this study, I hope to find out whether higher incarceration rates are associated with weaker economies in specific states. This investigation will also give me a starting place for understanding the economics of incarceration and how it has affected (and will continue to affect) local areas in detrimental ways. If I establish any kind of correlation, I will be able to focus my study on specific areas of economic interest, like the housing sector and community stigma.

Plan of investigation:

In this investigation, I am looking at two variables:

Let x = the incarceration rate of each US state in 2019.

This variable shows the number of people incarcerated in a specific state per 100,000 people.

Let y = the GDP of each state in 2019.

I will first determine the standard deviation of each variable to measure the dispersion of the data set. Through this, I hope to find out whether the data values are spread far apart or not. I will then find the equation of the line of best fit using least squares regression. Based on the sign of the slope, I will be able to tell whether there is a negative, positive, or no correlation. Then, I will calculate Pearson’s correlation coefficient in order to see the strength of the correlation between the two variables, and do a chi-squared test for independence. My methodology is structured this way in order to discern how spread out the data is and whether there is a strong relationship between the two variables. Together, these calculations will help me assess the correlation.

Figure 1: This graph shows the data for x and y for all 50 US states. The values vary quite a bit,
especially for GDP. The data for incarceration rates was collected from nicic.gov, and GDP is
from the Bureau of Economic Analysis.

Data Table 1: GDP and Incarceration Rates Per US State in 2019

US State          Incarceration rate per 100,000 (2019)    GDP (2019, millions of USD)
Alabama           419                                      231,172
Alaska            244                                      54,547
Arizona           558                                      369,988
Arkansas          586                                      130,840
California        310                                      3,052,645
Colorado          341                                      392,218
Connecticut       245                                      280,692
Delaware          382                                      72,488
Florida           444                                      1,116,435
Georgia           507                                      637,799
Hawaii            215                                      91,781
Idaho             475                                      82,420
Illinois          302                                      867,536
Indiana           399                                      373,518
Iowa              293                                      190,403
Kansas            342                                      172,328
Kentucky          516                                      216,102
Louisiana         680                                      254,562
Maine             146                                      65,492
Maryland          305                                      411,100
Massachusetts     133                                      564,047
Michigan          381                                      520,803
Minnesota         176                                      373,419
Mississippi       636                                      114,734
Missouri          424                                      319,394
Montana           440                                      51,789
Nebraska          289                                      131,352
Nevada            413                                      181,743
New Hampshire     197                                      83,844
New Jersey        210                                      613,509
New Mexico        316                                      101,972
New York          224                                      1,694,958
North Carolina    313                                      595,655
North Dakota      231                                      59,005
Ohio              430                                      666,974
Oklahoma          639                                      203,700
Oregon            353                                      246,647
Pennsylvania      355                                      772,611
Rhode Island      156                                      59,129
South Carolina    353                                      244,662
South Dakota      428                                      53,940
Tennessee         384                                      376,916
Texas             529                                      1,863,954
Utah              206                                      195,088
Vermont           182                                      33,033
Virginia          422                                      554,306
Washington        250                                      597,874
West Virginia     381                                      79,140
Wisconsin         378                                      332,263
Wyoming           428                                      39,601

Standard Deviation

I am determining standard deviation in this section. Standard deviation is a measure that allows

one to discern the spread of the data (how spread apart or close together the data points are).

I will first measure the standard deviation of X.

To do so I must determine the average of the values, or the mean.

$$\mu = \frac{\sum x}{n}$$

$\mu$, or the mean, is equal to the sum of the x values over the number of terms.

$$\mu = \frac{17966}{50}$$

$$\mu = 359.32$$

Next, I will determine the actual standard deviation with this formula:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$$

This formula shows that the standard deviation is the square root of the sum of the squared differences between each x value and the mean, divided by the number of terms.

Therefore, first one needs to find the difference between each x value and the mean, and the

square of that value.

For Alabama:

Incarceration rate per 100,000 in 2019: 419

$$x_i - \mu = 419 - 359.32 = 59.68$$

$$(x_i - \mu)^2 = (59.68)^2 = 3561.7024$$

I did this for all 50 states using Excel.

I then found the sum of all of these values, and substituted it into the standard deviation equation:

$$\sigma = \sqrt{\frac{875606.88}{50}}$$

$$\sigma \approx 132.3334$$

I used the same formula (substituting y for x) to find the standard deviation for y, which ended up being approximately 533,000 (the mean of y is 415,802.56).

Reflecting on the standard deviations, the values vary significantly for x and y. This may

indicate that there are multiple factors at play that drastically influence both incarceration rates

and GDP.
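
As a cross-check on the spreadsheet work, the mean and standard deviation of x can be recomputed with a short script. This is only a sketch: the variable names are my own, the list is copied from Data Table 1, and it assumes the population formula (dividing by n) used in this section.

```python
from statistics import fmean, pstdev

# Incarceration rates per 100,000 (2019), copied from Data Table 1 in table order.
x = [
    419, 244, 558, 586, 310, 341, 245, 382, 444, 507,
    215, 475, 302, 399, 293, 342, 516, 680, 146, 305,
    133, 381, 176, 636, 424, 440, 289, 413, 197, 210,
    316, 224, 313, 231, 430, 639, 353, 355, 156, 353,
    428, 384, 529, 206, 182, 422, 250, 381, 378, 428,
]

n = len(x)
mu = fmean(x)                                  # mean: sum of x over n
sum_sq_dev = sum((xi - mu) ** 2 for xi in x)   # sum of squared deviations from the mean
sigma = (sum_sq_dev / n) ** 0.5                # population standard deviation

# statistics.pstdev applies the same population formula, so it should agree with sigma.
print(n, mu, round(sum_sq_dev, 2), round(sigma, 4), round(pstdev(x), 4))
```

Replacing the list with the 50 GDP values would give the standard deviation of y in the same way.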

Least Squares Regression Line

I will now find the equation of the line of best fit using the formula for least squares regression. I want to find the equation of the line in order to determine whether there is a negative, positive, or no correlation between the variables.


This table depicts the individual values of x, y, xy, and x^2 for Alabama, as well as the sums and averages across the 50-point data set.

Table 2: Values needed for Least Squares Regression

Values for Alabama:
x               y               xy               x^2
419             231,172         96861068         175561

Sums:
Sum of x        Sum of y        Sum of xy        Sum of x^2
17966           20,790,128      7463796246       7331150

Averages:
Average of x    Average of y    Average of xy    Average of x^2
359.32          415,803         149275925        146623

The rest of this table can be found in the Appendix under Figure 2.

I will be using this formula to find the line of the data:

$$\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$$

Wherein,

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$\bar{y}$ and $\bar{x}$ are respectively the averages of the y and x values. I already found the means of both x and y when I calculated the standard deviations.

To acquire the values, first I have to find $SS_{xy}$ and $SS_{xx}$.

$$SS_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right)$$

Using the sums of x, y, and xy from the table, I will substitute the proper values into the equation.

$$SS_{xy} = 7463796246 - \frac{1}{50}(17966)(20790128)$$

$$SS_{xy} \approx -6512547$$

Then, I will calculate $SS_{xx}$.

$$SS_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2$$

$$SS_{xx} = 7331150 - \frac{1}{50}(17966)^2$$

$$SS_{xx} = 875606.88$$

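I will then substitute the values in for both.

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{-6512547}{875606.88}$$

$$\hat{\beta}_1 = -7.437752202$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 415802.56 - (-7.437752202)(359.32)$$

$$\hat{\beta}_0 = 418475.0931$$

Therefore, the equation for the line is:

$$\hat{y} = -7.4378x + 418475.0931$$

*I have rounded each value to four decimal places to limit rounding error.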
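
The substitution steps above can be reproduced from the sums in Table 2. The following is a minimal sketch with variable names of my own choosing. It assumes the unrounded means (sum divided by n), so the intercept can differ in its last digits from a hand calculation that uses a rounded mean.

```python
# Sums taken from Table 2 (n = 50 states).
n = 50
sum_x = 17966
sum_y = 20_790_128
sum_xy = 7_463_796_246
sum_x2 = 7_331_150

# Sums of squares used by the least squares formulas.
ss_xy = sum_xy - (sum_x * sum_y) / n
ss_xx = sum_x2 - (sum_x ** 2) / n

# Slope and intercept of the fitted line y_hat = slope * x + intercept.
slope = ss_xy / ss_xx
intercept = (sum_y / n) - slope * (sum_x / n)

print(round(ss_xy, 2), round(ss_xx, 2), round(slope, 4), round(intercept, 4))
```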

The line that I found has a negative slope, as shown by the $\hat{\beta}_1$ value. This may indicate a negative correlation. To be sure, I graphed the line in Excel along with the data points to see how scattered the data was. The results are shown in Figure 3:


Figure 3

This figure shows that the line is quite flat. The data is also quite scattered. I believe this shows

that there is little correlation between the two variables. I want to confirm this through finding

Pearson’s correlation coefficient for this data.

Pearson’s correlation coefficient

I will be finding this value using the values I found in the previous section. I will find $\sum y^2$ the same way I found $\sum x^2$.

r = Pearson’s correlation coefficient.

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{50(7463796246) - (17966)(20790128)}{\sqrt{\left[(50)(7331150) - (17966)^2\right]\left[(50)(22849100000000) - (20790128)^2\right]}}$$

$$r = -0.0018466435$$

$$r^2 \approx 0.00000341$$

The $r^2$ value is almost zero. This indicates that the line of best fit explains almost none of the variation in the data.

The r value is negative, but it is so close to zero that the correlation between the two variables is negligible. The line of best fit is also nearly flat on the graph, showing that the variables do not relate much to each other.
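
The same substitution can be checked with a few lines of code. This sketch uses my own variable names and the sums quoted in this section. Note that the sum of y^2 is the rounded figure that appears in the substitution, so the result is only as precise as that input.

```python
from math import sqrt

# Sums for the 50 states, as quoted in the text.
n = 50
sum_x = 17966
sum_y = 20_790_128
sum_xy = 7_463_796_246
sum_x2 = 7_331_150
sum_y2 = 22_849_100_000_000  # rounded value used in the substitution

# Pearson's correlation coefficient in its computational (sums) form.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(r, r ** 2)
```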

Chi Squared Test for Independence

I will be doing this test in order to see if the variables are related or if they are independent of each other. I will use this formula to do so:

$$\chi^2_c = \sum \frac{(O_i - E_i)^2}{E_i}$$

Wherein c = degrees of freedom, $O_i$ indicates an observed value and $E_i$ indicates the corresponding expected value.

This formula will help determine the relationship between the two variables.

First, I will formulate two hypotheses:

$H_0$ (null hypothesis): GDP and incarceration rates of US states are independent of each other.

$H_1$ (alternative hypothesis): GDP and incarceration rates of US states are dependent on each other.

Then, I will split the data into categories using this table:

Table of Observed Values

                   Category 1 of y    Category 2 of y    Totals
Category 1 of x    a                  b                  a+b
Category 2 of x    c                  d                  c+d
Totals             a+c                b+d                n

Table of Expected Values

                   Category 1 of y    Category 2 of y    Totals
Category 1 of x    (a+b)(a+c)/n       (a+b)(b+d)/n       a+b
Category 2 of x    (a+c)(c+d)/n       (b+d)(c+d)/n       c+d
Totals             a+c                b+d                n

Now, I will substitute the values for each category:

Table of Observed Values

                            Incarceration rate per 100,000 for each state
GDP value for each state    100-300        301-700        Totals
$30,000-$420,000            12             23             35
$430,000-$4,000,000         4              9              13
Totals                      16             32             48

Table of Expected Values

                            Incarceration rate per 100,000 for each state
GDP value for each state    100-300        301-700        Totals
$30,000-$420,000            (35)(16)/48    (35)(32)/48    35
$430,000-$4,000,000         (16)(13)/48    (32)(13)/48    13
Totals                      16             32             48

Table of Expected Values

                            Incarceration rate per 100,000 for each state
GDP value for each state    100-300        301-700        Totals
$30,000-$420,000            11.66666667    23.33333333    35
$430,000-$4,000,000         4.333333333    8.666666667    13
Totals                      16             32             48

To determine the degrees of freedom for hypothesis testing, I use:

$$c = (\text{number of columns} - 1)(\text{number of rows} - 1)$$

$$c = (2 - 1)(2 - 1)$$

$$c = 1$$

Now, I will use the aforementioned formula to find chi-squared:

$$\chi^2_c = \sum \frac{(O_i - E_i)^2}{E_i}$$

$$\chi^2_1 = \frac{(12 - 11.66666667)^2}{11.66666667} + \frac{(23 - 23.33333333)^2}{23.33333333} + \frac{(4 - 4.333333333)^2}{4.333333333} + \frac{(9 - 8.666666667)^2}{8.666666667}$$

$$\chi^2_1 \approx 0.052747$$

*The value is rounded to 5 significant figures for accuracy.

The p-value for this test was approximately 0.818. Since 0.818 > 0.05, the result is not significant at the 5% level, so the test gives no evidence of dependence between the variables. Therefore, I fail to reject the null hypothesis, and the data are consistent with the two variables being independent of one another.
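
The whole test can also be run from the observed table with a short script. This is a sketch under stated assumptions: the names are my own, the counts are copied from the Table of Observed Values, no continuity correction is applied, and the p-value line relies on an identity that holds only for 1 degree of freedom.

```python
from math import erfc, sqrt

# Observed counts: rows are the two GDP categories, columns the two incarceration categories.
observed = [[12, 23],
            [4, 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: (row total)(column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (O - E)^2 / E over all four cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1)(columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid for 1 degree of freedom only: P(X > chi2) = erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))

print(expected, round(chi2, 6), dof, round(p_value, 3))
```

A p-value above 0.05 means the null hypothesis of independence is not rejected at the 5% level.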

Limitations and Conclusion:

For this study, there were many limitations. Firstly, the data set I used was from 2019, as that was the most recent incarceration data I could find, so it may not be indicative of the current situation. Additionally, I used very broad data to try to find a lead for my investigation. I measured overall GDP for individual states, rather than more specific indicators of the state economies. Using GDP may have hindered my study because it is quite a broad economic indicator.

For now, through my study, I found no significant relationship between incarceration rates and GDP of US states. This study has a few implications. Firstly, it helps me shift my direction in studying this topic. Instead of studying broad economic indicators, it may be more helpful to study specific areas and their economic output relative to other parts of the state. Being more specific will allow me to properly discern what kind of work needs to be done in the area of prison reform with regard to the economic benefits it may have.

Additionally, using the data from this study, I can study what exactly affects GDP and use that to narrow down my search in finding out whether prisons impact local economies, and if so, whether they do so positively or negatively.


Works Cited

Chi-Square Statistic: How to Calculate It / Distribution. (n.d.). Retrieved December 19, 2021, from https://www.statisticshowto.com/probability-and-statistics/chi-square/

Economics of Incarceration. (n.d.). Retrieved December 19, 2021, from https://www.prisonpolicy.org/research/economics_of_incarceration/#:~:text=Total%20U.S.%20government%20expenses%20on,prisons%20and%20jails%3A%20%243.9%20billion%20%2B&text=Annual%20cost%20to%20families%20of,who%20are%20unemployed%3A%2027%25%20%2B

GDP by State. (n.d.). Retrieved December 19, 2021, from https://www.bea.gov/data/gdp/gdp-state

State Statistics Information. (n.d.). Retrieved December 19, 2021, from https://nicic.gov/projects/state-statistics-information?location=Kansas


Appendix:

Figure 2:
Table 2: Values needed for Least Square Regression
x y xy x^2
419 231,172 96861068 175561
244 54,547 13309468 59536
558 369,988 206453304 311364
586 130,840 76672240 343396
310 3,052,645 946319950 96100
341 392,218 133746338 116281
245 280,692 68769540 60025
382 72,488 27690416 145924
444 1,116,435 495697140 197136
507 637,799 323364093 257049
215 91,781 19732915 46225
475 82,420 39149500 225625
302 867,536 261995872 91204
399 373,518 149033682 159201
293 190,403 55788079 85849
342 172,328 58936176 116964
516 216,102 111508632 266256
680 254,562 173102160 462400
146 65,492 9561832 21316
305 411,100 125385500 93025
133 564,047 75018251 17689
381 520,803 198425943 145161
176 373,419 65721744 30976
636 114,734 72970824 404496
424 319,394 135423056 179776
440 51,789 22787160 193600
289 131,352 37960728 83521
413 181,743 75059859 170569
197 83,844 16517268 38809
210 613,509 128836890 44100
316 101,972 32223152 99856
224 1,694,958 379670592 50176
313 595,655 186440015 97969
231 59,005 13630155 53361
430 666,974 286798820 184900
639 203,700 130164300 408321
353 246,647 87066391 124609
355 772,611 274276905 126025
156 59,129 9224124 24336
353 244,662 86365686 124609
428 53,940 23086320 183184
384 376,916 144735744 147456
529 1,863,954 986031666 279841
206 195,088 40188128 42436
182 33,033 6012006 33124
422 554,306 233917132 178084
250 597,874 149468500 62500
381 79,140 30152340 145161
378 332,263 125595414 142884
428 39,601 16949228 183184
Sum of x Sum of y Sum of xy Sum of x^2
17966 20,790,128 7463796246 7331150
Average of x Average of y Average of xy Average of x^2
359.32 415,803 149275925 146623
