
Kyli Bartholomew

MATH 1040
Instructor Woodward
April 22nd, 2023
Final Project

I am adding all four projects (1-4) to this document to show how I have grown throughout
statistics.

In this project I randomly selected 10 of the 194 member countries of the World Health Organization
(WHO). After selecting the 10 countries, I went to the WHO tuberculosis profile for each one to find its
total TB incidence, treatment success rate, and cohort size.

(This is my personal graph with the 10 random countries out of the 194 WHO
countries.) I chose Germany as my member of WHO.
(This is the incidence and notified cases by age group and sex, 2021, for Germany on the TB profile website.)

1. For females, the age group with the most notified cases was ages 25-34. I estimate about 300
cases for that group. Adding up my estimates for all the female age groups, I get a total of about
1,275 (the first bar being about 250, second 100, third 150, fourth 200, fifth 300, sixth 200,
seventh 50, and eighth 25). Dividing 300/1,275 gives 0.235, which is my relative frequency.

2. For males, the age group with the fewest notified cases was ages 0-4. I estimate about 50
cases for that group. Adding up my estimates for all the male age groups, I get a total of about
2,510 (the first bar being about 400, second 300, third 350, fourth 375, fifth 575, sixth 410,
seventh 50, and eighth 50). Dividing 50/2,510 gives 0.020, which is my relative frequency.

3. If there were 5,000 total notified cases for the two genders, I would estimate the female highest
group to be about 1,175 out of 5,000 (0.235 × 5,000). For the male group with the fewest TB cases, I
estimate about 100 out of 5,000 (0.020 × 5,000).
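
The relative frequencies in items 1 and 2 are each one bar divided by the total of all bars. A minimal Python sketch of that arithmetic, using the bar estimates listed above (the function name `relative_frequency` is only illustrative, and the male count of 50 is the one used in the division):

```python
# Bar heights are the rough estimates read off the chart, as listed in the text.
female_bars = [250, 100, 150, 200, 300, 200, 50, 25]
male_bars = [400, 300, 350, 375, 575, 410, 50, 50]


def relative_frequency(count, bars):
    """Return the share of the total that one bar represents."""
    return count / sum(bars)


female_total = sum(female_bars)
male_total = sum(male_bars)
female_rf = relative_frequency(300, female_bars)  # largest female bar
male_rf = relative_frequency(50, male_bars)       # count used in the male division

print(f"female total = {female_total}, relative frequency = {female_rf:.3f}")
print(f"male total   = {male_total}, relative frequency = {male_rf:.3f}")
```
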
𝐻𝑂 : 𝑃 = 0.85
𝐻𝐴 : 𝑃 ≠ 0.85
𝑃 − 𝑉𝑎𝑙𝑢𝑒 = 0.40
𝛼 = 0.05
𝑃 − 𝑉𝑎𝑙𝑢𝑒 > 𝛼
In conclusion, we fail to reject the null hypothesis. We do not have sufficient evidence to state that
the success rate for TB treatment is different from the global threshold of 85%.

This is the second part of my data project. I will use the table shown below to choose 2 countries
for this part: Egypt and Armenia. For each one I will estimate the true proportion of successful TB
treatments. I will use the cohort size of the country as the sample size (n) in my equations and 0.85
as the null value (p). For the countries I selected I will check 3 requirements:

1) Is it a random sample or a randomized experiment?
2) Is the sample no more than 5% of the population (n ≤ 0.05N)?
3) Are both np ≥ 10 and n(1 − p) ≥ 10?

Egypt:

1) It is random, as I used a number generator out of 9 (because I used Haiti in the first part of the
project) and got the number 8, which is Egypt.
2) n ≤ 0.05N: the cohort of 6,834 in Egypt is less than 5% of the total population of Egypt.
3) 6834 × 0.85 = 5808.9 ≥ 10 and 6834(1 − 0.85) = 1025.1 ≥ 10

Armenia:

1) It is random, because I used a number generator out of 9 and got the number 5, which is
Armenia.
2) n ≤ 0.05N: the cohort of 303 in Armenia is less than 5% of the total population of Armenia.
3) 303 × 0.85 = 257.55 ≥ 10 and 303(1 − 0.85) = 45.45 ≥ 10
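
The third requirement for each country can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the helper name `large_sample_ok` is made up for illustration, and the 5% population check is left out because the population sizes are not listed in this document.

```python
def large_sample_ok(n, p0, minimum=10):
    """Check np >= 10 and n(1 - p) >= 10 for a sample size n and null value p0."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    passed = expected_successes >= minimum and expected_failures >= minimum
    return expected_successes, expected_failures, passed


# Cohort sizes from the text; 0.85 is the null value.
cohorts = {"Egypt": 6834, "Armenia": 303}
results = {country: large_sample_ok(n, 0.85) for country, n in cohorts.items()}

for country, (succ, fail, passed) in results.items():
    print(f"{country}: np = {succ:.2f}, n(1 - p) = {fail:.2f}, requirement met: {passed}")
```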

For the next part I am going to use GeoGebra, an app we use in statistics, to find the lower and upper
limits of the confidence interval. Below you will see pictures of me getting the lower and upper limits
for Egypt and Armenia. In GeoGebra we are using a confidence level of 95%, which is entered as
0.95.
Egypt has a lower limit of 0.8415 and an upper limit of 0.8585. To find the number of successes I
multiplied the cohort size (6834) by the null value (0.85) and got 5808.9, which I rounded to 5809
because there is no such thing as 0.9 of a person.
We are 95% confident that the interval from 0.8415 to 0.8585 contains the true proportion of
successful TB treatments for Egypt.
For Armenia, I got a lower limit of 0.8114 and an upper limit of 0.8915. To find the number of successes
I multiplied the cohort size (303) by the null value (0.85) and got 257.55, which I rounded to 258.
We are 95% confident that the interval from 0.8114 to 0.8915 contains the true proportion of
successful TB treatments for Armenia.
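
The limits above came from GeoGebra. As a rough cross-check, here is a sketch that assumes the standard normal-approximation interval for one proportion, p̂ ± z*·sqrt(p̂(1 − p̂)/n). Whether GeoGebra uses exactly this formula is an assumption, and the function name `proportion_ci` is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for one proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Successes are cohort size x 0.85, rounded to whole people as in the text.
egypt_low, egypt_high = proportion_ci(5809, 6834)
armenia_low, armenia_high = proportion_ci(258, 303)

print(f"Egypt:   ({egypt_low:.4f}, {egypt_high:.4f})")
print(f"Armenia: ({armenia_low:.4f}, {armenia_high:.4f})")
```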

For the next part, I am going to determine whether the global threshold of 85% is plausible for my
selected countries. For Egypt it is, because the interval is (0.8415, 0.8585) and 0.85 lies between the
lower and upper limits. Armenia's interval is (0.8114, 0.8915), so 0.85 also lies within the interval
for Armenia.
Above is a screenshot of a hypothesis test that I did for the first country in part one which was Haiti. I
used the null value of 0.85 and the success and sample size to find the values.

I am going to use the 3 requirements to see if Haiti also applies. Above you can see I used GeoGebra and
used my null value as 85% or 0.85. I used a two-sided test to find the numbers as shown in the picture
above.

1) It is random because I selected these 10 countries out of 100+ using a number generator out of
100 and got Haiti as my first country.
2) n ≤ 0.05N: Yes, 10,844 is less than 5% of the total population of Haiti.
3) 10844 × 0.85 = 9217.4 ≥ 10 and 10844(1 − 0.85) = 1626.6 ≥ 10

I did a hypothesis test for Haiti. I used 0.85 for my null value because the global threshold for the TB
treatment success rate is 85%. In the picture of the hypothesis test you will see that I chose (≠)
because we are testing whether the rate is different from 0.85, not specifically greater or less than
0.85. I will be using the number of successes that I found earlier (cohort size × success rate), and n
is the cohort size. For my alpha I have chosen α = 0.05.
For the hypothesis test I will be using these equations below:

𝐻𝑂 : 𝑃 = 0.85
𝐻𝐴 : 𝑃 ≠ 0.85
My P-value for Haiti is 0.9914, as you can see in the picture, and my test statistic is –0.0108. I will
use 0.05 for my alpha.

𝑃 − 𝑉𝑎𝑙𝑢𝑒 ≈ 0.9914 > 𝛼

We fail to reject the null hypothesis. There is insufficient evidence to conclude that Haiti's TB
treatment success rate is different from the global threshold of 85%.
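
The test statistic and P-value for Haiti can be reproduced with a short sketch. It assumes the usual one-proportion z-test, z = (p̂ − p0) / sqrt(p0(1 − p0)/n), and takes the successes as 10844 × 0.85 rounded to 9217; the function name is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 against HA: p != p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)  # uses the null value, not p_hat
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(successes=9217, n=10844, p0=0.85)
alpha = 0.05

print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```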

I am going to compare my hypothesis test from part 1 of my project to the one in this part of my
project.

A similarity between the projects is that I use 0.85 in both as a guide. The biggest difference
is that for the number of successes, we now have a table to show us how to calculate our successes
(cohort size × success rate).
We use the information in the first table to provide a more accurate answer for each country we
research. We can use the cohort size as our sample size, and the number of successes is what we got
from multiplying our cohort size by our null value (0.85). Because we were able to get more accurate
information, we can use GeoGebra rather than Rossman Chance. We were able to get p-values for both
parts of the project; however, I would say that part 2 is more valid, because we were able to calculate
much more accurate values for n and p̂. As my knowledge of statistics expands, I learn more ways to
calculate answers for each problem given to me. With this part of the project, I knew exactly how to
find p and use it for Egypt and Armenia. I was also able to show how they met the requirements for a
random sample. A similarity is that I was able to show that the TB treatment success rates of all the
countries (Haiti, Egypt and Armenia) were consistent with the 85% global threshold. For Egypt and
Armenia, I was able to show the lower and upper limits, and 0.85 fell between them for both.

I will be using the information I gathered in project 1 where I randomly selected 10 countries and got
their total TB incident and will be making a table with that information.
Country                             Total TB Incidence
Haiti                               159
Sweden                              3.8
Germany                             5
Libya                               59
Colombia                            41
Armenia                             27
Belarus                             30
Iran                                12
Egypt                               10
Saint Vincent and the Grenadines    8.7
Above is the boxplot I made of the 10 sampled countries from the list.
My boxplot is skewed to the right, meaning that the right side has a longer tail than the left. The
median is the most important value on my graph because it marks the middle of the data; the median of
my boxplot is 19.5. Q1 (the first quartile) is the value that 25% of the data fall below, and Q3 (the
third quartile) is the value that 75% of the data fall below. For my boxplot above, Q1 is 8.7 and Q3 is
41. With Q1 and Q3 we can find the IQR (interquartile range), which is Q3 − Q1. Our IQR is 32.3.
Because our boxplot is right skewed, the best measure of the middle is the median. We use the median
instead of the mean because the median is not pulled toward the long right tail by extreme values the
way the mean is.
The best measure of variability for a skewed graph is the IQR. My IQR, as stated above, is 32.3. The IQR
helps with a skewed graph because it is not affected by the outliers in the data. Another way to
calculate the variability is the range (range = maximum − minimum), which would be 159 − 3.8 = 155.2;
however, this is not the best measure of variability because it includes the outliers. That is why the
IQR, which is 32.3, is the best measure of variability for the boxplot.
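
The median, quartiles, IQR, and range can be recomputed from the 10 incidence values. This sketch assumes the "median of each half" rule for quartiles, which matches the Q1 and Q3 reported above; other software may use a slightly different quartile rule and give different values.

```python
from statistics import median

# Total TB incidence for the 10 sampled countries, from the table in the text.
incidence = [159, 3.8, 5, 59, 41, 27, 30, 12, 10, 8.7]

data = sorted(incidence)
half = len(data) // 2
lower_half = data[:half]    # values below the middle
upper_half = data[-half:]   # values above the middle

med = median(data)
q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1
data_range = data[-1] - data[0]

print(f"median = {med}, Q1 = {q1}, Q3 = {q3}")
print(f"IQR = {iqr:.1f}, range = {data_range:.1f}")
```
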
As you can tell from the graph, there is an outlier (a number that is extreme when compared to the rest
of the data). The outlier is 159, as there is no other number with 3 digits.

The way we make sure that this is an outlier is by calculating the lower fence (the lowest number a
value can be without being an outlier) and the upper fence (the highest number a value can be without
being an outlier). I am going to start with the lower fence, which is calculated by 𝑄1 − 1.5 × 𝐼𝑄𝑅.
After that, I will find the upper fence, which is calculated by 𝑄3 + 1.5 × 𝐼𝑄𝑅.
Because I got 8.7 for Q1 and 32.3 for the IQR, the lower fence is 8.7 − 1.5 × 32.3, which is –39.75.
For the upper fence, the calculation is 41 + 1.5 × 32.3, which is 89.45.
Upper fence = 89.45
Lower fence = –39.75

Looking through the 10 total TB incident numbers I can confirm that I have one outlier which is 159 due
to it being over 89.45. Haiti has the 159 total TB incident number making Haiti the outlier of the 10
countries I had randomly selected.
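
The fence rule can be written out as a short sketch. It takes Q1 = 8.7 and Q3 = 41 as given from the boxplot and flags any country whose value falls outside the fences; the variable names are only illustrative.

```python
# Total TB incidence by country, from the table in the text.
incidence = {
    "Haiti": 159, "Sweden": 3.8, "Germany": 5, "Libya": 59, "Colombia": 41,
    "Armenia": 27, "Belarus": 30, "Iran": 12, "Egypt": 10,
    "Saint Vincent and the Grenadines": 8.7,
}

q1, q3 = 8.7, 41          # quartiles read from the boxplot
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# A value is an outlier if it lies below the lower fence or above the upper fence.
outliers = [
    (country, value)
    for country, value in incidence.items()
    if value < lower_fence or value > upper_fence
]

print(f"lower fence = {lower_fence:.2f}, upper fence = {upper_fence:.2f}")
print("outliers:", outliers)
```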

Next, I will be checking the three conditions to compute a confidence interval or perform a hypothesis
test.
1. Is it random or a randomized experiment?
   a. Yes, because there were about 200 countries from the original list, and I used a number
      generator to randomly select 10 countries out of the 200.
2. Is the sample independent?
   a. Yes, because 10 countries are less than 5% of all countries.
   b. Yes, because 49900 is less than 5% of the total population of all countries.
3. 𝑁 ≥ 30
   a. 49900 (the cohort size of all 10 countries): 49900 ≥ 30
Because I was able to say yes that all three were met, they do meet the criteria to compute a confidence
interval or perform a hypothesis test.

I am going to compute and interpret a 95% Confidence Interval for my 10 countries. I got the mean
(35.55) from GeoGebra, then the standard deviation known as s from GeoGebra as well.

N is the population size.


We are 95% confident that the interval from 1.99 to 69.1 contains the true mean of the total TB
incidence for all countries.
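
As a rough check on the GeoGebra output, here is a sketch of a t interval for the mean, x̄ ± t*·s/√n. The critical value t* = 2.262 is the standard table value for 95% confidence with 9 degrees of freedom and is typed in by hand, because Python's standard library has no t distribution. The result should land close to the interval reported above, with small differences due to rounding.

```python
from math import sqrt
from statistics import mean, stdev

# Total TB incidence for the 10 sampled countries, from the table in the text.
incidence = [159, 3.8, 5, 59, 41, 27, 30, 12, 10, 8.7]

n = len(incidence)
x_bar = mean(incidence)
s = stdev(incidence)          # sample standard deviation (divides by n - 1)
T_STAR_DF9 = 2.262            # table value: 95% confidence, df = n - 1 = 9

margin = T_STAR_DF9 * s / sqrt(n)
ci_low, ci_high = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.2f}, s = {s:.2f}")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```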

For the last part of this project, I am going to conduct a two-sided hypothesis test. I will be using
the CDC global incidence rate of TB, which is 132, for my null hypothesis. I will be getting the rest
of my information from my previous t estimate of a mean.

𝐻𝑂 : 𝜇 = 132
𝐻𝐴 : 𝜇 ≠ 132
𝛼 = 0.05
𝑃 − 𝑉𝑎𝑙𝑢𝑒: 0.0001
𝑇𝑒𝑠𝑡 𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐: − 6.50
Because of my findings I can make a conclusion from the hypothesis test.

We reject 𝐻𝜊. There is sufficient evidence to conclude that the mean of the global incidence rate for TB
was different than the reported value of 132.
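
The test statistic can be recomputed as t = (x̄ − μ0) / (s/√n). This sketch does not compute the P-value, since that needs a t distribution that the standard library does not provide. Instead it compares |t| with the hand-entered table value t* = 2.262 (95%, df = 9), which is an equivalent way to make the decision at α = 0.05.

```python
from math import sqrt
from statistics import mean, stdev

# Total TB incidence for the 10 sampled countries, from the table in the text.
incidence = [159, 3.8, 5, 59, 41, 27, 30, 12, 10, 8.7]
mu_0 = 132                    # null value used in the text

n = len(incidence)
standard_error = stdev(incidence) / sqrt(n)
t_stat = (mean(incidence) - mu_0) / standard_error

T_CRITICAL_DF9 = 2.262        # table value: two-sided alpha = 0.05, df = 9
reject_null = abs(t_stat) > T_CRITICAL_DF9

print(f"t = {t_stat:.2f}")
print("reject H0" if reject_null else "fail to reject H0")
```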

I will make a table showing the successes and failures for the 10 randomly selected WHO countries
from Parts 1-3.

Treatment Success/Failure

Member of WHO                       Success   Failure   TOTAL
Haiti                               82        18        100
Sweden                              72        28        100
Germany                             74        26        100
Libya                               69        31        100
Colombia                            71        29        100
Armenia                             81        19        100
Belarus                             85        15        100
Iran                                84        16        100
Egypt                               89        11        100
Saint Vincent and the Grenadines    46        54        100
TOTAL                               753       247       1000

I am going to use the general addition rule, the multiplication rule, and conditional probability to
find the probabilities for the questions below, using the table above. We learned these rules in
chapter 5 of intro to statistics, and I will apply them to questions 1-5.

1. The probability that a randomly selected case is from the 5th or 6th member of WHO.
$P(\text{5th member}) + P(\text{6th member}) = \frac{100}{1000} + \frac{100}{1000} = \frac{200}{1000} = 0.2$

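2. The probability that a randomly selected case is from the 5th member of WHO in your table or is a
failure. By the general addition rule, I add the two probabilities and subtract the 29 cases that are
both from the 5th member and failures, so they are not counted twice.

$P(\text{5th member or failure}) = \frac{100}{1000} + \frac{247}{1000} - \frac{29}{1000} = \frac{318}{1000} = 0.318$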

3. The probability that a randomly selected case is from the 5th member of WHO in your table and is a
failure.
$P(\text{5th member and failure}) = \frac{29}{1000} = 0.029$

4. The probability that a randomly selected case is from the 6th member of WHO in your table,
given it is a failure.
$P(\text{6th member given failure}) = \frac{19}{247} \approx 0.077$

5. The probability that three randomly selected cases (without replacement) are all successes from the
8th member of WHO in your table.

$\frac{84}{1000} \times \frac{83}{999} \times \frac{82}{998} \approx 0.00057$
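
The five probabilities can be worked directly from the success/failure table with exact fractions. This is a sketch under the assumption that "5th", "6th", and "8th member" mean the 5th, 6th, and 8th rows of the table; the "or" question uses the general addition rule P(A) + P(B) − P(A and B).

```python
from fractions import Fraction

# (successes, failures) per member, in table order.
table = {
    "Haiti": (82, 18), "Sweden": (72, 28), "Germany": (74, 26),
    "Libya": (69, 31), "Colombia": (71, 29), "Armenia": (81, 19),
    "Belarus": (85, 15), "Iran": (84, 16), "Egypt": (89, 11),
    "Saint Vincent and the Grenadines": (46, 54),
}
members = list(table)
fifth, sixth, eighth = members[4], members[5], members[7]

total = sum(s + f for s, f in table.values())        # all cases
failures = sum(f for _, f in table.values())         # all failures


def cases(member):
    """Number of cases (successes + failures) for one member."""
    return sum(table[member])


p1 = Fraction(cases(fifth) + cases(sixth), total)                 # 5th or 6th
p3 = Fraction(table[fifth][1], total)                             # 5th and failure
p2 = Fraction(cases(fifth), total) + Fraction(failures, total) - p3  # 5th or failure
p4 = Fraction(table[sixth][1], failures)                          # 6th given failure
s8 = table[eighth][0]
p5 = Fraction(s8, total) * Fraction(s8 - 1, total - 1) * Fraction(s8 - 2, total - 2)

for label, p in [("Q1", p1), ("Q2", p2), ("Q3", p3), ("Q4", p4), ("Q5", p5)]:
    print(f"{label}: {p} = {float(p):.5f}")
```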

Below is a picture of Colombia's and Armenia's successes and the difference between them. I found the
interval in GeoGebra (the lower and upper limits), and I will interpret the 95% confidence interval.
We are 95% confident that the interval from -0.218 to 0.018 contains the true difference between the
proportions of successful treatments for Colombia (5th member) and Armenia (6th member).
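
A sketch of the interval for the difference of two proportions, assuming the usual unpooled normal-approximation formula (p̂1 − p̂2) ± z*·sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2). Whether GeoGebra uses exactly this formula is an assumption; the function name is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Unpooled normal-approximation CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * standard_error, diff + z_star * standard_error


# Colombia: 71 successes out of 100; Armenia: 81 successes out of 100.
diff_low, diff_high = two_proportion_ci(71, 100, 81, 100)
print(f"95% CI for the difference: ({diff_low:.3f}, {diff_high:.3f})")
```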

Below is a picture of a Z test I ran in GeoGebra comparing Colombia's and Armenia's successes. I chose
≠ for the alternative hypothesis because I am just testing for a difference, not whether one is higher
or lower. I will then identify the test statistic (Z) and p-value (p).
𝐻𝑂 : 𝑃𝐶𝑜𝑙𝑜𝑚𝑏𝑖𝑎 = 𝑃𝐴𝑟𝑚𝑒𝑛𝑖𝑎

𝐻𝐴 : 𝑃𝐶𝑜𝑙𝑜𝑚𝑏𝑖𝑎 ≠ 𝑃𝐴𝑟𝑚𝑒𝑛𝑖𝑎

𝛼 = 0.05

𝑇𝑒𝑠𝑡 𝑆𝑡𝑎𝑡𝑖𝑠𝑡𝑖𝑐 = −1.66

𝑃 − 𝑉𝑎𝑙𝑢𝑒 = 0.098

𝑃 − 𝑉𝑎𝑙𝑢𝑒 > 𝛼

We fail to reject 𝐻𝑂. There is insufficient evidence to conclude that the proportions of
successful treatments for Colombia and Armenia are different.
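
The test statistic and P-value can be reproduced with a sketch of the pooled two-proportion z-test, which is assumed here to be what GeoGebra runs. The pooled proportion combines both samples because the null hypothesis says the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 = p2 against HA: p1 != p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Colombia: 71 successes out of 100; Armenia: 81 successes out of 100.
z, p_value = two_proportion_z_test(71, 100, 81, 100)
alpha = 0.05

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```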
