
Data Project Part I

10 Random World Health Organization members:

1. Democratic Republic of the Congo


2. Ireland
3. Netherlands
4. Azerbaijan
5. Cyprus
6. Burkina Faso
7. Trinidad and Tobago
8. Burundi
9. Denmark
10. Iceland

Sampling method used: Simple Random Sampling

I used GeoGebra to pick the members. Every member had an equal chance of being chosen
because I used the RandomBetween function. To do that, click an empty space in GeoGebra and
type "random". A drop-down menu will appear, and the first entry is titled RandomBetween.
Select the first of its two choices, then enter the lowest and the highest value. It will then
select a number completely at random. This is simple random sampling because it is like
picking names from a hat: the selection is completely randomized, and the values were taken
directly from the application, with no interference on my part. Each number corresponded to a
country that is a member of the W.H.O.
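
For readers without GeoGebra, the same kind of draw can be sketched in Python. This is only an illustration and not the procedure I actually ran: the numbering of members from 1 to 194 and the function name draw_members are assumptions of mine, and the sketch draws without repeats, whereas repeated RandomBetween calls could return the same number twice.

```python
import random

# Assumption: W.H.O. members are numbered 1..N_MEMBERS in some fixed list.
N_MEMBERS = 194  # assumed member count, not taken from this report


def draw_members(k=10, n_members=N_MEMBERS, seed=None):
    """Return k distinct member numbers; every member is equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_members + 1), k))


print(draw_members())
```

Each printed number would then be looked up in the numbered member list to get the country.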
Tuberculosis Data in the 10 W.H.O. Members:

Country                            Total TB Incidence   Cohort    Treatment Success
                                   (per 100k pop.)

Democratic Republic of the Congo   318                  200,955   94%
Ireland                            4.8                  223       6%
Netherlands                        4.4                  601       83%
Azerbaijan                         63                   1,180     82%
Cyprus                             4.4                  34        44%
Burkina Faso                       45                   5,691     42%
Trinidad and Tobago                13                   220       70%
Burundi                            100                  7,105     95%
Denmark                            3.8                  206       32%
Iceland                            2.9                  12        25%

Graphic from Ireland’s Data


• The 25-34 age group had the most notified cases for females. There were around 30
notified cases for that age group and about 82 for all of the female age groups combined,
so the relative frequency is about 30/82 = 0.3658…, or 0.366.

• The male age group with the fewest notified cases is the 5-14 age range, with about 2
reported cases. The total of all the age groups on the male side is around 154, so the
relative frequency is 2/154, or about 0.013.

For females in the 25-34 age category, if there were a total of 5000 notified cases, we would
expect about 1,830 of them to be in this category. (5000x0.366)

For males in the 5-14 age category, if we looked at 5000 reported cases, we would expect
about 65 of them to be in this category. (5000x0.013)
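
The arithmetic above can be checked with a short Python sketch. The counts are the approximate values I read off the Ireland graphic, and the function names are only illustrative.

```python
# Approximate counts read off the Ireland graphic (see the bullets above).
female_25_34, female_total = 30, 82
male_5_14, male_total = 2, 154


def relative_frequency(count, total):
    """Share of the total that falls in one category."""
    return count / total


def expected_cases(rel_freq, n_cases=5000):
    """Expected number of cases in the category out of n_cases."""
    return rel_freq * n_cases


rf_female = round(relative_frequency(female_25_34, female_total), 3)
rf_male = round(relative_frequency(male_5_14, male_total), 3)

print(rf_female, round(expected_cases(rf_female)))
print(rf_male, round(expected_cases(rf_male)))
```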
Ho: p = 85%
Ha: p ≠ 85%
Alpha value: 0.05

Using a p^ value of 94% (from the data for the Democratic Republic of the Congo), we get a
p-value of 0.02.
I reject the null hypothesis (that the DRC is at the global threshold) because the p-value of
0.02 is less than alpha (0.05), which gives me sufficient evidence in favor of the alternative
hypothesis (that the DRC is not at the global threshold).
Data Project Part II

Are the three requirements met to obtain a valid confidence interval?

Burkina Faso:
n:5,691
p^:0.42
P:0.95

1. WHO data is random


2. The observations are independent, 5,691 is less than 5% of all people in Burkina Faso
3. The sample size is large enough:
• 5,691 x 0.42 = 2390.22 (>10)
• 5,691(1-0.42) = 3300.78 (>10)
4. Should result in valid data

Burundi:
n: 7,105
p^:0.95
P:0.95
1. WHO data is random
2. The observations are independent, 7,105 is less than 5% of all people in Burundi
3. The sample size is large enough:
• 7,105 x 0.95 = 6749.75 (>10)
• 7,105(1-0.95) = 355.25 (>10)
4. Should result in valid data
B&C

Burkina Faso Data:

Burundi Data:

The PLOS One Report global threshold for successful treatment of 85% would not be a likely
value for either Burkina Faso or Burundi, as it does not fit into the intervals of either group.

Burkina Faso: (0.41 , 0.43) - 0.85 is not found in that range


Burundi: (0.95 , 0.96) - 0.85 is also not found in that range
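
As a rough check on these intervals, here is a minimal Python sketch of a one-proportion confidence interval. It assumes the usual normal-approximation formula p^ ± z·sqrt(p^(1-p^)/n) at the 95% level, which may not be exactly what the calculator used, and it takes p^ rounded to two decimals, so the endpoints can differ slightly in the last digit from the calculator output. The function name is my own.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation (Wald) interval for one proportion."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


threshold = 0.85  # global treatment-success threshold used in this report

for name, p_hat, n in [("Burkina Faso", 0.42, 5691), ("Burundi", 0.95, 7105)]:
    lo, hi = proportion_ci(p_hat, n)
    inside = lo <= threshold <= hi
    print(f"{name}: ({lo:.4f}, {hi:.4f}) contains 0.85? {inside}")
```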
D

Democratic Republic of the Congo:


n:200,955
p^:0.94
P:0.85

1. WHO data is random


2. The observations are independent: while 200,955 is a large number, the population of the
DRC is approximately 96 million, so the cohort is less than 5% of all people in the DRC
3. The sample size is large enough:
• 200,955 x 0.94 = 188,897.7 (>10)
• 200,955(1-0.94) = 12,057.3 (>10)
4. Should result in valid data

Hypothesis test for The Democratic Republic of the Congo:

Ho: p = 0.85
Ha: p ≠ 0.85
Alpha Value: 0.05
The test statistic (z-score) is 112.99
The p-value is approximately zero (it displays as 0)

Because the p-value is essentially zero, it is lower than the alpha value, so I reject the
null hypothesis (that the DRC is at the global threshold of 85%). I have sufficient evidence in
favor of the alternative hypothesis (that the DRC is not at the global threshold), because the
p-value is less than 0.05.
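
The z-score above can be reproduced with a short Python sketch of a two-sided one-proportion z-test. It assumes the standard error is computed under the null value of 0.85, and the function name is only illustrative.

```python
from math import erfc, sqrt


def one_proportion_z_test(p_hat, p0, n):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    se = sqrt(p0 * (1 - p0) / n)          # standard error under H0
    z = (p_hat - p0) / se
    p_value = erfc(abs(z) / sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p_value = one_proportion_z_test(p_hat=0.94, p0=0.85, n=200_955)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
```

With a z-score this large, the p-value is far smaller than anything a calculator will display, which is why it shows as zero.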

Comparison Part i and ii hypothesis test results:


Both tests used the same hypotheses; the only difference was the p-value, since the Part I
value came from Rossman Chance. I rejected the null in favor of the alternative hypothesis in
both tests because the p-value was lower than the alpha value.

I found the Rossman Chance p-value of 0.02 interesting, since the test with the full cohort in
this part of the project resulted in a p-value of 0. I think it was zero because the sample was
much larger in this part of the project, which pushes the p-value closer and closer to zero.

I believe that the second test (the one from this part of the project) was more accurate,
because it was based on a much larger number of people, and a larger sample gives a more
precise estimate.
Project part III

W.H.O. Member                      Total TB Incidence   Cohort
                                   (per 100k pop.)

Democratic Republic of the Congo   318                  200,955
Ireland                            4.8                  223
Netherlands                        4.4                  601
Azerbaijan                         63                   1,180
Cyprus                             4.4                  34
Burkina Faso                       45                   5,691
Trinidad and Tobago                13                   220
Burundi                            100                  7,105
Denmark                            3.8                  206
Iceland                            2.9                  12

Section A Graph Shape: Skewed-Right

Center: 8.9 (Median)


Spread/Variation: 58.6 (IQR)

Any Outliers?

Upper fence:
• Q3 + 1.5 x IQR
• 63 + 1.5 x 58.6
= 150.9

Lower fence:
• Q1 - 1.5 x IQR
• 4.4 - 1.5 x 58.6
= -83.5

Values Outside fences:

318 > 150.9 - Democratic Republic of the Congo
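
The median, quartiles, and fences can be reproduced with the Python sketch below. It assumes the quartiles are the medians of the lower and upper halves of the sorted data, which matches the values reported here; other software may use a different quartile rule and give slightly different numbers. The function name is my own.

```python
from statistics import median

# Total TB incidence (per 100k pop.) for the ten sampled members.
incidence = [318, 4.8, 4.4, 63, 4.4, 45, 13, 100, 3.8, 2.9]


def quartiles_by_halves(values):
    """Q1, median, Q3 using medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2          # the middle value is left out when n is odd
    return median(data[:half]), median(data), median(data[-half:])


q1, med, q3 = quartiles_by_halves(incidence)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in incidence if x < lower_fence or x > upper_fence]

print(f"median={med:.1f}, Q1={q1}, Q3={q3}, IQR={iqr:.1f}")
print(f"fences=({lower_fence:.1f}, {upper_fence:.1f}), outliers={outliers}")
```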

Does this meet the 3 conditions for computation of a confidence interval?

Condition 1: Is it random?
• Yes, I used a random sampling method to obtain the data, and the data is
also from a random sample
Condition 2: Is it independent?
• Is n less than or equal to 5% of N?
• n = 216,227 and N = world population
• Yes, the data is independent. 216,227 is definitely smaller than 5 percent of the
world's population
Condition 3: Is it known to be normal?
• Is n greater than or equal to 30?
• Yes, 216,227 is larger than 30
With the given conditions and values, I believe that steps E and F should yield valid
results

E
F
Ho: µ = 132
Ha: µ ≠ 132

Alpha value: 0.05


Test statistic: -2.4596
P-Value: 0.0362

0.0362 (p-value) < 0.05(alpha)


p less than alpha means reject Ho

• I reject the null hypothesis, as there is sufficient evidence to conclude that the mean
incidence rate of my sample of 10 W.H.O. members differs from the global value of 132
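
The test statistic above can be reproduced with the Python sketch below, which treats the ten members' incidence rates as the sample (n = 10, so 9 degrees of freedom). It only computes the t statistic; the p-value still has to come from a t table or a calculator, since the standard library has no t distribution. The function name is only illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Total TB incidence (per 100k pop.) for the ten sampled members.
incidence = [318, 4.8, 4.4, 63, 4.4, 45, 13, 100, 3.8, 2.9]
mu0 = 132  # null-hypothesis mean from Ho above


def one_sample_t(values, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    se = stdev(values) / sqrt(n)   # sample standard deviation over sqrt(n)
    return (mean(values) - mu0) / se, n - 1


t_stat, df = one_sample_t(incidence, mu0)
print(f"sample mean = {mean(incidence):.2f}, t = {t_stat:.4f}, df = {df}")
```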
Data Project Part IV

Part A Contingency Table:

                                   Treatment Success/Failure
Member of WHO                      Success   Failure   Total

Democratic Republic of the Congo   94        6         100
Ireland                            6         94        100
Netherlands                        83        17        100
Azerbaijan                         82        18        100
Cyprus                             44        56        100
Burkina Faso                       42        58        100
Trinidad and Tobago                70        30        100
Burundi                            95        5         100
Denmark                            32        68        100
Iceland                            25        75        100
Total                              573       427       1,000

Table Explanation:
Each of the ten randomly selected countries is listed in the table above.
This table shows successful and unsuccessful TB treatment outcomes by country. The
successes and failures are shown as counts per 100 people. This lets us compare each
country's values with those of the other countries.
Part B Probabilities:
1. Probability a randomly selected case is from the 5th or 6th member of the WHO in
my table:
100/1000 + 100/1000 = 0.2

- For this, we are taking everyone, success/treatment, from 5 & 6 and comparing it
to the total of all 10 groups

2. Probability a randomly selected case is from the 5th member of WHO in my
table or is a failure:
100/1000 + 427/1000 − 56/1000 = 0.471

- Here, we take the successes and failures from the 5th member's row, and everyone
from the failure column. Since the 5th member's failures are counted in both, we
subtract them so as not to count them twice

3. Probability a randomly selected case is from the 5th member of WHO in my table
and is a failure:
56/1000 = 0.056

- The word “and” narrows it down to one cell of the table: the point where the row
and the column meet
4. Probability that a randomly selected case is from the 6th member of WHO in my
table, given it is a failure:
58/427 = 0.1358

- “Given” means the total must come from the failure column, so we take the failures
from member 6 and put them over the total for the failure column

5. Probability that three randomly selected cases (without replacement) are all
successes from the 8th member of WHO in my table:
95/100 × 94/99 × 93/98 = 0.856

- Note: without replacement means that we start with the full hundred people from the
8th WHO member on the table, and every time we select one, there is one fewer success and
one fewer person left, so the three fractions are out of 100, 99, and 98
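
The five probabilities in Part B can be checked against the contingency table with the Python sketch below. It uses exact fractions so that rounding only happens when printing; the variable names are my own.

```python
from fractions import Fraction

# (member, successes, failures) per 100 cases, in the order of the table.
rows = [
    ("Democratic Republic of the Congo", 94, 6), ("Ireland", 6, 94),
    ("Netherlands", 83, 17), ("Azerbaijan", 82, 18), ("Cyprus", 44, 56),
    ("Burkina Faso", 42, 58), ("Trinidad and Tobago", 70, 30),
    ("Burundi", 95, 5), ("Denmark", 32, 68), ("Iceland", 25, 75),
]
total = sum(s + f for _, s, f in rows)          # 1,000 cases
total_fail = sum(f for _, _, f in rows)         # 427 failures
_, s5, f5 = rows[4]                             # 5th member
_, s6, f6 = rows[5]                             # 6th member
_, s8, f8 = rows[7]                             # 8th member

p1 = Fraction(s5 + f5, total) + Fraction(s6 + f6, total)   # 5th or 6th
p2 = Fraction(s5 + f5, total) + Fraction(total_fail, total) - Fraction(f5, total)
p3 = Fraction(f5, total)                                   # 5th and failure
p4 = Fraction(f6, total_fail)                              # 6th given failure
n8 = s8 + f8
p5 = Fraction(s8, n8) * Fraction(s8 - 1, n8 - 1) * Fraction(s8 - 2, n8 - 2)

for label, p in [("1", p1), ("2", p2), ("3", p3), ("4", p4), ("5", p5)]:
    print(label, round(float(p), 4))
```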

Part C Confidence Interval:

I am 95 percent confident that the interval from -0.1172 to 0.1572 contains the true difference in the
proportions of successful treatment between the 5th and 6th members of WHO.

- Here we are establishing a range in which we can be 95% confident we will find the difference in
proportions of successful treatment when looking at the 5th and 6th members from the
table.
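
The interval above can be reproduced with a minimal Python sketch of a two-proportion confidence interval. It assumes the unpooled normal-approximation formula and uses the table counts for the 5th member (44 of 100) and the 6th member (42 of 100); the function name is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Normal-approximation interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


lo, hi = diff_proportion_ci(44, 100, 42, 100)
print(f"({lo:.4f}, {hi:.4f})")
```

Because the interval contains 0, a difference of zero between the two members is a plausible value.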
Part D Hypothesis Test:

Ho: p(5th member) = p(6th member)

Ha: p(5th member) ≠ p(6th member)

Alpha value: 0.05


Test Statistic: 0.2857
P-value: 0.7751
P-value > Alpha Value

I fail to reject the null hypothesis, because there is insufficient evidence to conclude there
is a difference in the proportions of successful treatment of the 5th and 6th members of WHO.

- We can now find a p-value using the data from the 5th and 6th members.
- If that value is higher than our alpha value (the significance level set in
advance), then we fail to reject the original claim that there is no difference
between the two members.
- If that value is lower than alpha, we reject the original claim in favor of the
alternative: a difference does in fact exist between the two proportions.
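
The test statistic and p-value above can be reproduced with the Python sketch below. It assumes a two-sided two-proportion z-test with a pooled standard error, using the table counts for the 5th member (44 of 100) and the 6th member (42 of 100); the function name is my own.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p_value = two_proportion_z_test(44, 100, 42, 100)
alpha = 0.05
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(f"z = {z:.4f}, p-value = {p_value:.4f}, {decision}")
```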
