
ADDIS ABABA UNIVERSITY

COLLEGE OF HEALTH SCIENCE


SCHOOL OF MEDICINE

BIOSTATISTICS ASSIGNMENT

Submitted to: School of Public Health

Prepared by:
Name: Wondimnew Walle
ID: UGR/2899/12

October 21, 2021

Assignment one
1. Consider the experiment of tossing a fair die and define the following events:
A = {Observe an even number of dots}
B = {Observe a number of dots less or equal to 4}.
Are events A and B independent?

Solution:
A is the event of observing an even number when a single die is tossed:
A = {2, 4, 6}, so P(A) = 3/6 = 1/2
B is the event of observing a number less than or equal to 4:
B = {1, 2, 3, 4}, so P(B) = 4/6 = 2/3
Two events A and B are independent if knowing that one occurred does not
change the probability that the other occurs.
Equivalently, A and B are independent if any one of the following holds
(each one implies the others):
• P(A|B) = P(A)
• P(B|A) = P(B)
• P(A and B) = P(A) P(B)
Let us check the third condition. A and B = {2, 4}, so P(A and B) = 2/6 = 1/3,
and P(A) x P(B) = 1/2 x 2/3 = 1/3.
The two values are equal, so the condition is satisfied. Hence, events A and B
are independent.
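The independence check can also be done by listing the outcomes of the die. The following is a minimal Python sketch; the helper name `prob` is only illustrative, and exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

# Sample space of one toss of a fair die; each outcome has probability 1/6.
outcomes = range(1, 7)
A = {x for x in outcomes if x % 2 == 0}   # even number of dots
B = {x for x in outcomes if x <= 4}       # number of dots less than or equal to 4

def prob(event):
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(len(event), 6)

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)

# Independence holds exactly when P(A and B) equals P(A) * P(B).
independent = (p_ab == p_a * p_b)
print(p_a, p_b, p_ab, independent)
```

Working with sets makes it easy to repeat the check for other events by changing only the two set definitions.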

2. Suppose that three programmers are designing computer code for a project:
Mr. A has designed 60% of the code, Mr. B 30% and Mr. C 10%. Suppose further
that Mr. A has a bug in 3% of his work, Mr. B in 7% of his work, and Mr. C in 5%
of his.
A. What percentage of the code written has a bug?
B. Given that you find a bug in a line of code, who is most likely to have written it?
Who is least likely?
C. How does the ordering compare to the unconditional probabilities and why
does this relationship make sense?
Solution:
A. To find the total percentage of code that has a bug, we add up the share of
buggy code contributed by each person (law of total probability).
Bugs from Mr. A = 3% of 60% = 0.03 x 0.60 = 0.018
Bugs from Mr. B = 7% of 30% = 0.07 x 0.30 = 0.021
Bugs from Mr. C = 5% of 10% = 0.05 x 0.10 = 0.005
Total for the three persons = 0.018 + 0.021 + 0.005 = 0.044
= 44/1000
= 44/1000 x 100
= 4.4%
Therefore, 4.4% of the code written has a bug.
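B. Given that a line has a bug, the probability that each person wrote it is
found by dividing his share of buggy code by the total (Bayes' rule):
P(A | bug) = 0.018/0.044 = 0.409
P(B | bug) = 0.021/0.044 = 0.477
P(C | bug) = 0.005/0.044 = 0.114
Mr. B is most likely to have written the buggy line and Mr. C is least likely.
C. Unconditionally, the ordering is A (0.60) > B (0.30) > C (0.10). Given a bug,
the ordering becomes B > A > C. Mr. B overtakes Mr. A because his bug rate (7%)
is more than twice that of Mr. A (3%), which outweighs Mr. A having written
twice as much code. Mr. C stays last because he wrote so little of the code
that even a 5% bug rate contributes few bugs.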
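The total probability of a bug and the conditional probability for each programmer can be checked with a short Python sketch. The dictionary names below are illustrative and not part of the assignment; the numbers are those given in the question.

```python
# Share of the code written by each programmer and each one's bug rate.
share = {"A": 0.60, "B": 0.30, "C": 0.10}
bug_rate = {"A": 0.03, "B": 0.07, "C": 0.05}

# Law of total probability: P(bug) = sum of P(bug | writer) * P(writer).
p_bug = sum(share[k] * bug_rate[k] for k in share)

# Bayes' rule: P(writer | bug) = P(bug | writer) * P(writer) / P(bug).
posterior = {k: share[k] * bug_rate[k] / p_bug for k in share}

# Order the programmers from most likely to least likely, given a bug.
ranking = sorted(posterior, key=posterior.get, reverse=True)

print(round(p_bug, 3))
print({k: round(v, 3) for k, v in posterior.items()})
print(ranking)
```

Comparing `ranking` with the order of `share` shows how conditioning on a bug changes the ordering.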
Assignment two

3. Suppose you take a sample of N independent biologists to determine how
many of them use valid statistical methods.
• In particular, you have a sample of N independent, identically distributed
random variables Yi, with p = P(Yi = 1).
• What is the distribution of the number of successes Y = Y1 + Y2 + ... + YN in
N trials?
Y follows a binomial distribution, Y ~ Bin(N, p):
$$P(Y = y) = \frac{N!}{y!\,(N-y)!}\, p^{y} (1-p)^{N-y}$$
When y = N, P(Y = N) becomes p^N, which is the probability that every trial is
a success.
• Calculate the probability that 0 out of 10 biologists use valid statistical methods
when the probability of using valid statistical methods is 0.8
Solution:
This is a binomial distribution for the random variable X, the number of
biologists who use valid statistical methods.
The random experiment has n = 10 trials, with probability of success on each
single trial p = 0.8 and probability of failure q = 1 - p = 1 - 0.8 = 0.2
For a binomial random variable X, the probability is given by the formula
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
$$P(X = 0) = \frac{10!}{0!\,10!}\,(0.8)^{0}(0.2)^{10} = (0.2)^{10} = 1.024 \times 10^{-7}$$
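The binomial probability can be evaluated directly in Python as a check. This is a small sketch; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability that 0 out of 10 biologists use valid statistical methods when p = 0.8.
p_zero = binomial_pmf(0, 10, 0.8)
print(p_zero)

# The probabilities over x = 0, 1, ..., 10 should add up to 1.
total = sum(binomial_pmf(x, 10, 0.8) for x in range(11))
print(total)
```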

Assignment three
4. Assume that among diabetics the fasting blood level of glucose is
approximately normally distributed with a mean of 105 mg per 100 ml and SD of 9
mg per 100 ml.
a) What proportions of diabetics have levels between 90 and 125 mg per 100 ml?
SOLUTION:
• This question is about the probability distribution of a continuous random
variable, the blood glucose level. Therefore, we use the normal probability
distribution to calculate the probability.
Let X be the fasting blood glucose level
Mean (µ) = 105 mg per 100 ml and SD (σ) = 9 mg per 100 ml
Z = (raw score - population mean) / population SD, i.e. Z = (x-μ)/σ
P(90 < X < 125) = P((90-105)/9 < Z < (125-105)/9)
= P(-1.67 < Z < 2.22) = P(Z < 2.22) - P(Z < -1.67)
= 0.9868 - 0.0475
= 0.9393
Therefore, about 94% of diabetics have a fasting glucose level between 90 and
125 mg per 100 ml.
b) What proportions of diabetics have levels below 87.4 mg per 100 ml?
Solution:
P(X < 87.4) = P(Z < (87.4-105)/9)
= P(Z < -1.96) = P(Z > 1.96) = 1 - P(Z < 1.96)
= 1 - 0.9750
= 0.025
Therefore, 2.5% of diabetics have levels below 87.4 mg per 100 ml.
c) What level cuts off the lower 10% of diabetics?
Solution:
By symmetry, the z value that cuts off the lower 10% is the negative of the z
value that cuts off the upper 10%. From the standard normal distribution table,
P(Z > 1.28) = 0.10, so P(Z < -1.28) = 0.10 and z0 = -1.28.
Z = (x-μ)/σ, so -1.28 = (x-105)/9
x = 105 - 1.28 x 9 = 93.5 mg/100 ml
Therefore, a level of 93.5 mg/100 ml cuts off the lowest 10%.

d) What are the two levels which encompass 95% of diabetics?
Solution:
From the normal distribution table, 95% of the area lies between z = -1.96 and
z = 1.96, so we use these Z values to find the blood glucose level at the two
points.
When z = -1.96
Z = (x1-μ)/σ, so -1.96 = (x1-105)/9
x1 = 87.36 mg/100 ml
When z = 1.96
Z = (x2-μ)/σ, so 1.96 = (x2-105)/9
x2 = 122.64 mg/100 ml
Therefore, the two levels that encompass 95% of diabetics are blood glucose
levels of 87.36 mg/100 ml and 122.64 mg/100 ml.
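The four parts of this question can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch only; because the software does not round z to two decimal places, the results may differ slightly in the third or fourth decimal from values read off a table.

```python
from statistics import NormalDist

# Fasting blood glucose model from the question: mean 105, SD 9 (mg per 100 ml).
glucose = NormalDist(mu=105, sigma=9)

# a) proportion of diabetics with levels between 90 and 125
p_between = glucose.cdf(125) - glucose.cdf(90)

# b) proportion with levels below 87.4
p_below = glucose.cdf(87.4)

# c) level that cuts off the lower 10%
x_lower10 = glucose.inv_cdf(0.10)

# d) the two levels that enclose the central 95%
x_lo, x_hi = glucose.inv_cdf(0.025), glucose.inv_cdf(0.975)

print(round(p_between, 4), round(p_below, 4))
print(round(x_lower10, 1), round(x_lo, 2), round(x_hi, 2))
```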
5. Among a large group of coronary patients it is found that their serum
cholesterol levels approximate a normal distribution. It was found that 10% of the
group had cholesterol levels below 182.3 mg per 100 ml whereas 5% had values
above 359.0 mg per 100 ml. What is the mean and SD of the distribution?
Solution:
From the standard normal table, the Z value with an area of 0.10 below it is
-1.28.
At this level we have a serum cholesterol of 182.3 mg/100 ml
Z = (x-μ)/σ
-1.28 = (182.3-μ)/σ
µ = 182.3 + 1.28σ .................... equation 1
The Z value with an area of 0.05 above it is 1.645
Z = (x-μ)/σ
1.645 = (359.0-µ)/σ
µ = 359.0 - 1.645σ .................... equation 2
By combining equations 1 and 2 we can get the mean and SD
182.3 + 1.28σ = 359.0 - 1.645σ
2.925σ = 176.7
σ = 60.4 mg/100 ml
Hence, the standard deviation is 60.4 mg/100 ml
From equation 1 we get the mean by substituting the SD
µ = 182.3 + 1.28(60.4)
= 259.6 mg/100 ml
Hence, the mean is 259.6 mg/100 ml
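The two simultaneous equations can also be solved numerically. The sketch below takes the z values from `statistics.NormalDist` instead of a printed table, so the mean and SD it gives may differ a little from a hand calculation that uses rounded z values. The variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()          # standard normal: mean 0, SD 1

x_low, x_high = 182.3, 359.0       # cholesterol levels given in the question
z_low = std_normal.inv_cdf(0.10)   # 10% of the area lies below this z
z_high = std_normal.inv_cdf(0.95)  # 5% of the area lies above this z

# x_low = mu + z_low * sigma and x_high = mu + z_high * sigma.
# Subtracting the first equation from the second removes mu.
sigma = (x_high - x_low) / (z_high - z_low)
mu = x_low - z_low * sigma

print(round(mu, 1), round(sigma, 1))

# Check: the fitted distribution should reproduce the two stated percentages.
fitted = NormalDist(mu, sigma)
print(round(fitted.cdf(x_low), 3), round(1 - fitted.cdf(x_high), 3))
```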

Assignment four
6. Let A and B denote two independent genetic traits. Suppose the probability
that an individual will exhibit trait A is ½ and the probability that an individual will
exhibit trait B is ¾.
a) What is the probability that an individual will exhibit Both traits?
Solution:
Since the two traits are independent, the probability of both traits is
P(A and B) = P(A) x P(B) = 1/2 x 3/4 = 3/8 = 0.375
b) Neither trait?
Solution:
Because A and B are independent, their complements A’ and B’ are also
independent, so the probability of neither trait is
P(A’ and B’) = P(A’) x P(B’) = 1/2 x 1/4 = 1/8 = 0.125
c) Trait A but not trait B?
Solution:
A and B’ are also independent, so the probability of trait A but not B is
P(A and B’) = P(A) x P(B’) = 1/2 x 1/4 = 1/8 = 0.125

d) trait B but not trait A?
Solution:
P(A’ and B) = P(A’) x P(B) = 1/2 x 3/4 = 3/8 = 0.375
e) Exactly one trait?
Solution:
We sum the probabilities of the two mutually exclusive ways that yield “exactly
one”
Pr[exactly one] = Pr[(A, not B) or (not A, B)]
= Pr[A, not B] + Pr[not A, B]
= [(0.50)(0.25)] + [(0.50)(0.75)] = 0.125 + 0.375
= 0.50
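All five answers follow from multiplying the probabilities of independent events, which can be verified with exact fractions. The variable names in this Python sketch are illustrative.

```python
from fractions import Fraction

p_a = Fraction(1, 2)   # probability of trait A
p_b = Fraction(3, 4)   # probability of trait B

# Independence lets us multiply probabilities, including those of complements.
both = p_a * p_b
neither = (1 - p_a) * (1 - p_b)
a_only = p_a * (1 - p_b)
b_only = (1 - p_a) * p_b
exactly_one = a_only + b_only   # the two cases are mutually exclusive

print(both, neither, a_only, b_only, exactly_one)
```

The four cases (both, neither, A only, B only) cover every possibility, so their probabilities add up to 1.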
7. A physician develops a diagnostic test that is positive for 95% of the patients
who have disease and is positive for 10% of the patients who do not have disease.
Of patients tested, 20% actually have disease. Suppose you evaluate a patient by
administering this diagnostic test and obtain a positive result. Using the
information given, calculate the probability that this patient has disease.
Solution:
• This is a conditional probability problem.
• By using Bayes' formula, we can calculate the required probability as follows.
We are asked to calculate Pr (disease | + test)
Given:
• Pr (+ test | disease) = 0.95
• Pr (+ test | no disease) = 0.10
• Pr (disease) = 0.20, which implies Pr (no disease) = 0.80
Pr (disease | +) = Pr (disease and +) / Pr (+) .............. conditional probability
= Pr (+ | disease) Pr (disease) / Pr (+)
= Pr (+ | disease) Pr (disease) / [Pr (+ | disease) Pr (disease) + Pr (+ | no
disease) Pr (no disease)]
= (0.95)(0.20) / [(0.95)(0.20) + (0.10)(0.80)]
= 0.19 / 0.27
= 0.7037
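The Bayes calculation can be wrapped in a small Python function so that it can be repeated for other tests or prevalences. This is a sketch; the function and parameter names are illustrative.

```python
def prob_disease_given_positive(p_pos_given_disease, p_pos_given_healthy, prevalence):
    """Bayes' formula for Pr(disease | positive test)."""
    # Total probability of a positive test (law of total probability).
    p_positive = (p_pos_given_disease * prevalence
                  + p_pos_given_healthy * (1 - prevalence))
    return p_pos_given_disease * prevalence / p_positive

# Values from the question: 95% positive among diseased, 10% among non-diseased, 20% prevalence.
answer = prob_disease_given_positive(0.95, 0.10, 0.20)
print(round(answer, 4))
```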
8. The height, X, of young American women is distributed normal with mean
μ=65.5 and standard deviation σ=2.5 inches. Find the probability of each of the
following events.
a. X < 67
Solution:
Height is a continuous random variable, so its probabilities are found using
the normal distribution and the z score
Z = (x-μ)/σ
P(X < 67) = P(Z < (67-65.5)/2.5) = P(Z < 0.6)
= 0.7257
b. 64 < X < 67
Solution:
P(64 < X < 67) = P((64-65.5)/2.5 < Z < (67-65.5)/2.5)
= P(-0.6 < Z < 0.6) = P(Z < 0.6) - P(Z < -0.6)
= 0.7257 - 0.2743
= 0.4514
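Both probabilities can be checked with `statistics.NormalDist` from the Python standard library. This is a minimal sketch, and the result may differ in the fourth decimal place from a value obtained by subtracting rounded table entries.

```python
from statistics import NormalDist

# Height model from the question: mean 65.5 inches, SD 2.5 inches.
height = NormalDist(mu=65.5, sigma=2.5)

p_a = height.cdf(67)                   # a. P(X < 67)
p_b = height.cdf(67) - height.cdf(64)  # b. P(64 < X < 67)

print(round(p_a, 4), round(p_b, 4))
```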
9. Four buses carrying 148 students arrive at a football stadium. The buses carry,
respectively, 25, 33, 40 and 50 students. After everyone gets off the buses, a
random student is picked at random. Let X denote the number of students that
were on his/her bus. Also, one of the drivers is picked at random. Let Y denote the
number of students that were on his/her bus.
(a) Compute E[X] and E[Y ]. How do you explain the difference?
Solution:

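X and Y both take the values 25, 33, 40 and 50. A student is picked from 148
students, so P(X = x) = x/148. A driver is picked from 4 drivers, so
P(Y = y) = 1/4.

Value (x or y)    P(X = x)    P(Y = y)
25                25/148      1/4
33                33/148      1/4
40                40/148      1/4
50                50/148      1/4

Expected value of X = E(X) = 25 x 25/148 + 33 x 33/148 + 40 x 40/148 + 50 x 50/148
= 5814/148
= 39.3
Expected value of Y = E(Y) = 25 x 1/4 + 33 x 1/4 + 40 x 1/4 + 50 x 1/4
= 37
• E(X) is larger than E(Y) because a randomly picked student is more likely to
come from a bus that carried many students, so the larger buses get more
weight. E(X) is the average per student, whereas E(Y) is the plain average
per bus.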
(b) Compute Var[X] and Var [Y].
Solution:
The variance Var(X) can be calculated as E(X^2) - (E(X))^2
E(X^2) = 25^2 x 25/148 + 33^2 x 33/148 + 40^2 x 40/148 + 50^2 x 50/148 = 1625.4
(E(X))^2 = (39.284)^2 = 1543.2
Var(X) = 1625.4 - 1543.2
= 82.2
SD(X) = square root of 82.2 = 9.07
Var(Y) = E(Y^2) - (E(Y))^2
E(Y^2) = 25^2 x 1/4 + 33^2 x 1/4 + 40^2 x 1/4 + 50^2 x 1/4 = 1453.5
(E(Y))^2 = 37 x 37 = 1369
Var(Y) = 1453.5 - 1369
= 84.5
SD(Y) = square root of 84.5 = 9.19
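The expected values and variances can be recomputed without intermediate rounding by using exact fractions. In this Python sketch the helper names `mean` and `variance` are illustrative; carrying E(X) unrounded matters here because squaring a rounded mean shifts the variance noticeably.

```python
from fractions import Fraction

sizes = [25, 33, 40, 50]           # students on each bus
total = sum(sizes)                 # 148 students in all

# X: bus size seen by a randomly picked student (larger buses are more likely).
pmf_x = {s: Fraction(s, total) for s in sizes}
# Y: bus size seen by a randomly picked driver (each bus equally likely).
pmf_y = {s: Fraction(1, len(sizes)) for s in sizes}

def mean(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    return sum(value * p for value, p in pmf.items())

def variance(pmf):
    """Variance computed as E(V^2) - (E(V))^2."""
    return sum(value**2 * p for value, p in pmf.items()) - mean(pmf)**2

print(float(mean(pmf_x)), float(variance(pmf_x)))
print(float(mean(pmf_y)), float(variance(pmf_y)))
```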
Assignment (confidence interval)
10. Find the 99% confidence interval for the proportion of children with
ontological abnormalities in a survey of ontological examination of school
children, out of 146 children examined 21 were found to have some type of
ontological abnormalities?
Solution:
• We need the 99% confidence interval for a population proportion, using
interval estimation.
We can use the following formula for a proportion:
$$CI = p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
We have p = 21/146 = 0.144 and 1 - p = 0.856
Zα/2 = 2.58 for the 99% confidence level and n = 146
$$CI = 0.144 \pm 2.58\sqrt{\frac{(0.144)(0.856)}{146}} = 0.144 \pm 2.58 \times 0.029$$
Upper limit = 0.144 + 2.58 x 0.029 = 0.2188
Lower limit = 0.144 - 2.58 x 0.029 = 0.0692
CI = (0.0692, 0.2188)
• This means we are 99% confident that the population proportion of children
with such abnormalities lies between 0.0692 and 0.2188
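The interval can be reproduced in Python with the same normal-approximation formula. This is a sketch with an illustrative function name; it takes the z value from `statistics.NormalDist` (about 2.576) and does not round p or the standard error, so the limits may differ from the hand calculation in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# 21 of the 146 children examined had an abnormality.
low, high = proportion_ci(21, 146, 0.99)
print(round(low, 4), round(high, 4))
```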
Assignment (sample size)
11. A health officer wishes to estimate the mean serum cholesterol in a population
of men. From previous similar studies a standard deviation of 40 mg/100ml was
reported. If he is willing to tolerate a margin of error of up to 5 mg/100ml in his
estimate, how many subjects should be included in his study? (95% CI)
a) If the population size is assumed to be very large, what would be the required
sample size?

Solution:
• We need to calculate the sample size for estimating the mean serum
cholesterol of a large population.
• For a population size N > 10,000 we have the formula
$$n = \frac{(Z_{\alpha/2})^2 \sigma^2}{d^2}$$
where d is the margin of error, equal to half of the confidence interval width.
• We are given: σ = 40 mg/100 ml, d = 5 mg/100 ml, CL = 95%, Zα/2 = 1.96
n = (1.96)^2 (40)^2 / (5)^2
n = 6146.56/25
n = 245.86 ≈ 246
• We need 246 persons as the sample size
b) If the population size is, say, N = 2000, what would be the required sample
size?
Solution:
• Since the population size is small, we have to adjust the sample size with
the finite population correction formula:
n final = n / (1 + n/N)
n final = 246 / (1 + 246/2000) = 246/1.123 = 219.1
n final = 220 (rounded up)
• We need 220 persons as the sample size for a population size of 2000
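Both sample size steps can be checked with a short Python sketch. The function names are illustrative; results are rounded up because a sample size must be a whole number that still meets the required precision.

```python
from math import ceil

def sample_size_for_mean(sigma, d, z=1.96):
    """n = z^2 * sigma^2 / d^2 for estimating a mean in a very large population."""
    return z**2 * sigma**2 / d**2

def finite_population_correction(n, population_size):
    """Adjusted sample size n / (1 + n/N) for a small population."""
    return n / (1 + n / population_size)

# a) very large population: SD 40, margin of error 5, 95% confidence
n_large = ceil(sample_size_for_mean(sigma=40, d=5))

# b) population of N = 2000
n_final = ceil(finite_population_correction(n_large, 2000))

print(n_large, n_final)
```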

Assignment 5
12.Please review literature and identify sample size calculation formula for the
following study design

a. Unmatched and matched Case control study design
A case-control study compares patients who have a disease or outcome of
interest (cases) with patients who do not have the disease or outcome
(controls). It looks back retrospectively to compare how frequently the
exposure to a risk factor is present in each group, in order to determine the
relationship between the risk factor and the disease. For an unmatched
case-control study, the recommended sample size is calculated from a set of
study parameters and the desired confidence level.

A matched case-control study is an analytical study that estimates the
statistical relationship between exposures and the likelihood of becoming ill
in a given patient population. This study is used to investigate the cause of
an illness by selecting a non-ill person as the control and matching the
control to a case.
Information required:
• POWER: probability of detecting a real effect.
• ALPHA: probability of detecting a false effect (two-sided; double this if
you need one-sided).
• r: correlation coefficient (ϕ) for exposure between matched cases and
controls.
• P0: probability of exposure in the control group.
• m: number of control subjects matched to each case subject.
• OR: odds ratio (ψ).

Sample size when a proportion is the parameter of the study, or the data are on
a nominal/ordinal scale:
$$n = \frac{r+1}{r} \times \frac{p(1-p)\,(Z_{1-\beta} + Z_{1-\alpha/2})^2}{(P_1 - P_2)^2}$$
where
n = Desired number of samples
r = Control to cases ratio (1 if same numbers of subject in both groups)
p = Proportion of population = (P1 +P2 )/2
Z1-β = It is the desired power (0.84 for 80% power and 1.28 for 90% power)
z1-α/2 = Critical value and a standard value for the corresponding level of
confidence. (At 95% CI or 5% type I error it is 1.96 and at 99% CI or 1% type
I error it is 2.58)
P1 = Proportion in cases
P2 = Proportion in controls
Sample size when the data are on an interval/ratio (quantitative) scale and the
mean is the parameter of the study:
$$n = \frac{r+1}{r} \times \frac{\sigma^2\,(Z_{1-\beta} + Z_{1-\alpha/2})^2}{d^2}$$
Where: n = Number of samples which we need to find out
r = Control to cases ratio
p = Proportion of population = P1+P2 /2
Z 1-β = It is the desired power
z1-α/2 = Critical value and a standard value for the level confidence
σ = SD which is based on a previous study or pilot study
d = Effect size (difference in the means from previous studies or pilot
study)
b. Cohort study designs

A cohort study is an analytic study design in which healthy individuals with
and without exposure to a risk factor are followed up for a certain period of
time and their outcomes are observed. The cohort study design is the best
available observational method for measuring the effects of a suspected risk
factor. In a prospective cohort study, researchers raise a question and form a
hypothesis about what might cause a disease. Then they observe a group of
people, known as the cohort, over a period of time. The results in the exposed
and unexposed groups can be compared by using the relative risk.
The sample size for cohort study can be calculated by the following
Formula:

c. Comparative-cross-sectional study design


A cross-sectional study is a type of observational study design. In a
cross-sectional study, the investigator measures the outcome and the exposures
in the study participants at the same time. We can estimate the prevalence of
disease in cross-sectional studies. Although the majority of cross-sectional
studies are quantitative, cross-sectional designs can also be qualitative or
mixed-method.
Hence, for a qualitative variable, we can calculate the sample size with the
following formula:

$$\text{Sample size} = \frac{Z_{1-\alpha/2}^{2}\, P(1-P)}{d^{2}}$$
Where: Z1-α/2 = standard normal variate
P = expected proportion in the population, based on a previous study
d = precision / margin of error
For a quantitative variable we can use:
$$\text{Sample size} = \frac{Z_{1-\alpha/2}^{2}\, SD^{2}}{d^{2}}$$
Where: SD = standard deviation of the variable

d. Interventional study design
Interventional studies, also called experimental studies, are analytical
studies in which the researcher intervenes as part of the study design.
There are different types of interventional studies. The main intervention
study design is the randomized controlled trial (RCT). A pre-post clinical
trial/cross-over trial is one in which the subjects are first assigned to the
treatment group and, after a brief interval for cessation of the residual
effect of the drug, are shifted into the placebo/alternative group. Sample size
should be calculated based on the primary hypothesis. If the primary hypothesis
is that there is a difference from baseline to end of intervention for each
marker, then the sample size should be calculated using the standard deviation
and the assumed effect size for each marker.

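For a quantitative variable:
$$\text{Sample size} = \frac{2\sigma^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{d^{2}}$$
For a qualitative variable:
$$\text{Sample size} = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}\, P(1-P)}{(P_1 - P_2)^{2}}$$
Where: σ = standard deviation
Zα/2 = standard normal value for the chosen level of significance
Zβ = standard normal value for the desired power
d = effect size / difference between means
P1 and P2 = proportions of events in the two groups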
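To show how the two-group formulas are applied, here is a Python sketch. The function names and the example inputs (SD of 10, difference of 5, proportions of 0.30 and 0.50, 5% significance and 80% power) are hypothetical and do not come from any study. P is assumed to be the pooled proportion (P1 + P2)/2, and the result is taken as the number of subjects per group.

```python
from math import ceil
from statistics import NormalDist

def z_sum_squared(alpha, power):
    """(Z_alpha/2 + Z_beta)^2 for a two-sided test at level alpha with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2

def n_per_group_means(sd, d, alpha=0.05, power=0.80):
    """Quantitative variable: 2 * sd^2 * (Z_alpha/2 + Z_beta)^2 / d^2, rounded up."""
    return ceil(2 * sd**2 * z_sum_squared(alpha, power) / d**2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Qualitative variable: 2 * (Z_alpha/2 + Z_beta)^2 * P(1-P) / (P1-P2)^2, rounded up."""
    p_pooled = (p1 + p2) / 2   # assumption: P is the average of P1 and P2
    return ceil(2 * z_sum_squared(alpha, power) * p_pooled * (1 - p_pooled) / (p1 - p2) ** 2)

# Hypothetical inputs, for illustration only.
print(n_per_group_means(sd=10, d=5))
print(n_per_group_proportions(p1=0.30, p2=0.50))
```

A smaller expected difference or a larger standard deviation increases the required sample size, because both enter the formula as squares.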
Reference:
Some questions, such as those in Assignment 5, draw on the following sources:

Daniel (1995), Biostatistics.

Marcello Pagano, Principles of Biostatistics, 2nd Edition.

https://www.ncbi.nlm.nih.gov/pmc/articles

The end!
