• Usefulness of vaccines
• Incidence/prevalence of disease.
Probability
Sampling theory
Sample
Study population
Target population
Biostatistics -Notes WA , SPH AAU ,2016
Parameter and Statistic
Parameter: A descriptive measure computed
from the data of a population.
Statistic: A descriptive measure computed from
the data of a sample.
Example:
Types of variables
• Qualitative (or categorical)
• Quantitative (measurement)
Measurement scales
Types of variable
Continuous variable: It can have an infinite number
of possible values in any given interval.
• Unlimited number of possible values
• Infinite number of values can fall between any 2
observed values
• No gaps between units
Example: time taken to solve a problem; height, weight,
or temperature of patients
2. Secondary source data: when an investigator uses data which have already
been collected by others. Secondary sources can be individuals or agencies,
which supply data originally collected for other purposes by them or others.
• They are less expensive in time and cost than primary data.
• Usually they are published or unpublished materials, records, reports,
etc.
Quantitative data:
• Histogram
• Stem-and-leaf plot
• Box plot
• Scatter plot
• Line graph
• Others
Figure 1.3 Bar graph showing the number of students of each category (Excellent, Good, Medium, Bad)
• This table shows two characteristics and is formed when either of the two
variables (the caption or the stub) is divided into two or more parts.
• For instance, the marital status and cervical cancer status can be presented
in the following two way table.
• The bar graph is very commonly used and is better for representation of
qualitative data. Bars are vertical lines, where the lengths of the bars are
proportional to their corresponding numerical values, and the bars should be
equally spaced.
• A pie chart enables us to show the partitioning of a total into its component parts.
• The diagram is in the form of a circle, with the components as slices of the circle.
• The size of the slice represents the proportion of the component out of the total.
$$\text{Degree of } X = \frac{\text{value of component } X}{\text{total value of the components}} \times 360^\circ$$
Example: The following data indicate the marital status of 40 women who came for
contraceptive services to St. Paul HMMC. Present the data using a pie diagram.
$$\text{Degree of married women} = \frac{8}{40} \times 360^\circ = 72^\circ$$
Likewise, the slice degrees of the pie chart for widowed, separated and
single women are 108, 144 and 36, respectively.
Graph 2.3: The pie diagram presentation of the marital status of 40 women who came for
contraceptive service to St. Paul HMMC (Single 10%, Married 20%, Separated 40%, Widowed 30%).
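The slice angles in this example can be checked with a short sketch. Only the married count (8 of 40) is stated in the notes; the other counts are derived from the percentages in the chart, and the names `counts` and `slice_degrees` are illustrative.

```python
# Slice angle for each category: (count / total) * 360 degrees.
# Only "Married" = 8 is given in the text; the remaining counts are derived
# from the chart percentages (30%, 40%, 10% of 40), which is an assumption.
counts = {"Married": 8, "Widowed": 12, "Separated": 16, "Single": 4}


def slice_degrees(counts):
    """Return the central angle (in degrees) of each pie slice."""
    total = sum(counts.values())
    return {category: value * 360 / total for category, value in counts.items()}


degrees = slice_degrees(counts)
for category, angle in degrees.items():
    print(f"{category}: {angle:.0f} degrees")
```

The angles always add up to 360, which is a quick sanity check before drawing the chart.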
Pie charts
Divide a complete circle (a pie) into slices, each
corresponding to a category, with the central angle and
hence the area of the slice proportional to the category
relative frequency.
Example 1.4b (Pie Chart)
Graph 2.4 Infant and under five mortality rate in Ethiopia, 1970-2005
(Tefera Darge 2011; EDHS, 2000, 2005)
[Line graph: number of confirmed malaria cases by month (Jan to Dec), with series Positive, P. falciparum and P. vivax]
Line graph cont..
A line graph can also be used to depict the
relationship between two continuous
variables, like a scatter diagram.
[Line graph: blood zidovudine concentration against time since administration (min.)]
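• Sample mean
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
• Population mean
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
• Mean for grouped data
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
where,
k = the number of class intervals
m_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval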
Example 1: if the mark of five medical students is: 80, 75, 60, 50, 90 the mean
mark of the students is calculated as:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{\sum_{i=1}^{5} X_i}{5} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5} = \frac{80 + 75 + 60 + 50 + 90}{5} = \frac{355}{5} = 71$$
• Therefore, the mean mark of the students was 71.
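The same calculation can be written as a minimal Python sketch; the function name `mean` and the variable `marks` are illustrative, not part of the notes.

```python
# Arithmetic mean: the sum of the observations divided by their number.
marks = [80, 75, 60, 50, 90]  # marks of the five medical students


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


print(mean(marks))
```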
Exercise: You measure the body lengths (in inches) of 10 full-term infants at
birth and record the following: 17.5, 19.5, 17.5, 19, 20, 21, 18,
19.5, 18, 10.75. Compute the mean length of the infants for these
data.
$$\bar{X} = \frac{54(2) + 64(3) + 74(5) + 84(10) + 94(3) + 104(2)}{2 + 3 + 5 + 10 + 3 + 2} = \frac{2000}{25} = 80$$
Mean = 15.3
Advantage
• Mathematical center of a distribution.
• Just as far from scores above it as it is from scores below it.
• Good for interval and ratio data.
• Includes all the values of the data set and is unique.
• Inferential statistics is based on mathematical properties of the mean.
Disadvantage
• It is affected by extreme values and skewed distributions that are not
representative of the rest of the data.
• May not exist in the data.
Measures of location ( or central tendency)
• Example: Consider 189 subjects: 48, 35, …, 66.
$\bar{x}$ = (48+35+…+66)/189 = 55.032
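$$\tilde{X} = \text{Median} = \begin{cases} x_{k} & \text{if } n = 2k - 1 \ (n \text{ is odd}) \\ \frac{1}{2}\left(x_{k} + x_{k+1}\right) & \text{if } n = 2k \ (n \text{ is even}) \end{cases}$$
Equivalently,
$$\operatorname{median}(X) = \begin{cases} \left(\frac{n+1}{2}\right)^{\text{th}} \text{ largest value} & \text{when } n \text{ (size of the data) is odd} \\ \frac{1}{2}\left[\left(\frac{n}{2}\right)^{\text{th}} + \left(\frac{n+2}{2}\right)^{\text{th}}\right] \text{ value} & \text{when } n \text{ is even} \end{cases}$$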
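Example: to find the median of 6, 2, 7, 13, 4, 9, 15, 1, 12.
The sample size is n = 9 (odd), so the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ largest value.
Arranged in order, the data are 1, 2, 4, 6, 7, 9, 12, 13, 15.
The median of the data becomes the $\left(\frac{9+1}{2}\right)^{\text{th}}$ largest value = the 5th value,
which is 7.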
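The rule for odd and even sample sizes can be sketched in Python as follows; the function name `median` is illustrative, and the even-sized list in the usage lines is a made-up check, not data from the notes.

```python
def median(values):
    """Return the median: the middle ordered value for odd n,
    or the average of the two middle ordered values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value, which is index n // 2 (0-based).
        return ordered[mid]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


example = [6, 2, 7, 13, 4, 9, 15, 1, 12]
print(median(example))
print(median([1, 2, 3, 4]))  # hypothetical even-sized data
```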
Exercise: Compute the sample median for the birth weight data:
3265, 3314, 2581, 2759, 2834, 2838, 2841, 3031, 3200, 3245, 3260, 3323,
4146, 3609, 3484, 3101, 3248, 2069, 3649, 3541.
$$\tilde{X} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$
• where,
L_m = lower true class boundary of the interval containing the
median
F_c = cumulative frequency of the interval just above (preceding) the
median class interval
f_m = frequency of the interval containing the median
W = class interval width
n = total number of observations
Advantage Disadvantage
Range (R)
• R = xmax – xmin, where
xmax is the largest value and xmin is the smallest value.
Example: for the given data set: 100, 95, 125, 45, 70, the range is calculated
as:
R= xmax – xmin
R= 125 – 45
Range = 80.
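As a quick check of the example, the range can be computed with Python's built-in `max` and `min`; the name `value_range` is illustrative.

```python
# Range: largest observation minus smallest observation.
data = [100, 95, 125, 45, 70]


def value_range(values):
    """Return the range (max - min) of a non-empty list of numbers."""
    return max(values) - min(values)


print(value_range(data))
```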
Properties of Range
• Range and relative range are easy to calculate and simple to understand.
• Both cannot be computed for grouped data with open ended classes.
• They do not tell us anything about the distribution of values in the
series.
Exercise 1: Find the range for the monthly salary of ten workers in a certain
health center given below: 462, 480, 534, 624, 498, 552, 606, 588, 516,
570.
Interquartile range (IQR)
• IQR = Q3 ‐ Q1, where
Q3 is the third quartile and Q1 is the first quartile.
Example: Suppose the first and third quartiles for weights of
girls 12 months of age are 8.8 kg and 10.2 kg, respectively.
The interquartile range is therefore
IQR = 10.2 kg – 8.8 kg = 1.4 kg
i.e., 50% of infant girls at 12 months weigh between
8.8 and 10.2 kg.
• Where
mi = the mid‐point of the ith class interval
fi = the frequency of the ith class interval
$\bar{x}$ = the sample mean
k = the number of class intervals
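Solution: As the data are collected from the population, the variance is calculated
using:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
But first the mean is calculated as:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{80 + 70 + 95 + 100 + 125}{5} = \frac{470}{5} = 94$$
To calculate the variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{(-14)^2 + (-24)^2 + 1^2 + 6^2 + 31^2}{5} = \frac{1770}{5} = 354$$
The standard deviation will be:
$$\sigma = \sqrt{354} \approx 18.81$$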
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}$$
The standard deviation is the square root of the variance,
i.e. $\text{S.D} = \sqrt{\text{Variance}}$
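The population variance and standard deviation for the five values used in the solution above (80, 70, 95, 100, 125) can be recomputed with a short sketch. The function names are illustrative; the divisor is N because the data are treated as a whole population.

```python
import math

data = [80, 70, 95, 100, 125]


def population_variance(values):
    """Mean of the squared deviations from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_sd(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


print(population_variance(data))
print(round(population_sd(data), 2))
```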
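• Example: In a study, the weights of six newborn babies were recorded as
below. Find the variance and S.D.
Xi    1.5   2.5   3
fi    2     3     1
$$\mu = \frac{\sum_{i} f_i X_i}{\sum_{i} f_i} = \frac{2(1.5) + 3(2.5) + 1(3)}{2 + 3 + 1} = \frac{13.5}{6} = 2.25$$
$$\sigma^2 = \frac{\sum_{i} f_i (X_i - \mu)^2}{N} = \frac{2(1.5 - 2.25)^2 + 3(2.5 - 2.25)^2 + 1(3 - 2.25)^2}{6} = \frac{1.125 + 0.1875 + 0.5625}{6} = \frac{1.875}{6} = 0.3125$$
$$\sigma = \sqrt{0.3125} \approx 0.56$$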
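The grouped calculation for the six newborn weights can be sketched as below, squaring each deviation before multiplying it by its frequency. The function name `grouped_population_variance` is illustrative.

```python
import math

weights = [1.5, 2.5, 3.0]  # distinct recorded weights (X_i)
freqs = [2, 3, 1]          # number of babies with each weight (f_i)


def grouped_population_variance(values, freqs):
    """Population variance for values with frequencies:
    sum of f_i * (X_i - mu)^2, divided by N = sum of f_i."""
    n = sum(freqs)
    mu = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mu) ** 2 for x, f in zip(values, freqs)) / n


variance = grouped_population_variance(weights, freqs)
print(variance, math.sqrt(variance))
```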
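$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
• Note: for sample data we divide by (n-1) instead of by n as for the
population variance, because this gives an unbiased estimator of the
population variance.
For grouped data the denominator becomes $\sum_{i=1}^{k} f_i - 1$:
$$S^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{X})^2}{\sum_{i=1}^{k} f_i - 1}$$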
Example: A sample of 6 children was taken from a population, with ages
17, 18, 19, 20, 22, 24. Calculate:
A) the variance B) the standard deviation
First the sample mean is calculated as:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{17 + 18 + 19 + 20 + 22 + 24}{6} = \frac{120}{6} = 20$$
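Parts A and B can then be worked out with Python's standard `statistics` module, whose `variance` and `stdev` functions use the (n-1) divisor for sample data. This is a sketch of the remaining steps, not the worked solution from the notes.

```python
import statistics

ages = [17, 18, 19, 20, 22, 24]

sample_mean = statistics.mean(ages)          # 120 / 6
sample_variance = statistics.variance(ages)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(ages)           # square root of the sample variance

print(sample_mean, sample_variance, round(sample_sd, 3))
```

The squared deviations from the mean of 20 are 9, 4, 1, 0, 4 and 16, which sum to 34, so the variance is 34/5.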
1) 19, 20, 24, 12, 17, 22, 18, 20, 23, 17.
2) Age    Frequency
   22     3
   23     2
   24     4
   26     1

         Q1          Q2
Mean     19.2        23.4
SD       3.489667    1.264911
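The means and standard deviations reported for the two data sets can be reproduced with the sketch below. It assumes both are treated as samples (divisor n-1), which is what matches the reported SD values; the variable names are illustrative.

```python
import statistics

# Data set 1: raw observations.
q1 = [19, 20, 24, 12, 17, 22, 18, 20, 23, 17]

# Data set 2: ages with frequencies, expanded into one value per subject.
age_freq = {22: 3, 23: 2, 24: 4, 26: 1}
q2 = [age for age, freq in age_freq.items() for _ in range(freq)]

for label, data in (("Q1", q1), ("Q2", q2)):
    print(label, statistics.mean(data), round(statistics.stdev(data), 6))
```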
• Two distributions may have the same mean and standard deviation but they
may differ in their shape of the distribution.
• A further description of their characteristics is therefore necessary; this is
provided by skewness.
• A ∪ B = {1, 2, 3, 4, 5, 6, a, b, c}
• A ∩ B = {1, 2, 5}
Basic characteristics of Set
1. A = A, A = A, AU = U, AU= A
2. A ∪ A = A, A ∩ A = A;
5. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C);
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
6. (Aᶜ)ᶜ = A
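The idempotent, distributive and double-complement properties listed above can be checked on small sets with Python's built-in `set` type. The sets `U`, `A`, `B` and `C` below are hypothetical examples chosen for illustration; they do not come from the notes.

```python
# Hypothetical universal set and subsets, used only to illustrate the laws.
U = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 6, 8}


def complement(s):
    """Complement of s relative to the universal set U."""
    return U - s


checks = {
    "idempotent union": (A | A) == A,
    "idempotent intersection": (A & A) == A,
    "union over intersection": (A | (B & C)) == ((A | B) & (A | C)),
    "intersection over union": (A & (B | C)) == ((A & B) | (A & C)),
    "double complement": complement(complement(A)) == A,
}

for name, holds in checks.items():
    print(f"{name}: {holds}")
```

A check on one set of examples does not prove the laws, but it is a convenient way to catch a misremembered identity.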
Example2
– We may hear a physician say that a patient has a 50-50 chance
of surviving a certain operation.
P(A and B) = 0
Bright light 18 3 21
Reduced light 21 18 39
TOTAL 39 21 60
1-19 times              32     7     39
20-99 times             18    20     38
more than 100 times     25     9     34
----------------------------------------
Total                   75    36    111
----------------------------------------
1. What is the probability that a randomly picked person is male?
2. What is the probability that a randomly picked person has used cocaine more than 100
times?
3. Given that the selected person is male, what is the probability that the person
has used cocaine more than 100 times?
4. Given that the person has used cocaine less than 100 times, what is the
probability of being female?
5. What is the probability that a randomly picked person is male and has used cocaine
more than 100 times?
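The five questions can be worked through with exact fractions, as in the sketch below. The header row of the table is missing from the notes, so reading the first numeric column as male and the second as female is an assumption; all variable names are illustrative.

```python
from fractions import Fraction

# ASSUMPTION: first numeric column = male, second = female (header row is lost).
counts = {
    "1-19 times": {"male": 32, "female": 7},
    "20-99 times": {"male": 18, "female": 20},
    "more than 100 times": {"male": 25, "female": 9},
}

total = sum(sum(row.values()) for row in counts.values())  # 111 people
males = sum(row["male"] for row in counts.values())        # 75 males
heavy_row = counts["more than 100 times"]
light_rows = [counts["1-19 times"], counts["20-99 times"]]
light_total = sum(sum(row.values()) for row in light_rows)  # 77 people

p_male = Fraction(males, total)                              # question 1
p_heavy = Fraction(sum(heavy_row.values()), total)           # question 2
p_heavy_given_male = Fraction(heavy_row["male"], males)      # question 3
p_female_given_light = Fraction(
    sum(row["female"] for row in light_rows), light_total
)                                                            # question 4
p_male_and_heavy = Fraction(heavy_row["male"], total)        # question 5

for name, p in [
    ("P(male)", p_male),
    ("P(>100 times)", p_heavy),
    ("P(>100 times | male)", p_heavy_given_male),
    ("P(female | <100 times)", p_female_given_light),
    ("P(male and >100 times)", p_male_and_heavy),
]:
    print(f"{name} = {p} = {float(p):.3f}")
```

Question 3 divides by the number of males rather than by the grand total, which is what conditioning on "male" means; the same answer follows from P(male and >100 times) / P(male).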
Conditional Probability
1. For independent events A and B,
P(A/B) = P(A).
2. For non-independent events A and B,
P(A and B) = P(A/B) P(B) (General Multiplication Rule)
3. Bayes' theorem:
P(A/B) = P(B/A) P(A) / P(B)
Bayes’s Formula
If the event B may occur together with one and only one
of n mutually exclusive events A1, A2, ..., An then
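The conclusion of this statement is usually written as follows. This is the standard textbook form rather than a formula reproduced from these notes; the index j and the vertical-bar notation P(· | ·), in place of the P(A/B) used above, are assumptions.

```latex
P(A_i \mid B) \;=\; \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j=1}^{n} P(B \mid A_j)\, P(A_j)},
\qquad i = 1, 2, \dots, n
```

The denominator is P(B) expanded over the mutually exclusive events A1, A2, ..., An, so for a single event A this reduces to P(A/B) = P(B/A) P(A) / P(B).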
• 0 1/8
• 1 3/8
• 2 3/8
• 3 1/8
• n! = n × (n-1) × (n-2) × … × 2 × 1 = n × (n-1)!
• By definition, 0! = 1.
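The factorial is what the counting term of the binomial formula is built from. The sketch below uses Python's `math.factorial` and `math.comb` and assumes that the probabilities 1/8, 3/8, 3/8, 1/8 listed above come from a binomial distribution with n = 3 and p = 1/2 (for example, the number of heads in three tosses of a fair coin); the notes do not name the experiment, so that reading is an assumption.

```python
from fractions import Fraction
from math import comb, factorial


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# ASSUMPTION: n = 3 trials with success probability 1/2.
half = Fraction(1, 2)
distribution = [binomial_pmf(x, 3, half) for x in range(4)]

for x, prob in enumerate(distribution):
    print(x, prob)

# The recursive property n! = n * (n - 1)! and the convention 0! = 1.
print(factorial(5) == 5 * factorial(4), factorial(0))
```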
Outcome of X
Person1 Person2 Prob No of smokers
_____________________________________________________________________________________________________________________