COMH 601
By
• Take-home assignments
• Group discussions
7. F. L. Hernandez: Biostatistics: A Guide to
Design, Analysis and Discovery
8. H. Motulsky. Intuitive Biostatistics
9. C. Friis. Introduction to Biostatistics for Health
Sciences
10. V. Shukla: Biostatistics: perspective in
Healthcare Research & Practice
11. G. Van Belle: Biostatistics: A Methodology for
health Sciences
12. Rao: Biostatistics: A manual of methods use in
Health, Nutrition and Anthropology. 2nd edition
13. Rosner. Fundamentals of Biostatistics. 7th
edition
Descriptive Statistics
• Biostatistics: a science that deals with the
collection, organization, analysis, interpretation
and presentation of information that can be
stated numerically.
The application of statistical methods to the
fields of biological and medical sciences or
public health.
Has central role in medical investigations
Concerned with interpretation of biological data
& the communication of information about data
Uses of biostatistics
• Provide a way of organizing information
• Assessment of health status
• Health program evaluation
• Resource allocation
• aids in summarizing the results
• helps us recognize underlying trends and
tendencies in the data
• aids in communicating the results to others
• Magnitude of association
– Strong vs weak association between
exposure and outcome
• Assessing risk factors
– Cause & effect relationship
• Evaluation of a new vaccine or drug
– What can be concluded if the proportion of
people free from the disease is greater among
the vaccinated than the unvaccinated?
– How effective is the vaccine (drug)?
– Is the effect due to chance or some bias?
• Drawing of inferences
– Information from sample to population
What does biostatistics cover?
Research Planning
Presentation
Interpretation
Publication
Types of Statistics
1. Descriptive statistics:
• Ways of organizing and summarizing data
• Helps to identify the general features and
trends in a set of data and extracting
useful information
• Also very important in conveying the final
results of a study
– Example: tables, graphs, numerical summary
measures
2. Inferential statistics:
• Methods used for drawing conclusions
about a population based on the
information obtained from a sample of
observations drawn from that population
– Example: Principles of probability, estimation,
CI, comparison of two or more means or
proportions, hypothesis testing, etc.
• Population
– complete set of individuals, objects or
measurements
• Sample
– a sub-set of a population
Type of data and scales of measurement
• All measurements are not the same.
• Measuring the weight of a patient, e.g. 40 kg
• Measuring the status of a patient on a scale, e.g.
“improved”, “stable”, “not improved”.
3. Interval scale: assigns each measurement
to one of an unlimited number of categories
that are equally spaced.
• It has no true zero point.
Example: Temperature measured on
Celsius or Fahrenheit
Ratio
Degree of precision in measuring
Variables
Variable: A characteristic which takes different
values in different persons, places, or things
Qualitative variable: The notion of magnitude is
absent or implicit.
Quantitative variable: Variable that has magnitude.
SUMMARY
Variable
• Types of variables: quantitative (measurement) or
qualitative (categorical)
• Measurement scales
Methods of Data Organization
and Presentation
• The actual organization and
summarization of data starts from
frequency distribution.
Frequency Distributions
• Simple frequency distribution:
– It is useful for categorical variable
– For continuous variable it is not common.
– But the following information can be obtained if the
number of observations is not too large
• it allows you to pick up at a glance some valuable
information, such as the highest and lowest values.
• ascertain the general shape or form of the distribution
• make an informed guess about central tendency values
a) Qualitative variable:
• Count the number of cases in each category.
Table: Distribution of 25 patients entering ICU at a
certain period in “X” Hospital:
ICU Type    Frequency      Relative frequency         Percentage (%)
            (how often)    (proportionately often)
Medical     12             0.48                       48
Surgical    6              0.24                       24
Cardiac     5              0.20                       20
Other       2              0.08                       8
Total       25             1.00                       100
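The counting step behind the table can be sketched in a few lines of Python. The raw list of ICU labels below is hypothetical, built only to reproduce the counts in the table, and the function name `frequency_table` is an illustrative choice.

```python
from collections import Counter

# Hypothetical raw data: one ICU-type label per patient, matching the table's counts.
admissions = ["Medical"] * 12 + ["Surgical"] * 6 + ["Cardiac"] * 5 + ["Other"] * 2

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}

table = frequency_table(admissions)
for category, (freq, rel, pct) in table.items():
    print(f"{category:<10}{freq:>4}{rel:>8.2f}{pct:>6.0f}")
```

The relative frequencies should add up to 1 and the percentages to 100, which is a quick check on any table of this kind.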
b) Quantitative variable:
- Select a set of continuous, non-overlapping
intervals such that each value can be placed
in one, and only one, of the intervals.
- The first consideration is how many intervals
to include
For a continuous variable
(e.g. – age), the frequency
distribution of the individual
ages is not so interesting.
We “see more” in frequencies
of age values in “groupings”.
Here, 10 year groupings
make sense.
Example:
– Leisure time (hours) per week for 120 college
students:
23 24 18 14 20 36 24 26 23 21 16 15 19 20
22 14 13 10 19 27 29 22 38 28 34 32 23 19
21 31 16 28 19 18 12 27 15 21 25 16 23 24
18 14 20 36 24 26 23 21 15 16 19 20 22 14
13 19 27 10 19 23 32 28 34 38 28 22 29 31
21 16 28 19 18 24 23 16 21 25 15 27 12 18
14 14 20 22 20 36 24 16 19 23 25 23 21 20
13 31 29 22 27 19 19 23 32 28 38 28 34 32
16 21 25 15 27 12 18 21
Grouped frequency Distribution
To determine the number of class intervals and the
corresponding width, we may use Sturges’ rule:
\[ K = 1 + 3.322\,\log_{10} n, \qquad W = \frac{L - S}{K} \]
where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
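As a rough illustration, the rule can be applied to the leisure-time data (n = 120, largest value 38, smallest value 10). Rounding K to the nearest whole number is an assumption of this sketch; the function name `sturges` is illustrative.

```python
import math

def sturges(n, largest, smallest):
    """Return (K, W): number of class intervals and class width."""
    k = 1 + 3.322 * math.log10(n)
    k_rounded = round(k)                  # assumption: round K to a whole number of classes
    w = (largest - smallest) / k_rounded  # W = (L - S) / K
    return k_rounded, w

# Leisure-time example: 120 students, values from 10 to 38 hours
k, w = sturges(120, 38, 10)
print(k, w)
```

In practice the width is usually rounded up to a convenient value so that the classes cover the whole range.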
Reading Assignment
Diagrammatic presentation
Importance of diagrammatic representation:
• Well designed graphs can be powerful
means of communicating a great deal of
information
Specific types of graphs include:
• For nominal and ordinal data:
– Bar graph
– Pie chart
• For quantitative data:
– Histogram
– Stem-and-leaf plot
– Box plot
– Scatter plot
– Line graph
– Frequency polygon
– Ogive
• Others
Reading Assignment
Numerical Summary Measures
– Single numbers which quantify the
characteristics of a distribution of values
Measures of central tendency (location)
Measures of dispersion
Measures of Central Tendency (MCT)
• On the scale of values of a variable there is
a certain stage at which the largest number
of items tend to cluster.
• Since this stage is usually in the centre of
distribution, the tendency of the statistical
data to get concentrated at a certain value
is called “central tendency”
• The various methods of determining the
point about which the observations tend to
concentrate are called MCT.
• The objective of calculating MCT is to
determine a single figure which may be
used to represent the whole data set.
Characteristics of a good MCT
An MCT is good or satisfactory if it possesses
the following characteristics:
1. It should be based on all the observations
2. It should not be affected by the extreme values
3. It should have a definite value
4. It should not be subjected to complicated and tedious
calculations
5. It should be capable of further algebraic treatment
6. It should be stable with regard to sampling
• The most common measures of central
tendency include:
– Mean
– Median
– Mode
1. Arithmetic Mean
A. Ungrouped Data
• The arithmetic mean is the "average" of the data
set and by far the most widely used measure of
central location
• Is the sum of all the observations divided by the
total number of observations.
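Written as a formula, with x̄ denoting the sample mean and n the number of observations (notation chosen here for illustration), the definition above reads:

```latex
\[
\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
\]
```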
B. Grouped Data
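• In calculating the mean for grouped data, we
assume that all values falling into a particular class
interval are located at the mid-point of the interval
• Because of this, it is calculated as follows:
\[ \bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} \]
where m_i is the mid-point and f_i the frequency of the ith
class interval, and k is the number of class intervals.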
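A minimal sketch of the grouped-data mean, using the class mid-points and frequencies of the age data for 169 subjects that appears in the dispersion example. The function name `grouped_mean` is illustrative.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(m_i * f_i) / sum(f_i)."""
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)

# Class mid-points and frequencies for the classes 10-19, 20-29, ..., 60-69
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

age_mean = grouped_mean(midpoints, frequencies)
print(round(age_mean, 2))
```

Because every value in a class is treated as if it sat at the mid-point, the result is an approximation of the mean of the raw data.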
When the data are skewed, the mean is
“dragged” in the direction of the skewness
• The median is a better description (than the mean) of
the majority when the distribution is skewed
• Example
– Data: 14, 89, 93, 95, 96
– Skewness is reflected in the outlying low value of 14
– The sample mean is 77.4
– The median is 93
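The example can be checked with Python's standard `statistics` module; this is only a quick verification of the two figures quoted above.

```python
import statistics

data = [14, 89, 93, 95, 96]
mean = statistics.mean(data)      # sum of the observations / number of observations
median = statistics.median(data)  # middle value of the ordered data
print(mean, median)  # 77.4 93
```

The single low value of 14 pulls the mean well below most of the observations, while the median stays with the bulk of the data.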
53
b) Grouped data
• In calculating the median from grouped data, we
assume that the values within a class-interval
are evenly distributed through the interval.
Example. Compute the median age of 169
subjects from the grouped data.
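• n/2 = 169/2 = 84.5, which falls in the 3rd class interval
• Lower limit = 29.5, Upper limit = 39.5
• Frequency of the class = 47
• (n/2 – fc) = 84.5 – 70 = 14.5, where fc = 70 is the
cumulative frequency of the classes below the median class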
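The quantities above are the inputs to the usual interpolation formula for a grouped median, median = lower boundary + ((n/2 − fc) / f) × class width. The formula is not written out in these notes, so treat the sketch below as an assumption about the intended method; the function name `grouped_median` is illustrative.

```python
def grouped_median(lower, width, n, cum_freq_before, class_freq):
    """Median = lower boundary + ((n/2 - fc) / f) * class width."""
    return lower + (n / 2 - cum_freq_before) / class_freq * width

# Lower boundary 29.5, width 39.5 - 29.5 = 10, n = 169, fc = 70, f = 47
median_age = grouped_median(29.5, 10, 169, 70, 47)
print(round(median_age, 2))
```

The interpolation relies on the assumption that values are evenly spread within the median class.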
3. Mode
[Figure: chart omitted. Credit: T. Ancelle, D. Coulombie]
a) Ungrouped data
• Example
• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• Mode is 4 “Unimodal”
• Example
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes – 2 & 5
• This distribution is said to be “bi-modal”
• Example
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
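The three examples can be reproduced with a short helper. Returning an empty list when no value repeats is a convention chosen for this sketch to match the "no mode" case above; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or [] when every value occurs once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so no mode under this convention
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6]))        # [4]
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]))  # [2, 5]
print(modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12]))  # []
```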
b) Grouped data
• To find the mode of grouped data, we usually
refer to the modal class, where the modal
class is the class interval with the highest
frequency.
• There are two ways of determining mode:
1. For a rough estimate of the mode, the mid-point
of the modal class interval could be used.
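2. It is also possible to approximate the mode for a
grouped frequency distribution based on the
formula:
\[ \hat{X} = \mathrm{L.c.b}_{\mathrm{mod}} + \left(\frac{d_1}{d_1 + d_2}\right) W \]
where L.c.b_mod is the lower class boundary of the modal
class, d_1 and d_2 are the differences between the frequency
of the modal class and the frequencies of the preceding and
following classes respectively, and W is the class width.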
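A small sketch of the grouped-mode formula, applied to the age data for 169 subjects used elsewhere in these notes. It assumes d1 is the modal-class frequency minus the frequency of the preceding class and d2 is the modal-class frequency minus the frequency of the following class; the function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_boundary, d1, d2, width):
    """Mode = lower boundary of modal class + d1 / (d1 + d2) * class width."""
    return lower_boundary + d1 / (d1 + d2) * width

# Frequencies for classes 10-19, 20-29, ..., 60-69; the modal class is 20-29
freqs = [4, 66, 47, 36, 12, 4]
i = freqs.index(max(freqs))
d1 = freqs[i] - freqs[i - 1]   # 66 - 4 = 62
d2 = freqs[i] - freqs[i + 1]   # 66 - 47 = 19
mode_age = grouped_mode(19.5, d1, d2, 10)
print(round(mode_age, 2))
```

The sketch does not handle a modal class that is the first or last class, where one of the neighbouring frequencies is missing.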
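4. Geometric mean (GM)
• The logarithm of the geometric mean is the arithmetic
mean of the logarithms of the observations:
\[ \log GM = \frac{\sum_{i=1}^{n} \log x_i}{n} \]
The geometric mean is generally used with data measured on a logarithmic scale, such
as titers of anti-neutrophil immunoglobulin G.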
• Example: the minimum inhibitory concentration of
penicillin in urine for N. gonorrhoeae in 74 patients
Concentration (µg/ml)    Frequency
0.03125                  21
0.0625                   6
0.1250                   8
0.250                    19
0.50                     17
1.0                      3
Total                    74
Solution (logarithms to base 10):
logGM = [21log(0.03125) + 6log(0.0625) +
8log(0.125) + 19log(0.25) + 17log(0.5)
+ 3log(1.0)]/74 = -0.846
The GM = the antilogarithm of -0.846 = 0.143 µg/ml
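The calculation can be reproduced as follows. The frequencies listed in the example sum to 74, so the sketch divides by that total; base-10 logarithms are assumed, although any base gives the same geometric mean.

```python
import math

# Concentrations (µg/ml) and their frequencies from the example
mic = [0.03125, 0.0625, 0.125, 0.25, 0.5, 1.0]
freq = [21, 6, 8, 19, 17, 3]

n = sum(freq)
# Frequency-weighted arithmetic mean of the logarithms
log_gm = sum(f * math.log10(x) for x, f in zip(mic, freq)) / n
gm = 10 ** log_gm   # antilogarithm (base 10)
print(n, round(log_gm, 3), round(gm, 3))
```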
5. Harmonic mean (HM)
• Just as the geometric mean is based on an
arithmetic mean of logarithms, so is the
harmonic mean based on arithmetic mean of
the reciprocals.
• Pertains to rates and time
• We define it as the reciprocal of the arithmetic
mean of the reciprocal of the given numbers.
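A minimal sketch of the definition, compared with the standard library's `statistics.harmonic_mean`. The speed data are hypothetical and chosen only to illustrate a rate: travelling the same distance at 40 km/h and then at 60 km/h.

```python
import statistics

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Hypothetical data: speeds (km/h) over two legs of equal distance
speeds = [40, 60]
hm = harmonic_mean(speeds)
print(hm, statistics.harmonic_mean(speeds))
```

For these speeds the harmonic mean is 48 km/h, lower than the arithmetic mean of 50 km/h, because more time is spent on the slower leg.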
Example:
Which measure of central tendency is best with a
given set of data?
• The mean can be used for discrete and
continuous data
• The median is appropriate for discrete and
continuous data as well, but can also be
used for ordinal data
• The mode can be used for all types of
data, but may be especially useful for
nominal and ordinal measurements
• For discrete or continuous data, the
“modal class” can be used
• The geometric mean is used primarily for
observations measured on a logarithmic
scale.
• Harmonic mean is a suitable MCT when
the data pertains to rates and time.
• Weighted mean is commonly used in the
calculation of mean for different
outcomes.
Measures of Dispersion
These two distributions have the same mean,
median, and mode
• MCT are not enough to give a clear
understanding about the distribution of
the data.
• Measures that quantify the variation or
dispersion of a set of data from its central
location
• Measures of dispersion include:
– Range
– Inter-quartile range
– Variance
– Standard deviation
– Coefficient of variation
– Standard error
– Others
1. Range (R)
• Example –
– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42-5 = 37
2. Interquartile range (IQR)
Quartiles
• Split ordered data into 4 quarters
25% 25% 25% 25%
Q1 Q2 Q3
• Q1 = first quartile
IQR = Q3 - Q1
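As an illustration, the range and IQR can be computed for the eight values from the range example. Several conventions exist for locating quartiles; this sketch uses the default ("exclusive") method of Python's `statistics.quantiles`, which is an assumption and may differ slightly from a hand method.

```python
import statistics

data = sorted([5, 9, 12, 16, 23, 34, 37, 42])
data_range = data[-1] - data[0]   # largest minus smallest value

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(data_range, q1, q3, iqr)
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so it is less sensitive to extreme values.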
3. Mean deviation (MD)
• It is squared because the sum of the
deviations of the individual observations of
a sample about the sample mean is
always 0
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
a) Ungrouped data
Let X1, X2, ..., XN be the measurements on
N population units, then:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
where
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]
is the population mean.
A sample variance is calculated for a sample of
individual values (X1, X2, … Xn) and uses the sample
mean \(\bar{x}\) rather than the population mean µ.
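A minimal sketch contrasting the two calculations. It assumes the usual n − 1 divisor for the sample variance, consistent with the grouped-data formula in these notes; the data are the eight values from the range example and the function name `sample_variance` is illustrative.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [5, 9, 12, 16, 23, 34, 37, 42]   # values reused from the range example
s2 = sample_variance(data)
print(s2, statistics.variance(data))     # sample variance (divisor n - 1)
print(statistics.pvariance(data))        # population variance (divisor N)
```

The population formula divides by N, so for the same numbers it always gives a slightly smaller value than the sample formula.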
b) Grouped data
\[ S^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1} \]
where
m_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval
\bar{x} = the sample mean
k = the number of class intervals
5. Standard deviation (σ, s)
\[ \sigma = \sqrt{\sigma^2} \quad \text{and} \quad S = \sqrt{S^2} \]
Example. Compute the variance and SD of the age of 169
subjects from the grouped data.
Class interval   mi     fi    (mi-Mean)   (mi-Mean)2   (mi-Mean)2 fi
10-19            14.5   4     -19.88      395.28       1581.12
20-29            24.5   66    -9.88       97.65        6444.71
30-39            34.5   47    0.12        0.014        0.66
40-49            44.5   36    10.12       102.38       3685.71
50-59            54.5   12    20.12       404.75       4856.97
60-69            64.5   4     30.12       907.11       3628.46
Total                   169               1907.18      20197.63
Mean = 5810.5/169 = 34.38 years
S2 = 20197.63/(169-1) = 120.22
SD = √S2 = √120.22 = 10.96
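The grouped variance and SD can be recomputed directly from the class mid-points and frequencies. This is a sketch of the grouped-data formula with the n − 1 divisor; carrying the mean at full precision can change the last digit of intermediate values compared with a hand calculation.

```python
import math

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

n = sum(frequencies)
mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
# Squared deviations of the mid-points from the mean, weighted by frequency
ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies))
variance = ss / (n - 1)
sd = math.sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```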
SD Vs Standard Error (SE)
Coefficient of variation (CV):
\[ CV = \frac{S}{\bar{x}} \times 100\% \]
              SD          Mean         CV (%)
SBP           15 mmHg     130 mmHg     11.5
Cholesterol   40 mg/dl    200 mg/dl    20.0
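The two CV values in the table can be checked with a one-line function; the name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """CV = (S / mean) * 100, expressed as a percentage."""
    return sd / mean * 100

cv_sbp = coefficient_of_variation(15, 130)    # systolic blood pressure
cv_chol = coefficient_of_variation(40, 200)   # cholesterol
print(round(cv_sbp, 1), round(cv_chol, 1))
```

Because the CV is unit-free, it allows the variability of SBP and cholesterol to be compared even though they are measured in different units.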