Introduction to Biostatistics
Objectives
Statistics and the Scientific Method
Scientific Method
Question → Design Study → Collect Data → Analyze Data → Make Conclusions
Why study statistics?
• MLS and CHS degree programmes prescribe a research
project
• much research relies on statistics
• literature is full of statistics
– support study of other courses
Defining Statistics (1)
Examples of Statistics (b)
[Figure: chart; y-axis: % Change in Mineral Content]
Defining Statistics (2)
In statistics, we use a
study sample to make
inference about some
population that we are
interested in.
Defining Statistics (3)
[Figure: sample, N = 5]
Defining Statistics (4)
What are some types of statistics that you are familiar with?
Defining Statistics (5)
What are some types of statistics that you are familiar with?
Need for Biostatistics (1)
Applications of Biostatistics (1)
Applications of Biostatistics (2)
In other words:
• A parameter is a feature of the population.
• A statistic is a feature of the sample (random subset of a
population)
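The distinction can be sketched in Python. This is only an illustration: the population below is simulated, and the variable (systolic blood pressure), its distribution and the sample size of 50 are all assumptions made for the example, not data from the course.

```python
import random
import statistics

# Hypothetical (simulated) population: systolic blood pressure in mmHg
# for 1,000 people. The numbers are illustrative only.
rng = random.Random(42)
population = [rng.gauss(120, 15) for _ in range(1000)]

# Parameter: a feature of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same feature computed on a random subset (the sample).
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

In practice the population is rarely observed in full, so the statistic is used to make inference about the unknown parameter.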
Parameter vs. Statistic (3)
[Diagram: Variable splits into Categorical (Qualitative) and Numerical (Quantitative)]
Classification of Variables
Variables: Categorical
• Categorical Nominal
– No natural order of the levels, mutually exclusive
• Example: Gender, race, species, HIV status, blood
groups, alive/dead, village of birth, eye colour, tall/short,
marital status, beliefs, types of tumor, etc.
• Name only, no order, magnitude unimportant
Variables: Categorical
• Categorical Ordinal
– Some natural ordering of the levels
• Example: Severity scale, good/better/best;
no/mild/moderate/severe, low/middle/high income,
tumor (benign/malignant), etc
• Order important, magnitude unimportant
Variables: Types
• Categorical Ordinal (Rank Data)
– numbers used only to order the data, thus the name, rank data.
• Example: ten leading causes of death, class
position in a test, Olympic medals, etc.
Top 10 Causes of Death in Botswana 2013
1. HIV/AIDS 32%
2. Malaria 7%
3. TB 6%
4. Diarrheal Diseases 4%
5. Cancer 4%
6. Pre-Term Birth Complications 2%
7. Ischemic Heart Disease 2%
8. Stroke 2%
9. STDs 2%
10. Road Injuries 2%
Classification of Variables
Variables: Quantitative
Variables: Roles
Derived Data
• Ratios
– Quotient of two variables e.g. BMI
• Rates
– e.g. disease rates
Class Activity: Activity 1
• Classifying Variables
• Number of deaths in Botswana in a specific year
• Number of previous miscarriages an expectant mother has had
• Anti-streptolysin O titre (ASOT)
• Estimated glomerular filtration rate (eGFR)
• Arterial PCO2, mmHg
• Concentration of chlorine in water
• Disease outcome
• Body mass index
• Stages of cancer
• Weight, kg
• Malaria parasitemia
• HIV viral load
• Level of education (illiterate, primary, secondary and tertiary)
• Haptoglobin phenotypes
• Number of passion killings in Botswana
• Length of time to recovery after a heart attack in years
Key Points (1)
Key Points (2)
Descriptive Methods for Categorical Data
Classification of Statistics
b. Inferential statistics
i. Categorical data
ii. Quantitative data
Descriptive Statistics (1)
Descriptives for Categorical Data (1)
Descriptives for Categorical Data (2)
Descriptives for Categorical Data (1): 2 Variables
Descriptives for Categorical Data (2): 2 Variables
Descriptives for Categorical Data (3): 2 Variables
...
Two-way Frequency Tables (1)
Two-way Frequency Tables (2)
Two-way Frequency Tables (3)
Two-way Frequency Tables (4)
Bar charts - Frequencies
Bar charts - Percentages
Key Points
Measures of Central Tendency and Dispersion
Module #2
Objectives of Descriptive Statistics
a) Measures of location
b) Measures of dispersion/variability/spread
[Figure: frequency distributions for Population A and Population B; y-axis: No. of People]
• Three measures are frequently used to provide a “typical value” for a given continuous
variable in a specific population: the mode, the median and the mean.
Measures of Central Tendency
Quick definitions
– Mode
• the most frequently occurring score
– Median
• the mid-point of a set of ordered scores
– Mean
• the result of dividing the arithmetic sum of
scores by the number of scores
Finding the Mode
• Annual salary (units of $10k)
– 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
• Annual salary, sorted
– 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
• The mode is 3 (it occurs four times)
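As a minimal sketch, the tally behind the mode can be reproduced in Python. It reads the ten salary values as 4, 3, 3, 2, 3, 8, 4, 3, 7, 2 (units of $10k); the variable names are illustrative.

```python
from collections import Counter

# Annual salaries in units of $10k, in the order they were recorded.
salaries = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]

# Count how often each value occurs, then take the most frequent one.
counts = Counter(salaries)
mode_value, mode_count = counts.most_common(1)[0]

print("sorted:", sorted(salaries))
print("mode:", mode_value, "occurs", mode_count, "times")
```

If two values tie for the highest count, the data are bimodal and `most_common(1)` reports only one of them, so it is worth inspecting the full `counts` table.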
• In this case, n = 6 (an even number); therefore, the median is
the average of observations number n/2 and n/2 + 1 in the ordered data
• That is, the average of the 3rd and 4th observations
= (10 + 11)/2
= 10.5
Median
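The odd/even rule can be sketched as a small Python function. The six-value list is hypothetical, chosen only so that its 3rd and 4th ordered observations are 10 and 11 as in the worked example. The salary list reads the ten salary values as 4, 3, 3, 2, 3, 8, 4, 3, 7, 2.

```python
def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: observations n/2 and n/2 + 1 (1-based) are at indexes mid-1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical sample with n = 6 whose 3rd and 4th ordered values are 10 and 11.
six_values = [7, 9, 10, 11, 12, 15]
six_median = median(six_values)

# Salary data, units of $10k (n = 10).
salaries = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]
salary_median = median(salaries)

print(six_median, salary_median)
```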
For this simple problem, you could compute the mean with pencil and paper by summing the
numbers in the salary column and dividing by “n” (10).
Method for Computing the Mean
To compute the mean:
– Count the number of scores (determine “n”)
– Determine the sum of the scores by adding
them
– Divide the sum by “n”
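The three steps above can be sketched in Python, using the ten salary values (units of $10k) read as 4, 3, 3, 2, 3, 8, 4, 3, 7, 2.

```python
# Annual salaries in units of $10k.
salaries = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]

n = len(salaries)      # count the number of scores
total = sum(salaries)  # add the scores
mean = total / n       # divide the sum by n

print(f"n = {n}, sum = {total}, mean = {mean}")
```

A mean of 3.9 corresponds to $39k. It sits above the median of 3 because the two largest salaries (7 and 8) pull it upward.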
[Figure: two distributions; y-axis: No. of People; x-axes: Value of Factor K and Value of Factor J]
The mean and the median
• Take a sample of 10 heights (70, 95, 100, 103, 105, 107, 110,
112, 115, 140 cm)
Lowest (minimum) value = 70 cm
Highest (maximum) value = 140 cm
Range is therefore 140 – 70 = 70 cm
Simple to understand but far from perfect. Why?
• The range is derived from extreme values. It says nothing
about the values in between
• Not stable (as sample size increases the range can change
dramatically)
• Not amenable to further statistical analysis
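A short Python sketch of the range calculation for the ten heights. The second list is hypothetical and is included only to show that very different middle values can give the same range.

```python
# Sample of 10 heights in cm, from the example above.
heights = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
height_range = max(heights) - min(heights)

# Hypothetical sample: same extremes, very different values in between.
other = [70, 71, 72, 73, 74, 136, 137, 138, 139, 140]
other_range = max(other) - min(other)

print(height_range, other_range)
```

Both samples have the same range even though their spread around the centre differs, which is the first weakness listed above.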
Figure 8. Two distributions with the same range
[Figure: two distributions with the same range but different mean and variability; y-axis: No. of People]
• Percentiles: Those values in a series of observations,
arranged in ascending order of magnitude, which divide the
distribution into 100 equal parts (thus the median is the 50th
percentile).
The median is the middle value (if n is odd) or the average of the two middle
values (if n is even). It is a measure of the “center” of the data.
• Interquartile Range
– the difference between the score representing the 75th percentile and the score
representing the 25th percentile
– Arrange: 24, 25, 29, 29, 30, 31 (n = 6)
» Position of Q1 = (n+1)/4 = 1.75
» Q1 = 24 + 0.75(25 - 24) = 24.75
» Position of Q3 = 3(n+1)/4 = 5.25
» Q3 = 30 + 0.25(31 - 30) = 30.25
» IQR = Q3 – Q1 = 30.25 – 24.75 = 5.50
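The (n+1)-position method used above can be sketched in Python. The helper name `value_at_position` is illustrative. Other quartile conventions exist and can give slightly different answers for the same data.

```python
def value_at_position(ordered, pos):
    """Value at a 1-based, possibly fractional position in sorted data,
    using linear interpolation between neighbouring observations."""
    k = int(pos)     # whole part: the observation just below the position
    frac = pos - k   # fractional part: how far to move towards the next one
    if k < 1:
        return ordered[0]
    if k >= len(ordered):
        return ordered[-1]
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])

data = sorted([24, 25, 29, 29, 30, 31])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)      # position 1.75
q3 = value_at_position(data, 3 * (n + 1) / 4)  # position 5.25
iqr = q3 - q1

print(q1, q3, iqr)
```

For this data set, `statistics.quantiles(data, n=4)` from the standard library, whose default method is "exclusive", should give the same quartiles.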
Exercise
– 0, 3, 0, 7, 2, 1, 0, 1, 5, 2, 4, 2, 8, 1, 3, 0, 1, 2, 1
So how do we get a single mathematical measure that summarises
the variability of an observed set of values?
Why divide by n - 1 ?
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
$$SD = \sqrt{\frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}}$$
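Calculating Standard Deviation
Score (x)   Mean (x̄)   Deviation (x - x̄)   Squared deviation (x - x̄)²
13          15.5        -2.5                  6.25
12          15.5        -3.5                 12.25
13          15.5        -2.5                  6.25
14          15.5        -1.5                  2.25
10          15.5        -5.5                 30.25
16          15.5         0.5                  0.25
15          15.5        -0.5                  0.25
24          15.5         8.5                 72.25
20          15.5         4.5                 20.25
18          15.5         2.5                  6.25
Σx = 155                Σ(x - x̄) = 0         Σ(x - x̄)² = 156.5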
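The table totals can be checked with a short Python sketch. It computes the sample standard deviation two ways: from the squared deviations, and from the shortcut form with n(n - 1) in the denominator, taking the square root in both cases. Variable names are illustrative.

```python
import math

scores = [13, 12, 13, 14, 10, 16, 15, 24, 20, 18]
n = len(scores)
mean = sum(scores) / n

# Definitional route: sum of squared deviations divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in scores)
variance = sum_sq_dev / (n - 1)
sd = math.sqrt(variance)

# Shortcut route: uses only the sum of x and the sum of x squared.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
sd_shortcut = math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

print(f"sum of squared deviations = {sum_sq_dev}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}, shortcut SD = {sd_shortcut:.2f}")
```

Both routes give an SD of about 4.17. The standard library's `statistics.stdev` also uses the n - 1 divisor.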
Choosing the Measures of Central Location and Dispersion
Grouped data
Mean
• Step 5: divide the sum of x*f by the total frequency
$$\bar{x} = \frac{\sum xf}{\sum f}$$
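A minimal Python sketch of the grouped mean, the sum of x*f divided by the total frequency. The class midpoints and frequencies are hypothetical, made up purely to show the arithmetic. They are not the course data.

```python
# Hypothetical grouped data: class midpoints (x) and class frequencies (f).
midpoints = [5, 15, 25]
freqs = [2, 5, 3]

sum_xf = sum(x * f for x, f in zip(midpoints, freqs))  # sum of x*f
sum_f = sum(freqs)                                     # total frequency
grouped_mean = sum_xf / sum_f

print(sum_xf, sum_f, grouped_mean)
```

Because each observation is replaced by its class midpoint, the grouped mean is an approximation of the mean of the raw data.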
• Determine the mean and median for the data
presented below
                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Less than $10         2         1.0         1.4               1.4
          $11-25               20        10.0        14.3              15.7
          $26-50               63        31.5        45.0              60.7
          More than $50        55        27.5        39.3             100.0
          Total               140        70.0       100.0
Missing   System Missing       60        30.0
          Total                60        30.0
Total                         200       100.0
1. The median for grouped data calculation requires you to use the frequency distribution
output and the class intervals that are the question’s response categories.
2. The formula for the median is:
$$\text{Med} = L_m + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) C_m$$
where:
Lm = lower boundary of class containing median
n = sample size
cfp = cumulative frequency of classes preceding class containing the median
fm = number of observations in class containing the median
Cm = width of the interval containing the median
• Step 1: set up the frequency distribution table
• Step 2: identify the median class, i.e. the class interval containing the
value that has 50% of the observations below it and 50% above it
• Step 3: use the formula to find the median
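In our example,
The median class interval is the 26-50 class interval.
Lm = 26
n = 140
cfp = 2 + 20 = 22 (the cumulative frequency, not the cumulative percent of 15.7)
fm = 63
Cm = 24
$$\text{Med} = L_m + \left(\frac{\frac{n}{2} - cf_p}{f_m}\right) C_m = 26 + \frac{(140/2 - 22) \times 24}{63} = 44.29$$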
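The grouped-median formula can be sketched in Python. Lm = 26 and Cm = 24 are taken from the worked example. The sketch assumes cfp is the cumulative count of the classes preceding the median class (2 + 20 = 22), as its definition states, rather than a cumulative percent. The function name is illustrative.

```python
def grouped_median(l_m, n, cf_p, f_m, c_m):
    """Med = Lm + ((n/2 - cfp) / fm) * Cm for the class containing the median."""
    return l_m + (n / 2 - cf_p) / f_m * c_m

# Valid responses from the frequency table (counts, not percentages).
freqs = {"Less than $10": 2, "$11-25": 20, "$26-50": 63, "More than $50": 55}

n = sum(freqs.values())                          # 140 valid observations
cf_p = freqs["Less than $10"] + freqs["$11-25"]  # 22 observations below the median class
f_m = freqs["$26-50"]                            # 63 observations in the median class

med = grouped_median(l_m=26, n=n, cf_p=cf_p, f_m=f_m, c_m=24)
print(f"median = {med:.2f}")
```

Checking that the cumulative count first reaches n/2 = 70 inside the $26-50 class (2, 22, 85) confirms that it is the median class.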
$$\text{Variance} = \frac{1}{n - 1}\left[\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right]$$
End of Class
Lab Session
Excel Tutorial
1 2 1 1 4 1 1 4 3 2 1 4 1
1 2 3 1 1 4 1 2 4 1 1 1
Frequencies in Excel (1)
COUNTIF function
• Create a place to put the frequency values
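For readers who want to cross-check their spreadsheet result outside Excel, the same tally can be sketched in Python. It uses the 25 coded values listed under the Excel Tutorial heading. What the codes 1 to 4 stand for is not stated here, so they are left unlabelled.

```python
from collections import Counter

# The 25 coded responses from the Excel tutorial data.
responses = [1, 2, 1, 1, 4, 1, 1, 4, 3, 2, 1, 4, 1,
             1, 2, 3, 1, 1, 4, 1, 2, 4, 1, 1, 1]

# One count per category, the same job COUNTIF does for each code.
freq = Counter(responses)
total = len(responses)

for category in sorted(freq):
    count = freq[category]
    print(f"code {category}: frequency {count}, {100 * count / total:.0f}%")
```

The frequencies should add up to the number of responses (25), which is a quick check worth making in the spreadsheet as well.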
Frequencies in Excel (2)
Frequencies in Excel (3)
Bar Charts in Excel (1)
Frequency table
Bar Charts in Excel (2)
Bar Charts in Excel (3)
Bar Charts in Excel (4)
Bar Charts in Excel (5)
Bar Charts in Excel (6)
Bar Charts in Excel (7)
Bar Charts in Excel (8)
Bar Charts in Excel (9)
Bar Charts in Excel (10)
Bar Charts in Excel (11)
Select the data & insert a column chart
Bar Charts in Excel (12)
Bar Charts in Excel (13)
Bar Charts in Excel (14)
Bar Charts in Excel: Two-way Tables
Homework