
11/7/18

Introduction to Bio-Statistics
Dr. Biju George
Associate Professor
Govt Medical College, Kozhikode
bijugeorge1@gmail.com

Bio-Statistics – What is it
• Biostatistics is a branch of science which deals with the application of statistics to biological science, for interpretation of the data obtained from biological systems.
• Statistics is a branch of science which deals with methods and tools for collection, compilation, summarization, presentation, comparison and interpretation of data, and for making inferences about the data and the population.
• Thus statistics tries to make meaning out of the numbers that we collected as data.

What is the need of Statistics in Biology
• A typhoid fever patient may ask the doctor what the expected cure rate (in %) will be.
• A patient may ask a nurse whether the pain of an injection will be reduced by taking a deep breath before the injection.
• A doctor may ask a pharmaceutical company whether a newly introduced drug works better than the conventional drugs.
• A policy maker may ask a public health expert, before introducing a new vaccine, whether the new vaccine really reduces the disease occurrence significantly.
• A patient may ask the pharmacist whether taking a new drug for hypertension along with the existing drug will actually increase drug-drug interactions and adverse events.

2 different Branches of Statistics
• Descriptive Statistics – This deals with describing a group of data. The intention is to describe and interpret the studied subjects only.
• Inferential Statistics (Analytical Statistics) – This deals with making inferences about the population from a set of sample data. Here we not only describe the study group using descriptive statistics, but also draw a statistical inference / conclusion.

Data, Information & Intelligence
• Data are the values of the observations that we collected in any study. Data is the unprocessed information.
• Information is the processed data; we make interpretations based on this information.
• Intelligence is the interpretation and conclusion drawn using the information from the current study together with the current existing knowledge.


Data, Information & Intelligence – FBS values
Data: 121, 154, 108, 212, 163, 198, 200, 115, 167, 178
Information: No (%) of observations above 150 = 7 / 10 = 70%; Mean = 161.6, Median = 165
Intelligence: 70% of the diabetics have FBS above 150, which indicates poor control in the study group. This could be due either to poor management of the cases or to some specific characteristics of the study group. Since the Mean and Median are above 160, these subjects are poorly controlled and the reason could be……
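The step from Data to Information in the FBS example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and the unit of the FBS values is not stated on the slide.

```python
from statistics import mean, median

# FBS values copied from the slide (unit not stated there)
fbs = [121, 154, 108, 212, 163, 198, 200, 115, 167, 178]

threshold = 150  # cut-off used on the slide
above = sum(1 for value in fbs if value > threshold)
pct_above = 100 * above / len(fbs)

# "Information": the processed form of the raw data
information = {
    "n": len(fbs),
    "above_150": above,
    "pct_above_150": pct_above,
    "mean": mean(fbs),
    "median": median(fbs),
}
print(information)
```

The Intelligence step (why control is poor) is a judgement made with outside knowledge, so it is deliberately not part of the code.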

Variable
• A variable is a character / attribute of the observations.
• If there is a possibility of the character varying from one subject to the next, then the character is a variable.
• If there is no possibility of variability, then it is a constant.

Types of Variable
• Quantitative – data can be measured in a quantitative manner
  • Eg- Weight, Height, Fasting Blood Sugar value, No of members in a committee meeting
• Qualitative – data obtained can be arranged in categories only
  • Eg- Gender, Disease Present / Absent, Cure Yes / No

Quantitative variables
• Continuous Variable – data are possible anywhere in a continuous range
  • Height, Weight, Fasting Blood Sugar
• Discrete Variable – data are possible at fixed interval values only
  • No of members in a committee meeting

Qualitative variable (Categorical variable)
• Ordinal – 3 or more categories / levels. Definite order between the categories (ascending / descending)
  • Stage of Cancer
• Nominal – Unrelated categories, no order
  • Religion
• Binary (dichotomous) – 2 mutually exclusive options only
  • Cure Yes / No


Exercise
• Systolic BP
• BMI categories
• Age Group
• Completed years of Age
• RBC count /mm3
• Rank in an exam
• Diabetic status
• Age
• Gender
• Score of a tool (GCS tool / Adherence / Depression)

Measurement scales
• Ratio scale
• Interval scale
• Ordinal scale
• Nominal scale

Ratio scale
• Ratio of the variable is meaningful
• Multiplication / Division / Addition / Subtraction / Order
• True zero value
• Eg- Weight, Height, Temperature in K

Interval scale
• Ratio is not meaningful
• No true zero, only an arbitrary zero
• Addition / Subtraction / Order
• Eg- Temperature in °C, Temperature in °F
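Why a ratio is meaningful on a ratio scale but not on an interval scale can be seen by converting the same two temperatures between °C and K. The sketch below is illustrative; the two temperatures are hypothetical values chosen for the example.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15

# Two hypothetical temperatures
c1, c2 = 10.0, 20.0

# Interval scale: the ratio depends on the arbitrary zero, so it carries no meaning
ratio_celsius = c2 / c1  # 2.0, but 20 °C is not "twice as hot" as 10 °C

# Ratio scale: kelvin has a true zero, so this ratio is meaningful
ratio_kelvin = celsius_to_kelvin(c2) / celsius_to_kelvin(c1)

# Differences (addition / subtraction) agree on both scales
diff_celsius = c2 - c1
diff_kelvin = celsius_to_kelvin(c2) - celsius_to_kelvin(c1)

print(ratio_celsius, round(ratio_kelvin, 3), diff_celsius, round(diff_kelvin, 2))
```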

Downgrading a variable
• Age -> Age categories
• BMI -> BMI categories
• Numerical pain scale (0-10) -> Pain categories (nil / mild / moderate / severe)
• No upgradation possible
  • Except when we combine variables together

Semi quantitative variable
• High ordinal variables


Need to identify the variable type
• Summary measures
• Graphical measures
• Types of Statistical tests

Other classification of variables
• Dependent and Independent variables
  • In Analytical designs
• Exposure variable and Outcome variable
  • Eg- Duration of daily exercise and Heart rate
  • Eg- Regular exercise (Yes / No) and Myocardial Infarction (Yes / No)

Other classification of variables
• Composite variables
  • Eg- BMI, MACE
• Baseline variables
  • Eg- Variables which are collected at the baseline measurements
• Hard outcome
• Soft outcome
• Proxy variables

Data presentation
• Text
• Table
• Graphs


General guideline for Table / Graphs
• Should stand independently
• Table no
• Header
• Row and Table heading
• Number and %
• Mean and SD
• 2D graphs
• Data to Ink ratio

Table
• Simple table
  • For a Qualitative variable
• Complex table (Cross table / Contingency table)
  • For 2 Qualitative variables, to show their relation
• Frequency class interval table
  • For a quantitative variable categorized into class intervals

Simple table – is a table needed?

Gender   Frequency   Percentage
Male     23          46
Female   27          54
Total    50          100

Complex table

Diabetes   Male n(%)   Female n(%)   Total n(%)
Yes        23 (46)     32 (64)       55 (55)
No         27 (54)     18 (36)       45 (45)
Total      50 (100)    50 (100)      100 (100)
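The n(%) cells of the complex table are column percentages: each count is divided by the total of its own column. A minimal sketch that rebuilds the Yes and No rows from the raw counts follows; the dictionary layout and the helper name `n_pct` are illustrative choices.

```python
# Counts from the complex table: diabetes status by gender
counts = {
    "Yes": {"Male": 23, "Female": 32},
    "No": {"Male": 27, "Female": 18},
}

# Column totals (one per gender) and the grand total
col_totals = {sex: sum(counts[row][sex] for row in counts) for sex in ("Male", "Female")}
grand_total = sum(col_totals.values())

def n_pct(n: int, total: int) -> str:
    """Format a cell as 'n (%)' with the percentage rounded to a whole number."""
    return f"{n} ({100 * n / total:.0f})"

table = {}
for row, cells in counts.items():
    table[row] = {sex: n_pct(cells[sex], col_totals[sex]) for sex in cells}
    table[row]["Total"] = n_pct(sum(cells.values()), grand_total)

for row, cells in table.items():
    print(row, cells)
```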


Simple table – Quantitative variables

Family size   Frequency   Percentage
1             5           3.3
2             15          10.0
3             23          15.3
4             56          37.4
5             34          22.7
6             12          8.0
7             5           3.3
Total         150         100

Frequency Class Interval table – Quantitative

BMI Class interval   Frequency   Percentage
<18.5                12          16.4
18.5 - 23            24          32.9
23 - 25              20          27.4
25 - 30              14          19.2
> 30                 3           4.1
Total                73          100

Graphs – depend on the types of variables
• Single qualitative
  • Pie diagram, Bar diagram
• Single quantitative
  • Stem and leaf plot, Histogram, Frequency polygon, Frequency curve, Box plot
• Relating 2 qualitative
  • Multiple and Component Bar charts
• Relating a qualitative and a quantitative
  • Frequency polygon, Frequency curve, Box plot
• Relating 2 quantitative
  • Scatter diagram

Other graphs
• Line diagram – to show the trend of a variable over time
• Kaplan Meier curve – to show the survival of a group over time
• Forest plot – to show the homogeneity / heterogeneity of the different study results

Pie diagram
[Figure: pie diagram of BMI categories (<18.5, 18.5 to 24.99, 25 to 29.99, >=30); slice labels 13.2%, 16.5%, 28.6%, 41.7%]

Pie diagram – 100 adult people studied for morbidity

Disease           Frequency   Percentage
Hypertension      30
Diabetes          20
Obesity           30
Lipid disorders   20


Pie diagram
• The variable should have mutually exclusive and exhaustive categories.
• The total percentage should add up to 100%.
• The number of slices should be limited to 5 or less.
• If there are more than 6 categories, we can combine small slices to form a miscellaneous group (other group) if it makes sense.

Bar diagram
• Simple bar diagram
  • For a single Qualitative variable
• Multiple bar diagram
  • For relating 2 or more variables
• Component Bar diagram
  • For relating 2 or more variables

Simple Bar diagram
[Figure: simple bar diagram; y-axis: Percentage; x-axis: Morbidities (Hypertension, Diabetes, Obesity, Lipid disorder, Renal disease, CAD)]

Multiple bar diagram in numbers
[Figure: multiple bar diagram of counts; x-axis: Male, Female; legend: Obese, Overweight, Normal]

Multiple bar diagram in %
[Figure: multiple bar diagram of percentages; x-axis: Male, Female; legend: Obese, Overweight, Normal]

Multiple bar diagram – row column change
[Figure: the same percentages with rows and columns exchanged; x-axis: Obese, Overweight, Normal; legend: Male, Female]


Component bar diagram in %
[Figure: component (stacked) bar diagram of percentages; x-axis: Male, Female; legend: Obese, Overweight, Normal]

Component bar diagram in numbers
[Figure: component (stacked) bar diagram of counts; x-axis: Male, Female; legend: Obese, Overweight, Normal]

Stem and Leaf plot. HbA1c values-100 (raw data)

12.8   9.0   8.1   7.5   7.0   6.8   6.3   6.0   5.9   5.5
12.3   8.9   8.0   7.5   7.0   6.7   6.3   6.0   5.9   5.5
11.1   8.9   8.0   7.4   7.0   6.7   6.2   6.0   5.9   5.4
11.0   8.8   8.0   7.4   7.0   6.7   6.1   6.0   5.9   5.3
10.9   8.8   8.0   7.3   7.0   6.7   6.1   6.0   5.9   5.1
10.1   8.5   7.9   7.2   7.0   6.6   6.1   6.0   5.8   5.0
10.0   8.4   7.8   7.2   6.9   6.6   6.1   6.0   5.8   5.0
 9.9   8.3   7.8   7.1   6.9   6.5   6.1   6.0   5.8   5.0
 9.8   8.3   7.8   7.1   6.9   6.5   6.1   6.0   5.7   5.0
 9.3   8.2   7.7   7.1   6.8   6.5   6.1   6.0   5.7   5.0

Stem and Leaf plot. HbA1c values-100 (plot)

Stem   Leaves                                                                 Frequency
 5     0,0,0,0,0,1,3,4,5,5,7,7,8,8,8,9,9,9,9,9                               20
 6     0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,3,3,5,5,5,6,6,7,7,7,7,8,8,9,9,9   34
 7     0,0,0,0,0,0,1,1,1,2,2,3,4,4,5,5,7,8,8,8,9                             21
 8     0,0,0,0,1,2,3,3,4,5,8,8,9,9                                           14
 9     0,3,8,9                                                               4
10     0,1,9                                                                 3
11     0,1                                                                   2
12     3,8                                                                   2
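A stem and leaf plot splits each value into a stem (here the whole-number part) and a leaf (the first decimal digit). The sketch below builds one for 20 of the 100 HbA1c values, taken from the first two rows of the listing; the function name and the one-decimal assumption are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} for values recorded to one decimal place."""
    plot = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)        # 12.8 -> 128, avoids float noise
        stem, leaf = divmod(tenths, 10)   # 128 -> stem 12, leaf 8
        plot[stem].append(leaf)
    return dict(plot)

# First two rows of the HbA1c listing (20 of the 100 values)
sample = [12.8, 9.0, 8.1, 7.5, 7.0, 6.8, 6.3, 6.0, 5.9, 5.5,
          12.3, 8.9, 8.0, 7.5, 7.0, 6.7, 6.3, 6.0, 5.9, 5.5]

plot = stem_and_leaf(sample)
for stem in sorted(plot):
    leaves = ",".join(str(leaf) for leaf in plot[stem])
    print(f"{stem:>2} | {leaves}  ({len(plot[stem])})")
```

The count in brackets on each row plays the role of the frequency column, so the plot doubles as a frequency table while still showing every individual value.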

Histogram
[Figure: histogram]

Frequency polygon
[Figure: frequency polygon]


Frequency Curve
[Figure: frequency curve]

Box and Whisker plot
[Figures: box and whisker plots]

Scatter plot
[Figure: scatter plot]


Line diagram
[Figure: line diagram; y-axis: Prevalence of Diabetes as %; x-axis: Years (1960 to 2010)]

Line diagram
[Figure: line diagram; y-axis: Mean FBS as mg %; x-axis: Years (1960 to 2010)]

Line diagram
[Figure: line diagram; y-axis: Mean DBP in the patients; x-axis: time points from Baseline to post Sx]

Line diagram
[Figure: line diagram; y-axis: Mean BMI of the Participants; x-axis: follow-up time points; legend: Treatment Group, Control Group]

Summary measures
• Every variable has to be summarized for the given data set to be interpretable
  • Data -> Information -> Intelligence
• Differ by the type of variable
  • Qualitative – Frequency and percentage
  • Quantitative – Central tendency and dispersion measures


Central tendency measures
• They are the measures which give an idea about the central observation, i.e. a value which represents the whole set of data
• Average is the general term
• Mode
• Median
• Arithmetic Mean
• Geometric Mean
• Harmonic Mean
• Trimmed mean

Mode
• Most repeated value
• Not commonly used in the Health field
• Used mainly for Nominal variables and low ordinal variables
• Unimodal / Bimodal / Multimodal data

Median
• Central data when it is arranged in ascending order
• Middle data
• Positional Average
• Divides the data into two equal halves
• Robust measure

Median
• 92 96 107 112 129 140 187 223 241 248 272
• Position of median = $\frac{n+1}{2}$th position; $n$ = no of observations
• For odd numbered data – the central observation
• 11 observations -> so Median = 6th observation = 140

Median
• 92 96 107 112 124 129 140 187 223 241 248 272
• For even numbered data – mean of the 2 central observations
• Position of Median = 6.5th observation
• Average of 6th and 7th observation = 134.5
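The positional rule for the median, position (n + 1) / 2 in the sorted data, can be written out directly. A minimal sketch that reproduces both worked examples follows; the function name is illustrative.

```python
def median_by_position(values):
    """Return (position, median) using the (n + 1) / 2 positional rule."""
    data = sorted(values)
    n = len(data)
    position = (n + 1) / 2  # 1-based position of the median
    if n % 2 == 1:
        # Odd n: the position is a whole number, take that observation
        return position, data[int(position) - 1]
    # Even n: the position falls between two observations, average them
    lower, upper = data[n // 2 - 1], data[n // 2]
    return position, (lower + upper) / 2

odd_set = [92, 96, 107, 112, 129, 140, 187, 223, 241, 248, 272]
even_set = [92, 96, 107, 112, 124, 129, 140, 187, 223, 241, 248, 272]

print(median_by_position(odd_set))   # 6th observation
print(median_by_position(even_set))  # average of 6th and 7th observations
```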


Mean = Arithmetic Mean = Arithmetic Average
• $\bar{x} = \frac{\sum x}{n}$
• Most commonly used measure
• Influenced by extreme values

Geometric mean
• Serum Bilirubin values in Hepatitis A patients
• 0.5, 0.6, 0.7, 1.0, 1.6, 1.7, 1.7, 1.7, 1.9, 1.9, 2.0, 2.0, 2.0, 2.6, 2.6, 2.8, 2.8, 2.8, 3.1, 3.3, 3.9, 4.3, 5.2, 6.0, 7.1, 8.9, 9.3, 9.9, 10.4, 11.8, 13.2, 15.6, 17.7, 21.3, 23.8, 28.4, 31.0, 35.8

Histogram of the data
[Figure: histogram of the data]

Geometric Mean
• $\text{Geometric mean} = \operatorname{antilog}\left(\frac{\sum \log(x)}{n}\right)$
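The antilog-of-mean-log formula can be applied to the serum bilirubin values listed for the Hepatitis A patients. This is a sketch using base-10 logarithms (any base gives the same result); the function name is illustrative, and the standard library's `statistics.geometric_mean` is used only as a cross-check.

```python
import math
from statistics import geometric_mean, mean

# Serum bilirubin values from the slide (38 observations)
bilirubin = [0.5, 0.6, 0.7, 1.0, 1.6, 1.7, 1.7, 1.7, 1.9, 1.9, 2.0, 2.0, 2.0,
             2.6, 2.6, 2.8, 2.8, 2.8, 3.1, 3.3, 3.9, 4.3, 5.2, 6.0, 7.1, 8.9,
             9.3, 9.9, 10.4, 11.8, 13.2, 15.6, 17.7, 21.3, 23.8, 28.4, 31.0, 35.8]

def geometric_mean_via_logs(values):
    """Antilog of the mean of the logs; all values must be positive."""
    logs = [math.log10(value) for value in values]
    return 10 ** (sum(logs) / len(logs))

gm = geometric_mean_via_logs(bilirubin)
am = mean(bilirubin)

print(f"arithmetic mean = {am:.2f}, geometric mean = {gm:.2f}")
print(f"library cross-check = {geometric_mean(bilirubin):.2f}")
```

Because the data have a long right tail, the geometric mean comes out below the arithmetic mean, which is pulled up by the few very large values.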

Geometric mean
• Used for positively skewed data

Harmonic Mean
• Used for serially diluted titer values
• $\text{Harmonic mean (HM)} = \frac{n}{\sum \frac{1}{x}}$


Trimmed mean (Truncated Mean)
• A fixed % of observations is removed from both ends of the data
• Used for skewed data / data with outliers / extreme values
• 5% trimmed mean = 2.5% of observations trimmed from the lower end and another 2.5% from the upper end of the data. The mean is then recalculated with the rest of the data

If we want to divide data which is arranged in ascending order into 2 equal parts based on the number of observations
• Median
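The trimmed mean as defined on the slide (the stated percentage is the total removed, half from each end) can be sketched as follows. The data set is hypothetical, and rounding the number of dropped observations down is an assumption; note that some software defines an x% trimmed mean as x% from each end instead.

```python
def trimmed_mean(values, total_percent):
    """Mean after removing total_percent of observations, half from each end."""
    data = sorted(values)
    n = len(data)
    per_end = int(n * total_percent / 200)  # observations dropped at each end, rounded down
    kept = data[per_end:n - per_end]
    return sum(kept) / len(kept)

# Hypothetical data: 39 ordinary values plus one extreme value
data = list(range(1, 40)) + [1000]

plain_mean = sum(data) / len(data)
trimmed = trimmed_mean(data, 5)  # 2.5% of 40 = 1 observation from each end

print(plain_mean, trimmed)
```

With 40 observations the 5% trimmed mean drops exactly one value from each end, which removes the outlier and brings the average back toward the bulk of the data.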

If we want to divide a group of data which is arranged in ascending order into 4 equal parts based on the number of observations
• Quartile

Different Quantiles

No of groups   No of division points   Name of the groups
2              1                       1st and 2nd halves (division point is the median)
3              2                       Tertile
4              3                       Quartile
5              4                       Quintile
10             9                       Decile
100            99                      Percentile (Centile)

2 sets of data
Set A Set B
Weight of 5 persons in kg Weight of 5 persons in kg
53 51
54 53
55 55
56 57
57 59
Mean = 55 Mean = 55


Measures of Variability (Spread / Dispersion)
• Range
• Mean Deviation
• Variance
• Standard deviation
• Inter quartile range
• Coefficient of variation

Range
• Simplest to calculate
• $\text{Range} = \text{highest value} - \text{lowest value}$
• Influenced by extreme values

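Mean Deviation
• Average Absolute Deviation (AAD)
• $\text{Mean deviation} = \frac{\sum |x - \bar{x}|}{n}$

Set A weight in kg   $|x - \bar{x}|$   Set B weight in kg   $|x - \bar{x}|$
53                   2                 51                   4
54                   1                 53                   2
55                   0                 55                   0
56                   1                 57                   2
57                   2                 59                   4
Mean = 55            Sum = 6           Mean = 55            Sum = 12

Set A: $\frac{\sum |x - \bar{x}|}{n} = \frac{6}{5} = 1.2$ kg
Set B: $\frac{\sum |x - \bar{x}|}{n} = \frac{12}{5} = 2.4$ kg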
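The range and mean deviation of the two weight sets can be checked with a short sketch; the function names are illustrative.

```python
def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    centre = sum(values) / len(values)
    return sum(abs(x - centre) for x in values) / len(values)

set_a = [53, 54, 55, 56, 57]  # weights in kg
set_b = [51, 53, 55, 57, 59]  # weights in kg

for name, data in (("Set A", set_a), ("Set B", set_b)):
    print(name, "range =", value_range(data), "kg, mean deviation =", mean_deviation(data), "kg")
```

Both sets have the same mean of 55 kg, yet Set B is twice as spread out on both measures, which is the point of comparing them.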

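Variance & Standard Deviation
• $\text{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$
• $\text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Set A weight in kg   $x - \bar{x}$   $(x - \bar{x})^2$   Set B weight in kg   $x - \bar{x}$   $(x - \bar{x})^2$
53                   -2              4                   51                   -4              16
54                   -1              1                   53                   -2              4
55                   0               0                   55                   0               0
56                   +1              1                   57                   +2              4
57                   +2              4                   59                   +4              16
$\bar{x} = 55$                       Sum = 10            $\bar{x} = 55$                       Sum = 40

Set A: $\text{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{10}{4} = 2.5\ \text{kg}^2$; $\text{SD} = \sqrt{2.5} = 1.58$ kg
Set B: $\text{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{40}{4} = 10\ \text{kg}^2$; $\text{SD} = \sqrt{10} = 3.16$ kg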
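The variance and standard deviation of the two weight sets can be verified with the n - 1 denominator written out explicitly. This is a minimal sketch; `statistics.variance` from the standard library uses the same n - 1 denominator and serves as a cross-check.

```python
import math
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    centre = sum(values) / n
    return sum((x - centre) ** 2 for x in values) / (n - 1)

set_a = [53, 54, 55, 56, 57]  # weights in kg
set_b = [51, 53, 55, 57, 59]  # weights in kg

var_a, var_b = sample_variance(set_a), sample_variance(set_b)
sd_a, sd_b = math.sqrt(var_a), math.sqrt(var_b)

print(f"Set A: variance = {var_a} kg^2, SD = {sd_a:.2f} kg")
print(f"Set B: variance = {var_b} kg^2, SD = {sd_b:.2f} kg")
```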


Degree of freedom = n-1 as the denominator
• 3 observations, mean is 10
• 1st observation
  • Selected is 10
• 2nd observation
  • Selected is 12
• 3rd observation
  • Only 8 is possible, no freedom of choice

Interquartile Range
• $\text{IQR} = Q_3 \text{ value} - Q_1 \text{ value}$
• 92 96 107 112 129 140 187 223 241 248 272
• $\text{IQR} = Q_3 \text{ value} - Q_1 \text{ value} = 241 - 107 = 134$
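The interquartile range example can be reproduced with `statistics.quantiles`. The sketch assumes the quartiles are located by the (n + 1)-based positional rule, which is what the library's "exclusive" method implements; other quartile conventions can give slightly different values on other data sets.

```python
from statistics import quantiles

data = [92, 96, 107, 112, 129, 140, 187, 223, 241, 248, 272]

# n=4 asks for the three cut points that split the data into quartiles
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
```

With 11 observations the rule places Q1 at the 3rd value and Q3 at the 9th value, matching the figures 107 and 241 used in the example.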

Coefficient of Variation

Variable       Mean   SD     CV
Weight in kg   60     10     16.7%
Height in cm   160    16     10.0%
Height in m    1.6    0.16   10.0%

Summary for Qualitative variable
• Frequency = count of observations
• Percentage = Relative frequency = events / 100 persons
• Proportion = values between 0 and 1
• Rate = events / person years
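The CV column of the coefficient of variation table follows from the standard definition CV = SD / Mean × 100. A minimal sketch that recomputes it is given below; the function and dictionary names are illustrative.

```python
def coefficient_of_variation(mean_value, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * sd / mean_value

# (mean, SD) pairs from the table
rows = {
    "Weight in kg": (60, 10),
    "Height in cm": (160, 16),
    "Height in m": (1.6, 0.16),
}

cv = {name: round(coefficient_of_variation(m, s), 1) for name, (m, s) in rows.items()}
for name, value in cv.items():
    print(f"{name}: CV = {value}%")
```

Height has the same CV whether it is recorded in cm or m, because the unit cancels out; this is what makes the CV suitable for comparing the variability of variables measured in different units.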


Ratio / Proportion / Rate


• Ratio = a/b

• Proportion = a/(a+b)

• Rate = a / person time

Probability
• Probability is defined as the relative frequency of a desired event (event of interest) in a long run experiment.
• Long run experiment in statistics means that we repeat the observations under the same underlying conditions, either by an experimental or by an observational design.
• $\text{Probability} = \frac{\text{No of desired outcomes}}{\text{No of total possible outcomes}}$

Probability of survival of 0.2 (20%) at 6 months after diagnosis for a Lung cancer patient
What does it mean
• Individual subject
• Average / Expected

Probability Distribution
• Distribution is the arrangement of the data.
  • Eg- Histogram, Frequency curve, Frequency table, Frequency class interval table
• Empirical probability distribution
• Theoretical probability distribution

Normal distribution
[Figure]


Normal distribution
• Bilaterally symmetrical and bell shaped distribution
• The highest point of the curve corresponds to the mode, as in any other distribution. In a normal distribution the mean and median coincide with the mode
• Area under the curve represents the probability of the data and the total area is 1
• The curve does not touch the baseline, as theoretically values up to infinity are possible
• Mean ± 1 SD, Mean ± 2 SD and Mean ± 3 SD include 68.3%, 95.4% and 99.7% of the central observations respectively. This is called the 3σ rule or the empirical rule.
• Skewness and excess Kurtosis coefficients are zero.

Normal distribution
• Gaussian Distribution
• Distribution of a continuous variable
• Many biological variables follow this
• Mean & SD are appropriate if the data is normally distributed
• How to check Normality
  • Mean ≃ Median
  • Graphs
  • Skewness coefficient
  • Statistical test
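The coverage figures of the empirical rule can be checked against the standard normal distribution, for which Mean ± 1, 2 and 3 SD cover about 68.3%, 95.4% and 99.7% of the area. A minimal sketch using `statistics.NormalDist` from the standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Area under the curve between -k SD and +k SD, for k = 1, 2, 3
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in coverage.items():
    print(f"Mean ± {k} SD covers {100 * area:.1f}% of the observations")
```

The same proportions hold for any normal distribution, since rescaling by the mean and SD does not change the area between the corresponding limits.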

Skewness
• Asymmetry of the distribution
• No Skew
• Positive skew (right skew)
• Negative skew (left skew)

[Figure: skewed income distribution (Low, Middle, High Income) showing the relative positions of Mode, Median and Mean]

Skewness Coefficients

Skewness Coefficient   Interpretation
< -1                   Severe negative skew
-1 to -0.5             Moderate negative skew
-0.5 to 0              Negligible negative skew
0                      Symmetrical
0 to +0.5              Negligible positive skew
+0.5 to +1             Moderate positive skew
> +1                   Severe positive skew
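The interpretation table can be turned into a small lookup. The slides do not say which skewness formula is intended, so the sketch below assumes the moment coefficient (third central moment divided by the 1.5th power of the second); how values falling exactly on a boundary are assigned is also an assumption, and the skewed data set is hypothetical.

```python
def moment_skewness(values):
    """Moment coefficient of skewness (assumed formula): m3 / m2 ** 1.5."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n
    m3 = sum((x - centre) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def interpret_skewness(coefficient):
    """Map a skewness coefficient to the wording of the interpretation table."""
    if coefficient < -1:
        return "Severe negative skew"
    if coefficient < -0.5:
        return "Moderate negative skew"
    if coefficient < 0:
        return "Negligible negative skew"
    if coefficient == 0:
        return "Symmetrical"
    if coefficient <= 0.5:
        return "Negligible positive skew"
    if coefficient <= 1:
        return "Moderate positive skew"
    return "Severe positive skew"

symmetric = [53, 54, 55, 56, 57]     # evenly spaced weights, no skew
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with one large value

for data in (symmetric, right_skewed):
    g = moment_skewness(data)
    print(f"skewness = {g:.2f}: {interpret_skewness(g)}")
```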


Kurtosis

Thank You
bijugeorge1@gmail.com
9846100093
Wednesday, November 7, 2018 Dept of Community Medicine 104
