
Inferential Statistics

Dr S L Jadhav
Prof., Community Medicine
Dr D Y Patil Medical College
dr_slj173@yahoo.co.in
What should you be able to do
at the end of the session?
• Be familiar with basic concepts in
inferential statistics
• Enumerate types of distributions &
describe normal & binomial distribution
• Elucidate SEM/SEP and confidence
intervals
• Enlist steps of Hypothesis testing
• Describe Type 1 & Type II errors
Descriptive statistics
The presentation, organisation and
summarisation of data

Norman G R, & Streiner D L. Biostatistics: The bare essentials. 2000. Hamilton: B.C. Decker
Inferential statistics
• Inferential statistics allows us to generalise
from our sample to a larger group of
subjects
Sample describes those individuals who are in the
study
Population describes the hypothetical (and
usually infinite) number of people to whom
you wish to generalize
Probability
• The chance of occurrence of an event in
nature or in the long run
• Or: probability deals with the relative
likelihood that a certain event will or will
not occur, relative to some other event
• Examples: dice or cards
• Maximum 1, minimum 0
• What is the probability of drawing a King
from a pack of 52 cards?
• What does p = 0.05 mean?
Difference between probability
& odds
• Probability = favourable outcomes / total
possible outcomes
• Odds = probability that the event occurs /
probability that it does not, i.e. p/(1-p)
• Two events, X and Y, are mutually
exclusive if the occurrence of one
precludes the occurrence of the other.
Laws of Probability
Let us say the probability of being
intelligent (PI) is 0.1 and the probability of
being beautiful (PB) is 0.2
What is the probability that a person chosen
at random is both intelligent & beautiful?
If X and Y are independent events, then the
probability of both X and Y occurring is the
probability of X multiplied by the probability
of Y.
Multiplicative law = 0.1 x 0.2 = 0.02
Laws of Probability
What is the probability that a person chosen
at random is intelligent OR beautiful?
• If X and Y are mutually exclusive events,
then the probability of X or Y is the
probability of X plus the probability of Y.
• Here the two events can occur together,
so the overlap is subtracted:
Additive law = 0.1 + 0.2 - 0.02 = 0.28
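The two laws can be checked in a few lines of Python (a minimal sketch; being intelligent and being beautiful are assumed independent, so the overlap P(A and B) is subtracted in the OR case):

```python
# Laws of probability with the slide's values:
# P(intelligent) = 0.1, P(beautiful) = 0.2, assumed independent
p_i = 0.1
p_b = 0.2

# Multiplicative law (independent events): P(A and B) = P(A) x P(B)
p_both = p_i * p_b

# Additive law: P(A or B) = P(A) + P(B) - P(A and B);
# the overlap term is zero only for mutually exclusive events
p_either = p_i + p_b - p_both

print(round(p_both, 2), round(p_either, 2))  # 0.02 0.28
```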
Normal curve
• If a histogram of a large no. of
observations with small class intervals is
plotted, and the midpoints of the class
intervals are joined to form a frequency
polygon, one gets a curve - the Normal curve
• If the x axis is changed to SD units, with
the mean as 0, it is called the Standard
Normal curve
Normal curve - History
• First described by Abraham de Moivre,
consultant to gamblers (1733)
• Then by the German mathematician Carl
Friedrich Gauss.
• Adolphe Quetelet showed measurements in
human beings follow normal distribution
Properties of Standard Normal
curve
• Also called Gaussian or Bell curve
• Most biological values follow normal distribution
• Bell shaped curve
• Symmetrical around a single peak
• Mean, median, mode coincide. Mean = 0
• Two ends of curve approach x axis asymptotically
• Skewness = 0, Kurtosis = 3 (some statistical programs report excess kurtosis, which is 0)
• Area under the curve =1
• It forms the basis of many tests of statistical
significance
The Standard Normal Curve
[Figure: the standard normal curve, compared with distributions skewed to the left and skewed to the right]
Distribution
• A summary of frequencies or proportions
of a characteristic from a series of data
from a sample or a population
Some types of distribution
• Normal distribution
• Binomial
• Poisson
• Log normal
Binomial distribution
Bernoulli Random Variables
• The binomial distribution shows the
probabilities of different outcomes for a
series of random events, each of which
can have only one of two values
• Eg: what is the probability that, given 2
tosses of a coin, both will be heads?
= 0.5 x 0.5 (Multiplicative law) = 0.25
• What is the probability that out of 10
tosses, 7 will be heads?
Binomial distribution
General formula: P(r) = [n! / r!(n-r)!] x p^r x q^(n-r)
n - no. of tries (let us say 10)
r - no. of favourable outcomes (let us say 7)
p - probability of success on each try (e.g. 0.5)
q - 1 - p, i.e. 0.5
P(7) = [10! / 7!(10-7)!] x 0.5^7 x 0.5^(10-7)
     = 120 x 0.5^10 ≈ 0.117
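The binomial calculation can be checked in Python (a sketch; `math.comb` gives the n-choose-r term, and for a fair coin the exact answer is 120/1024 ≈ 0.117):

```python
from math import comb

def binomial_prob(n: int, r: int, p: float) -> float:
    """P(exactly r successes in n independent trials, success probability p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Probability of 7 heads in 10 tosses of a fair coin
print(round(binomial_prob(10, 7, 0.5), 3))  # 0.117

# And the earlier example: both of 2 tosses are heads
print(binomial_prob(2, 2, 0.5))  # 0.25
```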
Binomial distribution
• Properties of Binomial distribution
Mean = np
Variance = npq
SD = √npq
Where n = number of trials
p - proportion of successful events
q - proportion of unsuccessful events
• As the sample size increases, the binomial distribution approaches the normal distribution
What are the measures of
variability ?
Measures of variability of samples
Quantitative data
• Standard error of mean
• Standard error of difference between means
Qualitative data
• Standard error of proportion
• Standard error of difference between
proportions
Standard error of the Mean
Standard Error (SE) of Mean
• Sample means follow a normal distribution
• The SD of sample means is called the SE
• SE(x̄) = SD/√n = σ/√n
• BP in males: Mean = 128.8 mm Hg, SD =
13.05. What is the SE of the mean?
Standard Error (SE) of Mean &
confidence limits
SE(x̄) = 13.05/√566 = 0.55
95% confidence limits = Mean ± 2 SE
= 128.8 + 2x0.55 & 128.8 - 2x0.55
= 129.9 & 127.7 mm Hg
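A quick check of this calculation (using the slide's multiplier of 2 rather than the exact 1.96):

```python
import math

mean, sd, n = 128.8, 13.05, 566   # BP in males, from the slide

se = sd / math.sqrt(n)            # standard error of the mean
lower = mean - 2 * se             # the slide rounds 1.96 to 2
upper = mean + 2 * se

print(round(se, 2), round(lower, 1), round(upper, 1))  # 0.55 127.7 129.9
```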
SE of difference between
means
• If in a study we find the mean IQ of girls is
124 and of boys is 114, i.e. a difference of 10.
If the same study is carried out with a
similar sample size a large number of times,
are we going to get the same difference?
Standard Error of Diff between
Means
SE diff between means = √(SD1²/n1 + SD2²/n2)
(the expression simplifies when n1 & n2 are equal)
The Central Limit Theorem states that if we draw
equally sized samples from a non-normal
distribution, the distribution of the means of
these samples will still be normal, as long as the
samples are large enough
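A sketch of this formula, using illustrative (hypothetical) SDs and group sizes:

```python
import math

def se_diff_means(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Illustrative (hypothetical) values: two groups of 10 observations
print(round(se_diff_means(3.335, 10, 3.178, 10), 3))  # 1.457
```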
Standard error of Proportion
• SE proportion = √(pq/n)
p - proportion of people with the attribute,
q - proportion without, n = number
• Let us say 20% of 400 people in a study
are diabetic.
SE = √(20x80/400) = 2%
Standard Error (SE) of Proportion &
confidence limits
95% confidence limits = p ± 2 SE
= 20% ± 2 x 2%
= 16% - 24%
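A quick check of the SE and 95% confidence limits for the proportion, working in percentage points:

```python
import math

p, n = 20.0, 400           # 20% of 400 people are diabetic
q = 100.0 - p

se = math.sqrt(p * q / n)              # SE in percentage points
lower, upper = p - 2 * se, p + 2 * se  # 95% confidence limits

print(se, lower, upper)  # 2.0 16.0 24.0
```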
SE of difference between
proportions
• If in a study we find obesity in Indians is
20% and in the USA is 25%, i.e. a difference
of 5%. If the same study is carried out with
a similar sample size a large number of times,
are we going to get the same difference?
Standard error of difference
between proportions
• SE of difference between proportions
= √(p1q1/n1 + p2q2/n2)
Hypothesis testing
• Null hypothesis – assume no difference
• Alternative hypothesis – there is difference
• Collect data
• Decide on α and power of the test
• Assumptions and select the test
• Test statistic – apply the statistical test
• Interpret
Type 1 & Type II errors

                          Reality
                 Not Guilty (H0 True)   Guilty (H1 True)
Evidence
  Not Guilty     Correct Decision       Error
  Guilty         Error                  Correct Decision
Type 1 & Type II errors

                      Reality
                 H0 True             H1 True
Research result
  Accept H0      Correct Decision    Type II Error (β)
  Reject H0      Type I Error (α)    Correct Decision

α = 0.05, β = 0.2, (1 - β) i.e. Power = 0.8


Judge decision & hypothesis testing

Judge decision:
• Innocence: accused innocent
• Guilt: did commit crime
• Criteria for rejecting innocence: beyond
reasonable doubt

Hypothesis testing:
• Null hypothesis: no association
• Alt Hyp: association present
• Criteria for rejecting null: level of
statistical significance (α)
Judge decision & hypothesis testing

Judge decision:
• Correct judgment: convict a criminal
• Correct judgment: acquit an innocent person
• Incorrect judgment: hang an innocent person
• Incorrect judgment: acquit a criminal

Hypothesis testing:
• Correct inference: correctly conclude association
• Correct inference: correctly conclude no association
• Incorrect inference: incorrectly find association (α): Type I error
• Incorrect inference: wrongly fail to find association (β): Type II error
Type I error
• Incorrect rejection of the null hypothesis is
Type I
• Also called the α error
• Usually kept at .05
• Unlike β, α is fixed in advance by the
investigator; increasing the sample size
reduces β, not α
Type II error
• Incorrect acceptance of null hypothesis is
Type II
• β error
• Usually kept at .20
• Can be reduced by increasing sample size
• 1 - β is called the power of the test
Who is the star of the show?
Who is the star of the show?
• P value
• d.f
• Difference in means or %
• No. of people in the study
Tests of statistical significance
• The basic form of every test is: Signal / Noise
• Remember: statistical association does not
mean a cause-effect relationship.
Tests of significance: Quantitative data
• Unpaired & paired t tests
• ANOVA
• Repeated Measures ANOVA
• Correlation coefficient
• Regression coefficient
Tests of significance: Qualitative data
• Z test (test of difference between
proportions)
• χ² test
• McNemar's test
• Fisher's exact test
• If qualitative ordinal - Chi square for trend
Tests for ordinal data

Interval scale data                 Ordinal data
t test                              Mann Whitney or Wilcoxon rank sum
ANOVA                               Kruskal Wallis
Paired t test                       Wilcoxon signed rank
Pearson's correlation coefficient   Spearman's correlation coefficient
Repeated Measures ANOVA             Friedman's test
Introduction to WinPepi &
Primer of Biostatistics
How to find out whether data are normally
distributed?
• WinPepi. Remember - Quantitative data.
• For a single group
Describe → D → D2 → Paste data → Run: see
Shapiro-Wilk test or Shapiro-Francia test. If p >
0.05, normally distributed
• For 2 groups at a time
Compare → H → H2 → Paste data → Run: see
Shapiro-Wilk test or Shapiro-Francia test. If p >
0.05, normally distributed.
• Both groups have to be normally distributed to
run the t test, ANOVA or Pearson's correlation
Homogeneity of variances
• In Compare (WinPepi) or Epi (will describe
later)
• See Bartlett's test (or Levene's test). If p >
0.05, variances are equal. Homogeneity of
variances is a requirement for the t test and
ANOVA
t test
t = (x̄1 - x̄2) / SE of diff between means
d.f. = n1 + n2 - 2

Group        N    Mean   Std Dev   SEM
1            10   17.7   3.335     1.055
2            10   22.1   3.178     1.005
Difference        -4.4             1.457

95% confidence interval for difference: -7.461 to -1.339
t = -3.020 with 18 degrees of freedom; P = 0.007


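The t statistic can be reproduced from the summary statistics alone (a sketch; with equal group sizes the unpooled SE used here coincides with the pooled-variance SE most programs report):

```python
import math

# Summary statistics from the two-group example
n1, mean1, sd1 = 10, 17.7, 3.335
n2, mean2, sd2 = 10, 22.1, 3.178

se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
t = (mean1 - mean2) / se_diff
df = n1 + n2 - 2

print(round(se_diff, 3), round(t, 2), df)  # 1.457 -3.02 18
```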
Paired t test
Used when each individual gives a pair of
readings
t = d̄ / SE(d̄), where d̄ is the mean of the
paired differences
SE(d̄) = SD/√n
d.f. = n - 1
Group        N    Mean   Std Dev   SEM
1            10   177    48.32     15.28
2            10   124    43.26     13.68
Difference        53     33.02     10.44

95% confidence interval for difference: 29.38 to 76.62
t = 5.076 with 9 degrees of freedom; P < 0.001
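Likewise for the paired t test, assuming the "Difference" row gives the mean and SD of the 10 within-pair differences (small rounding differences from the program's output are expected):

```python
import math

# Mean and SD of the 10 within-pair differences
n, mean_d, sd_d = 10, 53.0, 33.02

se = sd_d / math.sqrt(n)   # SE of the mean difference
t = mean_d / se
df = n - 1

print(round(se, 2), round(t, 2), df)  # 10.44 5.08 9
```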
What if more than two groups-
Quantitative data
ANOVA
• F = Mean square (between) / Mean square (within)
Group N Mean Std Dev SEM
1 10 4.2 0.9189 0.2906
2 10 5.3 1.16 0.3667
3 10 4.9 2.767 0.875
4 10 3.2 1.229 0.3887

Source of Variation   SS      DF   Variance Est (MS)
Between Groups        25.4    3    8.467
Within Groups         102.2   36   2.839
Total                 127.6   39

F = MSbet / MSwit = 8.467 / 2.839 = 2.98;  P = 0.044
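The F ratio follows directly from the sums of squares in the ANOVA table:

```python
# ANOVA F ratio from the sums of squares and degrees of freedom
ss_between, df_between = 25.4, 3
ss_within, df_within = 102.2, 36

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f = ms_between / ms_within

print(round(ms_between, 3), round(ms_within, 3), round(f, 2))  # 8.467 2.839 2.98
```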
How to write it?

Group   N    Mean   Std Dev
1       10   4.2    0.9189
2       10   5.3    1.16
3       10   4.9    2.767
4       10   3.2    1.229

F(3,36) = 2.98, P = 0.044
ANOVA: Post hoc test

Comparison   Difference of means   t       Puncorr   Pcrit   P<.05
2 vs 4:      5.3 - 3.2 = 2.1       3.754   0.000     0.025   Yes
1 vs 2:      4.2 - 5.3 = -1.1      1.877   0.069     0.017   No
3 vs 4:      4.9 - 3.2 = 1.7       1.877   0.069     0.013   No
2 vs 3:      5.3 - 4.9 = 0.4       0.000   1.000     0.010   No
1 vs 4:      4.2 - 3.2 = 1         0.000   1.000     0.008   No
1 vs 3:      4.2 - 4.9 = -0.7      0.000   1.000     0.007   No
What if two groups - Qualitative data
Proportions given: z test
z = (p1 - p2) / SE of difference between proportions
The observations about obesity are given below.

Details                     India   USA
Sample Size                 400     300
Prevalence of obesity (%)   20      25

Calculate the significance of the difference in the
proportions of obesity at the 5% level of significance.
--- Compare Two Proportions ---
Group 1   n = 400   p = 0.2
Group 2   n = 300   p = 0.25

The difference is: -0.05
Standard error of difference: 0.03171
95% confidence interval for difference: -0.1122 to 0.01215
z = 1.485; P = 0.138
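The program's output can be reproduced if we assume it pools the two proportions for the SE and applies a continuity correction to z (an assumption; without the correction z is about 1.58):

```python
import math

n1, p1 = 400, 0.20   # India
n2, p2 = 300, 0.25   # USA

# Pooled proportion under the null hypothesis p1 == p2
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Continuity-corrected z (assumed: this reproduces the program's 1.485)
cc = 0.5 * (1 / n1 + 1 / n2)
z = (abs(p1 - p2) - cc) / se

print(round(se, 5), round(z, 2))  # 0.03171 1.48
```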
What if two groups - Qualitative data
Actual nos.: Chi Square test
(also used for more than two groups)

Table 1. Distribution of caries in school children
Outcome   Caries No (E)   No caries No (E)   Total No
Boys      50 (93.75)      200 (156.25)       250
Girls     100 (56.25)     50 (93.75)         150
Total     150             250                400


Calculation of Chi square
χ² = ∑(O-E)²/E
d.f. = (c-1)(r-1)
Chi-square = 85.131 with 1 degree of
freedom; P < 0.001
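The reported chi-square of 85.131 is reproduced if Yates' continuity correction is applied, as is conventional for a 2x2 table (a sketch under that assumption):

```python
# Chi-square for the 2x2 caries table, with Yates' continuity correction
observed = [[50, 200],    # boys: caries, no caries
            [100, 50]]    # girls: caries, no caries

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand   # expected count
        chi2 += (abs(o - e) - 0.5) ** 2 / e         # Yates' correction

print(round(chi2, 3))  # 85.131
```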
Correlation
• Also called Pearson's correlation coefficient
• The relationship or association between two
quantitative variables – correlation
• One variable is independent : x and the other is
dependent :y
• The extent of this relationship is measured by
correlation coefficient ‘r’
• -1 ≤ r ≤ 1
• If no relationship r=0
• Existence of correlation does not mean causation
5 Types of correlation
Regression
• Regression coefficient is a measure of the
change in the dependent character(Y) with
one unit change in independent
character(X)
--- Linear Regression and Correlation ---

n: 20
Slope: -0.8426
y Int: 56.76
SE Slope: 0.09356
SE Int: 3.007
SE Est: 6.928
r: -0.9046
t: -9.006
DF: 18
P: < 0.001
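The slope and intercept in the output come from the program; as a minimal sketch of the underlying least-squares arithmetic, applied to hypothetical data:

```python
def linreg(xs, ys):
    """Least-squares fit: slope b = Sxy/Sxx, intercept a = ybar - b*xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return b, ybar - b * xbar

# Hypothetical data in which y falls by 2 units per unit increase in x
slope, intercept = linreg([1, 2, 3, 4], [58, 56, 54, 52])
print(slope, intercept)  # -2.0 60.0
```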
Which test?
Use of Statistical advisor
• Group Activity
Present dummy tables & state which statistical
test will be used
