
Biostatistics: Testing of Hypothesis


“Be faithful in small things because it is in them
that your strength lies.”

Mother Teresa
Introduction
• Statistics plays a vital role in making the
right decisions in research
• Statistical procedures enable researchers
to summarize, organize, evaluate, interpret
and communicate numeric information
Focus of this presentation
• Discuss commonly used terms
– Null hypothesis
– Type I and Type II errors
– Power etc.

• Present a few commonly used statistical
tests
Age (Years)  Sex  BMI  Systolic BP (mm of Hg)  Diastolic BP (mm of Hg)  Hypertension
40 2 21.6 110 70 2
60 1 33.78 150 90 1
47 2 25.22 120 74 2
50 1 28 120 88 2
47 1 24.39 140 80 1
43 2 25.63 110 70 2
48 2 29.33 120 70 2
55 1 25.83 160 80 1
40 2 29.3 130 80 2
40 2 25.39 110 70 2
67 1 25.71 140 100 1
74 2 22.96 140 90 1
50 2 26.52 154 110 1
46 1 19.53 128 90 1
58 1 31.11 130 80 2
45 2 30.61 140 100 1
45 2 32.6 120 80 2
40 1 30.61 150 80 1
65 2 24.11 150 100 1
55 1 27.06 120 70 2
45 2 24.3 120 94 1
48 1 27.41 110 70 2
67 2 25.56 150 80 1
46 2 16.5 110 70 2
52 1 23.31 122 82 1
Sex : 1 – Male, 2 – Female
Hypertension : 1 – Hypertensive 2 – Normotensive
Age in Years Tally Frequency %
40 – 49 |||| |||| |||| 14 56
50 – 59 |||| | 6 24
60 – 69 |||| 4 16
70 – 80 | 1 4
Agewise distribution of the 25 persons
[Figure: bar diagram of the age-wise distribution; x-axis: Age Group (40 – 49, 50 – 59, 60 – 69, 70 – 80)]
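The tally above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the ages are copied from the data table and that both class limits are inclusive; the names `bins` and `age_group` are illustrative only.

```python
from collections import Counter

# Ages of the 25 persons, copied from the data table
ages = [40, 60, 47, 50, 47, 43, 48, 55, 40, 40, 67, 74, 50,
        46, 58, 45, 45, 40, 65, 55, 45, 48, 67, 46, 52]

# Class intervals as used on the slide (assumed inclusive at both ends)
bins = [(40, 49), (50, 59), (60, 69), (70, 80)]

def age_group(age):
    """Return the label of the class interval that contains `age`."""
    for low, high in bins:
        if low <= age <= high:
            return f"{low} - {high}"
    raise ValueError(f"age {age} falls outside all class intervals")

# Frequency and percentage per class interval
freq = Counter(age_group(a) for a in ages)
percent = {group: 100 * count / len(ages) for group, count in freq.items()}

for low, high in bins:
    label = f"{low} - {high}"
    print(label, freq[label], percent[label])
```

The printed counts and percentages should match the Frequency and % columns of the table.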
Sexwise distribution of the 25 persons

Gender Tally Frequency %
Male |||| |||| | 11 44
Female |||| |||| |||| 14 56
[Figures: sex-wise distribution charts (Male 44%, Female 56%)]
Hypertension status by Gender

Hypertension status   Male No.   Male %   Female No.   Female %
Hypertensive          7          63.6     6            42.9
Normotensive          4          36.4     8            57.1
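The cross-tabulation can be rebuilt from the coded data. The sketch below assumes the (sex, hypertension) codes are copied row by row from the data table; `records`, `counts` and `pct_within_sex` are illustrative names.

```python
from collections import Counter

# (sex, hypertension) codes for the 25 persons, copied from the data table.
# sex: 1 = Male, 2 = Female; hypertension: 1 = Hypertensive, 2 = Normotensive
records = [(2, 2), (1, 1), (2, 2), (1, 2), (1, 1), (2, 2), (2, 2), (1, 1),
           (2, 2), (2, 2), (1, 1), (2, 1), (2, 1), (1, 1), (1, 2), (2, 1),
           (2, 2), (1, 1), (2, 1), (1, 2), (2, 1), (1, 2), (2, 1), (2, 2),
           (1, 1)]

counts = Counter(records)                      # cell counts of the 2 x 2 table
sex_totals = Counter(sex for sex, _ in records)

def pct_within_sex(sex, status):
    """Percentage of persons of the given sex who have the given status."""
    return round(100 * counts[(sex, status)] / sex_totals[sex], 1)

for status, name in ((1, "Hypertensive"), (2, "Normotensive")):
    print(name,
          counts[(1, status)], pct_within_sex(1, status),
          counts[(2, status)], pct_within_sex(2, status))
```

Percentages are taken within each sex, which is why each column of the table adds up to 100.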
[Figure: multiple bar diagram of Hypertensive and Normotensive for Male and Female (axis 0 to 8)]
[Figure: multiple bar diagram of Hypertensive and Normotensive for Male and Female (axis 0 to 70)]
[Figure: component bar diagram of Normotensive and Hypertensive for Male and Female (axis 0% to 100%)]
Descriptive analysis
In the illustrative data,

$$ \text{Mean age} = \frac{40 + 60 + \dots + 46 + 52}{25} = 50.92 \text{ years} $$

Median age is 48 years.
Mode age is 40 years.
Standard deviation = 9.48 years
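These summary values can be checked with Python's standard `statistics` module. The sketch assumes the ages are copied from the data table and that the standard deviation uses the n – 1 divisor, which is what `statistics.stdev` computes.

```python
import statistics

# Ages of the 25 persons, copied from the data table
ages = [40, 60, 47, 50, 47, 43, 48, 55, 40, 40, 67, 74, 50,
        46, 58, 45, 45, 40, 65, 55, 45, 48, 67, 46, 52]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # middle value of the sorted ages
mode_age = statistics.mode(ages)      # most frequent age
sd_age = statistics.stdev(ages)       # sample SD (divisor n - 1)

print(round(mean_age, 2), median_age, mode_age, round(sd_age, 2))
```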
Scatter diagram
[Figure: scatter diagram; x-axis: Systolic Blood Pressure (0 to 200), y-axis: Diastolic Blood Pressure (0 to 120)]

Correlation coefficient:

$$ r = \frac{107.32}{\sqrt{245.73 \times 121.98}} = \frac{107.32}{173.13} = 0.619 \approx 0.62 $$
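A Pearson correlation coefficient between systolic and diastolic blood pressure can be computed directly from the data table. This is a sketch under the assumption that the 25 rows listed earlier are the data behind the scatter diagram; the result comes out close to the figure quoted on the slide, and a small difference is possible if the slide used slightly different values.

```python
from math import sqrt

# Systolic and diastolic BP (mm of Hg) of the 25 persons, from the data table
sbp = [110, 150, 120, 120, 140, 110, 120, 160, 130, 110, 140, 140, 154,
       128, 130, 140, 120, 150, 150, 120, 120, 110, 150, 110, 122]
dbp = [70, 90, 74, 88, 80, 70, 70, 80, 80, 70, 100, 90, 110,
       90, 80, 100, 80, 80, 100, 70, 94, 70, 80, 70, 82]

def pearson_r(x, y):
    """Pearson correlation: sum of cross-products over the root of the
    product of the two sums of squares (all taken about the means)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(sbp, dbp)
print(round(r, 2))
```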
What is a Hypothesis?
A statement about the study objective
• A hypothesis is an assumption about the
population parameter.
– A parameter is a characteristic of the
population, like its mean or variance.
– The parameter must be identified before analysis.
Example: "I assume the mean age of the participants is 25 years."
Testing of Hypothesis (or)
Statistical tests

• The testing of hypothesis consists of 5 major
steps.
• Forming of null hypothesis
• Forming of alternative hypothesis
• Fixing of level of significance
• Applying critical ratio / formula / statistical test
• Making inference
Step 1 : Null Hypothesis:


• An unbiased statement about the study
objective.
• The null hypothesis indicates a neutral
position in the given study or experiment.
Step 2 : Alternative hypothesis
• A statement against the null hypothesis.
• It can be of two forms: without any specific
direction or with a particular direction.
• An alternative without any specific direction is
called a two-sided alternative hypothesis.
• An alternative with a specific direction is called a
one-sided alternative hypothesis.
Step 3 : Level of Significance

• This step relies on the concept of sample versus population.
• Prepare to do a mental exercise.
Population and Sample
Population: use parameters to summarize features
Sample: use statistics to summarize features
Inference on the population is made from the sample
Errors
• In any study, the researcher is going to
make a decision based on the sample
result.
• The true state of the population is unknown
to the researcher.
• Since the researcher makes the decision
based on the sample, there is a possibility
of committing some errors.
• This is explained as follows:

Population (unknown): the actual situation is that the null hypothesis is True or False.
Sample (known): the researcher calculates a test statistic and decides on the null hypothesis.

Decision from sample    Null hypothesis actually True    Null hypothesis actually False
True (Accepted)         1. Correct decision              3. Wrong (Type II error)
False (Rejected)        2. Wrong (Type I error)          4. Correct decision
Type I error: Rejecting the null hypothesis when it is, in fact, true;
usually denoted by the letter α.
Type II error: Accepting the null hypothesis when it is, in fact, false;
denoted by β.
α & β Have an Inverse Relationship
Reduce the probability of one error
and the other one goes up.
Factors Affecting Type II Error, β
• True Value of Population Parameter
– β Increases When Difference Between Hypothesized
Parameter & True Value Decreases
• Significance Level α
– β Increases When α Decreases
• Population Standard Deviation σ
– β Increases When σ Increases
• Sample Size n
– β Increases When n Decreases
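The four relationships can be checked numerically. The sketch below is an illustration only: it assumes a one-sided (upper-tail) Z test of a mean with known σ, and the values 100, 105, 15 and 36 are hypothetical numbers chosen for the demonstration, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha):
    """beta for a one-sided (upper-tail) Z test of H0: mu = mu0
    when the true mean is mu1 > mu0 and sigma is known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)    # rejection cut-off
    shift = (mu1 - mu0) * sqrt(n) / sigma     # true difference in SE units
    return std_normal.cdf(z_crit - shift)     # P(fail to reject | mu = mu1)

# Hypothetical baseline scenario
base = type_ii_error(mu0=100, mu1=105, sigma=15, n=36, alpha=0.05)

# Change one factor at a time
smaller_difference = type_ii_error(100, 102, 15, 36, 0.05)
smaller_alpha = type_ii_error(100, 105, 15, 36, 0.01)
larger_sigma = type_ii_error(100, 105, 25, 36, 0.05)
smaller_n = type_ii_error(100, 105, 15, 16, 0.05)

for name, beta in [("baseline", base),
                   ("smaller difference", smaller_difference),
                   ("smaller alpha", smaller_alpha),
                   ("larger sigma", larger_sigma),
                   ("smaller n", smaller_n)]:
    print(f"{name}: beta = {beta:.3f}")
```

Each of the four modified scenarios gives a larger β than the baseline, in line with the bullets above.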
Common significance level
• The two most frequently used significance
levels are 0.05 and 0.01.
• With a 0.05 significance level, we are
accepting the risk that, out of 100 samples
drawn from a population, a true null
hypothesis would be rejected about 5 times.
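The table values of Z that go with these significance levels can be obtained from the standard normal distribution. A minimal sketch using Python's `statistics.NormalDist`; the helper name `critical_z` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def critical_z(alpha, two_sided=True):
    """Table value of Z for a given significance level."""
    tail = alpha / 2 if two_sided else alpha  # area left in each rejection tail
    return std_normal.inv_cdf(1 - tail)

for alpha in (0.05, 0.01):
    print(alpha,
          round(critical_z(alpha), 2),                   # two-sided
          round(critical_z(alpha, two_sided=False), 2))  # one-sided
```

At the 0.05 level this gives about 1.96 for a two-sided test and about 1.64 for a one-sided test.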
The statistical tests are classified
under these six situations.

• Comparison of sample mean with population
mean.
• Comparison of two sample means.
• Comparison of more than two sample means
simultaneously.
• Comparison of sample percentage with
population percentage.
• Comparison of two sample percentages.
• Finding any association or relationship between
any two or more variables.
Common statistical tests / procedures
Situation I

Comparison of sample mean and population mean.
If sample size > 30:

$$ Z = \frac{\text{Sample Mean} - \text{Population Mean}}{\text{SE of Mean}} = \frac{\bar{X} - \mu}{SD/\sqrt{n}} $$

Normal distribution

If sample size < 30:

$$ t = \frac{\text{Sample Mean} - \text{Population Mean}}{\text{SE of Mean}} = \frac{\bar{X} - \mu}{SD/\sqrt{n}} $$

't' distribution with n–1 degrees of freedom

Here,

$$ SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} $$
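As an illustration of Situation I, the sketch below tests the earlier assumption that the mean age of the participants is 25 years, using the summary figures from the descriptive analysis (mean 50.92, SD 9.48, n = 25). Pairing those figures with that hypothesised value is an assumption made for the example, and the function name is illustrative.

```python
from math import sqrt

def one_sample_statistic(sample_mean, population_mean, sd, n):
    """(Sample mean - population mean) / SE of mean, with SE = SD / sqrt(n)."""
    se = sd / sqrt(n)
    return (sample_mean - population_mean) / se

# n = 25 is below 30, so the ratio is referred to the t distribution
t = one_sample_statistic(sample_mean=50.92, population_mean=25, sd=9.48, n=25)
df = 25 - 1   # degrees of freedom

print(round(t, 2), df)
```

The same function serves the large-sample case; only the reference distribution (normal instead of t) changes.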
Situation II
Comparison between two sample means
If two samples are independent and sample size is > 30:

$$ Z = \frac{\text{Mean of Sample 1} - \text{Mean of Sample 2}}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}} $$

Normal distribution

where $n_1$ and $n_2$ are the sample sizes of sample 1 and 2.

If two samples are independent and sample size is < 30:

$$ t = \frac{\text{Mean of Sample 1} - \text{Mean of Sample 2}}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

't' distribution with $n_1 + n_2 - 2$ degrees of freedom

where

$$ S = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}} $$

$S_1$ and $S_2$ are the SDs of sample 1 and 2, and $n_1$ and $n_2$ are the sample sizes
of sample 1 and 2. (It is frequently called the independent Student 't' test.)
If two samples are related and sample size is < 30:

$$ t = \frac{\bar{d}}{S_d/\sqrt{n}} $$

't' distribution with n–1 degrees of freedom.

where $\bar{d}$ is the mean of the differences between the two sample observations,
$S_d/\sqrt{n}$ is the standard error of the mean difference,

$$ S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} $$

is the standard deviation of the differences, and n is the sample size.
Note:
Correlated samples refers to situations like pre and post tests, before and after treatment,
matched samples etc.
This 't' test is usually called the paired 't' test.
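A short sketch of the paired 't' calculation follows. The before and after pain scores are hypothetical values invented for the illustration, and `paired_t` is an illustrative helper, not part of any library.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: mean difference over its standard error."""
    d = [b - a for b, a in zip(before, after)]   # per-subject differences
    n = len(d)
    se = stdev(d) / sqrt(n)                      # SE of the mean difference
    return mean(d) / se, n - 1                   # statistic, degrees of freedom

# Hypothetical pain scores for five patients before and after a treatment
before = [7, 8, 6, 9, 5]
after = [5, 4, 3, 4, 4]

t, df = paired_t(before, after)
print(round(t, 2), df)
```

The calculated t is then compared with the table value of the t distribution for n – 1 degrees of freedom.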
Situation – IV
Comparison between sample percentage and population percentage
(i) If sample size > 30

$$ Z = \frac{p - P}{SE(P)} = \frac{p - P}{\sqrt{PQ/N}} $$

Normal distribution

where p is the sample %,
P is the population %,
Q = 100 – P
and N is the sample size.
Situation V
Comparison between two sample percentages
If sample size > 30

$$ Z = \frac{P_1 - P_2}{\sqrt{PQ\,(1/n_1 + 1/n_2)}} $$

Normal distribution

where

$$ P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2} $$

and Q = 100 – P
Situation VI
Relationship or Association between two variables

(i) If two variables are quantitative, the Pearson correlation
coefficient is used to find out the relationship.
To verify the calculated correlation coefficient the following formula is used.

$$ t = r\sqrt{\frac{n - 2}{1 - r^2}} $$

't' distribution with n – 2 degrees of freedom

(ii) If two variables are qualitative and sample size is > 30,
the chi-square test is used to find out the association.

$$ \chi^2 = \sum_{i=1,\,j=1}^{n,\,m} \left[ \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \right] $$

Chi-square distribution

where $O_{ij}$ is the (i, j)th observed value
and $E_{ij}$ is the (i, j)th expected value.
Example 1
• The prevalence rate of neck pain among
computer professionals is 20%. A BPT
student wants to verify whether this is true.
For this, the student selects 400 software
professionals with a minimum experience of
3 years. He conducts a survey and finds
that 93 of them have neck pain. With
this evidence, what conclusion should the
student draw?
Solution

• Here, the situation is comparison between
sample percentage and population
percentage and sample size is > 30.
Step 1: Null hypothesis


• Sample % = population % i.e. the sample
has come from a population with 20%
neck pain among computer professionals.
• The prevalence rate of neck pain among
the sample computer professionals is equal to
the prevalence rate among all computer
professionals.
Step 2: Alternative Hypothesis

• Since there is no specific direction
mentioned in the problem, the researcher
can use a two-sided alternative
hypothesis.
• Sample % is not equal to population %
• The prevalence rate of neck pain among
the sample computer professionals is
different from that of all computer
professionals.
Step 3: Level of Significance

• Since no specific value is mentioned in the
problem for the level of significance, the
researcher can select the level of
significance as 5%, i.e. α = 0.05,
and the corresponding table value is 1.96.
Step 4: Critical ratio

$$ Z = \frac{p - P}{SE(P)} = \frac{p - P}{\sqrt{PQ/N}} $$

• Here Population % is 20%, so P = 20% and Q = 80%

$$ \text{Sample \% } p = \frac{93}{400} \times 100 = 23.3\% $$

$$ SE \text{ of } P = \sqrt{\frac{20 \times 80}{400}} = 2 $$

$$ Z = \frac{23.3 - 20}{2} = \frac{3.3}{2} = 1.65 $$
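The critical ratio for Example 1 can be reproduced as below. This sketch uses the 93 cases out of 400 that appear in the calculation; without rounding the sample percentage to 23.3 the ratio comes out slightly lower than 1.65, which does not change the comparison with 1.96. The function name is illustrative.

```python
from math import sqrt

def one_proportion_z(sample_pct, population_pct, n):
    """Z = (p - P) / sqrt(PQ / N), with percentages on the 0 to 100 scale."""
    q = 100 - population_pct
    se = sqrt(population_pct * q / n)   # standard error of P
    return (sample_pct - population_pct) / se

cases, n = 93, 400
sample_pct = 100 * cases / n            # sample percentage p
z = one_proportion_z(sample_pct, population_pct=20, n=n)

table_value = 1.96                      # two-sided table value at the 5% level
print(round(sample_pct, 2), round(z, 3), abs(z) > table_value)
```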
Normal Curve
Characteristics of Normal Curve

• Bell shaped and symmetric
• Mean, Median and Mode are all equal
• Mean ± 1 SD covers approximately 68.3%
of the observations.
• Mean ± 2 SD covers approximately
95.4% of the observations.
• Mean ± 3 SD covers approximately
99.7% of the observations.
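The coverage percentages of the normal curve can be verified from the standard normal cumulative distribution. A minimal sketch with Python's `statistics.NormalDist`; `coverage` is an illustrative helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def coverage(k):
    """Proportion of observations lying within mean +/- k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Mean +/- {k} SD: {100 * coverage(k):.1f}%")
```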
Step 5: Interpretation

• Since the critical ratio 1.65 < table value
1.96, accept (i.e. do not reject) the null hypothesis.
• Conclusion
• The researcher may conclude that the sample is
consistent with a prevalence rate of neck pain
among computer professionals of 20%.
Example: 2

• A physiotherapist is interested to know whether
moist heat + isometric exercise + tapping is effective in
reducing the pain in osteoarthritis of the knee. For this, 80
osteoarthritis patients were selected. 30 of them were
treated with moist heat + isometric exercise (Treatment A);
their average pain level was found to be 3.90 with an SD of 1.37.
The remaining 50 were treated with
moist heat + isometric exercise + tapping (Treatment B);
their average pain level was found to be 1.70 with an SD of 0.67.
Based on the above results, can the
physiotherapist conclude that treatment B is more effective
than treatment A at the 5% level of significance? (Pain is
assessed by a numerical pain scale: the lower the score,
the less the pain.)
Solution

• To do the statistical test, first identify the
situation and follow the five steps. This design
is a comparison between two sample means, and
the sample sizes are large (30 and 50), so the
Z test is used.
Step 1

• Null hypothesis
• Treatment 'A' = Treatment 'B'
• i.e. there is no significant difference between
moist heat + isometric exercise and moist heat +
isometric exercise + tapping with respect to the average
pain level after the treatment.
Step 2
• Alternative Hypothesis
• Treatment 'B' > Treatment 'A'
• Treatment 'B' is more effective than treatment 'A'
• i.e. the mean pain level after treatment 'B' is less than
the mean pain level after treatment 'A'.
Step 3

• Level of Significance
• It is given in the problem, i.e. α = 0.05. Since the
alternative hypothesis is one-sided, the
table value is 1.64.
Step 4

• Critical ratio / Formula

$$ Z = \frac{\text{Mean}_1 - \text{Mean}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}} = \frac{3.9 - 1.7}{\sqrt{\frac{(1.37)^2}{30} + \frac{(0.67)^2}{50}}} = \frac{2.2}{0.267} = 8.25 $$
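The critical ratio for Example 2 can be recomputed from the summary figures given in the problem. This is a sketch with an illustrative function name; carried through without intermediate rounding, the ratio comes out near 8.2, marginally below the 8.25 shown, and the comparison with the table value is unaffected.

```python
from math import sqrt

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Z statistic for the difference between two independent sample means."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # SE of the difference
    return (mean1 - mean2) / se

# Treatment A: mean 3.90, SD 1.37, n = 30; Treatment B: mean 1.70, SD 0.67, n = 50
z = two_sample_z(3.90, 1.37, 30, 1.70, 0.67, 50)

table_value = 1.64   # one-sided table value at the 5% level, as quoted
print(round(z, 2), z > table_value)
```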
Step 5: Interpretation

• Since the critical ratio 8.25 > table
value 1.64, accept the alternative
hypothesis. We can say with 95% confidence that
treatment B is more effective than treatment A.
Example 3

• Objective of the Study
• Find out whether any association exists between obesity
and hypertension
• Data
• 140 obese persons were selected and 85 of them
have hypertension.
• 110 non-obese persons were selected and, among
them, 42 are suffering from hypertension.
• Question
• Can we conclude with 99% confidence that an
association exists between obesity and hypertension?
Solution

• These data can be analysed in two ways
– Comparison between two sample
percentages and
– Finding association using the chi-square test.
• Here, the second way is used to analyse the
data
• Null hypothesis
• There is no association between obesity
and hypertension.
• Alternative hypothesis
• An association exists between obesity and
hypertension.
Level of Significance

• Since the question is asked with 99%
confidence, α = 0.01. The corresponding
chi-square table value is 6.63 for d.f. = 1
(refer to the table in the appendix).
Critical ratio

$$ \chi^2 = \sum \left[ \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \right] $$

where the O's are called observed values and the E's are called expected values.

             Hypertension   Normal   Total
Obese        85             55       140
Non-obese    42             68       110
Total        127            123      250
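The given data are known as the observed values.
Expected values are calculated by

$$ E = \frac{R_T \times C_T}{G_T} $$

where $R_T$ is the row total, $C_T$ is the column total
and $G_T$ is the grand total. Each cell then contributes

$$ \frac{(O - E)^2}{E} $$

Expected values:

$$ \frac{127 \times 140}{250} = 71.12 \qquad \frac{123 \times 140}{250} = 68.88 $$

$$ \frac{127 \times 110}{250} = 55.88 \qquad \frac{123 \times 110}{250} = 54.12 $$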
Row    Column   Observed Value (O)   Expected Value (E)   O – E     (O – E)²   (O – E)²/E
1      1        85                   71.12                13.88     192.65     2.71
1      2        55                   68.88                –13.88    192.65     2.80
2      1        42                   55.88                –13.88    192.65     3.45
2      2        68                   54.12                13.88     192.65     3.56
Total                                                                          12.52
χ² = 12.52, d.f. = (R – 1) × (C – 1) = (2 – 1) × (2 – 1) = 1 × 1 = 1
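The expected values and the chi-square total can be reproduced from the observed 2 × 2 table alone. A minimal sketch in plain Python; the variable names are illustrative, and no continuity correction is applied, matching the hand calculation.

```python
# Observed counts: rows = Obese, Non-obese; columns = Hypertension, Normal
observed = [[85, 55],
            [42, 68]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value of each cell = row total x column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square = sum over all cells of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
table_value = 6.63   # chi-square table value for d.f. = 1 at alpha = 0.01

print([[round(e, 2) for e in row] for row in expected])
print(round(chi_square, 2), df, chi_square > table_value)
```

Summing the unrounded cell contributions gives a total a shade below the 12.52 obtained from the rounded column.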
Interpretation

• Since the calculated value 12.52 > table
value 6.63, accept the alternative
hypothesis.
• Conclusion
• We can conclude with 99% confidence that an
association exists between obesity and
hypertension.
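The first way mentioned in the solution, a comparison between two sample percentages, can be sketched for the same data. The function below follows the Situation V formula with a pooled P; its name is illustrative. For a 2 × 2 table the square of this Z equals the chi-square statistic computed without continuity correction, so both routes lead to the same decision.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z for the difference between two sample percentages (pooled P)."""
    p1, p2 = 100 * x1 / n1, 100 * x2 / n2     # sample percentages
    p = (n1 * p1 + n2 * p2) / (n1 + n2)       # pooled percentage P
    q = 100 - p
    se = sqrt(p * q * (1 / n1 + 1 / n2))      # SE of the difference
    return (p1 - p2) / se

# 85 of 140 obese and 42 of 110 non-obese persons have hypertension
z = two_proportion_z(85, 140, 42, 110)

table_value = 2.58   # two-sided table value of Z at alpha = 0.01
print(round(z, 2), round(z ** 2, 2), abs(z) > table_value)
```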
You Need to Know
• How to turn a question into hypotheses
• Failing to reject the null hypothesis DOES NOT
mean that the null is true
• Every test has assumptions
– A statistician can check all the assumptions
– If the data do not meet the assumptions, there are
non-parametric versions of the tests (see text)

Avoid Common Mistakes:
Hypothesis Testing
• If you have paired data, use a paired test
– If you don't, then you can lose power
• If you do NOT have paired data, do NOT
use a paired test
– You can have the wrong inference

Common Mistakes:
Hypothesis Testing
• These tests have assumptions of independence
– Taking multiple samples per subject ?
– Different statistical analyses MUST be used
– Distribution of the observations
– Histogram of the observations
– With highly skewed data, the t test can give incorrect results
Common Mistakes:
Hypothesis Testing
• Assuming equal variances when the
variances are not equal
– Did not show variance test
– Not that good of a test
– ALWAYS graph your data first to assess
symmetry and variance