Hypothesis
“Be faithful in small things because it is in them
that your strength lies.”
Mother Teresa
Introduction
• Statistics plays a vital role in making the right decisions in research
• Statistical procedures enable researchers
to summarize, organize, evaluate, interpret
and communicate numeric information
Focus of this presentation
• Discuss commonly used terms
– Null hypothesis
– Type I and Type II errors
– Power etc.
• Present a few commonly used statistical
tests
Illustrative data (25 persons)
Age (years)  Sex  BMI  Systolic BP (mm of Hg)  Diastolic BP (mm of Hg)  Hypertension
40 2 21.6 110 70 2
60 1 33.78 150 90 1
47 2 25.22 120 74 2
50 1 28 120 88 2
47 1 24.39 140 80 1
43 2 25.63 110 70 2
48 2 29.33 120 70 2
55 1 25.83 160 80 1
40 2 29.3 130 80 2
40 2 25.39 110 70 2
67 1 25.71 140 100 1
74 2 22.96 140 90 1
50 2 26.52 154 110 1
46 1 19.53 128 90 1
58 1 31.11 130 80 2
45 2 30.61 140 100 1
45 2 32.6 120 80 2
40 1 30.61 150 80 1
65 2 24.11 150 100 1
55 1 27.06 120 70 2
45 2 24.3 120 94 1
48 1 27.41 110 70 2
67 2 25.56 150 80 1
46 2 16.5 110 70 2
52 1 23.31 122 82 1
Sex: 1 – Male, 2 – Female
Hypertension: 1 – Hypertensive, 2 – Normotensive
Age in years  Frequency  %
40 – 49  14  56
50 – 59  6  24
60 – 69  4  16
70 – 80  1  4
Agewise distribution of the 25 persons
[Figure: Two bar diagrams of the percentage of persons in each age group (40 – 49, 50 – 59, 60 – 69, 70 – 80)]
Sexwise distribution of the 25 persons
Gender  Frequency  %
Male  11  44
Female  14  56
[Figure: Bar diagram and pie diagram of the sexwise distribution: Male 44%, Female 56%]
Hypertension status by gender
Hypertension status  Male (No.)  Male (%)  Female (No.)  Female (%)
Hypertensive  7  63.6  6  42.9
Normotensive  4  36.4  8  57.1
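The frequency tables above can be reproduced from the raw rows. A minimal Python sketch, assuming the 25 rows are typed in as tuples in table order with the legend's coding; the names `DATA`, `SEX`, `STATUS` and `age_group` are illustrative, not from the source:

```python
from collections import Counter

# Rows of the illustrative table: (age, sex, BMI, systolic BP, diastolic BP, hypertension)
# Legend coding: sex 1 = male, 2 = female; hypertension 1 = hypertensive, 2 = normotensive
DATA = [
    (40, 2, 21.6, 110, 70, 2), (60, 1, 33.78, 150, 90, 1), (47, 2, 25.22, 120, 74, 2),
    (50, 1, 28, 120, 88, 2), (47, 1, 24.39, 140, 80, 1), (43, 2, 25.63, 110, 70, 2),
    (48, 2, 29.33, 120, 70, 2), (55, 1, 25.83, 160, 80, 1), (40, 2, 29.3, 130, 80, 2),
    (40, 2, 25.39, 110, 70, 2), (67, 1, 25.71, 140, 100, 1), (74, 2, 22.96, 140, 90, 1),
    (50, 2, 26.52, 154, 110, 1), (46, 1, 19.53, 128, 90, 1), (58, 1, 31.11, 130, 80, 2),
    (45, 2, 30.61, 140, 100, 1), (45, 2, 32.6, 120, 80, 2), (40, 1, 30.61, 150, 80, 1),
    (65, 2, 24.11, 150, 100, 1), (55, 1, 27.06, 120, 70, 2), (45, 2, 24.3, 120, 94, 1),
    (48, 1, 27.41, 110, 70, 2), (67, 2, 25.56, 150, 80, 1), (46, 2, 16.5, 110, 70, 2),
    (52, 1, 23.31, 122, 82, 1),
]
SEX = {1: "Male", 2: "Female"}
STATUS = {1: "Hypertensive", 2: "Normotensive"}

def age_group(age):
    """Return the class interval of the age table that contains `age`."""
    for low, high in [(40, 49), (50, 59), (60, 69), (70, 80)]:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the tabulated range")

n = len(DATA)
age_counts = Counter(age_group(row[0]) for row in DATA)
sex_counts = Counter(SEX[row[1]] for row in DATA)
cross = Counter((SEX[row[1]], STATUS[row[5]]) for row in DATA)  # (sex, status) -> count

for group in sorted(age_counts):
    print(group, age_counts[group], f"{100 * age_counts[group] / n:.0f}%")
for sex in ("Male", "Female"):
    for status in ("Hypertensive", "Normotensive"):
        k = cross[(sex, status)]
        # Percentages are within each sex, as in the hypertension-by-gender table
        print(sex, status, k, f"{100 * k / sex_counts[sex]:.1f}%")
```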
[Figure: Multiple bar diagram of the number of hypertensive and normotensive persons among males and females]
[Figure: Multiple bar diagram of the percentage of hypertensive and normotensive persons among males and females]
[Figure: Component bar diagram (0% – 100%) of hypertensive and normotensive persons among males and females]
Descriptive analysis
In the illustrative data,
$$\text{Mean age} = \frac{40 + 60 + \cdots + 46 + 52}{25} = \frac{1273}{25} = 50.92 \text{ years}$$
Median age = 48 years
Mode age = 40 years
Standard deviation = 9.48 years
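The descriptive measures can be checked with Python's standard `statistics` module. A sketch, assuming the ages are typed in from the table; `statistics.stdev` divides by n – 1, which reproduces the 9.48 years quoted above:

```python
import statistics

# Ages (years) of the 25 persons in the illustrative table, in table order
ages = [40, 60, 47, 50, 47, 43, 48, 55, 40, 40, 67, 74, 50, 46, 58,
        45, 45, 40, 65, 55, 45, 48, 67, 46, 52]

mean_age = statistics.mean(ages)      # arithmetic mean
median_age = statistics.median(ages)  # middle value of the sorted ages (n is odd)
mode_age = statistics.mode(ages)      # most frequent age
sd_age = statistics.stdev(ages)       # sample SD, divisor n - 1

print(f"mean = {mean_age:.2f}, median = {median_age}, mode = {mode_age}, SD = {sd_age:.2f}")
```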
Scatter diagram
[Figure: Scatter diagram of diastolic blood pressure (y-axis, 0 – 120) against systolic blood pressure (x-axis, 0 – 200)]
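Correlation between systolic (x) and diastolic (y) blood pressure:
$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}} = \frac{107.32}{\sqrt{245.73 \times 121.98}} = \frac{107.32}{173.13} = 0.619 \approx 0.62$$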
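The correlation coefficient can also be computed directly from the 25 rows. A minimal sketch in plain Python (the function name `pearson_r` is illustrative; Python 3.10+ also offers `statistics.correlation`). Run on the rows exactly as printed in the table, it gives roughly 0.63, slightly above the 0.62 obtained by hand:

```python
import math

# Systolic and diastolic BP (mm of Hg) of the 25 persons, in table order
systolic = [110, 150, 120, 120, 140, 110, 120, 160, 130, 110, 140, 140, 154,
            128, 130, 140, 120, 150, 150, 120, 120, 110, 150, 110, 122]
diastolic = [70, 90, 74, 88, 80, 70, 70, 80, 80, 70, 100, 90, 110,
             90, 80, 100, 80, 80, 100, 70, 94, 70, 80, 70, 82]

def pearson_r(x, y):
    """Pearson correlation: sum of cross-products over the root of the sums of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(systolic, diastolic)
print(f"r = {r:.3f}")
```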
What is a Hypothesis?
A statement about the study objective
• A hypothesis is an assumption about the population parameter.
– A parameter is a characteristic of the population, like its mean or variance.
– The parameter must be identified before analysis.
• Example: “I assume the mean age of the participants is 25 years.”
Testing of Hypothesis (or)
Statistical tests
• Testing of hypothesis consists of five major steps:
– Forming the null hypothesis
– Forming the alternative hypothesis
– Fixing the level of significance
– Applying the critical ratio / formula / statistical test
– Making an inference
Step 1 : Null Hypothesis:
• An unbiased statement about the study
objective.
• The null hypothesis indicates a neutral
position in the given study or experiment.
Step 2 : Alternative hypothesis
• A statement against the null hypothesis.
• It can take two forms: without any specific direction or with a particular direction.
• Without any specific direction, it is called a two-sided alternative hypothesis.
• With a specific direction, it is called a one-sided alternative hypothesis.
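The direction of the alternative hypothesis decides which table value the critical ratio is compared with. A small sketch, assuming a large-sample (normal) test; it uses Python's standard `statistics.NormalDist`, and `z_critical` is a hypothetical helper name:

```python
from statistics import NormalDist

def z_critical(alpha, two_sided=True):
    """Standard normal table value for significance level alpha."""
    # A two-sided test splits alpha between both tails; a one-sided test puts it all in one tail
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: two-sided {z_critical(alpha):.3f}, "
          f"one-sided {z_critical(alpha, two_sided=False):.3f}")
```

At α = 0.05 this gives the 1.96 (two-sided) and 1.645, usually rounded to 1.64 (one-sided) used in the examples of this section.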
Step 3 : Level of Significance
• To understand it, we first need the concept of sample and population.
• Prepare to do a mental exercise.
Population and Sample
• Population: use parameters to summarize features
• Sample: use statistics to summarize features
• Inference on the population is made from the sample
Errors
• In any study, the researcher makes a decision based on the sample result.
• The researcher does not know the true situation in the population.
• Since the decision is based on the sample, there is a possibility of committing an error.
• This is explained as follows:
Actual situation in the population (unknown) versus the researcher's decision (known; the researcher calculates a test statistic from the sample and decides on the null hypothesis):
Decision on the null hypothesis    Null hypothesis actually true    Null hypothesis actually false
True (Accepted)    1. Correct decision    3. Wrong (Type II error)
False (Rejected)    2. Wrong (Type I error)    4. Correct decision
Type I error: Rejecting the null hypothesis when it is, in fact, true. Its probability is usually denoted by the letter α.
Type II error: Accepting the null hypothesis when it is, in fact, false. Its probability is denoted by β.
α and β have an inverse relationship: for a fixed sample size, reduce the probability of one error and the other one goes up.
Factors affecting Type II error, β
• True value of population parameter
– β increases when the difference between the hypothesized parameter and the true value decreases
• Significance level α
– β increases when α decreases
• Population standard deviation σ
– β increases when σ increases
• Sample size n
– β increases when n decreases
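The four factors can be checked numerically. A sketch for a two-sided one-sample Z test, assuming σ is known; the helper `type_ii_error` and all the numbers used are hypothetical:

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def type_ii_error(delta, sigma, n, alpha=0.05):
    """Beta for a two-sided one-sample Z test (sigma assumed known).

    delta: true mean minus hypothesized mean
    sigma: population standard deviation
    n:     sample size
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sigma  # true mean's distance from H0, in SE units
    # H0 is (wrongly) accepted when the Z statistic lands between -z_crit and z_crit
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)

# Hypothetical baseline: difference 5, sigma 10, n 25, alpha 0.05
base = type_ii_error(delta=5, sigma=10, n=25)
print("baseline beta:", round(base, 3))
print("smaller difference:", round(type_ii_error(2, 10, 25), 3))
print("smaller alpha:", round(type_ii_error(5, 10, 25, alpha=0.01), 3))
print("larger sigma:", round(type_ii_error(5, 20, 25), 3))
print("smaller n:", round(type_ii_error(5, 10, 10), 3))
```

Each of the last four lines changes one factor from the baseline, and each gives a larger β, in line with the list above.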
Common significance levels
• The two most frequently used significance levels are 0.05 and 0.01.
• With a 0.05 significance level, we accept the risk that, if a true null hypothesis were tested on 100 samples drawn from the population, it would be rejected about 5 times.
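The "5 out of 100 samples" reading of α can be checked by simulation. A sketch, assuming a normal population with known σ in which the null hypothesis is true; all numbers and the helper `simulate_type_i_rate` are hypothetical, and the random generator is seeded so the run is repeatable:

```python
import math
import random
from statistics import NormalDist

def simulate_type_i_rate(trials=10_000, n=30, mu=50.0, sigma=10.0, alpha=0.05, seed=1):
    """Fraction of samples in which a two-sided Z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Every sample comes from a population whose mean really is mu, so H0 is true
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # each rejection here is a Type I error
    return rejections / trials

rate = simulate_type_i_rate()
print(f"Type I error rate: {rate:.3f}")
```

The printed rate should be close to 0.05, i.e. about 5 rejections per 100 samples.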
The statistical tests are classified
under these six situations.
• Comparison of sample mean with population
mean.
• Comparison of two sample means.
• Comparison of more than two sample means
simultaneously.
• Comparison of sample percentage with
population percentage.
• Comparison of two sample percentages.
• Finding any association or relationship between
any two or more variables.
Common statistical tests / procedures
Situation I
Comparison of sample mean and population mean.
If sample size > 30:
$$Z = \frac{\text{Sample mean} - \text{Population mean}}{\text{SE of mean}} = \frac{\bar{x} - \mu}{SD/\sqrt{n}}$$
Normal distribution.
If sample size < 30:
$$t = \frac{\text{Sample mean} - \text{Population mean}}{\text{SE of mean}} = \frac{\bar{x} - \mu}{SD/\sqrt{n}}$$
't' distribution with n – 1 degrees of freedom.
Here,
$$SD = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
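Situation I for a small sample, as a sketch: the ages of the 25 persons tested against a population mean of 45 years. The value 45 and the helper `one_sample_t` are hypothetical, chosen only for illustration. With n = 25 < 30, the statistic is referred to the 't' distribution with 24 degrees of freedom:

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t = (sample mean - mu) / (SD / sqrt(n)), with SD using the n - 1 divisor."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - mu) / se, n - 1  # statistic, degrees of freedom

# Ages of the 25 persons in the illustrative table
ages = [40, 60, 47, 50, 47, 43, 48, 55, 40, 40, 67, 74, 50, 46, 58,
        45, 45, 40, 65, 55, 45, 48, 67, 46, 52]

t, df = one_sample_t(ages, mu=45)  # mu = 45 is a hypothetical population mean
print(f"t = {t:.2f} with {df} degrees of freedom")
```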
Situation II
Comparison between two sample means
If the two samples are independent and the sample size is > 30:
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}}$$
Normal distribution, where n1 and n2 are the sizes of samples 1 and 2.
If the two samples are independent and the sample size is < 30:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
't' distribution with n1 + n2 – 2 degrees of freedom, where
$$S = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$
S1 and S2 are the SDs of samples 1 and 2, and n1 and n2 are the sizes of samples 1 and 2. (This is frequently called the independent (Student's) 't' test.)
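The independent 't' test with the pooled SD S, sketched in Python. The helper `independent_t` and the two tiny groups are hypothetical, chosen so the arithmetic can be followed by hand (means 2 and 5, S = 1):

```python
import math
import statistics

def independent_t(x1, x2):
    """Student's t for two independent samples, using the pooled SD S."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = statistics.stdev(x1), statistics.stdev(x2)
    # Pooled SD: S = sqrt(((n1 - 1) S1^2 + (n2 - 1) S2^2) / (n1 + n2 - 2))
    s = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # statistic, degrees of freedom

t, df = independent_t([1, 2, 3], [4, 5, 6])  # made-up data
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The pooled form assumes the two population variances are equal; SciPy's `scipy.stats.ttest_ind` exposes this choice through its `equal_var` parameter.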
If the two samples are related and the sample size is < 30:
$$t = \frac{\bar{d}}{S_d/\sqrt{n}}, \qquad S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$
't' distribution with n – 1 degrees of freedom,
where $\bar{d}$ is the mean of the differences between the paired sample observations, $S_d$ is the SD of the differences, $S_d/\sqrt{n}$ is the standard error of the mean difference, and n is the sample size (number of pairs).
Note: Related (correlated) samples refer to situations like pre- and post-tests, before and after treatment, matched samples, etc. This 't' test is usually called the paired 't' test.
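The paired 't' test works on the differences within each pair. A sketch with made-up before/after scores; the helper `paired_t` and the data are hypothetical:

```python
import math
import statistics

def paired_t(before, after):
    """Paired t = mean difference / (SD of differences / sqrt(n))."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [b - a for b, a in zip(before, after)]  # difference for each pair
    n = len(d)
    s_d = statistics.stdev(d)                   # SD of the differences, divisor n - 1
    return statistics.mean(d) / (s_d / math.sqrt(n)), n - 1

t, df = paired_t(before=[5, 6, 7, 8], after=[3, 5, 4, 6])  # made-up scores
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Here the differences are 2, 1, 3 and 2, so $\bar{d} = 2$ and $S_d = \sqrt{2/3}$.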
Situation IV
Comparison between sample percentage and population percentage
(i) If sample size > 30:
$$Z = \frac{p - P}{SE(P)} = \frac{p - P}{\sqrt{PQ/N}}$$
Normal distribution, where p is the sample %, P is the population %, Q = 100 – P, and N is the sample size.
Situation V
Comparison between two sample percentages
If sample size > 30:
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Normal distribution, where p1 and p2 are the two sample percentages, n1 and n2 are the sample sizes,
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
and Q = 100 – P.
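Situation V as a sketch, applied to the counts in the Example 3 table (85 of 140 obese and 42 of 110 non-obese persons with hypertension). `two_percentage_z` is a hypothetical helper. For a 2 × 2 table, Z² from this test equals the chi-square statistic without continuity correction, which is why the two ways of analysing Example 3 lead to the same conclusion:

```python
import math

def two_percentage_z(x1, n1, x2, n2):
    """Z for comparing two sample percentages (Situation V).

    x1, x2: number of persons with the characteristic in each sample
    n1, n2: sample sizes
    """
    p1, p2 = 100 * x1 / n1, 100 * x2 / n2  # sample percentages
    P = (n1 * p1 + n2 * p2) / (n1 + n2)    # pooled percentage
    Q = 100 - P
    return (p1 - p2) / math.sqrt(P * Q * (1 / n1 + 1 / n2))

z = two_percentage_z(85, 140, 42, 110)  # obese vs non-obese, Example 3 counts
print(f"Z = {z:.2f}, Z squared = {z ** 2:.2f}")
```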
Situation VI
Relationship or association between two variables
(i) If the two variables are quantitative, the Pearson correlation coefficient (r) is used to find the relationship. To test the calculated correlation coefficient, the following formula is used:
$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
't' distribution with n – 2 degrees of freedom.
(ii) If the two variables are qualitative and the sample size is > 30, the chi-square test is used to find the association:
$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Chi-square distribution, where $O_{ij}$ is the (i, j)th observed value and $E_{ij}$ is the (i, j)th expected value.
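Testing the correlation from the blood-pressure illustration, as a sketch: `correlation_t` is a hypothetical helper, and r = 0.62 with n = 25 are taken from the earlier example. The result would be compared with the 't' table value for 23 degrees of freedom:

```python
import math

def correlation_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), referred to the t distribution with n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2

t, df = correlation_t(0.62, 25)  # r and n from the systolic/diastolic illustration
print(f"t = {t:.2f} with {df} degrees of freedom")
```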
Example 1
• The prevalence rate of neck pain among computer professionals is 20%. A BPT student wants to verify whether this is true. For this, the student selects 400 software professionals with a minimum experience of 3 years. He conducts a survey and finds that 93 of them have neck pain. With this evidence, what conclusion will the student reach?
Solution
• Here, the situation is a comparison between a sample percentage and a population percentage, and the sample size is > 30.
Step 1: Null hypothesis
• Sample % = population % i.e. the sample
has come from a population with 20%
neck pain among computer professionals.
• The prevalence rate of neck pain among the sampled computer professionals is equal to the prevalence rate among all computer professionals.
Step 2: Alternative Hypothesis
• Since no specific direction is mentioned in the problem, the researcher can have a two-sided alternative hypothesis.
• Sample % is not equal to population %
• The prevalence rate of neck pain among the sampled computer professionals is different from that among all computer professionals.
Step 3: Level of Significance
• Since no specific level of significance is mentioned in the problem, the researcher can select a 5% level of significance, i.e. α = 0.05, and the corresponding (two-sided) table value is 1.96.
Step 4: Critical ratio
• Here the population % is P = 20%, so Q = 80%.
• Sample % is p = (93/400) × 100 = 23.3%
$$SE(P) = \sqrt{\frac{PQ}{N}} = \sqrt{\frac{20 \times 80}{400}} = 2$$
$$Z = \frac{p - P}{SE(P)} = \frac{23.3 - 20}{2} = \frac{3.3}{2} = 1.65$$
Normal Curve
Characteristics of Normal Curve
• Bell shaped and symmetric
• Mean, median and mode are all equal
• Mean ± 1 SD contains approximately 68.3% of the observations.
• Mean ± 2 SD contains approximately 95.4% of the observations.
• Mean ± 3 SD contains approximately 99.7% of the observations.
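The percentages for the normal curve can be checked with `statistics.NormalDist` from the Python standard library; a short sketch:

```python
from statistics import NormalDist

std = NormalDist()  # standard normal curve: mean 0, SD 1
# Proportion of the area lying within mean ± k SD
shares = {k: std.cdf(k) - std.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"mean ± {k} SD: {100 * share:.1f}%")
```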
Step 5: Interpretation
• Since the critical ratio 1.65 < table value 1.96, accept the null hypothesis.
• Conclusion
• The researcher may conclude that there is no evidence that the prevalence rate of neck pain among computer professionals differs from 20%.
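Example 1's critical ratio, sketched with a hypothetical helper `percentage_z` that implements Z = (p – P)/√(PQ/N), using the figures from the calculation (93 of 400 with neck pain). Without first rounding p to 23.3%, Z comes out at 1.625 rather than 1.65; it is still below 1.96:

```python
import math

def percentage_z(count, n, population_pct):
    """Z for a sample percentage against a population percentage (large sample)."""
    p = 100 * count / n                   # sample %
    P, Q = population_pct, 100 - population_pct
    se = math.sqrt(P * Q / n)             # SE(P) = sqrt(PQ / N)
    return (p - P) / se

z = percentage_z(93, 400, 20)
decision = "reject H0" if abs(z) > 1.96 else "accept H0"  # two-sided, alpha = 0.05
print(f"Z = {z:.3f}: {decision}")
```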
Example 2
• A physiotherapist wants to know whether moist heat + isometric exercise + tapping is effective in reducing pain in osteoarthritis of the knee. For this, 80 osteoarthritis patients were selected. 30 of them were treated with moist heat + isometric exercise (Treatment A); their average pain level was 3.90 with an SD of 1.37. The remaining 50 were treated with moist heat + isometric exercise + tapping (Treatment B); their average pain level was 1.70 with an SD of 0.67. Based on these results, can the physiotherapist conclude that treatment B is more effective than treatment A at the 5% level of significance? (Pain is assessed on a numerical pain scale: the lower the score, the less the pain.)
Solution
• To do the statistical test, first identify the situation and then follow the five steps. This design is clearly a comparative study between two sample means, and the sample size is > 30.
Step 1
• Null hypothesis
• Treatment 'A' = Treatment 'B'
• i.e. there is no significant difference between moist heat + isometric exercise and moist heat + isometric exercise + tapping with respect to the average pain level after treatment.
• Step 2
• Alternative Hypothesis
• Treatment 'B' > Treatment 'A'
• Treatment 'B' is more effective than treatment 'A'
• i.e. the mean pain level after treatment 'B' is less than the mean pain level after treatment 'A'.
Step 3
• Level of Significance
• It is given in the problem, i.e. α = 0.05. Since the alternative hypothesis is one-sided, the table value is 1.64.
Step 4
• Critical ratio / Formula
$$Z = \frac{\text{Mean}_1 - \text{Mean}_2}{\sqrt{\frac{SD_1^2}{n_1} + \frac{SD_2^2}{n_2}}} = \frac{3.9 - 1.7}{\sqrt{\frac{(1.37)^2}{30} + \frac{(0.67)^2}{50}}} = \frac{2.2}{0.267} \approx 8.25$$
Step 5 Interpretation
• Since the critical ratio 8.25 > table value 1.64, accept the alternative hypothesis. We can be 95% confident that treatment B is more effective than treatment A.
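Example 2's critical ratio as a sketch; the helper `two_mean_z` is hypothetical. Carrying full precision gives Z of about 8.23; the 8.25 above reflects rounding in the hand calculation, and the decision is the same:

```python
import math

def two_mean_z(mean1, sd1, n1, mean2, sd2, n2):
    """Large-sample Z for the difference between two independent means."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Treatment A (mean 3.90, SD 1.37, n 30) vs treatment B (mean 1.70, SD 0.67, n 50)
z = two_mean_z(3.90, 1.37, 30, 1.70, 0.67, 50)
print(f"Z = {z:.2f}; exceeds one-sided table value 1.64: {z > 1.64}")
```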
Example 3
• Objective of the Study
• To find out whether any association exists between obesity and hypertension
• Data
• 140 obese persons were selected, and 85 of them have hypertension.
• 110 non-obese persons were selected, and among them, 42 are suffering from hypertension.
• Question
• Can we conclude with 99% confidence that an association exists between obesity and hypertension?
Solution
• These data can be analysed in two ways
– Comparison between two sample percentages, and
– Finding the association using the chi-square test.
• Here, the second way is used to analyse the data.
• Null hypothesis
• There is no association between obesity
and hypertension.
• Alternative hypothesis
• An association exists between obesity and
hypertension.
Level of Significance
• Since the question asks for 99% confidence, α = 0.01. The corresponding chi-square table value is 6.63 for d.f. = 1 (refer to the table in the appendix).
Critical ratio
$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where the O's are the observed values and the E's are the expected values.
Group  Hypertension  Normal  Total
Obese  85  55  140
Non-obese  42  68  110
Total  127  123  250
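The given data are known as the observed values.
Expected values are calculated by
$$E = \frac{R_T \times C_T}{G_T}$$
where $R_T$ = row total, $C_T$ = column total and $G_T$ = grand total.
$$E_{11} = \frac{140 \times 127}{250} = 71.12, \qquad E_{12} = \frac{140 \times 123}{250} = 68.88$$
$$E_{21} = \frac{110 \times 127}{250} = 55.88, \qquad E_{22} = \frac{110 \times 123}{250} = 54.12$$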
Row  Column  Observed value (O)  Expected value (E)  O – E  (O – E)²  (O – E)²/E
1  1  85  71.12  13.88  192.65  2.71
1  2  55  68.88  -13.88  192.65  2.80
2  1  42  55.88  -13.88  192.65  3.45
2  2  68  54.12  13.88  192.65  3.56
Total  12.52
χ² = 12.52, d.f. = (R – 1) × (C – 1) = (2 – 1) × (2 – 1) = 1 × 1 = 1
Interpretation
• Since the calculated value 12.52 > table value 6.63, accept the alternative hypothesis.
• Conclusion
• We can be 99% confident that an association exists between obesity and hypertension.
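The chi-square calculation for Example 3, sketched for any R × C table of counts. `chi_square` is a hypothetical helper; like the hand calculation, it applies no continuity correction. Without rounding each cell's contribution it gives about 12.51 rather than 12.52; the conclusion is unchanged:

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # E = (R_T x C_T) / G_T
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1) x (C - 1)
    return chi2, df

# Rows: obese, non-obese; columns: hypertension, normal (observed values from Example 3)
chi2, df = chi_square([[85, 55], [42, 68]])
print(f"chi-square = {chi2:.2f}, d.f. = {df}, significant at 0.01: {chi2 > 6.63}")
```

If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic; its default applies Yates' continuity correction to 2 × 2 tables.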
You Need to Know
• How to turn a question into hypotheses
• Failing to reject the null hypothesis DOES NOT
mean that the null is true
• Every test has assumptions
– A statistician can check all the assumptions
– If the data do not meet the assumptions, there are nonparametric versions of the tests (see text)
Avoid Common Mistakes:
Hypothesis Testing
• If you have paired data, use a paired test
– If you don't, you can lose power
• If you do NOT have paired data, do NOT
use a paired test
– You can have the wrong inference
Common Mistakes:
Hypothesis Testing
• These tests assume independence
– Taking multiple samples per subject? Different statistical analyses MUST be used
• These tests make assumptions about the distribution of the observations
– Look at a histogram of the observations
– Highly skewed data → t test → incorrect results
• Assuming equal variances when the variances are not equal
– A variance test was not shown
– The variance test is not a very good test
– ALWAYS graph your data first to assess symmetry and variance