What is a Hypothesis?
If your sales volumes in bush meat drop, you put forward the hypothesis that "your price
is no longer competitive". You survey your competitors' prices to either reject or fail to
reject the hypothesis. If you find your price to be the lowest, you reject the hypothesis.
Next, you hypothesise that "Ebola has changed the preference of consumers toward bush
meat". If it is found to be true, you fail to reject (accept!) the hypothesis. Will you ever
hypothesise that "there is too much traffic in cities"? Or "a child in your neighbourhood cries
every midnight"? Why not? It would probably be absurd to think that traffic in cities or the midnight
cries of a neighbourhood child is responsible for the drop in bush meat sales volumes. Such a
guess would be neither "intelligent" nor "testable". We may say that a hypothesis is an
intelligent guess which is testable.
You can also state Research Questions in the form of a "hypothesis" (singular: hypothesis;
plural: hypotheses). A hypothesis is a tentative statement that explains a particular
phenomenon and which is testable. The key word is "testable". Refer to the following
statements:
(i) People from low socio-economic families tend to consume inferior goods.
(ii) Empowerment programmes are more likely to enhance performance.
(iii) A good internal control system may enhance corporate performance.
(iv) Listed firms tend to perform better than unlisted firms.
All these are examples of hypotheses. However, these statements are not particularly useful
because of tentative words such as "may", "tend to" and "more likely". Using such words
does not suggest how you would go about testing the statement. To solve this problem, formal
hypotheses are used. A hypothesis should have the following attributes:
states two or more variables that are measurable
states an independent and a dependent variable
states a relationship between the two or more variables
states a possible prediction
NULL HYPOTHESIS
The null hypothesis is a hypothesis (or hunch) about the population. It represents a theory that
has been put forward because it is believed to be true. The word "null" means nothing or
zero. So, a null hypothesis states that 'nothing happened'. For example, there is no difference
between males and females in critical thinking skills or there is no relationship between
socio-economic status and academic performance. Such a hypothesis is denoted with the
symbol "Ho:". In other words, you are saying:
You do not expect the groups to be different
You do not expect the variables to be related
Consider a study to test the difference between senior staff teaching and senior staff
non-teaching in their perception of empowerment programmes in UCC. The null hypothesis
of the study could be: "There is no difference in senior staff teaching and non-teaching
perception of empowerment programmes in UCC."
This null hypothesis can be written in either of two equivalent notations:
Ho: µ1 = µ2 OR Ho: µ1 - µ2 = 0
In other words, you are saying that:
The senior staff teaching perception mean score (µ1) is EQUAL to the senior staff
non-teaching perception mean score (µ2).
The senior staff teaching perception mean score (µ1) MINUS the senior staff non-teaching
perception mean score (µ2) is equal to ZERO.
The null hypothesis is often the reverse of what the researcher actually believes, and it is
put forward to allow the data to contradict it.
[You may find it strange but that's the way it is!]
CONCLUSION:
Based on the data, you find that there is a significant difference between the senior
staff non-teaching perception mean score and the senior staff teaching perception mean
score. In fact, the senior staff teaching perception mean score is higher than the
senior staff non-teaching perception mean score. What do you do?
You REJECT the null hypothesis because earlier you had said they would be equal.
You reject the null hypothesis in favour of the ALTERNATIVE HYPOTHESIS
(i.e. µ1 ≠ µ2).
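This reject/fail-to-reject logic can be sketched in Python. The perception scores below are invented purely for illustration; they are not data from the study described above, and the group sizes are arbitrary:

```python
# Hypothetical illustration of testing Ho: mu1 = mu2 against Ha: mu1 != mu2.
# The scores below are invented for this sketch; they are not real study data.
from scipy import stats

teaching = [24, 22, 25, 23, 21, 26, 24, 23]       # hypothetical perception scores
non_teaching = [27, 25, 28, 26, 27, 24, 26, 28]   # hypothetical perception scores

# Independent-samples t-test comparing the two group means.
t_stat, p_value = stats.ttest_ind(teaching, non_teaching)

alpha = 0.05
if p_value < alpha:
    decision = "REJECT Ho in favour of Ha (mu1 != mu2)"
else:
    decision = "DO NOT REJECT Ho"
print(f"t = {t_stat:.3f}, p = {p_value:.4f}: {decision}")
```

With these made-up scores the teaching mean is clearly lower than the non-teaching mean, so the test rejects Ho; with scores that overlapped more, the same code would fail to reject it.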
Alternative Hypothesis
The Alternative Hypothesis (Ha) is the opposite of the Null Hypothesis. For example,
the alternative hypothesis for the study discussed earlier is: "There is a difference in
senior staff teaching and non-teaching perception of empowerment programmes in
UCC", represented by the following notation:
Ha: µ1 ≠ µ2
The Alternative Hypothesis might simply be that the senior staff teaching perception
mean score and the senior staff non-teaching perception mean score are DIFFERENT.
Ha: µ1 > µ2
The Alternative Hypothesis might be that the senior staff teaching perception
mean score is HIGHER than the senior staff non-teaching perception mean score.
Ha: µ1 < µ2
The Alternative Hypothesis might be that the senior staff teaching perception
mean score is LOWER than the senior staff non-teaching perception mean score.
Type 1 Error: Claiming that two means are not equal when in fact they are equal. In
other words, you reject a null hypothesis when it is TRUE.
Type 2 Error: Not finding a difference between two means when in fact there is a
difference. In other words, you fail to reject a null hypothesis when it is FALSE.
Type 1 Error is the error you are likely to make when you examine your data and say
that "Something is happening here!" For example, you conclude that "There is a
difference between senior staff teaching and senior staff non-teaching ". In actual
fact, there is NO difference between senior staff teaching and senior staff non-
teaching in the population.
Type 2 Error is the error you are likely to make when you examine your data and say
"Nothing is happening here!" . For example, you conclude that "There is no difference
between senior staff teaching and senior staff non-teaching ". In actual fact, there is
a difference between senior staff teaching and senior staff non-teaching in the
population.
Ho: µ1 = µ2 OR Ho: µ1 - µ2 = 0
There are four possible outcomes when you test this null hypothesis:
You decide to Reject the Null Hypothesis (Ho:). You have made a correct decision if in the
real world the null hypothesis is FALSE.
You decide to Reject the Null Hypothesis (Ho:). You risk committing a Type 1 Error if
in the real world the null hypothesis is TRUE.
You decide NOT to Reject the Null Hypothesis (Ho:). You risk committing a Type 2
Error if in the real world the null hypothesis is FALSE.
You decide NOT to Reject the Null Hypothesis (Ho:). You have made a correct
decision if in the real world the null hypothesis is TRUE.
In other words, when you detect a difference in your sample and a difference also exists
in the population, you are OK. When there is no difference in your sample and there is no
difference in the population, you are also OK. It is only when the sample and the population
disagree that you risk committing a Type 1 or Type 2 error.
In your study, you want to determine if the perception of empowerment of senior staff
teaching is lower compared to senior staff non-teaching; i.e. null hypothesis Ho: µ1 = µ2.
The alternative hypothesis is Ha: µ1 < µ2.
A hypothesis test whose alternative hypothesis has this form (i.e. µ1 < µ2) is called a
LEFT-TAILED TEST.
In your study, you want to determine if the perception of empowerment of senior staff
teaching is higher compared to senior staff non-teaching; i.e. null hypothesis Ho: µ1 = µ2.
The alternative hypothesis is Ha: µ1 > µ2.
A hypothesis test whose alternative hypothesis has this form (i.e. µ1 > µ2) is called a
RIGHT-TAILED TEST.
Note:
A hypothesis test is called a ONE-TAILED TEST if it is either left-tailed or right-tailed;
i.e. if it is not TWO-TAILED.
TWO-TAILED TESTS
EXAMPLE 1:
You conduct a study to determine whether there is a difference in perception of empowerment
between senior staff teaching and senior staff non-teaching. Your sample consists of 40
teaching and 42 non-teaching staff. You administer a 30-item empowerment scale to the
sample, and the results of the study show that the teaching mean perception score is 23.90
while the non-teaching mean score is 24.50.
Step 1:
The null and alternative hypotheses are:
Ho: µ1 = µ2 (Mean scores of teaching and non-teaching are the same)
Ha: µ1 ≠ µ2 (Mean scores of teaching and non-teaching are different)
Step 2:
Decide on the significance level (alpha). Here you have set it at the 5% significance level,
or alpha (α) = 0.05.
Step 3:
Computation of the test statistic. Using the independent t-test formula (to be discussed later),
you obtain a t-value of -1.554.
Step 4:
Since n1 = 40 and n2 = 42, the degrees of freedom (df) are 40 + 42 - 2 = 80. Using an alpha (α) of
0.05, check the "Table of Critical Values for the t-Test". Refer to the two-tailed row, which uses
0.025 in each tail (0.050 divided by 2). You find that the critical value is ±1.990, as
shown on the graph.
Step 5:
You find that the t-value obtained, -1.554, does not fall in the rejection region. What is
your conclusion? You do not reject Ho. In other words, you conclude that there is NO
SIGNIFICANT DIFFERENCE in perception of empowerment between senior staff teaching
and senior staff non-teaching. You could also say that the test results are not statistically
significant at the 5% level.
At α = 0.05, the data do not provide sufficient evidence to conclude that the mean scores of
teaching and non-teaching staff differ, even though the non-teaching mean (24.50) is higher
than the teaching mean (23.90).
ONE-TAILED TEST
EXAMPLE:
You conduct a study to determine if students taught to use mind maps are better in recalling
concepts and principles in economics. A sample of 10 students was administered a 20-item
economics test before the treatment (i.e. pretest). The same test was administered after the
treatment (i.e. posttest), which lasted for six weeks.
Step 1:
The null and alternative hypotheses are:
Ho: µ1 = µ2 (Mean scores on the posttest and the pretest are the same)
Ha: µ1 > µ2 (Mean scores on the posttest are greater than on the pretest)
The above is a "right-tailed" test.
Step 2:
Decide on the significance level (alpha). Here you have set it at the 5% significance level,
or alpha (α) = 0.05.
Step 3:
Computation of the test statistic. Using the dependent t-test formula, you obtain a t-value
of 4.711.
Step 4:
The critical value for the right-tailed test is tα with df = n - 1. The number of subjects is n = 10
and α = 0.05. Checking the "Table of Critical Values for the t-Test" for df = 10 - 1 = 9
reveals that the critical value is 1.833, as shown in the graph below.
Step 5:
You find that the t-value obtained, 4.711, exceeds the critical value of 1.833 and falls in the
rejection region. What is your conclusion? You reject Ho. In other words, you conclude that
there is a SIGNIFICANT DIFFERENCE in the performance in economics before and after
the treatment. You could also say that the test results are statistically significant at the 5%
level. Put another way, the p-value is less than the specified significance level of 0.05. [The
p-value is provided in the output of most statistical packages such as SPSS.]
At alpha = 0.05, the data provide sufficient evidence to conclude that the mean scores on the
posttest are superior to the mean scores obtained on the pretest. Evidently, teaching students
mind mapping enhances their recall of concepts and principles in economics.
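Step 4's one-tailed critical value can likewise be checked against the t distribution. A sketch using the numbers from this example:

```python
# Reproducing Step 4's right-tailed critical value for the dependent t-test.
from scipy import stats

alpha = 0.05
n = 10
df = n - 1                                 # 9 degrees of freedom
t_critical = stats.t.ppf(1 - alpha, df)    # all of alpha in one (right) tail
print(f"Critical value: {t_critical:.3f}") # approximately 1.833

t_obtained = 4.711
print("Reject Ho" if t_obtained > t_critical else "Do not reject Ho")
```

Because the whole 5% sits in one tail, the one-tailed critical value (1.833) is smaller than the two-tailed one for the same alpha.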
Topic 4: NORMAL DISTRIBUTION
A normal distribution (or normal curve) is completely determined by its mean and standard
deviation; i.e. two normally distributed variables having the same mean and standard
deviation must have the same distribution. We often identify a normal curve by stating the
corresponding mean and standard deviation and calling these the parameters of the normal
curve.
A normal distribution is symmetric about, and centred at, the mean of the variable, and its
spread depends on the standard deviation of the variable. The larger the standard deviation,
the flatter and more spread out the distribution.
FORTUNATELY, these statistical tests work very well even if the distribution is only
approximately normally distributed. Some tests work well even with very wide deviations
from normality. They are described as 'robust' tests that are able to tolerate the lack of a
normal distribution.
<===== 1 SD ======>
<============= 2 SD ===============>
<====================== 3 SD ======================>
The graph above is a picture of a normal distribution of IQ scores among a sample of
adolescents.
Mean is 100
Standard Deviation is 15.
As you can see, the distribution is symmetric. If you folded the graph in the centre, the two
sides would match, i.e. they are identical.
The centre of the distribution is the mean. The mean of a normal distribution is also the
most frequently occurring value (i.e. the mode), and it is also the value that divides the
distribution of scores into two equal parts (i.e. the median). In any normal distribution, the
mean, median and mode all have the same value (i.e. 100 in the example above).
The normal distribution shows the area under the curve. The three-standard-deviations
rule states that almost all the possible observations or scores of a variable lie within three
standard deviations to either side of the mean. The normal curve is close to (but does not
touch) the horizontal axis outside the range of three standard deviations to either side of the
mean. Based on the graph above, you will notice that with a mean of 100 and a standard
deviation of 15:
About 68% of all IQ scores fall between 85 (one standard deviation less than the mean,
i.e. 100 - 15 = 85) and 115 (one standard deviation more than the mean,
i.e. 100 + 15 = 115).
About 95% of all IQ scores fall between 70 (two standard deviations less than the mean,
i.e. 100 - 30 = 70) and 130 (two standard deviations more than the mean,
i.e. 100 + 30 = 130).
About 99.7% of all IQ scores fall between 55 (three standard deviations less than the mean,
i.e. 100 - 45 = 55) and 145 (three standard deviations more than the mean,
i.e. 100 + 45 = 145).
A normal distribution can have any mean and standard deviation. But the percentage of cases
or individuals falling within one, two or three standard deviations from the mean is always
the same. The shape of a normal distribution does not change. Means and standard deviations
will differ from variable to variable. But the percentage of cases or individuals falling within
specific intervals is always the same in a true normal distribution.
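These percentages can be verified by integrating the normal curve rather than reading them off a graph. A sketch using the IQ example's mean of 100 and SD of 15:

```python
# Checking the 68-95-99.7 rule for IQ ~ Normal(100, 15).
from scipy import stats

mean, sd = 100, 15
iq = stats.norm(mean, sd)   # frozen normal distribution

for k in (1, 2, 3):
    # Area under the curve between (mean - k*SD) and (mean + k*SD).
    prob = iq.cdf(mean + k * sd) - iq.cdf(mean - k * sd)
    print(f"Within {k} SD ({mean - k*sd} to {mean + k*sd}): {prob:.1%}")
```

The same three percentages come out for any choice of mean and SD, which is exactly the point made in the paragraph above.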
ASSESSING NORMALITY USING GRAPHICAL METHODS
Assessing normality means determining whether the variables you are studying are normally
distributed. When you draw a sample from a population that is normally distributed, it does
not mean that your sample will necessarily have a distribution that is exactly normal. Samples
vary, so the distribution of each sample may also vary. However, if a sample is reasonably
large and comes from a normal population, its distribution should look more or less normal.
For example, when you administer a questionnaire to a group of customers, you want to be
sure that your sample of 250 customers is normally distributed. WHY? The assumption of
normality is a prerequisite for many inferential statistical techniques and there are two main
ways of determining the normality of distribution.
Using graphical methods (such as histograms, stem-and-leaf plots and boxplots)
Using statistical procedures (such as the Kolmogorov-Smirnov statistic and the
Shapiro-Wilk statistic)
See the graph, which is a histogram showing the distribution of performance index scores of
firms.
The values on the vertical axis indicate the frequency or number of cases.
The values on the horizontal axis are midpoints of value ranges. For example, the first
bar is 20 and the second bar is 30, indicating that each bar covers a range of 10.
Superimposed on the histogram is the normal curve. Simply looking at the bars indicates that
the distribution has the rough shape of a normal distribution. The superimposed curve,
however, shows that there are some deviations. The question is whether these deviations are
small enough to say that the distribution is approximately normal.
SKEWNESS:
Skewness is the degree of departure from symmetry of a distribution. A normal distribution is
symmetrical. A non-symmetrical distribution is described as being either negatively or
positively skewed. A distribution is skewed if one of its tails is longer than the other, i.e. the
tail is pulled to either the left or the right.
Positive skew
Skewness = 1.5
Refer to the figure above which shows the distribution of the scores obtained by students on a
test. There is a positive skew because it has a longer tail in the positive direction or the long
tail is on the right side (towards the high values on the horizontal axis).
What does it mean? It means that more students were getting low scores on the test, which
may indicate that the test was too difficult. Alternatively, it could mean that the questions
were not clear or that the teaching methods and materials did not bring about the desired
learning outcomes.
Negative skew
Skewness = -1.5
Refer to the graph above which shows the distribution of the scores obtained by students on a
test. There is a negative skew because it has a longer tail in the negative direction or to the
left (towards the lower values on the horizontal axis).
What does it mean? It means that more students were getting high scores on the test, which
may indicate that either the test was too easy or the teaching methods and materials were
successful in bringing about the desired learning outcomes.
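Skewness coefficients like the ±1.5 values quoted above can be computed with `scipy.stats.skew`. The samples in this sketch are simulated, not real test scores:

```python
# Computing skewness coefficients on simulated data with scipy.stats.skew.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Right-skewed (positive skew) sample: a long tail toward high values.
right_skewed = rng.exponential(scale=10, size=1000)
# Roughly symmetric sample, so skewness should be near zero.
symmetric = rng.normal(loc=50, scale=10, size=1000)

print("exponential skewness:", round(stats.skew(right_skewed), 2))  # clearly positive
print("normal skewness:     ", round(stats.skew(symmetric), 2))     # near 0
```

A positive coefficient corresponds to the long right tail in the first graph, a negative one to the long left tail in the second.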
SPSS Output:
KURTOSIS:
Kurtosis indicates the degree of "flatness" or "peakedness" of a distribution relative to the
shape of a normal distribution. With reference to the graphs below:
Low Kurtosis: Data with low kurtosis tend to have a flat top near the mean rather than a
sharp peak.
High Kurtosis: Data with high kurtosis tend to have a distinct peak near the mean, decline
rather rapidly, and have heavy tails.
Group 2, with a kurtosis value of -1.58, has a distribution that is more flattened and not
as normally distributed compared to Group 1.
Group 3, with a kurtosis value of +1.65, has a distribution that is more peaked and not as
normally distributed compared to Group 1.
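A sketch of the same idea with simulated data. Note that `scipy.stats.kurtosis` reports excess kurtosis (as SPSS does): a normal distribution scores about 0, flat distributions score negative, and peaked, heavy-tailed ones score positive:

```python
# Excess kurtosis of three simulated samples with scipy.stats.kurtosis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
flat = rng.uniform(0, 100, size=2000)      # low kurtosis: flat-topped
peaked = rng.laplace(50, 10, size=2000)    # high kurtosis: sharp peak, heavy tails
normal = rng.normal(50, 10, size=2000)     # reference: approximately 0

print("uniform :", round(stats.kurtosis(flat), 2))    # about -1.2
print("laplace :", round(stats.kurtosis(peaked), 2))  # clearly positive
print("normal  :", round(stats.kurtosis(normal), 2))  # near 0
```

The negative value for the uniform sample mirrors Group 2 above; the positive value for the Laplace sample mirrors Group 3.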
SPSS Output:
Box plot
The BOX
The box has hinges that form its outer boundaries. The hinges are the
scores that cut off the top and bottom 25% of the data. Thus, 50% of the scores fall
within the hinges.
The thick horizontal line through the box represents the median. In the case of a
normal distribution, the line runs through the centre of the box.
If the median is closer to the top of the box, then the distribution is negatively skewed.
If it is closer to the bottom of the box, then it is positively skewed.
The WHISKER
The smallest and largest observed values within the distribution are represented by the
horizontal lines at either end of the box, commonly referred to as whiskers.
The two whiskers indicate the spread of the scores.
Any scores that fall outside the upper and lower whiskers are classed as extreme
scores or outliers. If the distribution has any extreme scores, i.e. 3 or more box lengths
from the upper or lower hinge, these will be represented by a circle (o).
Outliers tell us that we should investigate why a score is so extreme. Could it be that
you made an error in data entry?
Why is it important to identify outliers? Many of the statistical
techniques used involve the calculation of means. The mean is sensitive to extreme scores,
and it is important to be aware of whether your data contain such extreme scores if you
are to draw conclusions from the statistical analysis conducted.
ASSESSING NORMALITY USING THE NORMALITY PROBABILITY PLOT
Besides the histogram and the box plot, another frequently used graphical technique for
determining normality is the "Normal Probability Plot" or "Normal Q-Q Plot". The idea
behind a normal probability plot is simple. It compares the observed values of the variable to
the observations expected for a normally distributed variable. More precisely, a normal
probability plot is a plot of the observed values of the variable versus the normal scores (the
observations expected for a variable having the standard normal distribution).
In a normal probability plot, each observed value (score) is paired with its expected
normal score; if the sample is from a normal distribution, these pairs form a roughly linear
pattern, i.e. the points fall more or less in a straight line. The normal probability plot is
formed by:
Vertical axis: Expected normal values
Horizontal axis: Observed values
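The pairing of observed values with expected normal scores can be sketched with `scipy.stats.probplot`, which returns exactly these two axes along with a straight-line fit (the sample here is simulated):

```python
# Generating the points of a normal Q-Q plot with scipy.stats.probplot.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.normal(loc=100, scale=15, size=200)   # simulated, roughly normal data

# probplot pairs each ordered observation with its expected normal score and
# also fits a straight line through the points.
(normal_scores, ordered_values), (slope, intercept, r) = stats.probplot(sample)

# For a normal sample the points hug the line, so the correlation r is near 1.
print("correlation with the fitted line:", round(r, 3))
```

A markedly non-normal sample would produce points that bend away from the line and a visibly lower r.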
SPSS Procedures
9. In the Missing Values box, click on the Exclude cases pairwise radio button. If this
option is not selected then, by default, any variable with missing data will be excluded
from the analysis. That is, plots and statistics will be generated only for cases with
complete data.
10. Click on Continue and then OK.
Note that these commands will give you the 'Histogram', 'Stem-and-leaf plots', 'Boxplots' and
Normality Plots.
When you use a normal probability plot to assess the normality of a variable, you must
remember that the decision as to whether the distribution is roughly linear, and hence
normal, is a subjective one. The graph above is an example of a normal probability plot.
Though none of the values fall exactly on the line, most of the points are very close to it.
Values that are above the line represent units for which the observation is larger than
its normal score.
Values that are below the line represent units for which the observation is smaller than
its normal score.
Note that there is one value that falls well outside the overall pattern of the plot. It is called an
outlier, and you will have to remove it from the sample data and redraw the normal
probability plot. Even with the outlier, the values are close to the line and you can conclude
that the distribution will look like a bell-shaped curve. If the normal scores plot departs only
slightly from having all of its dots on the line, then the distribution of the data departs only
slightly from a bell-shaped curve. If one or more of the dots departs substantially from the
line, then the distribution of the data is substantially different from a bell shape.
Outliers:
Refer to the normal probability plot on the right. Note that there are possible outliers which
are values lying off the hypothetical straight line.
Outliers are anomalous values in the data which may be due to recording errors, which may
be correctable, or they may be due to the sample not being entirely from the same population.
Skewness to the right:
If both ends of the normality plot bend above the straight line passing through the values of
the probability plot, then the population distribution from which the data were sampled may
be skewed to the right.
ASSESSING NORMALITY USING STATISTICAL TECHNIQUES
The graphical methods discussed present qualitative information about the distribution of
data that may not be apparent from statistical tests. Histograms, box plots and normal
probability plots are graphical methods useful for determining whether data follow a
normal curve. Extreme deviations from normality are often readily identified from graphical
methods. However, in many instances the decision is not straightforward. Using graphical
methods to decide whether a data set is normally distributed involves making a subjective
decision; formal test procedures are usually necessary to test the assumption of normality.
In general, both statistical tests and graphical plots should be used to determine normality.
However, the assumption of normality should not be rejected on the basis of a statistical test
alone. In particular, when the sample is large, the available statistical tests for normality can
be sensitive to very small (i.e., negligible) deviations from normality. Therefore, if the sample
is very large, a statistical test may reject the assumption of normality when the data set, as
shown using graphical methods, is essentially normal and the deviation from normality is too
small to be of practical significance.
KOLMOGOROV-SMIRNOV TEST
The Kolmogorov-Smirnov Z test evaluates statistically whether the
difference between the observed distribution and a theoretical normal distribution is small
enough to be due just to chance. If it could be due to chance, you would treat the distribution
as being normal. If the difference between the actual distribution and the theoretical normal
distribution is larger than is likely to be due to chance (sampling error), then you would treat
the actual distribution as not being normal.
In terms of hypothesis testing, the Kolmogorov-Smirnov test is based on Ho: the data are
normally distributed, against Ha: the data are not normally distributed. The test is typically
used for samples of more than 50 subjects.
If the Kolmogorov-Smirnov Z test yields a significance level of less than (<) 0.05, it
means that the distribution is NOT normal.
If the Kolmogorov-Smirnov Z test yields a significance level of more than (>) 0.05, it
means that the distribution can be treated as normal.
Kolmogorov-Smirnov (a)
          Statistic      df        Sig.
score     .21            1598      .000*
* This is a lower bound of the true significance.
(a) Lilliefors Significance Correction
The Kolmogorov-Smirnov Z test indicates that the p-value is less than 0.05 and hence you
REJECT the null hypothesis. You conclude that this particular distribution is NOT
NORMAL.
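A sketch of a one-sample Kolmogorov-Smirnov test in Python, on simulated data. One caveat: plain `scipy.stats.kstest` assumes the normal distribution's mean and SD are specified in advance, whereas SPSS applies the Lilliefors correction when they are estimated from the data, so p-values will not match SPSS exactly:

```python
# One-sample Kolmogorov-Smirnov test against a fully specified normal
# distribution. The data are simulated; with real data whose mean/SD are
# estimated from the sample, a Lilliefors-corrected test is more appropriate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
scores = rng.normal(loc=50, scale=10, size=200)

stat, p_value = stats.kstest(scores, "norm", args=(50, 10))
print(f"KS statistic = {stat:.3f}, p = {p_value:.3f}")
print("Treat as NOT normal" if p_value < 0.05 else "Treat as normal")
```

The decision rule mirrors the text: a p-value below 0.05 leads you to reject Ho and treat the distribution as non-normal.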
In terms of hypothesis testing, the Shapiro-Wilk test is also based on Ho: the data are
normally distributed, against Ha: the data are not normally distributed. The test is typically
used for samples of fewer than 50 subjects.
SPSS output: Tests of Normality table showing the Shapiro-Wilk statistics for the three
groups.
The Shapiro-Wilk normality tests indicate that the scores are normally distributed in each of
the three groups. All the p-values reported are more than 0.05 and hence you DO NOT
REJECT the null hypothesis.
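A corresponding sketch of the Shapiro-Wilk test with `scipy.stats.shapiro`, on a simulated small sample (n below 50, as the text advises):

```python
# Shapiro-Wilk test: Ho is that the data are normally distributed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
small_sample = rng.normal(loc=50, scale=10, size=30)   # simulated, n < 50

stat, p_value = stats.shapiro(small_sample)
print(f"W = {stat:.3f}, p = {p_value:.3f}")
print("DO NOT REJECT Ho: treat as normal" if p_value > 0.05 else "REJECT Ho")
```

As in the SPSS output discussed above, a p-value above 0.05 means you do not reject the null hypothesis of normality.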
NOTE:
It should be noted that with large samples even a very small deviation from normality
can yield low significance levels, so a judgement still has to be made as to whether
the departure from normality is large enough to matter.
When a distribution departs markedly from normality, a mathematical transformation can be
applied to "normalize" the distribution. The type of transformation selected depends on the
manner in which the distribution departs from normality.
Positive skew
The more commonly used transformations appropriate for data which are skewed
to the right (positive skew) are, in order of increasing strength, sqrt(x), log(x) and 1/x,
where the x's are the data values.
Negative skew
The more commonly used transformations appropriate for data which are skewed
to the left (negative skew) are, in order of increasing strength, squaring, cubing and exp(x).
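A sketch showing the effect of one of these transformations, log(x), on simulated right-skewed data:

```python
# Effect of a log transformation on simulated positively skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
raw = rng.lognormal(mean=3, sigma=0.8, size=1000)   # strongly right-skewed

print("skewness before:       ", round(stats.skew(raw), 2))          # well above 0
print("skewness after log(x): ", round(stats.skew(np.log(raw)), 2))  # near 0
```

The skewness coefficient drops from a large positive value to near zero, because the long right tail is pulled in by the logarithm.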