STATISTICS FOR EDUCATIONAL RESEARCH
Prof Dr John Arul Philips
Summary
Key Terms
Topic 5: t-test
5.1 What is the t-test?
5.2 Hypothesis Testing Using the t-test
5.3 t-test for Independent Means
5.4 t-test for Independent Means Using SPSS
5.5 t-test for Dependent Means
5.6 t-test for Dependent Means Using SPSS
Summary
Key Terms
Summary
Key Terms
Appendix
Learning Package
In this Learning Module you are provided with TWO kinds of course materials:
1. The Course Guide you are currently reading
2. The Course Content (consisting of 10 topics)
Course Synopsis
To enable you to achieve the FOUR objectives of the course, HMEF5113 is
divided into 10 topics. Specific objectives are stated at the start of each topic,
indicating what you should be able to do after completing the topic.
Topic 1: Introduction
The topic introduces the meaning of Statistics and explains the
difference between descriptive and inferential statistics. As
inferential statistics is used to make inferences about the
population on specific variables based on a sample, this topic also
explains the meanings of different types of variables and
highlights the different sampling techniques in educational
research.
Topic 5: t-test
This topic explains what the t-test is and its use in hypothesis
testing. It also highlights the assumptions for using the t-test. Two
types of t-test are elaborated in the topic. The first one is the t-test
for independent means, while the second one is the t-test for
dependent means. Computation of the t-statistic using formulae,
as well as the SPSS procedures, is explained.
Topic 8: Correlation
This topic explains the concept of linear relationship between
variables. It discusses the use of statistical tests to determine
correlation and demonstrates how to compute correlation between
variables using SPSS and interpret correlation results.
To help you read and understand the individual topics, numerous realistic
examples support all definitions, concepts and theories. Diagrams and text are
combined into a visually appealing, easy-to-read module. Throughout the course
content, diagrams, illustrations, tables and charts are used to reinforce important
points and simplify the more complex concepts. The module has adopted the
following features in each topic:
INTRODUCTION
Lists the headings and subheadings of each topic to provide an overview of the
contents of the topic and prepare you for the major concepts to be studied and
learned.
LEARNING OUTCOMES
This is a listing of what you should be able to do after successful
completion of a topic. In other words, whether you are able to explain,
compare, evaluate, distinguish, list, describe, relate and so forth. You
should use these indicators to guide your study. When you have finished a
topic, you must go back and check whether you have achieved the learning
outcomes or are able to do what is required of you. If you make a habit of
doing this, you will improve your chances of understanding the contents of
the course.
Copyright © Open University Malaysia (OUM)
SELF-CHECK
ACTIVITY
The main ideas of each topic are listed in brief sentences to provide a review of
the content. You should ensure that you understand every statement listed. If
you do not, go back to the topic and find out what you do not know.
Key Terms discussed in the topic are placed at the end of each topic to make you
aware of the main ideas. If you are unable to explain these terms, you should go
back to the topic to clarify.
DISCUSSION QUESTIONS:
At the end of each topic, a list of questions is presented that are best solved
through group interaction and discussion. You can answer the questions
individually. But, you are encouraged to work with your coursemates and
discuss online and during the seminar sessions.
At the end of each topic a list of articles and titles of books is provided that is
directly related to the contents of the topic. As far as possible, the articles and
books suggested for further reading will be available in OUM's Digital Library
(which you can access) and OUM's Library. Also, relevant Internet resources are
made available to enhance your understanding of selected curriculum concepts
and principles as applied in real-world situations.
Facilitator
Your facilitator will mark your assignment. Do not hesitate to discuss during the
seminar session or online if:
You do not understand any part of the course content or the assigned
readings
You have difficulty with the self-tests and activities
You have a question or problem with the assignment.
(e) When you have completed the topic, review the learning outcomes to
confirm that you have achieved them and are able to do what is
required.
(f) If you are confident, you can proceed to the next topic. Proceed topic
by topic through the course and try to pace your study so that you
keep yourself on schedule.
(g) After completing all topics, review the course and prepare yourself for
the final examination. Check that you have achieved all topic learning
outcomes and the course objectives (listed in this Course Guide).
FINAL REMARKS
Once again, welcome to the course. To maximise your gain from this course
you should try at all times to relate what you are studying to the real world.
Look at the environment in your institution and ask yourself whether the ideas
discussed apply. Most of the ideas, concepts and principles you learn in this
course have practical applications. It is important to realise that much of what
you learn here can be applied in your own professional practice.
We wish you success with the course and hope that you will find it interesting,
useful and relevant in your development as a professional. We hope you will
enjoy your experience with OUM and we would like to end with a saying by
Confucius: "Education without thinking is labour lost."
INTRODUCTION
This guide explains the basis on which you will be assessed in this course during
the semester. It contains details of the facilitator-marked assignments, final
examination and participation required for the course.
One element in the assessment strategy of the course is that all students should
have the same information as facilitators about the answers to be assessed.
Therefore, this guide also contains the marking criteria that facilitators will use in
assessing your work.
Please read through the whole guide at the beginning of the course.
ACADEMIC WRITING
(a) Plagiarism
(i) What is Plagiarism?
Any written assignment (essays, projects, take-home exams, etc.)
submitted by a student must not be deceptive regarding the abilities,
knowledge or amount of work contributed by the student. There are
many ways that this rule can be violated. Among them are:
(c) Referencing
All sources that you cite in your paper should be listed in the Reference
section at the end of your paper. Here is how you should do your
Reference.
ASSESSMENT
Please refer to myVLE.
INTRODUCTION
This topic introduces the meaning of statistics and explains the difference between
descriptive and inferential statistics. As inferential statistics is used to make
inference about the population on specific variables based on a sample, this topic
also explains the meanings of different types of variables and highlights the
different sampling techniques in educational research.
"The science of learning from data. Statistics is essential for the proper
running of government, central to decision making in industry, and a core
component of modern educational curricula at all levels."
Note that the word "mathematics" is mentioned in two of the definitions above,
while "science" is stated in the other definition. Some students are afraid of
mathematics and science. These students feel that since they are from the fields of
humanities and social sciences, they are weak in mathematics. Being terrified of
mathematics does not just happen overnight. Chances are that you may have had
bad experiences with mathematics in earlier years (Kranzler, 2007).
Fear of mathematics can lead to a defeatist attitude which may affect the way you
approach statistics. In most cases, the fear of statistics is due to irrational beliefs.
Just because you had difficulty in the past, does not mean that you will always
have difficulty with quantitative subjects. You have come this far in your
education and by doing this course in statistics, it is not likely that you are an
incapable person.
You have to convince yourself that statistics is not a difficult subject and you need
not worry about the mathematics involved. Identify your irrational beliefs and
thoughts about statistics. Are you telling yourself: "I'll never be any good in
statistics," "I'm a loser when it comes to anything dealing with numbers," or
"What will other students think of me if I do badly?"
For each of these irrational beliefs about your abilities, ask yourself what evidence
is there to suggest that "you will never be good in statistics" or that "you are weak
at mathematics." When you do that, you will begin to replace your irrational
beliefs with positive thoughts and you will feel better. You will realise that your
earlier beliefs about statistics are the cause of your unpleasant emotions. Each
time you feel anxious or emotionally upset, question your irrational beliefs. This
may help you to overcome your initial fears.
Keeping this in mind, this course has been written by presenting statistics in a
form that appeals to those who fear mathematics. Emphasis is on the applied
aspects of statistics and with the aid of a statistical software called Statistical
Package for the Social Sciences (or better known as SPSS), you need not worry
too much about the intricacies of mathematical formulas. Computations of
mathematical formulas have been kept to a minimum. Nevertheless, you still need
to know about the different formulas used, what they mean and when they are
used.
Descriptive statistics includes the construction of graphs, charts and tables and the
calculation of various descriptive measures such as averages (e.g. mean) and
measures of variation (e.g. standard deviation). The purpose of descriptive
statistics is to summarise, arrange and present a set of data in such a way that
facilitates interpretation. Most of the statistical presentations appearing in
newspapers and magazines are descriptive in nature.
Inference is the act or process of deriving a conclusion based solely on what one
already knows. In other words, you are trying to reach conclusions that extend
beyond data obtained from your sample towards what the population might think.
You are using methods for drawing and measuring the reliability of conclusions
about a population based on information obtained from a sample of the
population. Among the widely used inferential statistical tools are t-test, analysis
of variance, Pearson’s correlation, linear regression and multiple regression.
As you proceed through this course, you will obtain a more thorough
understanding of the principles of descriptive and inferential statistics. You should
establish the intent of your study. If the intent of your study is to examine and
explore the data obtained for its own intrinsic interest only, the study is
descriptive. However, if the information is obtained from a sample of a population
and the intent of the study is to use that information to draw conclusions about the
population, the study is inferential. Thus, a descriptive study may be performed on
a sample as well as on a population. Only when an inference is made about the
population, based on data obtained from the sample, does the study become
inferential.
SELF-CHECK 1.1
1. Define statistics.
2. Explain the differences between descriptive and inferential statistics.
3. When would you use the two types of statistics?
4. Explain two ways in which descriptive statistics and inferential
statistics are interrelated.
1.3 VARIABLES
Before you can use a statistical tool to analyse data, you need to have data which
have been collected. What is data? Data is defined as pieces of information which
are processed or analysed to enable interpretation. Quantitative data consist of
numbers, while qualitative data consist of words and phrases. For example, the
scores obtained from 30 students in a mathematics test are referred to as data. To
explain the performance of these students you need to process or analyse the
scores (or data) using a calculator or computer or manually. We collect and
analyse data to explain a phenomenon. A phenomenon is explained based on the
interaction between two or more variables. The following is an example of a
phenomenon:
What is a Variable?
A variable is a construct that is deliberately and consciously invented or adopted
for a special scientific purpose. For example, the variable “Intelligence” is a
construct based on observation of presumably intelligent and less intelligent
behaviours. Intelligence can be specified by observing and measuring using
intelligence tests, as well as interviewing teachers about intelligent and less
intelligent students. Basically, a variable is something that “varies” and has a
value. A variable is a symbol to which are assigned numerals or values. For
example, the variable “mathematics performance” is assigned scores obtained
from performance on a mathematics test and may vary or range from 0 to 100.
When you use any statistical tool, you should be very clear on which variables
have been identified as independent and which are dependent variables.
Put another way, the dependent variable (DV) is the variable that is predicted,
whereas the independent variable is the variable from which the prediction is
made. The DV is the presumed effect, which varies with changes or variation in
the independent variable.
Thus, it is essential that you stipulate clearly how you have defined variables
specific to your study. For example, in an experiment to determine the
effectiveness of the discovery method in teaching science, the researcher will have
to explain in great detail the variable “discovery method” used in the experiment.
Even though there are general principles of the discovery method, its application
in the classroom may vary. In other words, you have to define the variable
operationally or how it is used in the experiment.
SELF-CHECK 1.2
1. What is a variable?
2. Explain the differences between a continuous variable and
nominal variable.
3. Why should variables be operationally defined?
1.5 SAMPLING
Every day, we make judgments and decisions based on samples. For example,
when you pick a grape and taste it before buying the whole bunch of grapes, you
are doing a sampling. Based on the one grape you have tasted, you will make the
decision whether to buy the grapes or not. Similarly, when a teacher asks a student
two or three questions, he is trying to determine the student’s grasp of an entire
subject. People are not usually aware that such a pattern of thinking is called
sampling.
• Population (Universe) is defined as an aggregate of people, objects, items,
etc. possessing common characteristics. It is a complete group of people,
objects, items, etc. about which we want to study. Every person, object, item,
etc. has certain specified attributes. In Figure 1.2, the population consists of #,
$, @, & and %.
• Sample is that part of the population or universe which we select for the
purpose of investigation. The sample is used as an "example" and in fact the
word sample is derived from the Latin exemplum, which means example. A
sample should exhibit the characteristics of the population or universe; it
should be a "microcosm," a word which literally means "small universe." In
Figure 1.2, the sample also consists of one #, $, @, & and %.
The study of a sample offers several advantages over a complete study of the
population. Why and when is it desirable to study a sample rather than the
population or universe?
• In most studies, investigation of the sample is the only way of finding out
about a particular phenomenon. In some cases, due to financial, time and
physical constraints, it is practically impossible to study the whole population.
Hence, an investigation of the sample is the only way of making a study.
• If one were to study the population, then every item in the population is
studied. Imagine having to study 500,000 Form 5 students in Malaysia!
Imagine what the costs would be! Even if you have the money and time to
study the entire population of Form 5 students in the country, it may take so
much time that the findings will be of no use by the time they become available.
• Studying the population may not be necessary, since we have sound sampling
techniques that will yield satisfactory results. Of course, we cannot expect
from a sample exactly the same answer that might be obtained from studying
the whole population.
• However, by using statistics, we can establish, based on the results obtained
from a sample, the limits within which the true answer lies, with a known
probability.
• We are able to generalise logically and precisely about different kinds of
phenomena which we have never seen simply based upon a sample of, say,
200 students.
ACTIVITY 1.1
1. What is the difference between a population and a sample?
2. Why is a study of the population practically impossible?
3. “The sample should be representative of the population.” Explain.
4. Provide a scenario of your own, in which a sample is not
representative.
5. Explain why a sample of 30 doctors from Kuala Lumpur taken to
estimate the average income of all Kuala Lumpur residents is not
representative.
Suppose, for example, there are 10,000 Form 1 students in a particular district and
you want to select a simple random sample of 500 students. When you select the
first case, each student has one chance in 10,000 of being selected. Once that
student is selected, the next student to be selected has a 1 in 9,999 chance of
being selected. Thus, as each case is selected, the probability of being selected
next changes slightly, because the population from which we are selecting has
become one case smaller.
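The module's SPSS-free arithmetic above can be cross-checked in a few lines of Python (a stand-in for SPSS, used here only for illustration); the population and sample sizes are the ones from the example.

```python
from fractions import Fraction

# The document's example: a simple random sample of 500 drawn without
# replacement from a population of 10,000 Form 1 students.
population_size = 10_000

# Probability that any given student is picked on the first draw.
p_first = Fraction(1, population_size)          # 1/10000

# After one student is removed from the pool, the probability for the
# next draw changes slightly, as described above.
p_second = Fraction(1, population_size - 1)     # 1/9999

print(p_first, p_second)
```

Using `Fraction` keeps the probabilities exact instead of rounding them to decimals.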
Say, for example, you choose line 3 and begin your selection. You will select
student #265, followed by student #313 and student #492. When you come to
‘805’ you skip the number because you only need numbers between 1 and 500.
You proceed to the next number, i.e. student #404. Again you skip ‘550’ and
proceed to select student #426. You continue until you have selected all 500
students to form your sample. To avoid repetition, you also eliminate numbers
that have occurred previously. If you have not found enough numbers by the time
you reach the bottom of the table, you move over to the next line or column.
SELF-CHECK 1.3
ACTIVITY 1.2
1. Briefly discuss how you would select a sample of 300 teachers from a
population of 5,000 teachers in a district using systematic sampling.
2. What are some advantages of using systematic sampling?
ACTIVITY 1.3
For example, in a particular district there are 10,000 households clustered into 25
sections. In cluster sampling, you draw a random sample of five sections or
clusters from the list of 25 sections or clusters. Then, you study every household
in each of the five sections or clusters. The main advantage of cluster sampling is
that it saves time and money. However, it may be less precise than simple random
sampling.
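The two-stage procedure above (randomly pick whole clusters, then study every unit inside them) can be sketched as follows; the section and household names are hypothetical, chosen to match the 10,000-household, 25-section example.

```python
import random

# Minimal sketch of cluster sampling for the example above.
def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters, then take EVERY unit inside them."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [household for c in picked for household in clusters[c]]

# 25 sections of 400 households each (10,000 households in total).
sections = {f"section_{i}": [f"household_{i}_{j}" for j in range(400)]
            for i in range(25)}

sample = cluster_sample(sections, n_clusters=5, seed=42)
print(len(sample))   # 5 clusters x 400 households = 2000
```

Note that only the clusters are sampled randomly; within a chosen cluster, every household is included, which is why cluster sampling can be less precise than simple random sampling of the same size.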
To use SPSS, you have to create the SPSS data file. Once this data file is created
and data entered, you can run statistical procedures to generate your statistical
output. Refer to Appendix A at the end of this module on how to go about
creating this SPSS data file.
• Descriptive statistics include the construction of graphs, charts and tables and
the calculation of various descriptive measures such as averages (means) and
measures of variation (standard deviations).
• Operational definition means that variables used in the study must be defined
as it is used in the context of the study.
• Population (universe) is defined as an aggregate of people, objects, items, etc.
possessing common characteristics, while sample is that part of the population
or universe we select for the purpose of investigation.
• In cluster sampling, the unit of sampling is not the individual but rather a
natural group of individuals.
INTRODUCTION
This topic introduces the different descriptive statistics, namely the mean, the
median, the mode and the standard deviation, and how they are computed. SPSS
procedures on how to obtain these descriptive statistics are also provided.
Graphical methods are better suited than numerical methods for
identifying patterns in the data, while numerical approaches are more precise
and objective.
2.2.1 Mean
Mean and the standard deviation are the most widely used statistical tools in
educational and psychological research. Mean is the most frequently used
measure of central tendency, while standard deviation is the most frequently used
measure of variability or dispersion.
The mean or X̄ (pronounced "X bar") is the figure obtained when the sum of all
the items in the group is divided by the number of items (N). Say for example you
have the score of 10 students on a science test.
In the computation of the mean, every item counts. As a result, extreme values at
either end of the group or series of scores severely affect the value of the mean.
The mean can be "pulled towards" these extreme scores, which may give a
distorted picture of the group or series of scores or data.
However, in general, the mean is a good measure of central tendency for roughly
symmetric distributions but can be misleading in skewed distributions (see the
example on page 20) since it can be greatly influenced by extreme scores.
2.2.2 Median
Median is the score found at the exact middle of the set of values. One way to
compute the median is to list all scores in ascending order and then locate the
score in the centre of the sample. For example, if we order the following seven
scores as shown below, we would get:
Score 25 is the median because it represents the halfway point for the distribution
of scores.
There are eight scores. The fourth score (20) and the fifth score (20) represent the
halfway point. Since both of these scores are 20, the median is 20.
If the two middle scores had different values, you would have to interpolate to
determine the median by adding the two values and dividing the sum by 2.
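Python's `statistics.median` applies exactly this rule, so it can be used to check hand computations; the score lists below are hypothetical, chosen to match the descriptions above (seven scores with a middle value of 25, and eight scores whose two middle values are both 20).

```python
import statistics

# Odd-sized set: the median is the single middle value after sorting.
odd_scores = [10, 15, 20, 25, 30, 35, 40]       # hypothetical seven scores

# Even-sized set: the two middle values are averaged (interpolated).
even_scores = [10, 15, 18, 20, 20, 25, 30, 35]  # both middle scores are 20

print(statistics.median(odd_scores))    # 25
print(statistics.median(even_scores))   # 20.0
```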
2.2.3 Mode
Mode is the most frequently occurring value in the set of scores. To determine the
mode, you might again order the scores as shown below and then count each one.
The most frequently occurring value is the mode. In our example, the value 15
occurs three times and is the mode. In some distributions, there is more than one
modal value. For instance, in a bimodal distribution there are two values that
occur most frequently.
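Both the single mode described above and the bimodal case can be checked with the standard library; the data below are hypothetical, built so that 15 occurs three times as in the example.

```python
import statistics

# 15 occurs three times, so it is the mode.
scores = [12, 15, 15, 15, 18, 20, 20, 25]
print(statistics.mode(scores))        # 15

# A bimodal distribution has two most-frequent values;
# multimode returns all of them.
bimodal = [1, 1, 2, 3, 3]
print(statistics.multimode(bimodal))  # [1, 3]
```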
If the distribution is truly normal (i.e. bell-shaped), the mean, median and mode
are all equal to each other.
The mean and median are two common measures of central tendencies of a
typical score in a sample. Which of these two should you use when describing
your data? It depends on your data. In other words, you should ask yourself
whether the measure of central tendency you have selected gives a good
indication of the typical score in your sample. If you suspect that the measure of
central tendency selected does not give a good indication of the typical score, then
you most probably have chosen the wrong one.
The mean is the most frequently used measure of central tendency and it should
be used if you are satisfied that it gives a good indication of the typical score in
your sample. However, there is a problem with the mean. Since it uses all the
scores in a distribution, it is sensitive to extreme scores.
Example: the mean of the scores 20, 22, 25, 26, 30, 31, 33, 40 and 42 is
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 42) ÷ 9 = 29.89
If we were to change the last score from 42 to 70, see what happens to the mean:
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 70) ÷ 9 = 33.00
Obviously, this mean is not a good indication of the typical score in this set of
data. The extreme score has changed the mean from 29.89 to 33.00. If these were
test scores, it may give the impression that students performed better in the later
test when in fact only one student scored highly.
If you find that you have an extreme score and you are unable to use the mean,
then you should use the median. The median is not sensitive to extreme scores. If
you examine the above example, the median is 30 in both distributions. The
reason is simply that the median score does not depend on the actual scores
themselves beyond putting them in ascending order. So the last score in a
distribution could be 80, 150 or 5,000 and the median still would not change. It is
this insensitivity to extreme scores that makes the median useful when you cannot
use the mean.
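The comparison above uses the module's own numbers and can be reproduced directly; only the use of Python (rather than SPSS or hand calculation) is an addition here.

```python
import statistics

# The document's data: changing one extreme score moves the mean
# noticeably but leaves the median untouched.
scores = [20, 22, 25, 26, 30, 31, 33, 40, 42]
with_outlier = [20, 22, 25, 26, 30, 31, 33, 40, 70]

print(round(statistics.mean(scores), 2))        # 29.89
print(round(statistics.mean(with_outlier), 2))  # 33.0
print(statistics.median(scores))                # 30
print(statistics.median(with_outlier))          # 30
```

This is the insensitivity to extreme scores that makes the median the safer choice for skewed data.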
2.3.1 Range
Range is simply the highest value minus the lowest value. For example, in a
distribution, if the highest value is 36 and the lowest is 15, the range is 36 – 15 = 21.
Standard deviation makes use of the deviations of the individual scores from the
mean. Then, each individual deviation is squared to avoid the problem of plus
and minus. Standard deviation is the most often used measure of variability or
variation in educational and psychological research.
S = √[ Σ (Xᵢ − X̄)² / (n − 1) ]   or equivalently   S = √[ Σ (X − X̄)² / (N − 1) ]
X      X − X̄          (X − X̄)²
23     23 − 25 = −2     4
22     22 − 25 = −3     9
26     26 − 25 = +1     1
21     21 − 25 = −4     16
30     30 − 25 = +5     25
24     24 − 25 = −1     1
20     20 − 25 = −5     25
27     27 − 25 = +2     4
25     25 − 25 =  0     0
32     32 − 25 = +7     49

X̄ = 25                 Σ (X − X̄)² = 134

Std. Deviation = √[ 134 / (N − 1) ] = √(134 / 9) = 3.8586
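The worked example can be verified step by step in Python (an illustrative cross-check of the hand computation, not part of the module's SPSS procedures):

```python
import math

# The worked example's ten scores: mean = 25 and the sum of squared
# deviations = 134, giving S = sqrt(134/9) ≈ 3.8586.
scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]
n = len(scores)

mean = sum(scores) / n                        # 25.0
ss = sum((x - mean) ** 2 for x in scores)     # sum of squared deviations, 134.0
s = math.sqrt(ss / (n - 1))                   # sample standard deviation

print(mean, ss, round(s, 4))   # 25.0 134.0 3.8586
```

The same value is returned by `statistics.stdev(scores)`, which uses the identical n − 1 formula.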
In Class B (Figure 2.2), there is low variance or a small standard deviation, which
explains why most of the scores are clustered around the mean. Most of the scores
are "bunched" within ±3 of the mean; if the mean is 50, approximately 95% of the
students scored between 47 and 53.
ACTIVITY 2.1
Below are the scores obtained by students in two classes on a history test:
Class A marks: 15, 25, 20, 20, 18, 22, 16, 24, 28, 12
Class B marks: 10, 30, 13, 27, 16, 24, 5, 35, 28, 12
(a) Compute the mean of the two classes.
(b) Compute the standard deviation of the two classes.
(c) Explain the implication of differences in standard deviations.
2.4.1 Tables
Tables can contain a great deal of information but they also take up a lot of space
and may overwhelm readers with details. How should tables be presented in a
manner that can be easily understood? In general, frequency tables are best for
variables with different numbers of categories (see Table 2.2).
Table 2.2 summarises the responses of 13 teachers with regard to the teaching of
sex education in secondary school.
• The first column contains the values or categories of the variables (opinion
on teaching sex education in schools – extent of agreement).
• The frequency column indicates the number of respondents in each category.
• The percent column lists the percentage of the whole sample in each
category. These percentages are based on the total sample size, including
those who did not answer the question. Those who did not answer will be
shown as missing cases in this column.
• The valid percent column contains the percentage of those who gave a valid
response to the question that belongs to each category. When there are no
missing cases, the valid percent column is similar to the percent column.
2.5 GRAPHS
Graphs are widely used in describing data. However, they should be used
appropriately, as graphs can easily become cluttered, confusing and downright
misleading.
values (line graphs). Which units are used depends on the level of
measurement of the variable being graphed.
• In the example in Figure 2.3, the X-axis represents the students’ gain scores
after undergoing an innovative instructional programme.
• The Y-axis, which appears either in percentages or frequencies, as in Figure
2.3, shows the frequency of students who obtained the various scores
indicated in the X-axis.
• Interpretation of the graph on “Students’ Gain Scores”:
– A total of 275 students obtained between 1 and 5 marks as a result of
the innovative instructional programme; 199 obtained between 6 and
10 marks; 77 between 11 and 15 marks; and 28 between 16 and 20
marks.
– The number of students who obtained high gain scores decreases
gradually.
2.5.2 Histogram
Histograms are different from bar charts because they are used to display
continuous variables (see the histogram in Figure 2.4).
Figure 2.4: Percentage who agreed that sex education should be taught
in secondary schools
• The X-axis represents the different age groups, while the Y-axis represents the
percentages of respondents.
• Each bar in the X-axis represents one age group in ascending order.
• The Y-axis in this case represents the percentages of respondents in the Sex
Education survey.
• Interpretation of the graph “Sex Education Should be Taught in Secondary
School”:
– Among the 18 to 28 age group, only 20% agreed that sex education should
be taught in schools compared to 60% in the 51 to 61 age group.
– About 40% in the 40 to 50 age group and 50% among the 29 to 39 age
group agreed that sex education should be taught in secondary schools.
– Only 10% of those aged 73 years and older agreed that secondary school
students should be taught sex education.
The line graph in Figure 2.5 shows the frequency of using the library among a
group of male and female respondents. The level of measurement of the Y-axis
variable is ordinal or interval. Line graphs are more suitable for variables that
have more than five or six categories. They are less suited for variables with a
very large number of values as this can produce a very jagged and confusing
graph.
Since a separate line is produced for each category of the x variable, only x
variables with a small number of categories should be used. This will normally
mean that the x variable is a nominal or ordinal variable.
ACTIVITY 2.2
Interpret the line graph (Figure 2.5) showing the frequency of a group of
respondents visiting the library. A separate line is used for male and
female respondents.
• Mean, median and mode are common descriptive statistics used to measure
central tendency, while standard deviation is the commonly used statistic to
measure variability or dispersion of data.
• Graphs are also used to condense large sets of data and these include the use
of bar charts, histograms and line graphs.
INTRODUCTION
This topic explains what normal distribution is and introduces the graphical as
well as the statistical techniques used in assessing normality. It also presents SPSS
procedures for assessing normality.
While some argue that in the real world, scores or observations are seldom
normally distributed, others argue that in the general population, many variables
such as height, weight, IQ scores, reading ability, job satisfaction and blood
pressure turn out to have distributions that are bell-shaped or normal.
Fortunately, these statistical tests work very well even if the distribution is only
approximately normally distributed. Some tests work well even with very wide
deviations from normality. They are described as “robust” tests that are able to
tolerate the lack of a normal distribution.
A normal distribution is symmetric and centred at the mean of the variable, and its
spread depends on the standard deviation of the variable. The larger the standard
deviation, the flatter and more spread out is the distribution.
As you can see, the distribution is symmetric. If you folded the graph in the
centre, the two sides would match, i.e. they are identical.
A normal distribution can have any mean and standard deviation. However, the
percentage of cases or individuals falling within one, two or three standard
deviations from the mean is always the same. The shape of a normal distribution
does not change. Means and standard deviations will differ from variable to
variable but the percentage of cases or individuals falling within specific intervals
is always the same in a true normal distribution.
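The constant percentages within one, two and three standard deviations can be checked numerically. A minimal sketch in Python using `scipy.stats.norm` (the module itself uses SPSS; this is an illustrative aside):

```python
from scipy.stats import norm

# Proportion of any normal distribution lying within k standard deviations
# of its mean; the result does not depend on the particular mean or SD.
within = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%, the familiar percentages for a true normal distribution.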
ACTIVITY 3.1
1. What is meant by the statement that a population is normally
distributed?
2. Two normally distributed variables have the same means and the
same standard deviations. What can you say about their distributions?
Explain your answer.
3. Which normal distribution has a wider spread: the one with mean 1
and standard deviation 2 or the one with mean 2 and standard
deviation 1? Explain your answer.
4. The mean of a normal distribution has no effect on its shape. Explain.
5. What are the parameters for a normal curve?
sample is reasonably large and it comes from a normal population, its distribution
should look more or less normal.
What does it mean? It means that more students were getting low scores in
the test and this indicates that the test was too difficult. Alternatively, it
could mean that the questions were not clear or the teaching methods and
materials did not bring about the desired learning outcomes.
Refer to Figure 3.4 which shows the distribution of the scores obtained by
students on a test. There is a negative skew because it has a longer tail in the
negative direction or to the left (towards the lower values on the horizontal
axis).
What does it mean? It means that more students were getting high scores on
the test. This may indicate that either the test was too easy or the teaching
methods and materials were successful in bringing about the desired
learning outcomes.
(i) Low Kurtosis: Data with low kurtosis tend to have a flat top near the
mean rather than a sharp peak.
(ii) High Kurtosis: Data with high kurtosis tend to have a distinct peak
near the mean, decline rather rapidly and have a heavy tail.
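Skewness and kurtosis can be quantified rather than only judged by eye. A hedged illustration in Python (the sample sizes, seed and simulated data here are arbitrary, not from the module):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=100_000)       # symmetric, bell-shaped
skewed_sample = rng.exponential(size=100_000)  # long tail to the right

normal_skew = skew(normal_sample)      # near 0 for a symmetric distribution
right_skew = skew(skewed_sample)       # clearly positive (positive skew)
normal_kurt = kurtosis(normal_sample)  # "excess" kurtosis: near 0 for normal
heavy_kurt = kurtosis(skewed_sample)   # positive: distinct peak, heavy tail
```

Note that scipy reports excess kurtosis, so a normal distribution scores about 0 rather than 3.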
Copyright © Open University Malaysia (OUM)
40 TOPIC 3 NORMAL DISTRIBUTION
(ii) WHISKERS
The smallest and largest observed values within the distribution are
represented by the horizontal lines at either end of the box, commonly
referred to as whiskers.
The two whiskers indicate the spread of the scores.
Scores that fall outside the upper and lower whiskers are classified as
extreme scores or outliers. If the distribution has any extreme scores,
i.e. 3 or more box lengths from the upper or lower hinge, these will be
represented by a circle (o).
Outliers should prompt us to ask why a score is so extreme. Could it be that
you made an error in data entry?
Why is it important to identify outliers? This is because many of the
statistical techniques used involve calculation of means. The mean is
sensitive to extreme scores and it is important to be aware whether
your data contain such extreme scores if you are to draw conclusions
from the statistical analysis conducted.
sample is from a normal distribution, then the observed values or scores fall
more or less in a straight line. The normal probability plot is formed by:
• Vertical axis: Expected normal values
• Horizontal axis: Observed values
SPSS Procedures
1. Select Analyze from the main menu.
2. Click Descriptive Statistics and then Explore.....to open the Explore
dialogue box.
3. Select the variable you require (i.e. mathematics score) and click on
the arrow button to move this variable to the Dependent List: box.
4. Click the Plots....command push button to obtain the Explore: Plots
sub dialogue box.
5. Click the Histogram check box and the Normality plots with tests
check box and ensure that the Factor levels together radio button is
selected in the Boxplots display.
6. Click Continue.
7. In the Display box, ensure that the Both option is selected so that both
statistics and plots are produced.
8. Click the Options....command push button to open the Explore:
Options sub-dialogue box.
9. In the Missing Values box, click on the Exclude cases pairwise radio
button. If this option is not selected then, by default, cases with
missing data on any variable will be excluded from the analysis. That is,
plots and statistics will be generated only for cases with complete data.
10. Click on Continue and then OK.
Note that these commands will give you the 'Histogram', 'Stem-and-leaf
plots', 'Boxplots' and 'Normality Plots'.
When you use a normal probability plot to assess the normality of a variable,
you must remember that judging whether the plot is roughly linear, and hence
whether the distribution is normal, is subjective. The graph in Figure 3.10 is
an example of a normal probability plot. Though none of the values falls
exactly on the line, most of the points are very close to it.
• Values that are above the line represent units for which the observation
is larger than its normal score
• Values that are below the line represent units for which the observation
is smaller than its normal score
Note that there is one value that falls well outside the overall pattern of the
plot. It is called an outlier and you will have to remove the outlier from the
sample data and redraw the normal probability plot.
Even with the outlier, the values are close to the line and you can conclude
that the distribution will look like a bell-shaped curve. If the normal scores
plot departs only slightly from having all of its dots on the line, then the
distribution of the data departs only slightly from a bell-shaped curve. If one
or more of the dots departs substantially from the line, then the distribution
of the data is substantially different from a bell-shaped curve.
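The same plot can be produced outside SPSS. A minimal sketch in Python using `scipy.stats.probplot`; the scores below are simulated purely for illustration:

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=200)  # simulated test scores

# probplot pairs each ordered observation with its expected normal quantile
# and fits a straight line; r close to 1 means the dots hug the line.
(expected_q, ordered_obs), (slope, intercept, r) = probplot(scores)

print(f"correlation of plot with straight line: r = {r:.3f}")
```

Passing `plot=plt` (a matplotlib axes) to `probplot` draws the actual graph; an r substantially below 1 signals a departure from the straight-line pattern.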
Outliers:
Refer to the normal probability plot in Figure 3.11. Note that there are
possible outliers which are values lying off the hypothetical straight line.
Outliers are anomalous values in the data which may be due to recording
errors, which may be correctable, or they may be due to the sample not
being entirely from the same population.
ACTIVITY 3.2
In general, both statistical tests and graphical plots should be used to determine
normality. However, the assumption of normality should not be rejected on the
basis of a statistical test alone. In particular, when the sample is large, statistical
tests for normality can be sensitive to very small (i.e. negligible) deviations in
normality. Therefore, if the sample is very large, a statistical test may reject the
assumption of normality when the data set, as shown using graphical methods, is
essentially normal and the deviation from normality is too small to be of practical
significance.
DISTRIBUTION: NORMAL
• If the Kolmogorov-Smirnov test yields a significance level of less than (<)
0.05, it means that the distribution is NOT normal.
• However, if the Kolmogorov-Smirnov test yields a significance level
of more than (>) 0.05, it means that the distribution is normal.
Kolmogorov-Smirnov(a)
          Statistic   df     Sig.
SCORE     .21         1598   .000*
* This is a lower bound of the true significance
a. Lilliefors Significance Correction
DISTRIBUTION: NORMAL
• Reject the assumption of normality if the test of significance reports a
p-value of less than (<) 0.05.
• DO NOT REJECT the assumption of normality if the test of significance
reports a p-value of more than (>) 0.05.
Table 3.1 shows the Kolmogorov-Smirnov statistic for assessing normality.
NOTE:
It should be noted that with large samples, even a very small deviation from
normality can yield low significance levels. So a judgment still has to be made as
to whether the departure from normality is large enough to matter.
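The decision rule above is mechanical and can be written down directly. A small sketch (a hypothetical helper function, not SPSS output):

```python
def ks_normality_decision(sig, alpha=0.05):
    """Apply the rule from the text to a Kolmogorov-Smirnov Sig. value."""
    return "reject normality" if sig < alpha else "do not reject normality"

# Table 3.1 case: SCORE, Sig. = .000  ->  distribution is NOT normal
print(ks_normality_decision(0.000))
# Activity 3.3 table: Sig. = .200  ->  normality is not rejected
print(ks_normality_decision(0.200))
```

As the NOTE warns, with very large samples a rejection here should still be weighed against the graphical evidence before deciding the departure matters.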
ACTIVITY 3.3
Kolmogorov-Smirnov(a)
          Statistic   df    Sig.
SCORE     0.57        999   .200*
INTRODUCTION
The topic explains the difference between the null and alternative hypotheses and
their use in research. It also introduces the concepts of Type I error and Type II
error. It illustrates the difference between the two-tailed and one-tailed tests and
explains when they are used in hypothesis testing.
Next, you hypothesise that "the car did not start because the spark plugs are dirty."
You check the spark plugs to determine if they are dirty. You find that the spark
plugs are indeed dirty. You do not reject the hypothesis.
All these are examples of hypotheses. However, these statements are not
particularly useful because of words such as "may," "tend to" and "more likely."
Using these tentative words does not suggest how you would go about proving it.
To solve this problem, a hypothesis should state:
• Two or more variables that are measurable
• An independent and dependent variable
• A relationship between two or more variables
• A possible prediction
Examine the hypothesis in Figure 4.1. It has all the attributes mentioned:
• The variables are "critical thinking" and "gender," which are both measurable.
• The independent variable is "gender" which can be manipulated as “male”
and “female”; and the dependent variable is "critical thinking."
• There is a possible relationship between the gender of undergraduates and
their critical thinking skills.
• It is possible to predict that males may be better in critical thinking compared
to females or vice-versa.
ACTIVITY 4.1
1. Rewrite the four hypotheses using the formalised style shown. Ensure
that each hypothesis has all the attributes stated.
2. Write two more original hypotheses of your own using this form.
Say, for example, you conduct an experiment to test the effectiveness of the
discovery method in learning science compared to the lecture method. You select
a random sample of 30 students for the discovery method group and 30 students
for the lecture method group (see Topic 1 on Random Sampling).
Based on your sample, you hypothesise that there are no differences in science
achievement between students in the discovery method group and students in the
lecture method group. In other words, you make the claim that there are no
differences in science scores between the two groups in the population. This is
represented by the following two types of null hypotheses with the following
notation or Ho:
Ho: μ1 = μ2 OR Ho: μ1 – μ2 = 0
Based on the findings of the experiment, you found that there was a significant
difference in science scores between the discovery method group and the lecture
method group.
In fact, the mean score of subjects in the discovery method group was HIGHER
than the mean of subjects in the lecture method group. What do you do?
• You REJECT the null hypothesis because earlier you had said they would be
equal.
• You reject the null hypothesis in favour of the ALTERNATIVE
HYPOTHESIS (i.e. μ1 ≠ μ2).
SELF-CHECK 4.1
1. What is the meaning of a null hypothesis?
2. What do you mean when you "reject" the null hypothesis?
3. What is the alternative hypothesis?
4. What do you mean when you "accept" the alternative hypothesis?
Type 1 Error is the error you are likely to make when you examine your data and
say that "Something is happening here!" For example, you conclude that "There is
a difference between males and females." In fact, there is no difference between
males and females in the population.
Type 2 Error is the error you are likely to make when you examine your data and
say "Nothing is happening here!” For example, you conclude that "There is no
difference between males and females." In fact, there is a difference between
males and females in the population.
Ho: μ1 = μ2 OR Ho: μ1 – μ2 = 0
The null hypothesis can be true or false and you can reject or not reject the null
hypothesis. There are four possible situations which arise in testing a hypothesis
and they are summarised in Figure 4.2.
                      Ho is TRUE          Ho is FALSE
Do Not Reject Ho:     Correct Decision    Risk committing
[Say it is TRUE]      [no problem]        Type 2 Error
Reject Ho:            Risk committing     Correct Decision
[Say it is FALSE]     Type 1 Error        [no problem]
In other words, when you detect a difference in the sample you are studying and a
difference is also detected in the population, you are OK. When there is no
difference in the sample you are studying and there is no difference in the
population you are OK.
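The meaning of Type 1 error can be made concrete with a small simulation: draw two samples from the same population (so Ho is true) many times and count how often a t-test wrongly declares a difference. A sketch in Python with an arbitrary seed and sample sizes:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
alpha, trials = 0.05, 2000
false_rejections = 0

for _ in range(trials):
    a = rng.normal(size=30)  # both groups come from the SAME population,
    b = rng.normal(size=30)  # so any "significant" result is a Type 1 error
    if ttest_ind(a, b).pvalue < alpha:
        false_rejections += 1

type1_rate = false_rejections / trials
print(f"Type 1 error rate: {type1_rate:.3f}")  # hovers around alpha
```

The rate lands near 0.05 because alpha is, by construction, the probability of committing a Type 1 error.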
ACTIVITY 4.3
You can use the logic of hypothesis testing in the courtroom. A student
is being tried for stealing a motorcycle. The judicial system is based on
the premise that a person is "innocent until proven guilty." It is the court
that must prove based on sufficient evidence that the student is guilty.
Thus, the null and alternative hypotheses would be:
Ho: The student is innocent
Ha: The student is guilty
1. Using the table in Figure 4.2, state the four possible outcomes of the
court's decision.
2. Interpret the Type I and Type II errors in this context.
In your study, you want to determine if females are inferior in spatial thinking
compared to males; i.e. null hypothesis is still Ho: μ1 = μ 2 . But, the alternative
hypothesis is Ha: μ1 < μ 2 . A hypothesis test whose alternative hypothesis has this
form is called a LEFT-TAILED TEST.
In your study, you want to determine if females are better in spatial thinking
compared to males; i.e. null hypothesis is still Ho: μ1 = μ 2 . The alternative
hypothesis is Ha: μ1 > μ 2 . A hypothesis test whose alternative hypothesis has this
form is called a RIGHT-TAILED TEST.
Note:
A hypothesis test is called a ONE-TAILED TEST if it is either left-tailed or right-
tailed; i.e. if it is not TWO-TAILED.
Step 1:
You want to test the following null and alternative hypotheses:
Ho : μ 1 = μ 2
Ha : μ 1 ≠ μ 2
Step 2:
Using the t-test for independent means (which we will discuss in detail in
Topic 5), you obtained a t-value of –1.554. Based on the alternative
hypothesis, you decide that you are going to use a two-tailed test.
Step 3:
If you are using an alpha (α) of .05 for a two-tailed test, you have to divide .05 by
2 and you get 0.025 for each side of the rejection area.
Step 4:
The df = n1 + n2 – 2 = (40 + 42) – 2 = 80. Look up the t table in Table 4.1 and
find that the critical value is 1.990; the graph in Figure 4.3 shows that the
Do Not Reject area ranges from –1.990 to +1.990.
Step 5:
The t-value you have obtained is –1.554 (We will discuss the formula for
computing the t-value in Topic 5). This value does not fall in the Rejection
Region. What is your conclusion? You do not reject Ho. In other words, you
conclude that there is NO SIGNIFICANT DIFFERENCE in spatial thinking
between male and female adolescents. You could also say that the test results are
not statistically significant at the 5% level and provide at most weak evidence
against the null hypothesis.
At α = 0.05, the data do not provide sufficient evidence to conclude that the
mean score on spatial thinking of females is superior to that of males, even
though the mean score obtained by females was higher than that of males.
ACTIVITY 4.4
Step 1:
The null and alternative hypotheses are:
• Ho: μ1 = μ 2 (Mean scores on the economics tests are the same)
• Ha: μ1 > μ 2 (Mean score of the posttest is greater than the mean score of
the pretest)
Step 2:
Decide on the significance level (alpha). Here, you have set it at the 5%
significance level, or alpha (α) = 0.05.
Step 3:
Computation of the test statistic. Using the dependent t-test formula, you obtained
a t-value of 4.711.
Step 4:
The critical value for the right-tailed test is t with df = n – 1. The number of
subjects is n = 10 and α = 0.05. Checking the "Table of Critical Values for the
t-Test" reveals that, for df = 10 – 1 = 9, the critical value is 1.833 (Figure
4.4).
Step 5:
You find that the t-value obtained is 4.711. It falls in the Rejection Region. What is
your conclusion? You reject Ho. In other words, you conclude that there is a
SIGNIFICANT DIFFERENCE in the performance in economics before and after the
treatment. You could also say that the test results are statistically significant at the 5%
level. Put it another way, the p-value is less than the specified significance level of
0.05. (The p-value is provided in most outputs of statistical packages such as SPSS.)
At α = 0.05, the data provide sufficient evidence to conclude that the mean scores
on the posttest are superior to the mean scores obtained on the pretest. Evidently,
teaching students mind mapping enhances their recall of concepts and principles
in economics.
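The same check works for this right-tailed test, where the whole of alpha sits in the upper tail. A sketch with the values from this example:

```python
from scipy.stats import t

# Right-tailed test: all of alpha = 0.05 goes into the upper tail.
critical = t.ppf(1 - 0.05, df=9)  # about 1.833, as in Figure 4.4
t_obtained = 4.711
reject = t_obtained > critical    # True: t falls in the Rejection Region
print(f"critical = {critical:.3f}, reject Ho: {reject}")
```

Since 4.711 far exceeds 1.833, the decision agrees with the text: reject Ho.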
ACTIVITY 4.5
A researcher conducted a study to determine the effectiveness of
immediate feedback on the recall of information in biology. The
experimental group of 30 students was provided with immediate
feedback on the questions that were asked. The control group consisted
of 30 students who were given delayed feedback on the questions asked.
1. Determine the null hypothesis for the hypothesis test.
2. Determine the alternative hypothesis for the hypothesis test.
3. Classify the hypothesis test as two-tailed, left-tailed or right-tailed.
Explain your answer.
• There are two types of error: Type I and Type II errors. Both relate to the
rejection or acceptance of the null hypothesis.
• Type I error is committed when the researcher rejects the null when the null is
indeed true; in other words incorrectly rejecting the null.
• The probability level at which the null is incorrectly rejected is called the
significance level, denoted by the symbol α, a value set a priori (before even
conducting the research) by the researcher.
• Type II error is committed when the researcher fails to reject the null when the
null is indeed false, in other words wrongly accepting the null.
• In any research, the intention of the researcher is to correctly reject the null; if
the design is carefully selected and the samples represent the population, the
chances of achieving this objective are high. Thus, the power of the study is
defined as 1 – β.
INTRODUCTION
This topic explains what the t-test is and its use in hypothesis testing. It also
highlights the assumptions for using the t-test. Two types of t-test are elaborated
in the topic. The first is the t-test for independent means, while the second is the
t-test for dependent means. Computation of the t-statistic using formulae as well as
SPSS procedures is also explained.
For example, a teacher wants to find out whether the Discovery Method of
teaching science to primary schoolchildren is more effective than the Lecture
Method. She conducts an experiment involving 70 primary school children of
whom 35 are taught using the Discovery method and 35 are taught using the
Lecture method. Subjects in the Discovery group score 43.0 marks, while subjects
in the Lecture method group score 38.0 marks on the science test. The Discovery
group does better than the Lecture group. Does the difference between the two
groups represent a real difference or is it due to chance? To answer this question,
the t-test is often used by researchers.
Using the null hypothesis, you begin testing the significance by saying: "There is
no difference in the score obtained in science between subjects in the Discovery
group and the Lecture group."
(a) Ho: μ1 = μ2
OR
(b) Ho: μ1 – μ2 = 0
If you reject the null hypothesis, it means the difference between the two means
has statistical significance. On the other hand, if you do not reject the null
hypothesis, it means the difference between the two means is NOT statistically
significant and the difference is due to chance.
Note:
For a null hypothesis to be accepted, the difference between the two means need
not be equal to zero since sampling may account for the departure from zero.
Thus, you can accept the null hypothesis even if the difference between the two
means is not zero provided the difference is likely to be due to chance. However,
if the difference between the two means appears too large to have been brought
about by chance, you reject the null hypothesis and conclude that a real difference
exists.
ACTIVITY 5.1
1. State TWO null hypotheses in your area of interest that can be tested
using the t-test.
2. What do you mean when you reject or do not reject a null
hypothesis?
(a) Illustration
Say, for example, you conduct a study to determine the spatial reasoning
ability of 70 ten-year-old children in Malaysia. The sample consisted of 35
males and 35 females (see Figure 5.2). The sample of 35 males was drawn
from the population of ten-year-old males in Malaysia and the sample of 35
females was drawn from the population of ten-year-old females in Malaysia.
Note that they are independent samples because they come from two completely
different populations.
Research Question:
"Is there a significant difference in spatial reasoning between male and
female ten-year-old children?"
                   Mean   SD    N    Variance
Group 1: Males     12     2.0   35   4.0
Group 2: Females   10     2.0   35   4.0

t = (12 – 10) / √[ 4.0/(35 – 1) + 4.0/(35 – 1) ]
  = 2 / √(0.1177 + 0.1177)
  = 2 / 0.485
  = 4.124
Note: The t-value will be positive if the mean for Group 1 is larger or more than
(>) the mean of Group 2 and negative if it is smaller or less than (<).
(g) Look up the Table of Critical Values for Student's t-test shown in
Table 5.1 (Note: Only part of the table is given here)
The df is 70 minus 2 = 68. You take the nearest df which is 70 and read the
column for the two-tailed alpha of 0.050.
The t-value you obtained is 4.124. The critical value shown is 1.994. Since
the t-value is greater than the critical value of 1.994, you reject Ho and
conclude that the difference between the means of the two groups is
statistically significant. In other words, males scored significantly higher
than females on the spatial reasoning test.
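The hand computation can be checked from summary statistics alone. A sketch in Python; note that the Group 2 (females) figures used here, a mean of 10 with the same variance and group size as the males, are inferred from the worked numbers and are an assumption about the partially garbled table:

```python
import math

# Summary statistics (Group 2 row is an assumption: mean 10, variance 4.0, n 35)
m1, v1, n1 = 12.0, 4.0, 35  # males
m2, v2, n2 = 10.0, 4.0, 35  # females

# Formula used in the text: each variance is divided by (n - 1)
se = math.sqrt(v1 / (n1 - 1) + v2 / (n2 - 1))  # about 0.485
t_value = (m1 - m2) / se                        # about 4.124
print(f"t = {t_value:.3f}")
```

The result reproduces the t of roughly 4.12 obtained by hand.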
ACTIVITY 5.2
1. Would you reject Ho if you had set the alpha at 0.01 for a two-tailed
test?
2. When do you use the one-tailed test and two-tailed t-test?
(iii) Normality
The data come from a distribution that has one of those nice bell-
shaped curves known as a normal distribution. Refer to Topic 3: The
Normal Distribution, which provides both graphical and statistical
methods for assessing normality of a sample or samples.
"There are no significant differences between the variances of the two groups"
and you set the significance level at .05.
If the Levene statistic is significant, i.e. p < .05, then the null hypothesis is
REJECTED; you accept the alternative hypothesis and conclude that the
VARIANCES ARE UNEQUAL. (The "Equal variances not assumed" row of the
SPSS output is used.)
If the Levene statistic is not significant, i.e. p > .05, then you DO NOT
REJECT the null hypothesis and conclude that the VARIANCES ARE EQUAL.
(The "Equal variances assumed" row of the SPSS output is used.)
The Levene test is robust in the face of departures from normality. The Levene's
test is based on deviations from the group mean.
SPSS provides two options i.e. "homogeneity of variance assumed" and
"homogeneity of variance not assumed" (see Table below).
The Levene test is more robust in the face of non-normality than more
traditional tests like Bartlett's test.
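Outside SPSS, Levene's test is available as `scipy.stats.levene`; `center='mean'` matches the deviations-from-the-group-mean version described here. The data below are simulated, chosen so the variances genuinely differ:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(7)
group_a = rng.normal(0, 1.0, size=200)  # SD 1
group_b = rng.normal(0, 2.0, size=200)  # SD 2: variances clearly unequal

stat, p = levene(group_a, group_b, center='mean')
equal_variances_assumed = p > 0.05  # here False: use "not assumed" output
print(f"Levene p = {p:.4f}")
```

A significant result (p < .05) directs you to the "Equal variances not assumed" row, exactly as described above.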
ACTIVITY 5.3
Output #1:
The “Group Statistics” in Table 5.3 reports the mean values on the variable
(inductive reasoning) for the two different groups (males and females). Here, we
see that the 495 females in the sample had a mean score of 8.99 while the 451
males had a mean score of 7.95 on inductive reasoning. The standard deviation
for the males is 3.46 while that for the females is 3.14, so the scores for the
females are less dispersed than those for the males.
Table 5.3: Mean Values on the Variable (Inductive Reasoning) for the Two Different
Groups (Males and Females)
Group Statistics
Output #2:
Let’s examine this output in two parts:
Firstly, determine whether the data meet the "Homogeneity of Variance"
assumption. You can use Levene's Test and set the alpha at 0.05. The
significance value obtained is 0.030, which is less than (<) 0.05, so you reject
Ho and conclude that the variances are not equal. Hence, you have violated the
"Homogeneity of Variance" assumption and the "Equal Variances Not Assumed"
output should be used. Refer to Figure 5.3.
Interpretation:
t-value
This "t" value tells you how far away from 0, in terms of the number of standard
errors, the observed difference between the two sample means falls. The "t" value
is obtained by dividing the Mean Difference (–1.0468) by the Std. Error (.2146),
which is equal to –4.878.
p-value
If the p-value shown in the "Sig (2-tailed)" column is smaller than your chosen
alpha level, you reject the null hypothesis and argue that there is a real
difference between the populations. In other words, we can conclude that the
observed difference between the samples is statistically significant.
Mean Difference
This is the difference between the means (labelled "Mean Difference") i.e. 7.9512
– 8.9980 = – 1.0468.
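The whole Output #2 pipeline can be sketched in Python. The data below are simulated to echo the Table 5.3 means and SDs (they are not the study's raw data); because Levene indicated unequal variances, Welch's version of the test (`equal_var=False`, SPSS's "Equal variances not assumed") is used:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
females = rng.normal(8.99, 3.14, size=495)  # illustrative data only
males = rng.normal(7.95, 3.46, size=451)

# Welch's t-test: does not assume equal variances
result = ttest_ind(males, females, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

With group means this far apart relative to the standard error, the test reports a negative t (males below females) and a small p-value, mirroring the SPSS interpretation above.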
Example:
Research Questions:
Is there a significant difference in pretest and posttest scores in social studies
for subjects in the discovery method group?
Is there a significant difference in pretest and posttest scores in social studies
for subjects in the chalk and talk group?
Null Hypotheses:
There is no significant difference between the pretest and the posttest for the
discovery method group.
There is no significant difference between the pretest and the posttest for the
chalk and talk group.
Where,
t = t-ratio
D̄ = average difference
ΣD² = difference scores squared, then summed
(ΣD)² = difference scores summed, then squared
N = number of pairs
EXAMPLE:
A researcher conducted a study on personality changes in 15 college women from
Year 1 to Year 4. A 30-item personality test was administered in Year 1 and then
again in Year 4 to the same 15 women. The results of the study are shown in
Table 5.4.
Step 1:
Calculate the mean score for the Year 1 Test by adding up all the Year 1 Test
scores and dividing by the number of subjects. This will give you a mean score
of 18.5. Similarly, calculate the mean score of the Year 4 Test, which gives a
mean score of 20.8.
Step 2:
Next, calculate the standard deviation of the difference scores using the
following formula:

SD = √[ (ΣD² – (ΣD)²/N) / (N – 1) ]
   = √[ (159 – 35²/15) / (15 – 1) ]
   = √[ (159 – 81.67) / 14 ]
   = √5.52
   = 2.35
Step 3:
Applying the t-test for Dependent Means formula, first calculate the effect
size, D̄ / SD, i.e. the mean difference divided by the standard deviation.
The mean difference is 20.8 – 18.5 = 2.3 and the standard deviation is 2.35.
Substituting these values gives 2.3 / 2.35 = 0.979.
To determine the likelihood that the effect size is a function of chance,
calculate the t-ratio by multiplying the effect size by the square root of the
number of pairs:

t = (D̄ / SD) × √N = 0.979 × √15 = 3.79
Step 4:
Having computed the t-value (which is 3.79) you look up the t-value in The Table
of Critical Values for Student's t-test or The Table of Significance which tells us
whether the ratio is large enough to say that the difference between the groups is
significant. In other words, the difference observed is not likely due to chance or
sampling error. Refer to Table 5.5.
Alpha Level
The researchers set the alpha level at 0.05. This means that 5% of the time (five
out of a hundred) you would find a statistically significant difference between the
means even if there is none ("chance"). Since this is a one-tailed test, the entire
0.05 rejection region is placed in a single tail; it is for a two-tailed test that
alpha is split into 0.025 per tail.
Degrees of Freedom
The t-test also requires that we determine the degrees of freedom (df) for the test.
In the t-test, the degrees of freedom are the sum of the subjects or persons which
is 15 – 1 = 14. Given the alpha level, the df and the t-value, you look up in the
Table (available as an appendix in the back of most statistics texts) to determine
whether the t-value is large enough to be significant.
Step 5:
The t-value obtained is 3.79, which is greater than the critical value shown,
2.145. Hence, the null hypothesis (Ho) is rejected and Ha, which states that the
posttest mean is greater than the pretest mean, is accepted. It can be concluded
that the difference between the means is significant. In other words, there is
overwhelming evidence that a "gain" has taken place on the personality inventory
from Year 1 to Year 4 among the women undergraduates.
Again, you do not have to go through this tedious process, as statistical computer
programs such as SPSS provide the significance test results, saving you from
looking them up in a table.
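In Python the same dependent-means test is `scipy.stats.ttest_rel`. The scores below are made up to mirror the pretest/posttest design; they are not the Table 5.4 data:

```python
from scipy.stats import ttest_rel

# Hypothetical pretest/posttest scores for 10 students (illustrative only)
pretest  = [12, 15, 11, 14, 13, 10, 16, 12, 14, 11]
posttest = [15, 18, 14, 15, 16, 13, 19, 14, 17, 12]

# Paired test on the per-student differences; df = n - 1 = 9
result = ttest_rel(posttest, pretest)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

Because every student's scores are paired, the test works on the difference column, exactly as in the hand calculation above.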
To establish the statistical significance of the means obtained on the pretest and
posttest, the repeated measures t-test (also called the dependent-samples or
paired-samples t-test) was computed using SPSS.
Data was collected from the same group of subjects on both conditions and each
subject obtains a score on the pretest, and after the treatment (or intervention or
manipulation), a score on the posttest.
Ho: μ1 = μ2 and Ha: μ1 ≠ μ2
The ‘Paired Samples Statistics’ table above reports the mean values on the
variable (history test) for the pretest and posttest. The posttest mean (13.86)
is higher than the pretest mean (8.50), indicating improved performance on the
history test after the treatment. The standard deviation for the pretest is 3.34,
which is very close to the standard deviation for the posttest, 2.75.
The question remains: Is this mean difference large enough to convince us that
there is a significant difference in performance in history, a consequence of
teaching note-taking techniques?
Paired Differences

                       Mean        Std.       Std. Error  Lower  Upper  t      df  Sig.
                       Difference  Deviation  Mean                                 (2-tailed)
Pair 1  Pretest –
        Posttest       –5.36       2.90       .62         –6.65  –4.08  –8.65  21  .000
t-Value
This "t" value tells you how far away from 0, in terms of the number of standard
errors, the observed difference between the two sample means falls. The "t" value
is obtained by dividing the mean difference (–5.36) by the std. error (.62), which
is equal to –8.65. Refer to Figure 5.4.
p-value
The p-value shown in the "Sig (2 tailed)” column is smaller than your chosen
alpha level (0.05) and so you reject the null hypothesis and argue that there is a
real difference between the pretest and posttest.
In other words, we can conclude, that the observed difference between the two
means is statistically significant.
Mean Difference
This is the difference between the means, i.e. 8.50 – 13.86 = –5.36.
ACTIVITY 5.4
Paired Differences

                     Mean   Std.       Std. Error  Lower  Upper  t      df  Sig.
                            Deviation  Mean                                 (2-tailed)
Pair  Pretest –
      Posttest       –5.36  2.90       .62         –6.65  –4.08  –8.66  21  .000
ACTIVITY 5.5
INTRODUCTION
This topic explains what One-way Analysis of Variance (ANOVA) is about and
the assumptions for using ANOVA in hypothesis testing. It demonstrates how
ANOVA can be computed using the formula and the SPSS procedures. Also
explained are the interpretation of the related statistical results and the use of post-
hoc comparison tests.
an experimental study. Suppose you are interested in comparing the means of three
groups (i.e. k = 3) rather than two.
You might be tempted to use the multiple t-test and compare the means separately;
i.e. you compare the means of Group 1 and 2, followed by Group 1 and 3 and so
forth. What is the danger of doing this? Multiple t-tests enhance the likelihood of
committing Type 1 error (i.e. claiming that two means are not equal, when in fact
they are equal). In other words, you reject a null hypothesis when it is TRUE. On a
practical level, using the t-test to compare many means is a cumbersome process in
terms of the calculations involved.
Example
Let us look at the following example, which shows the results of a study on
Attitude towards Homework among Students of Varying Ability Levels. Subjects
were divided into three groups: High Ability, Average Ability and Low Ability.
The total sample size is 505 students. You need a special class of statistical
techniques called the One-way Analysis of Variance or One-way ANOVA which
we will discuss here.
What do the three means tell you? High ability students have the highest mean
(13.03), while low ability students have the lowest mean (9.54). Meanwhile,
average ability students fall in the middle, with a mean of 11.99.
• What do the three standard deviations tell you? Note that the standard deviation
for high ability (3.17) and average ability (2.93) students are fairly close, while
low ability students have a somewhat bigger standard deviation of 3.50.
• What do the three Standard Errors tell you? Refer to Table 6.1, and you will
notice that there is a column called 'standard error'. What is the standard error?
The standard error is a measure of how much the sample means vary if you
were to take repeated samples from the same population. The first two groups
Copyright © Open University Malaysia (OUM)
contain > 200 students each; the standard error of the mean for each of these
groups is fairly small. It is 0.12 for high ability students and 0.11 for average
ability students. However, the standard error for the low ability group is
comparatively high at 0.40. Why? The smaller number of low ability students
(n = 73) and the larger standard deviation explain why the standard error is
larger.
• What does '95 Pct Conf. Int for Mean' mean? The last column displays the
“confidence interval”. What is the confidence interval? It is the range which is
likely to contain the true population value or mean. If you take repeated
samples of 14-year-old students from the same population of 14-year-old
students in the country and calculate a confidence interval for each sample,
about 95% of those intervals should include the unknown population value or
mean. For example,
you can be 95% confident that, in the population, the mean of high ability
students is somewhere between 12.79 and 13.27. Similarly, you can be 95%
confident that, in the population, the mean of low ability students is somewhere
between 8.73 and 10.36.
• You will notice that the confidence interval is wider for low ability students
(i.e. 1.63) compared to confidence interval for high ability students (i.e. 0.48).
Why? This is due to the larger standard error (0.40) obtained by low ability
students. Since the confidence interval depends on the standard error of the
mean, the confidence interval for low ability students is wider than for high
ability students. So, the larger the standard error, the wider will be the
confidence interval. Makes sense, right?
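The quantities in the bullets above can be reproduced directly: the standard error is the standard deviation divided by the square root of the sample size, and the 95% confidence interval is the mean plus or minus a t critical value times the standard error. A sketch using the reported figures for the low ability group (the t critical value of 1.99 is taken from a t-table and is an assumption here):

```python
import math

# Standard error and 95% CI for the low ability group,
# using the values reported in the text (mean 9.54, SD 3.50, n = 73).
mean, sd, n = 9.54, 3.50, 73
se = sd / math.sqrt(n)          # ≈ 0.41, close to the reported 0.40
t_crit = 1.99                   # from a t-table for n - 1 = 72 df (assumed)
lower = mean - t_crit * se
upper = mean + t_crit * se
print(round(se, 2), round(lower, 2), round(upper, 2))
```

The computed interval agrees closely with the 8.73 to 10.36 range reported in the text; small discrepancies come from rounding the standard error.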
At the heart of ANOVA is the concept of Variance. What is variance? Most of
you would say, it is the standard deviation squared! Yes, that is correct. The focus
is on two types of variance:
• Between-Group Variance, i.e. if there are three groups, it is the variance
between the three groups.
• Within-Group Variance, i.e. if each group has 30 subjects, it is the variance
of the scores among the subjects within each group.
If the F-value is significant, it tells us that the population means are probably not
all equal and you reject the null hypothesis. Next, you have to locate where the
significance lies or which of the means are significantly different. You have to use
post-hoc analysis to determine this.
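As a sketch of this workflow, scipy.stats.f_oneway runs a one-way ANOVA on raw scores. The three score lists below are invented for illustration, not the study's data:

```python
from scipy import stats

# A minimal one-way ANOVA on three hypothetical score samples
# (invented data, not the study's actual scores).
high    = [14, 13, 15, 12, 16, 13, 14]
average = [12, 11, 13, 12, 10, 13, 12]
low     = [9, 10, 8, 11, 9, 10, 8]

f_stat, p_value = stats.f_oneway(high, average, low)
if p_value < 0.05:
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}: "
          "reject H0; follow up with post-hoc tests")
```

A significant F only says the means are probably not all equal; the post-hoc step described above is still needed to locate which pairs differ.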
ACTIVITY 6.1
1. What is the standard error? Why does the standard error vary?
2. Explain "95 Pct Conf. Int for Mean".
The null hypothesis states that the means of high ability, average ability and low
ability students are the same, i.e. each is equal to 4.00.
To test the null hypothesis, the One-way Analysis of Variance is used. The One-
way ANOVA is a statistical technique used to test the null hypothesis that several
population means are equal. The word 'variance' is used because it examines the
variability in the sample. In other words, how much do the scores of individual
students vary from the mean? Based on the variability or variance, it determines
whether there is reason to believe that the population means are not equal. In our
example, does creativity vary between the three groups of 12-year-old students?
The alternative hypothesis states that there is a difference between the three groups
of students (see Figure 6.2). However, the alternative hypothesis does not state
which groups differ from one another. It just says that the means of each group are
not all the same; or at least one of the groups differs from the others.
Are the means really different? We need to figure out whether the observed
differences in the sample means are attributed to just the natural variability among
sample means or whether there is reason to believe that the three groups of
students have different means in the population. In other words, are the differences
due to chance, or is there a 'real' difference?
The following is the summarised formula for computing the F-statistic or F-ratio:
Based on the study (see Table 6.2 for results) about the relationship between
creativity and socio-economic status of the subject, computation of the F-statistics
is as follows:
Degrees of freedom:
This sum of squares has a number of degrees of freedom equal to the number
of groups minus 1. In this case, df = (3-1) = 2
Degrees of freedom:
As in Step 1, we need to adjust the WSS to transform it into an estimate of
population variance, an adjustment that involves a value for the number of
degrees of freedom within. To calculate this, we take a value equal to the
number of cases in the total sample (N = 950), minus the number of groups
(k = 3), i.e. 950 - 3 = 947
Within Mean Squares = WSS / df = 1593.18 / 947 = 1.68
Excerpt from the table of critical values of F at p = 0.05 (df1 = numerator
degrees of freedom, df2 = denominator degrees of freedom):

df2     df1 = 1   df1 = 2   df1 = 3   df1 = 4
96       3.940     3.091     2.699     2.466
97       3.939     3.090     2.698     2.465
98       3.938     3.089     2.697     2.465
99       3.937     3.088     2.696     2.464
100      3.936     3.087     2.696     2.463
120      3.920     3.070     2.680     2.450
Finally, compare the F-statistic (13.34) with the critical value of 3.07. At p =
0.05, the F-statistic is larger than the critical value, and hence there is
strong evidence to reject the null hypothesis, indicating that there is a
significant difference in creativity among the three groups of students. While
the F-statistic assesses the null hypothesis of equal means, it does not address
the question of which means are different. For example, all three groups may
be different significantly, or two may be equal but differ from the third. To
establish which of the three groups are different, you have to follow up with
post-hoc comparison or tests.
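The table lookup can also be done in software. A sketch using scipy.stats.f with df1 = 2 and df2 = 120, the table row used above:

```python
from scipy import stats

# Critical value of F at alpha = .05 for df1 = 2 and df2 = 120
# (the F-table row used in the text when df2 is very large).
critical = stats.f.ppf(0.95, 2, 120)
print(round(critical, 2))  # 3.07, as in the F-table excerpt
assert 13.34 > critical    # observed F exceeds it, so H0 is rejected
```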
Tukey HSD
Tukey's HSD procedure runs a series of post-hoc tests, which are like a
series of t-tests. However, the post-hoc tests are more stringent than
regular t-tests. The HSD value indicates how large an observed difference must be for the
multiple comparison procedure to call it significant. Any absolute difference
between means has to exceed the value of HSD to be statistically significant.
Most statistical programmes will give you an output in the form of a table as
shown above. Group means are listed as a matrix. An asterisk (*) indicates
which pairs of means are significantly different.
Note that only the mean of Group 3 is significantly different from Group 1.
In other words, High SES (Mean = 4.12) subject scored significantly higher
on creativity than Low SES (Mean = 3.85) subjects. There was no significant
difference between High SES and Middle SES subjects nor was there a
significant difference between Middle SES and Low SES subjects.
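As an illustration of the idea, the HSD threshold is the studentized range statistic q times the standard error of a group mean. The q value and the group size below are assumptions for this sketch, not figures from the study:

```python
import math

# Sketch of the HSD threshold: HSD = q * sqrt(MS_within / n_per_group).
# q is the studentized range statistic from a table; for k = 3 groups and
# large within-group df, q(.05) is about 3.31 (table value, assumed here).
q = 3.31            # from a studentized-range table (assumption)
ms_within = 1.68    # within mean squares from the worked example
n_per_group = 300   # hypothetical equal group size
hsd = q * math.sqrt(ms_within / n_per_group)
# Any pair of group means differing by more than hsd is declared significant.
print(round(hsd, 3))
```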
Table 6.3: Means, Skewness and Kurtosis for the Three Groups
Independent Variable Statistic Std. Error
Group
Group 1 Mean 43.82 2.20
Skewness .973 .491
Kurtosis .341 .953
Group 2 Mean 60.14 2.71
Skewness -.235 .597
Kurtosis -1.066 1.154
Group 3 Mean 64.75 3.61
Skewness -.407 .564
Kurtosis -1.289 1.091
The Shapiro-Wilk normality tests indicate that the scores are normally
distributed in each of the three conditions. The Kolmogorov-Smirnov statistic is
significant for Group 1, but that statistic is more appropriate for larger
sample sizes. Refer to Figure 6.3.
Just like the t-test, the Levene's test of homogeneity of variance is used for
the One-way ANOVA and is shown in Figure 6.4. The p-value which is
0.113 is greater than the alpha of 0.05. Hence, it can be concluded that the
variances are homogeneous which is reported as Levene (2, 49) = 2.28, p =
.113.
ACTIVITY 6.2
Procedure for the One-way ANOVA with post-hoc analysis Using SPSS
1. Select the Analyze menu.
2. Click Compare Means and One-Way ANOVA ..... to open the One-Way
ANOVA dialogue box.
3. Select the dependent variable (i.e. inductive reasoning) and click the arrow
button to move the variable into the Dependent List box.
4. Select the independent variable (i.e. SES) and click the arrow button to move
the variable into the Factor box.
5. Click the Options ..... command push button to open the One-Way
ANOVA: Options sub-dialogue box.
6. Click the check boxes for Descriptive and Homogeneity-of-variance.
7. Click Continue.
8. Click the Post Hoc .... command push button to open the One-Way
ANOVA: Post Hoc Multiple Comparisons sub-dialogue box. You will
notice that a number of multiple comparison options are available. In this
example you will use the Tukey's HSD multiple comparison test.
9. Click the check box for Tukey.
10. Click Continue and then OK.
As you may have realised, just by looking at the “Descriptives” table, the
group means cannot tell us decisively if significant differences exist. What is
the next step?
Note that each mean is compared with every other mean twice, so the results are
essentially repeated in the table. Interpreting the table reveals that:
ACTIVITY 6.3
ACTIVITY 6.4
• The one-way ANOVA is used to compare the differences between more than
two groups of samples from unrelated populations.
• Even though ANOVA is used to compare the mean, this test uses the variance
in computing the test statistics.
• This test requires a large sample. Other assumptions are that the population
is normally distributed, the variables are measured at least at the interval
level, and the variances of the groups are equal.
• Between group variances are due to the differences between the groups (could
be due to different treatment etc.), while within group variances are due to
sampling (the differences among the members of the same group).
• Technically, for any comparison between groups, the between group variance
should be large simply because they are different groups while within the
group itself the variances should be low (assuming the members are
homogenous).
• The F-statistic is based on the premise that if different treatments have
different effects (or different groups respond differently due to their inherited
differences), the between group variance is large while the within group
variance (also called the residual variance) is low. If there is any difference
between the groups, the F-value will be high, causing the null hypothesis to be
rejected.
INTRODUCTION
This topic explains what analysis of covariance (ANCOVA) is about and the
assumptions for using it in hypothesis testing. It also demonstrates how to
compute and interpret ANCOVA using SPSS.
Besides prior knowledge, other factors that could complicate the situation include
level of intelligence, attitude, motivation and self-efficacy. The Analysis of
Covariance (ANCOVA) provides a way of measuring and removing the effects of
such initial systematic differences between groups or samples.
EXAMPLE:
A researcher conducted a study with the aim of comparing the effectiveness of the
lecture method and the discussion method in teaching geography (see
Figure 7.1). One group received instruction using the lecture method and another
group received instruction using the discussion method.
For illustration purposes, only four students were randomly assigned to the two
groups (in real-life research, you will certainly have more subjects). The result is
two sets of bivariate measures, one set for each group.
(f) Reliability of the Covariate: The instrument used to measure the covariate
should be reliable. In the case of variables such as gender and age, this
assumption can usually be easily met. However, with other types of
variables such as self-efficacy, attitudes, personality, etc., meeting this
assumption can be more difficult.
Look at the graph in Figure 7.3, which shows regression lines for each group
separately. Notice how the groups differ in mean age. The Graduates, for
instance, have a mean age of 38 and a score of 14 on knowledge of current
events, while the Diploma holders have a mean age of 45 and a score of 12.5.
The subjects with High school qualifications have a mean age of 50 and a score
of 11.5 on the knowledge of current events test. What does this tell you? It is
probably obvious to you that part of the difference in knowledge of current
events is due to the groups having different mean ages.
So you decide to include Age as a covariate and use ANCOVA.
(a) ANCOVA reduces the error variance by removing the variance due to the
relationship between age (covariate) and the dependent variable (knowledge
of current events).
(b) ANCOVA adjusts the means on the covariate for all of the groups,
leading to the adjustment of the means of the dependent variable
(knowledge of current events).
ANCOVA adjusts the knowledge of current events means (y means) to what they
would be if the three groups had the same mean on age (x or covariate).
While ANOVA uses the “real” means of each group to determine if the
differences are significant, ANCOVA uses the Grand Mean. The grand mean is
the sum of the group means divided by the number of groups (i.e. (38 + 45 + 50)
divided by 3, which is about 44). Now, we can see how far each mean is from the
grand mean.
So for the graduates group, ANCOVA does not use the mean age of 38 to find
the mean knowledge of current events. Instead, it gives an estimate of what
the mean knowledge of current events would be if age were held constant (i.e.
if the mean ages of the groups were the same, which in this case is about 44).
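A sketch of this adjustment, using the group means from Figure 7.3. Each group's mean on the dependent variable is shifted to what it would be at the grand mean age. The pooled regression slope b below is a hypothetical value chosen for illustration; in practice ANCOVA estimates it from the data:

```python
# Sketch of the ANCOVA mean adjustment: adjusted = y_mean - b * (x_mean - grand).
b = -0.15  # hypothetical pooled slope of knowledge score on age (assumption)
grand_mean_age = (38 + 45 + 50) / 3  # = 44.33 (the text rounds to 44)
groups = {
    "Graduates":   (14.0, 38),
    "Diploma":     (12.5, 45),
    "High school": (11.5, 50),
}
for name, (y_mean, x_mean) in groups.items():
    adjusted = y_mean - b * (x_mean - grand_mean_age)
    print(name, round(adjusted, 2))
```

Note how the adjustment pulls the groups' knowledge means closer together once their age differences are accounted for.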
Hence, you have to ensure that the regression slopes for each group are parallel. If
the slopes are not parallel, using a procedure that adjusts the means of the groups
to an “average” (the grand mean) does not make sense. Is it possible to have a
sensible grand mean, from three very different slopes as shown in Figure 7.4? The
answer is NO because the differences between the groups are not the same, for
each value of the covariate. So, in this case, the use of ANCOVA would not be
sensible.
A researcher wanted to find out if the critical thinking skills of students can be
improved using the inquiry method when teaching science. A sample of 30
students were selected and divided into the following groups: 13 high ability
subjects, 8 average ability subjects and 13 low ability subjects. A 10-item critical
thinking test was developed by the researcher and administered before the
intervention and after the intervention.
The homogeneity of variance table (Table 7.1) indicates that the variances of the
three groups are similar: since the p-value of 0.500 is greater than .05, the
null hypothesis of equal variances is NOT rejected. Hence, you have not violated
one of the assumptions for using ANOVA.
Table 7.2 shows the means and standard deviations for the three groups of
subjects – low, average and high ability. Although the high ability group subjects
scored 4.84 and low ability subjects scored only 3.22; the difference between the
ability levels is not significant. Therefore, teaching students using the inquiry
method seems to have no significant effect on critical thinking.
Since the p-value reported is .108, which is greater than the alpha level of .05,
Tukey's post hoc comparison test revealed no significant differences between the
three groups of students. Therefore, it is concluded that teaching science using the
inquiry method seems to have no significant effect on critical thinking.
See the ANOVA table with the covariate included. Compare this to the ANOVA
table when the covariate was not included. The format of the ANOVA table is
largely the same as without the covariate (see Table 7.4), except that there is an
additional row of information about the covariate (pretest).
* Significant at p = .05
Looking first at the significance values, it is clear that the covariate (i.e. pretest)
significantly influenced the dependent variable (i.e. posttest), because the
significance values are less than .05. Therefore, performance in the pretest had a
significant influence on the posttest. What is more interesting is that when the
effect of the pretest is removed, teaching science using the inquiry method
becomes significant (p is .037 which is less than .05). There was a significant
effect of the inquiry method of teaching on critical thinking after controlling
for the effect of the pretest, F(2,26) = 4.14, p <.05.
Table 7.5 shows the adjusted means (The Sidak test was used to obtain the
adjusted means). These values should be compared with Table 7.2 to see the
effect of the covariate on the means of the three groups. The results show that low
ability subjects differed significantly from high ability subjects on the critical
thinking test (see Table 7.6). However, there were no significant differences
between average and high ability subjects.
CONCLUSION
This example illustrates how ANCOVA can help us exert stricter experimental
control by taking into account confounding variables to give us a “purer” measure
of the effect of the experimental manipulation. Without taking into account the
pretest, we would have concluded that the inquiry method of teaching science had
no effect on critical thinking of subjects, yet clearly it does.
ACTIVITY 7.1
ACTIVITY 7.2
Refer to the following Table 7.7, which is an SPSS output and answer
the following questions:
1. State the independent variable. Give reasons.
2. Which is the covariate? Explain.
3. State the dependent variable. Give reasons.
4. State a hypothesis for the above results.
5. Do you reject or do not reject the hypothesis stated above?
Covariate
Linearity
Homogeneity of regression
Normality
Homogeneity of variance
Reliability of the covariate
Independence
INTRODUCTION
This topic explains the concept of linear relationship between variables. It
discusses the use of statistical tests to determine correlation and demonstrates
how to compute correlation between variables using SPSS and interpret
correlation results.
For example, if people who exercise regularly nearly always have better
health than those who do not exercise, then exercise and health are
strongly correlated. If those who exercise regularly are just a little more
likely to be healthy than non-exercisers, then the two variables are only
weakly related. The scale in Figure 8.1 shows the strength of the
correlation coefficient.
How high does a correlation coefficient have to be, to be called strong? How
small is a weak correlation? The answer to these questions varies with the
variables being studied. For example, if the literature shows that in previous
research, a correlation of 0.51 was found between variable X and variable Y, but
in your study you obtained a correlation of 0.60; then you might conclude that the
correlation between variable X and Y is strong.
However, Cohen (1988) has provided some guidelines to determine the strength
of the relationship between two variables by providing descriptors for the
coefficients. Keep in mind that in education and psychology, it is rare that the
coefficients will be “very strong” or “near perfect” since the variables measured
are constructs involving human characteristics, which are subject to wide
variation.
Example:
Data was gathered for the following two variables (IQ test and science test) from a
sample of 12 students. Refer to Table 8.1 below.
Table 8.1: Data of Two Variables (IQ Test and Science Test)
Figure 8.2: Scatter Diagram Showing the Relationship between IQ Scores (X-axis)
and Science Score (Y-axis) for 12 Students
See Figure 8.3. If Attitudes (x) and English Achievement (y) have a positive
relationship, then the slope (b) will be a positive number. Lines with positive
slopes go from the bottom left toward the upper right, i.e. an increase from 1 to 2
on the X-axis is followed by an increase from 3 to 3.5 on the Y-axis.
See Figure 8.4. If Attitudes (x) and English Achievement (y) have a negative
relationship, then the slope (b) will be a negative number. Lines with
negative slopes go from the upper left to the lower right. The graph shown
has a slope of –0.5: an increase of 1 on the X-axis is associated with a
decrease of 0.5 on the Y-axis; i.e. an increase from 1 to 2 on the X-axis is
followed by a decrease from 5 to 4.5 on the Y-axis.
If Attitudes (x) and English Achievement (y) have zero relationship (as shown in
Figure 8.5), then there is NO SYSTEMATIC RELATIONSHIP between X and Y.
Here, some students with high Attitude scores have low English scores,
while some students with low Attitude scores have high English scores.
The Pearson Correlation Coefficient (also called the Pearson r) is the commonly
used formula in computing the correlation between two variables. The formula
measures the strength and direction of a linear relationship between variable X
and variable Y. The sample correlation coefficient is denoted by r. The formula
for the sample correlation coefficient is:
r = [ΣXY − (ΣX)(ΣY)/N] / √{[ΣX² − (ΣX)²/N][ΣY² − (ΣY)²/N]}

SSxx = Σx² − (Σx)²/n = 1566 − (135)²/12 = 1566 − 18225/12 = 1566 − 1518.75 = 47.25

SSyy = Σy² − (Σy)²/n = 1139 − (114)²/12 = 1139 − 12996/12 = 1139 − 1083 = 56.00

r = [ΣXY − (ΣX)(ΣY)/N] / √(SSxx × SSyy)
  = 22.50 / √(47.25 × 56.00)
  = 22.50 / 51.44
  ≈ 0.437
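As a check, the same computation in Python using the sums from the worked example (√(47.25 × 56.00) ≈ 51.44, so r ≈ 0.437):

```python
import math

# Recomputing Pearson r from the sums in the worked example.
numerator = 22.50                # ΣXY − (ΣX)(ΣY)/N
ss_xx = 1566 - 135**2 / 12       # = 47.25
ss_yy = 1139 - 114**2 / 12       # = 56.00
r = numerator / math.sqrt(ss_xx * ss_yy)
print(round(r, 3))  # ≈ 0.437
```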
SPSS Procedures:
1. Select the Analyze menu.
2. Click on Correlate and then Bivariate to open the
Bivariate Correlations dialogue box.
3. Select the variables you require (i.e. reading and science) and
click on the arrow button to move the variables into the
Variables: box.
4. Ensure that the Pearson correlation option has been
selected.
5. In the Test of Significance box, select the One-tailed radio
button.
6. Click on OK.
To interpret the correlation coefficient, you examine the coefficient and its
associated significance value (p). The output shows that the relationship between
reading and science scores is significant, with a correlation coefficient of
r = 0.63, p < .05. Thus, higher reading scores are associated with higher scores
in science.
Hence, the null hypothesis is REJECTED which affirms that the two variables are
positively related in the population.
Coefficient of Determination:
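The coefficient of determination is simply the square of the correlation coefficient: the proportion of variance in one variable accounted for by the other. For the reading-science correlation of r = 0.63 reported above:

```python
# Coefficient of determination: r squared, the proportion of shared variance.
r = 0.63
r_squared = r ** 2
print(round(r_squared, 2))  # 0.40: about 40% of the variance is shared
```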
SPSS Procedures:
1. Select the Graph menu.
2. Click on Scatter to open the Scatterplot dialogue box.
3. Ensure Simple Scatterplot option is selected.
4. Click on the Define command push button to open the Simple
Scatterplot sub-dialogue box.
5. Select the first variable (i.e. science) and click on the arrow button to
move the variable into the Y Axis: box.
6. Select the second variable (i.e. reading) and click on the arrow button to
move the variable into the X Axis: box.
7. Click on OK.
As you can see from the scatter plot (Figure 8.7) there is a linear relationship
between reading and science scores. Given that the scores cluster uniformly
around the regression line, the assumption of homogeneity of variance has not
been violated.
rs = 1 − 6Σd² / [n(n² − 1)] = 1 − 6(49) / [12(144 − 1)] = 1 − 294/1716 ≈ 0.83
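The same Spearman computation in Python, using n = 12 pairs and Σd² = 49 as in the example:

```python
# Spearman rank-order correlation from squared rank differences:
# r_s = 1 - 6*Σd² / (n(n² - 1)), with n = 12 and Σd² = 49.
n = 12
sum_d_squared = 49
r_s = 1 - (6 * sum_d_squared) / (n * (n**2 - 1))
print(round(r_s, 3))
```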
SPSS Procedures:
1. Select the Analyze menu.
2. Click on Correlate and then Bivariate to open the Bivariate
Correlations dialogue box.
3. Select the variables you require (i.e. reading and science) and click on
the arrow button to move the variables into the Variables: box.
4. Ensure that the Spearman correlation option has been selected.
5. In the Test of Significance box, select the One-tailed radio button.
6. Click on OK.
Results
Correlations
rq2 rq6
Spearman's rho rq2 Correlation Coefficient 1.000 .507**
Sig. (2-tailed) . .000
N 203 203
rq6 Correlation Coefficient .507** 1.000
Sig. (2-tailed) .000 .
N 203 203
**. Correlation is significant at the 0.01 level (2-tailed).
Does this imply that well-paid teachers "cause" better academic performance of students?
Would academic performance increase if we increased the pay of teachers? It is
dangerous to conclude causation just because there is a correlation or relationship
between the two variables. The correlation by itself tells us nothing about whether
"teachers' salary" causes "achievement".
ACTIVITY 8.1
A researcher conducted a study which aimed to determine the
relationship between self-efficacy and academic performance in
geography. A 20-item self-efficacy scale and a 25-item geography test
was administered to a group of 12 students.
• The linear relationship between two variables is evaluated from two aspects:
the strength of the relationship (correlation), and the cause-effect association
(regression).
• The value for correlation coefficient ranges from –1 to +1. Any value close to
these extremes indicates the strength of the linear relationships in the same or
opposite direction.
• There are two methods for computing the correlation coefficient: the Pearson
correlation and the Spearman Rank Order correlation. The latter is the non-
parametric equivalent of the former and is used when the data is measured at an
ordinal level or when the sample size is small.
• The correlation coefficient computed from the sample indicates the strength of
the relationship in the sample. To generalise a linear relationship to the
population, a significance test needs to be performed.
INTRODUCTION
This topic explains the concept of causal relationship between variables. It
discusses the use of statistical tests to determine slope, intercept and the
regression equation. It also demonstrates how to run regression analysis using
SPSS and interpret the results.
Y = a + bX
where Y is the dependent variable, X is the independent variable, and a and b are
two constants to be estimated.
Basically, regression is a technique for fitting the best-fitting straight line to
represent a cluster of points (see Figure 9.1). The points are defined in a two-
dimensional plane. The straight line expresses the linear association between the
variables studied. It is a useful technique for establishing a cause-effect
relationship between variables and for forecasting future results or outcomes. An
important consideration in linear regression analysis is that the researcher must
identify the 'independent' and 'dependent' variables prior to the analysis.
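A minimal sketch of fitting such a line by least squares with NumPy; the x and y values are invented for illustration:

```python
import numpy as np

# Fitting the best straight line through a cluster of points by least squares.
# The data here is invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # dependent variable

b, a = np.polyfit(x, y, deg=1)  # slope b and intercept a of y = a + bx
print(round(b, 2), round(a, 2))
```

polyfit returns the coefficients from highest degree down, so the first value is the slope b and the second is the intercept a in Y = a + bX.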
Slope

b = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²]

Y-intercept

a = (ΣY − bΣX) / n
Example:
A research was conducted at TESCO Hypermarket to determine if there is a
cause-effect relationship between the sales and expenditure on advertisements.
Table 9.1 illustrates the computation of the regression coefficients.
The regression equation for the relationship between Sales and Expenditure on
advertisements is:
Example
If the researcher would like to test the hypothesis that there is a true relationship
between sales and expenditure on advertising, the following procedures need to be
adhered to.
The researcher performs the ANOVA for the linear relationship between sales and
expenditure on advertising. The result is shown in Table 9.2.
Table 9.2: The Results of the ANOVA for Simple Linear Regression between Sales and
Expenditure on Advertising
ANOVA
df SS MS F p-value
Regression 1 254.65 254.65 13.46 0.01
Residual 9 170.22 18.91
Total 10 424.88
F-value is 13.46
P-value is 0.01
Since the p-value is smaller than 0.05, we reject the null hypothesis in favour of
the alternative. There is a linear relationship between the variables studied.
From the data it is evident that there is a linear relationship between sales and
expenditure on advertising.
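The F-value in Table 9.2 can be verified from the table's own entries: F is the regression mean square divided by the residual mean square.

```python
# Reading the regression ANOVA table: F = MS_regression / MS_residual.
ms_regression = 254.65     # SS 254.65 with df 1
ms_residual = 170.22 / 9   # ≈ 18.91
f_ratio = ms_regression / ms_residual
print(round(f_ratio, 2))   # ≈ 13.46, matching the table
```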
Now, we can proceed to the test of significance for the regression slope.
Note : For simple linear regression where there is only one independent variable,
if linear relationship is ‘proven’ the significance test for the slope will show
‘significant departure from zero’.
Requirements
Parameter to be tested: regression slope, b
Normality: the sample statistic (in this case, b) follows a normal distribution.
Sample size: large
Recommended test: t-test for the regression slope.
Test statistic: t = b / SE(b)
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.
The researcher performs the t-test for regression slope for the linear relationship
between expenditure on advertisement and sales. The result is shown in Table 9.3.
Table 9.3: The Results of the T-test to Test the Significance of the Regression Slope
t-value is 3.82
p-value is 0.005
Since the p-value is smaller than 0.05, we reject the null hypothesis in favour of
the alternative. The regression slope is not equal to zero: there is a true
relationship between the variables studied. Sales is linearly related to expenditure
on advertisement. The regression coefficient for this relationship is:
The R2 is 0.599, meaning that 59.9% of the variation in Sales is attributed to the
variation in Expenditure on advertising.
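The R² of 0.599 follows directly from the ANOVA table in Table 9.2, since R² is the regression sum of squares over the total sum of squares:

```python
# R² = SS_regression / SS_total, using the values from Table 9.2.
ss_regression = 254.65
ss_total = 424.88
r_squared = ss_regression / ss_total
print(round(r_squared, 3))  # 0.599: 59.9% of the variation in Sales
```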
SPSS Procedures:
Results
Ho: The variation in the dependent variable is not explained by the linear model
(R² = 0).
Ha: A significant portion of the variation in the dependent variable is explained
by the linear model (R² ≠ 0).
Since the p-value is less than 0.05, reject the null hypothesis and conclude that a
significant portion of the variation in the dependent variable is explained by the
linear model. Refer to Figure 9.3.
The R² is 0.306, indicating that about 30.6% of the variation in the customers'
satisfaction can be attributed to changes in the respondents' perception of
employees' knowledge.
The next step is to test the significance of the slope. In simple linear regression,
if the global hypothesis shows that there is a significant linear relationship
between the dependent and independent variables, the significance test for the
slope will also provide evidence that it is significantly different from zero.
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.
Since the p-value is less than 0.05, reject the null hypothesis and conclude that the
regression slope is not equal to zero. Thus,
Example
A researcher is interested in determining the various factors that contribute to the
sales of a newly introduced hair shampoo. Among the crucial factors that he
wishes to study are cost for TV advertisement, training of sales executives,
employing promoters, distribution of free samples, and leasing the prime spots at
hypermarkets and supermarkets.
TV : TV advertisement cost
Train : Training of sales executives cost
Promoters : Cost for employing promoters
Free samples : Cost for distributing free samples
Prime spot : Cost for leasing prime spots at hyper and supermarkets
The researcher performs the ANOVA for the linear relationship between
sales and all the defined predictor variables. The result for it is shown in
Table 9.4.
Since the p-value is smaller than 0.05, reject the null hypothesis in favour
of the alternative. There is a linear relationship between the variables
studied. From the analysis it is evident that there is a linear relationship
between the sales and the combination of the predictor variables.
The next step is the test of significance for the regression slope (for every
independent [predictor] variable). This is to determine the contribution of
each predictor variable independently.
(c) Requirements
The researcher performs the t-test for regression slopes for the linear
relationship between Sales and the following variables:
(i) Costs for TV advertisements;
(ii) Training of sales executives;
(iii) Employing promoters;
(iv) Distributing free samples; and
(v) Leasing prime spots.
Table 9.5: The Results of the T-test to Test the Significance of the Regression Slopes
The p-value is smaller than 0.05 for (i) costs for TV advertisements,
(ii) employing promoters and (iii) distributing free samples; hence the slopes
for these three predictors are significantly different from zero.
The regression model for this relationship between Sales and costs of
advertisements is:
The adjusted R² is 0.254, meaning that 25.4% of the variation in the sales is
attributed to the combined variation in the costs for TV advertisement,
employing promoters and distributing free samples.
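Adjusted R² corrects R² for the number of predictors k relative to the sample size n. A sketch of the standard formula; the R², n and k values passed in below are assumptions for illustration, not the study's figures:

```python
# Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1): penalises R² for adding
# predictors, so it only rises when a predictor genuinely helps.
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: R² = 0.30 with n = 50 cases and k = 3 predictors.
print(round(adjusted_r_squared(0.30, 50, 3), 3))
```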
The Hypothesis
Ho: The variation in patients' overall satisfaction is not explained by the linear
model comprising patients' assessment of assurance, reliability, service
policy, tangibles, problem solving and convenience (R² = 0).
SPSS Procedures:
Results
Since the p-value is less than 0.05, reject the null hypothesis and conclude that a
significant portion of the variation in the dependent variable is explained by the
linear model.
The next step is the test of significance for the regression slope (for every
independent [predictor] variable). This is to determine the contribution of each
predictor variable independently.
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.
Model Summary
Refer to Figure 9.7. The adjusted R2 is 0.619, meaning that 61.9% of the variation
in the overall satisfaction is attributed to the combined variation in patients’
perception of assurance, reliability and convenience of services provided by the
hospital.
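The adjusted R2 that SPSS reports can be reproduced from the ordinary R2, the sample size and the number of predictors. A minimal sketch in Python; the sample size n = 102 and k = 3 predictors below are hypothetical figures for illustration, not values taken from the hospital study:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: penalises R^2 for the number of predictors k,
    given sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical inputs for illustration only
print(round(adjusted_r2(0.631, 102, 3), 2))
```

The penalty grows as predictors are added without a matching gain in R2, which is why the adjusted value is the one usually interpreted in multiple regression.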
ACTIVITY 9.1
The linear relationship between two variables is evaluated from two aspects:
the strength of the relationship (correlation), and the cause-effect association
(regression).
In statistics, correlation is used to denote association between two quantitative
variables, assuming that the association is linear.
Linear regression is a technique for modelling the cause-and-effect relationship
between two variables. If the two variables are related, changes in one will
lead to changes in the other. If the researcher can
identify the “cause and effect” variables, the relationship can be represented in
the form of the following equation:
Y = a + bX;
where Y is the dependent variable, X is the independent variable, and a and b
are two constants to be estimated.
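The constants a and b are usually estimated by ordinary least squares. A minimal sketch in Python; the data points are made up for illustration:

```python
def fit_line(x, y):
    """Ordinary least squares estimates of the intercept a and slope b
    in Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope b = covariance(x, y) / variance(x)
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

# Made-up data in which y grows roughly twice as fast as x
a, b = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(a, b)
```

Statistical packages such as SPSS perform the same estimation and additionally report standard errors and significance tests for a and b.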
INTRODUCTION
This topic provides a brief explanation of the parametric and non-parametric tests.
Detailed description is given on chi-square, Mann-Whitney and Kruskal-Wallis
tests. Besides that, the assumptions underlying these statistical techniques are
provided to facilitate student learning. It demonstrates how non-parametric
statistical procedures can be computed using formulae as well as SPSS and how
the statistical results should be interpreted.
The parametric or distribution constraint test is a statistical test that requires the
distribution of the population to be specified. Thus, parametric inferential methods
assume that the distributions of the variables being assessed belong to some form
of known probability distribution (e.g. assumption that the observed data are
sampled from a normal distribution).
Choosing the right test contributes to the validity of the research findings.
Improper use of statistical tests will not only cause the validity of the test results
to be questioned and do little justice to the research, but can at times be a
serious error, especially if the results have major implications, for example,
when they are used in policy formulation.
Sample size plays a crucial role in deciding the family of statistical tests:
parametric or non-parametric. In a large sample, the central limit theorem ensures
that parametric tests work well even if the population is not normal. Parametric
tests are robust to deviations from normal distributions, when the sample size is
large. The issue here is how large is large enough; a rule of thumb suggests that a
sample size of about 30 or more for each category of observation is sufficient to
use the parametric test. The non-parametric tests also work well with large
samples. The non-parametric tests are only slightly less powerful than parametric
tests with large samples.
On the other hand, if the sample size is small we cannot rely on the central limit
theorem; thus, the p value may be inaccurate if the parametric tests were to be
used. The non-parametric test suffers greater loss of statistical power with small
sample size. Table 10.1 summarises some of the commonly used parametric and
non-parametric tests but not all of them are explained in this module.
(a) Assumptions
Even though certain assumptions are not critical for using the chi-square,
you need to address a number of generic assumptions:
• Random Sampling: Observations should be randomly sampled from
the population of all possible observations.
• Independence of Observations: Each observation should be generated
by a different subject and no subject is counted twice. In other words, each
subject should appear in only one group and the groups are not related in
any way.
• Size of Expected Frequencies: When the number of cells is less than
10, and particularly when the total sample size is small, the lowest
expected frequency required for a chi-square test is 5.
Example :
A sample of 110 teenagers was asked which of the four hand phone brands they
preferred. The number of people choosing the different brands was recorded in
Table 10.2.
Table 10.2: Preferences for Brands of Hand Phones
We want to find out if one or more brands are preferred over others. If they are
not, then we should expect roughly the same number of people in each category.
There will not be exactly the same number of people in each category, but they
should be near equal.
Another way of saying this is: If the null hypothesis is TRUE, and some brands
are not preferred more than others, then all brands should be equally represented.
We expect roughly EQUAL NUMBERS IN EACH CATEGORY, if the NULL
HYPOTHESIS is TRUE.
Expected Frequencies
There are 110 people, and there are four categories. If the null hypothesis is true,
then we should expect 110 / 4 = 27.5 teenagers to be in each category. This is
because, if all brands of hand phones are equally popular, we would expect
roughly equal numbers of people in each category. In other words, the number of
teenagers should be evenly distributed among the four brands.
The numbers that we would find in the four categories if the null hypothesis is
true (i.e. all brands are equally popular) are called the EXPECTED
FREQUENCIES.
The numbers that we find in the four categories are called the OBSERVED
FREQUENCIES (i.e. based on the data we collected).
See Table 10.3. What the χ² test does is to compare the Observed Frequencies
with the Expected Frequencies.
If all brands of hand phones are equally popular, the Observed Frequencies
will not differ from the Expected Frequencies.
Table 10.3 shows the observed and expected frequencies for the four brands of
hand phones. It is often difficult to tell just by looking at the data, which is why
you have to use the χ² test.
Step 1:
Calculate the differences between the Expected Frequencies and Observed
Frequencies (see Column 4). Do not worry about the plus and minus signs!
Step 2:
Square the differences (see Column 5) to remove the negative signs.
Step 3:
Divide the squared difference by the measure of variance (see Column 6). The
“measure of variance” is the Expected Frequencies (i.e. 27.5). For Brand A, it is
56.25 / 27.5 = 2.05 and do the same for the other brands.
Step 4:
Add up the figures you obtained in Column 6 and you get 53.65. So the χ² is
53.65.
The formula for the χ² which you did above is shown as follows:

χ² = Σ [(observed frequency - expected frequency)² / expected frequency]
Step 5:
The degrees of freedom (DF) is one less than the number of categories. In this
case, DF is 4 categories – 1 = 3. We need to know this, for it is usual to report the
DF, along with the χ² and the associated probability level.
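Steps 1 to 5 can be sketched in Python. The observed counts below (20, 60, 20, 10) are an illustrative reconstruction consistent with the totals stated in the text (110 teenagers, Brand B preferred by 60, χ² = 53.65); they are not the actual Table 10.2 figures:

```python
# Observed counts for Brands A-D: illustrative reconstruction, not the
# actual Table 10.2 data
observed = [20, 60, 20, 10]
n = sum(observed)                    # 110 teenagers
expected = [n / len(observed)] * 4   # 27.5 per brand if H0 is true

# chi-square = sum over categories of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # 4 categories - 1 = 3

print(round(chi2, 2), df)
```

The same computation, cell by cell, is what Columns 4 to 6 of Table 10.3 carry out by hand.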
SPSS Output

              Hand phones
Chi-Square    53.65a
df            3
Asymp. Sig.   .0000
The χ² value of 53.65 (rounded to 53.6) is compared with the value that would be
expected for a χ² with 3 DF if the null hypothesis were true (i.e. all brands of
hand phones are preferred equally). [SPSS will compute this comparison.] The
SPSS Output shows that with a χ² value of 53.6, the associated probability value is
0.0001. This means that the probability that this difference was due to chance is
very small. We can conclude that there is a significant difference between the
Observed and Expected Frequencies; i.e. all the four brands of hand phones are
not equally popular. More people prefer brand B (60) than the other hand phone
brands.
10.2.2 χ² Test for Independence: 2 X 2
Chi-square (χ²) enables you to discover whether there is a relationship or
association between two categorical variables. For example, is there an
association between students who smoke cigarettes and those who do not smoke,
and students who are active in sports and those who are not active in sports? This
is a type of categorical data, because we are asking whether they smoke or do not
smoke (not how many cigarettes they smoke); and whether they are active or not
active in sports. The design of the study is shown in Table 10.4, which is called a
contingency table and it is a 2 x 2 table because there are two rows and two
columns.
Example
A researcher is interested in finding out whether male students from high income
or low income families get into trouble more often in school. The following Table
10.5, shows the frequencies of male students from low and high income family
who have discipline problems in school:
Step 2: Calculate the Expected Value for Each Cell of the Table
As with the goodness-of-fit example described earlier, the key idea of the chi-
square test for independence is a comparison of observed and expected values.
How many of something were expected and how many were observed in some
process? In the case of tabular data, however, we usually do not know what the
distribution should look like. Rather, in this use of the chi-square test, expected
values are calculated based on the row and column totals from the table.
The expected value for each cell of the table can be calculated using the following
formula:
For example, in the table comparing the numbers of high income and low
income students involved in disciplinary problems, the expected count for the
number of low income students with discipline problems is:

Expected Frequency (E1) = (117 × 83) / 237 = 40.97

Expected Frequency (E4) = (120 × 154) / 237 = 77.97
Use the formula and compute the Expected Frequencies for E2 and E3. Table 10.6
shows the completed expected frequencies for all the four cells.
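The formula generates all four expected counts from the marginal totals. A short sketch using the row totals (117, 120) and column totals (83, 154) from the worked example; which margin corresponds to income group and which to discipline status is assumed from the layout of the example:

```python
row_totals = [117, 120]        # assumed: low income, high income
col_totals = [83, 154]         # assumed: discipline problem yes, no
grand_total = sum(row_totals)  # 237 students in all

# Expected frequency of a cell = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

The printed values reproduce E1 = 40.97 and E4 = 77.97 and supply E2 and E3 as well, which you can check against Table 10.6.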
Table 10.7: Extract from the Table of Critical Values of χ²

                     Probability Level (alpha)
df    0.50     0.10     0.05     0.02     0.01     0.001
1     0.455    2.706    3.841    5.412    6.635    10.827
2     1.386    4.605    5.991    7.824    9.210    13.815
3     2.366    6.251    7.815    9.837    11.345   16.268
4     3.357    7.779    9.488    11.668   13.277   18.465
5     4.351    9.236    11.070   13.388   15.086   20.517
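The decision rule is to reject the null hypothesis when the computed χ² exceeds the tabled critical value for its degrees of freedom. A small sketch using the alpha = 0.05 column of Table 10.7:

```python
# Critical values of chi-square at alpha = 0.05, from Table 10.7
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def reject_null(chi2, df, critical=CRITICAL_05):
    """Reject H0 when the computed chi-square exceeds the critical value."""
    return chi2 > critical[df]

print(reject_null(53.65, 3))   # the hand phone example exceeds 7.815
print(reject_null(2.37, 3))    # a small chi-square does not
```

Statistical packages report the exact p-value instead, but the conclusion at a fixed alpha is the same.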
Note:
The 2 X 2 contingency table can be extended to larger tables such as 3 X 2 or 4
X 3 depending on the number of categories in the independent and dependent
variables. The formulae and the computation procedure are similar to that of the
2 X 2 contingency table.
ACTIVITY 10.1
ACTIVITY 10.2
          Yes    No    Total
Urban      36    14     50
Rural      30    25     55
Total      66    39    105
Questions:
What is the null hypothesis? What is the alternative hypothesis?
How many degrees of freedom are there?
What is the value of the chi-square statistic for this table?
What is the p-value of this statistic?
Test Statistics, T = S - n1(n1 + 1)/2, where S is the sum of ranks of Population 1
and n1 is the sample size of Population 1. Population 1 is the population with the
smaller sum of rank values.
The Mann-Whitney test uses the rank sum as the test statistic. The procedure is
as follows:
• The two independent samples are combined and ranks are assigned to the
scores (it can be a mean score).
• The sum of ranks of Population 1 (usually the population of interest, decided
based on the null hypothesis) is computed.
• This rank sum is then used to compute the test statistic.
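The procedure above can be sketched in Python. Tied scores receive the average of the ranks they span; the two small samples are made up for illustration:

```python
def average_ranks(values):
    """Map each distinct value to its average rank (ties share a rank)."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    return rank_of

def mann_whitney_T(sample1, sample2):
    """T = S - n1(n1 + 1)/2, where S is the rank sum of sample 1."""
    ranks = average_ranks(sample1 + sample2)
    S = sum(ranks[x] for x in sample1)
    n1 = len(sample1)
    return S - n1 * (n1 + 1) / 2

# Made-up preference scores for two independent groups
print(mann_whitney_T([12, 15, 15, 20], [18, 22, 25]))
```

A very small T (far below the value expected under the null hypothesis) signals that one group's scores tend to be systematically lower than the other's.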
Example:
In assessing the effect of TV advertisements on buyers’ brand preference, a
simple experiment was carried out. A group of adults was selected to participate
in this experiment. One group was subjected to a behaviour modification
psychotherapy using a series of television advertisements while another formed
the control group. Seventeen adults were given the treatment, while 10 others did
not receive any treatment. After the treatment period, both the experimental and the
control group were rated for their brand preference using the brand preference
scale. Refer to Figure 10.3.
We wish to know whether these data provide sufficient evidence to indicate that
behaviour modification psychotherapy using TV advertisements improves the
brand preference among adult shoppers.
The Hypothesis
Ho: There is no difference in the brand preference between the group that
received behaviour modification therapy and the control group.
Ha: There is a difference in the brand preference between the group that received
behaviour modification therapy and the control group.
The level of significance is set at 0.05 (α = 0.05). Table 10.9 presents the Result
of Analysis on brand preference scores of treatment and control groups.
• Ranking of the scores by arranging all the scores from both groups in
ascending order.
• A rank of 1 is given to the smallest score, and tied scores share the same rank.

T = S - n1(n1 + 1)/2 = 81.5 - 10(10 + 1)/2 = 26.5

p = 0.003
Example of SPSS output of the Mann-Whitney Test (refer to Figure 10.4 below).
Since the p-value is smaller than 0.05, reject the null hypothesis and conclude the
alternative hypothesis. There is a difference in the brand preference between the
group that received behaviour modification therapy and the group that did not.
The brand preference score of the group that received behaviour modification
therapy is significantly different compared to the group that did not receive any
therapy. From the mean rank, it is evident that the brand preference score for the
group that received behaviour modification therapy is higher. In other words, the
behaviour modification psychotherapy using TV advertisement enhances brand
preference among adults.
The Hypothesis
Ho : There is no difference between the male and female hospital staff’s
knowledge.
Ha: There is a significant difference between the male and female hospital
staff’s knowledge.
SPSS Command
Test Statistics, H = [12 / (N(N + 1))] × Σ (Ri² / ni) - 3(N + 1)

where N is the total sample size, Ri is the sum of ranks of group i, ni is the size of
group i, and the sum is taken over the k groups.
Example:
In studying the average amount spent on mobile phone usage, a researcher
collected the average monthly mobile phone bills from three groups of adults:
clerical staff, supervisors and managers. Table 10.11 presents the data.
Table 10.11: Data
Average monthly expenditure on mobile phone bill
Clerical 257 302 206 318 449 334 299 149 282 351
Supervisor 460 496 450 350 463 357
Manager 338 767 202 833 632
Objective:
To determine whether there is any difference in the average monthly mobile
phone expenditure among the three populations.
The Hypothesis
Ho: There is no difference in the average monthly expenditure on mobile phone
usage among clerks, supervisors and managers.
H1: There are differences in the average monthly expenditure on mobile phone
usage among clerks, supervisors and managers.
The level of significance is set at 0.05 (α = 0.05). Table 10.12 shows the results
of the analysis.
H = [12 / (N(N + 1))] × Σ (Ri² / ni) - 3(N + 1)
  = [12 / (21(21 + 1))] × (69²/10 + 90²/6 + 72²/5) - 3(21 + 1)
  = 8.36
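The same computation can be sketched in Python, using the rank sums (69, 90, 72) and group sizes (10, 6, 5) from the worked example:

```python
def kruskal_wallis_H(rank_sums, group_sizes):
    """H = 12/(N(N+1)) * sum(Ri^2 / ni) - 3(N + 1)."""
    N = sum(group_sizes)
    total = sum(R ** 2 / n for R, n in zip(rank_sums, group_sizes))
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

# Rank sums and sample sizes for clerks, supervisors and managers
H = kruskal_wallis_H([69, 90, 72], [10, 6, 5])
print(round(H, 2))
```

A quick check on the inputs: the three rank sums add to 231, which is the sum of ranks 1 to 21 for the 21 subjects, as required.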
SPSS Output
The Kruskal-Wallis χ² value is 8.361 and the p-value is 0.015. Since the p-value
is smaller than 0.05, reject the null hypothesis and conclude the alternative
hypothesis. There is a difference in the average monthly expenditure on mobile
phone usage among the three groups. Even though the test statistic does not
provide information on the differences in the average monthly expenditure,
judging from the mean ranks, clerks spend the least compared to supervisors and
managers.
The Hypothesis
Ho: There is no difference in the assessment of hospital staff knowledge among
public sector employees, private sector employees, and students.
SPSS Command
Results
Ranks

Test Statisticsa,b

              Knowledge of staff (assessment before attending seminar)
Chi-Square    1.694
df            2
Asymp. Sig.   .429

a. Kruskal Wallis Test
b. Grouping Variable: Employment

Since the p-value is 0.429, which is greater than 0.05, there is no difference in
the assessment of hospital staff knowledge among public sector employees,
private sector employees, and students.
ACTIVITY 10.3
The following data summarise students’ PASS or FAIL results in a
mathematics test on fractions and the method used to teach the concept:

            Mathematics Test Performance
Group         Pass    Fail
Method X        5      21
Method Y        9      29
• There are two categories of statistical tests: (i) the parametric and (ii)
non-parametric tests.
• The parametric or distribution constraint tests are statistical tests that require
the distribution of the population to be specified.
• Among the commonly used non-parametric tests are chi-square test, Mann-
Whitney Test and Kruskal-Wallis test.
• The chi-square test tests for significant differences in proportions and is very
useful when the variable measured is nominal.
• The chi-square is very flexible and mainly used in two forms (i) comparing
the observed proportion with some known values, and (ii) comparing the
difference in distribution of proportions between two groups whereby each
group can have two or more categories.
• The Kruskal-Wallis test serves the same purpose as the one way ANOVA,
comparing the differences between more than two groups of samples from
unrelated populations. This test uses the median as the parameter for
comparisons.
• The Kruskal-Wallis test is used when the sample size is small and/or when the
level of measurement is ordinal.
Appendix A
After you have developed your questionnaire, you need to create an SPSS data file
to enable you to enter data into a format which can be read by SPSS. You can do
this via the SPSS Data Editor which is inbuilt into the SPSS package. When
creating an SPSS data file, your items/questions in the questionnaire will have to
be translated into variables. For example, if you have a question “What is your
occupation?” and this question has several response options such as 1. Salesman
2. Clerk 3. Teacher 4. Accountant 5. Others; what you need to do is to translate
your question into a variable name, perhaps called occu. In the context of SPSS
data entry, these response options are called value labels, for example Salesman is
assigned a value label of 1, Clerk 2, Teacher 3, Accountant 4 and Others 5. If the
respondent is a teacher, you enter 3 when inputting data into the variable occu in
your data file. Sometimes you may have a question which requires the respondent
to state in absolute terms such as “Your annual salary is _________” In this case,
you can create a variable name called salary. Since this variable only requires the
respondent to state his/her salary, you do not need to create response options –
just enter the actual salary figure.
When defining the variable name, you have to consider the following:
(i) it can only have a maximum of 8 characters (however version SPSS 12.0
and above allows up to 64 characters);
(ii) it must begin with a letter;
(iii) it cannot end with a full stop or underscore;
(iv) it must be unique, i.e. no duplication is allowed;
(v) it cannot include blanks or special characters such as !, ?, ”, and *.
When defining a variable name, uppercase and lowercase characters are treated
as the same; variable names are not case-sensitive.
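The naming rules above can be captured in a small validator. This is an illustrative sketch of the rules as listed, not an official SPSS routine; the 8-character limit applies to versions before SPSS 12.0:

```python
import re

def is_valid_variable_name(name, max_length=8):
    """Check a candidate name against the rules listed above.
    max_length is 8 for classic SPSS, 64 for version 12.0 and above."""
    if not name or len(name) > max_length:
        return False
    if not name[0].isalpha():                  # must begin with a letter
        return False
    if name.endswith(".") or name.endswith("_"):
        return False
    # no blanks or special characters such as !, ?, ", *
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_.]*", name) is not None

print(is_valid_variable_name("occu"))         # a valid name
print(is_valid_variable_name("1occu"))        # invalid: starts with a digit
print(is_valid_variable_name("occupation1"))  # invalid: over 8 characters
```

Uniqueness among the names already defined would be checked separately, remembering that the comparison is case-insensitive.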
Besides understanding the variable name convention and value labels, you will
also need to know other variable definitions such as variable label, variable type,
missing values, column format and measurement level. A variable label describes
the variable name, for example, if the variable name is occu, the variable label can
be “Respondent’s occupation”. You need not specify the variable label if do not
wish to but variable label improves the interpretability of your output especially if
you have many variables. Missing values can also be assigned to a variable. It is
rare for one to obtain a questionnaire without any item being left blank. By
convention, a missing value is usually assigned a value of 9 but for statistical
analysis it would be preferable to assign a value which is equivalent to the mean
of the variable to fill up all the missing values. However, this can only be done for
interval or ratio level variables. For example, if you have the variable income and
data were derived from 150 respondents but 20 did not provide their income
information, then compute the mean of income via SPSS for the 130 respondents
who responded, and recode all missing values as the computed mean value.
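The mean-substitution approach described above can be sketched as follows. The income figures and the choice of 9 as the missing-value code are made up for illustration:

```python
def impute_with_mean(values, missing_code=9):
    """Replace the missing-value code with the mean of the observed values.
    Suitable only for interval or ratio level variables."""
    observed = [v for v in values if v != missing_code]
    mean = sum(observed) / len(observed)
    return [mean if v == missing_code else v for v in values]

# Made-up monthly incomes; 9 marks a missing response
incomes = [3200, 9, 4100, 2800, 9, 3500]
print(impute_with_mean(incomes))
```

Note that mean substitution keeps every case in the analysis but shrinks the variable's variance, so it should be used with care.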
The type of variable relates closely to your items in the questionnaire. For
example, the item age is a numeric variable, meaning you can input the variable
using only numbers such as if a person’s age is 34 then you can type 34 under the
age variable column for this particular case. However, sometimes there is a need
to use alphanumeric characters to input data into a variable. A good example is
respondent’s address. In this case, alphanumeric characters constitute what is
called a string variable type. For example, a short open-ended question will be
“Please state your address.” The respondent will write his/her address using
alphanumeric characters such as 23 Jalan SS2/75, 47301 Petaling Jaya, Selangor.
So this address is actually a combination of alphabets and numbers.
The column format in the data editor allows you to specify the alignment of your
data in a column, for example left, centre or right. Measurement in the SPSS
variable definition convention differs slightly from that used in the statistics
textbook as SPSS uses scale to refer to both interval and ratio measurement.
Ordinal and nominal levels of measurement are maintained as they are. In
statistical analysis, it is extremely important to know what the level of
measurement for a particular variable is. A nominal variable (also called
categorical variable) classifies persons or objects into two or more categories,
for example, the variable gender is categorised as 1 for Male and 2 for Female,
marital status as 1 for Single, 2 for Married and 3 for Divorced. Numbering in
nominal variables does not indicate that one category is higher or better than
another, for example, representing 1 for Male and 2 for Female does not mean
that male is lower than female by virtue of the number being smaller. In nominal
measurement the numbers are only labels. On the other hand, an ordinal variable
not only classifies persons or objects; it also ranks them in terms of degree.
Ordinal variables put persons or objects in order from highest to lowest or from
most to least. In ordinal scale, intervals between ranks are not equal, for
example, the difference between rank 1 and rank 2 is not necessarily the same as
the difference between rank 2 and rank 3. For example, a person (A) with a
height of 5’ 10” who falls under rank 1 does not have the same interval as a
person (B) with a height of 5’ 5” who is ranked 2 and another person (C) with a
height of 4’ 8” who is ranked 3. The difference in height among the three
persons is not equal but there is an order, i.e. A is taller than B and B is taller
than C.
Interval variables have all the characteristics of nominal and ordinal variables but
also have equal intervals. For example, an achievement test is treated as an
interval variable. The difference between a score of 50 and a score of 60 is
essentially the same as the difference between a score of 80 and a score of 90.
Interval scales, however, do not have a true zero point. Thus, if Ahmad has a
score of 0 for Mathematics, it does not mean he has no knowledge of
mathematics at all, nor does Muthu scoring 100 mean he has total knowledge of
Mathematics. Thus, if a person scores 90 marks we know he scores twice as high
as one who scores 45, but we cannot say that a person scoring 90 knows twice as
much as a person scoring 45.
Ratio variables are the highest, most precise level of measurement. This type of
variable has all the properties of the other types of variables above. In addition, it
has a true zero point. For example a person’s height – a person who is 6 feet tall is
twice as tall as a person who is 3 feet tall. A person who weighs 50 kg is one third
the weight of another who is 150 kg. Since ratio scales encompass mostly physical
measures they are not used very often in social science research.
How to define variables and enter data using the SPSS Data Editor?
Steps
1. Click Start → All Programs → SPSS for Windows → SPSS 12.0 for
Windows → select Type in data → OK → Variable View → Start defining
your variables by specifying the following:
(a) Name: Type Gender <Enter>
(b) Type: Select Numeric OK
(c) Width: 8
(d) Decimal: 0
(e) Label: Respondent’s gender
(f) Values: Under Value, type 1; under Value Label, type Male; Click
Add
(g) Under Value again, type 2; under Value Label, type Female
(h) Click Add
(i) Missing: No missing values → OK
(j) Columns: 8
(k) Align: Right
(l) Measure: Nominal
2. Proceed to define the second variable and so forth until you have completed
all variables in your questionnaire. Do note that certain variables such as ID
do not have value labels. If you are not sure what the level of measurement
for that particular variable is, you may want to keep the default which is
Scale. Do remember that if the particular variable you are defining shares the
same specification, such as the variable label, with a variable you have already
defined, then you may merely copy it into the relevant cells.
3. After you have completed defining all your variables, the next step is to
enter data into the data cells by doing the following:
(a) Click Data View
(b) Click row 1, column 1 (note the variable name as shown)
(c) Type in the data e.g. if the respondent’s gender is male, then type 1
and then proceed to the next variable by pressing the right arrow key
(→) on your keyboard.
(d) Input the next variable and so forth until you have completed all
your data input.
OR
Thank you.