STATISTICS FOR EDUCATIONAL RESEARCH
Prof Dr John Arul Philips
Summary
Key Terms
Topic 5: t-test
5.1 What is the t-test?
5.2 Hypothesis Testing Using the t-test
5.3 t-test for Independent Means
5.4 t-test for Independent Means Using SPSS
5.5 t-test for Dependent Means
5.6 t-test for Dependent Means Using SPSS
Summary
Key Terms
Summary
Key Terms
Appendix
Learning Package
In this Learning Module you are provided with TWO kinds of course materials:
1. The Course Guide you are currently reading
2. The Course Content (consisting of 10 topics)
Course Synopsis
To enable you to achieve the FOUR objectives of the course, HMEF5113 is
divided into 10 topics. Specific objectives are stated at the start of each topic,
indicating what you should be able to do after completing the topic.
Topic 1: Introduction
The topic introduces the meaning of Statistics and explains the
difference between descriptive and inferential statistics. As
inferential statistics is used to make inferences about the
population on specific variables based on a sample, this topic also
explains the meanings of different types of variables and
highlights the different sampling techniques in educational
research.
Topic 5: t-test
This topic explains what the t-test is and its use in hypothesis
testing. It also highlights the assumptions for using the t-test. Two
types of t-test are elaborated in the topic. The first one is the t-test
for independent means, while the second one is the t-test for
dependent means. Computation of the t-statistic using formulae,
as well as the SPSS procedures, is explained.
Topic 8: Correlation
This topic explains the concept of linear relationship between
variables. It discusses the use of statistical tests to determine
correlation and demonstrates how to compute correlation between
variables using SPSS and interpret correlation results.
To help you read and understand the individual topics, numerous realistic
examples support all definitions, concepts and theories. Diagrams and text are
combined into a visually appealing, easy-to-read module. Throughout the course
content, diagrams, illustrations, tables and charts are used to reinforce important
points and simplify the more complex concepts. The module has adopted the
following features in each topic:
INTRODUCTION
Lists the headings and subheadings of each topic to provide an overview of the
contents of the topic and prepare you for the major concepts to be studied and
learned.
LEARNING OUTCOMES
This is a listing of what you should be able to do after successful
completion of a topic. In other words, whether you are able to explain,
compare, evaluate, distinguish, list, describe, relate and so forth. You
should use these indicators to guide your study. When you have finished a
topic, you must go back and check whether you have achieved the learning
outcomes or are able to do what is required of you. If you make a habit of
doing this, you will improve your chances of understanding the contents of
the course.
Copyright © Open University Malaysia (OUM)
SELF-CHECK
ACTIVITY
The main ideas of each topic are listed in brief sentences to provide a review of
the content. You should ensure that you understand every statement listed. If
you do not, go back to the topic and find out what you do not know.
Key Terms discussed in the topic are placed at the end of each topic to make you
aware of the main ideas. If you are unable to explain these terms, you should go
back to the topic to clarify.
DISCUSSION QUESTIONS:
At the end of each topic, a list of questions is presented that are best solved
through group interaction and discussion. You can answer the questions
individually. But, you are encouraged to work with your coursemates and
discuss online and during the seminar sessions.
At the end of each topic a list of articles and titles of books is provided that is
directly related to the contents of the topic. As far as possible, the articles and
books suggested for further reading will be available in OUM's Digital Library
(which you can access) and OUM's Library. Also, relevant Internet resources are
made available to enhance your understanding of selected curriculum concepts
and principles as applied in real-world situations.
Facilitator
Your facilitator will mark your assignment. Do not hesitate to discuss during the
seminar session or online if:
You do not understand any part of the course content or the assigned
readings
You have difficulty with the self-tests and activities
You have a question or problem with the assignment.
(e) When you have completed the topic, review the learning outcomes to
confirm that you have achieved them and are able to do what is
required.
(f) If you are confident, you can proceed to the next topic. Proceed topic
by topic through the course and try to pace your study so that you
keep yourself on schedule.
(g) After completing all topics, review the course and prepare yourself for
the final examination. Check that you have achieved all topic learning
outcomes and the course objectives (listed in this Course Guide).
FINAL REMARKS
Once again, welcome to the course. To maximise your gain from this course
you should try at all times to relate what you are studying to the real world.
Look at the environment in your institution and ask yourself whether the ideas
discussed apply. Most of the ideas, concepts and principles you learn in this
course have practical applications. It is important to realise that much of what
you learn here can be applied in your own professional practice.
We wish you success with the course and hope that you will find it interesting,
useful and relevant in your development as a professional. We hope you will
enjoy your experience with OUM and we would like to end with a saying by
Confucius: "Education without thinking is labour lost."
INTRODUCTION
This guide explains the basis on which you will be assessed in this course during
the semester. It contains details of the facilitator-marked assignments, final
examination and participation required for the course.
One element in the assessment strategy of the course is that all students should
have the same information as facilitators about the answers to be assessed.
Therefore, this guide also contains the marking criteria that facilitators will use in
assessing your work.
Please read through the whole guide at the beginning of the course.
ACADEMIC WRITING
(a) Plagiarism
(i) What is Plagiarism?
Any written assignment (essays, projects, take-home exams, etc.)
submitted by a student must not be deceptive regarding the abilities,
knowledge or amount of work contributed by the student. There are
many ways that this rule can be violated. Among them are:
(c) Referencing
All sources that you cite in your paper should be listed in the Reference
section at the end of your paper. Here is how you should do your
Reference.
ASSESSMENT
Please refer to myVLE.
INTRODUCTION
This topic introduces the meaning of statistics and explains the difference between
descriptive and inferential statistics. As inferential statistics is used to make
inference about the population on specific variables based on a sample, this topic
also explains the meanings of different types of variables and highlights the
different sampling techniques in educational research.
"The science of learning from data. Statistics is essential for the proper
running of government, central to decision making in industry, and a core
component of modern educational curricula at all levels."
Note that the word "mathematics" is mentioned in two of the definitions above,
while "science" is stated in the other definition. Some students are afraid of
mathematics and science. These students feel that since they are from the fields of
humanities and social sciences, they are weak in mathematics. Being terrified of
mathematics does not just happen overnight. Chances are that you may have had
bad experiences with mathematics in earlier years (Kranzler, 2007).
Fear of mathematics can lead to a defeatist attitude which may affect the way you
approach statistics. In most cases, the fear of statistics is due to irrational beliefs.
Just because you had difficulty in the past, does not mean that you will always
have difficulty with quantitative subjects. You have come this far in your
education and by doing this course in statistics, it is not likely that you are an
incapable person.
You have to convince yourself that statistics is not a difficult subject and you need
not worry about the mathematics involved. Identify your irrational beliefs and
thoughts about statistics. Are you telling yourself: "I'll never be any good in
statistics," "I'm a loser when it comes to anything dealing with numbers," or
"What will other students think of me if I do badly?"
For each of these irrational beliefs about your abilities, ask yourself what evidence
is there to suggest that "you will never be good in statistics" or that "you are weak
at mathematics." When you do that, you will begin to replace your irrational
beliefs with positive thoughts and you will feel better. You will realise that your
earlier beliefs about statistics are the cause of your unpleasant emotions. Each
time you feel anxious or emotionally upset, question your irrational beliefs. This
may help you to overcome your initial fears.
Keeping this in mind, this course has been written by presenting statistics in a
form that appeals to those who fear mathematics. Emphasis is on the applied
aspects of statistics and with the aid of a statistical software called Statistical
Package for the Social Sciences (or better known as SPSS), you need not worry
too much about the intricacies of mathematical formulas. Computations of
mathematical formulas have been kept to a minimum. Nevertheless, you still need
to know about the different formulas used, what they mean and when they are
used.
Descriptive statistics includes the construction of graphs, charts and tables and the
calculation of various descriptive measures such as averages (e.g. mean) and
measures of variation (e.g. standard deviation). The purpose of descriptive
statistics is to summarise, arrange and present a set of data in such a way that
facilitates interpretation. Most of the statistical presentations appearing in
newspapers and magazines are descriptive in nature.
Inference is the act or process of deriving a conclusion based solely on what one
already knows. In other words, you are trying to reach conclusions that extend
beyond data obtained from your sample towards what the population might think.
You are using methods for drawing and measuring the reliability of conclusions
about a population based on information obtained from a sample of the
population. Among the widely used inferential statistical tools are t-test, analysis
of variance, Pearson’s correlation, linear regression and multiple regression.
As you proceed through this course, you will obtain a more thorough
understanding of the principles of descriptive and inferential statistics. You should
establish the intent of your study. If the intent of your study is to examine and
explore the data obtained for its own intrinsic interest only, the study is
descriptive. However, if the information is obtained from a sample of a population
and the intent of the study is to use that information to draw conclusions about the
population, the study is inferential. Thus, a descriptive study may be performed on
a sample as well as on a population. Only when an inference is made about the
population, based on data obtained from the sample, does the study become
inferential.
SELF-CHECK 1.1
1. Define statistics.
2. Explain the differences between descriptive and inferential statistics.
3. When would you use the two types of statistics?
4. Explain two ways in which descriptive statistics and inferential
statistics are interrelated.
1.3 VARIABLES
Before you can use a statistical tool to analyse data, you need to have data which
have been collected. What is data? Data is defined as pieces of information which
are processed or analysed to enable interpretation. Quantitative data consist of
numbers, while qualitative data consist of words and phrases. For example, the
scores obtained from 30 students in a mathematics test are referred to as data. To
explain the performance of these students you need to process or analyse the
scores (or data) using a calculator or computer or manually. We collect and
analyse data to explain a phenomenon. A phenomenon is explained based on the
interaction between two or more variables. The following is an example of a
phenomenon:
What is a Variable?
A variable is a construct that is deliberately and consciously invented or adopted
for a special scientific purpose. For example, the variable “Intelligence” is a
construct based on observation of presumably intelligent and less intelligent
behaviours. Intelligence can be specified by observing and measuring using
intelligence tests, as well as interviewing teachers about intelligent and less
intelligent students. Basically, a variable is something that “varies” and has a
value. A variable is a symbol to which are assigned numerals or values. For
example, the variable “mathematics performance” is assigned scores obtained
from performance on a mathematics test and may vary or range from 0 to 100.
When you use any statistical tool, you should be very clear on which variables
have been identified as independent and which are dependent variables.
Put another way, the dependent variable (DV) is the variable that is predicted,
whereas the independent variable is the variable from which the prediction is
made. The DV is the presumed effect, which varies with changes or variation in
the independent variable.
Thus, it is essential that you stipulate clearly how you have defined variables
specific to your study. For example, in an experiment to determine the
effectiveness of the discovery method in teaching science, the researcher will have
to explain in great detail the variable “discovery method” used in the experiment.
Even though there are general principles of the discovery method, its application
in the classroom may vary. In other words, you have to define the variable
operationally or how it is used in the experiment.
SELF-CHECK 1.2
1. What is a variable?
2. Explain the differences between a continuous variable and
nominal variable.
3. Why should variables be operationally defined?
1.5 SAMPLING
Every day, we make judgments and decisions based on samples. For example,
when you pick a grape and taste it before buying the whole bunch of grapes, you
are doing a sampling. Based on the one grape you have tasted, you will make the
decision whether to buy the grapes or not. Similarly, when a teacher asks a student
two or three questions, he is trying to determine the student’s grasp of an entire
subject. People are not usually aware that such a pattern of thinking is called
sampling.
• Population (Universe) is defined as an aggregate of people, objects, items,
etc. possessing common characteristics. It is a complete group of people,
objects, items, etc. about which we want to study. Every person, object, item,
etc. has certain specified attributes. In Figure 1.2, the population consists of #,
$, @, & and %.
• Sample is that part of the population or universe which we select for the
purpose of investigation. The sample is used as an "example" and in fact the
word sample is derived from the Latin exemplum, which means example. A
sample should exhibit the characteristics of the population or universe; it
should be a "microcosm," a word which literally means "small universe." In
Figure 1.2, the sample also consists of one #, $, @, & and %.
The study of a sample offers several advantages over a complete study of the
population. Why and when is it desirable to study a sample rather than the
population or universe?
• In most studies, investigation of the sample is the only way of finding out
about a particular phenomenon. In some cases, due to financial, time and
physical constraints, it is practically impossible to study the whole population.
Hence, an investigation of the sample is the only way of making a study.
• If one were to study the population, then every item in the population is
studied. Imagine having to study 500,000 Form 5 students in Malaysia!
Imagine what the costs would be! Even if you have the money and time to
study the entire population of Form 5 students in the country, it may take so
much time that the findings will be of no use by the time they become available.
• Studying the population may not be necessary, since we have sound sampling
techniques that will yield satisfactory results. Of course, we cannot expect
from a sample exactly the same answer that might be obtained from studying
the whole population.
• However, by using statistics, we can establish, based on the results obtained
from a sample, the limits within which the true answer lies, with a known
probability.
• We are able to generalise logically and precisely about different kinds of
phenomena which we have never seen simply based upon a sample of, say,
200 students.
ACTIVITY 1.1
1. What is the difference between a population and a sample?
2. Why is a study of the population practically impossible?
3. “The sample should be representative of the population.” Explain.
4. Provide a scenario of your own, in which a sample is not
representative.
5. Explain why a sample of 30 doctors from Kuala Lumpur taken to
estimate the average income of all Kuala Lumpur residents is not
representative.
Suppose, for example, there are 10,000 Form 1 students in a particular district and
you want to select a simple random sample of 500 students. When you select the
first case, each student has one chance in 10,000 of being selected. Once that
student is selected, the next student to be selected has a 1 in 9,999 chance of
being selected. Thus, as each case is selected, the probability of being selected
next changes slightly, because the population from which we are selecting has
become one case smaller.
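The module's SPSS-free arithmetic above can be cross-checked in a few lines of Python (a stand-in for SPSS, used here only for illustration); the population and sample sizes are the ones from the example.

```python
from fractions import Fraction

# The document's example: a simple random sample of 500 drawn without
# replacement from a population of 10,000 Form 1 students.
population_size = 10_000

# Probability that any given student is picked on the first draw.
p_first = Fraction(1, population_size)          # 1/10000

# After one student is removed from the pool, the probability for the
# next draw changes slightly, as described above.
p_second = Fraction(1, population_size - 1)     # 1/9999

print(p_first, p_second)
```

Using `Fraction` keeps the probabilities exact instead of rounding them to decimals.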
Say, for example, you choose line 3 and begin your selection. You will select
student #265, followed by student #313 and student #492. When you come to
‘805’ you skip the number because you only need numbers between 1 and 500.
You proceed to the next number, i.e. student #404. Again you skip ‘550’ and
proceed to select student #426. You continue until you have selected all 500
students to form your sample. To avoid repetition, you also eliminate numbers
that have occurred previously. If you have not found enough numbers by the time
you reach the bottom of the table, you move over to the next line or column.
SELF-CHECK 1.3
ACTIVITY 1.2
1. Briefly discuss how you would select a sample of 300 teachers from a
population of 5,000 teachers in a district using systematic sampling.
2. What are some advantages of using systematic sampling?
ACTIVITY 1.3
For example, in a particular district there are 10,000 households clustered into 25
sections. In cluster sampling, you draw a random sample of five sections or
clusters from the list of 25 sections or clusters. Then, you study every household
in each of the five sections or clusters. The main advantage of cluster sampling is
that it saves time and money. However, it may be less precise than simple random
sampling.
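The two-stage procedure above (randomly pick whole clusters, then study every unit inside them) can be sketched as follows; the section and household names are hypothetical, chosen to match the 10,000-household, 25-section example.

```python
import random

# Minimal sketch of cluster sampling for the example above.
def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters, then take EVERY unit inside them."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [household for c in picked for household in clusters[c]]

# 25 sections of 400 households each (10,000 households in total).
sections = {f"section_{i}": [f"household_{i}_{j}" for j in range(400)]
            for i in range(25)}

sample = cluster_sample(sections, n_clusters=5, seed=42)
print(len(sample))   # 5 clusters x 400 households = 2000
```

Note that only the clusters are sampled randomly; within a chosen cluster, every household is included, which is why cluster sampling can be less precise than simple random sampling of the same size.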
To use SPSS, you have to create the SPSS data file. Once this data file is created
and data entered, you can run statistical procedures to generate your statistical
output. Refer to Appendix A at the end of this module on how to go about
creating this SPSS data file.
• Descriptive statistics include the construction of graphs, charts and tables and
the calculation of various descriptive measures such as averages (means) and
measures of variation (standard deviations).
• Operational definition means that variables used in the study must be defined
as it is used in the context of the study.
• Population (universe) is defined as an aggregate of people, objects, items, etc.
possessing common characteristics, while sample is that part of the population
or universe we select for the purpose of investigation.
• In cluster sampling, the unit of sampling is not the individual but rather a
natural group of individuals.
INTRODUCTION
This topic introduces the different descriptive statistics, namely the mean, the
median, the mode and the standard deviation, and how they are computed. SPSS
procedures on how to obtain these descriptive statistics are also provided.
Graphical methods are better suited than numerical methods for
identifying patterns in the data, while numerical approaches are more precise
and objective.
2.2.1 Mean
Mean and the standard deviation are the most widely used statistical tools in
educational and psychological research. Mean is the most frequently used
measure of central tendency, while standard deviation is the most frequently used
measure of variability or dispersion.
The mean or X̄ (pronounced "X bar") is the figure obtained when the sum of all
the items in the group is divided by the number of items (N). Say for example you
have the score of 10 students on a science test.
In the computation of the mean, every item counts. As a result, extreme values at
either end of the group or series of scores severely affect the value of the mean.
The mean can be "pulled towards" these extreme scores, which may give a
distorted picture of the group or series of scores or data.
However, in general, the mean is a good measure of central tendency for roughly
symmetric distributions but can be misleading in skewed distributions (see the
example on page 20) since it can be greatly influenced by extreme scores.
2.2.2 Median
Median is the score found at the exact middle of the set of values. One way to
compute the median is to list all scores in ascending order and then locate the
score in the centre of the sample. For example, if we order the following seven
scores as shown below, we would get:
Score 25 is the median because it represents the halfway point for the distribution
of scores.
There are eight scores. The fourth score (20) and the fifth score (20) represent the
halfway point. Since both of these scores are 20, the median is 20.
If the two middle scores had different values, you would have to interpolate to
determine the median by adding the two values and dividing the sum by 2.
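Python's `statistics.median` applies exactly this rule, so it can be used to check hand computations; the score lists below are hypothetical, chosen to match the descriptions above (seven scores with a middle value of 25, and eight scores whose two middle values are both 20).

```python
import statistics

# Odd-sized set: the median is the single middle value after sorting.
odd_scores = [10, 15, 20, 25, 30, 35, 40]       # hypothetical seven scores

# Even-sized set: the two middle values are averaged (interpolated).
even_scores = [10, 15, 18, 20, 20, 25, 30, 35]  # both middle scores are 20

print(statistics.median(odd_scores))    # 25
print(statistics.median(even_scores))   # 20.0
```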
2.2.3 Mode
Mode is the most frequently occurring value in the set of scores. To determine the
mode, you might again order the scores as shown below and then count each one.
The most frequently occurring value is the mode. In our example, the value 15
occurs three times and is the mode. In some distributions, there is more than one
modal value. For instance, in a bimodal distribution there are two values that
occur most frequently.
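Both the single mode described above and the bimodal case can be checked with the standard library; the data below are hypothetical, built so that 15 occurs three times as in the example.

```python
import statistics

# 15 occurs three times, so it is the mode.
scores = [12, 15, 15, 15, 18, 20, 20, 25]
print(statistics.mode(scores))        # 15

# A bimodal distribution has two most-frequent values;
# multimode returns all of them.
bimodal = [1, 1, 2, 3, 3]
print(statistics.multimode(bimodal))  # [1, 3]
```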
If the distribution is truly normal (i.e. bell-shaped), the mean, median and mode
are all equal to each other.
The mean and median are two common measures of central tendencies of a
typical score in a sample. Which of these two should you use when describing
your data? It depends on your data. In other words, you should ask yourself
whether the measure of central tendency you have selected gives a good
indication of the typical score in your sample. If you suspect that the measure of
central tendency selected does not give a good indication of the typical score, then
you most probably have chosen the wrong one.
The mean is the most frequently used measure of central tendency and it should
be used if you are satisfied that it gives a good indication of the typical score in
your sample. However, there is a problem with the mean. Since it uses all the
scores in a distribution, it is sensitive to extreme scores.
Example: the mean of the scores 20, 22, 25, 26, 30, 31, 33, 40 and 42 is
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 42) ÷ 9 = 29.89
If we were to change the last score from 42 to 70, see what happens to the mean:
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 70) ÷ 9 = 33.00
Obviously, this mean is not a good indication of the typical score in this set of
data. The extreme score has changed the mean from 29.89 to 33.00. If these were
test scores, it may give the impression that students performed better in the later
test when in fact only one student scored highly.
If you find that you have an extreme score and you are unable to use the mean,
then you should use the median. The median is not sensitive to extreme scores. If
you examine the above example, the median is 30 in both distributions. The
reason is simply that the median score does not depend on the actual scores
themselves beyond putting them in ascending order. So the last score in a
distribution could be 80, 150 or 5,000 and the median still would not change. It is
this insensitivity to extreme scores that makes the median useful when you cannot
use the mean.
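The comparison above uses the module's own numbers and can be reproduced directly; only the use of Python (rather than SPSS or hand calculation) is an addition here.

```python
import statistics

# The document's data: changing one extreme score moves the mean
# noticeably but leaves the median untouched.
scores = [20, 22, 25, 26, 30, 31, 33, 40, 42]
with_outlier = [20, 22, 25, 26, 30, 31, 33, 40, 70]

print(round(statistics.mean(scores), 2))        # 29.89
print(round(statistics.mean(with_outlier), 2))  # 33.0
print(statistics.median(scores))                # 30
print(statistics.median(with_outlier))          # 30
```

This is the insensitivity to extreme scores that makes the median the safer choice for skewed data.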
2.3.1 Range
Range is simply the highest value minus the lowest value. For example, in a
distribution, if the highest value is 36 and the lowest is 15, the range is 36 – 15 = 21.
Standard deviation makes use of the deviations of the individual scores from the
mean. Then, each individual deviation is squared to avoid the problem of plus
and minus. Standard deviation is the most often used measure of variability or
variation in educational and psychological research.
S = √[ Σ (Xᵢ − X̄)² / (n − 1) ]   or equivalently   S = √[ Σ (X − X̄)² / (N − 1) ]
X      X − X̄          (X − X̄)²
23     23 − 25 = −2     4
22     22 − 25 = −3     9
26     26 − 25 = +1     1
21     21 − 25 = −4     16
30     30 − 25 = +5     25
24     24 − 25 = −1     1
20     20 − 25 = −5     25
27     27 − 25 = +2     4
25     25 − 25 =  0     0
32     32 − 25 = +7     49

X̄ = 25                 Σ (X − X̄)² = 134

Std. Deviation = √[ 134 / (N − 1) ] = √(134 / 9) = 3.8586
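The worked example can be verified step by step in Python (an illustrative cross-check of the hand computation, not part of the module's SPSS procedures):

```python
import math

# The worked example's ten scores: mean = 25 and the sum of squared
# deviations = 134, giving S = sqrt(134/9) ≈ 3.8586.
scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]
n = len(scores)

mean = sum(scores) / n                        # 25.0
ss = sum((x - mean) ** 2 for x in scores)     # sum of squared deviations, 134.0
s = math.sqrt(ss / (n - 1))                   # sample standard deviation

print(mean, ss, round(s, 4))   # 25.0 134.0 3.8586
```

The same value is returned by `statistics.stdev(scores)`, which uses the identical n − 1 formula.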
In Class B (Figure 2.2), there is low variance or a small standard deviation, which
explains why most of the scores are clustered around the mean. Most of the scores
are "bunched" within ±3 of the mean; if the mean is 50, approximately 95% of the
students scored between 47 and 53.
ACTIVITY 2.1
Below are the scores obtained by students in two classes on a history test:
Class A marks: 15, 25, 20, 20, 18, 22, 16, 24, 28, 12
Class B marks: 10, 30, 13, 27, 16, 24, 5, 35, 28, 12
(a) Compute the mean of the two classes.
(b) Compute the standard deviation of the two classes.
(c) Explain the implication of differences in standard deviations.
2.4.1 Tables
Tables can contain a great deal of information but they also take up a lot of space
and may overwhelm readers with details. How should tables be presented in a
manner that can be easily understood? In general, frequency tables are best for
variables with different numbers of categories (see Table 2.2).
Table 2.2 summarises the responses of 13 teachers with regard to the teaching of
sex education in secondary school.
• The first column contains the values or categories of the variables (opinion
on teaching sex education in schools – extent of agreement).
• The frequency column indicates the number of respondents in each category.
• The percent column lists the percentage of the whole sample in each
category. These percentages are based on the total sample size, including
those who did not answer the question. Those who did not answer will be
shown as missing cases in this column.
• The valid percent column contains the percentage of those who gave a valid
response to the question that belongs to each category. When there are no
missing cases, the valid percent column is similar to the percent column.
2.5 GRAPHS
Graphs are widely used in describing data. However, they should be used
appropriately, as graphs can easily become cluttered, confusing and downright
misleading.
values (line graphs). Which units are used depends on the level of
measurement of the variable being graphed.
• In the example in Figure 2.3, the X-axis represents the students’ gain scores
after undergoing an innovative instructional programme.
• The Y-axis, which appears either in percentages or frequencies, as in Figure
2.3, shows the frequency of students who obtained the various scores
indicated in the X-axis.
• Interpretation of the graph on “Students’ Gain Scores”:
– A total of 275 students obtained between 1 and 5 marks as a result of
the innovative instructional programme; 199 obtained between 6 and
10 marks; 77 between 11 and 15 marks; and 28 between 16 and 20
marks.
– The number of students who obtained high gain scores decreases
gradually.
2.5.2 Histogram
Histograms are different from bar charts because they are used to display
continuous variables (see the histogram in Figure 2.4).
Figure 2.4: Percentage who agreed that sex education should be taught
in secondary schools
• The X-axis represents the different age groups, while the Y-axis represents the
percentages of respondents.
• Each bar in the X-axis represents one age group in ascending order.
• The Y-axis in this case represents the percentages of respondents in the Sex
Education survey.
• Interpretation of the graph “Sex Education Should be Taught in Secondary
School”:
– Among the 18 to 28 age group, only 20% agreed that sex education should
be taught in schools compared to 60% in the 51 to 61 age group.
– About 40% in the 40 to 50 age group and 50% among the 29 to 39 age
group agreed that sex education should be taught in secondary schools.
– Only 10% of those aged 73 years and older agreed that secondary school
students should be taught sex education.
The line graph in Figure 2.5 shows the frequency of using the library among a
group of male and female respondents. The level of measurement of the Y-axis
variable is ordinal or interval. Line graphs are more suitable for variables that
have more than five or six categories. They are less suited for variables with a
very large number of values as this can produce a very jagged and confusing
graph.
Since a separate line is produced for each category of the x variable, only x
variables with a small number of categories should be used. This will normally
mean that the x variable is a nominal or ordinal variable.
ACTIVITY 2.2
Interpret the line graph (Figure 2.5) showing the frequency of a group of
respondents visiting the library. A separate line is used for male and
female respondents.
• Mean, median and mode are common descriptive statistics used to measure
central tendency, while standard deviation is the commonly used statistic to
measure variability or dispersion of data.
• Graphs are also used to condense large sets of data and these include the use
of bar charts, histograms and line graphs.
INTRODUCTION
This topic explains what normal distribution is and introduces the graphical as
well as the statistical techniques used in assessing normality. It also presents SPSS
procedures for assessing normality.
While some argue that in the real world, scores or observations are seldom
normally distributed, others argue that in the general population, many variables
such as height, weight, IQ scores, reading ability, job satisfaction and blood
pressure turn out to have distributions that are bell-shaped or normal.
Fortunately, these statistical tests work very well even if the distribution is only
approximately normally distributed. Some tests work well even with very wide
deviations from normality. They are described as “robust” tests that are able to
tolerate the lack of a normal distribution.
A normal distribution is symmetric and centred at the mean of the variable, and its
spread depends on the standard deviation of the variable. The larger the standard
deviation, the flatter and more spread out is the distribution.
As you can see, the distribution is symmetric. If you folded the graph in the
centre, the two sides would match, i.e. they are identical.
A normal distribution can have any mean and standard deviation. However, the
percentage of cases or individuals falling within one, two or three standard
deviations from the mean is always the same. The shape of a normal distribution
does not change. Means and standard deviations will differ from variable to
variable but the percentage of cases or individuals falling within specific intervals
is always the same in a true normal distribution.
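The constant percentages within one, two and three standard deviations can be checked numerically. A minimal sketch in Python using `scipy.stats.norm` (the module itself uses SPSS; this is an illustrative aside):

```python
from scipy.stats import norm

# Proportion of any normal distribution lying within k standard deviations
# of its mean; the result does not depend on the particular mean or SD.
within = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%, the familiar percentages for a true normal distribution.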
ACTIVITY 3.1
1. What is meant by the statement that a population is normally
distributed?
2. Two normally distributed variables have the same means and the
same standard deviations. What can you say about their distributions?
Explain your answer.
3. Which normal distribution has a wider spread: the one with mean 1
and standard deviation 2 or the one with mean 2 and standard
deviation 1? Explain your answer.
4. The mean of a normal distribution has no effect on its shape. Explain.
5. What are the parameters for a normal curve?
sample is reasonably large and it comes from a normal population, its distribution
should look more or less normal.
What does it mean? It means that more students were getting low scores in
the test and this indicates that the test was too difficult. Alternatively, it
could mean that the questions were not clear or the teaching methods and
materials did not bring about the desired learning outcomes.
Refer to Figure 3.4 which shows the distribution of the scores obtained by
students on a test. There is a negative skew because it has a longer tail in the
negative direction or to the left (towards the lower values on the horizontal
axis).
What does it mean? It means that more students were getting high scores on
the test. This may indicate that either the test was too easy or the teaching
methods and materials were successful in bringing about the desired
learning outcomes.
(i) Low Kurtosis: Data with low kurtosis tend to have a flat top near the
mean rather than a sharp peak.
(ii) High Kurtosis: Data with high kurtosis tend to have a distinct peak
near the mean, decline rather rapidly and have a heavy tail.
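Skewness and kurtosis can be quantified rather than only judged by eye. A hedged illustration in Python (the sample sizes, seed and simulated data here are arbitrary, not from the module):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=100_000)       # symmetric, bell-shaped
skewed_sample = rng.exponential(size=100_000)  # long tail to the right

normal_skew = skew(normal_sample)      # near 0 for a symmetric distribution
right_skew = skew(skewed_sample)       # clearly positive (positive skew)
normal_kurt = kurtosis(normal_sample)  # "excess" kurtosis: near 0 for normal
heavy_kurt = kurtosis(skewed_sample)   # positive: distinct peak, heavy tail
```

Note that scipy reports excess kurtosis, so a normal distribution scores about 0 rather than 3.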
Copyright © Open University Malaysia (OUM)
40 TOPIC 3 NORMAL DISTRIBUTION
(ii) WHISKERS
The smallest and largest observed values within the distribution are
represented by the horizontal lines at either end of the box, commonly
referred to as whiskers.
The two whiskers indicate the spread of the scores.
Scores that fall outside the upper and lower whiskers are classified as
extreme scores or outliers. If the distribution has any extreme scores,
i.e. 3 or more box lengths from the upper or lower hinge, these will be
represented by a circle (o).
Outliers should prompt us to ask why a score is so extreme. Could it be that
you made an error in data entry?
Why is it important to identify outliers? This is because many of the
statistical techniques used involve calculation of means. The mean is
sensitive to extreme scores and it is important to be aware whether
your data contain such extreme scores if you are to draw conclusions
from the statistical analysis conducted.
sample is from a normal distribution, then the observed values or scores fall
more or less in a straight line. The normal probability plot is formed by:
• Vertical axis: Expected normal values
• Horizontal axis: Observed values
SPSS Procedures
1. Select Analyze from the main menu.
2. Click Descriptive Statistics and then Explore.....to open the Explore
dialogue box.
3. Select the variable you require (i.e. mathematics score) and click on
the arrow button to move this variable to the Dependent List: box.
4. Click the Plots....command push button to obtain the Explore: Plots
sub dialogue box.
5. Click the Histogram check box and the Normality plots with tests
check box and ensure that the Factor levels together radio button is
selected in the Boxplots display.
6. Click Continue.
7. In the Display box, ensure that the Both option is selected so that both
statistics and plots are produced.
8. Click the Options....command push button to open the Explore:
Options sub-dialogue box.
9. In the Missing Values box, click on the Exclude cases pairwise radio
button. If this option is not selected then, by default, cases with
missing data on any variable will be excluded from the analysis. That is,
plots and statistics will be generated only for cases with complete data.
10. Click on Continue and then OK.
Note that these commands will give you the 'Histogram', 'Stem-and-leaf
plots', 'Boxplots' and 'Normality Plots'.
When you use a normal probability plot to assess the normality of a variable,
you must remember that judging whether the plot is roughly linear, and hence
whether the distribution is normal, is subjective. The graph in Figure 3.10 is
an example of a normal probability plot. Though none of the values falls
exactly on the line, most of the points are very close to it.
• Values that are above the line represent units for which the observation
is larger than its normal score
• Values that are below the line represent units for which the observation
is smaller than its normal score
Note that there is one value that falls well outside the overall pattern of the
plot. It is called an outlier and you will have to remove the outlier from the
sample data and redraw the normal probability plot.
Even with the outlier, the values are close to the line and you can conclude
that the distribution will look like a bell-shaped curve. If the normal scores
plot departs only slightly from having all of its dots on the line, then the
distribution of the data departs only slightly from a bell-shaped curve. If one
or more of the dots departs substantially from the line, then the distribution
of the data is substantially different from a bell-shaped curve.
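The same plot can be produced outside SPSS. A minimal sketch in Python using `scipy.stats.probplot`; the scores below are simulated purely for illustration:

```python
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=200)  # simulated test scores

# probplot pairs each ordered observation with its expected normal quantile
# and fits a straight line; r close to 1 means the dots hug the line.
(expected_q, ordered_obs), (slope, intercept, r) = probplot(scores)

print(f"correlation of plot with straight line: r = {r:.3f}")
```

Passing `plot=plt` (a matplotlib axes) to `probplot` draws the actual graph; an r substantially below 1 signals a departure from the straight-line pattern.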
Outliers:
Refer to the normal probability plot in Figure 3.11. Note that there are
possible outliers which are values lying off the hypothetical straight line.
Outliers are anomalous values in the data which may be due to recording
errors, which may be correctable, or they may be due to the sample not
being entirely from the same population.
ACTIVITY 3.2
In general, both statistical tests and graphical plots should be used to determine
normality. However, the assumption of normality should not be rejected on the
basis of a statistical test alone. In particular, when the sample is large, statistical
tests for normality can be sensitive to very small (i.e. negligible) deviations in
normality. Therefore, if the sample is very large, a statistical test may reject the
assumption of normality when the data set, as shown using graphical methods, is
essentially normal and the deviation from normality is too small to be of practical
significance.
DISTRIBUTION: NORMAL
• If the Kolmogorov-Smirnov test yields a significance level of less than (<)
0.05, it means that the distribution is NOT normal.
• However, if the Kolmogorov-Smirnov test yields a significance level
of more than (>) 0.05, it means that the distribution is normal.
Kolmogorov-Smirnov(a)
          Statistic   df     Sig.
SCORE     .21         1598   .000*
* This is a lower bound of the true significance
a. Lilliefors Significance Correction
DISTRIBUTION: NORMAL
• Reject the assumption of normality if the test of significance reports a
p-value of less than (<) 0.05.
• DO NOT REJECT the assumption of normality if the test of significance
reports a p-value of more than (>) 0.05.
Table 3.1 shows the Kolmogorov-Smirnov statistic for assessing normality.
NOTE:
It should be noted that with large samples, even a very small deviation from
normality can yield low significance levels. So a judgment still has to be made as
to whether the departure from normality is large enough to matter.
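The decision rule above is mechanical and can be written down directly. A small sketch (a hypothetical helper function, not SPSS output):

```python
def ks_normality_decision(sig, alpha=0.05):
    """Apply the rule from the text to a Kolmogorov-Smirnov Sig. value."""
    return "reject normality" if sig < alpha else "do not reject normality"

# Table 3.1 case: SCORE, Sig. = .000  ->  distribution is NOT normal
print(ks_normality_decision(0.000))
# Activity 3.3 table: Sig. = .200  ->  normality is not rejected
print(ks_normality_decision(0.200))
```

As the NOTE warns, with very large samples a rejection here should still be weighed against the graphical evidence before deciding the departure matters.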
ACTIVITY 3.3
Kolmogorov-Smirnov(a)
          Statistic   df    Sig.
SCORE     0.57        999   .200*
INTRODUCTION
The topic explains the difference between the null and alternative hypotheses and
their use in research. It also introduces the concepts of Type I error and Type II
error. It illustrates the difference between the two-tailed and one-tailed tests and
explains when they are used in hypothesis testing.
Next, you hypothesise that "the car did not start because the spark plugs are dirty."
You check the spark plugs to determine if they are dirty. You find that the spark
plugs are indeed dirty. You do not reject the hypothesis.
All these are examples of hypotheses. However, these statements are not
particularly useful because of words such as "may," "tend to" and "more likely."
Using these tentative words does not suggest how you would go about proving it.
To solve this problem, a hypothesis should state:
• Two or more variables that are measurable
• An independent and dependent variable
• A relationship between two or more variables
• A possible prediction
Examine the hypothesis in Figure 4.1. It has all the attributes mentioned:
• The variables are "critical thinking" and "gender," which are both measurable.
• The independent variable is "gender" which can be manipulated as “male”
and “female”; and the dependent variable is "critical thinking."
• There is a possible relationship between the gender of undergraduates and
their critical thinking skills.
• It is possible to predict that males may be better in critical thinking compared
to females or vice-versa.
ACTIVITY 4.1
1. Rewrite the four hypotheses using the formalised style shown. Ensure
that each hypothesis has all the attributes stated.
2. Write two more original hypotheses of your own using this form.
Say, for example, you conduct an experiment to test the effectiveness of the
discovery method in learning science compared to the lecture method. You select
a random sample of 30 students for the discovery method group and 30 students
for the lecture method group (see Topic 1 on Random Sampling).
Based on your sample, you hypothesise that there are no differences in science
achievement between students in the discovery method group and students in the
lecture method group. In other words, you make the claim that there are no
differences in science scores between the two groups in the population. This is
represented by the following two types of null hypotheses with the following
notation or Ho:
Ho: μ1 = μ2 OR Ho: μ1 – μ2 = 0
Based on the findings of the experiment, you found that there was a significant
difference in science scores between the discovery method group and the lecture
method group.
In fact, the mean score of subjects in the discovery method group was HIGHER
than the mean of subjects in the lecture method group. What do you do?
• You REJECT the null hypothesis because earlier you had said they would be
equal.
• You reject the null hypothesis in favour of the ALTERNATIVE
HYPOTHESIS (i.e. μ1 ≠ μ2).
SELF-CHECK 4.1
1. What is the meaning of a null hypothesis?
2. What do you mean when you "reject" the null hypothesis?
3. What is the alternative hypothesis?
4. What do you mean when you "accept" the alternative hypothesis?
Type 1 Error is the error you are likely to make when you examine your data and
say that "Something is happening here!" For example, you conclude that "There is
a difference between males and females." In fact, there is no difference between
males and females in the population.
Type 2 Error is the error you are likely to make when you examine your data and
say "Nothing is happening here!” For example, you conclude that "There is no
difference between males and females." In fact, there is a difference between
males and females in the population.
Ho: μ1 = μ2 OR Ho: μ1 – μ2 = 0
The null hypothesis can be true or false and you can reject or not reject the null
hypothesis. There are four possible situations which arise in testing a hypothesis
and they are summarised in Figure 4.2.
                      Ho is TRUE          Ho is FALSE
Do Not Reject Ho:     Correct Decision    Risk committing
[Say it is TRUE]      [no problem]        Type 2 Error
Reject Ho:            Risk committing     Correct Decision
[Say it is FALSE]     Type 1 Error        [no problem]
In other words, when you detect a difference in the sample you are studying and a
difference is also detected in the population, you are OK. When there is no
difference in the sample you are studying and there is no difference in the
population you are OK.
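The meaning of Type 1 error can be made concrete with a small simulation: draw two samples from the same population (so Ho is true) many times and count how often a t-test wrongly declares a difference. A sketch in Python with an arbitrary seed and sample sizes:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
alpha, trials = 0.05, 2000
false_rejections = 0

for _ in range(trials):
    a = rng.normal(size=30)  # both groups come from the SAME population,
    b = rng.normal(size=30)  # so any "significant" result is a Type 1 error
    if ttest_ind(a, b).pvalue < alpha:
        false_rejections += 1

type1_rate = false_rejections / trials
print(f"Type 1 error rate: {type1_rate:.3f}")  # hovers around alpha
```

The rate lands near 0.05 because alpha is, by construction, the probability of committing a Type 1 error.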
ACTIVITY 4.3
You can use the logic of hypothesis testing in the courtroom. A student
is being tried for stealing a motorcycle. The judicial system is based on
the premise that a person is "innocent until proven guilty." It is the court
that must prove based on sufficient evidence that the student is guilty.
Thus, the null and alternative hypotheses would be:
Ho: The student is innocent
Ha: The student is guilty
1. Using the table in Figure 4.2, state the four possible outcomes of the
court's decision.
2. Interpret the Type I and Type II errors in this context.
In your study, you want to determine if females are inferior in spatial thinking
compared to males; i.e. null hypothesis is still Ho: μ1 = μ 2 . But, the alternative
hypothesis is Ha: μ1 < μ 2 . A hypothesis test whose alternative hypothesis has this
form is called a LEFT-TAILED TEST.
In your study, you want to determine if females are better in spatial thinking
compared to males; i.e. null hypothesis is still Ho: μ1 = μ 2 . The alternative
hypothesis is Ha: μ1 > μ 2 . A hypothesis test whose alternative hypothesis has this
form is called a RIGHT-TAILED TEST.
Note:
A hypothesis test is called a ONE-TAILED TEST if it is either left-tailed or right-
tailed; i.e. if it is not TWO-TAILED.
Step 1:
You want to test the following null and alternative hypotheses:
Ho : μ 1 = μ 2
Ha : μ 1 ≠ μ 2
Step 2:
Using the t-test for independent means (which we will discuss in detail in
Topic 5), you obtained a t-value of –1.554. Based on the alternative
hypothesis, you decide that you are going to use a two-tailed test.
Step 3:
If you are using an alpha (α) of .05 for a two-tailed test, you have to divide .05 by
2 and you get 0.025 for each side of the rejection area.
Step 4:
The df = n1 + n2 – 2 = (40 + 42) – 2 = 80. Look up the t table in Table 4.1 and
find that the critical value is 1.990; the graph in Figure 4.3 shows that the
Do Not Reject area ranges from –1.990 to +1.990.
Step 5:
The t-value you have obtained is –1.554 (We will discuss the formula for
computing the t-value in Topic 5). This value does not fall in the Rejection
Region. What is your conclusion? You do not reject Ho. In other words, you
conclude that there is NO SIGNIFICANT DIFFERENCE in spatial thinking
between male and female adolescents. You could also say that the test results are
not statistically significant at the 5% level and provide at most weak evidence
against the null hypothesis.
At α = 0.05, the data do not provide sufficient evidence to conclude that the
mean score on spatial thinking of females is superior to that of males, even
though the mean score obtained by females was higher than that of males.
ACTIVITY 4.4
Step 1:
The null and alternative hypotheses are:
• Ho: μ1 = μ 2 (Mean scores on the economics tests are the same)
• Ha: μ1 > μ 2 (Mean score of the posttest is greater than the mean score of
the pretest)
Step 2:
Decide on the significance level (alpha). Here, you have set it at the 5%
significance level, or alpha (α) = 0.05.
Step 3:
Computation of the test statistic. Using the dependent t-test formula, you obtained
a t-value of 4.711.
Step 4:
The critical value for the right-tailed test is t with df = n – 1. The number of
subjects is n = 10 and α = 0.05. Checking the "Table of Critical Values for the
t-Test" reveals that, for df = 10 – 1 = 9, the critical value is 1.833 (Figure
4.4).
Step 5:
You find that the t-value obtained is 4.711. It falls in the Rejection Region. What is
your conclusion? You reject Ho. In other words, you conclude that there is a
SIGNIFICANT DIFFERENCE in the performance in economics before and after the
treatment. You could also say that the test results are statistically significant at the 5%
level. Put it another way, the p-value is less than the specified significance level of
0.05. (The p-value is provided in most outputs of statistical packages such as SPSS.)
At α = 0.05, the data provide sufficient evidence to conclude that the mean scores
on the posttest are superior to the mean scores obtained on the pretest. Evidently,
teaching students mind mapping enhances their recall of concepts and principles
in economics.
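The same check works for this right-tailed test, where the whole of alpha sits in the upper tail. A sketch with the values from this example:

```python
from scipy.stats import t

# Right-tailed test: all of alpha = 0.05 goes into the upper tail.
critical = t.ppf(1 - 0.05, df=9)  # about 1.833, as in Figure 4.4
t_obtained = 4.711
reject = t_obtained > critical    # True: t falls in the Rejection Region
print(f"critical = {critical:.3f}, reject Ho: {reject}")
```

Since 4.711 far exceeds 1.833, the decision agrees with the text: reject Ho.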
ACTIVITY 4.5
A researcher conducted a study to determine the effectiveness of
immediate feedback on the recall of information in biology. The
experimental group of 30 students was provided with immediate
feedback on the questions that were asked. The control group consisted
of 30 students who were given delayed feedback on the questions asked.
1. Determine the null hypothesis for the hypothesis test.
2. Determine the alternative hypothesis for the hypothesis test.
3. Classify the hypothesis test as two-tailed, left-tailed or right-tailed.
Explain your answer.
• There are two types of error: Type I and Type II errors. Both relate to the
rejection or acceptance of the null hypothesis.
• Type I error is committed when the researcher rejects the null when the null is
indeed true; in other words incorrectly rejecting the null.
• The probability level at which the null is incorrectly rejected is called the
significance level, denoted by the symbol α, a value set a priori (before even
conducting the research) by the researcher.
• Type II error is committed when the researcher fails to reject the null when the
null is indeed false, in other words wrongly accepting the null.
• In any research, the intention of the researcher is to correctly reject the null; if
the design is carefully selected and the samples represent the population, the
chances of achieving this objective are high. Thus, the power of the study is
defined as 1 – β.
INTRODUCTION
This topic explains what the t-test is and its use in hypothesis testing. It also
highlights the assumptions for using the t-test. Two types of t-test are elaborated
in the topic. The first is the t-test for independent means, while the second is the
t-test for dependent means. Computation of the t-statistic using formulae as well as
SPSS procedures is also explained.
For example, a teacher wants to find out whether the Discovery Method of
teaching science to primary schoolchildren is more effective than the Lecture
Method. She conducts an experiment involving 70 primary school children of
whom 35 are taught using the Discovery method and 35 are taught using the
Lecture method. Subjects in the Discovery group score 43.0 marks, while subjects
in the Lecture method group score 38.0 marks on the science test. The Discovery
group does better than the Lecture group. Does the difference between the two
groups represent a real difference or is it due to chance? To answer this question,
the t-test is often used by researchers.
Using the null hypothesis, you begin testing the significance by saying: "There is
no difference in the score obtained in science between subjects in the Discovery
group and the Lecture group."
(a) Ho: μ1 = μ2
OR
(b) Ho: μ1 – μ2 = 0
If you reject the null hypothesis, it means the difference between the two means
has statistical significance. On the other hand, if you do not reject the null
hypothesis, it means the difference between the two means is NOT statistically
significant and the difference is due to chance.
Note:
For a null hypothesis to be accepted, the difference between the two means need
not be equal to zero since sampling may account for the departure from zero.
Thus, you can accept the null hypothesis even if the difference between the two
means is not zero provided the difference is likely to be due to chance. However,
if the difference between the two means appears too large to have been brought
about by chance, you reject the null hypothesis and conclude that a real difference
exists.
ACTIVITY 5.1
1. State TWO null hypotheses in your area of interest that can be tested
using the t-test.
2. What do you mean when you reject or do not reject a null
hypothesis?
(a) Illustration
Say, for example, you conduct a study to determine the spatial reasoning
ability of 70 ten-year-old children in Malaysia. The sample consisted of 35
males and 35 females (see Figure 5.2). The sample of 35 males was drawn
from the population of ten-year-old males in Malaysia and the sample of 35
females was drawn from the population of ten-year-old females in Malaysia.
Note that they are independent samples because they come from two completely
different populations.
Research Question:
"Is there a significant difference in spatial reasoning between male and
female ten-year-old children?"
                   Mean   SD    N    Variance
Group 1: Males     12     2.0   35   4.0
Group 2: Females   10     2.0   35   4.0

t = (12 – 10) / √[ 4.0/(35 – 1) + 4.0/(35 – 1) ]
  = 2 / √(0.1177 + 0.1177)
  = 2 / 0.485
  = 4.124
Note: The t-value will be positive if the mean for Group 1 is larger or more than
(>) the mean of Group 2 and negative if it is smaller or less than (<).
(g) Look up the Table of Critical Values for Student's t-test shown in
Table 5.1 (Note: Only part of the table is given here)
The df is 70 minus 2 = 68. You take the nearest df which is 70 and read the
column for the two-tailed alpha of 0.050.
The t-value you obtained is 4.124. The critical value shown is 1.994. Since
the t-value is greater than the critical value of 1.994, you reject Ho and
conclude that the difference between the means of the two groups is
statistically significant. In other words, males scored significantly higher
than females on the spatial reasoning test.
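The hand computation can be checked from summary statistics alone. A sketch in Python; note that the Group 2 (females) figures used here, a mean of 10 with the same variance and group size as the males, are inferred from the worked numbers and are an assumption about the partially garbled table:

```python
import math

# Summary statistics (Group 2 row is an assumption: mean 10, variance 4.0, n 35)
m1, v1, n1 = 12.0, 4.0, 35  # males
m2, v2, n2 = 10.0, 4.0, 35  # females

# Formula used in the text: each variance is divided by (n - 1)
se = math.sqrt(v1 / (n1 - 1) + v2 / (n2 - 1))  # about 0.485
t_value = (m1 - m2) / se                        # about 4.124
print(f"t = {t_value:.3f}")
```

The result reproduces the t of roughly 4.12 obtained by hand.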
ACTIVITY 5.2
1. Would you reject Ho if you had set the alpha at 0.01 for a two-tailed
test?
2. When do you use the one-tailed test and two-tailed t-test?
(iii) Normality
The data come from a distribution that has one of those nice bell-
shaped curves known as a normal distribution. Refer to Topic 3: The
Normal Distribution, which provides both graphical and statistical
methods for assessing normality of a sample or samples.
"There are no significant differences between the variances of the two groups"
and you set the significance level at .05.
If the Levene statistic is significant, i.e. p < .05, then the null hypothesis is
REJECTED; you accept the alternative hypothesis and conclude that the
VARIANCES ARE UNEQUAL. (The "Equal variances not assumed" row of the
SPSS output is used.)
If the Levene statistic is not significant, i.e. p > .05, then you DO NOT
REJECT the null hypothesis and conclude that the VARIANCES ARE EQUAL.
(The "Equal variances assumed" row of the SPSS output is used.)
The Levene test is robust in the face of departures from normality. The Levene's
test is based on deviations from the group mean.
SPSS provides two options i.e. "homogeneity of variance assumed" and
"homogeneity of variance not assumed" (see Table below).
The Levene test is more robust in the face of non-normality than more
traditional tests like Bartlett's test.
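Outside SPSS, Levene's test is available as `scipy.stats.levene`; `center='mean'` matches the deviations-from-the-group-mean version described here. The data below are simulated, chosen so the variances genuinely differ:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(7)
group_a = rng.normal(0, 1.0, size=200)  # SD 1
group_b = rng.normal(0, 2.0, size=200)  # SD 2: variances clearly unequal

stat, p = levene(group_a, group_b, center='mean')
equal_variances_assumed = p > 0.05  # here False: use "not assumed" output
print(f"Levene p = {p:.4f}")
```

A significant result (p < .05) directs you to the "Equal variances not assumed" row, exactly as described above.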
ACTIVITY 5.3
Output #1:
The “Group Statistics” in Table 5.3 reports the mean values on the variable
(inductive reasoning) for the two different groups (males and females). Here, we
see that the 495 females in the sample had a mean score of 8.99 while the 451
males had a mean score of 7.95 on inductive reasoning. The standard deviation
for the males is 3.46 while that for the females is 3.14, so the scores for the
females are less dispersed than those for the males.
Table 5.3: Mean Values on the Variable (Inductive Reasoning) for the Two Different
Groups (Males and Females)
Group Statistics
Output #2:
Let’s examine this output in two parts:
Firstly, determine whether the data meet the "Homogeneity of Variance"
assumption. You can use Levene's Test and set the alpha at 0.05. The
significance value obtained is 0.030, which is less than (<) 0.05, so you reject
Ho and conclude that the variances are not equal. Hence, you have violated the
"Homogeneity of Variance" assumption and the "Equal Variances Not Assumed"
output should be used. Refer to Figure 5.3.
Interpretation:
t-value
This "t" value tells you how far away from 0, in terms of the number of standard
errors, the observed difference between the two sample means falls. The "t" value
is obtained by dividing the Mean Difference (–1.0468) by the Std. Error (.2146),
which is equal to –4.878.
p-value
If the p-value shown in the "Sig (2-tailed)" column is smaller than your chosen
alpha level, you reject the null hypothesis and argue that there is a real
difference between the populations. In other words, we can conclude that the
observed difference between the samples is statistically significant.
Mean Difference
This is the difference between the means (labelled "Mean Difference") i.e. 7.9512
– 8.9980 = – 1.0468.
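The whole Output #2 pipeline can be sketched in Python. The data below are simulated to echo the Table 5.3 means and SDs (they are not the study's raw data); because Levene indicated unequal variances, Welch's version of the test (`equal_var=False`, SPSS's "Equal variances not assumed") is used:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
females = rng.normal(8.99, 3.14, size=495)  # illustrative data only
males = rng.normal(7.95, 3.46, size=451)

# Welch's t-test: does not assume equal variances
result = ttest_ind(males, females, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

With group means this far apart relative to the standard error, the test reports a negative t (males below females) and a small p-value, mirroring the SPSS interpretation above.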
Example:
Research Questions:
Is there a significant difference in pretest and posttest scores in social studies
for subjects in the discovery method group?
Is there a significant difference in pretest and posttest scores in social studies
for subjects in the chalk and talk group?
Null Hypotheses:
There is no significant difference between the pretest and the posttest for the
discovery method group.
There is no significant difference between the pretest and the posttest for the
chalk and talk group.
Where,
t = t-ratio
D̄ = average difference
ΣD² = difference scores squared, then summed
(ΣD)² = difference scores summed, then squared
N = number of pairs
EXAMPLE:
A researcher conducted a study on personality changes in 15 college women from
Year 1 to Year 4. A 30-item personality test was administered in Year 1 and then
again in Year 4 to the same 15 women. The results of the study are shown in
Table 5.4.
Step 1:
Calculate the mean score for the Year 1 Test by adding up all the Year 1 Test
scores and dividing by the number of subjects. This will give you a mean score
of 18.5. Similarly, calculate the mean score of the Year 4 Test, which gives a
mean score of 20.8.
Step 2:
Next, calculate the standard deviation of the difference scores using the
following formula:

SD = √[ (ΣD² – (ΣD)²/N) / (N – 1) ]
   = √[ (159 – 35²/15) / (15 – 1) ]
   = √[ (159 – 81.67) / 14 ]
   = √5.52
   = 2.35
Step 3:
Applying the t-test for Dependent Means formula, first calculate the effect
size, D̄ / SD, i.e. the mean difference divided by the standard deviation.
The mean difference is 20.8 – 18.5 = 2.3 and the standard deviation is 2.35.
Substituting these values gives 2.3 / 2.35 = 0.979.
To determine the likelihood that the effect size is a function of chance,
calculate the t-ratio by multiplying the effect size by the square root of the
number of pairs:

t = (D̄ / SD) × √N = 0.979 × √15 = 3.79
Step 4:
Having computed the t-value (which is 3.79) you look up the t-value in The Table
of Critical Values for Student's t-test or The Table of Significance which tells us
whether the ratio is large enough to say that the difference between the groups is
significant. In other words, the difference observed is not likely due to chance or
sampling error. Refer to Table 5.5.
Alpha Level
The researchers set the alpha level at 0.05. This means that 5% of the time (five
out of a hundred) you would find a statistically significant difference between the
means even if there is none ("chance"). Since this is a one-tailed test, the entire
0.05 rejection region is placed in a single tail; it is for a two-tailed test that
alpha is split into 0.025 per tail.
Degrees of Freedom
The t-test also requires that we determine the degrees of freedom (df) for the test.
In the t-test, the degrees of freedom are the sum of the subjects or persons which
is 15 – 1 = 14. Given the alpha level, the df and the t-value, you look up in the
Table (available as an appendix in the back of most statistics texts) to determine
whether the t-value is large enough to be significant.
Step 5:
The t-value obtained is 3.79, which is greater than the critical value shown,
2.145. Hence, the null hypothesis (Ho) is rejected and Ha, which states that the
posttest mean is greater than the pretest mean, is accepted. It can be concluded
that the difference between the means is significant. In other words, there is
overwhelming evidence that a "gain" has taken place on the personality inventory
from Year 1 to Year 4 among the women undergraduates.
Again, you do not have to go through this tedious process, as statistical computer
programs such as SPSS provide the significance test results, saving you from
looking them up in a table.
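In Python the same dependent-means test is `scipy.stats.ttest_rel`. The scores below are made up to mirror the pretest/posttest design; they are not the Table 5.4 data:

```python
from scipy.stats import ttest_rel

# Hypothetical pretest/posttest scores for 10 students (illustrative only)
pretest  = [12, 15, 11, 14, 13, 10, 16, 12, 14, 11]
posttest = [15, 18, 14, 15, 16, 13, 19, 14, 17, 12]

# Paired test on the per-student differences; df = n - 1 = 9
result = ttest_rel(posttest, pretest)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

Because every student's scores are paired, the test works on the difference column, exactly as in the hand calculation above.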
To establish the statistical significance of the means obtained on the pretest and
posttest, the repeated measures t-test (also called the dependent-samples or
paired-samples t-test) was computed using SPSS.
Data was collected from the same group of subjects on both conditions and each
subject obtains a score on the pretest, and after the treatment (or intervention or
manipulation), a score on the posttest.
Ho: μ1 = μ2 and Ha: μ1 ≠ μ2
The ‘Paired Samples Statistics’ table above reports the mean values on the
variable (history test) for the pretest and posttest. The posttest mean (13.86)
is higher than the pretest mean (8.50), indicating improved performance on the
history test after the treatment. The standard deviation for the pretest is 3.34,
which is very close to the standard deviation for the posttest, 2.75.
The question remains: Is this mean difference large enough to convince us that
there is a significant difference in performance in history, a consequence of
teaching note-taking techniques?
Paired Differences

                       Mean        Std.       Std. Error  Lower  Upper  t      df  Sig.
                       Difference  Deviation  Mean                                 (2-tailed)
Pair 1  Pretest –
        Posttest       –5.36       2.90       .62         –6.65  –4.08  –8.65  21  .000
t-Value
This "t" value tells you how far away from 0, in terms of the number of standard
errors, the observed difference between the two sample means falls. The "t" value
is obtained by dividing the mean difference (–5.36) by the std. error (.62), which
is equal to –8.65. Refer to Figure 5.4.
p-value
The p-value shown in the "Sig (2 tailed)” column is smaller than your chosen
alpha level (0.05) and so you reject the null hypothesis and argue that there is a
real difference between the pretest and posttest.
In other words, we can conclude, that the observed difference between the two
means is statistically significant.
Mean Difference
This is the difference between the means, i.e. 8.50 – 13.86 = –5.36.
ACTIVITY 5.4
Paired Differences

                     Mean   Std.       Std. Error  Lower  Upper  t      df  Sig.
                            Deviation  Mean                                 (2-tailed)
Pair  Pretest –
      Posttest       –5.36  2.90       .62         –6.65  –4.08  –8.66  21  .000
ACTIVITY 5.5
INTRODUCTION
This topic explains what One-way Analysis of Variance (ANOVA) is about and
the assumptions for using ANOVA in hypothesis testing. It demonstrates how
ANOVA can be computed using the formula and the SPSS procedures. Also
explained are the interpretation of the related statistical results and the use of post-
hoc comparison tests.
an experimental study. Suppose you are interested in comparing the means of three
groups (i.e. k = 3) rather than two.
You might be tempted to use the multiple t-test and compare the means separately;
i.e. you compare the means of Group 1 and 2, followed by Group 1 and 3 and so
forth. What is the danger of doing this? Multiple t-tests enhance the likelihood of
committing Type 1 error (i.e. claiming that two means are not equal, when in fact
they are equal). In other words, you reject a null hypothesis when it is TRUE. On a
practical level, using the t-test to compare many means is a cumbersome process in
terms of the calculations involved.
Example
Let us look at the following example, which shows the results of a study on
Attitude towards Homework among Students of Varying Ability Levels. Subjects
were divided into three groups: High Ability, Average Ability and Low Ability.
The total sample size is 505 students. You need a special class of statistical
techniques called the One-way Analysis of Variance or One-way ANOVA which
we will discuss here.
What do the three means tell you? High ability students have the highest mean
(13.03), while low ability students have the lowest mean (9.54). Meanwhile,
average ability students fall in the middle, with a mean of 11.99.
• What do the three standard deviations tell you? Note that the standard deviation
for high ability (3.17) and average ability (2.93) students are fairly close, while
low ability students have a somewhat bigger standard deviation of 3.50.
• What do the three Standard Errors tell you? Refer to Table 6.1, and you will
notice that there is a column called 'standard error'. What is the standard error?
The standard error is a measure of how much the sample means vary if you
were to take repeated samples from the same population. The first two groups
Copyright © Open University Malaysia (OUM)
contain > 200 students each; the standard error of the mean for each of these
groups is fairly small. It is 0.12 for high ability students and 0.11 for average
ability students. However, the standard error for the low ability group is
comparatively high at 0.40. Why? The smaller number of low ability students
(n = 73) and the larger standard deviation explain why the standard error is
larger.
• What does '95 Pct Conf. Int for Mean' mean? The last column displays the
“confidence interval”. What is the confidence interval? It is the range which is
likely to contain the true population value or mean. If you take repeated
samples of 14-year-old students from the same population of 14-year-old
students in the country and calculate a confidence interval for each sample,
about 95% of those intervals should include the unknown population value or
mean. For example,
you can be 95% confident that, in the population, the mean of high ability
students is somewhere between 12.79 and 13.27. Similarly, you can be 95%
confident that, in the population, the mean of low ability students is somewhere
between 8.73 and 10.36.
• You will notice that the confidence interval is wider for low ability students
(i.e. 1.63) compared to confidence interval for high ability students (i.e. 0.48).
Why? This is due to the larger standard error (0.40) obtained by low ability
students. Since the confidence interval depends on the standard error of the
mean, the confidence interval for low ability students is wider than for high
ability students. So, the larger the standard error, the wider will be the
confidence interval. Makes sense, right?
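The quantities in the bullets above can be reproduced directly: the standard error is the standard deviation divided by the square root of the sample size, and the 95% confidence interval is the mean plus or minus a t critical value times the standard error. A sketch using the reported figures for the low ability group (the t critical value of 1.99 is taken from a t-table and is an assumption here):

```python
import math

# Standard error and 95% CI for the low ability group,
# using the values reported in the text (mean 9.54, SD 3.50, n = 73).
mean, sd, n = 9.54, 3.50, 73
se = sd / math.sqrt(n)          # ≈ 0.41, close to the reported 0.40
t_crit = 1.99                   # from a t-table for n - 1 = 72 df (assumed)
lower = mean - t_crit * se
upper = mean + t_crit * se
print(round(se, 2), round(lower, 2), round(upper, 2))
```

The computed interval agrees closely with the 8.73 to 10.36 range reported in the text; small discrepancies come from rounding the standard error.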
At the heart of ANOVA is the concept of Variance. What is variance? Most of
you would say, it is the standard deviation squared! Yes, that is correct. The focus
is on two types of variance:
• Between-Group Variance, i.e. if there are three groups, it is the variance
between the three groups.
• Within-Group Variance, i.e. if each group has 30 subjects, it is the variance
of the scores among the subjects within each group.
If the F-value is significant, it tells us that the population means are probably not
all equal and you reject the null hypothesis. Next, you have to locate where the
significance lies or which of the means are significantly different. You have to use
post-hoc analysis to determine this.
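As a sketch of this workflow, scipy.stats.f_oneway runs a one-way ANOVA on raw scores. The three score lists below are invented for illustration, not the study's data:

```python
from scipy import stats

# A minimal one-way ANOVA on three hypothetical score samples
# (invented data, not the study's actual scores).
high    = [14, 13, 15, 12, 16, 13, 14]
average = [12, 11, 13, 12, 10, 13, 12]
low     = [9, 10, 8, 11, 9, 10, 8]

f_stat, p_value = stats.f_oneway(high, average, low)
if p_value < 0.05:
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}: "
          "reject H0; follow up with post-hoc tests")
```

A significant F only says the means are probably not all equal; the post-hoc step described above is still needed to locate which pairs differ.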
ACTIVITY 6.1
1. What is the standard error? Why does the standard error vary?
2. Explain "95 Pct Conf. Int for Mean".
The null hypothesis states that the means of high ability, average ability and low
ability students are the same, i.e. each is equal to 4.00.
To test the null hypothesis, the One-way Analysis of Variance is used. The One-
way ANOVA is a statistical technique used to test the null hypothesis that several
population means are equal. The word 'variance' is used because it examines the
variability in the sample. In other words, how much do the scores of individual
students vary from the mean? Based on the variability or variance, it determines
whether there is reason to believe that the population means are not equal. In our
example, does creativity vary between the three groups of 12-year-old students?
The alternative hypothesis states that there is a difference between the three groups
of students (see Figure 6.2). However, the alternative hypothesis does not state
which groups differ from one another. It just says that the means of each group are
not all the same; or at least one of the groups differs from the others.
Are the means really different? We need to figure out whether the observed
differences in the sample means are attributed to just the natural variability among
sample means or whether there is reason to believe that the three groups of
students have different means in the population. In other words, are the differences
due to chance, or is there a 'real' difference?
The following is the summarised formula for computing the F-statistic or F-ratio:
Based on the study (see Table 6.2 for results) about the relationship between
creativity and socio-economic status of the subject, computation of the F-statistics
is as follows:
Degrees of freedom:
This sum of squares has a number of degrees of freedom equal to the number
of groups minus 1. In this case, df = (3-1) = 2
Degrees of freedom:
As in Step 1, we need to adjust the WSS to transform it into an estimate of
population variance, an adjustment that involves a value for the number of
degrees of freedom within. To calculate this, we take a value equal to the
number of cases in the total sample (N = 950), minus the number of groups
(k = 3), i.e. 950 - 3 = 947
Within Mean Squares = WSS / df = 1593.18 / 947 = 1.68
Excerpt from the table of critical values of F at p = 0.05 (df1 = numerator
degrees of freedom, df2 = denominator degrees of freedom):

df2     df1 = 1   df1 = 2   df1 = 3   df1 = 4
96       3.940     3.091     2.699     2.466
97       3.939     3.090     2.698     2.465
98       3.938     3.089     2.697     2.465
99       3.937     3.088     2.696     2.464
100      3.936     3.087     2.696     2.463
120      3.920     3.070     2.680     2.450
Finally, compare the F-statistic (13.34) with the critical value of 3.07. At p =
0.05, the F-statistic is larger than the critical value, and hence there is
strong evidence to reject the null hypothesis, indicating that there is a
significant difference in creativity among the three groups of students. While
the F-statistic assesses the null hypothesis of equal means, it does not address
the question of which means are different. For example, all three groups may
be different significantly, or two may be equal but differ from the third. To
establish which of the three groups are different, you have to follow up with
post-hoc comparison or tests.
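The table lookup can also be done in software. A sketch using scipy.stats.f with df1 = 2 and df2 = 120, the table row used above:

```python
from scipy import stats

# Critical value of F at alpha = .05 for df1 = 2 and df2 = 120
# (the F-table row used in the text when df2 is very large).
critical = stats.f.ppf(0.95, 2, 120)
print(round(critical, 2))  # 3.07, as in the F-table excerpt
assert 13.34 > critical    # observed F exceeds it, so H0 is rejected
```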
Tukey HSD
Tukey's HSD procedure runs a series of post-hoc tests, which are like a
series of t-tests. However, the post-hoc tests are more stringent than
regular t-tests. The HSD value indicates how large an observed difference must be for the
multiple comparison procedure to call it significant. Any absolute difference
between means has to exceed the value of HSD to be statistically significant.
Most statistical programmes will give you an output in the form of a table as
shown above. Group means are listed as a matrix. An asterisk (*) indicates
which pairs of means are significantly different.
Note that only the mean of Group 3 is significantly different from Group 1.
In other words, High SES (Mean = 4.12) subject scored significantly higher
on creativity than Low SES (Mean = 3.85) subjects. There was no significant
difference between High SES and Middle SES subjects nor was there a
significant difference between Middle SES and Low SES subjects.
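As an illustration of the idea, the HSD threshold is the studentized range statistic q times the standard error of a group mean. The q value and the group size below are assumptions for this sketch, not figures from the study:

```python
import math

# Sketch of the HSD threshold: HSD = q * sqrt(MS_within / n_per_group).
# q is the studentized range statistic from a table; for k = 3 groups and
# large within-group df, q(.05) is about 3.31 (table value, assumed here).
q = 3.31            # from a studentized-range table (assumption)
ms_within = 1.68    # within mean squares from the worked example
n_per_group = 300   # hypothetical equal group size
hsd = q * math.sqrt(ms_within / n_per_group)
# Any pair of group means differing by more than hsd is declared significant.
print(round(hsd, 3))
```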
Table 6.3: Means, Skewness and Kurtosis for the Three Groups
Independent Variable Statistic Std. Error
Group
Group 1 Mean 43.82 2.20
Skewness .973 .491
Kurtosis .341 .953
Group 2 Mean 60.14 2.71
Skewness -.235 .597
Kurtosis -1.066 1.154
Group 3 Mean 64.75 3.61
Skewness -.407 .564
Kurtosis -1.289 1.091
The Shapiro-Wilk normality tests indicate that the scores are normally
distributed in each of the three conditions. The Kolmogorov-Smirnov statistic is
significant for Group 1, but that statistic is more appropriate for larger
sample sizes. Refer to Figure 6.3.
Just like the t-test, the Levene's test of homogeneity of variance is used for
the One-way ANOVA and is shown in Figure 6.4. The p-value which is
0.113 is greater than the alpha of 0.05. Hence, it can be concluded that the
variances are homogeneous which is reported as Levene (2, 49) = 2.28, p =
.113.
ACTIVITY 6.2
Procedure for the One-way ANOVA with post-hoc analysis Using SPSS
1. Select the Analyze menu.
2. Click Compare Means and One-Way ANOVA ..... to open the One-Way
ANOVA dialogue box.
3. Select the dependent variable (i.e. inductive reasoning) and click the arrow
button to move the variable into the Dependent List box.
4. Select the independent variable (i.e. SES) and click the arrow button to move
the variable into the Factor box.
5. Click the Options ..... command push button to open the One-Way
ANOVA: Options sub-dialogue box.
6. Click the check boxes for Descriptive and Homogeneity-of-variance.
7. Click Continue.
8. Click the Post Hoc .... command push button to open the One-Way
ANOVA: Post Hoc Multiple Comparisons sub-dialogue box. You will
notice that a number of multiple comparison options are available. In this
example you will use the Tukey's HSD multiple comparison test.
9. Click the check box for Tukey.
10. Click Continue and then OK.
As you may have realised, just by looking at the “Descriptives” table, the
group means cannot tell us decisively if significant differences exist. What is
the next step?
Note that each mean is compared with every other mean twice, so the results are
essentially repeated in the table. Interpreting the table reveals that:
ACTIVITY 6.3
ACTIVITY 6.4
• The one-way ANOVA is used to compare the differences between more than
two groups of samples from unrelated populations.
• Even though ANOVA is used to compare the mean, this test uses the variance
in computing the test statistics.
• This test requires a large sample. Other assumptions are that the population
is normally distributed, the variables are measured at least at the interval
level, and the variances of the groups are equal.
• Between group variances are due to the differences between the groups (could
be due to different treatment etc.), while within group variances are due to
sampling (the differences among the members of the same group).
• Technically, for any comparison between groups, the between group variance
should be large simply because they are different groups while within the
group itself the variances should be low (assuming the members are
homogenous).
• The F-statistic is based on the premise that if different treatments have
different effects (or different groups respond differently due to their inherited
differences), the between group variance is large while the within group
variance (also called the residual variance) is low. If there is any difference
between the groups, the F-value will be high, causing the null hypothesis to be
rejected.
INTRODUCTION
This topic explains what analysis of covariance (ANCOVA) is about and the
assumptions for using it in hypothesis testing. It also demonstrates how to
compute and interpret ANCOVA using SPSS.
Besides prior knowledge, other factors that could complicate the situation include
level of intelligence, attitude, motivation and self-efficacy. The Analysis of
Covariance (ANCOVA) provides a way of measuring and removing the effects of
such initial systematic differences between groups or samples.
EXAMPLE:
A researcher conducted a study with the aim of comparing the effectiveness of the
lecture method and the discussion method in teaching geography (see
Figure 7.1). One group received instruction using the lecture method and another
group received instruction using the discussion method.
For illustration purposes, only four students were randomly assigned to the two
groups (in real-life research, you will certainly have more subjects). The result is
two sets of bivariate measures, one set for each group.
(f) Reliability of the Covariate: The instrument used to measure the covariate
should be reliable. In the case of variables such as gender and age, this
assumption can usually be easily met. However, with other types of
variables such as self-efficacy, attitudes, personality, etc., meeting this
assumption can be more difficult.
Look at the graph in Figure 7.3, which shows regression lines for each group
separately. Notice how the groups differ in mean age. The Graduates, for
instance, have a mean age of 38 and a score of 14 on knowledge of current
events, while the Diploma holders have a mean age of 45 and a score of 12.5.
The subjects with High school qualifications have a mean age of 50 and a score
of 11.5 on the knowledge of current events test. What does this tell you? It is
probably obvious to you that part of the difference in knowledge of current
events is due to the groups having different mean ages.
So you decide to include Age as a covariate and use ANCOVA.
(a) ANCOVA reduces the error variance by removing the variance due to the
relationship between age (covariate) and the dependent variable (knowledge
of current events).
(b) ANCOVA adjusts the means on the covariate for all of the groups,
leading to the adjustment of the means of the dependent variable
(knowledge of current events).
ANCOVA adjusts the knowledge of current events means (y means) to what they
would be if the three groups had the same mean on age (x or covariate).
While ANOVA uses the “real” means of each group to determine if the
differences are significant, ANCOVA uses the Grand Mean. The grand mean is
the sum of the group means divided by the number of groups (i.e. (38 + 45 + 50)
divided by 3, which is about 44). Now, we can see how far each mean is from the
grand mean.
So for the graduates group, ANCOVA does not use the mean age of 38 to find
the mean knowledge of current events. Instead, it gives an estimate of what
the mean knowledge of current events would be if age were held constant (i.e.
if the mean ages of the groups were the same, which in this case is about 44).
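A sketch of this adjustment, using the group means from Figure 7.3. Each group's mean on the dependent variable is shifted to what it would be at the grand mean age. The pooled regression slope b below is a hypothetical value chosen for illustration; in practice ANCOVA estimates it from the data:

```python
# Sketch of the ANCOVA mean adjustment: adjusted = y_mean - b * (x_mean - grand).
b = -0.15  # hypothetical pooled slope of knowledge score on age (assumption)
grand_mean_age = (38 + 45 + 50) / 3  # = 44.33 (the text rounds to 44)
groups = {
    "Graduates":   (14.0, 38),
    "Diploma":     (12.5, 45),
    "High school": (11.5, 50),
}
for name, (y_mean, x_mean) in groups.items():
    adjusted = y_mean - b * (x_mean - grand_mean_age)
    print(name, round(adjusted, 2))
```

Note how the adjustment pulls the groups' knowledge means closer together once their age differences are accounted for.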
Hence, you have to ensure that the regression slopes for each group are parallel. If
the slopes are not parallel, using a procedure that adjusts the means of the groups
to an “average” (the grand mean) does not make sense. Is it possible to have a
sensible grand mean, from three very different slopes as shown in Figure 7.4? The
answer is NO because the differences between the groups are not the same, for
each value of the covariate. So, in this case, the use of ANCOVA would not be
sensible.
A researcher wanted to find out if the critical thinking skills of students can be
improved using the inquiry method when teaching science. A sample of 30
students were selected and divided into the following groups: 13 high ability
subjects, 8 average ability subjects and 13 low ability subjects. A 10-item critical
thinking test was developed by the researcher and administered before the
intervention and after the intervention.
The homogeneity of variance table (Table 7.1) indicates that the variances of the
three groups are similar: since the p-value of 0.500 is greater than .05, the
null hypothesis of equal variances is NOT rejected. Hence, you have not violated
one of the assumptions for using ANOVA.
Table 7.2 shows the means and standard deviations for the three groups of
subjects – low, average and high ability. Although the high ability group subjects
scored 4.84 and low ability subjects scored only 3.22; the difference between the
ability levels is not significant. Therefore, teaching students using the inquiry
method seems to have no significant effect on critical thinking.
Since the p-value reported is .108, which is greater than the alpha level of .05,
Tukey's post hoc comparison test revealed no significant differences between the
three groups of students. Therefore, it is concluded that teaching science using the
inquiry method seems to have no significant effect on critical thinking.
See the ANOVA table with the covariate included. Compare this to the ANOVA
table when the covariate was not included. The format of the ANOVA table is
largely the same as without the covariate (see Table 7.4), except that there is an
additional row of information about the covariate (pretest).
* Significant at p = .05
Looking first at the significance values, it is clear that the covariate (i.e. pretest)
significantly influenced the dependent variable (i.e. posttest), because the
significance values are less than .05. Therefore, performance in the pretest had a
significant influence on the posttest. What is more interesting is that when the
effect of the pretest is removed, teaching science using the inquiry method
becomes significant (p is .037 which is less than .05). There was a significant
effect of the inquiry method of teaching on critical thinking after controlling
for the effect of the pretest, F(2,26) = 4.14, p <.05.
Table 7.5 shows the adjusted means (The Sidak test was used to obtain the
adjusted means). These values should be compared with Table 7.2 to see the
effect of the covariate on the means of the three groups. The results show that low
ability subjects differed significantly from high ability subjects on the critical
thinking test (see Table 7.6). However, there were no significant differences
between average and high ability subjects.
CONCLUSION
This example illustrates how ANCOVA can help us exert stricter experimental
control by taking into account confounding variables to give us a “purer” measure
of the effect of the experimental manipulation. Without taking into account the
pretest, we would have concluded that the inquiry method of teaching science had
no effect on critical thinking of subjects, yet clearly it does.
ACTIVITY 7.1
ACTIVITY 7.2
Refer to the following Table 7.7, which is an SPSS output and answer
the following questions:
1. State the independent variable. Give reasons.
2. Which is the covariate? Explain.
3. State the dependent variable. Give reasons.
4. State a hypothesis for the above results.
5. Do you reject or do not reject the hypothesis stated above?
Covariate
Linearity
Homogeneity of regression
Normality
Homogeneity of variance
Reliability of the covariate
Independence
INTRODUCTION
This topic explains the concept of linear relationship between variables. It
discusses the use of statistical tests to determine correlation and demonstrates
how to compute correlation between variables using SPSS and interpret
correlation results.
For example, if people who exercise regularly nearly always have better
health than those who do not exercise, then exercise and health are
strongly correlated. If those who exercise regularly are just a little more
likely to be healthy than non-exercisers, then the two variables are only
weakly related. The scale in Figure 8.1 shows the strength of the
correlation coefficient.
How high does a correlation coefficient have to be, to be called strong? How
small is a weak correlation? The answer to these questions varies with the
variables being studied. For example, if the literature shows that in previous
research, a correlation of 0.51 was found between variable X and variable Y, but
in your study you obtained a correlation of 0.60; then you might conclude that the
correlation between variable X and Y is strong.
However, Cohen (1988) has provided some guidelines to determine the strength
of the relationship between two variables by providing descriptors for the
coefficients. Keep in mind that in education and psychology, it is rare that the
coefficients will be “very strong” or “near perfect” since the variables measured
are constructs involving human characteristics, which are subject to wide
variation.
Example:
Data was gathered for the following two variables (IQ test and science test) from a
sample of 12 students. Refer to Table 8.1 below.
Table 8.1: Data of Two Variables (IQ Test and Science Test)
Figure 8.2: Scatter Diagram Showing the Relationship between IQ Scores (X-axis)
and Science Score (Y-axis) for 12 Students
See Figure 8.3. If Attitudes (x) and English Achievement (y) have a positive
relationship, then the slope (b) will be a positive number. Lines with positive
slopes go from the bottom left toward the upper right, i.e. an increase from 1 to 2
on the X-axis is followed by an increase from 3 to 3.5 on the Y-axis.
See Figure 8.4. If Attitudes (x) and English Achievement (y) have a negative
relationship, then the slope (b) will be a negative number. Lines with
negative slopes go from the upper left to the lower right. The graph shown
has a slope of –0.5: an increase of 1 on the X-axis is associated with a
decrease of 0.5 on the Y-axis; i.e. an increase from 1 to 2 on the X-axis is
followed by a decrease from 5 to 4.5 on the Y-axis.
If Attitudes (x) and English Achievement (y) have zero relationship (as shown in
Figure 8.5), then there is NO SYSTEMATIC RELATIONSHIP between X and Y.
Here, some students with high Attitude scores have low English scores,
while some students with low Attitude scores have high English scores.
The Pearson Correlation Coefficient (also called the Pearson r) is the commonly
used formula in computing the correlation between two variables. The formula
measures the strength and direction of a linear relationship between variable X
and variable Y. The sample correlation coefficient is denoted by r. The formula
for the sample correlation coefficient is:
r = [ΣXY − (ΣX)(ΣY)/N] / √{[ΣX² − (ΣX)²/N][ΣY² − (ΣY)²/N]}

SSxx = Σx² − (Σx)²/n = 1566 − (135)²/12 = 1566 − 18225/12 = 1566 − 1518.75 = 47.25

SSyy = Σy² − (Σy)²/n = 1139 − (114)²/12 = 1139 − 12996/12 = 1139 − 1083 = 56.00

r = [ΣXY − (ΣX)(ΣY)/N] / √(SSxx × SSyy)
  = 22.50 / √(47.25 × 56.00)
  = 22.50 / 51.44
  ≈ 0.437
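As a check, the same computation in Python using the sums from the worked example (√(47.25 × 56.00) ≈ 51.44, so r ≈ 0.437):

```python
import math

# Recomputing Pearson r from the sums in the worked example.
numerator = 22.50                # ΣXY − (ΣX)(ΣY)/N
ss_xx = 1566 - 135**2 / 12       # = 47.25
ss_yy = 1139 - 114**2 / 12       # = 56.00
r = numerator / math.sqrt(ss_xx * ss_yy)
print(round(r, 3))  # ≈ 0.437
```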
SPSS Procedures:
1. Select the Analyze menu.
2. Click on Correlate and then Bivariate to open the
Bivariate Correlations dialogue box.
3. Select the variables you require (i.e. reading and science) and
click on the arrow button to move the variables into the
Variables: box.
4. Ensure that the Pearson correlation option has been
selected.
5. In the Test of Significance box, select the One-tailed radio
button.
6. Click on OK.
To interpret the correlation coefficient, you examine the coefficient and its
associated significance value (p). The output shows that the relationship between
reading and science scores is significant, with a correlation coefficient of
r = 0.63, p < .05. Thus, higher reading scores are associated with higher scores
in science.
Hence, the null hypothesis is REJECTED which affirms that the two variables are
positively related in the population.
Coefficient of Determination:
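The coefficient of determination is simply the square of the correlation coefficient: the proportion of variance in one variable accounted for by the other. For the reading-science correlation of r = 0.63 reported above:

```python
# Coefficient of determination: r squared, the proportion of shared variance.
r = 0.63
r_squared = r ** 2
print(round(r_squared, 2))  # 0.40: about 40% of the variance is shared
```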
SPSS Procedures:
1. Select the Graph menu.
2. Click on Scatter to open the Scatterplot dialogue box.
3. Ensure Simple Scatterplot option is selected.
4. Click on the Define command push button to open the Simple
Scatterplot sub-dialogue box.
5. Select the first variable (i.e. science) and click on the arrow button to
move the variable into the Y Axis: box.
6. Select the second variable (i.e. reading) and click on the arrow button to
move the variable into the X Axis: box.
7. Click on OK.
As you can see from the scatter plot (Figure 8.7) there is a linear relationship
between reading and science scores. Given that the scores cluster uniformly
around the regression line, the assumption of homogeneity of variance has not
been violated.
rs = 1 − 6Σd² / [n(n² − 1)] = 1 − 6(49) / [12(144 − 1)] = 1 − 294/1716 ≈ 0.83
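The same Spearman computation in Python, using n = 12 pairs and Σd² = 49 as in the example:

```python
# Spearman rank-order correlation from squared rank differences:
# r_s = 1 - 6*Σd² / (n(n² - 1)), with n = 12 and Σd² = 49.
n = 12
sum_d_squared = 49
r_s = 1 - (6 * sum_d_squared) / (n * (n**2 - 1))
print(round(r_s, 3))
```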
SPSS Procedures:
1. Select the Analyze menu.
2. Click on Correlate and then Bivariate to open the Bivariate
Correlations dialogue box.
3. Select the variables you require (i.e. reading and science) and click on
the arrow button to move the variables into the Variables: box.
4. Ensure that the Spearman correlation option has been selected.
5. In the Test of Significance box, select the One-tailed radio button.
6. Click on OK.
Results
Correlations
rq2 rq6
Spearman's rho rq2 Correlation Coefficient 1.000 .507**
Sig. (2-tailed) . .000
N 203 203
rq6 Correlation Coefficient .507** 1.000
Sig. (2-tailed) .000 .
N 203 203
**. Correlation is significant at the 0.01 level (2-tailed).
Does this imply that well-paid teachers "cause" better academic performance of students?
Would academic performance increase if we increased the pay of teachers? It is
dangerous to conclude causation just because there is a correlation or relationship
between the two variables. The correlation by itself tells us nothing about whether
"teachers' salary" causes "achievement".
ACTIVITY 8.1
A researcher conducted a study which aimed to determine the
relationship between self-efficacy and academic performance in
geography. A 20-item self-efficacy scale and a 25-item geography test
was administered to a group of 12 students.
• The linear relationship between two variables is evaluated from two aspects:
the strength of the relationship (correlation), and the cause-effect association
(regression).
• The value for correlation coefficient ranges from –1 to +1. Any value close to
these extremes indicates the strength of the linear relationships in the same or
opposite direction.
• There are two methods for computing the correlation coefficient: the Pearson
correlation and the Spearman Rank Order correlation. The latter is the non-
parametric equivalent of the former and is used when the data is measured at an
ordinal level or when the sample size is small.
• The correlation coefficient computed from the sample indicates the strength of
the relationship in the sample. To generalise a linear relationship to the
population, a significance test needs to be performed.
INTRODUCTION
This topic explains the concept of causal relationship between variables. It
discusses the use of statistical tests to determine slope, intercept and the
regression equation. It also demonstrates how to run regression analysis using
SPSS and interpret the results.
Y = a + bX
where Y is the dependent variable, X is the independent variable, and a and b are
two constants to be estimated.
Basically, regression is a technique for fitting the best-fitting straight line to
represent a cluster of points (see Figure 9.1). The points are defined in a two-
dimensional plane. The straight line expresses the linear association between the
variables studied. It is a useful technique for establishing a cause-effect
relationship between variables and for forecasting future results or outcomes. An
important consideration in linear regression analysis is that the researcher must
identify the 'independent' and 'dependent' variables prior to the analysis.
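A minimal sketch of fitting such a line by least squares with NumPy; the x and y values are invented for illustration:

```python
import numpy as np

# Fitting the best straight line through a cluster of points by least squares.
# The data here is invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # dependent variable

b, a = np.polyfit(x, y, deg=1)  # slope b and intercept a of y = a + bx
print(round(b, 2), round(a, 2))
```

polyfit returns the coefficients from highest degree down, so the first value is the slope b and the second is the intercept a in Y = a + bX.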
Slope

b = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²]

Y-intercept

a = (ΣY − bΣX) / n
Example:
A research was conducted at TESCO Hypermarket to determine if there is a
cause-effect relationship between the sales and expenditure on advertisements.
Table 9.1 illustrates the computation of the regression coefficients.
The regression equation for the relationship between Sales and Expenditure on
advertisements is:
Example
If the researcher would like to test the hypothesis that there is a true relationship
between sales and expenditure on advertising, the following procedures need to be
adhered to.
The researcher performs the ANOVA for the linear relationship between sales and
expenditure on advertising. The result is shown in Table 9.2.
Table 9.2: The Results of the ANOVA for Simple Linear Regression between Sales and
Expenditure on Advertising
ANOVA
df SS MS F p-value
Regression 1 254.65 254.65 13.46 0.01
Residual 9 170.22 18.91
Total 10 424.88
F-value is 13.46
P-value is 0.01
Since the p-value is smaller than 0.05, we reject the null hypothesis in favour of
the alternative. There is a linear relationship between the variables studied.
From the data it is evident that there is a linear relationship between sales and
expenditure on advertising.
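The F-value in Table 9.2 can be verified from the table's own entries: F is the regression mean square divided by the residual mean square.

```python
# Reading the regression ANOVA table: F = MS_regression / MS_residual.
ms_regression = 254.65     # SS 254.65 with df 1
ms_residual = 170.22 / 9   # ≈ 18.91
f_ratio = ms_regression / ms_residual
print(round(f_ratio, 2))   # ≈ 13.46, matching the table
```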
Now, we can proceed to the test of significance for the regression slope.
Note : For simple linear regression where there is only one independent variable,
if linear relationship is ‘proven’ the significance test for the slope will show
‘significant departure from zero’.
Requirements
Parameter to be tested: regression slope, b
Normality: the sample statistic (in this case, b) follows a normal distribution.
Sample size: large
Recommended test: t-test for the regression slope.
Test statistic: t = b / SE(b)
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.
The researcher performs the t-test for regression slope for the linear relationship
between expenditure on advertisement and sales. The result is shown in Table 9.3.
Table 9.3: The Results of the T-test to Test the Significance of the Regression Slope
t-value is 3.82
p-value is 0.005
Since the p-value is smaller than 0.05, we reject the null hypothesis in favour of
the alternative. The regression slope is not equal to zero: there is a true
relationship between the variables studied. Sales is linearly related to expenditure
on advertisement. The regression coefficient for this relationship is:
The R2 is 0.599, meaning that 59.9% of the variation in Sales is attributed to the
variation in Expenditure on advertising.
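The R² of 0.599 follows directly from the ANOVA table in Table 9.2, since R² is the regression sum of squares over the total sum of squares:

```python
# R² = SS_regression / SS_total, using the values from Table 9.2.
ss_regression = 254.65
ss_total = 424.88
r_squared = ss_regression / ss_total
print(round(r_squared, 3))  # 0.599: 59.9% of the variation in Sales
```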
SPSS Procedures:
Results
Ho: The variation in the dependent variable is not explained by the linear model
(R² = 0).
Ha: A significant portion of the variation in the dependent variable is explained
by the linear model (R² ≠ 0).
Since the p-value is less than 0.05, reject the null hypothesis and conclude that a
significant portion of the variation in the dependent variable is explained by the
linear model. Refer to Figure 9.3.
The R² is 0.306, indicating that about 30.6% of the variation in the customers'
satisfaction can be attributed to changes in the respondents' perception of
employees' knowledge.
The next step is to test the significance of the slope. In simple linear regression,
if the global hypothesis shows that there is a significant linear relationship
between the dependent and independent variables, the significance test for the
slope will also provide evidence that it is significantly different from zero.
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.
Since the p-value is less than 0.05, reject the null hypothesis and conclude that the
regression slope is not equal to zero. Thus,
Example
A researcher is interested in determining the various factors that contribute to the
sales of a newly introduced hair shampoo. Among the crucial factors that he
wishes to study are cost for TV advertisement, training of sales executives,
employing promoters, distribution of free samples, and leasing the prime spots at
hypermarkets and supermarkets.
TV : TV advertisement cost
Train : Training of sales executives cost
Promoters : Cost for employing promoters
Free samples : Cost for distributing free samples
Prime spot : Cost for leasing prime spots at hyper and supermarkets
The researcher performs the ANOVA for the linear relationship between
sales and all the defined predictor variables. The result for it is shown in
Table 9.4.
Since the p-value is smaller than 0.05, reject the null hypothesis in favour
of the alternative. There is a linear relationship between the variables
studied. From the analysis it is evident that there is a linear relationship
between the sales and the combination of the predictor variables.
The next step is the test of significance for the regression slope (for every
independent [predictor] variable). This is to determine the contribution of
each predictor variable independently.
(c) Requirements
The researcher performs the t-test for regression slopes for the linear
relationship between Sales and the following variables:
(i) Costs for TV advertisements;
(ii) Training of sales executives;
(iii) Employing promoters;
(iv) Distributing free samples; and
(v) Leasing prime spots.
Table 9.5: The Results of the T-test to Test the Significance of the Regression Slopes
The p-value is smaller than 0.05 for (i) costs for TV advertisements,
(ii) employing promoters and (iii) distributing free samples; hence the slopes
for these three predictors are significantly different from zero.
The regression model for this relationship between Sales and costs of
advertisements is:
The adjusted R² is 0.254, meaning that 25.4% of the variation in the sales is
attributed to the combined variation in the costs for TV advertisement,
employing promoters and distributing free samples.
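Adjusted R² corrects R² for the number of predictors k relative to the sample size n. A sketch of the standard formula; the R², n and k values passed in below are assumptions for illustration, not the study's figures:

```python
# Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1): penalises R² for adding
# predictors, so it only rises when a predictor genuinely helps.
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model: R² = 0.30 with n = 50 cases and k = 3 predictors.
print(round(adjusted_r_squared(0.30, 50, 3), 3))
```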
The Hypothesis
Ho: The variation in patients' overall satisfaction is not explained by the linear
model comprising patients' assessment of assurance, reliability, service
policy, tangibles, problem solving and convenience (R² = 0).
SPSS Procedures:
Results
Since the p-value is less than 0.05, reject the null hypothesis and conclude that a
significant portion of the variation in the dependent variable is explained by the
linear model.
The next step is the test of significance for the regression slope (for every
independent [predictor] variable). This is to determine the contribution of each
predictor variable independently.
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.
Model Summary
Refer to Figure 9.7. The adjusted R2 is 0.619, meaning that 61.9% of the variation
in the overall satisfaction is attributed to the combined variation in patients’
perception of assurance, reliability and convenience of services provided by the
hospital.
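The adjusted R2 that SPSS reports can be reproduced from the ordinary R2, the sample size and the number of predictors. A minimal sketch in Python; the sample size n = 102 and k = 3 predictors below are hypothetical figures for illustration, not values taken from the hospital study:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2: penalises R^2 for the number of predictors k,
    given sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical inputs for illustration only
print(round(adjusted_r2(0.631, 102, 3), 2))
```

The penalty grows as predictors are added without a matching gain in R2, which is why the adjusted value is the one usually interpreted in multiple regression.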
ACTIVITY 9.1
The linear relationship between two variables is evaluated from two aspects:
the strength of the relationship (correlation), and the cause-effect association
(regression).
In statistics, correlation is used to denote association between two quantitative
variables, assuming that the association is linear.
Linear regression is a technique for modelling the cause-and-effect relationship
between two variables. If the two variables are related, changes in one will
lead to changes in the other. If the researcher can
identify the “cause and effect” variables, the relationship can be represented in
the form of the following equation:
Y = a + bX;
where Y is the dependent variable, X is the independent variable, and a and b
are two constants to be estimated.
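The constants a and b are usually estimated by ordinary least squares. A minimal sketch in Python; the data points are made up for illustration:

```python
def fit_line(x, y):
    """Ordinary least squares estimates of the intercept a and slope b
    in Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope b = covariance(x, y) / variance(x)
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

# Made-up data in which y grows roughly twice as fast as x
a, b = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(a, b)
```

Statistical packages such as SPSS perform the same estimation and additionally report standard errors and significance tests for a and b.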
INTRODUCTION
This topic provides a brief explanation of the parametric and non-parametric tests.
Detailed description is given on chi-square, Mann-Whitney and Kruskal-Wallis
tests. Besides that, the assumptions underlying these statistical techniques are
provided to facilitate student learning. It demonstrates how non-parametric
statistical procedures can be computed using formulae as well as SPSS and how
the statistical results should be interpreted.
The parametric or distribution constraint test is a statistical test that requires the
distribution of the population to be specified. Thus, parametric inferential methods
assume that the distributions of the variables being assessed belong to some form
of known probability distribution (e.g. assumption that the observed data are
sampled from a normal distribution).
Choosing the right test contributes to the validity of the research findings.
Improper use of statistical tests will not only cause the validity of the test results
to be questioned and do little justice to the research, but can at times be a
serious error, especially if the results have major implications, for example,
when they are used in policy formulation.
Sample size plays a crucial role in deciding the family of statistical tests:
parametric or non-parametric. In a large sample, the central limit theorem ensures
that parametric tests work well even if the population is not normal. Parametric
tests are robust to deviations from normal distributions, when the sample size is
large. The issue here is how large is large enough; a rule of thumb suggests that a
sample size of about 30 or more for each category of observation is sufficient to
use the parametric test. The non-parametric tests also work well with large
samples. The non-parametric tests are only slightly less powerful than parametric
tests with large samples.
On the other hand, if the sample size is small we cannot rely on the central limit
theorem; thus, the p value may be inaccurate if the parametric tests were to be
used. The non-parametric test suffers greater loss of statistical power with small
sample size. Table 10.1 summarises some of the commonly used parametric and
non-parametric tests but not all of them are explained in this module.
(a) Assumptions
Even though certain assumptions are not critical for using the chi-square,
you need to address a number of generic assumptions:
• Random Sampling: Observations should be randomly sampled from
the population of all possible observations.
• Independence of Observations: Each observation should be generated
by a different subject and no subject is counted twice. In other words, each
subject should appear in only one group and the groups are not related in
any way.
• Size of Expected Frequencies: When the number of cells is less than
10, and particularly when the total sample size is small, the lowest
expected frequency required for a chi-square test is 5.
Example :
A sample of 110 teenagers was asked which of the four hand phone brands they
preferred. The number of people choosing the different brands was recorded in
Table 10.2.
Table 10.2: Preferences for Brands of Hand Phones
We want to find out if one or more brands are preferred over others. If they are
not, then we should expect roughly the same number of people in each category.
There will not be exactly the same number of people in each category, but they
should be near equal.
Another way of saying this is: If the null hypothesis is TRUE, and some brands
are not preferred more than others, then all brands should be equally represented.
We expect roughly EQUAL NUMBERS IN EACH CATEGORY, if the NULL
HYPOTHESIS is TRUE.
Expected Frequencies
There are 110 people, and there are four categories. If the null hypothesis is true,
then we should expect 110 / 4 = 27.5 teenagers to be in each category. This is
because, if all brands of hand phones are equally popular, we would expect
roughly equal numbers of people in each category. In other words, the number of
teenagers should be evenly distributed among the four brands.
The numbers that we would find in the four categories if the null hypothesis is
true (i.e. all brands are equally popular) are called the EXPECTED
FREQUENCIES.
The numbers that we find in the four categories are called the OBSERVED
FREQUENCIES (i.e. based on the data we collected).
See Table 10.3. What the χ² test does is to compare the Observed Frequencies
with the Expected Frequencies.
If all brands of hand phones are equally popular, the Observed Frequencies
will not differ from the Expected Frequencies.
Table 10.3 shows the observed and expected frequencies for the four brands of
hand phones. It is often difficult to tell just by looking at the data, which is why
you have to use the χ² test.
Step 1:
Calculate the differences between the Expected Frequencies and Observed
Frequencies (see Column 4). Do not worry about the plus and minus signs!
Step 2:
Square the differences (see Column 5) to remove the negative signs.
Step 3:
Divide the squared difference by the measure of variance (see Column 6). The
“measure of variance” is the Expected Frequencies (i.e. 27.5). For Brand A, it is
56.25 / 27.5 = 2.05 and do the same for the other brands.
Step 4:
Add up the figures you obtained in Column 6 and you get 53.65. So the χ² is
53.65.
The formula for the χ² which you did above is shown as follows:

χ² = Σ [(observed frequency - expected frequency)² / expected frequency]
Step 5:
The degrees of freedom (DF) is one less than the number of categories. In this
case, DF is 4 categories – 1 = 3. We need to know this, for it is usual to report the
DF, along with the χ² and the associated probability level.
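Steps 1 to 5 can be sketched in Python. The observed counts below (20, 60, 20, 10) are an illustrative reconstruction consistent with the totals stated in the text (110 teenagers, Brand B preferred by 60, χ² = 53.65); they are not the actual Table 10.2 figures:

```python
# Observed counts for Brands A-D: illustrative reconstruction, not the
# actual Table 10.2 data
observed = [20, 60, 20, 10]
n = sum(observed)                    # 110 teenagers
expected = [n / len(observed)] * 4   # 27.5 per brand if H0 is true

# chi-square = sum over categories of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # 4 categories - 1 = 3

print(round(chi2, 2), df)
```

The same computation, cell by cell, is what Columns 4 to 6 of Table 10.3 carry out by hand.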
SPSS Output

              Hand phones
Chi-Square    53.65a
df            3
Asymp. Sig.   .0000
The χ² value of 53.65 (rounded to 53.6) is compared with the value that would be
expected for a χ² with 3 DF if the null hypothesis were true (i.e. all brands of
hand phones are preferred equally). [SPSS will compute this comparison.] The
SPSS Output shows that with a χ² value of 53.6, the associated probability value is
0.0001. This means that the probability that this difference was due to chance is
very small. We can conclude that there is a significant difference between the
Observed and Expected Frequencies; i.e. all the four brands of hand phones are
not equally popular. More people prefer brand B (60) than the other hand phone
brands.
10.2.2 χ² Test for Independence: 2 X 2
Chi-square (χ²) enables you to discover whether there is a relationship or
association between two categorical variables. For example, is there an
association between students who smoke cigarettes and those who do not smoke,
and students who are active in sports and those who are not active in sports? This
is a type of categorical data, because we are asking whether they smoke or do not
smoke (not how many cigarettes they smoke); and whether they are active or not
active in sports. The design of the study is shown in Table 10.4, which is called a
contingency table and it is a 2 x 2 table because there are two rows and two
columns.
Example
A researcher is interested in finding out whether male students from high income
or low income families get into trouble more often in school. The following Table
10.5, shows the frequencies of male students from low and high income family
who have discipline problems in school:
Step 2: Calculate the Expected Value for Each Cell of the Table
As with the goodness-of-fit example described earlier, the key idea of the chi-
square test for independence is a comparison of observed and expected values.
How many of something were expected and how many were observed in some
process? In the case of tabular data, however, we usually do not know what the
distribution should look like. Rather, in this use of the chi-square test, expected
values are calculated based on the row and column totals from the table.
The expected value for each cell of the table can be calculated using the following
formula:
For example, in the table comparing the numbers of high income and low
income students involved in disciplinary problems, the expected count for the
number of low income students with discipline problems is:

Expected Frequency (E1) = (117 × 83) / 237 = 40.97

Expected Frequency (E4) = (120 × 154) / 237 = 77.97
Use the formula and compute the Expected Frequencies for E2 and E3. Table 10.6
shows the completed expected frequencies for all the four cells.
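The formula generates all four expected counts from the marginal totals. A short sketch using the row totals (117, 120) and column totals (83, 154) from the worked example; which margin corresponds to income group and which to discipline status is assumed from the layout of the example:

```python
row_totals = [117, 120]        # assumed: low income, high income
col_totals = [83, 154]         # assumed: discipline problem yes, no
grand_total = sum(row_totals)  # 237 students in all

# Expected frequency of a cell = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```

The printed values reproduce E1 = 40.97 and E4 = 77.97 and supply E2 and E3 as well, which you can check against Table 10.6.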
Table 10.7: Extract from the Table of Critical Values of χ²

                     Probability Level (alpha)
df    0.50     0.10     0.05     0.02     0.01     0.001
1     0.455    2.706    3.841    5.412    6.635    10.827
2     1.386    4.605    5.991    7.824    9.210    13.815
3     2.366    6.251    7.815    9.837    11.345   16.268
4     3.357    7.779    9.488    11.668   13.277   18.465
5     4.351    9.236    11.070   13.388   15.086   20.517
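The decision rule is to reject the null hypothesis when the computed χ² exceeds the tabled critical value for its degrees of freedom. A small sketch using the alpha = 0.05 column of Table 10.7:

```python
# Critical values of chi-square at alpha = 0.05, from Table 10.7
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def reject_null(chi2, df, critical=CRITICAL_05):
    """Reject H0 when the computed chi-square exceeds the critical value."""
    return chi2 > critical[df]

print(reject_null(53.65, 3))   # the hand phone example exceeds 7.815
print(reject_null(2.37, 3))    # a small chi-square does not
```

Statistical packages report the exact p-value instead, but the conclusion at a fixed alpha is the same.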
Note:
The 2 X 2 contingency table can be extended to larger tables such as 3 X 2 or 4
X 3 depending on the number of categories in the independent and dependent
variables. The formulae and the computation procedure are similar to that of the
2 X 2 contingency table.
ACTIVITY 10.1
ACTIVITY 10.2
          Yes    No    Total
Urban      36    14     50
Rural      30    25     55
Total      66    39    105
Questions:
What is the null hypothesis? What is the alternative hypothesis?
How many degrees of freedom are there?
What is the value of the chi-square statistic for this table?
What is the p-value of this statistic?
Test Statistics, T = S - n1(n1 + 1)/2, where S is the sum of ranks of Population 1
and n1 is the sample size of Population 1. Population 1 is the population with the
smaller sum of rank values.
The Mann-Whitney test uses the rank sum as the test statistic. The procedure is
as follows:
• The two independent samples are combined and ranks are assigned to the
scores (it can be a mean score).
• The sum of ranks of Population 1 (usually the population of interest, decided
based on the null hypothesis) is computed.
• This rank sum is then used to compute the test statistic.
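The procedure above can be sketched in Python. Tied scores receive the average of the ranks they span; the two small samples are made up for illustration:

```python
def average_ranks(values):
    """Map each distinct value to its average rank (ties share a rank)."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    return rank_of

def mann_whitney_T(sample1, sample2):
    """T = S - n1(n1 + 1)/2, where S is the rank sum of sample 1."""
    ranks = average_ranks(sample1 + sample2)
    S = sum(ranks[x] for x in sample1)
    n1 = len(sample1)
    return S - n1 * (n1 + 1) / 2

# Made-up preference scores for two independent groups
print(mann_whitney_T([12, 15, 15, 20], [18, 22, 25]))
```

A very small T (far below the value expected under the null hypothesis) signals that one group's scores tend to be systematically lower than the other's.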
Example:
In assessing the effect of TV advertisements on buyers’ brand preference, a
simple experiment was carried out. A group of adults was selected to participate
in this experiment. One group was subjected to a behaviour modification
psychotherapy using a series of television advertisements while another formed
the control group. Seventeen adults were given the treatment, while 10 others did
not receive any treatment. After the treatment period, both the experimental and the
control group were rated for their brand preference using the brand preference
scale. Refer to Figure 10.3.
We wish to know whether these data provide sufficient evidence to indicate that
behaviour modification psychotherapy using TV advertisements improves the
brand preference among adult shoppers.
The Hypothesis
Ho: There is no difference in the brand preference between the group that
received behaviour modification therapy and the control group.
Ha: There is a difference in the brand preference between the group that received
behaviour modification therapy and the control group.
The level of significance is set at 0.05 (α = 0.05). Table 10.9 presents the Result
of Analysis on brand preference scores of treatment and control groups.
• Ranking of the scores by arranging all the scores from both groups in
ascending order.
• A rank of 1 is given to the smallest score, and tied scores share the same rank.

T = S - n1(n1 + 1)/2 = 81.5 - 10(10 + 1)/2 = 26.5

p = 0.003
Example of SPSS output of the Mann-Whitney Test (refer to Figure 10.4 below).
Since the p-value is smaller than 0.05, reject the null hypothesis and conclude the
alternative hypothesis. There is a difference in the brand preference between the
group that received behaviour modification therapy and the group that did not.
The brand preference score of the group that received behaviour modification
therapy is significantly different compared to the group that did not receive any
therapy. From the mean rank, it is evident that the brand preference score for the
group that received behaviour modification therapy is higher. In other words, the
behaviour modification psychotherapy using TV advertisement enhances brand
preference among adults.
The Hypothesis
Ho : There is no difference between the male and female hospital staff’s
knowledge.
Ha: There is a significant difference between the male and female hospital
staff’s knowledge.
SPSS Command
Test Statistics, H = [12 / (N(N + 1))] × Σ (Ri² / ni) - 3(N + 1)

where N is the total sample size, Ri is the sum of ranks of group i, ni is the size of
group i, and the sum is taken over the k groups.
Example:
In studying the average amount spent on mobile phone usage, a researcher
collected the average monthly mobile phone bills from three groups of adults:
clerical staff, supervisors and managers. Table 10.11 presents the data.
Table 10.11: Data
Average monthly expenditure on mobile phone bill
Clerical 257 302 206 318 449 334 299 149 282 351
Supervisor 460 496 450 350 463 357
Manager 338 767 202 833 632
Objective:
To determine whether there is any difference in the average monthly mobile
phone expenditure among the three populations.
The Hypothesis
Ho: There is no difference in the average monthly expenditure on mobile phone
usage among clerks, supervisors and managers.
H1: There are differences in the average monthly expenditure on mobile phone
usage among clerks, supervisors and managers.
The level of significance is set at 0.05 (α = 0.05). Table 10.12 shows the results
of the analysis.
H = [12 / (N(N + 1))] × Σ (Ri² / ni) - 3(N + 1)
  = [12 / (21(21 + 1))] × (69²/10 + 90²/6 + 72²/5) - 3(21 + 1)
  = 8.36
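The same computation can be sketched in Python, using the rank sums (69, 90, 72) and group sizes (10, 6, 5) from the worked example:

```python
def kruskal_wallis_H(rank_sums, group_sizes):
    """H = 12/(N(N+1)) * sum(Ri^2 / ni) - 3(N + 1)."""
    N = sum(group_sizes)
    total = sum(R ** 2 / n for R, n in zip(rank_sums, group_sizes))
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)

# Rank sums and sample sizes for clerks, supervisors and managers
H = kruskal_wallis_H([69, 90, 72], [10, 6, 5])
print(round(H, 2))
```

A quick check on the inputs: the three rank sums add to 231, which is the sum of ranks 1 to 21 for the 21 subjects, as required.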
SPSS Output
The Kruskal-Wallis χ² value is 8.361 and the p-value is 0.015. Since the p-value
is smaller than 0.05, reject the null hypothesis and conclude the alternative
hypothesis. There is a difference in the average monthly expenditure on mobile
phone usage among the three groups. Even though the test statistic does not
provide information on the differences in the average monthly expenditure,
judging from the mean ranks, clerks spend the least compared to supervisors and
managers.
The Hypothesis
Ho: There is no difference in the assessment of hospital staff knowledge among
public sector employees, private sector employees, and students.
SPSS Command
Results
Ranks

Test Statisticsa,b

              Knowledge of staff (assessment before attending seminar)
Chi-Square    1.694
df            2
Asymp. Sig.   .429

a. Kruskal Wallis Test
b. Grouping Variable: Employment

Since the p-value is 0.429, which is greater than 0.05, there is no difference in
the assessment of hospital staff knowledge among public sector employees,
private sector employees, and students.
ACTIVITY 10.3
The following data summarise students’ PASS or FAIL results in a
mathematics test on fractions and the method used to teach the concept:

            Mathematics Test Performance
Group         Pass    Fail
Method X        5      21
Method Y        9      29
• There are two categories of statistical tests: (i) the parametric and (ii)
non-parametric tests.
• The parametric or distribution constraint tests are statistical tests that require
the distribution of the population to be specified.
• Among the commonly used non-parametric tests are chi-square test, Mann-
Whitney Test and Kruskal-Wallis test.
• The chi-square test tests for significant differences in proportions and is very
useful when the variable measured is nominal.
• The chi-square is very flexible and mainly used in two forms (i) comparing
the observed proportion with some known values, and (ii) comparing the
difference in distribution of proportions between two groups whereby each
group can have two or more categories.
• The Kruskal-Wallis test serves the same purpose as the one way ANOVA,
comparing the differences between more than two groups of samples from
unrelated populations. This test uses the median as the parameter for
comparisons.
• The Kruskal-Wallis test is used when the sample size is small and/or when the
level of measurement is ordinal.
Appendix A
After you have developed your questionnaire, you need to create an SPSS data file
to enable you to enter data into a format which can be read by SPSS. You can do
this via the SPSS Data Editor which is inbuilt into the SPSS package. When
creating an SPSS data file, your items/questions in the questionnaire will have to
be translated into variables. For example, if you have a question “What is your
occupation?” and this question has several response options such as 1. Salesman
2. Clerk 3. Teacher 4. Accountant 5. Others; what you need to do is to translate
your question into a variable name, perhaps called occu. In the context of SPSS
data entry, these response options are called value labels, for example Salesman is
assigned a value label of 1, Clerk 2, Teacher 3, Accountant 4 and Others 5. If the
respondent is a teacher, you enter 3 when inputting data into the variable occu in
your data file. Sometimes you may have a question which requires the respondent
to state in absolute terms such as “Your annual salary is _________” In this case,
you can create a variable name called salary. Since this variable only requires the
respondent to state his/her salary, you do not need to create response options –
just enter the actual salary figure.
When defining the variable name, you have to consider the following:
(i) it can only have a maximum of 8 characters (however version SPSS 12.0
and above allows up to 64 characters);
(ii) it must begin with a letter;
(iii) it cannot end with a full stop or underscore;
(iv) it must be unique, i.e. no duplication is allowed;
(v) it cannot include blanks or special characters such as !, ?, ”, and *.
When defining a variable name, uppercase and lowercase characters are treated
as the same; variable names are not case-sensitive.
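The naming rules above can be captured in a small validator. This is an illustrative sketch of the rules as listed, not an official SPSS routine; the 8-character limit applies to versions before SPSS 12.0:

```python
import re

def is_valid_variable_name(name, max_length=8):
    """Check a candidate name against the rules listed above.
    max_length is 8 for classic SPSS, 64 for version 12.0 and above."""
    if not name or len(name) > max_length:
        return False
    if not name[0].isalpha():                  # must begin with a letter
        return False
    if name.endswith(".") or name.endswith("_"):
        return False
    # no blanks or special characters such as !, ?, ", *
    return re.fullmatch(r"[A-Za-z][A-Za-z0-9_.]*", name) is not None

print(is_valid_variable_name("occu"))         # a valid name
print(is_valid_variable_name("1occu"))        # invalid: starts with a digit
print(is_valid_variable_name("occupation1"))  # invalid: over 8 characters
```

Uniqueness among the names already defined would be checked separately, remembering that the comparison is case-insensitive.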
Besides understanding the variable name convention and value labels, you will
also need to know other variable definitions such as variable label, variable type,
missing values, column format and measurement level. A variable label describes
the variable name, for example, if the variable name is occu, the variable label can
be “Respondent’s occupation”. You need not specify the variable label if do not
wish to but variable label improves the interpretability of your output especially if
you have many variables. Missing values can also be assigned to a variable. It is
rare for one to obtain a questionnaire without any item being left blank. By
convention, a missing value is usually assigned a value of 9 but for statistical
analysis it would be preferable to assign a value which is equivalent to the mean
of the variable to fill up all the missing values. However, this can only be done for
interval or ratio level variables. For example, if you have the variable income and
data were derived from 150 respondents but 20 did not provide their income
information, then compute the mean of income via SPSS for the 130 respondents
who responded, and recode all missing values as the computed mean value.
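The mean-substitution approach described above can be sketched as follows. The income figures and the choice of 9 as the missing-value code are made up for illustration:

```python
def impute_with_mean(values, missing_code=9):
    """Replace the missing-value code with the mean of the observed values.
    Suitable only for interval or ratio level variables."""
    observed = [v for v in values if v != missing_code]
    mean = sum(observed) / len(observed)
    return [mean if v == missing_code else v for v in values]

# Made-up monthly incomes; 9 marks a missing response
incomes = [3200, 9, 4100, 2800, 9, 3500]
print(impute_with_mean(incomes))
```

Note that mean substitution keeps every case in the analysis but shrinks the variable's variance, so it should be used with care.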
The type of variable relates closely to your items in the questionnaire. For
example, the item age is a numeric variable, meaning you can input the variable
using only numbers such as if a person’s age is 34 then you can type 34 under the
age variable column for this particular case. However, sometimes there is a need
to use alphanumeric characters to input data into a variable. A good example is
respondent’s address. In this case, alphanumeric characters constitute what is
called a string variable type. For example, a short open-ended question will be
“Please state your address.” The respondent will write his/her address using
alphanumeric characters such as 23 Jalan SS2/75, 47301 Petaling Jaya, Selangor.
So this address is actually a combination of alphabets and numbers.
The column format in the data editor allows you to specify the alignment of your
data in a column, for example left, centre or right. Measurement in the SPSS
variable definition convention differs slightly from that used in the statistics
textbook as SPSS uses scale to refer to both interval and ratio measurement.
Ordinal and nominal levels of measurement are maintained as they are. In
statistical analysis, it is extremely important to know what the level of
measurement for a particular variable is. A nominal variable (also called
categorical variable) classifies persons or objects into two or more categories,
for example, the variable gender is categorised as 1 for Male and 2 for Female,
marital status as 1 for Single, 2 for Married and 3 for Divorced. Numbering in
nominal variables does not indicate that one category is higher or better than
another, for example, representing 1 for Male and 2 for Female does not mean
that male is lower than female by virtue of the number being smaller. In nominal
measurement the numbers are only labels. On the other hand, an ordinal variable
not only classifies persons or objects; it also ranks them in terms of degree.
Ordinal variables put persons or objects in order from highest to lowest or from
most to least. In ordinal scale, intervals between ranks are not equal, for
example, the difference between rank 1 and rank 2 is not necessarily the same as
the difference between rank 2 and rank 3. For example, a person (A) with a
height of 5’ 10” who falls under rank 1 does not have the same interval as a
person (B) with a height of 5’ 5” who is ranked 2 and another person (C) with a
height of 4’ 8” who is ranked 3. The difference in height among the three
persons is not equal but there is an order, i.e. A is taller than B and B is taller
than C.
Interval variables have all the characteristics of nominal and ordinal variables but
also have equal intervals. For example, an achievement test is treated as an
interval variable. The difference between a score of 50 and a score of 60 is
essentially the same as the difference between a score of 80 and a score of 90.
Interval scales, however, do not have a true zero point. Thus, if Ahmad has a
score of 0 for Mathematics, it does not mean he has no knowledge of
mathematics at all, nor does Muthu scoring 100 mean he has total knowledge of
Mathematics. Thus, if a person scores 90 marks we know he scores twice as high
as one who scores 45, but we cannot say that a person scoring 90 knows twice as
much as a person scoring 45.
Ratio variables are the highest, most precise level of measurement. This type of
variable has all the properties of the other types of variables above. In addition, it
has a true zero point. For example a person’s height – a person who is 6 feet tall is
twice as tall as a person who is 3 feet tall. A person who weighs 50 kg is one third
the weight of another who is 150 kg. Since ratio scales encompass mostly physical
measures they are not used very often in social science research.
How to define variables and enter data using the SPSS Data Editor?
Steps
1. Click Start → All Programs → SPSS for Windows → SPSS 12.0 for
Windows → select Type in data → OK → Variable View → Start defining
your variables by specifying the following:
(a) Name: Type Gender <Enter>
(b) Type: Select Numeric OK
(c) Width: 8
(d) Decimal: 0
(e) Label: Respondent’s gender
(f) Values: Under Value, type 1; under Value Label, type Male; Click
Add
(g) Under Value again, type 2; under Value Label, type Female
(h) Click Add
(i) Missing: No missing values → OK
(j) Columns: 8
(k) Align: Right
(l) Measure: Nominal
2. Proceed to define the second variable and so forth until you have completed
all variables in your questionnaire. Do note that certain variables such as ID
do not have value labels. If you are not sure what the level of measurement
for that particular variable is, you may want to keep the default which is
Scale. Do remember that if the particular variable you are defining shares the
same specification, such as the variable label, with a variable you have already
defined, then you may merely copy it into the relevant cells.
3. After you have completed defining all your variables, the next step is to
enter data into the data cells by doing the following:
(a) Click Data View
(b) Click row 1, column 1 (note the variable name as shown)
(c) Type in the data e.g. if the respondent’s gender is male, then type 1
and then proceed to the next variable by pressing the right arrow key
(→) on your keyboard.
(d) Input the next variable and so forth until you have completed all
your data input.
OR
Thank you.