
DATA ANALYSIS USING

SPSS 23

©Digital Bridge Institute, Abuja Page 1


Module 1

Introduction to SPSS


Objectives
 Introduction to SPSS and Basic Concepts.
 Understanding the SPSS Environment and Data Entry in SPSS.
 Basic Operations in SPSS.
 Understanding Data Types, and Sorting and Arranging Data.
 Creating Questionnaires in SPSS.
 Measures of Central Tendency
 Know how to find the Mean, Median and Mode in SPSS.
 Measures of Dispersion
 Min, Max, Range, Standard Deviation and Variance.
 Measures of Interquartile Range
 Understand how to apply Deciles and Percentiles.
 Descriptive Statistics
 Understand how to create Frequencies, Cross Tabulations, Bar Charts, Pie Charts, Scatter Diagrams and Boxplots.
 Inferential Statistics
 Understand the use and applications of Correlation, T-Tests, ANOVA, Chi-Square and others.
 Statistical Decision Trees
 Choosing the right test to apply.
 Parametric and Non-Parametric Testing.



What is SPSS
 SPSS is an acronym for Statistical Package for the Social Sciences, but it is now also referred to as Statistical Product and Service Solutions.
 SPSS is application software developed by International Business Machines (IBM) since 2009, though it was originally developed by SPSS Inc.
 SPSS was originally designed for use in the social sciences; however, its usage and capabilities have extended to other areas of learning.
 The most recent release of SPSS is version 25.0, which was released in August 2017. This training manual is based on SPSS version 23.0.
 SPSS is a tool; it only does what it is “told” to do. SPSS does not do the thinking for you.
 To use SPSS you must have some basic knowledge of statistics.
 At first look the SPSS screen resembles a typical spreadsheet, but there is a lot more to SPSS than that.
Uses of SPSS
 Data entry and Data Cleaning
 Descriptive Statistics Analysis and output
 Parametric and non-parametric tests – tests of relationship and difference
 Data division based on factors and groups
 Quantitative research with observed variables
 Qualitative research with coded themes
 Market Research and Trends.
 Data Management and Documentation.
 Predictive Analysis
 Health Statistics.
 Surveys
 Data Mining.



SPSS Alternatives
 SPSS is one of the most widely used and powerful data analysis packages. However, other data analysis software is also in use. Some examples are:
 MATHEMATICA
 STATISTICA
 R
 EpiInfo
 Eviews
 Maple
 STATA
 NCSS
 CSPro
 SAS
 MiniTab
Learning Objectives
At the end of this training programme, it is expected that participants in the SPSS training programme would be able to do the following:
 Describe and understand data elements and types in SPSS.
 Perform descriptive and inferential statistics using SPSS.
 Understand the data analysis process.
 Describe categorical data.
 Understand the hypothesis testing process using several statistical tests in SPSS.
 Choose the right statistical test to perform in SPSS.
 Represent questionnaires and interpret questionnaire data in SPSS.
 Perform Parametric and Non-Parametric Testing on a Dataset.
Starting SPSS
 From the Start Menu, click on All Programs.
 IBM SPSS Statistics > IBM SPSS Statistics 23, or simply search for IBM SPSS.
 The way SPSS is opened may also vary depending on the type and version of Operating System in use on a particular computer.



SPSS Screen
[Screenshot: the SPSS Data Editor screen, with the Title Bar, Min/Max/Close buttons, Menu Bar, Tool Bar, Cases, Cells, View tabs and Status Bar labelled.]


SPSS Screen
 The default SPSS Window will have the Data Editor
 The Data Editor has two tabs in the left bottom corner: we
can click Data View for inspecting our data values.
Alternatively, Variable View shows information regarding the
meaning of our data, collectively known as the dictionary.

[Screenshot: the Data Editor, with the Variable View and Data View tabs labelled.]


Data Editor
 SPSS toolbars contain some handy tools. Columns of cells are
called variables. Variable names are shown in the column headers.
 Rows of cells are called cases. “Cases” refers to nothing more than
rows of cells which may or may not correspond to people or objects
 Data cell contents are called values
 You can drag the three dots in the right margin leftwards in order to
split the window horizontally. In a similar vein, split the window
vertically by dragging the three dots in the lower margin upwards.
Split windows allow for viewing distant cases or variables simultaneously
 You can toggle/switch between Data View and Variable View by clicking the tabs in the lower left corner. A faster option is Ctrl+T.
 The status bar may provide useful information on the data, such as whether a WEIGHT, FILTER, SPLIT FILE or Unicode mode is in effect.
 Each row represents a CASE, e.g. if 50 participants are involved in a study, then 50 cases of information would be generated.
 A CELL is the intersection of a Variable and a Case.



Data View Window
Data View – a spreadsheet-like system for defining, entering, editing, and displaying data. The extension of the saved file will be “.sav”.

[Screenshot: the Data View window, with the Cell Editor and the dots for splitting the window labelled.]


Variable View Window
 Variable View contains descriptions of the attributes of each
variable in the data file. In Variable View:
 Rows are variables.
 Columns are variable attributes.
 You can add or delete variables and modify attributes of variables,
including the following attributes:
 Variable name
 Data type
 Number of digits or characters
 Number of decimal places
 Descriptive variable and value labels
 User-defined missing values
 Column width
 Measurement level
Variable View Window 2
 The Variable View window contains information about the data set that is stored with the data view.
 After selecting Variable View, variables are shown as rows instead of columns.
 Columns now represent variable properties such as label, name and type.



Output Viewer
 The Output Viewer displays the result of analysis in form of tables,
frequencies, charts and graphs and gives the user the opportunity
to edit them before it is saved or printed.
 Output Viewer is divided into two main sections, an outline pane
on the left, and a tables pane on the right. 
 Output viewer files are saved as *.spv

[Screenshot: the Output Viewer, with the Outline Pane (left) and Table Pane (right) labelled.]


Syntax Editor
 The Syntax Editor allows a user to write, edit, and run commands in the SPSS programming language. Syntax Editor files have the file extension *.sps.

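As a small illustration, a syntax file might contain commands like the following (Height here is an assumed variable name, not one from a specific dataset):

```spss
* Comments in SPSS syntax start with an asterisk and end with a period.
* Produce basic descriptive statistics for the variable Height.
DESCRIPTIVES VARIABLES=Height
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Commands can be run by selecting them in the Syntax Editor and clicking the Run button; the results appear in the Output Viewer.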


Defining and Entering Variable
 What is a Variable? A variable is an item of data. A variable name can be up to 64 characters (bytes) long and must start with a letter. Examples of variables are:
 A measurement:
 A characteristic e.g., Gender, Age, Height, Weight… etc.
 Experimental Condition
 e.g., Condition, Experimental group…
 Opinion/Belief
 e.g., A survey question which asks for a respondent’s level of
agreement with a statement etc.
 Time Point
 e.g., pre-test, post-test, T0, T1, T2………… etc.



Variable Definition
 The first character of a variable name must be alphabetic.
 Variable names must be unique and can be up to 64 characters (bytes) long.
 Spaces are not allowed in variable names.
 The rest of the name can contain letters, numbers, a period, or the symbols @, #, _ or $.
 Variable names should not end with a period (.) or underscore (_).
 Variable names are not case sensitive.
 SPSS Statistics initially assigns default variable names (VAR00001, VAR00002) to variables. However, it is advised that users use meaningful names such as Height, Name, Sex etc.
 The Variable type determines how cases are entered.
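The default names can also be replaced from syntax. A sketch, assuming the default names VAR00001 and VAR00002 exist in the active dataset:

```spss
* Replace SPSS's default variable names with meaningful ones.
RENAME VARIABLES (VAR00001 = Height) (VAR00002 = Gender).
```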
Variable Type
A variable's type is either NUMERIC or STRING, and each type has an associated VARIABLE FORMAT.


Variable Type 2
The data type of the variable. There are 9 options: Numeric, Comma, Dot, Scientific notation, Date, Dollar, Custom currency, String, and Restricted numeric. The most common variable types will be either Numeric or String. Numeric variables are numbers that either map to a value (e.g., 1 = Single) or are values themselves (e.g., distance = 125 km). String variables are text and can only be treated as such. As a result, very few manipulations can be performed on them.


Variable Type Definitions
 Numeric
 Comma
 Dot
 Scientific Notation
 Date
 Dollar
 Custom Currency
 String
 Restricted Numeric


Variable Type - String
 String variables -- which are also called alphanumeric
variables or character variables -- have values that are
treated as text. This means that the values of string variables
may include numbers, letters, or symbols. Missing string
values appear blank.
 Some data such as phone numbers, although composed
of numbers, are typically considered string variables
because their values cannot be used meaningfully in
calculations.
  Any written text is considered a string variable, including
free-response answers to survey questions.
 The width of a String variable is the number of characters
it can contain.



Variable Type - Numeric
 Numeric variables have values that are numbers (in standard format
or scientific notation). Missing numeric variables appear as a period
(i.e., “.”).
 Continuous variables that can take on any number in a range
(e.g., height, weight, blood pressure, ...) would be considered
numeric variables.
 Counts (e.g., number of free throws made per game) are a
numeric variable with zero decimal places. 
 Width – defines the length of numbers that the numeric value can contain.
 Decimal – sets the number of decimal positions that the value can contain.



Variable Format – Comma and Dot
 Comma –
 Numeric variables that use commas to delimit every three places (to the left of the decimals) and a period to delimit decimals.
 e.g. 76,500.50 (seventy-six thousand, five hundred point five)
 Dot –
 Numeric variables that use dots to delimit every three places (to the left of the decimals) and a comma to delimit decimals.
 e.g. the same number in Dot format: 76.500,50


Variable Format - Date
 Numeric variables that are displayed in any standard calendar
date or clock-time formats. Standard formats may include
commas, blank spaces, hyphens, periods, or slashes as space
delimiters.
 e.g: Dates: 21/2/2016, 21.2.2016, 21-2-2016, 21 2 2016
 e.g: Time: 04:02:33, 04 02 33



Variable Format - Dollar
 Numeric variables that contain a dollar sign (i.e., $) before
numbers. Commas may be used to delimit every three places,
and a period can be used to delimit decimals.
 e.g: Thirty-three thousand, three hundred Naira and thirty-three kobo: N33,300.33

To change currency, click on
Edit Menu, Options – Currency

In the Prefix Box, type
your preferred Currency
Symbol.

Click Apply and OK



Variable Format – Restricted Number
 Numeric variables whose values are restricted to non-negative
integers (in standard format or scientific notation). The values
are displayed with leading zeroes padded to the maximum
width of the variable.
 e.g. 00000123456 (width 11)



Variable Property - Label
 Label
 A brief but descriptive definition or display name for the
variable. When defined, a variable's label will appear in the
output in place of its name.
 e.g: The variable DoB might be described by the label “Date of Birth”.
 The Label format allows users to describe what the variable
name stands for.

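The same label can also be assigned with the VARIABLE LABELS command; DoB is assumed to already exist in the dataset:

```spss
* Attach a descriptive label that appears in output in place of the name.
VARIABLE LABELS DoB 'Date of Birth'.
```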


Variable Property - Values
 Value labels are useful primarily for categorical (i.e., nominal or
ordinal) variables, especially if they have been recorded as codes
(e.g., 1, 2, 3). It is strongly suggested that a user give each value
a label that is easy to understand
 e.g 1 = “Male”, 2 = “Female”, 1 = “Boy” 2 =“Girl”
 Under the column “Values,” click the cell that corresponds to the
variable whose values you wish to label. If the values are currently
undefined, the cell will say “None.” Click the square “…” button.
The Value Labels window appears.
 Type the first possible value (1) for your
variable in the Value field. In the Label 
field type the label exactly as you want
it to display (e.g., “Male")

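Equivalently, value labels can be defined in syntax. A sketch, assuming a numeric variable Gender coded 1 and 2:

```spss
* Map the numeric codes to human-readable labels.
VALUE LABELS Gender
  1 'Male'
  2 'Female'.
```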


Variable Property - Missing
 Missing specifies the user-defined values that indicate data is missing for a variable (e.g., -99). Note that this does not affect or eliminate SPSS's default missing value code ("."). This column merely allows the user to specify alternative codes for missing values.
 To set user-defined missing value codes, click inside the cell
corresponding to the “Missing” column for that variable. A
square button will appear; click on it.

 Click the option that best matches how you wish to define missing data and enter any associated values, then click OK at the bottom of the window.

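The same user-missing code can be declared in syntax; Age is an assumed variable name:

```spss
* Treat -99 as a user-defined missing value for Age.
MISSING VALUES Age (-99).
```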


Variable Property - Column
 Column specifies the width of each column in the Data View
spreadsheet. Note that this is not the same as the number of
digits displayed for each value. This simply refers to the width of
the actual column in the spreadsheet.
 To set a variable's column width, click inside the cell
corresponding to the “Columns” column for that variable. Then
click the “up” or “down” arrow icons to increase or decrease the
column width.



Variable Property - Align
 The alignment of content in the cells of the SPSS Data View spreadsheet; options include left-justified, right-justified, or center-justified.
 To set the alignment for a variable, click inside the cell
corresponding to the "Align" column for that variable. Then use
the drop-down menu to select your preferred alignment: Left,
Right, or Center.



Variable Property - Measure
 The level of measurement for the variable (e.g., nominal,
ordinal, or scale).
 Some procedures in SPSS treat categorical and scale
variables differently. By default, variables with numeric
responses are automatically detected as “Scale” variables. If
the numeric responses actually represent categories, you must
change the specified measurement level to the appropriate
setting.
 To define a variable's measurement level, click inside the cell
corresponding to the “Measure” column
for that variable. Then click the drop-down arrow
to select the level of measurement for that variable:
Scale, Ordinal, or Nominal.
 It is vital that you correctly define each variable's measurement
level. This setting affects everything from graphs to internal
algorithms for statistical analysis.
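Measurement levels can also be set in syntax with VARIABLE LEVEL; the variable names below are illustrative:

```spss
* Set the measurement level for each variable.
VARIABLE LEVEL Gender (NOMINAL) Satisfaction (ORDINAL) Height (SCALE).
```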
Levels of Measurement
QUALITATIVE (Nominal level):
 Nominal (Unranked Category) – e.g. Gender, Ethnicity, Colour of Eyes
 Ordinal (Ranked Category) – e.g. Age, Educational level, Likert Scales

 Nominal Data – The nominal level of data is used for unranked categorical data, where each case has been assigned to a discrete group. Nominal data mostly give information about identity. For nominal data, the number representing each group is completely arbitrary and just a label, e.g. 1 = “Male”; 2 = “Female”.
Levels of Measurement
 A variable can be treated as nominal when its values represent categories with no intrinsic ranking (for example, the department of the company in which an employee works, state of origin, region, postal code, religion, etc.).
 Ordinal Data is the most commonly used level of measurement; there is an indication of order and magnitude. The data has meaningful rankings that separate or give importance to the data groups.
 A variable can be treated as ordinal when its values represent categories with some intrinsic ranking (for example, levels of job satisfaction from highly satisfied to highly dissatisfied).
 Other examples of ordinal variables include values used in Likert scales and preference rating scores.



Levels of Measurement
SCALE (Quantitative):
 Interval – no fixed origin; fixed distance between values
 Ratio – fixed origin; fixed distance between values

 Interval Data – With interval data the distances between values are known and fixed, e.g. student scores (50 – 60). Such data has no clear definition of zero.
 Ratio Data has all the properties of interval data; however, it has a clear definition of zero. Height, Weight, Distance and Speed are all examples of ratio data.


Level of Measurement
 The icons that are displayed next to variables in dialog box lists
provide information about the variable type and measurement
level.

 Level of measurement gives a classification that describes the nature of the information within the values assigned to a variable.
 Properly defining the level of measurement is essential to effectively analyzing data.



Level of Measurement
 Ordinal Data
 A Likert Scale is a type of ordinal data that is used in research to give a weighted scale, e.g.:

Job Satisfaction: 5 = Very Satisfied, 4 = Satisfied, 3 = Neutral, 2 = Not Satisfied, 1 = Very Dissatisfied
Student Performance: 5 = Excellent, 4 = Good, 3 = Average, 2 = Fair, 1 = Bad

 Yes or No – There is always a meaningful rank between 0 and 1, so Yes/No answers are ordinal data.
Differences between Levels of Measurement

Can                                          Nominal  Ordinal  Interval  Ratio
Get Frequency Distribution                   YES      YES      YES       YES
Calculate Mean                                        YES      YES       YES
Get indication of Order                               YES      YES       YES
Fixed Distance between each value                              YES       YES
Has fixed origin of Value                                                YES
Used with most major inferential statistics                              YES


Measurement Level for Unknown
 The Set Measurement Level for Unknown dialog allows you to
define measurement level for any variables with an unknown
measurement level.
 Under certain conditions, the measurement level for some or all
numeric variables (fields) in a file may be unknown.
 These conditions include:
 Numeric variables imported from earlier versions of Microsoft Excel, text data files, or database sources.
 New numeric variables created with transformation
commands prior to the first data pass after creation of those
variables.



Measurement Level for Unknown
 To set the measurement level for variables with an unknown measurement level:
 From the Data menu  Set Measurement Level for Unknown.
 Move variables (fields) from the source list to the appropriate measurement level destination list.



Variable Property - Role
 The role that a variable will play in analyses (i.e., independent
variable, dependent variable, both independent and dependent).
 Input: The variable will be used as a predictor (independent
variable). This is the default assignment for variables.
 Target: The variable will be used as an outcome (dependent
variable).
 Both: The variable will be used as both a predictor and
an outcome (independent and dependent variable).
 None: The variable has no role assignment.
 Partition: The variable will partition the data into
separate samples.
 Split: Used with the IBM® SPSS® Modeler.



Defining Multiple Variables
 The Define Variable Properties window is an efficient way of
defining many variables at once, or defining many variables that
share the same formatting. Click Data > Define Variable
Properties.

 The left column displays all of the variables in your dataset. Select the variables you wish to define and move them to the right column using the arrow button.
 Note that you can specify the number of cases to scan, as well as the number of values that will display in the next step. Click Continue when you have finished selecting variables.
Defining Multiple Variables 2
 A window will appear; this one allows you to define various
properties for each variable you selected.



Defining Multiple Variables 3
 Scanned Variable List: The “Scanned Variable List” column includes
the variables selected in the previous step. Variables that do not have
assigned value labels will have an X in the “Unlabeled” column.
 Cases scanned: This section displays the number of cases that were scanned for each selected variable, as well as the number of values that are listed in the Value Label grid.
 Current Variable: Displays the variable that is currently selected from the Scanned Variable List.
 Measurement Level: Displays the level of measurement for the
selected variable. You can change the level of measurement by
clicking the menu arrow and choosing the desired measurement level
from the listed options: Scale, Ordinal, Nominal.
 Role: Displays the role for the selected variable. Some options in
SPSS allow you to pre-select variables for particular analyses based
on their defined roles. 



Defining Multiple Variables 4
 Unlabeled Values: Specifies how many values do not have
corresponding value labels.
 Value Label grid: Displays current information about the selected
variable and updates the information based on any changes you
make.
 Label: Allows you to add a label for the selected variable that describes more about what the variable is. This label is for the variable rather than for the values of the variable. For example, we might select the variable Height and give it the label “Student Height”.
 Type: Allows you to specify a particular kind of variable that helps SPSS know how to work with the variable during analyses. The types include numeric, comma, dot, scientific notation, date, dollar, custom currency, string, and restricted numeric. Depending on the type you select for your variable, you may be asked to supply additional information.



Defining Multiple Variables 5
 Attributes: Allows you to define custom attributes for variables.
These attributes are supplementary information not otherwise
specified by the variable's label, measurement labels, and missing
values.
 Copy Properties: Allows you to copy properties from one variable to
another variable. You can copy the properties from another variable
to the currently selected variable, or copy the properties of the
currently selected variable to one or more other variables. (For
example, you may have several variables representing survey items, all of which use the value labels 0 = "No" and 1 = "Yes". After defining the value labels for the first item, you can use "Copy Properties" to quickly set the labels for the remaining survey item variables.)
 Unlabeled Values: Allows you to automatically label unlabeled
values by clicking Automatic Labels.
 When you have finished defining your variables, click OK at the
bottom of the window to apply the changes to your data.
Copy Data Properties
 The Copy Data Properties Wizard is used to make an external SPSS Statistics data file a template for defining file and variable properties in the current dataset.
 To Copy Data Properties
 From the Data Menu  Click Data >
Copy Data Properties...
 Select the data file with the file and/or
variable properties that is to be
copied. (a currently open dataset, an
external IBM SPSS Statistics data file,
or the active dataset)
 Follow the step-by-step instructions in
the Copy Data Properties Wizard.
Practice 1
 Input the following data into SPSS
NAME GENDER HEIGHT
Ahmed Osagie 1 5.6
Effiong Chinedu 1 6.4
Agbo Olufemi 1 5.8
Osama Samson 1 6.4
Tolu Temi 2 5.9
Zainab Amaka 2 5.6
Ekaette Terver 2 5.9
Bosun Beauty 2 6.2
Nana Ularamu 2 6.8
Blessing Adamu 1 6.0
Okoro Ayodeji 1 5.9
Dada Aminu 2 6.3
Ufoma Kabiru 1 5.9



Solution Sample – Practice 1
 Define the Variable in the variable View

 Define the value for Gender



Solution Sample – Practice 1
 Input the Data in the Data View

 Notice the implementation


of the assigned properties
in the variable view

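The same dataset can also be created entirely in the Syntax Editor. A sketch using DATA LIST (only the first two practice cases are shown):

```spss
* Define the three variables and read free-format data.
DATA LIST LIST / Name (A20) Gender (F1.0) Height (F4.1).
BEGIN DATA
"Ahmed Osagie" 1 5.6
"Effiong Chinedu" 1 6.4
END DATA.
VALUE LABELS Gender 1 'Male' 2 'Female'.
```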


Saving Files
 To save the data file you created, simply click ‘File’ and click ‘Save As.’ You can save the file in different formats by clicking “Save as type.”
 Click Save

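From syntax, the equivalent command is SAVE OUTFILE; the file path below is only an example:

```spss
* Save the active dataset to a .sav file.
SAVE OUTFILE='C:\Data\practice1.sav'.
```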


Protecting a File
 A file can be protected by
encrypting with a
password.
 Once encrypted, the file
can only be opened by
providing the password.
 To protect a file
 Make the Data
Editor the active
window (click
anywhere in the
window to make it
active).



Protecting a File
 From the File menus  Save
As...
 Select Encrypt file with
password in the Save Data As
dialog box.
 Click Save.
 In the Encrypt File dialog box,
provide a password and re-
enter it in the Confirm
password text box.
 Passwords are limited to 10
characters and are case-
sensitive.



Protecting a File
 Passwords cannot be recovered if they are lost.
 If the password is lost the file cannot be opened.
 Strong passwords are created using eight or more characters.
 Passwords can contain numbers, symbols and other special
characters.
 Avoid sequences of numbers or characters, such as "123" and
"abc", and avoid repetition.
 Do not create passwords that use personal information such as
birthdays or nicknames.



Sorting Data
 Click ‘Data’ and then click Sort Cases.
 From the dialog box displayed, choose the variable(s) you want to sort by.
 Choose the Sort Order (Ascending/Descending).
 You can also save the sorted data to a different file.



Sorting Data
 Another way to sort data is by clicking on the column to select it and then right-clicking. From the drop-down menu that is displayed, select the sort option (ascending or descending).

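Sorting can also be done in syntax; Height is an assumed variable name, and (A) means ascending, (D) descending:

```spss
* Sort cases by Height in ascending order.
SORT CASES BY Height (A).
```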


Rank Cases
 Ranking is used to recode the data into their rank ordering, from smallest to largest or largest to smallest.
 Click on the Transform Menu  Click on Rank Cases.
 Move the Variable that will be used for the Ranking to the Variable List.
 Uncheck the Display Summary Table check box.
 Assign the Rank.
 Click OK.

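A syntax sketch of the same operation (Height is an assumed variable; SPSS stores the ranks in a new variable named by prefixing R to the source name):

```spss
* Rank cases by Height, smallest value = rank 1, without the summary table.
RANK VARIABLES=Height (A)
  /PRINT=NO.
```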


Inserting Variables
 In the Data View, right-click on the variable heading where the new variable would be inserted, then click on Insert Variable; or from the Edit menu, select Insert Variable; or from the toolbar, click the Insert Variable icon.

 From the Variable view, set the variable properties



Adding Variables
 Add a new variable “Weight” to the data



Inserting Cases
 A case is a row in SPSS Data View
 Entering data in a cell in a blank row automatically creates a
new case.
 You can also insert new cases between existing cases.
 To insert new cases between existing cases
 In Data View, select any cell in the case (row) below the
position where you want to insert the new case.
 From the Edit Menu  Insert Cases
 A new row is inserted for the case, and all variables receive the system-missing value.



Inserting Cases
 Click on the Case, before which you want the new cases to be
inserted.
 From the Edit Menu, click on Insert Case or from the toolbar
click on the insert
cases icon



Inserting variables
 Entering data in an empty column in Data View or in an empty row in Variable View automatically creates a new variable.
 The prefix var with a sequential number, and a default data format type (numeric), are assigned to the variable.
 To insert new variables between existing variables:
 Select any cell in the variable to the right of (Data View) or below (Variable View) the position where you want to insert the new variable.
 From the Edit Menu  Insert Variable
 A new variable is inserted with the system-missing value for all cases.
 A new variable can also be added by right-clicking on a variable and, from the drop-down menu displayed, selecting Insert Variable.
Navigation
 GOTO Case is used to move to a particular case (row). From the Edit menu, select Go to Case, or click on the Go to Case icon on the toolbar.
 GOTO Variable is used to move to a particular variable (column) in the same way.
 Type the Case or Variable Number and click Go.



Find and Replace
 The Find and Replace feature is helpful for tasks such as updating a respondent's name. Users can use Find and Replace in Data View.
 Switch to the data view window. Click to select any cell in the
column to be searched or click on the column header.
 From the Edit menu click Find or Click on the Find icon
on the toolbar.
 From the dialog box
displayed, input the data
to be searched for. Other
Find options are listed
under the show options.



Changing Number to Text
 This feature is used to convert variables that contain ordinal data either to their numeric or string values.
 To switch between text (string) and numeric values, click on any of the cells containing the values and, on the toolbar, click the switch-to-numeric-value icon.



Handling Missing Values
 System Missing: When a cell is left empty in SPSS, the cell is
assigned a dot and is identified as a system missing in SPSS.
 User Missing: User missing is designated by assigning a value
to missing cells in the “Missing” column in SPSS variable view.
It is recommended that you define Missing before entering
cases in the data view.
 You can assign user-missing values by double-clicking on the variable's Missing cell and assigning values in the dialogue box shown below.



Handling Missing Values
 It is better to choose a value for our missing data that we probably would not have in our dataset.
 For example, we can assign -99 as the missing data value for age, because it is impossible for us to have -99 as the data for a particular age in our dataset.
 However, -99 cannot be used when we are handling financial data.
 It is good to have as few labels for missing values as possible, so as to keep our data simple.


SPSS Output Viewer
 The SPSS Output Viewer is used to display the results of analysis.

 SPSS output viewer files are saved with extension .spv



SPSS Output Viewer 2
 SPSS’ Output Viewer window is the window that contains all generated
output. The most typical output items are tables and charts that describe
patterns in our data. An Output Viewer window opens automatically when
we generate output.
 Output Viewer files are saved with the extension .spv.

[Screenshot: the Data View alongside the Output Viewer, with the output outline (left) and the actual output (right) labelled.]
Transforming Data
 Example: Adding a new variable named ‘lnheight’ which is the natural
log of height
 Click on the Transform menu  Compute Variable 
 Type in lnheight in the ‘Target Variable’ box. Then type in
‘ln(height)’ in the ‘Numeric Expression’ box. Click OK
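The same transformation can be written as SPSS syntax (height is the variable from the example):

```spss
* Compute the natural log of height into a new variable lnheight.
COMPUTE lnheight = LN(height).
EXECUTE.
```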

©Digital Bridge Institute, Abuja Page 71


Importing Excel Data
 From the File menu, click Open  Data; in the dialog box
displayed, choose Excel as the file format and select the file.
 SPSS uses the first row of the Excel worksheet to create the
variable names and types.
 The Excel file will then be opened in SPSS.
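Excel files can also be read with syntax; the file path and sheet name below are placeholders:

```spss
* Read an Excel workbook; the first row supplies the variable names.
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\students.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
```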

©Digital Bridge Institute, Abuja Page 72


SPSS Shortcuts
KEYBOARD SHORTCUT ACTION
Ctrl + N Start a new file in SPSS
Ctrl + O Open an existing file
Ctrl + S Save the active dataset to a file
Ctrl + C Copy selection
Ctrl + V Paste copied selection
Ctrl + Z Undo an action
Ctrl + Y Redo an action
Ctrl + F Find cases in Data View
Ctrl + H Find and replace cases in Data View
Ctrl + T Switch between Data and Variable View
Hold Shift + Arrow Keys Select multiple cases
Delete Delete selection

©Digital Bridge Institute, Abuja Page 73


Select Cases 1
 If there are two or more
groups in a dataset, and each
group is to be analysed
separately/independently, the
Select Cases option can be
used.
 For example, in a dataset
where male and female
participants exist, the male
data can be analysed
separately from the female.
 To Select Case  Switch
to Data Editor View.
 From the Data menu 
Select Cases.
©Digital Bridge Institute, Abuja Page 74
Select Cases 2
 In the Select Case dialog Box, in the Select Section 
Select If condition is satisfied option  Click If button.
 In the select case: If dialog box, select the variable of choice
in the left box and then click the transfer arrow button to
move it to the right box.
 Click the = button, Click the 1 button and then click continue
 In the select case dialogue
box, Click Ok.
 The output viewer window
will open indicating the
selected participants.
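Pasting these steps produces filter syntax like the following; gender and the code 1 = “Male” are from the example dataset:

```spss
* Keep only cases where gender = 1; other cases are filtered
* out, not deleted.
COMPUTE filter_$ = (gender = 1).
FILTER BY filter_$.
EXECUTE.
* Restore all cases afterwards.
FILTER OFF.
```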

©Digital Bridge Institute, Abuja Page 75


Compute Variable
 Computing variables involves using arithmetic and
mathematical functions to transform variables. Compute
Variable can be used to:
 Add, subtract or divide a set of variables.
 Sum a set of Likert-scale items to form an overall scale.
 Select cases.
 Normalize data (log transformation) that violates the
assumption of normality.
 Check for multivariate outliers.
 Standardize values.
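For instance, summing Likert items into an overall scale can be written directly in syntax; the item names q1 to q5 are hypothetical:

```spss
* Total score across five Likert items (consecutive in the file).
COMPUTE satisfaction_total = SUM(q1 TO q5).
EXECUTE.
```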

©Digital Bridge Institute, Abuja Page 76


Compute Variable 2
 To compute a variable, click on the Transform menu  Compute
Variable.
 The Compute Variable dialog
box can now be used to
perform tasks, e.g. multiplying
student height by weight in
the datasheet shown.

[Screenshot: Compute Variable dialog, showing the new variable
name, the arithmetic expression, and the new computed variable
in the datasheet]
©Digital Bridge Institute, Abuja Page 77


Compute Variable 3
 Select Cases is very useful when selecting only a particular
group in a dataset, e.g. to select only 1 = “Male” in a particular
dataset.
 To select cases click on the Data Menu  Select Cases  Click on
If Condition  If  then select the group case  Click
Continue  Ok. Any further analysis would be based on the
selected group.

Unwanted data have been deselected

©Digital Bridge Institute, Abuja Page 78


Compute Variable 4
 To remove the selection, click on the Data Menu  Select Cases 
Click All Cases  Ok.

©Digital Bridge Institute, Abuja Page 79


Recoding Variables
 There are three main ways to recode variables in SPSS.
 Recoding String Variables to numeric.
 Recoding variables to New Variables.
 Recoding the same Variable.
 To change string data to numeric
data, click on the Transform Menu
 Automatic Recode.
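Automatic Recode has a one-line syntax equivalent; gender is the example variable:

```spss
* Create numeric codes (with value labels) from the string gender.
AUTORECODE VARIABLES=gender /INTO genderNum.
```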

©Digital Bridge Institute, Abuja Page 80


Recoding Variable 2
 A variable can be changed into a range by using the recoding
feature: click Transform  Recode into Different Variables.
 Select the variable to be recoded, assign a new name, and specify the ranges.
 Click Continue  Ok.
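In syntax, the same banding might look like this; the variable names and cut points are hypothetical:

```spss
* Band age into three groups in a new variable ageGroup.
RECODE age (LOWEST THRU 25=1) (26 THRU 40=2) (41 THRU HIGHEST=3)
  INTO ageGroup.
EXECUTE.
```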

©Digital Bridge Institute, Abuja Page 81


Recode to a New Variable
 Add the gender Variable and Assign a New name GenderNum
for the Recoded Gender Variable
 Check “Treat blank string values as user missing” box. Click OK
 A New Variable GenderNum is added to the Dataset.

©Digital Bridge Institute, Abuja Page 82


Weighting Cases
 Weighting cases is used to assign "importance" or "weight" to
the cases in a dataset. Some situations where this can be
useful include:
 data is in the form of counts (the number of occurrences) of factors or
events. The "weight" is the number of occurrences.
 data requires adjustments to correct for over- or under-representation of
certain characteristics in the sample. (This often happens with large
surveys: a "weighting" variable is developed to adjust a sample's
composition to be reflective of the population's composition, or to control
for over- or under-reporting from a certain group.)
 To Weigh Cases Click on  Data Menu  Weight Cases.
 To enable a weighting variable, click Weight cases by, then
double-click on the name of the weighting variable in the left-
hand column to move it to the Frequency Variable field.
Click OK.
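The weighting toggle is a single command in syntax; freq is a hypothetical count variable:

```spss
* Weight cases by the count variable freq.
WEIGHT BY freq.
* Turn weighting off again.
WEIGHT OFF.
```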

©Digital Bridge Institute, Abuja Page 83


Weighting Cases 2
 To turn off an enabled weighting variable, open Weight Cases
window again, and click Do not weight cases. Click OK.

 Further tests, such as crosstabs and the chi-square test, can
then be performed on the weighted cases.

©Digital Bridge Institute, Abuja Page 84


Grouping Data
 The grouping data tool is used to temporarily "group" or "split" data
in order to compare results across different subsets.
 This is useful when you want to compare frequency
distributions or descriptive statistics with respect to the
categories of some variable (e.g., Gender) - especially if
separate tables of results for each group is needed.
 To create group Click Data Menu  Split File
 The Split File window will appear.
By default, the dataset is not split
according to any criteria; this is
indicated by Analyze all cases,
do not create groups.
 Choose one of two ways to split the
data: Compare groups or Organize output by groups
©Digital Bridge Institute, Abuja Page 85
Grouping Data 2
 For both splitting methods, there are two considerations to be
made:
 The splitting variable(s) should be nominal or ordinal
categorical. SPSS will not stop you from using a continuous
variable as a splitting variable, but it is not advisable: SPSS
will see each unique numeric value as a distinct category.
 In order to split a dataset, SPSS requires that the data be
sorted with respect to the splitting variable. By default, Sort
the file by grouping variables is selected.
 When you no longer want to split your analyses by group, you
can turn Split File off through the same window that was used
to turn it on.
 Click Data  Split File  Click Analyze all cases, do not
create groups  Click OK.
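The split can also be toggled in syntax; gender is the grouping variable from the example:

```spss
* Compare groups side by side in each output table.
SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.
* ... run the analyses ...
SPLIT FILE OFF.
```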

©Digital Bridge Institute, Abuja Page 86


Practice Test
 Classify each of the following sets of data as ordinal, nominal or
scale.
 Student score in a semester course – GST 142.
 Rating of women’s top 5 favourite TV programmes.
 Average entry-level salary of fresh graduates in Nigeria.
 The prices of the 10 top brands of 1.5cl bottled water sold in
Nigeria.
 The number of bottles of Coca-Cola taken by each
undergraduate in Nigerian universities in a week.
 Marital status of staff that work in an organisation.
 Measure of job satisfaction of people living in a city.

©Digital Bridge Institute, Abuja Page 87


Identifying Duplicate Cases
 To Identify and Flag
Duplicate Cases
 From the Data Menu 
Identify Duplicate
Cases...
 Select one or more
variables that identify
matching cases.
 Select one or more of the
options in the Variables to
Create group.

©Digital Bridge Institute, Abuja Page 88


Identifying Duplicate Cases 2
 Optionally, select one or more variables to sort cases within
groups defined by the selected matching-case variables.
 The sort order defined by these variables determines the
"first" and "last" case in each group. Otherwise, the original
file order is used.
 Automatically filter duplicate cases so that they won't be
included in reports, charts, or calculations of statistics.
 Define matching cases by.
 Cases are considered duplicates if their values match for all
selected variables.
 If you want to identify only cases that are a 100% match in
all respects, select all of the variables.

©Digital Bridge Institute, Abuja Page 89


Calculating with Dates
 There are two methods that can be employed to determine the
duration of an event. The first method uses the Compute
Variable function and the second utilizes Date and Time Wizard
function. Both can be found in the Tool Bar under Transform.
 To use the Date and Time
Wizard: Click on the Transform
Menu  Date and Time Wizard.
 The Date and Time Dialog box
will be displayed.

©Digital Bridge Institute, Abuja Page 90


Calculating with Dates 2
 In the next window select the
“Calculate the number of time
units between two dates” option.
 This option is used to subtract
two dates from one another and
obtain a value in years, months
or days.

 This option can also be used to


add or subtract a particular
value to or from a date.

©Digital Bridge Institute, Abuja Page 91


Calculating with Dates 3
 Select the two dates to
subtract from. The Current
date and time function
[$TIME] is included in SPSS
in this dialog box.
 Choose the result
format (Truncate to
Integer, Round to
Integer, or Retain
Fractional Part).
 State the Name of the
Result variable and the
Variable Label.
 Click Finish

©Digital Bridge Institute, Abuja Page 92


Calculating with Dates 4
 The difference in years between two dates is displayed in the
Age variable.

 The Date and Time Wizard can also be used to perform several
tasks that involve dates and times.
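The wizard pastes DATEDIFF syntax like the following; birthdate is a hypothetical date variable:

```spss
* Age in whole years from birthdate to the current date/time.
COMPUTE age = DATEDIFF($TIME, birthdate, "years").
EXECUTE.
```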
©Digital Bridge Institute, Abuja Page 93
Module II

DESCRIPTIVE STATISTICS

©Digital Bridge Institute, Abuja Page 94


Representing Data
DATA

QUALITATIVE DATA
 Tabular methods:
 Frequency distribution
 Relative frequency distribution
 Crosstab
 Graphical methods:
 Bar chart
 Pie chart

QUANTITATIVE DATA
 Tabular methods:
 Frequency dist.
 Relative freq. dist.
 Cumulative freq. distribution
 Cumulative relative freq. distribution
 Crosstab
 Graphical methods:
 Histogram
 Scatter diagram
©Digital Bridge Institute, Abuja Page 95
Introduction to Descriptive Statistics
 Descriptive Statistics is important for;
 Summarizing and Describing data.
 To make comparison or determine the relationship between
variables.
 Checking assumptions e.g.
Are there outliers?
Is my data normally distributed?
 Answering Research Questions and Objectives.
Finding the mean weight for a particular population.
All other descriptive statistics techniques.
Gathering demographics of all other data.

©Digital Bridge Institute, Abuja Page 96


Descriptive Statistics

©Digital Bridge Institute, Abuja Page 97


Descriptive vs Inferential

©Digital Bridge Institute, Abuja Page 98


Frequency Analysis
 Frequency Analysis is a descriptive statistical method that
shows the number of occurrences of each response chosen by
the respondents.
 When using frequency analysis, SPSS Statistics can also
calculate the mean, median and mode to help users analyse
results and draw conclusions.
 To use the frequency analyses tool, Click on the Analyze Menu
 Descriptive Statistics  Frequencies.
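The equivalent syntax, with age as an example variable:

```spss
* Frequency table for age, plus its mean, median and mode.
FREQUENCIES VARIABLES=age
  /STATISTICS=MEAN MEDIAN MODE.
```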

©Digital Bridge Institute, Abuja Page 99


Frequency Distribution
 Frequency distribution is a simple way of depicting the number
of occurrences of a particular value or characteristic. It can be
shown by using either a table or a graph.
 From our sample data we can run the frequency distribution for
the Age variable as follows:
 Click on Analyze Menu
Descriptive Statistics
 Frequencies
 The result would be
displayed in the output
viewer.
 The data can now
be described.

©Digital Bridge Institute, Abuja Page 100


Frequency Distribution
 From our sample data we can determine the demographics of
our data by finding the distribution of Males and Females.
 From the Analyze Menu  Descriptive Statistics 
Frequencies.

©Digital Bridge Institute, Abuja Page 101


Measures of Central Tendency
 Measures of Central tendency are used to determine the middle
of a distribution and helps to summarise the data
 It helps to identify values that can be used as a representation
of the entire population.
 It helps to make reasonable comparison.
 Can be used as the base for most inferential statistics.
 The main measures of central tendency are;

 Mean
 Median
 Mode

©Digital Bridge Institute, Abuja Page 102


Finding Mean, Median and Mode
 Click Analyze Menu  Descriptive Statistics  Frequencies.
 Select the Variable of Interest and Click Statistics.

 Note: Analyze  Descriptive Statistics  Descriptives
gives the mean (and dispersion statistics) but not the
median or mode.

©Digital Bridge Institute, Abuja Page 103


Measures of Dispersion
 Measures of dispersion measure how spread out a set of data is.
They are used to:
 Compare two or more sets of data with similar means.
 Know how far an observation is from the mean.
 Judge the variability in a sample or population.
 The main measures of dispersion are;
 Range: The range is the difference between the highest and
lowest scores in a data set and is the simplest measure of
spread.
 Quartiles tell us about the spread of a data set by breaking the
data set into quarters. The interquartile range is the difference
between the 25th percentile (lower quartile) and the 75th
percentile (upper quartile); it therefore describes the middle
50% of the sample. A large interquartile range shows that the
middle 50% of observations are spread apart; the interquartile
range is not sensitive to outliers.
©Digital Bridge Institute, Abuja Page 104
Measure of Dispersion 2
 Variance: is a numerical value used to indicate how
widely individuals in a group vary. It is the square of the
standard deviation. The main limitation of variance lies in
the fact that its values are arbitrary and it does not
represent the true unit of measurement.
 Standard Deviation is the Square Root of the
Variance. It tells how far our samples are from the mean
or average.
 Measures of dispersion are not applicable to nominal
data because there is no reasonable rank between
each category.
©Digital Bridge Institute, Abuja Page 105
Measure of Dispersion -3
 Finding Range, Variance and Standard Deviation
 Click Analyze Menu Descriptive Statistics  Frequencies
 Click on the Variable of Interest  Select Range, Variance
and Standard Deviation Click OK.
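In syntax, with height as an example variable:

```spss
* Range, variance and standard deviation for height;
* /FORMAT=NOTABLE suppresses the frequency table itself.
FREQUENCIES VARIABLES=height
  /STATISTICS=RANGE VARIANCE STDDEV MINIMUM MAXIMUM
  /FORMAT=NOTABLE.
```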

©Digital Bridge Institute, Abuja Page 106


Measure of Dispersion 4
 Finding Interquartile Range
 Click Analyze Menu  Descriptive Statistics  Explore, select
the variable of interest and Click OK.

©Digital Bridge Institute, Abuja Page 107


Percentiles
 Percentiles  (or a centile) is a measure used in statistics indicating
the value below which a given percentage of observations in a group
of observations fall.
 If the 60th percentile of student weight in a population is 68kg, that
means 60% of the population of students weigh less than 68kg. Also, in a
student score dataset, if a score is in the 90th percentile, that means
the score is higher than 90% of other scores.
 Percentiles are used to compare values in relation to the entire
population and also as an indication of performance.
 Decile – each of ten equal groups into which a population can be
divided according to the distribution of values of a particular variable.
 Quartile – points in a distribution that cut the distribution into
quarters. The three quartiles are the 25th percentile (lower
quartile), the 50th percentile (median) and the 75th percentile
(upper quartile).

©Digital Bridge Institute, Abuja Page 108


Percentiles - 2
 To find Percentile Click on Analyze Menu  Descriptive
Statistics  Frequencies. Select the variable of interest 
Statistics.
 Specify the Percentile  Continue
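The syntax below requests specific percentiles and decile cut points for a hypothetical weight variable:

```spss
* 25th, 50th, 75th and 90th percentiles, plus deciles (10 groups).
FREQUENCIES VARIABLES=weight
  /PERCENTILES=25 50 75 90
  /NTILES=10.
```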

©Digital Bridge Institute, Abuja Page 109


Percentiles 3
 To find Percentile of a sample; from the Analyze Menu, click
Descriptive Statistics  Frequencies  Statistics. From the
dialog box displayed, select the variable of interest.
 Select  Cut points for 10 equal groups

©Digital Bridge Institute, Abuja Page 110


Percentiles 4
 To find Quartile of a sample; from the Analyze Menu, click
Descriptive Statistics  Frequencies  Statistics. From the
dialog box displayed, select the variable of interest.
 Select  Cut points for 10 equal groups

©Digital Bridge Institute, Abuja Page 111


Z Scores
 A Z-score is a statistical measurement of a score's relationship
to the mean in a group of scores. A Z-score of 0 means
the score is the same as the mean. A Z-score can also be
positive or negative, indicating whether it is above or below the
mean and by how many standard deviations:
Z = (score − mean) / standard deviation.
 Suppose we measure the weight of students in a class, and we
find that the mean weight is 70.1kg and the standard
deviation is 8.1. In this case we could say that a student who is
78.1kg is about one standard deviation above the mean weight,
assuming that weight is normally distributed. Thus Z ≈ 1.
 Thus the Z-score allows us to estimate the number of standard
deviations by which a student's weight differs from the mean weight.
 Z-scores have a Mean of 0 and Standard Deviation of 1.

©Digital Bridge Institute, Abuja Page 112


Z Scores - 2
 Uses of Z Score;
 It provides you with a probability of a defect.
 Probability of a score occurring within a normal distribution.
 It allows for the creation of composite or combined variables
from two or more variables from different scales, mean and
standard deviation.
 Assuming that Samuel is boasting that he has more weight
than anyone in the class; we would need to find out the
following;
 How many units of standard deviation does his weight differ
from the mean of all other members in the class?

©Digital Bridge Institute, Abuja Page 113


Z Scores 3
 To find the Z-score click on Analyze  Descriptive Statistics 
Descriptives  Select the variable of interest  Check Save
standardized values as variables, Click OK.

 Get the mean and Standard deviation score by Click Analyze


Menu  Descriptive Statistics Frequencies  Mean and
Standard Deviation  OK.
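The "Save standardized values as variables" option corresponds to one line of syntax; weight is the example variable:

```spss
* Adds a new variable Zweight containing the Z-scores of weight.
DESCRIPTIVES VARIABLES=weight /SAVE.
```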
©Digital Bridge Institute, Abuja Page 114
Cross Tab
 Cross tabs is used to summarize the relationship between two
categorical variables. A cross-tabulation (or crosstab for short)
is a table that depicts the number of times each of the possible
category combinations occurred in the sample data.
 To create a crosstab, click Analyze  
Descriptive Statistics  Crosstabs
 Row(s): One or more variables to use
in the rows of the crosstab(s). You
must enter at least one Row variable.
 Column(s): One or more variables
to use in the columns of the crosstab
(s). You must enter at least one Column variable.
• Layer- layer is used to create the crosstab between the Row and
Column variable(s) at each level of the layer variable. You can have
multiple layers of variables by specifying the first layer variable and then
clicking Next to specify the second layer variable. 
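A crosstab with an accompanying chi-square test can be requested in syntax; gender and ageGroup are hypothetical variables:

```spss
* Gender by age-group crosstab with row percentages and chi-square.
CROSSTABS /TABLES=gender BY ageGroup
  /CELLS=COUNT ROW
  /STATISTICS=CHISQ.
```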
©Digital Bridge Institute, Abuja Page 115
Cross Tab 2
 Statistics: Open the Crosstabs: Statistics window, which
contains different inferential statistics for comparing categorical
variables.
Cells: Opens the Crosstabs: Cell
Display window, which controls which
output is displayed in each cell of the
crosstab.
 Format: Opens the Crosstabs: Table
Format window, which specifies how
the rows of the table are sorted.

©Digital Bridge Institute, Abuja Page 116


Cross Tab 3
 The output from the crosstab example shows the distribution
of males and females in the gender variable.

 The crosstabs procedure can use numeric or string variables


defined as nominal, ordinal, or scale. However, crosstabs
should only be used when there are a limited number of
categories.

©Digital Bridge Institute, Abuja Page 117


Comparing Values
 Comparing values for groups is used to compare the data
between two or more groups e.g.
 Compare the weight of males and females.
 Compare the weight between sick and healthy people.
 Click on the Data Menu Split File  select the Variable of
Interest (e.g. Gender)  Compare Groups OK. Then the
Analyze Menu Descriptive Statistics Frequencies
Select the field of Interest  Select Mean, Median, Range,
Standard Deviation, Click OK.

©Digital Bridge Institute, Abuja Page 118


Comparing Values 2
 Organising output by groups create separate tables for different
categories or group in the output viewer.
 Click on Data Menu Split File  select the Variable of Interest
(e.g. Gender)  Organise output by groups OK. The Analyze
Menu Descriptive Statistic Frequencies Select field of
Interest  Select Mean, Median Range, Standard Deviation,
Click OK.

©Digital Bridge Institute, Abuja Page 119


Compare Mean
 The Compare Means procedure is used when there is a need
to compare differences in descriptive statistics across one or
more factors, or categorical variables.
 To open the Compare Means procedure, click Analyze Menu 
Compare Means  Means.
Dependent List: The
continuous numeric
variables to be analyzed.
You must enter at least
one variable in this box
before you can run the
Compare Means procedure.
Independent List: The categorical variable(s) that will be used to subset
the dependent variables. Specifying multiple values in the "Layer 1 of 1"
box will produce several tables, each with one layer variable.  
©Digital Bridge Institute, Abuja Page 120
Compare Mean 2
Options: Opens the Means: Options window, where you can specify the type
and order of descriptive statistics to produce.
 Select Statistics  Continue Ok.
 Results are displayed in the output
viewer
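In syntax, the same comparison for hypothetical weight and gender variables:

```spss
* Mean, count and standard deviation of weight within each gender.
MEANS TABLES=weight BY gender
  /CELLS=MEAN COUNT STDDEV.
```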

©Digital Bridge Institute, Abuja Page 121


Creating Charts
 A histogram is a graphical representation of the distribution of
numerical data. It is an estimate of the probability distribution of
a continuous variable (quantitative variable) and was first
introduced by Karl Pearson.
 To create a Histogram Click Graph Menu  Legacy Dialog 
Histogram  Select the Variable of Interest  OK.

©Digital Bridge Institute, Abuja Page 122


Histograms
 Another way of creating histograms is to click on Graph 
Chart Builder  Ok.
 Drag the histogram of interest from the gallery to the preview,
then drag the variable of interest onto the axis in the preview
portion of the dialog box.

©Digital Bridge Institute, Abuja Page 123


Bar Charts
 A bar chart or bar graph is a chart that presents grouped data
with rectangular bars with lengths proportional to the values
that they represent. The bars can be plotted vertically or
horizontally. A vertical bar chart is sometimes called a column
bar chart.
 There are three main types of bar Chart in SPSS;
 Simple Bar Chart – this can be used to visualize the distribution of one
variable or the relationship between two or more variable e.g. Gender
and Height
 Clustered Bar Chart – this is used mostly when a link is to be
established between two or more variables; it is used to support
inferential statistics such as ANOVA, e.g. we can find out whether the
income class of a set of people affects the amount they spend on call
credit in a month.
 Stacked Bar Chart - this is used for comparing nominal variable
(gender) based on an ordinal variable (Likert Scale) e.g. we can
compare gender with the degree of job satisfaction
©Digital Bridge Institute, Abuja Page 124
Creating a Bar Chart
 To create a Bar Chart Click Graph Menu  Legacy Dialogs 
Bar chart  Select the Variable of Interest  OK.
 Insert the variable of Interest into the category axis box
 Click Ok.
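The legacy bar chart equivalent in syntax; gender is the example category variable:

```spss
* Simple bar chart of counts for each gender.
GRAPH /BAR(SIMPLE)=COUNT BY gender.
```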

©Digital Bridge Institute, Abuja Page 125


Creating a Bar Chart 2
 To Create a Bar Chart, Click on the Graph Menu, Chart Builder
 Ok.
 From the Chart Builder dialog box, drag the bar chart of
interest from the gallery to the preview, then drag the variable of
interest to the X axis  choose the type of statistic from the
Element Properties dialog box  Click Ok.

©Digital Bridge Institute, Abuja Page 126


Clustered Bar Chart
 To Create a Clustered Bar Chart, Click on the Graph Menu,
Chart Builder  Ok.
 From the Chart Builder dialog box, drag the clustered bar
chart of interest from the gallery to the preview, then drag the
variable of interest to Cluster on X.
 Drag any other variable of interest to Count  choose the type of
statistic from the Element Properties dialog box  Click Ok.
©Digital Bridge Institute, Abuja Page 127


Pie Chart
 Pie charts are simple charts used to depict percentages. A pie
chart is a circle divided into sectors that each represent a
proportion of the whole. They are commonly used to represent
demographic questions in a dataset and are best suited for two
or more categories but not more than six.
 One way to create a Chart in SPSS is by
Clicking on Analyze Menu  Descriptive
Statistics  Frequencies Select the
variable of Interest Chart Pie Chart
 Percentage or Frequency  Continue
 Ok.

©Digital Bridge Institute, Abuja Page 128


Pie Charts - 2
 Another way of creating a pie chart is to click on the Graph
Menu  Legacy Dialogs  Pie Chart  Select Type of
Grouping  Define. This is a more flexible option that allows
a user to add labels and tags.

©Digital Bridge Institute, Abuja Page 129


Pie Chart - 3
 A third way to create Pie Charts in SPSS is to use the Chart
Builder. Click on Graph Menu  Chart Builder  Ok.
 Drag Pie Chart to the Preview Screen  Drag the Dependent
variable to Slice by Section in the Preview Screen.
 Select the Statistics to use  Click Apply  OK.

 Properties, Titles and Labels can also be added to the Pie Chart.

©Digital Bridge Institute, Abuja Page 130


Pie Chart - 4
 Value Labels can be added to Pie Charts in the output Viewer.
 Double Click on the Pie Chart in the Output Viewer, this
activates the Chart Editor.
 Then Right Click on the Pie Chart and Select Show Data Labels
 Other chart properties can
also be set from here.

©Digital Bridge Institute, Abuja Page 131


Line Graphs
 A line graph is useful in displaying data or information that changes
continuously over time. The points on a line graph are connected by
a line. Another name for a line graph is a line chart. Line graphs are
very similar to bar charts and both can be used for the same kinds of
variables. In SPSS we can use simple or clustered line graphs.
 To Plot a Line Graph, Click on Graph Menu  Chart Builder  OK.
 Drag Line Chart to the Preview Menu  Drag the dependent and
Independent variable to their respective axis.
 Set Chart Properties  Apply  OK.

©Digital Bridge Institute, Abuja Page 132


Line Graph - 2
 The clustered line graph is used to look at the relationship or
difference between more than one variable.
 To create a clustered line graph, click on the Graph Menu  Chart
Builder  OK.
 Drag the clustered line chart to the preview  drag the
dependent and independent variables to their respective axes.
 Set Chart Properties  Apply  OK.
 e.g. the graph below shows student weight and height by
gender.

©Digital Bridge Institute, Abuja Page 133


Scatter Plot
 Scatterplots are used to create a plot that depicts the
relationship between two variables.
 It is a graph of plotted points that show the relationship
between two sets of data. In this example, each dot represents
one person's weight versus their height.
 Scatterplots are the most commonly used graph to investigate
the relationship between two variables before carrying out
inferential statistics like Pearson Product Moment Correlation or
Spearman Rank Order Correlation (Non-Parametric alternative
of the Pearson Moment Correlation).
 For example from our sample data, we may want to investigate
the relationship between student weight and height using a
scatter plot.

©Digital Bridge Institute, Abuja Page 134


Scatter Plot 2
 To create a Scatter Plot Graph, Click on Graph Menu  Chart
Builder  OK.
 Drag the simple Scatter Plot to the Preview Menu  Drag the
dependent and Independent variable to their respective axis.
 Set Chart Properties  Apply  OK.
 e.g. the graph below shows the relationship between student
weight and height.
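The scatterplot can be produced directly with syntax; height and weight are the example variables:

```spss
* Scatterplot of weight against height.
GRAPH /SCATTERPLOT(BIVAR)=height WITH weight.
```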

©Digital Bridge Institute, Abuja Page 135


Scatter Plot 3
 To Draw a line across the Scatter plot  double click on the chart
in the SPSS output viewer, from the chart editor and property
dialog boxes select the type of line to use  Click Apply  Close.
 The Chart Editor and the property dialog boxes are used to
change the look and appearance of the chart.

©Digital Bridge Institute, Abuja Page 136


Box Plot 1
 A box plot is a graphical rendition of statistical data based on the
minimum, first quartile, median, third quartile, and maximum.
 The term "box plot" comes from the fact that the graph looks like a
rectangle with lines extending from the top and bottom. Because
of the extending lines, this type of graph is sometimes called a
box-and-whisker plot.
 To create a Box Plot Graph, Click on Graph Menu  Chart Builder
 OK.
 Drag the simple Box Plot to the Preview Menu  Drag the
dependent and Independent variable to their respective axis.
 The variable being summarised (the Y axis) must be a scale
variable.
 Set Chart Properties  Apply  OK.
 e.g. the graph below shows the distribution of student weight
for each gender.
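Boxplots split by a grouping variable can also come from the Explore procedure; weight and gender are example variables:

```spss
* One boxplot of weight per gender category.
EXAMINE VARIABLES=weight BY gender
  /PLOT=BOXPLOT
  /STATISTICS=NONE.
```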

©Digital Bridge Institute, Abuja Page 137


Box Plot 2

[Annotated box plot: the top of the box marks the 75th percentile,
the line inside the box the 50th percentile (median), and the
bottom of the box the 25th percentile; the box height is the
interquartile range, and the whiskers extend beyond the box]
©Digital Bridge Institute, Abuja Page 138


Chart Editor
 The Chart editor is used in the output viewer to edit or change
the way charts are displayed. It is also used to change the
property of charts.
 To use the Chart Editor, double-click on the chart in the Output
Viewer; then, to use the property editor, from the Edit menu 
Click Properties, or press Ctrl + T.
 To make specific change to a
particular section of a chart,
just right click on that section
then select the intended action
from the dropdown property
menu.

©Digital Bridge Institute, Abuja Page 139


Exporting Output
 The Export File feature allows a user to save SPSS output to
other file formats, e.g. Word, PDF, Excel etc.
 There are four main settings to look at;
 First, pick the type of file to which you want to export: useful
file types include Excel, PDF, PowerPoint, or Word.
 Next, check that you are exporting as much of your output as
you want, using the Objects to Export setting at the top of the
dialog.
 If you have a part of your output selected, this option will
default to exporting just your selection; otherwise you
typically will export all your visible output.
 Finally, change the default file name to something
meaningful, and save your file to a location where you will be
able to keep it.

©Digital Bridge Institute, Abuja Page 140


Exporting Output 2
 To Export Output from the Output Viewer Click File Menu 
Export.

©Digital Bridge Institute, Abuja Page 141


Independent vs Dependent Variable
 The two main variables in an experiment are the independent and
dependent variable.
 An independent variable is the variable that is changed or
controlled in a scientific experiment to test the effects on the
dependent variable.
 A dependent variable is the variable being tested and measured in
a scientific experiment.
 The dependent variable is 'dependent' on the independent
variable. As the experimenter changes the independent variable,
the effect on the dependent variable is observed and recorded.
 The value of the independent variable is varied by the researcher,
while the value of the dependent variables results from the
changes in the value of the independent variable.
 The value of the independent variable can be changed directly,
while the value of the dependent variable cannot; it only responds
to changes in the independent variable.
©Digital Bridge Institute, Abuja Page 142
Independent vs Dependent Variable
 For one independent variable, there may be more than one
dependent variable.
 However, several dependent variables may all depend on the
same single independent variable.
 The value of an independent variable can be changed. You cannot
change the value of a dependent variable.
 The independent variable is the value which is manipulated in
an experiment.
 The dependent variable is the value observed by the researcher
during an experiment.
 The Independent Variable is denoted by X, while the dependent
variable is denoted by Y in a graphical representation.
 The Independent variable denotes the cause, while the
dependent variable denotes the effects of the actions of the
cause.

©Digital Bridge Institute, Abuja Page 143


Dependent/Independent Variables
 The Independent variable is the variable that causes the change. It is
the variable that affects the dependent variable. The dependent
variable responds to changes in the independent variable.
Test Case Independent Dependent Variable
Variable (X) (Y)
Relationship between Gender and Gender Weight
Student Weight
Relationship between Ice Cream Sales Weather Sales
and Weather
Difference in motivation level between Gender Motivation Level
males and Female Workers
Job Satisfaction between Self Employment Type Satisfaction Level
employed and employed persons
Job Satisfaction and Employee Level Job Satisfaction Employee
of Performance Performance
Grey Hairs and Age Age Grey Hair
Relationship between the Hours Hours worked in a Amount Earned
worked and amount earned Month
©Digital Bridge Institute, Abuja Page 144
Practice Test
 Generate data to show the relationship between income level and job
satisfaction.

 Generate SPSS data to illustrate the relationship between Type of Job


and cumulative income at the end of the month.

 A sample of 20 people enrolled in a weight reduction programme. After


one month their weight loss in Kg is as follows; 12, 9, 10, 12, 15, 6, 9, 3,
7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15. Using SPSS find the Mean,
Median, Variance and Standard Deviation for the Weight Loss. Use a
Histogram and a Line Chart to illustrate the distribution of the data.

 Generate data for the student assessment of the course material for
GST 412 and how students' satisfaction level with the course material
affects their performance in their semester exam. Acceptance level is scaled as
“Extremely Helpful”; “Very Helpful”; “Somewhat Helpful”; “Slightly Helpful”;
“Not Helpful at All”.
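The descriptive statistics asked for in the weight-loss exercise can be cross-checked outside SPSS. A minimal Python sketch (standard library only):

```python
import statistics

# Weight loss (kg) for the 20 participants in the exercise
weight_loss = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]

mean = statistics.mean(weight_loss)          # arithmetic mean
median = statistics.median(weight_loss)      # middle value of the sorted data
variance = statistics.variance(weight_loss)  # sample variance (n - 1 divisor, as SPSS reports)
std_dev = statistics.stdev(weight_loss)      # sample standard deviation

print(mean, median, round(variance, 2), round(std_dev, 2))
```

The mean works out to 10.95 kg and the median to 10.5 kg; SPSS should report the same values.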

©Digital Bridge Institute, Abuja Page 145


Module III

Statistical Testing

©Digital Bridge Institute, Abuja Page 146


Statistical Testing
 Statistical Testing procedures generally fall into two possible
categorizations: parametric and non-parametric.
 Depending on the level of the data you plan to examine (e.g.,
nominal, ordinal, continuous), a particular statistical approach
should be followed.
 To make generalization about the population from the sample,
statistical tests are used.
 It is a formal technique that relies on the probability distribution,
for reaching the conclusion concerning the reasonableness of
the hypothesis.
 These hypothesis tests relating to differences are classified
as parametric and nonparametric tests.
 In a parametric test, the test statistic is based on an assumed
probability distribution; in a non-parametric test, no particular
distribution is assumed.

©Digital Bridge Institute, Abuja Page 147


Parametric and Non-Parametric Test
 In the parametric test, it is assumed that the measurement of
variables of interest is done on interval or ratio level. As
opposed to the nonparametric test, wherein the variables of
interest are measured on a nominal or ordinal scale.
 Parametric tests rely on the assumption that the data you are
testing is normally distributed.
 Non-parametric tests are frequently referred to as distribution-
free tests because there are no strict assumptions to check
regarding the distribution of the data.
 In general, the measure of central tendency in the parametric
test is mean, while in the case of the nonparametric test is
median.

©Digital Bridge Institute, Abuja Page 148


Normal Distribution
 The normal distribution is the most important and most widely
used distribution in statistics.
 It is sometimes called the "bell curve," although the tonal qualities
of such a bell would be less than pleasing. It is also called the
"Gaussian curve" after the mathematician Carl Friedrich Gauss.
 The assumption of normality assumes that the data will be
normally distributed.
 It conforms with a bell shaped curve also known as the normal
distribution curve.
 Most statistical tests rely on the assumption of normality.
 Majority of the data is concentrated at the centre with the
remaining data thinning to the left and right of the curve.
 Normal distribution is one of the most important assumptions for
parametric statistics. If the data is not normal, the researcher can
only use non-parametric tests.
©Digital Bridge Institute, Abuja Page 149
Normal Distribution 2
 Normal distribution makes it possible for us to easily detect
biased data.
 Normally distributed data is data with the mean at the centre.

 It is good for data to be normally distributed so that result


obtained will be without bias.
 Normal distribution is one of the most important assumptions in
statistics.

©Digital Bridge Institute, Abuja Page 150


Normal Distribution 3
 There are two main techniques in SPSS that are used to check for
Normality.
 The Graphical Techniques: using Histogram with a normal
curve.
 Statistical Technique: using Skewness, kurtosis, Shapiro-Wilk
and Kolmogorov-Smirnov.
 There are also two methods of implementing the graphical
technique of finding normal distribution. They are;
 Normal Distribution Curve:
 This assumes that the data would be normally distributed i.e.
conforms to a bell shaped curve (normal distribution curve)
 Normal Quantile-Quantile (Q-Q) plot:
 This is a scatterplot of the data versus the expected quantiles; if the
data actually come from a normal distribution, the points will fall
approximately along a straight line.
©Digital Bridge Institute, Abuja Page 151
Normal Distribution 4
 One Scale Variable e.g height.
 One Categorical Variable: used when comparing a scale
variable e.g. height against a categorical variable e.g. Gender.
 Two or more Categorical Variables: used when comparing a scale
variable against two or more categorical variables.

 To Create a Normal Curve for One Scale Variable
 Click on Graph Menu  Legacy Dialogs  Histogram.
Select the variable of interest  Select Display Normal Curve 
Click Ok.

©Digital Bridge Institute, Abuja Page 152


Normal Distribution 5
 The normal curve below shows the normal distribution of student
weight.

 The graphical method relies on subjective opinion (the judgement
of the researcher as against statistical evaluation), thus
researchers prefer not to use it.

©Digital Bridge Institute, Abuja Page 153


Normal Distribution 6
 One categorical variable and one dependent variable, e.g.
gender and weight.
 To do this we create a separate histogram for each group of the
categorical variable.
 Click on Data Menu  Split File  Compare Groups  Select
the variable of interest  Ok.
 Graph Menu  Legacy Dialogs  Histogram  Select the variable
of interest  Display Normal Curve  Ok.

©Digital Bridge Institute, Abuja Page 154


Normal Distribution 7
 Normal distribution for two categorical variables and one dependent
variable, e.g. health status (healthy & unhealthy), Gender (male
and female) and weight.
 Click Data Menu Split File Add the two categorical variables
 Compare Groups  Ok.
 Graph Menu  Legacy Dialogs  Histogram Display Normal
Curve OK.

 The graphs are displayed. Not all of the graphs may pass the
normality test; some may show skewness, especially if the
data set is not large.
©Digital Bridge Institute, Abuja Page 155
Normal Distribution 8
 A simple way of plotting the Normal Q-Q Plot is to Click on the
Analyze Menu  Descriptive Statistics  Q-Q Plot  From the
Q-Q Plot dialog box displayed  Select the Target Variable 
Click Ok

©Digital Bridge Institute, Abuja Page 156


Normal Distribution 9
 Normal Q-Q Plot
 Click Analyze Menu  Descriptive Statistics  Explore 
Select the Variable of Interest  Click Plots  Histogram 
Normality Plot with Test  Un-tick Stem & Leaf  Continue 
Ok.

©Digital Bridge Institute, Abuja Page 157


Normal Distribution 10
 From the Q-Q Plot below we cannot say that the weight for
male and female students is normally distributed, because the
relationship shown in the graph is not linear.
 This method is used in cases where you have one dependent
variable and one categorical variable.

©Digital Bridge Institute, Abuja Page 158


Normal Distribution 11
 Using a Q-Q plot we can also check for the normal distribution
of one dependent variable and two categorical variables. To do
this;
 Click on Data Menu Split File Compare group Select
the two categorical variable  OK.
 From the Analyze Menu, Select
Descriptive Statistics Explore Add
The dependent Variable of Choice 
Click Ok

©Digital Bridge Institute, Abuja Page 159


Normal Distribution 12
 The Q-Q plot output below shows the distribution of two
categorical variables (Gender and Health Condition) against
one dependent variable (Weight).
 The Q-Q plot shows that the categorical variables are not
normally distributed.

©Digital Bridge Institute, Abuja Page 160


Skewness
 Skewness and Kurtosis are Statistical methods used to check
for Normal Distribution.
 Skewness measures the degree of asymmetry; which is the
violation of symmetry. A normal distribution is symmetric and
has a Skewness value of zero (0).

Normal
Distribution

 A variable with no skew has balanced tails.

©Digital Bridge Institute, Abuja Page 161


Skewness 2
 A Variable with a Positive Skew has a long right tail (higher
values).
 A Variable with a Negative Skew has a long left tail (lower
values).

 A Skewness value that is more than twice its standard error for a
variable is an indication that the data is not symmetrical. By
dividing the Skewness value by the standard error we can find
out whether the Skewness is statistically significant.
 If this ratio is greater than +1.96 or less than –1.96, it is
statistically significant and the data under consideration may
violate the assumption of normality.
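The skewness-to-standard-error ratio can be illustrated outside SPSS. A Python sketch, assuming scipy is available; the simple sqrt(6/n) standard error is only an approximation of the value SPSS prints:

```python
from scipy.stats import skew

# Hypothetical weight data
data = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]

n = len(data)
skewness = skew(data)             # sample skewness
se_skewness = (6 / n) ** 0.5      # approximate standard error of skewness
z_ratio = skewness / se_skewness  # compare against +/-1.96

# |z_ratio| > 1.96 would suggest a violation of the normality assumption
print(round(skewness, 3), round(z_ratio, 3))
```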
©Digital Bridge Institute, Abuja Page 162
Skewness 3
 From our sample data let us get the Skewness for Weight. To do
this;
 Click Analyze Menu Descriptive Statistics  Explore 
Select the Variable of Choice (Weight)  Statistics 
Descriptive  Continue  Ok.
 From the result displayed in the output viewer window, pick
out the Skewness of the variable. Divide the Skewness by
the standard error value. If the answer gotten is greater than
+1.96 or less than –1.96, the data may violate the assumption of
normality.
©Digital Bridge Institute, Abuja Page 163
Kurtosis
 Kurtosis measures the level of sharpness or flatness of a
frequency distribution curve. A normally distributed data has a
kurtosis of zero.
 A normal curve with zero (0) kurtosis is called Mesokurtic curve.
 A positive kurtosis is called Leptokurtic Curve. This means the
variable under study may violate the assumption of normality.
The curved is usually “peaked”
 A negative kurtosis is called a Platykurtic curve. It means the
curve is flatter than normal, with low frequencies across the values.

©Digital Bridge Institute, Abuja Page 164


Kurtosis 2
 To check for kurtosis. View the kurtosis value from the statistics
table of the output viewer.
 Divide the kurtosis value by the standard error value. If the
result is greater than +1.96 or less than –1.96, this means that the
variable under study may have violated the assumption of
normality.
 In interpreting skewness and kurtosis values, it can be written as
follows;
 Variable was (non-normally) or (Normally) distributed with
Skewness of _____(SE = ______) and Kurtosis of
___________ (SE = ______).
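The same ratio check applies to kurtosis. A Python sketch, again assuming scipy; sqrt(24/n) approximates the standard error SPSS reports:

```python
from scipy.stats import kurtosis

# Hypothetical weight data
data = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]

n = len(data)
excess_kurtosis = kurtosis(data)  # Fisher definition: a normal distribution gives 0
se_kurtosis = (24 / n) ** 0.5     # approximate standard error of kurtosis
z_ratio = excess_kurtosis / se_kurtosis

# |z_ratio| > 1.96 would suggest a violation of the normality assumption
print(round(excess_kurtosis, 3), round(z_ratio, 3))
```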

©Digital Bridge Institute, Abuja Page 165


Shapiro Wilks Test
 The Shapiro-Wilk test of normality tests the null hypothesis
that the sample data are not significantly different from a
normal population, by comparing the observed values with the
expected values. Thus the Shapiro-Wilk test checks the level of
similarity between the observed and the expected values.
 In this case the significance value should not be less than .05,
because the value indicates the level of similarity or difference
between the observed and expected values.
 The test works best when the sample size is less than or equal
to 50.
 To use the Shapiro-Wilk test, Click on Analyze Menu 
Descriptive Statistics  Explore  Select the Variable of
Interest  Plots  Click Normality Plot with Test  Untick
Stem and Leaf  Continue  OK.
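The same test can be run outside SPSS for verification; a sketch using scipy's shapiro function on hypothetical weight data:

```python
from scipy.stats import shapiro

# Hypothetical weight data (n <= 50, where Shapiro-Wilk works best)
data = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]

# Null hypothesis: the sample comes from a normal population
statistic, p_value = shapiro(data)

# p > .05: fail to reject the null; no significant departure from normality
print(round(statistic, 3), round(p_value, 3))
```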

©Digital Bridge Institute, Abuja Page 166


Shapiro Wilks Test 2

 From the output table displayed it can then be determined
whether a variable is normally distributed based on the Shapiro-
Wilk test: the variable is considered normally distributed if the
significance value is greater than 0.05.

©Digital Bridge Institute, Abuja Page 167


Kolmogorov Smirnov Test
 The Kolmogorov-Smirnov test is similar to the Shapiro-Wilk test.
The main difference between the two tests is that the
Kolmogorov-Smirnov test can be used for a sample of more
than 50 and it is not sensitive to problems (outliers
and clumps) in the tail. It is a non-parametric test.
 Kolmogorov Smirnov looks at the similarity between the
cumulative distribution of the samples and the cumulative
distribution of the normal population. A significant (less than
0.05) p-value shows that the sample data is not normally
distributed.
 Where there is a conflicting result between the outputs of the
Shapiro-Wilk and Kolmogorov-Smirnov tests, the researcher
would have to rely on other methods of checking for normality.
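For illustration, a plain Kolmogorov-Smirnov test against a fitted normal can be sketched in Python with scipy. Note that estimating the mean and SD from the same sample strictly calls for the Lilliefors correction (which SPSS applies), so this is only an approximation:

```python
import statistics
from scipy.stats import kstest

# Hypothetical weight data
data = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]
mean, sd = statistics.mean(data), statistics.stdev(data)

# Compare the sample's cumulative distribution with the normal CDF
statistic, p_value = kstest(data, "norm", args=(mean, sd))

# p < .05 would indicate the sample is not normally distributed
print(round(statistic, 3), round(p_value, 3))
```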

©Digital Bridge Institute, Abuja Page 168


Non-Normal Variable Transformation
 If data checked for normality using the different tests is not
normally distributed, it can often be made closer to normal
using a log transform or the square root method.
 To transform a variable Click on Transform Menu  Compute
Variable. Select the operation to be performed on the variable
from the function group.
 Click Ok.
 Find the Skewness, Kurtosis
and other tests of the variable.
You will find that the data
is getting closer to being normally
distributed.
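The effect of the two transforms can be seen on a small positively skewed sample; a Python sketch with hypothetical values:

```python
import math
from scipy.stats import skew

# Hypothetical positively skewed sample (long right tail)
data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21, 40]

log_data = [math.log(x) for x in data]     # log transform (values must be > 0)
sqrt_data = [math.sqrt(x) for x in data]   # square-root transform (values must be >= 0)

# Both transforms pull in the long right tail, reducing the skewness
print(round(skew(data), 2), round(skew(log_data), 2), round(skew(sqrt_data), 2))
```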

©Digital Bridge Institute, Abuja Page 169


Univariate Outliers
 Univariate Outliers are extreme values for a single variable.
Outliers can appear in a dataset as a result of mistakes or
extremely different values.
 Outliers tend to affect the statistical significance of results.
Two methods of checking for outliers are;
Statistical Method using Outlier labelling rule
Graphical Method.
 Analyze Menu  Descriptive Statistics  Explore  Select
variable of Interest  Statistics  Outlier and Percentile 
Continue  Plot  Histogram  Continue  OK.
 The Outlier labelling rule labels as outliers any observation
below the lower bound or above the upper bound.
 Lower Bound = Lower Quartile – (2.2*(Q3-Q1))
 Upper Bound = Upper Quartile + (2.2*(Q3-Q1))
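The labelling rule can be sketched in Python; note that quartile methods differ slightly between packages (SPSS uses weighted averages), so the bounds may differ marginally from SPSS's:

```python
import statistics

# Hypothetical weight data
data = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]

q1, _, q3 = statistics.quantiles(data, n=4)  # lower and upper quartiles
iqr = q3 - q1

# Outlier labelling rule with the 2.2 multiplier
lower_bound = q1 - 2.2 * iqr
upper_bound = q3 + 2.2 * iqr

outliers = [x for x in data if x < lower_bound or x > upper_bound]
print(lower_bound, upper_bound, outliers)  # no outliers in this sample
```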
©Digital Bridge Institute, Abuja Page 170
Univariate Outliers 2

 Using the outlier labelling
rule, compute the outlier
bounds and compare them with
the extreme values table.

©Digital Bridge Institute, Abuja Page 171


Univariate Outliers 2
 Another way of checking for outliers is by using Z Scores.
 First standardize your variables using Z Scores. If there are 80
or fewer cases, outliers would be those associated with large
standard z score values e.g. less than -2.5 or greater than
+2.5.
 If there are more than 80 cases, outliers would be those cases
associated with large standard Z score values e.g. less than -3
or greater than +3.
 This method is efficient for data that is normally distributed. If
the data is normally distributed a box plot can be used to
identify outliers.
 To get Z-Scores  Analyze menu  Descriptive Statistics
Descriptive  Select Variable of Interest  Click Save
Standardized Values as Variables  Ok.
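The z-score flagging rule can be sketched in Python without SPSS:

```python
import statistics

# Hypothetical weight data
data = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]
mean, sd = statistics.mean(data), statistics.stdev(data)

# Standardize each value (what "Save standardized values as variables" does)
z_scores = [(x - mean) / sd for x in data]

# 80 or fewer cases: flag |z| > 2.5; more than 80 cases: flag |z| > 3
threshold = 2.5 if len(data) <= 80 else 3.0
outliers = [x for x, z in zip(data, z_scores) if abs(z) > threshold]
print(outliers)  # the largest z here is about 2.17, so nothing is flagged
```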

©Digital Bridge Institute, Abuja Page 172


Univariate Outliers 3
 From the Z Scores in the
datasheet apply the Z score
rules to locate outliers.

 Another way of determining outliers is to use a Box Plot.


 Click Analyze Menu  Descriptive Statistics  Explore
Select the variable of Interest  Statistics  Click Outliers 
Ok.

©Digital Bridge Institute, Abuja Page 173


Univariate Outliers 4
 If the case is represented by an asterisk *, it means the outlier is
more than 3 box lengths away from the hinge.
 If the case is represented by a circle o, it means the outlier is
more than 1.5 box lengths away from the hinge.

Extreme
Outlier

©Digital Bridge Institute, Abuja Page 174


Multivariate Outliers
 Multivariate Outliers are unusual combinations of values for
independent variables. The value of any of the independent
variables may not be a univariate outlier, but in combination
with other variables it may be an outlier.
 Multivariate outliers do not occur too often, but it is often
necessary to check for multivariate outliers when conducting
multivariate analysis such as multiple regression, MANOVA etc.
 To check for multivariate outliers in SPSS, we use the
Mahalanobis Distance (MD).
 MD calculates multidimensional Z scores that look at the
distance of values from the centroid, and checks whether these
calculated values are statistically significant.
 Identifying Multivariate Outliers requires checking the
probability of the MD values, not the scores themselves.

©Digital Bridge Institute, Abuja Page 175


Multivariate Outliers 2
 A case is a multivariate outlier if the probability associated
with its MD value is less than or equal to .001.
 Mahalanobis Distance requires that the variables are at the
interval or ratio level of measurement.
 If any of the cases have a corresponding value that is less
than or equal to 0.001, it should be removed before
proceeding with multivariate analysis.
 If we want to check whether outliers exist in the relationship
between Student Continuous Assessment Performance and terminal
course score;
 Click on Analyze Menu  Regression  Linear  Select the
dependent variable and independent variables  Save.
 Select Mahalanobis from the distance group and Click OK.

©Digital Bridge Institute, Abuja Page 176


Multivariate Outliers 3
 The Mahalanobis Distance would then be added to the data
view.

 To verify the MD significance value;
 Transform  Compute Variables  Assign a name to the target
Variable box  from the Function group
 Select the CDF of the Chi-Square (Cdf.chisq).
 Insert the cdf.chisq function into the Numeric Expression Box of
the Dialog Box that is displayed.
©Digital Bridge Institute, Abuja Page 177
Multivariate Outliers 4
 Add the MD Variable into the cdf.chisq function. Add the degrees
of freedom (this is the number of predictor variables to be
used).
 Subtract the expression from 1 e.g. 1-CDF.CHISQ(MAH_1,2)
 The significance values would then
be displayed in the data view as
a variable.
 Check the significance values to
determine whether the cases are
outliers (<= 0.001)
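The whole MD-and-probability procedure can be mirrored in Python for illustration; the two predictor variables and their values below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical cases: continuous assessment score (X1) and exam score (X2);
# the last case is a deliberately unusual combination
X = np.array([[20, 55], [25, 60], [22, 58], [28, 70], [30, 72],
              [24, 62], [26, 65], [21, 57], [29, 71], [5, 90]], dtype=float)

mean_vec = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

# Squared Mahalanobis distance of each case from the centroid
diff = X - mean_vec
md_squared = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Probability of each distance: 1 - CDF.CHISQ(MD, df), df = number of predictors
p_values = 1 - chi2.cdf(md_squared, df=X.shape[1])

# Cases with p <= .001 would be flagged as multivariate outliers
print(np.argmax(md_squared), p_values.round(4))
```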

©Digital Bridge Institute, Abuja Page 178


Hypothesis
 A Hypothesis is a projected statement of relationship between
two or more variables (Saunders et al, 2007).
 A supposition or explanation (theory) that is provisionally
accepted in order to interpret certain events or phenomena, and
to provide guidance for further investigation. A hypothesis may
be proven correct or wrong, and must be capable of refutation.
If it remains unrefuted by facts, it is said to be verified or
corroborated.
 Statistics: An assumption about certain characteristics of
a population. If it specifies values for every parameter of a
population, it is called a simple hypothesis; if not,
a composite hypothesis. If it attempts to nullify the difference
between two sample means (by suggesting that the difference
is of no statistical significance), it is called a null hypothesis.
(http://www.businessdictionary.com/definition/hypothesis.html#ixzz459n3qS7W)

©Digital Bridge Institute, Abuja Page 179


Hypothesis 2
 A hypothesis is an assumption or statement about a population
parameter which may be true or false.
 Hypothesis testing is the formal procedure used to either accept
or reject a statistical hypothesis.
 The methodology used in hypothesis testing depends on the
data used.
 Hypothesis testing is used to infer the result of a test performed
on sample data from a larger population. It is a way to test the
results of a survey or experiment to see if it has meaningful
results.
 Examples of Hypothesis are;
 Watering flowers by 6pm in the evening increases their
rate of growth.
 Students pass better when exams starts by noon.

©Digital Bridge Institute, Abuja Page 180


Hypothesis 3
 Examples of Hypothesis.
 Air-conditioned classrooms in Universities and Polytechnics
would lead to better student performance in examinations.
 Obsessed student perform better in Examinations.
 Supermarkets make better sales on Friday evenings than on
Monday evenings.
 Exposure to television time affects secondary school
students reading time.
 Level of Job Satisfaction in Men has a direct bearing with
domestic violence in the home.
©Digital Bridge Institute, Abuja Page 181
Null and Alternative Hypothesis
 Null Hypothesis
 A null hypothesis is a type of hypothesis that proposes that
no statistical significance exists in a set of given
observations. The null hypothesis attempts to show that no
variation exists between two or more variables (one does not
impact the other). That is the idea being investigated may
have occurred by chance. It is denoted as Ho.
 Alternative/Experimental Hypothesis
 This is a hypothesis that tends to show that a relationship exists
between variables; that is, the behaviour of one variable
affects the other. It supports the theory being investigated,
holding that the effect did not occur by chance.
 The null hypothesis is the opposite of the experimental
hypothesis. It is possible that an experimental hypothesis is
not supported, hence the need for the null hypothesis.
©Digital Bridge Institute, Abuja Page 182
Null and Alternative Hypothesis 2
 Null Hypothesis
 The null hypothesis attempts to show that no relationship
exists between two measured phenomenon, groups or
association or that a single variable is no different from its
mean.
 It is presumed to be true until statistical evidence nullifies it
for an alternative hypothesis.
 The Null Hypothesis is denoted by Ho and it is the hypothesis
that the researcher tries to disprove.
 Alternative Hypothesis
 An alternative hypothesis (H1) states that there is statistical
significance between two variables.
 The alternative hypothesis attempts to show that a
relationship exists between two measured phenomenon,
groups or association
©Digital Bridge Institute, Abuja Page 183
One vs Two Tailed Test
 A test of a statistical hypothesis, where the region of rejection is
on only one side of the sampling distribution, is called a one-
tailed test. 
 Example of One Tail Test
Suppose the null hypothesis states that the mean of the height of all men in
Lagos is less than or equal to 6 feet. The alternative hypothesis would be
that the mean is greater than 6 feet. The region of rejection would consist of
a range of numbers located on the right side of sampling distribution; that is,
a set of numbers greater than 6 feet.

©Digital Bridge Institute, Abuja Page 184


One vs Two Tailed Test 2
 A test of a statistical hypothesis, where the region of rejection is
on both sides of the sampling distribution, is called a two-tailed
test. 
 Suppose the null hypothesis states that the mean of the height of all
men in Lagos is equal to 6 feet. The alternative hypothesis would be that
the mean is less than 6 feet or greater than 6 feet. The region of
rejection would consist of a range of numbers located on both sides
of sampling distribution; that is, the region of rejection would consist
partly of numbers that were less than 6 feet and partly of numbers
that were greater than 6 feet.

©Digital Bridge Institute, Abuja Page 185


One vs Two Tailed Test 3
 One Tailed – this is a directional
hypothesis that predicts the nature of
impact the independent variable would
have on the dependent variable
 e.g. The use of WhatsApp would
have a negative impact on student
examination performance
 Two Tailed: this is a non directional
hypothesis that predicts the impacts of
the independent variable on the
dependent variable without specifying
the direction of relationship.
 e.g. The use of WhatsApp would
impact on student examination
performance
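The difference shows up directly in the p-value. A Python sketch with hypothetical heights, assuming scipy >= 1.6 for the `alternative` argument:

```python
from scipy.stats import ttest_1samp

# Hypothetical heights (feet) of a sample of men in Lagos
heights = [5.8, 6.1, 5.9, 6.3, 6.0, 6.2, 5.7, 6.4, 6.1, 6.0]

# Two-tailed: H1 is "mean differs from 6 feet" (either direction)
t_two, p_two = ttest_1samp(heights, popmean=6.0)

# One-tailed: H1 is "mean is greater than 6 feet" (one direction only)
t_one, p_one = ttest_1samp(heights, popmean=6.0, alternative='greater')

# With a positive t statistic, the one-tailed p is half the two-tailed p
print(round(p_two, 3), round(p_one, 3))
```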
©Digital Bridge Institute, Abuja Page 186
Errors in Hypothesis Testing
 Type I Error
 Concluding that one variable would affect another (HA) when
in actual fact no relationship exists (HO). This occurs when
the Null hypothesis is true and the researcher rejects it.
 A Type I error occurs when the researcher rejects a null
hypothesis when it is true.
 The probability of committing a Type I error is called
the significance level. This probability is also called alpha,
and is often denoted by α.
 It rejects an idea that should have been accepted
 A good hypothesis must have the following;
 A dependent and independent variable.
 The measure (ordinal, nominal, scale) must be known.

©Digital Bridge Institute, Abuja Page 187


Errors in Hypothesis Testing 2
 Type II Error
 This occurs when the researcher fails to conclude that there
is a relationship between two or more variables (HA) when in
fact a relationship exists. When the null hypothesis is false
and the researcher fails to reject it, a type II error is said to
occur.
 Type II error.
 A Type II error occurs when the researcher fails to reject a null
hypothesis that is false.
 The probability of committing a Type II error is called Beta or Beta
Risk and is often denoted by β.
 The probability of not committing a Type II error is called
the Power of the test.

http://stattrek.com/hypothesis-test/hypothesis-testing.aspx

©Digital Bridge Institute, Abuja Page 188


Qualities of a Good Hypothesis
 A good Hypothesis should
 Include an “if” and “then” statement (according to the
University of California).
 Include both the independent and dependent variables.
 Be testable by experiment, survey or other scientifically
sound technique.
 Be based on information in prior research (either yours or
someone else’s).
 A hypothesis must be conceptually clear and free from
ambiguity.
 It should be formulated for a particular and specific problem
and be free from broad generalizations.
 Have design criteria and rely on sound reasoning.

http://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/
©Digital Bridge Institute, Abuja Page 189
Qualities of a Good Hypothesis 2
 Hypothesis Examples 1
 Ho – Television time does not affect student
performance in examinations.
 H1 – Student Perform better in examination when their
television time is reduced.
 Hypothesis Examples 2
 Ho – The Soil type has no influence on the size of
Cassava yield.
 H1 – The Soil Type affects cassava Yield

©Digital Bridge Institute, Abuja Page 190


Significant Level (P-Value)
 The significant level is the level at which a null hypothesis can be
rejected by a researcher.
  P-value is a probability of obtaining a value of the test statistic or a
more extreme value of the test statistic assuming that the null
hypothesis is true.
 A null hypothesis can be rejected when the P-Value is less than 0.05
(5%), the level of significance. 5% is the conventional benchmark
and cut-off.
 In SPSS, a result is significant when the probability of observing a
relationship or difference due to sampling error/chance is less than
5%.
 A small p-value (≤ 0.05) indicates strong evidence against the null
hypothesis, so it is rejected.
 A large p-value (> 0.05) indicates weak evidence against the null
hypothesis (fail to reject).
 p-values very close to the cutoff (~ 0.05) are considered to be
marginal (need attention).
©Digital Bridge Institute, Abuja Page 191
Using P-Value for Inference
 Statistical significance means that a result from testing is very
unlikely to have occurred by chance, but results from a specific cause.
 Statistical significance can be strong or weak.
 Different disciplines may use different levels of significance.
 If the P-value is less than the level of significance (5%=0.05)
then, the result is significant, it means we reject the null
hypothesis and accept the Alternative hypothesis.
 If the P-value is greater than the significance level (5%=0.05), the
result is not significant, i.e. we fail to reject the Null Hypothesis and
reject the Alternative hypothesis
 H0: The sample data are normally distributed
 H1: The sample data are not normally distributed
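The decision rule can be expressed in a few lines of Python, here applied to the p-value from a normality test (scipy assumed):

```python
from scipy.stats import shapiro

# Hypothetical sample to be tested for normality
sample = [12, 9, 10, 12, 15, 6, 9, 3, 7, 20, 11, 12, 18, 8, 10, 6, 13, 14, 9, 15]
_, p_value = shapiro(sample)

alpha = 0.05  # conventional significance level
if p_value < alpha:
    decision = "Reject H0: the sample data are not normally distributed"
else:
    decision = "Fail to reject H0: the sample data may be normally distributed"
print(decision)
```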

©Digital Bridge Institute, Abuja Page 192


Region of Rejection
 If the statistic in a research or hypothesis testing process falls
within a specified range of values, the researcher rejects
the null hypothesis . The range of values that leads the
researcher to reject the null hypothesis is called the region of
rejection.

 Critical values: The values of the test statistic that separate


the rejection and non-rejection regions.
 Rejection region: the set of values for the test statistic that
leads to rejection of Ho.
 Non-rejection region: the set of values not in the rejection
region that leads to non-rejection of Ho.
©Digital Bridge Institute, Abuja Page 193
Steps in Hypothesis Testing
 State the null and alternative hypotheses

 Determine the Level of significance α

 Compute the Test statistic

 Calculate the p-value

 Determine whether to accept or reject the null hypothesis by

comparing p-value to α

 Conclude in words by writing out your inference.

©Digital Bridge Institute, Abuja Page 194


Comparison of Statistical Test
 The correct statistical test to use depends not only on the study
design, but also on the characteristics of the data.
 Before using a particular test assumptions must have been
satisfied.
 For example, the most appropriate test to use might be an
independent-samples t-test, but if your data failed the
assumptions for this test, you may have to use a Mann-Whitney U
test instead.
 For every Parametric test, there is a corresponding Non-
Parametric Alternative.
FOR CORRELATION/RELATIONSHIP TESTS
PARAMETRIC TESTS | NONPARAMETRIC TESTS
Pearson Product Moment C.C. | Spearman Rank C.C.
Point Biserial C.C. | Kendall Tau-B

©Digital Bridge Institute, Abuja Page 195


Comparison of Statistical Test 2
 FOR OTHER STATISTICAL TESTS
PARAMETRIC TEST | NON-PARAMETRIC TEST | PURPOSE
One sample t-test | One sample K-S Test | Test on the median for data from a symmetric distribution
Independent Sample t Test | Mann-Whitney U test | Compare Two Independent Samples
Two Sample or Paired samples t test | Wilcoxon signed Rank test | Compare dependent Samples
One way Analysis of Variance (ANOVA) | Kruskal Wallis H Test | Compare K-Independent Samples
One way repeated measures ANOVA | Friedman's ANOVA | Test of median/rank, based on randomized experiment
Chi-Square | Chi-Square |

©Digital Bridge Institute, Abuja Page 196


Choosing the Right Test
 There are three main steps that would aid us in choosing the right
test.
 What is your research question?
 If your research question is descriptive, use descriptive statistics.
 Descriptive: Descriptive statistics identify patterns and describe
characteristics.
 Relationship and prediction between variables
e.g. - is there any relationship between Student Height and
Intelligence in Students? (relationship – Correlation)
- can an increase in student height lead to increased
intelligence level? (causation – Regression)
 Difference between groups – One Way ANOVA (3 or more
categories).
 Difference between groups can also be measured using an
Independent Sample t test if we have only two groups e.g.
Male/Female
Choosing the Right Test 2
 Understand the types and number of variables
  A dependent variable is the variable being measured and
tested. An independent variable is the variable that may have an
effect on the dependent variable.
  How many dependent and independent variables do you have?
This determines the type of test to use.
  Know the level of measurement of the dependent and
independent variables. Ensure that they pass the assumptions
of a parametric test.
  Do your variables require transformation using Recode?



Module IV

Inferential Statistics – 1
Non-Parametric Test



Correlation
 Correlation is a statistical measure that indicates the extent to
which two or more variables fluctuate together. A positive
correlation indicates the extent to which those variables
increase or decrease in parallel; a negative correlation indicates
the extent to which one variable increases as the other
decreases.
 Correlation is used to determine the direction and strength of
the association or linear relationship between two or more
variables.
 Correlation can be represented with a scatterplot, on which a
line of best fit is drawn across the points showing the
association between the variables.
 The correlation value highlights how closely the points on the
scatterplot fit the best-fitting line.



Correlation 2
 In a positive correlation, an increase in one variable is associated
with an increase in the other variable. This is represented by a
positive slope that ascends from the bottom left of a chart to
the top right.

 A negative correlation means that an increase in one variable is
associated with a decrease in the other variable. It is
represented by a negative slope that descends from the top left
to the bottom right of a chart.

 Where there is no relationship, a line of fit cannot be drawn.



Correlation 3
 A correlation can be Perfect, Very Strong, Strong, Moderate or
Weak.
 The stronger the relationship, the closer it will be to a maximum
value of 1 or -1, which shows a perfect positive or perfect
negative relationship respectively.
  r = 1 means there is a perfectly positive
relationship between the two variables;

  r = -1 means there is a perfectly negative
relationship between the two variables.



Correlation 4
 Strength of the correlation relationship – r.

 Correlation does not consider the difference between the
independent and dependent variable.
 Correlation does not imply causation (cause and effect).
 The coefficient of determination r2 is the variance shared between the
two variables. It is computed by squaring the correlation value.
 The coefficient of determination is symbolized by r2 because it
is the square of the coefficient of correlation, symbolized by r. The
coefficient of determination is an important tool in determining
the degree of linear correlation of variables ('goodness of fit')
in regression analysis.
Correlation 5
 Types of Correlation in SPSS and when to use them.



Pearson Correlation
 The Pearson correlation analysis technique is used to examine the
direction and strength of the linear association between two
continuous or scale variables.
 Pearson's correlation coefficient is a statistical measure of the
strength of a linear relationship between paired data.
 In a sample it is denoted by r and is by design constrained as
follows: -1 ≤ r ≤ 1
  Positive values denote positive linear correlation
  Negative values denote negative linear correlation
  A value of 0 denotes no linear correlation
  The closer the value is to 1 or -1, the stronger the linear
correlation.



Pearson Correlation 2
 Assumptions to check before using the Pearson correlation:
  The two variables must be continuous (interval or ratio)
  There must be a linear relationship between the two
variables.
  The two variables must be normally distributed.
  There should be no outliers.
 These assumptions must be satisfied before conducting a
Pearson correlation test.
 Example – Pearson correlation between student weight and
height.
  H0 – No relationship exists between an increase in student
height and their weight.
 H1 – An increase in student height leads to a corresponding
increase in student weight.
Pearson Correlation 3
 Analyze Menu  Correlate  Bivariate  Put in the two
variables of interest  Select Pearson Correlation  Choose
the type of tail  Ok.

 Use the Strength of Correlation table to interpret the value of r.
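As a cross-check outside the SPSS menus, the same test can be sketched in Python with scipy.stats. This is an illustration only – the height/weight values below are invented sample data, not the course dataset:

```python
# Minimal Pearson correlation sketch (invented data, not the SPSS sample file).
from scipy import stats

height_cm = [150, 155, 160, 162, 168, 171, 175, 180, 183, 190]
weight_kg = [50, 54, 57, 60, 63, 66, 70, 74, 78, 85]

r, p = stats.pearsonr(height_cm, weight_kg)  # r: strength/direction, p: significance
r_squared = r ** 2                           # coefficient of determination (shared variance)

print(f"r = {r:.3f}, p = {p:.4f}, r^2 = {r_squared:.3f}")
```

Interpret r against the strength-of-correlation table exactly as you would the SPSS Correlations output.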



Pearson Correlation 4
 Example for reporting Pearson correlation test results:
  A Pearson correlation analysis was carried out to find out if there exists a
positive correlation between student height and weight in a class.
  The scatter plot indicates that there exists a linear relationship, while the
box plots indicate that no outliers exist. The normality test was carried
out using the Shapiro-Wilk and Kolmogorov-Smirnov tests, which both indicate
that the data are normally distributed, and the skewness and kurtosis tests
indicate that the data are reasonably normally distributed.
Homoscedasticity is assumed from the scatter plot – this indicates the
similarity of the variance of the data along the line of the predictor variable.
  The result of the Pearson correlation coefficient test shows there is a …..
(strong positive, negative, weak, moderate etc.) relationship between student
height and weight with a mean of (________ and SD of _________), one-tailed or
two-tailed. Therefore the null hypothesis is accepted/rejected. The
relationship can account for ___% of the variance (r2). This suggests
therefore that __________________________________
  When the result is not statistically significant, it should be stated.
Spearman Rank Correlation 1
 Spearman rank-order correlation is a statistical measure of the
direction and strength of the monotonic relationship (i.e., when one
number increases, so does the other, or vice versa) between two
continuous or ordinal variables.
 It is the non-parametric alternative to the Pearson correlation and
is used when data violate the basic assumptions of the
Pearson correlation. That is:
  Data are not normally distributed.
  Outliers exist in the data set.
  One or both of the variables are ordinal.
 The assumptions of the Spearman rank correlation are:
  The two variables must be ordinal (Likert scale) or scale (interval
or ratio).
  One variable must be monotonically related to the other.
 Before carrying out a Spearman correlation, first ensure that the data
violate Pearson's assumptions.
Spearman Rank Correlation 2
 Example

  H1 – A man's monthly salary affects his degree of satisfaction
with life.

  H0 – There is no relationship between life satisfaction and
monthly income.
 First test to ensure that the dataset violates the assumption of
linearity (no linear relationship).
 Test to ensure that the dataset is not normally distributed, and
finally test for outliers. This is to ensure that all assumptions of
the Pearson correlation are violated.
Spearman Rank Correlation 3
 Analyze Menu  Correlate  Bivariate  Select the variables
of interest   Select Spearman Rank Correlation  Choose
the type of tail  Ok.

 From the Non-Parametric Correlation table in the output viewer


interpret the results as shown.



Spearman Rank Correlation 4
 The difference between Spearman's correlation and Pearson's
correlation is that Spearman's correlation is actually computed on
the ranks of the data.
 To get the ranks of the data, click on the Transform Menu  Rank
Cases  Select the Variable of Interest  OK.

 The Spearman rank correlation can then be carried out on the ranked
data as follows: Analyze Menu  Correlate  Bivariate 
Select the variables of interest  Spearman Correlation 
Choose the type of tail  Ok.
Spearman Rank Correlation 5
 The correlation result obtained from the Spearman rank
correlation analysis is the same as the one obtained from
performing a Pearson correlation analysis on the ranked data
used in the Spearman analysis.
[Output: Spearman's rank correlation vs. Pearson's correlation on the ranked data]

 In interpreting the result we can report that there is a negative
correlation between high monthly income and life satisfaction.
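The equivalence described above can be demonstrated with a short Python sketch using scipy.stats (the income/satisfaction values are invented for illustration): Spearman's rho equals Pearson's r computed on the ranks.

```python
# Spearman's rho is Pearson's r on ranked data (invented illustrative values).
from scipy import stats

income = [1200, 3400, 2100, 5000, 800, 4300, 2600, 3900]
satisfaction = [2, 4, 3, 5, 1, 4, 3, 5]   # ordinal, Likert-style

rho, p = stats.spearmanr(income, satisfaction)
# Rank both variables (ties get average ranks), then correlate the ranks:
r_on_ranks, _ = stats.pearsonr(stats.rankdata(income), stats.rankdata(satisfaction))

print(rho, r_on_ranks)   # the two coefficients agree
```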



Spearman Rank Correlation 6
 The outcome of the Spearman rank-order correlation test
can be stated as follows:
  In reporting the result we can say that a Spearman rank-order
correlation was carried out to find out if there is a relationship between
high income and life satisfaction. The scatter plot indicates that there is
a monotonic relationship between the variables (_____)
  The report of the analysis supports the research hypothesis that there
will be a _______ statistically significant relationship with high income (rs
(degrees of freedom = sample size – 2) = Correlation Coefficient
Value, p = significance value).
  Square the correlation coefficient to get the percentage and use it
for interpretation.
  On this basis the null hypothesis can either be accepted or rejected.
 The format for reporting the result of a correlation test may vary
depending on the referencing style in use, e.g. APA, Harvard,
MLA, Chicago etc.
Kendall Tau-B Correlation
 Kendall's tau-b (τb) correlation coefficient (Kendall's tau-b, for
short) is a nonparametric measure of the strength and direction
of association that exists between two variables measured on
at least an ordinal scale. It is considered a nonparametric
alternative to Pearson's product-moment correlation when
your data have failed one or more of the assumptions of that test.
 It is also considered an alternative to the non-parametric
Spearman rank-order correlation coefficient (especially when
you have a small sample size with many tied ranks).

 The assumptions of the Kendall tau-b correlation are:

  The two variables must be at the ordinal (Likert scale) or scale
(interval or ratio) level of measurement.
  One variable must be monotonically related to the other.



Kendall Tau-B Correlation 2
 Example
  H1 – A man's monthly salary affects his degree of satisfaction
with life.
  H0 – There is no relationship between life satisfaction and
monthly income.
 First test that the assumptions are satisfied, i.e. that the
variables are monotonically related and that each variable is at
the scale or ordinal level.
 The Kendall tau-b correlation can be carried out as follows: Analyze
Menu  Correlate  Bivariate  Select the variables of
interest  Kendall tau-b  Choose the type of tail  Ok.



Kendall Tau-B Correlation 3
 From the result obtained in the output table, inferences can be
made as shown.

 The level of significance can then be interpreted.

  The result of the analysis shows that there is a/no significant _____
relationship between variable 1 ____ and variable 2 _____ (N, r2, p)


Point Biserial Correlation
 The point-biserial correlation analysis technique is used to
determine the direction and strength of the linear association
between one continuous or scale variable and one
dichotomous/binary variable, e.g. Boy/Girl, Male/Female,
Married/Single, Yes/No.
 The assumptions of the point-biserial correlation are:
  One variable must be a continuous variable (interval or
ratio)
  The other variable should be dichotomous/binary.
  The variance of the values for the dichotomous variable
must be similar, checked using Levene's test for equality of variances.
  The continuous variable must be normally distributed and there should be no
outliers in the values.



Point Biserial Correlation 2
 Using the sample student data, a Levene's test can be
performed to determine the equality of variance of scores
between these two groups (Male/Female).
 To perform the Levene's test – Analyze Menu  Compare Means
 Independent Samples T-Test  Put the dependent variable in
the Test Variable box  Add Gender as the Grouping
Variable  Define the Groups (1 – Male; 2 – Female)  Continue
 OK.



Point Biserial Correlation 3
 The significance value obtained from the Levene's test is > .05,
which indicates that the variance in height between male and
female is similar.

 To test for the normality of the data we can use any of the
normality tests.
 Analyze  Descriptive Statistics  Explore  Select the
Independent Variable (Height)  Dependent variable (Factor
List)  Statistics  Select Descriptive and Outliers  Plot
Point Biserial Correlation 4
 From the output viewer, confirm that the test for normality is
satisfied and that there is no outlier in the data under study.
 To establish whether there is a relationship between the two
variables, we plot a scatter plot (Height – dependent variable).
 Make sure the measurement level of the two variables is Scale.
 To perform the Point Biserial Correlation  Analyze Menu 
Correlate  Bivariate  Pearson  Ok.
 From the output we can then determine the significance of the
relationship between the variables.



Point Biserial Correlation 5
 Select Normality Plot with Test  Continue  OK.



Point Biserial Correlation 6
 Example
  H1 – A student's gender determines his/her height
  H0 – Gender has no relationship with height
 Direction of the relationship in point-biserial correlation:
  If the correlation is positive, an increase in one variable is
associated with an increase in the other variable.
  If the correlation is negative, an increase in one measure is
associated with a decrease in the other measure.
 Result format:
The result of the point-biserial correlation shows that there was a _______ statistically
significant relationship between gender and height (rpb = (__), N (__), p (__)).
Therefore the null hypothesis is accepted/rejected. This relationship can account for
__% of the variation in scores (r2 __).
The r2 value can be obtained from the scatter plot.
If the relationship did not reach a significant level, state that the point-biserial
correlation test between gender and height did not reach statistical significance,
rpb – (degrees of freedom), p > (__) (p is the significance value)
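The workflow above (Levene's check first, then the correlation) can be sketched in Python with scipy.stats; the gender/height values are invented for illustration:

```python
# Point-biserial correlation between a dichotomous and a continuous variable.
from scipy import stats

gender = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]                     # 0 = female, 1 = male
height = [158, 160, 163, 165, 162, 172, 175, 178, 171, 180]

females = [h for g, h in zip(gender, height) if g == 0]
males = [h for g, h in zip(gender, height) if g == 1]

# Assumption check: Levene's test for equality of variances between the groups
lev_stat, lev_p = stats.levene(females, males)

r_pb, p = stats.pointbiserialr(gender, height)
print(f"Levene p = {lev_p:.3f}; r_pb = {r_pb:.3f}, p = {p:.4f}")
```

A Levene p above .05 supports the similar-variances assumption, after which r_pb is interpreted like any correlation coefficient.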
Module V

Inferential Statistics – 2
Parametric & Non Parametric Test



Parameter Estimates
 Parameter estimation allows us to estimate the value of the
population that would have the characteristics we have
measured in the sample – estimating (extrapolating) a population
value based on a sample value.
 T-Test
 Linear regression.
 Multiple Regression.
 Binary Logistic Regression.
 In a parametric test a sample statistic is obtained to estimate
the population parameter. Because this estimation process
involves a sample, a sampling distribution, and a population,
certain parametric assumptions are required to ensure all
components are compatible with each other.



One Sample T-Test
 The One Sample t Test determines whether the sample mean is
statistically different from a known or hypothesized population
mean. The One Sample t Test is a parametric test.
 This test is also known as Single Sample t Test, the variable
used in this test is known as Test variable. In a One Sample
t Test, the test variable is compared against a "test value",
which is a known or hypothesized value of the mean in the
population.
 The One Sample t Test is commonly used to test the following:
 Statistical difference between a sample mean and a known or
hypothesized value of the mean in the population.
 Statistical difference between the sample mean and the sample midpoint
of the test variable.
 Statistical difference between the sample mean of the test variable and
chance.
 Statistical difference between a change score and zero.
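The comparison against a test value can be sketched in Python with scipy.stats.ttest_1samp; the heights and the hypothesized mean of 66.5 (the value the interpretation slide refers to) are illustrative:

```python
# One-sample t test: sample mean vs. a hypothesized population mean.
from scipy import stats

heights = [66, 68, 67, 70, 69, 71, 65, 72, 68, 70]  # invented sample
test_value = 66.5                                   # hypothesized population mean

t, p = stats.ttest_1samp(heights, test_value)
print(f"t = {t:.3f}, p = {p:.4f}")  # positive t: sample mean above the test value
```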
One Sample T-Test 2
 Your data must meet the following requirements:
 Test variable that is continuous (i.e., interval or ratio level)
 Scores on the test variable are independent (i.e., independence of
observations). There is no relationship between scores on the test
variable. Violation of this assumption will yield an
inaccurate p value
 Random sample of data from the population
 Normal distribution (approximately) of the sample and population
on the test variable. Non-normal population distributions,
especially those that are thick-tailed or heavily skewed,
considerably reduce the power of the test. Among moderate or
large samples, a violation of normality may still yield
accurate p values
 Homogeneity of variances (i.e., variances approximately equal in
both the sample and population)
 No outliers
One Sample T-Test 3
 Example:
  H0 – A student's gender does not significantly determine his/her weight.
  H1 – A student's gender significantly determines his/her weight.
 To run a One Sample t Test in SPSS, click Analyze  Compare
Means  One-Sample T Test.
Test Variable(s): The variable whose
mean will be compared to the
hypothesized population mean
(i.e., Test Value). You may run multiple
One Sample t Tests simultaneously by
selecting more than one test variable.
Each variable will be compared to the same Test Value.

Test Value: The hypothesized population mean against which your test


variable(s) will be compared.



One Sample T-Test 4
Options: Clicking Options will open a window where you can specify
the Confidence Interval Percentage and how the analysis will
address Missing Values (i.e., Exclude cases analysis by analysis or Exclude
cases listwise). Click Continue when you have finished making specifications.
 Click Ok.



One Sample T-Test 5
 Test Value: The number we entered as the test value in the One-Sample
T Test window.
 t Statistic: The test statistic of one-sample t test, denoted t. In this
example, t = ____. Note that t is calculated by dividing the mean
difference (E) by the standard error mean (from the One-Sample
Statistics box).
 df: The degrees of freedom for the test. For a one-sample t test, df = n -
1; so here, df = __ - 1 = __.
 Sig. (2-tailed): The two-tailed p-value corresponding to the test statistic.
 E Mean Difference: The difference between the "observed" sample mean
(from the One Sample Statistics box) and the "expected" mean (the
specified test value (A)). The sign of the mean difference corresponds to
the sign of the t value (B). The positive t value in this example indicates
that the mean height of the sample is greater than the hypothesized value
(66.5).
 Confidence Interval for the Difference: The confidence interval for the
difference between the specified test value and the sample mean.
One Sample T-Test 6
 The result can be stated as follows:
  Since p ≤ ____, we reject/accept the null hypothesis that the
sample mean is equal to the hypothesized population mean
and conclude that the mean _____ of the sample is
significantly different from the average ________
 Based on the results, we can state the following:
  There is a significant difference in mean ________ between
the sample and the _________________ (p < _____).



Paired Sample T-Test
 The Paired Samples t Test compares two means that are from
the same individual, object, or related units. The two means
typically represent two different times (e.g., pre-test and post-
test with an intervention between the two time points) or two
different but related conditions or units (e.g., left and right ears,
twins).
 The purpose of the test is to determine whether there is
statistical evidence that the mean difference between paired
observations on a particular outcome is significantly different
from zero. The Paired Samples t Test is a parametric test.
 This test is also known as: Dependent t Test or Paired t Test
 The variable used in this test is known as: Dependent variable,
or test variable (continuous), measured at two different times or
for two related conditions or units



Paired Sample T-Test 2
 For the paired samples t test to be successfully carried out, the
following requirements need to be satisfied:
 Dependent variable should be continuous (i.e., interval or ratio
level)
 The dependent variable should have related samples/groups
(i.e., dependent observations) This means that the subjects in
the first group are also in the second group.
 The dataset should be Normally distributed.
 There should be no outliers in the difference between the two
related groups.
 When one or more of the assumptions for the paired sample
t Test are not met, you may want to run the nonparametric
Wilcoxon Signed-Ranks Test instead.
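A paired design (e.g. pre-test/post-test) can be sketched with scipy.stats.ttest_rel; the scores are invented, and the Wilcoxon fallback mentioned above appears in a comment:

```python
# Paired samples t test on pre/post measurements of the same subjects.
from scipy import stats

pre  = [72, 75, 68, 80, 77, 70, 74, 79]   # invented pre-test scores
post = [70, 71, 66, 75, 74, 69, 70, 76]   # invented post-test scores

t, p = stats.ttest_rel(pre, post)
# If the differences are not normally distributed, use the non-parametric
# alternative instead: stats.wilcoxon(pre, post)
print(f"t = {t:.3f}, p = {p:.4f}")
```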



Paired Sample T-Test 3
 Example: H0 – There is no significant difference between the height of
males and that of females
 H1 – There is a significant difference between the height of males and that
of females
 Click Analyze Menu Compare Mean  Paired-Samples T
Test.
Pair: The “Pair” column
represents the number of
Paired Samples t Tests to
run. You may choose to
run multiple Paired
Samples t Tests simultaneously by selecting multiple sets of matched
variables. Each new pair will appear on a new line.
Variable1: The first variable, representing the first group of matched values.
Move the variable that represents the first group to the right where it will be
listed beneath the “Variable1” column.
Paired Sample T-Test 4
Variable2: The second variable, representing the second group of matched
values. Move the variable that represents the second group to the right
where it will be listed beneath the “Variable2” column.

Options: Clicking Options will open a window where you can specify the


Confidence Interval Percentage and how the analysis will address Missing
Values (i.e., Exclude cases analysis by analysis or Exclude cases listwise).
Click Continue when you have finished making specifications.

The result of the test will then be displayed in the output viewer.
Paired Sample T-Test 5

 From the results, we can infer that:

  Weight and Height values are weakly and negatively correlated for males (r =
-.131, p < 0.001), while for females Weight and Height values are strongly and
negatively correlated (r = -.625, p < 0.001)
  There was a significant difference between Weight and Height for males (t65 =
-57.95, p < 0.001) and for females (t53 = -66.53, p < 0.001)
Independent Sample T-Test
 The Independent Samples t Test compares the means of two
independent groups in order to determine whether there is
statistical evidence that the associated population means are
significantly different. The Independent Samples t Test is a
parametric test. The test is also known as;
 Independent T Test
 Independent Measures T Test
 Independent Two-sample T Test
 Uncorrelated Scores T Test
 Unpaired T Test
 Unrelated T Test

 The variables used in this test are known as:


 Dependent variable or test variable
 Independent variable, or grouping variable



Independent Sample T-Test 2
 The Independent Samples t Test can compare the means of only
two groups. It cannot make comparisons among more than two
groups. If you wish to compare the means across more than two
groups, ANOVA should be used.
 The data requirement for T-Test are as follows;
 Dependent variable that is continuous (i.e., interval or ratio level)
 Independent variable that is categorical (i.e., two or more groups)
 Cases that have values on both the dependent and independent
variables
 Independent samples/groups (i.e., independence of observations)
 Random sample of data from the population
 Normal distribution (approximately) of the dependent variable for each
group
 Homogeneity of variances (i.e., variances approximately equal across
groups)
 No outliers



Independent Sample T-Test 3
 To run an Independent Samples t Test in SPSS, click Analyze
 Compare Means  Independent-Samples T Test.

 Select the Test Variable (dependent variable); the
grouping variable is the independent variable.



Independent Sample T-Test 4
 The result of the test is displayed in the output viewer window
as follows;

 Levene's Test for Equality of Variances: This section has the


test results for Levene's Test. From left to right:
 F is the test statistic of Levene's test
 Sig. is the p-value corresponding to this test statistic.



Independent Sample T-Test 5
 t-test for Equality of Means provides the results for the actual
Independent Samples t Test. From left to right:
 t is the computed test statistic
 df is the degrees of freedom
 Sig (2-tailed) is the p-value corresponding to the given test
statistic and degrees of freedom
 Mean Difference is the difference between the sample
means; it also corresponds to the numerator of the test
statistic
 Std. Error Difference is the standard error; it also
corresponds to the denominator of the test statistic
 From the result from the table we can decide to accept or reject
the null hypothesis Ho and also write our conclusion
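The SPSS output just described (the Levene's row plus the equal/unequal-variance t rows) can be mirrored in Python with scipy.stats; the two groups below are invented:

```python
# Independent samples t test, choosing the variance assumption from Levene's test.
from scipy import stats

group_a = [68, 70, 72, 65, 69, 71, 74, 67]   # invented scores, group A
group_b = [62, 64, 66, 61, 63, 65, 60, 64]   # invented scores, group B

lev_stat, lev_p = stats.levene(group_a, group_b)
# equal_var=True mirrors SPSS's "Equal variances assumed" row;
# equal_var=False mirrors "Equal variances not assumed" (Welch's t).
t, p = stats.ttest_ind(group_a, group_b, equal_var=lev_p > 0.05)
print(f"t = {t:.3f}, p = {p:.4f}")
```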



ANOVA
 The One-Way ANOVA ("analysis of variance") compares the
means of two or more independent groups in order to
determine whether there is statistical evidence that the
associated population means are significantly different. One-
Way ANOVA is a parametric test.
 This test is also known as:
 One-Factor ANOVA
 One-Way Analysis of Variance
 The variables used in this test are known as:
 Dependent variable
 Independent variable (also known as the grouping variable,
or a factor). This variable divides cases into two or more
mutually exclusive levels, or groups



ANOVA 2
 ANOVA can be used to analyze data from the following;
 Field studies
 Experiments
 Quasi-experiment
 The One-Way ANOVA is commonly used to test the following:
 Statistical differences among the means of two or more groups
 Statistical differences among the means of two or more
interventions
 Statistical differences among the means of two or more change
scores
 Both the One-Way ANOVA and the Independent Samples t Test
can compare the means for two groups. However, only the One-
Way ANOVA can compare the means across three or more
groups.



ANOVA 3
 ANOVA test is based on the following assumptions;
 Dependent variable must be continuous (i.e., interval or ratio
level)
 Independent variable must be categorical (i.e., two or more
groups)
 Cases that have values on both the dependent and
independent variables
 Independent samples/groups (i.e., independence of
observations)
 The dependent variable should be normally distributed.
 Homogeneity of variances (i.e., variances approximately
equal across groups)
 No outliers



ANOVA 4
 To perform ANOVA, click on the Analyze Menu  Compare Means
 One Way ANOVA

 Select the Test Variable (dependent variable),
while the grouping variable is the independent
variable  Post Hoc  Continue  Ok.



ANOVA 5
 Dependent List: The dependent variable(s). This is the
variable whose means will be compared between the samples
(groups). Multiple means comparison can be done by selecting
more than one dependent variable.
 Factor: This is the independent variable. The categories (or
groups) of the independent variable will define which samples
will be compared. The independent variable must have at least
two categories (groups), but usually has three or more groups
when used in a One-Way ANOVA
 Contrasts: (Optional) Specify contrasts, or planned
comparisons, to be conducted after the overall ANOVA test.
 Post Hoc: (Optional) Request post hoc (also known
as multiple comparisons) tests. Specific post hoc tests can be
selected by checking the associated boxes.



ANOVA 6
 Equal Variances Assumed: Multiple comparisons options that
assume homogeneity of variance (each group has equal variance).
For detailed information about the specific comparison methods, click
the Help button in this window.
 Significance level: The desired cutoff for statistical significance. By
default, significance is set to 0.05.
 Options will produce a window where we can
specify which Statistics to include in the
output (Descriptive, Fixed and random effects,
Homogeneity of variance test, Brown-Forsythe,
Welch), whether to include a Means plot, and
how the analysis will address Missing Values
(i.e., Exclude cases analysis by analysis or 
Exclude cases listwise). Click Continue when
you are finished making specifications.
ANOVA 7
 The output of the ANOVA test is displayed in the output viewer.

 The Means plot is a visual representation of the Compare Means


output. The points on the chart are the average of each group.
 ANOVA alone does not tell us specifically which means were different
from one another. To determine that, we would need to follow up
with multiple comparison (or post-hoc) tests.
 The result, discussion and conclusion can then be written based on the
displayed output.
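As an illustration of the omnibus F test, scipy.stats.f_oneway reproduces the core One-Way ANOVA table values (the three groups below are invented):

```python
# One-way ANOVA across three independent groups (invented scores).
from scipy import stats

low    = [12, 14, 11, 13, 15]
medium = [16, 18, 17, 15, 19]
high   = [22, 21, 24, 20, 23]

f_stat, p = stats.f_oneway(low, medium, high)
# A significant F only says the means differ somewhere; post-hoc tests
# (e.g. Tukey HSD in SPSS) identify which specific pairs of groups differ.
print(f"F = {f_stat:.2f}, p = {p:.5f}")
```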
Two Way ANOVA
 The two-way analysis of variance is an extension of the one-way
analysis of variance.
 It compares the mean differences between groups that have been
split on two independent variables (called factors).
 The primary purpose of a two-way ANOVA is to understand if there
is an interaction between the two independent variables on the
dependent variable.
 It tests the effects of two factors (A and B) on one continuous
dependent variable Y.
 Three null hypotheses are tested in this procedure:
  factor A does not influence variable Y
  factor B does not influence variable Y
  the effect of factor A on variable Y does not depend on factor B
(i.e., there is no interaction of factors A and B).



Two Way ANOVA 1
 Examples of research questions that can be studied using Two
Way ANOVA are;
 Do diet and exercise really lead to weight loss?
 H0: There is no difference in the mean weight loss of a person whether or not he exercises and changes his diet.
 H1: There is a difference in the mean weight loss of a person who exercises and changes his diet.
 Does the material used to produce a car battery and the operating temperature affect the life span of the battery?
 Hypotheses:
 H0: There is no difference in mean battery life for different combinations of material type and operating temperature level.
 H1: There is a difference in mean battery life for different combinations of material type and operating temperature level.
Two Way ANOVA 2
 The two-way ANOVA has the following assumptions:
1. The dependent variable should be continuous (interval or
ratio)
2. The two independent variables should each consist of two or
more categorical, independent groups.
3. The populations (dependent variable) from which the samples
were obtained must be normally or approximately normally
distributed.
4. The variances of the populations must be equal.
5. The groups must have the same sample size.
 A statistical test to ensure that the sample data satisfies these assumptions should always be carried out.
 If these assumptions are not satisfied, the results obtained from the two-way ANOVA may not be valid.
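Behind the SPSS output, a balanced two-way ANOVA partitions the total variation into main-effect, interaction and within-cell sums of squares. The sketch below does this by hand for a hypothetical balanced 2x2 design with 2 replicates per cell (all numbers are invented):

```python
# Hand computation of two-way ANOVA sums of squares for a balanced
# 2x2 design with 2 replicates per cell (hypothetical data).
cells = {("A1", "B1"): [10, 12], ("A1", "B2"): [20, 22],
         ("A2", "B1"): [30, 32], ("A2", "B2"): [40, 42]}

all_x = [x for obs in cells.values() for x in obs]
grand = sum(all_x) / len(all_x)

def level_mean(factor, level):
    """Mean and count of all observations at one level of one factor."""
    vals = [x for (a, b), obs in cells.items()
            for x in obs if (a if factor == 0 else b) == level]
    return sum(vals) / len(vals), len(vals)

# Main-effect sums of squares
ss_a = sum(n * (m - grand) ** 2
           for m, n in (level_mean(0, l) for l in ("A1", "A2")))
ss_b = sum(n * (m - grand) ** 2
           for m, n in (level_mean(1, l) for l in ("B1", "B2")))

# Interaction SS = between-cells SS minus both main effects
cell_means = {k: sum(v) / len(v) for k, v in cells.items()}
ss_cells = sum(len(v) * (cell_means[k] - grand) ** 2 for k, v in cells.items())
ss_ab = ss_cells - ss_a - ss_b
ss_within = sum((x - cell_means[k]) ** 2 for k, v in cells.items() for x in v)
print(ss_a, ss_b, ss_ab, ss_within)  # prints: 800.0 200.0 0.0 8.0
```

Dividing each SS by its degrees of freedom and forming the F ratios against the within-cell mean square gives the F values SPSS shows in the "Tests of Between-Subjects Effects" table.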
Two Way ANOVA 3
 To perform a Two-Way ANOVA test: Analyze Menu  General Linear Model  Univariate.
 Select the dependent variable and the fixed factors and insert them into the appropriate sections of the dialog box.
 Click on Plots. The "Univariate: Profile Plots" dialogue box will be displayed.
Two Way ANOVA 4
 Transfer the appropriate variable e.g. Health to Horizontal Axis and Gender to Separate Lines.
 Click the Add button.
 Click the Continue button. This will return you to the "Univariate" dialogue box.
 Click the Post Hoc button. The "Univariate: Post Hoc Multiple Comparisons for Observed..." dialogue box will be displayed.
 Select Tukey  Click the Continue button to return to the "Univariate" dialogue box.
 Click the Options button. This will present the "Univariate: Options" dialogue box.
Two Way ANOVA 5
 Transfer the variables from Factor(s) and Factor Interactions to the Display Means for: box.
 Select Descriptive Statistics and Homogeneity tests in the Display section of the dialog box.
 Set the Significance level  Click
Continue  Click Ok.
 The output of the test is displayed in
the Output Viewer Window.
Two Way ANOVA 6
 The result of the Two-Way ANOVA can now be written up from the displayed tables and chart.
MANOVA
 The one-way multivariate analysis of variance (one-way
MANOVA) is used to determine whether there are any
differences between independent groups on more than one
continuous dependent variable.
 For example, a one-way MANOVA can be used to understand
whether there were differences in the perceptions of
attractiveness and intelligence of drug users in movies (i.e., the
two dependent variables are "perceptions of attractiveness" and
"perceptions of intelligence", whilst the independent variable is
"drug users in movies", which has three independent groups:
"non-user", "experimenter" and "regular user")
(Source: https://statistics.laerd.com/spss-tutorials/one-way-manova-using-spss-statistics.php)
 The one-way MANOVA cannot tell you which specific groups
were significantly different from each other.
 It only tells you that at least two groups were different.
MANOVA 2
 The assumptions of MANOVA are;
 The two or more dependent variables should be measured at
the interval or ratio level (Continuous)
 The independent variable should consist of two or more
categorical, independent groups.
 There should be independence of observations, which means
that there is no relationship between the observations in each
group or between the groups themselves.
 The sample size should be adequate.
 There should be no univariate outliers in each group of the independent variable for any of the dependent variables.
 There should be a linear relationship between each pair of
dependent variables for each group of the independent
variable.
MANOVA 3
 To perform a MANOVA test in SPSS:
 Click Analyze Menu 
General Linear Model 
Multivariate. The multivariate
dialog box will be displayed.
 Transfer the independent
variable, into the Fixed
Factor(s): box and transfer the
dependent variables, into
the Dependent Variables box.
 Click on the Plots button. The Multivariate: Profile Plots dialogue box will be displayed.
MANOVA 4
 Transfer the independent variable, into
the Horizontal Axis box.
 Click the Add button. It will show the added variable in the Plots box.
 Click the Continue button. This will return
you to the Multivariate dialogue box.
 Click the Post Hoc button. It will present
the Multivariate: Post Hoc Multiple
Comparisons for Observed... dialogue box.
 Transfer the independent variable, into
the Post Hoc Tests for: box and select
the Tukey checkbox in the Equal Variances
Assumed area.
 Click the Continue Button to return to the
Multivariate dialogue box.
MANOVA 5
 Click the Options button. This will present the Multivariate: Options dialog box.
 Transfer the independent variable,
from the Factor(s) and Factor
Interactions box into
the Display Means for box.
 Select (Click) the Descriptive
statistics, Estimates of effect
size and Observed power checkboxes
in the Display area.
 Click the Continue button; this will return you to the Multivariate dialogue box.
 Click OK to generate the output in the Output Viewer Window.
 Report the result.
Chi-Square
 The chi-square test is a procedure for testing whether two
categorical variables are related to each other in any way.
 Like t-tests, chi-square tests come up in a wide variety of
circumstances, the most common of which is assessing the
independence of two variables in a contingency table (a
crosstab).  So this chi-square test is specified as an option on a
crosstab command.
 The Chi-Square test can be used when we have two nominal variables.
 To perform a Chi-Square test: Analyze Menu  Descriptive Statistics  Crosstabs.
 Select the variables (Row/Column).
Chi-Square Test 2
 In the Crosstabs dialog box, click on the Statistics button, then select Chi-square  Click Continue to return to the main dialog box  Click OK.
 The result of the Chi-Square test will be displayed in the output viewer.
 Discussion and conclusion can be inferred from the displayed result.
Linear Regression
 Example:
 H1 – The time a student spends studying affects his/her score in an exam.
 H0 – The time a student spends studying does not affect his/her score in an exam.
 From our sample data we can test for this relationship.
 To perform Linear Regression: Click Analyze Menu  Regression  Linear.
 Select the dependent and independent variables  Click OK.
Linear Regression 2
 The result of the Linear Regression would be displayed as
shown below.
Linear Regression 3
 The most important tables in the output viewer for linear regression are the Model Summary table, which gives us the R and R² values of our model; this means that the linear regression explains ___ (the percentage represented by R²) of the variance in the data. The Durbin-Watson d = ____, which lies between the two critical values 1.5 < d < 2.5, so we can assume that there is no harmful first-order linear auto-correlation in the data.

 The coefficient table shows the regression coefficients, the intercept and
the significance of all coefficients and the intercept in the model. We find
that our linear regression analysis estimates the linear regression
function with values shown under the unstandardized coefficient.

 This coefficient table includes the Beta weights (which express the relative importance of the independent variables) and the collinearity statistics. However, if there is only one independent variable in our analysis we do not pay attention to those values.
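The unstandardized coefficients and R² in these tables come from ordinary least squares. A hand computation for the study-time example, using invented hours/scores data:

```python
# Hand computation of simple linear regression (ordinary least squares)
# on hypothetical data: hours studied vs. exam score.
hours  = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 68, 77]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)

b1 = s_xy / s_xx            # slope (the unstandardized coefficient B)
b0 = mean_y - b1 * mean_x   # intercept (the constant)

# R squared: proportion of variance in the scores explained by hours
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))
sst = sum((y - mean_y) ** 2 for y in scores)
r_squared = 1 - sse / sst
print(b1, b0, round(r_squared, 4))  # prints: 6.0 46.0 0.9836
```

Here each extra hour of study adds about 6 points to the predicted score, and hours studied explain roughly 98% of the variance in these made-up scores; SPSS reports exactly these quantities in the Coefficients and Model Summary tables.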
Mann Whitney Test
 The Mann Whitney Test compares the differences between two
independent groups when the dependent variable is either
ordinal or continuous, but not normally distributed.
 It is a non-parametric alternative to the independent samples t-test.
  This  test does not assume any properties regarding the
distribution of the dependent variable in the analysis. 
 It is also sometimes called the Mann Whitney Wilcoxon Test or
the Wilcoxon Rank Sum Test.
 The most common scenario where the Mann Whitney Test is used is for a non-normally distributed outcome variable in a small sample (n < 25).
Mann Whitney Test 2
Examples of other test cases for the Mann Whitney test are:
1. Determining if there are differences between two independent groups.
 Example: Is there a statistically significant median difference in salary (i.e., the dependent variable) between "male fresh Engineering graduates" and "female fresh Engineering graduates" (i.e., the two groups of the independent variable, "gender")?
2. Determining if there are differences between interventions.
Example: determine if there was a statistically significant median difference
in malaria treatment response time (i.e., the dependent variable) between
the “ABC Drug" and “XYZ Drug" (i.e., the two groups of the independent
variable, "treatment type").
3. Determining if there are differences in change scores.
Example: Does a change in diet improve the exam performance of primary school children when compared to those whose diet was not changed? (Control/Intervention)
Mann Whitney Test 3
 The assumptions of the Mann Whitney Test are;
 The dependent variable should be measured at the ordinal or
continuous level. E.g. Likert Scale (Ordinal variables) or
Continuous Variable (Weight, Height, Hours)
 The  independent variable should consist of two categorical
independent groups. E.g. (2 groups: male or female),
employment status (employed or unemployed).
 There should be an independence of observations, which
means that there is no relationship between the observations in
each group or between the groups themselves.
 The distributions of both groups of the independent variable should have the same shape when plotted in a chart (histogram).
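The U statistic behind this test is computed from the rank sums of the pooled data. A minimal hand computation with two small hypothetical samples (no ties):

```python
# Hand computation of the Mann-Whitney U statistic for two small
# hypothetical samples (e.g. salaries of two graduate groups).
group_a = [12, 15, 11, 18, 14]
group_b = [22, 17, 25, 19, 16]

# Rank the pooled data (average ranks would handle ties; none here).
pooled = sorted(group_a + group_b)
def rank(x):
    positions = [i + 1 for i, v in enumerate(pooled) if v == x]
    return sum(positions) / len(positions)

r_a = sum(rank(x) for x in group_a)       # rank sum of group A
n_a, n_b = len(group_a), len(group_b)
u_a = r_a - n_a * (n_a + 1) / 2           # U for group A
u_b = n_a * n_b - u_a                     # U for group B
u = min(u_a, u_b)                         # the reported test statistic
print(u)  # prints: 2.0
```

SPSS reports this U (and its significance); the small U here reflects how little the two invented samples overlap.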
Mann Whitney Test 4
 To perform Mann Whitney Test
in SPSS.
 Click Analyze Menu  Non
Parametric Test Legacy
Options  2 Independent
Sample.
 The two Independent Sample
test dialog box is displayed.
 Use Grouping Variable to set
values for Group 1 and 2.
 Click OK.
 The test result will be displayed
in the output viewer.
Friedman Test
 The Friedman test is a non-parametric test for checking the
difference between several related samples. It is used to test for
differences between groups when the dependent variable is ordinal.
 The Friedman test is the non-parametric alternative to the one-way
ANOVA with repeated measures.
 As a non-parametric alternative to repeated measures ANOVA, it is
used for continuous data that has violated the assumptions
necessary to run the one-way ANOVA with repeated measures
 It is used to test whether 3 or more related variables come from identical populations.
 The assumptions of the Friedman Test are
 The group that is measured should have 3 or more variables.
  The group should be a random sample from the population.
 The dependent variable should be measured at
the ordinal or continuous level.
 The samples do not need to be normally distributed.
Friedman Test 2
 For example the Friedman Test
can be used to determine if there
is a significant change in level of
birth rate between time 1 (1995)
and time 2 (2005) and time 3
(2015).
 To Carry out Friedman Test in
SPSS  Click Analyze Menu 
Non Parametric Test  Legacy
Dialogs  K-Related Samples.
 Click on Statistics Button 
Select Descriptive and Quartile.
 Click Continue  Ok.
 The test result will be displayed in the Output Viewer.
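The Friedman statistic is computed by ranking each subject's scores across the related conditions and comparing the column rank sums. A hand computation with invented data (4 subjects, 3 conditions, no ties within a subject):

```python
# Hand computation of the Friedman chi-square statistic:
# 4 subjects measured under 3 related conditions (hypothetical data).
data = [[1, 2, 3],
        [2, 4, 6],
        [3, 1, 5],
        [1, 3, 2]]   # rows = subjects, columns = conditions

def within_row_ranks(row):
    order = sorted(row)
    return [order.index(v) + 1 for v in row]  # no ties assumed

n, k = len(data), len(data[0])
rank_sums = [sum(col) for col in zip(*(within_row_ranks(r) for r in data))]

chi2_f = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
print(chi2_f)  # prints: 4.5
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom, which is what the significance value in the SPSS output reflects.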
Wilcoxon Signed Rank Test
 The Wilcoxon signed-rank test is the nonparametric test
alternative to the dependent t-test.
 It does not assume normality in the data and thus can be used when the assumption of normality has been violated and the use of the dependent t-test is inappropriate.
 It is used to compare two sets of scores that come from the
same participants.
 This can occur when there is the need to investigate any
change in scores from one time point to another, or when
individuals are subjected to more than one condition.
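The test statistic W is the smaller of the positive and negative rank sums of the paired differences. A minimal hand computation with hypothetical before/after scores (no ties among the differences and no zero differences):

```python
# Hand computation of the Wilcoxon signed-rank statistic W for
# hypothetical paired scores (before vs. after an intervention).
before = [10, 12, 9, 15, 11]
after  = [12, 15, 8, 21, 16]

diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zero differences
ordered = sorted(diffs, key=abs)
ranks = {abs(d): i + 1 for i, d in enumerate(ordered)}    # no ties assumed

w_plus  = sum(ranks[abs(d)] for d in diffs if d > 0)  # positive rank sum
w_minus = sum(ranks[abs(d)] for d in diffs if d < 0)  # negative rank sum
w = min(w_plus, w_minus)
print(w)  # prints: 1
```

A small W (here almost all ranks are positive) indicates that the paired scores shifted consistently in one direction, which is what the SPSS significance value then quantifies.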
Wilcoxon Signed Rank Test 2
 To perform the Wilcoxon
Signed Rank Test in SPSS
Click Analyze 
Nonparametric Tests  Legacy
Dialogs  2 Related
Samples...
 Specify which two variables comprise the pair of observations by clicking on them.
 Then click the arrow to put them under the Test Pair(s) List.
Wilcoxon Signed Rank Test 3
 Under Test Type select Wilcoxon.
 If you want exact probabilities (i.e. based on the binomial distribution), click on Exact, choose Exact, then Continue.
 Click on OK.
WSRST vs WRST
 Wilcoxon Signed Rank Sum Test (WSRST) and Wilcoxon Rank
Sum Test (WRST) are often confused with one another. The differences between the two tests are:
WSRST:
 Requires that the populations be paired: for example, the same group of people is measured on two different occasions (or on two different things) and the two sets of measurements are then compared.
 Requires the data to be quantitative, i.e. data measured along a scale.
WRST:
 Its main requirement is that the samples be drawn from independent populations. For example, to test whether Subject A is harder than Subject B you would use two separate groups of students, and the groups need not be the same size. Here the two groups are independent; if you had asked the same group to write the same paper twice, you would use the WSRST instead.
 The data need not be quantitative; the test can also be performed on qualitative (rankable) data.
Kruskal Wallis Test
 The Kruskal-Wallis H-test is an extension of the Wilcoxon Rank Sum test and can be used to test the hypothesis that a number of unpaired samples originate from the same population.
 It is a non-parametric alternative to the One Way ANOVA test.
 The Kruskal Wallis test can be used for three or more groups, unlike the Mann Whitney test, which is used for two groups.
 It is used to test differences between several groups of
measurements.
 In the Kruskal Wallis Test, the dependent variable is continuous (scale) but not normally distributed, or ordinal, while the independent variable is categorical (nominal).
Kruskal Wallis Test 2
 The assumptions of the Kruskal Wallis Test are;
 The dependent variable should be measured at the ordinal or
continuous level. E.g. Likert Scale (Ordinal variables) or
Continuous Variable (Weight, Height, Hours)
 The  independent variable should consist of two or more
categorical independent groups. E.g. (2 groups: male or
female), employment status (employed or unemployed).
 There should be an independence of observations, which
means that there is no relationship between the observations in
each group or between the groups themselves.
 The distributions of all groups of the independent variable should have the same shape/variability.
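The H statistic this test reports is computed from the rank sums of the pooled observations. A hand computation with three small hypothetical groups (no ties):

```python
# Hand computation of the Kruskal-Wallis H statistic for three
# small hypothetical groups (no ties for simplicity).
groups = [[7, 3, 5], [9, 8, 6], [1, 2, 4]]

pooled = sorted(x for g in groups for x in g)
ranks = {v: i + 1 for i, v in enumerate(pooled)}  # rank of each value

n = len(pooled)
h = 12 / (n * (n + 1)) * sum(
    sum(ranks[x] for x in g) ** 2 / len(g) for g in groups
) - 3 * (n + 1)
print(round(h, 4))  # prints: 5.6889
```

H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom, which is where the significance value in the SPSS output comes from.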
Kruskal Wallis Test 3
 To perform the Kruskal Wallis Test:
 Analyze Menu  Nonparametric Tests  Independent Samples.
 In the Fields tab, move the
dependent variable to the ‘Test
Field’ box and the grouping
factor to the ‘Groups’ box.
 From the setting Tab,
Select Kruskal Wallis
Test and Click Run.
Kruskal Wallis Test 4
 The test result will be
displayed in the output viewer.
Statistical Test Comparison
Kind of data?
 Frequency data  How many independent variables?
 One  Chi-Squared Goodness of Fit
 More than one  Chi-Squared Test of Association
 Scores  Experimental or correlational study?
 Correlational study (looking for a relationship between scores on two variables)  Pearson's r (parametric) or Spearman's rho (non-parametric)
 Experimental study (looking for differences between groups or conditions)  continue to the next chart
Statistical Test Comparison 2
One Independent Variable:
 Independent measures  How many groups?
 Two  Independent-measures t-test (parametric) or Mann Whitney test (non-parametric)
 Three or more  One-way independent-measures ANOVA (parametric) or Kruskal Wallis test (non-parametric)
 Repeated measures  How many conditions?
 Two  Repeated-measures t-test (parametric) or Wilcoxon test (non-parametric)
 Three or more  One-way repeated-measures ANOVA (parametric) or Friedman's test (non-parametric)
Statistical Test Comparison 3
References
 https://statistics.laerd.com/spss-tutorials
 https://www.spss-tutorials.com/basics
 https://www.dur.ac.uk/academic.skills/
 https://www.thoughtco.com
 www.datastep.com/SPSSTraining.html
 https://www.tutorialspoint.com/spss
 https://stats.stackexchange.com/questions/91034/difference-between-the-wilcoxon-rank-sum-test-and-the-wilcoxon-signed-rank-test
 https://keydifferences.com/difference-between-parametric-and-nonparametric-test.html
 Einspruch, Eric L., An Introductory Guide to SPSS® for Windows, 2nd Ed., Sage Publications, California, USA, 2005.
 IBM Corporation, IBM SPSS Statistics 23 Brief Guide.
 Petra Petrovics, SPSS Tutorial & Exercise Book for Business Statistics, University of Miskolc, 2012.
 Sidney Tyrrell, SPSS: Stats Practically Short and Simple, Sidney Tyrrell and Ventus Publishing, 2009.
 Landau, Sabine and Brian S. Everitt, A Handbook of Statistical Analyses Using SPSS, Chapman & Hall/CRC Press LLC, Washington, D.C., 2004.