
STATISTICAL PACKAGE FOR SOCIAL SCIENCES

JOSE JUREL M. NUEVO


Member, National Research Council of the Philippines
LEARNING OBJECTIVES
1. Understand basic concepts of biostatistics
and computer software SPSS.
2. Select appropriate statistical tests for
particular types of data.
3. Recognize and interpret the output from
statistical analyses.
4. Report statistical output in a concise and
appropriate manner.
NATURE OF DATA

Data is the value you get from observing (measuring, counting, assessing, etc.) in an experiment or survey. Data is either categorical or metric. Categorical data is further divided into nominal and ordinal, whereas metric data is divided into discrete and continuous (quantitative) data.
Nominal data
The data is divided into classes or categories with no meaningful order of the classes, for example: sex, alive/dead, infected/not infected, hair color, smoking status.
Ordinal data
The data is also divided into classes or categories, but these can be put in a meaningful order.
For example, satisfaction level: very satisfied, satisfied, neutral, unsatisfied, very unsatisfied; pain: mild, moderate, severe; socioeconomic status: poor, middle, rich; grade of breast cancer; change in condition: better, same, worse.
Discrete data
When data is taken from some counting process, for example
number of patients in different wards, number of nurses,
number of hospitals in different cities.
Continuous or quantitative data
When data is taken from some measuring process, for
example, height, weight, temperature, uric acid, blood glucose, and serum levels.
Primary Scales of Measurement

Nominal scale
Basic characteristics: Numbers identify and classify objects.
Common examples: Social Security numbers, numbering of football players.
Marketing examples: Brand numbers, store types.
Permissible statistics: Percentages, mode (descriptive); chi-square, binomial test (inferential).

Ordinal scale
Basic characteristics: Numbers indicate the relative positions of objects but not the magnitude of the differences between them.
Common examples: Quality rankings, rankings of teams in a tournament.
Marketing examples: Preference rankings, market position, social class.
Permissible statistics: Percentile, median (descriptive); rank-order correlation, Friedman ANOVA (inferential).

Interval scale
Basic characteristics: Differences between objects can be compared; the zero point is arbitrary.
Common examples: Temperature (Fahrenheit).
Marketing examples: Attitudes, opinions, index numbers.
Permissible statistics: Range, mean, standard deviation (descriptive); product-moment correlations, t-tests, ANOVA (inferential).

Ratio scale
Basic characteristics: The zero point is fixed; ratios of scale values can be computed.
Common examples: Length, weight.
Marketing examples: Age, sales.
Permissible statistics: Geometric mean (descriptive); coefficient of variation (inferential).
Steps in data analysis
•Questionnaire checking/data preparation
•Coding
•Cleaning data
•Applying the most appropriate tools for analysis
QUESTIONNAIRE CHECKING
A questionnaire returned from the field may be
unacceptable for several reasons.
Parts of the questionnaire may be incomplete.
The pattern of responses may indicate that the respondent did not
understand or follow the instructions.
The responses show little variance.
One or more pages are missing.
The questionnaire is received after the pre-established cutoff date.
The questionnaire is answered by someone who does not qualify for
participation.
DATA PREPARATION
Preparation of data file
It is important to convert raw data into usable data for analysis (coding where needed); in other words, transfer the information from the questionnaire into a computer database.
The analysis and results will depend on the quality of the data.
Errors can occur in handling instruments and raw data, transcribing, data entry, and assigning codes, values, and value labels.
Data need to be cleaned to fulfill the analysis conditions.
CODING

Coding means assigning a code, usually a number, to each possible response to each question.
Data cleaning
•One of the first steps in analyzing data is to
“clean” it of any obvious data entry errors:
Outliers? (really high or low numbers)
Example: Age = 110 (really 10 or 11?)
•Value entered that doesn't exist for the variable?
Example: 2 entered where 1=male, 0=female
•Missing values?
Did the person not give an answer? Was the answer accidentally not entered into the database?
•You may be able to set defined limits when entering data.
This prevents entering a 2 when only 1, 0, or missing are acceptable values.
•Univariate data analysis is a useful way to check the quality of the data.
Choosing the Right Statistic
One of the most difficult parts of the
research process for most students is
choosing the correct statistical tools to
analyze their data.

In most research projects, it is likely that one will use quite a variety of different statistical techniques depending on the problem statement, hypotheses, and the nature of the data at hand. It is therefore important that one has at least a basic understanding of the different statistical instruments and their underlying requirements.
Here we will look at the various statistical techniques and instruments and then take you through the decision-making process. If you take this process step by step, you will arrive at the final decision as to which technique to choose.

There are two types of statistical techniques:
Parametric and
Non-Parametric

What is the difference between these two groups? And why is the distinction important?
The word PARAMETRIC comes from parameter, or characteristic of a population. Parametric tests (e.g., t-test, ANOVA) make assumptions about the population from which the sample has been drawn. This often includes assumptions about the shape of the population distribution (e.g., normally distributed).

NON-PARAMETRIC techniques, on the other hand, do not have such strict requirements and do not make assumptions about the underlying population distribution (that is why they are sometimes referred to as distribution-free tests). Non-parametric techniques are ideal when you have data measured on nominal (categorical) or ordinal (ranked) scales. They are also helpful when you have very small samples, or when your data do not meet the assumptions of the parametric techniques.
Non-Parametric Technique: Parametric Alternative
Chi-Square Test for Independence: none
Mann-Whitney U Test: Independent-samples t-test
Wilcoxon Signed Rank Test: Paired-samples t-test
Kruskal-Wallis Test: One-way between-groups ANOVA
Friedman Test: One-way repeated measures ANOVA
Spearman Rank Order Correlation: Pearson Product-moment Correlation
Introduction to SPSS
SPSS (Statistical Package for Social Sciences)
is a software package used for conducting
statistical analyses, manipulating data, and
generating tables and graphs that summarize
data. Statistical analyses range from basic
descriptive statistics, such as averages and
frequencies, to advanced inferential statistics,
such as regression models, analysis of
variance, and factor analysis.

SPSS also contains several tools for manipulating data, including functions for recoding data and computing new variables as well as merging and aggregating datasets.
SPSS also has a number of ways to
summarize and display data in the form of
tables and graphs.
Starting SPSS
There are a number of different ways to start SPSS.
The simplest way is to look for an SPSS icon on your desktop. Place your cursor on the icon and double click, or:
Click the Start button (in Windows XP),
Click All Programs, then
Select SPSS for Windows.
SPSS opening screen
1. Open an existing data source. Click this option button from the opening screen, and then on More Files. This will allow you to search through the various directories on your computer to find where your data file is stored.
2. Type in data. This option allows you to set up a new data file. Click on this option and SPSS will give you a blank spreadsheet where you can name your variables and enter your data.
3. Run the tutorial. This option allows you to view the tutorial lessons about SPSS.
4. Run an existing query. This option is used to view an existing query.
5. Create new query using Database Wizard. This option button creates a database query through the wizard program.
The Four Windows: Data Editor
Data Editor
Spreadsheet-like system for defining, entering, editing,
and displaying data. Extension of the saved file will be
“sav.”
The Four Windows: Output Viewer
Output Viewer
Displays output and errors. Extension of the saved file will
be “spv.”
The Four Windows: Syntax editor
Syntax Editor
Text editor for syntax composition. Extension of the
saved file will be “sps.”
The Four Windows: Script Window
Script Window
Provides the opportunity to write full-blown programs,
in a BASIC-like language. Text editor for syntax
composition. Extension of the saved file will be “sbs.”
Procedure Frequencies

In this lesson, first we will learn how to enter data, edit the
variable name, and assign labels and values. Next, we will
create a frequency table, produce descriptive statistics, and
plot a histogram.
Quantitative Data

How to Enter Data
Example 1: Dr. Smith gave a 20-item quiz to ten students.
Produce a frequency table for this small data set.
SPSS for Windows
A. Open a new SPSS Data Editor window: File / New / Data.
B. How to Enter data
In the Data View
a. A heavy border appears around the first data cell in the
first column.
b. Type the first score `16`.
Note that this value will appear in the cell editor. Press
the Enter key. Wait a moment. The data value will
appear in the cell. Continue entering all the remaining
data values.
C. By entering a value in the first column, you
automatically create a new variable with the default name
var00001.
D. Create your own variable names
Click the Variable View tab as shown below
The Variable View window will appear.

To edit the variable name, double click on var00001. Delete var00001 and type in `score` to replace the default name.
Next, click the Data View tab to return to the Data View window.
E. Choose a statistical procedure.
a. From the main menus choose: Analyze / Descriptive
Statistics / Frequencies
[To know more about Procedure Frequencies, click on the
Help button. Close the help topic window when you are done.]
b. Select the variable 'score' to be analyzed. By default, it will display the frequency table.
c. Click the Statistics button and select the statistics you want
SPSS to compute as shown below.

Percentile Values: Select Quartiles. Quartiles are


points which divide a distribution of scores into
quarters.
Dispersion: Select Standard deviation, variance,
range, Minimum, Maximum.
Central Tendency: Select Mean, Median, Mode, and
Sum.
Distribution: Select Skewness and Kurtosis.
d. Click Continue to return to the Frequencies dialog box. Click the Charts button. Since the variable "score" is a continuous variable, select Histograms with normal curve as shown below.

Click Continue and OK.
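The same analysis can also be run from the Syntax Editor. A minimal sketch of the equivalent command (assuming the variable is named score, as above; the syntax your version pastes may differ slightly) is:

* Frequency table, summary statistics, quartiles, and histogram for score.
FREQUENCIES VARIABLES=score
  /NTILES=4
  /STATISTICS=MEAN MEDIAN MODE SUM STDDEV VARIANCE RANGE MINIMUM MAXIMUM SKEWNESS KURTOSIS
  /HISTOGRAM NORMAL.

Clicking Paste instead of OK in any dialog box writes the equivalent syntax into the Syntax Editor, which is a convenient way to document and rerun an analysis.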


SPSS Output
Descriptive Statistics

1. Measures of Central Tendency and Variability


Measures of Central Tendency: The single most representative value in the distribution of scores.
(1) The median, the central value of a set of ordered scores, is 15. It divides the distribution of scores into equal halves. Half of the students scored less than 15 while the other half scored greater than 15.
(2) The most frequently occurring score is 16. It
appears four times. The mode is 16.
(3) The mean score is 15. The mean can be defined
as the sum of all scores, divided by the number of
scores: 150/10 = 15.
Measures of Variability: Quantify the degree to which
the scores are different from each other. The range,
variance, and standard deviation are examples of
measures of variability. Note that SPSS only computes the unbiased variance and standard deviation.
Quartiles are points which divide a distribution of scores into
quarters. The first quartile is the 25th percentile. The second
quartile is the 50th percentile or the median. The 75th percentile is
the third quartile.

The 25th percentile score is 14. That is, 25% of the students who
took the same test scored at or below a score of 14.
The 50th percentile score is 15. That is, 50% of the students who
took the same test scored at or below a score of 15.
The 75th percentile score is 16. That is, 75% of the students who
took the same test scored at or below a score of 16.
Frequency Table

There are only a few data values. The frequency table is shown below

Examine the Frequency column.

Count the number of times a score occurs. The frequency associated with the value of 13 is 1. The frequency associated with the value of 14 is 2. The frequency associated with the value of 15 is 3. The frequency associated with the value of 16 is 4. The mode is defined as the most frequently occurring score in the distribution of a variable. Thus, the mode is 16.

Examine the Cumulative Percent column.


Approximately 60% of the students had a score of 15 or less.
Histogram

"Score" is a quantitative variable. Visualize the frequency


distribution: histogram. The distribution was slightly
skewed to the left. The tail of the distribution points
toward the lower end of the scale. Skewness = -.712.
Categorical (Qualitative) Data
How to Define Labels and Values

Example 2 Dr. Smith asked fifteen students in his class on what days of
the week they were born. The results are shown below.

A. Code Data
The variable `dayofwk` is a categorical variable. One way to
simplify data entry is to assign numbers or symbols to represent
responses.
Assign 1 for 'Sunday', 2 for 'Monday', 3 for 'Tuesday', 4 for 'Wednesday', 5 for 'Thursday', 6 for 'Friday', 7 for 'Saturday', and 9 for 'Missing'.
B. The most common method of representing frequency of
categorical membership is a bar chart. Our task is to produce a
bar chart.
SPSS for Windows

A. Open a new Data Editor window. From the menus choose: File /
New / Data.

B. Define variable names, variable labels, value labels, and user-missing values.

a. Name: Define the variable name. Click the Variable View tab.
Double click on the textbox. Type in the variable name
“dayofwk” as shown below

b. Label: Assign the variable DAYOFWK an extended


descriptive label DAY OF THE WEEK.
Double click on the textbox. Type in the long label
DAY OF THE WEEK as shown below
c. Values: Assign descriptive labels to values.
Double click on the textbox and a gray square will appear. Click on
the gray square as shown below

A Value Label dialog box will appear.


(a) Click inside of the Value text box and type 1.
(b) Press the Tab key or click inside of the Value Label text box
and type Sunday.
(c) Click on Add. The value label is added to the list as shown
below.

(d) Continue entering the other values (2 to 7) and their descriptive


labels (Monday to Saturday).

Finally, click Ok to end the Value Labels input.


d. Missing Values.

The easiest way to handle missing data is to leave them blank


when entering data. To distinguish among different types of
missing data, the missing values command can be used. For
example, we can code a respondent’s forgetfulness as 9 and a
respondent’s refusal as 99.

Double click on the textbox and a gray square will appear. Click
on the gray square as shown below

The Missing Values dialog box will appear.


(a) Select Discrete Missing Value.
(b) Type 9 in the first text box.

(c) Click OK.


e. Measure
Click on Measure. A down arrow will appear.

Click on the down arrow, choose Nominal. (In our


example, the order of the days of the week is not
important.)
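For reference, the same variable definition can also be written as command syntax (a sketch assuming the variable dayofwk already exists in the active data file):

* Variable label, value labels, user-missing value, and measurement level for dayofwk.
VARIABLE LABELS dayofwk 'DAY OF THE WEEK'.
VALUE LABELS dayofwk
  1 'Sunday' 2 'Monday' 3 'Tuesday' 4 'Wednesday'
  5 'Thursday' 6 'Friday' 7 'Saturday'.
MISSING VALUES dayofwk (9).
VARIABLE LEVEL dayofwk (NOMINAL).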

C. Enter data values.


D. Procedure frequencies

a. From the menus choose: Analyze / Descriptive Statistics /


Frequencies

b. Highlight the variable ‘dayofwk’ and Click on the > pushbutton.

c. Click on Charts. This opens a Frequencies: Charts dialog box.


(a) Select Bar Chart(s). Note that the variable `dayofwk` is a
categorical variable.
(b) Click Continue or press the Enter key. Return to the
Frequencies dialog box.

d. Click OK in the Frequencies dialog box. The frequency table and the chart will be displayed in the Viewer window.
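The equivalent syntax sketch for this step (again assuming the variable name dayofwk) is short:

* Frequency table and bar chart for the categorical variable dayofwk.
FREQUENCIES VARIABLES=dayofwk
  /BARCHART FREQ.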
About 27% of students were born on Thursday.

Visualize the frequency distribution of the categorical variable, Day of the Week, with a bar chart.
Examine the highest point in the chart. The mode is _____.
The mode is Thursday. Note that the mode is often
used when the data are on a nominal scale. The mode is
the simplest measure of location of a distribution.
DATA TRANSFORMATIONS
In this lesson, we will learn how to compute and recode new values.
I. Compute New Values
A researcher had collected the data about family income and family size.

Task: Compute income per capita.


SPSS for Windows
Create a new variable for income per capita, IPC.

Compute income per capita: income / family size


A. Open a new Data Editor Window: File / New / Data. Enter
data values.
B. Define the variable names: Click the Variable View tab.
Enter the variable names: income and famsize.
C. To compute values for a new variable 'ipc', from the
menus choose: Transform / Compute.

a. Click inside of the Target Variable text box. Enter


the new variable name `ipc`.
b. Type “income” or move the variable ‘income’ to the
Numeric Expression box. (Click the variable `income`.
Click the > pushbutton. The variable `income` will
appear in the Numeric Expression box.)
c. Type “/” or click the sign `/` (divide) from the
calculator pad. The sign `/` will appear in the Numeric
Expression box.
d. Click the variable `famsize`. Click the > pushbutton.
The variable `famsize` will appear in the Numeric
Expression box.
e. Click on OK. The data editor window displays the
new variable `ipc` and its values. Print it out.
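In syntax form, the whole transformation is a single COMPUTE statement (a sketch assuming the variables income and famsize defined above):

* Income per capita = family income divided by family size.
COMPUTE ipc = income / famsize.
EXECUTE.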
II. Recoding New Values

Example: Twenty-one parents in the X school


district were randomly selected and asked how
many years of formal education they have
completed. Suppose you had information on
education that ranged from 0 to 22 years, and
now you need to do an analysis using only three
categories: parents who did not complete high
school, those who completed high school but
not college, and those who completed college.

SPSS for Windows


A. Open a New Data window: File / New / Data
B. Define the variable `educ`. Enter 21 values.
C. From the menus choose: Transform / Recode / Into Same
Variables
a. Select `educ` and move it to the Numeric Variables
list.
b. To define the values (or ranges) to be recoded, click
on Old and New Values.

D. Specify the old and new value or range.


a. Old Value Area
Specify the first old range. Click on the Range
radio button as shown above. Enter the range: 0
through 11 for parents who did not complete
high school.

b. New Value
Enter the new value: 1.
c. Click on Add. The numbers 0 through 11 will now be recoded as `1`.
d. Specify the second old range (12 through 15)
and specify the second new value `2`. Click on
Add. For those who completed high school but
not college

e. Specify the third old range (16 through 22) and


specify the third new value `3`. Click on Add. For
those who completed college

E. Click on Continue. Return to the Recode into Same Variables dialog box. Click on OK. The existing variable `educ` will now have a range from 1 to 3 instead of from 0 to 22.
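The same recode can be written in syntax (a sketch; it assumes the variable educ holds 0 to 22 years of education). Because recoding into the same variable overwrites the original values, keep a copy of the raw data file:

* Collapse years of education into three categories.
* 1 = did not complete high school, 2 = completed high school, 3 = completed college.
RECODE educ (0 thru 11=1) (12 thru 15=2) (16 thru 22=3).
EXECUTE.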
F. Produce a pie chart

a. To obtain pie charts, from the menus choose: Graphs


/ Pie

b. There is one category variable `educ` with three


groups.
(a) Data in Chart Are

Summaries for groups of cases

(b) Click on Define.

c. A chart definition dialog will appear.

Click on the variable `educ` and move it to the Define


Slices by text box.

Slices Represent: Click on % of cases.

Click on Titles. Enter your preferred text for the title of


the pie chart.

Click Continue. Click OK.
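A corresponding syntax sketch (the title text is only an example; substitute your own):

* Pie chart showing the percentage of cases in each educ category.
GRAPH
  /PIE=PCT BY educ
  /TITLE='Years of Formal Education Completed'.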


G. Modify the chart

a. Double-click on the pie chart to bring up the Chart Editor window.

b. From the Chart Editor menu bar choose: Chart / Options

(a) Labels. Click on Percents. Display the percentage of the


whole pie that each slice represents.

(b) Click on Edit Text. `1` is highlighted in the list. Delete `1` in
the Label text box. Type `0 through 11` in the Label text box.
Click on Change.

(c) Highlight `2` from the scroll list. Type `12 through 15` in the
Label text box. Click on Change.

(d) Highlight `3` from the scroll list. Type `16 through 22` in the
Label text box. Click on Change.

(e) Click Continue. Click OK.


Note that about 52% of the respondents did not
complete high school.
EXPLORING RELATIONSHIPS
Frequently in survey research you are not interested in differences between groups but in the strength of the relationship between variables.

There are different techniques that can be applied.

Pearson Correlation

Correlation analysis is used to describe the strength and direction of the linear relationship between two variables. The Pearson correlation coefficient (r) can only take on values from -1 to +1.

The sign out front indicates whether there is a positive or a negative correlation. The size of the absolute value provides an indication of the strength of the relationship.
Table 1. Determining the Strength of the Relationship
Value: Interpretation
r = 1.0 to .90 or r = -1.0 to -.90: Very high correlation; very significant relationship
r = .89 to .70 or r = -.89 to -.70: High correlation; significant relationship
r = .69 to .40 or r = -.69 to -.40: Moderate correlation; average relationship
r = .39 to .20 or r = -.39 to -.20: Low correlation; small relationship
r = .19 and below: Very low correlation; almost no relationship

Correlation describes the relationship between two continuous variables, both in terms of the strength of the relationship and its direction.
The Pearson product-moment coefficient is designed for interval (continuous) variables; it can also be used when one variable is continuous and the other is dichotomous (e.g., gender: male/female).
Example of research question:
Is there a relationship between the amount of
control people have over their internal states and their
levels of perceived stress?
Do people with high levels of perceived control experience lower levels of perceived stress?
There are two variables:

Both continuous, or one continuous and the other


dichotomous (two values).

Spearman Rank Order Correlation (rho) is used to calculate the strength of the relationship between two continuous variables. It is the non-parametric alternative to Pearson's product-moment correlation and is also designed for use with ordinal-level or ranked data.
Example of research question:
How strong is the relationship between control of internal states and perceived stress?
Two continuous variables: control of internal states and perceived stress (as measured by the perceived stress scale).
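In syntax, a Spearman correlation for this research question could be requested as follows (a sketch; control and pss are placeholder names for however the two scale scores are named in your data file):

* Spearman rank-order correlation between control of internal states and perceived stress.
NONPAR CORR
  /VARIABLES=control pss
  /PRINT=SPEARMAN TWOTAIL NOSIG
  /MISSING=PAIRWISE.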
Linear Regression Analysis

A pretest on math skills and the final exam scores in an introductory statistics class were collected in fall 2001, as shown below.

Are scores on math skills


correlated with the
performance in a statistics
course?
A. Define the variables `pretest` and `final`.

B. Enter values.
Correlation

Are the scores on the pretest and final test correlated?

From the menus choose: Analyze \ Correlate \ Bivariate.


Select the variables pretest and final to be correlated.

By default, Pearson correlation will be computed and a


two-sided test of significance is used. The null
hypothesis states that the correlation is equal to zero.
The research hypothesis states that the correlation is
different from zero.

Click OK to obtain the result.
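The equivalent syntax is roughly as follows (assuming the variable names pretest and final used above):

* Pearson correlation between pretest and final with a two-tailed significance test.
CORRELATIONS
  /VARIABLES=pretest final
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.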


The null hypothesis is that the population
coefficient of correlation between the pretest and
the final exam is zero. The alternative hypothesis is
that the population coefficient of correlation
between the pretest and the final exam is
significantly different from zero.

Examine the output. For a p-value of .000, report it as p < .001.

It is concluded that math skills and grade in


statistics are correlated (r = .893, p < .001).
Linear Regression Analysis

Stage 1: Development
Since math skills were strongly correlated with grades in statistics, the researcher decided to use math skills as a predictor variable to predict grades in statistics.
Task: Develop a linear regression equation to predict the scores
on the final exam from the scores on the pretest.
From the menus choose: Analyze \ Regression \ Linear.
Dependent and Independent (s)
Move the variable `final` to the Dependent variable
list.
Move the variable `pretest` to the Independent
variable list.
To save predicted values, residuals, and prediction
interval for individual predicted values, click on
Save. The Save dialog box will appear. Click
Continue and OK.
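A syntax sketch of the same regression request (assuming the variables final and pretest; a pasted command would include additional default subcommands):

* Simple linear regression predicting final from pretest.
* Save predicted values, residuals, and individual prediction intervals as new variables.
REGRESSION
  /DEPENDENT final
  /METHOD=ENTER pretest
  /SAVE PRED RESID ICIN.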
A. Summary Statistics for the Equation

a. What is the correlation between the pretest and the


final exam? (r = .893)

b. The Pearson r, when squared, offers the proportion of


variance in one variable predictable from the other.

What percentage of the variation in the final exam is


explained by the pretest? (80%)
B. Regression and Prediction

Find the linear regression equation for predicting the final exam
from the pretest.

b. Stage 2: Estimation

A new student received a score of 2 on the pretest in fall 2002. What is the best estimate of the score that the student will receive on the final exam?
.868(2) + .996 = ____ (2.732)

Unless there is a perfect relationship, it is unlikely that


the student's real score exactly equals the predicted
value.
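To apply the fitted equation to new cases inside SPSS, a short COMPUTE statement is enough (a sketch; predfinal is just an illustrative name for the new variable, and the coefficients are those reported above):

* Predicted final exam score from the estimated equation: final = .996 + .868 * pretest.
COMPUTE predfinal = .996 + .868 * pretest.
EXECUTE.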
Partial Correlations

A researcher is interested in studying the relationship between


reading comprehension and grades in science. However, he or
she is also concerned with the effect of IQ on the relationship.
The researcher finally decides to compute a partial correlation
coefficient between reading comprehension and grades in
science when IQ is held constant.

1. Enter Data
2. Choose Analysis \ Correlation \ Partial.

Select the two variables, read and science, to be correlated.

Note that you may press and hold down the Control key (the
Ctrl key on your keyboard) while clicking on the two
variables. Then click the arrow button to move them at the
same time. You may also move the variables one by one.

Controlling for: Select the variable IQ.

Click Options. Click `Means and standard deviations` and `Zero-order correlations` to obtain a matrix of simple correlations between all variables. Click Continue. Click OK.
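The corresponding syntax sketch (assuming variables named read, science, and iq):

* Partial correlation between read and science, controlling for iq.
* Request descriptive statistics and the zero-order correlation matrix.
PARTIAL CORR
  /VARIABLES=read science BY iq
  /SIGNIFICANCE=TWOTAIL
  /STATISTICS=DESCRIPTIVES CORR
  /MISSING=LISTWISE.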
SPSS Output
•Descriptive Statistics

•Compute the coefficients of correlation.


Tests of Significance
The null hypothesis states that the correlation is equal to
zero.
Is the correlation significantly different from zero?

There are three sets of bivariate correlations: read and science, read and IQ, and science and IQ. Use the Bonferroni method to control for Type I error across the 3 correlations.
Corrected Significance Level: Three correlations are tested
(READ &SCIENCE, READ & IQ, and SCIENCE & IQ). Set the
significance level as .017 (.05 / 3 = .017) for each test. Also,
note that the degrees of freedom are equal to N - 2 = 9 - 2 =
7.
The correlation between reading comprehension and
grades in science was significant, r(7) = .8072, p < .017.

The correlation between reading comprehension and IQ


was significant, r(7) = .8742, p < .017.

The correlation between grades in science and IQ was


significant, r(7) = .9005, p < .017.

Note that reading comprehension and grades in science are significantly influenced by IQ.
EXPLORING DIFFERENCES BETWEEN GROUPS

T-test
T-tests are used when you have two groups (e.g., males and females) or two sets of data (before and after) and you wish to compare the mean score on some continuous variable.

Two Main Types of T-tests
Paired-samples t-test is used when you want to compare the mean scores of one group of people on two different occasions, or when you have matched pairs.
A paired-samples t-test will tell you whether there is a statistically significant difference between the two sets of mean scores.

Example of null hypothesis:
There is no significant difference between the mean college entrance test scores obtained by applicants from private and public schools.
Independent-samples t-tests are used when you have two different groups of people (e.g., males and females) and you are interested in comparing their scores. In this case you collect information on only one occasion, but from two different sets of people. In other words, this test is used when you want to compare the mean scores of two different groups of people or conditions.

Example of question:
Is there a significant difference in the mean self-esteem
scores for females and males?

There are two variables:


One categorical Independent variable (e.g., males/females);
and
One continuous dependent variable (e.g., self-esteem scores).
Consider a hypothetical experiment pertaining to a change in
attitudes following a persuasive communication.
Random selection and random assignment

Prior to the onset of the experiment, ten subjects were


randomly selected and randomly assigned to one of the two
groups. Assume that there is no difference between the two
groups in the initial attitudes toward animal slaughter.

Manipulate the independent variable and measure the


dependent variable
One group watched a movie about killing baby seals in the
Arctic (the experimental group), and the second group
watched a movie about the migration of caribou (the placebo
group). Subsequent to viewing the movies, both groups
responded to a questionnaire measuring agreement and
disagreement for various arguments justifying and opposing
the hunting and killing of animals. Higher scores on the
questionnaire represent rationalization of the animal slaughter.
If all factors except one are kept constant and the events
systematically change as a result of manipulation of that factor,
then the change can be ascribed to that particular factor
(experimental condition or treatment) and some degree of
causality can be inferred.
Data set

Research Question
Does watching the movie about killing baby seals
change viewers' attitudes toward the hunting and
killing of animals? Specifically, will the group
watching a movie about killing baby seals score lower
on the questionnaire?

Conduct an independent-samples t-test. Use a .05 significance level.
1. Input data. The variable `group` represents the group
membership. The variable `score` is the dependent variable.

2. From the menus choose: Analyze \ Compare


Means \ Independent-Samples T Test.

3. Select the test variable `score`. Select the


group variable `group`.
4. Define the categories of the grouping variable.
Click on Define Groups. Enter 0 and 1. Click on Continue.

5. Click on OK to get the default independent-samples


t test with two-tailed probabilities and a 95%
confidence interval for the difference in means.

Both equal-variance and unequal-variance t values are provided, as well as the Levene test for equality of variances.
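In syntax, the whole request is a single command (a sketch assuming the grouping variable group takes the values 0 and 1, as in the Define Groups step above):

* Independent-samples t test comparing the mean score of the two groups.
T-TEST GROUPS=group(0 1)
  /VARIABLES=score.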
Outputs
Group Statistics
Examine the group means and standard deviations.

Group Mean
Higher scores on the questionnaire represent
rationalization of the animal slaughter.
The group watching a movie about killing baby seals
scored lower (M1 = 28, SD = 3.24) on the questionnaire
than did the group viewing a movie about the migration
of caribou (M0 = 32.8, SD = 3.11).
Equality of Variances
Most computer programs routinely check for equality of variances for both groups before computing a t-test. Levene's test is used to test the null hypothesis that the two population variances are equal.

Levene's Test for Equality of Variances: F = .076, p = .79

The observed significance level for the F test was larger than .05 (the preset significance level). The null hypothesis was not rejected, F = .076, p = .79. The two population variances can be treated as equal; the assumption of homoscedasticity was met.
Independent Samples Test

An independent-samples t test was conducted to


evaluate whether watching the movie about killing
baby seals would change viewers' attitudes toward
animal slaughter. Higher scores on the questionnaire
represent rationalization of the animal slaughter.
One-way Analysis of Variance (ANOVA)

One-way Analysis of Variance is similar to a t-test, but is used when you have two or more groups and you wish to compare their mean scores on a continuous variable. It is called one-way because you are looking at the impact of only one independent variable on your dependent variable.
One-way Analysis of Variance (ANOVA) will let you know if the groups differ, but it won't tell you where the significant difference is (group 1, group 2, group 3, etc.).
Post-hoc comparisons can be conducted to find out which groups are significantly different from one another.
Example of Question:

Is there a difference in optimism scores for young,


middle-aged and old subjects?

There are two variables:


One categorical independent variable with three or more distinct categories. This can also be a continuous variable that has been recoded to give three equal groups (e.g., age group: subjects divided into 3 age categories, 29 and younger, between 30 and 44, 45 or above).
One continuous dependent variable (e.g., optimism).
Two-way Analysis of Variance (Two-way ANOVA)
Two-way Analysis of Variance lets you test the
impact of two independent variables on one
dependent variable. This means that there are two
independent variables, and between-groups
indicates that different people are in each of the
groups. This technique allows us to look at the
individual and joint effect of two independent
variables on one dependent variable.
Advantage of using Two-way ANOVA
It allows you to test for an interaction effect, that is, when the effect of one independent variable is influenced by the other.
Two different types of Two-way ANOVA:
Between-groups ANOVA (when the groups are different)
Repeated Measures ANOVA (when the same people are tested on more than one occasion).
Example of research question:
What is the impact of age and gender on optimism?
Does gender moderate the relationship between age
and optimism?

Three Variables:
Two categorical independent variables (e.g., Gender:
males/females;
Age group: young, middle, old); and
One continuous dependent variable (e.g., total
optimism).

Summary:
For example, it allows you to test for:
Sex differences in optimism;
Differences in optimism for young, middle and old
subjects; and
The interaction of these two variables: is there a difference in the effect of age on optimism for males and females?
One-Way Independent Measures ANOVA

A researcher would like to evaluate the effects of four


teaching methods. Twenty-eight junior high students were
randomly selected. They were then randomly assigned to
one of four teaching methods: A, B, C, and D. Below are
their scores on the final examination. Different subjects
are used for all conditions of the experiment.

Data Set

Hypotheses
Are there significant differences among the four teaching
methods?
Set a significance level
Use a .05 significance level.
Data Input
The Analyze Menu
1. To obtain a one-way analysis of variance, from
the menus choose:
Analyze / Compare Means / One-Way ANOVA

Select the dependent variable (final) and the factor variable (method).

2. Check the assumption of equal variances


To test the null hypothesis that the four groups
come from populations with the same variance,
you can use the Levene test.
To obtain descriptive statistics, the Levene
statistic, and means plot, click on Options.
Select Descriptive, Homogeneity-of-variance,
and Means plot. Click Continue and OK.
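A syntax sketch of the same one-way ANOVA request (assuming the variables final and method):

* One-way ANOVA of final by method with descriptives, Levene's test, and a means plot.
ONEWAY final BY method
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /PLOT MEANS.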
SPSS Output
Check the assumption of equal variances
Do the four groups come from populations with the same variance?
To test the null hypothesis that groups come from populations
with the same variances, you can use the Levene test.

The observed significance level is larger than .05.


The null hypothesis is not rejected. The
assumption of equal variances was met.
The F Test
For this example, F(3,24) = 6.663. The observed
significance level was .002. There were significant
differences among the four teaching methods, p < .05. The
differences among groups represented systematic effects.