
Hong Kong Institute of Vocational Education (Sha Tin)

I.T. Essentials Applied Science (ITE3004)


Unit of Competency GCIT404A

2012-2013

LAB SESSION 9
Topic TWO: Statistical Software
Learning Content:
(A) Introduction of Statistical Software
(B) Introduction of PSPP
(C) Data Handling
(D) Displaying and Summarizing Data
(E) Modifying Data
(F) Review on Statistics
(G) Analyzing Data
1. One Sample T Test
2. Paired Samples T Test
3. Independent Sample T Test
4. One Way ANOVA
5. Chi Square Test of Independence
6. Bivariate Correlation
7. Simple Linear Regression

Copyright VTC 2012


Section G: Analyzing Data

1. One Sample T Test
2. Paired Samples T Test
3. Independent Sample T Test
4. One Way ANOVA
5. Chi Square Test of Independence
It tests the association between two categorical variables. When two
variables are independent, there is no relationship between them. If there is a
relationship between the two variables, we would expect a higher proportion of
one group than the other group in a category.
e.g. The relationship between gender and preference of car model
e.g. The relationship between education level and monthly income
Assumptions:
The level of measurement for both variables is scale, ordinal or nominal.
It is most useful for nominal variables, for which we do not have other
options.
Hypotheses:
Null: There is no significant relationship between the two variables.
Alternate: There is a significant relationship between the two variables.
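The statistic behind this test can be sketched in a few lines of Python. This is only an illustration of the usual calculation (observed counts compared with the counts expected under independence); the function name and the counts are hypothetical and are not taken from the lab data. The software then compares the statistic with a chi-square distribution to obtain the p-value.

```python
def chi_square_statistic(observed):
    """Return (chi2, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical counts (not from the lab data): rows = category, columns = group.
table = [
    [20, 30],
    [30, 20],
]
chi2, df = chi_square_statistic(table)
print(round(chi2, 2), df)
```

A table whose rows are exact multiples of each other gives a statistic of zero, which matches the idea that independent variables show no relationship.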
Example 1
Let's read the output of a Crosstabs Chi Square Test of Independence
between gender and preference for Pepsi, Coke or Spirit.


We often need to identify which cell or cells are the major contributors to
the significant chi-square test by examining the pattern of column
percentages.
Based on the column percentages, we would identify cells on the Coke
row and the Pepsi row as the ones producing the significant result
because they show the largest differences: about 30 percentage points on
the Coke row (53.2% vs 23.3%) and 19.3 percentage points on the Pepsi row
(46.7% vs 27.4%).
As the significance value (p-value) is .001 (< .05), we would reject the
Null Hypothesis and conclude that there is a significant relationship
between the two variables.
It is better to follow up the analysis with a post hoc test.
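As a rough illustration of how column percentages are compared, the sketch below converts a table of counts into percentages within each column and then takes the difference between the two columns on each row. The counts and the function name are hypothetical; they are not the figures from the output above.

```python
def column_percentages(table):
    """Convert counts (a list of rows) into percentages within each column."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * count / col_totals[j] for j, count in enumerate(row)]
        for row in table
    ]


# Hypothetical counts: one row per drink, one column per gender.
counts = [
    [30, 50],
    [50, 20],
    [20, 30],
]
percentages = column_percentages(counts)

# Difference between the two columns on each row, in percentage points.
differences = [abs(row[0] - row[1]) for row in percentages]
print(differences)
```

The rows with the largest differences are the ones to look at first when explaining a significant result.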
Step #1
Open the System File (Lab07_Answers.sav).
Step #2
From the menus, choose Analyse > Descriptive Statistics > Crosstabs.
Step #3
Select Marital Status, and move it into the Rows list. Then, select Gender,
and move it into the Columns list.


Step #4
Click Format and a dialog box is prompted.
Step #5
Select Print tables, Pivot, Ascending and Label, and then click Continue.
Step #6
Click Statistics and a dialog box is prompted.
Step #7
Select Chisq, and then click Continue.
Step #8
Click OK to run the procedure.

As the significance value (p-value) is .46 (> .05), we cannot reject the Null
Hypothesis, and we conclude that there is no significant relationship
between the two variables.
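The decision rule used in both chi-square examples can be written as a small Python helper. This is a minimal sketch, assuming the .05 significance level used throughout this lab; the function name is hypothetical.

```python
def decide(p_value, alpha=0.05):
    """Return the conclusion of a hypothesis test at significance level alpha."""
    if p_value < alpha:
        return "reject the Null Hypothesis"
    return "cannot reject the Null Hypothesis"


print(decide(0.001))  # p-value from Example 1
print(decide(0.46))   # p-value from the Marital Status by Gender crosstab
```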


6.
Bivariate Correlation
It tests the correlation (Pearson r) between two quantitative variables, by
showing the magnitude and the direction.
e.g. The relationship between head circumference at age 6 to 14 months and
the cerebral grey matter measurement at age 2 to 5 years
e.g. The relationship between the lead content of soil and the distance from a
major highway
e.g. The relationship between the vocabulary size in English and age in early
childhood education
e.g. Consumption of hot chocolate is negatively correlated with crime rate (high
hot chocolate consumption tends to be paired with lower crime rates), but both are
responses to cold weather. Correlation does not imply causation (CAUSATION / CAUSALITY).
Assumption:
The level of measurement for both variables is scale or ordinal.
Both variables should be normally distributed.
It is a measure of linear association ONLY.
If the relationship is not approximately linear (e.g. complicated curved
relationship, such as quadratic, exponential), we need to examine the
scatterplot.
Hypotheses:
Null: There is no association between the two variables.
Alternate: There is an association between the two variables.
The correlation coefficient is a number between +1 and -1. This number
tells us about the magnitude and direction of the association between
two variables.
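For readers who want to see where the number comes from, here is a minimal Python sketch of the standard Pearson formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name and the data are hypothetical and only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y rises in step with x, then falls in step with x.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))
print(pearson_r(x, [10, 8, 6, 4, 2]))
```

The first call gives a coefficient at the positive end of the range and the second at the negative end, because the points lie exactly on a straight line in both cases.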
The MAGNITUDE is the strength of the correlation.
If the correlation is 0 or very close to zero, there is no association
between the two variables being tested.
A correlation coefficient between -0.5 and +0.5 indicates a weak
correlation, which usually means there is not really any relationship
between the two variables being tested.


A correlation coefficient between -0.5 and -0.8 OR between +0.5
and +0.8 indicates a moderate correlation between the two
variables being tested.
A correlation coefficient between -0.8 and -1 OR between +0.8 and
+1 indicates a strong correlation between the two variables
being tested.
The DIRECTION of the correlation tells us how the two variables are
related.
If the correlation is positive, the two variables have a positive
relationship (as one increases, the other also increases).
If the correlation is negative, the two variables have a negative
relationship (as one increases, the other decreases).
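The magnitude and direction rules above can be combined into one small Python helper. This is a sketch only: the handout does not say which band the boundary values 0.5 and 0.8 belong to, so placing them in the higher band is an assumption, and the function name is hypothetical.

```python
def describe_correlation(r):
    """Return (strength, direction) for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and +1")
    if r == 0:
        return ("none", "none")

    size = abs(r)
    if size < 0.5:
        strength = "weak"
    elif size < 0.8:      # assumption: exactly 0.5 counts as moderate
        strength = "moderate"
    else:                 # assumption: exactly 0.8 counts as strong
        strength = "strong"

    direction = "positive" if r > 0 else "negative"
    return (strength, direction)


print(describe_correlation(0.68))
print(describe_correlation(-0.9))
print(describe_correlation(0.3))
```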
Example 2
Let's read the output of a Pearson correlation between the Rosenberg
Self-Esteem Scale and the Assessing Anxiety Scale.

It creates a correlation matrix of the two variables.


Three pieces of information are shown: the Pearson correlation
coefficient, the significance value and the number of cases (N).
The Pearson correlation coefficient (-0.378) shows that there is a weak
negative relationship between the Rosenberg Self-Esteem Scale and the
Assessing Anxiety Scale.
As self-esteem increases, anxiety decreases.


Step #1
Open the System File (Lab09_Children.sav).
Step #2
Background: We conducted a study on the relationship between Age (month) and
Height (cm) of children; in other words, whether Height (cm) can be predicted from
Age (month) if the two variables are correlated.
Step #3
From the menus, choose Analyse > Bivariate Correlation.
Step #4
Select Age (month) and Height (cm) (both variables), and move them into the list
in the text area on the right.

Step #5
Select Two-tailed (OR One-tailed) in Test of Significance.
Step #6
Click OK to run the procedure.

Interpretation:
The Pearson correlation coefficient (0.68) shows that there is a positive,
moderate relationship between Age (month) and Height (cm).
As age increases, height also increases.


7.
Simple Linear Regression
It tests whether there is a linear relationship between two quantitative variables.
Assumptions:
The data are linear (If you look at a Scatterplot of the data, and the data
seem to be moving in a straight line, it's a good indication that the data are
linear).
The dependent variable should be normally distributed.
When the two variables are perfectly correlated, the prediction is perfect; the
less correlated the variables, the less accurate the prediction.
The model takes the form y = a + bx, where y is the response (dependent)
variable, x is the explanatory or predictor (independent) variable, a
is the intercept term of the model, and b is the slope/gradient of the linear
model.
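To make the roles of a and b concrete, the following Python sketch estimates them by ordinary least squares, which is the usual way a simple linear regression is fitted. The function name and the data are hypothetical; the data were chosen to lie exactly on the line y = 1 + 2x so the result is easy to check by hand.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the model y = a + b*x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx             # slope/gradient
    a = mean_y - b * mean_x   # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```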
Hypotheses:
The hypotheses for regression focus on the slope/gradient of the regression
line.
Null: The slope/gradient equals zero (there is no slope/gradient)
Alternate: The slope/gradient is not equal to zero
Example 3
There are 3 key pieces of information in the output:
The R Square value
The significance of the regression
The values of the constant and slope
Let's read the outputs of a simple linear regression between Assessing
Prejudice (independent variable) and Comfort with Inter-Ethnic Situations
(dependent variable).


First, the Model Summary table shows the correlation coefficient
(R = 0.382). By the guidelines in Bivariate Correlation, this is a weak
relationship between Assessing Prejudice and Comfort with Inter-Ethnic
Situations. R in this table is always reported as a non-negative value, so
the direction of the relationship is read from the sign of the slope in the
Coefficients table.
The R Square shows how much of the variance of the dependent
variable can be explained by the independent variable. In this case,
14.6% of the variance in Comfort with Inter-Ethnic Situations can be
explained by differences in levels of Assessing Prejudice.
Preferably, R Square should be greater than 65%.
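In a simple linear regression with one independent variable, R Square is the square of R. The short Python check below reproduces the percentage quoted above from R = 0.382; the function name is hypothetical.

```python
def variance_explained(r):
    """R Square for a simple linear regression with correlation coefficient r."""
    return r ** 2


# R = 0.382 from the Model Summary table, shown as a percentage.
print(f"{variance_explained(0.382):.1%}")
```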

The ANOVA table uses the F-value to check the hypothesis.


As the significance value (p-value) is .000 (< .05), we would reject the
Null Hypothesis and conclude that at least one coefficient is not equal to
zero.
There is a significant linear relationship between Assessing Prejudice and
Comfort with Inter-Ethnic Situations.

Finally, the Coefficients table gives us all the information we need to
plug into the equation y = a + bx. In this example, a = 5.474 and b = -.530.
The t-value is -6.058 and the significance value (p-value) of the gradient
parameter Assessing Prejudice is .000 (< .05). This provides strong
evidence of a negative association between Assessing Prejudice and
Comfort with Inter-Ethnic Situations.


The equation is: y = 5.474 - 0.53x.
Therefore, the regression model is
Comfort with Inter-Ethnic Situations = 5.474 - 0.53 × Assessing Prejudice
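Once the constant and slope are known, the model can be used for prediction. The Python sketch below plugs the values from the Coefficients table (a = 5.474, b = -.530) into y = a + bx. The function name and the input score of 2.0 are hypothetical and only for illustration.

```python
A = 5.474    # constant (intercept) from the Coefficients table
B = -0.530   # slope for Assessing Prejudice from the Coefficients table


def predicted_comfort(prejudice_score):
    """Predicted Comfort with Inter-Ethnic Situations for a given score."""
    return A + B * prejudice_score


# Hypothetical Assessing Prejudice score of 2.0.
print(round(predicted_comfort(2.0), 3))
```

Because the slope is negative, a higher Assessing Prejudice score always gives a lower predicted comfort score.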
Step #1
Open the System File (Lab09_Children.sav).
Step #2
Background: We conducted a study on whether there is a linear relationship between
Age (month) and Height (cm) of children.
Step #3
From the menus, choose Analyse > Linear Regression.
Step #4
Select Height (cm) (at least one dependent variable), and move it into the
Dependent list. Select Age (month), and move it into the Independent list.
Step #5
Click Statistics. Select Coeff, R and ANOVA.
Step #6
Click OK to run the procedure.
Output tables: Model Summary, ANOVA and Coefficients.
Interpretation:
In the Model Summary table, the Pearson correlation coefficient (0.68)
shows that there is a positive, moderate relationship between Age
(month) and Height (cm).
The R Square (0.46) shows 46% of the variance in Height (cm) can be
explained by differences in levels of Age (month).
In the ANOVA table, the significance value (p-value) is .00 (< .05), so we
would reject the Null Hypothesis and conclude that at least one
coefficient is not equal to zero.


There is a significant linear relationship between Age (month) and Height
(cm).
The Coefficients table gives us all the information we need to plug into
the equation y = a + bx. In this example, a = 100.43 and b = .35.
The t-value is 5.67 and the significance value (p-value) of the gradient
parameter Age (month) is .00 (< .05). This provides strong evidence of
a positive association between Height (cm) and Age (month).
The equation is: y = 100.43 + 0.35x.
Therefore, the regression model is
Height (cm) = 100.43 + 0.35 × Age (month)
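The slope can be read as the predicted change in height for each extra month of age. The Python sketch below uses the constant and slope from the Coefficients table to show this; the ages and the function name are hypothetical and chosen only for illustration.

```python
INTERCEPT = 100.43  # a, from the Coefficients table
SLOPE = 0.35        # b, from the Coefficients table


def predicted_height(age_months):
    """Predicted Height (cm) for a given Age (month)."""
    return INTERCEPT + SLOPE * age_months


# Hypothetical ages, one year apart.
for age in (12, 24, 36):
    print(age, round(predicted_height(age), 2))

# Twelve extra months of age add 12 times the slope to the prediction.
gain_per_year = predicted_height(36) - predicted_height(24)
print(round(gain_per_year, 2))
```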
----- THE END -----
