
School of Psychology

University of New South Wales




PSYC2001
Research Methods 2
PSYC4111
Psychology and Statistics for Optometry
Statistics and Computing
Tutorial Manual 2012



Dr Melanie Gleitzman




TABLE OF CONTENTS

SECTION 1 STATISTICS TUTORIALS

WEEK 2
WEEK 3
WEEK 4
WEEK 5
WEEK 6
WEEK 7
WEEKS 8 & 9
WEEKS 10 & 11
WEEK 12
WEEK 13
STATISTICS PRACTICE EXERCISES
SOLUTIONS TO STATISTICS EXERCISES

SECTION 2 COMPUTING LABS

LAB 1
LAB 2
LAB 3
LAB 4
LAB 5
LAB 6

STATISTICAL TABLES

TABLE 1: Areas under the Normal curve
TABLE 2: Student's t Distribution
TABLES 3 & 4: Power tables
TABLE 5: Chi Square Distribution
SUMMARY OF FORMULAE
GLOSSARY OF SYMBOLS







SECTION 1

STATISTICS TUTORIALS




STATISTICS TUTORIAL WEEK 2
EXERCISE 1 AREA UNDER NORMAL CURVE
(a) What is the area under the normal curve between the Mean and a Z score of 1.28?




(b) What Z score cuts off the top 15% of the normal distribution?




(c) For a normally distributed variable, what is the probability of obtaining a score beyond
Z = 2.0?


EXERCISE 2

For the population of sales managers, assume that salaries are normally distributed with
μ = $93,500 and σ = $5,000.

(a) Find the salary that cuts off the top 10% of the distribution.






(b) What percentage of sales managers have salaries between $85,000 and $95,000?



EXERCISE 3 SAMPLING DISTRIBUTION
(a) Class demonstration

(b) What are the properties of a sampling distribution of the mean?







EXERCISE 4 PROBABILITIES FOR RANGE OF MEANS

A psychologist administered the Stanford-Binet IQ test to a group of Grade 5 children. It is
known that, for the population of children, the Stanford-Binet has a mean of 100 and a
standard deviation of 16.

(a) What would be the mean and standard deviation of the sampling distribution
associated with a sample size of n=56?
mean =                    standard deviation = σ_M = σ/√n =

(b) Using a sample of 56 children, the psychologist found that the mean IQ was 103.
What is the probability of getting a sample mean of 103 or higher?





(c) What would be the probability of getting a mean of 103 or higher if n = 144?



STATISTICS TUTORIAL WEEK 3
SINGLE MEAN, σ KNOWN
CONFIDENCE INTERVAL ESTIMATION
EXERCISE 1
Pulse rates are known to be normally distributed with a standard deviation of 9 beats per
minute. The sample mean pulse rate for a sample of 25 university students is 73 beats per
minute. Calculate the 95% confidence interval for μ, for the population of university
students.

Formulae:
\[ \mu_{upper} = M + Z_c(\sigma_M) \]
\[ \mu_{lower} = M - Z_c(\sigma_M) \]
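
The two limits above can be computed directly. The following is a minimal Python sketch under stated assumptions: the function name `z_confidence_interval` and the input values are hypothetical (they are not taken from any exercise), and the critical Z is obtained from the standard library's `statistics.NormalDist` rather than read from Table 1.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper) confidence limits for mu when sigma is known."""
    sigma_m = sigma / sqrt(n)  # standard error of the mean
    # Two-tailed critical Z, e.g. about 1.96 for 95% confidence
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z_c * sigma_m, mean + z_c * sigma_m


# Illustrative values only (hypothetical, not from the exercises)
lower, upper = z_confidence_interval(mean=50.0, sigma=10.0, n=100)
print(round(lower, 2), round(upper, 2))
```

A larger n shrinks the standard error and therefore narrows the interval.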




















EXERCISE 2
How intelligent are politicians relative to the general population? It is known that for the
general adult population μ = 100 and σ = 15 on a standard IQ measure. Suppose you obtain a
random sample of 16 politicians and obtain a sample mean of 95. From these data, estimate
the 95% confidence interval for the population mean IQ of politicians. What can you
conclude?










HYPOTHESIS TESTING
EXERCISE 3
Using the data from Exercise 2, carry out a test of the null hypothesis that the mean IQ of
politicians is the same as that for the general population. What can you conclude?
Compare your answer to the confidence interval you calculated in Exercise 2.



Step 1:   H0: μ = 100          H1: μ ≠ 100



Step 2:   Set α level, find Z_c values



Step 3:   Find σ_M =




          Find Z = (M - μ)/σ_M =




Step 4:   Apply decision rule: Reject H0 if |Z| ≥ Z_c





Step 5: Conclusion
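
The five steps can be sketched in a few lines of Python. This is an illustration only: the function name `z_test_single_mean` and the numbers passed to it are hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being looked up in the normal curve table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_single_mean(mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test of H0: mu = mu0 when sigma is known."""
    sigma_m = sigma / sqrt(n)                  # Step 3: standard error
    z = (mean - mu0) / sigma_m                 # Step 3: observed Z
    z_c = NormalDist().inv_cdf(1 - alpha / 2)  # Step 2: critical Z
    reject = abs(z) >= z_c                     # Step 4: decision rule
    return z, z_c, reject


# Hypothetical values, chosen only to illustrate the calculation
z, z_c, reject = z_test_single_mean(mean=52.5, mu0=50.0, sigma=10.0, n=100)
print(round(z, 2), round(z_c, 2), reject)
```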






EXERCISE 4
The American College Entrance Examination board standardises its examination results so
that μ = 500 and σ = 100 across the nation. You wish to see whether a particular high school
is obtaining results on a par with high schools across the country and take a random sample (n
= 50) of all students from the school who have taken this examination in the last 5 years. The
mean of this sample was 530.

What can you conclude about the exam results for this school? Carry out a two-tailed test,
using α = .05.




STATISTICS TUTORIAL WEEK 4
SINGLE MEAN, σ UNKNOWN
ESTIMATING POPULATION VARIANCE, t DISTRIBUTION

Unbiased estimate of σ²:
\[ s^2 = \frac{\sum (X - M)^2}{n - 1} \]

Unbiased estimate of σ:
\[ s = \sqrt{\frac{\sum (X - M)^2}{n - 1}} \]

Unbiased estimate of σ_M (sample standard error):
\[ s_M = \sqrt{\frac{\sum (X - M)^2}{n(n - 1)}} \]
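
As a rough illustration of how the three estimates relate, the sketch below computes them from one small data set. The function name `estimates_from_sample` and the five scores are hypothetical and are not drawn from the exercises.

```python
from math import sqrt


def estimates_from_sample(scores):
    """Return M, s squared, s and s_M for a list of raw scores."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)  # sum of squared deviations
    s2 = ss / (n - 1)                       # estimate of the population variance
    s = sqrt(s2)                            # estimate of the population SD
    s_m = sqrt(ss / (n * (n - 1)))          # sample standard error of the mean
    return m, s2, s, s_m


# Hypothetical scores used only for illustration
m, s2, s, s_m = estimates_from_sample([2, 4, 4, 6, 9])
print(m, s2, round(s, 3), round(s_m, 3))
```

Note that s_M equals s divided by the square root of n, so the third formula is simply the second one rescaled.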


EXERCISE 1 - t TABLES
Find the appropriate t_c values for:

(a) df = 25, 95% confidence interval.


(b) df = 18, 99% confidence interval.


(c) df = 15, α = .05, two-tailed hypothesis test.


(d) df = 40, α = .01, two-tailed hypothesis test.




EXERCISE 2 - CONFIDENCE INTERVAL
\[ \mu_{upper} = M + t_c(s_M) \]
\[ \mu_{lower} = M - t_c(s_M) \]
where
\[ s_M = \sqrt{\frac{\sum x^2}{n(n - 1)}} \]
and df = n - 1 and x = X - M
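
A minimal sketch of the same calculation in Python follows. It assumes the critical value t_c has already been read from the t table and is passed in by hand; the function name `t_confidence_interval` and the five scores are hypothetical.

```python
from math import sqrt


def t_confidence_interval(scores, t_c):
    """Return (lower, upper) limits for mu using a tabled critical t."""
    n = len(scores)
    m = sum(scores) / n
    sum_x2 = sum((score - m) ** 2 for score in scores)  # sum of x squared, x = X - M
    s_m = sqrt(sum_x2 / (n * (n - 1)))                  # sample standard error
    return m - t_c * s_m, m + t_c * s_m


# Hypothetical sample with n = 5, so df = 4;
# 2.776 is the tabled t_c for a 95% interval with df = 4.
lower, upper = t_confidence_interval([2, 4, 4, 6, 9], t_c=2.776)
print(round(lower, 2), round(upper, 2))
```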

A psychologist obtained the following IQ scores from a sample of n = 16 school children.
Assuming random sampling, calculate the 90% confidence interval for the mean of the
population from which the sample was drawn.

X        x        x²

94
94
95
96
98
99
99
99
101
101
101
102
104
105
106
106










EXERCISE 3 - HYPOTHESIS TESTING

Step 1:   H0: μ = μ0          H1: μ ≠ μ0


Step 2:   Set α level, find t_c values, where df = n - 1

Step 3:   Calculate
\[ s_M = \sqrt{\frac{\sum (X - M)^2}{n(n - 1)}} \qquad \text{and} \qquad t = \frac{M - \mu_0}{s_M} \]


Step 4:   Apply decision rule: Reject H0 if |t| ≥ t_c


Step 5: Conclusion
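
The steps above can be sketched as a short Python function. This is an illustration under stated assumptions: `single_mean_t_test` is a hypothetical name, the scores and null value are invented, and the critical t is supplied from the t table rather than computed.

```python
from math import sqrt


def single_mean_t_test(scores, mu0, t_c):
    """Two-tailed single mean t test; t_c is the tabled value for df = n - 1."""
    n = len(scores)
    m = sum(scores) / n
    ss = sum((x - m) ** 2 for x in scores)
    s_m = sqrt(ss / (n * (n - 1)))  # Step 3: sample standard error
    t = (m - mu0) / s_m             # Step 3: observed t
    return t, n - 1, abs(t) >= t_c  # Step 4: decision rule


# Hypothetical data; 2.776 is the tabled two-tailed .05 value for df = 4
t, df, reject = single_mean_t_test([12, 15, 11, 14, 13], mu0=11.0, t_c=2.776)
print(round(t, 3), df, reject)
```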

A random sample of clients of a counselling service is given a test of difficulties with
interpersonal relationships. A score of 60 is regarded as satisfactory. Their scores are listed
below. Assuming that the higher the score the more problems the clients have, what does
this indicate about the population of clients? Use alpha = .01.

X        x        x²

59
60
67
65
90
89
73
81
83
71




STATISTICS TUTORIAL WEEK 5
DEPENDENT MEANS
WHEN σ IS KNOWN
(A) CONFIDENCE INTERVAL FOR μ_D

\[ \mu_{D,upper} = M_D + Z_c(\sigma_{MD}) \]
\[ \mu_{D,lower} = M_D - Z_c(\sigma_{MD}) \]

where
\[ \sigma_{MD} = \frac{\sigma_D}{\sqrt{n}} \]


EXERCISE 1
Nine participants attempt to memorise lists of words under conditions of both white noise
(Condition A) and talk-back radio (Condition B). The dependent variable is the difference in
the number of words correctly recalled between the two conditions (Condition A - Condition
B). It is known that in the population the standard deviation is a difference of 6 words. For
the sample of nine participants, the mean difference is found to be 5.8 words. Construct a
99% confidence interval for the population mean difference in number of words recalled
between the two conditions.












(B) HYPOTHESIS TEST OF H0: μ_D = 0

Step 1:   H0: μ_D = 0          H1: μ_D ≠ 0

Step 2:   Set α level, find Z_c value

Step 3:   Calculate
\[ \sigma_{MD} = \frac{\sigma_D}{\sqrt{n}} \qquad \text{and} \qquad Z = \frac{M_D}{\sigma_{MD}} \]

Step 4:   Apply decision rule: Reject H0 if |Z| ≥ Z_c


Step 5: Conclusion

EXERCISE 2

Nine participants are measured before and after a treatment for depression. The dependent
variable is a depression score where a higher score indicates a higher level of depression. It is
known that in the population, the standard deviation of difference in depression scores is 4.
Test whether the treatment for depression had any effect on participants' scores. Use α = .05.

Participant   X1 (Before)   X2 (After)   X_D = X1 - X2
1             15            10
2             11            7
3             19            14
4             12            9
5             18            13
6             14            10
7             11            8
8             9             9
9             8             10
              M1 =          M2 =         M_D =









WHEN σ IS UNKNOWN
(A) CONFIDENCE INTERVAL FOR μ_D

\[ \mu_{D,upper} = M_D + t_c(s_{MD}) \]
\[ \mu_{D,lower} = M_D - t_c(s_{MD}) \]

where
\[ s_{MD} = \sqrt{\frac{\sum (X_D - M_D)^2}{n(n - 1)}} \]
and df = n - 1

EXERCISE 3
Five participants were administered a new drug treatment for glaucoma (high internal eye
pressure). All participants were measured on eye pressure before (Pre-test) and again after
(Post-test) the treatment. Estimate the 95% confidence limits for the population mean
difference in eye pressure after taking the drug.

Participant   X1 (Pre)   X2 (Post)   X_D = X1 - X2   x_D = X_D - M_D   x_D²
1             50         45
2             65         63
3             42         40
4             51         48
5             59         56
              M1 =       M2 =        M_D =                             Σx_D² =



s_MD =               =               =




df =                 t_c =



μ_D,upper =
μ_D,lower =




(B) HYPOTHESIS TEST OF H0: μ_D = 0

Step 1:   H0: μ_D = 0          H1: μ_D ≠ 0

Step 2:   Set α level, find t_c value, where df = n - 1

Step 3:   Calculate
\[ s_{MD} = \sqrt{\frac{\sum (X_D - M_D)^2}{n(n - 1)}} \qquad \text{and} \qquad t = \frac{M_D}{s_{MD}} \]

Step 4:   Apply decision rule: Reject H0 if |t| ≥ t_c


Step 5: Conclusion
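
For dependent means, the whole procedure reduces to a single mean t test on the difference scores. The sketch below shows this with invented before and after scores; the function name `dependent_means_t` is hypothetical, and the critical t is assumed to come from the t table.

```python
from math import sqrt


def dependent_means_t(scores_1, scores_2, t_c):
    """t test of H0: mu_D = 0 on paired scores; t_c is tabled for df = n - 1."""
    diffs = [x1 - x2 for x1, x2 in zip(scores_1, scores_2)]  # X_D = X1 - X2
    n = len(diffs)
    m_d = sum(diffs) / n
    ss_d = sum((d - m_d) ** 2 for d in diffs)
    s_md = sqrt(ss_d / (n * (n - 1)))  # standard error of the mean difference
    t = m_d / s_md
    return m_d, s_md, t, abs(t) >= t_c


# Hypothetical paired scores; 2.776 is the tabled two-tailed .05 value for df = 4
before = [10, 12, 9, 11, 13]
after = [8, 11, 9, 8, 10]
m_d, s_md, t, reject = dependent_means_t(before, after, t_c=2.776)
print(m_d, round(s_md, 3), round(t, 3), reject)
```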

EXERCISE 4
A group of nine students are given a statistics test on which they perform poorly. To
encourage students to study and (hopefully) learn the material, they are told that the test will
be given again in two weeks' time. Difference scores (Test 2 - Test 1) for the nine students are
given below. Test the hypothesis that doing the test a second time motivated students to learn
and do better.

Participant   X_D   x_D   x_D²
1             5
2             -3
3             4
4             1
5             -1
6             0
7             3
8             2
9             -2
              M_D =


1. State the null and alternative hypotheses:



2. df =               α =               t_c =



3. Calculate s_MD and t



4. Apply decision rule. What can you conclude?



STATISTICS TUTORIAL WEEK 6
INDEPENDENT MEANS WHEN σ IS UNKNOWN

UNBIASED ESTIMATE OF σ²:
\[ s^2_{pooled} = \frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2} = \frac{SS_1 + SS_2}{df_1 + df_2} \]

ESTIMATE OF σ_(M1 - M2):
\[ s_{M_1 - M_2} = \sqrt{s^2_{pooled}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\left(\frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]



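(A) CONFIDENCE INTERVAL FOR μ1 - μ2

\[ (\mu_1 - \mu_2)_{upper} = (M_1 - M_2) + t_c(s_{M_1 - M_2}) \]
\[ (\mu_1 - \mu_2)_{lower} = (M_1 - M_2) - t_c(s_{M_1 - M_2}) \]
where
\[ s_{M_1 - M_2} = \sqrt{\left(\frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]

and df = n1 + n2 - 2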
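
The pooled standard error and the resulting interval can be sketched as follows. The function name `pooled_standard_error` and the two small groups are hypothetical (they are not the Exercise 1 data), and t_c is assumed to be read from the t table.

```python
from math import sqrt


def pooled_standard_error(group_1, group_2):
    """Return (M1 - M2, s_(M1 - M2)) for two independent groups."""
    n1, n2 = len(group_1), len(group_2)
    m1, m2 = sum(group_1) / n1, sum(group_2) / n2
    ss1 = sum((x - m1) ** 2 for x in group_1)  # sum of squares, group 1
    ss2 = sum((x - m2) ** 2 for x in group_2)  # sum of squares, group 2
    s2_pooled = (ss1 + ss2) / (n1 + n2 - 2)    # pooled variance estimate
    se = sqrt(s2_pooled * (1 / n1 + 1 / n2))   # standard error of the difference
    return m1 - m2, se


# Hypothetical groups with n1 = 3 and n2 = 4, so df = 5
diff, se = pooled_standard_error([4, 6, 8], [1, 2, 3, 6])
t_c = 2.571  # tabled value for a 95% interval with df = 5
lower, upper = diff - t_c * se, diff + t_c * se
print(diff, round(se, 3), round(lower, 2), round(upper, 2))
```

In this invented example the interval contains zero, so a two-tailed test at the .05 level on the same data would not reject the null hypothesis of no difference.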

EXERCISE 1
A forensic psychologist was interested in seeing whether allowing video cameras into a
courtroom affected the recall of jurors. In a mock trial, 20 participants were randomly
assigned to one of two groups of jurors. Participants in Group 1 were videotaped, whereas
those in Group 2 were not. Towards the end of the trial a test was given to all participants to
determine their recall of facts presented in the case, where a higher score indicates better
recall.

Group 1                       Group 2
X1      x1      x1²           X2      x2      x2²
8                             15
10                            17
7                             14
11                            19
12                            13
13                            12
9                             12
10                            13
11                            11
9                             14
M1 =                          M2 =



(a) Estimate the standard error of the difference between means. What does the standard
error estimate measure?







(b) Calculate the 95% confidence interval for the difference in recall between μ1 and μ2.








(c) Does the videotaping of court proceedings affect jurors' recall?





(B) HYPOTHESIS TEST FOR H0: μ1 - μ2 = 0

Step 1:   H0: μ1 - μ2 = 0          H1: μ1 - μ2 ≠ 0

Step 2:   Set α level, find t_c value

Step 3:   Calculate
\[ s_{M_1 - M_2} = \sqrt{\left(\frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]
          Calculate
\[ t = \frac{M_1 - M_2}{s_{M_1 - M_2}} \]

Step 4:   Apply decision rule: Reject H0 if |t| ≥ t_c


Step 5: Conclusion




EXERCISE 2
Twenty-two volunteers were randomly allocated into two groups of eleven. Group 1
participants were deprived of sleep for 48 hours, during which time they had to perform a
number of cognitive and perceptual-motor tasks. The participants in Group 2 were allowed to
continue their normal life over the same period. Blood samples were taken from all 22
participants at the end of the 48 hours. The sample for one participant in Group 1 was unable
to be analysed. Scores for the remaining 21 participants on serum cholesterol were:
Group 1: 7, 2, 4, 6, 10, 9, 6, 7, 5, 4
Group 2: 5, 0, 6, 3, 8, 7, 3, 5, 2, 3, 2


(a) What are the appropriate hypotheses?





(b) Carry out an independent groups t-test at the .05 level of significance.













(c) What conclusion can be made?







(d) What purpose would be served by repeating this study with a larger sample size?









STATISTICS TUTORIAL WEEK 7
CHOOSING AN INFERENTIAL TEST













































[Flowchart: choosing an inferential test]

One mean
    σ known:      CI or H Test, using Z, with σ_M =
    σ unknown:    CI or H Test, using t, with s_M =

Two means
    2 dependent means:                calculate difference score and use single mean procedures
    2 independent means (σ unknown):  CI or H Test, using t =


EXERCISE 1
A psychologist working in an inner city alcohol treatment clinic wanted to know whether the
clients who present at that clinic tend to have problems that are more or less severe than the
State average. For the past 6 months her clinic had been routinely assessing all new clients at
intake on a standardised alcohol dependence measure. A State-wide survey of alcohol
treatment centres carried out the year before showed that the mean score for clients on this
alcohol dependence measure was 22, and the standard deviation was 19. The psychologist
accessed the records for all clients in her own clinic for whom data were available (64
clients), and calculated that their mean score on the dependence measure was 27. Since the
mean of 27 was higher than the State average of 22, she suspected her clients had more severe
problems, but she wanted to know if this was just a chance result. How should she proceed?












EXERCISE 2
A cognitive researcher is interested in testing a popular theory that people make better
decisions when they rely on their (hypothesised) unconscious, intuitive system rather than
their conscious, analytic system. Since this theory asserts that an attention-demanding
secondary task will distract the conscious system but leave the unconscious system
unaffected, it predicts that a secondary task will improve decision-making. The researcher
decides to test the theory by measuring decision-making on a complex primary task in the
presence and absence of a secondary task. He recruits 12 first year psychology students as
participants, and tests them on two different complex decision-making tasks, the bus
timetable task and the oil refinery task. Half of the participants do the bus timetable task
first and the oil refinery task second, and the other half do the tasks in the opposite order.
The first task is performed under normal conditions. During the second task, the researcher
asks the participants to also perform another (secondary) task: successively subtracting 13
from 894 and saying the answer at each step aloud. The performance of each participant on
the two tasks is given in the table below (a higher score indicates better decision-making):



Participant   Task 1 (carried out alone)   Task 2 (with secondary task)
1 19 17
2 28 22
3 17 12
4 31 29
5 18 19
6 13 8
7 30 32
8 25 25
9 25 21
10 22 18
11 24 22
12 12 15

a) Carry out an appropriate analysis to test the prediction of the theory, and state the
conclusion that follows from your analysis.
b) Comment on the methodological adequacy of the experiment.



STATISTICS TUTORIAL WEEKS 8 AND 9
TYPE I AND TYPE II ERRORS AND POWER
EXERCISE 1
What is a Type I error? What is a Type II error? What is statistical power?










EXERCISE 2
(a) For the situation where H0 is true and an α-level two-tailed test is to be carried out, shade
in the area below corresponding to the probability of making a Type I error:


(b) For the situation where H0 is false and an α-level two-tailed test is conducted, shade in
the area below corresponding to (i) the probability of making a Type II error, and (ii) the
probability of correctly rejecting H0.




FACTORS WHICH AFFECT POWER
EXERCISE 3
Given the following information: H0: μ = 70, H1: μ ≠ 70, σ = 15, n = 25, α = .05, M = 75.

(a) Would H0 be rejected?







(b) Using the original values, but with σ = 5, instead of 15, would H0 be rejected? What
effect does a change in σ have on the power of a test?






(c) Using the original values, but with n = 49, would H0 be rejected? What influence does n
have on power?






(d) Using the original values, but with α = .10, would H0 be rejected? What influence does α
have on power?






(e) What is size of effect and how does it influence the power of a test to detect that effect?





DETERMINING POWER
Step 1:   Determine values for n, α and γ. If no value of γ is specified, calculate power for
small (γ = .2), medium (γ = .5) and large (γ = .8) size of effect.
Step 2:   Find δ:

SINGLE MEAN:
\[ \delta = \gamma\sqrt{n} \]
INDEPENDENT MEANS:
\[ \delta = \gamma\sqrt{\frac{n}{2}} \]

Step 3:   Use Table 3 (found in the Tables section at the back of the Manual) to convert δ to
1 - β (power as a proportion).
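
The three steps can be sketched in Python as below. Table 3 is not reproduced here, so, as an explicit assumption, power is approximated from the normal curve as 1 - Φ(Z_c - δ), which ignores the very small area in the opposite tail; the function names and the input values are hypothetical and values read from Table 3 may differ slightly in rounding.

```python
from math import sqrt
from statistics import NormalDist


def delta_single_mean(gamma, n):
    """delta for a single mean test with effect size gamma and sample size n."""
    return gamma * sqrt(n)


def delta_independent_means(gamma, n):
    """delta for an independent means test with n participants per group."""
    return gamma * sqrt(n / 2)


def approx_power(delta, alpha=0.05):
    """Normal-curve approximation to two-tailed power (assumption, not Table 3)."""
    z_c = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_c - delta)


# Hypothetical case: medium effect (gamma = .5) with n = 16
delta = delta_single_mean(gamma=0.5, n=16)
power = approx_power(delta)
print(delta, round(power, 2))
```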
EXERCISE 4
A researcher wishes to carry out a single mean t-test to compare the mean symptom scores for
a group of 25 patients with the known Australian mean score. She wishes to be able to detect
a difference of 0.6 standard deviations or larger. What power would she have for this test?




EXERCISE 5
Researcher A performs an experiment and claims that children given an enriched
environment have a higher IQ than children from normal environments. The difference
between sample means is statistically significant at the .05 level, with sample sizes n1 = n2 =
3700. Researcher A concludes that we should therefore start a crash program of
environment enrichment to improve our national IQ.

Researcher B, a strong believer in the genetic determination of IQ, performs a similar
experiment and claims that children given an enriched environment show no improvement in
IQ over children in a normal environment. The difference between sample means was not
statistically significant at α = .05 with sample sizes n1 = n2 = 180. Researcher B concludes that
we should not waste scarce resources on an ineffective enrichment program.
Whom do you believe?







DETERMINING SAMPLE SIZE
Step 1:   Determine values for 1 - β, α and γ. If no value of γ is specified, calculate n for small
(γ = .2), medium (γ = .5) and large (γ = .8) size of effect.

Step 2:   Use Table 4 to convert 1 - β to δ.

Step 3:   Apply formula for n:
SINGLE MEAN:
\[ n = \left(\frac{\delta}{\gamma}\right)^2 \]
INDEPENDENT MEANS:
\[ n = 2\left(\frac{\delta}{\gamma}\right)^2 \]
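
A minimal sketch of the sample size calculation follows. Table 4 is not available here, so δ is obtained, as an assumption, from the normal-curve approximation δ = Z_c + Z_power; tabled values may differ slightly in rounding. The function names and the chosen effect size are hypothetical, and n is rounded up to the next whole participant.

```python
from math import ceil
from statistics import NormalDist


def approx_delta(power, alpha=0.05):
    """Approximate delta for a two-tailed test (assumption, not Table 4)."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


def n_single_mean(delta, gamma):
    """Required n for a single mean test, rounded up."""
    return ceil((delta / gamma) ** 2)


def n_per_group_independent(delta, gamma):
    """Required n per group for an independent means test, rounded up."""
    return ceil(2 * (delta / gamma) ** 2)


# Hypothetical case: 80% power, alpha = .05, effect size gamma = .3
delta = approx_delta(power=0.80)
print(round(delta, 2), n_single_mean(delta, 0.3), n_per_group_independent(delta, 0.3))
```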



EXERCISE 6
An experimenter is interested in discovering whether reaction time to a complex stimulus is
faster or slower than reaction time to a simple stimulus. She knows from previous research
that the standard deviation of reaction times is approximately 50 millisecs. If she wishes to
be able to detect a difference in reaction time between the groups of at least 25 millisecs, with
90% power, how many participants per group should she use? (Alpha is .05 and the test is 2-
tailed.)









STATISTICS TUTORIAL WEEKS 10 AND 11
CORRELATION AND PREDICTION
EXERCISE 1 - HYPOTHESIS TEST OF H0: ρ = 0
Step 1:   H0: ρ = 0   vs   H1: ρ ≠ 0

Step 2:   Choose α level, find t_c using df = n - 2

Step 3:   Calculate standard error,
\[ s_r = \sqrt{\frac{1 - r^2}{n - 2}} \]
and
\[ t = \frac{r}{s_r} = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \]

Step 4:   Apply decision rule: Reject H0 if |t| ≥ t_c


Step 5: Conclusion
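
The test of a correlation can be sketched as below. The function name `correlation_t_test`, the value of r and the sample size are hypothetical (deliberately different from the exercise that follows), and t_c is assumed to be read from the t table.

```python
from math import sqrt


def correlation_t_test(r, n, t_c):
    """t test of H0: rho = 0; t_c is the tabled value for df = n - 2."""
    s_r = sqrt((1 - r ** 2) / (n - 2))  # standard error of r
    t = r / s_r
    return s_r, t, abs(t) >= t_c


# Hypothetical case: r = .50 with n = 27, so df = 25;
# 2.060 is the tabled two-tailed .05 value for df = 25.
s_r, t, reject = correlation_t_test(r=0.5, n=27, t_c=2.060)
print(round(s_r, 4), round(t, 3), reject)
```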

In a sample of 10 students, a correlation of r = .77 was found between the number of hours of
paid work a student engages in and level of stress. Test the null hypothesis that in the
population of university students there is no relationship between the number of hours a
student spends in paid work and level of stress.







EXERCISE 2 - POWER AND SAMPLE SIZE
POWER:
\[ \delta = \gamma\sqrt{n - 1} \]
SAMPLE SIZE:
\[ n = \left(\frac{\delta}{\gamma}\right)^2 + 1 \]
SIZE OF EFFECT (γ = |ρ1|) (Cohen):   .1 small     .3 medium     .5 large

(a) A manager wants to determine whether scores on an aptitude test correlate with a measure
of subsequent job performance. If the null hypothesis that the correlation is zero is tested
against an appropriate alternative, how much power does she have to detect a correlation
of at least .4 if 81 participants are to be used?











(b) How many participants does the manager in (a) need to have an 80% chance of
detecting a correlation of at least .5 (if a two-tailed .05 test is to be conducted)?


EXERCISE 3 - PREDICTION
Prediction Equation: Y' = bX + a, where
\[ b = r\frac{s_Y}{s_X} \qquad \text{and} \qquad a = M_Y - bM_X \]
Proportion of variance accounted for: r²
Standard error of estimate:
For small samples:
\[ s_{Y.X} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = s_Y\sqrt{\frac{(n - 1)(1 - r^2)}{n - 2}} \]
For large samples:
\[ s_{Y.X} = s_Y\sqrt{1 - r^2} \]
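
As a hedged illustration of the prediction formulas, the sketch below builds the equation from summary statistics only. All function names and input values are hypothetical and differ from the exercise that follows.

```python
from math import sqrt


def prediction_equation(r, m_x, m_y, s_x, s_y):
    """Return slope b and intercept a of Y' = bX + a."""
    b = r * s_y / s_x
    a = m_y - b * m_x
    return b, a


def standard_error_of_estimate(r, s_y, n=None):
    """s_Y.X; uses the small-sample form when n is given, else the large-sample form."""
    if n is None:
        return s_y * sqrt(1 - r ** 2)
    return s_y * sqrt((n - 1) * (1 - r ** 2) / (n - 2))


# Hypothetical summary statistics
b, a = prediction_equation(r=0.5, m_x=10.0, m_y=20.0, s_x=2.0, s_y=4.0)
predicted = b * 12.0 + a                              # predicted Y for X = 12
see_large = standard_error_of_estimate(0.5, 4.0)      # large-sample form
see_small = standard_error_of_estimate(0.5, 4.0, 27)  # small-sample form, n = 27
print(b, a, predicted, round(see_large, 3), round(see_small, 3))
```

The small-sample value is slightly larger than the large-sample value, and the two converge as n grows.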


(a) A psychologist finds a correlation of -.56 between the number of years spent driving and
anxiety related to driving in a sample of 200 research participants. Calculate the equation
predicting anxiety related to driving (Y) from number of years driving (X), where M_X = 14,
M_Y = 15, s_X = 4.7 and s_Y = 5.3. What proportion of the variance of anxiety related to
driving can be accounted for by knowledge of number of years spent driving?












(b) Calculate the predicted anxiety score for a participant who has 12 years of driving
experience.









(c) Calculate the standard error of estimate. What does this tell us about the accuracy of
prediction?








EXERCISE 4
A researcher finds a correlation of .6 between IQ and statistics marks, for a large group of
second year students. Assuming normality for both population distributions of IQ and
statistics scores, what percentage of students with an IQ of 110 would be likely to have a
statistics mark between 30 and 40?
Means and standard deviations for the group are: M = 100 and s = 15 for IQ, and M = 25 and
s = 5 for statistics.











EXERCISE 5 - CONFIDENCE LIMITS FOR Y
(1 - α)100% Confidence Limits = Y' ± Z_c(s_Y.X), assuming normal distribution for Y.

Using the data from Exercise 4, calculate the 95% confidence limits for a predicted statistics
score associated with X = 110.


STATISTICS TUTORIAL WEEK 12
CHI-SQUARE
EXERCISE 1 - χ² GOODNESS OF FIT TEST
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
and df = number of categories - 1
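
The statistic can be computed in a few lines. The sketch below is an illustration only: the function name `chi_square_goodness_of_fit` and the observed frequencies are hypothetical, and equal expected frequencies are assumed.

```python
def chi_square_goodness_of_fit(observed):
    """Chi-square against equal expected frequencies; returns (chi2, df)."""
    k = len(observed)
    f_e = sum(observed) / k  # equal expected frequency per category
    chi2 = sum((f_o - f_e) ** 2 / f_e for f_o in observed)
    return chi2, k - 1


# Hypothetical observed frequencies for three categories
chi2, df = chi_square_goodness_of_fit([30, 20, 10])
print(chi2, df)
```

The obtained value is then compared with the critical value in the chi-square table for the same df.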

Suppose that you obtained a random sample of 210 university students in Sydney and classified
them into categories A, B, and C on the basis of socio-economic background. Would you be
prepared to believe that students from the three categories contribute equally to the numbers
at university if your sample contained 120 in A, 50 in B and 40 in C?


Cells     f_o     f_e     f_o - f_e     (f_o - f_e)²     (f_o - f_e)²/f_e





                                                         χ² =








EXERCISE 2 - χ² TEST OF INDEPENDENCE
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
and df = (rows - 1)(columns - 1);
\[ f_e = \frac{\text{row total} \times \text{column total}}{N} \]

Cramér's φ:
\[ \phi = \sqrt{\frac{\chi^2}{N(s - 1)}} \]
where s = smaller of rows or columns
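
The expected frequencies, the test statistic and Cramér's φ can be sketched together. The function name `chi_square_independence` and the 2 × 2 table of counts are hypothetical and are not the data from the exercise below.

```python
from math import sqrt


def chi_square_independence(table):
    """Return (chi2, df, Cramer's phi) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    s = min(len(row_totals), len(col_totals))        # smaller of rows or columns
    phi = sqrt(chi2 / (n * (s - 1)))
    return chi2, df, phi


# Hypothetical 2 x 2 table of observed frequencies
chi2, df, phi = chi_square_independence([[10, 20], [30, 40]])
print(round(chi2, 3), df, round(phi, 3))
```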

Forty patients were randomly allocated to three treatment groups as follows: Group A (10 Ss)
given drug; Group B (10 Ss) given placebo; Group C (20 Ss) nothing but rest. After one week
participants were categorised in terms of whether they had improved or not (based on a
reduction in symptoms). Carry out a χ² test of independence, with α = .05. The observed
data are as follows:
                              A     B     C
Improved after 1 week         9     6     5
Not improved after 1 week     1     4     15

What do you conclude? What can you not conclude from the statistic as calculated?


Cells     f_o     f_e     f_o - f_e     (f_o - f_e)²     (f_o - f_e)²/f_e







                                                         χ² =



STATISTICS TUTORIAL WEEK 13
COMPLEX DESIGNS
2 × 2 FACTORIAL DESIGN - INTERPRETING EFFECTS
(A) A researcher is interested in the effect of gender on cognitive ability. Fifty students (25
males and 25 females) complete two cognitive tests. One test measures spatial ability, and the
other test measures verbal ability. A high score on each scale represents better test
performance. The data, in the form of group means, are given below. Provide an appropriate
graphical representation of the cell means. How would you interpret these data?


                              COGNITIVE TEST (B)                 GRAPH
GENDER OF                     Spatial     Verbal     Mean
PARTICIPANT (A)     M         64.5        57.5
                    F         56.3        71.5
                    Mean
















(B) Make up a plausible set of cell means that fits the following hypothetical patterns:
(i) Main effects for Factor A and Factor B, and no interaction.

                              COGNITIVE TEST (B)                 GRAPH
GENDER OF                     Spatial     Verbal     Mean
PARTICIPANT (A)     M
                    F
                    Mean





(ii) An interaction between Factor A and Factor B, and a main effect for Factor A.

                              COGNITIVE TEST (B)                 GRAPH
GENDER OF                     Spatial     Verbal     Mean
PARTICIPANT (A)     M
                    F
                    Mean






MULTIPLE COMPARISONS
(A) How can multiple comparisons arise in a single experiment?








(B) What is the difference between decision-wise (per-comparison) error rate and
experiment-wise error rate?









STATISTICS PRACTICE QUESTIONS
1.(a) For the following data set, calculate the mean and standard deviation of the raw scores
and convert the raw scores to Z scores.

X        x        x²        Z
18
16
23
20
14
8
22
21
20

(b) What is the mean and standard deviation of the Z scores in the above table? Do you
have to do the calculations in order to answer this question? Why not?


2. You have a set of data which are skewed. Your friend tells you to transform the
scores to Z scores because this will make the distribution normal. Is your friend correct?
Why or why not?


Normal curve tables

3. What Z values include the following proportions of the total area under the normal
curve? Assume that the area not included is divided equally between the two tails.
.75 .80 .90 .95 .99 .999



4. For the normal curve, what is the probability of obtaining Z values in the following
ranges? Draw a curve in each case, shading and labelling each area.

(i) Greater than Z = +1.5

(ii) Between Z = -2.5 and Z = +2.0

(iii) Beyond Z = +.75

(iv) Less than Z = -4.0



(v) Beyond Z = +2.67

(vi) Greater than Z = 1.64

(vii) Less than Z = -1.64

(viii) Between Z = ±1.96

(ix) Beyond Z = +2.57


Descriptive use of Z

5. For a well known fast food chain, monthly sales of hamburgers per restaurant are
normally distributed with a mean of 19,400 and a standard deviation of 2,000.

(a) Restaurants that make up the top 15% of sales are given rewards from head
office. How many burgers need to be sold in a month to be in this category?

(b) Restaurants in the top 25% of sales get a framed certificate from head office.
Last month the Kensington branch sold 21,540 burgers. Will the restaurant get
a certificate?

(c) Find the two values (number of hamburger sales) that separate the top 20% and
the bottom 30%.

(d) Restaurants whose monthly sales are in the bottom 5% are instructed to attend
a sales course. What value corresponds to this cut-off point?


Sampling Distribution

6. Scores on a university entrance exam are known to be normally distributed with a
mean of 500 and a standard deviation of 110.

(a) Suppose that a group of 16 students from a local high school obtained a mean of 552.
What is the probability of getting a sample mean of 552 or higher?

(b) Suppose the size of the group was 49 instead of 16, what is the probability of getting a
sample mean of 552 or higher?

(c) How does an increase in sample size change the probability of a mean of 552 or higher?

Probabilities for Range of Means
7. Suppose you have a population with a mean of 150 and a standard deviation of 40.
(a) What is the mean and standard error of the sampling distribution of the mean if n = 25?
(b) What is the probability of obtaining a sample mean of 160 or higher?



(c) What are the sample mean values which form the upper and lower limits of a 95%
interval of sample means around μ?
(d) What are the upper and lower limits that bound a 95% interval of sample means around μ,
if n = 100? What effect does a change in n have on the range of sample means?


Confidence interval for a single mean, using Z

8. Suppose that the following data appeared on a computer printout:

School type         n      Mean Exam     σ²        σ
Country State       64     99.7          96.9      9.8
Country Private     25     102.2         144.0     12.0
City State          81     104.8         238.1     15.4
City Private        16     105.0         97.6      9.9

Assume that the samples were drawn randomly from the identifiable populations.
Calculate 95% confidence intervals for each of the population means and state in words what
such an interval tells you.


9. A psychologist is interested in the long-term effects of divorce on children. A sample
is obtained of 10 children whose parents were divorced at least 5 years before. Each child is
given a personality questionnaire that measures depression. Their scores were:
83 81 75 92 84 107 63 112 92 88

Assuming that σ = 12, estimate the population mean:
(a) using a point estimate
(b) using an interval estimate that provides 90% confidence.


Hypothesis test for single mean, using Z

10. A local factory has a machine which is designed to fill lemonade bottles with 600 ml
of liquid, with a standard deviation of 10 ml. To check whether the machine is calibrated
correctly, a quality control officer took a random sample of 36 bottles and found the mean
amount of liquid was actually 580 ml. Assuming a normal distribution,

(a) What are the appropriate H0 and H1?
(b) Using α = .05, test the hypotheses in part (a).
(c) What conclusion can be made?

11. Refer to the data for schools in Q. 8. For each sample, decide whether or not it came
from a population that had a mean of 100. Set α at .05 and use a two-tailed test. Compare
each decision with the relevant confidence interval that you calculated previously.



12. If you know that Z_c = 2.33 when performing a two-tailed non-directional test, what
must the level of significance be?


The t distribution

13. Complete the following table with the appropriate t_c values to include the required
percentage of the area in the middle section of the t curve:

Area     df      t_c               Area     df      t_c
         10                                 10
         20                                 20
80%      30                        95%      30
         60                                 60
         120                                120

         10                                 10
         20                                 20
90%      30                        99%      30
         60                                 60
         120                                120



14. Fill out the following table with critical values of t (both one and two-tailed) for the
following degrees of freedom with α = .05 and α = .01. Note that α is the proportion of the
total area in the tail or tails.

df       α = .05       α = .01
5
15
25
40
60





Confidence interval for single mean, using t

15. A psychologist obtained the following scores from a sample of n = 16. Assuming
random sampling, calculate the 99% confidence interval for the mean of the population from
which the sample was drawn.

94 94 95 96 98 99 99 99 101 101 101
102 104 105 106 106 (these are the same data from Ex. 2, Week 4)

Why is the 99% confidence interval longer than the 90% interval calculated in the Week 4
tutorial?


16. A delinquency subscale of a large personality inventory has a norm of 35. A
researcher is interested to know whether the mean delinquency score for children from single-
parent families is different to the norm. She administers the delinquency subscale to a group
of 9 adolescents from single-parent homes and obtains the following scores.

Delinquency scores: 33, 36, 32, 39, 32, 30, 31, 40, 36

Calculate a 95% confidence interval for the population mean delinquency score of
adolescents from single-parent families. Do single-parent families lead to more (or less)
delinquency in children, compared to the norm?


Single mean hypothesis test using t

17. A random sample of students in an engineering faculty scored 21, 20, 23, 28, 30, 24,
23, 19 on a test of word knowledge. An appropriate standard is regarded as μ = 22. Do
engineering students differ from the appropriate standard? Use an alpha of .05.


18. In a new advertising campaign, a national food company claims to have increased the
number of sultanas in each box of breakfast cereal. Previously, there had been an average of
100 sultanas per box. Upon sampling 16 boxes you find that the sample mean is 110 sultanas
per box, with s = 8.0. Are there significantly more sultanas in the new breakfast cereal than
there were before? Carry out an appropriate hypothesis test at the .05 level.


Confidence interval for dependent means using Z

19. In order to assess the similarity of two standard tests of manual dexterity, 20
participants were tested for their performance on both tests. The mean difference score for
the sample was found to be 10 points. The population standard deviation of difference scores
for the two tests is 6.0.
(a) Calculate σ_MD

(b) Calculate a 95% confidence interval for the population mean difference in test scores.




Hypothesis tests for dependent means using Z

20. Nine participants were tested under two instruction conditions and gave the following
scores (each row represents one participant):

CONDITION
A      B
18     15
16     17
23     18
20     21
14     10
8      11
22     19
21     22
20     20

Test the difference between the conditions using Z and an alpha of .05. The population
standard deviation for the population of these difference scores is 1.2.


21. A test of proficiency in the English language was claimed to give scores of true
proficiency unaffected by practice on the items. If true, the test could be used in a training
course for teaching purposes as well as testing participants' proficiency at the end of the
course. In order to test this claim, a psychologist administered the test to sixteen
participants. The psychologist made sure that the participants had no other practice for a
week and then gave them the test again. Do the data suggest that participants improve their
scores as a result of their prior experience on the test? In the population, the standard
deviation of the difference scores is 2.

PARTICIPANT   TEST ONE   TEST TWO
 1            10.0       15.5
 2            11.0       11.0
 3             9.0       10.0
 4            12.0       11.5
 5             8.5       12.5
 6             7.0        6.0
 7             9.0        7.5
 8             8.5       10.0
 9             7.0       10.0
10             9.0       12.0
11             8.0       12.0
12             7.0       11.0
13            11.5       15.0
14            13.0       13.0
15             6.0        5.0
16             9.5       12.0



Confidence interval for dependent means using t
22. Six patients rated their average headache intensity on a 5-point scale (5 = maximum
intensity) for 6 weeks prior to relaxation training. At the conclusion of the treatment,
participants rated their headache intensity during a follow-up period of 6 weeks.
Assuming that the 6 patients are a random sample from a population of potential patients,
calculate the 95% confidence interval for the population mean improvement score. What do
the confidence limits imply with regard to the relaxation training?

Partic   Before   After
1        2.4      2.6
2        3.9      2.9
3        2.7      2.1
4        2.8      2.7
5        3.4      2.5
6        3.1      1.9


Hypothesis tests for dependent means using t
23. A researcher was interested in whether training participants on a word recognition
task could decrease the number of errors made on the task. Ten participants were tested
before being given the training and again afterwards. The dependent variable is the number
of errors made on the word recognition task. Did training have any significant effect on the
number of errors produced? Test appropriate hypotheses with α = .01.

SUBJ   COND 1   COND 2
 1     110      107
 2      98       95
 3     100       97
 4     105      101
 5      90       90
 6     120      115
 7     117      111
 8     110      106
 9     104      104
10      95       90



Confidence Interval for Independent means, using t

24. A psychologist was interested to see whether the University Counselling Unit
researcher has randomly selected a sample of 11 participants from those students seeking help
from the Counselling Unit last year. She also randomly selected the same number of
participants from those students who had not sought help. She then administered a Test
Anxiety Questionnaire to each participant. Hypothetical results are:

        Sought help   Did not
        21            21
        23            23
        21            17
        29            27
        27            25
        21            17
        23            21
        19            15
        17            15
        17            11
        13            17
MEANS   21            19

(a) Estimate the standard error of the difference between the means.
(b) What does this standard error measure?
(c) What are the upper and lower limits of the 99% confidence interval for the difference
between population means?
(d) What do these limits mean?
(e) Calculate the 95% confidence interval.
(f) Comment on the relative lengths of the 95% and 99% confidence intervals.


Hypothesis Test for Independent means using t

25. 22 volunteers were randomly allocated into two groups of 11. Group 1 participants
were deprived of sleep for 48 hours, during which time they had to perform a number of
cognitive and perceptual-motor tasks. The participants in Group 2 were allowed to continue
their normal life, merely reporting to the laboratory before and after the 48 hour period.
Changes in blood composition from before to after this period were measured for all 22
participants. Scores on serum cholesterol were:

Group 1: 4, 6, 6, 7, 9, 8, 11, 12, 8, 9, 8
Group 2: 5, 6, 3, 8, 7, 3, 5, 2, 2, 0, 3

Carry out an appropriate hypothesis test, using α = .05, and state your conclusion.


26. A researcher is interested in the effects of caffeine on concentration. Sixteen
participants with no prior history of caffeine intake are randomly allocated to one of two
groups: Group 1 are given the equivalent of 3 medium strength cups of coffee over a one
hour period; whereas Group 2 are given 3 cups of decaffeinated coffee to drink over the same
time period. All participants are then given a visual and auditory concentration task. The
dependent variable is a performance score, where the higher the score the better the
performance. Carry out a .05 level t-test (independent groups) for the data below and draw
appropriate conclusions.

Group 1: 29, 25, 20, 22, 16, 21, 21, 22.
Group 2: 21.5, 21.5, 14.5, 19.5, 20.5, 19.5, 21.5, 21.5




Dependent versus independent differences between means

27. A computer program generated 100 pairs of means. It then generated a second set of
100. You know that in both sets, the first member of each pair (M1) is the mean of a random
sample from Population A. The second member of each pair (M2) is the mean of a random
sample from Population B. The mean of Population A is 123.6 and the mean of Population B
is 112.8. You know that one of the sets has independent means and the other has correlated
means. What you don't know is which set is which. You decide to graph the M1 and M2
means for Set 1, plotting each pair over its replication number (from 1 to 100). You then
produce another graph for Set 2. When you compare the graphs for Set 1 and Set 2, what
differences between the graphs might suggest which Set has the independent means?



General confidence interval and hypothesis testing questions

Note 1: you will have to decide (a) whether Z or t is appropriate and (b) whether
research design is a single mean, dependent means or independent means.

Note 2: Use alpha = .05 and a 2-tail test unless otherwise stated.

28. A developmental psychologist has given a training program designed to improve
problem solving ability to a large number of 6 year olds. For the population of 6 year olds,
the average score on a standardised problem solving test was known to be μ = 80 with σ = 10.
To test the effectiveness of the training program, a random sample of the participants is given
this test. Their scores were:

85, 69, 90, 77, 74, 76, 86, 93, 97, 88, 97, 80, 75, 98, 79, 75, 87, 94.

Was the program effective?


29. A personality questionnaire was administered to a sample of 16 college students.
Their scores on assertiveness were:

20, 24, 21, 25, 20, 19, 19, 18, 17, 29, 17, 19, 21, 22, 22, 23.

Calculate the 95% confidence interval for the mean of the population from which the sample
scores were drawn.


30. A researcher would like to know if oxygen deprivation at birth has a damaging effect
on IQ. It is known that scores on a standard intelligence test are normally distributed with μ =
100 and σ = 15. The researcher takes a random sample of individuals for whom
complications at birth indicate moderate oxygen deprivation and administers the intelligence
test. The sample data are: 92, 100, 106, 78, 96, 94, 98, 91, 83, 81, 86, 89, 87, 91, 89. Is there
any evidence for an effect?




31. Chapter 6 of Eysenck's Handbook of Abnormal Psychology presents these data (some
minor modifications have been made):

GROUP                   n     MEAN IQ   SS       s²      s
Introverted Neurotics   121   109.9     23185    193.2   13.9
Neurotics Unspecified   121    98.9     21870    182.2   13.5
Schizophrenics          676    92.7     181547   269.0   16.4
Epileptics               25   100.8     3064     127.7   11.3
Assume that the samples were drawn randomly from identifiable populations.

a. Calculate the 95% confidence interval for each of the means and say in words what the
interval tells you.

b. Calculate the 95% confidence intervals for:
(i) the difference between the two types of neurotic
(ii) the difference between the other two groups

c. Do you consider that the Introverted neurotic and Neurotics Unspecified samples
came from populations with different means?

d. Examine the difference between the population means of Schizophrenics and
Epileptics.


32. The following difference scores were obtained from a sample of 10 participants tested
under two conditions: 10, 25, 16, 24, 23, 23, 21, 26, 14, 18. Given that the population
standard deviation of difference scores is 10, can you conclude that there is no difference
between the conditions?


33. In order to examine the effects of sensitisation on cigarette smoking by habitual
smokers, a random sample of 12 smokers was obtained and the number of cigarettes smoked
per day recorded for each participant. They were then sensitised to the effects of smoking by
viewing a film that graphically shows the harm caused by cigarette smoking. A week later,
the participants were asked to record the number of cigarettes that they had smoked on that
day. Did sensitisation cause a reduction?

NUMBER OF CIGARETTES SMOKED
      BEFORE   AFTER
S1    19       15
S2    22        7
S3    32       31
S4    17       10
S5    37       28
S6    20       12
S7    23       23
S8    24       17
S9    28       19
S10   21       24
S11   15       11
S12   18       16



34. In order to compare the political attitudes for the older and younger voters in her
electorate, a politician gives a standardised political attitude scale to a sample of 10 young
voters and 10 elderly voters. The mean for Young was 52 and the mean for Old was 39. The
population standard deviation for this scale is 20. Should the politician conclude that there is
a significant difference between young and old voters?


35. A management/union committee in a large company tried out a number of changes in
working procedures. At the end of the trial period, the committee met again to consider the
outcome. The union members considered the trial to be a huge success and wanted the new
arrangements to continue. The management members on the committee argued that the
changes had no effect in reducing staff discontent, the attitude of staff was one of
indifference, and the changes were costing the company money. It was decided to survey a
sample of employees randomly selected from the payroll list. Each of these employees was
asked to give anonymous responses to a rating scale. When coded, the ratings received scores
ranging from 7 (completely in favour) down to 1 (completely opposed), with completely
indifferent receiving a score of 4. Given the scores below, use a hypothesis testing
procedure and conclude whether the employees as a whole are in favour of the change,
opposed to it, or are indifferent.
3, 1, 7, 2, 1, 2, 2, 1, 6, 4, 5, 1, 1, 2, 6, 5, 2.


36. Calculate the 95% confidence interval for the data in Question 35.
What additional information does this provide (compared to the hypothesis test)?


37. Volunteers were randomly allocated to either an experimental group or a control group.
The two groups performed the same task, but the experimental group was subjected to loud
rock music while doing the task. The results were as shown, with the mean of the
Experimental Group being 18.58 and that for the Control Group being 20.50.

GROUP
E    C
20   20
22   19
14   17
15   27
14   13
20   24
27   15
20   29
21   27
18   14
15   20
17   21

(a) Estimate the standard error of the difference between the two population means.
(b) What does this standard error measure?
(c) What are the upper and lower limits of the 99% confidence interval for this difference?
(d) What do these limits mean?


38. Some staff at a local school claimed that a remedial class in reading that had been run
for a number of years was too costly in terms of school resources and they argued that it
should be cancelled. After some heated discussion it was decided to stop offering it unless
there was evidence that students had benefited from it. The performance of the current batch
of students in the class was therefore measured at the start of the session and again at the end.
The mean for 36 students was 51.5 for the first test and 55.9 for the second test. The SS for
the difference scores was 1260. Should the class be retained on this evidence?


39. In a memory experiment a group of 12 participants obtained the following percentage
scores for memorising nonsense syllables. They were tested immediately after training and
again three hours later. Statistically examine the change of mean performance over time,
using a hypothesis testing procedure. Answer the following questions:
(a) State and justify the statistical hypothesis that you would use.
(b) Determine the appropriate critical value.
(c) Calculate the appropriate test statistic.
(d) Compare this result with the critical value.
(e) What is the 95% confidence interval for the population mean difference score?

RECALL AFTER DELAY
Delay:   0 hrs   3 hrs
S1       84      68
S2       82      86
S3       80      84
S4       77      82
S5       75      80
S6       73      78
S7       73      76
S8       71      74
S9       69      72
S10      84      60
S11      76      75
S12      68      85


40. Two celebrity therapists challenge each other to a competition to see which one is more
successful in reducing anxiety in contestants auditioning for the Top 100 in Australian Idol.
Channel 10 arranges to recruit 20 contestants who suffer from performance anxiety and
randomly allocate them to the two therapists (10 each) for treatment during the week before
the audition. They film the contestants' performances and ask "blind" judges to rate the level of
anxiety they displayed. The dependent variable is the mean anxiety rating given by the judges
for each contestant. The data are given in the table below (one contestant who had been
allocated to Dr Karl developed flu and dropped out, leaving 9 participants).

Therapist 1 (Dr Phil)    Therapist 2 (Dr Karl)
S1    17                 S1    19
S2    11                 S2    12
S3    21                 S3    13
S4    20                 S4     9
S5    14                 S5    22
S6     8                 S6    11
S7    19                 S7    15
S8    11                 S8    10
S9    14                 S9    15
S10   15



a) Carry out an appropriate analysis to test the null hypothesis that the two therapists are
equally effective, and interpret the outcome
b) What does it mean to say the judges were "blind" and why is this important?


Power and sample size

41. An experimenter is testing the null hypothesis H0: μ = 50 with α = .05, 2 tailed.
Complete the following table, by calculating the power of her tests for the values of γ for each
sample size. Comment on the shape of the power functions, and compare them.


            γ      δ = (Z_0 - Z_1)    Power
            0
            .2
            .4
n = 9       .5
            .6
            .8
            1.0

            0
            .2
            .4
n = 36      .5
            .6
            .8
            1.0

            0
            .2
            .4
n = 100     .5
            .6
            .8
            1.0




42. (a) A clinical psychologist is interested in comparing the effectiveness of two treatments
for hypochondria. If she considers a difference of .5σ between the groups to be the smallest
difference which is of practical importance, how many participants in each group will she
need to detect this difference with 80% power (with α = .05 two-tailed)?

(b) How many participants would be needed if α = .01 (two tailed)?


43. An experimenter is interested in discovering whether reaction time to a complex
stimulus is faster or slower than reaction time to a simple stimulus. She knows from previous
research that the standard deviation of reaction times is approximately 50 millisecs. If she
wishes to detect a difference of 5 millisecs with 80% power how many participants should
she use? (Alpha is .05 and the test is 2 tailed.)


44. A psychologist constructed a test to measure frustration tolerance, and administered it
to random samples of 18 males and 18 females among First Year UNSW students. The mean
for males was 18 points and the mean for females was 24 points. She calculated a t-value for
the difference between the 2 sample means and obtained t = 1.8. Having initially set α = .05
(2 tail), she concluded that males and females do not differ in frustration tolerance.

(a) Discuss the conclusion of no difference in relation to the power of the statistical test
used. (Hint: Calculate power for small, medium and large effects)

(b) Suppose that with two random samples of 162 participants, the mean for males was 20
and the mean for females was 22 points, giving a t-value of 1.8. Relate the conclusion of no
difference to the power of the test used.

(c) What sample size would give a Type II error rate of only .05 for a medium size of
effect?


Correlation and Prediction

45. Use SPSS to calculate the product-moment correlation coefficient for the following
pairs of scores.

Student   Entrance Test score   2nd Year Uni grade
1         70                    2.5
2         90                    4.0
3         75                    3.5
4         85                    3.0
5         80                    3.0
6         70                    2.0
7         90                    3.0

What proportion of uni grade variance can be accounted for by knowing entrance test scores?

46. For the sample correlation in Q. 45, carry out a test of the null hypothesis that in the
population entrance test scores and 2nd year uni grades are uncorrelated.


47. A psychologist was interested in whether a significant relationship exists for university
undergraduates between Dominance and Tolerance scales of the California Psychological
Inventory, based on the following data from 10 students:

Dominance: 42 33 35 26 15 21 40 26 18 20
Tolerance: 8 21 15 23 28 23 6 15 25 25

Use SPSS to carry out an appropriate hypothesis test, using α = .05.


48. A psychologist was interested in whether a relationship exists between intelligence
and length of big toe. He obtained data from a random sample of 2000 participants and found
a sample correlation of +.15. A two-tailed hypothesis test of H0: ρ = 0 was rejected at
beyond the .05 level and he concluded that there is a significant association between
intelligence and big toe length.

(a) By carrying out a power analysis on the above study, what can you say about the
psychologist's claim?
(b) What sample size is needed to have an 85% chance of detecting a medium size of effect
(i.e. a correlation of at least .3)?


49. An experimenter believes the correlation between exam anxiety and time spent
studying is -.4. She wishes to carry out an appropriate one-tailed test of H0: ρ = 0 using α =
.05 and a random sample of 50 participants. Is the power of her test satisfactory?


50. For the data in Q.45, find the regression equation for predicting a participant's
university grade from their entrance test score. What is the predicted university grade for a
student with an entrance test score of 80? Find the 90% confidence limits for this predicted
university grade.


51. A psychologist found that scores for a random sample of 60 NSW Year 10 students on
Test A (Neuroticism Inventory) and Test B (Fear Survey) correlated 0.6. For Test A, M = 150
and s = 10. For Test B, M = 25 and s = 10. Consider participants who had the following
scores:
                A                          B
                (Neuroticism Inventory)    (Fear Survey)
Participant 1   120                        25
Participant 2   130                        15
Participant 3   180                        35
Participant 4   140                        15
Participant 5   150                        35
(a) For each participant, calculate her/his expected Z-score on the Fear survey given the
Neuroticism score. Also calculate the raw expected scores.
(b) Express as standard scores the difference between each participant's actual Fear score and
the value predicted from knowledge of the Neuroticism score.
(c) What do these standardised residuals mean for the individuals in relation to each other
and to the rest of the sample? Make explicit any additional assumptions you need to
make.
(d) What are the 95% confidence limits for the predicted Fear score based on a Neuroticism
score of 135?


52. Suppose that in a study involving 100 UNSW students, a correlation of 0.7 was found
between performance in RM2 and performance in a talent contest. The relevant means and
standard deviations are:

Talent Contest RM2
M 500 62
s 50 13

(a) Two students were unable to attend the RM2 exam. Predict, using their talent contest
scores, how they would have fared in the exam:

Talent Contest
Student 1 450
Student 2 600

(b) Suppose the two students actually sat for the exam and obtained the following scores:

RM2
Student 1 65
Student 2 71

What would you say about their performance in light of the predictions? Specify any
assumptions you are making.






χ² Goodness of Fit Test

53. In 1990 a questionnaire was given to all those HSC students who intended to go to
university. It indicated that 30% intended to be science majors, 50% intended to major in
social sciences or humanities, and 20% wanted to do professional courses (e.g., medicine,
law, architecture). A random sample of 100 1995 HSC students yielded the following
frequency distribution.


                         Intended Major
Science     Social Science or Humanities     Professional
__________________________________________________________________
35          40                               25
__________________________________________________________________
On the basis of these data, can you conclude that there has been a significant change
in students' intentions? Use α = .05.


54. Suppose that you obtain a random sample of 210 university students in Sydney and
classified them into categories A, B, and C on the basis of socio-economic background.
Would you be prepared to believe that students are drawn from categories A, B, and C in the
ratio 4:2:1 if your sample contained 120 in A, 50 in B and 40 in C?


χ² Test of Independence

55. The Australian Marijuana Party is preparing a submission to the State Government
concerning the legalisation of marijuana. They conducted a survey and asked 200 people
between the ages of 20 and 30, 100 between 30 and 40, and 100 between 40 and 50, whether
they were in favour of the legalisation of marijuana with the following results:

AGE
20 - 30 30 - 40 40 - 50
________________________________________
In favour 150 30 20
________________________________________
Against 50 70 80
________________________________________

What can you conclude about the relationship between age and attitude to legalisation? What
ambiguities are there regarding your conclusion?


56. One hundred patients were randomly allocated to two treatment groups as follows:
Group A (50 Ss) given drug; Group B (50 Ss) given placebo. After one week patients were
classified as to whether they had improved or not. State what hypothesis you could test using
χ² with the following data and test it. What can you conclude? Calculate φ.

                A     B
               _______________
Improved       10    20
Not improved   40    30
               _______________

57. The following sample data are randomly drawn from the records of diagnosed
schizophrenics at three institutions. Consider whether the institutions treat their patients
differently.
Number treated Number becoming Number
for less than 2 weeks day patients committed
__________________________________________________________________
Institution A 12 23 89
Institution B 8 12 62
Institution C 21 30 119
___________________________________________________________________


Factorial designs

58. A factorial design was described as being a 2 × 3 × 4 design.
(a) How many factors did it have?
(b) How many levels did each factor have?
(c) How many groups would be needed for a fully between groups design?


59. A factorial experiment obtained the following results:

B1 B2
A1 25.6 33.2
A2 12.8 20.0

Assuming that the obtained differences are significant, how would you describe the result in
terms of main effects and interactions? Draw a graph of these results.


60. The results of a 2 × 2 factorial involving A and B were that the interaction and both
main effects were significant. Explain in general terms what this result indicates.


Revision of strategy and methodological issues


61. List the defining characteristics of: a survey; an experiment; a correlational study; a
quasi experiment.

62. Define: independent variable; dependent variable; extraneous variable.



63. A psychology student is concerned about the effects of the university cafeteria on
health. He finds that a sample of people who regularly buy their lunch from the cafeteria
have a significantly higher level of cholesterol than a sample of people who regularly bring
their own lunch. He then writes a letter to the student newspaper, demanding that the
cafeteria display a sign saying that eating there is injurious to health. Comment upon his
interpretation of the data.


64. If two samples are randomly selected from the same population, we would not expect
the means of the two random samples to be exactly the same. Why is this so? How do we
cope with this for inferential purposes?


65. A researcher studying the effects of caffeine on concentration randomly allocated
students to two groups and gave all students a simple perceptual tracking task. The
experimental group ingested a fixed amount of caffeine prior to the task, whereas the control
group did not. For practical reasons, participants could be tested either in the morning or
after lunch. On the basis of a coin-toss, the experimental group was allocated to the morning
session whereas the control group was tested after lunch. Because the researcher had tossed a
coin to decide which group should be measured in which time-slot, he believed that his
design was free of any threats to internal validity.

(a) Is the researcher correct in this belief? If not, why not?

(b) The experiment was repeated in a way that was free of errors affecting internal
validity. It was found that caffeine increased concentration, and on the basis of this result, the
researcher recommended that all airline pilots consume caffeine-based drinks while flying a
plane. Comment on the external validity of this study.


66. For a repeated measures experiment involving three conditions, A, B and C.
(a) State what the within-subject counterbalanced order would be.

(b) State what the between-subject order would be.


67. A food manufacturer is developing a new range of corn chips which comes in five
flavours, and wants to taste-test the product before launching it onto the market. Participants
were asked to sample each flavour and fill out a ratings sheet each time. Flavours were
presented to participants in a random order, however, one of the flavours was hot chilli, and
participants reported that after tasting this flavour they were unable to taste any of the
subsequent flavours. Would this be classified as a random error, rank order effect or carry-
over effect? Why?


68. To counteract the problem of the hot chilli flavour, this flavour was always presented
to participants last. However, all the corn chips contained flavour enhancers and participants
reported that by the third or fourth flavour the corn chips all started to taste the same.
Would this be a different type of error to that described in Q67? If so, what and why?





Revision of issues in data analysis

69. Define the following terms: statistic, parameter, population, sample, replication,
sampling distribution.


70. Why is the term "confidence interval" used and not "probability interval", when
carrying out confidence interval estimation procedures?


71. In the results section of a research article, the following statement was made: "A
significant effect was found (t = 3.2, p < .05)". What does "p < .05" mean? In general, what is
a p-value and how does it relate to the critical value for a test statistic?


72. A researcher claims that her result is highly significant and that this indicates that
there must be a large effect of the IV on the DV. Comment on this statement.


73. Another researcher obtains a mean of 114.3 for one group and a mean of 125.6 for
another group and carries out an appropriate test of significance to find out if these two
group means are equal. Comment.


74. A student makes an interpretation for a 95% confidence interval (where the limits are
90 and 105) and states that 95% of the time the population mean will fall between 90 and
105. Comment.



SOLUTIONS TO SELECTED EXERCISES
1. (a) M = 18, S = 4.4472

X     Z
18     0
16    -0.4497
23     1.1243
20     0.4497
14    -0.8994
 8    -2.2486
22     0.8994
21     0.6746
20     0.4497

(b) M = 0, S = 1. No, because the mean of a set of Z scores is always 0, and the standard
deviation is always 1.

2. No, a distribution of standard scores retains the shape of the original distribution of raw
scores.

3. 1.15 1.28 1.65 1.96 2.58 3.30

4. Percentages: 6.68 97.1 45.32 0.003 0.76 5.05 5.05 95.0 1.02

5. (a) Z = 1.04, sales = 21,480 (b) Yes, Z = 1.07 which is in top 14.23%
(c) top 20%: Z = .84, sales = 21,080; bottom 30%: Z = .52, sales = 18,360
(d) bottom 5%, Z = -1.645, sales = 16,110.

6. (a) σ_M = 27.5, Z = +1.89, probability = .0294 (b) σ_M = 15.71, Z = 3.31,
probability = 0.0005 (c) As n increases, σ_M decreases, probability decreases.

7. (a) μ = 150, σ_M = 8 (b) Z = 1.25, prob = 0.1056
(c) M_lower = 134.32, M_upper = 165.68
(d) σ_M = 4, M_lower = 142.16, M_upper = 157.84

8.
                  σ_M      Zc     μ_lower   μ_upper
Country State     1.225    1.96    97.3     102.1
Country Private   2.4      1.96    97.5     106.9
City State        1.71     1.96   101.45    108.15
City Private      2.47     1.96   100.15    109.85

Conclusion: eg. Country State: 95% confident that the population mean exam mark for
country state schools is captured by the interval 97.3 - 102.1.



9. (a) M = 87.7 is a point estimate of μ.
(b) σ_M = 3.79, Zc = 1.645, μ_lower = 81.47, μ_upper = 93.93

10. (a) H0: μ = 600 vs H1: μ ≠ 600.
(b) σ_M = 1.667, Zc = 1.96, Z = -6, reject H0.
(c) Evidence suggests, at .05 2-tailed level, that the machine is underfilling
bottles on average.

11.
Country State     Z = -0.24    Zc = 1.96   retain H0
Country Private   Z = 0.916    Zc = 1.96   retain H0
City State        Z = 2.81     Zc = 1.96   reject H0
City Private      Z = 2.02     Zc = 1.96   reject H0

12. o ~ .02.

13.
Area   df    tc       Area   df    tc
80%    10    1.372    95%    10    2.228
       20    1.325           20    2.086
       30    1.310           30    2.042
       60    1.296           60    2.000
       120   1.290           120   1.984
       ∞     1.282           ∞     1.960
90%    10    1.812    99%    10    3.169
       20    1.725           20    2.845
       30    1.697           30    2.750
       60    1.671           60    2.660
       120   1.661           120   2.617
       ∞     1.645           ∞     2.576

14.
df    α = .05   α = .01
5     2.571     4.032
15    2.131     2.947
25    2.060     2.787
40    2.021     2.704
60    2.000     2.660
∞     1.960     2.576



15. M = 100, s_M = 1, df = 15, tc = 2.947, 99% limits: μ_lower = 97.05, μ_upper = 102.95

16. M = 34.33, s = 3.57, s_M = 1.19, df = 8, tc = 2.306, μ_lower = 31.58, μ_upper = 37.07
Since μ = 35 is contained in the interval, there is no evidence of greater or lesser
delinquency in single-parent families.
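
The interval for Exercise 16 can be cross-checked with a few lines of Python. This is only a minimal sketch and not part of the manual's own method: the function name `t_confidence_interval` is made up for illustration, and the critical value 2.306 (df = 8) is copied from the t table rather than computed.

```python
import math
import statistics

# Delinquency scores from Exercise 16.
scores = [33, 36, 32, 39, 32, 30, 31, 40, 36]


def t_confidence_interval(data, t_crit):
    """Return the (lower, upper) limits of M +/- t_crit * s_M."""
    n = len(data)
    m = statistics.mean(data)       # sample mean M
    s = statistics.stdev(data)      # sample SD, using n - 1 in the denominator
    s_m = s / math.sqrt(n)          # estimated standard error of the mean
    return m - t_crit * s_m, m + t_crit * s_m


# 2.306 is the two-tailed .05 critical t for df = 8, taken from the t table.
lower, upper = t_confidence_interval(scores, t_crit=2.306)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Any difference in the second decimal place from the printed limits comes from rounding intermediate values in the hand calculation.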

17. M = 23.5, s_M = 1.35, df = 7, tc = 2.365, t = 1.11, retain H0.

18. s_M = 2, df = 15, tc = 2.131 (two-tailed), t = 5, reject H0.

19. (a) σ_MD = 1.342 (b) Zc = 1.96, μ_D,lower = 7.37, μ_D,upper = 12.63
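
The arithmetic behind Exercise 19 is short enough to sketch directly. The variable names below are illustrative only; the inputs (n = 20, mean difference 10, population SD of differences 6.0) come from the exercise, and 1.96 is the usual two-tailed .05 critical Z.

```python
import math

n = 20              # number of participants
mean_diff = 10.0    # sample mean difference score
sigma_d = 6.0       # population SD of the difference scores
z_crit = 1.96       # two-tailed .05 critical Z

# Standard error of the mean difference: sigma_D / sqrt(n).
sigma_md = sigma_d / math.sqrt(n)

# 95% confidence limits for the population mean difference.
lower = mean_diff - z_crit * sigma_md
upper = mean_diff + z_crit * sigma_md
print(f"sigma_MD = {sigma_md:.3f}; 95% CI: {lower:.2f} to {upper:.2f}")
```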

20. M_D = 1, σ_MD = 0.4, Zc = 1.96, Z = 2.5

21. M_D = 1.75 (Test 2 - Test 1), σ_M = 0.5, Z = 3.5, Zc = 1.96 (two-tailed)

22. M_D = 0.6 (B - A), s_MD = 0.224, df = 5, tc = 2.571, μ_D,lower = 0.024, μ_D,upper = 1.176.

23. M_D = 3.3 (Cond1 - Cond2), s_MD = 0.633, df = 9, tc = 3.25, t = 5.211
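
A dependent-means t test such as Exercise 23 reduces to a single-sample test on the difference scores. The sketch below works through that reduction; the variable names are illustrative, and the critical value 3.25 (df = 9, α = .01 two-tailed) is taken from the t table rather than computed.

```python
import math
import statistics

# Errors before (COND 1) and after (COND 2) training, Exercise 23.
cond1 = [110, 98, 100, 105, 90, 120, 117, 110, 104, 95]
cond2 = [107, 95, 97, 101, 90, 115, 111, 106, 104, 90]

# Work on the difference scores (Cond1 - Cond2), one per participant.
diffs = [a - b for a, b in zip(cond1, cond2)]

m_d = statistics.mean(diffs)                              # mean difference M_D
s_md = statistics.stdev(diffs) / math.sqrt(len(diffs))    # standard error of M_D
t = m_d / s_md                                            # test statistic, df = n - 1

t_crit = 3.25                     # from the t table: df = 9, alpha = .01 two-tailed
reject_h0 = abs(t) > t_crit
print(f"M_D = {m_d:.2f}, s_MD = {s_md:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```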

24. (a) s_M1-M2 = 2 (c) df = 20, tc = 2.845, 99%: (μ1 - μ2)_lower = -3.69, (μ1 - μ2)_upper = 7.69
(e) df = 20, tc = 2.086, 95%: (μ1 - μ2)_lower = -2.17, (μ1 - μ2)_upper = 6.17

25. M1 - M2 = 4, s_M1-M2 = 1, df = 20, tc = 2.086, t = 4.
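
For independent means, the standard error pools the two sums of squares, as in the worked solution to Exercise 40. The sketch below applies that pooled formula to the Exercise 25 data; `pooled_t` is a hypothetical helper name, not something defined in the manual.

```python
import math

# Serum cholesterol change scores, Exercise 25.
group1 = [4, 6, 6, 7, 9, 8, 11, 12, 8, 9, 8]
group2 = [5, 6, 3, 8, 7, 3, 5, 2, 2, 0, 3]


def pooled_t(x1, x2):
    """Return (t, standard error, df) for an independent-groups t test."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)          # sum of squares, group 1
    ss2 = sum((x - m2) ** 2 for x in x2)          # sum of squares, group 2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df                 # pooled variance estimate
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, se, df


t, se, df = pooled_t(group1, group2)
print(f"s_M1-M2 = {se:.2f}, df = {df}, t = {t:.2f}")
```

The same function accepts unequal group sizes, so it can also be applied to the Exercise 40 data (10 and 9 contestants).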

26. M1 - M2 = 2, s_M1-M2 = 1.58, df = 14, tc = 2.145, t = 1.26.

28. M = 84.44, σ_M = 2.36, Zc = 1.96, Z = 1.88

29. M = 21, s_M = 0.79, df = 15, tc = 2.131, μ_lower = 19.32, μ_upper = 22.68

30. M = 90.73, σ_M = 3.87, Zc = 1.96, Z = -2.40

31. a.
Introverted Neurotics   μ_lower = 107.4    μ_upper = 112.4
Neurotics Unspecified   μ_lower = 96.46    μ_upper = 101.34
Schizophrenics          μ_lower = 91.47    μ_upper = 93.93
Epileptics              μ_lower = 96.14    μ_upper = 105.46

b. (i) s_M1-M2 = 1.76, df = 240, tc = 1.96, 95%: (μ1 - μ2)_lower = 7.55, (μ1 - μ2)_upper = 14.45
(ii) s_M1-M2 = 3.31, df = 699, tc = 1.96, 95%: (μ1 - μ2)_lower = -14.59, (μ1 - μ2)_upper = -1.61

c. s_M1-M2 = 1.76, df = 240, tc = 1.96, t = 6.25

d. s_M1-M2 = 3.31, df = 699, tc = 1.96, t = -2.45

32. M_D = 20, σ_MD = 3.16, Zc = 1.96, Z = 6.33

33. M_D = 5.25 (B - A), s_MD = 1.41, df = 11, tc = 2.201, t = 3.723

34. σ_M1-M2 = 8.94, Zc = 1.96, Z = -1.45

35. H0: μ = 4 vs H1: μ ≠ 4, M = 3, s_M = 0.5, df = 16, tc = 2.12, t = -2.0

36. μ_lower = 1.94, μ_upper = 4.06

37. s_M1-M2 = 1.9, df = 22, tc = 2.819, 99%: (μ1 - μ2)_lower = -7.28, (μ1 - μ2)_upper = 3.44

38. M_D = 4.4 (M2 - M1), s_MD = 1.0, df = 35, tc ≈ 1.69 (one-tailed), t = 4.4

39. H0: μ_D = 0 vs H1: μ_D ≠ 0, df = 11, tc = 2.201, M_D = -0.67, s_MD = 3.08, t = -0.22
95%: μ_D,lower = -7.45, μ_D,upper = 6.11

40. a. H_0: μ_1 = μ_2   H_1: μ_1 ≠ μ_2 (2-tailed test). This is an independent means design because
there are two separate groups of participants; σ is unknown.

        Therapist 1 (Dr Phil)                  Therapist 2 (Dr Karl)
        X_1   X_1-M_1   (X_1-M_1)²             X_2   X_2-M_2   (X_2-M_2)²
S1       17       2          4         S1       19       5         25
S2       11      -4         16         S2       12      -2          4
S3       21       6         36         S3       13      -1          1
S4       20       5         25         S4        9      -5         25
S5       14      -1          1         S5       22       8         64
S6        8      -7         49         S6       11      -3          9
S7       19       4         16         S7       15       1          1
S8       11      -4         16         S8       10      -4         16
S9       14      -1          1         S9       15       1          1
S10      15       0          0
Σ       150       0        164         Σ       126       0        146
M        15                            M        14

\[
s_{M_1-M_2}
= \sqrt{\left(\frac{\sum (X_1-M_1)^2 + \sum (X_2-M_2)^2}{(n_1-1)+(n_2-1)}\right)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}
= \sqrt{\left(\frac{164+146}{9+8}\right)\left(\frac{1}{10}+\frac{1}{9}\right)}
= \sqrt{\frac{310}{17}\,(0.211)}
= \sqrt{\frac{65.41}{17}}
= \sqrt{3.848} = 1.961
\]

\[
t = \frac{M_1-M_2}{s_{M_1-M_2}} = \frac{15-14}{1.961} = 0.51
\]
t_C = t_α/2(n_1 + n_2 - 2) = t_.05/2(17) = 2.110. Since |0.51| < 2.110, we retain H_0.



We conclude that there is insufficient evidence (with α = .05, 2-tailed) to suggest that there is
any difference in the effectiveness of the two therapists in treating performance anxiety.
Dr Phil and Dr Karl shake hands and agree that they are both excellent therapists (even
though there was no evidence from this competition that either one had any beneficial impact
at all, since there was no pre-test or control group).
b) "Blind" in this context does not mean that the judges were visually impaired, but rather
that they were not told which therapist had treated each participant. If they had not been blind
to this information, then they might have shown bias in favour of their favourite therapist,
which would threaten the internal validity of the study.
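
The arithmetic in the solution to Q40(a) can be checked with a few lines of Python. This is only a sketch: the function name pooled_t is made up for illustration, and the critical value is taken from the t table (df = 17) rather than computed.

```python
from math import sqrt

def pooled_t(x1, x2):
    """Independent-means t using the pooled variance estimate."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squared deviations, group 1
    ss2 = sum((x - m2) ** 2 for x in x2)   # sum of squared deviations, group 2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of M1 - M2
    return m1 - m2, se, (m1 - m2) / se

dr_phil = [17, 11, 21, 20, 14, 8, 19, 11, 14, 15]
dr_karl = [19, 12, 13, 9, 22, 11, 15, 10, 15]

diff, se, t = pooled_t(dr_phil, dr_karl)
t_crit = 2.110  # tabled value for df = 17, alpha = .05 two-tailed
print(f"M1 - M2 = {diff:.2f}, SE = {se:.3f}, t = {t:.2f}")
print("reject H0" if abs(t) > t_crit else "retain H0")
```

The standard error comes out at about 1.962 rather than 1.961 because the worked solution rounds 1/10 + 1/9 to 0.211 part-way through; t is 0.51 either way.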


41.
            Effect size    δ = (Z_0 - Z_1)    Power
                 0               0             .05
                .2              0.6            .09
                .4              1.2            .22
n = 9           .5              1.5            .32
                .6              1.8            .44
                .8              2.4            .67
               1.0              3.0            .85

                 0               0             .05
                .2              1.2            .22
                .4              2.4            .67
n = 36          .5              3.0            .85
                .6              3.6            .95
                .8              4.8            .99+
               1.0              6.0            .99+

                 0               0             .05
                .2              2.0            .52
                .4              4.0            .98
n = 100         .5              5.0            .99+
                .6              6.0            .99+
                .8              8.0            .99+
               1.0             10.0            .99+
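
The entries in the table for Q41 can be approximated without the power tables. The sketch below assumes what the table itself shows, namely that δ equals the effect size multiplied by the square root of n, and it uses the normal curve with a two-tailed critical value of 1.96. The function name power_two_tailed is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(effect_size, n, z_crit=1.96):
    """Approximate power of a two-tailed Z test at alpha = .05."""
    delta = effect_size * sqrt(n)
    z = NormalDist()
    # probability of landing in either rejection region when H1 is true
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)

for n in (9, 36, 100):
    for es in (0.2, 0.5, 0.8):
        print(n, es, round(power_two_tailed(es, n), 2))
```

Rounded to two decimal places, the values printed for n = 9 are .09, .32 and .67, which agree with the tabled entries.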



42. (a) n = 63 (b) n = 94

43. n = 1570 per group.

44. (a) power: .09 (effect size = .2); .32 (effect size = .5); .67 (effect size = .8)
(b) power: .44 (effect size = .2); .99+ (effect size = .5); .99+ (effect size = .8)
(c) n = 104 per group.

45. r = .671, r² = .45

46. df = 5, s_r = .33, tc = 2.571, t = 2.023

47. r = -.908, df = 8, s_r = .148, tc = 2.306, t = -6.135
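
The standard errors and t values reported for Q46 and Q47 follow from r and the degrees of freedom (n - 2). The sketch below assumes that Q46 tests the correlation found in Q45 (r = .671 with 7 pairs of scores); the function name r_to_t is made up for illustration.

```python
from math import sqrt

def r_to_t(r, n):
    """Return (s_r, t) for testing H0: rho = 0 with n pairs of scores."""
    df = n - 2
    s_r = sqrt((1 - r ** 2) / df)   # standard error of r
    return s_r, r / s_r

print(r_to_t(0.671, 7))    # Q46: df = 5
print(r_to_t(-0.908, 10))  # Q47: df = 8
```

The second t value is about -6.13; the answer key reports -6.135 because it divides by the rounded standard error of .148.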

48. (a) For = .1, power = .99+. The effect is statistically significant due to very large N,
but psychologically of little significance (r² = .0225).
(b) n = 101

49. δ = 2.8, power = .88, yes.

50. r = .67, Y = -1 + 0.05X, for X = 80, Y = 3. 90% limits for Y: (2.14, 3.86)

51. (a) and (b)

   Z_Y       Y       Zd
  -1.8       7      2.25
  -1.2      13      0.25
   1.8      43     -1.00
  -0.6      19     -0.50
   0        25      1.25

(d) Y = 16, s_y.x = 8, 95% limits: (0.32, 31.68)

52. (a) Student 1: Y = 52.9, Student 2: Y = 80.2
(b) Student 1: Zd = 1.30, Student 2: Zd = -.99

53. χ² = 4.1, df = 2, α = .05, χ²c = 5.99

54. χ² = 5.0, df = 2, α = .05, χ²c = 5.99

55. χ² = 102, df = 2, α = .05, χ²c = 5.99

56. χ² = 4.76, df = 1, α = .05, χ²c = 3.84, Cramer's φ = .22

57. χ² = 1.30, df = 4, α = .05, χ²c = 9.49

58. (a) 3 factors (b) levels - 2 on factor 1; 3 on factor 2; 4 on factor 3 (c) 24 groups

59. Evidence of A main effect, B main effect, but no interaction.


60. While factor A has an effect on the dependent variable, regardless of factor B, and
factor B has an effect on the DV, regardless of factor A, the significant interaction indicates
that the effect of factor A on the DV depends upon which level of factor B is being
experienced.

63. A correlational study, i.e. we cannot rule out competing explanations for the average
difference in cholesterol level. What would be a plausible alternative explanation for this
difference to the one given by the psych student?

64. Individual differences, measurement error, etc. all contribute to random error. Whether
the difference between sample means reflects a true difference or only error is judged by
comparing the obtained statistic (the ratio of the difference to the amount of expected
error, as estimated by the standard error) with the theoretical value of the statistic that
would be expected if random error were the only thing responsible for the difference.

65. No, testing time is a potential threat to the internal validity of the study because, as an
extraneous variable, it does not vary randomly within each group, but rather varies
systematically between groups. It is possible that participants' level of concentration would
be different after lunch compared to the morning. The researcher should have randomly
allocated each participant to one of the two testing times.
To what extent can the performance of university students on a simple perceptual task be
generalised to a different population (pilots) and a more complex task (flying a plane)?

66. Within-subject counterbalancing: each participant receives conditions in the order
ABCCBA. Between-subject counterbalancing: participants are randomly allocated to one of
three orders, for example ABC, CAB or BCA.

67. Carry-over effect.

68. Rank-order effect.

70. A confidence interval is a statement about the location of a population parameter.
The random variable in this procedure is not the parameter, which is fixed across replications,
but the sample mean (or means) from which the upper and lower limits are calculated.
Consequently the limits of a confidence interval vary across replications. When referring to
the behaviour of a confidence interval procedure across replications, we can talk about the
probability of the population parameter being captured by the confidence interval limits.
However, for any one particular replication, the population parameter either is or is not
contained between the calculated limits. Hence we make a statement regarding the
confidence, rather than probability, that the population parameter is contained between the
observed interval limits.
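
The distinction drawn in Q70 can be illustrated by simulation. The sketch below uses hypothetical values (μ = 100, σ = 15, n = 16) and treats σ as known so that the Z critical value 1.96 applies. It draws many samples, builds a 95% interval from each, and counts how often the fixed parameter falls between the limits.

```python
import random
from math import sqrt

def coverage(mu=100.0, sigma=15.0, n=16, reps=2000, z_crit=1.96, seed=1):
    """Proportion of replications whose 95% interval contains mu."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)              # sigma treated as known, so Z is used
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        lower, upper = m - z_crit * se, m + z_crit * se
        hits += lower <= mu <= upper  # limits vary; mu stays fixed
    return hits / reps

print(coverage())
```

Across replications the proportion should be close to .95, yet any single interval either contains μ or does not.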

71. A p-value is a probability statement regarding the obtained test statistic. A statistic for
which p < .05 means that the probability of obtaining a statistic at least as large as the one
observed is less than .05, under the null hypothesis. That is, p < .05 is commensurate with the
observed t being larger than an α-level critical t, and hence rejection of the null hypothesis.



72. The size of the test statistic in relation to the critical value (or the size of the p-value)
does not necessarily tell us anything about the size of the effect of the IV on the DV. Why
not?

73. The null hypothesis refers to the difference between population means, not sample
means.

74. See Q70.





SECTION 2

COMPUTING LABS


LAB 1: Data analysis with SPSS
In this course you will be learning how to use a statistical package, SPSS, produced by IBM.
This package can do lots of sophisticated data analyses. More advanced features of this
package are covered in the third year research methods courses in psychology. For now, you
will be learning the basics - how to enter data; how to obtain descriptive and simple
inferential analyses; how to draw graphs, tables and so on; and most importantly how to
understand the output. The following notes are written for SPSS version 20. If you are using
another version you may find some minor discrepancies.
There are four basic steps in the analysis of data with SPSS:









Getting Started
To access SPSS, double click the SPSS icon (left mouse button). If you can't see an SPSS
icon, ask your tutor for help. When SPSS first starts up, you should see the Data Editor, as
shown below. You may also get a dialogue box asking "What would you like to do?" (click
Cancel to get rid of the dialogue box).
The Menu bar at the top of the Data Editor contains pull-down menus. Once data have been
entered into the data spreadsheet window and variable names defined, then statistical analyses
(found under Analyze) can be carried out. The results of any data analyses are sent to an
output window (which SPSS calls the Viewer), which can be viewed, printed or saved as a
separate file. You can switch back and forth between the Data Editor and the Output windows
from the pull-down menu under Window or by holding down the Alt key and then pressing
Tab.



USING AN EXISTING DATA FILE
To get the feel for what SPSS can do, you are going to play with one of the example data sets
that comes with the package. The name of the file is employee data and it contains data for
474 respondents on 11 variables. To access this file, follow the instructions below and over
the page.
From the top Menu bar, click File, then Open, then Data. This dialogue box will appear:

Navigate to C:\Program Files\IBM\SPSS\Statistics\20\Samples\English


Double click the filename Employee data.sav and after a few moments the data and variable
names will appear in the Data window. By convention, SPSS assigns the file extension *.sav
to data files.

Each column of the data spreadsheet represents a variable and each row a different case
(subject or participant). For large data sets, you can see the remaining columns (variables) by
using the horizontal scroll bar at the bottom of the screen, and the remaining cases by using
the vertical scroll bar to the right of screen.

Notice that at the top of the Data Window the name of the file has been included (this lets you
know which data file you are working on in case you forget).
Exercise 1: FINDING OUT ABOUT VARIABLES
To find out more about each variable (the variable name may be rather cryptic, e.g. salbegin)
you can switch from Data View to Variable View (at the bottom of the screen) to see if
variable labels, value labels or missing values have been included.
Variable View gives information to the right of each variable Name. For example, salbegin -
the variable Label tells us that this is "Beginning Salary", the variable Type is DOLLAR
(meaning the data are entered with $ sign) and the missing value is $0. This means that a
value of $0 for the variable salbegin indicates that the beginning salary was not provided (and
hence for subsequent data analyses any case for whom the value of salbegin is $0 will not be
included). If the missing value is not specified in this way, then the value of $0 will be taken
literally as a beginning salary of $0.00 (and included as such in subsequent analyses).


SPSS also allows for variable Values to be included. For example, jobcat is a categorical
variable and is coded 1, 2 or 3 (the Type is Numeric, which tells us that this variable is coded
as a number). The Values tell us that 1 = Clerical, 2 = Custodial and 3 = Manager. Gender is
also categorical, but is coded as a string variable with the values f or m (when Type is String,
we know that the variable is coded as a letter).
PRACTICE
Use the Variable View window to get information on all the variables in employee data.sav.
- Which variables are continuous and which are categorical? Pay attention to the
Measurement Scale information provided. Do you agree with the descriptions SPSS
applies to each variable? If not, why not?
EXERCISE 2 - Obtaining descriptive statistics
Go to the top Menu bar and click Analyze, then Descriptive Statistics. You will find a pull
down menu like this:


(a) Obtain a frequency distribution for the variable educ. Click on Analyze, then Descriptive
Statistics, then Frequencies, a dialogue box opens and on the left will be a list of the variable
names. Click on the variable Education Level, and transfer it to the Variable(s): box, then
click OK. An Output window (Output1) will appear, containing the results of the analysis.
(b) Obtain a frequency distribution, descriptive statistics, and histogram (with superimposed
normal curve) for the variable prevexp. To get descriptive statistics, click on Frequencies
again, click on the Statistics button in the dialogue box and check whichever boxes you want
(eg. Mean and Std Deviation), and then click Continue. To obtain a histogram, click the
Charts button, then Histogram(s) and also click With normal curve, then Continue, and
OK.
Look at the output. How informative is the frequency distribution table compared to the
histogram? Which gives the better picture of the distribution? The histogram for prevexp is
a grouped frequency distribution because prevexp is a continuous variable with a wide range
of values.
SPSS has a number of different procedures for describing variables. Frequencies,
Descriptives and Explore all produce descriptive statistics, but each procedure allows for
different kinds of output. Look at the options available with each procedure and become
familiar with how to produce frequency distributions, histograms, stem-and-leaf plots and
other descriptive statistics.
You can edit the Histogram by double clicking on it. When you do this, a new window
appears (the Chart Editor) containing the histogram [For easier editing, maximise the Chart
Editor window]. In the Chart Editor, selecting Options, then Bin Element brings up a
dialogue box that allows you to change various characteristics of the histogram.
For example, you can change the class interval size. Select Custom under the X axis heading.
You can enter either the number of intervals required or an interval width. What happens
when you change the default interval width from 20 to 10? Click Interval width, and in the
box enter 10, and click Apply. The histogram is redrawn with the new interval width. Repeat
the process, this time making the interval width 100. Comparing the three interval widths (10,
20 or 100) lets you see that the choice of interval width determines whether the histogram
conveys too much or too little detail.
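
The effect of interval width can also be seen outside SPSS. The sketch below groups a small set of made-up scores (not the prevexp values from the SPSS file) into class intervals of width 10, 20 and 100. The function name grouped_frequencies is hypothetical, and SPSS may anchor its intervals differently.

```python
from collections import Counter

def grouped_frequencies(scores, width):
    """Count scores in class intervals [k*width, (k+1)*width)."""
    counts = Counter((int(x) // width) * width for x in scores)
    return dict(sorted(counts.items()))

# hypothetical months of previous experience, not taken from the SPSS file
months = [0, 3, 7, 12, 18, 19, 24, 36, 38, 55, 61, 96, 120, 144, 205]

for width in (10, 20, 100):
    print(width, grouped_frequencies(months, width))
```

With a width of 100 almost every score falls into the first interval, so the shape of the distribution is lost; with a width of 10 many intervals hold only one score.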
OUTPUT: Every time you run a procedure, the output is appended to the previous output.
Get into the habit of deleting unnecessary parts of the output (by clicking on it and then
pressing Delete) before saving (and especially before printing) the file. Note that you can also
click on any part of the Output file and copy (Right mouse click and Copy) and paste (Right
mouse click and Paste) it into a Word document to include in assignments or to print.
Saving the Output: From the top Menu bar click File, then Save As. In the dialogue box that
appears, click on the Save in pull down menu at the top, and click through to your individual
directory on the server. In the File name box, type in employeedata.spv (or make up your own
file name, such as lab1.spv). Click OK.
You should also copy any files you wish to keep onto your memory stick or Z-drive folder as
a backup. To save the Output, go to File at the top of the Output window, click Save As, and
enter an appropriate file name in the File name box (e.g. e:lab1.spv; note that SPSS attaches
the extension *.spv to output files).
There is also an icon at the top of the Output window that allows you to export the output to a
file in another format, for example Word or Powerpoint (the icon has a picture of a page with
a green arrow).
To Print the Output: ONLY PRINT WHEN YOU REALLY HAVE TO. From the Output
window, click File then Print. In the Print window, click either All or Selection (if you have
highlighted only part of the Output to be printed), then click OK.
Note: If your output doesn't print in a few minutes, DON'T KEEP SENDING IT TO THE
PRINTER. See your tutor. There may be a long print queue or the printer may be
malfunctioning.
TO QUIT SPSS: When you have finished your SPSS session, exit from the program by
going to File - Exit. Unless you wish to save the data or output, click No to all dialogue
boxes asking you about saving.



PRACTICE
1. Produce a histogram (but not a frequency distribution) for salary.

2. Calculate the 50th and 90th percentile scores for the variable salbegin. [Hint: Use Explore
or Frequencies].

3. SPSS has a number of tutorial modules which can be accessed by clicking on Help, then
Tutorial. For those of you who would like to work through some of the basics at your own
pace, have a look at the following topics:
- Introduction.
- Using the Data Editor.
- Working with output.


LAB 2:
Entering, defining, and analysing data
EXERCISE 1 - ENTERING DATA IN SPSS
The data set below represents scores on 9 variables (the variable names will be at the top of
each column) for 10 cases (there will be 10 rows of data). Before entering the data, remember
that each row is a participant (case) and each column is a variable.
Open SPSS. In the Data View window, click the first cell (row 1, column 1), and type the
value 5. Notice that 5 appears in the cell editor above. Hit Enter to transfer the value to the
first cell.
As soon as data are entered in the first column, the name var00001 will appear at the top of
that column. You can change var00001 to the variable name ACHIEVE by changing from
Data View to Variable View, then you can enter ACHIEVE in the variable Name box. [Note:
variable names must be no longer than 8 characters and must be a single word. SPSS reserves
some words or letter combinations for special purposes and they can't be used for variable
names. If you inadvertently choose the wrong variable name an error message will occur.]
Type in the remaining values for the first column and subsequent columns (you can use arrow
keys to move from cell to cell) and define the remaining variable names.

ACHIEVE MOTIV RESP SKILL OIR ATAR UNI OUT SEX
5.0 3.5 3.0 3.0 3.0 82.1 84.2 2 0
4.8 5.0 5.0 5.0 4.0 94.6 81.3 1 1
4.5 4.0 3.5 4.0 4.6 86.6 90.0 1 1
99 4.5 4.5 4.5 4.5 83.9 61.6 1 0
5.0 4.5 3.5 4.0 4.0 84.3 77.2 1 0
4.0 3.0 99 3.0 3.0 70.5 71.2 2 1
3.8 4.0 4.0 5.0 4.0 95.5 85.6 1 0
4.0 3.8 3.0 4.0 3.5 71.9 67.7 2 1
3.5 3.0 2.5 2.5 2.5 63.8 65.7 2 1
4.2 3.7 4.0 4.0 3.0 65.6 62.7 2 0



EXERCISE 2 - DEFINING VARIABLES
You have already defined variable names for the data set; however, some variables may
require variable labels, value labels and missing values. Below is a description of the
variables.
The variables are : ACHIEVE = work achievement; MOTIV = work motivation; RESP =
responsibility in the workplace; SKILL = workplace skill; OIR = overall interview rating;
ATAR = Australian Tertiary Admission Rank; UNI = average university grade; OUT =
outcome of application (where 1 = accepted into job and 2 = rejected); and SEX is coded 1 =
female and 0 = male.
For ACHIEVE, MOTIV, RESP and SKILL missing values are coded 99.
To enter this information, switch to Variable View.
1. To define a label for a variable, simply click on the cell corresponding to the variable
(row) under the column headed Label, then type in the label (e.g. work achievement).
2. To define Value labels (for categorical variables only), click on the corresponding cell
under the column headed Values, then click on the small grey box at the right of the cell.
In the dialogue box that appears, enter a label for each possible value. For example, for
OUT, type 1 in the Value box, and type accepted in the Value Label box. Click Add. Go
back to the Value box, type in 2, type rejected in the Value Label box, click Add, then
OK.
3. To define missing values (see below), click on the corresponding cell under the column
headed Missing, then click on the small grey box at the right of the cell. In the dialogue
box that appears, click Discrete missing values and in the first box type the missing value
(e.g. 99), then OK.
Follow step 1 for all variables, step 2 for OUT and SEX, and step 3 for ACHIEVE, MOTIV,
RESP, and SKILL.
User Defined Missing Values and System Missing Values
A user defined missing value is a value for a variable that is entered for a case for which you
do not have a valid value (e.g., if a participant did not respond to an item on a questionnaire,
where 1, 2, 3, or 4 represent valid responses, you may enter, say, a 9 for that case to
indicate that the response on this item is missing). If, instead, you leave the cell blank in the
Data Editor for this participant (i.e., you do not enter a value for the variable for that case),
then SPSS will replace the empty cell with a full stop (.) and register it as a system
missing value. For any subsequent analysis involving that variable, the case (and any other
cases for whom there are missing values) will be deleted from the analysis. However, the
most appropriate way of dealing with missing data is to define user missing values (and not to
rely on system missing values) for the following reasons:
- User defined missing values provide a greater amount of control over the data (and the
consequences of various actions in SPSS) than do system missing values;
- Checking for system missing values in your data (when you do not expect there to be any)
is an important strategy for data screening. If you find system missing values then you
know there have been errors in data entry. Data screening should always be the first step in
any data analysis; there is no point analysing data that have been entered incorrectly. A good
way to check for data entry errors is to carry out a Frequencies analysis on each variable
(see Lab 1), and look for inappropriate or missing values.
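
To see why the missing value code has to be declared, consider the ACHIEVE scores from Exercise 1. The sketch below is plain Python rather than SPSS; it compares the mean when 99 is taken literally with the mean when 99 is excluded as a user defined missing value.

```python
MISSING = 99  # user defined missing value code for ACHIEVE

achieve = [5.0, 4.8, 4.5, 99, 5.0, 4.0, 3.8, 4.0, 3.5, 4.2]

valid = [x for x in achieve if x != MISSING]

mean_all = sum(achieve) / len(achieve)   # 99 treated as a real score
mean_valid = sum(valid) / len(valid)     # 99 excluded as missing

print(round(mean_all, 2), round(mean_valid, 2))
print(len(achieve) - len(valid), "missing case(s)")
```

Taken literally, the single 99 pulls the mean up to 13.78; with it excluded the mean of the nine valid scores is 4.31.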
SAVING THE DATA
Once your data have been entered, you can save the data file for future work if you wish.
From the top Menu bar click File, then Save As. In the dialogue box that appears, click on
the Save in box at the top, and click through to your individual directory on the server (Z:\). In
the File Name box, type in workdata.sav (or make up your own file name). Click OK.
You only need to save files that you wish to use again. Remember that you should also copy
important files onto a memory stick as a backup; it is always possible that files can be lost
from the server due to hardware breakdown. It is best to copy files through Windows (e.g.
using Windows Explorer) rather than by saving directly from SPSS. The reason for this is that
SPSS constantly reads from and writes to open data files, and if a memory stick is removed
while SPSS is running it can cause the program to crash and the file to be corrupted.
EXERCISE 3 - ANALYSING THE DATA
1. Use the Descriptives procedure (Analyze - Descriptive Statistics - Descriptives) to
obtain descriptive statistics for each variable. Does this produce sensible output for all the
variables?
2. Obtain a scatterplot of ATAR with UNI. From the top Menu bar click Graphs, then
Legacy Dialogs, then Scatter/Dot, then Simple Scatter, then Define. In the Scatterplot
window, transfer UNI to the Y axis box, and ATAR into the X axis box, then click OK.
How would you describe this relationship?
3. Produce a table which shows how many males and females were successful in their job
application and how many were unsuccessful. Because both the variables SEX and OUT are
categorical, what is required here is a contingency table, which is produced by Crosstabs
(Analyze - Descriptive Statistics - Crosstabs). In the Crosstabs window, transfer SEX to
the Row(s) box and OUT to the Column(s) box. Select Cells, and under Percentages tick
Row, then click Continue - OK.
You should get a 2 × 2 contingency table with cell, row and column frequencies and row
percentages. Have a higher percentage of males or females been successful?
4. Obtain the average ATAR score separately for males and for females. The procedure
Means provides descriptive statistics for separate groups of cases (Analyze - Compare
Means - Means). In this example SEX is the grouping variable. In the Means window,
transfer the variable ATAR to the Dependent List and the variable SEX to the Independent
List, then click OK.
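
As a cross-check on the Crosstabs and Means output, the same two summaries can be sketched in Python using the SEX, OUT and ATAR columns entered in Exercise 1. The variable names below (cells, group_means) are made up for illustration.

```python
from collections import Counter, defaultdict

# (SEX, OUT, ATAR) for the 10 cases entered in Exercise 1
cases = [(0, 2, 82.1), (1, 1, 94.6), (1, 1, 86.6), (0, 1, 83.9), (0, 1, 84.3),
         (1, 2, 70.5), (0, 1, 95.5), (1, 2, 71.9), (1, 2, 63.8), (0, 2, 65.6)]

SEX = {0: "male", 1: "female"}
OUT = {1: "accepted", 2: "rejected"}

# contingency table: cell frequencies and row percentages
cells = Counter((sex, out) for sex, out, _ in cases)
for sex in SEX:
    row_total = sum(cells[(sex, out)] for out in OUT)
    for out in OUT:
        pct = 100 * cells[(sex, out)] / row_total
        print(f"{SEX[sex]:6} {OUT[out]:8} {cells[(sex, out)]} ({pct:.0f}%)")

# mean ATAR separately for each level of the grouping variable
atar_by_sex = defaultdict(list)
for sex, _, atar in cases:
    atar_by_sex[sex].append(atar)
group_means = {SEX[s]: sum(v) / len(v) for s, v in atar_by_sex.items()}
print(group_means)
```

Each row of the printed table corresponds to one cell of the 2 × 2 table, with the percentage taken within the row (sex), as in the Row percentages option.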


DATA MODIFICATION COMMANDS
COMPUTE AND COUNT
These commands are all used to modify the data in some way, by creating new variables or by
changing the values of existing ones.

COMPUTE can be used to create a new variable or to modify an existing one. For example,
suppose you want to separate your cases into "young" (<= 25 years) and "old" (> 25 years)
and you already have a variable age (in years). You can use the command COMPUTE to
create a new variable (call it GROUP) where all cases whose age is 25 years or less get a
value of 1 on GROUP, and those older than 25 years get a value of 2 on GROUP. The new
variable GROUP can be used as a grouping variable in subsequent procedures (e.g. Means).

COUNT creates a variable which counts the occurrences of value(s) across a list of variables.
For example, suppose participants have indicated whether they agree, disagree or are
indifferent to 10 different statements, where the variables ITEM1, ITEM2 and so on up to
ITEM10 represent their responses to these 10 statements. Suppose, also, that you wish to know
how many times a participant responded with "disagree" across the 10 statements. You can
use COUNT to create a new variable whose value will be the number of times each
participant gave a response of "disagree" across ITEM1 to ITEM10.


EXERCISE 4
The following data set consists of weekly food, transport and leisure expenses for 10
employees:
SUBNO SEX EXF EXT EXL INCOME
1 1 67 45 23 899
2 1 56 23 12 756
3 2 126 146 54 1560
4 2 108 136 45 1038
5 2 128 66 37 1456
6 1 96 48 56 1056
7 2 67 45 20 979
8 1 56 34 45 967
9 1 46 85 84 999
10 2 89 78 56 1678

where SEX is coded 1 = males and 2 = females; EXF = weekly expenditure on food; EXT =
weekly expenditure on transport; EXL = weekly expenditure on leisure; and INCOME is
fortnightly income in dollars.
1. Enter the data set into SPSS and save the data as bills.sav.
2. For each case, calculate the total weekly expenditure on food, transport and leisure. To do
this you will create a new variable called WEEKEX, where WEEKEX = EXF + EXT + EXL.


Click Transform then Compute Variable. In the Compute window type the variable name
WEEKEX into the Target variable box. To add a variable label for WEEKEX click Type &
Label and enter Total weekly expenditure, and click Continue. In the Numeric Expression
box type EXF + EXT + EXL, and click OK.

3. For each case, calculate the money earned per year which isn't spent on food, transport or
leisure, i.e. create a new variable SAVING = INCOME*26 - WEEKEX*52.
Click Transform - Compute Variable. In the Compute Variable window click Reset to
clear any previous expressions. Enter the variable name SAVING into the Target variable
box, and the expression INCOME*26 - WEEKEX*52 into the Numeric Expression box, and
click OK.

4. For each case count the number of weekly bills over $50. That is, create a new variable
OVER$50 which will take the values 0, 1, 2 or 3.
Go to Transform - Count Values within Cases. In the Count Occurrences of Values within
Cases window, type the variable name OVER$50 into the Target Variable box. Highlight
EXF, EXT and EXL and transfer them to the Variables box using the transfer arrow.
Now click Define Values and then Range, value through HIGHEST. In the Range box
type 50, and click Add, then Continue and finally OK.

5. Create a new variable EXCESS which is coded 1 for those whose weekly bills exceed
$250, and 0 for those whose weekly bills do not. This requires using the COMPUTE
command and a conditional transformation.
Click Transform - Compute Variable. In the Compute Variable window click Reset to
clear any previous expressions. Enter the variable name EXCESS into the Target variable
box, and then click If.
Click Include if case satisfies condition: and type in the box underneath WEEKEX GT 250,
and click Continue.
Now in the Numeric Expression box type the value 1 and click OK. This will create a
variable EXCESS which is coded 1 for all cases for whom WEEKEX is greater than 250.
Do the whole thing again, this time giving EXCESS the value 0 for those cases for whom
WEEKEX is less than or equal to $250.
Click Transform - Compute Variable, and then click If. Change WEEKEX GT 250 to
WEEKEX LE 250, and click Continue. Now in the Numeric Expression box type the value 0
and click OK. The variable EXCESS should now be coded either 1 or 0 depending upon the
value of WEEKEX.
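
The four transformations in Exercise 4 can be written out as ordinary arithmetic, which is a useful check on the values SPSS creates. The sketch below is Python, not SPSS syntax; the lower-case names mirror the SPSS variables, and over50 stands in for OVER$50 because $ is not allowed in a Python name.

```python
# (SUBNO, SEX, EXF, EXT, EXL, INCOME) as listed in Exercise 4
rows = [(1, 1, 67, 45, 23, 899), (2, 1, 56, 23, 12, 756),
        (3, 2, 126, 146, 54, 1560), (4, 2, 108, 136, 45, 1038),
        (5, 2, 128, 66, 37, 1456), (6, 1, 96, 48, 56, 1056),
        (7, 2, 67, 45, 20, 979), (8, 1, 56, 34, 45, 967),
        (9, 1, 46, 85, 84, 999), (10, 2, 89, 78, 56, 1678)]

results = []
for subno, sex, exf, ext, exl, income in rows:
    weekex = exf + ext + exl                        # COMPUTE: total weekly spend
    saving = income * 26 - weekex * 52              # COMPUTE: yearly surplus
    over50 = sum(x >= 50 for x in (exf, ext, exl))  # COUNT: 50 through HIGHEST
    excess = 1 if weekex > 250 else 0               # conditional COMPUTE
    results.append((subno, weekex, saving, over50, excess))

for r in results:
    print(r)
```

Note that "50 through HIGHEST" includes a bill of exactly $50; no bill in this data set equals 50, so the count of bills over $50 is unaffected here.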


PRACTICE
1. Using the data file employee data.sav (the same file as used in Lab 1):
(a) Obtain a scatterplot of salbegin with salary and describe the relationship.
(b) Find the average current salary for males compared to females and for minority compared
to non-minority employees.
(c) Of the 474 cases in the data file, how many fall into each occupation category?
2. Use SPSS to show that the mean of a set of standardised scores (z-scores) is 0 and the
variance is 1. [Hint: The procedure Analyze - Descriptive Statistics - Descriptives contains
an option for creating a new variable whose values are the standardised scores of an existing
variable.]
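
The result asked for in Practice 2 can be previewed in Python. This sketch standardises the UNI scores from Exercise 1 of this lab, assuming the standard deviation is computed with n - 1 in the denominator.

```python
from statistics import mean, stdev, variance

# UNI scores from the Lab 2 data set
uni = [84.2, 81.3, 90.0, 61.6, 77.2, 71.2, 85.6, 67.7, 65.7, 62.7]

m, s = mean(uni), stdev(uni)          # stdev uses n - 1
z = [(x - m) / s for x in uni]        # standardised scores

print(abs(round(mean(z), 10)), round(variance(z), 10))
```

Apart from floating-point rounding, the mean of the z-scores is 0 and their variance is 1, whatever the original scores were.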


LAB 3:
Single mean and dependent mean analysis
SINGLE MEAN
EXERCISE 1 - Confidence Interval for μ

From the Week 4 Statistics Tutorial Exercise 2, the IQ scores for the 16 school children are:
94 94 95 96 98 99 99 99 101 101 101 102 104 105 106 106
Enter the data (with variable name IQ) into the Data window and save the data to your folder
or memory stick. Use Explore to calculate the 90% confidence interval.
[In the Explore window transfer IQ to the Dependent List: box and then click Statistics.
Change the 95% to 90% in the Confidence Interval for Mean box. Click Continue and then
OK.]
Compare the output to that obtained in the Week 4 Statistics Tutorial.
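
The 90% limits that Explore reports can be reproduced by hand or with a short script. In the sketch below the critical value 1.753 is read from the t table (df = 15) rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

iq = [94, 94, 95, 96, 98, 99, 99, 99, 101, 101, 101, 102, 104, 105, 106, 106]

n = len(iq)
m = mean(iq)
se = stdev(iq) / sqrt(n)     # estimated standard error of the mean
t_crit = 1.753               # tabled t for df = 15, 90% confidence
lower, upper = m - t_crit * se, m + t_crit * se
print(f"M = {m}, SE = {se:.2f}, 90% CI = ({lower:.2f}, {upper:.2f})")
```

For these data M = 100 and the standard error is exactly 1, so the limits are simply 100 minus and plus the critical value.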
PRACTICE
Using the same data as above, obtain the 99% confidence interval limits (this is Statistics
Practice Q. 15).
EXERCISE 2 - Hypothesis test for μ
From the Week 4 Statistics Tutorial Exercise 3, the interpersonal difficulties scores for the 10
clients are: 59 60 67 65 90 89 73 81 83 71.
To carry out a single mean hypothesis test of H_0: μ = 60 vs. H_1: μ ≠ 60, with α = .01, we can
use a One Sample t-test procedure in the following way:
Step 1: Enter the 10 scores into a data window (call the variable SCORE).
Step 2: Go to Analyze - Compare Means - One-Sample T Test
In the One Sample T Test window, transfer SCORE to the Test Variables box, and enter 60
into the Test Value box, and click OK.
Compare the output to the solution from the Week 4 Tutorial. What interpretation can be
made?
[The One Sample T Test procedure also produces confidence interval output, under Options.]
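
The t value in the One Sample T Test output can be checked with the sketch below. It assumes a two-tailed test at the .01 level, and the critical value 3.250 is taken from the t table (df = 9) rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

scores = [59, 60, 67, 65, 90, 89, 73, 81, 83, 71]
mu_0 = 60                      # value of the mean under H0 (the Test Value)

n = len(scores)
se = stdev(scores) / sqrt(n)   # estimated standard error of the mean
t = (mean(scores) - mu_0) / se
t_crit = 3.250                 # tabled t for df = 9, alpha = .01 two-tailed
print(f"M = {mean(scores)}, SE = {se:.3f}, t = {t:.3f}")
print("reject H0" if abs(t) > t_crit else "retain H0")
```

SPSS reports a two-tailed p-value (Sig.) instead of a critical value; a p-value below .01 corresponds to |t| exceeding 3.250.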
PRACTICE
Use the One Sample T Test procedure to obtain the solution to Q. 17 in the Statistics Practice
questions.


DEPENDENT MEANS
(A) CONFIDENCE INTERVAL ESTIMATION
Below are the data from Week 5 Statistics Tutorial Exercise 3:
Pre: 50 65 42 51 59 Post: 45 63 40 48 56
Obtain 95% confidence limits for the population mean difference in anxiety.
Step 1: Enter the data into the data window, call the variables PRE and POST.
Step 2: Go to Analyze - Compare Means - Paired-Samples T Test. In the Paired Samples
T Test window, highlight the variable POST and click the transfer arrow. It will become
Variable 1. Then highlight PRE, click the transfer arrow and it will become Variable 2.
Both variables will be in the Paired Variables box as Pair 1 (POST - PRE). Click OK.
Save the output and compare it with the solution from the Week 5 Statistics Tutorial.
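
The paired-samples interval is a single-mean interval computed on the difference scores. The sketch below forms POST minus PRE for each participant, in the same direction as Pair 1, and uses the tabled critical value 2.776 (df = 4).

```python
from math import sqrt
from statistics import mean, stdev

pre = [50, 65, 42, 51, 59]
post = [45, 63, 40, 48, 56]

d = [b - a for a, b in zip(pre, post)]   # POST - PRE, as in Pair 1
m_d = mean(d)
se = stdev(d) / sqrt(len(d))             # standard error of the mean difference
t_crit = 2.776                           # tabled t for df = 4, 95% confidence
lower, upper = m_d - t_crit * se, m_d + t_crit * se
print(d, m_d, round(lower, 2), round(upper, 2))
```

Entering the variables in the other order (PRE first) would reverse the sign of the mean difference and of both limits.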
* (B) HYPOTHESIS TEST
Below are data commensurate with Week 5 Statistics Tutorial Exercise 4:
Test 1: 11 13 9 12 8 6 10 13 15
Test 2: 16 10 13 13 7 6 13 15 13
Carry out an α = .05 level two-tailed test of the null hypothesis of no difference in
performance between Test 1 and Test 2.
Hint: Use the Paired-Samples T Test procedure.
Save your output and compare it with the solution from the Week 5 Statistics Tutorial.
a) From the SPSS output, how can we determine the direction of the effect of the
independent variable?

b) From the SPSS output, how can we determine whether the null hypothesis may be rejected
or not?


* = assessable component


PRACTICE
1. Make up and enter data for 15 cases into SPSS as follows:

Variable        Description
--------------------------------------------------------------------------------------------------------
YEARS           Age in years
MONTHS          Number of months since last birthday
                There are no missing data for age.
SEX             Sex, 1=M 2=F. No missing data.
COURSE          Course code, a 3 digit integer. No missing data.
PSY1 to PSY6    Scores on six third year psychology subjects.
SUB1 to SUB4    Scores on four non-psychology subjects.
                Each is a 2-digit integer. Missing data are coded as -9.
Q1 to Q8        Scores on 8 questionnaire items on attitudes towards continuous
                assessment. Each is an integer from 1-7, with 7 in each case
                favouring continuous assessment. Missing data are coded 0.
--------------------------------------------------------------------------------------------------------
Save the data file and carry out the following:
(a) Create, for each case:
(i) a mean score for the 6 psychology subjects;
(ii) a total score for the questionnaire items;
(iii) age expressed in decimal years (e.g. 20 yrs 3 mths becomes 20.25 yrs);
(iv) a count of the number of times continuous assessment is favoured, across the 8
questionnaire items;
(b) Provide some descriptive statistics, across all cases, for the variables created in part
(a);
(c) Produce a listing of all the data (go to Analyze - Reports - Case Summaries)
2. Follow the steps in Ex. (A) above to obtain a solution for Q.22 from the Statistics Practice
Questions.
3. Follow the steps in Ex. (B) above to obtain a solution for Q.23 from the Statistics Practice
Questions.
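The four computed variables in Practice 1(a) can also be sketched outside SPSS to check the logic before building the Compute and Count commands. This is a minimal Python sketch under stated assumptions: the case below is invented, missing scores are simply left out of the mean and the total, and an item is counted as "favouring" continuous assessment when its score is above the scale midpoint of 4. That cutoff is an assumption, not something the exercise specifies.

```python
# One hypothetical case; every value here is invented for illustration.
case = {
    "YEARS": 20,
    "MONTHS": 3,
    "PSY": [65, 72, -9, 58, 80, 70],  # PSY1..PSY6, -9 = missing
    "Q": [7, 5, 0, 3, 6, 4, 2, 7],    # Q1..Q8, 0 = missing
}

PSY_MISSING = -9
Q_MISSING = 0
FAVOUR_CUTOFF = 4  # assumption: scores above the midpoint count as favouring


def psych_mean(scores):
    """Mean of the psychology scores, ignoring missing values."""
    valid = [s for s in scores if s != PSY_MISSING]
    return sum(valid) / len(valid) if valid else None


def questionnaire_total(items):
    """Total of the questionnaire items, ignoring missing values."""
    return sum(i for i in items if i != Q_MISSING)


def decimal_age(years, months):
    """Age in decimal years, e.g. 20 yrs 3 mths -> 20.25."""
    return years + months / 12


def favour_count(items):
    """Number of items on which continuous assessment is favoured."""
    return sum(1 for i in items if i != Q_MISSING and i > FAVOUR_CUTOFF)


print(psych_mean(case["PSY"]))
print(questionnaire_total(case["Q"]))
print(decimal_age(case["YEARS"], case["MONTHS"]))
print(favour_count(case["Q"]))
```

Whatever rule is chosen for missing data, it needs to be declared in SPSS as well (by defining -9 and 0 as missing values), otherwise the codes will be treated as real scores.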


LAB 4
INDEPENDENT MEANS
(A) HYPOTHESIS TEST
Below are the data from the Week 6 Stats Tutorial Exercise 2:
GROUP 1: 6 7 6 10 9 6 7 5 4 4 2
GROUP 2: 5 6 3 8 7 3 5 2 2 0 3

Carry out an independent groups t-test at the .05 level of significance.
Step 1: Enter the data into SPSS, creating two variables: the participant's serum cholesterol
score (SERUM) and the grouping variable (GP) which is coded 1 for group 1 and 2 for group
2.
When you have finished entering the data into SPSS, your data window should have 22 rows
(for the 22 cases, the first 11 cases are Group 1 and the last 11 cases are Group 2) and 2
columns - SERUM and GP.
Step 2: Go to Analyze - Compare Means - Independent Samples T Test. Transfer SERUM
to the Test Variable(s) box, and GP to the Grouping Variable box. You need to define the
values of GP that will make up the two groups for the t test. Click Define Groups and type in
the value 1 in the Group 1 box and the value 2 in the Group 2 box, then Continue and OK.
Save your output and compare to the solution from the statistics tutorial.
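The pooled-variance t test that SPSS reports in the "Equal variances assumed" row can be reproduced by hand. The following is a minimal Python sketch using the formulae from the summary sheet; the helper names are illustrative, and the critical value 2.086 is read from the t table for df = 20.

```python
import math

# Data from Exercise (A) above.
group1 = [6, 7, 6, 10, 9, 6, 7, 5, 4, 4, 2]
group2 = [5, 6, 3, 8, 7, 3, 5, 2, 2, 0, 3]


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq(xs):
    """Sum of squared deviations around the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


n1, n2 = len(group1), len(group2)
df = n1 + n2 - 2

# Pooled variance, then the estimated standard error of M1 - M2.
pooled_var = (sum_sq(group1) + sum_sq(group2)) / df
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_obs = (mean(group1) - mean(group2)) / se_diff

# Two-tailed .05 critical t for df = 20, taken from the t table.
T_CRIT_DF20 = 2.086
reject_h0 = abs(t_obs) >= T_CRIT_DF20

print(f"df = {df}, pooled variance = {pooled_var:.3f}")
print(f"s_M1-M2 = {se_diff:.3f}, t = {t_obs:.3f}, reject H0: {reject_h0}")
```

SPSS reports an exact p-value in the Sig. (2-tailed) column rather than a critical value; comparing that p-value with .05 leads to the same decision.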
* (B) CONFIDENCE INTERVAL ESTIMATION
Below are the data from the Week 6 Stats Tutorial Exercise 1:
GROUP 1: 8 10 7 11 12 13 9 10 11 9
GROUP 2: 15 17 14 19 13 12 12 13 11 14
Calculate the 95% confidence interval for the difference in recall between the two
populations of jurors.
Hint: Use the Independent Samples T Test function

The 95% confidence limits are in the last column of the output table. Save your output and
compare to the solution from the statistics tutorial.
a) Write down the value of $s_{M_1 - M_2}$, as calculated by SPSS.
b) How does $s_{M_1 - M_2}$ relate to the width of the confidence interval? What will happen to
the range of the confidence interval if $s_{M_1 - M_2}$ is increased?
c) Without doing any further analysis in SPSS or by hand, estimate the 99% confidence
limits.
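The reasoning behind parts (b) and (c) is that the interval is the point estimate plus or minus a critical t multiplied by $s_{M_1 - M_2}$, so changing the confidence level only changes the multiplier. The following is a minimal Python sketch of that rescaling, assuming the critical values 2.101 and 2.878 are read from the t table for df = 18. The function name `rescale_limits` is hypothetical.

```python
import math

# Data from Exercise (B) above.
group1 = [8, 10, 7, 11, 12, 13, 9, 10, 11, 9]
group2 = [15, 17, 14, 19, 13, 12, 12, 13, 11, 14]

# Two-tailed critical t values for df = 10 + 10 - 2 = 18, from the t table.
T_95_DF18 = 2.101
T_99_DF18 = 2.878


def mean(xs):
    return sum(xs) / len(xs)


def sum_sq(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def rescale_limits(lower, upper, t_old, t_new):
    """Convert confidence limits built with t_old into limits for t_new."""
    centre = (lower + upper) / 2               # the point estimate M1 - M2
    std_error = (upper - lower) / (2 * t_old)  # recovers s_M1-M2
    half_width = t_new * std_error
    return centre - half_width, centre + half_width


n1, n2 = len(group1), len(group2)
pooled_var = (sum_sq(group1) + sum_sq(group2)) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean(group1) - mean(group2)

lower_95 = diff - T_95_DF18 * se_diff
upper_95 = diff + T_95_DF18 * se_diff
lower_99, upper_99 = rescale_limits(lower_95, upper_95, T_95_DF18, T_99_DF18)

print(f"95% limits: ({lower_95:.3f}, {upper_95:.3f})")
print(f"99% limits: ({lower_99:.3f}, {upper_99:.3f})")
```

Because `rescale_limits` needs only the two limits and the two critical values, it can be applied directly to the limits printed by SPSS without re-running the analysis.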


PRACTICE
1. Follow the steps in Ex. (A) above to obtain a solution for Q.24 from the Statistics Practice
questions.
2. Follow the steps in Ex. (B) above to obtain a solution for Q.25 and Q. 26 from the Statistics
Practice questions.
3. Questions 28 to 40 in the Statistics Practice section are a mixed bag of confidence interval
and hypothesis test problems using either the Z or the t approach. For those questions in this
set that require the t approach and where raw data are provided, use SPSS to provide a
solution to the question and compare to your hand calculations.


LAB 5
CORRELATION AND PREDICTION
EXERCISE 1
Below are the raw data from the Week 10 Statistics Tutorial Exercise 1:
HOURS (X) 12 8 20 6 0 10 8 5 0 5
STRESS (Y) 22 16 25 10 9 14 21 16 15 11

(A) Obtain a scatterplot of HOURS against STRESS.
Step 1: Enter the data into SPSS. You should have 10 rows and 2 columns (HOURS and
STRESS) of data. Save the data file to your disk.
Step 2: Click Graphs - Legacy Dialogs - Scatter/Dot - Simple Scatter - Define. In the
Scatterplot window, transfer HOURS to the X axis and STRESS to the Y axis and click OK.

(B) Use SPSS to obtain the correlation coefficient and test the null hypothesis of no
correlation between the number of hours of outside work per week and perceived level of
stress for the population of university students.
Click Analyze - Correlate - Bivariate. Transfer the variables HOURS and STRESS to the
Variables: box and click OK.
SPSS does not calculate a t statistic for a hypothesis test of a correlation coefficient. Instead
it gives the exact p-value. If the p-value is less than .05, then the null hypothesis of no
correlation can be rejected at the .05 level. Based on the above data, the null hypothesis can
be rejected (r = .773 and p = .009). What conclusion can be made?
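Although SPSS reports only the p-value, the t statistic for the correlation can be recovered from r using the formulae on the summary sheet. The following is a minimal Python sketch; it computes r from sums of squares and cross-products, which gives the same value as the z-score formula, and takes the critical value 2.306 from the t table for df = 8.

```python
import math

# Data from Exercise 1 above.
hours = [12, 8, 20, 6, 0, 10, 8, 5, 0, 5]
stress = [22, 16, 25, 10, 9, 14, 21, 16, 15, 11]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(stress) / n

# Sums of squares and the sum of cross-products.
ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_y = sum((y - mean_y) ** 2 for y in stress)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, stress))

r = sp_xy / math.sqrt(ss_x * ss_y)

# Estimated standard error of r and the t statistic, df = n - 2.
s_r = math.sqrt((1 - r ** 2) / (n - 2))
t_obs = r / s_r

# Two-tailed .05 critical t for df = 8, taken from the t table.
T_CRIT_DF8 = 2.306

print(f"r = {r:.3f}, t({n - 2}) = {t_obs:.3f}")
print("Reject H0" if abs(t_obs) >= T_CRIT_DF8 else "Do not reject H0")
```

The t value obtained this way matches the t reported for HOURS in the Coefficients table of Exercise 2, since with one predictor the two tests are equivalent.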

* EXERCISE 2 - PREDICTION
(A) Obtain the prediction equation, predicting STRESS given HOURS.
Click Analyze - Regression - Linear. In the Linear Regression window, transfer STRESS to
the Dependent: box and HOURS to the Independent(s): box and click OK.
The output from the Linear Regression command contains more information than is discussed
in RM2. This procedure allows for multiple regression analyses and so the output is
presented in multiple regression "jargon". In RM2, we are looking at simple regression only
(i.e., one dependent variable and one independent variable).
The part of the output that is relevant is below:
Coefficients(a)

                 Unstandardized Coefficients    Standardized Coefficients
Model            B          Std. Error          Beta                         t        Sig.
1 (Constant)     10.693     1.889                                            5.662    .000
  HOURS          .704       .204                .773                         3.451    .009
a. Dependent Variable: STRESS



The table tells us that the intercept (constant) is a = 10.693 and the regression coefficient is
b = .704. From this information, what is the regression equation for predicting STRESS from
HOURS?
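The intercept and slope in the Coefficients table can be checked against the formulae $b = r \frac{s_Y}{s_X}$ and $a = M_Y - bM_X$. The following is a minimal Python sketch; it uses the algebraically equivalent form of b (sum of cross-products divided by the sum of squares of X), and the function name `predict_stress` is illustrative only.

```python
# Data from Exercise 1 above.
hours = [12, 8, 20, 6, 0, 10, 8, 5, 0, 5]
stress = [22, 16, 25, 10, 9, 14, 21, 16, 15, 11]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(stress) / n

ss_x = sum((x - mean_x) ** 2 for x in hours)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, stress))

# Slope: SP/SS_X is equivalent to r * s_Y / s_X.
b = sp_xy / ss_x
# Intercept: the regression line passes through (M_X, M_Y).
a = mean_y - b * mean_x


def predict_stress(hours_worked):
    """Predicted STRESS score (Y') for a given number of HOURS."""
    return b * hours_worked + a


print(f"a = {a:.3f}, b = {b:.3f}")
print(f"Predicted STRESS at the mean of HOURS: {predict_stress(mean_x):.1f}")
```

Predicting at the mean of HOURS returns the mean of STRESS, which is a quick way to confirm that a and b have been entered correctly.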




(B) What proportion of the variance of STRESS scores is predictable from HOURS?
From the output of part (A), you get the following:



In this example with only one independent variable, the "R" is the same as the Pearson
product-moment correlation coefficient, r, and "R Square" is $r^2$. Using this output, what
proportion of the variance of STRESS scores can be predicted from knowing HOURS?


(C) Obtain a fitted regression line to the scatterplot of STRESS with HOURS.
Edit the scatterplot of STRESS with HOURS created in Exercise 1 (double click the chart).
From the top Menu bar of the Chart Editor window, click on Elements then Fit Line at
Total.

Your scatterplot should now have a regression line fitted through it. Notice that the X axis
does not cross the Y axis at 0, and for this reason the fitted regression line does not "look
right" (i.e. it does not cross the Y axis at the value of the intercept a = 10.693). To correct
this, double click on the y axis, and then beside Minimum, untick Auto and change the
minimum value to 0. Press Apply. The scatterplot is redrawn and now looks right.

a) From the graph, it is possible to read off an approximate predicted stress level for a given
number of hours. Your tutor will ask you to do this for a particular number of hours.



PRACTICE
1. Use SPSS to obtain solutions to Statistics Practice Questions 45, 47 and 50.
2. Using the data file employee data.sav,
(a) Obtain the regression equation for predicting salary from salbegin.
(b) Fit the regression line to the scatterplot of salary with salbegin.
(c) Obtain predicted salary values.


3. Are the current salaries for males closer to their predicted values than current salaries for
females?
[Hint: One way of answering this question is to produce a scatterplot of current salary with
predicted salary, where data points for males are marked differently from those for females.
To do this, in the Scatterplot window, transfer salary to the Y axis, pre_1 to the X axis and
gender to the Set markers by: box, then click OK.]
Look at the scatterplot and see if the scatter is greatest for females or for males. The greater
the scatter, the more error variability (i.e. the larger the residuals), and the less accurate is the
prediction.

4. Are the assumptions associated with regression met for the prediction of salary from
salbegin?
There are four assumptions associated with regression:
(i) Independence of observations (this is not testable but is a design issue);
(ii) Normality - the DV should be normally distributed at each value of the IV.
(iii) Homoscedasticity - the DV should have the same variability for each value of
the IV.
(iv) Linearity - the relationship between the DV and the IV should be linear.

SPSS can be used to check assumptions (ii), (iii) and (iv).

Go to Analyze - Regression - Linear and click Reset (to undo the previous requests). Set up
salary as the DV and salbegin as the IV. Then click Plots and in the Plots window transfer
*ZPRED to the Y box, *ZRESID to the X box, and click Histogram and Normal probability
plot. Continue and OK.
The histogram and normal probability plot provide a check of the normality assumption. Both
are drawn for the standardised residuals, and a normal curve has been superimposed on the
histogram. Does the distribution of the residuals appear normal? The normal probability plot
provides another means of checking the normality assumption. In the P-P plot the observed
cumulative probabilities are plotted against those expected under normality, and if the
distribution is skewed or there is kurtosis, the points will depart from a straight line.
The scatterplot of the standardised predicted scores against the standardised residuals
provides a check of the homoscedasticity assumption. If the assumption is met, there should
be no discernible pattern to the scatter. The appearance of a pattern in the scatter suggests
that this assumption is not met. For example, if the scatter appears to fan out as the residuals
increase (or decrease), then this suggests that the variability of scores increases across values
of the IV and hence the data are not homoscedastic, but are heteroscedastic.
The linearity assumption can be checked by looking at the scatterplot of the DV with the IV.
Does the relationship appear linear? If not, and if there is a discernible non-linear
relationship, then linear regression procedures will not be appropriate for these data.
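The standardised residuals that these plots are built on follow the formula on the summary sheet: each residual divided by the standard error of estimate. Since the employee data file is not reproduced in this manual, the following minimal Python sketch uses the HOURS and STRESS data from Exercise 1 as a stand-in; the variable names are illustrative.

```python
import math

# Stand-in data: HOURS and STRESS from Exercise 1 of this lab.
x_scores = [12, 8, 20, 6, 0, 10, 8, 5, 0, 5]
y_scores = [22, 16, 25, 10, 9, 14, 21, 16, 15, 11]

n = len(x_scores)
mean_x = sum(x_scores) / n
mean_y = sum(y_scores) / n

ss_x = sum((x - mean_x) ** 2 for x in x_scores)
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores))
b = sp_xy / ss_x
a = mean_y - b * mean_x

# Residuals: observed Y minus predicted Y'.
predicted = [b * x + a for x in x_scores]
residuals = [y - y_hat for y, y_hat in zip(y_scores, predicted)]

# Standard error of estimate (small-sample form, divisor n - 2).
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standardised residuals: residual divided by s_Y.X.
z_residuals = [e / s_yx for e in residuals]

for x, y_hat, z in zip(x_scores, predicted, z_residuals):
    print(f"X = {x:2d}  Y' = {y_hat:6.2f}  Z_residual = {z:6.2f}")
```

Listing the standardised residuals next to the predicted values gives a numerical version of the scatterplot check: if the spread of the Z values changes systematically as Y' increases, homoscedasticity is in doubt.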


LAB 6
EXERCISE 1 - TESTING ASSOCIATIONS BETWEEN
CATEGORICAL VARIABLES
Using the data file employee data.sav (from Lab 1), is there an association between
occupational category and minority status for male employees?
To produce relevant output, you first need to select only male cases.
Step 1: To select only cases for whom gender takes the value "m", you will need to use the
Select Cases command. Go to Data - Select Cases. In the Select Cases window, click If
condition is satisfied and then click the If button. Transfer gender to the right hand box,
press =, and type "m" (including the quotation marks). Click Continue, then OK.
Look at the data window. Those cases with gender values of "f" have been filtered out and
will not be included in subsequent analyses (until the Select Cases command is reset).
[Note: When you no longer want to filter out cases, go back to Select Cases and click All
cases or Reset, then OK.]
Step 2: Go to Analyze - Descriptive Statistics - Crosstabs. In the Crosstabs window
transfer jobcat to the Row(s): box and minority to the Column(s): box.
Then click the Statistics button, tick the Chi-square box and, under Nominal, tick
the Phi and Cramer's V box. Click Continue, then OK.
What interpretation can be made of the output?
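To see where the Pearson Chi-Square and Cramer's V values in the output come from, the formulae for the test of independence can be applied to a small table by hand. The following is a minimal Python sketch. The counts are hypothetical and are NOT the employee data; they only mirror the 3 x 2 shape of the jobcat by minority table. The critical value 5.99 is read from the chi-square table for df = 2.

```python
import math

# Hypothetical observed frequencies (rows = 3 categories, columns = No / Yes).
# These counts are invented for illustration and are not from employee data.sav.
observed = [
    [30, 10],
    [20, 20],
    [10, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Chi-square: sum over cells of (f_o - f_e)^2 / f_e,
# where f_e = row total * column total / N.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n_total
        chi_sq += (f_o - f_e) ** 2 / f_e

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Cramer's phi: s is the smaller of the number of rows and columns.
smaller = min(len(row_totals), len(col_totals))
cramers_phi = math.sqrt(chi_sq / (n_total * (smaller - 1)))

# Critical chi-square for df = 2 at the .05 level, from the chi-square table.
CHI_CRIT_DF2 = 5.99

print(f"chi-square({df}) = {chi_sq:.2f}, Cramer's phi = {cramers_phi:.2f}")
print("Reject H0" if chi_sq >= CHI_CRIT_DF2 else "Do not reject H0")
```

Replacing the hypothetical counts with the observed counts from the SPSS Crosstabulation table should reproduce the values in the Chi-Square Tests and Symmetric Measures tables.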
* EXERCISE 2 - 2 × 2 FACTORIAL DESIGNS
Does an employee's current salary depend upon whether they are male or female, or whether
they have minority status or not? To get a picture of whether current salaries differ across
levels of gender and minority, we can treat gender and minority as two IVs and salary as the
DV and get SPSS to produce a 2 × 2 table of cell means.
[Before starting Exercise 2, remember to Select All cases, after completing Ex 1.]
Step 1: To get the cell means, go to Analyze - Compare Means - Means. In the Means
window, transfer salary to the Dependent List: box and gender to the Independent List:
box. We want to add another IV, so click Next and enter minority into the Independent List:
box, then click OK. Your output should look like this:

Report
Current Salary
Gender    Minority Classification    Mean          N      Std. Deviation
Female    No                         $26,706.79    176    $8,011.894
          Yes                        $23,062.50    40     $3,972.369
          Total                      $26,031.92    216    $7,558.021
Male      No                         $44,475.41    194    $20,330.662
          Yes                        $32,246.09    64     $13,059.881
          Total                      $41,441.78    258    $19,499.214
Total     No                         $36,023.31    370    $18,044.096
          Yes                        $28,713.94    104    $11,421.638
          Total                      $34,419.57    474    $17,075.661


In this table, "No" and "Yes" refer to the two possible values of the IV Minority
Classification.

[Note: if your table does not print $ values in the Mean column, but prints ****** instead, you
can correct the problem by double clicking on the table and then dragging the column borders
across to make the columns wider.]

Write the cell means below:
Minority Status
No Yes
Male
Female

Draw a line graph of the cell means.








The means suggest that there are main effects for both gender and minority status, as well as
an interaction. Write a brief interpretation of each effect listed below.
Main effect for gender:


Interaction (gender x minority):
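One way to organise the interpretation is to turn the four cell means from the Report table into differences. The following is a minimal Python sketch; it describes the pattern in the means only and is not a significance test. The marginal means here are unweighted averages of the cell means, so they will not match the "Total" rows in the SPSS table, which are weighted by cell size.

```python
# Cell means for current salary, copied from the Report table above.
cell_means = {
    ("Male", "No"): 44475.41,
    ("Male", "Yes"): 32246.09,
    ("Female", "No"): 26706.79,
    ("Female", "Yes"): 23062.50,
}

# Unweighted marginal means for each level of gender.
male_mean = (cell_means[("Male", "No")] + cell_means[("Male", "Yes")]) / 2
female_mean = (cell_means[("Female", "No")] + cell_means[("Female", "Yes")]) / 2
gender_difference = male_mean - female_mean

# Simple effect of minority status (No minus Yes) within each gender.
minority_effect_male = cell_means[("Male", "No")] - cell_means[("Male", "Yes")]
minority_effect_female = (
    cell_means[("Female", "No")] - cell_means[("Female", "Yes")]
)

# Interaction contrast: does the minority effect differ between genders?
interaction_contrast = minority_effect_male - minority_effect_female

print(f"Male - Female (unweighted): ${gender_difference:,.2f}")
print(f"No - Yes for males:         ${minority_effect_male:,.2f}")
print(f"No - Yes for females:       ${minority_effect_female:,.2f}")
print(f"Interaction contrast:       ${interaction_contrast:,.2f}")
```

An interaction contrast near zero would correspond to parallel lines in the line graph of cell means; the further it is from zero, the less parallel the lines.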









STATISTICAL TABLES


AREAS UNDER THE NORMAL CURVE

Z    AREA M to Z    Z    AREA M to Z    Z    AREA M to Z    Z    AREA M to Z    Z    AREA M to Z    Z    AREA M to Z
0.00 0.0000 0.54 0.2054 1.08 0.3599 1.62 0.4474 2.16 0.4846 2.70 0.4965
0.01 0.0040 0.55 0.2088 1.09 0.3621 1.63 0.4484 2.17 0.4850 2.71 0.4966
0.02 0.0080 0.56 0.2123 1.10 0.3643 1.64 0.4495 2.18 0.4854 2.72 0.4967
0.03 0.0120 0.57 0.2157 1.11 0.3665 1.65 0.4505 2.19 0.4857 2.73 0.4968
0.04 0.0160 0.58 0.2190 1.12 0.3686 1.66 0.4515 2.20 0.4861 2.74 0.4969
0.05 0.0199 0.59 0.2224 1.13 0.3708 1.67 0.4525 2.21 0.4864 2.75 0.4970
0.06 0.0239 0.60 0.2257 1.14 0.3729 1.68 0.4535 2.22 0.4868 2.76 0.4971
0.07 0.0279 0.61 0.2291 1.15 0.3749 1.69 0.4545 2.23 0.4871 2.77 0.4972
0.08 0.0319 0.62 0.2324 1.16 0.3770 1.70 0.4554 2.24 0.4875 2.78 0.4973
0.09 0.0359 0.63 0.2357 1.17 0.3790 1.71 0.4564 2.25 0.4878 2.79 0.4974
0.10 0.0398 0.64 0.2389 1.18 0.3810 1.72 0.4573 2.26 0.4881 2.80 0.4974
0.11 0.0438 0.65 0.2422 1.19 0.3830 1.73 0.4582 2.27 0.4884 2.81 0.4975
0.12 0.0478 0.66 0.2454 1.20 0.3849 1.74 0.4591 2.28 0.4887 2.82 0.4976
0.13 0.0517 0.67 0.2486 1.21 0.3869 1.75 0.4599 2.29 0.4890 2.83 0.4977
0.14 0.0557 0.68 0.2517 1.22 0.3888 1.76 0.4608 2.30 0.4893 2.84 0.4977
0.15 0.0596 0.69 0.2549 1.23 0.3907 1.77 0.4616 2.31 0.4896 2.85 0.4978
0.16 0.0636 0.70 0.2580 1.24 0.3925 1.78 0.4625 2.32 0.4898 2.86 0.4979
0.17 0.0675 0.71 0.2611 1.25 0.3944 1.79 0.4633 2.33 0.4901 2.87 0.4979
0.18 0.0714 0.72 0.2642 1.26 0.3962 1.80 0.4641 2.34 0.4904 2.88 0.4980
0.19 0.0753 0.73 0.2673 1.27 0.3980 1.81 0.4649 2.35 0.4906 2.89 0.4981
0.20 0.0793 0.74 0.2704 1.28 0.3997 1.82 0.4656 2.36 0.4909 2.90 0.4981
0.21 0.0832 0.75 0.2734 1.29 0.4015 1.83 0.4664 2.37 0.4911 2.91 0.4982
0.22 0.0871 0.76 0.2764 1.30 0.4032 1.84 0.4671 2.38 0.4913 2.92 0.4982
0.23 0.0910 0.77 0.2794 1.31 0.4049 1.85 0.4678 2.39 0.4916 2.93 0.4983
0.24 0.0948 0.78 0.2823 1.32 0.4066 1.86 0.4686 2.40 0.4918 2.94 0.4984
0.25 0.0987 0.79 0.2852 1.33 0.4082 1.87 0.4693 2.41 0.4920 2.95 0.4984
0.26 0.1026 0.80 0.2881 1.34 0.4099 1.88 0.4699 2.42 0.4922 2.96 0.4985
0.27 0.1064 0.81 0.2910 1.35 0.4115 1.89 0.4706 2.43 0.4925 2.97 0.4985
0.28 0.1103 0.82 0.2939 1.36 0.4131 1.90 0.4713 2.44 0.4927 2.98 0.4986
0.29 0.1141 0.83 0.2967 1.37 0.4147 1.91 0.4719 2.45 0.4929 2.99 0.4986
0.30 0.1179 0.84 0.2995 1.38 0.4162 1.92 0.4726 2.46 0.4931 3.00 0.4987
0.31 0.1217 0.85 0.3023 1.39 0.4177 1.93 0.4732 2.47 0.4932 3.01 0.4987
0.32 0.1255 0.86 0.3051 1.40 0.4192 1.94 0.4738 2.48 0.4934 3.02 0.4987
0.33 0.1293 0.87 0.3078 1.41 0.4207 1.95 0.4744 2.49 0.4936 3.03 0.4988
0.34 0.1331 0.88 0.3106 1.42 0.4222 1.96 0.4750 2.50 0.4938 3.04 0.4988
0.35 0.1368 0.89 0.3133 1.43 0.4236 1.97 0.4756 2.51 0.4940 3.05 0.4989
0.36 0.1406 0.90 0.3159 1.44 0.4251 1.98 0.4761 2.52 0.4941 3.06 0.4989
0.37 0.1443 0.91 0.3186 1.45 0.4265 1.99 0.4767 2.53 0.4943 3.07 0.4989
0.38 0.1480 0.92 0.3212 1.46 0.4279 2.00 0.4772 2.54 0.4945 3.08 0.4990
0.39 0.1517 0.93 0.3238 1.47 0.4292 2.01 0.4778 2.55 0.4946 3.09 0.4990
0.40 0.1554 0.94 0.3264 1.48 0.4306 2.02 0.4783 2.56 0.4948 3.10 0.4990
0.41 0.1591 0.95 0.3289 1.49 0.4319 2.03 0.4788 2.57 0.4949 3.11 0.4991
0.42 0.1628 0.96 0.3315 1.50 0.4332 2.04 0.4793 2.58 0.4951 3.12 0.4991
0.43 0.1664 0.97 0.3340 1.51 0.4345 2.05 0.4798 2.59 0.4952 3.13 0.4991
0.44 0.1700 0.98 0.3365 1.52 0.4357 2.06 0.4803 2.60 0.4953 3.14 0.4992
0.45 0.1736 0.99 0.3389 1.53 0.4370 2.07 0.4808 2.61 0.4955 3.15 0.4992
0.46 0.1772 1.00 0.3413 1.54 0.4382 2.08 0.4812 2.62 0.4956 3.16 0.4992
0.47 0.1808 1.01 0.3438 1.55 0.4394 2.09 0.4817 2.63 0.4957 3.17 0.4992
0.48 0.1844 1.02 0.3461 1.56 0.4406 2.10 0.4821 2.64 0.4959 3.18 0.4993
0.49 0.1879 1.03 0.3485 1.57 0.4418 2.11 0.4826 2.65 0.4960 3.19 0.4993
0.50 0.1915 1.04 0.3508 1.58 0.4429 2.12 0.4830 2.66 0.4961 3.20 0.4993
0.51 0.1950 1.05 0.3531 1.59 0.4441 2.13 0.4834 2.67 0.4962 3.30 0.4995
0.52 0.1985 1.06 0.3554 1.60 0.4452 2.14 0.4838 2.68 0.4963 3.50 0.4998
0.53 0.2019 1.07 0.3577 1.61 0.4463 2.15 0.4842 2.69 0.4964 3.70 0.4999



CRITICAL VALUES FOR STUDENT'S t DISTRIBUTION

        CRITICAL VALUES FOR ONE-TAILED TEST
α       .10     .05     .025    .01     .005    .0005

        CRITICAL VALUES FOR TWO-TAILED TEST
α       .20     .10     .05     .02     .01     .001

df
1       3.078   6.314   12.706  31.821  63.657  636.619
2       1.886   2.920   4.303   6.965   9.925   31.598
3       1.638   2.353   3.182   4.541   5.841   12.941
4       1.533   2.132   2.776   3.747   4.604   8.610
5       1.476   2.015   2.571   3.365   4.032   6.859

6       1.440   1.943   2.447   3.143   3.707   5.959
7       1.415   1.895   2.365   2.998   3.499   5.405
8       1.397   1.860   2.306   2.896   3.355   5.041
9       1.383   1.833   2.262   2.821   3.250   4.781
10      1.372   1.812   2.228   2.764   3.169   4.587

11      1.366   1.796   2.201   2.718   3.106   4.437
12      1.356   1.782   2.179   2.681   3.055   4.318
13      1.350   1.771   2.160   2.650   3.012   4.221
14      1.345   1.761   2.145   2.624   2.977   4.140
15      1.341   1.753   2.131   2.602   2.947   4.073

16      1.337   1.746   2.120   2.583   2.921   4.015
17      1.333   1.740   2.110   2.567   2.898   3.965
18      1.330   1.734   2.101   2.552   2.878   3.922
19      1.328   1.729   2.093   2.539   2.861   3.883
20      1.325   1.725   2.086   2.528   2.845   3.850

21      1.323   1.721   2.080   2.518   2.831   3.819
22      1.321   1.717   2.074   2.508   2.819   3.792
23      1.319   1.714   2.069   2.500   2.807   3.767
24      1.318   1.711   2.064   2.492   2.797   3.745
25      1.316   1.708   2.060   2.485   2.787   3.725

26      1.315   1.706   2.056   2.479   2.779   3.707
27      1.314   1.703   2.052   2.473   2.771   3.690
28      1.313   1.701   2.048   2.467   2.763   3.674
29      1.311   1.699   2.045   2.462   2.756   3.659
30      1.310   1.697   2.042   2.457   2.750   3.646

40      1.303   1.684   2.021   2.423   2.704   3.551
60      1.296   1.671   2.000   2.390   2.660   3.460
120     1.290   1.661   1.984   2.358   2.617   3.373
∞       1.282   1.645   1.960   2.326   2.576   3.291



Table 3 POWER (1 - β) AS A FUNCTION OF DELTA (δ) AND ALPHA (α)

α .10 .10 .05 .05 .01 .01
DELTA (δ) 1-tail 2-tail 1-tail 2-tail 1-tail 2-tail
0.0 .10 .10 .10 .05 .02 .01
0.2 .14 .11 .11 .05 .02 .01
0.4 .19 .13 .13 .07 .03 .01
0.6 .25 .16 .16 .09 .04 .02
0.8 .32 .21 .21 .13 .06 .04
1.0 .39 .26 .26 .17 .09 .06
1.2 .47 .33 .33 .22 .13 .08
1.4 .55 .40 .40 .29 .18 .12
1.5 .58 .44 .44 .32 .20 .14
1.6 .62 .48 .48 .36 .23 .16
1.8 .70 .56 .56 .44 .30 .22
2.0 .76 .64 .64 .52 .37 .28
2.2 .82 .71 .71 .59 .45 .35
2.4 .87 .77 .77 .67 .53 .43
2.6 .91 .83 .83 .74 .61 .51
2.8 .94 .88 .88 .80 .68 .59
3.0 .96 .91 .91 .85 .75 .66
3.2 .97 .94 .94 .89 .81 .73
3.4 .98 .96 .96 .93 .86 .80
3.6 .99 .97 .97 .95 .90 .85
3.8 .99 .98 .98 .97 .93 .89
4.0 *** .99 .99 .98 .95 .92





Table 4 DELTA (δ) AS A FUNCTION OF POWER (1 - β) AND ALPHA (α)

α .10 .10 .05 .05 .01 .01
POWER 1-tail 2-tail 1-tail 2-tail 1-tail 2-tail
.95 2.927 3.290 3.290 3.605 3.971 4.221
.90 2.564 2.927 2.927 3.242 3.608 3.858
.85 2.318 2.681 2.681 2.996 3.362 3.612
.80 2.124 2.487 2.487 2.802 3.168 3.418
.75 1.956 2.319 2.319 2.634 3.000 3.250
.70 1.806 2.169 2.169 2.484 2.850 3.100
.65 1.667 2.030 2.030 2.345 2.711 2.961
.60 1.535 1.898 1.898 2.213 2.579 2.829
.55 1.408 1.771 1.771 2.086 2.452 2.702
.50 1.282 1.645 1.645 1.960 2.326 2.576
.45 1.156 1.519 1.519 1.834 2.200 2.450
.40 1.029 1.392 1.392 1.707 2.073 2.323
.35 0.897 1.260 1.260 1.575 1.941 2.191
.30 0.758 1.121 1.121 1.436 1.802 2.052
.25 0.608 0.971 0.971 1.286 1.652 1.902
.20 0.440 0.803 0.803 1.118 1.484 1.734
.15 0.246 0.609 0.609 0.924 1.290 1.540
.10 0.000 0.363 0.363 0.678 1.044 1.294
.05 *** 0.000 0.000 0.315 0.681 0.931



CRITICAL VALUES OF THE χ² DISTRIBUTION: AREA IN UPPER TAIL

df    α: .05 .01 .001
1 3.84 6.63 10.83
2 5.99 9.21 13.82
3 7.82 11.34 16.27
4 9.49 13.28 18.46
5 11.07 15.09 20.52

6 12.59 16.81 22.46
7 14.07 18.48 24.32
8 15.51 20.09 26.12
9 16.92 21.67 27.88
10 18.31 23.21 29.59

11 19.68 24.72 31.26
12 21.03 26.22 32.91
13 22.36 27.69 34.53
14 23.68 29.14 36.12
15 25.00 30.58 37.70

16 26.30 32.00 39.25
17 27.59 33.41 40.79
18 28.87 34.81 42.31
19 30.14 36.19 43.82
20 31.41 37.57 45.32

21 32.67 38.93 46.80
22 33.92 40.29 48.27
23 35.17 41.64 49.73
24 36.42 42.98 51.18
25 37.65 44.31 52.62

26 38.89 45.64 54.05
27 40.11 46.96 55.48
28 41.34 48.28 56.89
29 42.56 49.59 58.30
30 43.77 50.89 59.70



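FORMULAE AND DECISION RULES

Parameter Value
    Single Mean:        $\mu$
    Dependent Means:    $\mu_D$
    Independent Means:  $\mu_1 - \mu_2$
    Correlation:        $\rho$

Sample Value
    Single Mean:        $M = \frac{\sum X}{n}$
    Dependent Means:    $M_D = \frac{\sum X_D}{n}$
    Independent Means:  $M_1 - M_2$
    Correlation:        $r = \frac{\sum Z_X Z_Y}{n}$

Standard Error
    Single Mean:        $\sigma_M = \frac{\sigma}{\sqrt{n}}$
    Dependent Means:    $\sigma_{M_D} = \frac{\sigma_D}{\sqrt{n}}$
    Independent Means:  $\sigma_{M_1 - M_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
    Correlation:        .....

Unbiased estimate of population variance
    Single Mean:        $s^2 = \frac{\sum (X - M)^2}{n - 1}$
    Dependent Means:    $s_D^2 = \frac{\sum (X_D - M_D)^2}{n - 1}$
    Independent Means:  $s_{pooled}^2 = \frac{\sum (X_1 - M_1)^2 + \sum (X_2 - M_2)^2}{n_1 + n_2 - 2}$
    Correlation:        .....

Estimate of Standard Error
    Single Mean:        $s_M = \sqrt{\frac{\sum (X - M)^2}{n(n - 1)}}$
    Dependent Means:    $s_{M_D} = \sqrt{\frac{\sum (X_D - M_D)^2}{n(n - 1)}}$
    Independent Means:  $s_{M_1 - M_2} = \sqrt{s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
    Correlation:        $s_r = \sqrt{\frac{1 - r^2}{n - 2}}$

df
    Single Mean:        $n - 1$
    Dependent Means:    $n - 1$
    Independent Means:  $n_1 + n_2 - 2$
    Correlation:        $n - 2$

100(1-α)% Confidence Interval (σ known)
    Single Mean:        $M \pm Z_c \sigma_M$
    Dependent Means:    $M_D \pm Z_c \sigma_{M_D}$
    Independent Means:  $(M_1 - M_2) \pm Z_c \sigma_{M_1 - M_2}$
    Correlation:        .....

100(1-α)% Confidence Interval (σ unknown)
    Single Mean:        $M \pm t_c s_M$
    Dependent Means:    $M_D \pm t_c s_{M_D}$
    Independent Means:  $(M_1 - M_2) \pm t_c s_{M_1 - M_2}$
    Correlation:        .....

$H_0$
    Single Mean:        $H_0: \mu = \mu_0$
    Dependent Means:    $H_0: \mu_D = 0$
    Independent Means:  $H_0: \mu_1 - \mu_2 = 0$
    Correlation:        $H_0: \rho = 0$

$H_1$ (non-directional)
    Single Mean:        $H_1: \mu \neq \mu_0$
    Dependent Means:    $H_1: \mu_D \neq 0$
    Independent Means:  $H_1: \mu_1 - \mu_2 \neq 0$
    Correlation:        $H_1: \rho \neq 0$

$H_1$ (directional)
    Single Mean:        $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$
    Dependent Means:    $H_1: \mu_D > 0$ or $H_1: \mu_D < 0$
    Independent Means:  $H_1: \mu_1 - \mu_2 > 0$ or $H_1: \mu_1 - \mu_2 < 0$
    Correlation:        $H_1: \rho > 0$ or $H_1: \rho < 0$

Test statistic (σ known)
    Single Mean:        $Z = \frac{M - \mu_0}{\sigma_M}$
    Dependent Means:    $Z = \frac{M_D}{\sigma_{M_D}}$
    Independent Means:  $Z = \frac{M_1 - M_2}{\sigma_{M_1 - M_2}}$
    Correlation:        .....

Test statistic (σ unknown)
    Single Mean:        $t = \frac{M - \mu_0}{s_M}$
    Dependent Means:    $t = \frac{M_D}{s_{M_D}}$
    Independent Means:  $t = \frac{M_1 - M_2}{s_{M_1 - M_2}}$
    Correlation:        $t = \frac{r}{s_r}$
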


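Decision Rule
    Z-test
        Two tailed:                                   Reject $H_0$ if $|Z| \geq Z_c$
        One tailed (Rejection region in upper tail):  Reject $H_0$ if $Z \geq Z_c$
        One tailed (Rejection region in lower tail):  Reject $H_0$ if $Z \leq -Z_c$
    t-test
        Two tailed:                                   Reject $H_0$ if $|t| \geq t_c$
        One tailed (Rejection region in upper tail):  Reject $H_0$ if $t \geq t_c$
        One tailed (Rejection region in lower tail):  Reject $H_0$ if $t \leq -t_c$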


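Determining power
    Single mean:        $\delta = \gamma \sqrt{n}$
    Independent means:  $\delta = \gamma \sqrt{\frac{n}{2}}$
    Correlation:        $\delta = \rho \sqrt{n - 1}$

Determining sample size
    Single mean:        $n = \left( \frac{\delta}{\gamma} \right)^2$
    Independent means:  $n = 2 \left( \frac{\delta}{\gamma} \right)^2$
    Correlation:        $n = \left( \frac{\delta}{\rho} \right)^2 + 1$

Size of effect (small, medium, large)
    Single mean:        .2 .5 .8
    Independent means:  .2 .5 .8
    Correlation:        .1 .3 .5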
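The power and sample size formulae above work together with Tables 3 and 4: delta is read from Table 4 for the desired power and alpha, then converted to a sample size. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, results are rounded up to the next whole participant (a common convention, not stated in the formula sheet), and n for independent means is read as the size of each group.

```python
import math


def delta_single_mean(gamma, n):
    """Delta for a single-mean test with effect size gamma and sample size n."""
    return gamma * math.sqrt(n)


def n_single_mean(delta, gamma):
    """Sample size for a single-mean test, rounded up."""
    return math.ceil((delta / gamma) ** 2)


def n_independent_means(delta, gamma):
    """Sample size per group for an independent-means test, rounded up."""
    return math.ceil(2 * (delta / gamma) ** 2)


def n_correlation(delta, rho):
    """Sample size for a test of a correlation, rounded up."""
    return math.ceil((delta / rho) ** 2 + 1)


# Delta for power = .80 with alpha = .05 two-tailed, read from Table 4.
DELTA_POWER_80 = 2.802

# Medium effect sizes from the formula sheet: .5 for means, .3 for correlation.
print(n_single_mean(DELTA_POWER_80, 0.5))
print(n_independent_means(DELTA_POWER_80, 0.5))
print(n_correlation(DELTA_POWER_80, 0.3))
```

Going the other way, `delta_single_mean` gives the delta for a planned sample size, which can then be looked up in Table 3 to read off the approximate power.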

Prediction Equation: $Y' = bX + a$, where $b = r \frac{s_Y}{s_X}$ and $a = M_Y - bM_X$

Standard error of estimate:
For small samples: $s_{Y.X} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = s_Y \sqrt{\frac{(1 - r^2)(n - 1)}{n - 2}}$
For large samples: $s_{Y.X} = s_Y \sqrt{1 - r^2}$

Standardised Residual: $Z_{residual} = \frac{Y - Y'}{s_{Y.X}}$

$\chi^2$ goodness of fit statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$ and df = number of categories - 1

$\chi^2$ test of independence statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$ and df = (rows - 1)(columns - 1); $f_e = \frac{\text{row total} \times \text{column total}}{N}$

Cramer's $\phi = \sqrt{\frac{\chi^2}{N(s - 1)}}$, where s = smaller of rows or columns





GLOSSARY OF SYMBOLS

GREEK LETTERS                          DESCRIPTION
$\alpha$            alpha              Level of significance - probability of Type I error
$\beta$             beta               Probability of Type II error
$1 - \beta$                            Power
$\gamma$            gamma              Effect size (in units of $\sigma$)
$\delta$            delta              Effect size (in units of $\sigma_M$)
$\mu$               mu                 Population mean
$\mu_D$                                Mean of sampling distribution of $M_D$
$\mu_1 - \mu_2$                        Mean of sampling distribution of $M_1 - M_2$
$\rho$              rho                Population correlation coefficient
$\sigma$            sigma              Population standard deviation
$\sigma^2$          sigma squared      Population variance
$\sigma_M$                             Standard error of the mean
$\sigma_M^2$                           Sampling variance
$\sigma_{M_D}$                         Standard error of mean differences
$\sigma_{M_1 - M_2}$                   Standard error of difference between independent means
$\Sigma$            uppercase sigma    Summation sign
$\phi$              phi                phi coefficient
$\chi^2$                               chi square statistic


ENGLISH LETTERS                        DESCRIPTION
a                                      Y intercept of a regression line
b                                      Slope of a line; regression coefficient
df                                     Degrees of freedom
e                                      Error in regression
$f_e$                                  Expected frequency
$f_o$                                  Observed frequency
$H_0$                                  Null hypothesis
$H_1$                                  Alternative hypothesis
$M$ or $\bar{X}$                       Sample mean
n or N                                 Sample size











p                                      Probability or proportion
r or $r_{xy}$                          Pearson product-moment correlation coefficient
$r^2$                                  r squared - proportion variance accounted for
S                                      Sample standard deviation (descriptive)
$S^2$                                  Sample variance (descriptive)
s                                      Sample standard deviation, estimate of $\sigma$
$s^2$                                  Sample variance, unbiased estimate of $\sigma^2$
$s_{pooled}^2$                         Sample variance of observations within groups
$s_M$                                  Estimate of $\sigma_M$
$s_M^2$                                Estimate of $\sigma_M^2$
$s_{M_D}$                              Estimate of $\sigma_{M_D}$
$s_{M_1 - M_2}$                        Estimate of $\sigma_{M_1 - M_2}$
$s_{Y.X}$                              Standard error of estimate
SS                                     Sum of squared deviations around mean
t                                      Student's t statistic
$t_c$                                  critical value of t
X                                      Raw score (obtained score)
$X_D$ or D                             Difference score ($X_1 - X_2$)
x                                      Deviation score (X - M)
Y                                      Observed score on the criterion variable
$\hat{Y}$ or $Y'$                      Predicted Y score
Z                                      Standard score
$Z_c$                                  critical value of Z