Statistics III
Jan Schepers
IPN/PSY3008
Period 4
Contents
Askpsy.nl is the website for FPN faculty information, FAQs, and contact options.
Note that the 2022/23 course may be subject to COVID-19 related restrictions, and depending on the
situation we may need to adjust our way of teaching. Please contact your tutor, course
coordinator or mentor whenever you have questions or remarks related to the teaching
format. We are grateful for your input and flexibility in these times.
FPN regards behaviour in compliance with its core values as being of great importance. A
Code of Conduct has been developed to ensure a good and productive study environment
and to avoid undesirable and unwanted situations.
1
3. Attendance
The tutor registers your presence. Be aware that, if you arrive more than 10 minutes after
the official starting time of the meeting or if you leave more than 10 minutes before the
actual ending time of the meeting, you are considered to be absent.
There is an attendance obligation with respect to the tutorial group meetings (IPN/PSY3008)
and the SPSS practicals (IPN/PSY3201). The Rules and Regulations of the
education programme define the attendance requirements for the tutorial meetings and the
practical meetings. Please see art. 5.8 EER and art. 7 Rules and Regulations Bachelor in
Psychology for the complete attendance rules (www.askpsy.nl/regulations). If you miss
more than the allowed number of tutorials for IPN/PSY3008, you will not meet the
attendance requirement and will have to take the course again next year. Note: In some courses, it is
possible to join a meeting in another tutorial group. Joining a meeting in another tutorial
group will however not be registered as part of your attendance obligation. You need to
pass all attendance requirements in your own tutorial group. If you miss one practical
SPSS (but not more than one), you may apply for a catch-up assignment to obtain your
attendance obligation for IPN/PSY3201. Contact the course coordinator for the catch-up
assignment.
If you miss a meeting, you may join this meeting in another tutorial group.
However, this is only allowed if the tutor of that group agrees to you joining a
tutorial group you are not officially registered for.
2
4. Covid-19
Due to COVID-19, this course may be offered online or partially online. FPN will comply with
the measures set by the Dutch government and Maastricht University. See
maastrichtuniversity.nl and askpsy.nl for latest information.
During COVID-19, you will still be assigned to groups at random, as in normal times. However,
the coordinator may reshuffle groups where necessary to configure them
optimally (for example, to find a balance between students who are actually
physically here in Maastricht and students who have to be online because they are still
abroad due to travel restrictions). Any reshuffling will be communicated by the coordinator
to the Education Office. The office will arrange a regrouping in the system too. This is
important, so that you and your peers always have correct timetable information. If
your timetable is not correct, please contact the Education Office.
3
B. Course Information
1. Course Planning Group
Nick Broers
Department of Methodology & Statistics,
Faculty of Psychology and Neuroscience,
Email: alberto.cassese@maastrichtuniversity.nl
Philippe Verduyn
Department of Work and Social Psychology,
Faculty of Psychology and Neuroscience,
Email: philippe.verduyn@maastrichtuniversity.nl
2. Course Description
The course will cover three methods: logistic regression, reliability analysis and factor
analysis.
Logistic regression is the counterpart of ANOVA and regression analysis for dichotomous
rather than continuous dependent variables. Examples include: recovering from an illness
and passing an exam. Using logistic regression it is possible to correct for confounding and
to investigate interaction if there are multiple independent variables and a dichotomous
dependent variable. It is therefore an extension of the contingency table analysis (covered
in Statistics 1), allowing multiple independent variables to be handled, of both the
categorical and continuous type. We will limit ourselves to between-subjects designs, because
logistic regression for repeated measures is too advanced a topic for the bachelor's programme.
Reliability analysis is a classic psychometric method for analysing tests and questionnaires.
Tests and questionnaires are used as measuring tools in many studies. This involves
assigning binary scores for the answers provided by respondents to a number of multiple-
choice questions (items) and adding up the scores to obtain an overall score for the
characteristic being measured (such as intelligence or attitude). This is done on the
assumption that each item measures the same characteristic. Reliability analysis is a tool for
verifying how well each item fits the scale and how reliable the overall score is. The course
offers training in classical test theory (psychometrics) and an introduction to modern
psychometrics. Other topics that will be addressed are the relationship between reliability
and validity, and the difference between reliability and agreement.
Factor analysis provides a method for reducing a large number of variables to a small
number of underlying factors. Historically, the application of factor analysis mainly focused
on reducing scores from a range of tests to a small number of dimensions, such as verbal
and spatial intelligence or extraversion and neuroticism. Nowadays, factor analysis is often
used to group items within one and the same questionnaire into subscales. Factor analysis
is therefore related to classical reliability analysis. This course offers training in exploratory
factor analysis using SPSS, focusing on the selection of the number of factors and on
rotation and interpretation. In the final part, participants are introduced to confirmatory
factor analysis.
The second part will deal with test theory. In week 3, the focus is on classical reliability
theory, measures of test reliability (Cronbach’s α, split-half, test-retest) and item analysis.
Week 4 uses the Rasch model to provide an introduction to modern test theory, with an
emphasis on the similarities and differences compared with classical theories and
techniques, and the advantages of modern theories and techniques.
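For orientation on week 3's main statistic: Cronbach's α can be computed from the item variances and the variance of the total score as α = k/(k−1) · (1 − Σ s²(item) / s²(total)). A minimal sketch in Python; the function name and the toy data are ours, purely illustrative:

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha from a persons x items matrix of item scores."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(item) for item in zip(*scores)]
    total_var = variance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Four persons, three perfectly consistent items -> alpha = 1.0
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
```

With real questionnaire data the items are never perfectly consistent, and α falls below 1.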
The third and last part begins in week 5 and first looks at factor analysis. The emphasis will
be on what is referred to as exploratory factor analysis and three choices that need to be
made in this regard: the extraction method, the number of factors and the method of
rotation. We will see that commonly used options are often not the best. In addition to this,
an outline of confirmatory factor analysis will be provided. In Week 6, the topics of validity
and agreement will form the conclusion of this course.
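As a toy illustration of the "number of factors" choice mentioned above: under the Kaiser criterion one retains factors whose eigenvalue exceeds 1, and for a 2x2 correlation matrix [[1, r], [r, 1]] the eigenvalues are simply 1 + r and 1 − r. A sketch under our own naming, not the course's prescribed procedure:

```python
def eigenvalues_2x2_corr(r):
    """Eigenvalues of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    return (1 + r, 1 - r)

def n_factors_kaiser(eigenvalues):
    """Kaiser criterion: retain factors whose eigenvalue exceeds 1."""
    return sum(1 for ev in eigenvalues if ev > 1)

# Two items correlating r = 0.6: eigenvalues 1.6 and 0.4, so one factor is retained
print(n_factors_kaiser(eigenvalues_2x2_corr(0.6)))
```

The lectures will show why commonly used defaults (such as this criterion) are often not the best choice.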
The teaching method for each component of the course will be a mix of lectures, tutorials,
SPSS practicals, response lectures (i.e., Q&A lectures) and formative tests. Each lecture will
discuss a statistical method using a real-world or simplified example. During the tutorial, the
relevant method will be repeated using general theory questions and applied using
calculation work on paper. During the practical, the method is then applied to another
example with the help of SPSS and the results will be discussed in the next tutorial. A
response lecture (Q&A lecture) will be held for each course component in order to provide a
safety net for any subject matter that was not discussed or understood. For each tutorial,
the course manual enables independent study through knowledge questions. The response
lectures will also be open to questions that arise after the student has practised
independently with the knowledge questions.
In addition to this, the following teaching materials will be made available via Canvas:
- two formative tests
- for each lecture, a PDF handout of the slides
- for each lecture, a summary
- Questions posted by students on Canvas > Discussions (the lecturer will reply on a
weekly basis)
5
3. Intended Learning Outcomes
Students:
- are able to explain relevant concepts central to this module, including confounding and
interaction, classical psychometrics, reliability, modern psychometrics, item response
theory, Rasch model, validity, agreement,
- are able to explain and apply specific statistical techniques, such as three-way
contingency table analysis, logistic regression, reliability analysis (including item
analysis) and exploratory factor analysis, and they can interpret relevant output of
these techniques
- are able to specify the assumptions of statistical techniques that were discussed in this
module and are able to apply this knowledge when analyzing data.
5. Course Schedule
Lectures:
6 Feb 08:30-10:30 Contingency tables, odds ratio, stratification,
confounding, interaction
14 Feb 08:30-10:30 Logistic regression
28 Feb 08:30-10:30 Classical psychometrics, reliability, item analysis
7 Mar 11:00-13:00 Modern psychometrics, the Rasch model, item
and test information
14 Mar 11:00-13:00 Factor analysis
23 Mar 08:30-10:30 Validity and agreement between assessors
6
Practical SPSS*:
9 Feb refer to your timetable Contingency table analysis
16 Feb refer to your timetable Logistic regression
7 Mar refer to your timetable Classical psychometrics
21 Mar refer to your timetable Factor analysis
Tutorials**:
7 Feb refer to your timetable tutorial 1
10 Feb refer to your timetable tutorial 2
15 Feb refer to your timetable tutorial 3
27 Feb refer to your timetable tutorial 4
3 Mar refer to your timetable tutorial 5
8 Mar refer to your timetable tutorial 6
13 Mar refer to your timetable tutorial 7
17 Mar refer to your timetable tutorial 8
22 Mar refer to your timetable tutorial 9
27 Mar refer to your timetable tutorial 10
7
Orlando: Harcourt Brace Jovanovich College Publishers. A more technical
introduction into psychometrics and factor analysis. See Chapter 15 on modern
psychometrics in particular.
The ‘SPSS in steps’ reader issued in the first bachelor year.
8. Examination/Assessment plan
The exam will consist of 24 questions with three answer options (2 or 3 questions for each
tutorial), with a number of questions based on an extract of SPSS output. To gain an
impression of the nature of the exam, a formative exam will be made available. You will
have three hours to complete the exam for this course. Sixteen or more correct answers
means a pass.
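For context, passing by pure guessing is very unlikely: with 24 questions, three options each and a pass mark of 16, the number of correct guesses is binomially distributed with n = 24 and p = 1/3. A back-of-the-envelope check (ours, not part of the official assessment rules):

```python
from math import comb

def prob_pass_by_guessing(n=24, p=1/3, pass_mark=16):
    """P(at least `pass_mark` correct answers) when every answer is a pure guess."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(pass_mark, n + 1))

print(f"{prob_pass_by_guessing():.4%}")   # well below 1%
```

The expected guessing score is only 24/3 = 8 correct, half the pass mark.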
The date and time of the exam inspection will be announced on Canvas on the date the
final results are submitted to the Exam Administration.
9. SPSS Practicals
This course is linked to the SPSS practical IPN/PSY3201. All assignments of that practical
are included in this syllabus.
8
C. Tasks
Tutorial 1: Statistics III
This tutorial will include two types of exercises: calculations and general theory questions. The
emphasis will be on the calculations. The aim is twofold: (1) learning how to calculate odds
ratios for 2*2 and 2*2*2 contingency tables, and (2) learning how to investigate confounding
and interaction for dichotomous dependent variables.
Calculations
Objective:
Practising the calculation and interpretation of odds ratios and log odds and checking for the
presence of interaction or confounding.
It is well known that many Psychology students find statistics a challenging topic.
Pre-university education and study effort are often pointed to as the causes: whereas statistics is an
exact science, psychology programmes attract many students with a non-scientific previous
education, and because statistics is not a popular topic, study efforts tend to be lower.
In a (fictitious) study we will verify the extent to which the probability of passing a statistics
exam is determined by previous education and study efforts. This results in the contingency table
below.
          Previous education Alfa            Previous education Beta
          (language and culture studies)     (exact sciences)
          Low effort    High effort          Low effort    High effort
Failed    30            20                   40            10
Passed    10            40                   20            30
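To check the hand calculations, recall that the odds ratio of a 2x2 table [[a, b], [c, d]] is the cross-product ratio (a·d)/(b·c). A minimal Python sketch using the counts above (the function name is ours):

```python
def odds_ratio(table):
    """Cross-product odds ratio for a 2x2 table [[a, b], [c, d]]: (a*d) / (b*c)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Stratum 'previous education Alfa': rows failed/passed, columns low/high effort
print(odds_ratio([[30, 20], [10, 40]]))   # -> 6.0
# Stratum 'previous education Beta'
print(odds_ratio([[40, 10], [20, 30]]))   # -> 6.0
```

An odds ratio of 1 would mean no association; values further from 1 mean a stronger association.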
9
a) First, set up a contingency table for the association between both predictors (previous
education and effort) without any distinction between fail/pass and then calculate the OR. Is this
a balanced or unbalanced design? Is there confounding?
b) Now draw a graph expressing the effect of effort on the probability of a pass, whilst keeping
previous education constant (or put differently, for each type of previous education individually).
Do this in a similar way to a two-way ANOVA. Plot the log odds for a pass, LN[P(pass)/P(fail)],
against the Y axis. Put effort on the X axis and draw a separate line for each type of previous
education.
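The four log odds needed for such a graph follow directly from the cell counts; a quick numerical sketch (Python, labels are ours):

```python
from math import log

# Log odds of a pass, LN[P(pass)/P(fail)], for each (previous education, effort) cell
log_odds = {
    ("Alfa", "low"): log(10 / 30),    # about -1.10
    ("Alfa", "high"): log(40 / 20),   # about  0.69
    ("Beta", "low"): log(20 / 40),    # about -0.69
    ("Beta", "high"): log(30 / 10),   # about  1.10
}
# Parallel lines in the plot mean equal slopes; here both slopes equal log(6)
slope_alfa = log_odds[("Alfa", "high")] - log_odds[("Alfa", "low")]
slope_beta = log_odds[("Beta", "high")] - log_odds[("Beta", "low")]
```

Note how each slope is the log of the corresponding within-stratum odds ratio.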
c) Calculate the odds ratio for the association between effort and the probability of a pass for
students with previous education Alfa on the one hand and for students with previous education
Beta on the other.
Code the variables as follows:
pass: 0=fail, 1=pass; effort: 0=low, 1=high; previous education:0=Alfa, 1=Beta.
Is there an interaction between previous education and effort? How is this reflected in the graph
for (b)? What is the exact relationship between the odds ratio and the slope of the lines in the
graph?
d) Now set up a contingency table for the association between effort and the probability of a pass
without distinguishing between the types of previous education, and calculate the odds ratio.
Compare this with question (c) and explain the difference.
e) Draw a graph expressing the effect of type of previous education on the probability of a pass,
whilst keeping effort constant (or put differently, for each level of effort individually). Plot the
log odds LN [ P(pass)/P(fail) ] against the Y axis. Put type of previous education on the X axis
and draw a separate line for each level of effort.
f) Calculate the odds ratio for the association between type of previous education and the
probability of a pass for students with a low effort level on the one hand and for students with a
high effort level on the other hand. Apply the same coding to the variables as in question (c).
Is there an interaction between previous education and effort? How is this reflected in the graph
for (e)? What is the exact relationship between the odds ratio and the slope of the lines in the
graph?
g) Now set up a contingency table for the association between type of previous education and
probability of a pass, without distinguishing between effort levels, and calculate the odds ratio.
Compare this with (f) and explain the difference.
10
          Previous education Alfa            Previous education Beta
          (language and culture studies)     (exact sciences)
          Low effort    High effort          Low effort    High effort
Failed    30            20                   40            10
Passed    20            30                   10            40
a) First, set up a contingency table for the association between both predictors (previous
education and effort) without any distinction between fail/pass and then calculate the OR. Is this
a balanced or unbalanced design? Is there confounding?
b) Now draw a graph expressing the effect of effort on the probability of a pass, whilst keeping
previous education constant (or put differently, for each type of previous education individually).
Plot the log odds of a pass against the Y axis. Put effort on the X axis and draw a separate line
for each type of previous education.
c) Calculate the odds ratio for the association between effort and the probability of a pass for
students with previous education Alfa on the one hand and for students with previous education
Beta on the other.
Apply the same coding to the variables as in assignment 1.
Is there an interaction between previous education and effort? How is this reflected in the graph
for (b)? What is the exact relationship between the odds ratio and the slope of the lines in the
graph?
d) Now set up a contingency table for the association between effort and probability of a pass
without distinguishing between the types of previous education, and calculate the odds ratio.
Compare this with question (c) and explain the difference.
e) Draw a graph expressing the effect of the type of previous education on the probability of a
pass, whilst keeping effort level constant (i.e., for each level of effort individually). Plot the log
odds against the Y axis, put type of previous education on the X axis and draw a separate line for
each level of effort.
f) Calculate the odds ratio for the association between type of previous education and the
probability of a pass for students with a low effort level on the one hand and for students with a
high effort level on the other hand. Apply the same coding to the variables as in question (c).
Is there an interaction between previous education and effort? How is this reflected in the graph
for (e)? What is the exact relationship between the odds ratio and the slope of the lines in the
graph?
g) Now set up a contingency table for the association between the type of previous education and
a pass without distinguishing between effort levels, and calculate the odds ratio. Compare this
with (f) and explain the difference.
11
h) For each contingency table from assignment 1 and 2, indicate whether the table is plausible in
an actual study of previous education, effort and the probability of a pass and if so, explain why.
i) Let’s assume that the study from assignment 1 and 2 is carried out in practice. Which
independent variables would you add to the current set of two and why?
Let’s assume we are investigating the association between variable X and variable Y, such as
effort and performance or lifestyle and health. For each instance below, indicate what figure or
table and what technique and measure you would use to analyse the association:
- X and Y are both quantitative variables (continuous and interval or ratio measurement)
- X and Y are both dichotomous (0/1)
- X is dichotomous and Y is continuous
- X is continuous and Y is dichotomous
12
SPSS practical 1: Statistics III
Objective:
During this practical, we will apply the methods covered in tutorial 1 to other examples and with
the help of the computer. In addition to odds ratios, we will now also obtain standard errors and
tests of significance. This means we will no longer limit ourselves to descriptive statistics
but will also draw conclusions about population parameters.
Please note that a set of specific SPSS instructions for the following assignments can be found on
page 16.
In his book "The paradoxicon" (Wiley, 1990), Falletta discusses a number of instructive
examples of contingency table analysis. One of them relates to the probability of recovery among
the male and female patients of two doctors. The contingency table below provides a summary of
the data. Based on statistical analyses that we will replicate next, researcher I concludes that men
have a higher probability of recovery compared to women whereas researcher II concludes there
is no difference in the probability of recovery between the sexes.
Contingency table: recovery by sex, separately for Doctor A and Doctor B (see STAT3PR12a.SAV).
This contingency table can be found in the STAT3PR12a.SAV file and has the following
structure:
There are 8 records, with each record containing one of the eight cells from the contingency
table. The variables are: recovery, sex, doctor, frequency. The last variable indicates the number
of people in the relevant cell. This is a compact method for storing data that can only be used for
a small number of non-continuous variables.
Before starting the analysis, the user must indicate in SPSS that the file does not simply contain 8
persons, but that each record in the working memory must be multiplied by the frequency before
the analyses are carried out.
13
In SPSS, the option Data-Weight cases is used to specify this (select ‘Weight cases by’ and
choose the variable ‘frequency’).
Now it is possible to do a contingency table analysis. For checking purposes, the total N for the
following analyses must be 1600, not 8.
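The weighting step has the same effect as expanding each cell record into `frequency` copies. Outside SPSS, the idea can be sketched in Python; the cell counts below are made-up placeholders, not the course data:

```python
# One record per cell: (recovery, sex, doctor, frequency); frequencies are placeholders
cells = [
    ("yes", "male", "A", 120), ("no", "male", "A", 80),
    ("yes", "female", "A", 60), ("no", "female", "A", 40),
    ("yes", "male", "B", 50), ("no", "male", "B", 150),
    ("yes", "female", "B", 100), ("no", "female", "B", 200),
]
# Expand: each record appears `frequency` times, as SPSS's Weight Cases does implicitly
expanded = [(rec, sex, doc) for rec, sex, doc, n in cells for _ in range(n)]
print(len(cells), len(expanded))   # 8 records expand to the total N
```

In SPSS the expansion is only implicit: the 8 records stay in the file, but every analysis counts each record `frequency` times.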
a) Calculate the contingency table of sex (columns) against recovery (rows) in SPSS (Analyze >
Descriptive Statistics > Crosstabs). Also request the following output: observed and expected
counts (see the tab ‘Cells…’), the recovery percentage for each sex (see the tab ‘Cells…’), as
well as the χ2 test, correlation and odds ratio (see the tab ‘Statistics …’; For the odds ratio, select
‘Risk’). What is your conclusion regarding the difference in the probability of recovery between
the sexes?
b) Repeat the analysis, now using doctor as the stratifying factor (place the variable ‘doctor’ in
the window ‘Layer 1 of 1’). As additional output, request the Mantel-Haenszel test for the
common odds ratio (see the tab ‘Statistics …’) and answer the following questions:
- Which null hypotheses do the ‘Chi-square Tests’ tables evaluate now?
- What does the ‘Risk Estimate’ table mean?
- Which null hypothesis is tested in the ‘Tests of Homogeneity of the Odds Ratio’ table and what
is your conclusion from this test?
- Which null hypothesis is tested in the ‘Tests of Conditional Independence’ table and what is
your conclusion from this test?
- Which null hypothesis is tested in the 'Mantel-Haenszel common odds ratio estimate' table and
what effect is being estimated?
- In what order should these five tables be read and what are the circumstances in which each
table is relevant?
- What is your conclusion regarding the difference between the sexes?
- How large is the observed effect of the patient’s sex in the random sample? What is the range
of plausible values of this effect in the population, given the data?
c) Now run the two following contingency tables with the same additional output as for question
1a:
- doctor (columns) * recovery (rows)
- doctor * recovery with the patient’s sex as stratifying factor (+ Mantel-Haenszel)
Now answer the following questions:
- Which of these two contingency tables is the most important and why is this?
- What conclusions can you derive from these tables regarding the impact of doctor on recovery?
e) Using the three contingency tables in (c) and (d), explain the different conclusions in the
analyses for (a) and (b). Which researcher is right? And which has the bigger impact, the
patient’s sex or the doctor?
14
Assignment 2. It all depends on your approach.
Chronic fatigue?   Low stress       High stress      Low stress       High stress
                   levels at work   levels at work   levels at work   levels at work
No                 60               40               30               10
Yes                40               60               20               40
This contingency table can be found in the STAT3PR12c.SAV file and has the following
structure:
There are 8 records, with each record containing one of the eight cells from the table. The
variables are: fatigue, stress, personality type and number (of respondents).
The file structure is therefore the same as for assignment 1. Before starting the analysis, specify
in SPSS that each record must be weighted by the ‘number’ variable (using Data - Weight cases).
For checking purposes, the total N for each analysis must be 300, not 8.
a) Request the contingency table for work stress (columns) and fatigue (rows). Also request the
observed and expected cell frequencies and relevant percentages, as well as the χ2 test,
correlation and odds ratio. What is your conclusion regarding the effect of work stress on chronic
fatigue?
b) Repeat the analysis, now using personality type as the stratifying factor. As additional output,
request the Mantel-Haenszel test for the common odds ratio. For each combination of ‘work
stress’ and ‘personality type’, compute the log odds of fatigue by hand. Then draw on a piece of
paper a plot of these log odds against work stress, with a separate line for each personality type:
- What is the relationship between the odds ratio and these lines?
- Can you see an interaction between work stress and personality type? If so, is this interaction
significant?
c) Now request the contingency table for personality type and fatigue, along with any relevant
additional output (refer to question (a)). What is your conclusion regarding the effect of
personality type on chronic fatigue?
15
d) Repeat the analysis of question (c), now using work stress as the stratifying factor. Request the
Mantel-Haenszel test as additional output once more. Plot the log odds for fatigue against
personality type, with a separate line for each level of work stress. What is your interpretation of
this pattern? What is the significance, if any?
e) Finally, request the contingency table for both independent variables and exclude the
dependent variable. Is the design unbalanced? What are the implications of any unbalance for the
effect analysis? Draw a distinction between the presence and absence of interaction!
f) Which odds ratios and p-values would you report based on all the analyses in this assignment?
Give reasons for your answer (Tip: interaction? Confounding?)
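For reference, the Mantel-Haenszel common odds ratio that SPSS reports can be reproduced by hand: OR(MH) = Σ(a·d/n) / Σ(b·c/n) over the strata. A sketch using the fatigue-by-stress counts from the table above, one stratum per table half (personality type); note that pooling into a common odds ratio is only sensible when the stratum odds ratios are homogeneous, which is exactly what the homogeneity test checks:

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio from a list of 2x2 strata [[a, b], [c, d]]."""
    num = sum((a * d) / (a + b + c + d) for (a, b), (c, d) in strata)
    den = sum((b * c) / (a + b + c + d) for (a, b), (c, d) in strata)
    return num / den

# Rows: fatigue yes/no; columns: high/low work stress; one stratum per personality type
strata = [
    [[60, 40], [40, 60]],   # stratum 1: OR = (60*60)/(40*40) = 2.25
    [[40, 20], [10, 30]],   # stratum 2: OR = (40*30)/(20*10) = 6.0
]
print(mantel_haenszel_or(strata))   # -> 3.0
```

Because the two stratum odds ratios differ markedly here, the pooled value should be interpreted with caution, which is precisely the point of questions (b) and (f).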
SPSS instructions for assignment 1:
a) Specify in SPSS that the ‘frequency’ variable must be used as the weighting (via Data -
Weight cases).
Now run the contingency table analysis:
Select: Analyze - Descriptive Statistics - Crosstabs.
Fill in the rows (recovery) and columns (sex) and leave layers blank
Use the Statistics and Cells buttons to select the desired output
(Note: use the Risk box to obtain the odds ratio).
Skip the Exact and Format buttons.
Check that the total N in the output is equal to 1600 rather than just 8. Otherwise the weighting
based on the variable ‘frequency’ has not worked.
b) As under (a), but specify doctor as the variable for the layers. In the Statistics section, request
the Mantel-Haenszel test for a common odds ratio of 1.
c) As under (a) and (b), but switch sex and doctor around.
16
Tutorial 2: Statistics III
This tutorial is dedicated in its entirety to the discussion of the results of the SPSS contingency
table practical. Any remaining time can be spent on any assignments from the previous tutorials
that have not been discussed yet.
1. For the relevant assignment, first review the global structure of the SPSS output.
2. Then discuss the output using the questions for this practical as set out in this course
textbook.
3. Any questions or issues that remain unclear can be raised during the Q&A session.
17
Tutorial 3: Statistics III
This tutorial is similar to tutorial 1. We will first perform two calculations and then continue with
a general theory question. The calculations are intended to provide insight into the relationships
between logistic regression weights on the one hand and odds ratios on the other hand, and this
with and without correcting for confounding, and as main effect or a simple effect.
If these relationships are not understood, the SPSS output for logistic regression (next practical
and tutorial) will be impossible to grasp. During this tutorial, we will for now limit ourselves to
descriptive statistics once again. We will return to inferential statistics in tutorial 4.
The starting point is the contingency table below, taken from assignment 1 in tutorial 1. This
related to a fictitious study of the relationship between previous education, effort and the
probability of a pass for Psychology students taking a statistics exam.
          Previous education Alfa            Previous education Beta
          (language and culture studies)     (exact sciences)
          Low effort    High effort          Low effort    High effort
Failed    30            20                   40            10
Passed    10            40                   20            30
a) Begin by calculating the odds ratio for the association between effort and the probability of a
pass for each level of previous education. Next, also calculate the odds ratio for the association
between previous education and the probability of a pass for each effort level.
b) Let’s assume we apply logistic regression to these data, using the probability of a pass as the
dependent variable and previous education, effort and previous education*effort as the
predictors. Define the regression model for this study, using the equation
log odds = β0 + β1·X1 + β2·X2 + β3·X1·X2, where X1 = previous education and X2 = effort.
c) For each level of previous education and effort, define the log odds for the probability of a
pass in the form of betas, as demonstrated during the lecture. Use the following coding:
pass: 0 = fail, 1 = pass; effort: 0 = low, 1 = high; previous education: 0 = Alfa, 1 = Beta.
d) Using the contingency table, calculate the log odds for each combination of previous
education and effort. Now plot the log odds (Y axis) against effort (X axis) for each level of
previous education and do the same for the log odds against previous education for each effort
level.
e) Use the results under (c) and (d) to calculate the best estimate for each regression weight (see
lecture slides 35 and 43 for an example). In the plots for (d), indicate which regression weight
corresponds to which slope. What does the regression weight represent, in terms of probabilities
or log odds?
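The estimation logic follows from filling in the dummy codes: each cell's log odds equals a sum of betas, so the betas are differences of cell log odds. A numerical sketch for the table of this tutorial (Python, variable names ours):

```python
from math import log

# Log odds of a pass per (previous education, effort) cell, from the cell counts
lo_alfa_low = log(10 / 30)    # = b0
lo_beta_low = log(20 / 40)    # = b0 + b1
lo_alfa_high = log(40 / 20)   # = b0 + b2
lo_beta_high = log(30 / 10)   # = b0 + b1 + b2 + b3

b0 = lo_alfa_low
b1 = lo_beta_low - b0
b2 = lo_alfa_high - b0
b3 = lo_beta_high - (b0 + b1 + b2)   # interaction weight; here essentially 0
```

That the interaction weight comes out at (essentially) zero matches the parallel lines in the plots of question (d).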
f) Review question (c). Express the odds ratio for the association between effort and the
probability of a pass for students with a previous education in Alfa studies as regression weights.
Do the same for students with a previous education in Beta studies (for an example, refer once
more to lecture slides 35 and 43).
g) Calculate the odds ratios referred to in (f) by filling in the beta estimates from question (e).
Compare your results with the odds ratios calculated in question (a).
h) Repeat (f) and (g) for the association between previous education and the probability of a pass
for each effort level.
i) Explain why in this assignment we have calculated the odds ratios using logistic regression, if
the same can also be obtained simply on the basis of the contingency table as in question (a). Put
differently, how are contingency table analysis and logistic regression similar and how do they
differ?
j) How can this regression model be simplified without compromising on ‘goodness of fit’ (that
is, without reducing the match with the data)?
The starting point is the contingency table below, taken from assignment 2 in tutorial 1.
Now answer the same questions as in assignment 1d to 1h in the current tutorial.
          Previous education Alfa            Previous education Beta
          (language and culture studies)     (exact sciences)
          Low effort    High effort          Low effort    High effort
Failed    30            20                   40            10
Passed    20            30                   10            40
19
Assignment 3. Interaction and main effects too?
a) Let’s assume that when studying the effects of effort and previous education on the probability
of passing statistics, you found a significant interaction between effort and previous education.
How should you go about analysing the effect of effort on the probability of passing?
20
SPSS practical 2: Statistics III
Subject: SPSS logistic regression (bring the output from practical 1!)
Objective:
During this practical, we will use the computer to apply the methods covered in tutorial 3 to
other examples, so that we will now also obtain standard errors and tests of significance. This
means we are switching back from descriptive statistics to inferential statistics.
We will compare the SPSS output for logistic regression with the SPSS contingency table
analyses from the first practical (as discussed in tutorial 2). The aim is to gain a good
understanding of the exact relationship between the two analysis methods and the outcomes they
produce.
Please note that a set of specific SPSS instructions for the following assignments can be found on
page 24.
Now carry out a logistic regression analysis using recovery as dependent variable. Build the
model using three blocks: First, select only sex as the predictor, then add doctor and finally the
sex*doctor interaction. Also request 95% confidence intervals for the odds ratios and the
Hosmer-Lemeshow goodness of fit test.
Questions:
a) Begin by reviewing the output in its entirety. Which model is being fitted to the data in each
block? For block 3, now review the tables listed below, check which null hypothesis is being
tested in each table and what conclusion can be derived from this for the relevant block/model:
Omnibus Tests of Model Coefficients – Hosmer and Lemeshow Test – Variables in the
Equation. Then repeat these steps for block 2 and subsequently for block 1.
c) Compare the output for each model (= block) with the contingency tables in assignment 1 of
practical 1. Begin by checking which model corresponds with which analysis from practical 1.
Next, attempt to match the output of each regression model to one or more tables from practical
1. Use both the odds ratios and the p-values when you do this.
21
d) What conclusions can be drawn regarding the effects of sex (of the patient) and doctor on the
probability of recovery? Take account of the p-values, as well as the odds ratios and the
corresponding confidence intervals.
e) Let’s assume the selected model will be used to predict for each patient in the random sample
whether he or she will recover, based on their sex and doctor. How can you use this model to
obtain the best possible prediction? (Tip: How can you calculate the probability of recovery for
each individual? And how can you then convert this probability into a prediction?)
f) The result of these predictions is shown in the Classification Table. Which percentage of
patients had the correct prediction? Is this a high or a low percentage? (Tip: what percentage can
be achieved by guessing if the only fact that is known is that 50% in total will recover?)
Questions:
a) Let’s assume we select the model that includes interaction. Express this model in the form of a
regression equation. Complete the equation for each group (work stress * personality type), using
the coding of each variable. Use the table below to do this. For each group, you will now obtain
the log odds for fatigue, expressed as regression weights or betas. Refer to tutorial 3, assignment
1b and 1c, for an example.
Work stress   Personality type   Work stress *      Log odds expressed as         Log odds as reported in the
                                 personality type   regression weights (betas)    output for assignment 2 of
                                                                                  practical 1
Low (0)       B (0)              0*0 = 0            …                             …
Low (0)       A (1)              0*1 = 0            …                             …
High (1)      B (0)              1*0 = 0            …                             …
High (1)      A (1)              1*1 = 1            …                             …
22
b) Now fill in the SPSS estimates of the regression weights in the equations for 2a). This
expresses the log odds for each group in numerical terms (example: tutorial 3, assignment 1d-
1e). Compare the results with those of assignment 2 of practical 1.
c) Use SPSS to calculate the log odds (of fatigue) for each individual based on the probability of
fatigue (which you previously calculated and saved during the logistic regression as variable
PRE_1 using Save). Do this by using COMPUTE to create a new variable called ‘logodds’,
where logodds = LN (PRE_1/(1 – PRE_1)). Next, request a graph that plots this new variable
against work stress, with a separate line for each personality type. Then request another graph to
plot the new variable against personality type, with a separate line for each level of work stress.
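The COMPUTE step above is simply the logit transform of the saved probability. A one-line Python equivalent, for checking a value or two by hand:

```python
import math

def log_odds(p):
    """Mirror of the COMPUTE step: logodds = LN(PRE_1 / (1 - PRE_1))."""
    return math.log(p / (1.0 - p))
```

Note that the log odds are 0 at p = 0.5, negative below it and positive above it, which is useful when reading the requested graphs.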
d) Compare the lines in both graphs with your calculations for question 2a. What is the
relationship between each line and the betas in the regression model for question 2a? What
conclusions can you derive from the graphs with regard to the effects of work stress and
personality type?
e) How does SPSS convert the log odds of fatigue to a prediction for each individual (where 1=
fatigued, 0 = not fatigued)? Compare your thoughts with the fatigue prediction that you
calculated and saved during the logistic regression in SPSS through the use of Save. Next,
request a contingency table of the observed fatigue (row) and the predicted fatigue (column). In
other words, this is a contingency table of the dependent variable against the saved predicted
value which was retained using Save. Compare this contingency table with the Classification
Table for the regression model with interaction.
f) Use the outcomes of question a) to manually calculate the following odds ratios: the odds ratio
for the effect of work stress on fatigue in people with a Type B personality and the same odds
ratio for people with a Type A personality. Compare the results with assignment 2b of practical
1.
g) Repeat question f) for the effect of the personality type on fatigue for each level of work
stress. Compare the results with assignment 2d of practical 1.
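For questions f) and g), the manual calculation follows directly from the 0/1 dummy coding in the table for 2a. As a sketch (the beta values themselves are not given here and must be taken from your SPSS output):

```python
import math

def simple_effect_odds_ratios(b_stress, b_interaction):
    """Odds ratios for the effect of work stress on fatigue per
    personality type, in a model with a stress*type interaction.
    With 0/1 dummy coding:
      Type B (coded 0): OR = exp(b_stress)
      Type A (coded 1): OR = exp(b_stress + b_interaction)
    """
    return math.exp(b_stress), math.exp(b_stress + b_interaction)
```

The same pattern, with the roles of the two predictors swapped, gives the simple effects of personality type per level of work stress.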
h) Let’s assume we select the model with interaction. Which follow-up analysis could we use to
estimate and test the effects of work stress and personality type? Carry out this analysis with
logistic regression. Compare the results with those for question f) and g), as well as those for
assignment 2 of practical 1.
i) Let’s assume we select the model without interaction. Compare the resulting main effects
(odds ratios and p-values) with assignment 2 of practical 1. Explain how the simple effects in
question h) could be used to roughly calculate these main effects.
j) Which of the models in the output do you prefer and why? Do not only consider the p-values
of the predictors, but also look at the Hosmer-Lemeshow Test and the Classification Table.
23
SPSS instructions for practical 2
Assignment 1
First check the weighting based on ‘frequency’. The total N must be 1600. If N = 8 in your
output, the weighting has not been applied. In this case, you must first enable the weighting
using Data – Weight Cases.
Assignment 2
As for assignment 1, using a different file. Under Save, tick both predicted values.
2h) Select Data – Split file – Organize output by groups, groups based on personality type and
click OK. Then carry out the logistic regression using work stress as a predictor for fatigue. Now
undo the file split using Data – Split file – Analyze all cases. Repeat these steps for the simple
effects of personality type by using work stress to split the file.
24
Tutorial 4: Statistics III
This tutorial is dedicated in its entirety to the discussion of the results of the SPSS logistic
regression practical. Any remaining time can be spent on any assignments from the previous
tutorials that have not been discussed yet.
1. For the relevant assignment, first review the global structure of the SPSS output.
2. Then discuss the output using the questions for this practical as set out in this course
textbook.
3. Any questions or issues that remain unclear can be raised during the Q&A session.
25
Knowledge questions part 1: contingency tables and logistic regression
Contingency tables
1. What type of graph or table is used to represent the correlation between two continuous
variables, such as intelligence and income? Which statistical measure and test correspond with
this?
2. Answer the questions under 1 for the correlation between two dichotomous or binary
variables, such as gender and being in paid employment.
3. Let’s assume we have a contingency table for treatment (yes/no) and recovery (yes/no).
Formulate a null hypothesis to establish a difference and a null hypothesis to establish a
correlation. How do these hypotheses relate to each other?
4. How does the contingency table calculate the frequencies expected under the null hypothesis?
5. Which two assumptions are made in a Chi-square test for a contingency table?
6. If both variables are dichotomous, Pearson’s correlation coefficient can be rewritten to which
measure of association?
7. What is the formula for calculating the odds ratio as a measure of association for a 2*2 table?
8. What is the relation between the odds ratio, the log of the odds ratio and the correlation? What
value must each of these three measures have in order for the association to be negative, positive
or absent?
9. Let’s assume we split a 2*2 table for variables X and Y into the levels for C, a third variable.
We then perform the Mantel-Haenszel test to establish the so-called ‘common odds ratio’. Which
null hypothesis is being tested in this case? Which assumption must be applied in order for this
test to be meaningful?
10. How does one determine that an interaction between X and C exists in the sample? And what
part of the SPSS output for contingency table analysis tests whether there is interaction in the
population?
11. What is the correct follow-up analysis for the effect of X on Y if interaction is present? And
if there is no interaction?
26
Logistic regression
15. What is the principal difference between linear and logistic regression?
16. In the case of ANOVA and regression (Statistics 2), the expected value of continuous
variable Y is modelled as the sum total of a constant + main effects + interactions. In logistic
regression, Y is dichotomous. What is being modelled now as the sum total of the effects?
17. How large is ln(X) if X is 1, has a value between 0 and 1, or is greater than 1? How large are
the log odds if the probability is 50%, less than 50% or more than 50%?
19. Assume we perform logistic regression using a single predictor, X, and X is dichotomous.
How do we interpret regression weight B for X in this case? And how do we interpret exp(B)?
20. How large is the odds ratio for the effect of dichotomous variable X on dichotomous variable
Y if the regression weight B for X is 0, negative or positive?
22. Answer the question in 20 if the model contains additional predictors (no interactions).
23. Let’s assume that the logistic model for dichotomous variable Y (dementia: 0 = no, 1 = yes)
includes predictors X (sex: 0=male, 1=female) and C (age in years), and there is no interaction.
The regression weight (B) for X is significantly negative. Will the odds ratio be greater or
smaller than 1? And which of the sexes will have a higher probability of dementia?
24. Suppose we repeat the analysis in the previous question with the coding for the sexes
reversed: 0=female, 1=male. Which B value will you find now, and what odds ratio?
25. Suppose we repeat the analysis in the previous question with the coding for both the sexes
and dementia reversed: 0 = yes, 1 = no. Which B value will you find now, and what OR?
26. Assume the model in question 23. What is the formula for the 95% confidence interval of the
odds ratio (as a measure of the effect of gender on dementia)?
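The confidence interval asked about in question 26 is constructed on the log-odds scale and then exponentiated. A minimal sketch, assuming a normal approximation for B with standard error SE(B):

```python
import math

def odds_ratio_ci(b, se_b, z=1.96):
    """95% CI for an odds ratio: build the interval for B on the
    log-odds scale, then exponentiate both endpoints."""
    return math.exp(b - z * se_b), math.exp(b + z * se_b)
```

Because the interval is symmetric around B on the log scale, it is asymmetric around exp(B) on the odds-ratio scale.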
27. Let’s assume the logistic regression of Y against X, C and X*C reveals a significant
interaction effect. How should we estimate and test the effect of X on Y in this case?
28. Answer the previous question if the interaction effect is not significant at all. How does the
method that needs to be applied in this case relate to the Mantel-Haenszel test of the common
odds ratio?
27
29. In the previous tutorials and practicals, logistic regression and contingency table analysis
yielded similar results. Despite the fact that logistic regression appears much more complicated
than contingency table analysis, this method is preferable in most cases. Why is this?
28
Tutorial 5: Statistics III
As before, this tutorial will include two types of exercises: calculations and general theory
questions. The emphasis will be on the calculations. The aim is to practise the correct application
and interpretation of classical methods for reliability estimation and item analysis when applied
to psychological tests and questionnaires.
Calculations
The test known as the ISI test (where ISI stands for Intelligence, Study progress and Interest) is
intended for pupils completing the last two years of their primary education. Among other things,
the ISI comprises six intelligence subtests, which each consist of 20 questions with four possible
answers. Subtest 5, titled ‘Understanding linguistic categories’, measures the ability to place
words into categories. The following item provides a basic example:
Information:
Monday – Wednesday – Saturday
Question:
Which two of the following words also belong in this list?
January – Tuesday – April – Sunday – evening
A reliability analysis of ISI-5 (20 items) has yielded the following information: the sum-score
has a mean of 12.57 and an SD of 4.74; the item-p-values gradually decrease from 0.90 (item 1)
to 0.15 (item 20); the average of all 20 item variances is 0.20; the average of all item-item
covariances is 0.05; the average of all item-item correlations is 0.25.
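The summary statistics above are enough to reproduce Cronbach’s alpha by hand. A minimal sketch of the standard formula (the variable name is ours, not from the manual):

```python
def cronbach_alpha(k, sum_item_variances, sum_score_variance):
    """Cronbach's alpha from the number of items, the sum of the item
    variances, and the variance of the sum-score."""
    return (k / (k - 1)) * (1 - sum_item_variances / sum_score_variance)

# Values reported for ISI-5: 20 items, average item variance 0.20,
# sum-score SD 4.74.
alpha_isi5 = cronbach_alpha(20, 20 * 0.20, 4.74 ** 2)
```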
Questions:
29
Assignment 2. Does high reliability translate to good agreement?
The case used in the previous assignment serves as the starting point. Here we will calculate two
scores for each pupil: the sum-score for odd items and the sum-score for even items. The sum-
score for odd items is 6.57 on average (SD = 2.54). The sum-score for even items is 6.00 on
average (SD = 2.49). The Pearson correlation coefficient between the even and odd sum-scores is
0.78. For each pupil, we also calculate the difference between the even and odd sum-scores. It
turns out this difference ranges from -5 to +6.
Questions:
a) Use this information to calculate the reliability of the sum-score for the entire subtest and
compare the outcome with that in the previous assignment.
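The calculation asked for in a) is an application of the Spearman-Brown formula to the reported odd-even correlation. A minimal sketch:

```python
def spearman_brown(r, k):
    """Reliability after lengthening a test by factor k, given
    reliability r of the original; k = 2 for two parallel halves."""
    return k * r / (1 + (k - 1) * r)

# Pearson correlation of 0.78 between the odd and even half-test
# sum-scores, as reported above.
full_test_reliability = spearman_brown(0.78, 2)
```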
b) Can the even and odd test halves be considered parallel tests? Explain.
c) Will there be a major difference for individual students as to which half of the test they take?
Explain.
d) How does your answer for question (c) impact the assessment of end-of-year coursework or
essays? (Tip: assume the even and odd test halves represent two assessors.)
The case used in the two previous assignments serves as the starting point. We will assume that
the test-retest reliability is 0.80 (for a retest interval of half a year). At both times, the mean sum-
score and SD are the same as stated in question 1. Marieke (11), a pupil at ‘Great Leap’
elementary school, only gets 12 out of 20 test items correct when she first takes the test. When
the test is taken for the second time, she scores 16 out of 20 after intensive individual tutoring.
Questions:
a) Has Marieke genuinely improved the skill that is measured in this test? Provide a calculation
to substantiate your answer. Tip: standard error of measurement (SEM).
b) Let’s assume the test-retest reliability is not known. Is it permitted in this case to answer
question a) using the previously calculated internal consistency or split-half reliability? Provide
arguments for your answer.
30
Assignment 4. About autonomy, nurses and items
In a study into the association between working conditions and health among nursing and care
staff, an autonomy scale was included. The scale consisted of 10 items in relation to autonomy in
the workplace, with each answer having the form of a Likert scale (from 1 - very little autonomy
to 5 - a great deal of autonomy). The next page shows a reliability analysis for this scale.
Questions:
a) From a statistical point of view, which item fits least well in the scale and which the best?
b) Cronbach’s alpha is 0.84 for the overall scale. Would you remove one or more items from the
scale? Why (or why not)?
c) What is the average item-item correlation for this scale? (Tip: the Spearman-Brown formula
also applies if K < 1, in other words for shortening rather than lengthening a test.)
d) The average sum-score for the sample was 27.4 (SD = 6). Calculate the standard error of
measurement for the sum-score and indicate the maximum deviation of the sum-score from the
true score for 95% of subjects.
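The quantities in d) follow from the standard error of measurement. A minimal sketch, using the figures reported for this scale:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM of the sum-score: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Autonomy scale: sum-score SD = 6, Cronbach's alpha = 0.84.
sem = standard_error_of_measurement(6, 0.84)
margin_95 = 1.96 * sem  # deviation from the true score not exceeded
                        # for 95% of subjects (normal measurement error)
```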
31
General theory questions
Assignment 5. Metaphysics
In contrast to actual measurements, it is not possible to directly observe true scores and
measurement errors. Despite this, we are still able to estimate the true score variance and the
measurement error variance. How?
Think about an intelligence test. Will the reliability of this test among the student population be
greater than, smaller than or equivalent to the reliability for the total population group of 18-25
year olds? Explain.
According to the Spearman-Brown formula for calculating the reliability of the mean or the sum
of a number of replications, reliability increases as the number of replications increases. With this
in mind, how is it possible that the removal of items from a scale may result in a higher rather
than a lower value for Cronbach's alpha?
Four methods for estimating reliability have been discussed. Explain which methods amount to
the same thing, and which do not. Suppose you can or wish to report on only two of these
measures in a paper, which two would you choose and why?
32
SPSS practical 3: Statistics III
Please note that a set of specific SPSS instructions for the following assignments can be found on
page 37.
In 1981, UM staff published a large-scale survey into health perception among adult Dutch
nationals. The questionnaire that was used contained a large number of scales and individual
items. One of these, the VOEG, asked about the presence of 21 physical complaints. The
answers from a random sample of N=200 persons from the original, much larger sample have
been saved in the stat3pr34a.sav file and provide the case for this assignment (for more
information, see Appendix C). We will use classical psychometric methods to analyse these
data.
However, data from questionnaires require pre-processing before psychometric analyses can be
carried out, as a result of missing values (MV) and in some cases the answer category ‘Not
Applicable’ (N/A). This is because both the MV and N/A category have the effect that a
dichotomous or ordinal item becomes nominal and therefore unsuitable for correlational analyses
such as reliability analysis. In this instance, we will limit ourselves to a simple initial check for
the presence and implications of missing values.
Questions:
a) Use SPSS to determine the number of missing values per item. If listwise deletion is applied,
how many of the 200 persons are excluded from the reliability analysis for this scale?
b) Will listwise deletion have a serious impact in this instance? (Provide arguments for your
answer.) How can exclusion of respondents from the analysis due to MVs be prevented?
c) Perform a reliability analysis for the entire scale of 21 items (without imputing any missing
data). Do not limit the output to Cronbach’s alpha, but request additional statistics for each item
and for the scale in its entirety.
d) Which health issue occurs the least and which the most?
e) Which health issue demonstrates the smallest spread and which the largest? What is the
relation between spread and average for dichotomous items? Why and when might a large spread
be desirable?
f) How large is the reliability for an arbitrary item from this scale according to the output?
33
g) Which assumptions have been made for the reliability statement in f)? Check these
assumptions to the extent that the output permits this. What is the impact on the analysis if these
assumptions are violated?
h) Use the Spearman-Brown formula to calculate Cronbach’s alpha on the basis of the item
reliability. Compare the outcome with the alpha provided in SPSS.
i) What are the estimated true variance and measurement error variance for the sum-score? If the
measurement error has a normal distribution, what are the margins of the measurement error
assuming a probability of 95%?
j) Decide which item fits the scale least well. What is the internal consistency of the scale
without this item? Using the SPSS Reliability procedure, calculate the split-half reliability for
the 20 remaining items. Compare the result with the internal consistency of the scale excluding
the item with the worst fit.
k) Now perform a split-half reliability analysis according to the odd-even method. Once again
exclude the item with the worst fit. The SPSS Reliability procedure is not able to do this. You
must therefore calculate the sum-score for the 10 even items and the sum-score for the 10 odd
items (excluding the item with the worst fit) yourself. Compare the outcomes of questions (j) and
(k). Which split-half method is used in the SPSS Reliability procedure? Which method is
generally preferred and why? That of SPSS or the odd-even method?
l) Which assumptions are made when applying the split-half method? (Tip: parallelism.) Check
whether these assumptions were met to a reasonable extent with regard to the odd-even method.
What is the impact on the split-half reliability if these assumptions are violated?
m) Starting point: we will assume that the following subsets of items from the VOEG are two
subscales: CDGHQ and JPRTU. A reliability analysis results in a Cronbach’s alpha of 0.68 for
subset 1 and 0.82 for subset 2. The correlation between both sum-scores is 0.57. What is the true
correlation between both subscales? Take a closer look at the content of each subscale. Is the
VOEG a homogeneous scale?
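The "true correlation" in m) is the correlation corrected for attenuation. A minimal sketch with the figures given above:

```python
import math

def disattenuated_correlation(r_xy, rel_x, rel_y):
    """Correction for attenuation: the correlation between the true
    scores, given the observed correlation and both reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Subscales CDGHQ and JPRTU: observed r = 0.57, alphas 0.68 and 0.82.
r_true = disattenuated_correlation(0.57, 0.68, 0.82)
```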
n) We will now assume that the following items from the VOEG are also a subscale: ABELNS.
What is measured in this subscale? What items have not been included in a scale yet? Where do
these items fit in in terms of their content? How well do they fit into the overall VOEG from a
statistical point of view?
34
Assignment 2. Intelligent measurement
Among other things, the ISI (Intelligence, Study progress and Interest) test for pupils completing
the last two years of their primary education includes six subtests, each of which has 20 items
scored dichotomously:
1: synonyms (verbal)
2: cut figures (spatial)
3: oppositions (verbal)
4: rotating figures (spatial)
5: understanding linguistic categories (categorising)
6: understanding categories of figures (categorising)
Each of these tests is taken in a classroom setting, subject to a time-limit, in other words the
pupils must stop when the time specified in the test guidelines has elapsed. The test score is the
number of correct answers and is a function of power (the proportion of correct answers) as well
as speed (the number of items completed). During the study that resulted in this file, only around
half of all pupils were able to complete each subtest within the time-limit. This variation in the
number of items completed renders psychometric analysis of tests subject to a time-limit
complicated (as an example, see Crocker & Algina, 1986, p. 145). For this reason, the current
stat3pr34b.sav file only contains those students who were able to complete all subtests within
the time-limit. Please note that a real analysis of tests subject to a time-limit must also include
the respondents that were not able to complete everything. Psychometric methods to evaluate this
are still being developed.
Questions:
a) Carry out a reliability analysis for ISI subtest 4 (see Appendix C for an example of an item.)
Request the output you will need for an item analysis and for manually calculating the true
variance and measurement error variance of the sum-score.
b) Which items seem the easiest and which the hardest? What role could the time-limit for the
test play in this?
c) Which item fits the best in the scale and which the least well?
d) Using classical test theory, estimate the reliability of a random item. Which assumptions apply
when you do this? Verify these assumptions using the SPSS output.
e) Use the Spearman-Brown formula to calculate Cronbach’s alpha using the item reliability and
compare the outcome with the alpha provided in SPSS.
f) What are the estimated true variance and measurement error variance for the sum-score? If the
measurement error has a normal distribution, what are the margins of the measurement error
assuming a probability of 95%?
35
g) What does this margin tell you about the replicability of the test score for these individuals?
Does a small margin imply that when the same children retake the test a few months later each
child will achieve roughly the same test score as they did the first time the test was taken?
Provide arguments for your answer. (Tip: parallel-test reliability vs. test-retest reliability.)
h) Starting point: Cronbach’s alpha for ISI subtest 5 is 0.87. The sum-scores for ISI-4 and ISI-5
have a correlation of 0.49.
Question: Express an opinion on each of the statements below, providing substantive and
statistical arguments:
(i) ISI-4 and ISI-5 measure two independent personal traits
(ii) ISI-4 and ISI-5 measure the same personal trait
36
SPSS instructions for practical 3
Assignment 1
c) Select Analyze - Scale - Reliability Analysis. Select all 21 items as the input variables. Under
the Statistics button, request all the information under Descriptives for and Summaries.
j) Click the Model button within the Reliability procedure and select ‘split-half’. Uncheck all the
additional output.
k) In the data screen, use Transform – Compute Variable to calculate two variables with 10
items each (make sure you exclude the item with the poorest fit, see question j)):
- SUMODD, the sum-score for the odd items (voega + voegc etc.)
- SUMEVEN, the sum-score for the even items (voegb + voegd etc.)
Next, calculate the correlation between the two sum-scores and compute the split-half reliability
of the test by hand using the Spearman-Brown formula.
Note:
If you add up items using + signs, you will only include respondents for whom there are no
missing values. Respondents with one or more missing values will be given a missing value as
their sum-score. As far as this practical is concerned, it is OK to use this addition method as there
are hardly any missing values. This also means the number of respondents is exactly the same as
for question (j), which enhances the comparability.
For practical applications, there are better but more complicated methods to calculate
sum-scores if there are missing values. A brief discussion of these is provided in the SPSS
manual on Reliability.
l) Carry out a paired t test for both sum-scores calculated in question (k).
In addition, carry out a reliability analysis (Model: alpha) for each of the scale halves, skipping
the optional output. Make sure you use the individual items from the relevant scale half as the
input for these reliability analyses, rather than the sum-scores calculated in question (k)!
Assignment 2
a) As for assignment (1c). Analyse only the items for ISI subtest 4!
37
Tutorial 6: Statistics III
This tutorial is dedicated in its entirety to the discussion of the results of the SPSS reliability
analysis practical. Any remaining time can be spent reviewing any assignments from the
previous tutorial (classical psychometrics) that have not been discussed yet or a look ahead to the
next tutorial (modern psychometrics).
1. For the relevant assignment, first review the global structure of the SPSS output.
2. Then discuss the output using the questions for this practical as set out in this course
textbook.
3. Any questions or issues that remain unclear can be raised during the Q&A session.
38
Tutorial 7: Statistics III
These assignments mostly concern the theory. Real-world calculations in modern test theory are
too advanced for this course and require specialised software. The aim of the
assignments is to gain a good understanding of the similarities and differences in classical and
modern psychometrics and an understanding of item parameters, latent traits, item information
and test information.
Assignment 1. Modernisation
Starting point:
The 1- and 2-parameter logistic models for dichotomous data (0/1) assume unidimensionality
and monotonicity of the item characteristic curves (ICCs): for all items in the test, the probability
of item score 1 (= ‘correct’ in tests, ‘agree’ or ‘yes’ in questionnaires) is a monotonically
increasing function of the same latent trait θ (unidimensionality). The slope and location of the
curve in relation to the θ axis depend on the item parameters. In addition to this, the models
assume what is referred to as local independence, which means that for a constant trait value θ
all item scores are independent of each other and therefore uncorrelated.
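The ICC described above can be written down directly. A minimal sketch of the 2-parameter logistic curve (with scaling constant D = 1, as used later in this tutorial):

```python
import math

def icc(theta, a, b):
    """2-parameter logistic ICC: the probability of item score 1 given
    latent trait theta, discrimination a and difficulty b. The
    1-parameter (Rasch) model fixes a at the same value for all items."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

Evaluating this function for a few (a, b) pairs across a range of θ values is one way to produce the curves asked for in question a).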
Questions:
a) Draw a number of ICCs according to the 2-parameter model. Explain how the item parameters
determine the slope and location of the curve. What is the difference between the 1- and
2-parameter model?
b) Which elements of classical test theory represent unidimensionality and monotonicity? (Tip:
what is the equivalent of a latent trait in classical test theory?)
c) Which element of classical test theory represents the assumption of local independence?
Does this assumption imply that all item-item correlations are equal to zero? Explain.
d) Which statistical measures in classical item analysis (as seen in the table for question 4 in
tutorial 5) are the equivalents of the item parameters from the 2-parameter logistic model?
e) What will roughly be the outcomes of classical item analysis if the data to be analysed
correspond with the 1-parameter (Rasch) model? And what if they correspond with the
2-parameter model? In your answer, focus on the item-p-values and the item-rest correlations.
39
f) What shape will the ICCs of the 1-parameter model take in the case of a very high
discrimination parameter (= a)? How large will Cronbach’s alpha be in that case? Come up with
a test where this might occur.
g) Answer question (f) if the discrimination parameter is very small (close to 0).
h) In assignment 1 of the SPSS Reliability practical (tutorial 6) we analysed the VOEG, a scale
of 21 items relating to health issues. Review the analysis once more. Question: Assume that we
are to apply the 1-parameter logistic model to the VOEG data. Which item would have the
highest difficulty parameter b and which the smallest?
i) Assume that we are to apply the 2-parameter logistic model to the VOEG data. Which item
would have the highest discrimination parameter a and which the smallest?
j) In assignment 1m) of the SPSS Reliability practical (tutorial 6), we analysed two item subsets
from the VOEG, being the CDGHQ and JPRTU items. We observed the following: Cronbach’s
alpha was 0.68 for subset 1 and 0.82 for subset 2. The correlation between both sum-scores is
0.57. Question: which assumption in the 1 and 2-parameter logistic models is violated in the
VOEG data according to this analysis?
Starting point:
In the 1- and 2-parameter logistic models, the general formula for item information can be defined
as I = (Da)² P(1 – P). Here, P is the probability of a correct answer and (Da) the discrimination
parameter, with D being an arbitrary constant larger than zero.
Note:
Crocker and Algina (1986, p. 353) set D at 1.7 for a specific reason they explain in their text.
Others typically opt for D = 1, as is customary in most of the literature. The choice between the
two values is similar to choosing a length in cm or mm and the only effect is that a is expressed
on a different scale. Here we will assume D = 1.
Questions:
a) What will P be if the value of latent trait θ goes to negative infinity, positive infinity or is
equal to item difficulty b?
b) What P will yield maximum item information and what P will yield minimum item
information?
c) What item difficulty will therefore provide maximum information on the individual’s ability if
the true ability θ of that individual is 1? What item difficulty will provide maximum information
on a respondent for whom θ = 0, and for a respondent for whom θ = -1?
40
d) What effect will discrimination parameter a have on the item information? If a increases, will
this information increase or decrease? Please note: P partly depends on a!
Refer to the lecture notes on modern psychometrics and look at the slides titled ‘Test information
and Test construction’. We will now calculate item information and test information for a few
cases. Use the auxiliary tables for this.
Auxiliary table 1: calculation of the probability of success P using item parameters a and b and
latent trait value θ:
Auxiliary table 2: calculation of the probability of success P for the 1-parameter model (with
a = 1):

          θ = -2   θ = -1   θ = 0   θ = +1   θ = +2
b = -1     0.27     0.50    0.73     0.88     0.95
b = 0      0.12     0.27    0.50     0.73     0.88
b = +1     0.05     0.12    0.27     0.50     0.73

Auxiliary table 3: calculation of the item information a² P(1 – P) for the 1-parameter model
(with a = 1):

          θ = -2   θ = -1   θ = 0   θ = +1   θ = +2
b = -1     0.20     0.25    0.20     0.11     0.05
b = 0      0.11     0.20    0.25     0.20     0.11
b = +1     0.05     0.11    0.20     0.25     0.20
Questions:
a) Check the calculation of the item information values in auxiliary table 3 yourself (only for b = 0).
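The check in a) can also be done programmatically. A minimal sketch (note that the table entries were computed from P rounded to two decimals, so an exact calculation may differ by 0.01 in places):

```python
import math

def item_information(theta, a, b):
    """Item information I = a**2 * P * (1 - P) for the logistic model,
    with D = 1 as assumed in the text."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a ** 2 * p * (1 - p)
```

Information peaks at θ = b, where P = 0.5, and falls off symmetrically on either side.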
41
b) Now calculate the test information for the heterogeneous test (b1 = -1, b2 = 0, b3 = 1) as well
as the homogeneous test (b1 = b2 = b3 = 0) in the lecture slides. Which of the two tests is the
most informative?
c) Why does optimal test construction depend on the distribution of θ in the population in which
the test will be used? When thinking about test construction, consider two features: a high or low
value for discrimination parameter a and a homogeneous or heterogeneous difficulty parameter
b.
d) Why does optimal test construction also depend on the purpose of the test (accurate
measurements across the entire skills spectrum or a selection of pupils or staff members, where
for example θ > 1 means a pass)?
Knowledge questions part 2: psychometrics
1. What types of items and scoring in a questionnaire or test form the focus of classical test
theory?
3. Why is the reliability of a measurement tool often indicated by the letter ρ or r, which is the
symbol for correlation (in the population and the random sample respectively)?
5. And the attenuation effect? How can we correct for this effect?
7. What is the Spearman-Brown formula used for? What input is needed and what output will
this produce?
8. Name the four methods for testing the reliability of a test or questionnaire.
9. Which of these four methods is very different from the other three? (Tip: error of
measurement.)
10. Which two types of reliability testing are in effect applications of the Spearman-Brown
formula? What input does each type require?
12. How can you apply the SEM in order to establish whether there is a difference in the true
scores of two individuals? Which assumptions do we make to do this?
13. Answer the questions in 12 for the difference between two measurements for the same
individual.
14. How does item analysis estimate the reliability of each item in a test or questionnaire?
15. Which results from an item analysis could provide grounds for possibly removing items from
a measuring tool?
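Two of the formulas behind questions 7 and 12 can be illustrated in a few lines (Python; the numbers below are hypothetical, chosen only for illustration). The Spearman-Brown formula predicts the reliability of a test lengthened by a factor k, and the standard error of measurement (SEM) is SD·√(1 − reliability).

```python
import math

def spearman_brown(reliability, k):
    """Predicted reliability when the test is made k times as long
    (assuming parallel items)."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical example: a test with reliability 0.60 is doubled in length.
print(spearman_brown(0.60, k=2))
# Hypothetical example: an IQ scale with SD = 15 and reliability 0.90.
print(sem(15.0, 0.90))
```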
Modern test theory
16. Give reasons why modern psychometrics came to be developed in addition to classical psychometrics. (Tips: level of measurement, parallel items, tailored testing.)
17. What does the assumption of unidimensionality in modern test models mean? Which
assumption is the equivalent in classical test theory?
18. Answer the questions in 17 for the assumption of ‘monotonicity’. How can you roughly
verify this assumption using statistical methods from classical psychometrics?
19. Answer the questions in 17 for the assumption of ‘local independence’. Does this assumption
imply that all item-item correlations are equal to zero? How can you verify this assumption?
20. Which parameters does the 1-parameter (Rasch) model from modern test theory include?
And the 2-parameter model?
21. For each parameter in question 20, indicate which statistic in classical item analysis more or
less corresponds with this.
22. The true score model from classical test theory does not include item parameters. What type
of result should an item analysis produce according to that model? (Think back to question 21.)
23. In education, there is often a need to compare (groups of) individuals based on their test
results, although not all of these individuals have answered the same items. Examples are the
Dutch CITO tests in primary education (the tests change year on year) and the progress tests
administered by UM (each student selects the items he or she will answer). How is it possible to
draw fair comparisons between individuals and what assumptions are required to guarantee this?
24. What does the concept of item information mean? Will this information only depend on the
item or does the individual tested also have an impact?
25. How can item information be derived from a graph containing item characteristic curves?
What probability of a correct answer produces maximum item information and for what
probability is this information minimal?
26. What does the concept of test information mean and how does it depend on item
information?
27. Let’s assume a test is used to decide on a pass or a fail (an example could be a course exam).
What test construction should be used to obtain maximum information? Should the test include
items that vary from very easy to very difficult or should the test simply consist of moderately
difficult items?
28. Answer question 27 if the aim is to obtain the best possible measurements across the entire
skills spectrum. What role does the item discrimination parameter play in this?
Tutorial 8: Statistics III
During this tutorial, we will run through each step of a factor analysis as discussed in the lecture,
using a number of small ‘pen-and-paper’ calculations for the same data discussed in the lecture.
The aim is to gain a good understanding of the criteria and methods of each step.
b) Which analyses does the first step entail and what is the purpose of each analysis?
c) Why is it useful to review a scatter plot for each pair of variables? How many plots would you
need to review if there are 10 variables?
d) Which methods can you list for the second step and how do these methods differ from each
other?
e) Which criteria can you list for the number of factors and what does each criterion entail?
f) Which methods can you list for the third step? Which of these methods is preferable and why?
Calculations
The starting point is the principal component analysis (PCA) of four intelligence subtests
(discussed in the lecture). Questions:
a) Use the table on slide 4 to perform your own calculation of the correlation between both
verbal tests and both spatial tests. Compare your outcomes with the table on slide 5.
b) Which tests have the strongest correlation, the verbal or the spatial tests?
d) Use the table on slide 15 to calculate the communality and uniqueness of test scores V1 and S2. Do this according to a 1-factor model first, followed by a 2-factor model. Compare your outcomes with the relevant cells in the tables on slides 17 and 19.
e) Use the table on slide 15 to calculate the reproduced and residual correlation between test scores V1 and S2. Do this according to a 1-factor model first, followed by a 2-factor model. Compare your outcomes with the tables on slides 18 and 19.
f) Now review the table on slide 19. How many factors would you select and why?
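The slide tables themselves are not reproduced in this text, so the sketch below uses hypothetical factor loadings (Python) just to show the arithmetic: communality = sum of squared loadings, uniqueness = 1 − communality, reproduced correlation = sum over factors of the products of the two variables' loadings, residual = observed − reproduced.

```python
def communality(loadings):
    """Communality: sum of squared loadings across the factors."""
    return sum(l ** 2 for l in loadings)

def reproduced_r(loadings_x, loadings_y):
    """Reproduced correlation under an orthogonal factor model."""
    return sum(lx * ly for lx, ly in zip(loadings_x, loadings_y))

# Hypothetical 2-factor loadings for a verbal (V1) and a spatial (S2) test:
v1 = [0.8, 0.3]
s2 = [0.2, 0.7]
observed_r = 0.40                    # hypothetical observed correlation

h2_v1 = communality(v1)              # 0.64 + 0.09 = 0.73
uniq_v1 = 1.0 - h2_v1                # 0.27
rep = reproduced_r(v1, s2)           # 0.8*0.2 + 0.3*0.7 = 0.37
resid = observed_r - rep             # 0.40 - 0.37 = 0.03
print(h2_v1, uniq_v1, rep, resid)
```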
The starting point is the principal factor analysis (PFA) of the case used in assignment 2.
Questions:
a) Use the factor loadings in the table on slide 22 to calculate the following for test scores V1
and S2 based on the two-factor model: communality, uniqueness and the reproduced and residual
correlations between them. Compare your outcomes with the information in the table on slide 23.
b) Compare the communalities of the PFA (table on slide 23), and the outcomes for question (a), with those for the PCA (table on slide 19). Which method is better able to explain the variance in the test scores?
c) Compare the residual correlations in the table on slide 23 (PFA) with those in the table on
slide 19 (PCA). Which method is able to provide the best explanation for the correlations
between the four test scores?
e) Which method (PCA or PFA) should be the preferred choice, and in which situations?
a) Review the criteria for selecting the number of factors (slide 25). State for each criterion how
many factors you would extract in the example of the four IQ tests using PFA.
b) Now review the factor plots for this case before rotation, after orthogonal rotation and after
oblique rotation (figures on slides 28, 30, 31). Which rotation method should be the preferred
choice and why?
c) Given the plot and the loadings, how would you interpret the factors?
d) Repeat questions (a), (b) and (c) for the factor analysis of 15 items relating to the working
conditions among nursing staff (slide 35).
SPSS practical 4: Statistics III
During this practical, we will apply the individual steps in a factor analysis that we practised in
the last tutorial to a different data set and we will also use SPSS, given the large number of
calculations.
The aim here is to gain a good understanding of the differences between the various methods for
factor extraction, of the criteria for selecting the number of factors and of methods for rotating
and interpreting factors. There will be an assignment for each step in factor analysis.
View the following YouTube video in order to prepare for, or complement, the practical:
https://www.youtube.com/watch?v=Nj9tj4AGAA0
Please note that a set of specific SPSS instructions for the following assignments can be found in the section 'SPSS instructions for practical 4' following the assignments.
In 1975, Statistics Netherlands (CBS) conducted a large-scale survey to elicit the opinions of
Dutch nationals on a wide range of topics. This assignment looks at the responses of 110
individuals to 8 Likert items vr1-vr8 (responses varying between ‘agree’ and ‘disagree’) in
relation to their view of the government, among other topics (file stat3pr34c.sav; see Appendix C for the content of the items).
Questions:
a) For each item, check for the presence of missing values, outliers or strong non-normality.
b) Carry out a reliability analysis for this scale and also request the item-item correlation matrix.
What is your assessment of the internal consistency of this scale and of the extent to which each
item fits the scale? Provide arguments for your answer.
c) Can this scale be said to be unidimensional? How can you tell and what does this imply for the
sum-scores for this scale?
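For assignment 1b, SPSS reports Cronbach's alpha as the measure of internal consistency. As a reminder of what is behind that number, here is a minimal sketch (Python, with a tiny hypothetical data set): α = k/(k − 1) · (1 − Σ item variances / variance of the sum-score).

```python
def cronbach_alpha(rows):
    """Cronbach's alpha. rows: list of respondents, each a list of k item scores."""
    k = len(rows[0])

    def variance(xs):                # population variance; the choice of
        m = sum(xs) / len(xs)        # denominator cancels out in the ratio
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of 4 people to 3 items:
data = [[1, 2, 1], [2, 3, 3], [4, 4, 3], [5, 5, 4]]
print(round(cronbach_alpha(data), 2))
```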
a) Carry out a principal component analysis (PCA) for these eight items. Determine the number
of factors in accordance with the K1 criterion (eigenvalues). Request all output with the
exception of factor scores. Do not apply rotation. How many factors should be extracted in
accordance with each criterion? Which items demonstrate a similar factor pattern? Compare this
with assignment 1b.
b) Verify whether the chi-square test in the ML factor analysis confirms the number of factors
selected in 2a.
c) Perform a principal factor analysis (PFA) and extract the number of selected factors. Request
the reproduced correlation matrix, communalities and factor loadings. Compare the output with
that of the principal component analysis. What similarities and differences can you see?
d) The differences in output requested under c) are very small in this instance. Why is this?
(Tip: look at the initial communalities = R².)
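What SPSS does behind the K1 criterion in assignment 2a can be sketched in a few lines (Python with NumPy; the correlation matrix below is hypothetical, chosen to mimic two correlated pairs of variables): extract the eigenvalues of the correlation matrix and count how many exceed 1.

```python
import numpy as np

# Hypothetical correlation matrix for four variables
# (two correlated pairs, as with the four IQ subtests):
R = np.array([
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.5],
    [0.1, 0.1, 0.5, 1.0],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]    # sorted, largest first
n_factors_k1 = int(np.sum(eigenvalues > 1))  # K1: retain eigenvalues > 1
print(eigenvalues.round(2), n_factors_k1)
```

Here two eigenvalues exceed 1, so the K1 criterion would retain two factors; the eigenvalues always sum to the number of variables (the trace of the correlation matrix).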
a) Repeat the principal factor analysis of assignment 2c, but now carry out a Varimax rotation.
Request a table and a plot for the rotated factor loadings.
b) Repeat question 3a with oblique rotation. Also request the correlations between the factors.
How high is the correlation between the factors?
c) Compare the unrotated factor pattern (assignment 2c) with both rotated patterns. Which
pattern should be the preferred choice and why?
Based on the factor analyses, decide which items belong in the same subscale.
For each subscale, use COMPUTE to calculate the sum-score for the items in question.
Request the correlation matrix for the sum-scores. Compare this matrix with the correlation
matrix for the factors after oblique rotation. What do you notice? What does this tell you about
the meaning of the factors?
SPSS instructions for practical 4
Assignment 1
a) Use Analyze – Descriptive Statistics – Frequencies to request a histogram for each variable
(vr1 - vr8). Use Frequencies or Descriptives to request the mean, SD, minimum, maximum,
skewness and kurtosis. For ordinal variables, this will be adequate. In the case of continuous
variables, such as response times, it is better to use Explore (allows testing for normality and
provides an overview of outliers, among other things).
b) Select Analyze – Scale – Reliability Analysis. Select all eight items (vr1 – vr8) as the input variables. Under Statistics, request all information under 'Descriptives for' and 'Summaries', as well as the inter-item correlations.
Assignment 2
a) Select Analyze - Dimension Reduction - Factor. Select all eight items as variables and
select:
Descriptives: all output under this button (useful for an initial analysis of a file.
For follow-up analysis, use only the initial solution and the reproduced correlation matrix).
Extraction:
- under Method: principal components
- under Analyze: the correlation matrix
- under Extract: eigenvalues > 1
- under Display: unrotated factor solution, scree plot
Rotation: no rotation (not until assignment 3), only a loading plot
Scores: skip (only needed for factor scores)
Options: skip (only relevant if there are missing values).
c) As for (2b), but select Principal Axis Factoring under Method and specify the number of
factors selected in question 2b under Extract. Restrict the output under the Descriptives button to
the initial solution and the reproduced correlation matrix.
Assignment 3
a) As for assignment (2c), with the following adaptations: Under Extraction: select fixed no. of
factors and choose the number based on assignment 2. Under Rotation: Select Varimax and
request the rotated factor solution and the factor plot (‘loading plot’).
b) As for assignment (3a), but instead select Oblimin or Promax rotation and keep the default
value for the delta and kappa respectively.
Tutorial 9: Statistics III
This tutorial is dedicated in its entirety to the discussion of the results of the SPSS factor analysis
practical. Any remaining time can be spent on any assignments from the previous tutorials that
have not been discussed yet.
1. For the relevant assignment, first review the global structure of the SPSS output.
2. Then discuss the output using the questions for this practical as set out in this course
textbook.
3. Any questions or issues that remain unclear can be raised during the Q&A session.
Tutorial 10: Statistics III
With only a short while to go until the course exams, we will not introduce many more new or
complicated topics. That is why this tutorial only has a handful of assignments for the first hour
of the session. The second hour can be spent on a recap of the course or on questions.
Let’s assume there are four questionnaires, one for extraversion (E), one for impulsiveness (I),
one for remoteness (R) and one for neuroticism (N). All four questionnaires have a reliability of
0.70. We have established the following correlations between the sum-score for the extraversion
questionnaire and the three other sum-scores: r(E,I) = 0.50, r(E,R) = -0.70, r(E,N) = 0. Question:
For each questionnaire, determine the extent to which the measurement is comparable to the
extraversion questionnaire, from a statistical point of view.
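One defensible way to approach this assignment (the intended method is not spelled out here) is correction for attenuation: the correlation between the true scores is estimated as r(X,Y) / √(rel_X · rel_Y). A sketch in Python with the numbers given above:

```python
import math

def disattenuated_r(r_xy, rel_x, rel_y):
    """Estimated correlation between true scores, corrected for
    the unreliability of both measurements."""
    return r_xy / math.sqrt(rel_x * rel_y)

REL = 0.70  # reliability of all four questionnaires
for name, r in (("I", 0.50), ("R", -0.70), ("N", 0.0)):
    print(f"r_true(E, {name}) = {disattenuated_r(r, REL, REL):.2f}")
```

Note that the corrected r(E,R) equals −1, which would suggest that, statistically, E and R measure the same trait in opposite directions.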
Over the course of a few years, 200 patients suffering from back problems in a region have been
referred to a physiotherapy practice run by two part-time staff. The two therapists want to check
whether there is agreement between them on their decision whether to apply traction or not
(traction is a specific treatment method). In order to establish this, they assess each patient
without consulting one another. This results in the contingency table below.
                        therapist B
                   no    possibly   yes
therapist   no     30       20       10
A         possibly 20       40       20
            yes    10       20       30
Questions:
a) For what percentage of patients are the therapists in full agreement? And for what
percentage do they completely disagree? What is your opinion of this level of agreement?
b) Now calculate the kappa measure of agreement (unweighted). What is your opinion of
this level of agreement? Is this level good, average or poor?
c) Now calculate the kappa with linear weighting and answer question b once more.
d) Finally calculate the kappa with quadratic weighting and answer question b again.
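The kappa calculations in b) to d) all follow the same recipe — observed agreement minus chance-expected agreement, divided by one minus chance-expected agreement — and differ only in the cell weights. A sketch (Python, one possible way to organise the computation) using the contingency table above:

```python
def weighted_kappa(table, weights):
    """Cohen's kappa with agreement weights (1 on the diagonal)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    po = sum(weights[i][j] * table[i][j]
             for i in range(k) for j in range(k)) / n
    pe = sum(weights[i][j] * row_tot[i] * col_tot[j]
             for i in range(k) for j in range(k)) / n ** 2
    return (po - pe) / (1.0 - pe)

table = [[30, 20, 10],   # rows: therapist A (no / possibly / yes)
         [20, 40, 20],
         [10, 20, 30]]   # columns: therapist B (no / possibly / yes)
k = 3
unweighted = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
linear     = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
quadratic  = [[1 - (abs(i - j) / (k - 1)) ** 2 for j in range(k)] for i in range(k)]

for name, w in (("unweighted", unweighted), ("linear", linear),
                ("quadratic", quadratic)):
    print(f"kappa ({name}) = {weighted_kappa(table, w):.2f}")
```

With this table the three variants give approximately 0.24 (unweighted), 0.29 (linear) and 0.33 (quadratic): the more the weighting credits near-agreement, the higher kappa becomes here.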
Assignment 3. Academic progress
Let’s assume that we wish to use a progress test to closely monitor the academic progress of
bachelor students. Key aspects are the reliability and validity of the progress test. Question: for
each of the criteria below, indicate how the criterion could be assessed for the progress test: (a)
internal consistency, (b) retest reliability, (c) content validity and (d) construct validity (tip:
progress test)
Knowledge questions part 3: factor analysis and validity & agreement
2. How is the covariance matrix for K variables constructed? And the correlation matrix? What do we find on the diagonal and what off the diagonal?
3. What requirements must the measurement level and the distribution of the variables meet to do
a factor analysis?
11. How are they calculated from the factor pattern (for rotation)?
12. What is the reproduced and residual correlation between two variables?
13. How are they calculated from the factor pattern (for rotation)?
14. What is the difference between principal component analysis and principal factor analysis?
15. What advantage does each method offer compared to the other method?
16. What does the K1 criterion for factor selection mean? And the scree test?
17. What are the other criteria for determining the number of factors?
18. What is the best way to apply all of those criteria if they do not all result in the selection of
the same number of factors?
Factor rotation and confirmatory factor analysis
20. Once the factors have been extracted, they can be rotated arbitrarily. Why is this?
21. Factor rotation is used to produce a simple structure. What does that mean?
22. What is orthogonal rotation and which is the most well-known method?
24. Which rotation method, orthogonal or oblique, is usually best, and why?
25. What is the difference between exploratory and confirmatory factor analysis?
26. What criteria are used to select the number of factors in a confirmatory analysis?
28. What are factor scores? What is the simple alternative for factor scores?
Validity and agreement
30. What is content validity and how can this be verified for a test or questionnaire?
34. Why is it that reliability and validity are often also at odds with each other?
35. What statistical measure can be used to verify whether two tests or questionnaires measure
the same thing?
36. What do the sensitivity and specificity of a diagnostic test refer to?
37. Why is a high correlation between two assessors not sufficient to ensure good agreement?
38. Which is the common measure of agreement between two assessors for dichotomous or
nominal ratings?
39. Why is this measure less suitable for ordinal ratings? Which measure is suitable for ordinal
ratings?
40. Which is the best measure for verifying agreement between two assessors in the event of a
quantitative rating?
41. Does good agreement between the assessors mean that there will be little or no difference for
a random individual as to whom he or she is assessed by? Explain.
Appendix A General tips for using SPSS for Windows
1. data (.sav): the raw data matrix (individuals as rows, variables as columns)
3. syntax (.sps): save (via the Paste button for all statistical analyses), process and execute SPSS
commands in the old style (as previously under DOS)
(This window is not opened by default, but can be opened via File – New or Open)
Data matrix:
individual X1 X2 .... XK
1
2
...
N
The SPSS toolbar has various pull-down menus, from File through to Help. Listed below are the
four menus that are specific to SPSS, along with the most frequently used options for each menu:
Data: defining new variables, sorting and selecting cases, merging files
Transform: calculating and recoding values, converting raw values to rankings, imputing
missing values. Options under Transform are applied to each individual (row) in the data file.
The result will be a new or amended variable (column).
Graphs: creating all manner of graphs. The Legacy Dialogs option allows graphs to be defined in
the same manner as for analyses, that is, by placing variables in boxes using mouse clicks. The
Chart Builder option requires you to drag and drop variables.
Note:
Selection with filtering for unselected cases (in the Data menu) only has an impact on procedures
carried out using Analyze and Graphs. Procedures under Transform are not affected. This means
unselected cases will not be included in a statistical analysis, but they will be included when
calculating or recoding variables.
Selection with deletion of unselected cases (in the Data menu) results in the removal of cases
from the data window and also from the data file if this data window is saved.
Always check that the selection of persons or the creation/change of variables has been carried
out correctly. The result can be incorrect, especially if the selection or creation/change uses a
variable which has missing values. Always check selections or operations using a frequency
distribution (for new variables), a contingency table (new versus old variable) or listing (Analyze
– Reports – Case Summaries).
Saving paper: (1) Remove superfluous or incorrect output from the window, (2) reduce figures or
tables in size if necessary and (3) print two pages on each A4 sheet.
Names instead of labels for variables in the input screen and/or output file:
Select: Edit – Options – General (for input screens this only becomes effective after the data
have been loaded again), or Edit – Options – Output labels (effective immediately for output).
Values instead of labels to see the values assumed by the variables in the output:
Select Edit – Options – Output labels (effective immediately for output)
Including comments/explanation in the output (for later reference). Include lines that start with a
* and end in a full stop in the syntax window. If you ‘run’ these lines, they will be printed in the
output but not interpreted by SPSS as command lines.
Appendix B Importing SPSS output in Word
SPSS includes easy ways to copy output (both tables and graphs) to Word. Select a graph or table and choose <Copy> from the <Edit> menu. Then, in a Word document, right-click and select <Paste>. If you hold down the Shift key, you can select and copy multiple graphs or tables in one go if needed.
If a table doesn’t fit on the page, click the right mouse button and select <Autofit> and <Autofit
to content>.
The option below can be used as an alternative, particularly if there is a large volume of output:
Exporting all output at once: go to the output window. At the top, select <File> and
<Export>:
- Objects to Export: select All to indicate that you wish to export all output. In this case, the output will contain a number of useless tables that are not visible in the .spv file. To exclude these tables, select All Visible instead of All.
- Type: select Word/RTF (*.doc). As you can see, you also have the option of exporting the
output in PDF, Excel or PowerPoint format.
- File Name: define the location and name of the Word file to which you wish to export the
SPSS output.
- Change options: the default option is Wrap table (tables that are too wide are split into
multiple tables shown one underneath the other). Select Shrink table if you don’t want to
do this.
Appendix C Training files
This appendix only describes the files for the psychometrics and factor analysis practicals. The
files for contingency table analysis and logistic regression each contain no more than a few
variables and a description is included with the relevant practical.
stat3pr34a.sav
This file contains the responses of 200 adult Dutch nationals to 21 items relating to physical complaints. These formed part of survey research conducted around 1980 into health in the Netherlands (items scored as follows: 1 = yes, 0 = no, blank = missing).
Please see below for the content of the items.
stat3pr34b.sav
This file contains the scores of 917 pupils in the last two years of primary education. The scores
concern two of the six intelligence subtests in the ISI, being ISI-4 (rotation of figures) and ISI-5
(understanding word categories).
Names of the variables:
item4_1 to item4_20 incl. = scores for items 1-20 of subtest 4 (1 = correct, 0 = incorrect),
item5_1 to item5_20 incl. = scores for items 1-20 of subtest 5
sum4 and sum5 = sum-scores for both subtests.
For reasons of protecting the copyright and confidentiality of the tests, the exact wording of the
items has not been reproduced here. Instead, fictional items are included below to give an
impression of ISI-4 and an example of the instructions provided with ISI-5.
[Fictional ISI-4 item: a row of five answer figures labelled A to E; the figures are not reproduced here.]
A January
B Tuesday
C April
D Sunday
E Evening
stat3pr34c.sav
This file contains the responses of 100 individuals to the following eight items from a survey
conducted by Statistics Netherlands in 1975. The items relate to their view of the government
and young people with a critical attitude.
For each item, the answer is chosen using an ordinal scale, where 1 = disagree and 5 = agree.
In items 4, 5 and 7, the order of the statement is reversed. The responses in the file have already
been mirrored (recoded) so these items signify the following: 1 = agree, 5 = disagree
1. There should be no cause to be angry if young people who protest against perceived injustices
occasionally break the law.
2. The significant increase in the number of government bodies has put personal freedom under
threat.
3. Young people should have a critical attitude regarding the status quo.
4. It is desirable for the government to introduce a law which allows all protests to be broken up.
5. The government is authorised to deploy soldiers in order to break a strike.
6. More state-owned companies should be transferred into private ownership.
7. Severe sentences should be introduced for people who disregard the instructions of the police
during protests.
8. By introducing tougher measures against terrorism and civil disobedience, the government
would be turning our country into a police state.