
INTRO 2 IRT

Tim Croudace

Descriptions of IRT
IRT refers to a set of mathematical models that describe, in probabilistic terms, the relationship between a person's response to a survey question/test item and his or her level of the latent variable being measured by the scale.

This latent variable is usually a hypothetical construct [trait/domain or ability] which is postulated to exist but cannot be measured by a single observable variable/item.

Instead it is indirectly measured by using multiple items or questions in a multi-item test/scale.

Fayers and Hays p55
Assessing Quality of Life in Clinical Trials. Oxford Univ Press:
Chapter on Applying IRT for evaluating questionnaire item and scale properties.

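$\operatorname{logit}\{\pi_i\} = \alpha_{i0} + \alpha_{i1} z$
[Path diagram: a single latent dimension Z with arrows to the four items; intercepts $\alpha_{10}, \dots, \alpha_{40}$ and slopes $\alpha_{11}, \dots, \alpha_{41}$]

The data (response pattern q1 q2 q3 q4: frequency n):
0000: 477   1000: 63    0001: 12    0010: 150
1001: 7     1010: 32    0011: 11    1011: 4
0100: 231   1100: 94    0101: 13    0110: 378
1101: 12    1110: 169   0111: 45    1111: 31

Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
A single latent dimension Z: Normal (mean 0; std dev = 1), so Var = 1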

Simple sum scores
(n = 1729 new individual values; 477 zeros added to data set)

Pattern  n    Total score
0000     477  0
1000     63   1
0001     12   1
0010     150  1
1001     7    2
1010     32   2
0011     11   2
1011     4    3
0100     231  1
1100     94   2
0101     13   2
0110     378  2
1101     12   3
1110     169  3
0111     45   3
1111     31   4

Binary Factor / Latent Trait Analysis
Results: logit-probit model
Warming up to this sort of thing soon.
[Path diagram: a latent factor with binary indicators U1, U2, U3, ..., Up]

2 items with similar thresholds and similar slopes


3 items with different thresholds but similar slopes

The key concept: latent factor models for constructs underpinning multiple binary (0/1) responses,
based on innovations in educational testing and psychometric statistics > 50 years old


Same models used in educational testing with correct/incorrect answers can be applied to symptom present/absent data (both binary)
Extensions to ordinal outcomes (Likert scales)
Flexibility in parametric form available
Semi- and non-parametric approaches too

Binary IRT : The A B C D of it

Linear vs non-linear regression of response probability on latent variable
[Figure: y-axis = prob of response (Yes) on a simple binary (Yes/No) scale item; x-axis = score on latent construct being measured]
Adapted without permission from a slide by Prof H Goldstein
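The contrast between a linear and a non-linear (logistic) regression of response probability on the latent score can be sketched numerically. This is only an illustration: the function names and the intercept/slope values are assumptions chosen so that the straight line is tangent to the logistic curve at z = 0; they are not taken from the slide.

```python
import math

def logistic_response(z, intercept=0.0, slope=1.0):
    """Non-linear (logistic) response function: always strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def linear_response(z, intercept=0.5, slope=0.25):
    """Linear response function: unbounded, so it can leave the [0, 1] range."""
    return intercept + slope * z

# Illustrative values only (hypothetical parameters, not fitted to any data).
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"z={z:+d}  linear={linear_response(z):.2f}  "
          f"logistic={logistic_response(z):.2f}")
```

Near the middle of the scale the two agree closely; at the extremes the linear version predicts "probabilities" below 0 or above 1, which is why a non-linear form is preferred for binary items.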

Ordinal IRT : The A B C D of GRM

IRT models
Simplest case of a latent trait analysis
Manifest variables are binary: only 2 distinctions are
made
these take 0/1 values
Yes / No
Right / Wrong
Symptom present / absent

Agree / disagree distinctions for attitudes more likely to be ordinal [>2 response categories] .. see next lecture IRT 2 on Friday

For scoring of individuals (not parameter estimation for items) it is frequently assumed that the UNOBSERVED (latent) variable [the latent factor / trait] is not only continuous but normally distributed [or the prior distribution is normal but the posterior distribution may not be]

IRT for binary data
The most commonly used model is the Lord-Birnbaum model (Lord, 1952; Birnbaum, ): the 2-parameter logistic
[a.k.a. the logit-probit model; Bartholomew (1987)]
The model is essentially a non-linear single factor model
When applied to binary data, the traditional linear factor model is only an approximation to the appropriate item response model: sometimes satisfactory, but sometimes very poor (we can guess when)
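For reference, the 2-parameter logistic model can be written in intercept/slope form as below. The symbols $\pi_i$, $\alpha_{i0}$ and $\alpha_{i1}$ follow Bartholomew's convention and are an assumption here, since the slide's own symbols did not survive extraction; $U_i$ denotes the binary response to item $i$ and $z$ the latent variable.

```latex
\operatorname{logit}\{\pi_i(z)\}
  = \log\frac{\pi_i(z)}{1-\pi_i(z)}
  = \alpha_{i0} + \alpha_{i1} z,
\qquad
\pi_i(z) = P(U_i = 1 \mid z)
  = \frac{1}{1 + \exp\{-(\alpha_{i0} + \alpha_{i1} z)\}},
\qquad
z \sim N(0, 1).
```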

Some accounts of Item Response Theory make it sound like a revolutionary & very modern development: this is not true!

It should not replace or displace classical concepts, and has suffered from being presented and taught as disconnected from these
A unified treatment can be given that builds one from the other (McDonald, 1999) but this would be a one term course on its own

What IRT does
IRT models provide a clear statement [picture!] of the performance of each item in the scale/test and how the scale/test functions, overall, for measuring the construct of interest in the study population
The objective is to model each item by estimating the properties describing item performance characteristics, hence Item Characteristic Curve or Symptom Response Function.

Very bland (but simple) example
Lombard and Doering (1947) data
Questions on cancer knowledge, with four addressing the source of the information
Fitting a latent variable model might be proposed as a way of constructing a measure of how well informed an individual is about cancer
A second stage might relate knowledge about cancer to knowledge about other diseases or general knowledge

Very bland (but simple) example
Lombard and Doering (1947) data
Questions on cancer knowledge, with four addressing the source of the information:
radio
newspapers
(solid) reading (books?)
lectures

2 to the power 4, i.e. 16 possible response patterns from 0000 to 1111

Data
Lombard and Doering (1947) data
2 to the power 4, i.e. 16 possible response patterns (all occur); with more items this is neither likely nor necessary
Frequency shown for 0000 to 1111: frequency is the number with each item response pattern

Pattern  n
0000     477
1000     63
0001     12
0010     150
1001     7
1010     32
0011     11
1011     4
0100     231
1100     94
0101     13
0110     378
1101     12
1110     169
0111     45
1111     31


Basic objectives of modelling
When multiple items are applied in a test / survey, we can use latent variable modelling to
explore inter-relationships among observed responses
determine whether the inter-relationships can be explained by a small number of factors

THEN, to assign a SCORE to each individual on the basis of their responses
Basically to rank order (arrange) or quantify (score) survey participants, test takers, individuals who have been studied
CAN BE THOUGHT OF AS ADDING A NEW SCORE TO YOUR DATASET FOR EACH INDIVIDUAL

This analysis will also help you to understand the properties of each item, as a measure of the target construct (what properties?)
GRAPHICAL REPRESENTATION IS BEST

Item properties that we are interested in are captured graphically by so-called Item Characteristic Curves (ICCs)

Item/Symptom & Test/Scale INFORMATION
is useful and necessary to examine score precision (the accuracy of estimated scores)
we are interested in this for different individuals (individuals with different score values)
by inspecting the amount of information about each score level, across the score range (range of estimated scores), we are identifying variations in measurement precision (reliability of individuals' estimated scores)
this enables us to make statements about the effective measurement range of an instrument in a population

e.g. Item Characteristic Curves

Item information functions - add them together to get TIF
beware y axis scaling : not all the same

Test Information Function

Item information functions - shown alongside their ICCs
[Figure: Item Characteristic Curves with the item information function of each item; y-axis tick labels include 3.0, 0.14 and 0.40]
beware y axis scaling : not all the same

1 / Sqrt[Information] = s.e.m.

Info   Sqrt(Info)   1/Sqrt(Info)
1      1.0          1.0
2      1.4          0.7
3      1.7          0.6
4      2.0          0.5
5      2.2          0.4
6      2.4          0.4
7      2.6          0.4
8      2.8          0.4
9      3.0          0.3
10     3.2          0.3
11     3.3          0.3
12     3.5          0.3

Standard error of measurement is not constant (U-shaped, not symmetric)

Approximate reliability
Reliability $= 1 - 1/[\text{Info}] = 1 - 1/[1/(\text{s.e.m.}^2)]$
s.e.m. = standard error of measurement
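The two relations above (s.e.m. = 1/sqrt(Info) and reliability ≈ 1 − 1/Info) can be checked with a few lines of code. This sketch assumes, as the slides do, a latent variable with variance 1; the function names are illustrative.

```python
import math

def sem_from_information(info):
    """Standard error of measurement at a score level with the given information."""
    return 1.0 / math.sqrt(info)

def approx_reliability(info):
    """Approximate reliability, 1 - 1/Info, assuming latent variance 1."""
    return 1.0 - 1.0 / info

# Reproduce the Info / Sqrt(Info) / s.e.m. columns and add the reliability.
for info in range(1, 13):
    print(info,
          round(math.sqrt(info), 1),
          round(sem_from_information(info), 1),
          round(approx_reliability(info), 2))
```

For example, information of 4 corresponds to a standard error of 0.5 and an approximate reliability of 0.75, and information of 10 is needed to reach 0.90.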

Back to the Data
Lombard and Doering (1947) data: 16 possible response patterns (all occur), with the frequency n of each pattern

0000: 477   1000: 63    0001: 12    0010: 150
1001: 7     1010: 32    0011: 11    1011: 4
0100: 231   1100: 94    0101: 13    0110: 378
1101: 12    1110: 169   0111: 45    1111: 31

What would be the easiest thing to do with these numbers, to score the patterns?

Answer: simply add them up

Simple sum scores
(n = 1729 new individual values; 477 zeros added to data set)

Pattern  n    Total score
0000     477  0
1000     63   1
0001     12   1
0010     150  1
1001     7    2
1010     32   2
0011     11   2
1011     4    3
0100     231  1
1100     94   2
0101     13   2
0110     378  2
1101     12   3
1110     169  3
0111     45   3
1111     31   4
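The sum scoring can be reproduced directly from the pattern frequencies. The sketch below uses the Lombard and Doering counts as listed on the Data slide; the variable and function names are illustrative.

```python
from collections import Counter

# Response patterns (q1 q2 q3 q4) and their frequencies, from the Data slide.
DATA = {
    "0000": 477, "1000": 63,  "0001": 12, "0010": 150,
    "1001": 7,   "1010": 32,  "0011": 11, "1011": 4,
    "0100": 231, "1100": 94,  "0101": 13, "0110": 378,
    "1101": 12,  "1110": 169, "0111": 45, "1111": 31,
}

def sum_score(pattern):
    """Simple sum score: the number of items endorsed (the count of 1s)."""
    return sum(int(ch) for ch in pattern)

# Number of individuals at each total score.
score_counts = Counter()
for pattern, n in DATA.items():
    score_counts[sum_score(pattern)] += n

total_n = sum(DATA.values())
print("total n =", total_n)
for score in sorted(score_counts):
    print(score, score_counts[score])
```

Sixteen distinct response patterns collapse to only five score values, so patterns such as 1000 and 0100 are treated as equivalent even though they are very different in frequency.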

Weighted [by discriminating power] scores

Pattern  n    Total score  Component score [weighted by $\alpha_{i1}$]  Factor score
0000     477  0            0                                            -0.98
1000     63   1            0.72                                         -0.68
0001     12   1            0.77                                         -0.67
0010     150  1            1.34                                         -0.46
1001     7    2            1.48                                         -0.41
1010     32   2            2.06                                         -0.23
0011     11   2            2.10                                         -0.22
1011     4    3            2.82                                         0.0
0100     231  1            3.40                                         0.16
1100     94   2            4.12                                         0.42
0101     13   2            4.16                                         0.43
0110     378  2            4.74                                         0.66
1101     12   3            4.88                                         0.72
1110     169  3            5.46                                         0.99
0111     45   3            5.50                                         1.02

Mplus version 4.1 ML estimates (S.E.): Z by Q1 0.721 (0.093), Z by Q2 3.358 (1.035), Z by Q3 1.344 (0.167), Z by Q4 0.769 (0.145); variance of Z = 1
Compare with Bartholomew (1987) p160: 0.72 (0.09), 3.40 (1.14), 1.34 (0.17), 0.77 (0.15)



Something a little more subtle
Simple sum scores assume all item responses are equally useful at defining the construct
this may not be the case

If items are differentially important (have different discriminating power with respect to what we are measuring), we might want to take that into account
How? Weighted sum scores [Component scores]
weighted by what?
weighted by the estimates (factor loading type parameter) from a latent variable model [latent trait model with a single latent factor]
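A weighted sum (component) score simply replaces each 1 in the response pattern by that item's weight. The sketch below uses the rounded weights quoted in these slides from Bartholomew (1987): 0.72, 3.40, 1.34 and 0.77 for q1 to q4. Names are illustrative, and a few results differ from the slide's table in the second decimal (for example 1.49 here against 1.48 on the slide), presumably because the slide's values were computed from unrounded weights.

```python
# Discrimination-type weights for q1..q4, as quoted in the slides
# from Bartholomew (1987), rounded to two decimals.
WEIGHTS = (0.72, 3.40, 1.34, 0.77)

# The 16 response patterns (q1 q2 q3 q4) in the order used on the slides.
PATTERNS = [
    "0000", "1000", "0001", "0010", "1001", "1010", "0011", "1011",
    "0100", "1100", "0101", "0110", "1101", "1110", "0111", "1111",
]

def component_score(pattern, weights=WEIGHTS):
    """Weighted sum score: add the weight of every endorsed item."""
    return sum(w for ch, w in zip(pattern, weights) if ch == "1")

for pattern in PATTERNS:
    total = sum(int(ch) for ch in pattern)
    print(pattern, total, f"{component_score(pattern):.2f}")
```

Sorting the patterns by component score reproduces the order in which the slides list them, which shows that the tables are arranged by weighted score rather than by total score.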

$\operatorname{logit}\{\pi_i\} = \alpha_{i0} + \alpha_{i1} z$
[Path diagram: latent variable Cancer Knowledge (z) with arrows to the four items; intercepts $\alpha_{10}, \dots, \alpha_{40}$ and slopes $\alpha_{11}, \dots, \alpha_{41}$]
The data: response patterns 0000 to 1111 with their frequencies n (477, 63, 12, 150, 7, 32, 11, 4, 231, 94, 13, 378, 12, 169, 45, 31)
Sources of knowledge: q1 radio, q2 newspapers, q3 reading, q4 lectures
A single latent dimension Z: Normal (mean 0; std dev = 1), so Var = 1 too!
Item information functions shown alongside their ICCs; beware y axis scaling : not all the same

Mplus version 4.1 ML Estimate

                          Estimate   S.E.     Compare with Bartholomew (1987) p160
Z by Q1  $\alpha_{11}$    0.721      0.093    0.72 (0.09)
Z by Q2  $\alpha_{21}$    3.358      1.035    3.40 (1.14)
Z by Q3  $\alpha_{31}$    1.344      0.167    1.34 (0.17)
Z by Q4  $\alpha_{41}$    0.769      0.145    0.77 (0.15)
Variances  Z              1

Weighted scores
Weights: the $\alpha_{i1}$ parameters
These numbers: Q1 0.72, Q2 3.40, Q3 1.34, Q4 0.77

Estimated component scores (weighted values)

Pattern  n    Total score  Component score [weighted by $\alpha_{i1}$]   Factor score
0000     477  0            0                         = 0                  -0.98
1000     63   1            0.72                      = 0.72               -0.68
0001     12   1            0.77                      = 0.77               -0.67
0010     150  1            1.34                      = 1.34               -0.46
1001     7    2            0.72 + 0.77               = 1.48               -0.41
1010     32   2            0.72 + 1.34               = 2.06               -0.23
0011     11   2            1.34 + 0.77               = 2.10               -0.22
1011     4    3            0.72 + 1.34 + 0.77        = 2.82               0.0
0100     231  1            3.40                      = 3.40               0.16
1100     94   2            0.72 + 3.40               = 4.12               0.42
0101     13   2            3.40 + 0.77               = 4.16               0.43
0110     378  2            3.40 + 1.34               = 4.74               0.66
1101     12   3            0.72 + 3.40 + 0.77        = 4.88               0.72
1110     169  3            0.72 + 3.40 + 1.34        = 5.46               0.99
0111     45   3            3.40 + 1.34 + 0.77        = 5.50               1.02

But the bees knees are..
the estimated factor scores from the model
Not just some simple sum of unweighted or weighted items
Takes into account the proposed score distribution (gaussian normal) and the estimated model parameters (but not the fact that they are estimates rather than known values) and more besides (when missing data are present)
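One common way to obtain such factor scores is the posterior mean of the latent variable given the response pattern (often called an EAP score), combining the normal (0, 1) score distribution with the item response functions. The sketch below is one possible implementation on a simple grid, not necessarily the method used by the software behind these slides. The slopes are the values quoted in the slides; the intercepts are HYPOTHETICAL (set to 0) because the slides do not report them, so the printed scores will not match the factor scores in the tables.

```python
import math

def prob_2pl(z, intercept, slope):
    """Probability of a positive response under the 2-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def eap_score(pattern, intercepts, slopes, n_points=241, z_max=6.0):
    """Posterior mean of z given a 0/1 pattern, with a N(0, 1) prior,
    approximated on an equally spaced grid over [-z_max, z_max]."""
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        z = -z_max + 2.0 * z_max * k / (n_points - 1)
        weight = math.exp(-0.5 * z * z)  # unnormalised N(0, 1) density
        for ch, a0, a1 in zip(pattern, intercepts, slopes):
            p = prob_2pl(z, a0, a1)
            weight *= p if ch == "1" else 1.0 - p
        numerator += z * weight
        denominator += weight
    return numerator / denominator

SLOPES = (0.72, 3.40, 1.34, 0.77)    # as quoted in the slides
INTERCEPTS = (0.0, 0.0, 0.0, 0.0)    # HYPOTHETICAL placeholders, not estimates

for pattern in ("0000", "1000", "0100", "1111"):
    print(pattern, round(eap_score(pattern, INTERCEPTS, SLOPES), 2))
```

Unlike the sum score, this score distinguishes 1000 from 0100: endorsing the more discriminating item moves the estimate further up the scale.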

A graphical and interactive introduction to IRT
Play with the key features of IRT models
www2.unijena.de/svw/metheval/irt/VisualIRT.pdf

a b (see) [2 parameter IRT model]
VisualIRT (pdf)
Page

VisualIRT (pdf)
Page

Individual's score = new ruler value
Any hypothetical latent variable [factor/trait] contin
expressed in a z-score metric (gaussian normal (0,1
Item properties
slope = item discrimination
location = item commonality [difficulty/prevalence/
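The slope/location ("a b") description and the intercept/slope form of the logit-probit model are two parameterisations of the same curve. Writing $a_i$ for the slope (discrimination) and $b_i$ for the location (difficulty), with $\alpha_{i0}$ and $\alpha_{i1}$ as assumed notation for the intercept and slope, the correspondence is:

```latex
\operatorname{logit}\{\pi_i(z)\} = a_i\,(z - b_i) = \alpha_{i0} + \alpha_{i1} z
\quad\Longrightarrow\quad
a_i = \alpha_{i1},
\qquad
b_i = -\frac{\alpha_{i0}}{\alpha_{i1}} .
```

At $z = b_i$ the response probability is exactly 0.5, which is why the location parameter can be read off the curve as the point on the latent scale where a positive response becomes more likely than not.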

IRT Resources
A visual guide to Item Response Theory: I. Partchev
Introduction to IRT: R. Baker, http://ericae.net/irt/baker/toc.htm
An introduction to modern measurement theory: B Reeve
Chapter in Fayers and Machin QoL book: P Fayers
ABC of Item Response Theory: H Goldstein
Moustaki papers, and online slides (FA at 100)
LSE books (Bartholomew, Knott, Moustaki, Steele)

Item Response Theory Books
Applying The Rasch Model Trevor G. Bond and Christine M. Fox 255 pages. 2001.
Constructing Measures: An Item Response Modeling Approach Mark Wilson. 248 pages. 2005.
The EM Algorithm and Related Statistical Models Michiko Watanabe and Kazunori
Yamaguchi. 250 pages. 2004.
Essays on Item Response Theory Edited by Anne Boomsma, Marijtje A.J. van Duijn, Tom A.A.
Snijders. 438 pages. 2001.
Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach
Edited by Paul De Boeck and Mark Wilson. 382 pages. 2004.
Fundamentals of Item Response Theory Ronald K. Hambleton, H. Swaminathan, and H. Jane
Rogers. 184 pages. 1991.
Handbook of Modern Item Response Theory Edited by Wim J. van der Linden and Ronald K.
Hambleton. 510 pages. 1997.
Introduction to Nonparametric Item Response Theory Klaas Sijtsma and Ivo W. Molenaar.
168 pages. 2002.
Item Response Theory Mathilda Du Toit. 906 pages. 2003.
Item Response Theory for Psychologists Susan E. Embretson and Steven P. Reise. 376
pages. 2000.
Item Response Theory: Parameter Estimation Techniques (Second Edition, Revised and
Expanded w/CD) Frank Baker and Seock-Ho Kim. 495 pages. 2004.
Item Response Theory: Principles and Applications Ronald K. Hambleton and Hariharan
Swaminathan. 332 pages. 1984.
Logit and Probit: Ordered and Multinomial Models Vani K. Borooah. 96 pages. 2002.
Markov Chain Monte Carlo in Practice W.R. Gilks, Sylvia Richardson, and D.J.
Spiegelhalter. 512 pages. 1995.
Monte Carlo Statistical Methods Christian P. Robert and George Casella. 645 pages.
2004.
Polytomous Item Response Theory Models Remo Ostini and Michael L. Nering. 120
pages. 2005.
Rasch Models for Measurement David Andrich. 96 pages. 1988.
Rasch Models: Foundations, Recent Developments, and Applications Edited by Gerhard H. Fischer and Ivo W. Molenaar. 436 pages. 1995.