Parameters in IRT
American Board of Internal Medicine
Item Response Theory Course
Overview
• Ability parameter estimation with known item parameters:
  – Maximum Likelihood
  – Bayesian procedures
• Joint estimation of item and ability parameters when both are unknown
Estimation of Ability with Known Item Parameters
• Given item parameters and the vector of observed item responses for an examinee, what is the most likely ability level for this examinee?
Recall IRT Assumptions
• Unidimensionality of the Test
• Local Independence
• Nature of the ICC
• Parameter Invariance
$$P(u_{ij} = 1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{e^{D a_j (\theta_i - b_j)}}{1 + e^{D a_j (\theta_i - b_j)}}$$
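The model above translates directly into code. A minimal sketch in Python (the function name is mine; D = 1.7 is the conventional scaling constant that makes the logistic curve approximate the normal ogive):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response, P(u = 1 | theta)."""
    z = D * a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))
```

At θ = b the exponent is zero, so the probability is c + (1 − c)/2, halfway between the guessing floor c and 1.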
[Figure: ICC for an item with b = 0.0, a = 1.0, c = 0.0; x-axis: Ability (θ), y-axis: Response Probability]
[Figure: ICC for an item with b = -0.5, a = 0.5, c = 0.1; x-axis: Ability (θ), y-axis: Response Probability]
Estimation of Ability with Known Item Parameters
• Given item parameters and the vector of observed item responses for an examinee, what is the most likely ability level for this examinee?
• What value of θ is most likely to result in the pattern of item responses we observed?
Local Independence
• Recall that the principle of local item independence states that item responses are statistically independent given θ.
• For a two-item test, the possible response patterns are (0,0), (0,1), (1,0), or (1,1).
[Figure: ICCs P1(θ) and P2(θ); Item 1: b = -0.5, a = 1.0, c = 0.1; Item 2: b = 1.0, a = 0.5, c = 0.2; x-axis: Ability (θ), y-axis: Response Probability]
[Figure: incorrect-response curves Q1(θ) and Q2(θ); Item 1: b = -0.5, a = 1.0, c = 0.1; Item 2: b = 1.0, a = 0.5, c = 0.2; x-axis: Ability (θ), y-axis: Response Probability]
For any given level of θ, the probability of observing a certain response pattern is obtained by multiplying the corresponding probabilities for the observed responses (P or Q).

[Figure: P1(θ) and P2(θ) for the two example items; x-axis: Ability (θ), y-axis: Response Probability]
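This multiplication rule can be sketched using the two items from the figures (the `p_3pl` helper re-implements the 3PL formula; the names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

# Item parameters from the two-item example
items = [(1.0, -0.5, 0.1),   # Item 1: (a, b, c)
         (0.5,  1.0, 0.2)]   # Item 2: (a, b, c)

def pattern_probability(u, theta):
    """P(u | theta): multiply P for each correct response and Q = 1 - P
    for each incorrect response (local independence)."""
    prob = 1.0
    for (a, b, c), resp in zip(items, u):
        p = p_3pl(theta, a, b, c)
        prob *= p if resp == 1 else (1.0 - p)
    return prob
```

At any fixed θ, the four pattern probabilities sum to 1, which is a quick sanity check on the local-independence product.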
Determining Likelihood
• From the model, we know the probability of a correct or incorrect response for each item, so we can determine the likelihood of a certain response pattern for many levels of θ and determine which level corresponds to the highest probability.
Why “Likelihood?”
• When discussing “the probability” of something occurring, it implies that we haven’t observed it happen.
• We start with item response data, so item responses are clearly NOT unobserved.
• In this situation, we refer to the “likelihood” of observing a certain response pattern, given ability.
Each “likelihood function” is determined by multiplying P1 or Q1 by P2 or Q2 for many levels of θ from -3 to 3.

[Figure: likelihood functions for u = (1,1), (0,0), (1,0), and (0,1); Item 1: a = 1.0, b = -0.5, c = 0.1; Item 2: a = 0.5, b = 1.0, c = 0.2; x-axis: Ability (θ), y-axis: Likelihood]
The point where the likelihood function of θ reaches its largest value is known as the Maximum Likelihood Estimate (MLE) for θ.

[Figure: the same four likelihood functions, with each maximum marked; Item 1: a = 1.0, b = -0.5, c = 0.1; Item 2: a = 0.5, b = 1.0, c = 0.2; x-axis: Ability (θ), y-axis: Likelihood]
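A brute-force way to find the MLE is to evaluate the likelihood on a dense grid of θ values and take the argmax. This sketch does exactly that for the two-item example (grid search is for illustration only; real programs use Newton-type iterations, as described later; helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, -0.5, 0.1), (0.5, 1.0, 0.2)]  # (a, b, c) per item

def likelihood(u, theta):
    L = 1.0
    for (a, b, c), resp in zip(items, u):
        p = p_3pl(theta, a, b, c)
        L *= p if resp else 1 - p
    return L

def grid_mle(u, lo=-3.0, hi=3.0, steps=601):
    """Brute-force MLE: evaluate the likelihood on a grid, return the argmax."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: likelihood(u, t))
```

For u = (1,1) and u = (0,0) the grid argmax lands on the boundary, which is the numerical symptom of a likelihood with no interior maximum.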
What do these Likelihood Functions tell us?
• Response pattern (0,0) → Q1(θ)Q2(θ)

[Figure: Q1(θ) and Q2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (0,0) continues to increase as θ approaches -∞, so it has no maximum.

This is why administering a 2-item test is a bad idea!

[Figure: likelihood function for u = (0,0); x-axis: Ability (θ), y-axis: Likelihood]
What do these Likelihood Functions tell us?
• Response pattern (1,1) → P1(θ)P2(θ)

[Figure: P1(θ) and P2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (1,1) continues to increase as θ approaches +∞.

Another reason why 2-item tests are a bad idea!

[Figure: likelihood function for u = (1,1); x-axis: Ability (θ), y-axis: Likelihood]
What do these Likelihood Functions tell us?
• Response pattern (0,1) → Q1(θ)P2(θ)

[Figure: Q1(θ) and P2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (0,1) is a much “better behaved” likelihood function.

[Figure: likelihood function for u = (0,1); x-axis: Ability (θ), y-axis: Likelihood]
What do these Likelihood Functions tell us?
• Response pattern (1,0) → P1(θ)Q2(θ)

[Figure: P1(θ) and Q2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (1,0) is the “best behaved” of the four, with a clear interior maximum.

[Figure: likelihood function for u = (1,0); x-axis: Ability (θ), y-axis: Likelihood]
Local Independence
• Generalized, the probability of observing the item response vector, u, is equal to the product of the individual probabilities for each item:

$$P(u \mid \theta) = \prod_{j=1}^{n} P_j(\theta)^{u_j}\, Q_j(\theta)^{1-u_j}$$
Likelihood
• We denote the function “likelihood” instead of “probability” (i.e., “L” instead of “P”) because the item responses are observed values.

$$L(u \mid \theta) = \prod_{j=1}^{n} P_j(\theta)^{u_j}\, Q_j(\theta)^{1-u_j}$$
Log-likelihood
• In practice, we work with a transformation of the likelihood called the log-likelihood function, which is simply the natural logarithm (ln) of the likelihood.
• This monotone transformation preserves the ordering of the likelihood (the maximum occurs at the same θ), and has useful properties in estimation.
[Figure: log-likelihood, ln(L), plotted against likelihood L from 0 to 1; x-axis: Likelihood, y-axis: Log-likelihood]
Log-likelihood Function
• Convenient scale for interpretation (multiplying many probabilities results in very small values).
• Computational efficiency: the product of probabilities is equivalent to the sum of log-probabilities.
$$L = \prod_{j=1}^{n} P_j^{u_j}\, Q_j^{1-u_j}$$

$$\ln L = \ln \prod_{j=1}^{n} P_j^{u_j}\, Q_j^{1-u_j} = \sum_{j=1}^{n} \ln\left[ P_j^{u_j}\, Q_j^{1-u_j} \right]$$
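A sketch of the sum-of-logs computation for the two-item example (helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, -0.5, 0.1), (0.5, 1.0, 0.2)]  # (a, b, c) per item

def log_likelihood(u, theta):
    """ln L as a sum of log-probabilities (numerically stabler than the product)."""
    total = 0.0
    for (a, b, c), resp in zip(items, u):
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if resp else math.log(1 - p)
    return total
```

Exponentiating the sum recovers the product form, so both versions peak at the same θ.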
Finding the MLE
• Intuitively, we could estimate an MLE for any examinee by plotting many possible θ values in a wide range (from, say, -4 to +4) and determining which θ value maximizes the log-likelihood function…
  – This would work, but slowly… you have to give a score to every examinee!
Instead of finding the MLE by trial-and-error, notice that the MLE occurs where the slope of the log-likelihood function is equal to zero:

$$\frac{d \ln L}{d\theta} = 0$$

[Figure: log-likelihood function with the MLE marked at the point of zero slope; x-axis: Ability (θ), y-axis: Log-Likelihood]
The same MLE is found whether we use the likelihood function or the log-likelihood:

$$\frac{dL}{d\theta} = 0$$

[Figure: likelihood function with the MLE marked at the same θ; x-axis: Ability (θ), y-axis: Likelihood]
How can we do this efficiently?
• How do we find the point where the slope of the likelihood function is zero?
• Newton-Raphson Method
  – After Isaac Newton and Joseph Raphson.
• Will converge to a solution much more quickly than if you plotted a wide range of the log-L function.
Newton-Raphson Method
• Determine the first derivative of log-L for an initial estimate for θ → θ0.
• The tangent line of the 1st derivative (i.e., using the 2nd derivative) will cross the x-axis at a point (θ1) closer to the MLE than θ0 was.
• Repeat until θn reaches convergence (i.e., changes “very little”).
The slope of the log-likelihood function for any given level of θ is the first derivative:

$$\frac{d \ln L}{d\theta}$$

[Figure: log-likelihood function with the MLE at the point where d ln L/dθ = 0; x-axis: Ability (θ), y-axis: Log-Likelihood]
[Figure: absolute value of the first derivative of the log-likelihood, |d ln L/dθ|, plotted against θ; the tangent line at (θ0, y0) crosses the x-axis at θ1, closer to the MLE than θ0]

$$\mathrm{Slope}(m_0) = \frac{y_0 - 0}{\theta_0 - \theta_1} \quad\Rightarrow\quad \theta_1 = \theta_0 - \frac{y_0}{m_0}$$

θ0 = initial estimate; θ1 = improved estimate.
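The update θ1 = θ0 − y0/m0 is one Newton step. A sketch for the two-item example, using central finite differences for the first and second derivatives of the log-likelihood (an analytic-derivative version would be used in practice; helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, -0.5, 0.1), (0.5, 1.0, 0.2)]  # (a, b, c) per item

def log_lik(u, theta):
    return sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
               for (a, b, c), r in zip(items, u))

def newton_raphson_mle(u, theta0=0.0, tol=1e-6, max_iter=50, h=1e-5):
    """Newton-Raphson on the log-likelihood: theta <- theta - (d1 / d2),
    with derivatives approximated by central finite differences."""
    theta = theta0
    for _ in range(max_iter):
        d1 = (log_lik(u, theta + h) - log_lik(u, theta - h)) / (2 * h)
        d2 = (log_lik(u, theta + h) - 2 * log_lik(u, theta) + log_lik(u, theta - h)) / h ** 2
        step = d1 / d2
        theta -= step
        if abs(step) < tol:
            break
    return theta
```

Starting from θ0 = 0, the well-behaved pattern u = (1, 0) converges in a handful of iterations.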
Hypothetical 5-item Test
Consider various log-likelihood functions.

Item    a       b       c
1       1.00    -2.00   0.25
2       1.00    -1.00   0.25
3       1.00     0.00   0.25
4       1.00     1.00   0.25
5       1.00     2.00   0.25
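A sketch computing log-likelihoods for this 5-item test over a grid of θ values, reproducing the behavior shown in the figures that follow (helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

# The hypothetical 5-item test: a = 1.0, c = 0.25, b = -2, -1, 0, 1, 2
items = [(1.0, b, 0.25) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def log_lik(u, theta):
    return sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
               for (a, b, c), r in zip(items, u))

def grid_argmax(u, lo=-3.0, hi=3.0, steps=1201):
    """Locate the peak of the log-likelihood on a dense grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_lik(u, t))
```

Mixed patterns such as (1,1,0,0,0) have an interior maximum; the all-correct and all-incorrect patterns push the argmax to the grid boundary, i.e., no finite MLE.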
5-item Test ICCs

[Figure: the five ICCs, P(u=1|θ); x-axis: Ability (θ), y-axis: P(u=1|θ)]
5-item Test TCC

[Figure: test characteristic curve, E(X|θ); x-axis: Ability (θ), y-axis: E(X|θ)]
[Figures: log-likelihood functions for response patterns u = (1,1,0,0,0), (1,1,1,0,0), (0,0,0,0,0), (1,1,1,1,1), and (0,1,1,1,1); x-axis: Ability (θ), y-axis: Log-Likelihood]
Benefits of MLE
• MLEs have desirable properties:
  – Efficient (asymptotically smallest variance).
  – Consistent (asymptotically unbiased).
  – Asymptotically normally distributed.
• “Asymptotically” here refers to the number of items: the longer the test, the better MLE works.
The Problem with MLE
• As we’ve seen, sometimes there is no MLE for a given response vector.
• These are less likely to occur with longer tests, but still possible.
• We still, however, would like to provide scores for these examinees!
• Bayesian estimation can help…
Bayes’ Theorem
• Thomas Bayes (1702-1761) put forth what is known as “Bayes’ Theorem”:

$$f(\theta \mid u) = \frac{f(u \mid \theta)\, f(\theta)}{f(u)}$$

Bayesian: “No, we can estimate the distribution of a parameter given the data.”

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$
Bayesian Ability Estimation
• The approach basically entails combining the likelihood function with a prior distribution to estimate the posterior distribution of ability:

Posterior is proportional to Likelihood × Prior

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$
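Because the posterior is just likelihood × prior, a Bayesian (MAP) estimate can be sketched by maximizing the log-likelihood plus the log of a N(0,1) prior density (up to a constant), here for the 5-item test (grid search and the helper names are illustrative choices of mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, b, 0.25) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # the 5-item test

def log_posterior(u, theta):
    """Log posterior up to a constant: log-likelihood + log N(0,1) prior density."""
    ll = sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
             for (a, b, c), r in zip(items, u))
    return ll - 0.5 * theta ** 2

def map_estimate(u, lo=-4.0, hi=4.0, steps=1601):
    """Maximum a posteriori (MAP) estimate via grid search."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_posterior(u, t))
```

Unlike the MLE, the all-correct pattern now has a finite interior maximum, because the prior pulls the tails of the likelihood down.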
Log-likelihood with Bayes

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$

• Take the likelihood function for any given value of θ and multiply it by the prior density value at θ.
Bayes’ Estimates in Practice

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$

• Conceptually, this computation effectively adds an “item” to each likelihood function, and every examinee gets it “correct”.
• Instead of a monotonically increasing function for this “item”, we choose, say, a normal density, which results in “pulling down the tails” of the likelihood function.
[Figures: for each response pattern u = (1,1,0,0,0), (1,1,1,0,0), (0,0,0,0,0), (1,1,1,1,1), and (0,1,1,1,1): the log-likelihood plotted together with ln(Prior) for Prior ~ N(0,1), followed by the resulting log-posterior; x-axis: Ability (θ), y-axes: Log-Likelihood & Log-Prior, Log-Posterior]
Bayesian vs. MLE
• Bayesian estimates will be biased towards the mean of the prior distribution; this is more apparent with shorter tests, as the prior has a lot of influence (1/6 in our example!).
• Influence of the prior can be lessened by choosing a relatively “uninformative” prior, e.g., N(0,10).
Bayesian vs. MLE
• Recall that MLEs are asymptotically unbiased; with relatively short tests, MLEs will be biased outwards, as opposed to Bayesian estimates, which are biased inwards.
• Choosing a Uniform Prior will result in estimates identical to MLE.
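This equivalence is easy to demonstrate numerically: with a flat prior the log-posterior differs from the log-likelihood only by a constant, so it has the same argmax (a sketch under the 5-item test; helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, b, 0.25) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # the 5-item test

def log_lik(u, theta):
    return sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
               for (a, b, c), r in zip(items, u))

def argmax_on_grid(f, lo=-4.0, hi=4.0, steps=1601):
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=f)

u = (1, 1, 0, 0, 0)
mle        = argmax_on_grid(lambda t: log_lik(u, t))
map_normal = argmax_on_grid(lambda t: log_lik(u, t) - 0.5 * t * t)  # N(0,1) prior
map_flat   = argmax_on_grid(lambda t: log_lik(u, t) + 0.0)          # flat prior adds a constant
```

The N(0,1) prior pulls the estimate inwards, toward the prior mean; the flat prior leaves it exactly at the MLE.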
Joint Estimation of Item and Ability Parameters
• What happens when all you have is a dataset filled with item responses and no previously estimated item parameters are available for you to use in scoring?
  – Fairly technical topic; we’ll keep it on the applied side for this discussion.
Joint Estimation of Item and Ability Parameters
• Most Common Approach:
  – Marginal Maximum Likelihood
• Other procedures:
  – Joint Maximum Likelihood
    • Rasch only
  – Markov Chain Monte Carlo (MCMC)
    • Typically only used for highly parameterized models
Joint Maximum Likelihood
1. Obtain initial estimates of θi, i = 1, …, N.
2. Solve likelihood equations for bj, aj, and cj, j = 1, …, n.
3. Return to the first set of equations and solve for θi, i = 1, …, N.
4. Repeat until estimates from one iteration to the next converge (a popular criterion is 0.001).
Joint Maximum Likelihood
• JMLE can work well for the Rasch model, but estimates are not always consistent or unbiased for the 2- and 3-PL models.
• Item (structural) parameters need to be estimated without reference to ability (incidental) parameters…
Marginal MLE
• Used to estimate the item parameters using the marginal distribution of ability parameters.
• Person parameters are then estimated using one of the previously mentioned techniques, treating item parameters as fixed and known.
Marginal Distribution: Continuous

$$f(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2$$

where f(x1) is the marginal distribution of item parameters, f(x2) is the marginal distribution of ability parameters, and f(x1, x2) is the joint distribution of item & ability.
Marginal Distribution: Discrete
f ( x1 ) = ∑ f ( x1 , x2 )
x2
Where f(x1) is the marginal distribution of item parameters,
f( 2) is
f(x i the
th marginal
i l distribution
di t ib ti off ability
bilit parameters,
t
and
f(x1,x2) is the joint distribution of item & ability
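A sketch of the discrete marginalization: approximate the marginal probability of a response pattern by summing the conditional likelihood times a N(0,1) density over equally spaced quadrature nodes (equal spacing is a simplification of mine; production programs use better quadrature rules, and the helper names are mine too):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

def likelihood(u, theta, items):
    L = 1.0
    for (a, b, c), r in zip(items, u):
        p = p_3pl(theta, a, b, c)
        L *= p if r else 1 - p
    return L

def marginal_likelihood(u, items, n_nodes=61, lo=-4.0, hi=4.0):
    """Approximate the integral of L(u | theta) * N(0,1) density over theta
    by a discrete sum over equally spaced quadrature nodes."""
    step = (hi - lo) / (n_nodes - 1)
    total = 0.0
    for i in range(n_nodes):
        t = lo + i * step
        density = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
        total += likelihood(u, t, items) * density * step
    return total
```

Summed over all 2^5 response patterns, the marginal probabilities come out to approximately 1, confirming the discrete sum behaves like the integral.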
Integrating Over Theta
• Finding the marginal distribution of
the item parameters by integrating
over θ …
• This eliminates θ from the function
and we can use the resulting
likelihood function to get the
parameter estimates.
Integrating Over θ
• However, to do this, we don’t want to weight all θ values the same in determining the contribution to the marginal density; as such, a distribution of θ is used to take this into account.
  – Typically a N(0,1) distribution is used.
  – Could be empirically derived.
MMLE
• Once θ has been eliminated from the function, item parameters can be estimated using a Maximum Likelihood procedure.
• Bayesian procedures, too!
Marginal MLE
1. Specify a density function for θ, e.g., θ ~ N(0,1), and treat examinees as a random sample from the θ distribution.
2. Integrate over the log-likelihood function of item parameters with respect to θ.
Marginal MLE

[Figure: N(0,1) density with 13 quadrature nodes X1–X13 at θ = -3, -2.5, …, 2.5, 3; x-axis: Ability Quadrature Nodes, y-axis: Height of Density Function]
EM Algorithm
• Expectation step: use provisional item parameter estimates to compute examinee likelihood values along quadrature points (i.e., discrete θ values) and estimate expected frequency and proportion-correct values for each point.
EM Algorithm
• Maximization step: iteratively solve item parameter likelihood functions (not included here) using the expected frequency and proportion-correct values.
• These two steps go back and forth until the overall likelihood is “unchanged” from the previous cycle.
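A sketch of the Expectation step described above: for each examinee, compute posterior weights at the quadrature nodes, then accumulate expected frequencies (n_k) and expected counts correct per item (r_jk). Function and variable names are mine:

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

def e_step(data, items, nodes):
    """E-step sketch: expected frequency (n_k) at each quadrature node and
    expected count correct (r_jk) per item per node."""
    # Prior weights: N(0,1) density evaluated at the nodes, normalized
    w = [math.exp(-0.5 * t * t) for t in nodes]
    s = sum(w)
    w = [x / s for x in w]

    n_k = [0.0] * len(nodes)
    r_jk = [[0.0] * len(nodes) for _ in items]
    for u in data:                       # one response vector per examinee
        # Posterior weight of each node for this examinee
        post = []
        for k, t in enumerate(nodes):
            L = 1.0
            for (a, b, c), resp in zip(items, u):
                p = p_3pl(t, a, b, c)
                L *= p if resp else 1 - p
            post.append(L * w[k])
        norm = sum(post)
        post = [x / norm for x in post]
        # Accumulate expected counts
        for k in range(len(nodes)):
            n_k[k] += post[k]
            for j, resp in enumerate(u):
                if resp:
                    r_jk[j][k] += post[k]
    return n_k, r_jk
```

The n_k sum to the number of examinees, and each r_jk[j] sums to the observed number correct on item j, which is a useful invariant to check.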
Bayesian Item Estimation
• In the analogous situation to estimation of
ability alone, there may arise situations
where no maximum exists for the
multidimensional likelihood function.
– Weird values are still possible.
• Specification of priors for item parameters
and consequent estimation of the posterior
distributions of item parameters will
facilitate convergence of solutions.
Bayesian Item Estimation
• With or without marginalization of incidental parameters (i.e., θ).
  – Marginalization: a simple Bayesian extension of the MMLE/EM procedure.
  – Without: MCMC approaches.
Marginalized Approach
• Mislevy & Bock (1986, 1989).
  – Implemented in BILOG, BILOG-MG.
• Works in essentially the same way as the MMLE/EM approach, except that likelihood functions for item parameters are “mixed” with prior distributions.
  – Point estimates are subsequently derived by finding the mean or mode of the posterior distribution for each item parameter.
Priors
• Informative: small variance.
• Uninformative: relatively large variance.
• The prior’s influence will be inversely proportional to (1) its variance, and (2) the amount of data available for estimation.
Great Reference for more
technical discussion of MMLE,
EM, & Bayesian approaches:
MCMC Estimation
• To simplify, a distribution from which we can easily draw samples is chosen, and random draws are taken from it.
• These draws are either accepted or rejected as being plausible from the actual posterior distribution until we have retained enough draws to make inferences.
MCMC Estimation
• The draws that are retained are then taken to be a sample from the posterior distribution.
• Using the sample from the posterior, the point estimates can be taken to be the mean or the mode of the posterior distribution.
MCMC Estimation
• One nice feature:
  – One can incorporate estimates of uncertainty into parameter estimation.
    • E.g., standard errors of ability can be taken into account when estimating item parameters.
    • Likewise, standard errors of item parameters can be taken into account when estimating ability parameters.
MCMC Conclusion
• A method of simulating random draws from any theoretical multivariate distribution.
  – Any posterior distribution.
• Features of the theoretical distribution (e.g., mean, variance) are then estimated based on the random sample.
• Easy to specify, but takes a long time.
  – Very inefficient, but it almost always works.
Next…