Parameters in IRT
American Board of Internal Medicine
Item Response Theory Course
Overview
• Ability parameter estimation with known item parameters:
  – Maximum Likelihood
  – Bayesian procedures
• Joint estimation of item and ability parameters when both are unknown
Estimation of Ability with Known Item Parameters
• Given item parameters and the vector of observed item responses for an examinee, what is the most likely ability level for this examinee?
Recall IRT Assumptions
• Unidimensionality of the Test
• Local Independence
• Nature of the ICC
• Parameter Invariance
$$P(u_{ij} = 1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{e^{D a_j (\theta_i - b_j)}}{1 + e^{D a_j (\theta_i - b_j)}}$$
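The model above translates directly into code. A minimal sketch in Python (the function name is mine; D = 1.7 is the conventional scaling constant that makes the logistic curve approximate the normal ogive):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response, P(u = 1 | theta)."""
    z = D * a * (theta - b)
    return c + (1.0 - c) * math.exp(z) / (1.0 + math.exp(z))
```

At θ = b the exponent is zero, so the probability is c + (1 − c)/2, halfway between the guessing floor c and 1.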
[Figure: ICC for an item with b = 0.0, a = 1.0, c = 0.0; x-axis: Ability (θ), y-axis: Response Probability]
[Figure: ICC for an item with b = -0.5, a = 0.5, c = 0.1; x-axis: Ability (θ), y-axis: Response Probability]
Estimation of Ability with Known Item Parameters
• Given item parameters and the vector of observed item responses for an examinee, what is the most likely ability level for this examinee?
• What value of θ is most likely to result in the pattern of item responses we observed?
Local Independence
• Recall that the principle of local item independence states that item responses are statistically independent given θ.
• For a two-item test, the possible response patterns are (0,0), (0,1), (1,0), or (1,1).
[Figure: ICCs P1(θ) and P2(θ); Item 1: b = -0.5, a = 1.0, c = 0.1; Item 2: b = 1.0, a = 0.5, c = 0.2; x-axis: Ability (θ), y-axis: Response Probability]
[Figure: incorrect-response curves Q1(θ) and Q2(θ); Item 1: b = -0.5, a = 1.0, c = 0.1; Item 2: b = 1.0, a = 0.5, c = 0.2; x-axis: Ability (θ), y-axis: Response Probability]
For any given level of θ, the probability of observing a certain response pattern is obtained by multiplying the corresponding probabilities for the observed responses (P or Q).

[Figure: P1(θ) and P2(θ) for the two example items; x-axis: Ability (θ), y-axis: Response Probability]
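This multiplication rule can be sketched using the two items from the figures (the `p_3pl` helper re-implements the 3PL formula; the names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

# Item parameters from the two-item example
items = [(1.0, -0.5, 0.1),   # Item 1: (a, b, c)
         (0.5,  1.0, 0.2)]   # Item 2: (a, b, c)

def pattern_probability(u, theta):
    """P(u | theta): multiply P for each correct response and Q = 1 - P
    for each incorrect response (local independence)."""
    prob = 1.0
    for (a, b, c), resp in zip(items, u):
        p = p_3pl(theta, a, b, c)
        prob *= p if resp == 1 else (1.0 - p)
    return prob
```

At any fixed θ, the four pattern probabilities sum to 1, which is a quick sanity check on the local-independence product.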
Determining Likelihood
• From the model, we know the probability of a correct or incorrect response for each item, so we can determine the likelihood of a certain response pattern for many levels of θ and determine which level corresponds to the highest probability.
Why “Likelihood?”
• When discussing “the probability” of something occurring, it implies that we haven’t observed it happen.
• We start with item response data, so item responses are clearly NOT unobserved.
• In this situation, we refer to the “likelihood” of observing a certain response pattern, given ability.
Each “likelihood function” is determined by multiplying P1 or Q1 by P2 or Q2 for many levels of θ from -3 to 3.

[Figure: likelihood functions for u = (1,1), (0,0), (1,0), and (0,1); Item 1: a = 1.0, b = -0.5, c = 0.1; Item 2: a = 0.5, b = 1.0, c = 0.2; x-axis: Ability (θ), y-axis: Likelihood]
The point where the likelihood function of θ reaches its largest value is known as the Maximum Likelihood Estimate (MLE) for θ.

[Figure: the same four likelihood functions, with each maximum marked; Item 1: a = 1.0, b = -0.5, c = 0.1; Item 2: a = 0.5, b = 1.0, c = 0.2; x-axis: Ability (θ), y-axis: Likelihood]
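A brute-force way to find the MLE is to evaluate the likelihood on a dense grid of θ values and take the argmax. This sketch does exactly that for the two-item example (grid search is for illustration only; real programs use Newton-type iterations, as described later; helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, -0.5, 0.1), (0.5, 1.0, 0.2)]  # (a, b, c) per item

def likelihood(u, theta):
    L = 1.0
    for (a, b, c), resp in zip(items, u):
        p = p_3pl(theta, a, b, c)
        L *= p if resp else 1 - p
    return L

def grid_mle(u, lo=-3.0, hi=3.0, steps=601):
    """Brute-force MLE: evaluate the likelihood on a grid, return the argmax."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: likelihood(u, t))
```

For u = (1,1) and u = (0,0) the grid argmax lands on the boundary, which is the numerical symptom of a likelihood with no interior maximum.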
What do these Likelihood Functions tell us?
• Response pattern (0,0) → Q1(θ)Q2(θ)

[Figure: Q1(θ) and Q2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (0,0) continues to increase as θ approaches -∞, so it has no maximum.

This is why administering a 2-item test is a bad idea!

[Figure: likelihood function for u = (0,0); x-axis: Ability (θ), y-axis: Likelihood]
What do these Likelihood Functions tell us?
• Response pattern (1,1) → P1(θ)P2(θ)

[Figure: P1(θ) and P2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (1,1) continues to increase as θ approaches +∞.

Another reason why 2-item tests are a bad idea!

[Figure: likelihood function for u = (1,1); x-axis: Ability (θ), y-axis: Likelihood]
What do these Likelihood Functions tell us?
• Response pattern (0,1) → Q1(θ)P2(θ)

[Figure: Q1(θ) and P2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (0,1) is a much “better behaved” likelihood function.

[Figure: likelihood function for u = (0,1); x-axis: Ability (θ), y-axis: Likelihood]
What do these Likelihood Functions tell us?
• Response pattern (1,0) → P1(θ)Q2(θ)

[Figure: P1(θ) and Q2(θ); x-axis: Ability (θ), y-axis: Response Probability]
The likelihood function for u = (1,0) is the “best behaved” of the four, with a clear interior maximum.

[Figure: likelihood function for u = (1,0); x-axis: Ability (θ), y-axis: Likelihood]
Local Independence
• Generalized, the probability of observing the item response vector, u, is equal to the product of the individual probabilities for each item:

$$P(u \mid \theta) = \prod_{j=1}^{n} P_j(\theta)^{u_j}\, Q_j(\theta)^{1-u_j}$$
Likelihood
• We denote the function “likelihood” instead of “probability” (i.e., “L” instead of “P”) because the item responses are observed values.

$$L(u \mid \theta) = \prod_{j=1}^{n} P_j(\theta)^{u_j}\, Q_j(\theta)^{1-u_j}$$
Log-likelihood
• In practice, we work with a transformation of the likelihood called the log-likelihood function, which is simply the natural logarithm (ln) of the likelihood.
• This monotone transformation preserves the ordering of the likelihood (the maximum occurs at the same θ), and has useful properties in estimation.
[Figure: log-likelihood, ln(L), plotted against likelihood L from 0 to 1; x-axis: Likelihood, y-axis: Log-likelihood]
Log-likelihood Function
• Convenient scale for interpretation (multiplying many probabilities results in very small values).
• Computational efficiency: the product of probabilities is equivalent to the sum of log-probabilities.
$$L = \prod_{j=1}^{n} P_j^{u_j}\, Q_j^{1-u_j}$$

$$\ln L = \ln \prod_{j=1}^{n} P_j^{u_j}\, Q_j^{1-u_j} = \sum_{j=1}^{n} \ln\left[ P_j^{u_j}\, Q_j^{1-u_j} \right]$$
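A sketch of the sum-of-logs computation for the two-item example (helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, -0.5, 0.1), (0.5, 1.0, 0.2)]  # (a, b, c) per item

def log_likelihood(u, theta):
    """ln L as a sum of log-probabilities (numerically stabler than the product)."""
    total = 0.0
    for (a, b, c), resp in zip(items, u):
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if resp else math.log(1 - p)
    return total
```

Exponentiating the sum recovers the product form, so both versions peak at the same θ.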
Finding the MLE
• Intuitively, we could estimate an MLE for any examinee by plotting many possible θ values in a wide range (from, say, -4 to +4) and determining which θ value maximizes the log-likelihood function…
  – This would work, but slowly… you have to give a score to every examinee!
Instead of finding the MLE by trial-and-error, notice that the MLE occurs where the slope of the log-likelihood function is equal to zero:

$$\frac{d \ln L}{d\theta} = 0$$

[Figure: log-likelihood function with the MLE marked at the point of zero slope; x-axis: Ability (θ), y-axis: Log-Likelihood]
The same MLE is found whether we use the likelihood function or the log-likelihood:

$$\frac{dL}{d\theta} = 0$$

[Figure: likelihood function with the MLE marked at the same θ; x-axis: Ability (θ), y-axis: Likelihood]
How can we do this efficiently?
• How do we find the point where the slope of the likelihood function is zero?
• Newton-Raphson Method
  – After Isaac Newton and Joseph Raphson.
• Will converge to a solution much more quickly than if you plotted a wide range of the log-L function.
Newton-Raphson Method
• Determine the first derivative of log-L for an initial estimate for θ → θ0.
• The tangent line of the 1st derivative (i.e., using the 2nd derivative) will cross the x-axis at a point (θ1) closer to the MLE than θ0 was.
• Repeat until θn reaches convergence (i.e., changes “very little”).
The slope of the log-likelihood function for any given level of θ is the first derivative:

$$\frac{d \ln L}{d\theta}$$

[Figure: log-likelihood function with the MLE at the point where d ln L/dθ = 0; x-axis: Ability (θ), y-axis: Log-Likelihood]
[Figure: absolute value of the first derivative of the log-likelihood, |d ln L/dθ|, plotted against θ; the tangent line at (θ0, y0) crosses the x-axis at θ1, closer to the MLE than θ0]

$$\mathrm{Slope}(m_0) = \frac{y_0 - 0}{\theta_0 - \theta_1} \quad\Rightarrow\quad \theta_1 = \theta_0 - \frac{y_0}{m_0}$$

θ0 = initial estimate; θ1 = improved estimate.
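The update θ1 = θ0 − y0/m0 is one Newton step. A sketch for the two-item example, using central finite differences for the first and second derivatives of the log-likelihood (an analytic-derivative version would be used in practice; helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, -0.5, 0.1), (0.5, 1.0, 0.2)]  # (a, b, c) per item

def log_lik(u, theta):
    return sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
               for (a, b, c), r in zip(items, u))

def newton_raphson_mle(u, theta0=0.0, tol=1e-6, max_iter=50, h=1e-5):
    """Newton-Raphson on the log-likelihood: theta <- theta - (d1 / d2),
    with derivatives approximated by central finite differences."""
    theta = theta0
    for _ in range(max_iter):
        d1 = (log_lik(u, theta + h) - log_lik(u, theta - h)) / (2 * h)
        d2 = (log_lik(u, theta + h) - 2 * log_lik(u, theta) + log_lik(u, theta - h)) / h ** 2
        step = d1 / d2
        theta -= step
        if abs(step) < tol:
            break
    return theta
```

Starting from θ0 = 0, the well-behaved pattern u = (1, 0) converges in a handful of iterations.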
Hypothetical 5-item Test
Consider various log-likelihood functions.

Item    a       b       c
1       1.00    -2.00   0.25
2       1.00    -1.00   0.25
3       1.00     0.00   0.25
4       1.00     1.00   0.25
5       1.00     2.00   0.25
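A sketch computing log-likelihoods for this 5-item test over a grid of θ values, reproducing the behavior shown in the figures that follow (helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

# The hypothetical 5-item test: a = 1.0, c = 0.25, b = -2, -1, 0, 1, 2
items = [(1.0, b, 0.25) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def log_lik(u, theta):
    return sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
               for (a, b, c), r in zip(items, u))

def grid_argmax(u, lo=-3.0, hi=3.0, steps=1201):
    """Locate the peak of the log-likelihood on a dense grid."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_lik(u, t))
```

Mixed patterns such as (1,1,0,0,0) have an interior maximum; the all-correct and all-incorrect patterns push the argmax to the grid boundary, i.e., no finite MLE.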
5-item Test ICCs

[Figure: the five ICCs, P(u=1|θ); x-axis: Ability (θ), y-axis: P(u=1|θ)]
5-item Test TCC

[Figure: test characteristic curve, E(X|θ); x-axis: Ability (θ), y-axis: E(X|θ)]
[Figures: log-likelihood functions for response patterns u = (1,1,0,0,0), (1,1,1,0,0), (0,0,0,0,0), (1,1,1,1,1), and (0,1,1,1,1); x-axis: Ability (θ), y-axis: Log-Likelihood]
Benefits of MLE
• MLEs have desirable properties:
  – Efficient (asymptotically smallest variance).
  – Consistent (asymptotically unbiased).
  – Asymptotically normally distributed.
• “Asymptotically” here refers to the number of items: the longer the test, the better MLE works.
The Problem with MLE
• As we’ve seen, sometimes there is no MLE for a given response vector.
• These are less likely to occur with longer tests, but still possible.
• We still, however, would like to provide scores for these examinees!
• Bayesian estimation can help…
Bayes’ Theorem
• Thomas Bayes (1702-1761) put forth what is known as “Bayes’ Theorem”:

$$f(\theta \mid u) = \frac{f(u \mid \theta)\, f(\theta)}{f(u)}$$

Bayesian: “No, we can estimate the distribution of a parameter given the data.”

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$
Bayesian Ability Estimation
• The approach basically entails combining the likelihood function with a prior distribution to estimate the posterior distribution of ability:

Posterior is proportional to Likelihood × Prior

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$
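Because the posterior is just likelihood × prior, a Bayesian (MAP) estimate can be sketched by maximizing the log-likelihood plus the log of a N(0,1) prior density (up to a constant), here for the 5-item test (grid search and the helper names are illustrative choices of mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, b, 0.25) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # the 5-item test

def log_posterior(u, theta):
    """Log posterior up to a constant: log-likelihood + log N(0,1) prior density."""
    ll = sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
             for (a, b, c), r in zip(items, u))
    return ll - 0.5 * theta ** 2

def map_estimate(u, lo=-4.0, hi=4.0, steps=1601):
    """Maximum a posteriori (MAP) estimate via grid search."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_posterior(u, t))
```

Unlike the MLE, the all-correct pattern now has a finite interior maximum, because the prior pulls the tails of the likelihood down.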
Log-likelihood with Bayes

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$

• Take the likelihood function for any given value of θ and multiply it by the prior density value at θ.
Bayes’ Estimates in Practice

$$f(\theta \mid u) \propto L(u \mid \theta)\, f(\theta)$$

• Conceptually, this computation effectively adds an “item” to each likelihood function, and every examinee gets it “correct”.
• Instead of a monotonically increasing function for this “item”, we choose, say, a normal density, which results in “pulling down the tails” of the likelihood function.
[Figures: for each response pattern u = (1,1,0,0,0), (1,1,1,0,0), (0,0,0,0,0), (1,1,1,1,1), and (0,1,1,1,1): the log-likelihood plotted together with ln(Prior) for Prior ~ N(0,1), followed by the resulting log-posterior; x-axis: Ability (θ), y-axes: Log-Likelihood & Log-Prior, Log-Posterior]
Bayesian vs. MLE
• Bayesian estimates will be biased towards the mean of the prior distribution; this is more apparent with shorter tests, as the prior has a lot of influence (1/6 in our example!).
• Influence of the prior can be lessened by choosing a relatively “uninformative” prior, e.g., N(0,10).
Bayesian vs. MLE
• Recall that MLEs are asymptotically unbiased; with relatively short tests, MLEs will be biased outwards, as opposed to Bayesian estimates, which are biased inwards.
• Choosing a Uniform Prior will result in estimates identical to MLE.
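This equivalence is easy to demonstrate numerically: with a flat prior the log-posterior differs from the log-likelihood only by a constant, so it has the same argmax (a sketch under the 5-item test; helper names are mine):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

items = [(1.0, b, 0.25) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]  # the 5-item test

def log_lik(u, theta):
    return sum(math.log(p_3pl(theta, a, b, c)) if r else math.log(1 - p_3pl(theta, a, b, c))
               for (a, b, c), r in zip(items, u))

def argmax_on_grid(f, lo=-4.0, hi=4.0, steps=1601):
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=f)

u = (1, 1, 0, 0, 0)
mle        = argmax_on_grid(lambda t: log_lik(u, t))
map_normal = argmax_on_grid(lambda t: log_lik(u, t) - 0.5 * t * t)  # N(0,1) prior
map_flat   = argmax_on_grid(lambda t: log_lik(u, t) + 0.0)          # flat prior adds a constant
```

The N(0,1) prior pulls the estimate inwards, toward the prior mean; the flat prior leaves it exactly at the MLE.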
Joint Estimation of Item and Ability Parameters
• What happens when all you have is a dataset filled with item responses and no previously estimated item parameters are available for you to use in scoring?
  – Fairly technical topic; we’ll keep it on the applied side for this discussion.
Joint Estimation of Item and Ability Parameters
• Most Common Approach:
  – Marginal Maximum Likelihood
• Other procedures:
  – Joint Maximum Likelihood
    • Rasch only
  – Markov Chain Monte Carlo (MCMC)
    • Typically only used for highly parameterized models
Joint Maximum Likelihood
1. Obtain initial estimates of θi, i = 1, …, N.
2. Solve likelihood equations for bj, aj, and cj, j = 1, …, n.
3. Return to the first set of equations and solve for θi, i = 1, …, N.
4. Repeat until estimates from one iteration to the next converge (a popular criterion is 0.001).
Joint Maximum Likelihood
• JMLE can work well for the Rasch model, but estimates are not always consistent or unbiased for the 2- and 3-PL models.
• Item (structural) parameters need to be estimated without reference to ability (incidental) parameters…
Marginal MLE
• Used to estimate the item parameters using the marginal distribution of ability parameters.
• Person parameters are then estimated using one of the previously mentioned techniques, treating item parameters as fixed and known.
Marginal Distribution: Continuous

$$f(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, dx_2$$

where f(x1) is the marginal distribution of item parameters, f(x2) is the marginal distribution of ability parameters, and f(x1, x2) is the joint distribution of item & ability.
Marginal Distribution: Discrete
f ( x1 ) = ∑ f ( x1 , x2 )
x2
Where f(x1) is the marginal distribution of item parameters,
f( 2) is
f(x i the
th marginal
i l distribution
di t ib ti off ability
bilit parameters,
t
and
f(x1,x2) is the joint distribution of item & ability
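A sketch of the discrete marginalization: approximate the marginal probability of a response pattern by summing the conditional likelihood times a N(0,1) density over equally spaced quadrature nodes (equal spacing is a simplification of mine; production programs use better quadrature rules, and the helper names are mine too):

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

def likelihood(u, theta, items):
    L = 1.0
    for (a, b, c), r in zip(items, u):
        p = p_3pl(theta, a, b, c)
        L *= p if r else 1 - p
    return L

def marginal_likelihood(u, items, n_nodes=61, lo=-4.0, hi=4.0):
    """Approximate the integral of L(u | theta) * N(0,1) density over theta
    by a discrete sum over equally spaced quadrature nodes."""
    step = (hi - lo) / (n_nodes - 1)
    total = 0.0
    for i in range(n_nodes):
        t = lo + i * step
        density = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
        total += likelihood(u, t, items) * density * step
    return total
```

Summed over all 2^5 response patterns, the marginal probabilities come out to approximately 1, confirming the discrete sum behaves like the integral.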
Integrating Over Theta
• Finding the marginal distribution of
the item parameters by integrating
over θ …
• This eliminates θ from the function
and we can use the resulting
likelihood function to get the
parameter estimates.
Integrating Over θ
• However, to do this, we don’t want to weight all θ values the same in determining the contribution to the marginal density; as such, a distribution of θ is used to take this into account.
  – Typically a N(0,1) distribution is used.
  – Could be empirically derived.
MMLE
• Once θ has been eliminated from the function, item parameters can be estimated using a Maximum Likelihood procedure.
• Bayesian procedures, too!
Marginal MLE
1. Specify a density function for θ, e.g., θ ~ N(0,1), and treat examinees as a random sample from the θ distribution.
2. Integrate over the log-likelihood function of item parameters with respect to θ.
Marginal MLE

[Figure: N(0,1) density with 13 quadrature nodes X1–X13 at θ = -3, -2.5, …, 2.5, 3; x-axis: Ability Quadrature Nodes, y-axis: Height of Density Function]
EM Algorithm
• Expectation step: use provisional item parameter estimates to compute examinee likelihood values along quadrature points (i.e., discrete θ values) and estimate expected frequency and proportion-correct values for each point.
EM Algorithm
• Maximization step: iteratively solve item parameter likelihood functions (not included here) using the expected frequency and proportion-correct values.
• These two steps go back and forth until the overall likelihood is “unchanged” from the previous cycle.
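A sketch of the Expectation step described above: for each examinee, compute posterior weights at the quadrature nodes, then accumulate expected frequencies (n_k) and expected counts correct per item (r_jk). Function and variable names are mine:

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) * math.exp(z) / (1 + math.exp(z))

def e_step(data, items, nodes):
    """E-step sketch: expected frequency (n_k) at each quadrature node and
    expected count correct (r_jk) per item per node."""
    # Prior weights: N(0,1) density evaluated at the nodes, normalized
    w = [math.exp(-0.5 * t * t) for t in nodes]
    s = sum(w)
    w = [x / s for x in w]

    n_k = [0.0] * len(nodes)
    r_jk = [[0.0] * len(nodes) for _ in items]
    for u in data:                       # one response vector per examinee
        # Posterior weight of each node for this examinee
        post = []
        for k, t in enumerate(nodes):
            L = 1.0
            for (a, b, c), resp in zip(items, u):
                p = p_3pl(t, a, b, c)
                L *= p if resp else 1 - p
            post.append(L * w[k])
        norm = sum(post)
        post = [x / norm for x in post]
        # Accumulate expected counts
        for k in range(len(nodes)):
            n_k[k] += post[k]
            for j, resp in enumerate(u):
                if resp:
                    r_jk[j][k] += post[k]
    return n_k, r_jk
```

The n_k sum to the number of examinees, and each r_jk[j] sums to the observed number correct on item j, which is a useful invariant to check.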
Bayesian Item Estimation
• In the analogous situation to estimation of
ability alone, there may arise situations
where no maximum exists for the
multidimensional likelihood function.
– Weird values are still possible.
• Specification of priors for item parameters
and consequent estimation of the posterior
distributions of item parameters will
facilitate convergence of solutions.
Bayesian Item Estimation
• With or without marginalization of incidental parameters (i.e., θ).
  – Marginalization: a simple Bayesian extension of the MMLE/EM procedure.
  – Without: MCMC approaches.
Marginalized Approach
• Mislevy & Bock (1986, 1989).
  – Implemented in BILOG, BILOG-MG.
• Works in essentially the same way as the MMLE/EM approach, except that likelihood functions for item parameters are “mixed” with prior distributions.
  – Point estimates are subsequently derived by finding the mean or mode of the posterior distribution for each item parameter.
Priors
• Informative: small variance.
• Uninformative: relatively large variance.
• The prior’s influence will be inversely proportional to (1) its variance, and (2) the amount of data available for estimation.
Great Reference for more
technical discussion of MMLE,
EM, & Bayesian approaches:
MCMC Estimation
• To simplify, a distribution from which we can easily draw samples is chosen, and random draws are taken from it.
• These draws are either accepted or rejected as being plausible from the actual posterior distribution until we have retained enough draws to make inferences.
MCMC Estimation
• The draws that are retained are then taken to be a sample from the posterior distribution.
• Using the sample from the posterior, the point estimates can be taken to be the mean or the mode of the posterior distribution.
MCMC Estimation
• One nice feature:
  – One can incorporate estimates of uncertainty into parameter estimation.
    • E.g., standard errors of ability can be taken into account when estimating item parameters.
    • Likewise, standard errors of item parameters can be taken into account when estimating ability parameters.
MCMC Conclusion
• A method of simulating random draws from any theoretical multivariate distribution.
  – Any posterior distribution.
• Features of the theoretical distribution (e.g., mean, variance) are then estimated based on the random sample.
• Easy to specify, but takes a long time.
  – Very inefficient, but it almost always works.
Next…