
Estimation of Person and Item Parameters in IRT
American Board of Internal Medicine
Item Response Theory Course
Overview
• Ability parameter estimation with known item parameters:
–Maximum Likelihood.
–Bayesian procedures.
• Joint estimation of Item and Ability parameters when both are unknown.
Estimation of Ability with Known Item Parameters
• Given item parameters and the vector of observed item responses for an examinee, what is the most likely ability level for this examinee?
Recall IRT Assumptions
• Unidimensionality of the Test
• Local Independence
• Nature of the ICC
• Parameter Invariance

• These assumptions are fundamental to the estimation of parameters.
3-PL IRT Model

P(uij = 1 | θi) = cj + (1 − cj) · e^(D aj(θi − bj)) / (1 + e^(D aj(θi − bj)))

If a, b, and c are known, how do we use this model to estimate examinee ability?
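As a concrete illustration, here is a minimal Python sketch of this response function (not the course's software; the function name and the conventional scaling constant D = 1.7 are assumptions):

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """3-PL probability of a correct response, P(u = 1 | theta).

    theta   : examinee ability (scalar or array)
    a, b, c : item discrimination, difficulty, and pseudo-guessing parameters
    D       : scaling constant (1.7 is the usual choice; an assumption here)
    """
    z = D * a * (theta - b)
    return c + (1.0 - c) * np.exp(z) / (1.0 + np.exp(z))

# At theta = b the curve passes through c + (1 - c)/2
print(p_3pl(0.0, a=1.0, b=0.0, c=0.0))   # 0.5
```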
Correct and Incorrect
• Up until now, we've only been talking about response probabilities in terms of correct item responses, P(U = 1|θ) → Pj(θ) or "P".
• For every item, there is a corresponding expression for the probability of an incorrect response, P(U = 0|θ) → Qj(θ) or "Q".
Correct and Incorrect
• Law of Total Probability: the sum of probabilities for all possible occurrences is equal to one.
• For dichotomous items: {0,1}
• Q = 1 − P for every level of θ
• P + Q = 1 for every level of θ
[Figure: P and Q curves plotted against ability (θ) for an item with b = 0.0, a = 1.0, c = 0.0; y-axis: response probability.]
[Figure: P and Q curves plotted against ability (θ) for an item with b = −0.5, a = 0.5, c = 0.1; y-axis: response probability.]
Estimation of Ability with Known Item Parameters
• Given item parameters and the vector of observed item responses for an examinee, what is the most likely ability level for this examinee?
• What value of θ is most likely to result in the pattern of item responses we observed?
Local Independence
• Recall that the principle of local item
independence states that item responses
are statistically independent given θ:

P(u1 , u2 | θ ) = P(u1 | θ ) P(u2 | θ )


• This means that once we know ability, an examinee's response to one item is not affected by responses to other items.
P(u1 , u2 | θ ) = P(u1 | θ ) P(u2 | θ )
• In practical terms, this states that the probability associated with any combination of item responses,

(0,0), (0,1), (1,0), or (1,1),

is equal to the product of the separate probabilities associated with each item.
[Figure: ICCs P1(θ) and P2(θ) for Item 1 (b = −0.5, a = 1.0, c = 0.1) and Item 2 (b = 1.0, a = 0.5, c = 0.2), plotted against ability (θ).]
[Figure: Incorrect-response curves Q1(θ) and Q2(θ) for Item 1 (b = −0.5, a = 1.0, c = 0.1) and Item 2 (b = 1.0, a = 0.5, c = 0.2), plotted against ability (θ).]
For any given level of θ, the probability of observing a certain response pattern is obtained by multiplying the corresponding probabilities for correct or incorrect responses (P or Q).

[Figure: P1(θ) and P2(θ) response probability curves plotted against ability (θ).]
Response patterns:
(0,0) → Q1(θ)Q2(θ)
(0,1) → Q1(θ)P2(θ)
(1,0) → P1(θ)Q2(θ)
(1,1) → P1(θ)P2(θ)

[Figure: Q1(θ) and Q2(θ) response probability curves plotted against ability (θ).]
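To make these products concrete, the sketch below evaluates the four pattern probabilities at one ability level for the two items used in the preceding figures; the helper function and D = 1.7 are illustrative assumptions.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    # 3-PL probability of a correct response (D = 1.7 assumed)
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

# Item parameters from the two-item example in the figures above
items = [dict(a=1.0, b=-0.5, c=0.1), dict(a=0.5, b=1.0, c=0.2)]

theta = 0.0                                   # evaluate the patterns at one ability level
P = np.array([p_3pl(theta, **it) for it in items])
Q = 1.0 - P

for u1, u2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    prob = (P[0] if u1 else Q[0]) * (P[1] if u2 else Q[1])
    print((u1, u2), round(prob, 4))

# Because P + Q = 1 for each item, the four pattern probabilities sum to 1.
```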
Determining Likelihood
• From the model, we know the probability of a correct or incorrect response for each item, so we can determine the likelihood of a certain response pattern for many levels of θ and determine which level corresponds to the highest probability.
Why "Likelihood"?
• When discussing "the probability" of something occurring, it implies that we haven't observed it happen.
• We start with item response data, so item responses are clearly NOT unobserved.
• In this situation, we refer to the "likelihood" of observing a certain response pattern, given ability.
Each “likelihood function” is determined by multiplying P1 or Q1
by P2 or Q2 for many levels of θ from −3 to 3.

[Figure: Likelihood functions for the response patterns u = (0,0), (0,1), (1,0), and (1,1), plotted against ability (θ). Item 1: a = 1.0, b = −0.5, c = 0.1; Item 2: a = 0.5, b = 1.0, c = 0.2.]
The point where the likelihood function of θ reaches its largest
value is known as the Maximum Likelihood Estimate (MLE) for θ.

[Figure: Likelihood functions for the response patterns u = (0,0), (0,1), (1,0), and (1,1), plotted against ability (θ). Item 1: a = 1.0, b = −0.5, c = 0.1; Item 2: a = 0.5, b = 1.0, c = 0.2.]
What do these Likelihood Functions tell us?

Response pattern: (0,0) → Q1(θ)Q2(θ)

[Figure: Q1(θ) and Q2(θ) response probability curves plotted against ability (θ).]
The likelihood function continues to increase as θ approaches −∞.

This is why administering a 2-item test is a bad idea!

[Figure: Likelihood function for u = (0,0), plotted against ability (θ).]
What do these Likelihood Functions tell us?

Response pattern: (1,1) → P1(θ)P2(θ)

[Figure: P1(θ) and P2(θ) response probability curves plotted against ability (θ).]
The likelihood function continues to increase as θ approaches +∞.

Another reason why 2-item tests are a bad idea!

[Figure: Likelihood function for u = (1,1), plotted against ability (θ).]
What do these Likelihood Functions tell us?

Response pattern: (0,1) → Q1(θ)P2(θ)

[Figure: Q1(θ) and P2(θ) response probability curves plotted against ability (θ).]
The likelihood function reaches a maximum at the point where θ ≈ −1.75.

A much "better behaved" likelihood function…

[Figure: Likelihood function for u = (0,1), plotted against ability (θ).]
What do these Likelihood Functions tell us?

Response pattern: (1,0) → P1(θ)Q2(θ)

[Figure: P1(θ) and Q2(θ) response probability curves plotted against ability (θ).]
The likelihood function reaches a maximum at the point where θ ≈ 0.25.

The "best behaved" likelihood function yet…

[Figure: Likelihood function for u = (1,0), plotted against ability (θ).]
Local Independence
• Generalized, the probability of observing the item response vector, u, is equal to the product of the individual probabilities for each item:

P(u | θ) = ∏_{j=1}^{n} Pj(θ)^Uj Qj(θ)^(1−Uj)
Likelihood
• We denote the function "likelihood" instead of "probability" (i.e., "L" instead of "P") because item responses are observed values.

L(u | θ) = ∏_{j=1}^{n} Pj(θ)^Uj Qj(θ)^(1−Uj)
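A direct translation of this product into code might look like the following sketch; the item parameters are the two-item example from the earlier figures, and D = 1.7 is assumed.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

def likelihood(u, theta, a, b, c):
    """L(u | theta) = prod_j Pj(theta)^uj * Qj(theta)^(1 - uj)."""
    P = p_3pl(theta, a, b, c)
    Q = 1.0 - P
    return np.prod(P ** u * Q ** (1 - u))

# Two-item example from the earlier figures, response pattern u = (1, 0)
a = np.array([1.0, 0.5]); b = np.array([-0.5, 1.0]); c = np.array([0.1, 0.2])
print(likelihood(np.array([1, 0]), theta=0.25, a=a, b=b, c=c))
```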
Log-likelihood
• In practice, we work with a transformation of the likelihood called the log-likelihood function, which is simply the natural logarithm (ln) of the likelihood.
• Because the logarithm is monotonic, this transformation preserves the location of the maximum, and it has useful properties in estimation.
[Figure: The log-likelihood (ln L) plotted against the likelihood (L) for values of L between 0.0 and 1.0.]
Log-likelihood Function
• Convenient scale for interpretation (multiplying many probabilities results in very small values).
• Computational efficiency: the product of probabilities is equivalent to the sum of log-probabilities.

L = ∏_{j=1}^{n} Pj^Uj Qj^(1−Uj)

ln L = ln ∏_{j=1}^{n} Pj^Uj Qj^(1−Uj) = ∑_{j=1}^{n} ln[ Pj^Uj Qj^(1−Uj) ]
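The same quantity computed on the log scale replaces the product with a sum; the grid search at the end is a minimal sketch of locating the maximizer (same illustrative two-item parameters, D = 1.7 assumed).

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

def log_likelihood(u, theta, a, b, c):
    """ln L = sum_j [ uj * ln Pj(theta) + (1 - uj) * ln Qj(theta) ]."""
    P = p_3pl(theta, a, b, c)
    return np.sum(u * np.log(P) + (1 - u) * np.log(1 - P))

a = np.array([1.0, 0.5]); b = np.array([-0.5, 1.0]); c = np.array([0.1, 0.2])
u = np.array([1, 0])

# The log transform preserves the maximizer: a grid search over theta finds the
# same peak whether we maximize L or ln L.
grid = np.linspace(-3, 3, 601)
ll = np.array([log_likelihood(u, t, a, b, c) for t in grid])
print("MLE by grid search:", grid[np.argmax(ll)])
```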
Finding the MLE
• Intuitively, we could estimate an MLE for any examinee by plotting many possible θ values in a wide range (say, from −4 to +4) and determining which θ value maximizes the log-likelihood function…
– This would work, but slowly… you have to give a score to every examinee!
Instead of finding the MLE by trial-and-error,
notice that the MLE occurs where the slope of the
log-likelihood function is equal to zero.
d ln L / dθ = 0

[Figure: Log-likelihood function plotted against ability (θ), with the MLE marked at the point where the slope is zero.]
The same MLE is found whether we use the
likelihood function or the log-likelihood.

dL / dθ = 0

[Figure: Likelihood function plotted against ability (θ), with the MLE marked at the point where dL/dθ = 0.]
How can we do this efficiently?
• How do we find the point where the slope of the likelihood function is zero?
• Newton-Raphson Method
– After Isaac Newton and Joseph Raphson.
• Will converge to a solution much more quickly than if you plotted a wide range of the log-L function.
Newton-Raphson Method
• Determine the first derivative of log-L at an initial estimate for θ → θ0.
• The tangent line to the first derivative (whose slope is the 2nd derivative) will cross the x-axis at a point (θ1) closer to the MLE than θ0 did.
• Repeat until θn reaches convergence (i.e., changes "very little").
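A bare-bones numerical version of this iteration is sketched below. It approximates the first and second derivatives of ln L by finite differences rather than using the analytic expressions, and the two-item parameters and D = 1.7 are illustrative assumptions.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

def log_lik(theta, u, a, b, c):
    P = p_3pl(theta, a, b, c)
    return np.sum(u * np.log(P) + (1 - u) * np.log(1 - P))

def mle_newton_raphson(u, a, b, c, theta0=0.0, tol=0.001, max_iter=50, h=1e-3):
    """Newton-Raphson iteration for the ability MLE (finite-difference derivatives)."""
    theta = theta0
    for _ in range(max_iter):
        f = lambda t: log_lik(t, u, a, b, c)
        d1 = (f(theta + h) - f(theta - h)) / (2 * h)                  # slope of ln L
        d2 = (f(theta + h) - 2 * f(theta) + f(theta - h)) / h ** 2    # curvature of ln L
        step = d1 / d2
        theta -= step                        # theta_{k+1} = theta_k - d1/d2
        if abs(step) < tol:                  # converged: estimate changes "very little"
            break
    return theta

# Two-item example from the earlier figures, response pattern u = (1, 0)
a = np.array([1.0, 0.5]); b = np.array([-0.5, 1.0]); c = np.array([0.1, 0.2])
print(mle_newton_raphson(np.array([1, 0]), a, b, c))
```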
The slope of the log-likelihood function for any
given level of θ is the first derivative: d ln L / dθ

[Figure: Log-likelihood function plotted against ability (θ), with the MLE marked at the point where d ln L / dθ = 0.]
[Figure: One Newton-Raphson step on the absolute value of the first derivative of the log-likelihood, |d ln L|, showing the tangent line at (θ0, y0) crossing the x-axis at θ1.]

Slope(m0) = (y0 − 0) / (θ0 − θ1)

θ1 = θ0 − y0 / m0   (improved estimate, starting from the initial estimate θ0)
Hypothetical 5-item Test
Consider various log-likelihood functions.

Item   a      b      c
1      1.00  −2.00   0.25
2      1.00  −1.00   0.25
3      1.00   0.00   0.25
4      1.00   1.00   0.25
5      1.00   2.00   0.25
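The log-likelihood curves on the following slides can be reproduced with a short grid evaluation like the sketch below (D = 1.7 is an assumption; any of the plotted response vectors can be substituted for u).

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

# Hypothetical 5-item test from the table above
a = np.full(5, 1.0)
b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
c = np.full(5, 0.25)

u = np.array([1, 1, 0, 0, 0])               # one of the response vectors plotted below

grid = np.linspace(-3, 3, 601)
P = p_3pl(grid[:, None], a, b, c)           # 601 x 5 matrix of response probabilities
loglik = (u * np.log(P) + (1 - u) * np.log(1 - P)).sum(axis=1)

print("grid-search MLE:", grid[np.argmax(loglik)])
```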
5-item Test ICCs

[Figure: ICCs, P(u = 1|θ), for the five items plotted against ability (θ).]
5-item Test TCC

[Figure: Test characteristic curve, E(X|θ), plotted against ability (θ), ranging from 0 to 5.]
[Figures: Log-likelihood functions plotted against ability (θ) for the response vectors u = (1,1,0,0,0), (1,1,1,0,0), (0,0,0,0,0), (1,1,1,1,1), and (0,1,1,1,1) on the hypothetical 5-item test.]
Benefits of MLE
• MLEs have desirable properties:
– Efficiency (asymptotically smallest variance).
– Consistent (asymptotically unbiased).
– Asymptotically normally distributed.
• "Asymptotically" here refers to the number of items. The longer the test, the better MLE works.
The Problem with MLE
• As we've seen, sometimes there is no MLE for a given response vector.
• These are less likely to occur with longer tests, but still possible.
• We still, however, would like to
provide scores for these examinees!
• Bayesian estimation can help…
Bayes’ Theorem
• Thomas Bayes (1702–1761) put forth what is known as "Bayes' Theorem".
• This is more controversial than you might think! A lot of debate exists about this approach to estimation…
Frequentists vs. Bayesians

Frequentist: "All we can really do is estimate the likelihood of observing these data given a parameter."

P(A | B) = P(B | A) P(A) / P(B)

Bayesian: "No, we can estimate the distribution of a parameter given the data."

f(θ | u) = f(u | θ) f(θ) / f(u)

f(θ | u) ∝ L(u | θ) f(θ)
Bayesian Ability Estimation
• The approach basically entails combining the likelihood function with a prior distribution to estimate the posterior distribution of ability:

Posterior ∝ Likelihood × Prior

f(θ | u) ∝ L(u | θ) f(θ)
Log-likelihood with Bayes

f(θ | u) ∝ L(u | θ) f(θ)

ln[ f(θ | u) ] ∝ ln[ L(u | θ) ] + ln[ f(θ) ]

• As with MLE, this transformation won't affect the estimate, and it is computationally more efficient.
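In code, this amounts to adding the log of the prior density to the log-likelihood and locating the maximum of the result (the MAP estimate described on a later slide). The sketch below does this on a grid for the 5-item test and an N(0,1) prior; D = 1.7 and the grid range are assumptions.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

# 5-item test from the earlier table; the all-correct vector has no finite MLE
a = np.full(5, 1.0); b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]); c = np.full(5, 0.25)
u = np.array([1, 1, 1, 1, 1])

grid = np.linspace(-4, 4, 801)
P = p_3pl(grid[:, None], a, b, c)
loglik = (u * np.log(P) + (1 - u) * np.log(1 - P)).sum(axis=1)

log_prior = -0.5 * grid ** 2 - 0.5 * np.log(2 * np.pi)   # ln of the N(0,1) density
log_post = loglik + log_prior                            # ln posterior, up to a constant

print("MAP estimate:", grid[np.argmax(log_post)])        # finite, unlike the MLE here
```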
Prior Distribution
• By specifying some density function we expect θ to follow in the population, e.g., N(0,1), we can then estimate the form of the posterior distribution of ability for any given examinee.
• f(θ) is called the prior distribution because
it represents some prior belief about the
distribution of θ in the population.
Posterior Distribution
• Once a prior has been specified, we can estimate the posterior distribution of ability and determine a point estimate and standard error directly from it.
• The posterior represents θ as a random variable; conceptually, it is the observed data tempered by our prior belief about θ's distribution.
Estimates from the Posterior
• Mode of the posterior:
Modal a Posteriori (MAP) estimate
• Mean of the posterior:
Expected a Posteriori (EAP) estimate
• Standard Errors are computed
directly by estimating the standard
deviation of the posterior distribution.
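A minimal sketch of EAP scoring by numerical quadrature over an N(0,1) prior, using the 5-item test from earlier (the node placement and D = 1.7 are assumptions):

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

def eap_score(u, a, b, c, nodes=np.linspace(-4, 4, 81)):
    """EAP estimate (posterior mean) and SE (posterior SD) via quadrature."""
    P = p_3pl(nodes[:, None], a, b, c)                    # node x item probabilities
    lik = np.prod(P ** u * (1 - P) ** (1 - u), axis=1)    # L(u | theta) at each node
    prior = np.exp(-0.5 * nodes ** 2)                     # N(0,1) density, unnormalized
    post = lik * prior
    post /= post.sum()                                    # normalized posterior weights
    eap = np.sum(nodes * post)                            # mean of the posterior (EAP)
    se = np.sqrt(np.sum((nodes - eap) ** 2 * post))       # SD of the posterior (the SE)
    return eap, se

a = np.full(5, 1.0); b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]); c = np.full(5, 0.25)
print(eap_score(np.array([1, 1, 0, 0, 0]), a, b, c))
```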
Bayes' Estimates in Practice

f(θ | u) ∝ L(u | θ) f(θ)

• Take the likelihood function for any given value of θ and multiply it by the prior density value at θ.
Bayes' Estimates in Practice

f(θ | u) ∝ L(u | θ) f(θ)

• Conceptually, this computation effectively adds an "item" to each likelihood function, and every examinee gets it "correct".
• Instead of a monotonically increasing function for this "item", we choose, say, a normal density, which results in "pulling down the tails" of the likelihood function.
[Figures: For each of the response vectors u = (1,1,0,0,0), (1,1,1,0,0), (0,0,0,0,0), (1,1,1,1,1), and (0,1,1,1,1): the log-likelihood together with ln(Prior) for a N(0,1) prior, and the resulting log-posterior, each plotted against ability (θ).]
Bayesian vs. MLE
• Bayesian estimates will be biased
towards the mean of the prior
distribution; this is more apparent with
shorter tests, as the prior has a lot of
influence (1/6 in our example!).
• Influence of the prior can be lessened
by choosing a relatively
“uninformative” prior, e.g., N(0,10).
Bayesian vs. MLE
• Recall that MLEs are asymptotically unbiased; with relatively short tests, MLEs will be biased outwards, as opposed to Bayesian estimates, which are biased inwards.
• Choosing a Uniform Prior will result in
estimates identical to MLE.
Joint Estimation of Item and Ability Parameters
• What happens when all you have is
a dataset filled with item responses
and no previously estimated item
parameters are available for you to
use in scoring?
– Fairly technical topic; we'll keep it on the applied side for this discussion.
Joint Estimation of Item and Ability Parameters
• Most Common Approach:
– Marginal Maximum Likelihood
• Other procedures:
– Joint Maximum Likelihood
• Rasch only
– Markov Chain Monte Carlo (MCMC)
• Typically only used for highly
parameterized models
Joint Maximum Likelihood
1. Obtain initial estimates of θi, i = 1,…,N.
2. Solve the likelihood equations for bj, aj, and cj, j = 1,…,n.
3. Return to the first set of equations and solve for θi, i = 1,…,N.
4. Repeat until estimates from one iteration to the next converge (a popular criterion is 0.001).
Joint Maximum Likelihood
• JMLE can work well for the Rasch model, but estimates are not always consistent or unbiased for the 2- and 3-PL models.
• Item (structural) parameters need to
be estimated without reference to
ability (incidental) parameters…
Marginal MLE
• Used to estimate the item parameters using the marginal distribution of ability parameters.
• Person parameters are then estimated using one of the previously mentioned techniques, treating item parameters as fixed and known.
Marginal Distribution: Continuous

f ( x1 ) = ∫
−∞
f ( x1 , x2 ) dx2
Where f(x1) is the marginal distribution of item parameters,
f( 2) is
f(x i the
th marginal
i l distribution
di t ib ti off ability
bilit parameters,
t
and
f(x1,x2) is the joint distribution of item & ability
Marginal Distribution: Discrete

f ( x1 ) = ∑ f ( x1 , x2 )
x2
Where f(x1) is the marginal distribution of item parameters,
f( 2) is
f(x i the
th marginal
i l distribution
di t ib ti off ability
bilit parameters,
t
and
f(x1,x2) is the joint distribution of item & ability
Integrating Over Theta
• Finding the marginal distribution of
the item parameters by integrating
over θ …
• This eliminates θ from the function
and we can use the resulting
likelihood function to get the
parameter estimates.
Integrating Over θ
• However, to do this, we don't want to weight all θ values the same in determining the contribution to the marginal density, and as such, a distribution of θ is used to take this into account.
– Typically a N(0,1) distribution is used.
– Could be empirically derived.
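The marginalization at the heart of MMLE can be sketched numerically by summing the likelihood of a response vector over a discrete set of θ points weighted to approximate N(0,1); the 13 equally spaced nodes, the weights, and D = 1.7 are illustrative assumptions.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

def marginal_prob(u, a, b, c, nodes=np.linspace(-3, 3, 13)):
    """Marginal probability of response vector u: sum_k L(u | X_k) A(X_k),
    where the weights A(X_k) approximate a N(0,1) ability distribution."""
    A = np.exp(-0.5 * nodes ** 2)
    A /= A.sum()                                          # quadrature weights
    P = p_3pl(nodes[:, None], a, b, c)                    # node x item probabilities
    lik = np.prod(P ** u * (1 - P) ** (1 - u), axis=1)    # L(u | X_k) at each node
    return np.sum(lik * A)                                # theta has been "integrated out"

a = np.full(5, 1.0); b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]); c = np.full(5, 0.25)
print(marginal_prob(np.array([1, 1, 0, 0, 0]), a, b, c))
```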
MMLE
• Once θ has been eliminated from
the function, item parameters can
be estimated using a Maximum
Likelihood procedure.
• Bayesian procedures, too!
Marginal MLE
1. Specify a density function for θ, e.g., θ ∼ N(0,1), and treat examinees as a random sample from the θ distribution.
2. Integrate over the log-likelihood
function of item parameters with
respect to θ.
Marginal MLE

3. Differentiate the log-L function with respect to bi, ai, and ci, i = 1,…,n.
4. The values for b, a, and c
that maximize the function
are the marginal MLEs.
Marginal MLE
• MMLE estimates will be consistent &
unbiased for the 1-, 2-, and 3-PL (if the
choice of θ distribution is reasonable).
• Fairly inefficient because it requires integration over 2^n response vectors!
• EM (Expectation-Maximization)
algorithm can be (and is) used to
facilitate estimation (BILOG-MG, etc.).
EM Algorithm
• Iterative procedure for finding
MLEs.
• Allows us to more quickly find
estimates of item parameters in the
presence of an unobservable
random variable, θ .
• In practice, this is done by discretizing the θ distribution into q quadrature nodes.
[Figure: A density function discretized at 13 ability quadrature nodes, X1–X13, spaced from −3 to 3 in steps of 0.5; y-axis: height of the density function.]
EM Algorithm
• Expectation step: use provisional item parameter estimates to compute examinee likelihood values along the quadrature points (i.e., discrete values) and estimate expected frequency and proportion-correct values for each point.
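A rough sketch of that expectation step is shown below; the simulated response data, the 13 equally spaced nodes, the normal weights, and D = 1.7 are all illustrative assumptions, not the exact computations of any particular program.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Provisional item parameters (the 5-item test) and some simulated responses
a = np.full(5, 1.0); b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]); c = np.full(5, 0.25)
true_theta = rng.normal(size=500)
U = (rng.random((500, 5)) < p_3pl(true_theta[:, None], a, b, c)).astype(int)

nodes = np.linspace(-3, 3, 13)                  # quadrature nodes X_k
A = np.exp(-0.5 * nodes ** 2); A /= A.sum()     # weights approximating N(0,1)

# E-step: each examinee's posterior weight at each node, given provisional items
P = p_3pl(nodes[:, None], a, b, c)              # node x item probabilities
lik = np.prod(P[None, :, :] ** U[:, None, :]
              * (1 - P[None, :, :]) ** (1 - U[:, None, :]), axis=2)   # examinee x node
post = lik * A
post /= post.sum(axis=1, keepdims=True)

n_k = post.sum(axis=0)                 # expected frequency at each node
r_kj = post.T @ U                      # expected number correct, node x item
prop_correct = r_kj / n_k[:, None]     # expected proportion correct at each node

print(np.round(prop_correct[:, 2], 3))   # e.g., item 3 across the 13 nodes
```

The maximization step described on the next slide would then refit each item's parameters to these expected frequencies and proportions correct.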
EM Algorithm
• Maximization step: iteratively solve the item parameter likelihood functions (not included here) using the expected frequency and proportion-correct values.
• These two steps go back and forth until the overall likelihood is "unchanged" from the previous cycle.
Bayesian Item Estimation
• As in the estimation of ability alone, situations may arise where no maximum exists for the multidimensional likelihood function.
– Weird values are still possible.
• Specification of priors for item parameters
and consequent estimation of the posterior
distributions of item parameters will
facilitate convergence of solutions.
Bayesian Item Estimation
• With or without marginalization of
incidental parameters (i.e., θ).
–Marginalization: simple Bayesian
extension of the MMLE/EM
procedure.
–Without: MCMC approaches.
Marginalized Approach
• Mislevy & Bock (1986, 1989).
– Implemented in BILOG, BILOG-MG.
• Works in essentially the same way as the
MMLE/EM approach, except that
likelihood functions for item parameters are "mixed" with prior distributions.
– Point estimates are subsequently derived by
finding the mean or mode of the posterior
distribution for each item parameter.
Priors
• Informative: small variance.
• Uninformative: relatively large variance.
• The prior's influence will be inversely proportional to (1) its variance, and (2) the amount of data available for estimation.
Great Reference for more
technical discussion of MMLE,
EM, & Bayesian approaches:

Baker, F. B., & Kim, S.-H. (2004). Item Response Theory: Parameter Estimation Techniques (2nd ed.). New York: Marcel Dekker, Inc.
MCMC Estimation
• Markov Chain Monte Carlo.
• Not typically used for estimation in the conventional models, since MMLE/EM typically works well.
MCMC Estimation
• With some new, highly parameterized models, there can be difficulties in applying the MMLE/EM approach, so a solution is impossible.
• MCMC offers an alternative.
MCMC Estimation
• The concept of MCMC estimation is
to construct a set of random draws
from the posterior distribution for
each parameter being estimated.
• The problem: it is typically difficult to draw random numbers from an arbitrary distribution without a known form.
MCMC Estimation

• To simplify, a distribution from which we can easily draw samples is chosen, and random draws are taken from it.
• These draws are either accepted or
rejected as being plausible from the
actual posterior distribution until we
have retained enough draws to make
inferences.
MCMC Estimation
• The draws that are retained are then taken to be a sample from the posterior distribution.
• Using the sample from the posterior, the point estimates can be taken to be the mean or the mode of the posterior distribution.
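For a single ability parameter, this draw-and-accept/reject idea can be sketched with a short random-walk Metropolis sampler; the proposal distribution and its step size, the N(0,1) prior, the burn-in rule, and D = 1.7 are all assumptions, and real MCMC estimation of full IRT models is considerably more involved.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    z = D * a * (theta - b)
    return c + (1 - c) / (1 + np.exp(-z))

def log_post(theta, u, a, b, c):
    # Log posterior (up to a constant): log-likelihood plus log N(0,1) prior
    P = p_3pl(theta, a, b, c)
    return np.sum(u * np.log(P) + (1 - u) * np.log(1 - P)) - 0.5 * theta ** 2

def sample_theta(u, a, b, c, n_draws=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the posterior of theta."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    theta = 0.0
    for t in range(n_draws):
        proposal = theta + step * rng.normal()        # draw from an easy distribution
        log_ratio = log_post(proposal, u, a, b, c) - log_post(theta, u, a, b, c)
        if np.log(rng.random()) < log_ratio:          # accept or reject the draw
            theta = proposal
        draws[t] = theta                              # retained draws form the sample
    return draws[n_draws // 2:]                       # discard the first half as burn-in

# 5-item test from earlier; posterior mean and SD from the retained draws
a = np.full(5, 1.0); b = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]); c = np.full(5, 0.25)
draws = sample_theta(np.array([1, 1, 0, 0, 0]), a, b, c)
print("posterior mean:", draws.mean(), "posterior SD:", draws.std())
```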
MCMC Estimation
• One nice feature:
– One can incorporate estimates of
uncertainty into parameter estimation.
• E.g., standard errors of ability can be
taken into account when estimating item
parameters.
• Likewise, standard errors of item parameters can be taken into account when estimating ability parameters.
MCMC Conclusion
• Method of simulating random draws from
any theoretical multivariate distribution.
– Any posterior distribution.
• Features of the theoretical distribution (i.e., mean, variance) are then estimated based on the random sample.
• Easy to specify, but takes a long time.
– Very inefficient, but it almost always works.
Next…
• Using IRT Software to estimate parameters:
–BILOG-MG → all dichotomous items
–PARSCALE → polytomous and/or dichotomous items
