© All Rights Reserved


Tim Croudace

Descriptions of IRT

IRT refers to a set of mathematical models that describe, in probabilistic terms, the relationship between a person's response to a survey question/test item and his or her level of the latent variable being measured by the scale. This latent variable is usually a hypothetical construct [trait/domain or ability] which is postulated to exist but cannot be measured by a single observable variable/item. Instead it is indirectly measured by using multiple items or questions in a multi-item test/scale.

(Clinical Trials. Oxford Univ Press: chapter on applying IRT for evaluating questionnaire item and scale properties.)


Analysis results: logit-probit model

Warming up to this sort of thing soon.

[Figure: U1, U2, U3, ..., Up]

[Figure: 3 items with different thresholds but similar slopes]

Models for constructs underpinning multiple binary (0/1) responses, based on innovations in educational testing

Same models used in educational testing with correct / incorrect answers can be applied to symptom present / absent data (both binary)
Extensions to ordinal outcomes (Likert scales)
Flexibility in parametric form available
Semi- and non-parametric approaches too

[Figure: response probability on the latent variable for a simple binary (Yes/No) scale item. y-axis: prob of response (Yes); x-axis: the latent variable measured. Adapted without permission from a slide by Prof H Goldstein.]

GRM

IRT models

Simplest case of a latent trait analysis

Manifest variables are binary: only 2 distinctions are made
these take 0/1 values
Yes / No
Right / Wrong
Symptom present / absent
[>2 response categories] ... see next lecture, IRT 2, on Friday

(not parameter estimation for items)
It is frequently assumed that the UNOBSERVED (latent) variable <the latent factor / trait> is not only continuous but normally distributed [or the prior distribution is normal but the posterior distribution may not be]

The most commonly used model is the Lord-Birnbaum model (Lord, 1952; Birnbaum): the 2-parameter logistic model [a.k.a. the logit-probit model; Bartholomew (1987)]

The model is essentially a non-linear single factor model
When applied to binary data, the traditional linear factor model is only an approximation to the appropriate item response model: sometimes satisfactory, but sometimes very poor (we can guess when)

IRT is sometimes presented like a revolutionary & very modern development: this is not true!
It has suffered from being presented and taught as disconnected from these [factor models]
A unified treatment can be given that builds one from the other (McDonald, 1999), but this would be a one term course on its own

IRT models provide a clear statement [picture!] of the performance of each item in the scale/test and of how the scale/test functions, overall, for measuring the construct of interest in the study population

The objective is to model each item by estimating the properties describing item performance characteristics, hence Item Characteristic Curve or Symptom Response Function.
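As a rough illustration of what an item characteristic curve is, the sketch below evaluates a 2-parameter logistic curve in intercept/slope form, logit(P) = intercept + slope * z. The function name `icc_2pl` and the parameter values are assumptions chosen for illustration; they are not estimates from the slides.

```python
import math

def icc_2pl(z, intercept, slope):
    """Probability of a positive (1 / Yes) response at latent value z
    under a 2-parameter logistic model: logit(P) = intercept + slope * z."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

# Hypothetical item (illustrative values only): intercept 0.0, slope 1.5.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}  P(response = 1) = {icc_2pl(z, 0.0, 1.5):.3f}")
```

Plotting these probabilities against z gives the S-shaped curve; a larger slope makes the curve steeper, and a different intercept shifts it along the z axis.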

example
Lombard and Doering (1947) data
Questions on cancer knowledge, with four addressing the source of the information:
q1 radio
q2 newspapers
q3 (solid) reading (books?)
q4 lectures
response patterns from 0000 to 1111

Fitting a latent variable model might be proposed as a way of constructing a measure of how well informed an individual is about cancer
A second stage might relate knowledge about cancer to knowledge about other diseases or general knowledge

Data
Lombard and Doering (1947) data
2 to the power 4, i.e. 16 possible response patterns (all occur); with more items this is neither likely nor necessary
Patterns run from 0000 to 1111; frequency (n) is the number with each item response pattern

pattern     n
0000      477
1000       63
0001       12
0010      150
1001        7
1010       32
0011       11
1011        4
0100      231
1100       94
0101       13
0110      378
1101       12
1110      169
0111       45
1111       31
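The pattern counts above can be held in a small lookup table. The sketch below assumes the four digits of each pattern are ordered q1 to q4 (radio, newspapers, reading, lectures), checks the number of possible patterns and the sample size, and derives the proportion endorsing each item.

```python
# Response pattern (assumed digit order: q1 q2 q3 q4) -> frequency, as listed on the slide.
frequencies = {
    "0000": 477, "1000": 63, "0001": 12, "0010": 150,
    "1001": 7, "1010": 32, "0011": 11, "1011": 4,
    "0100": 231, "1100": 94, "0101": 13, "0110": 378,
    "1101": 12, "1110": 169, "0111": 45, "1111": 31,
}

n_items = 4
n_patterns = 2 ** n_items             # number of possible 0/1 patterns
n_total = sum(frequencies.values())   # number of respondents

# Proportion of respondents answering 1 to each item.
endorsement = [
    sum(n for pattern, n in frequencies.items() if pattern[i] == "1") / n_total
    for i in range(n_items)
]

print(n_patterns, len(frequencies), n_total)
print([round(p, 3) for p in endorsement])
```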

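The model:
$\operatorname{logit}\{\pi_i(z)\} = \alpha_{i0} + \alpha_{i1} z$
where $\pi_i(z)$ is the probability of a positive response to item i at latent value z

[Path diagram: the four items (q2 newspapers, q3 reading, ...) as indicators of one latent variable, labelled with their $\alpha_{i0}$ and $\alpha_{i1}$ parameters, next to the data: the 16 response patterns 0000 to 1111 and their frequencies n]

A single latent dimension Z ~ Normal (mean 0; std dev = 1), so Var = 1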

When multiple items are applied in a test / survey, we can use latent variable modelling to
explore inter-relationships among observed responses
determine whether the inter-relationships can be explained by a small number of factors
score individuals on the basis of their responses
Basically to rank order (arrange) or quantify (score) survey participants, test takers, individuals who have been studied
CAN BE THOUGHT OF AS ADDING A NEW SCORE TO YOUR DATASET FOR EACH INDIVIDUAL
describe the properties of each item, as a measure of the target construct (what properties?)
GRAPHICAL REPRESENTATION IS BEST

Item properties are captured graphically by so-called Item Characteristic Curves (ICCs)

The information function is useful and necessary to examine score precision (the accuracy of estimated scores)
we are interested in this for different individuals (individuals with different score values)
by inspecting the amount of information about each score level, across the score range (range of estimated scores), we are identifying variations in measurement precision (reliability of individuals' estimated scores)
this enables us to make statements about the effective measurement range of an instrument in a population

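Item information curves
- add them together to get TIF (test information function)
- shown alongside their ICCs
[Figure: item information curves shown alongside their ICCs; y-axis values in the residue: 3.0, 0.14, 0.14, 0.40]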

1 / Sqrt[Information] = s.e.m.

Info   Sqrt(Info)   1/Sqrt(Info)
   1      1.0          1.0
   2      1.4          0.7
   3      1.7          0.6
   4      2.0          0.5
   5      2.2          0.4
   6      2.4          0.4
   7      2.6          0.4
   8      2.8          0.4
   9      3.0          0.3
  10      3.2          0.3
  11      3.3          0.3
  12      3.5          0.3
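The conversion from information to standard error of measurement can be reproduced in a few lines. This is a sketch of the arithmetic only; the function name `sem_from_information` is an illustrative choice.

```python
import math

def sem_from_information(info):
    """Standard error of measurement: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(info)

print("Info  Sqrt(Info)  1/Sqrt(Info)")
for info in range(1, 13):
    print(f"{info:>4}  {math.sqrt(info):>10.1f}  {sem_from_information(info):>12.1f}")
```

Note how slowly the standard error shrinks: quadrupling the information only halves it.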

Standard error of measurement is not constant (U-shaped, not symmetric)

Approximate reliability
$\text{Reliability} = 1 - \dfrac{1}{\text{Info}} = 1 - \dfrac{1}{1/(\text{s.e.m.})^2} = 1 - (\text{s.e.m.})^2$
s.e.m. = standard error of measurement
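A small sketch of the approximate reliability calculation, assuming (as on the slides) that the latent variable has variance 1, so that reliability at a given score level is 1 minus the squared standard error of measurement. The function names are illustrative.

```python
def reliability_from_information(info):
    """Approximate reliability at a score level: 1 - 1/information
    (assumes the latent variable has variance 1)."""
    return 1.0 - 1.0 / info

def reliability_from_sem(sem):
    """The same quantity expressed through the standard error of measurement."""
    return 1.0 - sem ** 2

for info in (1, 2, 4, 5, 10):
    print(f"Info = {info:>2}  reliability = {reliability_from_information(info):.2f}")
```

Because information varies along the score range, this reliability is a local quantity: it can be high in the middle of the range and low at the extremes for the same instrument.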

Lombard and Doering (1947) data: the 16 response patterns 0000 to 1111 and their frequencies n, as before

What would be the easiest thing to do with these numbers, to score the patterns?

Answer: simply add them up

Total score (n = 1729 new individual values)

pattern     n   total score
0000      477   0
1000       63   1
0001       12   1
0010      150   1
1001        7   2
1010       32   2
0011       11   2
1011        4   3
0100      231   1
1100       94   2
0101       13   2
0110      378   2
1101       12   3
1110      169   3
0111       45   3
1111       31   4
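"Simply add them up" can be sketched as follows: the total score of a pattern is the number of 1s in it, and weighting by the pattern frequencies gives the distribution of total scores in the sample. The variable names are illustrative.

```python
from collections import Counter

# Response pattern -> frequency, as listed on the slide.
frequencies = {
    "0000": 477, "1000": 63, "0001": 12, "0010": 150,
    "1001": 7, "1010": 32, "0011": 11, "1011": 4,
    "0100": 231, "1100": 94, "0101": 13, "0110": 378,
    "1101": 12, "1110": 169, "0111": 45, "1111": 31,
}

def total_score(pattern):
    """Unweighted sum score: the number of items answered 1."""
    return sum(int(digit) for digit in pattern)

# Number of respondents at each total score.
score_counts = Counter()
for pattern, n in frequencies.items():
    score_counts[total_score(pattern)] += n

for score in sorted(score_counts):
    print(score, score_counts[score])
```

The 16 patterns collapse to only 5 distinct total scores, so patterns such as 1000 and 0100 are treated as equivalent.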

Weighted scores
[Overview slide: for each response pattern, the frequency n, total score, weighted score and factor score, next to the slope estimates. Mplus version 4.1 ML estimates: 0.721, 3.358, 1.344, 0.769 (S.E. 0.093, 1.035, 0.167, 0.145). Compare with Bartholomew (1987): 0.72 (0.09), 3.40 (1.14), 1.34 (0.17), 0.77 (0.15); p160]


subtle

Simple sum scores assume all item responses are equally useful at defining the construct
this may not be the case
if items have different discriminating power with respect to what we are measuring, we might want to take that into account
How? Weighted sum scores [Component scores]
weighted by what?
weighted by the estimates (factor loading type parameter) from a latent variable model [latent trait model with a single latent factor]

[Path diagram: latent variable Cancer Knowledge (z) measured by q1 radio, q2 newspapers, q3 reading, q4 lectures, with intercepts $\alpha_{i0}$ and slopes $\alpha_{i1}$; the data (the 16 response patterns 0000 to 1111 with their frequencies n) are shown alongside]

A single latent dimension Z ~ Normal (mean 0; std dev = 1), so Var = 1 too!

[Figure: item information curves shown alongside their ICCs; y-axis values in the residue: 0.14, 3.0, 0.14, 0.40]
beware y-axis scaling: not all the same

Mplus version 4.1 ML estimates

                           Estimate   S.E.
Z by Q1  ($\alpha_{11}$)   0.721      0.093
Z by Q2  ($\alpha_{21}$)   3.358      1.035
Z by Q3  ($\alpha_{31}$)   1.344      0.167
Z by Q4  ($\alpha_{41}$)   0.769      0.145
Variances: Z

Weighted scores
Weights ($\alpha_{i1}$ parameters), with standard errors in brackets

Q1   0.72 (0.09)
Q2   3.40 (1.14)
Q3   1.34 (0.17)
Q4   0.77 (0.15)

These numbers ????? : 0.72, 3.40, 1.34, 0.77 (weighted values)

Total score, weighted score [sum of the $\alpha_{i1}$ of the items answered 1] and factor score for each response pattern

pattern     n   total score   weighted score                  factor score
0000      477   0             0                    = 0        -0.98
1000       63   1             0.72                 = 0.72     -0.68
0001       12   1             0.77                 = 0.77     -0.67
0010      150   1             1.34                 = 1.34     -0.46
1001        7   2             0.72 + 0.77          = 1.48     -0.41
1010       32   2             0.72 + 1.34          = 2.06     -0.23
0011       11   2             1.34 + 0.77          = 2.10     -0.22
1011        4   3             0.72 + 1.34 + 0.77   = 2.82      0.0
0100      231   1             3.40                 = 3.40      0.16
1100       94   2             0.72 + 3.40          = 4.12      0.42
0101       13   2             3.40 + 0.77          = 4.16      0.43
0110      378   2             3.40 + 1.34          = 4.74      0.66
1101       12   3             0.72 + 3.40 + 0.77   = 4.88      0.72
1110      169   3             0.72 + 3.40 + 1.34   = 5.46      0.99
0111       45   3             3.40 + 1.34 + 0.77   = 5.50      1.02
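The weighted (component) scores can be sketched as a sum of the slope weights of the items answered 1. The weights below are the two-decimal values shown on the slides, and the digit order q1 to q4 is an assumption. Sums that involve Q4 can differ from the slide values by 0.01, presumably because the slide added less heavily rounded weights.

```python
# Slope weights for Q1..Q4 as shown on the slides (two-decimal values).
weights = (0.72, 3.40, 1.34, 0.77)

def weighted_score(pattern, weights):
    """Sum of the weights of the items answered 1 in a 0/1 pattern string."""
    return sum(w for digit, w in zip(pattern, weights) if digit == "1")

patterns = [
    "0000", "1000", "0001", "0010", "1001", "1010", "0011", "1011",
    "0100", "1100", "0101", "0110", "1101", "1110", "0111", "1111",
]

for pattern in patterns:
    print(pattern, f"{weighted_score(pattern, weights):.2f}")
```

Unlike the total score, the weighted score separates patterns with the same number of 1s: a single "yes" to Q2 (weight 3.40) counts for more than "yes" to Q1, Q3 and Q4 together.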

The estimated factor scores from the model
Not just some simple sum of unweighted or weighted items
Takes into account the proposed score distribution (gaussian normal) and the estimated model parameters (but not the fact that they are estimates rather than known values) and more besides (when missing data are present)
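One common way to obtain such factor scores is the posterior mean of z given the response pattern (often called an EAP score), combining a Normal(0, 1) prior with the item response probabilities. The sketch below is an assumption about the general approach, not a reproduction of the software output on the slides: the slopes are the values shown on the slides, but the intercepts are HYPOTHETICAL placeholders because the slides do not list them, so the printed scores will not match the slide values.

```python
import math

def icc_2pl(z, intercept, slope):
    """2PL response probability: logit(P) = intercept + slope * z."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def eap_score(pattern, intercepts, slopes, n_points=201, z_max=6.0):
    """Posterior mean of z given a 0/1 pattern string, with a N(0, 1) prior,
    approximated by summing over an evenly spaced grid on [-z_max, z_max]."""
    step = 2.0 * z_max / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        z = -z_max + k * step
        weight = math.exp(-0.5 * z * z)  # unnormalised N(0, 1) density
        for digit, a0, a1 in zip(pattern, intercepts, slopes):
            p = icc_2pl(z, a0, a1)
            weight *= p if digit == "1" else 1.0 - p
        numerator += z * weight
        denominator += weight
    return numerator / denominator

slopes = (0.72, 3.40, 1.34, 0.77)       # values shown on the slides
intercepts = (-2.0, 0.5, -1.0, -3.0)    # HYPOTHETICAL placeholders, not from the slides

for pattern in ("0000", "1000", "0100", "1111"):
    print(pattern, round(eap_score(pattern, intercepts, slopes), 2))
```

The prior pulls extreme patterns towards 0, which is one reason factor scores are not a simple rescaling of the weighted sum.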

introduction to IRT
Play with the key features of IRT models
www2.unijena.de/svw/metheval/irt/VisualIRT.pdf

[Screenshots: VisualIRT (pdf) pages]

Any hypothetical latent variable [factor/trait] contin
expressed in a z-score metric (gaussian normal (0,1))
Item properties
slope = item discrimination
location = item commonality [difficulty/prevalence/...]
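To connect the two item properties to the intercept/slope form used earlier, the sketch below assumes logit(P) = intercept + slope * z, which can be rewritten as slope * (z - location) with location = -intercept / slope. The location is the z value at which the response probability is 0.5. The parameter values are hypothetical.

```python
import math

def icc_2pl(z, intercept, slope):
    """2PL response probability: logit(P) = intercept + slope * z."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * z)))

def location(intercept, slope):
    """Latent value at which P = 0.5: the item's difficulty / commonality."""
    return -intercept / slope

# Hypothetical items (intercept, slope), for illustration only.
items = {"easy, weakly discriminating": (1.0, 0.8),
         "hard, strongly discriminating": (-3.0, 2.0)}

for label, (a0, a1) in items.items():
    b = location(a0, a1)
    print(f"{label}: slope = {a1}, location = {b:+.2f}, P at location = {icc_2pl(b, a0, a1):.2f}")
```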

IRT Resources
A visual guide to Item Response Theory, I. Partchev
Introduction to IRT, R. Baker: http://ericae.net/irt/baker/toc.htm
B Reeve
P Fayers
H Goldstein
LSE books (Bartholomew, Knott, Moustaki, Steele)

Item Response Theory Books

Applying The Rasch Model. Trevor G. Bond and Christine M. Fox. 255 pages. 2001.
Constructing Measures: An Item Response Modeling Approach. Mark Wilson. 248 pages. 2005.
The EM Algorithm and Related Statistical Models. Michiko Watanabe and Kazunori Yamaguchi. 250 pages. 2004.
Essays on Item Response Theory. Edited by Anne Boomsma, Marijtje A.J. van Duijn, Tom A.A. Snijders. 438 pages. 2001.
Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. Edited by Paul De Boeck and Mark Wilson. 382 pages. 2004.
Fundamentals of Item Response Theory. Ronald K. Hambleton, H. Swaminathan, and H. Jane Rogers. 184 pages. 1991.
Handbook of Modern Item Response Theory. Edited by Wim J. van der Linden and Ronald K. Hambleton. 510 pages. 1997.
Introduction to Nonparametric Item Response Theory. Klaas Sijtsma and Ivo W. Molenaar. 168 pages. 2002.
Item Response Theory. Mathilda Du Toit. 906 pages. 2003.
Item Response Theory for Psychologists. Susan E. Embretson and Steven P. Reise. 376 pages. 2000.
Item Response Theory: Parameter Estimation Techniques (Second Edition, Revised and Expanded w/CD). Frank Baker and Seock-Ho Kim. 495 pages. 2004.
Item Response Theory: Principles and Applications. Ronald K. Hambleton and Hariharan Swaminathan. 332 pages. 1984.
Logit and Probit: Ordered and Multinomial Models. Vani K. Borooah. 96 pages. 2002.
Markov Chain Monte Carlo in Practice. W.R. Gilks, Sylvia Richardson, and D.J. Spiegelhalter. 512 pages. 1995.
Monte Carlo Statistical Methods. Christian P. Robert and George Casella. 645 pages. 2004.
Polytomous Item Response Theory Models. Remo Ostini and Michael L. Nering. 120 pages. 2005.
Rasch Models for Measurement. David Andrich. 96 pages. 1988.
Rasch Models: Foundations, Recent Developments, and Applications. Edited by Gerhard H. Fischer and Ivo W. Molenaar. 436 pages. 1995.
